tech-kern archive


GSOC asynchronous IO



Hi.

I am Ethan. I am reaching out to express interest in the upcoming GSoC. I came across a very interesting project, refactoring asynchronous I/O, which looks to require a great deal of skill and vision. Below I go through most of the questions in your project application guideline.

About the project?

We will be refactoring the low-level I/O pipeline within the kernel to make all requests asynchronous by default. This means that instead of immediately blocking on the availability of a resource, a request registers a callback that is invoked once the resource becomes available.

So the first step is to develop the infrastructure for a new internal API to facilitate these asynchronous callbacks. This new API will have to interact with all kinds of internal synchronization primitives. The following is roughly what I have in mind.

struct aio_ops {
  int (*callback)(struct aio_ops *); /* completion callback; NULL selects synchronous mode */
  void *private;                     /* caller-owned context for the callback */

  int fd;                            /* request parameters */
  void *buffer;
  size_t length;
  off_t offset;
  int ops;                           /* operation class (read, write, sync, ...) */

  kmutex_t mtx;
  kcondvar_t delivered;              /* tracks delivery of the callback */
  kcondvar_t available;              /* tracks availability of the resource */

  int id;
  int status;
  int error;
  /* ... */

  TAILQ_ENTRY(aio_ops) entries;
};

TAILQ_HEAD(aio_ops_queue, aio_ops);
struct aio_service_pool {
  struct aio_ops_queue ops_queue;    /* pending operations */

  int (*write)(struct aio_ops *);    /* per-class service routines */
  int (*read)(struct aio_ops *);
  int (*sync)(struct aio_ops *);
  int (*cancel)(struct aio_ops *);
  /* ... */

  kmutex_t mtx;
  kcondvar_t pending;                /* signalled when operations are queued */
};

Essentially, each asynchronous operation is described by a struct aio_ops. Each operation includes a callback function along with synchronization primitives to track the delivery of the callback as well as the availability of the resource. Asynchronous operations are queued to a designated service pool, and each service pool handles a specific class of operations. These pools are backed by per-CPU kernel threads, which remain dormant until operations are pending. Down the road, it would probably be a good idea to implement load balancing of pending operations across multiple service pools of the same class.
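To make the dormant-thread behaviour concrete, here is a minimal userland sketch of the service-pool loop, with pthread_mutex_t and pthread_cond_t standing in for kmutex_t and kcondvar_t. All names here (aio_op, aio_pool, aio_service, aio_submit) are illustrative, not an existing NetBSD API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

struct aio_op {
  int (*callback)(struct aio_op *); /* completion callback */
  int status;                       /* set by the service thread */
  TAILQ_ENTRY(aio_op) entries;
};

TAILQ_HEAD(aio_queue, aio_op);

struct aio_pool {
  struct aio_queue ops_queue;
  pthread_mutex_t mtx;
  pthread_cond_t pending;           /* signalled when work is queued */
  int shutdown;
};

/*
 * Service thread: remain dormant until operations are pending, then
 * drain the queue, invoking each callback outside the pool lock so a
 * slow callback does not stall submitters.
 */
static void *
aio_service(void *arg)
{
  struct aio_pool *pool = arg;
  struct aio_op *op;

  pthread_mutex_lock(&pool->mtx);
  for (;;) {
    while (TAILQ_EMPTY(&pool->ops_queue) && !pool->shutdown)
      pthread_cond_wait(&pool->pending, &pool->mtx);
    if (TAILQ_EMPTY(&pool->ops_queue) && pool->shutdown)
      break;
    op = TAILQ_FIRST(&pool->ops_queue);
    TAILQ_REMOVE(&pool->ops_queue, op, entries);
    pthread_mutex_unlock(&pool->mtx);
    op->status = op->callback(op);  /* deliver completion */
    pthread_mutex_lock(&pool->mtx);
  }
  pthread_mutex_unlock(&pool->mtx);
  return NULL;
}

/* Queue an operation on a pool and wake a dormant service thread. */
static void
aio_submit(struct aio_pool *pool, struct aio_op *op)
{
  pthread_mutex_lock(&pool->mtx);
  TAILQ_INSERT_TAIL(&pool->ops_queue, op, entries);
  pthread_cond_signal(&pool->pending);
  pthread_mutex_unlock(&pool->mtx);
}

static int aio_demo_done;

static int
aio_demo_cb(struct aio_op *op)
{
  (void)op;
  aio_demo_done = 1;                /* record that the callback ran */
  return 0;
}
```

The per-CPU and per-class aspects are left out here; the point is only the wait-until-pending loop and the lock being dropped around callback delivery.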

Implementation has to start at the lowest abstraction layer of the I/O path to ensure asynchronicity throughout the stack. The challenge is allowing these service pools to drive multiple operations concurrently within a single thread without blocking on any individual operation. So we will likely have to begin work at the level of the block device, or at least where the most fundamental I/O primitives are defined, before moving upwards.

So the idea is that this protocol functions as an intermediary layer within the I/O path that supports both asynchronous and synchronous modes. For synchronous behaviour, the callback is set to NULL and the calling thread itself acts as the servicing thread, blocking directly on the availability of the resource. This should alleviate any potential overhead associated with the interface, essentially providing a zero-cost abstraction for synchronous operations within a broader asynchronous framework.
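The NULL-callback fast path could be sketched roughly as below, again as a userland analogue. The names (aio_dispatch, the service function pointer) are illustrative assumptions, not an existing NetBSD interface, and queue locking is elided for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

struct aio_op {
  int (*callback)(struct aio_op *); /* NULL selects synchronous mode */
  int status;
  TAILQ_ENTRY(aio_op) entries;
};

TAILQ_HEAD(aio_queue, aio_op);

/*
 * Dispatch an operation whose actual work is performed by service().
 * With callback == NULL the calling thread services the operation
 * itself and blocks until completion, so no queueing, wakeup, or
 * context switch is paid: the synchronous case costs about the same
 * as a direct call.
 */
static int
aio_dispatch(struct aio_queue *q, struct aio_op *op,
    int (*service)(struct aio_op *))
{
  if (op->callback == NULL) {
    op->status = service(op);       /* synchronous: block right here */
    return op->status;
  }
  /* Asynchronous: hand the op to a service pool; completion is
   * reported later through op->callback. */
  TAILQ_INSERT_TAIL(q, op, entries);
  return 0;
}

static int
aio_demo_service(struct aio_op *op)
{
  (void)op;
  return 7;                         /* stand-in for the real I/O work */
}

static int
aio_demo_cb(struct aio_op *op)
{
  (void)op;
  return 0;
}
```

The synchronous branch never touches the queue or a condition variable, which is what makes the "zero-cost abstraction" claim plausible.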

This design is still in the early stages, is very tentative, and is definitely open to all kinds of refinement. The crux of this project will be getting this protocol right; everything else should be relatively straightforward. So the first priority is to design and implement a proper internal protocol for asynchronous I/O, then begin integrating it at the lowest abstraction layer of the I/O path, moving upwards from there, and finally revise the user-exposed POSIX AIO interface.

About me?

I am going into my third year of Computer Science at the University of Alberta. Over the past few years, I have spent a good amount of time working on hobbyist projects. Recently, I set up a minimal build of NetBSD and created an efficient workflow with the help of some scripts. Yesterday I submitted a patch for PR 58922, something simple to get into the groove of contributing. While I am not yet fully familiar with every internal structure within NetBSD, I have spent a lot of time working with other monolithic POSIX kernels, and I have found that the knowledge is quite transferable, so I can pick things up pretty quickly. The later stages of this project will require knowledge of the POSIX AIO interface, but really the entire project will require in-depth knowledge of POSIX, as revising the I/O pipeline will involve working with many core subsystems. I am quite comfortable with POSIX, having spent a lot of time working with and implementing POSIX interfaces.

One of my projects, pastoral, implements quite a large subset of POSIX, including signals, Unix domain sockets, and a Unix file system, with support for over a hundred POSIX calls. This compliance allowed me to cross-compile and port a lot of software, such as Python, Xorg, Bash, GCC, the GNU coreutils, and more.

More recently, I have been experimenting with microkernels through a project called dufay. It is still under development, but most of the core interfaces for IPC and RPC are implemented. It is a compatibility-based microkernel with a per-processor userspace scheduler that implements a completely fair scheduler. We also have a pretty interesting protocol that allows for very efficient handling of IRQs, exploiting shared-memory tricks so that no additional routing or context switching is required, unlike in other microkernels.

When it comes to these projects, I usually work on them with a core group of individuals, which has been a rewarding experience.

I am excited about this project and am definitely going to learn a lot this summer working on a project of this calibre with real professionals. If anyone wishes to contact me, this email address is perfect. I am planning on formally submitting a proposal for GSoC soon, so I would appreciate any feedback or comments.

Thanks.




