Analysis of IO model

brief introduction

IO model classification

Server side programming often requires the construction of high-performance IO models. There are four common IO models:
(1) Blocking IO: the traditional IO model.
(2) Synchronous non blocking IO: all sockets created by default are blocked. Non blocking IO requires that the socket be set to NONBLOCK.
(3) Asynchronous blocking IO (IO Multiplexing): that is, the classic Reactor design mode, which is mainly represented by IO Multiplexing.
(4) Asynchronous IO: the classic Proactor design pattern, also known as asynchronous non blocking io.

Synchronous, asynchronous, blocking, non blocking

The concepts of synchronization and asynchrony describe the interaction between the user thread and the kernel, which is distinguished from the perspective of the user thread: synchronization refers to that the user thread needs to wait or poll the kernel for the completion of the IO operation after initiating the IO request; Asynchronous means that the user thread continues to execute after initiating the IO request. When the kernel IO operation is completed, it will notify the user thread or call the callback function registered by the user thread.
The concepts of blocking and non blocking describe the way in which the user thread invokes the kernel IO operation: blocking means that the IO operation needs to be completely completed before returning to the user space; Instead of blocking, it means that the IO operation will return a status value to the user immediately after being called, without waiting for the IO operation to complete completely.

Case understanding

If you want to print a file:
Synchronization blocking: you go to the printing shop to queue up for printing, and then wait there until you finish printing.
Synchronous non blocking: you give the document you need to print to the shopkeeper, and then go shopping. However, after a while, I went back to the printing shop to ask if the printing was completed. If the printing was completed, I would take away the documents.
Asynchronous blocking: when shopping, I received a call from the printing shop saying that the printing was completed and asked me to pick it up in person.
Asynchronous non blocking: the printing shop called and said that we know your location. We'll send it to you later and shop at ease.

Synchronous blocking IO


The user thread initiates the IO read operation through the system call read, which is transferred from the user space to the kernel space. After the data packet arrives, the kernel copies the received data to the user space to complete the read operation. In the whole process of IO request, the user thread is blocked, which makes the user unable to do anything when initiating IO request and insufficient utilization of CPU resources. JAVA traditional IO model belongs to this way.

Synchronous non blocking IO


When the user thread initiates an IO request, it returns immediately, but does not read any data. The user thread needs to continuously initiate an IO request until the data arrives, then it really reads the data and continues to execute. That is, the user needs to constantly call read to try to read the data, and will not continue to process the received data until the reading is successful. In the whole process of IO request, although the user thread can return immediately after each IO request, in order to wait for data, it still needs to poll and repeat requests continuously, which consumes a lot of CPU resources. This is how SOCKET sets the NON-BLOCK mode.

Asynchronous model appears

Before proposing the asynchronous model, let's look at a C10K problem.
With the popularity of the Internet, websites are subject to higher and higher concurrent requests. C10K means that websites accept 1W concurrent requests at the same time. That 1W concurrent request requires that the website server needs to open 1W threads at the same time.
Current CPUs are designed based on time-sharing model. One core can only process one thread request at the same time, and switching threads has a large overhead on CPU and memory. About 3 microseconds to 8 microseconds at a time.
Lmbench tool download link: https://sourceforge.net/projects/lmbench
The test results are as follows using lmbench3 tool:

size=0k ovr=1.50
2 3.12
4 3.45
8 3.75
16 5.13
24 6.83
32 6.34
64 8.54
96 7.98

The time of 1W request is 80 milliseconds. If an application has 20 interfaces and high concurrency query, it is 1.6 seconds. This is only the CPU switching time, plus each business processing time, which is very unfriendly to the user experience.
There are two general directions to solve the problem of thread switching overhead. One is to add back-end servers to share requests. The other is to improve the processing and receiving capacity of each thread. The first is that the server overhead is large and small companies can't afford it. The second is to optimize the network model and improve the processing capacity. Therefore, asynchronous IO appears.

Asynchronous blocking IO

Reactor model diagram

Asynchronous blocking IO is designed and built on the basis of Reactor model, which is also called Reactor model. The basic composition structure is as follows (see http://www.dre.vanderbilt.edu/~schmidt/PDF/reactor-siemens.pdf):

The meanings of each part in the figure are as follows:

  • Initiation Dispatcher: the initial dispatcher used to register and remove events.
  • Handles: refers to the resources managed by the operating system, which can be understood as fd.
  • Synchronous Event Demultiplexer: an event notifier that notifies Handles of resource changes.
  • Event Handler: the interface of the Event Handler.
  • Concrete Event Handler: the actual implementation of the event handler.

The cooperation steps of each module are basically as follows:

  1. First, the user initiates a request to register the event into the Initiation Dispatcher.
  2. The Initiation Dispatcher calls the get of each Event Handler_ The Handle interface gets its bound Handle.
  3. The Initiation Dispatcher calls handle_events starts the event processing loop. Here, the Initiation Dispatcher will collect all the handles obtained in step 2 and use Synchronous Event Demultiplexer to wait for the events of these handles to occur.
  4. When an event of a Handle occurs, the Synchronous Event Demultiplexer notifies the Initiation Dispatcher.
  5. The Initiation Dispatcher finds out the corresponding processor for processing according to the Handle of the event.

The execution sequence diagram of Reactor model is as follows:

IO multiplexing

The Reactor model is implemented by IO multiplexing technology based on the system kernel and is widely used in redis, tomcat and other components. IO refers to network I/O, multiplexing refers to multiple TCP connections, and multiplexing refers to sharing a thread or process. The biggest advantage of this model is that it has low system overhead, does not need to create or maintain too many threads or processes, and reduces context switching, resource competition, CPU switching consumption and various lock operations.
There are three modes commonly used in IO multiplexing: select, poll and epoll.

select model

In the way of Reactor, the work of user thread polling IO operation status can be uniformly handed over to the handle_events event loop for processing. After registering the event handler, the user thread can continue to perform other work (asynchronous), and the Reactor thread is responsible for calling the select function of the kernel to check the socket status. When a socket is activated, notify the corresponding user thread (or execute the callback function of the user thread) to execute handle_event is used to read and process data.
This paper takes redis5 0 as an example, code interception analysis.
Redis source address: http://download.redis.io/releases
linux2.6. Kernel source code address: http://ftp.sjtu.edu.cn/sites/ftp.kernel.org/pub/linux/kernel/v2.6
Redis ae_select.c code is as follows

#include <sys/select.h>
#include <string.h>
......
static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp) {
    aeApiState *state = eventLoop->apidata;
    int retval, j, numevents = 0;
    memcpy(&state->_rfds,&state->rfds,sizeof(fd_set));
    memcpy(&state->_wfds,&state->wfds,sizeof(fd_set));
    retval = select(eventLoop->maxfd+1,&state->_rfds,&state->_wfds,NULL,tvp);// Call kernel select function
    if (retval > 0) {
        for (j = 0; j <= eventLoop->maxfd; j++) {
            int mask = 0;
            aeFileEvent *fe = &eventLoop->events[j];
            if (fe->mask == AE_NONE) continue;
            if (fe->mask & AE_READABLE && FD_ISSET(j,&state->_rfds))
                mask |= AE_READABLE;
            if (fe->mask & AE_WRITABLE && FD_ISSET(j,&state->_wfds))
                mask |= AE_WRITABLE;
            eventLoop->fired[numevents].fd = j;
            eventLoop->fired[numevents].mask = mask;
            numevents++;
        }
    }
    return numevents;
}

The interface of select is defined as follows:

extern int select (int __nfds,
 fd_set *__restrict __readfds,fd_set *__restrict __writefds, fd_set *__restrict __exceptfds,
 struct timeval *__restrict __timeout);

Parameter 1 indicates the value of the maximum file descriptor in the set to be monitored + 1.
The three sets of parameters 2, 3 and 4 respectively store the file descriptors that need to listen for read, write and exception operations.
Parameter 5 indicates the timeout time. Set to 0 to scan immediately and return. Set to NULL to wait forever until a file descriptor is ready.

select.c kernel source code:

int do_select(int n, fd_set_bits *fds, s64 *timeout)
{
   struct poll_wqueues table;
   poll_table *wait;
   int retval, i;
   rcu_read_lock();
   retval = max_select_fd(n, fds);
   rcu_read_unlock();
   if (retval < 0)
      return retval;
   n = retval;
   poll_initwait(&table);
   wait = &table.pt;
   if (!*timeout)
      wait = NULL;
   retval = 0;
   for (;;) {// Polling operation
      unsigned long *rinp, *routp, *rexp, *inp, *outp, *exp;
      long __timeout;
      set_current_state(TASK_INTERRUPTIBLE);
      inp = fds->in; outp = fds->out; exp = fds->ex;
      rinp = fds->res_in; routp = fds->res_out; rexp = fds->res_ex;
      for (i = 0; i < n; ++rinp, ++routp, ++rexp) {// Traversing read, write and exception arrays
         unsigned long in, out, ex, all_bits, bit = 1, mask, j;
         unsigned long res_in = 0, res_out = 0, res_ex = 0;
         const struct file_operations *f_op = NULL;
         struct file *file = NULL;
         in = *inp++; out = *outp++; ex = *exp++;
         for (j = 0; j < __NFDBITS; ++j, ++i, bit <<= 1) {
            int fput_needed;
            if (file) {
               f_op = file->f_op;
               mask = DEFAULT_POLLMASK;
               if (f_op && f_op->poll)
                  mask = (*f_op->poll)(file, retval ? NULL : wait);// Call the poll operation corresponding to the file
	if ((mask & POLLIN_SET) && (in & bit)) {
   	    res_in |= bit;retval++;
	}
             }
            cond_resched();
         }
         if (res_in)
            *rinp = res_in;
      }
   }
   __set_current_state(TASK_RUNNING);
   poll_freewait(&table);
   return retval;
}

The high cost of select is that it has to traverse and scan the ready state of each file descriptor every time, and it starts from the smallest descriptor 0. It has done a lot of useless work, so the efficiency is very low. With the increase of file descriptors, the efficiency will be lower and lower.

epoll model

Epoll is an enhanced version of the previous select. To understand epoll, you must first understand the three key elements of epoll: mmap, red black tree and rdllist linked list.

  • Both the kernel address and the user address are mapped to the same block of memory space (whether the kernel address and the user address are in the physical state or in the virtual state). At the same time, both the kernel address and the user address are mapped to the same block of memory space. The kernel can directly see the handle monitored by epoll, which is efficient.
  • Epoll uses a red black tree to store all sockets in its implementation. When a socket is added or deleted (epoll_ctl), it is processed on the red black tree. The red black tree has good query performance, time complexity O(logN), and there is no maximum number limit.
  • When epoll adds an event, it will establish a callback relationship between the event and the corresponding program. When the corresponding event occurs, it will call this callback function, which is called EP in the kernel_ poll_ Callback, this callback function actually adds this event to rdllist, a two-way linked list. Once an event occurs, epoll will add the event to the two-way linked list. Then, when we check the polling, we only need to check whether there are registered events in the rdlist two-way linked list, which is very efficient.

Redis ae_epoll.c code is as follows

#include <sys/epoll.h>

static int aeApiCreate(aeEventLoop *eventLoop) {
    aeApiState *state = zmalloc(sizeof(aeApiState));
    if (!state) return -1;
    state->events = zmalloc(sizeof(struct epoll_event)*eventLoop->setsize);
    if (!state->events) {
        zfree(state);
        return -1;
    }
    state->epfd = epoll_create(1024); /* 1024 is just a hint for the kernel */
    if (state->epfd == -1) {
        zfree(state->events);
        zfree(state);
        return -1;
    }
    eventLoop->apidata = state;
    return 0;
}
static int aeApiAddEvent(aeEventLoop *eventLoop, int fd, int mask) {
    aeApiState *state = eventLoop->apidata;
    struct epoll_event ee = {0}; 
    int op = eventLoop->events[fd].mask == AE_NONE ?
            EPOLL_CTL_ADD : EPOLL_CTL_MOD;
    ee.events = 0;
    mask |= eventLoop->events[fd].mask; /* Merge old events */
    if (mask & AE_READABLE) ee.events |= EPOLLIN;
    if (mask & AE_WRITABLE) ee.events |= EPOLLOUT;
    ee.data.fd = fd;
    if (epoll_ctl(state->epfd,op,fd,&ee) == -1) return -1;
    return 0;
}
static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp) {
    aeApiState *state = eventLoop->apidata;
    int retval, numevents = 0;
    retval = epoll_wait(state->epfd,state->events,eventLoop->setsize,
            tvp ? (tvp->tv_sec*1000 + tvp->tv_usec/1000) : -1);
    if (retval > 0) {
        int j;
        numevents = retval;
        for (j = 0; j < numevents; j++) {
            int mask = 0;
            struct epoll_event *e = state->events+j;

            if (e->events & EPOLLIN) mask |= AE_READABLE;
            if (e->events & EPOLLOUT) mask |= AE_WRITABLE;
            if (e->events & EPOLLERR) mask |= AE_WRITABLE;
            if (e->events & EPOLLHUP) mask |= AE_WRITABLE;
            eventLoop->fired[j].fd = e->data.fd;
            eventLoop->fired[j].mask = mask;
        }
    }
    return numevents;
}

epoll's interface definition code is as follows

/*
 * Create an epoll handle and generate a red black tree epitem in the kernel cache
 */
int epoll_create(int size);
/*
 * It can be understood that adding, deleting and modifying fd requires listening for events
 * epfd It's epoll_ Handle created by create().
 * op Indicates addition, deletion and modification
 * epoll_event Indicates events that need to be monitored. Redis only uses four statuses: readable, writable, error and hang up
 */
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
/*
 * It can be understood as querying qualified events
 * epfd It's epoll_ Handle created by create().
 * epoll_event Used to store the collection of events obtained from the kernel
 * maxevents Maximum number of events obtained
 * timeout Wait timeout
 */
int epoll_wait(int epfd, struct epoll_event * events, int maxevents, int timeout);

eventpoll.c implementation code is as follows

struct epitem {// Red black tree definition
   /* RB-Tree node used to link this structure to the eventpoll rb-tree */
   struct rb_node rbn;
   /* List header used to link this structure to the eventpoll ready list */
   struct list_head rdllink;
   /* The file descriptor information this item refers to */
   struct epoll_filefd ffd;
   /* Number of active wait queue attached to poll operations */
   int nwait;
   /* List containing poll wait queues */
   struct list_head pwqlist;
   /* The "container" of this item */
   struct eventpoll *ep;
   /* The structure that describe the interested events and the source fd */
   struct epoll_event event;
   /*
    * Used to keep track of the usage count of the structure. This avoids
    * that the structure will desappear from underneath our processing.
    */
   atomic_t usecnt;
   /* List header used to link this item to the "struct file" items list */
   struct list_head fllink;
   /* List header used to link the item to the transfer list */
   struct list_head txlink;
   /*
    * This is used during the collection/transfer of events to userspace
    * to pin items empty events set.
    */
   unsigned int revents;
};
/*
 * It opens an eventpoll file descriptor by suggesting a storage of "size"
 * file descriptors. The size parameter is just an hint about how to size
 * data structures. It won't prevent the user to store more than "size"
 * file descriptors inside the epoll interface. It is the kernel part of
 * the userspace epoll_create(2).
 */
asmlinkage long sys_epoll_create(int size){
   int error, fd;
   struct eventpoll *ep;
   struct inode *inode;
   struct file *file;
   DNPRINTK(3, (KERN_INFO "[%p] eventpoll: sys_epoll_create(%d)\n",
           current, size));
   /*
    * Sanity check on the size parameter, and create the internal data
    * structure ( "struct eventpoll" ).
    */
   error = -EINVAL;
   if (size <= 0 || (error = ep_alloc(&ep)) != 0)
      goto eexit_1;
   /*
    * Creates all the items needed to setup an eventpoll file. That is,
    * a file structure, and inode and a free file descriptor.
    */
   error = ep_getfd(&fd, &inode, &file, ep);
   if (error)
      goto eexit_2;

   DNPRINTK(3, (KERN_INFO "[%p] eventpoll: sys_epoll_create(%d) = %d\n",
           current, size, fd));

   return fd;

eexit_2:
   ep_free(ep);
   kfree(ep);
eexit_1:
   DNPRINTK(3, (KERN_INFO "[%p] eventpoll: sys_epoll_create(%d) = %d\n",
           current, size, error));
   return error;
}

asmlinkage long
sys_epoll_ctl(int epfd, int op, int fd, struct epoll_event __user *event) {
   int error;
   struct file *file, *tfile;
   struct eventpoll *ep;
   struct epitem *epi;
   struct epoll_event epds;
   . . . . . . 
   /* Try to lookup the file inside our hash table */
   epi = ep_find(ep, tfile, fd);

   error = -EINVAL;
   switch (op) {
   case EPOLL_CTL_ADD:
      if (!epi) {
         epds.events |= POLLERR | POLLHUP;
         error = ep_insert(ep, &epds, tfile, fd);// Insert data into red black tree
      } else
         error = -EEXIST;
      break;
   case EPOLL_CTL_DEL:
      if (epi)
         error = ep_remove(ep, epi);
      else
         error = -ENOENT;
      break;
   case EPOLL_CTL_MOD:
      if (epi) {
         epds.events |= POLLERR | POLLHUP;
         error = ep_modify(ep, epi, &epds);
      } else
         error = -ENOENT;
      break;
   }
   /*
    * The function ep_find() increments the usage count of the structure
    * so, if this is not NULL, we need to release it.
    */
   if (epi)
      ep_release_epitem(epi);

   up_write(&ep->sem);
eexit_3:
   fput(tfile);
eexit_2:
   fput(file);
eexit_1:
   DNPRINTK(3, (KERN_INFO "[%p] eventpoll: sys_epoll_ctl(%d, %d, %d, %p) = %d\n",
           current, epfd, op, fd, event, error));
   return error;
}
static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
           struct file *tfile, int fd) {
   int error, revents, pwake = 0;
   unsigned long flags;
   struct epitem *epi;
   struct ep_pqueue epq;

   error = -ENOMEM;
   if (!(epi = kmem_cache_alloc(epi_cache, SLAB_KERNEL)))
      goto eexit_1;

   /* Item initialization follow here ... */
   ep_rb_initnode(&epi->rbn);
   INIT_LIST_HEAD(&epi->rdllink);
   INIT_LIST_HEAD(&epi->fllink);
   INIT_LIST_HEAD(&epi->txlink);
   INIT_LIST_HEAD(&epi->pwqlist);
   epi->ep = ep;
   ep_set_ffd(&epi->ffd, tfile, fd);
   epi->event = *event;
   atomic_set(&epi->usecnt, 1);
   epi->nwait = 0;

   /* Initialize the poll table using the queue callback */
   epq.epi = epi;
   init_poll_funcptr(&epq.pt, ep_ptable_queue_proc);// Initialize queue

   revents = tfile->f_op->poll(tfile, &epq.pt);
   . . . . . . 
   return 0;

eexit_2:
   ep_unregister_pollwait(ep, epi);

   /*
    * We need to do this because an event could have been arrived on some
    * allocated wait queue.
    */
   write_lock_irqsave(&ep->lock, flags);
   if (ep_is_linked(&epi->rdllink))
      ep_list_del(&epi->rdllink);
   write_unlock_irqrestore(&ep->lock, flags);

   kmem_cache_free(epi_cache, epi);
eexit_1:
   return error;
}

/*
 * This is the callback that is used to add our wait queue to the
 * target file wakeup lists.
 */
static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
             poll_table *pt){
   struct epitem *epi = ep_item_from_epqueue(pt);
   struct eppoll_entry *pwq;

   if (epi->nwait >= 0 && (pwq = kmem_cache_alloc(pwq_cache, SLAB_KERNEL))) {
      init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);// Call callback function
      pwq->whead = whead;
      pwq->base = epi;
      add_wait_queue(whead, &pwq->wait);
      list_add_tail(&pwq->llink, &epi->pwqlist);
      epi->nwait++;
   } else {
      /* We have to signal that an error occurred */
      epi->nwait = -1;
   }
}

/*
 * This is the callback that is passed to the wait queue wakeup
 * machanism. It is called by the stored file descriptors when they
 * have events to report.
 */
static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *key)
{
   int pwake = 0;
   unsigned long flags;
   struct epitem *epi = ep_item_from_wait(wait);
   struct eventpoll *ep = epi->ep;

   DNPRINTK(3, (KERN_INFO "[%p] eventpoll: poll_callback(%p) epi=%p ep=%p\n",
           current, epi->ffd.file, epi, ep));

   write_lock_irqsave(&ep->lock, flags);
   /*
    * If the event mask does not contain any poll(2) event, we consider the
    * descriptor to be disabled. This condition is likely the effect of the
    * EPOLLONESHOT bit that disables the descriptor when an event is received,
    * until the next EPOLL_CTL_MOD will be issued.
    */
   if (!(epi->event.events & ~EP_PRIVATE_BITS))
      goto is_disabled;
   /* If this file is already in the ready list we exit soon */
   if (ep_is_linked(&epi->rdllink))
      goto is_linked;
   list_add_tail(&epi->rdllink, &ep->rdllist);// Add the fd to the ready linked list monitored by epoll
is_linked:
   /*
    * Wake up ( if active ) both the eventpoll wait list and the ->poll()
    * wait list.
    */
   if (waitqueue_active(&ep->wq))
      __wake_up_locked(&ep->wq, TASK_UNINTERRUPTIBLE |
             TASK_INTERRUPTIBLE);
   if (waitqueue_active(&ep->poll_wait))
      pwake++;
}

asmlinkage long sys_epoll_wait(int epfd, struct epoll_event __user *events,
                int maxevents, int timeout)
{
   int error;
   struct file *file;
   struct eventpoll *ep;

   . . . . . . 

   /* Time to fish for events ... */
   error = ep_poll(ep, events, maxevents, timeout);// Determine whether a callback event occurs

eexit_2:
   fput(file);
eexit_1:
   DNPRINTK(3, (KERN_INFO "[%p] eventpoll: sys_epoll_wait(%d, %p, %d, %d) = %d\n",
           current, epfd, events, maxevents, timeout, error));

   return error;
}

static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
         int maxevents, long timeout) {
   int res, eavail;
   unsigned long flags;
   long jtimeout;
   wait_queue_t wait;
retry:
   write_lock_irqsave(&ep->lock, flags);
   res = 0;
   if (list_empty(&ep->rdllist)) {
      init_waitqueue_entry(&wait, current);
      __add_wait_queue(&ep->wq, &wait);
      for (;;) {// Polling operation
         set_current_state(TASK_INTERRUPTIBLE);
         if (!list_empty(&ep->rdllist) || !jtimeout)// Get EP_ poll_ Value stored in callback
            break;
         write_unlock_irqrestore(&ep->lock, flags);
         jtimeout = schedule_timeout(jtimeout);
         write_lock_irqsave(&ep->lock, flags);
      }
      __remove_wait_queue(&ep->wq, &wait);
      set_current_state(TASK_RUNNING);
   }
   eavail = !list_empty(&ep->rdllist);
   write_unlock_irqrestore(&ep->lock, flags);
   if (!res && eavail &&!(res = ep_events_transfer(ep, events, maxevents)) && jtimeout)
      goto retry;
   return res;
}

epoll operates fd in two modes:
LT mode (horizontal trigger): when epoll_wait detects the occurrence of a descriptor event and notifies the application of this event. The application may not process the event immediately. Next call epoll_ When you wait, the application responds again and notifies you of this event.
ET mode (edge trigger): when epoll_wait detects the occurrence of a descriptor event and notifies the application of this event, which must be handled immediately. If not, epoll will be called next time_ When waiting, the application will not respond again and notify this event.
Comparison between LT and ET:
LT efficiency will be lower than ET trigger, especially in the case of large concurrency and large traffic. However, lt has low requirements for code writing and is not prone to problems. The performance of LT mode service writing is that as long as there is data not obtained, the kernel will constantly notify you, so you don't have to worry about the loss of events.
The efficiency of ET is very high. In the case of concurrency and large traffic, there will be many fewer epoll system calls than LT, so the efficiency is high. However, the programming requirements are high, and each request needs to be handled carefully, otherwise it is prone to loss events.
In essence, compared with LT, ET model improves parallel efficiency by reducing system calls.
IO multiplexing is the most commonly used IO model, but its asynchrony is not "complete" because it blocks threads. Therefore, IO multiplexing can only be called asynchronous blocking IO, not real asynchronous io.

Asynchronous non blocking IO

"True" asynchronous IO requires stronger support from the operating system. In the IO multiplexing model, the event loop notifies the user thread of the status event of the file handle, and the user thread reads and processes the data by itself. In the asynchronous IO model, when the user thread receives the notification, the data has been read by the kernel and placed in the buffer specified by the user thread. After the IO is completed, the kernel can notify the user thread to use it directly.

Proactor model

The asynchronous IO model uses the Proactor design pattern to implement this mechanism.

In the asynchronous IO model, the user thread directly initiates a read request using the asynchronous IO API provided by the kernel, and returns immediately after initiation to continue executing the user thread code. However, at this time, the user thread has registered the corresponding event processor, and then the operating system starts an independent kernel thread to handle IO operations. At this time, the event processor does not focus on the read ready event, but on the read completion event, which is the key difference from Reactor.
When the data requested by read arrives, the kernel is responsible for reading the data and writing it into the buffer specified by the user (asynchronous IO is that the operating system is responsible for reading and writing the data into the buffer passed in by the application, and the operating system plays an important role). This is also different from Reactor.
Finally, the kernel distributes the read data and the user thread registration event to the internal Proactor, which notifies the user thread of the completion of IO (generally by calling the completion event handler registered by the user thread) to complete asynchronous io.

implementation technique

Open source C + + framework: ACE
Open source c + + development framework ace provides a large number of platform independent underlying concurrency support classes (threads, mutexes, etc.). At the same time, at a higher level, it also provides several independent groups of C + + classes to implement Reactor and Proactor patterns. Although they are platform independent units, they all provide different interfaces. Ace Proactor has better performance and robustness on MS Windows, mainly because Windows provides a series of efficient underlying asynchronous API s. Unfortunately, not all operating systems provide robust support for underlying asynchrony. For example, many Unix systems are in trouble. Therefore, ACE Reactor may be a more suitable solution on Unix system. Because of the different support at the bottom of the system, in order to have better performance on each system, developers have to maintain several independent Codes: ACE Proactor for windows and ACE Reactor for Unix series. True asynchronous mode requires operating system level support.
C Network Library: libevent
libevent is a network library written in C language. The official main support is linux like operating system. The latest version adds support for IOCP of windows.
Boost.Asio class library
Boost.Asio is a class library written in C + + language, which is implemented in the design mode of Proactor.

Compared with IO multiplexing model, asynchronous IO is not very commonly used. Many high-performance concurrent service programs can basically meet the requirements by using the architecture of IO multiplexing model + multithreaded task processing. Moreover, the current operating system's support for asynchronous IO is not particularly perfect, but more uses the IO multiplexing model to simulate asynchronous IO (the user thread is not directly notified when the IO event is triggered, but the data is put into the buffer specified by the user after reading and writing).

If there is any mistake or improper understanding, please give me advice.

Tags: Design Pattern C

Posted by bruckerrlb on Sat, 16 Apr 2022 18:33:40 +0930