Chapter 12 Block Device I/O and Buffer Management
1. Summary of knowledge points
12.1 Block Devices and I/O Buffers
Since disk I/O is slow compared to memory access, it is undesirable to perform a physical disk operation for every file read or write. Therefore, most file systems use I/O buffering to reduce the number of physical I/O operations to storage devices. A properly designed I/O buffering scheme can significantly improve file I/O efficiency and increase system throughput.
The basic principle of I/O buffering is simple. The file system uses a set of I/O buffers as a cache for block devices. When a process attempts to read a disk block identified by (dev, blk), it first searches the buffer cache for a buffer already assigned to that disk block. If the buffer exists and contains valid data, it simply reads the data from the buffer without reading the block from disk again. If the buffer does not exist, it allocates a buffer for the disk block, reads the data from disk into the buffer, and then reads the data from the buffer. Once a block is read in, the buffer stays in the buffer cache, available to satisfy the next read/write request for the same block from any process. Likewise, when a process writes to a disk block, it first gets a buffer assigned to that block. It then writes the data into the buffer, marks the buffer as dirty for delayed writing, and releases it to the buffer cache. Since a dirty buffer contains valid data, it can be used to satisfy subsequent read/write requests for the same block without causing actual disk I/O. Dirty buffers are only written to disk when they are reassigned to different blocks.
bread(dev, blk) function:
BUFFER *bread(dev, blk)            // return a buffer containing valid data
{
  BUFFER *bp = getblk(dev, blk);   // get a buffer for (dev, blk)
  if (bp data valid)
      return bp;
  bp->opcode = READ;               // issue READ operation
  start_io(bp);                    // start I/O on device
  wait for I/O completion;
  return bp;
}
write_block(dev, blk, data) function:
write_block(dev, blk, data)
{
  BUFFER *bp = bread(dev, blk);    // read in the disk block first
  write data to bp;
  (synchronous write) ? bwrite(bp) : dwrite(bp);
}
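For completeness, the two write paths referenced above can be sketched in the same pseudocode style (a condensed outline consistent with the description, not a full implementation): bwrite performs a synchronous write and waits for completion, while dwrite only marks the buffer dirty for delayed writing and causes no immediate I/O.

bwrite(BUFFER *bp)                 // synchronous write
{
  bp->opcode = WRITE;              // issue WRITE operation
  start_io(bp);                    // start I/O on device
  wait for I/O completion;
  brelse(bp);                      // release the buffer
}

dwrite(BUFFER *bp)                 // delayed (deferred) write
{
  mark bp dirty for delayed writing;
  brelse(bp);                      // release the buffer; no I/O yet
}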
12.2 Unix I/O Buffer Management Algorithms
(1) Unix buffer management subsystem
- I/O buffers: A set of NBUF buffers in the kernel is used as a buffer cache. Each buffer is represented by a structure.
typedef struct buf {
  struct buf *next_free;        // freelist pointer
  struct buf *next_dev;         // dev_list pointer
  int dev, blk;                 // assigned disk block
  int opcode;                   // READ|WRITE
  int dirty;                    // buffer data modified
  int async;                    // ASYNC write flag
  int valid;                    // buffer data valid
  int busy;                     // buffer is in use
  int wanted;                   // some process needs this buffer
  struct semaphore lock = 1;    // buffer locking semaphore; initial value = 1
  struct semaphore iodone = 0;  // for process to wait for I/O completion
  char buf[BLKSIZE];            // block data area
} BUFFER;

BUFFER buf[NBUF], *freelist;    // NBUF buffers and free buffer list
The buffer structure consists of two parts: a header part used for buffer management and a data part that holds the data block. To conserve kernel memory, the status fields may be defined as a bit vector, in which each bit represents a unique status condition.
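As a minimal illustration of the bit-vector idea (the flag names and values below are illustrative, not taken from the text), each status condition occupies one bit of a single integer, so testing, setting and clearing a condition becomes a simple mask operation:

#include <stdio.h>

#define BUF_BUSY    0x01   /* buffer is in use               */
#define BUF_VALID   0x02   /* buffer contains valid data     */
#define BUF_DIRTY   0x04   /* buffer data modified           */
#define BUF_WANTED  0x08   /* some process wants this buffer */

int main(void)
{
    int status = 0;
    status |= BUF_BUSY | BUF_VALID;          /* mark buffer busy with valid data */
    if (status & BUF_VALID)
        printf("buffer data is valid\n");
    status &= ~BUF_BUSY;                     /* clear the BUSY bit on release    */
    printf("still busy? %s\n", (status & BUF_BUSY) ? "yes" : "no");
    return 0;
}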
- Device Table: Each block device is represented by a device table structure.
Each device table has a dev_list, which contains the I/O buffers currently assigned to that device, and an io_queue, which contains the buffers waiting for I/O operations on the device. The I/O queue should be organized in a way that ensures optimal I/O operations; Unix uses FIFO I/O queues. A possible layout of such a device table is sketched below.
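The sketch below shows one possible shape of the device table, consistent with the fields named above (dev_list and io_queue come from the text; NDEV and the remaining details are assumptions for illustration):

typedef struct buf BUFFER;     /* the buffer structure defined above */

struct devtab {
    int     dev;               /* major device number                       */
    BUFFER *dev_list;          /* buffers currently assigned to this device */
    BUFFER *io_queue;          /* FIFO queue of buffers waiting for I/O     */
} devtab[NDEV];                /* one entry per block device                */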
- Buffer initialization:
When the system starts, all I/O buffers are in the free list, and all device lists and I/O queues are empty.
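A minimal initialization sketch consistent with this description, using the BUFFER layout shown earlier (the loop itself is an assumption for illustration):

// at startup, chain all NBUF buffers into a single freelist (sketch)
freelist = &buf[0];
for (int i = 0; i < NBUF - 1; i++)
    buf[i].next_free = &buf[i + 1];
buf[NBUF - 1].next_free = 0;   // end of the freelist; dev_lists and io_queues remain empty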
- Buffer list:
When a buffer is assigned to (dev, blk), it is inserted into the dev_list of the corresponding device table. If the buffer is currently in use, it is marked BUSY and removed from the free list. A busy buffer may also be in the I/O queue of the device table. Since a buffer cannot be free and busy at the same time, the device I/O queue is maintained using the same next_free pointer. When a buffer is no longer busy, it is released back to the free list but remains in the dev_list for possible reuse. A buffer may migrate from one dev_list to another only when it is reassigned. As mentioned earlier, reading/writing disk blocks can be expressed as bread, bwrite and dwrite, all of which depend on getblk and brelse; a condensed sketch of these two functions is given below.
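The following is a condensed pseudocode outline of the classic Unix getblk/brelse pair that the notes below refer to (retry loop, WANTED flag, delayed writes); it follows the standard textbook description and is only an outline, not a particular implementation:

/* getblk: return a buffer assigned to (dev, blk), marked BUSY */
BUFFER *getblk(dev, blk)
{
  while(1){
    if (bp = search dev_list of dev for blk){                    // block is in the cache
       if (bp is BUSY){ set bp WANTED; sleep(bp); continue; }    // retry after wakeup
       mark bp BUSY; remove bp from freelist; return bp;
    }
    // block not in the cache: take the first free buffer
    if (freelist is empty){ set freelist WANTED; sleep(freelist); continue; }
    bp = first buffer in freelist; mark bp BUSY;
    if (bp is DIRTY){ start an asynchronous write of bp; continue; }  // flush delayed write first
    reassign bp to (dev, blk); mark bp data invalid; return bp;
  }
}

/* brelse: release a buffer back to the freelist */
brelse(BUFFER *bp)
{
  if (bp is WANTED) wakeup(bp);               // wake processes waiting for this buffer
  if (freelist is WANTED) wakeup(freelist);   // wake processes waiting for any free buffer
  clear bp BUSY; insert bp at the end of the freelist;
}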
Some specific notes on Unix algorithms:
- Data consistency: To ensure data consistency, getblk must never assign more than one buffer to the same (dev, blk). This is achieved by having the process re-execute the "retry loop" after waking up from sleep; the reader may verify that every assigned buffer is unique. In addition, dirty buffers are written out before they are reassigned, which also guarantees data consistency.
- Cache effect: The cache effect is achieved as follows. A freed buffer remains in the device list for possible reuse. Buffers marked for delayed writing do not generate immediate I/O and can be reused. Buffers are released to the end of the free list, while allocation starts from the front of the free list. This follows the LRU (Least Recently Used) principle, which prolongs the lifetime of assigned buffers and thus improves their cache effect.
- Critical sections: The device interrupt handler may manipulate the buffer lists, e.g. remove a bp from a device table's I/O queue, change its status and call brelse(bp). Therefore, in getblk and brelse, device interrupts must be masked out in these critical sections; a rough sketch of this pattern follows this list. This is implied but not shown in the algorithm.
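The masking pattern might look roughly like the sketch below (int_off/int_on are placeholder names for masking/unmasking interrupts, not from the text):

/* in getblk/brelse, buffer-list manipulation shared with the
   device interrupt handler is a critical section (sketch) */
int_off();                         // mask device interrupts
remove bp from the freelist;       // or: insert bp into a dev_list / io_queue
int_on();                          // unmask device interrupts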
Disadvantages of Unix Algorithms:
- Inefficiency: the algorithm relies on retry loops. For example, releasing a buffer may wake up two sets of processes: those waiting for that specific buffer and those waiting for any free buffer. Since only one process can acquire the released buffer, all other awakened processes must go back to sleep. After waking up, every awakened process must re-execute the algorithm from the beginning, because the buffer it needs may already exist. This can lead to excessive process switching.
- Unpredictable cache effect: in the Unix algorithm, every released buffer is up for grabs. If a buffer is acquired by a process that merely needs a free buffer, it will be reassigned, even though other processes may still need the current buffer.
- Possible starvation: the Unix algorithm is based on the "free economy" principle, i.e. every process is given a chance to try, but success is not guaranteed. Therefore, process starvation may occur.
- The algorithm uses sleep/wakeup operations, which are only suitable for uniprocessor systems.
12.3 New I/O Buffer Management Algorithms
The main advantages of semaphores over sleep/wakeup are:
- (1) Counting semaphores can be used to represent the number of available resources, such as the number of free buffers.
- (2) When multiple processes wait for a resource, a V operation on the semaphore releases exactly one waiting process, which does not have to retry because it is guaranteed to have the resource (see the short illustration after this list).
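As a quick, runnable illustration of these two points (the names NBUF, free_buffers and worker are illustrative, not from the text), a counting semaphore initialized to NBUF models the pool of free buffers, and each sem_post resumes exactly one blocked waiter:

#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

#define NBUF 4                       /* illustrative number of free buffers */

sem_t free_buffers;                  /* counting semaphore: free buffers left */

void *worker(void *arg)
{
    sem_wait(&free_buffers);         /* P: take one free buffer, block if none */
    printf("thread %ld acquired a buffer\n", (long)arg);
    sem_post(&free_buffers);         /* V: resumes exactly one waiter, if any  */
    return NULL;
}

int main(void)
{
    pthread_t t[8];
    sem_init(&free_buffers, 0, NBUF);            /* NBUF buffers are free initially */
    for (long i = 0; i < 8; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < 8; i++)
        pthread_join(t[i], NULL);
    sem_destroy(&free_buffers);
    return 0;
}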
A buffer management algorithm designed with P/V on semaphores must satisfy the following requirements (a skeleton sketch follows the list):
- Buffer uniqueness
- No retry loops
- No unnecessary wakeups
- Cache effect
- No deadlock or starvation
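One possible skeleton aimed at these requirements, written in the same pseudocode style as the earlier algorithms (an illustrative outline, not the definitive algorithm): a counting semaphore free (initial value NBUF) counts free buffers, and each buffer carries a lock semaphore (initial value 1), so a blocked process is resumed by exactly one V operation and never retries.

BUFFER *getblk(dev, blk)                  // skeleton only
{
  P(free);                                // wait until a free buffer exists
  if (bp = search dev_list of dev for blk){      // block already in the cache
     if (bp is not busy){
        remove bp from the freelist;
        P(bp->lock);                      // lock bp; no waiting here
        return bp;
     }
     // block is in the cache but the buffer is busy
     V(free);                             // give back the free-buffer count
     P(bp->lock);                         // wait directly in this buffer's queue
     return bp;                           // resumed with the buffer still assigned to (dev, blk)
  }
  // block not in the cache: reassign the first free buffer
  bp = remove the first buffer from the freelist;
  P(bp->lock);                            // lock bp; no waiting here
  if (bp is dirty) write bp out to disk first;
  reassign bp to (dev, blk); mark bp data invalid;
  return bp;
}

brelse(BUFFER *bp)
{
  if (no process is waiting on bp->lock){ enter bp into the freelist; V(free); }
  V(bp->lock);                            // unlock bp; resumes exactly one waiter, if any
}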
2. Practice screenshots
Reference link: p/v algorithm implementation
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
#include <semaphore.h>

#define P sem_wait
#define V sem_post
#define full_apple  &fullA
#define full_orange &fullB
#define empty       &empty_aha

sem_t fullA;        /* an apple is on the plate       */
sem_t fullB;        /* an orange is on the plate      */
sem_t empty_aha;    /* the plate is empty             */
int num = 0;        /* total number of fruits handled */

void *Dad(void *p)
{
    while (num < 50) {
        P(empty);                            /* wait for an empty plate  */
        num++;
        printf("Dad put an apple %d\n", num);
        V(full_apple);                       /* signal: apple available  */
    }
    return NULL;
}

void *Daughter(void *p)
{
    while (num < 50) {
        P(full_apple);                       /* wait for an apple        */
        num++;
        printf("Daughter ate an apple %d\n", num);
        V(empty);                            /* plate is empty again     */
    }
    return NULL;
}

void *Mum(void *p)
{
    while (num < 50) {
        P(empty);                            /* wait for an empty plate  */
        num++;
        printf("Mom put an orange %d\n", num);
        V(full_orange);                      /* signal: orange available */
    }
    return NULL;
}

void *Son(void *p)
{
    while (num < 50) {
        P(full_orange);                      /* wait for an orange       */
        num++;
        printf("Son ate an orange %d\n", num);
        V(empty);                            /* plate is empty again     */
    }
    return NULL;
}

int main(void)
{
    sem_init(full_apple, 0, 0);
    sem_init(full_orange, 0, 0);
    sem_init(empty, 0, 1);                   /* the plate starts empty   */

    pthread_t tid0, tid1, tid2, tid3;
    pthread_create(&tid0, NULL, Dad, NULL);
    pthread_create(&tid1, NULL, Mum, NULL);
    pthread_create(&tid2, NULL, Son, NULL);
    pthread_create(&tid3, NULL, Daughter, NULL);

    getchar();                               /* wait for a key press               */
    pthread_exit(0);                         /* let the worker threads keep running */
    return 0;
}
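A possible way to compile and run the example (assuming the source is saved as pv.c; the file name is arbitrary): gcc -pthread pv.c -o pv, then ./pv. Because Dad and Mum both compete for the empty plate through P(empty), apple and orange lines interleave in the output until num reaches 50.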