Tuning I/O resources

How the buffer cache works

On a typical system approximately 85% of disk I/O can be avoided by using the buffer cache, though this depends on the mix of jobs running. The buffer cache is created in an area of kernel memory and is never swapped out. Although the buffer cache can be regarded as a memory resource, it is primarily an I/O resource due to its use in mediating data transfer. When a user process issues a read request, the operating system searches the buffer cache for the requested data. If the data is in the buffer cache, the request is satisfied without accessing the physical device. It is quite likely that data to be read is already in the buffer cache because the kernel copies an entire block containing the data from disk into memory. This allows any subsequent data falling within that block to be read more quickly from the cache in memory, rather than having to re-access the disk. The kernel also performs read-ahead of blocks on the assumption that most files are accessed from beginning to end.

For filesystems other than DTFS, the data area of each buffer is 1KB, which is the same size as a filesystem logical block and twice the typical physical disk block size of 512 bytes. DTFS filesystems use buffers with data areas in multiples of 512 bytes, from 512 bytes to 4KB.


When data is written to the disk, the kernel first checks the buffer cache to see if the block containing the data address to be written is already in memory. If it is, the block found in the buffer cache is updated; if not, the block must first be read into the buffer cache so that its existing contents can be overwritten in memory.

When the kernel writes data to a buffer, it marks it as delayed-write. This means that the buffer must be written to disk before the buffer can be re-used. Writing data to the buffer cache allows multiple updates to occur in memory rather than having to access the disk each time. Once a buffer has aged in memory for a set interval it is flushed to disk by the buffer flushing daemon, bdflush.

The kernel parameter NAUTOUP specifies how long a delayed-write buffer can remain in the buffer cache before its contents are written to disk. The default value of NAUTOUP is 10 seconds; the permitted range is 0 to 60. A buffer is not written precisely at NAUTOUP seconds, but at the next run of the buffer flushing daemon after this interval has elapsed.

Although the system buffer cache significantly improves overall system throughput, in the event of a system power failure or a kernel panic, any data remaining in the buffer cache that has not yet been written to disk may be lost. This is because physical memory is volatile: data scheduled to be written to a physical device is erased from memory as a consequence of the crash.

The default flushing interval of the buffer flushing daemon, bdflush, is 30 seconds. The kernel parameter BDFLUSHR controls the flushing interval. You can configure BDFLUSHR to take a value in the range 1 to 300 seconds.

If your system crashes, you will lose NAUTOUP + (BDFLUSHR/2) seconds of data on average. With the default values of these parameters, this corresponds to 25 seconds of data. Decreasing BDFLUSHR will increase data integrity but increase system overhead. The converse is true if you increase the interval.

Apart from adjusting the aging and flushing intervals, you can also control the size of the buffer cache. The kernel parameter NBUF determines the amount of memory in kilobytes that is available for buffers. If you are using the DTFS filesystem, the value of NBUF does not correspond to the actual number of buffers in use. The default value of NBUF is 0; this causes the kernel to allocate approximately 10% of available physical memory to buffers.

The size of the buffer cache in kilobytes is displayed when the system starts up and in the file /usr/adm/messages. Look for a line of the form:

   kernel: Hz = 100, i/o bufs = numberk

If there are any buffers in memory above the first 16MB, the line may take the form:

   kernel: Hz = 100, i/o bufs = numberk  (high bufs = numberk)

The amount of memory reserved automatically for buffers may not be optimal for the mix of applications that a system runs. For example, you may need to increase the buffer cache size on a networked file server to make disk I/O more efficient and increase throughput. Conversely, you may be able to reduce the buffer cache size on the file server's clients, since the applications they run tend to access a small number of files. Doing so is usually beneficial because it increases the amount of physical memory available for user processes.

How you can investigate the effectiveness of the buffer cache is the subject of ``Viewing buffer cache activity''.

For more information on tuning the size of the buffer cache, see ``Configuring kernel parameters'', which tells you how to use the configure(ADM) utility to change the values of kernel parameters such as NAUTOUP, BDFLUSHR, and NBUF.


© 2003 Caldera International, Inc. All rights reserved.
SCO OpenServer Release 5.0.7 -- 11 February 2003