Tuning networking resources

Tuning NFS performance

If NFS is configured, your system runs the /etc/rc2.d/S89nfs script each time it goes to multiuser mode. (Note that this file is a link to /etc/nfs.) This script starts several NFS daemons. If configured to run, the following daemons may affect performance:

biod: Runs on clients to handle access to remote filesystems.
nfsd: Runs on servers to handle access by remote clients to local filesystems.
pcnfsd: Runs on servers to handle access by remote clients that use the DOS, OS/2, or Macintosh operating systems and run PCNFS.

If the output from the sar -u command (see ``Identifying disk I/O-bound systems'') shows that an NFS client is spending a significant proportion of time waiting for I/O to complete (%wio is consistently greater than 15%), and this cannot be attributed to local disk activity (a disk is busy if sar -d consistently shows avque greater than 1 and %busy greater than 80%), then the performance of NFS may be causing an I/O bottleneck. See ``Tuning NFS client performance'' for more information.
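
The thresholds above can be combined into a single check. The following is a minimal Python sketch, not part of the system software; the function name and its arguments are hypothetical, and it assumes you have already averaged the sar -u and sar -d figures over a representative period:

```python
def nfs_io_bottleneck_suspected(wio_pct, local_disks):
    """Apply the rule of thumb from the text.

    wio_pct     -- average %wio reported by sar -u
    local_disks -- list of (avque, busy_pct) pairs, one per local disk,
                   taken from sar -d (hypothetical input format)
    """
    # The client spends a significant share of its time waiting for I/O.
    waiting_on_io = wio_pct > 15.0
    # A local disk is busy if avque > 1 and %busy > 80.
    local_disk_busy = any(avque > 1.0 and busy > 80.0
                          for avque, busy in local_disks)
    # Suspect NFS only when local disks cannot explain the waiting.
    return waiting_on_io and not local_disk_busy
```

For example, a client averaging 22% wio with one lightly loaded local disk would be flagged, while the same client with a saturated local disk would not.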

Tuning NFS client performance

Read performance by an NFS client is influenced by several factors:

The biod daemon processes running on the client provide read-ahead of disk blocks into the client machine's buffer cache as described in ``Tuning the number of biod daemons on a client''. This only improves performance if the applications tend to perform sequential reads through files.
The client's buffer cache should be tuned to enhance the read hit rate as described in ``Increasing disk I/O throughput by increasing the buffer cache size''. If the data can be found in the client's buffer cache, this avoids the overhead of going out to the remote filesystem for the data. Note that there is no guarantee that the data is the most current available unless file locking is used.
The number of nfsd daemon processes running on the server to service requests from clients as described in ``Tuning the number of nfsd daemons on a server''.

Write performance by an NFS client is affected if you choose to use non-standard asynchronous writes as described in ``Configuring asynchronous or synchronous writes''. You should tune the server's buffer cache size to increase the write hit rate as described in ``Increasing disk I/O throughput by increasing the buffer cache size''.

There is no benefit to NFS client write performance in tuning the write hit rate on the server if you opt to use default synchronous writes through the server's buffer cache.

NFS server performance can be further improved by using a disk controller with a write-back (rather than a write-through) cache. This runs the risk of losing data unless its integrity is protected using a UPS.

Configuring NFS daemons

If your system does not serve clients running PCNFS, comment out the following lines in /etc/nfs that start pcnfsd:

   [ -x /etc/pcnfsd ] && {
          echo " pcnfsd\c"
          pcnfsd &
   }

When NFS is next started, pcnfsd will not run. This will not affect performance to any great extent apart from removing an unwanted process from the system.

The daemons mountd, portmap, statd, and lockd are needed for the operation of NFS.

The following sections discuss how to tune the number of biod and nfsd daemons running on clients and servers:

Tuning the number of nfsd daemons on a server

Like biods, nfsd daemons provide processes for the scheduler to control -- the bulk of the work dealing with requests from clients is performed inside the kernel. Each nfsd is available to service an incoming request unless it is already occupied. The more nfsds that are running, the faster the incoming requests can be satisfied. There is little context switching overhead with running several nfsds as only one sleeping daemon is woken when a request needs to be served.

If you run more nfsds than necessary, the main overhead is the pages of memory that each process needs for its u-area, data, and stack (program text is shared). Unused nfsd processes will sleep; they will be candidates for being paged or swapped out should the system need to obtain memory.

If too few nfsds are running on the server, or if its other subsystems, such as the hard disk, cannot respond fast enough, the server will not be able to keep up with the demand from clients. You may see this on clients if several requests time out but the server can still service other requests. If you run the command nfsstat -c on the clients, the output provides some information about the server's performance as perceived by the client:

   Client rpc:
   calls    badcalls retrans  badxid   timeout  wait      newcred
   336033   50       413      418      299      0         0
   ...

If badxid is non-zero and roughly equal to retrans, as is the case in this example, the server is not keeping up with the clients' requests.
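
This comparison can be automated. The following Python sketch is an illustration only: the function names are hypothetical, the parser assumes the "Client rpc:" layout shown above, and the 10% tolerance used to decide what "roughly equal" means is an arbitrary assumption that you should adjust for your site:

```python
SAMPLE = """Client rpc:
calls    badcalls retrans  badxid   timeout  wait      newcred
336033   50       413      418      299      0         0
"""

def parse_client_rpc(text):
    """Return the first header/value pair of rows as a dict of counters."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    for i, fields in enumerate(rows):
        # The counter names start with "calls"; the values follow on the next row.
        if fields[0] == "calls" and i + 1 < len(rows):
            return dict(zip(fields, (int(v) for v in rows[i + 1])))
    raise ValueError("no client RPC counters found")

def server_falling_behind(counters, tolerance=0.1):
    """True if badxid is non-zero and within `tolerance` of retrans."""
    badxid = counters["badxid"]
    retrans = counters["retrans"]
    if badxid == 0:
        return False
    # "Roughly equal" is an assumption: within 10% of the larger value.
    return abs(badxid - retrans) <= tolerance * max(badxid, retrans)
```

Calling server_falling_behind(parse_client_rpc(SAMPLE)) on the sample output above reports that the server is not keeping up.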

If you run too few nfsds on a server, the number of messages on the request queue builds up inside the upstream networking protocol stack in the UDP module. In extreme cases, you could consume all memory on the server reserved for use by STREAMS; this would cause applications using STREAMS to fail. Use netstat -m to examine STREAMS usage on the server as described in ``Monitoring STREAMS performance''.

You can also use the command netstat -s -p udp to examine how many system failures due to shortage of STREAMS memory have occurred in the UDP module:

   udp:
   	0 incomplete headers
   	0 bad data length fields
   	0 bad checksums
   	62 bad ports
   	438014 input packets delivered
   	0 systems errors during input
   	417038 packets sent

To change the number of nfsds that are configured to run, edit the following lines in the file /etc/nfs on the server:

   [ -x /etc/nfsd ] && {
           umask 000
           echo " nfsd(xnumber)\c"
           nfsd number &
           umask $oldmask
   }

When NFS is next started on the server, number nfsds will run.

Tuning the number of biod daemons on a client

On an NFS client system, you do not need to run any biod processes for applications to access remote filesystems. The biods handle read-ahead and write-behind on remote filesystems in order to improve performance. When reading, they send requests to read disk blocks ahead of that currently requested. When writing, they take over responsibility for handling writing the block to the remote disk from the application. The biod processes visible using ps(C) are merely convenient handles used by the process scheduler to control NFS client operation -- the majority of the work dealing with the read and write requests is dealt with inside the kernel.

If no biods are running, the application's performance will suffer as a result. When it writes to the remote filesystem, the write system call will block until the data has been written to the disk on the server. When it reads from the remote filesystem, it is unlikely to find the blocks in the buffer cache.

From this, you might deduce that running an extra copy of biod will always enhance NFS performance on the client. For example, if four biods are running, each of these can perform asynchronous writes without application programs having to wait for these to complete. If an application requires access to the remote filesystem while the biods are busy, it performs this itself.

The limit to performance enhancement comes from the fact that each biod's disk requests impose a load on the server. The nfsd daemons, the buffer cache, and disk I/O on the server will all come under more pressure if more biod daemons are run on the clients. Network traffic will also increase, as will the activity of the networking protocol stacks on both the server and its clients.

The default number of biod processes run on a client is four. To see if the number running on your system is adequate, use the ps -ef command and examine the accumulated CPU time used by the biods under the TIME column. Note that the results are only meaningful if your system has been operating under normal conditions for several hours.

If nfsstat -c on the client shows a value of zero for wait (wait for client handle), and if the TIME value for at least one of the biods is substantially less than the others, then there are probably enough daemons running. If several biods show low TIME values, it should be safe to reduce their number to one more than the number showing high TIME values.

If all the TIME values are high, increase the number of biods by two, and continue to monitor the situation.
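
The adjustment rules above can be written down as a small helper. The following Python sketch is illustrative only: the function name is hypothetical, the TIME values are assumed to have been converted to seconds already, and treating a biod as "high" when it has used at least half the CPU time of the busiest biod is an assumption, since the text does not define the boundary. It does not check the wait value from nfsstat -c; do that separately:

```python
def recommend_biod_count(cpu_seconds, high_fraction=0.5):
    """Suggest how many biods to run, given the TIME of each running biod.

    cpu_seconds   -- TIME value of each biod, in seconds
    high_fraction -- assumed boundary: a biod counts as "high" if its TIME
                     is at least this fraction of the busiest biod's TIME
    """
    if not cpu_seconds:
        raise ValueError("no biod TIME values supplied")
    busiest = max(cpu_seconds)
    high = [t for t in cpu_seconds
            if busiest > 0 and t >= high_fraction * busiest]
    if len(high) == len(cpu_seconds):
        # All TIME values are high: add two and keep monitoring.
        return len(cpu_seconds) + 2
    # Otherwise keep one more biod than the number showing high TIME.
    # With exactly one low value this leaves the count unchanged.
    return len(high) + 1
```

For example, four biods with TIME values of 120, 115, 3, and 2 seconds give a suggestion of three, while four biods that are all equally busy give a suggestion of six.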

If you are root, you can reduce the number of biods running by killing them with kill(C). You can also start extra biods running using the command /etc/biod.

To change the number of biods that are configured to run, edit the following lines in the file /etc/nfs on each client:

   [ -x /etc/biod ] && {
           echo " biod(xnumber)\c"
           biod number &
   }

When NFS is next started on the client, number biods will run.

Configuring asynchronous or synchronous writes

One way of improving NFS performance is to prevent applications and biod daemons from performing synchronous write calls to the remote disk. The mechanism used is controlled by the value of the kernel variable nfs_server_async_writes set in the file /etc/conf/pack.d/nfs/space.c on the server. This variable can take three values:

0: Enables slow synchronous writes; write the data blocks through the buffer cache to the disk one at a time.
1: Enables fast synchronous writes (the default); the data blocks are first written to the server's buffer cache. A file sync operation then flushes the data to disk.
2: Enables asynchronous writes; write the data to the server's buffer cache and rely on the server's buffer flushing daemon to write the data to disk. This minimizes the time that processes on the client have to wait for the write operation to complete.

To change the way that blocks are written to disk on the server, edit the space.c file and change the value of nfs_server_async_writes to select the desired behavior. Relink the kernel and reboot the system as described in ``Relinking the kernel''.

WARNING: Asynchronous writes do not conform to the NFS V2 standard. There is a possibility that data may be lost irretrievably.

If you choose to use asynchronous writes, this will improve performance but it increases the risk that data can be lost without reporting an error to the client. The client can receive notification of a successful write while the data is still in the server's buffer cache. If the server's disk goes down because of a power failure or other fault, there is a risk that the data may not have been written to disk. You can protect against this to some extent using:

An uninterruptible power supply (UPS) to maintain power to the server's buffer cache, disk cache, and disk in case of power failure.
Virtual disk management or hardware RAID array to ensure data integrity and to protect against disk failure.

Configuring NFS to use TCP

For NFS filesystems mounted over a high latency, high bandwidth connection such as a wide area network (WAN), there are benefits in using TCP as the transport protocol rather than UDP. With TCP, you can define large send and receive windows to be set on an interface as described in ``Tuning TCP/IP performance''. This allows a large amount of data to be sent before requiring an acknowledgement. On a noisy connection, it is preferable to use TCP because it performs packet error detection and correction; UDP relies on the application to correct errors.

To define NFS to support TCP as a transport protocol on a server, edit /etc/nfs. Use the -u and -t options to nfsd(NADM) to allocate the number of nfsd daemons that support each protocol. For example, to define six nfsds to use UDP and two to use TCP, change the lines that start the nfsd daemons to read:

   [ -x /etc/nfsd ] && {
           umask 000
           echo " nfsd(UDPx6,TCPx2)\c"
           nfsd -u 6 -t 2 &
           umask $oldmask
   }

Specify the mount(ADM) option modifier tcp on each client for the remote NFS filesystem that you want to mount using TCP. This must be added to the options defined for the mntopts keyword in the file /etc/default/filesys (see filesys(F) for more information). The following is an example of such an entry:

   bdev=nfs_svr:/remote \
           mountdir=/remote_mnt fstyp=NFS \
           fsck=no fsckflags= \
           init=yes initcmd="sleep 2" \
           mntopts="bg,soft,tcp" \
           rcmount=yes rcfsck=no mountflags=

If you use automount, you can specify the option modifiers in the auto.master configuration map as described in automount(NADM).

These changes will not take effect until NFS is next started on the server and clients.

Configuring IP to maximize NFS performance

By default, NFS transfers data in 8KB blocks. If the network is Ethernet-based and full frames are being used, six Ethernet frames are required to transmit each block. If the data in each frame is rounded down to 1KB, eight frames are required to transmit the same block. If your network adapter can handle full frames and back-to-back packets, it should already be configured to use them, as described in ``Using ifconfig to change parameters for a network card''.
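
The frame counts quoted above follow from simple division. The following Python sketch reproduces the arithmetic; the function name is hypothetical, and the figure of 1480 data bytes per full frame is an assumption (a 1500-byte Ethernet payload less a 20-byte IP header), ignoring the small UDP and RPC headers:

```python
def frames_per_transfer(transfer_bytes, data_bytes_per_frame):
    """Number of frames needed to carry one NFS transfer (rounded up)."""
    return -(-transfer_bytes // data_bytes_per_frame)  # ceiling division

NFS_BLOCK = 8 * 1024      # default NFS transfer size from the text
FULL_FRAME_DATA = 1480    # assumed data bytes per full Ethernet frame
ROUNDED_FRAME_DATA = 1024 # data rounded down to 1KB per frame

full = frames_per_transfer(NFS_BLOCK, FULL_FRAME_DATA)
rounded = frames_per_transfer(NFS_BLOCK, ROUNDED_FRAME_DATA)
print(full, rounded)
```

Under these assumptions the script prints 6 and 8, matching the figures given in the text.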

Configuring mount options to maximize NFS performance

If the network adapter on an NFS client cannot handle full frames and back-to-back packets, reduce the NFS read and write transfer sizes below the default of 8KB. To do this, specify the mount(ADM) option modifiers rsize and wsize for each mounted filesystem. These must be added to the options defined for the mntopts keyword in the file /etc/default/filesys (see filesys(F) for more information). The following is an example of such an entry reducing the read and write transfer sizes to 1KB (1024 bytes):

   bdev=nfs_svr:/remote \
           mountdir=/remote_mnt fstyp=NFS \
           fsck=no fsckflags= \
           init=yes initcmd="sleep 2" \
           mntopts="bg,soft,rsize=1024,wsize=1024" \
           rcmount=yes rcfsck=no mountflags=

If you use automount, you can specify these option modifiers in the auto.master configuration map as described in automount(NADM).

Performance considerations when using automount

If you use automount(NADM) to mount remote filesystems automatically on demand, you should consider the following performance implications:

automount is single threaded. Mount requests can be delayed by another request that has been made to a slow or inactive NFS server. For more information see ``Troubleshooting NFS''.
Making multiple requests to an automounted filesystem can cause high system overhead. The kernel forces a context switch to automount to look up the pathname for each request. This can happen, for example, in a shell script that repeatedly copies files between a local and a remote filesystem, or between two remote filesystems (see also ``Unnecessary automounts'').
Direct automount maps require two mount table entries for each automounted filesystem; if you use indirect maps, one mount table entry is used by automount plus one for each automounted filesystem. By default, the number of mount table entries is determined dynamically, so there is no need to change the MAX_MOUNT kernel parameter to a value other than 0 (see also ``Direct and indirect mounting'').
Login time can increase significantly for csh(C) users who include many automounted filesystems in their path. The C shell adds command pathnames to its internal hash table when the path variable is set. As a consequence, automount mounts each automounted remote filesystem that is listed.

Performance considerations when using NIS

The Network Information Service (NIS) supplements NFS and provides a distributed database of commonly accessed administration files. A master NIS server holds information files needed by all machines on the network centrally; examples of these files are /etc/passwd, /etc/group, and /etc/services. Whenever this information is updated, it is pushed out to slave servers and copy-only servers to ensure that it is updated globally.

NIS clients, which may be diskless, request information from servers whenever needed. This may be quite a common occurrence. For example, a command such as ls -l requires access to information held in the files /etc/passwd and /etc/group so that it can display the user and group ownership of files. If you are running NIS clients on your network, you should be aware that a proportion of network traffic will be caused by NIS clients requesting such information.