Troubleshooting system-level problems

Traps, faults, and exceptions

Traps, faults, and exceptions are special conditions detected by the CPU while it is executing an instruction on behalf of a user process (running in either user mode or system mode), a system process running in system mode (for example, a system daemon such as sched, vhand, or bdflush), or an interrupt routine. These special conditions cause the CPU to switch into system mode and execute a trap handler inside the kernel.

If the trap happens in user mode (in other words, if the trap is caused by a user process), the kernel usually sends a signal to the process. For example, if a process executes an instruction that causes a divide-by-zero error, the CPU raises a divide-by-zero exception, and the trap handler ultimately sends a SIGFPE (floating-point exception) signal to the process. (See the signal(S) manual page for a complete list of supported signals.) Some user exceptions are legal and do not cause a signal. For example, a process may dereference a valid pointer to data in its data segment that is currently paged out of main memory. The CPU raises a page fault, and the trap handler loads the page of data from the swap area into memory and restarts the instruction that caused the fault. In this case, the trap handler does not send a signal to the process.

However, if a process dereferences an invalid pointer (the pointer may be corrupt or uninitialized), the trap handler will determine that there is no corresponding page to load from either the filesystem or the swap area, and will send a SIGSEGV (segmentation violation) signal to the process.

Except for a few special circumstances, the kernel is not allowed to cause traps, faults, and exceptions "by itself" (in other words, when it is executing system calls, system processes, and interrupt routines). If the kernel does cause a fault, the situation is considered to be so serious that the system cannot continue to run. The trap handler calls a special panic( ) routine inside the kernel, which stops the system.

When the system panics because of an addressing violation, the current contents of the CPU registers are displayed on the console, the contents of machine memory are written to dumpdev (usually the swap device), and the system makes an internal call to the kernel haltsys( ) function.

© 2003 Caldera International, Inc. All rights reserved.
SCO OpenServer Release 5.0.7 -- 11 February 2003