Administering virtual disks

Repairing a failed drive

A disk piece may be taken out of service for a number of reasons: controller or bus errors, bad cables or connectors, power failure, surface defects, or other hardware problems. In the case of surface defects, run badtrk(ADM) to remap the bad blocks. In most cases, the failed drive can be brought back into service by restoring parity to it. If the drive must be repaired and the system does not support hot insertion (replacing a drive unit while the array is operating), you must shut down the system to replace the drive.

NOTE: If the RAID configuration does not include a hot spare, use the Virtual Disk Manager to swap the failed drive with another drive to prevent the RAID virtual disk from going offline due to a second I/O error. Replacement drives must be fully configured with mkdev hd before they can be used in virtual disks.

When the system is booted, the array is online and the disk piece associated with the failed drive is out of service. If a standby disk was configured, applications will be running on the spare and will continue without interruption while you configure the replacement drive.

Once the drive is configured, the RAID array can recreate the data on the replacement drive. If the configuration is running on a standby disk (spare), the restore updates the data on the replacement drive and the spare is returned to standby.
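The overall recovery sequence described above can be sketched as a shell session. Note that badtrk, mkdev, and the Virtual Disk Manager are all interactive, so this is an illustration of the order of operations rather than a script to run unattended; the scoadmin invocation shown is one common way to launch a manager by name, and your system's menus may differ.

```shell
# Sketch of the drive-recovery sequence (all steps are interactive).

# 1. If the failure was caused by surface defects, remap the bad
#    blocks on the drive:
badtrk

# 2. After the failed drive has been physically replaced (shut the
#    system down first unless it supports hot insertion), configure
#    the replacement drive so virtual disks can use it:
mkdev hd

# 3. Open the Virtual Disk Manager and restore the data and parity
#    onto the replacement drive; if a standby spare was in use, it
#    returns to standby once the restore completes:
scoadmin "Virtual Disk Manager"
```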


© 2003 Caldera International, Inc. All rights reserved.
SCO OpenServer Release 5.0.7 -- 11 February 2003