Configuring and Using Redundant Disks with Cisco MCS

Document ID: 9229

Updated: Jan 31, 2006

Introduction

This document answers some of the primary questions about disk redundancy on the Cisco Media Convergence Server (MCS). In addition, the document describes how to get the most out of the redundant disk technology (Redundant Array of Independent Disks [RAID]) that comes with the MCS.

Prerequisites

Requirements

Cisco recommends that you have basic hardware knowledge.

Components Used

The information in this document is based on these software and hardware versions:

  • MCS 7830

  • MCS 7835

Note: The Cisco CallManager OS images have been created for specific fixed hardware configurations on specific platforms. If you need to increase the hard disk space or performance, first take a backup of the system, and then complete these steps:

  1. Upgrade the server platform.

  2. Reinstall Cisco CallManager.

  3. Use the Backup and Restore System (BARS) in order to restore.

You must perform these steps even if you keep the same platform/server and only increase the hard disk space. For more information about Cisco CallManager hardware, refer to Cisco 7800 Series Media Convergence Servers Product Brochures.

The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, make sure that you understand the potential impact of any command.

Conventions

Refer to Cisco Technical Tips Conventions for more information on document conventions.

Drive Mirroring (RAID 1)

The default Cisco CallManager OS image installation configures the MCS with RAID 1. Drive mirroring, which is also called RAID 1, offers the highest performance and the highest fault tolerance of the RAID methods. RAID 1 is the only option that offers fault tolerance protection if only two drives are installed or selected for an array. In order to create fault tolerance, drive mirroring stores two sets of duplicate data on a pair of disk drives. RAID 1 is the most expensive fault tolerance method because 50 percent of the drive capacity is used to store the redundant data. RAID 1 always requires an even number of disks. When an array has more than two drives, the data is striped across the drives and then mirrored.

If a drive fails, the mirror drive provides a backup copy of the files, and there is no interruption of normal system operations. The mirroring feature requires a minimum of two drives. By default, the MCS 7830 and MCS 7835 are delivered with two disks that are configured with RAID 1. Therefore, recovery from a single drive failure is possible.

This diagram shows how the data is divided into chunks and mirrored. Data chunk A on one disk is mirrored to A on another disk, data chunk B is mirrored to B on another disk, and so on. In other words, the data is split into chunks and then copied (mirrored) to the second disk. If the first disk that holds data A fails, you can still read/write from the other disk that contains data A:

disk_redundancy_mcs_9229a.gif
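The mirroring behavior described above can be illustrated with a small Python sketch. This is a toy model only, not how the array controller is implemented: the `MirroredPair` class and its method names are hypothetical, and each disk is modeled as a dictionary that maps a chunk label to its data.

```python
class MirroredPair:
    """Toy model of a two-disk RAID 1 set (hypothetical, for illustration)."""

    def __init__(self):
        # One dict per physical disk: chunk label -> data.
        self.disks = [{}, {}]
        self.failed = [False, False]

    def write(self, chunk, data):
        # Every write goes to each disk that is still healthy.
        for index, disk in enumerate(self.disks):
            if not self.failed[index]:
                disk[chunk] = data

    def read(self, chunk):
        # Read from the first healthy disk; either copy is valid.
        for index, disk in enumerate(self.disks):
            if not self.failed[index]:
                return disk[chunk]
        raise IOError("both disks have failed")

    def fail(self, index):
        # Model a failed or removed drive: its contents are unavailable.
        self.failed[index] = True
        self.disks[index] = {}

    def rebuild(self, index):
        # Model a replacement drive: it is overwritten from the survivor.
        source = 1 - index
        if self.failed[source]:
            raise IOError("no healthy disk to rebuild from")
        self.disks[index] = dict(self.disks[source])
        self.failed[index] = False


pair = MirroredPair()
pair.write("A", "data A")
pair.write("B", "data B")
pair.fail(0)                 # first disk fails
print(pair.read("A"))        # still readable from the mirror
pair.rebuild(0)              # replacement is rebuilt from the surviving disk
```

The sketch also shows why a reinserted disk loses its own contents: the rebuild always copies from the disk that stayed in the array.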

In order to find out how your disks have been configured, perform one of these two procedures:

  1. Use the Array Configuration Utility from the SmartStart and Support Software CD.

    1. Insert the SmartStart and Support Software CD in the CD drive and power up the server.

      A menu displays.

    2. Choose Array Configuration Utility.

    3. After completion, remove the CD and restart the server.

  2. Use the Compaq Array Configuration Tool.

    1. Choose Start > Programs > Compaq System Tools > Compaq Array Configuration Tool.

      This window pops up:

      disk_redundancy_mcs_9229b.gif

      This array has one logical drive of 8673 MB.

    2. Click the Physical disk image.

      You can see that there are two physical disks present, each of 9.1 GB.

      disk_redundancy_mcs_9229c.gif

      Because these disks are mirrored, you only see one logical drive of 8673 MB on the logical tab.

Recognize a Drive Failure

A system operator can recognize a drive failure in one of several ways:

  • The amber LED is illuminated on failed drives in a hot-pluggable tray. However, the illumination only occurs if the storage system is turned on and the Small Computer System Interface (SCSI) cable works.

    Note: The amber LED can be illuminated briefly when you insert a hot-pluggable drive. This behavior is normal.

  • A power-on self test (POST) message lists failed drives whenever you restart the system. But the message displays only if the controller detects one or more "good" drives.

  • Drive Array Advanced Diagnostics (DAAD) lists all failed drives. An online version of DAAD is also available in Microsoft Windows NT and Windows 2000 environments.

  • Compaq Insight Manager can detect failed drives remotely across a network.

A drive failure also shows up on the Array Configuration Utility.

Assume, for example, that you pull disk 1 (ID 1) out of the array or that the disk is broken. The array controller discovers that one of the disks has failed or is missing.

disk_redundancy_mcs_9229d.gif

However, the system is still up and running. Logical drive 1 still operates because RAID 1 can survive a disk failure. But the drive operates with reduced performance.

disk_redundancy_mcs_9229e.gif

disk_redundancy_mcs_9229f.gif

The Physical Configuration View of the array shows that disk 1 (ID 1) has failed.

disk_redundancy_mcs_9229g.gif

A drive failure can also show this error message in the event log:

Event Type:	Error
Event Source:	cpqcissm
Event Category:	None
Event ID:	9
Description:
The device, \Device\Scsi\cpqcissm1, did not respond within the timeout period.
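If you export event log entries as text, a short script can flag this message. The Python sketch below is an assumption-laden illustration: `parse_event` and `is_drive_timeout` are hypothetical helpers, and the script only parses text in the `Field: value` layout shown above. It does not read the Windows event log itself.

```python
# Sample entry copied from the event log text shown in this document.
SAMPLE_EVENT = r"""
Event Type:     Error
Event Source:   cpqcissm
Event Category: None
Event ID:       9
Description:
The device, \Device\Scsi\cpqcissm1, did not respond within the timeout period.
"""


def parse_event(text):
    """Split 'Field: value' lines into a dict; Description spans the rest."""
    fields = {}
    lines = text.strip().splitlines()
    for number, line in enumerate(lines):
        key, separator, value = line.partition(":")
        if not separator:
            continue  # not a 'Field: value' line
        key = key.strip()
        if key == "Description":
            # Everything after the Description line is the message body.
            fields[key] = " ".join(l.strip() for l in lines[number + 1:])
            break
        fields[key] = value.strip()
    return fields


def is_drive_timeout(fields):
    """True if the entry matches the cpqcissm timeout error shown above."""
    return (
        fields.get("Event Source") == "cpqcissm"
        and fields.get("Event ID") == "9"
        and "did not respond within the timeout period"
        in fields.get("Description", "")
    )


event = parse_event(SAMPLE_EVENT)
print(is_drive_timeout(event))
```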

Recover from Drive Failure

The Smart Array 221 Controller that the MCS 7830 uses supports hot-pluggable drives. You can install or remove these drives without the need to turn off the system power.

You can remove and replace failed drives in hot-pluggable trays while the host system and storage system power are both ON. If you insert the drive while the power is ON in fault-tolerant configurations, the recovery of data on the replacement drive automatically begins. A blinking online LED indicates that this data recovery has begun.

In some situations, you remove disk 1 (ID 1) from the array, either because the disk has failed or because it was taken out before an upgrade. Then, you insert the disk back into the array. Or, you may insert a new disk because the previous disk was faulty. In these cases, the inserted disk is automatically overwritten with the information from the disk that remained in the array. In this document example, that disk is disk 0, ID 0.

In general, approximately 15 minutes per GB is necessary for a rebuild. However, the actual rebuild time depends on these factors:

  • The Rebuild Priority setting

  • The amount of I/O activity that occurs during the rebuild operation

  • The number of drives in the array

  • The disk drive speed
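As a rough worked example of the 15-minutes-per-GB guideline, the Python sketch below estimates the rebuild time for one of the 9.1 GB disks mentioned earlier. The function name `estimate_rebuild_minutes` is hypothetical, and the result is only a baseline: it ignores the factors listed above, which can lengthen or shorten the actual rebuild.

```python
def estimate_rebuild_minutes(capacity_gb, minutes_per_gb=15.0):
    """Baseline rebuild estimate from the approximate 15 min/GB guideline."""
    if capacity_gb < 0:
        raise ValueError("capacity_gb must not be negative")
    return capacity_gb * minutes_per_gb


minutes = estimate_rebuild_minutes(9.1)
print(f"{minutes:.1f} minutes (about {minutes / 60:.1f} hours)")
# prints "136.5 minutes (about 2.3 hours)"
```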

Caution: Never insert a disk if you do not want it to be overwritten by the original disk.

Replace a Failed Drive

These steps illustrate the automatic process that replaces a failed drive.

  1. Disk 1, ID 1 is put back into the array and the process to rebuild the logical drive is underway.

    disk_redundancy_mcs_9229h.gif

  2. In the Logical Configuration View, you can see that the array icon is no longer broken and the rebuild occurs.

    disk_redundancy_mcs_9229i.gif

  3. In the Physical Configuration View, you can now see two disks again because disk 1, ID 1 reappears during the rebuild.

    disk_redundancy_mcs_9229j.gif

  4. The array is now rebuilt and the Status appears as OK.

    disk_redundancy_mcs_9229k.gif

Recover from Upgrade Failure on Cisco CallManager

You can also replace hot-pluggable drives when the power is OFF. At the insertion of a hot-pluggable drive, all disk activity on the controller temporarily pauses while the drive spins up. This process usually takes about 20 seconds.

Assume, for example, that you are about to do an upgrade on your Cisco CallManager system. As a precaution, you take disk 1, ID 1 out of the array. You perform the upgrade on disk 0, ID 0. The upgrade fails.

This procedure outlines the steps to take in order to go back to the original configuration (disk 1).

  1. Bring the server down.

  2. Take disk 0, ID 0 out of the server.

  3. Insert disk 1, ID 1 with the good configuration into the array.

  4. Boot the server with this disk.

  5. At the bootup window, press F2 to select the option "Interim Recovery mode will be enabled if configured for fault tolerance".

    Note: Always place the disk in the slot from which you have removed the disk.

These steps describe the process in detail:

  1. After you boot with disk 1, ID 1, the system notices that the original drive (disk 0, ID 0) has failed.

    disk_redundancy_mcs_9229l.gif

    disk_redundancy_mcs_9229m.gif

  2. In the Physical Configuration View, disk 0, ID 0 is no longer present and the array icon is broken.

    After you replace disk 0, ID 0, the array begins to rebuild. If the disk does not start to rebuild, remove the disk from the drive cage and insert it again.

    disk_redundancy_mcs_9229n.gif

  3. In the Logical Configuration View, the array icon is no longer broken.

    disk_redundancy_mcs_9229o.gif

  4. In the Physical Configuration View, the disk with the bad configuration (disk 0, ID 0) is now present again.

    disk_redundancy_mcs_9229p.gif

    The capacity of replacement drives must be at least as large as the capacity of the other drives in the array. The controller immediately fails drives that have insufficient capacity and does not start the Automatic Data Recovery.

    If the Smart Array 221 Controller has a failed drive, replace the drive with a new or known good replacement drive. In some cases, a drive that the controller has previously failed can appear to be operational after the system is power cycled or after removal and reinsertion of a hot-pluggable drive.

    Caution: The use of such "marginal" drives is highly discouraged because it can eventually result in data loss.
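The capacity rule for replacement drives stated above can be expressed as a simple pre-check. This Python sketch is illustrative only: `replacement_is_large_enough` is a hypothetical helper, capacities are assumed to be in MB, and the function only mirrors the rule as written in this document rather than the controller's actual logic.

```python
def replacement_is_large_enough(replacement_mb, other_drives_mb):
    """True if the replacement is at least as large as every other drive."""
    return all(replacement_mb >= capacity for capacity in other_drives_mb)


# Hypothetical capacities, in MB, for a two-disk mirror.
print(replacement_is_large_enough(9100, [9100]))   # same size
print(replacement_is_large_enough(4300, [9100]))   # too small
```

Per the rule above, the controller immediately fails a drive for which this check would return False, and Automatic Data Recovery does not start.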
