Cisco SFS 7008P and SFS 7000 Series Server Switch Redundancy
This chapter describes the Cisco SFS 7008P and SFS 7000 Series Server Switch redundancy and includes the following sections:
• Cisco SFS 7008P Server Switch Redundancy
• Cisco SFS 7000P and SFS 7000D Server Switch Redundancy
For information related to Subnet Manager Redundancy, see "Subnet Manager Redundancy".
Note
For expansions of acronyms and abbreviations used in this publication, see "Acronyms and Abbreviations."
Cisco SFS 7008P Server Switch Redundancy
This section describes redundancies in the Cisco SFS 7008P Server Switch and includes the following topics:
• Software Redundancy
• Power Supply Module Redundancy
• Fan Tray Redundancy
• Management Interface Module Redundancy
• Fabric Controller Redundancy
• Line Interface Module Redundancy
• IB Fabric Redundancy
For more details about the Cisco SFS 7008P Server Switch, see the Cisco SFS 7008P InfiniBand Server Switch Hardware Installation Guide.
Software Redundancy
This section describes redundancy in the Cisco SFS 7008P Server Switch software.
The Cisco SFS 7008P Server Switch supports the hot-standby feature. When the primary controller fails, a standby controller assumes management of the server switch without having to reboot or reset other cards in the chassis.
When two controllers are installed in a Cisco SFS 7008P Server Switch, one controller acts as the primary controller, and the other acts as the standby controller. The primary controller is responsible for managing the chassis. The standby controller waits to take over if the primary controller fails or is rebooted.
To verify which controller is the primary, enter the show card command in the CLI. The oper-code of the primary controller card is normal, and the oper-code of the standby controller card is standby. An asterisk marks the controller card that services the current CLI session. In a console CLI session, therefore, your console port is on the card marked with an asterisk.
The following is sample output from the show card command and verifies the status of each controller card:
=========================================================================
=========================================================================
      admin                oper                 admin   oper    oper
slot  type                 type                 status  status  code
-------------------------------------------------------------------------
11*   controllerFabric12x  controllerFabric12x  up      up      normal
12    controllerFabric12x  controllerFabric12x  up      up      standby
The following is sample output from the show card command from the console of slot 12 of the server switch and verifies the status of each controller card:
=========================================================================
=========================================================================
      admin                oper                 admin   oper    oper
slot  type                 type                 status  status  code
-------------------------------------------------------------------------
11    controllerFabric12x  controllerFabric12x  up      up      normal
12*   controllerFabric12x  controllerFabric12x  up      up      standby
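When the output of show card is captured to a file or script, the primary and standby controllers can be picked out programmatically. The following Python code is a minimal sketch, not a Cisco-provided tool. It assumes the six-column layout shown in the samples above, assumes that every oper-code is a single word, and uses hypothetical names (CardRow, parse_show_card, find_controller) chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

CORE_SLOTS = (11, 12)  # controller cards sit in slots 11 and 12, per the text


@dataclass
class CardRow:
    slot: int
    in_session: bool  # True if the slot number carried an asterisk
    admin_type: str
    oper_type: str
    admin_status: str
    oper_status: str
    oper_code: str


def parse_show_card(text: str) -> List[CardRow]:
    """Parse the data rows of captured `show card` output (assumed layout)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have six fields and begin with a slot number (optionally "*").
        if len(fields) != 6 or not fields[0].rstrip("*").isdigit():
            continue
        rows.append(CardRow(
            slot=int(fields[0].rstrip("*")),
            in_session=fields[0].endswith("*"),
            admin_type=fields[1],
            oper_type=fields[2],
            admin_status=fields[3],
            oper_status=fields[4],
            oper_code=fields[5],
        ))
    return rows


def find_controller(rows: List[CardRow], oper_code: str) -> Optional[int]:
    """Return the core slot whose oper-code matches, or None if there is none."""
    for row in rows:
        if row.slot in CORE_SLOTS and row.oper_code == oper_code:
            return row.slot
    return None


# Sample rows copied from the slot 12 console output above.
SAMPLE = """\
slot type type status status code
11 controllerFabric12x controllerFabric12x up up normal
12* controllerFabric12x controllerFabric12x up up standby
"""

rows = parse_show_card(SAMPLE)
print("primary slot:", find_controller(rows, "normal"))
print("standby slot:", find_controller(rows, "standby"))
print("session slot:", [r.slot for r in rows if r.in_session])
```

The search is restricted to slots 11 and 12 because cards in other slots can also report an oper-code of normal without being controllers.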
When a Cisco SFS 7008P Server Switch is powered on, the fabric card in slot 11 assumes the primary card status, and the fabric card in slot 12 is the standby card. A fabric card is only eligible to be a controller card if the fabric card is installed in either slot 11 or 12 and a corresponding management interface module is available.
If a fabric card is operating in recovery mode (for example, when it is executing the OS Recovery Image software), it is not eligible to be a primary or standby controller. The primary and standby controllers automatically synchronize state and configuration information. When a standby controller assumes management of a chassis, service on the other cards in the chassis (such as the Line Interface Modules, fabric controllers, and management interface modules) is not affected. The other cards are not rebooted, reset, or interrupted. The standby controller is accessible only through the serial console port. A user cannot access the standby controller using Telnet, SSH, SNMP, or HTTP.
The OS CLI is available on the standby controller (through the serial console only). The CLI on a standby controller is limited to read-only operations. A user can enter show commands but cannot enter config commands.
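The controller eligibility rules described above can be summarized as a single predicate. The following Python code is an illustrative sketch only; FabricCard and its field names are hypothetical and do not correspond to any switch API.

```python
from dataclasses import dataclass

CORE_SLOTS = (11, 12)  # only cards in these slots can act as controllers


@dataclass
class FabricCard:
    slot: int
    has_mgmt_module: bool   # the corresponding management interface module is available
    in_recovery_mode: bool  # for example, executing the OS Recovery Image software


def is_controller_eligible(card: FabricCard) -> bool:
    """Return True if the card may become a primary or standby controller."""
    return (
        card.slot in CORE_SLOTS
        and card.has_mgmt_module
        and not card.in_recovery_mode
    )


print(is_controller_eligible(FabricCard(11, True, False)))  # core slot, healthy
print(is_controller_eligible(FabricCard(12, True, True)))   # recovery mode
print(is_controller_eligible(FabricCard(3, True, False)))   # not a core slot
```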
A card is placed in service only if it is running the same software version as the primary controller. When the standby controller card runs a different version than the primary controller card, the standby controller card shows wrong image as its oper-code. In this case, no synchronization occurs between the two cards, and the sys-sync-state of both controller cards stays at not started.
When a hot-standby controller card takes over in the event of a primary controller card failure, the hot-standby controller card behaves according to its sys-sync-state.
When the sys-sync-state is complete, the hot-standby controller card continues management without disturbing services. No additional configuration file is executed, and the node cards are not rebooted.
When the sys-sync-state is not started, the hot-standby controller card executes its startup-config, if one is present, and continues management. The node cards are not rebooted.
When the sys-sync-state is in progress, the primary controller card and the hot-standby controller card are partially synchronized. The hot-standby controller card reboots itself to avoid unpredictable results.
The synchronization begins only after the operStatus of the hot-standby controller card changes to up.
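The three takeover cases above amount to a small decision table keyed on sys-sync-state. The following Python code is a sketch of that table for reference; the SyncState enumeration, the takeover_actions function, and the step strings are hypothetical names used for illustration, not switch software interfaces.

```python
from enum import Enum
from typing import List


class SyncState(Enum):
    NOT_STARTED = "not started"
    IN_PROGRESS = "in progress"
    COMPLETE = "complete"


def takeover_actions(state: SyncState, has_startup_config: bool) -> List[str]:
    """Return the steps a hot-standby controller takes when it takes over."""
    if state is SyncState.COMPLETE:
        # Fully synchronized: no configuration file is executed.
        return ["continue management"]
    if state is SyncState.NOT_STARTED:
        # No synchronization happened: fall back to startup-config if present.
        steps = ["execute startup-config"] if has_startup_config else []
        return steps + ["continue management"]
    if state is SyncState.IN_PROGRESS:
        # Partially synchronized: the standby reboots itself.
        return ["reboot self"]
    raise ValueError(f"unknown sync state: {state!r}")


for state in SyncState:
    print(state.value, "->", takeover_actions(state, has_startup_config=True))
```

In the complete and not started cases, the node cards are not rebooted, so neither branch includes a step that touches them.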
Power Supply Module Redundancy
This section describes the power supply module redundancy in the Cisco SFS 7008P Server Switch.
The Cisco SFS 7008P Server Switch has two AC-DC bulk power supply modules (see Figure 2-1). Each power supply has self-contained fans for cooling. Only one power supply, in either of the two slots, is required to power the system. The second power supply provides redundancy. Each power supply has its own AC inlet and runs on an independent AC circuit. While both power supplies are operating, they share the current in an active-active mode. If one power supply fails, the other automatically takes over the full load of the server switch. No user intervention is required during a failover.
Figure 2-1 Cisco SFS 7008P Server Switch Front View
If a power supply module fails, it must remain within the chassis until a replacement is available. If it is removed, a blanking panel must be installed instead. During replacement, when the Cisco SFS 7008P Server Switch is in operation, the power supply module bay can remain empty for no more than three minutes.
Fan Tray Redundancy
This section describes fan tray redundancy in the Cisco SFS 7008P Server Switch.
The Cisco SFS 7008P Server Switch has two fan trays. Only one fan tray in either of the two slots is required to cool the system. The fan trays are hot swappable. The fan trays operate in an active-active mode.
If a fan tray fails, it must remain within the chassis until a replacement is available. If it is removed, a blanking panel must be installed instead. During replacement, when the Cisco SFS 7008P Server Switch is in operation, the fan tray bay can remain empty for no more than three minutes.
Management Interface Module Redundancy
This section describes management interface module redundancy in the Cisco SFS 7008P Server Switch.
The Cisco SFS 7008P Server Switch supports redundant, hot-swappable management interface modules. Each management interface module is paired to one of the fabric controller core modules. There are two core slots in the Cisco SFS 7008P Server Switch and two management interface modules (see Figure 2-1 and Figure 2-2). The controller in each of the core slots uses a management interface module to communicate with the outside network. Each management interface module provides its own serial and Ethernet port. Both sets of ports must be connected. The failover of the management interface module is paired with the failover of the fabric controller cards (see the "Fabric Controller Redundancy" section).
If a management interface module fails, it must remain within the chassis until a replacement is available. If it is removed, a blanking panel must be installed instead. During replacement, when the Cisco SFS 7008P Server Switch is in operation, the management interface module bay can remain empty for no more than three minutes.
Figure 2-2 Cisco SFS 7008P Rear View - Management Interface Modules and Line Interface Modules
Fabric Controller Redundancy
This section describes the fabric controller redundancy in the Cisco SFS 7008P Server Switch.
The behavior and responsibility of each fabric controller is determined by the type of slot into which it is inserted. A fabric controller can be installed in either a node slot or a core slot. When the software on a fabric controller module detects that a module is inserted in a core slot, it arbitrates for system mastership and runs. When you power on the Cisco SFS 7008P Server Switch, by default, the card in slot 11 becomes the active card and the card in slot 12 becomes the standby card (see Figure 2-1). The master and standby controller cards automatically synchronize state and configuration information. Thus, the switch does not have to be rebooted and none of the cards in the chassis require resetting if a switch failover occurs.
Fabric controller cards in the core slots are paired to the management interface modules of the Cisco SFS 7008P Server Switch (see the "Management Interface Module Redundancy" section). For redundancy, the pairing of both fabric controller cards and the management interface modules should be installed and operational. The two core cards operate in an active-active mode and are required for 100% throughput. If one core card fails, 50% of bandwidth is available to the system. If the failing card is the active master, the standby master assumes control of the chassis.
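The effect of a single core card failure described above can be sketched as follows. This Python code is only an illustration of the two outcomes stated in the text (50% of bandwidth remains, and the standby master assumes control if the active master failed); the function name and the returned dictionary keys are hypothetical.

```python
CORE_SLOTS = (11, 12)


def core_card_failure(master_slot: int, failed_slot: int) -> dict:
    """Describe the chassis state after one of the two core cards fails."""
    if master_slot not in CORE_SLOTS or failed_slot not in CORE_SLOTS:
        raise ValueError("slots must be core slots 11 or 12")
    surviving = 12 if failed_slot == 11 else 11
    return {
        # The surviving core card is the master in either case.
        "master_slot": surviving,
        # A failover happens only if the failed card was the active master.
        "failover": failed_slot == master_slot,
        # One of two core cards remains, so half the bandwidth is available.
        "bandwidth_percent": 50,
    }


print(core_card_failure(master_slot=11, failed_slot=11))
print(core_card_failure(master_slot=11, failed_slot=12))
```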
Removing the fabric controller that currently acts as master from its core slot causes a failover to the standby fabric controller and management interface module pair. Before removing a fabric controller from one of the core slots, make sure that the redundant core fabric controller is functional.
The fabric controllers in the node slots act as slaves and do not operate in an active-standby configuration. Each node card is paired with two Line Interface Modules (see the "Line Interface Module Redundancy" section). If a node card fails, only ports connected to it and its corresponding Line Interface Modules are affected.
If a core or node card fails, it can be left within the chassis until a replacement is available. If it is removed, a blanking panel must be installed instead. During replacement, when the Cisco SFS 7008P Server Switch is in operation, the card bay can be left empty for no more than three minutes.
Line Interface Module Redundancy
This section describes the Line Interface Module redundancy in the Cisco SFS 7008P Server Switch.
Line Interface Modules support redundant connection from the HCAs (see Figure 2-2). Line Interface Modules are hot-swappable and redundant components. Each Line Interface Module is paired with a fabric controller node card (see the "Fabric Controller Redundancy" section).
If a Line Interface Module fails, it can be left within the chassis until a replacement is available. If it is removed, a blanking panel must be installed instead. During replacement, when the Cisco SFS 7008P Server Switch is in operation, the Line Interface Module bay can be left empty for no more than three minutes.
Figure 2-3 shows core card and Line Interface Module redundancy within a Cisco SFS 7008P Server Switch.
Figure 2-3 Redundancy within a Cisco SFS 7008P Server Switch
IB Fabric Redundancy
This section describes the IB fabric redundancy using Cisco SFS 7008P Server Switches.
For redundancy at the fabric level, IB HCAs can be dual-connected to a redundant pair of Cisco SFS 7008P Server Switches (see Figure 2-4). The IB links are active. Traffic over redundant IB links varies and is based on upper-level protocols and applications.
Figure 2-4 Redundancy with Dual Cisco SFS 7008P Server Switches
Cisco SFS 7000P and SFS 7000D Server Switch Redundancy
This section describes the Cisco SFS 7000P and SFS 7000D Server Switch redundancy and includes the following topics:
• Power Supply Redundancy
• Port Redundancy
• IB Fabric Redundancy
Redundancy in the Cisco SFS 7000P and SFS 7000D Server Switches is supported at the hardware, port, and fabric levels. For more details about the Cisco SFS 7000P and SFS 7000D Server Switches, see the Cisco SFS 7000P and SFS 7000D InfiniBand Server Switches Hardware Installation Guide.
Power Supply Redundancy
This section describes power supply redundancy in the Cisco SFS 7000P and SFS 7000D Server Switches.
The power supply module for the Cisco SFS 7000P and SFS 7000D Server Switches is an integrated power supply and fan unit. A server switch can have up to two power supplies installed (see Figure 2-5). The switch requires only one power supply to function. The second power supply provides redundancy. The power supply modules are hot swappable. Replacing either power supply module does not disrupt the operation of the device and can be completed without removing the device from the rack or disconnecting any cables.
Each power supply has its own AC inlet and runs on an independent AC circuit. The server switch automatically operates the power supplies in active-active or active-standby mode. If one power supply fails, the second power supply automatically takes over the full load of the server switch. No user intervention is required during a failover.
If a power supply module fails, it can be left within the chassis until a replacement is available. If it is removed, a blanking panel must be installed instead. During replacement, when the Cisco SFS 7000P or the SFS 7000D Server Switch is in operation, the power supply module bay can be left empty for no more than three minutes.
Figure 2-5 Cisco SFS 7000P and SFS 7000D Server Switches
Port Redundancy
This section describes port redundancy using the Cisco SFS 7000P and SFS 7000D Server Switches.
The Cisco SFS 7000P and SFS 7000D Server Switches each have 24 IB ports. At the port level, if any single IB port fails, service on the other ports is not interrupted.
In addition, to achieve port redundancy, users can employ two server switches. If the IB ports on one server switch fail, the second server switch automatically takes over the load of the first server switch.
IB Fabric Redundancy
This section describes fabric redundancy in the Cisco SFS 7000P and SFS 7000D Server Switches.
For redundancy at the fabric level, IB HCAs can be dual-connected to a redundant pair of Cisco SFS 7000 Series Server Switches. The Cisco SFS 7000 Series Server Switch redundant configuration is active-active, and no hardware configuration is required. The IB links are active-active. However, the way applications and upper-level protocols use redundant IB links varies and can be active-active or active-standby, depending on the application.
In this typical configuration, a dual-port HCA is connected to a pair of Cisco SFS 7000 Series Server Switches (see Figure 2-6). This configuration provides server redundancy.
Figure 2-6 One Dual-Port HCA Connected to a Redundant Pair of Cisco SFS 7000 Series Server Switches
For greater redundancy, connect two single-port HCAs to a redundant pair of server switches. Such a configuration provides host and IB fabric redundancy.
Note
Figure 2-6 shows the Cisco SFS 7000D Server Switches in a redundant configuration. The Cisco SFS 7000P Server Switches can also be connected in similar redundant configurations.