Cisco SFS InfiniBand Redundancy Configuration Guide, Release 2.10
Subnet Manager Redundancy


Subnet Manager Redundancy


This chapter describes Subnet Manager redundancy and includes the following sections:

Embedded Subnet Manager

High-Performance Subnet Manager

Setting up Master and Standby Subnet Managers

Setting Up Database Synchronization


Note For expansions of acronyms and abbreviations used in this publication, see "Acronyms and Abbreviations."


Cisco Subnet Managers support redundancy as defined in the IB specification. A subnet has one master Subnet Manager and one or more standby Subnet Managers. If the master Subnet Manager fails, the next-in-line standby Subnet Manager assumes control of the IB fabric.

Cisco provides two types of Subnet Managers:

Embedded Subnet Manager

High-Performance Subnet Manager

The Embedded Subnet Manager runs on a switch chassis, and the High-Performance Subnet Manager runs on hosts. The two types support IB-standard master/standby failover with each other.

Cisco Subnet Managers use a proprietary database synchronization protocol that synchronizes important data between the master Subnet Manager and one or more standby Subnet Managers. This provides high-availability redundancy: a database-synchronized standby Subnet Manager can assume control as master without disrupting the IB fabric.


Note For redundancy, we recommend that you run either two Embedded Subnet Managers or two High-Performance Subnet Managers together. Database synchronization is not supported between the Embedded Subnet Manager and the High-Performance Subnet Manager, so pairing one of each could disrupt data traffic if a failure occurs. For more information about database synchronization, see the "Setting Up Database Synchronization" section.
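The pairing guidance in this note can be expressed as a simple planning check. This is a minimal sketch, not a Cisco tool: the `PlannedSm` record, the `check_redundancy_plan` function, and the type labels `"embedded"` and `"high-performance"` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

SM_TYPES = {"embedded", "high-performance"}  # hypothetical type labels


@dataclass
class PlannedSm:
    """One Subnet Manager in a redundancy plan (illustrative record)."""
    name: str
    sm_type: str


def check_redundancy_plan(sms: list[PlannedSm]) -> list[str]:
    """Return warnings for a planned set of master/standby Subnet Managers."""
    warnings = []
    types = {sm.sm_type for sm in sms}
    unknown = types - SM_TYPES
    if unknown:
        warnings.append(f"unknown Subnet Manager type(s): {sorted(unknown)}")
    if len(sms) < 2:
        warnings.append("only one Subnet Manager: no standby for failover")
    if SM_TYPES <= types:
        # Database synchronization is not supported between the two types,
        # so a failover from one type to the other may disrupt data traffic.
        warnings.append("mixed Embedded and High-Performance Subnet Managers")
    return warnings


# Example: one Embedded and one High-Performance Subnet Manager.
plan = [PlannedSm("sfs-3504-a", "embedded"), PlannedSm("host-1", "high-performance")]
print(check_redundancy_plan(plan))  # ['mixed Embedded and High-Performance Subnet Managers']
```

A plan with two Subnet Managers of the same type returns an empty list.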


Figure 5-1 shows the IB fabric with the Embedded Subnet Manager, and Figure 5-2 shows the IB fabric with the High-Performance Subnet Manager. Both figures show typical Subnet Manager configurations; you can set up other configurations as required.

Figure 5-1 InfiniBand Fabric and Embedded Subnet Manager

Figure 5-2 InfiniBand Fabric and High-Performance Subnet Manager

For more information about High-Performance Subnet Managers, see the Cisco High-Performance Subnet Manager for InfiniBand Server Switches.

Embedded Subnet Manager

This section describes the Embedded Subnet Manager.

The Embedded Subnet Manager runs on the Cisco SFS 3504, the Cisco SFS 3000 series, the Cisco SFS 7000 series, and the Cisco SFS 7008P Server Switches. When deployed in pairs, Embedded Subnet Managers eliminate single points of failure at the system level. Where it is available, we recommend the Embedded Subnet Manager for subnets of up to 1,000 nodes. The Embedded Subnet Manager detects changes in large subnets quickly.


Note The Cisco SFS 3012R Server Switch and the Cisco SFS 7008P Server Switch run a Subnet Manager on each controller card, so each of these chassis has two Subnet Managers.


High-Performance Subnet Manager

This section describes the High-Performance Subnet Manager.

The High-Performance Subnet Manager is a standalone software package that centrally manages and controls an IB subnet. It provides high availability in an N+1 configuration, as well as high performance and scalability for large clusters.

For fabrics containing more than 1,000 nodes, we recommend the High-Performance Subnet Manager. The Embedded Subnet Manager runs on the embedded processors of the Cisco SFS 3504, Cisco SFS 3000 series, Cisco SFS 7000 series, and Cisco SFS 7008P Server Switches; because the High-Performance Subnet Manager has more memory and faster CPUs available, it scales more effectively in larger fabrics.

The Cisco High-Performance Subnet Manager complements the Embedded Subnet Manager by offloading the Subnet Manager function from the embedded processors on the IB switches.

The High-Performance Subnet Manager is also required in IB fabrics whose chassis do not include an Embedded Subnet Manager.
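The sizing guidance in this section and the preceding one reduces to a small decision rule. The helper below is a hypothetical sketch that encodes only that guidance (up to 1,000 nodes for the Embedded Subnet Manager where available, the High-Performance Subnet Manager otherwise); `recommend_sm_type` and `ESM_MAX_NODES` are illustrative names, and real deployments may weigh other factors.

```python
ESM_MAX_NODES = 1000  # guideline from this chapter: Embedded SM for up to 1,000 nodes


def recommend_sm_type(node_count: int, chassis_have_embedded_sm: bool) -> str:
    """Suggest a Subnet Manager type from fabric size and chassis capability.

    Hypothetical helper encoding this chapter's guidance; it does not query
    any switch and ignores other site-specific factors.
    """
    if node_count < 0:
        raise ValueError("node_count must be non-negative")
    if not chassis_have_embedded_sm:
        # No Embedded Subnet Manager available: a host-based SM is required.
        return "high-performance"
    if node_count > ESM_MAX_NODES:
        # Larger fabrics benefit from host memory and CPU.
        return "high-performance"
    return "embedded"


print(recommend_sm_type(800, True))    # embedded
print(recommend_sm_type(2400, True))   # high-performance
print(recommend_sm_type(64, False))    # high-performance
```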


Note The Subnet Manager keeps routing optimal. When a switch fails, the Subnet Manager is notified through an in-band trap mechanism and responds by rerunning the routing calculation for the subnet and reprogramming the routes.


Setting up Master and Standby Subnet Managers

This section describes how to set up the master and standby Subnet Managers and includes the following topics:

Setting up Master and Standby Subnet Managers Using Embedded Subnet Managers

Setting up Master and Standby Subnet Managers with High-Performance Subnet Managers


Note In the following sections, values are provided as examples only. We recommend that you keep the default values.


Setting up Master and Standby Subnet Managers Using Embedded Subnet Managers

This section describes how to set up master and standby Subnet Managers using the Embedded Subnet Manager.

Typically, the Embedded Subnet Manager is enabled on two chassis in the IB fabric and should be disabled on all other chassis. The priority of each Subnet Manager determines which one becomes the master. We recommend that you keep the priority of all Subnet Managers in a network equal, so that a Subnet Manager newly added to the network does not take over mastership.

To configure and verify the priority between Subnet Managers using CLI commands, perform the following steps:


Step 1 Configure the priority between Subnet Managers to determine which is the master Subnet Manager and which is the standby Subnet Manager.


Note Priority values range from 0 to 15, and the Subnet Manager with the highest priority becomes the master. The default priority of the Embedded Subnet Manager is 10. The High-Performance Subnet Manager is assigned a higher default priority (11) so that it takes precedence over the Embedded Subnet Manager when both are present in the IB fabric.
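The priority rules in this note can be modeled as a small election. This sketch rests on stated assumptions: the tie-break on the lowest GUID follows the IB specification's election rule as we understand it, the "strictly higher priority preempts" rule mirrors this chapter's recommendation to keep priorities equal, and `SmCandidate`, `elect_master`, and `new_sm_takes_over` are hypothetical names. The GUIDs come from the sample output in this chapter.

```python
from dataclasses import dataclass

PRIORITY_RANGE = range(0, 16)  # valid priorities 0-15
DEFAULT_PRIORITY = {"embedded": 10, "high-performance": 11}


@dataclass(frozen=True)
class SmCandidate:
    guid: int      # port GUID as an integer
    priority: int  # higher value wins

    def __post_init__(self) -> None:
        if self.priority not in PRIORITY_RANGE:
            raise ValueError(f"priority {self.priority} is outside 0-15")


def elect_master(candidates: list[SmCandidate]) -> SmCandidate:
    """Highest priority wins; equal priorities fall back to the lowest GUID (assumed)."""
    if not candidates:
        raise ValueError("no Subnet Managers to elect from")
    return min(candidates, key=lambda sm: (-sm.priority, sm.guid))


def new_sm_takes_over(master: SmCandidate, new_sm: SmCandidate) -> bool:
    """Assumed rule: an added SM preempts only with a strictly higher priority."""
    return new_sm.priority > master.priority


esm = SmCandidate(guid=0x0005AD0000010C19, priority=DEFAULT_PRIORITY["embedded"])
hpsm = SmCandidate(guid=0x0005AD0000011D20, priority=DEFAULT_PRIORITY["high-performance"])
master = elect_master([esm, hpsm])  # hpsm: default priority 11 beats 10
# If the master fails, the next-in-line standby is elected from the rest.
after_failover = elect_master([sm for sm in (esm, hpsm) if sm != master])
```

With equal priorities, `new_sm_takes_over` returns `False`, which is the behavior the recommendation above aims for.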


The following example shows how to set the Embedded Subnet Manager priority on one of two Cisco SFS 3504 Server Switches:

SFS-3504# config
SFS-3504(config)# ib sm subnet-prefix fe:80:00:00:00:00:00:00 priority 12
SFS-3504(config)# exit

Step 2 Configure the master poll interval (the master-poll-intval keyword) to set how often, in seconds, the master Subnet Manager is polled to check whether it is still active.

The following example shows how to set the master poll interval to 5 seconds:

SFS-3504(config)# ib sm subnet-prefix fe:80:00:00:00:00:00:00 master-poll-intval 5


Note Decrease the master-poll-intval value to speed up standby Subnet Manager takeover, or increase it to slow takeover.


Step 3 Configure master-poll-retries to set how many times the master Subnet Manager is polled without a response before a standby Subnet Manager takes over.

The following example shows how to configure the master-poll-retries to 0:

SFS-3504(config)# ib sm subnet-prefix fe:80:00:00:00:00:00:00 master-poll-retries 0


Note Decrease the master-poll-retries value to speed up standby Subnet Manager takeover, or increase it to slow takeover.
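The two notes above say that lowering either value speeds up takeover. A rough way to see the combined effect is the back-of-the-envelope model below. It is an assumption, not a documented formula: it treats the standby as giving up after one initial poll plus the configured retries, each one interval apart, and ignores response timeouts and sweeps. `estimated_takeover_delay` is a hypothetical name.

```python
def estimated_takeover_delay(poll_interval_s: float, poll_retries: int) -> float:
    """Rough time for a standby to conclude the master is gone.

    Assumed model: one initial poll plus poll_retries retries, each
    poll_interval_s seconds apart, must all go unanswered. Actual timing
    also depends on response timeouts and fabric sweeps.
    """
    if poll_interval_s <= 0 or poll_retries < 0:
        raise ValueError("interval must be positive and retries non-negative")
    return float(poll_interval_s * (poll_retries + 1))


print(estimated_takeover_delay(5, 0))  # 5.0, the example values in this section
print(estimated_takeover_delay(5, 2))  # 15.0
```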


Step 4 Verify the configuration.

The following example shows how to verify the Subnet Manager configuration, the master-poll-interval value, and the master-poll-retries value:

SFS-3504# show ib sm configuration subnet-prefix all
================================================================================
                           Subnet Manager Information
================================================================================
            subnet-prefix : fe:80:00:00:00:00:00:00
                     guid : 00:05:ad:00:00:01:0c:19
                 priority : 12
                   sm-key : 00:00:00:00:00:00:00:00
              oper-status : master
                act-count : 12938
      sweep-interval(sec) : 10
   response-timeout(msec) : 200
  master-poll-intval(sec) : 5
      master-poll-retries : 0
           max-active-sms : 0
         LID-mask-control : 0
         switch-life-time : 18
     switch-hoq-life-time : 18
       host-hoq-life-time : 18
                 max-hops : 64
              mad-retries : 5
        node-timeout(sec) : 10
     wait-report-response : false
       sa-mad-queue-depth : 256
          qos-admin-state : disabled
        max-operational-vl : auto-link
      min-vl-cap-detected : vl0-vl7

The output verifies that this Subnet Manager is operating as master, the master poll interval is 5 seconds, and master-poll-retries is 0.
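To check this output from a script rather than by eye, you could capture the show output (for example, from a saved terminal session) and parse its `key : value` lines. This is a minimal sketch under stated assumptions: `parse_show_fields`, `check_master`, and the alias table are hypothetical, and the parser assumes each field sits on one line with ` : ` as the separator, as in the samples in this chapter. The alias maps the Embedded Subnet Manager field `master-poll-intval(sec)` to the High-Performance Subnet Manager field `master-poll-interval(sec)`, so the same check can be applied to `show config` output.

```python
import re

# Map field names that differ between the two CLIs to one canonical name.
FIELD_ALIASES = {"master-poll-intval(sec)": "master-poll-interval(sec)"}

# "key : value" lines; banners (=====), titles, and prompts do not match.
FIELD_LINE = re.compile(r"^\s*([^=\s][^:]*?)\s+:\s+(.*?)\s*$")


def parse_show_fields(text: str) -> dict[str, str]:
    """Parse the 'key : value' lines of a show command into a dict."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        match = FIELD_LINE.match(line)
        if match:
            key, value = match.groups()
            fields[FIELD_ALIASES.get(key, key)] = value
    return fields


def check_master(fields: dict[str, str], interval: str, retries: str) -> list[str]:
    """List mismatches against the expected master status and poll settings."""
    expected = {
        "oper-status": "master",
        "master-poll-interval(sec)": interval,
        "master-poll-retries": retries,
    }
    return [f"{k}: expected {v}, got {fields.get(k)}"
            for k, v in expected.items() if fields.get(k) != v]


sample = """\
              oper-status : master
  master-poll-intval(sec) : 5
      master-poll-retries : 0
"""
print(check_master(parse_show_fields(sample), interval="5", retries="0"))  # []
```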

Step 5 Verify that the Subnet Managers are present in the IB fabric.

The following is sample output from the show ib sm sm-info subnet-prefix command, which verifies that the Subnet Managers are present in the IB fabric:

SFS-3504# show ib sm sm-info subnet-prefix fe:80:00:00:00:00:00:00

================================================================================
                      Discovered Subnet Managers in Fabric
================================================================================
            subnet-prefix : fe:80:00:00:00:00:00:00
                port-guid : 00:05:ad:00:00:01:1d:20
                 priority : 0
                 sm-state : standby
                   sm-key : 00:00:00:00:00:00:00:00
                act-count : 219

The sm-state field shows that a standby Subnet Manager has been discovered in the fabric.


Setting up Master and Standby Subnet Managers with High-Performance Subnet Managers

This section describes how to set up master and standby Subnet Managers using High-Performance Subnet Managers.

Typically, the High-Performance Subnet Manager runs on two hosts in the IB fabric, and Subnet Managers should be disabled on all other hosts and chassis. The priority of each High-Performance Subnet Manager determines which one becomes the master. We recommend that you keep the priority of all Subnet Managers in a network equal, so that a Subnet Manager newly added to the network does not take over mastership.

To configure and verify the priority between Subnet Managers using CLI commands, perform the following steps:


Step 1 Configure the priority between the High-Performance Subnet Managers to determine which is the master Subnet Manager and which is the standby Subnet Manager.


Note Priority values range from 0 to 15, and the Subnet Manager with the highest priority becomes the master. The default priority of the High-Performance Subnet Manager is 11, higher than that of the Embedded Subnet Manager, so that the High-Performance Subnet Manager takes precedence when both are present in the IB fabric.


The following example shows how to set the High-Performance Subnet Manager priority on one of two hosts:

ib_sm> config priority 12

Step 2 Configure master-poll-interval to set how often, in seconds, the master Subnet Manager is polled to check whether it is still active.

The following example shows how to set the master-poll-interval to 5 seconds:

ib_sm> config master-poll-interval 5

Step 3 Configure master-poll-retries to set how many times the master Subnet Manager is polled without a response before a standby Subnet Manager takes over.

The following example shows how to configure the master-poll-retries to 0:

ib_sm> config master-poll-retries 0

Step 4 Verify the configuration.

The following is sample output from the show config command, which verifies the Subnet Manager configuration, including the master-poll-interval and master-poll-retries values:

ib_sm> show config

================================================================================
                          Subnet Manager Configuration
================================================================================
                 subnet-prefix : fe:80:00:00:00:00:00:00
                          guid : 00:05:ad:00:00:01:0c:19
                      priority : 12
                        sm-key : 00:00:00:00:00:00:00:00
                   oper-status : master
                     act-count : 2923
           sweep-interval(sec) : 10
        response-timeout(msec) : 200
                   mad-retries : 5
                  node-timeout : 10
     master-poll-interval(sec) : 5
           master-poll-retries : 0
                max-active-sms : 0
              LID-mask-control : 0
              switch-life-time : 18
               sw-link-hoqlife : 18
               ca-link-hoqlife : 18
                      max-hops : 64
          wait-report-response : false
            sa-mad-queue-depth : 256
            local-node-retries : 10
               qos-admin-state : disabled
            max-operational-vl : default
           min-vl-cap-detected : vl0-vl7
ib_sm>

The output verifies that this Subnet Manager is operating as master, the master poll interval is 5 seconds, and master-poll-retries is 0.

Step 5 Verify the High-Performance Subnet Managers on the IB fabric.

The following is sample output from the show other-sm command and shows how to verify the Subnet Managers on the IB fabric:

ib_sm> show other-sm

================================================================================
                         Subnet Managers in the subnet
================================================================================
                 subnet-prefix : fe:80:00:00:00:00:00:00
                     port-guid : 00:05:ad:00:00:01:1d:20
                        sm-key : 00:00:00:00:00:00:00:00
                      priority : 0
                      sm-state : standby
                     act-count : 1133

ib_sm>

The output shows that one standby Subnet Manager has been discovered in the fabric.


Setting Up Database Synchronization

This section describes how to set up database synchronization and includes the following topics:

Setting up Database Synchronization for Embedded Subnet Managers

Setting up Database Synchronization for High-Performance Subnet Managers

Cisco Subnet Managers use a proprietary database synchronization protocol that synchronizes important data between the master Subnet Manager and one or more standby Subnet Managers. This provides high-availability redundancy: a database-synchronized standby Subnet Manager can take over as master without disrupting the IB fabric. This capability is critical in large clusters with enterprise-class IB fabrics, where service disruption must be kept to a minimum.


Note We recommend that you keep the priority of all Subnet Managers in a network equal. A Subnet Manager added to the network then joins as a standby and synchronizes its database with the master.


Setting up Database Synchronization for Embedded Subnet Managers

To set up database synchronization configurations for the Embedded Subnet Manager, perform the following steps:


Step 1 Verify that database synchronization is enabled (enable : true), and view the current configuration.


Note By default, database synchronization is enabled.


The following is sample output from the show ib sm db-sync subnet-prefix command:

SFS-3504# show ib sm db-sync subnet-prefix fe:80:00:00:00:00:00:00

================================================================================
              Subnet Manager Database Synchronization Information
================================================================================
            subnet-prefix : fe:80:00:00:00:00:00:00
                   enable : true
           max-dbsync-sms : 1
     session-timeout(sec) : 10
       poll-interval(sec) : 3
   cold-sync-timeout(sec) : 10
          cold-sync-limit : 2
    cold-sync-period(sec) : 900
   new-session-delay(sec) : 120
     resync-interval(sec) : 3600
                    state : in-sync
SFS-3504#

Step 2 (Optional) Configure max-dbsync-sms to set the maximum number of standby Subnet Managers with which the master Subnet Manager database can synchronize.

The following example shows how to set the max-dbsync-sms to 2:

SFS-3504(config)# ib sm db-sync subnet-prefix fe:80:00:00:00:00:00:00 max-dbsync-sms 2

You can configure the other parameters shown under Subnet Manager Database Synchronization Information in Step 1 in the same way.

Step 3 Verify that the Subnet Managers are synchronized.

The following is sample output from the show ib sm db-sync subnet-prefix command with the sm-list keyword, which lists the standby Subnet Managers:

SFS-3504# show ib sm db-sync subnet-prefix fe:80:00:00:00:00:00:00 sm-list 

================================================================================
                              DB Synchronizing SMs
================================================================================
                 subnet-prefix : fe:80:00:00:00:00:00:00
                     port-guid : 00:05:ad:00:00:01:1d:20
                   entry-state : active
                 session-state : active
  session-timeout-current(sec) : 8
    poll-interval-current(sec) : 1
new-session-delay-current(sec) : 120
  resync-interval-current(sec) : 3589
                         state : in-sync

SFS-3504#

The output lists one standby Subnet Manager, and its state (in-sync) confirms that its database is synchronized with the master.
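When you monitor several standby Subnet Managers, a script can flag any entry whose state is not in-sync. This is a hedged sketch: `parse_records` and `unsynced_standbys` are hypothetical names, and the parser assumes that, when more than one Subnet Manager is listed, entries are separated by a blank line, as after the single entry shown here. The multi-entry layout has not been confirmed against real output. The same parsing also fits the High-Performance Subnet Manager `show db-sync sm-list` output.

```python
def parse_records(text: str) -> list[dict[str, str]]:
    """Group 'key : value' lines into records; a blank line ends a record."""
    records: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition(" : ")
        if sep:  # banners, titles, and prompts have no " : " separator
            current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def unsynced_standbys(text: str) -> list[str]:
    """Return port GUIDs of listed Subnet Managers whose state is not in-sync."""
    return [r["port-guid"] for r in parse_records(text)
            if "port-guid" in r and r.get("state") != "in-sync"]
```

An empty result means every listed Subnet Manager is in sync; also confirm that `parse_records` found at least one entry, because an empty sm-list means no standby is synchronizing.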


Setting up Database Synchronization for High-Performance Subnet Managers

To set up database synchronization configurations for the High-Performance Subnet Manager, perform the following steps:


Step 1 Verify that database synchronization is enabled (admin-state : enabled), and view the current configuration.


Note By default, database synchronization is enabled.


The following is sample output from the show db-sync command:

ib_sm> show db-sync

================================================================================
                        DB Sync Configuration and Status
================================================================================
              protocol-version : 10
                   admin-state : enabled
                max-dbsync-sms : 1
          session-timeout(sec) : 10
            poll-interval(sec) : 3
        cold-sync-timeout(sec) : 10
               cold-sync-limit : 2
         cold-sync-period(sec) : 900
        new-session-delay(sec) : 120
          resync-interval(sec) : 3600
                         state : in-sync
ib_sm>

Step 2 (Optional) Configure max-dbsync-sms to set the maximum number of standby Subnet Managers with which the master Subnet Manager database can synchronize.

The following example shows how to set the max-dbsync-sms to 2:

ib_sm> config db-sync max-dbsync-sms 2

You can configure the other parameters shown under DB Sync Configuration and Status in Step 1 in the same way.
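If you apply the same database synchronization settings to several Subnet Managers, generating the command lines avoids typing errors. The sketch below produces the two syntaxes shown in this chapter. Only the max-dbsync-sms keyword appears in the examples; treating the other show-output field names (without their "(sec)" suffix) as keywords is an assumption to verify against the CLI reference. `dbsync_commands` is a hypothetical helper.

```python
from typing import Optional


def dbsync_commands(settings: dict[str, int],
                    subnet_prefix: Optional[str] = None) -> list[str]:
    """Build db-sync configuration lines for either Subnet Manager CLI.

    With subnet_prefix, emit Embedded Subnet Manager syntax (enter these in
    config mode on the chassis); without it, emit High-Performance Subnet
    Manager syntax for the ib_sm> prompt. Keywords other than max-dbsync-sms
    are assumed, not confirmed by this chapter.
    """
    lines = []
    for keyword, value in settings.items():
        if subnet_prefix is not None:
            lines.append(f"ib sm db-sync subnet-prefix {subnet_prefix} {keyword} {value}")
        else:
            lines.append(f"config db-sync {keyword} {value}")
    return lines


for line in dbsync_commands({"max-dbsync-sms": 2}, "fe:80:00:00:00:00:00:00"):
    print(line)  # ib sm db-sync subnet-prefix fe:80:00:00:00:00:00:00 max-dbsync-sms 2
for line in dbsync_commands({"max-dbsync-sms": 2}):
    print(line)  # config db-sync max-dbsync-sms 2
```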

Step 3 Verify that the Subnet Managers are synchronized.

The following is sample output from the show db-sync sm-list command, which lists the standby Subnet Managers:

ib_sm> show db-sync sm-list
================================================================================
                              DB Synchronizing SMs
================================================================================
                     port-guid : 00:05:ad:00:00:01:1d:20
                   entry-state : active
                 session-state : active
  session-timeout-current(sec) : 8
    poll-interval-current(sec) : 1
new-session-delay-current(sec) : 120
  resync-interval-current(sec) : 3373
                         state : in-sync

ib_sm>

The output lists one standby Subnet Manager, and its state (in-sync) confirms that its database is synchronized with the master.