Cisco UCS Manager B-Series Troubleshooting Guide
Troubleshooting IOM Issues

IOM Terminology

The following abbreviations and terms may be encountered while diagnosing IOM difficulties.

  • HR - Host Receive Block
  • NR - Network Receive Block
  • SS - Switching Subsystem
  • HI - Host Interface Block
  • NI - Network Interface Block
  • CI - CPU Interface Block
  • BI - BMC Interface Block
  • HIF - Host Interface
  • NIF - Network Interface
  • CIF - CPU Interface
  • BIF - BMC Interface
  • VIF - Virtual Interface
  • VNTag - Virtual NIC Tag
  • h2n - Host-to-Network direction. Used to describe traffic received on an HI, CI, or BI destined for an NI.
  • n2h - Network-to-Host direction. Used to describe traffic received on an NI destined for an HI, CI, or BI.
  • Redwood - ASIC on the 2104 IOM. The Redwood ASIC aggregates traffic between 8 host-facing 10G Ethernet ports (connected to server adapter cards) and 4 network-facing 10G Ethernet ports.
  • Woodside - ASIC on the 2204 and 2208 IOM. Aggregates traffic between 32 host-facing 10G Ethernet ports and 4 or 8 network-facing 10G Ethernet ports.
  • Chassis Management Switch (CMS) - A Marvell 88E6095 Ethernet switch integrated into the IOM.
  • CMC - A CPU that controls the Redwood or Woodside ASIC and the CMS, runs the required IOM firmware, and performs other chassis management functions.

Chassis Boot Sequence

The 2100 and 2200 series IOMs in the Cisco 5108 chassis are the only active components in the chassis itself. A single IOM is sufficient to bring the chassis up, though both would be needed in a cluster configuration.

Problems in the chassis and IOM can sometimes be traced by understanding the following boot sequence:

  1. Power is applied.
  2. The bootloader is invoked. The IOM memory is configured and scrubbed, and ECC is enabled. The bootloader sets the IOM health LED to amber.
  3. The kernel checksum is verified and the kernel boot begins. An alternate kernel is booted if the selected kernel's checksum fails, or if the selected kernel failed to boot the user process "OHMS" for the last two boots.
  4. If the kernel cannot be booted, the IOM health LED blinks amber and the IOM is not recognized by Cisco UCS Manager. If this is the only active IOM, the entire chassis is not recognized. If the kernel boot is successful, the IOM health LED is set to green. If the IOM is still not recognized by Cisco UCS Manager, rule out a problem with the physical cabling between the IOM and the fabric interconnect; a single functioning physical connection should be enough for the chassis to be managed in Cisco UCS Manager. If the cabling is not the issue, the remaining possibility is that the firmware version on the IOM is much older than the version of Cisco UCS Manager. This may or may not require that the IOM be returned to Cisco.
  5. Processes start on the communications ASIC (either Redwood or Woodside), and the CMC Process Monitor (pmon) starts, and restarts as needed, the following CMC platform processes:
    • platform_ohms - POST and run-time health monitoring
    • dmserver - device manager, caches seeprom data, scans I2C devices
    • ipmiserver - sends sensor and FRU data to UCS Manager
    • cmc_manager - set chassis info, respond to UCS Manager requests
    • cluster_manager - local cluster master and client data transfer
    • updated - listens for software update requests
    • thermal - chassis thermal management
    • pwrmgr - chassis power manager
    • pppd - communication path to peer CMC over UART 2
    • obfllogger - accepts client requests to log messages to OBFL flash
    • rsyslogd - syslog, messages sent to UCS Manager controlled by level

If a failure is identified at stage 2 or 3 of the boot sequence, the related components are the most likely causes and the IOM will almost certainly have to be returned. The IOMs have an HDMI console port that can directly monitor the IOM bootloader console, but its use is limited to Cisco's internal technicians, who have access to the debugging software needed to make further changes such as loading known functioning firmware images.
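
The kernel selection rule in step 3 can be sketched as a small decision function. This is only an illustration of the rule as described above, not the bootloader's actual code; the `KernelImage` type, its field names, and `choose_kernel` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class KernelImage:
    """Hypothetical record of what the bootloader knows about one kernel image."""
    name: str
    checksum_ok: bool            # result of the checksum test in step 3
    consecutive_ohms_failures: int  # recent boots in a row where OHMS did not start


def choose_kernel(selected: KernelImage, alternate: KernelImage) -> KernelImage:
    """Return the image to boot, following the rule described in step 3."""
    # A failed checksum always sends the boot to the alternate kernel.
    if not selected.checksum_ok:
        return alternate
    # So does failing to start OHMS on the last two boots.
    if selected.consecutive_ohms_failures >= 2:
        return alternate
    return selected


if __name__ == "__main__":
    primary = KernelImage("primary", checksum_ok=True, consecutive_ohms_failures=2)
    backup = KernelImage("backup", checksum_ok=True, consecutive_ohms_failures=0)
    print(choose_kernel(primary, backup).name)
```

The sketch does not model the case in which the alternate kernel also fails; per step 4, that case ends with the IOM health LED blinking amber.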

Table 1 Expected IOM and Chassis LED Behavior

LED               Status                  LED State
----------------  ----------------------  --------------
IOM Health LED    Normal operation        Green
                  Booting or minor error  Amber
                  Major error             Blinking amber
Chassis OK LED    Booting                 Off
                  IOM Controlling         Green
Chassis FAIL LED  No error                Off
                  Minor error             Amber
                  Major error             Blinking amber
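
For scripted checklists, the contents of Table 1 can be held in a lookup keyed by LED and observed state. The following is a minimal sketch; the `LED_STATES` mapping and `interpret_led` helper are hypothetical names, and the entries are copied from Table 1 only.

```python
# (LED name, observed state) -> status, transcribed from Table 1.
LED_STATES = {
    ("IOM Health", "green"): "Normal operation",
    ("IOM Health", "amber"): "Booting or minor error",
    ("IOM Health", "blinking amber"): "Major error",
    ("Chassis OK", "off"): "Booting",
    ("Chassis OK", "green"): "IOM Controlling",
    ("Chassis FAIL", "off"): "No error",
    ("Chassis FAIL", "amber"): "Minor error",
    ("Chassis FAIL", "blinking amber"): "Major error",
}


def interpret_led(led: str, state: str) -> str:
    """Translate an observed LED state into the status listed in Table 1."""
    key = (led, state.strip().lower())
    # Combinations that Table 1 does not list are reported as such.
    return LED_STATES.get(key, "Not listed in Table 1")


if __name__ == "__main__":
    print(interpret_led("IOM Health", "Blinking amber"))
```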

Link Pinning and Failover Behavior

Failures seen when a link between an IOM and its fabric interconnect (an IOM NIF port) goes down are easier to understand once you know how the blade slots in the chassis are statically pinned to those links. The quickest solution may be simply to reacknowledge the chassis, but understanding pinning shows when that solution is appropriate.

Table 2 Link Pinning on an IOM

Number of Active Fabric Links     Blade slot pinned to fabric link
--------------------------------  -----------------------------------------------
1-Link                            All the HIF ports are pinned to the active link
2-Link                            1,3,5,7 to link-1
                                  2,4,6,8 to link-2
4-Link                            1,5 to link-1
                                  2,6 to link-2
                                  3,7 to link-3
                                  4,8 to link-4
8-Link (applies only to 2208XP)   1 to link-1
                                  2 to link-2
                                  3 to link-3
                                  4 to link-4
                                  5 to link-5
                                  6 to link-6
                                  7 to link-7
                                  8 to link-8

Only 1, 2, 4, and 8 links are supported. Configurations with 3, 5, 6, or 7 links are not valid.
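
Every row of Table 2 follows the same arithmetic pattern: slot s is pinned to link ((s - 1) mod N) + 1, where N is the number of active links. The sketch below reproduces the table from that observation. The modulo rule is inferred from Table 2 and is not taken from Cisco's implementation; the function and constant names are hypothetical.

```python
SUPPORTED_LINK_COUNTS = (1, 2, 4, 8)   # from the note under Table 2
BLADE_SLOTS = range(1, 9)              # the 8 blade slots listed in Table 2


def pinned_link(slot: int, active_links: int) -> int:
    """Return the fabric link a blade slot is pinned to, as listed in Table 2."""
    if active_links not in SUPPORTED_LINK_COUNTS:
        raise ValueError(f"{active_links} links is not a supported configuration")
    if slot not in BLADE_SLOTS:
        raise ValueError(f"slot {slot} is outside 1-8")
    # Slots are dealt out to links round-robin, starting at link 1.
    return (slot - 1) % active_links + 1


def pinning_table(active_links: int) -> dict:
    """Group the blade slots by the link they are pinned to."""
    table = {link: [] for link in range(1, active_links + 1)}
    for slot in BLADE_SLOTS:
        table[pinned_link(slot, active_links)].append(slot)
    return table


if __name__ == "__main__":
    print(pinning_table(4))   # {1: [1, 5], 2: [2, 6], 3: [3, 7], 4: [4, 8]}
```

Calling `pinned_link` with 3, 5, 6, or 7 links raises `ValueError`, mirroring the unsupported configurations.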

The following example shows expected behavior:

  1. There are four active links on each IOM to their respective fabric interconnects.
  2. Link 4 between IOM-1 and its fabric interconnect (currently active) is accidentally unplugged by a datacenter worker.
  3. Connectivity through IOM-1 from blade slots 3, 4, 7, and 8 fails over to IOM-2 and the standby fabric interconnect. While you might expect only slots 4 and 8 to be affected, link 3 is also taken down administratively because 3-link configurations are not supported. Data throughput is not lost, but the failure is noted in Cisco UCS Manager.
  4. At this point you can do either of the following:
    • Resolve the connectivity issue by plugging link 4 back in. Normal configured operation resumes.
    • Reacknowledge the chassis, which re-establishes pinning over 2 fabric links. If the links are later replaced or repaired, a second reacknowledgement is needed.
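
The slot numbers in this example can be checked with a few lines of Python. This sketch assumes the 4-link pinning from Table 2 and takes the set of down links as input. It does not try to predict which links are brought down administratively, since only the single case above is described. The function name is hypothetical.

```python
from typing import Iterable, List


def slots_pinned_to(down_links: Iterable[int], configured_links: int = 4) -> List[int]:
    """List the blade slots (1-8) pinned to any link in down_links."""
    down = set(down_links)
    affected = []
    for slot in range(1, 9):
        # Same slot-to-link pattern as Table 2 for the configured link count.
        link = (slot - 1) % configured_links + 1
        if link in down:
            affected.append(slot)
    return affected


if __name__ == "__main__":
    # Link 4 alone would strand slots 4 and 8 ...
    print(slots_pinned_to({4}))       # [4, 8]
    # ... but with link 3 also down administratively, four slots fail over.
    print(slots_pinned_to({3, 4}))    # [3, 4, 7, 8]
```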

Recommended Solutions for IOM Issues

The following table lists guidelines and recommended solutions for troubleshooting IOM issues.

Table 3 IOM Issues

Issue: The IOM Health LED turns amber at initial bootup and stays amber.
Recommended Solution:
  • Re-seat the affected IOM.
  • Remove and replace the IOM.
  • If both IOMs in a chassis are showing the same behavior, decommission the server or chassis and call Cisco TAC.

Issue: The IOM health LED blinks amber but never turns green.
Recommended Solution:
  • Re-seat the affected IOM.
  • Remove and replace the IOM.
  • If both IOMs in a chassis are showing the same behavior, decommission the server or chassis and call Cisco TAC.

Issue: CMC receives chassis info from Cisco UCS Manager but one or more blades are either not responding or do not accept the chassis info.
Recommended Solution:
  • Verify that the IOM firmware and Cisco UCS Manager are at the same software level.
  • Re-seat the affected IOM.
  • Check the POST results for the Redwood or Woodside ASIC in the IOM.
  • Check for runtime link down status.
  • Check for failed POST tests on the affected server. The most detail on chassis info is in the CMC Manager logs. The CMC cluster state can be compared to the fabric interconnect with the following commands:
      FI: show cluster state
      CMC, connected directly to the IOM: show platform software cmcctrl dmclient all

Issue: CMC never receives chassis info from Cisco UCS Manager.
Recommended Solution:
  • Verify that the IOM firmware and Cisco UCS Manager are at the same software level.
  • Verify that at least one physical cable between the IOM and fabric interconnect is functioning properly.
  • Check for runtime link down status.
  • Re-seat the affected IOM.

Issue: The link to one or more servers has been lost.
Recommended Solution:
  • Verify that the affected servers are in the same pinning group. Isolate and replace the downed link if possible.
  • Re-seat the affected server.
  • Re-seat the affected IOM.
  • Re-establish pinning to the affected servers by reacknowledging the chassis.

Issue: IOM-1 does not seem to peer connect to IOM-2.
Recommended Solution:
  • Re-seat both IOMs.

Checking POST results: a successful Woodside POST example

cmc-3-A# connect iom 1
fex-1# show platform software woodside post
PRBS passes: 0
+-------+--------+--------+--------+--------+
| Port  | Pat0   | Pat1   | Pat2   | Pat3   |
+-------+--------+--------+--------+--------+
+-------+--------+--------+--------+--------+

POST Results:
  legend:
        '.' PASSED
        'X' FAILED
        ' ' Not Run
        '/' Not applicable
              A C N N N N N N N N H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H
              s I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
              i   0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 8 9 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
              c                                       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+----------+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| Register | |/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|
|1| MBIST    |.|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|
|2| CI lpbk  |.|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|/|
|3| Serdes   |/|/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
|4| PHY BIST |/|/|.|.|.|.|.|.|.|.| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
|5| PRBS     |/|/|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|
|6| PCS lpbk |/|/|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|.|
|7| Runtime  |/|/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
+-+----------+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Checking POST results: a successful Redwood POST example

cmc-3-A# connect iom 1
fex-1# show platform software redwood post
Redwood POST Results:
  legend:
        '.' PASSED
        'X' FAILED
        ' ' Not Run
+-------------------+-+----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   |A|    | | | | | | | | | | | | | | |
|                   |S|ASIC| | |H|H|H|H|H|H|H|H|N|N|N|N|
|                   |I|LVL |C|B|I|I|I|I|I|I|I|I|I|I|I|I|
| POST Test         |C|RSLT|I|I|0|1|2|3|4|5|6|7|0|1|2|3|
+-------------------+-+----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 0. Register Test  |0|  . | | | | | | | | | | | | | | |
| 1. MBIST          |0|  . | | | | | | | | | | | | | | |
| 2. CI Loopback    |0|  . | | | | | | | | | | | | | | |
| 3. Serdes         |0|    | | |.|.|.|.|.|.|.|.|.|.|.|.|
| 4. PHY BIST       |0|    | | |.|.|.|.|.|.|.|.|.|.|.|.|
| 5. PRBS           |0|    | | |.|.|.|.|.|.|.|.|.|.|.|.|
| 6. PCS Loopback   |0|    | | |.|.|.|.|.|.|.|.|.|.|.|.|
| 7. IIF PRBS       |0|    | | | | | | | | | | | | | | |
| 8. Runtime Failure|0|    | | | | | | | | | | | | | | |
+-------------------+-+----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
fex-1#
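
When POST output is captured to a text file, scanning it for 'X' cells by eye is error-prone. The following sketch lists failed cells in Redwood POST output. It assumes the column layout shown in the example above (ASIC, ASIC level result, CI, BI, HI0-HI7, NI0-NI3); the function name and column labels are hypothetical and the code has not been checked against other firmware releases.

```python
import re

# Result columns after the test name and ASIC number, in the order shown above.
REDWOOD_COLUMNS = (
    ["ASIC LVL RSLT", "CI", "BI"]
    + [f"HI{i}" for i in range(8)]
    + [f"NI{i}" for i in range(4)]
)

# A test row's first cell looks like "5. PRBS" or "8. Runtime Failure".
TEST_NAME = re.compile(r"^\d+\.\s+(.*\S)$")


def failed_post_cells(output: str) -> list:
    """Return (test name, column) pairs for every cell marked 'X'."""
    failures = []
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # borders, legend, prompts
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        match = TEST_NAME.match(cells[0])
        # Skip header rows and rows that do not have the expected width.
        if not match or len(cells) != 2 + len(REDWOOD_COLUMNS):
            continue
        for column, result in zip(REDWOOD_COLUMNS, cells[2:]):
            if result == "X":
                failures.append((match.group(1), column))
    return failures


if __name__ == "__main__":
    # One row copied from the successful example above: no failures expected.
    row = "| 5. PRBS           |0|    | | |.|.|.|.|.|.|.|.|.|.|.|.|"
    print(failed_post_cells(row))   # []
```

An empty list means no cell was marked as failed. It does not distinguish tests that passed from tests that were not run.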

Verifying Chassis Management Switch Statistics

cmc-3-A# connect iom 1
fex-1# show platform software cmcctrl cms all
0        up	<= iBMC slot 1
1        up	<= iBMC slot 2
2        down	<= iBMC slot 3
3        down	<= iBMC slot 4
4        up	<= iBMC slot 5
5        down	<= iBMC slot 6
6        down	<= iBMC slot 7
7        down	<= iBMC slot 8
8        up	<= CMS/CMC Processor link	
9        no_phy	<= Redwood link
10       no_phy	<= Debug port link
IN_GOOD_OCTETS_LO (p0)  : [0x000290AA]
IN_GOOD_OCTETS_HI (p0)  : [0x00000000]
...
...
IN_FILTERED       (p10) : [0x0000]
OUT_FILTERED      (p10) : [0x0000]
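
The port status lines at the top of this output can be extracted with a short script. This sketch assumes each status line starts with a port number followed by `up`, `down`, or `no_phy`, as in the sample, and that ports 0-7 correspond to iBMC slots 1-8 as annotated there. The function names are hypothetical.

```python
import re

# Matches lines such as "2        down" and ignores counter lines such as
# "IN_GOOD_OCTETS_LO (p0)  : [0x000290AA]".
PORT_LINE = re.compile(r"^\s*(\d+)\s+(up|down|no_phy)\b")


def cms_port_states(output: str) -> dict:
    """Map CMS port number to its reported state."""
    states = {}
    for line in output.splitlines():
        match = PORT_LINE.match(line)
        if match:
            states[int(match.group(1))] = match.group(2)
    return states


def down_bmc_slots(states: dict) -> list:
    """Return the blade slots (1-8) whose iBMC port is reported down."""
    # Port n carries the iBMC for slot n + 1, per the annotations in the sample.
    return [port + 1 for port in range(8) if states.get(port) == "down"]


if __name__ == "__main__":
    sample = "0        up\n1        up\n2        down\n8        up\n9        no_phy\n"
    print(down_bmc_slots(cms_port_states(sample)))   # [3]
```

For the full sample above, ports 2, 3, 5, 6, and 7 are down, so the result would be slots 3, 4, 6, 7, and 8. A down port may simply indicate an empty slot, so compare the result against the blades actually installed.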

Check for runtime link down status, Woodside ASICs.

  • NI - The network interface connects to the switch.
  • HI - The host interface connects to the blades.

cmc-3-A# connect iom 1
fex-1# show platform software woodside sts
Board Status Overview:
 legend:
        '  '= no-connect
        X   = Failed
        -   = Disabled
        :   = Dn
        |   = Up
        [$] = SFP present
        [ ] = SFP not present
        [X] = SFP validation failed
------------------------------

(FINAL POSITION TBD)     Uplink #:        1  2  3  4  5  6  7  8
                      Link status:        |  |  |  |  |  |  |  |
                                        +-+--+--+--+--+--+--+--+-+
                              SFP:       [$][$][$][$][$][$][$][$]
                                        +-+--+--+--+--+--+--+--+-+
                                        | N  N  N  N  N  N  N  N |
                                        | I  I  I  I  I  I  I  I |
                                        | 0  1  2  3  4  5  6  7 |
                                        |                        |
                                        |        NI (0-7)        |
                                        +------------+-----------+
                                                     |
             +-------------------------+-------------+-------------+---------------------------+
             |                         |                           |                           |
+------------+-----------+ +-----------+------------+ +------------+-----------+ +-------------+----------+
|        HI (0-7)        | |        HI (8-15)       | |       HI (16-23)       | |        HI (24-31)      |
|                        | |                        | |                        | |                        |
| H  H  H  H  H  H  H  H | | H  H  H  H  H  H  H  H | | H  H  H  H  H  H  H  H | | H  H  H  H  H  H  H  H |
| I  I  I  I  I  I  I  I | | I  I  I  I  I  I  I  I | | I  I  I  I  I  I  I  I | | I  I  I  I  I  I  I  I |
| 0  1  2  3  4  5  6  7 | | 8  9  1  1  1  1  1  1 | | 1  1  1  1  2  2  2  2 | | 2  2  2  2  2  2  3  3 |
|                        | |       0  1  2  3  4  5 | | 6  7  8  9  0  1  2  3 | | 4  5  6  7  8  9  0  1 |
+-+--+--+--+--+--+--+--+-+ +-+--+--+--+--+--+--+--+-+ +-+--+--+--+--+--+--+--+-+ +-+--+--+--+--+--+--+--+-+
 [ ][ ][ ][ ][ ][ ][ ][ ]   [ ][ ][ ][ ][ ][ ][ ][ ]   [ ][ ][ ][ ][ ][ ][ ][ ]   [ ][ ][ ][ ][ ][ ][ ][ ]
+-+--+--+--+--+--+--+--+-+ +-+--+--+--+--+--+--+--+-+ +-+--+--+--+--+--+--+--+-+ +-+--+--+--+--+--+--+--+-+
  -  -  -  |  -  -  -  |     -  -  -  :  -  -  -  :     -  -  -  :  -  -  -  :     -  -  -  |  -  -  -  |
  3  3  3  2  2  2  2  2     2  2  2  2  2  1  1  1     1  1  1  1  1  1  1  9     8  7  6  5  4  3  2  1
  2  1  0  9  8  7  6  5     4  3  2  1  0  9  8  7     6  5  4  3  2  1  0
  \__\__/__/  \__\__/__/     \__\__/__/  \__\__/__/     \__\__/__/  \__\__/__/     \__\__/__/  \__\__/__/
    blade8      blade7         blade6      blade5         blade4      blade3         blade2      blade1

Check for runtime link down status, Redwood ASICs.

  • NI - The network interface connects to the switch.
  • HI - The host interface connects to the blades.

cmc-3-A# connect iom 1
fex-1# show platform software redwood sts
Board Status Overview:
 legend:
        ' '= no-connect
        X  = Failed
        -  = Disabled
        :  = Dn
        |  = Up
        ^  = SFP+ present
        v  = Blade Present
------------------------------

        +---+----+----+----+
        |[$]| [$]| [$]| [$]|
        +---+----+----+----+
          |    |    |    |
        +-+----+----+----+-+
        | 0    1    2    3 |
        | I    I    I    I |
        | N    N    N    N |
        |                  |
        |      ASIC 0      |
        |                  |
        | H H H H H H H H  |
        | I I I I I I I I  |
        | 0 1 2 3 4 5 6 7  |
        +-+-+-+-+-+-+-+-+--+
          | | : : - - : :
         +-+-+-+-+-+-+-+-+
         |v|v|v|v|-|-|v|v|
         +-+-+-+-+-+-+-+-+
Blade:    8 7 6 5 4 3 2 1

fex-1#

Check administrative control, MAC and PHY status, and SFP detected, Redwood example.

cmc-3-A# connect iom 1
fex-1# show platform software redwood oper

ASIC 0: 
 +--+----+-+----+-----+-------------------------+-+
 |  |    | |    |MAC  |        PHY              | |
 |P | N  |A|    |-+-+-+----+-+-+-+--------+-----+ +
 |o | a  |d|    | | |A|    |X| | |        |     | |
 |r | m  |m|    |L|R|L|    |G|P|P|        |     |S|
 |t | e  |i|Oper|C|M|G|MDIO|X|C|M|        |     |F|
 |  |    |n| St |L|T|N|adr |S|S|D| u-code | Ver |P|
 +--+----+-+----+-+-+-+----+-+-+-+--------+-----+-+
 | 0| CI |E| Up | | | |  0 |0|0|0| n/a    | 0.00| |
 | 1| BI |E| Up | | | |  0 |0|0|0| n/a    | 0.00| |
 | 2| HI0|E| Up | | | | 18 |1|1|1| Ok     | 1.09| |
 | 3| HI1|E| Up | | | | 19 |1|1|1| Ok     | 1.09| |
 | 4| HI2|E| Dn |1| |1| 16 |0|0|0| Ok     | 1.09| |
 | 5| HI3|E| Dn |1| |1| 17 |0|0|0| Ok     | 1.09| |
 | 6| HI4|-| Dn | | |1| 14 |0|0|0| Ok     | 1.09| |
 | 7| HI5|-| Dn | | |1| 15 |0|0|0| Ok     | 1.09| |
 | 8| HI6|E| Dn |1| |1| 12 |0|0|0| Ok     | 1.09| |
 | 9| HI7|E| Dn |1| |1| 13 |0|0|0| Ok     | 1.09| |
 |10| NI0|E| Up | | | | 23 |1|1|1| Ok     | 1.39|*|
 |11| NI1|E| Up | | | | 22 |1|1|1| Ok     | 1.39|*|
 |12| NI2|E| Up | | | | 21 |1|1|1| Ok     | 1.39|*|
 |13| NI3|E| Up | | | | 20 |1|1|1| Ok     | 1.39|*|
 +--+----+-+----+-+-+-+----+-+-+-+--------+-----+-+
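
A common question when reading this table is which ports are administratively enabled yet operationally down. The sketch below pulls those rows out of captured `redwood oper` output. It assumes the column order shown above (port, name, admin, oper state) and that `E` in the admin column means enabled, which the sample does not state explicitly. The function name is hypothetical.

```python
def enabled_but_down(output: str) -> list:
    """Return the names of ports with admin 'E' and oper state 'Dn'."""
    ports = []
    for line in output.splitlines():
        # Drop the outer pipes, then split the row into its cells.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Data rows start with a numeric port index; headers and borders do not.
        if len(cells) < 4 or not cells[0].isdigit():
            continue
        name, admin, oper = cells[1], cells[2], cells[3]
        if admin == "E" and oper == "Dn":
            ports.append(name)
    return ports


if __name__ == "__main__":
    # Three rows copied from the example above.
    sample = "\n".join([
        " | 3| HI1|E| Up | | | | 19 |1|1|1| Ok     | 1.09| |",
        " | 4| HI2|E| Dn |1| |1| 16 |0|0|0| Ok     | 1.09| |",
        " | 6| HI4|-| Dn | | |1| 14 |0|0|0| Ok     | 1.09| |",
    ])
    print(enabled_but_down(sample))   # ['HI2']
```

Applied to the full example above, this would report HI2, HI3, HI6, and HI7. Whether those ports indicate a fault depends on whether blades are installed in the corresponding slots.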

Check administrative control, MAC and PHY status, and SFP detected, Woodside example.

fex-1# show platform software woodside oper

ASIC 0:
 +---+-----+-+----+-----------+-----------------+
 |   |     | |    |    MAC    | |  PHY  | |     |
 |   |     | |    |           |S|---+---| |     |
 |   |     |A|    |-+-+-+-+-+-|e|XFI|SFI| |     |
 | P |     |d|    | |L|T|R|R|T|r|---+---| |     |
 | o |     |m|    |P|o|R|R|F|F|d|p|p|p|p|S|     |
 | r |     |i|Oper|C|c|d|d|l|l|e|m|c|c|m|F|ucode|----------------------------+----------------------------+-----+
 | t |Name |n| St |S|k|y|y|t|t|s|d|s|s|d|P| ver |    Time last came Up       |      Time last went Down   |Flaps|
 +---+-----+-+----+-+-+-+-+-+-+-+-+-+-+-+-+-----+----------------------------+----------------------------+-----+
 | 0 |HI0  |-| Dn |0|0|1|1|0|0|0|0|0|0|0| | 0.00| 01/01/1970 00:00:00.000000 | 01/01/1970 00:00:00.000000 |    0|
 | 1 |HI1  |-| Dn |0|0|1|1|0|0|0|0|0|0|0| | 0.00| 01/01/1970 00:00:00.000000 | 01/01/1970 00:00:00.000000 |    0|
 | 2 |HI2  |-| Dn |0|0|1|1|0|0|0|0|0|0|0| | 0.00| 01/01/1970 00:00:00.000000 | 01/01/1970 00:00:00.000000 |    0|
 | 3 |HI3  |E| Up |1|1|1|1|1|1|1|0|0|0|0| | 0.00| 02/03/2012 23:17:36.046137 | 02/03/2012 23:17:34.815303 |   24|
 | 4 |HI4  |-| Dn |0|0|1|1|0|0|0|0|0|0|0| | 0.00| 01/01/1970 00:00:00.000000 | 01/01/1970 00:00:00.000000 |    0|
 | 5 |HI5  |-| Dn |0|0|1|1|0|0|0|0|0|0|0| | 0.00| 01/01/1970 00:00:00.000000 | 01/01/1970 00:00:00.000000 |    0|
 | 6 |HI6  |-| Dn |0|0|1|1|0|0|0|0|0|0|0| | 0.00| 01/01/1970 00:00:00.000000 | 01/01/1970 00:00:00.000000 |    0|
 | 7 |HI7  |E| Up |1|1|1|1|1|1|1|0|0|0|0| | 0.00| 02/03/2012 22:53:04.761879 | 02/03/2012 22:52:44.548148 |   17|
 | 8 |HI8  |-| Dn |0|0|1|1|0|0|0|0|0|0|0| | 0.00| 01/01/1970 00:00:00.000000 | 01/01/1970 00:00:00.000000 |    0|
 | 9 |HI9  |-| Dn |0|0|1|1|0|0|0|0|0|0|0| | 0.00| 01/01/1970 00:00:00.000000 | 01/01/1970 00:00:00.000000 |    0|
 |10 |HI10 |-| Dn |0|0|1|1|0|0|0|0|0|0|0| | 0.00| 01/01/1970 00:00:00.000000 | 01/01/1970 00:00:00.000000 |    0|
 |11 |HI11 |E| Dn |0|0|1|1|0|0|0|0|0|0|0| | 0.00| 02/03/2012 20:46:44.214237 | 02/03/2012 20:49:30.606932 |    3|
 |12 |HI12 |-| Dn |0|0|1|1|0|0|0|0|0|0|0| | 0.00| 01/01/1970 00:00:00.000000 | 01/01/1970 00:00:00.000000 |    0|
 |13 |HI13 |-| Dn |0|0|1|1|0|0|0|0|0|0|0| | 0.00| 01/01/1970 00:00:00.000000 | 01/01/1970 00:00:00.000000 |    0|
 |14 |HI14 |-| Dn |0|0|1|1|0|0|0|0|0|0|0| | 0.00| 01/01/1970 00:00:00.000000 | 01/01/1970 00:00:00.000000 |    0|
 |15 |HI15 |E| Dn |0|0|1|1|0|0|0|0|0|0|0| | 0.00| 02/03/2012 20:45:30.918631 | 02/03/2012 20:48:06.811009 |    3|
 |16 |HI16 |-| Dn |0|0|1|1|0|0|0|0|0|0|0| | 0.00| 01/01/1970 00:00:00.000000 | 01/01/1970 00:00:00.000000 |    0|
 |17 |HI17 |-| Dn |0|0|1|1|0|0|0|0|0|0|0| | 0.00| 01/01/1970 00:00:00.000000 | 01/01/1970 00:00:00.000000 |    0|