Cisco UCS Manager CLI Configuration Guide, Release 2.1
Monitoring Hardware


Monitoring Fan Modules

Procedure
     Command or Action    Purpose
    Step 1 UCS-A# scope chassis chassis-num 

    Enters chassis mode for the specified chassis.

     
    Step 2 UCS-A /chassis # show environment fan

    Displays the environment status for all fans within the chassis.

    This includes the following information:

    • Overall status

    • Operability

    • Power state

    • Thermal status

    • Threshold status

    • Voltage status

     
    Step 3 UCS-A /chassis # scope fan-module tray-num module-num

    Enters fan module chassis mode for the specified fan module.

    Note   

    Each chassis contains one tray, so the tray number in this command is always 1.

     
    Step 4 UCS-A /chassis/fan-module # show [detail | expand]

    Displays the environment status for the specified fan module.

     

    The following example displays information about the fan modules in chassis 1:

    UCS-A# scope chassis 1
    UCS-A /chassis # show environment fan
    Chassis 1:
        Overall Status: Power Problem
        Operability: Operable
        Power State: Redundancy Failed
        Thermal Status: Upper Non Recoverable
    
        Tray 1 Module 1:
            Threshold Status: OK
            Overall Status: Operable
            Operability: Operable
            Power State: On
            Thermal Status: OK
            Voltage Status: N/A
    
            Fan Module Stats:
                 Ambient Temp (C): 25.000000
    
            Fan 1:
                Threshold Status: OK
                Overall Status: Operable
                Operability: Operable
                Power State: On
                Thermal Status: OK
                Voltage Status: N/A
    
            Fan 2:
                Threshold Status: OK
                Overall Status: Operable
                Operability: Operable
                Power State: On
                Thermal Status: OK
                Voltage Status: N/A
    
        Tray 1 Module 2:
            Threshold Status: OK
            Overall Status: Operable
            Operability: Operable
            Power State: On
            Thermal Status: OK
            Voltage Status: N/A
    
            Fan Module Stats:
                 Ambient Temp (C): 24.000000
    
            Fan 1:
                Threshold Status: OK
                Overall Status: Operable
                Operability: Operable
                Power State: On
                Thermal Status: OK
                Voltage Status: N/A
    
            Fan 2:
                Threshold Status: OK
                Overall Status: Operable
                Operability: Operable
                Power State: On
                Thermal Status: OK
                Voltage Status: N/A
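
For scripted health checks, output like the example above can be scanned for status fields that are not in a healthy state. The following is a minimal Python sketch, assuming the text has already been captured from the CLI (for example, over an SSH session). The function name `find_fan_problems` and the set of values treated as healthy are illustrative assumptions based on the sample output, not part of Cisco UCS Manager.

```python
# Values treated as healthy in this sketch (an assumption drawn from the
# sample output; adjust for your environment).
HEALTHY = {"OK", "Operable", "On", "N/A"}
STATUS_FIELDS = {
    "Overall Status", "Operability", "Power State",
    "Thermal Status", "Threshold Status", "Voltage Status",
}


def find_fan_problems(output):
    """Return (component, field, value) tuples for unhealthy status fields."""
    problems = []
    path = []  # stack of (indent, section name), e.g. Chassis 1 > Tray 1 Module 1
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        indent = len(raw) - len(raw.lstrip())
        key, _, value = line.partition(":")
        value = value.strip()
        # Leave any section that is not a parent of the current line.
        while path and path[-1][0] >= indent:
            path.pop()
        if not value:
            # A line such as "Fan 1:" opens a new section.
            path.append((indent, key))
        elif key in STATUS_FIELDS and value not in HEALTHY:
            component = " > ".join(name for _, name in path)
            problems.append((component, key, value))
    return problems


sample = """Chassis 1:
    Overall Status: Power Problem
    Operability: Operable
    Tray 1 Module 1:
        Overall Status: Operable
        Fan 1:
            Thermal Status: Upper Non Recoverable
"""
for component, field, value in find_fan_problems(sample):
    print(f"{component}: {field} = {value}")
```

The sketch relies only on indentation and the "Field: Value" layout, so it makes no assumption about how many trays, modules, or fans are present.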
    
    

    The following example displays information about fan module 2 in chassis 1:

    UCS-A# scope chassis 1
    UCS-A /chassis # scope fan-module 1 2
    UCS-A /chassis/fan-module # show detail
    Fan Module:
        Tray: 1
        Module: 2
        Overall Status: Operable
        Operability: Operable
        Threshold Status: OK
        Power State: On
        Presence: Equipped
        Thermal Status: OK
        Product Name: Fan Module for UCS 5108 Blade Server Chassis
        PID: N20-FAN5
        VID: V01
        Vendor: Cisco Systems Inc
        Serial (SN): NWG14350B6N
        HW Revision: 0
        Mfg Date: 1997-04-01T08:41:00.000
    

    Monitoring Management Interfaces

    Management Interfaces Monitoring Policy

    This policy defines how the mgmt0 Ethernet interface on the fabric interconnect should be monitored. If Cisco UCS detects a management interface failure, a failure report is generated. If the configured number of failure reports is reached, the system assumes that the management interface is unavailable and generates a fault. By default, the management interfaces monitoring policy is disabled.

    If the affected management interface belongs to the fabric interconnect that is the managing instance, Cisco UCS confirms that the subordinate fabric interconnect is up and has no current failure reports logged against it, and then changes the managing instance for the endpoints.

    If the affected fabric interconnect is currently the primary in a high availability setup, a failover of the management plane is triggered. The data plane is not affected by this failover.

    You can set the following properties related to monitoring the management interface:

    • Type of mechanism used to monitor the management interface.

    • Interval at which the management interface's status is monitored.

    • Maximum number of monitoring attempts that can fail before the system assumes that the management interface is unavailable and generates a fault message.

    Important:
    In the event of a management interface failure on a fabric interconnect, the managing instance may not change if one of the following occurs:
    • A path to the endpoint through the subordinate fabric interconnect does not exist.

    • The management interface for the subordinate fabric interconnect has failed.

    • The path to the endpoint through the subordinate fabric interconnect has failed.
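
The decision logic described in this section can be summarized in a few lines. The following Python sketch is a simplified model of the conditions stated above, not a description of how Cisco UCS Manager is implemented. The class and field names are hypothetical, and the sketch does not model whether failure reports must be consecutive.

```python
from dataclasses import dataclass


@dataclass
class SubordinateState:
    """Conditions named in the text; the field names are illustrative only."""
    status_up: bool             # subordinate fabric interconnect is up
    failure_reports: int        # failure reports currently logged against it
    mgmt_interface_ok: bool     # its management interface has not failed
    endpoint_path_exists: bool  # a path to the endpoint exists through it
    endpoint_path_ok: bool      # that path has not failed


def fault_raised(failure_reports, max_fail_reports):
    """A fault is generated once the configured number of reports is reached."""
    return failure_reports >= max_fail_reports


def may_change_managing_instance(sub):
    """True only if none of the blocking conditions listed above applies."""
    return (
        sub.status_up
        and sub.failure_reports == 0
        and sub.mgmt_interface_ok
        and sub.endpoint_path_exists
        and sub.endpoint_path_ok
    )


healthy = SubordinateState(True, 0, True, True, True)
print(may_change_managing_instance(healthy))
```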

    Configuring the Management Interfaces Monitoring Policy

    Procedure
      Step 1   Enter monitoring mode.

      UCS-A# scope monitoring

      Step 2   Enable or disable the management interfaces monitoring policy.

      UCS-A /monitoring # set mgmt-if-mon-policy admin-state {enabled | disabled}

      Step 3   Specify the number of seconds that the system should wait between data recordings.

      UCS-A /monitoring # set mgmt-if-mon-policy poll-interval

      Enter an integer between 90 and 300.

      Step 4   Specify the maximum number of monitoring attempts that can fail before the system assumes that the management interface is unavailable and generates a fault message.

      UCS-A /monitoring # set mgmt-if-mon-policy max-fail-reports num-mon-attempts

      Enter an integer between 2 and 5.

      Step 5   Specify the monitoring mechanism that you want the system to use.

      UCS-A /monitoring # set mgmt-if-mon-policy monitor-mechanism {mii-status | ping-arp-targets | ping-gateway}

      • mii-status —The system monitors the availability of the Media Independent Interface (MII).

      • ping-arp-targets —The system pings designated targets using the Address Resolution Protocol (ARP).

      • ping-gateway —The system pings the default gateway address specified for this Cisco UCS domain in the management interface.

      Step 6   If you selected mii-status as your monitoring mechanism, configure the following properties:
      1. Specify the number of seconds that the system should wait before requesting another response from the MII if a previous attempt fails.

        UCS-A /monitoring # set mgmt-if-mon-policy mii-retry-interval num-seconds

        Enter an integer between 3 and 10.

      2. Specify the number of times that the system polls the MII until the system assumes that the interface is unavailable.

        UCS-A /monitoring # set mgmt-if-mon-policy mii-retry-count num-retries

        Enter an integer between 1 and 3.

      Step 7   If you selected ping-arp-targets as your monitoring mechanism, configure the following properties:
      1. Specify the first IP address the system pings.

        UCS-A /monitoring # set mgmt-if-mon-policy arp-target1 ip-addr

        Type 0.0.0.0 for an IPv4 address to remove the ARP target or :: for an IPv6 address to remove the N-disc target.

      2. Specify the second IP address the system pings.

        UCS-A /monitoring # set mgmt-if-mon-policy arp-target2 ip-addr

        Type 0.0.0.0 for an IPv4 address to remove the ARP target or :: for an IPv6 address to remove the N-disc target.

      3. Specify the third IP address the system pings.

        UCS-A /monitoring # set mgmt-if-mon-policy arp-target3 ip-addr

        Type 0.0.0.0 for an IPv4 address to remove the ARP target or :: for an IPv6 address to remove the N-disc target.

      4. Specify the number of ARP requests to send to the target IP addresses.

        UCS-A /monitoring # set mgmt-if-mon-policy arp-requests num-requests

        Enter an integer between 1 and 5.

      5. Specify the number of seconds to wait for responses from the ARP targets before the system assumes that they are unavailable.

        UCS-A /monitoring # set mgmt-if-mon-policy arp-deadline num-seconds

        Enter an integer between 5 and 15.

      Step 8   If you selected ping-gateway as your monitoring mechanism, configure the following properties:
      1. Specify the number of times the system should ping the gateway.

        UCS-A /monitoring # set mgmt-if-mon-policy ping-requests

        Enter an integer between 1 and 5.

      2. Specify the number of seconds to wait for a response from the gateway until the system assumes that the address is unavailable.

        UCS-A /monitoring # set mgmt-if-mon-policy ping-deadline

        Enter an integer between 5 and 15.

      Step 9   Commit the transaction to the system configuration.

      UCS-A /monitoring # commit-buffer


      The following example configures the management interfaces monitoring policy to use the Media Independent Interface (MII) monitoring mechanism and commits the transaction:

      UCS-A# scope monitoring
      UCS-A /monitoring # set mgmt-if-mon-policy admin-state enabled
      UCS-A /monitoring* # set mgmt-if-mon-policy poll-interval 250
      UCS-A /monitoring* # set mgmt-if-mon-policy max-fail-reports 2
      UCS-A /monitoring* # set mgmt-if-mon-policy monitor-mechanism mii-status
      UCS-A /monitoring* # set mgmt-if-mon-policy mii-retry-count 3
      UCS-A /monitoring* # set mgmt-if-mon-policy mii-retry-interval 7
      UCS-A /monitoring* # commit-buffer
      UCS-A /monitoring #
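
When this policy is configured from a script, it can help to check values against the documented ranges before sending anything to the CLI. The following Python sketch builds the command sequence for one mechanism and rejects out-of-range values. The helper name `build_policy_commands` is hypothetical, ARP target addresses are left out for brevity, and the sketch assumes that every integer property is written as a value following its keyword, as the example above does for `poll-interval`.

```python
# Integer ranges taken from the procedure steps above.
RANGES = {
    "poll-interval": (90, 300),
    "max-fail-reports": (2, 5),
    "mii-retry-interval": (3, 10),
    "mii-retry-count": (1, 3),
    "arp-requests": (1, 5),
    "arp-deadline": (5, 15),
    "ping-requests": (1, 5),
    "ping-deadline": (5, 15),
}

# Properties that apply to each monitoring mechanism.
MECHANISM_PROPS = {
    "mii-status": {"mii-retry-interval", "mii-retry-count"},
    "ping-arp-targets": {"arp-requests", "arp-deadline"},
    "ping-gateway": {"ping-requests", "ping-deadline"},
}
COMMON_PROPS = {"poll-interval", "max-fail-reports"}


def build_policy_commands(mechanism, **props):
    """Return the CLI commands for the policy, validating each value."""
    if mechanism not in MECHANISM_PROPS:
        raise ValueError(f"unknown monitoring mechanism: {mechanism}")
    allowed = COMMON_PROPS | MECHANISM_PROPS[mechanism]
    commands = [
        "scope monitoring",
        "set mgmt-if-mon-policy admin-state enabled",
        f"set mgmt-if-mon-policy monitor-mechanism {mechanism}",
    ]
    for name, value in props.items():
        key = name.replace("_", "-")  # poll_interval -> poll-interval
        if key not in allowed:
            raise ValueError(f"{key} does not apply to {mechanism}")
        low, high = RANGES[key]
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not low <= value <= high:
            raise ValueError(f"{key} must be an integer between {low} and {high}")
        commands.append(f"set mgmt-if-mon-policy {key} {value}")
    commands.append("commit-buffer")
    return commands


for command in build_policy_commands(
    "mii-status",
    poll_interval=250,
    max_fail_reports=2,
    mii_retry_count=3,
    mii_retry_interval=7,
):
    print(command)
```

Rejecting a property that belongs to a different mechanism mirrors the structure of Steps 6 through 8, where each group of properties applies to only one mechanism.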

      Server Disk Drive Monitoring

      Disk drive monitoring provides Cisco UCS Manager with the status of blade-resident disk drives for supported blade servers in a Cisco UCS domain. The status information travels as a unidirectional fault signal from the LSI firmware to Cisco UCS Manager.

      The following server and firmware components gather, send, and aggregate information about the disk drive status in a server:

      • Physical presence sensor—Determines whether the disk drive is inserted in the server drive bay.

      • Physical fault sensor—Determines the operability status reported by the LSI storage controller firmware for the disk drive.

      • IPMI disk drive fault and presence sensors—Sends the sensor results to Cisco UCS Manager.

      • Disk drive fault LED control and associated IPMI sensors—Controls disk drive fault LED states (on/off) and relays the states to Cisco UCS Manager.

      Support for Disk Drive Monitoring

      Disk drive monitoring only supports certain blade servers and a specific LSI storage controller firmware level.

      Supported Cisco UCS Servers

      Through Cisco UCS Manager, you can monitor disk drives for the following servers:

      • B200 M1/M2 blade server

      • B250 M1/M2 blade server

      Cisco UCS Manager cannot monitor disk drives in any other blade server or rack-mount server.


      Note


      Disk Drive Monitoring behavior and the CIMC sensor values are not consistent with the storage controller reported device status across various UCS servers. This is observed during various operations such as removing or inserting a storage device, or during rebuild operations.


      Storage Controller Firmware Level

      The storage controller on a supported server must have LSI 1064E firmware.

      Cisco UCS Manager cannot monitor disk drives in servers with a different level of storage controller firmware.

      Prerequisites for Disk Drive Monitoring

      In addition to the supported servers and storage controller firmware version, you must ensure that the following prerequisites have been met for disk drive monitoring to provide useful status information:

      • The drive must be inserted in the server drive bay.

      • The server must be powered on.

      • The server must have completed discovery.

      • The BIOS POST complete result must be TRUE.
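
A monitoring script can check these prerequisites before it trusts a disk drive status. The following Python sketch is only illustrative: the function name and its boolean parameters are hypothetical, and how each value is obtained (for example, from inventory data) is outside its scope.

```python
def unmet_prerequisites(drive_inserted, server_powered_on,
                        discovery_complete, bios_post_complete):
    """Return a description of each prerequisite that is not met."""
    checks = [
        (drive_inserted, "drive is not inserted in the server drive bay"),
        (server_powered_on, "server is not powered on"),
        (discovery_complete, "server has not completed discovery"),
        (bios_post_complete, "BIOS POST complete is not TRUE"),
    ]
    return [message for ok, message in checks if not ok]


print(unmet_prerequisites(True, False, True, False))
```

An empty list means that the reported status can be treated as meaningful.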

      Viewing the Status of a Disk Drive

      Procedure
         Command or Action    Purpose
        Step 1 UCS-A# scope chassis chassis-num 

        Enters chassis mode for the specified chassis.

         
        Step 2 UCS-A /chassis # scope server server-num 

        Enters server chassis mode.

         
        Step 3 UCS-A /chassis/server # scope raid-controller raid-contr-id {sas | sata} 

        Enters RAID controller server chassis mode.

         
        Step 4 UCS-A /chassis/server/raid-controller # show local-disk [local-disk-id | detail | expand]

        Displays the status of the local disk drives on the RAID controller.

        The following example shows the status of a disk drive:

        UCS-A# scope chassis 1
        UCS-A /chassis # scope server 6
        UCS-A /chassis/server # scope raid-controller 1 sas
        UCS-A /chassis/server/raid-controller # show local-disk 1
        
        Local Disk:
            ID: 1
            Block Size: 512
            Blocks: 60545024
            Size (MB): 29563
            Operability: Operable
            Presence: Equipped
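
In this sample, the reported size agrees with the block count multiplied by the block size: 60545024 × 512 bytes = 30,999,052,288 bytes, which is 29563 when divided by 1,048,576. The following Python sketch parses the "Field: Value" lines and repeats that arithmetic. The helper names are hypothetical, and reading "MB" as units of 1,048,576 bytes is an inference from this one sample rather than a documented definition.

```python
def parse_fields(output):
    """Parse 'Field: Value' lines into a dict; section headers are skipped."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def size_from_blocks(fields):
    """Compute the disk size in units of 1,048,576 bytes from the block data."""
    total_bytes = int(fields["Blocks"]) * int(fields["Block Size"])
    return total_bytes // (1024 * 1024)


sample = """
Local Disk:
    ID: 1
    Block Size: 512
    Blocks: 60545024
    Size (MB): 29563
    Operability: Operable
    Presence: Equipped
"""
disk = parse_fields(sample)
print(size_from_blocks(disk), disk["Size (MB)"])
```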
        

        Interpreting the Status of a Monitored Disk Drive

        Cisco UCS Manager displays the following properties for each monitored disk drive:

        • Operability—The operational state of the disk drive.

        • Presence—The presence of the disk drive, and whether it can be detected in the server drive bay, regardless of its operational state.

        You need to look at both properties to determine the status of the monitored disk drive. The following table shows the likely interpretations of the property values.

        Operability Status    Presence Status    Interpretation

        Operable              Equipped           No fault condition. The disk drive is in the server and can be used.

        Inoperable            Equipped           Fault condition. The disk drive is in the server, but one of the following could be causing an operability problem:
                                                 • The disk drive is unusable due to a hardware issue such as bad blocks.
                                                 • There is a problem with the IPMI link to the storage controller.

        N/A                   Missing            Fault condition. The server drive bay does not contain a disk drive.

        N/A                   Equipped           Fault condition. The disk drive is in the server, but one of the following could be causing an operability problem:
                                                 • The server is powered off.
                                                 • The storage controller firmware is the wrong version and does not support disk drive monitoring.
                                                 • The server does not support disk drive monitoring.


        Note


        The Operability field may show the incorrect status for several reasons, such as if the disk is part of a broken RAID set or if the BIOS POST (Power On Self Test) has not completed.
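
Because both properties must be read together, the interpretation works as a lookup on the pair of values. The following Python sketch encodes the four combinations described in this section. The function name and the fallback message for other combinations are assumptions added for illustration.

```python
# (Operability, Presence) -> interpretation, condensed from this section.
INTERPRETATION = {
    ("Operable", "Equipped"):
        "No fault condition: the disk drive is in the server and can be used.",
    ("Inoperable", "Equipped"):
        "Fault condition: the drive may be unusable due to a hardware issue, "
        "or the IPMI link to the storage controller may have a problem.",
    ("N/A", "Missing"):
        "Fault condition: the server drive bay does not contain a disk drive.",
    ("N/A", "Equipped"):
        "Fault condition: the server may be powered off, the storage controller "
        "firmware may be unsupported, or the server may not support monitoring.",
}


def interpret_disk_status(operability, presence):
    """Return the likely interpretation for a pair of property values."""
    return INTERPRETATION.get(
        (operability, presence),
        "Combination not covered here; check the server directly.",
    )


print(interpret_disk_status("N/A", "Missing"))
```

As the note above points out, the Operability value itself can be misleading in some situations, so the result is a starting point for troubleshooting rather than a verdict.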


        Managing Transportable Flash Module and Supercapacitor

        LSI storage controllers use a Transportable Flash Module (TFM) powered by a supercapacitor to provide RAID cache protection. With Cisco UCS Manager, you can monitor these components to determine the status of the battery backup unit (BBU). The BBU operability status can be one of the following:

        • Operable—The BBU is functioning successfully.

        • Inoperable—The TFM or BBU is missing, or the BBU has failed and needs to be replaced.

        • Degraded—The BBU is predicted to fail.

        TFM and supercap functionality is supported beginning with Cisco UCS Manager Release 2.1(2).

        TFM and Supercap Guidelines and Limitations

        TFM and Supercap Limitations

        • The CIMC sensors for TFM and supercap on the Cisco UCS B420 M3 blade server are not polled by Cisco UCS Manager.

        • If the TFM and supercap are not installed on the Cisco UCS B420 M3 blade server, or are installed and then removed from the blade server, no faults are generated.

        • If the TFM is not installed on the Cisco UCS B420 M3 blade server, but the supercap is installed, Cisco UCS Manager reports the entire BBU system as absent. You should physically check whether both the TFM and the supercap are present on the blade server.

        Supported Cisco UCS Servers for TFM and Supercap

        The following Cisco UCS servers support TFM and supercap:

        • Cisco UCS B420 M3 blade server

        • Cisco UCS C22 M3 rack server

        • Cisco UCS C24 M3 rack server

        • Cisco UCS C220 M3 rack server

        • Cisco UCS C240 M3 rack server

        • Cisco UCS C420 M3 rack server

        Monitoring RAID Battery Status

        This procedure applies only to Cisco UCS servers that support RAID configuration and TFM. If the BBU has failed or is predicted to fail, you should replace the unit as soon as possible.

        Procedure
           Command or Action    Purpose
          Step 1 UCS-A# scope chassis chassis-num

          Enters chassis mode for the specified chassis.

           
          Step 2 UCS-A /chassis # scope server server-num

          Enters server chassis mode.

           
          Step 3 UCS-A /chassis/server # scope raid-controller raid-contr-id {flash | sas | sata | sd | unknown}

          Enters RAID controller server chassis mode.

           
          Step 4 UCS-A /chassis/server/raid-controller # show raid-battery expand

          Displays the RAID battery status.

           

          This example shows how to view information on the battery backup unit of a server:

          UCS-A# scope chassis 1
          UCS-A /chassis # scope server 3
          UCS-A /chassis/server # scope raid-controller 1 sas
          UCS-A /chassis/server/raid-controller # show raid-battery expand
          RAID Battery:
              Battery Type: Supercap
              Presence: Equipped
              Operability: Operable
              Oper Qualifier Reason:
              Vendor: LSI
              Model: SuperCaP
              Serial: 0
              Capacity Percentage: Full
              Battery Temperature (C): 54.000000
          
              Transportable Flash Module:
                  Presence: Equipped
                  Vendor: Cisco Systems Inc
                  Model: UCSB-RAID-1GBFM
                  Serial: FCH164279W6
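
Captured output of this kind can be turned into a recommended action from the first Operability value, following the three BBU states described earlier in this section. The following Python sketch is a minimal illustration. The function name and the message wording are assumptions, and the sketch reads only the first Operability field it finds.

```python
# Recommended action per BBU operability state, paraphrased from this section.
ACTIONS = {
    "Operable": "No action needed.",
    "Degraded": "BBU is predicted to fail; replace the unit as soon as possible.",
    "Inoperable": "TFM or BBU is missing, or the BBU has failed; "
                  "inspect the server and replace the unit if it has failed.",
}


def bbu_action(output):
    """Return the recommended action for the first Operability field found."""
    for line in output.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "Operability":
            state = value.strip()
            return ACTIONS.get(state, f"Unrecognized operability value: {state}")
    return "No Operability field found in the output."


sample = """RAID Battery:
    Battery Type: Supercap
    Presence: Equipped
    Operability: Operable
"""
print(bbu_action(sample))
```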