Cisco UCS Manager CLI Configuration Guide, Release 2.0
Monitoring Hardware


This chapter includes the following sections:

• Monitoring Fan Modules
• Monitoring Management Interfaces
• Server Disk Drive Monitoring

Monitoring Fan Modules

Procedure
    Step 1   UCS-A# scope chassis chassis-num

    Enters chassis mode for the specified chassis.

    Step 2   UCS-A /chassis # show environment fan

    Displays the environment status for all fans within the chassis. This includes the following information:

    • Overall status
    • Operability
    • Power state
    • Thermal status
    • Threshold status
    • Voltage status

    Step 3   UCS-A /chassis # scope fan-module tray-num module-num

    Enters fan module mode for the specified fan module.

    Note: Each chassis contains one tray, so the tray number in this command is always 1.

    Step 4   UCS-A /chassis/fan-module # show [detail | expand]

    Displays the environment status for the specified fan module.

    The following example displays information about the fan modules in chassis 1:

    UCS-A# scope chassis 1
    UCS-A /chassis # show environment fan
    Chassis 1:
        Overall Status: Power Problem
        Operability: Operable
        Power State: Redundancy Failed
        Thermal Status: Upper Non Recoverable
    
        Tray 1 Module 1:
            Threshold Status: OK
            Overall Status: Operable
            Operability: Operable
            Power State: On
            Thermal Status: OK
            Voltage Status: N/A
    
            Fan Module Stats:
                 Ambient Temp (C): 25.000000
    
            Fan 1:
                Threshold Status: OK
                Overall Status: Operable
                Operability: Operable
                Power State: On
                Thermal Status: OK
                Voltage Status: N/A
    
            Fan 2:
                Threshold Status: OK
                Overall Status: Operable
                Operability: Operable
                Power State: On
                Thermal Status: OK
                Voltage Status: N/A
    
        Tray 1 Module 2:
            Threshold Status: OK
            Overall Status: Operable
            Operability: Operable
            Power State: On
            Thermal Status: OK
            Voltage Status: N/A
    
            Fan Module Stats:
                 Ambient Temp (C): 24.000000
    
            Fan 1:
                Threshold Status: OK
                Overall Status: Operable
                Operability: Operable
                Power State: On
                Thermal Status: OK
                Voltage Status: N/A
    
            Fan 2:
                Threshold Status: OK
                Overall Status: Operable
                Operability: Operable
                Power State: On
                Thermal Status: OK
                Voltage Status: N/A
    
    

    The following example displays information about fan module 2 in chassis 1:

    UCS-A# scope chassis 1
    UCS-A /chassis # scope fan-module 1 2
    UCS-A /chassis/fan-module # show detail
    Fan Module:
        Tray: 1
        Module: 2
        Overall Status: Operable
        Operability: Operable
        Threshold Status: OK
        Power State: On
        Presence: Equipped
        Thermal Status: OK
        Product Name: Fan Module for UCS 5108 Blade Server Chassis
        PID: N20-FAN5
        VID: V01
        Vendor: Cisco Systems Inc
        Serial (SN): NWG14350B6N
        HW Revision: 0
        Mfg Date: 1997-04-01T08:41:00.000
    

    Monitoring Management Interfaces

    Management Interfaces Monitoring Policy

    This policy defines how the mgmt0 Ethernet interface on the fabric interconnect should be monitored. If Cisco UCS detects a management interface failure, a failure report is generated. If the configured number of failure reports is reached, the system assumes that the management interface is unavailable and generates a fault. By default, the management interfaces monitoring policy is disabled.

    If the affected management interface belongs to the fabric interconnect that is the managing instance, Cisco UCS confirms that the subordinate fabric interconnect's status is up and that there are no current failure reports logged against it, and then changes the managing instance for the end-points.

    If the affected fabric interconnect is currently the primary fabric interconnect in a high availability setup, a failover of the management plane is triggered. The data plane is not affected by this failover.

    You can set the following properties related to monitoring the management interface:

    • Type of mechanism used to monitor the management interface.
    • Interval at which the management interface's status is monitored.
    • Maximum number of monitoring attempts that can fail before the system assumes that the management interface is unavailable and generates a fault message.
    Important:
    In the event of a management interface failure on a fabric interconnect, the managing instance may not change if one of the following occurs:
    • A path to the end-point through the subordinate fabric interconnect does not exist.
    • The management interface for the subordinate fabric interconnect has failed.
    • The path to the end-point through the subordinate fabric interconnect has failed.
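
    To review the current policy settings before or after changing them, you can use the show form of the policy object from the monitoring scope. This is a minimal sketch, assuming the standard show command for this object in your release; verify the exact syntax with the CLI context-sensitive help (?):

    UCS-A# scope monitoring
    UCS-A /monitoring # show mgmt-if-mon-policy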

    Configuring the Management Interfaces Monitoring Policy

    Procedure
      Step 1   Enter monitoring mode.

      UCS-A# scope monitoring

      Step 2   Enable or disable the management interfaces monitoring policy.

      UCS-A /monitoring # set mgmt-if-mon-policy admin-state {enabled | disabled}

      Step 3   Specify the number of seconds that the system should wait between data recordings.

      UCS-A /monitoring # set mgmt-if-mon-policy poll-interval num-seconds

      Enter an integer between 90 and 300.

      Step 4   Specify the maximum number of monitoring attempts that can fail before the system assumes that the management interface is unavailable and generates a fault message.

      UCS-A /monitoring # set mgmt-if-mon-policy max-fail-reports num-mon-attempts

      Enter an integer between 2 and 5.

      Step 5   Specify the monitoring mechanism that you want the system to use.

      UCS-A /monitoring # set mgmt-if-mon-policy monitor-mechanism {mii-status | ping-arp-targets | ping-gateway}

      • mii-status —The system monitors the availability of the Media Independent Interface (MII).
      • ping-arp-targets —The system pings designated targets using the Address Resolution Protocol (ARP).
      • ping-gateway —The system pings the default gateway address specified for this Cisco UCS domain in the management interface.
      Step 6   If you selected mii-status as your monitoring mechanism, configure the following properties:
      1. Specify the number of seconds that the system should wait before requesting another response from the MII if a previous attempt fails.

        UCS-A /monitoring # set mgmt-if-mon-policy mii-retry-interval num-seconds

        Enter an integer between 3 and 10.

      2. Specify the number of times that the system polls the MII until the system assumes that the interface is unavailable.

        UCS-A /monitoring # set mgmt-if-mon-policy mii-retry-count num-retries

        Enter an integer between 1 and 3.

      Step 7   If you selected ping-arp-targets as your monitoring mechanism, configure the following properties:
      1. Specify the first IP address the system pings.

        UCS-A /monitoring # set mgmt-if-mon-policy arp-target1 ip-addr

        Type 0.0.0.0 to remove the ARP target.

      2. Specify the second IP address the system pings.

        UCS-A /monitoring # set mgmt-if-mon-policy arp-target2 ip-addr

        Type 0.0.0.0 to remove the ARP target.

      3. Specify the third IP address the system pings.

        UCS-A /monitoring # set mgmt-if-mon-policy arp-target3 ip-addr

        Type 0.0.0.0 to remove the ARP target.

      4. Specify the number of ARP requests to send to the target IP addresses.

        UCS-A /monitoring # set mgmt-if-mon-policy arp-requests num-requests

        Enter an integer between 1 and 5.

      5. Specify the number of seconds to wait for responses from the ARP targets before the system assumes that they are unavailable.

        UCS-A /monitoring # set mgmt-if-mon-policy arp-deadline num-seconds

        Enter an integer between 5 and 15.

      Step 8   If you selected ping-gateway as your monitoring mechanism, configure the following properties:
      1. Specify the number of times the system should ping the gateway.

        UCS-A /monitoring # set mgmt-if-mon-policy ping-requests num-requests

        Enter an integer between 1 and 5.

      2. Specify the number of seconds to wait for a response from the gateway until the system assumes that the address is unavailable.

        UCS-A /monitoring # set mgmt-if-mon-policy ping-deadline num-seconds

        Enter an integer between 5 and 15.

      Step 9   Commit the transaction to the system configuration.

      UCS-A /monitoring # commit-buffer


      The following example enables the management interfaces monitoring policy, configures it to use the Media Independent Interface (MII) monitoring mechanism, and commits the transaction:

      UCS-A# scope monitoring
      UCS-A /monitoring # set mgmt-if-mon-policy admin-state enabled
      UCS-A /monitoring* # set mgmt-if-mon-policy poll-interval 250
      UCS-A /monitoring* # set mgmt-if-mon-policy max-fail-reports 2
      UCS-A /monitoring* # set mgmt-if-mon-policy monitor-mechanism mii-status
      UCS-A /monitoring* # set mgmt-if-mon-policy mii-retry-count 3
      UCS-A /monitoring* # set mgmt-if-mon-policy mii-retry-interval 7
      UCS-A /monitoring* # commit-buffer
      UCS-A /monitoring #
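
      The preceding example covers only the mii-status mechanism. The following companion sketch uses the ping-gateway commands documented in Steps 2, 5, and 8 above; the request count and deadline values are illustrative choices within the documented ranges, not values from the original guide:

      UCS-A# scope monitoring
      UCS-A /monitoring # set mgmt-if-mon-policy admin-state enabled
      UCS-A /monitoring* # set mgmt-if-mon-policy monitor-mechanism ping-gateway
      UCS-A /monitoring* # set mgmt-if-mon-policy ping-requests 3
      UCS-A /monitoring* # set mgmt-if-mon-policy ping-deadline 10
      UCS-A /monitoring* # commit-buffer
      UCS-A /monitoring #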

      Server Disk Drive Monitoring

      Disk drive monitoring for Cisco UCS provides Cisco UCS Manager with blade-resident disk drive status for supported blade servers in a Cisco UCS domain. Disk drive monitoring provides a unidirectional fault signal from the LSI firmware to Cisco UCS Manager.

      The following server and firmware components gather, send, and aggregate information about the disk drive status in a server:

      • Physical presence sensor—Determines whether the disk drive is inserted in the server drive bay.
      • Physical fault sensor—Determines the operability status reported by the LSI storage controller firmware for the disk drive.
      • IPMI disk drive fault and presence sensors—Sends the sensor results to Cisco UCS Manager.
      • Disk drive fault LED control and associated IPMI sensors—Controls disk drive fault LED states (on/off) and relays the states to Cisco UCS Manager.

      Support for Disk Drive Monitoring

      Disk drive monitoring only supports certain blade servers and a specific LSI storage controller firmware level.

      Supported Cisco UCS Servers

      Through Cisco UCS Manager, you can monitor disk drives for the following servers:

      • B-200 blade server
      • B-230 blade server
      • B-250 blade server
      • B-440 blade server

      Cisco UCS Manager cannot monitor disk drives in any other blade server or rack-mount server.

      Storage Controller Firmware Level

      The storage controller on a supported server must have LSI 1064E firmware.

      Cisco UCS Manager cannot monitor disk drives in servers with a different level of storage controller firmware.

      Prerequisites for Disk Drive Monitoring

      In addition to the supported servers and storage controller firmware version, you must ensure that the following prerequisites have been met for disk drive monitoring to provide useful status information:

      • The drive must be inserted in the server drive bay.
      • The server must be powered on.
      • The server must have completed discovery.
      • The BIOS POST must have completed successfully (the POST complete result must be TRUE).
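
      As a quick way to confirm the power and discovery prerequisites before viewing disk status, you can check overall server status from the top-level scope. This is a minimal sketch, not from the original guide: show server status is a standard Cisco UCS Manager CLI command, but treat the exact output columns as release-dependent; the server ID 1/6 is simply the example server used later in this section, and scope server / show detail follow the CLI's generic scope and show conventions:

      UCS-A# show server status
      UCS-A# scope server 1/6
      UCS-A /chassis/server # show detail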

      Viewing the Status of a Disk Drive

      Procedure
          Step 1   UCS-A# scope chassis chassis-num

          Enters chassis mode for the specified chassis.

          Step 2   UCS-A /chassis # scope server server-num

          Enters server mode for the specified server.

          Step 3   UCS-A /chassis/server # scope raid-controller raid-contr-id {sas | sata}

          Enters RAID controller mode for the specified controller.

          Step 4   UCS-A /chassis/server/raid-controller # show local-disk [local-disk-id | detail | expand]

          Displays the following local disk properties:

          Operability field

          The operational state of the disk drive. This can be one of the following:

          • Operable—The disk drive is operable.
          • Inoperable—The disk drive is inoperable, possibly due to a hardware issue such as bad blocks.
          • N/A—The operability of the disk drive cannot be determined. This could be because the server or firmware does not support disk drive monitoring, or because the server is powered off.

          Note: The Operability field may show the incorrect status for several reasons, such as if the disk is part of a broken RAID set or if the BIOS POST (Power On Self Test) has not completed.

          Presence field

          The presence of the disk drive, and whether it can be detected in the server drive bay, regardless of its operational state. This can be one of the following:

          • Equipped—A disk drive can be detected in the server drive bay.
          • Missing—No disk drive can be detected in the server drive bay.

        The following example shows the status of a disk drive:

        UCS-A# scope chassis 1
        UCS-A /chassis # scope server 6
        UCS-A /chassis/server # scope raid-controller 1 sas
        UCS-A /chassis/server/raid-controller # show local-disk 1
        
        Local Disk:
            ID: 1
            Block Size: 512
            Blocks: 60545024
            Size (MB): 29563
            Operability: Operable
            Presence: Equipped
        

        Interpreting the Status of a Monitored Disk Drive

        Cisco UCS Manager displays the following properties for each monitored disk drive:

        • Operability—The operational state of the disk drive.
        • Presence—The presence of the disk drive, and whether it can be detected in the server drive bay, regardless of its operational state.

        You need to look at both properties to determine the status of the monitored disk drive. The following table shows the likely interpretations of the property values.

        Operability: Operable, Presence: Equipped
            No fault condition. The disk drive is in the server and can be used.

        Operability: Inoperable, Presence: Equipped
            Fault condition. The disk drive is in the server, but one of the following could be causing an operability problem:
            • The disk drive is unusable due to a hardware issue such as bad blocks.
            • There is a problem with the IPMI link to the storage controller.

        Operability: N/A, Presence: Missing
            Fault condition. The server drive bay does not contain a disk drive.

        Operability: N/A, Presence: Equipped
            Fault condition. The disk drive is in the server, but one of the following could be causing an operability problem:
            • The server is powered off.
            • The storage controller firmware is the wrong version and does not support disk drive monitoring.
            • The server does not support disk drive monitoring.

        Note: The Operability field may show the incorrect status for several reasons, such as if the disk is part of a broken RAID set or if the BIOS POST (Power On Self Test) has not completed.