Cisco MGX 8230 Edge Concentrator Overview
Reliability, Availability, and Serviceability (RAS)

Table Of Contents

Reliability, Availability, and Serviceability

Overview

Key Availability Features

Redundancy

AC Power Shelf (AC systems only)

Redundancy for DC

Switchover Mechanism

Nonbulk Mode Distribution

Bulk Mode Distribution

Hot Standby

Software Upgrades

Logical Connections

Performance

Automatic Protection Switching


Reliability, Availability, and Serviceability


Overview

The MGX 8230 is designed for carrier-class reliability. System components can be configured for 100 percent redundancy, and all MGX 8230 modules can be removed and reinserted without impacting service delivery or affecting the performance of other modules. Hot-standby interfaces offer optional redundancy so that if a module fails, the standby is fully online within milliseconds. Switchover to standby interfaces is nonservice-affecting for most protocols, ensuring nonstop application performance.

The MGX 8230 Edge Concentrator supports industry-standard, automatic protection switching (APS) for all SONET and synchronous digital hierarchy (SDH) interfaces using the 1+1 APS scheme. In the event of a fiber cut or card failure, APS performs switching to the backup fiber within milliseconds. The MGX 8230 series provides cost-effective 1:N redundancy of service interfaces to enhance overall reliability and availability. With support for 1:N redundancy, a single standby service module will automatically take over the traffic functions of any failed service module of the same type within seconds.

Key Availability Features

The MGX family supports the following Reliability, Availability, and Serviceability (RAS) features:

PXM1/SRM hot standby and hitless redundancy

Hot swappable front and back cards

Graceful upgrade/downgrade hitless switchover

Exception handling and software error detection

Graceful recovery from disk failures

Database management

Disk file and memory mirroring

Automatic updates

Upgrade/downgrade hooks

Automated trouble detection monitoring

Redundancy

The Service Resource Module (SRM) enables 1:1 or 1:N redundancy for the service modules. It also offers other features, such as BERT testing and M13 grooming of circuits. There are two SRMs per node in a 1:1 redundant configuration: one is active and the other is standby.

For bulk distribution, the MGX 8230 shelf can support three channelized T3s using the SRM. The SRM can support 64 T1s per shelf. Bulk distribution is supported in all service module slots of the MGX 8230 (all slots except slots 1/7 and 2/8 where the PXM1 cards reside).

The SRM can be used in conjunction with native T1/E1 service modules to bring the total to 192 DS1s: 160 DS1s using twenty 8-port cards and the SRMs, and 32 DS1s using four 8-port cards with T1/E1 back cards. The current SRM supports 64 DS1s across the three T3s on each SRM.

PXM1 on slot 1 controls SRM on slot 7 and PXM1 on slot 2 controls SRM on slot 14. The active SRM depends on the active PXM1.

For redundancy, the MGX 8230 power supply tray has two options: one with one AC cord and another with two AC cords.

MGX-AC1-1 is for systems requiring a single AC power supply that will be powered from a single AC power source. MGX-AC1-1 provides up to 1200 W of load-shared redundant power. If additional power is needed, additional power supplies, providing an additional 1200 W each, can be added.
The one-AC-cord version uses 1:N power-supply redundancy. If you have three 1200 W power modules, you can support up to 2400 W of power; the third module is redundant. Because there is only one AC cord, the AC cord itself is not redundant.

MGX-AC2-2 is for systems requiring redundant AC power supplies that will be powered from two AC power sources.
The two-AC-cord power tray supports 1:1 power-supply redundancy. If you have four 1200 W power supplies, you can support only 2400 W of power. Because this tray has two AC cords, both the AC cord and the power modules are redundant.
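The two capacity figures above follow from simple arithmetic. The sketch below is illustrative only: the function name and the scheme labels are hypothetical, and it assumes every module is rated at 1200 W as stated in the text.

```python
MODULE_WATTS = 1200  # capacity of one AC power module, per the text


def usable_watts(modules: int, scheme: str) -> int:
    """Return the power that remains protected against a single failure.

    scheme "1:N": one module in the set acts as the spare (one-cord tray).
    scheme "1:1": every working module has its own spare (two-cord tray).
    """
    if modules < 2:
        raise ValueError("redundancy needs at least two modules")
    if scheme == "1:N":
        return (modules - 1) * MODULE_WATTS
    if scheme == "1:1":
        return (modules // 2) * MODULE_WATTS
    raise ValueError(f"unknown scheme: {scheme}")


print(usable_watts(3, "1:N"))  # 2400
print(usable_watts(4, "1:1"))  # 2400
```

Under these assumptions, three modules in the one-cord tray and four modules in the two-cord tray deliver the same 2400 W; the fourth module buys AC-source redundancy rather than extra capacity.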

The quantity of AC power modules is determined by the type of power tray and by the customer's overall power requirements. The AC power module converts 220 V, 50/60 Hz AC into 48 VDC.

The MGX 8230 implements a robust, distributed database scheme in which the configuration parameters are stored in service module local memory, active PXM1 hard disk, and standby PXM1 hard disk. This scheme ensures update efficiency and database consistency. It also includes a synchronization protocol between PXMs and SMs to recover from database inconsistency due to certain error conditions (such as switchover).

The architecture of multiple cell buses ensures that if one service module pulls one cell bus down, other cell buses can continue to operate without down time.

The PXM1 is fully redundant in a 1:1 configuration. While one of the PXMs serves as the active switching fabric, the backup serves as a hot-standby module. Upon the detection of a failure in the active PXM1, the hot standby takes over the switching fabric function in a completely nondisruptive manner. The PXM1 switchover time is between 15 and 30 ms.

There are several signals cross-coupled between the two PXMs. If the active PXM1 resets, the cross-coupled signals indicate to the standby PXM1 that it should take over mastership. Software polls the mastership logic periodically. Once it detects a hardware switchover, it starts running all the routines necessary to assume mastership.

All existing connections will be copied onto the new card. There are frequent exchanges between the active card and standby card to make sure that the standby card has all the information necessary to resume operation should the active card fail.
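The interaction described above (periodic state exchange, a polled mastership signal, then takeover) can be sketched as follows. This is a simplified model, not the PXM1 software: the class, the function, and the `peer_alive` flag standing in for the cross-coupled hardware signals are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Pxm:
    slot: int
    active: bool = False
    connections: dict = field(default_factory=dict)

    def sync_from(self, peer: "Pxm") -> None:
        """Copy the peer's connection table (models the frequent exchanges)."""
        self.connections = dict(peer.connections)


def poll_mastership(standby: Pxm, peer_alive: bool) -> bool:
    """Assume mastership if the (modeled) cross-coupled signal shows the peer is down."""
    if not standby.active and not peer_alive:
        standby.active = True  # here the real card would run its takeover routines
        return True
    return False


active = Pxm(slot=1, active=True, connections={"conn-1": "up"})
standby = Pxm(slot=2)
standby.sync_from(active)                               # routine exchange while healthy
took_over = poll_mastership(standby, peer_alive=False)  # active card has reset
print(took_over, standby.connections)
```

Because the standby already holds a copy of the connection table, the takeover step itself only has to change roles.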

The T3/E3 and SONET interface ports on the service modules can be configured to provide a 1:1 redundancy on each service module. Additionally, the SRM can be configured to provide up to 1:N redundancy for the narrowband service modules (through the redundancy bus).

AC Power Shelf (AC systems only)

The AC power module includes the following power design features:

3U rack mountable (two shelves may be required per system)

Hot pluggable AC/DC power modules (1200W capacity each)

ORing diodes in each power module

EMI filtering in each power module

Cooling fans in each power module

Circuit breaker(s)/switch(es)

2N (dual AC line input) for redundancy

Redundancy for DC

In the DC system, the 48 VDC is supplied through either one or two power-entry modules (PEMs). The PEMs will be plugged into the midplane through the same connectors as the AC power supply. Each PEM has a circuit breaker for protection. The DC power range is -42 to -56 VDC.

Switchover Mechanism

The MGX 8230 backplane supports the same distribution bus employed in the MGX 8220. This is used in conjunction with the SRM-3T3 to provide M13 circuit breakout and distribution capability, as well as T1/E1 1:N service module redundancy (in bulk mode, the service modules have 1:N redundancy without using the separate T1 redundancy bus).

Nonbulk Mode Distribution

Nonbulk mode distribution is a mode of operation where individual T1 lines are directly connected to the line module of each front card. During normal nonbulk mode operation, the T1/E1 data flow is from the service module's line module to its front card and vice versa. The line modules also contain isolation relays that switch the physical interface signals to a common redundancy bus under SRM-3T3 control in case of service module failure.

In nonbulk mode, upon the detection of a failure in any of the service modules, the traffic destined for the failed service module is carried over the redundancy bus to the active SRM on its shelf. Thus, each active SRM provides redundancy for a maximum of 8 service modules per shelf.

When a service module failure is detected, the PXM1 initiates a switchover to the standby service module. The relays on the failed service module's line module (all T1/E1s) are switched to drive the signals onto the T1/E1 redundancy bus. The designated standby card's line module (controlled by the SRM-3T3) receives these signals on the T1/E1 redundancy bus. The data path switches from the failed service module's line module to the T1/E1 redundancy bus to the line module of the standby service module, and finally to the standby service module itself.
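The nonbulk data path after a switchover can be written out step by step. The sketch below is only a model of the sequence described above; the `LineModule` class, its relay flag, and the slot numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LineModule:
    slot: int
    relay_to_bus: bool = False  # False: signals go to the card's own front card


def nonbulk_switchover(failed: LineModule, standby_slot: int) -> list:
    """Flip the failed card's isolation relays and return the resulting data path."""
    failed.relay_to_bus = True  # physical signals now drive the redundancy bus
    return [
        f"line module, slot {failed.slot}",
        "T1/E1 redundancy bus",
        f"line module, slot {standby_slot}",
        f"service module, slot {standby_slot}",
    ]


failed_lm = LineModule(slot=3)
path = nonbulk_switchover(failed_lm, standby_slot=6)
print(" -> ".join(path))
```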

Bulk Mode Distribution

Bulk distribution is a mode of operation in which individual lines are not brought to the service modules; instead, these lines are multiplexed into a few high-speed lines attached to the SRM. The SRM then takes this "bulk" interface, extracts the lines, and distributes them to the service modules. Any cards served by this bulk interface can participate in 1:N redundancy without using the separate redundancy bus. Any T1 in a T3 line can be distributed to any of the eight ports on a service module in any slot of the service bay without restriction.

During bulk mode operation, the SRM-3T3/B unbundles T1 data from the incoming T3s and sends it to each service module. Any slot can be used to process T1 data or to house a standby service module. When a service module fails, the PXM1 will initiate a switchover to the previously configured standby module. The SRM-3T3/C will then redirect the recovered T1 traffic to the designated standby module. The switching takes place inside the SRM-3T3/C and requires no special back cards or cabling. The data path to the standby module is still via the distribution bus; the redundancy bus is NOT used in bulk mode.
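One way to picture the bulk-mode redirect is as a change to a distribution map held inside the SRM. The sketch below assumes such a map exists purely for illustration; the data layout, function name, and slot numbers are hypothetical and do not describe the SRM's internal design.

```python
def redirect_to_standby(mapping: dict, failed_slot: int, standby_slot: int) -> dict:
    """Return a new T1 distribution map with the failed slot's T1s moved to the standby."""
    return {
        t1: (standby_slot if slot == failed_slot else slot, port)
        for t1, (slot, port) in mapping.items()
    }


# (T3 line, T1 number within that T3) -> (service module slot, port)
distribution = {
    (1, 1): (3, 1),
    (1, 2): (3, 2),
    (2, 1): (4, 1),
}

after = redirect_to_standby(distribution, failed_slot=3, standby_slot=5)
print(after)
```

Only the entries that pointed at the failed slot change, which mirrors the point that the switch happens inside the SRM and needs no recabling.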

Hot Standby

The switching fabric is configured in a 1:1 redundancy. Of the two PXM1s in the edge concentrator, one PXM1 serves as the active switch fabric while the other serves as a hot standby for the active PXM1. Switching service modules can be done in a nondisruptive manner.

Software Upgrades

Nonservice-affecting software upgrades enable the system to gracefully upgrade software or add new features without disrupting services. New features are received without interrupting service delivery or application performance.

Logical Connections

All the narrowband service modules are connected logically to the PXM1 and the SRM via the cell bus. Therefore, the failure of any single service module does not impact other interface cards.

Performance

The MGX 8230 availability performance is 99.999%. The primary RAS measure for the MGX 8230 product is "minutes of connection outage." Availability, the proportion of time that connections are available for customer use, is usually expressed as a percentage: 99.999% is equivalent to about 5.26 minutes of unavailability per year. It is also measured in defects per million (DPM) connection hours, where 10 DPM = 99.999%.
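The relationship between the availability percentage, yearly outage minutes, and DPM can be checked numerically. The sketch below assumes a 365-day year; the function names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Minutes of outage per year implied by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def dpm(availability_pct: float) -> float:
    """Defects per million connection hours implied by an availability percentage."""
    return (1 - availability_pct / 100) * 1_000_000


print(round(downtime_minutes_per_year(99.999), 2))  # 5.26
print(round(dpm(99.999), 3))                        # 10.0
```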

Automatic Protection Switching

Automatic Protection Switching (APS) is a means to provide redundancy on SONET equipment to guard against hardware failures. There are three modes of APS defined in GR-253 and ITU-T G.783: APS 1+1, APS 1:1, and APS 1:N. All three modes require that after any failures have been detected, switching from the working equipment to the protection equipment must be initiated in 10 ms and completed in 50 ms (for a total of 60 ms).

In an APS 1+1 implementation, a redundant protection line exists for every working line. Traffic protected by the redundancy is carried simultaneously by the working and protection lines. The receiver terminating the APS 1+1 must select cells from either the working or protection line and be able to forward one consistent traffic stream. Both working and protection lines transmit identical information; therefore, the receiving ends can switch from one to the other without coordination with the transmit end. If the working (or active) fiber optic cable fails, the protection fiber is selected at the SONET layer. In full compliance with standards, the K1 and K2 bytes are utilized for this signaling.
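Because both lines carry the same traffic, the receive-side choice in APS 1+1 reduces to a local selector. The sketch below is a minimal model of that idea, assuming nonrevertive behavior (the selector stays on the protection line after a switch); the names are hypothetical, and K1/K2 signaling is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Line:
    name: str
    signal_fail: bool = False


def select_line(working: Line, protection: Line, current: Line) -> Line:
    """Stay on the current line unless it has failed and the other line is healthy."""
    if not current.signal_fail:
        return current
    other = protection if current is working else working
    return other if not other.signal_fail else current


working = Line("working")
protection = Line("protection")

selected = select_line(working, protection, working)
working.signal_fail = True  # e.g. a fiber cut on the working line
selected = select_line(working, protection, selected)
print(selected.name)  # protection
```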

APS redundancy will be supported on PXM1 OC-3 and OC-12 interfaces. Of the modes mentioned above, only APS 1+1 is supported. The cross-coupling between adjacent slots permits this type of redundancy. In the event of a link failure, the main processor subsystem that controls the PXM1 switches away from the failing port and activates the backup port. The processor performs this function intelligently based on the alarm status.