Cisco MDS 9500 Series Multilayer Directors

Cisco MDS 9000 Family Offers the Only High-Availability Director on the Market Today

The Need for High Availability in Data Center Director-Class Switches

For today's organizations, high availability is essential, especially in industries with strict compliance and regulatory requirements such as financial services, legal services, and government. In an enterprise SAN, director-class switches are expected to be the most robust and reliable components; failure to provide high availability can mean a significant loss of revenue, productivity, and reputation.

Objective of the Test

A test was performed to evaluate the high-availability capability of SAN directors when subjected to a fabric module failure. A fabric module is responsible for switching traffic between ports on the director. A SAN director usually has two fabric modules for high-availability purposes, so that even if one fails, traffic is not completely interrupted. The test simulated a fabric module failure while traffic was applied through the director; it then computed the frame loss and the forwarding outage time per port.
In this document, "fabric module" refers to the Fabric Module 2 for the Cisco MDS 9513 Multilayer Director, to the Supervisor 2 for the Cisco MDS 9506 and 9509, and to the Core Routing Module for the competitor's 8-Gbps director.

Test Setup

Test Equipment Configuration

• Cisco MDS 9513 and 9509

– 8-Gbps port pairs configured across two 48-port 8-Gbps modules

• Competitor's SAN director

– 8-Gbps port pairs configured across two 32-port modules

– Exchange-based routing (default policy)

• Agilent SAN Tester traffic generator

– 99% load, with total traffic applied: approximately 6400 MBps

Test Procedure

• Cisco MDS 9513

– Power off a Fabric Module 2, remove it, and then reinsert it.

– Remove a Fabric Module 2 without powering it off first; then reinsert it.

• Cisco MDS 9509

– Remove the active Supervisor 2 module without powering it off first; then reinsert it.

– Remove the standby Supervisor 2 without powering it off first; then reinsert it.

• Competitor's SAN director

– Power off the Core Routing Module using the slider switch; then power it on using the slider switch.

The competitor's fabric module cannot be removed without powering it down first, so no direct removal test was performed for the competitor's director.

Results

Cisco

• When Cisco MDS 9513 Fabric Module 2 is removed (with or without powering off)

– No frames dropped on any port

• When Cisco MDS 9509 (active or standby) Supervisor 2 Module is removed (with or without powering off)

– No frames dropped on any port

Competitor

• When Core Routing Module is turned off

– Total frames dropped: 1,990,130

– 0.85 second outage (the number of frames dropped divided by frames per second [fps])

• When Core Routing Module is turned on

– Total frames dropped: 1,626,790, counted after the internal diagnostics had run (about 30 seconds) and the Core Routing Module was brought online

– 0.69 second outage (the number of frames dropped divided by fps)
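
The outage figures above are derived by dividing the number of dropped frames by the frame rate. The short Python sketch below illustrates that arithmetic. The frame rate itself is not stated in this document, so the sketch back-calculates an implied rate from the reported values; the function names and the implied rate are illustrative assumptions, not measured test data.

```python
def outage_seconds(frames_dropped: int, frames_per_second: float) -> float:
    """Forwarding outage time: frames dropped divided by the frame rate (fps)."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return frames_dropped / frames_per_second


def implied_fps(frames_dropped: int, outage_s: float) -> float:
    """Back-calculate the frame rate implied by a reported loss and outage.

    Assumption: the reported outage was obtained exactly as
    frames_dropped / fps, as the results text describes.
    """
    if outage_s <= 0:
        raise ValueError("outage_s must be positive")
    return frames_dropped / outage_s


# Values reported in the competitor results above.
REPORTED = {
    "Core Routing Module turned off": (1_990_130, 0.85),
    "Core Routing Module turned on": (1_626_790, 0.69),
}

if __name__ == "__main__":
    for label, (dropped, outage) in REPORTED.items():
        fps = implied_fps(dropped, outage)  # implied, not a measured value
        print(f"{label}: {dropped:,} frames / {fps:,.0f} fps "
              f"= {outage_seconds(dropped, fps):.2f} s")
```

Both reported results imply a rate on the order of 2.3 million frames per second, which suggests that the two outage figures were computed against roughly the same offered load.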

Real-World Application Environment

This test was repeated to observe the implications for real-world applications of the lack of high availability in the competitor's director. Instead of using the Agilent SAN Tester to generate the test data and measure the frame loss, an Oracle simulation was run, using HammerOra to generate the test data and measure the results while the Core Routing Module was turned off and on. This test allowed evaluation of the effects of frame loss at the application layer. The test configuration consisted of one server running the Oracle test script against an Oracle database on an array.
Figures 1, 2, and 3 show HammerOra screens. Figures 1 and 2 show that turning the competitor's Core Routing Module off and on with the slider switch to simulate a failure results in an application outage lasting longer than 30 seconds, which is not acceptable for real-time applications such as Oracle.

Figure 1. Competitor Core Routing Module Powered Off: 34-Second Application Outage

Figure 2. Competitor Core Routing Module Powered On: 33-Second Application Outage

The same test was run on the Cisco MDS 9509 and 9513 and had no effect on the application. Figure 3 shows the result of removing a Fabric Module 2 from the Cisco MDS 9513. In this test, the Cisco MDS 9000 Family directors met the high-availability requirements of the application.

Figure 3. Cisco MDS 9513 Fabric Module 2 Removed: No Effect on Application

Summary

The Cisco MDS 9000 Family offers the only high-availability director on the market today. The competitor's director lacks high availability: in the case of fabric module failure, frames are dropped on multiple ports, which is not acceptable for a director-class switch. Both the Cisco MDS 9513 and 9509, however, proved to be robust and reliable, with no frame drops upon fabric module failure.
Lack of high availability is one more item to add to the list of design flaws in the competitor's director:

• Unpredictable and unbalanced performance (throughput and latency)

• Local switching limitations

• Blocking architecture

• Lack of arbitration and fair servicing of ports

• Lack of high availability, resulting in unacceptable application outages

For More Information