Cisco on Cisco
Storage Networking Case Study: How Cisco Multilayer Director Switch Operates with Other SAN Switches During SAN Migration
Cisco Multilayer Director Switch offers safe migration path from other vendor SAN switches.
Through the 1990s, many applications at Cisco relied on direct-attached storage (DAS). “DAS is fine when you are small,” says Jesse Adam, a system administrator for Cisco engineering. “But as Cisco grew and acquired other companies, DAS became difficult to manage because of its lack of scalability.”
Specifically, moves or upgrades required IT staff to physically visit each device, and adding storage required shutting down the server. The proliferation of different types of storage across Cisco offices worldwide made it difficult to identify the source of performance problems, such as slow disks, different firmware versions, or different storage types.
DAS also prevented Cisco from optimizing utilization. If an application needs additional storage capacity with DAS, the only option is to add an entire new disk, even if only a fraction of its capacity is needed. “We wanted a new approach to storage that would let us grow in flexible increments,” says Adam. “We also wanted to be able to consolidate storage so that we could allocate it without being constrained by the physical location of the application or storage device.”
Utilization, in fact, was a chief incentive for shifting away from DAS. In 2001, just 1.35 percent of Cisco servers operated at or above 85 percent utilization, while 42.4 percent operated at or under 20 percent utilization. (Chiefly, Cisco uses Sun Microsystems servers running the Solaris operating system and HP servers running HP-UX, in addition to a few servers running Linux or Microsoft Windows 2000.) Storage frames, too, were often underutilized, usually because of port density issues at the disk adapter, host bus adapter (HBA), or switch port level. (Most storage frames at Cisco are EMC Symmetrix 8530 and 8830 or HP XP512.)
At the end of 2001, Cisco began transitioning from its DAS model, in which individual business units owned storage devices that were directly connected to the host machine, to a shared SAN environment. By September 2002, Cisco IT had deployed more than 55 SAN switches from McDATA and Brocade in three large business data centers, for a total of 1400 ports. Buoyed by the success of the DAS-to-SAN migration in business data centers, engineering data center administrators at Cisco began migrating their storage to SAN environments using the McDATA Enterprise Director 6064 switch at the core and McDATA Sphereon 3032 switches at the edge. “The total cost of ownership for the SAN was at least 12 percent lower than for DAS,” says Lance Perry, vice president of IT Customer Strategy and Success.
Ultimately, Cisco deployed approximately 100 third-party SAN switches in more than 10 business and engineering data centers.
When Cisco introduced the Cisco MDS 9000 Series Multilayer Director Switch, the company wanted to take advantage of its better scalability, higher performance, and virtual SAN (VSAN) capabilities. Cisco MDS 9509 multilayer director switches offer up to 224 Fibre Channel ports on a single chassis, compared to 64 ports on the largest McDATA switch, the 6064. With VSANs, Cisco could support multiple SANs from a single switch, eventually making the physical location of hosts and storage irrelevant.
Instead of replacing all its McDATA switches with Cisco MDS 9000 Series multilayer switches at once, Cisco engineering first replaced only McDATA 6064 core switches with the more powerful Cisco MDS 9509 multilayer director switches. The engineering SAN continued to use McDATA 3032 switches at the edge, operating with Cisco MDS 9509 multilayer director switches in interoperability mode. “Many companies start out with lower-end McDATA switches and now need a higher-end switch like the Cisco MDS 9509,” says Adam. “We realize that some of them might not want to make the capital outlay to replace all their McDATA switches at once. We decided to run our engineering SAN in interoperability mode for several months so that we could learn the advantages and disadvantages for our customers who would be making the same migration.”
The plan: In mid-2003, in a Cisco engineering building in San Jose, California, IT would install two Cisco MDS 9509 multilayer director switches at the core and connect them to the existing McDATA 3032 edge switches and the storage arrays. After six months of running the SAN in interoperability mode, Cisco would replace the McDATA 3032 edge switches with Cisco MDS 9120 multilayer fabric switches to gain additional benefits. The technical challenge would be to manage the transition without interruption to Cisco business-critical applications that rely on the SAN.
Cisco wanted to test interoperability on a SAN that supports a mission-critical application with high performance demands. Its choice: IBM Rational ClearCase, which thousands of Cisco developers rely on daily as their source-code library to develop Cisco IOS® Software. “ClearCase is a demanding application with thousands of transactions per second, low latency requirements, and critical availability requirements,” says Dave Angulo, Cisco program manager for engineering storage.
To ensure business continuity in case of hardware, software, or network failures, the Cisco IT group designed the SAN with complete redundancy (Figure 1). Each of the two Cisco MDS 9509 multilayer director switches would connect to two different HBA cards on the ClearCase host using two existing McDATA 3032 edge switches. Each Cisco MDS 9509 Multilayer Director Switch would have multiple paths to the HP XP 1024 disk array. The Cisco MDS 9509 multilayer director switches have dual backplanes, dual power supplies, and dual supervisor engine modules, with stateful process failover and restart. “We designed the SAN with complete redundancy so we had no single point of failure, from the host to the edge and from the core to the disk array,” says Adam.
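The "no single point of failure" claim can be checked mechanically. The following is a minimal sketch, not the method Cisco IT used: it models the dual-fabric layout as a list of links and tests whether the host can still reach the disk array after any one intermediate component fails. All node names (`hba-a`, `edge-a`, `core-a`, and so on) are hypothetical labels chosen for illustration.

```python
from collections import deque

# Hypothetical node names for a dual-fabric layout: one host, two HBAs,
# two edge switches, two core switches, one disk array.
LINKS = [
    ("host", "hba-a"), ("hba-a", "edge-a"), ("edge-a", "core-a"), ("core-a", "array"),
    ("host", "hba-b"), ("hba-b", "edge-b"), ("edge-b", "core-b"), ("core-b", "array"),
]


def reachable(links, start, goal, failed=None):
    """Breadth-first search from start to goal, never entering the failed node."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in neighbors.get(node, ()):
            if nxt != failed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(links, start="host", goal="array"):
    """Return the intermediate nodes whose loss alone cuts start off from goal."""
    nodes = {n for link in links for n in link} - {start, goal}
    return sorted(n for n in nodes if not reachable(links, start, goal, failed=n))
```

With both fabrics present, `single_points_of_failure(LINKS)` returns an empty list. Passing only the first four links (a single fabric) reports every component on that path.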
Cisco migrated the first core switch in June 2003 and the second in September 2003.
To cut over, Cisco IT configured the Cisco MDS 9509 multilayer director switches, disconnected the McDATA core switches from the storage array and edge switches, and finally reconnected the storage array and edge switches to the new core Cisco MDS switches. Cisco accomplished the transition in two phases, one Cisco MDS 9509 Multilayer Director Switch at a time. The phased migration ensured that the host continued operating without interruption during the transition (Figure 1).
Figure 1. Two-Phased Migration from McDATA 6064 Core Switches to Cisco MDS 9509 Multilayer Director Switches
Configuring the Cisco MDS Switch
Configuring the Cisco MDS 9509 multilayer director switches for the production environment required establishing logins and security settings, and configuring the firmware. “Cisco IT storage administrators built an operating template that saves time during configuration,” says Hagen Finley, an HP consultant who worked on the project. “We started with that and then cut and pasted a lot of the functionality in terms of logins, ‘phone home,’ and other parameters.”
When it was time to deploy the second Cisco MDS 9509 Multilayer Director Switch in September 2003, three months after the first core switch, Finley took advantage of a unique feature of Cisco MDS 9000 Series multilayer switches: the ability to export configuration information to a text file, edit it for the second switch on a PC, and then import the edited configuration file back to the switch. “We saved a lot of time,” Finley says. In fact, configuring both switches took one person approximately eight hours, including planning.
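The export, edit, and import workflow amounts to substituting a handful of switch-specific values in a text file. A minimal sketch of the editing step follows, assuming the exported configuration is plain text. The sample lines, the switch names, and the addresses are illustrative placeholders, not an actual export from the switches in this case study.

```python
import re

# Illustrative placeholder text, not a real exported configuration.
SAMPLE_EXPORT = (
    "switchname mds-core-1\n"
    "interface mgmt0\n"
    "  ip address 10.0.0.1 255.255.255.0\n"
)


def adapt_config(exported_text, replacements):
    """Return a copy of an exported configuration with per-switch values swapped.

    replacements maps a value from the first switch's file to the value the
    second switch should use. Only whole tokens are replaced, and all
    substitutions happen in a single pass so one replacement cannot feed another.
    """
    if not replacements:
        return exported_text
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![\w.-])(" + "|".join(re.escape(k) for k in keys) + r")(?![\w.-])"
    )
    return pattern.sub(lambda m: replacements[m.group(1)], exported_text)
```

For example, `adapt_config(SAMPLE_EXPORT, {"mds-core-1": "mds-core-2", "10.0.0.1": "10.0.0.2"})` yields text for the second switch while leaving shared settings untouched. Reviewing the result by hand before importing it remains advisable.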
The next step was to shut down the first network fabric. Like most SAN storage environments, the Cisco environment consists of a dual fabric. Each host on the SAN has two HBAs connected to two separate McDATA edge switches in different fabrics. Each edge switch, in turn, is connected to one of the core switches, which connects to a storage array. Therefore, shutting down one fabric does not affect host access to storage. “We sent out notification that we would be shutting down one of the fabrics, and avoided peak times,” says Finley. The fabric was down for less than six hours while Finley and his team accomplished the following procedures:
- HP disconnected the McDATA Intrepid 6064 Director switch.
- HP connected the Cisco MDS 9509 Multilayer Director Switch to the HP XP 1024 storage array.
- HP changed the modality of the McDATA 3032 edge switches from “McDATA fabric” to “Open Fabric.” This configures the McDATA switches for interoperability.
- HP validated the domain IDs of the McDATA edge switches. Only a subset of domain IDs work in interoperability mode because the McDATA switch supports fewer domain IDs than the Cisco MDS switch.
- HP checked for zoning conflicts. Edge switches have zoning tables that cannot easily be deleted. When those edge switches are connected to the Cisco MDS 9509 Multilayer Director Switch, it propagates its own zones. “We had to validate that the proper MDS zone had been propagated and that the old zone definition was not interfering,” says Finley.
With these steps complete, the McDATA 3032 edge switches were prepared to interoperate with the Cisco MDS 9509 Multilayer Director Switch.
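The zoning check in the procedure above compares two zone tables. As a rough sketch, assume each table has been captured as a mapping from zone name to a set of member identifiers; how the tables are read from the switches is outside this example, and the zone names and member values in the usage note are hypothetical.

```python
def zone_conflicts(core_zones, edge_zones):
    """Compare two zone tables given as {zone_name: iterable_of_members}.

    Returns (mismatched, stale):
      mismatched: zone names defined on both switches with different members
      stale:      zone names that exist only on the edge switch
    """
    shared = core_zones.keys() & edge_zones.keys()
    mismatched = sorted(
        name for name in shared
        if set(core_zones[name]) != set(edge_zones[name])
    )
    stale = sorted(edge_zones.keys() - core_zones.keys())
    return mismatched, stale
```

An empty result for both lists suggests the old edge definitions are not interfering with the zones propagated from the core. Any name in either list is a candidate for manual review before the ISL is connected.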
Finley connected the McDATA edge switches to the Cisco MDS switch one at a time, using Inter-Switch Links (ISLs), which are physical fiber connections. The team confirmed the connection in two ways. One was to check that the Cisco MDS 9509 Multilayer Director Switch acknowledged the connection. For extra assurance, a team member logged onto a UNIX server on the storage path to confirm that the host could see its storage. “The first switch worked perfectly the first time,” says Finley. Initially, the second switch did not work, and the team quickly identified the problem: a higher domain ID than McDATA switches permit.
Companies that use dynamic domain IDs instead of static domain IDs avoid this problem. Cisco IT chose to use static domain IDs to maintain unique domain IDs throughout the company. Currently, it is necessary only to confirm that all domain IDs within a single data center SAN are unique. When Cisco eventually interconnects all its data centers and SANs, the decision to maintain unique domain IDs will prevent potential domain ID conflicts. A change in domain IDs immediately solved the problem.
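A pre-cutover check on static domain IDs could catch this kind of problem before the ISLs are connected. The sketch below tests two conditions: every ID falls inside the range permitted in interoperability mode, and no two switches share an ID. The default range of 97 to 127 is an assumption drawn from general interoperability-mode guidance, not from this case study, and should be confirmed against the switch documentation. The switch names are hypothetical.

```python
# Assumption: interoperability mode accepts domain IDs 97 through 127 (0x61-0x7F).
INTEROP_LOW, INTEROP_HIGH = 97, 127


def check_domain_ids(switch_domains, low=INTEROP_LOW, high=INTEROP_HIGH):
    """Validate static domain IDs given as {switch_name: domain_id}.

    Returns a list of human-readable problems; an empty list means the IDs
    are unique and all fall inside the inclusive range low..high.
    """
    problems = []
    owners = {}
    for name, domain in sorted(switch_domains.items()):
        if not low <= domain <= high:
            problems.append(f"{name}: domain ID {domain} outside {low}-{high}")
        owners.setdefault(domain, []).append(name)
    for domain, names in sorted(owners.items()):
        if len(names) > 1:
            problems.append(f"domain ID {domain} shared by {', '.join(names)}")
    return problems
```

Running the same uniqueness check across the domain IDs of every data center, not just one SAN, matches the company-wide policy described above.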
Because of thorough planning, the transition to a mixed Cisco MDS and McDATA SAN was achieved without service interruption. “When we took out one core switch, the host did not falter and the clients did not even notice,” says Angulo.
The Cisco engineering SAN ran without problems in interoperability mode from June 2003 to January 2004. At that point, Cisco declared the interoperability demonstration a success and replaced its McDATA 3032 edge switches with Cisco MDS 9120 multilayer fabric edge switches to gain the benefits of a SAN based exclusively on Cisco MDS switches. During the interoperability phase, Cisco improved scalability and utilization and achieved its availability goals. Soon it will take advantage of the powerful VSAN capability of MDS 9000 Series multilayer switches (see “Next Steps”).
The higher port density of the Cisco MDS 9509 Multilayer Director Switch (32 ports per blade and up to 224 Fibre Channel ports per chassis) reduces costs, improves utilization, and frees data center floor space. In fact, Cisco engineering plans to add 50 Cisco MDS 9216 multilayer fabric edge switches and will be able to support all 60 edge switches with only two Cisco MDS 9509 multilayer director switches, rather than the four to six McDATA 6064 switches that would have been needed.
The scalability of the MDS 9509 Multilayer Director Switch also ensures optimal resource utilization, says Angulo. “Suppose the ERP environment runs out of ports but the engineering environment has free ports,” he says. “In the past we could not utilize the available resource. But with the MDS 9509 Multilayer Director Switch, we can plug the ERP hosts into the free ports and configure them as part of the ERP VSAN.” To participate in a VSAN, the Cisco MDS 9509 Multilayer Director Switch can reside anywhere in the Cisco campus metropolitan-area network (MAN), including a different data center.
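Angulo's example, assigning free ports to the ERP VSAN regardless of where they sit physically, can be pictured as a port-to-VSAN mapping. The sketch below is a conceptual model only, not switch configuration syntax. The port labels and VSAN names are hypothetical, and a value of `None` marks a free port.

```python
def assign_ports(port_vsan, ports, vsan):
    """Return a new {port: vsan} mapping with the given free ports placed in vsan.

    Raises ValueError if a requested port already belongs to a VSAN.
    """
    updated = dict(port_vsan)
    for port in ports:
        if updated.get(port) is not None:
            raise ValueError(f"{port} already belongs to VSAN {updated[port]}")
        updated[port] = vsan
    return updated


def can_communicate(port_vsan, port_a, port_b):
    """Two ports can exchange traffic only when both are in the same VSAN."""
    vsan_a = port_vsan.get(port_a)
    return vsan_a is not None and vsan_a == port_vsan.get(port_b)
```

Starting from `{"fc1/1": "engineering", "fc1/2": None, "fc2/1": "erp"}`, calling `assign_ports(ports, ["fc1/2"], "erp")` puts the free port into the ERP VSAN. That port can then reach `fc2/1` but not the engineering port on the same blade.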
The two Cisco MDS 9509 multilayer director switches have remained available without interruption since they were deployed. Their high availability is a result of redundant supervisor engines, fully stateful supervisor engine failover, redundant crossbars, hitless software upgrades, individual process restart ability, and process isolation within VSANs. The ClearCase host experienced no service interruption during migration, and no application outages since then have been attributable to the Cisco MDS 9000 Series multilayer switches. “Everything stabilized almost immediately after we brought up the second fabric,” says Finley.
Finley notes that the high availability of Cisco MDS 9509 multilayer director switches is especially critical when companies transition from a DAS to a SAN environment. In a DAS environment, losing a Fibre Channel connection between a host and storage typically affects only one set of applications, for one business group. In contrast, in a SAN environment, losing connectivity between hosts and storage can shut down applications and business groups throughout the data center.
Currently, all hosts and storage in the Cisco engineering SAN reside on the same VSAN, and Cisco is not yet taking full advantage of VSAN capabilities. If administrative traffic volume begins to affect performance, Cisco can solve the problem by creating multiple VSANs on the same switch, each with its own broadcast domain. For Cisco customers, an advantage of VSANs in a mixed MDS-McDATA environment is that a single Cisco MDS 9509 Multilayer Director Switch chassis can support some VSANs in interoperability mode while other VSANs retain normal functionality. This is important because a few features of the Cisco MDS 9509 Multilayer Director Switch do not function in interoperability mode; for example, trunking, which involves aggregating I/O so that multiple cables appear as one. With the McDATA switch, in contrast, interoperability mode applies to the entire switch or not at all.
“It is extremely important to plan the cutover, because if something breaks, you affect many hosts,” says Adam. For example, Cisco planned the fiber network topology so that the team could replace the two core switches one after the other instead of both at once. This was good insurance for a smooth transition, even if one of the fabrics had experienced a problem.
Finley emphasizes the importance of checking that the McDATA domain IDs are in the allowed range for interoperability mode and of checking for zoning conflicts between the Cisco MDS switches and McDATA switches. It is also important to know that when Cisco MDS 9000 Series multilayer switches run in interoperability mode with switches from other vendors, features are limited to the intersection of the feature sets of the two switches. Notably, the sophisticated VSAN and network management features available on Cisco MDS switches do not work in an interoperability environment. After swapping out its remaining McDATA edge switches for Cisco MDS 9120 multilayer fabric edge switches in January 2004, Cisco gained unrestricted VSAN capabilities, greater scalability, and significantly better network management.
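The "intersection of the feature sets" rule maps directly onto a set operation. The sketch below illustrates the idea only. The feature labels in the usage note are placeholders chosen for illustration, not vendor feature lists.

```python
def usable_features(*switch_feature_sets):
    """Return the features available fabric-wide in interoperability mode.

    Each argument is an iterable of feature labels for one switch; the result
    is the set of labels common to every switch.
    """
    if not switch_feature_sets:
        return set()
    return set.intersection(*(set(features) for features in switch_feature_sets))
```

For instance, `usable_features({"zoning", "isl", "vsan", "trunking"}, {"zoning", "isl"})` returns only `{"zoning", "isl"}`, which is why replacing the last third-party edge switch restores the full feature set.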
Cisco IT plans to continue expanding the SAN in the engineering data center in San Jose Building 5, adding hosts and storage frames as needed. The capacity is unprecedented. Each pair of Cisco MDS 9120 multilayer fabric edge switches can support 16 hosts. With 40 half-rows in the data center, that is 640 hosts in the same SAN (Figure 2). “The enabling technology for this massive scalability is the VSAN support in Cisco MDS switches,” says Angulo. “Without VSAN capability, the administrative traffic from only a fraction of the hosts would overwhelm the SAN.”
Each storage frame can house 80 terabytes, and with two storage frames the SAN can serve 180 terabytes, or more than 10 percent of all Cisco storage.
Figure 2. Planned Architecture for Engineering Data Center: Capacity for 640 Hosts.
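The 640-host figure follows from the two numbers quoted in the text. The short sketch below makes the arithmetic explicit. It assumes one pair of edge switches per half-row, which the text implies but does not state outright.

```python
HOSTS_PER_EDGE_PAIR = 16  # from the text: each pair of edge switches supports 16 hosts
HALF_ROWS = 40            # from the text: 40 half-rows in the data center


def san_host_capacity(half_rows=HALF_ROWS,
                      hosts_per_pair=HOSTS_PER_EDGE_PAIR,
                      pairs_per_half_row=1):
    """Total hosts the planned SAN can attach.

    Assumption: pairs_per_half_row defaults to 1 (one edge-switch pair per half-row).
    """
    return half_rows * pairs_per_half_row * hosts_per_pair
```

Changing `half_rows` or `hosts_per_pair` shows how the ceiling moves if the floor layout or the edge switch model changes.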
When Cisco begins consolidating its storage and hosts across business units to reduce capital costs, improve resource utilization, and simplify management, VSANs will be invaluable in separating traffic from different business units. That is, VSANs will enable Cisco to isolate its business logically instead of physically. A single Cisco MDS 9509 Multilayer Director Switch can support multiple VSANs; for example, one for tape backups, another for engineering, and another for enterprise resource planning (ERP). Isolation will be increasingly important when Cisco combines business verticals into one infrastructure, because Cisco does not want ERP systems to see traffic associated with the engineering system, both to eliminate unnecessary administrative traffic and to isolate the respective systems from problems the others might experience. “Traditionally, the engineering SAN would need a physical connection to access the tape backup devices,” says Angulo. “But with this setup, you have connectivity at a virtual abstract layer. Our eventual goal is to have massive storage, in any location, that any host can provision as needed.”
Other upcoming steps include installing Cisco MDS-based SANs in engineering data centers in Bangalore, India; South Netanya, Israel; Research Triangle Park, North Carolina, USA; and Boxborough, Massachusetts, USA. SANs are near completion in Cisco business data centers in other locations. It is a decision made confidently, based on success. “This is not a lab,” says Angulo, “it is a real-world, P1 environment. If something did not work, thousands of engineers at Cisco headquarters in San Jose would stop working and development of Cisco IOS Software would grind to a halt. The 100-percent redundant deployment gave us the confidence we needed to migrate.”