Cisco on Cisco
Data Center Case Study: How Cisco IT Achieves Consolidation and Standardization in the Data Center
Server and data storage virtualization enhances Cisco business resilience and lowers TCO.
Cisco IT’s Data Center planners have developed a challenging, long-range vision of where the Cisco data center should be over the next five years. This vision will guide Cisco IT’s testing and pilot deployment of several new technologies, some of them from Cisco Systems, and some from other vendors. The goal of this new vision is a data center that automatically allocates, at any moment, to any application, the optimal storage and processing resources from shared resource pools.
Cisco operates five enterprise production data centers of 36,000 square feet dedicated to non-engineering tasks, such as enterprise resource planning (ERP), customer relationship management (CRM), customer issue resolution, human resources support, supply chain support, and Cisco internal Websites for employee support, customer support, partner support, and business-to-business tools. Two business data centers are located in San Jose, California; the others are located in Research Triangle Park (RTP), North Carolina; Amsterdam; and Sydney. In addition, there is a Linksys data center in Irvine, California (Figure 1). Cisco engineering groups also support product development data centers around the world.
Figure 1. Cisco Data Centers
These enterprise production data centers support the business processes that keep Cisco business operating. The processes are automated by several hundred applications, some of them purchased from vendors and some developed or customized by Cisco. These applications process vast amounts of data following business rules developed within Cisco over many years. Supporting all these applications and data is a complex infrastructure of servers and storage, and the networks that connect servers, storage, and people to each other, to bring the right information to the right person quickly and efficiently.
Focusing more on the data center infrastructure domain, Cisco IT divides the data center infrastructure into four physical layers (Figure 2):
Figure 2. Data Center Architecture
- An access layer, connecting the data center to the rest of the company intranet and to the Internet to provide customer access to e-commerce Web servers as well as provide remote VPN access to Cisco employees and partners. This layer provides more than just network interconnectivity; it also provides secure authentication and firewall services, VPN access, Secure Sockets Layer (SSL) termination, intrusion prevention, content caching and switching, application traffic monitoring, and other communications services. The Cisco IT access layer is built upon Cisco Catalyst® 6500 Series switches, and many of these communications services are supported by blades (like the Cisco PIX® Firewall Services Module [FWSM], the SSL Services Module [SSM], and others).
- A Web server layer supports client-facing applications for employees, customers, manufacturing partners, and business-to-business portals. This layer also provides additional firewall capabilities, including packet encryption and protocol translation. These applications are supported on a variety of servers, running primarily Sun Solaris, Microsoft NT, Linux, and HP-UX operating systems. Because these servers must be reachable by people over the Internet, they are located in a less secure area of the network that Cisco IT calls “the dirtynet,” since traffic within this public part of the network is not trusted within the corporate intranet.
- An applications server layer consists of a variety of middleware application services and the underlying application services, protected behind another layer of firewalls. These applications are also supported by a wide range of servers that run a variety of operating systems.
- Behind the application server layer is the database and storage layer, where more than 3900 TB of data are stored. Most data is stored on large shared storage frames connected to application servers over a storage area network (SAN). Cisco migrated its initial direct-attached storage (DAS), with storage dedicated to individual servers, to a SAN environment, with storage pooled to serve applications within individual business areas. This SAN environment was initially based on smaller SAN switches, but Cisco IT replaced them with larger and more sophisticated Cisco Multilayer Director Switches (see the case studies at http://www.cisco.com/web/about/ciscoitatwork/data_center/index.html).
Figure 3. Horizontal and Vertical Areas of Cisco IT Group Responsibility
Over time, Cisco IT organized the components of the data center in two different ways: vertically, for applications support; and horizontally, for hardware and OS support (Figure 3). Application developers were organized to support the dozens of different vertical customer organizations within Cisco. They were funded separately by many different product engineering groups within Cisco, and these groups were also responsible for developing applications based on the unique requirements of their separate organizations. Hardware and OS support teams were organized horizontally, to support the network and the servers and to perform systems administration functions based on operating support systems. Each application became a silo, supported by a discrete set of network, computing, and storage resources.
This system worked well for years, but over time it began to show serious problems. Despite Cisco IT’s work to develop and enforce hardware, OS, and application development standards, each application environment differed significantly from the other environments.
This silo model is inefficient, expensive, and difficult to manage for several reasons:
- Vertical, per-application infrastructures were often built with proprietary point products, making it virtually impossible to standardize management practices;
- It was difficult to understand the behavior of every application and difficult to ensure consistent security, so applications were potentially left vulnerable to attack;
- Maintenance, management, and training requirements for such a diverse environment significantly increased the total cost of ownership (TCO) of each application;
- Applications cannot scale easily when they are built on isolated islands of computer and storage resources, and scaling resources often requires downtime;
- Processing and storage resources cannot be used efficiently because they are permanently assigned to a single application and cannot be scaled to meet the changing needs of an application;
- Silo-based infrastructures lead to poorly managed recovery for some applications. Recovery policies are often out of line with the business objectives of the company;
- Divergent, nonstandard applications made it extremely difficult for the various applications to communicate and share data with each other. This significantly reduced Cisco IT’s ability to flexibly integrate systems and build new applications based on existing application platforms.
Ultimately, the need to manage and build isolated islands for each application cost Cisco IT too much in resources and flexibility, and a solution was required. “Often, more than 80 percent of IT budgets go toward maintaining existing applications. That doesn’t leave much money for new growth areas such as new applications and new business processes,” says Pierre-Paul Allard, vice president, Enterprise Marketing, Cisco Systems.
Underutilization of resources is a major factor in high TCO. “Typically we see about a 20 percent utilization for servers and bandwidth. Direct-attached storage typically has 25 percent utilization,” says Allard.
As a result, enterprises are advised to transform their data centers into standards-based, high-performance compute-processing environments that power and integrate a wide range of hardware and services, including applications, servers, storage, and security.
Like many enterprises, Cisco is challenged to control the costs and reduce the inefficiencies of maintaining a number of nonstandard, siloed business application environments. To mitigate these challenges, Cisco has embarked on a three-stage reengineering process that, when complete, will result in a service-oriented data center (SODC) in the US and Canada, and smaller regional centers in the Asia Pacific and Japan theater and the Europe theater.
These high-performance processing environments will power a wide range of applications and services, and will be optimized for reliability, availability, and serviceability. Management will take place through an Intelligent Management Fabric that will automatically provision and set storage service levels based on business unit requirements.
After years of bringing intelligence and functionality to local and global networks, Cisco began extending this added value into the data center, adding new intelligent capabilities that bring far more power and flexibility to business processes. “This is a significant opportunity for Cisco customers. Cisco has devoted a lot of time and talent and money to bring intelligent information networking into the storage and server and application area. This is good news for the Cisco IT data center,” says Sidney Morgan, Cisco IT’s SODC manager.
The ambitious program is designed to meet these strategic business objectives:
1. Lower TCO. Cisco spends roughly 25 percent of its IT budget on data centers. And of the data center budget, approximately 50 percent is spent on storage and storage solutions.
According to a study by Gartner Research, “seriously reducing the cost of IT services is dependent on changing the model of computing from a non-shared to a shared model.” Cisco IT is working to lower TCO by consolidating data centers, by more closely monitoring current lifecycle management processes, and by establishing and enforcing data center standards of architecture and design. Cisco IT’s move toward a Linux and Windows/x86 standard for servers is one such drive toward standards; another is Cisco IT’s migration to Oracle 11i business tool integration.
2. Enhance business agility. Businesses succeed by being able to react quickly to marketplace changes and technology improvements, and rapidly deliver the right information where and when it is needed. But today’s disparate, complex infrastructures are both difficult to manage and change, and often incapable of meeting marketplace demands. Cisco IT is working toward delivering an on-demand utility, providing storage and processing resources to each application only when they are needed, and returning these resources to a single shared pool when they are no longer in use. Cisco IT’s goal is to provision new processing or storage resources to meet an application’s new requirements within 30 minutes, rather than the three months this process can currently take if storage and other needed components are not already ordered. Eventually, IT expects to be able to provide application support for different platforms, operating systems, and storage environments on demand, allowing the Cisco business units to deploy the technologies that make most sense for the business at that time. This goal requires a lot of flexibility from the underlying network; Cisco IT is already provisioning 10 Gigabit Ethernet connections within the data center to support a wider range of throughput, and using coarse wavelength-division multiplexing (CWDM) on the campus network to interconnect multiple data center SANs on campus into one larger SAN pool over Fibre Channel over IP (FCIP).
“For business agility, you need flexible and standards-based infrastructures that support new initiatives and technologies such as Web services, grid computing, virtualization, and automation,” says Morgan.
3. Improve business continuance. Downtime can be expensive, disruptive, and even catastrophic. Business continuance addresses both natural and man-made disruptions or disasters, and establishes mechanisms whereby network recovery does not impact the ability to do business. This includes traditional disaster recovery planning, an integrated security solution, and designing for high availability. Cisco IT’s disaster recovery planning currently requires maintaining two separate large data center locations in the United States, and connecting the two pooled SANs into a single SAN using FCIP over a CWDM link on the WAN. Critical e-commerce databases are duplicated across that SAN, allowing the backup data center to continue processing customer orders should the primary data center fail. Cisco IT’s integrated security solution is a many-layered threat detection and defense against directed and indiscriminate attacks on data-center resources, which also supports secure data transport and secure user and machine trust and identity management. Cisco IT’s high-availability design uses a high degree of redundancy in processors, power supplies, servers, storage and network links, as well as automatic load balancing of physical resources, transport traffic, and sessions so that losing any single link, server, or disk will have no impact on availability. Load balancing will take place at all levels, so every component is utilized or can be utilized at any time, but losing any single component will not affect application availability. This will also maximize the utilization of expensive resources, and costs will decrease.
“A loss of manufacturing operations for one hour costs Cisco between [US]$47,000 and $100,000 an hour, and it gets exponentially higher the longer services are down,” Morgan says.
“The goal at Cisco is to create a highly automated, services-based, secure virtual environment where all resources are allocated through an intelligent network fabric based on structured business goals,” says Sidney Morgan.
Figure 4. Cisco IT Data Center Redesign Phases
Cisco is undertaking a radical data center redesign with the goal of creating a single high-performance compute-processing environment powering a wide range of applications and services, including security, application optimization, and management. Although the redesigned data center has not yet been fully realized, the foundational technologies are being established on the way toward complete consolidation, virtualization, and dynamic provisioning.
Cisco is undertaking the redesign in three phases (Figure 4).
CONSOLIDATION PHASE. Starting with isolated resource islands and disparate networks, the first evolutionary step is consolidating isolated computing and storage islands with enterprisewide networks. One example of this is Cisco IT’s move to consolidate data center resources into fewer physical locations. Another is the consolidation of storage resources into single SANs, using virtual SANs (VSANs) to allow consolidation of SAN islands onto a single fabric while ensuring scalability and security.
VIRTUALIZATION PHASE. Virtualization allows computing, network, and storage resources to be dynamically partitioned, provisioned, and assigned with ease to different applications. Cisco IT has nearly completed the process of pooling storage resources into a data center shared SAN, and is about to start pooling processing resources in the same way. Logical server partitioning, blade servers, and application-aware load-balancing services are all part of this phase. Storage resources are pooled into one or a few shared storage resource pools. Server processing resources are pooled into one or a few shared processing resource pools. This virtualization improves agility and makes it easier for the data center to keep up with changing business conditions. Resource virtualization requires the support of intelligent networks that are aware of applications and can respond to changing conditions to optimize the performance of each application. Content switching and application-oriented networking are examples of application integration.
AUTOMATION PHASE. The final step is flexible service automation, which allows an intelligent network fabric to rapidly and automatically detect and respond to the applications’ changing needs, and to provision processing and storage and security resources as needed. Automated service provisioning, automated security responses, and self-healing systems are the cornerstones of this phase. Automating these processes would not only speed Cisco IT’s response to new customer needs, but it would also significantly lower the TCO by reducing the need for manual intervention and the underutilization of allocated processing and storage resources. It would also improve reliability by reducing complexity and the need for human intervention.
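The pooling idea behind the virtualization and automation phases can be illustrated with a small model. The following is a minimal sketch, not Cisco IT's provisioning system: the `ResourcePool` class, the application names, and the capacity units are all hypothetical, and the sketch only shows how capacity returned by one application becomes available to another.

```python
from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    """A shared pool of interchangeable capacity units (hypothetical model)."""
    capacity: int
    allocations: dict = field(default_factory=dict)

    @property
    def free(self) -> int:
        # Capacity not currently held by any application.
        return self.capacity - sum(self.allocations.values())

    def allocate(self, app: str, units: int) -> bool:
        """Grant units to an application only if the pool can cover the request."""
        if units <= 0 or units > self.free:
            return False
        self.allocations[app] = self.allocations.get(app, 0) + units
        return True

    def release(self, app: str) -> int:
        """Return everything an application holds to the shared pool."""
        return self.allocations.pop(app, 0)


pool = ResourcePool(capacity=100)
pool.allocate("order-entry", 60)
pool.allocate("reporting", 30)
denied = not pool.allocate("forecasting", 20)  # only 10 units are free
pool.release("reporting")                      # reporting no longer needs capacity
granted = pool.allocate("forecasting", 20)     # the same request now fits
```

In a silo model the 30 units held by the reporting application would stay stranded after it finished; in the pooled model they are immediately reusable.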
Cisco IT is currently providing new data center functions within the consolidation phase. Applications, databases, and SANs have been combined into an intelligent network architecture with a scalable foundation infrastructure. “The intelligent infrastructure supports security, delivery optimization, manageability, and availability from end to end,” says Allard. “We can tell our constituents that we can continue to provide or even improve their service level agreements [SLAs] on this consolidated infrastructure.” The network intelligence allows isolation of computing and storage components to ensure that a disruption to one application environment will not affect the other applications.
What differentiates the SODC from other architectures is the integration of higher-layer application services into the network. Offloading these capabilities from expensive server and storage resources improves overall performance and resource utilization, and gives Cisco more flexibility in choosing systems that best support each application within a consolidated infrastructure. These capabilities include data replication and distribution, virtualization, and intelligent application services such as file serving.
The physical architecture (Figure 5) is composed of storage, servers, and the network. At one time, Cisco IT storage was mostly disks directly attached to servers (direct attached storage, or DAS), but IT has migrated most of this storage to large shared SAN storage frames and NAS filers. Cisco IT currently uses a wide range of different types of servers with different OSs, but is planning a migration toward a limited standard of one-rack-unit servers and stackable blade servers, running either Linux or Windows. The network, based on the standard Cisco Catalyst 6500 Series, will support a wide array of blade-based shared services, like packet-level content switching, SSL encryption, stateful inspection firewalling, and application-aware content switching and communications services. Built into this architecture is Cisco IT’s security architecture: part physical, and part policy and practice.
Figure 5. SODC Physical Architecture
Currently (August 2005), Cisco manages 3.9 petabytes of storage (a petabyte is equal to 1024 terabytes). “Disk is relatively cheap, but storage and storage management is expensive,” says Scott Zimmer, Cisco IT senior manager, Enterprise Storage. “Meanwhile, growth is still rampant; and data center space is at a premium. Our storage utilization, currently at 34 percent capacity, is a serious problem at Cisco, but also presents potentially enormous return on investment [ROI]. We estimate that if we can drive up utilization of storage 10 percent a year to our goal of 70 percent utilization through storage pooling and virtualization, we can save $10,000,000 annually on our way to our ultimate goal of $30,000,000 in deferred costs.”
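The utilization figures in this quote can be worked through with a little arithmetic. The sketch below assumes that "10 percent a year" means ten percentage points per year and that the amount of data actually stored stays constant; neither assumption is stated in the source, and the function name is illustrative.

```python
def years_to_target(current_pct: float, target_pct: float, gain_per_year: float) -> int:
    """Count whole years until utilization reaches or passes the target."""
    years = 0
    while current_pct < target_pct:
        current_pct += gain_per_year
        years += 1
    return years


total_pb = 3.9          # storage under management, from the text
current_util = 0.34     # current utilization, from the text
target_util = 0.70      # utilization goal, from the text

years = years_to_target(34, 70, 10)          # 34 -> 44 -> 54 -> 64 -> 74
used_pb = total_pb * current_util            # data actually in use today
raw_pb_at_target = used_pb / target_util     # raw capacity the same data would need at 70%
```

Under these assumptions the goal is reached in the fourth year, and the data stored today would fit in roughly 1.9 PB of raw capacity instead of 3.9 PB, which is the kind of deferred purchase the quote refers to.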
Utilization is measured in different ways by different people. In Cisco IT, utilization is measured at each step of the storage life cycle, and overall utilization is the share of physical storage that has been configured for use, addressed for use, made available in a logical volume, allocated to an application, and then actually used by that application. This utilization has increased from 20 percent to about 34 percent in 2005 (Figure 6). This improvement in utilization is partly because of improved technology (with the pooling and virtualization capabilities of the Cisco MDS switch), and partly through improved process (refer to the Cisco IT best practices "Data Storage Utilization" at http://www.cisco.com/web/about/ciscoitatwork/data_center/data_storage_utilization.html ).
Figure 6. Storage Utilization in the Storage Life Cycle
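One way to read the life-cycle measurement described above is as a chain of ratios, each stage expressed as a fraction of the stage before it, so that overall utilization is their product. The sketch below illustrates that reading only; the per-stage fractions are invented so that the result lands near the reported 34 percent, and they are not Cisco IT measurements.

```python
import math

# Hypothetical fraction of the previous stage that survives each step.
stage_fractions = {
    "configured for use": 0.90,
    "addressed for use": 0.90,
    "available in a logical volume": 0.85,
    "allocated to an application": 0.80,
    "actually used by the application": 0.62,
}

# Overall utilization: used storage as a share of physical storage.
overall = math.prod(stage_fractions.values())

# The stage that loses the most capacity is the first candidate for improvement.
weakest_stage = min(stage_fractions, key=stage_fractions.get)

print(f"overall utilization: {overall:.1%}")
print(f"largest loss at stage: {weakest_stage}")
```

Because the stages multiply, a modest loss at every step compounds into a low overall figure, which is why measuring each step separately is useful.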
DAS is inefficient because it strands capacity behind servers. It is difficult to manage and difficult to back up. Many businesses have deployed SANs to address this problem, but they have often deployed isolated departmental SAN islands due to the difficulties of scaling and securing SAN fabrics. Resource utilization is often low, with stranded capacity, and inconsistent backup and data recovery practices prevent many organizations from achieving true high availability. The solution is consolidation of DAS and SAN islands into an enterprisewide intelligent SAN fabric.
Cisco IT is focusing on creating a consolidated storage utility using Cisco end-to-end storage networking solutions, including the Cisco MDS 9509 Multilayer Director Switch, which delivers multiple layers of intelligence, including multiprotocol support (Fibre Channel, iSCSI, FCIP), VSANs, embedded diagnostics, and role-based security. With these innovations, companies can build highly scalable, available storage networks with comprehensive security and unified management.
The strategy allows Cisco to significantly lower overall TCO by providing storage when needed, as needed, and at appropriate service and cost levels. The strategy also provides these additional benefits:
- More ports to support multiple paths between servers and storage
- Nondisruptive upgrades
- Advanced troubleshooting and diagnostics
- VSANs to segregate traffic and management
- Relief from data center crowding: multiprotocol support allows primary storage to be located in remote data centers
- Cost reduction: SAN consolidation using VSANs increases storage utilization by sharing each storage frame among many servers and business units
- Intelligent SAN fabric services: virtualization, continuous data protection, and replication
- Improved performance: a fully nonblocking architecture and intelligent traffic management (quality of service [QoS], Fibre Channel congestion control)
Figure 7. Storage Architecture Goal
The goal is to create a minimal number of large, automated, fully networked, tiered storage pools with no physical ties between hosts, applications, and storage (Figure 7). “A storage network must be scalable, must support different interfaces and protocols, and must be easily manageable,” says Zimmer. While storage can be pooled and storage resources can be identical, Cisco IT is planning to provide varying service levels at varying prices to internal clients, labeled here as platinum, gold, silver, and bronze service levels.
Currently Cisco operates more than 8350 servers supported by about 120 system administrators, for a ratio of approximately 70 servers for every system administrator. These disparate servers use different operating systems and software environments. This is very difficult to maintain, and is an inefficient use of processing resources that results in unused capacity.
The goal by the end of Cisco FY2006 is to increase the ratio to 300 servers for every system administrator by standardizing on two platforms: Linux as the primary platform and Windows for applications that will not run on Linux x86 servers.
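The administrator ratios quoted above follow from simple division. The sketch below reproduces the current ratio from the figures in the text and shows what the 300:1 target would imply if the server count stayed at 8350, which is an assumption made only for illustration.

```python
import math

servers = 8350       # from the text ("more than 8350 servers")
admins = 120         # from the text ("about 120 system administrators")
target_ratio = 300   # FY2006 goal, servers per administrator

current_ratio = servers / admins                    # just under 70 servers per administrator
admins_at_target = math.ceil(servers / target_ratio)  # administrators needed at the target ratio
servers_at_target = admins * target_ratio           # servers the current staff could support

print(f"current ratio: {current_ratio:.1f} servers per administrator")
print(f"administrators needed at {target_ratio}:1 -> {admins_at_target}")
print(f"servers supportable by {admins} administrators -> {servers_at_target}")
```

Read either way, the target implies that each administrator covers more than four times as many servers, which is only practical when the servers are standardized.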
“This does not necessarily mean bigger hardware, it means modular hardware with consolidated management, like blade servers,” Allard says. “Consolidated management allows the sharing of resources between applications. This allows much lower total cost of ownership and better utilization. It is also easier to maintain.”
Cisco is also virtualizing its server infrastructure, using tools to take a single server and turn it into multiple virtual servers for small and temporary application needs, as well as turning a large pool of servers into a single virtual server for large engineering projects. A virtual server infrastructure enables unprecedented levels of workload isolation, and detailed resource control for all of the system’s computing and I/O resources. Virtual infrastructure integrates well with existing system management software and improves return on investment in shared storage (SAN). By consolidating physical systems in the data center onto virtual servers of nearly any size as soon as they are needed, Cisco hopes to:
- Lower hardware acquisition and maintenance costs using off-the-shelf commodity processors
- Consolidate idle system resources
- Increase operational efficiency
- Create cost-effective and consistent standard production environments
The network is a critical enabler for server consolidation: it allows a single server to carry more traffic and more applications. The intelligent network must also be able to route traffic to the most appropriate server, no matter where it might be at any point in time. “There must be a very reliable, high-speed, scalable network to support this, and session-based load sharing and failover must be possible,” says Morgan.
The network is changing to meet the challenges of the evolving data center. “We’re moving beyond the client-server models of the past toward a single, consolidated data center built on a resilient, intelligent network fabric that becomes the foundation for a new data center architecture,” says Wilson Ng, Cisco network engineer.
Figure 8. Layer 2 Data Center Gateway Architecture
Some of the foundational elements for the resilient, flexible, scalable, and service-oriented data center require a Layer 2 network architecture. With a Layer 2 network architecture, the physical server locations can be separated from the logical grouping of the servers (Figure 8). This flexibility promotes virtualization by allowing servers to be logically moved to various application environments without having to be physically moved. Application farms can request a pool of computing resources logically from the network fabric.
“As a result, we can extend a VLAN to any switch, port, or server that needs to be a member of the application farm,” says Ng. This intelligent infrastructure supports security, delivery optimization, manageability, and availability from end to end, and helps Cisco IT improve its SLAs.
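The separation of physical location from logical grouping can be modeled in a few lines. This is a conceptual sketch only, not switch configuration: the server names, switch names, and VLAN IDs are hypothetical. It shows that an application farm is defined by VLAN membership, so a server changes farms without changing the switch it is cabled to.

```python
from collections import defaultdict

# Hypothetical inventory: (server, physical switch, VLAN of its application farm)
inventory = [
    ("web-01", "switch-a", 110),
    ("web-02", "switch-c", 110),
    ("erp-01", "switch-a", 220),
    ("erp-02", "switch-b", 220),
]


def farms_by_vlan(servers):
    """Group servers into application farms by VLAN, ignoring physical location."""
    farms = defaultdict(list)
    for name, switch, vlan in servers:
        farms[vlan].append((name, switch))
    return dict(farms)


def move_server(servers, name, new_vlan):
    """Reassign a server to another farm by changing only its VLAN."""
    return [(n, sw, new_vlan if n == name else vlan) for n, sw, vlan in servers]


moved = move_server(inventory, "web-02", 220)
farms = farms_by_vlan(moved)
```

After the move, `web-02` is still attached to `switch-c` but appears in the farm on VLAN 220 alongside servers on two other switches.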
A Layer 2 architecture allows the centralization of the network service model, designed to virtualize and distribute Layer 4–7 network services such as load balancing and SSL acceleration to applications in an entire data center.
In 2003, Cisco made the decision to consolidate service-oriented appliances. Over time, Cisco IT has deployed new or improved services such as content switching (using the Content Switching Module [CSM]), stateful firewalling (using the FWSM), SSL acceleration (using the SSM), and network traffic analysis (using the Network Analysis Module [NAM]). Each of these module appliances consolidates multiple functions into one device, and enables a more flexible and secure deployment of applications within the data center. Cisco IT is also preparing for production deployment of the Application-Oriented Network (AON) blade, to add even more power and flexibility to the data center.
The AON blade examines communication streams within the network and performs intelligent routing and information processing on those streams. Cisco IT is already testing AON blades to provide middleware functions, providing security support with (for example) SSL transport-level encryption termination, XML payload encryption, protocol translation (for example HTTP to JMS), digital signature (for strong authentication), and “dirtynet”-to-application layer secure connection (like SSH or STA). Cisco IT is also testing the use of AON for message- and transaction-level logging and monitoring. In the future, Cisco IT will be exploring the use of AON to perform intelligent, application-aware routing, taking load balancing to a new level. One application of this intelligent load balancing that Cisco IT is testing is “service versioning”: directing user sessions to the right server applications based on the version of the client application running on the user PC. Other possible examples of AON-based load balancing under consideration are routing messages based on content or business rules, such as routing customers to premium servers based on the size or type of product request or order they are submitting.
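The routing decisions described above, service versioning and content-based routing, can be expressed as a small rule function. The sketch below is a generic illustration and does not use or represent the AON programming interface; the message fields, backend host names, and order-value threshold are all hypothetical.

```python
# Hypothetical backend pools keyed by the client major version they serve.
BACKENDS = {
    1: "app-v1.example.internal",
    2: "app-v2.example.internal",
}
DEFAULT_BACKEND = "app-v2.example.internal"
PREMIUM_BACKEND = "app-premium.example.internal"
PREMIUM_ORDER_THRESHOLD = 100_000  # assumed order value, not from the source


def route(message: dict) -> str:
    """Pick a backend from message content instead of network address alone."""
    # Business rule: large orders go to the premium servers.
    if message.get("order_value", 0) >= PREMIUM_ORDER_THRESHOLD:
        return PREMIUM_BACKEND
    # Service versioning: match the backend to the client application's version.
    major = str(message.get("client_version", "")).split(".")[0]
    if major.isdigit():
        return BACKENDS.get(int(major), DEFAULT_BACKEND)
    return DEFAULT_BACKEND


old_client = route({"client_version": "1.4"})
big_order = route({"client_version": "2.0", "order_value": 250_000})
unknown = route({})
```

The point of placing such rules in the network is that they can be changed in one place, without modifying each client or server application.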
“We used to have silos of service-oriented appliances, such as local directors and content service switches,” says Ng. “We had scores of these across the network, and it took a lot of management to manage each device for each application.” Currently, Cisco has aggregated nearly 70 percent of all such services to virtualized modules, he explains.
“We are able to share these services with any application group that has service requirements without having to provision new hardware for each application. As a result, management and provision is much easier.”
“The network intelligence allows isolation of computing and storage components to ensure that a disruption to one application environment will not affect the other applications. They retain control while enterprises reduce TCO, improve resilience, and improve the agility of their data center environments. The architectural approach to the data center network delivers consistent services to meet business goals,” says Allard.
The flexibility of intelligent networking will allow Cisco IT to provide more stringent layers of security to protect critical data. An intelligent network can subdivide a shared processor and storage service into any number of virtual separate data centers. Cisco IT is working to separate highly sensitive critical data, such as financial data and sales forecasting data, from day-to-day operational data. Critical data will be protected by additional layers of firewalling, more sensitive intrusion prevention systems (IPSs), stricter restrictions for access, and detailed logging and monitoring of user access to and manipulation of this critical data.
In its journey toward a service-oriented architecture, Cisco is demonstrating the value of consolidation and standardization in its own data center, starting with its storage network. Consolidating its storage resources into a high-performance, scalable SAN, Cisco saves $225 million per year over its previous infrastructure. Its cost per gigabyte of storage is much lower than the industry average. Cisco IT’s total cost of ownership (TCO) for storage was US$0.12 per MB of storage in 2002, when the initial migration to SAN was complete. After the migration to MDS technology, that TCO dropped to US$0.075 per MB in 2004 (refer to the Cisco IT best practices "Storage Area Network ROI" at http://www.cisco.com/en/US/about/ciscoitatwork/storagenetworking_op.html ). With better pooling of storage, TCO fell further in 2005, to only US$0.034 per MB. This reduction in cost comes from two sources: the improvement in storage capacity of hard disks, and the improved ability of systems administrators to use the MDS switch management systems to manage disk storage across multiple storage frames and storage virtual SANs (VSANs).
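The per-megabyte TCO figures above imply the following percentage reductions. The sketch uses only the three values quoted in the text; the helper name is illustrative.

```python
# Storage TCO in US dollars per MB, as quoted in the text.
tco_per_mb = {2002: 0.12, 2004: 0.075, 2005: 0.034}


def pct_drop(old: float, new: float) -> float:
    """Percentage reduction from old to new."""
    return (old - new) / old * 100


drop_2002_2004 = pct_drop(tco_per_mb[2002], tco_per_mb[2004])  # about 37.5 percent
drop_2004_2005 = pct_drop(tco_per_mb[2004], tco_per_mb[2005])  # about 54.7 percent
drop_overall = pct_drop(tco_per_mb[2002], tco_per_mb[2005])    # about 71.7 percent

for label, value in [
    ("2002 to 2004", drop_2002_2004),
    ("2004 to 2005", drop_2004_2005),
    ("2002 to 2005", drop_overall),
]:
    print(f"{label}: {value:.1f}% lower TCO per MB")
```

The larger single-year drop from 2004 to 2005 is consistent with the text's attribution of the later savings to better pooling in addition to falling disk prices.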
The SODC at Cisco is delivering other improvements in resource utilization. Its application architecture is migrating toward standardized messaging, Web services, and other technologies to optimize its investments in application development, ensuring a smoother integration of applications and business processes from end to end.
Enterprises should start by identifying their main initiatives and strategies for the data center. The next step is developing a data center network strategy that includes the network stakeholders. These include the network architects, storage architects, security architects, and server and application architects, as a complete team.
“Don’t forget to include the business stakeholders. Organizational change is often more difficult than technical. And you need senior level buy-in to succeed,” says Morgan. “Cisco has created a formal Cisco-on-Cisco program run by a senior director to take the message to senior-level management. And that message is: your people no longer control physical boxes, but we can speed up application builds and improve service level agreements.”
Cisco is integrating and virtualizing all of its application and storage environments. The front end of the network uses virtual LAN technology, and the back end uses virtual SAN technology. These technologies are similar—they allow logical division of application and storage resources. These will work with virtualization technologies such as grid computing, server and storage clustering, and virtual machines to help align resources with application requirements.
The ultimate goal is complete automation of the network and eventually the entire IT infrastructure. Automation aligns performance requirements with available resources. The network offers automation in the form of self-defending features, self-provisioning, and self-monitoring. With status-based interfaces, the network infrastructure can be integrated into policy engines and existing management systems to provision the infrastructure.