Cisco on Cisco
Business of IT Case Study: How Cisco IT Implemented Organizational Change and Advanced Services for Operational Success
New organizational framework greatly improves operations.
Given today’s pressing need to optimize IT services and resources while reducing costs and improving organization-wide productivity, the Cisco lifecycle methodology offers the framework needed to make operations more efficient and responsive. Cisco IT Network and Data Center Services (NDCS) changed from using a traditional organizational model to Cisco’s own lifecycle model, with substantial operations improvements across five different metrics.
This case study describes Cisco IT’s internal infrastructure, a leading-edge enterprise IT environment that is among the largest and most complex in the world.
An enterprise with 300 locations in 90 countries, Cisco has 46 data centers and server rooms supporting the 65,000-plus employees. Fourteen of the data centers/server rooms are production or customer-facing and 32 are used for product development.
Figure 1. NDCS Pre-existing Traditional Model. With two separate service organizations, there was much duplication and lack of focus.
Like most IT organizations of large enterprises, Cisco IT used a traditional siloed organizational structure, with staffers doing both implementation and operational work and often having to drop operational projects to complete deployments. With this arrangement, there was much duplication of effort and a lack of focus across the organization, and in many cases employees were unaware that the duplication existed. The original organizational model included regional network teams and regional voice teams that were responsible for all aspects of implementing and operating their environments and services (Figure 1).
Cisco IT’s Network and Data Center Services (NDCS) organization needed focus. NDCS engaged Cisco Advanced Services’ Network Availability Improvement Services organization (NAIS) to identify the areas that needed to be changed and recommend how to proceed.
The charter of Cisco Advanced Services NAIS is to leverage Cisco and industry network leading practices to achieve a highly available, reliable operations infrastructure. NAIS assesses and remediates the people, process, and tools needed to mitigate operational risk and network complexity by running an Operational Risk Management Analysis (ORMA). The ORMA is a Cisco support deliverable that outlines a roadmap for operational excellence and availability via a best-practice approach to network design, tools, process, and expertise. Cisco Advanced Services NAIS bases the identification and ongoing improvement of best practices upon its ongoing support experience, industry guidance, and the accepted Cisco network design principles for all networks demanding high availability. Over the past eight years, NAIS has worked with more than 300 customers, evaluating the critical areas of:
- Managing Service Support
- Managing Change
- Managing Service Performance
- Managing Service Resiliency
- Staffing and Expertise
NAIS begins the process by interviewing business and IT leaders and senior engineers, and then gathers technical, process, tools, and organizational documents and templates. After an assessment of the current state, NAIS outlines a detailed remediation plan to achieve business and availability goals, and prepares an achievable vision and roadmap.
After the ORMA was completed in 2006, it was apparent to Cisco Vice President of IT NDCS John Manville that organizational changes were needed to drive the team to provide the additional scalability and agility that Cisco’s business required. “The Network and Data Center organization could not accommodate the kind of growth and technology evolution that Cisco and Cisco IT were expecting,” says Manville. “The existing resources were not structured to support this, and there was significant duplication of work and processes. These would likely be strained, possibly to the breaking point, with even a minimal amount of growth.”
It was time to think outside of the traditional IT “box” and restructure the organization to accommodate rapidly changing IT needs. The processes had to be consolidated and simplified, and communication and collaboration vehicles were needed. However, a change of this nature was not inconsequential; it would have a ripple effect throughout Cisco IT’s data centers and across its global operations.
Restructuring Cisco’s IT NDCS group solved the business problem. In the second quarter of Cisco’s fiscal year 2008 (the fourth quarter of calendar year 2007), Manville restructured NDCS to map to Cisco’s own lifecycle business model, which Cisco Services typically uses for customer network implementations. With more than 400 employees in NDCS, this was a substantial restructuring.
Figure 2. Cisco Lifecycle Methodology. Cisco IT NDCS now uses this framework for its organizational structure.
The Cisco lifecycle methodology (Figure 2) comprises six closely related phases: Prepare, Plan, Design, Implement, Operate, and Optimize. The lifecycle phases are implemented as follows:
Prepare phase: Business agility starts with preparation: anticipating the broad vision, requirements, and technologies needed to build and sustain a competitive advantage. In the Prepare phase, the organization determines a business case and financial rationale to support the adoption of new technology. By carefully anticipating future needs and developing both a technology strategy and a high-level architecture to meet those needs, a business is better equipped to contain costs during deployment and operations.
Plan phase: Successful technology deployment depends upon an accurate assessment of the organization’s current network, security state, and overall readiness to support the proposed solution. In the Plan phase, the organization ascertains whether it has adequate resources to manage a technology deployment project to completion. To evaluate and improve network security, the IT department tests its network for vulnerability to intruders and outside networks. IT then develops a detailed project plan to identify resources, potential difficulties, individual responsibilities, and the critical tasks necessary to deliver the final project on time and on budget.
Design phase: Developing a detailed design is essential to reducing risk, delays, and the total cost of network deployments. A design aligned with business goals and technical requirements can improve network performance while supporting high availability, reliability, security, and scalability. Day-to-day operations and network management processes need to be anticipated, and, when necessary, custom applications need to be created to integrate new systems into existing infrastructure. The Design phase can also guide and accelerate successful implementation with a plan to stage, configure, test, and validate network operations.
Implement phase: A network is essential to any successful organization, and it must deliver vital services without disruption. In the Implement phase, the organization works to integrate devices and new capabilities in accordance with the design, without compromising network availability or performance. After identifying and resolving potential problems, the organization attempts to speed return on investment with an efficient migration and successful implementation, including installing, configuring, integrating, testing, and commissioning all systems. After the network operation is validated, the organization can begin expanding and improving IT staff skills to further increase productivity and reduce system downtime.
Operate phase: Network operations represent a significant portion of IT budgets, so it is important to be able to reduce operating expenses while continually enhancing performance. Throughout the Operate phase, the IT department proactively monitors the health and vital signs of the network to improve service quality, reduce disruptions, mitigate outages, and maintain high availability, reliability, and security. By providing an efficient framework and operational tools to respond to problems, a company can avoid costly downtime and business interruption. Expert operations also enable an organization to accommodate upgrades, moves, additions, and changes, while effectively reducing operating costs.
Optimize phase: A good business never stops looking for a competitive advantage. That is why continuous improvement is a mainstay of the lifecycle. Optimization is the continuous process of planning, designing, and implementing incremental improvements to existing processes. Have business goals or technical requirements changed? Is a new capability or enhanced performance recommended? As the organization looks to optimize its network and prepares to adapt to changing needs, the lifecycle begins anew, continually evolving the network and improving results.
Cisco's new NDCS organization includes administration on both the front end (via the Program Office) and the back end (via the Business Office), and incorporates Cisco's Lifecycle Model (Figure 3).
Figure 3. NDCS New Lifecycle Model. Cisco IT NDCS's current organizational structure provides focus. While the original organization model included regional network and voice teams responsible for implementing and operating their environments and services, the new organizational model splits out the "Implement" phase from the "Operate" phase for both the network and voice areas.
Organizationally, the change involved moving some resources from the former Engineering and Operations teams to the new Implementations team. This was a key component of the restructuring and presented the opportunity for the new Operations team to focus on operations without the distraction of deployments. As well, other NDCS team members were able to concentrate on their specific areas of expertise. For example, in the former organization, there was a single Storage Engineering and Operations team, which handled the implementation, operation, and design of storage. In the new organization, this storage team has been dispersed into the Implementation, Operations, and Design teams.
The restructuring, together with the NAIS ORMA report, effected change in NDCS. Over the past two years, NDCS has deepened its relationship with Cisco Advanced Services, with significant results. Overall, the operational maturity comparison of 2006 to 2008 shows dramatic improvement in each of the five areas (Figure 4).
Figure 4. Operational Maturity Comparison. In two years, Cisco IT NDCS showed dramatic improvement in each of the five areas per a follow-up ORMA.
NDCS now has an accountable team identified and is shifting behaviors.
The Advanced Services group interacts regularly with NDCS. Jim Scaduto, service delivery executive, has experienced a dramatic difference: “Previously, most NDCS personnel had to balance design, operational, and architectural issues, which is practically impossible. Now the design engineers are less likely to be pulled out of meetings or are on call after their regular work hours for emergencies. Likewise, the Operations teams can concentrate on providing priority services to the clients. Now we know exactly who has the focus and expertise in a specific area. I used to have to invite multiple people to meetings because no one IT staff member had the specific expertise that was needed.”
Likewise, architects and engineers changed their alignment from a hardware perspective to a lifecycle perspective. When asked what they do, previously the response would be “storage” or “networking.” With the restructuring, the response would now be “design” or “implementation.”
The new organizational structure enables Cisco NDCS to proactively look for additional ways to improve efficiency in managing service support. To this end, experienced engineers are called onto the incident bridge to train newer staff members to handle incidents more quickly, which in turn reduces the impact time. Tools are being created to identify problems before they cause client impact. As well, the restructure has enabled NDCS to track service-level agreements (SLAs) for client support, freeing up more time to talk to clients and educate them on Cisco’s processes. Overall, this change increases awareness, improves communication, and raises customer satisfaction.
Before using the lifecycle methodology, NDCS had:
- An average of approximately 150 client-impacting incidents per quarter
- Total impacting outage duration of 1000-plus hours per quarter
- A defective root cause percentage consistently above 40 percent
The Cisco lifecycle methodology now provides a focus on operational excellence with these results:
- Incidents have decreased to approximately 70 per quarter
- The total impacting outage duration has been reduced to 300 impact hours per quarter
- The defective root cause percentage is now consistently below 10 percent
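The size of these improvements can be checked with simple arithmetic. The following is a minimal sketch in Python, not part of Cisco's tooling; the function name and the dictionary keys are illustrative assumptions, and the inputs are the approximate quarterly figures quoted in the two lists above. Because the original outage duration is given as "1000-plus" hours, the computed reduction for that metric is a lower bound. The defective root cause percentage is omitted because the text gives only bounds (above 40 percent, below 10 percent).

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the percentage decrease from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Approximate quarterly figures quoted in the text: (before, after).
# The key names are illustrative labels, not Cisco metric identifiers.
metrics = {
    "client-impacting incidents": (150, 70),
    "impacting outage hours": (1000, 300),  # "1000-plus" hours, so a lower bound
}

reductions = {
    name: percent_reduction(before, after)
    for name, (before, after) in metrics.items()
}

for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% reduction")
```

With these inputs, incidents fall by roughly 53 percent and total outage duration by at least 70 percent per quarter.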
Figure 5. Customer Satisfaction. Cisco’s IT customer satisfaction has gradually risen since the restructuring.
Meanwhile, Cisco NDCS customer satisfaction scores have risen steadily, reaching 4.856 out of a possible 5 (Figure 5).
The customer satisfaction improvement is due at least in part to improved case handling and to maintaining SLA levels of 90 percent (Figure 6).
Figure 6. Service-Level Agreements. Cisco’s percentage of cases that were satisfied within the allotted SLA timeframe has risen from 60 percent to 90 percent since the NDCS restructuring.
Lack of metrics and ineffective measurement were called out in both the Managing Service Support and Managing Service Performance areas. IT performance metrics are fundamental to achieving and maintaining business value. According to Martha Bohler, senior manager of NDCS Business Operations, “Cisco has found that the key to success is choosing a small number of metrics that are relevant to the business and have the most impact on business outcomes. For this reason, we’ve focused on measuring service level and operational excellence, the business value of data center investments, and IT availability.”
The increased focus in the individual areas has enabled the Business Operations team to institute a web-based NDCS “Dashboard Central” to track operational, organizational, and financial IT performance. Available on Cisco’s intranet, the Dashboard Central includes these individual dashboards:
- Global Traffic Overview (GTO) Dashboard: Summarizes Cisco’s global network traffic by platform, region, and protocol. It provides bandwidth usage statistics per the Telecom Management Office on the traffic types running over the WAN using NetQoS ReporterAnalyzer. Data can be searched by protocol header (of Cisco TelePresence™, Non-TelePresence™ video, close-captioned TV, YouTube, or Content Delivery Networks), and by region to obtain the WAN capacity information. Rolling three- and six-month summaries are available.
- Fleet Dashboard: Provides information on the "health of the network." It focuses on an ongoing hardware and software standardization effort for Cisco's IT production network.
- Operations Metrics Dashboard: Provides alliance incident and case data, including frequency, duration, defective root cause, SLAs, and customer satisfaction metrics.
- Service Metrics Dashboard: Provides service reports for case volume, distribution, and client experience per service.
- Data Center Metrics Dashboard: Offers data center utilization, consumption, and virtualization statistics. It can be searched by vice president, group, or operating system (e.g., Windows or Linux).
Figure 7. Incident Frequency and Duration. Both the frequency and duration of IT incidents have declined dramatically since the restructuring.
Together, the NDCS Dashboards provide multiple views of massive amounts of information that have tracked the results of IT NDCS’s restructuring (Figure 7 and Figure 8).
The frequency of outages dropped from 190 per quarter to 68. And because the number of incidents dropped, the total duration of incidents dropped from 6000 hours per quarter down to only 200 hours per quarter. Likewise, the severity of the cases dropped significantly.
Enabling the teams to focus had a tremendous impact on productivity and effectiveness. Shawn Shafai, IT manager, Network Services, says, “The new organizational structure gave us the opportunity to focus on our core operational work. Through this greater focus, we developed best practices and special purposed workflows to address opportunity areas. Our critical metrics quickly displayed the positive results from these changes. Consequently, the outstanding results started consistently being delivered quarter after quarter.”
Figure 8. RCA Case Status. The overall number of incidents decreased from upwards of 200 per quarter, down to 60. Also lowered were the number of Severity 1 and 2 cases, leaving primarily only the more easily resolved Severity 3 cases.
The restructuring led to a number of positive results:
- The team can now spend more time training and mentoring.
- The creation of “focus areas” within the team has enabled sub-teams to tackle specific service areas that require attention.
- The team developed a strategy around proactive operations, executed with matching team processes.
- The team has nurtured relationships with its peers within the new NDCS organizational structure to enable “horizontal” processes. This ensures that each team receives from its peers what it needs as a “client,” and also enables a “service provider-client” feedback process.
In addition, reinforcement and consistent messaging within the team have enabled the team to fully use staff meetings to review metrics and connect them directly to recognition and rewards for the team members.
The current structure is finely tuned and well positioned to accommodate growth and enable Cisco to respond quickly to its rapidly changing business demands. And, as Cisco continues to grow, its IT department will continue to evolve to better serve the business needs. Shafai comments, “The Cisco lifecycle methodology laid the groundwork for us to structure our efforts for greater effectiveness and productivity. We’re continuing to direct the right people to focus further. The same initial concept is being put to use as a microcosm in the various teams to reap further benefits. We wouldn’t have been able to do that without this evolutionary framework.”
Manville sums it up: “By moving from a traditional technology, silo-based organizational structure to a lifecycle-based model, we were able to improve our operational metrics considerably. Our number of cases decreased by approximately 60 percent, and our time-to-repair to get clients back up and running has decreased by almost 70 percent. Overall, five out of five metrics improved dramatically.”