Out-tasking simplifies complex infrastructure and strengthens customer experience.
The Cisco IT group strives to make the best use of its highly skilled staff by empowering them to focus on strategic programs and complex network problems rather than day-to-day monitoring and management of existing network infrastructure. Cisco IT formally engaged with the Cisco® Remote Management Services (RMS) team to support the Cisco global network, which includes the Cisco campus, core, branch, Wide Area Application Services (WAAS), unified communications, and business video platforms.
Like many large IT organizations, Cisco faced the challenge of deriving maximum value from a large group of highly specialized team members. Too often, engineers were mired in routine operational activities rather than solving complex issues that affect the business in more profound ways. Additionally, Cisco IT engineers were on call 24 hours a day for one week out of every six, and most of the issues for which they were paged were routine. This on-call duty diminished job satisfaction and productivity, and was clearly not the best use of a senior engineer's time.
Consistency across the Cisco IT global network service area was also a challenge. No single, integrated system of tools, processes, and automation existed for incident management. Cisco IT network operations and support processes lacked a comprehensive program to proactively identify incidents, track incident recurrence, manage cases, and report metrics.
Monitoring and managing the global Cisco network infrastructure is an enormously time- and resource-intensive discipline. Starting in 2005, Cisco IT freed employees to work on core activities by out-tasking network monitoring and management for more than 4,000 devices within the Cisco global core and branch networks. Out-tasked activities also included monitoring VPN connectivity and voice service for over 450 Cisco sites and more than 160 Cisco extranet sites around the world. Since then, RMS support has grown to monitor and manage over 11,000 devices and components, including the Cisco campus, core, branch, and WAAS, as well as over 14,000 Cisco TelePresence® endpoints.
Cisco IT looked to Cisco Remote Management Services to help deliver more consistent performance and simplify the monitoring and management of an increasingly complex network and IT infrastructure. The RMS portfolio provides global customers with proactive and reactive monitoring capabilities that use automation combined with Cisco intellectual capital to offer a variety of monitoring and management options across a broad set of Cisco technologies. Delivered by a team of industry-leading Cisco engineers, using tools and processes aligned to Information Technology Infrastructure Library (ITIL) standards, Remote Management Services provides a comprehensive and consistent framework for IT services management without the usual IT complexities or disruptions to network performance.
Cisco IT and Cisco RMS teams complement each other by applying their distinct core competencies to the Cisco network. "Cisco Remote Management Services' expertise lies in its consistent and reliable incident management processes, tools, and automation," says Mike Bullard, Cisco IT operations engineer. "This service excels at running all aspects of a 24-hour network operations center on a Cisco infrastructure."
Key Out-Tasked Activities
Cisco IT retains core activities while assigning more tactical, routine tasks to Cisco RMS. One such activity is Cisco IOS® Software upgrades, which take about one hour per device on average when performed manually, including the time required to execute the change process approved in the change management ticket. Cisco RMS executes IOS upgrades using automation that creates the image, then performs preliminary tasks such as checking memory, interface state, module state, and Enhanced Interior Gateway Routing Protocol (EIGRP) neighbors. It also checks whether a peer router is scheduled for upgrade at the same time and whether there are open support tickets on the device. With automated tools, Cisco RMS can upgrade 30 devices within 10 minutes, including validation and quality assurance checks. That rate is equivalent to 180 devices per hour, compared with one device per hour using manual methods, a 180-fold increase in IOS upgrade productivity.
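The pre-upgrade checks described above can be pictured as a simple gate that each device must pass before it joins an upgrade batch. The following is a minimal sketch, not the Cisco RMS implementation: the `DeviceSnapshot` fields, the rule that free memory must be at least the image size, and the rule that at least one EIGRP neighbor must be present are all assumptions made for illustration. The `devices_per_hour` helper simply restates the throughput arithmetic from the paragraph.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeviceSnapshot:
    """Hypothetical pre-upgrade state for one device (all fields are assumptions)."""
    name: str
    free_memory_mb: int
    interfaces_up: bool
    modules_ok: bool
    eigrp_neighbors: int
    peer_upgrading_now: bool
    open_tickets: int


def precheck_failures(dev: DeviceSnapshot, image_size_mb: int) -> List[str]:
    """Return the reasons a device should be held back; an empty list means proceed."""
    failures = []
    if dev.free_memory_mb < image_size_mb:  # assumed memory rule
        failures.append("insufficient memory for image")
    if not dev.interfaces_up:
        failures.append("interface down")
    if not dev.modules_ok:
        failures.append("module fault")
    if dev.eigrp_neighbors == 0:  # assumed neighbor rule
        failures.append("no EIGRP neighbors")
    if dev.peer_upgrading_now:
        failures.append("peer router scheduled in same window")
    if dev.open_tickets > 0:
        failures.append("open support ticket")
    return failures


def devices_per_hour(batch_size: int, batch_minutes: int) -> float:
    """Convert a batch rate (devices per batch, minutes per batch) to devices per hour."""
    return batch_size * 60 / batch_minutes


if __name__ == "__main__":
    healthy = DeviceSnapshot("branch-rtr-1", 512, True, True, 2, False, 0)
    print(precheck_failures(healthy, image_size_mb=128))
    print(devices_per_hour(30, 10))
```

Returning a list of reasons rather than a single pass/fail flag lets a held-back device be reported with everything that needs attention before the next upgrade window.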
In addition, Cisco RMS coordinates the resolution of carrier circuit issues, which is particularly valuable because an estimated 30 percent of incidents are due to issues with a carrier circuit. Cisco RMS employs engineers with expertise in branch connectivity who can quickly coordinate resolution with the carrier. Cisco RMS provides ticket e-bonding with major carriers to automate the coordination of ticket updates; as the carrier updates its ticket, the update is reflected in the Cisco RMS ticket, and vice versa. "The campus and branch teams would need to triple in size in order to handle carrier-related circuit problems," says Dan Price, Cisco IT operations manager. "With Cisco Services, we can tackle these ongoing issues without losing focus on our own strategic business initiatives."
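The two-way ticket mirroring that e-bonding provides can be illustrated with a small sketch. This is only an assumption-laden model of the idea: the `Ticket` class, the `bond` helper, and the `_mirrored` flag used to stop an update from echoing back are hypothetical and do not describe the actual Cisco RMS or carrier interfaces.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Ticket:
    """Hypothetical ticket record held by one system (RMS or a carrier)."""
    system: str
    notes: List[str] = field(default_factory=list)
    # The bonded counterpart; excluded from repr/eq to avoid circular references.
    peer: Optional["Ticket"] = field(default=None, repr=False, compare=False)

    def add_update(self, text: str, _mirrored: bool = False) -> None:
        """Record an update locally and mirror it once to the bonded ticket."""
        self.notes.append(text)
        if self.peer is not None and not _mirrored:
            # Tag the mirrored copy with its origin and mark it so it is not echoed back.
            self.peer.add_update(f"[{self.system}] {text}", _mirrored=True)


def bond(a: Ticket, b: Ticket) -> None:
    """Link two tickets so that updates on either side appear on both."""
    a.peer, b.peer = b, a


if __name__ == "__main__":
    rms, carrier = Ticket("RMS"), Ticket("carrier")
    bond(rms, carrier)
    carrier.add_update("Circuit fault isolated")
    rms.add_update("Site confirms recovery")
    print(rms.notes)
    print(carrier.notes)
```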
Evolution of the Cisco IT/Cisco RMS Relationship
Cisco IT initially engaged with Cisco RMS to support the Cisco global routing and switching network. Based on the positive results of that engagement, Cisco IT expanded its relationship with Cisco Services for additional IT management and consulting services, including advanced network assessments, high-touch support services, network planning, and network optimization.
"When we supported the Cisco global network in-house, it took senior IT staff away from other, more business critical projects. By moving the day-to-day operational tasks to RMS, we gained efficiencies that resulted in greater availability, while freeing our engineers to focus on strategic and higher business value projects," says John Manville, senior vice president, Cisco Global Infrastructure Services.
Later, Cisco IT further expanded the responsibilities of RMS to include Cisco TelePresence and unified communications devices, leading RMS to activate more than 300 TelePresence systems in just one year. Since then, RMS has continued to increase its remote monitoring and management of Cisco IT business video devices to 350 Application and Content Networking System (ACNS) devices and more than 1000 Cisco TelePresence systems installed in Cisco offices worldwide. Through automation, the number of tickets requiring manual intervention has been reduced by 47 percent. Currently, RMS supports over 11,000 devices and components on the Cisco campus, core, branch, WAAS, unified communications, and business video platforms.
Cisco RMS helped address other issues that had been hampering the Cisco IT team. For instance, troubleshooting high-volume in-call network quality events on video systems once required a dedicated engineer to recreate the issue and try to solve the problem. Now the RMS automation logs into the codec, issues key commands to capture status while the network quality event is in progress, and enriches the ticket with diagnostic output and historical information. If the condition normalizes, the ticket automatically resolves with key data to aid problem management analysis. If thresholds are crossed, the ticket is escalated to a Tier 2 engineer, who instantly has all the diagnostic information needed to resolve the problem. As a result, the mean time to resolution has decreased from 1-2 hours to 10-15 minutes per incident while minimizing human error.
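The resolve-or-escalate flow for in-call quality events can be sketched as follows. This is a simplified illustration under stated assumptions: packet loss stands in for whatever the codec commands actually report, and the `QualityTicket` class, the status names, and both thresholds are hypothetical rather than taken from Cisco RMS.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QualityTicket:
    """Hypothetical ticket for one in-call network quality event."""
    endpoint: str
    status: str = "open"
    diagnostics: List[Dict[str, float]] = field(default_factory=list)


def triage(ticket: QualityTicket, samples: List[Dict[str, float]],
           normal_pct: float, escalate_pct: float) -> str:
    """Enrich the ticket with captured samples, then resolve, escalate, or leave open."""
    # Every sample is attached, so whoever reads the ticket sees the full history.
    ticket.diagnostics.extend(samples)
    if any(s["packet_loss_pct"] >= escalate_pct for s in samples):
        ticket.status = "escalated_tier2"   # threshold crossed
    elif samples and samples[-1]["packet_loss_pct"] <= normal_pct:
        ticket.status = "auto_resolved"     # condition normalized
    return ticket.status


if __name__ == "__main__":
    t = QualityTicket("tp-room-1")
    print(triage(t, [{"packet_loss_pct": 3.0}, {"packet_loss_pct": 0.5}], 1.0, 5.0))
```

In both outcomes the ticket keeps the captured diagnostics, which mirrors the point in the text that a Tier 2 engineer receives the data with the escalation and that auto-resolved tickets still carry data for problem management.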
Another example of how automation provides proactive network readiness is the monitoring of a backup circuit for available bandwidth. Some remote offices have a high-bandwidth circuit (such as a DS3) for primary connectivity and a lower-bandwidth circuit (such as a T1) as a backup. When the primary circuit fails and traffic moves to the backup circuit, problems may arise because the smaller circuit might not have sufficient bandwidth to support a video call. In response, the Cisco RMS system checks whether the backup circuit has adequate bandwidth to support a video call. If it does not, the system sends an email notification to the Cisco TelePresence support team about the affected site and the potential impact to video calls.
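The backup-circuit check reduces to comparing the bandwidth left on the backup link with what a video call needs. The sketch below assumes hypothetical inputs: the function name, the recipient address, and the per-call bandwidth figure in the example are illustrative and not from Cisco RMS. The T1 line rate of 1544 kbps is the standard value. The function only builds the notification; sending it is left out.

```python
from email.message import EmailMessage
from typing import Optional

# Hypothetical recipient; the real support alias is not given in the text.
SUPPORT_ADDRESS = "telepresence-support@example.com"


def backup_alert(site: str, backup_kbps: int, in_use_kbps: int,
                 video_call_kbps: int) -> Optional[EmailMessage]:
    """Return a notification if the backup circuit cannot carry a video call, else None."""
    available = backup_kbps - in_use_kbps
    if available >= video_call_kbps:
        return None
    msg = EmailMessage()
    msg["To"] = SUPPORT_ADDRESS
    msg["Subject"] = f"Backup circuit at {site} may not support video calls"
    msg.set_content(
        f"Site {site} is on its backup circuit with {available} kbps available; "
        f"a video call is assumed to need {video_call_kbps} kbps."
    )
    return msg


if __name__ == "__main__":
    # A T1 (1544 kbps) backup with 200 kbps already in use, and an assumed 3000 kbps call.
    alert = backup_alert("site-a", 1544, 200, 3000)
    print(alert["Subject"] if alert else "backup circuit is adequate")
```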
"Our engagement with RMS has increased availability and stability across the Cisco infrastructure. In terms of business value, that translates to higher productivity and an improved experience for our customers, partners, and employees," says Guillermo Diaz, Jr., vice president of Cisco Connected IT Services.
• Number of incident tickets requiring engineer intervention has been reduced by 47 percent.
• Use of automation has reduced the time to diagnose "in-call" network issues for TelePresence systems from 1-2 hours to 10-15 minutes while minimizing human error.
• Average duration of an outage was reduced by 66 percent, and the overall number of outages decreased as well.
• By identifying the source of complex or chronic problems, Cisco RMS helped ensure shorter recovery times after incidents and service losses.
Business Benefits to Cisco
Over the past two years, RMS smart capabilities have brought real and significant value to Cisco IT operations. From a Tier 1 and 2 perspective, automated incident handling significantly decreases the time, effort, and resources needed to solve issues. For Tier 3 and higher, historical information about the affected area of the network, combined with empirical data on possible root causes, helps Cisco Technical Assistance Center (TAC) engineers resolve problems quickly.
Meanwhile, Cisco IT is collaborating with Cisco RMS to improve the services that it offers to Cisco customers. "Cisco IT is like many of our customers in terms of basic network needs," says Jim Scaduto, service delivery executive for Cisco Services. "But because of the unique nature of the Cisco network, the Cisco RMS team continues to learn from the requirements of Cisco IT's global operations and early deployments of new technologies, which are experiences we don't often encounter with other customers. We share that knowledge with the Cisco RMS team globally in order to refine the processes, tools, and services that we offer to Cisco customers."
More Strategic Use of Cisco IT Resources
Relieved of 24-hour pager duty, Cisco engineers have more time and energy to devote to strategic new applications and projects. Rather than grow frustrated with repetitive tasks, they can hone their skills and learn new technologies. That boosts motivation and retention while demonstrating that Cisco values the job satisfaction of its senior networking staff.
For Cisco IT, the partnership with Cisco Services has proven invaluable. "Through the partnership with Cisco Services, our IT engineers can work on critical business initiatives and focus on solving complex operational issues while Cisco Services can take care of the high-volume events, such as circuit outages and bulk code upgrades," says Hasan Talukdar, senior manager for Cisco IT.
Cisco IT also benefits from the extended coverage provided by Cisco RMS monitoring and case tracking. For example, Cisco RMS opens a support case when an outage occurs in a carrier circuit or when server hard drives fail. In the past, Cisco IT was not necessarily aware of these events if they did not interrupt business activity.
Consistent Global Management
Previously, each of the three Cisco geographic regions followed different processes for incident response. Cisco RMS employs consistent global processes, which facilitates scalability of the network support operation. If an operating system vendor issues a patch, for example, Cisco RMS uses automation to install it quickly, working outside of Cisco's business hours in each geographic region. As Cisco grows, Cisco IT does not need to increase staff to handle these global operational tasks.
Cisco IT will continue to expand the number and complexity of services out-tasked to Cisco RMS, such as monitoring and support of Cisco's internal Application and Content Networking System (ACNS) deployments. The partnership between Cisco IT and Cisco RMS continues a long tradition of Cisco being the first and best customer for its own products and services, and the relationship continues to provide valuable knowledge to both groups. Perhaps most importantly, Cisco customers will ultimately benefit from the improved products, expanded service offerings, and greater technical expertise that Cisco offers as a result of this collaboration.