Total time to resolve issues drops from 30 hours to less than 6; automated support decreases time customers spend monitoring, troubleshooting, and remediating issues.
Business Situation and Challenge
Smart Call Home, a Cisco® software feature, provides a mechanism for automatically creating support cases and updating Cisco IT, customers, or partners about events and changes on a Cisco device in a customer network. Smart Call Home is supported on a broad range of Cisco products, from routers and switches to security, unified communications, and data center devices including Cisco Nexus® and Multilayer Director Switches and Cisco Unified Computing System™ (Cisco UCS®). Smart Call Home is provided to Cisco customers with SMARTnet® and other specified service contracts.
When enabled, Smart Call Home looks for a specific set of faults that Cisco has identified through interaction with Cisco Technical Assistance Center (TAC) engineers, the Cisco support community, and developers. "These faults are considered significant events," says Bryan Williams, technical marketing engineer, Technical Services Product Management at Cisco. "Essentially we are helping reduce the 'noise' for customers in their network management efforts."
Several steps are required to isolate, troubleshoot, and remediate the problem after a significant event is detected. Traditionally this resolution process has averaged about 30 hours from beginning to end, based on aggregate support case data of typical IT interaction with Cisco.
"We wanted to automate as much of this interaction as possible. That's what the Smart Call Home service does," says Williams. "Instead of waiting for a user to notice a problem or a fault to escalate and be reported, Smart Call Home proactively identifies and diagnoses faults before they can affect the business."
Solution and Benefits
When a significant event is detected, it is reported to Cisco IT through the Smart Call Home system, which compares the alerts to an extensive set of rules based on Cisco's intellectual capital. These rules determine when the problem occurred and on what device in the network, the potential impact of the failure, and the analysis and recommended steps to remediate the problem. Data, messages, and notifications transmitted to and from the Smart Call Home system are encrypted for security.
The Smart Call Home automated workflow consists of the following tasks:
• Incoming messages are analyzed and correlated. The system assesses their severity and activates a notification sequence based on customer-defined profiles that specify who should be notified, how messages should be transmitted, and what types of events should receive alerts. The notifications contain analysis, troubleshooting data, and remediation recommendations generated automatically based on Cisco intellectual capital.
"Our goal," says Williams, "is to give customers the tools they need to resolve issues quickly themselves, without Cisco TAC intervention."
• All the notification and supporting content is sent to a web portal that the customer and Cisco TAC engineers can access securely. In addition to diagnostic and remediation data, the portal contains up-to-date information about Call Home-enabled devices personalized for the customer.
• For faults deemed complex or severe, Smart Call Home initiates a support case with Cisco TAC on the customer's behalf. The support case includes detailed diagnostics and is sent directly to a TAC engineer with experience in resolving the issue. The TAC engineer contacts the customer directly to troubleshoot the problem.
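The workflow tasks above can be sketched as a small routing function. This is only an illustration of the described behavior, not Smart Call Home's actual implementation: the severity scale, the `CASE_THRESHOLD` value, and the `Message`, `Profile`, and `Outcome` names are all assumptions introduced here, and message correlation is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical threshold: the source does not say how "complex or severe" is scored.
CASE_THRESHOLD = 3


@dataclass
class Message:
    device: str
    fault: str
    severity: int  # assumed scale: 1 (informational) to 4 (critical)


@dataclass
class Profile:
    """Customer-defined notification profile (field names are illustrative)."""
    recipients: List[str]  # who should be notified
    transport: str         # how messages should be transmitted, e.g. "email"
    min_severity: int      # lowest severity that should trigger an alert


@dataclass
class Outcome:
    notifications: List[str] = field(default_factory=list)
    open_case: bool = False


def process(message: Message, profile: Profile) -> Outcome:
    """Assess severity, notify per the profile, and escalate severe faults."""
    outcome = Outcome()
    if message.severity >= profile.min_severity:
        for recipient in profile.recipients:
            outcome.notifications.append(
                f"{profile.transport}:{recipient}:{message.device}:{message.fault}"
            )
    # Severe faults additionally open a support case on the customer's behalf.
    if message.severity >= CASE_THRESHOLD:
        outcome.open_case = True
    return outcome


profile = Profile(recipients=["ops@example.com"], transport="email", min_severity=2)
result = process(Message("chassis-3", "psu-failure", 4), profile)
print(len(result.notifications), result.open_case)
```

In this sketch a low-severity message produces neither a notification nor a case, a mid-severity message produces a notification only, and a severe one produces both.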
Cisco IT: Smart Call Home on UCS
In 2008, when Cisco IT began migrating to Cisco UCS internally, Smart Call Home was a service-level option available to clients. As the first major internal customer of Cisco UCS, the UCS operations team has experienced the implementation of Smart Call Home, and its evolutionary bumps, firsthand. The team has nearly 200 UCS clusters globally. A standard cluster consists of 2 Cisco UCS Fabric Interconnects, 5 Blade Server Chassis, and 40 Blade Servers. Call Home is supported within the embedded Cisco UCS Manager, which provides centralized management of all Cisco UCS software and hardware components across multiple chassis.
Before migrating to Cisco UCS, the operations team had been using third-party monitoring tools that performed diagnostics at the hardware level. "We were looking to replicate the same type of information and thought Smart Call Home could be our route to accomplish that," says Sara Kogut, a services manager in Cisco IT who ran the UCS operations readiness efforts during the migration. "Without an alerting tool like Smart Call Home we wouldn't have insight into the health of the hardware."
Soon after enabling Smart Call Home, the UCS operations team encountered a big challenge: figuring out how to process the massive volume of alerts the tool was generating. What did each alert mean? Which alerts signaled real problems that required action or TAC involvement, and which ones were non-issues?
The team discovered that the tool was initially "over-alerting" (for example, picking up false positives for thermal readings). Meanwhile, the Cisco UCS migration was gaining momentum, and the operations team needed an environment that would scale with the growth.
"Cisco UCS operations is a global, virtual team. During the first year of growth, we realized that the Smart Call Home alerts would best be dealt with by a hands-on team with on-site presence in our major data center: a vendor that could verify issues manually when needed," says Kogut.
Both Kogut's team and the vendor were learning about the tool simultaneously. The vendor's training focused on how to categorize alerts, what to look for, when to follow up, and how to close support cases. As the UCS environment grew, the vendor supplemented Smart Call Home alerts with physical verification, by manually opening up the UCS chassis and looking through UCS Manager to confirm the detected problem areas.
According to Kogut, "There was a good 12 months of ramp-up time before we could all use the tool efficiently."
"The learning curve was particularly rough initially because of the sheer number of emails that were generated by Smart Call Home," says Rita Gapuz, IT engineer, Milestone Technologies. "Two emails are generated for every alert, one for notification of the problem and the other for recovery. It was challenging to go through every email. My team's inboxes were getting maxed out extremely fast."
Gapuz and her team have become adept at classifying alerts by hardware type and resolution category, and at assigning people to look for and follow up on specific alert categories. It is a model that scales well within the Cisco UCS environment.
"From a support standpoint, we don't have to worry about logging into every UCS cluster and doing a health check because Smart Call Home is actively doing that," says Gapuz. For actionable alerts, the tool also prevents the vendor from having to open up the UCS chassis manually.
Since the internal Cisco UCS implementation began, Kogut's team and the vendor have provided feedback to the Smart Call Home development group. Their input helped abate over-alerting and refine other functionality in subsequent releases of the tool.
Business Impact and Metrics
The primary benefit of Smart Call Home is that it is proactive, notifying users of problems before they can affect the business. The diagnostics, analysis, and recommendations automatically generated by Smart Call Home decrease troubleshooting time and minimize the number of customer support cases that must be opened.
Faster Time to Resolution
Cisco Operations, IT, and TAC personnel have experienced significant decreases in time to resolution with Smart Call Home. Based on aggregate case data compiled for a typical IT interaction with Cisco to identify, troubleshoot, and remediate a business-impacting fault, the use of Smart Call Home has reduced the total time to resolution from 30 hours to less than 6 (Figure 1).
Following are the steps and elapsed time allocations for a typical fault resolution scenario before and after using Smart Call Home.
Fault Resolution without Smart Call Home
A device fault occurs and worsens. The problem is eventually noticed by an end user while running an application to execute a sale.
1. Isolate (30 minutes) Operations must identify the source device or component with the failure. This step might require input from business analysts, developers, or several IT people.
2. Troubleshoot (2 hours) After the device is isolated, IT needs to figure out why it is not working. Several resources might be consulted to find out what the error means and if anyone has successfully resolved the problem, such as logs, Error Message Decoder, Cisco Support Community, and external references.
3. Open support case (2 hours, 15 minutes) IT gathers the required information, such as contract and product serial number(s), and opens a support case on the customer's behalf via Cisco.com or by contacting the TAC directly.
4. Engage with Cisco TAC (26 hours, 15 minutes elapsed since device fault detected) The bulk of remediation time is spent on this step. Technical support engineer asks the customer for information, including an updated configuration, show commands, and logs. Customer uploads information to FTP server and sends an email to the engineer. After the logs are received and analyzed, the TAC engineer establishes a Cisco WebEx® conferencing session to troubleshoot the problem with the customer. In this case, a device component needs to be replaced.
5. Issue Return Material Authorization (RMA) and dispatch part (assumes 4-hour replacement coverage) The Cisco TAC engineer creates an RMA, and the replacement part is shipped to the customer per service-level agreement requirements. Customer replaces the problem component.
Total time to resolution without Smart Call Home = 30 hours
Fault Resolution with Smart Call Home
During Smart Call Home 24-hour proactive monitoring, a device fault is detected. Based on rules contained within Smart Call Home, the system isolates the source of the problem and automatically generates analysis and recommended steps for remediation.
1. Isolate (0 minutes)
2. Troubleshoot (0 minutes)
3. Open support case (12 minutes) A support case is opened on behalf of the customer automatically. The support case routes directly to a Cisco TAC engineer who has experience resolving the problem. Significant data related to the fault are sent to the secure web portal.
4. Engage with Cisco TAC (42 minutes) The right support engineer has immediate access to the analysis, recommendations, and pertinent customer information in the web portal. The Cisco TAC engineer informs customer of the problem and verifies the steps to remediation.
5. Issue RMA and dispatch part (1 hour, 27 minutes elapsed since device fault detected) RMA is automatically created and replacement part sent to the customer. A 4-hour replacement coverage is assumed.
Total time to resolution with Smart Call Home = 5 hours, 27 minutes
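The two totals can be reproduced with a few lines of arithmetic. The sketch below assumes that the last "elapsed since device fault detected" figure in each scenario (26 hours, 15 minutes and 1 hour, 27 minutes) marks the point at which the 4-hour replacement window begins; that reading is an interpretation of the step listings, and the variable names are illustrative.

```python
from datetime import timedelta

# The 4-hour replacement coverage is the assumption stated in both scenarios.
REPLACEMENT_COVERAGE = timedelta(hours=4)

# Last elapsed-time figure quoted in each scenario (assumed to precede the RMA window).
elapsed_without = timedelta(hours=26, minutes=15)
elapsed_with = timedelta(hours=1, minutes=27)

total_without = elapsed_without + REPLACEMENT_COVERAGE
total_with = elapsed_with + REPLACEMENT_COVERAGE


def fmt(duration: timedelta) -> str:
    """Render a duration as hours and minutes."""
    minutes = int(duration.total_seconds()) // 60
    return f"{minutes // 60} h {minutes % 60:02d} min"


# Dividing one timedelta by another yields a plain float ratio.
reduction = 1 - total_with / total_without

print(fmt(total_without))   # 30 h 15 min
print(fmt(total_with))      # 5 h 27 min
print(f"{reduction:.0%}")   # 82%
```

Under this reading the Smart Call Home total matches the stated 5 hours, 27 minutes exactly, and the total without it comes to 30 hours, 15 minutes, which the case study reports as 30 hours.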
Figure 1. Total Time to Resolution Before and After Using Cisco Smart Call Home
Self-Help Troubleshooting, Fewer Support Cases
The following Smart Call Home data, reported by Cisco IT, are actual totals for messages, notifications, and support cases generated for a customer's Cisco UCS deployment in the three months of May through July 2012. This customer has a large Cisco UCS setup consisting of approximately 175 domains and 8000 serviceable devices and subsystem components:
Total messages (alerts) received by Smart Call Home: 28,754
Total email notifications sent to the customer: 9149
Total support cases initiated: 411
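As a quick check on how strongly the rules filter these totals, the ratios can be computed directly. The variable names below are illustrative; the three counts are the ones listed above.

```python
# Totals reported for May through July 2012.
messages = 28_754
notifications = 9_149
cases = 411

notification_share = notifications / messages  # share of messages that became emails
case_share = cases / messages                  # share of messages that became TAC cases
cases_per_notification = cases / notifications

print(f"{notification_share:.1%}")      # 31.8%
print(f"{case_share:.1%}")              # 1.4%
print(f"{cases_per_notification:.1%}")  # 4.5%
```

Roughly 32 percent of incoming messages led to a customer notification, and about 1.4 percent led to a support case.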
"Even though Smart Call Home sent us more than 28,000 messages, less than a third of them resulted in email notifications to the customer," says Williams. "The intellectual capital and rules in Smart Call Home allow us to significantly reduce the number of messages that customers have to deal with. We strive to notify customers only when there's something they can do to resolve the issue."
The system analyzes and correlates the incoming messages, assesses the severity, and activates a notification sequence based on profiles created by the customer. Notifications contain analysis, recommendations, and troubleshooting data that give customers the tools they need to resolve issues themselves quickly, without Cisco TAC intervention.
In the customer example (Figure 2), 28,000-plus total messages generated by the system within the 90-day period resulted in only 411 support cases. In these instances, Smart Call Home proactively generated a Cisco TAC support case, with detailed diagnostics and relevant debugging information attached. Support cases were routed to the appropriate Cisco engineer who could resolve the problem. Much of the data that Cisco TAC engineers need from customers, such as show commands, were also automatically generated and sent to a secure web portal that the TAC engineer and customer could access, reducing Mean Time to Repair (MTTR).
Figure 2. Troubleshooting a Cisco UCS Deployment with Smart Call Home
The Smart Call Home team aggressively strives to evolve the tool's capability and increase the level of value delivered to Cisco customers and partners, with an emphasis on three areas:
• Growth and maturity of intellectual capital library
• Supported Cisco devices
• Increased functionality through feature releases
Intellectual capital is the foundation upon which Smart Call Home is built. It consists of many bits of knowledge gathered from within Cisco. This knowledge is codified into rules that help automate decisions about the types of activities monitored for each device, the diagnostic path collected when a given fault is identified, the steps followed to troubleshoot and resolve an issue, and the Cisco resources proactively engaged for critical issues.
In 2012 the Smart Call Home intellectual capital library grew by more than 91 percent, from 1616 assets to 3089 assets. At this rate, more than 200 assets are being added to the library each month, and the Smart Call Home team anticipates accelerating this growth over time.
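The growth percentage follows from the two library sizes. This minimal sketch checks only that figure; the variable names are illustrative.

```python
# Intellectual capital library size at the start and end of 2012.
assets_start = 1616
assets_end = 3089

added = assets_end - assets_start
growth = added / assets_start

print(added)             # 1473
print(f"{growth:.1%}")   # 91.2%
```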
In late 2012 and early 2013, improvements to internal processes enabled the Smart Call Home team to release more intellectual capital assets into production faster than in previous years.
Supported Cisco Devices
Smart Call Home is supported across a broad range of Cisco products, with coverage for the vast majority of devices in the Borderless Networks and data center solution categories, as well as key devices in security and collaboration. The inclusion of Smart Call Home in additional Cisco devices will grow along with the company's product portfolio.
Feature releases are the most dynamic area of the Smart Call Home evolution and, by design, are heavily influenced by customer interaction. The next few Smart Call Home feature releases are designed to focus on the following:
• Simplification of registration and entitlement processes
• APIs to enable new integration and consumption models, both for customers and Cisco partners as well as internal stakeholders including TAC, IT, and product teams
• Mobility applications
• Platform scalability and performance
• Improved customer feedback mechanisms
For More Information
To read additional Cisco IT case studies about a variety of business solutions, visit Cisco on Cisco: Inside Cisco IT at www.cisco.com/go/ciscoit
This publication describes how Cisco has benefited from the deployment of its own products. Many factors may have contributed to the results and benefits described; Cisco does not guarantee comparable results elsewhere.
CISCO PROVIDES THIS PUBLICATION AS IS WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING THE IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some jurisdictions do not allow disclaimer of express or implied warranties, therefore this disclaimer may not apply to you.