Cisco on Cisco
Network Management Case Study: EMAN Automates Services Lifecycle Management at Cisco
Service delivery improves and operations support system costs drop dramatically.
Cisco® IT provides a variety of services, from voicemail to network performance monitoring, to the company's business units. Its mission is to maximize availability, minimize operations support system (OSS) costs, and improve agility by enabling rapid delivery of business solutions. The group manages thousands of applications running on hundreds of servers in five data centers. It supports 50,000+ Cisco employees and contractors at more than 300 sites around the world through a massive wide area network (WAN) and campus local area networks (LANs) composed of thousands of routers and switches.
This mission presents significant challenges, which Cisco IT initially met by assigning network engineers to manage individual service processes manually. As the company grew and the complexity of the networked systems increased, Cisco IT began to develop the first of several hundred software-based tools to automate these processes and address the following challenges:
- Operational monitoring: Cisco IT used commercially available network monitoring tools. However, these tools were not Web-based, and IT wanted the flexibility to monitor network operations from any Web browser in the company.
- Performance monitoring: "Traditional performance-monitoring tools simply alert IT that a network service is unavailable. Cisco IT wanted to move beyond reactive, 'red-light/green light' service assurance to a predictive and preventive model," says Alan Conley.
- Operations automation: Network engineers spend significant time configuring and upgrading equipment. This task should be tightly controlled but also easily available to authorized employees when issues arise.
- Inventory management: Cisco began to formally document network inventory in 1995, including devices, operating systems, and applications, as a prelude to effectively planning upgrades, patches, and other maintenance activities.
- Service provisioning: Cisco IT used individual provisioning applications but realized that a single solution would reduce OSS costs, make it easier to determine which employees used a particular service, and facilitate monitoring service requests. A single solution would also make it easier to apply entitlement policy changes across the enterprise and remove entitlements quickly when employees left the company.
- Reporting: Cisco managers must track and control their departments' IT services budget. Cisco IT managers who are responsible for network operations are also responsible for meeting (or exceeding) availability Service Level Agreements (SLAs) each quarter. These tasks require accurate information, which is usually presented in tables or graphs.
Over the course of the last decade, these tools evolved into Enterprise Management (EMAN), an integrated, Web-based suite that IT relies on to monitor and manage the Cisco infrastructure. EMAN addresses all aspects of the service lifecycle (service requests, approvals, provisioning, operations, and deactivations) and can be accessed by Cisco employees from any Cisco site. Unlike CiscoWorks, which manages Cisco devices only, EMAN manages all network components, including those supplied by other vendors.
As Conley says, "Like most organizations, Cisco has become more and more sophisticated in the ways that it manages its network infrastructure. The EMAN modular architecture is flexible and has scaled as demand for more different types of services have grown."
In 1995, Cisco developed the first EMAN tool, a basic form of operational monitoring. A Perl script, running on several servers, issued Internet Control Message Protocol (ICMP) "pings" every 15 seconds to monitor network routers, switches, and other IP-addressable networked resources. If a resource failed to respond twice consecutively, an alarm alerted network engineers of the failure. "Now we could prove that a device was in fact running, or at least we could know that it had failed," says Conley.
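The alarm rule described above (raise an alarm only after two consecutive missed pings) can be sketched as follows. This is a minimal illustration, not the original Perl script: the `FailureTracker` class, the `run_cycle` helper, and the host names are hypothetical, and the probe is passed in as a function so the logic can be exercised without sending real ICMP traffic. A real deployment would call `run_cycle` from a loop that sleeps 15 seconds between cycles.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class FailureTracker:
    """Counts consecutive missed probes per host (hypothetical helper)."""
    threshold: int = 2  # the text describes an alarm after two consecutive misses
    misses: Dict[str, int] = field(default_factory=dict)

    def record(self, host: str, responded: bool) -> bool:
        """Record one probe result; return True only when the alarm should fire."""
        if responded:
            self.misses[host] = 0
            return False
        self.misses[host] = self.misses.get(host, 0) + 1
        # Fire once, at the moment the threshold is reached, not on every later miss.
        return self.misses[host] == self.threshold


def run_cycle(tracker: FailureTracker, hosts: Iterable[str],
              probe: Callable[[str], bool]) -> List[str]:
    """Probe every host once and return the hosts that just entered the alarm state."""
    return [host for host in hosts if tracker.record(host, probe(host))]


# Simulated results for three polling cycles (illustrative host names).
cycles = [
    {"rtr-a": True, "sw-b": False},
    {"rtr-a": True, "sw-b": False},
    {"rtr-a": False, "sw-b": False},
]
tracker = FailureTracker()
alarms_per_cycle = [run_cycle(tracker, results, results.get) for results in cycles]
```

In this simulation only the second cycle produces an alarm, for `sw-b`. Requiring two misses trades a slightly slower detection time for fewer false alarms caused by a single dropped packet.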
As the network grew in complexity, it became increasingly important to alert the right network engineering support group in case of a problem. IT linked EMAN with Cisco's paging system and the HR employee directory, creating different paging groups. When EMAN notes a system malfunction, it alerts the correct support team, rather than all network engineers.
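A minimal sketch of this kind of targeted paging is shown below. The device categories, group names, and members are invented for illustration; the text says only that EMAN combined the paging system with the HR employee directory to build paging groups, so the two lookup tables here are assumptions about how that data might be arranged.

```python
from typing import Dict, List

# Hypothetical data: which paging group supports each device category,
# and who belongs to each group (the article says membership came from the HR directory).
GROUP_FOR_CATEGORY: Dict[str, str] = {
    "wan-router": "wan-support",
    "campus-switch": "lan-support",
}
GROUP_MEMBERS: Dict[str, List[str]] = {
    "wan-support": ["alice", "bob"],
    "lan-support": ["carol"],
    "noc-duty": ["dana"],
}
FALLBACK_GROUP = "noc-duty"  # assumed catch-all for categories without an owner


def page_targets(category: str) -> List[str]:
    """Return the people to page when a device of the given category fails."""
    group = GROUP_FOR_CATEGORY.get(category, FALLBACK_GROUP)
    return GROUP_MEMBERS.get(group, [])
```

With these tables, a WAN router failure pages only the WAN support group, and an unrecognized category falls back to the duty group rather than paging every engineer.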
Figure 1. Integrated Service Management Environment
The IT team then addressed service assurance by enabling EMAN to provide alerts when system performance measures exceeded pre-set thresholds. "If IT is expected to meet an SLA for 'five-nines' availability, by the time you receive an alert that a resource is down, you have violated the SLA. It is critical for us to address problems before they affect productivity," says Conley.
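The arithmetic behind the "five-nines" remark is worth making explicit. The sketch below computes how much downtime a given availability target allows over a period; the function name and the 91-day quarter are illustrative assumptions, not figures from Cisco's SLAs.

```python
def downtime_budget_seconds(availability: float, period_days: float) -> float:
    """Seconds of downtime permitted by an availability target over a period."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_days * 86_400  # 86,400 seconds per day


yearly = downtime_budget_seconds(0.99999, 365)    # five nines over one year
quarterly = downtime_budget_seconds(0.99999, 91)  # five nines over a 91-day quarter
```

Five nines allows roughly 315 seconds (about 5.3 minutes) of downtime per year, or about 79 seconds in a 91-day quarter. With a budget that small, an alert that arrives only after a resource is already down leaves little room to respond, which is why threshold alerts on performance measures matter.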
The next EMAN enhancements, provisioning and activation, required IT to standardize hardware and software platforms as well as server naming conventions. After all IT groups agreed on standardization policies, Cisco IT programmed EMAN to automatically generate host names that describe application type, location, and other information. This allows IT staff to log into any server for out-of-band management.
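Generated host names of this kind can be illustrated with a short sketch. The `site-apptype-nn` pattern below is purely hypothetical; the article does not give Cisco's actual naming convention, only that names encode application type, location, and other information.

```python
import re

_TOKEN = re.compile(r"[a-z0-9]+")  # assumed rule: components are plain alphanumerics


def make_hostname(site: str, app_type: str, index: int) -> str:
    """Build a name such as 'sjc-mon-03' from a site code, application type, and index."""
    site, app_type = site.lower(), app_type.lower()
    for token in (site, app_type):
        if not _TOKEN.fullmatch(token):
            raise ValueError(f"invalid name component: {token!r}")
    if not 1 <= index <= 99:
        raise ValueError("index must be between 1 and 99")
    return f"{site}-{app_type}-{index:02d}"


def parse_hostname(name: str) -> dict:
    """Recover the encoded fields from a generated name."""
    site, app_type, index = name.split("-")
    return {"site": site, "app_type": app_type, "index": int(index)}
```

Because the name can be parsed back into its fields, staff and tools can tell what a server does and where it is from the name alone.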
IT continued to enhance system functionality, and by 1999, EMAN had become a fully integrated service management environment. Today, it manages the entire service lifecycle for nearly 25,000 network devices in 16 technology categories. (See Figure 1.) EMAN also manages the service lifecycle of a large number of employee services such as Cisco MeetingPlace, Cisco Unity, Enterprise Class Teleworker (ECT), and Virtual Private Network (VPN).
Figure 2. Integrated Service Management Across Multiple Technologies
From the beginning, the most important design criteria for EMAN were scalability, reliability and availability, and low OSS costs. "One way that we achieved these goals was re-use," says Conley. "For example, we used the system that we developed for IP address management (IPAM) for telephony number management (TNM)." EMAN is built on Integrated Services Management (ISM), an elegant, service-oriented framework of reusable components developed internally or by third-party providers. (See Figure 2.)
The ISM framework addresses all aspects of the service lifecycle:
- Service activation and deactivation - Cisco employees can either request a service explicitly or have it activated automatically by a triggering event, such as a hire or transfer. Requests are approved or denied through role-based entitlements. When employees change roles, their privileges change automatically.
- Provisioning - Provisioning can be automatic, as with e-mail, or more complex, as in business-to-business provisioning for a new circuit. (See Figure 3.)
- Service assurance - Performance monitoring, change management processes, and alerts help ensure that services meet agreed-upon levels.
- Service usage - Management reports, asset management, billing, and charge-backs allow managers to monitor and control employee use of services.
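The role-based activation and deactivation described in the list above can be sketched as a comparison of entitlement sets. The roles and service names below are invented for illustration, and the function is a simplified assumption about how a hire, transfer, or departure might be translated into provisioning actions.

```python
from typing import Dict, Optional, Set

# Hypothetical role-to-service entitlements.
ROLE_ENTITLEMENTS: Dict[str, Set[str]] = {
    "engineer": {"email", "vpn", "voicemail"},
    "contractor": {"email"},
}


def entitlement_changes(old_role: Optional[str],
                        new_role: Optional[str]) -> Dict[str, Set[str]]:
    """Return the services to activate and deactivate when a role changes.

    old_role=None models a new hire; new_role=None models a departure.
    """
    old = ROLE_ENTITLEMENTS.get(old_role, set()) if old_role else set()
    new = ROLE_ENTITLEMENTS.get(new_role, set()) if new_role else set()
    return {"activate": new - old, "deactivate": old - new}
```

For example, `entitlement_changes("engineer", None)` lists every engineer service for deactivation, which corresponds to removing entitlements when an employee leaves the company.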
EMAN resides on a centralized Oracle database deployed on Linux and Microsoft Windows servers in 13 locations around the world. Its service delivery infrastructure, which is based on the appliance model, includes:
- Standardized network and server hardware architecture, including Windows and Linux operating systems
- Distributed one rack-unit service processing appliances
- Automated operating system builds and application software deployment and configuration
- Automated operating system and application software maintenance
- Web-based proxy management tools for service provisioning
- Centralized data store
- Integration with other ISM components, such as ordering, provisioning, billing/chargeback, and monitoring
Figure 3. Provisioning Process for e-page
The appliance model, which Conley calls "EMAN-in-a-rack," is now centrally managed and globally distributed. Cisco IT simply ships equipment to a remote site, where local staff rack and wire the units and install any necessary software. Services are built using out-of-band connections, a process that does not require precious WAN bandwidth. Low-cost appliances can be kept in inventory, allowing rapid deployment when new or redundant capacity is needed. IT personnel do not need to back up data captured by appliances, because it is transmitted to the central EMAN database.
EMAN manages Cisco and third-party devices and solutions through application programming interfaces (APIs). Cisco IT uses several criteria to determine whether to integrate a particular network management function into EMAN: whether the function uses business logic, its size, its ability to plug into a standard API, and its overall usefulness. Cisco management components currently integrated into EMAN include Cisco Secure Access Control Server (ACS), Cisco Network Registrar for DHCP services, Cisco IP Solution Center (IPSC), and Cisco Intelligence Engine (IE) 2100, which is used for image management.
EMAN contains eight major components:
- Availability monitor - pings network hosts (components such as router and switch ports, UPS and firewalls, servers and storage switches) every 15 seconds to measure service availability and response time. It also initiates small synthetic transactions in important applications every 30 seconds. If a ping or synthetic transaction receives no response within 20 seconds, or the response is unexpected, an alarm is sent. EMAN also reports average availability and response time for two-minute intervals.
- Real-time monitor - provides real-time performance metrics for CPU, memory, bandwidth, and file system usage through an optimized Simple Network Management Protocol (SNMP) polling engine. It also tracks the response time of each ping or synthetic transaction from the availability monitor. Trends in usage or response times are tracked to forestall potential problems.
- Event server - processes SNMP traps, syslog messages, and operating system event log messages.
- Alert server - accepts alerts from multiple sources and sends them to the appropriate subscriber using the delivery method that the subscriber has requested based on time of day.
- Data collector - collects performance metrics from target systems at five-minute intervals.
- Cisco NetFlow collector - processes near-real-time NetFlow statistics.
- Spare Solaris server - contains boot images for rapid recovery in the case of system failure.
- Windows application server - contains boot images for Windows-based appliances for rapid recovery in the case of system failure.
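The two-minute availability and response-time averages reported by the availability monitor can be illustrated with a small aggregation sketch. The sample format (a timestamp plus a response time, or `None` for a missed probe) and the `summarize` function are assumptions made for this example; the article does not describe EMAN's internal data model.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple

# (seconds since some epoch, response time in ms, or None when the probe got no reply)
Sample = Tuple[float, Optional[float]]


def summarize(samples: Iterable[Sample],
              window: int = 120) -> Dict[int, Tuple[float, Optional[float]]]:
    """Group probe results into fixed windows and return, per window start time,
    (fraction of probes answered, mean response time of the answered probes)."""
    buckets: Dict[int, List[Optional[float]]] = defaultdict(list)
    for timestamp, response_ms in samples:
        buckets[int(timestamp // window) * window].append(response_ms)

    summary: Dict[int, Tuple[float, Optional[float]]] = {}
    for start in sorted(buckets):
        results = buckets[start]
        answered = [r for r in results if r is not None]
        summary[start] = (len(answered) / len(results),
                          mean(answered) if answered else None)
    return summary


# Eight probes at 15-second spacing fill one two-minute window; one probe is missed.
probes = [(t, None if t == 30 else 10.0) for t in range(0, 120, 15)]
report = summarize(probes)
```

Here the single window starting at 0 reports 87.5 percent availability (seven of eight probes answered) and a mean response time of 10 ms.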
Cisco IT employees manage all EMAN functions from a browser-based interface. All functions share the same definitions and metrics, another example of Cisco's commitment to reusing components. "The change management tool uses the same metrics as the availability metrics tool," Conley says. "Similarly, a service has the same name whether I am provisioning, activating, billing, or paging someone because it is not working." (See Figure 4.)
Figure 4. EMAN Change Management Browser Interface
Currently, Cisco IT does not charge business units directly for EMAN infrastructure services other than calling cards, pagers, and cell phones. For these services, IT simply provides an EMAN-generated bill to the Finance group, which handles the billing process.
Conley notes that many companies are considering charge-backs for storage, and EMAN provides the needed capabilities. "Today, a common scenario is that a user requests 50 gigabytes of storage. The database administrator ups the amount to 100 gigabytes, and then the system administrator allocates 250 gigabytes. By enabling charge-back at Cisco, EMAN can help business units become more accountable for their requests and avoid the expense of over provisioning."
EMAN has enabled Cisco IT to meet the goals that it has set as a provider of services to internal customers. IT can now easily create new services, roll them out quickly, and support self-service, while helping ensure network reliability and availability. EMAN affects business units across the enterprise, from network operations and technical support to provisioning and billing, allowing the company to deliver a level of service that significantly enhances employee productivity.
EMAN provides answers to questions such as "How can I check the status of the applications and devices I manage?" through the following capabilities:
- Near-Real-Time Notification - EMAN pings systems every 15 or 30 seconds. Two-minute average availability and response times are computed and made available through an Application Monitor.
- Performance and Configuration Information - EMAN provides 60 performance and exception reports as well as hundreds of real-time performance-monitoring tools. The system closely tracks host devices, applications, the functioning of the network, and security services.
- Historical Trending - EMAN provides summary reports and more than 100 trend reports that display statistics for specific devices, applications, or network performance. Remote locations throughout Cisco collect this information and send it to the EMAN database server for processing and storage.
- Capacity Planning - EMAN offers a number of capacity-planning reports, which, together with historical trending information, greatly facilitate the process of managing network capacity.
- Integrated Change Management - EMAN enables Cisco IT to view and manage change requests, host outages, and application outages from a browser.
- Alert Services - EMAN monitors a large number of applications and network-related devices, including SNMP traps. A user with an alerts profile can subscribe to both public and private alert groups and choose to receive alerts via both e-mail and pager.
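The alert subscription model in the list above (subscribers join alert groups and choose a delivery method, which the alert server selects by time of day) can be sketched as follows. The `Subscriber` class, the working-hours window, and the rule of e-mail by day and pager otherwise are assumptions made for illustration; the article says only that delivery depends on the subscriber's stated preference and the time of day.

```python
from dataclasses import dataclass
from datetime import time
from typing import Iterable, List, Set, Tuple


@dataclass
class Subscriber:
    """A hypothetical alerts profile."""
    name: str
    groups: Set[str]               # alert groups this person subscribes to
    day_start: time = time(8, 0)   # assumed start of working hours
    day_end: time = time(18, 0)    # assumed end of working hours

    def methods_at(self, now: time) -> List[str]:
        """Assumed preference: e-mail during working hours, pager otherwise."""
        if self.day_start <= now < self.day_end:
            return ["email"]
        return ["pager"]


def deliveries(alert_group: str, now: time,
               subscribers: Iterable[Subscriber]) -> List[Tuple[str, str]]:
    """Return (subscriber, delivery method) pairs for one alert."""
    return [(s.name, method)
            for s in subscribers if alert_group in s.groups
            for method in s.methods_at(now)]
```

A subscriber to a `"wan"` group would receive a 09:00 alert by e-mail and a 23:00 alert by pager, while subscribers to other groups receive nothing.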
EMAN provides answers to questions such as "What people and equipment resources do I have?" and "How much am I spending on tools to get the job done?" through the following capabilities:
- Executive Metrics - EMAN offers a high-level view of network and application performance with more than 300 specialized reports geared to IT managers. Network availability metrics, for example, analyze specific areas of network performance around uptime, traffic allocation, and numbers of different devices.
- Inventory Reporting - "Cisco at a glance" provides managers with a variety of inventory reports covering devices, employees, user IDs, and registered EMAN users.
- Budget and Billing - EMAN offers a Manager's Billing Report, a Cisco Unified Communications Manager report, and access to a billing database. This information makes it possible to see how much is being spent on calling cards, pagers, cell phones, and other telecommunications services as well as detailed records broken down on a per-employee basis.
- Performance and Service Usage Trending - EMAN provides nearly 200 performance, exception, and trend reports as well as hundreds of real-time performance and monitoring tools. In addition, managers can view near-real-time and historical "snapshots" of application health.
- Provisioning of Network Services - EMAN automates provisioning services such as home office network access and paging through Cisco OnRamp, an internally developed application that simplifies setting up and administering accounts.
EMAN provides answers to questions such as "How can I get access to needed applications?" or "How can I get voice services in my home office, or calling card or paging services?" through the following capabilities:
- Automated Services Requests - EMAN's Application Services Dashboard enables Cisco employees to request access to over 90 applications, from e-mail and Active Directory access to more specialized business applications.
- Paging Services - EMAN's Pagers and Paging Webpage allows employees to order pagers, send pages, create paging groups, report problems, and check their paging history.
- Employee Search - Employees can search for fellow employees by entering a full name, partial name, or ID into EMAN's Employee Search page. Advanced Search supports additional search criteria.
- Voicemail Services - EMAN's Telecom Service Center enables Cisco employees to request new voicemail services, while configuration change requests are handled through the Client Services homepage.
Prior to 1995, each Cisco IT group managed a block of IP addresses and phone numbers, assigning them to new users and updating resource and service records. Occasionally, however, IP addresses overlapped groups, and different users could be assigned the same IP address or telephone number, resulting in confusion and delayed service delivery. "And if someone needed an IP address, they did not always know which group to ask, which increased the time needed to deliver services," says Conley.
Today, EMAN supports a single enterprise-wide system of record for IP addresses and phone numbers, and Cisco no longer organizes IP and voice service by IT groups. To assign a new IP address or phone number, an EMAN process consults a preallocated address block, assigns the address, pre-registers all Domain Name System (DNS) addresses, and automatically pushes the address out across the network. As Conley says, "Services are provisioned faster, and we have eliminated the possibility of duplicate IP addresses." In creating the telephone number management system, Cisco IT reused the same code that it developed for the Telecommunications Management Network's (TMN) IP Address Management (IPAM) function. In addition to the obvious efficiency of code reuse, technical support staff can support more services, because it is not necessary to learn a new interface. And EMAN's data repository makes it easier to access all the resources necessary to provision services, such as Cisco's Unified Video Advantage for video telephony, to individual users or groups.
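The idea of a single system of record that hands out addresses from a preallocated block, so that duplicates cannot occur, can be sketched with Python's standard `ipaddress` module. The `AddressPool` class and the example block are hypothetical, and the DNS pre-registration and network push steps mentioned above are not modeled here.

```python
import ipaddress
from typing import Dict


class AddressPool:
    """A single system of record for one preallocated address block (illustrative)."""

    def __init__(self, cidr: str) -> None:
        self.network = ipaddress.ip_network(cidr)
        self.assigned: Dict[object, str] = {}  # address -> owner host name

    def assign(self, hostname: str):
        """Hand out the first free host address; an address is never issued twice."""
        for address in self.network.hosts():
            if address not in self.assigned:
                self.assigned[address] = hostname
                return address
        raise RuntimeError(f"address block {self.network} is exhausted")

    def release(self, address) -> None:
        """Return an address to the pool, for example when a service is deactivated."""
        self.assigned.pop(ipaddress.ip_address(str(address)), None)


pool = AddressPool("10.0.0.0/30")  # a tiny block with two usable host addresses
first = pool.assign("host-one")
second = pool.assign("host-two")
```

Because every assignment goes through one pool, two requesters can never receive the same address, which is the property Conley describes. The same structure could hold telephone numbers instead of IP addresses, mirroring the code reuse the article mentions.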
Helping ensure that this system of record is "aware" of all active services requires a centralized approach to service approval and provisioning, or what Cisco IT refers to as proxied administration with delegation of authority. EMAN provides local administrators with a tool to request new phone numbers or IP addresses but does not give them the responsibility for assigning them.
A common interface is indispensable for managing increasingly prevalent converged services. For example, Microsoft Active Directory is integrated with many Cisco services, including Network Admission Control (NAC), Cisco Unified Communications Manager, Cisco Unity, and Cisco MeetingPlace. "When a new employee joins the company, Cisco IT must provision all these services," says Conley. "In the past we would open multiple tickets because each service was independent. Now that services are converged, EMAN provides integrated service management so that a single ticket covers all components of the overall service."
By automating service provisioning, EMAN helps ensure that all devices in Cisco locations worldwide use the same software versions. In the past, when Cisco IT was organized by geographic region, IT staffers in Europe could not handle U.S. service calls because they did not know the different device configurations. EMAN's ability to automatically maintain configuration firmware has enabled IT to standardize processes and the tools used to support them, making it possible to scale support across geographic regions and time zones.
EMAN currently manages 17 service-related applications across the enterprise. Cisco IT plans to extend EMAN to manage services required by business applications such as Oracle 11i, application infrastructure such as J2EE, and operating infrastructure such as Microsoft Exchange.
Leveraging the lessons that it has learned in managing its own infrastructure, Cisco is actively working on reproducing subsets of EMAN functionality as tools for its enterprise customers. This solution is designed to enhance the CiscoWorks solution by enabling companies to manage services across an entire WAN and to manage non-Cisco devices. In addition, more CiscoWorks tools are being built using standard SOAP/XML APIs to allow for their integration into an enterprise management system. EMAN will be able to integrate more of their specialized management functions into its oversight and management systems.
As Conley says, "We cannot shrink-wrap EMAN because it is so tightly integrated with so many Cisco business processes. However, we can develop and package an integrated set of reusable components that other companies can adopt and customize to their internal needs, as we have across Cisco. I predict that most enterprises will move to a service provider paradigm by 2008 or 2010, and an EMAN-like solution will be invaluable in facilitating that transition."