Key Scenarios for Technical Operations
What You Will Learn
The Cisco® MATE portfolio, which consists of the Cisco MATE Design, Cisco MATE Live, and Cisco MATE Collector products, delivers the network manageability required for streamlining processes and for delivering cost-efficient, reliable services. These tightly integrated products together support planning, engineering, and operational tasks.
This paper describes the portfolio and outlines key use cases for technical operations staff. As such, it focuses more on Cisco MATE Live with its immediate and easy access to both current and historical data. With Cisco MATE Live, you can instantly analyze network and traffic trends spanning hours to years. These capabilities are critical for network operations personnel.
Cisco MATE Portfolio
The Cisco MATE portfolio helps service providers to achieve and maintain operational excellence by bringing manageability to their networks. This means operating and growing networks efficiently, having the ability to scale operations, and providing services without disruption. These goals are essential to the success of telecommunications providers whether competing for customers based on cost or based on differentiated services.
Companies differ in their emphasis. For instance, many service providers rely mainly on buying capacity to avoid congestion. In contrast, traditional telecommunication providers often emphasize design and analysis to ensure resiliency, while some service providers consciously forgo resiliency and work around congestion operationally.
Cisco MATE products work independently and collectively to meet cross-functional needs (Figure 1). MATE Live typically (though not exclusively) focuses on the operational scenarios; conversely, the planning, architecture, and engineering scenarios typically feature a combination of MATE Design and MATE Collector.
● MATE Design is a market-leading integrated system for design, engineering, and planning of IP/MPLS networks.
● MATE Live rapidly delivers in-depth network analytics with efficient navigation to both current and historical data for making critical business and technical decisions.
● MATE Collector automatically gathers and continuously maintains information on infrastructure elements, topology, operational state, and traffic statistics for network planning and analytics. It is used extensively by both MATE Design and MATE Live.
Figure 1. Key Purpose of Each MATE Product
Usage in Cross-Functional Environments
An underlying principle of the MATE portfolio is that many cross-functional activities would be more efficiently performed using a single common platform (Figure 2). The groups that benefit from the MATE portfolio - the planning, design, and operations groups - have complementary goals to ensure that networks remain robust and cost-effective to maintain, and that service agreements are maintained along with profitability.
As an example of this integration, MATE Live develops trending reports that can be imported into MATE Design growth plan tools to determine future needs, such as site-to-site traffic trends per class of service. Specific examples of this integration between MATE Design and MATE Live are discussed in the usage scenarios later in this paper.
Operational Usage Scenarios Emphasizing MATE Live
Technical operations departments address real problems on a continual basis. While monitoring, troubleshooting, and ensuring the health of the network, operations has to ensure that maintenance windows do not impact critical applications running on the network. Congestion during maintenance must be mitigated.
These groups face conflicting drivers: the need to increase network capacity and footprint, and the need to maintain existing network services with no impact to current customers. Notifying customers of pending impact from a planned maintenance event is important to ensure that customer service level agreements (SLAs) are not negatively impacted.
The great majority of network operators have minimal or limited network visibility. In large part, this is because it is very difficult to generalize a platform to handle disparities in equipment and implementations, and to be available to handle the rapid feedback needs of operations staff. Cisco MATE Live provides infrastructure analytics to fully support the needs of technical operations as well as other groups.
Some examples of the unique analytical capabilities of Cisco MATE Live include:
● Isolating sub-optimal Label Switched Paths (LSPs); for instance, those that deviate from the shortest path, or whose setup bandwidth doesn’t match their traffic.
● Mining data to find chronic congestion problems.
● Providing reliable site, peer, or region traffic statistics and trends.
● Providing quick access to issues, and the ability to navigate to their root causes, through current and historic “weather maps” (visual representations of network state) and network health panels.
Detecting and Troubleshooting Congestion
Operations groups need to detect congestion before it becomes disruptive to the network. Cisco MATE Live Map (Figure 2) allows them to do this. The Map page provides a high-level view of the current network health through a near-real-time weather map and health panels of critical peering traffic and network issues.
Figure 2. Monitoring Congested Interfaces in MATE Live Map
This information is current and visual, and offers at-a-glance views of any potential problems in the network. Since Map is tightly coupled with MATE Live’s analytics capabilities, operators can immediately navigate to detailed information to troubleshoot network issues (such as congestion).
After isolating an interface to interrogate further, the operator then navigates to related LSPs that are currently traversing that interface. These LSPs can then be ranked by traffic volume (Figure 3) so the operator can decide which ones ought to be rerouted to prevent congestion.
Figure 3. Interface LSPs Ranked by Setup Bandwidth and/or Traffic Volume
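The ranking step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not MATE Live's implementation or API: the `Lsp` record, its field names, and the sample LSPs and interface names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Lsp:
    # Hypothetical record; field names are assumptions for illustration.
    name: str
    setup_bw_mbps: float
    traffic_mbps: float
    interfaces: tuple  # names of the interfaces along the LSP path


def rank_lsps_on_interface(lsps, interface, key="traffic_mbps"):
    """Return the LSPs traversing `interface`, largest first by `key`."""
    on_interface = [lsp for lsp in lsps if interface in lsp.interfaces]
    return sorted(on_interface, key=lambda lsp: getattr(lsp, key), reverse=True)


# Made-up sample data.
lsps = [
    Lsp("lsp_a", 100.0, 420.0, ("chi-nyc", "nyc-bos")),
    Lsp("lsp_b", 300.0, 150.0, ("chi-nyc",)),
    Lsp("lsp_c", 200.0, 900.0, ("sjc-chi",)),
    Lsp("lsp_d", 500.0, 610.0, ("sjc-chi", "chi-nyc")),
]

by_traffic = [lsp.name for lsp in rank_lsps_on_interface(lsps, "chi-nyc")]
by_setup_bw = [
    lsp.name for lsp in rank_lsps_on_interface(lsps, "chi-nyc", key="setup_bw_mbps")
]
print(by_traffic)   # ['lsp_d', 'lsp_a', 'lsp_b']
print(by_setup_bw)  # ['lsp_d', 'lsp_b', 'lsp_a']
```

Ranking by measured traffic and by setup bandwidth can give different orders, as with lsp_a here, whose traffic far exceeds its setup bandwidth. LSPs of that kind are natural candidates for a closer look before rerouting.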
This constant monitoring and quick rerouting ensures that congestion is resolved before it becomes serious enough to affect service quality. Because this can be done quickly and repeatedly, MATE Live is a valuable tool for operators to share with network engineering, so that LSPs can be set up in a way that makes them more predictable to manage.
When there is a small subset of LSPs to be fixed, and the fix is clear, it makes sense to manually configure the change. If, on the other hand, the fix is more complex (for example, if auto-bandwidth or some other dynamic setup is involved), then path computation is required for the fix. In these cases, the capabilities of MATE Design are needed. The section on Mitigating Congestion provides an illustration of this.
Coordinating Peering: Operations Perspective
Service provider peering is becoming increasingly complex and needs to be monitored more closely than in the past. Peering arrangements between service providers are a necessity on the Internet, generally reducing costs for transit services and increasing resiliency and capacity for all parties involved. However, peering relationships must be monitored closely to ensure that they are fair and commensurate with the original agreement.
Cisco MATE Live allows operators to closely monitor peering on multiple dimensions. As shown in Figure 4 (Peering Health Panel), a peering interface showing a disparity (and congestion in one direction) can be interrogated to view the traffic in both directions. This panel shows external peers and the sites that are connected to them; it also shows the “Traffic Out” (in blue) and the “Traffic In” (in green).
Figure 4. Select a Peering Interface to Interrogate Ingress and Egress Traffic
These icons illustrate the relationships between sites and their external peers; they show the highest utilization of incoming and outgoing traffic between all sites. Thus, operators can monitor the level of health of peering connections across sites, and can quickly navigate to specific interfaces to analyze issues.
The analytics are immediately available under MATE Live; they offer “to/from” traffic analysis, and identify any issues or changes with interfaces, nodes, capacities, or routing metrics. For example (Figure 5), you can project individual peering interface growth and plan which interfaces will need to be upgraded (and when).
Figure 5. Trends for Incoming Traffic on a Peering Interface
This planning is further augmented by the ability to report periodically (Figure 6) on aggregated traffic coming in and out of peering interfaces.
Figure 6. Report on Total Traffic at Peering Locations
Cisco MATE Design also has a large role to play in the operations arena, often in conjunction with MATE Live. For example, MATE Live results can be imported into MATE Design for network planning, as the following usage scenarios on growth projections illustrate.
Producing a Traffic Matrix Growth Trend (Site to Site)
Operations staff can generate a projection for traffic matrix growth based on a variety of groupings. For example, the projections can be based on groups of interfaces or on groups of LSPs, and can then be imported into MATE Design for use in forecasting (Figure 7).
Figure 7. Growth Is Projected for LSPs, Then Imported to MATE Design for Forecasting
For example, operators can report on the traffic growth of external interfaces per site and decide which ones to upgrade by identifying those that are projected to exceed a chosen utilization threshold (for instance, 80 percent).
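As a rough illustration of how such a projection could flag upgrade candidates, the sketch below fits a straight line to historical traffic samples and extrapolates it. The linear least-squares model, the function names, and the sample figures are all assumptions made for this example; the paper does not state which trending method MATE Live uses.

```python
def linear_fit(ys):
    """Least-squares slope and intercept for ys sampled at x = 0, 1, 2, ...

    Requires at least two samples.
    """
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def projected_utilization(history_mbps, capacity_mbps, periods_ahead):
    """Extrapolate the fitted trend and express it as a fraction of capacity."""
    slope, intercept = linear_fit(history_mbps)
    x = len(history_mbps) - 1 + periods_ahead
    return (slope * x + intercept) / capacity_mbps


def interfaces_to_upgrade(histories, capacities, periods_ahead, threshold=0.80):
    """Names of interfaces whose projected utilization exceeds the threshold."""
    return sorted(
        name
        for name, history in histories.items()
        if projected_utilization(history, capacities[name], periods_ahead) > threshold
    )


# Hypothetical monthly traffic (Mbps) on two 1 Gbps peering interfaces.
histories = {
    "peer-1": [400, 450, 500, 550],
    "peer-2": [300, 310, 320, 330],
}
capacities = {"peer-1": 1000, "peer-2": 1000}

candidates = interfaces_to_upgrade(histories, capacities, periods_ahead=6)
print(candidates)  # ['peer-1']
```

Here peer-1 grows by 50 Mbps per month and is projected to reach about 85 percent utilization six months out, so it is flagged; peer-2 is projected at about 39 percent and is not.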
Growth Within a Site or Overall Network
In addition to the site-to-site option, operations staff can monitor business health by looking at high-level traffic growth indicators.
Figure 8. High-Level Traffic Growth Indicators
By navigating into specific interfaces, operations staff can look more closely at, for instance:
● Quarterly total network traffic and growth
● Quarterly per-site traffic and growth
Figure 9. Monthly Trend Information for Traffic Entering an Interface
Once these reports are generated, operators can also graph the historical and projected trends on various bases (per interface or group of interfaces, P95 values or not, weekly compared to daily traffic, etc.).
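For readers unfamiliar with P95 values, the sketch below computes a 95th percentile with the nearest-rank method. This is one common definition among several, chosen here as an assumption; the paper does not say which definition MATE Live applies, and the sample values are made up.

```python
def p95(samples):
    """95th percentile of `samples` using the nearest-rank method."""
    if not samples:
        raise ValueError("p95 requires at least one sample")
    ordered = sorted(samples)
    # Integer form of ceil(0.95 * n), giving a 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Twenty hypothetical traffic samples (Mbps): steady load with one short burst.
samples = [100] * 19 + [1000]

print(max(samples))  # 1000
print(p95(samples))  # 100
```

The single burst dominates the maximum but does not move the P95 value, which is why P95 is often preferred over peak values when trending traffic.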
Health Reports for Chronic and Systemic Issues
Health reports are another useful tool for operations staff. In particular, chronic and systemic issues need to be sorted out before they cause service disruption. Reports through Cisco MATE Live make this information available to the network operator:
● Interfaces - Percentage of time over utilization threshold, operation status transitions
● LSPs - Percentage of time over setup bandwidth threshold, time not on shortest TE path, actual path changes
● Nodes - Percentage of time over CPU or memory utilization
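Two of the measures listed above, percentage of time over a threshold and the number of state transitions, can be sketched as follows. The helper names and sample data are hypothetical, and the sketch assumes equally spaced samples so that a fraction of samples stands in for a fraction of time.

```python
def percent_time_over(values, threshold):
    """Percentage of equally spaced samples that exceed `threshold`."""
    if not values:
        return 0.0
    over = sum(1 for value in values if value > threshold)
    return 100.0 * over / len(values)


def count_transitions(states):
    """Number of times consecutive samples differ (for example, up -> down)."""
    return sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)


# Hypothetical samples for one interface.
utilization = [0.50, 0.90, 0.95, 0.70, 0.85, 0.60, 0.40, 0.92]
oper_status = ["up", "up", "down", "up", "up", "down", "down", "up"]

pct_over = percent_time_over(utilization, threshold=0.80)
flaps = count_transitions(oper_status)
print(pct_over)  # 50.0
print(flaps)     # 4
```

The same `percent_time_over` helper applies to LSP traffic measured against setup bandwidth, or to node CPU and memory utilization, by passing the corresponding samples and threshold.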
Health Reports for LSPs
A sample LSP health report is shown in Figure 10.
Figure 10. Sample LSP Health Report
With this report, operators can make adjustments to LSPs when traffic is too far over their setup bandwidth, or when they are off the shortest path too often.
Health Reports for Interfaces or Nodes
Performing network health analysis allows an operator or engineer to catch a small problem before it becomes a larger one. For example (Figure 11), operators can use a health report to determine the 10 hottest or most unstable interfaces, nodes, or LSPs.
Figure 11. Health Report to Determine Hot or Unstable Interfaces
Operators can then analyze behavior over time for further troubleshooting (Figure 12), and take corrective actions as needed.
Figure 12. Traffic Path Analysis
Monitoring network health on an ongoing basis helps ensure that service providers are within the bounds of SLAs.
Operations Tasks Served by Cisco MATE Design
Although Cisco MATE Live stands out for its analytics and flexible data store, MATE Design has a number of key roles to play in the operations arena as well. For instance, an operations team would use MATE Design when scheduling maintenance or mitigating congestion.
Scheduling Maintenance Using Cisco MATE Design
Increasing network capacity and footprint while maintaining existing network services (with no impact to current customers) presents a set of conflicting drivers. Notifying customers of pending impact due to planned maintenance is important to ensure that customer SLAs are not negatively impacted, and it is critical to perform network maintenance with little to no impact to current traffic.
To ensure that customers are notified before a maintenance activity, MATE Design can provide information about (for instance) VPN customers and specific services that are traversing the impacted network elements. MATE Design can also sequence the steps during maintenance to minimize the impact caused by congestion and packet loss in a network.
To do this, you first identify and notify the relevant customers by determining the nodes and links that will be impacted, and collecting a list of the VPN customers and services traversing those nodes and links (see Figure 13).
Figure 13. Selecting Demands Traversing an Interface or Node
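The selection step described above amounts to filtering demands by the elements on their paths. The sketch below shows that idea under stated assumptions: the demand records, their keys, and the customer, service, node, and link names are all hypothetical, and it does not use or represent any MATE Design API.

```python
def impacted_customers(demands, impacted_elements):
    """Map each affected customer to the sorted list of its affected services.

    A demand is affected when its path contains any impacted node or link.
    """
    impacted = set(impacted_elements)
    result = {}
    for demand in demands:
        if impacted.intersection(demand["path"]):
            result.setdefault(demand["customer"], set()).add(demand["service"])
    return {customer: sorted(services) for customer, services in sorted(result.items())}


# Hypothetical demands; each path lists the nodes and links it traverses.
demands = [
    {"customer": "acme", "service": "l3vpn-gold",
     "path": ["sjc", "sjc-chi", "chi", "chi-nyc", "nyc"]},
    {"customer": "acme", "service": "l3vpn-bronze",
     "path": ["sjc", "sjc-atl", "atl"]},
    {"customer": "globex", "service": "l2vpn",
     "path": ["chi", "chi-nyc", "nyc"]},
    {"customer": "initech", "service": "l3vpn-gold",
     "path": ["atl", "atl-mia", "mia"]},
]

to_notify = impacted_customers(demands, ["chi-nyc"])
print(to_notify)  # {'acme': ['l3vpn-gold'], 'globex': ['l2vpn']}
```

Passing a node name instead of a link name (for example, `["atl"]`) works the same way, because nodes and links are both listed on each path.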
Then, for maintenance sequencing, operators can make the appropriate changes to a design plan file through Metric Optimization or LSP optimization. MATE Design’s Changeover (under Tools) is used to sequence changes with the least amount of impact to the network (see Figure 14).
Figure 14. Using Changeover to Keep Link Utilization Below Specified Percentage
Mitigating Congestion Using Cisco MATE Design
Unplanned network failures and unusual network utilization can both create network congestion. Dealing effectively with active network congestion is critical to network operations groups. Changes can be made in a network to alleviate active congestion, but making these changes accurately can be a risky activity.
MATE Design can be used to look at existing network issues, adjust network traffic flows with network optimization, and help accurately implement the changes in the network. Depending on the configuration of the customer’s network, this may involve tactical Interior Gateway Protocol (IGP) metric optimization, tactical explicit LSP optimization, or manual optimization.
To use MATE Design to address congestion mitigation, the operator first finds a plan file that represents the congested traffic, and either copies the failures from the current network into the representative plan file, or copies the network traffic from the representative plan file into the current plan file.
Figure 15 shows a plan file with a failed link (from Chicago to New York) and with representative demands.
Figure 15. Plan File with Failed Link But Representative Demands
The operator can optimize the network model to reduce steady-state congestion issues. IGP metrics can be used for coarse traffic engineering, and fine tuning can be achieved with MPLS Resource Reservation Protocol (RSVP) LSP traffic engineering.
In Figure 16, the explicit LSP traffic is traversing links that can handle it.
Figure 16. High-Bandwidth Demand Located and Rerouted Using Explicit LSP
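To see why IGP metrics act as a coarse traffic engineering tool, consider the minimal sketch below. It runs a plain Dijkstra shortest-path computation, as link-state IGPs do, before and after one metric is raised. The three-node topology, the metric values, and the Atlanta node are made up for illustration, and the sketch is not how MATE Design computes or optimizes metrics.

```python
import heapq


def shortest_path(links, src, dst):
    """Dijkstra over directed links {(a, b): metric}; returns (cost, path) or None."""
    adjacency = {}
    for (a, b), metric in links.items():
        adjacency.setdefault(a, []).append((b, metric))
    heap = [(0, src, [src])]
    visited = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, metric in adjacency.get(node, []):
            if neighbor not in visited:
                heapq.heappush(heap, (cost + metric, neighbor, path + [neighbor]))
    return None


# Hypothetical metrics: the direct Chicago-New York link is preferred at first.
links = {("chi", "nyc"): 10, ("chi", "atl"): 10, ("atl", "nyc"): 10}
before = shortest_path(links, "chi", "nyc")

# Raising the metric on the direct link moves traffic onto the path via Atlanta.
links[("chi", "nyc")] = 25
after = shortest_path(links, "chi", "nyc")

print(before)  # (10, ['chi', 'nyc'])
print(after)   # (20, ['chi', 'atl', 'nyc'])
```

A metric change of this kind shifts all traffic that was using the shortest path across that link, which is why it is coarse. An explicit RSVP LSP, by contrast, can pin one specific demand to a chosen path without moving everything else.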
Building and running a large-scale network requires a series of activities to be performed. These activities are typically divided into planning (months to years), engineering (weeks to months) and operations (tactical on an ongoing basis). These activities are most efficiently performed using a common infrastructure. The Cisco MATE portfolio is the foundational platform to support planning, engineering, and operations.
For More Information
To learn more about Cisco MATE Live, visit http://www.cisco.com.
To learn more about the Cisco MATE portfolio, which includes the MATE Live, MATE Design, and MATE Collector products, visit http://www.cisco.com.