As optical networks evolve, service providers and large enterprises are increasingly focused on protecting their optical networks and supporting element management systems (EMSs). Network management system (NMS) applications are becoming more critical for organizations looking for the highest workforce productivity. The Cisco® Transport Manager High-Availability Solution addresses these needs, providing customers with a highly robust, resilient configuration that protects their mission-critical optical EMSs.
With a geographic redundancy configuration based on Veritas Volume Replicator and Global Cluster Option, the Cisco Transport Manager High-Availability Solution gives customers the flexibility to create an EMS capable of withstanding any type of disaster with minimal downtime.
The Cisco Transport Manager High-Availability Solution helps to ensure that network downtime is kept to a minimum. It provides carrier-class reliability when customers need it most. Cisco Transport Manager is built from relatively low-cost, off-the-shelf hardware and software components, resulting in a low cost of ownership. In addition, the database and system administration skills that these components require are readily available in today's Internet-driven industry.
A High-Availability Infrastructure for Network Resilience
The Cisco Transport Manager High-Availability Solution provides IP connectivity with high performance and availability. It uses the intelligent, add-on Cisco Transport Manager High-Availability Agent, which automatically detects failures and implements failover on a customer's optical EMS. Dual Sun servers and tight integration with Cisco Transport Manager software (Oracle Relational Database Management System [RDBMS] and Veritas products) provide a resilient combination to protect customers' networked environments. With the risk of data corruption reduced and network visibility enhanced, customers now have the added security of knowing that the Cisco Transport Manager platform is optimized to provide continuous service should a failure occur.
Cisco Transport Manager High-Availability Solution Advantages
The Cisco Transport Manager High-Availability Solution provides automatic failover in response to specific software failures and single hardware failures, without the need to reconfigure IP addresses on the customer's switched/router network. The solution offers:
• A highly resilient configuration
• Minimal downtime, resulting in carrier-class reliability
• Multiple redundant components to protect against hardware failures
• Low cost of ownership
• Protection of mission-critical optical EMS platforms
• Automatic restoration through the intelligent Cisco Transport Manager High-Availability Agent
• Continuous operation, which translates into operations cost savings by averting disruptions to business
• The ability to perform modifications in real time, while the primary system is active
• Automatic reconnection of Cisco Transport Manager clients after failover
• A Redundant Array of Independent Disks (RAID) 1+0 disk configuration for reduced risk of corrupted database information
• Transparent recovery from any fault within an external disk array
• A blend of best-of-breed products from Veritas, Oracle, and Sun
Cisco Transport Manager High-Availability Solution Configuration
Figure 1 provides an overview of Cisco Transport Manager in a locally redundant high-availability configuration, where the cluster configuration is connected to a switch or router network.
Figure 1. Cisco Transport Manager in a Locally Redundant High-Availability Environment
Figure 2 provides an overview of Cisco Transport Manager in a geographically redundant high-availability configuration with the cluster configuration connected to a switch or router network. Each location consists of a one- or two-node Cisco Transport Manager local redundancy configuration (a two-node configuration is shown).
Figure 2. Cisco Transport Manager in a Geographically Redundant High-Availability Environment
Cisco Transport Manager High-Availability Agent Features
Cisco Transport Manager High-Availability Agent offers several features, including:
• Restoration of the system when a failure has been detected
• Server and disk array redundancy
• Automatic reconnection to Cisco ONS and Cisco MGX Family network elements
• Mirrored internal boot and root disks
• Status monitoring of multiple "heartbeat" interfaces
• The ability to make modifications in real time while the primary system is active
• Fiber connectivity between the servers and the disk
• RAID 1+0 disk configuration optimization
• Transparent recovery of any fault in an external disk
• Redundant IP interfaces
• Automatic reconnection of Cisco Transport Manager clients
Cisco Transport Manager High-Availability Components
The Cisco Transport Manager High-Availability Solution operates on two identical Sun servers installed in a local cluster configuration. For geographic redundancy, both single and dual local clusters are supported, allowing maximum flexibility and hardware efficiency. In a local redundancy configuration, a heartbeat interconnection between the servers is set up in a redundant configuration with dedicated 100BASE-T Ethernet connections to proactively monitor the status of the primary Sun server. Redundant power supplies with multiple IP interfaces eliminate any potential single point of failure within the hardware design. Additional redundancy has been added in both primary and secondary servers with mirrored internal boot and root disks.
The disk array connectivity to the Sun StorEdge 3510 FC disk arrays is accomplished through diverse fiber cabling (Figure 3). Dedicated direct-connect interconnect cables are used to pass data between the internally redundant 3510 disk arrays, with the disks optimized in a RAID 1+0 configuration.
Figure 3. Hardware Connectivity to the Disk Array
The Sun StorEdge 3510 FC disk array is connected to each Sun server through dual redundant fiber paths. Connectivity is completely redundant and the data transfer occurs at the PCI adapter's optical speed of 200 MBps.
Cisco Transport Manager High-Availability Hardware Configuration
• Redundant servers and disk array design
• Optimized with RAID 1+0 disk configuration
• Mirrored internal boot and root disks
• Redundant 100BASE-T IP interfaces
• Diverse fiber connectivity between the servers and the disk arrays
Cisco Transport Manager High-Availability Cluster Software Configuration
The Cisco Transport Manager High-Availability Solution is composed of several software elements. The database and Cisco Transport Manager application processes run on Solaris with Oracle Database Enterprise Edition. Veritas is the high-availability foundation for the intelligent Cisco Transport Manager High-Availability Agent, providing a fully integrated solution.
The local-redundancy Cisco Transport Manager High-Availability Solution works with one or two identical Sun servers configured in a 1:0 or 1:1 protection schema to protect against a single failure in the cluster. The benefit of a single-node cluster is that the Cisco Transport Manager High-Availability Agent and Veritas high-availability software can monitor the server hardware and software, and can either communicate problems to a geographically redundant Cisco Transport Manager server or fail over to redundant components within the server.
In dual-node clusters, a virtual IP address is used for Cisco Transport Manager. If the intelligent Cisco Transport Manager High-Availability Agent determines that the primary Sun server is no longer operational, the secondary Sun server is activated to assume the master role. All Cisco Transport Manager connectivity remains the same through the virtual IP address.
In the event of a primary server failure (dual node clusters), reconnection of the Cisco Transport Manager clients is automatic. Once the secondary server has restarted all its "daemon" processes in the correct sequence through the intelligent Cisco Transport Manager High-Availability Agent, reconnections from the Cisco Transport Manager clients will be transparent to users.
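The "correct sequence" mentioned above can be pictured as a dependency ordering: each process starts only after the processes it relies on are running, and shutdown runs in the reverse order. The following is a minimal sketch of that idea, not the agent's actual logic. The process names and their dependencies are hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical process names and dependencies, for illustration only.
# Each key lists the processes that must already be running before it starts.
DEPENDS_ON = {
    "oracle_db": [],
    "ctm_server": ["oracle_db"],
    "gateway": ["ctm_server"],
}


def start_order(depends_on):
    """Return the processes in an order that starts dependencies first."""
    return list(TopologicalSorter(depends_on).static_order())


def stop_order(depends_on):
    """Return the reverse order, so dependents stop before what they rely on."""
    return list(reversed(start_order(depends_on)))


print(start_order(DEPENDS_ON))  # ['oracle_db', 'ctm_server', 'gateway']
print(stop_order(DEPENDS_ON))   # ['gateway', 'ctm_server', 'oracle_db']
```

Deriving the order from declared dependencies, rather than hard-coding a list, keeps the start and stop sequences consistent with each other when a process is added.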
Before returning the master role from the secondary server to the primary server, platform managers should investigate the failure condition and revert manually only after the condition has been cleared. To prevent toggling between servers, the system does not revert automatically. Because all clients and operations support systems (OSSs) interface to the virtual IP address, the GateWay/TL1, GateWay/SNMP, and GateWay/CORBA functions are not affected by the loss of the primary server (dual-node clusters).
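The revert policy described above (automatic failover, manual failback) can be sketched as a small state holder. This is an illustrative model only. The class and method names are hypothetical and do not correspond to any Cisco or Veritas interface.

```python
class FailoverPair:
    """Tracks which server of a two-node cluster holds the master role."""

    def __init__(self):
        self.active = "primary"
        self.primary_healthy = True

    def primary_failed(self):
        # Failover to the secondary server is automatic.
        self.primary_healthy = False
        self.active = "secondary"

    def primary_repaired(self):
        # Clearing the fault does NOT move the master role back by itself;
        # this is what prevents the servers from toggling.
        self.primary_healthy = True

    def manual_revert(self):
        # Only an operator action reverts, and only once the fault is cleared.
        if not self.primary_healthy:
            raise RuntimeError("primary fault not cleared; refusing to revert")
        self.active = "primary"


pair = FailoverPair()
pair.primary_failed()
pair.primary_repaired()
print(pair.active)  # secondary (no automatic revert)
pair.manual_revert()
print(pair.active)  # primary
```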
A failure with an external disk in the array will be transparent, and the system will continue to operate with the RAID 1+0 configuration. Even with the failure of one of the cluster servers' heartbeat interfaces, the primary server would continue to operate without a failover.
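One way to read the behavior above is that redundancy absorbs single faults before they reach the failover decision. A disk failure is masked by the RAID 1+0 array, and the loss of one heartbeat link leaves another link to confirm that the primary server is alive. The sketch below encodes that reading. It assumes, beyond what the text states, that failover is triggered when every heartbeat link is down or when a monitored critical service is unhealthy. The function name and parameters are hypothetical.

```python
def should_fail_over(heartbeat_links_up, critical_services_ok):
    """Decide whether the secondary server should take over.

    heartbeat_links_up: list of booleans, one per redundant heartbeat link.
    critical_services_ok: True while the monitored processes are healthy.

    Disk faults inside the external array are deliberately not an input:
    the RAID 1+0 configuration handles them without involving the cluster.
    """
    primary_reachable = any(heartbeat_links_up)  # one live link is enough
    return (not primary_reachable) or (not critical_services_ok)


print(should_fail_over([True, False], True))   # False: one link lost, no failover
print(should_fail_over([False, False], True))  # True: assumed trigger, all links lost
```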
Virtual IP Address Connectivity
Cisco has deployed its Cisco Transport Manager High-Availability Solution with multiple redundant IP interfaces. The virtual IP address configuration integrated into the solution with dual node clusters enables all Cisco Transport Manager clients, the Cisco ONS Family, and the OSS platforms to interface through a single (but redundant) IP interface. If a failure occurs due to cabling, a faulty PCI adapter module, or the redundant hub, the intelligent Cisco Transport Manager High-Availability Agent will automatically fail over to another interface and continue to operate with the same virtual IP address through the customer's switched/router network. Customers do not have to reconfigure any network elements or Cisco Transport Manager clients.
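The interface failover described above can be sketched as a selection rule: the virtual IP address stays fixed, and only the physical interface that carries it changes. This is a simplified illustration, not the agent's implementation. The interface names and the address (taken from the 192.0.2.0/24 documentation range) are hypothetical.

```python
VIRTUAL_IP = "192.0.2.10"  # hypothetical address; it never changes on failover


def select_interface(interfaces, current):
    """Pick the physical interface that should carry the virtual IP.

    interfaces: dict mapping interface name -> True if healthy.
    current: name of the interface that carries the virtual IP now.
    Returns the chosen interface name, or None if none is healthy.
    """
    if interfaces.get(current, False):
        return current  # no move while the current interface is healthy
    for name, healthy in interfaces.items():
        if healthy:
            return name  # first healthy alternative takes over
    return None


nics = {"nic0": False, "nic1": True}  # nic0 has failed (cable, adapter, or hub)
print(select_interface(nics, "nic0"), VIRTUAL_IP)  # nic1 192.0.2.10
```

Because clients and network elements address only the virtual IP, the move from one interface to another requires no change on their side.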
The Intelligent Cisco Transport Manager High-Availability Agent
The intelligent Cisco Transport Manager High-Availability Agent interacts with best-of-class products from Oracle, Veritas, and Sun, providing the application link between Veritas and Cisco Transport Manager processes. It automatically detects failures by proactively monitoring the redundant heartbeat interfaces, checking the status of specific daemons, and assessing the health of Oracle and Cisco Transport Manager. It is also responsible for starting and stopping all processes in an orderly fashion.
When a critical failure has been detected, Cisco Transport Manager High-Availability Agent will attempt to shut down the primary server and restart all processes on the secondary server in a matter of minutes. Cisco Transport Manager automatically recovers, network visibility is restored, and all data is resynchronized, helping to ensure reliability, recoverability, and continuous operation of the network. If a failure has been detected on the active primary virtual IP interface, the system will automatically toggle to a new physical interface through the switched/router network.
Figure 4 shows the high-availability configuration with multiple network interface cards (NICs).
Figure 4. MultiNIC Configuration
The intelligent Cisco Transport Manager High-Availability Agent:
• Checks for heartbeat and status on the primary Sun server
• Monitors the status of daemons, Oracle RDBMS, and Cisco Transport Manager
• Operates with best-of-class products from Oracle, Veritas, and Sun
• Is responsible for orderly shutdown and startup of daemon processes
• Constantly checks the status of IP interfaces used with the virtual IP address
• Is an add-on module to Cisco Transport Manager
The combination of the Cisco Transport Manager High-Availability Installation Guide with the intelligent Cisco Transport Manager High-Availability Agent provides customers with an integrated high-availability solution. The installation guide is based on a reference architecture that details all of the hardware, software, and required Solaris patches needed for a successful high-availability implementation. It provides detailed steps (articulated down to UNIX shell-level commands where possible) to install the complete Cisco Transport Manager High-Availability Solution from the ground up.
The geographically redundant configuration consists of two locally redundant Cisco Transport Manager installations that are "connected" through Veritas Volume Replicator and "controlled" through Veritas Global Cluster Manager. Volume Replicator keeps the data at the two sites synchronized through asynchronous block-level replication, while Global Cluster Manager communicates with the high-availability agent at each site to handle site failover, when required. The two sites use an Internet Control Message Protocol (ICMP) ping-based heartbeat mechanism, allowing for short failure detection time. A Java-based Global Cluster Manager GUI allows flexible management of multiple clusters and sites from a single console. With geographic redundancy, the highest level of network resilience is attained.
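Asynchronous block-level replication, as mentioned above, means that a write completes at the active site without waiting for the remote site, and the changed blocks are shipped afterward in their original order. The sketch below is a toy model of that general technique, not of Veritas Volume Replicator itself. The class and method names are hypothetical.

```python
from collections import deque


class AsyncReplicator:
    """Toy model of asynchronous, order-preserving block replication."""

    def __init__(self):
        self.primary = {}       # block number -> data at the active site
        self.secondary = {}     # block number -> data at the standby site
        self.pending = deque()  # writes not yet shipped, in write order

    def write(self, block, data):
        # The write completes locally without waiting for the remote site.
        self.primary[block] = data
        self.pending.append((block, data))

    def ship(self, max_writes=None):
        # Apply queued writes at the standby site in their original order.
        shipped = 0
        while self.pending and (max_writes is None or shipped < max_writes):
            block, data = self.pending.popleft()
            self.secondary[block] = data
            shipped += 1
        return shipped

    def in_sync(self):
        return not self.pending


rep = AsyncReplicator()
rep.write(0, b"alarm-log")
rep.write(1, b"inventory")
print(rep.in_sync())  # False: the standby site lags by two writes
rep.ship()
print(rep.in_sync())  # True
```

The trade-off in this model is that writes still in the queue at the moment of a site failure have not reached the standby site, which is why replication lag matters for disaster recovery.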
Cisco provides a high-availability certification and test plan as part of the Cisco Transport Manager High-Availability Installation Guide.
The Cisco Transport Manager High-Availability Solution provides customers with a resilient configuration that helps to ensure that their mission-critical optical EMSs are protected. The platform protects against single hardware failures with mirrored internal boot and root disks, redundant servers, redundant disk arrays, N+1 power supplies, and multiple IP heartbeat interfaces. The RAID 1+0 configuration provides an extra layer of disk security by combining RAID 1 (mirroring) with RAID 0 (striping) for added protection against disk failure.
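The RAID 1+0 trade-off can be made concrete with a little arithmetic: half of the raw capacity is usable, and the array keeps running as long as no mirrored pair loses both of its disks. The sketch below illustrates this under stated assumptions. The disk count, disk size, and pair layout are hypothetical examples, not the configuration of the reference architecture.

```python
def raid10_usable_capacity(disk_count, disk_size_gb):
    """Usable capacity of a RAID 1+0 set: half the disks hold mirror copies."""
    if disk_count < 4 or disk_count % 2:
        raise ValueError("RAID 1+0 needs an even number of disks, at least four")
    return (disk_count // 2) * disk_size_gb


def raid10_survives(mirror_pairs, failed_disks):
    """True if every mirrored pair still has at least one working disk."""
    failed = set(failed_disks)
    return all(not (a in failed and b in failed) for a, b in mirror_pairs)


pairs = [(0, 1), (2, 3)]                  # hypothetical layout: two mirrored pairs
print(raid10_usable_capacity(8, 100))     # 400
print(raid10_survives(pairs, [0]))        # True: single-disk failure
print(raid10_survives(pairs, [0, 2]))     # True: one disk lost from each pair
print(raid10_survives(pairs, [0, 1]))     # False: both disks of one pair lost
```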
With Veritas as an integral part of this solution, customers can resize volumes, perform online storage reconfiguration or defragmentation, and even add or remove disks in real time in their environments. Veritas File System and Volume Manager provide increased flexibility, allowing modifications to be made on the active Sun server in real time while the system continues to operate. These features help to ensure continuous operation during routine maintenance and unplanned outages.
Veritas enhanced features include:
• Reduced maintenance downtime by enabling online maintenance operations
• Reduced outages due to hardware failure
• Simple manageability
• Reduced outages due to file system panics
• Faster recovery when outages do occur
The Cisco Transport Manager high-availability team has created a baseline architecture composed of Sun hardware that customers can use to set up their Cisco Transport Manager high-availability environments. For customers with no high-availability hardware components in their networks, the reference architecture provides the complete tested and qualified solution. The hardware is designed around a single hardware vendor (Sun) to help ensure support and interoperability between all products. By implementing all Sun hardware (Sun servers, Sun StorEdge 3510 FC disk arrays), customers reap the benefits of working with a single vendor, including interoperability and a single contact for resolving any potential hardware problems.
This reference configuration provides a baseline to validate the complete Cisco Transport Manager High-Availability Solution for both hardware and software. Customers may deviate from this configuration and implement their preferred hardware (Sun server and disk array), provided the hardware selected has been validated by Veritas for use with the Veritas Storage Foundation Enterprise HA for Oracle. Factors such as CPU, memory, cache, and disk capacity should be assessed, and any applicable Solaris 10 patches will need to be applied.
Customers who have chosen to implement their preferred disk arrays can also select their optimal RAID configurations. Since a RAID configuration spans multiple disks, load balancing is optimized, increased throughput and bandwidth are accomplished, and protection against single-disk failures is ensured to prevent any data corruption. As a result, online disk maintenance (such as defragmenting file systems or adding new disks or volumes) can be performed while the system is in operation with the Veritas tools.
All of these features combine to provide customers with a fast recovery (if needed) and with continuous service for planned and unplanned outages, which is critical to organizations requiring a high level of availability.
The Cisco Transport Manager High-Availability Solution provides a resilient, flexible, carrier-class optical management solution. With the solution, the risk of losing data is significantly reduced and the Cisco Transport Manager platform is optimized to provide continuous service if a failure does occur. The solution maximizes revenue and customer satisfaction by minimizing downtime in the event of a disaster or unforeseen network interruption.