New platform supports more voice and video services, devices, and users with fewer servers.
For many years, Cisco IT used Cisco® 7845 Media Convergence Servers (Cisco MCS) as a standard hosting platform for voice services. Cisco IT also used systems such as Cisco Unified Communications Manager (Cisco UCM) for enterprise voice, the Cisco Unity® Connection servers for voicemail, and the Cisco Unified Contact Center Enterprise servers for contact centers.
In 2011, Cisco IT began to migrate all of its internal unified communications (UC) systems from dedicated, physical Cisco MCS servers to virtual machines running on Cisco Unified Computing System™ (Cisco UCS®) servers. This migration was prompted by several factors.
The first factor was the need to support more users and endpoints, particularly in the Cisco UCM clusters. Some of these clusters had grown close to their upper limit. "Many of our users have multiple phones, and many are adding Cisco Jabber® software clients that integrate voice, video, and data services," says Jim Marshall, Cisco IT engineer, UC and voice processing operations group. "All of these endpoints route voice calls through our Cisco UCM clusters, which means we needed more capacity for users and devices."
The second factor was the goal of consolidating and simplifying Cisco's UC infrastructure worldwide to reduce data center costs. The third factor was the decision by Cisco IT to standardize on virtual machines (VMs) running on Cisco UCS servers to make regular maintenance easier and to improve server availability. The final factor was the need to replace the current Cisco MCS servers, which were aging and eligible for a scheduled hardware refresh. This migration effort from Cisco MCS to Cisco UCS servers would eventually encompass all Cisco locations and UC solutions (Figure 1).
Such a large, complex, multi-part, and high-impact migration effort presented significant challenges and risks to Cisco IT. For example, although migration windows could be scheduled during times of low call volumes, Cisco IT wanted to reduce the risk of service disruptions as much as possible. This goal meant planning for many variables:
• Diverse UC host types, from the core Cisco UCM call-processing system to voice-messaging systems to contact center systems.
• Different sizes and implementations for the call-processing clusters, from a small cluster installed in a telecom closet in the Johannesburg Cisco office to multiple large clusters, supporting a variety of services, installed in the Cisco data center in San Jose, California, that serves the company's headquarters campus.
• Different data center constraints that called for different types of Cisco UCS servers in different locations. Larger data centers could support the power requirements of UCS B-Series Blade Servers; smaller data centers, or those with less rack space, required UCS C-Series Rack Servers (see Figure 2).
• Different requirements for UC service availability during the migration that would mean different migration plans for certain systems and sites.
• All UC servers had to be at version 8.0.3 or above to be safely migrated to Cisco UCS. Cisco IT needed to help ensure the upgrades took place before the migration began.
Cisco IT addressed these factors in its migration planning. The migration began in 2011 with the Cisco Unity Connection messaging servers and continued throughout 2012 and into 2013 with other UC systems.
As of early 2013, Cisco IT had migrated the following UC systems from Cisco MCS to Cisco UCS servers:
• Cisco Unity Connection for voice messaging
• Cisco UCM for call processing
• Cisco Unified Contact Center Enterprise (Cisco UCCE) and most related systems for Cisco's internal and outsourced contact centers
To meet the different requirements of the individual cluster locations and UC host types, Cisco IT defined two distinct migration methods: "changeover" and "modified greenfield."
Changeover method. In a changeover migration, the UC hosts and associated databases are moved from the Cisco MCS servers to the Cisco UCS servers during a designated time window using the Disaster Recovery System (DRS) tools provided by the application, as recommended by Cisco for customer migrations. This migration subjects users to two phone resets, each of which can take a few minutes: one reset when phones fail over to the backup server and another when they fail back to the new primary server.
The changeover method can also take considerable time for the IT operations team, depending on the size of the Cisco UCM cluster and the number of spare servers available as backup during the migration. Each server can require approximately one hour of installation time and another 1.5 hours to perform the changeover. With clusters of more than 3-4 subscriber servers, this process can last longer than the time allocated for most reasonable change management windows. However, with more backup servers, the migration can be done in larger groups, reducing the time required.
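The timing figures above lend themselves to a rough estimate. The following Python sketch is illustrative only and is not a Cisco IT tool: it assumes that servers within a group are handled in parallel, that groups run one after another, and that the group size is limited by the number of spare servers available. The function name and parameters are hypothetical.

```python
import math

INSTALL_HOURS = 1.0      # approximate installation time per server (from the text)
CHANGEOVER_HOURS = 1.5   # approximate changeover time per server (from the text)


def changeover_window_hours(servers: int, group_size: int = 1) -> float:
    """Estimate elapsed hours for a changeover migration.

    Assumption (not from the source): servers in one group are migrated
    in parallel, and groups are processed sequentially.
    """
    if servers < 0 or group_size < 1:
        raise ValueError("servers must be >= 0 and group_size must be >= 1")
    rounds = math.ceil(servers / group_size)
    return rounds * (INSTALL_HOURS + CHANGEOVER_HOURS)


# Eight servers migrated one at a time versus in groups of four.
one_at_a_time = changeover_window_hours(8, group_size=1)
groups_of_four = changeover_window_hours(8, group_size=4)
print(one_at_a_time, groups_of_four)
```

Under these assumptions, larger groups shorten the window in direct proportion to the number of rounds, which is why the availability of spare servers matters for the changeover method.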
The changeover method was used for Cisco UCM clusters with fewer subscriber servers because the service interruption was shorter and affected a smaller set of voice service clients. This method was not used for clusters serving Cisco's contact centers, which require 24-hour call system availability.
Modified greenfield method. In this method, the migration team built a separate, parallel server environment in advance, then simply redirected call traffic to the new server IP addresses when the migration point arrived. Cisco IT used this method for the larger Cisco UCM clusters because they support a large number of connected devices and critical contact center call services.
The modified greenfield method was also used to migrate the Cisco Unity Connection messaging hosts and Cisco UCCE hosts to minimize downtime of the voicemail and contact center services. Cisco IT also selected this method to re-name and assign new IP addresses to all Cisco UCM servers in accordance with IT's standard naming conventions and new IP address block requirements for the clusters.
"By building a parallel cluster, we avoided the pain of moving multiple subscriber pairs in a single weekend, but we added pain with the work of reconfiguring the voice gateways and making sure that all of the phones could move over correctly," says Jason Daniels, Cisco IT engineer. Building a parallel cluster also has the downside of requiring a freeze on moves/adds/changes (MAC) during the parallel build, since these changes in the existing cluster will not be reflected in the new cluster and will need to be manually re-built there.
However, "the modified greenfield method offers the confidence of a working and stable cluster at the start of the cutover," says Jan Seynaeve, unified communication engineer, Cisco IT. "For clusters with many phones, this migration method is faster because all phones can reregister to the new cluster right at the beginning of the cutover, even though you need to take the time to reconfigure the voice gateways with the new IP addresses. You can start testing the new cluster after a few hours, no matter the number of phones or subscriber pairs."
Table 1 presents factors to consider when choosing between the two migration methods.
Table 1. Advantages and Drawbacks of the Changeover and Greenfield Migration Methods
Changeover method, advantages:
• Little preparation needed
• No MAC freeze needed, so the change is transparent for users (when done outside of work hours) and for peripheral services (e.g., Cisco Presence servers, backup/spare servers, webdialers, and Cisco WebEx®)
Modified greenfield method, advantages:
• You get a working new cluster at the start
• All phones migrate in one reset
• Takes very little time to cut over
• Cutover time is largely independent of the cluster size
• Only option if UC planning requires new IP addressing and/or new Domain Name Service (DNS) naming during the upgrade
Changeover method, drawbacks:
• Takes longer during the cutover window; the more nodes/servers in the cluster, the longer it takes
• Requires multiple resets of the phones and other endpoints
• Requires multiple database synchronizations
Modified greenfield method, drawbacks:
• MAC freeze is needed, which is an inconvenience to users
• A lot of preparation needed (but no change window is needed during preparation)
• New IP addresses must be assigned, so reconfiguration of some hardcoded endpoints is needed (e.g., H.323, Cisco IP Communicator), and some peripherals need reconfiguration (e.g., Cisco Presence servers, backup/spare servers, webdialers, and Cisco WebEx)
• Moving over the cluster database to the new cluster is somewhat cumbersome
San Jose Headquarters Cluster
To serve the Cisco headquarters campus in San Jose, Cisco IT deployed one of the world's largest Cisco UCM clusters: a super-cluster with nine pairs of subscriber servers and a publisher server. This group of campus clusters serves 39,000 registered devices and 350 primary and backup voice gateways. The San Jose campus cluster includes the elements shown in Table 2. Migrating these servers to Cisco UCS created no service outages.
Table 2. Elements in the Cisco San Jose UCM Cluster
Trivial File Transfer Protocol (TFTP) servers
Phone services servers
Dedicated trace/file services servers
Cisco Unified Operations Manager server
Cisco Unified Provisioning Manager servers
Cisco Unified Service Manager server
Cisco Unified Service Statistics Manager server
In June 2012, Cisco IT used the modified greenfield method to migrate the legacy servers to virtualized machines running on Cisco UCS servers over a single weekend. The Cisco UCM cluster in San Jose is the largest and most complex in the Cisco network, so the decision to migrate this cluster in the early stages of the global migration might seem unusual. However, Cisco IT made this choice for two reasons:
1. To quickly bring the benefits of the new server environment to the cluster that was experiencing the highest growth and capacity demands.
2. To identify lessons that could simplify and speed the later migrations of large clusters in other locations.
As part of the San Jose cluster migration, Cisco IT also upgraded the Cisco UCM software from version 8.0.3 to version 8.6.2. Although this simultaneous upgrade required additional planning and implementation steps, it was done because the new version provides native support for a wider range of video endpoints and increased capacity to handle a larger number of endpoints.
The increased capacity was especially beneficial to the clusters for the Cisco campuses in San Jose and Research Triangle Park (RTP), North Carolina, where the large numbers of users and devices were reaching the capacity limits of the previous Cisco MCS servers. Support for the individual call-processing clusters increased from 7500 to 10,000 devices per call-processing node pair (primary and secondary subscriber pair), with overall Cisco UCM system capacity increasing from 60,000 to 80,000 devices total on Cisco IT's 19-server super-cluster.
In the future, the higher-speed Cisco UCS servers and 10 Gigabit Ethernet uplinks will reduce the time required to perform software upgrades by one-third, an ongoing benefit of migrating to the new servers.
One drawback of the modified greenfield method is that it requires a "freeze" of the production database that registers phones and other endpoints. This freeze is necessary to create a database snapshot to load onto the new cluster for testing and for the cutover itself. Because of the database freeze, no moves, adds, or changes (MAC) were to be entered into the old database for two weeks prior to the migration weekend. However, a few changes were essential, which meant the migration team needed to re-enter them manually into the new database after the cutover.
Also after the cutover, Cisco IT personnel verified that all of the connected phones reregistered automatically to the new servers. They ran scripts to re-point the connected voice gateways, gatekeepers, Session Initiation Protocol (SIP) trunks, and conferencing resources to the IP address of the new cluster. Automating this task with scripts was particularly important later for the Amsterdam cluster, which connects more than 200 voice gateways from the countries in the Cisco Europe, Middle East, and Africa (EMEA) region.
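The source does not show the scripts Cisco IT used. As a minimal sketch of the general idea, the following Python function rewrites old cluster addresses to new ones in a block of configuration text. The function name, the sample configuration lines, and the addresses (taken from the reserved documentation ranges) are all hypothetical and do not represent real device syntax or Cisco IT's tooling.

```python
import re
from typing import Mapping


def repoint_addresses(config_text: str, ip_map: Mapping[str, str]) -> str:
    """Replace each old server IP with its new counterpart.

    Only whole addresses are matched, so 192.0.2.1 is never rewritten
    inside 192.0.2.10 or 192.0.2.100.
    """
    if not ip_map:
        return config_text
    alternatives = "|".join(re.escape(ip) for ip in ip_map)
    # Lookarounds reject matches that are part of a longer dotted number.
    pattern = re.compile(r"(?<![\d.])(" + alternatives + r")(?![\d.])")
    return pattern.sub(lambda m: ip_map[m.group(1)], config_text)


# Hypothetical old-to-new mapping for two subscriber servers.
ip_map = {
    "192.0.2.1": "198.51.100.1",
    "192.0.2.10": "198.51.100.10",
}
old_config = "primary 192.0.2.1\nbackup 192.0.2.10\n"
new_config = repoint_addresses(old_config, ip_map)
print(new_config)
```

Generating the new configuration from a single old-to-new mapping keeps every gateway consistent, which becomes more important as the number of gateways grows.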
Cisco UCM Clusters in Other Cisco Locations
Decisions about Cisco UCS deployments for other locations were based on the cluster size and local facilities. Outside of San Jose, the larger Cisco UCM clusters reside in data centers at Cisco campuses in RTP and Amsterdam. These clusters were migrated to Cisco UCS B-Series Blade Servers, or a mix of B-Series and C-Series servers, which support a higher density of virtual machines. Additionally, the B-Series blades can use separate SAN storage for greater storage efficiency. In Amsterdam, the cutover migration of Cisco UCM hosts to the new Cisco UCS servers was completed in just four hours.
Smaller clusters, for example, the clusters in Johannesburg and Singapore (with one pair of subscribers each), require a smaller number of virtual machines and must support operation in smaller data centers, colocations, telecom closets, or small server rooms. These clusters were migrated to Cisco UCS C-Series Rack Servers.
Contact Center Systems
The servers for Cisco UCCE and other contact center hosts are installed in two data-center hubs, in Richardson, Texas and RTP. For redundancy and failover purposes, the servers operate in full-duplex mode to route all incoming contact center calls to locations around the world.
For the migration of the contact center systems to Cisco UCS servers, Cisco IT scheduled a migration window during a period of low call volumes. During that window, the contact centers used alternate routing methods to continue receiving inbound calls. Cisco IT later migrated the Cisco Unified Intelligence Center tools to the Cisco UCS platform as well.
To perform the Cisco UCCE migration, Cisco IT followed the migration advice found in these two documents:
For the contact center system hosts, the switchover migration required only 7 minutes out of a scheduled 15-minute migration window. "The contact center teams didn't even necessarily know that the server platform behind their systems had changed," says Oswald Fernandes, member of technical staff in the contact center practice, Cisco IT. "Once we made the switchover, all of the hosts and devices came back online right away."
New Team for Server Support
As part of the Cisco UCM migration, some responsibilities for server deployment, maintenance, and support shifted from the Cisco IT voice operations team to the hosting sysadmin team. Previously, the voice team was responsible for its own servers, but now the hosting team has primary responsibility for all server operations. The voice operations team continues to perform basic configuration and support tasks on the servers that deliver UC services, and is still responsible for the design and availability of the voice-service applications.
Cisco IT's migration of the UC systems to Cisco UCS servers has produced significant benefits for cost savings, simpler server support, and improved disaster recovery.
Reduced costs. With the migration to Cisco UCS servers, Cisco is realizing ongoing cost savings from the ability to use fewer servers and the associated reductions in equipment, cabling, power, rack space, and cooling.
Although the global migration did not reduce the number of UC server hosts, Cisco IT has reduced the number of physical servers from about 570 Cisco MCS servers to about 190 Cisco UCS servers. Most of the new servers are the Cisco UCS B-Series half-blade model, because virtualization allows a "mix and match" of multiple hosts on the Cisco UCS servers. Table 3 presents before-and-after data on the number of servers deployed to support Cisco UCM hosts.
Table 3. UC Server Needs at Cisco Before and After Migration
Size of Cisco (2013): 650 buildings, 365 locations; 90 countries worldwide
"Before migration" counts are Cisco MCS physical servers; "after migration" counts are Cisco UCS physical servers.
Cisco phone system
• Size: 131,000 hardware phones (51,000 video phones); 59,000 software phones (mostly Cisco Jabber and Cisco WebEx Connect)
• Before migration: 252 running in 13 global clusters (11 central call processing, 2 campus)
• After migration: 101 running in 12 global clusters (10 central call processing, 2 campus)
Cisco TelePresence deployment
• Size: 1510 Cisco TelePresence® rooms
• Before migration: 5 running in one cluster
• After migration: 2 running in one cluster
Cisco Unity voicemail system
• Size: 89,000 voicemail boxes
• Before migration: 154 production servers, including 79 production Cisco Unity servers running in 41 nodes
• After migration: 30 production Cisco Unity Connection servers running in 16 nodes
Cisco customer contact centers
• Size: 5100 contact center agents; 22 million calls per year
• Before migration: 134 running in 6 clusters
• After migration: 49 running in 6 clusters
Cisco WebEx conferencing cluster
• Size: 8.6 million meetings per year; 1.5 billion minutes per year; 10,000 concurrent sessions (peak)
• Before migration: 21 in San Jose
• After migration: 6 in San Jose
Cisco Extranet cluster
• Before migration: 8 in Bangalore
• After migration: 3 in Bangalore
Total physical server count
• After migration: 191 Cisco UCS B-Series or C-Series Servers
Cisco IT's overall ratio of old-to-new UC servers is a little more than 3:1. This ratio is less than the optimal 4:1 ratio for older Cisco UCS blades for two reasons: virtualization guidelines and cluster sizes.
Virtualization guidelines for UC on UCS recommend allocating two physical CPU cores per UC application, and four cores for specialized UC applications such as Cisco Unified Presence (CUP) and service management servers, e.g., Cisco Unified Operations Manager (CUOM), Cisco Unified Service Monitor (CUSM), and Cisco IT's proprietary Trace-File Servers. Cisco IT deployed most of these UCS servers in 2011, when the Cisco UCS B200 half-blades with eight cores were readily available. Today the Cisco UCS blades are more powerful and can support more UC applications per blade. In addition, today's Cisco UCS servers are more power-efficient than early models.
Cisco IT cluster sizing is another important factor. Currently, Cisco IT dedicates a set of server blades exclusively to the UC service, and when only a partial blade is required, IT dedicates the entire blade to the service. For example, in the Johannesburg cluster, Cisco IT replaced 11 Cisco MCS servers (including two Cisco Unified Presence servers and one Trace File server) with 3.5 blades, which meant four blades were dedicated to the UC service. Part of the shortfall in the MCS-to-UCS ratio results from this "rounding" practice.
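The core-allocation guideline and the rounding practice can be combined into a simple sizing calculation. The Python sketch below is an illustration, not a Cisco sizing tool. It assumes eight-core half-blades and, for the Johannesburg example, assumes that the 11 replaced servers comprised three specialized applications and eight standard applications; that split is an inference and is not stated in the source.

```python
import math

CORES_STANDARD = 2      # cores per standard UC application (guideline in the text)
CORES_SPECIALIZED = 4   # cores per specialized application, e.g. CUP or trace-file servers
CORES_PER_BLADE = 8     # eight-core half-blade mentioned in the text


def blades_required(standard_apps: int, specialized_apps: int,
                    cores_per_blade: int = CORES_PER_BLADE):
    """Return (fractional blades needed, whole blades dedicated).

    Whole blades are rounded up because a partially used blade is
    dedicated entirely to the UC service.
    """
    cores = standard_apps * CORES_STANDARD + specialized_apps * CORES_SPECIALIZED
    fractional = cores / cores_per_blade
    return fractional, math.ceil(fractional)


# Assumed Johannesburg split: 8 standard + 3 specialized applications.
needed, dedicated = blades_required(standard_apps=8, specialized_apps=3)
print(needed, dedicated)
```

With this assumed split the calculation yields 3.5 blades needed and four blades dedicated, which agrees with the figures quoted for Johannesburg.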
Still, the overall power and space savings from the Cisco UCS migration are significant (Figure 3). With only 191 servers (166 half-blade B-Series servers and 25 1RU C-Series servers) to support, Cisco has produced these savings:
• 67 percent reduction in the total number of servers
• 84 percent reduction in power and cooling requirements
• 87 percent lower space and cabling requirements
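The server-count figures can be cross-checked against the per-service counts in Table 3. The short Python sketch below sums the before and after counts as read from that table (the pairing of values to rows is an interpretation of the table layout) and derives the consolidation ratio and the percentage reduction. The power, cooling, space, and cabling percentages cannot be derived this way because the source gives no underlying measurements.

```python
# (before: Cisco MCS servers, after: Cisco UCS servers), as read from Table 3
SERVERS = {
    "Cisco phone system": (252, 101),
    "Cisco TelePresence": (5, 2),
    "Cisco Unity voicemail": (154, 30),
    "Contact centers": (134, 49),
    "WebEx conferencing": (21, 6),
    "Extranet": (8, 3),
}

total_before = sum(before for before, _ in SERVERS.values())  # 574
total_after = sum(after for _, after in SERVERS.values())     # 191

ratio = total_before / total_after                       # old-to-new server ratio
reduction_pct = 100 * (1 - total_after / total_before)   # percent fewer servers

print(f"{total_before} -> {total_after} servers")
print(f"ratio {ratio:.2f}:1, reduction {reduction_pct:.1f} percent")
```

The summed totals are consistent with the "about 570" and 191 servers cited in the text, with a ratio slightly above 3:1 and a reduction that rounds to 67 percent.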
The migration also gave Cisco IT an opportunity to review and improve the voice infrastructure design for additional cost reduction. For example, Cisco IT consolidated the Cisco UCM cluster that serves the Middle East region (previously located in Manama, Bahrain) in the Amsterdam data center, although it is still configured as a separate cluster. This consolidation eliminated the associated operational and support costs for the separate UC systems and facilities. The migration teams also took advantage of opportunities to streamline databases used by the UC systems, such as removing records for inactive endpoints in the Cisco UCM subscriber databases.
Simpler support. The shift of responsibilities for most server maintenance to the hosting team has freed time for the voice operations team to work on issues more directly related to UC service delivery. With the ability to add virtual machines and storage capacity as needed, the voice team no longer needs to resolve problems related to server configuration or capacity, such as running out of memory or CPU resources.
For UC application support services, the migration to UCS is transparent to IT operations engineers; the servers simply appear as Cisco UCS hosts instead of Cisco MCS hosts. The hosting engineers now use the Cisco UCS management tool for basic server configuration and support.
Voice application support is much easier in a virtual environment as well. "Because all Cisco UCM servers are remote, we always upgrade them from the network, uploading the 4-GB ISO image to each server," says Seynaeve. "In the physical server world, we had to distribute the image to about 20 'local' FTP servers, then each node in the cluster needed to download it from there. On Cisco UCS, we just upload the ISO images into the datastore once, then attach it to the virtual machines' DVD drive."
Improved disaster recovery. Recovering a UC host after a server failure can be accomplished in minutes instead of hours. "When rebooting the UC servers, there's always a chance of hardware issues, but on virtual machines that's not a problem," says Seynaeve. "When a server gets 'stuck on boot,' with hardware servers it was always a hassle troubleshooting them remotely with HP iLO (HP Integrated Lights-Out), which also requires an out-of-band infrastructure. In a VM environment, the VM console is just a click away."
Disaster recovery is also improved with the flexibility to quickly move the UC hosts to physical servers in either a failover data center located nearby or, over the network, even in another city.
"You can't approach a UC-to-UCS migration without good planning, but it's not textbook planning because of all the variations in the networks and systems involved," says Fernandes. "Our customers will want to draw on Cisco's knowledge and help in developing effective migration plans."
Drawing from the experience gained in migrating multiple UC systems in diverse environments, Cisco IT offers the following lessons learned.
Planning the Cutover: Avoid parallel server operation for Cisco UCM. "A cutover migration from the old servers to the new servers helps to avoid the confusion and complexity that would be created by trying to run Cisco UCM on parallel servers during the migration," says Denis O'Sullivan, manager, UC and voice processing operations group, Cisco IT. "You can still keep the old servers in place if you need to fall back because you encountered unexpected problems during the migration." However, parallel server operation may be appropriate for certain UC hosts or environments, such as for contact centers, where the migration can be accomplished gradually over time.
Planning the Cutover: Standardize migration plans. Although different types of UC hosts in different sites may have different installation and configuration requirements, these variations are easier to handle when migration decisions and activities are standardized. For example, standardizing on Cisco UCS B-Series Blade Servers for data center installations and Cisco UCS C-Series Rack Servers for telecom-closet installations simplifies the migration activity as well as ongoing server operations and troubleshooting.
Planning the Cutover: Create a detailed design for hosts and servers. Because of differing needs for server CPU, storage, network, and memory resources, not all UC hosts can be co-located on the same Cisco UCS server. Following the deployment recommendations of the Cisco product teams and creating a detailed plan for which hosts will share which servers can help to avoid conflicts and performance issues. More information on this topic is presented in a "Rules of Thumb" blog post.
Before the Cutover: Notify the network support team and help desk about migration schedules. If support personnel are not aware that a cutover is occurring, user help calls during that time could prompt attempts to reactivate the old UC system, which disrupts the migration. To avoid this problem, provide clear information about migration plans and schedules to the help desk. Also, decommissioning the old servers as soon as the new servers are active, where possible, prevents attempts at reactivation. This step is particularly critical when the new and old servers have the same IP addresses.
Before the Cutover: Offer Cisco UCS training and tools to voice engineers. If the team that supports the voice network must also support the underlying servers, training on how to support the Cisco UCS servers and a virtual environment can be very valuable. This training would also be valuable for voice engineers when they work with a separate sysadmin team that supports the Cisco UCS infrastructure.
During cutover: Communicate the change to users. Service clients should be made aware of the upcoming change, even if it occurs over the weekend. Cisco IT sent emails to all users at affected sites, warning them that they may see their phone reboot during the cutover. Home users were asked to leave their Cisco Virtual Office routers and phones plugged in and running during the cutover. And people using the Cisco IP Communicator softphone were asked to update their Cisco UCM TFTP addresses to the new server addresses as soon as possible after the cutover.
After cutover: Expect manual changes. For the Cisco UCM cutover in San Jose, approximately four percent of phones had configuration issues that prevented automatic re-registration to the new clusters. These issues meant Cisco support staff needed to make a remote or on-site fix after the cutover to the Cisco UCS servers.
During the migration, Cisco IT kept to the current cluster architecture as much as possible, that is, keeping the different clusters (campus clusters, regional clusters, Cisco TelePresence clusters, Cisco Unity clusters, and contact center clusters) separated on different physical servers. With the initial migration effort complete, Cisco IT will create plans for additional Cisco UCM cluster consolidation to reduce circuit costs and to support video endpoints on these call-processing clusters instead of on dedicated video clusters.
This additional consolidation is possible because the Cisco UCS environment offers the necessary server capacity and stability, and the Cisco network offers sufficient bandwidth in the cluster uplinks. Cisco IT is also considering consolidating a few of the smaller, geographically separated clusters into larger clusters, to continue simplifying the management of Cisco IT's global voice and video services.
For contact center operations, Cisco IT will migrate interactive voice response (IVR) hosts from Cisco MCS servers to the Cisco Unified Voice Portal solution running on Cisco UCS servers. For this migration, Cisco IT will deploy the Cisco UCS servers in parallel to the existing environment, then gradually transition each region and gateway to minimize the impact on this vital element for inbound call routing.
Within the UC core, Cisco IT will migrate the inter-cluster network to full Session Initiation Protocol (SIP) interconnection. This migration entails replacing the internal Cisco IOS® Gatekeeper network with Cisco UCM Session Manager Edition (SME). (Figure 4)
SIP interconnection will reduce some management overhead on the network, and Cisco UCM will support SIP call signaling from end to end. It will also open support for standards such as Binary Floor Control Protocol (BFCP), which makes integrating different video endpoints, mobile devices, and other devices into a single video conference session much easier and more scalable.
For More Information
In-depth insights about Cisco IT's strategies and activity for migrating UC systems to the Cisco UCS servers are presented in several posts on the Cisco website and blog site:
This publication describes how Cisco has benefited from the deployment of its own products. Many factors may have contributed to the results and benefits described; Cisco does not guarantee comparable results elsewhere.
Some jurisdictions do not allow disclaimer of express or implied warranties, therefore this disclaimer may not apply to you.
CISCO PROVIDES THIS PUBLICATION AS IS WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING THE IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.