Cisco MediaSense Design Guide, Release 10.5
Scalability and Sizing

The supported capacity for MediaSense is a function of the hardware profile that the system selects at startup time. The hardware profile depends on which VM template the node is deployed on, and the VM template depends partially on what type of hardware you are deploying. (See Virtual Machine Configuration for a full description of each template.) The Hardware Profiles section shows the actual capacity when using each type of VM template.

For example, for each 7-vCPU template node (the standard for large production deployments), the MediaSense server supports up to 400 simultaneous media streams (200 calls) at a sustained busy hour call arrival rate of two calls per second, on up to 12 terabytes of disk space. The 400 streams include all streams used for recording, live monitoring, playback, .mp4 or .wav conversion, and HTTP download, all of which may occur in any combination. Conversion and download are not strictly streaming activities, but they use system resources in a similar way and are considered to have equal weight. Playback of a video track takes nine times the resources of playback of an audio track. As a result, each uploaded video playback (one video track plus one audio track) has the weight of 10 audio tracks, which leads to a maximum of 40 simultaneous video playbacks per node.
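The weighting arithmetic above can be checked with a short calculation. This is a minimal Python sketch that assumes the 7-vCPU per-node figures quoted in this section. The constant and function names are hypothetical and are not part of any MediaSense API.

```python
# Per-node capacity for the 7-vCPU profile, in audio-weight streams (from this section).
NODE_STREAM_CAPACITY = 400

AUDIO_TRACK_WEIGHT = 1   # recording, monitoring, playback, conversion, or download
VIDEO_TRACK_WEIGHT = 9   # a video track costs nine audio tracks
# An uploaded video playback is one video track plus one audio track.
VIDEO_PLAYBACK_WEIGHT = VIDEO_TRACK_WEIGHT + AUDIO_TRACK_WEIGHT


def audio_weight(audio_streams: int, video_playbacks: int) -> int:
    """Total load expressed in audio-weight streams."""
    return audio_streams * AUDIO_TRACK_WEIGHT + video_playbacks * VIDEO_PLAYBACK_WEIGHT


def fits_on_one_node(audio_streams: int, video_playbacks: int) -> bool:
    """True if the combined load stays within one node's stream capacity."""
    return audio_weight(audio_streams, video_playbacks) <= NODE_STREAM_CAPACITY


print(NODE_STREAM_CAPACITY // VIDEO_PLAYBACK_WEIGHT)  # 40 video playbacks per node
print(fits_on_one_node(300, 10))  # 300 + 100 = 400 -> True
print(fits_on_one_node(300, 11))  # 300 + 110 = 410 -> False
```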

In determining how many streams are in use at any given time, you need to predict the number of onsets for each activity per unit time as well as their durations. Recording, live monitoring, and playback have a duration that is equal to the length of the recording. Video playbacks, if configured to play once only, have a duration equal to the length of the video. Video playbacks for hold purposes must be estimated to last as long as each video caller typically remains on hold. The .mp4 conversions, .wav conversions, and HTTP download durations are estimated at about 5 seconds per minute of recording.
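One way to turn onset rates and durations into a stream count is Little's law: the average number of activities in progress equals their arrival rate multiplied by their average duration. The sketch below applies that general queueing result together with the 5-seconds-per-minute estimate for conversion and download. Little's law is not a MediaSense tool, and the function names and sample traffic figures are hypothetical. Little's law gives an average, so leave headroom for busy-hour peaks.

```python
import math


def concurrent_streams(onsets_per_hour: float, avg_duration_s: float,
                       streams_per_onset: int = 1) -> float:
    """Average simultaneous streams (Little's law: arrival rate x mean duration)."""
    return onsets_per_hour / 3600.0 * avg_duration_s * streams_per_onset


def conversion_duration_s(recording_minutes: float) -> float:
    """.mp4/.wav conversion or HTTP download: about 5 seconds per recorded minute."""
    return 5.0 * recording_minutes


# Hypothetical busy hour: 3,600 recorded calls of 3 minutes, 2 streams per call.
recording = concurrent_streams(3600, 180, streams_per_onset=2)
# Hypothetical: 600 downloads of those 3-minute recordings in the same hour.
downloads = concurrent_streams(600, conversion_duration_s(3))
print(recording)                         # 360.0
print(round(downloads, 2))               # 2.5
print(math.ceil(recording + downloads))  # streams to plan for, before headroom
```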

To determine the number of servers required, evaluate these three ratios:

  • The number of simultaneous audio streams needed plus 10 times the number of videos being played, divided by the number of audio-weight media streams supported by each node
  • The number of busy hour call arrivals divided by the maximum call arrival rate for each node, expressed per hour (two calls per second is 7,200 calls per hour)
  • The space required for retained recording sessions divided by the maximum media storage for each node

The number of servers required is equal to the largest of the above three evaluations (rounded up).
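The three evaluations can be combined into one calculation. This Python sketch assumes the 7-vCPU per-node limits quoted earlier in this section: 400 audio-weight streams, two calls per second, and 12 TB. For other hardware profiles, substitute the values from the Hardware Profile table. The function name is hypothetical.

```python
import math

# Per-node limits for the 7-vCPU profile, as quoted in this section.
STREAMS_PER_NODE = 400          # audio-weight media streams
CALLS_PER_SECOND_PER_NODE = 2   # sustained busy hour call arrival rate
STORAGE_TB_PER_NODE = 12        # maximum media storage
VIDEO_PLAYBACK_WEIGHT = 10      # one video playback = 10 audio streams


def servers_required(audio_streams: int, video_playbacks: int,
                     busy_hour_calls: int, retained_storage_tb: float) -> int:
    """Largest of the three ratios, rounded up to a whole server."""
    by_streams = (audio_streams + VIDEO_PLAYBACK_WEIGHT * video_playbacks) / STREAMS_PER_NODE
    # Convert the per-second arrival limit to calls per hour before dividing.
    by_arrivals = busy_hour_calls / (CALLS_PER_SECOND_PER_NODE * 3600)
    by_storage = retained_storage_tb / STORAGE_TB_PER_NODE
    return math.ceil(max(by_streams, by_arrivals, by_storage))


# Hypothetical site: 600 audio streams, 20 video playbacks,
# 10,000 busy hour calls, and 30 TB of retained recordings.
print(servers_required(600, 20, 10_000, 30))  # storage dominates: 30 / 12 = 2.5 -> 3
```

In this example, storage is the deciding factor. Check the result against the maximum number of nodes that the hardware profile allows.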

Video playback for VoH, ViQ, and video messaging is further limited on 2- and 4-vCPU virtual hardware and depends on the type of physical hardware being used. See Hardware Profiles for details.

Another factor that significantly impacts performance is the number of MediaSense API requests in progress. This is limited to 15 at a time for 7-vCPU systems, with the capability to queue up to 10 more (the numbers are reduced for smaller systems). These numbers are per node, but they can be doubled for MediaSense clusters that contain both a primary and a secondary node. For more information, see System Resiliency and Overload Throttling.
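A client can avoid relying on the server-side queue by capping its own in-flight requests. The Python sketch below is a hypothetical client-side helper and is not part of the MediaSense API. It derives its limits from the 7-vCPU figures above: 15 requests in progress and 10 queued, doubled when the cluster has a secondary node. It uses a `threading.BoundedSemaphore` so that the number of outstanding requests never exceeds the in-progress limit. `fake_request` stands in for a real HTTP call. Smaller systems have lower limits.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def api_limits(has_secondary: bool) -> tuple[int, int]:
    """(in-progress, queued) request limits for 7-vCPU nodes, per this section."""
    factor = 2 if has_secondary else 1  # limits double with primary + secondary
    return 15 * factor, 10 * factor


class ApiThrottle:
    """Hypothetical client-side cap on concurrent MediaSense API requests."""

    def __init__(self, max_in_flight: int) -> None:
        self._slots = threading.BoundedSemaphore(max_in_flight)
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak = 0  # highest concurrency observed, for verification

    def call(self, fn, *args, **kwargs):
        with self._slots:  # blocks until a slot is free
            with self._lock:
                self.in_flight += 1
                self.peak = max(self.peak, self.in_flight)
            try:
                return fn(*args, **kwargs)
            finally:
                with self._lock:
                    self.in_flight -= 1


def fake_request(i: int) -> int:
    time.sleep(0.01)  # placeholder for a real HTTP request
    return i


in_progress, queued = api_limits(has_secondary=False)
throttle = ApiThrottle(in_progress)
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(lambda i: throttle.call(fake_request, i), range(100)))
print(in_progress, queued, throttle.peak)  # peak never exceeds 15
```

Holding the client at the in-progress limit leaves the queue slots free. Whether that margin is appropriate depends on how many applications share the cluster.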

The media output and conversion operations (monitoring, playback, conversion to MP4 or WAV, and HTTP download) are entirely under client control, and the client enforces its own limits in these areas. The remaining operations (call recording and uploaded media file playback) are not under client control. The deployment can be sized so that the overall recording and video playback load does not exceed a desired cluster-wide maximum, leaving room for an enforceable number of monitoring, playback, and HTTP download operations. The recording and video playback load is balanced across all servers. (Perfect balance is not always achieved, but each server has enough headroom to accommodate most disparities.)
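Recording and video playback load is not under client control, so the remaining capacity is what the client can allot to monitoring, playback, conversion, and download. This is a minimal sketch of that budget, assuming the 7-vCPU figure of 400 audio-weight streams per node. The function name and sample numbers are hypothetical. Because load is not perfectly balanced across servers, it is prudent to keep a margin below this total.

```python
def client_operation_budget(nodes: int, recording_streams: int,
                            video_playbacks: int, streams_per_node: int = 400) -> int:
    """Cluster-wide audio-weight streams left for client-controlled operations
    (monitoring, playback, .mp4/.wav conversion, HTTP download)."""
    capacity = nodes * streams_per_node
    uncontrolled = recording_streams + 10 * video_playbacks  # video playback weighs 10
    return max(0, capacity - uncontrolled)


# Hypothetical three-node cluster sized for 800 recording streams and 20 video playbacks.
print(client_operation_budget(3, 800, 20))  # 1200 - 1000 = 200
```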

Hardware Profiles

When MediaSense nodes are installed, they adjust their capacity expectations according to the hardware resources they discover from the underlying virtual machine. When the server is installed using one of the Cisco-provided OVA templates, the correct amounts of CPU and memory are automatically provisioned, and a matching hardware profile is selected as a function of the number of vCPUs, the CPU speed, and the amount of memory provisioned. The hardware profile determines:

  • Number of audio-equivalent calls supported

  • Number of concurrent API requests supported

  • Maximum call arrival rate supported

  • Maximum number of nodes supported in the cluster

  • Maximum amount of media storage available

  • Cap on number of video playbacks supported

  • Various other internal parameters

If an incorrect OVA template is used, or if the virtual machine's configuration is changed after the OVA template is applied so that the virtual machine does not exactly match one of the existing hardware profiles, the server is considered to be unsupported and the capacities in the Unsupported category are used.

For more information, see the Hardware Profile table at http:///wiki/Virtualization_for_Cisco_MediaSense.

Maximum Session Duration

MediaSense can record calls that are up to eight hours in duration. Beyond that duration, some sessions may end up being closed with an error status, and HTTP download and .mp4 or .wav conversion functions may not succeed.

Storage

MediaSense uses storage for two purposes: one set of disks holds the operating software and databases, and the other set is used for media storage. The two kinds of storage have very different performance and capacity requirements. Thin provisioning is not supported for any MediaSense disks.

Recorded Media Storage: Up to 60 terabytes is supported per cluster (12 TB on each of five servers). This is a theoretical maximum that can be attained only with SAN storage. If you are using Directly Attached Disks (DAS), you are limited to the physical disk space available in the server.

Uploaded Media Storage: Uploaded media requires much less storage, but up to 60 terabytes (12 TB on each of five servers) can also be supported.
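The storage ratio in the server-count evaluation needs an estimate of retained recording space. The sketch below assumes a constant media bit rate per stream and ignores file-format and metadata overhead. The default of 64 kbit/s is the G.711 payload rate and serves only as an example, because actual usage depends on the codecs recorded. The sketch uses decimal terabytes (10^12 bytes). The function name and traffic figures are hypothetical.

```python
import math


def retained_storage_tb(calls_per_day: float, avg_call_minutes: float,
                        retention_days: float, kbps_per_stream: float = 64.0,
                        streams_per_call: int = 2) -> float:
    """Approximate media storage, in decimal TB, for the retention window."""
    seconds = calls_per_day * retention_days * avg_call_minutes * 60
    bits = seconds * streams_per_call * kbps_per_stream * 1000
    return bits / 8 / 1e12


# Hypothetical: 10,000 five-minute calls per day kept for 90 days.
needed = retained_storage_tb(10_000, 5, 90)
print(round(needed, 2))        # 4.32
print(math.ceil(needed / 12))  # nodes needed for storage at 12 TB each: 1
```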

If you are using Directly Attached Disks (DAS), then the first two disks (for operating software and database) must be configured as RAID 10.

If you are using SAN, note that only Fibre Channel-attached SAN is supported, and the SAN must be selected according to Cisco's specifications for supported SAN products (see "Cisco Unified Communications on the Cisco Unified Computing System" at http:///go/swonly). SAN storage must be engineered to meet or exceed the disk performance specifications for each MediaSense virtual machine. These specifications are per node: if several nodes share the same SAN, the SAN must be engineered to support the specifications multiplied by the number of nodes. For security purposes, you can use an encrypted SAN for media storage as long as the specifications at the link below can still be met.

For information about current disk performance specifications for MediaSense, see http:///wiki/Virtualization_for_Cisco_MediaSense.

UCS-E router blade modules come with fixed disk hardware and MediaSense scalability limits for each type of module are designed according to their actual performance characteristics. You do not need to engineer their disk arrays to meet the specifications. However, all of the drives should be manually configured as RAID-1.

Also, for these modules, the required downloadable .OVA template automatically partitions the disks into two 80-GB drives and one 210-GB drive and formats them. For modules that have additional disk space available, you can configure the additional space for either uploaded media or recorded media, as best suits your application.

Unified Border Element Capacity

A Cisco 3945E ISR G2 router running as a border element and supporting simple call flows has a capacity of about 1000 simultaneous calls (if equipped with at least 2 GB, and preferably 4 GB, of memory). In many circumstances, with multiple call movements, the capacity is lower, in the range of 800 calls, because of the additional signaling overhead. The capacity is reduced further when other ISR G2 functions (such as QoS, SNMP polling, or T1-based routing) are enabled.

Some customers will need to deploy multiple ISR G2 routers in order to handle the required call capacity. A single MediaSense cluster can handle recordings from any number of ISR G2 routers.

The above cases apply to both Unified Border Element dial peer recording and Unified Communications Manager network-based recording.
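The number of Cisco 3945E routers required for a given peak load is the peak divided by the per-router figure, rounded up. This sketch defaults to the 800-call figure above for call flows with multiple call movements. Use a lower value if QoS, SNMP polling, or T1-based routing are enabled. The function name and the sample load are hypothetical.

```python
import math


def routers_required(peak_concurrent_calls: int, calls_per_router: int = 800) -> int:
    """Number of ISR G2 border elements needed for the peak call load."""
    return math.ceil(peak_concurrent_calls / calls_per_router)


print(routers_required(2500))        # 2500 / 800 = 3.125 -> 4
print(routers_required(2500, 1000))  # simple call flows: 2.5 -> 3
```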

Network Bandwidth Provisioning

For Call Recording

If Call Admission Control (CAC) is enabled, Unified Communications Manager automatically estimates whether there is enough available bandwidth between the forking device and the recording server so that media quality is not impacted, either for the current recording or for any other media channel along that path. If sufficient bandwidth does not appear to be available, Unified Communications Manager does not record the call; the call itself, however, is not dropped. No alarm is raised in this scenario. The only way to determine why a call was not recorded in this situation is to examine the logs and call detail records (CDRs).

It is important to provision enough bandwidth so that this does not happen. When calculating the requirements, the Unified Communications Manager administrator must include enough bandwidth for two two-way media streams per recorded call, even though the reverse direction of each stream is not actually used.

Bandwidth requirements also depend on the codecs in use and, in the case of video, on the frame rate, resolution, and dimensions of the image.
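The provisioning rule above (two two-way streams per recorded call) can be expressed as a multiplication. This is a minimal sketch that assumes the caller supplies the per-direction bandwidth for the codec. The example value of 80 kbit/s corresponds to G.711 at 20 ms packetization including IP/UDP/RTP headers: a 160-byte payload plus 40 bytes of headers, at 50 packets per second. It excludes Layer 2 overhead. The function name and sample figures are hypothetical.

```python
def recording_bandwidth_kbps(concurrent_recordings: int, kbps_per_direction: float) -> float:
    """Bandwidth to provision between the forking device and MediaSense."""
    streams_per_recording = 2  # two forked media streams per recorded call
    directions = 2             # each counted as two-way, though only one direction carries media
    return concurrent_recordings * streams_per_recording * directions * kbps_per_direction


# Hypothetical: 100 simultaneous G.711 recordings at 80 kbit/s per direction.
print(recording_bandwidth_kbps(100, 80))  # 32000 kbit/s, about 32 Mbit/s
```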

For Video Playback

Media connection negotiation is still bidirectional for video playback, even though MediaSense only sends data and does not receive it. This matters because bidirectional media means that you must provision twice the bandwidth you might otherwise expect.

Impact on Unified Communications Manager Sizing

MediaSense does not connect to any CTI engines, so the CTI scalability of Unified Communications Manager is not impacted. However, when MediaSense uses Cisco IP phone built-in-bridge recording, the Unified Communications Manager BHCA increases by two additional calls for each concurrent recording session.

For example, if the device busy hour call rate is six (6) without recording, then the BHCA with automatic recording enabled would be 18. To determine device BHCA with recording enabled, use this calculation:

Recording-enabled BHCA = Normal BHCA rate + (2 × Normal BHCA rate)
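The same calculation can be written as a short function. The optional `recorded_fraction` parameter is a hypothetical addition for illustration and is not part of the formula above. It scales the increase when only some calls are recorded. At its default of 1.0, the result matches the automatic-recording example.

```python
def bhca_with_bib_recording(normal_bhca: float, recorded_fraction: float = 1.0) -> float:
    """Device BHCA when built-in-bridge recording is enabled.

    Each recorded call adds two calls to Unified Communications Manager,
    one for each forked media stream.
    """
    return normal_bhca + 2 * normal_bhca * recorded_fraction


print(bhca_with_bib_recording(6))       # 6 + 2 * 6 = 18.0
print(bhca_with_bib_recording(6, 0.5))  # hypothetical: half the calls recorded -> 12.0
```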

For more information, see "Cisco Unified CM Silent Monitoring & Recording Overview.ppt" under SIP Trunk documents at http:///web/sip/docs.