Table Of Contents
Cisco Multipoint Technology and Design Details
Audio and Video Flows In A Multipoint TelePresence Design
Flow Control Overview
Audio and Video Positions
Audio to the CTMS in a Multipoint TelePresence Meeting
Calculating the Amount of Audio Traffic to the CTMS
Audio From the CTMS in a Multipoint Meeting
Calculating the Amount of Audio Traffic from the CTMS
Video in a Multipoint TelePresence Meeting
Camera Video Input
Auxiliary Video Input
Calculating the Amount of Video Traffic to the CTMS
Calculating the Amount of Video Traffic From the CTMS
Total Traffic to and from the CTMS
Video Switchover Delay
Overview of TelePresence Video on the Network
Deployment Models
Centralized Deployment
Deployment Considerations
Distributed Deployment
Deployment Considerations
Positioning of the CTMS within the Campus or Branch
Network Requirements
Latency
Bandwidth
Estimating Burst Sizes within Multipoint TelePresence Calls
Causes of Bursts within Multipoint TelePresence Calls
Bursts Due to I-Frame Replication
Location of the CTS Endpoints
Type of CTS Endpoints
Calculating Burst Sizes Due to I-Frame Replication
Other Considerations
Bursts due to the Auxiliary Video Input
Burst Estimation Due to Auxiliary Video Replication
Other Considerations
Normal P-Frame Video
Location of the CTS Endpoints
Burst Estimation Due to P-Frame Replication
Other Considerations
Cisco Multipoint Technology and Design Details
Audio and Video Flows In A Multipoint TelePresence Design
This section discusses in detail the audio and video flows within a multipoint TelePresence virtual meeting. The information provided within this section can be used by the network design engineer to correctly provision bandwidth to support multipoint TelePresence deployments.
Flow Control Overview
To help control bandwidth use during multipoint meetings, a new flow control feature has been implemented between the CTMS and CTS systems. This feature provides the ability for inactive table segments in a multipoint meeting to stop transmitting video, lowering overall bandwidth utilization.
After the multipoint meeting is initiated and the active table segments have been established, the CTMS instructs CTS endpoints to stop transmitting video for table segments that are not currently being displayed. Audio continues to be sent from all table segments and is used by the CTMS to determine when an inactive table segment becomes active. At that point, the CTMS instructs the CTS system to start transmitting video again for the newly active table segment. This process continues throughout the meeting, reducing overall bandwidth consumption for the multipoint meeting. The CTMS implements a hold-down timer of approximately two seconds to keep the video from flapping due to temporary noise within the room. The flow control feature does not introduce any perceptible delay for meeting participants beyond the hold-down time.
Audio and Video Positions
CTS endpoints are capable of both sending and receiving multiple audio and video streams. When CTS endpoints join a multipoint call, they first exchange RTP Control Protocol (RTCP) packets. Successful exchange of these packets indicates that the opposite endpoint is a Cisco TelePresence device capable of supporting various Cisco extensions. Among other things, these extensions are used to determine the number of audio and video channels each TelePresence endpoint may send and receive.
Audio and video streams are sent and received based on their position within the CTS endpoint. Figure 11-1 shows this for a multipoint call consisting of CTS-3000s.
Figure 11-1 Audio and Video Stream Positions with CTS-3000s
Each CTS-3000 can transmit up to four audio streams and four video streams from the left, center, right, and auxiliary positions. These correspond to the left, center, and right cameras and microphones, as well as the auxiliary input. The auxiliary video input can be used by a PC for slide show presentations. The auxiliary audio input is shared between audio from the PC that accompanies the slide show presentation and audio from an audio-only participant, such as an IP phone add-on.
Note
With the CTS-3000 all microphones physically connect to the center codec, even though the audio positions are referred to as center, left, or right.
The CTMS can transmit up to four video streams, corresponding to the left, center, and right plasma displays of the CTS-3000, as well as either a projector or monitor for slide show presentations connected to the auxiliary video output. The CTMS can only transmit up to three audio streams, corresponding to the left, center, and right speaker positions of the CTS-3000. Audio sent by an originating CTS-3000 toward the auxiliary position is redirected to one of the three speaker positions of the destination CTS-3000 by the CTMS. The CTMS chooses the three loudest audio streams to send to the remote CTS-3000 when there are more than three streams with audio energy.
Note
The number of audio streams sent in a multipoint call differs from that in a point-to-point call, in which four audio streams can be sent and received by each CTS-3000.
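The selection of the three loudest audio streams described above can be illustrated with a short sketch. This is not the CTMS implementation: the function name `pick_loudest`, the stream labels, and the numeric energy scores are all hypothetical, and the sketch simply assumes that a larger score means more audio energy.

```python
import heapq

def pick_loudest(energy_by_stream, limit=3):
    """Return up to `limit` stream labels, loudest first.

    energy_by_stream maps a hypothetical stream label to a hypothetical
    audio-energy score. Streams with no audio energy (score 0) are skipped.
    """
    active = {s: e for s, e in energy_by_stream.items() if e > 0}
    # nlargest iterates over the dict keys and ranks them by their score.
    return heapq.nlargest(limit, active, key=active.get)

# Five inbound streams, one of them silent (illustrative values only).
streams = {"A-left": 12, "A-center": 40, "B-center": 35, "C-right": 0, "C-aux": 22}
print(pick_loudest(streams))  # ['A-center', 'B-center', 'C-aux']
```

When three or fewer streams carry audio energy, the sketch returns all of them, which matches the statement that selection only matters when more than three streams are active.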
Figure 11-2 shows the audio and video positions for a multipoint call consisting of CTS-1000s.
Figure 11-2 Audio and Video Stream Positions with CTS-1000s
Each CTS-1000 can transmit up to two audio streams and two video streams from the center and auxiliary positions. These correspond to the single camera and microphone of the CTS-1000, as well as the auxiliary input. However, the CTMS can still transmit up to three audio streams, corresponding to the left, center, and right speaker positions of the CTS-1000, even though the CTS-1000 only has a single center speaker. The CTS-1000 mixes the audio from each of the three positions to play out on its single speaker. The CTS-1000 can only receive up to two video streams, corresponding to the center plasma display and either a projector or monitor for slide show presentations connected to the auxiliary video output.
Audio to the CTMS in a Multipoint TelePresence Meeting
Audio from all TelePresence conference participants is always sent to the CTMS, regardless of whether the site has any audio energy (someone is speaking) or not. In other words, silence suppression of audio packets is not implemented within CTS units. Each microphone of a CTS unit transmits a single 64 Kbps RTP/AVP (IETF RFC 3551) audio stream using a separate RTP SSRC. An additional 64 Kbps audio stream can be sent from either the auxiliary audio input or via the audio add-on feature, which can be used to add either a single phone or an audio conferencing bridge into the TelePresence virtual meeting.
Audio packets average approximately 220 bytes including network headers and are sent every 20 ms (50 packets per second). The payload size is approximately 160 bytes. Therefore the network overhead used for these calculations is approximately 27.27% of each packet. Each inbound audio stream generates approximately 88 Kbps (220 bytes * 8 bits * 50 packets per second) inbound to the CTMS across an Ethernet segment.
Note
This has been confirmed through data traces taken by ESE. It includes a 20 Byte IP header, 8 Byte UDP header, 12 Byte RTP header, and 14 byte Ethernet header. When calculating the amount of bandwidth across the WAN, the Ethernet header overhead must be replaced with the appropriate Layer 2 WAN header.
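The per-stream audio figures above follow directly from the packet size and packet interval. The sketch below reproduces that arithmetic; the constant and function names are illustrative, and the values are the approximate averages quoted in the text, not measured data.

```python
PACKET_BYTES = 220        # average audio packet size including headers (from the text)
PAYLOAD_BYTES = 160       # approximate audio payload per packet (from the text)
PACKET_INTERVAL_MS = 20   # one audio packet every 20 ms (from the text)

def stream_kbps(bytes_per_packet, interval_ms=PACKET_INTERVAL_MS):
    """Bit rate in Kbps for a stream sending one packet per interval."""
    packets_per_second = 1000 // interval_ms
    return bytes_per_packet * 8 * packets_per_second / 1000

def overhead_percent(packet_bytes=PACKET_BYTES, payload_bytes=PAYLOAD_BYTES):
    """Share of each packet that is network header, as a percentage."""
    return (packet_bytes - payload_bytes) / packet_bytes * 100

print(stream_kbps(PAYLOAD_BYTES))     # 64.0 Kbps of audio payload
print(stream_kbps(PACKET_BYTES))      # 88.0 Kbps on the Ethernet segment
print(round(overhead_percent(), 2))   # 27.27
```

For a WAN link, the same function can be called with a packet size that swaps the Ethernet header for the relevant Layer 2 WAN header.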
Calculating the Amount of Audio Traffic to the CTMS
The number of audio streams inbound to the CTMS in a single multipoint meeting can be calculated by the following equations:
(N + (3 * M)) when the auxiliary audio input and audio-only add-on are not used
Or:
(N + (3 * M) + P) when the auxiliary audio input and/or an audio-only phone is added on
Where N = the number of CTS-1000 endpoints in the call, M = the number of CTS-3000 endpoints in the TelePresence call, and P is the number of auxiliary audio inputs and audio-only phones added-on to the meeting.
The total amount of audio traffic inbound to the CTMS in a single multipoint meeting can be calculated by multiplying the per-stream audio bandwidth by the stream counts above to yield the following:
88 Kbps * (N + (3 * M)) when the auxiliary audio input and audio-only add-on are not used
Or:
88 Kbps * (N + (3 * M) + P) when the auxiliary audio input and/or an audio-only phone is added on
Note that only one device in a multipoint call can function as "presenter" for auxiliary video (i.e., PowerPoint slides) and audio input. Also, multiple audio-only phones can be bridged on through separate sites. However, if multiple audio-only devices need to be added into a TelePresence multipoint meeting, it is more effective to add an audio bridge, rather than have multiple sites add individual audio-only phones.
As an example, a 6-site multipoint CTS-1000 call in which the auxiliary audio input is not used and no audio-only phones are bridged onto the TelePresence meeting has an estimated inbound audio rate to the CTMS of the following:
88 Kbps per audio stream * 6 CTS-1000s = 528 Kbps toward the CTMS
The total amount of audio traffic inbound to the CTMS from multiple multipoint meetings can be calculated by simply summing the traffic from individual meetings. Extending the example above, if the CTMS is currently supporting one 6-site multipoint call with CTS-1000s only, and one 3-site multipoint call with CTS-3000s and an audio conference bridge added on, the total amount of inbound audio could be calculated as:
(88 Kbps * 6 CTS-1000s) + (88 Kbps * ((3 * 3 CTS-3000s) + 1 Audio Conf. Bridge))
528 Kbps + 880 Kbps = 1,408 Kbps (1.408 Mbps)
The maximum number of inbound audio streams to a CTMS can be estimated based upon the maximum number of table segments supported by the CTMS. Assuming 48 CTS-1000s in 16 separate three-party multipoint calls, with each site having an audio-only phone bridged onto the TelePresence meeting, the total amount of inbound audio traffic to the CTMS can be estimated as:
(3 CTS-1000s + 3 audio-only add-on phones) = 6 inbound audio streams per multipoint call
6 inbound audio streams * 16 multipoint calls = 96 inbound audio streams to the CTMS
The maximum amount of audio traffic inbound to the CTMS can be calculated by multiplying the per-stream audio bandwidth by the number of inbound audio streams above to yield the following:
88 Kbps * 96 inbound audio streams = 8.448 Mbps
Therefore, the network would need to be able to support approximately 8.5 Mbps of inbound audio to the CTMS. Audio traffic carries the same DSCP marking (recommended by Cisco to be CS4) as the video traffic in a TelePresence meeting, and this amount of audio is relatively small compared to the amount of inbound video traffic to the CTMS. Video traffic calculations are discussed in other sections.
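The inbound audio calculations in this section can be sketched as a pair of small functions. The function and parameter names are illustrative. The 88 Kbps per-stream figure and the N, M, and P terms come from the equations above, and rates are kept in whole Kbps so the results match the worked examples exactly.

```python
AUDIO_STREAM_KBPS = 88  # per-stream audio rate across Ethernet (from the text)

def inbound_audio_streams(n_cts1000, m_cts3000, p_addons=0):
    """Number of audio streams sent to the CTMS in one meeting: N + 3M + P."""
    return n_cts1000 + 3 * m_cts3000 + p_addons

def inbound_audio_kbps(n_cts1000, m_cts3000, p_addons=0):
    """Audio bandwidth inbound to the CTMS for one meeting, in Kbps."""
    return AUDIO_STREAM_KBPS * inbound_audio_streams(n_cts1000, m_cts3000, p_addons)

# 6-site CTS-1000 call, no auxiliary audio or add-on phones.
print(inbound_audio_kbps(6, 0))                                # 528
# Add a 3-site CTS-3000 call with one audio conference bridge.
print(inbound_audio_kbps(6, 0) + inbound_audio_kbps(0, 3, 1))  # 1408
# 16 three-party CTS-1000 calls, each site with an add-on phone.
print(16 * inbound_audio_kbps(3, 0, 3))                        # 8448
```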
Audio From the CTMS in a Multipoint Meeting
Audio sent from the CTMS is somewhat more complex than audio sent to the CTMS. Since audio is continuously sent from each CTS unit to the CTMS, there can be multiple simultaneous speakers in a TelePresence multipoint conference. The CTMS determines which video stream is replicated and sent to the other endpoints, based on which speakers are talking at the moment and who is the loudest. This is signaled by way of the voice activity confidence metric within every voice packet sent from every CTS endpoint. Within the initial RTCP packet exchange which occurs immediately after the CTS endpoint establishes a connection with the CTMS, the CTS endpoint advertises its capability to send a voice activity confidence metric within voice packets. The voice activity confidence metric is an estimation of the amount of audio energy contained within the voice packet.

Note
The CTMS does not advertise the ability to send a voice activity confidence metric to CTS endpoints, nor does it include a voice activity confidence metric within voice packets sent to CTS endpoints.
Audio is replicated when sufficient audio energy is detected within the voice packets from each CTS endpoint, a feature known as Voice Activity Switching. The CTMS replicates up to three audio streams, each destined to a particular audio position of the CTS endpoint, left, center, or right. Each CTS endpoint mixes the inbound audio to be sent to its audio speaker(s). The audio data rate outbound from the CTMS onto the network is therefore somewhat variable, based on the number of simultaneous speakers in the virtual meeting.
Calculating the Amount of Audio Traffic from the CTMS
The number of audio streams outbound from the CTMS in a single multipoint meeting varies based upon how many speakers are talking. However, an estimate of the maximum number of audio streams outbound from the CTMS for a given multipoint meeting can be calculated with the following equation:
3 Audio Streams * Number of CTS Endpoints in the multipoint call
The total amount of audio traffic outbound from the CTMS in a single multipoint meeting can be calculated by multiplying the per-stream audio bandwidth by the equation above to yield the following:
88 Kbps * 3 Audio Streams * Number of CTS Endpoints in the multipoint call
For example, in a 6-site multipoint CTS-1000 call in which all sites are receiving the maximum of three audio streams, the CTMS would be sending approximately the following:
3 Audio Streams * 6 CTS-1000s = 18 Audio Streams
88 Kbps per Audio Stream * 18 Audio Streams = 1.584 Mbps of audio traffic
The total amount of audio traffic outbound from the CTMS from multiple multipoint meetings can again be calculated by simply summing the traffic from individual meetings.
Finally, the maximum number of outbound audio streams from a CTMS can also be estimated based upon the maximum number of audio segments supported by the CTMS. Assuming 48 CTS-1000s in 16 separate three-party multipoint calls, each receiving the maximum of three audio streams, the total number of outbound audio streams from the CTMS can be estimated as:
3 Audio Streams * 3 CTS-1000s per Call * 16 Separate Calls = 144 Audio Streams
The maximum amount of audio traffic outbound from the CTMS can be estimated by multiplying the per-stream audio bandwidth by the number of outbound audio streams above to yield the following:
88 Kbps * 144 Audio Streams = 12.672 Mbps of audio outbound from the CTMS
Therefore, the network would need to be able to support approximately 12.7 Mbps of outbound audio from the CTMS.
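The outbound audio estimate can be sketched the same way. The function name is illustrative, and the result is an upper bound that assumes every endpoint is receiving the maximum of three audio streams, as in the examples above.

```python
AUDIO_STREAM_KBPS = 88         # per-stream audio rate across Ethernet (from the text)
MAX_STREAMS_PER_ENDPOINT = 3   # left, center, and right audio positions

def max_outbound_audio_kbps(endpoints):
    """Upper bound on audio sent by the CTMS to `endpoints` CTS units, in Kbps."""
    return AUDIO_STREAM_KBPS * MAX_STREAMS_PER_ENDPOINT * endpoints

print(max_outbound_audio_kbps(6))       # 1584  (6-site CTS-1000 call)
print(max_outbound_audio_kbps(3 * 16))  # 12672 (16 three-party calls)
```

Because the bound depends only on the number of endpoints, it is the same whether the 48 endpoints are spread over 16 calls or joined in one large call.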
Comparing the amount of audio sent to the CTMS with the amount of audio sent from the CTMS indicates that considerably more audio is typically sent outbound from the CTMS than is received by the CTMS during a TelePresence meeting. This traffic pattern is indicative of a multipoint meeting in which multiple audio streams have to be replicated and sent to CTS endpoints which can each simultaneously receive multiple audio streams. This is one of the major network differences between multiple point-to-point TelePresence meetings and multipoint TelePresence meetings.
Video in a Multipoint TelePresence Meeting
Unlike audio, video is not continuously transmitted from each CTS endpoint to the CTMS. Instead, the CTMS signals which endpoint should send its video. The CTMS determines which video stream to present to TelePresence meeting participants based on which speaker is currently talking, or which speaker is talking the loudest if multiple speakers are talking simultaneously. The site or table segment of that speaker is known as the active site or active segment. Figure 11-3 shows an example of this in a three-site CTS-1000 TelePresence call.
Figure 11-3 Video Flows in a 3-Site TelePresence Call
In order to switch the video, the CTMS determines which site is the active segment based upon the value of the voice activity confidence metric transmitted within voice packets from each site. Note that for multipoint calls which include CTS-3000s using speaker switching, there can be multiple active segments.
In the example above, CTS-1000 #1 is the active segment and slide presenter. Video from CTS-1000 #1 is therefore replicated on a packet-by-packet basis by the CTMS and sent to CTS-1000 #2 and CTS-1000 #3. However, the display of CTS-1000 #1 needs to continue showing video from the previous active segment. In the example above, the last active segment was CTS-1000 #2. Therefore, video from CTS-1000 #2 continues to be sent to the CTMS, where it is replicated on a packet-by-packet basis and sent to CTS-1000 #1.
Camera Video Input
Video from the camera inputs is transmitted via H.264 at 30 frames/second using a separate RTP SSRC for each camera. Unlike audio, TelePresence video can be sent at different overall bit rates based upon the quality configuration of the CTS endpoints and the CTMS. The following discussion assumes a multipoint call configured for 1080p Best quality. With this setting, video rates can burst up to 4 Mbps * 110% = 4.4 Mbps per camera. Since video packets average 1,100 bytes, network headers add approximately 4.91% of overhead, for a total of approximately 4.616 Mbps per video stream.
Auxiliary Video Input
Cisco TelePresence currently supports two frame rates for auxiliary video input, low speed auxiliary video input at 5 frames per second and high speed auxiliary video input at 30 frames per second. High speed auxiliary video input requires a separate codec be added to existing CTS endpoints and is not covered in this document.
Low speed auxiliary video is transmitted via H.264 at 5 frames/second, again using a separate RTP SSRC. The maximum data rate can burst up to approximately 500 Kbps * 110% = 550 Kbps. Since video packets average approximately 1,100 bytes, network headers add approximately 4.91% of overhead, for a total of approximately 577 Kbps for the low speed auxiliary video stream.
Note
These video overhead calculations include a 20 Byte IP header, 8 Byte UDP header, 12 Byte RTP header, and 14 byte Ethernet header. When calculating the amount of bandwidth across the WAN, the Ethernet header overhead must be replaced with the appropriate Layer 2 WAN header.
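The per-stream video rates above can be reproduced with the following sketch. It assumes that the 4.91% overhead figure is the 54 bytes of headers listed in the note, taken relative to the 1,100-byte average packet size; the function and constant names are illustrative.

```python
HEADER_BYTES = 20 + 8 + 12 + 14   # IP + UDP + RTP + Ethernet headers (from the note)
AVG_PACKET_BYTES = 1100           # average video packet size (from the text)
BURST_PERCENT = 110               # video can burst to 110% of the nominal rate

def video_stream_kbps(nominal_kbps):
    """Burst rate plus header overhead for one video stream, in Kbps."""
    burst_kbps = nominal_kbps * BURST_PERCENT / 100
    overhead_kbps = burst_kbps * HEADER_BYTES / AVG_PACKET_BYTES
    return burst_kbps + overhead_kbps

print(round(HEADER_BYTES / AVG_PACKET_BYTES * 100, 2))  # 4.91 (percent)
print(round(video_stream_kbps(4000)))                   # 4616 (1080p Best camera stream)
print(round(video_stream_kbps(500)))                    # 577  (low speed auxiliary stream)
```

To estimate WAN bandwidth, `HEADER_BYTES` would need to be adjusted to use the appropriate Layer 2 WAN header in place of the 14-byte Ethernet header.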
Calculating the Amount of Video Traffic to the CTMS
The total number of inbound video streams to the CTMS varies based upon the type of CTS units involved in the TelePresence meeting.
Meetings with CTS-1000s Only
For meetings which involve only CTS-1000s, the following equations hold for meetings with and without the use of the auxiliary video input, regardless of the number of CTS-1000 units involved in the multipoint call.
2 Inbound Camera Video Streams without Auxiliary Video Input Stream
Or:
2 Inbound Camera Video Streams + 1 Inbound Auxiliary Video Input Stream
The camera video input streams correspond to the active site and the last active site. There will only be a single additional inbound auxiliary video stream if any of the CTS-1000 units is functioning as a presenter (i.e., using the auxiliary video input for a PowerPoint presentation).
Note
When multiple devices connect to the auxiliary video and audio input, the last device connected is the presenter.
Therefore the following equations can be used to estimate the total amount of inbound video traffic to the CTMS by multiplying the video stream rates by the number of streams:
4.616 Mbps * 2 Inbound Video Streams = 9.232 Mbps without auxiliary video input
4.616 Mbps * 2 inbound video streams + 577 Kbps = 9.809 Mbps with auxiliary video input
Note
In order to determine the total amount of video with high-speed auxiliary video input, simply substitute 577 Kbps with 4.616 Mbps in all the equations discussed in this section.
The total number of inbound video streams to the CTMS from multiple TelePresence meetings involving only CTS-1000s can be found by multiplying the equations above by the number of simultaneous TelePresence meetings supported. For example, the total number of inbound video streams to a CTMS which is currently supporting two 8-site meetings and four 3-site meetings, all with presenters showing PowerPoint slides, can be calculated as:
6 total meetings * (2 Inbound Camera Video Streams + 1 Inbound Auxiliary Video Stream) = 12 Inbound Camera Video Streams + 6 Inbound Auxiliary Video Streams
The total amount of inbound video traffic to the CTMS can be estimated by simply multiplying the video rates by the number of meetings:
6 meetings * 9.809 Mbps Per Meeting with Auxiliary Video Input = 58.854 Mbps
The maximum amount of inbound video to a CTMS supporting only CTS-1000s can be estimated based upon the maximum number of video segments supported by the CTMS. Assuming 48 CTS-1000s in 16 separate three-party multipoint calls, each using the auxiliary video input, the total amount of inbound video traffic to the CTMS can be estimated as:
16 meetings * 9.809 Mbps Per Meeting with Auxiliary Video Input = 156.944 Mbps
Therefore, the network would need to be able to support approximately 157 Mbps of inbound video to the CTMS.
Meetings with CTS-3000s and CTS-3200s Only
For meetings which involve only CTS-3000s and CTS-3200s, there are six inbound camera video streams, regardless of the number of CTS-3000 and CTS-3200 units involved in the call and regardless of whether speaker-switching or room-switching is implemented within the meetings. These video streams correspond to the active segments and the last active segments. There is also a single additional inbound auxiliary video stream if any of the CTS-3000 or CTS-3200 units is functioning as a presenter (i.e., using the auxiliary video input for a PowerPoint presentation).
Therefore the following equations can be used to estimate the total amount of inbound video traffic to the CTMS, again assuming low-speed auxiliary video:
4.616 Mbps * 6 Inbound Video Streams = 27.696 Mbps without Auxiliary Video Input
4.616 Mbps * 6 inbound Video Streams + 577 Kbps = 28.273 Mbps with Auxiliary Video Input
The total amount of inbound video to the CTMS from multiple TelePresence meetings involving only CTS-3000s and CTS-3200s can be found by multiplying the equations above by the number of simultaneous TelePresence meetings supported. For example, the total amount of inbound video traffic to a CTMS which is currently supporting two 4-site meetings and two 3-site meetings, all with presenters showing PowerPoint slides, can be calculated as:
4 Total Meetings * 28.273 Mbps Per Meeting with Auxiliary Video Input = 113.092 Mbps
The maximum amount of inbound video to a CTMS supporting only CTS-3000s and CTS-3200s can be estimated based upon the maximum number of video segments supported by the CTMS. Assuming 16 CTS-3000s in four separate 3-site multipoint calls and one 4-site multipoint call, each using the auxiliary video input, the total amount of inbound video traffic to the CTMS can be estimated as:
5 Total Meetings * 28.273 Mbps Per Meeting with Auxiliary Video Input = 141.365 Mbps
Therefore, the network would need to be able to support approximately 141 Mbps of inbound video traffic to the CTMS.
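The inbound video estimates for CTS-1000-only meetings and for CTS-3000 and CTS-3200-only meetings differ only in the number of camera streams, so both can be sketched with one function. The names are illustrative. Rates are kept in whole Kbps so that the results match the figures above, and the `aux_kbps` parameter allows the high-speed auxiliary substitution described in the earlier note.

```python
CAMERA_STREAM_KBPS = 4616   # 1080p Best camera stream including overhead (from the text)
LOW_SPEED_AUX_KBPS = 577    # low speed auxiliary stream including overhead (from the text)

def inbound_video_kbps(camera_streams, with_aux, aux_kbps=LOW_SPEED_AUX_KBPS):
    """Video inbound to the CTMS for one meeting, in Kbps."""
    total = camera_streams * CAMERA_STREAM_KBPS
    if with_aux:
        total += aux_kbps
    return total

cts1000_meeting = inbound_video_kbps(2, with_aux=True)   # 9809 Kbps
cts3000_meeting = inbound_video_kbps(6, with_aux=True)   # 28273 Kbps

print(6 * cts1000_meeting)    # 58854  (six CTS-1000 meetings)
print(16 * cts1000_meeting)   # 156944 (16 CTS-1000 meetings)
print(4 * cts3000_meeting)    # 113092 (four CTS-3000 meetings)
print(5 * cts3000_meeting)    # 141365 (five CTS-3000 meetings)
```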
Meetings with Combinations of CTS-1000s, CTS-3000s, and CTS-3200s
For meetings which involve one CTS-3000 or CTS-3200 and two or more CTS-1000s, or meetings which involve two or more CTS-3000s or CTS-3200s, and any number of CTS-1000s, there are always six inbound camera video streams. These video streams correspond to the active segments and the last active segments. There is also a single additional inbound auxiliary video stream if any of the CTS units is functioning as a presenter (i.e., using the auxiliary video input for a PowerPoint presentation). Therefore the same equations from the previous section regarding CTS-3000s and CTS-3200s only apply to mixed meetings of CTS-1000s, CTS-3000s, and CTS-3200s.
The one exception is when there is one CTS-3000 or CTS-3200 and only two CTS-1000s. In this case, there are a total of five inbound camera video streams. It should be noted that there are not enough video streams to fill all the CTS-3000 displays in such a meeting. In other words, one CTS-3000 screen is blank.
Note
It is assumed that meetings with one CTS-3000 or CTS-3200 and one CTS-1000 do not require a CTMS, and are not considered a multipoint meeting, although it is possible to hold a two-site multipoint meeting.
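The rules for the number of inbound camera video streams across the three meeting types can be summarized in a short sketch. The function name is illustrative, and the decision to raise an error for combinations the text does not treat as multipoint meetings is a choice made for this example, not documented CTMS behavior.

```python
def inbound_camera_streams(n_cts1000, m_cts3000):
    """Inbound camera video streams for one meeting.

    n_cts1000 counts CTS-1000s; m_cts3000 counts CTS-3000s and CTS-3200s.
    """
    if n_cts1000 < 0 or m_cts3000 < 0 or n_cts1000 + m_cts3000 < 2:
        raise ValueError("a meeting needs at least two endpoints")
    if m_cts3000 == 0:
        return 2   # CTS-1000s only: active site plus last active site
    if m_cts3000 == 1 and n_cts1000 < 2:
        # One CTS-3000 with one CTS-1000: not treated as multipoint in the text.
        raise ValueError("combination not covered by the multipoint rules")
    if m_cts3000 == 1 and n_cts1000 == 2:
        return 5   # the one exception: one CTS-3000 screen stays blank
    return 6       # active segments plus last active segments

print(inbound_camera_streams(3, 0))  # 2
print(inbound_camera_streams(2, 1))  # 5
print(inbound_camera_streams(5, 3))  # 6
```

Multiplying the returned count by the per-stream camera rate, and adding one auxiliary stream when a presenter is active, gives the inbound video estimate for the meeting.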
Calculating the Amount of Video Traffic From the CTMS
The total number of outbound video streams from the CTMS to the CTS units equals the number of video table segments in the multipoint call, where each CTS-1000 counts as one table segment and each CTS-3000 or CTS-3200 counts as three table segments. In addition, if the auxiliary video input is being used in the TelePresence meeting, an additional amount of video up to 577 Kbps times the number of endpoints is transmitted by the CTMS. Again this assumes low-speed auxiliary video only.
The following equation can be used to estimate the amount of outbound video from the CTMS:
(N + (3 * M)) * 4.616 Mbps Per Video Stream without Auxiliary Video Input
((N + (3 * M)) * 4.616 Mbps) + ((N + M - 1) * 577 Kbps) with Auxiliary Video Input
Where N is the number of CTS-1000s in the call and M is the number of CTS-3000s or CTS-3200s in the call.
For example, in a single multipoint call with 3 CTS-3000s and 5 CTS-1000s, along with an auxiliary video stream from a presentation, the estimated amount of outbound video traffic from the CTMS would be:
((5 + (3 * 3)) * 4.616 Mbps) + ((5 + 3 - 1) * 577 Kbps) = 68.663 Mbps
The total amount of video traffic outbound from the CTMS from multiple multipoint meetings can be calculated by simply summing the traffic from individual meetings.
Extending the example above, if the CTMS is currently supporting one multipoint call with 3 CTS-3000s and 5 CTS-1000s, along with a second call consisting of four CTS-3000s, both using the auxiliary video input for PowerPoint presentations, the total amount of video traffic outbound from the CTMS can be estimated as:
[((5 + (3 * 3)) * 4.616 Mbps) + ((5 + 3 - 1) * 577 Kbps)] + [((4 * 3) * 4.616 Mbps) + ((4 - 1) * 577 Kbps)] = 125.786 Mbps
The maximum amount of outbound video from a CTMS can be estimated based upon the maximum number of video segments supported by the CTMS. Assuming 48 CTS-1000s in one large multipoint call, using the auxiliary video input for a PowerPoint presentation, the total amount of outbound video traffic from the CTMS can be estimated as:
[48 * 4.616 Mbps + (48 - 1) * 577 Kbps] = 248.687 Mbps of Video Outbound From the CTMS
Therefore, the network would need to be able to support approximately 249 Mbps of outbound video from the CTMS.
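The outbound video equation can be sketched as follows. The function and parameter names are illustrative. The per-stream rates and the (N + M - 1) auxiliary term come from the equations above, low-speed auxiliary video is assumed, and rates are kept in whole Kbps.

```python
CAMERA_STREAM_KBPS = 4616   # 1080p Best camera stream including overhead (from the text)
LOW_SPEED_AUX_KBPS = 577    # low speed auxiliary stream including overhead (from the text)

def outbound_video_kbps(n_cts1000, m_cts3000, with_aux):
    """Video sent by the CTMS for one meeting, in Kbps.

    n_cts1000 counts CTS-1000s; m_cts3000 counts CTS-3000s and CTS-3200s.
    """
    segments = n_cts1000 + 3 * m_cts3000
    total = segments * CAMERA_STREAM_KBPS
    if with_aux:
        # The auxiliary stream goes to every endpoint except the presenter.
        total += (n_cts1000 + m_cts3000 - 1) * LOW_SPEED_AUX_KBPS
    return total

mixed = outbound_video_kbps(5, 3, with_aux=True)
print(mixed)                                          # 68663
print(mixed + outbound_video_kbps(0, 4, True))        # 125786
print(outbound_video_kbps(48, 0, with_aux=True))      # 248687
```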
Total Traffic to and from the CTMS
The total amount of traffic to and from the CTMS for a given meeting or set of meetings can be found by summing the amount of audio and video presented in the previous sections. For example, the maximum amount of traffic from the CTMS can be estimated as:
249 Mbps of Video + 13 Mbps of Audio = 262 Mbps
Note
One could simply estimate the total traffic from the CTMS by using the estimated 5.5 Mbps per CTS-1000 and 15 Mbps per CTS-3000 or CTS-3200 and multiplying by the number of devices supported. For example 48 CTS-1000s would yield a total amount of traffic of 48 * 5.5 Mbps = 264 Mbps. Likewise, 16 CTS-3000s or CTS-3200s would yield a total amount of traffic of 16 * 15 Mbps = 240 Mbps.
Simply estimating 15 Mbps per CTS-3000 or CTS-3200 and 5.5 Mbps per CTS-1000 or CTS-500 provides reasonably accurate numbers for traffic outbound from the CTMS across the network. However, due to the asymmetric nature of multipoint TelePresence, these per-endpoint estimates do not accurately reflect the amount of traffic inbound to the CTMS across the network. Further, with the addition of high speed auxiliary video input, a single traffic rate per CTS endpoint becomes increasingly inaccurate for calculating bandwidth utilization, depending upon whether the auxiliary video input is in use. For this reason, the detailed explanation within this document has been provided.
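The per-endpoint rule of thumb can be compared against the detailed outbound calculation with a short sketch. The function names are illustrative, the detailed estimate assumes a single meeting with low-speed auxiliary video and three audio streams to every endpoint, and all rates are in whole Kbps.

```python
def rule_of_thumb_kbps(n_cts1000, m_cts3000):
    """5.5 Mbps per CTS-1000 and 15 Mbps per CTS-3000 or CTS-3200."""
    return n_cts1000 * 5500 + m_cts3000 * 15000

def detailed_outbound_kbps(n_cts1000, m_cts3000, with_aux=True):
    """Outbound video plus maximum outbound audio for one meeting."""
    video = (n_cts1000 + 3 * m_cts3000) * 4616
    if with_aux:
        video += (n_cts1000 + m_cts3000 - 1) * 577
    audio = 88 * 3 * (n_cts1000 + m_cts3000)
    return video + audio

print(rule_of_thumb_kbps(48, 0))       # 264000
print(detailed_outbound_kbps(48, 0))   # 261359
print(rule_of_thumb_kbps(0, 16))       # 240000
```

For 48 CTS-1000s the two outbound estimates differ by roughly 1%. The rule of thumb says nothing about inbound traffic, which depends on the number of meetings rather than the number of endpoints.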
Video Switchover Delay
The CTMS does not immediately signal CTS endpoints to send video upon seeing packets with audio energy. Doing so could cause unnecessary flapping of video within the conference call and generate additional burstiness on the network. Instead, the CTMS implements a hold-down timer before signaling the new active segment to begin sending video. The hold-down timer is designed to ensure that someone at the new active segment is indeed speaking, and that the audio energy is not just random noise. There is also a short period of time between when the new active segment has been notified to start sending video and when it begins transmitting video, which is then switched by the CTMS. Participants within a TelePresence multipoint conference should be aware that they may need to talk for approximately two seconds before the video switches over. They should be particularly aware of this when taking a roll call of participants at the beginning of the TelePresence meeting, in order for their faces to be seen on the video. It should be noted that the audio is never interrupted or delayed.
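The effect of a hold-down timer on video switching can be illustrated with a simplified sketch. This is not the CTMS algorithm: the function name, the sampled input format, and the rule that silence or a return to the current active site resets the timer are all assumptions made for illustration. Only the approximate two-second hold-down value comes from the text.

```python
HOLD_DOWN_S = 2.0  # approximate hold-down time (from the text)

def switch_events(samples, initial, hold_down_s=HOLD_DOWN_S):
    """Return (timestamp, site) pairs for each switch of the active segment.

    samples is a time-ordered list of (timestamp_s, loudest_site), where
    loudest_site is None when no site has audio energy (hypothetical input).
    """
    active = initial
    candidate = None
    candidate_since = None
    switches = []
    for t, loudest in samples:
        if loudest is None or loudest == active:
            # Silence or the current active site talking: cancel any pending switch.
            candidate = None
            candidate_since = None
            continue
        if loudest != candidate:
            # A new site started talking: start its hold-down timer.
            candidate, candidate_since = loudest, t
        if t - candidate_since >= hold_down_s:
            active = candidate
            switches.append((t, active))
            candidate = None
            candidate_since = None
    return switches

# Site B makes a brief noise at 1.0 s; site C talks continuously from 3.0 s.
samples = [(0.0, "A"), (0.5, "A"), (1.0, "B"), (1.5, "A"), (2.0, None), (2.5, None),
           (3.0, "C"), (3.5, "C"), (4.0, "C"), (4.5, "C"), (5.0, "C"), (5.5, "C")]
print(switch_events(samples, initial="A"))  # [(5.0, 'C')]
```

In this example the brief noise from site B never triggers a switch, while site C becomes the active segment two seconds after it starts talking.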
Overview of TelePresence Video on the Network
Since many network configuration guidelines around the deployment of TelePresence, both within the campus and to the branch, are based on the specific behavior of video on the network, a brief overview of TelePresence video as it appears on the network is presented in this section. Figure 11-4 shows a sample comparison of voice and video traffic as it appears on a network.
Figure 11-4 Comparison of Voice and Video on the Network
Voice on the network appears as a series of packets, spaced at regular intervals (in the case of Cisco TelePresence, every 20 ms), each containing an encoded sample of the audio. Each voice packet is basically independent of the other packets. In other words, if one voice packet is discarded or lost in the network, it does not affect the next voice packet. The sizes of the voice packets are fairly consistent, averaging slightly over 200 bytes in size. Therefore, the overall characteristic of voice is a constant bit rate stream.
Unlike voice, the overall characteristic of video in general, including TelePresence, is a somewhat bursty, variable bit rate stream. Video traffic on the network appears as a series of video frames spaced at regular intervals (in the case of TelePresence video, approximately every 33 ms). A frame of video is also referred to as an Access Unit in H.264 terminology. The H.264 standard defines two layers, a Video Coding Layer (VCL) and a Network Abstraction Layer (NAL). The VCL is responsible for encoding the video and the output of the VCL is a string of bits representing the encoded video. The function of the NAL is to map the string of bits into units which can then be transported across a network infrastructure. Each video frame consists of multiple packets spaced out over the frame interval. Each RTP packet contains one or more NAL Units (NALUs). Each NALU consists of an integer number of bytes of coded video, as shown in Figure 11-5.
Figure 11-5 Mapping TelePresence Video into RTP Packets
RTP Packets within a single video frame, and across multiple frames, are not necessarily independent of each other. In other words, if one packet within a video frame is discarded, it affects the quality of the entire video frame and may possibly affect the quality of other video frames. The sizes of the individual RTP packets within frames vary, depending upon the number of NALUs they carry and the size of the NALUs. Overall, packet sizes average around 1,100 bytes in size. The number of packets per frame also varies considerably based upon how much information is contained within the video frame. This is partially determined by how the video is encoded.
There are basically two types of encoding:
• Intra-frame encoding: Compresses the frame by reducing spatial redundancy within the frame.
• Inter-frame encoding: Uses motion compensation to reduce temporal redundancy across one or more frames.
These two types of encoding lead to two types of video frames, intra-coded frames (I-frames) and predictive-coded frames (P-frames). A third type of frame, bi-directional predictive-coded frames (B-frames), is currently not utilized by TelePresence.
Note
Coding is actually done at the macroblock layer. An integer number of macroblocks then form a slice, and multiple slices form a frame. Therefore, technically slices are intra-predicted (I-slices) or inter-predicted (P-slices). For simplicity of explanation within this section, these have been abstracted to I-frames and P-frames. A thorough discussion of the H.264 Video Coding Layer is outside the scope of this section.
Figure 11-6 shows an example of how these frames can appear in a video stream.
Figure 11-6 Types of Video Frames
I-frames serve as reference points in the video stream. They can also be referred to as an Instantaneous Decoding Refresh (IDR) in H.264 terminology. I-frames can be decoded and displayed without referencing any other frame. Frame 1 in Figure 11-6 is an I-frame. P-frames, on the other hand, reference either another P-frame or an I-frame. They require reception of the previous reference frame in order to be decoded correctly. For example, Frame 2 in Figure 11-6 references Frame 1. Frame 3 may reference Frame 1 or Frame 2, since a P-frame may reference another P-frame.
Compression of I-frames is typically only moderate, since only spatial redundancy within the frame is eliminated. Therefore, I-frames tend to be much larger in size than P-frames. I-frame sizes up to 64 Kbytes (and approximately 60 individual packets) have been observed with TelePresence endpoints. P-frames have much higher compression since only the difference between the frame and the reference frame is sent. This information is typically sent in the form of motion vectors indicating the relative motion of objects from the reference frame. The size of TelePresence P-frames is dependent upon the amount of motion within the conference call. Under normal motion, TelePresence P-frames tend to average around 13 Kbytes in size and typically consist of about 12 individual packets. Under high motion they can be around 19 Kbytes in size and consist of about 17 individual packets.
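The frame sizes and packet counts quoted above can be cross-checked with a short calculation. The following is a minimal sketch, not a measurement tool: it assumes that "Kbytes" means 1,000 bytes and that frames arrive every 33 ms, and the names `FRAMES`, `avg_packet_bytes`, and `sustained_mbps` are illustrative only.

```python
# Frame figures quoted in the text. Assumption: 1 Kbyte = 1,000 bytes.
FRAMES = {
    "I-frame (largest observed)": (64_000, 60),
    "P-frame (normal motion)": (13_000, 12),
    "P-frame (high motion)": (19_000, 17),
}
FRAME_INTERVAL_S = 0.033  # approximately one video frame every 33 ms


def avg_packet_bytes(frame_bytes, packets):
    """Average packet size for a frame split into the given number of packets."""
    return frame_bytes / packets


def sustained_mbps(frame_bytes, interval_s=FRAME_INTERVAL_S):
    """Bit rate (Mbps) of a stream in which every frame has this size."""
    return frame_bytes * 8 / interval_s / 1_000_000


if __name__ == "__main__":
    for name, (size, packets) in FRAMES.items():
        print(f"{name}: {avg_packet_bytes(size, packets):.0f} bytes/packet, "
              f"{sustained_mbps(size):.2f} Mbps if sustained")
```

Under these assumptions, each case works out to a little over 1,000 bytes per packet, which is consistent with the roughly 1,100-byte average noted earlier. The I-frame figure is a per-frame burst rather than a sustained rate, since I-frames are sent infrequently.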
From a bandwidth utilization standpoint, much better performance can be achieved by sending I-frames very infrequently. This reduces the burstiness of the video as well as the overall bit rate. However, the side effect is that if part of one video frame is lost (in other words, if a packet is dropped in the network), then multiple video frames which reference it or reference each other may be affected.
In point-to-point TelePresence meetings, I-frames are sent approximately every five minutes. However, waiting for up to five minutes for video to correct itself in the event of a lost packet is not acceptable. Therefore, CTS endpoints include an RTCP-based feedback mechanism by which video receivers continuously update the sender regarding the status of video packets received. When the sender learns that the receiver has lost some video packets, the sender generates an IDR in order to establish a new reference point.
In multipoint TelePresence meetings, an I-frame must be sent by a CTS endpoint whenever it becomes the active table segment. This I-frame must be replicated by the CTMS to every other endpoint in the multipoint call, as shown in Figure 11-7.
Figure 11-7 I-frame Replication by the CTMS
As can be seen, one effect of the CTMS is to magnify bursts generated by I-frames during normal speaker transitions. These bursts must be accommodated by the buffers on LAN switches and routers. Of particular concern is the LAN switch to which the CTMS is directly connected, since all video streams replicated by the CTMS pass through this first switch. The bursts must also be accommodated by any policers and shapers configured on WAN circuits that the TelePresence video traverses.
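To get a rough feel for the magnification effect, the replicated burst can be estimated from the largest observed I-frame. This is a simplified sketch: it assumes one I-frame copy per receiving endpoint and ignores endpoint type and location, which the burst estimation section treats in detail. The function name `replicated_burst` and the 1 Kbyte = 1,000 bytes convention are assumptions.

```python
I_FRAME_BYTES = 64_000   # largest I-frame observed (assumes 1 Kbyte = 1,000 bytes)
I_FRAME_PACKETS = 60     # approximate packets in such an I-frame


def replicated_burst(endpoints_in_call,
                     frame_bytes=I_FRAME_BYTES,
                     frame_packets=I_FRAME_PACKETS):
    """Return (bytes, packets) leaving the CTMS when one I-frame is
    replicated to every endpoint other than the sender."""
    if endpoints_in_call < 2:
        raise ValueError("a multipoint call needs at least two endpoints")
    copies = endpoints_in_call - 1
    return copies * frame_bytes, copies * frame_packets


if __name__ == "__main__":
    burst_bytes, burst_packets = replicated_burst(4)
    print(f"4 endpoints: {burst_bytes} bytes in about {burst_packets} packets")
```

For a four-endpoint call this gives three copies, or 192,000 bytes in about 180 packets, all of which pass through the switch port connected to the CTMS.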
In multipoint calls, I-frames are also generated as a result of lost video. From the perspective of CTS endpoints, the originator of video in a multipoint call is the CTMS. Downstream CTS endpoints report received video back to the CTMS via the same RTCP-based feedback mechanism discussed earlier. The CTMS aggregates reports from multiple downstream CTS endpoints and sends its own report to the upstream CTS endpoint which is the source of the video stream, as is shown in Figure 11-8.
Figure 11-8 Lost Video Feedback in a Multipoint Call
If video has been lost by any of the downstream CTS endpoints, it is reported back to the upstream CTS endpoint by the CTMS after a brief hold-down time. The hold-down time prevents excessive I-frame generation by allowing the CTMS to aggregate reports from all downstream CTS endpoints before informing the upstream CTS endpoint that it needs to send a new I-frame.
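The aggregation behavior can be illustrated with a small model. The sketch below is not the CTMS implementation: the source says only that the hold-down time is brief, so the 100 ms value, the function name `idr_requests`, and the rule that a report arriving exactly at the end of a window is absorbed by it are all assumptions.

```python
def idr_requests(report_times_ms, hold_down_ms):
    """Model of hold-down aggregation.

    report_times_ms: times (ms) at which downstream endpoints report lost video.
    Returns the times (ms) at which a single aggregated request for a new
    I-frame would be sent upstream. Reports that arrive while a hold-down
    window is open are absorbed into that window's request.
    """
    requests = []
    window_end = None
    for t in sorted(report_times_ms):
        if window_end is None or t > window_end:
            window_end = t + hold_down_ms   # first report opens a new window
            requests.append(window_end)     # one request when the window closes
    return requests


if __name__ == "__main__":
    # Three endpoints report loss within 30 ms; a fourth report arrives later.
    print(idr_requests([0, 10, 30, 500], hold_down_ms=100))
```

In this model, four loss reports result in only two upstream requests, which is the reduction in I-frame generation that the hold-down time is meant to provide.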
The purpose of the feedback mechanism is to ensure that every video endpoint remains synchronized with the source and maintains high video quality. The side effect is that any site which experiences network degradation causes I-frames to be sent and replicated to every site within the multipoint call. In a worst case scenario, the site experiencing network degradation continues to report lost video, resulting in an I-frame "storm" across the network. Therefore, care must be taken to ensure that all sites are provisioned correctly in order to prevent excessive I-frame video traffic.
Deployment Models
There are a number of factors that determine how multipoint is deployed in a production environment. The two biggest factors are the number of CTS endpoints and the geographic location of the CTS endpoints. The number of CTS endpoints determines the number of CTMS devices in the network and the location of the CTS endpoints determines whether the CTMS devices are centralized or distributed.
Centralized Deployment
Centralized designs are recommended for Cisco TelePresence deployments with six or fewer CTS units or for larger deployments which cover a limited geographic area.
For centralized deployments, it is recommended that the CTMS be located at a regional or headquarters campus site with the necessary WAN bandwidth available to each of the remote sites, as well as the necessary LAN bandwidth within the campus. It is recommended that the CTMS be centrally located, based on the geographic location of the CTS rooms, although this may not be entirely possible due to the existing network layout. This prevents unnecessary latency caused by backhauling calls to a site at the far edge of the network.
Figure 11-9 illustrates a small TelePresence deployment with three regional/headquarter campus sites in North America and one site in Europe. In this example the CTMS is placed centrally, located in New York, to minimize latency for multipoint meetings.
Figure 11-9 Centralized Multipoint Design
Deployment Considerations
There are a number of decisions that need to be made prior to deploying a centralized TelePresence multipoint solution.
1.
Selecting a site for the multipoint switch—As mentioned above, the multipoint switch should be located at a site that provides end-to-end network latency less than 200ms and provides adequate bandwidth for the number of CTS endpoints on the network. Calculate the required bandwidth for the multipoint site using the calculations from the previous section outlining bandwidth requirements.
2.
Supported meeting types—Supporting either scheduled, non-scheduled, or both meeting types is not a concern in most centralized multipoint deployments.
a.
A scheduled only meeting environment requires CTS Manager and provides one-button-to-push meeting access for end users. Meetings can be scheduled by end users or a centralized scheduling group using Microsoft Exchange or IBM Domino. When configuring CTMS resources, maximum segments should be configured for the total available segments plus additional ad hoc segments. It is important to remember that ad hoc resources need to be available even in a scheduled only environment. If no ad hoc resources are configured on the CTMS, there is no way to add non-scheduled CTS endpoints to scheduled meetings (as described in Multipoint Resources in Chapter 10, "Cisco TelePresence Multipoint Solution Essentials"). Scheduled segments are calculated based on maximum and ad hoc segment entries. Figure 11-10 illustrates an example of resource allocation for a scheduled only meeting deployment with five CTS-3000s (15 available segments).
Figure 11-10 Resource Allocation Example
b.
A non-scheduled only meeting environment does not require CTS-Manager, but does require users to manually dial or use speed dial entries to access meetings. When deploying a non-scheduled meeting environment for multipoint deployments with fewer than six rooms, it is recommended that a single speed dial entry be used for multipoint access (with five or fewer CTS endpoints it is only possible to have a single multipoint meeting). End users continue to use their existing calendaring system to reserve the rooms and use the single speed dial entry for multipoint meeting access.
A non-scheduled only meeting environment is not recommended for TelePresence deployments with six or more rooms. However, if this is the only option, the following considerations must be taken into account:
•
Meeting access—How will users access meetings? With six or more TelePresence rooms it is possible to have multiple multipoint meetings. Providing multiple speed dials for multiple multipoint meetings may cause confusion and ultimately end up causing users to dial into the wrong meeting. In this environment users need to manually dial into multipoint meetings.
For this reason it is recommended that a centralized scheduler be used for multipoint meetings. The scheduler can either create a new static meeting for each request or select from a pool of pre-configured static meetings. After each request is processed by the scheduler, meeting information is sent to the meeting participants. At the time of the meeting, participants manually dial into the multipoint meeting using the information provided by the meeting scheduler.
Another option is to install a standalone directory and groupware server that is supported by the solution (e.g., Microsoft Active Directory and Exchange or IBM Domino). This allows a centralized scheduler to schedule multipoint meetings using Exchange which, in conjunction with CTS-Manager, provides system resource management for all scheduled multipoint meetings. This also provides one-button-to-push meeting access, eliminating any issues with users having to manually dial into meetings.
If ad hoc meetings are used, the scheduler launches the meeting at the scheduled start time, allowing users to walk into the room and attend their meeting. This is a very secure method for conducting multipoint meetings. CTS endpoints cannot dial into an ad hoc meeting; only the meeting administrator can add CTS endpoints to the meeting through the web GUI of the CTMS. However, this is a very resource-intensive process, considering that every meeting must be manually initiated by the meeting administrator at the time of the meeting.
•
Meeting security—Using a central scheduling resource to allocate static meetings is a resource-intensive process and prone to security risks. If a small number of multipoint numbers are used, it is possible for rogue endpoints to interrupt meetings. If a user tries to avoid the scheduling process and dials into the last multipoint meeting number they were assigned, they may interrupt a multipoint meeting in progress. To avoid this, it is recommended that a maximum number of rooms be configured for each static meeting, minimizing the possibility of meeting interruptions. Configuring the number of rooms does not eliminate all potential risks. However, it does minimize the threat by essentially locking the meeting after all the scheduled rooms are in the meeting. If a meeting requires more security, it is recommended that an ad hoc meeting be used. Figure 11-11 illustrates an example of an eight room deployment with pre-configured meeting numbers.
Figure 11-11 Pre-configured Meeting Numbers
•
Resource management—There is no resource management for non-scheduled meetings. In the event a centralized multipoint deployment supports more than 48 segments, a centralized scheduler is required to ensure multipoint resources are allocated properly. Maximum and ad hoc resources should be configured for the total number of available segments.
c.
Combined scheduled and non-scheduled meetings require CTS Manager. This deployment provides one-button-to-push meeting access for scheduled meetings and manual dial meeting access for non-scheduled meetings. This type of deployment allows personal static meeting numbers for power users or executives. These numbers can be used for last minute multipoint meetings when scheduling ahead of time is not convenient. Ad hoc meetings may also be used for high profile meetings or a white glove type meeting service. However, there are a number of considerations that must be taken into account:
•
Meeting security—Since static meeting numbers are not secure, it is possible for an uninvited room to dial into the multipoint meeting.
•
Administrative resources—If static meetings are supported by a centralized scheduler, as described above, or ad hoc meetings are used, additional administrative resources are probably required.
•
Resource management—There is no resource management for non-scheduled meetings. In the event a centralized multipoint deployment supports more than 48 segments, it is recommended that separate CTMS devices be deployed, with one CTMS dedicated to scheduled meetings and one dedicated to non-scheduled meetings. This ensures that resources are always available for non-scheduled meetings.
3.
Failover/redundancy—With the current release of CTMS and CTS Manager, automated failover is not supported. The following failover options are recommended for all meeting deployment scenarios described above:
a.
Scheduled meeting deployment—In a scheduled meeting deployment, two CTMS devices can be configured in CTS Manager. CTMS-1 is configured in scheduled mode, while CTMS-2 is configured as non-scheduled. If CTMS-1 fails, the system administrator uses the CTS Manager GUI to migrate all scheduled multipoint meetings to CTMS-2. The administrator then changes the control state of CTMS-1 to non-scheduled and changes the control state of CTMS-2 to scheduled. When meetings are migrated to CTMS-2, conference access numbers and meeting IDs are updated and new one-button-to-push entries are propagated to all CTS endpoints. Any meeting that is in progress during the failure/migration is not migrated.
b.
Non-scheduled meeting deployment—For this deployment method a hot standby is recommended. Configure two CTMS devices with the same configuration, including the IP address. Manually shut down the Ethernet port to which CTMS-2 is connected. In case of a failure in CTMS-1, shut down the Ethernet port to which CTMS-1 is connected and no shut the Ethernet port to which CTMS-2 is connected.
Cisco Unified Communications Manager (CUCM) has the ability to route calls to a secondary CTMS in the event of a primary CTMS failure using route lists/route groups. However, this is not recommended, since there is no state information passed between CTMS devices. In the case of a temporary CTMS failure, it is possible to have a meeting split between two CTMS devices (split meetings).
c.
Combined scheduled and non-scheduled—Almost all centralized deployments consist of fewer than 16 CTS-3000s, or a total of 48 table segments, allowing a single CTMS to accommodate all systems simultaneously. This does, however, present a challenge for failover in an environment where a single CTMS is providing scheduled and non-scheduled resources. To provide seamless failover for all users, scheduled and non-scheduled resources must be supported on separate CTMS devices. Both failover methods described above for scheduled and non-scheduled deployments must be used to provide failover.
Distributed Deployment
A distributed deployment is recommended for large TelePresence deployments or smaller deployments with three or more CTS endpoints in separate geographical regions, as shown in Figure 11-12. As TelePresence networks grow, it is very advantageous to localize CTMS devices if possible.
Regionally localizing CTMS devices minimizes latency and saves bandwidth. Figure 11-12 provides an example of a distributed deployment with a CTMS in New York providing multipoint services for North America and a CTMS in Paris providing multipoint services for Europe.
Figure 11-12 Distributed Multipoint Deployment
Note
It should be noted that the current CTMS implementation (software version 1.1) does not support CTMS chaining/cascading for scalability.
Deployment Considerations
As seen above in the centralized deployment, there are a number of considerations that must be addressed for a simple multipoint deployment. In a distributed deployment, there are a number of additional considerations that must be addressed to ensure a successful deployment.
1.
Selecting sites for the CTMS resources—As mentioned above, multipoint switches need to be located at sites that provide end-to-end network latency of less than 200 ms for the targeted CTS endpoints and adequate bandwidth for the number of CTS segments supported by each site. In Figure 11-13, four regional locations have been selected for CTMS resources based on the number of regional CTS endpoints, bandwidth availability, and proximity to the regional CTS endpoints.
Figure 11-13 Distributed CTMS Deployment Example
2.
Supported Meeting Types—In a distributed multipoint deployment, it is recommended that either a scheduled only or a combined scheduled and non-scheduled meeting environment be supported. Non-scheduled only meeting environments should be avoided in distributed multipoint deployments due to the complexity of administering multipoint meetings.
a.
A scheduled only meeting environment requires CTS Manager and provides one-button-to-push meeting access for end users. Meetings are scheduled by end users or a centralized scheduling group using Microsoft Exchange or IBM Domino. When configuring CTMS resources, maximum segments should be configured for the total available segments plus additional ad hoc segments. It is important to remember that ad hoc resources need to be available even in a scheduled only environment. If no ad hoc resources are configured on the CTMS, there is no way to add non-scheduled CTS endpoints to scheduled meetings (as described in Multipoint Resources in Chapter 10, "Cisco TelePresence Multipoint Solution Essentials").
b.
Non-scheduled meeting environments are not recommended in distributed multipoint deployments. Manually managing resources, meeting placement, and scheduling meetings is a difficult task that is prone to errors.
CTS Manager is required to support scheduled meetings and manage CTMS resources. If a calendaring system other than Exchange or Domino is used to schedule meetings/rooms, it is recommended that a standalone Active Directory and Exchange server be deployed. This allows a centralized scheduler to schedule multipoint meetings using Exchange which, in conjunction with CTS Manager, provides system and geographical resource management for all scheduled multipoint meetings. This also provides one-button-to-push meeting access, eliminating any issue with users having to manually dial into meetings.
c.
Combined scheduled and non-scheduled meetings require CTS Manager. This deployment provides one button to push meeting access for scheduled meetings and manual dial meeting access for non-scheduled meetings. This type of deployment allows personal static meeting numbers for power users or executives. These numbers can be used for last minute multipoint meetings when scheduling ahead of time is not convenient. Ad hoc meetings may also be used for high profile meetings or a white glove type meeting service. However, there are a number of considerations that must be taken into account:
•
Meeting security—Since static meeting numbers are not secure, it is possible for an uninvited room to dial into the multipoint meeting.
•
Administrative resources—If static meetings are supported by a centralized scheduler, as described in Centralized Deployment, or ad hoc meetings are used, additional administrative resources are probably required.
•
Resource management—There is no resource management for non-scheduled meetings. For this reason, it is recommended that scheduled and non-scheduled resources be supported on separate CTMS devices. This ensures that enough resources are available for non-scheduled meetings and that personal multipoint numbers assigned to executives remain usable.
3.
CTMS resources per site—After determining which sites will provide multipoint resources, it is important to determine how many segments each site will support. Table 11-1 shows the segment breakdown for each multipoint site, based on Figure 11-13.
Table 11-1 Resource Allocation
Site | Total Segments | Scheduled Segments | Ad hoc Segments
New York | 42 | 36 | 6
San Jose | 17 | 14 | 3
London | 20 | 17 | 3
Hong Kong | 17 | 14 | 3
Resources for each site are broken down into three categories, total segments, scheduled segments, and ad hoc segments. Total segments is used to limit the number of connections each CTMS device supports. This ensures that the bandwidth allocated to a multipoint site is not exceeded. Scheduled segments is passed to CTS Manager and used to manage resources on each CTMS device. Ad hoc Segments is used to add non-scheduled CTS endpoints to scheduled meetings or to provide resources for static and ad hoc meetings. In Table 11-1, ad hoc segments are only being provided to add non-scheduled CTS endpoints to scheduled meetings.
Determining how many multipoint resources are assigned to each site is based on call patterns and WAN bandwidth. There is no exact calculation for determining the number of multipoint resources for a region. However, at a minimum there should be enough resources allocated to support all CTS endpoints within the region.
In Table 11-1, it is decided that New York is the hub site, providing enough schedulable resources for a single multipoint meeting containing all deployed CTS-3000s. Six ad hoc resources are added, allowing non-scheduled CTS endpoints to be added to scheduled meetings. Total segments is configured for 42 to ensure the provisioned bandwidth for the hub site is not exceeded.
San Jose is configured with enough scheduled resources to support all regional CTS endpoints and six additional CTS segments. Three ad hoc resources are added, allowing non-scheduled CTS endpoints to be added to scheduled meetings. Total segments is configured for 17 to ensure the provisioned bandwidth for San Jose is not exceeded.
London is configured with enough scheduled resources to support all regional CTS endpoints and six additional CTS segments. Three ad hoc resources are added, allowing non-scheduled CTS endpoints to be added to scheduled meetings. Total segments is configured for 20 to ensure the provisioned bandwidth for London is not exceeded.
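The relationship between the three segment categories in Table 11-1 can be checked with a few lines of code. The sketch below assumes that scheduled segments equal total segments minus ad hoc segments, which is consistent with every row of the table; the function name `scheduled_segments` is illustrative and is not a CTMS or CTS Manager setting.

```python
# (total segments, ad hoc segments) per site, taken from Table 11-1
SITES = {
    "New York": (42, 6),
    "San Jose": (17, 3),
    "London": (20, 3),
    "Hong Kong": (17, 3),
}


def scheduled_segments(total, ad_hoc):
    """Segments left for scheduled meetings after reserving ad hoc segments.
    Assumption: scheduled = total - ad hoc, as the table rows suggest."""
    if ad_hoc > total:
        raise ValueError("ad hoc segments cannot exceed total segments")
    return total - ad_hoc


if __name__ == "__main__":
    for site, (total, ad_hoc) in SITES.items():
        print(f"{site}: {scheduled_segments(total, ad_hoc)} scheduled segments")
```

Running the check reproduces the Scheduled Segments column of the table (36, 14, 17, and 14).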
4.
Required bandwidth for each multipoint site:
The appropriate bandwidth must be configured for each multipoint site. Calculate the required bandwidth for each multipoint site using the calculations from the previous section outlining bandwidth requirements.
5.
CTMS configurations for geographical selection:
As described in Geographical Resource Management in Chapter 10, "Cisco TelePresence Multipoint Solution Essentials", CTS Manager provides the ability to select a CTMS device with the closest proximity to scheduled CTS endpoints. It is important to carefully analyze the location of CTS endpoints and the regional multipoint sites to determine the best time zone entry for each CTMS. CTMS time zone entries may have to be modified to obtain the most accurate meeting placement.
6.
Failover/redundancy—With the current release of CTMS and CTS Manager, automated failover is not supported. The following failover options are recommended for all meeting deployment scenarios described above:
a.
Scheduled meeting deployment—In a scheduled meeting deployment, two CTMS devices can be configured in CTS Manager. CTMS-1 is configured in scheduled mode, while CTMS-2 is configured as non-scheduled. If CTMS-1 fails, the system administrator uses the CTS Manager GUI to migrate all scheduled multipoint meetings to CTMS-2. The administrator then changes the control state of CTMS-1 to non-scheduled and changes the control state of CTMS-2 to scheduled. When meetings are migrated to CTMS-2, conference access numbers and meeting IDs are updated and new one-button-to-push entries are propagated to all CTS endpoints. Any meeting that is in progress during the failure/migration is not migrated.
b.
Non-scheduled meeting deployment—Non-scheduled only meeting deployments are not recommended in a distributed deployment.
c.
Combined scheduled and non-scheduled—It is recommended that scheduled and non-scheduled multipoint resources be supported on separate CTMS devices. Failover scenarios for each device should be the same as described above for scheduled and non-scheduled meetings.
Positioning of the CTMS within the Campus or Branch
Due to the total audio and video bandwidth requirements for a Cisco TelePresence CTMS (which can be up to approximately 260 Mbps), it is important to consider its placement in the network. Within a campus deployment, placement of the CTMS within a logical data center LAN segment may be desirable due to the availability of bandwidth, an uninterruptible power supply, as well as ease of monitoring. The downside is that all multipoint TelePresence traffic must be backhauled into and out of the logical data center LAN segment. The data center design may need to be adjusted to accommodate the necessary increase in traffic.
An alternative is to locate the CTMS at the access layer, towards the logical WAN edge of the campus. Note that some customers may not have an access layer switch at this location. This type of placement may minimize the amount of traffic which is backhauled through the campus LAN, however this is dependent upon the location of the CTS endpoints. If the majority of the CTS endpoints within the multipoint TelePresence deployment are remote to the campus location, this design may provide some benefit. If the majority of the CTS endpoints are within the campus, this design may provide little benefit. The downside to this placement is that an uninterruptible power supply may not be available, depending upon the physical location of WAN network devices within the campus deployment. However, in many campus network deployments, the WAN routers are physically located within a data center itself, although not logically on a data center LAN segment.
A third alternative is to simply locate the CTMS at the access layer within the campus network. This type of placement minimizes the amount of unnecessary traffic to the logical WAN edge and the logical data center if the majority of CTS endpoints are within the campus. The downside is that an uninterruptible power supply for the CTMS is less likely to be available at the access layer. Figure 11-14 shows the three campus placement alternatives.
Figure 11-14 Possible CTMS Locations within the Campus
Under some circumstances, it may be necessary to deploy a CTMS at a branch location. However, due to limited bandwidth of branch locations, this design is not highly recommended. When deploying at a branch, it is recommended that the CTMS be deployed at the distribution layer of any hierarchical LAN configuration, as shown in Figure 11-15.
Figure 11-15 Possible CTMS Locations Within the Branch
This minimizes the amount of unnecessary traffic backhauled through the branch LAN network. An alternative is to place the CTMS at the access layer if no available LAN ports exist at the distribution layer.
Next, it should be noted that the CTMS currently supports a single 1 Gigabit Ethernet connection. Resilient connections to dual LAN switches are currently not supported. Further, since the CTMS is capable of generating traffic loads in excess of 100 Mbps, it is not recommended to place the CTMS on a 100 Mbps Ethernet LAN port.
Finally, both CTS endpoints and their associated IP phones transmit CDP packets to associated network devices (switches and routers). CDP can be used to extend trust to the device connected to the LAN switch port and automatically place CTS endpoints within a voice VLAN. If the CTS endpoints within a network deployment are located within a separate voice VLAN, placement of the CTMS within the voice VLAN maintains consistency of the overall TelePresence deployment from a traffic isolation and QoS viewpoint.
Network Requirements
Maintaining the required SLAs for Cisco TelePresence can be challenging when multipoint is added to the network. SLA requirements defined for point-to-point TelePresence in Chapter 4, "Quality of Service Design for TelePresence" and duplicated in Table 11-2 remain the same and should not be affected by the addition of multipoint. However, providing acceptable latency for multipoint meetings can be a challenge for geographically dispersed deployments. The three main network considerations that must be weighed when deploying multipoint capabilities for Cisco TelePresence are bandwidth, latency, and traffic bursts. Latency and bandwidth are discussed in this section. Estimating Burst Sizes within Multipoint TelePresence Calls thoroughly discusses bursts within a multipoint TelePresence design. How and where the CTMS is deployed on the network directly affects latency for multipoint meetings and bandwidth patterns on the network. Deploying a multipoint device in the wrong location, physical or geographical, may cause an undesirable meeting experience and directly affect network performance.
Table 11-2 Cisco TelePresence SLA Requirements
Metric | Target | Threshold 1 (Warning) | Threshold 2 (Call Drop) | Enterprise Component | Service Provider Component
Latency | 150 ms | 200 ms | 400 ms¹ | 20% | 80%
Jitter | 10 ms | 20 ms | 40 ms | 50% | 50%
Loss | 0.05% | 0.10% | 0.20% | 50% | 50%
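One way to read the Enterprise Component and Service Provider Component columns of Table 11-2 is as the share of each target budgeted to the enterprise network and to the service provider network. The sketch below applies that reading to the Target column. The interpretation, the dictionary keys, and the function name `split_budget` are assumptions for illustration.

```python
# metric: (target, enterprise share, service provider share), from Table 11-2
SLA_TARGETS = {
    "latency_ms": (150.0, 0.20, 0.80),
    "jitter_ms": (10.0, 0.50, 0.50),
    "loss_pct": (0.05, 0.50, 0.50),
}


def split_budget(target, enterprise_share, provider_share):
    """Split an end-to-end target into enterprise and service provider parts."""
    if abs(enterprise_share + provider_share - 1.0) > 1e-9:
        raise ValueError("shares must add up to 100%")
    return target * enterprise_share, target * provider_share


if __name__ == "__main__":
    for metric, (target, ent, sp) in SLA_TARGETS.items():
        ent_part, sp_part = split_budget(target, ent, sp)
        print(f"{metric}: enterprise {ent_part:g}, service provider {sp_part:g}")
```

Under this reading, the 150 ms latency target leaves about 30 ms for the enterprise network and 120 ms for the service provider, while jitter and loss are split evenly.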
Latency
One of the key differentiators for Cisco TelePresence is the ability to maintain extremely low latency while providing high quality 1080p video and spatial audio. Excessive latency in any Cisco TelePresence meeting will degrade the "in-person" experience. Latency becomes an even bigger issue with multipoint, since all CTS systems dial into a CTMS that may not be located in the same geographic location as the CTS endpoints. Due to the nature of multipoint, two CTS endpoints that provide very low latency in a point-to-point meeting may have considerably higher latency in a multipoint meeting. Inserting any multipoint device in the media path of a Cisco TelePresence call introduces additional latency. However, proper placement of the CTMS helps minimize latency and preserve the Cisco TelePresence experience.
A Cisco TelePresence network should always be designed to target one-way, end-to-end network latency of less than 150 ms. However, in some cases this is not possible due to long distances between international sites. Therefore, the upper limit allowed for one-way, end-to-end network latency is 200 ms. Anything above 200 ms causes the message "Experiencing Network Delay" to be displayed on the Cisco TelePresence endpoint, degrading the user experience. Therefore, multipoint deployments should provide one-way, end-to-end latency below 200 ms in all cases.
Figure 11-16 illustrates a three site multipoint deployment with the CTMS located in the hub site.
Figure 11-16 Multipoint Network Latency Example
As mentioned above, one way, network only latency must stay below 200ms to maintain the TelePresence experience. In Figure 11-16, the hub site is chosen to deploy the CTMS. Using the latency matrix in the diagram, the highest latency for any multipoint meeting is between Site #2 and Site #3. To calculate the "highest" latency for a multipoint deployment, take the two sites with the highest latency between themselves and the CTMS and add 10ms for CTMS switching delay. The worst case latency in Figure 11-16 is calculated as:
Site #2 - Hub 75ms + Site#3 - Hub 40ms + CTMS 10ms = 125ms
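The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a Cisco tool: the function names and the "within target / acceptable / exceeds limit" labels are hypothetical, and only the Site #2 and Site #3 latencies quoted above are used because the remaining values from Figure 11-16 are not reproduced in the text.

```python
CTMS_SWITCHING_DELAY_MS = 10   # CTMS switching delay quoted in the text
TARGET_MS = 150                # design target for one-way network latency
LIMIT_MS = 200                 # upper limit for one-way network latency


def worst_case_latency_ms(site_to_ctms_ms):
    """Sum the two largest site-to-CTMS latencies and add the CTMS delay."""
    if len(site_to_ctms_ms) < 2:
        raise ValueError("at least two sites are needed for a multipoint path")
    top_two = sorted(site_to_ctms_ms.values(), reverse=True)[:2]
    return sum(top_two) + CTMS_SWITCHING_DELAY_MS


def classify(latency_ms):
    """Compare a latency against the 150 ms target and the 200 ms limit."""
    if latency_ms < TARGET_MS:
        return "within target"
    if latency_ms < LIMIT_MS:
        return "acceptable"
    return "exceeds limit"


# Latencies to the hub-site CTMS taken from the worked example above.
latencies = {"Site #2": 75, "Site #3": 40}
worst = worst_case_latency_ms(latencies)
print(worst, "ms:", classify(worst))
```

For distributed designs, the same function can be run once per candidate CTMS location, using the latency from every CTS site to that CTMS.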
As previously noted, the current CTMS implementation (software version 1.1) does not support chaining/cascading for scalability. Therefore, the latency examples above also apply to distributed multipoint designs. Because any CTS endpoint could potentially utilize any CTMS within the network as part of a multipoint call, the design engineer should determine the maximum possible end-to-end latency for any combination of CTS endpoints utilizing any CTMS within the network.
Bandwidth
Sufficient bandwidth must also be provisioned to the site which houses the CTMS to support the additional traffic required for multipoint meetings, as well as the traffic required for point-to-point meetings.
Centralized Multipoint Designs
Figure 11-17 highlights the bandwidth requirements of centralized multipoint designs by extending the simple multisite example shown in Figure 11-9 to include the bandwidth requirements for TelePresence for each site.
Figure 11-17 Centralized Multipoint Design Bandwidth Example
Calculating the amount of bandwidth required at the Hub Site is fairly straightforward. In the example above, the circuit to the Hub Site must have sufficient bandwidth to support three CTS-3000 systems at 1080p even though the Hub Site only houses a single CTS-3000. This is due to the CTMS being located at the Hub Site. Audio and video traffic from Site #1, Site #2, and Site #3 must traverse the circuit during multipoint meetings. Note that the LAN infrastructure within the Hub Site must also be designed to support the cumulative bandwidth of all four TelePresence CTS endpoints.
A general rule of thumb for 1080p configurations is to simply estimate 15 Mbps per CTS-3000 or CTS-3200 and 5.5 Mbps per CTS-1000 or CTS-500 with low-speed auxiliary video input.
Note
Audio and Video Flows In A Multipoint TelePresence Design presents a thorough discussion of how to calculate the audio and video flows to and from a CTMS in a multipoint meeting, since traffic flows are asymmetric in a multipoint call.
Provisioning the correct amount of bandwidth on the LAN and WAN is essential for a successful multipoint deployment. As illustrated in Figure 11-17, the maximum potential bandwidth for each CTS-3000 (15Mbps) is provisioned to ensure there is no packet loss due to insufficient bandwidth. However, the actual bandwidth used during a multipoint meeting typically averages less (10 - 12 Mbps with six people sitting at the table participating in the meeting) than the provisioned bandwidth.
The design engineer should also keep in mind that the amount of bandwidth provisioned to the site which houses the CTMS must be increased for point-to-point meetings which occur at the same time as multipoint meetings.
Distributed Multipoint Designs
As illustrated above, provisioning bandwidth for TelePresence deployments with a single CTMS and a limited number of CTS systems is fairly straightforward. However, in larger deployments with multiple CTMS devices and a mix of CTS-3000s, CTS-3200s, CTS-1000s, and CTS-500s, bandwidth provisioning becomes more difficult. Several methods of provisioning bandwidth at the CTMS site are explored within the next sections.
Maximum Bandwidth Per CTMS Approach
Figure 11-18 provides an example of a large TelePresence network with distributed CTMS devices. The CTMS devices are deployed in regions around the world. In this example, the network design engineer must determine how much bandwidth needs to be provisioned to each CTMS site to handle not only multipoint calls involving CTS endpoints within the region, but also CTS endpoints which may be across the QoS-enabled WAN as well.
Figure 11-18 Bandwidth Provisioning Based on Maximum Configured CTMS Capacity
The approach shown in Figure 11-18 is to simply provision sufficient bandwidth to accommodate the maximum amount of traffic from the CTMS at each regional location, assuming each CTMS is configured for its maximum of 48 segments (as of software version 1.1) or configured for less than the maximum segments. This method may be beneficial to customer deployments in which the number of CTS table segments (1 segment per CTS-1000 or CTS-500 and 3 segments per CTS-3000 or CTS-3200) deployed throughout the network greatly exceeds the maximum capacity of a single CTMS. In this type of large deployment, the customer may have no control of which sites have multipoint meetings with each other and what type of CTS units are involved in the meeting.
A rough estimate for calculating the required bandwidth is to simply multiply 5.5 Mbps per CTS-1000 by the maximum number of segments (also referred to as table segments) supported by the CTMS:
5.5 Mbps x 48 table segments = 264 Mbps
For example, if every CTS unit in the North America, South America, and Europe regions shown in Figure 11-18 were configured for a single multipoint call, the CTMS unit selected for the call and the bandwidth provisioned for TelePresence to the regional site which housed the CTMS would need to handle 48 segments.
Keep in mind that additional bandwidth capacity is required to handle additional point-to-point calls to and from the regional site as well if there were more CTS units at the regional site not involved in the multipoint call.
The advantage of this method of bandwidth provisioning is that as networks grow, the network design engineer does not constantly have to increase bandwidth to each site. The downside is that for many customers, provisioning that much bandwidth is unfeasible from a cost perspective.
Bandwidth Allocation Based on Meeting Patterns
A second method of provisioning bandwidth is based on historical meeting patterns and knowledge of the specific CTS units within the network. This method relies on limiting the Maximum Segments defined within the CTMS to be at or below the bandwidth allocated for TelePresence meetings from that regional location. This method may be beneficial to customer deployments in which the total number of CTS table segments may not exceed the maximum capacity of a single CTMS.
An example of this type of distributed multipoint TelePresence design is shown in Figure 11-19.
Figure 11-19 Bandwidth Provisioning Based on Historical Meeting Patterns
In the example shown in Figure 11-19, based on WAN bandwidth and meeting patterns, the North America CTMS device is configured with a maximum of 36 table segments. Likewise, the Europe CTMS device is configured with a maximum of 17 table segments. The remaining multipoint devices are configured to support a limited number of table segments.
When provisioning bandwidth using this method, it may be beneficial to base the bandwidth calculations for each site on the type of CTS system and video resolution supported, in order to more accurately assess the bandwidth requirement. On a per-table-segment basis, a CTS-1000 requires more bandwidth than a CTS-3000. A rough bandwidth guideline is to allocate 5.5 Mbps for a CTS-1000 system (one table segment) and 15 Mbps (three table segments at 5 Mbps each) for a CTS-3000 system. Table 11-3 provides the bandwidth required per system for each resolution and motion handling setting.
Table 11-3 Bandwidth Provisioning
Resolution | 1080p | 1080p | 1080p | 720p | 720p | 720p
Motion Handling | Best | Better | Good | Best | Better | Good
CTS-1000 Provisioned Bandwidth for Multipoint | 5.5 Mbps | 4.9 Mbps | 4.4 Mbps | 4.4 Mbps | 3.2 Mbps | 2.1 Mbps
CTS-3000 Provisioned Bandwidth for Multipoint | 15 Mbps | 12.9 Mbps | 11.3 Mbps | 11.3 Mbps | 7.8 Mbps | 4.4 Mbps
Below is the calculation for each multipoint device in Figure 11-19. The calculations are based on all CTS endpoints running at 1080p best resolution. In environments with mixed resolutions, it is recommended that bandwidth be provisioned based on the highest resolution only.
North America region:
9 - CTS-1000 @ 5.5Mbps = 49.5 Mbps 9 table segments
9 - CTS-3000 @ 15Mbps = 135 Mbps 27 table segments
Multipoint bandwidth = 184.5 Mbps 36 table segments
South America and Asia/Pacific regions:
8 - CTS-1000 @ 5.5Mbps = 44 Mbps 8 table segments
2 - CTS-3000 @ 15Mbps = 30 Mbps 6 table segments
Multipoint bandwidth = 74 Mbps 14 table segments
Europe region:
8 - CTS-1000 @ 5.5Mbps = 44 Mbps 8 table segments
3 - CTS-3000 @ 15Mbps = 45 Mbps 9 table segments
Multipoint bandwidth = 89 Mbps 17 table segments
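The per-region arithmetic above can be reproduced with a short Python sketch. It assumes all endpoints run at 1080p with best motion handling (the 5.5 Mbps and 15 Mbps values from Table 11-3); the dictionary and function names are illustrative only.

```python
# Provisioned Mbps per system at 1080p, best motion handling (Table 11-3).
MBPS_PER_SYSTEM = {"CTS-1000": 5.5, "CTS-3000": 15.0}
# Table segments consumed on the CTMS by each system type.
SEGMENTS_PER_SYSTEM = {"CTS-1000": 1, "CTS-3000": 3}


def region_totals(counts):
    """Return (multipoint bandwidth in Mbps, table segments) for one region."""
    mbps = sum(MBPS_PER_SYSTEM[model] * qty for model, qty in counts.items())
    segments = sum(SEGMENTS_PER_SYSTEM[model] * qty for model, qty in counts.items())
    return mbps, segments


# Endpoint counts per region as listed for Figure 11-19.
regions = {
    "North America": {"CTS-1000": 9, "CTS-3000": 9},
    "South America and Asia/Pacific": {"CTS-1000": 8, "CTS-3000": 2},
    "Europe": {"CTS-1000": 8, "CTS-3000": 3},
}

for name, counts in regions.items():
    mbps, segments = region_totals(counts)
    print(f"{name}: {mbps} Mbps, {segments} table segments")
```

For other resolutions or motion handling settings, the values in MBPS_PER_SYSTEM would be replaced with the corresponding column of Table 11-3.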
Note
The calculations above are only for multipoint calls. Additional WAN bandwidth must be provisioned for CTS systems participating in point-to-point calls located at sites with CTMS devices.
Estimating Burst Sizes within Multipoint TelePresence Calls
This section presents a brief discussion of the causes of bursts within a multipoint TelePresence call and a method of estimating the size of those bursts so that the network can be provisioned accordingly.
Causes of Bursts within Multipoint TelePresence Calls
In multipoint TelePresence calls, bursts are generated as a result of one of the following events.
• Whenever a CTS endpoint joins a multipoint call
If the call is configured with the Video Announce feature enabled, when a new CTS endpoint joins the call, it becomes the active site. This causes an I-frame to be generated by the new CTS endpoint and replicated to every other CTS endpoint by the CTMS. In addition, the last active site sends an I-frame to the CTMS which is replicated and sent to the new CTS endpoint. Note that for CTS endpoints with multiple screens, multiple I-frames may be generated.
If the call is configured with the Video Announce feature disabled, the new CTS endpoint does not become the active site when joining the call. However, it needs to receive an I-frame to begin displaying video. Therefore, the active site sends an I-frame which is replicated by the CTMS to every CTS endpoint in the call. Note again that for CTS endpoints with multiple screens, multiple I-frames may still be generated.
• Normal transitioning of the active site or segment from one CTS endpoint to another CTS endpoint
This causes an I-frame to be generated by the new active site or segment and replicated by the CTMS to all of the other CTS endpoints in the multipoint call.
• One or more CTS endpoints or codecs report loss in the received video
This causes the active site or segment to generate an I-frame to resynchronize all CTS endpoints. The I-frame is replicated by the CTMS and sent to all CTS endpoints in the multipoint call.
• Periodic synchronization of the CTS endpoints by the active site
Each active site or segment periodically sends out a new I-frame to synchronize the CTS endpoints. This occurs approximately every 5 minutes with the current TelePresence solution. The I-frame is replicated by the CTMS and sent to all CTS endpoints in the multipoint call.
• Normal transitioning of the video input from a device connected to the auxiliary input of one of the CTS endpoints
Only one CTS endpoint at a time can function as a "presenter" within a multipoint meeting through the use of the Auto-Collaborate feature. Whenever the presenting CTS endpoint changes the content of the auxiliary video input, such as transitioning a PowerPoint slide, the burst of content is replicated by the CTMS to all of the other CTS endpoints within the multipoint call.
• Normal replication of the P-frame video by the CTMS
The active site or segment sends a new video frame (P-frame) every 33 ms when not sending an I-frame. This is replicated by the CTMS and sent to all of the other CTS endpoints within the multipoint call. Likewise, the last active sites or segments send P-frames every 33 ms.
Bursts Due to I-Frame Replication
The size of I-frames generated during speaker transitions, periodic synchronization of the video, synchronization of the video due to packet loss, and when new CTS endpoints join the multipoint call is highly variable. However, under normal lighting and background conditions, ESE testing has shown that the maximum size of I-frames generated by these events is approximately 64 Kbytes.
Note
ESE testing has also indicated that burst sizes may exceed 64 Kbytes if lighting and background conditions do not comply with Cisco recommendations. It is therefore critical to follow Cisco documented room design recommendations in order to ensure proper functioning of the TelePresence deployment over the network.
This burst is replicated on a packet-by-packet basis by the CTMS to every endpoint in the multipoint call, as shown in Figure 11-20.
Figure 11-20 Burst Size Due to I-Frame Replication by the CTMS
It can be seen that as the number of CTS endpoints in the call increases, the size of the replicated I-frame burst also increases. In the example above the size of the I-frame burst sent by the CTMS is approximately 3 x 64 Kbytes = 192 Kbytes. This is sent within a single frame interval of 33 ms. Switch and router buffers need to be large enough to accommodate these bursts. Also, burst parameters configured within policers and/or shapers on any WAN circuits must also be large enough to accommodate the replicated I-frames.
Location of the CTS Endpoints
The location of the CTS endpoints within the multipoint call influences the size of the replicated I-frame burst across any given network device or WAN circuit. Figure 11-21 shows the same four-site multipoint call, with one of the CTS endpoints moved back to the head-end location.
Figure 11-21 Burst Size Due to Location of CTS Endpoints
As can be seen from Figure 11-21, the switch to which the CTMS is connected must still have sufficient buffer capacity on the ingress port to handle the I-frame replicated to three CTS endpoints. However, only two CTS endpoints are now located across the WAN. The size of the I-frame burst sent across the WAN is now approximately 2 x 64 Kbytes = 128 Kbytes. Again, this is sent within a single frame interval of 33 ms. Therefore, the egress port of the switch which serves as the uplink between the switch and the router, as well as the router LAN ingress port, must have sufficient buffer capacity to handle the I-frame replicated to two CTS endpoints. Any policer and/or shaper parameters configured on the router or within the WAN must now be able to accommodate the I-frame replicated to two CTS endpoints.
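The effect of endpoint location can be illustrated with a small Python sketch that computes the burst seen by a given network device from the number of receiving endpoints behind it. The function names are hypothetical. The rate helper is an added illustration rather than a figure from ESE testing: it simply divides the burst by one 33 ms frame interval, treating 1 Kbyte as 1000 bytes to match the arithmetic used elsewhere in this section.

```python
I_FRAME_KBYTES = 64        # maximum I-frame size observed in ESE testing
FRAME_INTERVAL_S = 0.033   # one camera video frame interval (33 ms)


def burst_kbytes(receiving_endpoints):
    """Replicated I-frame burst crossing a device that serves this many receivers."""
    return receiving_endpoints * I_FRAME_KBYTES


def burst_rate_mbps(kbytes, interval_s=FRAME_INTERVAL_S):
    """Rate needed to forward the whole burst within one interval (illustrative)."""
    return kbytes * 1000 * 8 / interval_s / 1e6


lan_burst = burst_kbytes(3)   # switch port facing the CTMS in Figure 11-21
wan_burst = burst_kbytes(2)   # WAN circuit toward the two remote endpoints

print(f"LAN: {lan_burst} Kbytes, about {burst_rate_mbps(lan_burst):.1f} Mbps over 33 ms")
print(f"WAN: {wan_burst} Kbytes, about {burst_rate_mbps(wan_burst):.1f} Mbps over 33 ms")
```

Running the same calculation for each switch, router, and WAN circuit in the path gives a per-device view of the burst that buffers, policers, and shapers must absorb.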
Type of CTS Endpoints
The type of CTS endpoints and whether they are configured for room switching or speaker switching within the multipoint call also determines the size of the bursts generated due to I-frame replication by the CTMS.
Bursts as a Result of Room Switching
Figure 11-22 shows the same four-site multipoint call as shown in Figure 11-20. However, this time the CTS endpoints are CTS-3000 units instead of CTS-1000 units. Further, the multipoint call is configured for room switching.
Figure 11-22 Burst Size Due to CTS-3000 Room Switching
With room switching enabled, whenever any one of the participants at a CTS-3000 endpoint talks and becomes the active speaker, video from all three screens from the CTS-3000 is transmitted to every other CTS-3000. This means that three I-frames are generated, one for each screen position (left, center, and right). These are each replicated by the CTMS to each of the other rooms in the multipoint call. In the example above, the size of the I-frame burst sent across the WAN is now approximately 3 sites x 64 Kbytes x 3 screens = 576 Kbytes.
CTS-3000s do slightly stagger the I-frame generation since the codecs each handle video independently. Therefore, although each I-frame is generated within a single 33 ms interval, the total duration of the sum of the three I-frames may extend across more than 33 ms. However, any policer and/or shaper parameters configured on routers or within the WAN which have a time constant (otherwise known as the refresh interval) greater than one frame interval (Tc > 33 ms), may need to be configured to handle the entire amount of I-frame bursts sent by the CTMS due to room switching.
Bursts as a Result of Speaker Switching
Figure 11-23 shows the same four-site multipoint call again. However, this time the CTS-3000 endpoints are configured for speaker switching.
Figure 11-23 Burst Size Due to CTS-3000 Speaker Switching
With speaker switching enabled, whenever any one of the participants at a CTS-3000 site talks and becomes the active speaker, only the video from the one segment is transmitted to every other participant. This behavior is similar to CTS-1000 endpoints as shown in Figure 11-20. In the example above, only the right screen position sends an I-frame. The size of the I-frame bursts replicated by the CTMS and sent across the WAN is again 3 sites x 64 Kbytes x 1 screen = 192 Kbytes. Combinations of CTS-1000s and CTS-3000s or CTS-3200s in a single call configured for speaker switching behave in a similar manner. Therefore room switching produces more burstiness across the network, although the I-frames may be slightly staggered by the CTS-3000 and CTS-3200 endpoints.
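The difference between the two switching modes can be summarized in a brief Python sketch. The function name and mode strings are illustrative, and the three receiving sites correspond to the four-site call in Figures 11-22 and 11-23.

```python
I_FRAME_KBYTES = 64   # maximum I-frame size observed in ESE testing

# Number of screens whose I-frames are replicated for each switching mode.
SCREENS_PER_MODE = {"room": 3, "speaker": 1}


def switching_burst_kbytes(receiving_sites, mode):
    """I-frame burst replicated by the CTMS when a CTS-3000 becomes active."""
    return receiving_sites * SCREENS_PER_MODE[mode] * I_FRAME_KBYTES


print("room switching:   ", switching_burst_kbytes(3, "room"), "Kbytes")
print("speaker switching:", switching_burst_kbytes(3, "speaker"), "Kbytes")
```

Because CTS-3000s stagger their I-frames slightly, the room switching figure is an upper bound on what arrives within a single 33 ms interval.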
Calculating Burst Sizes Due to I-Frame Replication
Calls with CTS-1000s Only
The maximum size of the burst replicated by the CTMS due to I-frames generated during an event such as a normal speaker transition, periodic synchronization, or synchronization due to loss can be estimated as follows for calls with CTS-1000s only:
CTMS replicated burst size = (N -1) * 64 Kbytes
Where N is the number of CTS-1000s in the multipoint call.
Note that during these events, the number of I-frames replicated by the CTMS is one less than the number of CTS-1000 endpoints within the multipoint call.
However, a worst case scenario for a burst due to I-frame replication by the CTMS occurs when 48 CTS-1000s are in a single multipoint call, the call is configured with the Video Announce feature, and the last CTS-1000 endpoint joins the call. An example of this is shown in Figure 11-24.
Figure 11-24 Maximum Burst Size Generated by the CTMS Due to I-Frame Replication
In such a configuration the size of the I-frame burst replicated by the CTMS is:
CTMS replicated burst size = N * 64 Kbytes
Where N is the number of CTS-1000s in the multipoint call.
This can be calculated from the example above to be:
48 CTS-1000s * 64 Kbytes = 3.072 Mbytes
When the CTS-1000 #48 joins the call and becomes the active site, it generates an I-frame which is replicated by the CTMS to the 47 other sites as shown in Figure 11-24. The last active site, CTS-1000 #1 in the example above, also generates a new I-frame which is replicated by the CTMS and sent to CTS-1000 #48 for it to display on its screen. Therefore, in this situation, the total number of I-frames replicated by the CTMS is equal to the number of endpoints in the call. However, it should be noted that the I-frame generated by the last active speaker is not time synchronized with the rest of the I-frames generated by the new CTS-1000 joining the call. In other words, this I-frame may not occur within the same 33 ms frame window as the other 47 I-frames. However, any policer and/or shaper parameters configured on routers or within the WAN, which have a refresh interval greater than one frame interval (Tc > 33 ms), may need to be configured to handle the entire amount of I-frame bursts sent by the CTMS in this scenario.
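Both CTS-1000 formulas can be captured in one small Python function. This is a sketch under the assumptions stated above (64 Kbytes per I-frame, one I-frame per CTS-1000); the function and parameter names are hypothetical.

```python
I_FRAME_KBYTES = 64   # maximum I-frame size observed in ESE testing


def cts1000_burst_kbytes(n, join_with_video_announce=False):
    """Estimate the replicated I-frame burst for a call with n CTS-1000s.

    Normal events (speaker transition, periodic or loss-driven sync)
    replicate the I-frame to n - 1 endpoints. When the last endpoint joins
    with Video Announce enabled, the last active site also sends an I-frame
    to the new endpoint, giving n replicated I-frames in total.
    """
    if n < 2:
        raise ValueError("a multipoint call needs at least two endpoints")
    copies = n if join_with_video_announce else n - 1
    return copies * I_FRAME_KBYTES


print(cts1000_burst_kbytes(4))                                  # four-site call
print(cts1000_burst_kbytes(48))                                 # normal event, full CTMS
print(cts1000_burst_kbytes(48, join_with_video_announce=True))  # worst case
```

Since the extra I-frame in the join case is not time synchronized with the others, the smaller (N - 1) value may be the more relevant one for devices whose burst interval is at most one frame interval.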
Calls with CTS-3000s or CTS-3200s Only
The maximum size of the burst replicated by the CTMS due to I-frames generated during an event such as a normal speaker transition, periodic synchronization, or synchronization due to loss can be estimated as follows for calls with CTS-3000s or CTS-3200s only:
CTMS replicated burst size = 3 screens * (M -1) * 64 Kbytes
Where M is the number of CTS-3000s or CTS-3200s in the multipoint call.
Keep in mind that since CTS-3000 and CTS-3200s stagger their I-frame generation slightly, this burst may occur over more than one 33 ms interval.
As in the previous section, a worst case scenario burst due to I-frame replication by the CTMS occurs when 16 CTS-3000s or CTS-3200s are in a single multipoint call, the call is configured for room switching, the Video Announce feature is enabled, and the last CTS-3000 or CTS-3200 endpoint joins the call. In this case, the size of the replicated I-frame burst by the CTMS is given by:
3 screens * M * 64 Kbytes
Where M is the number of CTS-3000s or CTS-3200s in the multipoint call.
Therefore, in this situation, the total number of I-frames replicated by the CTMS is equal to three times the number of endpoints in the call (because the CTS-3000s and CTS-3200s have three screens). For the specific case of 16 CTS-3000s this results in the following:
3 * (16 CTS-3000s) * 64 Kbytes = 3.072 Mbytes
However, it should be noted that the I-frame generated by the last active site is again not time synchronized with the rest of the I-frames generated by the new CTS-3000 joining the call.
Calls with Mixed CTS-1000s and CTS-3000s or CTS-3200s
The maximum I-frame burst resulting from speaker transitions, periodic synchronization, or synchronization due to loss during mixed CTS-1000 and CTS-3000 or CTS-3200 calls occurs when the CTS-3000s or CTS-3200s are configured for room switching and a CTS-3000 or CTS-3200 becomes the active site. The maximum size of the burst replicated by the CTMS due to I-frames generated during such an event can be estimated as:
CTMS replicated burst size = (3 * (M-1) + N) * 64 Kbytes
Where N is the number of CTS-1000s in the multipoint call and M is the number of CTS-3000s or CTS-3200s in the multipoint call.
Again, keep in mind that since CTS-3000s and CTS-3200s stagger their I-frame generation slightly, this burst may occur over more than one 33 ms interval.
As with the previous two sections, a worst case burst due to I-frame replication by the CTMS occurs when the call is configured for room switching, the Video Announce feature is enabled, and the last CTS-3000 or CTS-3200 endpoint joins the call. In this case, the size of the replicated I-frame burst by the CTMS will be given by:
((3 * M) + N) * 64 Kbytes
Where M is the number of CTS-3000s or CTS-3200s and N is the number of CTS-1000s in the multipoint call.
For example, if a single multipoint call had six CTS-3000s and 30 CTS-1000s, the worst case burst due to I-frame replication would occur if a CTS-3000 was the last to join the call and would be as follows:
(3 * (6 CTS-3000s) + 30 CTS-1000s) * 64 Kbytes = 3.072 Mbytes
However, it should be noted that the I-frame generated by the last active speaker is again not time synchronized with the rest of the I-frames generated by the new CTS-3000 joining the call. Also the size of the burst is different if the last device to join the call is a CTS-1000 instead of a CTS-3000.
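The mixed-call formulas generalize the two earlier cases, as the following Python sketch shows. It assumes room switching with a CTS-3000 or CTS-3200 as the active (or joining) site, and the function and parameter names are illustrative only. Setting n to zero gives the CTS-3000/CTS-3200-only formulas.

```python
I_FRAME_KBYTES = 64   # maximum I-frame size observed in ESE testing


def mixed_burst_kbytes(m, n, join_with_video_announce=False):
    """Estimate the replicated I-frame burst for a mixed multipoint call.

    m: number of CTS-3000s or CTS-3200s (three screens each)
    n: number of CTS-1000s (one screen each)
    Assumes room switching with a three-screen endpoint as the active site.
    """
    if m < 1:
        raise ValueError("this estimate assumes a CTS-3000/3200 is the active site")
    # Normal events skip the active site itself; the join case with
    # Video Announce adds the I-frames sent back to the new endpoint.
    three_screen_receivers = m if join_with_video_announce else m - 1
    return (3 * three_screen_receivers + n) * I_FRAME_KBYTES


print(mixed_burst_kbytes(6, 30))                                  # normal event
print(mixed_burst_kbytes(6, 30, join_with_video_announce=True))   # worst case
print(mixed_burst_kbytes(16, 0, join_with_video_announce=True))   # CTS-3000s only
```

All three worst-case examples in this section fill the 48 table segments of the CTMS, which is why each one evaluates to the same 3.072 Mbytes.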
Other Considerations
I-frame bursts due to speaker transitions and periodic synchronization of the video are normal recurring events within a multipoint call. The network should be designed to handle bursts generated by these events. An I-frame burst generated when the last CTS endpoint joins a call is typically a one-time event at the start of the meeting. Therefore, the design engineer has some discretion regarding whether to design the network to support the entire burst size generated by such an event, considering that the I-frame generated by the last active speaker is not time synchronized with the other I-frames. Also, the Video Announce feature can be disabled to minimize bursts at the beginning of calls.
Next, the design engineer should keep in mind that every network device may not see the entire burst replicated by the CTMS, depending upon the location of the endpoints on the network. The design engineer should determine the maximum number of I-frames which may traverse a particular network device in order to understand the shaper and policer parameters to configure.
Finally, the design engineer should keep in mind that the size of the I-frames replicated by the CTMS is highly variable. The value of 64 Kbytes represents a maximum size empirically observed by testing in the ESE lab. Actual sizes of I-frames generated by CTS endpoints may be smaller than 64 Kbytes. However, it is advisable to design the network to accommodate the maximum observed I-frame sizes. Failure to accommodate this could result in an I-frame "storm" which ultimately may degrade the video quality or cause the multipoint call to fail.
Bursts due to the Auxiliary Video Input
The size of bursts generated from the auxiliary video input is highly dependent upon the content being displayed. The current low-speed auxiliary video input is limited to approximately 500 Kbps without network overhead (approximately 577 Kbps with network overhead). The frame rate of the auxiliary video input is 5 frames per second.
Continuous motion inputs, such as a video clip or continuous animation on a PowerPoint slide being displayed through the auxiliary video input, tend to produce a relatively high bit rate close to 500 Kbps, with fairly uniform bursts of 3-4 Kbytes every 200 ms. On the other hand, PowerPoint slide transitions tend to produce an overall lower bit rate of around 200 Kbps, but much larger bursts. Bursts as high as 44 Kbytes within a single 200 ms frame interval have been observed. Packet sizes average around 1,100 bytes.
However, due to the upcoming release of high-speed auxiliary video which relies on a separate codec and operates at 4 Mbps with 30 frames per second as well, a recommended approach is to account for the auxiliary video burst in a similar fashion as another I-frame being generated by another camera video input, since it represents a worse case than the existing low-speed auxiliary video input. Therefore a value of 64 Kbytes is utilized for all calculations within this document.
Note
ESE has not yet tested high-speed auxiliary video input to confirm that bursts do not exceed 64 Kbytes. Some caution is needed until this can be verified.
As with I-frames from speaker transitions, auxiliary video bursts are also replicated on a packet-by-packet basis by the CTMS to every endpoint in the multipoint call, as shown in Figure 11-25.
Figure 11-25 Burst Size Due to Auxiliary Video Burst Replication by the CTMS
It can be seen that as the number of CTS endpoints in the call increases, the size of the replicated auxiliary video burst also increases. However, the type of CTS unit (CTS-1000, CTS-3000, or CTS-3200) does not matter, because each CTS endpoint only has one auxiliary video output.
Note
Simultaneous use of the low-speed and high-speed auxiliary video input in a single multipoint call is not supported.
Burst Estimation Due to Auxiliary Video Replication
The maximum size of the burst replicated by the CTMS due to auxiliary video replication can be estimated as:
CTMS replicated burst size = (M + N -1) * 64 Kbytes
Where M is the number of CTS-3000 or CTS-3200s and N is the number of CTS-1000s in the multipoint call.
Since one CTS endpoint serves as the presenter, auxiliary video is replicated to one less than the number of CTS endpoints in the multipoint call. In the example above, the size of the auxiliary video burst sent by the CTMS is approximately 3 x 64 Kbytes = 192 Kbytes. This is sent within a single auxiliary video frame interval. For low speed auxiliary video the frame interval is 200 ms, which is a different frame interval than the video from the CTS endpoint cameras. For high speed auxiliary video the frame interval is 33 ms, which is the same frame interval as the video from the CTS endpoint cameras.

Note
ESE testing has shown that low-speed auxiliary video content is replicated to every CTS endpoint in the multipoint call, regardless of whether the CTS endpoint is configured to support a projector output within CUCM or whether a projector is actually connected to the output of the CTS endpoint. Note that the behavior of high-speed auxiliary video input has not been tested by ESE currently.
The maximum auxiliary video burst from the CTMS occurs with 48 CTS-1000s in a single multipoint call. The maximum size of the burst generated during such an event can be estimated as:
CTMS replicated burst size = (48 - 1) * 64 Kbytes = 3.008 Mbytes
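The auxiliary video estimate can be expressed as a short Python sketch. It uses the 64 Kbyte design value adopted in this document for both low-speed and high-speed auxiliary video; the function names are hypothetical.

```python
AUX_BURST_KBYTES = 64   # design value used in this document for auxiliary video


def aux_burst_kbytes(m, n):
    """Auxiliary video burst replicated by the CTMS.

    m: number of CTS-3000s or CTS-3200s, n: number of CTS-1000s.
    The endpoint type does not matter; the presenter is excluded.
    """
    endpoints = m + n
    if endpoints < 2:
        raise ValueError("a multipoint call needs at least two endpoints")
    return (endpoints - 1) * AUX_BURST_KBYTES


def aux_frame_interval_ms(high_speed):
    """Frame interval over which the burst is sent (5 fps versus 30 fps)."""
    return 33 if high_speed else 200


print(aux_burst_kbytes(0, 4), "Kbytes every", aux_frame_interval_ms(False), "ms")
print(aux_burst_kbytes(0, 48), "Kbytes maximum with 48 CTS-1000s")
```

Keeping the frame interval alongside the burst size matters because the same 64 Kbyte burst is spread over 200 ms for low-speed input but over 33 ms for high-speed input.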
Other Considerations
Auxiliary video bursts due to PowerPoint slide transitions are normal re-occurring events within a multipoint call which utilizes the Auto-Collaborate feature. The network should be designed to handle bursts generated by these events.
As with I-frame bursts, the design engineer should keep in mind that every network device may not see the entire burst replicated by the CTMS, depending upon the location of the endpoints on the network. The design engineer should determine the maximum number of auxiliary video bursts which may traverse a particular network device in order to understand the shaper and policer parameters to configure.
Finally, the design engineer should keep in mind that the size of the auxiliary video bursts replicated by the CTMS is highly variable. A value of 44 Kbytes over a 200 ms frame interval has been confirmed from ESE testing of low-speed auxiliary video input using PowerPoint slides. However, with the upcoming release of high-speed auxiliary video, a value of 64 Kbytes (matching existing codec I-frame bursts) is recommended for design purposes. Note that high-speed auxiliary video operates over a 33 ms frame interval as well.
Normal P-Frame Video
When the codecs of a CTS endpoint are not sending I-frames to re-synchronize the video, they send P-frames every 33 ms. The size of the P-frames is variable and depends on how well motion estimation and compensation was able to compress the frame. The video from each camera connected to the CTS codec is constrained to approximately 4 Mbps without network overhead or approximately 4.616 Mbps with network overhead. Without any I-frames, this translates to a maximum of approximately 19 Kbytes of video sent every 33 ms, although typically video in a TelePresence meeting runs below this value. The P-Frame video is also replicated by the CTMS on a packet-by-packet basis as shown in Figure 11-26.
Figure 11-26 Burst Size Due to P-Frame Replication by the CTMS
As can be seen from Figure 11-26, the active site generates P-frames which are sent to the CTMS and replicated to the other CTS endpoints. However, since the active site also has to display video, the last active site generates P-frames which are replicated by the CTMS and sent to the active site. In the example above, the size of the P-frame bursts sent by the CTMS is approximately 4 x 19 Kbytes = 76 Kbytes every frame interval. Note however that this fourth P-frame is not time synchronized (does not happen exactly at the same time) with the rest of the P-frames replicated by the CTMS.
Location of the CTS Endpoints
As with I-frame bursts, the location of the CTS endpoints within the multipoint call influences the size of the replicated P-frame burst across a given network device or WAN circuit. In the example above, four P-frames are replicated by the CTMS (three from the active site and one from the last active site) and seen by the switch directly connected to the CTMS. However only three P-frames or 3 x 19 Kbytes = 57 Kbytes are sent across the WAN to the remote CTS endpoints every 33 ms.
Burst Estimation Due to P-Frame Replication
The type of CTS endpoints within the multipoint call affects how the P-frame bursts are generated, but not the size of the overall bursts. For meetings which include CTS-3000s or CTS-3200s in speaker switching mode, or combinations of CTS-3000s, CTS-3200s, and CTS-1000s, there may be multiple last active segments since each screen of each CTS-3000 or CTS-3200 needs to display video. For meetings which include only CTS-3000s or CTS-3200s in room switching mode, the active site sends three P-frames every 33 ms, one from each codec. Likewise, the last active site sends three P-frames every 33 ms, which are replicated by the CTMS and sent to the active site.
However, in all cases, the number of P-Frames replicated by the CTMS during a frame interval, regardless of the type of CTS endpoint and whether it is configured for room or speaker switching, is given by the following equation:
Number of P-Frames Replicated by the CTMS = Number of Video Segments in the Multipoint Call
Therefore, as the number of CTS endpoints in the call increases, the size of the replicated P-frame burst also increases.
The worst case P-frame burst generated by the CTMS per frame interval, when 48 video segments are in a single multipoint call, can be estimated as:
CTMS replicated burst size = 48 * 19 Kbytes = 912 Kbytes
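The 19 Kbyte figure and the resulting burst sizes can be checked with a short Python sketch. It derives the per-frame size from the 4.616 Mbps rate (with network overhead) and the 33 ms frame interval given earlier, treating 1 Kbyte as 1000 bytes; the function names are illustrative.

```python
VIDEO_RATE_MBPS = 4.616    # per-camera video rate including network overhead
FRAME_INTERVAL_S = 0.033   # one frame interval at 30 frames per second


def p_frame_kbytes(rate_mbps=VIDEO_RATE_MBPS, interval_s=FRAME_INTERVAL_S):
    """Maximum video sent per frame interval at the given rate, in Kbytes."""
    return rate_mbps * 1e6 * interval_s / 8 / 1000


def p_frame_burst_kbytes(video_segments, per_frame_kbytes=19):
    """P-frame burst replicated by the CTMS: one P-frame per video segment."""
    return video_segments * per_frame_kbytes


print(f"per P-frame: about {p_frame_kbytes():.1f} Kbytes")
print("4 segments: ", p_frame_burst_kbytes(4), "Kbytes")
print("48 segments:", p_frame_burst_kbytes(48), "Kbytes")
```

To size an individual device or WAN circuit, pass only the number of replicated P-frames that actually traverse it, such as the three in the WAN example above.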
Other Considerations
P-frame bursts are normal re-occurring events within a multipoint call. The network must be designed to handle bursts generated by these events. However, the design engineer should keep in mind that every network device may not see the entire burst replicated by the CTMS, depending upon the location of the endpoints on the network. The design engineer should determine the maximum number of P-frames which may traverse a particular network device in order to understand any shaper and policer parameters to configure.
Finally, the design engineer should keep in mind that the size of the P-frames replicated by the CTMS is highly variable as well. The value of 19 Kbytes represents an average size based upon the maximum video rate of 4 Mbps (4.616 Mbps with network overhead). Normal video is typically below this rate, suggesting smaller P-frame bursts. However, it is advisable to design the network to accommodate the P-frame bursts based on maximum video rate, since this represents a worst case. Failure to accommodate P-frame bursts causes the video quality to degrade and the multipoint call to fail.