In compliance recording, calls are configured to always be recorded.
For IP phone recording, all calls received by or initiated by designated phones are recorded.
Individual lines on individual phones are enabled for recording by configuring them with an appropriate recording profile in Unified Communications Manager.
For CUBE recording, all calls passing
through the CUBE that match particular dial peers (typically
selected by dialed number pattern) are recorded. MediaSense
itself does not control which calls are recorded (except to the limited extent described under Incoming call handling rules). Compliance recording differs from selective recording because in selective recording, the recording server determines which calls it will record. MediaSense itself does not support selective recording, but the effect can be achieved by deploying MediaSense in combination with certain partner applications.
Recording is accomplished by media forking, in which the phone or CUBE sends a copy of the incoming and outgoing media streams
to the MediaSense recording server. When a call originates or
terminates at a recording-enabled phone, Unified Communications Manager sends a pair of
SIP invitations to both the phone and the recording server. The
recording server prepares to receive a pair of real-time
transport protocol (RTP) streams from the phone. Similarly, when a
call passes through a recording-enabled CUBE, the CUBE device sends
a SIP invitation to the recording server and the recording server prepares to receive a pair of RTP streams from the CUBE.
This procedure has several implications:
Each recording session consists of two media streams (one for media flowing in each direction). These two streams are captured
separately on the recorder, though both streams (or tracks) end up on the same MediaSense recording server.
Most, but not all, Cisco IP phones support media forking. Those that do not support media forking cannot be used for phone-based recording.
Though the phones can fork copies of media, they cannot transcode. This means that whatever codec the phone negotiates during its initial call setup is the codec used in the recording. MediaSense supports a limited set of codecs; if the phone negotiates a codec that is not supported by MediaSense, the call is not recorded. The same is true for calls forked by CUBE.
The recording streams are set up only after the
phone's primary conversation is fully established, which could take
some time to complete. Therefore, there is a possibility of clipping
at the beginning of each call. Clipping is typically limited to
less than two seconds, but it can be affected by overall CUBE, Unified Communications Manager, and MediaSense load, as well as by network performance characteristics along the signaling link between CUBE or Unified Communications Manager and MediaSense. MediaSense carefully monitors this latency and raises alarms if it exceeds certain thresholds.
MediaSense does not initiate
compliance recording. It only receives SIP invitations from Unified Communications Manager
or CUBE and is not involved in deciding which calls do or do not get recorded. The IP phone configuration and the CUBE
dial peer configuration determine whether media should be recorded. In some cases, calls may be recorded more than once, with
neither CUBE, Unified Communications Manager, nor MediaSense being aware that it is happening.
This would be the case if, for example,
all contact center agent IP phones are configured for recording
and one agent calls another agent. It might also happen if a call
passes through a CUBE which is configured for recording and lands
at a phone which is also configured for recording, in which case two recordings of the same call are created. However, MediaSense stores enough metadata that a client can invoke a query to locate duplicate calls and selectively delete the extra copy.
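Once session metadata has been retrieved from the query API, the duplicate-detection step can be performed client-side. The following Python sketch illustrates the idea on session records already reduced to dictionaries; the participants, start, end, and sessionId field names are illustrative assumptions, not the actual MediaSense metadata schema.

```python
def find_duplicate_sessions(sessions):
    """Flag pairs of sessions with identical participant sets and
    overlapping time ranges -- likely duplicate recordings of one call."""
    duplicates = []
    for i, a in enumerate(sessions):
        for b in sessions[i + 1:]:
            same_parties = set(a["participants"]) == set(b["participants"])
            overlapping = a["start"] < b["end"] and b["start"] < a["end"]
            if same_parties and overlapping:
                duplicates.append((a["sessionId"], b["sessionId"]))
    return duplicates
```

A client could then selectively delete one recording from each flagged pair through the API.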
At this time, only audio streams can be forked by Cisco IP phones and CUBE. Compliance recording of video media is not supported; video recording is only available in the direct inbound and outbound (blogging) modes of recording. CUBE is capable of forking the audio streams of a video call and MediaSense can record those, but video-enabled Cisco IP phones do not offer this capability. MediaSense can record calls of up to eight hours in duration.
Conferences and transfers
MediaSense recordings are made up of one or more
sessions where each media forking session
contains two media streams—one for incoming and one for
the outgoing data. A simple call consisting of a
straightforward two-party conversation is represented entirely by a
single session. MediaSense uses metadata to
track which participants are recorded in which track of the
session, as well as when they entered and exited the conversation, though it cannot always do so when conferences are involved. When sessions include transfer and conference activities, MediaSense does its best to retain the related information in its metadata. If a recording gets divided into multiple sessions, metadata is also available to help client applications correlate those sessions with each other.
A multi-party conference is also represented by a single session with one stream in each direction, with the conference
bridge combining all but one of the parties into a single
MediaSense participant. There is metadata to identify that one of the streams
represents a conference bridge, but MediaSense does not receive the full list of parties on the conference bridge.
Transfers behave differently depending on whether the
call is forked from a Unified Communications Manager phone or from a CUBE. For Unified Communications Manager recordings, the forking phone anchors the recording.
Transfers that drop the forking phone terminate
the recording session but transfers that keep the forking
phone in the conversation do not.
With CUBE forking, the situation
is more symmetric. CUBE is an intermediary network element and
neither party is an anchor. Transfers on either side of
the device are usually accommodated within the same recording
session. (See Solution-level deployment models for more information.)
Hold and pause
Hold and pause are two concepts that sound similar, but they are not the same.
Hold (and resume) takes place as a result of a user pressing a key on
his or her phone. MediaSense is a passive observer.
Pause (and resume) takes place as a result of a client application
issuing a MediaSense API request to temporarily stop
recording while the conversation continues.
Hold behavior differs depending on which device is
forking media. In Unified Communications Manager deployments, one party places the call
on hold, blocking all media to or from that party's phone while the other phone typically receives music on hold (MOH). If the forking phone is
the one that invokes the hold operation, Unified Communications Manager terminates the recording session and creates a new recording
session once the call is resumed. Metadata fields allow client applications to gather together all of the sessions in a given conversation.
If the forking phone is not the one that invokes the
hold operation, the recording session continues without a
break and even includes the music on hold—if it is unicast
(multicast MOH does not get recorded).
For deployments where Unified Communications Manager phones are configured for
selective recording, there must be a CTI
(TAPI or JTAPI) client that proactively requests Unified Communications Manager to
begin recording any given call. The CTI client does not need to retrigger recording in the case of a hold and resume.
For CUBE deployments, hold and resume are implemented as direct
SIP operations and the SIP protocol has no direct concept
of hold and resume. Instead, these operations are implemented in
terms of media stream inactivity events. MediaSense captures
these events in its metadata and makes them available to application clients, but the recording session continues without interruption.
The Pause feature allows applications such as Customer
Relationship Management (CRM) systems or VoiceXML-driven IVR
systems to automatically suppress recording of sensitive
information based on the caller's position in a menu or scripted
interaction. Pause is invoked by a MediaSense API client to temporarily stop recording, and the subsequent playback simply skips over the paused segment.
MediaSense does store the information in its
metadata and makes it available to application clients.
Pause behaves identically for CUBE and Unified Communications Manager deployments.
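Because pause is driven entirely by an API client, it is straightforward to script. The sketch below issues a pause request in Python; the pauseRecording operation name appears in this document, but the base URL, port, path, request body shape, and cookie-based session authentication are all assumptions that should be checked against the MediaSense API reference.

```python
import json
import urllib.request

# Assumed values -- verify against your MediaSense API documentation.
MEDIASENSE_BASE = "https://mediasense.example.com:8440"            # hypothetical host/port
PAUSE_PATH = "/ora/controlService/control/pauseRecording"          # assumed path

def pause_request_body(session_id: str) -> bytes:
    """Build the JSON body naming the session to pause (field names assumed)."""
    return json.dumps({"requestParameters": {"sessionId": session_id}}).encode()

def pause_recording(session_id: str, jsessionid: str) -> int:
    """Send the pause request and return the HTTP status code."""
    req = urllib.request.Request(
        MEDIASENSE_BASE + PAUSE_PATH,
        data=pause_request_body(session_id),
        headers={
            "Content-Type": "application/json",
            "Cookie": f"JSESSIONID={jsessionid}",  # session auth is an assumption
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

A matching resumeRecording request (not shown) would restart media capture at the end of the sensitive segment.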
Direct inbound recording
In addition to compliance recording controlled by a CUBE or a Unified Communications Manager recording profile, recordings can be initiated by directly dialing a number associated with a MediaSense server configured for automatic recording. These recordings are not carried out through media forking technology and therefore are not limited to CUBE or Cisco IP phones, nor are they limited to audio media. This is how video blogging is accomplished.
Direct outbound recording
Using the MediaSense API, a client requests MediaSense to call a phone number. When the recipient
answers, the call is recorded similarly to the way it is recorded when a user dials the recording server in a direct inbound call. The client can be any device capable of issuing an HTTP request to MediaSense, such as a 'call me' button on a web page. Any phone, even a non-IP phone (like a home phone), can be recorded if its media is converted to IP using a supported codec. Supported IP video phones can also be recorded in this way.
Direct outbound recording is only supported if MediaSense
can reach the target phone number through a Unified Communications Manager system. In
CUBE-only deployments where Unified Communications Manager is not used for call
handling, direct outbound recording is not supported.
While a recording is in progress, the session can be monitored using a third-party streaming-media player or the built-in media player in MediaSense.
To monitor a call from a third-party streaming-media player, a client must specify a Real-Time Streaming Protocol (RTSP) URI; the client must be prepared to supply HTTP-BASIC credentials and to handle a 302 redirect. The client can obtain the URI either by querying the metadata or by capturing session events.
MediaSense provides an HTTP query API that allows suitably authenticated clients to search for
recorded sessions based on many criteria, including whether the recording is
active. Alternatively, a client may subscribe for session events and receive
MediaSense Symmetric Web Service (SWS) events whenever a recording is started
(among other conditions). In either case, the body passed to the client
includes a great deal of metadata about the recording, including the RTSP URI
to be used for streaming.
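A minimal sketch of that retrieval flow in Python. The query endpoint path, request body shape, and the response fields carrying the session list and RTSP URI are assumptions to be checked against the MediaSense API reference; the HTTP-BASIC credentials follow the requirement stated above, and urllib follows the 302 redirect automatically.

```python
import base64
import json
import urllib.request

def basic_auth_header(user: str, password: str) -> dict:
    """HTTP-BASIC credentials, as MediaSense requires."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

def get_rtsp_uri(session: dict) -> str:
    """Extract the streaming URI from one session record; the
    'urls'/'rtspUrl' field names are assumed, not confirmed."""
    return session["urls"]["rtspUrl"]

def query_active_sessions(base_url: str, user: str, password: str) -> list:
    """POST a query for active sessions. Path and body are illustrative."""
    body = json.dumps({"requestParameters": [
        {"fieldName": "sessionState", "fieldConditions": [
            {"fieldOperator": "equals", "fieldValues": ["ACTIVE"]}]}]}).encode()
    req = urllib.request.Request(
        base_url + "/ora/queryService/query/sessions",  # assumed path
        data=body,
        headers={"Content-Type": "application/json",
                 **basic_auth_header(user, password)},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("responseBody", {}).get("sessions", [])
```

The RTSP URI returned for each active session can then be handed directly to a streaming-media player.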
The streaming-media players that Cisco has tested for MediaSense are VLC and
RealPlayer. Each of these players has advantages and disadvantages that should
be taken into account when selecting which one to use.
Recordings are usually made up of two audio tracks. MediaSense receives and stores them
that way and does not currently support real time mixing.
VLC is capable of
playing only one track at a time. The user can alternate between tracks but
cannot hear both simultaneously. VLC is open source and is easy to embed into a custom application. RealPlayer can play the two streams as stereo (one stream in each ear), but its buffering algorithms for slow connections sometimes result in misleading periods of silence for the listener. People are more or less used to such delays when
playing recorded music or podcasts, but call monitoring is expected to be real
time and significant buffering delays are inappropriate for that purpose.
Neither of these players can render AAC-LD, g.729, or g.722 audio. A custom application must be created in order to monitor or play streams in those formats.
The MediaSense built-in media player is accessed through the built-in Search and Play application.
This player covers more codecs and can play both streams simultaneously, but it
cannot play video, and it cannot support the AAC-LD codec. This applies to both
playback of recorded calls and monitoring of active calls.
Only calls that
are being recorded are available to be monitored. Customers who require live
monitoring of unrecorded calls, or who cannot accept these other restrictions,
may wish to consider Unified Communications Manager's Silent Monitoring feature.
Once a recording
session has completed, it can be played back on a third-party streaming-media
player or through the built-in media player in the Search and Play application.
Playing it back through a third-party streaming-media player is similar to
monitoring—an RTSP URI must first be obtained either through a query or an event subscription.
While recording a
call, it is possible to create one or more segments of silence within the
recording (for example by invoking the pauseRecording API). Upon playback,
there are various ways to represent that silence. The requesting client uses a
set of custom header parameters on the RTSP PLAY command to specify one of the following behaviors:
The RTP stream pauses for
the full silent period, then continues with a subsequent packet whose mark bit
is set and whose timestamp reflects the elapsed silent period.
The RTP stream does not
pause. The timestamp reflects the fact that there was no pause, but the RTP
packets contain "TIME" padding which includes the absolute UTC time at which
the packet was recorded.
The RTP stream compresses
the silent period to roughly half a second; in all other respects it acts
exactly like bullet 1. This is the default behavior and is how the built-in
media player works.
In all cases, the
file duration returned by the RTSP DESCRIBE command reflects the original
record time duration. It is simply the time the last packet ended minus the
time the first packet began.
The duration returned by the MediaSense API and session events may differ because
these are based on SIP activity rather than on media streaming activity.
Off-the-shelf players such as VLC and RealPlayer elicit the default behavior described in bullet 3. However, these players are designed to play music and podcasts; they are not designed to handle media streams that include silence, so they may hang, disconnect, or fail to seek backwards and forwards in the stream.
Recording sessions can be converted on demand to .mp4 or .wav format via an HTTP request. Files converted this way carry two audio tracks—not as a
mixed stream, but as stereo. Alternatively, .mp4 files can also carry one audio
and one video track.
.mp4 and .wav files are stored for a period of time in MediaSense along with
their raw counterparts and are accessible using their own URLs. (The files
eventually get cleaned up automatically, but are recreated on demand the next
time they are requested.) As with streaming, browser or server-based clients
can get the URIs to these files by either querying the metadata or monitoring
recording events. The URI is invoked by the client to play or download the file.
As with RTSP
streaming, the client must provide HTTP-BASIC credentials and be prepared to
handle a 302 redirect. In this way, conversion to .mp4 or .wav format provides
a secure, convenient, and standards-compliant way to package and export recordings.
Large-scale conversion to .mp4 or .wav takes a lot of processing power on the
recording server and may impact performance and scalability. To meet the
archiving needs of some organizations, as well as to serve the purposes of
those speech analytics vendors who would rather download recordings than stream
them in real time, MediaSense offers a "low overhead" download capability.
This capability allows clients using specific URIs to download unmixed and unpackaged
individual tracks in their raw g.722, g.711 or g.729 format. The transport is
HTTP 1.1 chunked, which leaves it up to the client (and the developer's
programming expertise) to reconstitute and package the media into whatever
format best meets its requirements. As with the other retrieval methods, the
client must provide HTTP-BASIC credentials and be prepared to handle a 302
redirect. Note that video streams and AAC-LD encoded audio streams cannot
currently be downloaded in this way.
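Reconstituting a raw downloaded track into a standard container is left to the client. As one illustration of what that packaging step can look like, the following Python sketch wraps a raw g.711 µ-law track (8 kHz, mono, one byte per sample) in a minimal WAV header; the download URI itself is not shown because its exact form depends on your deployment.

```python
import struct

def mulaw_wav(raw: bytes, sample_rate: int = 8000) -> bytes:
    """Wrap raw g.711 mu-law bytes in a WAV container (format tag 7).
    Compressed WAV formats also require a 'fact' chunk with the sample count."""
    fmt_body = struct.pack("<HHIIHHH",
                           7,            # WAVE_FORMAT_MULAW
                           1,            # mono
                           sample_rate,
                           sample_rate,  # byte rate: 1 byte per sample
                           1,            # block align
                           8,            # bits per sample
                           0)            # cbSize (no extra format bytes)
    chunks = (b"fmt " + struct.pack("<I", len(fmt_body)) + fmt_body
              + b"fact" + struct.pack("<II", 4, len(raw))
              + b"data" + struct.pack("<I", len(raw)) + raw)
    return b"RIFF" + struct.pack("<I", 4 + len(chunks)) + b"WAVE" + chunks
```

The bytes themselves would arrive via an HTTP 1.1 chunked response, read incrementally and concatenated before (or while) being packaged.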
The Search and Play application provides a web-based tool used to search, download, and play back recordings. It is accessed using the API user credentials.
The tool searches
both active and past recordings based on metadata characteristics such as time
frame and participant extension. Recordings can also be selected using call
identifiers such as Cisco-GUID or Unified CM call leg identifier. Once
recordings are selected, they may be individually downloaded in .mp4 or .wav
format or played using the application's built-in media player.
The Search and
Play tool is built using the MediaSense REST-based API. Customers and partners
interested in building similar custom applications can access this API through Cisco DevNet (formerly known as the Cisco Developer Network).
Support for the
Search and Play application is limited to clusters with a maximum of 400,000
sessions in the database. Automatic pruning provides the capability to adjust the retention period to ensure that this limitation is respected, using the following formula:
Retention Setting in Days =
400,000 / (avg # agents * avg # calls per hour * avg # hours per day)
For example, if
you have 100 agents taking 4 calls per hour, 8 hours per day every day, you can
retain these sessions for 125 days before exceeding the 400,000 session limit.
This is acceptable for most customers, but if you have 1000 agents taking 30
calls per hour, 24 hours per day every day, your retention period is about half
a day. The Search and Play application cannot be used in this kind of environment.
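The retention arithmetic above is easy to script. A small Python helper, using the document's own formula:

```python
def retention_days(agents: int, calls_per_hour: float, hours_per_day: float,
                   session_limit: int = 400_000) -> float:
    """Days of retention before the Search and Play session limit is reached."""
    sessions_per_day = agents * calls_per_hour * hours_per_day
    return session_limit / sessions_per_day
```

With 100 agents taking 4 calls per hour for 8 hours per day, this yields 125 days; with 1000 agents taking 30 calls per hour around the clock, it yields roughly 0.56 days, matching the examples above.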
Call recording uses a different set of codecs than those typically used for music
and podcasts. As a result, most off-the-shelf media players are not well suited
to playing the kind of media that MediaSense records. This is why partner
applications generally provide their own media players, and why MediaSense has
the built-in Search and Play application.
The built-in media player supports g.729, g.711, and g.722 codecs. This applies to both playback
of recorded calls and monitoring of active calls.
The embedded media
player can be accessed through the Search and Play application or it can be
used by a 3rd party client application. Such an application can present a
clickable link to the user that, when clicked, loads the recording-specific
media player for the selected recording session into the user's browser. This
allows partners who do not have sophisticated user interface requirements to
avoid the complexity of either developing their own media player or
incorporating an off-the-shelf media player into their applications.
Uploaded videos to support ViQ, VoD, and VoH features
MediaSense supports the Cisco Contact Center Video in Queue, Video on Demand, and Video on
Hold features by enabling administrators to upload .mp4 video files for
playback on demand.
To use these
features, users must:
Produce an .mp4 video that
meets the technical specifications outlined below.
Upload the .mp4 video to
the MediaSense Primary node. The video is automatically converted into a form
that can be played back to a supported video endpoint and distributed to all
other nodes. Playback is automatically load balanced across the cluster.
Create an "incoming call
handling rule" that maps a particular incoming dialed number to the uploaded
video. You may also specify whether this video should be played once or repeatedly.
User interfaces are provided for uploading the file to MediaSense and creating the incoming call handling rule. These functions are not available through the API.
An .mp4 file is a
container that may contain many different content formats. MediaSense requires
that the file content meet the following specifications:
The file must contain one
audio track and one video track.
The video must be encoded using the H.264 codec.
The audio must be encoded using AAC-LC.
The audio must be monaural.
The entire .mp4 file size
must not exceed 2GB.
This information is known as the video specification. It must be provided to any professional studio that is
producing video content for this purpose. Most commonly available consumer
video software products can also produce this format.
Video resolution and aspect ratio are not enforced by MediaSense. MediaSense will
play back whatever resolution it finds in an uploaded file, so it is important
to use a resolution that looks good on all the endpoints on which you expect
the video to be played. Many endpoints are capable of up- or down-scaling
videos as needed, but some (such as the Cisco 9971) are not. For the best
compatibility with all supported endpoints, use standard VGA resolution (640 x 480).
Cisco video endpoints do not support AAC-LC audio (which is the standard for .mp4), so MediaSense automatically converts the audio to AAC-LD, g.711 µ-law, and g.722 (note that g.711 a-law is not supported for ViQ/VoH). MediaSense automatically negotiates
with the endpoint to determine which audio codec is most suitable. If
MediaSense is asked to play an uploaded video to an endpoint which supports
only audio, then only the audio track will be played.
This capability is supported on all supported MediaSense platforms, but there are
varying capacity limits on some configurations. See the "Hardware Profiles"
section below for details.
MediaSense ships with a sample video pre-loaded and pre-configured for use directly out of the
box. After successful installation or upgrade, dial the SIP URL
sip:SampleVideo@<mediasense-hostname> from any
supported endpoint or from Cisco Jabber Video to see the sample video.
Unity Connection for video voice-mail
Beginning with Cisco Unity Connection (CUC) release 10.0(1), configured
subscribers have the option to record video greetings in addition to audio
greetings. Subscribers who are configured to record video greetings and who are
calling from a video capable IP endpoint are presented with additional prompts
to record their video greeting. These recordings (both the audio and video
tracks) are stored and played back from MediaSense. A separate audio-only copy
of the recording remains on Unity Connection as well.
If for any reason Unity Connection is not able to play a video
greeting from MediaSense, it reverts to its locally stored audio greeting.
This is an introductory implementation and therefore contains a number of limitations:
A single, dedicated MediaSense node may be connected to one Unity Connection node, where a node is a single instance of CUC or an HA pair.
The MediaSense node may not be used for any other MediaSense functions.
The scale is limited to approximately 35 simultaneous video sessions.
More information about the Cisco Unity Connection integration,
including deployment and configuration instructions, can be found in the Unity Connection documentation.
Finesse and Unified CCX
MediaSense is integrated with Cisco Finesse and Unified Contact Center Express (Unified CCX).
The integration is both at the desktop level and at the MediaSense API level.
At the desktop
level, MediaSense's Search and Play application has been adapted to work as an
OpenSocial gadget that can be placed on a Finesse supervisor's desktop. In this
configuration, MediaSense can be configured to authenticate against Finesse
rather than against Unified CM. Therefore, any Finesse user who has been
assigned a supervisor role can search and play recordings from MediaSense
directly from his or her Finesse desktop. (A special automatic sign-on has been
implemented so that when the supervisor signs in to Finesse, he or she is also
automatically signed into the MediaSense Search and Play application.) Note
that other than this sign-in requirement, there are currently no constraints on
access to recordings. Any Finesse supervisor has access to any and all recordings.
At the API level,
Unified CCX subscribes for MediaSense recording events and matches the
participant information it receives with the agent extensions that it knows
about. It then immediately tags those recordings in MediaSense with the
agentId, teamId, and if it was an ICD call, the contact service queue
identifier (CSQId) of the call. This allows the supervisor, through the Search
and Play application, to find recordings which are associated with particular
agents, teams, or CSQs without having to know the agent extensions.
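Conceptually, this tagging step is a join between a session's recorded participants and an agent directory. A sketch under assumed data shapes (none of these field names reflect the actual Unified CCX or MediaSense schema):

```python
def tags_for_session(participant_extensions, agent_directory):
    """Map recorded extensions to agent/team/CSQ tags for a MediaSense session."""
    tags = []
    for ext in participant_extensions:
        agent = agent_directory.get(ext)
        if agent is None:
            continue  # participant is not a known agent extension
        tags.append(f"agentId:{agent['agentId']}")
        tags.append(f"teamId:{agent['teamId']}")
        if agent.get("csqId"):  # present only for ICD calls
            tags.append(f"CSQId:{agent['csqId']}")
    return tags
```

The resulting tags would then be written back to the session via the MediaSense API so that Search and Play queries can match on them.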
Recording uses Built-in-Bridge (BiB) forking, selectively invoked through JTAPI by Unified CCX. Because
Unified CCX is in charge of starting recordings, it is also in charge of
managing and enforcing Unified CCX agent recording licenses. However, other
network recording sources (such as un-managed BiB forking phones or CUBE
devices) could still be configured to direct their media streams to the same
MediaSense cluster, which could negatively impact Unified CCX's license management. For example, Unified CCX might think it has 84 recording licenses to allocate to agent
phones as it sees fit, but it may find that MediaSense is unable to accept 84
simultaneous recordings because other recording sources are also using
MediaSense resources. This also applies to playback and download activities—any
activity that impacts MediaSense capacity. If you are planning to allow
MediaSense to record other calls besides those that are managed by Unified CCX,
then it is very important to size your MediaSense servers accordingly.
More information about this integration, including deployment and configuration instructions, can be found in the Unified CCX documentation.
Unified CM for Video on Hold and native queuing
Starting with Unified CM Release 10.0, customers can configure a Video
on Hold source for video callers, similar to a Music on Hold source that is
used for audio callers. The same facility is used to provide pre-recorded video
to callers who are waiting for a member of a hunt group to answer. This is
known as "CUCM native queuing."
MediaSense can be used as the video media server for both purposes.
To use MediaSense in this way, administrators make use of the product's generic
ability to assign incoming dialed numbers to various uploaded videos, which are
then played back when an invitation arrives on those dialed numbers. Unified CM
causes one of these videos to play by temporarily transferring the call to the
corresponding dialed number on MediaSense.
Remote Expert
Calls that are to be recorded must be routed through a CUBE device
that is configured to fork its media streams to MediaSense (because most of the
endpoints used for Remote Expert are not able to fork media themselves). All
the codecs listed in
Codecs supported are supported, except for the H.264 video codec. If your version of IOS does fork video along with the audio streams, MediaSense captures only the audio. Please consult the Compatibility Matrix to ensure that your CUBE is running a supported version of IOS that incorporates several bug fixes in this area.
Remote Expert provides its own user interface portal for finding and
managing recordings, and for playing them back. For AAC-LD audio calls (most
common when using EX-series endpoints), there are no known RTSP-based AAC-LD
streaming media players, so those calls can only be converted to .mp4 and
downloaded for playback. Live monitoring of such calls is not possible.
For more information about this integration, including deployment and
configuration instructions, see the Remote Expert documentation.
Incoming call handling rules
When MediaSense receives a call, it needs to know what
action to take. In Releases 9.0(1) and earlier, all incoming calls
would simply be recorded—irrespective of the dialed number to
which the call was addressed. As of Release 9.1(1), you have the
option to configure what action MediaSense takes for each call type. The following actions are available:
Record the call.
Reject the call.
Play a specified uploaded video once.
Play a specified uploaded video repetitively.
If your application is to record calls forked by a CUBE, then
the dialed number in question is configured as the
"destination-pattern" setting in the dial peer which points to
MediaSense. If your application is to record calls forked by
a Unified Communications Manager phone, then the dialed number in question is
configured as the recording profile's route pattern.
For compatibility with earlier releases, all incoming addresses
(except for SampleVideo) are configured to record.