Recording voice prompts for Cisco TelePresence products

How to record voice prompts for Cisco TelePresence products

This article applies to the following products:

  • Cisco TelePresence MCU 5300 Series
  • Cisco TelePresence MCU 4500 Series
  • Cisco TelePresence MCU 4200 Series
  • Cisco TelePresence MCU MSE 8420 and 8510 blades
  • Cisco TelePresence ISDN GW 3200 and 3241 / MSE 8310 and 8321 ISDN blades
  • Cisco TelePresence Serial GW 3340 / MSE 8330 blade

By default, the Cisco TelePresence infrastructure products listed above include English voice prompts spoken by an American woman. These prompts give users information, for example: "Sorry, I did not recognize that security PIN, please try again". You can replace these prompts with your own to change the wording, language, or accent. Using the web interface, you can upload voice prompts as individual .wav files or all at once in a resource package.

This article explains how to record alternative voice prompts.

Recording New Prompts

When recording new voice prompts, consider the following points, which will help you produce the best results with the minimum of effort:

  • Recording format, sampling frequency and resolution
  • Background noise
  • Voice consistency
  • Volume

Recording format

It is possible to convert a wide variety of audio file formats into the format required by the Cisco TelePresence voice prompts. However, every conversion can lower the quality of the audio, so it is best to make the initial recording with the ideal settings and avoid conversions altogether.

The ideal format is Microsoft WAVE (.wav), uncompressed PCM, mono, at 16 kHz with 16-bit resolution.

If you are unable to make mono recordings, the Cisco TelePresence products can convert stereo recordings. Internally, the products use 16 kHz audio, so this is the ideal sampling frequency. Do not record at a sampling frequency lower than 16 kHz; if you must record at a higher rate, use 44.1 kHz or 48 kHz. A 16-bit sampling resolution is required for high-fidelity voice prompts.
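To check an existing recording against these settings before uploading it, you can inspect the header of the .wav file. The following is a minimal sketch using Python's standard `wave` module (an assumption: any audio tool that reports channels, sample rate and bit depth would serve equally well). The names `check_prompt_format` and `make_test_wav` are hypothetical, chosen for illustration. The `wave` module reads only uncompressed PCM files, so compressed or non-WAVE files are reported rather than inspected.

```python
import io
import wave

IDEAL_RATE_HZ = 16_000
ACCEPTABLE_HIGHER_RATES_HZ = (44_100, 48_000)


def check_prompt_format(source) -> list[str]:
    """Compare a .wav file with the recommended voice prompt format.

    `source` is a filename or a binary file-like object. Returns a list of
    findings; an empty list means uncompressed 16-bit mono PCM at 16 kHz.
    """
    try:
        with wave.open(source, "rb") as wav:
            channels = wav.getnchannels()
            rate = wav.getframerate()
            bits = 8 * wav.getsampwidth()
    except (wave.Error, EOFError):
        # wave only understands uncompressed PCM WAVE, so anything else lands here.
        return ["not a readable uncompressed PCM WAVE file"]

    findings = []
    if channels != 1:
        findings.append(f"{channels} channels: mono is ideal (stereo can be converted)")
    if rate < IDEAL_RATE_HZ:
        findings.append(f"{rate} Hz: do not record below 16 kHz")
    elif rate in ACCEPTABLE_HIGHER_RATES_HZ:
        findings.append(f"{rate} Hz: acceptable, but 16 kHz avoids a conversion")
    elif rate != IDEAL_RATE_HZ:
        findings.append(f"{rate} Hz: use 16 kHz, or 44.1 or 48 kHz if you must record higher")
    if bits != 16:
        findings.append(f"{bits}-bit: 16-bit resolution is required")
    return findings


def make_test_wav(channels, rate, sample_width, seconds=0.1):
    """Build an in-memory WAVE file filled with zero bytes, for trying the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (int(rate * seconds) * channels * sample_width))
    buf.seek(0)
    return buf
```

For example, a mono 16 kHz 16-bit file gives an empty list, while a 48 kHz stereo file gives two findings: one for the channel count and one noting that 48 kHz is acceptable but not ideal.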

Background noise

When recording new voice prompts, it is important to minimise background noise (hiss) as much as possible. As well as preventing ambient noise such as road noise and slamming doors, make sure that fan noise and other background noises are kept to a minimum.

Background noise (hiss) in a sample is very apparent when the Cisco TelePresence products play it back. Even if the noise is hard to hear when you listen to the recording on its own, the transition from silence to a prompt with slight background noise is obvious and can be distracting.
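Because hiss can be hard to hear while recording, it can help to measure it. The following is a minimal sketch, assuming 16-bit PCM recordings that include at least a short pause (for example, before the first word). It splits the recording into 50 ms windows and reports the RMS level of the quietest window in dBFS (decibels relative to digital full scale) as an estimate of the noise floor. The window length and all function names are illustrative assumptions, not part of the Cisco products.

```python
import array
import io
import math
import sys
import wave

WINDOW_SECONDS = 0.05  # 50 ms analysis windows (an illustrative choice)


def read_pcm16(source):
    """Return (interleaved samples, frame rate, channels) from a 16-bit PCM WAVE file."""
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        rate, channels = wav.getframerate(), wav.getnchannels()
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h", frames)
    if sys.byteorder == "big":  # WAVE sample data is little-endian
        samples.byteswap()
    return samples, rate, channels


def rms_dbfs(samples) -> float:
    """RMS level relative to 16-bit full scale (32768); -inf for digital silence."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")


def noise_floor_dbfs(source) -> float:
    """Estimate background noise as the RMS level of the quietest window."""
    samples, rate, channels = read_pcm16(source)
    window = max(1, int(rate * WINDOW_SECONDS)) * channels
    levels = [rms_dbfs(samples[i:i + window])
              for i in range(0, len(samples) - window + 1, window)]
    return min(levels, default=rms_dbfs(samples))


def make_pcm16_wav(samples, rate=16_000, channels=1):
    """Build an in-memory 16-bit WAVE file, for trying the estimate without a recording."""
    data = array.array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(data.tobytes())
    buf.seek(0)
    return buf
```

A result of negative infinity means the quietest window is pure digital silence. Otherwise, the lower (more negative) the value, the quieter the hiss, which makes it easy to compare recordings made in different rooms or with different equipment.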

Voice consistency

If possible, record all voice prompts in one session. This keeps the voice and background conditions constant, so the recorded voice sounds similar from prompt to prompt. As with the transition from silence into background noise, differences in the recorded voice that go unnoticed when you listen to prompts in isolation can become very apparent when prompts are played one after another.

Also, try not to let the wording of one prompt affect the inflection used in another. A common example, although not specifically a problem for the Cisco TelePresence prompts, is recording the word "yes" immediately after the word "no": it is very difficult not to give the second word a different emphasis. That emphasis is natural when the words are spoken together, but a word recorded with the wrong inflection sounds strange when it is played on its own.

Volume

Record prompts at a fairly constant voice level, and try not to vary the speech volume from word to word. A speaker tends to speak at different volumes from session to session, which is another reason to record all of the prompts in one session.

Finding the right level may take some trial and error. The best recordings come from speaking loudly enough that the voice is well above any background noise, but not so loudly that it sounds distorted when played back.
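Level problems can also be screened for across the whole set of prompts. The sketch below, again assuming 16-bit PCM files, flags samples at digital full scale (a likely sign of clipping, which is heard as distortion) and prompts whose overall RMS level differs from the median of the set by more than an assumed 3 dB tolerance. The tolerance, the full-scale heuristic, and the function names are hypothetical choices for illustration, not Cisco requirements. Whole-file RMS also counts pauses, so it is only a rough loudness measure: a prompt with a long silence reads as quieter than it sounds.

```python
import array
import io
import math
import statistics
import sys
import wave

MAX_SPREAD_DB = 3.0  # illustrative tolerance, not a Cisco specification


def read_pcm16(source):
    """Return (interleaved samples, frame rate, channels) from a 16-bit PCM WAVE file."""
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        rate, channels = wav.getframerate(), wav.getnchannels()
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h", frames)
    if sys.byteorder == "big":  # WAVE sample data is little-endian
        samples.byteswap()
    return samples, rate, channels


def rms_dbfs(samples) -> float:
    """RMS level relative to 16-bit full scale (32768); -inf for digital silence."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")


def volume_report(prompts):
    """Check prompts for clipping and uneven loudness.

    `prompts` maps each filename to a path or binary file-like object.
    Returns (levels, warnings); levels maps filename -> RMS level in dBFS.
    """
    levels, warnings = {}, []
    for name, source in prompts.items():
        samples, _, _ = read_pcm16(source)
        # Samples pinned at the 16-bit limits usually mean the input clipped.
        clipped = sum(1 for s in samples if s in (-32768, 32767))
        if clipped:
            warnings.append(f"{name}: {clipped} samples at full scale (likely distortion)")
        levels[name] = rms_dbfs(samples)
    # Compare each non-silent prompt with the median level of the set.
    audible = [lvl for lvl in levels.values() if lvl != float("-inf")]
    if audible:
        median = statistics.median(audible)
        for name, lvl in levels.items():
            if lvl != float("-inf") and abs(lvl - median) > MAX_SPREAD_DB:
                warnings.append(f"{name}: {lvl - median:+.1f} dB from the median prompt level")
    return levels, warnings


def make_pcm16_wav(samples, rate=16_000, channels=1):
    """Build an in-memory 16-bit WAVE file, for trying the report without recordings."""
    data = array.array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(data.tobytes())
    buf.seek(0)
    return buf
```

Comparing against the median rather than the mean keeps one unusually loud or quiet prompt from shifting the reference level for all the others.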

Prompts Specification

The default wording of each prompt, together with its filename, is listed in the product's online help:

  1. Go to Settings > User interface.
  2. Click the help icon at the top right of the web page.

The help page shows a table listing each filename with its default wording.

You do not have to use exactly the same wording in your voice prompts if it is not appropriate for your needs, but do follow these points:

  • Keep each individual prompt shorter than ten seconds; longer prompts risk being cut short when played back by the conferencing equipment.
  • Keep the total length of the full set of prompts, played back-to-back, to no more than 240 seconds (four minutes).
  • Use the filenames associated with the voice prompts; do not change the filenames even if you alter the wording of the prompts.
  • Record each prompt as a separate sample.
  • Follow the advice given for recording format, background noise, voice consistency and volume.
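Before building a resource package, the length limits above can be checked automatically. The following is a minimal sketch using Python's standard `wave` module, which computes each prompt's duration as its number of frames divided by its sample rate. The names `check_prompt_lengths` and `make_silent_wav` are hypothetical; the limits are the ten-second and 240-second figures given above.

```python
import io
import wave

MAX_PROMPT_SECONDS = 10.0   # each prompt must be shorter than ten seconds
MAX_TOTAL_SECONDS = 240.0   # the full set, played back-to-back: four minutes at most


def prompt_seconds(source) -> float:
    """Duration of a WAVE file in seconds (frames divided by frame rate)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_prompt_lengths(prompts):
    """Check per-prompt and total length limits.

    `prompts` maps each filename to a path string or binary file-like object.
    Returns (total_seconds, problems).
    """
    problems = []
    total = 0.0
    for name, source in prompts.items():
        seconds = prompt_seconds(source)
        total += seconds
        if seconds >= MAX_PROMPT_SECONDS:
            problems.append(f"{name}: {seconds:.2f} s, keep each prompt under "
                            f"{MAX_PROMPT_SECONDS:.0f} s")
    if total > MAX_TOTAL_SECONDS:
        problems.append(f"all prompts together: {total:.1f} s, over the "
                        f"{MAX_TOTAL_SECONDS:.0f} s limit")
    return total, problems


def make_silent_wav(seconds, rate=16_000):
    """Build an in-memory silent 16-bit mono WAVE file, for trying the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf
```

To check a folder of recordings, you could pass `{p.name: str(p) for p in pathlib.Path("prompts").glob("*.wav")}`, where `prompts` is a placeholder directory name; the paths are converted to strings because `wave.open` treats non-string arguments as file objects.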

January 24th, 2014 KB_703