Audio File Support
This appendix describes the supported audio file formats, playout methods, and codecs for audio files that are played out by TCL or VoiceXML applications, or recorded by the gateway using the VoiceXML Voice Store and Forward feature. This appendix contains the following sections:
Audio File Formats
Audio File Playout
Recording and Playback
Data-Length Correction Utility For Audio File Headers
Cisco VoiceXML gateways support two standard audio formats for recording and playback: .au (audio/basic) and .wav (audio/wav). The format used to record an audio file is specified by the VoiceXML document at the time of the recording. If it is not defined in the VoiceXML document, the default format type is audio/basic. The gateway uses standard codec numbers for the codec format in the audio header whenever possible, including G.711 u-law, G.711 a-law, and G.726 (32k ADPCM); other codecs use a proprietary format mapping and may therefore not be recognized by third-party audio players. For a listing of the codec numbers used by the Cisco gateway, see the "Codec Mappings in Audio Recordings" section.
Recordings made by the gateway are open-ended and streamed in real time, so the .au and .wav file headers generated by the Cisco gateway are encoded with a data length of 0. All audio files recorded by the Cisco gateway can be played back by the gateway. However, because the standard .wav format does not allow a 0 length in the header, some third-party audio players may not be able to play back .wav files encoded by the gateway. The .au file format allows a 0 length, but some audio players may not support these files either.
For example, the Cool Edit audio player can play 0-length .au and .wav files that are recorded by the gateway using the G.711 u-law codec. Other audio players and other codecs require the audio file header to be modified before playback. For example, the Windows Media Player audio player can play G.711 u-law and G.711 a-law audio files if the 0-length field is modified. The Cool Edit audio player can play G.711 a-law files if the header is corrected.
To enable an audio player to play back recordings made by the Cisco gateway, you must use a correction utility to modify specific fields in the audio file headers. For information about how to modify the file headers, see the "Data-Length Correction Utility For Audio File Headers" section.
TCL and VoiceXML applications can be configured for incoming POTS or VoIP call legs to play announcements to the user and to request user input (digits). The gateway can also play back audio recordings made by VoiceXML applications. Audio files can be played toward both the PSTN side and the IP side of the call leg.
TCL and VoiceXML applications can play out audio files by using the following playout methods from different locations:
The entire audio file is loaded into the gateway's memory and then played out to the appropriate call leg as needed. Memory-based prompts can be loaded from an HTTP server, or from Flash memory, a TFTP server, or an FTP server. Audio files can also be recorded into memory using VoiceXML recording capabilities, then played back from memory or submitted to an external HTTP server to become permanent audio files.
The amount of memory available to store audio prompts on the gateway can be configured by using the ivr prompt memory command.
The Cisco gateway can stream audio files from an external server to the appropriate call leg as needed. Loading an entire audio prompt into local memory before beginning playout can limit the length of audio prompts and impact memory resources. With streaming, pieces of a prompt are loaded into memory and then, if necessary, deleted after they are played to free up memory. The audio file is played out while it is being loaded into memory, with playback beginning as soon as a piece of the prompt is loaded.
Prompts can be streamed from an HTTP server, Flash memory, a TFTP server, or an FTP server. With HTTP, each time a prompt is played, the HTTP caching system is checked, and the audio file is reloaded if necessary. The HTTP cached flag in the VoiceXML document specifies whether an audio file that is loaded into memory is safe to reuse and therefore does not have to be deleted.
To enable the gateway to stream audio files during playout, use the ivr prompt streamed command. HTTP prompts are streamed by default, but HTTP streaming can be disabled by using the no ivr prompt streamed http command.
The amount of memory available to store audio prompts on the gateway can be configured by using the ivr prompt memory command. Performance is best when there is enough memory to store the entire audio file. If the ivr prompt memory command is set to a value smaller than the size of a streamed file, playout performance is reduced.
An external Real Time Streaming Protocol (RTSP) server can stream audio to the appropriate call leg as needed. RTSP is an application-level protocol that controls the on-demand delivery of real-time data, such as the delivery of audio streams from an audio server. By implementing an RTSP client on the Cisco VoIP gateway, a voice application running on the gateway can connect calls with audio streams from an external RTSP server. Prompts from RTSP servers are always streamed during playback. RTSP saves memory on the gateway because it is packet-based. Unlike HTTP or TFTP streaming, for example, RTSP streaming does not read any part of the audio file into RAM.
An external speech synthesizer using Media Resource Control Protocol (MRCP) can generate prompts. Requests to synthesize speech from text strings or audio segments are sent to the media server, which responds with a real-time audio stream.
Dynamic prompts are formed by the underlying system assembling small audio files and playing them out in sequence. This provides simple text-to-speech (TTS) operations, such as playing numbers, dollar amounts, dates, and times. For example, dynamic prompts can inform callers of how much time is left in their debit account, as in:
"You have 15 minutes and 32 seconds left in your account."
The above prompt is created using eight individual audio files. They are: youhave.au, 15.au, minutes.au, and.au, 30.au, 2.au, seconds.au, and leftinyouraccount.au. These audio files are assembled dynamically by the underlying system and played out as a single prompt.
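The assembly of such a prompt can be sketched in a few lines of Python. This is only an illustration: the function names and the rule for splitting a number into a tens file and a units file are assumptions inferred from the file list above, not the gateway's actual logic.

```python
# Hypothetical sketch: the number-splitting rule and helper names are
# assumptions inferred from the example prompt, not the gateway's real logic.

def number_files(n: int) -> list[str]:
    """Map a number in 0..99 to audio file names, e.g. 32 -> 30.au, 2.au."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0..99")
    # Assume numbers up to 20 and exact tens each have a single recording.
    if n <= 20 or n % 10 == 0:
        return [f"{n}.au"]
    return [f"{n - n % 10}.au", f"{n % 10}.au"]


def time_left_prompt(minutes: int, seconds: int) -> list[str]:
    """Return the ordered list of audio files for a 'time left' prompt."""
    return (
        ["youhave.au"]
        + number_files(minutes) + ["minutes.au", "and.au"]
        + number_files(seconds) + ["seconds.au", "leftinyouraccount.au"]
    )


print(time_left_prompt(15, 32))
```

For 15 minutes and 32 seconds, the sketch yields the same eight files, in the same order, as the list above.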
The language and location of the audio files used for dynamic prompts can be specified in the TCL script or VoiceXML document, or these parameters can be configured on the Cisco gateway by using the call application voice language command and the call application voice set-location command.
Each language uses a TCL language module. The TCL language module defines the list of TTS notations that the language supports. Cisco IOS software includes built-in language modules for English, Chinese, and Spanish. You can add support for new languages and new TTS notations by configuring a new TCL language module on the gateway.
The Cisco IOS infrastructure interfaces with the TCL language module to translate TTS notations supplied by the voice application into the specified language. Cisco IOS software translates TTS notations into the sequence of audio files according to the language structure. For example, English and French use different sequences for saying the date: the English language structure says the month first and then the day; the French language structure says the day first and then the month.
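The effect of language structure on file order can be illustrated with a small Python sketch. Real language modules are written in TCL; the table, function name, and file names below are hypothetical and serve only to show how the same date maps to different file sequences.

```python
# Hypothetical sketch: DATE_ORDER, date_files, and the file names are
# illustrative assumptions; actual language modules are TCL scripts.

DATE_ORDER = {
    "en": ("month", "day"),  # English: month first, then day
    "fr": ("day", "month"),  # French: day first, then month
}


def date_files(language: str, month: str, day: int) -> list[str]:
    """Return audio file names for a date in the order the language uses."""
    parts = {"month": f"{month}.au", "day": f"{day}.au"}
    return [parts[field] for field in DATE_ORDER[language]]


print(date_files("en", "july", 4))
print(date_files("fr", "july", 4))
```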
New TTS notations for the Cisco IOS built-in languages, such as playing dates and times of day, can also be configured. For example, if you configure a new English TCL language module, it overrides the built-in English TCL language module during the translation. When completed, any voice application can use the new notations, and the Cisco IOS infrastructure recognizes and plays the audio accordingly.
For information on configuring a new language module on the gateway, see the "Specifying a New Language Module for Dynamic Prompts" section.
The VoiceXML Voice Store and Forward feature provides streaming-based voice recording and playback across various media, including local memory, HTTP, ESMTP, and RTSP, for 14 different Cisco codecs and two standard audio file formats, .au and .wav.
The VoiceXML Voice Store and Forward feature supports audio recording and playback using local memory on the Cisco gateway or a choice of external media server locations:
The URL of the recording destination is specified in the VoiceXML document by using the Cisco property cisco-dest. For more information, see the Cisco VoiceXML Programmer's Guide.
Table A-1 shows the codecs that are supported for audio recording and playback, by platform and minimum required Cisco IOS release.
Table A-1: Codec Support for Audio Recording by Platform and Minimum Cisco IOS Release
Table A-2 lists the codec number that is mapped to each codec in the audio file header, for .au format and .wav format files.
Table A-2: Codec Numbers Used in Audio File Headers
Because the recording duration is unknown at the beginning of an HTTP or ESMTP streaming session, the .au and .wav file headers generated by the Cisco gateway are encoded with a data length of 0 when recording to an HTTP or ESMTP server; the gateway correctly decodes 0-length files during playback.
The standard .wav format, however, does not allow for a 0-length in the header, so third-party audio players may not be able to play back .wav files encoded by the gateway. Although the .au format allows files with a 0-length, some audio players may not support these files either.
To enable an audio player to play back recordings made by the Cisco gateway to an HTTP or ESMTP server, you must use a correction utility to modify fields in the audio file headers. The corrections that must be made to the .au and .wav header fields are described below.
One field in the .au header requires correction: the data_size field, located at offset 8 from the beginning of the .au file. The correction utility must find the total size of the source .au file, subtract the .au header size stored at offset 4 from this file size, and insert the result into the data_size field at offset 8.
The Cisco gateway puts a value of 24 in offset 4 and a value of 0 in offset 8, but this is not guaranteed. The correction utility should follow the correction logic outlined above.
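The .au correction logic described above could be implemented along the following lines. This is a minimal sketch rather than a Cisco-supplied utility: the function name is hypothetical, and it relies on the standard .au layout in which the header fields are 32-bit big-endian integers following the ".snd" magic number.

```python
import os
import struct


def fix_au_data_size(path: str) -> int:
    """Rewrite the data_size field (offset 8) of an .au file in place.

    Hypothetical helper; returns the data size that was written.
    """
    file_size = os.path.getsize(path)
    with open(path, "r+b") as f:
        header = f.read(12)
        if len(header) < 12:
            raise ValueError("file too short to hold an .au header")
        # Big-endian fields: magic (offset 0), header size (4), data size (8).
        magic, header_size, _old_size = struct.unpack(">4sII", header)
        if magic != b".snd":
            raise ValueError("not an .au file")
        if header_size > file_size:
            raise ValueError("header size exceeds file size")
        # Read the header size from the file instead of assuming 24.
        data_size = file_size - header_size
        f.seek(8)
        f.write(struct.pack(">I", data_size))
    return data_size
```

Reading the header size from offset 4, rather than hard-coding 24, follows the guidance that the values written by the gateway are not guaranteed.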
Three fields in the .wav header require correcting:
The Cisco gateway puts a value of 0 in offset 4 and a value of 0 in offset 40, but this is not guaranteed. The correction utility should follow the correction logic outlined above.
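As a rough illustration of a .wav correction, the sketch below fixes only the two size fields defined by the generic RIFF/WAVE layout: the RIFF chunk size at offset 4 (file size minus 8) and the size field of the data chunk. It is an assumption-laden example, not the full three-field procedure: the function name is hypothetical, it assumes the data chunk is the last chunk in the file, and any additional field that the gateway's header requires is not handled here.

```python
import os
import struct


def fix_wav_sizes(path: str) -> tuple[int, int]:
    """Rewrite the RIFF size and the data chunk size of a .wav file in place.

    Hypothetical helper; returns (riff_size, data_size) as written.
    """
    file_size = os.path.getsize(path)
    with open(path, "r+b") as f:
        head = f.read(12)
        if len(head) < 12:
            raise ValueError("file too short to hold a RIFF header")
        # RIFF fields are little-endian: "RIFF", size, "WAVE".
        riff, _old_size, wave = struct.unpack("<4sI4s", head)
        if riff != b"RIFF" or wave != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        pos = 12
        while True:
            # Walk the chunk list until the data chunk is found.
            f.seek(pos)
            chunk_header = f.read(8)
            if len(chunk_header) < 8:
                raise ValueError("no data chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", chunk_header)
            if chunk_id == b"data":
                break
            # Chunk payloads are padded to an even number of bytes.
            pos += 8 + chunk_size + (chunk_size & 1)
        riff_size = file_size - 8
        data_size = file_size - (pos + 8)  # assumes data is the last chunk
        f.seek(4)
        f.write(struct.pack("<I", riff_size))
        f.seek(pos + 4)
        f.write(struct.pack("<I", data_size))
    return riff_size, data_size
```

With a 44-byte header consisting of a 16-byte fmt chunk followed directly by the data chunk, the data size field found by this walk is at offset 40, which matches the offset mentioned above. Locating the chunk by walking the file avoids relying on that offset.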