Release Notes for Cisco Internet Streamer
These release notes cover Cisco Internet Streamer CDS Release 2.5.11-b26.
Note Release 2.5.11-b26 obsoletes all previous Release 2.5.11 builds.
Revised: March 2013, OL-25177-10
The following information is included in these release notes:
Release 2.5.11 of the Cisco Internet Streamer CDS introduces the following features:
Windows Media Streaming SDP Caching
Live streaming is content that is streamed while it is still being encoded by an encoder. There are two kinds of Windows Media live streaming:
•Playlist live—One or more content items are streamed sequentially.
•Broadcast live—Live and prerecorded content can be streamed to more than one client simultaneously. The SE streams the content to all clients, which does not allow the clients to perform seeks on the stream.
Streaming is accomplished by using HTTP live or RTSP live. HTTP live uses Windows Media Streaming Protocol (MS-WMSP) where the wms-hdr in the WMS-Describe-Response describes the content. RTSP live uses RTSP where the Session Description Protocol (SDP) file in the DESCRIBE response describes the content.
The RTSP playlist live SDP file cannot be cached because the SDP file keeps changing to reflect the different content playlists.
Previously, getting the SDP file for RTSP broadcast live was accomplished by the Windows Media Streaming engine sending an RTSP DESCRIBE message to the upstream SE to retrieve the SDP file for each Windows Media Streaming broadcast live request. The RTSP DESCRIBE message eventually reached the Content Acquirer or Origin server, which created a bottleneck for Windows Media Streaming broadcast live.
In Release 2.5.11, because the SDP file for RTSP broadcast live does not change unless the program is stopped, it can be cached on the streaming SE. Once the SDP file is cached, it can be used to compose the DESCRIBE response. No further requests for the SDP file from the upstream server (SE, Content Acquirer, or Origin server) are necessary, which eliminates the bottleneck.
Note The SDP file cannot be cached if content requires authorization by either the Origin server or the SE.
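The caching conditions above can be sketched as a small lookup table. This is a minimal Python illustration, not the Windows Media Streaming engine's implementation; the SdpCache class, the fetch_upstream callback, and the live_type strings are hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class SdpCache:
    """Toy per-SE cache of SDP files for RTSP broadcast live programs."""
    fetch_upstream: Callable[[str], str]  # hypothetical upstream DESCRIBE call
    _entries: Dict[str, str] = field(default_factory=dict)

    def describe(self, url: str, live_type: str, requires_auth: bool) -> str:
        # Playlist live SDP keeps changing, and content that requires
        # authorization cannot be cached, so both always go upstream.
        cacheable = live_type == "broadcast" and not requires_auth
        if cacheable and url in self._entries:
            return self._entries[url]
        sdp = self.fetch_upstream(url)
        if cacheable:
            self._entries[url] = sdp
        return sdp

    def program_stopped(self, url: str) -> None:
        # The SDP may change once the program is stopped, so drop the entry.
        self._entries.pop(url, None)
```

With this shape, only the first broadcast live request for a program reaches the upstream server; later requests are answered from the cached SDP until the program is stopped.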
Windows Media Streaming STB Seek Latency Reduction
Previously, when a client issued a seek operation, such as skipping forward or backward, a new point in the stream time was specified in the request header, and the SE started the stream with a new data packet for that point. The new data packet might or might not contain the I-frame payload. The client (STB) could fill its memory buffer before receiving the I-frame, and because of the small buffer size on STBs, the buffer could overflow before the I-frame was received.
In Release 2.5.11, after a client issues a seek operation and before the SE streams the new data packet, the Internet Streamer CDS checks the data packets for the I-frame and chooses the most appropriate one as the first packet (the I-frame might have been fragmented into more than one data packet).
Windows Media Streaming Tracking for Client Billing
When a client abnormally disconnects from the SE and does not send a log, Windows Media Streaming generates a log event (408 log code) even though the client has not sent a log.
When the server generates logging data, it uses the server-side values that are available and any client values that were transferred at the beginning of the session. The value in the x-duration field, which is used to track client billing, is based on the values of the following fields: date, time, c-starttime, c-rate, and filelength. The date, time, and filelength field values are known at the client connection time. The x-duration and c-starttime are determined by the CDS based on the media content packet's playtime.
Previously, the x-duration field in the Windows Media Streaming transaction log either used the client value or no value. In Release 2.5.11, the x-duration field has the value based on the following:
•If the client does not report a value, indicated by a hyphen (-), the server value is used.
•If the value is reported as zero, then the server duration value is used.
•If the x-duration reported by the client differs from the filelength by more than two minutes, the server's duration value is used.
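The three rules above can be expressed as a single selection function. The following Python sketch is illustrative only: the function name, the use of seconds as the unit, and passing the client value as a string are assumptions, and the comparison against filelength follows the wording of the last rule literally.

```python
TWO_MINUTES = 120  # seconds; assumes all durations are expressed in seconds


def billing_duration(client_value: str, server_duration: int, filelength: int) -> int:
    """Pick the x-duration to log, following the three rules above."""
    if client_value == "-":            # client did not report a value
        return server_duration
    client_duration = int(client_value)
    if client_duration == 0:           # client reported zero
        return server_duration
    if abs(client_duration - filelength) > TWO_MINUTES:
        return server_duration         # client value differs too much from filelength
    return client_duration             # otherwise the client value is kept
```

For example, billing_duration("-", 95, 600) falls back to the server value, while a client value within two minutes of the filelength is logged unchanged.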
Windows Media Streaming ASX Files with URL Signing
The Windows Media Streaming ASX Files with URL Signing feature adds a new Rule_Action, Rule_UrlGenerateSign, to the Service Rules XML schema.
When the playback URL for a Windows Media Streaming live program has an ASX extension, the Content Abstraction Layer (CAL) returns metadata with a generated ASX file that contains both an HTTP URL and an RTSP URL for playback of the live program. These two URLs should be signed so that subsequent requests to play back the live program can be validated by the SE.
The Rule_UrlGenerateSign Rule_Action provides the ability to internally generate URL signatures using Version 2 of the URL signing script (SHA-1 encryption, protocol removed from beginning of the URL, and domain name not included). When the signed URL is sent back to the client as part of the ASX response, the domain name received from the client is added back in.
ASX File Request Flow
The request flow is as follows:
1. Client requests an ASX file.
2. A Service Rule XML file is configured for the delivery service that contains the new Rule_Action, Rule_UrlGenerateSign. The Rule_UrlGenerateSign Rule_Action element requires the following attribute values: Key Owner, Key Number, and timeout. If the timeout attribute value is not specified, the default value of 30 seconds is used. The range for the timeout value is from 0 to 50 seconds.
3. If the pattern for Rule_UrlGenerateSign is matched, the URL signature is generated by the SE using Version 2 of the URL signing script and the attribute values specified for the Rule_UrlGenerateSign element.
Internally signed URLs will have IS=1. The IS=0 string is for legacy support with some CDS components that use both internal and external signing mechanisms.
Both the HTTP and RTSP signed URLs are contained in the ASX file. The signed URL that is used is determined by which protocol (HTTP or RTSP) is allowed or disallowed in the Windows Media Streaming configuration.
Note If Windows Media Streaming is disabled, the ASX file is not generated and a 500 Internal Server Error message is sent to the client.
4. The client receives the ASX file with the signed URL. The player parses the ASX file and sends out the request again with the signed URL. The SE receives the signed URL and validates it. If the validation succeeds, the client is served the content.
The Service Rule XML file must be created and uploaded through the CDSM GUI, and then assigned to the delivery service.
Service Rule XML Example with Rule_UrlGenerateSign
Following is an example of a Service Rule XML file that has the Rule_UrlGenerateSign element:
<CDSRules xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="schema\CDSRules.xsd">
  <Revision>1.0</Revision>
  <CustomerName>Cisco</CustomerName>
  <Rule_Patterns>
    <PatternListGrp id="grp1">
      <Domain>cisco.co</Domain>
    </PatternListGrp>
  </Rule_Patterns>
  <Rule_Actions>
    <Rule_UrlGenerateSign matchGroup="grp1" protocol="http" key-id-owner="1" key-id-number="2" timeout-in-sec="30"/>
  </Rule_Actions>
</CDSRules>
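As a rough illustration of how the Rule_UrlGenerateSign attributes and the timeout default and range described in the request flow fit together, the following Python sketch reads a rule file and resolves the timeout. It is not a CDS tool: the read_sign_rules function, the dictionary keys, and the example.com domain are hypothetical, and only the attribute names, the 30-second default, and the 0 to 50 second range come from these notes.

```python
import xml.etree.ElementTree as ET

DEFAULT_TIMEOUT = 30          # seconds, used when timeout-in-sec is absent
TIMEOUT_RANGE = range(0, 51)  # 0 to 50 seconds inclusive

# Hypothetical rule file that omits timeout-in-sec on purpose.
RULES_XML = """
<CDSRules>
  <Rule_Patterns>
    <PatternListGrp id="grp1"><Domain>example.com</Domain></PatternListGrp>
  </Rule_Patterns>
  <Rule_Actions>
    <Rule_UrlGenerateSign matchGroup="grp1" protocol="http"
                          key-id-owner="1" key-id-number="2"/>
  </Rule_Actions>
</CDSRules>
"""


def read_sign_rules(xml_text):
    """Return one dict per Rule_UrlGenerateSign element, with the timeout resolved."""
    rules = []
    for el in ET.fromstring(xml_text).iter("Rule_UrlGenerateSign"):
        owner, number = el.get("key-id-owner"), el.get("key-id-number")
        if owner is None or number is None:
            raise ValueError("key-id-owner and key-id-number are required")
        timeout = int(el.get("timeout-in-sec", DEFAULT_TIMEOUT))
        if timeout not in TIMEOUT_RANGE:
            raise ValueError(f"timeout-in-sec out of range: {timeout}")
        rules.append({
            "matchGroup": el.get("matchGroup"),
            "key_owner": owner,
            "key_number": number,
            "timeout": timeout,
        })
    return rules
```

Calling read_sign_rules(RULES_XML) yields a single rule whose timeout is filled in from the default, which makes a quick pre-upload sanity check possible before the file is submitted through the CDSM GUI.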
Service Rule Action Order
The Rule_Actions processing is the same in Release 2.5.9 and Release 2.5.11; all Rule_Actions are processed in the same order as they are listed in the Rule_Actions element. However, in Release 2.5.11, for Rule_Validate and Rule_UrlGenerateSign, if the pattern is matched, and the URL validation or URL generation fails and there is a Rule_UrlRewrite or Rule_NoCache listed before, neither will be performed. Because the Rule_Validate or Rule_UrlGenerateSign process failed (validation or generation respectively), the authserver returns Action_Deny and the corresponding rule action (either Action_validate or Action_UrlGenerateSign). The Action_rewrite is not returned, nor is the action for Rule_NoCache if it is listed. This is true whenever Rule_Validate or Rule_UrlGenerateSign is listed, the pattern is matched, and the action fails (either URL validation or URL signing fails).
If either Rule_Validate or Rule_UrlGenerateSign is listed, the pattern is matched, and the action is successful, and if Rule_UrlRewrite is listed, then the Action_rewrite is returned, and so are the Action_validate and Action_UrlGenerateSign (if all three rules are listed).
Service Rule Processing
This section describes the rule processing.
Note Pattern match failure as described in this section means that none of the patternGrps specified as part of the matchGroup matched for a particular action.
For Rule_Allow, if the pattern match fails, the request is blocked and there is no further processing of the remaining rules.
For Rule_Allow, if the pattern match is successful, rule processing continues to the next rule action.
If there is a pattern match for Rule_Block, the request is blocked and there is no further processing of the remaining rules.
If there is no pattern match for Rule_Block, rule processing continues to the next rule action.
Rule_UrlRewrite, Rule_NoCache, Rule_Validate, Rule_UrlGenerateSign—Pattern Match Failure Case
If pattern match fails, rule processing continues to the next rule action and there is no return value for the specified rule action. For example, if the rule action was Rule_Validate and the pattern match failed, there would be no URL validation performed on the request.
In the following XML example, because the pattern match failed for the action Rule_Validate, authserver does not return Action_validate. Because the Rule_UrlRewrite and Rule_UrlGenerateSign pattern matches were successful, authserver returns those actions in its response.
<CDSRules xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="schema\CDSRules.xsd">
  <Revision>1.0</Revision>
  <CustomerName>ATT</CustomerName>
  <Rule_Patterns>
    <PatternListGrp id="grp1">
      <UrlRegex>asx</UrlRegex>
    </PatternListGrp>
    <PatternListGrp id="grp2">
      <UrlRegex>abcd</UrlRegex>
    </PatternListGrp>
  </Rule_Patterns>
  <Rule_Actions>
    <Rule_UrlGenerateSign matchGroup="grp1" key-id-owner="1" key-id-number="1" timeout-in-sec="30" protocol="http"/>
    <Rule_Validate matchGroup="grp2" error-redirect-url="http://18.104.22.168/index.html" protocol="http"/>
    <Rule_UrlRewrite matchGroup="grp1" protocol="http" regsub="DejaVu" rewrite-url="dummy"/>
  </Rule_Actions>
</CDSRules>
Rule_UrlRewrite, Rule_NoCache, Rule_Validate, Rule_UrlGenerateSign—Pattern Match Success Case
If pattern match is successful, the actions are processed as described in the following subsections:
Rule_Validate, Rule_UrlGenerateSign—Validation Fails, Signing Fails, Configuration Failure
Rule_Validate and Rule_UrlGenerateSign have a higher priority than Rule_UrlRewrite or Rule_NoCache. If the pattern matches, but the function fails (URL validation fails, URL signing fails, or there is a configuration failure), there is no further processing of the rule actions and the request is denied.
authserver returns [Action_Deny + Action_validate] if URL validation fails.
authserver returns [Action_Deny + Action_UrlGenerateSign] if UrlSignature generation fails.
Also, the value from previous actions is not returned in either case. For example, if Rule_UrlRewrite preceded Rule_UrlGenerateSign, and Rule_UrlRewrite was successful, but Rule_UrlGenerateSign failed, authserver does not return the value for Action_Rewrite. Similarly, if Rule_UrlRewrite preceded Rule_Validate, and Rule_UrlRewrite was successful, but Rule_Validate failed, authserver would not return the value for Action_Rewrite. The same logic that is described for Rule_UrlRewrite applies to Rule_NoCache as well.
The following XML example illustrates the above scenarios:
<CDSRules xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="schema\CDSRules.xsd">
  <Revision>1.0</Revision>
  <CustomerName>ATT</CustomerName>
  <Rule_Patterns>
    <PatternListGrp id="grp1">
      <UrlRegex>asx</UrlRegex>
    </PatternListGrp>
    <PatternListGrp id="grp2">
      <UrlRegex>abcd</UrlRegex>
    </PatternListGrp>
  </Rule_Patterns>
  <Rule_Actions>
    <Rule_UrlRewrite matchGroup="grp1" protocol="http" regsub="DejaVu" rewrite-url="dummy"/>
    <Rule_UrlGenerateSign matchGroup="grp1" key-id-owner="1" key-id-number="1" timeout-in-sec="30" protocol="http"/>
    <Rule_Validate matchGroup="grp2" error-redirect-url="http://22.214.171.124/index.html" protocol="http"/>
  </Rule_Actions>
</CDSRules>
Rule_UrlRewrite and Rule_NoCache—Rewrite Fails
Rule_UrlRewrite and Rule_NoCache have a lower priority than Rule_Validate and Rule_UrlGenerateSign. If the pattern matches, but the Rule_UrlRewrite or Rule_NoCache fails, authserver does not return Action_Deny and processing of the remaining rule actions continues. If Rule_UrlRewrite fails, authserver does not return the value for Action_Rewrite. If Rule_NoCache fails, authserver does not return its value.
Rule_UrlRewrite, Rule_NoCache, Rule_Validate, Rule_UrlGenerateSign—Success
If the Rule_UrlRewrite action is successful, the authserver response contains the Action_Rewrite and the new rewritten URL is sent. Processing of the remaining rule actions continues.
If the Rule_NoCache action is successful, authserver sends the instructions to not cache the content. Processing of the remaining rule actions continues.
If Rule_Validate is successful, authserver response contains the Action_Validate.
If Rule_UrlGenerateSign is successful, authserver response contains Action_UrlGenerateSign.
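The ordering and failure rules in this section can be summarized in a short simulation. This is a minimal Python sketch, not the authserver implementation: the evaluate function, the tuple layout, and the Action_NoCache label are assumptions made for illustration (these notes do not name the action returned for Rule_NoCache), and other rule actions are left out.

```python
# Rules whose failure denies the request and discards earlier actions.
DENY_ON_FAILURE = {
    "Rule_Validate": "Action_Validate",
    "Rule_UrlGenerateSign": "Action_UrlGenerateSign",
}
# Rules whose failure is skipped without denying the request.
SKIP_ON_FAILURE = {
    "Rule_UrlRewrite": "Action_Rewrite",
    "Rule_NoCache": "Action_NoCache",  # hypothetical name; not given in the notes
}


def evaluate(rule_actions):
    """rule_actions: list of (rule_name, pattern_matched, succeeded) in file order."""
    returned = []
    for rule, matched, succeeded in rule_actions:
        if not matched:
            continue                      # no return value, keep processing
        if rule in DENY_ON_FAILURE:
            action = DENY_ON_FAILURE[rule]
            if not succeeded:
                # Earlier actions are discarded and processing stops.
                return ["Action_Deny", action]
            returned.append(action)
        elif rule in SKIP_ON_FAILURE and succeeded:
            returned.append(SKIP_ON_FAILURE[rule])
    return returned
```

For instance, a successful Rule_UrlRewrite followed by a failed Rule_UrlGenerateSign produces only the deny pair, which mirrors the statement that the value from previous actions is not returned.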
Windows Media Streaming Scheduled Live Programs Blocking
Previously, Windows Media Streaming live programs that were scheduled only played during the scheduled time (which is as expected), but the connected streams that were established continued to play indefinitely.
In Release 2.5.11, the ability to configure whether currently streaming live programs should be stopped when the scheduled time has ended has been added to the CDSM GUI and the live program API.
To configure this feature in the CDSM GUI, choose Services > Live Video > Live Programs, and click the Edit icon next to the program name. The Program Definition page is displayed with the Block per Schedule check box. Check the Block per Schedule check box and click Submit to stop active streams when the schedule ends.
A new debug command has been added to enable the stream-scheduler debug level.
# debug stream-scheduler ?
  error  Stream-scheduler debug level set to error
  trace  Stream-scheduler debug level set to trace
A new attribute, blockPerSchedule, has been added to the Document Type Definition (DTD) for CDS program files. You can use the DTD to create program files for importing programs from third-party systems. The definition for this attribute is the following:
blockPerSchedule (false | true) "false"
Following is an example of a program file for a Windows Media Streaming live program with the blockPerSchedule attribute:
<?xml version="1.0"?>
<!DOCTYPE program SYSTEM "program.dtd">
<program version="1.0" name="liveProgram" serviceType="wmt" description="test" autoDelete="true" live="true" blockPerSchedule="false">
  <media index="1" src="http://WMT_encoder:8080" id="media0"/>
  <mcastInfo referenceUrl="http://contentacquirer/liveprogram.nsc" TTL="22">
    <addrPort addrVal="126.96.36.199" portVal="61248" id="media0"/>
  </mcastInfo>
  <schedule timeSpec="gmt" startTime="0" activeDuration="0"/>
</program>
Web Engine Maximum Size for Cachable Objects
When a request comes into an SE and the requested file is a cache miss, the request is forwarded to the upstream SE, and if the content is not found in the CDS, to the Origin server. The response from the Origin server or the upstream SE contains the content length. If the value of the content length is less than 2 MB, the asset file is treated as a small file. If the content length is greater than 2 MB, the asset is treated as a large file. Small files are cached in RAM (tmpfs) first, then after approximately four seconds, the file is moved to disk. Large files are stored on disk directly. As a result, the response time for a small file is much shorter than for a large file.
Previously, the delimiter between a small and large file was hard coded as 2 MB.
In Release 2.5.11, the Memory Cache Size field has been added to allow configuring this delimiter for each delivery service.
The Memory Cache Size value should be carefully chosen because it affects the performance and response time for that delivery service. When picking a value that defines a small file versus a large file for a delivery service, consider not only the file size of the majority of the traffic, but also the hardware of all the SEs in that particular delivery service (the value is limited by the hardware of the weakest link in the chain of SEs in the delivery service), the bit-rate setting, and other parameters.
To configure the Memory Cache Size field, do the following:
Step 1 Log in to the CDSM GUI.
Step 2 Choose Services > Service Definition > Delivery Services, click the Edit icon next to the delivery service. The Delivery Service Definition page is displayed.
Step 3 Click General Settings. The General Settings page is displayed.
Step 4 In the Memory Cache Size field, enter the maximum file size (in MB) that defines a small file.
The range is from 1 to 10 MB. The default is 2 MB.
Step 5 Click Submit.
When the cache memory (/tmpfs) reaches capacity, which means there is no more space for small files, Web Engine performs a cache bypass and sends the file directly to the client. Previously, the small file was stored on disk, which increased the response time.
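The small-file and large-file handling described in this section can be condensed into one decision function. The Python sketch below is only an approximation of the behavior described here: the placement function and its return labels are hypothetical, 1 MB is taken as 1024 * 1024 bytes, and a file exactly equal to the configured size is treated as large because the notes do not define that boundary.

```python
MB = 1024 * 1024  # assumption: the field is interpreted in binary megabytes


def placement(content_length, memory_cache_size_mb=2, tmpfs_full=False):
    """Return where a cache-miss response would be placed in this sketch."""
    if not 1 <= memory_cache_size_mb <= 10:
        raise ValueError("Memory Cache Size must be between 1 and 10 MB")
    if content_length >= memory_cache_size_mb * MB:
        return "disk"      # large file: stored on disk directly
    if tmpfs_full:
        return "bypass"    # no room in tmpfs: serve the client without caching
    return "tmpfs"         # small file: RAM first, moved to disk shortly after
```

Raising memory_cache_size_mb moves more objects into the fast tmpfs path, at the cost of filling RAM sooner and triggering cache bypass more often, which is why the value should match the weakest SE in the delivery service.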
CAL Queue Limits
In addition to the Memory Cache Size field being added, the CAL queue is now limited to 2000 tasks. When the CAL queue threshold of 2000 is exceeded, Web Engine does not add any more disk operation tasks (creates, updates, or popularity updates) and a trace message is logged with the following string:
Reason: CalQThreshold Exceeded!
A new output field, "Outstanding Content Popularity Update Requests," has been added to the show statistics web-engine detail command. At any point, the sum of the "Outstanding Content Create Requests," "Outstanding Content Update Requests," and "Outstanding Content Popularity Update Requests" output fields is always less than 2000. If the sum of these three output fields exceeds the CAL queue threshold, no more create, update, and popularity update tasks are performed and the "Reason: CalQThreshold Exceeded!" trace message is logged.
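A bounded queue of this kind can be sketched as follows. The CalQueue class and its method names are hypothetical, and the exact boundary (whether the 2000th task is accepted) is an assumption, since these notes describe the limit both as a threshold that is exceeded and as a sum that stays below 2000.

```python
CAL_QUEUE_THRESHOLD = 2000


class CalQueue:
    """Toy model of the three outstanding-request counters."""

    def __init__(self):
        self.outstanding = {"create": 0, "update": 0, "popularity_update": 0}
        self.trace = []

    def submit(self, kind):
        # Assumption: the queue is treated as full once 2000 tasks are outstanding.
        if sum(self.outstanding.values()) >= CAL_QUEUE_THRESHOLD:
            self.trace.append("Reason: CalQThreshold Exceeded!")
            return False
        self.outstanding[kind] += 1
        return True

    def complete(self, kind):
        # A finished disk operation frees one slot in the queue.
        self.outstanding[kind] -= 1
```

Once a task completes, the sum drops below the limit again and new create, update, or popularity update tasks are accepted.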
LSR Path Caching for Windows Media Live Streaming
Previously, each incoming Windows Media Streaming live request caused a CAL lookup resolve, which resulted in the Live Service Routing (LSR) module sending liveness queries to some SEs in the same location and upper-tier locations to derive the hierarchical splitting URL.
In Release 2.5.11, to avoid doing a CAL lookup resolve for each incoming Windows Media Streaming live request, the live hierarchical splitting URL is cached and is then used by all subsequent Windows Media Streaming live requests for the same live program.
Service Monitor Transaction Logs and Augmentation Alarms
Service Monitor in the Internet Streamer CDS provides threshold monitoring of the various components (CPU, disk, memory, and so on) of the devices (SE, SR, and CDSM), as well as the protocol engines on the SEs.
Release 2.5.11 introduces Service Monitor transaction logs to provide an additional tool for analyzing the health history of a device and the protocol engines, and additional augmentation alarms to ensure the device is within the configured capacity limits.
Service Monitor Transaction Logs
The device and service health information are periodically logged on the device in transaction log files. Transaction logs provide a useful mechanism to monitor and debug the system. The transaction log fields include both device and protocol engine information applicable to Service Engines and Service Routers that are useful for capacity monitoring. Additionally, when a device or protocol engine threshold is exceeded, detailed information is sent to a file (threshold_exceeded.log) to capture the processes that triggered the threshold alarm.
The Service Monitor transaction log filename has the following format: service_monitor_<ipaddr>_yyyymmdd_hhmmss_<>, where:
•<ipaddr> represents the IP address of the SE, SR, or CDSM.
•yyyymmdd_hhmmss represents the date and time when the log was created.
For example, service_monitor_192.168.1.52_20110630_230001_00336 is the filename for the log file on the device with the IP address of 192.168.1.52 and a time stamp of June 30, 2011 at 11:00:01 PM.
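When collecting these logs from several devices, the filename alone identifies the device and creation time. The following Python sketch splits a filename according to the format above; the parse_log_name function is hypothetical, it assumes an IPv4 address, and it keeps the trailing field as an opaque string because these notes do not define it.

```python
import re
from datetime import datetime

# Pattern for service_monitor_<ipaddr>_yyyymmdd_hhmmss_<trailing field>.
LOG_NAME = re.compile(
    r"^service_monitor_(?P<ip>\d{1,3}(?:\.\d{1,3}){3})"
    r"_(?P<stamp>\d{8}_\d{6})_(?P<suffix>\d+)$"
)


def parse_log_name(name):
    """Return (ip, creation datetime, trailing field) for a log filename."""
    m = LOG_NAME.match(name)
    if m is None:
        raise ValueError(f"not a Service Monitor log name: {name}")
    created = datetime.strptime(m["stamp"], "%Y%m%d_%H%M%S")
    return m["ip"], created, m["suffix"]
```

The hhmmss portion is read as a 24-hour time, so a stamp of 230001 corresponds to 23:00:01.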
The Service Monitor transaction log file is located in the /local1/logs/service_monitor directory.
An entry to the Service Monitor transaction log is made every two seconds.
Note The following rules apply to Service Monitor transaction logs:
•A transaction log value is only logged if the Service Monitor is enabled for that component or protocol engine on the device. For example, if CPU monitoring is not enabled, the transaction log value "-" is displayed.
•If Service Monitor is enabled for a protocol engine, but the protocol engine is not enabled, the value is not displayed in the log file.
•If a log field can have more than one value, the values are delimited by the pipe (|) character.
•If a value can have sub-values, the sub-values are delimited by the caret (^) character.
•Some of the fields display aggregate values. If the statistics are cleared using the clear statistics command, the value after clearing the statistics may be less than the previous values, or may be zero (0).
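The delimiter rules above are enough to split a single log field mechanically. The Python sketch below assumes the field has already been isolated from the log line; the parse_field function and the sample values in the usage note are hypothetical and do not come from a real log.

```python
def parse_field(raw):
    """Split one log field into values (|) and sub-values (^)."""
    if raw == "-":
        return None  # monitoring is not enabled for this component
    # Each value becomes a list of its sub-values, in the order logged.
    return [value.split("^") for value in raw.split("|")]
```

A field such as "a^1^2|b^3^4" would come back as two values with three sub-values each, while "-" is reported as None so that disabled monitoring is not confused with a zero reading.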
Table 1 describes the fields for the Service Monitor transaction log on an SE.
Table 2 describes the fields for the Service Monitor transaction log on an SR.
Configuring Service Monitor Transaction Logs
Transaction logging for Service Monitor is disabled by default. To enable Service Monitor transaction logging, enter the following commands:
Device(config)# transaction-logs enable
Device(config)# service-router service-monitor transaction-log enable
To disable Service Monitor transaction logging, enter the no service-router service-monitor transaction-log enable command.
The show service-router service-monitor command output displays some of the values monitored and logged.
Service Monitor currently raises an alarm after a user-configured threshold is exceeded. In Release 2.5.11, Service Monitor has been enhanced with augmentation alarms, which are soft alarms that send alerts before the threshold is reached. These alarms are applicable to all devices—Service Engines, Service Routers and CDSMs. Augmentation thresholds apply to device and protocol engine parameters.
Device-Level Augmentation Alarms
A different augmentation alarm is supported for each of the device-level thresholds. Based on the device parameters monitored by Service Monitor, the following minor alarms could be raised:
•Alarm 560007 (CpuAugThreshold) Service Monitor CPU augmentation alarm.
•Alarm 560008 (MemAugThreshold) Service Monitor memory augmentation alarm.
•Alarm 560009 (KmemAugThreshold) Service Monitor kernel memory augmentation alarm.
•Alarm 560011 (DiskAugThreshold) Service Monitor disk augmentation alarm.
•Alarm 560012 (DiskFailCntAugThreshold) Service Monitor disk failure count augmentation alarm.
•Alarm 560010 (NicAugThreshold) Service Monitor NIC augmentation alarm.
Check augmentation threshold, device-level threshold, and average load for the above alarm instance. Add more devices if necessary. A useful command is the show service-router service-monitor command.
The augmentation alarms raised are displayed in the show alarms detail command. The alarms are cleared when the load goes below the augmentation threshold.
Note For system disks (disks that contain SYSTEM partitions), the diskfailure augmentation and threshold alarms are raised only when all system disks are bad. The diskfailcnt threshold does not apply to system disks. The threshold only applies to CDNFS disks, which is also the case for the augmentation thresholds. This is because the system disks use RAID1. There is a separate alarm for bad RAID. With the RAID system, if the critical primary disk fails, the other mirrored disk (mirroring only occurs for SYSTEM partitions) seamlessly continues operation. However, if the disk drive that is marked bad is a critical disk drive (by definition this is a disk with a SYSTEM partition), the redundancy of the system disks for this device is affected. For more information on disk error handling and threshold recommendations, see the "SATA Disk Error Handling and Threshold Recommendations" section.
As the show disk details command output reports, if disks have both SYSTEM and CDNFS partitions, they are treated as only system disks, which means they are not included in the accounting of the CDNFS disk calculation.
Note The NIC augmentation alarm is only applicable if the device is a Service Engine.
RTSP Gateway Overload Alarm
Service Monitor checks if the RTSP gateway TPS overload alarm is raised and sets the Windows Media Streaming and Movie Streamer threshold exceeded states in the keepalive message sent to SR. This ensures that the SR does not redirect RTSP requests to the Service Engine when the RTSP gateway TPS is overloaded.
The new RTSP TPS alarm is raised when the RTSP gateway maximum transaction rate, which can be configured through the CLI, is met. Following are the alarm details:
•Alarm 512001 (tpsquotaexceed) RTSP request rate has reached service threshold limit.
RTSP request rate has reached service threshold limit. Further requests will wait in TCP queue until the service quota is refilled in the next two seconds.
RTSP gateway alarm is checked at two-second intervals. The RTSP gateway TPS value is updated in two-second intervals.
RTSP Gateway Augmentation Alarms
The following augmentation alarm is used for the RTSP gateway:
•Alarm 511017 (rtspgaugmentexceeded) RTSP gateway TPS has reached augmentation threshold limits.
RTSP gateway TPS has reached augmentation limits on maximum concurrent connections or allowed bandwidth.
No service disruption. Monitor device to see if it exceeds service threshold limits and add more devices if necessary.
Web Engine Augmentation Alarms
The following augmentation alarms are used for Web Engine:
•Alarm 9000011 (aug_memory_exceeded) Web Engine augmentation memory threshold exceeded
Web Engine has reached augmentation limits for memory usage.
No service disruption. Monitor device to see if it exceeds service threshold limits and add more devices if necessary.
•Alarm 9000012 (aug_session_exceeded) Maximum augmentation concurrent session threshold exceeded
Web Engine service has reached augmentation limits for concurrent sessions.
No service disruption. Monitor device to see if it exceeds Web Engine threshold limits and add more devices if necessary.
Windows Media Streaming Augmentation Alarms
The following augmentation alarm is used for Windows Media Streaming:
•Alarm 511014 (wmtaugmentexceeded) Windows Media Streaming has reached augmentation threshold limits.
Windows Media Streaming has reached augmentation limits on maximum concurrent connections or allowed bandwidth.
No service disruption. Monitor device to see if it exceeds Windows Media Streaming threshold limits and add more devices if necessary.
Useful commands are the show wmt and show statistics wmt usage commands.
Movie Streamer Augmentation Alarms
The following augmentation alarm is used for Movie Streamer:
•Alarm 511016 (msaugmentexceeded) Movie Streamer has reached augmentation threshold limits.
Movie Streamer has reached augmentation limits on maximum concurrent connections or allowed bandwidth.
No service disruption. Monitor device to see if it exceeds Movie Streamer threshold limits and add more devices if necessary.
Useful commands are the show movie-streamer and the show statistics movie-streamer all commands.
Flash Media Streaming Augmentation Alarms
The following augmentation alarm is used for Flash Media Streaming:
•Alarm 511015 (FmsAugThreshold) Flash Media Streaming has reached augmentation threshold limits.
Flash Media Streaming has reached augmentation limits on maximum concurrent connections or allowed bandwidth.
No service disruption. Monitor device to see if it exceeds Flash Media Streaming threshold limits and add more devices if necessary.
Useful commands are the show flash-media-streaming and the show statistics flash-media-streaming connections commands.
Maximum concurrent connections have a default value of 200 and maximum bandwidth has a default value of 200 Mbps. The augmentation alarm is enabled through the Service Monitor and the augmentation threshold is configured at 80 percent (default). The default service threshold for Flash Media Streaming is 90 percent.
In this case, the augmentation alarm is raised for Flash Media Streaming when 0.8 * 0.9 * 200 = 144 connections or 144 Mbps of bandwidth is exceeded. The Service Router still redirects requests to this Service Engine. The alarm is cleared when the traffic falls below either of the thresholds; that is, 144 connections or 144 Mbps in this example.
Configuring the Augmentation Alarms
The Service Monitor Augmentation Alarms are disabled by default. To enable the augmentation alarms, enter the service-router service-monitor augmentation-alarm enable command.
To disable the augmentation alarms, enter the no service-router service-monitor augmentation-alarm enable command.
The augmentation alarm threshold is a percentage that applies to the CPU, memory, kernel memory, disk, disk fail count, NIC, and protocol engine usages. By default, it is set to 80 percent.
As an example of an augmentation alarm, if the threshold configured for CPU usage is 80 percent, and the augmentation threshold is set to 80 percent, then the augmentation alarm for CPU usage is raised when the CPU usage crosses 64 percent.
If "A" represents the Service Monitor threshold configured, and "B" represents the augmentation threshold configured, then the threshold for raising an augmentation alarm = (A * B) / 100 percent.
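The formula can be checked against both worked examples in these notes (the CPU case and the Flash Media Streaming case) with a few lines of Python. The augmentation_trigger function and its capacity parameter are hypothetical helpers introduced for the calculation; only the (A * B) / 100 relationship, the 80 percent default, and the 1 to 100 range come from the text.

```python
def augmentation_trigger(service_threshold_pct, augmentation_pct=80, capacity=100):
    """Level at which the augmentation alarm is raised.

    service_threshold_pct is A, augmentation_pct is B, and capacity is the
    configured maximum (100 when the result should be read as a percentage).
    """
    if not 1 <= augmentation_pct <= 100:
        raise ValueError("augmentation threshold must be between 1 and 100")
    return capacity * service_threshold_pct * augmentation_pct / 10000


cpu_trigger = augmentation_trigger(80, 80)                 # CPU usage, in percent
fms_trigger = augmentation_trigger(90, 80, capacity=200)   # connections or Mbps
```

Here cpu_trigger evaluates to 64 percent and fms_trigger to 144, matching the CPU example and the Flash Media Streaming example with 200 connections or 200 Mbps.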
The threshold value range is 1-100. The following command sets the augmentation alarm threshold to 70 percent:
Device(config)# service-router service-monitor threshold augmentation 70
The following command resets the augmentation alarm threshold to the default:
Device(config)# no service-router service-monitor threshold augmentation 70
The show service-router service-monitor command displays the augmentation alarm threshold configuration.
The show alarms command displays the alarms output.
The show alarms history detail command displays the history details.
The show alarms detail command displays the alarms details.
The show alarms detail support command displays the support information.
The Internet Streamer CDS runs on the CDE100, CDE200, CDE205, and the CDE220 hardware models. Table 3 lists the different device modes for the Cisco Internet Streamer CDS software, and which CDEs support them.
Table 3 Supported CDEs
Device Mode CDE100 CDE200 CDE205 CDE220-2G2 CDE220-2S3i
SR—Proximity Engine standalone
Release 2.5.11 supports the CDE220-2S3i platform. There are a total of 14 gigabit Ethernet ports in this CDE. The first two ports (1/0 and 2/0) are management ports. The remaining 12 gigabit Ethernet ports can be configured as two port channels. See the Cisco Content Delivery Engine CDE205/220/420 Hardware Installation Guide for set up and installation procedures for the CDE220-2S3i and the Cisco Internet Streamer CDS 2.5 Software Configuration Guide for information on configuring the Multi Port Support feature.
The CDE220-2G2 platform has a total of ten gigabit Ethernet ports. The first two ports (1/0 and 2/0) are management ports. The remaining eight gigabit Ethernet ports can be configured as one port channel. See the Cisco Content Delivery Engine CDE205/220/420 Hardware Installation Guide for set up and installation procedures for the CDE220-2G2.
The CDE100 can run as the CDSM, while the CDE200 can run as the Service Router or the Service Engine. See the Cisco Content Delivery Engine CDE100/200/300/400 Hardware Installation Guide for set up and installation procedures for the CDE100 and CDE200.
The CDE205 can run as the CDSM, Service Router, or Service Engine. See the Cisco Content Delivery Engine CDE205/220/420 Hardware Installation Guide for set up and installation procedures for the CDE205.
Note For performance information, see the release-specific performance bulletin.
Limitations and Restrictions
This release contains the following limitations and restrictions:
•There is a 4 KB maximum limit for HTTP request headers. This has been added to prevent client-side attacks, including overflowing buffers in the Web Engine.
•Standby interface is not supported for Proximity Engine. Use port channel configuration instead.
•A network address translation (NAT) device separating the CDEs from one another is not supported.
•Do not run the CDE with the cover off. This disrupts the fan air flow and causes overheating.
Note The CDS does not support network address translation (NAT) configuration, where one or more CDEs are behind the NAT device or firewall. The workaround for this, if your CDS network is behind a firewall, is to configure each internal and external IP address pair with the same IP address.
The CDS does support clients that are behind a NAT device or firewall that have shared external IP addresses. In other words, there could be a firewall between the CDS network and the client device. However, the NAT device or firewall must support RTP/RTSP.
The matchRule element in the Manifest file is only supported for HTTP; you cannot use FTP and use the matchRule element.
To maximize the content delivery performance of a CDE200, CDE205, or CDE220, we recommend you do the following:
1. Use port channel for all client-facing traffic.
Configure interfaces on the quad-port gigabit Ethernet cards into a single port-bonding interface. Use this bonding channel, which provides instantaneous failover between ports, for all client-facing traffic. Use interface numbers 1 and 2 (the two on-board Ethernet ports) for intra-CDS traffic, such as management traffic, and configure these two interfaces in either standby or port-channel mode. Refer to the Cisco Internet Streamer CDS 2.4 Software Configuration Guide for detailed instructions.
2. Use the client IP address as the load balancing algorithm.
Assuming ether-channel (also known as port-channel) is used between the upstream router/switch and the SE for streaming real-time data, the ether-channel load balance algorithms on the upstream switch/router and the SE should be configured as "Src-ip" and "Destination IP" respectively. Using this configuration ensures session stickiness and general balanced load distribution based on clients' IP addresses. Also, distribute your client IP address space across multiple subnets so that the load balancing algorithm is effective in spreading the traffic among multiple ports.
Note The optimal load-balance setting on the switch for traffic between the Content Acquirer and the edge Service Engine is dst-port, which is not available on the 3750, but is available on the Catalyst 6000 series.
3. For high-volume traffic, separate HTTP and WMT.
The CDE200, CDE205, and CDE220 performance has been optimized for HTTP and WMT bulk traffic, individually. While it is entirely workable to have mixed HTTP and WMT traffic flowing through a single CDE200 simultaneously, the aggregate performance may not be as good as when the two traffic types are separate, especially when the traffic volume is high. So, if you have enough client WMT traffic to saturate the full capacity of a CDE200, CDE205, or CDE220, we recommend that you provision a dedicated CDE200 to handle WMT, and likewise for HTTP. In such cases, we do not recommend that you mix the two traffic types on all CDE servers, which could result in suboptimal aggregate performance and require more CDE200, CDE205, or CDE220 servers than usual.
4. For mixed traffic, turn on the HTTP bitrate pacing feature.
If your deployment must have Streamers handle HTTP and WMT traffic simultaneously, it is best that you configure the Streamer to limit each of its HTTP sessions to a certain bitrate (for example, 1 Mbps, 5 Mbps, or the typical speed of your client population). This prevents HTTP sessions from running at higher throughput than necessary and disrupting the concurrent WMT streaming sessions on that Streamer. To turn on this pacing feature, use the HTTP bitrate field in the CDSM Delivery Service GUI page.
Please be aware of the side effects of using the following commands for Movie Streamer:
Config# movie-streamer advanced client idle-timeout <30-1800>
Config# movie-streamer advanced client rtp-timeout <30-1800>
These commands are only intended for performance testing when using certain testing tools that do not have full support of the RTCP receiver report. Setting these timeouts to high values causes inefficient tear down of client connections when the streaming sessions have ended.
For typical deployments, it is preferable to leave these parameters set to their defaults.
5. For ASX requests, when the Service Router redirects the request to an alternate domain or to the origin server, the Service Router does not strip the .asx extension because the .asx extension is part of the original request. If an alternate domain or origin server does not have the requested file, the request fails. To ensure requests for .asx files do not fail, make sure the .asx files are stored on the alternate domain and origin server.
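Recommendation 4 above caps each HTTP session at a configured bitrate. As a rough way to reason about the effect of a cap, the following Python sketch computes the shortest time in which a paced session can deliver an object. The function name, the 500,000-byte object size, and the convention that 1 Mbps equals 10**6 bits per second are assumptions made for illustration; the sketch does not model how the Streamer implements pacing.

```python
MBPS = 1_000_000  # assumption: 1 Mbps = 10**6 bits per second


def min_transfer_seconds(size_bytes: int, bitrate_bps: int) -> float:
    """Shortest time in which a session capped at bitrate_bps can deliver size_bytes."""
    if bitrate_bps <= 0:
        raise ValueError("bitrate must be positive")
    return size_bytes * 8 / bitrate_bps


# Hypothetical object size, evaluated at the two example caps from the text.
for cap_mbps in (1, 5):
    seconds = min_transfer_seconds(500_000, cap_mbps * MBPS)
    print(f"{cap_mbps} Mbps cap: at least {seconds:.1f} s")
```

A lower cap stretches each HTTP transfer over a longer period, which is the mechanism that leaves headroom for concurrent WMT sessions.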
The open caveats section has the following subsections:
Open Caveats in Release 2.5.11-b26
There are no new open caveats for 2.5.11-b26.
Open Caveats in Release 2.5.11-b21
There are no new open caveats for 2.5.11-b21.
Open Caveats in Release 2.5.11-b20
There are no new open caveats for 2.5.11-b20.
Open Caveats in Release 2.5.11-b19
There are no new open caveats for 2.5.11-b19.
Open Caveats in Release 2.5.11-b18
There are no new open caveats for 2.5.11-b18.
Open Caveats in Release 2.5.11-b15
There are no new open caveats for 2.5.11-b15.
Open Caveats in Release 2.5.11-b13
This release contains the following open caveats:
Windows Media Streaming
Windows Media Streaming backend process caused a core dump file to be created during post-processing of a VOD fast-forward or rewind request.
Only happens in VOD pass-through logic when a client is sending a fast-forward or rewind request. Windows Media Streaming front-end process received an end-of-stream (EOS) message during the front-end process or post-process pausing message.
SE goes offline when enabling Fast SE offline detection.
This issue can be triggered by changing the UDP port on the CDSM GUI page.
Restart the CMS service on the CDSM.
Acquisition and Distribution
MetaReceiver process core dumps.
Because of a timing issue, resources are accessed unsafely within the MetaReceiver process.
None. However, node-mgr restarts the meta-receiver smoothly after the core dump. Minimal impact to the service.
Unified Kernel Streaming Engine (UKSE)
After the Windows Media Streaming live client stops under stress conditions, the show statistics wmt streamstat command may show a few remaining incoming and outgoing sessions for another 15 to 20 minutes.
This happens for Windows Media Streaming live when many clients join and leave quickly.
None. Low impact; a few stale sessions remain for an extra 15 to 20 minutes.
The Web Engine generates a core dump in a particular scenario.
High stress Windows Media Streaming HTTP traffic is running, and Windows Media Streaming threshold is exceeded. This causes the Windows Media Streaming process to not accept the Web Engine HTTP forwarded request, and can cause Web Engine to core dump.
With an SR in the scenario and the Web Engine threshold set appropriately, the service threshold alarm is raised and no more requests reach the SE. In this case, this issue would not be seen.
The Web Engine crashes and the existing sessions are terminated. The process is restarted immediately and subsequent requests are handled seamlessly.
This occurs when a URL request is 2048 characters or longer and the request is handled by the Web Engine custom log format with both %r (to print the request first line) and %U (to print the URL) in the format string.
Use Apache or Extended-Squid transaction logging formats, or configure custom transaction logging with either %r or %U (including both %r and %U prints redundant information).
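The condition above (a custom format string containing both %r and %U, combined with a URL of 2048 characters or more) can be expressed as a simple check. The following Python sketch is a hypothetical helper for auditing format strings offline; it is not part of the CDS software, and it does not handle escaped percent signs.

```python
URL_LENGTH_LIMIT = 2048  # characters, taken from the caveat description


def format_is_at_risk(format_string: str) -> bool:
    """True if the custom log format prints both the request line and the URL."""
    return "%r" in format_string and "%U" in format_string


def request_is_affected(format_string: str, url: str) -> bool:
    """True if this URL, logged with this format, matches the caveat conditions."""
    return format_is_at_risk(format_string) and len(url) >= URL_LENGTH_LIMIT


long_url = "http://example.com/" + "a" * 2048  # hypothetical long URL
print(request_is_affected("%r %U", long_url))  # True
print(request_is_affected("%U", long_url))     # False
```

Dropping either token from the format string makes the check return False, which matches the documented workaround.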
Web Engine experiences read time outs from the Authorization Server during an 8-hour, all unique, cache-fill test.
This occurred in a three-tier topology with a Content Acquirer, middle tier, and edge SE all configured on CDE220-2S3 platforms. The transactions per second were around 50 to 60. The testing used all unique cache-fill content with one Spirent client port and 1 Spirent Server port. The file size was set to 500 KB. The test lasted eight hours.
10/23/2010 16:25:31.207(Local)(8159)ERRO:AuthSvrQuery.cpp:30-> Time out occurred with authsvr read
10/23/2010 16:25:31.207(Local)(8159)ERRO:HTTPCacheAppCtxt.cpp:1510-> WorkerPid HTTPCacheApp[0xeef02968] : AppCtxt(0xe86a2158) Auth Server Query Error (-1), AuthSvrQuery(0xe869bc08)
10/23/2010 16:25:31.207(Local)(8159)ERRO:HTTPCacheAppCtxt.cpp:1633-> WorkerPid HTTPCacheApp[0xeef02968] : AppCtxt(0xe86a2158) - Received Error (500) - Complete
This happens under stress during a long-running longevity test. The current read timeout is two seconds, and the Web Engine treats a timeout as an internal error.
Zeri VOD playback fails in a particular scenario.
The per-delivery service pacing is set to 1 Mbps and there is a two-tier setup for the SEs.
Increase the pacing to 50 or 100 Mbps.
Cache Router dumps core.
Happens sometimes when there is a connection issue with the upstream device.
None. Low impact. Request is still served.
The Cache Router goes into core dump during Web Engine small-objects stress testing.
This occurs in a two-tier setup (Client->Edge->Acq->OS) with all unique cache-miss stress, running for about a day. The rate was 200 transactions per second.
Minimal service impact. Self-correcting in seconds.
The following messages can be seen on a neighbor router when the BGP password is unconfigured on Proximity Engine, after the BGP adjacency has been formed, but corresponding removal is not performed on the router:
*Feb 7 03:32:14.861: %TCP-6-BADAUTH: No MD5 digest from 192.168.82.33(179) to 192.168.82.2(24018)
*Feb 7 03:34:00.573: %TCP-6-BADAUTH: No MD5 digest from 192.168.82.33(179) to 192.168.82.2(24018) (RST)
This issue occurs when adjacency is established with a neighboring router and the password is removed from Proximity Engine configuration and not re-configured within the hold time. Occurred in Release 2.5.3, as well as Release 2.5.9.
When the password is unconfigured on the Proximity Engine side, the two peers cannot communicate with each other. This state is reported on the router side with the following repeated messages:
*Feb 7 03:32:14.861: %TCP-6-BADAUTH: No MD5 digest from 192.168.82.33(179) to 192.168.82.2(24018)
This occurs until the TCP connection is closed on Proximity Engine side and enters TIME_WAIT state. While this state lasts, no messages are printed on the router. The router is still retransmitting TCP packets, but the Proximity Engine is ignoring them, as per TIME_WAIT state. After about 60-75 seconds, the following messages start to display on the router:
*Feb 7 03:37:32.937: %TCP-6-BADAUTH: No MD5 digest from 192.168.82.33(179) to 192.168.82.2(24018) (RST)
These indicate that the TCP connection has been completely closed on the Proximity Engine side, which therefore no longer has any knowledge of the TCP connection and responds to each retransmitted packet with an RST packet, which does not have an MD5 signature. This situation is described in RFC 2385, section 4.1 (Connectionless Resets). The messages are logged as long as the router retransmits TCP packets of the lost connection, which has been observed to occur for up to ten minutes. This issue does not affect correct operation.
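When reviewing router logs for this caveat, it can help to separate the two message forms described above: the plain BADAUTH message seen while the Proximity Engine still holds the connection, and the form ending in (RST) seen after the connection is fully closed. The following Python sketch shows one way to do that. The regular expression is written against the sample messages in this caveat only, and the two phase labels are hypothetical names chosen for this example.

```python
import re

# Pattern derived from the sample messages in this caveat; other log variants are not covered.
BADAUTH = re.compile(
    r"%TCP-6-BADAUTH: No MD5 digest from "
    r"(?P<src>[\d.]+)\((?P<sport>\d+)\) to (?P<dst>[\d.]+)\((?P<dport>\d+)\)"
    r"(?P<rst> \(RST\))?"
)


def classify(line: str):
    """Return a phase label for a BADAUTH line, or None if the line does not match."""
    match = BADAUTH.search(line)
    if match is None:
        return None
    # Labels are illustrative: the RST form appears once the peer has forgotten the connection.
    return "reset-after-close" if match.group("rst") else "unsigned-segment"


samples = [
    "*Feb 7 03:32:14.861: %TCP-6-BADAUTH: No MD5 digest from 192.168.82.33(179) to 192.168.82.2(24018)",
    "*Feb 7 03:37:32.937: %TCP-6-BADAUTH: No MD5 digest from 192.168.82.33(179) to 192.168.82.2(24018) (RST)",
]
for line in samples:
    print(classify(line))
```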
The Ucache process goes into core dump when the memory usage reaches close to 4 GB. No service failure occurs during this period; only a core file is generated on the SE.
The core dump happens when the Ucache process memory usage reaches close to 4 GB. This can happen for the following reasons:
–SE has large amounts of content with large URLs.
–The clear cache all command was entered when there are many content files.
Occurred in Release 2.5.3, as well as Release 2.5.9.
If the number of content objects in the SE is not large and the URL size is small, then this core dump can be avoided. Maximum cache object count can be set by using the cache content max-obj-count command.
Error messages are logged in the Ucache logs when there is stress and eviction is in progress.
The error message occurred when the Ucache process was in a stressed environment with eviction in progress. The eviction resulted in sending an RPC to itself during the eviction. When there are many RPC messages coming into the Ucache process, the RPC can time out.
Too many deletion operations are the root cause for this error message. If the maximum object count is small, this issue can be avoided.
Memory usage of the Service Router increases significantly and leads to a core dump.
We see the memory leak when the following conditions are true:
–There is only one SE assigned to the delivery service and the SE becomes unavailable (reloaded or offloaded).
–There are a lot of requests from HTTP 1.1 where there are multiple requests within the same HTTP session.
The Service Router sometimes goes into a core dump after uploading a 4 MB Coverage Zone file.
The Coverage Zone file is too large (28,000 entries), with 10 delivery services, and multiple SEs. Occurred in Release 2.5.3, as well as Release 2.5.9.
Reduce one of the following three: the number of Coverage Zone file entries, the number of delivery services, or the number of SEs per location.
MP3 Live Streaming
Web Engine goes into core dump on the edge SE or middle SE, when the origin server or the Content Acquirer restarts during playback.
During the MP3-live playback, restart the Web Engine on the Content Acquirer or stop the encoder process on origin server. When the origin server restarts, all the SEs go into core dump. When the Web Engine restarts on the Content Acquirer, the middle SE and edge SE go into core dump.
Web Engine restarts by itself.
The caveats listed in this section have been resolved since Cisco Internet Streamer CDS Release 2.5.11. Not all the resolved issues are mentioned here. The following list highlights the resolved caveats associated with customer deployment scenarios. The resolved caveats section has the following subsections:
Resolved Caveats in Release 2.5.11-b26
Table 4 lists the issues resolved in the Cisco Internet Streamer CDS 2.5.11-b26 release.
Click on the bug ID to view the bug details. This information is displayed in the Bug Toolkit.
Resolved Caveats in Release 2.5.11-b21
The following caveats have been resolved since Cisco Internet Streamer CDS Release 2.5.11-b21:
rpc_httpd (the Apache process) initiated a core dump when reloading.
When the device is reloading, the rpc_httpd process is exiting.
Alarm is not clearing despite rtspg logs showing that there is no traffic hitting the box.
These were generated during a change where 11 Internet Streamers were removed from a custom Device Group and assigned to the "BASELINE" Device Group.
ftp_export stalls. Debugging with GDB; it waits for read in the socket.
The sftp-server (SSH) does not respond to the ftp_export and does not break the connection.
A script is needed to delete all the fragment files for a URL path.
delete_files_url_with_path.sh "<URL path>".
Web Engine failed to copy the HCACHE header or perform a new content route lookup for requests with the query string or HEAD only requests, which are bypassed up front.
This happens only when the file is already cached on disk, requires revalidation and new requests are either query string or HEAD requests.
Web Engine in the edge SE bypasses CA and reaches OS directly.
1. Edge SE, which is serving the client directly.
2. HTTP HEAD request.
3. The requested content is a cache hit, has expired, and needs revalidation.
4. There is an existing Datasource for the requested URL but no existing DataSourceFinder for the requested URL.
When a customer changes the encoder resolution of the video output, the customer receives a garbled video image and has to restart the live program through the CDSM GUI.
The customer changes the encoder resolution of the video output.
Resolved Caveats in Release 2.5.11-b20
The following caveats have been resolved since Cisco Internet Streamer CDS Release 2.5.11-b20:
The client is redirected to 302 Last Resort when one SE device goes offline. The SR is expected to select the other SE instead.
SE becomes offline when the Service-Monitor process hangs.
Two IP addresses are configured to an SE, one of the IP addresses maps to a delivery service and the other is still able to play the stream of the delivery service.
The client configures the other IP address of the SE in the client host file, so the client can play the stream.
When the service_monitor process is hung, the SE state inside the SR flaps. The interval of the UDP keepalive packets is increased from 2 seconds to 8 seconds because there are multiple SR devices in the production network and each query adds extra delay.
SE service_monitor process is hung.
Resolved Caveats in Release 2.5.11-b19
The following caveats have been resolved since Cisco Internet Streamer CDS Release 2.5.11-b19:
The ability of the content based routing feature is limited when URL signing is used. Content based routing generates a hash on the whole URL, which includes the signed part and therefore is always unique. The SR redirects the client to multiple streamers for the same content.
When the URL signing feature is used.
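The effect described above can be illustrated with a small hashing experiment: when the signature is part of the hashed key, two requests for the same content produce different keys, while a key built from the host and path alone is identical for both. The following Python sketch is only an illustration. The streamer names, the sig query parameter, and the use of SHA-256 are assumptions, and the sketch does not represent how the Service Router computes its hash or how the caveat was resolved.

```python
import hashlib
from urllib.parse import urlsplit

STREAMERS = ["se-1", "se-2", "se-3"]  # hypothetical streamer names


def pick_streamer(key: str) -> str:
    """Map a key to a streamer with a stable hash (illustrative only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return STREAMERS[int.from_bytes(digest[:8], "big") % len(STREAMERS)]


def content_key(url: str) -> str:
    """Build a key from the host and path only, ignoring the query string."""
    parts = urlsplit(url)
    return parts.netloc + parts.path


# Two hypothetical signed URLs for the same content; only the signature differs.
url_a = "http://cdn.example.com/movies/film.wmv?sig=1a2b3c"
url_b = "http://cdn.example.com/movies/film.wmv?sig=9z8y7x"

# Hashing the whole URL: the keys differ, so the chosen streamers may differ.
print(pick_streamer(url_a), pick_streamer(url_b))

# Hashing the content key: both requests always map to the same streamer.
print(pick_streamer(content_key(url_a)), pick_streamer(content_key(url_b)))
```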
Stream-schedule core dump on a customer device.
There is no corresponding record found from the (DB) table uni_multicast_info, but we have a logic issue for this case.
Clients receive a 504 error when streaming WMT.
There is a race condition in the Web Engine: when the OS times out, all subsequent clients requesting this movie may receive a 504 error.
Based on the investigation over the error logs and the Web Engine code, the root cause is:
1. Some existing data sources in the memory ended up with a bad status because of content downloading being terminated by OS HTTP response body read timeout 504 (because the OS is overloaded).
2. At the same time, many new requests kept hitting the existing data sources.
3. Before using those data sources, there is no status check; therefore, 500 were returned to the client.
4. Since new requests kept hitting the data sources, the data sources could not be evicted from the memory.
Origin server is under heavy load.
WMT streams are affected.
Incoming bytes do not increment for that stream.
Resolved Caveats in Release 2.5.11-b18
The following caveats have been resolved since Cisco Internet Streamer CDS Release 2.5.11-b18:
Customer needs to set up a cookie that includes "=" and ":".
BGP process does not start after config from CLI or CDSM.
This happens only if i-node count is very high. The i-node count can be checked by lsof. The utility netstat will show i-node INT_MAX for some processes, which is the root cause.
Windows Media (http), SE sometimes denies request.
1. URL sign generation is enabled through authsvr rule file.
2. URL sign verification is enabled through CLI rule configuration.
Cache control header s-maxage is not cached correctly on the SE.
Cache control header s-maxage is configured on the Origin Server.
Custom access logs (transaction logs) show a large value in the column that corresponds to "Bytes-Transferred-Excluding-Header" when no data is sent out to the downstream SE.
Occurs only under overloaded conditions, when the SE or CA takes too long to send the data out to the downstream SE. The downstream SE, after waiting for a certain time (default = 5 seconds), disconnects from the upstream SE and sends a 504 Gateway Timeout to the client.
SE sends HTTP connections directly to Origin Server bypassing CA.
Code upgrade from 2.5.9 to 2.5.11.
Web Engine core dump.
1. There is traffic ongoing during the Web Engine startup.
2. Thread transfer happens.
Resolved Caveats in Release 2.5.11-b15
The following caveats have been resolved since Cisco Internet Streamer CDS Release 2.5.11-b15:
Movie Streamer engine core dumps on SE.
During Movie Streamer initialization of a unicast-in live program, the SDP enclosed in the DESCRIBE response from the Origin server does not contain any valid stream-level metadata. This may happen if the Origin server is a Wowza server.
High number of SNMP traps flooding the monitoring station.
Internal configuration change notifications, possibly because of misconfiguration; for example, if a delivery service has an invalid URL, configuration traps are sent in an endless loop.
If the input URL is externally signed and passed to the URL Manager, the internal-sign process replaces the question mark (?) with the ampersand (&), which creates a false URL.
The problem occurs when a client request has an externally-signed URL.
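The failure mode above concerns the separator between the path and the query string. As a general illustration of keeping the question mark intact when adding a parameter to a URL that already carries a query string, the following Python sketch appends a parameter with the standard urllib.parse helpers. The parameter names ext_sig and int_sig and the example URLs are hypothetical, and the sketch does not describe the URL Manager's internal-sign process. Note that urlencode re-encodes existing parameter values, which may not be acceptable for every signing scheme.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def append_query_param(url: str, name: str, value: str) -> str:
    """Append name=value to the query string, preserving the '?' separator."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical externally signed URL that already has a query string.
print(append_query_param("http://example.com/movie.wmv?ext_sig=abc", "int_sig", "xyz"))
# http://example.com/movie.wmv?ext_sig=abc&int_sig=xyz

# URL without a query string: the '?' is added once.
print(append_query_param("http://example.com/movie.wmv", "int_sig", "xyz"))
# http://example.com/movie.wmv?int_sig=xyz
```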
When end users change channels, they start at the same point within the video (between 3 and 10 minutes) because the Manifest file is not updating as frequently as it should (every second or so).
Occurs when streaming adaptive bit rate (ABR) content.
The Web Engine core dumps after enabling range cache-fill during stress for large files, or the SE runs out of connections or memory and errors such as the following are seen:
Web Engine Concurrent sessions exceeds threshold value
Session count (x) reached session threshold (30000) or Memory Usage (x) higher than (3435973836) for FD
The Web Engine crashes in a failover scenario for large-file range requests, or the Web Engine no longer accepts any new connections because the connection limit is reached (connections are stuck in the CLOSE_WAIT state).
The following symptoms were seen for this issue:
–All the four SRs and one backup CDSM were reported to be down on the primary CDSM. Only these five devices were observed to flip between online and offline modes while the SEs states seem to be okay.
–There was no reported interruption to the end user, but the CDN monitoring system (CDSM) is reported to be unreliable.
–Huge /local/local1/logs/rpc_httpd/ssl_scache.pag file size (around 44 GB) on the primary CDSM.
–No core files observed.
The SRs, SEs, and backup CDSM send HTTP and HTTPS messages to the primary CDSM. The messages are handled by the rpc_httpd process on the CDSM. These requests are the HTTP and HTTPS messages that report the health of the various nodes to the CDSM.
Apache (rpc_httpd) uses ssl_scache.pag file to speed up parallel request processing by avoiding unnecessary session handshakes. At every SSLSessionCacheTimeout interval the global/inter-process SSL Session Cache information is timed out, with the httpd process acquiring a lock and traversing the records. Because of the size of the file (approximately 44 GB), this operation is taking an excessively long time, which blocks other processes from reading the file for session information.
Because the Fast SE Offline Detection is enabled, the SEs health is communicated to the CDSM using UDP messages (and not the HTTP/HTTPS mechanism). This corresponds to what was observed, where only the backup CDSM and the SRs were offline, while the SEs were reported to be online.
This issue happens once every six months with more than 50 SEs, 4 SRs, and 1 backup CDSM communicating with the primary CDSM using SSL.
Windows Media Streaming live stream request goes directly to the Origin Server from a non-Content Acquirer Service Engine.
Primary Content Acquirers were reloading or down. Some liveness queries to the backup Content Acquirer returned failure for the Windows Media Streaming engine.
The following two symptoms have been observed:
1. The rea agent stops running after the cms agent crashes.
2. If the rea agent is started and the show rea info command is entered at the same time, the rea agent may fail to start.
For symptom 1, it happens each time the cms agent crashes.
For symptom 2, it is a rare case.
Custom transaction logs are printed continuously without a new line character between log entries.
Work load is high, stressed.
Clients received 403 response from SE.
SE failed to reach the Geo-location server when Geo-location API call failed.
Cache Router does not take into account all the Content Acquirers and SEs in a location for its route calculations. It only picks the first two (in the dynamic hash list of SEs generated for every URL) and sends liveness queries only to those two. If both of them fail to respond or respond with "unusable" state, the Cache Router skips that location and goes upstream (it could be another SE or directly to the Origin server accordingly).
If two Content Acquirers are down or offloaded in a root location, the SEs could be directly contacting the Origin server for some of the URLs which bypasses the other Content Acquirers in the root location.
Only when at least two of the Content Acquirers in the root location are offloaded or down and there are other live Content Acquirers in the root location.
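The selection behavior described in this caveat can be sketched as follows: the Cache Router probes only the first two devices in the per-URL hash list and skips the location if neither is usable, even when a third device is live. The following Python sketch contrasts that with probing the whole list. The device names, the function name, and the probe_limit parameter are hypothetical, and probing the entire list is shown only as a contrast, not as a description of the actual fix.

```python
def pick_upstream(hash_list, is_usable, probe_limit=None):
    """Return the first usable device, or None to skip this location and go upstream.

    probe_limit=2 models the behavior described in the caveat;
    probe_limit=None considers every device in the list.
    """
    candidates = hash_list if probe_limit is None else hash_list[:probe_limit]
    for device in candidates:
        if is_usable(device):
            return device
    return None


# Hypothetical root location: CA1 and CA2 are down or offloaded, CA3 is live.
hash_list = ["CA1", "CA2", "CA3"]
live = {"CA3"}


def is_usable(device):
    return device in live


print(pick_upstream(hash_list, is_usable, probe_limit=2))  # None
print(pick_upstream(hash_list, is_usable))                 # CA3
```

With the two-device limit the location is skipped and the request goes upstream, which in a root location means the Origin server is contacted directly.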
Content Acquirer fails to send keep-alives to the Origin server.
We do not support Response messages with header "content-type:audio/aacp."
Resolved Caveats in Release 2.5.11-b13
The following caveats have been resolved since Cisco Internet Streamer CDS Release 2.5.11-b13:
Windows Media Streaming
Windows Media Streaming streams are affected.
Incoming bytes do not increment for that stream.
There are two incoming streams (ingest) for a single outgoing stream when encoder failure happens in some particular scenarios.
After Windows Media Streaming live program fails over from primary encoder to secondary encoder, recover the primary one, then send new request.
The show stat wmt streamstat command output displays a stale entry; the process for that stream is gone.
For all unique cache-miss cases, during cache-fill stage, Windows Media Streaming can only sustain about 200 concurrent users.
All unique cache-miss cases.
Windows Media Streaming cached HTTP response into media data.
During cache fill, Windows Media Streaming hit some network issue or the connection with the Origin Server was dropped.
The cached data for one video contains the content from another video. Windows Media Streaming playback failed.
Under stress, when the ramp-up value is more than 20, Windows Media Streaming could generate the same session-id for two different clients in a short period of time. This causes a cache-filling error.
Windows Media Streaming stream statistics show a large value in the duration field.
Player only sends open and close requests, no play request is sent.
Line feeds in one of the Windows Media Streaming transaction log fields cause a log entry to split across two entries.
The client is a Windows Media Player.
The wmt_ml process enters into a hang state and cannot serve requests anymore.
Bad cached data causes the FE to send stream-end messages to wmt_ml continuously.
Some video content has a freezing issue because the Content Acquirer cached a corrupted block.
The Origin server is an Apache HTTP server.
Windows Media Streaming backend process core dumped during post-processing of a live session.
This issue only happens if the live source is a server-side playlist.
After five days of load testing, core.wmt_be was found on the device.
It happens for Windows Media Streaming live, when a pause event happens in a live program.
Some alarms are not cleared, even when the issue no longer exists.
If the encoder is recovered but the live program does not get the source from it, it has no chance to clear the alarm. An example of this scenario follows:
A live program pulls the stream from encoder1; encoder2 is the backup source. When encoder1 fails, an alarm is raised and the live program switches to encoder2. The alarm was noticed and encoder1 was recovered. The alarm is not cleared until the SE pulls the stream from encoder1 successfully.
The node health monitor framework allows a module to clear an alarm only if that module is the one that raised the alarm. The CDSM only has the rights to view all alarms and clear the cdm alarms.
If the encoder is recovered but the live program does not get the source from it, it has no chance to clear the alarm.
A single HTTP over MMS request triggers two DESCRIBE requests sent to upstream SE or Origin server.
The media client is using HTTP 1.0 and disconnects the TCP connection after receiving the DESCRIBE response.
Backup Content Acquirers may take some flows to distribute to lower tier for live streams.
Live streams are configured and in Tier 1 there are multiple SEs present. Not all traffic flows through defined Content Acquirer; instead, other SEs in Tier 1 handle the traffic.
The wmt_mbe process generated a core dump.
While handling Windows Media Streaming traffic.
The Windows Media Streaming front-end connection number between the Content Acquirer and the Origin server is much higher than the number of live programs configured. Usually the number is double. The extra connections are not persistent; they are dynamic and refresh approximately every 60 seconds.
The Windows Media Streaming live program is set to be primed and wmt_mbe received an error from the Origin server (USS) during the RTSP DESCRIBE response process.
The wmt_be process generated a core dump file.
The Origin server sent an announce message to the SE.
The cs-url process logs the original URL for the request, not the SR redirected URL.
When the request is redirected from an SR.
The cs-uri-stem limit is 128 bytes, so long string values are truncated.
When cs-uri-stem is longer than 128 characters.
The data server uses 100 percent of the CPU.
This is because of the fd leak when unbind fails. Add retry logic to fix it.
The Content-Length header causes httprequestreader to continue reading for a HEAD request.
Nessus security scan caused Web Engine to go into a loop.
127.0.0.1 TCP_MISS/504 231 HEAD http://0.0.0.0/ application/octet-stream
This seems to be the result of one of the Nessus tests with a bad HTTP host header of 0.0.0.0.
An error condition occurs in which datasourcefinder is not cleaned up.
When a query URL is encoded, standard decoding happens.
Flash Media Streaming
Log parsing and analysis failed on Sawmill and other third-party log analysis tools.
Using Sawmill FMS log parsing module.
In the Service Rule XML file, Rule_Allow is configured with multiple matchGrps, none of the matchGrps match, and there is another Action configured following the Allow actions, the request is incorrectly allowed.
With multiple matchGrps in the Rule_Allow, when none of the matchGrps match and there is another action following the Allow action, the request is incorrectly allowed.
Changing the primary or secondary Geo-location server configuration does not take effect immediately.
Change primary or secondary Geo-location IP address on the SE configuration.
This issue only happens on the CAT6K. If the fiber box is connected to a CAT6K fiber module and Gige 10/0 is shut down, the switch side always shows the interface as up, but the SE side shows the interface as down and the network is not reachable. If Gige 9/0 is shut down, there is no issue; both the switch and SE sides show the interface as down and the port channel continues to work. It is not known why Gig10/0 behaves differently; it is on a PCI-x NIC, but Gig9/0 is on the same NIC card.
Release 2.5.3.b15 and using CAT6K.
The error condition triggers the core dump and causes the dataserver to become out of sync.
The condition happens when the interface portchannel 1 bandwidth 100 command and the interface portchannel 1 bandwidth 1000 command are entered, and then the SE is rebooted.
Dataserver bind fails in Web Engine even after 10 retries with 3-second retry intervals.
HTTP connection is closed after the content gets served, even if there is "Connection: Keep-Alive" in HTTP/1.0 request.
–HTTP/1.0 and "Connection: Keep-Alive" exists
–Large content gets served successfully
Apple HLS streaming chunks take a very long time (minutes) to be served. The show statistics web-engine detail command shows a very large number of outstanding CAL updates.
Because of the backlog of disk operations, a disk file creation takes a long time, then because of the backlog of the file creations, it takes a very long time to serve any streaming chunk.
URL signature validation fails because the client browser (the Android browser) cannot interpret a colon (:).
When a signed URL contains a colon (:), the Android browser cannot handle it.
After the standby CDSM is switched to primary, running the show cms info command on the SE or SR still shows the original primary CDSM's IP address as the "Current CDSM Address."
Switching from primary to standby CDSM.
CDSM graphs are not showing correct values.
After CDSM role is changed.
Core generated in MetaDataReceive.
When the table is dropped and the Acquisition and Distribution component wants to access the particular table. It is mostly triggered by entering the cms recover identity command.
Java core dump found on the CDSM.
Logging into the CDSM and password not provided.
The show programs command reports the wrong live program status.
The show programs command reports "Failed to start program (UNS resolve fails)" or "Failed to start program (WMT API failed to start program)."
When the live program is working correctly.
Transaction logs are not rolling properly on cds-is devices.
When the transaction log file sequence number rolls over.
For transaction logs, the rotation does not work for compressed files.
When configuring the transaction-logs export compression.
When UNS has inconsistent entries, an alarm is raised. The complete alarm name, unsinconsistententries, is not displayed in the output of the show alarms command because the alarm name is too long.
Functionality works fine. This is a cosmetic issue in which the complete alarm name string is not displayed.
The alarm "unsinconsistentetries" has been raised. It can be seen in the output of the show alarms command.
This alarm is generally raised when an internal UNS journal file used by the UNS process is corrupted when UNS starts. It can also be raised when the total content count is inconsistent between two internal processes: UNS and Ucache.
The UNS server goes into core dump after a device reload (after running Flash Media Streaming mixed traffic).
When running Flash Media Streaming mixed performance testing traffic (70-20-10: 70 percent all unique, 20 percent single unique, 10 percent cache-miss). Occurred in Release 2.5.3, as well as Release 2.5.9.
A core.service_router file is generated on the Service Router.
A number of errors ("QUOVA_RETURN_FAILURE") are returned from the Quova server.
No external symptom. The service monitor process makes an RPC call to the Service Router process even when the Service Monitor transaction log is not enabled (service-router service-monitor transaction-log enable).
Some of the HTTP requests sent to the Service Router take a long time to get a response.
Number of requests sent to the Service Router is high.
The Service Router is wrongly matching network prefixes x.y.0.0/24 when not expected, treating them as if they are x.y.0.0/16.
This was observed when offloading a Service Engine associated with prefix x.y.z.0/24 in the Coverage Zone File. The Service Router started to match the entry for prefix x.y.0.0/24 after that.
The SE shows as active in the show service-router services command output and the show service-router routes command output when the Service Engine is not sending keepalive messages to the Service Router. No SE keepalive alarms are seen for this down Service Engine on the Service Router.
All the streaming interfaces and the primary interface on the Service Engine are shut down. The keepalive interval is changed.
The fmsedge process sometimes core dumps on the SR when a stress test using RTMPT is running.
When a very high or uncontrolled ramp-up rate is used in the load tool and more RTMPT connections are sent than the configured maximum, the extra connections are rejected. As a result, the tool tries to send more connections, which leads to more connections being rejected. As this happens, the memory usage of the fmsedge process keeps increasing and the process core dumps at 4 GB. Occurred in Release 2.5.3, as well as Release 2.5.9.
Device is shown as offline in the CDSM. The show service-router service-monitor command takes a lot of time and does not print all values.
Service Monitor goes into core dump.
During stress test. Occurred in Release 2.5.3, as well as Release 2.5.9.
Unified Kernel Streaming Engine (UKSE)
TCP retransmits occur when the network link is overrun.
The network link must be overrun.
Core file seen for process ucache-svr when device is serving RTSP VOD traffic.
Core file seen on setup. The issue is caused by corrupted metadata in one of the asset files copied to the device. The error handling for this incorrect metadata was not correct.
A core file is generated by Ucache process in a longevity test.
This happens in a very rare condition in which the internal alarm infrastructure does not return the alarm information for an alarm. The alarm infrastructure may be busy or unresponsive on that occasion.
Seeing a rootfs alarm on a CDS device.
Having, for example, a misconfigured DNS server which causes error messages to fill the rootfs file system, resulting in potential instability.
The live routing debug message reports an attempt to connect to an invalid IP address.
Normal running. The SE itself must be the live routing candidate; that is, the "se_id 0 ip 0" line shows in error log before seeing this symptom.
Under a high load of Windows Media Streaming live transactions, sometimes requests are rejected because no response is received by Windows Media Streaming within 30 seconds.
This happens when Windows Media Streaming cannot get responses from the live stream router (LSR) module, which gets a hierarchical path of Service Engines in route to the Origin server. This is because the LSR has very high connect and read timeouts when it tries to get information from upstream SEs.
Transaction logs are exported twice.
Two (S)FTP export servers are configured and the device is upgraded.
No RTSP quota exceeded logging in error level log.
Occurs when error log level is set.
Client request from User-Agent Lavf is not sent to Windows Media Streaming.
The video quality is poor when playing content by using RTSP from the Movie Streamer.
A core file is generated by the Web Engine.
Under stress, when many liveness queries arrive simultaneously from other SEs.
Accessing Bug Toolkit
This section explains how to use the Bug Toolkit to search for a specific bug or to search for all bugs in a release.
Step 1 Go to http://tools.cisco.com/Support/BugToolKit.
Step 2 At the Log In screen, enter your registered Cisco.com username and password; then, click Log In. The Bug Toolkit page opens.
Note If you do not have a Cisco.com username and password, you can register for them at http://tools.cisco.com/RPF/register/register.do.
Step 3 To search for a specific bug, click the Search Bugs tab, enter the bug ID in the Search for Bug ID field, and click Go.
Step 4 To search for bugs in the current release, click the Search Bugs tab and specify the following criteria:
•Select Product Category—Video.
•Select Products—Cisco Content Delivery Engine Series.
•Search for Keyword(s)—Separate search phrases with boolean expressions (AND, NOT, OR) to search within the bug title and details.
•Advanced Options—You can either perform a search using the default search criteria or define custom criteria for an advanced search. To customize the advanced search, click Use custom settings for severity, status, and others and specify the following information:
–Severity—Choose the severity level.
–Status—Choose Terminated, Open, or Fixed.
Choose Terminated to view terminated bugs. To filter terminated bugs, uncheck the Terminated check box and select the appropriate suboption (Closed, Junked, or Unreproducible) that appears below the Terminated check box. Select multiple options as required.
Choose Open to view all open bugs. To filter the open bugs, uncheck the Open check box and select the appropriate suboptions that appear below the Open check box. For example, if you want to view only new bugs, choose only New.
Choose Fixed to view fixed bugs. To filter fixed bugs, uncheck the Fixed check box and select the appropriate suboption (Resolved or Verified) that appears below the Fixed check box.
–Advanced—Check the Show only bugs containing bug details check box to view only those bugs that contain detailed information, such as symptoms and workarounds.
–Modified Date—Choose this option to filter bugs based on the date when the bugs were last modified.
–Results Displayed Per Page—Specify the number of bugs to display per page.
Step 5 Click Search. The Bug Toolkit displays the list of bugs based on the specified search criteria.
Step 6 To export the results to a spreadsheet:
a. In the Search Bugs tab, click Export All to Spreadsheet.
b. Specify the filename and location at which to save the spreadsheet.
c. Click Save. All bugs retrieved by the search are exported.
If you cannot export the spreadsheet, log into the Technical Support website at http://www.cisco.com/cisco/web/support/index.html or contact the Cisco Technical Assistance Center (TAC).
Upgrading to Release 2.5.11
The only supported upgrade paths are Release 2.5.x to Release 2.5.11. If you are running a release prior to Release 2.5.x, you must upgrade to at least Release 2.5.x before upgrading to Release 2.5.11.
Note Before upgrading from Release 2.5.3 to Release 2.5.11, enter the clear cache all command.
Content cached in the Release 2.5.3 Web Engine, if requested in Release 2.5.11, results in duplicate entries in the Ucache process. Duplicate entries were found in the output of the show content and show cache commands, but the disk maintains only a single copy of the content.
After the upgrade procedure starts, do not make any configuration changes until all the devices have been upgraded.
Note Release 2.5.11 only supports one IGP (IS-IS or OSPF) for the Proximity Engine. When upgrading to Release 2.5.11 from Release 2.5.1 or Release 2.5.3, if both IGPs (IS-IS and OSPF) were configured for the Proximity Engine, then one of the configurations must be removed.
Note The new Web Engine in Release 2.5.11 cannot be removed during downgrade to Release 2.5.3 because this configuration is still valid in Release 2.5.3 (the new Web Engine was supported as an EFT feature in Release 2.5.3). Therefore, both CLI commands are present after downgrading.
If user roles are defined in Release 2.5.11, and the system is then downgraded to Release 2.5.3, then the following menu options will not be accessible to the user with defined roles:
•Devices > Service Engines > Service Control > ICAP
•Devices > Service Engines > Service Control > ICAP Services
•Devices > Service Engines > Service Control > PCMM QoS Policy
•Devices > Service Engines > Application Control > Web > HTTP > HTTP Connections
•Devices > Service Engines > Application Control > Web > HTTP > HTTP Caching
•Devices > Service Engines > Application Control > Web > HTTP > Advanced HTTP Caching
•Devices > Device Group > Service Control > ICAP
•Devices > Device Group > Service Control > ICAP Services
•Devices > Device Group > Service Control > PCMM QoS Policy
•Devices > Device Group > Application Control > Web > HTTP > HTTP Connections
•Devices > Device Group > Application Control > Web > HTTP > HTTP Caching
•Devices > Device Group > Application Control > Web > HTTP > Advanced HTTP Caching
•Services > Service Definition > Delivery Service > PCMM Config
If any defined user with a defined role requires access to the above menu options, then the menu options must be added by choosing System > AAA > Roles and enabling the services for those menu options.
Source Policy Routes
Release 2.5.7 supported multiple IP addresses on the CDE220-2S3i, which included specifying the default gateway and IP routes. The IP routes, source policy routes, were added to ensure incoming traffic would go out the same interface it came in on. An IP route was added using the interface keyword, which was introduced in Release 2.5.7, and has the following syntax:
ip route <dest_IP_addr> <dest_netmask> <default_gateway> interface <source_IP_addr>
In the following example, all destination traffic (IP address of 0.0.0.0 and netmask of 0.0.0.0) sent from the source interface, 184.108.40.206, uses the default gateway, 220.127.116.11. This is a default policy route.
ip route 0.0.0.0 0.0.0.0 18.104.22.168 interface 22.214.171.124
A non-default policy route defines a specific destination (IP address and netmask). The following ip route command is an example of a non-default policy route:
ip route 10.1.1.0 255.255.255.0 <gateway> interface <source_IP_addr>
When upgrading to Release 2.5.11, any source policy routes configured using the Release 2.5.7 interface keyword are rejected and are not displayed when the show running-config command is used. However, because you had to define the default gateway for all the interfaces as part of the multi-port support feature, the equivalent source policy route is automatically generated in the routing table.
The following example shows the output for the show ip route command after upgrading to Release 2.5.11 with the default source policy routes highlighted in bold and the non-default policy routes highlighted in italics:
# show ip route
Destination      Gateway          Netmask
---------------- ---------------- ----------------
172.22.28.0      126.96.36.199    255.255.255.128
188.8.131.52     0.0.0.0          255.255.255.0
8.2.1.0          0.0.0.0          255.255.255.0
8.2.2.0          0.0.0.0          255.255.255.0
17184.108.40.206 220.127.116.11   255.255.255.0
8.1.0.0          0.0.0.0          255.255.0.0
0.0.0.0          18.104.22.168    0.0.0.0
0.0.0.0          22.214.171.124   0.0.0.0
0.0.0.0          126.96.36.199    0.0.0.0
Source policy routing table for interface 188.8.131.52/16
172.22.28.0      184.108.40.206   255.255.255.128
220.127.116.11   18.104.22.168    255.255.255.0
8.1.0.0          0.0.0.0          255.255.0.0
0.0.0.0          22.214.171.124   0.0.0.0
Source policy routing table for interface 126.96.36.199/24
8.2.1.0          0.0.0.0          255.255.255.0
0.0.0.0          188.8.131.52     0.0.0.0
Source policy routing table for interface 184.108.40.206/24
8.2.2.0          0.0.0.0          255.255.255.0
0.0.0.0          220.127.116.11   0.0.0.0
If you have a default source policy route where the gateway is not defined as a default gateway, then you must add it after upgrading to Release 2.5.11. For example, if you had a source policy route with a gateway of 18.104.22.168 for a source interface of 22.214.171.124, and you did not specify the gateway as one of the default gateways, you would need to add it.
If you have a non-default source policy route, then you must add it as a regular static route (without the obsoleted interface keyword) after upgrading to Release 2.5.11. This route is then added to the main routing table as well as the policy routing table.
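The two cases above (default and non-default source policy routes) can be sketched as a small helper for reviewing a saved Release 2.5.7 configuration before the upgrade. This is a minimal illustration, not a Cisco tool: the function name, the returned action labels, and the example addresses (taken from the 192.0.2.0/24 documentation range) are assumptions introduced here.

```python
def migrate_policy_route(line):
    """Classify a Release 2.5.7-style source policy route for Release 2.5.11.

    Returns (action, text):
      ("default-gateway", gateway) for a default policy route, whose gateway
          must be configured as one of the default gateways;
      ("static-route", command) for a non-default policy route, rewritten
          without the obsoleted interface keyword.
    """
    tokens = line.split()
    # Expected shape: ip route <dest> <netmask> <gateway> interface <source>
    if len(tokens) != 7 or tokens[:2] != ["ip", "route"] or tokens[5] != "interface":
        raise ValueError(f"not a source policy route: {line!r}")
    dest, netmask, gateway = tokens[2:5]
    if dest == "0.0.0.0" and netmask == "0.0.0.0":
        return ("default-gateway", gateway)
    return ("static-route", f"ip route {dest} {netmask} {gateway}")


# Hypothetical addresses, for illustration only.
non_default = migrate_policy_route(
    "ip route 10.1.1.0 255.255.255.0 192.0.2.1 interface 192.0.2.10"
)
default = migrate_policy_route(
    "ip route 0.0.0.0 0.0.0.0 192.0.2.1 interface 192.0.2.10"
)
```

Here non_default carries the static route to re-enter after the upgrade, and default carries the gateway to check against the configured default gateways.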
URL Public Key Signing
Table 5 describes the compatibility and results when using a prior CDS software release to perform URL signing and the current software release to perform URL validation.
SATA Disk Error Handling and Threshold Recommendations
This section addresses the concerns related to a recent increase in SATA disk failure frequency observed at customer production networks, which mostly occurred following a software upgrade from Release 2.5.3 to Release 2.5.9.
We recommend the following configuration settings for disk error handling:
(config)# disk error-handling threshold 100
(config)# no disk error-handling reload
(config)# service-router service-monitor threshold failcntdisk 4
The disk error-handling threshold command determines how many disk errors can be detected before the disk drive is automatically marked as bad. The range is 0-100, and the default is 10 disk-related read/write errors. A setting of 0 means the disk is never marked bad, but disk failure alarms are triggered frequently.
The default setting for the disk error-handling reload command is disabled.
The service-router service-monitor threshold failcntdisk command configures the disk failure count threshold value with a range of 1-15.
Changing the disk error-handling threshold command setting to 100 helps avoid marking a good disk bad and prematurely offloading an SE because it reached the failed disk-count threshold. If you change this threshold to zero (0), the disk is never marked bad, but a disk failure alarm is raised every time a disk error occurs. Setting the threshold to 100 also lets you know which drive has had errors, which could affect the end-user experience.
The service-router service-monitor threshold failcntdisk command sets the limit for how many disks with a CDNFS partition can fail or be marked bad before the Service Router no longer sends requests to the Service Engine. We recommend setting this threshold value to four; this means a third of the drives would have to fail before the device is considered not able to handle incoming requests or sessions efficiently. However, a device should never have four drives that are bad at any one point in time.
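The effect of the failcntdisk threshold can be sketched as follows. This is an illustrative model only: the function name is hypothetical, and it assumes that the Service Router stops sending requests once the number of failed CDNFS disks reaches the threshold (rather than exceeds it).

```python
def se_receives_requests(failed_cdnfs_disks, failcntdisk=4):
    """Return True while the Service Router keeps sending requests to the SE."""
    if not 1 <= failcntdisk <= 15:
        raise ValueError("failcntdisk must be in the range 1-15")
    # Assumption: routing stops once the failed-disk count reaches the threshold.
    return failed_cdnfs_disks < failcntdisk


# With the recommended threshold of 4, map failed-disk counts 0..5 to the result.
status = {n: se_receives_requests(n) for n in range(6)}
```

Under that assumption, an SE with three failed CDNFS disks still receives requests, and one with four does not.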
The system default setting for the disk error-handling threshold command is 10.
Starting with Release 2.5.9 of the Cisco Internet Streamer CDS software, a new alarm type, "badsector," was introduced to report specific bad sector errors. Release 2.5.3 did not have this alarm nor did it detect these badsector failures. In Release 2.5.9, after 10 sector-related I/O errors occur, a drive is marked as bad.
The following tasks are performed when a drive is marked bad:
•Raise a disk_failure alarm (this alarm also exists in Release 2.5.3).
For Release 2.5.3, the disk_failure alarm is raised for any sector-related error.
For Release 2.5.9, the disk_failure alarm is only raised after 10 sector-related errors occur.
•Forcibly unmount the drive.
Note In Release 2.5.9, additional retry attempts are made to unmount the drive. This was done in order to make the drive unmount logic more robust, especially during I/O streaming activity.
•Intentionally invalidate the Master Boot Record (MBR) of the drive, thereby destroying any cached content. This is a new feature in Release 2.5.9 and was added to eliminate the possibility of reusing potentially corrupt cached content.
Release 2.5.9 introduces several new disk-related alarms, which might give the false impression after upgrading the software that the disk subsystem is not healthy. In reality, the software is merely reporting more accurate (finer grained) failures through the use of additional alarm types.
Note If a disk is marked bad, the show disk detail command output displays "disk01: Not used (*)" and the drive is not used after a reload.
In Release 2.5.9, the disk error-handling counter is incremented when the following error occurs:
end_request: I/O error
In Release 2.5.3, the disk error-handling counter is incremented when the following error occurs:
Buffer I/O error
The Disk Error Handling feature allows you to set the disk error-handling threshold and how to handle disk errors if the threshold is reached. If the automatic reload feature is enabled (the disk error-handling reload command), and the disk drive gets marked as bad because the disk error-handling threshold (read/write) was reached, the device is automatically reloaded. Following the device reload, a syslog message and an SNMP trap are generated. If the disk drive that is marked bad is a critical disk drive (by definition this is a disk with a SYSTEM partition), the redundancy of the system disks for this device is affected.
The disk error-handling reload is a legacy command that was used when RAID 1 was not implemented. Because RAID 1 is now being used, we do not want the device to be reloaded, because the software state may be lost upon reload. With the RAID system, if the critical primary disk fails, the other mirrored disk seamlessly continues operation.
A disk is marked bad when the number of read/write errors reaches the threshold setting of the disk error-handling threshold command, which is 10 by default. As an example, if there is one bad sector on a disk that gets read 10 times, the disk is marked bad. As another example, if there are 10 bad sectors that each get read once, the disk is marked bad.
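The two examples above follow from the counter tracking errors rather than distinct sectors. A minimal sketch of that counting rule, with a hypothetical function name and made-up sector numbers:

```python
def error_that_marks_disk_bad(error_events, threshold=10):
    """Return the 1-based index of the error that marks the disk bad, or None.

    error_events holds one sector identifier per read/write error; the sector
    itself is not examined, because only the running error count matters.
    A threshold of 0 means the disk is never marked bad.
    """
    if threshold == 0:
        return None
    for count, _sector in enumerate(error_events, start=1):
        if count >= threshold:
            return count
    return None


one_sector_ten_reads = error_that_marks_disk_bad([4096] * 10)        # same sector, 10 errors
ten_sectors_one_read = error_that_marks_disk_bad(list(range(10)))    # 10 sectors, 1 error each
nine_errors = error_that_marks_disk_bad([4096] * 9)                  # below the default of 10
raised_threshold = error_that_marks_disk_bad([4096] * 10, threshold=100)
```

Both of the first two cases reach the default threshold on the tenth error, while nine errors, or ten errors against a threshold of 100, leave the disk in service.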
However, one bad sector does not mean a drive is bad. Typically, the indication that a drive is bad and needs to be replaced is if the show disk SMART-info detail command output exceeds the values described in Table 6.
A drive needs to be replaced if any of the RAW_VALUEs listed in Table 6 are exceeded. The values indicating drive replacement for SYSTEM drives (disk00 and disk01) are lower than those for CDNFS drives because of the critical nature of system drives as compared to data drives.
The show disk SMART-info command (without the detail keyword), provides information on the overall health of each drive. The following example of the show disk SMART-info command output shows that disk08 is bad:
# show disk SMART-info
... etc ...
=== disk08 ===
smartctl 5.40 2010-10-16 r3189 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda ES.2
Device Model: ST3500320NS
Serial Number: 9QM92HZ0
Firmware Version: SN05
User Capacity: 500,107,862,016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Tue Jul 19 04:42:16 2011 PDT
==> WARNING: There are known problems with these drives,
see the following Seagate web pages:
http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207931
http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207963
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 025 025 036 Pre-fail Always FAILING_NOW 1548
The show disk SMART-info command should be repeated for each drive. If the overall-health assessment of a drive indicates "FAILED," then the drive should be replaced. The output of the show disk SMART-info command also shows the SMART attributes that indicate drive failure (in the above example, the Reallocated_Sector_Ct attribute indicates FAILING_NOW).
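When many drives have to be checked, the per-drive review described above can be scripted against captured command output. The following is a rough sketch, assuming the captured text has the same line layout as the example output (one attribute per line, ten whitespace-separated fields); the function name and the returned dictionary keys are hypothetical.

```python
import re


def smart_summary(output):
    """Extract the overall-health result and failing attributes for one drive."""
    health = re.search(r"self-assessment test result:\s*(\w+)", output)
    failing = []
    for line in output.splitlines():
        fields = line.split()
        # Attribute rows: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) == 10 and fields[0].isdigit() and fields[8] == "FAILING_NOW":
            failing.append((fields[1], int(fields[9])))
    return {
        "health": health.group(1) if health else None,
        "replace": bool(health) and health.group(1) == "FAILED",
        "failing": failing,
    }


# Excerpt of the disk08 example output from this section.
sample = """\
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 025 025 036 Pre-fail Always FAILING_NOW 1548
"""
summary = smart_summary(sample)
```

For the disk08 excerpt, the summary flags the drive for replacement and lists Reallocated_Sector_Ct with its raw value as the failing attribute.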
Additionally, the CDSM GUI and the show alarms command on the SEs display the sector alarms. If sector alarms have occurred, enter the show disk SMART-info details command on the SE to determine the state of the drive and whether the drive needs to be replaced or repaired.
Following is an example of the show alarms command output in Release 2.5.9:
Minor Alarms:
-------------
Alarm ID             Module/Submodule     Instance
-------------------- -------------------- -------------------------
1 badsector          sysmon               disk01
2 badsector          sysmon               disk08
If the show disk SMART-info details command output values for Current_Pending_Sector and Offline_Uncorrectable are below the threshold described in Table 6, then you need to run the disk repair command. If the output values for Current_Pending_Sector and Offline_Uncorrectable are above the threshold described in Table 6, then you need to replace the disk. After the disk repair command completes, we recommend that you reboot the SE to ensure all CDS software services are functioning correctly.
Note In Release 2.5.9, there is a disk repair command similar to the repair-disk utility. The repair-disk utility provides progress indicators and displays a log of repaired sectors; it also provides more robust sector error detection, repair, and validation. Both the repair-disk utility and the disk repair command take approximately three hours to complete per disk.
Table 7 provides an example of the last part of the output of the show disk SMART-info detail command. The attributes that need to be reviewed to determine if the drive needs to be replaced or repaired are highlighted in bold. A drive needs to be replaced if any of the RAW_VALUEs listed in Table 6 are exceeded. In this example, because the Reallocated_Sector_Ct value is greater than 20, this drive should be replaced.
Table 8 provides an example of the last part of the output of the show disk SMART-info detail command. The attributes that need to be reviewed to determine if the drive needs to be replaced or repaired are highlighted in bold. In this example, the Current_Pending_Sector and Offline_Uncorrectable each have a value greater than one, so running the repair-disk utility will resolve this issue and this drive does not need to be replaced.
Note We recommend always repairing the disk if either Current_Pending_Sector or Offline_Uncorrectable is greater than one. This is because sector errors tend to occur close to each other on a drive, both physically and in time.
The show disk SMART-info detail command only reports sector errors that have been detected; there may be more sectors in error adjacent to the reported bad sector. Repairing the drive also proactively repairs unreported sector errors. However, because repairing a drive is a time-consuming process, it may be easier to just replace the drive if a spare drive is available.
Table 9 provides a detailed description of the Attribute Names that could indicate disk problems.
The following documents have been added for this release:
•Release Notes for Cisco Internet Streamer CDS 2.5.11
Refer to the following documents for additional information about the Cisco Internet Streamer CDS 2.5:
•Cisco Internet Streamer CDS 2.5 Software Configuration Guide
•Cisco Internet Streamer CDS 2.4-2.5 Quick Start Guide
•Cisco Internet Streamer CDS 2.4-2.5 API Guide
•Cisco Internet Streamer CDS 2.5 Command Reference Guide
•Cisco Internet Streamer CDS 2.5 Alarms and Error Messages Guide
•Cisco Content Delivery System 2.x Documentation Roadmap
•Cisco Content Delivery Engine 205/220/420 Hardware Installation Guide
•Cisco Content Delivery Engine 100/200/300/400 Hardware Installation Guide
•Regulatory Compliance and Safety Information for Cisco Content Delivery Engines
•Open Source Used in CDS IS 2.5.11
The entire CDS software documentation suite is available on Cisco.com at:
The entire CDS hardware documentation suite is available on Cisco.com at:
Obtaining Documentation and Submitting a Service Request
For information on obtaining documentation, submitting a service request, and gathering additional information, see the monthly What's New in Cisco Product Documentation, which also lists all new and revised Cisco technical documentation, at:
Subscribe to the What's New in Cisco Product Documentation as a Really Simple Syndication (RSS) feed and set content to be delivered directly to your desktop using a reader application. The RSS feeds are a free service and Cisco currently supports RSS version 2.0.
This document is to be used in conjunction with the documents listed in the "Related Documentation" section.
Cisco and the Cisco Logo are trademarks of Cisco Systems, Inc. and/or its affiliates in the U.S. and other countries. A listing of Cisco's trademarks can be found at www.cisco.com/go/trademarks. Third party trademarks mentioned are the property of their respective owners. The use of the word partner does not imply a partnership relationship between Cisco and any other company. (1005R)
Any Internet Protocol (IP) addresses used in this document are not intended to be actual addresses. Any examples, command display output, and figures included in the document are shown for illustrative purposes only. Any use of actual IP addresses in illustrative content is unintentional and coincidental.
© 2013 Cisco Systems, Inc. All rights reserved.