Table Of Contents
Creating Manifest Files for Cisco ACNS Software, Release 5.0.3
Writing a Single-Item HTTP Manifest File
Writing a Single-Item FTP Manifest File
Writing an FTP Crawler Manifest File
Writing an HTTPS Crawler Manifest File
Migrating from ACNS 4.x Software to ACNS 5.0 Software
Writing Common Regular Expressions
Specifying a Single Content Item
Scheduling Content Acquisition
Specifying Attributes for Content Serving
Specifying Metadata for Content Serving
Specifying Time Values in the Manifest File
Refreshing and Removing Content
Running the Manifest Validator Utility
Understanding Manifest File Validator Output
Correcting Manifest File Syntax
Manifest File Structure and Syntax
Configuring Freshness of Pre-Positioned Content
Manifest File Time Zone Tables
Manifest File Automated Scripts
Installing Perl on Your Workstation
Listing Website Content Using the Spider Script
Selecting Live and Pre-Positioned Content Using the Manifest Script
Creating a Rules File for the Spider and Manifest Scripts
Creating Manifest Files for Cisco ACNS Software, Release 5.0.3
Note
This chapter replaces Chapter 6, "Creating Manifest Files", in the Cisco ACNS Software Deployment and Configuration Guide, Release 5.0 for ACNS 5.0.3 software. Go to the following link to access this guide on Cisco.com: http://www.cisco.com/en/US/products/sw/conntsw/ps491/products_configuration_guide_book09186a008012b6c4.html
This chapter describes the process for creating manifest files used to acquire and distribute content with ACNS 5.0.3 software. This chapter is divided into two major sections:
•
Manifest File User Guidelines
This first major section provides:
–
An overview of manifest files and their purpose in a Cisco CDN
–
A quick start section to get you up and running immediately
–
A getting started section that describes how to complete specific tasks
–
Useful sample manifest files
–
A syntax validation utility
–
An explanation of live content distribution
This second major section describes:
–
Detailed manifest file structure and syntax
–
XML schema
–
How to run automated manifest file scripts
–
Manifest file time zone tables
Manifest File User Guidelines
This first major section contains the following topics:
Overview
The Cisco ACNS 5.0 software manages the acquisition and distribution of pre-positioned content through an Extensible Markup Language (XML)-based reference file called the manifest file. The manifest file lists content that is to be used to populate Content Engines registered on a Cisco CDN. There should be one manifest file per channel.
The manifest file is placed on an origin server and identified by a unique URL. You specify its location by entering the manifest file URL in the Modifying Channel window of the Content Distribution Manager GUI. Unlike ACNS 4.x software, ACNS 5.0 software does not store pre-positioned content on the Content Distribution Manager. Instead, the root Content Engine for the channel fetches the content from origin servers and distributes it to the other Content Engines.
The Content Distribution Manager disseminates the manifest file URL to each of the root Content Engines of the CDN. The root Content Engine then parses the file and checks for any new or changed information. After the root Content Engine determines what content is new, it fetches only that new pre-positioned or live content from the one or more origin servers specified in the manifest file.
The manifest file has the following features:
•
Administrators and content providers can provide content on an origin server.
•
Files can be imported over HTTP, HTTPS, or FTP and then served over a different protocol, such as a streaming protocol, depending on the type of media playserver designated to play back the requested file.
Content acquisition and distribution can be controlled by setting pre-scheduled content availability dates and times. Two content acquisition methods can be configured within the manifest file. The first method specifies the acquisition of a single <item>. The second method specifies content acquisition by crawling a website or FTP server with the <crawler> feature. Either of these two methods can schedule when the acquisition is to start and how often its content is to be checked for freshness.
Quick Start
This section helps you write manifest files that you can use to acquire content immediately. See the other sections of this chapter to learn about attributes that customize manifest files further and for more information on correct manifest file syntax.
Note
The username and password specified in the channel properties serve only to fetch the manifest file. The content acquisition process does not use them. To fetch the actual content, specify the username and password in the <server><host> tags.
Writing XML Tags
The manifest file is a text file written in XML format. An XML file consists of a series of XML tags. The following is an example of a simple XML tag:
<item attr1="value1" attr2="value2" />
In the preceding example, "item" is the name of the XML tag, so this tag is called the "item" tag. A tag can have many attributes in the form name="value". The value must be enclosed in double quotation marks. The "item" tag in the example has two attributes: the first, "attr1," has the value "value1," and the second, "attr2," has the value "value2."
Tags typically start with "<" and end with "/>", but they can also start with "<" and end with ">". If a tag ends with ">", its scope is not yet complete. To complete its scope, a tag named "tag-name" must be closed with "</tag-name>". For example:
<server name="name" >
  <host name="name" />
</server>
On the first line of the example, the <server> tag ends with ">", so its scope does not end on the first line. Its scope ends on the third line with the </server> tag. Because the <host> tag is inside the <server> tag, the <host> tag is called a subtag of the <server> tag, and the <server> tag is the parent tag of the <host> tag.
Two tag relationships can exist between XML tags: peer and subtag. In the following example, the two "item" tags have a peer relationship:
<item src="url1" />
<item src="url2" />
The key to identifying their peer relationship is that the first tag ends with "/>" before the second tag starts. In the following example, the <server> and <host> tags have a subtag relationship:
<server name="cisco">
  <host name="url" />
</server>
The key to identifying their subtag relationship is that the first tag ends with ">" before the second tag starts. The <host> tag is the subtag of the <server> tag; that is, the <server> tag is the parent of the <host> tag.
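Python's standard xml.etree.ElementTree module can make these two relationships visible. The following sketch parses the preceding examples and prints each parent with its direct children. The <root> wrapper around the two peer <item> tags is an illustration-only assumption. It is needed because a stand-alone XML document must have exactly one top-level tag; in a real manifest file, <CdnManifest> plays that role.

```python
import xml.etree.ElementTree as ET

# Two peer <item> tags: the first closes with "/>" before the second starts.
# The <root> wrapper exists only because an XML document needs one top-level tag.
peers = ET.fromstring('<root><item src="url1" /><item src="url2" /></root>')

# A parent and its subtag: <server> ends with ">", so <host> falls inside its scope.
nested = ET.fromstring('<server name="cisco"><host name="url" /></server>')

def describe(parent):
    """Print each direct child of an element together with its attributes."""
    for child in parent:
        print(f"{parent.tag} -> {child.tag} {child.attrib}")

describe(peers)   # root -> item {'src': 'url1'}, then root -> item {'src': 'url2'}
describe(nested)  # server -> host {'name': 'url'}
```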
Important Manifest Tags
This section briefly describes the important manifest tags to help you understand manifest files.
•
<CdnManifest> tag
The <CdnManifest> </CdnManifest> tag set must be the top-level tag of an ACNS 5.0 software manifest file. The tag set is required and marks the beginning and end of the manifest file content. Each <CdnManifest> tag set must contain at least one item, or content object, to be fetched and stored.
For example,
<CdnManifest>
...
</CdnManifest>
•
<item> tag
The <item> tag is used to specify a single file to be pre-positioned.
–
item src attribute
The src attribute is required and specifies the URL. The URL can be an absolute URL in the following format:
proto://username:password@domain-name:port/file-path/file-name
The src attribute can also use a relative URL, but you need to use the <server><host> tags to specify proto, domain, username, and password.
•
<crawler> tag
The <crawler> tag is used to specify a crawl job. You can use the <crawler> tag to crawl an FTP directory and its subdirectories or to crawl directories using HTTP directory indexing.
–
crawl directories
To crawl directories over HTTP and fetch the files in them, enable the built-in directory indexing feature of the web server.
If a URL points to a directory when this directory indexing feature is enabled, the web server dynamically generates an HTML page and lists all the files and subdirectories. By parsing such an HTML page, the ACNS software can identify those files it can fetch from that particular directory.
–
crawler start-url attribute
The start-url attribute is required and specifies the start URL for crawling. It can be an absolute URL or a relative URL (see the <item> tag).
–
crawler depth attribute
The depth attribute specifies the directory depth of the web crawl. A depth value of 0 allows a crawl of only the starting URL page, while a depth value of 1 allows a crawl of the starting URL page and its links.
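The depth semantics can be illustrated with a breadth-first traversal. The following Python sketch is not the ACNS crawler. It assumes a small hypothetical link graph (LINKS) and a hypothetical crawl() helper, and it shows only how a depth limit bounds which pages are visited.

```python
from collections import deque

# Hypothetical link graph standing in for a website: page -> pages it links to.
LINKS = {
    "index.html": ["a.html", "b.html"],
    "a.html": ["a1.html"],
    "b.html": [],
    "a1.html": ["a2.html"],
    "a2.html": [],
}

def crawl(start_url, depth):
    """Breadth-first crawl that follows links at most `depth` levels from start_url."""
    seen = {start_url}
    order = []
    queue = deque([(start_url, 0)])
    while queue:
        url, level = queue.popleft()
        order.append(url)
        if level == depth:
            continue  # do not follow links beyond the configured depth
        for link in LINKS.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, level + 1))
    return order

print(crawl("index.html", 0))  # ['index.html']
print(crawl("index.html", 1))  # ['index.html', 'a.html', 'b.html']
```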
Writing a Single-Item HTTP Manifest File
The following sample shows the simplest way to write a manifest file that fetches content using the HTTP protocol.
<CdnManifest>
  <item src="http://www.my-server.com/project-one.html" />
  <item src="http://www.my-server.com/my-eng-group/project-two.html" />
  <item src="http://www.my-server.com/project-three.html" />
</CdnManifest>
The <CdnManifest> tag set is required to specify a manifest file. When executed, the preceding manifest file sample instructs the ACNS software to fetch the following items using HTTP:
•
http://www.my-server.com/project-one.html
•
http://www.my-server.com/my-eng-group/project-two.html
•
http://www.my-server.com/project-three.html
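If the list of URLs already exists elsewhere, you can generate a manifest like the preceding one instead of typing it. The following Python sketch uses the standard xml.etree.ElementTree module. The build_manifest() helper is hypothetical and is not part of ACNS software. It writes all the tags on one line, which XML permits.

```python
import xml.etree.ElementTree as ET

def build_manifest(urls):
    """Return a manifest string with one <item> tag per absolute URL."""
    root = ET.Element("CdnManifest")
    for url in urls:
        ET.SubElement(root, "item", src=url)
    return ET.tostring(root, encoding="unicode")

manifest = build_manifest([
    "http://www.my-server.com/project-one.html",
    "http://www.my-server.com/my-eng-group/project-two.html",
    "http://www.my-server.com/project-three.html",
])
print(manifest)
```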
Writing a Single-Item FTP Manifest File
The following sample shows the simplest way to write a manifest file that fetches content using the FTP protocol.
<CdnManifest>
  <item src="ftp://johnw:my-pass-word@myftp.cisco.com/file1.txt" />
  <item src="ftp://johnw:my-pass-word@myftp.cisco.com//full-path/file2.txt" />
</CdnManifest>
Notice that after the FTP server name "myftp.cisco.com," the first URL has one slash ("/") and the second URL has two slashes ("//"). The first file path, "file1.txt," is a relative file name, relative to the home directory of the FTP login user. The second file path, "/full-path/file2.txt," is an absolute path.
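The slash convention can be checked mechanically. This sketch uses Python's urllib.parse.urlsplit. The describe_ftp_item() helper is hypothetical and simply mirrors the rule stated above: one slash before a relative path, two slashes before an absolute path. It does not contact the FTP server.

```python
from urllib.parse import urlsplit

def describe_ftp_item(url):
    """Split an FTP item URL and report whether its file path is relative or absolute."""
    parts = urlsplit(url)
    # Drop the single "/" that separates the host name from the file path.
    path = parts.path[1:]
    kind = "absolute" if path.startswith("/") else "relative to the login home directory"
    return parts.username, parts.hostname, path, kind

for url in (
    "ftp://johnw:my-pass-word@myftp.cisco.com/file1.txt",
    "ftp://johnw:my-pass-word@myftp.cisco.com//full-path/file2.txt",
):
    print(describe_ftp_item(url))
```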
Writing an FTP Crawler Manifest File
The following sample shows the simplest way to write a crawler manifest file that fetches content using FTP protocol.
<CdnManifest>
  <crawler start-url="ftp://ftp-server/folder/" depth="10" ttl="10" />
</CdnManifest>
The preceding manifest file sample instructs the ACNS software to start crawling from ftp://ftp-server/folder/, to crawl ten directory levels deep, and to check those directories for freshness every 10 minutes.
The <crawler> tag specifies the crawl task. The start-url attribute specifies where the crawler starts crawling. The depth attribute, set to 10 here, specifies how many levels of subdirectories the crawler checks to obtain the required content. The ttl attribute specifies how often, in minutes, the content is checked for freshness. The ttl attribute can also be specified in a single-item manifest file.
Writing an HTTPS Crawler Manifest File
The following manifest file sample instructs the ACNS software to start crawling from https://www.cisco.com/jobs/eng/ to a depth of five levels.
<CdnManifest>
  <crawler start-url="https://www.cisco.com/jobs/eng/" depth="5" />
</CdnManifest>
As with the single-item manifest file, the <CdnManifest> tag set is required. If directory indexing is enabled for the jobs/eng directory and its subdirectories, the crawler goes five directory levels deep to retrieve the files. If the origin server is running HTTP or HTTPS services, directory indexing must be enabled for these directories. Enabling directory indexing allows a request to that directory to return a list of the files in it and allows ACNS 5.0 software to crawl the directory.
Validating Manifest Files
It is a good idea to use the Manifest Validator utility to validate your manifest file after it is created. See the "Manifest Validator Utility" section for more information on the Manifest Validator utility.
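Before you run the Manifest Validator, a quick local check for XML well-formedness can catch unclosed tags and mismatched quotation marks. The following Python sketch does only that: it runs a generic XML parse and checks for the <CdnManifest> top-level tag. It is not the Cisco Manifest Validator and does not check tag names, attribute names, or values against the manifest schema. The check_well_formed() helper is hypothetical.

```python
import xml.etree.ElementTree as ET

def check_well_formed(text):
    """Return None if the text is well-formed XML rooted at <CdnManifest>, else a message."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)  # the parser's message includes the line and column
    if root.tag != "CdnManifest":
        return f"unexpected top-level tag <{root.tag}>"
    return None

good = '<CdnManifest><item src="http://www.my-server.com/a.html" /></CdnManifest>'
bad = '<CdnManifest><item src="http://www.my-server.com/a.html"></CdnManifest>'  # <item> never closed
print(check_well_formed(good))  # None
print(check_well_formed(bad))
```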
Migrating from ACNS 4.x Software to ACNS 5.0 Software
Unlike ACNS 4.3 software, ACNS 5.0 software requires one or more origin servers where the source files for pre-positioned content are stored. These origin servers must run HTTP, FTP, or HTTPS services so that CDN devices can fetch the pre-positioned files.
Files associated with a particular channel are typically stored in the same directory or subdirectory on the origin server. If the origin server is running HTTP or HTTPS services, directory indexing must be enabled for these directories. Enabling directory indexing allows a request to that directory to return a list of files in that directory and allows ACNS 5.0 software to crawl the directory.
Once the content is uploaded to a suitable origin server and available, you can use the following simple manifest file to specify content acquisition.
<CdnManifest>
  <crawler start-url="http://my-server/my-path/" ttl="10" />
</CdnManifest>
In the preceding sample, a crawl job is specified for the associated channel to check the "my-path" directory for freshness every 10 minutes. Once this setup is complete, the root Content Engine associated with this channel monitors this directory for new or updated files and automatically fetches them.
Running the preceding manifest file sample achieves the same objective as the ACNS 4.2 software feature in which users copy pre-positioned files into a Content Distribution Manager import folder. However, the manifest file is more powerful than the Content Distribution Manager import feature. For example, if you store content at different locations but must distribute it through the same channel, you can create multiple crawl jobs in the manifest file to monitor these locations, as in the following sample:
<CdnManifest>
  <crawler start-url="http://my-server/my-path-http/" ttl="10" />
  <crawler start-url="ftp://my-server/my-path-ftp/" ttl="10" />
</CdnManifest>
This manifest file monitors the "my-path-http" directory on an HTTP server and the "my-path-ftp" directory on an FTP server.
Getting Started
The manifest file, whose URL is stored in the Content Distribution Manager GUI, allows you to define a series of servers from which content can be fetched, as well as a list of content objects on each server to be fetched. Written in XML, a finished manifest file contains a series of URLs pointing to pre-positioned content.
This section explains the structure of the XML-based manifest file. In the manifest file syntax samples that follow, note the capitalization and data formats used. For your finished manifest file to be executed successfully, XML tags and tag attributes must use the format outlined in this section.
Sample Manifest File
The following example shows a simple functional manifest file. Use this example as a model when creating or troubleshooting your own manifest files.
<?xml version="1.0"?>
<CdnManifest>
  <playServerTable>
    <playServer name="wmt">
      <contentType name="wmt"/>
    </playServer>
  </playServerTable>
  <options noRedirectToOrigin="true"/>
  <server name="server0">
    <host name="http://www.cnn.com"/>
  </server>
  <item-group server="server0" serveStartTime="2003-01-12 14:00:00 PST" serveStopTime="2099-04-12 14:00:00 PST">
    <item src="item-01"/>
    <crawler start-url="crawler-01" depth="10"/>
  </item-group>
</CdnManifest>
The format of the manifest file is important because the manifest file specifies which content objects are imported into your CDN for pre-positioning on edge devices, such as Cisco Content Engines. With the manifest file, you can specify where to obtain web content objects, how long these objects should remain on the Content Engines of your CDN, and how frequently the ACNS software should check their freshness.
Using a simple text editor, you can write acquisition and pre-positioning instructions in XML format. The actual manifest file resides on a web server that the Content Distribution Manager can access, and the manifest file URL is stored in the Content Distribution Manager GUI. The ACNS software takes its instructions from the manifest file, acquires content from the origin server, and pre-positions it to the appropriate edge devices on your CDN. You can use the manifest file to fetch content from servers in either of the following ways:
•
Fetch one or multiple single items or URLs.
•
Start a crawler job using its associated parameters, such as starting URL, level of directory depth, prefix, and filter, to accept or reject content using criteria you have specified.
You can also schedule when content acquisition is to start and how often content should be checked for freshness. You must also provide information on how end users can access pre-positioned content on the CDN. For example, you must specify which playserver plays the media, how to access the content, when the content is to be served, and any additional metadata for media playback.
Using a Text Editor
Because XML files, like HTML files, are simple text format files that use special tags or elements to designate how content is to be handled and represented on a website, it is possible to create manifest files using any ASCII text editor. A variety of third-party XML authoring tools also exist, and they can speed the process of generating manifest files.
Unlike HTML, which serves as a language for creating web pages, XML is a language for creating languages. In this case, the manifest file becomes the XML application. The XML application contains tags that describe the information that is contained within the tags. This information is extracted from the manifest file XML application and reused repeatedly to carry out tasks, or it is merged with other information from a different source and the result used in a different framework or for a different function.
Writing XML is not as forgiving as writing HTML. XML is sensitive to uppercase and lowercase letters, the use of quotation marks, the proper closure of tags, and other formats that require exceptional attention to detail. Care must be taken to ensure that XML tags are properly formatted and otherwise syntactically correct. Incorrectly formatted data, such as incorrect usage of capitalization in a tag or tag attribute, results in syntax errors.
Formatting XML Files
The manifest file must be written using the XML format described in the "Manifest File Structure and Syntax" section. An XML file is a plain text file with tags. The following is an example of a simple XML tag:
<sample-tag/>
The tag begins with the left angle bracket (<) and ends with a forward slash and a right angle bracket (/>). The name of this tag is "sample-tag."
The following is an example of a tag with attributes:
<sample-tag name1="value1" name2="value2" />
The following sample tag has attributes and a subtag:
<sample-tag name1="value1" name2="value2">
  <sub-tag name1="value1" name2="value2"/>
</sample-tag>
If a tag contains a subtag, the attribute list of the parent tag must end with a right angle bracket (>) instead of a forward slash and a right angle bracket (/>), and the entire tag must end with </tag-name>.
For more information on XML or XML tutorials, refer to the following links:
Writing Common Regular Expressions
A regular expression is a formula for matching strings that follow a recognizable pattern. The following special characters have special meanings in regular expressions:
. * \ ? [ ] ^ $
If the regular expression does not include any of these special characters, only an exact match satisfies the search. For example, the expression "stock" matches only strings that contain the exact substring "stock."
For more information about writing regular expressions, refer to the following website:
http://yenta.www.media.mit.edu/projects/Yenta/Releases/Documentation/regex-0.12/
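The difference between an escaped and an unescaped special character is easy to miss. The following Python sketch uses the re module, whose syntax for these basic characters follows the common conventions. The URLs are hypothetical. The sketch also assumes, without confirmation from this chapter, that an expression is applied as a substring search anywhere in a URL. This is why expressions such as "\.mpg" escape the period.

```python
import re

urls = ["http://www.my-server.com/video/clip.mpg",
        "http://www.my-server.com/video/mpg-guide.html"]

# Unescaped, "." matches any character, so ".mpg" also matches "/mpg" in the second URL.
print([u for u in urls if re.search(r".mpg", u)])
# Escaped, "\." matches only a literal period, so "\.mpg" matches just the .mpg file.
print([u for u in urls if re.search(r"\.mpg", u)])
# With no special characters, the expression must appear literally in the string.
print(bool(re.search("stock", "http://www.my-server.com/stockquote.html")))  # True
```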
Working with Manifest Files
This section provides manifest file samples for carrying out specific tasks. Each sample has an associated explanation of its purpose and function. The manifest file can specify a single content object, a website crawler job, or an FTP server crawler job to acquire pre-positioned content or live content that is distributed to edge Content Engines later.
Specifying a Single Content Item
The following manifest file example specifies single content items.
<CdnManifest>
  <item src="http://www.my-server/test.html" />
  <item src="test.html" />
  <server name="my-origin-server-one">
    <host name="http://www.my-server-one.com/eng/" />
  </server>
  <server name="my-origin-server-two">
    <host name="http://www.my-server-two.com/eng/" />
  </server>
  <item src="project-two.html" />
  <item server="my-origin-server-one" src="project-one.html" />
</CdnManifest>
For a single item, the key is to specify the item URL in the src attribute. There are two ways to specify the item URL:
•
Specify the src attribute with the absolute URL in the following format:
proto://username:password@domain-name:port/file-path/file-name
The first <item> uses the full path.
•
Specify the origin server information using the <server><host> tags, and use the src attribute to specify only the relative path.
In the preceding example, every <item> tag except the first one uses a relative path. The second <item> tag uses the manifest server, so test.html is relative to the manifest file URL. The third <item> tag, "project-two.html," uses "my-origin-server-two," because that is the last <server> tag above it. The fourth <item> tag, "project-one.html," uses "my-origin-server-one," because its server attribute names that server.
Use the <item> tag to specify a single content item, object, or URL. The required src attribute specifies the URL of the item, or only its relative path when the origin server is defined by a <server> tag. If the server attribute is omitted, the server in the last <server> tag above the item is used. If no <server> tag precedes the item in the manifest file, the server that hosts the manifest file is used, which means that the relative URL is relative to the manifest file URL.
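The server-selection rule can be expressed as a single pass over the manifest in document order. The following Python sketch applies it to the preceding example. The resolve_items() helper and the manifest location MANIFEST_URL are hypothetical. The sketch looks only at top-level <item> and <server> tags, not at <item-group>, and it assumes that a relative src is resolved against the selected <host> URL, as described above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical location of the manifest file itself.
MANIFEST_URL = "http://www.manifest-host.com/channel/manifest.xml"

MANIFEST = """<CdnManifest>
  <item src="http://www.my-server/test.html" />
  <item src="test.html" />
  <server name="my-origin-server-one"><host name="http://www.my-server-one.com/eng/" /></server>
  <server name="my-origin-server-two"><host name="http://www.my-server-two.com/eng/" /></server>
  <item src="project-two.html" />
  <item server="my-origin-server-one" src="project-one.html" />
</CdnManifest>"""

def resolve_items(manifest_text, manifest_url):
    """Yield the absolute URL of each top-level <item>, in document order."""
    servers = {}        # server name -> host URL
    last_server = None  # name of the most recent <server> tag seen so far
    for tag in ET.fromstring(manifest_text):
        if tag.tag == "server":
            servers[tag.get("name")] = tag.find("host").get("name")
            last_server = tag.get("name")
        elif tag.tag == "item":
            # Explicit server attribute, else the last <server> above, else the manifest URL.
            name = tag.get("server", last_server)
            base = servers[name] if name else manifest_url
            # urljoin() returns an absolute src unchanged.
            yield urljoin(base, tag.get("src"))

for url in resolve_items(MANIFEST, MANIFEST_URL):
    print(url)
```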
Note
Before any content can be acquired, you must enter the URL that defines the location of the manifest file in the Content Distribution Manager GUI. In the Modifying Channel window, enter the location URL of the manifest file, its Time To Live (TTL), the username, and the password required to access the manifest file (if the location is password-protected).
Specifying a Crawl Job
The web crawler application methodically and automatically searches acceptable websites and makes a copy of the visited pages for later processing. The web crawler starts with a list of URLs to visit, identifies every web link in each page, and adds these links to the list of URLs to visit. The process ends when one or more of the following conditions are met:
•
Links have been followed to a specified depth.
•
Maximum number of objects has been acquired.
•
Maximum content size has been acquired.
By crawling a site at regular intervals using the Time To Live (or ttl) attribute, these links and their associated content can be updated regularly to keep the content fresh. For more information on the ttl attribute, see the "Refreshing and Removing Content" section.
Use the <crawler> tag to specify the website or FTP server crawler attributes. Table 6-1 lists the attributes, states whether these attributes are required or optional, and describes their functions.
Note
If you specify both the max-number and max-size attributes as the criteria to use to stop a crawler job, the condition that is met first takes precedence. That is, the crawler job stops either when the maximum number of objects is acquired or when the maximum content size is reached, whichever occurs first. For example, if the crawler job has acquired the maximum number of objects specified in the manifest file but has not yet reached the maximum content size, the crawler job stops.
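The whichever-comes-first rule can be sketched as a loop with two independent limits. This Python sketch is an illustration only. The crawl_until_limit() helper is hypothetical, and the object sizes are made-up values. It assumes that both limits are checked before each new object is acquired, so the last object can take the total slightly past the size limit. The actual ACNS accounting may differ.

```python
def crawl_until_limit(object_sizes, max_number=None, max_size=None):
    """Acquire objects in order until either limit is reached, whichever comes first.

    Returns (objects acquired, total size acquired); sizes and max_size share one unit.
    """
    acquired, total = 0, 0
    for size in object_sizes:
        # Check both limits before acquiring the next object.
        if max_number is not None and acquired >= max_number:
            break
        if max_size is not None and total >= max_size:
            break
        acquired += 1
        total += size
    return acquired, total

sizes = [40, 40, 40, 40, 40]  # hypothetical object sizes in MB
print(crawl_until_limit(sizes, max_number=2, max_size=200))   # (2, 80): count limit reached first
print(crawl_until_limit(sizes, max_number=10, max_size=100))  # (3, 120): size limit reached first
```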
The following is an example of a website crawler job.
<server name="cisco">
  <host name="http://www.cisco.com/jobs/" />
</server>
<crawler server="cisco"
  start-url="eng/index.html"
  depth="10"
  prefix="eng/"
  reject="\.pl"
  max-size-in-MB="200"
/>
The attributes of this website crawler job example are:
•
The start-url path is http://www.cisco.com/jobs/eng/index.html.
•
Search to a website link depth of 10.
•
Search URLs with the prefix http://www.cisco.com/jobs/eng/.
•
Reject URLs containing .pl (Perl script pages).
•
Crawl only until 200 megabytes in total content size is acquired.
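The prefix and reject attributes in this example act on each URL the crawler discovers. The following Python sketch shows one plausible reading of them: prefix is resolved against the <host> URL and compared with startswith(), and reject is applied as a regular expression search. Both interpretations, the should_follow() helper, and the candidate URLs are assumptions for illustration. They are consistent with the list above but are not taken from the ACNS implementation.

```python
import re
from urllib.parse import urljoin

HOST = "http://www.cisco.com/jobs/"  # from <host name="...">
PREFIX = urljoin(HOST, "eng/")       # prefix="eng/" resolved against the host URL
REJECT = re.compile(r"\.pl")         # reject="\.pl"

def should_follow(url):
    """Return True if a discovered URL passes both the prefix and reject constraints."""
    return url.startswith(PREFIX) and REJECT.search(url) is None

candidates = [
    "http://www.cisco.com/jobs/eng/openings.html",
    "http://www.cisco.com/jobs/eng/apply.pl",
    "http://www.cisco.com/jobs/sales/openings.html",
]
print([url for url in candidates if should_follow(url)])
# ['http://www.cisco.com/jobs/eng/openings.html']
```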
If the server attribute is omitted, the server in the last <server> tag above the <crawler> tag is used. If no <server> tag precedes it in the manifest file, the server that hosts the manifest file is used, which means that the relative URL is relative to the manifest file URL.
Scheduling Content Acquisition
Two attributes, ttl and prefetch, are used to schedule content acquisition. Use ttl to specify how often, in minutes, the content is checked for freshness. For example, to check for page freshness every day, enter ttl="1440".
In the following example, page freshness is scheduled to be checked once a day.
<item src="index.html" ttl="1440" />
In the following example, the site is crawled to a link depth of 2 and checked for freshness every hour.
<crawler start-url="index.html" depth="2" ttl="60" />
If the content is not yet available at a particular URL, use the prefetch attribute to specify the start time for acquisition at that URL. For example, prefetch="2002-06-28 18:35:21" means that the content acquisition job can start only at 18:35:21 on June 28, 2002.
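Taken together, prefetch and ttl define a simple schedule: acquisition starts at the prefetch time, and the content is then checked for freshness about every ttl minutes. The following Python sketch assumes, for illustration only, that checks fall at exact ttl intervals after the prefetch time, and it ignores time zones. The check_times() helper is hypothetical. The sketch also shows why the yyyy-mm-dd order matters: a value such as 2002-28-06 is rejected, because 28 is not a valid month.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # yyyy-mm-dd hh:mm:ss

def check_times(prefetch, ttl_minutes, count):
    """Return the first `count` times at which acquisition or a freshness check would run."""
    start = datetime.strptime(prefetch, TIME_FORMAT)
    step = timedelta(minutes=ttl_minutes)
    return [(start + i * step).strftime(TIME_FORMAT) for i in range(count)]

print(check_times("2002-06-28 18:35:21", 60, 3))
# ['2002-06-28 18:35:21', '2002-06-28 19:35:21', '2002-06-28 20:35:21']

# In yyyy-mm-dd order, 2002-28-06 would mean month 28, so parsing fails.
try:
    check_times("2002-28-06 18:35:21", 60, 1)
except ValueError as err:
    print("rejected:", err)
```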
The following example schedules an hourly crawl of this website to a link depth of 2, starting on November 9, 2001 at 8:45 a.m.
<crawler start-url="index.html" depth="2" prefetch="2001-11-09 08:45:12" ttl="60" />
Specifying Shared Attributes
Several <item> tags often share the same attribute values. Instead of writing these attributes individually in every <item> tag, you can move them into a higher-level tag called <item-group>, where they are shared. Create an <item-group> tag one level below the <CdnManifest> tag, write the <item> tags into it as subtags, and move the shared attributes into the <item-group> tag, as shown in the following example:
<?xml version="1.0"?>
<CdnManifest>
  <server name="cisco-cco">
    <host name="http://www.cisco.com" proto="http" />
  </server>
  <item-group server="cisco-cco" ttl="1440" type="prepos" >
    <item src="jobs/index.html"/>
    <item src="jobs/index1.html"/>
    <item src="jobs/index2.html"/>
    <item src="jobs/index3.html"/>
    <item src="jobs/index4.html"/>
    <item src="jobs/index5.html"/>
  </item-group>
</CdnManifest>
You can also use the <options> tag to share attributes at the topmost level of the manifest file. Shared attributes in the <options> tag apply to every <item> and <crawler> tag in the manifest file. However, if a shared attribute is specified in both the <item-group> and <item> tags, or in both the <options> and <item> tags, the value in the <item> tag takes precedence over the value in the <item-group> or <options> tag. For a list of shared attributes, see the "options" section.
The following example illustrates this precedence rule. The first <item> tag takes the TTL value 1440 from the <options> tag, but the second <item> uses its own TTL value of 60.
<options ttl="1440" />
<item src="index.html" />
<item src="index1.html" ttl="60" />
If you need to create a manifest file with many single items or URLs, Perl scripts are available to generate the <item> tags. See the "Manifest File Automated Scripts" section to use these automated Perl scripts.
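The precedence rule amounts to a layered dictionary merge in which the most specific tag is applied last. In this Python sketch, effective_attributes() is a hypothetical helper, and tags are modeled as plain dictionaries of attribute strings. The relative order of <item-group> and <options> is an assumption, because the text above states only that <item> values override both.

```python
def effective_attributes(options, item_group, item):
    """Merge shared attributes so that values on the <item> tag take precedence.

    Assumption: <item-group> values override <options> values; the text above only
    states that <item> values override both.
    """
    merged = dict(options)
    merged.update(item_group)
    merged.update(item)
    return merged

options = {"ttl": "1440"}
for item in ({"src": "index.html"}, {"src": "index1.html", "ttl": "60"}):
    print(effective_attributes(options, {}, item))
# {'ttl': '1440', 'src': 'index.html'}
# {'ttl': '60', 'src': 'index1.html'}
```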
Specifying a Crawler Filter
With a rule-based crawler filter, you can crawl an entire website and acquire only content with certain predefined characteristics. Crawler attributes in the <crawler> tag do not act as filters; they only define how the crawl is performed. The <matchRule> tag acts as a rule-based filter. You can define rule-based matches for file extension, size, content type, and time stamp. In the following example, the crawl job is instructed to crawl the whole website starting at "index.html," but to acquire only files that have the .jpg extension and are 50 kilobytes or larger.
<crawler start-url="index.html" >
  <matchRule>
    <match size-min-in-KB="50" extension="jpg" />
  </matchRule>
</crawler>
There can be multiple <match> subtags within a <matchRule> tag. Table 6-2 lists and describes the <match> subtag attributes.
Table 6-2 <match> Subtag Attributes
Attribute | Description
mime-type | Specifies match of these MIME types.
extension | Specifies match of files with these extensions.
time-before | Specifies match of files modified before this time (using the Greenwich mean time [GMT] time zone) in yyyy-mm-dd hh:mm:ss format. See the "options" section for a description of the timezone attribute.
time-after | Specifies match of files modified after this time (using the Greenwich mean time [GMT] time zone) in yyyy-mm-dd hh:mm:ss format.
size-min-in-MB, size-min-in-KB, size-min-in-B | (Optional) Specifies match of content size equal to or larger than this value. The size can be expressed in megabytes (MB), kilobytes (KB), or bytes (B).
size-max-in-MB, size-max-in-KB, size-max-in-B | (Optional) Specifies match of content size equal to or smaller than this value. The size can be expressed in megabytes (MB), kilobytes (KB), or bytes (B).
A <match> subtag can specify multiple attributes. Attributes within a <match> tag have a Boolean AND relationship. In the following example, to satisfy this match rule, a file must have the .mpg extension AND its size must be 50 kilobytes or larger.
<match extension="mpg" size-min-in-KB="50" />
There is a Boolean OR relationship between the <match> subtags themselves. A <matchRule> tag can have multiple <match> subtags, and content needs to match only one of them. The <matchRule> tag can be specified as a subtag of the <crawler> tag or of the <item-group> tag. If a <matchRule> tag is specified in an <item-group> tag, it is shared by every <crawler> tag within that <item-group> tag.
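The AND-within, OR-between logic can be written out directly. The following Python sketch handles only the extension and size-min-in-KB attributes. The dictionary layout, the helper names, and the assumption that 1 KB equals 1024 bytes are illustrative choices, not ACNS behavior confirmed by this chapter.

```python
def matches(file_info, match):
    """One <match> subtag: every attribute present must be satisfied (Boolean AND)."""
    if "extension" in match and file_info["extension"] != match["extension"]:
        return False
    if "size-min-in-KB" in match and file_info["size_bytes"] < match["size-min-in-KB"] * 1024:
        return False
    return True

def match_rule(file_info, rules):
    """A <matchRule> tag: the file is kept if any one <match> succeeds (Boolean OR)."""
    return any(matches(file_info, m) for m in rules)

rules = [{"extension": "mpg", "size-min-in-KB": 50},  # .mpg files of 50 KB or more
         {"extension": "jpg"}]                        # or any .jpg file
print(match_rule({"extension": "mpg", "size_bytes": 80 * 1024}, rules))  # True
print(match_rule({"extension": "mpg", "size_bytes": 10 * 1024}, rules))  # False
print(match_rule({"extension": "jpg", "size_bytes": 1024}, rules))       # True
```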
Note
A common mistake is to use the accept or reject attributes in the <crawler> tag as a crawler filter.
For example, to crawl for files with the .mpg extension, specifying accept="\.mpg" does not work as intended. Although the syntax is valid, the crawl does not proceed past pages whose URLs fail the accept constraint, because those pages are not searched. For example, if the starting URL is index.html, this HTML file is parsed and any links not containing .mpg are rejected. If the .mpg files are located at the second or a lower link level, they are not fetched, because the links leading to them have been rejected.
To crawl properly for the .mpg extension, use <matchRule>: specify <matchRule><match extension="mpg" /></matchRule> within the <crawler> tag. The whole site is crawled, and only files with the .mpg extension are retained.
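The difference between pruning links with accept and filtering results with <matchRule> shows up even on a three-page site. The following Python sketch assumes a hypothetical site (LINKS) in which index.html links to a listing page that links to the video. It models accept as a condition for following a link and <matchRule> as a filter applied after a full crawl.

```python
import re

# Hypothetical site: index.html links to a listing page, which links to the video.
LINKS = {"index.html": ["videos.html"], "videos.html": ["clip.mpg"], "clip.mpg": []}

def crawl(start, follow):
    """Depth-first crawl that only visits linked URLs for which follow(url) is True."""
    visited, stack = [], [start]
    while stack:
        url = stack.pop()
        if url in visited:
            continue
        visited.append(url)
        stack.extend(link for link in LINKS[url] if follow(link))
    return visited

# accept="\.mpg": links that do not match are never followed, so clip.mpg is unreachable.
print(crawl("index.html", lambda u: re.search(r"\.mpg", u) is not None))  # ['index.html']

# <matchRule><match extension="mpg" /></matchRule>: crawl everything, keep only .mpg files.
everything = crawl("index.html", lambda u: True)
print([u for u in everything if u.endswith(".mpg")])  # ['clip.mpg']
```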
Specifying Content Priority
A priority can be assigned to content objects to define their order of importance. The CDN determines the order of processing from the level of priority of the content. The higher the content priority, the sooner the acquisition of content from the origin server and the sooner the content is distributed to the Content Engines.
Note
Every content object acquired by running a crawler job has the same priority.
Three factors combine to determine content priority:
•
Channel priority: the value selected from the Content Distribution Priority drop-down list in the Acquisition and Distribution Properties area of the Modifying Channels window in the Content Distribution Manager GUI
•
Item index: the order in which content is listed in the manifest file
•
Item priority: the priority specified in the <item> or <crawler> tag
To calculate content priority, use either item-priority or item-index:
•
If the manifest file specifies an item-priority for this content, use the following formula:
content-priority = channel-priority * 10000 + item-priority
Tip
The item priority within the <item> tag can be any integer; its range is unrestricted. To give a particular content object the highest priority, specify a very large integer as the item priority for that content object.
•
If an object has no item priority, its item index (its position in the manifest file) is used instead:
content-priority = channel-priority * 10000 + 10000 - item-index
Note
If there is no priority specified for any items, content is processed in the order listed in the manifest file.
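The two formulas can be sketched in Python as follows. The channel priority values (250 for low, 500 for normal, 750 for high) and the 1-based item index come from the Sample 2 comments later in this chapter; the function name and the item names in the usage example are hypothetical.

```python
from typing import Optional

# Channel priority values as stated in Sample 2 of this chapter.
CHANNEL_PRIORITY = {"low": 250, "normal": 500, "high": 750}


def content_priority(channel: str, item_index: int, item_priority: Optional[int] = None) -> int:
    """Content priority per the formulas above.

    item_index is the 1-based position of the item in the manifest file
    (Sample 2 gives the first item 10000 - 1 = 9999)."""
    base = CHANNEL_PRIORITY[channel] * 10000
    if item_priority is not None:
        return base + item_priority          # explicit item priority
    return base + 10000 - item_index         # fall back to manifest order


# Two items in a normal-priority channel; only the second has a priority attribute.
items = [("first.asf", content_priority("normal", 1)),
         ("second.asf", content_priority("normal", 2, item_priority=20000))]
order = [name for name, _ in sorted(items, key=lambda pair: pair[1], reverse=True)]
print(order)  # ['second.asf', 'first.asf']
```

Because the channel priority is multiplied by 10000, the item index only orders items within a channel unless an explicit item priority is large enough to cross that gap.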
Generating a Playserver List
ACNS 5.0 software supports playservers that play back the following pre-positioned content types on the CDN: HTTP, WMT, and RTSP (RealMedia and QuickTime Streaming Server [QTSS]). The CDN checks whether the requested protocol matches the list in the playserver table. If it matches, the request is delivered. If it does not match, the request is rejected.
You can generate a playserver list through:
•
The manifest file, by configuring playserver attributes in an <item> tag.
•
The <playServerTable> tag by configuring playserver MIME-type extension names.
To create the playserver list directly through the manifest file, configure playserver attributes of the playserver list in an <item> tag. If an <item> tag does not have a playserver attribute, its playserver list is generated through the <playServerTable> tag. If the <playServerTable> tag is omitted in the manifest file, a built-in default <playServerTable> tag is used to generate the playserver list. Multiple servers are separated by commas, as shown in the following example:
<item src="video.mpg" playServer="real,wmt" />
You can also generate a playserver list that supports these streaming media types through the <playServerTable> tag, which maps content to a playserver list based on its MIME type or file extension name. If the manifest file contains a <playServerTable> tag, that table is used instead of the built-in default.
To generate the playserver list through the <playServerTable> tag, use MIME types and extension names to configure which playserver can play the particular pre-positioned content, as shown in the following example:
<playServerTable>
  <playServer name="real">
    <contentType name="application/x-pn-realaudio" />
    <contentType name="application/vnd.rn-rmadriver" />
    <extension name="rm" />
    <extension name="ra" />
    <extension name="rp" />
    <extension name="rt" />
    <extension name="smi" />
  </playServer>
  <playServer name="wmt">
    <extension name="wmv" />
    <extension name="wma" />
    <extension name="wmx" />
    <extension name="asx" />
    <extension name="asf" />
    <extension name="avi" />
  </playServer>
  <playServer name="http">
    <contentType name="application/pdf" />
    <contentType name="application/postscript" />
    <extension name="pdf" />
    <extension name="ps" />
  </playServer>
</playServerTable>
The <playServerTable> tag is used to generate a playserver list for each content type. In the preceding example, any PDF or PostScript file (identified by its MIME type or by the .pdf or .ps extension) is played using HTTP.
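The lookup that the preceding table describes can be sketched in Python. The dictionary mirrors the example table. One assumption is made here: a playserver is included if either the file extension or the MIME type matches one of its entries. The function name is hypothetical.

```python
from typing import Dict, List, Optional

# The <playServerTable> example above, expressed as Python data.
PLAY_SERVER_TABLE: Dict[str, Dict[str, List[str]]] = {
    "real": {"contentTypes": ["application/x-pn-realaudio", "application/vnd.rn-rmadriver"],
             "extensions": ["rm", "ra", "rp", "rt", "smi"]},
    "wmt": {"contentTypes": [],
            "extensions": ["wmv", "wma", "wmx", "asx", "asf", "avi"]},
    "http": {"contentTypes": ["application/pdf", "application/postscript"],
             "extensions": ["pdf", "ps"]},
}


def play_servers_for(filename: str, content_type: Optional[str] = None) -> List[str]:
    """Return the playservers whose extension or MIME-type entries match this content."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    servers = []
    for name, entry in PLAY_SERVER_TABLE.items():
        by_ext = ext in entry["extensions"]
        by_type = content_type is not None and content_type in entry["contentTypes"]
        if by_ext or by_type:
            servers.append(name)
    return servers


print(play_servers_for("report.pdf"))  # ['http']
print(play_servers_for("movie.asf"))   # ['wmt']
```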
Customized Manifest Playserver Tables and the HTTP Playserver
In general, you do not need to specify your own <playServerTable> or playServer attribute in the manifest file. The ACNS 5.x software contains a default playserver table that maps appropriate file extensions or MIME-types to the proper playservers. (See the "Default PlayServer Table" section to view the default playserver table.)
When using the default playserver table, the HTTP playserver is always included in the playserver list. This means that ACNS software always allows pre-positioned content to be played using the HTTP protocol. If the default playserver table does not meet your needs, you can customize your playserver lists by defining your own <playServerTable> or by specifying a playServer attribute in the manifest file.
However, there are subtle implementation differences between ACNS 5.0.1 software and ACNS 5.0.3 software when using customized playserver tables or attributes. In ACNS 5.0.1 software, even if you do not specify HTTP on a customized playserver table, the HTTP playserver is always included automatically, so that in addition to your customized list of playservers, all pre-positioned content can always be played using the HTTP protocol. In ACNS 5.0.3 software, the HTTP playserver is included in the default playserver table; however, if you specify your own <playServerTable> or playServer attribute in <item> or <crawler>, you must add the HTTP playserver in order to play HTTP content or other content using the HTTP protocol. The HTTP playserver is not automatically included, as is the case in ACNS 5.0.1 software.
In the following example, ACNS 5.0.1 software allows both the HTTP playserver and the WMT playserver to play the content, even though only the WMT playserver is specified:
<item src="video.mpg" playServer="wmt" />
However, in ACNS 5.0.3 software, only the WMT playserver plays the content. If you want to allow the HTTP playserver to play this content, you must specify the HTTP playserver, as follows:
<item src="video.mpg" playServer="wmt,http" />
Furthermore, in the following example, the manifest produces different results depending on whether you are using ACNS 5.0.1 software or ACNS 5.0.3 software:
<CdnManifest>
  <playServerTable>
    <playServer name="wmt">
      <extension name="asf" />
      <extension name="wmv" />
    </playServer>
    <playServer name="qtss">
      <extension name="mov" />
    </playServer>
  </playServerTable>
  <server name="server">
    <host name="http://server.com/" proto="http" />
  </server>
  <crawler start-url="root" depth="3" ttl="45" />
</CdnManifest>
Content crawled by this sample manifest file is handled as follows:
•
In ACNS 5.0.1 software, files with the extension .asf and .wmv are played by both the HTTP playserver and the WMT playserver, and files with the extension .mov are played by the HTTP playserver and the QTSS (QuickTime) playserver, because the HTTP playserver is always included automatically, even though it is not specified.
•
In ACNS 5.0.3 software, files with the extension .asf and .wmv are played only by the WMT playserver, and files with the extension .mov are played only by the QTSS playserver. To include the HTTP playserver, you must add <playServer name="http"> and list all your extensions as shown in the following example:
<playServerTable>
  <playServer name="wmt">
    <extension name="asf" />
    <extension name="wmv" />
  </playServer>
  <playServer name="qtss">
    <extension name="mov" />
  </playServer>
  <playServer name="http">
    <extension name="avi" />
    <extension name="mpeg" />
    <extension name="mpg" />
    <extension name="mp3" />
    <extension name="rm" />
    <extension name="ram" />
  </playServer>
</playServerTable>
If you do not specify your own <playServerTable> or playServer attribute, the default playserver table is used. The default playserver table always includes the HTTP playserver.
In the following manifest example, ACNS 5.0.1 software and ACNS 5.0.3 software behave the same. Because there is no <playServerTable> or playServer attribute defined, the default playserver table is used to generate a playserver for this item, which includes the HTTP playserver. The default playserver table automatically allows files to be played with the HTTP playserver as well as the associated playserver for that type of file extension.
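The precedence and version rules described in this section can be summarized in a short Python sketch. The order of precedence is as follows. A playServer attribute comes first. A manifest <playServerTable> comes second. The default table comes last. HTTP is always added to lists from the default table. ACNS 5.0.1 also adds HTTP to customized lists, and ACNS 5.0.3 does not. The default_table passed in the usage example is a hypothetical one-entry stand-in; the real contents are in the "Default PlayServer Table" section.

```python
from typing import Dict, List, Optional


def resolve_play_servers(
    extension: str,
    version: str,
    item_attr: Optional[str] = None,
    custom_table: Optional[Dict[str, List[str]]] = None,
    default_table: Optional[Dict[str, List[str]]] = None,
) -> List[str]:
    """Playserver list for one content object.

    item_attr     -- value of a playServer="..." attribute on <item> or <crawler>
    custom_table  -- extension -> playservers from a <playServerTable> in the manifest
    default_table -- extension -> playservers standing in for the built-in default table
    """
    if item_attr is not None:
        servers = [s.strip() for s in item_attr.split(",") if s.strip()]
        customized = True
    elif custom_table is not None:
        servers = list(custom_table.get(extension, []))
        customized = True
    else:
        servers = list((default_table or {}).get(extension, []))
        customized = False
    # The default table always includes HTTP; 5.0.1 also adds HTTP to customized
    # lists, whereas 5.0.3 uses a customized list exactly as written.
    if (not customized or version == "5.0.1") and "http" not in servers:
        servers.append("http")
    return servers


print(resolve_play_servers("mpg", "5.0.1", item_attr="wmt"))  # ['wmt', 'http']
print(resolve_play_servers("mpg", "5.0.3", item_attr="wmt"))  # ['wmt']
```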
<CdnManifest><item src="video.asf" /></CdnManifest>
Generating a Publishing URL
A publishing URL is the URL that plays back pre-positioned content in the CDN. A complete publishing URL consists of three parts:
•
Scheme
•
Domain name
•
Path
The path includes both the file directory path and the filename. The playserver list determines the publishing URL for the CDN. Again, the playserver list is generated directly through the manifest file, through the <playServerTable> tag in the manifest file, or through the default <playServerTable> tag.
Scheme
The scheme of the publishing URL is the protocol used to play the content type. For example, if an .asf video file can be played by both an HTTP and a WMT playserver, two URL schemes can be used to access this content: HTTP and MMS.
Domain Name
The domain name of the publishing URL is determined by the configuration of the CDN. If WCCP is used to redirect requests to a Content Engine, its domain name is the origin FQDN (fully qualified domain name) in the website or channel. If content routing is used, the content routing FQDN (the FQDN of the website) becomes the domain name.
Path
In most cases, the path of the publishing URL is the relative src URL, or the src attribute in the <item> tags. For content crawling, it is a relative URL, relative to the host name of the origin server.
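The way the three parts combine can be sketched in Python. The scheme for each playserver follows this chapter: HTTP uses http, WMT uses mms, and RealMedia and QTSS use rtsp. The domain "www.example.com" stands in for the origin FQDN or the content routing FQDN and is hypothetical, as is the function name. The sketch builds one URL per distinct scheme.

```python
from typing import List

# Scheme used by each playserver, per this chapter.
SCHEME = {"http": "http", "wmt": "mms", "real": "rtsp", "qtss": "rtsp"}


def publishing_urls(play_servers: List[str], domain: str, path: str) -> List[str]:
    """Build scheme://domain/path for each playserver, skipping duplicate schemes."""
    urls: List[str] = []
    for server in play_servers:
        url = f"{SCHEME[server]}://{domain}/{path.lstrip('/')}"
        if url not in urls:
            urls.append(url)
    return urls


print(publishing_urls(["http", "wmt"], "www.example.com", "video/clip.asf"))
# ['http://www.example.com/video/clip.asf', 'mms://www.example.com/video/clip.asf']
```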
Certain attributes in the manifest file allow you to alter the publishing URL path. These attributes are cdn-url in the <item> tag, and srcPrefix and cdnPrefix in the <crawler> and <item-group> tags. These attributes convert a relative source URL into a completely new relative CDN URL.
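The path conversion performed by cdn-url, srcPrefix, and cdnPrefix can be sketched in Python. The behavior for a source path that does not begin with srcPrefix (left unchanged) is an assumption, and the function name is hypothetical.

```python
from typing import Optional


def cdn_path(src: str, cdn_url: Optional[str] = None,
             src_prefix: Optional[str] = None, cdn_prefix: Optional[str] = None) -> str:
    """Return the relative CDN URL (publishing path) for a relative source URL."""
    if cdn_url is not None:
        return cdn_url                              # <item cdn-url="...">: explicit path
    if src_prefix is not None and cdn_prefix is not None and src.startswith(src_prefix):
        return cdn_prefix + src[len(src_prefix):]   # swap srcPrefix for cdnPrefix
    return src                                      # default: the relative source URL


print(cdn_path("index.html", cdn_url="default.html"))                      # default.html
print(cdn_path("NBA/index.html", src_prefix="NBA/", cdn_prefix="ABC/"))    # ABC/index.html
```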
For the content in the following example, the path uses default.html instead of index.html.
<item src="index.html" cdn-url="default.html" />
The relative URL is always relative to the host name. In the following example, the relative URL is index.html, not sport/index.html.
<server><host name="http://www.cnn.com/sport/" /></server><item src="index.html" />
In the following example, the srcPrefix and cdnPrefix attributes convert the prefix of every crawled content object from NBA/ to ABC/, so every relative CDN URL begins with ABC/. The path for the start-url is ABC/index.html.
<crawler start-url="NBA/index.html" srcPrefix="NBA/" cdnPrefix="ABC/" />
Specifying Attributes for Content Serving
Certain attributes in the manifest file can be specified to control the manner in which content is served by the Content Engines. These attributes can be specified in the <item> and <crawler> tags. These same attributes can also be specified in <item-group> or <options> tags, so they can be shared by their <item> and <crawler> subtags. Table 6-3 lists and describes these content-serving attributes.
Specifying Metadata for Content Serving
In certain situations, you must specify the metadata for content playback. For example, if content is acquired from an FTP server but must be played back with HTTP, the HTTP playback metadata, such as MIME-type and cache control, must be specified.
The <http-meta-data> subtag is used to specify HTTP metadata. In the <http-meta-data> subtag shown in the following example, the name=value attributes are content-type="video/x-asf" and app-data="hh and dd". These are specified so that the CDN passes them directly to the end user when the HTTP content is played back.
<http-meta-data content-type="video/x-asf" app-data="hh and dd" />
The <http-meta-data> tag can be specified as a subtag of <item> or <crawler> tags. To share the metadata among every <item> or <crawler> tag in an <item-group> tag, configure the <http-meta-data> tag as a subtag of the <item-group> tag. If a <crawler> tag has an <http-meta-data> subtag, each of its crawled content objects shares this metadata.
Specifying Time Values in the Manifest File
The following attributes require that you enter a time value in the format yyyy-mm-dd hh:mm:ss.
•
prefetch
•
serveStartTime
•
serveStopTime
•
expires
•
time-before
•
time-after
In the manifest file, the time string conforms to the yyyy-mm-dd hh:mm:ss (year-month-day hour:minute:second) format. A time zone designation can optionally be specified at the end of a time string to indicate the time zone used. If the time zone designation is omitted, the GMT time zone is used. For a complete list of time zone designations and their GMT offsets, see the "Manifest File Time Zone Tables" section. Automatic conversion between daylight saving time and standard time within a time zone is not supported, but a separate designation for daylight saving time can be used, such as PDT for Pacific Daylight Time. In the following example, the prefetch time is September 5, 2002 at 09:09:09 Pacific Daylight Time:
<options timeZone="PDT" /><item src="index.html" prefetch="2002-09-05 09:09:09 PDT" />
Refreshing and Removing Content
Use the ttl (Time To Live) and expires attributes of the manifest file to monitor and control the freshness of the content objects.
The ttl attribute specifies how frequently the software checks the freshness of the content at the origin server and is expressed in minutes. If the ttl attribute is specified inside an <item> tag, it applies to that item; if it is specified inside a <crawler> tag, the attribute applies to the crawl job.
For example, if you give the ttl attribute a value of 10, the software checks the item or crawl job every 10 minutes. If the item has been updated, then the updated file is reacquired.
Note
Sometimes a crawl job is very large, covering thousands of files, and rechecking so many files is very time-consuming. The recrawl speed is 5,000 files per hour for small files. We strongly recommend that you specify a large ttl value for such crawl jobs (for example, 1440 minutes [daily]). Otherwise, the software continues to crawl the site over and over again, blocking other acquisition tasks.
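A quick way to judge whether a ttl is too small for a large crawl job is to compare it with the recheck time implied by the 5,000-files-per-hour rate stated above. The Python sketch below does that arithmetic. Treating "ttl shorter than one recheck pass" as the threshold for continuous crawling is a rough rule of thumb assumed here, not a documented ACNS limit, and the function names are hypothetical.

```python
RECRAWL_FILES_PER_HOUR = 5000  # recrawl speed for small files, as stated above


def recrawl_minutes(file_count: int) -> float:
    """Approximate minutes needed to recheck every file in a crawl job."""
    return file_count / RECRAWL_FILES_PER_HOUR * 60


def ttl_is_too_short(ttl_minutes: int, file_count: int) -> bool:
    # If one recheck pass takes longer than ttl, the next pass is due before the
    # current one finishes, so the job effectively crawls nonstop.
    return 0 < ttl_minutes < recrawl_minutes(file_count)


print(recrawl_minutes(50000))          # 600.0 (about 10 hours)
print(ttl_is_too_short(60, 50000))     # True
print(ttl_is_too_short(1440, 50000))   # False
```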
If you omit the ttl attribute in the manifest file, Time To Live is assumed to be zero and the software does not recheck that item after it is acquired. A value of 0 (zero) for ttl means that the content is fetched only once and is never checked again, unless you click the Fetch Manifest button in the Content Distribution Manager GUI or use the acquirer start-channel command at the root Content Engine CLI.
The Fetch Manifest button is located in the task bar of the Channels > Channels > Basic Settings > Definition window in the Content Distribution Manager GUI. When you click this button, the software checks whether the manifest has been updated and, if it has, downloads and re-parses the updated manifest. In addition, regardless of whether the manifest has been updated, all content in the channel is rechecked and any updated content is downloaded. The acquirer start-channel command is the CLI equivalent of this button.
If you assign a negative value to ttl, such as -1, that item is never rechecked. A negative ttl value prevents the software from checking item freshness, even if you click the Fetch Manifest button in the Content Distribution Manager GUI or use the acquirer start-channel command at the root Content Engine CLI.
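The three ttl cases (positive, zero or omitted, and negative) can be summarized as a small Python decision function. This is a sketch of the rules described above, not ACNS code. The function name is hypothetical, and manual_fetch stands for clicking Fetch Manifest or running acquirer start-channel.

```python
def should_recheck(ttl: int, minutes_since_last_check: float, manual_fetch: bool = False) -> bool:
    """Decide whether an acquired item's freshness is rechecked now."""
    if ttl < 0:
        return False   # negative ttl: never rechecked, even on a manual fetch
    if manual_fetch:
        return True    # zero or positive ttl: a manual fetch rechecks all content
    if ttl == 0:
        return False   # zero or omitted: fetched once, no automatic rechecks
    return minutes_since_last_check >= ttl   # positive: recheck every ttl minutes


print(should_recheck(10, 10))                        # True
print(should_recheck(-1, 10000, manual_fetch=True))  # False
```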
Note
Configuring the Update Interval in the Content Distribution Manager GUI (Channels > Channels > Basic Settings > Definition) sets the interval for checking updates to the manifest file itself. This setting only pertains to checking the manifest file; it does not pertain to checking the content.
For further information, see the ttl attribute in the "item" section.
An attribute that can be confused with the ttl attribute is the failRetryInterval attribute. The fail and retry feature acts upon failed content or failed updates. If the acquisition of a single item or of some crawled content fails, the software automatically tries to refetch these failed objects after a default interval of five minutes. The fail and retry interval can also be specified by using the failRetryInterval attribute in the manifest.
The difference between failRetryInterval and ttl is that the ttl attribute is for successfully acquired content and the failRetryInterval attribute is for content acquisition failures. The ttl attribute needs to be specified for the software to recheck the content freshness, whereas the failRetryInterval attribute does not need to be specified unless you want to change the retry interval.
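The division of labor between the two attributes can be sketched in Python. Successful acquisitions are scheduled by ttl. Failed acquisitions are retried after failRetryInterval, which defaults to five minutes. Returning None when no automatic attempt is scheduled is a modeling choice of this sketch, and the function name is hypothetical.

```python
from typing import Optional

DEFAULT_FAIL_RETRY_MINUTES = 5  # default retry interval stated above


def next_attempt_in(acquired_ok: bool, ttl: int,
                    fail_retry_interval: Optional[int] = None) -> Optional[int]:
    """Minutes until the next automatic attempt, or None if none is scheduled."""
    if not acquired_ok:
        # Failures are retried whether or not ttl is set.
        if fail_retry_interval is not None:
            return fail_retry_interval
        return DEFAULT_FAIL_RETRY_MINUTES
    # Successful content is rechecked only when ttl is positive.
    return ttl if ttl > 0 else None


print(next_attempt_in(False, 0))         # 5
print(next_attempt_in(True, 1440, 60))   # 1440
```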
For further information, see the failRetryInterval attribute in the "item" section.
The expires attribute designates the time at which the content is removed from the CDN. If no expiration time is set, content is stored in the CDN until it is explicitly removed by modifying the manifest file. The expires attribute uses the format yyyy-mm-dd hh:mm:ss (year-month-day hour:minute:second). In the following example, the content expires on June 12, 2003 at 2:00 p.m. PST:
expires="2003-06-12 14:00:00 PST"
If the expires attribute is specified inside an <item> tag, it applies to that item; if it is specified inside a <crawler> tag, it applies to the crawl job. For further information, see the expires attribute in the "item" section.
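Time values such as the expires value above can be converted to GMT with the Python sketch below. It models the documented rules: the yyyy-mm-dd hh:mm:ss format, GMT when the designation is omitted, and no automatic daylight saving adjustment (the designation alone fixes the offset). The offset table is only a small subset using standard offsets; the full list is in the "Manifest File Time Zone Tables" section. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# A few time zone designations and their standard GMT offsets, in hours (subset only).
TZ_OFFSET_HOURS = {"GMT": 0, "EST": -5, "EDT": -4, "PST": -8, "PDT": -7}


def parse_manifest_time(value: str) -> datetime:
    """Convert 'yyyy-mm-dd hh:mm:ss [TZ]' to an aware datetime in UTC (GMT)."""
    parts = value.split()
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected time value: {value!r}")
    designation = parts[2] if len(parts) == 3 else "GMT"   # GMT when omitted
    local = datetime.strptime(f"{parts[0]} {parts[1]}", "%Y-%m-%d %H:%M:%S")
    # No automatic DST switching: the designation alone determines the offset.
    offset = timezone(timedelta(hours=TZ_OFFSET_HOURS[designation]))
    return local.replace(tzinfo=offset).astimezone(timezone.utc)


print(parse_manifest_time("2003-06-12 14:00:00 PST"))  # 2003-06-12 22:00:00+00:00
```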
You can monitor the status of content replication and freshness by enabling and then viewing the transaction log files that reside on the Content Engines of your CDN. To verify whether a content object or file was successfully imported to or refreshed on a particular Content Engine:
•
Enable the transaction log function on the Content Engine you want to monitor.
•
View the transaction log entries for the content object or filename that resides on that Content Engine.
Specifying Live Content
The two types of live content that you can specify in a manifest file are:
•
wmt-live
•
real-live
Use the <item> tag and specify the type attribute as either wmt-live or real-live, as shown in the following example.
<CdnManifest>
  <server name="wmt-server"><host name="mms://www.company-web-site.org" /></server>
  <item src="/tmp/ceo-talk" type="wmt-live" ></item>
  <!--This is a "wmt-live" streaming content type specified by the "type" attribute. The live stream URL is mms://www.company-web-site.org/tmp/ceo-talk.-->
  <server name="real-server"><host name="real-server" proto="rtsp" /></server>
  <item src="tmp/funny-video" type="real-live" />
  <!--This is a "real-live" streaming content type specified by the "type" attribute. The stream URL is rtsp://real-server/tmp/funny-video.-->
</CdnManifest>
The preceding manifest file example specifies two live streams: a wmt-live stream with the URL mms://www.company-web-site.org/tmp/ceo-talk and a real-live stream with the URL rtsp://real-server/tmp/funny-video.
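The stream URLs in the preceding example are formed from the <host> name (plus proto when the name has no scheme) and the item's src. The Python sketch below reproduces both URLs. The default of http when neither a scheme nor proto is given is an assumption, and the function name is hypothetical.

```python
def live_stream_url(host_name: str, src: str, proto: str = "http") -> str:
    """Join a <host> name and an item src into a stream URL."""
    # Use the scheme in the host name if present; otherwise prepend proto.
    base = host_name if "://" in host_name else f"{proto}://{host_name}"
    return base.rstrip("/") + "/" + src.lstrip("/")


print(live_stream_url("mms://www.company-web-site.org", "/tmp/ceo-talk"))
# mms://www.company-web-site.org/tmp/ceo-talk
print(live_stream_url("real-server", "tmp/funny-video", proto="rtsp"))
# rtsp://real-server/tmp/funny-video
```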
More Sample Manifest Files
This section contains five sample manifest files. In XML, text between <!-- and --> is a comment and has no effect on the execution of the file. In these five samples, narrative comments have been added immediately below certain tags or groups of tags to explain what those tags mean. You can copy an entire sample file, save it to a text file, and then view it with Microsoft Internet Explorer.
Additionally, cross-reference links from the first occurrence of a tag to the "Manifest File Reference" section of this guide have been embedded in the narrative comments of each sample, in case you need a more in-depth explanation of the tag.
To download these sample files from Cisco.com, see the "Downloading the Sample Files" section.
The following describes the topics covered by the five sample manifest files:
•
How to use HTTP, HTTPS, and FTP protocols to acquire content
•
How to specify username and password when authentication is required
•
How to specify attributes for acquisition, such as:
–
ttl: sets the time interval between content freshness checks
–
prefetch: specifies the time when the ACNS software can start to acquire content from the origin server
•
How to specify acquisition and distribution priorities
•
How to specify the following playback attributes:
–
serveStartTime: sets the time and date to start serving this content
–
serveStopTime: sets the time and date to stop serving this content
–
alternateUrl: provides an alternative URL to use if content has not yet been replicated at the Content Engine
–
requireAuth: requires authentication credentials from users to play back the content
–
expires: sets the content expiration time and date
–
playServer: chooses which playservers can play the specified content
–
noRedirectToOrigin: if false and content has not yet been replicated, does not redirect the incoming request to the origin server
–
<http-meta-data>: tag that adds metadata for HTTP playback
•
A simple crawl job
–
FTP crawl of a directory
–
HTTP crawl of a directory
–
HTTP crawl of a web site
•
A simple crawl job using the matchRule tag
–
FTP crawl of a directory to fetch only MPEG files
–
HTTP crawl of a directory to fetch only files larger than 10 MB
–
HTTP crawl of a web site, to fetch only if-modified-since files
•
Items with the <contains> tag included
•
RealMedia and WMT streaming live content
Sample 1
Sample 1 shows how to use HTTP, HTTPS, and FTP protocols to acquire content and how to specify username and password when authentication is required.
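Sample 1 relies on a path convention for FTP item URLs: two forward slashes after the host name give an absolute path, and one forward slash gives a path relative to the user's default login directory. That rule can be sketched in Python using only the standard library. The login directory "/home/my-name" in the usage example is hypothetical, and the function name is an illustration only.

```python
from urllib.parse import urlsplit


def ftp_file_path(url: str, login_dir: str) -> str:
    """Return the server file path that an FTP item URL refers to."""
    path = urlsplit(url).path
    if path.startswith("//"):
        return path[1:]   # "//" after the host: absolute path
    # Single "/": relative to the default login directory.
    return login_dir.rstrip("/") + "/" + path.lstrip("/")


base = "ftp://my-name:my-password@my-auth-server.xyz.com"
print(ftp_file_path(base + "//my-doc-root/mymoviecollection/index.html", "/home/my-name"))
# /my-doc-root/mymoviecollection/index.html
print(ftp_file_path(base + "/my-own-moviecollection/wedding/file1.wmv", "/home/my-name"))
# /home/my-name/my-own-moviecollection/wedding/file1.wmv
```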
<!--The CdnManifest tag pair is essential for a manifest file. It must be the first tag and is used only as a super tag for other tags.-->
<!--The preceding XML defines single items, using the <item/> tag, to get from the origin server.-->
<item src="http://my-name:my-password@my-auth-server.xyz.com/mymoviecollection/myname/000001.wmv" />
<!--This origin server requires user authentication. Notice that my-name:my-password in the URL specifies the username and password used to access the file.-->
<!--From this origin server, the content is to be acquired using HTTPS, or HTTP over SSL, so the protocol specified is HTTPS.-->
sslAuthType="weak" /></server>
<!--The special tags <server> and <item> are required to specify access to the origin server, because "sslAuthType" cannot be specified in a full-path URL. The preceding XML defines the origin server from which to get content. The content is to be acquired using HTTPS, or HTTP over SSL, so the protocol specified is HTTPS. This origin server also requires user authentication. The "user" and "password" attributes specify the username and password required to access content from the origin server. The sslAuthType attribute selects either "weak" or "strong" SSL certificate checking. For example, "weak" checking accepts expired and self-signed certificates.-->
<!--Because the <server> has been defined, the "src" attribute can be specified as a relative path. The preceding XML defines two single items to get from the origin server. These two items are relative to the web publishing root on the server.-->
<!--The preceding XML defines two single items to be obtained from the origin server. Notice that the first item has two forward slashes ("//"). This means the path is absolute, that is, relative to the root directory. The second item has only one forward slash ("/"). This means that the content path is relative to the default login directory for an anonymous user.
To understand absolute and relative paths, consider the following directory listings. The first lists the contents of /my-doc-root, and the second lists the contents of /anonymous-default-dir, which is the default directory for the "anonymous" user.
xyz# ls -lR /my-doc-root/
/my-doc-root/:
total 1
drwxrwxrwx 2 admin root 1024 Dec 28 01:46 myphotocollection
/my-doc-root/myphotocollection:
total 1
-rw-rw-rw- 1 admin root 4 Dec 28 01:46 index.html
xyz# ls -lR /anonymous-default-dir/
/anonymous-default-dir/:
total 1
drwxrwxrwx 3 admin root 1024 Dec 28 01:53 my-doc-root
/anonymous-default-dir/my-doc-root:
total 1
drwxrwxrwx 2 admin root 1024 Dec 28 01:53 myphotocollection
/anonymous-default-dir/my-doc-root/myphotocollection:
total 1
-rw-rw-rw- 1 admin root 4 Dec 28 01:53 index.html
The single item with the following absolute path
<item src="/my-doc-root/myphotocollection/index.html" />
fetches the file /my-doc-root/myphotocollection/index.html.
The single item with the following relative path
<item src="my-doc-root/myphotocollection/file1.jpg" />
fetches the file /anonymous-default-dir/my-doc-root/myphotocollection/file1.jpg.
Be careful to specify exactly what you want.-->
<item src="ftp://my-name:my-password@my-auth-server.xyz.com//my-doc-root/mymoviecollection/index.html" />
<item src="ftp://my-name:my-password@my-auth-server.xyz.com/my-own-moviecollection/wedding/file1.wmv" />
<!--The preceding XML defines two single items to obtain from the origin server. Notice that the first item specifies an absolute path, and the second one specifies a relative path. In this case, the relative path is relative to the default login directory for the user "my-name."-->
</CdnManifest>
Sample 2
Sample 2 is a manifest file written to show how to specify attributes.
ttl="60"
type="prepos"
prefetch="2003-03-20 10:00:00 PST"
noRedirectToOrigin="true"
requireAuth="true"
playServer="http,wmt"
expires="2003-06-12 14:00:00 PST"
alternateUrl="http://my-web-server.com/video-error-page.htm"
priority="50000"
serveStartTime="2003-01-12 14:00:00 PST"
serveStopTime="2099-04-12 14:00:00 PST"
playDuration="240"
failRetryInterval="60" />
<!--comments:
src: specifies the file location and is required.
prefetch: specifies the time when the ACNS software can start to acquire content from the origin server.
ttl: checks whether this file is updated every 60 minutes and is required.
noRedirectToOrigin: when false, does not redirect the request to the origin server if the content has not yet been replicated to the Content Engine.
requireAuth: when true, requires authentication to play back this content to users. User requests are redirected to the origin server to check credentials. If the requests pass the credential check, the content is played back from the Content Engine.
playServer: allows the HTTP Apache server and the WMT server to play back this content. That is, the supported playback protocols for this content are HTTP and MMS.
expires: removes the content from the CDN when it expires on the specified date.
alternateUrl: redirects the user to this URL when a request to play back the content is received but the content has not yet been replicated to the Content Engine.
priority: specifies the item priority. Content acquisition and distribution are processed in the order set by the overall priority: the higher the overall priority, the earlier the content is acquired and distributed. The overall priority is calculated as channel-priority * 10000 + item-priority. Channel priority is 250 for low, 500 for normal, and 750 for high. If a priority is not specified, item-priority is 10000 - (index of the item in the manifest file). For example, there are two items in this manifest file. The first item does not have a "priority" attribute, but the second item does, and its priority is 20000. The item-priority of the first item is 10000 - 1 = 9999, and the item-priority of the second item is 20000. In this example, the item priority for this item is 50000.
serveStartTime: specifies the time the CDN can start to serve this content.
serveStopTime: specifies the time the CDN stops serving this content.
playDuration: specifies the video play duration in seconds. It is used by HTTP playback to determine the download bit rate.
failRetryInterval: specifies the interval, in minutes, at which to retry if this acquisition fails. In this example, the acquisition is retried after 60 minutes.
end of comments -->
depth="3"
ttl="60"
noRedirectToOrigin="false"
requireAuth="true"
playServer="http,wmt"
expires="2003-06-12 14:00:00 PST"
alternateUrl="http://my-web-server.com/video-error-page.htm"
priority="50000"
serveStartTime="2003-01-12 14:00:00 PST"
serveStopTime="2003-04-12 14:00:00 PST"
failRetryInterval="60" />
<!-- comments
start-url: specifies the crawling start directory "/root/data/video-files/."
depth: specifies a crawl depth of three subdirectory levels.
noRedirectToOrigin: if false and the crawled items are not replicated to the Content Engine, the request for that content is not redirected to the origin server.
requireAuth: if true, authentication is required for all crawled content.
playServer: all crawled content can be played back by an HTTP web server and a WMT streaming server.
expires: all crawled content expires and is deleted at the specified time.
alternateUrl: if any of the crawled items is not replicated to the Content Engine, the request for that content is redirected to this URL.
priority: all crawled items have the same item priority, 50000. Because they are in the same channel, they have the same overall priority.
serveStartTime: all crawled content can be served after the specified time.
serveStopTime: no crawled content can be served after the specified time.
failRetryInterval: if some of the content in this crawl job fails, the CDN retries that content at the specified interval. In this case, the CDN retries after 60 minutes.
end of comments -->
playServer="http,wmt" ><http-meta-data content-type="video/mpeg" /></item>
<!-- comments
The <http-meta-data/> tag can be used to specify any metadata for HTTP playback. For example, because this item is acquired using FTP, this tag must be used to specify the content type for this MPEG file.
end of comments-->
depth="3" ><http-meta-data content-type="video/mpeg" /></crawler>
<!-- comments
For this crawl job, three subdirectory levels are crawled under the "data/mpeg-files-2/" folder. The <http-meta-data> tag is used to specify the content type for HTTP playback for all crawled content.
end of comments -->
</CdnManifest>
Sample 3
Sample 3 is a manifest file that shows how to use the crawl feature.
<!-- The preceding XML specifies an FTP crawl job to crawl the "ftp-server" using the <crawler> </crawler> tag pair. Its starting directory is "pub-data/video-files/" and its crawl depth is 1. The files in this folder are monitored at 10-minute intervals. If files are updated, removed, or added, the change is reflected in the CDN. -->
</matchRule></crawler>
<!-- This crawl job is similar to the preceding crawl job, except that it includes a <matchRule> </matchRule> tag pair to specify the kind of content that is to be acquired. In this case, only files with the "mpg" file extension are acquired.-->
<!-- This is an HTTP directory crawl job. The HTTP server must be configured to enable the directory indexing feature for the directories you want to crawl. For the Apache server, modify the Apache configuration file so that each directory to be crawled has a section like the following: <Directory "directory-path"> Options Indexes </Directory>, where directory-path is the file system path of pub-data/video-files/. If the request URL points to a directory, the web server dynamically generates an HTML page listing the files contained in that directory. In this crawl job, the directory "pub-data/video-files/" and its subdirectories are crawled to a depth of up to 5 levels. -->
<match size-min-in-MB="10" extension="mpg" /></matchRule></crawler>
<!-- This crawl job is similar to the preceding crawl job, except that it includes the <matchRule> </matchRule> tag pair. This matchRule tag fetches only files with the extension "mpg" whose size is equal to or larger than 10 MB.-->
<crawler start-url="http://www.cnn.com/sport/index.htm" prefix="sport/" depth="3" size-min-in-MB="1000" />
<!-- This crawl job attempts to crawl part of cnn.com. The start URL is http://www.cnn.com/sport/index.htm. It crawls only URLs with the prefix http://www.cnn.com/sport/, acquiring only files from the directory "sport/." The "depth=3" means the job crawls only up to 3 link levels. The max-size-in-MB means that crawling stops if the total size of crawled items reaches 1000 MB.-->
<match extension="mpg" time-after="2002-01-02 00:00:00" /><match extension="asf" time-after="2002-07-02 00:00:00" /></matchRule></crawler>
<!-- This crawl job attempts to crawl part of cnn.com. Its start URL is http://www.cnn.com/movie/index.htm, and its crawl depth is 3. This matchRule acquires only files with the "mpg" extension created after Jan. 2, 2002, or files with the "asf" extension created after July 2, 2002.-->
</CdnManifest>
Sample 4
Sample 4 is a manifest file written to show the purpose and use of the <contains> tag. The <contains> tag is designed to prevent serving content if the required files are not present on the Content Engine. Typically, the delivery of a presentation consists of serving multiple files. For example, if an ASF video file uses several JPG or HTML files for its presentation, but the JPG or HTML files are not present, then the ASF video is not served.
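The serving rule that the <contains> tag enforces can be sketched in Python. A container is served only if it and every item it contains are present on the Content Engine. The data structures and the function name are hypothetical; the file names come from the Sample 4 comments.

```python
from typing import Dict, List, Set


def can_serve(container: str, contains: Dict[str, List[str]], present: Set[str]) -> bool:
    """True only if the container and all of its contained items are present."""
    if container not in present:
        return False
    return all(child in present for child in contains.get(container, []))


contains = {"movie1.asf": ["intro.html", "intro.jpg"]}
print(can_serve("movie1.asf", contains, {"movie1.asf", "intro.html", "intro.jpg"}))  # True
print(can_serve("movie1.asf", contains, {"movie1.asf", "intro.html"}))               # False
```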
<!--These are just two regular single-item acquisition jobs.-->
</item>
<!-- The preceding item, movie1.asf, contains two other items, intro.html and intro.jpg. The items are contained using the <contains/> tag. If these two contained items are not present on the Content Engine, then the CDN does not serve the container file movie1.asf.-->
</CdnManifest>
Sample 5
Sample 5 is a manifest file written to show how to specify live content. ACNS 5.0 software supports two types of live content: wmt-live and real-live. You need to use the type attribute to specify live streaming content.
<!-- This is a "wmt-live" streaming content type specified by the "type" attribute. The live stream URL is mms://www.company-web-site.org/tmp/ceo-talk. -->
<!-- This is a "real-live" streaming content type specified by the "type" attribute. The stream URL is rtsp://real-server/tmp/funny-video. -->
</CdnManifest>
Downloading the Sample Files
To download the preceding five sample files from Cisco.com, follow these steps:
Step 1
Go to the following URL to find the sample files:
http://www.cisco.com/pcgi-bin/tablebuild.pl/acns50
Step 2
When prompted, log in to Cisco.com using your designated Cisco.com username and password.
The Cisco ACNS Software download page appears, listing the available software updates for the Cisco ACNS Software product.
Step 3
Locate the file named ACNS-5.0.1-manifest-samples.zip. This is a ZIP archive containing the sample manifest files.
Step 4
Click the link for the ACNS-5.0.1-manifest-samples.zip file. The download page appears.
Step 5
Click Software License Agreement.
A new browser window opens, displaying the license agreement.
Step 6
After you have read the license agreement, close the browser window displaying the agreement and return to the Software Download page.
Step 7
Click the filename link labeled Download.
Step 8
Click Save to file and then choose a location on your workstation to temporarily store the zipped file containing the sample files.
Step 9
Use your preferred unzip program to unpack the sample files to a location on your workstation or your network.
After you have unzipped the sample files, you are ready to begin using them to create your own manifest files for your website.
Manifest Validator Utility
Because correct manifest file syntax is so important to the proper deployment of pre-positioned content on your CDN, Cisco provides a manifest file syntax validator. The Manifest Validator, a Java-based utility that verifies the syntax of the manifest file you have written or modified, is built into the Content Distribution Manager GUI.
The Manifest Validator utility tests each line of the manifest file to identify syntax errors where they exist and determine whether or not the manifest file is valid and ready for use in importing content into your CDN. The results of these syntax validation tests are logged into a text file at a location that you name.
Running the Manifest Validator Utility
The Manifest Validator utility is built into the Content Distribution Manager GUI. Figure 6-1 shows the Manifest Validator GUI window.
Figure 6-1 Manifest Validator Content Distribution Manager GUI Window
After you create a channel, you can access the Manifest Validator using the following GUI path:
Channels Tab > Channels > Edit or Create Channel Icon > Tools > Manifest Validator
Note
You must first create a new channel or edit an existing channel before you can access the Manifest Validator.
Under Validate Manifest File, enter the URL of the manifest file that you want to test in the Manifest File field and click Validate. The Manifest Validator checks the syntax of your manifest file to make sure that source files are named for each content item in the manifest. It then checks the URL for each content item to verify that the content is placed correctly and then displays the output in the lower part of the GUI window. The manifest file validator does not determine the size of the item.
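If you want to pre-check a manifest before pasting its URL into the GUI, the following Python sketch approximates two of the checks described above using the standard xml.etree.ElementTree module: it reports XML well-formedness errors with the line and column position reported by the parser, and it flags <item> tags that have no src attribute. The function name check_manifest and its message text are hypothetical; this is not the Manifest Validator's implementation, and it does not fetch or check any URLs.

```python
import xml.etree.ElementTree as ET


def check_manifest(text):
    """Return a list of error strings for manifest XML given as a string (hypothetical helper)."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        # err.position is the (line, column) pair reported by the XML parser.
        line, col = err.position
        return [f"line: {line} col: {col}: {err}"]

    errors = []
    if root.tag != "CdnManifest":
        errors.append(f"root element is <{root.tag}>, expected <CdnManifest>")
    # iter() walks the whole tree, so items nested in <item-group> are checked too.
    for item in root.iter("item"):
        if "src" not in item.attrib:
            errors.append(f"<item> has no src attribute: {item.attrib}")
    return errors


good = '<CdnManifest><server name="s0"><host name="http://h" /></server><item src="a.gif"/></CdnManifest>'
print(check_manifest(good))  # []
```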
A Valid Manifest File Example
The following is an example of a valid manifest file:
<CdnManifest>
  <item src="tmp/mao's.html" priority="20"/>
  <server name="my-dev'box">
    <host name="http://128.107.150.26" proto="http" />
  </server>
  <item src="tmp/lu.html" priority="300"/>
  <item src="/tmp/first_grader.html"/>
  <server name="server0">
    <host name="http://umark-u5.cisco.com:8080/" />
  </server>
  <item src="a.gif"/>
  <server name="server1">
    <host name="http://unicorn-web" />
  </server>
  <item src="Media/wmtfiles/DCA%20Disk%201/Microsoft_Logos/Logos_100k.wmv" />
</CdnManifest>
The final lines of the manifest file validator's output indicate whether the manifest is valid. Wait until the following message is displayed, indicating that the validator has finished processing the manifest file you pointed to:
Total Number of Error: 0
Total Number of Warning: 0
Manifest File is CORRECT.
If errors are found, the error messages reported appear before the preceding message.
Invalid Manifest File Example
The following is an example of an invalid manifest file:
<CdnManifest>
  <item src="tmp/mao's.html" priority="20"/>
  <server name="my-dev'box">
    <host name="http://128.107.150.26" proto="http" />
  </server>
  <item src="tmp/lu.html" priority="300"/>
  <item src="/tmp/first_grader.html"/>
  <server name="server0">
    <host name="http://umark-u5.cisco.com:8080/" >
  </server>
  <item src="a.gif"/>
  <server name="server1">
    <host name="http://unicorn-web" />
  </server>
  <item src1="Media/wmtfiles/DCA%20Disk%201/Microsoft_Logos/Logos_100k.wmv" />
</CdnManifest>
In the preceding example, although there are no warnings, two errors are found, and this manifest file is syntactically incorrect, as shown in the following message:
ERROR (/state/dump/tmp.xml.1040667979990 line: 23 col: 1 ):No character data is allowed by content model
ERROR (/state/dump/tmp.xml.1040667979990 line: 23 col: 9 ):Expected end of tag 'host'
Manifest File: /state/dump/tmp.xml.1040667979990
Total Number of Error: 2
Total Number of Warning: 0
Manifest File is NOT CORRECT!
The following is a full-text output example of the invalid manifest file after the Manifest Validator checks the file:
Manifest validated: http://qiwzhang-lnx/nfs-obsidian/Unicorn/my-single-bad.xml
The manifest is downloaded as /state/dump/tmp.xml.1040667979990 for validation, this file will be removed when validation is completed.
Start CdnManifest
Start item
priority=20
src=tmp/mao's.html
End item
Start server
name=my-dev'box
Start host
name=http://128.107.150.26
proto=http
uuencoded=false
End host
End server
Start item
priority=300
src=tmp/lu.html
End item
Start item
src=/tmp/first_grader.html
End item
Start server
name=server0
Start host
name=http://umark-u5.cisco.com:8080/
uuencoded=false
ERROR (/state/dump/tmp.xml.1040667979990 line: 23 col: 1 ):No character data is allowed by content model
ERROR (/state/dump/tmp.xml.1040667979990 line: 23 col: 9 ):Expected end of tag 'host'
Manifest File: /state/dump/tmp.xml.1040667979990
Total Number of Error: 2
Total Number of Warning: 0
Manifest File is NOT CORRECT!
Understanding Manifest File Validator Output
The manifest file validator messages appear below the Manifest File field in the Manifest Validator window of the Content Distribution Manager GUI.
Each output file has a similar structure and syntax. It clearly identifies any errors or warning messages arising from incorrect manifest file syntax. Manifest files are determined by the validator to be either:
•
CORRECT—Contains possible syntax irregularities but is syntactically valid and ready for deployment on your CDN
•
INCORRECT—Contains syntax errors and is unsuitable for deployment on your CDN
Syntax Errors
The manifest file validator issues syntax errors only when it cannot identify a source file for a listed content item, either because it is not listed, or because it is listed using improper syntax. Files containing syntax errors are marked INCORRECT.
Syntax errors are identified in the output with the ERROR label. In addition to the label, the output provides the line and column numbers where the error occurs, as well as the manifest attribute for which the error was issued. An error appears in the following example:
ERROR (/state/dump/tmp.xml.1040667979990 line: 23 col: 1 ):No character data is allowed by content model
In this example:
•
/state/dump/tmp.xml.1040667979990 is the manifest file name
•
line: 23 col: 1 is the manifest file line and column number where the error occurs
•
No character data is allowed by content model describes the type of manifest file error
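If you collect validator output as text, the ERROR line layout shown above can be split into its parts. The following Python sketch assumes exactly that layout and does not handle WARNING lines, whose format may differ; the helper names parse_error_line and manifest_is_correct are hypothetical.

```python
import re

# Layout of an ERROR line as shown in the validator output above, for example:
# ERROR (/state/dump/tmp.xml.1040667979990 line: 23 col: 1 ):No character data is allowed by content model
ERROR_LINE = re.compile(
    r"^ERROR \((?P<file>\S+) line: (?P<line>\d+) col: (?P<col>\d+) \):(?P<message>.*)$"
)


def parse_error_line(text):
    """Split one ERROR line into file, line, column, and message; return None for other lines."""
    match = ERROR_LINE.match(text.strip())
    if match is None:
        return None
    return {
        "file": match.group("file"),
        "line": int(match.group("line")),
        "col": int(match.group("col")),
        "message": match.group("message").strip(),
    }


def manifest_is_correct(output):
    """True when the validator's closing summary reports a correct manifest."""
    return "Manifest File is CORRECT." in output
```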
Syntax Warnings
The manifest file validator issues syntax warnings for a wide variety of irregularities in the manifest file syntax. Files containing syntax warnings may be marked CORRECT or INCORRECT, depending on whether or not syntax errors have also been issued.
Syntax warnings are identified in the output with the WARNING label. In addition to this label, the output provides the line number for which the warning is issued, the manifest attribute concerned, its valid options, and its default value.
Correcting Manifest File Syntax
Once you have identified syntax warnings, errors, and messages using the output from the manifest file validator, you can correct your manifest file syntax and then rerun the manifest file validator on the corrected file to verify its correctness.
To correct syntax warnings and errors in your manifest file, follow these steps:
Step 1
Open your manifest file using your preferred XML editor.
Step 2
Referring to your manifest file validator output, use the line numbers provided by the manifest file validator to locate the syntax violations in your manifest file.
It is a good idea to review every warning and error in your manifest file. Some warnings, although they still allow the manifest file validator to find your manifest file syntax to be correct, can be the source of problems when you deploy the identified content to your CDN.
Step 3
After you have made the necessary corrections for syntax warnings and errors, click Save.
Step 4
Run the manifest file through the manifest file validator again and review the validator output for new or unresolved errors and warnings.
Step 5
Repeat Step 1 through Step 4 until every error and warning has been adequately resolved and the manifest validator indicates that your manifest file syntax is correct.
Manifest File Reference
This major section contains the following topics:
•
Manifest File Structure and Syntax
•
Manifest File Automated Scripts
•
Manifest File Time Zone Tables
The most efficient and least error-prone methods of creating a manifest file are:
•
Modify one of the sample manifest files in this chapter to suit your particular needs, ensuring that your XML syntax is correct.
•
Use the two sample Perl scripts that can be downloaded from Cisco.com as is, or customize these downloaded scripts for your own purposes.
You can start with one of the prewritten sample XML manifest files presented in this chapter. Choose a sample manifest file that is closest to matching your content acquisition and pre-positioning needs, and then modify the XML code accordingly, while ensuring that your XML syntax is correct.
Alternatively, use the sample Perl scripts that Cisco provides (see the "Obtaining the Perl Scripts" section).
Once you have created a suitable manifest file, you can verify its correctness by running the Manifest Validator utility (see the "Manifest Validator Utility" section) on your newly written XML code from the Content Distribution Manager GUI.
Manifest File Structure and Syntax
The Cisco ACNS 5.0 software manifest file provides powerful features for representing and manipulating CDN data that can be easily edited using any simple text editor. Table 6-4 provides a summary list of the manifest file tags, their corresponding attributes and subelements, and a brief description of each tag. Table 6-5 shows an example of how tags are nested in a manifest file. The sections that follow provide a more detailed description of the manifest file tags, the data they contain, and their attributes.
Table 6-4 Manifest File Tag Summary
<CdnManifest>
Subelements: <playServerTable/>, <options/>, <server/>, <item/>, <item-group/>, <crawler/>
Attributes: None
Description: Marks the beginning and end of the manifest file content.

<playServerTable>
Subelements: <playServer/>
Attributes: None
Description: (Optional) Sets default mappings for media types.

<playServer>
Subelements: <contentType/>, <extension/>
Attributes: name [1]: real, http, qtss, wmt
Description: Names the media server type on the Content Engine responsible for playing content types and files with extensions mapped to it using <contentType> tags.

<contentType/>
Subelements: None
Attributes: name (see Table 6-6; http, media, qtss, real, wmt)
Description: (Optional, but must have either content-type or extension) Names the MIME-type content mapped to a playserver.

<extension/>
Subelements: None
Attributes: name (see Table 6-6; http, media, qtss, real, wmt)
Description: (Optional, but must have either content-type or extension) Names the file extension that is mapped to a playserver.

<options/>
Subelements: None
Attributes: timeZone, alternateUrl, expires, noRedirectToOrigin, playServer, prefetch, serveStartTime, serveStopTime, server, priority, ttl, type, playDuration, failRetryInterval
Description: (Optional) Defines attributes specific to the manifest file that can be shared.

<server>
Subelements: <host/>
Attributes: name
Description: Defines only one host from which content is to be retrieved.

<host/>
Subelements: None
Attributes: name, proto, port, user, password, unencoded, sslAuthType
Description: Defines a web server or live server from which content is to be retrieved and later pre-positioned.

<item>
Subelements: <contains/>, <http-meta-data/>
Attributes: src, cdn-url, type, noRedirectToOrigin, playServer, prefetch, expires, ttl, serveStartTime, serveStopTime, alternateUrl, priority, requireAuth, playDuration, failRetryInterval
Description: Identifies specific content that is to be acquired from the origin server.

<crawler>
Subelements: <http-meta-data/>, <matchRule/>
Attributes: start-url, depth, prefix, accept, reject, max-number, max-size-in-MB, srcPrefix, cdnPrefix, requireAuth, noRedirectToOrigin, playServer, prefetch, expires, ttl, serveStartTime, serveStopTime, alternateUrl, priority, server, type, playDuration, failRetryInterval
Description: Supports crawling of a website or FTP server.

<item-group>
Subelements: <http-meta-data/>, <matchRule/>, <crawler/>, <item/>
Attributes: alternateUrl, expires, noRedirectToOrigin, playServer, prefetch, serveStartTime, serveStopTime, server, priority, srcPrefix, cdnPrefix, ttl, type, requireAuth, playDuration, failRetryInterval
Description: Places shared attributes under one tag so that they can be shared by every <item> and <crawler> tag within that group.

<matchRule>
Subelements: <match/>
Attributes: None
Description: (Optional) Defines additional filter rules for crawler jobs.

<match/>
Subelements: None
Attributes: MIME-type, extensions, time-before, time-after, size-min-in-KB, size-max-in-KB
Description: (Optional) Specifies the acquisition criteria that content objects must meet before they can be acquired by the CDN.

<contains/>
Subelements: None
Attributes: cdn-url
Description: (Optional) Identifies content objects that are embedded within the content item currently being described.

<http-meta-data/>
Subelements: None
Attributes: name=value
Description: (Optional) Sends HTTP response headers to end user HTTP requests to specify content type for FTP acquired content.

[1] Attributes that are required for a tag are shown in boldface font.
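The Sample 3 comments describe how <match> elements combine: the conditions inside one <match> must all hold (for example, extension "mpg" and a minimum size), while separate <match> elements in one <matchRule> are alternatives (mpg files after one date or asf files after another). The following Python sketch models that reading with the <match> attribute names from Table 6-4. The FileInfo class, the comma-separated format for extensions, the strict comparisons for time-before and time-after, and accepting every file when no <match> is present are assumptions for illustration, not documented CDN behavior.

```python
from dataclasses import dataclass
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # manifest time layout: yyyy-mm-dd hh:mm:ss


@dataclass
class FileInfo:
    """Hypothetical description of one file found by a crawler."""
    name: str
    mime_type: str
    size_kb: float
    time: datetime  # timestamp compared with time-before/time-after (assumption)


def match_accepts(match, f):
    """All attributes present in one <match> element must hold (logical AND)."""
    if "extensions" in match:
        allowed = {e.strip().lower() for e in match["extensions"].split(",")}
        ext = f.name.rsplit(".", 1)[1].lower() if "." in f.name else ""
        if ext not in allowed:
            return False
    if "MIME-type" in match and f.mime_type != match["MIME-type"]:
        return False
    if "time-after" in match and f.time <= datetime.strptime(match["time-after"], TIME_FORMAT):
        return False
    if "time-before" in match and f.time >= datetime.strptime(match["time-before"], TIME_FORMAT):
        return False
    if "size-min-in-KB" in match and f.size_kb < float(match["size-min-in-KB"]):
        return False
    if "size-max-in-KB" in match and f.size_kb > float(match["size-max-in-KB"]):
        return False
    return True


def match_rule_accepts(matches, f):
    """A file passes a <matchRule> if any one of its <match> elements accepts it (logical OR)."""
    if not matches:
        return True  # no <match> elements: no extra filtering (assumption)
    return any(match_accepts(m, f) for m in matches)
```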
CdnManifest
The <CdnManifest> </CdnManifest> tag set is required and marks the beginning and end of the manifest file content. Each <CdnManifest> tag set must contain at least one item, or content object, that is fetched and stored.
Attributes
None
Subelements
The <CdnManifest> tag set can contain the following subelements:
•
playServerTable
The <CdnManifest> tag set can only contain one playServerTable subelement.
•
options
The <CdnManifest> tag set can only contain one options subelement.
•
server
•
item
•
item-group
•
crawler
Example
<CdnManifest>
  <server name="origin-server">
    <host name="www.name.com" proto="http" port="80" />
  </server>
  <item cdn-url="logo.jpg" server="origin-server" src="images/img.jpg" type="prepos" playServer="http" ttl="300"/>
</CdnManifest>
playServerTable
The <playServerTable> </playServerTable> tag set is optional and provides a means for you to set default mappings for a variety of media types. Mappings can be set for both MIME-type content (the preferred mapping) and file extensions. Playserver tables allow you to override default mappings on the Content Engine for content types from a particular origin server. Playservers can be any one of the following four streaming servers: WMT, RealMedia, HTTP, or QTSS. If no <playServerTable> tag is configured in the manifest file, a default <playServerTable> tag is used.
Using the manifest file, you can map groups of content items as well as individual content objects to an installed playserver. The following are content item and manifest file playserver mappings:
•
Content item URL
Playserver mappings appear immediately after the origin server name in place of the default <cdn-media> tag.
•
Manifest file as an attribute of the <item> or <item-group> tag
Playserver mappings placed at this location are identified using the playServer attribute and apply only to the named item or group of items.
•
Manifest file as a playserver table
Mappings are grouped within the <playServerTable> and <playServer> tags and are applied to content served from the origin server as directed by the manifest file.
•
System-level
Playserver mappings are configured during CDN startup.
The <playServerTable> tags are enclosed within the <CdnManifest> tags and name at least one of four playservers, such as RealServer, to which certain MIME-types and file extensions are mapped.
Attributes
None
Subelements
The <playServerTable> element must contain at least one <playServer> tag.
playServer
The <playServer> </playServer> tag set is required for the <playServerTable> tag and names the media server type on the Content Engine that is responsible for playing the content types and files with extensions mapped to it using the <contentType> tags. The <playServer> tag is enclosed within <playServerTable> tags.
Note
Do not confuse the <playServer> tag with the playserver setting in an <item> or <item-group> tag. An <item> or <item-group> tag specifies a server type to be used for an individual content object or group of related content objects. Although both playserver settings accomplish the same task, <item> tag-level playserver settings take precedence over the content-type and file extension mappings specified by the <playServer> tags in the <playServerTable> tag.
Attributes
The name attribute of the <playServer> tag is required. Each <playServer> tag uses the name attribute to name the type of server to which content is mapped. In ACNS 5.0 software, Content Engines support four types of playservers:
•
real: RealMedia RealServer
•
http: HTTP web server
•
qtss: Apple QuickTime Streaming Server
•
wmt: Microsoft Windows Media Technologies
Subelements
At least one of the following subelements must be present in a <playServer> tag set.
•
<contentType/>
•
<extension/>
contentType
The <contentType/> tag is optional, but either a <contentType/> or an <extension/> subelement must be present in a <playServer> tag set. The <contentType/> tag names MIME-type content that is to be mapped to a playserver. The <contentType/> tag must be enclosed within a <playServer> tag set. When both <contentType/> and <extension/> tags are present in a <playServerTable> tag for a particular media type, the <contentType/> mapping takes precedence.
Attributes
Each <contentType/> tag names a media content-type that is to be mapped to the playserver using the name attribute. The name attribute is required. Table 6-6 lists supported media types.
Subelements
None
extension
The <extension/> tag is optional but either a <contentType/> or an <extension/> subelement must be present in a <playServer> tag set. The <extension/> tag names the file extension that is being mapped to a playserver.
The <extension/> tag follows the <playServer> tag. When both <contentType/> and <extension/> tags are present in the <playServer> tag for a particular media type, the <contentType/> mapping takes precedence.
Attributes
The name attribute is required and provides the file extension for a mapped content type. When files with the named extension are requested, the mapped playserver is used to serve them.
Subelements
None
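The following Python sketch shows one way to read the precedence rules in this section: an item-level playServer attribute (one server, or several separated by commas) wins; otherwise the manifest's <playServerTable> is consulted, or the default table if the manifest has none, and a <contentType/> mapping is preferred over an <extension/> mapping. The dictionary layout used for the tables and the function name resolve_play_servers are hypothetical.

```python
def resolve_play_servers(item_play_server, content_type, extension, table, default_table):
    """Return the playserver names for one content item (hypothetical helper).

    table / default_table: {"contentType": {mime: server}, "extension": {ext: server}}
    table is None when the manifest has no <playServerTable>.
    """
    # 1. An item-level playServer attribute wins; it may name several servers.
    if item_play_server:
        return [name.strip() for name in item_play_server.split(",") if name.strip()]
    # 2. Otherwise use the manifest's table, or the default table if there is none.
    mapping = table if table is not None else default_table
    # 3. A MIME-type mapping takes precedence over a file-extension mapping.
    by_type = mapping.get("contentType", {})
    if content_type in by_type:
        return [by_type[content_type]]
    by_extension = mapping.get("extension", {})
    if extension in by_extension:
        return [by_extension[extension]]
    return []
```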
Example
<CdnManifest>
  <playServerTable>
    <playServer name="real">
      <contentType name="application/x-pn-realaudio" />
      <contentType name="application/vnd.rn-rmadriver" />
      <extension name="rm" />
      <extension name="ra" />
      <extension name="rp" />
      <extension name="rt" />
      <extension name="smi" />
    </playServer>
    <playServer name="wmt">
      <extension name="asx" />
      <extension name="asf" />
      <extension name="avi" />
    </playServer>
    <playServer name="http">
      <contentType name="application/pdf" />
      <contentType name="application/postscript" />
      <extension name="pdf" />
      <extension name="ps" />
    </playServer>
  </playServerTable>
  <server name="test.origin.com/">
    <host name="http://tst.orgn.com" proto="http" />
  </server>
  <item src="pic1.mpg"/>
</CdnManifest>
options
The <options/> tag is optional and is used to define attributes specific to the manifest file. Shared attributes can be inherited by <item> and <crawler> tags in the manifest file. For example, timeZone is an attribute specific to the manifest file that is used to set the time zone for all time-related values. Attributes such as ttl and alternateUrl can be set in the <options/> tag, and their values can be shared by all <item> and <crawler> tags within the manifest file.
The <options/> tag set is enclosed within the <CdnManifest> tag set and specifies at least one global setting. No more than one <options> tag is allowed per manifest file.
If parameters are defined within the manifest file <options/>, <item-group>, or <item> tags, the order of precedence from lowest to highest is <options/>, <item-group>, and <item>.
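The precedence order above (<options/>, then <item-group>, then <item>) behaves like a layered dictionary merge in which each more specific level overrides the levels below it. A minimal Python sketch, with a hypothetical function name and attributes represented as plain string dictionaries:

```python
def effective_attributes(options=None, item_group=None, item=None):
    """Merge shared attributes; a later (more specific) level overrides an earlier one."""
    merged = {}
    for level in (options, item_group, item):  # lowest to highest precedence
        merged.update(level or {})
    return merged


print(effective_attributes({"ttl": "60", "noRedirectToOrigin": "true"}, {"ttl": "30"}, {"priority": "20"}))
# {'ttl': '30', 'noRedirectToOrigin': 'true', 'priority': '20'}
```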
Attributes
The timeZone attribute specifies the time zone for time values of attributes such as serveStartTime, serveStopTime, expires, and prefetch.
The following list of attributes can be shared by <item> and <crawler> tags. See the "item" section for descriptions of the following attributes:
•
alternateUrl
•
bitrate-in-bps/bitrate-in-kbps/bitrate-in-mbps
•
expires
•
ignoreQueryString
•
noRedirectToOrigin
•
playServer
•
prefetch
•
serveStartTime
•
serveStopTime
•
server
•
priority
•
ttl
•
type
•
requireAuth
•
playDuration
•
failRetryInterval
Subelements
None
Example
<CdnManifest>
  <options noRedirectToOrigin="true" timeZone="PST" />
</CdnManifest>
server
The <server> and <host> tag fields configure the origin content source server. The <host> tag field inside the <server> tag field configures the content source host. Having multiple <host> tag fields in one <server> tag field is not supported in ACNS 5.0 software.
Each <item> or <item-group> tag can have a server attribute that refers to this <server> tag field. The <server> </server> tag set is required and defines only one host from which content is to be retrieved. The <server> tags are contained within <CdnManifest> tags and contain one <host> tag that identifies the host from which content is retrieved.
Attributes
•
name
The name attribute is required and can be any name as long as it matches the server attribute values in the <item> or <crawler> tags.
Subelements
<host/>
The <server> tag set can only contain one <host/> subelement.
host
The <host/> tag is required and defines a web server or live server from which content is to be retrieved and later pre-positioned. Only one host can be defined within a single <server> tag set. The <host> tag must be enclosed within <server> tags.
Attributes
•
name
The name attribute is required and identifies the domain name or IP address of the host, unless the proto attribute field is empty. If the proto attribute field is empty, the name attribute must be a fully qualified URL, including scheme and domain name or IP address. It can also include subdirectories, such as http://www.abc.com/media.
•
proto
The proto attribute is optional and identifies the communication protocol that is used to fetch content from the host. Supported protocols are HTTP, HTTPS, and FTP. The default proto attribute is HTTP. The proto attribute can be empty if the name attribute is a fully qualified URL that includes the scheme.
•
port
The port attribute is optional and identifies the TCP port through which traffic to and from the host passes. The port used depends on the protocol used. The default port for HTTP is 80. The port attribute is only required for a nonstandard port assignment. The port attribute can also be specified in the name attribute, such as name="http://www.cisco.com:8080/".
•
user
The user attribute is optional and identifies the secure login used for host access.
•
password
The password attribute is optional and identifies the password for the user account that is required to access the host server.
•
unencoded
The unencoded attribute is optional. If set to true, the password is not encoded. The unencoded attribute default setting is false.
•
sslAuthType
The sslAuthType attribute is optional and has two possible values for the type of encryption:
–
strong
The default sslAuthType attribute setting is strong.
–
weak
Subelements
None
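The following Python sketch shows how the name, proto, and port attributes might combine into a base URL, and how a relative <item> src might be appended to it. It is an illustration under stated assumptions: the default ports for HTTPS (443) and FTP (21) are the usual well-known ports rather than values stated in this chapter, and the simple path concatenation (which keeps a subdirectory such as /media from the host name) is assumed, not documented. The function names are hypothetical.

```python
# HTTP 80 is stated in this chapter; HTTPS 443 and FTP 21 are the usual well-known ports (assumption).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def host_base_url(name, proto=None, port=None):
    """Build a base URL from the name, proto, and port attributes of a <host> tag."""
    if "://" in name or not proto:
        # name is already a full URL, such as http://www.abc.com/media or http://host:8080/
        return name.rstrip("/")
    scheme = proto.lower()
    if port is None or int(port) == DEFAULT_PORTS.get(scheme):
        return f"{scheme}://{name}"
    return f"{scheme}://{name}:{int(port)}"


def item_url(base_url, src):
    """Resolve an <item> src against a host base URL; full URLs are returned unchanged."""
    if "://" in src:
        return src
    return base_url.rstrip("/") + "/" + src.lstrip("/")
```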
item
The <item> </item> tag set identifies the specific content that is to be acquired. The <item> tag names a single piece of content or a content object on the origin server, such as a graphic, MPEG video, or RealAudio sound file. Content items can be listed individually or grouped using the <item-group> tag.
The <item> tag must be enclosed within the <CdnManifest> tag set and can also be enclosed within <item-group> tags.
Attributes
•
src
The src attribute is required and identifies the URL from which to fetch the content. The URL can be a full URL or a relative URL. A full URL has the following format:
proto://username:password@domain-name:port/file-path/file-name
If a relative path is used, the <server><host> tags are required to specify origin-server information.
For example:
<item src="http://user:password@www.cisco.com/HR/index.html" />
<server name="ftp-server" >
  <host name="ftp://ftp-server" user="johw" password="wwww" />
</server>
<item src="data/video.asf" />
Note
A URL containing a question mark ("?") is not supported. A manifest parsing error will occur if you specify a URL that contains a question mark.
Note
A URL containing a pound sign ("#") is modified: the pound sign and all characters that follow it are discarded.
•
server
The server attribute is optional and refers to the server name in the <server> tag. If the server attribute is omitted, the server listed in the closest <server> tag is used. If there is no <server> tag close to this <item>, the manifest server is used.
•
cdn-url
The cdn-url attribute is optional and is the relative CDN URL to allow end users to access this content. If no cdn-url value is specified, then the src value is used as the relative CDN URL.
Note
If you use FTP to acquire content and the content type is not specified in the manifest file and the cdn-url is specified to alter your publishing URL, the cdn-url must have the correct extension. Otherwise, the incorrect content type will be generated and you cannot play the content.
•
type
The type attribute is optional and defines whether content is to be pre-positioned or live on the CDN. The three type values are prepos, wmt-live, and real-live. The wmt-live and real-live values are used to deliver live content. If this field is left blank, the default type is prepos.
•
noRedirectToOrigin
The noRedirectToOrigin attribute is optional and sets redirection to the origin server to true or false. A false setting allows the CDN Content Engine or other edge device to redirect content requests to the origin server if the content is not available at that device. A true setting does not allow the Content Engine or edge device to redirect content requests to the origin server; instead, an error is generated. The default noRedirectToOrigin setting is false. For the effect of the noRedirectToOrigin attribute on pre-positioned content freshness, see the "Configuring Freshness of Pre-Positioned Content" section.
•
playServer
The playServer attribute is optional and names the server used to play back the content. Valid playservers are real (RealServer), wmt (Windows Media Technologies), qtss (QuickTime Streaming Server), and http (web server). The value in this field is either one playserver or multiple playservers separated by commas. If a value for this attribute is left blank, the <playServerTable> tag in the manifest file is used to generate the playserver list for this content. If the manifest file does not have a <playServerTable> tag, the default <playServerTable> tag is used.
•
prefetch
The prefetch attribute is optional and designates a time in yyyy-mm-dd hh:mm:ss (year-month-day hour:minute:second) format at which the content is to be retrieved from the origin server. The time zone for the time can be specified in the <options> tag. Automatic conversion between daylight saving time and standard time within a time zone is not supported, but a separate designation for daylight saving time can be used, such as PDT for Pacific Daylight Time. In the following example, the prefetch time is September 5, 2002 at 09:09:09 Pacific Daylight Time:
<options timeZone="PDT" />
<item src="index.html" prefetch="2002-09-05 09:09:09 PDT" />
If a time value is omitted, the content is acquired immediately.
•
expires
The expires attribute is optional and designates a time in yyyy-mm-dd hh:mm:ss format when the content is to be removed from CDN. Additionally, you can specify the GMT time zone (see the "Specifying Time Values in the Manifest File" section). If a time value is omitted, content is stored at the CDN until it is removed when you modify the relevant manifest file code. For the effect of the HTTP header expires attribute on pre-positioned content freshness, see the "Configuring Freshness of Pre-Positioned Content" section.
•
ttl
The ttl attribute is optional and designates a time interval, in minutes, for revalidation of the content. If a time value is omitted, the content is fetched only once and its freshness is never checked again. Usually the ttl attribute is a positive value; however, you can also assign a negative value to the ttl attribute. The following list describes ttl attribute value ranges.
•
serveStartTime
The serveStartTime attribute is optional and designates a time in yyyy-mm-dd hh:mm:ss format when the CDN is allowed to start serving the content. If the time to serve is omitted, content is ready to serve once it is distributed to the Content Engine or other edge device.
•
serveStopTime
The serveStopTime attribute is optional and designates a time in yyyy-mm-dd hh:mm:ss format when the CDN temporarily stops serving the content. If the time to stop serving is omitted, the CDN serves the content until it is removed by modifying the relevant manifest file code. For the effect of the serveStopTime attribute on pre-positioned content freshness, see the "Configuring Freshness of Pre-Positioned Content" section.
•
alternateUrl
The alternateUrl attribute is optional. If content requested by the user is not ready in the CDN, the CDN redirects the request to this alternative URL, which can be configured as an error reporting page. The alternateUrl attribute supports only the full URL.
•
priority
The priority attribute is optional and can be any integer value to specify the content processing priority. If a priority value is omitted, its index order within the manifest file is used to set the priority.
•
requireAuth
The requireAuth attribute is optional and determines whether users need to be authenticated before the specified content is played. When true, the Content Engine requires authentication to play back the specified content to users and communicates with the origin server to check credentials. If the requests pass the credential check, the content is played back from the Content Engine. If this attribute is omitted, a heuristic approach is used to determine its value: if the specified content is acquired by using a username and password, requireAuth is set to true, otherwise it is set to false. For FTP, if the username is anonymous, requireAuth is set to false.
•
playDuration
Specifies the play time duration, in seconds, for a video file. This time duration value is used to:
–
Download the video file and play it over HTTP
If the video file is played using HTTP, the CDN uses the playDuration attribute to calculate HTTP download bit rate.
–
Schedule a video TV-Out program
If the time duration value is omitted, the CDN attempts to calculate this value for an MPEG file.
•
failRetryInterval
The failRetryInterval attribute specifies the retry interval, in minutes, when content acquisition fails. For example, failRetryInterval="10" means that the CDN retries content acquisition every 10 minutes after an acquisition failure. If the attribute is omitted, the default value is five minutes, which is also the minimum failRetryInterval value accepted. A value of less than five minutes is converted to five minutes.
The retry behavior differs for single items and crawl items.
For a single-item failure:
If ttl is nonzero and less than the retry interval, the item is rechecked at the ttl interval. Otherwise, the item is rechecked at the failRetryInterval interval.
For a crawl-item failure:
If some pages fail (excluding 300-series and 400-series status codes), only the failed items are rechecked at each retry interval.
When the ttl interval elapses, all pages are recrawled.
If ttl is nonzero and less than the retry interval, the job is always recrawled at the ttl interval.
For example, if ttl is 10 and the retry interval is 4, the following occurs:
Minutes  Action
0        crawl
4        re-check failed
8        re-check failed
10       re-crawl
14       re-check
18       re-check
20       re-crawl
•
bitrate-in-bps/bitrate-in-kbps/bitrate-in-mbps
These attributes specify the bit rate of the content in bits, kilobits, or megabits per second. The CDN uses this value for download-and-play playback. For example:
<item src="http://www.cisco.com/Prod/ACNS.swf" bitrate-in-kbps="500" />
This means that ACNS software uses 500 kbps to download this file when it is requested for download and play.
•
ignoreQueryString
This playback attribute can be used with the options, item-group, item, and crawler tags. If the value is true, the CDN ignores everything after the question mark (?) in the request URL when matching playback requests. If this attribute is omitted, the default value is false.
For example, suppose content with the URL http://web-server/foo has been pre-positioned. If a user requests http://web-server/foo?id=xxx and the ignoreQueryString attribute is false, the CDN does not serve the pre-positioned content for http://web-server/foo.
However, if the ignoreQueryString attribute is set to true, the CDN treats a request for http://web-server/foo?id=xxx the same as a request for http://web-server/foo and serves the pre-positioned content.
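The failRetryInterval and ignoreQueryString rules above are small enough to model directly. The following Python sketch is illustrative only, not ACNS code: the function names are hypothetical, ttl and the retry interval are assumed to be in minutes, and `crawl_schedule` assumes that the failed pages keep failing. It uses the retry interval as given so that it reproduces the 4-minute example in the table; a real configuration would first be subject to the five-minute minimum, which `effective_retry_interval` applies.

```python
MIN_RETRY_MINUTES = 5  # documented default and minimum failRetryInterval


def effective_retry_interval(value=None):
    """Apply the documented default and five-minute minimum to failRetryInterval."""
    if value is None:
        return MIN_RETRY_MINUTES
    return max(value, MIN_RETRY_MINUTES)


def single_item_recheck_interval(ttl, retry):
    """Minutes between rechecks of a failed single item."""
    if ttl != 0 and ttl < retry:
        return ttl
    return retry


def crawl_schedule(ttl, retry, horizon):
    """List (minute, action) events for a crawl job whose failed pages keep failing."""
    if retry <= 0:
        raise ValueError("retry interval must be positive")
    events = [(0, "crawl")]
    last_crawl = 0
    always_recrawl = ttl != 0 and ttl < retry
    for minute in range(1, horizon + 1):
        elapsed = minute - last_crawl
        if ttl != 0 and elapsed == ttl:
            events.append((minute, "re-crawl"))  # ttl elapsed: recrawl all pages
            last_crawl = minute
        elif not always_recrawl and elapsed % retry == 0:
            events.append((minute, "re-check failed"))  # retry failed pages only
    return events


def playback_key(request_url, ignore_query_string=False):
    """URL used to look up pre-positioned content for a playback request."""
    if ignore_query_string:
        return request_url.partition("?")[0]  # drop "?" and everything after it
    return request_url


print(crawl_schedule(ttl=10, retry=4, horizon=20))
prepositioned = {"http://web-server/foo"}
print(playback_key("http://web-server/foo?id=xxx") in prepositioned)        # False
print(playback_key("http://web-server/foo?id=xxx", True) in prepositioned)  # True
```

With ttl=10 and a 4-minute retry interval, the printed schedule matches the table: rechecks at 4, 8, 14, and 18 minutes and recrawls at 10 and 20.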
Subelements
•
<contains/>
•
<http-meta-data/>
Example
<item src="index.html" server="cisco.com" ttl="3000" alternateUrl="http://www.cisco.com/cdn-error.html"/>
crawler
The <crawler> </crawler> tag set supports crawling a website or an FTP server.
Attributes
•
start-url
The start-url attribute is required. It defines the URL at which to start the process of crawling the website or FTP server. It is identical to the src attribute used in the <item> tag (see the "src" section).
•
depth
The depth attribute is optional and defines the link depth to which a website is to be crawled or directory depth to which an FTP server is to be crawled. If the depth is not specified, the default is 20. The following are the general depth values:
0 = acquire only the starting URL
1 = acquire the starting URL and its referred files
-1 = infinite or no depth restriction
The depth is defined as the level of a website or the directory level of an FTP server, where 0 is the starting URL.
•
prefix
The prefix attribute is optional. The host name from the <server> tag is combined with the value of the prefix attribute to create a full prefix, and only content whose URL matches the full prefix is acquired. For example, given this <server> tag:
<server name="xx"> <host name="www.cisco.com" proto="https" port="433" /> </server>
and this attribute in a <crawler> tag:
prefix="marketing/eng/"
the full prefix is "https://www.cisco.com:433/marketing/eng/". Only URLs that match this prefix are crawled.
If the prefix attribute is omitted, the crawler uses the default full prefix, which is the host name portion of the URL from the server. In the previous example, the default full prefix is "https://www.cisco.com:433".
•
accept
The accept attribute is optional and uses a regular expression to define acceptable URLs to crawl in addition to matching the prefix. For example, accept="stock" means that only URLs that meet two conditions are searched: the URL matches the prefix and contains the string "stock." (See the "Writing Common Regular Expressions" section for more information on using regular expressions.)
•
reject
The reject attribute is optional and uses a regular expression to reject URLs. The reject regular expression is checked after the prefix match: if a URL does not match the prefix, it is rejected immediately; if a URL matches the prefix and also matches the reject regular expression, it is rejected. (See the "Writing Common Regular Expressions" section for more information on using regular expressions.)
•
max-number
The max-number attribute is optional and specifies the maximum number of content objects that the crawler job can acquire.
•
max-size-in-MB
The max-size-in-MB attribute is optional and specifies the maximum content size, in megabytes, that this crawler job can acquire. To express the limit in kilobytes or bytes, use the max-size-in-KB or max-size-in-B attribute instead.
•
srcPrefix
The srcPrefix attribute is optional and must be used in conjunction with the cdnPrefix attribute to form a relative CDN URL. If a srcPrefix attribute is not specified, or if the prefix of the relative source URL does not match the srcPrefix attribute, the relative CDN URL is the cdnPrefix value combined with the relative source URL. For example, if a set of content objects share the source URL prefix "acme/pubs/docs/online/Design/" and you want to replace this prefix with "online/", specify srcPrefix="acme/pubs/docs/online/Design/" and cdnPrefix="online/".
•
cdnPrefix
The cdnPrefix attribute is optional and must be used in conjunction with the srcPrefix attribute.
•
requireAuth
The requireAuth attribute is optional and determines whether users must be authenticated before the specified content is played. When it is true, the Content Engine requires authentication to play back the specified content and communicates with the origin server to check credentials. If a request passes the credential check, the content is played back from the Content Engine. If this attribute is omitted, its value is determined heuristically: if the specified content is acquired by using a username and password, requireAuth is set to true; otherwise, it is set to false. For FTP, if the username is anonymous, requireAuth is set to false.
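The scoping attributes above (prefix, accept, reject, depth, and max-number) can be modeled together as a breadth-first crawl over an in-memory link graph. This Python sketch is hypothetical, not the crawler's implementation: the function names are invented, Python's re.search stands in for the crawler's regular-expression engine, accept and reject are assumed to be tested against the full URL, and max-number is treated as a cap on acquired URLs. Consistent with the rule that a content object outside the crawl criteria is not searched, an out-of-scope URL is neither acquired nor followed.

```python
import re
from collections import deque


def full_prefix(host, proto="http", port=None, prefix=""):
    """Combine <host> values with the crawler prefix attribute."""
    base = f"{proto}://{host}" if port is None else f"{proto}://{host}:{port}"
    return f"{base}/{prefix}" if prefix else base


def in_scope(url, scope, accept=None, reject=None):
    """Check the prefix first, then accept (must match), then reject (must not)."""
    if not url.startswith(scope):
        return False
    if accept is not None and re.search(accept, url) is None:
        return False
    if reject is not None and re.search(reject, url) is not None:
        return False
    return True


def crawl(start_url, links, scope, depth=20, accept=None, reject=None, max_number=None):
    """Breadth-first crawl of an in-memory link graph; depth -1 means unlimited."""
    acquired = []
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, level = queue.popleft()
        if not in_scope(url, scope, accept, reject):
            continue  # out of scope: neither acquired nor followed
        acquired.append(url)
        if max_number is not None and len(acquired) >= max_number:
            break  # max-number reached
        if depth != -1 and level >= depth:
            continue  # depth limit reached: do not follow this page's links
        for child in links.get(url, []):
            if child not in seen:
                seen.add(child)
                queue.append((child, level + 1))
    return acquired


scope = full_prefix("www.cisco.com", proto="https", port=433, prefix="marketing/eng/")
site = {  # hypothetical link graph: page -> pages it links to
    scope + "index.html": [scope + "a.html", scope + "run.pl", "https://other.example/x.html"],
    scope + "a.html": [scope + "b.html"],
}
print(crawl(scope + "index.html", site, scope, depth=1, reject=r"\.pl"))
```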
The following attributes, described in the "item" section, can also be specified in the <crawler> tag.
•
alternateUrl
•
bitrate-in-bps/bitrate-in-kbps/bitrate-in-mbps
•
expires
•
failRetryInterval
•
ignoreQueryString
•
noRedirectToOrigin
•
playDuration
•
playServer
•
prefetch
•
serveStartTime
•
serveStopTime
•
server
•
priority
•
ttl
•
type
Subelements
•
<http-meta-data/>
•
<matchRule></matchRule>
Example
<server name="cisco"><host name="http://www.cisco.com/jobs/" /></server>
<crawler server="cisco" start-url="eng/index.html" depth="10" prefix="eng/" reject="\.pl" max-size-in-MB="200"/>
item-group
The <item-group> </item-group> tag set places shared attributes under one tag so that they are shared by every <item> and <crawler> tag within the group. Shared attributes can be defined either at the <item-group> tag level for group-wide control or on a per-<item> or per-<crawler> basis. For example, if every <item> tag uses the same server and ttl attributes, you can enclose these <item> tags in an <item-group> tag and place the server and ttl attributes in the <item-group> tag.
Using shared attributes makes any manifest file with many <item> tags more efficient by consolidating the <item> tags with shared attributes. If the same attribute value exists in both the <item-group> and <item> tags, the value in the <item> tag takes precedence over that value in the <item-group> tag.
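The precedence rule can be expressed as an ordered dictionary merge. In this hypothetical Python sketch, attribute sets are plain dictionaries listed from the outermost <item-group> inward. The manifest schema permits one <item-group> inside another; the sketch assumes that the innermost value wins at every level, which extends the documented item-over-group rule.

```python
from functools import reduce


def resolve_attributes(*levels):
    """Merge attribute dictionaries from the outermost <item-group> to the <item>.

    Each later (inner) level overrides the earlier (outer) ones, so a value on
    the <item> or <crawler> tag takes precedence over its <item-group> value.
    """
    return reduce(lambda merged, level: {**merged, **level}, levels, {})


group = {"server": "origin-web-server", "type": "prepos", "ttl": "300"}
item = {"src": "animlogo.mpg", "cdn-url": "animatedlogo.mpg", "ttl": "60"}
print(resolve_attributes(group, item)["ttl"])  # 60
```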
The <item-group> tag must be enclosed within the <CdnManifest> tag set and contain one or more <item> or <crawler> tags.
Attributes
If an attribute value is present only at the <item-group> tag level, it is inherited by the <item> and <crawler> tags within the group. The attributes of a crawler job, whether inherited or defined on the <crawler> tag itself, are propagated to the content fetched by the crawler job.
The following attributes can be shared across many <item> and <crawler> tags and are candidates for the <item-group> level tag. See the "item" section for detailed descriptions of the following attributes:
•
alternateUrl
•
bitrate-in-bps/bitrate-in-kbps/bitrate-in-mbps
•
expires
•
failRetryInterval
•
ignoreQueryString
•
noRedirectToOrigin
•
playDuration
•
playServer
•
prefetch
•
requireAuth
•
serveStartTime
•
serveStopTime
•
server
•
priority
•
ttl
•
type
Additionally, the following two attributes can be placed within the <item-group> tag. See the "crawler" section for a detailed description of the two following attributes:
•
srcPrefix
•
cdnPrefix
These two attributes convert the prefix of the src-url (retrieve URL) to the cdn-url (publish URL) for multiple content objects. These content objects are either explicitly specified by multiple <item> tags or acquired through a crawler job.
These two attributes can also be specified in the <crawler> tag. If you explicitly specify the srcPrefix attribute and cdnPrefix attribute for an individual <crawler> job, the <crawler> tag-level specification takes precedence over the <item-group> tag-level settings. If you do not specify these attributes for an individual <crawler> job, the <item-group> tag-level specification is inherited by the <crawler> job.
The srcPrefix and cdnPrefix attributes generate the relative CDN URL using the following rules:
•
If the cdn-url attribute is present in the <item> tag, the relative CDN URL is the cdnPrefix value followed by the cdn-url value. For example, if cdnPrefix="eng/spec" and cdn-url="e/f.html", the relative path in the URL is "eng/spec/e/f.html".
•
If the srcPrefix attribute is not present in the <item> tag, the relative CDN URL is the cdnPrefix value followed by the relative source URL.
•
If the prefix of the relative source URL does not match the srcPrefix attribute, the relative CDN URL is the cdnPrefix value followed by the relative source URL.
•
Otherwise, the relative CDN URL is generated by removing the matched prefix from the relative source URL and replacing it with the cdnPrefix value.
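The four rules above can be written as a single function. This Python sketch is a hypothetical model, not the CDN's code: relative URLs are plain strings, an absent cdnPrefix is treated as empty, and the slash handling (exactly one "/" between cdnPrefix and the remaining path) is inferred from the cdnPrefix="eng/spec" example rather than documented.

```python
def _join(prefix, rest):
    """Join a cdnPrefix and a path with exactly one slash between them."""
    if not prefix:
        return rest
    return prefix.rstrip("/") + "/" + rest.lstrip("/")


def relative_cdn_url(src, cdn_prefix="", src_prefix=None, cdn_url=None):
    """Apply the srcPrefix/cdnPrefix rules to one content object."""
    if cdn_url is not None:
        # An explicit cdn-url is appended to cdnPrefix.
        return _join(cdn_prefix, cdn_url)
    if src_prefix is None or not src.startswith(src_prefix):
        # No srcPrefix, or it does not match: keep the whole source path.
        return _join(cdn_prefix, src)
    # Replace the matched srcPrefix with cdnPrefix.
    return _join(cdn_prefix, src[len(src_prefix):])


print(relative_cdn_url("design/plan/index.html", "acme/", "design/plan/"))  # acme/index.html
```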
The relative CDN URL of the <item> in the following example is "acme/default.html".
<item-group cdnPrefix="acme/" ><item src="design/index.html" cdn-url="default.html" /></item-group>
In the following example, content objects whose relative source URL begins with the srcPrefix value "design/plan/" have a relative CDN URL of "acme/" followed by the relative source URL stripped of "design/plan/". Other content objects, whose relative source URL does not begin with "design/plan/", have "acme/" followed by their original relative source URL.
<crawler start-url="design/plan/index.html" depth="-1" srcPrefix="design/plan/" cdnPrefix="acme/" />
Subelements
•
<matchRule></matchRule>
•
<http-meta-data/>
•
<crawler></crawler>
•
<item></item>
Example
<!--grouped content items-->
<item-group server="origin-web-server" type="prepos" ttl="300" cdnPrefix="unicorn/" >
  <item cdn-url="newHQpresentation.rm" src="newHQpresentation.rm" />
  <item cdn-url="animatedlogo.mpg" src="animlogo.mpg" />
  <item cdn-url="companytheme.mp3" src="cotheme.mp3" />
  <item cdn-url="newHQlayout.avi" src="newHQ.mov" />
</item-group>
matchRule
The <matchRule> </matchRule> tag set is optional and defines additional filter rules for crawler jobs. It affects only <crawler> tasks and is not used by single <item> tags. The crawler parameters defined in the <crawler></crawler> tag set primarily determine the scope of a crawl search. If a content object does not meet the criteria specified by the crawler parameters, neither it nor its children are searched.
The <matchRule> tag, however, determines only whether content objects are acquired; it does not change the scope of the search. If a web page matches the crawler parameters but not the <matchRule> criteria, its children are still searched even though the page itself is not acquired.
In the following crawler job example using the <matchRule> tag, the entire website is searched but only files with the .jpg file extension larger than 50 kilobytes are acquired.
<crawler start-url="index.html" depth="-1" ><matchRule><match size-min-in-KB="50" extension="jpg" /></matchRule></crawler>
The <matchRule> element can be nested within an <item-group> tag to define group-wide filter rules for the <crawler> tags contained in the group. It can also be a subelement of a particular <crawler> job. The <crawler> tag-level setting overrides the <item-group> tag-level setting when both are present.
If you define criteria locally for an individual <crawler> job, any group-level criteria are discarded entirely for that job. That is, if your <item-group> match rule is A and your <crawler> tag specifies match rule B, only B is used for the <crawler> tag, not a combination of A and B. You can define at most one <matchRule> tag per <item-group> tag and at most one <matchRule> tag per <crawler> tag.
Attributes
None
Subelements
At least one <match> tag
match
The <match> </match> tag specifies the criteria that content objects must meet before they can be acquired by the CDN. All attributes within a single <match> tag are ANDed (combined as a logical conjunction).
You can specify multiple <match> tags within the <matchRule> tag. The <match> tags are ORed (combined as a logical disjunction). You must specify at least one <match> tag per <matchRule> tag.
Attributes
•
mime-type
The mime-type attribute specifies the MIME type of the content.
•
extension
The extension attribute specifies file extensions.
•
time-before
The time-before attribute specifies that the content was modified before this time, in yyyy-mm-dd hh:mm:ss format. Express the time in GMT (for GMT offsets, see the "Manifest File Time Zone Tables" section).
•
time-after
The time-after attribute specifies that the content was modified after this time, in yyyy-mm-dd hh:mm:ss format. Express the time in GMT (for GMT offsets, see the "Manifest File Time Zone Tables" section).
•
size-min-in-MB
The size-min-in-MB attribute specifies that the acquired content must be larger than this number of megabytes. To express the limit in kilobytes or bytes, use the size-min-in-KB or size-min-in-B attribute instead.
•
size-max-in-MB
The size-max-in-MB attribute specifies that the acquired content must be smaller than this number of megabytes. To express the limit in kilobytes or bytes, use the size-max-in-KB or size-max-in-B attribute instead.
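The AND-within-a-tag and OR-across-tags semantics, together with the rule that a crawler-level <matchRule> replaces the group-level one, can be sketched in Python as follows. This is a hypothetical model: each <match> tag is a dictionary of its attributes, each content object is a dictionary with assumed keys (mime_type, extension, modified as a GMT datetime, and size in bytes), "larger than" and "smaller than" are treated as strict comparisons, and 1 KB is assumed to be 1024 bytes.

```python
from datetime import datetime

UNIT_BYTES = {"B": 1, "KB": 1024, "MB": 1024 * 1024}  # assumption: binary units


def _gmt(value):
    """Parse a yyyy-mm-dd hh:mm:ss GMT time string."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


def match_ok(match, obj):
    """Every attribute of one <match> tag must hold (logical AND)."""
    if "mime-type" in match and obj["mime_type"] != match["mime-type"]:
        return False
    if "extension" in match and obj["extension"] != match["extension"]:
        return False
    if "time-before" in match and not obj["modified"] < _gmt(match["time-before"]):
        return False
    if "time-after" in match and not obj["modified"] > _gmt(match["time-after"]):
        return False
    for unit, factor in UNIT_BYTES.items():
        low = match.get(f"size-min-in-{unit}")
        high = match.get(f"size-max-in-{unit}")
        if low is not None and not obj["size"] > int(low) * factor:
            return False
        if high is not None and not obj["size"] < int(high) * factor:
            return False
    return True


def rule_ok(match_rule, obj):
    """At least one <match> tag in the <matchRule> must hold (logical OR)."""
    return any(match_ok(match, obj) for match in match_rule)


def effective_match_rule(group_rule, crawler_rule):
    """A crawler-level <matchRule> replaces the group-level rule; they are not combined."""
    return crawler_rule if crawler_rule is not None else group_rule


photo = {"mime_type": "image/jpeg", "extension": "jpg",
         "modified": datetime(2000, 1, 1), "size": 60 * 1024}
print(rule_ok([{"size-min-in-KB": "50", "extension": "jpg"}], photo))  # True
```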
Subelements
None
Example
<!-- crawling item group -->
<item-group server="origin-server" type="prepos">
  <matchRule><match time-before="2000-05-05 12:00:00"/></matchRule>
  <crawler start-url="eng/index.html" depth="-1"/>
  <crawler start-url="hr/index.html" depth="3">
    <matchRule><match size-min-in-KB="1" extension="xxx"/></matchRule>
  </crawler>
</item-group>
contains
The <contains/> tag is optional and identifies content objects that are embedded within the content item currently being described, such as the components of a SMIL (Synchronized Multimedia Integration Language) file. Requests for an item that uses <contains/> links are accepted only after the CDN determines that the dependent content objects are present in the Content Engine.
The <contains /> tag must be enclosed within the <item> </item> tag.
The <contains /> tag is used to include embedded files for some video files like .asf or .rp. The CDN does not serve this item unless every contained item is present.
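The serving rule for <contains/> amounts to an all-present check. In this hypothetical Python sketch, the Content Engine's cache is modeled as a set of relative CDN URLs, using the house.rp example from this section; requiring the item itself to be present is an assumption.

```python
def can_serve(cdn_url, contains, present):
    """Serve an item only if it and every object it <contains> are present.

    contains maps an item's relative CDN URL to the cdn-url values of its
    <contains/> subelements; present is the set of cached CDN URLs.
    """
    if cdn_url not in present:
        return False
    return all(child in present for child in contains.get(cdn_url, []))


contains = {"house.rp": ["img08.jpg", "img09.jpg"]}
print(can_serve("house.rp", contains, {"house.rp", "img08.jpg"}))               # False
print(can_serve("house.rp", contains, {"house.rp", "img08.jpg", "img09.jpg"}))  # True
```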
Attributes
The cdn-url attribute is required and is the relative CDN URL of one of the embedded contents.
Subelements
None
Example
<item src="house/img08.jpg" cdn-url="img08.jpg" />
<item src="house/img09.jpg" cdn-url="img09.jpg" />
<item cdn-url="house.rp" src="house/house.rp">
  <contains cdn-url="img08.jpg"/>
  <contains cdn-url="img09.jpg"/>
</item>
http-meta-data
The <http-meta-data> tag is optional and is used for HTTP playback of content. If a content object is requested through HTTP, these attributes are sent to end users as HTTP response headers. These headers are useful, for example, for specifying the content type of FTP-acquired content.
The element can be nested within an <item-group>, <item>, or <crawler> tag. The <item> or <crawler> tag-level settings override the <item-group> tag-level settings.
Attributes
Attributes are specified as name="value" pairs and can be standard HTTP header metadata or custom application metadata. If the element is a child of an <item-group> tag, the attributes apply to the content contained within the group.
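As an illustration of how <http-meta-data> values might surface in a playback response, this Python sketch merges group-level and item-level name/value pairs (the item level wins) and renders them as HTTP header lines. The function name and the CRLF header formatting are assumptions for illustration; the Content Engine's actual response construction is not described here.

```python
def render_headers(group_meta, item_meta):
    """Render merged <http-meta-data> attributes as HTTP response header lines.

    Item- or crawler-level values override group-level values with the same name.
    """
    merged = {**group_meta, **item_meta}
    return "".join(f"{name}: {value}\r\n" for name, value in merged.items())


# For example, give FTP-acquired PDF files an explicit content type.
print(render_headers({"Cache-Control": "max-age=300"}, {"Content-Type": "application/pdf"}), end="")
```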
Subelements
None
Configuring Freshness of Pre-Positioned Content
You can use four manifest file configurations of the serveStopTime and noRedirectToOrigin attributes to configure and manage the freshness of your pre-positioned content:
•
Both the serveStopTime and noRedirectToOrigin attributes are included in the manifest file, so noRedirectToOrigin is true. The conditions for this first case are shown in Table 6-7.
•
Only the serveStopTime attribute is included in the manifest file, so noRedirectToOrigin is false. The conditions for this second case are shown in Table 6-8.
•
Neither the serveStopTime nor the noRedirectToOrigin attribute is included in the manifest file, so noRedirectToOrigin is false. The conditions for this third case are shown in Table 6-9.
•
Only the noRedirectToOrigin attribute is included in the manifest file, so noRedirectToOrigin is true. The conditions for this fourth case are shown in Table 6-10.
The conditions and corresponding results, which depend on whether the serveStopTime and noRedirectToOrigin attributes are included and on the timing of the serveStopTime value relative to the HTTP header expiration, are listed in Table 6-7 through Table 6-10. In these tables, now is the time at which the end-user request arrives; the tables use this arrival time to make content delivery decisions.
XML Schema
For the manifest file, an XML schema defines the custom markup language of the manifest file and the structure of a given set of XML documents. The XML schema specifies which tags or elements you can use in your documents, the attributes those tags can contain, and their arrangement.
Manifest XML Schema
An XSD (XML Schema Definition) file describes the structure of an XML document using the W3C XML Schema language. For more information on XSD, go to http://www.w3schools.com/schema/schema_intro.asp.
The following XML code is the manifest XML schema (CdnManifest.xsd).
<?xml version="1.0"?><xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"><xs:include schemaLocation="PlayServerTable.xsd"/><xs:element name="CdnManifest"><xs:complexType><xs:sequence><xs:element ref="playServerTable" minOccurs="0" maxOccurs="1"/><xs:element ref="options" minOccurs="0" maxOccurs="1"/><xs:element ref="proxyServer" minOccurs="0" maxOccurs="unbounded"/><xs:choice minOccurs="0" maxOccurs="unbounded"><xs:element ref="server" maxOccurs="unbounded"/><xs:element ref="item-group" maxOccurs="unbounded"/><xs:element ref="item" maxOccurs="unbounded"/><xs:element ref="crawler" maxOccurs="unbounded"/></xs:choice><xs:choice minOccurs="1" maxOccurs="unbounded"><xs:element ref="item-group" maxOccurs="unbounded"/><xs:element ref="item" maxOccurs="unbounded"/><xs:element ref="crawler" maxOccurs="unbounded"/></xs:choice></xs:sequence></xs:complexType></xs:element><xs:element name="options"><xs:complexType><xs:attribute name="timeZone" type="xs:string" use="optional"/><xs:attribute name="notFoundUrl" type="xs:string" use="optional"/><xs:attribute name="alternateUrl" type="xs:string" use="optional"/><xs:attribute name="noRedirectToOrigin" type="xs:boolean" use="optional" default="false" /><xs:attribute name="requireAuth" type="xs:boolean" use="optional"/><xs:attribute name="ttl" type="xs:unsignedInt" use="optional"/><xs:attribute name="prefetch" type="xs:string" use="optional"/><xs:attribute name="ttl-for-missing" type="xs:unsignedInt" use="optional"/><xs:attribute name="ttl-for-non-ref" type="xs:unsignedInt" use="optional"/><xs:attribute name="type" use="optional" default="prepos"><xs:simpleType><xs:restriction base="xs:string"><xs:enumeration value="prepos"/><xs:enumeration value="wmt-live"/><xs:enumeration value="real-live"/></xs:restriction></xs:simpleType></xs:attribute><xs:attribute name="manifest-id" type="xs:string" use="optional"/><xs:attribute name="clearlog" type="xs:boolean" use="optional" 
default="false"/><xs:attribute name="rd" type="xs:string" use="optional"/><xs:attribute name="prepos-tag" type="xs:string" use="optional"/><xs:attribute name="live-tag" type="xs:string" use="optional"/></xs:complexType></xs:element><xs:element name="server"><xs:complexType><xs:sequence><xs:element ref="host" minOccurs="1" maxOccurs="1"/></xs:sequence><xs:attribute name="name" type="xs:string" use="required"/></xs:complexType></xs:element><xs:element name="host"><xs:complexType><xs:attribute name="name" type="xs:string" use="required"/><xs:attribute name="root" type="xs:string" use="optional"/><xs:attribute name="proxyServer" type="xs:string" use="optional"/><xs:attribute name="proto" use="optional"><xs:simpleType><xs:restriction base="xs:string"><xs:enumeration value="http"/><xs:enumeration value="https"/><xs:enumeration value="ftp"/><xs:enumeration value="mms"/><xs:enumeration value="rtsp"/></xs:restriction></xs:simpleType></xs:attribute><xs:attribute name="port" type="xs:unsignedShort" use="optional"/><xs:attribute name="user" type="xs:string" use="optional"/><xs:attribute name="password" type="xs:string" use="optional"/><xs:attribute name="uuencoded" type="xs:boolean" use="optional" default="false"/><xs:attribute name="proxyName" type="xs:string" use="optional"/><xs:attribute name="sslAuthType" use="optional"><xs:simpleType><xs:restriction base="xs:string"><xs:enumeration value="weak"/><xs:enumeration value="strong"/></xs:restriction></xs:simpleType></xs:attribute></xs:complexType></xs:element><xs:element name="proxyServer"><xs:complexType><xs:attribute name="serverName" type="xs:string" use="required"/><xs:attribute name="port" type="xs:unsignedShort" use="optional"/><xs:attribute name="user" type="xs:string" use="optional"/><xs:attribute name="password" type="xs:string" use="optional"/><xs:attribute name="uuencoded" type="xs:string" use="optional" default="false"/></xs:complexType></xs:element><xs:attributeGroup name = "contentAttr"><xs:attribute name="server" 
type="xs:string" use="optional"/><xs:attribute name="proxyServer" type="xs:string" use="optional"/><xs:attribute name="playServer" type="xs:string" use="optional"/><xs:attribute name="type" use="optional"><xs:simpleType><xs:restriction base="xs:string"><xs:enumeration value="prepos"/><xs:enumeration value="wmt-live"/><xs:enumeration value="real-live"/></xs:restriction></xs:simpleType></xs:attribute><xs:attribute name="noRedirectToOrigin" type="xs:boolean" use="optional"/><xs:attribute name="requireAuth" type="xs:boolean" use="optional"/><xs:attribute name="alternateUrl" type="xs:string" use="optional"/><xs:attribute name="ttl" type="xs:unsignedInt" use="optional"/><xs:attribute name="priority" type="xs:unsignedInt" use="optional"/><xs:attribute name="prefetch" type="xs:string" use="optional"/><xs:attribute name="expires" type="xs:string" use="optional"/><xs:attribute name="serve" type="xs:string" use="optional"/><xs:attribute name="serveStartTime" type="xs:string" use="optional"/><xs:attribute name="serveStopTime" type="xs:string" use="optional"/></xs:attributeGroup><xs:attributeGroup name = "prefixAttr"><xs:attribute name="cdnPrefix" type="xs:string" use="optional"/><xs:attribute name="srcPrefix" type="xs:string" use="optional"/></xs:attributeGroup><xs:element name="item-group"><xs:complexType><xs:sequence><xs:element ref="matchRule" minOccurs="0" maxOccurs="1"/><xs:element ref="http-meta-data" minOccurs="0" maxOccurs="1"/><xs:choice minOccurs="1" maxOccurs="unbounded"><xs:element ref="item-group" maxOccurs="unbounded"/><xs:element ref="item" maxOccurs="unbounded"/><xs:element ref="crawler" maxOccurs="unbounded"/></xs:choice></xs:sequence><xs:attributeGroup ref="contentAttr"/><xs:attributeGroup ref="prefixAttr"/></xs:complexType></xs:element><xs:element name="item"><xs:complexType><xs:sequence><xs:element ref="contains" minOccurs="0" maxOccurs="unbounded"/><xs:element ref="http-meta-data" minOccurs="0" maxOccurs="1"/></xs:sequence><xs:attribute name="src" 
type="xs:string" use="required"/><xs:attribute name="cdn-url" type="xs:string" use="optional"/><xs:attributeGroup ref="contentAttr"/></xs:complexType></xs:element><xs:element name="crawler"><xs:complexType><xs:all><xs:element ref="matchRule" minOccurs="0" maxOccurs="1"/><xs:element ref="http-meta-data" minOccurs="0" maxOccurs="1"/></xs:all><xs:attribute name="start-url" type="xs:string" use="required"/><xs:attribute name="depth" type="xs:short" use="optional"/><xs:attribute name="prefix" type="xs:string" use="optional"/><xs:attribute name="accept" type="xs:string" use="optional"/><xs:attribute name="reject" type="xs:string" use="optional"/><xs:attribute name="max-number" type="xs:unsignedInt" use="optional"/><xs:attribute name="max-size-in-B" type="xs:unsignedInt" use="optional"/><xs:attribute name="max-size-in-KB" type="xs:unsignedInt" use="optional"/><xs:attribute name="max-size-in-MB" type="xs:unsignedInt" use="optional"/><xs:attributeGroup ref="contentAttr"/><xs:attributeGroup ref="prefixAttr"/></xs:complexType></xs:element><xs:element name="contains"><xs:complexType><xs:attribute name="cdn-url" type="xs:string" use="required"/></xs:complexType></xs:element><xs:element name="matchRule"><xs:complexType><xs:sequence><xs:element ref="match" minOccurs="1" maxOccurs="unbounded"/></xs:sequence></xs:complexType></xs:element><xs:element name="match"><xs:complexType><xs:attribute name="mime-type" type="xs:string" use="optional"/><xs:attribute name="time-before" type="xs:string" use="optional"/><xs:attribute name="time-after" type="xs:string" use="optional"/><xs:attribute name="size-min-in-B" type="xs:int" use="optional"/><xs:attribute name="size-max-in-B" type="xs:int" use="optional"/><xs:attribute name="size-min-in-KB" type="xs:int" use="optional"/><xs:attribute name="size-max-in-KB" type="xs:int" use="optional"/><xs:attribute name="size-min-in-MB" type="xs:int" use="optional"/><xs:attribute name="size-max-in-MB" type="xs:int" use="optional"/><xs:attribute 
name="extension" type="xs:string" use="optional"/></xs:complexType></xs:element><xs:element name="http-meta-data"><xs:complexType><xs:anyAttribute processContents="skip" /></xs:complexType></xs:element></xs:schema>
PlayServerTable XML Schema
The following XML code defines the PlayServerTable schema (PlayServerTable.xsd), which is included by CdnManifest.xsd.
<?xml version="1.0"?><xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"><xs:element name="playServerTable"><xs:complexType><xs:sequence><xs:element ref="playServer" minOccurs="1" maxOccurs="unbounded"/></xs:sequence></xs:complexType></xs:element><xs:element name="playServer"><xs:complexType><xs:choice minOccurs="1" maxOccurs="unbounded"><xs:element ref="contentType"/><xs:element ref="extension"/></xs:choice><xs:attribute name="name" use="required"><xs:simpleType><xs:restriction base="xs:string"><xs:enumeration value="real"/><xs:enumeration value="wmt"/><xs:enumeration value="http"/><xs:enumeration value="qtss"/></xs:restriction></xs:simpleType></xs:attribute></xs:complexType></xs:element><xs:element name="contentType"><xs:complexType><xs:attribute name="name" type="xs:string" use="required"/></xs:complexType></xs:element><xs:element name="extension"><xs:complexType><xs:attribute name="name" type="xs:string" use="required"/></xs:complexType></xs:element></xs:schema>
Default PlayServer Table
The following XML code is the default PlayServer table, an XML document that conforms to the PlayServerTable schema (PlayServerTable.xsd).
<?xml version="1.0"?><playServerTable xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation = "PlayServerTable.xsd"><playServer name="real"><!-- MIME types taken from http://service.real.com/help/library/guides/server8/htmfiles/custmizg.htm --><contentType name="audio/x-pn-realaudio" /><contentType name="audio/x-pn-realaudio-plugin" /><contentType name="application/x-pn-realmedia" /><contentType name="application/smil" /><contentType name="application/vnd.rn-rmadriver" /><extension name="rm" /><extension name="rms" /><extension name="ra" /><extension name="rp" /><extension name="rt" /><extension name="smi" /></playServer><playServer name="qtss"><contentType name="video/quicktime" /><extension name="mov" /><extension name="qt" /><extension name="mp4" /><!-- extension avi could go here, but is also supported by wmt --></playServer><playServer name="wmt"><!-- MIME types taken from http://msdn.microsoft.com/workshop/imedia/windowsmedia/server/mime.asp --><contentType name="video/x-ms-asf" /><contentType name="audio/x-ms-wma" /><contentType name="video/x-ms-wmv" /><contentType name="video/x-ms-wm" /><contentType name="application/x-ms-wmz" /><contentType name="application/x-ms-wmd" /><!-- comments courtesy of Laura Gaughan, 11jan2001 --><extension name="wma" /> <!-- audio content --><extension name="wmv" /> <!-- audio/video content --><extension name="asf" /> <!-- audio/video content (legacy) --><extension name="wm" /> <!-- reserved for future use --><!-- extension avi could go here, but is also supported by qtss --></playServer><playServer name="http"><contentType name="application/pdf" /><contentType name="application/postscript" /><extension name="pdf" /><extension name="ps" /><!-- this must be http; wmt doesn't do asx over mms --><contentType name="audio/x-ms-wax" /><contentType name="video/x-ms-wvx" /><contentType name="video/x-ms-wmx" /><extension name="asx" /> <!-- as for wvx + .asf .asx (legacy) --><extension name="wax" /> <!-- metadata for .wma .wax --><extension name="wvx" /> <!-- metadata for .wma .wmv .wvx .wax --><extension name="wmx" /> <!-- reserved for future use --><!-- add all types from wmt tables to here, since they can be played by http too. MIME types taken from http://msdn.microsoft.com/workshop/imedia/windowsmedia/server/mime.asp --><contentType name="video/x-ms-asf" /><contentType name="audio/x-ms-wma" /><contentType name="video/x-ms-wmv" /><contentType name="video/x-ms-wm" /><contentType name="application/x-ms-wmz" /><contentType name="application/x-ms-wmd" /><!-- comments courtesy of Laura Gaughan, 11jan2001 --><extension name="wma" /> <!-- audio content --><extension name="wmv" /> <!-- audio/video content --><extension name="asf" /> <!-- audio/video content (legacy) --><extension name="wm" /> <!-- reserved for future use --><!-- extension avi could go here, but is also supported by qtss --></playServer></playServerTable>
Manifest File Time Zone Tables
To convert to local time, you must know the time difference between Greenwich Mean Time (GMT) and local time for both standard time and summer time (daylight saving time). Table 6-11 through Table 6-26 list the time zones supported by the manifest file. Each time zone is written on its own line in the following format:
<zonename>:[+:|-:]hh:mm
where <zonename> is the name of the time zone or the standard time zone abbreviation (see Table 6-11), with no spaces before or after the colon (":"), and "[+:|-:]hh:mm" is the GMT offset in hours and minutes. If the sign is omitted, the GMT offset defaults to "+".
Table 6-18 Brazil GMT Offsets
Time Zone: GMT Offset
Brazil/Acre:-:05:00
Brazil/East:-:03:00
Brazil/West:-:04:00
Brazil/DeNoronha:-:02:00
Table 6-22 Jamaica/Japan/Kwajalein/Libya GMT Offsets
Time Zone: GMT Offset
Jamaica:-:05:00
Japan:+:09:00
Kwajalein:+:12:00
Libya:+:02:00
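Because the entry format is regular, it can be checked mechanically. The following Python sketch parses a single entry such as Brazil/Acre:-:05:00 and converts a GMT time to that zone's local time. It is an illustration only, not part of the ACNS software; the function names are hypothetical, and the sketch applies only the fixed offset from the table, so any summer-time (daylight saving time) adjustment must be added separately.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def parse_zone_entry(entry: str) -> Tuple[str, timedelta]:
    """Parse '<zonename>:[+|-:]hh:mm' into (zone name, offset from GMT)."""
    parts = entry.strip().split(":")
    if len(parts) == 4:            # e.g. "Brazil/Acre:-:05:00"
        name, sign, hours, minutes = parts
    elif len(parts) == 3:          # e.g. "Japan:09:00" (sign omitted)
        name, hours, minutes = parts
        sign = "+"                 # the GMT offset defaults to "+"
    else:
        raise ValueError(f"malformed time zone entry: {entry!r}")
    if sign not in ("+", "-"):
        raise ValueError(f"invalid sign in time zone entry: {entry!r}")
    offset = timedelta(hours=int(hours), minutes=int(minutes))
    return name, -offset if sign == "-" else offset


def to_local(gmt_time: datetime, entry: str) -> datetime:
    """Convert an aware GMT datetime to the fixed offset given by a zone entry."""
    name, offset = parse_zone_entry(entry)
    return gmt_time.astimezone(timezone(offset, name))


noon_gmt = datetime(2004, 1, 1, 12, 0, tzinfo=timezone.utc)
for entry in ("Brazil/Acre:-:05:00", "Japan:+:09:00", "Kwajalein:+:12:00"):
    print(entry, "->", to_local(noon_gmt, entry).strftime("%H:%M"))
# Brazil/Acre:-:05:00 -> 07:00
# Japan:+:09:00 -> 21:00
# Kwajalein:+:12:00 -> 00:00
```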
Manifest File Automated Scripts
This section describes Perl scripts that you can use to automate the creation of manifest files for your CDN. The most efficient way to create a manifest file is to customize the Spider Perl script and use it in combination with the Manifest Perl script; both scripts are available on Cisco.com. You can also use these two scripts as the basis for your own automation scripts, modified to suit your needs.
We provide two Perl scripts, spider.pl and manifest.pl, which you can use as-is. If you are proficient in Perl, you can modify spider.pl and manifest.pl; however, Cisco does not support modified versions of these scripts. Both scripts accept a "--file" argument that points to a user-created rules file, such as a .cfg file. We recommend that you place the arguments you require in a rules file, rather than entering them on the command line, so that you can reuse them.
First, run the Spider script, and then use its output as input to the Manifest script. The Spider script searches the content of the selected origin servers and outputs a database file containing a list of the URLs of all content it finds. The Manifest script uses this database file to build a manifest file with the correct syntax, based on rules that you specify on the command line or in a rules file. The result is an XML-based manifest file containing the URLs of only those content objects that you want to make available to your users.
The following two sample automated Perl scripts are available on Cisco.com:
•
Spider Perl script
The Spider script crawls the content of a selected origin server and outputs a database file containing a list of URLs.
•
Manifest Perl script
The Manifest script reads the database file output by the Spider script and uses rules that you establish to produce an XML-formatted manifest file containing the URLs of only those filtered content objects that you want to make available to users.
Installing Perl on Your Workstation
You must have Perl installed on your workstation before working with or running the Spider or Manifest scripts. It is useful to also have a Perl interpreter available. Perl is open source software and can be downloaded for free from a variety of locations on the Internet. Refer to the Comprehensive Perl Archive Network (CPAN) at:
http://www.cpan.org
or
http://www.perl.com
Obtaining the Perl Scripts
The Spider and Manifest scripts can be obtained from Cisco.com using the same procedure that is used to obtain updated versions of the Cisco ACNS software.
To obtain the Spider and Manifest scripts from Cisco.com, follow these steps:
Step 1
Go to the following URL to find the Spider and Manifest Perl scripts:
http://www.cisco.com/pcgi-bin/tablebuild.pl/acns50
Step 2
When prompted, log in to Cisco.com using your designated Cisco.com username and password.
The Cisco ACNS Software download page appears, listing the available software updates for the Cisco ACNS Software product.
Step 3
Locate the file named ACNS-5.0.1-manifest-tools.zip. This is a ZIP archive containing both the Manifest and Spider Perl scripts.
Step 4
Click the link for the ACNS-5.0.1-manifest-tools.zip file. The download page appears.
Step 5
Click Software License Agreement.
A new browser window opens, displaying the license agreement.
Step 6
After you have read the license agreement, close the browser window displaying the agreement and return to the Software Download page.
Step 7
Click the filename link labeled Download.
Step 8
Click Save to file and then choose a location on your workstation to temporarily store the zipped file containing the scripts.
Step 9
Use your preferred unzip program to unpack the scripts to a location on your workstation or your network.
After you have unzipped the scripts, you are ready to begin using them to build manifest files for your website. See the "Listing Website Content Using the Spider Script" section and the "Selecting Live and Pre-Positioned Content Using the Manifest Script" section for instructions on running the scripts.
Listing Website Content Using the Spider Script
In the simplest scenario, the Spider script is pointed to the address of an origin server and given the name of a database (.db) file into which it places any valid URLs it discovers on that site. For example, if you wanted to analyze the contents of www.cisco.com for content that might be pre-positioned after the manifest file is created, you would issue the following command:
perl spider.pl --start=www.cisco.com --db=ciscocontent.db
Limiting or Broadening the Scope of the Spider Script
Running the Spider script on the whole of www.cisco.com might take hours and produce much more information than you are interested in. The Spider script contains a variety of tools that enable you to limit as well as broaden the scope of a spider's action.
Note
When running the Spider script on large websites, you must plan for the long period of time and the large amount of memory that is required for the Spider script to create a database.
For example, to limit the Spider script's search of www.cisco.com to just that part of the server containing product-related support information, you could enter the following command:
perl spider.pl --start=www.cisco.com/public/support/ --db=ciscocontent.db
To ask the Spider script to follow links from www.cisco.com to the Cisco networking professionals forum, you could enter the following Spider script command:
perl spider.pl --start=www.cisco.com --accept=business.cisco.com --db=ciscocontent.db
Spider Script Syntax Guidelines
The Spider script accepts the following syntax, as described in Table 6-27.
perl spider.pl {--start=origin_server_url [ --accept=accept_url | --depth=number | --file=filename | --limit=number | --prefix=url_prefix | --reject=disallowed_url ] --db=database_name.db}
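The scope examples earlier in this section can be pictured as an in-scope test applied to each URL the crawler discovers. The following Python sketch is a hypothetical model of how --start, --accept, and --reject might interact; it is not taken from spider.pl, whose actual rules may differ, and it ignores --depth, --limit, and --prefix entirely. The function name in_scope and the substring treatment of reject patterns are assumptions.

```python
from typing import Iterable
from urllib.parse import urlsplit


def in_scope(url: str, start: str,
             accept: Iterable[str] = (),
             reject: Iterable[str] = ()) -> bool:
    """Hypothetical crawler scope rule (not the spider.pl implementation).

    start  -- host plus optional path prefix, e.g. "www.cisco.com/public/support/"
    accept -- extra host names whose links may also be followed
    reject -- substrings that exclude a URL, e.g. "/cgi-bin"
    """
    # A reject pattern anywhere in the URL excludes it outright.
    if any(pattern in url for pattern in reject):
        return False
    parts = urlsplit(url)
    start_host, _, start_path = start.partition("/")
    start_path = "/" + start_path
    # Same host as --start and under its path prefix: in scope.
    if parts.netloc == start_host and (parts.path or "/").startswith(start_path):
        return True
    # Otherwise only hosts explicitly listed with --accept are followed.
    return parts.netloc in set(accept)


print(in_scope("http://www.cisco.com/public/support/tac/home.shtml",
               "www.cisco.com/public/support/"))                          # True
print(in_scope("http://business.cisco.com/forum", "www.cisco.com",
               accept=["business.cisco.com"]))                            # True
print(in_scope("http://www.cisco.com/cgi-bin/search", "www.cisco.com",
               reject=["/cgi-bin"]))                                      # False
```

Comparing the host exactly, rather than testing whether the URL merely starts with the --start text, keeps a host such as www.cisco.com.example.net from slipping into scope.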
Customizing the Spider Script
Because the Spider script anticipates certain platforms and scenarios that might not correspond to your own website configuration, Cisco provides you with the Perl source code for the Spider script, which you can modify to suit your own needs.
Selecting Live and Pre-Positioned Content Using the Manifest Script
Whereas the Spider script gathers a list of potential content from an origin server, the Manifest script sifts through the information gathered by the Spider script and decides which content to import into the CDN for placement on a Content Engine.
Pre-Positioned Versus Live Content
The Manifest script distinguishes between content that needs to be pre-positioned and live, streamed content that, by definition, cannot be pre-positioned.
The result of using the live command is nearly the same as that of using the prepos command. With both commands, you specify the content that you intend to deliver, as live content or as pre-positioned content, by using a match() or type() expression, for example --prepos=match() or --prepos=type(). The only difference between the two commands is the tags in the .xml file that manifest.pl creates. If you use the prepos command, the .xml file contains the tag <item-group type="prepos">. If you use the live command, the .xml file contains the tag <item-group type="wmt-live"> or <item-group type="real-live">, depending on whether the streaming data is WMT or RealMedia.
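The choice of tag can be sketched as follows. This Python example is an illustration under stated assumptions, not the logic of manifest.pl: it decides between wmt-live and real-live by file extension, using the wmt and real extension lists from the play server table shown earlier in this chapter. Live stream URLs do not always carry a file extension, so a real implementation (which may consult the table named by --playservertable) could well work differently. The function names and example URLs are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit

# Extension lists copied from the "wmt" and "real" entries of the play server table.
WMT_EXTENSIONS = {"wma", "wmv", "asf", "wm"}
REAL_EXTENSIONS = {"rm", "rms", "ra", "rp", "rt", "smi"}


def item_group_type(url: str, live: bool) -> str:
    """Return the item-group type attribute for a URL (hypothetical rule)."""
    if not live:
        return "prepos"
    extension = posixpath.splitext(urlsplit(url).path)[1].lstrip(".").lower()
    if extension in WMT_EXTENSIONS:
        return "wmt-live"
    if extension in REAL_EXTENSIONS:
        return "real-live"
    raise ValueError(f"cannot tell whether {url!r} is WMT or RealMedia")


def item_group_tag(url: str, live: bool) -> str:
    """Build the opening tag that would appear in the generated .xml file."""
    return f'<item-group type="{item_group_type(url, live)}">'


print(item_group_tag("http://www.example.com/images/logo.gif", live=False))
# <item-group type="prepos">
print(item_group_tag("mms://media.example.com/live/keynote.asf", live=True))
# <item-group type="wmt-live">
```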
By using the prepos command, you identify and pre-position content that meets criteria that you specify. For example, to pre-position image files from Cisco.com that are larger than one megabyte, you would enter the following command:
perl manifest.pl --prepos='type(image/*) and size > 1000k' --db=ciscocontent.db --xml=cisco.xml
By using the live command, you identify the URLs of live content. Unlike pre-positioned content, live content cannot be identified by information stored in the header, so you must devise a method of locating live content based solely on information contained in the URL of that content. For example, you can identify streamed content with the following command:
perl manifest.pl --live='match(http://*)'
Manifest Script Syntax Guidelines
The Manifest script accepts the following syntax, as described in Table 6-28.
perl manifest.pl {[--file=filename | --live='keyword_comparison' | --prepos='keyword_comparison' | --set='attribute=value : keyword_comparison' | --playservertable=filename | --map={origin_server_url_prefix=cdn_prefix}] --db=database_name.db --xml=manifest_file_name.xml}
Note
At least one of the --prepos or --live keywords is required to create a manifest file from the Spider database. If you use neither keyword, the manifest file that is created is minimal and does not contain any content URLs.
The --prepos keyword can be used with either the type() or the match() keyword, which perform different functions. The match() keyword performs a text match on the URL. For example, to select jpeg files named a.jaypeg, use match(*.jaypeg). Similarly, you can use match() to find URLs that contain the word news. The following example uses match() to select files ending in .jpg:
perl manifest.pl --db=name.db --prepos='match(*.jpg)' --xml=xmlname.xml
The type() keyword compares content against the Content-Type header that the web server returned for it, as recorded in the database file. This header informs the client of the object's MIME type. For example, if you name your jpeg files *.jaypeg, the web server returns "Content-Type: image/jpeg," which is then placed in the database.
Other examples of the type() keyword include the following:
--prepos=type(text/html)
--prepos=type(text/plain)
--prepos=type(application/pdf)
--prepos=type(image/gif)
--prepos=type(image/jpeg)
--prepos=type(video/mpeg)
The following are two examples of using the type() keyword in the full command line:
perl manifest.pl --db=name.db --prepos="type(image/jpeg)" --xml=xmlname.xml
perl manifest.pl --db=name2.db --prepos="type(application/pdf)" --xml=xmlname2.xml
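The difference between match() and type() can be illustrated with a small Python model of the database records. The record layout, the glob-style treatment of *, and the reading of 1000k as 1,024,000 bytes are assumptions made for this sketch; consult manifest.pl for its actual behavior. The example URLs are hypothetical, and the function is named type_ only to avoid shadowing Python's built-in type.

```python
from fnmatch import fnmatchcase

# Hypothetical records, standing in for what the Spider database stores per URL.
RECORDS = [
    {"url": "http://www.example.com/photos/a.jaypeg", "content_type": "image/jpeg", "size": 2_500_000},
    {"url": "http://www.example.com/images/logo.gif", "content_type": "image/gif", "size": 4_096},
    {"url": "http://www.example.com/docs/manual.pdf", "content_type": "application/pdf", "size": 3_000_000},
]


def match(record: dict, pattern: str) -> bool:
    """Text match on the URL itself; '*' matches any run of characters."""
    return fnmatchcase(record["url"], pattern)


def type_(record: dict, pattern: str) -> bool:
    """Match on the Content-Type header the server returned, ignoring parameters."""
    mime = record["content_type"].split(";", 1)[0].strip().lower()
    return fnmatchcase(mime, pattern.lower())


def parse_size(text: str) -> int:
    """Assumption: a trailing 'k' means kilobytes of 1024 bytes."""
    text = text.strip().lower()
    return int(text[:-1]) * 1024 if text.endswith("k") else int(text)


photo = RECORDS[0]
print(match(photo, "*.jpg"))        # False: the URL ends in .jaypeg
print(match(photo, "*.jaypeg"))     # True
print(type_(photo, "image/jpeg"))   # True: the header says image/jpeg

# Rough equivalent of --prepos='type(image/*) and size > 1000k'
selected = [r["url"] for r in RECORDS
            if type_(r, "image/*") and r["size"] > parse_size("1000k")]
print(selected)  # ['http://www.example.com/photos/a.jaypeg']
```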
Tip
As a rule of thumb, use quotes only on the command line. Do not use quotes within a rules file for the --file keyword.
If the --prepos keyword is used on the command line, quotes are needed as follows:
•
Windows 2000.
Use double quotes instead of single quotes as shown in the preceding example.
•
Linux.
Use single quotes instead of double quotes.
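Why the quotes matter on Linux can be shown with Python's shlex module, which splits a command line the way a POSIX shell such as bash does. It does not model the Windows command prompt, which does not treat single quotes as quoting characters. With the quotes, the whole expression reaches manifest.pl as one --prepos argument; without them, the expression is broken into separate words, and in a real shell the unquoted > would also be treated as an output redirection.

```python
import shlex

quoted = ("perl manifest.pl --prepos='type(image/*) and size > 1000k' "
          "--db=ciscocontent.db --xml=cisco.xml")
unquoted = ("perl manifest.pl --prepos=type(image/*) and size > 1000k "
            "--db=ciscocontent.db --xml=cisco.xml")

# The shell removes the single quotes and keeps the expression as one argument.
print(shlex.split(quoted)[2])
# --prepos=type(image/*) and size > 1000k

# Without quotes, the expression is split at every space.
print(shlex.split(unquoted)[2:7])
# ['--prepos=type(image/*)', 'and', 'size', '>', '1000k']
```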
If the --prepos argument is used within a rules file that you specify with the --file argument, do not quote its value; quotes are not required, as shown in the following rules.cfg file example:
--start=www.cisco.com
--accept=forums.cisco.com
--reject=/cgi-bin
--limit=0
--db=ciscocontent.db
--prepos=type(image/gif) and size > 1000k
--xml=ciscomanifest.xml
If you do not remove the quotes from the rules file, the following message is displayed: "Bareword found where operator expected."
Customizing the Manifest Script
Because the Manifest script anticipates certain platforms and scenarios that might not correspond to your own website configuration, Cisco provides you with the Perl source code for the Manifest script, which you can modify to suit your own needs.
Creating a Rules File for the Spider and Manifest Scripts
When using the Spider and Manifest scripts on a large web server, the parameters and rules you set for your scripts may be numerous and complex. In that case, it is more practical to create a separate file containing your customized rules. You can then point the scripts to that file rather than entering a long series of arguments every time you want the rules applied.
Using a rules file makes it easier to rerun the Spider and Manifest scripts and ensures that the scripts receive identical arguments each time they are run. In addition, both the Spider and the Manifest scripts can read the same rules file without generating output errors: the Spider script simply ignores arguments intended for the Manifest script, and vice versa.
To create a rules file for the Spider and Manifest scripts, follow these steps:
Step 1
Open your text editor.
Step 2
Enter your commands one at a time, each on its own line.
Each line of your rules file is passed to the scripts as a single argument. The following example shows a rules file for the Cisco website:
--start=www.cisco.com
--accept=forums.cisco.com
--reject=/cgi-bin
--limit=0
--db=ciscocontent.db
--prepos=type(image/gif) and size > 1000k
--xml=ciscomanifest.xml
Step 3
Save your file in a location relative to the Spider and Manifest scripts.
Step 4
Use the --file argument to run each script with your rules file. For example:
perl spider.pl --file=cisco-rules.cfg
perl manifest.pl --file=cisco-rules.cfg
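The behavior described in this procedure, one argument per line with each script ignoring the other script's arguments, can be modeled in Python as follows. This is a hypothetical sketch, not the code in spider.pl or manifest.pl. The option sets are taken from the Spider and Manifest syntax descriptions in this section, and stripping surrounding whitespace from each line is an assumption. Because the lines are used verbatim, with no shell processing, any quotes in the file reach the script as literal characters.

```python
# Option names from the Spider and Manifest syntax descriptions in this chapter.
SPIDER_OPTIONS = {"start", "accept", "depth", "file", "limit", "prefix", "reject", "db"}
MANIFEST_OPTIONS = {"file", "live", "prepos", "set", "playservertable", "map", "db", "xml"}

RULES = """\
--start=www.cisco.com
--accept=forums.cisco.com
--reject=/cgi-bin
--limit=0
--db=ciscocontent.db
--prepos=type(image/gif) and size > 1000k
--xml=ciscomanifest.xml
"""


def read_rules(text: str) -> list:
    """Turn each non-blank line into one argument, exactly as written."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def option_name(argument: str) -> str:
    """'--prepos=type(...)' -> 'prepos' (the text before the first '=')."""
    return argument[2:].partition("=")[0] if argument.startswith("--") else ""


def arguments_for(known_options: set, arguments: list) -> list:
    """Keep only the arguments a given script understands; ignore the rest."""
    return [arg for arg in arguments if option_name(arg) in known_options]


rules = read_rules(RULES)
print(arguments_for(SPIDER_OPTIONS, rules))
print(arguments_for(MANIFEST_OPTIONS, rules))
```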



