Cisco ACNS Software Deployment and Configuration Guide, Release 5.0
Chapter 6: Creating Manifest Files

Table Of Contents

Creating Manifest Files

Manifest File User Guidelines

Overview

Quick Start

Writing XML Tags

Important Manifest Tags

Writing a Single-Item HTTP Manifest File

Writing a Single-Item FTP Manifest File

Writing an FTP Crawler Manifest File

Writing an HTTPS Crawler Manifest File

Validating Manifest Files

Migrating from ACNS 4.x Software to ACNS 5.0 Software

Getting Started

Sample Manifest File

Using a Text Editor

Formatting XML Files

Writing Common Regular Expressions

Working with Manifest Files

Specifying a Single Content Item

Specifying a Crawl Job

Scheduling Content Acquisition

Specifying Shared Attributes

Specifying a Crawler Filter

Specifying Content Priority

Generating a Playserver List

Generating a Publishing URL

Specifying Attributes for Content Serving

Specifying Metadata for Content Serving

Specifying Time Values in the Manifest File

Refreshing and Verifying the Manifest File Content

Specifying Live Content

More Sample Manifest Files

Sample 1

Sample 2

Sample 3

Sample 4

Sample 5

Downloading the Sample Files

Manifest Validator Utility

Running the Manifest Validator Utility

Understanding Manifest File Validator Output

Correcting Manifest File Syntax

Manifest File Reference

Manifest File Structure and Syntax

CdnManifest

playServerTable

playServer

options

server

host

item

crawler

item-group

matchRule

match

contains

wmt-meta-data

http-meta-data

Configuring Freshness of Pre-Positioned Content

XML Schema

Manifest XML Schema

PlayServerTable XML Schema

Default PlayServerTable Schema

Manifest File Time Zone Tables

Manifest File Automated Scripts

Installing Perl on Your Workstation

Obtaining the Perl Scripts

Listing Website Content Using the Spider Script

Selecting Live and Pre-Positioned Content Using the Manifest Script

Creating a Rules File for the Spider and Manifest Scripts


Creating Manifest Files


This chapter describes the process for creating manifest files used to acquire and distribute content with ACNS 5.0 software. This chapter is divided into two major sections:

Manifest File User Guidelines

This first major section provides:

A general overview and purpose of manifest files in the context of a Cisco CDN

A quick start section that has you up and running immediately

A getting started section that describes how to complete specific tasks

Useful sample manifest files

A syntax validation utility

An explanation of live content distribution

Manifest File Reference

This second major section describes:

Detailed manifest file structure and syntax

XML schema

Automated manifest file scripts

Manifest file time zone tables

Manifest File User Guidelines

This first major section contains the following topics:

Overview

Quick Start

Getting Started

Working with Manifest Files

More Sample Manifest Files

Manifest Validator Utility

Overview

The Cisco ACNS 5.0 software manages the acquisition and distribution of pre-positioned content through an Extensible Markup Language (XML)-based reference file called the manifest file. The manifest file lists content that is to be used to populate Content Engines registered on a Cisco CDN. There should be one manifest file per channel.

The manifest file is placed on an origin server and identified by a unique URL. The location of the manifest file is specified when you enter the manifest file URL in the Modifying Channel window of the Content Distribution Manager GUI. Unlike the treatment of content by Cisco ACNS 4.x software, pre-positioned content is not stored on the Content Distribution Manager in ACNS 5.0 software but is fetched from origin servers and distributed to Content Engines by a Content Engine that is a root Content Engine for the channel.

The Content Distribution Manager disseminates the manifest file URL to each of the root Content Engines of the CDN. The root Content Engine then parses the file and checks for any new or different information. After the root Content Engine determines what content is new, it fetches only that new pre-positioned or live content from one or more origin servers.

The manifest file has the following features:

Administrators and content providers can provide content on an origin server.

Files can be imported over HTTP, HTTPS, or FTP and then served using a different protocol, such as a streaming protocol, depending on the type of media playserver designated to play back the requested file.

Content acquisition and distribution can be controlled by setting pre-scheduled content availability dates and times. Two content acquisition methods can be configured within the manifest file. The first method specifies the acquisition of a single <item>. The second method specifies content acquisition by crawling a website or FTP server with the <crawler> feature. Either of these two methods can schedule when the acquisition is to start and how often its content is to be checked for freshness.

Quick Start

This section will help you succeed in writing manifest files that you can use to acquire content immediately. See other sections of this chapter to learn more about specifying useful attributes to customize the manifest files further and to obtain more information on the correct manifest file syntax.


Note The username and password specified in the Channel property serve only to fetch the manifest file. The content acquisition process itself does not use this username and password. To fetch the actual content, specify the username and password in the <host> tag inside the <server> tag.


Writing XML Tags

The manifest file is a text file written in XML format. An XML text file consists of a series of XML tags. The following is an example of a simple XML tag:

<item attr1="value1" attr2="value2" />

In the preceding example, "item" is the name of the XML tag, so this tag is called the "item" tag. A tag can have many attributes in the form of name="value." The value field must be bounded by double quotation marks. There are two attributes inside the "item" tag shown in the example. The first attribute, called "attr1," has a value called "value1." The second attribute, called "attr2," has a value called "value2."

Tags typically start with a "<" and end with a "/>," but they can start with a "<" and end with ">." If a tag ends with ">," it means its scope is not yet complete. To complete its scope, a tag called "tag-name" must end with "</tag-name>." For example:

<server name="name" >
	<host name="name" />
</server>

In the first line of the example, the <server> tag ends with a ">," but its scope does not end on the first line. Its scope ends on the third line with the tag </server>. Because the <host> tag is inside the <server> tag, the <host> tag is called a subtag of the <server> tag, and the <server> tag is considered the parent tag of the <host> tag.

Two tag relationships can exist between XML tags: peer and subtag. In the following example, the two "item" tags have a peer relationship:

<item src="url1" />
<item src="url2" />

The key to identifying their peer relationship is that the first tag ends with a "/>" before the second tag starts. In the following example, the <server> and <host> tags have a subtag relationship:

<server name="cisco">
	<host name="url" />
</server>

The key to identifying their subtag relationship is that the first tag ends with a ">" before the second tag starts. The <host> tag is the subtag of the <server> tag; that is, the <server> tag is the parent of the <host> tag.
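The peer and subtag relationships can also be inspected programmatically. The following minimal sketch uses the Python standard library module xml.etree.ElementTree, which is not part of the ACNS software and is used here only for illustration. The <root> wrapper tag is hypothetical; it is added only so that the two peer tags form a single parsable document.

```python
import xml.etree.ElementTree as ET

# Subtag relationship: <host> is nested inside <server>.
server = ET.fromstring('<server name="cisco"><host name="url" /></server>')
subtags = [child.tag for child in server]

# Peer relationship: two <item> tags side by side. The <root> wrapper is
# hypothetical; an XML document needs exactly one top-level tag.
wrapper = ET.fromstring('<root><item src="url1" /><item src="url2" /></root>')
peers = [child.get("src") for child in wrapper]

print(server.tag, "has subtags:", subtags)
print("peer items:", peers)
```

The parser reports <host> as a child of <server>, while the two <item> tags appear as siblings under their common parent.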

Important Manifest Tags

This section lists and briefly describes the important manifest tags for you to better understand manifest files.

<CdnManifest> tag

The <CdnManifest> </CdnManifest> tag set must be the highest-level tag for an ACNS 5.0 software manifest file. The tag set is required and marks the beginning and end of the manifest file content. Each <CdnManifest> tag set must contain at least one item, or content object, that is fetched and stored. For example:

<CdnManifest>
...
</CdnManifest>

<server> and <host> tags

The <server> and <host> tags are required to specify the origin content source server. The <server> tag must precede any <item> or <crawler> tag that refers to it. In the following example:

<server name="xyz" > <host name="http://www.xyz.com/" /> </server>
<item server="xyz" src="" />

the <server> tag must precede the <item> tag; otherwise, there is a syntax error. The <crawler> tag uses the <server> tag that immediately precedes it. If a <server> tag is not found immediately preceding the <crawler> tag, then the server that serves the manifest file is used by default.

Similarly, the order of <item> tags determines their priority. In the following example:

<item src="abc.html" />
<item src="xyz.html" />

abc.html is acquired and distributed before xyz.html.

The <host> tag inside the <server> tag configures the content source host. The <server> tag requires only the name attribute. The <host> tag defines a web server or live server from which content is to be retrieved and later pre-positioned. Only one host can be defined within a single <server> tag set. The <host> tag must be enclosed within <server> tags.

server name attribute

The server name value can be any value as long as it is unique across all name values of the <server> tags within the same manifest file. The <server> tag must be the parent tag of the <host> tag, and the <server> tag must have at least one <host> subtag.

host name attribute

The host name value specifies the fully qualified domain name, including protocol and port for the origin server. For example:

<host name="http://www.cisco.com" />

or

<host name="ftp://my-ftp-server" />

or

<host name="https://my-ftp-server.com:843/" />

host user and password attributes

The user and password attributes specify the username and password when authentication is required. For example:

<host name="ftp://my-ftp-server"  user="honh" password="dsadda2" />

<item> tag

The <item> tag is used to specify a single file to be pre-positioned.

item src attribute

The src attribute is required. It specifies the URL of the file relative to the value of the name attribute in the <host> tag.

<crawler> tag

The <crawler> tag is used to specify a crawl job. You can use the <crawler> tag to crawl an FTP directory and its subdirectories or to crawl directories using HTTP directory indexing.

crawl directories

To crawl directories over HTTP and fetch the files they contain, enable the directory indexing feature built into the web server.

If a URL points to a directory when this directory indexing feature is enabled, the web server dynamically generates an HTML page and lists all the files and subdirectories. By parsing such an HTML page, the ACNS software can identify those files it can fetch from that particular directory.

crawler start-url attribute

The start-url attribute specifies the relative path of the URL from which to start the crawl. For example, if the host name of the crawl job <crawler start-url="HR/jobs/" /> is <host name="http://www.my-server.com/" />, the directory "http://www.my-server.com/HR/jobs/" is crawled.

crawler depth attribute

The crawler depth attribute specifies the directory depth of a web crawl. A depth value of 0 allows only a crawl of the starting URL page, while a depth value of 1 allows a crawl of the start URL page and its links.
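The effect of the depth value can be illustrated with a small breadth-first traversal. The following is a minimal sketch, not the ACNS crawler implementation: the crawl function, the links dictionary (a stand-in for fetching and parsing real pages), and the page names are all hypothetical. The value -1 is treated as "no depth restriction," as described for the depth attribute in Table 6-1.

```python
from collections import deque

def crawl(links, start, depth):
    """Return the pages reachable from start within `depth` link levels.

    links maps each page to the pages it links to. depth=-1 means no limit.
    """
    seen = [start]
    queue = deque([(start, 0)])
    while queue:
        page, level = queue.popleft()
        if depth != -1 and level >= depth:
            continue  # do not follow links beyond the allowed depth
        for target in links.get(page, []):
            if target not in seen:
                seen.append(target)
                queue.append((target, level + 1))
    return seen

# Hypothetical site: index links to a and b; a links to c.
site = {"index": ["a", "b"], "a": ["c"]}
print(crawl(site, "index", 0))   # starting page only
print(crawl(site, "index", 1))   # starting page and its links
print(crawl(site, "index", -1))  # everything reachable
```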

Writing a Single-Item HTTP Manifest File

The following sample shows the simplest way to write a manifest file that fetches content using the HTTP protocol.

<CdnManifest>
	<server name="my-second-origin-server"> 
		<host name="http://www.my-server.com/" /> 
	</server>
	<item src="project-one.html" /> 
	<item src="my-eng-group/project-two.html" /> 
	<item src="project-three.html" /> 
</CdnManifest>

The <CdnManifest> tag set is required to specify a manifest file. The <server> tag set specifies the logical name of the server "my-second-origin-server." The <host> subtag specifies the actual URL used to access the files on the origin server. The <item> tag specifies the exact item that is to be fetched from the origin server.

Upon execution, the preceding manifest file sample instructs the ACNS software to fetch the following items using HTTP:

http://www.my-server.com/project-one.html

http://www.my-server.com/my-eng-group/project-two.html

http://www.my-server.com/project-three.html

Writing a Single-Item FTP Manifest File


Note When you use FTP to acquire content using a CDN URL, you must either specify the content-type in the manifest file, or you must use the correct extension in the CDN URL. Otherwise, the wrong content-type is generated and you are not able to play the content.


The following sample shows the simplest way to write a manifest file that fetches content using the FTP protocol.

<CdnManifest>
	<server name="my-ftp-server"> 
		<host name="ftp://myftp.cisco.com" user="johnw" password="georgebush" /> 
	</server>
	<item src="relative-path/file1.txt" /> 
	<item src="/full-path/file2.txt" /> 
</CdnManifest>

Upon execution, the preceding manifest file sample instructs the ACNS software to fetch content using FTP, where the "relative-path" is the path relative to the home directory of johnw's login to the FTP server. The "/full-path" is the absolute path relative to the root directory.

For example, if the FTP home directory for "johnw" is "/users/ftp/johnw," the full path for the first file is /users/ftp/johnw/relative-path/file1.txt, and the full path for the second file is /full-path/file2.txt.
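The same path resolution can be reproduced with the Python standard library. This is a minimal sketch for checking paths by hand, assuming POSIX-style paths on the FTP server; the helper name resolve_ftp_path is hypothetical and is not part of the ACNS software.

```python
import posixpath

def resolve_ftp_path(home_dir, src):
    """Resolve an <item> src value against the FTP login's home directory."""
    # posixpath.join discards home_dir when src is an absolute path.
    return posixpath.join(home_dir, src)

home = "/users/ftp/johnw"
print(resolve_ftp_path(home, "relative-path/file1.txt"))
print(resolve_ftp_path(home, "/full-path/file2.txt"))
```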

Writing an FTP Crawler Manifest File

The following sample shows the simplest way to write a crawler manifest file that fetches content using the FTP protocol.

<CdnManifest>
	<server name="my-ftp-server" >
		<host name="ftp://ftp-server" />
	</server>
	<crawler
		start-url="folder/"
		depth="10"
		ttl = "10"
	/>
</CdnManifest>

The web crawler application methodically and automatically searches acceptable websites and makes a copy of the visited pages for later processing. The web crawler starts with a list of URLs to visit and identifies every web link in the page, adding these web links to the list of URLs to visit.

The preceding manifest file sample instructs the ACNS software to start crawling from ftp://ftp-server/folder to ten directory levels deep and check those directories every 10 minutes for freshness.

The <crawler> tag specifies the crawl task. The start-url attribute specifies where the web crawler is to start crawling. The depth attribute of ten specifies how many levels of subdirectories the crawler is to check to obtain the required content. The ttl attribute specifies how often the file is to be checked for freshness. The ttl attribute can be specified as an attribute in a single-item manifest file as well.

Writing an HTTPS Crawler Manifest File

The following sample shows the simplest way to write a crawler manifest file that fetches content using the HTTPS protocol. The following manifest file sample instructs the ACNS software to start crawling from https://www.cisco.com/jobs/eng/ to a depth of five levels.

<CdnManifest>
	<server name="cisco"> 
	    <host name="https://www.cisco.com/" /> 
	</server> 
	<crawler 
	    start-url="jobs/eng/" 
	    depth="5" 
	/>
</CdnManifest>

As with a single-item manifest file, the <CdnManifest> tag set is required to specify a manifest file. The <server> tag set specifies the logical name of the server "cisco." The <host> subtag specifies the actual URL used to access the files on the origin server.

If directory indexing is enabled for the jobs/eng directory and its subdirectories, then the crawler will go to a depth of five directory levels to retrieve the files. Files associated with a particular channel are typically stored in the same directory or subdirectory on the origin server. If the origin server is running HTTP or HTTPS services, directory indexing must be enabled for these directories. Enabling directory indexing allows a request to that directory to return a list of files in that directory and allows ACNS 5.0 software to crawl the directory.

Validating Manifest Files

It is a good idea to use the Manifest Validator utility to validate your manifest file after it is created. See the "Manifest Validator Utility" section for more information on the Manifest Validator utility.

Migrating from ACNS 4.x Software to ACNS 5.0 Software

Unlike ACNS 4.3 software, ACNS 5.0 software requires one or more origin servers where source files can be stored for the pre-positioning of content. These origin servers require that remote access servers be installed to support HTTP, FTP, or HTTPS services so that CDN devices can fetch pre-positioned files.

Files associated with a particular channel are typically stored in the same directory or subdirectory on the origin server. If the origin server is running HTTP or HTTPS services, directory indexing must be enabled for these directories. Enabling directory indexing allows a request to that directory to return a list of files in that directory and allows ACNS 5.0 software to crawl the directory.

Once the content is uploaded to a suitable origin server and available, you can use the following simple manifest file to specify content acquisition.

<CdnManifest>

<server name="my-server" >
    <host name="http://my-server" />
</server>

<crawler start-url="my-path/"  ttl="10"  />

</CdnManifest>

In the preceding sample, a crawl job is specified for the associated channel to check the "my-path" directory for freshness every 10 minutes. Once this setup is complete, the root Content Engine associated with this channel monitors this directory to determine if there are any new or updated files, and then automatically fetches them.

Running the preceding manifest file sample achieves the same objective as that featured in the ACNS 4.2 software, where users copy pre-positioned files into a Content Distribution Manager import folder. However, using the manifest file is more powerful than the Content Distribution Manager import feature. For example, if you store content at different locations but the content must be distributed through the same channel, you can create multiple crawl jobs in the manifest file to monitor these locations. The following sample manifest file allows you to monitor different locations.

<CdnManifest>

<server name="my-http-server" >
    <host name="http://my-server" />
</server>

<crawler start-url="my-path-http/"  ttl="10"  />

<server name="my-ftp-server" >
    <host name="ftp://my-server" />
</server>

<crawler start-url="my-path-ftp/"  ttl="10"  />

</CdnManifest>

The preceding sample monitors the directory "my-path-http" on an HTTP server and the directory "my-path-ftp" on an FTP server.

Getting Started

The manifest file, whose URL is stored in the Content Distribution Manager GUI, allows you to define a series of servers from which content can be fetched, as well as a list of content objects on each server to be fetched. Written in XML, a finished manifest file contains a series of URLs pointing to pre-positioned content.

This section explains the structure of the XML-based manifest file. In the manifest file syntax samples that follow, note the capitalization and data formats used. For your finished manifest file to be executed successfully, XML tags and tag attributes must use the format outlined in this section.

Sample Manifest File

The following example shows a simple functional manifest file. Use this example as a model when creating or troubleshooting your own manifest files.

<?xml version="1.0"?>
<CdnManifest>

<playServerTable>
 <playServer name="wmt">
   <contentType name="wmt"/>
 </playServer>
</playServerTable>

<options noRedirectToOrigin="true"/>

<server name="server0">
   <host name="http://www.cnn.com"/>
</server>

<item-group server="server0"
     serveStartTime="2003-01-12 14:00:00 PST" serveStopTime="2099-04-12 14:00:00 PST">
     <item src="item-01"/>
     <crawler start-url="crawler-01" depth="10"/>     
</item-group>

</CdnManifest>


Note The XML standard requires that the optional <?xml version="1.0"?> version line, if used, be the first line of the XML file. If blank lines occur before the <?xml version="1.0"?> version line in a manifest file, the Manifest Validator reports syntax errors.


The format of the manifest file is important because it is the vehicle that specifies those content objects that are to be imported into your CDN for pre-positioning in your edge devices, such as Cisco Content Engines. With the manifest file, you can specify where to obtain web content objects, how long these objects should remain on the Content Engines of your CDN, and how frequently the ACNS software should check their freshness.

Using a simple text editor, you can write acquisition and pre-positioning instructions in XML format. The actual manifest file resides on a web server that the Content Distribution Manager can access. The manifest file URL is stored in the Content Distribution Manager GUI. The ACNS software takes its instructions from the manifest file, acquiring content from the origin server and pre-positioning it to the appropriate edge devices on your CDN. You can specify that the manifest file fetch content from servers using either of the following methods:

Fetch one or multiple single items or URLs.

Start a crawler job using its associated parameters, such as starting URL, level of directory depth, prefix, and filter, to accept or reject content using criteria you have specified.

You can also schedule when content acquisition is to start and how often content should be checked for freshness. Information on how end users can access pre-positioned content on the CDN must be provided. For example, end users need to know what playserver should be used to play media, how to access the content, when the content is to be served, and any additional metadata for media playback.

Using a Text Editor

Because XML files, like HTML files, are simple text format files that use special tags or elements to designate how content is to be handled and represented on a website, it is possible to create manifest files using any ASCII text editor. A variety of third-party XML authoring tools also exist, and they can speed the process of generating manifest files.

Unlike HTML, which serves as a language for creating web pages, XML is a language for creating languages. In this case, the manifest file becomes the XML application. The XML application contains tags that describe the information that is contained within the tags. This information is extracted from the manifest file XML application and reused repeatedly to carry out tasks, or it is merged with other information from a different source and the result used in a different framework or for a different function.

Writing XML is not as forgiving as writing HTML. XML is sensitive to uppercase and lowercase letters, the use of quotation marks, the proper closure of tags, and other formats that require exceptional attention to detail. Care must be taken to ensure that XML tags are properly formatted and otherwise syntactically correct. Incorrectly formatted data, such as incorrect usage of capitalization in a tag or tag attribute, results in syntax errors.
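A quick well-formedness check can catch many of these mistakes before the file is published. The following minimal sketch uses the Python standard library XML parser; the helper name is_well_formed and the sample strings are hypothetical. It checks only that the text parses as XML. It does not check the file against the manifest schema and is not a substitute for the Manifest Validator utility.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML, otherwise False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = '<?xml version="1.0"?>\n<CdnManifest><item src="a.html" /></CdnManifest>'
checks = {
    "valid manifest": good,
    "blank line before version line": "\n" + good,
    "unquoted attribute value": '<CdnManifest><item src=a.html /></CdnManifest>',
    "mismatched capitalization": '<CdnManifest><item src="a.html" /></cdnmanifest>',
    "unclosed tag": '<CdnManifest><item src="a.html"></CdnManifest>',
}
for label, text in checks.items():
    print(label, "->", is_well_formed(text))
```

Only the first sample parses; each of the others fails for the reason named in its label.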

Formatting XML Files

The manifest file must be written using the XML format described in the "Manifest File Structure and Syntax" section. An XML file is a plain text file with tags. The following is an example of a simple XML tag:

<sample-tag/>

The tag begins with the left angle bracket (<) and ends with a forward slash and a right angle bracket (/>). The name of this tag is "sample-tag."

The following is an example of a tag with attributes:

<sample-tag name1="value1" name2="value2" />

The following sample tag has attributes and a subtag:

<sample-tag name1="value1" name2="value2">
    <sub-tag name1="value1" name2="value2"/>
</sample-tag>

If a subtag is contained within a tag, the attribute list of the enclosing tag must end with a right angle bracket (>) instead of a forward slash and a right angle bracket (/>), and the entire tag must end with </tag-name>.

For more information on XML or XML tutorials, refer to the following links:

http://www.w3.org/XML/

http://www.w3schools.com/

Writing Common Regular Expressions

A regular expression is a formula for matching strings that follow a recognizable pattern. The following special characters have special meanings in regular expressions:

. * \ ? [ ] ^ $

If the regular expression string does not include any of these special characters, then only an exact match satisfies the search. For example, the expression "stock" matches only strings that contain the exact substring "stock."
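The difference between a plain string and an expression that contains special characters can be tried out in any regular expression engine. The following minimal sketch uses the Python re module and assumes that its behavior for these basic cases matches the regular expression dialect used by the ACNS software, which may differ in other details. The URLs are hypothetical.

```python
import re

urls = [
    "http://www.my-server.com/stock/quotes.html",
    "http://www.my-server.com/news/index.html",
    "http://www.my-server.com/cgi/report.pl",
    "http://www.my-server.com/docs/apple.html",
]

# No special characters: matches only URLs containing the exact substring.
with_stock = [u for u in urls if re.search("stock", u)]

# Unescaped ".": any character followed by "pl", so "apple" matches too.
loose = [u for u in urls if re.search(".pl", u)]

# Escaped "\.": matches only a literal ".pl".
strict = [u for u in urls if re.search(r"\.pl", u)]

print(with_stock)
print(loose)
print(strict)
```

Escaping the period with a backslash is what restricts the last expression to a literal ".pl".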

For more information about writing regular expressions, refer to the following website:

http://yenta.www.media.mit.edu/projects/Yenta/Releases/Documentation/regex-0.12/

Working with Manifest Files

This section provides manifest file samples for carrying out specific tasks. Each sample has an associated explanation of its purpose and function. The manifest file can specify a single content object, a website crawler job, or an FTP server crawler job to acquire pre-positioned content or live content that is distributed to edge Content Engines later.

Specifying a Single Content Item

The following manifest file example specifies a single content item.

<CdnManifest>

<item src="test.html" />
<server name="my-origin-server-one"> 
<host name="http://www.my-server-one.com/eng/" /> 
</server>
<server name="my-origin-server-two"> 
<host name="http://www.my-server-two.com/eng/" /> 
</server> 
<item src="project-two.html" /> 
<item server="my-origin-server-one" src="project-one.html" /> 

</CdnManifest>

In the preceding example, the first <item> uses the manifest server, where test.html is relative to the manifest file URL. The second <item>, "project-two.html," uses "my-origin-server-two," and the third <item>, "project-one.html," uses "my-origin-server-one."

Use the <item> tag to specify a single content item, object, or URL. The required attribute src is used to specify the relative path portion of the URL. If the server name attribute is omitted, the server name in the last specified <server> tag above it is used. If there are no <server> tags close by in the manifest file, the server that hosts the manifest file will be used, which means that the relative URL will be relative to the manifest file URL.
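The server selection rules described above can be sketched as a short program that walks the manifest in document order. This is an illustration only, not the ACNS implementation: the function resolve_items and the manifest URL http://origin.example.com/manifests/channel.xml are hypothetical, only top-level <item> tags are handled, and each host name is assumed to end with a slash so that standard URL joining appends the src value.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

MANIFEST = """
<CdnManifest>
<item src="test.html" />
<server name="my-origin-server-one">
<host name="http://www.my-server-one.com/eng/" />
</server>
<server name="my-origin-server-two">
<host name="http://www.my-server-two.com/eng/" />
</server>
<item src="project-two.html" />
<item server="my-origin-server-one" src="project-one.html" />
</CdnManifest>
"""

def resolve_items(manifest_xml, manifest_url):
    """Return the full URL of every top-level <item>, in document order."""
    hosts = {}          # server name -> host URL
    last_server = None  # most recent <server> seen above the current tag
    urls = []
    for tag in ET.fromstring(manifest_xml):
        if tag.tag == "server":
            last_server = tag.get("name")
            hosts[last_server] = tag.find("host").get("name")
        elif tag.tag == "item":
            name = tag.get("server", last_server)
            # No <server> above the item: fall back to the manifest URL.
            # A reference to a server not yet defined raises KeyError.
            base = hosts[name] if name is not None else manifest_url
            urls.append(urljoin(base, tag.get("src")))
    return urls

for url in resolve_items(MANIFEST, "http://origin.example.com/manifests/channel.xml"):
    print(url)
```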


Note Before any content can be acquired, you must enter the URL that defines the location of the manifest file in the Content Distribution Manager GUI. In the Modifying Channel window, enter the location URL of the manifest file, its Time To Live (TTL), the username, and the password required to access the manifest file (if the location is password-protected).


Specifying a Crawl Job

The web crawler application methodically and automatically searches acceptable websites and makes a copy of the visited pages for later processing. The web crawler starts with a list of URLs to visit and identifies every web link in the page, adding these links to the list of URLs to visit. The process ends after one or more of the following conditions are met:

Links have been followed to a specified depth.

The maximum number of objects has been acquired.

The maximum content size has been acquired.

By crawling a site at regular intervals using the Time To Live (or ttl) attribute, these links and their associated content can be updated regularly to keep the content fresh. For more information on the ttl attribute, see the "Refreshing and Verifying the Manifest File Content" section.

Use the <crawler> tag to specify the website or FTP server crawler attributes. Table 6-1 lists the attributes, states whether these attributes are required or optional, and describes their functions.

Table 6-1 Website or FTP Server Crawler Job Attributes 

Attribute
Description

start-url

(Required) Defines the relative path of the URL to start from for the specified crawl job.

depth (0, 1,-1)

(Optional) Defines the level of depth to crawl the specified website.

The depth is defined as the level of a website's URL links or FTP server's directory, where 0 is the URL or directory from which the crawler job starts.

0 = acquire only the starting URL
1, 2, 3,... = acquire the starting URL and its referred files to the depth specified
-1 = infinite or no depth restriction

The default is 20 if a depth is not specified.

Note It is not advisable to specify a depth of -1 because it will take a long time to crawl a large website and is wasteful if all of the content on that particular website is not required.

prefix

(Optional) Combines the host name from the <server> value and this field to create a full prefix. Only content whose URLs match the full prefix is acquired. For example:

<server name="xx"> <host name="www.cisco.com" proto="https" port="433" /> 
</server>

and in a <crawler> tag:

prefix="marketing/eng/"

The full prefix is "https://www.cisco.com:433/marketing/eng/." Only URLs that match this prefix are crawled. If a web page refers to .../marketing/ops, the marketing/ops page and its children are not acquired.

If the prefix is omitted, the crawler checks the default full prefix, which is the host name portion of the URL from the server. In the previous example, the default full prefix is "https://www.cisco.com:433."

accept

(Optional) Uses a regular expression to define acceptable URLs to crawl, in addition to having acceptable URLs match a prefix. For example, accept="stock" means that only URLs that meet two conditions are crawled: the URL matches the prefix and also contains the regular expression string "stock."

reject

(Optional) Uses a regular expression to reject a URL if it matches the expression. The URL is first checked for a possible prefix match and then checked for a reject regular expression. If a URL does not match the prefix, it is immediately rejected. If a URL matches both the prefix and the reject regular expression, it is rejected by the expression.

max-number

(Optional) Specifies the maximum number of crawler job objects that can be acquired.

max-size-in-B
max-size-in-KB
max-size-in-MB

(Optional) Specifies the maximum size of content that this crawler job can acquire. The size can be expressed in bytes (B), kilobytes (KB), or megabytes (MB).



Note If you specify both the max-number and max-size attributes as the criteria to use to stop a crawler job, the condition that is met first takes precedence. That is, the crawler job stops either when the maximum number of objects is acquired or when the maximum content size is reached, whichever occurs first. For example, if the crawler job has acquired the maximum number of objects specified in the manifest file but has not yet reached the maximum content size, the crawler job stops.
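The "whichever occurs first" rule can be sketched as follows. This is a minimal illustration with hypothetical names (acquire, max_size_in_b) and hypothetical object sizes. It assumes that an object that would push the total past the size limit is not acquired; the source does not state how the ACNS software treats that boundary case.

```python
def acquire(objects, max_number=None, max_size_in_b=None):
    """Acquire (name, size_in_bytes) pairs until a limit is reached."""
    acquired, total = [], 0
    for name, size in objects:
        if max_number is not None and len(acquired) >= max_number:
            break  # object-count limit reached first
        if max_size_in_b is not None and total + size > max_size_in_b:
            break  # size limit would be exceeded (assumed boundary rule)
        acquired.append(name)
        total += size
    return acquired, total

objects = [("a", 100), ("b", 200), ("c", 300), ("d", 400)]
print(acquire(objects, max_number=2, max_size_in_b=10000))
print(acquire(objects, max_number=10, max_size_in_b=700))
```

In the first call the object count stops the job; in the second, the content size does.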


The following is an example of a website crawler job.

<server name="cisco"> 
    <host name="http://www.cisco.com/jobs/" /> 
</server> 
<crawler 
    server="cisco" 
    start-url="eng/index.html" 
    depth="10" 
    prefix="eng/" 
    reject="\.pl" 
    max-size-in-MB="200" 
/>

The attributes of this website crawler job example are:

The start-url path is http://www.cisco.com/jobs/eng/index.html.

Search to a website link depth of 10.

Search URLs with the prefix http://www.cisco.com/jobs/eng/.

Reject URLs containing .pl (Perl script pages).

Crawl only until 200 megabytes in total content size is acquired.

If the server name attribute is omitted, the server name in the last specified <server> tag above it is used. If there are no <server> tags close by in the manifest file, the server that hosts the manifest file will be used, which means that the relative URL will be relative to the manifest file URL.
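The prefix, accept, and reject rules from Table 6-1 can be combined into a single test, sketched below. The function should_crawl is hypothetical and is not the ACNS implementation. It assumes Python re syntax for the expressions and assumes that when both accept and reject are given, a URL must pass both checks; the source does not describe that combination.

```python
import re

def should_crawl(url, full_prefix, accept=None, reject=None):
    """Apply the prefix check first, then the accept and reject expressions."""
    # A URL outside the full prefix is rejected immediately.
    if not url.startswith(full_prefix):
        return False
    # With accept set, the URL must also contain the expression.
    if accept is not None and not re.search(accept, url):
        return False
    # With reject set, a URL that matches the expression is dropped.
    if reject is not None and re.search(reject, url):
        return False
    return True

prefix = "http://www.cisco.com/jobs/eng/"
print(should_crawl("http://www.cisco.com/jobs/eng/index.html", prefix, reject=r"\.pl"))
print(should_crawl("http://www.cisco.com/jobs/eng/apply.pl", prefix, reject=r"\.pl"))
print(should_crawl("http://www.cisco.com/jobs/ops/index.html", prefix, reject=r"\.pl"))
```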

Scheduling Content Acquisition

Two attributes, ttl and prefetch, are used to schedule content acquisition. Use ttl to specify, in minutes, how often the content is checked for freshness. For example, to check for page freshness every day (1440 minutes), enter ttl="1440".

In the following example, page freshness is scheduled to be checked once a day.

<item 
    src="index.html" 
    ttl="1440" 
/> 

In the following example, the website is crawled to a link depth of 2, and page freshness is checked every hour.

<crawler 
    start-url="index.html" 
    depth="2" 
    ttl="60" 
/>

If the content is not yet available at a particular URL, the prefetch attribute can be used to specify the start time for acquisition at that specified URL. The time is written in yyyy-mm-dd hh:mm:ss format. For example, prefetch="2002-06-28 18:35:21" means that the content acquisition job cannot start before 18:35:21 on June 28, 2002.

The following example schedules a crawl of this website every hour to a link depth of 2 to start on November 9, 2001 at 8:45 a.m.

<crawler 
    start-url="index.html" 
    depth="2" 
    prefetch="2001-11-09 08:45:12" 
    ttl="60" 
/>
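As a rough illustration of how ttl and prefetch interact, the following Python sketch lists the first few times at which content would be checked. It assumes that the first acquisition happens exactly at the prefetch time and that each later freshness check follows one ttl interval after the previous one; the function name freshness_checks is hypothetical and time zones are ignored.

```python
from datetime import datetime, timedelta


def freshness_checks(prefetch, ttl_minutes, count):
    """Return the first `count` check times, starting at the prefetch time."""
    # The manifest time format is yyyy-mm-dd hh:mm:ss.
    start = datetime.strptime(prefetch, "%Y-%m-%d %H:%M:%S")
    return [start + timedelta(minutes=ttl_minutes * i) for i in range(count)]


# An hourly check (ttl="60") that starts on November 9, 2001.
for moment in freshness_checks("2001-11-09 08:45:12", 60, 3):
    print(moment)
```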

Specifying Shared Attributes

Several <item> tags often share the same attribute values. Instead of repeating these attributes in every <item> tag, you can move them into a higher-level tag called <item-group>. Create the <item-group> tag one level below the <CdnManifest> tag, write the <item> tags inside it as subtags, and specify the shared attributes once in the <item-group> tag, as shown in the following example:

<?xml version="1.0"?>
<CdnManifest>

<server name="cisco-cco">
  <host name="http://www.cisco.com"
         proto="http" />
</server> 

<item-group 
    server="cisco-cco"
    ttl="1440"
    type="prepos" >

   <item src="jobs/index.html"/>
   <item src="jobs/index1.html"/>
   <item src="jobs/index2.html"/>
   <item src="jobs/index3.html"/>
   <item src="jobs/index4.html"/>
   <item src="jobs/index5.html"/>

</item-group> 

</CdnManifest>

You can also use the <options> tag to share attributes at the topmost level of the manifest file. Attributes in the <options> tag are shared by every <item> and <crawler> tag in the manifest file. However, if a shared attribute is specified in both the <item-group> and <item> tags, or in both the <options> and <item> tags, the attribute value in the <item> tag takes precedence over the value in the <item-group> or <options> tag. For a list of shared attributes, see the "options" section.

The following example illustrates this precedence rule. The first <item> tag takes the TTL value 1440 from the <options> tag, but the second <item> uses its own TTL value of 60.

<options 
   ttl="1440" /> 
<item src="index.html" /> 
<item src="index1.html" ttl="60" /> 
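The precedence rule can be modeled as a layered lookup in which the <item> attributes are consulted first. The Python sketch below is illustrative only. The helper name effective_attributes is hypothetical, and the relative order of <item-group> and <options> (item-group first) is an assumption, because this section states only that <item> values win over both.

```python
from collections import ChainMap


def effective_attributes(item, item_group=None, options=None):
    """Merge attributes so that <item> values override shared values."""
    # ChainMap looks up keys in the first mapping that contains them.
    # Assumption: <item-group> is consulted before <options>.
    return dict(ChainMap(item, item_group or {}, options or {}))


options = {"ttl": "1440"}
print(effective_attributes({"src": "index.html"}, options=options))
print(effective_attributes({"src": "index1.html", "ttl": "60"}, options=options))
```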

If a manifest file must contain many single <item> tags, you can generate them with Perl scripts. See the "Manifest File Automated Scripts" section for information about using these automated scripts.

Specifying a Crawler Filter

With a rule-based crawler filter, you can crawl an entire website and acquire only content with certain predefined characteristics. Crawler attributes in the <crawler> tag do not act as filters but only define the attributes for crawling. The <matchRule> tag is designed to act as a rule-based filter. You can define rule-based matches for file extension, size, content type, and time stamp. In the following example, the crawl job is instructed to crawl the whole website starting at index.html, but to acquire only files that have the .jpg extension and are 50 kilobytes or larger.

<crawler 
   start-url="index.html" > 
   <matchRule> 
       <match size-min-in-KB="50" extension="jpg" /> 
    </matchRule> 
</crawler>

There can be multiple <match> subtags within a <matchRule> tag. Table 6-2 lists and describes the <match> subtag attributes.

Table 6-2 <match> Subtag Attributes 

Attribute
Description

mime-type

Specifies match of these MIME-types.

extension

Specifies match of files with these extensions.

time-before

Specifies match of files modified before this time (using the Greenwich mean time [GMT] time zone) in yyyy-mm-dd hh:mm:ss format. See the "options" section for a description of the timeZone attribute.

time-after

Specifies match of files modified after this time (using the Greenwich mean time [GMT] time zone) in yyyy-mm-dd hh:mm:ss format.

size-min-in-MB
size-min-in-KB
size-min-in-B

(Optional) Specifies match of content size equal to or larger than this value. The size can be expressed in megabytes (MB), kilobytes (KB), or bytes (B).

size-max-in-MB
size-max-in-KB
size-max-in-B

(Optional) Specifies match of content size equal to or smaller than this value. The size can be expressed in megabytes (MB), kilobytes (KB), or bytes (B).


A <match> subtag can specify multiple attributes. Attributes within a <match> tag have a Boolean AND relationship. In the following example, to satisfy this match rule, a file must have the .mpg file extension AND its size must be 50 kilobytes or larger.

<match extension="mpg" size-min-in-KB="50" />

There is a Boolean OR relationship between the <match> rules themselves. A <matchRule> tag can have multiple <match> subtags, but only one of these subtags must be matched. The <matchRule> tag can be specified as a subtag of the <crawler> tag or as a subtag of the <item-group> tag. If a <matchRule> subtag is specified in an <item-group> tag, it is shared by every <crawler> tag within that <item-group> tag.
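The AND relationship inside a <match> subtag and the OR relationship between <match> subtags can be expressed compactly in Python. The sketch below is a simplified model under stated assumptions: it handles only the extension and size attributes, it treats 1 KB as 1024 bytes (the unit convention is not stated in this section), and the function names are hypothetical.

```python
UNIT_BYTES = {"B": 1, "KB": 1024, "MB": 1024 * 1024}  # assumption: binary units


def match_satisfied(match, extension, size_bytes):
    """True only if every attribute present in one <match> subtag holds (AND)."""
    for name, value in match.items():
        if name == "extension":
            if extension != value:
                return False
        elif name.startswith("size-min-in-"):
            if size_bytes < int(value) * UNIT_BYTES[name[len("size-min-in-"):]]:
                return False
        elif name.startswith("size-max-in-"):
            if size_bytes > int(value) * UNIT_BYTES[name[len("size-max-in-"):]]:
                return False
        else:
            raise ValueError(f"attribute not modeled in this sketch: {name}")
    return True


def match_rule_satisfied(matches, extension, size_bytes):
    """True if at least one <match> subtag is satisfied (OR)."""
    return any(match_satisfied(m, extension, size_bytes) for m in matches)


rule = [{"extension": "mpg", "size-min-in-KB": "50"}]
print(match_rule_satisfied(rule, "mpg", 60 * 1024))
print(match_rule_satisfied(rule, "mpg", 10 * 1024))
```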


Note The accept and reject attributes of the <crawler> tag are easily mistaken for a crawler filter.

For example, to acquire only files with the .mpg extension, specifying accept="\.mpg" does not work as intended. Although the attribute is syntactically valid, almost no crawling occurs, because pages whose URLs do not match the accept constraint are not searched. If the starting URL is index.html, this HTML file is parsed and any links not containing .mpg are rejected. If the .mpg files are located at the second or lower link levels, they are not fetched, because the links leading to them have been rejected.

To properly acquire only files with the .mpg extension, use the <matchRule> tag. Specify <matchRule> <match extension="mpg" /> </matchRule>. The whole site is crawled and only those files with the .mpg extension are retained.
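The difference between pruning links with accept and filtering results with <matchRule> can be seen in a toy simulation. The link graph, file names, and crawl function below are all hypothetical, and the traversal is a plain breadth-first search rather than the actual crawler algorithm.

```python
import re
from collections import deque

# Hypothetical site: one .mpg file is linked from the start page,
# another is reachable only through news.html.
LINKS = {
    "index.html": ["news.html", "intro.mpg"],
    "news.html": ["clips/a.mpg"],
}


def crawl(start, links, accept=None):
    """Return visited URLs; links that fail `accept` are never followed."""
    seen = [start]
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for url in links.get(page, []):
            if url in seen:
                continue
            if accept is not None and not re.search(accept, url):
                continue  # rejected link: this page is never searched
            seen.append(url)
            queue.append(url)
    return seen


# accept prunes news.html, so clips/a.mpg is never discovered.
with_accept = [u for u in crawl("index.html", LINKS, accept=r"\.mpg") if u.endswith(".mpg")]
# A matchRule-style filter crawls everything and keeps only .mpg files.
with_filter = [u for u in crawl("index.html", LINKS) if u.endswith(".mpg")]
print(with_accept)
print(with_filter)
```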


Specifying Content Priority

A priority can be assigned to content objects to define their order of importance. The CDN determines the order of processing from the level of priority of the content. The higher the content priority, the sooner the acquisition of content from the origin server and the sooner the content is distributed to the Content Engines.


Note Every content object acquired by running a crawler job has the same priority.


Three factors combine to determine content priority:

Channel priority: The value selected in the Content Distribution Priority drop-down list, which is in the Acquisition and Distribution Properties area of the Modifying Channels window in the Content Distribution Manager GUI

Item index: The order in which the content is listed in the manifest file

Item priority: The priority specified as an attribute of the <item> or <crawler> tag

To calculate content priority, use either item-priority or item-index:

If there is a priority specified in item-priority of the manifest file for this content, use the following formula:

content-priority = channel-priority * 10000 + item-priority


Tip The item-priority within the <item> tag can be any integer and is unrestricted. If you want a particular content object to have the highest priority, specify a very large integer value in the item-priority for that particular content object in the content-priority formula.


If an object does not have an item-specified priority, use the item-index order within the manifest file:

content-priority = channel-priority * 10000 + 10000 - item-index


Note If there is no priority specified for any items, content is processed in the order listed in the manifest file.
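The two formulas can be checked with a short calculation. In the Python sketch below, the function name content_priority is hypothetical; the channel priority values (250 for low, 500 for normal, 750 for high) are the ones quoted in the Sample 2 comments later in this chapter, and the item index is assumed to start at 1 for the first item in the manifest file.

```python
CHANNEL_PRIORITY = {"low": 250, "normal": 500, "high": 750}


def content_priority(channel_priority, item_index, item_priority=None):
    """Apply the content-priority formulas from this section."""
    if item_priority is not None:
        # An explicit item priority is added to the scaled channel priority.
        return channel_priority * 10000 + item_priority
    # Otherwise, earlier items in the manifest file get a higher priority.
    return channel_priority * 10000 + 10000 - item_index


normal = CHANNEL_PRIORITY["normal"]
print(content_priority(normal, item_index=1))                       # 5009999
print(content_priority(normal, item_index=2, item_priority=20000))  # 5020000
```

In this example the second item is processed first, because its explicit priority of 20000 exceeds the index-derived value of 9999 for the first item.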


Generating a Playserver List

ACNS 5.0 software supports playservers that play back the following pre-positioned content types on the CDN: HTTP, WMT, and RTSP (RealMedia and QuickTime Streaming Server [QTSS]). The CDN checks whether the requested protocol matches the list in the playserver table. If it matches, the request is delivered. If it does not match, the request is rejected.

You can generate a playserver list using these methods:

The manifest file, by configuring playserver attributes in an <item> tag

The <playServerTable> tag, by configuring playserver MIME-type extension names

To create the playserver list directly through the manifest file, configure playserver attributes of the playserver list in an <item> tag. If an <item> tag does not have a playserver attribute, its playserver list is generated through the <playServerTable> tag. If the <playServerTable> tag is omitted in the manifest file, a built-in default <playServerTable> tag is used to generate the playserver list. Multiple servers are separated by commas, as shown in the following example:

<item src="video.mpg" playServer="real,wmt" />

You can also generate the playserver list that supports these streaming media types through the <playServerTable> tag. The <playServerTable> tag maps content into a playserver list based on the MIME-type extension name. If the manifest file contains a <playServerTable> tag, that table is used instead of the built-in default table.

To generate the playserver list through the <playServerTable> tag, use MIME-type extension names to configure which playserver can play the particular pre-positioned content, as shown in the following example:

<playServerTable>
<playServer name="real">
         <contentType name="application/x-pn-realaudio" />
         <contentType name="application/vnd.rn-rmadriver" />
         <extension name="rm" />
         <extension name="ra" />
         <extension name="rp" />
         <extension name="rt" />
         <extension name="smi" />
</playServer>
<playServer name="wmt">
         <extension name="wmv" />
         <extension name="wma" />
         <extension name="wmx" />
         <extension name="asx" />
         <extension name="asf" />
         <extension name="avi" />
</playServer>
<playServer name="http">
         <contentType name="application/pdf" />
         <contentType name="application/postscript" />
         <extension name="pdf" />
         <extension name="ps" />
</playServer>
</playServerTable>

The <playServerTable> tag is used to generate a playserver list for each content type. Note that in the preceding example, any file with a PDF or a PostScript extension uses HTTP to play the content. See the "Default PlayServerTable Schema" section to view the default playserver table.
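The lookup order described in this section (the playServer attribute first, then the playserver table) can be sketched as follows. This is a simplified model, not the ACNS implementation: play_servers_for is a hypothetical helper, the table copies only the extension entries from the preceding example, and matching on contentType is not modeled.

```python
import os

# Extension entries copied from the <playServerTable> example above.
PLAY_SERVER_TABLE = {
    "real": {"rm", "ra", "rp", "rt", "smi"},
    "wmt": {"wmv", "wma", "wmx", "asx", "asf", "avi"},
    "http": {"pdf", "ps"},
}


def play_servers_for(src, play_server_attr=None, table=PLAY_SERVER_TABLE):
    """Return the playserver list for one content item."""
    # A playServer attribute on the <item> tag is used directly.
    if play_server_attr:
        return [name.strip() for name in play_server_attr.split(",")]
    # Otherwise, map the file extension through the playserver table.
    extension = os.path.splitext(src)[1].lstrip(".").lower()
    return [name for name, extensions in table.items() if extension in extensions]


print(play_servers_for("video.mpg", "real,wmt"))
print(play_servers_for("docs/guide.pdf"))
```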

Generating a Publishing URL

A publishing URL is the URL that plays back pre-positioned content in the CDN. A complete publishing URL consists of three parts:

Scheme

Domain name

Path

The path includes both the file directory path and the filename. The playserver list determines the publishing URL for the CDN. Again, the playserver list is generated directly through the manifest file, through the <playServerTable> tag in the manifest file, or through the default <playServerTable> tag.

Scheme

The scheme of the publishing URL is the protocol used to play the content type. For example, if an .asf video file can be played by both an HTTP and a WMT playserver, two URL schemes can be used to access this content: HTTP and MMS.

Domain Name

The domain name of the publishing URL is determined by the configuration of the CDN. If WCCP is used to redirect requests to a Content Engine, its domain name is the origin FQDN (fully qualified domain name) in the website or channel. If content routing is used, the content routing FQDN (the FQDN of the website) becomes the domain name.

Path

In most cases, the path of the publishing URL is the relative source URL, or the src attribute in the <item> tags. For content crawling, it is a relative URL, relative to the host name of the origin server.

Certain attributes in the manifest file allow you to alter the publishing URL path. These attributes are cdn-url in the <item> tag, and srcPrefix or cdnPrefix in the <crawler> and <item-group> tags. These attributes convert a relative source URL into a completely new relative CDN URL.

For the content in the following example, the path uses default.html instead of index.html.

<item src="index.html" cdn-url="default.html" />

The relative URL is always relative to the host name. In the following example, the relative URL is index.html, not sport/index.html.

<server> 
    <host name="http://www.cnn.com/sport/" /> 
</server> 

<item src="index.html" /> 

In the following example, the srcPrefix and cdnPrefix attributes convert the prefix of every crawled content object from NBA/ to ABC/. The relative cdn-url is ABC/*. The path for the start-url is ABC/index.html.

<crawler 
    start-url="NBA/index.html" 
    srcPrefix="NBA/" 
    cdnPrefix="ABC/" 
/>
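The path conversions in the preceding examples amount to simple string substitution. The Python sketch below illustrates that idea only. The function name publishing_path is hypothetical, and the choice to let cdn-url override the prefix attributes is an assumption, because this section does not describe using them together.

```python
def publishing_path(src, src_prefix=None, cdn_prefix=None, cdn_url=None):
    """Return the publishing URL path for a relative source URL."""
    # cdn-url replaces the path outright (assumption: it wins over prefixes).
    if cdn_url is not None:
        return cdn_url
    # srcPrefix and cdnPrefix swap the leading part of the path.
    if src_prefix is not None and cdn_prefix is not None and src.startswith(src_prefix):
        return cdn_prefix + src[len(src_prefix):]
    # By default, the path is the relative source URL itself.
    return src


print(publishing_path("NBA/index.html", src_prefix="NBA/", cdn_prefix="ABC/"))
print(publishing_path("index.html", cdn_url="default.html"))
```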

Specifying Attributes for Content Serving

Certain attributes in the manifest file can be specified to control the manner in which content is served by the Content Engines. These attributes can be specified in the <item> and <crawler> tags. These same attributes can also be specified in <item-group> or <options> tags, so they can be shared by their <item> and <crawler> subtags. Table 6-3 lists and describes these content-serving attributes.

Table 6-3 Attributes for Content Serving 

Attribute
Description

noRedirectToOrigin

(Optional) Sets the redirection to the origin server to true or false. A false setting allows the CDN Content Engine to redirect content requests to the origin server if the content is not available at that device. A true setting does not allow the CDN Content Engine to redirect content requests to the origin server and generates an error. The default setting is false.

serveStartTime

(Optional) Designates a time in yyyy-mm-dd hh:mm:ss format at which the CDN is allowed to start serving the content. If the serving start time is omitted, content is ready to serve once it is distributed to the Content Engine.

serveStopTime

(Optional) Designates a time in yyyy-mm-dd hh:mm:ss format at which the CDN temporarily stops serving the content. If the serving stop time is omitted, the CDN serves the content to the Content Engine until it is removed by modifying the manifest file or renaming the channel.

alternateUrl

(Optional) If the content requested by the user is not ready in the CDN, the CDN redirects the request to this alternative URL, which can be configured as an error reporting page. This attribute supports either a full URL or a relative path. (A relative path must be relative to the requesting URL.)

requireAuth

(Optional) Determines whether users need to be authenticated before the specified content is played. When requireAuth is true, the Content Engine communicates with the origin server to check the user's credentials. If the request passes the credential check, the content is played back from the Content Engine. If this attribute is omitted, a heuristic approach is used: if the specified content was acquired by using a username and password, authentication is required; otherwise, it is not required.
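The serving window defined by serveStartTime and serveStopTime in Table 6-3 can be sketched as a simple time comparison. The helper name may_serve is hypothetical, time zones are ignored, and the treatment of the exact boundary instants (the start is inclusive, the stop is exclusive) is an assumption.

```python
from datetime import datetime


def may_serve(now, serve_start=None, serve_stop=None):
    """Return True if content may be served at the time `now`."""
    # Before serveStartTime, the content is not yet served.
    if serve_start is not None and now < serve_start:
        return False
    # From serveStopTime onward, the content is no longer served.
    if serve_stop is not None and now >= serve_stop:
        return False
    # An omitted bound places no restriction on that side of the window.
    return True


start = datetime(2003, 1, 12, 14, 0, 0)
stop = datetime(2003, 4, 12, 14, 0, 0)
print(may_serve(datetime(2003, 2, 1), start, stop))
print(may_serve(datetime(2003, 5, 1), start, stop))
```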


Specifying Metadata for Content Serving

In certain situations, you must specify the metadata for content playback. For example, if content is acquired from an FTP server but must be played back with HTTP, the HTTP playback metadata, such as MIME-type and cache control, must be specified.

The <http-meta-data> subtag is used to specify HTTP metadata. Within the <http-meta-data> subtag shown in the following example, the name=value attributes are content-type="video/x-asf" and app-data="hh and dd". These are specified so that the CDN passes them directly to the end user when the HTTP content is played back.

<http-meta-data content-type="video/x-asf" app-data="hh and dd" />

As with the HTTP metadata, you can use the <wmt-meta-data> subtag shown in the following example to specify WMT streaming properties, such as title, author, and copyright date.

<wmt-meta-data 
    Title="Who Let the Dogs Out?" 
    Author="Milton"
    Copyright="1968"
/>

Both <http-meta-data> and <wmt-meta-data> can be specified as subtags of <item> or <crawler> tags. To share the metadata among every <item> or <crawler> tag in an <item-group> tag, specify <http-meta-data> and <wmt-meta-data> as subtags of the <item-group> tag. If a <crawler> tag has either <http-meta-data> or <wmt-meta-data> as a subtag, each of its crawled content objects shares this metadata.

Specifying Time Values in the Manifest File

The following attributes require that you enter a time value in the format yyyy-mm-dd hh:mm:ss.

prefetch

serveStartTime

serveStopTime

expires

time-before

time-after

In the manifest file, the time string conforms to the yyyy-mm-dd hh:mm:ss (year-month-day hour:minute:second) format. A time zone designation can be optionally specified at the end of a time string to indicate the particular time zone used. If a time zone designation is omitted, the Greenwich mean time (GMT) time zone is used. For a complete list of time zone designations and their GMT offsets, see the "Manifest File Time Zone Tables" section. Note that the automatic conversion between daylight saving time and standard time within a time zone is not supported, but a special designation for daylight saving time can be used, such as PDT for Pacific daylight saving time. In the following example, the prefetch time is September 5, 2002 at 09:09:09 Pacific daylight saving time:

<options timeZone="PDT" /> 
<item src="index.html" prefetch="2002-09-05 09:09:09 PDT" />
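A manifest time string with an optional time zone designation can be parsed as shown in the following Python sketch. It is illustrative only. The function name parse_manifest_time is hypothetical, the small offset table lists just three designations with their well-known GMT offsets (the complete list is in the "Manifest File Time Zone Tables" section), and a default set through the timeZone attribute of the <options> tag is not modeled.

```python
from datetime import datetime, timedelta, timezone

# Illustrative subset of time zone designations and their GMT offsets in hours.
TZ_OFFSET_HOURS = {"GMT": 0, "PST": -8, "PDT": -7}


def parse_manifest_time(text):
    """Parse 'yyyy-mm-dd hh:mm:ss [ZONE]' into a time zone aware datetime."""
    parts = text.split()
    # GMT is used when the time zone designation is omitted.
    zone = parts[2] if len(parts) == 3 else "GMT"
    naive = datetime.strptime(" ".join(parts[:2]), "%Y-%m-%d %H:%M:%S")
    offset = timezone(timedelta(hours=TZ_OFFSET_HOURS[zone]))
    return naive.replace(tzinfo=offset)


prefetch = parse_manifest_time("2002-09-05 09:09:09 PDT")
print(prefetch.astimezone(timezone.utc))  # 2002-09-05 16:09:09+00:00
```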

Refreshing and Verifying the Manifest File Content

Use the expires and ttl (Time To Live) attributes of the manifest file to monitor and control the freshness of the content objects. Time values use the GMT time zone unless you specify a time zone designation (see the "Specifying Time Values in the Manifest File" section). The expires attribute designates a time in yyyy-mm-dd hh:mm:ss (year-month-day hour:minute:second) format at which the content is to be removed from the CDN. If the expires attribute is omitted, content is stored on the CDN until it is explicitly removed when you modify the manifest file. The ttl attribute designates a time interval, in minutes, for revalidation of the content.

As content is modified or updated on the origin server, the ACNS software updates the content on the CDN at the time interval set by the ttl attribute. The ttl attribute represents the minimum interval at which the content is updated. As the file size and volume of the content increase, the time needed to refresh the content can increase beyond the time interval set by the ttl attribute. If the modifications or updates to the content are relatively large, it is more practical to fetch the entire content from the origin server.

You can monitor the status of content replication and freshness by enabling and then viewing the transaction log files that reside on the Content Engines of your CDN. To verify whether a content object or file was successfully imported to or refreshed on a particular Content Engine:

Enable the transaction log function on the Content Engine you want to monitor.

View the transaction log entries for the content object or filename that resides on that Content Engine.

Specifying Live Content

The two types of live content that you can specify in a manifest file are:

wmt-live

real-live

Use the <item> tag and specify the type attribute as either wmt-live or real-live, as shown in the following example.

<CdnManifest>
<server name="wmt-server">
<host name="mms://www.company-web-site.org" />
</server>
<item src="/tmp/ceo-talk" type="wmt-live" >
<wmt-meta-data title="Company's vision" copyright="FirstName LastName" />
</item>
<!--
This is a "wmt-live" streaming content type specified by the "type" attribute. The live 
stream URL is
mms://www.company-web-site.org/tmp/ceo-talk. The "title" and "copyright" metadata is added 
using <wmt-meta-data> tag.
-->
<server name="real-server">
<host name="real-server" proto="rtsp" />
</server>
<item src="tmp/funny-video" type="real-live" />
<!--
This is "real-live" streaming content type specified by the "type" attribute. The stream 
URL is rtsp://real-server/tmp/funny-video.
-->
</CdnManifest>

Two live streams are specified in the preceding manifest file example. One is wmt-live with url=mms://www.company-web-site.org/tmp/ceo-talk and the other one is real-live with url=rtsp://real-server/tmp/funny-video.
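The two stream URLs are formed by joining the host name and the src value. The Python sketch below reproduces that join for this example only. The function name stream_url is hypothetical, and the sketch normalizes the slash between host and path, which would not be appropriate for FTP sources, where a leading slash distinguishes absolute paths from relative paths.

```python
def stream_url(host_name, src, proto=None):
    """Join a <host> name and an <item> src into a full stream URL."""
    # Use the scheme embedded in the host name, or else the proto attribute.
    base = host_name if "://" in host_name else f"{proto}://{host_name}"
    # Normalize to exactly one slash between the host and the path.
    return base.rstrip("/") + "/" + src.lstrip("/")


print(stream_url("mms://www.company-web-site.org", "/tmp/ceo-talk"))
print(stream_url("real-server", "tmp/funny-video", proto="rtsp"))
```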

More Sample Manifest Files

This section contains five sample manifest files. In XML, text between <!-- and --> represents comments and has no effect on the execution of the file. In these five samples, narrative comments have been added immediately below certain tags or groups of tags to provide you with a better understanding of what these particular tags mean. You can copy an entire sample file, save it to a text file, and then view it with Microsoft Internet Explorer.

Additionally, cross-reference links from the first occurrence of a tag to the "Manifest File Reference" section of this guide have been embedded in the narrative comments of each sample to provide you with a more in-depth explanation of the tag if you feel further explanation is necessary.

To download these sample files from Cisco.com, see the "Downloading the Sample Files" section.

Topics covered by the five sample manifest files are:

Sample 1

How to use HTTP, HTTPS, and FTP protocols to acquire content

How to specify a username and password when authentication is required

Sample 2

How to specify attributes for acquisition, such as:

ttl: Sets the time interval between content freshness checks

prefetch: Specifies the time when the ACNS software can start to acquire content from the origin server

How to specify acquisition and distribution priorities

How to specify the following playback attributes:

serveStartTime: Sets the time and date to start serving this content

serveStopTime: Sets the time and date to stop serving this content

alternateUrl: Provides an alternative URL to use if content has not yet been replicated at the Content Engine

requireAuth: Requires authentication credentials from users to play back the content

expires: Sets an expiration time and date for content

playServer: Chooses which playservers can play the specified content

noRedirectToOrigin: If true and content has not yet been replicated, does not redirect the incoming request to the origin server

<http-meta-data>: Adds attributes for HTTP playback

<wmt-meta-data>: Adds attributes for WMT playback

Sample 3

A simple crawl job

FTP crawl of a directory

HTTP crawl of a directory

HTTP crawl of a website

A simple crawl job using the <matchRule> tag

FTP crawl of a directory to fetch only MPEG files

HTTP crawl of a directory to fetch only files larger than 10 MB

HTTP crawl of a website, to fetch only if-modified-since (IMS) files

Sample 4

Items with the <contains> tag included

Sample 5

RealMedia and WMT streaming live content

Sample 1

Sample 1 is a manifest file written to acquire single items, some of which require a username and password for authentication purposes, with HTTP, HTTPS, and FTP.

<!--
The CdnManifest tag pair is absolutely essential for a manifest file. It must be the first 
tag and is used only as a super tag for other tags.
-->
<server name="httpserver">
      <host name="my-server.xyz.com" proto="http" /> 
</server>
<!-- 
The preceding XML defines the origin server using the <server> tag from which to obtain 
content. Using the <host/> tag, the content is to be fetched using HTTP as specified by 
the "proto" attribute.
-->

<item src="myphotocollection/index.html" /> 
<item src="myphotocollection/myname/000001.jpg" /> 
<!--
The preceding XML defines single items, using the <item/> tag, to be obtained from the 
origin server. The "src" specifies the relative path to the web publishing root on the 
server. The full URL for the first item is 
"http://my-server.xyz.com/myphotocollection/index.html".
-->


<server name="auth-httpserver">
     <host name="http://my-auth-server.xyz.com" 
    user="myself" 
    password="mypwd" /> 
</server>
<!--
This origin server requires user authentication. The "user" and "password" attributes 
specify the required username and password to access content from the origin server. In 
this case, the name attribute can have a fully qualified domain name with both protocol 
and port.
-->

<item src="mymoviecollection/index.html" /> 
<item src="mymoviecollection/myname/000001.wmv" /> 
<!--
Again, the preceding XML defines two single items to obtain from the origin server. 
Because the <server> tag with name="auth-httpserver" is the closest <server> tag, it is 
used as the origin server for the two items.
-->

<server name="httpsserver">
  <host name="my-server.xyz.com" proto="https" /> 
</server>
<!--
From this origin server, the content is to be acquired using HTTPS, or HTTP over SSL, so 
that the protocol specified is HTTPS.
-->

<item src="my_secure_photocollection/index.html" /> 
<item src="my_secure_photocollection/myname/000001.jpg" /> 
<!--
The preceding XML defines two single items to obtain from the origin server. These two 
items are relative to the web publishing root on the server.
-->

<server name="auth-httpsserver">
  <host name="https://my-auth-server.xyz.com:443" 
user="myself" password="mypwd"
/> 
</server>
<!--
The preceding XML defines the origin server from which to obtain content. The content is 
to be acquired using HTTPS, or HTTP over SSL, so that the protocol specified is HTTPS. 
This origin server also requires user authentication. The "user" and "password" attributes 
specify the required username and password to access content from the origin server. The 
sslAuthType is used to set either "weak" or "strong" SSL certification. For example, 
"weak" certification allows expired, self-signed certification.
-->

<item src="my_auth_moviecollection/index.html" /> 
<item src="my_auth_moviecollection/myname/000001.wmv" /> 
<!--
Again, the preceding XML defines two single items to obtain from the origin server. These 
two items are relative to the web publishing root on the server.
-->

<server name="ftpserver">
  <host name="my-server.xyz.com" proto="ftp" /> 
</server>
<!--
The preceding XML defines the origin server from which to obtain content. Here, the 
content is to be acquired using FTP, so that the protocol specified is FTP.
-->

<item src="/my-doc-root/myphotocollection/index.html" /> 
<item src="my-doc-root/myphotocollection/file1.jpg" /> 
<!--
The preceding XML defines two single items to obtain from the origin server. Notice that 
the first item starts with a "/" (forward slash). This means that the path is absolute, or 
relative to the root directory. The second item does not start with a "/" (forward slash). 
This means that the content path is relative to the default login directory for an 
anonymous user.

To understand absolute and relative paths, consider the following directory listings:

The first directory lists the contents of /my-doc-root, and the second directory lists the 
contents of anonymous-default-dir, where anonymous-default-dir is the default directory 
for the "anonymous" user.

xyz# ls -lR /my-doc-root/
/my-doc-root/:
total 1
drwxrwxrwx    2 admin    root         1024 Dec 28 01:46 myphotocollection

/my-doc-root/myphotocollection:
total 1
-rw-rw-rw-    1 admin    root            4 Dec 28 01:46 index.html

xyz# ls -lR /anonymous-default-dir/
/anonymous-default-dir/:
total 1
drwxrwxrwx    3 admin    root         1024 Dec 28 01:53 my-doc-root

/anonymous-default-dir/my-doc-root:
total 1
drwxrwxrwx    2 admin    root         1024 Dec 28 01:53 myphotocollection

/anonymous-default-dir/my-doc-root/myphotocollection:
total 1
-rw-rw-rw-    1 admin    root            4 Dec 28 01:53 index.html

The single item with the following absolute path
  <item src="/my-doc-root/myphotocollection/index.html" /> 
fetches the file /my-doc-root/myphotocollection/index.html.

The single item with the following relative path
  <item src="my-doc-root/myphotocollection/file1.jpg" /> 
fetches the file /anonymous-default-dir/my-doc-root/myphotocollection/file1.jpg.

You must be careful to specify exactly what you want.
-->

<server name="auth-ftpserver">
  <host name="ftp://my-auth-server.xyz.com" 
user="myself" password="mypwd" /> 
</server>
<!--
The preceding XML defines the origin server from which to obtain content. Here, the 
content is to be acquired using FTP, so that the protocol specified is FTP. The origin 
server requires user authentication. The "user" and "password" attributes specify the 
required username and password to access content on the origin server.
-->

<item src="/my-doc-root/mymoviecollection/index.html" /> 
<item src="my-own-moviecollection/wedding/file1.wmv" /> 
<!--
The preceding XML defines two single items to obtain from the origin server. Notice that 
the first item specifies an absolute path, and the second one specifies a relative path. 
In this case, the relative path is relative to the default login directory for the user 
"myself".
-->

Sample 2

Sample 2 is a manifest file written to show how to specify attributes.


<server name="ftp-server" >
       <host name="ftp://my-ftp-server" />
</server>

<item src="data/video.mpg"
    ttl="60"
    type="prepos"
    prefetch="2003-03-20 10:00:00 PST"
    requireAuth="true"
    playServer="http,wmt"
    expires="2003-06-12 14:00:00 PST"
    alternateUrl="http://my-web-server.com/video-error-page.htm"
    priority="50000"
    serveStartTime="2003-01-12 14:00:00 PST"
    serveStopTime="2099-04-12 14:00:00 PST"
/>
<!-- 
src:    specifies the file location and is required.
prefetch:    specifies the time when the ACNS software can start to acquire content from 
the origin server.
ttl:    checks whether this file is updated every 60 minutes. This value is required.
noRedirectToOrigin:    when true, does not redirect the request to the origin server if 
the content has not yet been replicated to the Content Engine.
requireAuth:    when true, requires authentication to play back this content to users. 
User requests are redirected to the origin server to check credentials. If the requests 
pass the credential check, the content is played back from the Content Engine.
playServer:    allows the HTTP Apache server and WMT server to play back this content. 
That is, the supported playback protocols for this content are HTTP and MMS.
expires:    removes content from the CDN when the content expires on the specified date.
alternateUrl:    redirects the user to this URL when the request to play back the content 
is received but the content has not yet been replicated to the Content Engine.
priority:    specifies the item-priority. Content acquisition and distribution is 
processed in the order set by the overall priority. This means that the higher the overall 
priority, the earlier the content is acquired and distributed. The overall priority is 
calculated as channel-priority * 10000 + item-priority. Channel priority is 250 for low, 
500 for normal, and 750 for high. Item-priority is 10000 - (index of the item in the 
manifest file) if a priority is not specified. For example, there are two items in this 
manifest file. The first item does not have a "priority" attribute, but the second item 
does and its priority is 20000. The item-priority of the first item is 10000 - 1 = 9999 
and the item-priority of the second item is 20000. In this example, the item priority for 
this item is 50000.
serveStartTime:    specifies the time CDN can start to serve this content.
serveStopTime:    specifies the time CDN stops serving this content.
-->

<crawler start-url="/root/data/video-files/"
    depth="3"
    ttl="60"
    requireAuth="true"
    playServer="http,wmt"
    expires="2003-06-12 14:00:00 PST"
    alternateUrl="http://my-web-server.com/video-error-page.htm"
    priority="50000"
    serveStartTime="2003-01-12 14:00:00 PST"
    serveStopTime="2003-04-12 14:00:00 PST"
/>
<!--
start-url:    specifies the crawling start directory "/root/data/video-files/".
depth:    specifies the crawl level of three subdirectories.
noRedirectToOrigin:    if true and the crawled items are not replicated to the Content 
Engine, the request for that content is not redirected to the origin server.
requireAuth:    if true, authentication is required for all crawled content.
playServer:    all crawled content can be played back by an HTTP web server and WMT 
streaming server.
expires:    all crawled content expires and is deleted at the specified time.
alternateUrl:    if any of the crawled items are not replicated to the Content Engine, the 
request for that content is redirected to this URL.
priority:    all crawled items have the same item-priority, 50000. Because they are in 
the same channel, they also have the same overall priority.
serveStartTime:    all crawled content can be served after the specified time.
serveStopTime:    no crawled content can be served after the specified time.
-->

<item src="data/video2.mpg"
playServer="http,wmt"  >
	<http-meta-data content-type="video/mpeg" />
	<wmt-meta-data author="johnw" title="Movie" />
</item>
<!--
The <http-meta-data/> tag can be used to specify any metadata for HTTP playback. For 
example, because this item is acquired using FTP, this tag must be used to specify the 
content type for this MPEG file. The <wmt-meta-data/> tag can be used to specify 
attributes, such as title, author, copyright, and description, for WMT playback.
--> 

<crawler start-url="data/mpeg-files-2/"
	depth="3" >
	<http-meta-data content-type="video/mpeg" />
	<wmt-meta-data author="john" copyright="2003, Cisco Systems Inc." />
</crawler>
<!--
For this crawl job, three subdirectory levels are crawled under the "data/mpeg-files-2/" 
folder. The <http-meta-data> tag is used to specify content-type for HTTP playback for all 
crawled content. The <wmt-meta-data> tag is used to specify WMT attributes, such as author 
or copyright, for all crawled content.
-->
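
The overall priority arithmetic described in the Sample 2 comments can be checked with a short calculation. The following Python sketch is illustrative only and is not part of the ACNS software. The function names are hypothetical, and the sketch assumes that items are indexed starting from 1, as the "10000 - 1 = 9999" example implies.

```python
# Channel-priority weights as stated in the Sample 2 comments.
CHANNEL_PRIORITY = {"low": 250, "normal": 500, "high": 750}


def item_priorities(explicit):
    """Return the item-priority of each item, in manifest order.

    `explicit` holds the priority attribute of each item, or None when the
    attribute is absent. Items are assumed to be indexed from 1.
    """
    return [p if p is not None else 10000 - index
            for index, p in enumerate(explicit, start=1)]


def overall_priority(channel, item_priority):
    """overall priority = channel-priority * 10000 + item-priority."""
    return CHANNEL_PRIORITY[channel] * 10000 + item_priority


# Two items: the first has no priority attribute, the second has priority="20000".
priorities = item_priorities([None, 20000])
print(priorities)                                            # [9999, 20000]
print([overall_priority("normal", p) for p in priorities])   # [5009999, 5020000]
```

Because the channel-priority is multiplied by 10000, the item-priority mainly orders items within one channel. An explicit item-priority larger than 10000, such as the 50000 used in this sample, can shift the overall priority across channel-priority levels.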

Sample 3

Sample 3 is a manifest file that shows how to use the crawl feature.


<server name="ftp-server" >
	<host name="ftp://ftp-server"  />
</server>
<crawler start-url="pub-data/video-files/"
	ttl="10"
	depth="1"  />
<!--
The preceding XML specifies an FTP crawl job on the server "ftp-server" using the 
<crawler> tag. The starting directory is "pub-data/video-files/" and the crawl depth is 1. 
The files in this folder are monitored at 10-minute intervals (ttl="10"). If files are 
updated, removed, or added, the resulting change is reflected in the CDN.
-->

<crawler start-url="video-files/" ttl="10" depth="1" >
	<matchRule>
		<match extension="mpg" />
	</matchRule>
</crawler>
<!--
This crawl job is similar to the preceding crawl job, except that it includes a 
<matchRule> </matchRule> tag pair to specify the kind of content that is to be acquired. 
In this case, only files with the "mpg" file extension are acquired.
-->

<server name="http-server" >
	<host name="http://www.ftp-server.com"  />
</server>

<crawler start-url="pub-data/video-files/"
	depth="5"  />
<!--
This is an HTTP directory crawl job. The HTTP server must be configured to enable the 
directory indexing feature for those directories that you want to crawl. For the Apache 
server, you must modify the Apache configuration file so that it looks like the following:

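 <Directory "DOCUMENT-ROOT/pub-data/video-files">
 Options Indexes
 </Directory>

 (Replace DOCUMENT-ROOT with the file system path of your web server's document root.)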

If the request URL points to a directory, the web server dynamically generates an HTML 
page with a list of files contained in that directory.

In this crawl job, the directory "pub-data/video-files/" and its subdirectories are 
crawled to a depth level of up to 5.
-->

<crawler start-url="pub-data/video-files/" depth="5"  >
	<matchRule>
		<match size-min-in-MB="10" extension="mpg" />
	</matchRule>
</crawler>
<!--
This crawl job is similar to the preceding crawl job, except that it includes the 
<matchRule> </matchRule> tag pair. This matchRule fetches only files that have the 
extension "mpg" and a file size equal to or larger than 10 MB.
-->

<server name="cnn-site" >
	<host name="http://www.cnn.com" />
</server>
<crawler start-url="sport/index.htm"
	  prefix="sport/"
	  depth="3"
	  max-size-in-MB="1000"
/>
<!--
This crawl job attempts to crawl part of cnn.com. The start URL is 
http://www.cnn.com/sport/index.htm. It only crawls URLs with the prefix 
http://www.cnn.com/sport/, acquiring only files from the directory "sport/". The 
depth="3" attribute means that the job crawls only up to 3 link levels. The 
max-size-in-MB attribute means that crawling stops if the total size of the crawled items 
reaches 1000 MB.
-->

<crawler start-url="movie/index.htm"  depth="3" >
	<matchRule>
		<match extension="mpg" time-after="2002-01-02 00:00:00" />
		<match extension="asf" time-after="2002-07-02 00:00:00" />
	</matchRule>
</crawler>
<!--
This crawl job attempts to crawl part of cnn.com. The start URL is 
http://www.cnn.com/movie/index.htm and the crawl depth is 3. This matchRule acquires only 
files with the "mpg" extension created after Jan. 2, 2002 or files with the "asf" 
extension created after July 2, 2002.
-->
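
The filtering behavior described for the last crawl job in Sample 3 can be modeled in a few lines. The following Python sketch is illustrative only and is not the ACNS crawler. It assumes that a file is acquired if it satisfies every attribute of at least one <match> tag, as the Sample 3 comments suggest. It also assumes that time-after is compared with a single file timestamp and ignores time zones. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# The matchRule from the last crawl job in Sample 3.
RULE = """
<matchRule>
    <match extension="mpg" time-after="2002-01-02 00:00:00" />
    <match extension="asf" time-after="2002-07-02 00:00:00" />
</matchRule>
"""

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def matches(match, name, timestamp):
    """Return True if the file satisfies every attribute of one <match> tag."""
    ext = match.get("extension")
    if ext is not None and not name.lower().endswith("." + ext.lower()):
        return False
    after = match.get("time-after")
    if after is not None and timestamp <= datetime.strptime(after, TIME_FORMAT):
        return False
    return True


def accepted(rule_xml, name, timestamp):
    """Return True if any <match> tag in the rule accepts the file."""
    rule = ET.fromstring(rule_xml)
    return any(matches(m, name, timestamp) for m in rule.findall("match"))


print(accepted(RULE, "clip.mpg", datetime(2002, 3, 1)))   # True
print(accepted(RULE, "clip.asf", datetime(2002, 3, 1)))   # False
```

The second file is rejected because the "asf" rule requires a timestamp after July 2, 2002, and the "mpg" rule does not apply to it.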

Sample 4

Sample 4 is a manifest file written to show the purpose and use of the <contains> tag. The <contains> tag is designed to prevent serving content if the required files are not present on the Content Engine. Typically, the delivery of a presentation consists of serving multiple files. For example, if an ASF video file uses several JPG or HTML files for its presentation, but the JPG or HTML files are not present, then the ASF video is not served.

<server name="my-origin-server">
   <host name="http://my-origin-server/"  />
</server>

<item src="images/intro.html" />
<item src="images/intro.jpg" />
<!--
These are two regular single-item acquisition jobs. 
-->

<item src="movie/movie1.asf" >
	<contains cdn-url="images/intro.html" />
	<contains cdn-url="images/intro.jpg" />
</item>
<!--
The preceding item, movie1.asf, contains two other items, intro.html and intro.jpg. The 
items are contained using the <contains/> tag. If these two contained items are not 
present on the Content Engine, then the CDN does not serve the container file movie1.asf.
-->
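
The serving rule that the <contains> tag expresses can be sketched as a simple set check. The following Python sketch is illustrative only and does not show how the Content Engine implements the check. It assumes that each cdn-url value in a <contains> tag equals the src of the contained item, as is the case in Sample 4. The function name and the set of present files are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample 4, wrapped in a <CdnManifest> root so that it parses as one document.
MANIFEST = """
<CdnManifest>
    <server name="my-origin-server">
        <host name="http://my-origin-server/" />
    </server>
    <item src="images/intro.html" />
    <item src="images/intro.jpg" />
    <item src="movie/movie1.asf">
        <contains cdn-url="images/intro.html" />
        <contains cdn-url="images/intro.jpg" />
    </item>
</CdnManifest>
"""


def servable(manifest_xml, src, present):
    """Return True if `src` and every item it contains are in `present`."""
    root = ET.fromstring(manifest_xml)
    for item in root.iter("item"):
        if item.get("src") == src:
            needed = {c.get("cdn-url") for c in item.findall("contains")}
            return src in present and needed <= present
    return False  # the item is not listed in the manifest


on_content_engine = {"movie/movie1.asf", "images/intro.html"}
print(servable(MANIFEST, "movie/movie1.asf", on_content_engine))   # False
```

In this run, movie1.asf is not servable because intro.jpg is missing from the set of files that are present.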

Sample 5

Sample 5 is a manifest file written to show how to specify live content. ACNS 5.0 software supports two types of live content: wmt-live and real-live. You need to use the type attribute to specify live streaming content.

<server name="wmt-server">
<host name="mms://www.company-web-site.org" />
</server>
<item src="/tmp/ceo-talk" type="wmt-live" >
<wmt-meta-data title="Company's vision" copyright="FirstName LastName" />
</item>
<!--
This is the "wmt-live" streaming content type specified by the "type" attribute. The live 
stream URL is mms://www.company-web-site.org/tmp/ceo-talk. The "title" and "copyright" 
metadata is added using the <wmt-meta-data> tag.
-->
<server name="real-server">
<host name="real-server" proto="rtsp" />
</server>
<item src="tmp/funny-video" type="real-live" />
<!--
This is the "real-live" streaming content type specified by the "type" attribute. The 
stream URL is rtsp://real-server/tmp/funny-video.
-->
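
The stream URLs quoted in the Sample 5 comments are formed from the <host> tag attributes and the src attribute of the item. The following Python sketch shows one way to reproduce them. It is illustrative only. The function names are hypothetical, and the sketch assumes that a name containing "://" is already a complete URL and that the protocol otherwise defaults to HTTP.

```python
def host_base_url(name, proto=None, port=None):
    """Build the base URL of a <host> from its name, proto, and port attributes."""
    if "://" in name:
        # name is already a full URL; any port is assumed to be part of it.
        return name.rstrip("/")
    base = f"{proto or 'http'}://{name}"   # assumed default protocol: HTTP
    if port:
        base += f":{port}"
    return base


def content_url(base, src):
    """Join the host base URL and an item's src with exactly one slash."""
    return f"{base}/{src.lstrip('/')}"


# The two live items in Sample 5.
print(content_url(host_base_url("mms://www.company-web-site.org"), "/tmp/ceo-talk"))
# mms://www.company-web-site.org/tmp/ceo-talk
print(content_url(host_base_url("real-server", proto="rtsp"), "tmp/funny-video"))
# rtsp://real-server/tmp/funny-video
```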

Downloading the Sample Files

To download the preceding five sample files from Cisco.com, follow these steps:


Step 1 Go to the following URL to find the sample files:

http://www.cisco.com/pcgi-bin/tablebuild.pl/acns50

Step 2 When prompted, log in to Cisco.com using your designated Cisco.com username and password.

The Cisco ACNS Software download page appears, listing the available software updates for the Cisco ACNS software product.

Step 3 Locate the file named ACNS-5.0.1-manifest-samples.zip. This is a Zip archive containing the sample manifest files.

Step 4 Click the link for the ACNS-5.0.1-manifest-samples.zip file. The download page appears.

Step 5 Click Software License Agreement.

A new browser window opens, displaying the license agreement.

Step 6 After you have read the license agreement, close the browser window displaying the agreement and return to the Software Download page.

Step 7 Click the filename link labeled Download.

Step 8 Click Save to file and then choose a location on your workstation to temporarily store the zipped file containing the sample files.

Step 9 Use your preferred unzip program to unpack the sample files to a location on your workstation or your network.

After you have unzipped the sample files, you are ready to begin using them to create your own manifest files for your website.


Manifest Validator Utility

Because correct manifest file syntax is so important to the proper deployment of pre-positioned content on your CDN, Cisco makes available a manifest file syntax validator. The Manifest Validator, a Java-based command-line interface that verifies the correctness of the syntax of the manifest file you have written or modified, is built into the Content Distribution Manager GUI.

The Manifest Validator utility tests each line of the manifest file to identify syntax errors where they exist and determine whether or not the manifest file is valid and ready for use in importing content into your CDN. The results of these syntax validation tests are logged into a text file at a location that you name.

Running the Manifest Validator Utility

The Manifest Validator utility is built into the Content Distribution Manager GUI. Figure 6-1 shows the Manifest Validator GUI window.

Figure 6-1 Manifest Validator Content Distribution Manager GUI Window

To access the Manifest Validator, follow these steps:


Step 1 From the Content Distribution Manager GUI, choose Channels > Channels.

The Channels window appears.

Step 2 Click either the Edit or Create New Channel icon.

Step 3 From the Contents pane, choose Tools > Manifest Validator.



Note You must first create a new channel or edit an existing channel before you can access the Manifest Validator.


Enter the URL of the manifest file that you want to test in the Manifest File field and click Validate. The Manifest Validator checks the syntax of your manifest file to make sure that source files are named for each content item in the manifest. The Manifest Validator then checks the URL for each content item to verify that the content is placed correctly and then displays the output in the lower part of the GUI window. The Manifest Validator does not determine the size of the item.

Valid Manifest File Example

The following is an example of a valid manifest file:

<CdnManifest>
<item
        src="tmp/mao's.html"
        priority="20"
        />
<server name="my-dev'box">
<host name="http://128.107.150.26"
        proto="http" />
</server>
<item
        src="tmp/lu.html"
        priority="300"
        />
<item
        src="/tmp/first_grader.html"
        />
<server name="server0">
       <host name="http://umark-u5.cisco.com:8080/" />
</server>
<item src="a.gif"/>
<server name="server1">
       <host name="http://unicorn-web" />
</server>
<item src="Media/wmtfiles/DCA%20Disk%201/Microsoft_Logos/Logos_100k.wmv" />

</CdnManifest>

The final lines of the manifest file validator's output indicate whether or not the manifest is valid. Wait until the following message is displayed, indicating that the manifest file validator has finished processing the manifest file that you pointed to:

Total Number of Error: 0 
Total Number of Warning: 0 
Manifest File is CORRECT.

If errors are found, the error messages reported appear before the preceding message.

Invalid Manifest File Example

The following is an example of an invalid manifest file:

<CdnManifest>
<item
        src="tmp/mao's.html"
        priority="20"
        />
<server name="my-dev'box">
<host name="http://128.107.150.26"
        proto="http" />
</server>
<item
        src="tmp/lu.html"
        priority="300"
        />
<item
        src="/tmp/first_grader.html"
        />
<server name="server0">
       <host name="http://umark-u5.cisco.com:8080/" >
</server>
<item src="a.gif"/>
<server name="server1">
       <host name="http://unicorn-web" />
</server>
<item src1="Media/wmtfiles/DCA%20Disk%201/Microsoft_Logos/Logos_100k.wmv" />
</CdnManifest>

In the preceding example, although there are no warnings, two errors are found at the point where the <host> tag for server0 is left unclosed. The manifest file is therefore syntactically incorrect, as shown in the following message:

ERROR (/state/dump/tmp.xml.1040667979990 line: 23 col: 1 ):No character data is allowed by 
content model 
ERROR (/state/dump/tmp.xml.1040667979990 line: 23 col: 9 ):Expected end of tag 'host' 
  Manifest File: /state/dump/tmp.xml.1040667979990 
  Total Number of Error: 2 
  Total Number of Warning: 0 
  Manifest File is NOT CORRECT!

The following is a full-text output example of the invalid manifest file after the Manifest Validator checks the file:

Manifest validated: http://qiwzhang-lnx/nfs-obsidian/Unicorn/my-single-bad.xml
The manifest is downloaded as /state/dump/tmp.xml.1040667979990 for validation, this file 
will be removed when validation is completed. 
Start CdnManifest 
Start item 
     priority=20 
     src=tmp/mao's.html 
End item

Start server 
     name=my-dev'box 
Start host 
     name=http://128.107.150.26 
     proto=http 
     uuencoded=false 
End host 

End server 

Start item 
     priority=300 
     src=tmp/lu.html 
End item 

Start item 
     src=/tmp/first_grader.html 
End item 

Start server 
     name=server0 
Start host 
     name=http://umark-u5.cisco.com:8080/ 
     uuencoded=false 
ERROR (/state/dump/tmp.xml.1040667979990 line: 23 col: 1 ):No character data is allowed by 
content model 
ERROR (/state/dump/tmp.xml.1040667979990 line: 23 col: 9 ):Expected end of tag 'host' 
Manifest File: /state/dump/tmp.xml.1040667979990 
Total Number of Error: 2 
Total Number of Warning: 0 
Manifest File is NOT CORRECT! 
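
A similar first-pass check can be approximated on a workstation before you submit a manifest file to the Manifest Validator. The following Python sketch is not the Manifest Validator and does not reproduce its messages. It assumes only two checks: that the file is well-formed XML, and that every <item> tag has a src attribute. The function name and the shortened sample manifests are hypothetical.

```python
import xml.etree.ElementTree as ET

VALID = """
<CdnManifest>
    <server name="server1">
        <host name="http://unicorn-web" />
    </server>
    <item src="a.gif" />
</CdnManifest>
"""

# Same manifest, but the <host> tag is not closed.
UNCLOSED_HOST = """
<CdnManifest>
    <server name="server1">
        <host name="http://unicorn-web" >
    </server>
    <item src="a.gif" />
</CdnManifest>
"""


def check_manifest(text):
    """Return a list of problem descriptions; an empty list means none were found."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as exc:
        # The parser stops at the first well-formedness error.
        return [str(exc)]
    problems = []
    if root.tag != "CdnManifest":
        problems.append(f"root element is <{root.tag}>, expected <CdnManifest>")
    for number, item in enumerate(root.iter("item"), start=1):
        if not item.get("src"):
            problems.append(f"item {number} has no src attribute")
    return problems


print(check_manifest(VALID))           # []
print(check_manifest(UNCLOSED_HOST))   # one parser message about the unclosed tag
```

Because the parser stops at the first well-formedness error, a problem later in the file, such as a misspelled src attribute, is reported only after the earlier error is corrected.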

Understanding Manifest File Validator Output

The manifest file validator messages appear below the Manifest File field in the Manifest Validator window of the Content Distribution Manager GUI.

Each output file has a similar structure and syntax. It clearly identifies any errors or warning messages arising from incorrect manifest file syntax. Manifest files are determined by the validator to be either:

CORRECT—Contains possible syntax irregularities but is syntactically valid and ready for deployment on your CDN

INCORRECT—Contains syntax errors and is unsuitable for deployment on your CDN

Syntax Errors

The manifest file validator issues syntax errors only when it cannot identify a source file for a listed content item, either because it is not listed, or because it is listed using improper syntax. Files containing syntax errors are marked INCORRECT.

Syntax errors are identified in the output with the ERROR label. In addition to the label, the line and column numbers containing the error are provided, as well as the manifest attribute for which the error was issued. An error appears in the following example:

ERROR (/state/dump/tmp.xml.1040667979990 line: 23 col: 1 ):No character data is allowed by 
content model

/state/dump/tmp.xml.1040667979990 is the manifest file name

line: 23 col: 1 is the manifest file line and column number where the error occurs

No character data is allowed by content model describes the type of manifest file error
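
If you save the validator output, the three parts of an ERROR line described above can be extracted with a regular expression. The following Python sketch is illustrative only. The pattern is inferred from the two ERROR lines shown in this chapter and may not cover every message that the validator can produce. The function name is hypothetical.

```python
import re

# Pattern inferred from the ERROR lines shown in this chapter.
ERROR_LINE = re.compile(
    r"ERROR \((?P<file>\S+) line: (?P<line>\d+) col: (?P<col>\d+) \):(?P<message>.*)"
)


def parse_error(text):
    """Split one ERROR line into file name, line, column, and message."""
    found = ERROR_LINE.match(text.strip())
    if found is None:
        return None  # not an ERROR line in the expected format
    return {
        "file": found["file"],
        "line": int(found["line"]),
        "col": int(found["col"]),
        "message": found["message"].strip(),
    }


sample = "ERROR (/state/dump/tmp.xml.1040667979990 line: 23 col: 9 ):Expected end of tag 'host' "
print(parse_error(sample))
```

Lines that do not match the pattern, such as the summary line "Total Number of Error: 2", yield None and can be skipped.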

Syntax Warnings

The manifest file validator issues syntax warnings for a wide variety of irregularities in the manifest file syntax. Files containing syntax warnings may be marked CORRECT or INCORRECT, depending on whether or not syntax errors have also been issued.

Syntax warnings are identified in the output with the WARNING label. In addition to the label, the output provides the line number for which the warning is issued, the manifest attribute concerned, and the valid options and default value for that attribute.

Correcting Manifest File Syntax

Once you have identified syntax warnings, errors, and messages using the output from the manifest file validator, you can correct your manifest file syntax and then rerun the manifest file validator on the corrected file to verify its correctness.

To correct syntax warnings and errors in your manifest file, follow these steps:


Step 1 Open your manifest file using your preferred XML editor.

Step 2 Referring to your manifest file validator output, use the line numbers provided by the manifest file validator to locate the syntax violations in your manifest file.

It is a good idea to review every warning and error in your manifest file. Some warnings, although they still allow the manifest file validator to find your manifest file syntax to be correct, can be the source of problems when you deploy the identified content to your CDN.

Step 3 After you have made the necessary corrections for syntax warnings and errors, click Save.

Step 4 Run the manifest file through the manifest file validator again and review the validator output for new or unresolved errors and warnings.

Step 5 Repeat Step 1 through Step 4 until every error and warning has been adequately resolved and the manifest file validator indicates that your manifest file syntax is correct.


Manifest File Reference

This major section contains the following topics:

Manifest File Structure and Syntax

XML Schema

Manifest File Automated Scripts

Manifest File Time Zone Tables

The most efficient and least error-prone methods of creating a manifest file are:

Modify one of the sample manifest files in this chapter to suit your particular needs, ensuring that your XML syntax is correct.

Use the two sample Perl scripts that can be downloaded from Cisco.com as is, or customize these downloaded scripts for your own purposes.

You can start with one of the prewritten sample XML manifest files presented in this chapter. Choose a sample manifest file that is closest to matching your content acquisition and pre-positioning needs, and then modify the XML code accordingly, while ensuring that your XML syntax is correct.

Alternatively, use the sample Perl scripts that Cisco provides (see the "Obtaining the Perl Scripts" section).

Once you have created a suitable manifest file, you can verify its correctness by running the Manifest Validator utility (see the "Manifest Validator Utility" section) on your newly written XML code from the Content Distribution Manager GUI.

Manifest File Structure and Syntax

The Cisco ACNS 5.0 software manifest file provides powerful features for representing and manipulating CDN data that can be easily edited using any simple text editor. Table 6-4 provides a summary list of the manifest file tags, their corresponding attributes and subelements, and a brief description of each tag. Table 6-5 shows an example of how tags are nested in a manifest file. The sections that follow provide a more detailed description of the manifest file tags, the data they contain, and their attributes.

Table 6-4 Manifest File Tag Summary 

| Tag Name | Subelements | Attributes | Description |
|---|---|---|---|
| CdnManifest | <playServerTable/>, <options/>, <server/>, <item/>, <item-group/>, <crawler/> | None | Marks the beginning and end of the manifest file content. |
| playServerTable | <playServer/> | None | (Optional) Sets default mappings for media types. |
| playServer | <contentType/>, <extension/> | name [1]: real, http, qtss, wmt | Names the media server type on the Content Engine responsible for playing content types and files with extensions mapped to it using <contentType> tags. |
| contentType | None | name: http, media, qtss, real, wmt (see Table 6-6) | (Optional, but must have either contentType or extension) Names the MIME-type content mapped to a playserver. |
| extension | None | name: http, media, qtss, real, wmt (see Table 6-6) | (Optional, but must have either contentType or extension) Names the file extension that is mapped to a playserver. |
| options | None | timeZone, alternateUrl, expires, noRedirectToOrigin, playServer, prefetch, serveStartTime, serveStopTime, server, priority, ttl, type | (Optional) Defines attributes specific to the manifest file that can be shared. |
| server | <host/> | name | Defines only one host from which content is to be retrieved. |
| host | None | name, proto, port, user, password, unencoded, sslAuthType | Defines a web server or live server from which content is to be retrieved and later pre-positioned. |
| item | <contains/>, <wmt-meta-data/>, <http-meta-data/> | src, cdn-url, type, noRedirectToOrigin, playServer, prefetch, expires, ttl, serveStartTime, serveStopTime, alternateUrl, priority, requireAuth | Identifies specific content that is to be acquired from the origin server. |
| crawler | <wmt-meta-data/>, <http-meta-data/>, <matchRule/> | start-url, depth, prefix, accept, reject, max-number, max-size-in-MB, srcPrefix, cdnPrefix, requireAuth, noRedirectToOrigin, playServer, prefetch, expires, ttl, serveStartTime, serveStopTime, alternateUrl, priority, server, type | Supports crawling of a website or FTP server. |
| item-group | <wmt-meta-data/>, <http-meta-data/>, <matchRule/>, <crawler/>, <item/> | alternateUrl, expires, noRedirectToOrigin, playServer, prefetch, serveStartTime, serveStopTime, server, priority, srcPrefix, cdnPrefix, ttl, type, requireAuth | Places shared attributes under one tag so that they can be shared by every <item> and <crawler> tag within that group. |
| matchRule | <match/> | None | (Optional) Defines additional filter rules for crawler jobs. |
| match | None | MIME-type, extensions, time-before, time-after, size-min-in-KB, size-max-in-KB | (Optional) Specifies the acquisition criteria of content objects before they can be acquired by the CDN. |
| contains | None | cdn-url | (Optional) Identifies content objects that are embedded within the content item currently being described. |
| wmt-meta-data | None | name=value | (Optional) Specifies one or more file attributes that are displayed in the Windows Media Player when the file is played back. |
| http-meta-data | None | name=value | (Optional) Sends HTTP response headers to end user HTTP requests to specify content type for FTP acquired content. |

1 Attributes that are required for a tag are shown in boldface font.


Table 6-5 Manifest File Nested Tag Relationships 

<CdnManifest>
    <playServerTable>
        <playServer>
            <contentType/>
            <extension/>
        </playServer>
    </playServerTable>
    <options>
        Manifest file shared attributes
    </options>
    <server>
        <host/>
    </server>
    <item>
        <contains/>
        <wmt-meta-data/>
        <http-meta-data/>
    </item>
    <crawler>
        <wmt-meta-data/>
        <http-meta-data/>
        <matchRule/>
    </crawler>
    <item-group>
        <contains/>
        <wmt-meta-data/>
        <http-meta-data/>
    </item-group>
</CdnManifest>


CdnManifest

The <CdnManifest> </CdnManifest> tag set is required and marks the beginning and end of the manifest file content. Each <CdnManifest> tag set must contain at least one item, or content object, that is fetched and stored.

Attributes

None

Subelements

The <CdnManifest> tag set can contain the following subelements:

playServerTable

The <CdnManifest> tag set can only contain one playServerTable subelement.

options

The <CdnManifest> tag set can only contain one options subelement.

server

item

item-group

crawler

Example

<CdnManifest>
    <server name="origin-server">
        <host name="www.name.com" proto="http" port="80" />
    </server>
    <item cdn-url="logo.jpg" server="origin-server" src="images/img.jpg" type="prepos"
          playServer="http" ttl="300" />
</CdnManifest>

playServerTable

The <playServerTable> </playServerTable> tag set is optional and provides a means for you to set default mappings for a variety of media types. Mappings can be set for both MIME-type content (the preferred mapping) and file extensions. Playserver tables allow you to override default mappings on the Content Engine for content types from a particular origin server. Playservers can be any one of the following four streaming servers: WMT, RealMedia, HTTP, or QTSS. If no <playServerTable> tag is configured in the manifest file, a default <playServerTable> tag is used.

Using the manifest file, you can map groups of content items as well as individual content objects to an installed playserver. The following are content item and manifest file playserver mappings:

Content item URL

Playserver mappings appear immediately after the origin server name in place of the default <cdn-media> tag.

Manifest file as an attribute of the <item> or <item-group> tag

Playserver mappings placed at this location are identified using the playServer attribute and apply only to the named item or group of items.

Manifest file as a playserver table

Mappings are grouped within the <playServerTable> and <playServer> tags and are applied to content served from the origin server as directed by the manifest file.

System-level

Playserver mappings are configured during CDN startup.

The <playServerTable> tags are enclosed within the <CdnManifest> tags and name at least one of four playservers, such as RealServer, to which certain MIME-types and file extensions are mapped.

Attributes

None

Subelements

The <playServerTable> element must contain at least one <playServer> tag.

playServer

The <playServer> </playServer> tag set is required for the <playServerTable> tag and names the media server type on the Content Engine that is responsible for playing the content types and files with extensions mapped to it using the <contentType> tags. The <playServer> tag is enclosed within <playServerTable> tags.


Note Do not confuse the <playServer> tag with the playserver setting in an <item> or <item-group> tag. An <item> or <item-group> tag specifies a server type to be used for an individual content object or group of related content objects. Although both playserver settings accomplish the same task, <item> tag-level playserver settings take precedence over the content-type and file extension mappings specified by the <playServer> tags in the <playServerTable> tag.


Attributes

The name attribute of the <playServer> tag is required. Each <playServer> tag uses the name attribute to name the type of server to which content is mapped. In ACNS 5.0 software, Content Engines support four types of playservers:

real: RealMedia RealServer

http: HTTP web server

qtss: Apple QuickTime Streaming Server

wmt: Microsoft Windows Media Technologies

Subelements

At least one of the following subelements must be present in a <playServer> tag set.

<contentType/>

<extension/>

contentType

The <contentType> tag is optional, but either a <contentType> or an <extension> subelement must be present in a <playServer> tag set. The <contentType> tag names MIME-type content that is to be mapped to a playserver. The <contentType> tag must be enclosed within a <playServer> tag set. When both <contentType> and <extension> tags are present in a <playServerTable> tag for a particular media type, the <contentType> mapping takes precedence.

Attributes

Each <contentType> tag names a media content type that is to be mapped to the playserver using the name attribute. The name attribute is required. Table 6-6 lists supported media types.

Subelements

None

Table 6-6 Supported Media File Formats Grouped by Manifest File Content Type 

| Content Type | Supported Formats | Notes |
|---|---|---|
| http | Audio Video Interleave (AVI); Graphics Interchange Format (GIF); Hypertext Markup Language (HTML, HTM); Joint Photographic Experts Group (JPG); Microsoft PowerPoint (PPT); Microsoft Word (DOC); Moving Picture Experts Group (MPEG, MPG); MPEG Audio Layer 3 (MP3); Portable Document Format (PDF); QuickTime Movie (MOV); ASX | The content item is processed by an HTTP server. This tag is used for content that cannot be streamed by any of the servers, for example, Adobe PDF, PostScript (PS), and MPG files. |
| media | AVI; GIF; HTML, HTM; JPG; PPT; DOC; MPEG, MPG; MP3; PDF | This is the default value used by the Cisco ACNS 5.0 software. Use this media tag when no playserver is specified to process a content object. The linked object can be either a pre-positioned or a live content object. |
| qtss | QuickTime (QT); MOV | The content object is processed by the Apple QuickTime Streaming Server. |
| real | RealAudio (RA); RealMedia (RM); RealPix (RP); RealText (RT); Synchronized Multimedia Integration Language (SMIL) | The content object is processed by the RealServer. |
| wmt | ASF (includes WMA and WMV); ASX | The content object is processed by Windows Media Services. |


extension

The <extension> tag is optional but either a <contentType> or an <extension> subelement must be present in a <playServer> tag set. The <extension> tag names the file extension that is being mapped to a playserver.

The <extension> tag follows the <playServer> tag. When both <contentType> and <extension> tags are present in the <playServer> tag for a particular media type, the <contentType> mapping takes precedence.

Attributes

The name attribute is required and provides the file extension for a mapped content type. When files with the named extension are requested, the mapped playserver is used to serve them.

Subelements

None

Example

<CdnManifest>
    <playServerTable>
        <playServer name="real">
            <contentType name="application/x-pn-realaudio" />
            <contentType name="application/vnd.rn-rmadriver" />
            <extension name="rm" />
            <extension name="ra" />
            <extension name="rp" />
            <extension name="rt" />
            <extension name="smi" />
        </playServer>
        <playServer name="wmt">
            <extension name="asx" />
            <extension name="asf" />
            <extension name="avi" />
        </playServer>
        <playServer name="http">
            <contentType name="application/pdf" />
            <contentType name="application/postscript" />
            <extension name="pdf" />
            <extension name="ps" />
        </playServer>
    </playServerTable>
    <server name="test.origin.com/">
        <host name="http://tst.orgn.com" proto="http" />
    </server>
    <item src="pic1.mpg" />
</CdnManifest>
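
The precedence rules described for playserver mappings can be summarized in a short lookup routine. The following Python sketch is illustrative only and is not the ACNS implementation. It assumes three levels, checked in this order: a playServer attribute on the item, a <contentType> mapping, and an <extension> mapping. The function names are hypothetical, and the table is a shortened version of the preceding example.

```python
import xml.etree.ElementTree as ET

TABLE = """
<playServerTable>
    <playServer name="real">
        <contentType name="application/x-pn-realaudio" />
        <extension name="rm" />
    </playServer>
    <playServer name="wmt">
        <extension name="asf" />
    </playServer>
    <playServer name="http">
        <contentType name="application/pdf" />
        <extension name="pdf" />
    </playServer>
</playServerTable>
"""


def build_maps(table_xml):
    """Return (MIME type -> playserver, file extension -> playserver) dictionaries."""
    by_type, by_ext = {}, {}
    for server in ET.fromstring(table_xml).findall("playServer"):
        for content_type in server.findall("contentType"):
            by_type[content_type.get("name")] = server.get("name")
        for extension in server.findall("extension"):
            by_ext[extension.get("name")] = server.get("name")
    return by_type, by_ext


def resolve_play_server(table_xml, filename, content_type=None, item_play_server=None):
    """Pick a playserver: item attribute first, then contentType, then extension."""
    if item_play_server:
        return item_play_server
    by_type, by_ext = build_maps(table_xml)
    if content_type in by_type:
        return by_type[content_type]
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return by_ext.get(ext)  # None when the table has no mapping


print(resolve_play_server(TABLE, "talk.asf"))                                  # wmt
print(resolve_play_server(TABLE, "talk.asf", content_type="application/pdf"))  # http
print(resolve_play_server(TABLE, "talk.asf", item_play_server="real"))         # real
```

A result of None means only that this table contains no mapping for the file. The sketch does not model the default playserver table or the system-level mappings.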

options

The <options> tag is optional and is used to define attributes specific to the manifest file. Shared attributes can be inherited by <item> and <crawler> tags in the manifest file. For example, timeZone is an attribute specific to the manifest file that is used to set the time zone for all time-related values. Attributes such as ttl and alternateUrl can be set in the <options> tag, and their values can be shared by all <item> and <crawler> tags within the manifest file.

The <options> tag set is enclosed within the <CdnManifest> tag set and specifies at least one global setting. No more than one <options> tag is allowed per manifest file.

If parameters are defined within the manifest file <options>, <item-group>, or <item> tags, the order of precedence from lowest to highest is <options>, <item-group>, and <item>.
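
The order of precedence can be illustrated by resolving one attribute for every item in a small manifest. The following Python sketch is illustrative only and is not the ACNS implementation. The function name and the sample manifest are hypothetical, and the manifest omits the <server> tag for brevity.

```python
import xml.etree.ElementTree as ET

MANIFEST = """
<CdnManifest>
    <options ttl="300" playServer="http" />
    <item-group ttl="60">
        <item src="a.html" />
        <item src="b.html" ttl="10" />
    </item-group>
    <item src="c.html" />
</CdnManifest>
"""


def effective(manifest_xml, attribute):
    """Return {src: value}, applying <options>, then <item-group>, then <item>."""
    root = ET.fromstring(manifest_xml)
    options = root.find("options")
    default = options.get(attribute) if options is not None else None
    result = {}
    for child in root:
        if child.tag == "item":
            result[child.get("src")] = child.get(attribute, default)
        elif child.tag == "item-group":
            group_value = child.get(attribute, default)
            for item in child.findall("item"):
                result[item.get("src")] = item.get(attribute, group_value)
    return result


print(effective(MANIFEST, "ttl"))
# {'a.html': '60', 'b.html': '10', 'c.html': '300'}
```

In this run, b.html uses its own ttl value, a.html inherits the value from its <item-group> tag, and c.html falls back to the value in the <options> tag.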

Attributes

The timeZone attribute specifies the time zone for time values of attributes such as serveStartTime, serveStopTime, expires, and prefetch.

The following list of attributes can be shared by <item> and <crawler> tags. See the "item" section for descriptions of the following attributes:

alternateUrl

expires

noRedirectToOrigin

playServer

prefetch

serveStartTime

serveStopTime

server

priority

ttl

type

requireAuth

Subelements

None

Example

<CdnManifest> 
<options 
    noRedirectToOrigin="true" 
    timeZone="PST" />

</CdnManifest>

server

The <server> tag set configures the origin server that is the source of the content, and the <host> tag inside it configures the content source host. Multiple <host> tags in one <server> tag set are not supported in ACNS 5.0 software.

The <server> </server> tag set is required. It is contained within the <CdnManifest> tags and contains exactly one <host> tag, which identifies the host from which content is retrieved. Each <item> or <item-group> tag can have a server attribute that refers to this <server> tag by name.

Attributes

name

The name attribute is required and can be any name as long as it matches the server attribute values in the <item> or <crawler> tags.

Subelements

<host/>

The <server> tag set can contain only one <host> subelement.

host

The <host> tag is required and defines a web server or live server from which content is to be retrieved and later pre-positioned. Only one host can be defined within a single <server> tag set. The <host> tag must be enclosed within <server> tags.

Attributes

name

The name attribute is required and identifies the domain name or IP address of the host, unless the proto attribute field is empty. If the proto attribute field is empty, the name attribute must be a fully qualified URL, including scheme and domain name or IP address. It can also include subdirectories, such as http://www.abc.com/media.

proto

The proto attribute is optional and identifies the communication protocol that is used to fetch content from the host. Supported protocols are HTTP, HTTPS, and FTP. The default proto attribute is HTTP. The proto attribute can be left empty if the name attribute is a fully qualified URL.

port

The port attribute is optional and identifies the TCP port through which traffic to and from the host passes. The port used depends on the protocol used. The default port for HTTP is 80. The port attribute is required only for a nonstandard port assignment. The port can also be specified in the name attribute, such as name="http://www.cisco.com:8080/".

user

The user attribute is optional and identifies the secure login used for host access.

password

The password attribute is optional and identifies the password for the user account that is required to access the host server.

uuencoded

The uuencoded attribute is optional. If set to true, the password value is uuencoded. The uuencoded attribute default setting is false.

sslAuthType

The sslAuthType attribute is optional and has two possible values for the type of encryption:

strong

weak

The default sslAuthType attribute setting is strong.

Subelements

None

item

The <item> </item> tag set identifies the specific content that is to be acquired. The <item> tag names a single piece of content or a content object on the origin server, such as a graphic, MPEG video, or RealAudio sound file. Content items can be listed individually or grouped using the <item-group> tag.

The <item> tag must be enclosed within the <CdnManifest> tag set and can also be enclosed within <item-group> tags.

Attributes

src

The src attribute is required and identifies the path of the content item relative to the origin server. For example:

src="a/b/c/d.html"

server

The server attribute is optional and refers to the server name in the <server> tag. If the server attribute is omitted, the server listed in the closest <server> tag is used. If there is no <server> tag close to this <item>, the manifest server is used.

cdn-url

The cdn-url attribute is optional and is the relative CDN URL to allow end users to access this content. If no cdn-url value is specified, then the src value is used as the relative CDN URL.


Note If you use FTP to acquire content and the content type is not specified in the manifest file and the cdn-url attribute is used to alter your publishing URL, the cdn-url attribute must have the correct extension. Otherwise, the incorrect content type will be generated and you cannot play the content.


type

The type attribute is optional and defines whether content is to be pre-positioned or live on the CDN. The three type values are prepos, wmt-live, and real-live. The wmt-live and real-live values are used to deliver live content. If this field is left blank, the default type is prepos.

noRedirectToOrigin

The noRedirectToOrigin attribute is optional and sets the redirection to the origin server to true or false. A false setting allows the CDN Content Engine or other edge device to redirect content requests to the origin server if the content is not available at that device. A true setting does not allow the CDN Content Engine or edge device to redirect content requests to the origin server; instead, the device generates an error. The default noRedirectToOrigin setting is false. For the effect of the noRedirectToOrigin attribute on pre-positioned content freshness, see the "Configuring Freshness of Pre-Positioned Content" section.

playServer

The playServer attribute is optional and names the server used to play back the content. Valid playservers are real (RealServer), wmt (Windows Media Technologies), qtss (QuickTime Streaming Server), and http (web server). The value in this field is either one playserver or multiple playservers separated by commas. If a value for this attribute is left blank, the <playServerTable> tag in the manifest file is used to generate the playserver list for this content. If the manifest file does not have the <playServerTable> tag specified, the default <playServerTable> tag is used.

prefetch

The prefetch attribute is optional and designates a time in yyyy-mm-dd hh:mm:ss (year-month-day hour:minute:second) format at which the content is to be retrieved from the origin server. The time zone for the time can be specified in the <options> tag. Note that the automatic conversion between daylight saving time and standard time within a time zone is not supported, but a special designation for daylight saving time can be used, such as PDT for Pacific daylight saving time. In the following example, the prefetch time is September 5, 2002 at 09:09:09 Pacific daylight saving time:

<options timeZone="PDT" />

<item src="index.html" prefetch="2002-09-05 09:09:09 PDT" />

If a time value is omitted, the content is acquired immediately.
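As an illustration of the time format, the following Python sketch splits a manifest time value into its date-and-time part and an optional time zone label. The function name parse_manifest_time is hypothetical, and the sketch only extracts the zone label; it does not convert between time zones.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # yyyy-mm-dd hh:mm:ss


def parse_manifest_time(value):
    """Split a manifest time value into a naive datetime and an optional zone label."""
    parts = value.split()
    # A third whitespace-separated field, if present, is treated as the zone label.
    zone = parts[2] if len(parts) == 3 else None
    stamp = datetime.strptime(" ".join(parts[:2]), TIME_FORMAT)
    return stamp, zone


stamp, zone = parse_manifest_time("2002-09-05 09:09:09 PDT")
print(stamp, zone)
```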

expires

The expires attribute is optional and designates a time in yyyy-mm-dd hh:mm:ss format when the content is to be removed from the CDN. Additionally, you can specify the GMT time zone (see the "Specifying Time Values in the Manifest File" section). If a time value is omitted, content is stored at the CDN until it is removed when you modify the relevant manifest file code. For the effect of the HTTP header expires attribute on pre-positioned content freshness, see the "Configuring Freshness of Pre-Positioned Content" section.

ttl

The ttl attribute is optional and designates a time interval, in minutes, for revalidation of the content. If a time value is omitted, the content is fetched only once and its freshness is never checked again.

serveStartTime

The serveStartTime attribute is optional and designates a time in yyyy-mm-dd hh:mm:ss format when the CDN is allowed to start serving the content. If the time to serve is omitted, content is ready to serve once it is distributed to the Content Engine or other edge device.

serveStopTime

The serveStopTime attribute is optional and designates a time in yyyy-mm-dd hh:mm:ss format when the CDN temporarily stops serving the content. If the time to stop serving is omitted, the CDN serves the content until it is removed by modifying the relevant manifest file code. For the effect of the serveStopTime attribute on pre-positioned content freshness, see the "Configuring Freshness of Pre-Positioned Content" section.
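The serving window defined by the serveStartTime and serveStopTime attributes can be sketched in Python as follows. The function name in_serve_window is hypothetical, time zones are ignored, and the treatment of a request that arrives exactly at a boundary is an assumption.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # yyyy-mm-dd hh:mm:ss


def in_serve_window(now, serve_start=None, serve_stop=None):
    """Return True if a request arriving at `now` falls inside the serving window.

    An omitted serveStartTime or serveStopTime leaves that side of the window open.
    """
    if serve_start and now < datetime.strptime(serve_start, TIME_FORMAT):
        return False  # serving has not started yet
    if serve_stop and now >= datetime.strptime(serve_stop, TIME_FORMAT):
        return False  # serving has stopped
    return True


print(in_serve_window(datetime(2002, 9, 5, 10, 0, 0),
                      serve_start="2002-09-05 09:00:00",
                      serve_stop="2002-09-06 09:00:00"))
```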

alternateUrl

The alternateUrl attribute is optional. If content requested by the user is not ready in the CDN, the CDN redirects the request to this alternate URL, which can be configured as an error reporting page. The alternateUrl attribute supports either a full URL or a relative path. (If the alternateUrl attribute is a relative path, it must be relative to the requesting URL.)

priority

The priority attribute is optional and can be any integer value to specify the content processing priority. If a priority value is omitted, its index order within the manifest file is used to set the priority.

requireAuth

The requireAuth attribute is optional and determines whether users need to be authenticated to play the specified content. When requireAuth is true, the Content Engine communicates with the origin server to check the user's credentials. If the request passes the credential check, the content is played back from the Content Engine. If this attribute is omitted, a heuristic approach is used: if the specified content is acquired by using a username and password, authentication is required; otherwise, it is not required.
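The requireAuth behavior, including the heuristic used when the attribute is omitted, can be summarized in a minimal Python sketch. The function and parameter names are hypothetical and serve only to restate the rule.

```python
def requires_auth(require_auth=None, acquired_with_credentials=False):
    """Decide whether playback requires user authentication.

    require_auth is the manifest value (True, False, or None when omitted).
    acquired_with_credentials tells whether a username and password were
    used to acquire the content from the origin server.
    """
    if require_auth is not None:
        return require_auth  # an explicit setting is used as given
    # Heuristic for an omitted attribute: protected at acquisition time
    # implies authentication at playback time.
    return acquired_with_credentials


print(requires_auth(acquired_with_credentials=True))
```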

Subelements

<contains/>

<wmt-meta-data/>

<http-meta-data/>

Example

<item 
    src="index.html" 
    server="cisco.com" 
    ttl="3000" 
    alternateUrl="http://www.cisco.com/cdn-error.html" 
/> 

crawler

The <crawler> </crawler> tag set supports crawling a website or an FTP server.

Attributes

start-url

The start-url attribute is required. It defines the URL at which to start the process of crawling the website or FTP server. For an FTP server crawl, the start-url attribute must be a directory path with a forward slash as its last character. The start-url attribute defines a relative path, and the FTP server host name is necessary to compose the complete URL.

depth

The depth attribute is optional and defines the link depth to which a website is to be crawled or directory depth to which an FTP server is to be crawled. If the depth is not specified, the default is 20. The following are the general depth values:

0 = acquire only the starting URL
1 = acquire the starting URL and its referred files
-1 = infinite or no depth restriction

The depth is defined as the level of a website or the directory level of an FTP server, where 0 is the starting URL.
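The effect of the depth values can be illustrated with a breadth-first traversal in Python. This is a sketch only: the links dictionary is a hypothetical in-memory stand-in for fetching pages and following their links, and the function name crawl is not part of ACNS software.

```python
from collections import deque


def crawl(links, start_url, depth=20):
    """Return the URLs acquired from start_url within the given link depth.

    links maps each URL to the URLs it refers to.
    depth=0 acquires only the starting URL; depth=-1 means no restriction.
    """
    acquired = [start_url]
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, level = queue.popleft()
        if depth != -1 and level >= depth:
            continue  # do not follow links beyond the depth limit
        for child in links.get(url, []):
            if child not in seen:
                seen.add(child)
                acquired.append(child)
                queue.append((child, level + 1))
    return acquired


site = {
    "index.html": ["a.html", "b.html"],
    "a.html": ["c.html"],
    "c.html": ["index.html"],
}
print(crawl(site, "index.html", depth=1))
```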

prefix

The prefix attribute is optional and combines the host name from the <server> tag with the value of the prefix attribute to create a full prefix. Only content whose URLs match the full prefix is acquired. For example:

<server name="xx"> <host name="www.cisco.com" proto="https" port="433" /> </server>

and in a <crawler> tag:

prefix="marketing/eng/"

The full prefix is "https://www.cisco.com:433/marketing/eng/". Only URLs that match this prefix are crawled.

If a prefix is omitted, the crawler checks the default full prefix, which is the host name portion of the URL from the server. In the previous example, the default full prefix is "https://www.cisco.com:433".
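The way the full prefix is composed from the <host> attributes and the prefix attribute can be sketched in Python. The function name full_prefix is hypothetical, and the values are those used in the example above.

```python
def full_prefix(host, proto="http", port=None, prefix=""):
    """Combine the host settings with the crawler prefix into a full prefix."""
    base = f"{proto}://{host}"
    if port is not None:
        base += f":{port}"
    # With no prefix attribute, the host portion alone is the default full prefix.
    return f"{base}/{prefix}" if prefix else base


print(full_prefix("www.cisco.com", proto="https", port=433, prefix="marketing/eng/"))
print(full_prefix("www.cisco.com", proto="https", port=433))
```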

accept

The accept attribute is optional and uses a regular expression to define acceptable URLs to crawl in addition to matching the prefix. For example, accept="stock" means that only URLs that meet two conditions are searched: the URL matches the prefix and contains the string "stock". (See the "Writing Common Regular Expressions" section for more information on using regular expressions.)

reject

The reject attribute is optional and uses a regular expression to reject a URL if it matches the reject regular expression. The reject regular expression is checked after checking for a prefix URL match. If a URL does not match the prefix, it is immediately rejected. If a URL matches the prefix and the reject parameters, it is rejected by the particular reject constraint. (See the "Writing Common Regular Expressions" section for more information on using regular expressions.)
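The interaction of the prefix, accept, and reject attributes can be sketched in Python as follows. The function name should_crawl is hypothetical. The sketch assumes that a regular expression matches when it is found anywhere in the URL, which is consistent with the accept="stock" example; the exact matching semantics of the ACNS crawler may differ.

```python
import re


def should_crawl(url, full_prefix, accept=None, reject=None):
    """Apply the prefix check first, then the accept and reject expressions."""
    if not url.startswith(full_prefix):
        return False  # a URL outside the prefix is rejected immediately
    if accept is not None and not re.search(accept, url):
        return False  # an accept expression is set but does not match
    if reject is not None and re.search(reject, url):
        return False  # the reject expression matches
    return True


prefix = "http://www.cisco.com/jobs/eng/"
print(should_crawl(prefix + "run.pl", prefix, reject=r"\.pl"))
```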

max-number

The max-number attribute is optional and specifies the maximum number of objects that this crawler job can acquire.

max-size-in-MB

The max-size-in-MB attribute is optional and specifies the maximum content size in megabytes that this crawler job can acquire. The size attribute can be expressed in megabytes (MB), kilobytes (KB), or bytes (B).

srcPrefix

The srcPrefix attribute is optional and must be used in conjunction with the cdnPrefix attribute to form a relative CDN URL. If the prefix of the relative source URL matches the srcPrefix attribute, the matched prefix is replaced with the cdnPrefix value. If a srcPrefix attribute is not specified, or if the prefix of the relative source URL does not match the srcPrefix attribute, then the relative CDN URL is the cdnPrefix value followed by the relative source URL. For example, if content objects have the same source URL prefix "acme/pubs/docs/online/Design/" and you want to replace this prefix with a simple "online/", then specify srcPrefix="acme/pubs/docs/online/Design/" and cdnPrefix="online/".

cdnPrefix

The cdnPrefix attribute is optional and must be used in conjunction with the srcPrefix attribute.

requireAuth

The requireAuth attribute is optional and determines whether users need to be authenticated in order to play the specified content. When requireAuth is true, the Content Engine communicates with the origin server to check the user's credentials. If the request passes the credential check, the content is played back from the Content Engine. If this attribute is omitted, a heuristic approach is used: if the specified content is acquired by using a username and password, authentication is required; otherwise, it is not required.

The following attributes, described under the <item> tag attributes, can also be specified by the <crawler> tag.

alternateUrl

expires

noRedirectToOrigin

playServer

prefetch

serveStartTime

serveStopTime

server

priority

ttl

type

Subelements

<wmt-meta-data/>

<http-meta-data/>

<matchRule></matchRule>

Example

<server name="cisco"> 
    <host name="http://www.cisco.com/jobs/" /> 
</server> 
<crawler 
    server="cisco" 
    start-url="eng/index.html" 
    depth="10" 
    prefix="eng/" 
    reject="\.pl" 
    max-size-in-MB="200" 
/> 

item-group

The <item-group> </item-group> tag set is used to place shared attributes under one tag so that they can be shared by every <item> and <crawler> tag within that group. When attributes are shared, it means that attributes can be defined at either the <item-group> tag level for group-wide control or on a per <item> or per <crawler> tag basis. For example, if every <item> tag is using the same server and ttl attribute, you can create an <item-group> tag on top of these <item> tags and place the server and ttl attributes in the <item-group> tag.

Using shared attributes makes any manifest file with many <item> tags more efficient by consolidating the <item> tags with shared attributes. If the same attribute value exists in both the <item-group> and <item> tags, the value in the <item> tag takes precedence over that value in the <item-group> tag.

The <item-group> tag must be enclosed within the <CdnManifest> tag set and contain one or more <item> or <crawler> tags.

Attributes

If an attribute value is present only at the <item-group> tag level, then it is inherited by the <item> and <crawler> tags enclosed in the group. The attributes of a crawler job, whether inherited or owned, are propagated to the content fetched by the crawler job.

The following attributes can be shared across many <item> and <crawler> tags and are candidates for the <item-group> level tag. See the "item" section for detailed descriptions of the following attributes:

alternateUrl

expires

noRedirectToOrigin

playServer

prefetch

serveStartTime

serveStopTime

server

priority

ttl

type

requireAuth

Additionally, the following two attributes can be placed within the <item-group> tag. See the "crawler" section for a detailed description of the two following attributes:

srcPrefix

cdnPrefix

These two attributes convert the prefix of the src-url (retrieve URL) to the cdn-url (publish URL) for multiple content objects. These content objects are either explicitly specified by multiple <item> tags or acquired through a crawler job.

These two attributes can also be specified in the <crawler> tag. If you explicitly specify the srcPrefix attribute and cdnPrefix attribute for an individual <crawler> job, the <crawler> tag-level specification takes precedence over the <item-group> tag-level settings. If you do not specify these attributes for an individual <crawler> job, the <item-group> tag-level specification is inherited by the <crawler> job.

The srcPrefix and cdnPrefix attributes generate the relative CDN URL using the following rules:

If the cdn-url attribute is present in the <item> tag, the relative CDN URL is the cdnPrefix attribute followed by the cdn-url attribute. For example, if cdnPrefix="eng/spec/" and cdn-url="e/f.html", the relative path in the URL is "eng/spec/e/f.html".

If the srcPrefix attribute is not specified, the relative CDN URL is the cdnPrefix attribute followed by the relative source URL.

If the prefix of the relative source URL does not match the srcPrefix attribute, the relative CDN URL is the cdnPrefix attribute followed by the relative source URL.

Otherwise, the relative CDN URL is generated by removing the matched prefix from the relative source URL and replacing it with the cdnPrefix attribute.

The relative CDN URL of the <item> tag in the following example is "acme/default.html".

<item-group cdnPrefix="acme/" > 
    <item src="design/index.html" cdn-url="default.html" /> 
</item-group>

In the following example, content objects whose relative source URL begins with the srcPrefix value "design/plan/" have a relative CDN URL of "acme/" followed by the relative source URL stripped of "design/plan/". Content objects whose relative source URL does not begin with "design/plan/" have "acme/" followed by their original relative source URL.

<crawler 
    start-url="design/plan/index.html" 
    depth="-1" 
    srcPrefix="design/plan/" 
    cdnPrefix="acme/" /> 
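The rules for generating the relative CDN URL can be restated as a short Python sketch. The function name relative_cdn_url and its parameter names are hypothetical; the values are taken from the two examples above, plus an illustrative file name under "design/plan/".

```python
def relative_cdn_url(src, cdn_url=None, src_prefix=None, cdn_prefix=""):
    """Derive the relative CDN URL from src, cdn-url, srcPrefix, and cdnPrefix."""
    if cdn_url is not None:
        # An explicit cdn-url is appended to cdnPrefix.
        return cdn_prefix + cdn_url
    if src_prefix is None or not src.startswith(src_prefix):
        # No srcPrefix, or no match: keep the whole relative source URL.
        return cdn_prefix + src
    # Matched: replace the srcPrefix portion with cdnPrefix.
    return cdn_prefix + src[len(src_prefix):]


print(relative_cdn_url("design/index.html", cdn_url="default.html", cdn_prefix="acme/"))
print(relative_cdn_url("design/plan/index.html", src_prefix="design/plan/", cdn_prefix="acme/"))
```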

Subelements

<matchRule></matchRule>

<wmt-meta-data />

<http-meta-data/>

<crawler></crawler>

<item></item>

Example

<!--grouped content items-->
<item-group server="origin-web-server" type="prepos" ttl="300" cdnPrefix="unicorn/" >
         <item cdn-url="newHQpresentation.rm" src="newHQpresentation.rm" /> 
         <item cdn-url="animatedlogo.mpg" src="animlogo.mpg" />
         <item cdn-url="companytheme.mp3" src="cotheme.mp3" />
         <item cdn-url="newHQlayout.avi" src="newHQ.mov" />
</item-group>

matchRule

The <matchRule> </matchRule> tag set is optional and defines additional filter rules for crawler jobs. It affects only <crawler> tasks and is not used by single <item> tags. The crawler parameters defined in the <crawler></crawler> tag set determine primarily the scope of a crawl search. If a content object does not meet the criteria specified by the crawler parameter, neither it nor its children are searched.

The <matchRule> tag, however, determines only whether or not the content objects should be acquired, regardless of the scope of the search. If a web page matches the crawler parameters but does not meet the <matchRule> criteria, its children are still searched even though the page itself is not acquired.

In the following crawler job example using the <matchRule> tag, the entire website is searched, but only files that have the .jpg file extension and are larger than 50 kilobytes are acquired.

<crawler  start-url="index.html"  depth="-1" >
    <matchRule>
        <match size-min-in-KB="50" extension="jpg" />
    </matchRule>
</crawler>

The <matchRule> element can be nested within an <item-group> tag to define group-wide filter rules for <crawler> tags contained in the group. It can also be a subelement of a particular <crawler> job. The <crawler> tag-level setting overrides the <item-group> tag-level setting when both tags are present.

If you define criteria locally for individual <crawler> jobs, any existing group-level criterion is entirely discarded for that <crawler> job. That is, if your <item-group> tag match rule is set to A and your <crawler> tag specifies another match rule set to B, only B is to be used for the <crawler> tag rather than a combination of A and B. You can define at most one <matchRule> tag per <item-group> tag and at most one <matchRule> tag per <crawler> tag.

Attributes

None

Subelements

At least one <match> tag

match

The <match> </match> tag set is optional and specifies the acquisition criteria of content objects before they can be acquired by the CDN. Every attribute within a single <match> tag is ANDed (to form a logical conjunction) with the other attributes.

You can specify multiple <match> tags within the <matchRule> tag. The <match> tags are ORed (to form a logical disjunction) with other <match> tags, so a content object is acquired if it satisfies any one of them. You must specify at least one <match> tag per <matchRule> tag.
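The AND and OR semantics can be illustrated with a Python sketch. The function names, the dictionary representation of a content object, and the assumption that one kilobyte equals 1024 bytes are all illustrative; only the extension, mime-type, and size-min-in-KB attributes are modeled.

```python
def satisfies(obj, match):
    """All attributes present in one <match> tag must hold (logical AND)."""
    if "extension" in match and not obj["url"].endswith("." + match["extension"]):
        return False
    if "mime-type" in match and obj["mime_type"] != match["mime-type"]:
        return False
    if "size-min-in-KB" in match and obj["size_bytes"] <= match["size-min-in-KB"] * 1024:
        return False
    return True


def should_acquire(obj, match_rule):
    """A <matchRule> accepts an object if any of its <match> tags does (logical OR)."""
    return any(satisfies(obj, match) for match in match_rule)


rule = [
    {"size-min-in-KB": 50, "extension": "jpg"},
    {"mime-type": "video/mpeg"},
]
photo = {"url": "img/big.jpg", "mime_type": "image/jpeg", "size_bytes": 60 * 1024}
print(should_acquire(photo, rule))
```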

Attributes

mime-type

The mime-type attribute specifies MIME types.

extension

The extension attribute specifies file extensions.

time-before

The time-before attribute specifies that this content was modified before this time in yyyy-mm-dd hh:mm:ss format. Time parameters should be expressed in GMT time zones (for GMT offsets, see the "Manifest File Time Zone Tables" section).

time-after

The time-after attribute specifies that this content was modified after this time in yyyy-mm-dd hh:mm:ss format. Time parameters should be expressed in GMT time zones (for GMT offsets, see the "Manifest File Time Zone Tables" section).

size-min-in-MB

The size-min-in-MB attribute specifies that the acquired content size must be larger than this number of megabytes. The size can also be expressed in kilobytes (KB) or bytes (B) by changing the unit in the attribute name, as in size-min-in-KB.

size-max-in-MB

The size-max-in-MB attribute specifies that the acquired content size must be smaller than this number of megabytes. The size can also be expressed in kilobytes (KB) or bytes (B).

Subelements

None

Example

<!-- crawling item group -->
<item-group server="origin-server" type="prepos">
    <matchRule>
        <match time-before="2000-05-05 12:00:00"/>
    </matchRule>
    <crawler start-url="eng/index.html" depth="-1"/>
    <crawler start-url="hr/index.html" depth="3">
        <matchRule>
            <match size-min-in-KB="1" extension="xxx"/>
        </matchRule>
    </crawler>
</item-group>

contains

The <contains> tag is optional and identifies content objects that are embedded within the content item currently being described, such as the components of a SMIL (Synchronized Multimedia Integration Language) file. Requests for an item that uses <contains> links are accepted only after the CDN determines that the dependent content objects are present in the Content Engine.

The <contains> tag must be enclosed within the <item> </item> tag.

The <contains> tag is used to include embedded files for some video files like .asf or .rp. The CDN does not serve this item unless every contained item is present.

Attributes

The cdn-url attribute is required and is the relative CDN URL of one of the embedded contents.

Subelements

None

Example

<item src="house/img08.jpg" cdn-url="img08.jpg" />
<item src="house/img09.jpg" cdn-url="img09.jpg" />
<item cdn-url="house.rp" src="house/house.rp">
     <contains cdn-url="img08.jpg"/>
     <contains cdn-url="img09.jpg"/>
</item>
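The serving condition for an item with <contains> subelements can be sketched in Python. The function name can_serve and the dictionary representation are hypothetical; the relative CDN URLs are those of the example above.

```python
def can_serve(item, present_urls):
    """An item is served only if it and every contained object are present."""
    required = [item["cdn-url"], *item.get("contains", [])]
    return all(url in present_urls for url in required)


house = {"cdn-url": "house.rp", "contains": ["img08.jpg", "img09.jpg"]}
print(can_serve(house, {"house.rp", "img08.jpg"}))
```

In this sketch the request for house.rp is refused while img09.jpg is missing and is accepted once all three objects are present.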

wmt-meta-data

The <wmt-meta-data> tag is optional and for use with Windows Media Technologies (.wma, .wmv, and .asf) files only. It specifies one or more file attributes that are displayed in the Windows Media Player when the file is played back.

The element may be enclosed within <item-group> or <item> or <crawler> tags. At most, one such element may be specified for its parent element.

Attributes

The name=value attribute; typical attributes for WMT players are Title, Author, Copyright, and Description. If parented with the <item-group> tag, then the attribute applies to the content contained within the group.

This attribute can also be nested within an <item> or a <crawler> tag. The <item> or <crawler> tag-level settings override the <item-group> tag-level settings.

Subelements

None

http-meta-data

The <http-meta-data> tag is optional and used for HTTP playback of content. If a content object is requested through HTTP, these attributes are sent to the end users as HTTP response headers. This type of response header is useful when you specify content type for FTP acquired content.

The element can also be nested within an <item> or a <crawler> tag. The <item> or <crawler> tag-level settings override the <item-group> tag-level settings.

Attributes

The name=value attribute can be both standard HTTP header metadata and customized application metadata. If parented with the <item-group> tag, then the attribute applies to the content contained within the group.

This attribute can also be nested within an <item> or a <crawler> tag. The <item> or <crawler> tag-level settings override the <item-group> tag-level settings.

Subelements

None

Configuring Freshness of Pre-Positioned Content

Four manifest file configurations are possible for managing the freshness of your pre-positioned content using the serveStopTime and noRedirectToOrigin attributes:

Both the serveStopTime and noRedirectToOrigin attributes are included in the manifest file, making the condition noRedirectToOrigin true. The conditions for this first case are shown in Table 6-7.

Only the serveStopTime attribute is included in the manifest file. The noRedirectToOrigin attribute is not, making the condition noRedirectToOrigin false. The conditions for this second case are shown in Table 6-8.

Neither the serveStopTime nor the noRedirectToOrigin attribute is included in the manifest file, making the condition noRedirectToOrigin false. The conditions for this third case are shown in Table 6-9.

Only the noRedirectToOrigin attribute is included in the manifest file. The serveStopTime attribute is not, making the condition noRedirectToOrigin true. The conditions for this fourth case are shown in Table 6-10.

Depending on whether the serveStopTime and noRedirectToOrigin attributes are included and the timing combinations of the serveStopTime value and the HTTP header expiration, the conditions and corresponding results are listed in Table 6-7 through Table 6-10. In the following tables, now is defined as the time the end user content request arrives. These tables use the end user request arrival time to make content delivery decisions.

Table 6-7 Both serveStopTime and noRedirectToOrigin Attributes Included, noRedirectToOrigin=true

| Condition | Result |
|---|---|
| Now is past the serveStopTime value | Content is not served and an error message appears |
| Now is before the serveStopTime value but is past the HTTP expires header | Content is served from the cdnfs, but the content can be stale |
| Now is before the serveStopTime value and is before the HTTP expires header | Content is served from the cdnfs |
| Now is before the serveStopTime value and no HTTP expires header exists | Content is served from the cdnfs |


Table 6-8 Only serveStopTime Attribute Included, noRedirectToOrigin=false

| Condition | Result |
|---|---|
| Now is past the serveStopTime value | Content is served by proxy from the origin server |
| Now is before the serveStopTime value but is past the HTTP expires header | Content is served from the cdnfs, but the content can be stale |
| Now is before the serveStopTime value and is before the HTTP expires header | Content is served from the cdnfs |
| Now is before the serveStopTime value and no HTTP expires header exists | Content is served from the cdnfs |


Table 6-9 Neither serveStopTime nor noRedirectToOrigin Attributes Included, noRedirectToOrigin=false

| Condition | Result |
|---|---|
| Now is past the HTTP expires header | Content is served by proxy from the origin server |
| Now is before the HTTP expires header | Content is served from the cdnfs |
| No HTTP expires header exists | Content is served from the cdnfs |


Table 6-10 Only the noRedirectToOrigin Attribute Included, noRedirectToOrigin=true

| Condition | Result |
|---|---|
| Now is past the HTTP expires header | Content is served from the cdnfs, but the content can be stale |
| Now is before the HTTP expires header | Content is served from the cdnfs |
| No HTTP expires header exists | Content is served from the cdnfs |
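The decisions listed in Table 6-7 through Table 6-10 can be condensed into one Python sketch. The function name freshness_result and its return strings are hypothetical, and the handling of a request that arrives exactly at the serveStopTime or expiration time is an assumption, because the tables do not cover that case.

```python
from datetime import datetime


def freshness_result(now, serve_stop=None, http_expires=None, no_redirect=False):
    """Return the delivery decision for a request that arrives at `now`.

    serve_stop and http_expires are datetime objects, or None when absent.
    no_redirect is the noRedirectToOrigin setting (False when omitted).
    """
    if serve_stop is not None and now > serve_stop:
        # Past serveStopTime: error if redirection is disallowed, else proxy.
        return "error" if no_redirect else "proxy from origin"
    if http_expires is not None and now > http_expires:
        if serve_stop is None and not no_redirect:
            return "proxy from origin"  # neither attribute included
        return "serve from cdnfs (may be stale)"
    return "serve from cdnfs"


now = datetime(2002, 9, 5, 12, 0, 0)
earlier = datetime(2002, 9, 1, 0, 0, 0)
later = datetime(2002, 9, 10, 0, 0, 0)
print(freshness_result(now, serve_stop=later, http_expires=earlier, no_redirect=True))
```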


XML Schema

In the case of the manifest file, an XML schema defines the custom markup language of the manifest file and the structure of a given set of XML documents. The XML schema specifies which tags or elements you can use in your documents, the attributes those tags can contain, and their arrangement.

Manifest XML Schema

An XSD (XML Schema Definition) file describes the structure of an XML document: the elements and attributes that can appear in it and how they can be arranged. For more information on an XSD, go to http://www.w3schools.com/schema/schema_intro.asp.

The following XML code is the manifest XML schema (CdnManifest.xsd).

<?xml version="1.0"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:include schemaLocation="PlayServerTable.xsd"/>

<xs:element name="CdnManifest">
  <xs:complexType>
    <xs:sequence>
        <xs:element ref="playServerTable" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="options" minOccurs="0" maxOccurs="1"/>
        <xs:element ref="proxyServer" minOccurs="0" maxOccurs="unbounded"/>
        <xs:choice minOccurs="0" maxOccurs="unbounded">
            <xs:element ref="server" maxOccurs="unbounded"/>
            <xs:element ref="item-group" maxOccurs="unbounded"/>
            <xs:element ref="item" maxOccurs="unbounded"/>
            <xs:element ref="crawler" maxOccurs="unbounded"/>
        </xs:choice>
        <xs:choice minOccurs="1" maxOccurs="unbounded">
            <xs:element ref="item-group" maxOccurs="unbounded"/>
            <xs:element ref="item" maxOccurs="unbounded"/>
            <xs:element ref="crawler" maxOccurs="unbounded"/>
        </xs:choice>
   </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="options">
    <xs:complexType>
        <xs:attribute name="timeZone" type="xs:string" use="optional"/>
        <xs:attribute name="notFoundUrl" type="xs:string" use="optional"/>
        <xs:attribute name="alternateUrl" type="xs:string" use="optional"/>
        <xs:attribute name="noRedirectToOrigin" type="xs:boolean" use="optional" default="false" />
        <xs:attribute name="requireAuth" type="xs:boolean" use="optional"/>
        <xs:attribute name="ttl" type="xs:unsignedInt" use="optional"/>
        <xs:attribute name="prefetch" type="xs:string" use="optional"/>
        <xs:attribute name="ttl-for-missing" type="xs:unsignedInt" use="optional"/>
        <xs:attribute name="ttl-for-non-ref" type="xs:unsignedInt" use="optional"/>
        <xs:attribute name="type" use="optional" default="prepos">
            <xs:simpleType>
                <xs:restriction base="xs:string">
                    <xs:enumeration value="prepos"/>
                    <xs:enumeration value="wmt-live"/>
                    <xs:enumeration value="real-live"/>
                 </xs:restriction>
            </xs:simpleType>
        </xs:attribute>
        <xs:attribute name="manifest-id" type="xs:string" use="optional"/>
        <xs:attribute name="clearlog" type="xs:boolean" use="optional" default="false"/>
        <xs:attribute name="rd" type="xs:string" use="optional"/>
        <xs:attribute name="prepos-tag" type="xs:string" use="optional"/>
        <xs:attribute name="live-tag" type="xs:string" use="optional"/>
     </xs:complexType>
 </xs:element>

 <xs:element name="server">
     <xs:complexType>
         <xs:sequence>
             <xs:element ref="host" minOccurs="1" maxOccurs="1"/>
         </xs:sequence>
         <xs:attribute name="name" type="xs:string" use="required"/>
     </xs:complexType>
 </xs:element>

 <xs:element name="host">
     <xs:complexType>
           <xs:attribute name="name" type="xs:string" use="required"/>
           <xs:attribute name="root" type="xs:string" use="optional"/>
           <xs:attribute name="proxyServer" type="xs:string" use="optional"/>
           <xs:attribute name="proto" use="optional">
               <xs:simpleType>
                     <xs:restriction base="xs:string">
                         <xs:enumeration value="http"/>
                         <xs:enumeration value="https"/>
                         <xs:enumeration value="ftp"/>
                         <xs:enumeration value="mms"/>
                         <xs:enumeration value="rtsp"/>
                     </xs:restriction>
               </xs:simpleType>
           </xs:attribute>
           <xs:attribute name="port" type="xs:unsignedShort" use="optional"/>
           <xs:attribute name="user" type="xs:string" use="optional"/>
           <xs:attribute name="password" type="xs:string" use="optional"/>
           <xs:attribute name="uuencoded" type="xs:boolean" use="optional" default="false"/>
           <xs:attribute name="proxyName" type="xs:string" use="optional"/>
           <xs:attribute name="sslAuthType" use="optional">
               <xs:simpleType>
                     <xs:restriction base="xs:string">
                         <xs:enumeration value="weak"/>
                         <xs:enumeration value="strong"/>
                     </xs:restriction>
               </xs:simpleType>
           </xs:attribute>
  </xs:complexType>
 </xs:element>

 <xs:element name="proxyServer">
     <xs:complexType>
         <xs:attribute name="serverName" type="xs:string" use="required"/>
         <xs:attribute name="port" type="xs:unsignedShort" use="optional"/>
         <xs:attribute name="user" type="xs:string" use="optional"/>
         <xs:attribute name="password" type="xs:string" use="optional"/>
         <xs:attribute name="uuencoded" type="xs:string" use="optional" default="false"/>
     </xs:complexType>
 </xs:element>

 <xs:attributeGroup name = "contentAttr">
     <xs:attribute name="server" type="xs:string" use="optional"/>
     <xs:attribute name="proxyServer" type="xs:string" use="optional"/>
     <xs:attribute name="playServer" type="xs:string" use="optional"/>
     <xs:attribute name="type" use="optional">
         <xs:simpleType>
             <xs:restriction base="xs:string">
                 <xs:enumeration value="prepos"/>
                 <xs:enumeration value="wmt-live"/>
                 <xs:enumeration value="real-live"/>
             </xs:restriction>
         </xs:simpleType>
     </xs:attribute>
     <xs:attribute name="noRedirectToOrigin" type="xs:boolean" use="optional"/>
     <xs:attribute name="requireAuth" type="xs:boolean" use="optional"/>
     <xs:attribute name="alternateUrl" type="xs:string" use="optional"/>
     <xs:attribute name="ttl" type="xs:unsignedInt" use="optional"/>
     <xs:attribute name="priority" type="xs:unsignedInt" use="optional"/>
     <xs:attribute name="prefetch" type="xs:string" use="optional"/>
     <xs:attribute name="expires" type="xs:string" use="optional"/>
     <xs:attribute name="serve" type="xs:string" use="optional"/>
     <xs:attribute name="serveStartTime" type="xs:string" use="optional"/>
     <xs:attribute name="serveStopTime" type="xs:string" use="optional"/>
 </xs:attributeGroup>

 <xs:attributeGroup name = "prefixAttr">
      <xs:attribute name="cdnPrefix" type="xs:string" use="optional"/>
      <xs:attribute name="srcPrefix" type="xs:string" use="optional"/>
 </xs:attributeGroup>

 <xs:element name="item-group">
     <xs:complexType>
         <xs:sequence>
             <xs:element ref="matchRule" minOccurs="0" maxOccurs="1"/>
             <xs:element ref="http-meta-data" minOccurs="0" maxOccurs="1"/>
             <xs:element ref="wmt-meta-data" minOccurs="0" maxOccurs="1"/>
             <xs:choice minOccurs="1" maxOccurs="unbounded">
                 <xs:element ref="item-group" maxOccurs="unbounded"/>
                 <xs:element ref="item" maxOccurs="unbounded"/>
                 <xs:element ref="crawler" maxOccurs="unbounded"/>
             </xs:choice>
         </xs:sequence>
         <xs:attributeGroup ref="contentAttr"/>
         <xs:attributeGroup ref="prefixAttr"/>
     </xs:complexType>
 </xs:element>

 <xs:element name="item">
     <xs:complexType>
         <xs:sequence>
             <xs:element ref="contains" minOccurs="0" maxOccurs="unbounded"/>
             <xs:element ref="http-meta-data" minOccurs="0" maxOccurs="1"/>
             <xs:element ref="wmt-meta-data" minOccurs="0" maxOccurs="1"/>
         </xs:sequence>
         <xs:attribute name="src" type="xs:string" use="required"/>
         <xs:attribute name="cdn-url" type="xs:string" use="optional"/>
         <xs:attributeGroup ref="contentAttr"/>
     </xs:complexType>
 </xs:element>

 <xs:element name="crawler">
     <xs:complexType>
         <xs:all>
             <xs:element ref="matchRule" minOccurs="0" maxOccurs="1"/>
             <xs:element ref="http-meta-data" minOccurs="0" maxOccurs="1"/>
             <xs:element ref="wmt-meta-data" minOccurs="0" maxOccurs="1"/>
         </xs:all>
         <xs:attribute name="start-url" type="xs:string" use="required"/>
         <xs:attribute name="depth" type="xs:short" use="optional"/>
         <xs:attribute name="prefix" type="xs:string" use="optional"/>
         <xs:attribute name="accept" type="xs:string" use="optional"/>
         <xs:attribute name="reject" type="xs:string" use="optional"/>
         <xs:attribute name="max-number" type="xs:unsignedInt" use="optional"/>

         <xs:attribute name="max-size-in-B" type="xs:unsignedInt" use="optional"/>
         <xs:attribute name="max-size-in-KB" type="xs:unsignedInt" use="optional"/>
         <xs:attribute name="max-size-in-MB" type="xs:unsignedInt" use="optional"/>
         <xs:attributeGroup ref="contentAttr"/>
         <xs:attributeGroup ref="prefixAttr"/>
     </xs:complexType>
 </xs:element>

 <xs:element name="contains">
     <xs:complexType>
         <xs:attribute name="cdn-url" type="xs:string" use="required"/>
     </xs:complexType>
 </xs:element>

 <xs:element name="matchRule">
     <xs:complexType>
         <xs:sequence>
             <xs:element ref="match" minOccurs="1" maxOccurs="unbounded"/>
         </xs:sequence>
     </xs:complexType>
 </xs:element>

 <xs:element name="match">
     <xs:complexType>
         <xs:attribute name="mime-type" type="xs:string" use="optional"/>
         <xs:attribute name="time-before" type="xs:string" use="optional"/>
         <xs:attribute name="time-after" type="xs:string" use="optional"/>
         <xs:attribute name="size-min-in-B" type="xs:int" use="optional"/>
         <xs:attribute name="size-max-in-B" type="xs:int" use="optional"/>
         <xs:attribute name="size-min-in-KB" type="xs:int" use="optional"/>
         <xs:attribute name="size-max-in-KB" type="xs:int" use="optional"/>
         <xs:attribute name="size-min-in-MB" type="xs:int" use="optional"/>
         <xs:attribute name="size-max-in-MB" type="xs:int" use="optional"/>
         <xs:attribute name="extension" type="xs:string" use="optional"/>
     </xs:complexType>
 </xs:element>

 <xs:element name="http-meta-data">
     <xs:complexType>
         <xs:anyAttribute processContents="skip" />
     </xs:complexType>
 </xs:element>

 <xs:element name="wmt-meta-data">
     <xs:complexType>
         <xs:anyAttribute processContents="skip" />
     </xs:complexType>
 </xs:element>

</xs:schema>
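
The attribute constraints in the schema can be spot-checked without a full XSD validator. The following Python sketch checks a single server element against the rules that the schema listing states for server and host: the required name attributes, exactly one host, the proto and sslAuthType enumerations, and the xs:unsignedShort range for port. The function name check_server and the sample host names are hypothetical, and the port check is a simplification of the xs:unsignedShort lexical rules. The sketch does not replace the Manifest Validator Utility.

```python
import re
import xml.etree.ElementTree as ET

# Enumerations copied from the <host> element in the schema listing.
HOST_PROTOCOLS = {"http", "https", "ftp", "mms", "rtsp"}
SSL_AUTH_TYPES = {"weak", "strong"}


def check_server(xml_text):
    """Return a list of problems found in one <server> element (empty if none)."""
    server = ET.fromstring(xml_text)
    if server.tag != "server":
        return ["root element is <%s>, expected <server>" % server.tag]

    problems = []
    if "name" not in server.attrib:
        problems.append("<server> is missing the required name attribute")

    hosts = server.findall("host")
    if len(hosts) != 1:
        problems.append("<server> must contain exactly one <host>")

    for host in hosts:
        if "name" not in host.attrib:
            problems.append("<host> is missing the required name attribute")
        proto = host.get("proto")
        if proto is not None and proto not in HOST_PROTOCOLS:
            problems.append("proto=%r is not an allowed protocol" % proto)
        auth = host.get("sslAuthType")
        if auth is not None and auth not in SSL_AUTH_TYPES:
            problems.append("sslAuthType=%r is not weak or strong" % auth)
        port = host.get("port")
        # Simplified xs:unsignedShort check: plain digits, 0 through 65535.
        if port is not None:
            if not re.fullmatch(r"[0-9]+", port) or int(port) > 65535:
                problems.append("port=%r is not an unsigned short" % port)
    return problems


# Hypothetical server entries; the names and addresses are made up.
GOOD = ('<server name="origin"><host name="www.example.com" '
        'proto="https" port="443" sslAuthType="strong"/></server>')
BAD = ('<server name="origin"><host name="www.example.com" '
       'proto="gopher" port="70000"/></server>')

print(check_server(GOOD))
print(check_server(BAD))
```

The first call reports no problems. The second reports the unsupported protocol and the out-of-range port.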

PlayServerTable XML Schema

The following XML code defines the PlayServerTable schema (playServerTable.xsd) for CdnManifest.xsd.

<?xml version="1.0"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="playServerTable">
     <xs:complexType>
         <xs:sequence>
             <xs:element ref="playServer" minOccurs="1" maxOccurs="unbounded"/>
         </xs:sequence>
     </xs:complexType>
 </xs:element>

 <xs:element name="playServer">
     <xs:complexType>
         <xs:choice minOccurs="1" maxOccurs="unbounded">
             <xs:element ref="contentType"/>
             <xs:element ref="extension"/>
         </xs:choice>
         <xs:attribute name="name" use="required">
             <xs:simpleType>
                 <xs:restriction base="xs:string">
                      <xs:enumeration value="real"/>
                      <xs:enumeration value="wmt"/>
                      <xs:enumeration value="http"/>
                      <xs:enumeration value="qtss"/>
                 </xs:restriction>
             </xs:simpleType>
         </xs:attribute>
     </xs:complexType>
 </xs:element>

 <xs:element name="contentType">
     <xs:complexType>
         <xs:attribute name="name" type="xs:string" use="required"/>
     </xs:complexType>
 </xs:element>

 <xs:element name="extension">
     <xs:complexType>
         <xs:attribute name="name" type="xs:string" use="required"/>
     </xs:complexType>
 </xs:element>
</xs:schema>

Default PlayServerTable Schema

The following XML code is the default PlayServerTable, which is validated against the PlayServerTable schema (PlayServerTable.xsd).

<?xml version="1.0"?>

<playServerTable xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
     xsi:noNamespaceSchemaLocation = "PlayServerTable.xsd">

  <playServer name="real">
    <!-- MIME type is taken from
     http://service.real.com/help/library/guides/server8/htmfiles/custmizg.htm
     -->
    <contentType name="audio/x-pn-realaudio" />
    <contentType name="audio/x-pn-realaudio-plugin" />
    <contentType name="application/x-pn-realmedia" />
    <contentType name="application/smil" />
    <contentType name="application/vnd.rn-rmadriver" />
    <extension name="rm" />
    <extension name="rms" />
    <extension name="ra" />
    <extension name="rp" />
    <extension name="rt" />
    <extension name="smi" />
  </playServer>

  <playServer name="qtss">
    <contentType name="video/quicktime" />
    <extension name="mov" />
    <extension name="qt" />
    <extension name="mp4" />

    <!-- extension avi could go here, but is also supported by wmt -->
  </playServer>

  <playServer name="wmt">
    <!-- MIME types taken from
    http://msdn.microsoft.com/workshop/imedia/windowsmedia/server/mime.asp
    -->
    <contentType name="video/x-ms-asf" />
    <contentType name="audio/x-ms-wma" />
    <contentType name="video/x-ms-wmv" />
    <contentType name="video/x-ms-wm"  />
    <contentType name="application/x-ms-wmz" />
    <contentType name="application/x-ms-wmd" />

    <!-- comments courtesy of Laura Gaughan, 11jan2001 -->
    <extension name="wma" /> <!-- audio content -->
    <extension name="wmv" /> <!-- audio/video content -->
    <extension name="asf" /> <!-- audio/video content (legacy) -->
    <extension name="wm"  /> <!-- reserved for future use -->

    <!-- extension avi could go here, but is also supported by qtss -->
  </playServer>

  <playServer name="http">
    <contentType name="application/pdf" />
    <contentType name="application/postscript" />
    <extension name="pdf" />
    <extension name="ps" />

    <!-- this must be http; wmt doesn't do asx over mms -->
    <contentType name="audio/x-ms-wax" />
    <contentType name="video/x-ms-wvx" />
    <contentType name="video/x-ms-wmx" />
    <extension name="asx" /> <!-- as for wvx + .asf .asx (legacy) -->
    <extension name="wax" /> <!-- metadata for .wma .wax -->
    <extension name="wvx" /> <!-- metadata for .wma .wmv .wvx .wax -->
    <extension name="wmx" /> <!-- reserved for future use -->

    <!--
      add all types from wmt tables to here, since they can be played
      by http too.

    MIME types taken from
    http://msdn.microsoft.com/workshop/imedia/windowsmedia/server/mime.asp
    -->
    <contentType name="video/x-ms-asf" />
    <contentType name="audio/x-ms-wma" />
    <contentType name="video/x-ms-wmv" />
    <contentType name="video/x-ms-wm"  />
    <contentType name="application/x-ms-wmz" />
    <contentType name="application/x-ms-wmd" />

    <!-- comments courtesy of Laura Gaughan, 11jan2001 -->
    <extension name="wma" /> <!-- audio content -->
    <extension name="wmv" /> <!-- audio/video content -->
    <extension name="asf" /> <!-- audio/video content (legacy) -->
    <extension name="wm"  /> <!-- reserved for future use -->

    <!-- extension avi could go here, but is also supported by qtss -->
  </playServer>

</playServerTable>
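
Because an extension can appear under more than one playServer element (for example, wmv is listed under both wmt and http), it can help to index the table by extension. The following Python sketch does this for an abbreviated copy of the default table. The function name servers_by_extension is hypothetical. How ACNS itself chooses among several matching play servers is not shown here; the sketch only lists the candidates in document order.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the default PlayServerTable (a few entries per server).
TABLE = """
<playServerTable>
  <playServer name="real">
    <contentType name="audio/x-pn-realaudio" />
    <extension name="rm" />
    <extension name="ra" />
  </playServer>
  <playServer name="wmt">
    <contentType name="video/x-ms-asf" />
    <extension name="wmv" />
    <extension name="asf" />
  </playServer>
  <playServer name="http">
    <contentType name="application/pdf" />
    <extension name="pdf" />
    <extension name="wmv" />
    <extension name="asf" />
  </playServer>
</playServerTable>
"""


def servers_by_extension(table_xml):
    """Map each file extension to the play servers that list it, in document order."""
    mapping = {}
    root = ET.fromstring(table_xml)
    for play_server in root.findall("playServer"):
        for ext in play_server.findall("extension"):
            mapping.setdefault(ext.get("name"), []).append(play_server.get("name"))
    return mapping


lookup = servers_by_extension(TABLE)
print(lookup["wmv"])  # ['wmt', 'http']
print(lookup["pdf"])  # ['http']
```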

Manifest File Time Zone Tables

To convert to local time, you must know the time difference between Greenwich Mean Time (GMT) and local time for both standard time and summer time (daylight saving time). Table 6-11 through Table 6-26 list the time zones supported by the manifest file. Each time zone is written on its own line in the following format:

<zonename>:[+|-:]hh:mm

where <zonename> is the name of the time zone or the standard time zone abbreviation (see Table 6-11), with no spaces before or after the colon (":"), and "[+|-:]hh:mm" is the GMT offset in hours and minutes. If the sign is omitted, the GMT offset defaults to "+".
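
As an illustration of this format, the following Python sketch parses one entry into a zone name and a signed offset in minutes. The function name parse_zone_entry is hypothetical; the sample entries are taken from the tables in this section. The sketch assumes that zone names contain no colons or spaces, which holds for every entry listed here.

```python
import re

# <zonename>:[+|-:]hh:mm  (the sign and its colon are optional; "+" is the default)
ENTRY = re.compile(
    r"^(?P<zone>[^:\s]+):(?:(?P<sign>[+-]):)?(?P<hh>[0-9]{2}):(?P<mm>[0-9]{2})$"
)


def parse_zone_entry(line):
    """Return (zone name, GMT offset in minutes) for one time zone table entry."""
    match = ENTRY.match(line.strip())
    if match is None:
        raise ValueError("not a valid time zone entry: %r" % line)
    minutes = int(match.group("hh")) * 60 + int(match.group("mm"))
    if match.group("sign") == "-":
        minutes = -minutes
    return match.group("zone"), minutes


print(parse_zone_entry("ACT:+:09:30"))        # ('ACT', 570)
print(parse_zone_entry("Etc/GMT+7:-:07:00"))  # ('Etc/GMT+7', -420)
print(parse_zone_entry("GMT:00:00"))          # ('GMT', 0)
```

Note that the sign inside a name such as Etc/GMT+7 belongs to the zone name, not to the offset; the offset sign is the separate field that follows the first colon.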

Table 6-11 Standard Time Zones and GMT Offsets 

Time Zone: GMT Offset
Time Zone: GMT Offset
Time Zone: GMT Offset

ACT:+:09:30

Etc/GMT+7:-:07:00

HST:-:10:00

ADT:-:03:00

Etc/GMT+8:-:08:00

IET:-:05:00

AET:+:10:00

Etc/GMT+9:-:09:00

IST:+:05:30

AGT:-:03:00

Etc/GMT-0:00:00

JST:+:09:00

ART:+:02:00

Etc/GMT-10:+:10:00

MDT:-:06:00

AST:-:09:00

Etc/GMT-11:+:11:00

MET:+:01:00

BET:-:03:00

Etc/GMT-12:+:12:00

MIT:-:11:00

BST:+:06:00

Etc/GMT-13:+:13:00

MST7MDT:-:07:00

CAT:+:02:00

Etc/GMT-14:+:14:00

MST:-:07:00

CDT:-:05:00

Etc/GMT-1:+:01:00

NET:+:04:00

CET:+:01:00

Etc/GMT-2:+:02:00

NST:+:12:00

CNT:-:03:30

Etc/GMT-3:+:03:00

NZ-CHAT:+:12:45

CST6CDT:-:06:00

Etc/GMT-4:+:04:00

NZ:+:12:00

CST:-:06:00

Etc/GMT-5:+:05:00

Navajo:-:07:00

CTT:+:08:00

Etc/GMT-6:+:06:00

PDT:-:07:00

EAT:+:03:00

Etc/GMT-7:+:07:00

PLT:+:05:00

ECT:+:01:00

Etc/GMT-8:+:08:00

PNT:-:07:00

EDT:-:04:00

Etc/GMT-9:+:09:00

PRC:+:08:00

EET:+:02:00

Etc/GMT0:00:00

PRT:-:04:00

EST5EDT:-:05:00

Etc/GMT:00:00

PST8PDT:-:08:00

EST:-:05:00

Etc/Greenwich:00:00

PST:-:08:00

Etc/GMT+0:00:00

Etc/UCT:00:00

ROK:+:09:00

Etc/GMT+10:-:10:00

Etc/UTC:00:00

SST:+:11:00

Etc/GMT+11:-:11:00

Etc/Universal:00:00

UCT:00:00

Etc/GMT+12:-:12:00

Etc/Zulu:00:00

UTC:00:00

Etc/GMT+1:-:01:00

GB-Eire:00:00

Universal:00:00

Etc/GMT+2:-:02:00

GB:00:00

VST:+:07:00

Etc/GMT+3:-:03:00

GMT0:00:00

W-SU:+:03:00

Etc/GMT+4:-:04:00

GMT:00:00

WET:00:00

Etc/GMT+5:-:05:00

Greenwich:00:00

Zulu:00:00

Etc/GMT+6:-:06:00

HDT:-:09:00

 

Table 6-12 Africa GMT Offsets 

Time Zone: GMT Offset
Time Zone: GMT Offset
Time Zone: GMT Offset

Africa/Abidjan:00:00

Africa/Djibouti:+:03:00

Africa/Maputo:+:02:00

Africa/Accra:00:00

Africa/Douala:+:01:00

Africa/Maseru:+:02:00

Africa/Addis_Ababa:+:03:00

Africa/El_Aaiun:00:00

Africa/Mbabane:+:02:00

Africa/Algiers:+:01:00

Africa/Freetown:00:00

Africa/Mogadishu:+:03:00

Africa/Asmera:+:03:00

Africa/Gaborone:+:02:00

Africa/Monrovia:00:00

Africa/Bamako:00:00

Africa/Harare:+:02:00

Africa/Nairobi:+:03:00

Africa/Bangui:+:01:00

Africa/Johannesburg:+:02:00

Africa/Ndjamena:+:01:00

Africa/Banjul:00:00

Africa/Kampala:+:03:00

Africa/Niamey:+:01:00

Africa/Bissau:00:00

Africa/Khartoum:+:03:00

Africa/Nouakchott:00:00

Africa/Blantyre:+:02:00

Africa/Kigali:+:02:00

Africa/Ouagadougou:00:00

Africa/Brazzaville:+:01:00

Africa/Kinshasa:+:01:00

Africa/Porto-Novo:+:01:00

Africa/Bujumbura:+:02:00

Africa/Lagos:+:01:00

Africa/Sao_Tome:00:00

Africa/Cairo:+:02:00

Africa/Libreville:+:01:00

Africa/Timbuktu:00:00

Africa/Casablanca:00:00

Africa/Lome:00:00

Africa/Tripoli:+:02:00

Africa/Ceuta:+:01:00

Africa/Luanda:+:01:00

Africa/Tunis:+:01:00

Africa/Conakry:00:00

Africa/Lubumbashi:+:02:00

Africa/Windhoek:+:01:00

Africa/Dakar:00:00

Africa/Lusaka:+:02:00

 

Africa/Dar_es_Salaam:+:03:00

Africa/Malabo:+:01:00

 

Table 6-13 America GMT Offsets 

Time Zone: GMT Offset
Time Zone: GMT Offset
Time Zone: GMT Offset

America/Adak:-:10:00

America/Grenada:-:04:00

America/Noronha:-:02:00

America/Anchorage:-:09:00

America/Guadeloupe:-:04:00

America/North_Dak/Ctr:-:06:00

America/Anguilla:-:04:00

America/Guatemala:-:06:00

America/Panama:-:05:00

America/Antigua:-:04:00

America/Guayaquil:-:05:00

America/Pangnirtung:-:05:00

America/Araguaina:-:03:00

America/Guyana:-:04:00

America/Paramaribo:-:03:00

America/Aruba:-:04:00

America/Halifax:-:04:00

America/Phoenix:-:07:00

America/Asuncion:-:04:00

America/Havana:-:05:00

America/Port-au-Prince:-:05:00

America/Atka:-:10:00

America/Hermosillo:-:07:00

America/Port_of_Spain:-:04:00

America/Barbados:-:04:00

America/Ind/Indian:-:05:00

America/Porto_Acre:-:05:00

America/Belem:-:03:00

America/Ind/Knox:-:05:00

America/Porto_Velho:-:04:00

America/Belize:-:06:00

America/Ind/Marengo:-:05:00

America/Puerto_Rico:-:04:00

America/Boa_Vista:-:04:00

America/Ind/Vevay:-:05:00

America/Rainy_River:-:06:00

America/Bogota:-:05:00

America/Indianapolis:-:05:00

America/Rankin_Inlet:-:06:00

America/Bogota:-:05:00

America/Inuvik:-:07:00

America/Recife:-:03:00

America/Buenos_Aires:-:03:00

America/Iqaluit:-:05:00

America/Regina:-:06:00

America/Cambridge_Bay:-:07:00

America/Jamaica:-:05:00

America/Rio_Branco:-:05:00

America/Cancun:-:06:00

America/Jujuy:-:03:00

America/Rosario:-:03:00

America/Caracas:-:04:00

America/Juneau:-:09:00

America/Santiago:-:04:00

America/Catamarca:-:03:00

America/Ken/Louisville:-:05:00

America/Santo_Domingo:-:04:00

America/Cayenne:-:03:00

America/Ken/Monticello:-:05:00

America/Sao_Paulo:-:03:00

America/Cayman:-:05:00

America/Knox_IN:-:05:00

America/Scoresbysund:-:01:00

America/Chicago:-:06:00

America/La_Paz:-:04:00

America/Shiprock:-:07:00

America/Chihuahua:-:07:00

America/Lima:-:05:00

America/St_Johns:-:03:30

America/Cordoba:-:03:00

America/Los_Angeles:-:08:00

America/St_Lucia:-:04:00

America/Costa_Rica:-:06:00

America/Louisville:-:05:00

America/St_Thomas:-:04:00

America/Cuiaba:-:04:00

America/Maceio:-:03:00

America/St_Vincent:-:04:00

America/Curacao:-:04:00

America/Managua:-:06:00

America/Swift_Current:-:06:00

America/Danmarkshavn:00:00

America/Manaus:-:04:00

America/Tegucigalpa:-:06:00

America/Dawson:-:08:00

America/Martinique:-:04:00

America/Thule:-:04:00

America/Dawson_Creek:-:07:00

America/Mazatlan:-:07:00

America/Thunder_Bay:-:05:00

America/Denver:-:07:00

America/Mendoza:-:03:00

America/Tijuana:-:08:00

America/Detroit:-:05:00

America/Menominee:-:06:00

America/Tortola:-:04:00

America/Dominica:-:04:00

America/Merida:-:06:00

America/Vancouver:-:08:00

America/Edmonton:-:07:00

America/Mexico_City:-:06:00

America/St_Lucia:-:04:00

America/Eirunepe:-:05:00

America/Miquelon:-:03:00

America/Virgin:-:04:00

America/El_Salvador:-:06:00

America/Monterrey:-:06:00

America/Whitehorse:-:08:00

America/Ensenada:-:08:00

America/Montevideo:-:03:00

America/Winnipeg:-:06:00

America/Fort_Wayne:-:05:00

America/Montreal:-:05:00

America/Yakutat:-:09:00

America/Fortaleza:-:03:00

America/Montserrat:-:04:00

America/Yellowknife:-:07:00

America/Glace_Bay:-:04:00

America/Nassau:-:05:00

America/Virgin:-:04:00

America/Godthab:-:03:00

America/New_York:-:05:00

America/Whitehorse:-:08:00

America/Goose_Bay:-:04:00

America/Nipigon:-:05:00

America/Winnipeg:-:06:00

America/Grand_Turk:-:05:00

America/Nome:-:09:00

America/Tortola:-:04:00


Table 6-14 Antarctica/Arctic GMT Offsets 

Time Zone: GMT Offset
Time Zone: GMT Offset
Time Zone: GMT Offset

Antarctica/Casey:+:08:00

Antarctica/McMurdo:+:12:00

Antarctica/Vostok:+:06:00

Antarctica/Davis:+:07:00

Antarctica/Palmer:-:04:00

Arctic/Longyearbyen:+:01:00

Antarctica/DtDUrville:+:10:00

Antarctica/South_Pole:+:12:00

 

Antarctica/Mawson:+:06:00

Antarctica/Syowa:+:03:00

 

Table 6-15 Asia GMT Offsets 

Time Zone: GMT Offset
Time Zone: GMT Offset
Time Zone: GMT Offset

Asia/Aden:+:03:00

Asia/Hong_Kong:+:08:00

Asia/Riyadh87:+:03:07

Asia/Almaty:+:06:00

Asia/Hovd:+:07:00

Asia/Riyadh88:+:03:07

Asia/Amman:+:02:00

Asia/Irkutsk:+:08:00

Asia/Riyadh89:+:03:07

Asia/Anadyr:+:12:00

Asia/Istanbul:+:02:00

Asia/Riyadh:+:03:00

Asia/Aqtau:+:04:00

Asia/Jakarta:+:07:00

Asia/Saigon:+:07:00

Asia/Aqtobe:+:05:00

Asia/Jayapura:+:09:00

Asia/Sakhalin:+:10:00

Asia/Ashgabat:+:05:00

Asia/Jerusalem:+:02:00

Asia/Samarkand:+:05:00

Asia/Ashkhabad:+:05:00

Asia/Kabul:+:04:30

Asia/Seoul:+:09:00

Asia/Baghdad:+:03:00

Asia/Kamchatka:+:12:00

Asia/Shanghai:+:08:00

Asia/Bahrain:+:03:00

Asia/Karachi:+:05:00

Asia/Singapore:+:08:00

Asia/Baku:+:04:00

Asia/Kashgar:+:08:00

Asia/Taipei:+:08:00

Asia/Bangkok:+:07:00

Asia/Katmandu:+:05:45

Asia/Tashkent:+:05:00

Asia/Beirut:+:02:00

Asia/Krasnoyarsk:+:07:00

Asia/Tbilisi:+:04:00

Asia/Bishkek:+:05:00

Asia/Kuala_Lumpur:+:08:00

Asia/Tehran:+:03:30

Asia/Brunei:+:08:00

Asia/Kuching:+:08:00

Asia/Tel_Aviv:+:02:00

Asia/Calcutta:+:05:30

Asia/Kuwait:+:03:00

Asia/Thimbu:+:06:00

Asia/Choibalsan:+:09:00

Asia/Macao:+:08:00

Asia/Thimphu:+:06:00

Asia/Chongqing:+:08:00

Asia/Magadan:+:11:00

Asia/Tokyo:+:09:00

Asia/Chungking:+:08:00

Asia/Manila:+:08:00

Asia/Ujung_Pandang:+:08:00

Asia/Colombo:+:06:00

Asia/Muscat:+:04:00

Asia/Ulaanbaatar:+:08:00

Asia/Dacca:+:06:00

Asia/Nicosia:+:02:00

Asia/Ulan_Bator:+:08:00

Asia/Damascus:+:02:00

Asia/Novosibirsk:+:06:00

Asia/Urumqi:+:08:00

Asia/Dhaka:+:06:00

Asia/Omsk:+:06:00

Asia/Vientiane:+:07:00

Asia/Dili:+:09:00

Asia/Phnom_Penh:+:07:00

Asia/Vladivostok:+:10:00

Asia/Dubai:+:04:00

Asia/Pontianak:+:07:00

Asia/Yakutsk:+:09:00

Asia/Dushanbe:+:05:00

Asia/Pyongyang:+:09:00

Asia/Yekaterinburg:+:05:00

Asia/Gaza:+:02:00

Asia/Qatar:+:03:00

Asia/Yerevan:+:04:00

Asia/Harbin:+:08:00

Asia/Rangoon:+:06:30

 

Table 6-16 Atlantic GMT Offsets 

Time Zone: GMT Offset
Time Zone: GMT Offset
Time Zone: GMT Offset

Atlantic/Azores:-:01:00

Atlantic/Faeroe:00:00

Atlantic/South_Georgia:-:02:00

Atlantic/Bermuda:-:04:00

Atlantic/Jan_Mayen:+:01:00

Atlantic/St_Helena:00:00

Atlantic/Canary:00:00

Atlantic/Madeira:00:00

Atlantic/Stanley:-:04:00

Atlantic/Cape_Verde:-:01:00

Atlantic/Reykjavik:00:00

 

Table 6-17 Australia GMT Offsets 

Time Zone: GMT Offset
Time Zone: GMT Offset
Time Zone: GMT Offset

Australia/ACT:+:10:00

Australia/LHI:+:10:30

Australia/Queensland:+:10:00

Australia/Adelaide:+:09:30

Australia/Lindeman:+:10:00

Australia/South:+:09:30

Australia/Brisbane:+:10:00

Australia/Lord_Howe:+:10:30

Australia/Sydney:+:10:00

Australia/Broken_Hill:+:09:30

Australia/Melbourne:+:10:00

Australia/Tasmania:+:10:00

Australia/Canberra:+:10:00

Australia/NSW:+:10:00

Australia/Victoria:+:10:00

Australia/Darwin:+:09:30

Australia/North:+:09:30

Australia/West:+:08:00

Australia/Hobart:+:10:00

Australia/Perth:+:08:00

Australia/Yancowinna:+:09:30


Table 6-18 Brazil GMT Offsets 

Time Zone: GMT Offset
Time Zone: GMT Offset
Time Zone: GMT Offset

Brazil/Acre:-:05:00

Brazil/East:-:03:00

Brazil/West:-:04:00

Brazil/DeNoronha:-:02:00

   

Table 6-19 Canada/Chile/Cuba GMT Offsets 

Time Zone: GMT Offset
Time Zone: GMT Offset
Time Zone: GMT Offset

Canada/Atlantic:-:04:00

Canada/Mountain:-:07:00

Canada/Yukon:-:08:00

Canada/Central:-:06:00

Canada/Newfoundland:-:03:30

Chile/Continental:-:04:00

Canada/East-Ssktchwan:-:06:00

Canada/Pacific:-:08:00

Chile/EasterIsland:-:06:00

Canada/Eastern:-:05:00

Canada/Saskatchewan:-:06:00

Cuba:-:05:00


Table 6-20 Egypt/Eire/Europe GMT Offsets 

Time Zone: GMT Offset
Time Zone: GMT Offset
Time Zone: GMT Offset

Egypt:+:02:00

Europe/Kiev:+:02:00

Europe/Simferopol:+:02:00

Eire:00:00

Europe/Lisbon:00:00

Europe/Skopje:+:01:00

Europe/Amsterdam:+:01:00

Europe/Ljubljana:+:01:00

Europe/Sofia:+:02:00

Europe/Andorra:+:01:00

Europe/London:00:00

Europe/Stockholm:+:01:00

Europe/Athens:+:02:00

Europe/Luxembourg:+:01:00

Europe/Tallinn:+:02:00

Europe/Belfast:00:00

Europe/Madrid:+:01:00

Europe/Tirane:+:01:00

Europe/Belgrade:+:01:00

Europe/Malta:+:01:00

Europe/Tiraspol:+:02:00

Europe/Berlin:+:01:00

Europe/Minsk:+:02:00

Europe/Uzhgorod:+:02:00

Europe/Bratislava:+:01:00

Europe/Monaco:+:01:00

Europe/Vaduz:+:01:00

Europe/Brussels:+:01:00

Europe/Moscow:+:03:00

Europe/Vatican:+:01:00

Europe/Bucharest:+:02:00

Europe/Nicosia:+:02:00

Europe/Vienna:+:01:00

Europe/Budapest:+:01:00

Europe/Oslo:+:01:00

Europe/Vilnius:+:02:00

Europe/Chisinau:+:02:00

Europe/Paris:+:01:00

Europe/Warsaw:+:01:00

Europe/Copenhagen:+:01:00

Europe/Prague:+:01:00

Europe/Zagreb:+:01:00

Europe/Dublin:00:00

Europe/Riga:+:02:00

Europe/Zaporozhye:+:02:00

Europe/Gibraltar:+:01:00

Europe/Rome:+:01:00

Europe/Zurich:+:01:00

Europe/Helsinki:+:02:00

Europe/Samara:+:04:00

Europe/Simferopol:+:02:00

Europe/Istanbul:+:02:00

Europe/San_Marino:+:01:00

Europe/Skopje:+:01:00

Europe/Kaliningrad:+:02:00

Europe/Sarajevo:+:01:00

Europe/Sofia:+:02:00


Table 6-21 Hong Kong/Iceland/India/Iran/Israel GMT Offsets 

Time Zone: GMT Offset
Time Zone: GMT Offset
Time Zone: GMT Offset

Hongkong:+:08:00

Indian/Cocos:+:06:30

Indian/Mauritius:+:04:00

Iceland:00:00

Indian/Comoro:+:03:00

Indian/Mayotte:+:03:00

Indian/Antananarivo:+:03:00

Indian/Kerguelen:+:05:00

Indian/Reunion:+:04:00

Indian/Chagos:+:06:00

Indian/Mahe:+:04:00

Iran:+:03:30

Indian/Christmas:+:07:00

Indian/Maldives:+:05:00

Israel:+:02:00


Table 6-22 Jamaica/Japan/Kwajalein/Libya GMT Offsets 

Time Zone: GMT Offset
Time Zone: GMT Offset
Time Zone: GMT Offset

Jamaica:-:05:00

Kwajalein:+:12:00

Libya:+:02:00

Japan:+:09:00

   

Table 6-23 Mexico/Mideast GMT Offsets 

Time Zone: GMT Offset
Time Zone: GMT Offset
Time Zone: GMT Offset

Mexico/BajaNorte:-:08:00

Mexico/General:-:06:00

Mideast/Riyadh88:+:03:07

Mexico/BajaSur:-:07:00

Mideast/Riyadh87:+:03:07

Mideast/Riyadh89:+:03:07


Table 6-24 Pacific/Poland/Portugal GMT Offsets 

Time Zone: GMT Offset
Time Zone: GMT Offset
Time Zone: GMT Offset

Pacific/Apia:-:11:00

Pacific/Johnston:-:10:00

Pacific/Ponape:+:11:00

Pacific/Auckland:+:12:00

Pacific/Kiritimati:+:14:00

Pacific/Port_Moresby:+:10:00

Pacific/Chatham:+:12:45

Pacific/Kosrae:+:11:00

Pacific/Rarotonga:-:10:00

Pacific/Easter:-:06:00

Pacific/Kwajalein:+:12:00

Pacific/Saipan:+:10:00

Pacific/Efate:+:11:00

Pacific/Majuro:+:12:00

Pacific/Samoa:-:11:00

Pacific/Enderbury:+:13:00

Pacific/Marquesas:-:09:30

Pacific/Tahiti:-:10:00

Pacific/Fakaofo:-:10:00

Pacific/Midway:-:11:00

Pacific/Tarawa:+:12:00

Pacific/Fiji:+:12:00

Pacific/Nauru:+:12:00

Pacific/Tongatapu:+:13:00

Pacific/Funafuti:+:12:00

Pacific/Niue:-:11:00

Pacific/Truk:+:10:00

Pacific/Galapagos:-:06:00

Pacific/Norfolk:+:11:30

Pacific/Wake:+:12:00

Pacific/Gambier:-:09:00

Pacific/Noumea:+:11:00

Pacific/Wallis:+:12:00

Pacific/Guadalcanal:+:11:00

Pacific/Pago_Pago:-:11:00

Pacific/Yap:+:10:00

Pacific/Guam:+:10:00

Pacific/Palau:+:09:00

Poland:+:01:00

Pacific/Honolulu:-:10:00

Pacific/Pitcairn:-:08:00

Portugal:00:00


Table 6-25 Singapore/System V/Turkey GMT Offsets 

Time Zone: GMT Offset
Time Zone: GMT Offset
Time Zone: GMT Offset

Singapore:+:08:00

SystemV/EST5:-:05:00

SystemV/PST8PDT:-:08:00

SystemV/AST4:-:04:00

SystemV/EST5EDT:-:05:00

SystemV/YST9:-:09:00

SystemV/AST4ADT:-:04:00

SystemV/MST7:-:07:00

SystemV/YST9YDT:-:09:00

SystemV/CST6:-:06:00

SystemV/MST7MDT:-:07:00

Turkey:+:02:00

SystemV/CST6CDT:-:06:00

SystemV/PST8:-:08:00

 

Table 6-26 U.S. GMT Offsets 

Time Zone: GMT Offset
Time Zone: GMT Offset
Time Zone: GMT Offset

US/Alaska:-:09:00

US/Eastern:-:05:00

US/Pacific-New:-:08:00

US/Aleutian:-:10:00

US/Hawaii:-:10:00

US/Pacific:-:08:00

US/Arizona:-:07:00

US/Indiana-Starke:-:05:00

US/Samoa:-:11:00

US/Central:-:06:00

US/Michigan:-:05:00

 

US/East-Indiana:-:05:00

US/Mountain:-:07:00

 

Manifest File Automated Scripts

This section describes Perl scripts that you can use to automate the creation of manifest files for your CDN. The most efficient method of creating a manifest file is to customize the Spider Perl script and use it in combination with the Manifest Perl script, both of which are available on Cisco.com. You can also use these two Perl scripts as the basis for your own automation scripts, modified to suit your needs.

We provide two automated Perl scripts called spider.pl and manifest.pl. These scripts can be used as is. If you are proficient in Perl, you can modify the spider.pl and manifest.pl scripts; however, we do not support modified scripts. Both scripts accept a "--file" argument that names a user-created rules file (for example, a .cfg file). So that your settings can be reused, we recommend that you place the arguments that you require in a rules file rather than entering them on the command line.

First, run the Spider script, and then use the output of the Spider script as input to the Manifest script. The Spider script searches the content of the selected origin servers and outputs a database file containing a list of URLs of all content. The Manifest script uses this database file to build a manifest file with the correct syntax, based on rules that you specify on the command line or in a rules file. The result is an XML-based manifest file containing the URLs of only those content objects that you want made available to your users.

The following two sample automated Perl scripts are available on Cisco.com:

Spider Perl script

The Spider script crawls over the content of a selected origin server and outputs a database file containing a list of URLs.

Manifest Perl script

The Manifest script reads the database file output by the Spider script and uses rules that you establish to produce an XML-formatted manifest file containing the URLs of only those filtered content objects that you want to make available to users.

Installing Perl on Your Workstation

You must have Perl installed on your workstation before working with or running the Spider or Manifest scripts. It is useful to also have a Perl interpreter available. Perl is open source software and can be downloaded for free from a variety of locations on the Internet. Refer to the Comprehensive Perl Archive Network (CPAN) at:

http://www.cpan.org
or
http://www.perl.com

Obtaining the Perl Scripts

The Spider and Manifest scripts can be obtained from Cisco.com using the same procedure that is used to obtain updated versions of the Cisco ACNS software.

To obtain the Spider and Manifest scripts from Cisco.com, follow these steps:


Step 1 Go to the following URL to find the Spider and Manifest Perl scripts:

http://www.cisco.com/pcgi-bin/tablebuild.pl/acns50

Step 2 When prompted, log in to Cisco.com using your designated Cisco.com username and password.

The Cisco ACNS Software download page appears, listing the available software updates for the Cisco ACNS Software product.

Step 3 Locate the file named ACNS-5.0.1-manifest-tools.zip. This is a Zip archive containing both the Manifest and the Spider Perl scripts.

Step 4 Click the link for the ACNS-5.0.1-manifest-tools.zip file. The download page appears.

Step 5 Click Software License Agreement.

A new browser window opens, displaying the license agreement.

Step 6 After you have read the license agreement, close the browser window displaying the agreement and return to the Software Download page.

Step 7 Click the filename link labeled Download.

Step 8 Click Save to file and then choose a location on your workstation to temporarily store the zipped file containing the scripts.

Step 9 Use your preferred unzip program to unpack the scripts to a location on your workstation or your network.

After you have unzipped the scripts, you are ready to begin using them to build manifest files for your website. See the "Listing Website Content Using the Spider Script" section and the "Selecting Live and Pre-Positioned Content Using the Manifest Script" section for instructions on running the scripts.


Listing Website Content Using the Spider Script

In the simplest scenario, the Spider script is pointed to the address of an origin server and given the name of a database (.db) file into which it places any valid URLs it discovers on that site. For example, if you wanted to analyze the contents of www.cisco.com for content that might be pre-positioned after the manifest file is created, you would issue the following command:

perl spider.pl --start=www.cisco.com --db=ciscocontent.db

Limiting or Broadening the Scope of the Spider Script

Running the Spider script on the whole of www.cisco.com might take hours and produce much more information than you are interested in. The Spider script contains a variety of tools that enable you to limit as well as broaden the scope of a spider's action.


Note When running the Spider script on large websites, you must plan for the long period of time and the large amount of memory that is required for the Spider script to create a database.


For example, to limit the Spider script's search of www.cisco.com to just that part of the server containing product-related support information, you could enter the following command:

perl spider.pl --start=www.cisco.com/public/support/ --db=ciscocontent.db

To ask the Spider script to follow links from www.cisco.com to the Cisco networking professionals forum, you could enter the following Spider script command:

perl spider.pl --start=www.cisco.com --accept=forums.cisco.com --db=ciscocontent.db

Spider Script Syntax Guidelines

The Spider script accepts the following syntax, as described in Table 6-27.

perl spider.pl {--start=origin_server_url [--accept=accept_url] [--depth=number] [--file=filename]
[--limit=number] [--map=origin_server_url_prefix=cdn_prefix] [--prefix=url_prefix] [--reject=disallowed_url] --db=database_name.db}

Table 6-27 Spider Script Keywords 

Keyword
Description
Command-Line Syntax

--start

Names the location (URL) of the origin server that is to be analyzed.

--start=www.cisco.com

--db

Names the database file in which content URLs from the origin server and any accepted locations are to be placed.

--db=ciscocontent.db

--accept (optional)

Names a location other than that specified using the start keyword that is to be accepted when it is found in URLs.

--accept=forums.cisco.com

Note The --accept keyword is more general than the --prefix keyword because its value can include regular expressions. For example, you can use "jobs.*tech" to accept any URL that contains the string "jobs" followed later by "tech".

--depth (optional)

Causes the Spider script to stop after following links to a specified number of levels deep on the origin server.

--depth=6

--file (optional)

Causes the Spider script to read its commands from a specified file, in this case the rules file, one line at a time.

--file=cisco-rules.cfg

--limit (optional)

Causes the Spider script to stop after retrieving a specified number of pages from the origin server. Specifying 0 sets no limit for the number of pages retrieved.

--limit=1000

--map (optional)

Causes the Spider script to substitute the second URL prefix (appearing after the second =) for the first in any URLs from the origin server. If links on the origin server have since been modified to go to the Content Engine, the keyword instead substitutes the first prefix for the second when you rerun the Spider script.

--map=http://www.cisco.com/public/support/tac/=/support

--prefix (optional)

Specifies a URL prefix that is matched by the Spider script. The --prefix keyword is a convenient option that accepts a fixed string, such as "http://www.cisco.com/jobs", which a URL must match from the beginning.

--prefix=http://www.cisco.com/partners/CDN/

--reject (optional)

Names a location that is rejected when it is found in URLs.

Note The order in which the --accept and --reject keywords are given to the Spider script determines precedence. The first match takes precedence.

--reject=/cgi-bin
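
The precedence rule for the --accept and --reject keywords can be sketched in Python as follows. This is only an illustration and not the Perl logic of the Spider script: the function name first_match_decision, the treatment of every value as a regular expression searched anywhere in the URL, and the fallback for URLs that match no rule are all assumptions.

```python
import re

def first_match_decision(url, rules, default="reject"):
    """Return the action of the first rule whose pattern is found in the URL.

    rules is an ordered list of (action, pattern) pairs that mirrors the
    order in which --accept and --reject were given.
    """
    for action, pattern in rules:
        # Assumption: each pattern is a regular expression searched anywhere in the URL.
        if re.search(pattern, url):
            return action
    # Assumption: the fallback for URLs that match no rule is hypothetical.
    return default

# --reject=/cgi-bin is listed first, so it wins over the later --accept.
rules = [
    ("reject", "/cgi-bin"),
    ("accept", "forums.cisco.com"),
]

print(first_match_decision("http://forums.cisco.com/cgi-bin/search", rules))
print(first_match_decision("http://forums.cisco.com/networking/index.html", rules))
```

With the two rules reversed, the first URL would be accepted instead, which is why the order of the keywords matters.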

Customizing the Spider Script

Because the Spider script anticipates certain platforms and scenarios that might not correspond to your own website configuration, Cisco provides you with the Perl source code for the Spider script, which you can modify to suit your own needs.

Selecting Live and Pre-Positioned Content Using the Manifest Script

Whereas the Spider script is used to gather a list of potential content from an origin server, the Manifest script sifts through the information gathered by the Spider script and decides which content to actually import to the CDN for placement on a Content Engine.

Pre-Positioned Versus Live Content

The Manifest script distinguishes between content that needs to be pre-positioned and live, streamed content that, by definition, cannot be pre-positioned.

The result of using the live command is nearly the same as that of using the prepos command. Both commands expect you to specify what you intend to deliver as live content or as pre-positioned content with a comparison such as --prepos=match() or --prepos=type(). The only difference between these two commands is the tags contained in the .xml file that is created by manifest.pl. If the prepos command is used, the .xml file contains the tag <item-group type="prepos">. If the live command is used, the .xml file contains the tag <item-group type="wmt-live"> or <item-group type="real-live">, depending on whether the streaming data is WMT or RealMedia.

By using the prepos command, you identify and pre-position content that meets criteria that you specify. For example, to pre-position image files from Cisco.com that are larger than 1 megabyte, you would enter the following command:

perl manifest.pl --prepos='type(image/*) and size > 1000k' --db=ciscocontent.db --xml=cisco.xml

By using the live command, you identify the URLs of live content. Unlike pre-positioned content, live content cannot be identified by information stored in the header, so you must devise a method of locating live content based solely on information contained in the URL of that content. For example, you can identify streamed content with the following command:

perl manifest.pl --live='match(http://*)'

Manifest Script Syntax Guidelines

The Manifest script accepts the following syntax, as described in Table 6-28.

perl manifest.pl {[--file=filename | --live='keyword_comparison' | --prepos='keyword_comparison' | --set='attribute=value : keyword_comparison' | --playservertable=filename | --map={origin_server_url_prefix=cdn_prefix}] --db=database_name.db --xml=manifest_file_name.xml}


Note The --prepos keyword is required for the manifest file to be created from the Spider database. If you do not use this keyword, the manifest file created will be minimal and will not contain any content URLs.


The --prepos keyword takes a comparison that uses either type() or match(), which perform different functions. The match() comparison is a text match that acts on the name of the URL. For example, to select JPEG files whose names end in .jaypeg, such as a.jaypeg, use match(*.jaypeg). You could also use match() to find the word news in the name of the URL. The match() comparison can be used as shown in the following example:

perl manifest.pl --db=name.db --prepos='match(*.jpg)' --xml=xmlname.xml

The type() comparison compares the content named in the database file with the Content-Type header returned by the web server, which informs the client of the MIME type of the object. For example, if you name your JPEG files *.jaypeg, the web server still returns "Content-Type: image/jpeg", which is then placed in the database.

Other examples of the type() comparison include the following:

--prepos=type(text/html) 
--prepos=type(text/plain) 
--prepos=type(application/pdf) 
--prepos=type(image/gif) 
--prepos=type(image/jpeg) 
--prepos=type(video/mpeg)

The following are two examples of using the type() comparison in the full command line:

perl manifest.pl --db=name.db --prepos="type(image/jpeg)" --xml=xmlname.xml
perl manifest.pl --db=name2.db --prepos="type(application/pdf)" --xml=xmlname2.xml
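
The difference between matching on the URL and matching on the content type can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Manifest script's own code: the function names url_matches and type_matches, the record layout, and the use of shell-style wildcards (through the standard fnmatch module) are all hypothetical.

```python
from fnmatch import fnmatchcase

def url_matches(url, pattern):
    """Text match on the URL itself (assumed shell-style wildcards)."""
    return fnmatchcase(url, pattern)

def type_matches(content_type, pattern):
    """Match on the Content-Type value recorded in the database."""
    return fnmatchcase(content_type, pattern)

# Hypothetical database record: a JPEG image stored with an unusual extension.
record = {
    "url": "http://www.example.com/images/a.jaypeg",
    "content_type": "image/jpeg",
}

print(url_matches(record["url"], "*.jpg"))                  # False
print(url_matches(record["url"], "*.jaypeg"))               # True
print(type_matches(record["content_type"], "image/jpeg"))   # True
print(type_matches(record["content_type"], "image/*"))      # True
```

The record is missed by a URL match on *.jpg but is found by a content-type match, because the content type does not depend on how the file is named.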


Tip As a rule of thumb, quotes are needed only on the command line. You do not need to use quotes within a rules file that is read with the --file keyword.


If the --prepos keyword is used in the full command line, then quotes are needed as follows:

Windows 2000—Use double quotes instead of single quotes, as shown in the preceding example.

Linux—Use single quotes instead of double quotes.

If the --prepos argument is used within a rules file that is read with the --file argument, omit the quotes, as shown in the following rules.cfg file example:

--start=www.cisco.com 
--accept=forums.cisco.com 
--reject=/cgi-bin
--limit=0 
--db=ciscocontent.db 
--prepos=type(image/gif) and size > 1000k
--xml=ciscomanifest.xml

If the quotes are not removed from within the rules file, the following message appears:

Bareword found where operator expected.
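
The reason that quotes are needed on the command line but not in a rules file can be illustrated with the Python shlex module, which splits a string into words in roughly the way a POSIX shell does. This is only a sketch: it models Linux-style quoting, not the Windows command interpreter, and a real shell would additionally treat the unquoted > character as output redirection.

```python
import shlex

quoted = ("perl manifest.pl --prepos='type(image/*) and size > 1000k' "
          "--db=ciscocontent.db --xml=cisco.xml")
unquoted = ("perl manifest.pl --prepos=type(image/*) and size > 1000k "
            "--db=ciscocontent.db --xml=cisco.xml")

quoted_args = shlex.split(quoted)
unquoted_args = shlex.split(unquoted)

# With quotes, the whole comparison reaches the script as one argument.
print(quoted_args[2])

# Without quotes, the comparison is broken apart at each space.
print(unquoted_args[2:7])

# A line in a rules file is already a single argument, so it needs no quotes.
rules_line = "--prepos=type(image/*) and size > 1000k"
print(rules_line == quoted_args[2])
```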

Table 6-28 Manifest Script Keywords 

Keyword
Description
Command-Line Syntax

--file (optional)

Causes the Manifest script to read its commands from a specified file, one line at a time.

--file=ciscocontent.cfg

--prepos (required)

Marks content URLs in the database file that match the terms of the keyword comparison as pre-positioned content (type="prepos") in the manifest file.

Note The type command matches on the Content-Type: field in the Spider database file.

--prepos='type(image/jpeg) and size > 1000k'

--set (optional)

Sets the specified attribute to the value provided for all content items with URLs in the database file that match the keyword comparison.

--set='ttl=10000:match(*/urgent/*)'

--playservertable (optional)

Adds the playserver table in the specified file to the manifest file. Playserver tables map MIME-type content and filename extensions to specific server types to use (for example, "real" or "wmt") for the content in a specific Content Engine.

For the manifest file to validate properly, move the entire playserver table to the beginning of the manifest file as shown in the following example:

<CdnManifest>
         <playServerTable>
         ...
         </playServerTable>
         <server>
         ...
         </server>
         ...
</CdnManifest>

--playservertable=info.txt

--map (optional)

Causes the Manifest script to substitute the second URL prefix (appearing after the second =) for the first in any URLs from the origin server.

The second URL prefix must have a full path name. The --map keyword changes the names that are written to the .xml file generated by manifest.pl. After you run manifest.pl, you should see <item> tags that have the cdn-url attribute set to the requested name.

--map=http://www.cisco.com/public/support/tac/=/support

--db (required)

Names the database file in which content URLs from the origin server and any accepted locations are located. This file provides the data that the Manifest script analyzes.

--db=ciscocontent.db

--xml (required)

Names the manifest file that is generated by the Manifest script.

--xml=ciscomanifest.xml

match(comparison) (required)

Locates text in content URLs that is identical to the value provided.

--prepos='match(http://forums.cisco.com/*)'

size(comparison) (required)

Identifies content named in the database file according to the specified file size parameter, which can be specified in kilobytes, megabytes, or gigabytes (k, kb, m, mb, g, gb).

--prepos=match(*.gif) and size > 1000k

time(comparison) (required)

Identifies content named in the database file according to the time since the content was last modified (in hours).

Note Do not use spaces before the word "hours," whether used within a rules file or not.

Note If you use the time() keyword within a rules file that is read with the --file keyword, do not use quotes.

In the syntax example, "modtime" is compared to "now" - <value>, where "now" is the current time (in seconds since 1970) and <value> is a length of time in hours. For example, content that was last modified more than 2 hours ago is expressed as:

modtime < (now - 2hours)

The following example shows a .cfg file with the time comparison using the ">" character:

--start=http://website.com
--db=time1.db
--limit=10000
--xml=time1.xml
# this works for bitmaps modified within last 2 hours
--prepos=type(image/bmp) and modtime > (now - 2hours)

The following example shows a .cfg file with the time comparison using the "<" character:

--start=http://website.com
--db=time2.db
--limit=10000
# this works for bitmaps that have NOT been modified within last 2 hours
--prepos=type(image/bmp) and modtime < (now - 2hours)
--xml=time2.xml

type(comparison) (required)

Identifies content named in the database file according to its MIME type (text, application, image, and so on).

--prepos='type(image/gif)'
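
The size and time comparisons described in Table 6-28 can be sketched in Python as follows. This is a rough model under stated assumptions rather than the Manifest script's own logic: the function names parse_size and modified_within are hypothetical, and the source does not say whether k, m, and g stand for multiples of 1000 or 1024, so 1024 is assumed here.

```python
import re
import time

# Assumption: the units are binary multiples; the script may use 1000 instead.
UNIT_BYTES = {
    "k": 1024, "kb": 1024,
    "m": 1024 ** 2, "mb": 1024 ** 2,
    "g": 1024 ** 3, "gb": 1024 ** 3,
}

def parse_size(text):
    """Convert a size such as '1000k' or '2mb' to a number of bytes."""
    found = re.fullmatch(r"(\d+)(k|kb|m|mb|g|gb)?", text.strip().lower())
    if not found:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = found.groups()
    return int(number) * UNIT_BYTES.get(unit, 1)

def modified_within(modtime, hours, now=None):
    """Model modtime > (now - Nhours), with times in seconds since 1970."""
    now = time.time() if now is None else now
    return modtime > now - hours * 3600

now = 1_000_000_000  # fixed illustrative timestamp
print(parse_size("1000k"))                          # 1024000
print(modified_within(now - 3600, 2, now=now))      # True: changed 1 hour ago
print(modified_within(now - 3 * 3600, 2, now=now))  # False: changed 3 hours ago
```

Reversing the comparison to modtime < (now - 2hours) selects the opposite set, that is, content that has not been modified within the last 2 hours.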

Customizing the Manifest Script

Because the Manifest script anticipates certain platforms and scenarios that might not correspond to your own website configuration, Cisco provides you with the Perl source code for the Manifest script, which you can modify to suit your own needs.

Creating a Rules File for the Spider and Manifest Scripts

When using the Spider and Manifest scripts on a large web server, the parameters and rules you set for your scripts may be numerous and complex. When this is the case, it is more practical to create a separate file containing a list of your customized rules. Then you can simply point to the rules file rather than having to enter a long series of commands every time you want the rules applied.

Using a rules file facilitates rerunning of the Spider and Manifest scripts and ensures that the scripts are receiving identical commands each time the scripts are run. In addition, the same rules file can be read by both the Spider and the Manifest scripts without generating output errors. The Spider script simply ignores commands for the Manifest script, and vice versa.

To create a rules file for the Spider and Manifest scripts, follow these steps:


Step 1 Open your text editor.

Step 2 Enter your commands one at a time, each on its own line.

Each line of your rules file is sent to the scripts as a single argument. The following example shows a rules file for the Cisco website.

--start=www.cisco.com
--accept=forums.cisco.com
--reject=/cgi-bin
--limit=0
--db=ciscocontent.db
--prepos=type(image/gif) and size > 1000k
--xml=ciscomanifest.xml

Step 3 Save your file in a location relative to the Spider and Manifest scripts.

Step 4 Use the --file keyword to run each script using your rules file. For example:

perl spider.pl --file=cisco-rules.cfg
perl manifest.pl --file=cisco-rules.cfg
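
The way one rules file can serve both scripts can be sketched in Python as follows. This is a simplified model, not the scripts' own parsing code: the function name split_rules is hypothetical, the keyword sets are taken from Table 6-27 and Table 6-28, and the skipping of blank lines and of lines that start with # is an assumption based on the commented .cfg examples.

```python
SPIDER_KEYWORDS = {"start", "db", "accept", "depth", "file", "limit",
                   "map", "prefix", "reject"}
MANIFEST_KEYWORDS = {"file", "live", "prepos", "set", "playservertable",
                     "map", "db", "xml"}

def split_rules(text):
    """Return (spider_args, manifest_args) from the text of a rules file."""
    spider_args, manifest_args = [], []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and comment lines are skipped.
        if not line or line.startswith("#"):
            continue
        # Each remaining line is one argument; the keyword precedes the first "=".
        keyword = line.lstrip("-").split("=", 1)[0]
        if keyword in SPIDER_KEYWORDS:
            spider_args.append(line)
        if keyword in MANIFEST_KEYWORDS:
            manifest_args.append(line)
    return spider_args, manifest_args

rules_text = """\
--start=www.cisco.com
--accept=forums.cisco.com
--reject=/cgi-bin
--limit=0
--db=ciscocontent.db
--prepos=match(*.gif) and size > 1000k
--xml=ciscomanifest.xml
"""

spider_args, manifest_args = split_rules(rules_text)
print(spider_args)
print(manifest_args)
```

In this model the --db line reaches both scripts, while each of the other lines is used by one script and ignored by the other.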