Table Of Contents
Creating Manifest Files
Manifest File User Guidelines
Overview
Quick Start
Writing XML Tags
Important Manifest Tags
Writing a Single-Item HTTP Manifest File
Writing a Single-Item FTP Manifest File
Writing an FTP Crawler Manifest File
Writing an HTTPS Crawler Manifest File
Validating Manifest Files
Migrating from ACNS 4.x Software to ACNS 5.0 Software
Getting Started
Sample Manifest File
Using a Text Editor
Formatting XML Files
Writing Common Regular Expressions
Working with Manifest Files
Specifying a Single Content Item
Specifying a Crawl Job
Scheduling Content Acquisition
Specifying Shared Attributes
Specifying a Crawler Filter
Specifying Content Priority
Generating a Playserver List
Generating a Publishing URL
Specifying Attributes for Content Serving
Specifying Metadata for Content Serving
Specifying Time Values in the Manifest File
Refreshing and Verifying the Manifest File Content
Specifying Live Content
More Sample Manifest Files
Sample 1
Sample 2
Sample 3
Sample 4
Sample 5
Downloading the Sample Files
Manifest Validator Utility
Running the Manifest Validator Utility
Understanding Manifest File Validator Output
Correcting Manifest File Syntax
Manifest File Reference
Manifest File Structure and Syntax
CdnManifest
playServerTable
playServer
options
server
host
item
crawler
item-group
matchRule
match
contains
wmt-meta-data
http-meta-data
Configuring Freshness of Pre-Positioned Content
XML Schema
Manifest XML Schema
PlayServerTable XML Schema
Default PlayServerTable Schema
Manifest File Time Zone Tables
Manifest File Automated Scripts
Installing Perl on Your Workstation
Obtaining the Perl Scripts
Listing Website Content Using the Spider Script
Selecting Live and Pre-Positioned Content Using the Manifest Script
Creating a Rules File for the Spider and Manifest Scripts
Creating Manifest Files
This chapter describes the process for creating manifest files used to acquire and distribute content with ACNS 5.0 software. This chapter is divided into two major sections:
• Manifest File User Guidelines
This first major section provides:
– A general overview and purpose of manifest files in the context of a Cisco CDN
– A quick start section that has you up and running immediately
– A getting started section that describes how to complete specific tasks
– Useful sample manifest files
– A syntax validation utility
– An explanation of live content distribution
• Manifest File Reference
This second major section describes:
– Detailed manifest file structure and syntax
– XML schema
– Running of automated manifest file scripts
– Manifest file time zone tables
Manifest File User Guidelines
This first major section contains the following topics:
• Overview
• Quick Start
• Getting Started
• Working with Manifest Files
• More Sample Manifest Files
• Manifest Validator Utility
Overview
The Cisco ACNS 5.0 software manages the acquisition and distribution of pre-positioned content through an Extensible Markup Language (XML)-based reference file called the manifest file. The manifest file lists content that is to be used to populate Content Engines registered on a Cisco CDN. There should be one manifest file per channel.
The manifest file is placed on an origin server and identified by a unique URL. The location of the manifest file is specified when you enter the manifest file URL in the Modifying Channel window of the Content Distribution Manager GUI. Unlike the treatment of content by Cisco ACNS 4.x software, pre-positioned content is not stored on the Content Distribution Manager in ACNS 5.0 software but is fetched from origin servers and distributed to Content Engines by a Content Engine that is a root Content Engine for the channel.
The Content Distribution Manager disseminates the manifest file URL to each of the root Content Engines of the CDN. The root Content Engine then parses the file and checks for any new or different information. After the root Content Engine determines what content is new, it fetches only that new pre-positioned or live content from one or more origin servers.
The manifest file has the following features:
• Administrators and content providers can provide content on an origin server.
• Files can be imported over HTTP, HTTPS, or FTP and then served using a different protocol, such as a streaming protocol, depending on the type of media playserver designated to play back the requested file.
Content acquisition and distribution can be controlled by setting pre-scheduled content availability dates and times. Two content acquisition methods can be configured within the manifest file. The first method specifies the acquisition of a single <item>. The second method specifies content acquisition by crawling a website or FTP server with the <crawler> feature. Either of these two methods can schedule when the acquisition is to start and how often its content is to be checked for freshness.
Quick Start
This section will help you succeed in writing manifest files that you can use to acquire content immediately. See other sections of this chapter to learn more about specifying useful attributes to customize the manifest files further and to obtain more information on the correct manifest file syntax.
Note
The username and password specified in the Channel property serve only to fetch the manifest file. The actual content acquisition process does not use this username and password. To fetch the actual content, specify the username and password in the <host> tag inside the <server> tag.
Writing XML Tags
The manifest file is a text file written in XML format. An XML text file consists of a series of XML tags. The following is an example of a simple XML tag:
<item attr1="value1" attr2="value2" />
In the preceding example, "item" is the name of the XML tag, so this tag is called the "item" tag. A tag can have many attributes in the form of name="value." The value field must be bounded by double quotation marks. There are two attributes inside the "item" tag shown in the example. The first attribute, called "attr1," has a value called "value1." The second attribute, called "attr2," has a value called "value2."
Tags typically start with a "<" and end with a "/>," but they can start with a "<" and end with ">." If a tag ends with ">," it means its scope is not yet complete. To complete its scope, a tag called "tag-name" must end with "</tag-name>." For example:
In the first line of the example, the <server> tag ends with a ">," but its scope does not end on the first line. Its scope ends on the third line with the tag </server>. Because the <host> tag is inside the <server> tag, the <host> tag is called a subtag of the <server> tag, and the <server> tag is considered the parent tag of the <host> tag.
Two tag relationships can exist between XML tags: peer and subtag. In the following example, the two "item" tags have a peer relationship:
The key to identifying their peer relationship is that the first tag ends with a "/>" before the second tag starts. In the following example, the <server> and <host> tags have a subtag relationship:
The key to identifying their subtag relationship is that the first tag ends with a ">" before the second tag starts. The <host> tag is the subtag of the <server> tag; that is, the <server> tag is the parent of the <host> tag.
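The two relationships described in this section can be sketched as follows. This is only an illustration: the attribute values (a.html, b.html, my-server, and the host URL) are placeholders chosen for the sketch.

```xml
<!-- Peer relationship: the first tag ends with "/>" before the second tag starts -->
<item src="a.html" />
<item src="b.html" />

<!-- Subtag relationship: <server> ends with ">" so its scope stays open
     until </server>, which makes <host> a subtag of <server> -->
<server name="my-server">
  <host name="http://www.my-server.com/" />
</server>
```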
Important Manifest Tags
This section lists and briefly describes the important manifest tags for you to better understand manifest files.
•
<CdnManifest> tag
The <CdnManifest> </CdnManifest> tag set must be the highest-level tag for an ACNS 5.0 software manifest file. The tag set is required and marks the beginning and end of the manifest file content. Each <CdnManifest> tag set must contain at least one item, or content object, that is fetched and stored.
•
<server> and <host> tags
The <server> and <host> tags are required to specify the origin content source server. The <server> tag must precede any <item> or <crawler> tag that refers to it. In the following example:
<server name="xyz" > <host name="http://www.xyz.com/" /> </server>
<item server="xyz" src="" />
the <server> tag must precede the <item> tag; otherwise, there is a syntax error. The <crawler> tag uses the <server> tag that immediately precedes it. If a <server> tag is not found immediately preceding the <crawler> tag, then the server that serves the manifest file is used by default.
Similarly, item priority is important when using the <item> tag. In the following example:
abc.html is acquired and distributed before xyz.html.
The <host> tag field inside the <server> tag field configures the content source host. The <server> tag only requires the name attribute. The <host/> tag defines a web server or live server from which content is to be retrieved and later pre-positioned. Only one host can be defined within a single <server> tag set. The <host> tag must be enclosed within <server> tags.
–
server name attribute
The server name value can be any value as long as it is unique across all name values of the <server> tags within the same manifest file. The <server> tag must be the parent tag of the <host> tag, and the <server> tag must have a <host> subtag.
–
host name attribute
The host name value specifies the fully qualified domain name, including protocol and port for the origin server. For example:
<host name="http://www.cisco.com" />
or
<host name="ftp://my-ftp-server" />
or
<host name="https://my-ftp-server.com:843/" />
–
host user and password attributes
The user and password attributes specify the username and password when authentication is required. For example:
<host name="ftp://my-ftp-server" user="honh" password="dsadda2" />
•
<item> tag
The <item> tag is used to specify a single file to be pre-positioned.
–
item src attribute
The required src attribute specifies the URL of the file, relative to the value of the name attribute in the <host> tag.
•
<crawler> tag
The <crawler> tag is used to specify a crawl job. You can use the <crawler> tag to crawl an FTP directory and its subdirectories or to crawl directories using HTTP directory indexing.
–
crawl directories
Use HTTP to crawl directories to fetch files in certain directories by enabling the built-in web server directory indexing feature.
If a URL points to a directory when this directory indexing feature is enabled, the web server dynamically generates an HTML page and lists all the files and subdirectories. By parsing such an HTML page, the ACNS software can identify those files it can fetch from that particular directory.
–
crawler start-url attribute
The start-url attribute specifies the relative path of the URL from which to start the crawl. For example, if the host name of the crawl job <crawler start-url="HR/jobs/" /> is <host name="http://www.my-server.com/" />, the directory "http://www.my-server.com/HR/jobs/" is crawled.
–
crawler depth attribute
The crawler depth attribute specifies the directory depth of a web crawl. A depth value of 0 allows only a crawl of the starting URL page, while a depth value of 1 allows a crawl of the start URL page and its links.
Writing a Single-Item HTTP Manifest File
The following sample shows the simplest way to write a manifest file that fetches content using the HTTP protocol.
<CdnManifest>
<server name="my-second-origin-server">
<host name="http://www.my-server.com/" />
</server>
<item src="project-one.html" />
<item src="my-eng-group/project-two.html" />
<item src="project-three.html" />
</CdnManifest>
The <CdnManifest> tag set is required to specify a manifest file. The <server> tag set specifies the logical name of the server "my-second-origin-server." The <host> subtag specifies the actual URL used to access the files on the origin server. The <item> tag specifies the exact item that is to be fetched from the origin server.
Upon execution, the preceding manifest file sample instructs the ACNS software to fetch the following items using HTTP:
• http://www.my-server.com/project-one.html
• http://www.my-server.com/my-eng-group/project-two.html
• http://www.my-server.com/project-three.html
Writing a Single-Item FTP Manifest File
Note
When you use FTP to acquire content using a CDN URL, you must either specify the content-type in the manifest file or use the correct extension in the CDN URL. Otherwise, the wrong content-type is generated and you are not able to play the content.
The following sample shows the simplest way to write a manifest file that fetches content using the FTP protocol.
<CdnManifest>
<server name="my-ftp-server">
<host name="ftp://myftp.cisco.com" user="johnw" password="georgebush" />
</server>
<item src="relative-path/file1.txt" />
<item src="/full-path/file2.txt" />
</CdnManifest>
Upon execution, the preceding manifest file sample instructs the ACNS software to fetch content using FTP, where the "relative-path" is the path relative to the home directory of johnw's login to the FTP server. The "/full-path" is the absolute path relative to the root directory.
For example, if the FTP home directory for "johnw" is "/users/ftp/johnw," the full path for the first file is /users/ftp/johnw/relative-path/file1.txt, and the full path for the second file is /full-path/file2.txt.
Writing an FTP Crawler Manifest File
The following sample shows the simplest way to write a crawler manifest file that fetches content using the FTP protocol.
<server name="my-ftp-server" >
<host name="ftp://ftp-server" />
The web crawler application methodically and automatically searches acceptable websites and makes a copy of the visited pages for later processing. The web crawler starts with a list of URLs to visit and identifies every web link in the page, adding these web links to the list of URLs to visit.
The preceding manifest file sample instructs the ACNS software to start crawling from ftp://ftp-server/folder to ten directory levels deep and check those directories every 10 minutes for freshness.
The <crawler> tag specifies the crawl task. The start-url attribute specifies where the web crawler is to start crawling. The depth attribute of ten specifies how many levels of subdirectories the crawler is to check to obtain the required content. The ttl attribute specifies how often the file is to be checked for freshness. The ttl attribute can be specified as an attribute in a single-item manifest file as well.
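The sample listing above shows only the <server> and <host> lines. A minimal sketch of the complete file that the description implies might look like the following. The start-url value "folder/" is inferred from the URL ftp://ftp-server/folder mentioned in the text and should be treated as an assumption.

```xml
<CdnManifest>
  <server name="my-ftp-server">
    <host name="ftp://ftp-server" />
  </server>
  <!-- Crawl ten directory levels deep and check freshness every 10 minutes.
       The start-url value is inferred from the description, not quoted from it. -->
  <crawler start-url="folder/" depth="10" ttl="10" />
</CdnManifest>
```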
Writing an HTTPS Crawler Manifest File
The following sample shows the simplest way to write a crawler manifest file that fetches content using the HTTPS protocol. The following manifest file sample instructs the ACNS software to start crawling from https://www.cisco.com/jobs/eng/ to a depth of five levels.
<host name="https://www.cisco.com/" />
As with a single-item manifest file, the <CdnManifest> tag set is required to specify a manifest file. The <server> tag set specifies the logical name of the server "cisco." The <host> subtag specifies the actual URL used to access the files on the origin server.
If directory indexing is enabled for the jobs/eng directory and its subdirectories, then the crawler will go to a depth of five directory levels to retrieve the files. Files associated with a particular channel are typically stored in the same directory or subdirectory on the origin server. If the origin server is running HTTP or HTTPS services, directory indexing must be enabled for these directories. Enabling directory indexing allows a request to that directory to return a list of files in that directory and allows ACNS 5.0 software to crawl the directory.
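Only the <host> line of the sample is shown above. A sketch of the full manifest file, assembled from the description, might look like the following. The server name "cisco" and the depth of five come from the text; the exact start-url string "jobs/eng/" is an assumption derived from the crawl URL.

```xml
<CdnManifest>
  <server name="cisco">
    <host name="https://www.cisco.com/" />
  </server>
  <!-- Start at https://www.cisco.com/jobs/eng/ and crawl five levels deep.
       The start-url string is an assumption based on the described URL. -->
  <crawler start-url="jobs/eng/" depth="5" />
</CdnManifest>
```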
Validating Manifest Files
It is a good idea to use the Manifest Validator utility to validate your manifest file after it is created. See the "Manifest Validator Utility" section for more information on the Manifest Validator utility.
Migrating from ACNS 4.x Software to ACNS 5.0 Software
Unlike ACNS 4.3 software, ACNS 5.0 software requires one or more origin servers where source files can be stored for the pre-positioning of content. These origin servers require that remote access servers be installed to support HTTP, FTP, or HTTPS services so that CDN devices can fetch pre-positioned files.
Files associated with a particular channel are typically stored in the same directory or subdirectory on the origin server. If the origin server is running HTTP or HTTPS services, directory indexing must be enabled for these directories. Enabling directory indexing allows a request to that directory to return a list of files in that directory and allows ACNS 5.0 software to crawl the directory.
Once the content is uploaded to a suitable origin server and available, you can use the following simple manifest file to specify content acquisition.
<CdnManifest>
<server name="my-server" >
<host name="http://my-server" />
</server>
<crawler start-url="my-path/" ttl="10" />
</CdnManifest>
In the preceding sample, a crawl job is specified for the associated channel to check the "my-path" directory for freshness every 10 minutes. Once this setup is complete, the root Content Engine associated with this channel monitors this directory to determine if there are any new or updated files, and then automatically fetches them.
Running the preceding manifest file sample achieves the same objective as that featured in the ACNS 4.2 software, where users copy pre-positioned files into a Content Distribution Manager import folder. However, using the manifest file is more powerful than the Content Distribution Manager import feature. For example, if you store content at different locations but the content must be distributed through the same channel, you can create multiple crawl jobs in the manifest file to monitor these locations. The following sample manifest file allows you to monitor different locations.
<CdnManifest>
<server name="my-http-server" >
<host name="http://my-server" />
</server>
<crawler start-url="my-path-http/" ttl="10" />
<server name="my-ftp-server" >
<host name="ftp://my-server" />
</server>
<crawler start-url="my-path-ftp/" ttl="10" />
</CdnManifest>
This sample monitors the directory "my-path-http" on an HTTP server and the directory "my-path-ftp" on an FTP server.
Getting Started
The manifest file, whose URL is stored in the Content Distribution Manager GUI, allows you to define a series of servers from which content can be fetched, as well as a list of content objects on each server to be fetched. Written in XML, a finished manifest file contains a series of URLs pointing to pre-positioned content.
This section explains the structure of the XML-based manifest file. In the manifest file syntax samples that follow, note the capitalization and data formats used. For your finished manifest file to be executed successfully, XML tags and tag attributes must use the format outlined in this section.
Sample Manifest File
The following example shows a simple functional manifest file. Use this example as a model when creating or troubleshooting your own manifest files.
<contentType name="wmt"/>
<options noRedirectToOrigin="true"/>
<host name="http://www.cnn.com"/>
<item-group server="server0">
serveStartTime="2003-01-12 14:00:00 PST" serveStopTime="2099-04-12 14:00:00 PST">
<crawler start-url="crawler-01" depth="10"/>
Note
The XML standard requires that the optional <?xml version="1.0"?> version line, if used, be the first line of the XML file. If blank lines occur before the <?xml version="1.0"?> version line in a manifest file, the Manifest Validator reports syntax errors.
The format of the manifest file is important because it is the vehicle that specifies those content objects that are to be imported into your CDN for pre-positioning in your edge devices, such as Cisco Content Engines. With the manifest file, you can specify where to obtain web content objects, how long these objects should remain on the Content Engines of your CDN, and how frequently the ACNS software should check their freshness.
Using a simple text editor, you can write acquisition and pre-positioning instructions in XML format. The actual manifest file resides on a web server that the Content Distribution Manager can access. The manifest file URL is stored in the Content Distribution Manager GUI. The ACNS software takes its instructions from the manifest file, acquiring content from the origin server and pre-positioning it to the appropriate edge devices on your CDN. You can specify that the manifest file fetch content from servers using either of the following methods:
• Fetch one or multiple single items or URLs.
• Start a crawler job using its associated parameters, such as starting URL, level of directory depth, prefix, and filter, to accept or reject content using criteria you have specified.
You can also schedule when content acquisition is to start and how often content should be checked for freshness. Information on how end users can access pre-positioned content on the CDN must be provided. For example, end users need to know what playserver should be used to play media, how to access the content, when the content is to be served, and any additional metadata for media playback.
Using a Text Editor
Because XML files, like HTML files, are simple text format files that use special tags or elements to designate how content is to be handled and represented on a website, it is possible to create manifest files using any ASCII text editor. A variety of third-party XML authoring tools also exist, and they can speed the process of generating manifest files.
Unlike HTML, which serves as a language for creating web pages, XML is a language for creating languages. In this case, the manifest file becomes the XML application. The XML application contains tags that describe the information that is contained within the tags. This information is extracted from the manifest file XML application and reused repeatedly to carry out tasks, or it is merged with other information from a different source and the result used in a different framework or for a different function.
Writing XML is not as forgiving as writing HTML. XML is sensitive to uppercase and lowercase letters, the use of quotation marks, the proper closure of tags, and other formats that require exceptional attention to detail. Care must be taken to ensure that XML tags are properly formatted and otherwise syntactically correct. Incorrectly formatted data, such as incorrect usage of capitalization in a tag or tag attribute, results in syntax errors.
Formatting XML Files
The manifest file must be written using the XML format described in the "Manifest File Structure and Syntax" section. An XML file is a plain text file with tags. The following is an example of a simple XML tag:
<sample-tag />
The tag begins with the left angle bracket (<) and ends with a forward slash and a right angle bracket (/>). The name of this tag is "sample-tag."
The following is an example of a tag with attributes:
<sample-tag name1="value1" name2="value2" />
The following sample tag has attributes and a subtag:
<sample-tag name1="value1" name2="value2">
<sub-tag name1="value1" name2="value2"/>
</sample-tag>
If a subtag is contained within a tag, the subtag attribute list must end with a right angle bracket (>) instead of a forward slash and a right angle bracket (/>), and the entire tag must end with </tag-name>.
For more information on XML or XML tutorials, refer to the following links:
http://www.w3.org/XML/
http://www.w3schools.com/
Writing Common Regular Expressions
A regular expression is a formula for matching strings that follow a recognizable pattern. The following special characters have special meanings in regular expressions:
. * \ ? [ ] ^ $
If the regular expression string does not include any of these special characters, then only an exact match satisfies the search. For example, the expression "stock" matches only strings that contain the exact substring "stock."
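As an illustration of how such expressions appear in a manifest file, the following sketch uses the accept and reject attributes of the <crawler> tag. The start-url value and both patterns are hypothetical. The accept pattern contains no special characters, so it matches the literal substring. The reject pattern uses a backslash to match a literal period and a dollar sign to anchor the match at the end of the URL, assuming the common regular expression syntax described in the reference that follows.

```xml
<!-- accept="stock": only URLs that contain the exact substring "stock"
     reject="\.pl$": URLs that end in ".pl" (literal period, end-of-string anchor)
     Both patterns and the start-url value are illustrative assumptions. -->
<crawler start-url="index.html" accept="stock" reject="\.pl$" />
```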
For more information about writing regular expressions, refer to the following website:
http://yenta.www.media.mit.edu/projects/Yenta/Releases/Documentation/regex-0.12/
Working with Manifest Files
This section provides manifest file samples for carrying out specific tasks. Each sample has an associated explanation of its purpose and function. The manifest file can specify a single content object, a website crawler job, or an FTP server crawler job to acquire pre-positioned content or live content that is distributed to edge Content Engines later.
Specifying a Single Content Item
The following manifest file example specifies a single content item.
<CdnManifest>
<item src="test.html" />
<server name="my-origin-server-one">
<host name="http://www.my-server-one.com/eng/" />
</server>
<server name="my-origin-server-two">
<host name="http://www.my-server-two.com/eng/" />
</server>
<item src="project-two.html" />
<item server="my-origin-server-one" src="project-one.html" />
</CdnManifest>
In the preceding example, the first <item> uses the manifest server, where test.html is relative to the manifest file URL. The second <item>, "project-two.html," uses "my-origin-server-two," and the third <item>, "project-one.html," uses "my-origin-server-one."
Use the <item> tag to specify a single content item, object, or URL. The required attribute src is used to specify the relative path portion of the URL. If the server name attribute is omitted, the server name in the last specified <server> tag above it is used. If there are no <server> tags close by in the manifest file, the server that hosts the manifest file will be used, which means that the relative URL will be relative to the manifest file URL.
Note
Before any content can be acquired, you must enter the URL that defines the location of the manifest file in the Content Distribution Manager GUI. In the Modifying Channel window, enter the location URL of the manifest file, its Time To Live (TTL), the username, and the password required to access the manifest file (if the location is password-protected).
Specifying a Crawl Job
The web crawler application methodically and automatically searches acceptable websites and makes a copy of the visited pages for later processing. The web crawler starts with a list of URLs to visit and identifies every web link in the page, adding these links to the list of URLs to visit. The process ends after one or more of the following conditions are met:
• Links have been followed to a specified depth.
• The maximum number of objects has been acquired.
• The maximum content size has been acquired.
By crawling a site at regular intervals using the Time To Live (or ttl) attribute, these links and their associated content can be updated regularly to keep the content fresh. For more information on the ttl attribute, see the "Refreshing and Verifying the Manifest File Content" section.
Use the <crawler> tag to specify the website or FTP server crawler attributes. Table 6-1 lists the attributes, states whether these attributes are required or optional, and describes their functions.
Table 6-1 Website or FTP Server Crawler Job Attributes
Attribute
|
Description
|
start-url
|
(Required) Defines the relative path of the URL to start from for the specified crawl job.
|
depth (0, 1,-1)
|
(Optional) Defines the level of depth to crawl the specified website.
The depth is defined as the level of a website's URL links or FTP server's directory, where 0 is the URL or directory from which the crawler job starts.
0 = acquire only the starting URL
1, 2, 3,... = acquire the starting URL and its referred files to the depth specified
-1 = infinite or no depth restriction
The default is 20 if a depth is not specified.
Note It is not advisable to specify a depth of -1 because it will take a long time to crawl a large website and is wasteful if all of the content on that particular website is not required.
|
prefix
|
(Optional) Combines the host name from the <server> value and this field to create a full prefix. Only content whose URLs match the full prefix is acquired. For example:
<server name="xx"> <host name="www.cisco.com" proto="https" port="433"/>
</server>
and in a <crawler> tag:
The full prefix is "https://www.cisco.com:433/marketing/eng/." Only URLs that match this prefix are crawled. If a web page refers to .../marketing/ops, the marketing/ops page and its children are not acquired.
If the prefix is omitted, the crawler checks the default full prefix, which is the host name portion of the URL from the server. In the previous example, the default full prefix is "https://www.cisco.com:433."
|
accept
|
(Optional) Uses a regular expression to define acceptable URLs to crawl, in addition to having acceptable URLs match a prefix. For example, accept="stock" means that only URLs that meet two conditions are crawled: the URL matches the prefix and also contains the regular expression string "stock."
|
reject
|
(Optional) Uses a regular expression to reject a URL if it matches the expression. The URL is first checked for a possible prefix match and then checked for a reject regular expression. If a URL does not match the prefix, it is immediately rejected. If a URL matches both the prefix and the reject regular expression, it is rejected by the expression.
|
max-number
|
(Optional) Specifies the maximum number of crawler job objects that can be acquired.
|
max-size-in-B max-size-in-KB max-size-in-MB
|
(Optional) Specifies the maximum size of content that this crawler job can acquire. The size can be expressed in bytes (B), kilobytes (KB), or megabytes (MB).
|

Note
If you specify both the max-number and max-size attributes as the criteria to use to stop a crawler job, the condition that is met first takes precedence. That is, the crawler job stops either when the maximum number of objects is acquired or when the maximum content size is reached, whichever occurs first. For example, if the crawler job has acquired the maximum number of objects specified in the manifest file but has not yet reached the maximum content size, the crawler job stops.
The following is an example of a website crawler job.
<host name="http://www.cisco.com/jobs/" />
start-url="eng/index.html"
The attributes of this website crawler job example are:
• The start-url path is http://www.cisco.com/jobs/eng/index.html.
• Search to a website link depth of 10.
• Search URLs with the prefix http://www.cisco.com/jobs/eng/.
• Reject URLs containing .pl (Perl script pages).
• Crawl only until 200 megabytes in total content size is acquired.
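The attributes listed above could be written as the following sketch. The attribute names come from Table 6-1. The server name "cisco-jobs," the prefix value "eng/," and the reject pattern are assumptions reconstructed from the description, not a copy of the original listing.

```xml
<CdnManifest>
  <!-- The server name is a hypothetical label -->
  <server name="cisco-jobs">
    <host name="http://www.cisco.com/jobs/" />
  </server>
  <!-- prefix is combined with the host name, giving
       http://www.cisco.com/jobs/eng/ as the full prefix.
       reject uses "\." so that the period is matched literally. -->
  <crawler
    start-url="eng/index.html"
    depth="10"
    prefix="eng/"
    reject="\.pl"
    max-size-in-MB="200" />
</CdnManifest>
```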
If the server name attribute is omitted, the server name in the last specified <server> tag above it is used. If there are no <server> tags close by in the manifest file, the server that hosts the manifest file will be used, which means that the relative URL will be relative to the manifest file URL.
Scheduling Content Acquisition
Two attributes, ttl and prefetch, are used to schedule content acquisition. Use ttl to specify the frequency of checking the content for freshness, in minutes. For example, to check for page freshness every day, enter ttl="1440."
In the following example, page freshness is scheduled to be checked once a day.
In the following example, page freshness is scheduled to be crawled and checked every hour to a link depth of 2.
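The listings for the two cases above are not reproduced here. A minimal sketch of what they might look like, assuming a hypothetical server named "my-server" and a hypothetical page index.html, is:

```xml
<CdnManifest>
  <!-- The server name, host URL, and file names are placeholders -->
  <server name="my-server">
    <host name="http://www.my-server.com/" />
  </server>
  <!-- Single item checked for freshness once a day (1440 minutes) -->
  <item src="index.html" ttl="1440" />
  <!-- Crawl job rechecked every hour (60 minutes) to a link depth of 2 -->
  <crawler start-url="index.html" depth="2" ttl="60" />
</CdnManifest>
```

Because ttl is expressed in minutes, 1440 corresponds to 24 hours and 60 corresponds to one hour.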
If the content is not yet available at a particular URL, the prefetch attribute can be used to specify the start time for acquisition at that specified URL. For example, prefetch="2002-28-06 18:35:21" means that the content acquisition job can only start on June 28, 2002 at this specific time.
The following example schedules a crawl of this website every hour to a link depth of 2 to start on November 9, 2001 at 8:45 a.m.
prefetch="2001-09-11 08:45:12"
Specifying Shared Attributes
Several <item> tags often share the same attribute values. Instead of writing these attributes individually for every <item> tag, you can move them into a higher-level tag called <item-group>. Create an <item-group> tag at a level below the <CdnManifest> tag, write the <item> tags into it as subtags, and move the shared attributes into the <item-group> tag, as shown in the following example:
<server name="cisco-cco">
<host name="http://www.cisco.com" />
<item src="jobs/index.html"/>
<item src="jobs/index1.html"/>
<item src="jobs/index2.html"/>
<item src="jobs/index3.html"/>
<item src="jobs/index4.html"/>
<item src="jobs/index5.html"/>
You can also use the <options> tag to share attributes at the topmost level of the manifest file. Attributes in the <options> tag can be shared by every <item> tag and <crawler> tag in the manifest file. However, if a shared attribute is specified in both the <item-group> and <item> tags, or in both the <options> and <item> tags, the attribute value in the <item> tag takes precedence over the value in the <item-group> or <options> tag. For a list of shared attributes, see the "options" section.
The following example illustrates this precedence rule. The first <item> tag takes the TTL value 1440 from the <options> tag, but the second <item> uses its own TTL value of 60.
<item src="index.html" />
<item src="index1.html" ttl="60" />
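The precedence rule can be pictured as a layered merge. The following Python sketch is only a model of the rule, not ACNS code; the function name effective_attributes is hypothetical. The text states that <item> values override both <item-group> and <options> values; the relative order of <item-group> and <options> is not stated here, so the sketch assumes that the more specific <item-group> wins.

```python
def effective_attributes(options: dict, item_group: dict, item: dict) -> dict:
    """Merge shared attributes; sources applied later override earlier ones."""
    merged = dict(options)
    merged.update(item_group)  # assumption: <item-group> overrides <options>
    merged.update(item)        # stated rule: <item> values always win
    return merged


options = {"ttl": "1440"}
first = effective_attributes(options, {}, {"src": "index.html"})
second = effective_attributes(options, {}, {"src": "index1.html", "ttl": "60"})
print(first["ttl"], second["ttl"])
```

The first item inherits the TTL value from the options layer, and the second keeps its own value.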
If you need to create a manifest file that contains many single <item> tags or URLs, Perl scripts are available to generate the <item> tags for you. See the "Manifest File Automated Scripts" section.
Specifying a Crawler Filter
With a rule-based crawler filter, you can crawl an entire website and acquire only content with certain predefined characteristics. Crawler attributes in the <crawler> tag do not act as filters; they only define how the crawl is performed. The <matchRule> tag is designed to act as a rule-based filter. You can define rule-based matches for file extension, size, content type, and time stamp. In the following example, the crawl job is instructed to crawl the whole website starting at "index.html", but to acquire only files that have the .jpg extension and are 50 kilobytes or larger.
<match size-min-in-KB="50" extension="jpg" />
There can be multiple <match> subtags within a <matchRule> tag. Table 6-2 lists and describes the <match> subtag attributes.
Table 6-2 <match> Subtag Attributes
Attribute | Description
mime-type | Specifies match of these MIME-types.
extension | Specifies match of files with these extensions.
time-before | Specifies match of files modified before this time (using the Greenwich mean time [GMT] time zone) in yyyy-mm-dd hh:mm:ss format. See the "options" section for a description of the timeZone attribute.
time-after | Specifies match of files modified after this time (using the Greenwich mean time [GMT] time zone) in yyyy-mm-dd hh:mm:ss format.
size-min-in-MB, size-min-in-KB, size-min-in-B | (Optional) Specifies match of content size equal to or larger than this value. The size can be expressed in megabytes (MB), kilobytes (KB), or bytes (B).
size-max-in-MB, size-max-in-KB, size-max-in-B | (Optional) Specifies match of content size equal to or smaller than this value. The size can be expressed in megabytes (MB), kilobytes (KB), or bytes (B).
A <match> subtag can specify multiple attributes. Attributes within a <match> tag have a Boolean AND relationship. In the following example, to satisfy this match rule, a file must have the .mpg file extension AND its size must be 50 kilobytes or larger.
<match extension="mpg" size-min-in-KB="50" />
There is a Boolean OR relationship between the <match> rules themselves. A <matchRule> tag can have multiple <match> subtags, and a file needs to satisfy only one of them to be acquired. The <matchRule> tag can be specified as a subtag of the <crawler> tag or as a subtag of the <item-group> tag. If a <matchRule> subtag is specified in an <item-group> tag, it is shared by every <crawler> tag within that <item-group> tag.
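The AND and OR relationships can be expressed compactly in code. The following Python sketch models only the extension and size-min-in-KB attributes; the class and function names are hypothetical, and two details are assumptions: one kilobyte is taken as 1024 bytes, and the extension comparison is case sensitive.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Match:
    """One <match> subtag; only two of its attributes are modeled."""
    extension: Optional[str] = None
    size_min_in_kb: Optional[int] = None

    def accepts(self, name: str, size_bytes: int) -> bool:
        # Every attribute that is present must hold (Boolean AND).
        if self.extension is not None and not name.endswith("." + self.extension):
            return False
        # Assumption: 1 KB = 1024 bytes; the minimum itself is accepted.
        if self.size_min_in_kb is not None and size_bytes < self.size_min_in_kb * 1024:
            return False
        return True


def match_rule_accepts(matches: List[Match], name: str, size_bytes: int) -> bool:
    """A <matchRule> accepts a file if any one <match> accepts it (Boolean OR)."""
    return any(m.accepts(name, size_bytes) for m in matches)


rule = [Match(extension="mpg", size_min_in_kb=50), Match(extension="jpg")]
print(match_rule_accepts(rule, "clip.mpg", 60 * 1024))
print(match_rule_accepts(rule, "clip.mpg", 10 * 1024))
print(match_rule_accepts(rule, "logo.jpg", 10))
```

In this model, a small .mpg file is rejected by the first <match> because it fails the size test, and by the second because its extension is not jpg, so the rule as a whole rejects it.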

Note
The accept and reject attributes of the <crawler> tag are easily mistaken for a crawler filter.
For example, to acquire only files with the .mpg extension, specifying accept="\.mpg" is syntactically valid but does not produce the intended result, because pages whose URLs do not match the accept constraint are not searched. If the starting URL is index.html, this HTML file is parsed and any links not containing .mpg are rejected. If the .mpg files are located at the second or lower link levels, they are not fetched, because the links leading to them have been rejected.
To properly crawl for the .mpg extension, use the <matchRule> tag and specify <match extension="mpg" /> as its subtag. The whole site is crawled and only those files with the .mpg extension are retained.
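The difference between the two approaches can be demonstrated on a small in-memory site. The following Python sketch is a simplified model, not the ACNS crawler: the SITE link graph and the crawl function are hypothetical, and the model assumes that the start page is always parsed and that each discovered link is either followed or rejected.

```python
import re
from collections import deque

# Hypothetical site: each page maps to the links found on it.
SITE = {
    "index.html": ["videos/list.html", "about.html"],
    "videos/list.html": ["videos/a.mpg", "videos/b.mpg"],
    "about.html": [],
    "videos/a.mpg": [],
    "videos/b.mpg": [],
}


def crawl(start, follow):
    """Breadth-first crawl; follow(url) decides whether a link is pursued."""
    seen, queue = [start], deque([start])
    while queue:
        page = queue.popleft()
        for link in SITE.get(page, []):
            if link not in seen and follow(link):
                seen.append(link)
                queue.append(link)
    return seen


# accept used as a filter: links without .mpg are rejected at the first
# level, so the page that leads to the .mpg files is never parsed.
accept = re.compile(r"\.mpg")
with_accept = [u for u in crawl("index.html", lambda u: bool(accept.search(u)))
               if accept.search(u)]

# matchRule style: every link is followed, and the filter is applied afterward.
with_match_rule = [u for u in crawl("index.html", lambda u: True)
                   if u.endswith(".mpg")]
print(with_accept, with_match_rule)
```

In this model the accept-based crawl retrieves nothing, while the crawl that filters after traversal retrieves both .mpg files.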
Specifying Content Priority
A priority can be assigned to content objects to define their order of importance. The CDN processes content in order of priority: the higher the content priority, the sooner the content is acquired from the origin server and distributed to the Content Engines.
Note
Every content object acquired by running a crawler job has the same priority.
Three factors combine to determine content priority:
•
Channel priority—The setting selected in the Content Distribution Priority drop-down list, located in the Acquisition and Distribution Properties area of the Modifying Channels window of the Content Distribution Manager GUI.
•
Item index—Content order listed in the manifest file
•
Item priority—Priority of the attributes specified in the <item> or <crawler> tag
To calculate content priority, use either item-priority or item-index:
•
If there is a priority specified in item-priority of the manifest file for this content, use the following formula:
content-priority = channel-priority * 10000 + item-priority
Tip
The item-priority within the <item> tag can be any integer and is unrestricted. If you want a particular content object to have the highest priority, specify a very large integer value in the item-priority for that particular content object in the content-priority formula.
•
If an object does not have an item-specified priority, use the item-index order within the manifest file:
content-priority = channel-priority * 10000 + 10000 - item-index
Note
If there is no priority specified for any items, content is processed in the order listed in the manifest file.
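The two formulas can be checked with a short calculation. The following Python sketch is illustrative; the function name content_priority is hypothetical. It assumes that item-index counts from 1, and it uses the channel priority value 500 (normal), which is taken from the Sample 2 comments in this chapter.

```python
from typing import Optional


def content_priority(channel_priority: int, item_index: int,
                     item_priority: Optional[int] = None) -> int:
    """Apply the content-priority formulas given in this section."""
    if item_priority is not None:
        # An explicit item priority was specified in the manifest file.
        return channel_priority * 10000 + item_priority
    # No item priority: fall back to the position in the manifest file.
    return channel_priority * 10000 + 10000 - item_index


NORMAL = 500  # channel priority for "normal", per the Sample 2 comments

first = content_priority(NORMAL, item_index=1)                        # no priority set
second = content_priority(NORMAL, item_index=2, item_priority=20000)  # explicit priority
print(first, second)
```

Under these assumptions the first item scores 5009999 and the second scores 5020000, so the second item is processed earlier even though it is listed later in the manifest file.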
Generating a Playserver List
ACNS 5.0 software supports playservers that play back the following pre-positioned content types on the CDN: HTTP, WMT, and RTSP (RealMedia and QuickTime Streaming Server [QTSS]). The CDN checks whether the requested protocol matches the list in the playserver table. If it matches, the request is delivered. If it does not match, the request is rejected.
You can generate a playserver list using these methods:
•
The manifest file, by configuring playserver attributes in an <item> tag
•
The <playServerTable> tag, by configuring playserver MIME-type extension names
To create the playserver list directly through the manifest file, configure playserver attributes of the playserver list in an <item> tag. If an <item> tag does not have a playserver attribute, its playserver list is generated through the <playServerTable> tag. If the <playServerTable> tag is omitted in the manifest file, a built-in default <playServerTable> tag is used to generate the playserver list. Multiple servers are separated by commas, as shown in the following example:
<item src="video.mpg" playServer="real,wmt" />
You can also generate the playserver list for these streaming media types through the <playServerTable> tag. The <playServerTable> tag maps content into a playserver list based on the MIME-type extension name. If the manifest file contains a <playServerTable> tag, that tag is used instead of the built-in default <playServerTable> tag.
To generate the playserver list through the <playServerTable> tag, use MIME-type extension names to configure which playserver can play the particular pre-positioned content, as shown in the following example:
<contentType name="application/x-pn-realaudio" />
<contentType name="application/vnd.rn-rmadriver" />
<contentType name="application/pdf" />
<contentType name="application/postscript" />
The <playServerTable> tag is used to generate a playserver list for each content type. Note that in the preceding example, any file with a PDF or a PostScript extension uses HTTP to play the content. See the "Default PlayServerTable Schema" section to view the default playserver table.
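The lookup order described in this section (playServer attribute first, then the <playServerTable> tag in the manifest file, then the built-in default) can be sketched as follows. This Python model is illustrative only: the function name, the table contents, and the playserver names in DEFAULT_TABLE are hypothetical and do not reproduce the shipped default table, and returning an empty list for an unknown extension is an assumption.

```python
import os

# Hypothetical stand-in for the built-in default <playServerTable>.
DEFAULT_TABLE = {"pdf": ["http"], "ps": ["http"], "rm": ["real"]}


def playserver_list(src, play_server_attr=None, manifest_table=None):
    """Resolve the playserver list for one content item."""
    if play_server_attr:
        # A playServer attribute on the <item> tag takes effect directly;
        # multiple servers are separated by commas.
        return [name.strip() for name in play_server_attr.split(",")]
    # Otherwise use the manifest's table, or the default if none is given.
    table = manifest_table if manifest_table is not None else DEFAULT_TABLE
    extension = os.path.splitext(src)[1].lstrip(".").lower()
    return table.get(extension, [])  # assumption: unknown types map to nothing


print(playserver_list("video.mpg", play_server_attr="real,wmt"))
print(playserver_list("guide.pdf"))
print(playserver_list("talk.asf", manifest_table={"asf": ["http", "wmt"]}))
```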
Generating a Publishing URL
A publishing URL is the URL that plays back pre-positioned content in the CDN. A complete publishing URL consists of three parts:
•
Scheme
•
Domain name
•
Path
The path includes both the file directory path and the filename. The playserver list determines the publishing URL for the CDN. Again, the playserver list is generated directly through the manifest file, through the <playServerTable> tag in the manifest file, or through the default <playServerTable> tag.
Scheme
The scheme of the publishing URL is the protocol used to play the content type. For example, if an .asf video file can be played by both an HTTP and a WMT playserver, two URL schemes can be used to access this content: HTTP and MMS.
Domain Name
The domain name of the publishing URL is determined by the configuration of the CDN. If WCCP is used to redirect requests to a Content Engine, its domain name is the origin FQDN (fully qualified domain name) in the website or channel. If content routing is used, the content routing FQDN (the FQDN of the website) becomes the domain name.
Path
In most cases, the path of the publishing URL is the relative source URL, or the src attribute in the <item> tags. For content crawling, it is a relative URL, relative to the host name of the origin server.
Certain attributes in the manifest file allow you to alter the publishing URL path. These attributes are cdn-url in the <item> tag, and srcPrefix or cdnPrefix in the <crawler> and <item-group> tags. These attributes convert a relative source URL into a completely new relative CDN URL.
For the content in the following example, the path uses default.html instead of index.html.
<item src="index.html" cdn-url="default.html" />
The relative URL is always relative to the host name. In the following example, the relative URL is index.html, not sport/index.html.
<host name="http://www.cnn.com/sport/" />
<item src="index.html" />
In the following example, the srcPrefix and cdnPrefix attributes convert the prefix of every crawled content object from NBA/ to ABC/. The relative cdn-url is ABC/*. The path for the start-url is ABC/index.html.
start-url="NBA/index.html"
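The path conversions described above amount to a direct replacement or a prefix substitution. The following Python sketch is a model of that behavior, not ACNS code; the function name cdn_path and its parameter names are hypothetical, and leaving paths unchanged when they do not begin with srcPrefix is an assumption.

```python
def cdn_path(src, src_prefix=None, cdn_prefix=None, cdn_url=None):
    """Return the publishing URL path for a relative source URL."""
    if cdn_url is not None:
        # cdn-url on an <item> tag replaces the path outright.
        return cdn_url
    if src_prefix is not None and cdn_prefix is not None and src.startswith(src_prefix):
        # srcPrefix/cdnPrefix swap the leading part of every crawled path.
        return cdn_prefix + src[len(src_prefix):]
    return src  # assumption: paths outside srcPrefix are left as they are


print(cdn_path("index.html", cdn_url="default.html"))
print(cdn_path("NBA/index.html", src_prefix="NBA/", cdn_prefix="ABC/"))
print(cdn_path("index.html"))
```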
Specifying Attributes for Content Serving
Certain attributes in the manifest file can be specified to control the manner in which content is served by the Content Engines. These attributes can be specified in the <item> and <crawler> tags. These same attributes can also be specified in <item-group> or <options> tags, so they can be shared by their <item> and <crawler> subtags. Table 6-3 lists and describes these content-serving attributes.
Table 6-3 Attributes for Content Serving
Attribute | Description
noRedirectToOrigin | (Optional) Sets the redirection to the origin server to true or false. A false setting allows the CDN Content Engine to redirect content requests to the origin server if the content is not available at that device. A true setting does not allow the CDN Content Engine to redirect content requests to the origin server and generates an error. The default setting is false.
serveStartTime | (Optional) Designates a time in yyyy-mm-dd hh:mm:ss format at which the CDN is allowed to start serving the content. If the serving start time is omitted, content is ready to serve once it is distributed to the Content Engine.
serveStopTime | (Optional) Designates a time in yyyy-mm-dd hh:mm:ss format at which the CDN temporarily stops serving the content. If the serving stop time is omitted, the CDN serves the content to the Content Engine until it is removed by modifying the manifest file or renaming the channel.
alternateUrl | (Optional) If the content requested by the user is not ready in the CDN, the CDN redirects the request to this alternative URL, which can be configured as an error reporting page. This attribute supports either a full URL or a relative path. (If it is a relative path, it must be relative to the requesting URL.)
requireAuth | (Optional) Determines whether users need to be authenticated before the specified content is played. When true, authentication is required: the Content Engine communicates with the origin server to check credentials, and if the request passes the credential check, the content is played back from the Content Engine. If this attribute is omitted, a heuristic approach is used: if the specified content was acquired by using a username and password, authentication is required; otherwise, it is not required.
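The serving window defined by serveStartTime and serveStopTime can be modeled as a simple time comparison. The following Python sketch is illustrative only; the function name is_servable is hypothetical. It assumes that all times are in the same time zone with no time zone designation, that serving begins exactly at the start time, and that serving ends exactly at the stop time.

```python
from datetime import datetime
from typing import Optional

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # yyyy-mm-dd hh:mm:ss


def is_servable(now: datetime, serve_start: Optional[str] = None,
                serve_stop: Optional[str] = None) -> bool:
    """Return True if 'now' falls inside the configured serving window."""
    if serve_start is not None and now < datetime.strptime(serve_start, TIME_FORMAT):
        return False  # too early: serving has not started yet
    if serve_stop is not None and now >= datetime.strptime(serve_stop, TIME_FORMAT):
        return False  # too late: serving has stopped
    return True       # an omitted limit leaves that side of the window open


window = ("2003-04-01 00:00:00", "2003-05-01 00:00:00")
print(is_servable(datetime(2003, 4, 15, 12, 0, 0), *window))
print(is_servable(datetime(2003, 3, 31, 23, 59, 59), *window))
```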
Specifying Metadata for Content Serving
In certain situations, you must specify the metadata for content playback. For example, if content is acquired from an FTP server but must be played back with HTTP, the HTTP playback metadata, such as MIME-type and cache control, must be specified.
The <http-meta-data> subtag is used to specify HTTP metadata. Within the <http-meta-data> subtag shown in the following example, the name=value attributes are content-type="video/x-asf" and app-data="hh and dd". These are specified so that the CDN passes them directly to the end user when the HTTP content is played back.
<http-meta-data content-type="video/x-asf" app-data="hh and dd" />
As with the HTTP metadata, you can use the <wmt-meta-data> subtag shown in the following example to specify WMT streaming properties, such as title, author, and copyright date.
Title="Who Let the Dogs Out?"
Both <http-meta-data> and <wmt-meta-data> can be specified as subtags of <item> or <crawler> tags. To share the metadata among every <item> and <crawler> tag in an <item-group> tag, specify <http-meta-data> and <wmt-meta-data> as subtags of the <item-group> tag instead. If a <crawler> tag has either <http-meta-data> or <wmt-meta-data> as a subtag, each of its crawled content objects shares this metadata.
Specifying Time Values in the Manifest File
The following attributes require that you enter a time value in the format yyyy-mm-dd hh:mm:ss.
•
prefetch
•
serveStartTime
•
serveStopTime
•
expires
•
time-before
•
time-after
In the manifest file, the time string conforms to the yyyy-mm-dd hh:mm:ss (year-month-day hour:minute:second) format. A time zone designation can be optionally specified at the end of a time string to indicate the particular time zone used. If a time zone designation is omitted, the Greenwich mean time (GMT) time zone is used. For a complete list of time zone designations and their GMT offsets, see the "Manifest File Time Zone Tables" section. Note that the automatic conversion between daylight saving time and standard time within a time zone is not supported, but a special designation for daylight saving time can be used, such as PDT for Pacific daylight saving time. In the following example, the prefetch time is September 5, 2002 at 09:09:09 Pacific daylight saving time:
<options timeZone="PDT" />
<item src="index.html" prefetch="2002-09-05 09:09:09 PDT" />
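A time string with an optional time zone designation can be interpreted as shown in the following Python sketch. The sketch is illustrative only: the function name parse_manifest_time is hypothetical, the ZONES table is a small assumed subset with fixed offsets (see the "Manifest File Time Zone Tables" section for the actual designations), and the timeZone attribute of the <options> tag is not modeled.

```python
from datetime import datetime, timedelta, timezone

# Assumed subset of designations; values are fixed offsets in hours from GMT.
ZONES = {"GMT": 0, "PST": -8, "PDT": -7}


def parse_manifest_time(value: str) -> datetime:
    """Parse 'yyyy-mm-dd hh:mm:ss [ZONE]'; GMT applies when ZONE is omitted."""
    parts = value.split()
    zone = parts[2] if len(parts) == 3 else "GMT"
    naive = datetime.strptime(" ".join(parts[:2]), "%Y-%m-%d %H:%M:%S")
    # A designation missing from ZONES raises KeyError in this sketch.
    return naive.replace(tzinfo=timezone(timedelta(hours=ZONES[zone])))


prefetch = parse_manifest_time("2002-09-05 09:09:09 PDT")
print(prefetch.astimezone(timezone.utc))
```

Because the offsets are fixed, the sketch mirrors the rule that daylight saving time is never applied automatically; a daylight saving designation such as PDT must be written explicitly.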
Refreshing and Verifying the Manifest File Content
Use the expires and ttl (Time To Live) attributes of the manifest file to monitor and control the freshness of the content objects. Additionally, you can specify the GMT time zone (see the "Specifying Time Values in the Manifest File" section). The expires attribute designates a time in yyyy-mm-dd hh:mm:ss (year-month-day hour:minute:second) format at which the content is to be removed from the CDN. If no time value is set for the expires attribute, content remains stored in the CDN until it is explicitly removed by modifying the manifest file. The ttl attribute designates a time interval, in minutes, for revalidation of the content.
When content is modified or updated on the origin server, the CDN updates its copy at the time interval set by the ttl attribute. The ttl attribute represents the minimum amount of time in which the content is to be updated. As the file size and volume of the content increase, the time needed to refresh the content can exceed the time interval set by the ttl attribute. If the modifications or updates to the content are relatively large, it is more practical to fetch the entire content from the origin server.
You can monitor the status of content replication and freshness by enabling and then viewing the transaction log files that reside on the Content Engines of your CDN. To verify whether a content object or file was successfully imported to or refreshed on a particular Content Engine:
•
Enable the transaction log function on the Content Engine you want to monitor.
•
View the transaction log entries for the content object or filename that resides on that Content Engine.
Specifying Live Content
The two types of live content that you can specify in a manifest file are:
•
wmt-live
•
real-live
Use the <item> tag and specify the type attribute as either wmt-live or real-live, as shown in the following example.
<server name="wmt-server">
<host name="mms://www.company-web-site.org" />
<item src="/tmp/ceo-talk" type="wmt-live" >
<wmt-meta-data title="Company's vision" copyright="FirstName LastName" />
This is a "wmt-live" streaming content type specified by the "type" attribute. The live
stream URL is mms://www.company-web-site.org/tmp/ceo-talk. The "title" and "copyright"
metadata is added using the <wmt-meta-data> tag.
<server name="real-server">
<host name="real-server" proto="rtsp" />
<item src="tmp/funny-video" type="real-live" />
This is "real-live" streaming content type specified by the "type" attribute. The stream
URL is rtsp://real-server/tmp/funny-video.
Two live streams are specified in the preceding manifest file example. One is wmt-live with url=mms://www.company-web-site.org/tmp/ceo-talk and the other one is real-live with url=rtsp://real-server/tmp/funny-video.
More Sample Manifest Files
This section contains five sample manifest files. In XML, text between <!-- and --> is a comment and has no effect on the execution of the file. In these five samples, narrative comments have been added immediately below certain tags or groups of tags to provide you with a better understanding of what these particular tags mean. You can copy an entire sample file, save it to a text file, and then view it with Microsoft Internet Explorer.
Additionally, cross-reference links from the first occurrence of a tag to the "Manifest File Reference" section of this guide have been embedded in the narrative comments of each sample to provide you with a more in-depth explanation of the tag if you feel further explanation is necessary.
To download these sample files from Cisco.com, see the "Downloading the Sample Files" section.
Topics covered by the five sample manifest files are:
Sample 1
•
How to use HTTP, HTTPS, and FTP protocols to acquire content
•
How to specify a username and password when authentication is required
Sample 2
•
How to specify attributes for acquisition, such as:
–
ttl—Sets the time interval between content freshness checks
–
prefetch—Specifies the time when the ACNS software can start to acquire content from the origin server
•
How to specify acquisition and distribution priorities
•
How to specify the following playback attributes:
–
serveStartTime—Sets time and date to start serving this content
–
serveStopTime—Sets time and date to stop serving this content
–
alternateUrl—Provides an alternative URL to use if content has not yet been replicated at the Content Engine
–
requireAuth—Requires authentication credentials from users to play back the content
–
expires—Sets an expiration time and date for content
–
playServer—Chooses which play servers can play the specified content
–
noRedirectToOrigin—If true and content has not yet been replicated, does not redirect the incoming request to the origin server
–
<http-meta-data>—Adds attributes for HTTP playback
–
<wmt-meta-data>—Adds attributes for WMT playback
Sample 3
•
A simple crawl job
–
FTP crawl of a directory
–
HTTP crawl of a directory
–
HTTP crawl of a website
•
A simple crawl job using the <matchRule> tag
–
FTP crawl of a directory to fetch only MPEG files
–
HTTP crawl of a directory to fetch only files larger than 10 MB
–
HTTP crawl of a website, to fetch only if-modified-since (IMS) files
Sample 4
•
Items with the <contains> tag included
Sample 5
•
RealMedia and WMT streaming live content
Sample 1
Sample 1 is a manifest file written to acquire single items, some of which require a username and password for authentication purposes, with HTTP, HTTPS, and FTP.
The CdnManifest tag pair is absolutely essential for a manifest file. It must be the first
tag and is used only as a super tag for other tags.
The preceding XML uses the <server> tag to define the origin server from which to obtain
content. As specified by the "proto" attribute of the <host/> tag, the content is to be
fetched using HTTP.
<item src="myphotocollection/index.html" />
<item src="myphotocollection/myname/000001.jpg" />
The preceding XML defines single items, using the <item/> tag, to be obtained from the
origin server. The "src" specifies the relative path to the web publishing root on the
server. The full URL for the first item is
"http://my-server.xyz.com/myphotocollection/index.html".
<host name="http://my-auth-server.xyz.com"
This origin server requires user authentication. The "user" and "password" attributes
specify the required username and password to access content from the origin server. In
this case, the name attribute can have a fully qualified domain name with both protocol
and port.
<item src="mymoviecollection/index.html" />
<item src="mymoviecollection/myname/000001.wmv" />
Again, the preceding XML defines two single items to obtain from the origin server.
Because the <server> tag with name="auth-httpserver" is the closest <server> tag, it is
used as the origin server for the two items.
From this origin server, the content is to be acquired using HTTPS, or HTTP over SSL, so
that the protocol specified is HTTPS.
<item src="my_secure_photocollection/index.html" />
<item src="my_secure_photocollection/myname/000001.jpg" />
The preceding XML defines two single items to obtain from the origin server. These two
items are relative to the web publishing root on the server.
<host name="https://my-auth-server.xyz.com:443"
The preceding XML defines the origin server from which to obtain content. The content is
to be acquired using HTTPS, or HTTP over SSL, so that the protocol specified is HTTPS.
This origin server also requires user authentication. The "user" and "password" attributes
specify the required username and password to access content from the origin server. The
sslAuthType attribute is used to set either "weak" or "strong" SSL certificate checking.
For example, "weak" checking allows expired or self-signed certificates.
<item src="my_auth_moviecollection/index.html" />
<item src="my_auth_moviecollection/myname/000001.wmv" />
Again, the preceding XML defines two single items to obtain from the origin server. These
two items are relative to the web publishing root on the server.
The preceding XML defines the origin server from which to obtain content. Here, the
content is to be acquired using FTP, so that the protocol specified is FTP.
<item src="/my-doc-root/myphotocollection/index.html" />
<item src="my-doc-root/myphotocollection/file1.jpg" />
The preceding XML defines two single items to obtain from the origin server. Notice that
the first item starts with a "/" (forward slash). This means that the path is absolute, or
relative to the root directory. The second item does not start with a "/" (forward slash).
This means that the content path is relative to the default login directory for an
anonymous user.
To understand absolute and relative paths, consider the following directory listings:
The first directory lists the contents of /my-doc-root, and the second directory lists the
contents of anonymous-default-dir, where anonymous-default-dir is the default directory
for the "anonymous" user.
xyz# ls -lR /my-doc-root/
drwxrwxrwx 2 admin root 1024 Dec 28 01:46 myphotocollection
/my-doc-root/myphotocollection:
-rw-rw-rw- 1 admin root 4 Dec 28 01:46 index.html
xyz# ls -lR /anonymous-default-dir/
drwxrwxrwx 3 admin root 1024 Dec 28 01:53 my-doc-root
/anonymous-default-dir/my-doc-root:
drwxrwxrwx 2 admin root 1024 Dec 28 01:53 myphotocollection
/anonymous-default-dir/my-doc-root/myphotocollection:
-rw-rw-rw- 1 admin root 4 Dec 28 01:53 index.html
The single item with the following absolute path
<item src="/my-doc-root/myphotocollection/index.html" />
fetches the file /my-doc-root/myphotocollection/index.html.
The single item with the following relative path
<item src="my-doc-root/myphotocollection/file1.jpg" />
fetches the file /anonymous-default-dir/my-doc-root/myphotocollection/file1.jpg.
You must be careful to specify exactly what you want.
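The absolute and relative FTP path rule described above can be sketched as follows. This Python model is illustrative only; the function name ftp_target is hypothetical, and the login directory is passed in explicitly because it depends on how the FTP server is configured.

```python
import posixpath


def ftp_target(src: str, login_dir: str) -> str:
    """Return the server-side file that an FTP <item> src refers to."""
    if src.startswith("/"):
        # Leading slash: the path is absolute, relative to the root directory.
        return posixpath.normpath(src)
    # No leading slash: the path is relative to the default login directory.
    return posixpath.normpath(posixpath.join(login_dir, src))


login_dir = "/anonymous-default-dir"
print(ftp_target("/my-doc-root/myphotocollection/index.html", login_dir))
print(ftp_target("my-doc-root/myphotocollection/file1.jpg", login_dir))
```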
<host name="ftp://my-auth-server.xyz.com"
The preceding XML defines the origin server from which to obtain content. Here, the
content is to be acquired using FTP, so that the protocol specified is FTP. The origin
server requires user authentication. The "user" and "password" attributes specify the
required username and password to access content on the origin server.
<item src="/my-doc-root/mymoviecollection/index.html" />
<item src="my-own-moviecollection/wedding/file1.wmv" />
The preceding XML defines two single items to obtain from the origin server. Notice that
the first item specifies an absolute path, and the second one specifies a relative path.
In this case, the relative path is relative to the default login directory for the user
"myself".
Sample 2
Sample 2 is a manifest file written to show how to specify attributes.
alternateUrl="http://my-web-server.com/video-error-page.htm"
src: specifies the file location and is required.
prefetch: specifies the time when the ACNS software can start to acquire content from
the origin server.
ttl: checks every 60 minutes whether this file has been updated. This value is required.
noRedirectToOrigin: when true, does not redirect the request to the origin server if
the content has not yet been replicated to the Content Engine.
requireAuth: when true, requires authentication to play back this content to users.
User requests are redirected to the origin server to check credentials. If the requests
pass the credential check, the content is played back from the Content Engine.
playServer: allows the HTTP Apache server and WMT server to play back this content.
That is, the supported playback protocols for this content are HTTP and MMS.
expires: removes content from the CDN when the content expires on the specified date.
alternateUrl: redirects the user to this URL when the request to play back the content
is received but the content has not yet been replicated to the Content Engine.
priority: specifies the item-priority. Content acquisition and distribution is
processed in the order set by the overall priority. This means that the higher the overall
priority, the earlier the content is acquired and distributed. The overall priority is
calculated as channel-priority * 10000 + item-priority. Channel priority is 250 for low,
500 for normal, and 750 for high. Item-priority is 10000 - (index of the item in the
manifest file) if a priority is not specified. For example, there are two items in this
manifest file. The first item does not have a "priority" attribute, but the second item
does and its priority is 20000. The item-priority of the first item is 10000 - 1 = 9999
and the item-priority of the second item is 20000. In this example, the item priority for
this item is 50000.
serveStartTime: specifies the time CDN can start to serve this content.
serveStopTime: specifies the time CDN stops serving this content.
alternateUrl="http://my-web-server.com/video-error-page.htm"
start-url: specifies the crawling start directory "/root/data/video-files/".
depth: specifies the crawl level of three subdirectories.
noRedirectToOrigin: if true and the crawled items are not replicated to the Content
Engine, the request for that content is not redirected to the origin server.
requireAuth: if true, authentication is required for all crawled content.
playServer: all crawled content can be played back by an HTTP web server and WMT
streaming server.
expires: all crawled content expires and is deleted at the specified time.
alternateUrl: if any of the crawled items are not replicated to the Content Engine, the
request for that content is redirected to this URL.
priority: all crawled items have the same item-priority, 50000. Because they are in
the same channel, they have the same overall priority.
serveStartTime: all crawled content can be served after the specified time.
serveStopTime: no crawled content can be served after the specified time.
The <http-meta-data/> tag can be used to specify any metadata for HTTP playback. For
example, because this item is acquired using FTP, this tag must be used to specify the
content type for this MPEG file. The <wmt-meta-data/> tag can be used to specify
attributes, such as title, author, copyright, and description, for WMT playback.
<wmt-meta-data author="john" copyright="2003, Cisco Systems Inc." />
For this crawl job, three subdirectory levels are crawled under the "data/mpeg-files-2/"
folder. The <http-meta-data> tag is used to specify content-type for HTTP playback for all
crawled content. The <wmt-meta-data> tag is used to specify WMT attributes, such as author
or copyright, for all crawled content.
Sample 3
Sample 3 is a manifest file that shows how to use the crawl feature.
The preceding XML specifies an FTP crawl job to crawl the "ftp-server" using the <crawler>
</crawler> tag pair. The starting directory is "pub-data/video-files/" and the crawl depth
is 1. The files in this folder are monitored at 10-minute intervals. If files are updated,
removed, or added, the resulting change is reflected in the CDN.
This crawl job is similar to the preceding crawl job, except that it includes a
<matchRule> </matchRule> tag pair to specify the kind of content that is to be acquired.
In this case, only files with the "mpg" file extension are acquired.
<host name="http://www.ftp-server.com" />
This is an HTTP directory crawl job. The HTTP server must be configured to enable the
directory indexing feature for those directories that you want to crawl. For the Apache
server, you must modify the Apache configuration file so that it looks like the following:
If the request URL points to a directory, the web server dynamically generates an HTML
page with a list of files contained in that directory.
In this crawl job, the directory "pub-data/video-files/" and its subdirectories are
crawled to a depth level of up to 5.
This crawl job is similar to the preceding crawl job, except that it includes the
<matchRule> </matchRule> tag pair. This matchRule tag fetches only files that have the
"mpg" extension and are 10 MB or larger.
This crawl job attempts to crawl part of cnn.com. The start URL is
http://www.cnn.com/sport/index.htm. It only crawls URLs with the prefix
http://www.cnn.com/sport/, acquiring only files from the directory "sport/". The depth
value of 3 means the job is to crawl only up to 3 link levels. The max-size-in-MB value
means that crawling stops if the total of crawled items reaches 1000 MB in size.
This crawl job attempts to crawl part of cnn.com. The start URL is
http://www.cnn.com/movie/index.htm and the crawl depth is 3. This matchRule acquires only
files with the "mpg" extension modified after January 2, 2002 or files with the "asf"
extension modified after July 2, 2002.
Sample 4
Sample 4 is a manifest file written to show the purpose and use of the <contains> tag. The <contains> tag is designed to prevent serving content if the required files are not present on the Content Engine. Typically, the delivery of a presentation consists of serving multiple files. For example, if an ASF video file uses several JPG or HTML files for its presentation, but the JPG or HTML files are not present, then the ASF video is not served.
<host name="http://my-origin-server/" />
These are two regular single-item acquisition jobs.
<contains cdn-url="images/intro.html" />
The preceding item, movie1.asf, contains two other items, intro.html and intro.jpg. The
items are contained using the <contains/> tag. If these two contained items are not
present on the Content Engine, then the CDN does not serve the container file movie1.asf.
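The serving rule for contained items can be pictured with a short Python sketch. The function name, the dictionary layout, and the path images/intro.jpg are assumptions made for illustration; only the rule itself (the container is served only when every contained item is present) comes from the description above.

```python
from typing import Dict, Iterable, Set


def can_serve(item_url: str, contains: Dict[str, Iterable[str]], present: Set[str]) -> bool:
    """Return True only if the item and every item it contains are present."""
    required = [item_url] + list(contains.get(item_url, []))
    return all(url in present for url in required)


# movie1.asf contains two other items (the intro.jpg path is assumed).
CONTAINS = {"movie1.asf": ["images/intro.html", "images/intro.jpg"]}
```

If intro.jpg has not yet reached the Content Engine, can_serve returns False for movie1.asf even though the ASF file itself is present.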
Sample 5
Sample 5 is a manifest file written to show how to specify live content. ACNS 5.0 software supports two types of live content: wmt-live and real-live. You need to use the type attribute to specify live streaming content.
<host name="mms://www.company-web-site.org" />
<item src="/tmp/ceo-talk" type="wmt-live" >
<wmt-meta-data title="Company's vision" copyright="FirstName LastName" />
This is the "wmt-live" streaming content type specified by the "type" attribute. The live
stream URL is mms://www.company-web-site.org/tmp/ceo-talk. The "title" and "copyright"
metadata is added using the <wmt-meta-data> tag.
<host name="real-server" proto="rtsp" />
<item src="tmp/funny-video" type="real-live" />
This is the "real-live" streaming content type specified by the "type" attribute. The
stream URL is rtsp://real-server/tmp/funny-video.
Downloading the Sample Files
To download the preceding five sample files from Cisco.com, follow these steps:
Step 1
Go to the following URL to find the sample files:
http://www.cisco.com/pcgi-bin/tablebuild.pl/acns50
Step 2
When prompted, log in to Cisco.com using your designated Cisco.com username and password.
The Cisco ACNS Software download page appears, listing the available software updates for the Cisco ACNS software product.
Step 3
Locate the file named ACNS-5.0.1-manifest-samples.zip. This is a Zip archive containing the sample manifest files.
Step 4
Click the link for the ACNS-5.0.1-manifest-samples.zip file. The download page appears.
Step 5
Click Software License Agreement.
A new browser window opens, displaying the license agreement.
Step 6
After you have read the license agreement, close the browser window displaying the agreement and return to the Software Download page.
Step 7
Click the filename link labeled Download.
Step 8
Click Save to file and then choose a location on your workstation to temporarily store the zipped file containing the sample files.
Step 9
Use your preferred unzip program to unpack the sample files to a location on your workstation or your network.
After you have unzipped the sample files, you are ready to begin using them to create your own manifest files for your website.
Manifest Validator Utility
Because correct manifest file syntax is so important to the proper deployment of pre-positioned content on your CDN, Cisco makes available a manifest file syntax validator. The Manifest Validator, a Java-based command-line interface that verifies the correctness of the syntax of the manifest file you have written or modified, is built into the Content Distribution Manager GUI.
The Manifest Validator utility tests each line of the manifest file to identify syntax errors where they exist and determine whether or not the manifest file is valid and ready for use in importing content into your CDN. The results of these syntax validation tests are logged into a text file at a location that you name.
Running the Manifest Validator Utility
The Manifest Validator utility is built into the Content Distribution Manager GUI. Figure 6-1 shows the Manifest Validator GUI window.
Figure 6-1 Manifest Validator Content Distribution Manager GUI Window
To access the Manifest Validator, follow these steps:
Step 1
From the Content Distribution Manager GUI, choose Channels > Channels.
The Channels window appears.
Step 2
Click either the Edit or Create New Channel icon.
Step 3
From the Contents pane, choose Tools > Manifest Validator.
Note
You must first create a new channel or edit an existing channel before you can access the Manifest Validator.
Step 4
Enter the URL of the manifest file that you want to test in the Manifest File field and click Validate. The Manifest Validator checks the syntax of your manifest file to make sure that source files are named for each content item in the manifest. The Manifest Validator then checks the URL for each content item to verify that the content is placed correctly and then displays the output in the lower part of the GUI window. The Manifest Validator does not determine the size of the item.
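Before submitting a manifest to the Manifest Validator, the first of these checks (every content item names a source file) can be approximated offline. The following Python sketch is not the Manifest Validator and does not fetch any URLs; the function name and the sample manifest are hypothetical, and it relies only on the standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET
from typing import List


def items_missing_src(manifest_xml: str) -> List[int]:
    """Return the 1-based positions of <item> tags that have no src attribute."""
    root = ET.fromstring(manifest_xml)  # raises ParseError if the XML is not well formed
    return [
        position
        for position, item in enumerate(root.iter("item"), start=1)
        if not item.get("src")
    ]


# Hypothetical manifest: the second item misspells src as src1.
SAMPLE = """<CdnManifest>
  <server name="origin"><host name="http://unicorn-web" /></server>
  <item src="Media/a.wmv" />
  <item src1="Media/b.wmv" />
</CdnManifest>"""
```

For the sample above, the function reports the second item, which mirrors the src1 mistake in the invalid manifest example that follows.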
Valid Manifest File Example
The following is an example of a valid manifest file:
<server name="my-dev'box">
<host name="http://128.107.150.26"
src="/tmp/first_grader.html"
<host name="http://umark-u5.cisco.com:8080/" />
<host name="http://unicorn-web" />
<item src="Media/wmtfiles/DCA%20Disk%201/Microsoft_Logos/Logos_100k.wmv" />
The final lines of the manifest file validator's output indicate whether or not the manifest is valid. Wait until the following message is displayed, indicating that the manifest file validator has finished processing the manifest file that you pointed to:
Total Number of Warning: 0
Manifest File is CORRECT.
If errors are found, the error messages reported appear before the preceding message.
Invalid Manifest File Example
The following is an example of an invalid manifest file:
<server name="my-dev'box">
<host name="http://128.107.150.26"
src="/tmp/first_grader.html"
<host name="http://umark-u5.cisco.com:8080/" >
<host name="http://unicorn-web" />
<item src1="Media/wmtfiles/DCA%20Disk%201/Microsoft_Logos/Logos_100k.wmv" />
In the preceding example, although there are no warnings, two errors are found, and this manifest file is syntactically incorrect, as shown in the following message:
ERROR (/state/dump/tmp.xml.1040667979990 line: 23 col: 1 ):No character data is allowed by
content model
ERROR (/state/dump/tmp.xml.1040667979990 line: 23 col: 9 ):Expected end of tag 'host'
Manifest File: /state/dump/tmp.xml.1040667979990
Total Number of Warning: 0
Manifest File is NOT CORRECT!
The following is a full-text output example of the invalid manifest file after the Manifest Validator checks the file:
Manifest validated: http://qiwzhang-lnx/nfs-obsidian/Unicorn/my-single-bad.xml
The manifest is downloaded as /state/dump/tmp.xml.1040667979990 for validation, this file
will be removed when validation is completed.
name=http://128.107.150.26
src=/tmp/first_grader.html
name=http://umark-u5.cisco.com:8080/
ERROR (/state/dump/tmp.xml.1040667979990 line: 23 col: 1 ):No character data is allowed by
content model
ERROR (/state/dump/tmp.xml.1040667979990 line: 23 col: 9 ):Expected end of tag 'host'
Manifest File: /state/dump/tmp.xml.1040667979990
Total Number of Warning: 0
Manifest File is NOT CORRECT!
Understanding Manifest File Validator Output
The manifest file validator messages appear below the Manifest File field in the Manifest Validator window of the Content Distribution Manager GUI.
Each output file has a similar structure and syntax. It clearly identifies any errors or warning messages arising from incorrect manifest file syntax. Manifest files are determined by the validator to be either:
•
CORRECT—Contains possible syntax irregularities but is syntactically valid and ready for deployment on your CDN
•
INCORRECT—Contains syntax errors and is unsuitable for deployment on your CDN
Syntax Errors
The manifest file validator issues syntax errors only when it cannot identify a source file for a listed content item, either because it is not listed, or because it is listed using improper syntax. Files containing syntax errors are marked INCORRECT.
Syntax errors are identified in the output with the ERROR label. In addition to the label, the line and column numbers containing the error are provided, as well as the manifest attribute for which the error was issued. An error appears in the following example:
ERROR (/state/dump/tmp.xml.1040667979990 line: 23 col: 1 ):No character data is allowed by
content model
•
/state/dump/tmp.xml.1040667979990 is the manifest file name
•
line: 23 col: 1 is the manifest file line and column number where the error occurs
•
No character data is allowed by content model describes the type of manifest file error
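When a validator report is long, the ERROR lines can be pulled apart with a small script. The following Python sketch assumes that each error occupies a single line in the layout shown above (the line wrapping in this guide is treated as a formatting artifact); the class and function names are hypothetical, and WARNING lines are not parsed because their exact layout is not shown in this chapter.

```python
import re
from typing import List, NamedTuple


class ValidatorError(NamedTuple):
    path: str
    line: int
    col: int
    message: str


# Layout taken from the example: ERROR (<file> line: N col: M ):<message>
_ERROR_RE = re.compile(r"^ERROR\s*\((\S+)\s+line:\s*(\d+)\s+col:\s*(\d+)\s*\):\s*(.*)$")


def parse_errors(output: str) -> List[ValidatorError]:
    """Extract file name, line, column, and message from each ERROR line."""
    errors = []
    for raw in output.splitlines():
        found = _ERROR_RE.match(raw.strip())
        if found:
            path, line, col, message = found.groups()
            errors.append(ValidatorError(path, int(line), int(col), message))
    return errors


def is_correct(output: str) -> bool:
    """True if the report ends with the success verdict shown in this chapter."""
    return "Manifest File is CORRECT." in output


OUTPUT = """ERROR (/state/dump/tmp.xml.1040667979990 line: 23 col: 1 ):No character data is allowed by content model
ERROR (/state/dump/tmp.xml.1040667979990 line: 23 col: 9 ):Expected end of tag 'host'
Manifest File: /state/dump/tmp.xml.1040667979990
Total Number of Warning: 0
Manifest File is NOT CORRECT!"""
```

Sorting the parsed errors by line and column gives a convenient order in which to work through the manifest file in an editor.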
Syntax Warnings
The manifest file validator issues syntax warnings for a wide variety of irregularities in the manifest file syntax. Files containing syntax warnings may be marked CORRECT or INCORRECT, depending on whether or not syntax errors have also been issued.
Syntax warnings are identified in the output with the WARNING label. In addition to the label, the output provides the line number for which the warning is issued, the manifest attribute concerned, the valid options for that attribute, and its default value.
Correcting Manifest File Syntax
Once you have identified syntax warnings, errors, and messages using the output from the manifest file validator, you can correct your manifest file syntax and then rerun the manifest file validator on the corrected file to verify its correctness.
To correct syntax warnings and errors in your manifest file, follow these steps:
Step 1
Open your manifest file using your preferred XML editor.
Step 2
Referring to your manifest file validator output, use the line numbers provided by the manifest file validator to locate the syntax violations in your manifest file.
It is a good idea to review every warning and error in your manifest file. Some warnings, although they still allow the manifest file validator to find your manifest file syntax to be correct, can be the source of problems when you deploy the identified content to your CDN.
Step 3
After you have made the necessary corrections for syntax warnings and errors, click Save.
Step 4
Run the manifest file through the manifest file validator again and review the validator output for new or unresolved errors and warnings.
Step 5
Repeat Step 1 through Step 4 until every error and warning has been adequately resolved and the manifest file validator indicates that your manifest file syntax is correct.
Manifest File Reference
This major section contains the following topics:
•
Manifest File Structure and Syntax
•
XML Schema
•
Manifest File Automated Scripts
•
Manifest File Time Zone Tables
The most efficient and least error-prone methods of creating a manifest file are:
•
Modify one of the sample manifest files in this chapter to suit your particular needs, ensuring that your XML syntax is correct.
•
Use the two sample Perl scripts that can be downloaded from Cisco.com as is, or customize these downloaded scripts for your own purposes.
You can start with one of the prewritten sample XML manifest files presented in this chapter. Choose a sample manifest file that is closest to matching your content acquisition and pre-positioning needs, and then modify the XML code accordingly, while ensuring that your XML syntax is correct.
Alternatively, use the sample Perl scripts that Cisco provides (see the "Obtaining the Perl Scripts" section).
Once you have created a suitable manifest file, you can verify its correctness by running the Manifest Validator utility (see the "Manifest Validator Utility" section) on your newly written XML code from the Content Distribution Manager GUI.
Manifest File Structure and Syntax
The Cisco ACNS 5.0 software manifest file provides powerful features for representing and manipulating CDN data that can be easily edited using any simple text editor. Table 6-4 provides a summary list of the manifest file tags, their corresponding attributes and subelements, and a brief description of each tag. Table 6-5 shows an example of how tags are nested in a manifest file. The sections that follow provide a more detailed description of the manifest file tags, the data they contain, and their attributes.
Table 6-4 Manifest File Tag Summary
Tag Name | Subelements | Attributes | Description
CdnManifest | <playServerTable/> <options/> <server/> <item/> <item-group/> <crawler/> | None | Marks the beginning and end of the manifest file content.
playServerTable | <playServer/> | None | (Optional) Sets default mappings for media types.
playServer | <contentType/> <extension/> | name: real, http, qtss, wmt | Names the media server type on the Content Engine responsible for playing content types and files with extensions mapped to it using <contentType> tags.
contentType | None | name: http, media, qtss, real, wmt (see Table 6-6) | (Optional, but must have either contentType or extension) Names the MIME-type content mapped to a playserver.
extension | None | name: http, media, qtss, real, wmt (see Table 6-6) | (Optional, but must have either contentType or extension) Names the file extension that is mapped to a playserver.
options | None | timeZone alternateUrl expires noRedirectToOrigin playServer prefetch serveStartTime serveStopTime server priority ttl type | (Optional) Defines attributes specific to the manifest file that can be shared.
server | <host/> | name | Defines only one host from which content is to be retrieved.
host | None | name proto port user password unencoded sslAuthType | Defines a web server or live server from which content is to be retrieved and later pre-positioned.
item | <contains/> <wmt-meta-data/> <http-meta-data/> | src cdn-url type noRedirectToOrigin playServer prefetch expires ttl serveStartTime serveStopTime alternateUrl priority requireAuth | Identifies specific content that is to be acquired from the origin server.
crawler | <wmt-meta-data/> <http-meta-data/> <matchRule/> | start-url depth prefix accept reject max-number max-size-in-MB srcPrefix cdnPrefix requireAuth noRedirectToOrigin playServer prefetch expires ttl serveStartTime serveStopTime alternateUrl priority server type | Supports crawling of a website or FTP server.
item-group | <wmt-meta-data/> <http-meta-data/> <matchRule></matchRule> <crawler></crawler> <item></item> | alternateUrl expires noRedirectToOrigin playServer prefetch serveStartTime serveStopTime server priority srcPrefix cdnPrefix ttl type requireAuth | Places shared attributes under one tag so that they can be shared by every <item> and <crawler> tag within that group.
matchRule | <match> | None | (Optional) Defines additional filter rules for crawler jobs.
match | None | MIME-type extensions time-before time-after size-min-in-KB size-max-in-KB | (Optional) Specifies the acquisition criteria of content objects before they can be acquired by the CDN.
contains | None | cdn-url | (Optional) Identifies content objects that are embedded within the content item currently being described.
wmt-meta-data | None | name=value | (Optional) Specifies one or more file attributes that are displayed in the Windows Media Player when the file is played back.
http-meta-data | None | name=value | (Optional) Sends HTTP response headers to end user HTTP requests to specify content type for FTP acquired content.
Table 6-5 Manifest File Nested Tag Relationships
<CdnManifest>
    <playServerTable>
        <playServer>
            <contentType/> <extension/>
        </playServer>
    </playServerTable>
    <options>
        Manifest file shared attributes
    </options>
    <server>
        <host/>
    </server>
    <item>
        <contains/> <wmt-meta-data/> <http-meta-data/>
    </item>
    <crawler>
        <wmt-meta-data/> <http-meta-data/> <matchRule/>
    </crawler>
    <item-group>
        <contains/> <wmt-meta-data/> <http-meta-data/>
    </item-group>
</CdnManifest>
CdnManifest
The <CdnManifest> </CdnManifest> tag set is required and marks the beginning and end of the manifest file content. Each <CdnManifest> tag set must contain at least one item, or content object, that is fetched and stored.
Attributes
None
Subelements
The <CdnManifest> tag set can contain the following subelements:
•
playServerTable
The <CdnManifest> tag set can only contain one playServerTable subelement.
•
options
The <CdnManifest> tag set can only contain one options subelement.
•
server
•
item
•
item-group
•
crawler
Example
<CdnManifest>
<server name="origin-server">
<host name="www.name.com" proto="http" port="80" />
</server>
<item cdn-url="logo.jpg" server="origin-server" src="images/img.jpg" type="prepos"
playServer="http" ttl="300"/>
</CdnManifest>
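The structural rules stated in this section can be checked with a few lines of Python. This is a minimal sketch, not a replacement for the Manifest Validator: the function name is hypothetical, it uses only the standard xml.etree.ElementTree module, and it assumes that a <crawler> tag also satisfies the requirement for at least one content object.

```python
import xml.etree.ElementTree as ET
from typing import List


def check_structure(manifest_xml: str) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    root = ET.fromstring(manifest_xml)
    problems = []
    if root.tag != "CdnManifest":
        problems.append("root element is not <CdnManifest>")
    # At most one playServerTable and one options subelement are allowed.
    for tag in ("playServerTable", "options"):
        if len(root.findall(tag)) > 1:
            problems.append(f"more than one <{tag}> element")
    # Each <server> tag set holds exactly one <host>.
    for server in root.iter("server"):
        if len(server.findall("host")) != 1:
            problems.append(f"<server name={server.get('name')!r}> must contain exactly one <host>")
    # At least one content object must be named (crawler counted by assumption).
    if not any(True for tag in ("item", "crawler") for _ in root.iter(tag)):
        problems.append("no <item> or <crawler> element found")
    return problems


GOOD = """<CdnManifest>
  <server name="origin-server">
    <host name="www.name.com" proto="http" port="80" />
  </server>
  <item cdn-url="logo.jpg" server="origin-server" src="images/img.jpg"
        type="prepos" playServer="http" ttl="300" />
</CdnManifest>"""

BAD = """<CdnManifest>
  <options ttl="300" />
  <options ttl="60" />
  <server name="s1" />
</CdnManifest>"""
```

The GOOD manifest produces no problems, while the BAD manifest is flagged for its duplicate <options> tag, its <server> tag without a <host>, and its lack of content items.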
playServerTable
The <playServerTable> </playServerTable> tag set is optional and provides a means for you to set default mappings for a variety of media types. Mappings can be set for both MIME-type content (the preferred mapping) and file extensions. Playserver tables allow you to override default mappings on the Content Engine for content types from a particular origin server. Playservers can be any one of the following four streaming servers: WMT, RealMedia, HTTP, or QTSS. If no <playServerTable> tag is configured in the manifest file, a default <playServerTable> tag is used.
Using the manifest file, you can map groups of content items as well as individual content objects to an installed playserver. The following are content item and manifest file playserver mappings:
•
Content item URL
Playserver mappings appear immediately after the origin server name in place of the default <cdn-media> tag.
•
Manifest file as an attribute of the <item> or <item-group> tag
Playserver mappings placed at this location are identified using the playServer attribute and only apply to the named item or group of items.
•
Manifest file as a playserver table
Mappings are grouped within the <playServerTable> and <playServer> tags and are applied to content served from the origin server as directed by the manifest file.
•
System-level
Playserver mappings are configured during CDN startup.
The <playServerTable> tags are enclosed within the <CdnManifest> tags and name at least one of four playservers, such as RealServer, to which certain MIME-types and file extensions are mapped.
Attributes
None
Subelements
The <playServerTable> element must contain at least one <playServer> tag.
playServer
The <playServer> </playServer> tag set is required for the <playServerTable> tag and names the media server type on the Content Engine that is responsible for playing the content types and files with extensions mapped to it using the <contentType> tags. The <playServer> tag is enclosed within <playServerTable> tags.
Note
Do not confuse the <playServer> tag with the playserver setting in an <item> or <item-group> tag. An <item> or <item-group> tag specifies a server type to be used for an individual content object or group of related content objects. Although both playserver settings accomplish the same task, <item> tag-level playserver settings take precedence over the content-type and file extension mappings specified by the <playServer> tags in the <playServerTable> tag.
Attributes
The name attribute of the <playServer> tag is required. Each <playServer> tag uses the name attribute to name the type of server to which content is mapped. In ACNS 5.0 software, Content Engines support four types of playservers:
•
real: RealMedia RealServer
•
http: HTTP web server
•
qtss: Apple QuickTime Streaming Server
•
wmt: Microsoft Windows Media Technologies
Subelements
At least one of the following subelements must be present in a <playServer> tag set.
•
<contentType/>
•
<extension/>
contentType
The <contentType> tag is optional, but either a <contentType> or an <extension> subelement must be present in a <playServer> tag set. The <contentType> tag names MIME-type content that is to be mapped to a playserver. The <contentType> tag must be enclosed within a <playServer> tag set. When both <contentType> and <extension> tags are present in a <playServerTable> tag for a particular media type, the <contentType> mapping takes precedence.
Attributes
Each <contentType> tag names a media content type that is to be mapped to the playserver using the name attribute. The name attribute is required. Table 6-6 lists supported media types.
Subelements
None
Table 6-6 Supported Media File Formats Grouped by Manifest File Content Type
Content Type | Supported Formats | Notes
http | Audio Video Interleave (AVI); Graphics Interchange Format (GIF); Hypertext Markup Language (HTML, HTM); Joint Photographic Experts Group (JPG); Microsoft PowerPoint (PPT); Microsoft Word (DOC); Moving Picture Experts Group (MPEG, MPG); MPEG Audio Layer 3 (MP3); Portable Document Format (PDF); QuickTime Movie (MOV); ASX | The content item is processed by an HTTP server. This tag is used for content that cannot be streamed by any of the servers, for example, Adobe PDF, PostScript (PS), and MPG files.
media | AVI; GIF; HTML, HTM; JPG; PPT; DOC; MPEG, MPG; MP3; PDF | This is the default value used by the Cisco ACNS 5.0 software. Use this media tag when no playserver is specified to process a content object. The linked object can be either a pre-positioned or a live content object.
qtss | QuickTime (QT); MOV | The content object is processed by the Apple QuickTime Streaming Server.
real | RealAudio (RA); RealMedia (RM); RealPix (RP); RealText (RT); Synchronized Multimedia Integration Language (SMIL) | The content object is processed by the RealServer.
wmt | ASF (includes WMA and WMV); ASX | The content object is processed by Windows Media Services.
extension
The <extension> tag is optional but either a <contentType> or an <extension> subelement must be present in a <playServer> tag set. The <extension> tag names the file extension that is being mapped to a playserver.
The <extension> tag follows the <playServer> tag. When both <contentType> and <extension> tags are present in the <playServer> tag for a particular media type, the <contentType> mapping takes precedence.
Attributes
The name attribute is required and provides the file extension for a mapped content type. When files with the named extension are requested, the mapped playserver is used to serve them.
Subelements
None
Example
<contentType name="application/x-pn-realaudio" />
<contentType name="application/vnd.rn-rmadriver" />
<contentType name="application/pdf" />
<contentType name="application/postscript" />
<server name="test.origin.com/">
<host name="http://tst.orgn.com" proto="http" />
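The precedence rules described for playserver mappings can be summarized in a short Python sketch. It is an illustration under stated assumptions rather than the Content Engine logic: the function name, the fallback parameter, and the sample mappings (for example, application/x-pn-realaudio mapped to real) are hypothetical. Only the ordering comes from this chapter: an item-level playServer setting wins over the playserver table, and within the table a <contentType> mapping wins over an <extension> mapping.

```python
from typing import Mapping, Optional


def resolve_playserver(
    item_playserver: Optional[str],
    mime_type: Optional[str],
    extension: Optional[str],
    by_content_type: Mapping[str, str],
    by_extension: Mapping[str, str],
    fallback: str,
) -> str:
    """Pick a playserver using the precedence described in this chapter."""
    if item_playserver:                    # playServer attribute on <item> or <item-group>
        return item_playserver
    if mime_type in by_content_type:       # <contentType> mapping
        return by_content_type[mime_type]
    if extension in by_extension:          # <extension> mapping
        return by_extension[extension]
    return fallback                        # stands in for the default playserver table


# Hypothetical playserver table contents.
BY_CONTENT_TYPE = {"application/x-pn-realaudio": "real", "application/pdf": "http"}
BY_EXTENSION = {"ra": "wmt", "pdf": "http"}
```

In this sketch, a RealAudio file whose extension is mapped to wmt is still served by real, because the content type mapping takes precedence.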
options
The <options> tag is optional and is used to define attributes specific to the manifest file. Shared attributes can be inherited by <item> and <crawler> tags in the manifest file. For example, timeZone is an attribute specific to the manifest file that is used to set the time zone for all time-related values. Attributes such as ttl and alternateUrl can be set in the <options> tag, and their values can be shared by all <item> and <crawler> tags within the manifest file.
The <options> tag set is enclosed within the <CdnManifest> tag set and specifies at least one global setting. No more than one <options> tag is allowed per manifest file.
If parameters are defined within the manifest file <options>, <item-group>, or <item> tags, the order of precedence from lowest to highest is <options>, <item-group>, and <item>.
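This order of precedence amounts to layering three sets of attributes, with later layers overriding earlier ones. The following Python sketch shows the idea; the function name and the sample attribute values are hypothetical.

```python
from typing import Dict, Mapping


def effective_attributes(
    options: Mapping[str, str],
    item_group: Mapping[str, str],
    item: Mapping[str, str],
) -> Dict[str, str]:
    """Merge attributes from lowest precedence (<options>) to highest (<item>)."""
    merged: Dict[str, str] = {}
    for level in (options, item_group, item):
        merged.update(level)  # a later level overrides values set by an earlier one
    return merged


# Hypothetical values: ttl is set at all three levels.
OPTIONS = {"ttl": "300", "type": "prepos"}
ITEM_GROUP = {"ttl": "60", "playServer": "http"}
ITEM = {"src": "index.html", "ttl": "5"}
```

Here the item ends up with ttl="5" from its own tag, playServer="http" from its group, and type="prepos" from the <options> tag.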
Attributes
The timeZone attribute specifies the time zone for time values of attributes such as serveStartTime, serveStopTime, expires, and prefetch.
The following list of attributes can be shared by <item> and <crawler> tags. See the "item" section for descriptions of the following attributes:
•
alternateUrl
•
expires
•
noRedirectToOrigin
•
playServer
•
prefetch
•
serveStartTime
•
serveStopTime
•
server
•
priority
•
ttl
•
type
•
requireAuth
Subelements
None
Example
noRedirectToOrigin= "true"
server
The <server> and <host> tags configure the origin server that is the content source. The <host> tag inside the <server> tag configures the content source host. Multiple <host> tags in one <server> tag are not supported in ACNS 5.0 software.
Each <item> or <item-group> tag can have a server attribute that refers to this <server> tag. The <server> </server> tag set is required and defines only one host from which content is to be retrieved. The <server> tags are contained within <CdnManifest> tags and contain one <host> tag that identifies the host from which content is retrieved.
Attributes
•
name
The name attribute is required and can be any name as long as it matches the server attribute values in the <item> or <crawler> tags.
Subelements
<host/>
The <server> tag set can contain only one <host> subelement.
host
The <host> tag is required and defines a web server or live server from which content is to be retrieved and later pre-positioned. Only one host can be defined within a single <server> tag set. The <host> tag must be enclosed within <server> tags.
Attributes
•
name
The name attribute is required and identifies the domain name or IP address of the host, unless the proto attribute field is empty. If the proto attribute field is empty, the name attribute must be a fully qualified URL, including scheme and domain name or IP address. It can also include subdirectories, such as http://www.abc.com/media.
•
proto
The proto attribute is optional and identifies the communication protocol that is used to fetch content from the host. Supported protocols are HTTP, HTTPS, and FTP. The default proto attribute value is HTTP. The proto attribute can be empty if the name attribute is a fully qualified URL that includes the scheme.
•
port
The port attribute is optional and identifies the TCP port through which traffic to and from the host passes. The port used depends on the protocol used. The default port for HTTP is 80. The port attribute is only required for a nonstandard port assignment. The port attribute can also be specified in the name attribute, such as name="http://www.cisco.com:8080/".
•
user
The user attribute is optional and identifies the secure login used for host access.
•
password
The password attribute is optional and identifies the password for the user account that is required to access the host server.
•
unencoded
The unencoded attribute is optional. If set to true, the password is not encoded. The unencoded attribute default setting is false.
•
sslAuthType
The sslAuthType attribute is optional and has two possible values for the type of encryption:
–
strong
The default sslAuthType attribute setting is strong.
–
weak
Subelements
None
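The way the name, proto, and port attributes combine into a source URL can be sketched in Python. The function names are hypothetical, and the handling shown (HTTP as the default protocol, a port embedded in the name taking priority over the port attribute) is an assumption based on the attribute descriptions above, not a statement of how ACNS software resolves conflicts.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def host_base_url(name: str, proto: Optional[str] = None, port: Optional[int] = None) -> str:
    """Build the base URL of a <host> from its name, proto, and port attributes."""
    # A name that already carries a scheme is used as is; otherwise proto (default http) is added.
    url = name if "://" in name else f"{proto or 'http'}://{name}"
    parts = urlsplit(url)
    netloc = parts.netloc
    if port is not None and parts.port is None:
        netloc = f"{parts.hostname}:{port}"
    return urlunsplit((parts.scheme, netloc, parts.path.rstrip("/"), "", ""))


def item_source_url(base_url: str, src: str) -> str:
    """Join a host base URL with the relative src path of an item."""
    return f"{base_url}/{src.lstrip('/')}"
```

Applied to the live content in Sample 5, host name "real-server" with proto "rtsp" and src "tmp/funny-video" gives rtsp://real-server/tmp/funny-video, which is the stream URL quoted in that sample.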
item
The <item> </item> tag set identifies the specific content that is to be acquired. The <item> tag names a single piece of content or a content object on the origin server, such as a graphic, MPEG video, or RealAudio sound file. Content items can be listed individually or grouped using the <item-group> tag.
The <item> tag must be enclosed within the <CdnManifest> tag set and can also be enclosed within <item-group> tags.
Attributes
•
src
The src attribute is required and identifies the path of the content relative to the origin server. For example:
•
server
The server attribute is optional and refers to the server name in the <server> tag. If the server attribute is omitted, the server listed in the closest <server> tag is used. If there is no <server> tag close to this <item>, the manifest server is used.
•
cdn-url
The cdn-url attribute is optional and is the relative CDN URL to allow end users to access this content. If no cdn-url value is specified, then the src value is used as the relative CDN URL.
Note
If you use FTP to acquire content and the content type is not specified in the manifest file and the cdn-url attribute is used to alter your publishing URL, the cdn-url attribute must have the correct extension. Otherwise, the incorrect content type will be generated and you cannot play the content.
•
type
The type attribute is optional and defines whether content is to be pre-positioned or live on the CDN. The three type values are prepos, wmt-live, and real-live. The wmt-live and real-live values are used to deliver live content. If this field is left blank, the default type is prepos.
•
noRedirectToOrigin
The noRedirectToOrigin attribute is optional and sets whether redirection to the origin server is disabled (true) or enabled (false). A false setting allows the CDN Content Engine or other edge device to redirect content requests to the origin server if the content is not available at that device. A true setting does not allow the CDN Content Engine or edge device to redirect content requests to the origin server; instead, it generates an error. The default noRedirectToOrigin setting is false. For the effect of the noRedirectToOrigin attribute on pre-positioned content freshness, see the "Configuring Freshness of Pre-Positioned Content" section.
•
playServer
The playServer attribute is optional and names the server used to play back the content. Valid playservers are real (RealServer), wmt (Windows Media Technologies), qtss (QuickTime Streaming Server), and http (web server). The value in this field is either one playserver or multiple playservers separated by commas. If a value for this attribute is left blank, the <playServerTable> tag in the manifest file is used to generate the playserver list for this content. If the manifest file does not have the <playServerTable> tag specified, it uses the default <playServerTable> tag.
•
prefetch
The prefetch attribute is optional and designates a time in yyyy-mm-dd hh:mm:ss (year-month-day hour:minute:second) format at which the content is to be retrieved from the origin server. The time zone for the time can be specified in the <options> tag. Note that the automatic conversion between daylight saving time and standard time within a time zone is not supported, but a special designation for daylight saving time can be used, such as PDT for Pacific daylight saving time. In the following example, the prefetch time is September 5, 2002 at 09:09:09 Pacific daylight saving time:
<options timeZone="PDT" />
<item src="index.html" prefetch="2002-09-05 09:09:09 PDT" />
If a time value is omitted, the content is acquired immediately.
•
expires
The expires attribute is optional and designates a time in yyyy-mm-dd hh:mm:ss format when the content is to be removed from the CDN. Additionally, you can specify the GMT time zone (see the "Specifying Time Values in the Manifest File" section). If a time value is omitted, content is stored at the CDN until it is removed when you modify the relevant manifest file code. For the effect of the HTTP header expires attribute on pre-positioned content freshness, see the "Configuring Freshness of Pre-Positioned Content" section.
•
ttl
The ttl attribute is optional and designates a time interval, in minutes, for revalidation of the content. If a time value is omitted, the content is fetched only once and its freshness is never checked again.
•
serveStartTime
The serveStartTime attribute is optional and designates a time in yyyy-mm-dd hh:mm:ss format when the CDN is allowed to start serving the content. If the time to serve is omitted, content is ready to serve once it is distributed to the Content Engine or other edge device.
•
serveStopTime
The serveStopTime attribute is optional and designates a time in yyyy-mm-dd hh:mm:ss format when the CDN temporarily stops serving the content. If the time to stop serving is omitted, the CDN serves the content until it is removed by modifying the relevant manifest file code. For the effect of the serveStopTime attribute on pre-positioned content freshness, see the "Configuring Freshness of Pre-Positioned Content" section.
•
alternateUrl
The alternateUrl attribute is optional. If content requested by the user is not ready in the CDN, the CDN redirects the request to this alternative URL, which can be configured as an error reporting page. The alternateUrl attribute supports either a full URL or a relative path. (If the alternateUrl attribute is a relative path, it must be relative to the requesting URL.)
•
priority
The priority attribute is optional and can be any integer value to specify the content processing priority. If a priority value is omitted, its index order within the manifest file is used to set the priority.
•
requireAuth
The requireAuth attribute is optional and determines whether users need to be authenticated to play the specified content. When requireAuth is true, the Content Engine communicates with the origin server to check the user's credentials and plays back the content from the Content Engine only if the request passes the credential check. If this attribute is omitted, a heuristic is used: if the specified content was acquired by using a username and password, authentication is required; otherwise, it is not required.
Subelements
•
<contains/>
•
<wmt-meta-data/>
•
<http-meta-data/>
Example
alternateUrl="http://www.cisco.com/cdn-error.html"
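The serving window defined by the serveStartTime and serveStopTime attributes described above can be sketched as follows. This is a minimal illustration, not the Content Engine's implementation: the function name is_servable is hypothetical, time zone handling is ignored, and the treatment of a request that arrives exactly at a boundary is an assumption.

```python
from datetime import datetime
from typing import Optional

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # the yyyy-mm-dd hh:mm:ss format used by the manifest


def is_servable(now: datetime,
                serve_start: Optional[str] = None,
                serve_stop: Optional[str] = None) -> bool:
    """Return True if `now` falls inside the serving window (hypothetical helper)."""
    # An omitted serveStartTime means the content is ready as soon as it is distributed.
    if serve_start is not None and now < datetime.strptime(serve_start, TIME_FORMAT):
        return False
    # An omitted serveStopTime means the content is served until it is removed.
    # Assumption: a request arriving exactly at the stop time is not served.
    if serve_stop is not None and now >= datetime.strptime(serve_stop, TIME_FORMAT):
        return False
    return True
```

For example, is_servable(datetime(2024, 7, 1), "2024-06-01 00:00:00", "2024-12-31 23:59:59") falls inside the window.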
crawler
The <crawler> </crawler> tag set supports crawling a website or an FTP server.
Attributes
•
start-url
The start-url attribute is required. It defines the URL at which to start the process of crawling the website or FTP server. For an FTP server crawl, the start-url attribute must be a directory path with a forward slash as its last character. The start-url attribute defines a relative path, and the FTP server host name is necessary to compose the complete URL.
•
depth
The depth attribute is optional and defines the link depth to which a website is to be crawled or directory depth to which an FTP server is to be crawled. If the depth is not specified, the default is 20. The following are the general depth values:
0 = acquire only the starting URL
1 = acquire the starting URL and its referred files
-1 = infinite or no depth restriction
The depth is defined as the level of a website or the directory level of an FTP server, where 0 is the starting URL.
•
prefix
The prefix attribute is optional and combines the host name from the <server> tag with the value of the prefix attribute to create a full prefix. Only content whose URLs match the full prefix is acquired. For example:
<server name="xx"> <host name="www.cisco.com" proto="https" port="433" /> </server>
and in a <crawler> tag:
prefix="marketing/eng/"
The full prefix is "https://www.cisco.com:433/marketing/eng/". Only URLs that match this prefix are crawled.
If a prefix is omitted, the crawler checks the default full prefix, which is the host name portion of the URL from the server. In the previous example, the default full prefix is "https://www.cisco.com:433".
•
accept
The accept attribute is optional and uses a regular expression to define acceptable URLs to crawl in addition to matching the prefix. For example, accept="stock" means that only URLs that meet two conditions are searched: the URL matches the prefix and contains the string "stock". (See the "Writing Common Regular Expressions" section for more information on using regular expressions.)
•
reject
The reject attribute is optional and uses a regular expression to reject a URL if it matches the reject regular expression. The reject regular expression is checked after checking for a prefix URL match. If a URL does not match the prefix, it is immediately rejected. If a URL matches the prefix and the reject parameters, it is rejected by the particular reject constraint. (See the "Writing Common Regular Expressions" section for more information on using regular expressions.)
•
max-number
The max-number attribute is optional and specifies the maximum number of content objects that this crawler job can acquire.
•
max-size-in-MB
The max-size-in-MB attribute is optional and specifies the maximum content size in megabytes that this crawler job can acquire. The maximum size can also be expressed in kilobytes or bytes by using the max-size-in-KB or max-size-in-B attribute.
•
srcPrefix
The srcPrefix attribute is optional and must be used in conjunction with the cdnPrefix attribute to form a relative CDN URL. If a srcPrefix attribute is not specified, or if the prefix of the relative source URL does not match the srcPrefix attribute, then the relative CDN URL is the cdnPrefix value combined with the relative source URL. For example, if a set of content objects share the source URL prefix "acme/pubs/docs/online/Design/" and you want to replace this prefix with the simpler "online/", specify srcPrefix="acme/pubs/docs/online/Design/" and cdnPrefix="online/".
•
cdnPrefix
The cdnPrefix attribute is optional and must be used in conjunction with the srcPrefix attribute.
•
requireAuth
The requireAuth attribute is optional and determines whether users need to be authenticated to play the specified content. When requireAuth is true, the Content Engine communicates with the origin server to check credentials. If the request passes the credential check, the content is played back from the Content Engine. If this attribute is omitted, a heuristic approach is used: if the specified content was acquired by using a username and password, authentication is required; otherwise, it is not required.
The following attributes, described under the <item> tag attributes, can also be specified in the <crawler> tag.
•
alternateUrl
•
expires
•
noRedirectToOrigin
•
playServer
•
prefetch
•
serveStartTime
•
serveStopTime
•
server
•
priority
•
ttl
•
type
Subelements
•
<wmt-meta-data/>
•
<http-meta-data/>
•
<matchRule></matchRule>
Example
<host name="http://www.cisco.com/jobs/" />
start-url="eng/index.html"
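The way the depth, prefix, accept, and reject attributes narrow the scope of a crawl can be sketched as follows. This is an illustration only: the function names full_prefix and in_scope are hypothetical, prefix matching is assumed to be a simple "starts with" test, and Python's re module stands in for whatever regular expression engine the crawler actually uses.

```python
import re
from typing import Optional


def full_prefix(proto: str, host: str, port: Optional[int], prefix: str = "") -> str:
    """Combine the <host> values with the crawler prefix attribute."""
    authority = host if port is None else f"{host}:{port}"
    base = f"{proto}://{authority}"
    # With no prefix attribute, the default full prefix is the host portion only.
    return f"{base}/{prefix.lstrip('/')}" if prefix else base


def in_scope(url: str, link_depth: int, full: str, depth: int = 20,
             accept: Optional[str] = None, reject: Optional[str] = None) -> bool:
    """Decide whether a URL found at `link_depth` is crawled (hypothetical helper)."""
    if depth != -1 and link_depth > depth:   # -1 means no depth restriction
        return False
    if not url.startswith(full):             # a prefix mismatch rejects immediately
        return False
    if accept is not None and not re.search(accept, url):
        return False
    if reject is not None and re.search(reject, url):
        return False
    return True
```

With the values from the prefix example, full_prefix("https", "www.cisco.com", 433, "marketing/eng/") yields "https://www.cisco.com:433/marketing/eng/".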
item-group
The <item-group> </item-group> tag set is used to place shared attributes under one tag so that they can be shared by every <item> and <crawler> tag within that group. When attributes are shared, it means that attributes can be defined at either the <item-group> tag level for group-wide control or on a per <item> or per <crawler> tag basis. For example, if every <item> tag is using the same server and ttl attribute, you can create an <item-group> tag on top of these <item> tags and place the server and ttl attributes in the <item-group> tag.
Using shared attributes makes any manifest file with many <item> tags more efficient by consolidating the <item> tags with shared attributes. If the same attribute value exists in both the <item-group> and <item> tags, the value in the <item> tag takes precedence over that value in the <item-group> tag.
The <item-group> tag must be enclosed within the <CdnManifest> tag set and contain one or more <item> or <crawler> tags.
Attributes
If an attribute value is present only at the <item-group> tag level, it is inherited by the <item> tags within the group. The attributes of a crawler job, whether inherited or owned, are propagated to the content fetched by that crawler job.
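The inheritance and precedence of shared attributes can be pictured as a dictionary merge in which the inner tag wins. This is a sketch under that assumption; the function name effective_attributes is hypothetical.

```python
def effective_attributes(group_attrs: dict, tag_attrs: dict) -> dict:
    """Merge <item-group> attributes with those of an inner <item> or <crawler> tag."""
    merged = dict(group_attrs)   # start from the group-wide values
    merged.update(tag_attrs)     # values on the inner tag take precedence
    return merged


# The group supplies server and ttl; the item overrides ttl only.
group = {"server": "origin-web-server", "ttl": "300"}
item = {"src": "animlogo.mpg", "ttl": "60"}
print(effective_attributes(group, item))
```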
The following attributes can be shared across many <item> and <crawler> tags and are candidates for the <item-group> level tag. See the "item" section for detailed descriptions of the following attributes:
•
alternateUrl
•
expires
•
noRedirectToOrigin
•
playServer
•
prefetch
•
serveStartTime
•
serveStopTime
•
server
•
priority
•
ttl
•
type
•
requireAuth
Additionally, the following two attributes can be placed within the <item-group> tag. See the "crawler" section for a detailed description of the two following attributes:
•
srcPrefix
•
cdnPrefix
These two attributes convert the prefix of the src-url (retrieve URL) to the cdn-url (publish URL) for multiple content objects. These content objects are either explicitly specified by multiple <item> tags or acquired through a crawler job.
These two attributes can also be specified in the <crawler> tag. If you explicitly specify the srcPrefix attribute and cdnPrefix attribute for an individual <crawler> job, the <crawler> tag-level specification takes precedence over the <item-group> tag-level settings. If you do not specify these attributes for an individual <crawler> job, the <item-group> tag-level specification is inherited by the <crawler> job.
The srcPrefix and cdnPrefix attributes generate the relative CDN URL using the following rules:
•
If the cdn-url attribute is present in the <item> tag, the relative CDN URL is the cdnPrefix value followed by the cdn-url value. For example, if cdnPrefix="eng/spec" and cdn-url="e/f.html", the relative path in the URL is "eng/spec/e/f.html".
•
If the srcPrefix attribute is not specified, the relative CDN URL is the cdnPrefix value followed by the relative source URL.
•
If the prefix of the relative source URL does not match the srcPrefix attribute, the relative CDN URL is the cdnPrefix value followed by the relative source URL.
•
Otherwise, the relative CDN URL is generated by removing the matched prefix from the relative source URL and replacing it with the cdnPrefix value.
The relative CDN URL of the <item> tag in the following example is "acme/default.html".
<item-group cdnPrefix="acme/" >
<item src="design/index.html" cdn-url="default.html" />
</item-group>
In the following example, content objects whose relative source URL begins with the srcPrefix value "design/plan/" have a relative CDN URL that consists of "acme/" followed by the relative source URL stripped of "design/plan/". Other content objects, whose prefix does not match "design/plan/", have "acme/" followed by their original relative source URL.
start-url="design/plan/index.html"
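The rules for generating the relative CDN URL from the srcPrefix and cdnPrefix attributes can be sketched as a small function. This is a minimal sketch, not the product's implementation: the function name relative_cdn_url is hypothetical, and the handling of the slash between the two parts is inferred from the cdnPrefix="eng/spec" example in this section.

```python
from typing import Optional


def _join(cdn_prefix: str, rest: str) -> str:
    """Join the two parts with exactly one slash (assumed from the eng/spec example)."""
    if not cdn_prefix:
        return rest
    return cdn_prefix.rstrip("/") + "/" + rest.lstrip("/")


def relative_cdn_url(src: str, cdn_prefix: str = "",
                     src_prefix: Optional[str] = None,
                     cdn_url: Optional[str] = None) -> str:
    # Rule 1: an explicit cdn-url on the <item> tag is appended to cdnPrefix.
    if cdn_url is not None:
        return _join(cdn_prefix, cdn_url)
    # Rules 2 and 3: no srcPrefix, or no match, keeps the whole relative source URL.
    if src_prefix is None or not src.startswith(src_prefix):
        return _join(cdn_prefix, src)
    # Rule 4: the matched srcPrefix is replaced by cdnPrefix.
    return _join(cdn_prefix, src[len(src_prefix):])
```

For example, relative_cdn_url("design/plan/index.html", "acme/", "design/plan/") gives "acme/index.html", while a source URL outside "design/plan/" keeps its full relative path after "acme/".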
Subelements
•
<matchRule></matchRule>
•
<wmt-meta-data />
•
<http-meta-data/>
•
<crawler></crawler>
•
<item></item>
Example
<!--grouped content items-->
<item-group server="origin-web-server" type="prepos" ttl="300" cdnPrefix="unicorn/" >
<item cdn-url="newHQpresentation.rm" src="newHQpresentation.rm" />
<item cdn-url="animatedlogo.mpg" src="animlogo.mpg" />
<item cdn-url="companytheme.mp3" src="cotheme.mp3" />
<item cdn-url="newHQlayout.avi" src="newHQ.mov" />
</item-group>
matchRule
The <matchRule> </matchRule> tag set is optional and defines additional filter rules for crawler jobs. It affects only <crawler> tasks and is not used by single <item> tags. The crawler parameters defined in the <crawler></crawler> tag set primarily determine the scope of a crawl search. If a content object does not meet the criteria specified by the crawler parameters, neither it nor its children are searched.
The <matchRule> tag, however, determines only whether the content objects should be acquired, regardless of the scope of the search. If a web page matches the crawler parameters but not the <matchRule> criteria, its children are still searched even though the page itself is not acquired.
In the following crawler job example using the <matchRule> tag, the entire website is searched but only files with the .jpg file extension larger than 50 kilobytes are acquired.
<crawler start-url="index.html" depth="-1" >
<matchRule> <match size-min-in-KB="50" extension="jpg" /> </matchRule>
</crawler>
The <matchRule> element can be nested within an <item-group> tag to define group-wide filter rules for <crawler> tags contained in the group. It can also be a subelement of a particular <crawler> job. The <crawler> tag-level setting overrides the <item-group> tag-level setting when both tags are present.
If you define criteria locally for individual <crawler> jobs, any existing group-level criterion is entirely discarded for that <crawler> job. That is, if your <item-group> tag match rule is set to A and your <crawler> tag specifies another match rule set to B, only B is to be used for the <crawler> tag rather than a combination of A and B. You can define at most one <matchRule> tag per <item-group> tag and at most one <matchRule> tag per <crawler> tag.
Attributes
None
Subelements
At least one <match> tag
match
The <match> </match> tag set is optional and specifies the acquisition criteria of content objects before they can be acquired by the CDN. Every attribute within a single <match> tag is ANDed (to form a logical conjunction) with the other attributes.
You can specify multiple <match> tags within the <matchRule> tag. The <match> tags are ORed (to form a logical disjunction) with the other <match> tags. You must specify at least one <match> tag per <matchRule> tag.
Attributes
•
mime-type
The mime-type attribute specifies MIME types.
•
extension
The extension attribute specifies file extensions.
•
time-before
The time-before attribute specifies that this content was modified before this time in yyyy-mm-dd hh:mm:ss format. Time values should be expressed in GMT (for GMT offsets, see the "Manifest File Time Zone Tables" section).
•
time-after
The time-after attribute specifies that this content was modified after this time in yyyy-mm-dd hh:mm:ss format. Time values should be expressed in GMT (for GMT offsets, see the "Manifest File Time Zone Tables" section).
•
size-min-in-MB
The size-min-in-MB attribute specifies that the acquired content size must be larger than this number of megabytes. The minimum size can also be expressed in kilobytes or bytes by using the size-min-in-KB or size-min-in-B attribute.
•
size-max-in-MB
The size-max-in-MB attribute specifies that the acquired content size must be smaller than this number of megabytes. The maximum size can also be expressed in kilobytes or bytes by using the size-max-in-KB or size-max-in-B attribute.
Subelements
None
Example
<!-- crawling item group -->
<item-group server="origin-server" type="prepos">
<matchRule>
<match time-before="2000-05-05 12:00:00"/>
</matchRule>
<crawler start-url="eng/index.html" depth="-1"/>
<crawler start-url="hr/index.html" depth="3">
<matchRule>
<match size-min-in-KB="1" extension="xxx"/>
</matchRule>
</crawler>
</item-group>
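The evaluation of match rules (attributes ANDed within one <match> tag, <match> tags ORed within a <matchRule> tag, and a <crawler>-level rule replacing the <item-group>-level rule) can be sketched as follows. The class and function names are hypothetical, sizes are normalized to bytes with 1 KB assumed to be 1024 bytes, and the extension test is assumed to be a case-insensitive suffix comparison.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass
class Match:
    """One <match> tag; an attribute left as None is not checked."""
    mime_type: Optional[str] = None
    extension: Optional[str] = None
    time_before: Optional[datetime] = None
    time_after: Optional[datetime] = None
    size_min_bytes: Optional[int] = None
    size_max_bytes: Optional[int] = None


@dataclass
class ContentObject:
    url: str
    mime_type: str
    modified: datetime
    size_bytes: int


def satisfies(obj: ContentObject, m: Match) -> bool:
    """All attributes present in a single <match> tag must hold (logical AND)."""
    return all([
        m.mime_type is None or obj.mime_type == m.mime_type,
        m.extension is None or obj.url.lower().endswith("." + m.extension.lower()),
        m.time_before is None or obj.modified < m.time_before,
        m.time_after is None or obj.modified > m.time_after,
        m.size_min_bytes is None or obj.size_bytes > m.size_min_bytes,
        m.size_max_bytes is None or obj.size_bytes < m.size_max_bytes,
    ])


def should_acquire(obj: ContentObject,
                   group_rule: Optional[Sequence[Match]] = None,
                   crawler_rule: Optional[Sequence[Match]] = None) -> bool:
    # A crawler-level rule discards the group-level rule entirely.
    rule = crawler_rule if crawler_rule is not None else group_rule
    if rule is None:
        return True  # no <matchRule>: the crawl scope alone decides
    # Any one <match> tag is enough (logical OR).
    return any(satisfies(obj, m) for m in rule)
```

The .jpg example earlier in this section corresponds to a rule such as [Match(extension="jpg", size_min_bytes=50 * 1024)].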
contains
The <contains> tag is optional and identifies content objects that are embedded within the content item currently being described, for example, the components of a SMIL (Synchronized Multimedia Integration Language) file. Requests for an item that uses <contains> links are accepted only after the CDN determines that the dependent content objects are present on the Content Engine.
The <contains> tag must be enclosed within the <item> </item> tag.
The <contains> tag is used to include embedded files for some video files like .asf or .rp. The CDN does not serve this item unless every contained item is present.
Attributes
The cdn-url attribute is required and is the relative CDN URL of one of the embedded contents.
Subelements
None
Example
<item src="house/img08.jpg" cdn-url="img08.jpg" />
<item src="house/img09.jpg" cdn-url="img09.jpg" />
<item cdn-url="house.rp" src="house/house.rp">
<contains cdn-url="img08.jpg"/>
<contains cdn-url="img09.jpg"/>
</item>
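The serving condition for an item with <contains> links can be sketched as a simple presence check. The function name can_serve and the representation of the Content Engine's stored content as a set of relative CDN URLs are assumptions made for illustration.

```python
from typing import Iterable, Set


def can_serve(item_cdn_url: str, contains: Iterable[str], present: Set[str]) -> bool:
    """An item is served only if it and every contained item are present."""
    return item_cdn_url in present and all(url in present for url in contains)


# house.rp depends on two images; one of them has not arrived yet.
present = {"house.rp", "img08.jpg"}
print(can_serve("house.rp", ["img08.jpg", "img09.jpg"], present))
```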
wmt-meta-data
The <wmt-meta-data> tag is optional and for use with Windows Media Technologies (.wma, .wmv, and .asf) files only. It specifies one or more file attributes that are displayed in the Windows Media Player when the file is played back.
The element may be enclosed within <item-group> or <item> or <crawler> tags. At most, one such element may be specified for its parent element.
Attributes
Attributes are specified as name=value pairs; typical attributes for WMT players are Title, Author, Copyright, and Description. If the element is parented by the <item-group> tag, the attributes apply to the content contained within the group.
When the element is nested within an <item> or a <crawler> tag, the <item> or <crawler> tag-level settings override the <item-group> tag-level settings.
Subelements
None
http-meta-data
The <http-meta-data> tag is optional and used for HTTP playback of content. If a content object is requested through HTTP, these attributes are sent to the end users as HTTP response headers. This type of response header is useful when you specify content type for FTP acquired content.
The element can also be nested within an <item> or a <crawler> tag. The <item> or <crawler> tag-level settings override the <item-group> tag-level settings.
Attributes
Attributes are specified as name=value pairs and can be either standard HTTP header metadata or customized application metadata. If the element is parented by the <item-group> tag, the attributes apply to the content contained within the group.
Subelements
None
Configuring Freshness of Pre-Positioned Content
You can configure and manage the freshness of your pre-positioned content by using the serveStopTime and noRedirectToOrigin attributes. Four manifest file configurations are possible:
•
Both the serveStopTime and noRedirectToOrigin attributes are included in the manifest file, making the condition noRedirectToOrigin true. The conditions for this first case are shown in Table 6-7.
•
Only the serveStopTime attribute is included in the manifest file. The noRedirectToOrigin attribute is not, making the condition noRedirectToOrigin false. The conditions for this second case are shown in Table 6-8.
•
Neither the serveStopTime nor the noRedirectToOrigin attribute is included in the manifest file, making the condition noRedirectToOrigin false. The conditions for this third case are shown in Table 6-9.
•
Only the noRedirectToOrigin attribute is included in the manifest file. The serveStopTime attribute is not, making the condition noRedirectToOrigin true. The conditions for this fourth case are shown in Table 6-10.
Depending on whether the serveStopTime and noRedirectToOrigin attributes are included and the timing combinations of the serveStopTime value and the HTTP header expiration, the conditions and corresponding results are listed in Table 6-7 through Table 6-10. In the following tables, now is defined as the time the end user content request arrives. These tables use the end user request arrival time to make content delivery decisions.
Table 6-7 Both serveStopTime and noRedirectToOrigin Attributes Included, noRedirectToOrigin=true
Condition | Result
Now is past the serveStopTime value | Content is not served and an error message appears
Now is before the serveStopTime value but is past the HTTP expires header | Content is served from the cdnfs, but the content can be stale
Now is before the serveStopTime value and is before the HTTP expires header | Content is served from the cdnfs
Now is before the serveStopTime value and no HTTP expires header exists | Content is served from the cdnfs
Table 6-8 Only serveStopTime Attribute Included, noRedirectToOrigin=false
Condition | Result
Now is past the serveStopTime value | Content is served by proxy from the origin server
Now is before the serveStopTime value but is past the HTTP expires header | Content is served from the cdnfs, but content can be stale
Now is before the serveStopTime value and is before the HTTP expires header | Content is served from the cdnfs
Now is before the serveStopTime value and no HTTP expires header exists | Content is served from the cdnfs
Table 6-9 Neither serveStopTime nor noRedirectToOrigin Attributes Included, noRedirectToOrigin=false
Condition | Result
Now is past the HTTP expires header | Content is served by proxy from the origin server
Now is before the HTTP expires header | Content is served from the cdnfs
No HTTP expires header exists | Content is served from the cdnfs
Table 6-10 Only the noRedirectToOrigin Attribute Included, noRedirectToOrigin=true
Condition | Result
Now is past the HTTP expires header | Content is served from the cdnfs, but content can be stale
Now is before the HTTP expires header | Content is served from the cdnfs
No HTTP expires header exists | Content is served from the cdnfs
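The four tables above can be condensed into a single decision function. This is a sketch derived only from Table 6-7 through Table 6-10: the function and constant names are hypothetical, and the handling of a request that arrives exactly at the serveStopTime or expiration time is an assumption because the tables do not cover it.

```python
from datetime import datetime
from typing import Optional

SERVE_FROM_CDNFS = "serve from cdnfs"
SERVE_STALE = "serve from cdnfs (content can be stale)"
PROXY_FROM_ORIGIN = "serve by proxy from origin server"
ERROR = "do not serve; return error"


def freshness_decision(now: datetime,
                       serve_stop_time: Optional[datetime] = None,
                       http_expires: Optional[datetime] = None,
                       no_redirect_to_origin: bool = False) -> str:
    """Map the request arrival time to a delivery decision."""
    # Past serveStopTime: error if redirection is disabled, otherwise proxy.
    if serve_stop_time is not None and now > serve_stop_time:
        return ERROR if no_redirect_to_origin else PROXY_FROM_ORIGIN
    # Past the HTTP expires header.
    if http_expires is not None and now > http_expires:
        # Only the "neither attribute" case falls back to the origin server.
        if serve_stop_time is not None or no_redirect_to_origin:
            return SERVE_STALE
        return PROXY_FROM_ORIGIN
    # Not expired, or no HTTP expires header exists.
    return SERVE_FROM_CDNFS
```

For example, a request that arrives after the HTTP expiration with neither attribute set returns PROXY_FROM_ORIGIN, which matches the first row of Table 6-9.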
XML Schema
In the case of the manifest file, an XML schema defines the custom markup language of the manifest file and the appearance of a given set of XML documents. The XML schema specifies which tags or elements you can use in your documents, the attributes those tags can contain, and their arrangement.
Manifest XML Schema
An XSD (XML Schema Definition) file describes the structure of an XML document: the elements and attributes that the document can contain, their types, and their arrangement. For more information on XSD, go to http://www.w3schools.com/schema/schema_intro.asp.
The following XML code is the manifest XML schema (CdnManifest.xsd).
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:include schemaLocation="PlayServerTable.xsd"/>
<xs:element name="CdnManifest">
<xs:element ref="playServerTable" minOccurs="0" maxOccurs="1"/>
<xs:element ref="options" minOccurs="0" maxOccurs="1"/>
<xs:element ref="proxyServer" minOccurs="0" maxOccurs="unbounded"/>
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element ref="server" maxOccurs="unbounded"/>
<xs:element ref="item-group" maxOccurs="unbounded"/>
<xs:element ref="item" maxOccurs="unbounded"/>
<xs:element ref="crawler" maxOccurs="unbounded"/>
<xs:choice minOccurs="1" maxOccurs="unbounded">
<xs:element ref="item-group" maxOccurs="unbounded"/>
<xs:element ref="item" maxOccurs="unbounded"/>
<xs:element ref="crawler" maxOccurs="unbounded"/>
<xs:element name="options">
<xs:attribute name="timeZone" type="xs:string" use="optional"/>
<xs:attribute name="notFoundUrl" type="xs:string" use="optional"/>
<xs:attribute name="alternateUrl" type="xs:string" use="optional"/>
<xs:attribute name="noRedirectToOrigin" type="xs:boolean" use="optional"
default="false" />
<xs:attribute name="requireAuth" type="xs:boolean" use="optional"/>
<xs:attribute name="ttl" type="xs:unsignedInt" use="optional"/>
<xs:attribute name="prefetch" type="xs:string" use="optional"/>
<xs:attribute name="ttl-for-missing" type="xs:unsignedInt" use="optional"/>
<xs:attribute name="ttl-for-non-ref" type="xs:unsignedInt" use="optional"/>
<xs:attribute name="type" use="optional" default="prepos">
<xs:restriction base="xs:string">
<xs:enumeration value="prepos"/>
<xs:enumeration value="wmt-live"/>
<xs:enumeration value="real-live"/>
<xs:attribute name="manifest-id" type="xs:string" use="optional"/>
<xs:attribute name="clearlog" type="xs:boolean" use="optional" default="false"/>
<xs:attribute name="rd" type="xs:string" use="optional"/>
<xs:attribute name="prepos-tag" type="xs:string" use="optional"/>
<xs:attribute name="live-tag" type="xs:string" use="optional"/>
<xs:element name="server">
<xs:element ref="host" minOccurs="1" maxOccurs="1"/>
<xs:attribute name="name" type="xs:string" use="required"/>
<xs:element name="host">
<xs:attribute name="name" type="xs:string" use="required"/>
<xs:attribute name="root" type="xs:string" use="optional"/>
<xs:attribute name="proxyServer" type="xs:string" use="optional"/>
<xs:attribute name="proto" use="optional">
<xs:restriction base="xs:string">
<xs:enumeration value="http"/>
<xs:enumeration value="https"/>
<xs:enumeration value="ftp"/>
<xs:enumeration value="mms"/>
<xs:enumeration value="rtsp"/>
<xs:attribute name="port" type="xs:unsignedShort" use="optional"/>
<xs:attribute name="user" type="xs:string" use="optional"/>
<xs:attribute name="password" type="xs:string" use="optional"/>
<xs:attribute name="uuencoded" type="xs:boolean" use="optional"
default="false"/>
<xs:attribute name="proxyName" type="xs:string" use="optional"/>
<xs:attribute name="sslAuthType" use="optional">
<xs:restriction base="xs:string">
<xs:enumeration value="weak"/>
<xs:enumeration value="strong"/>
<xs:element name="proxyServer">
<xs:attribute name="serverName" type="xs:string" use="required"/>
<xs:attribute name="port" type="xs:unsignedShort" use="optional"/>
<xs:attribute name="user" type="xs:string" use="optional"/>
<xs:attribute name="password" type="xs:string" use="optional"/>
<xs:attribute name="uuencoded" type="xs:string" use="optional" default="false"/>
<xs:attributeGroup name = "contentAttr">
<xs:attribute name="server" type="xs:string" use="optional"/>
<xs:attribute name="proxyServer" type="xs:string" use="optional"/>
<xs:attribute name="playServer" type="xs:string" use="optional"/>
<xs:attribute name="type" use="optional">
<xs:restriction base="xs:string">
<xs:enumeration value="prepos"/>
<xs:enumeration value="wmt-live"/>
<xs:enumeration value="real-live"/>
<xs:attribute name="noRedirectToOrigin" type="xs:boolean" use="optional"/>
<xs:attribute name="requireAuth" type="xs:boolean" use="optional"/>
<xs:attribute name="alternateUrl" type="xs:string" use="optional"/>
<xs:attribute name="ttl" type="xs:unsignedInt" use="optional"/>
<xs:attribute name="priority" type="xs:unsignedInt" use="optional"/>
<xs:attribute name="prefetch" type="xs:string" use="optional"/>
<xs:attribute name="expires" type="xs:string" use="optional"/>
<xs:attribute name="serve" type="xs:string" use="optional"/>
<xs:attribute name="serveStartTime" type="xs:string" use="optional"/>
<xs:attribute name="serveStopTime" type="xs:string" use="optional"/>
<xs:attributeGroup name = "prefixAttr">
<xs:attribute name="cdnPrefix" type="xs:string" use="optional"/>
<xs:attribute name="srcPrefix" type="xs:string" use="optional"/>
<xs:element name="item-group">
<xs:element ref="matchRule" minOccurs="0" maxOccurs="1"/>
<xs:element ref="http-meta-data" minOccurs="0" maxOccurs="1"/>
<xs:element ref="wmt-meta-data" minOccurs="0" maxOccurs="1"/>
<xs:choice minOccurs="1" maxOccurs="unbounded">
<xs:element ref="item-group" maxOccurs="unbounded"/>
<xs:element ref="item" maxOccurs="unbounded"/>
<xs:element ref="crawler" maxOccurs="unbounded"/>
<xs:attributeGroup ref="contentAttr"/>
<xs:attributeGroup ref="prefixAttr"/>
<xs:element name="item">
<xs:element ref="contains" minOccurs="0" maxOccurs="unbounded"/>
<xs:element ref="http-meta-data" minOccurs="0" maxOccurs="1"/>
<xs:element ref="wmt-meta-data" minOccurs="0" maxOccurs="1"/>
<xs:attribute name="src" type="xs:string" use="required"/>
<xs:attribute name="cdn-url" type="xs:string" use="optional"/>
<xs:attributeGroup ref="contentAttr"/>
<xs:element name="crawler">
<xs:element ref="matchRule" minOccurs="0" maxOccurs="1"/>
<xs:element ref="http-meta-data" minOccurs="0" maxOccurs="1"/>
<xs:element ref="wmt-meta-data" minOccurs="0" maxOccurs="1"/>
<xs:attribute name="start-url" type="xs:string" use="required"/>
<xs:attribute name="depth" type="xs:short" use="optional"/>
<xs:attribute name="prefix" type="xs:string" use="optional"/>
<xs:attribute name="accept" type="xs:string" use="optional"/>
<xs:attribute name="reject" type="xs:string" use="optional"/>
<xs:attribute name="max-number" type="xs:unsignedInt" use="optional"/>
<xs:attribute name="max-size-in-B" type="xs:unsignedInt" use="optional"/>
<xs:attribute name="max-size-in-KB" type="xs:unsignedInt" use="optional"/>
<xs:attribute name="max-size-in-MB" type="xs:unsignedInt" use="optional"/>
<xs:attributeGroup ref="contentAttr"/>
<xs:attributeGroup ref="prefixAttr"/>
<xs:element name="contains">
<xs:attribute name="cdn-url" type="xs:string" use="required"/>
<xs:element name="matchRule">
<xs:element ref="match" minOccurs="1" maxOccurs="unbounded"/>
<xs:element name="match">
<xs:attribute name="mime-type" type="xs:string" use="optional"/>
<xs:attribute name="time-before" type="xs:string" use="optional"/>
<xs:attribute name="time-after" type="xs:string" use="optional"/>
<xs:attribute name="size-min-in-B" type="xs:int" use="optional"/>
<xs:attribute name="size-max-in-B" type="xs:int" use="optional"/>
<xs:attribute name="size-min-in-KB" type="xs:int" use="optional"/>
<xs:attribute name="size-max-in-KB" type="xs:int" use="optional"/>
<xs:attribute name="size-min-in-MB" type="xs:int" use="optional"/>
<xs:attribute name="size-max-in-MB" type="xs:int" use="optional"/>
<xs:attribute name="extension" type="xs:string" use="optional"/>
<xs:element name="http-meta-data">
<xs:anyAttribute processContents="skip" />
<xs:element name="wmt-meta-data">
<xs:anyAttribute processContents="skip" />
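A few of the constraints expressed by the schema above can be checked with a short script. This is a minimal structural check using Python's standard xml.etree.ElementTree module, not a substitute for full XSD validation or for the Manifest Validator utility; the function name check_manifest and the wording of the messages are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def check_manifest(xml_text: str) -> List[str]:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    root = ET.fromstring(xml_text)  # raises ParseError if the XML is not well formed
    if root.tag != "CdnManifest":
        problems.append(f"root element is <{root.tag}>, expected <CdnManifest>")
    for item in root.iter("item"):
        if "src" not in item.attrib:             # src is use="required"
            problems.append("<item> is missing the required src attribute")
    for crawler in root.iter("crawler"):
        if "start-url" not in crawler.attrib:    # start-url is use="required"
            problems.append("<crawler> is missing the required start-url attribute")
    for rule in root.iter("matchRule"):
        if rule.find("match") is None:           # match has minOccurs="1"
            problems.append("<matchRule> must contain at least one <match> tag")
    for contains in root.iter("contains"):
        if "cdn-url" not in contains.attrib:     # cdn-url is use="required"
            problems.append("<contains> is missing the required cdn-url attribute")
    return problems
```

Calling check_manifest on a manifest whose <crawler> tag has an empty <matchRule> returns one message for that rule.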
PlayServerTable XML Schema
The following XML code defines the PlayServerTable schema (PlayServerTable.xsd) that is included by CdnManifest.xsd.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="playServerTable">
<xs:element ref="playServer" minOccurs="1" maxOccurs="unbounded"/>
<xs:element name="playServer">
<xs:choice minOccurs="1" maxOccurs="unbounded">
<xs:element ref="contentType"/>
<xs:element ref="extension"/>
<xs:attribute name="name" use="required">
<xs:restriction base="xs:string">
<xs:enumeration value="real"/>
<xs:enumeration value="wmt"/>
<xs:enumeration value="http"/>
<xs:enumeration value="qtss"/>
<xs:element name="contentType">
<xs:attribute name="name" type="xs:string" use="required"/>
<xs:element name="extension">
<xs:attribute name="name" type="xs:string" use="required"/>
Default PlayServerTable Schema
The following XML code is the default PlayServerTable, which conforms to the PlayServerTable schema (PlayServerTable.xsd).
<playServerTable xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation = "PlayServerTable.xsd">
<!-- MIME type is taken from
http://service.real.com/help/library/guides/server8/htmfiles/custmizg.htm -->
<contentType name="audio/x-pn-realaudio" />
<contentType name="audio/x-pn-realaudio-plugin" />
<contentType name="application/x-pn-realmedia" />
<contentType name="application/smil" />
<contentType name="application/vnd.rn-rmadriver" />
<contentType name="video/quicktime" />
<!-- extension avi could go here, but is also supported by wmt -->
<!-- MIME types taken from
http://msdn.microsoft.com/workshop/imedia/windowsmedia/server/mime.asp -->
<contentType name="video/x-ms-asf" />
<contentType name="audio/x-ms-wma" />
<contentType name="video/x-ms-wmv" />
<contentType name="video/x-ms-wm" />
<contentType name="application/x-ms-wmz" />
<contentType name="application/x-ms-wmd" />
<!-- comments courtesy of Laura Gaughan, 11jan2001 -->
<extension name="wma" /> <!-- audio content -->
<extension name="wmv" /> <!-- audio/video content -->
<extension name="asf" /> <!-- audio/video content (legacy) -->
<extension name="wm" /> <!-- reserved for future use -->
<!-- extension avi could go here, but is also supported by qtss -->
<contentType name="application/pdf" />
<contentType name="application/postscript" />
<!-- this must be http; wmt doesn't do asx over mms -->
<contentType name="audio/x-ms-wax" />
<contentType name="video/x-ms-wvx" />
<contentType name="video/x-ms-wmx" />
<extension name="asx" /> <!-- as for wvx + .asf .asx (legacy) -->
<extension name="wax" /> <!-- metadata for .wma .wax -->
<extension name="wvx" /> <!-- metadata for .wma .wmv .wvx .wax -->
<extension name="wmx" /> <!-- reserved for future use -->
<!-- add all types from wmt tables to here, since they can be played
http://msdn.microsoft.com/workshop/imedia/windowsmedia/server/mime.asp -->
<contentType name="video/x-ms-asf" />
<contentType name="audio/x-ms-wma" />
<contentType name="video/x-ms-wmv" />
<contentType name="video/x-ms-wm" />
<contentType name="application/x-ms-wmz" />
<contentType name="application/x-ms-wmd" />
<!-- comments courtesy of Laura Gaughan, 11jan2001 -->
<extension name="wma" /> <!-- audio content -->
<extension name="wmv" /> <!-- audio/video content -->
<extension name="asf" /> <!-- audio/video content (legacy) -->
<extension name="wm" /> <!-- reserved for future use -->
<!-- extension avi could go here, but is also supported by qtss -->
Manifest File Time Zone Tables
To convert to local time, you must know the time difference between Greenwich mean time (GMT) and local time for both standard time and summer time (daylight saving time). Table 6-11 through Table 6-26 list the time zones supported by the manifest file. Each time zone is written on its own line in the following format:
<zonename>:[+|-:]hh:mm
where <zonename> is the name of the time zone or the standard time zone abbreviation (see Table 6-11), written without spaces before or after the colon (":"), and "[+|-:]hh:mm" is the GMT offset in hours and minutes. The sign is optional; if it is omitted, the offset is treated as positive ("+").
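The entry format described above can be parsed with a regular expression. The following is a sketch based only on that format and on the entries in the tables that follow; the function name parse_zone_line is hypothetical, and returning the offset in minutes is a choice made for this illustration.

```python
import re
from typing import Tuple

# <zonename>:[+|-:]hh:mm  -- the zone name may itself contain "+" or "-" (Etc/GMT+7).
ZONE_LINE = re.compile(
    r"^(?P<zone>[^:\s]+):(?:(?P<sign>[+-]):)?(?P<hh>\d{2}):(?P<mm>\d{2})$"
)


def parse_zone_line(line: str) -> Tuple[str, int]:
    """Return (zone name, GMT offset in minutes) for one table entry."""
    m = ZONE_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a valid time zone entry: {line!r}")
    minutes = int(m["hh"]) * 60 + int(m["mm"])
    if m["sign"] == "-":          # an omitted sign is treated as "+"
        minutes = -minutes
    return m["zone"], minutes
```

For example, parse_zone_line("Etc/GMT+7:-:07:00") returns the zone name "Etc/GMT+7" with an offset of -420 minutes.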
Table 6-11 Standard Time Zones and GMT Offsets
Time Zone: GMT Offset | Time Zone: GMT Offset | Time Zone: GMT Offset
ACT:+:09:30 | Etc/GMT+7:-:07:00 | HST:-:10:00
ADT:-:03:00 | Etc/GMT+8:-:08:00 | IET:-:05:00
AET:+:10:00 | Etc/GMT+9:-:09:00 | IST:+:05:30
AGT:-:03:00 | Etc/GMT-0:00:00 | JST:+:09:00
ART:+:02:00 | Etc/GMT-10:+:10:00 | MDT:-:06:00
AST:-:09:00 | Etc/GMT-11:+:11:00 | MET:+:01:00
BET:-:03:00 | Etc/GMT-12:+:12:00 | MIT:-:11:00
BST:+:06:00 | Etc/GMT-13:+:13:00 | MST7MDT:-:07:00
CAT:+:02:00 | Etc/GMT-14:+:14:00 | MST:-:07:00
CDT:-:05:00 | Etc/GMT-1:+:01:00 | NET:+:04:00
CET:+:01:00 | Etc/GMT-2:+:02:00 | NST:+:12:00
CNT:-:03:30 | Etc/GMT-3:+:03:00 | NZ-CHAT:+:12:45
CST6CDT:-:06:00 | Etc/GMT-4:+:04:00 | NZ:+:12:00
CST:-:06:00 | Etc/GMT-5:+:05:00 | Navajo:-:07:00
CTT:+:08:00 | Etc/GMT-6:+:06:00 | PDT:-:07:00
EAT:+:03:00 | Etc/GMT-7:+:07:00 | PLT:+:05:00
ECT:+:01:00 | Etc/GMT-8:+:08:00 | PNT:-:07:00
EDT:-:04:00 | Etc/GMT-9:+:09:00 | PRC:+:08:00
EET:+:02:00 | Etc/GMT0:00:00 | PRT:-:04:00
EST5EDT:-:05:00 | Etc/GMT:00:00 | PST8PDT:-:08:00
EST:-:05:00 | Etc/Greenwich:00:00 | PST:-:08:00
Etc/GMT+0:00:00 | Etc/UCT:00:00 | ROK:+:09:00
Etc/GMT+10:-:10:00 | Etc/UTC:00:00 | SST:+:11:00
Etc/GMT+11:-:11:00 | Etc/Universal:00:00 | UCT:00:00
Etc/GMT+12:-:12:00 | Etc/Zulu:00:00 | UTC:00:00
Etc/GMT+1:-:01:00 | GB-Eire:00:00 | Universal:00:00
Etc/GMT+2:-:02:00 | GB:00:00 | VST:+:07:00
Etc/GMT+3:-:03:00 | GMT0:00:00 | W-SU:+:03:00
Etc/GMT+4:-:04:00 | GMT:00:00 | WET:00:00
Etc/GMT+5:-:05:00 | Greenwich:00:00 | Zulu:00:00
Etc/GMT+6:-:06:00 | HDT:-:09:00 |
Table 6-12 Africa GMT Offsets
Time Zone: GMT Offset
|
Time Zone: GMT Offset
|
Time Zone: GMT Offset
|
Africa/Abidjan:00:00
|
Africa/Djibouti:+:03:00
|
Africa/Maputo:+:02:00
|
Africa/Accra:00:00
|
Africa/Douala:+:01:00
|
Africa/Maseru:+:02:00
|
Africa/Addis_Ababa:+:03:00
|
Africa/El_Aaiun:00:00
|
Africa/Mbabane:+:02:00
|
Africa/Algiers:+:01:00
|
Africa/Freetown:00:00
|
Africa/Mogadishu:+:03:00
|
Africa/Asmera:+:03:00
|
Africa/Gaborone:+:02:00
|
Africa/Monrovia:00:00
|
Africa/Bamako:00:00
|
Africa/Harare:+:02:00
|
Africa/Nairobi:+:03:00
|
Africa/Bangui:+:01:00
|
Africa/Johannesburg:+:02:00
|
Africa/Ndjamena:+:01:00
|
Africa/Banjul:00:00
|
Africa/Kampala:+:03:00
|
Africa/Niamey:+:01:00
|
Africa/Bissau:00:00
|
Africa/Khartoum:+:03:00
|
Africa/Nouakchott:00:00
|
Africa/Blantyre:+:02:00
|
Africa/Kigali:+:02:00
|
Africa/Ouagadougou:00:00
|
Africa/Brazzaville:+:01:00
|
Africa/Kinshasa:+:01:00
|
Africa/Porto-Novo:+:01:00
|
Africa/Bujumbura:+:02:00
|
Africa/Lagos:+:01:00
|
Africa/Sao_Tome:00:00
|
Africa/Cairo:+:02:00
|
Africa/Libreville:+:01:00
|
Africa/Timbuktu:00:00
|
Africa/Casablanca:00:00
|
Africa/Lome:00:00
|
Africa/Tripoli:+:02:00
|
Africa/Ceuta:+:01:00
|
Africa/Luanda:+:01:00
|
Africa/Tunis:+:01:00
|
Africa/Conakry:00:00
|
Africa/Lubumbashi:+:02:00
|
Africa/Windhoek:+:01:00
|
Africa/Dakar:00:00
|
Africa/Lusaka:+:02:00
|
|
Africa/Dar_es_Salaam:+:03:00
|
Africa/Malabo:+:01:00
|
|
Table 6-13 America GMT Offsets
Time Zone: GMT Offset
|
Time Zone: GMT Offset
|
Time Zone: GMT Offset
|
America/Adak:-:10:00
|
America/Grenada:-:04:00
|
America/Noronha:-:02:00
|
America/Anchorage:-:09:00
|
America/Guadeloupe:-:04:00
|
America/North_Dak/Ctr:-:06:00
|
America/Anguilla:-:04:00
|
America/Guatemala:-:06:00
|
America/Panama:-:05:00
|
America/Antigua:-:04:00
|
America/Guayaquil:-:05:00
|
America/Pangnirtung:-:05:00
|
America/Araguaina:-:03:00
|
America/Guyana:-:04:00
|
America/Paramaribo:-:03:00
|
America/Aruba:-:04:00
|
America/Halifax:-:04:00
|
America/Phoenix:-:07:00
|
America/Asuncion:-:04:00
|
America/Havana:-:05:00
|
America/Port-au-Prince:-:05:00
|
America/Atka:-:10:00
|
America/Hermosillo:-:07:00
|
America/Port_of_Spain:-:04:00
|
America/Barbados:-:04:00
|
America/Ind/Indian:-:05:00
|
America/Porto_Acre:-:05:00
|
America/Belem:-:03:00
|
America/Ind/Knox:-:05:00
|
America/Porto_Velho:-:04:00
|
America/Belize:-:06:00
|
America/Ind/Marengo:-:05:00
|
America/Puerto_Rico:-:04:00
|
America/Boa_Vista:-:04:00
|
America/Ind/Vevay:-:05:00
|
America/Rainy_River:-:06:00
|
America/Bogota:-:05:00
|
America/Indianapolis:-:05:00
|
America/Rankin_Inlet:-:06:00
|
America/Bogota:-:05:00
|
America/Inuvik:-:07:00
|
America/Recife:-:03:00
|
America/Buenos_Aires:-:03:00
|
America/Iqaluit:-:05:00
|
America/Regina:-:06:00
|
America/Cambridge_Bay:-:07:00
|
America/Jamaica:-:05:00
|
America/Rio_Branco:-:05:00
|
America/Cancun:-:06:00
|
America/Jujuy:-:03:00
|
America/Rosario:-:03:00
|
America/Caracas:-:04:00
|
America/Juneau:-:09:00
|
America/Santiago:-:04:00
|
America/Catamarca:-:03:00
|
America/Ken/Louisville:-:05:00
|
America/Santo_Domingo:-:04:00
|
America/Cayenne:-:03:00
|
America/Ken/Monticello:-:05:00
|
America/Sao_Paulo:-:03:00
|
America/Cayman:-:05:00
|
America/Knox_IN:-:05:00
|
America/Scoresbysund:-:01:00
|
America/Chicago:-:06:00
|
America/La_Paz:-:04:00
|
America/Shiprock:-:07:00
|
America/Chihuahua:-:07:00
|
America/Lima:-:05:00
|
America/St_Johns:-:03:30
|
America/Cordoba:-:03:00
|
America/Los_Angeles:-:08:00
|
America/St_Lucia:-:04:00
|
America/Costa_Rica:-:06:00
|
America/Louisville:-:05:00
|
America/St_Thomas:-:04:00
|
America/Cuiaba:-:04:00
|
America/Maceio:-:03:00
|
America/St_Vincent:-:04:00
|
America/Curacao:-:04:00
|
America/Managua:-:06:00
|
America/Swift_Current:-:06:00
|
America/Danmarkshavn:00:00
|
America/Manaus:-:04:00
|
America/Tegucigalpa:-:06:00
|
America/Dawson:-:08:00
|
America/Martinique:-:04:00
|
America/Thule:-:04:00
|
America/Dawson_Creek:-:07:00
|
America/Mazatlan:-:07:00
|
America/Thunder_Bay:-:05:00
|
America/Denver:-:07:00
|
America/Mendoza:-:03:00
|
America/Tijuana:-:08:00
|
America/Detroit:-:05:00
|
America/Menominee:-:06:00
|
America/Tortola:-:04:00
|
America/Dominica:-:04:00
|
America/Merida:-:06:00
|
America/Vancouver:-:08:00
|
America/Edmonton:-:07:00
|
America/Mexico_City:-:06:00
|
America/St_Lucia:-:04:00
|
America/Eirunepe:-:05:00
|
America/Miquelon:-:03:00
|
America/Virgin:-:04:00
|
America/El_Salvador:-:06:00
|
America/Monterrey:-:06:00
|
America/Whitehorse:-:08:00
|
America/Ensenada:-:08:00
|
America/Montevideo:-:03:00
|
America/Winnipeg:-:06:00
|
America/Fort_Wayne:-:05:00
|
America/Montreal:-:05:00
|
America/Yakutat:-:09:00
|
America/Fortaleza:-:03:00
|
America/Montserrat:-:04:00
|
America/Yellowknife:-:07:00
|
America/Glace_Bay:-:04:00
|
America/Nassau:-:05:00
|
America/Virgin:-:04:00
|
America/Godthab:-:03:00
|
America/New_York:-:05:00
|
America/Whitehorse:-:08:00
|
America/Goose_Bay:-:04:00
|
America/Nipigon:-:05:00
|
America/Winnipeg:-:06:00
|
America/Grand_Turk:-:05:00
|
America/Nome:-:09:00
|
America/Tortola:-:04:00
|
Table 6-14 Antarctica/Arctic GMT Offsets
Time Zone: GMT Offset | Time Zone: GMT Offset | Time Zone: GMT Offset
Antarctica/Casey:+:08:00 | Antarctica/McMurdo:+:12:00 | Antarctica/Vostok:+:06:00
Antarctica/Davis:+:07:00 | Antarctica/Palmer:-:04:00 | Arctic/Longyearbyen:+:01:00
Antarctica/DtDUrville:+:10:00 | Antarctica/South_Pole:+:12:00 |
Antarctica/Mawson:+:06:00 | Antarctica/Syowa:+:03:00 |
Table 6-15 Asia GMT Offsets
Time Zone: GMT Offset
|
Time Zone: GMT Offset
|
Time Zone: GMT Offset
|
Asia/Aden:+:03:00
|
Asia/Hong_Kong:+:08:00
|
Asia/Riyadh87:+:03:07
|
Asia/Almaty:+:06:00
|
Asia/Hovd:+:07:00
|
Asia/Riyadh88:+:03:07
|
Asia/Amman:+:02:00
|
Asia/Irkutsk:+:08:00
|
Asia/Riyadh89:+:03:07
|
Asia/Anadyr:+:12:00
|
Asia/Istanbul:+:02:00
|
Asia/Riyadh:+:03:00
|
Asia/Aqtau:+:04:00
|
Asia/Jakarta:+:07:00
|
Asia/Saigon:+:07:00
|
Asia/Aqtobe:+:05:00
|
Asia/Jayapura:+:09:00
|
Asia/Sakhalin:+:10:00
|
Asia/Ashgabat:+:05:00
|
Asia/Jerusalem:+:02:00
|
Asia/Samarkand:+:05:00
|
Asia/Ashkhabad:+:05:00
|
Asia/Kabul:+:04:30
|
Asia/Seoul:+:09:00
|
Asia/Baghdad:+:03:00
|
Asia/Kamchatka:+:12:00
|
Asia/Shanghai:+:08:00
|
Asia/Bahrain:+:03:00
|
Asia/Karachi:+:05:00
|
Asia/Singapore:+:08:00
|
Asia/Baku:+:04:00
|
Asia/Kashgar:+:08:00
|
Asia/Taipei:+:08:00
|
Asia/Bangkok:+:07:00
|
Asia/Katmandu:+:05:45
|
Asia/Tashkent:+:05:00
|
Asia/Beirut:+:02:00
|
Asia/Krasnoyarsk:+:07:00
|
Asia/Tbilisi:+:04:00
|
Asia/Bishkek:+:05:00
|
Asia/Kuala_Lumpur:+:08:00
|
Asia/Tehran:+:03:30
|
Asia/Brunei:+:08:00
|
Asia/Kuching:+:08:00
|
Asia/Tel_Aviv:+:02:00
|
Asia/Calcutta:+:05:30
|
Asia/Kuwait:+:03:00
|
Asia/Thimbu:+:06:00
|
Asia/Choibalsan:+:09:00
|
Asia/Macao:+:08:00
|
Asia/Thimphu:+:06:00
|
Asia/Chongqing:+:08:00
|
Asia/Magadan:+:11:00
|
Asia/Tokyo:+:09:00
|
Asia/Chungking:+:08:00
|
Asia/Manila:+:08:00
|
Asia/Ujung_Pandang:+:08:00
|
Asia/Colombo:+:06:00
|
Asia/Muscat:+:04:00
|
Asia/Ulaanbaatar:+:08:00
|
Asia/Dacca:+:06:00
|
Asia/Nicosia:+:02:00
|
Asia/Ulan_Bator:+:08:00
|
Asia/Damascus:+:02:00
|
Asia/Novosibirsk:+:06:00
|
Asia/Urumqi:+:08:00
|
Asia/Dhaka:+:06:00
|
Asia/Omsk:+:06:00
|
Asia/Vientiane:+:07:00
|
Asia/Dili:+:09:00
|
Asia/Phnom_Penh:+:07:00
|
Asia/Vladivostok:+:10:00
|
Asia/Dubai:+:04:00
|
Asia/Pontianak:+:07:00
|
Asia/Yakutsk:+:09:00
|
Asia/Dushanbe:+:05:00
|
Asia/Pyongyang:+:09:00
|
Asia/Yekaterinburg:+:05:00
|
Asia/Gaza:+:02:00
|
Asia/Qatar:+:03:00
|
Asia/Yerevan:+:04:00
|
Asia/Harbin:+:08:00
|
Asia/Rangoon:+:06:30
|
|
Table 6-16 Atlantic GMT Offsets
Time Zone: GMT Offset | Time Zone: GMT Offset | Time Zone: GMT Offset
Atlantic/Azores:-:01:00 | Atlantic/Faeroe:00:00 | Atlantic/South_Georgia:-:02:00
Atlantic/Bermuda:-:04:00 | Atlantic/Jan_Mayen:+:01:00 | Atlantic/St_Helena:00:00
Atlantic/Canary:00:00 | Atlantic/Madeira:00:00 | Atlantic/Stanley:-:04:00
Atlantic/Cape_Verde:-:01:00 | Atlantic/Reykjavik:00:00 |
Table 6-17 Australia GMT Offsets
Time Zone: GMT Offset | Time Zone: GMT Offset | Time Zone: GMT Offset
Australia/ACT:+:10:00 | Australia/LHI:+:10:30 | Australia/Queensland:+:10:00
Australia/Adelaide:+:09:30 | Australia/Lindeman:+:10:00 | Australia/South:+:09:30
Australia/Brisbane:+:10:00 | Australia/Lord_Howe:+:10:30 | Australia/Sydney:+:10:00
Australia/Broken_Hill:+:09:30 | Australia/Melbourne:+:10:00 | Australia/Tasmania:+:10:00
Australia/Canberra:+:10:00 | Australia/NSW:+:10:00 | Australia/Victoria:+:10:00
Australia/Darwin:+:09:30 | Australia/North:+:09:30 | Australia/West:+:08:00
Australia/Hobart:+:10:00 | Australia/Perth:+:08:00 | Australia/Yancowinna:+:09:30
Table 6-18 Brazil GMT Offsets
Time Zone: GMT Offset | Time Zone: GMT Offset | Time Zone: GMT Offset
Brazil/Acre:-:05:00 | Brazil/East:-:03:00 | Brazil/West:-:04:00
Brazil/DeNoronha:-:02:00 | |
Table 6-19 Canada/Chile/Cuba GMT Offsets
Time Zone: GMT Offset | Time Zone: GMT Offset | Time Zone: GMT Offset
Canada/Atlantic:-:04:00 | Canada/Mountain:-:07:00 | Canada/Yukon:-:08:00
Canada/Central:-:06:00 | Canada/Newfoundland:-:03:30 | Chile/Continental:-:04:00
Canada/East-Ssktchwan:-:06:00 | Canada/Pacific:-:08:00 | Chile/EasterIsland:-:06:00
Canada/Eastern:-:05:00 | Canada/Saskatchewan:-:06:00 | Cuba:-:05:00
Table 6-20 Egypt/Eire/Europe GMT Offsets
Time Zone: GMT Offset
|
Time Zone: GMT Offset
|
Time Zone: GMT Offset
|
Egypt:+:02:00
|
Europe/Kiev:+:02:00
|
Europe/Simferopol:+:02:00
|
Eire:00:00
|
Europe/Lisbon:00:00
|
Europe/Skopje:+:01:00
|
Europe/Amsterdam:+:01:00
|
Europe/Ljubljana:+:01:00
|
Europe/Sofia:+:02:00
|
Europe/Andorra:+:01:00
|
Europe/London:00:00
|
Europe/Stockholm:+:01:00
|
Europe/Athens:+:02:00
|
Europe/Luxembourg:+:01:00
|
Europe/Tallinn:+:02:00
|
Europe/Belfast:00:00
|
Europe/Madrid:+:01:00
|
Europe/Tirane:+:01:00
|
Europe/Belgrade:+:01:00
|
Europe/Malta:+:01:00
|
Europe/Tiraspol:+:02:00
|
Europe/Berlin:+:01:00
|
Europe/Minsk:+:02:00
|
Europe/Uzhgorod:+:02:00
|
Europe/Bratislava:+:01:00
|
Europe/Monaco:+:01:00
|
Europe/Vaduz:+:01:00
|
Europe/Brussels:+:01:00
|
Europe/Moscow:+:03:00
|
Europe/Vatican:+:01:00
|
Europe/Bucharest:+:02:00
|
Europe/Nicosia:+:02:00
|
Europe/Vienna:+:01:00
|
Europe/Budapest:+:01:00
|
Europe/Oslo:+:01:00
|
Europe/Vilnius:+:02:00
|
Europe/Chisinau:+:02:00
|
Europe/Paris:+:01:00
|
Europe/Warsaw:+:01:00
|
Europe/Copenhagen:+:01:00
|
Europe/Prague:+:01:00
|
Europe/Zagreb:+:01:00
|
Europe/Dublin:00:00
|
Europe/Riga:+:02:00
|
Europe/Zaporozhye:+:02:00
|
Europe/Gibraltar:+:01:00
|
Europe/Rome:+:01:00
|
Europe/Zurich:+:01:00
|
Europe/Helsinki:+:02:00
|
Europe/Samara:+:04:00
|
Europe/Simferopol:+:02:00
|
Europe/Istanbul:+:02:00
|
Europe/San_Marino:+:01:00
|
Europe/Skopje:+:01:00
|
Europe/Kaliningrad:+:02:00
|
Europe/Sarajevo:+:01:00
|
Europe/Sofia:+:02:00
|
Table 6-21 Hong Kong/Iceland/India/Iran/Israel GMT Offsets
Time Zone: GMT Offset | Time Zone: GMT Offset | Time Zone: GMT Offset
Hongkong:+:08:00 | Indian/Cocos:+:06:30 | Indian/Mauritius:+:04:00
Iceland:00:00 | Indian/Comoro:+:03:00 | Indian/Mayotte:+:03:00
Indian/Antananarivo:+:03:00 | Indian/Kerguelen:+:05:00 | Indian/Reunion:+:04:00
Indian/Chagos:+:06:00 | Indian/Mahe:+:04:00 | Iran:+:03:30
Indian/Christmas:+:07:00 | Indian/Maldives:+:05:00 | Israel:+:02:00
Table 6-22 Jamaica/Japan/Kwajalein/Libya GMT Offsets
Time Zone: GMT Offset | Time Zone: GMT Offset | Time Zone: GMT Offset
Jamaica:-:05:00 | Kwajalein:+:12:00 | Libya:+:02:00
Japan:+:09:00 | |
Table 6-23 Mexico/Mideast GMT Offsets
Time Zone: GMT Offset | Time Zone: GMT Offset | Time Zone: GMT Offset
Mexico/BajaNorte:-:08:00 | Mexico/General:-:06:00 | Mideast/Riyadh88:+:03:07
Mexico/BajaSur:-:07:00 | Mideast/Riyadh87:+:03:07 | Mideast/Riyadh89:+:03:07
Table 6-24 Pacific/Poland/Portugal GMT Offsets
Time Zone: GMT Offset
|
Time Zone: GMT Offset
|
Time Zone: GMT Offset
|
Pacific/Apia:-:11:00
|
Pacific/Johnston:-:10:00
|
Pacific/Ponape:+:11:00
|
Pacific/Auckland:+:12:00
|
Pacific/Kiritimati:+:14:00
|
Pacific/Port_Moresby:+:10:00
|
Pacific/Chatham:+:12:45
|
Pacific/Kosrae:+:11:00
|
Pacific/Rarotonga:-:10:00
|
Pacific/Easter:-:06:00
|
Pacific/Kwajalein:+:12:00
|
Pacific/Saipan:+:10:00
|
Pacific/Efate:+:11:00
|
Pacific/Majuro:+:12:00
|
Pacific/Samoa:-:11:00
|
Pacific/Enderbury:+:13:00
|
Pacific/Marquesas:-:09:30
|
Pacific/Tahiti:-:10:00
|
Pacific/Fakaofo:-:10:00
|
Pacific/Midway:-:11:00
|
Pacific/Tarawa:+:12:00
|
Pacific/Fiji:+:12:00
|
Pacific/Nauru:+:12:00
|
Pacific/Tongatapu:+:13:00
|
Pacific/Funafuti:+:12:00
|
Pacific/Niue:-:11:00
|
Pacific/Truk:+:10:00
|
Pacific/Galapagos:-:06:00
|
Pacific/Norfolk:+:11:30
|
Pacific/Wake:+:12:00
|
Pacific/Gambier:-:09:00
|
Pacific/Noumea:+:11:00
|
Pacific/Wallis:+:12:00
|
Pacific/Guadalcanal:+:11:00
|
Pacific/Pago_Pago:-:11:00
|
Pacific/Yap:+:10:00
|
Pacific/Guam:+:10:00
|
Pacific/Palau:+:09:00
|
Poland:+:01:00
|
Pacific/Honolulu:-:10:00
|
Pacific/Pitcairn:-:08:00
|
Portugal:00:00
|
Table 6-25 Singapore/System V/Turkey GMT Offsets
Time Zone: GMT Offset | Time Zone: GMT Offset | Time Zone: GMT Offset
Singapore:+:08:00 | SystemV/EST5:-:05:00 | SystemV/PST8PDT:-:08:00
SystemV/AST4:-:04:00 | SystemV/EST5EDT:-:05:00 | SystemV/YST9:-:09:00
SystemV/AST4ADT:-:04:00 | SystemV/MST7:-:07:00 | SystemV/YST9YDT:-:09:00
SystemV/CST6:-:06:00 | SystemV/MST7MDT:-:07:00 | Turkey:+:02:00
SystemV/CST6CDT:-:06:00 | SystemV/PST8:-:08:00 |
Table 6-26 U.S. GMT Offsets
Time Zone: GMT Offset | Time Zone: GMT Offset | Time Zone: GMT Offset
US/Alaska:-:09:00 | US/Eastern:-:05:00 | US/Pacific-New:-:08:00
US/Aleutian:-:10:00 | US/Hawaii:-:10:00 | US/Pacific:-:08:00
US/Arizona:-:07:00 | US/Indiana-Starke:-:05:00 | US/Samoa:-:11:00
US/Central:-:06:00 | US/Michigan:-:05:00 |
US/East-Indiana:-:05:00 | US/Mountain:-:07:00 |
Manifest File Automated Scripts
This section describes the Perl scripts that you can use to automate the creation of manifest files for your CDN. The most efficient method of creating a manifest file is to use the Spider Perl script in combination with the Manifest Perl script, both of which are available on Cisco.com. You can also use these two scripts as the basis for your own automation scripts.
Cisco provides two Perl scripts, spider.pl and manifest.pl, that can be used as is. If you are proficient in Perl, you can modify spider.pl and manifest.pl; however, modified scripts are not supported. Both scripts accept a --file argument that names a user-created rules file (for example, a file with the .cfg extension). So that the scripts can be rerun with identical arguments, we recommend that you place the arguments you require in a rules file rather than entering them on the command line.
First, run the Spider script, and then use the output of the Spider script as input to the Manifest script. The Spider script searches the content of the selected origin servers and outputs a database file containing a list of URLs of all content. The Manifest script uses this database file to build the manifest file with the correct syntax based on rules you stipulate from the command line or rules file. This produces an XML-based manifest file containing the URLs of only those content objects that you want made available to your users.
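The two-step sequence described above can be sketched as a small Python wrapper. This is an illustration only, not a Cisco-supplied tool: the function names are hypothetical, and the sketch assumes that perl is on the search path and that spider.pl and manifest.pl are in the current directory. Only keywords documented in this chapter (--start, --db, --prepos, --xml) are used.

```python
import subprocess
from typing import List


def build_pipeline(start_url: str, db_file: str,
                   prepos_rule: str, xml_file: str) -> List[List[str]]:
    """Return the Spider command followed by the Manifest command."""
    spider = ["perl", "spider.pl", f"--start={start_url}", f"--db={db_file}"]
    # The Manifest script reads the database file that the Spider script wrote.
    manifest = ["perl", "manifest.pl", f"--prepos={prepos_rule}",
                f"--db={db_file}", f"--xml={xml_file}"]
    return [spider, manifest]


def run_pipeline(commands: List[List[str]]) -> None:
    """Run each command in order and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Because each command is passed as an argument list rather than through a shell, the --prepos rule does not need surrounding quotes in this sketch.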
The following two sample automated Perl scripts are available on Cisco.com:
• Spider Perl script: The Spider script crawls the content of a selected origin server and outputs a database file containing a list of URLs.
• Manifest Perl script: The Manifest script reads the database file output by the Spider script and uses rules that you establish to produce an XML-formatted manifest file containing the URLs of only those filtered content objects that you want to make available to users.
Installing Perl on Your Workstation
You must have Perl installed on your workstation before working with or running the Spider or Manifest scripts. It is useful to also have a Perl interpreter available. Perl is open source software and can be downloaded for free from a variety of locations on the Internet. Refer to the Comprehensive Perl Archive Network (CPAN) at:
http://www.cpan.org
or
http://www.perl.com
Obtaining the Perl Scripts
The Spider and Manifest scripts can be obtained from Cisco.com using the same procedure that is used to obtain updated versions of the Cisco ACNS software.
To obtain the Spider and Manifest scripts from Cisco.com, follow these steps:
Step 1
Go to the following URL to find the Spider and Manifest Perl scripts:
http://www.cisco.com/pcgi-bin/tablebuild.pl/acns50
Step 2
When prompted, log in to Cisco.com using your designated Cisco.com username and password.
The Cisco ACNS Software download page appears, listing the available software updates for the Cisco ACNS Software product.
Step 3
Locate the file named ACNS-5.0.1-manifest-tools.zip. This is a Zip archive containing both the Manifest and the Spider Perl scripts.
Step 4
Click the link for the ACNS-5.0.1-manifest-tools.zip file. The download page appears.
Step 5
Click Software License Agreement.
A new browser window opens, displaying the license agreement.
Step 6
After you have read the license agreement, close the browser window displaying the agreement and return to the Software Download page.
Step 7
Click the filename link labeled Download.
Step 8
Click Save to file and then choose a location on your workstation to temporarily store the zipped file containing the scripts.
Step 9
Use your preferred unzip program to unpack the scripts to a location on your workstation or your network.
After you have unzipped the scripts, you are ready to begin using them to build manifest files for your website. See the "Listing Website Content Using the Spider Script" section and the "Selecting Live and Pre-Positioned Content Using the Manifest Script" section for instructions on running the scripts.
Listing Website Content Using the Spider Script
In the simplest scenario, the Spider script is pointed to the address of an origin server and given the name of a database (.db) file into which it places any valid URLs it discovers on that site. For example, if you wanted to analyze the contents of www.cisco.com for content that might be pre-positioned after the manifest file is created, you would issue the following command:
perl spider.pl --start=www.cisco.com --db=ciscocontent.db
Limiting or Broadening the Scope of the Spider Script
Running the Spider script on the whole of www.cisco.com might take hours and produce much more information than you are interested in. The Spider script contains a variety of tools that enable you to limit as well as broaden the scope of a spider's action.
Note
When running the Spider script on large websites, you must plan for the long period of time and the large amount of memory that is required for the Spider script to create a database.
For example, to limit the Spider script's search of www.cisco.com to just that part of the server containing product-related support information, you could enter the following command:
perl spider.pl --start=www.cisco.com/public/support/ --db=ciscocontent.db
To ask the Spider script to follow links from www.cisco.com to the Cisco networking professionals forum, you could enter the following Spider script command:
perl spider.pl --start=www.cisco.com --accept=business.cisco.com --db=ciscocontent.db
Spider Script Syntax Guidelines
The Spider script accepts the following syntax, as described in Table 6-27.
perl spider.pl {--start=origin_server_url [--accept=accept_url] [--depth=number] [--file=filename]
[--limit=number] [--prefix=url_prefix] [--reject=disallowed_url] --db=database_name.db}
Table 6-27 Spider Script Keywords
Keyword
|
Description
|
Command-Line Syntax
|
--start
|
Names the location (URL) of the origin server that is to be analyzed.
|
|
--db
|
Names the database file in which content URLs from the origin server and any accepted locations are to be placed.
|
|
--accept (optional)
|
Names a location, other than the one specified with the --start keyword, that is accepted when it is found in URLs.
|
--accept=forums.cisco.com
Note --accept is a more general command that can include regular expressions. For example, you can use "jobs.*tech" to accept any URLs with the string "jobs" followed by "tech."
|
--depth (optional)
|
Causes the Spider script to stop after following links to a specified number of levels deep on the origin server.
|
|
--file (optional)
|
Causes the Spider script to read its commands from a specified file, in this case the rules file, one line at a time.
|
|
--limit (optional)
|
Causes the Spider script to stop after retrieving a specified number of pages from the origin server. Specifying 0 sets no limit for the number of pages retrieved.
|
|
--map (optional)
|
Causes the Spider script to substitute the second URL prefix (appearing after the second =) for the first in any URLs from the origin server. If you rerun the Spider script on an origin server whose links have already been modified to point to the Content Engine, the substitution is reversed and the first prefix replaces the second.
|
--map=http://www.cisco.com/public/support/tac/=/support
|
--prefix (optional)
|
Specifies a URL prefix that URLs must match. The --prefix keyword is a convenience option that takes a fixed string, such as "http://www.cisco.com/jobs," which must match the URL from its beginning.
|
--prefix=http://www.cisco.com/partners/CDN/
|
--reject (optional)
|
Names a location that is rejected when it is found in URLs.
Note The order in which the --accept and --reject keywords are given to the Spider script determines precedence. The first match takes precedence.
|
|
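The precedence rule for --accept and --reject (the first match wins) can be illustrated with the following Python sketch. It is not the spider.pl implementation: the function name, the sample rules, and the sample URLs are hypothetical, and the sketch assumes that a URL matching no rule is rejected.

```python
import re
from typing import List, Tuple


def url_allowed(url: str, rules: List[Tuple[str, str]],
                default: bool = False) -> bool:
    """Apply accept/reject rules in the order given; the first match decides."""
    for action, pattern in rules:
        if re.search(pattern, url):
            return action == "accept"
    # Assumption: a URL that matches no rule is rejected.
    return default


# Rules in the order they would be given to the script (hypothetical values).
EXAMPLE_RULES = [
    ("reject", r"forums\.cisco\.com/private"),
    ("accept", r"forums\.cisco\.com"),
    ("accept", r"jobs.*tech"),
]
```

With these rules, a URL under forums.cisco.com/private is rejected even though it also matches the later accept rule, because the reject rule is listed first.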
Customizing the Spider Script
Because the Spider script anticipates certain platforms and scenarios that might not correspond to your own website configuration, Cisco provides you with the Perl source code for the Spider script, which you can modify to suit your own needs.
Selecting Live and Pre-Positioned Content Using the Manifest Script
Whereas the Spider script is used to gather a list of potential content from an origin server, the Manifest script sifts through the information gathered by the Spider script and decides which content to actually import to the CDN for placement on a Content Engine.
Pre-Positioned Versus Live Content
The Manifest script distinguishes between content that needs to be pre-positioned and live, streamed content that, by definition, cannot be pre-positioned.
The result of using the live command is nearly the same as that of using the prepos command. Both commands expect you to specify what you intend to deliver as live content or as pre-positioned content by using a comparison such as --prepos=match() or --prepos=type(). The only difference between the two commands is the tags contained in the .xml file that manifest.pl creates. If the prepos command is used, the .xml file contains the tag <item-group type="prepos">. If the live command is used, the .xml file contains the tag <item-group type="wmt-live"> or <item-group type="real-live">, depending on whether the streaming data is WMT or RealMedia.
By using the prepos command, you identify and pre-position content that meets criteria that you specify. For example, to pre-position image files from Cisco.com that are larger than 1 megabyte, you would enter the following command:
perl manifest.pl --prepos='type(image/*) and size > 1000k' --db=ciscocontent.db --xml=cisco.xml
By using the live command, you identify the URLs of live content. Unlike pre-positioned content, live content cannot be identified by information stored in the header, so you must devise a method of locating live content based solely on information contained in the URL of that content. For example, you can identify streamed content with the following command:
perl manifest.pl --live='match(http://*)'
Manifest Script Syntax Guidelines
The Manifest script accepts the following syntax, as described in Table 6-28.
perl manifest.pl {[--file=filename | --live='keyword_comparison' | --prepos='keyword_comparison' | --set='attribute=value : keyword_comparison' | --playservertable=filename | --map={origin_server_url_prefix=cdn_prefix}] --db=database_name.db --xml=manifest_file_name.xml}
Note
The --prepos keyword is required for the manifest file to be created from the Spider database. If you do not use this keyword, the manifest file created will be minimal and will not contain any content URLs.
The --prepos keyword can be used with either the type() or the match() comparison, which perform different functions. The match() comparison is a text match that acts on the name of the URL. For example, to select JPEG files with names such as a.jaypeg, use match(*.jaypeg). As another example, you can use match() to find the word "news" in the name of the URL. The match() comparison can also be used as shown in the following example:
perl manifest.pl --db=name.db --prepos='match(*.jpg)' --xml=xmlname.xml
The type() comparison compares the content named in the database file against the Content-Type header returned by the web server, which informs the client of the MIME type of the object. For example, even if you name your JPEG files *.jaypeg, the web server returns "Content-Type: image/jpeg," which is then placed in the database.
Other examples of the type() comparison include the following:
--prepos=type(text/html)
--prepos=type(text/plain)
--prepos=type(application/pdf)
--prepos=type(image/gif)
--prepos=type(image/jpeg)
--prepos=type(video/mpeg)
The following are two examples of using the type() comparison in the full command line:
perl manifest.pl --db=name.db --prepos="type(image/jpeg)" --xml=xmlname.xml
perl manifest.pl --db=name2.db --prepos="type(application/pdf)" --xml=xmlname2.xml
Tip
As a rule of thumb, use quotes only on the command line. You do not need quotes within a rules file that is read with the --file keyword.
If the --prepos keyword is used on the command line, quotes are needed as follows:
• Windows 2000: Use double quotes instead of single quotes, as shown in the preceding example.
• Linux: Use single quotes instead of double quotes.
If the --prepos keyword is used within a rules file that is read with the --file keyword, omit the quotes, as shown in the following rules.cfg file example:
--accept=forums.cisco.com
--prepos=type(image/gif) and size > 1000k
If the quotes are not removed from within the rules file, the following message appears:
Bareword found where operator expected.
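The quoting guidance above can be summarized in a short Python sketch. The function name and the context labels are hypothetical; the sketch only formats the --prepos argument as text for each of the three situations described in this section.

```python
def format_prepos(expression: str, context: str) -> str:
    """Format a --prepos argument for a command line or a rules file."""
    if context == "windows":
        # Windows 2000 command line: double quotes
        return f'--prepos="{expression}"'
    if context == "linux":
        # Linux command line: single quotes
        return f"--prepos='{expression}'"
    if context == "rules-file":
        # Rules file read with --file: no quotes
        return f"--prepos={expression}"
    raise ValueError(f"unknown context: {context!r}")
```

For example, format_prepos("type(image/jpeg)", "rules-file") returns the unquoted form that belongs in a rules file.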
Table 6-28 Manifest Script Keywords
Keyword
|
Description
|
Command-Line Syntax
|
--file (required)
|
Causes the Manifest script to read its commands from a specified file, one line at a time.
|
|
--prepos (required)
|
Marks content URLs in the database file that match the terms of the keyword comparison as pre-positioned content (type="prepos") in the manifest file.
Note The type command matches on the Content-Type: field in the Spider database file.
|
--prepos='type(image/jpeg) and size > 1000k'
|
--set (optional)
|
Sets the specified attribute to the value provided for all content items with URLs in the database file that match the keyword comparison.
|
--set='ttl=10000:match(*/urgent/*)'
|
--playservertable (optional)
|
Adds the playserver table in the specified file to the manifest file. Playserver tables map MIME-type content and filename extensions to specific server types to use (for example, "real" or "wmt") for the content in a specific Content Engine.
For the manifest file to validate properly, move the entire playserver table to the beginning of the manifest file as shown in the following example:
|
--playservertable=info.txt
|
--map (optional)
|
Causes the Manifest script to substitute the second URL prefix (appearing after the second =) for the first in any URLs from the origin server.
The second URL prefix must have a full path name. The --map keyword changes the names written to the .xml file that manifest.pl generates. When you run manifest.pl, the resulting <item> tags should have the cdn-url attribute set to the requested name.
|
--map=http://www.cisco.com/public/support/tac/=/support
|
--db (required)
|
Names the database file in which content URLs from the origin server and any accepted locations are located. This file provides the data that the Manifest script analyzes.
|
|
--xml (required)
|
Names the manifest file that is generated by the Manifest script.
|
|
match(comparison) (required)
|
Locates content URLs that contain text matching the value provided.
|
--prepos='match(http://forums.cisco.com/*)'
|
size(comparison) (required)
|
Identifies content named in the database file according to the specified file size parameter, which can be specified in kilobytes, megabytes, or gigabytes (k, kb, m, mb, g, gb).
|
--prepos=match(*.gif) and size > 1000k
|
time(comparison) (required)
|
Identifies content named in the database file according to the time since the content was last modified (in hours).
Note Do not use spaces before the word "hours," whether used within a rules file or not.
Note If you use the time() keyword within a rules file that is read with the --file keyword, do not use quotes.
In the syntax example, the "modtime" is compared to "now" - <value>, where "now" is the current time (in seconds since 1970). The <value> is a unit of time in hours. For example, if "modtime" is the current time minus 2 hours, it would be expressed as:
|
The following example shows a .cfg file with the time comparison using the ">" character:
--start=http://website.com
# this works for bitmaps modified within last 2 hours
--prepos=type(image/bmp) and modtime > (now - 2hours)
The following example shows a .cfg file with the time comparison using the "<" character:
--start=http://website.com
# this works for bitmaps that have NOT been modified within last 2 hours
--prepos=type(image/bmp) and modtime < (now - 2hours)
|
type(comparison) (required)
|
Identifies content named in the database file according to its MIME type (text, application, image, and so on).
|
--prepos='type(image/gif)'
|
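The comparisons in Table 6-28 can be illustrated with the following Python sketch. It is not the manifest.pl implementation. The format of the Spider database is not described here, so the records are hypothetical dictionaries; the sketch also assumes shell-style wildcard matching and decimal size units (1k = 1000 bytes), neither of which is confirmed by this chapter.

```python
import fnmatch
from typing import Dict

# Assumption: decimal units, so that 1000k corresponds to 1 megabyte.
_UNITS = {"k": 1000, "kb": 1000, "m": 1000**2, "mb": 1000**2,
          "g": 1000**3, "gb": 1000**3}


def parse_size(text: str) -> int:
    """Convert a size such as '1000k' or '2mb' to a number of bytes."""
    text = text.strip().lower()
    # Try the two-letter suffixes before the one-letter suffixes.
    for suffix in sorted(_UNITS, key=len, reverse=True):
        if text.endswith(suffix):
            return int(text[:-len(suffix)]) * _UNITS[suffix]
    return int(text)


def matches_type(record: Dict, pattern: str) -> bool:
    """type(): compare the stored Content-Type value against a pattern."""
    return fnmatch.fnmatchcase(record["content_type"], pattern)


def matches_url(record: Dict, pattern: str) -> bool:
    """match(): compare the text of the URL against a pattern."""
    return fnmatch.fnmatchcase(record["url"], pattern)


def modified_within(record: Dict, hours: int, now: int) -> bool:
    """modtime > (now - Nhours): modified within the last N hours."""
    return record["modtime"] > now - hours * 3600


NOW = 1_000_000_000  # fixed "now" in seconds since 1970, for a repeatable example
RECORDS = [
    {"url": "http://www.example.com/img/big.jpeg", "content_type": "image/jpeg",
     "size": 2_500_000, "modtime": NOW - 3600},
    {"url": "http://www.example.com/img/small.gif", "content_type": "image/gif",
     "size": 40_000, "modtime": NOW - 3 * 3600},
    {"url": "http://www.example.com/docs/guide.pdf", "content_type": "application/pdf",
     "size": 5_000_000, "modtime": NOW - 10 * 3600},
]

# Equivalent in spirit to: type(image/*) and size > 1000k
BIG_IMAGES = [r["url"] for r in RECORDS
              if matches_type(r, "image/*") and r["size"] > parse_size("1000k")]

# Equivalent in spirit to: match(*.gif)
GIF_URLS = [r["url"] for r in RECORDS if matches_url(r, "*.gif")]

# Equivalent in spirit to: modtime > (now - 2hours)
RECENT = [r["url"] for r in RECORDS if modified_within(r, 2, NOW)]
```

In this sample data, only big.jpeg satisfies the image and size rule, and only big.jpeg was modified within the last two hours.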
Customizing the Manifest Script
Because the Manifest script anticipates certain platforms and scenarios that might not correspond to your own website configuration, Cisco provides you with the Perl source code for the Manifest script, which you can modify to suit your own needs.
Creating a Rules File for the Spider and Manifest Scripts
When using the Spider and Manifest scripts on a large web server, the parameters and rules you set for your scripts may be numerous and complex. When this is the case, it is more practical to create a separate file containing a list of your customized rules. Then you can simply point to the rules file rather than having to enter a long series of commands every time you want the rules applied.
Using a rules file makes it easier to rerun the Spider and Manifest scripts and ensures that the scripts receive identical commands each time they are run. In addition, the same rules file can be read by both the Spider and the Manifest scripts without generating output errors. The Spider script simply ignores commands for the Manifest script, and vice versa.
To create a rules file for the Spider and Manifest scripts, follow these steps:
Step 1
Open your text editor.
Step 2
Enter your commands one at a time, each on its own line.
Each line of your rules file is sent to the scripts as a single argument. The following example shows a rules file for the Cisco website.
--accept=forums.cisco.com
--prepos=type(image/gif) and size > 1000k
Step 3
Save your file in a location relative to the Spider and Manifest scripts.
Step 4
Use the --file keyword to run each script using your rules file. For example:
perl spider.pl --file=cisco-rules.cfg
perl manifest.pl --file=cisco-rules.cfg
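The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a Cisco-supplied tool: the function names are hypothetical, the rules shown are examples built from keywords documented in this chapter, and the sketch assumes that lines beginning with "#" are comments, as in the .cfg examples in Table 6-28.

```python
import tempfile
from pathlib import Path
from typing import List


def write_rules_file(path: Path, arguments: List[str]) -> None:
    """Write one argument per line, without quotes."""
    path.write_text("\n".join(arguments) + "\n", encoding="utf-8")


def read_rules_file(path: Path) -> List[str]:
    """Return each non-empty, non-comment line as a single argument."""
    arguments = []
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            arguments.append(line)
    return arguments


def commands_for(rules_file: str) -> List[List[str]]:
    """Build the Spider and Manifest commands that share one rules file."""
    return [["perl", "spider.pl", f"--file={rules_file}"],
            ["perl", "manifest.pl", f"--file={rules_file}"]]


# One rules file serves both scripts; each script ignores the other's keywords.
RULES = [
    "--start=www.cisco.com",
    "--accept=forums.cisco.com",
    "--db=ciscocontent.db",
    "# pre-position large GIF images",
    "--prepos=type(image/gif) and size > 1000k",
    "--xml=cisco.xml",
]

with tempfile.TemporaryDirectory() as tmp:
    rules_path = Path(tmp) / "cisco-rules.cfg"
    write_rules_file(rules_path, RULES)
    ARGUMENTS = read_rules_file(rules_path)

COMMANDS = commands_for("cisco-rules.cfg")
```

Keeping the rules in a list and writing them out one per line mirrors the requirement that each line of the rules file is passed to the scripts as a single argument.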