Cisco Network Registrar

Monitoring Techniques for Cisco Network Registrar White Paper

Introduction

Statistics Collection

Collect DHCP Server Statistics

Collect DNS Server Statistics

Collect TFTP Server Statistics

Collect Host Statistics

Interpret the Collected Statistics

Interpret DHCP Server Statistics

Interpret DNS Server Statistics

Interpret TFTP Server Statistics

Related Information


Introduction

This document describes the techniques that you can use to collect and interpret the statistics required to monitor a Cisco Network Registrar (CNR) deployment. You must collect and store the statistics in a usable format to support decisions about both CNR and the network in these areas:

Capacity planning

Attack detection

Misconfiguration

The techniques in this document use these features to access the statistics counters that are available in the CNR version 6.1 servers:

Server logging

CLI commands

API features

You can also apply these techniques to earlier versions, provided that they support the statistics counters, logging functions, CLI commands, and API calls used here.

The techniques to collect host statistics (such as CPU use, disk use, and network use) are not in the scope of this paper. The minimum recommendations on the host statistics that you must collect are provided in the Collect Host Statistics section.

Statistics Collection

Statistics collection is an important step in monitoring a CNR deployment. You can use the statistics to analyze trends and identify trouble spots.

Convert the collected statistics so that they all share a common time reference and file format. Use comma separated value (CSV) text files as an intermediate format to merge the inputs. You can then analyze the collected data with reporting and charting products. Each line in the CSV file has a time stamp, which must be normalized across time zones if the deployment is geographically distributed.
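As a minimal sketch of the time stamp normalization described above, this Python function converts a local time stamp in the MM/DD/YYYY HH:MM:SS form used by the log examples in this document to UTC. The function name and the idea of supplying a fixed UTC offset per server are assumptions for illustration. A real deployment would use the actual time zone of each server, including daylight saving rules.

```python
from datetime import datetime, timedelta, timezone


def to_utc(stamp, utc_offset_hours):
    """Convert a local 'MM/DD/YYYY HH:MM:SS' time stamp to ISO 8601 UTC.

    utc_offset_hours is the server's offset from UTC. It is assumed to be
    known per server (for example, -5 for a server five hours behind UTC).
    """
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local_time = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S").replace(tzinfo=local_zone)
    return local_time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

With every server reporting in UTC, lines from different sites can be merged and sorted on the time stamp column alone.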

Take measurements at a regular interval. The recommended interval is five minutes. Each line in the CSV text file records the data items measured in the period (the last five minutes). For example, if you measure the number of request packets processed by the DHCP server, record on each line the number of packets received in that interval, not the cumulative number processed since server start, which is the value reported by the server.
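The conversion from cumulative counters to per-interval values can be sketched in Python as follows. The function names and the CSV column order are assumptions for illustration. The sketch also assumes that a counter that decreases between two polls means that the server restarted, in which case the new value is taken as the count for the interval.

```python
import csv
import io


def interval_deltas(previous, current):
    """Return per-interval counts from two cumulative counter samples.

    If a counter went down, the server is assumed to have restarted between
    polls, so the current value is used as the count for the interval.
    """
    deltas = {}
    for name, value in current.items():
        before = previous.get(name, 0)
        deltas[name] = value - before if value >= before else value
    return deltas


def csv_line(timestamp, deltas, columns):
    """Format one CSV line: the time stamp first, then counters in a fixed order."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([timestamp] + [deltas.get(column, 0) for column in columns])
    return buffer.getvalue()
```

For example, two polls that report total-requests values of 100 and 160 produce a CSV value of 60 for that interval.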

Collect DHCP Server Statistics

The DHCP server collects the basic statistics while it processes incoming requests. You do not need to configure anything to enable this feature.

The extensions that run in the CNR DHCP server are important performance factors that you must consider when you collect statistics. Any additional statistics that are available depend on the capabilities of the extension.

CNR API

Use the CNR API to collect the total statistics since the last server start. To sample the statistics, your program must poll the server and record the changes since the previous poll. Use the API command GetServerStats, which returns these attributes in an SCP object of type DHCPServerStats:

Attribute | Description
start-time | The date and time the server was last reloaded
total-discovers | Total DHCPDISCOVER packets received
total-requests | Total DHCPREQUEST packets received
total-releases | Total DHCPRELEASE packets received
total-offers | Total DHCPOFFER packets sent
total-acks | Total DHCPACK packets sent
total-naks | Total DHCPNAK packets sent
total-declines | Total DHCPDECLINE packets received

You can also retrieve these counters with the CLI. Use the server getstats command:

nrcmd> dhcp getstats

DHCP Log Files

You can use the DHCP activity summary log to collect statistics from the server. To enable the activity summary log, set the activity-summary flag for the DHCP log-settings attribute:

nrcmd> dhcp set log-settings=activity-summary

Set the report interval with the activity-summary-interval attribute:

nrcmd> dhcp set activity-summary-interval=1m

The counters are reported in message #05321. For example:

02/08/2004 16:00:02 name/dhcp/1 Activity Server 0 05320 DHCP activity, 60 seconds: Discovers: 20000, Offers: 20000, Requests: 20000, Acks: 20000, Nacks: 0, Rel.: 0, Decl.: 0, Exp.: 0, In use: Resp: 518, Req: 1, Acks/Second: 333.
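A collection script can extract the counters from such a line with regular expressions. This Python sketch assumes that the line layout is exactly as in the example above. The function name and the key names of the returned record are assumptions for illustration.

```python
import re

# Sample line copied from the activity summary example above.
SAMPLE_LINE = (
    "02/08/2004 16:00:02 name/dhcp/1 Activity Server 0 05320 DHCP activity, "
    "60 seconds: Discovers: 20000, Offers: 20000, Requests: 20000, Acks: 20000, "
    "Nacks: 0, Rel.: 0, Decl.: 0, Exp.: 0, In use: Resp: 518, Req: 1, Acks/Second: 333."
)

# Time stamp, reporting interval, and the counter text that follows it.
ACTIVITY = re.compile(r"^(\S+ \S+) .*DHCP activity, (\d+) seconds:(.*)$")
# A label (optionally abbreviated with a period), a colon, and an integer.
PAIR = re.compile(r"([A-Za-z/]+)\.?:\s*(\d+)")


def parse_dhcp_activity(line):
    """Return the counters of one activity summary line, or None if it does not match."""
    match = ACTIVITY.match(line.strip())
    if match is None:
        return None
    stamp, seconds, body = match.groups()
    record = {"timestamp": stamp, "interval-seconds": int(seconds)}
    for label, value in PAIR.findall(body):
        record[label] = int(value)
    return record
```

The abbreviated labels are stored without the trailing period, so Rel., Decl., and Exp. become the keys Rel, Decl, and Exp.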

If you enable failover, the failover-related counters are reported in message #05322. For example:

02/08/2004 16:00:02 name/dhcp/1 Activity Server 0 05322 Failover RECEIVED: 324,bndupd 0, ack 318, nak 0, pool 0, poll 6, updreq 0, upddone 0, SENT: 839, bndupd 833, ack 0, nak 0, pool 0, poll 6, updreq 0, upddone 0, MISSED: 0

Category | Label | Description
Activity | Discovers | DHCPDISCOVER packets received during reporting interval
Activity | Offers | DHCPOFFER packets sent during reporting interval
Activity | Requests | DHCPREQUEST packets received during reporting interval
Activity | Acks | DHCPACK packets sent during reporting interval
Activity | Nacks | DHCPNAK packets sent during reporting interval
Activity | Rel | DHCPRELEASE packets received during reporting interval
Activity | Decl | DHCPDECLINE packets received during reporting interval
Activity | Exp | Number of leases expired during reporting interval
Activity | Resp | Number of DHCP server response buffers in use at the end of this reporting interval
Activity | Req | Number of DHCP server request buffers in use at the end of this reporting interval
Activity | Acks/Second | Average rate for the reporting interval, if greater than 0
Failover | Received | Failover packets received during reporting interval
Failover (Received) | Bndupd | Bind update packets received during reporting interval
Failover (Received) | Ack | Bind ack packets received during reporting interval
Failover (Received) | Nack | Bind nack packets received during reporting interval
Failover (Received) | Pool | Backup pool messages received during reporting interval
Failover (Received) | Poll | Polling (keep-alive) messages received during reporting interval
Failover (Received) | Updreq | Update request messages received during reporting interval
Failover (Received) | Upddone | Update done messages received during reporting interval
Failover | Sent | Failover packets sent during reporting interval
Failover (Sent) | Bndupd | Bind update packets sent during reporting interval
Failover (Sent) | Ack | Bind ack packets sent during reporting interval
Failover (Sent) | Nack | Bind nack packets sent during reporting interval
Failover (Sent) | Pool | Backup pool messages sent during reporting interval
Failover (Sent) | Poll | Polling (keep-alive) messages sent during reporting interval
Failover (Sent) | Updreq | Update request messages sent during reporting interval
Failover (Sent) | Upddone | Update done messages sent during reporting interval
Failover | Missed | Failover packets dropped during reporting interval

Collect DNS Server Statistics

The DNS server collects the basic statistics during normal server processing, modeled after RFC-1611. No configuration is required to enable this feature. The server also collects enhanced statistics, kept separately as totals since the last server start and, when sample counters are enabled, as counts for the current sample interval. The enhanced counters fit in these five categories:

Performance

Query

Errors

Security

Maxcounters

You can enable the sample counters for these groups by setting the collect-sample-counters attribute and configuring the sample interval:

nrcmd> dns enable collect-sample-counters
nrcmd> dns set activity-counter-interval=1m

The default sample interval is five minutes. No configuration is required to collect the total counters (measured from the last server-start or administrative reset).

The DNS server counters include:

Attribute | Description
id | String identifier for this DNS server
config-recurs | The recursion services offered by this server: available(1) - performs recursion on requests from clients; restricted(2) - recursion is performed on requests only from certain clients; unavailable(3) - recursion is not available.
config-up-time | The elapsed time since the DNS server process was started.
config-reset-time | The elapsed time since the DNS server was last reset (restarted).
config-reset | The server state: other(1) - server in some unknown state; initializing(3) - server (re)initializing; running(4) - server currently running.
counter-auth-ans | The number of queries which were authoritatively answered.
counter-auth-no-names | The number of queries for which 'authoritative no such name' responses were made.
counter-auth-no-data-resps | The number of queries for which 'authoritative no such data' (empty answer) responses were made.
counter-non-auth-datas | The number of queries which were non-authoritatively answered (cached data).
counter-non-auth-no-datas | The number of queries which were non-authoritatively answered with no data (empty answer).
counter-referrals | The number of requests that were referred to other servers.
counter-errors | The number of requests the server has processed that were answered with errors (RCODE values other than 0 and 3). Reference RFC-1035, section 4.1.1.
counter-rel-names | The number of requests received by the server for names that are only 1 label long (text form - no internal dots).
counter-req-refusals | The number of DNS requests refused by the server.
counter-req-unparses | The number of requests received that could not be parsed.
counter-other-errors | The number of requests which were aborted for other (local) server errors.
counter-reset-time | The time stamp of the last administrative reset of DNS counters.
sample-time | The time stamp of the last sample. This attribute applies only when sample counters are enabled.
sample-interval | The counter sampling interval. This attribute applies only when sample counters are enabled.

These are the enhanced counters by category:

Category | Label | Description
Performance | updated-rrs | Total number of RRs added or deleted, including administrative updates from SCP. Note that a single update may have multiple deletes and/or adds.
Performance | update-packets | Total number of update packets successfully processed.
Performance | ixfrs-out | Number of successful outbound incremental zone transfers.
Performance | ixfrs-in | Number of successful inbound incremental zone transfers, including full zone responses.
Performance | ixfrs-full-resp | Number of successful outbound full zone transfers that originated from IXFR requests, but required a full zone response because of IXFR errors, because the requested serial history was not available, or because there were too many changes in the zone.
Performance | axfrs-out | Number of successful outbound full zone transfers, including full zone responses to IXFR requests.
Performance | axfrs-in | Number of successful inbound full zone transfers.
Performance | queries | Number of query responses, including name queries, IXFR/AXFR query responses, and query forward responses, but excluding update replies.
Performance | xfrs-out-at-limit | Number of times the number of outbound zone transfers reached the concurrent limit (set by the DNS server visibility 3 attribute, xfer-server-concurrent-limit, which has a default value of five).
Performance | xfrs-in-at-limit | Number of times the number of inbound zone transfers reached the concurrent limit (set by the DNS server visibility 3 attribute, xfer-server-concurrent-limit, which has a default value of five).
Performance | notifies-out | Number of outbound Notify packets.
Performance | notifies-in | Number of inbound Notify packets.
Query | auth-answers | Number of queries that were authoritatively answered (reference RFC-1611).
Query | auth-no-names | Number of queries for which authoritative-no-such-name responses were made (reference RFC-1611).
Query | auth-no-data-responses | Number of queries for which authoritative-no-such-data (empty answer) responses were made (reference RFC-1611).
Query | nonauth-answers | Number of queries that were non-authoritatively answered from cached data (reference RFC-1611).
Query | nonauth-no-data-responses | Number of queries that were non-authoritatively answered with no data, that is, an empty answer (reference RFC-1611).
Query | referrals | Number of requests that were referred to other servers (reference RFC-1611).
Query | relative-name-requests | Number of requests received by the server for names that were only one text label long (reference RFC-1611).
Query | lame-delegations | Number of lame delegations.
Query | mem-cache-hits | Number of internal memory cache lookup hits.
Query | mem-cache-misses | Number of internal memory cache lookup misses.
Query | mem-cache-writes | Number of cache record writes to the persistent cache DB.
Security | rcvd-tsig-packets | Number of received packets containing a TSIG record.
Security | detected-tsig-bad-time | Bad TSIG time detected from incoming packet contents.
Security | detected-tsig-bad-key | Bad TSIG key detected from incoming packet contents.
Security | detected-tsig-bad-sig | Bad TSIG signature detected from incoming packet contents.
Security | rcvd-tsig-bad-time | Bad TSIG time reported in the TSIG error field in the incoming packet.
Security | rcvd-tsig-bad-key | Bad TSIG key reported in the TSIG error field in the incoming packet.
Security | rcvd-tsig-bad-sig | Bad TSIG signature reported in the TSIG error field in the incoming packet.
Security | unauth-xfer-reqs | The number of restrict-xfer-acl ACL authorization failures for zones with restrict-xfer enabled.
Security | unauth-update-reqs | The number of DNS update failures due to update-acl ACL authorization failures or because zones have been configured with the dynamic attribute disabled.
Security | restrict-query-acl | The number of query failures due to restrict-query-acl ACL authorization failures.
Errors | update-errors | Number of errors detected in update packets, excluding TSIG errors.
Errors | ixfr-in-errors | Number of inbound IXFR errors, excluding packet format errors.
Errors | ixfr-out-errors | Number of outbound IXFR errors, excluding packet format errors.
Errors | axfr-in-errors | Number of inbound AXFR errors, excluding packet format errors.
Errors | axfr-out-errors | Number of outbound AXFR errors, excluding packet format errors.
Errors | sent-total-errors | Number of requests the server has processed that were answered with errors (RCODE values other than 0, 3, 6, 7, and 8) (reference RFC-1611).
Errors | rcvd-format-errors | Number of incoming packets received with the error field set, that is, with RCODE set to FORMERR.
Errors | sent-format-errors | Number of requests received that could not be parsed and resulted in a FORMERR response (reference RFC-1611).
Errors | sent-other-errors | Number of requests that were aborted for other (local) server errors (reference RFC-1611).
Maxcounters | concurrent-xfrs-in | The maximum number of concurrent threads used for inbound zone transfers during this reporting interval.
Maxcounters | concurrent-xfrs-out | The maximum number of concurrent threads used for outbound zone transfers during this reporting interval.

CNR API

You can use the CNR API to collect either the total statistics since the last server start or the sample counters. Use the API call getCNRDNSServerStats. This API returns these statistics:

Attribute | Description | Value Data Type
dns-server-stats | The current DNS server statistics | SCP Object of type DNSServerStats.
total-counters | The category counters measured since the last server start or administrative reset | SCP List of counter objects of type DNSServerPerformanceStats, DNSServerQueryStats, DNSServerSecurityStats, DNSServerErrorsStats, and DNSServerMaxCounterStats.
sample-counters | The sample counters measured during the last sampling interval | SCP List of counter objects of type DNSServerPerformanceStats, DNSServerQueryStats, DNSServerSecurityStats, DNSServerErrorsStats, and DNSServerMaxCounterStats.

You can use the server getstats command to retrieve these counters with the CLI:

nrcmd> dns getstats
nrcmd> dns getstats all total
nrcmd> dns getstats all sample

DNS Log Files

You can use the DNS activity summary log to collect periodic statistics from the server. Set these items to enable the activity summary log:

The activity-summary flag for the DNS log-settings attribute

The report interval

The categories to be logged

For example:

nrcmd> dns set log-settings=activity-summary
nrcmd> dns set activity-summary-interval=1m
nrcmd> dns set activity-counter-log-settings=total, sample, performance, query

The default report interval is five minutes.

Note: You must enable sample counters to report counters for the sample interval. The counters are reported for both totals and the latest sample interval in messages 03523, 03573, 03574, 03575, 03576, 03577, 03578, 03579, 03580 and 03603. For example:

02/20/2004 15:48:41 name/dns/1 Info Server 0 03523 [Stats-Perform] Total since Fri Feb 20 15:42:15 2004 - update-rrs:0, update-packets:0, ixfrs-out:0, ixfrs-in:0, ixfrs-full-resp:5, axfrs-out:10, axfrs-in:0, queries:10, xfrs-out-at-limit:0, xfrs-in-at-l notifies-out:10, notifies-in:0.
02/20/2004 15:48:41 name/dns/1 Info Server 0 03573 [Stats-Perform] sampled at Fri Feb 20 15:43:41 2004 with interval of 300 sec - update-rrs:0, update-packets:0, ixfrs-out:0, ixfrs-in:0, ixfrs-full-resp:5, axfrs-out:10, axfrs-in:0, queries:10, xfrs-out xfrs-in-at-limit:0, notifies-out:10, notifies-in:0.
02/20/2004 14:28:42 name/dns/1 Info Server 0 03574 [Stats-Query] Total since Fri Feb 20 13:54:17 2004: auth-answers:1, auth-no-names:0, auth-no-data-responses:0, nonauth-answers:3, nonauth-no-data-responses:0, referrals:1, relative-name-requests:0, refusals:0, lame-delegations:0, mem-cache-hits:316, mem-cache-misses:124, mem-cache-writes
02/20/2004 14:29:42 name/dns/1 Info Server 0 03575 [Stats-Query] sampled at Fri Feb 20 14:28:42 2004 with interval of 60 sec: auth-answers:1, auth-no-names:0, auth-no-data-responses:0, nonauth-answers:3, nonauth-no-data-responses:3, referrals:0, relative-name-requests:1, refusals:0, lame-delegations:0, mem-cache-hits:0, mem-cache-misses:18, mem-cache-writes:4.
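The name:value layout of these messages lends itself to a small parser. This Python sketch assumes that each log entry has been joined onto a single line and that the category tag and counter layout are exactly as in the examples above. The function name and the structure of the returned record are assumptions for illustration.

```python
import re

# The last example entry above, joined onto one line.
SAMPLE_ENTRY = (
    "02/20/2004 14:29:42 name/dns/1 Info Server 0 03575 [Stats-Query] sampled at "
    "Fri Feb 20 14:28:42 2004 with interval of 60 sec: auth-answers:1, auth-no-names:0, "
    "auth-no-data-responses:0, nonauth-answers:3, nonauth-no-data-responses:3, "
    "referrals:0, relative-name-requests:1, refusals:0, lame-delegations:0, "
    "mem-cache-hits:0, mem-cache-misses:18, mem-cache-writes:4."
)

# The category tag, followed by the words that mark totals or samples.
CATEGORY = re.compile(r"\[Stats-([A-Za-z]+)\]\s+(Total since|sampled at)")
# A lowercase counter name immediately followed by a colon and an integer.
COUNTER = re.compile(r"([a-z][a-z-]*):(\d+)")


def parse_dns_stats(entry):
    """Return category, scope, and counters of one entry, or None if it does not match."""
    tag = CATEGORY.search(entry)
    if tag is None:
        return None
    pairs = COUNTER.findall(entry[tag.end():])
    return {
        "category": tag.group(1),
        "scope": "total" if tag.group(2) == "Total since" else "sample",
        "counters": {name: int(value) for name, value in pairs},
    }
```

Because the sample messages already report per-interval values, only entries with the total scope need the cumulative-to-interval conversion.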

Collect TFTP Server Statistics

CNR API

You can use the CNR API to collect statistics from the last server-start. To sample statistics, your program must poll the server and record the changes from the last polling interval. Use the GetServerStats API call. This API returns these attributes in an SCP Object of type TFTPServerStats:

Attribute | Description
id | String identifier for this TFTP server.
server-start-time | The start time of the server.
server-reset-time | The time the server was last restarted or reloaded.
server-state | The server state: other(1) - server in some unknown state; initializing(3) - server (re)initializing; running(4) - server currently running.
server-time-since-start | The elapsed time since the TFTP server process was started.
server-time-since-reset | The elapsed time since the TFTP server was last reset (restarted or reloaded).
total-packets-in-pool | Maximum number of packet buffers that can be used by the server.
total-packets-in-use | Total number of packet buffers currently in use by the server.
total-packets-received | Total number of packets received by the server since the last server reset.
total-packets-sent | Total number of packets the server has sent since the last server reset.
total-packets-drained | Total number of packets drained (read and discarded) since the last server reset. A packet is drained when the TFTP server is overwhelmed and is using all its packets already, so there are no more available to process the incoming packet.
total-packets-dropped | Total number of packets the server has dropped since the last server reset. This includes any packet that is dropped for any reason, such as packets that are unknown to the server, malformed, duplicated, or drained.
total-packets-malformed | Total number of packets the server has received that were malformed since the last server reset.
total-read-requests | Total number of packets the server has received that were read requests since the last server reset.
total-read-requests-completed | Total number of read requests that were completed since the last server reset.
total-read-requests-refused | Total number of read requests that the server refused since the last server reset.
total-read-requests-ignored | Total number of read requests that the server ignored since the last server reset.
total-read-requests-timed-out | Total number of read requests that timed out since the last server reset.
total-write-requests | Total number of packets the server has received that were write requests since the last server reset.
total-write-requests-completed | Total number of write requests that were completed since the last server reset.
total-write-requests-refused | Total number of write requests that the server refused since the last server reset.
total-write-requests-ignored | Total number of write requests that the server ignored since the last server reset.
total-write-requests-timed-out | Total number of write requests that timed out since the last server reset.
total-docsis-requests | Total number of packets the server has received that were CSRC 1.0 dynamic DOCSIS requests since the last server reset.
total-docsis-requests-completed | Total number of CSRC 1.0 dynamic DOCSIS requests that were completed since the last server reset.
total-docsis-requests-refused | Total number of CSRC 1.0 dynamic DOCSIS requests that the server refused since the last server reset.
total-docsis-requests-ignored | Total number of CSRC 1.0 dynamic DOCSIS requests that the server ignored since the last server reset.
total-docsis-requests-timed-out | Total number of CSRC 1.0 dynamic DOCSIS requests that timed out since the last server reset.
read-requests-per-second | Number of read requests per second processed during this reporting interval.
write-requests-per-second | Number of write requests per second processed during this reporting interval.
docsis-requests-per-second | Number of CSRC 1.0 dynamic DOCSIS requests per second processed during this reporting interval.

You can use the server getstats command to retrieve these counters with the CLI:

nrcmd> tftp getstats

Collect Host Statistics

You must collect host statistics to plan host capacity and tune system performance. The mechanism used to collect information on the machine is not in the scope of this document, because it is system dependent. You must collect at least these statistics:

CPU usage

- User

- System

- Wait

Network usage

- Send

- Receive

Disk usage

- Read

- Write

Uptime, sampled at the same rate and in the same manner as the server statistics

Collect the usage figures as percentages.

Interpret the Collected Statistics

Interpreting the collected statistics is more an art than a science. You must develop heuristics and adjust them over time, because they depend on your deployment and its history. Calculate the steady state for all the statistics that you collect. These sections describe what to look for in these categories:

Capacity planning

Attack detection

Misconfiguration

Each server has statistics that highlight what the server does. These are its performance indicators.

The uptime of the servers and the machines is used to warn of errors. Restarts can occur for maintenance. However, if they occur frequently, you may have a problem that requires investigation.
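One way to count restarts from the collected uptime samples is sketched below in Python. It assumes that uptime is recorded in seconds at each polling interval and that any decrease between consecutive samples means a restart. The function name is an assumption for illustration.

```python
def count_restarts(uptimes_seconds):
    """Count restarts in a series of uptime samples (seconds since start).

    A restart is inferred whenever a sample is lower than the one before it.
    """
    pairs = zip(uptimes_seconds, uptimes_seconds[1:])
    return sum(1 for earlier, later in pairs if later < earlier)
```

How many restarts per day or week count as frequent is a heuristic that depends on the maintenance practices of the deployment.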

You can merge the collected host statistics with the collected server statistics to help plan capacity. With one CSV file that contains machine and server statistics, you can create charts that map server performance with CPU, network, and disk usage.
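The merge can be sketched in Python as a join on the time stamp column. The column name timestamp, the function name, and the choice to keep only time stamps that appear in both inputs are assumptions for illustration.

```python
import csv
import io


def merge_by_timestamp(server_csv, host_csv):
    """Join two CSV texts on their 'timestamp' column.

    Only time stamps present in both inputs are kept, so both collectors
    are assumed to sample on the same normalized schedule.
    """
    host_rows = {row["timestamp"]: row for row in csv.DictReader(io.StringIO(host_csv))}
    merged = []
    for row in csv.DictReader(io.StringIO(server_csv)):
        host = host_rows.get(row["timestamp"])
        if host is not None:
            merged.append({**row, **host})
    return merged
```

Each merged row then carries the server counters and the host usage figures for the same interval, which is the form that charting tools typically expect.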

Attack detection covers both malicious attempts to break your network and friendly processes that cause more load than expected. You must know the steady state rates for each server to determine whether an attack is occurring.
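A simple heuristic for comparing a new measurement with the steady state is sketched below in Python. The use of the mean plus a multiple of the standard deviation, the default multiple of three, and the function name are all assumptions for illustration. The right threshold depends on your deployment and its history.

```python
from statistics import mean, pstdev


def exceeds_steady_state(history, latest, tolerance=3.0):
    """Return True if latest is well above the steady state of history.

    The steady state is taken as the mean of past per-interval values, and
    'well above' as more than tolerance standard deviations over that mean.
    """
    baseline = mean(history)
    spread = pstdev(history)
    return latest > baseline + tolerance * spread
```

A flagged interval is only a prompt for investigation, because reboot events and misconfiguration can produce the same pattern as an attack.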

Misconfiguration can occur both in the servers and network configuration. You must know the deployment architecture to monitor the configuration.

Interpret DHCP Server Statistics

The number of DHCP messages processed is the main performance indicator for the DHCP server. Request and response buffers provide an indication of the traffic load on the server. Failover counters provide an indication of the state of failover synchronization.

Capacity Planning

You can chart the CPU, network, and disk usage versus the performance indicators of the DHCP server by using the merged CSV text file. From this you can determine which combination of machine resources affects the performance of your server, which aids capacity planning and performance tuning of the machine.

The number of request buffers used indicates how many simultaneous requests the server handles. When the network operates at a steady state, this value remains relatively constant. When a large reboot event occurs, the value jumps to the configured maximum. Further incoming packets are dropped, and the server takes in new requests only as pending requests are completed. This algorithm leverages the fact that DHCP clients time out and retry if they do not receive a response. Dropping the extra requests allows the server to dedicate its processing to only as many packets as it can respond to within the client timeout, which minimizes the total time required to bring all clients online. Once the reboot event is completed, the buffers in use return to steady state values.

Note: Since the same pool of request buffers is used for both lease activity and failover activity, request buffers in use never reach 0 when failover is enabled, even in the absence of client activity.

The default value for max-dhcp-requests is 500, but you can tune this to the capacity of the server. The server capacity is defined in terms of the lease rate and the average latency of the lease transaction. For example, if the maximum capacity of the server is 1000 leases/sec, and on average, leases are returned to the client in 500 ms, then a value of 500 is sufficient for the server to respond to clients at this rate. A lower value throttles the performance of the server below this capacity. A higher value increases the latency during burst events, but allows a greater number of clients to be serviced without retries. Given the typical client timeout of four seconds, an average latency of two seconds can be tolerated without added client timeouts. You can measure the maximum leases per second as the rate at which CPU use reaches 100% or the latency exceeds the maximum threshold.
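The arithmetic behind the example above is the product of the lease rate and the average latency, which gives the number of requests in flight at one time. This Python sketch shows the calculation. The function name is an assumption for illustration, and the result is only a starting point for tuning.

```python
import math


def request_buffers_needed(leases_per_second, average_latency_seconds):
    """Estimate the request buffers in use at a given lease rate and latency.

    Requests in flight = arrival rate x time each request spends in the server.
    """
    return math.ceil(leases_per_second * average_latency_seconds)
```

At 1000 leases/sec and 500 ms latency the estimate is 500, which matches the default. At the tolerable average latency of two seconds the same rate would keep 2000 requests in flight.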

The number of response buffers in use indicates how many simultaneous requests are completed by the server. When the network operates at a steady state, this value remains constant, and tracks with the number of request buffers in use. If the server reaches its configured maximum, it can no longer respond to events. This should not occur and is an indicator of a serious network problem. Since the same pool of response buffers is used for both lease activity and failover activity, the server adjusts this value to be at least four times the request buffers, to ensure sufficient resources are available to process all pending client and failover activity simultaneously.

The performance of the DHCP server is affected by the performance of external systems if LDAP client lookups are used, or if the server is integrated into a Broadband Access Center (BAC) provisioning system. If the external systems are operated at capacity, then the DHCP server appears slower. If the CPU utilization on the DHCP server does not reach 100% before the latency threshold is reached, this can be an indicator of a provisioning system performance problem. In this case, fix the problem in the provisioning system, not at the DHCP server.

Attack Detection

Comparing the rates of incoming DHCP messages with the steady state rates of incoming requests is a method of detecting a possible attack. A large number of dropped packets, declines, or nacks could also be indicators. However, these can also be indicators of a misconfiguration. A large increase in requests can indicate that a CMTS was rebooted or that some portion of the network restarted after a power outage.

Misconfiguration

The presence of decline messages indicates a network configuration error or a misbehaving client. Addresses are marked unavailable when a decline is received, but then reclaimed once the unavailable timeout period expires. However, addresses continue to cycle through an unavailable state until the network problem is resolved. The DHCP server logs contain additional entries for specific error conditions that are encountered, and can be used to help isolate the problem.

Some number of nacks are normal when failover partners resynchronize after an outage. An excessive number of nacks can indicate a configuration mismatch between the servers that prevents them from agreeing on the state of a lease. In this case, the servers may fail to complete resynchronization. You can use the failover configuration feature in the CNR web UI to verify and correct failover configuration issues.

If the number of dropped DHCP messages increases over the steady state for this statistic, a configuration error can exist in the provisioning system that prevents the server from assigning the client a valid address that matches its client class assignment. The DHCP server logs contain additional entries for the specific conditions encountered.

Interpret DNS Server Statistics

The main performance indicators for the DNS server are the number of query, zone transfer, and update messages processed.

Capacity Planning

You can chart the CPU, network, and disk usage versus the performance indicators of the DNS server by using the merged CSV text file. From this information you can determine which combination of machine resources affects the performance of your server, which aids capacity planning and performance tuning of the machine.

A high number of memory cache misses can indicate that you should increase the size allocated to the cache to support a higher volume of queries. However, if the majority of query responses are non-authoritative, cache misses can indicate the TTLs for these records are too short to be usefully cached. In this case, a larger cache has little impact.
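To judge whether the number of cache misses is high, it can help to express it relative to the total number of lookups. This Python sketch computes a hit ratio from the mem-cache-hits and mem-cache-misses counters. The ratio itself and the function name are assumptions for illustration, and no threshold is implied by the counters.

```python
def cache_hit_ratio(hits, misses):
    """Return the fraction of memory cache lookups that were hits.

    Returns 0.0 when there were no lookups in the interval.
    """
    lookups = hits + misses
    return hits / lookups if lookups else 0.0
```

With the totals from the log example earlier in this document (316 hits and 124 misses), about 72% of lookups were served from the memory cache.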

The performance counters xfrs-out-at-limit and xfrs-in-at-limit indicate the number of times the server was throttled back by its configuration limit in processing zone transfers. If the main function of the server is to support zone transfers (for example, it is a secondary server configured to service zone requests from a group of second-tier secondary servers that serve client query requests), you can increase this limit to reduce the latency of zone updates. You should take care when changing this value for general-purpose servers, since an increase in zone transfer responsiveness decreases the responsiveness of update and query processing. The query, zone transfer, and update performance counters can be used to assess the primary role of each server in the network.

Attack Detection

A method used to detect a possible attack is to compare the rates of incoming query messages with the steady state rates of incoming requests. A large number of no-such-data responses or ACL authorization failures can be indicators. However, these can also be indicators of a misconfiguration. A large increase in queries can also indicate that some portion of the network restarted after a power outage.

Misconfiguration

The presence of lame delegation errors indicates a misconfiguration in the network that needs to be corrected on the originating name server. An excessive number of error packets received can indicate configuration problems in other DNS servers in the network. The DNS server logs contain additional entries for specific error conditions that you encounter, and you can use these to help isolate the problem.

An excessive number of no-such-data responses can indicate a configuration mismatch between the domain information provided to the client in a DHCP response, and the zones configured on the DNS server. These configuration errors should be corrected at the DHCP server, or the originating provisioning system, as appropriate for the deployment.

A large number of ACL authorization failures can indicate a configuration mismatch between servers or a situation where clients issue requests from a new network that was not added to the authorized list. The DNS server logs contain additional entries for specific error conditions that are encountered, and can be used to help isolate the problem.

Interpret TFTP Server Statistics

The main performance indicators for the TFTP server are the number of read and write request messages processed. Packet buffers in use provide an indication of the traffic load on the server.

Capacity Planning

You can use the merged CSV text file to chart the CPU, network, and disk usage versus the performance indicators of the TFTP server. From this information you can determine which combination of machine resources affects the performance of your server, which aids capacity planning and performance tuning of the machine.

The number of packet buffers used indicates how many simultaneous requests are handled by the server. When the network is operating at a steady state, this value should remain relatively constant. When a large reboot event occurs, this value can jump to the configured maximum. The server default is 512. Further incoming packets are dropped, and the server takes in new requests only as pending requests are completed. This algorithm leverages the fact that clients time out and retry if they do not receive a response. Dropping the extra requests allows the server to dedicate its processing to only the packets it can respond to within the client timeout, which minimizes the total time required to bring all clients online. Once the reboot event is completed, the buffers in use return to steady state values. This value does not need to be tuned. Since the TFTP protocol starts a new connection for each client request, configuring the server to accept a greater number of simultaneous connections can quickly exhaust server resources, and result in degraded performance overall. The maximum value that can be configured is 1000.
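To chart the packet buffer load over time, you can express the buffers in use as a fraction of the pool. This Python sketch uses the total-packets-in-use and total-packets-in-pool attributes described earlier. The function name is an assumption for illustration.

```python
def buffer_utilization(packets_in_use, packets_in_pool):
    """Return the fraction of the TFTP packet buffer pool that is in use."""
    if packets_in_pool <= 0:
        raise ValueError("packets_in_pool must be positive")
    return packets_in_use / packets_in_pool
```

A value that stays near 1.0 outside of reboot events suggests sustained overload rather than a transient burst.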

Attack Detection

A method used to detect a possible attack is to compare the rates of incoming read and write requests with the steady state rates. A large number of refused and/or ignored requests can also be indicators. However, these can also be indicators of a misconfiguration. A large increase in requests can indicate that a CMTS was rebooted or some portion of the network was restarted following a power outage.

Misconfiguration

An excessive number of ignored read requests can indicate a configuration mismatch between the file information provided to the client in a DHCP or BOOTP response, and the files available on the TFTP server. Configuration errors should be corrected at the DHCP server, or the originating provisioning system, as appropriate for the deployment. The TFTP server logs contain additional entries for specific error conditions that are encountered, and can be used to help isolate the problem.

Related Information

Technical Support - Cisco Systems