Table Of Contents
Troubleshooting
Determine Version Information
Cannot Locate License Key
Cannot Recover My Password
Cannot Delete a Device from MARS
Cannot Re-Add a Device to MARS
Cannot Add a Device to MARS
Cannot Rename Device in MARS
Collect Support Information
Submitting Feedback and Reporting Errors
Access the GUI when the Network Is Down
Troubleshooting Global Controller-to-Local Controller Communications
Communications Overview
Communication States
Required Open Ports
General Issues and Solutions
List of Backend Services and Processes
Error Messages
Troubleshooting
Revised: September 14, 2007, OL-14672-01
This appendix presents information that is helpful when troubleshooting the MARS Appliance. It lists the expected services and error messages for each supported MARS Appliance, and it explains how to collect and send support information to assist Cisco support in debugging those services when required. This appendix also provides guidance on retrieving lost license keys and accessing the web interface when the network is down. It includes the following topics:
•
Determine Version Information
•
Cannot Locate License Key
•
Cannot Recover My Password
•
Cannot Delete a Device from MARS
•
Cannot Re-Add a Device to MARS
•
Cannot Add a Device to MARS
•
Cannot Rename Device in MARS
•
Collect Support Information
•
Access the GUI when the Network Is Down
•
Troubleshooting Global Controller-to-Local Controller Communications
•
List of Backend Services and Processes
•
Error Messages
Determine Version Information
Beginning with the 4.3.1 and 5.3.1 releases, dynamic IPS signature updates are an aspect of the version of software running on a MARS Appliance. Therefore, in addition to running the same MARS software version on the Global Controller and Local Controller, the IPS signature versions must match or communications fail. To determine the version of MARS software and the IPS signature version, click Help > About on each appliance.
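The comparison described above can be sketched in Python. This is a minimal illustration, not a Cisco tool: it assumes you have copied the two values by hand from Help > About on each appliance, and the `ApplianceVersion` type, the field names, and the sample version strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApplianceVersion:
    """Values read manually from Help > About on one appliance (hypothetical type)."""
    mars_software: str   # for example "5.3.1"
    ips_signature: str   # signature version string; its exact format is an assumption


def version_mismatches(gc: ApplianceVersion, lc: ApplianceVersion) -> list[str]:
    """Return the names of the fields that differ between the two appliances."""
    mismatches = []
    if gc.mars_software.strip() != lc.mars_software.strip():
        mismatches.append("mars_software")
    if gc.ips_signature.strip() != lc.ips_signature.strip():
        mismatches.append("ips_signature")
    return mismatches
```

An empty result means both values match. Any entry in the list names a field to reconcile before investigating other causes of a communications failure.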
Cannot Locate License Key
For newer models of the MARS Appliance, the license key and serial numbers are both located on the exterior of the appliance. For information on locating the license key and serial number, see License Key, page 1-9.
If you cannot locate your license key, contact the Cisco Licensing Team at licensing@cisco.com. You will need to provide the following information in the e-mail:
•
Customer name
•
Serial number of the MARS Appliance
Cannot Recover My Password
See Recovering a Lost Administrative Password.
Cannot Delete a Device from MARS
See Delete a Device.
Cannot Re-Add a Device to MARS
If you cannot re-add a device to MARS, the device is likely already defined in one capacity or another. See Delete a Device.
Cannot Add a Device to MARS
If you cannot add a device to MARS, the device has likely been defined during a topology discovery operation. You can address this issue by first deleting the device and then adding it. See Delete a Device.
Cannot Rename Device in MARS
You cannot directly rename a device. To do so you must first delete the device and then re-add it. See Delete a Device.
Collect Support Information
As long as your appliance is running, you can provide Cisco support with log information that can assist in diagnosing any issues you are having with the appliance. Three options exist for collecting and sending this information:
•
Collect Summary Status from the MARS Database. As of 4.3.1 and 5.3.1 releases, you can use the get_mars_summary_info.sh script to gather high-level statistics about a MARS Appliance's configuration and topology.
[pnadmin]$ script get_mars_summary_info.sh
Collecting MARS summary info from the DB in HTML format
Started at Fri Aug 24 11:08:58 PDT 2007
Use 'pnlog mailto' command to include it in the logs
This may take several minutes to complete. Use Ctrl+C in case you need to interrupt.
Completed at Fri Aug 24 11:10:20 PDT 2007
[pnadmin]$
After running the script, use the pnlog mailto command to e-mail the logs to yourself. The files get_mars_summary_info.html and get_mars_summary_info.run.log are included, along with the other logs, in the archive named error-logs.tar.gz that you receive.
•
From the CLI, you can use the pnlog mailto command. For more information on using this command, see pnlog.
•
In the GUI, you can use the Help > Feedback option. For more information on using this option, see Submitting Feedback and Reporting Errors.
Both the pnlog mailto command and the Help > Feedback option require that the appliance is connected to a network that can reach your SMTP server and that the appliance is properly configured to send e-mail to that server. You can specify the e-mail gateway settings either on the Admin > System Setup > Configuration Information page or as an option on the command line when you run the pnlog mailto command.
The pnlog mailto command packages and delivers the following information in a file named error-logs.tar.gz:
•
C++ process logs
•
System logs
•
Java (GUI) logs
•
Upgrade logs
•
Current version
•
Current model
•
List of running processes
No passwords or network information is included in the error-logs.tar.gz file.
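To confirm that the summary script output was packaged before you forward the archive, you can inspect error-logs.tar.gz on your own computer. The following Python sketch uses only the standard tarfile module. The function name is hypothetical, and because this appendix does not document the directory layout inside the archive, members are matched by base name only.

```python
import tarfile

# File names taken from the text above. Their location inside the archive is
# not documented here, so members are matched by base name only (assumption).
SUMMARY_FILES = {"get_mars_summary_info.html", "get_mars_summary_info.run.log"}


def missing_summary_files(archive_path: str) -> set[str]:
    """Return the summary files that are NOT present in the archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        base_names = {name.rsplit("/", 1)[-1] for name in archive.getnames()}
    return SUMMARY_FILES - base_names
```

An empty set means both summary files were found. A non-empty set suggests that get_mars_summary_info.sh was not run before pnlog mailto.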
Submitting Feedback and Reporting Errors
If you receive an error in the web interface and the system recovers, a pink page appears allowing you to report the error to Cisco.
You can use either the Report Error button or the Feedback button that appears on every page to send feedback and error log files to the Cisco TAC. When you select the Feedback button, an e-mail message is sent to the e-mail address associated with the user account with which you are logged into the MARS web interface. You can forward this e-mail as needed. If you log in using an account that does not have an e-mail address associated with it, you will be prompted to enter an e-mail address.
The Report Error button allows you to send the error logs and information related to the triggering error. The error log facilitates debugging the error, and therefore it is the recommended option. However, this option requires that you provide a valid TAC case number to which the error log is attached.
If you do not already have a valid case number, you are redirected to the Cisco TAC web site so you can create a new TAC case and obtain a valid case number.
Access the GUI when the Network Is Down
While console connections enable you to configure basic network settings for an appliance, you must use the GUI to perform most of the configuration. If you cannot connect to the appliance from hosts on your network, you can access the GUI from a computer connected by a crossover cable to one of the Ethernet ports on the appliance.
To access the GUI over a direct crossover connection, follow these steps:
Step 1
With the appliance running, connect a Cat 5 crossover cable to your computer's Ethernet port.
Step 2
Connect the Cat 5 crossover cable to the MARS Appliance's eth1 port. See Hardware Descriptions—MARS 25R, 25, 55, 110R, 110, 210, GC2R, and GC2, page 1-4
Step 3
Configure the computer's local TCP/IP settings to be on the same network as one of the Ethernet interfaces in the MARS Appliance. Pick an IP address other than the one used by the appliance on that interface.
It is possible that you specified the interface address for eth1 when you configured the interfaces using a console connection in Specify the IP address and Default Gateway for the Eth0 Interface, page 5-7, and Specify the IP Address and Default Gateway for the Eth1 Interface, page 5-8. However, the factory default setting for eth1 is 192.168.0.101.
Tip
You can use eth0 also; however, you must specify an address for your computer that works with the network settings that you specified in Specify the IP address and Default Gateway for the Eth0 Interface, page 5-7.
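When choosing an address for the computer in Step 3, a quick check with Python's standard ipaddress module can catch a wrong subnet or an address conflict. This is a sketch under stated assumptions: the function name is hypothetical, and the /24 prefix length is an assumption, because this appendix gives only the factory default eth1 address and not its netmask. Substitute the netmask actually configured on the interface.

```python
import ipaddress

FACTORY_DEFAULT_ETH1 = "192.168.0.101"  # factory default eth1 address from the text above


def is_usable_host_address(candidate: str,
                           appliance_ip: str = FACTORY_DEFAULT_ETH1,
                           prefix_length: int = 24) -> bool:
    """True if candidate is on the appliance's subnet and is a distinct host address.

    prefix_length=24 is an assumed default; the appendix does not state the netmask.
    """
    network = ipaddress.ip_network(f"{appliance_ip}/{prefix_length}", strict=False)
    address = ipaddress.ip_address(candidate)
    return (
        address in network
        and address != ipaddress.ip_address(appliance_ip)
        and address not in (network.network_address, network.broadcast_address)
    )
```

For example, under the assumed /24 netmask, 192.168.0.100 passes the check, while 192.168.0.101 (the appliance itself) and 192.168.1.5 (a different subnet) do not.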
Troubleshooting Global Controller-to-Local Controller Communications
The following sections provide information to assist in troubleshooting communications issues between a Global Controller and the Local Controllers it manages.
•
Communications Overview
•
Communication States
•
Required Open Ports
•
General Issues and Solutions
Communications Overview
A Global Controller and Local Controller can communicate only if they are running the same version of software. A version mismatch causes all communications to stop. For more information on configuring the communications, see Configuring the Global Controller in the User Guide for Cisco Security MARS Global Controller.
When a Global Controller and Local Controller communicate, several types of data are synchronized:
•
Topology. Topology configuration data includes the list of monitored devices, their interfaces, routes, and network groups. This data is sent from a Local Controller to the Global Controller every 30 seconds.
•
Configuration. Configuration data includes custom parser definitions, event types, inspection rules, report definitions, and user accounts and roles that are defined on the Global Controller. This data is sent from the Global Controller to Local Controller every 30 seconds.
•
Report data. Report result data is sent from a Local Controller to the Global Controller every 10 minutes. If a backlog exists on the Local Controller (for example, due to a communications failure), a block of report data is picked up 30 seconds after the previous block transmission completes until the backlog is clear.
Note
For each scheduled report (whether global or just a default system report), data is collected every 10 minutes and sent to the Global Controller, regardless of whether a report is scheduled within that interval.
•
Incident/firing event data. This data is sent from the Local Controller to Global Controller every two minutes.
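The report backlog behavior described above (the next block is picked up 30 seconds after the previous block finishes) lends itself to a rough estimate of how long a backlog takes to clear. The sketch below is illustrative only. The number of queued blocks and the transfer time per block are hypothetical inputs that you would have to estimate for your own link; the appendix does not define the size of a block.

```python
PICKUP_DELAY_SECONDS = 30  # from the text: next block starts 30 s after the previous completes


def backlog_drain_seconds(blocks: int, seconds_per_block: float) -> float:
    """Estimate the time to send `blocks` queued report blocks one after another.

    Both arguments are estimates supplied by the caller (assumptions, not
    values reported by MARS).
    """
    if blocks <= 0:
        return 0.0
    # Each block takes its own transfer time; a 30 s pause separates consecutive blocks.
    return blocks * seconds_per_block + (blocks - 1) * PICKUP_DELAY_SECONDS
```

As a worked example, if you assume one block per 10-minute collection interval, a 6-hour outage queues 36 blocks. At an assumed 20 seconds per block, the estimate is 36 × 20 + 35 × 30 = 1770 seconds, or roughly half an hour, during which the Global Controller can lag behind even though the state is Active.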
Communication States
When troubleshooting the communications, first verify that the Local Controller and Global Controller are communicating properly. From the web interface of the Global Controller, view the device state on the Admin > System Setup > Local Controller Information page. Understanding the communication state can assist you in diagnosing issues.
The key states to check for when troubleshooting communications issues are as follows:
•
Active. This state indicates that communications are operational. If you made a recent change, wait a minute for the system to process the change and then re-visit the page to obtain the updated state.
Note
After adding a new Local Controller, the page briefly indicates the Active state even though you have not added the certificates. Re-visit the page to obtain the correct state.
•
Certificate Errors. This state indicates the certificates are not configured correctly. If this state appears, validate the certificates on both the Local Controller and Global Controller. See Importing the Security Certificates.
•
Synchronizing (progress). This state results from triggering a full topology synchronization. A status indicator allows you to monitor the progress.
For a complete list of states and their meanings, see Table 2-3 Local Controller Status Messages on Zone Controller Page.
Required Open Ports
When a Global Controller and Local Controller are separated by a firewall, open the following ports on both the inside and outside interfaces of the firewall to ensure proper operation of the Global Controller:
TCP Port | Function
22 | Secure Shell (SSH), used by the Local Controller for topology and device discovery
443 | Hypertext Transfer Protocol over Secure Sockets Layer (HTTPS), used for user interface access
8444 | Cisco proprietary data synchronization between a Global Controller and Local Controllers
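A simple reachability test can help confirm that the firewall passes the TCP ports listed above. The following Python sketch uses only the standard socket module. The function names and the default timeout are illustrative choices, not part of MARS. A successful connection shows only that the port is reachable from the host where the script runs, so run it from a host on the same side of the firewall as the controller that initiates the connection.

```python
import socket

# Ports from the table above.
REQUIRED_TCP_PORTS = {22: "SSH", 443: "HTTPS", 8444: "data synchronization"}


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_required_ports(host: str, timeout: float = 3.0) -> dict[int, bool]:
    """Map each required port to whether a TCP connection could be opened."""
    return {port: tcp_port_open(host, port, timeout) for port in REQUIRED_TCP_PORTS}
```

Calling check_required_ports with the address of the remote controller returns a dictionary keyed by port number. Any False entry points to a port that is filtered or not listening.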
General Issues and Solutions
The following symptoms and solutions address many synchronization errors.
Tip
Deleting and re-adding a Local Controller is rarely, if ever, the solution. This change also causes a full re-synchronization of topology data, resulting in an even longer downtime (possibly days). You should only delete a Local Controller if you want to permanently remove that Local Controller from the Global Controller.
Symptom
|
Possible Resolution
|
Local Controller/Global Controller communications fail.
|
Beginning with the 4.3.1 and 5.3.1 releases, dynamic IPS signature updates are an aspect of the version of software running on a MARS Appliance. Therefore, in addition to running the same MARS software version on the Global Controller and Local Controller, the IPS signature versions must match or communications fail. To determine the version of MARS software and the IPS signature version, click Help > About on each appliance.
|
Local Controller/Global Controller communications do not appear to work, but the state is Active.
|
This issue can result from a backlog of data caused by a temporary disconnect of the Local Controller and Global Controller. Data synchronizes over time; therefore, the solution is to wait to verify the issue is correctly diagnosed. See Data is not synchronizing and the Local Controller and Global Controller were disconnected.
Possible causes:
A recent network outage caused a communication disconnect. The symptoms appear if the Local Controller receives a lot of data because, in such cases, the backlog can be large.
A high-usage MARS Appliance may not have adequate bandwidth between the Local Controller and Global Controller to ensure that the system stays synchronized.
|
Data is not synchronizing and the Local Controller and Global Controller were disconnected.
|
If a Local Controller/Global Controller pair is disconnected for a long period of time, the report and incident data will take a long time to transfer to the Global Controller. For each global report, data is gathered every 10 minutes and then transferred to the Global Controller. If the connectivity to the Global Controller is down, the Local Controller queues up pending data transfers. When connectivity is restored, it begins sending the report data.
Configuration and topology data does not take as long as report and incident data, and it should synchronize in a reasonable amount of time.
Note Communication link speeds vary; a saturated link could slow synchronization greatly relative to a lab environment.
|
A change in the Global Controller, such as adding a new global report or inspection rule, does not appear on a managed Local Controller.
|
Verify that Activate was clicked.
You must click Activate for Local Controller-based topological changes to be pushed to the Global Controller.
|
No incidents appear in the Global Controller
|
This issue can result from a time synchronization mismatch. Make sure the Local Controller and Global Controller have the system times set properly as a time skew can cause incidents to not appear in the Summary page.
|
I deleted a Local Controller from the Global Controller when there were communication problems. How do I restore the Local Controller?
|
If the Local Controller was deleted from a Global Controller when communications were failing, use the pnreset -s command to reset the Local Controller to standalone mode. Then, you can add it to the Global Controller again.
For more information, see pnreset.
|
A replacement Global Controller appliance has been restored. How do I restore communications with the Local Controllers?
|
Use the pnreset -g command on each Local Controller. This command removes the Global Controller data from a Local Controller, leaving Local Controller-specific data untouched. This option keeps the Global Controller connectivity information on the Local Controller intact, enabling the Local Controller to reconnect as soon as the Global Controller is restored (to purge this information, use the -s option). For more information, see pnreset.
Note Use this option only when a Global Controller recovery is required.
|
The topology diagram is missing a device or other information.
|
To verify that the issue is not the result of a slow link or of catch-up after network downtime, add a new device as a test. If the test device replicates after you click Activate and wait a few minutes, but the missing data still does not replicate, there could be an issue processing the transaction log.
To manually re-synchronize the topology data, perform the following steps from the Global Controller web interface:
1. Click Admin > Local Controller Management.
2. Select the Local Controller that has the issues and click the Topo Sync Start/Stop button.
The entire topology is copied from the Local Controller to the Global Controller. The size of this data set depends on the topology, but in very large cases, this operation can take several days. See Topology Synchronization.
On the Local Controller Management page, the status indicates that data is being processed. As long as it is moving, progress is being made so continue to wait.
Note Deleting and re-adding the Local Controller restarts this process and is not recommended.
|
A topology change does not appear, the state is Active, and a reasonable amount of time has passed.
|
Initiate a full topology synchronization to re-push all topology.
Note The time required to perform a full topology synchronization is not trivial; use this process only if topology data is missing on the Global Controller but more recent topology data has been transferred from the same Local Controller.
|
Configuration data (users, report definitions, rules, and event types) does not replicate from a Global Controller to Local Controller
|
If the servers were disconnected, this symptom can result because it takes time to clear the backlog created during the downtime.
To diagnose, create a new piece of data, such as a new user, and then click Activate. If, after a few minutes, the new user data replicates but the originally missing data does not, MARS has encountered an issue replaying that log. No configuration synchronization mechanism exists; therefore, you should follow your technical support escalation process.
|
None of the previous suggestions correct the error.
|
Use the pnlog command to collect log data and submit it to technical support to identify exceptions that may have caused the error. See Collect Support Information.
|
List of Backend Services and Processes
You can obtain status on the following services and processes by entering pnstatus at the command line or by selecting Admin > System Maintenance > View Log Files to view backend system logs generated by the appliance. Table B-1 lists the services and processes and provides a description of their role within MARS.
Note
All services should be running on a Local Controller. However, a Global Controller only has three services running: graphgen, pnarchiver, and superV—all other services are stopped.
Table B-1 MARS Services and Processes Descriptions
Service/Process Name
|
Description
|
pnparser
|
The pnparser service receives and parses events, SNMP MIBs, and traffic flow logs generated by the reporting devices. It also uses network topology information to sessionize flows. The sessionization process involves grouping flows and other events for the same Layer 7 session that arrive within a small time frame. The network topology information is used to normalize the NAT-ed flows. Events belonging to the same session are assigned a session identifier.
|
ANOMALY service
|
The ANOMALY service performs statistical analysis of flows and other variables obtained via SNMP MIBs such as per-interface bandwidth, per-interface errors, and firewall connections. This service detects statistically significant anomalies in the data. In case of a detected anomaly, the ANOMALY service inserts a MARS generated "anomaly detected" event into the system.
|
autoupdate
|
The backend process that pulls and processes the IPS signature updates.
|
LOGIC service
|
The LOGIC service correlates the parsed events according to a set of inspection rules. The inspection rules may be built in (that is, system defined) or defined by the user. Whenever a correlation rule is satisfied, the LOGIC service creates an incident containing the set of events satisfying the rule and forwards the incident for further analysis to process_postfire_srv.
|
process_postfire_srv
|
The process_postfire_srv service analyzes the incidents generated by the LOGIC service to determine whether they are false positives, identifies valid incidents that may represent potential attacks, and notifies the administrator. The service examines information from the following sources:
• Built in event vulnerability data
• Host information obtained from administrators or learned when process_postfire_srv probes hosts that have been attacked
• Host Vulnerability information from vulnerability scanner results
• Network topology paths and sessionized event data
|
LOADER service
|
The LOADER service efficiently stores the events and incidents into the database and compresses the data to be stored for archival purposes.
|
process_inlinerep_srv
|
The INLINE REPORT service performs in-memory computation of certain reports—this avoids the huge I/O penalty associated with database server computing these reports.
|
discover
|
The DISCOVERY service discovers the Layer 3 and Layer 2 network topology, NAT and ACL configuration from firewalls and routers. The service parses this information and stores it in the database in a unified vendor and device neutral form.
|
graphgen
|
The GRAPHGEN service creates network topology graphs, hotspot topology graphs, and topological attack paths for display by the web browser. The service also generates appropriate vendor and device-specific mitigation commands based on its derived knowledge about the attack path and all devices along the attack path.
|
GUI service
|
The GUI service provides the code used to display web pages that serve as the web interface for MARS. The service uses a JBOSS/Tomcat application server framework.
|
REPORTGEN service
|
The REPORTGEN service generates and sends the reports for the users. The service uses the JBOSS/Tomcat application server framework.
|
GC Exchange service
|
The Global Controller Exchange service communicates with the Global Controller and synchronizes the information between the two systems. The information that needs to be synchronized is:
• Network topology discovered by the MARS appliances
• Report results generated by a MARS appliance
• Incidents generated at a MARS appliance
• Global objects (for example, networks, services, rules, reports, and queries) created at a Global Controller
|
pnarchiver
|
The pnarchiver service archives data stored in the database to an offline store via NFS. Both configuration data and dynamic events and incident data are archived. The archiving is done for both system recovery and forensics.
|
pndbpurger
|
The pndbpurger service deletes old data from the database to make room for new data.
|
superV
|
The superV service acts as a software watchdog for various MARS backend processes. It monitors resource usage of the various services and various consistency conditions and restarts the appropriate services whenever necessary. The superV service also provides an event bus for the MARS processes to send messages to each other.
|
pnmonitor
|
The PNMONITOR service acts as a software watchdog for JBOSS and SUPERV. The operating system watches the health of PNMONITOR service.
|
KeywordQuerySrv
|
Based on a keyword query across raw messages, this backend process scans through local index and data files to identify and retrieve matching raw messages. The results are then stored in the database.
This process was introduced in 5.2.4.
|
csdam
|
This backend process is responsible for DTM and the management of IOS IPS signatures. It uses the IOS command line interface (CLI) over SSH or Telnet to issue SDF updates and retrieve current configuration information from the managed Cisco IOS IPS routers. This process was introduced in 4.1.
|
csips
|
This backend process uses RDEP to pull alerts from IDS 4.0 devices and SDEE to pull alerts from IPS 5.0 devices. The alerts pulled are then processed and passed on to pnparser from where they enter the system as all other events do. This process, introduced in version 4.2.2, replaces the two former processes named pnids40_srv and pnids50_srv.
|
csiosips
|
This backend process uses SDEE to pull alerts from IOS IPS devices. The alerts pulled are then processed and passed on to pnparser, from where they enter the system as all other events do. This process, introduced in version 4.2.2, replaces the former process named pniosips_srv.
|
cswin
|
This backend process uses MS-RPC to pull alerts from Windows devices. The alerts pulled are then processed and passed on to pnparser, from where they enter the system as all other events do. This process was introduced in version 4.2.2.
|
pnmac
|
This backend process retrieves the MAC addresses for the IP addresses found in sessions and incidents. It uses the STP information provided by the switches to which the sources and destinations are connected. MARS uses this data to perform port blocks or suggest the CLI commands required to block traffic from these MAC addresses.
|
device_monitor
|
This process uses SNMP to monitor the resources usage on the reporting devices and raises device anomalies (MARS events) when the usage exceeds the defined thresholds. The resources studied include CPU, memory, number of connections, and bandwidth used.
|
DbIncidentLoaderSrv
|
This process stores event/session data for fired incidents into the database after process_postfire_srv has performed false positive analysis.
|
pnesloader
|
This process stores event and session data in the database after pnparser has parsed and sessionized the received data.
|
process_event_srv
|
This process is the rule processing engine. It compiles rules, receives events, computes the incidents that need to be fired, and passes them to process_postfire_srv for notification and false positive analysis.
|
process_query_srv
|
This process computes the results for multi-line queries, that is, queries that look like multi-line rules (for example, X followed by Y).
|
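The expectation stated in the note before Table B-1 (all services running on a Local Controller, only graphgen, pnarchiver, and superV running on a Global Controller) can be expressed as a small check. The Python sketch below is hypothetical: it takes a dictionary of service states that you have already transcribed from the pnstatus output, because the output format of pnstatus is not shown in this appendix and is therefore not parsed here.

```python
# Per the note before Table B-1: only these three services run on a Global Controller.
GLOBAL_CONTROLLER_SERVICES = {"graphgen", "pnarchiver", "superV"}


def unexpected_service_states(states: dict[str, bool],
                              is_global_controller: bool) -> dict[str, str]:
    """Compare observed service states against what the appendix says to expect.

    `states` maps a service name to True (running) or False (stopped); how you
    obtain it from pnstatus is left to the caller (assumption).
    """
    problems = {}
    for name, running in states.items():
        # On a Local Controller every service should run; on a Global
        # Controller only the three listed services should.
        should_run = (name in GLOBAL_CONTROLLER_SERVICES) if is_global_controller else True
        if running and not should_run:
            problems[name] = "running but expected stopped"
        elif not running and should_run:
            problems[name] = "stopped but expected running"
    return problems
```

An empty result means the observed states match the note. Any entry names a service worth investigating in the backend logs.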
Error Messages
"Error ./pnarchiver Thread 2051:PN-0102:SQL error: ORA-01005: null password given; logon denied"
Issue: Problem with archiving to NFS server. The directories for the archiving are properly created on the server but those directories remain empty.
Workaround: An interoperability issue exists between MARS and CygWin NFS server running on Windows 2003 server. To work around such interoperability issues, replace the NFS server with Microsoft Windows Services for Unix. For more information, see Configure the NFS Server on Windows, page 6-24.
Page cannot be found.
Issue: Upon logging in to the web interface, user receives a "Page cannot be found." error and the URL in the address bar is of the format: https://<IP_address>/j_security_check.
Workaround: If you have the MSN Search Toolbar enabled in your browser, you must disable it before logging in to MARS. To disable it, right-click the toolbar and deselect MSN Search Toolbar. Alternatively, you can simply delete the j_security_check at the end of the URL string and press Enter.
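The alternative workaround, removing j_security_check from the end of the URL, is a simple string operation. The Python sketch below shows it with the standard urllib.parse module. The function name is hypothetical, and the sketch also discards any query string or fragment, which is a choice made for this illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_security_check(url: str) -> str:
    """Remove a trailing j_security_check path segment from a MARS login URL."""
    parts = urlsplit(url)
    path = parts.path
    if path.rstrip("/").endswith("/j_security_check"):
        # Keep everything up to and including the slash before the segment.
        path = path.rstrip("/")[: -len("j_security_check")]
    # Query string and fragment are dropped on purpose (illustrative choice).
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

A URL that does not end in j_security_check is returned with its path unchanged.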
Hangs on "Creating Oracle database"
Issue: When using the Recovery DVD, the system hangs on "Creating Oracle database."
Workaround: This error can occur when, after reboot, the appliance is connected to a network. When the image is applied, the system hangs attempting to detect the factory default addresses on the network.
"Status: PN-0002: No message for PN-0216"
Issue: The message, "Status: PN-0002: No message for PN-0216", displays after configuring the data archive settings in the web interface.
Workaround: This error message appears when you have entered an incorrect IP address or directory path for the data archiving feature.