Chalk Talk March

Operating an Incident Response Team


(The following is condensed from a chapter excerpt from the book Computer Incident Response and Product Security, from Cisco Press.)

After an Incident Response Team (IRT) is established, your next concern is how to successfully operate your team. This chapter covers the following topics to help you improve the operation of your IRT:

  • Team size and working hours
  • New team member profile
  • Advertising the team’s existence
  • Acknowledging incoming messages
  • Cooperation with internal groups
  • Preparing for incidents
  • Measure of success

Team Size and Working Hours

Team size is a function of the following factors:

  • The services to provide
  • The size and the distribution of the constituency
  • Planned working hours. (That is, will the IRT operate only during office hours or around the clock?)

In practice, if you are starting from scratch and the IRT’s task is defined as “go and deal with the incidents,” a small team should be sufficient for the start. The need for team expansion can be determined over time.

Many teams operate only during regular office hours for that locale. For this kind of coverage a two-person team should suffice. Although office-hours coverage is fine for the start, the IRT should look into extending its working hours to be active around the clock.

The main reason for extending the working hours is that some services (for example, a public website) are available at all times. The IRT must be able to respond to situations swiftly and not after the weekend.

One of the standard ways to extend working hours is to have someone who is on-call. This person can answer telephone calls and check incoming emails after hours and over the weekend. This setup can be augmented by cooperating with other teams.

If you want around-the-clock and weekend coverage, the number of people in the IRT depends on whether the duties can be shared with other teams in the organization. If the duties can be shared, you might not need to increase the size of the IRT. If not, increasing the team size should be considered. A three-member team might be a good size, because members can cover for one another when absences arise.

If the host organization is within the EU, it must pay attention to the European Working Time Directive (Council Directive 93/104/EC and subsequent amendments), which requires that the working week not be longer than 48 hours, including overtime. On the other hand, people might opt out of the directive and work as long as required. The host’s human resources department must investigate this and set up proper guidelines.

Irrespective of what hours the IRT operates, that fact must be clearly and prominently stated and communicated to other teams and the constituency. Setting the right expectations is important.

When the IRT operates only during office working hours, it should state its hours and time zone in consideration of the global community. All the information related to your working hours must be visibly and clearly stated on your team’s website.

Advertising the IRT’s Existence

The team’s existence must be announced internally within the constituency and externally to other teams. Set up a website that explains what the team does and how it can be reached. But that should not be the end of the effort. A website is passive and the team should be proactive. Consider the following:

  • Attend and present at conferences and meetings.
  • Send letters to appropriate people within and outside the constituency.
  • Print posters and place them at visible places within the organization.
  • Print and give away mugs, pens, stationery, or similar giveaway items.
  • Include information about the team in new hire documentation packets, sales material, or a service offering prospectus.
  • Meet with key people within and outside the constituency, and talk to them about the team and its purpose.
  • Print an advertisement in a magazine or newspaper. Give interviews.
  • Broadcast an advertisement on the radio or TV.
  • Publish research papers or books.

Acknowledging Incoming Messages

Receiving an email about a compromised device is usually how work on a new incident starts. The first step in this process is for the IRT to acknowledge receiving this initial notification. The acknowledgment must fulfill several goals:

  • Assure the sender that the report is received and given attention.
  • Communicate the incident tracking number back to the sender (if assigned).
  • Set the expectations on what will happen next.
  • Provide information about the IRT and how it can be contacted.
  • Look professional and be courteous, because the acknowledgment reflects the team’s image.

Giving Attention to the Report

Most people prefer communicating with another human being rather than with an impersonal machine. Having someone who can compose a reply is much better than an autoresponse mechanism, even if the confirmation is not as instantaneous as it would have been if it were automatic. It is perfectly fine to have a template answer that will be used to acknowledge the receipt of a report, but it is also acceptable to modify it for the added “human touch.”

Following are some examples of varying the template text:

  • Use the sender’s name in the response.
  • Ask for additional details.
  • Add seasonal greetings, but remember to keep cultural and religious differences in mind.

Incident Tracking Number

If the report represents an incident, it must be assigned a tracking number. That number must be told to the sender so that it can be used in subsequent emails. That way, both parties will always know which incident they are talking about. When exchanging encrypted email, the Subject line should contain only the incident number and nothing else. That way, it gives away the minimum details to whoever intercepts the message.
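As a minimal sketch of the numbering scheme described above, the following Python fragment issues sequential tracking numbers and builds a Subject line for encrypted email that carries nothing but that number. The `IRT-<year>-<sequence>` format and the names `TrackingNumbers` and `encrypted_subject` are assumptions made for illustration; the text does not prescribe any particular format.

```python
import itertools
from datetime import date


class TrackingNumbers:
    """Issues sequential incident tracking numbers (hypothetical format)."""

    def __init__(self, prefix="IRT", start=1):
        self._prefix = prefix
        self._counter = itertools.count(start)

    def next_number(self, today=None):
        # The year makes numbers easier to read; the counter keeps them unique.
        today = today or date.today()
        return f"{self._prefix}-{today.year}-{next(self._counter):04d}"


def encrypted_subject(tracking_number):
    # For encrypted email, the Subject line carries only the incident
    # number, so an interceptor learns as little as possible.
    return tracking_number


issuer = TrackingNumbers()
number = issuer.next_number(date(2011, 3, 1))
print(encrypted_subject(number))
```

A real team would persist the counter (for example, in its ticketing system) so that numbers survive restarts; the in-memory counter here only keeps the sketch short.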

Setting the Expectations

Full communication between parties is important. You must set the right expectations on what will happen next and how long it might take. If the report is not an incident, state so clearly with the explanation on what to do if the sender does not agree with the assessment. If the report is an incident, state whether it is being handled right now, and if not, when it might be taken into the process.

Information About the IRT

Where can more information about the IRT be found and how can it be contacted? This is usually only a pointer to the IRT’s website that contains all the details.

Looking Professional and Courteous

To make your responses more professional, you can prepare some template text in advance so that whoever will be composing the actual response can cut and paste parts of the template. The template adds to the uniformity of the acknowledgments, which, in turn, helps set expectations for the people who are reading them. This does not mean that team members should send a prepackaged response, which would be no better than leaving the job to auto-responder software. The template ensures that relevant elements are included in the acknowledgment, and each team member can add their own touch to the response.
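One possible way to keep such template text is sketched below in Python, using the standard library’s `string.Template`. The wording of the template, the field names, and the `https://irt.example.org/` address are all hypothetical; the point is only that the fixed elements (receipt, tracking number, next steps, pointer to the team’s website) are always present, while the handler still fills in a personal note.

```python
from string import Template

# Hypothetical template text; each placeholder maps to one goal of the
# acknowledgment (receipt, tracking number, expectations, contact pointer).
ACK_TEMPLATE = Template(
    "Dear $sender_name,\n"
    "\n"
    "Thank you for your report. We have received it and a team member "
    "is reviewing it.\n"
    "Tracking number: $tracking_number\n"
    "Please quote this number in any further email about this report.\n"
    "\n"
    "What happens next: $next_step\n"
    "\n"
    "More about the team and how to reach us: $team_url\n"
    "\n"
    "$personal_note\n"
    "Regards,\n"
    "$handler_name\n"
)


def build_acknowledgment(sender_name, tracking_number, next_step,
                         handler_name, personal_note="",
                         team_url="https://irt.example.org/"):
    """Fill the template; the handler supplies the human touch."""
    return ACK_TEMPLATE.substitute(
        sender_name=sender_name,
        tracking_number=tracking_number,
        next_step=next_step,
        team_url=team_url,
        personal_note=personal_note,
        handler_name=handler_name,
    )


text = build_acknowledgment(
    sender_name="Alice",
    tracking_number="IRT-2011-0001",
    next_step="We expect to start the analysis on the next working day.",
    handler_name="Bob",
    personal_note="Could you also send us the relevant log excerpts?",
)
print(text)
```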

Cooperation with Internal Groups

In the same way the IRT cannot operate in isolation from the other IRTs, it also cannot operate without support and cooperation from various internal groups and departments. The groups and departments suitable for possible liaising are as follows:

  • Physical security
  • Legal department
  • Press relations
  • Internal IT security
  • Executives
  • Product security teams
  • Internal IT and network operation center (NOC)

Physical Security

Without good old-fashioned physical security, many state-of-the-art security mechanisms would not properly work. This group usually operates, or has access to, Closed Circuit TV (CCTV) cameras, if they are installed on the premises. Therefore, their cooperation is invaluable in cases where the identity of a person must be confirmed.

Security teams may have the power to arrest and detain. So, if a culprit is identified, the security team could make an arrest.

Legal Department

The legal department can be an invaluable asset. The IRT must work to identify who, from the legal side, would support the team in its job. The best results can be achieved if a designated person(s) is given an extra task to support the IRT on a long-term basis.

You must expect to invest a considerable effort at the beginning while the legal team learns about the security world and the IRT learns about the legal challenges. Only after both sides understand each other’s positions can real cooperation begin.

The IRT should bring all new or different incidents to the attention of the legal team. In the majority of cases, the legal team might decide that the new case falls under one of the previously encountered issues. It is the remaining few that will prompt the legal team to look deeper into the matter to see how the organization can better protect itself from the legal perspective. These improvements might range from the way the IRT approaches similar incidents to modified contracts that the organization will use in the future.

It is also a good idea that lawyers from different organizations reach out to each other and start a dialogue. It is much easier if they are approached collectively such as through the Vendor Special Interest Group forum under FIRST. Interested parties can visit http://www.first.org/vendor-sig/index.html and contact moderators.

Press Relations

At some point, the press might approach the team about a case. Talking to the press can be tricky. Journalists usually would like to receive as much information as possible, whereas the IRT might need to be careful about what to disclose and when.

Having a dedicated PR person assigned to the team to work with the press is helpful. The next best option is to have someone from the IRT receive PR training and act as the team’s spokesperson. The least desirable option is to have somebody, without any training, step in front of the journalists. Whatever your case happens to be, following are a few simple tips on what to do when talking to the press:

  • There is no such thing as “off the record.” Whatever you say can end up being printed. If something is not to be mentioned at the time, do not mention it under any circumstances.
  • Be prepared. If possible, ask for questions in advance and prepare the answers.
  • Ask to review the final article before it is published.
  • Do not lie. Sooner or later, people will find the truth, and the credibility of you, your team, and organization is lost.
  • Do not speculate. Know the facts and stick to them. It is better to say that something is not known than to speculate.
  • Know what can be said. Always keep within safe limits. When necessary, a “no comment” phrase can be handy to use.
  • Have a message to pass to journalists.
  • You do not always have to answer the question that was asked; you can answer the one that you would have liked to be asked. If used judiciously, this can help with getting your points across to journalists.

If your team is lucky enough to have a dedicated PR person, this person can help you with promoting your team. The PR person can also proactively work with journalists and help them understand what the IRT is doing, why, and how.

If you judge that an incident might generate inquiries from the press, you should prepare a holding statement that can be used if a journalist contacts the organization and asks for a statement.

In virtually all cases, there is not much benefit from proactively contacting the press and offering information about an incident. The exception to this rule might be a situation in which someone else will publicize the situation, and you want your version of the events to be heard first.

Internal IT Security

Some organizations might have a separate group that handles only internal security cases, that is, cases pertaining to the host organization.

In this case, the internal IT security group is a natural ally of the IRT. Both teams can organize regular meetings to exchange information on what kind of attacks they are seeing and observe trends. The group handling customers’ incidents should provide information only on types of attacks but not who has been attacked. In addition to the regular information exchange, both teams should enable members from one team to temporarily rotate into the other team.

Executives

There must be an arrangement for the IRT to brief the executives on a regular basis and when emergencies occur. Regular briefings with executives are important to: discuss the newest security threats, learn about the challenges to resolve the threats, and raise awareness of the IRT’s role and availability. For executives, it is vital to be informed whether their part of the organization is affected by the incident and, if it is, how and to what extent.

Here are a few tips for communicating with the executives:

  • Frequency: Not more often than every two weeks but not less than once a month for regular updates. During a crisis, the first message should be sent as soon as the severity of an incident meets certain criteria. After that point, the frequency should be a function of the incident, and reporting can be done from every hour to once a day.
  • Content: Keep it short and simple. Provide pointers to where all details are being kept. Order information in reverse chronological order so that the most recent information is presented first. Background information can be added at the end. Do not forget to include the impact to the organization, that is, why this communication is important to the executives. The next steps and the time of the next communication also must be presented, together with actions that executives must undertake.
  • Email and Voice Messages: When sending both an email and a voice message, they should not be identical. The email can contain more background information, whereas the voice message should focus only on the most recent developments.
  • Format: Between two and four slides for regular face-to-face meetings. For all other regular updates, text email (no Microsoft Word or Adobe PDF documents) together with a voice message should be used. Text email is preferred because it can be quickly downloaded.
  • Web Page: Must be created where executives can find all the information. That must be a single top-level page that gives an overall view of all current events. This top-level page must then contain links for each individual incident and to all other communications to the executives.
  • Length: Optimally, approximately 2 and not longer than 3 minutes for a voice mail and a one-page email (approximately 200 words to 300 words). Everything else should be given as additional information on a web page.

Product Security Team

If the host organization is a vendor that is responsible for developing and maintaining a product or service, it should have a dedicated team that deals with security vulnerabilities in the products. As with internal IT security, both teams, product security and the IRT, can benefit from having close ties. The product security team can provide information on different vulnerabilities so that the IRT can start looking at whether they are being exploited. Information on vulnerabilities can also be used to reevaluate some old data. What was previously seen as only noise or random attempts might suddenly be seen as focused efforts to exploit a particular vulnerability.

Even if the organization is not a vendor, the team should establish ties with vendors’ product security teams. At the least, the IRT must know how to contact them. Vendors always appreciate receiving notification of a new vulnerability or other suspicious behavior of their products.

Internal IT and NOC

Depending on the organization’s size and complexity, you may have a separate IT group that maintains and monitors the internal network. If you are an Internet service provider (ISP), you probably would have a separate network operation center (NOC) that maintains a network used by your customers. These two groups are your partners. They can provide the IRT with the current information on what is happening in the network (internal or external) and early warnings about new attacks while they are being tested. The NOC, in particular, can add a network-centric view of attacks and contribute methods for combating attacks using the network infrastructure.

Be Prepared!

An IRT, by its nature, deals with emergencies and exceptions. As such, it is hard to be prepared for something that cannot be foreseen. Although nobody can be prepared for the exact incarnation of the next worm, steps can be taken to be prepared. A new worm might share characteristics with a previous worm, so the IRT can apply that knowledge in preparation for future incidents. Consider the following:

  • Know current attacks and techniques.
  • Know the system the IRT is responsible for.
  • Identify critical resources.
  • Formulate response strategy.
  • Create a list of scenarios and practice handling them.

Know Current Attacks and Techniques

It is imperative for the IRT to possess an intimate knowledge of current attack techniques and attacks themselves, which aids in distinguishing an attack from legitimate activities. Obviously, the knowledge must not be limited only to the attacking side but must also cover the defensive side. How can you protect your organization from various attacks? What are the drawbacks? How does this encompass features and capabilities of equipment and the network’s topology and characteristics?

The next question is, How should you gather that knowledge? There is no easy way to accomplish that. Reading public lists like Bugtraq, full-disclosure, and others is standard for every team. Attending conferences and learning about new issues is also important. Analyzing what is going on in the team’s constituency is obligatory. Monitoring the underground, as much as possible, is necessary. Setting up honeypots and honeynets and analyzing the activity is also an option. But, above all, talk to your peers and exchange experiences. That is something that cannot be substituted with anything else.

If the information collection is done internally, you can include other groups or individuals to help you with that task, even if they are not part of the IRT. If your organization has a group that monitors external information sources, you can make a formal arrangement with them to receive only the information that might interest the IRT. If you do not have such a group in your organization, you might find security-conscious individuals who are monitoring some of the sources who can share information that might also interest the IRT.

If your IRT decides to operate a honeypot or honeynet, you must make sure that you will have sufficient resources to do so. A honeypot is a nonproduction service exposed to the Internet with the purpose of being (mis)used by an attacker. The IRT can then capture malware and gain firsthand knowledge about how it infects devices and propagates. The service can be emulated with special software or it can be a real service. A honeynet is a network of honeypots.

Know the System the IRT Is Responsible For

The IRT must know what it is protecting, the location of the boundaries of the systems for which it is responsible, and the functions of different parts of the system. After defining boundaries, the next step is to identify the groups (or people) that can be contacted when the IRT must cross the boundaries. The next task is to determine what is “normal” within that area. If the IRT knows what is normal for the given system, it will be easier to spot deviations and investigate. This is also known as determining the baseline.

The baseline means different things for different aspects of the overall system. On the highest level, it can consist of the following things:

  • Number of remote users
  • Number of internal users
  • Total consumed network bandwidth, inbound and outbound, at all links (for example, between branch offices, toward the Internet)
  • Traffic breakdown per protocol and application (TCP, UDP, mail, web, backup, and so on) and bandwidth utilization per protocol

Each of the categories can then be further refined and a more detailed picture can be formed. For remote users, remote IP addresses can be recorded. A traffic model of a user can be formed by recording how much traffic (packets) is generated inbound and outbound and what protocols and applications have generated it. For some protocols, what types of packets are being generated can even be recorded. That information can then be used to identify the presence of anomalous traffic because different types of packets are used by different attack programs. Another type of information that can be recorded is the direction of the traffic. That is important because the site can be the target or source of an attack.
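The per-user traffic model described above can be sketched in a few lines of Python. The record layout (remote IP address, direction, protocol, packet count) and the sample values are assumptions made for illustration; real input would come from whatever flow or capture data the site collects.

```python
from collections import Counter, defaultdict

# Hypothetical, already-parsed flow records:
# (remote_ip, direction, protocol, packets)
flows = [
    ("198.51.100.7", "inbound", "tcp", 120),
    ("198.51.100.7", "outbound", "tcp", 80),
    ("198.51.100.7", "outbound", "udp", 15),
    ("203.0.113.9", "inbound", "udp", 40),
]


def build_traffic_model(records):
    """Sum packets per remote user, keyed by (direction, protocol)."""
    model = defaultdict(Counter)
    for remote_ip, direction, protocol, packets in records:
        model[remote_ip][(direction, protocol)] += packets
    return model


model = build_traffic_model(flows)
for remote_ip, counts in sorted(model.items()):
    for (direction, protocol), packets in sorted(counts.items()):
        print(remote_ip, direction, protocol, packets)
```

Keeping the direction in the key preserves the distinction between the site being the target and the site being the source of traffic.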

Information used to build the baseline should come from multiple sources to build a better picture. Traffic snapshots (or full captures for small sites), NetFlow data, syslog logs, logs from intrusion prevention/detection systems, and application logs: all of these sources should be used to build the baseline.

Taking only a single snapshot might not be sufficient to establish a credible baseline. Traffic and usage patterns change over time. Adding or removing a significant number of computers will affect the baseline, too. The message is that information should be constantly updated with the latest measurements.
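One simple way to keep a baseline current and to spot deviations is a rolling window of recent measurements, sketched below in Python. The window size, the three-standard-deviation threshold, and the sample bandwidth figures are arbitrary assumptions for illustration; the text does not recommend specific values, and a production system would likely use a more robust model.

```python
from collections import deque
from statistics import mean, stdev


class RollingBaseline:
    """Keeps the most recent measurements and flags large deviations."""

    def __init__(self, window=7, threshold=3.0):
        # Old samples fall out automatically, so the baseline follows
        # gradual changes in traffic and usage patterns.
        self._samples = deque(maxlen=window)
        self._threshold = threshold

    def update(self, value):
        self._samples.append(value)

    def is_deviation(self, value):
        if len(self._samples) < 2:
            return False  # not enough history to judge
        mu = mean(self._samples)
        sigma = stdev(self._samples)
        if sigma == 0:
            return value != mu
        return abs(value - mu) > self._threshold * sigma


# Hypothetical daily outbound bandwidth measurements, in Mbit/s.
baseline = RollingBaseline(window=7, threshold=3.0)
for mbps in [100, 104, 98, 101, 97, 103, 99]:
    baseline.update(mbps)

print(baseline.is_deviation(102))
print(baseline.is_deviation(400))
```

Whether a flagged measurement should later be fed into the baseline is a design decision left open here; adding it immediately would let an ongoing attack gradually become the new "normal."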

Identify Critical Resources

What resources are critical for the business and in what way? What will happen if a resource is unavailable? If the company website is used only to present what the organization is about, it being unavailable might not have severe consequences. If the website is also used for ordering, you need to keep the period of not being available as short as possible.

This part of the process must be done with help from different groups and departments within the organization. Each of them should identify what resources are critical for their business. All that information then must be taken to a higher level of management and looked at from the global organization’s perspective. The criticality of services should be reviewed periodically and after significant change in the business model is introduced.

Formulate Response Strategy

After completing the inventory of critical resources, an appropriate response strategy can be formulated. This strategy is supposed to answer questions such as: If a service, or server, is compromised, what can and should be done? Here are a few examples that illustrate this point:

  • If a company’s website is defaced or compromised, what needs to be done? If the website is used only for general information, it can be simply rebuilt, and no effort will be spent trying to identify how the compromise happened or who did it.
  • If a host used for collecting billing information is compromised and the attacker is siphoning credit card information from it, can you simply shut off the computer to prevent further damages? Although that can prevent data theft, it might also prevent collecting billing information, and the organization will lose some money as a consequence.
  • What level of compromise needs to happen before a decision to attempt to identify a culprit for possible prosecution will be made versus just shutting the culprit out? This can possibly mean that the attacker will be left to (mis)use the compromised system for some time while the investigation is going on. What is the point when the business might seriously suffer as the consequence of the compromise and the investigation has to be stopped?

Answers to some of the questions can also lead you to rethink the way the system is organized or services are offered. In the case of a website, maybe it can be made static and burned on a DVD so that the possibility of defacement is reduced if not eliminated. Maybe some critical services can be split across multiple computers, so if one is compromised, it can be shut down without affecting the other service.

Why is this important? When the attack is ongoing, there might not be sufficient time to think through the consequences that the various actions of the attacker and the defenders can have for the organization.
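Because decisions are best made before an attack, the agreed strategy can be written down as a simple lookup that a handler consults during an incident. The Python sketch below is only an illustration: the resource names, the `ResponsePlan` fields, and the actions are hypothetical, loosely modeled on the website and billing-host examples above, and each organization would fill in its own answers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponsePlan:
    resource: str
    critical: bool       # is the resource critical for the business?
    first_action: str    # what the handler does first
    investigate: bool    # spend effort identifying how and who?


# Hypothetical entries decided in advance, not during the attack.
PLAYBOOK = {
    "public-website": ResponsePlan(
        resource="public-website",
        critical=False,
        first_action="rebuild from a known-good copy",
        investigate=False,
    ),
    "billing-host": ResponsePlan(
        resource="billing-host",
        critical=True,
        first_action="escalate to the business owner before shutting down",
        investigate=True,
    ),
}

# Anything not yet inventoried is treated cautiously.
DEFAULT_PLAN = ResponsePlan(
    resource="unknown",
    critical=True,
    first_action="escalate to the IRT lead",
    investigate=True,
)


def plan_for(resource):
    """Return the pre-agreed plan, or the cautious default."""
    return PLAYBOOK.get(resource, DEFAULT_PLAN)


print(plan_for("public-website").first_action)
print(plan_for("mail-relay").first_action)
```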

Create a List of Scenarios

Instead of waiting for incidents to happen and then learning how to respond, the IRT should have regular practice drills. Common scenarios should be created and team responses should be practiced. The main purpose of these exercises is that people gain practice and confidence in handling incidents and learn how effective they are. These exercises do not need to be limited only to the IRT but can involve other parts of the organization. In such joint exercises, all involved participants must know when the exercise is active, so that no false alarms occur and cause panic or wrong actions.

What can these scenarios look like? For a start, they must cover the main aspects of all handled incidents. If these incidents happened once, there is the possibility that they will happen again. Here are some suggestions of what can be covered:

  • Virus or worm outbreaks
  • External and internal routing hijacked
  • DNS-related attacks (for example, the organization DNS entry gets changed and points to a bogus site)
  • Computer compromise
  • Network sniffer installed on several computers
  • Website defacement or compromise
  • Phishing attacks
  • DoS attacks
  • Emergency software upgrade

These may be the most common scenarios that one organization might encounter. Depending on the organization’s role and technical capabilities, some additional scenarios can be created. These practice drills can be only a paper exercise, or they can be conducted on an isolated network segment or virtual devices.

Devices we can simulate are computers, routers, and networks of devices. In these simulations, devices can be either targets of simulated attacks or used to observe how malicious software behaves. Software for creating virtual computers includes VMware, Parallels, Xen, and QEMU. A more comprehensive list of different software is posted at the Wikipedia web page at http://en.wikipedia.org/wiki/Comparison_of_platform_virtual_machines.

A paper exercise is good for formulating the initial response on an attack that has not been encountered yet and to modify an existing response after the system has changed. Testing the response, on the other hand, is best done on the actual equipment. You need to use real devices to make sure that the simulator reflects the real device’s behavior.

After the response is established and practiced, new elements should be added to it. Some unexpected or unusual elements should be introduced. They can be various things, such as the following:

  • The telephone network is down at the same time, so team members cannot use fixed telephony or mobile phones to communicate.
  • It is impossible to physically reach the affected device (for example, a computer is locked in a room and the room key is lost).
  • A new device is introduced into the network without anyone’s knowledge (for example, a load-balancing device inserted in front of the web farm) or the network topology is changed.

The last things to practice are seemingly impossible scenarios. You must accept that, occasionally, the research community does come up with a revolutionary new attack technique, and things that were considered impossible suddenly become routine. Here are a few examples:

  • A scenario that contains a logical paradox. That would be the trick case to verify that the handler can notice the paradox. An example might be to invent a device under attack that is not connected to the network or withhold information about an intermediate device.
  • A feature suddenly stops working (for example, packet filters do not block packets; rate limiters do not limit packet rate).
  • Significant improvement in attack techniques (for example, a complete compromise of the MD5 and SHA-1 hash functions, the AES cryptosystem is broken, or number factoring becomes trivial).

For some of these scenarios, there may be no valid, or possible, responses, so their value lies in forcing people to think outside the box.

Measure of Success

At the start, it must be said that, by itself, counting the number of incidents the team has handled in a given time period is not a good measure of how the team is doing. After the team starts operating, it will initially see only a few incidents. That number will soon start to rise rapidly: the more the team works on incidents, the more of them come to light. The number increases because the IRT is now actively looking for incidents that nobody noticed before.

The way to approach creating the metrics to measure the team’s success is to start from who is the team’s constituency and what is the team’s goal for the constituency. That will provide the starting point of defining what can be measured. Additionally, you can try to measure changes in the risk the organization faces from a compromise. Part of that risk assessment is the speed of recovery and limiting the damage after the incident. The final part of the metrics is the team’s influence and standing with the community. A good guide on how to define what to measure, how, and why is the ISO 27004 standard. Let’s now look at some examples of how metrics for measuring the team’s success can be defined.

One of the goals for most IRTs is to increase security awareness within the constituency. This goal can be aligned with specific policies such as “All users will receive basic security training” or “All users’ passwords will be longer than six characters.” Data on the number of users receiving security training and the results of checking users’ passwords can be easily obtained, so you can calculate where you are in meeting the policy goals and partly measure the team’s success.
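The calculation itself is straightforward, as the following Python sketch suggests. The `UserRecord` fields and the sample data are hypothetical; in particular, `password_ok` stands for the result of whatever password check the organization already runs, not for a stored password.

```python
from dataclasses import dataclass


@dataclass
class UserRecord:
    name: str
    trained: bool       # has received basic security training
    password_ok: bool   # passed the password-policy check


def percent(part, whole):
    # Avoid division by zero for an empty constituency.
    return 100.0 * part / whole if whole else 0.0


def policy_compliance(users):
    """Return the share of users meeting each policy, in percent."""
    total = len(users)
    trained = sum(1 for u in users if u.trained)
    password_ok = sum(1 for u in users if u.password_ok)
    return {
        "trained_pct": percent(trained, total),
        "password_pct": percent(password_ok, total),
    }


# Hypothetical audit data.
users = [
    UserRecord("alice", trained=True, password_ok=True),
    UserRecord("bob", trained=False, password_ok=False),
    UserRecord("carol", trained=True, password_ok=True),
    UserRecord("dave", trained=True, password_ok=False),
]

report = policy_compliance(users)
print(report)
```

Tracking these percentages over time, rather than as a single reading, shows whether the team’s awareness work is moving the constituency toward the policy goals.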

Assessing changes in the risk the organization faces from computer attacks is harder to accomplish. You cannot directly measure the attacker’s willingness to attack your organization, but you can use the fact that attackers are mostly opportunistic creatures to your advantage. If you are a hard target, attackers will go after others who are easier targets. What you can measure here is what is happening to your organization relative to your peers and the industry. Reliable data on attacks is hard to come by. CSI and BERR surveys can serve as guides, but the numbers must be taken with caution. Attacks do not have to be targeted; you can also compare the number and severity of virus outbreaks within the organization versus the industry.

Being a leader in the field is also a sign of the team’s success. This can be measured by looking at the number of talks the team was invited to give, the number of interviews the IRT members gave, and how many of the team’s ideas were incorporated into best practices and international standards.

Summary

Running a successful IRT involves many aspects. The team must have the right people and do the right thing. Not only must you pay attention to major things, but you also must not lose sight of the small details. Although all these details might look overwhelming, with dedication from the entire team, they can be achieved, and you will have a successful and respected IRT.



Computer Incident Response and Product Security


ISBN-10: 1-58705-264-4
ISBN-13: 978-1-58705-264-4
Published Dec 6, 2010
US SRP $44.99
Published by Cisco Press.