Operating an Incident Response Team
(The following is condensed from a chapter of the book Computer Incident Response and Product Security, published by Cisco Press.)
After an Incident Response Team (IRT) is established, your next concern is how to successfully operate your team. This chapter covers the following topics to help you improve the operation of your IRT:
Team Size and Working Hours
Team size is a function of the following factors:
In practice, if you are starting from scratch and the IRT's task is defined as "go and deal with the incidents," a small team should be sufficient at the start. The need to expand the team can be assessed over time.
Many teams operate only during regular office hours in their locale. For this kind of coverage, a two-person team should suffice. Although office-hours coverage is acceptable at first, the IRT should look into extending its working hours to provide around-the-clock coverage.
The main reason for extending the working hours is that some services (for example, a public website) are available at all times. The IRT must be able to respond to situations promptly, not only after the weekend is over.
One of the standard ways to extend working hours is to have someone who is on-call. This person can answer telephone calls and check incoming emails after hours and over the weekend. This setup can be augmented by cooperating with other teams.
If you want around-the-clock and weekend coverage, the number of people in the IRT depends on whether the duties can be shared with other teams in the organization. If they can, you might not need to increase the size of the IRT. If not, you should consider enlarging the team. A three-member team might be a good size, because its members can cover for one another when absences arise.
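A simple way to share after-hours duty in a small team is a weekly round-robin on-call rotation in which the next member in line covers when someone is absent. The following Python sketch illustrates the idea; the member names, the one-week shift length, and the absence format are illustrative assumptions, not a prescribed scheme.

```python
from __future__ import annotations

from datetime import date, timedelta


def on_call_rotation(
    members: list[str],
    start: date,
    weeks: int,
    absences: dict[date, set[str]] | None = None,
) -> list[tuple[date, str | None]]:
    """Assign one on-call person per week, cycling through the members.

    `absences` maps the first day of a week to the members unavailable that
    week (a hypothetical format). If the scheduled member is absent, the next
    available member in rotation order covers; None means nobody is left.
    """
    if not members:
        raise ValueError("at least one team member is required")
    absences = absences or {}
    schedule: list[tuple[date, str | None]] = []
    for week in range(weeks):
        week_start = start + timedelta(weeks=week)
        unavailable = absences.get(week_start, set())
        covering = None
        # Try the scheduled member first, then the others in rotation order.
        for offset in range(len(members)):
            candidate = members[(week + offset) % len(members)]
            if candidate not in unavailable:
                covering = candidate
                break
        schedule.append((week_start, covering))
    return schedule
```

A None entry in the result marks a week with nobody available, which is exactly the situation where sharing duties with other teams becomes necessary.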
If the host organization is within the EU, it must pay attention to the European Working Time Directive (Council Directive 93/104/EC and subsequent amendments), which limits average weekly working time, including overtime, to 48 hours. On the other hand, people might be able to opt out of this limit and work as long as required. The host's human resources department must investigate this and set up proper guidelines.
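Because the 48-hour figure is an average over a reference period, a quick sanity check of a rotation is to average each person's recorded weekly hours, on-call work included, over that period. The sketch below assumes hours are already recorded per week and treats the reference-period length and the opt-out flag as inputs that HR would supply; whether and how on-call time counts as working time is a legal question this code does not answer.

```python
from __future__ import annotations


def average_weekly_hours(weekly_hours: list[float]) -> float:
    """Mean hours per week, overtime included, over the reference period."""
    if not weekly_hours:
        raise ValueError("need at least one week of recorded hours")
    return sum(weekly_hours) / len(weekly_hours)


def exceeds_weekly_limit(
    weekly_hours: list[float], limit: float = 48.0, opted_out: bool = False
) -> bool:
    """True if the average exceeds the limit for someone who has not opted out.

    `weekly_hours` should cover exactly the reference period HR has set
    (an assumption of this sketch; the period length depends on national law).
    """
    if opted_out:
        return False
    return average_weekly_hours(weekly_hours) > limit
```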
Irrespective of what hours the IRT operates, that fact must be clearly and prominently stated and communicated to other teams and the constituency. Setting the right expectations is important.
If the IRT operates only during office hours, it should state those hours together with its time zone, because the people who contact it may be anywhere in the world. All information about the team's working hours must be clearly visible on the team's website.
Advertising the IRT's Existence
The teamís existence must be announced internally within the constituency and externally to other teams. Set up a website that explains what the team does and how it can be reached. But that should not be the end of the effort. A website is passive and the team should be proactive. Consider the following:
Acknowledging Incoming Messages
Receiving an email about a compromised device is usually how work on a new incident starts. The first step in this process is for the IRT to acknowledge receiving this initial notification. The acknowledgment must fulfill several goals:
Giving Attention to the Report
Most people would rather communicate with another human being than with an impersonal machine. Having someone compose a reply is much better than an autoresponse mechanism, even if the confirmation is not as instantaneous as an automatic one would be. It is perfectly fine to have a template answer that is used to acknowledge the receipt of a report, and it is also acceptable to modify it for an added "human touch."
Following are some examples of varying the template text:
Incident Tracking Number
If the report represents an incident, it must be assigned a tracking number. That number must be communicated to the sender so that it can be used in subsequent emails. That way, both parties will always know which incident they are talking about. When exchanging encrypted email, the Subject line should contain only the incident number and nothing else, because email encryption typically protects only the message body, not the Subject line. That way, the message gives away the minimum of detail to whoever intercepts it.
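A tracking number only helps if it is unique and easy to find again in replies. The Python sketch below issues sequential numbers in a made-up IRT-YYYY-NNNN format and pulls the number back out of a reply's Subject line; for encrypted mail, the issued number on its own is the whole Subject line. The format, prefix, and names are illustrative assumptions, and a production system would persist the counter rather than keep it in memory.

```python
from __future__ import annotations

import itertools
import re
from datetime import date


class TrackingNumberIssuer:
    """Issues sequential tracking numbers such as IRT-2024-0001 (hypothetical format).

    The counter lives in memory and does not reset at the start of a year;
    a real system would store it persistently.
    """

    def __init__(self, prefix: str = "IRT", start: int = 1) -> None:
        self.prefix = prefix
        self._counter = itertools.count(start)

    def issue(self, today: date | None = None) -> str:
        today = today or date.today()
        return f"{self.prefix}-{today.year}-{next(self._counter):04d}"


def extract_tracking_number(subject: str, prefix: str = "IRT") -> str | None:
    """Find a tracking number in a reply's Subject, e.g. 'Re: IRT-2024-0001'."""
    pattern = re.compile(rf"\b{re.escape(prefix)}-\d{{4}}-\d{{4,}}\b")
    match = pattern.search(subject)
    return match.group(0) if match else None
```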
Setting the Expectations
Full communication between parties is important. You must set the right expectations about what will happen next and how long it might take. If the report is not an incident, say so clearly and explain what the sender can do if they disagree with the assessment. If the report is an incident, state whether it is being handled right now and, if not, when it is likely to be picked up.
Information About the IRT
Where can more information about the IRT be found, and how can the team be contacted? This is usually just a pointer to the IRT's website, which contains all the details.
Looking Professional and Courteous
To make your responses more professional, you can prepare template text in advance so that whoever composes the actual response can cut and paste parts of it. The template adds uniformity to the acknowledgments, which, in turn, helps set expectations for the people reading them. This does not mean that team members simply send a prepackaged response in place of auto-responder software. The template ensures that the relevant elements are included in the acknowledgment, and each team member can add their own touch to the response.
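The elements discussed above (a personal touch, the tracking number, clear expectations, and a pointer to the team's website) can be captured in a reusable template that the handler fills in and then adjusts by hand. The following Python sketch uses the standard library's `string.Template`; the wording, the example.org address, and the function name are placeholders, not recommended text.

```python
from __future__ import annotations

from string import Template

TEAM_WEBSITE = "https://irt.example.org"  # placeholder address

ACK_TEMPLATE = Template(
    "Dear $reporter,\n\n"
    "Thank you for your report.$personal_note\n\n"
    "$assessment\n\n"
    "Information about our team, including our working hours, time zone, "
    "and contact details, is available at $website.\n\n"
    "Regards,\n$handler\n"
)


def build_acknowledgment(
    reporter: str,
    handler: str,
    is_incident: bool,
    tracking_number: str | None = None,
    next_step: str = "",
    personal_note: str = "",
) -> str:
    """Fill in the template; the handler is expected to edit the result."""
    if is_incident:
        if not tracking_number:
            raise ValueError("an incident needs a tracking number")
        assessment = (
            f"We are handling this as incident {tracking_number}. Please quote "
            f"this number in the Subject line of any further messages. {next_step}"
        ).strip()
    else:
        assessment = (
            "Based on the information provided, we do not consider this to be "
            "an incident. If you disagree with this assessment, please reply "
            "with any additional details and we will review it."
        )
    return ACK_TEMPLATE.substitute(
        reporter=reporter,
        handler=handler,
        assessment=assessment,
        personal_note=f" {personal_note}" if personal_note else "",
        website=TEAM_WEBSITE,
    )
```

The function produces a draft, not a finished message: the handler still reads it, adds the personal note, and adjusts the wording before sending.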
Cooperation with Internal Groups
In the same way the IRT cannot operate in isolation from the other IRTs, it also cannot operate without support and cooperation from various internal groups and departments. The groups and departments suitable for possible liaising are as follows:
Without good old-fashioned physical security, many state-of-the-art security mechanisms would not work properly. The physical security group usually operates, or has access to, closed-circuit TV (CCTV) cameras, if they are installed on the premises. Therefore, its cooperation is invaluable in cases where the identity of a person must be confirmed.
Security teams may also have the power to arrest and detain, so if a culprit is identified, the security team could make an arrest.
The legal department can be an invaluable asset. The IRT must work to identify who, on the legal side, would support the team in its job. The best results can be achieved if one or more designated people are given the additional task of supporting the IRT on a long-term basis.
You must expect to invest considerable effort at the beginning while the legal team learns about the security world and the IRT learns about the legal challenges. Only after both sides understand each other's positions can real cooperation begin.
The IRT should bring all new or different incidents to the attention of the legal team. In the majority of cases, the legal team might decide that the new case falls under one of the previously encountered issues. It is the remaining few that will prompt the legal team to look deeper into the matter to see how the organization can better protect itself from a legal perspective. These improvements might range from changes in the way the IRT approaches similar incidents to modified contracts that the organization will use in the future.
It is also a good idea for lawyers from different organizations to reach out to each other and start a dialogue. This is much easier if they are approached collectively, for example through the Vendor Special Interest Group forum under FIRST. Interested parties can visit http://www.first.org/vendor-sig/index.html and contact the moderators.
At some point, the press might approach the team about a case. Talking to the press can be tricky. Journalists usually want to receive as much information as possible, whereas the IRT might need to be careful about what to disclose and when.
Having a dedicated PR person assigned to the team to work with the press is helpful. The next best option is to have someone from the IRT receive PR training and act as the team's spokesperson. The least desirable option is to have somebody without any training step in front of the journalists. Whatever your situation, following are a few simple tips on what to do when talking to the press:
If your team is lucky enough to have a dedicated PR person, this person can help you with promoting your team. The PR person can also proactively work with journalists and help them understand what the IRT is doing, why, and how.
If you judge that an incident might generate inquiries from the press, you should prepare a holding statement that can be used if a journalist contacts the organization and asks for a statement.
In virtually all cases, there is not much benefit in proactively contacting the press and offering information about an incident. The exception to this rule might be a situation in which someone else is about to publicize the incident and you want your version of events to be heard first.
Internal IT Security
Some organizations might have a separate group that handles only internal security cases, that is, cases pertaining to the host organization itself.
In this case, the internal IT security group is a natural ally of the IRT. Both teams can organize regular meetings to exchange information on what kinds of attacks they are seeing and to identify trends. The group handling customers' incidents should provide information only on the types of attacks, not on who has been attacked. In addition to the regular information exchange, both teams should allow members of one team to rotate temporarily into the other team.
There must be an arrangement for the IRT to brief the executives on a regular basis and when emergencies occur. Regular briefings with executives are important to discuss the newest security threats, learn about the challenges in resolving those threats, and raise awareness of the IRT's role and availability. For executives, it is vital to be informed whether their part of the organization is affected by an incident and, if it is, how and to what extent.
Here are a few tips for communicating with executives:
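One way to honor the rule that the customer-facing group shares attack types but not victims is to strip the identifying fields before anything leaves the team. The Python sketch below assumes a hypothetical incident record with a `customer` field and produces only per-type counts for the joint meeting.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    tracking_number: str
    customer: str     # confidential: never included in the shared summary
    attack_type: str  # e.g. "phishing", "ssh brute force"


def attack_type_summary(incidents: list[Incident]) -> dict[str, int]:
    """Count incidents per attack type, discarding who was attacked."""
    return dict(Counter(incident.attack_type for incident in incidents))
```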
Product Security Team
If the host organization is a vendor responsible for developing and maintaining a product or service, it should have a dedicated team that deals with security vulnerabilities in its products. As with internal IT security, both teams, product security and the IRT, can benefit from having close ties. The product security team can provide information on different vulnerabilities so that the IRT can start checking whether they are being exploited. Information on vulnerabilities can also be used to reevaluate old data. What was previously seen as only noise or random attempts might suddenly be recognized as focused efforts to exploit a particular vulnerability.
Even if the organization is not a vendor, the team should establish ties with vendors' product security teams. At a minimum, the IRT must know how to contact them. Vendors generally appreciate being notified of a new vulnerability or other suspicious behavior in their products.
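Reevaluating old data can be as simple as rerunning a search over archived logs once a new vulnerability is known. The Python sketch below assumes a hypothetical space-separated log format (timestamp, source IP, requested path) and a regular expression describing requests that touch the vulnerable component; sources that hit it repeatedly are the candidates for focused efforts rather than noise.

```python
from __future__ import annotations

import re
from collections import defaultdict


def rescan_logs(
    log_lines: list[str], indicator: str, min_hits: int = 1
) -> dict[str, list[str]]:
    """Map each source IP to the timestamps of its requests matching `indicator`.

    Only sources with at least `min_hits` matches are returned. Each line is
    assumed to be 'timestamp source_ip path' (a hypothetical format).
    """
    pattern = re.compile(indicator)
    hits: dict[str, list[str]] = defaultdict(list)
    for line in log_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed lines rather than failing the whole scan
        timestamp, source_ip, path = fields[0], fields[1], fields[2]
        if pattern.search(path):
            hits[source_ip].append(timestamp)
    return {ip: times for ip, times in hits.items() if len(times) >= min_hits}
```

For example, if an advisory named a hypothetical `/cgi-bin/upload.cgi` as vulnerable, `rescan_logs(lines, r"^/cgi-bin/upload\.cgi", min_hits=2)` would list only the sources that requested it more than once.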
Internal IT and NOC
Depending on the organization's size and complexity, you may have a separate IT group that maintains and monitors the internal network. If you are an Internet service provider (ISP), you probably have a separate network operations center (NOC) that maintains the network used by your customers. These two groups are your partners. They can provide the IRT with current information on what is happening in the network (internal or external) and early warnings about new attacks while they are being tested. The NOC, in particular, can add a network-centric view of attacks and contribute methods for combating attacks using the network infrastructure.
An IRT, by its nature, deals with emergencies and exceptions. As such, it is hard to be prepared for something that cannot be foreseen. Although nobody can be prepared for the exact incarnation of the next worm, steps can be taken to prepare. A new worm might share characteristics with previous worms, so the IRT can apply that knowledge in preparing for future incidents. Consider the following:
Know Current Attacks and Techniques
It is imperative for the IRT to possess intimate knowledge of current attack techniques and of the attacks themselves, which helps in distinguishing an attack from legitimate activity. Obviously, that knowledge must not be limited to the attacking side; it must also cover the defensive side. How can you protect your organization from various attacks? What are the drawbacks? How do the features and capabilities of your equipment, and the network's topology and characteristics, affect those defenses?
The next question is: how should you gather that knowledge? There is no easy way to accomplish that. Reading public lists such as Bugtraq, full-disclosure, and others is standard for every team. Attending conferences and learning about new issues is also important. Analyzing what is going on in the team's constituency is obligatory. Monitoring the underground as much as possible is necessary. Setting up honeypots and honeynets and analyzing their activity is also an option. But, above all, talk to your peers and exchange experiences. Nothing else can substitute for that.
If information collection is done internally, you can enlist other groups or individuals to help with that task, even if they are not part of the IRT. If your organization has a group that monitors external information sources, you can make a formal arrangement with it to receive only the information that might interest the IRT. If you do not have such a group, you might find security-conscious individuals who already monitor some of these sources and can share information of interest to the IRT.
If your IRT decides to operate a honeypot or honeynet, you must make sure that you will have sufficient resources to do so. A honeypot is a nonproduction service exposed to the Internet with the purpose of being (mis)used by an attacker. The IRT can then capture malware and gain firsthand knowledge about how it infects devices and propagates. The service can be emulated with special software or it can be a real service. A honeynet is a network of honeypots.
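To make the idea of an emulated service concrete, here is a minimal low-interaction honeypot sketch in Python: it presents a fake FTP-style banner, records the first bytes a client sends, and queues a record for analysis. The banner, host name, and record layout are assumptions for illustration. It binds to the loopback interface so it can be tried safely; exposing anything like it to the Internet calls for an isolated host, monitoring of the honeypot itself, and the resources mentioned above.

```python
import queue
import socket
import socketserver
import threading
from datetime import datetime, timezone

FAKE_BANNER = b"220 ftp.example.org FTP server ready\r\n"  # emulated service (hypothetical)
captured = queue.Queue()  # thread-safe store for records of client activity


class HoneypotHandler(socketserver.BaseRequestHandler):
    """Sends the fake banner, then records whatever the client sends first."""

    def handle(self) -> None:
        self.request.settimeout(5)
        self.request.sendall(FAKE_BANNER)
        try:
            data = self.request.recv(4096)
        except OSError:  # includes timeouts: record the visit anyway
            data = b""
        captured.put({
            "time": datetime.now(timezone.utc).isoformat(),
            "peer": self.client_address[0],
            "data": data,
        })


def start_honeypot(host: str = "127.0.0.1", port: int = 0) -> socketserver.ThreadingTCPServer:
    """Start the honeypot in a background thread; port 0 picks a free port."""
    server = socketserver.ThreadingTCPServer((host, port), HoneypotHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(address, payload: bytes) -> bytes:
    """Play the attacker for a local test: read the banner, send one line."""
    with socket.create_connection(address, timeout=5) as conn:
        banner = conn.recv(1024)
        conn.sendall(payload)
    return banner
```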
Know the System the IRT Is Responsible For
The IRT must know what it is protecting, the location of the boundaries of the systems for which it is responsible, and the functions of different parts of the system. After defining boundaries, the next step is to identify the groups (or people) that can be contacted when the IRT must cross the boundaries. The next task is to determine what is "normal" within that area. If the IRT knows what is normal for the given system, it will be easier to spot deviations and investigate. This is also known as determining the baseline.
The baseline means different things for different aspects of the overall system. On the highest level, it can consist of the following things:
Each of the categories can then be further refined to form a more detailed picture. For remote users, remote IP addresses can be recorded. A traffic model of a user can be formed by recording how much traffic (packets) is generated inbound and outbound and which protocols and applications generated it. For some protocols, even the types of packets being generated can be recorded. That information can then be used to identify anomalous traffic, because different attack programs use different types of packets. Another type of information that can be recorded is the direction of the traffic. That is important because the site can be either the target or the source of an attack.
Information used to build the baseline should come from multiple sources to give a better picture. Traffic snapshots (or full captures for small sites), NetFlow data, syslog logs, logs from intrusion prevention/detection systems, and application logs: all of these sources should be used to build the baseline.
Taking only a single snapshot might not be sufficient to establish a credible baseline. Traffic and usage patterns change over time, and adding or removing a significant number of computers will affect the baseline, too. The baseline should therefore be updated continually with the latest measurements.
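A per-user traffic model of the kind described above can start as nothing more than packet counts keyed by direction and protocol. The Python sketch below assumes flow records have already been reduced to hypothetical `(user, direction, protocol, packets)` tuples, for example from exported flow data; the field layout is an assumption.

```python
from __future__ import annotations

from collections import defaultdict

DIRECTIONS = ("inbound", "outbound")


def build_traffic_model(
    flows: list[tuple[str, str, str, int]]
) -> dict[str, dict[tuple[str, str], int]]:
    """Sum packets per user, keyed by (direction, protocol)."""
    model: dict[str, dict[tuple[str, str], int]] = defaultdict(lambda: defaultdict(int))
    for user, direction, protocol, packets in flows:
        if direction not in DIRECTIONS:
            raise ValueError(f"unknown direction: {direction!r}")
        model[user][(direction, protocol)] += packets
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {user: dict(counts) for user, counts in model.items()}
```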
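One common way to keep a baseline current is an exponentially weighted moving average and variance, which give recent measurements more weight than old ones. The Python sketch below is a swapped-in statistical technique, not something the chapter prescribes: it flags a measurement (such as packets per hour for one segment) that lies more than a chosen number of standard deviations from the current baseline. The smoothing factor, threshold, and warm-up length are illustrative assumptions that would need tuning.

```python
import math


class RollingBaseline:
    """Exponentially weighted mean and variance of one metric, updated per sample."""

    def __init__(self, alpha: float = 0.1, threshold: float = 3.0, warmup: int = 10) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha          # weight given to the newest measurement
        self.threshold = threshold  # deviation, in standard deviations, that is flagged
        self.warmup = warmup        # measurements to see before flagging anything
        self.mean = 0.0
        self.var = 0.0
        self.count = 0

    def is_anomalous(self, value: float) -> bool:
        if self.count < self.warmup:
            return False
        std = math.sqrt(self.var)
        return abs(value - self.mean) > self.threshold * max(std, 1e-9)

    def update(self, value: float) -> None:
        if self.count == 0:
            self.mean = value
        else:
            diff = value - self.mean
            increment = self.alpha * diff
            self.mean += increment
            self.var = (1.0 - self.alpha) * (self.var + diff * increment)
        self.count += 1

    def observe(self, value: float) -> bool:
        """Check a new measurement against the baseline, then fold it in."""
        anomalous = self.is_anomalous(value)
        self.update(value)
        return anomalous
```

Every measurement, flagged or not, is folded into the baseline, so it follows genuine changes such as added computers; the trade-off is that a slow, sustained change in traffic can gradually become part of "normal".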
Identify Critical Resources
What resources are critical for the business, and in what way? What will happen if a resource is unavailable? If the company website is used only to present what the organization is about, its unavailability might not have severe consequences. If the website is also used for ordering, you need to keep any outage as short as possible.
This part of the process must be done with help from different groups and departments within the organization. Each of them should identify which resources are critical for its business. All that information must then be taken to a higher level of management and looked at from the perspective of the organization as a whole. The criticality of services should be reviewed periodically and after any significant change to the business model.
Formulate Response Strategy
After completing the inventory of critical resources, an appropriate response strategy can be formulated. This strategy is supposed to answer questions such as: If a service, or server, is compromised, what can and should be done? Here are a few examples that illustrate this point:
Answers to some of these questions can also lead to rethinking how the system is organized or how services are offered. In the case of a website, maybe it can be made static and burned onto a DVD so that the possibility of defacement is reduced, if not eliminated. Maybe some critical services can be split across multiple computers, so that if one is compromised, it can be shut down without affecting the other services.
Why is this important? While an attack is ongoing, there might not be sufficient time to think through how the various actions of the attacker and the defenders could affect the organization.
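Writing the agreed answers down in a form that can be looked up quickly is one way to avoid deliberating during an attack. The Python sketch below keeps a small playbook that records each resource's criticality, the group that identified it, and the first action agreed in advance. The two website entries echo the examples in this section, but the names, owners, and action texts are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class Criticality(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class PlaybookEntry:
    resource: str
    criticality: Criticality
    owner: str          # group that identified the resource as critical
    on_compromise: str  # first action agreed in advance


# Hypothetical entries modeled on the website examples in this section.
PLAYBOOK = {
    "brochure-website": PlaybookEntry(
        "brochure-website", Criticality.LOW, "Marketing",
        "Take the site offline and republish the static copy from read-only media.",
    ),
    "ordering-website": PlaybookEntry(
        "ordering-website", Criticality.HIGH, "Sales",
        "Shut down the compromised server and keep ordering running on the others.",
    ),
}


def first_action(resource: str) -> str:
    """Return the pre-agreed first action, or an escalation if none exists."""
    entry = PLAYBOOK.get(resource)
    if entry is None:
        return "No pre-agreed response: escalate to management."
    return entry.on_compromise


def by_priority() -> list[PlaybookEntry]:
    """Entries ordered from most to least critical."""
    return sorted(PLAYBOOK.values(), key=lambda entry: entry.criticality, reverse=True)
```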
Create a List of Scenarios
Instead of waiting for incidents to happen and then learning how to respond, the IRT should hold regular practice drills. Common scenarios should be created and the team's responses practiced. The main purpose of these exercises is for people to gain practice and confidence in handling incidents and to learn how effective they are. These exercises do not need to be limited to the IRT but can involve other parts of the organization. In such joint exercises, all participants must know when the exercise is active, so that it does not trigger false alarms, panic, or wrong actions.
What can these scenarios look like? For a start, they must cover the main types of incidents the team has handled. If these incidents happened once, there is a possibility that they will happen again. Here are some suggestions of what can be covered:
These may be the most common scenarios that an organization might encounter. Depending on the organization's role and technical capabilities, additional scenarios can be created. These practice drills can be purely paper exercises, or they can be conducted on an isolated network segment or on virtual devices.
Devices that can be simulated include computers, routers, and networks of devices. In these simulations, devices can be either targets of simulated attacks or used to observe how malicious software behaves. Software for creating virtual computers includes VMware, Parallels, Xen, and QEMU. A more comprehensive list is posted on Wikipedia at http://en.wikipedia.org/wiki/Comparison_of_platform_virtual_machines.
A paper exercise is good for formulating the initial response to an attack that has not been encountered yet and for modifying an existing response after the system has changed. Testing the response, on the other hand, is best done on actual equipment. You need to use real devices to make sure that the simulator reflects the real device's behavior.
After the response is established and practiced, new elements should be added to it. Some unexpected or unusual elements should be introduced. They can be various things, such as the following:
The last things to practice are seemingly impossible scenarios. You must accept that, occasionally, the research community does come up with a revolutionary new attack technique, and things that were considered impossible suddenly become routine. Here are a few examples:
For some of these scenarios, there may be no valid, or possible, responses, so their value lies in forcing people to think out of the box.
Measure of Success
At the start, it must be said that counting the number of incidents the team has handled in a given time period is not, by itself, a good measure of how the team is doing. When the team starts operating, it will initially see only a few incidents. That number will soon rise rapidly, and the more the team works on incidents, the more incidents will come to light. The increase occurs because the IRT is now actively looking for incidents that previously went unnoticed.
The way to approach creating metrics to measure the team's success is to start from the team's constituency and the team's goals for that constituency. That provides the starting point for defining what can be measured. Additionally, you can try to measure changes in the risk the organization faces from a compromise. Part of that risk assessment is the speed of recovery and how well damage is limited after an incident. The final part of the metrics is the team's influence and standing in the community. A good guide on how to define what to measure, how, and why is the ISO/IEC 27004 standard. Let's now look at some examples of how metrics for measuring the team's success can be defined.
One of the goals of most IRTs is to increase security awareness within the constituency. This goal can be aligned with specific policies such as "All users will receive basic security training" or "All users' passwords will be longer than six characters." Data on the number of users who have received security training and the results of checking users' passwords can be easily obtained, so you can calculate how close you are to meeting the policy goals and partly measure the team's success.
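The policy-based metrics above reduce to a percentage of records that satisfy a rule. The Python sketch below computes that percentage for any rule; the user records, including a `password_length` field captured when a password is set (stored password hashes do not reveal length), are hypothetical.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def compliance_rate(records: Iterable[T], complies: Callable[[T], bool]) -> float:
    """Percentage of records for which `complies` returns True."""
    records = list(records)
    if not records:
        raise ValueError("no records to measure")
    return 100.0 * sum(1 for record in records if complies(record)) / len(records)
```

With records such as `{"user": "alice", "trained": True, "password_length": 8}`, the two example policies become `compliance_rate(users, lambda u: u["trained"])` and `compliance_rate(users, lambda u: u["password_length"] > 6)`; tracking these numbers over time shows progress toward the policy goals.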
Assessing changes in the risk the organization faces from computer attacks is harder. You cannot directly measure an attacker's willingness to attack your organization, but you can use the fact that attackers are mostly opportunistic creatures to your advantage: if you are a hard target, attackers will go after easier targets. What you can measure here is what is happening to your organization relative to your peers and the industry. Reliable data on attacks is hard to come by. CSI and BERR surveys can serve as guides, but their numbers must be taken with caution. The comparison need not be limited to targeted attacks; you can also compare the number and severity of virus outbreaks within the organization against the industry.
Being a leader in the field is also a sign of the team's success. This can be measured by looking at the number of talks the team was invited to give, the number of interviews the IRT members gave, and how many of the team's ideas were incorporated into best practices and international standards.
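Comparing against peers only makes sense once the numbers are normalized for size. The Python sketch below expresses events, such as virus outbreaks, per 1,000 hosts and divides by a peer benchmark rate; the benchmark value would come from a survey of the kind mentioned above and carries all of its caveats. The per-1,000-hosts normalization is an assumption of this sketch, not a standard.

```python
def rate_per_thousand(events: int, population: int) -> float:
    """Events per 1,000 hosts (or users) in the measurement period."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000.0 * events / population


def relative_to_peers(own_events: int, own_population: int, peer_rate: float) -> float:
    """Ratio of the organization's rate to the peer benchmark rate.

    Below 1.0 means fewer events per 1,000 hosts than the benchmark.
    """
    if peer_rate <= 0:
        raise ValueError("peer rate must be positive")
    return rate_per_thousand(own_events, own_population) / peer_rate
```

For example, 12 outbreaks across 4,000 hosts is a rate of 3.0 per 1,000 hosts, which against a hypothetical benchmark of 6.0 gives a ratio of 0.5.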
Running a successful IRT involves many aspects. The team must have the right people and do the right thing. Not only must you pay attention to major things, but you also must not lose sight of the small details. Although all these details might look overwhelming, with dedication from the entire team, they can be achieved, and you will have a successful and respected IRT.