In multiprotocol label switching (MPLS) VPN security discussions, a statement often heard is, “MPLS is not secure, because a simple operator mistake (such as the misconfiguration of a route target) can break VPN isolation.” Such statements rest on some fundamental misunderstandings, which this white paper attempts to explain.
Another similar example may illustrate why operational mistakes are not an argument against a certain technology. Assume an operator makes a mistake in a firewall configuration that accidentally leaves a security hole open. Nobody would argue that firewalls are insecure due to such an act. Since the operator has the authority to make changes, the operator implicitly has the authority to make mistakes. These examples help to show why operational problems are a different category of security. Strictly speaking, you cannot trust your network operators, which can present a very difficult problem.
Security depends on three components, each of which is independent of the others:
Note that these components are not specific to networking or even computing. Physical security has the same three fundamental properties and can fail in any of them. A door lock, for example, can have weaknesses in the design (for instance, being constructed from the wrong material), manufacturing mistakes (for instance, not being fixed properly to the door), or operational mistakes (for instance, leaving the key under the doormat).
The main conclusion is that an operational mistake, such as a misconfiguration in an MPLS provider edge (PE) router, does not automatically imply that the architecture is insecure. Misconfigurations can happen in any technology, which means operational security measures need to be in place to catch such issues.
There are two types of operational security problems: accidental misconfigurations and deliberate misconfigurations.
The impact of misconfigurations of either type can range from little or no effect to catastrophic. This is especially true in the case of accidental misconfigurations, where there is a reasonable chance that the true extent of the resulting security breach is not even discovered. It should also be noted that only a minor fraction of possible misconfigurations will actually result in a security breach. Naturally, deliberate misconfigurations will be more likely to result in a breach, since there is malicious intent with a goal to break the security policy.
Currently, in the case of MPLS VPNs, the biggest concern in the industry is accidental misconfiguration. The likelihood that an operator mistypes a route target or makes another configuration mistake cannot be overlooked. This type of mistake could cause a given VPN site to become part of another VPN, breaking the separation between the two VPNs. When this happens by accident, it is unlikely that either side will discover the true nature of the incident. The misplaced site will usually realize that it can no longer reach its business applications, while the other VPN may not notice the breach at all unless there is address space overlap or a routing issue caused by the new prefixes. This is a serious concern for VPN users.
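The kind of mistyped route target described above can be caught by an audit that compares each VRF's configured route targets with an intended ownership policy. The following is a minimal sketch under stated assumptions, not a vendor tool: the `Vrf` record, the `RT_OWNER` table, the customer names, and the route-target values are all hypothetical, and a real deployment would need explicit exceptions for intentional sharing such as extranets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vrf:
    """Hypothetical summary of one VRF as extracted from a PE configuration."""
    name: str
    customer: str
    imports: frozenset
    exports: frozenset


def find_cross_vpn_leaks(vrfs, rt_owner):
    """Return (vrf, direction, route_target, owner) for every route target
    that a VRF uses but that its customer does not own.

    Unknown route targets are reported with owner None.
    """
    findings = []
    for vrf in vrfs:
        for direction, rts in (("import", vrf.imports), ("export", vrf.exports)):
            for rt in sorted(rts):
                owner = rt_owner.get(rt)
                if owner != vrf.customer:
                    findings.append((vrf.name, direction, rt, owner))
    return findings


# Intended policy (assumed): which customer owns which route target.
RT_OWNER = {"65000:100": "acme", "65000:200": "globex"}

VRFS = [
    Vrf("ACME-HQ", "acme", frozenset({"65000:100"}), frozenset({"65000:100"})),
    # Simulated typo: the import should have been 65000:100.
    Vrf("ACME-BRANCH", "acme", frozenset({"65000:200"}), frozenset({"65000:100"})),
]

findings = find_cross_vpn_leaks(VRFS, RT_OWNER)
for vrf_name, direction, rt, owner in findings:
    print(f"{vrf_name}: {direction} of {rt} belongs to {owner}")
```

Such a check is an operational measure rather than a device feature: it runs outside the router, so the operator who makes the mistake cannot silently disable it on the device.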
The typical reaction when looking for a solution to a security problem is to look for features to configure. It is important to understand that operational problems cannot be fully solved by features, because the person making the misconfiguration may also remove the feature that is meant to protect against misconfigurations. Operational problems require operational solutions and operational competence within the organization.
It can be very difficult to implement a comprehensive operational security environment, and some measures (such as dual control) can require a certain organizational size to work properly. The goal should be to carry out incremental improvements to the overall operations process. For example, precise command level authorization schemes can be difficult to deploy and expensive to operate in large networks. Other parts of the operations process are much easier to enforce. For example, sending all access and configuration logs to a separate log server, to which the network operators do not have access, is a step toward discouraging deliberate misconfigurations of network devices.
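The dual control idea mentioned above reduces to one rule: a configuration change may be applied only if someone other than its author has approved it. The sketch below illustrates that rule only; the `ChangeRequest` record, the user names, and the device name are hypothetical, and a real workflow would also have to authenticate both people and record the decision where operators cannot alter it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeRequest:
    """Hypothetical record of a proposed configuration change."""
    device: str
    author: str
    approver: Optional[str]
    commands: tuple


def may_apply(change: ChangeRequest) -> bool:
    """Dual control: a second, different person must have approved the change."""
    return change.approver is not None and change.approver != change.author


commands = ("route-target import 65000:100",)
unreviewed = ChangeRequest("pe1", "alice", None, commands)
self_approved = ChangeRequest("pe1", "alice", "alice", commands)
peer_approved = ChangeRequest("pe1", "alice", "bob", commands)

for change in (unreviewed, self_approved, peer_approved):
    print(change.approver, may_apply(change))
```

The rule itself is trivial; the organizational cost lies in always having a second qualified person available, which is why such measures tend to need a certain team size.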
Extending the network to third parties, by either outsourcing parts of the network or certain network management aspects, or by providing extranets, requires third parties to comply with the operational security measures. This adds significant complexity to the operational security policy.
The key issue with many operational control functions is that they do not always prevent mistakes from happening. They may make it harder for mistakes to occur, but a large part of them focuses on detecting mistakes after they occur. This may, to a large extent, deter deliberate misconfigurations, because an engineer would probably not violate the security policy knowing that the “attack” can be detected and traced back to the engineer. But it is not always possible to proactively avoid mistakes, and accidental misconfigurations therefore remain a security concern.
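Detection after the fact can be as simple as comparing a device's running configuration with the last approved baseline. The sketch below uses Python's standard `difflib` module for this; the function name `config_drift` and the sample configuration text are illustrative assumptions, and retrieving the two configurations from real devices is outside its scope.

```python
import difflib


def config_drift(approved: str, running: str) -> list:
    """Return the lines that differ between the approved baseline and the
    running configuration, prefixed '- ' (removed) or '+ ' (added)."""
    diff = difflib.ndiff(approved.splitlines(), running.splitlines())
    # ndiff also emits unchanged lines ('  ') and hint lines ('? '); drop them.
    return [line for line in diff if line.startswith(("- ", "+ "))]


APPROVED = """\
ip vrf ACME
 rd 65000:100
 route-target export 65000:100
 route-target import 65000:100
"""

RUNNING = """\
ip vrf ACME
 rd 65000:100
 route-target export 65000:100
 route-target import 65000:200
"""

drift = config_drift(APPROVED, RUNNING)
for line in drift:
    print(line)
```

Combined with access logs kept on a server the operators cannot modify, a reported difference can then be traced back to the session that introduced it.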
Many organizations consider additional security measures, so that the overall system is more resistant against misconfigurations.
To maintain separation when the network core does not provide full separation, potentially due to a misconfiguration, IPsec may be considered in addition to MPLS VPN. GET VPN is a variant of IPsec that is particularly suited to run in addition to MPLS VPN. If an organization runs an IPsec VPN on top of an MPLS VPN, operator mistakes on the MPLS core will not break the separation of the VPN, because it is additionally protected by IPsec. However, this imposes additional cost and operational burden. Similarly, some organizations choose to deploy two independent firewalls managed by different operational groups, so that no single mistake or misconfiguration can affect overall security.
The use of several layers of security is called “defense in depth” and is a common model in security deployments. However, security layers should not be added without a proper risk analysis. It is important to understand the threats, their impact on the organization, and the cost of the additional security measures.
A risk analysis should determine whether the cost of the additional security measures is in balance with the cost of the actual risk without those measures. In other words, it should determine whether the risk’s impact is large enough to justify the extra cost of the additional countermeasures. Such a risk analysis should account for the entire network, including all of its assets and current countermeasures. A proper risk analysis therefore requires significant resources.
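The cost comparison described above can be illustrated with the common annualized loss expectancy model (expected loss per year = loss per incident × incidents per year). This model is an assumption introduced here for illustration, not something this paper prescribes, and every figure below is hypothetical.

```python
def annual_loss_expectancy(loss_per_incident: float, incidents_per_year: float) -> float:
    """Expected yearly loss from one type of incident."""
    return loss_per_incident * incidents_per_year


def measure_is_justified(loss_per_incident: float,
                         rate_before: float,
                         rate_after: float,
                         annual_cost: float) -> bool:
    """True if the yearly risk reduction exceeds the yearly cost of the measure."""
    reduction = (annual_loss_expectancy(loss_per_incident, rate_before)
                 - annual_loss_expectancy(loss_per_incident, rate_after))
    return reduction > annual_cost


# Hypothetical case: an added security layer costs 40,000 per year and cuts
# the expected rate of isolation breaches from 0.05 to 0.005 per year.
small_impact = measure_is_justified(500_000, 0.05, 0.005, 40_000)
large_impact = measure_is_justified(5_000_000, 0.05, 0.005, 40_000)
print(small_impact, large_impact)
```

With these invented numbers the same measure is not worth its cost when a breach costs 500,000 but is when a breach costs 5,000,000, which is why the analysis has to be done per organization rather than by general rule.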
The complexity of a network makes operational mistakes and security violations more likely. This applies both to the network architecture and to the methods that are in place to protect the network. From a security perspective, less complex configurations are usually preferred.
This perspective also applies to the operational management of the network. Very complex operational procedures are more likely to cause problems. For example, under a very complex operational procedure an operator group may not have the required privileges to carry out an emergency operation. Under stress, the immediate reaction in such cases is to disable some security checks.
There is no clear guideline on what is “too complex,” as this also depends on the operational model of the enterprise. The threshold will be different for a highly skilled team than for a first-line support team.
The key message is that adding operational measures (for example, command level authorization) or security measures (such as IPsec) increases the complexity of the network, and in some cases may actually result in lower security, because the network becomes too complex to maintain.
An increasing number of regulations, such as PCI, HIPAA, and Sarbanes-Oxley, require certain operational security measures. Currently these regulations are the main drivers for many operational security measures. Precise access control and authorization, as well as logging, are key requirements of most industry compliance standards.
Companies considering operational security measures should verify which regulations apply to their business, and what each regulation requires.
Even though operational security is a process, and less feature or product driven, there are a number of Cisco products that address operational security:
Operational mistakes can break security policies and are a major concern for both service providers and enterprises. Most operational mistakes cannot be completely avoided; however, it is possible to reduce the risk of a mistake. The ability to detect a mistake and trace it back to its source could also deter insiders from making malicious misconfigurations or help to quickly detect operator mistakes.
Industry compliance regulations require certain operational security measures. Network operators should check which regulation applies and verify that the required measures are in place.
It is often possible to provide additional security measures that continue to protect the network even when an operational mistake occurs. However, before implementing additional security measures, a formal risk analysis should be performed to balance the cost of the additional measures against the cost of the risk incurred due to operational weaknesses.
Michael Behringer (firstname.lastname@example.org)
RFC 3871: Operational Security Requirements for Large Internet Service Provider (ISP) IP Network Infrastructure
This document is part of Cisco Security.