Cisco Application Centric Infrastructure (ACI) provides powerful new ways to dynamically manage infrastructure in the modern world of IT automation and DevOps. Having the tools to change how infrastructure is built is one thing, but being able to effectively operate the infrastructure beyond the day-zero build activities is crucial to long-term effectiveness and efficiency. To effectively harness the power of ACI, organizations will need to understand how to incorporate it into their daily operations. This book examines some of the common operational activities that IT teams use to provide continued infrastructure operations and gives the reader exposure to the tools, methodologies, and processes that can be employed to support day 1+ operations within an ACI environment.
Software Included in the Book
Cisco Application Centric Infrastructure (ACI) combines hardware, software, and ASIC innovations into an integrated systems approach and provides a common management framework for network, application, security, and virtualization.
For the purpose of writing this book, the following hardware devices were used:
- ACI Spine switches, including the Cisco Nexus 9508 and 9336PQ
- ACI Leaf switches, including the Cisco Nexus 9396PX, 9396TX, and 93128TX
- Cisco Application Virtual Switch (AVS)
- Cisco UCS B-Series and C-Series servers
- Several models of switches and routers, including Cisco Nexus 5000 switches and Cisco Integrated Services Routers
- A variety of hypervisors, including KVM, Microsoft Hyper-V, and VMware vSphere
- IXIA IxVM product family
This book was written based on the APIC 1.2 software release.
The Story of ACME Inc.
ACME Inc. is a
multi-national corporation that specializes in manufacturing, sales, and
distribution of a diverse product portfolio, including rocket-powered roller
skates, jet-propelled unicycles, and various explosive materials. These product
groups operate as separate business groups within the company, and have
previously maintained separate infrastructure and applications. They have
largely focused on retail routes to market, but have recently decided to pursue
a more direct-to-consumer business model due to intense pressure from new
competitors who have dominated the online sales channels. In an effort to be
more competitive, ACME has undertaken a project to build a mobile application
platform to support ordering and logistics for product delivery to their
customers for their entire portfolio. Historically, ACME's business units have leveraged third-party software companies and commercially available software to meet their IT demands, but they would like to create a more intimate relationship with their consumers and be able to take feedback on the platform directly from those users, while incorporating an ongoing improvement cycle so they can react to changing market dynamics in a more nimble fashion.
Where they have used custom software in the past, they have leveraged a
traditional infrastructure and software model that does not allow them to keep
up with the changing requirements, and therefore ACME is looking for a new
approach to both application and infrastructure life cycle management. The
application developers have been looking at new application development trends
such as Continuous Delivery and Continuous Integration, and the new application
platform is to be developed in this manner. To support this, the infrastructure
components need to be capable of mapping to these new paradigms in a way that
is not possible using traditional concepts.
One of the largest challenges ACME has historically faced is that operations and infrastructure have been an afterthought to product development. This has led to several
situations where application deployments have meant long weekend hours for all
of the teams, caused customer-impacting outages, and taken longer to accomplish
than the business leaders would have liked. For this reason, ACME Inc. has
decided to change by creating an environment where infrastructure artifacts are
treated as part of the application, can be checked into version control, can be
tested alongside the actual application, and can continually improve.
While ACME is
intensely focused on delivering the new application platform in a timely
manner, ACME is also interested in creating a foundation on which it can grow
to deliver a common pool of infrastructure that is shared across all business
groups and operated in a multi-tenant fashion to increase efficiency.
At an executive
briefing, John Chambers, the CEO of Cisco Systems at the time, told ACME: "The
world is changing. Every company is a technology company, and if you don't
adapt, you'll get left behind."
As evidenced by the
success of cloud platforms, such as Amazon Web Services (AWS) and OpenStack,
consumption models of technology delivery have the ability to adapt technology
more quickly to rapid business requirements changes. This is the type of
consumption that ACME Inc.'s business owners need. Control is what operations groups are focused on, but control can be a barrier to a pure consumption model. Unless companies make investments in technologies that allow for consumption of automated components, the only other way to scale is by overloading the human component, and few people would really choose to work for that type of company.
After evaluating current offerings from various technology vendors, ACME Inc. selected Cisco Application Centric Infrastructure (ACI). The ability to abstract all physical and virtual infrastructure configuration into a single configuration that is consistent across dev, test, and prod environments, as well as portable across the various data center locations currently maintained by ACME, is highly desirable. ACI has been built from the ground up to change the substructure used to build
network devices and protocols. Innovation at this level will provide more
opportunities for expanding the tools with which users interact. This is where
the fulcrum will tilt in the favor of IT and infrastructure being more dynamic,
thus allowing IT to operate and manage at the speed of business. However, with
a change of this nature comes fear, uncertainty, and doubt. This book will
attempt to bring some level of comfort and familiarity with operations
activities within an ACI fabric.
While ACME Inc. is a fictitious company, this is the true story of every company, and, just as important, this is the story of the employees of those companies. Workers in the IT
industry need to adapt to keep up with the rapid change of the business.
However, this runs contrary to how most operations groups exist in the
relationship between business and technology. Most IT operations groups invest
a lot of time in the tools needed to deliver services today and there is an
organic resistance to re-investing. The thought is, "Why fix what is already working?"
The Why, Who, What, When, and How
ACME is looking to simplify how it operates infrastructure, but recognizes that
this initiative, this application, and this infrastructure are new to ACME Inc.
ACME must address fundamental questions including Who manages What, and How
they go about their tasks. When different groups perform regular operations,
and Where they go to perform these operations, are also considerations, but
more tactical and point-in-time-relevant. This section discusses the relevant aspects of these monikers as they relate to ACI fabric operations and how a company such as ACME Inc. can divide the workload.
"Why" is the most important aspect of what should be considered in operationalizing an ACI fabric. In the case of ACME Inc., the key success criterion is to streamline
processes and procedures related to the deployment of infrastructure required
to support the application initiatives. To achieve the desired result, a high
degree of automation is required. Automation adds speed to repetitive tasks and
eliminates errors or missed steps. Initially, automation can be a scary
proposition for some of the key stakeholders, as it could be looked at as a
threat to their own job. Quite the opposite, automation is about making work
more enjoyable for all team members, allowing them the freedom to innovate and
add value, while removing mundane, repetitive tasks. Looking at why an
automated fabric is beneficial to an organization is important for setting
expectations of return on investment. Also, looking at why an operational
practice is done a specific way can help with framing the tools and processes
that are employed.
As with most
organizations, ACME Inc. traditionally had different types of stakeholders
involved in making any IT initiative successful, and each has a specific
element of the infrastructure in which they have specific expertise and about
which they care most. In any IT organization, these groups can have distinct
organizational boundaries, but more likely the boundaries are blurred to some
degree. Listed below are some characteristics of these groups, but keep in mind
that some of these characteristics might be combined. At the macro level, the
fact that these different organizations exist should not be evident to the
end-user. Instead, the entire organization should be seen as one team with a
common goal of delivering value to their organization.
ACME's Development and Application Team is focused on the software and applications the company
uses internally and the software that it delivers to its customers. The
Application part of the team contains application owners and subject matter
experts that ensure other business units are able to do their jobs by utilizing
the business applications available. The Development part of the team will be
writing the mobile application software platform. Both parts of this team will
need to work closely with the other teams in this section to enable the best
design, performance, and availability of applications for the end users.
ACME's Network Team
is primarily focused on creating and managing networking constructs to forward
packets at Layer 2 (MAC/switching) and Layer 3 (IP routing). The team is challenged with juggling application requirements, managing SLAs, and assisting in the enforcement of information security, all while maintaining high levels of availability. What the team needs to know is how to configure the networking constructs, how to tie Layer 2 to Layer 3, how to verify forwarding, and how to troubleshoot network forwarding aspects in the fabric. With ACI, the team is most interested in decoupling overloaded network constructs and returning to the specific network problems that the team was intended to solve, while allowing other groups to leverage their specific expertise to manipulate security and application-level policies. The team is also interested in allowing more transparency into the performance of network forwarding and in making key metrics available on demand in a self-service capacity.
ACME's Storage Team
is primarily focused on delivery of data storage resources to the organization.
The storage team is concerned with protecting the data in terms of
availability, as well as making sure that sensitive data is secure. The storage
team has been very successful in maintaining very tight SLAs and has
traditionally managed separate infrastructure for storage access. The capabilities provided by the ACI fabric allow them to confidently deploy newer IP-based storage and clustering
technologies. The team is also very interested in being able to see how the
storage access is performing and would like to be notified in the event of
contention. The team typically has some specific requirements around QoS,
multi-pathing, and so on. Historically, the team had to worry about delivering a storage fabric in addition to managing the storage devices themselves. ACI will provide the storage team with the visibility it will require. These capabilities are primarily discussed in the monitoring sections.
The Compute and
Virtualization Team at ACME Inc. is wrapping up a major initiative to
virtualize the server farms that it is responsible for maintaining. The team
also recently employed new configuration management tools to account for new
workloads that fell outside of the virtualization effort to get similar agility
for bare metal servers that the team gained from its virtualization efforts.
This is timely as the application rollout will have both virtualized and
non-virtualized workloads. Additionally, the application developers are
increasingly interested in leveraging Linux container technologies to allow for
even greater application portability. The Compute and Virtualization teams are excited about ACI for its ability to provide common access to physical and virtual servers, allowing the team to publish endpoint groups to virtualization clusters from a centralized place across multiple hypervisors. These capabilities are discussed further in the Fabric Connectivity chapter.
The Information Security Team at ACME Inc. has traditionally been engaged late in an
application deployment process, and has been responsible for performing
vulnerability assessment and data classification efforts. With the current
project, the new application will be storing sensitive customer information,
including credit card numbers. Due to the sensitivity of this information and
the security aspects of the ACI fabric, the Information Security Team is able to provide input earlier in the
process and avoid re-doing work because of security or compliance issues. The
Information Security Team is interested in the operational aspects of the
security model as it relates to the following capabilities: tenancy, Role Based
Access Control (RBAC), monitoring, and Layer 4 to Layer 7 services.
The aspect of "what" can be looked at in many different ways, but the main concept in the context of this book is what tools are used to manage operations of an ACI fabric. In a traditional network, you have some traditional tools, such as the CLI and SNMP, to manage network operations, and these tools integrate into management platforms and configuration and management processes. In ACI, there are some elements of the traditional tools, but the fabric management is
rooted in an abstracted object model that provides a more flexible base. With
this base, the operator of the fabric can choose from multiple modes of
management, such as GUI, CLI, API integration, programming, scripting, or some
combination of these. How a tool is selected in an ACI environment will often be a product of what is being done and the aspects of how the tool
is used. For example, if an operations staff is trying to gather a bunch of
information across a number of interfaces and switches or is managing the
configuration of many different objects at once, scripting might be more
efficient, whereas simple dashboard monitoring might be more suited to a GUI.
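The difference is easy to see in practice. The sketch below builds a single fabric-wide REST query against the APIC that a GUI user would have to assemble switch by switch. The APIC address is a placeholder, and the example assumes the l1PhysIf class that models physical ports; it illustrates the scripted mode of management, not a complete tool.

```python
# A minimal sketch of the scripted approach: one APIC REST class query can
# return every physical interface on every leaf and spine at once, a task
# that would take many clicks per switch in the GUI.

APIC = "https://apic.example.com"  # placeholder APIC address

def class_query_url(cls, query_filter=None):
    """Build an APIC REST API class-query URL, optionally filtered."""
    url = f"{APIC}/api/class/{cls}.json"
    if query_filter:
        url += f"?query-target-filter={query_filter}"
    return url

# All physical interfaces fabric-wide, restricted to ports that are admin-up.
url = class_query_url("l1PhysIf", 'eq(l1PhysIf.adminSt,"up")')
print(url)
# A real script would first authenticate (POST to /api/aaaLogin.json), then
# GET this URL and iterate over the returned "imdata" list of objects.
```

The same helper works for any class in the object model, which is what makes scripting attractive once the number of objects grows.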
"When" refers to
when the teams listed above are involved in the planning. It is a good idea to
involve the different teams early when building policies and processes for how
the fabric is implemented and then managed. The collaborative nature of ACI allows for a high degree of parallelization of work flow. This is a key difference from traditional processes, which were very serial in nature and resulted in a longer deployment time for applications and a higher mean time to resolution when issues arose.
"How" answers the
following basic questions:
- How does a networking person go about configuring the network forwarding?
- How does the compute team get information from the infrastructure to make optimal workload placement decisions?
- How does the application team track performance and usage metrics?
- How does a storage team track access to storage subsystems and ensure that it is performant?
When "how" involves
making a change to the configuration of an environment, an important
consideration is change control. Change control is a fact of life in the mission-critical environments that ACI has been designed to support. The ACI policy model has been designed to reduce the overall size of a fault domain and provide a mechanism for incremental change. There are mechanisms for backup and restore that will be discussed in follow-on chapters. We will also discuss the ACI object model and which objects affect the tenants and the fabric as a whole.
An evaluation of
current change control and continuous integration/delivery strategies is
warranted as operational procedures evolve. Throughout this book we will highlight the methods and procedures to proactively and reactively manage the ACI fabric.
As a baseline, most
organizations are implementing some kind of structured change-control
methodology to mitigate business risk and enhance system availability. There
are a number of change/IT management principles (Cisco life cycle services,
FCAPS, and ITIL) that are good guides from which to start. A common sense
approach to change management and continuous integration should be a premise
that is discussed early in the design and implementation cycle before handing
the fabric to the operations teams for day-to-day maintenance, monitoring, and
provisioning. Training operations teams on norms (a stated goal of this book)
is also key. Applying change management principles based on technology from five years ago would not enable the rapid deployment of technology that ACI makes possible. The multi-tenant and role-based access control features inherent to the ACI solution allow the isolation, or the drawing of a very clean box around, the scope and impact of the changes that can be made.
For more information, see Role-Based Access Control.
A change must be evaluated in terms of both its risk and its value to the business. A way to enable a low-overhead change management process is to reduce the risk of each change and increase its value. Continuous delivery does
exactly this by ensuring that releases are performed regularly from early on in
the delivery process, and ensuring that delivery teams are working on the most
valuable thing they could be at any given time, based on feedback from users.
In the Information Management Systems world, there are three fundamental kinds of changes: emergency, normal, and standard. Emergency changes are by definition a response to some kind of technical outage (hardware, software, or infrastructure) and are performed to restore service to affected users.
Normal changes are
those that go through the regular change management process, which starts with
the creation of a request for change which is then reviewed, assessed, and then
either authorized or rejected, and then (assuming it is authorized) planned and
implemented. In an ACI environment, a normal change could apply to anything within the following areas:
- Fabric policies (fabric internal and access policies will be discussed in detail later)
- Configuration objects in the Common tenant that are shared with all other tenants (things that affect the entire fabric)
- Virtual Machine Manager (VMM) domains
- Layer 4 to Layer 7 devices
- Creation of logical devices
- Creation of concrete devices
- Layer 2 or Layer 3 external connections
- Attachable Entity Profiles (AEPs)
- Server or external network attachments
- Changes to currently deployed contracts and filters that would materially change the way traffic flows
Standard changes are
low-risk changes that are pre-authorized. Each organization will decide the
kind of standard changes that they allow, who is allowed to approve them, the
criteria for a change to be considered "standard", and the process for managing
them. As with normal changes, they must still be recorded and approved. In the ACI environment, some examples of "standard" changes could be:
- Application profile creation
- Endpoint group (EPG) creation
- Contracts scoped at the tenant level
- Layer 4 to Layer 7 service graph deployment
- Domain associations for EPGs
The items mentioned
above are not intended to be all-inclusive, but are representative of common
tasks performed day-to-day and week-to-week.
The ability to audit changes that are happening to the environment is a requirement for ACME Inc. The Application Policy Infrastructure Controller (APIC) maintains an audit log for all configuration changes to the system. This is a key troubleshooting tool for when something "magically" stops working. The immediate action should be to check the audit log, as it will tell who made what change and when; correlating this information with any faults that result from the change enables the change to be reverted quickly.
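The audit trail described above can also be pulled programmatically. The sketch below builds a REST query for configuration modification records, filtered to a single user and sorted newest-first. The APIC address is a placeholder, and the aaaModLR class and its attribute names are assumptions that should be verified against your APIC version's API documentation.

```python
# A sketch of querying the configuration audit log via the APIC REST API.
# aaaModLR is the class that records configuration modifications; the filter
# narrows the results to one user, and the sort returns newest changes first.

APIC = "https://apic.example.com"  # placeholder APIC address

def audit_log_url(user=None, newest_first=True):
    """Build a REST query for configuration audit records (aaaModLR)."""
    url = f"{APIC}/api/class/aaaModLR.json"
    params = []
    if user:
        # Restrict the results to changes made by a specific user.
        params.append(f'query-target-filter=eq(aaaModLR.user,"{user}")')
    if newest_first:
        # Sort by creation timestamp, most recent change first.
        params.append("order-by=aaaModLR.created|desc")
    if params:
        url += "?" + "&".join(params)
    return url

print(audit_log_url(user="admin"))
```

Running such a query immediately after an unexpected fault is a quick way to line up "who changed what, and when" against the fault timestamps.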
A more in-depth
discussion of continuous delivery in the context of infrastructure management
is outside of the scope of this book.
The remainder of
this book answers these questions, providing you with a framework of how to
take the concepts and procedures and apply them to similar initiatives within
your organizations. The book is laid out in a specific order. However, ACI enables ACME Inc. to complete these tasks in parallel with the various
stakeholders who are highlighted throughout, and this book illustrates how the
stakeholders can work together in a more collaborative manner than they have in
the past. While some scripting opportunities are called out throughout the
book, there is a more in-depth section at the end that explains how to use the
API to automate most operational tasks. While organizational structures might be siloed into these teams, to provide the greatest value to the customer, the user, and ultimately the business, the most important thing is for these groups to work together as one team.
What Is New in APIC Release 1.2(1)
Application Policy Infrastructure Controller (APIC) release 1.2(1) introduces many new features and enhancements. This section
highlights the new features and provides a brief overview. For additional
information, including new hardware support, see the Release Notes for your Cisco Application Centric Infrastructure (ACI) software release.
A new "basic" GUI mode of operation has been added. You are given the option at the login screen to select either the Basic or the Advanced GUI mode. The goal of the simplified GUI is to offer a simple interface that enables common workflows. The basic GUI operational mode enables administrators to get started easily with ACI with minimal knowledge of the object model. The simplified GUI allows the configuration of leaf ports and tenants without the need to configure advanced policies, such as switch profiles, interface profiles, policy groups, or access entity profiles (AEPs). The administrator can still use the advanced (regular) GUI mode as desired. Although the basic GUI is great for users who are new to ACI, Cisco recommends leveraging the advanced GUI for scale operations, existing fabric deployments, and more granular policy control.
The object model-based approach to configuring the APIC allows the user access to almost all aspects of the policy model, but it requires a comprehensive understanding of the policy model and the framework. Prior to the NX-OS-style CLI, the APIC CLI configuration mechanism required an understanding of all related Managed Objects (MOs) in the ACI policy model, and the CLI allowed creating, editing, and saving those managed objects, which were represented as a UNIX file system, by using commands such as moconfig. This approach was radically different from Cisco IOS/NX-OS CLI features that hide most of the internal details to simplify the configuration process as much as possible. To align better with existing NX-OS-style command interfaces, an NX-OS-style CLI for the APIC was introduced with the goal of harnessing the power of ACI without the burden of learning the details of the object model.
The new NX-OS-style CLI for the APIC uses the familiar NX-OS syntax as much as possible. The NX-OS-style CLI applies intelligence to user input to create or modify the underlying ACI policy model as applicable, without sacrificing the power of the ACI policy model to deploy applications at scale.
Depending on the mode that you are in, some commands might not be active in that context. For example, if you are in the pod configuration mode [apic1(config-pod)#], the "show running-config ntp" command shows the current NTP configuration. If you are in the NTP configuration mode [apic1(config-ntp)#], the "show running-config server" command shows the current NTP configuration. Use the where command to determine your current mode. For example:
apic1(config)# pod 1
apic1(config-pod)# where
configure; pod 1
apic1(config-pod)# ntp
apic1(config-ntp)# where
configure; pod 1; ntp
By default, when you SSH into an APIC running version 1.2(1) or later, you will automatically be placed into the NX-OS-style CLI and not into the object model CLI of the previous releases. To use the object model CLI, use the bash command. You can execute a single command in the bash shell by using the following syntax:
bash -c 'path/command'
For example:
bash -c '/controller/sbin/acidiag avread'
Access Control Rule Enhancements for Layer 4 to Layer 7 Services
Previously, Layer 4 to Layer 7 policy configurations in a multi-tenant environment required administrator intervention to create certain objects that cannot be created by tenant administrators using the classic role-based access control (RBAC) domains and roles. This introduced a requirement for more fine-grained RBAC privileges in the ACI policy model. Tenant administrators can now create RBAC rules via self-service
without global administrator intervention to grant permissions for resources
under their tenant subtree to other tenants and users in the system.
The ACI fabric uses ingress endpoint group-based access control lists (ACLs) for policy enforcement, except in a few cases, such as when the destination endpoint group is an L3Out or when the ingress leaf switch does not know which endpoint group a destination end host is in (with the bridge domain in hardware proxy mode). In these exception cases, policy enforcement occurs as the packet is leaving the fabric.
Administrators can now move policy enforcement from the border leaf (where the
L3Out connection is located and the ingress/egress policy has been
traditionally enforced) to the non-border leaf where the incoming connection is
being sourced. The goal is to reduce the number of ACL rules that need to be
programmed on the border leafs for tenants leveraging L3Out connectivity and to
migrate those to the ingress leaf switch of the connection source instead.
A new configuration property called "Policy Control Enforcement Direction" has been added to the Layer 3 External Outside endpoint group. This property is used for defining the policy enforcement direction for the egress traffic on an L3Out. The default configuration is ingress for newly created Layer 3 External endpoint groups, but any L3Outs that existed prior to this software version will continue to operate in egress mode so that previous behavior is unchanged. Previously created L3Outs can be manually changed to ingress mode by the administrator.
The troubleshooting wizard (TSW) introduced in version 1.1(1) was limited to testing endpoint connectivity from within a single tenant. This presented a problem if one of the endpoints was outside of the user tenant (such as in the Common tenant). This feature has been improved to now support endpoint testing between tenants.
The configuration snapshot and rollback feature allows configuration snapshots to be created and saved locally or to a remote location. Two saved snapshots can also be compared to identify differences between them. In the event that a fabric policy change is made inadvertently or causes an issue, a snapshot can be rolled back, reverting only the modified policies. Existing and unchanged policies remain unaffected during this operation. The feature also includes a built-in recurrence capability to automate and schedule snapshots. Snapshots can be created at either the fabric-wide or user tenant level.
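A snapshot can also be triggered through the API. The sketch below builds the JSON body for a one-time, locally stored snapshot. The configExportP class and the attribute names shown (snapshot, targetDn, adminSt) reflect the export policy as commonly documented, but are assumptions to verify against your APIC version; the tenant DN is likewise a placeholder.

```python
import json

# A sketch of triggering a one-time configuration snapshot via the REST API.
# configExportP is the export policy class used by the snapshot feature.

def snapshot_payload(name, target_dn=""):
    """Build the JSON body for a triggered, locally stored snapshot."""
    return {
        "configExportP": {
            "attributes": {
                "name": name,
                "format": "json",        # snapshots can be stored as json or xml
                "snapshot": "yes",       # keep as a snapshot rather than a remote export
                "targetDn": target_dn,   # "" = fabric-wide; a tenant DN scopes it
                "adminSt": "triggered",  # run once, immediately
            }
        }
    }

# A tenant-scoped snapshot taken before a change, e.g. for the ACME tenant.
body = json.dumps(snapshot_payload("pre-change", "uni/tn-ACME"))
# A real script would POST this body to the APIC, then roll back to the
# snapshot later if the change misbehaves.
```

Taking such a snapshot immediately before a normal change pairs naturally with the change-control process described earlier.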
An APIC plugin for VMware's vRealize Suite has been introduced that allows management of ACI policies from the vRealize Automation application. Users can now provision the underlying fabric in addition to the virtual compute and services using this orchestration and automation software. The plugin provides access to ACI policies, workflows, and blueprints.
Unmanaged Mode for Layer 4 to Layer 7 Services
The Layer 4 to Layer 7 service insertion feature enables an administrator to insert one or more services between two endpoint groups. The APIC allocates the fabric resources (VLANs) for the services and programs the fabric (leafs) and service appliances as per the configuration specified in the service graph. Previously, the APIC required a device package for Layer 4 to Layer 7 services to be used as part of the service graph. Part of this operation also involved the APIC programming the service appliance during graph instantiation.
In some environments, it may be desirable that the APIC only allocate network resources for the service graph and program only the fabric side during graph instantiation. This may be needed for various reasons. For example, an environment may already have an existing orchestrator or a DevOps tool responsible for programming the service appliance. Another common instance is where a device package for the service appliance is not available for that platform.
Unmanaged mode for services adds the desired flexibility. If enabled, it restricts the APIC to allocating only the network resources for a service appliance and programming only the fabric (leaf). The configuration of the device itself is left to be done externally by the customer.
Simple Network Management Protocol Support
Enhancements have been added to the existing SNMP agent on the APIC and fabric switches, including the following new traps:
- state change trap
- sensor threshold trap
- inlet line/cable status change trap
Integration with vSphere 6.0 and vCenter 6.0 was introduced in APIC version 1.1(2). Support only included the integration of the VMware Virtual Distributed Switch (vDS). Additional support for the Cisco AVS has now been included in this release. The only support restriction for the 1.2(1) release is that cross-vCenter and cross-vDS vMotion is not supported on the AVS. These features are fully supported for the VMware vDS.
In deployments where there are multiple fabrics, moving an endpoint from one fabric to another previously required changing its default gateway or IP address to avoid hairpinning traffic. This feature allows the seamless movement of endpoints between fabrics without having to change anything. The feature requires the setup of two similarly configured bridge domains, one on each fabric. The bridge domain in each fabric is configured with the same virtual IP address and virtual MAC address, which results in reachability for the endpoints in that bridge domain regardless of the physical fabric in which they reside.
Microsegmentation (uSeg) Support
In software release 1.1(1), ACI introduced the concept of attribute-based endpoint groups (aka uSeg). This allows special endpoint groups called "uSeg EPGs" to be defined with various attributes, including virtual machine attributes (such as VM name, OS type, and hypervisor), IP address sets, and MAC address sets that are matched against virtual machines. A match on any of the attributes applies the policy for that uSeg endpoint group to the virtual endpoint dynamically, without having to re-assign the virtual port group binding. Prior to this release, only VMware virtual machines were supported. Now, support has been extended to include Microsoft Hyper-V virtual machines.
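To make the attribute-matching idea concrete, the sketch below assembles the API payload for a uSeg EPG that matches virtual machines by name. The class names used (fvAEPg with isAttrBasedEPg, fvCrtrn for the criterion, fvVmAttr for the VM attribute) reflect the ACI object model as commonly documented and should be verified against your APIC version; the EPG name and pattern are illustrative.

```python
import json

# A sketch of an attribute-based (uSeg) EPG as an API payload: the EPG is
# flagged as microsegmented, and a criterion child holds the VM attribute
# that endpoints are matched against.

def useg_epg(name, vm_name_pattern):
    """Build a uSeg EPG that matches VMs whose name contains a pattern."""
    return {
        "fvAEPg": {
            "attributes": {"name": name, "isAttrBasedEPg": "yes"},
            "children": [
                {
                    "fvCrtrn": {  # the matching criterion for this uSeg EPG
                        "attributes": {"name": "default"},
                        "children": [
                            {
                                "fvVmAttr": {
                                    "attributes": {
                                        "name": "vm-name-match",
                                        "type": "vm-name",      # match on the VM name
                                        "operator": "contains",
                                        "value": vm_name_pattern,
                                    }
                                }
                            }
                        ],
                    }
                }
            ],
        }
    }

# Example: classify any VM whose name contains "web" into this uSeg EPG.
payload = json.dumps(useg_epg("web-useg", "web"))
```

The appeal of this model is visible in the payload: classification follows the VM attribute, so policy tracks the workload rather than its port binding.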
Direct Server Return
When servers sit behind a load balancer, the client request gets sent to the load balancer and then is relayed to the destination server. When the server responds to the client, the response follows the same path back through the load balancer and then on to the client. This can often make the load balancer a bottleneck for communications and can degrade network performance. Direct server return allows the destination server to respond directly to the client without the response having to be relayed through the load balancer.
Shared Layer 3 Out
In previous software releases, there were limitations on how Layer 3 Out connections could be deployed and used by multiple tenants. In many cases the user had to define a Layer 3 Out policy for each tenant, despite those tenants using the same external connection.
In this release,
improvements to this feature add the ability for a tenant or VRF to use a Layer
3 Out connection that is configured in a different tenant. The common use case
for this feature is a fabric with a Layer 3 Out in the "Common" tenant, which
is shared by multiple user tenants. This is accomplished by leaking learned
routes between two tenants.
One limitation of this feature is that passing routes learned in one tenant out of the shared L3Out connection, which is also known as transit routing, is not supported.