A PfRv3 Case Study

I recently had occasion to dig deeply into one of the major pieces of Cisco’s IWAN portfolio, Performance Routing Version 3 (PfRv3). For those who are unfamiliar, PfRv3-enabled routers can make forwarding decisions on traffic flows based on path performance metrics, rather than relying entirely on a dynamic routing protocol that communicates only accumulated cost and minimum bandwidth (or AS-path length, etc.). I will not go into a comprehensive discussion of PfRv3 and its mechanics; please visit Cisco’s PfRv3 wiki page to get the basics first.

Anyone who designs networks for a living knows the challenges that come with the somewhat limited toolset of a dynamic routing protocol. Manual adjustment of interface metrics, selective route summarization, and other complex logic that influences the path of traffic can quickly render a design inelegant (if not inadequate).

As I read PfRv3 configuration guides, command references, and wiki pages, I found a lack of coherent detail regarding some of the finer points of PfRv3. Given that the IOS and IOS-XE releases required to run the latest features have only been available for a few weeks, this is not terribly surprising.

In particular, I did not find clear direction for the following questions that I needed to answer for my design:

  • Just what is the Transit Hub/Site feature, and is it worth going to the utmost edge of IOS and IOS-XE code available?

  • How can I configure PfRv3 to handle designs that centralize web proxy/filtering at the hub site, forcing users at branch sites to communicate through the hub to reach the public internet?

I decided that the only way I would really find the answers I sought was to lab a subset of my project environment using Cisco’s VIRL simulation software. If you have not picked up VIRL yet, it is a no-brainer at $150.

Background

This diagram shows the subset of the environment I am designing:

Primary and DR data centers' networking environment subset.

This diagram shows the VIRL environment I used to simulate the design:

VIRL environment simulation.

The two data centers – Primary and DR – connect to each other using multiple 10 Gbps links. In real life, OTV extends VLANs between these two data centers, enabling workload mobility.

The topology uses two DMVPNs: one overlay with MPLS as the underlying transport, and another using commodity internet service as the transport. Both DMVPNs use Phase 3 enhancements, with a two-hub configuration. During normal conditions, the PfRv3 policy design sends latency-sensitive traffic (VoIP) and business-critical applications over the MPLS path, and all other traffic over the Internet DMVPN path.

This design mandated that ALL traffic bound for the public internet use the primary or DR sites for internet egress due to the centralization of web filtering and other security appliances. This required some fiddling with PfRv3’s static site-prefix assignments on the hub and transit site master controllers.

PfRv3 Site Prefixes

It is not entirely clear in the documentation at times, but hub sites in a PfRv3 environment require static configuration of site-prefixes. In fact, the hub master controller will not even activate unless you define a site-prefix list. Routers learn site-prefixes residing at branch sites automatically, however.

Most examples given in published Cisco documentation show a limited number of site-prefixes (typically within the RFC 1918 space) configured at the hub site. This seems to indicate the primary use case of PfRv3 – controlling internal application traffic between branches and hubs. PfRv3 considers everything outside of the site-prefix list to be default traffic, which will simply take the destination-based path of the routing table – also known as “uncontrolled” traffic. This means that unless I define a prefix-list at the hub that covers the entire routable IPv4 address space in addition to my internal private space, PfRv3 will never attempt to control that traffic. This is critical to my design, as I really want that traffic to be subject to PfRv3 intelligence.

At first, I thought maybe I would be able to get away with just defining a 0.0.0.0/0 entry in my site prefix list – no luck. PfRv3 would not control the traffic.

I then came up with a more specific prefix list that encompasses both the relevant private prefixes I am concerned with and more generic entries that cover the routable address space:

Prefix list.
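As a rough illustration of the idea (and not the list from the screenshot above), a site-prefix list of this kind can be sanity-checked offline. The sketch below uses Python's standard `ipaddress` module with a hypothetical list in which two /1 entries stand in for the generic routable entries. The prefixes and function names are my own assumptions, and whether PfRv3 accepts these exact entries would still need to be confirmed in a lab.

```python
import ipaddress

# Hypothetical site-prefix list: internal prefixes plus two generic /1
# entries standing in for "everything else". Illustrative values only,
# not the list used in the article.
SITE_PREFIXES = [
    "10.0.0.0/8",
    "172.16.0.0/12",
    "192.168.0.0/16",
    "0.0.0.0/1",
    "128.0.0.0/1",
]


def covers_all_ipv4(prefixes):
    """Return True if the union of the prefixes is the whole IPv4 space."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    # collapse_addresses merges adjacent networks and drops contained ones.
    collapsed = list(ipaddress.collapse_addresses(networks))
    return collapsed == [ipaddress.ip_network("0.0.0.0/0")]


def has_default_entry(prefixes):
    """Return True if the list contains a literal 0.0.0.0/0 entry."""
    return any(ipaddress.ip_network(p).prefixlen == 0 for p in prefixes)


if __name__ == "__main__":
    print("covers all IPv4:", covers_all_ipv4(SITE_PREFIXES))
    print("contains 0.0.0.0/0:", has_default_entry(SITE_PREFIXES))
```

The point of the check is simply that full coverage can be expressed without a single 0.0.0.0/0 entry, which is the entry that did not work for me.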

Once I placed that site-prefix configuration on the hub master controller, PfRv3 began controlling internet-bound user traffic from branches to the hubs. What I found was that the traffic classes generated by PfRv3 used the most specific matching site-prefix defined within the hub site prefix-list.
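To make the "most specific wins" observation concrete, here is a minimal Python sketch of a longest-prefix match over a site-prefix list. It models only the behavior I observed, not PfRv3's internals, and the prefixes and the `most_specific_prefix` helper are hypothetical.

```python
import ipaddress

# Hypothetical hub site-prefix list (illustrative values only).
HUB_SITE_PREFIXES = [
    ipaddress.ip_network(p)
    for p in ("10.0.0.0/8", "10.10.0.0/16", "0.0.0.0/1", "128.0.0.0/1")
]


def most_specific_prefix(destination, prefixes=HUB_SITE_PREFIXES):
    """Return the longest matching prefix for destination, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in prefixes if addr in net]
    if not matches:
        return None
    # The longest prefix length is the most specific match.
    return max(matches, key=lambda net: net.prefixlen)


if __name__ == "__main__":
    for dest in ("10.10.5.1", "10.20.1.1", "8.8.8.8", "203.0.113.10"):
        print(dest, "->", most_specific_prefix(dest))
```

In this toy list, an internal destination such as 10.10.5.1 lands on the /16 entry, while an internet destination falls through to one of the generic entries.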

Transit Site/Hub Feature

Like so many dual data center designs nowadays, the two data center environments I am concerned with are logically one – which is where the transit site feature of PfRv3 comes in.

Available in IOS 15.5(2)T and IOS-XE 3.15.0(S), the transit site feature adds a piece of identifying information – a “site ID” – to each PfRv3 path at hub sites. PfRv3 without the transit site feature still requires a recent version of code – IOS 15.5(1)T1 and IOS-XE 3.13+ – so the jump to support transit site isn’t too far (relatively speaking, of course).

This feature is relevant in cases where there are two major hub sites (in our case, the primary and DR data centers), which are connected to each other with high-capacity, high-speed links. Practically speaking, this means that during an outage of the primary data center’s WAN circuit, traffic can (and should) be redirected to the DR data center’s local WAN circuit, and traverse the high-capacity data center interconnect to reach the primary site without much added latency.

Looking back at the design, we can see that each data center site has two circuits: one DMVPN overlay on top of MPLS and another DMVPN overlay on top of Internet services. If the primary data center’s MPLS link were degraded, I would not necessarily want to send business-critical traffic over the Internet DMVPN overlay to the primary data center straight away. Instead, I may want to use the DR data center’s MPLS path, so I still have the quality of service guarantees of my MPLS services. Once the traffic reaches the DR site, it can use the data center interconnect to reach the primary data center.

Without the transit site feature, a branch site has no way of differentiating between the MPLS path to the primary data center and the MPLS path to the DR data center. Since both data centers are logically one site, they both ought to advertise each other’s prefixes in addition to their own (with adjusted metrics, of course). Therefore, to gain the ability to use the DR data center’s MPLS path as a backup to the primary data center’s path, the transit site feature adds an additional “site ID” number to each path leading from the primary and DR data center sites like so:

Path map.
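The value of the site ID is easier to see in a toy model. The Python sketch below is not PfRv3's selection algorithm. It is a simplified ranking I made up to illustrate the goal described above: prefer the policy's transport first, and the primary site second. The `Channel` class, `pick_channels` function, and health flags are all hypothetical, and the site IDs 0 and 2 mirror the numbering used in my lab.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    site_id: int          # 0 = primary hub, 2 = DR transit hub (lab numbering)
    transport: str        # "MPLS" or "INET"
    healthy: bool = True  # stand-in for "path currently meets policy"


def pick_channels(channels, preferred_transport, preferred_site=0):
    """Return (primary, backup) from the healthy channels.

    Toy ranking: preferred transport first, then preferred site.
    """
    candidates = [c for c in channels if c.healthy]
    ranked = sorted(
        candidates,
        key=lambda c: (c.transport != preferred_transport,
                       c.site_id != preferred_site),
    )
    primary = ranked[0] if ranked else None
    backup = ranked[1] if len(ranked) > 1 else None
    return primary, backup


if __name__ == "__main__":
    channels = [
        Channel(0, "MPLS", healthy=False),  # primary site's MPLS is degraded
        Channel(2, "MPLS"),
        Channel(0, "INET"),
        Channel(2, "INET"),
    ]
    primary, backup = pick_channels(channels, preferred_transport="MPLS")
    print("primary:", primary)
    print("backup:", backup)
```

Without the `site_id` field, the two MPLS channels would be identical records, which is the branch router's problem in a nutshell: it could not prefer the DR site's MPLS path over the primary site's Internet path.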

In addition to the site ID, the DR data center’s router requires a slightly different configuration to enable it as a transit hub:

DR Router Configuration.

This transit hub is still subordinate to the hub master controller, which is where all policy configuration resides. However, since the transit site is technically a hub, the site-prefix list used at the master hub site must exist on the transit hub router as well. Note that this prefix-list is identical at both sites, and that “master transit 2” will be used to differentiate the DR hub from the primary hub in practice (shown next).

To illustrate this in action, here is a traffic class generated for user internet traffic – notice the differentiation between the primary and backup channels (highlighted below). Instead of identifying the MPLS path as the backup channel, PfRv3 is using the DR hub site’s INET path as the backup channel.

The “pfr-label:X:Y” value identifies the site, X (0 for primary, 2 for DR; remember the “master transit 2” command above?), followed by the path ID, Y.

Traffic class.
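If you end up reading a lot of these labels, a small helper can decode them. The sketch below assumes the label value has already been pulled out of the command output as an `X:Y` string. The `parse_pfr_label` and `describe` functions are hypothetical, and the site names simply reflect the numbering in my lab (0 for the primary hub, 2 for the DR transit hub).

```python
import re
from typing import NamedTuple


class PfrLabel(NamedTuple):
    site_id: int
    path_id: int


# Matches "X:Y" with optional surrounding whitespace.
_LABEL = re.compile(r"^\s*(\d+)\s*:\s*(\d+)\s*$")

# Site numbering from the lab in this article; adjust for other designs.
SITE_NAMES = {0: "primary hub", 2: "DR transit hub"}


def parse_pfr_label(text):
    """Parse an 'X:Y' label string into (site_id, path_id)."""
    match = _LABEL.match(text)
    if match is None:
        raise ValueError(f"not an X:Y label: {text!r}")
    return PfrLabel(int(match.group(1)), int(match.group(2)))


def describe(label):
    """Return a human-readable description of a parsed label."""
    site = SITE_NAMES.get(label.site_id, f"site {label.site_id}")
    return f"{site}, path-id {label.path_id}"


if __name__ == "__main__":
    # The path-id values here are made up for illustration.
    for raw in ("0:1", "2:2"):
        print(raw, "->", describe(parse_pfr_label(raw)))
```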

Conclusion

I admit that PfRv3 can seem a little alien and risky. PfRv3 requires you to turn over path selection to something other than a routing table, and its latest features also require a bleeding-edge version of code. However, it would be a disservice to completely ignore the potential benefits of performance routing in favor of the status quo. PfRv3’s small amount of configuration and the breadth of policy options it offers make it an extremely attractive solution for those seeking a more adaptive enterprise WAN.

As with any impactful technology or feature, it is extremely important to gain a level of comfort before deployment. AHEAD’s lab environment and networking expertise help you do just that.