Note that I don't necessarily recommend using GNS3/dynamips for testing OER; OER is buggy enough in 12.4T that stacking a virtualized environment on top of that isn't doing you any favors. However, making rapid topology changes in GNS is significantly easier than in a physical lab, so I'll be using it for the purposes of creating this document.
OSPF area 0 will be using 192.168.0.X/24, where X is the router number.
All routers will have a loopback of X.X.X.X/32, where X is the router number.
R4 will have an additional loopback1 of 18.104.22.168/32, to simulate an extra upstream target prefix.
The links between R2 and R4 will use the following IPs:
R2 S2/0 -> R4 S1/0 10.0.24.X/24
R2 S2/1 -> R4 S1/2 10.0.242.X/24
Where X is the router number.
The link between R3 and R4 will use the following IPs:
R3 S2/0 -> R4 S1/1 10.0.34.X/24
Where X is the router number.
We're going to be sourcing traffic off R5 and R6, headed towards R4. R1 will be the master controller, with R2 and R3 acting as border routers. I've deliberately kept the MC out of the traffic flow, as it has no need to be involved in the data plane at all. While there are AS numbers pictured on the diagram, I'm going to start out with static routing and progress towards BGP. Note I've also used exclusively serial links between the BRs and the destination router; it's much easier to create a "slow" serial link for testing than it is to police down Ethernet.
First, we're going to set up our border routers. The border routers register TO the master controller, not vice-versa. This is accomplished over a TCP session; the default port is 3949, and it can be changed with the "port" statement on both the BR and the MC. In our case, R2 and R3 will be our border routers.
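If you do need to move the session off the default port, the value must match on both ends. A sketch (3950 is an arbitrary example value):

```
! on the MC
oer master
 port 3950
! on each BR
oer border
 port 3950
```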
Not much is actually configured on the border routers. Most of the config is done on the master controller.
R2 & R3:
key chain MY-KEY-CHAIN
 key 1
  key-string OERKEY ! any shared secret; must match the key chain on the MC
oer border
 local Loopback0 ! where to source your TCP session to the MC from
 master 22.214.171.124 key-chain MY-KEY-CHAIN
As I'm sure you're already aware, a router can act as both an MC and a BR simultaneously. You must have at least one internal interface and two external interfaces for OER to work (there's not much point in a load-balancing protocol with only one link to balance to!)
We're going to configure the MC next, but only configuring R2 as a border, to start with a simple scenario.
key chain MY-KEY-CHAIN
 key 1
  key-string OERKEY ! must match the key on the BRs
oer master
 logging ! enables the syslog notices shown below
 border 126.96.36.199 key-chain MY-KEY-CHAIN
  interface Serial2/0 external
  interface Serial2/1 external
  interface FastEthernet1/0 internal
Since we've got logging enabled, we should see the BR register:
*Oct 14 12:21:37.907: %OER_MC-5-NOTICE: BR 188.8.131.52 UP
*Oct 14 12:21:38.019: %OER_MC-5-NOTICE: BR 184.108.40.206 IF Se2/0 UP
*Oct 14 12:21:38.119: %OER_MC-5-NOTICE: BR 220.127.116.11 IF Se2/1 UP
*Oct 14 12:21:38.119: %OER_MC-5-NOTICE: BR 18.104.22.168 IF Fa1/0 UP
*Oct 14 12:21:38.119: %OER_MC-5-NOTICE: BR 22.214.171.124 Active
We'll also see the master controller meet basic OER criteria and enable itself:
*Oct 14 12:21:38.123: %OER_MC-5-NOTICE: MC Active
We can confirm this with the following show command:
R1MC#show oer master border
Border           Status   UP/DOWN       AuthFail  Version
126.96.36.199    ACTIVE   UP 00:01:58   0         2.2
KEY TOPIC: Parent route
As I mentioned above, we're going to leave BGP for later, so for now, we need what OER/PfR calls a parent route on the BR. This is a vitally important topic for OER. The general logic is simple. When OER isn't controlling routes, if the BR isn't willing to route down a link for a particular prefix, then OER shouldn't either. This is black-hole prevention. If R2 doesn't have a route to our target(s) on R4, then OER won't force the link into play. A key element to remember here is that OER only sees up to the edge of the network that it "owns"; in our diagram, it has no idea what is on R4 or happening beyond R4 (this is slightly less true when using BGP, but we'll get there later).
Since R2 has two links, we're going to use two static routes to create parent routes:
ip route 0.0.0.0 0.0.0.0 Serial2/0
ip route 0.0.0.0 0.0.0.0 Serial2/1 5
So we're going to prefer s2/0, with a floating static route towards s2/1 as a "backup".
Almost every OER sample config uses default routes as parent routes, and I'm doing the same here. However, these don't need to be defaults; I could just as easily create a series of more specific routes for R4's prefixes.
I've also created a similar route back on R4, pointing strictly down s1/0.
ip route 0.0.0.0 0.0.0.0 Serial1/0
Since R4 isn't participating in OER - we're going to pretend they're an ISP - all return traffic is going to come back on R4 S1/0 -> R2 s2/0. So basically, this is going to be strictly egress load balancing for our OER-controlled network. This is life with static routing; things get better when we move on to BGP.
We're going to need a way for R5 to reach R4. Let's redistribute statics into OSPF at R2.
router ospf 1
redistribute static subnets
default-information originate ! We need that static default to actually make it into OSPF
Verify that R5 and R6 have the route:
R5INTERNAL(config-router)#do sh ip route ospf
O*E2 0.0.0.0/0 [110/1] via 192.168.0.2, 00:00:59, FastEthernet1/0
R6INTERNAL(config-if)#do sh ip route ospf
O*E2 0.0.0.0/0 [110/1] via 192.168.0.2, 00:00:13, FastEthernet1/0
There it is... let's make some traffic flows now. We're going to send large pings from R6 towards R4, and we're going to have R4 pull a chargen TCP flow from R5.
On R5:
service tcp-small-servers ! for chargen, TCP port 19
And on R4:
telnet 188.8.131.52 19 /source-interface Lo1
...which should result in....
Trying 184.108.40.206, 19 ... Open
ping 220.127.116.11 repeat 2147483647
Note we're not going to use a timeout of 0, nor the maximum packet size. That's an excellent way to swamp a link on physical gear, but on a dynamips router it will completely obliterate your test network. In my scenario, even trying it briefly caused OSPF to fail on R6 and R2 immediately. It's easier to just tune your serial links down to the traffic speed your ping is creating. In my case, on a 2.4 GHz quad-core i7 system, the above ping creates about 19 kbit/s of traffic:
R2BR(config-if)#do sh int s2/0 | i 5 minute
5 minute input rate 19000 bits/sec, 20 packets/sec
5 minute output rate 20000 bits/sec, 20 packets/sec
Therefore I'm going to detune the serial interfaces on R2's serial links to appear as having 35K of bandwidth.
Potential Pitfall: load interval
In the real world, depending on how fast you want OER to react to changes, it may be just fine to leave the load-interval at the default 5 minutes. However, in a lab or testing environment, five minutes is an eternity. Always turn your load-intervals on your external border router interfaces down to the minimum 30 seconds.
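On R2's external serials, that's simply:

```
interface Serial2/0
 load-interval 30
interface Serial2/1
 load-interval 30
```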
I'm going to go ahead and do this to R3 now as well, for future use.
Now that we've got some traffic congestion and we've dropped our load interval, let's tell the MC to start learning traffic flows. But first, there are some timers to adjust here.
KEY TOPIC: Speed up OER's reaction time
Once again, in the real world, there may be business requirements for traffic flows to be examined over long periods of time, or for OER to only change routing every so often. However, the default OER timers are completely implausible for a lab, be it your home testing or the CCIE lab exam.
There are four timers to be concerned about.
When using the learning feature:
periodic-interval - This is basically a sleep timer. It tells the router how often to start the learning phase. Given that you're likely going to start with a scenario and then modify it continuously to see how OER behaves, this needs to be low. The default is two hours! I would recommend 0-1 minutes for a lab, tops.
monitor-period - This is how long we learn traffic flows for. This is the process triggered between periodic-intervals. In other words, the loop is:
periodic-interval (sleep) -> monitor-period (learn) -> periodic interval (sleep) -> etc
This defaults to five minutes, which isn't the eternity we see in periodic-interval, but it's still a rather long time to wait when experimenting. I would recommend setting it to 1 minute.
backoff - In a nutshell, this tells OER that if a traffic flow is stable and in-policy (basically, it's routing fine per the other rules), more time should elapse before examining it again. By default, the min-timer is 300 seconds, the max-timer is 3000 seconds, and the step-timer is 300 seconds. Let's say a particular flow has just been seen for the first time by OER, and it's consistently in-policy. It will be looked at for 300 seconds. If it's still in-policy at the end of those 300 seconds, nothing changes, and OER will now only check it for being out-of-policy after min-timer + step-timer. By default, that means 600 seconds will have to pass before it's examined again. If it makes it through 600 seconds, the next interval will be 900 seconds, and so on and so forth, until it hits the max-timer of 3000 seconds. This is undesirable for a lab environment; as we tweak traffic flows, we want OER to react immediately. The minimum settings for min-timer and max-timer are 90 seconds, which is what we'll use.
holddown - This timer controls how long a prefix is dampened after a routing change is made. During the holddown period, the routing can't be shifted again, even if the new route goes out-of-policy. This has some obvious good use in a production network, however, for a lab environment, we need this timer small as well. The default is 300 seconds, the minimum is 90, which is what we will be using.
oer master
 backoff 90 90
 holddown 90
Now that we're better equipped for testing, let's get this show on the road.
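The learn config itself is minimal. A sketch, with the periodic-interval and monitor-period timers from above tuned down under the same learn subconfiguration:

```
oer master
 learn
  throughput ! learn top talkers by bandwidth usage
  periodic-interval 0
  monitor-period 1
```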
By enabling "throughput", we've just told OER to start learning bandwidth usage from the BR.
We should see this on the logging output:
*Oct 14 14:21:23.615: %OER_MC-5-NOTICE: Prefix Learning STARTED
After it's done, which should happen in a whopping one minute if you tuned your timers correctly, you'll get:
*Oct 14 14:22:24.207: %OER_MC-5-NOTICE: Prefix Learning WRITING DATA
Let's see what we got from the router's learning procedure.
show oer master traffic-class
which should have some output like this:
We'll concentrate on some of this output later. What's important right now is that we've discovered both our flows: the icmp from R6 to R4 Lo0, and the TCP flow from R5 to R4 Lo1.
At this point, even though we're miserably oversubscribed on R2's S2/0 link, no routing changes are being made. That's because OER defaults to "observe" (monitor only) mode:
R1MC#show oer master | i mode route
mode route observe
If we want OER to actually try to start solving problems, we need to set it to mode route control.
Before we do that, however, we need to fine-tune what OER will be attempting to "solve". By default, OER will try to "fix" bandwidth, delay, and "range" problems. Range is the difference in traffic between the various interfaces. For simplicity, we are going to narrow it down to just bandwidth right now.
oer master
 no resolve delay
 no resolve range
 resolve utilization priority 1 variance 20 ! on by default; utilization = throughput/bandwidth (values made explicit here)
 mode route control
During this process the MC should start complaining that you're oversubscribed:
*Oct 14 17:08:32.054: %OER_MC-5-NOTICE: Load OOP BR 18.104.22.168, i/f Se2/0, load 31 policy 26
Note "load 31". This is in reference to throughput above.
So how did it determine this "load"? That can be seen on the BR. NetFlow is automatically enabled on BRs (note it's not visible in the running config, however). On the BR, simply run show ip cache flow. It's a bit difficult to interpret. There are several ways to get better stats out of this, but those are beyond the scope of this document.
You'll also see "policy 26" in the output above. By default, maximum throughput is defined at 75% of the interface's bandwidth. Since we've set the bandwidth to be 35k, 75% of 35k is 26k. The percentage, or absolute value, can be set in this fashion:
interface Serial2/0 external
max-xmit-utilization [absolute | percentage]
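As a concrete (hypothetical) example, raising the threshold on S2/0 to 90% of its bandwidth would look something like:

```
oer master
 border 126.96.36.199 key-chain MY-KEY-CHAIN
  interface Serial2/0 external
   max-xmit-utilization percentage 90
```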
After a few minutes, you should end up with a similar output to this:
OK, great! We have one prefix (22.214.171.124/24) in-policy, and one prefix (126.96.36.199/24) in HOLDDOWN. Holddown was explained above - this is a timer/phase to prevent the route from flipping exits too frequently. Consequently, that means a routing change was recently made, and we can now see that this prefix is being forwarded out Se2/1 via a static route.
If we give it a bit longer, we'll end up with both interfaces in-policy:
We can then go and see what OER has actually done on R2.
show ip route static
188.8.131.52/24 is subnetted, 1 subnets
S 184.108.40.206 [1/0] via 0.0.0.0, Serial2/1
S* 0.0.0.0/0 is directly connected, Serial2/0
We've got a more specific static route towards 220.127.116.11/24!
You can get similar information from an OER show command on the BR:
R2BR#show oer border routes static
Flags: C - Controlled by oer, X - Path is excluded from control,
E - The control is exact, N - The control is non-exact
Flags   Network             Parent       Tag
  CE    18.104.22.168/24    0.0.0.0/0    5000
But why is this /32 address showing up as a /24?
By default, OER's learn function aggregates prefixes to /24s. This can be modified under the learn subconfiguration:
aggregation-type prefix-length 32 ! to aggregate to /32, for example
You can also aggregate to the BGP prefix length, which we will cover later.
So now that we've covered the throughput basics, we'll switch over to delay. But before we just delve in - what exactly is the difference, at the IOS level, during this learn process, if we specify "throughput" or "delay"? Take a look at the config sample and output below:
Learned passive delay while the learn delay option is off.
From Cisco's documentation, the options in the "learn" field determine how the various flows are prioritized in the top-talkers chart on the BR. There's not much detail on this, but for a lab environment it doesn't really matter anyway; we're not likely to have enough flows to fill the top-talkers chart. So the bottom line is, you're going to learn delay regardless; the option just determines whether the selected flows are prioritized.
oer master
 delay threshold 15
 no resolve utilization
 resolve delay priority 1 variance 25
You'll notice we specify priority here. You may end up in a situation where delay is better on one link, but utilization is worse. Do you use that link, or the one where delay is worse but utilization is better? The lower the priority number, the more likely that metric is to be the deciding factor. variance is a modifier applied to the "delay threshold 15" statement above. The variance value determines whether another interface is enough "better" than the interface the traffic is presently on to justify a move, once the current interface is out-of-policy. In other words: if my present interface has a delay of 35, and the one I might switch to has a delay of 29, then (35 present delay) - (29 proposed delay) = 6. 25% (variance) of 35 is 8.75; 6 is less than 8.75, therefore don't swap to the proposed interface, even if it is better. variance can also be used with throughput.
Before I delve any further, it's important to note that OER in 12.4T is basically a beta feature. Any time you do a major change in a lab, I recommend doing a:
clear oer master *
That does exactly what it looks like - it forgets everything and starts over.
Under extreme circumstances, if you've got an oddity you just can't resolve... save your config, and reboot the MC. If you need further evidence that this ridiculous step is actually necessary, watch the last few of Brian Dennis' videos (see link at the beginning of the document). It happens to him several times.
Returning to the observations on delay: there are two ways that OER attempts to measure delay by default. Passive, through NetFlow, and active, through IP SLA. The passive method only works with TCP flows; it measures the delay between the SYN and the SYN/ACK. The active method automatically fires off IP SLA probes; you don't need to configure them, and they won't show up in the running config. Of important note, neither of these methods fires during the learning phase. The learning phase only identifies the flows. The "measure" phase, which I can't seem to find any "show" or "debug" commands to identify (it may just be a buzzword for the time after learning), implements the delay monitoring. This can be examined from the MC with the following debug commands:
debug oer master collector netflow ! for passive delay
debug oer master collector active-probes ! for active delay
The active method is reasonably straightforward, but the passive has some hidden "gotchas" from a lab perspective. First and foremost, as I've already mentioned above, the NetFlow method only works with TCP flows. And not just any TCP flow: an already-established, long-lived TCP session doesn't help a lick, and won't give you any results. You have to have the SYN/SYN-ACK exchange involved, so you must initiate at least one new TCP session during the measure phase. When using the chargen method above, I made a habit of doing a ctrl-shift-6-x on the session and spawning a new one (or several). For reasons unknown, OER may or may not pick up on your sample traffic anyway, so I usually just spawn a new telnet session every 20 seconds or so until it latches on and gives some results.
You can view the active probes on the MC with show oer master active-probes:
You can also get a near-identical output on the BR - obviously just for the probes specific to that BR - by using show oer border active-probes.
To check out what delay results we're getting both passively (netflow) and actively (ip sla), you use the show oer master traffic-class:
Notice that the TCP flow to 22.214.171.124/24 shows 116 for PasSDly (passive delay) and 23 for ActSDly (active delay). Don't worry too much about the values being wildly different from one another; this is GNS we're talking about. Note that 126.96.36.199/24 has U (unknown) in PasSDly because it's an ICMP flow, and passive delay is only measured by the distance between SYN and SYN/ACK in TCP flows.
I can't say this particular exercise works out too well, because every time I've labbed it, it sees both flows as out-of-policy - doubly so on the TCP flow, and only actively on the ICMP flow. It does what you'd expect and moves them both over to the "unknown" of the backup interface, serial2/1. For reasons unknown - perhaps a GNS issue, perhaps an OER issue - both passive and active delay monitoring stop (all show as "U") when moved to serial2/1. If I let it sit long enough (5+ minutes), this eventually gets better, and active probes and passive monitoring will fire back up. If the moons align, the traffic will balance between s2/0 and s2/1, but I wouldn't have high hopes of this working reliably:
This being GNS, I question whether this would work long-term anyway; the delay is more than likely a CPU/virtualization issue, not a line or interface delay.
In order to show a new feature and eliminate the problem just described, let's look at limiting the types of traffic that OER can learn.
The dead simplest way to do this is to specify which protocols, or protocols plus ports, you want OER to learn. This is accomplished under learn with the protocol command:
R1MC(config-oer-mc-learn)#protocol ?
  <1-255>  Specify the protocol number
  tcp      Learn top prefixes based on TCP protocol
  udp      Learn top prefixes based on UDP protocol
You can optionally get more granular with tcp and udp:
R1MC(config-oer-mc-learn)#protocol tcp port ?
  <1-65535>  Specify the port number
  gt         Learn top prefixes having port number greater than
  lt         Learn top prefixes having port number less than
  range      Learn top prefixes for a range of port numbers
For now, let's keep it simple:
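That is, matching all of TCP with no port qualifier:

```
oer master
 learn
  protocol tcp
```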
I tried to be more specific with this and specify ports, I really did. But I can't get TCP matching to work with ports at all. I'm going to chalk that up to a bug. Thankfully there are several other ways to accomplish this task.
Now that we've specified that OER learn just TCP, we end up with this output:
Now that we're no longer concerned about balancing that large ICMP flow that's causing the delay issue, we can move right on to fixing the TCP flow. A few minutes later:
Perfect! The TCP flow is on S2/1, and the ICMP flow is on S2/0, as we can see from show ip route static on the BR:
R2BR#show ip route static
188.8.131.52/24 is subnetted, 1 subnets
S 184.108.40.206 [1/0] via 0.0.0.0, Serial2/1
S* 0.0.0.0/0 is directly connected, Serial2/0
The next filtering method for learning that we'll look at uses the traffic-class and learn-list commands.
Let's start with the same setup I wanted to try above that bugged out with the protocol command.
ip access-list extended only_chargen
 permit tcp any host 220.127.116.11 eq chargen
 deny ip any any
oer master
 learn
  no protocol tcp
  traffic-class filter access-list only_chargen
do clear oer master *
Check out the BR at this point with this command: show oer border passive learn
It will display the results of your filtering.
We can see the output is exactly as desired:
Now let's take a look at using learn-lists.
Potential Pitfall: Before we even get started on this topic, it's important to point out that when using traffic-class under a learn-list, ACLs may only filter on protocols, and prefix-lists must be used to filter on prefixes. You'll see how this applies below. This is not the case, however, when using a global traffic-class directly under "learn"; those can match on both. (See the example above.)
Learn-lists work a lot like a route-map. They're configured under the "learn" subconfiguration directly.
I'd like to point out that I had zero success getting learn-lists working correctly under 12.4(15)T or 12.4(24)T on a dynamips 7206. I labbed this for almost six hours, trying every permutation I could think of. I suspect this is a bug on this platform with these earlier versions of OER. The examples from here forward should work, but do not, under these versions.
Let's take a look at a sample:
ip access-list extended everything
 deny ip any any
ip access-list extended voip_traffic
 permit ip any any dscp ef
ip prefix-list allnines seq 5 permit 18.104.22.168/32
oer master
 learn
  no traffic-class filter access-list only_chargen
  traffic-class filter access-list everything
  list seq 10 refname CHARGEN
   traffic-class access-list only_chargen filter allnines
   aggregation-type prefix-length 32
  list seq 20 refname VOIP_RTP
   traffic-class access-list voip_traffic
  list seq 30 refname HTTP
   traffic-class application http
Since I can't actually lab this, I'll run through some of the key logic. First, with the global traffic-class filter matching the "everything" ACL (which denies everything), we're disabling all global learning. Then, as mentioned above, ACLs are for the traffic match, and prefix-lists are for prefixes; don't attempt to use the ACL to filter on address. Note how the CHARGEN entry uses "filter allnines" for the prefix match. Each class can have its own learn parameters - throughput or delay (mutually exclusive here), aggregation size, etc. The application match is specific to OER - it doesn't use NBAR. However, in versions newer than 12.4(15)T, NBAR can be used as well.
We're just about wrapped up with learning, so let's look at some other minor features.
By default, OER will learn 2500 prefixes. This can be viewed with "show oer master". This can be adjusted with the prefixes command under learn.
The expire after command under learn can be used to adjust how long prefixes are remembered for. This can be specified in quantity of sessions or by timer.
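A sketch of both knobs under learn (the values here are arbitrary examples):

```
oer master
 learn
  prefixes 100 ! learn at most 100 prefixes instead of the default 2500
  expire after time 30 ! forget learned prefixes after 30 minutes
```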
Next, let's look at controlling two routers, adding R3 into the mix.
On R1, the MC:
oer master
 border 22.214.171.124 key-chain MY-KEY-CHAIN
  interface FastEthernet1/0 internal
  interface Serial2/0 external
And a parent route on R3:
ip route 0.0.0.0 0.0.0.0 Serial2/0
I'm redistributing static at R3, but no default-information originate, so R2 will still get all the outbound traffic.
OER can balance across many interfaces, but for simplicity I am going to down s2/0 on R2.
I've also gone ahead and cleared the rest of the custom learning and put it back to this:
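Presumably, that means plain throughput learning again; a sketch:

```
oer master
 learn
  throughput
```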
The only other thing to mention regarding controlling multiple routers is topology: the BRs need to share a common subnet. In our case their internal interfaces are connected; however, this can also be achieved with a GRE tunnel if necessary.
There's not much to demonstrate here. We see both BRs:
After a couple minutes we have the traffic flows learned:
And a few minutes later, we shuffle one over to R3!
That's all there is to it - it doesn't matter what routers the interfaces are on, the behavior is the same.
I'm going to somewhat abruptly end this post here. The next section I'm covering - now in part 2 of the blog - is OER/PFR BGP interoperability. The catch is I've discovered that the 7200 has major issues running BGP in dynamips, and that the OER features relating to BGP on the 7200 are half-baked (crashes, unusual errors, etc). So we'll start off the next post with a shiny new set of 3725s accomplishing roughly the same job.