Jeff Kronlage's CCIE Study Blog: November 2012

Sunday, November 25, 2012

MPLS Tunnel Next-Hop & LDP Filtering

I ran into a rather tricky-to-debug MPLS scenario.

We're going to set up a rather traditional MPLS configuration - two PEs, two CEs, and one BGP-free P MPLS-only router. The CEs both represent the same customer and will be sharing a VRF.

As I'm sure you're all already aware, setting up MP-BGP, VRFs, MPLS, etc, is quite a lot of config. Nonetheless, in order to accurately convey the point I'd like to make, here's the relevant config from each router:

Automatic v6 over v4 tunnels: ISATAP vs 6to4

Let's compare Cisco's ISATAP and automatic 6to4 tunneling methods.

Here's the diagram:

The square represents an IPv4-only service provider, and the circles are IPv6 islands. The goal is to have any-to-any IPv6 connectivity between the circles, via the IPv4-only network.

MSDP & Anycast RP

Anycast RP is a pretty "easy" topic, but it has a gotcha I'd forgotten about.

Anycast RP allows RP load balancing inside a single AS by advertising the same IP in multiple spots in the domain, and then announcing that IP as the RP. Those wishing to receive their respective multicast feeds just need to join the closest RP determined by the IGP's metric.

Of course the multicast sources don't know about all this extra-RP craziness we're doing. So we need to tie all the RPs together in full-mesh with MSDP. MSDP will share information on the sources with each other RP.

The config for this is simple enough that I'm not going to bother with a diagram.

Rethinking mroutes; Multicast BGP

I started working on using BGP to distribute multicast routes today. I've touched on this topic a few other times, and while I "kind of" got the idea, it never really sat well with me - I never felt comfortable with it. The problem I've always had, probably stemming from my CCNP studies or early CCIE work, is that I learned mroutes as a way to fix issues with unicast routing. In other words, if the IGP didn't jibe with what the multicast was doing, you could use an mroute to band-aid it. Layer BGP on top of that logic, and you've now got nested confusion.

Today, I had a paradigm shift on how I thought of mroutes, and it clarified everything for me. I'm hoping that some of you have the same problem and this may help.

Here's our topology:

All links are 192.168.YZ.0/24; where Y is the low router number, and Z is the high router number. For example, the link between R2 and R3 is 192.168.23.0/24. The interface IPs are 192.168.YZ.X, where X is the router number. R1 has a loopback address of 1.1.1.1/32.
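The addressing convention is easy to get backwards when reading the configs later, so here is a minimal Python sketch of it. The function names are my own, and it assumes single-digit router numbers as in this topology.

```python
import ipaddress

def link_subnet(router_a: int, router_b: int) -> ipaddress.IPv4Network:
    """Return the 192.168.YZ.0/24 subnet for the link between two routers."""
    low, high = sorted((router_a, router_b))
    # Y is the low router number, Z the high one (single-digit routers assumed).
    return ipaddress.ip_network(f"192.168.{low}{high}.0/24")

def interface_ip(router: int, peer: int) -> ipaddress.IPv4Address:
    """Return the address a router uses on its link to a peer (last octet = router number)."""
    return link_subnet(router, peer).network_address + router

print(link_subnet(2, 3))    # link between R2 and R3
print(interface_ip(3, 2))   # R3's address on that link
print(interface_ip(1, 4))   # R1's address on the R1-R4 link
```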

For the moment, ignore the BGP AS numbers.  Pretend this is all one IGP-driven network.

The thing that kept throwing me off is the way PIM incorporates the IGP, or unicast routing table, into its calculations. Everything feels kind of "backwards". Unicasting is inherently about where traffic is going, and (to some extent) multicasting is about where traffic came from (think RPF check). When you slap multicasting on top of a unicast network, without defining mroutes, you end up thinking "backwards". When you look at a unicast route, you think "I'd send traffic that way". When you look at a unicast route being used for RPF, you think "I'd receive traffic that way". The way mroutes are taught initially is as an override for this behavior.

For example, let's say we're running sparse mode, and R1's Lo0 is the RP. R4 is creating ping traffic towards 239.1.1.1, and R3 has done an IGMP join on its Fa0/0 interface. For the moment, let's assume the serial link between R1 and R3, which is high-speed (let's say it's an OC12) is in shutdown. Several processes will take place here:

1) R3 will send a join towards 1.1.1.1. In this case, because of the unicast routing table, R3 will send that join to R2. R2, again because of the unicast routing table, will send it towards R1. R1 processes the join and puts its Fa0/0 interface into the OIL for 239.1.1.1.
2) R4's ping is heard by the PIM process running on the local router. R4 directs that traffic towards the RP, which means sending the traffic towards R1, because of the unicast routing table. R1 processes the traffic, adds the (S,G) of (192.168.14.4, 239.1.1.1) and sends it towards R3 via R2, because of the unicast routing table.
3) R3 hears the (192.168.14.4, 239.1.1.1) traffic, and initiates an SPT join towards 192.168.14.4, via R2 and R1, because of the unicast routing table.

We've got a gob of RPF checks here, and a great deal of forward routing, too. For example, the join would never reach from R3 to R1 if we were just thinking in terms of RPF.

Now, for a moment, let's say the serial link is turned on and advertised into the IGP. Since the Fast Ethernet links are all 100 Mbps, and the serial link is roughly 600 Mbps, it gets preferred. However, whoops, we didn't enable PIM on the serial link! Now, we are screwed. Looking at the steps above again:
1) R3 will send the join towards 1.1.1.1 via the serial link, no such luck, no PIM here!
2) R4 would still reach the RP OK, but the RP would try to reach R3 via the serial link; no PIM here.
3) We can't even contemplate this step because steps 1 and 2 failed.

Now, a basic understanding of mrouting tells us we can fix this with some static mroutes:
R1:
ip mroute 192.168.23.3 255.255.255.255 192.168.12.2

R3:
ip mroute 1.1.1.1 255.255.255.255 192.168.23.2

And boom, our problem is solved.

That's how I thought of mroutes up until attempting to apply BGP to them. My brain kept saying "how can I apply targeted/spot repairs with a routing protocol?". That's where it all broke down. It's difficult to think of a routing protocol in the same sense you think of "fix it" static routes. We've all put that goofy unicast static route into production - the one you wish wasn't there for cleanliness. It's pointing at some VPN tunnel on some firewall you can't run RRI on, and there's no way to get the route into your IGP without just defining the damned thing statically. Now, again, imagine trying to fix that with BGP. Ugh, my brain hurts. And this is where I ended up turning my thinking around.

First things first: DROP THE IGP. You can route an entire multicast configuration without having a single IGP or static route in the network.

Same topology, dump the IGP. We're not going to do any unicast here at all. Assume PIM sparse-mode is enabled everywhere. Let's build the multicast topology as if we were statically routing any traditional unicast network.

R1:
ip mroute 192.168.23.0 255.255.255.0 192.168.12.2

R2:
ip mroute 192.168.14.0 255.255.255.0 192.168.12.1
ip mroute 1.1.1.1 255.255.255.255 192.168.12.1

R3:
ip mroute 192.168.12.0 255.255.255.0 192.168.23.2
ip mroute 192.168.14.0 255.255.255.0 192.168.23.2
ip mroute 1.1.1.1 255.255.255.255 192.168.23.2

R4:
ip mroute 192.168.12.0 255.255.255.0 192.168.14.1
ip mroute 192.168.23.0 255.255.255.0 192.168.14.1
ip mroute 1.1.1.1 255.255.255.255 192.168.14.1

No IGP, but everything works. This takes the mroute out of the role of bandaid, and into the role of controlling the network. The first thing I'd like to point out is we're not just "fixing RPF failures" here, but we control bi-directional communication, in a way. For example, R3 can locate the RP via the mroutes. This is a very "forward" behavior.

(Let's not forget if you're using pings to test, you'll need to use "debug ip icmp" on R3. It can't reply because there are no unicast routes.)

This gave me the feeling of using mroutes as the primary workings of the network, and PIM's interworking with the IGP as more of a "backup" strategy. "If I have no mroute, turn to the IGP's tables". When you start using this logic, replacing the above static routes with BGP makes complete sense! This was my "aha!" moment.
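That "mroute first, IGP as backup" way of thinking can be sketched in a few lines of Python. This is only the mental model described above, not how IOS actually implements RPF selection. It deliberately ignores details such as administrative distance, and the function names and table format are my own assumptions.

```python
import ipaddress

def longest_match(table, source):
    """Return the next hop of the most specific route covering source, or None."""
    addr = ipaddress.ip_address(source)
    best = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None

def rpf_neighbor(source, mroutes, unicast):
    """Simplified model: consult the mroute table first, then fall back to the IGP."""
    hop = longest_match(mroutes, source)
    if hop is not None:
        return hop, "mroute"
    hop = longest_match(unicast, source)
    return (hop, "unicast") if hop is not None else (None, "rpf-fail")

# R3's static mroutes from the IGP-free example; the unicast table is empty.
r3_mroutes = {
    "192.168.12.0/24": "192.168.23.2",
    "192.168.14.0/24": "192.168.23.2",
    "1.1.1.1/32": "192.168.23.2",
}
print(rpf_neighbor("1.1.1.1", r3_mroutes, {}))    # the RP is found via an mroute
print(rpf_neighbor("10.9.9.9", r3_mroutes, {}))   # nothing covers this source
```

With an empty unicast table, R3 still resolves both the RP and the source subnet, which is the point of the exercise.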

Let's turn the same strategy used above into BGP. Again, the serial link is in shutdown (I used it entirely for the example of how to break things above, so it's basically off from here on in).

R1:
router bgp 100
  neighbor 192.168.14.4 remote-as 100
  neighbor 192.168.12.2 remote-as 200

address-family ipv4 multicast
   neighbor 192.168.14.4 activate
   neighbor 192.168.14.4 next-hop-self
   neighbor 192.168.12.2 activate
   network 1.1.1.1 mask 255.255.255.255
   network 192.168.14.0 mask 255.255.255.0
   network 192.168.12.0 mask 255.255.255.0

R4:
router bgp 100
  neighbor 192.168.14.1 remote-as 100

address-family ipv4 multicast
   neighbor 192.168.14.1 activate
   neighbor 192.168.14.1 next-hop-self ! not really necessary here, but I'll explain below
   network 192.168.14.0 mask 255.255.255.0

R2:
router bgp 200
  neighbor 192.168.12.1 remote-as 100
  neighbor 192.168.23.3 remote-as 200
  neighbor 192.168.23.3 next-hop-self

address-family ipv4 multicast
   neighbor 192.168.23.3 activate
   neighbor 192.168.23.3 next-hop-self
   network 192.168.12.0 mask 255.255.255.0
   network 192.168.23.0 mask 255.255.255.0

R3:
router bgp 200
  neighbor 192.168.23.2 remote-as 200

address-family ipv4 multicast
   neighbor 192.168.23.2 activate
   neighbor 192.168.23.2 next-hop-self ! not really necessary here, but I'll explain below
   network 192.168.23.0 mask 255.255.255.0

A bit wordier than the static configuration, but at least it's a dynamic protocol. And now it makes sense. Stop treating mroutes as a fix for a problem, look at them as the "right" answer, and it all makes sense.

One thing to look out for is that multicast BGP won't recurse on other multicast BGP routes. So you can't count on, for example, R3 being able to reach 1.1.1.1 because it knows how to reach 192.168.12.0 via R2. Good use of next-hop-self in iBGP sessions is necessary. eBGP sessions still set themselves as the next hop by default, as usual.

Hope you enjoyed...

Jeff Kronlage

Tuesday, November 20, 2012

Multicast Equal Cost Multipathing (ECMP)

Imagine a PIM sparse-mode scenario where multiple senders are sending to two different groups, and you have multiple equal-cost paths to receive the traffic on. By default, PIM always picks the neighbor with the highest IP address and sends the join up that one path.

How can you get some load sharing?

Here's our diagram:

R1 is pinging 239.1.1.1 and R2 is pinging 239.1.1.2. R3 is the RP. R5 is joined to 239.1.1.1 and 239.1.1.2.

We have two equal-cost paths between R3 and R4, and we want to use one for 239.1.1.1 and the other for 239.1.1.2, headed towards R5. Right now Fa0/1, whose neighbor has a higher IP address than Fa0/0's neighbor, is getting all the traffic:

R4(config)#do sh ip mroute 239.1.1.1
<output omitted>

(192.168.13.1, 239.1.1.1), 00:00:19/00:02:47, flags: JT
Incoming interface: FastEthernet0/1, RPF nbr 192.168.234.3
Outgoing interface list:
    FastEthernet1/0, Forward/Sparse, 00:00:19/00:02:40

<output omitted>

(192.168.23.2, 239.1.1.2), 00:00:19/00:02:46, flags: JT
Incoming interface: FastEthernet0/1, RPF nbr 192.168.234.3
Outgoing interface list:
    FastEthernet1/0, Forward/Sparse, 00:00:19/00:02:40

ip multicast multipath is the answer.

R4(config)#ip multicast multipath
R4(config)#do sh ip mroute
<output omitted>

(192.168.13.1, 239.1.1.1), 00:05:39/00:02:56, flags: JT
Incoming interface: FastEthernet0/1, RPF nbr 192.168.234.3
Outgoing interface list:
    FastEthernet1/0, Forward/Sparse, 00:05:39/00:02:13

<output omitted>

(192.168.23.2, 239.1.1.2), 00:05:39/00:02:55, flags: JT
Incoming interface: FastEthernet0/0, RPF nbr 192.168.34.3
Outgoing interface list:
    FastEthernet1/0, Forward/Sparse, 00:05:39/00:02:10

Not too difficult. It uses a hash to achieve this, which I'm frankly not interested enough to look into, because on the IOS version I'm using, you can't change the hash anyway. It looks like starting in IOS 15 you can make some modifications to it.

Also, one catch here is this only balances by source. If you want to balance by group, you'll need to do that with your RP assignments, sending some traffic to one RP and other traffic to a different RP.
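To illustrate why a source-only hash can't split two groups from the same sender, here is a toy sketch in Python. The hash is purely hypothetical (source address modulo the number of paths) and is NOT the algorithm IOS uses. The only property it shares with the behavior described above is that the group address is never an input.

```python
import ipaddress

def pick_rpf_neighbor(source, neighbors):
    """Pick one equal-cost RPF neighbor for a source; the group plays no part."""
    ordered = sorted(neighbors, key=lambda n: int(ipaddress.ip_address(n)))
    # Hypothetical hash: source address modulo the number of paths. Not the IOS hash.
    return ordered[int(ipaddress.ip_address(source)) % len(ordered)]

paths = ["192.168.34.3", "192.168.234.3"]
for source in ("192.168.13.1", "192.168.23.2"):
    print(source, "->", pick_rpf_neighbor(source, paths))
```

Since the function never sees the group, every stream from a given source lands on the same neighbor no matter how many groups that source sends to.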

Enjoy

Jeff Kronlage

Monday, November 19, 2012

IP Multicast Boundary

In this post we'll take a look at the ip multicast boundary interface-level command. This command's function isn't hard to understand - it references a standard or extended access list, and either permits or denies multicast traffic through the interface. It can also optionally manipulate auto-rp to remove groups you don't want advertised downstream.

The subnets are on the diagram. Every router is using its router number as the last octet; every router also has a loopback of X.X.X.X, where X is the router number. Every router is running EIGRP on every interface, and pim sparse-dense mode on every interface.

R1 is set up for auto-rp announcement & discovery via its loopback 0 address:

ip access-list standard GRL
permit 224.0.0.0 7.255.255.255
permit 239.0.0.0 0.255.255.255
ip pim send-rp-announce Loopback0 scope 5 group-list GRL
ip pim send-rp-discovery Loopback0 scope 16

We can verify that R3 is receiving the mappings for these groups:

R3#show ip pim rp mapping
PIM Group-to-RP Mappings
Group(s) 224.0.0.0/5
RP 1.1.1.1 (?), v2v1
    Info source: 1.1.1.1 (?), elected via Auto-RP
         Uptime: 00:07:33, expires: 00:02:13
Group(s) 239.0.0.0/8
RP 1.1.1.1 (?), v2v1
    Info source: 1.1.1.1 (?), elected via Auto-RP
         Uptime: 00:07:33, expires: 00:02:13

Let's have R3's Lo0 interface join 239.1.1.1 and 224.1.1.1:

interface Lo0
ip igmp join-group 239.1.1.1
ip igmp join-group 224.1.1.1

And ping them from R1:

R1#ping 239.1.1.1
<output omitted>
Reply to request 0 from 192.168.23.3, 72 ms
R1#ping 224.1.1.1
<output omitted>
Reply to request 0 from 192.168.23.3, 112 ms
Now let's see if we can use R2 to selectively filter 239.0.0.0/8 from reaching R3:

R2:
ip access-list standard blockthings
deny   239.0.0.0 0.255.255.255
permit any
interface FastEthernet0/1
ip multicast boundary blockthings out
R1:
R1#ping 239.1.1.1
Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 239.1.1.1, timeout is 2 seconds:
.
R2:
sh ip mroute 239.1.1.1
<output omitted>

(192.168.12.1, 239.1.1.1), 00:03:22/00:01:55, flags: PFT
Incoming interface: FastEthernet0/0, RPF nbr 192.168.12.1
Outgoing interface list: Null

It works!

However, on R3, we still think we can join the group:

R3#sh ip pim rp map
PIM Group-to-RP Mappings
Group(s) 224.0.0.0/5
RP 1.1.1.1 (?), v2v1
    Info source: 1.1.1.1 (?), elected via Auto-RP
         Uptime: 00:00:53, expires: 00:02:05
Group(s) 239.0.0.0/8
RP 1.1.1.1 (?), v2v1
    Info source: 1.1.1.1 (?), elected via Auto-RP
         Uptime: 00:00:53, expires: 00:02:04
We can fix that as well.

R2:
interface Fa0/1
ip multicast boundary blockthings filter-autorp

Please note this command is in addition to the prior "blockthings out" statement, not in replacement.

R3#sh ip pim rp map
PIM Group-to-RP Mappings
Group(s) 224.0.0.0/5
RP 1.1.1.1 (?), v2v1
    Info source: 1.1.1.1 (?), elected via Auto-RP
         Uptime: 00:00:04, expires: 00:02:53
... and it's gone from auto-rp, as well.

Now, let's change that access-list on R2 a bit:

no ip access-list standard blockthings
ip access-list standard blockthings
deny 224.0.0.0 0.255.255.255
permit any

R2 & R3:
clear ip pim rp-mapping

Now you'd think we'd be able to ping 239.1.1.1 and not 224.1.1.1, right?

R1#ping 224.1.1.1
Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 224.1.1.1, timeout is 2 seconds:
.
OK, that was expected.

R1#ping 239.1.1.1
Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 239.1.1.1, timeout is 2 seconds:
.
Uh-oh.

R3#sh ip pim rp map
PIM Group-to-RP Mappings
R3#
More uh-oh.

So what happened? The deny for 224.0.0.0/8 also covers 224.0.1.40, the auto-rp discovery group, so we just blocked auto-rp. Let's try this again:

no ip access-list standard blockthings
ip access-list standard blockthings
permit 224.0.1.40 0.0.0.0
deny 224.0.0.0 0.255.255.255
permit any

If our auto-rp mapping agent were behind R2, we'd also want to permit 224.0.1.39. In fact, it's best if you just permit both all the time if you're using auto-rp, just to be safe.
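The reason the order of entries matters is that a standard access list is evaluated top-down and the first match wins. Here is a small Python sketch of that evaluation, comparing the broken list with the fixed one. The helper names and the tuple format are my own, not anything from IOS.

```python
import ipaddress

def wildcard_match(address, base, wildcard):
    """True when address equals base on every bit the wildcard mask does not ignore."""
    a = int(ipaddress.ip_address(address))
    b = int(ipaddress.ip_address(base))
    care = ~int(ipaddress.ip_address(wildcard)) & 0xFFFFFFFF
    return (a & care) == (b & care)

def acl_permits(acl, group):
    """Evaluate a standard ACL top-down; first matching entry wins, default is deny."""
    for action, base, wildcard in acl:
        if wildcard_match(group, base, wildcard):
            return action == "permit"
    return False

ANY = ("0.0.0.0", "255.255.255.255")
broken = [("deny", "224.0.0.0", "0.255.255.255"), ("permit", *ANY)]
fixed = [("permit", "224.0.1.40", "0.0.0.0")] + broken

for group in ("224.0.1.40", "224.1.1.1", "239.1.1.1"):
    print(group, acl_permits(broken, group), acl_permits(fixed, group))
```

Only the fixed list lets 224.0.1.40 through while still dropping 224.1.1.1.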

R1#ping 239.1.1.1
<output omitted>
Reply to request 0 from 192.168.23.3, 144 ms

R3#sh ip pim rp map
PIM Group-to-RP Mappings
Group(s) 239.0.0.0/8
RP 1.1.1.1 (?), v2v1
    Info source: 1.1.1.1 (?), elected via Auto-RP
         Uptime: 00:00:03, expires: 00:02:53
Much happier.

There are many other things you can do with ip multicast boundary, which I will give a high-level view of here.

If you wanted to block R2 from being able to perform joins to 224.1.1.1 as well, you'd place the boundary on Fa0/0 as "in". This is probably obvious but the command help is written in such a way that it leaves you scratching your head.

R2:
int lo0
ip igmp join-group 224.1.1.1

R1#ping 224.1.1.1
<output omitted>
.
The interesting thing is the way this shows up on the mroute table on R2:
R2(config-if)#do sh ip mroute 224.1.1.1
<output omitted>

(192.168.12.1, 224.1.1.1), 00:02:03/00:01:12, flags: PT
Incoming interface: FastEthernet0/0, RPF nbr 192.168.12.1
Outgoing interface list: Null

The stream is allowed onto R2, but R2 won't allow Lo0 to be added to the OIL (Outgoing interface list).

Labbing up the extended access lists version of this can get rather tricky. Tricky to the point where I really hope this never shows up on the lab.

In short, you can filter by (S,G) instead of just by group. The format is:
permit ip <source ip> <source mask> <group address> <group mask>

The catch is, you have to consider traffic from the RP, after the SPT join, etc. Even the joins can have issues. It's difficult to come up with a "clean" way of showing how to make this work.

...Here's hoping we all dodge that bullet.

Jeff Kronlage

Sunday, November 18, 2012

PIM Assert

The PIM Assert is an election process that prevents multiple senders on a broadcast medium from replicating the same multicast stream onto the wire. The scenario this can happen in is somewhat specific; I've only been able to think of a way to create it using PIM dense mode.

Here's our topology:

The "top" segment is IPed 192.168.234.X, where X is the router number. The "bottom" segment is IPed 192.168.123.X, where X is the router number. R4 has a loopback address of 4.4.4.4/32. All routers are running EIGRP on all interfaces.

R4 is our receiver and R1 is our transmitter. PIM dense mode is running on all interfaces except FastEthernet0/0 on R1 (R1 is not running any type of multicast routing protocol). R4's lo0 has IGMP join-group configured for 239.1.1.1:

interface Loopback0
ip address 4.4.4.4 255.255.255.255
ip pim dense-mode
ip igmp join-group 239.1.1.1
When R1 pings 239.1.1.1, the packet hits both R2 and R3. This being dense mode, both R2 and R3 forward the packet on to R4. This is obviously inefficient, and creates duplicate packets. During this process, R2 and R3 will hear each other's packets, and will start trying to sort the situation out.

There's a catch here using a router to create the multicast traffic, and there's a reason I set the lab up this way. In order to trigger a PIM Assert, which we are about to see, the (S,G) has to match exactly on both packets egressing R2 and R3. If the links between R1 and R2 and R1 and R3 were any type of point-to-point link, the traffic would be duplicated and the assert would never happen. IOS's implementation for creating multicast traffic is to source it off every router interface and send it out every interface. Even if you use a "ping 239.1.1.1 /source Lo0", the source part of the command is completely ignored - the packets always originate off their respective interfaces. If you're pinging off two separate interfaces, you have two separate (S,G) entries, and the traffic is duplicated. If you've only got one interface, pointed at a broadcast medium, you end up with one (S,G), and this lab is possible.

So, going back to our Assert between R2 and R3... R2 and R3 both send an Assert packet at one another, saying why they should be the ones sending the traffic. The winner is decided by:
- Lowest Administrative Distance (AD) back to the source
- In a tie, best metric value
- In a tie, highest IP address
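The three tie-breakers map neatly onto a sort key. Here is a minimal Python sketch of the election. The class and function names are my own, and the AD and metric values are placeholders (only the fact that they're equal matters here).

```python
import ipaddress
from dataclasses import dataclass

@dataclass
class AssertCandidate:
    name: str
    admin_distance: int
    metric: int
    ip: str

def assert_winner(candidates):
    """Lowest AD wins, then lowest (best) metric, then highest IP address."""
    return min(
        candidates,
        key=lambda c: (c.admin_distance, c.metric, -int(ipaddress.ip_address(c.ip))),
    )

# Placeholder AD/metric values: equal on both routers, so the IP decides.
r2 = AssertCandidate("R2", 90, 0, "192.168.234.2")
r3 = AssertCandidate("R3", 90, 0, "192.168.234.3")
print(assert_winner([r2, r3]).name)

# Renumber R2 to a higher address on the shared segment and it takes over.
r2_renumbered = AssertCandidate("R2", 90, 0, "192.168.234.222")
print(assert_winner([r2_renumbered, r3]).name)
```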

Our routers both have the same AD (Internal EIGRP, 90) and metric. In this case, R3 is always chosen for highest IP address, which we can see in this excerpt from show ip mroute from R3:

(192.168.123.1, 239.1.1.1), 00:00:03/00:02:56, flags: FT
Incoming interface: FastEthernet0/0, RPF nbr 0.0.0.0
Outgoing interface list:
    FastEthernet0/1, Forward/Dense, 00:00:03/00:00:00, A
The "A" at the end of the outgoing interface line indicates it's the assert winner.

We're clearly not going to be able to fiddle with the metric or the AD back to the source in this case, as both routers have connected interfaces. This isn't the way I originally planned this lab, but after modifying it repeatedly due to the source interface problem described above, we're going with what we've got! So, let's play with the IP addresses and watch the magic:

R2:
int Fa0/1
no ip address 192.168.234.2 255.255.255.0
ip address 192.168.234.222 255.255.255.0

R3:
clear ip mroute *

R1:
ping 239.1.1.1

....

R2#sh ip mroute 239.1.1.1
IP Multicast Routing Table
Flags: D - Dense, S - Sparse, B - Bidir Group, s - SSM Group, C - Connected,
       L - Local, P - Pruned, R - RP-bit set, F - Register flag,
       T - SPT-bit set, J - Join SPT, M - MSDP created entry,
       X - Proxy Join Timer Running, A - Candidate for MSDP Advertisement,
       U - URD, I - Received Source Specific Host Report,
       Z - Multicast Tunnel, z - MDT-data group sender,
       Y - Joined MDT-data group, y - Sending to MDT-data group
Outgoing interface flags: H - Hardware switched, A - Assert winner
Timers: Uptime/Expires
Interface state: Interface, Next-Hop or VCD, State/Mode

(*, 239.1.1.1), 00:00:03/stopped, RP 192.168.234.4, flags: S
Incoming interface: FastEthernet0/1, RPF nbr 192.168.234.4
Outgoing interface list:
    FastEthernet0/0, Forward/Dense, 00:00:03/00:00:00

(192.168.123.1, 239.1.1.1), 00:00:03/00:02:56, flags: T
Incoming interface: FastEthernet0/0, RPF nbr 0.0.0.0
Outgoing interface list:
    FastEthernet0/1, Forward/Dense, 00:00:03/00:00:00, A
And there it is, now on R2!

Cheers,

Jeff Kronlage

Saturday, November 17, 2012

BGP Capability ORF

ORF - Outbound Route Filtering - is not a hard concept to grasp, but I hadn't actually seen it before this, and it's a fantastic idea.

Anyone who's been a BGP admin is familiar with prefix filtering on the "customer edge" side. The real-world example is that service providers normally only offer a handful of options for receiving the BGP table from them: Full routes, no routes (a default), and connected customers + a default. Normally the last two are used for customer edge routers that have limited CPU or RAM and don't have the capacity to store and parse the entire BGP table.

A common solution from the customer edge side - one I've personally implemented - is to take the entire BGP table and filter it down with a prefix list to what the CE actually wants to keep in memory. This works fine; however, the PE router still carries the burden of sending the entire BGP table to the CE router, and the CE router then has to reject a rather large percentage of it. This is terribly inefficient.

What if you could ask the PE router to only send you the routes you wanted, dynamically? This is exactly what ORF does.

ORF "sends" a prefix list from the CE to the PE, the PE keeps the prefix list in memory (not in the configuration), and then only transmits the routes that prefix list permits to the CE.
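As a rough Python model of the idea (not of the BGP messages themselves), the PE applies the CE's list before anything goes on the wire. The function name is my own, and the prefixes are the same five /32s and two-entry prefix list used in the configuration in this post.

```python
def advertised_routes(pe_networks, orf_prefix_list=None):
    """Return what the PE sends; with no ORF entries installed it sends everything."""
    if orf_prefix_list is None:
        return list(pe_networks)
    permitted = set(orf_prefix_list)
    # Exact prefix/length matches only, like prefix-list entries without ge/le.
    return [prefix for prefix in pe_networks if prefix in permitted]

pe_networks = [f"{n}.{n}.{n}.{n}/32" for n in range(1, 6)]
someroutes = ["2.2.2.2/32", "4.4.4.4/32"]

print(advertised_routes(pe_networks))              # without ORF: all five routes are sent
print(advertised_routes(pe_networks, someroutes))  # with ORF: only the two the CE asked for
```

Either way the CE ends up holding the same two routes. The difference is how many updates cross the link.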

The configuration is simple:

PE:
router bgp 1
no synchronization
bgp log-neighbor-changes
network 1.1.1.1 mask 255.255.255.255
network 2.2.2.2 mask 255.255.255.255
network 3.3.3.3 mask 255.255.255.255
network 4.4.4.4 mask 255.255.255.255
network 5.5.5.5 mask 255.255.255.255
neighbor 192.168.12.2 remote-as 2
neighbor 192.168.12.2 capability orf prefix-list receive
no auto-summary

CE:
ip prefix-list someroutes seq 5 permit 2.2.2.2/32
ip prefix-list someroutes seq 10 permit 4.4.4.4/32
router bgp 2
no synchronization
bgp log-neighbor-changes
neighbor 192.168.12.1 remote-as 1
neighbor 192.168.12.1 capability orf prefix-list send
neighbor 192.168.12.1 prefix-list someroutes in
no auto-summary
One really nice thing about this config is that even if the PE doesn't support the method, you still get the filtering (via traditional CE-side prefix filtering).

Now obviously, the filtering happens on the CE one way or the other. So how do you verify this is working?

PE#sh ip bgp neighbors 192.168.12.2 | s capabilities
Neighbor capabilities:
    Route refresh: advertised and received(old & new)
    Address family IPv4 Unicast: advertised and received
AF-dependant capabilities:
    Outbound Route Filter (ORF) type (128) Prefix-list:
      Send-mode: received
      Receive-mode: advertised
PE#sh ip bgp neighbors 192.168.12.2 advertised-routes
BGP table version is 6, local router ID is 5.5.5.5
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 2.2.2.2/32       0.0.0.0                  0         32768 i
*> 4.4.4.4/32       0.0.0.0                  0         32768 i

There's our prefix filtering, now on the PE router!

Jeff Kronlage

BGP TTL Security

I had actually never labbed TTL Security before today, and I got "good and stuck" for a while on the mechanics. One of the items that baffled me is that I saw how it was good for an enterprise, but didn't realize the service provider had to play along or it was essentially useless. Here's our diagram:

In this case, R1 is the service provider core router, and R4 is the customer.

So what's the threat?

Using Extended Access Lists as a Substitute for Prefix Lists

I've known this feature was out there for a long while now, but my brain has just rejected learning it.

Let's say you get a lab task that has one of the two following requirements:
1) Filtering by prefix size, but don't use a prefix list
2) Filtering by prefix size and arbitrary bits in the prefix

Neither of these has any real-world purpose, unfortunately (fortunately?).

So let's take this prefix list and turn it into an extended access list:
ip prefix-list prefixmatch permit 10.5.0.0/16 ge 18 le 24

So just to recap basic prefix lists, this would match anything in 10.5.X.X that has a prefix length of /18 through /24. So, these would match:

10.5.40.0/24
10.5.40.0/20
10.5.100.0/18

These would not match:

10.5.40.0/26
10.5.0.0/17
10.6.40.0/20
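If you want to sanity-check a ge/le match without a router handy, the logic is short enough to sketch in Python. The function name is my own. Note that Python's ipaddress module rejects networks with host bits set, so the test routes below are network-aligned stand-ins rather than the exact examples above.

```python
import ipaddress

def prefix_list_match(route, prefix, ge, le):
    """True when route falls inside prefix and its length is between ge and le."""
    net = ipaddress.ip_network(route)
    base = ipaddress.ip_network(prefix)
    return net.subnet_of(base) and ge <= net.prefixlen <= le

# ip prefix-list prefixmatch permit 10.5.0.0/16 ge 18 le 24
for route in ("10.5.40.0/24", "10.5.32.0/20", "10.5.64.0/18",
              "10.5.40.0/26", "10.5.0.0/17", "10.6.32.0/20"):
    print(route, prefix_list_match(route, "10.5.0.0/16", 18, 24))
```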

To replicate this match in an extended access-list, the following format is used:
[permit|deny] ip [prefix] [prefix wildcard] [ge mask] [le wildcard]

The prefix and mask are really straightforward (unless you're doing arbitrary binary bit matching). The GE/LE length take some staring at to understand, because you have to do binary matching.

The easy part of the translation looks like this:

ip prefix-list prefixmatch permit 10.5.0.0/16
... is equivalent to...
access-list 100 permit ip 10.5.0.0 0.0.255.255

Now to understand the hard part.
So we're looking to match masks 18 bits (GE) to 24 bits (LE).
GE on an access-list, in this case, is 255.255.192.0. That part makes sense. /18 = 255.255.192.0.
Now we already know the second part of the mask must be a wildcard mask.

In order for my brain to wrap around this, I always have to use binary as an intermediary. The LE wildcard is based off the GE mask, so let's translate the GE to binary first:
255.255.192.0 = 11111111.11111111.11000000.00000000

The LE match needs to specify all the bits between the GE and LE. The LE is /24, so translating to binary, we have:
11111111.11111111.11111111.00000000

We need the difference of the two, LE minus GE:
  11111111.11111111.11111111.00000000
- 11111111.11111111.11000000.00000000
= 00000000.00000000.00111111.00000000

Translate your answer back to decimal:
0.0.63.0

Now we can figure out the rest of the solution:
ip prefix-list prefixmatch permit 10.5.0.0/16 ge 18 le 24
... is equivalent to...
access-list 100 permit ip 10.5.0.0 0.0.255.255 255.255.192.0 0.0.63.0
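The binary subtraction above is mechanical enough to script. Here is a minimal Python sketch that derives the four ACL fields from a prefix list entry. The function name is my own, and it only builds the text of the entry; it does not talk to a router.

```python
import ipaddress

def prefix_list_to_acl(prefix, ge, le):
    """Build the extended-ACL equivalent of 'prefix ge X le Y' as a string."""
    net = ipaddress.ip_network(prefix)
    ge_mask = ipaddress.ip_network(f"0.0.0.0/{ge}").netmask
    le_mask = ipaddress.ip_network(f"0.0.0.0/{le}").netmask
    # LE minus GE leaves exactly the mask bits that are allowed to vary.
    le_wildcard = ipaddress.ip_address(int(le_mask) - int(ge_mask))
    return f"permit ip {net.network_address} {net.hostmask} {ge_mask} {le_wildcard}"

print(prefix_list_to_acl("10.5.0.0/16", 18, 24))
# permit ip 10.5.0.0 0.0.255.255 255.255.192.0 0.0.63.0
```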

Thanks folks, but I'll stick with prefix lists!

Now just to throw one more curveball, let's try the task that can't be done with prefix lists.
Same prefix list: ip prefix-list prefixmatch permit 10.5.0.0/16 ge 18 le 24
However, this time, we want to match only subnets with an even number in the third octet.

access-list 100 permit ip 10.5.0.0 0.0.254.255 255.255.192.0 0.0.63.0

I'm not going to go over the binary math behind the 254 match (there are dozens of posts out there about this already), but it's quite clear this type of arbitrary non-sequential bit match is impossible with a prefix list.
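For anyone who'd rather see the bit matching run than do it by hand, here is a small Python sketch of how such an entry evaluates a route: the first pair is checked against the network address and the second pair against the subnet mask. The helper names are my own, and this models only the matching logic, not anything IOS-specific beyond that.

```python
import ipaddress

def _bits(dotted):
    return int(ipaddress.ip_address(dotted))

def wildcard_equal(value, base, wildcard):
    """Compare value and base on every bit where the wildcard mask is 0."""
    care = ~_bits(wildcard) & 0xFFFFFFFF
    return (_bits(value) & care) == (_bits(base) & care)

def acl_route_match(route, prefix, prefix_wc, mask, mask_wc):
    """Match the network address against prefix/prefix_wc and the netmask against mask/mask_wc."""
    net = ipaddress.ip_network(route)
    return (wildcard_equal(str(net.network_address), prefix, prefix_wc)
            and wildcard_equal(str(net.netmask), mask, mask_wc))

# access-list 100 permit ip 10.5.0.0 0.0.254.255 255.255.192.0 0.0.63.0
EVEN = ("10.5.0.0", "0.0.254.255", "255.255.192.0", "0.0.63.0")
for route in ("10.5.40.0/24", "10.5.41.0/24", "10.5.64.0/18", "10.5.40.0/26"):
    print(route, acl_route_match(route, *EVEN))
```

The 254 in the prefix wildcard ignores the top seven bits of the third octet and pins the lowest bit to 0, which is what forces the octet to be even.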

Cheers...

Jeff Kronlage

Wednesday, November 14, 2012

The many maps of BGP

Every time I sit down with BGP for a prolonged period, I get quickly overwhelmed by the quantity of different types of "maps" that can be applied to various parts of the configuration.

I tried to count them this morning, and I came up with:
Suppress-Map
Unsuppress-Map
Inject-Map
Advertise-Map
Attribute-Map
Exist-Map
Non-exist-map
and of course traditional route-maps, which make for a total of 8.

When I sit down to fine-tune summarization, or do a conditional advertisement, I can never remember the term for the map I'm looking for. So let's do a thorough run-down of what they all do.

iBGP Route-Reflector Loop Prevention

I've always been a bit foggy on the loop prevention mechanism of a route reflector. I originally assumed it used some sort of split horizon, but as I've discovered, this is simply not the case.

We'll be using two topologies here, starting with this simple one:

R1 will be our route reflector, R2 and R3 will be route reflector clients. The fourth octet in the diagram's IP address is the router number. In addition to the IPs indicated on the diagram, each router has a loopback of X.X.X.X, where X is the router number.

I'm going to peer R1 to R2 and R1 to R3, but not R2 to R3.

OSPF max-metric command

The usage of max-metric router-lsa eluded me, so I labbed it up. Here's our topology:

As usual, X represents the router number.

OSPF is running on all links in the topology, including R4's loopback. Our test traffic flow will be travelling from R1 to R4. Traffic has been manually set to prefer the path R1 -> R2 -> R4:

R1:
interface FastEthernet0/0
ip ospf cost 10000
