Wednesday, November 21, 2012

Rethinking mroutes; Multicast BGP

I started working on using BGP to distribute multicast routes today.  I've touched on this topic a few other times, and while I "kind of" got the idea, it never really sat well with me - I never felt comfortable with it.  The problem I've always had, probably stemming from my CCNP studies or early CCIE work, is that I learned mroutes as a way to fix issues with unicast routing.  In other words, if the IGP didn't jibe with what the multicast was doing, you could use an mroute to band-aid it.  Layer BGP on top of that logic, and you've now got nested confusion.

Today, I had a paradigm shift on how I thought of mroutes, and it clarified everything for me.  I'm hoping that some of you have the same problem and this may help.

Here's our topology:


All links are 192.168.YZ.0/24; where Y is the low router number, and Z is the high router number.  For example, the link between R2 and R3 is 192.168.23.0/24.  The interface IPs are 192.168.YZ.X, where X is the router number.  R1 has a loopback address of 1.1.1.1/32.

For the moment, ignore the BGP AS numbers.  Pretend this is all one IGP-driven network.

The thing that kept throwing me off is the way PIM incorporates the IGP, or unicast routing table, into its calculations.  Everything feels kind of "backwards".  Unicasting is inherently about where traffic is going, and (to some extent) multicasting is about where traffic came from (think RPF check).  When you slap multicasting on top of a unicast network, without defining mroutes, you end up thinking "backwards".  When you look at a unicast route, you think "I'd send traffic that way".  When you look at a unicast route being used for RPF, you think "I'd receive traffic that way".  The way mroutes are taught initially is as an override for this behavior. 

For example, let's say we're running sparse mode, and R1's Lo0 is the RP.  R4 is creating ping traffic towards 239.1.1.1, and R3 has done an IGMP join on its Fa0/0 interface.  For the moment, let's assume the serial link between R1 and R3, which is high-speed (let's say it's an OC12), is in shutdown.  Several processes will take place here:

1) R3 will send a join towards 1.1.1.1.  In this case, because of the unicast routing table, R3 will send that join to R2.  R2, again because of the unicast routing table, will send it towards R1.  R1 processes the join and puts its Fa0/0 interface into the OIL for 239.1.1.1.
2) R4's ping is heard by the PIM process running on the local router.  R4 directs that traffic towards the RP, which means sending the traffic towards R1, because of the unicast routing table.  R1 processes the traffic, adds the (S,G) of (192.168.14.4, 239.1.1.1) and sends it towards R3 via R2, because of the unicast routing table.
3) R3 hears the (192.168.14.4, 239.1.1.1) traffic, and initiates an SPT join towards 192.168.14.4, via R2 and R1, because of the unicast routing table.
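
For anyone labbing this, here is a minimal sketch of the baseline the scenario assumes.  Only the RP address (1.1.1.1), the group (239.1.1.1) and R3's Fa0/0 come from the description above; the other interface names are placeholders, so adjust them to your own lab.

All routers:
 ip multicast-routing
 ip pim rp-address 1.1.1.1
 !
 ! repeat on every interface that should carry multicast
 interface FastEthernet0/0
  ip pim sparse-mode

R1:
 ! the RP address lives on Lo0, so it needs PIM as well
 interface Loopback0
  ip pim sparse-mode

R3:
 interface FastEthernet0/0
  ip igmp join-group 239.1.1.1

R4:
 R4# ping 239.1.1.1 repeat 100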

We've got a gob of RPF checks here, and a great deal of forward routing, too.  For example, the join would never reach from R3 to R1 if we were just thinking in terms of RPF.

Now, for a moment, let's say the serial link is turned on and advertised into the IGP.  Since the Fast Ethernet links are all 100 Mbps and the serial link is roughly 600 Mbps, the serial link gets preferred.  However, whoops, we didn't enable PIM on the serial link!  Now, we are screwed.  Looking at the steps above again:
1) R3 will send the join towards 1.1.1.1 via the serial link, no such luck, no PIM here!
2) R4 would still reach the RP OK, but the RP would try to reach R3 via the serial link; no PIM here.
3) We can't even contemplate this step because steps 1 and 2 failed.
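
A quick way to spot this kind of failure is to compare what the unicast table prefers with where PIM is actually running.  These are standard IOS show commands; I'm not reproducing output here since it varies by release.

R3:
 R3# show ip route 1.1.1.1
 R3# show ip pim interface
 R3# show ip pim neighbor
 R3# show ip rpf 1.1.1.1

If the unicast route to 1.1.1.1 prefers the serial link and that link is missing from the PIM interface list, the RPF lookup fails and the join has nowhere to go.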

Now, a basic understanding of mrouting tells us we can fix this with some static mroutes:
R1:
  ip mroute 192.168.23.3 255.255.255.255 192.168.12.2

R3:
  ip mroute 1.1.1.1 255.255.255.255 192.168.23.2

And boom, our problem is solved.
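
To confirm the static mroutes took effect, something along these lines should do (again, standard IOS commands, shown without output):

R3:
 R3# show running-config | include ip mroute
 R3# show ip rpf 1.1.1.1

The RPF lookup for 1.1.1.1 should now point at 192.168.23.2 instead of the serial link.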

That's how I thought of mroutes up until attempting to apply BGP to them.  My brain kept saying "how can I apply targeted/spot repairs with a routing protocol?".  That's where it all broke down.  It's difficult to think of a routing protocol in the same sense you think of "fix it" static routes.  We've all put that goofy unicast static route into production - the one you wish wasn't there for cleanliness.  It's pointing at some VPN tunnel on some firewall you can't run RRI on, and there's no way to get the route into your IGP without just defining the damned thing statically.  Now, again, imagine trying to fix that with BGP.  Ugh, my brain hurts.  And this is where I ended up turning my thinking around.

First things first: DROP THE IGP.  You can route an entire multicast configuration without having a single IGP or static route in the network.

Same topology, dump the IGP.  We're not going to do any unicast here at all.  Assume PIM sparse-mode is enabled everywhere.  Let's build the multicast topology as if we were statically routing any traditional unicast network.

R1:
 ip mroute 192.168.23.0 255.255.255.0 192.168.12.2

R2:
 ip mroute 192.168.14.0 255.255.255.0 192.168.12.1
 ip mroute 1.1.1.1 255.255.255.255 192.168.12.1

R3:
 ip mroute 192.168.12.0 255.255.255.0 192.168.23.2
 ip mroute 192.168.14.0 255.255.255.0 192.168.23.2
 ip mroute 1.1.1.1 255.255.255.255 192.168.23.2

R4:
 ip mroute 192.168.12.0 255.255.255.0 192.168.14.1
 ip mroute 192.168.23.0 255.255.255.0 192.168.14.1
 ip mroute 1.1.1.1 255.255.255.255 192.168.14.1

No IGP, but everything works.  This takes the mroute out of the role of band-aid, and into the role of controlling the network.  The first thing I'd like to point out is we're not just "fixing RPF failures" here, but we control bi-directional communication, in a way.  For example, R3 can locate the RP via the mroutes.  This is a very "forward" behavior.

(Let's not forget if you're using pings to test, you'll need to use "debug ip icmp" on R3.  It can't reply because there are no unicast routes.)
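
A rough test sequence for this mroute-only setup might look like the following.  The repeat count is arbitrary, and I'm leaving output out since it varies by release.

R3:
 R3# debug ip icmp

R4:
 R4# ping 239.1.1.1 repeat 5

R3:
 R3# show ip mroute 239.1.1.1
 R3# undebug all

The pings will time out on R4, which is expected with no unicast routes; the debug and the (S,G) entry on R3 are what tell you the traffic arrived.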

This gave me the feeling of using mroutes as the primary workings of the network, and PIM's interworking with the IGP as more of a "backup" strategy.  "If I have no mroute, turn to the IGP's tables". When you start using this logic, replacing the above static routes with BGP makes complete sense!  This was my "aha!" moment.

Let's turn the same strategy used above into BGP.  Again, the serial link is in shutdown (I used it entirely for the example of how to break things above, so it's basically off from here on in). 

R1:
 router bgp 100
  neighbor 192.168.14.4 remote-as 100
  neighbor 192.168.12.2 remote-as 200

  address-family ipv4 multicast
   neighbor 192.168.14.4 activate
   neighbor 192.168.14.4 next-hop-self
   neighbor 192.168.12.2 activate
   network 1.1.1.1 mask 255.255.255.255
   network 192.168.14.0 mask 255.255.255.0
   network 192.168.12.0 mask 255.255.255.0

R4:
 router bgp 100
  neighbor 192.168.14.1 remote-as 100

  address-family ipv4 multicast
   neighbor 192.168.14.1 activate
   neighbor 192.168.14.1 next-hop-self ! not really necessary here, but I'll explain below
   network 192.168.14.0 mask 255.255.255.0

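R2:
 router bgp 200
  neighbor 192.168.12.1 remote-as 100
  ! R3's address on the R2-R3 link is 192.168.23.3
  neighbor 192.168.23.3 remote-as 200
  neighbor 192.168.23.3 next-hop-self

  address-family ipv4 multicast
   ! the eBGP session to R1 must be activated here too, or no multicast routes are exchanged with AS 100
   neighbor 192.168.12.1 activate
   neighbor 192.168.23.3 activate
   neighbor 192.168.23.3 next-hop-self
   network 192.168.12.0 mask 255.255.255.0
   network 192.168.23.0 mask 255.255.255.0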

R3:
 router bgp 200
  neighbor 192.168.23.2 remote-as 200
 
  address-family ipv4 multicast
   neighbor 192.168.23.2 activate
   neighbor 192.168.23.2 next-hop-self  ! not really necessary here, but I'll explain below
   network 192.168.23.0 mask 255.255.255.0

A bit wordier than the static configuration, but at least it's a dynamic protocol. And now it makes sense.  Stop trying to solve a problem, and look at it as the "right" answer, and it all makes sense.
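
To check that the multicast address family actually came up and is carrying routes, these standard IOS commands are a reasonable starting point (shown on R3, but they apply to any of the four routers):

R3:
 R3# show ip bgp ipv4 multicast summary
 R3# show ip bgp ipv4 multicast

The summary shows whether the neighbor is established in the multicast address family; the second command lists the prefixes learned there, which should include 1.1.1.1/32 and the link subnets advertised above.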

One thing to look out for is that multicast BGP won't recurse on other multicast BGP routes.  So you can't count on, for example, R3 being able to reach 1.1.1.1 because it knows how to reach 192.168.12.0 via R2.  Good use of next-hop-self in iBGP sessions is necessary.  eBGP sessions still set the next hop to themselves by default, as usual.
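
If you want to see the next-hop behavior for yourself, a sketch like this on R3 should show it (standard IOS commands, output omitted):

R3:
 R3# show ip bgp ipv4 multicast 1.1.1.1
 R3# show ip rpf 1.1.1.1

With next-hop-self configured on R2's iBGP session, the next hop for 1.1.1.1 should be 192.168.23.2, which is directly connected.  Take next-hop-self off R2 and the next hop becomes R1's address on the 192.168.12.0/24 link, which is exactly the recursion case described above.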

Hope you enjoyed...

Jeff Kronlage

1 comment:

  1. This is great, it did help things click together faster than the "dynamic band-aid" analogy implied by most sources.
