Saturday, November 3, 2012

iBGP Route-Reflector Loop Prevention

I've always been a bit foggy on the loop prevention mechanism of a route reflector.  I originally assumed it used some sort of split horizon, but as I've discovered, this is simply not the case. 

We'll be using two topologies here, starting with this simple one:




R1 will be our route reflector; R2 and R3 will be route reflector clients.  The fourth octet of each IP address in the diagram is the router number.  In addition to the IPs indicated on the diagram, each router has a loopback of X.X.X.X, where X is the router number.

I'm going to peer R1 to R2 and R1 to R3, but not R2 to R3.


Here are the relevant configs:

R1:
router bgp 100
 no synchronization
 bgp log-neighbor-changes
 network 1.1.1.1 mask 255.255.255.255
 network 192.168.12.0
 network 192.168.13.0
 neighbor 192.168.12.2 remote-as 100
 neighbor 192.168.12.2 route-reflector-client
 neighbor 192.168.13.3 remote-as 100
 neighbor 192.168.13.3 route-reflector-client
 no auto-summary

R2:
router bgp 100
 no synchronization
 bgp log-neighbor-changes
 network 2.2.2.2 mask 255.255.255.255
 network 192.168.12.0
 network 192.168.23.0
 neighbor 192.168.12.1 remote-as 100
 no auto-summary

R3:
router bgp 100
 no synchronization
 bgp log-neighbor-changes
 network 3.3.3.3 mask 255.255.255.255
 network 192.168.13.0
 network 192.168.23.0
 neighbor 192.168.13.1 remote-as 100
 no auto-summary

Route reflectors use the Originator ID and Cluster List attributes for loop prevention.

Let's look at Originator ID, and debunk the expectations I expressed at the beginning of the article.

R3#debug ip bgp ipv4 unicast updates
R3#clear ip bgp * soft in

For simplicity, I'm only going to post the relevant part of the debug output:

*Mar  1 00:40:21.399: BGP(0): 192.168.13.1 rcv UPDATE about 3.3.3.3/32 -- DENIED due to: ORIGINATOR is us;

Well, that kills my theory about split horizon.  We just got our own route back from the route reflector.  At least we didn't accept it.

Let's take a closer look at this Originator ID.  It's best demonstrated on R2, which did accept the route:

R2#sh ip bgp 3.3.3.3
BGP routing table entry for 3.3.3.3/32, version 7
Paths: (1 available, best #1, table Default-IP-Routing-Table)
Flag: 0x820
  Not advertised to any peer
  Local
    192.168.13.3 from 192.168.12.1 (1.1.1.1)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      Originator: 3.3.3.3, Cluster list: 1.1.1.1

The concept is simple.  Inside a single AS the AS-PATH never changes, so it can't be used for loop prevention.  Instead, the route reflector stamps the route with the router ID of the router that originated it in the local AS.  The advertising router needs to be wise to the ways of the route reflector: it checks the Originator ID on incoming routes the same way it would check AS-PATH in eBGP, and denies anything carrying its own router ID.  I've read elsewhere that clients of a route reflector aren't aware that route reflection is going on.  This isn't quite true.  While they don't need any extra configuration to be a route reflector client, they must check the Originator attribute, and therefore, in my opinion, are participating in route reflection.
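To make the two halves of that check concrete, here is a minimal Python sketch of the logic, not IOS's actual implementation.  The `Route` class and the `reflect` and `accept` functions are hypothetical names made up for illustration; the router IDs are the ones from the first topology.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Route:
    """Hypothetical, stripped-down model of a BGP route."""
    prefix: str
    originator_id: Optional[str] = None  # absent until a reflector sets it


def reflect(route: Route, learned_from_router_id: str) -> Route:
    """Reflector side: set the Originator ID only if it is not already present."""
    if route.originator_id is None:
        return replace(route, originator_id=learned_from_router_id)
    return route


def accept(route: Route, my_router_id: str) -> bool:
    """Receiver side: deny any route whose Originator ID is our own router ID."""
    return route.originator_id != my_router_id


# R3 (router ID 3.3.3.3) advertises its loopback to R1, which reflects it.
reflected = reflect(Route("3.3.3.3/32"), learned_from_router_id="3.3.3.3")

print(accept(reflected, my_router_id="3.3.3.3"))  # False: "ORIGINATOR is us" on R3
print(accept(reflected, my_router_id="2.2.2.2"))  # True: R2 installs the route
```

Note that the reflector happily sends the route back toward R3; the loop is only broken when R3 runs the receive-side check.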

This brings up another interesting point.  Havoc could be caused if the same router ID is in use in two places in the iBGP domain.  Let's try it out.

R2(config-router)#bgp router-id 3.3.3.3

I still have the debug going on R3:

*Mar  1 00:50:51.807: BGP(0): 192.168.13.1 rcv UPDATE about 2.2.2.2/32 -- DENIED due to: ORIGINATOR is us;
*Mar  1 00:50:51.807: BGP(0): 192.168.13.1 rcv UPDATE about 192.168.23.0/24 -- DENIED due to: ORIGINATOR is us;

I think this would make an excellent troubleshooting lab question.  What an obscure, yet easy to fix, issue.
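What happened is that R1 stamped R2's routes with R2's router ID, which is now 3.3.3.3, so R3 saw its own ID in the Originator field and denied them.  If you wanted to catch this ahead of time, a rough Python sketch like the one below would do it.  The `lab` dictionary is a hypothetical hand-built inventory; in practice you'd have to collect the router IDs yourself, for example from the "BGP router identifier" line of `show ip bgp summary` on each router.

```python
from collections import defaultdict
from typing import Dict, List


def find_duplicate_router_ids(router_ids: Dict[str, str]) -> Dict[str, List[str]]:
    """Return each router ID that is claimed by more than one router."""
    by_id = defaultdict(list)
    for hostname, router_id in router_ids.items():
        by_id[router_id].append(hostname)
    return {rid: sorted(names) for rid, names in by_id.items() if len(names) > 1}


# Hypothetical inventory matching the lab after R2's router ID was changed.
lab = {"R1": "1.1.1.1", "R2": "3.3.3.3", "R3": "3.3.3.3"}

print(find_duplicate_router_ids(lab))  # {'3.3.3.3': ['R2', 'R3']}
```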

Now let's check out the cluster list.

We have a slightly more complex design this time:

In this scenario, we're running EIGRP on all links and all Lo0's.  Lo0 remains X.X.X.X, and each router now also has a Lo1 of XX.XX.XX.XX.  Lo1 is not carried in EIGRP, and will be used to show iBGP interaction.

R1 and R2 are both route reflectors. R3 and R4 are both clients to both reflectors.  R1 and R2 are peered to each other with traditional iBGP.

Here is the relevant config:
R1:
router bgp 100
 no synchronization
 bgp log-neighbor-changes
 network 11.11.11.11 mask 255.255.255.255
 neighbor 2.2.2.2 remote-as 100
 neighbor 2.2.2.2 update-source Loopback0
 neighbor 3.3.3.3 remote-as 100
 neighbor 3.3.3.3 update-source Loopback0
 neighbor 3.3.3.3 route-reflector-client
 neighbor 4.4.4.4 remote-as 100
 neighbor 4.4.4.4 update-source Loopback0
 neighbor 4.4.4.4 route-reflector-client
 no auto-summary

R2:
router bgp 100
 no synchronization
 bgp log-neighbor-changes
 network 22.22.22.22 mask 255.255.255.255
 neighbor 1.1.1.1 remote-as 100
 neighbor 1.1.1.1 update-source Loopback0
 neighbor 3.3.3.3 remote-as 100
 neighbor 3.3.3.3 update-source Loopback0
 neighbor 3.3.3.3 route-reflector-client
 neighbor 4.4.4.4 remote-as 100
 neighbor 4.4.4.4 update-source Loopback0
 neighbor 4.4.4.4 route-reflector-client
 no auto-summary

R3:
router bgp 100
 no synchronization
 bgp log-neighbor-changes
 network 33.33.33.33 mask 255.255.255.255
 neighbor 1.1.1.1 remote-as 100
 neighbor 2.2.2.2 remote-as 100
 no auto-summary

R4:
router bgp 100
 no synchronization
 bgp log-neighbor-changes
 network 44.44.44.44 mask 255.255.255.255
 neighbor 1.1.1.1 remote-as 100
 neighbor 2.2.2.2 remote-as 100
 no auto-summary

As I said above, both R3 and R4 are dual-peered to R1 and R2.  This provides redundancy in case one of the route reflectors fails.  Of course, in our diagram, this is relatively useless because if the route reflector failed, the client hanging off of it would be disconnected as well.  But in the real world with real redundant links, this would be a common design.

The thing that should jump out at you is the question of how R1 and R2 know they're serving the same set of clients.
Well... so far, they don't.  In our design this physically can't cause a loop, but it makes for a messy BGP table.

R1#sh ip bgp 33.33.33.33
BGP routing table entry for 33.33.33.33/32, version 7
Paths: (2 available, best #2, table Default-IP-Routing-Table)
Flag: 0x820
  Advertised to update-groups:
        1    2
  Local
    3.3.3.3 (metric 156160) from 2.2.2.2 (22.22.22.22)
      Origin IGP, metric 0, localpref 100, valid, internal
      Originator: 33.33.33.33, Cluster list: 22.22.22.22
  Local, (Received from a RR-client)
    3.3.3.3 (metric 156160) from 3.3.3.3 (33.33.33.33)
      Origin IGP, metric 0, localpref 100, valid, internal, best

So we learned the route twice: once from 3.3.3.3, which was expected, and again from 2.2.2.2.  Since this is iBGP, the next hop isn't modified, so the traffic always goes to the right place, but this is messy regardless.

On a side note, I was thinking "I sure could cause havoc by introducing next-hop-self on R1 towards R2, and R2 towards R1".  Interestingly, next-hop-self did nothing under these circumstances - it must be some idiot-proofing built into route reflection.

So, what do we do about this?  Introducing the Cluster ID.  The cluster ID value is added to the BGP cluster list attribute, seen above (Cluster list: 22.22.22.22).  The cluster list says "I came through this group of iBGP route reflectors".  If you fail to set the cluster ID, IOS will fill in the router ID in this field.  Since the two route reflectors clearly have different router IDs, they're taking each other's routes and "double reflecting" them.

R1 & R2:
router bgp 100
 bgp cluster-id 99.99.99.99

The cluster-id can be any arbitrary 32-bit number.  All it needs to do is match on the redundant route reflectors, and you're all set.

R1#sh ip bgp 33.33.33.33
BGP routing table entry for 33.33.33.33/32, version 7
Paths: (1 available, best #1, table Default-IP-Routing-Table)
  Advertised to update-groups:
        1    2
  Local, (Received from a RR-client)
    3.3.3.3 (metric 156160) from 3.3.3.3 (33.33.33.33)
      Origin IGP, metric 0, localpref 100, valid, internal, best

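Now there's only one copy of the route.  R1 finds its own cluster ID in the cluster list of the copy reflected by R2, and discards it.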
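The before and after behavior can be sketched in a few lines of Python.  This is only a model of the cluster list check, not how IOS implements it; `effective_cluster_id`, `reflect`, and `accept` are hypothetical names, and R1's router ID of 11.11.11.11 is an assumption based on its highest loopback (R2's 22.22.22.22 is visible in the output above).

```python
from typing import Optional, Tuple

ClusterList = Tuple[str, ...]


def effective_cluster_id(router_id: str, configured: Optional[str] = None) -> str:
    """With no cluster ID configured, the router ID is used instead."""
    return configured if configured is not None else router_id


def reflect(cluster_list: ClusterList, my_cluster_id: str) -> ClusterList:
    """Reflector side: prepend our cluster ID to the route's cluster list."""
    return (my_cluster_id,) + cluster_list


def accept(cluster_list: ClusterList, my_cluster_id: str) -> bool:
    """Receiver side: drop the route if our cluster ID is already in the list."""
    return my_cluster_id not in cluster_list


# Default behavior: each reflector falls back to its own router ID.
r1_default = effective_cluster_id("11.11.11.11")  # assumed router ID for R1
r2_default = effective_cluster_id("22.22.22.22")
from_r2_default = reflect((), r2_default)
print(from_r2_default, accept(from_r2_default, r1_default))  # accepted by R1

# Shared cluster ID configured on both reflectors.
r1_shared = effective_cluster_id("11.11.11.11", configured="99.99.99.99")
r2_shared = effective_cluster_id("22.22.22.22", configured="99.99.99.99")
from_r2_shared = reflect((), r2_shared)
print(from_r2_shared, accept(from_r2_shared, r1_shared))  # dropped by R1
```

In the first case R1 sees a cluster list of 22.22.22.22, which isn't its own ID, so it keeps the second path.  In the second case the list contains 99.99.99.99, which R1 recognizes as its own cluster, so only the path learned directly from R3 remains.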

Jeff Kronlage

3 comments:

  1. Jeff,

    I have a unique situation, and would like your thoughts on it. A customer is advertising a default route with an AS path of 65160 1234 within the L3VPN (AS1234 happens to be the AS of the PE that they are connected to).

    The route reflector sees AS 1234 in the path and does not advertise it to other route reflector clients in AS1234..is that expected behavior?

    =============================





    Customer RD : 2:2
    Route Reflector: 192.168.1.10 < This Route reflector client is in AS 1234
    Local AS: 111
    PE that the customer is peered to: 10.10.10.10 (This PE is in AS1785)


    Output below from route reflector in AS 111

    2:2:0.0.0.0/0 (2 entries, 0 announced)
    BGP /-101
    Route Distinguisher: 2:2
    Next hop type: Indirect
    Address: 0x28b00e20
    Next-hop reference count: 2
    Source: 192.168.1.10
    Protocol next hop: 10.10.10.10
    Push 464
    Indirect next hop: 2 no-forward
    State:
    Local AS: 111 Peer AS: 1234
    Age: 3d 6:34:32 Metric: 0 Metric2: 87766
    Task: BGP_111.192.168.1.10+62122
    AS path: 65160 1234 I (Originator) (Looped: 1234)
    Cluster list: 192.168.1.10
    Originator ID: 10.10.10.10
    Communities: target:2:2
    VPN Label: 464
    Localpref: 100
    Router ID: 192.168.1.10
    Indirect next hops: 1
    Protocol next hop: 10.10.10.10 Metric: 87766
    Push 464
    Indirect next hop: 2 no-forward
    Indirect path forwarding next hops: 2
    Next hop type: Router
    Next hop: 98.21.127.41 via ge-2/2/0.0
    Next hop: 98.21.127.43 via ge-2/2/1.0
    10.10.10.10/32 Originating RIB: inet.3
    Metric: 87766 Node path count: 1
    Forwarding nexthops: 2
    Nexthop: 98.21.127.41 via ge-2/2/0.0
    Nexthop: 98.21.127.43 via ge-2/2/1.0


    Replies
    1. You're going to have to help me out with the topology a little more. Normally, a route reflector is advertising routes only within a single AS. Perhaps there's a VRF not pictured here (I'm assuming so, with MPLS mentioned), but I'm curious why we have a router in AS 1234 passing routes to a router in AS 111 and expecting them to be passed back to 1234? That doesn't sound like route reflection, it sounds like traditional eBGP, and yes, 1234 -> 111 -> 1234 would not be accepted, simply because an eBGP speaker will not accept routes with its own AS in the AS path. I suspect I'm missing something from your topology?

  2. You did a very good job on this write up, thank you.
