We'll be using two topologies here, starting with this simple one:
R1 will be our route reflector; R2 and R3 will be route reflector clients. The fourth octet of each IP address in the diagram is the router number. In addition to the IPs shown in the diagram, each router has a loopback of X.X.X.X, where X is the router number.
I'm going to peer R1 to R2 and R1 to R3, but not R2 to R3.
Here are the relevant configs:
R1:
router bgp 100
no synchronization
bgp log-neighbor-changes
network 1.1.1.1 mask 255.255.255.255
network 192.168.12.0
network 192.168.13.0
neighbor 192.168.12.2 remote-as 100
neighbor 192.168.12.2 route-reflector-client
neighbor 192.168.13.3 remote-as 100
neighbor 192.168.13.3 route-reflector-client
no auto-summary
R2:
router bgp 100
no synchronization
bgp log-neighbor-changes
network 2.2.2.2 mask 255.255.255.255
network 192.168.12.0
network 192.168.23.0
neighbor 192.168.12.1 remote-as 100
no auto-summary
R3:
router bgp 100
no synchronization
bgp log-neighbor-changes
network 3.3.3.3 mask 255.255.255.255
network 192.168.13.0
network 192.168.23.0
neighbor 192.168.13.1 remote-as 100
no auto-summary
Route reflectors use the Originator ID and Cluster List attributes for loop prevention.
Let's look at Originator ID, and debunk the expectations I expressed at the beginning of the article.
R3#debug ip bgp ipv4 unicast updates
R3#clear ip bgp * soft in
For simplicity, I'm only going to post the relevant part of the debug output:
*Mar 1 00:40:21.399: BGP(0): 192.168.13.1 rcv UPDATE about 3.3.3.3/32 -- DENIED due to: ORIGINATOR is us;
Well, that kills my theory about split horizon. We just got our own route back from the route reflector. At least we didn't accept it.
Let's take a closer look at this Originator ID. It's best demonstrated on R2, which did accept the route:
R2#sh ip bgp 3.3.3.3
BGP routing table entry for 3.3.3.3/32, version 7
Paths: (1 available, best #1, table Default-IP-Routing-Table)
Flag: 0x820
Not advertised to any peer
Local
192.168.13.3 from 192.168.12.1 (1.1.1.1)
Origin IGP, metric 0, localpref 100, valid, internal, best
Originator: 3.3.3.3, Cluster list: 1.1.1.1
The concept is simple. Inside a single AS the AS-PATH doesn't change, so it can't be used for loop prevention. Instead, the route reflector attaches the router ID of the router that originated the route as the Originator ID. The originating router needs to be wise to the ways of the route reflector, and checks the Originator ID the same way it would examine the AS-PATH in eBGP. I've read elsewhere that clients of a route reflector aren't aware that route reflection is going on. That isn't quite true. While they need no configuration changes to be a route reflector client, they must check the Originator ID attribute, and therefore, in my opinion, are participating in route reflection.
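To make the check concrete, here is a minimal Python sketch of the Originator ID logic as I understand it. This is a toy model, not IOS code: the `Update` class and the `reflect` and `accept` functions are names I made up for illustration, and the router IDs are the ones from the first topology.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Update:
    """Hypothetical stand-in for an iBGP update; only the fields we need."""
    prefix: str
    originator_id: Optional[str] = None  # absent until a reflector sets it


def reflect(update: Update, learned_from_router_id: str) -> Update:
    """Reflector side: set the Originator ID only if it is not already present."""
    if update.originator_id is None:
        return Update(update.prefix, learned_from_router_id)
    return update


def accept(update: Update, local_router_id: str) -> bool:
    """Receiver side: deny an update whose Originator ID is our own router ID."""
    return update.originator_id != local_router_id


# R1 reflects R3's loopback route to its clients.
reflected = reflect(Update("3.3.3.3/32"), learned_from_router_id="3.3.3.3")

print(accept(reflected, local_router_id="3.3.3.3"))  # R3: "ORIGINATOR is us"
print(accept(reflected, local_router_id="2.2.2.2"))  # R2: route is accepted
```

The first call models R3 getting its own route back and denying it; the second models R2 accepting the same update.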
This brings up another interesting point. Havoc could be caused if the same router ID is in use in two places in the iBGP domain. Let's try it out.
R2(config-router)#bgp router-id 3.3.3.3
I still have the debug going on R3:
*Mar 1 00:50:51.807: BGP(0): 192.168.13.1 rcv UPDATE about 2.2.2.2/32 -- DENIED due to: ORIGINATOR is us;
*Mar 1 00:50:51.807: BGP(0): 192.168.13.1 rcv UPDATE about 192.168.23.0/24 -- DENIED due to: ORIGINATOR is us;
I think this would make an excellent troubleshooting lab question. What an obscure, yet easy to fix, issue.
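If you wanted to catch this before it bites, one option is to audit router IDs offline. The sketch below assumes you have already collected a hostname-to-router-ID mapping somehow; the `inventory` dictionary and the `duplicate_router_ids` function are hypothetical names for illustration, not output from any real tool.

```python
from collections import defaultdict
from typing import Dict, List


def duplicate_router_ids(router_ids: Dict[str, str]) -> Dict[str, List[str]]:
    """Return each router ID claimed by more than one router, with the claimants."""
    by_id = defaultdict(list)
    for hostname, router_id in router_ids.items():
        by_id[router_id].append(hostname)
    return {rid: sorted(hosts) for rid, hosts in by_id.items() if len(hosts) > 1}


# Hypothetical inventory matching the lab after R2's router ID was changed.
inventory = {"R1": "1.1.1.1", "R2": "3.3.3.3", "R3": "3.3.3.3"}

print(duplicate_router_ids(inventory))
```

With the lab values above, the only router ID reported is 3.3.3.3, shared by R2 and R3.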
Now let's check out the cluster list.
We have a slightly more complex design this time:
R1 and R2 are both route reflectors. R3 and R4 are clients of both reflectors. R1 and R2 are peered to each other with traditional iBGP.
Here is the relevant config:
R1:
router bgp 100
no synchronization
bgp log-neighbor-changes
network 11.11.11.11 mask 255.255.255.255
neighbor 2.2.2.2 remote-as 100
neighbor 2.2.2.2 update-source Loopback0
neighbor 3.3.3.3 remote-as 100
neighbor 3.3.3.3 update-source Loopback0
neighbor 3.3.3.3 route-reflector-client
neighbor 4.4.4.4 remote-as 100
neighbor 4.4.4.4 update-source Loopback0
neighbor 4.4.4.4 route-reflector-client
no auto-summary
R2:
router bgp 100
no synchronization
bgp log-neighbor-changes
network 22.22.22.22 mask 255.255.255.255
neighbor 1.1.1.1 remote-as 100
neighbor 1.1.1.1 update-source Loopback0
neighbor 3.3.3.3 remote-as 100
neighbor 3.3.3.3 update-source Loopback0
neighbor 3.3.3.3 route-reflector-client
neighbor 4.4.4.4 remote-as 100
neighbor 4.4.4.4 update-source Loopback0
neighbor 4.4.4.4 route-reflector-client
no auto-summary
R3:
router bgp 100
no synchronization
bgp log-neighbor-changes
network 33.33.33.33 mask 255.255.255.255
neighbor 1.1.1.1 remote-as 100
neighbor 2.2.2.2 remote-as 100
no auto-summary
R4:
router bgp 100
no synchronization
bgp log-neighbor-changes
network 44.44.44.44 mask 255.255.255.255
neighbor 1.1.1.1 remote-as 100
neighbor 2.2.2.2 remote-as 100
no auto-summary
As I said above, both R3 and R4 are dual-peered to R1 and R2. This provides redundancy in case one of the route reflectors fails. Of course, in our diagram, this is relatively useless because if the route reflector failed, the client hanging off of it would be disconnected as well. But in the real world with real redundant links, this would be a common design.
The question that should jump out at you is how R1 and R2 know they're both reflecting for the same set of clients.
Well... so far, they don't. In our design this physically can't cause a loop, but it makes for a messy BGP table.
R1#sh ip bgp 33.33.33.33
BGP routing table entry for 33.33.33.33/32, version 7
Paths: (2 available, best #2, table Default-IP-Routing-Table)
Flag: 0x820
Advertised to update-groups:
1 2
Local
3.3.3.3 (metric 156160) from 2.2.2.2 (22.22.22.22)
Origin IGP, metric 0, localpref 100, valid, internal
Originator: 33.33.33.33, Cluster list: 22.22.22.22
Local, (Received from a RR-client)
3.3.3.3 (metric 156160) from 3.3.3.3 (33.33.33.33)
Origin IGP, metric 0, localpref 100, valid, internal, best
So we learned the route twice: once from 3.3.3.3, which was expected, and again from 2.2.2.2. Since this is iBGP, the next hop isn't modified, so the traffic always goes to the right place, but this is messy regardless.
On a side note, I was thinking "I sure could cause havoc by introducing next-hop-self on R1 towards R2, and R2 towards R1". Interestingly, next-hop-self did nothing under these circumstances. It must be some idiot-proofing for route reflectors.
So, what do we do about this? Enter the cluster ID. When a route reflector reflects a route, it adds its cluster ID to the BGP cluster list attribute, seen above (Cluster list: 22.22.22.22). The cluster list says "I came through this group of iBGP route reflectors". If you fail to set the cluster ID, IOS will fill in the router ID in this field. Since the two route reflectors clearly have different router IDs, they're taking each other's routes and "double reflecting" them.
R1 & R2:
router bgp 100
bgp cluster-id 99.99.99.99
The cluster-id can be any arbitrary 32-bit number. All it needs to do is match on the redundant route reflectors, and you're all set.
R1#sh ip bgp 33.33.33.33
BGP routing table entry for 33.33.33.33/32, version 7
Paths: (1 available, best #1, table Default-IP-Routing-Table)
Advertised to update-groups:
1 2
Local, (Received from a RR-client)
3.3.3.3 (metric 156160) from 3.3.3.3 (33.33.33.33)
Origin IGP, metric 0, localpref 100, valid, internal, best
Now only one copy of the route.
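The before and after behavior can be modeled in a few lines of Python. Again, this is a rough sketch and not how IOS implements it: the `Route` class and the `reflect` and `accept` functions are hypothetical, and I'm assuming R1's router ID is 11.11.11.11, by analogy with R2's 22.22.22.22 seen in the output above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Route:
    """Hypothetical stand-in for a reflected iBGP route."""
    prefix: str
    cluster_list: Tuple[str, ...] = ()


def reflect(route: Route, cluster_id: str) -> Route:
    """Reflector side: prepend the local cluster ID when reflecting a route."""
    return Route(route.prefix, (cluster_id,) + route.cluster_list)


def accept(route: Route, cluster_id: str) -> bool:
    """Receiving reflector: discard a route that already carries our cluster ID."""
    return cluster_id not in route.cluster_list


r3_route = Route("33.33.33.33/32")

# Default behavior: cluster ID = router ID, so R1 and R2 use different values.
# 11.11.11.11 as R1's router ID is an assumption, not shown in the output.
dup_default = reflect(r3_route, cluster_id="22.22.22.22")
print(accept(dup_default, cluster_id="11.11.11.11"))  # R1 keeps the second copy

# With "bgp cluster-id 99.99.99.99" on both reflectors.
dup_shared = reflect(r3_route, cluster_id="99.99.99.99")
print(accept(dup_shared, cluster_id="99.99.99.99"))  # R1 drops the second copy
```

The first case matches the two-path table we saw earlier; the second matches the single-path table after the shared cluster ID was configured.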
Jeff Kronlage
Jeff,
I have a unique situation, and would like your thoughts on it. A customer is advertising a default route with an AS path of 65160 1234 within the L3VPN (AS1234 happens to be the AS of the PE that they are connected to).
The route reflector sees AS 1234 in the path and does not advertise it to other route reflector clients in AS1234. Is that expected behavior?
=============================
Customer RD : 2:2
Route Reflector: 192.168.1.10 < This Route reflector client is in AS 1234
Local AS: 111
PE that the customer is peered to: 10.10.10.10 (This PE is in AS1785)
Output below from route reflector in AS 111
2:2:0.0.0.0/0 (2 entries, 0 announced)
BGP /-101
Route Distinguisher: 2:2
Next hop type: Indirect
Address: 0x28b00e20
Next-hop reference count: 2
Source: 192.168.1.10
Protocol next hop: 10.10.10.10
Push 464
Indirect next hop: 2 no-forward
State:
Local AS: 111 Peer AS: 1234
Age: 3d 6:34:32 Metric: 0 Metric2: 87766
Task: BGP_111.192.168.1.10+62122
AS path: 65160 1234 I (Originator) (Looped: 1234)
Cluster list: 192.168.1.10
Originator ID: 10.10.10.10
Communities: target:2:2
VPN Label: 464
Localpref: 100
Router ID: 192.168.1.10
Indirect next hops: 1
Protocol next hop: 10.10.10.10 Metric: 87766
Push 464
Indirect next hop: 2 no-forward
Indirect path forwarding next hops: 2
Next hop type: Router
Next hop: 98.21.127.41 via ge-2/2/0.0
Next hop: 98.21.127.43 via ge-2/2/1.0
10.10.10.10/32 Originating RIB: inet.3
Metric: 87766 Node path count: 1
Forwarding nexthops: 2
Nexthop: 98.21.127.41 via ge-2/2/0.0
Nexthop: 98.21.127.43 via ge-2/2/1.0
You're going to have to help me out with the topology a little more. Normally, a route reflector advertises routes only within a single AS. Perhaps there's a VRF not pictured here (I'm assuming so, since MPLS is mentioned), but I'm curious why we have a router in AS 1234 passing routes to AS 111 and expecting them to be passed back to AS 1234. That doesn't sound like route reflection; it sounds like traditional eBGP. And yes, 1234 -> 111 -> 1234 would not be accepted, simply because eBGP will not accept routes with its own AS in the path. I suspect I'm missing something from your topology?
You did a very good job on this write up, thank you.