Sunday, September 7, 2014

CCIE v4 to v5: BGP NHT, SAT, FSD, Dynamic Neighbors, Multisession Transport Per AF

BGP Next Hop Tracking (NHT) is an on-by-default feature that notifies BGP of a change in routing for BGP prefix next-hops. This is important because previously such changes were only noticed by the BGP Scanner process, which runs every 60 seconds by default. Waiting up to 60 seconds to learn that a BGP route is no longer usable (because its next-hop has become unreachable) significantly hampers reconvergence. Instead of being timer-based, NHT makes the handling of next-hop changes event-driven.



EIGRP is peered on all routers on the 192.168.124.0/24 link.

Here's the relevant base BGP config:

R1:
router bgp 1
 bgp log-neighbor-changes
 neighbor 3.3.3.3 remote-as 3
 neighbor 3.3.3.3 ebgp-multihop 255
 neighbor 3.3.3.3 update-source Loopback0
 neighbor 4.4.4.4 remote-as 4
 neighbor 4.4.4.4 ebgp-multihop 255
 neighbor 4.4.4.4 update-source Loopback0

R3:
router bgp 3
 bgp log-neighbor-changes
 neighbor 1.1.1.1 remote-as 1
 neighbor 1.1.1.1 ebgp-multihop 255
 neighbor 1.1.1.1 update-source Loopback0
 neighbor 192.168.34.4 remote-as 4

R4:
interface Loopback1
 ip address 44.44.44.44 255.255.255.255

router bgp 4
 bgp log-neighbor-changes
 network 44.44.44.44 mask 255.255.255.255
 neighbor 1.1.1.1 remote-as 1
 neighbor 1.1.1.1 ebgp-multihop 255
 neighbor 1.1.1.1 update-source Loopback0
 neighbor 192.168.34.3 remote-as 3

In short, we're using eBGP multihop to keep the mock-up smaller. We have two paths from R1 to R4's 44.44.44.44:

R1 -> R4's 4.4.4.4 (and consequently to 44.44.44.44 in the same hop)
R1 -> R3's 3.3.3.3, then R3 to R4's 192.168.34.4 

The first route has one AS in its AS-PATH; the second route has two ASes and is less preferred.
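
As a rough illustration of that one tie-breaker, here is a minimal Python sketch that picks the path with the shorter AS-PATH. The dictionary and the function name are made up for this example, and it assumes every earlier best-path step (weight, local preference, and so on) is a tie, as it is in this lab.

```python
# AS-PATHs R1 receives for 44.44.44.44/32, keyed by the advertising neighbor
paths = {
    "4.4.4.4": [4],      # learned directly from R4
    "3.3.3.3": [3, 4],   # learned from R3, which learned it from R4
}

def shortest_as_path(candidates):
    """Return the neighbor whose AS-PATH contains the fewest ASes."""
    return min(candidates, key=lambda nbr: len(candidates[nbr]))

print(shortest_as_path(paths))
```

This only models the AS-PATH length comparison, not the full best-path algorithm.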

R1#sh ip bgp 44.44.44.44 bestpath
BGP routing table entry for 44.44.44.44/32, version 11
Paths: (2 available, best #1, table default)
  Advertised to update-groups:
     2
  Refresh Epoch 2
  4
    4.4.4.4 (metric 10880) from 4.4.4.4 (44.44.44.44)
      Origin IGP, metric 0, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0

Let's try this experiment without NHT enabled first:

R1(config)#router bgp 1
R1(config-router)# no bgp nexthop trigger enable

R1#debug ip routing
IP routing debugging is on

R4(config-if)#int lo0  ! this is the 4.4.4.4 interface (the next-hop for 44.44.44.44 from R1)
R4(config-if)#shut

Debug from R1 below
===============
*Sep 17 22:59:03.552: RT: delete route to 4.4.4.4 via 192.168.124.4, eigrp metric [90/10880]
*Sep 17 22:59:03.552: RT: no routes to 4.4.4.4, delayed flush
*Sep 17 22:59:03.552: RT: delete subnet route to 4.4.4.4/32
*Sep 17 22:59:03.552: RT: updating eigrp 4.4.4.4/32 (0x0)  :
    via 192.168.124.4 Gi1.124  0 1048578

*Sep 17 22:59:03.552: RT: rib update return code: 5
================

This happened as fast as EIGRP converged - very quickly.  So we know 4.4.4.4 isn't a valid route any longer, but what about 44.44.44.44?

R1#sh ip bgp 44.44.44.44 bestpath
BGP routing table entry for 44.44.44.44/32, version 11
Paths: (2 available, best #1, table default)
  Advertised to update-groups:
     2
  Refresh Epoch 2
  4
    4.4.4.4 (metric 10880) from 4.4.4.4 (44.44.44.44)
      Origin IGP, metric 0, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0

R1#ping 44.44.44.44
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 44.44.44.44, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

R1 still thinks the next-hop is 4.4.4.4, and it's Very Down.

I didn't time it this way deliberately, but remember the scanner runs every 60 seconds. About 51 seconds after we yanked the 4.4.4.4 next-hop, BGP finally figured out something was up and reconverged to the alternate path for 44.44.44.44 via R3.

*Sep 17 22:59:54.031: RT: updating bgp 44.44.44.44/32 (0x0)  :
    via 3.3.3.3   0 1048577

*Sep 17 22:59:54.031: RT: closer admin distance for 44.44.44.44, flushing 1 routes
*Sep 17 22:59:54.031: RT: add 44.44.44.44/32 via 3.3.3.3, bgp metric [20/0]

R1#ping 44.44.44.44
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 44.44.44.44, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/3 ms

R1#trace 44.44.44.44
Type escape sequence to abort.
Tracing the route to 44.44.44.44
VRF info: (vrf in name/id, vrf out name/id)
  1 192.168.124.3 4 msec 1 msec 0 msec
  2 192.168.34.4 2 msec *  2 msec

A 51-second reconvergence in a modern network is pretty awful.

R4(config-if)#int lo0
R4(config-if)#no shut

Let's re-add the next-hop trigger and try again.

R1(config-router)#router bgp 1
R1(config-router)#bgp nexthop trigger enable

R4(config-if)#int lo0
R4(config-if)#shut

Debug from R1 below
===============
*Sep 17 23:11:53.582: RT: delete route to 4.4.4.4 via 192.168.124.4, eigrp metric [90/10880]
*Sep 17 23:11:53.582: RT: no routes to 4.4.4.4, delayed flush
*Sep 17 23:11:53.582: RT: delete subnet route to 4.4.4.4/32
*Sep 17 23:11:53.582: RT: updating eigrp 4.4.4.4/32 (0x0)  :
    via 192.168.124.4 Gi1.124  0 1048578

*Sep 17 23:11:53.582: RT: rib update return code: 5
*Sep 17 23:11:58.582: RT: updating bgp 44.44.44.44/32 (0x0)  :
    via 3.3.3.3   0 1048577

*Sep 17 23:11:58.582: RT: closer admin distance for 44.44.44.44, flushing 1 routes
*Sep 17 23:11:58.582: RT: add 44.44.44.44/32 via 3.3.3.3, bgp metric [20/0]
===============

Note the BGP lines at the bottom of the output: this time we see the reconvergence happen in 5 seconds. Why 5 seconds?

The bgp nexthop trigger delay command defines how long the NHT process waits before updating BGP, and it defaults to 5 seconds. This timer is here to prevent BGP from being beaten up by a flapping IGP route. With a 5-second delay, the BGP process can't get bogged down by unnecessary updates.
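
To make the idea concrete, here is a toy Python model of a hold-down timer. It is not how IOS implements the trigger delay internally; it simply assumes that the first IGP change starts a timer and that any further changes arriving before it expires are absorbed into the same BGP update. The function name and the event times are hypothetical.

```python
def bgp_walks(event_times, delay):
    """Return the times at which the (modelled) BGP next-hop update runs.

    event_times: seconds at which the IGP route for a next-hop changed
    delay: hold-down in seconds, standing in for 'bgp nexthop trigger delay'
    """
    walks = []
    pending_until = None
    for t in sorted(event_times):
        # Only an event outside the current hold-down window starts a new timer
        if pending_until is None or t > pending_until:
            pending_until = t + delay
            walks.append(pending_until)
    return walks

flaps = [0, 1, 2, 3, 4]          # a next-hop flapping once per second
print(bgp_walks(flaps, delay=5)) # one update for the whole burst
print(bgp_walks(flaps, delay=0)) # one update per flap
```

In this model a longer delay trades reconvergence speed for fewer BGP updates during a flap, which is the same trade-off you make when tuning the real timer.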

Let's set it to 2 and try again.

R1(config-router)#bgp nexthop trigger delay 2

Debug from R1 below
===============
*Sep 17 23:18:40.167: RT: delete route to 4.4.4.4 via 192.168.124.4, eigrp metric [90/10880]
*Sep 17 23:18:40.167: RT: no routes to 4.4.4.4, delayed flush
*Sep 17 23:18:40.167: RT: delete subnet route to 4.4.4.4/32
*Sep 17 23:18:40.167: RT: updating eigrp 4.4.4.4/32 (0x0)  :
    via 192.168.124.4 Gi1.124  0 1048578

*Sep 17 23:18:40.167: RT: rib update return code: 5
*Sep 17 23:18:42.168: RT: updating bgp 44.44.44.44/32 (0x0)  :
    via 3.3.3.3   0 1048577

*Sep 17 23:18:42.168: RT: closer admin distance for 44.44.44.44, flushing 1 routes
*Sep 17 23:18:42.168: RT: add 44.44.44.44/32 via 3.3.3.3, bgp metric [20/0]
===============

Now converging in 2 seconds.
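
If you'd rather not do the subtraction by eye, a small Python sketch like the following works out the gap between the EIGRP delete and the BGP add for the three runs. The timestamps are copied from the R1 debugs above; the labels and the function name are just for illustration.

```python
from datetime import datetime

# (label, time the IGP route was deleted, time the BGP route was re-added)
RUNS = [
    ("NHT disabled (scanner)", "22:59:03.552", "22:59:54.031"),
    ("NHT, default delay",     "23:11:53.582", "23:11:58.582"),
    ("NHT, trigger delay 2",   "23:18:40.167", "23:18:42.168"),
]

def convergence_seconds(start, end):
    """Seconds between two IOS-style HH:MM:SS.mmm timestamps on the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return round(delta.total_seconds(), 3)

for label, start, end in RUNS:
    print(f"{label}: {convergence_seconds(start, end)} s")
```

The sketch assumes both timestamps fall on the same day, which is true for every run in this post.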

Selective Address Tracking (SAT) lets you apply a route-map to the NHT process.

The route-map determines which routes are acceptable for resolving a BGP next-hop.

For example, say 4.4.4.4 is your desired next-hop, but you also have a default route on your router. If you lose 4.4.4.4/32, do you want the router to consider 4.4.4.4 reachable via the default? Potentially not.

R1(config)#ip route 0.0.0.0 0.0.0.0 192.168.124.10  ! Deliberately non-existent next-hop

Without the route map....

R4(config-if)#int lo0
R4(config-if)#shut

This is hard to demonstrate, because the prefix might never recover. In our over-simplified mock-up, the prefix only goes away once the BGP session itself times out (because 4.4.4.4 is also our peer address); in a more realistic design this could be a permanent black hole.

We still have the bogus static default route in place:
ip route 0.0.0.0 0.0.0.0 192.168.124.10

R1(config-router)#ip prefix-list onlyloops seq 5 permit 0.0.0.0/0 ge 32
R1(config)#route-map SAT permit 10
R1(config-route-map)# match ip address prefix-list onlyloops
R1(config-route-map)#router bgp 1
R1(config-router)# bgp nexthop route-map SAT

This config allows only /32 routes to validate a next-hop.
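
To see why the default route stops counting, here is a minimal Python sketch of the same logic: a "ge 32" style filter plus a longest-prefix lookup. It is a simplified model, not IOS code. The RIB contents are a hypothetical snapshot of R1 after R4's Lo0 is shut, and the function names are invented for the example.

```python
import ipaddress

def permitted(route, base="0.0.0.0/0", ge=32):
    """Mimic 'permit <base> ge <ge>' for a single IPv4 route."""
    net = ipaddress.ip_network(route)
    return net.subnet_of(ipaddress.ip_network(base)) and net.prefixlen >= ge

def resolve(next_hop, rib, route_filter=None):
    """Longest-prefix match for next_hop, using only routes the filter permits."""
    addr = ipaddress.ip_address(next_hop)
    usable = [
        ipaddress.ip_network(r)
        for r in rib
        if route_filter is None or route_filter(r)
    ]
    covering = [n for n in usable if addr in n]
    if not covering:
        return None
    return str(max(covering, key=lambda n: n.prefixlen))

# Hypothetical snapshot of R1's RIB after 4.4.4.4/32 is withdrawn
rib = ["0.0.0.0/0", "3.3.3.3/32", "192.168.124.0/24"]

print(resolve("4.4.4.4", rib))             # no filter: falls back to the default
print(resolve("4.4.4.4", rib, permitted))  # filtered: nothing resolves it
print(resolve("3.3.3.3", rib, permitted))  # the /32 still resolves R3
```

Without the filter the dead next-hop still "resolves" through 0.0.0.0/0; with it, only a host route will do, so the path via 4.4.4.4 is no longer considered usable.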

R4(config-if)#int lo0
R4(config-if)#shut

Debug from R1 below
===============
*Sep 17 23:47:09.497: RT: delete route to 4.4.4.4 via 192.168.124.4, eigrp metric [90/10880]
*Sep 17 23:47:09.497: RT: no routes to 4.4.4.4, delayed flush
*Sep 17 23:47:09.497: RT: delete subnet route to 4.4.4.4/32
*Sep 17 23:47:09.497: RT: updating eigrp 4.4.4.4/32 (0x0)  :
    via 192.168.124.4 Gi1.124  0 1048578

*Sep 17 23:47:09.497: RT: rib update return code: 5
*Sep 17 23:47:11.498: RT: updating bgp 44.44.44.44/32 (0x0)  :
    via 3.3.3.3   0 1048577

*Sep 17 23:47:11.498: RT: closer admin distance for 44.44.44.44, flushing 1 routes
*Sep 17 23:47:11.499: RT: add 44.44.44.44/32 via 3.3.3.3, bgp metric [20/0]
===============

Now reconverging in 2 seconds again!

This is great for the downstream prefix, but what about the neighbor session itself?

This could work...
R1(config-router)#neighbor 4.4.4.4 fall-over

Except that pesky default is keeping 4.4.4.4 supposedly reachable....
For brevity, I'll tell you that as expected, when I shut the Lo0 interface on R4, 4.4.4.4 was pulled from R1's IGP and 44.44.44.44 was pulled from R1's BGP table.  However, the session is still up!

The same concept (even the same route-map) can be applied to the neighbor fall-over statement. This feature is called Fast Session Deactivation (FSD). 

R1(config-router)#neighbor 4.4.4.4 fall-over route-map SAT ! re-using SAT's route-map

Debug from R1 below
===============
*Sep 18 00:11:08.107: %BGP-5-NBR_RESET: Neighbor 4.4.4.4 reset (Route to peer lost)
*Sep 18 00:11:08.107: %BGP-5-ADJCHANGE: neighbor 4.4.4.4 Down Route to peer lost
*Sep 18 00:11:08.107: %BGP_SESSION-5-ADJCHANGE: neighbor 4.4.4.4 IPv4 Unicast topology base removed from session  Route to peer lost
===============

And the BGP session gets torn down immediately.

I'm not sure of the use case for this next feature, but it was recommended as a topic, so I looked at it. Multisession Transport per AF appears to be related to Multi-Topology Routing (MTR), but MTR should be solidly out-of-scope for CCIE R&S v5.

What multisession transport does is open a separate TCP session for each address family.

I've erased all the BGP config from the previous task.

R1:
ipv6 unicast-routing

router bgp 100
 bgp log-neighbor-changes
 neighbor 4.4.4.4 remote-as 100
 neighbor 4.4.4.4 update-source Loopback0
 !
 address-family ipv4
  neighbor 4.4.4.4 activate
 exit-address-family
 !
 address-family vpnv4
  neighbor 4.4.4.4 activate
  neighbor 4.4.4.4 send-community extended
 exit-address-family
 !
 address-family ipv6
  neighbor 4.4.4.4 activate
 exit-address-family

R4:
ipv6 unicast-routing

router bgp 100
 bgp log-neighbor-changes
 neighbor 1.1.1.1 remote-as 100
 neighbor 1.1.1.1 update-source Loopback0
 !
 address-family ipv4
  neighbor 1.1.1.1 activate
 exit-address-family
 !
 address-family vpnv4
  neighbor 1.1.1.1 activate
  neighbor 1.1.1.1 send-community extended
 exit-address-family
 !
 address-family ipv6
  neighbor 1.1.1.1 activate
 exit-address-family

R1(config-router-af)#do show tcp brief
TCB       Local Address               Foreign Address             (state)
7F612C7742A0  1.1.1.1.40234              4.4.4.4.179                 ESTAB

Three families, one TCP session.

R1(config-router)#neighbor 4.4.4.4 transport multi-session

R4(config-router)#neighbor 1.1.1.1 transport multi-session

The two sides of the session do need to agree on the setting.

R1:
*Sep 18 00:31:19.102: %BGP-5-ADJCHANGE: neighbor 4.4.4.4 Up
*Sep 18 00:31:25.940: %BGP-5-ADJCHANGE: neighbor 4.4.4.4 session 2 Up
*Sep 18 00:31:28.322: %BGP-5-ADJCHANGE: neighbor 4.4.4.4 session 3 Up

R1(config-router)#do show tcp brief
TCB       Local Address               Foreign Address             (state)
7F612C76F0F0  1.1.1.1.179                4.4.4.4.30092               ESTAB
7F612C76DE20  1.1.1.1.179                4.4.4.4.42417               ESTAB
7F612C76E788  1.1.1.1.48539              4.4.4.4.179                 ESTAB
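
If you want to check the session count without squinting at the table, a rough Python sketch like this counts established connections that use TCP port 179 on either end. The two sample strings are the show tcp brief outputs from this post; the function name is made up, and the parsing assumes the four-column layout shown here.

```python
BEFORE = """\
TCB       Local Address               Foreign Address             (state)
7F612C7742A0  1.1.1.1.40234              4.4.4.4.179                 ESTAB
"""

AFTER = """\
TCB       Local Address               Foreign Address             (state)
7F612C76F0F0  1.1.1.1.179                4.4.4.4.30092               ESTAB
7F612C76DE20  1.1.1.1.179                4.4.4.4.42417               ESTAB
7F612C76E788  1.1.1.1.48539              4.4.4.4.179                 ESTAB
"""

def count_bgp_sessions(show_tcp_brief):
    """Count ESTAB rows where the local or foreign port is 179 (BGP)."""
    count = 0
    for line in show_tcp_brief.splitlines():
        fields = line.split()
        # Data rows have exactly: TCB, local addr.port, foreign addr.port, state
        if len(fields) == 4 and fields[3] == "ESTAB":
            local_port = fields[1].rsplit(".", 1)[1]
            foreign_port = fields[2].rsplit(".", 1)[1]
            if "179" in (local_port, foreign_port):
                count += 1
    return count

print(count_bgp_sessions(BEFORE), count_bgp_sessions(AFTER))
```

One session before, three after: one per address family (ipv4, vpnv4, ipv6).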

Our last topic is BGP Dynamic Neighbors. Yes, automagic BGP peerings!

Erasing all the pre-existing BGP config again...

R1:
router bgp 100
 bgp log-neighbor-changes
 bgp listen range 192.168.124.0/24 peer-group PEERS
 neighbor PEERS peer-group
 neighbor PEERS remote-as 100
 neighbor PEERS password CISCO
 neighbor PEERS update-source Loopback0
 neighbor PEERS route-reflector-client
 bgp listen limit 3

R2-R4:
router bgp 100
 bgp log-neighbor-changes
 neighbor 192.168.124.1 remote-as 100
 neighbor 192.168.124.1 password CISCO

R1:
*Sep 18 00:38:24.696: %BGP-5-ADJCHANGE: neighbor *192.168.124.2 Up
*Sep 18 00:39:04.980: %BGP-5-ADJCHANGE: neighbor *192.168.124.4 Up
*Sep 18 00:39:05.932: %BGP-5-ADJCHANGE: neighbor *192.168.124.3 Up

iBGP doesn't get any faster to set up than that!

I've used the most obvious settings here - the dynamic "host" would normally be a route-reflector, and would normally require authentication. 

However, you can:
- Run multiple dynamic groups
- Listen to multiple ranges
- Use multiple address families (this works great for VPNv4!)
- Listen for more neighbors (I limited it to 3 above)
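
For intuition, here is a small Python sketch of the two checks the listen range and listen limit imply: is the source address inside the range, and is there still room for another dynamic peer? It is a simplified model under those two assumptions only (no password or AS checks), and the class and method names are invented for the example.

```python
import ipaddress

class DynamicListener:
    """Toy model of 'bgp listen range' plus 'bgp listen limit'."""

    def __init__(self, listen_range, limit):
        self.listen_range = ipaddress.ip_network(listen_range)
        self.limit = limit
        self.peers = set()

    def accept(self, source):
        """Return True if a session from this source address would be accepted."""
        if ipaddress.ip_address(source) not in self.listen_range:
            return False              # outside the listen range
        if source in self.peers:
            return True               # already a known dynamic peer
        if len(self.peers) >= self.limit:
            return False              # listen limit reached
        self.peers.add(source)
        return True

r1 = DynamicListener("192.168.124.0/24", limit=3)
for src in ["192.168.124.2", "192.168.124.4", "192.168.124.3",
            "192.168.124.5", "10.0.0.1"]:
    print(src, r1.accept(src))
```

The first three peers are accepted, the fourth hits the limit of 3, and the last one is outside the range.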

Cheers,

Jeff
