Thursday, November 7, 2013

[mini] Why does LDP "require" a /32 Loopback?

A few days ago I asked a coworker why LDP sessions had issues if they weren't peered on /32s. He answered that it doesn't have to be a /32, but the IGP and LDP have to agree on the mask length. So I asked the more specific question: why do they have to agree on the mask length? He didn't know, and neither did I.

Everyone seems to know that /32s are best practice for the LDP router ID.  But it's hard to find a good, clear explanation of why this is.

Let's start with some obvious facts.

- "The router considers all the IP addresses of all operational interfaces.... If these addresses include loopback interface addresses, the router selects the largest loopback address." http://www.cisco.com/en/US/docs/ios/12_4t/12_4t2/ftldp41.html#wp1654686

As always, my posts are geared for the CCIE lab, and it's a fair bet most of your gear in the lab is going to have a loopback. So expect the router ID to be a loopback unless it's specified otherwise.

- You can specify the interface with mpls ldp router-id <interface>. Use this command if you don't want the router ID to be a loopback, or if you want one loopback chosen over another. If you want to change the router ID while LDP is already up, you have to add the force keyword, i.e. mpls ldp router-id lo7 force. Without force, the new router ID isn't used until the next time the router has to pick one, for example when the interface supplying the current router ID goes down or the router reloads.

- You can set the range of labels that LDP is allowed to use with mpls label range <lower> <upper>. I find this useful when debugging, because you can make each router's labels match its router number, which makes the output easier to read. LDP show commands are not always easy to interpret if you're not used to reading them.
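As a rough sketch of the two commands above, this is about what I'd expect to configure on R2 in a lab like this one. The Loopback0 interface name and the 200-299 range are assumptions picked to fit the "labels match the router number" idea; they are not copied from the lab configs.

```
! R2 (sketch; Loopback0 and the 200-299 range are assumed, not taken from the lab)
mpls label protocol ldp
! Pin the LDP router ID to Loopback0; "force" applies the change right away
mpls ldp router-id Loopback0 force
! Local labels allocated by R2 will fall in 200-299
mpls label range 200 299
```

Depending on the IOS version, a changed range may not apply to labels that are already allocated, so it's simplest to set the range before enabling MPLS on the interfaces.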

-  "The LDP default behavior is to allocate local labels for all non-BGP prefixes."
http://www.cisco.com/en/US/docs/ios/12_4t/12_4t2/ftldp41.html#wp1654686

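So what does that mean to us? It might be better phrased as: "By default, LDP allocates a local label for every prefix in the routing table, whichever protocol won on administrative distance, as long as the route didn't come from BGP." The important detail is that the label is bound to the prefix exactly as it appears in the routing table, mask included.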

- This problem is most commonly seen with OSPF (although you could see it from a summary route as well).  The sure-fire way to demonstrate it is to create a /24 loopback and not change the default network type.  OSPF automatically uses network type LOOPBACK, which is always advertised as a /32.
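To reproduce this, a loopback along these lines should do it. The address is the one R1 uses in the walkthrough that follows; the OSPF area number and the network statement are my assumptions, so adjust them to your topology.

```
! R1 (sketch; the OSPF area and network statement are assumptions)
interface Loopback0
 ip address 11.11.11.11 255.255.255.0
 ! no "ip ospf network" command here, so OSPF keeps the default LOOPBACK type
!
router ospf 1
 network 11.11.11.11 0.0.0.0 area 0
```

With this in place, R1's own routing table holds a connected /24, while its OSPF neighbors learn a /32.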

- With MPLS VPNs, BGP actually distributes the labels for the VRFs, not LDP.  You learn the stacked VRF tag, relevant only to the egress PE, from BGP.  You also learn the global routing table's next hop.  The next-hop is used to find out the LDP label.

Let's take a look at how this plays out.

R3 is trying to reach R1 in VRF CCIE.  R3's IP address is 3.3.3.3 and R1's IP address is 1.1.1.1.  R2 is sitting in the middle of the two.

R3#ping vrf CCIE 1.1.1.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

As we can see, the ping is failing.

R3#sh ip route vrf CCIE 1.1.1.1
Routing entry for 1.1.1.1/32
  Known via "bgp 100", distance 200, metric 0, type internal
  Last update from 11.11.11.11 00:26:04 ago
  Routing Descriptor Blocks:
  * 11.11.11.11 (Default-IP-Routing-Table), from 22.22.22.22, 00:26:04 ago
      Route metric is 0, traffic share count is 1
      AS Hops 0

We have a route to reach it.

R3#show ip cef vrf CCIE 1.1.1.1
1.1.1.1/32, version 3, epoch 0, cached adjacency 192.168.23.2
0 packets, 0 bytes
  tag information set
    local tag: VPN-route-head
    fast tag rewrite with Fa0/0, 192.168.23.2, tags imposed: {200 103}
  via 11.11.11.11, 0 dependencies, recursive
    next hop 192.168.23.2, FastEthernet0/0 via 11.11.11.11/32
    valid cached adjacency
    tag rewrite with Fa0/0, 192.168.23.2, tags imposed: {200 103}

I used the mpls label range command (mentioned above) so that each router's labels start with its own router number. In this case, we should be using an MPLS "transit" tag of 200 (allocated by R2) and an MPLS "VRF" tag of 103 (allocated by R1).

R3#show mpls ldp bindings | b 11.11.11.11
  tib entry: 11.11.11.11/32, rev 6
        local binding:  tag: 300
        remote binding: tsr: 22.22.22.22:0, tag: 200
<output omitted>

We know that tag 200 is R2's label for R1's loopback IP in the global routing table (11.11.11.11).

R3#show mpls forwarding-table 11.11.11.11
Local  Outgoing    Prefix            Bytes tag  Outgoing   Next Hop
tag    tag or VC   or Tunnel Id      switched   interface
300    200         11.11.11.11/32      0          Fa0/0      192.168.23.2

We know that means sending traffic out Fa0/0 towards R2 (192.168.23.2) with tag 200.

Ok, so this router should be able to send traffic, right?

R2#debug mpls packet
MPLS packet debugging is on

R3#ping vrf CCIE 1.1.1.1 rep 2 timeout 1
Type escape sequence to abort.
Sending 2, 100-byte ICMP Echos to 1.1.1.1, timeout is 1 seconds:
..
Success rate is 0 percent (0/2)

R2#
*Mar  1 00:36:08.651: MPLS: Fa0/1: recvd: CoS=6, TTL=255, Label(s)=0
*Mar  1 00:36:09.067: MPLS: Fa0/1: recvd: CoS=6, TTL=255, Label(s)=0

R2 gets the MPLS packet just fine! And that's all it does. Notice the debug doesn't say anything about forwarding it on.

R2#show mpls ldp binding | b 11.11.11.11
  tib entry: 11.11.11.11/32, rev 10
        local binding:  tag: 200
        remote binding: tsr: 33.33.33.33:0, tag: 300
<output omitted>

We see R2 has locally bound tag 200 for 11.11.11.11, and has received a tag from R3 for 11.11.11.11, but ... no tag from R1?

Let's look at the routing tables.

R2#sh ip route 11.11.11.11
Routing entry for 11.11.11.11/32
  Known via "ospf 1", distance 110, metric 2, type intra area
  Last update from 192.168.12.1 on FastEthernet0/0, 00:00:02 ago
  Routing Descriptor Blocks:
  * 192.168.12.1, from 11.11.11.11, 00:00:02 ago, via FastEthernet0/0
      Route metric is 2, traffic share count is 1

R2 sees this as a /32.

R3#sh ip route 11.11.11.11
Routing entry for 11.11.11.11/32
  Known via "ospf 1", distance 110, metric 3, type intra area
  Last update from 192.168.23.2 on FastEthernet0/0, 00:39:16 ago
  Routing Descriptor Blocks:
  * 192.168.23.2, from 11.11.11.11, 00:39:16 ago, via FastEthernet0/0
      Route metric is 3, traffic share count is 1

R3 sees this as a /32.  Consequently, R3 has no problem sending the MPLS packet to R2.

R1#sh ip route 11.11.11.11
Routing entry for 11.11.11.0/24
  Known via "connected", distance 0, metric 0 (connected, via interface)
  Routing Descriptor Blocks:
  * directly connected, via Loopback0
      Route metric is 0, traffic share count is 1

And R1 sees it as a ... /24 connected route. As mentioned above, OSPF is the common culprit here. It advertises a /32 to everyone else, while the local router still has the loopback as a /24 in its own routing table. In fact...

R2#sh mpls ldp binding | b 11.11.11.0/24
  tib entry: 11.11.11.0/24, rev 11
        remote binding: tsr: 11.11.11.11:0, tag: exp-null
<output omitted>

R1 is advertising a label binding for the /24 to R2. LDP bindings work a bit differently from the routing table: R2's LDP process doesn't do a longest-match lookup toward R1. It matches labels to prefixes exactly, and 11.11.11.0/24 and 11.11.11.11/32 are different prefixes. R2's route is the /32, but the only binding it has from R1 is for the /24. So R2 has no label from its next hop for 11.11.11.11/32, the labeled path breaks at R2, and the packet never makes it to R1.
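If you want to see the mismatch for yourself, these are the checks I'd run on R2. They are ordinary IOS show commands; I haven't pasted output for them, and the exact wording in the outgoing-label column varies by IOS version.

```
! R2: the route is a /32 learned from OSPF
show ip route 11.11.11.11
! R2: bindings for the /32 (no remote binding from R1 expected while broken)
show mpls ldp bindings 11.11.11.11 32
! R2: bindings for the /24 (this is the one R1 actually advertises)
show mpls ldp bindings 11.11.11.0 24
! R2: LFIB entry for the /32; look at the outgoing label toward 192.168.12.1
show mpls forwarding-table 11.11.11.11 32
```

The thing to compare is the prefix length in the first command against the prefix lengths for which R1 (the next hop) has actually sent a binding.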

The fix is to just make the two prefix lengths the same. They don't need to be /32s!  The easiest way to make this happen in this scenario is to change the OSPF network type away from LOOPBACK and stop forcing the /32 advertisement:

R1(config)#int lo0
R1(config-if)#ip ospf network point-to-point

R2#sh mpls ldp binding | b 11.11.11.0/24
  tib entry: 11.11.11.0/24, rev 16
        local binding:  tag: 203
        remote binding: tsr: 11.11.11.11:0, tag: exp-null
        remote binding: tsr: 33.33.33.33:0, tag: 305
<output omitted>

We can see R2 now has bindings from both R1 and R3 for the same prefix, 11.11.11.0/24, alongside its own local binding.

R3#ping vrf CCIE 1.1.1.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 60/66/76 ms

And forwarding works end-to-end.

In a nutshell: LDP binds labels to prefixes, meaning the IP address and the subnet mask together. The prefix length has to match at every hop for the prefix to be part of the same MPLS forwarding path. However, the prefix length does not have to be /32; that's just a good, safe practice.
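For completeness, the other way to make the lengths agree is the best-practice one: give the loopback a /32 in the first place, so the connected route and the OSPF advertisement are the same prefix. A sketch, assuming the same Loopback0 on R1:

```
! R1 (sketch): alternative fix, make the connected route a /32 as well
interface Loopback0
 ip address 11.11.11.11 255.255.255.255
 ! with a /32 mask the default OSPF LOOPBACK network type is harmless,
 ! because the advertised /32 now matches the connected /32
```

This is why /32 loopbacks are the safe habit: there is no way for the IGP and LDP to disagree about the mask.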

3 comments:

  1. Hi, thanks for the great article, but I have heard of some conditions like this:
    "The LDP router ID and the BGP router ID should be the same if the SP is using labels only for loopbacks. If labels are generated for each and every route, then there is no problem at all."

    Can you please clarify these conditions?
