Tuesday, April 15, 2014

A Thorough Approach for Debugging MPLS L3 VPNs

I recently realized I needed a more organized approach to debugging MPLS L3 VPNs for the troubleshooting section. Drawing on a lot of the practice labs I've taken, I'm going to give a run-down of what I think is the fastest way to track down any problem.

First let's run down my list, then we'll pick it apart with an example below.

I'm going to assume the run-of-the-mill validation of "Host A needs to be able to ping host B".
Since we're talking high-level in the first segment, and with MPLS VPNs we're always talking about a sender and a receiver, I'm going to refer to the sender that's unable to reach the receiver as the originating router, and to the side that cannot be reached as the terminating router, to keep the direction clear.

Before you start debugging...

1) Validate the problem: ping <problem IP>
2) Find out if the problem is unidirectional. Run "debug ip icmp" on both the source and the destination, then ping both ways. If you're taking an INE lab, be sure logging is on too: logging console 7 and logging on.
3) From the originating router, run "sh ip route <problem IP>" and "sh ip cef <problem IP>". Sometimes some other route in the table is beating the MPLS route on AD or, worse, by being a more specific prefix. That makes it not an MPLS problem, and it's out of scope for this post.

Once you clear the starting checks, you want to validate whether or not you have the route in your routing table.

1) Are you importing the route into your VRF? Make sure the other side's exported route-target is being imported on the originating router's VRF.
2) Is the terminating router or terminating router's PE advertising the route?
3) Are route reflectors involved?  If you're relying on one route reflector to relay a route through another route reflector, you need to ensure the cluster-IDs are different.

The following items only apply if you're using OSPF as your PE->CE routing protocol:
4) If you're using OSPF as PE->CE, check for sham links. It's easy to break these and hard to spot them. Do a "sh run | s sham" on the PEs and see if any exist. If they do, run "show ip ospf sham-links".
5) If you're using OSPF as PE->CE, and the CE is also part of the VRF (the VRF itself exists on the CE), enable capability vrf-lite on the OSPF process on the CE.

If you don't have an internal route and you need one to win on AD against another route, also check:
6a) If OSPF is PE->CE, make sure the domain-id is set the same on all the PEs, or you'll end up with external routes across the MPLS cloud.
6b) If EIGRP is PE->CE, make sure your EIGRP AS number (process number) matches on the PE routers.

If you've checked all of that, you should have an appropriate route by now. What happens when you've got the route in your routing table, pointing the right direction, but the traffic just doesn't arrive on the far side? Now we start debugging MPLS itself.

1) sh run | s mpls on every PE and P device. Look for LDP filtering.  There are more elegant ways to find this, but this is the fastest.
2) From the PE on the originating side, run a "sh ip cef vrf <VRF NAME> <problem IP>". Is the correct PE listed as next-hop in the "via" field? If it's not, go investigate the PE that is originating the route; there may be more than one path (and one may not lead anywhere!)
3) If it's the correct PE from step 2, do show mpls forwarding-table <PE LDP ID>. Unless your PEs are L2 adjacent, you must have a tag listed for the PE, or "Pop Tag". If you don't, walk your adjacent routers to be sure mpls ip is enabled on every interface or OSPF MPLS auto-config is enabled. Make sure CEF is enabled on all P and PE devices - MPLS doesn't work without CEF. If necessary, re-check step 1 and make sure nothing is filtering tags. If still no problem is found, do a "show mpls ldp neighbor | i Peer" and make sure you have the correct count of neighbors.
4) Note the next-hop associated with the tag you identified in step 3. Open a command prompt on the next-hop and repeat step 3. Continue until you reach a "pop tag" for the terminating PE.
5) Check for Router-ID failures. LDP design can be picky that the mask in the routing table and the mask in the label match. This is most commonly an issue when OSPF is used as the MPLS IGP and your router ID is based off a loopback that is other than a /32. If this is the case, either change your loopback address to a /32 (if permitted), or change your OSPF network type to point-to-point so that the label mask and the OSPF mask match. This can also sometimes be an issue with summarized routes in other protocols (such as EIGRP), so be on the lookout there.
6) As a final check, see whether cost-community was disabled on the PE routers. It's possible to perform traffic engineering against the prefixes if it's been disabled, and then who knows what path your traffic might be taking? On the PEs, run sh run | i cost-community. Cost community is on by default, and you want it left on. The command should show nothing if it's enabled; if it's been disabled, you will find bgp bestpath cost-community ignore in the config.

Now let's walk through the scenarios that these verifications above can save you from.

1) Validate the problem: ping <problem IP>

This should be obvious, but I actually proctor a private TS test, and I'm amazed at the number of people that don't check what I put in front of them. In rare circumstances, the solution can be derived just from verifying the issue. And in a TS lab, you need to be sure you didn't somehow fix the problem at some other point.

2) Find out if the problem is unidirectional. Run "debug ip icmp" on both the source and the destination, then ping both ways. If you're taking an INE lab, be sure logging is on too: logging console 7 and logging on.

This is very important - so you "can't ping" the destination.  Do you know if your echo request isn't making it from origination to destination, or that the echo reply isn't making it from destination to origination? Don't waste time debugging the wrong flow. Quite regularly only one direction is failing.
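For reference, this is what the setup looks like on each end. R1 is a hypothetical name for the source; do the same on the destination:

R1(config)#logging console 7
R1(config)#logging on
R1(config)#end
R1#debug ip icmp

Now ping in each direction. Suppose the destination logs that it sent an echo reply, but the source never logs receiving one. In that case the request path works and the return path is the one to debug.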

3) From the originating router, run "sh ip route <problem IP>" and "sh ip cef <problem IP>". Sometimes some other route in the table is beating the MPLS route on AD or, worse, by being a more specific prefix. That makes it not an MPLS problem, and it's out of scope for this post.

This is easy to overlook. You may have the route in BGP, and the MPLS labels may be in good shape, but you're only getting a /24 across the MPLS VPN. Meanwhile your IGP is injecting a bogus /32 for the destination from a router behind you, and that /32 leads nowhere. Your packet is going the wrong direction.
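Here is a hypothetical illustration of that trap. A stray static host route on a router behind you, redistributed into the IGP, is enough to hijack the traffic. Routing uses longest match first, so the /32 beats the /24 learned across the MPLS VPN no matter what the AD is. All names and addresses below are made up:

! On a router behind the originating router (hypothetical)
ip route 192.168.1.7 255.255.255.255 10.9.9.9
router ospf 1
 redistribute static subnets

! On the originating router, both of these will now show the /32
CE1#sh ip route 192.168.1.7
CE1#sh ip cef 192.168.1.7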

1) Are you importing the route into your VRF? Make sure the other side's exported route-target is being imported on the originating router's VRF.

Originating router:
ip vrf VPN
 rd 1:1
 route-target export 1:1
 route-target import 3:3
 route-target import 7:7

Terminating router:
ip vrf VPN
 rd 3:3
 route-target export 2:2
 route-target import 1:1
 route-target import 7:7

This config above for "Originating router" is missing route-target import 2:2. The route target is an extended community carried in MP-BGP; if you don't import it into your VRF, you won't see the route. The RD is basically irrelevant here: as long as each VRF on a PE has its own, the RD doesn't matter for the import process.
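The fix for the example above is one line on the originating side. To see which route targets a prefix really carries, check the VPNv4 table on the terminating PE, which is where the prefix originates. The originating PE may not show the prefix at all: by default, an IOS PE drops VPNv4 routes whose route targets none of its VRFs import. The prefix 192.168.1.0 is assumed here, matching the CEF example further down.

Originating router:
ip vrf VPN
 route-target import 2:2

Terminating PE:
PE#show ip bgp vpnv4 all 192.168.1.0

The RT listed on the Extended Community line of that output is the one the originating VRF needs to import.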

2) Is the terminating router or terminating router's PE advertising the route?

This one sure got me once.  I'm looking and looking for an MP-BGP problem, and it turns out that the CE just didn't advertise the route to the PE.  Simple BGP error.
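Here's a sketch of what to check on a BGP CE. The AS numbers, neighbor address and device names are hypothetical. Remember that a network statement only advertises a prefix that exactly matches a route already in the CE's routing table, mask included:

router bgp 65007
 neighbor 10.0.57.5 remote-as 100
 network 192.168.1.0 mask 255.255.255.0

CE7#show ip bgp neighbors 10.0.57.5 advertised-routes
PE5#show ip bgp vpnv4 vrf VPN 192.168.1.0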

3) Are route reflectors involved?  If you're relying on one route reflector to relay a route through another route reflector, you need to ensure the cluster-IDs are different.

If you have two route reflectors in your MP-BGP topology, unless the PEs in question both peer with the same route reflector, you need to ensure that the route reflectors have different cluster IDs. In other words, if your MP-BGP topology looks like this:

RR1 <-- PE1 --> RR2 <-- PE2

This will work fine, even if the cluster IDs are the same, because RR2 will reflect the routes from PE1 to PE2 and vice-versa. However, if you have:

PE1 --> RR1 <--> RR2 <-- PE2

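Then you'll need separate cluster IDs. RR1 will reflect PE1's routes to RR2. However, RR2 will find its own cluster ID in the route's CLUSTER_LIST, treat it as a loop, and discard it. The same happens to PE2's routes in the other direction.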
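Setting the cluster ID is a one-liner under BGP on each route reflector. AS 100 and the IDs are hypothetical; any two different values will do:

! RR1
router bgp 100
 bgp cluster-id 1.1.1.1
!
! RR2
router bgp 100
 bgp cluster-id 2.2.2.2

By default the cluster ID is the BGP router ID, so two RRs typically only end up sharing one when someone has set it by hand.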

4) If you're using OSPF as PE->CE, check for sham links. It's easy to break these and hard to spot them. Do a "sh run | s sham" on the PEs and see if any exist. If they do, run "show ip ospf sham-links".

Sham links allow you to extend an OSPF area across the MPLS backbone (the "Super Area 0"). They are most commonly used to prefer an MPLS path over a back-door link. Topology aside, I've been bitten by broken sham links before, so look out for these. If you want to know more about them:
http://brbccie.blogspot.com/2012/12/ospf-pe-downward-bit-super-area-0.html
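For reference, here is a minimal sham-link sketch for one PE; the mirror image goes on the other PE. The VRF, addresses, process number and AS are hypothetical. The sham link has to be in the same area as the back-door link (area 0 is assumed here). Here's what to look for when one is broken:
- the endpoints aren't /32 loopbacks in the VRF;
- the endpoints are advertised into OSPF instead of only through MP-BGP;
- the source and destination addresses are reversed.

interface Loopback100
 ip vrf forwarding VPN
 ip address 100.1.1.1 255.255.255.255
!
router bgp 100
 address-family ipv4 vrf VPN
  network 100.1.1.1 mask 255.255.255.255
!
router ospf 10 vrf VPN
 area 0 sham-link 100.1.1.1 100.2.2.2 cost 1

PE1#show ip ospf sham-links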

5) If you're using OSPF as PE->CE, and the CE is also part of the VRF (the VRF itself exists on the CE), enable capability vrf-lite on the OSPF process on the CE.

The first time I ran into this I spent 5 hours debugging it. Some may say a waste of time, but I'll never forget it. In short: when a PE redistributes MP-BGP routes into OSPF, it sets the down (DN) bit on the LSAs it generates. An OSPF process running inside a VRF ignores LSAs with the DN bit set. You'll watch the routes arrive on the PE and get put in the OSPF process no problem. Then they hit the CE device(s), and if the CEs are in the VRF as well, the routes will be in the OSPF database but won't get put into the RIB/FIB. This is a loop prevention mechanism. To disable the check, use "capability vrf-lite" inside the OSPF process.
Also reference: http://brbccie.blogspot.com/2012/12/ospf-pe-downward-bit-super-area-0.html
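The fix itself is one line on the CE's VRF OSPF process. Process 10 and VRF VPN are assumed names:

router ospf 10 vrf VPN
 capability vrf-lite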

6a) If OSPF is PE->CE, make sure the domain-id is set the same on all the PEs, or you'll end up with external routes across the MPLS cloud.

This only matters if you're shooting for an internal route for some reason, and is more of a reminder than a big deal.
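By default IOS derives the domain ID from the OSPF process number, so the usual way to get this wrong is to use different process numbers on the PEs. Either match the process numbers, or set the domain ID explicitly to the same value on every PE. The process numbers and value below are hypothetical:

! PE1
router ospf 10 vrf VPN
 domain-id 0.0.0.10
!
! PE2
router ospf 20 vrf VPN
 domain-id 0.0.0.10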

6b) If EIGRP is PE->CE, make sure your EIGRP AS number (process number) matches on the PE routers.

This can make a slightly bigger difference, because EIGRP naturally deprefers external routes via a higher AD (170 for external versus 90 for internal). You may need an internal route in order to make the traffic cross the MPLS cloud. If the AS number doesn't match, you'll end up with external routes.
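In the VRF address family, the AS number comes from the autonomous-system command rather than the router eigrp number, so that's the value to compare on both PEs. The values below are hypothetical:

router eigrp 1
 address-family ipv4 vrf VPN
  autonomous-system 100
  network 10.0.0.0

The CE's own EIGRP AS also has to be 100 here, or it won't form an adjacency with the PE at all.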

1) sh run | s mpls on every PE and P device. Look for LDP filtering. There are more elegant ways to find this, but this is the fastest.

This is a bit of a hack, but it catches about 90% of LDP problems in < 60 seconds.  You can't beat it for speed. I'll show more about this below.
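The quick pass is the same one-liner on every box. If something looks off, a couple of other commands narrow it down without reading the whole config. Here 5.5.5.5 stands for the terminating PE's LDP ID, as in the walk-through that follows:

P1#show running-config | section mpls
P1#show mpls interfaces
P1#show mpls ldp bindings 5.5.5.5 32

show mpls interfaces confirms which interfaces actually run MPLS. The bindings output shows whether each LDP neighbor advertised a label for the prefix you care about.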

2) From the PE on the originating side, run a "sh ip cef vrf <VRF NAME> <problem IP>". Is the correct PE listed as next-hop in the "via" field? If it's not, go investigate the PE that is originating the route; there may be more than one path (and one may not lead anywhere!)

PE1#sh ip cef vrf VPN 192.168.1.7
192.168.1.0/24, version 8, epoch 0, cached adjacency 10.0.23.3
0 packets, 0 bytes
  tag information set
    local tag: VPN-route-head
    fast tag rewrite with Fa0/1, 10.0.23.3, tags imposed: {17 23}
  via 5.5.5.5, 0 dependencies, recursive
    next hop 10.0.23.3, FastEthernet0/1 via 5.5.5.5/32
    valid cached adjacency
    tag rewrite with Fa0/1, 10.0.23.3, tags imposed: {17 23}

The via field above shows the PE you're heading towards. Is it the correct PE? This threw me off something awful once. The prefix in question was endlessly looping off a 3rd PE and being re-advertised there, and that PE was being preferred. Boom, an hour gone debugging - if only I'd paid more attention to the output of "sh ip cef vrf VPN"!
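In the output above, "tags imposed: {17 23}" is the label stack. 17 matches the outgoing label toward 5.5.5.5 in the forwarding table below, so it's the transport label; 23 is the VPN label. When the via field points at the wrong PE, the VPNv4 table lists every path the PE knows for the prefix, with each next hop, and marks which one won:

PE1#show ip bgp vpnv4 vrf VPN 192.168.1.0

If a PE you didn't expect shows up as a next hop, go look at why it's advertising the prefix at all.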

Assuming it is the right PE listed above, you walk the MPLS labels from there:

PE1#sh mpls forwarding-table 5.5.5.5
Local  Outgoing    Prefix            Bytes tag  Outgoing   Next Hop
tag    tag or VC   or Tunnel Id      switched   interface
17     17          5.5.5.5/32        0          Fa0/1      10.0.23.3

Next hop is 10.0.23.3, via Fa0/1; that's P1:

P1#show mpls forwarding-table 5.5.5.5
Local  Outgoing    Prefix            Bytes tag  Outgoing   Next Hop
tag    tag or VC   or Tunnel Id      switched   interface
17     Untagged    5.5.5.5/32        13766      Fa0/1      10.0.34.4

There's the evil Untagged! Let's go see what's up on P2.

P2#sh run | s mpls
no mpls ldp advertise-labels
 mpls label protocol ldp
 mpls ip
 mpls label protocol ldp
 mpls ip

Note, we should have caught this in MPLS debugging step 1, but just in case you didn't...!
There are about three scenarios you want to look out for related to label advertisement:

no mpls ldp advertise-labels prevents any labels from being advertised at all.
That command can be used in combination with mpls ldp advertise-labels for <standard ACL>. The standard ACL can be (rather obviously) rigged to prevent the labels you need advertised from being advertised.
The final command is mpls label range <min> <max>. If you don't allow enough labels, the ones you need can end up not getting assigned at all.
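Here are sketches of all three, so you recognize them in a sh run. The ACL number, prefix and range are hypothetical. The "for" ACL variant typically only restricts anything once the blanket advertisement has been turned off, so the two lines usually appear together:

! 1) Nothing advertised at all
no mpls ldp advertise-labels
!
! 2) Only prefixes permitted by ACL 10 get advertised (5.5.5.5 would get no label)
access-list 10 permit 1.1.1.1
no mpls ldp advertise-labels
mpls ldp advertise-labels for 10
!
! 3) A label range too small for the number of prefixes
mpls label range 100 105

The fix for the first case is the global mpls ldp advertise-labels command.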

I've fixed the mpls ldp advertise-labels command above, and now we see the appropriate output on P1:

P1#show mpls forwarding-table 5.5.5.5
Local  Outgoing    Prefix            Bytes tag  Outgoing   Next Hop
tag    tag or VC   or Tunnel Id      switched   interface
17     17          5.5.5.5/32        0          Fa0/1      10.0.34.4

And on P2:

P2#show mpls forwarding-table 5.5.5.5
Local  Outgoing    Prefix            Bytes tag  Outgoing   Next Hop
tag    tag or VC   or Tunnel Id      switched   interface
17     Pop tag     5.5.5.5/32        508        Fa0/1      10.0.45.5

We see "Pop tag".  Pop tag is OK, it's just part of the Penultimate Hop Pop process.
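To confirm that P2 is popping because the egress PE asked it to, look at the bindings on P2. The remote binding from 5.5.5.5's LDP ID should show imp-null (implicit null), which is how the egress PE requests penultimate hop popping:

P2#show mpls ldp bindings 5.5.5.5 32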

3) If it's the correct PE from step 2, do show mpls forwarding-table <PE LDP ID>. Unless your PEs are L2 adjacent, you must have a tag listed for the PE, or "Pop Tag". If you don't, walk your adjacent routers to be sure mpls ip is enabled on every interface or OSPF MPLS auto-config is enabled. Make sure CEF is enabled on all P and PE devices - MPLS doesn't work without CEF. If necessary, re-check step 1 and make sure nothing is filtering tags. If still no problem is found, do a "show mpls ldp neighbor | i Peer" and make sure you have the correct count of neighbors.

I've seen some nasty, nasty things done with VACLs on the layer 2 switches between routers on practice labs.  It's not much of a stretch to think they'd block LDP.  The config would look perfect and your adjacency simply wouldn't come up.  Count how many adjacencies you're expecting from the diagram, and make sure you get a good head count:

P1#show mpls ldp neigh | i Peer
    Peer LDP Ident: 7.7.7.7:0; Local LDP Ident 10.0.37.3:0
    Peer LDP Ident: 2.2.2.2:0; Local LDP Ident 10.0.37.3:0
    Peer LDP Ident: 192.168.49.4:0; Local LDP Ident 10.0.37.3:0

If you're missing one, investigate the adjacency.
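LDP discovery uses UDP 646 hellos sent to 224.0.0.2, and the session itself runs over TCP 646, so those are the ports to look for in any ACL between the routers. Below is a hypothetical VACL of the kind that would do this on a Catalyst sitting between two routers. The names and VLAN 34 are made up:

ip access-list extended LDP_TRAFFIC
 permit udp any any eq 646
 permit tcp any any eq 646
 permit tcp any eq 646 any
!
vlan access-map BREAK_LDP 10
 match ip address LDP_TRAFFIC
 action drop
vlan access-map BREAK_LDP 20
 action forward
!
vlan filter BREAK_LDP vlan-list 34

On the routers, show mpls ldp discovery tells you whether hellos are being received on each interface, which helps separate a discovery problem from a session problem.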

And a shout out to my friend Keith Chayer, who reminded me to check for CEF being enabled as well. Note that you'll be missing labels if CEF is disabled on the MPLS transit path - at least LDP is smart enough to tell its neighbors "I'm broken - don't use me".
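Checking for this takes seconds. P1 is just an example device. The include filter catches both a global no ip cef and a per-interface no ip route-cache cef:

P1#show ip cef summary
P1#sh run | i cef
P1(config)#ip cef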

4) Note the next-hop associated with the tag you identified in step 3. Open a command prompt on the next-hop and repeat step 3. Continue until you reach a "pop tag" for the terminating PE.

I covered this above.

5) Check for Router-ID failures. LDP design can be picky that the mask in the routing table and the mask in the label match. This is most commonly an issue when OSPF is used as the MPLS IGP and your router ID is based off a loopback that is other than a /32. If this is the case, either change your loopback address to a /32 (if permitted), or change your OSPF network type to point-to-point so that the label mask and the OSPF mask match. This can also sometimes be an issue with summarized routes in other protocols (such as EIGRP), so be on the lookout there.

This is reasonably self-explanatory.  The route prefix length and the LDP prefix length need to match. OSPF is the common culprit.  
Reference: http://brbccie.blogspot.com/2013/11/mini-why-does-ldp-require-32-loopback.html
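Here are the two fixes, sketched on a hypothetical Loopback0 configured as 5.5.5.5/24:

! Option 1: make the loopback a /32 (if permitted)
interface Loopback0
 ip address 5.5.5.5 255.255.255.255
!
! Option 2: keep the /24 and have OSPF advertise the real mask
interface Loopback0
 ip ospf network point-to-point

By default, OSPF advertises a loopback as a /32 host route whatever its configured mask. LDP, however, binds the label to the connected /24, which is why the two don't line up.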

6) As a final check, see whether cost-community was disabled on the PE routers. It's possible to perform traffic engineering against the prefixes if it's been disabled, and then who knows what path your traffic might be taking? On the PEs, run sh run | i cost-community. Cost community is on by default, and you want it left on. The command should show nothing if it's enabled; if it's been disabled, you will find bgp bestpath cost-community ignore in the config.

I got this on a mock lab once, as well. If the PEs are disabling cost community, you need to ask yourself why: is this mandatory traffic engineering, or are they just trying to steer routes in the wrong direction?

Reference: http://brbccie.blogspot.com/2012/12/bgp-cost-community-eigrp-soo-and.html
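If you find it disabled and the task doesn't require it, remove the line under the PE's BGP process. AS 100 is hypothetical:

router bgp 100
 no bgp bestpath cost-community ignore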

/* Addition 11/27/14 - I apologize for not inserting this more thoroughly in the blog, but time doesn't permit right now - be sure to look for import or export maps on the VRF. It's possible to define a route-map that filters prefixes imported into or exported from the VRF. The syntax is not particularly complex:

! The prefix-list denies everything, so nothing matches the permit clause
! and the route-map's implicit deny drops every route being imported
ip prefix-list IMPORT_PL seq 5 deny 0.0.0.0/0 le 32
route-map SNAFU permit 10
 match ip address prefix-list IMPORT_PL

vrf definition VRFTEST
 rd 1:1
 route-target export 1:1
 route-target import 1:1
 !
 address-family ipv4
  import map SNAFU

*/
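For comparison, here is a hypothetical import map that behaves the way you'd usually want: it blocks one prefix and lets everything else through. The names and prefix are made up. show vrf detail should list any import or export route-map attached to the VRF, so it's a quick place to check:

ip prefix-list BLOCK_ONE seq 5 permit 192.168.99.0/24
route-map IMPORT_SOME deny 10
 match ip address prefix-list BLOCK_ONE
route-map IMPORT_SOME permit 20
!
vrf definition VRFTEST
 address-family ipv4
  import map IMPORT_SOME

PE1#show vrf detail VRFTEST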

Cheers,

Jeff Kronlage
