Sunday, December 2, 2012

OSPF PE: Downward bit, Super Area 0, Domain IDs, capability vrf-lite, sham links

This post will bite off quite a lot.  I wanted to write one post that covers everything involved in using OSPF as a PE to CE routing protocol.

Let me begin by saying... what a disastrously bad idea doing this is.  BGP is the obvious PE to CE routing protocol.  I've never deployed OSPF as a PE to CE protocol in production, but I know someone who has, and he hated it too.  Even the service provider (AT&T) that offered the OSPF option won't let you opt for it any longer if you're a new customer.  The only argument I've heard for using anything besides BGP - that actually made sense - is if you have a great many routers running basic IOS images that don't offer BGP as a routing option.

The reason it's a disastrously bad idea is because it's too ambitious.  To me, it feels like the designers sat down with the concept of converting a large layer 2 frame-relay OSPF network natively to MPLS without having to rethink the OSPF design.  With all the band-aids available, you can keep your area design intact, even if it makes no sense whatsoever in an MPLS world.

In a nutshell, these are the "add-ons" we'll be looking at:
- The OSPF Down Bit - Designed to prevent loops from forming in an OSPF area that's multihomed to the MPLS backbone.
- Super Area 0 - The MPLS network will be treated as an "area 0" in and of itself.  This is in case your areas become disconnected from area 0 due to the migration to MPLS.  This way, each area will always be attached to area 0.  Can be disabled with "capability vrf-lite".
- Sham Links - Creates a control-plane intra-area link over the Super Area 0.  Can be useful for traffic engineering.
- Domain-ID (community) - Controls whether or not routes should be considered inter-area or external.  The domain-id is populated by the router process number by default if it's not specified.  Same Domain-ID = inter-area, different = external.  It assumes different means that they should be treated as separate OSPF processes.

This is the diagram we'll be referencing. 

 

The way we're set up out of the gate, all CEs belong to the same VRF, named "VPN".  They all share all their routes with all the other sites.  MPLS and MP-BGP are functional and redistributing all routes from the customer.

First, let's take a high-level look at how OSPF is treated through an MPLS tunnel.  Let's look from CE3's perspective, as it only has its own routes, plus the ones that came via MP-BGP.


We see a couple of type 1 & 2 LSAs, and a whole bunch of type 3. 

CE3#sh run | s router ospf
router ospf 1
 log-adjacency-changes
 network 10.0.3.0 0.0.0.255 area 1
 network 44.44.44.44 0.0.0.0 area 1

CE1#sh run | s router ospf
router ospf 1
 log-adjacency-changes
 network 10.0.1.0 0.0.0.255 area 1
 network 22.22.22.22 0.0.0.0 area 1
 network 192.168.1.0 0.0.0.255 area 1
We can specifically see from the screenshot that 22.22.22.22 is being introduced to area 1 on CE1, yet it's showing up as inter-area on CE3.  Why type 3s on CE3?

This has to do with the way the MP-BGP is treated inside OSPF, and the super area 0.  The diagram would be logically represented like this:



Now it's more obvious why routes coming from CE1 towards CE3 would show up as summary type 3's -- they are logically going across another area, "super 0".
 
Now let's take a look at the downward/DN bit.  In this lab, the DN bit is only seen on type 3 LSAs (RFC 4576 also defines it for type 5 and type 7 LSAs).  The DN bit is for loop prevention.  When CE3 advertises its loopback interface into the MPLS cloud, PE1 injects it into area 1, and it floods through CE1 to the area's other PE, which would logically try to re-advertise this LSA back into the MPLS cloud.  That would be a bad thing.  The DN bit prevents this.  Here's how.
 
When PE1 re-introduces CE3's LSA back into OSPF, it sets the DN bit.
 
 
Note "Options: (No TOS-capability, DC, Downward)"
 
Now when (or if) PE2 receives this LSA, it won't redistribute it back into MP-BGP.  In fact, it doesn't even consider this a routable LSA.  This can be a problem if you have an LSA with the DN bit set and somewhere down the road you're using "VRF Lite" (VRF without MP-BGP).  We'll look at that more closely later in this post, but this check can be disabled with capability vrf-lite.

If the downward bit isn't set, the LSA is considered Upward, but that's difficult to lab in my scenario.  You'd need a type 3 LSA that hadn't crossed the MP-BGP cloud.

Let's look at the domain-id next.

The domain-id determines whether or not the PE should consider the remote OSPF routes to be part of the local OSPF domain, or whether they came from a separate OSPF process, and should be considered external.

By default, domain-id is set by the OSPF process number on the PE.  So if you're using "router ospf 2" on the PEs, like we are, the domain-id is 2.  If the domain-ids match on both sides of an MPLS connection, you'll end up with type 3 LSAs.  If they're different, you end up with type 5 LSAs.  Let's take a look...

PE1#show ip bgp vpnv4 vrf VPN 44.44.44.44
BGP routing table entry for 1:1:44.44.44.44/32, version 19
Paths: (1 available, best #1, table VPN)
  Not advertised to any peer
  Local
    4.4.4.4 (metric 12) from 4.4.4.4 (4.4.4.4)
      Origin incomplete, metric 11, localpref 100, valid, internal, best
      Extended Community: RT:1:1 OSPF DOMAIN ID:0x0005:0x000000020200
        OSPF RT:0.0.0.1:2:0 OSPF ROUTER ID:10.0.3.1:0
      mpls labels in/out nolabel/22

0x00000002 is our domain-id.  Let's change the domain-id manually on PE3 and see what we get then.

PE3(config)#router ospf 2
PE3(config-router)#domain-id 0.0.0.3
I've done a soft-clear on BGP and waited a minute or so...

PE1#show ip bgp vpnv4 vrf VPN 44.44.44.44
BGP routing table entry for 1:1:44.44.44.44/32, version 21
Paths: (1 available, best #1, table VPN)
Flag: 0xA00
  Not advertised to any peer
  Local
    4.4.4.4 (metric 12) from 4.4.4.4 (4.4.4.4)
      Origin incomplete, metric 11, localpref 100, valid, internal, best
      Extended Community: RT:1:1 OSPF DOMAIN ID:0x0005:0x000000030200
        OSPF RT:0.0.0.1:2:0 OSPF ROUTER ID:10.0.3.1:0
      mpls labels in/out nolabel/22
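
If you'd rather not squint at hex, here's a minimal Python sketch that picks apart the two OSPF extended communities from the output above.  The function names are mine, not anything from IOS.  Splitting the domain-id value into a 4-byte ID plus two trailing bytes simply follows how this post reads it (0x00000002, then 0x00000003); the trailing 0200 is left undecoded.  The route-type meanings are the ones listed in RFC 4577.

```python
def parse_domain_id(text):
    """Parse 'OSPF DOMAIN ID:0x0005:0x000000030200' as printed by IOS."""
    type_hex, value_hex = text.split(":")[-2:]
    value = bytes.fromhex(value_hex[2:].zfill(12))  # 6-byte value field
    return {
        "type": int(type_hex, 16),
        "domain_id": int.from_bytes(value[:4], "big"),  # assumption: first 4 bytes
        "trailing": value[4:].hex(),                    # left undecoded here
    }

# Route types carried in the OSPF RT community, per RFC 4577.
ROUTE_TYPES = {1: "intra-area", 2: "intra-area", 3: "inter-area",
               5: "external", 7: "NSSA"}

def parse_ospf_rt(text):
    """Parse 'OSPF RT:0.0.0.1:2:0' into area, route type and options."""
    _, area, route_type, options = text.split(":")
    return {
        "area": area,
        "route_type": ROUTE_TYPES[int(route_type)],
        "options": int(options),
    }

print(parse_domain_id("OSPF DOMAIN ID:0x0005:0x000000030200"))
print(parse_ospf_rt("OSPF RT:0.0.0.1:2:0"))
```

So the route left CE3's side as an intra-area route in area 0.0.0.1, and only the domain-id changed between the two outputs.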

There's our new ID.  Now how do the routes look?

CE1#show ip ospf database
<output omitted>

                Type-5 AS External Link States
Link ID         ADV Router      Age         Seq#       Checksum Tag
10.0.3.0        10.0.1.1        97          0x80000001 0x0069FB 3489661028
10.0.3.0        10.0.2.1        100         0x80000001 0x006202 3489661028
44.44.44.44     10.0.1.1        98          0x80000001 0x008136 3489661028
44.44.44.44     10.0.2.1        100         0x80000001 0x007A3C 3489661028
Hey, type 5 LSAs!
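
The behavior so far boils down to one comparison on the receiving PE.  Here's a rough Python sketch of it.  The function name and arguments are made up for illustration, and it deliberately ignores NSSA areas and sham links.

```python
def lsa_type_sent_to_ce(local_domain_id, remote_domain_id, remote_route_type):
    """Return the LSA type a PE originates toward its CE for a VPNv4 route.

    remote_route_type is the route type carried in the OSPF RT community
    (1 or 2 = intra-area, 3 = inter-area, 5 = external).
    """
    same_domain = local_domain_id == remote_domain_id
    if same_domain and remote_route_type in (1, 2, 3):
        return 3  # summary LSA: the route "crossed" Super Area 0
    return 5      # different domain, or already external: AS-external LSA


# Matching domain-ids (both PEs on "router ospf 2"): type 3.
print(lsa_type_sent_to_ce(2, 2, 2))
# After "domain-id 0.0.0.3" on PE3: type 5.
print(lsa_type_sent_to_ce(2, 3, 2))
```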

CE1#show ip ospf database extern 44.44.44.44

            OSPF Router with ID (22.22.22.22) (Process ID 1)

                Type-5 AS External Link States

  Routing Bit Set on this LSA
  LS age: 483
  Options: (No TOS-capability, DC)
  LS Type: AS External Link
  Link State ID: 44.44.44.44 (External Network Number )
  Advertising Router: 10.0.1.1
  LS Seq Number: 80000001
  Checksum: 0x8136
  Length: 36
  Network Mask: /32
        Metric Type: 2 (Larger than any link state path)
        TOS: 0
        Metric: 11
        Forward Address: 0.0.0.0
        External Route Tag: 3489661028
Type 5 E2s, at that.  If you want an E1 instead, change the way PE1 redistributes into OSPF:

PE1(config)#router ospf 2
PE1(config-router)#redistribute bgp 100 subnets metric-type 1
CE1#show ip ospf database extern 44.44.44.44

            OSPF Router with ID (22.22.22.22) (Process ID 1)

                Type-5 AS External Link States

  Routing Bit Set on this LSA
  LS age: 41
  Options: (No TOS-capability, DC)
  LS Type: AS External Link
  Link State ID: 44.44.44.44 (External Network Number )
  Advertising Router: 10.0.1.1
  LS Seq Number: 80000002
  Checksum: 0xFB3B
  Length: 36
  Network Mask: /32
        Metric Type: 1 (Comparable directly to link state metric)
        TOS: 0
        Metric: 11
        Forward Address: 0.0.0.0
        External Route Tag: 3489661028

I mentioned earlier the DN-bit only works on type 3's.  How is it these type 5's aren't creating loops?

                Type-5 AS External Link States
Link ID         ADV Router      Age         Seq#       Checksum Tag
10.0.3.0        10.0.1.1        97          0x80000001 0x0069FB 3489661028
10.0.3.0        10.0.2.1        100         0x80000001 0x006202 3489661028
44.44.44.44     10.0.1.1        98          0x80000001 0x008136 3489661028
44.44.44.44     10.0.2.1        100         0x80000001 0x007A3C 3489661028

The secret is in the tag, 3489661028.  I'm not going to go into how this tag is computed, but see page 12 of http://www.ietf.org/rfc/rfc4577.txt if interested.  The tag is treated the same way the DN-bit is: if it's there, don't allow the route back into MP-BGP.
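
For the curious, here's a minimal Python sketch that decodes the tag.  It assumes the default VPN route tag layout as I read it from RFC 4577: the top four bits are 1101 (automatic, complete, path length 1), the next twelve bits are zero, and the low sixteen bits are the BGP AS number.  The function names are hypothetical.

```python
def decode_vpn_route_tag(tag):
    """Break a 32-bit OSPF external route tag into its fields."""
    return {
        "automatic": bool((tag >> 31) & 1),
        "complete": bool((tag >> 30) & 1),
        "path_length": (tag >> 28) & 0b11,
        "arbitrary": (tag >> 16) & 0xFFF,
        "as_number": tag & 0xFFFF,
    }


def default_vpn_route_tag(as_number):
    """Build the default tag for a 2-byte AS number (assumed layout)."""
    return 0xD0000000 | (as_number & 0xFFFF)


print(decode_vpn_route_tag(3489661028))
print(default_vpn_route_tag(100))
```

With the tag from this lab, the AS field comes out as 100, which matches "router bgp 100" on the PEs.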

Now, let's look at sham links...

Let's say we get a question on the lab exam telling us that, without changing areas, we want traffic to flow from CE1 to CE2 via the MPLS cloud.  This is difficult at first glance:  OSPF always prefers intra-area routes.  CE1 -> C -> CE2 is intra-area in area 1, while CE1 -> PE1 -> P -> PE2 -> CE2 is inter-area, because it crosses Super 0.  Adjusting metrics will not help us here: no matter how bad the metric, the traffic will always go intra-area.
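
To make the "metrics won't help" point concrete, here's a small Python sketch of OSPF's route-type preference.  It's a simplified model, not router code: the names and the metric values are invented for illustration.

```python
# Lower rank wins before metric is even considered.
PREFERENCE = {"intra-area": 0, "inter-area": 1, "external-1": 2, "external-2": 3}


def best_path(candidates):
    """Pick the path OSPF would install: route type first, then metric."""
    return min(candidates, key=lambda c: (PREFERENCE[c["type"]], c["metric"]))


backdoor = {"via": "C", "type": "intra-area", "metric": 15000}
mpls = {"via": "PE1", "type": "inter-area", "metric": 3}

# Without a sham link the backdoor wins, no matter how bad its metric is.
print(best_path([backdoor, mpls])["via"])

# If the MPLS path is also intra-area, metric finally gets a vote.
mpls_intra = {"via": "PE1", "type": "intra-area", "metric": 3}
print(best_path([backdoor, mpls_intra])["via"])
```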

So let's make the MPLS cloud intra-area!

PE1:
interface Loopback1
 ip vrf forwarding VPN
 ip address 111.111.111.111 255.255.255.255

router bgp 100
 address-family ipv4 vrf VPN
  network 111.111.111.111 mask 255.255.255.255

router ospf 2 vrf VPN
 area 1 sham-link 111.111.111.111 222.222.222.222

PE2:
interface Loopback1
 ip vrf forwarding VPN
 ip address 222.222.222.222 255.255.255.255

router bgp 100
 address-family ipv4 vrf VPN
  network 222.222.222.222 mask 255.255.255.255

router ospf 2 vrf VPN
 area 1 sham-link 222.222.222.222 111.111.111.111

Now that the config is in, let's see if the sham link came up:

PE1#show ip ospf 2 sham-links
Sham Link OSPF_SL1 to address 222.222.222.222 is up
Area 1 source address 111.111.111.111
  Run as demand circuit
  DoNotAge LSA allowed. Cost of using 1 State POINT_TO_POINT,
  Timer intervals configured, Hello 10, Dead 40, Wait 40,
    Hello due in 00:00:00
    Adjacency State FULL (Hello suppressed)
    Index 2/2, retransmission queue length 0, number of retransmission 0
    First 0x0(0)/0x0(0) Next 0x0(0)/0x0(0)
    Last retransmission scan length is 0, maximum is 0
    Last retransmission scan time is 0 msec, maximum is 0 msec

CE1#sh ip ospf database rout 10.0.1.1
<output omitted>
    Link connected to: another Router (point-to-point)
     (Link ID) Neighboring Router ID: 10.0.2.1
     (Link Data) Router Interface address: 0.0.0.18
      Number of TOS metrics: 0
       TOS 0 Metrics: 1

<output truncated>

There's our sham link.  So what's this do for us?

CE1#trace 33.33.33.33   ! 33.33.33.33 is Lo0 on CE2

Type escape sequence to abort.
Tracing the route to 33.33.33.33

  1 192.168.1.1 24 msec 56 msec 32 msec
  2 192.168.2.2 40 msec *  44 msec

Well, that's still the "backdoor" route instead of the MPLS.  But now that we have a control-plane link across the MPLS cloud, we can just play with the metric!

CE1(config)#int fa0/1
CE1(config-if)#ip ospf cost 15000
CE1#trace 33.33.33.33

Type escape sequence to abort.
Tracing the route to 33.33.33.33

  1 10.0.1.1 48 msec 52 msec 4 msec
  2 172.16.1.1 [MPLS: Labels 16/20 Exp 0] 100 msec 100 msec 88 msec
  3 10.0.2.1 [MPLS: Label 20 Exp 0] 56 msec 72 msec 64 msec
  4 10.0.2.2 92 msec *  92 msec

There we go!  Intra-area routing through the MPLS cloud.

So what's this sham link actually do?

It's a control-plane-only link - very similar to a virtual-link, and very different from a GRE tunnel (which also could have been used to solve this problem).  A GRE tunnel would actually encapsulate the packets, whereas all the sham link does is trick the OSPF process into believing area 1 stretches across Super 0.

There are some gotchas for setting it up.  Fortunately for those of us taking the lab exam, these are in the documentation.  Under the master command index, look up "area sham-link", and it almost walks you through it.

Here are the gotchas:
- You must peer the sham-link on /32 loopbacks.
- These loopbacks must be in the VRF
- These loopbacks must be advertised into the VRF BGP process
- These loopbacks must NOT be advertised into the CE-facing OSPF process

If you get all that down, you're good to go.

One last topic: capability vrf-lite

For this exercise, let's shut down the link between CE1 and PE1.  This will make CE1 the "end" of the network, learning all its routes from C.

To top it off, let's throw in a NON-MPLS vrf (the definition of "vrf-lite"). 

CE1:
Int fa0/0
  shutdown

ip vrf VPN_X
  rd 2:2

Int fa0/1
  ip vrf forwarding VPN_X
  ip address 192.168.1.2 255.255.255.0
no router ospf 1

router ospf 1 vrf VPN_X
 log-adjacency-changes
 network 10.0.1.0 0.0.0.255 area 1
 network 22.22.22.22 0.0.0.0 area 1
 network 192.168.1.0 0.0.0.255 area 1

Let's check on how OSPF is handling all this.  In advance, I went back and removed the manual domain-id on PE3, so that those routes will be seen as type 3 instead of type 5.

CE1#show ip ospf database summary 44.44.44.44

            OSPF Router with ID (192.168.1.2) (Process ID 1)

                Summary Net Link States (Area 1)

  Routing Bit Set on this LSA
  LS age: 495
  Options: (No TOS-capability, DC, Downward)
  LS Type: Summary Links(Network)
  Link State ID: 44.44.44.44 (summary Network Number)
  Advertising Router: 10.0.2.1
  LS Seq Number: 80000001
  Checksum: 0x12E1
  Length: 28
  Network Mask: /32
        TOS: 0  Metric: 11

We see 44.44.44.44 (loopback of CE3) as type 3, area 1.

CE1#sh ip route vrf VPN_X 44.44.44.44
% Network not in table
... but we can't route to it.

CE1#sh ip route vrf VPN_X 33.33.33.33
Routing entry for 33.33.33.33/32
  Known via "ospf 1", distance 110, metric 15011, type intra area
  Last update from 192.168.1.1 on FastEthernet0/1, 00:10:52 ago
  Routing Descriptor Blocks:
  * 192.168.1.1, from 33.33.33.33, 00:10:52 ago, via FastEthernet0/1
      Route metric is 15011, traffic share count is 1

CE2's loopback is routable.

So what's going on here?

The 44.44.44.44 LSA has the DN bit set from PE2.  The DN bit check is enabled on any router running vrf-aware OSPF, regardless of whether it's importing routes back into MP-BGP.  We can still route to 33.33.33.33 because it's not been through a PE.

You disable the DN bit check with:

CE1(config)#router ospf 1 vrf VPN_X
CE1(config-router)#capability vrf-lite
CE1#sh ip route vrf VPN_X 44.44.44.44
Routing entry for 44.44.44.44/32
  Known via "ospf 1", distance 110, metric 15031, type inter area
  Last update from 192.168.1.1 on FastEthernet0/1, 00:00:26 ago
  Routing Descriptor Blocks:
  * 192.168.1.1, from 10.0.2.1, 00:00:26 ago, via FastEthernet0/1
      Route metric is 15031, traffic share count is 1
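
The check and its override can be modeled in a few lines of Python.  This is only a sketch of the behavior observed in this lab for type 3 LSAs.  The function and parameter names are my own, not IOS internals.

```python
def lsa_is_routable(dn_bit, vrf_aware, capability_vrf_lite=False):
    """Model of the DN-bit check on a summary LSA.

    dn_bit:              the LSA shows "Downward" in its options
    vrf_aware:           the OSPF process is configured under a VRF
    capability_vrf_lite: "capability vrf-lite" is configured
    """
    if vrf_aware and not capability_vrf_lite and dn_bit:
        return False  # LSA stays in the database but is not used for routing
    return True


# 44.44.44.44 on CE1 under "router ospf 1 vrf VPN_X": visible, not routable.
print(lsa_is_routable(dn_bit=True, vrf_aware=True))
# Same LSA once "capability vrf-lite" is configured.
print(lsa_is_routable(dn_bit=True, vrf_aware=True, capability_vrf_lite=True))
# 33.33.33.33 never went through a PE, so there's no DN bit to trip over.
print(lsa_is_routable(dn_bit=False, vrf_aware=True))
```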

An alternative solution is to make the routes appear external via PE3:

PE3(config)#router ospf 2 vrf VPN
PE3(config-router)#domain-id 0.0.0.3
CE1(config)#router ospf 1 vrf VPN_X
CE1(config-router)#no capability vrf-lite
<< wait a while for BGP to catch up >>

CE1#show ip ospf database ext 44.44.44.44

            OSPF Router with ID (192.168.1.2) (Process ID 1)

                Type-5 AS External Link States

  LS age: 2041
  Options: (No TOS-capability, DC)
  LS Type: AS External Link
  Link State ID: 44.44.44.44 (External Network Number )
  Advertising Router: 10.0.1.1
  LS Seq Number: 80000001
  Checksum: 0x8136
  Length: 36
  Network Mask: /32
        Metric Type: 2 (Larger than any link state path)
        TOS: 0
        Metric: 11
        Forward Address: 0.0.0.0
        External Route Tag: 3489661028

CE1#sh ip route vrf VPN_X 44.44.44.44
Routing entry for 44.44.44.44/32
  Known via "ospf 1", distance 110, metric 11
  Tag Complete, Path Length == 1, AS 100, , type extern 2, forward metric 15020
  Last update from 192.168.1.1 on FastEthernet0/1, 00:00:19 ago
  Routing Descriptor Blocks:
  * 192.168.1.1, from 10.0.2.1, 00:00:19 ago, via FastEthernet0/1
      Route metric is 11, traffic share count is 1
      Route tag 3489661028

And that's the headache of dealing with DN-bit with vrf-lite.  About five months ago I ran into this problem on a lab for the first time, and got stuck for three hours trying to figure out why I could see the LSA but couldn't route on it.  This one is important to be able to recognize quickly.

<gripe>
Now, was all this really worth it?  The sham-link even requires your service provider to get involved in your traffic engineering.  What a hassle!  This entire setup, loop prevention included, could have been done with BGP peering to the PEs while still running OSPF internally on each "island", with the same results and 100% less confusion.
</gripe>

Enjoy anyway...

Jeff Kronlage

6 comments:

  1. Good read. I tried to reproduce your lab. It works fine with "cap vrf-lite", but when I change the domain-id I now get no route installed. In OSPF I now have an LSA-5 with the Down-bit set? What gives?

    PE21#sh ip ospf database external

    OSPF Router with ID (21.21.21.21) (Process ID 1)

    OSPF Router with ID (40.40.40.1) (Process ID 667)

    Type-5 AS External Link States

    LS age: 21
    Options: (No TOS-capability, DC, Downward)
    LS Type: AS External Link
    Link State ID: 44.44.44.44 (External Network Number )
    Advertising Router: 192.168.0.1
    LS Seq Number: 80000001
    Checksum: 0x6DFB
    Length: 36
    Network Mask: /32
    Metric Type: 2 (Larger than any link state path)
    MTID: 0
    Metric: 2
    Forward Address: 0.0.0.0
    External Route Tag: 3489726428

    LS age: 21

    Replies
    1. Carsten, I'll see if I can get my brain back in OSPF-land in the next few days and get you an answer...
      Cheers
      Jeff

    2. I don't have the original files from this lab, but I recreated it per my original spec, and it worked as I originally wrote it:

      VRFTEST#sh ip ospf database extern

      OSPF Router with ID (192.168.22.2) (Process ID 100)

      Type-5 AS External Link States

      Routing Bit Set on this LSA
      LS age: 140
      Options: (No TOS-capability, DC)
      LS Type: AS External Link
      Link State ID: 44.44.44.44 (External Network Number )
      Advertising Router: 192.168.12.1
      LS Seq Number: 80000001
      Checksum: 0x282E
      Length: 36
      Network Mask: /32
      Metric Type: 2 (Larger than any link state path)
      TOS: 0
      Metric: 2
      Forward Address: 0.0.0.0
      External Route Tag: 3489661028

      I have cap vrf-lite off as well. Not sure why you're seeing that behavior.

      Here's some outside information if you want to read up some more-

      https://www.racf.bnl.gov/Facility/TechnologyMeeting/Archive/06-30-04-CISCO/Using-OSPF-in-MPLS-VPN-Environment.pdf

      See page 23, it discusses your question.

  2. Great effort man. Thanks a lot. This resolved my problem. I was facing the same scenario in production. Our CE is using vrf lite on their router, hence the summary route from our PE was not being installed in their routing table. It's now resolved by enabling "cap vrf-lite". Thanks!

  3. Thanks a lot Jeff Kronlage. It really helped. Adding to Carsten Fandrup's concern on external routes, it seems the route tag is preventing external routes from being installed in the routing table. Once cap vrf-lite is configured, those external routes get into the routing table.
