I'm going to briefly show some scenarios which require you to think beyond single-hub design for the command structure to make sense. I can absolutely imagine Cisco would throw requirements for commands that only make sense in a larger network into the lab. My preference for my blog is to understand the practicality, design theory, and use cases behind commands, not just "if you apply this you get action X".
So, at a high level - What is DMVPN?
DMVPN stands for Dynamic Multipoint Virtual Private Network. It's a Cisco proprietary tunnel technology with a hub-and-spoke control plane and spoke-to-spoke tunnels. Assuming "Phase 2" or newer (more on phases later), a normal use case is to establish a full-mesh VPN over the Internet with minimal configuration. For example, 10 routers that all needed VPNs to one another would follow the "full mesh formula" of N(N-1)/2, or 10(10-1)/2 = 45 tunnels. That's a lot of config. With DMVPN, on the other hand, you create the config for just 10 tunnels. The 45 might still happen if every router in fact needed to contact every other router at the same time, but we let the routers handle that part dynamically.
Here's our diagram:
R1 will be our single hub, R2, R3, and R4 are all spokes. "INTERNET" represents the Internet. In theory, these routers could alternatively be dozens of hops from each other, but the concept doesn't change.
As I explained above, DMVPN's control plane is hub-and-spoke, and since R1 is our hub, whatever routing protocol we're using will be pinned up via R1 to each individual spoke.
So our control plane will look like this:
However, our traffic flows can be full-mesh, so the data plane will (theoretically) look like this:
This is largely dependent on which routers need to talk to which other routers. While hub-to-spoke tunnels are always up, spoke-to-spoke tunnels are "on demand" and are established dynamically.
The nuts and bolts of how this works depend largely on which development "phase" of DMVPN you're using. We'll talk more about that shortly. First, let's take a high-level look at the three technologies that make up DMVPN - MGRE, IPSEC, and NHRP.
GRE - Generic Routing Encapsulation - creates unencrypted tunnels between two endpoints. MGRE creates Multipoint GRE tunnels. These tunnels can be established to endpoints based on information discovered via NHRP, discussed below.
I'm going to assume the audience has had general exposure to IPSEC in the past. In our case, we're just using it to optionally encrypt the MGRE tunnel we're performing our routing on. I'm not going to deep-dive IOS-based IPSEC in this post; one assumption I am making is that the IPSEC/VPN requirement for v5 is going to be "DocCD level", or something you can pull out of the documentation "stock" or "near stock" on short notice.
NHRP - this is really what makes the magic happen on DMVPN. NHRP, at its core, resolves private addresses (those behind MGRE and optionally IPSEC) to a public address. In our example, that public network will be assumed to be the Internet. NHRP treats this public network like a big NBMA area. In fact, several comparisons can be drawn between NBMA frame relay and NHRP/DMVPN, to the point where I'm betting some of the old frame-relay tricks from the R&S lab will be repeated in DMVPN. NHRP facilitates registration between the spokes and the hubs, and helps the spokes resolve the public address of another spoke based on the tunneled IPs behind it.
Next, let's look at the three phases of DMVPN and some sample config for all of them.
DMVPN "Phase 1". This phase is largely unused, and, as I understand it, was an early deployment model. When most people refer to "DMVPN" these days, they're talking about the behavior expected from Phase 2 or Phase 3, not Phase 1.
Phase 1 pins not only the control plane through the hub, but also the data plane, so all your traffic goes through the hub.
The differentiating components of Phase 1 are:
- An MGRE tunnel on the hub
- A standard GRE tunnel on the spokes
- A routing protocol on the hub that sets next-hop-self (EIGRP does this by default, which is why you won't see it explicitly in the sample config)
A sample config based on our diagram from above:
For brevity, this config is applied on all four routers identically, but I will only show it here once:
crypto isakmp policy 1
encr aes 256
authentication pre-share
group 5
crypto isakmp key ABCcisco123 address 0.0.0.0
crypto ipsec transform-set TRANSFORM-SET esp-aes esp-sha-hmac
mode transport
crypto ipsec profile IPSEC_PROFILE
set transform-set TRANSFORM-SET
!R1 - The Hub
interface Tunnel1
ip address 10.0.0.1 255.255.255.0
no ip redirects
no ip split-horizon eigrp 100
ip nhrp authentication CISCO
ip nhrp map multicast dynamic
ip nhrp network-id 1
tunnel source FastEthernet0/0
tunnel mode gre multipoint
tunnel protection ipsec profile IPSEC_PROFILE
router eigrp 100
network 1.1.1.1 0.0.0.0
network 10.0.0.0 0.0.0.255
!R2 - A Spoke
interface Tunnel1
ip address 10.0.0.2 255.255.255.0
ip nhrp authentication CISCO
ip nhrp map 10.0.0.1 87.14.10.1
ip nhrp network-id 1
ip nhrp nhs 10.0.0.1
tunnel source FastEthernet0/0
tunnel destination 87.14.10.1
tunnel protection ipsec profile IPSEC_PROFILE
router eigrp 100
network 2.2.2.2 0.0.0.0
network 10.0.0.0 0.0.0.255
!R3 - Another Spoke
interface Tunnel1
ip address 10.0.0.3 255.255.255.0
ip nhrp authentication CISCO
ip nhrp map 10.0.0.1 87.14.10.1
ip nhrp network-id 1
ip nhrp nhs 10.0.0.1
tunnel source FastEthernet0/0
tunnel destination 87.14.10.1
tunnel protection ipsec profile IPSEC_PROFILE
router eigrp 100
network 3.3.3.3 0.0.0.0
network 10.0.0.0 0.0.0.255
!R4 - Another Spoke
interface Tunnel1
ip address 10.0.0.4 255.255.255.0
ip nhrp authentication CISCO
ip nhrp map 10.0.0.1 87.14.10.1
ip nhrp network-id 1
ip nhrp nhs 10.0.0.1
tunnel source FastEthernet0/0
tunnel destination 87.14.10.1
tunnel protection ipsec profile IPSEC_PROFILE
router eigrp 100
network 4.4.4.4 0.0.0.0
network 10.0.0.0 0.0.0.255
Assume the "Internet" router knows how to reach all public IP space on the 87.14.0.0/16 network, and that each router participating in DMVPN has a private loopback of X.X.X.X, where X is it's router number.
Before we look at the output from this config, let's break it apart a bit.
Note: I'm going to deliberately ignore most of the crypto config, this can be pulled out of the DocCD very easily from "Security" and then "Secure Connectivity Configuration Guide Library, Cisco IOS Release 15M&T", and then "Dynamic Multipoint VPN Configuration Guide, Cisco IOS Release 15M&T".
On R1, the hub -
crypto ipsec transform-set TRANSFORM-SET esp-aes esp-sha-hmac
mode transport
This is the only part of the crypto config I'm going to drill into. I was initially confused as to when to use "mode tunnel" and when to use "mode transport". I've seen examples with both. Unless you are doing a multi-tier DMVPN hub (one set of routers doing crypto-only, another set doing NHRP and the routing), which is clearly out of scope of the R&S lab, you want to use transport. Tunnel adds 20 bytes of overhead and comes out with the same exact results as transport. I suppose if you got a lab question that said "use the IPSEC method that requires more overhead", this might be important? The rest of this document will assume we are using transport only.
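For completeness, if a task did demand tunnel mode, the only change would be the mode keyword on the transform-set (and tunnel is actually the default if you don't specify a mode at all). A minimal sketch, reusing the transform-set name from above:
crypto ipsec transform-set TRANSFORM-SET esp-aes esp-sha-hmac
mode tunnel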
no ip split-horizon eigrp 100 - Clearly, we're going to be taking EIGRP routes in from one spoke and passing them to another. If we don't disable split horizon that process will not happen.
ip nhrp authentication CISCO - This is a plain-text key, more of an "identifier" than a password; keeping in mind that this traffic will be inside IPSEC, it doesn't need its own encryption method per se.
ip nhrp map multicast dynamic - This tells the hub to pseudo-multicast to any spoke that registers to it. This is (usually) necessary for the routing protocol to communicate.
ip nhrp network-id 1 - This is a local identifier only. It is not communicated across the network. Think of it similarly to the OSPF process number. You must have it enabled, and it must be unique to your tunnel, or NHRP will not work.
tunnel source FastEthernet0/0 - where to source tunnel packets from
tunnel mode gre multipoint - This may as well read "tunnel mode gre dmvpn". A GRE multipoint tunnel, by its nature, must use NHRP for resolution.
tunnel protection ipsec profile IPSEC_PROFILE - Encrypt this tunnel with our IPSEC config from above. Note, the IPSEC config above used a pre-shared key (PSK), but it's worth pointing out that a public key infrastructure (PKI) can be used as well, although that is beyond the scope of this document.
On R2, a spoke (excluding any repetition of commands I explained on the hub) -
ip nhrp map 10.0.0.1 87.14.10.1 - This is a lot like the frame-relay map command, that, if you were a student of CCIE v4, you are well familiar with. In this case, we're mapping private IP 10.0.0.1 to NBMA IP 87.14.10.1. This is to facilitate registration to the hub.
ip nhrp nhs 10.0.0.1 - nhs stands for "next hop server", which is the hub. This basically says "register my private IP address to my NBMA address (87.14.20.1 in this case) with the hub, so it knows how to reach me".
tunnel destination 87.14.10.1 - You'll note a lack of the tunnel mode gre multipoint command on this tunnel. That's because in phase 1, the spokes only get regular GRE tunnels. So in this case, we have to set the destination statically to the hub.
Let's now look at the outcome of all this on R1, the hub.
R1#sh ip nhrp
10.0.0.2/32 via 10.0.0.2
Tunnel1 created 00:52:07, expire 01:35:14
Type: dynamic, Flags: unique registered used
NBMA address: 87.14.20.1
10.0.0.3/32 via 10.0.0.3
Tunnel1 created 00:51:25, expire 01:35:13
Type: dynamic, Flags: unique registered used
NBMA address: 87.14.30.1
10.0.0.4/32 via 10.0.0.4
Tunnel1 created 00:50:55, expire 01:35:13
Type: dynamic, Flags: unique registered used
NBMA address: 87.14.40.1
We see the three mappings for the three spokes that registered to the hub. We see type "dynamic" - meaning it was learned through registration, "unique registered" - meaning the spoke has instructed the hub not to take a registration from another NBMA address but with the same private address, and of course we see the NBMA address for each IP listed.
On the topic of NHRP mappings, optionally, we could also add static entries on the hub:
R1(config-if)#ip nhrp map 10.0.0.10 4.2.2.2
R1(config-if)#ip nhrp map multicast 4.2.2.2 ! optional
R1#sh ip nhrp | s 10.0.0.10
10.0.0.10/32 via 10.0.0.10
Tunnel1 created 00:00:09, never expire
Type: static, Flags:
NBMA address: 4.2.2.2
This entry will, of course, do nothing, but I wanted to demonstrate the idea.
We can also see who we're pseudo-multicasting towards:
R1#sh ip nhrp multicast
I/F NBMA address
Tunnel1 4.2.2.2 Flags: static
Tunnel1 87.14.20.1 Flags: dynamic
Tunnel1 87.14.30.1 Flags: dynamic
Tunnel1 87.14.40.1 Flags: dynamic
What about the routing protocol?
R1#sh ip eigrp neigh
EIGRP-IPv4 Neighbors for AS(100)
H Address Interface Hold Uptime SRTT RTO Q Seq
(sec) (ms) Cnt Num
2 10.0.0.2 Tu1 14 00:33:26 299 1794 0 13
1 10.0.0.4 Tu1 11 00:33:32 818 4908 0 13
0 10.0.0.3 Tu1 11 00:33:33 409 2454 0 16
We have EIGRP peerings with all the neighbors.
At this point, we should have any-to-any connectivity, via the hub. Let's test it out:
R4#ping 3.3.3.3 source lo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 3.3.3.3, timeout is 2 seconds:
Packet sent with a source address of 4.4.4.4
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 168/205/240 ms
R4#trace 3.3.3.3 source lo0
Type escape sequence to abort.
Tracing the route to 3.3.3.3
VRF info: (vrf in name/id, vrf out name/id)
1 10.0.0.1 156 msec 152 msec 128 msec
2 10.0.0.3 224 msec 236 msec 252 msec
We see we have connectivity from 4.4.4.4 (loopback of R4) to 3.3.3.3 (loopback of R3), however, it goes through the hub - not very efficient, since the hub doesn't need to be in the forwarding path. That is, however, the drawback of phase 1.
Phase 2 is where DMVPN really starts to shine, because it gets the hub (more or less) out of the data plane forwarding path for spoke-to-spoke communication.
Building off our existing config, let's implement a phase 2 configuration.
R1:
interface Tunnel1
no ip next-hop-self eigrp 100
no ip nhrp map 10.0.0.10 4.2.2.2 ! just for cleanup
no ip nhrp map multicast 4.2.2.2 ! just for cleanup
R2-R4:
interface Tunnel1
no tunnel destination 87.14.10.1
tunnel mode gre multipoint
ip nhrp map multicast 87.14.10.1
We'll do a high-level breakdown of this config, then spend a good bit of time on the theory behind Phase 2. While the config isn't a complex change, there's a lot more going on behind the scenes.
On the hub:
no ip next-hop-self eigrp 100 - This is absolutely vital. You can set up the rest of NHRP to happily work spoke-to-spoke, but if you don't stop the hub from rewriting the next hops, you're going to get behavior very akin to Phase 1.
On any spoke:
no tunnel destination 87.14.10.1 - this is only used with a regular GRE tunnel and isn't required any longer.
tunnel mode gre multipoint - Swap from a point-to-point to multipoint tunnel on the spokes. Now, the spokes will be using NHRP for resolution as well as the hub.
ip nhrp map multicast 87.14.10.1 - When we were using a standard GRE tunnel, it was inherently point-to-point, and natively supported multicast without any extra instructions. Now, we have to tell it to pseudo-multicast to the hub.
Before we deep dive into what's going on behind the scenes, let's look at what's changed.
R3#sh ip route 2.2.2.2
Routing entry for 2.2.2.2/32
Known via "eigrp 100", distance 90, metric 28288000, type internal
Redistributing via eigrp 100
Last update from 10.0.0.2 on Tunnel1, 00:04:52 ago
Routing Descriptor Blocks:
* 10.0.0.2, from 10.0.0.1, 00:04:52 ago, via Tunnel1
Route metric is 28288000, traffic share count is 1
Total delay is 105000 microseconds, minimum bandwidth is 100 Kbit
Reliability 255/255, minimum MTU 1434 bytes
Loading 1/255, Hops 2
We learned 2.2.2.2 (R2's loopback) on R3 via 10.0.0.1 (R1), but the forwarding path is via 10.0.0.2.
That's great, how do we reach 10.0.0.2?
R3#sh ip route 10.0.0.2
Routing entry for 10.0.0.0/24
Known via "connected", distance 0, metric 0 (connected, via interface)
Redistributing via eigrp 100
Routing Descriptor Blocks:
* directly connected, via Tunnel1
Route metric is 0, traffic share count is 1
OK, not so fast... while it is "on subnet" on Tunnel1, Tunnel1 is NBMA, so we can't just forward there without some type of resolution.
R3#sh ip cef 10.0.0.2 internal
10.0.0.0/24, epoch 0, flags attached, connected, cover dependents, need deagg, RIB[C], refcount 5, per-destination sharing
sources: RIB
feature space:
IPRM: 0x0003800C
subblocks:
gsb Connected chain head(1): 0x6AAF5DF4
Covered dependent prefixes: 3
need deagg: 2
notify cover updated: 1
ifnums:
Tunnel1(10)
path 6B108C6C, path list 6B101100, share 1/1, type connected prefix, for IPv4
connected to Tunnel1, adjacency punt
output chain: punt
The important lines are the bottom two. I've read in other blogs that we should see a "glean adjacency" for unresolved NHRP next hops, but I haven't been able to reproduce that on 15.2; I suspect Cisco changed the output. But there's our answer plain as day: punt. This traffic cannot be CEF forwarded because we have an unresolved dependency; we don't know how to get to 10.0.0.2.
The CPU knows we need NHRP for this to work, and it doesn't have a resolution in its NHRP cache yet:
R3#sh ip nhrp
10.0.0.1/32 via 10.0.0.1
Tunnel1 created 00:20:23, never expire
Type: static, Flags: used
NBMA address: 87.14.10.1
So how do we get it?
This is a reasonably clever process, and it only gets more clever once we get into Phase 3. The CPU, after the punt, will forward the traffic to the hub by default. This ensures that while we're waiting on NHRP to do its thing and the spoke-to-spoke tunnel to build, we're not dropping packets. Generally speaking, you can expect the first 2-3 packets to get punted.
On the hub, I've enabled debug nhrp.
On the spoke:
R3#ping 2.2.2.2 source lo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
Packet sent with a source address of 3.3.3.3
!
<cutting to the debugs on R1>
<note this has been cut to the key elements for brevity, debug nhrp is wordy>
NHRP: Tunnels gave us remote_nbma: 87.14.30.1 for Redirect
NHRP: Receive Resolution Request via Tunnel1 vrf 0, packet size: 85
The hub knows R3 needs a resolution for R2.
NHRP: nhrp_rtlookup for destination on 10.0.0.2 yielded interface Tunnel1, prefixlen 24
NHRP-ATTR: In nhrp_recv_resolution_request NHRP Resolution Request packet is forwarded to 10.0.0.2
NHRP: Attempting to forward to destination: 10.0.0.2
NHRP: Attempting to send packet via DEST 10.0.0.2
NHRP: Encapsulation succeeded. Sending NHRP Control Packet NBMA Address: 87.14.20.1
NHRP: Forwarding Resolution Request via Tunnel1 vrf 0, packet size: 105
src: 10.0.0.1, dst: 10.0.0.2
NHRP: 129 bytes out Tunnel1
But it doesn't answer R3. Instead, it forwards the NHRP request to R2, which included R3's NBMA address. Not pictured here, it also forwards the ping packet from R3 to R2 at the same time, so that no traffic is lost.
Meanwhile, on R2... R2 has received the initial ping echo request, along with the NHRP control packet. R2 will now set up an encrypted (IPSEC) MGRE tunnel back to R3! However, in the meantime, it still needs to forward its echo reply, and we can't just stall that until the tunnel comes up. So we see the reverse traffic from R2, trying to resolve for R3.
NHRP: Receive Resolution Request via Tunnel1 vrf 0, packet size 85
NHRP: nhrp_rtlookup for destination on 10.0.0.3 yielded interface Tunnel1, prefixlen 24
NHRP-ATTR: In nhrp_recv_resolution_request NHRP Resolution Request packet is forwarded to 10.0.0.3
NHRP: Attempting to forward to destination: 10.0.0.3
NHRP: Attempting to send packet via DEST 10.0.0.3
NHRP: Encapsulation succeeded. Sending NHRP Control Packet NBMA Address: 87.14.30.1
And the traffic is delivered to R3, via R1.
During this process I've also enabled debug dmvpn all tunnel on R2, so we can see the crypto process fire off (note, this was also edited for brevity):
IPSEC-IFC MGRE/Tu1: Checking to see if we need to delay for src 87.14.20.1 dst 87.14.30.1
IPSEC-IFC MGRE/Tu1(87.14.20.1/87.14.30.1): Opening a socket with profile IPSEC_PROFILE
IPSEC-IFC MGRE/Tu1(87.14.20.1/87.14.30.1): connection lookup returned 0
IPSEC-IFC MGRE/Tu1(87.14.20.1/87.14.30.1): Triggering tunnel immediately.
IPSEC-IFC MGRE/Tu1: Adding Tunnel1 tunnel interface to shared list
IPSEC-IFC MGRE/Tu1: Need to delay.
IPSEC-IFC MGRE/Tu1(87.14.20.1/87.14.30.1): good socket ready message
IPSEC-IFC MGRE/Tu1(87.14.20.1/87.14.30.1): Got MTU message mtu 1458
IPSEC-IFC MGRE/Tu1(87.14.20.1/87.14.30.1): tunnel_protection_socket_up
IPSEC-IFC MGRE/Tu1(87.14.20.1/87.14.30.1): Signalling NHRP
And back on R3, we can see that the tunnel is up:
R3#show dmvpn | b Tunnel1
Interface: Tunnel1, IPv4 NHRP Details
Type:Spoke, NHRP Peers:2,
# Ent Peer NBMA Addr Peer Tunnel Add State UpDn Tm Attrb
----- --------------- --------------- ----- -------- -----
1 87.14.10.1 10.0.0.1 UP 00:04:38 S
1 87.14.20.1 10.0.0.2 UP 00:04:03 D
We can also see that now that we have resolution for R2 (and a dynamic tunnel), we can now directly CEF switch to it:
R3#sh ip cef 2.2.2.2 internal
2.2.2.2/32, epoch 0, RIB[I], refcount 5, per-destination sharing
sources: RIB
feature space:
IPRM: 0x00028000
ifnums:
Tunnel1(12): 10.0.0.2
path 6B1081EC, path list 6B100D40, share 1/1, type attached nexthop, for IPv4
nexthop 10.0.0.2 Tunnel1, adjacency IP midchain out of Tunnel1, addr 10.0.0.2 6AE17200
output chain: IP midchain out of Tunnel1, addr 10.0.0.2 6AE17200 IP adj out of FastEthernet0/0, addr 87.14.30.100 6925D980
We see the appropriate next-hop on Tunnel1, and no longer a mention of "punt".
Just to further prove the point:
R3#trace 2.2.2.2 source lo0
Type escape sequence to abort.
Tracing the route to 2.2.2.2
VRF info: (vrf in name/id, vrf out name/id)
1 10.0.0.2 172 msec 192 msec 184 msec
We also see we have an NHRP resolution now:
R3#sh ip nhrp
10.0.0.1/32 via 10.0.0.1
Tunnel1 created 00:08:01, never expire
Type: static, Flags: used
NBMA address: 87.14.10.1
10.0.0.2/32 via 10.0.0.2
Tunnel1 created 00:07:26, expire 01:52:36
Type: dynamic, Flags: router used
NBMA address: 87.14.20.1
10.0.0.3/32 via 10.0.0.3
Tunnel1 created 00:07:24, expire 01:52:35
Type: dynamic, Flags: router unique local
NBMA address: 87.14.30.1
(no-socket)
You'd see the flip-side of that same output on R2.
Before we push on to Phase 3, we need to spend some time looking at the various possible routing protocols for the NHRP/DMVPN control plane, and some of the oddities.
We've covered EIGRP fairly well thus far. The only thing I need to add is that in a production environment, you need to set the bandwidth manually on the interface, regardless of whether or not you're using it as a QoS value. You may remember back from your CCNA/CCNP days that EIGRP will only use half the available bandwidth of a link:
R3#show int tun1 | i bandwidth
Tunnel transmit bandwidth 8000 (kbps)
Tunnel receive bandwidth 8000 (kbps)
Unfortunately, 4K won't get you too many EIGRP updates.
R1-R4:
interface Tunnel1
bandwidth 1000 ! or any reasonable number for your environment
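If a task forbade touching the interface bandwidth itself, EIGRP's percentage knob is another lever. A quick sketch with an arbitrary percentage (the interface bandwidth still factors into the math, so this is a fallback rather than a replacement):
interface Tunnel1
ip bandwidth-percent eigrp 100 100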
We'll look at RIP next - it's super-easy.
R1-R4:
no router eigrp 100
router rip
version 2
network 10.0.0.0
network <their specific loopback prefix>
R1:
interface Tunnel1
no ip split-horizon
That's it ...
R3#sh ip route rip | b Gateway
Gateway of last resort is 87.14.30.100 to network 0.0.0.0
R 1.0.0.0/8 [120/1] via 10.0.0.1, 00:00:06, Tunnel1
R 2.0.0.0/8 [120/2] via 10.0.0.1, 00:00:06, Tunnel1
3.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
R 3.0.0.0/8 [120/2] via 10.0.0.1, 00:01:31, Tunnel1
R 4.0.0.0/8 [120/2] via 10.0.0.1, 00:00:06, Tunnel1
R3#ping 2.2.2.2 source lo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
Packet sent with a source address of 3.3.3.3
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 304/326/336 ms
R3#trace 2.2.2.2 source lo0
Type escape sequence to abort.
Tracing the route to 2.2.2.2
VRF info: (vrf in name/id, vrf out name/id)
1 10.0.0.2 160 msec 192 msec 196 msec
In order to get Spoke->Spoke, and not Spoke->Hub->Spoke, you need to make sure you're using RIPv2.
On to BGP.
R1:
no router rip
router bgp 100
bgp log-neighbor-changes
network 1.1.1.1 mask 255.255.255.255
network 10.0.0.0 mask 255.255.255.0
neighbor 10.0.0.2 remote-as 100
neighbor 10.0.0.2 route-reflector-client
neighbor 10.0.0.3 remote-as 100
neighbor 10.0.0.3 route-reflector-client
neighbor 10.0.0.4 remote-as 100
neighbor 10.0.0.4 route-reflector-client
R2-R4:
no router rip
router bgp 100
bgp log-neighbor-changes
network <local Loopback Prefix> mask 255.255.255.255
network 10.0.0.0 mask 255.255.255.0
neighbor 10.0.0.1 remote-as 100
R1 of course needs to be a route reflector, or the other iBGP spokes wouldn't receive the other spoke routes.
R4#ping 2.2.2.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 240/301/376 ms
R4#trace 2.2.2.2 source lo0
Type escape sequence to abort.
Tracing the route to 2.2.2.2
VRF info: (vrf in name/id, vrf out name/id)
1 10.0.0.2 156 msec 192 msec 184 msec
For OSPF, you'll need to use network type broadcast or non-broadcast. There's no point-to-multipoint support until Phase 3, which we'll go over in detail later.
R1:
no router bgp 100
router ospf 1
network 1.1.1.1 0.0.0.0 area 0
network 10.0.0.0 0.0.0.255 area 0
interface Tunnel1
ip ospf network broadcast
ip ospf priority 255
R2-R4:
no router bgp 100
router ospf 1
network <Local Loopback Address> 0.0.0.0 area 0
network 10.0.0.0 0.0.0.255 area 0
interface Tunnel1
ip ospf network broadcast
ip ospf priority 0
Broadcast is used here to avoid changing the next hop. If it were changed, we'd end up with Spoke->Hub->Spoke flows. We also want to make sure the hub becomes the DR, hence changing the OSPF priorities.
R3#ping 2.2.2.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 164/183/204 ms
R3#trace 2.2.2.2
Type escape sequence to abort.
Tracing the route to 2.2.2.2
VRF info: (vrf in name/id, vrf out name/id)
1 10.0.0.2 172 msec 184 msec 196 msec
We can also use non-broadcast. Imagine a task that didn't allow multicast mappings to be used, but required an IGP to be run.
R1:
interface Tunnel1
ip ospf network non-broadcast
no ip nhrp map multicast dynamic
router ospf 1
neighbor 10.0.0.2
neighbor 10.0.0.3
neighbor 10.0.0.4
R2-R4:
interface Tunnel1
ip ospf network non-broadcast
no ip nhrp map multicast 87.14.10.1
** I actually rebuilt all the tunnels here to clear the NHRP cache thoroughly - I've found "clear ip nhrp" doesn't always produce the results I'd expect **
R1#sh ip ospf neigh
Neighbor ID Pri State Dead Time Address Interface
2.2.2.2 0 FULL/DROTHER 00:01:51 10.0.0.2 Tunnel1
3.3.3.3 0 FULL/DROTHER 00:01:57 10.0.0.3 Tunnel1
4.4.4.4 0 FULL/DROTHER 00:01:43 10.0.0.4 Tunnel1
R3#ping 2.2.2.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 208/288/360 ms
R3#trace 2.2.2.2
Type escape sequence to abort.
Tracing the route to 2.2.2.2
VRF info: (vrf in name/id, vrf out name/id)
1 10.0.0.2 168 msec 188 msec 168 msec
I mentioned at the beginning of the article that I was going to go outside the scope of R&S v5 in order to explain the "why would I use this?" behind some topics. Phase 3 DMVPN, if you're only looking at it from a handful of routers, doesn't make nearly as much sense. You need to take a step back and realize the challenges Phase 2 would bring if you had, say, 1,500 DMVPN spokes.
In a scenario like that, you're clearly not going to have just one hub. In fact, not even having primary/backup would be sufficient, because one router simply cannot terminate 1,500 IPSEC tunnels from a CPU perspective. In order to scale Phase 2 in volume of spokes, you had to build a topology that looked something like this:
Let's pretend SPOKES1, 2 and 3 each represented 500 spokes. They'd register to HUB1, 2, and 3, respectively. I'm not going to get into DR/failover scenarios here, but you can start seeing the problem - each hub has its own NHRP database, which isn't shared with its neighbors. What happens when a spoke in SPOKES1 wants to reach a spoke in SPOKES3?
Phase 2 solved this by using daisy-chained NHRP. In short, HUB1 became an NHRP client of HUB2, which became an NHRP client of HUB3, which became an NHRP client of HUB1. It would look something like this:
You can reference Cisco's drawing of the same solution here; reference figure 3-4:
http://www.cisco.com/c/en/us/td/docs/solutions/Enterprise/WAN_and_MAN/DMVPDG/DMVPN_3.html
The config is reasonably simple. This isn't something I have mocked up right now, but let's pretend they're each using tunnel 1 and have a tunnel IP address of 10.0.0.X, where X is the hub number.
HUB1:
interface Tunnel1
ip nhrp map 10.0.0.2 N.B.M.A2
ip nhrp nhs 10.0.0.2
HUB2:
interface Tunnel1
ip nhrp map 10.0.0.3 N.B.M.A3
ip nhrp nhs 10.0.0.3
HUB3:
interface Tunnel1
ip nhrp map 10.0.0.1 N.B.M.A1
ip nhrp nhs 10.0.0.1
So given my earlier predicament, "What happens when a spoke in SPOKES1 wants to reach a spoke in SPOKES3?" - in this case, the requesting spoke in SPOKES1 sends the initial packet to HUB1, which, not having a resolution for a spoke registered to HUB3, passes both the original packet and the NHRP request along to HUB2, which in turn passes them to HUB3. In theory, HUB3 has the resolution for the router we're trying to reach, and tells that router via NHRP to establish a tunnel (via NBMA) back to the original requester in SPOKES1.
You can see how inefficient this is. It's not hierarchical; it scales sideways.
Let's take a worse scenario - what if the spoke in SPOKES3 is offline and not registered to HUB3? Well, HUB3 doesn't have a resolution so it passes it to HUB1, which in turn passes it to HUB2 ... yes, it loops. It eventually TTLs away and dies, but it's messy.
Another scalability issue in Phase 2 is that there's absolutely no way to summarize routes. If you summarize, you get the spoke->hub->spoke syndrome, because the next hop is always the summarizing router.
Also, I mentioned above the problem with the first few packets being punted to the CPU, and not being CEF switched.
Phase 3 fixes all these issues, and is basically better in every way.
In Phase 3, completely contrary to the way Phase 2 worked, all the routing needs to point towards the hub (initially). So the routing protocol does need some sort of "next hop self" feature enabled.
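For EIGRP, that just means leaving the default next-hop behavior alone (or undoing the Phase 2 change). A sketch of what the hub's tunnel would look like, assuming we were still running EIGRP 100 here:
interface Tunnel1
ip next-hop-self eigrp 100
no ip split-horizon eigrp 100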
After the hub receives the first packet, instead of generating NHRP resolution packets, the hub sends an NHRP redirect any time it receives a packet in one tunnel and sends it back out the same tunnel. This redirect goes back to the originating router (the one that sent the first packet to the hub - the packet that got sent in & out the same tunnel), informing the originating router that a better path is available over DMVPN. NHRP redirect is very similar to an ICMP redirect.
Let's demonstrate with a sample of 4.4.4.4 trying to reach 2.2.2.2. It's important to note here that the route to 2.2.2.2 has a next-hop of R1's private address, which is already resolved by our static NHRP entry for the hub, so there's no CEF failure (punt) this time!
So now R4 knows R1 isn't the best path to R2. At this point, R4 needs to send an NHRP resolution request to R2 (not the hub!), to find out how to reach it directly. In the meantime, it knows R1 will continue to forward packets for it.
Since R4 still can't speak directly to R2, the NHRP resolution gets forwarded via R1, but not processed via R1. In Phase 3, it's no longer R1's job to answer NHRP resolutions.
R2 receives the resolution, and responds directly to R4 (similar to the way Phase 2 worked at this point), also initiating the tunnel to R4.
At this point, R2 and R4 would still have R1 as the next hop for one another, but Phase 3 fixes that as well, rewriting CEF to reflect the better path.
In modern versions of IOS, you can actually see the rewrite (more or less) via the routing table:
R4#sh ip route ospf | b Gateway
Gateway of last resort is 87.14.40.100 to network 0.0.0.0
1.0.0.0/32 is subnetted, 1 subnets
O 1.1.1.1 [110/1001] via 10.0.0.1, 00:03:12, Tunnel1
2.0.0.0/32 is subnetted, 1 subnets
O % 2.2.2.2 [110/2001] via 10.0.0.1, 00:02:43, Tunnel1
3.0.0.0/32 is subnetted, 1 subnets
O 3.3.3.3 [110/2001] via 10.0.0.1, 00:03:12, Tunnel1
10.0.0.0/8 is variably subnetted, 5 subnets, 2 masks
O 10.0.0.1/32 [110/1000] via 10.0.0.1, 00:03:12, Tunnel1
O 10.0.0.2/32 [110/2000] via 10.0.0.1, 00:02:43, Tunnel1
O 10.0.0.3/32 [110/2000] via 10.0.0.1, 00:03:12, Tunnel1
Note the % sign next to 2.2.2.2:
R4#sh ip route ospf | i %
+ - replicated route, % - next hop override
O % 2.2.2.2 [110/2001] via 10.0.0.1, 00:03:33, Tunnel1
"next hop overide". Pretty cool.
R4#sh ip nhrp shortcut
2.2.2.2/32 via 10.0.0.2
Tunnel1 created 00:04:46, expire 01:55:15
Type: dynamic, Flags: router rib nho
NBMA address: 87.14.20.1
There's our shortcut table. What's a shortcut, you might ask? Let's look at the handful of simple commands necessary to make Phase 3 work. First, the routing protocol must point back towards the hub, instead of towards the spoke, like we were set up for in Phase 2.
R1-R4:
interface Tunnel1
ip ospf network point-to-multipoint
Point-to-Multipoint rewrites the next hop to the hub, unlike Broadcast or Non-Broadcast, which left it pointing at the advertising spoke. Also, not pictured here, I re-established the hub<->spoke multicast mappings prior to this. Another important footnote is that with Point-to-Multipoint, we no longer need the DR/BDR we were stuck with in Phase 2 (which effectively limited us to two hubs). This design also permits summarization (or even just a default route), which Phase 2 certainly did not allow for. More on this in a bit.
R1:
interface Tunnel1
ip nhrp redirect
R2-R4:
interface Tunnel1
ip nhrp shortcut
ip nhrp redirect goes on the hub only (note many installations will just put ip nhrp redirect and ip nhrp shortcut on every device; this is not necessary). ip nhrp redirect enables the behavior described earlier: if a packet is received and transmitted on the same MGRE tunnel, send the redirect. You can actually see who we've sent redirects for recently:
R1#sh ip nhrp redirects
I/F NBMA address Destination Drop Count Expiry
Tunnel1 87.14.40.1 3.3.3.3 4 00:00:06
Tunnel1 87.14.30.1 10.0.0.4 4 00:00:06
ip nhrp shortcut goes only on the spokes; it is used to enable processing of the redirect packets.
That's all there is to enabling Phase 3; but I still haven't answered the scenario I proposed being the problem with Phase 2 ("sideways" scaling for hubs). With Phase 3, you can build hierarchical hubs because the NHRP daisy chain doesn't need to exist any longer. Imagine our 1,500 spoke router scenario from earlier, but now with Phase 3:
We still have 500 spokes registering to HUB1, HUB2, and HUB3, from SPOKES1, SPOKES2, and SPOKES3, respectively. What if a router in SPOKES1 wants to build a spoke-to-spoke tunnel to a router in SPOKES3?
Here we see the first packet leave the spoke in SPOKES1. HUB1 will forward this packet (according to the routing table via CORE1, not pictured here). HUB1 then sends a redirect back towards the spoke in SPOKES1.
Because the hubs no longer answer NHRP requests, there is no need to NHRP daisy chain the hubs! So in our next diagram, we're going to watch the NHRP resolution request be routed to its destination.
Again, this is a routed IP packet, HUB1, CORE1, and HUB3 are not NHRP-processing this packet, they're just CEF-switching it.
When the target spoke in SPOKES3 receives the NHRP resolution request, it replies directly to the originating spoke in SPOKES1:
Much more scalable than Phase 2.
This leads back into summarization. In Phase 3 there is no need to have the full routing table. You can send out a summary for your network, or even a default.
I can't summarize intra-area in OSPF, so I'm switching back to EIGRP (not pictured here).
R1:
interface Tunnel1
ip summary-address eigrp 100 0.0.0.0 248.0.0.0
Sorry for the weird summary - I didn't do myself any favors by using 1.1.1.1 - 4.4.4.4 for the loopbacks. You try summarizing those :)
R3#sh ip route eigrp | b Gateway
Gateway of last resort is 87.14.30.100 to network 0.0.0.0
D 0.0.0.0/5 [90/3968000] via 10.0.0.1, 00:02:26, Tunnel1
One route - that'd sure be easier on my spokes if I had 1,500 spokes to consider. Pretty darn slick.
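And if a task asked for just a default route from the hub instead of a summary, the same interface-level mechanism works - a sketch, again assuming EIGRP 100 on the hub:
interface Tunnel1
ip summary-address eigrp 100 0.0.0.0 0.0.0.0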
R3#ping 2.2.2.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 172/218/324 ms
We have reachability.
R3#trace 2.2.2.2
Type escape sequence to abort.
Tracing the route to 2.2.2.2
VRF info: (vrf in name/id, vrf out name/id)
1 10.0.0.2 156 msec 180 msec 172 msec
We're reaching it via one hop (our spoke-to-spoke tunnel)
R3#sh ip route | b Gateway
Gateway of last resort is 87.14.30.100 to network 0.0.0.0
S* 0.0.0.0/0 [1/0] via 87.14.30.100
D 0.0.0.0/5 [90/3968000] via 10.0.0.1, 00:04:07, Tunnel1
2.0.0.0/32 is subnetted, 1 subnets
H 2.2.2.2 [250/1] via 10.0.0.2, 00:00:29, Tunnel1
3.0.0.0/32 is subnetted, 1 subnets
C 3.3.3.3 is directly connected, Loopback0
10.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
C 10.0.0.0/24 is directly connected, Tunnel1
L 10.0.0.3/32 is directly connected, Tunnel1
87.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
C 87.14.30.0/24 is directly connected, FastEthernet0/0
L 87.14.30.1/32 is directly connected, FastEthernet0/0
H = NHRP
R3#sh ip route nhrp | b Gateway
Gateway of last resort is 87.14.30.100 to network 0.0.0.0
2.0.0.0/32 is subnetted, 1 subnets
H 2.2.2.2 [250/1] via 10.0.0.2, 00:00:41, Tunnel1
R3#show dmvpn | b Tunnel1
Interface: Tunnel1, IPv4 NHRP Details
Type:Spoke, NHRP Peers:2,
# Ent Peer NBMA Addr Peer Tunnel Add State UpDn Tm Attrb
----- --------------- --------------- ----- -------- -----
2 87.14.20.1 10.0.0.2 UP 00:04:09 DT1
10.0.0.2 UP 00:04:09 D
1 87.14.10.1 10.0.0.1 UP 01:25:21 S
What about IPv6 DMVPN?
Note, there is no IPv6 over IPv6 DMVPN yet - at least not on my IOS. So we'll be tunneling v6 over v4.
No changes to the existing tunnels are required, we just add v6 to our existing infrastructure.
I've added X::X/64 to every Loopback0, and 10::X/64 to every Tunnel1, where X is the router number.
R1:
ipv6 unicast-routing
ipv6 router eigrp 100
no shut
interface Tunnel1
no ipv6 split-horizon eigrp 100
ipv6 address 10::1/64
ipv6 eigrp 100
ipv6 nhrp map multicast dynamic
ipv6 nhrp network-id 1
ipv6 nhrp redirect
R2-R4:
ipv6 unicast-routing
ipv6 router eigrp 100
no shut
interface Tunnel1
ipv6 address 10::X/64 ! Where X is the router number
ipv6 eigrp 100
ipv6 nhrp map multicast 87.14.10.1
ipv6 nhrp map 10::1/128 87.14.10.1
ipv6 nhrp network-id 1
ipv6 nhrp nhs 10::1
ipv6 nhrp shortcut
The one thing that did throw me off here is that you don't need to map the link-local address of the hub on the spokes, or vice-versa. As I'd mentioned earlier, the ipv6 nhrp map commands remind me a lot of frame-relay, so I immediately started putting in manual mappings. No need. NHRP takes care of all of that:
R1#sh ipv6 nhrp | s FE80
FE80::C800:37FF:FEDC:8/128 via 10::3
Tunnel1 created 00:11:04, expire 01:48:56
Type: dynamic, Flags: unique registered used
NBMA address: 87.14.30.1
FE80::C801:FF:FEF8:8/128 via 10::4
Tunnel1 created 00:10:54, expire 01:49:06
Type: dynamic, Flags: unique registered used
NBMA address: 87.14.40.1
FE80::C803:13FF:FE90:8/128 via 10::2
Tunnel1 created 00:14:51, expire 01:45:08
Type: dynamic, Flags: unique registered used
NBMA address: 87.14.20.1
The link locals are auto-registered along with the unicast IPv6 addresses.
There's not much more to say - it works -
R4#ping 2::2 source lo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2::2, timeout is 2 seconds:
Packet sent with a source address of 4::4
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 156/198/260 ms
R4#trace 2::2
Type escape sequence to abort.
Tracing the route to 2::2
1 10::2 176 msec 172 msec 168 msec
Now let's look at what QoS options we have.
The QoS is largely Hub -> Spoke. You can get some Spoke -> Spoke, but it's generally a hack job; because your neighbors are dynamic, it's difficult to fine-tune a policy.
The basic idea is that the spoke registers a value (called an NHRP "group") back to the hub, which the hub can then match and apply a policy-map to.
R2:
interface Tunnel1
ip nhrp group GROUP1
R3:
interface Tunnel1
ip nhrp group GROUP1
R4:
interface Tunnel1
ip nhrp group GROUP2
On all three spoke routers I did the following procedure:
interface tunnel1
no ip nhrp nhs 10.0.0.1
ip nhrp nhs 10.0.0.1
The reason is that the spoke doesn't immediately re-register to the hub when the group is added, so we're forcing it.
We can now see the hub is aware of the groups:
R1(config-if)#do sh ip nhrp
10.0.0.2/32 via 10.0.0.2
Tunnel1 created 00:34:47, expire 01:57:03
Type: dynamic, Flags: unique registered used
NBMA address: 87.14.20.1
Group: GROUP1
10.0.0.3/32 via 10.0.0.3
Tunnel1 created 02:24:59, expire 01:59:36
Type: dynamic, Flags: unique registered used
NBMA address: 87.14.30.1
Group: GROUP1
10.0.0.4/32 via 10.0.0.4
Tunnel1 created 02:24:41, expire 01:59:49
Type: dynamic, Flags: unique registered used
NBMA address: 87.14.40.1
Group: GROUP2
Let's build some policies on the hub. I have this all mocked up in GNS3, so we have to keep the performance expectations low.
Most of the DMVPNs I've designed used the DMVPN for bulk traffic, and used a side-by-side MPLS network for the traffic that needed priority/QoS. So I have honestly never used this in production, but I suspect the design case is mostly shaping by group. You don't want a device behind the hub sending towards a spoke at 100Mbit if that spoke has a pair of bonded T1s for Internet access. If we map, say, "low speed" spokes to one group and "high speed" spokes to another, we can stop the slow spokes from getting overwhelmed while allowing the faster spokes to receive traffic at, or near, the line rate of the hub's Internet connection. This would also be very easy to configure - theoretically, just 8 lines would take care of all spokes, as sketched below.
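A rough sketch of that simpler approach (group names, policy names, and rates are arbitrary examples, not taken from the lab above):
policy-map SHAPE-LOW
class class-default
shape average 3000000
policy-map SHAPE-HIGH
class class-default
shape average 50000000
interface Tunnel1
ip nhrp map group LOW-SPEED service-policy output SHAPE-LOW
ip nhrp map group HIGH-SPEED service-policy output SHAPE-HIGH
Each spoke would then register whichever group name matches its access circuit.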
That all said, I've written slightly more complex configs for this implementation, because the CCIE lab's questions are about as far from reality as you can get.
R1:
ip access-list extended TOWARDS-R2
permit ip any host 2.2.2.2
ip access-list extended TOWARDS-R3
permit ip any host 3.3.3.3
class-map match-all TOWARDS-R3
match access-group name TOWARDS-R3
class-map match-all TOWARDS-R2
match access-group name TOWARDS-R2
policy-map GROUP1-PM
class TOWARDS-R2
shape average 4000
class TOWARDS-R3
shape average 4000
policy-map GR1-POLICY-PARENT
class class-default
shape average 6000
service-policy GROUP1-PM
policy-map GR2-POLICY-PARENT
class class-default
shape average 8000
interface Tunnel1
ip nhrp map group GROUP1 service-policy output GR1-POLICY-PARENT
ip nhrp map group GROUP2 service-policy output GR2-POLICY-PARENT
The idea here is that the cumulative bandwidth of GROUP1 should not exceed 6K, and each spoke should only get 4K maximum. Cumulative GROUP2 should not exceed 8K.
I worked up the "proof" from this, but it doesn't work into a blog well. Suffice to say it works.
You can see the policy-map hits with show policy-map multipoint, and can also get information from show dmvpn detail.
Ingress Per-Tunnel QoS (policing and remarking, basically) is not supported on DMVPN.
I know the first time I mocked this up, the first question I had was: that's great for Hub -> Spoke, but what about Spoke -> Hub, or Spoke->Spoke?
Turns out they're both kind of a pain (not to mention unsupported). As of 15.x, you can no longer apply a service policy directly to an MGRE tunnel. You can, of course, still shape, police, and queue on the physical interface that your tunnel is sourced from. This more or less implies you need qos pre-classify, but interestingly, on 15.2(4)M6, I got the same results with or without it for traffic generated on the router itself - if I pinged from the router, the inside (pre-tunnel) QoS values showed up in the outer DSCP. I suspect that may have differed if I was testing from behind the device, but I didn't test it.
The big nail in the coffin for Spoke->Spoke QoS is that the neighbors are dynamic. Without some way of applying a grouping to the neighbor which implies how much bandwidth they have, or what traffic is priority for them, you have to either individually manually match destinations, which defeats the dynamic nature of DMVPN, or have one generic policy that matches both the hub and every spoke.
A sample config might look something like this:
class-map match-all ef
match dscp ef
policy-map out
class ef
priority percent 50
class class-default
random-detect
interface Tunnel1
qos pre-classify
interface FastEthernet0/0
service-policy output out
And finally, some miscellaneous topics that I thought were interesting.
UNIQUE NHRP
By default, the spoke instructs the hub that its registration is unique, and not to accept a registration for the same DMVPN (private) IP from a different NBMA (public) IP.
If you're using DHCP on a spoke, and your IP might change, you'd want to disable this.
Use ip nhrp registration no-unique on the spoke.
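A one-line sketch on the spoke's tunnel interface:
interface Tunnel1
ip nhrp registration no-unique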
TUNNEL KEYS
If you have multiple MGRE tunnels attached to the same physical interface, you need to put tunnel keys on them to keep them separate. Older IOSes (12.3 and older) required them on every MGRE tunnel.
Use tunnel key 123
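A quick sketch of two MGRE tunnels sharing one physical source, kept apart by keys (interface numbers and key values are arbitrary; whatever key you pick has to match on every router in that particular DMVPN cloud):
interface Tunnel1
tunnel source FastEthernet0/0
tunnel mode gre multipoint
tunnel key 1
interface Tunnel2
tunnel source FastEthernet0/0
tunnel mode gre multipoint
tunnel key 2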
SPOKE TO SPOKE MULTICAST
This is a very similar question to spoke-to-spoke QoS, but I can see this one getting used on the CCIE lab. It's impractical for large production networks, but in our topology:
R2:
ip pim rp-address 3.3.3.3
interface Tunnel1
ip nhrp map 10.0.0.3 87.14.30.1
ip nhrp map multicast 87.14.30.1
ip pim nbma-mode
ip pim sparse-mode
interface Loopback0
ip pim sparse-mode
ip igmp join-group 239.0.0.1
R3:
ip pim rp-address 3.3.3.3
interface Tunnel1
ip nhrp map 10.0.0.2 87.14.20.1
ip nhrp map multicast 87.14.20.1
ip pim nbma-mode
ip pim sparse-mode
interface Loopback0
ip pim sparse-mode
R3(config-if)#do sh ip pim neigh
PIM Neighbor Table
Mode: B - Bidir Capable, DR - Designated Router, N - Default DR Priority,
P - Proxy Capable, S - State Refresh Capable, G - GenID Capable
Neighbor Interface Uptime/Expires Ver DR
Address Prio/Mode
10.0.0.2 Tunnel1 00:06:00/00:01:38 v2 1 / S P G
R3(config-if)#do ping 239.0.0.1
Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 239.0.0.1, timeout is 2 seconds:
Reply to request 0 from 2.2.2.2, 344 ms
Reply to request 0 from 2.2.2.2, 344 ms
Very ungainly and manual, but it does work. It's also of note that EIGRP peered between R2 and R3 as well:
R3(config-if)#do sh ip eigrp neigh
EIGRP-IPv4 Neighbors for AS(100)
H Address Interface Hold Uptime SRTT RTO Q Seq
(sec) (ms) Cnt Num
1 10.0.0.2 Tu1 14 00:09:50 266 1596 0 20
0 10.0.0.1 Tu1 11 02:58:23 328 1968 0 32
CRYPTO CALL ADMISSION
So one of these theoretical hubs with 500 spokes - let's assume it's not a big burly router, but it's getting along just fine in steady-state. Uh-oh, it lost power and had to reboot! Does it have the horsepower to establish 500 encrypted tunnels all trying to reconnect at the same time?
crypto call admission limit IKE in-negotiation can control how many simultaneous tunnels it will try to process (any new incoming tunnels get dropped temporarily until the first group is up).
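A minimal sketch of that knob (assuming the in-negotiation-sa keyword on your IOS; 40 is an arbitrary example limit):
crypto call admission limit ike in-negotiation-sa 40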
Hope you enjoyed,
Jeff
Awesome as always, Jeff! Thank you!
Any idea how to do this? Allow R2/R3 and R2/R4 to form a spoke to spoke tunnel, but do not allow R3/R4 to form a direct tunnel. I'm trying to figure out how to filter the nhrp redirect / shortcut process, but only for R3/R4, without breaking anything else.
Are you needing to control this on the hub side? If not, it seems the easiest answer would be to install static routes on R2 and R4 pointing the traffic at the hub for the other side's subnets, forcing the traffic through the hub permanently. If you don't want them to talk at all, just filter the EIGRP traffic from hub -> spoke. To answer more directly, I'm not aware of a way (not to say it's not possible - I just don't know how) to have a selective redirect process on the hub, without controlling something underlying, like the network path, to avoid it hairpinning through the same interface.
Thanks Jeff. I was told that the ip nhrp interest command will accomplish this, just trying to figure out exactly how and where this needs to be set. The requirement (practice lab) didn't specify where the control needed to be issued, but static routes aren't allowed. So far, documentation on this command is lacking...
Update - got this to work. The command goes on the spoke routers. The ACL is an extended ACL that blocks the other spoke destinations. Tested in the lab, works great.
I went ahead and tried it; it works for me too. Curious what the usage of the standard ACL option is - using it for destination IPs doesn't do a thing, and you'd expect it to be source-based anyway, but I can't even fathom how that would work. But it's an option nonetheless:
R5(config-if)#ip nhrp interest ?
<1-199> IP access list
none No traffic is interesting.
I'm having trouble with your explanation that HUB1 would send an NHRP redirect to the spoke in SPOKE1 (see diagram13) in the phase 3 scenario. How does HUB1 know that it should send a redirect to the spoke if it does not know whether the destination is within the DMVPN topology?
For instance, what if that spoke in SPOKE1 was trying to send traffic to a directly connected subnet off of CORE1. Would that still trigger an NHRP redirect from the HUB1 router? In other words, does the hub send an NHRP redirect for any traffic it receives from a spoke?
Ben - it's all based on routing and next hop. For example, Spoke 1 will either have a default route to HUB, or an actual route to CORE1 via the routing protocol running over the DMVPN. It has the route, but it does not yet have an NHRP mapping to know how to get there. It will then use the defined NHRP NHS to send the request for the physical address.
Long story short, this works no differently than standard routing, i.e. how does one router know how to get to the next. The only difference here is NHRP will need to resolve the route's next hop.
Thanks Jeff
I think this is the most clear explanation of DMVPN logic I have ever seen. Great job.
I built my own lab using 3 x 4351s and a lot of VRFs... and I was going mad as my Phase 3 was not working even though all config was correct and Phase 2 was ok... and then, after reading your article I found two things that made me think... and scratch my head.
One was the NHRP Key and its local significance and another one was Phase 2 vs Phase 3 NHRP registration requests processing!
I have two VRFs on the same physical router but Tunnels were sharing the same NHRP key. Even though both were in different VRFs, using different 'external/tunnel' IP address, router was not able to differentiate the queries as they were coming from one VRF to another VRF on the same platform with the same NHRP key. Once I made keys different everything started to work like a charm.
This does make sense. In Phase 2 hubs do resolution on behalf of spokes and my hubs VRFs sit on a different physical router - hence there were no conflict.
This is the message I've seen while debugging NHRP:
NHRP: request loop detected. dropping packet.
There's little information about it in the Internet if any at all.
Thanks a lot!
Like how you simplified the phase 2 daisy chain of multiple hubs. And how phase 3 addresses the problem