Saturday, January 2, 2016

GETVPN

GETVPN, or Group Encrypted Transport VPN, is Cisco's implementation of the GDOI standard. GDOI, or Group Domain of Interpretation, is defined in RFC 6407, which obsoletes the original RFC 3547.

GDOI was originally developed as a way to encrypt multicast traffic, which was previously rather cumbersome to do with, say, GRE-over-IPSEC tunnels.

https://tools.ietf.org/html/rfc3547
"GDOI Applications. Secure multicast applications include video broadcast and multicast file transfer."

However, GETVPN is now commonly used to encrypt any type of traffic over any private network. It's most often used over MPLS VPNs: an MPLS VPN keeps customers' traffic separate but doesn't encrypt it, so without encryption you're putting a lot of faith in your service provider not to sniff your data. Since GETVPN is L2/L3 agnostic, it could arguably be used for any application where NAT is not involved. GETVPN does not replace DMVPN for Internet applications; more on that later in the article.

At a high level, GETVPN establishes a set of rotating encryption keys that the whole group shares. Because every member holds the same keys, any group member can encrypt data to any other group member without setting up a tunnel to it. In fact, the entire system is "tunnel-less". Additionally, because GETVPN re-uses the original IP header, the underlying routing is preserved. So if you're using BGP to peer with an MPLS VPN, that same routing just keeps working for the encrypted packets.
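To make the group-key idea concrete, here's a toy Python model. It assumes a single shared key stands in for the TEK and that HMAC-SHA256 stands in for ESP protection (HMAC authenticates but does not encrypt, and this is not how IOS implements GETVPN). The key value and GM names are hypothetical. The point is only that every GM pair can exchange protected packets with one shared key and no per-pair tunnels.

```python
import hashlib
import hmac

# Toy model: HMAC-SHA256 stands in for ESP protection. The key value and
# GM names are hypothetical; real GETVPN uses the TEK and transform set
# pushed by the KS.
GROUP_TEK = b"hypothetical-group-tek"


def protect(src, dst, payload, key):
    """Build a 'packet' that keeps the original src/dst and adds an auth tag."""
    header = f"{src}>{dst}".encode()
    tag = hmac.new(key, header + payload, hashlib.sha256).digest()
    return header, payload, tag


def verify(packet, key):
    """Any holder of the group key can check any other member's packet."""
    header, payload, tag = packet
    expected = hmac.new(key, header + payload, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)


def all_pairs_ok(members, key):
    """True if every ordered pair of members can exchange protected packets."""
    return all(
        verify(protect(a, b, b"data", key), key)
        for a in members
        for b in members
        if a != b
    )


members = ["GM1", "GM2", "GM3", "GM4"]
print(all_pairs_ok(members, GROUP_TEK))  # True: one shared key, zero tunnels
```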

How the encryption process occurs can be most easily shown over a series of slides.

There are two router types involved with GETVPN: Key Servers (KS) and Group Members (GM). GMs, in this usage, are customer CEs that encrypt traffic to one another. KSs are control-plane-only routers: they are not in the forwarding path, and they don't encrypt user data.

The first step is for the GMs to register with a KS. To do this, each GM establishes an ISAKMP session with the KS. This is a one-off ISAKMP session used only for this initial communication.



During this initial step, the GM initiates a "pull" from the KS and receives the initial Key Encryption Key (KEK) and Traffic Encryption Key (TEK). The KEK protects the rekey messages the KS later sends to the GMs, while the TEK is what the GMs use to encrypt data to one another. As I mentioned above, the initial ISAKMP session isn't used after this process, even though it may stay up for a good while afterward; only the KEK is. It's important to note that all KSs, of which there can be up to 8, can encrypt using the KEK, which makes it a sort of distributed/shared phase 1, as opposed to the initial point-to-point ISAKMP session.


The TEK, which (like the KEK) is generated by the primary key server and distributed to the other key servers, is then used by each GM to reach any other GM. I struggled with how to draw this without it looking like tunnels, which are inherently point-to-point.


There are some comparisons and contrasts to be drawn with both traditional IPSEC point-to-point tunnels and with DMVPN.

An obvious difference from point-to-point IPSEC is that, with some exceptions we'll cover throughout the article, all traffic leaving the CE -> PE interface is encrypted, regardless of its destination. This makes for a tunnel-less, or group-"tunnel"-style, interface. Moreover, GETVPN retains the original source and destination addresses in the IP header, whereas point-to-point IPSEC tunnels rewrite them to the tunnel endpoints. As such, traditional routing and multicast both work.
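The header difference can be sketched as follows. This is a simplified Python model, not real packet handling: only the source and destination fields are modeled, and "CE1-WAN"/"CE3-WAN" are hypothetical placeholders for the tunnel endpoint addresses. The inner addresses are Host1 and Host3 from the lab later in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IPHeader:
    src: str
    dst: str


def p2p_ipsec_outer(inner: IPHeader, tunnel_src: str, tunnel_dst: str) -> IPHeader:
    """Point-to-point tunnel-mode IPSEC: the outer header carries the tunnel
    endpoints, so the core only sees CE-to-CE addresses."""
    return IPHeader(tunnel_src, tunnel_dst)


def getvpn_outer(inner: IPHeader) -> IPHeader:
    """GETVPN (tunnel mode with header preservation): the outer header
    copies the original source and destination."""
    return IPHeader(inner.src, inner.dst)


inner = IPHeader("10.0.111.2", "10.0.33.2")  # Host1 -> Host3
print(p2p_ipsec_outer(inner, "CE1-WAN", "CE3-WAN"))  # IPHeader(src='CE1-WAN', dst='CE3-WAN')
print(getvpn_outer(inner))  # IPHeader(src='10.0.111.2', dst='10.0.33.2')
```

Because the outer header in the GETVPN case still carries the original addresses, the provider's existing routes for the customer prefixes keep working without change.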

While GETVPN and DMVPN may accomplish similar tasks, there are significant differences there as well. Without a messy static hack to the configuration, DMVPN only supports multicast from the DMVPN head-end to the spokes. As pointed out above, native multicast works fine on GETVPN, without the pseudo-multicast (replicating each multicast packet as unicast copies down each tunnel) that is common at tunnel head-ends. Additionally, DMVPN builds dynamic spoke-to-spoke tunnels on an as-needed basis, but that still leaves a spoke building a tunnel every time it needs to speak to another spoke. This creates overhead both in tunnel setup (there's a small but measurable delay while each tunnel is created) and in scalability: if a spoke needs to speak to hundreds of other spokes, it must build and maintain hundreds of point-to-point IPSEC tunnels.
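The scaling point is easy to put in numbers. A quick sketch, assuming the worst case where every spoke eventually talks to every other spoke (hub tunnels are ignored):

```python
def dmvpn_tunnels_per_spoke(spokes: int) -> int:
    """Worst case for one spoke: a tunnel to every other spoke."""
    return spokes - 1


def dmvpn_spoke_to_spoke_tunnels(spokes: int) -> int:
    """Worst case for the whole network: every spoke pair, n * (n - 1) / 2."""
    return spokes * (spokes - 1) // 2


def getvpn_tunnels(group_members: int) -> int:
    """GETVPN is tunnel-less: one group SA, no per-peer tunnels."""
    return 0


for n in (10, 100, 500):
    print(n, dmvpn_tunnels_per_spoke(n), dmvpn_spoke_to_spoke_tunnels(n), getvpn_tunnels(n))
# 10 9 45 0
# 100 99 4950 0
# 500 499 124750 0
```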

A disadvantage of GETVPN is that it doesn't support NAT, unless the NAT is engineered in such a way that it's invisible to the encryption devices. This is because GETVPN preserves the original IP header and addressing. One could arguably run GETVPN on the Internet if the GMs and KSs used only public IP addressing and any needed NAT were hidden on extra routers behind the GMs. I've seen documents on the Internet claiming even more can be done with GETVPN and NAT, but these are not use cases supported by Cisco, and I didn't try to verify them. Cisco's position is clear: if an Internet-facing tunnel with NAT is required, it's best to use DMVPN, which works well with NAT.

There's a fair amount going on behind the scenes in GETVPN, and I'm going to pause that explanation here to look at some of the config. A key aspect of the CCIE is knowing both the configuration steps and what's happening behind the scenes, and I always find it best to introduce the two together.

Here's the topology we will be working from:

  
I'll review IP address usage on the fly; there are too many links here to describe them all up front.
Also, we won't be reviewing any of the P or PE devices, as they're just a basic MPLS VPN configuration.

We'll start by looking at the configuration of Key Server 1 (KS1). For now, we'll pretend KS2 doesn't exist, as I'll cover that as part of the COOP (pronounced "co-op") configuration later in the document.

But first, a quick review of scope. As with all my previous articles, this one is targeted at the CCIE R&S. This means we'll only inspect the ISAKMP and IPSEC configuration deeply enough for an R&S understanding, and we'll skip advanced topics that are irrelevant to R&S (e.g. TrustSec integration).

KS1:
crypto isakmp policy 1
 encr aes
 authentication pre-share
 group 2

crypto isakmp key MYGDOIPSK address 0.0.0.0

crypto ipsec transform-set aes128 esp-aes esp-sha-hmac
 mode tunnel

crypto ipsec profile profile1
 set transform-set aes128

crypto gdoi group GDOI-GROUP1
 identity number 1234
 server local
  rekey algorithm aes 128
  rekey authentication mypubkey rsa MYRSAKEY
  rekey transport unicast
  sa ipsec 1
   profile profile1
   match address ipv4 getvpn-acl
   replay time window-size 5
  address ipv4 192.168.111.111

ip access-list extended getvpn-acl
 deny   udp any eq 848 any
 deny   udp any any eq 848
 deny   tcp any eq bgp any
 deny   tcp any any eq bgp
 permit ip any any

Not shown here is the BGP configuration. I have KS1 peered with PE1, advertising its loopback, 192.168.111.111. KS1 is set up much like any CE router in an MPLS VPN.

Keeping in mind that I'm only covering the crypto at a high level, here's what the relevant pieces of the config do:

crypto isakmp key MYGDOIPSK address 0.0.0.0

As mentioned above, every GM stands up a temporary ISAKMP session to the KS during registration. To do so, the GM and KS need to share a PSK (or use a PKI, which is out of scope for this article). You can create a key per GM, or just one that matches all GMs. Here we've defined a single key, MYGDOIPSK, for all GMs by using the wildcard address 0.0.0.0.
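When both a wildcard key and a per-GM key exist, the more specific entry should be the one used. A minimal Python sketch of that lookup, assuming most-specific-match selection; the per-GM key name and the 192.168.44.3 address are hypothetical:

```python
import ipaddress

# Hypothetical key table: the per-GM key is made up for illustration;
# MYGDOIPSK with 0.0.0.0 is the wildcard entry from KS1's config.
PSK_TABLE = {
    ipaddress.ip_network("0.0.0.0/0"): "MYGDOIPSK",
    ipaddress.ip_network("192.168.11.3/32"): "HYPOTHETICAL-CE1-KEY",
}


def psk_for_peer(peer: str, table=PSK_TABLE):
    """Return the key from the most specific entry matching `peer`, or None."""
    addr = ipaddress.ip_address(peer)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]


print(psk_for_peer("192.168.11.3"))  # HYPOTHETICAL-CE1-KEY
print(psk_for_peer("192.168.44.3"))  # MYGDOIPSK
```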

You can view the ISAKMP sessions before they die off, if desired:

KS1#show crypto isakmp sa
IPv4 Crypto ISAKMP SA
dst             src             state          conn-id status
192.168.111.111 192.168.11.3    GDOI_IDLE         1067 ACTIVE

If all devices had come up after a fresh reboot, you'd see four connections here, but as I've only recently bounced one, the other three have expired already.

crypto gdoi group GDOI-GROUP1
 identity number 1234


The crypto gdoi group command is where all the magic happens on the KSs, and we'll review the rest of its configuration below. Note that it's not applied to an interface on the KS. The KS doesn't encrypt anything but its own control-plane traffic, so this config, when used with 'server local', simply enables the GDOI KS process and opens UDP port 848 for communication with the GMs (and, later, with other KSs for COOP). The identity number defines which encryption group this config belongs to: a KS can run multiple groups for different GMs and keep the keying (and consequently the communication) isolated between groups.

 server local
  rekey algorithm aes 128
  rekey authentication mypubkey rsa MYRSAKEY
  

Here we define the KEK and the rekey process. The KEK, shown here, uses 128-bit AES, and the rekey messages are authenticated with the RSA key "MYRSAKEY". The RSA keys need to be created beforehand, which is accomplished with:

crypto key generate rsa label MYRSAKEY modulus 1024 exportable

With a single KS you don't technically need to make the key exportable, but if you ever want to add a second KS, this is mandatory, so it's a good idea to do it to begin with.

  rekey transport unicast

As with any IPSEC tunnel, the keys are rotated periodically so that a compromised key can't be used to decrypt future messages. In other words, the theory is that cracking a key takes longer than the key's lifetime, so an attacker can never decrypt data in real time. GETVPN has to rekey both the KEK and the TEK periodically, at intervals configured in IOS. New keys are sent out before the old keys expire, so there's a clean rollover to the new key when the time comes.
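The rollover can be sketched as a schedule in which each key is handed out some lead time before it takes over. This is a simplified Python model with hypothetical timers (a 3600-second lifetime and a 90-second lead); real IOS rekey timing is more involved.

```python
def key_schedule(lifetime: int, lead: int, generations: int):
    """Return (key_id, distributed_at, active_from, expires_at) per key.

    Each key takes over when the previous one expires, but is handed out
    `lead` seconds earlier so every GM already holds it at rollover time.
    """
    schedule = []
    for key_id in range(generations):
        active_from = key_id * lifetime
        expires_at = active_from + lifetime
        distributed_at = max(0, active_from - lead)
        schedule.append((key_id, distributed_at, active_from, expires_at))
    return schedule


def keys_held(schedule, t: int):
    """Keys a GM holds at time t: already distributed and not yet expired."""
    return [k for k, dist, _active, exp in schedule if dist <= t < exp]


def active_key(schedule, t: int):
    """The key used to encrypt at time t."""
    return next(k for k, _dist, active, exp in schedule if active <= t < exp)


sched = key_schedule(lifetime=3600, lead=90, generations=3)  # hypothetical timers
print(keys_held(sched, 3550), active_key(sched, 3550))  # [0, 1] 0
print(keys_held(sched, 3600), active_key(sched, 3600))  # [1] 1
```

During the overlap window a GM already holds the next key, so nothing has to be fetched at the instant of rollover.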

There are two methods for rekeying with GETVPN. GDOI was originally developed to encrypt multicast traffic, so, logically, rekeying via multicast is an option. I didn't lab this, as it would've required me either to move away from MPLS as my core or to enable service-provider multicast over MPLS, which seemed excessive for the scope I was trying to cover. Regardless, Cisco now recommends unicast rekey, mainly because unicast has an acknowledgement system that multicast lacks. Multicast rekey is "fire and forget": the KS simply hopes the new keys reach their destination. Unicast rekey double-checks that the keys were received by expecting an ACK back from each GM. Eventually, if a key is about to expire and the GM hasn't received a replacement, the GM will attempt to re-register with the KS to resolve the issue.
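The difference between the two rekey methods comes down to feedback. Here's a simplified Python sketch of the unicast side, with a hypothetical retransmit limit and made-up GM names. Multicast rekey has no equivalent of the ACK data modeled here: the KS sends once and never learns which GMs missed it.

```python
def unicast_rekey(first_ack_attempt: dict, max_attempts: int):
    """Split GMs into (acked, must_reregister) after one unicast rekey round.

    first_ack_attempt maps GM name -> the 1-based send attempt on which the
    KS first receives that GM's ACK, or None if no ACK ever arrives. The KS
    stops retransmitting to a GM once it has ACKed, or after max_attempts.
    """
    acked, must_reregister = [], []
    for gm, attempt in first_ack_attempt.items():
        if attempt is not None and attempt <= max_attempts:
            acked.append(gm)
        else:
            # No ACK: this GM will re-register before its current key expires.
            must_reregister.append(gm)
    return acked, must_reregister


# Hypothetical round: GM2 needs a retransmit, GM3 never answers.
print(unicast_rekey({"GM1": 1, "GM2": 2, "GM3": None}, max_attempts=3))
# (['GM1', 'GM2'], ['GM3'])
```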

The actual rekeying/retry logic is incredibly deep, and for more information on it, I recommend reading the Cisco documentation, which is actually quite good (to my surprise, as most of my CCIE-level articles got written in the first place because the Cisco documentation is generally awful):

http://www.cisco.com/c/en/us/td/docs/ios/12_4t/12_4t11/htgetvpn.html

  sa ipsec 1
   profile profile1
   match address ipv4 getvpn-acl
   replay time window-size 5


Here the TEK attributes are defined; the transform set comes from profile1:

crypto ipsec profile profile1
 set transform-set aes128

The match address line points to the ACL that defines which traffic gets encrypted, getvpn-acl.

It's probably not desirable to encrypt all traffic over the MPLS circuit. For example, the routing protocol to the PE (probably BGP) must be exempt, since the PE isn't a group member and can't decrypt it. The GDOI control-plane traffic between GM and KS (UDP port 848) must be exempt too, since registration has to work before a GM has any keys. You might also want to exempt your management protocols, such as ICMP, SSH, and perhaps SNMP.

At its most basic, your ACL should look something like this:
ip access-list extended getvpn-acl
 deny   udp any eq 848 any
 deny   udp any any eq 848
 deny   tcp any eq bgp any
 deny   tcp any any eq bgp
 permit ip any any

A deny entry means "don't encrypt this traffic", and a permit entry means "encrypt it". Normally this ACL will end in "permit ip any any".
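Because the first matching entry wins, the order of the ACL matters. Here's a minimal Python model of how getvpn-acl classifies traffic, assuming first-match semantics and an implicit deny (send in the clear) at the end. Addresses are left out because every entry uses "any", and the port numbers in the examples are illustrative.

```python
# Each entry: (action, protocol, src_port, dst_port); None means "any".
GETVPN_ACL = [
    ("deny", "udp", 848, None),
    ("deny", "udp", None, 848),
    ("deny", "tcp", 179, None),
    ("deny", "tcp", None, 179),
    ("permit", "ip", None, None),
]


def should_encrypt(proto, sport, dport, acl=GETVPN_ACL):
    """First matching entry decides: permit = encrypt, deny = send in the clear."""
    for action, a_proto, a_sport, a_dport in acl:
        if a_proto != "ip" and a_proto != proto:
            continue
        if a_sport is not None and a_sport != sport:
            continue
        if a_dport is not None and a_dport != dport:
            continue
        return action == "permit"
    return False  # implicit deny: not encrypted


print(should_encrypt("tcp", 51000, 179))  # False: BGP to the PE stays in the clear
print(should_encrypt("tcp", 51000, 23))   # True: telnet falls through to permit ip any any
```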

The replay time command has a big behind-the-scenes topic to discuss: the traditional IPSEC anti-replay method doesn't work with GETVPN. If you're not familiar with replay attacks, "A replay attack is a form of network attack in which a valid data transmission is maliciously or fraudulently repeated or delayed. It is an attempt to subvert security by someone who records legitimate communications and repeats them in order to impersonate a valid user, and to disrupt or cause negative impact for legitimate connections."

http://www.cisco.com/c/en/us/support/docs/ip/internet-key-exchange-ike/116858-problem-replay-00.html

Also from the same document, anti-replay is described: "IPSec provides anti-replay protection against an attacker who duplicates encrypted packets with the assignment of a monotonically increasing sequence number to each encrypted packet".

In a nutshell, traditional anti-replay embeds a sequence number in each packet, and the far side of the point-to-point tunnel expects that number to keep counting up, one packet at a time. This clearly can't work with GETVPN: every GM shares the same SA and any of them can send traffic, so there's no two-router pair to keep a counter in sync. Enter Time-Based Anti-Replay, or TBAR.

TBAR has the KS maintain a pseudo-time clock ('pseudo' because it isn't based on NTP) and synchronize it to the GMs, which gives every GM a common time reference. Each GM embeds its pseudo-timestamp in every packet it sends, and if that timestamp differs from the receiving GM's own pseudo-time by more than X seconds, the packet is considered a replay attack and dropped. X is set by replay time window-size 5, where 5 is the number of seconds a packet is considered valid.
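The TBAR check itself is simple. A minimal Python sketch, assuming the window applies in both directions around the receiver's pseudo-time (to tolerate small drift); the pseudo-time values are made up:

```python
def tbar_accept(packet_pseudotime: float, local_pseudotime: float, window: float) -> bool:
    """Accept only packets whose pseudo-time is within `window` seconds of ours."""
    return abs(local_pseudotime - packet_pseudotime) <= window


WINDOW = 5  # replay time window-size 5
print(tbar_accept(1000.0, 1003.0, WINDOW))  # True: sent 3 s ago
print(tbar_accept(1000.0, 1012.0, WINDOW))  # False: a capture replayed 12 s later
```

No per-sender state is needed, which is exactly why this works when any GM can send to any other GM.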

  address ipv4 192.168.111.111

This defines the local IP address the KS uses to send and receive GETVPN messages. It's normally set to a loopback; our loopback on KS1 is 192.168.111.111.

Now let's move on to our first spoke configuration, on CE1/GM1:

crypto isakmp policy 1
 encr aes
 authentication pre-share
 group 2
crypto isakmp key MYGDOIPSK address 192.168.111.111
crypto gdoi group GDOI-GROUP1
 identity number 1234
 server address ipv4 192.168.111.111

crypto map gdoimap 1 gdoi
 set group GDOI-GROUP1

int e0/0
 crypto map gdoimap


This configuration is notably smaller than the KS's. Moreover, with rare exceptions, it can be pasted identically onto every GM, so config deployment is very easy and fast.

crypto isakmp policy 1
 encr aes
 authentication pre-share
 group 2
crypto isakmp key MYGDOIPSK address 192.168.111.111

The ISAKMP policy is an identical match to the one on the KS, and the key matches the KS's MYGDOIPSK, here pointed at the KS's address. The GM uses this to establish the temporary ISAKMP session to the KS in order to register and download the KEK and TEK. The only really important point here is that this config matches the KS's.

crypto gdoi group GDOI-GROUP1
 identity number 1234
 server address ipv4 192.168.111.111


Here we define our group identity number, which controls which key set we receive and therefore which members we can speak to. For simplicity, our whole deployment is in group 1234. The server address line determines which KS we register to. There can be more than one KS, and we'll cover the GM config for that when we cover COOP on the KSs.

crypto map gdoimap 1 gdoi
 set group GDOI-GROUP1


int e0/0
 crypto map gdoimap


On the GMs, we activate both the control plane and the forwarding plane of GETVPN on the interface, via the crypto map, unlike on the KS, which has no interface-level config.

My lab has this all running already, so I'm going to manually bounce CE1 to watch the registration process.

CE1#sh run int e0/0
Building configuration...

Current configuration : 152 bytes
!
interface Ethernet0/0
 ip address 192.168.11.3 255.255.255.0
 crypto map gdoimap
end

CE1(config)#int e0/0
CE1(config-if)#no crypto map gdoimap
CE1(config-if)#crypto map gdoimap

I'll break the log down:
*Jan  2 18:10:37.203: %CRYPTO-5-GM_REGSTER: Start registration to KS 192.168.111.111 for group GDOI-GROUP1 using address 192.168.11.3 fvrf default ivrf default 

We started the registration.

*Jan  2 18:10:37.236: %GDOI-5-SA_TEK_UPDATED: SA TEK was updated
*Jan  2 18:10:37.237: %GDOI-5-SA_KEK_UPDATED: SA KEK was updated

We received the TEK and KEK.

*Jan  2 18:10:37.237: %GDOI-5-GM_REGS_COMPL: Registration to KS 192.168.111.111 complete for group GDOI-GROUP1 using address 192.168.11.3 fvrf default ivrf default 

We successfully registered to KS 192.168.111.111.

*Jan  2 18:10:37.238: %GDOI-5-GM_INSTALL_POLICIES_SUCCESS: SUCCESS: Installation of Reg/Rekey policies from KS 192.168.111.111 for group GDOI-GROUP1 & gm identity 192.168.11.3 fvrf default ivrf default

Policies pushed from KS1 were activated successfully.


Remember the ACL we put on the key server? It's downloaded to the GM as part of the registration process:

CE1#show crypto gdoi gm acl
Group Name: GDOI-GROUP1
 ACL Downloaded From KS 192.168.111.111:
   access-list   deny udp any port = 848 any
   access-list   deny udp any any port = 848
   access-list   deny tcp any port = 179 any
   access-list   deny tcp any any port = 179
   access-list   permit ip any any
 ACL Configured Locally:

Moreover, if you update the ACL on the KS, it will get re-pushed with the next scheduled rekey, or you can force a rekey at any time with:

crypto gdoi ks rekey ! refreshes the ACL and sends out the next set of keys
crypto gdoi ks rekey replace-now ! steps above, plus force swapping to a new key (traffic impacting)

There are some other useful show commands we'll take a moment to look at.

One thing that threw me initially is that the traditional IPSEC show commands don't work all that well here. The KEK and TEK are different enough that commands developed for point-to-point IPSEC produce some odd output, for example:

CE1#show crypto ipsec sa

interface: Ethernet0/0
    Crypto map tag: gdoimap, local addr 192.168.11.3

   protected vrf: (none)
   local  ident (addr/mask/prot/port): (0.0.0.0/0.0.0.0/0/0)
   remote ident (addr/mask/prot/port): (0.0.0.0/0.0.0.0/0/0)
   <output omitted>

Normally, the local and remote idents would show the local and remote subnets from the ACE in the interesting-traffic list that this IPSEC SA covers. With GDOI, however, a single SA is shown for the whole group, and no ACE information from the GDOI ACL is given.

You can, however, use the traditional ISAKMP commands to see the temporary ISAKMP session to the KS:

CE1#show crypto isakmp sa
IPv4 Crypto ISAKMP SA
dst             src             state          conn-id status
192.168.111.111 192.168.11.3    GDOI_IDLE         1067 ACTIVE

That said, let's look at the GDOI-specific commands.

On the GM, to see if you're registered:

CE1#show crypto gdoi gm
Group Member Information For Group GDOI-GROUP1:
    IPSec SA Direction       : Both
    ACL Received From KS     : gdoi_group_GDOI-GROUP1_temp_acl

    Group member             : 192.168.11.3    vrf: None
       Local addr/port       : 192.168.11.3/848
       Remote addr/port      : 192.168.111.111/848
       fvrf/ivrf             : None/None
       Version               : 1.0.8
       Registration status   : Registered
       Registered with       : 192.168.111.111
       Re-registers in       : 897 sec
       Succeeded registration: 1
       Attempted registration: 1
      <output omitted for brevity>

On the KS, to see who's registered to it:

KS1#show crypto gdoi ks members summary | s 11.3
Group Member ID    : 192.168.11.3        GM Version: 1.0.8
 Group ID          : 1234
 Group Name        : GDOI-GROUP1
 GM State          : Registered
 Key Server ID     : 192.168.111.111

This command produces a lot of output when many GMs are registered, even with "summary". As all of my lab GMs are up at the moment, I filtered the output to just the one we've been working with (CE1).

To view the details on KEK and TEK on the GM (you may want to check the remaining lifetimes):

CE1#show crypto gdoi | s KEK
KEK POLICY:
    Rekey Transport Type     : Unicast
    Lifetime (secs)          : 84190
    Encrypt Algorithm        : AES
    Key Size                 : 128
    Sig Hash Algorithm       : HMAC_AUTH_SHA
    Sig Key Length (bits)    : 1296

CE1#show crypto gdoi | s TEK
TEK POLICY for the current KS-Policy ACEs Downloaded:
  Ethernet0/0:
    IPsec SA:
        spi: 0x6BAFD3AB(1806685099)
        transform: esp-aes esp-sha-hmac
        sa timing:remaining key lifetime (sec): (1386)
        Anti-Replay(Time Based) : 5 sec interval
        tag method : disabled
        alg key size: 16 (bytes)
        sig key size: 20 (bytes)
        encaps: ENCAPS_TUNNEL

Now our GM is set to encrypt any traffic (minus UDP 848 and BGP) that leaves its e0/0 interface.

If you've ever watched an American cooking show, there's always a moment when the celebrity chef shows the basics of putting an uncooked dish together, then instantly pulls out the finished product that's been in the oven for two hours, courtesy of the magic of television. This is my moment! Off camera, I've applied the GM config to the other three CE devices, and we now have encrypted communication across all CE and host devices on the MPLS VPN.

I'm going to send pings from Host1 (10.0.111.2, with a default route to CE1) to Host3 (10.0.33.2, with a default route to CE3).

HOST1#ping 10.0.33.2 repeat 10
Type escape sequence to abort.
Sending 10, 100-byte ICMP Echos to 10.0.33.2, timeout is 2 seconds:
!!!!!!!!!!
Success rate is 100 percent (10/10), round-trip min/avg/max = 5/6/7 ms

Great, how do we verify that the packets were encrypted? We go ask CE1:

CE1#show crypto gdoi gm dataplane counters

Data-plane statistics for group GDOI-GROUP1:
    #pkts encrypt            : 10       #pkts decrypt            : 10
    #pkts tagged (send)      : 0        #pkts untagged (rcv)     : 0
    #pkts no sa (send)       : 0        #pkts invalid sa (rcv)   : 0
    #pkts encaps fail (send) : 0        #pkts decap fail (rcv)   : 0
    #pkts invalid prot (rcv) : 0        #pkts verify fail (rcv)  : 0
    #pkts not tagged (send)  : 0        #pkts not untagged (rcv) : 0
    #pkts internal err (send): 0        #pkts internal err (rcv) : 0

Since CE1 encrypted 10 packets and decrypted 10 packets, it's a fair bet that our pings were encrypted using the current TEK. We could go check CE3 as well, but we already know what we'd find: the 10 packets CE1 decrypted are the replies CE3 encrypted.

Let's test this with an ACL change on the KS:

KS1(config)#ip access-list extended getvpn-acl
KS1(config-ext-nacl)#1 deny tcp any eq telnet any
KS1(config-ext-nacl)#2 deny tcp any any eq telnet
KS1(config-ext-nacl)#end
KS1#
*Jan  2 19:16:51.992: %SYS-5-CONFIG_I: Configured from console by console
*Jan  2 19:16:51.992: %GDOI-5-POLICY_CHANGE: GDOI group GDOI-GROUP1 policy has changed. Use 'crypto gdoi ks rekey' to send a rekey, or the changes will be send in the next scheduled rekey

The KS is smart enough to know we just changed global policy, and throws a reminder that the GMs won't be aware of this until the next scheduled rekey unless we force it:

KS1#crypto gdoi ks rekey
KS1#
*Jan  2 19:17:36.784: %GDOI-5-KS_SEND_UNICAST_REKEY: Sending Unicast Rekey with policy-replace for group GDOI-GROUP1 from address 192.168.111.111 with seq # 23

Meanwhile, back on CE1:

CE1#show crypto gdoi gm acl
Group Name: GDOI-GROUP1
 ACL Downloaded From KS 192.168.111.111:
   access-list   deny tcp any port = 23 any
   access-list   deny tcp any any port = 23

   access-list   deny udp any port = 848 any
   access-list   deny udp any any port = 848
   access-list   deny tcp any port = 179 any
   access-list   deny tcp any any port = 179
   access-list   permit ip any any
 ACL Configured Locally:

And then test from Host1:

HOST1#telnet 10.0.33.2
Trying 10.0.33.2 ... Open
Password required, but none set
[Connection to 10.0.33.2 closed by foreign host]

I don't actually have Host3 set up to accept telnet logins, but that's irrelevant: we just generated bidirectional traffic.

And for verification on CE1:

CE1#show crypto gdoi gm dataplane counters

Data-plane statistics for group GDOI-GROUP1:
    #pkts encrypt            : 10       #pkts decrypt            : 10
    <output omitted for brevity>

Note that the counters didn't increase this time, because the telnet traffic was sent in plain text. Let's pull those new telnet exemptions back off KS1 and try again:

KS1(config)#ip access-list ext getvpn-acl
KS1(config-ext-nacl)#no 1
KS1(config-ext-nacl)#no 2

KS1#crypto gdoi ks rekey

Try again on Host1:

HOST1#telnet 10.0.33.2
Trying 10.0.33.2 ... Open
Password required, but none set
[Connection to 10.0.33.2 closed by foreign host]

And back to CE1 for verification:

CE1#show crypto gdoi gm dataplane counters

Data-plane statistics for group GDOI-GROUP1:
    #pkts encrypt            : 31       #pkts decrypt            : 29
    <output omitted for brevity>

Perfect!

Another benefit of GETVPN is seamless QoS support. All Cisco tunneling solutions copy the ToS byte from the original packet into the outer header when creating the encrypted packet. As such, unless you need to classify egress traffic on something other than the ToS byte (which would require a qos pre-classify configuration), no QoS changes are required to migrate to GETVPN.
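The EF marking used in the ping test works out as follows: DSCP occupies the upper six bits of the old ToS byte, so DSCP 46 (EF) is ToS 184. A toy Python sketch of the marking arithmetic and the ToS copy, assuming an encapsulation that copies the byte unchanged:

```python
EF_DSCP = 46


def dscp_to_tos(dscp: int) -> int:
    """DSCP sits in the top six bits of the old ToS byte."""
    return dscp << 2


def tos_to_dscp(tos: int) -> int:
    return tos >> 2


def encapsulate(inner_tos: int) -> dict:
    """Toy encapsulation: the outer header copies the inner ToS byte.
    After encryption only outer_tos is visible to egress QoS."""
    return {"outer_tos": inner_tos, "inner_tos": inner_tos}


pkt = encapsulate(dscp_to_tos(EF_DSCP))
print(pkt["outer_tos"], tos_to_dscp(pkt["outer_tos"]))  # 184 46
```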

We'll test from Host1 to Host2.

First, we need to set up QoS on CE1:

CE1(config)#class-map match-all EF
CE1(config-cmap)# match dscp ef
CE1(config-cmap)#policy-map QOS
CE1(config-pmap)# class EF
CE1(config-pmap-c)#  priority 50000
CE1(config)#int e0/0
CE1(config-if)#service-policy output QOS

HOST1#ping   ! Extended Ping is required to set ToS to 184 (DSCP EF)
Protocol [ip]:
Target IP address: 10.0.222.2   ! Host2
Repeat count [5]: 100
Datagram size [100]:
Timeout in seconds [2]:
Extended commands [n]: y
Source address or interface:
Type of service [0]: 184
<output omitted for brevity>
Type escape sequence to abort.
Sending 100, 100-byte ICMP Echos to 10.0.222.2, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Success rate is 100 percent (100/100), round-trip min/avg/max = 1/8/46 ms

Back to CE1 for verification:

CE1#show policy-map int | s EF
    Class-map: EF (match-all)
      100 packets, 19800 bytes
      30 second offered rate 0000 bps, drop rate 0000 bps
      Match:  dscp ef (46)
      Priority: 50000 kbps, burst bytes 1250000, b/w exceed drops: 0
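The burst value in that output follows from the configured rate. Assuming the IOS default priority burst of 200 ms of traffic at the configured rate, the arithmetic matches:

```python
def priority_burst_bytes(rate_kbps: int, burst_ms: int = 200) -> int:
    """Bytes the priority queue may burst: rate x burst time, in bytes."""
    bits = rate_kbps * 1000 * burst_ms // 1000
    return bits // 8


print(priority_burst_bytes(50000))  # 1250000, matching the show output
```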

Now let's take a look at COOP, the key server redundancy protocol for GETVPN.

COOP works by establishing permanent ISAKMP sessions between redundant key servers. It uses these sessions to keep GM registration status in sync, and it uses dead peer detection (DPD) to make sure the other key servers are up.

As I mentioned previously, all KSs must have the same RSA public and private keys installed. That way, if the primary KS fails, a secondary KS can carry on the KEK rekey channel with the GMs, signing rekeys with the same key the GMs already trust. As a result, GMs don't need to re-register with the backup KS if a KS fails. Traffic isn't impacted either: a KS failure is a 'hitless' outage for the GMs.

An additional benefit of having the KSs in sync with one another as well as having the same key on all servers is that a GM can register to any KS - even if it's not the primary. This can become important if a network gets segmented, where some GMs can reach, say, KS1, and others can only reach KS2 (again, note there can be up to eight KSes). When the KSs can reach one another again, they sync their registration database back together!

One more important item of note, if you were paying attention to the diagram: I have KS2 behind a GM. Key servers can be directly connected as CEs, they can be behind a GM, they can basically be anywhere that's globally routable from the rest of the network - it doesn't matter. To show this, I put KS1 in a CE-style configuration, directly attached to PE1, and KS2 behind CE2, in more of a "host-like" setup.

As a quick reminder, here's the command we used above to generate the RSA keys:
crypto key generate rsa label MYRSAKEY modulus 1024 exportable

Now we need to go back to KS1 and retrieve that key for KS2:
KS1(config)#crypto key export rsa MYRSAKEY pem terminal 3des MYSECRETPASS

% Key name: MYRSAKEY
   Usage: General Purpose Key
   Key data:
-----BEGIN PUBLIC KEY-----
<Public Key Omitted for Brevity>
-----END PUBLIC KEY-----
-----BEGIN RSA PRIVATE KEY-----
Proc-Type: 4,ENCRYPTED
DEK-Info: DES-EDE3-CBC,0B2283C620CB3CCA

<Private Key Omitted for Brevity>
-----END RSA PRIVATE KEY-----

Now that we have the keys, we can import them into KS2:

KS2(config)#crypto key import rsa MYRSAKEY terminal MYSECRETPASS
% Enter PEM-formatted public General Purpose key or certificate.
% End with a blank line or "quit" on a line by itself.
-----BEGIN PUBLIC KEY-----
<Public Key Omitted for Brevity>
-----END PUBLIC KEY-----
quit
% Enter PEM-formatted encrypted private General Purpose key.
% End with "quit" on a line by itself.
-----BEGIN RSA PRIVATE KEY-----
Proc-Type: 4,ENCRYPTED
DEK-Info: DES-EDE3-CBC,0B2283C620CB3CCA

<Private Key Omitted for Brevity>
-----END RSA PRIVATE KEY-----
quit
% Key pair import succeeded.

Now back to configure COOP on KS1:

KS1(config)#crypto isakmp keepalive 10 periodic

KS1(config)#crypto isakmp key COOPKEY address 192.168.222.222
KS1(config)#crypto gdoi group GDOI-GROUP1
KS1(config-gdoi-group)#server local
KS1(gdoi-local-server)#redundancy
KS1(gdoi-coop-ks-config)#local priority 100
KS1(gdoi-coop-ks-config)#peer address ipv4 192.168.222.222

There's not really much new config here, but I'll run over the key elements:

crypto isakmp keepalive 10 periodic
COOP uses Dead Peer Detection (DPD) to keep track of its neighbors' up/down status, and DPD needs to be enabled with this command.

crypto isakmp key COOPKEY address 192.168.222.222
Because the KSs maintain an ISAKMP session between them, we need a key to set that session up. I believe one could hypothetically re-use the GMs' key, but that seems like a bad idea, so I make a practice of using a different key. Note that if you had more than two KSs, this config would need to be replicated for each peer KS.

redundancy
  local priority 100
  peer address ipv4 192.168.222.222


Enter the redundancy config, set the local priority (higher is better and more likely to become primary), and enter the addresses of the other COOP key servers. Each key server needs to be configured with the IPs of all the other key servers, so if you were running three COOP key servers, each KS would have two peer address entries.
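The election and peering rules can be sketched in Python. This is a simplified model, not the COOP protocol itself: the higher-IP tie-breaker for equal priorities is an assumption of this sketch. It also shows that each KS carries n - 1 peer address lines.

```python
import ipaddress


def coop_roles(priorities: dict) -> dict:
    """priorities maps KS address -> 'local priority'. Highest priority wins;
    ties are broken on the higher IP address (an assumption in this sketch)."""
    primary = max(priorities, key=lambda ip: (priorities[ip], ipaddress.ip_address(ip)))
    return {ip: "Primary" if ip == primary else "Secondary" for ip in priorities}


def peer_lines(priorities: dict) -> dict:
    """Each KS lists every *other* KS, so n key servers need n - 1 lines each."""
    return {
        ip: [f"peer address ipv4 {other}" for other in priorities if other != ip]
        for ip in priorities
    }


lab = {"192.168.111.111": 100, "192.168.222.222": 40}
print(coop_roles(lab))
# {'192.168.111.111': 'Primary', '192.168.222.222': 'Secondary'}
print(peer_lines(lab)["192.168.111.111"])
# ['peer address ipv4 192.168.222.222']
```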

With some mild adaptation of KS1's config, KS2's config looks like this:

ip access-list extended getvpn-acl
 deny   udp any eq 848 any
 deny   udp any any eq 848
 deny   tcp any eq bgp any
 deny   tcp any any eq bgp
 permit ip any any

crypto isakmp policy 1
 encr aes
 authentication pre-share
 group 2
crypto isakmp key COOPKEY address 192.168.111.111
crypto isakmp key MYGDOIPSK address 0.0.0.0
crypto isakmp keepalive 10 periodic
crypto ipsec transform-set aes128 esp-aes esp-sha-hmac
 mode tunnel
crypto ipsec profile profile1
 set transform-set aes128
crypto gdoi group GDOI-GROUP1
 identity number 1234
 server local
  rekey algorithm aes 128
  rekey authentication mypubkey rsa MYRSAKEY
  rekey transport unicast
  sa ipsec 1
   profile profile1
   match address ipv4 getvpn-acl
   replay time window-size 5
  address ipv4 192.168.222.222
  redundancy
   local priority 40
   peer address ipv4 192.168.111.111

And with that, COOP is up!

KS1#show crypto gdoi ks coop
Crypto Gdoi Group Name :GDOI-GROUP1
        Group handle: 2147483650, Local Key Server handle: 2147483650

        Local Address: 192.168.111.111
        Local Priority: 100
        Local KS Role: Primary   , Local KS Status: Alive
        Local KS version: 1.0.8
        Primary Timers:
                Primary Refresh Policy Time: 20
                Remaining Time: 10
                Antireplay Sequence Number: 247

        Peer Sessions:
        Session 1:
                Server handle: 2147483651
                Peer Address: 192.168.222.222
                Peer Version: 1.0.8
                Peer Priority: 40
                Peer KS Role: Secondary , Peer KS Status: Alive
                Antireplay Sequence Number: 31

                IKE status: Established
                Counters:
                    Ann msgs sent: 220
                    Ann msgs sent with reply request: 0
                    Ann msgs recv: 2
                    Ann msgs recv with reply request: 1
                    Packet sent drops: 27
                    Packet Recv drops: 0
                    Total bytes sent: 156002
                    Total bytes recv: 2247

As I mentioned, there's a permanent ISAKMP session established between COOP KSs, and you can see it with the standard ISAKMP show commands:

KS1#show crypto isakmp sa
IPv4 Crypto ISAKMP SA
dst             src             state          conn-id status
192.168.222.222 192.168.111.111 GDOI_IDLE         1001 ACTIVE

We then need to tell our GMs about the additional server(s):

CE1, CE2, CE3, & CE4:
crypto gdoi group GDOI-GROUP1
 server address ipv4 192.168.222.222

So - let's take KS1 out of commission and try a few things.

KS1(config)#int e0/0
KS1(config-if)#shut

Eventually, KS2 will realize that KS1 is out of commission. It's important to note that, unlike a redundancy protocol that sits directly in the data plane (like HSRP), a brief key server outage in an appropriately built GETVPN shouldn't be a big deal. The key servers are there to distribute policy and keys, and the keys are sent out well in advance, so it's unlikely the GMs would even notice the KS outage until they next needed to re-register, sometime in the distant future.

KS2 (a while later):
*Jan  2 20:26:55.540: %GDOI-5-COOP_KS_TRANS_TO_PRI: KS 192.168.222.222 in group GDOI-GROUP1 transitioned to Primary (Previous Primary = 192.168.111.111)

*Jan  2 20:27:15.543: %GDOI-3-COOP_KS_UNREACH: Cooperative KS 192.168.111.111 Unreachable in group GDOI-GROUP1. IKE SA Status = Failed to establish.

KS2 realizes the primary is down and takes over the primary role itself.
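The failover we just saw can be modeled as a simple election: among the KSs that are still alive, the highest priority wins. Cisco's COOP documentation describes a tie-break on the higher IP address when priorities are equal; I haven't tested that tie-break in this lab, so treat it as an assumption. This toy Python model also makes no attempt to capture the preemption question that comes up later in this post.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass
class KeyServer:
    address: str
    priority: int
    alive: bool = True


def elect_primary(servers):
    """Toy COOP election: the highest priority among live KSs wins.

    Assumption: equal priorities are broken by the higher IP address.
    Returns None if no KS is alive.
    """
    candidates = [ks for ks in servers if ks.alive]
    if not candidates:
        return None
    return max(candidates, key=lambda ks: (ks.priority, IPv4Address(ks.address)))


if __name__ == "__main__":
    ks1 = KeyServer("192.168.111.111", priority=100)
    ks2 = KeyServer("192.168.222.222", priority=40)
    print(elect_primary([ks1, ks2]).address)  # KS1 while both are up
    ks1.alive = False                         # "shut" KS1's interface
    print(elect_primary([ks1, ks2]).address)  # KS2 takes over
```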

Let's check in on CE1, and when it expects a rekey:

CE1#show crypto gdoi | i life
        sa timing:remaining key lifetime (sec): (3063)

It has a while to go. Let's see if it will accept new keys from KS2:

KS2#crypto gdoi ks rekey replace-now


CE1#show crypto gdoi | i life
        sa timing:remaining key lifetime (sec): (3598)

CE1#show crypto gdoi gm
Group Member Information For Group GDOI-GROUP1:
    IPSec SA Direction       : Both
    ACL Received From KS     : gdoi_group_GDOI-GROUP1_temp_acl

    Group member             : 192.168.11.3    vrf: None
       Local addr/port       : 192.168.11.3/848
       Remote addr/port      : 192.168.111.111/848
       fvrf/ivrf             : None/None
       Version               : 1.0.8
       Registration status   : Registered
       Registered with       : 192.168.111.111
       Re-registers in       : 3326 sec
       Succeeded registration: 1
       Attempted registration: 1
       Last rekey from       : 192.168.222.222
       Last rekey seq num    : 0
       Unicast rekey received: 7
       Rekey ACKs sent       : 7
       Rekey Rcvd(hh:mm:ss)  : 00:01:31
       DP Error Monitoring   : OFF

As KS2 has KS1's RSA key, CE1 (a GM) accepts KS2's authentication. Note that CE1 still thinks it's registered with KS1, which is basically irrelevant, as KS2 has taken over all of KS1's ongoing tasks.

Now let's force CE2 to re-register to KS2.

CE2(config)#int e0/0
CE2(config-if)#no crypto map gdoimap
*Jan  2 20:35:32.253: %CRYPTO-6-GDOI_ON_OFF: GDOI is OFF

GDOI disabled...

CE2(config-if)#crypto map gdoimap
*Jan  2 20:35:34.330: %CRYPTO-5-GM_REGSTER: Start registration to KS 192.168.111.111 for group GDOI-GROUP1 using address 192.168.12.2
*Jan  2 20:35:34.331: %CRYPTO-6-GDOI_ON_OFF: GDOI is ON

GDOI re-enabled, and now attempting registration to KS1.

*Jan  2 20:36:14.344: %CRYPTO-5-GM_REGSTER: Start registration to KS 192.168.222.222 for group GDOI-GROUP1 using address 192.168.12.2

CE2 gives up on KS1 and moves down its list to KS2.

<output omitted for brevity>
*Jan  2 20:36:14.366: %GDOI-5-GM_REGS_COMPL: Registration to KS 192.168.222.222 complete for group GDOI-GROUP1 using address 192.168.12.2
<output omitted for brevity>

And successful registration!

Looking at KS2's registrations:
KS2#show crypto gdoi ks mem summary  | i Member ID
Group Member ID    : 192.168.12.2        GM Version: 1.0.6
Group Member ID    : 192.168.13.2        GM Version: 1.0.6
Group Member ID    : 192.168.14.2        GM Version: 1.0.8
Group Member ID    : 192.168.11.3        GM Version: 1.0.8

The other GMs didn't re-register to KS2; KS2 learned about them from KS1 before KS1 went offline.

Let's turn KS1 back online:
KS1(config-if)#no shut

It's all over the Cisco documentation that COOP doesn't support preemption:
"The recovering KS receives an announcement message reply from an existing primary, which has lower priority. In this case, there is no preemption, and the recovering KS remains a secondary KS. This eliminates unnecessary changes in the system."
http://www.cisco.com/c/dam/en/us/products/collateral/security/group-encrypted-transport-vpn/GETVPN_DIG_version_1_0_External.pdf

Well, you could have fooled me:

KS1(config-if)#
*Jan  2 20:39:32.559: %BGP-5-ADJCHANGE: neighbor 192.168.11.1 Up
*Jan  2 20:39:45.739: %GDOI-5-COOP_KS_REACH: Reachability restored with Cooperative KS 192.168.222.222 in group GDOI-GROUP1.

KS1#show crypto gdoi ks coop | i Role
        Local KS Role: Primary   , Local KS Status: Alive
                Peer KS Role: Secondary , Peer KS Status: Alive

KS2#show crypto gdoi ks coop | i Role
        Local KS Role: Secondary , Local KS Status: Alive
                Peer KS Role: Primary   , Peer KS Status: Alive

I'm running IOS 15.4, and it occurs to me that this could've changed since the documentation was written, but that seems somewhat unlikely given how adamant Cisco was about this in all previous documentation. Does anyone have any idea what's going on here? I literally cannot get it to not preempt, so I find that confusing.

For a final topic related to COOP, a pure failure of a KS is one thing, but what happens if you have a network segmentation that has some GMs speaking to one KS and some GMs speaking to another KS, in a 'split-brain' scenario?

In that case, the KSs perform what's called a Key Server Merge, which doesn't have much relevance as a study topic (other than knowing it exists), but it does have some design implications. If you're reading this to build a large production GETVPN rather than for study purposes, I recommend reading section 3.7.4.2, "Network Split and Merge", of this Cisco guide:

http://www.cisco.com/c/dam/en/us/products/collateral/security/group-encrypted-transport-vpn/GETVPN_DIG_version_1_0_External.pdf

Now, on to some final topics:

Fail Open vs Fail Closed

If a crypto policy isn't in place or isn't matched, the default reaction of the router is to simply send the traffic unencrypted. This is normal, default behavior and isn't a feature. However, if security demands that traffic be stopped rather than being sent in the clear, the Fail Closed feature may be enabled on a per-GM basis:

First, create an ACL of what to still transmit even during fail-closed. For example, your routing and management traffic should probably still be permitted:

ip access-list extended fail-close
deny   tcp any eq bgp any
deny   tcp any any eq bgp


Much like the standard GETVPN ACL, "deny" means "send unencrypted".

Then you basically enable an extension to the crypto map. For example, my GDOI map is called "gdoimap", and looks like this:

crypto map gdoimap 1 gdoi
 set group GDOI-GROUP1

Given that, you create this additional config:
crypto map gdoimap gdoi fail-close
 match address fail-close
 activate

Then, if a valid KEK isn't present, the only traffic allowed to transmit is BGP, given the ACL above.
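To make the fail-close behavior concrete, here's a toy Python model of the per-packet decision described above: while the GM lacks valid keys, traffic matching a "deny" in the fail-close ACL goes out in the clear and everything else is dropped; once keys are in place, the normal KS-downloaded policy applies. The packet and ACL representations are simplifications of my own, not how IOS stores them.

```python
from typing import NamedTuple, Optional

BGP = 179


class Packet(NamedTuple):
    proto: str                   # "tcp", "udp", "icmp", ...
    sport: Optional[int] = None
    dport: Optional[int] = None


class Ace(NamedTuple):
    action: str                  # only "deny" (= send unencrypted) is used here
    proto: str                   # "ip" would match any protocol
    sport: Optional[int] = None  # None = any
    dport: Optional[int] = None  # None = any

    def matches(self, pkt: Packet) -> bool:
        return (self.proto in ("ip", pkt.proto)
                and (self.sport is None or self.sport == pkt.sport)
                and (self.dport is None or self.dport == pkt.dport))


# Mirrors the "fail-close" ACL above: only BGP is exempted.
FAIL_CLOSE_ACL = [Ace("deny", "tcp", sport=BGP), Ace("deny", "tcp", dport=BGP)]


def fail_close_action(pkt: Packet, have_valid_keys: bool) -> str:
    """Return "gdoi-policy", "clear", or "drop" for an outbound packet."""
    if have_valid_keys:
        return "gdoi-policy"     # normal KS-downloaded policy applies
    for ace in FAIL_CLOSE_ACL:
        if ace.action == "deny" and ace.matches(pkt):
            return "clear"       # exempted: sent unencrypted
    return "drop"                # fail closed: nothing else leaves in the clear
```

For example, `fail_close_action(Packet("icmp"), have_valid_keys=False)` returns `"drop"`, while a BGP keepalive to port 179 returns `"clear"`.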

Local Exception ACL

It's possible that one global ACL doesn't meet the needs of every GM. If you have a GM that needs to transmit some data in clear text even though the KS ACL says it should be encrypted, you can create a one-off, per-GM ACL for this scenario.

From the Cisco documentation:
"The crypto ACL applied at the GM represents a concatenation of the downloaded ACL and local ACL. The order of operations is such that the locally defined ACL is checked first, followed by the one downloaded from the KS."
"Note:    Only deny statements can be added locally at the GM. Permit statements are not supported in the locally configured policies. In case of a conflict, local policy overrides the policy downloaded from the KS."
http://www.cisco.com/c/en/us/products/collateral/security/group-encrypted-transport-vpn/deployment_guide_c07_554713.html
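To make that order of operations concrete, here's a toy Python model of the concatenated GM policy from the quotes above: locally configured denies are checked first, then the ACL downloaded from the KS, first match wins, and (as with the KS ACL) "deny" means send in the clear. The rule encoding is a simplification of my own; the rules mirror the getvpn-acl and the exception-acl used in this lab.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    action: str                  # "deny" (send clear) or "permit" (encrypt)
    proto: str                   # "ip" matches any protocol
    sport: Optional[int] = None  # None = any
    dport: Optional[int] = None  # None = any


def matches(rule, proto, sport=None, dport=None):
    return (rule.proto in ("ip", proto)
            and (rule.sport is None or rule.sport == sport)
            and (rule.dport is None or rule.dport == dport))


# getvpn-acl, as downloaded from the KS
KS_ACL = [
    Rule("deny", "udp", sport=848), Rule("deny", "udp", dport=848),
    Rule("deny", "tcp", sport=179), Rule("deny", "tcp", dport=179),
    Rule("permit", "ip"),
]
# exception-acl, configured locally on CE1
LOCAL_ACL = [Rule("deny", "icmp")]


def gm_action(proto, sport=None, dport=None, local=LOCAL_ACL, downloaded=KS_ACL):
    """Return "encrypt" or "clear" for a packet, local ACL first."""
    if any(r.action != "deny" for r in local):
        raise ValueError("only deny statements are supported in the local ACL")
    for rule in list(local) + list(downloaded):  # local ACL is checked first
        if matches(rule, proto, sport, dport):
            return "encrypt" if rule.action == "permit" else "clear"
    return "clear"  # no crypto policy matched: sent unencrypted
```

With CE1's exception in place, `gm_action("icmp")` gives `"clear"`, while `gm_action("tcp", 51000, 443)` falls through to `permit ip any any` and gives `"encrypt"`. Calling `gm_action("icmp", local=[])` models a GM without the exception, which still expects ICMP to be encrypted.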

An ACL is created:
ip access-list extended exception-acl
deny   icmp any any


And then applied to the crypto map on the GM:
crypto map gdoimap 1 gdoi
 match address exception-acl


I've gone ahead and applied this on CE1:

CE1#sh crypto gdoi gm acl
Group Name: GDOI-GROUP1
 ACL Downloaded From KS 192.168.111.111:
   access-list   deny udp any port = 848 any
   access-list   deny udp any any port = 848
   access-list   deny tcp any port = 179 any
   access-list   deny tcp any any port = 179
   access-list   permit ip any any
 ACL Configured Locally:
  Map Name: gdoimap
   access-list exception-acl  deny icmp any any
Let's test it...

CE1#ping 192.168.33.33   ! CE3's loopback
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.33.33, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

I scratched my head for a split second until I remembered that CE3 doesn't share the exception policy and therefore won't accept unencrypted ICMP traffic:

CE3#
*Jan  2 21:08:30.239: %CRYPTO-4-RECVD_PKT_NOT_IPSEC: Rec'd packet not an IPSEC packet. (ip) vrf/dest_addr= /192.168.33.33, src_addr= 192.168.11.3, prot= 1

However, the key servers don't encrypt data (they're control-plane only), so they happily reply to unencrypted ICMP traffic:

CE1#ping 192.168.111.111
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.111.111, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/6 ms

Receive-only SA

When deploying GETVPN on an existing network, it is almost certainly desirable to ensure all GMs can decrypt traffic before beginning to encrypt traffic - otherwise, some GMs would be sending encrypted traffic to other GMs that hadn't had the config pasted in yet.

Receive-only SA is a policy pushed from the KS that tells all GMs to decrypt traffic but not encrypt it. Implementation is very simple:

crypto gdoi group GDOI-GROUP1
 server local
  sa receive-only


Passive SA

While deploying Receive-only SA, it may also be a good idea to do small-scale encryption testing without globally rolling encryption and hoping for the best. Passive SA is a per-GM setting that basically overrides the sa receive-only command pushed from the KS. It indicates that the GM should encrypt and decrypt traffic, rather than just decrypting it. This allows for a single-GM (or however many you'd like to apply the config to) rollout of encryption, without applying it globally with the KS.

crypto gdoi group GDOI-GROUP1
 passive


Interestingly, there's also a privileged EXEC command on the GMs that can control whether to encrypt, decrypt, or both; it's basically a macro for the functions described above:

CE3#crypto gdoi gm group 1234 ipsec direction ?
  both     IPsec SA will only accept cipher text and will encrypt the packet
              before forwarding it out
  inbound  Specify IPsec SA inbound options

CE3#crypto gdoi gm group 1234 ipsec direction inbound ?
  only      IPsec SA will accept both cipher/plain text and will forward the
               packet in clear.
  optional  IPsec SA will accept both cipher/plain text and will encrypt the
                 packet before forwarding it out

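Using this command, you can require encryption in both directions ('both'), decrypt inbound traffic while forwarding everything in the clear ('inbound only', matching the receive-only SA behavior), or accept both encrypted and unencrypted traffic inbound while encrypting outbound ('inbound optional', matching the passive SA behavior).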
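Reading the CLI help text literally, the three modes boil down to two independent switches: whether cleartext is accepted inbound, and whether outbound traffic is encrypted. Here's a small Python table of that reading, plus a check for whether traffic between two GMs in different modes gets through. The mode names come from the help output; the mapping to receive-only and passive SA is my interpretation, and the model only covers traffic the group ACL says should be encrypted.

```python
from typing import NamedTuple


class Direction(NamedTuple):
    accept_clear_inbound: bool  # does the SA accept plaintext as well as ciphertext?
    encrypt_outbound: bool      # does the GM encrypt before forwarding?


# From the "crypto gdoi gm group ... ipsec direction ?" help text.
# Applies only to traffic matching a "permit" in the group ACL.
MODES = {
    "both":             Direction(accept_clear_inbound=False, encrypt_outbound=True),
    "inbound only":     Direction(accept_clear_inbound=True,  encrypt_outbound=False),  # ~ receive-only SA
    "inbound optional": Direction(accept_clear_inbound=True,  encrypt_outbound=True),   # ~ passive SA
}


def accepts(mode: str, encrypted: bool) -> bool:
    """Would a GM in this mode accept an inbound packet?"""
    return encrypted or MODES[mode].accept_clear_inbound


def delivered(sender_mode: str, receiver_mode: str) -> bool:
    """Does traffic from a GM in sender_mode survive at a GM in receiver_mode?"""
    return accepts(receiver_mode, encrypted=MODES[sender_mode].encrypt_outbound)
```

This shows why the rollout order matters: `delivered("inbound only", "both")` is `False` (a receive-only GM sends cleartext that a fully encrypting GM rejects), while every combination involving only "inbound only" and "inbound optional" gets through.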

Multiple Group Support and Authorization Lists

We've only been using one group up until now. Imagine you had separate divisions inside a company that share a single VRF in an L3 MPLS VPN but have no business speaking to one another, at least not without going through a firewall at HQ first (I've also heard there are service provider applications for this deployment).

Let's say Division 1 is CE1 and CE2, and Division 2 is CE3 and CE4.

Let's configure the new group on KS1. For brevity, I'm deliberately not configuring KS2, so the second group will not participate in the COOP config from above.

KS1(config)#crypto gdoi group GDOI-GROUP2
KS1(config-gdoi-group)#identity number 6789
KS1(config-gdoi-group)#server local
KS1(gdoi-local-server)#rekey algorithm aes 128
KS1(gdoi-local-server)#rekey authentication mypubkey rsa MYRSAKEY
KS1(gdoi-local-server)#rekey transport unicast
KS1(gdoi-local-server)#sa ipsec 1
KS1(gdoi-sa-ipsec)#profile profile1
KS1(gdoi-sa-ipsec)#match address ipv4 getvpn-acl
KS1(gdoi-sa-ipsec)#replay time window-size 5
KS1(gdoi-sa-ipsec)#address ipv4 192.168.111.111

And on CE3 and CE4:
CE3(config)#crypto gdoi group GDOI-GROUP1  ! name is only locally significant
CE3(config-gdoi-group)#no server address ipv4 192.168.222.222 ! remove KS2
CE3(config-gdoi-group)#identity number 6789 ! change the group number

CE4(config)#crypto gdoi group GDOI-GROUP1  ! name is only locally significant
CE4(config-gdoi-group)#no server address ipv4 192.168.222.222 ! remove KS2
CE4(config-gdoi-group)#identity number 6789 ! change the group number

CE4#ping 192.168.33.33 so lo0    ! 192.168.33.33 is CE3's loopback
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.33.33, timeout is 2 seconds:
Packet sent with a source address of 192.168.44.44
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 5/5/6 ms

CE4#sh crypto gdoi gm dataplane counters

Data-plane statistics for group GDOI-GROUP1:
    #pkts encrypt            : 5        #pkts decrypt            : 5
    #pkts tagged (send)      : 0        #pkts untagged (rcv)     : 0
    #pkts no sa (send)       : 0        #pkts invalid sa (rcv)   : 0
    #pkts encaps fail (send) : 0        #pkts decap fail (rcv)   : 0
    #pkts invalid prot (rcv) : 0        #pkts verify fail (rcv)  : 0
    #pkts not tagged (send)  : 0        #pkts not untagged (rcv) : 0
    #pkts internal err (send): 0        #pkts internal err (rcv) : 0

OK, CE3 and CE4 can still talk to one another.

CE4#ping 10.0.111.1  ! An interface on CE1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.0.111.1, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

CE1#
*Jan  2 21:38:41.803: %CRYPTO-4-RECVD_PKT_INV_SPI: decaps: rec'd IPSEC packet has invalid spi for destaddr=10.0.111.11, prot=50, spi=0xA0B897B5(2696452021), srcaddr=192.168.14.2, input interface=Ethernet0/0

No talking from CE4 to CE1.

You'll note I used the same RSA keys on both groups:
KS1(gdoi-local-server)#rekey authentication mypubkey rsa MYRSAKEY

It doesn't matter: the TEKs aren't derived from the RSA key. The RSA key is only used by the KS to authenticate its rekeys to the GM, proving they still come from the original group of KSs the GM registered with.


But there's still an easy, easy way to work around this on the GM:

CE4(config)#crypto gdoi group GDOI-GROUP1
CE4(config-gdoi-group)#identity number 1234

CE4#ping 10.0.111.1 so lo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.0.111.1, timeout is 2 seconds:
Packet sent with a source address of 192.168.44.44
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 6/6/7 ms

Authorization lists to the rescue!

KS1(config)#access-list 10 permit 192.168.11.3  ! CE1's registration address
KS1(config)#access-list 10 permit 192.168.12.2  ! CE2's registration address

KS1(config)#access-list 20 permit 192.168.13.2  ! CE3's registration address
KS1(config)#access-list 20 permit 192.168.14.2  ! CE4's registration address

KS1(config)#crypto gdoi group GDOI-GROUP1
KS1(config-gdoi-group)# server local
KS1(gdoi-local-server)#  authorization address ipv4 10
KS1(gdoi-local-server)#crypto gdoi group GDOI-GROUP2
KS1(config-gdoi-group)# server local
KS1(gdoi-local-server)#  authorization address ipv4 20

Forcing re-registration on CE4:
CE4(config-if)#no crypto map gdoimap
CE4(config-if)#crypto map gdoimap

KS1(config)#
*Jan  2 21:48:15.202: %GDOI-1-UNAUTHORIZED_IPADDR: Group GDOI-GROUP1 received registration from unauthorized ip address: 192.168.14.2

And CE4 begrudgingly goes back to his own group:
CE4(config-if)#crypto gdoi group GDOI-GROUP1
CE4(config-gdoi-group)#identity number 6789

*Jan  2 21:49:25.208: %GDOI-5-GM_REGS_COMPL: Registration to KS 192.168.111.111 complete for group GDOI-GROUP1 using address 192.168.14.2 fvrf default ivrf default

...and succeeds.
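The KS-side check can be modeled as a lookup: each group identity maps to the set of registration addresses its authorization ACL permits, and a registration is accepted only if the GM's source address is in that set. Here's a Python sketch with this lab's addresses. The dictionary stands in for standard ACLs 10 and 20 (it's not how IOS stores them), CE2's address is the one from its registration log (192.168.12.2), and `None` models a group with no authorization list, which is the situation that let CE4 hop groups earlier.

```python
from ipaddress import IPv4Address

# group identity -> addresses permitted by that group's authorization ACL
AUTHORIZATION = {
    1234: {IPv4Address("192.168.11.3"),    # CE1
           IPv4Address("192.168.12.2")},   # CE2 (address from its registration log)
    6789: {IPv4Address("192.168.13.2"),    # CE3
           IPv4Address("192.168.14.2")},   # CE4
}


def registration_allowed(identity, gm_address, authz=AUTHORIZATION):
    """Would the KS accept a registration from gm_address for this group?"""
    if identity not in authz:
        raise KeyError(f"no group with identity {identity}")
    permitted = authz[identity]
    if permitted is None:  # no authorization list: any GM with the PSK may join
        return True
    return IPv4Address(gm_address) in permitted
```

`registration_allowed(1234, "192.168.14.2")` is `False`, matching the `%GDOI-1-UNAUTHORIZED_IPADDR` log above, and `registration_allowed(6789, "192.168.14.2")` is `True`.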

VRF Lite Support

One CE can join multiple different GDOI groups on the KS by using VRF-lite on the CEs. This isn't that complex conceptually; however, I saw little reason to lab it. If you want to know more, here's the link to the Cisco documentation: http://www.cisco.com/c/en/us/products/collateral/ios-nx-os-software/enterprise-class-teleworker-ect-solution/prod_white_paper0900aecd80617171.html

Note, the KS does not support VRFs.

Cheers,

Jeff