Sunday, September 7, 2014

CCIE v4 to v5 Updates: NTPv4 and Netflow

I didn't find these updates on any Cisco or 3rd party list, but when writing my original NTP and Netflow blogs in mid-2013, I noted several topics as out of scope because they weren't supported on IOS 12.4(15)T. Now that v5 is out, all of those topics are back in scope, so I decided to blog them.

Here are the original articles this one builds off of:

http://brbccie.blogspot.com/2013/05/ntp.html
http://brbccie.blogspot.com/2013/06/netflow.html

The topics we'll be covering specifically are:
- Netflow w/ NBAR
- IPFIX (Netflow v10)
- NTPv4 (IPv6 support)
- NTPv4 Multicast NTP
- NTP Panic
- NTP Maxdistance
- NTP Orphan

Netflow
First, I wanted to mention an omission from my original blog. At that time I didn't have a collector that would support Flexible Netflow, so I evaluated FNF via Wireshark. That was fairly effective, except that I was missing a major element of Netflow: the bytes transferred! I'm now using a collector that supports FNF, and I immediately noticed I wasn't graphing any traffic.

flow record JIMBO
 match ipv4 source address
 match ipv4 destination address
 collect counter bytes
 collect counter packets

This is a simple, working FNF config. Collecting (or matching) counter bytes and counter packets is what makes Netflow do what you're used to it doing: measuring traffic.
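
If you want to confirm the counters are actually populating, you can check the cache on the router itself. A quick sanity check (this assumes the JIMBO record has been attached to a flow monitor, hypothetically named JIMBO-MON here, and applied to an interface; output approximate):

R1#show flow monitor JIMBO-MON cache format table
  IPV4 SRC ADDR    IPV4 DST ADDR         bytes        pkts
  ===============  ===============  ==========  ==========
  192.168.0.10     192.168.0.5           52344         610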

What's the advantage of integrating NBAR with Netflow?
By default, Netflow only exports very high-level protocol information. Integrating NBAR sends very specific, granular protocol detail to the collector. Note that your collector needs to specifically support this; it's not a small change at the protocol level.

If you're familiar with how FNF sends its template out every so often, the NBAR table works very similarly. At specified intervals, IOS sends out a rather large (multi-packet) table mapping each NBAR application name to a numeric ID; the flow records then carry just the IDs, and the collector uses the table to resolve them back to protocol names.

There are several other blogs out there that give big, complex templates for integrating NBAR with Netflow. I took a few of these as a base and worked backwards to the real requirements. This is not a hard thing to enable. Your flow record must contain collect application name (or match application name); optionally, you can tune how often the NBAR FNF table is sent out with option application-table timeout under the exporter.

Here's a working config:

flow record FNF-RECORD
 match ipv4 source address
 match ipv4 destination address
 collect counter bytes
 collect counter packets
 collect application name 

flow exporter FNF-EXPORTER
 destination 192.168.0.5
 source GigabitEthernet1
 transport udp 9996
 template data timeout 60
 option application-table timeout 30

flow monitor FNF-MONITOR
 exporter FNF-EXPORTER
 cache timeout inactive 60
 cache timeout active 60
 record FNF-RECORD

interface gig1
 ip flow monitor FNF-MONITOR input
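
To verify the classification is working before blaming the collector, check the cache. The application column shows up with the classification engine prefixed (output trimmed and approximate):

R1#show flow monitor FNF-MONITOR cache format table
  IPV4 SRC ADDR    IPV4 DST ADDR         bytes        pkts  APP NAME
  ===============  ===============  ==========  ==========  ==========
  192.168.0.10     192.168.0.5           52344         610  nbar http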

With v10, Netflow was made an open standard, called IPFIX. To enable IPFIX output instead of FNF v9, you would:

flow exporter FNF-EXPORTER
 export-protocol ipfix

Note I haven't tested this beyond checking it in Wireshark, because I still don't have a collector that speaks IPFIX.
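
You can at least confirm the exporter flipped protocols from the router side (output approximate):

R1#show flow exporter FNF-EXPORTER
Flow Exporter FNF-EXPORTER:
  Description:              User defined
  Export protocol:          IPFIX (Version 10)
  Transport Configuration:
    Destination IP address: 192.168.0.5
    Source Interface:       GigabitEthernet1
    Transport Protocol:     UDP
    Destination Port:       9996
<output omitted>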

NTP

[Diagram: NTP lab topology, R1 through R5 (referenced as "our diagram" in the Orphan section below)]

The big difference in NTPv4 is IPv6 support. There's really not much to cover on the basics... clearly broadcast NTP is gone (IPv6 has no broadcast), but multicast NTP still works the same general way it did over IPv4.

R1(config)#ntp master 4

R2(config)#ntp server 1::1

R2#show ntp association detail
1::1 configured, ipv6, our_master, sane, valid, stratum 4
ref ID 127.127.1.1    , time D7C45F20.4AC083E0 (19:27:28.292 UTC Wed Sep 17 2014)
<output omitted>

Really quite simple.

15.x implementations of NTP now keep domain names in the config.
Pre 15.x:
foo.com(config)#ip host foo.com 4.4.4.4
foo.com(config)#ntp server foo.com
foo.com(config)#do sh run | i ntp
ntp server 4.4.4.4

It would resolve the hostname to an IP address and save the IP address, not the name, in the config; not a good thing if the server changes IPs.

Post 15.x:
R2(config)#ip host test.com 4.1.1.1
R2(config)#ntp server test.com
R2(config)#do sh run | i ntp
ntp server test.com

Let's take a look at the multicast option. As IPv6 multicast has blessedly been removed from the v5 blueprint, I'm going to cheap out and perform non-routed/same-link multicast.

R2(config)#no ntp server 1::1

R1(config)#ntp authentication-key 1 md5 CISCO
R1(config)#ntp trusted-key 1
R1(config)#int gig1.123
R1(config-subif)#ntp multicast FF02::123 key 1

R2(config)#ntp authentication-key 1 md5 CISCO
R2(config)#ntp trusted-key 1
R2(config)#ntp authenticate
R2(config)#int gig1.123
R2(config-subif)#ntp multicast client FF02::123

R2(config-subif)#do show ntp ass det
FE80::20C:29FF:FEB6:3557 dynamic, ipv6, authenticated, our_master, sane, valid, stratum 4
ref ID 127.127.1.1    , time D7C460E0.4AC083E0 (19:34:56.292 UTC Wed Sep 17 2014)

Maxdistance, for me, is very confusing. It appears to be a trust value. It's normally modified in NTPv4 in order to speed up convergence. As I understand it, the higher the value, the faster synchronization will happen, because the upstream time will be trusted sooner. The algorithm appears to combine half the root delay with the dispersion, and if that result is lower than maxdistance, the device considers itself in sync. My labbing did not produce exactly that outcome, but it was extremely hard to say for sure because my NTPv4 converges very quickly. Because you basically have to be a time expert to understand what this does, I would hope the CCIE lab would be limited to two types of questions on it:
1) Set it to some value they provide
2) Set it to "slowest" convergence (1) or "fastest" convergence (16)

R1(config)#ntp maxdistance ?
  <1-16>  Maximum distance for synchronization
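
For example, taking question type 2 above, "fastest" convergence (by my reading) would be:

R1(config)#ntp maxdistance 16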

NTP Panic is simple:

R2(config)#ntp panic ?
  update  Reject time updates > panic threshold (default 1000Sec)

It does just what it says: if my peer's or configured master's clock is more than 1,000 seconds off from my clock, reject the update and syslog:

.Sep  8 00:51:00.155: NTP Core (ERROR): Time correction of nan seconds exceeds sanity limit of 0. seconds. Set clock manually to the correct UTC time.
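
For completeness, enabling it is just the one command (the 1,000-second threshold is the default noted in the help text above):

R2(config)#ntp panic update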

NTP Orphan is really cool. It seems like an obvious feature now that I've seen it, but I can imagine this is a huge help for smaller organizations that rely heavily on NTP.

Let's say, from our diagram, R1 is an Internet time server that our fictional organization uses as its sole NTP master. R2 and R3 are edge routers inside the company, and R4 and R5 will represent servers querying R2 and R3.

So to be clear, R2 and R3 get their time from R1 and also peer with one another (so if R3 can't reach R1 but R2 can, R3 can learn the time via R2). R4 and R5 query R2 and R3 for time, respectively.

Relevant config:
R1(config)#ntp master 4

R2(config)#int gig1.123
R2(config-subif)#no  ntp multicast client FF02::123
R2(config-subif)#no ntp authenticate
R2(config)#ntp server 1::1
R2(config)#ntp peer 3::3
R2(config)#ntp source lo0

R3(config)#ntp server 1::1
R3(config)#ntp peer 2::2
R3(config)#ntp source lo0

R4(config)#ntp server 2::2

R5(config)#ntp server 3::3

At this point every device has the up-to-date time.
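
For example, checking one of the downstream servers (output approximate; note that IOS displays a hashed refid for IPv6 sources, which you'll see throughout the outputs below):

R4#show ntp status
Clock is synchronized, stratum 6, reference is 26.33.33.239
<output omitted>

Stratum 6 makes sense: R1 is a stratum-4 master, R2 syncs at stratum 5, and R4 sits one hop below that.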

Now let's say R1 goes offline.
R1(config)#int lo0
R1(config-if)#shut

<<wait a while>>

R2(config)#do show ntp status
Clock is unsynchronized, stratum 16, no reference clock
<output omitted>

R3(config)#do show ntp status
Clock is unsynchronized, stratum 16, no reference clock
<output omitted>

and obviously R4 and R5 share the same fate.

What if we could program R2 and R3 to take their best stab at what the time should still be (mind you, we're only a couple of minutes past the last sync, so the time is probably still very close to accurate), and then temporarily and seamlessly take over the NTP master role if they lose a valid clock from R1?

This is exactly what NTP Orphan does.

The config is extremely complicated:

R2(config)#ntp orphan 6

R3(config)#ntp orphan 6

(I was joking about the complicated part)

Really, that's it.  Let's understand what's happening here now.  Orphan kicks in when we lose sync with our server. The number 6 here is a stratum number, and it must be numerically higher (a worse stratum) than the stratum of your real upstream NTP server (here R1 at stratum 4, so 6 works); otherwise the failover/fail-back mechanism won't work right.

Best practice is to configure the same Orphan stratum on all devices running Orphan mode, then peer all the Orphans with one another so that only one is "elected" as the temporary Orphan master.

R2(config)#do show ntp status
Clock is synchronized, stratum 6, reference is 127.0.0.1
<output omitted>

We see R2 is now stratum 6, synchronized with its own virtual Orphan server.

R3(config)#do show ntp status
Clock is synchronized, stratum 7, reference is 26.33.33.239
<output omitted>

R3 is synchronized with R2 as its Master. 

R4#show ntp status
Clock is synchronized, stratum 7, reference is 26.33.33.239

R4 is synchronized with R2 as its master.

R5#show ntp status
Clock is synchronized, stratum 9, reference is 24.235.166.45

R5 is synchronized with R3 as its master.

Now, the most important feature of this is fail-back. Let's reactivate R1.

R1(config)#int lo0
R1(config-if)#no shut

R3 was first to recover:
R3(config)#do show ntp association detail
1::1 configured, ipv6, our_master, sane, valid, stratum 4

It automatically shut down its Orphan process when it synced to the superior stratum 4.

R5 then received the now-correct time from R3:
R5#show ntp association detail
3::3 configured, ipv6, our_master, sane, valid, stratum 5

Cheers,

Jeff Kronlage

