Saturday, May 25, 2013

NTP

I try to dig pretty hard in my blog posts.  In this case, I found out that there's a lot to NTP!  In fact, a lot of it would be a real stretch to see on the CCIE lab.  I feel bad about writing a "high level" post, but I also want to save both my time and my readers' time - many of the topics I could've dug harder on are really unlikely to be of any use, either in production or in the lab.

That said, here's a list of things I will not be covering:
- Any versions besides NTP v3.  v4 should be out-of-scope for the lab exam, as it's not supported on 12.4(15)T.
- The NTP query system.  These are NTP's own control messages (the kind a tool like ntpq sends), and there's very little information on Cisco's implementation.  It has nothing to do with syncing time.

Let's start by discussing the stratum system.

NTP treats the stratum much like a hop count in a distance vector protocol.  Stratum 1 servers are the most trusted; stratum 2 servers are one NTP hop away from stratum 1.  The servers syncing from stratum 2s become stratum 3s, and so on and so forth.

Let's look at some easy concepts that will help lead into some more advanced topics.


That's simple enough.  We gain a stratum the further we get from the most trusted time servers. 

As an attempt at being somewhat realistic, I'm going to start another diagram at stratum 5, because a stratum 1 router should have a hyper-accurate clock attached (such as a GPS or atomic clock).



Our stratum 5 router would be configured like this:

ntp master 5

Yep, that's it.  Of interesting note, it actually synchronizes with itself at either 127.127.1.1 or 127.127.7.1. Newer IOSes use 127.127.1.1, older use 127.127.7.1.  The master's "source" at 127.127.x.1 is always one lower than what you configured the master level at.  In this router's case, you'd be synchronizing to stratum 4:

Router(config)#do show ntp association detail
127.127.1.1 configured, our_master, sane, valid, stratum 4

Out of the box, NTP allows anyone to receive time from it, so no more config is required on our stratum 5 device.

Let's say our stratum 5 device is at 5.5.5.5:

The next row down, on stratum 6:

ntp server 5.5.5.5

Once again, that's it.  Before synchronization happens, it will be stratum 16 (maximum stratum), and refuse to serve time to others.  Afterwards it will be stratum 6, as it's receiving time from stratum 5.
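
The stratum arithmetic so far fits in a few lines of Python.  This is only a sketch of the behavior described above, not anything IOS actually runs, and the function names are made up for illustration.

```python
UNSYNCHRONIZED = 16  # maximum stratum; a device at 16 will not serve time


def master_reference_stratum(configured_stratum):
    """Stratum shown for the 127.127.x.1 reference behind 'ntp master N'."""
    return configured_stratum - 1


def client_stratum(server_stratum=None):
    """One more than the server's stratum, or 16 if there is nothing to sync to."""
    if server_stratum is None or server_stratum >= UNSYNCHRONIZED:
        return UNSYNCHRONIZED
    return min(server_stratum + 1, UNSYNCHRONIZED)


print(master_reference_stratum(5))  # 4
print(client_stratum(5))            # 6
print(client_stratum())             # 16
```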

Now we'll work at the bottom row, on the "stratum ??" device:

ntp server 6.6.6.6
ntp server 5.5.5.5

So what stratum would we be?  It's probably relatively easy to figure out that we'll still end up stratum 6, as it naturally prefers the "better" time from the stratum 5 router.  Better is determined by the stratum hop-count.

What if we did this?:

ntp server 6.6.6.6 prefer
ntp server 5.5.5.5

Still stratum 6!  Why not 7?  Because prefer, contrary to what some other documentation indicates, doesn't actually statically prefer that host.  If two servers have the same stratum, prefer works as a tie-breaker, but it does not override a difference in strata.
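
Here's a rough Python model of that choice.  It only captures the stratum and prefer behavior described above; real NTP source selection also weighs delay and dispersion, which this sketch ignores.  The function name and tuple layout are my own invention.

```python
def pick_source(servers):
    """servers: list of (address, stratum, prefer) tuples for reachable servers.

    Lowest stratum wins; 'prefer' only breaks a tie between equal strata.
    """
    # Sort key: stratum first, then the prefer flag.  'not prefer' is False
    # for preferred entries, and False sorts ahead of True.
    return min(servers, key=lambda s: (s[1], not s[2]))


best = pick_source([("6.6.6.6", 6, True), ("5.5.5.5", 5, False)])
print(best[0], "-> we become stratum", best[1] + 1)  # 5.5.5.5 -> we become stratum 6
```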

Now let's take a look at the ntp peer command.  The premise isn't that hard: it means sync time from that router, but also be willing to give time to it.  Wait a minute!  How do we determine who has the right time in that case?

It's stratum-based, of course!  But wait, why not just make the router with the better time the server?  Well, that's great until a network failure occurs....

Let's take this scenario:


Let's say our two stratum 6 servers are using the ntp server command against the stratum 5 router.  That's great until the link between the right stratum 6 router and the top stratum 5 router fails.  You'd cut the right side of the diagram off from time services.

ntp peer to the rescue!  By telling the two stratum 6 routers to peer off each other, they can both act as either a client or a server at the same time.  The router experiencing the failure would have lost its link to stratum 5, and would accept the stratum 6 as its master, on the fly.  If either the left or right link failed, the other could fill in as its server.  When the failed link comes back up, the router would automatically reconverge to the better stratum master.
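
To make the failover concrete, here's a toy Python model of the right-hand router.  It assumes only the "one stratum worse than your best reachable source" rule from earlier, so the stratum 7 result is what that rule implies rather than captured output.  The helper name is hypothetical.

```python
def stratum_from(reachable_strata):
    """Stratum a router ends up at, given the strata of the sources it can reach."""
    usable = [s for s in reachable_strata if s < 16]
    return min(usable) + 1 if usable else 16


# Right-hand router: 'ntp server' toward the stratum 5 box, 'ntp peer' toward the left router.
print(stratum_from([5, 6]))  # uplink working: 6
print(stratum_from([6]))     # uplink down, peer still reachable: 7
print(stratum_from([]))      # uplink down and no peer configured: 16
```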

Now that that's all explained, let's swap to a simpler topology, using physical gear, to look at the CLI output.  We'll have three routers, R1, R2, and R3.  They all have layer 3 reachability to each other. 

Let's make R1 a stratum 5 master:

R1(config)#ntp master 5
R1(config)#exit
R1#show ntp status
Clock is synchronized, stratum 5, reference is 127.127.7.1
nominal freq is 250.0000 Hz, actual freq is 249.9857 Hz, precision is 2**18
reference time is D54801E4.4BC4EBC0 (22:48:04.295 BOB Thu May 23 2013)
clock offset is 0.0000 msec, root delay is 0.00 msec
root dispersion is 0.02 msec, peer dispersion is 0.02 msec

Above we see R1 is synchronized to its internal clock at 127.127.7.1

R1#show ntp association detail
127.127.7.1 configured, our_master, sane, valid, stratum 4
ref ID 127.127.7.1, time D54801E4.4BC4EBC0 (22:48:04.295 BOB Thu May 23 2013)
our mode active, peer mode passive, our poll intvl 64, peer poll intvl 64
root delay 0.00 msec, root disp 0.00, reach 377, sync dist 0.015
delay 0.00 msec, offset 0.0000 msec, dispersion 0.02
precision 2**18, version 3
<output omitted>

There's a lot of information in this command's output, but the first line carries the important parts.  We can see who we're peered with, whether the peering is "sane" and "valid", and what the stratum of our neighbor is (note the "neighbor's" value of stratum 4, although we configured master 5!).
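
One other field worth decoding is reach.  In standard NTP it's an 8-bit shift register printed in octal, one bit per recent poll, so 377 means the last eight polls were all answered and 0 means none were.  A small Python sketch (the helper name is made up):

```python
def reach_history(reach_octal):
    """Return the last 8 poll results, newest first, from an octal 'reach' value."""
    value = int(reach_octal, 8) & 0xFF
    # Bit 0 is the most recent poll; bit 7 is the oldest of the eight.
    return [bool((value >> bit) & 1) for bit in range(8)]


print(reach_history("377"))  # eight True values
print(reach_history("0"))    # eight False values
print(reach_history("1"))    # only the newest poll was answered
```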

Now let's set up R2 to poll off this device.

R2(config)#ntp server 1.1.1.1     ! 1.1.1.1 is R1's Loopback address

R2#show ntp status
Clock is unsynchronized, stratum 16, no reference clock
nominal freq is 250.0000 Hz, actual freq is 249.9999 Hz, precision is 2**16
reference time is D52B5B95.62DBC65D (09:15:01.386 UTC Wed May 1 2013)
clock offset is 1878824324.3843 msec, root delay is 57.97 msec
root dispersion is 21735550.67 msec, peer dispersion is 16000.00 msec

Uh-oh, we're out of sync!

R2#show ntp association detail
1.1.1.1 configured, insane, invalid, stratum 5
ref ID 127.127.7.1, time D54806A4.5D738DD4 (03:08:20.365 UTC Thu May 23 2013)
our mode client, peer mode server, our poll intvl 64, peer poll intvl 64
root delay 0.00 msec, root disp 0.03, reach 0, sync dist 15904.022
delay 57.97 msec, offset 1878824324.3843 msec, dispersion 16000.00
precision 2**18, version 3
org time 00000000.00000000 (00:00:00.000 UTC Mon Jan 1 1900)
rcv time 00000000.00000000 (00:00:00.000 UTC Mon Jan 1 1900)
xmt time 00000000.00000000 (00:00:00.000 UTC Mon Jan 1 1900)
filtdelay =     0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
filtoffset =    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
filterror =  16000.0 16000.0 16000.0 16000.0 16000.0 16000.0 16000.0 16000.0

Oh, the first time I configured NTP, how I came to hate you, "insane, invalid" error message!

This output is very common for NTP, and can be broken down easily if you disregard most of the message.

offset = The time difference, in milliseconds, between our local clock and the NTP server's reference clock.
insane, invalid = I'm sure this has more meaning, but in short, it says to look at the offset value.  The offset must be under 1000 msec (1 second) for the insane, invalid error to go away.
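
The hex "time" fields in the output above are standard NTP timestamps: 32 bits of seconds since January 1, 1900 (UTC), a dot, then 32 bits of fraction.  Here's a minimal Python sketch for decoding one by hand; the function name is mine, and it ignores NTP era rollover.

```python
from datetime import datetime, timedelta, timezone

NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def decode_ntp_timestamp(hex_stamp):
    """Convert 'SSSSSSSS.FFFFFFFF' (hex seconds.fraction since 1900) to a UTC datetime."""
    seconds_hex, fraction_hex = hex_stamp.split(".")
    seconds = int(seconds_hex, 16)
    # The fraction is in units of 1/2**32 of a second; scale it to microseconds.
    microseconds = int(fraction_hex, 16) * 1_000_000 // 2**32
    return NTP_EPOCH + timedelta(seconds=seconds, microseconds=microseconds)


print(decode_ntp_timestamp("D54806A4.5D738DD4"))  # the ref time R2 reported for 1.1.1.1
print(decode_ntp_timestamp("00000000.00000000"))  # all zeros: nothing received yet
```

The all-zero org/rcv/xmt values decode to the 1900 epoch itself, which is why the router prints Jan 1 1900 for them.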

The problem is, after NTP kicks in, it doesn't just shift the clock instantaneously.  It slowly drifts towards it.  If your router is stuck in 1993 and you're trying to shift to 2013, this will take a long time. You can speed it up by manually adjusting the clock closer:

R2#clock set 03:17:00 May 23 2013

After you're < 1 second off, the router will adjust its stratum from 16 to the appropriate stratum (in our case, 6). 

R2#show ntp status
Clock is synchronized, stratum 6, reference is 1.1.1.1
<output omitted>

R2#show ntp association detail
1.1.1.1 configured, our_master, sane, valid, stratum 5
ref ID 127.127.7.1, time D5480624.5B94958C (03:06:12.357 UTC Thu May 23 2013)
our mode client, peer mode server, our poll intvl 64, peer poll intvl 64
root delay 0.00 msec, root disp 0.03, reach 377, sync dist 29.129
delay 57.39 msec, offset 0.9640 msec, dispersion 0.41
<output omitted>
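
To put the two offsets we've seen into perspective, here's a quick Python sketch.  The 1000 msec threshold is the figure I observed above, not an official constant, and the helper name is made up.

```python
from datetime import timedelta

SYNC_THRESHOLD_MSEC = 1000.0  # the 1-second figure observed above (an assumption)


def describe_offset(offset_msec):
    """Return (the gap as a timedelta, whether it is under the 1-second threshold)."""
    gap = timedelta(milliseconds=abs(offset_msec))
    return gap, abs(offset_msec) < SYNC_THRESHOLD_MSEC


print(describe_offset(1878824324.3843))  # between 21 and 22 days off: far too much
print(describe_offset(0.9640))           # under a millisecond: fine
```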

Note our offset of 0.9640 msec, far below the 1 second limit.  At this point, R2 would now be willing to be an NTP server as well.  Let's verify on R3.

R3(config)#ntp server 2.2.2.2

R3#show ntp status | i Clock
Clock is synchronized, stratum 7, reference is 2.2.2.2

Again, any router that's better than stratum 16 (that is, synchronized) is automatically happy to be an NTP server.

Let's look at some random features before moving on to authentication & access control.

NTP updates are always sent in UTC/GMT; the router converts to local time using its configured time zone:
clock timezone PST -8

If your router has a hardware clock (calendar), you can have NTP keep it updated:
ntp update-calendar

Want to control how many associations it can make?
ntp max-associations 20

Stop NTP from answering on a per-interface basis?  (This one goes under the interface.)
ntp disable

Authentication & Access Control

It can be a bad idea to have a promiscuous NTP process.  While I haven't found any flaws that would cause significant damage to the router directly, attacking NTP can be used as a way to obscure what really happened during another attack.  Logs rely heavily on the time being right so that they can be correlated against specific events; if an attacker managed to skew all your logging times it could make it far more difficult to track down what happened.

So how exactly could NTP be abused?

Two ways:
1) A malicious NTP peer relationship
2) A malicious, fake NTP server

I've already set up R2 to treat R1 as its NTP server.  R1 is a stratum 5 master.

R2#show ntp association detail
1.1.1.1 configured, our_master, sane, valid, stratum 5
<output omitted>

Now let's have R3 hijack R2's time:
R3(config)#ntp master 1
R3(config)#ntp source lo0
R3(config)#ntp peer 2.2.2.2

R2#show ntp association detail
1.1.1.1 configured, insane, invalid, stratum 5
<output omitted>

3.3.3.3 dynamic, our_master, sane, valid, stratum 1
<output omitted>

Yes, it's just that easy to hijack NTP. Note I don't reference 3.3.3.3 anywhere in R2's config.  R3 simply announced itself as a peer with a better stratum and it wins.

You'd use either access controls or authentication to fix this.

The other potential "hack" is to impersonate a valid NTP server.  Let's say the clients already have access controls to prevent a new NTP server from taking over as shown above.  An impersonation might be from a rogue server posing as having the same IP address as the legitimate NTP server. If it gets polled for time instead, or worse yet, it announced time via broadcast or multicast (discussed later), then its time may be assumed as the correct time.  This can be fixed with authentication.

Let's look at access controls first.  These have a funny order-of-operations and can be confusing.

R3(config)#ntp access-group ?
  peer        Provide full access
  query-only  Allow only control queries
  serve       Provide server and query access
  serve-only  Provide only server access

peer = I allow every operation from this set of IPs
query-only = control queries only, nothing to do with time sync, and outside the scope of this article
serve = I serve time + allow control queries
serve-only = I serve time only

So here are the funny rules.
1) Rules are matched in the order shown above: peer, query-only, serve, serve-only.
2) Once you've put an access-list with an implicit deny in place, there's an implicit deny for all the other mechanisms.  So you need to permit everything you want after that point.
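
Those two rules can be modeled in a few lines of Python.  This is a simplified sketch of the matching behavior as I've described it, not IOS's actual implementation.  One assumption to flag: here a deny in an earlier list just means "keep checking the next access type", which I haven't verified on a router.  All names are hypothetical.

```python
# Order in which the access types are checked, per rule 1 above.
CHECK_ORDER = ["peer", "query-only", "serve", "serve-only"]


def acl_permits(acl, source_ip):
    """acl: list of (action, address) entries; 'any' matches everything.

    First match wins, and every ACL ends with an implicit deny.
    """
    for action, address in acl:
        if address in ("any", source_ip):
            return action == "permit"
    return False


def access_level(access_groups, source_ip):
    """access_groups: dict of access type -> ACL.  Returns the granted type or None."""
    if not access_groups:
        return "peer"  # no access groups configured: everything is allowed
    for access_type in CHECK_ORDER:
        acl = access_groups.get(access_type)
        if acl is not None and acl_permits(acl, source_ip):
            return access_type
    return None  # rule 2: nothing permitted this source, so it is denied


groups = {"peer": [("permit", "1.1.1.1"), ("deny", "any")]}
print(access_level(groups, "1.1.1.1"))  # peer
print(access_level(groups, "3.3.3.3"))  # None
```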

Let's walk through some examples.

First, let's stop that rogue peer on R2:

access-list 90 deny   any
ntp access-group peer 90

We'll eventually see the R3 peer fall off:

But wait, there's more!  Well, more problems, that is.

sh ntp association detail
1.1.1.1 configured, insane, invalid, stratum 5

Well we lost 3.3.3.3, but we also lost 1.1.1.1.  We need to trust 1.1.1.1 to set our time:

access-list 90 permit 1.1.1.1
access-list 90 deny   any

R2(config)#do sh ntp association detail
1.1.1.1 configured, our_master, sane, valid, stratum 5

We recover pretty quickly.

Now let's have R3 get its time from R2.

R3(config)#no ntp
R3(config)#ntp source lo0
R3(config)#ntp server 2.2.2.2

On R3, note that we specify a loopback address as the source.  This is best practice when you're restricting access on the server, as NTP clients will otherwise source their packets from whichever interface the packet leaves on.  If you lose an interface, you may lose your time, because it's a fair bet you only permitted the closest interface in your access list on your NTP server.

R3(config)#do show ntp association detail
2.2.2.2 configured, insane, invalid, unsynced, stratum 16

And, predictably, it doesn't work.  We restricted peering in the first example, which also restricted every other type of communication automatically.  As mentioned above, you get an implicit deny as soon as you add an access list.

R2(config)#access-list 91 permit 3.3.3.3
R2(config)#ntp access-group serve-only 91
R3(config)#do show ntp association detail
2.2.2.2 configured, our_master, sane, valid, stratum 6

Let's try an order-of-operations example.

Let's permit 3.3.3.3 on peer, and then deny it in serve-only.

R2(config)#no ntp
R2(config)#access-list 91 permit 3.3.3.3
R2(config)#access-list 92 deny   3.3.3.3
R2(config)#ntp master 5
R2(config)#ntp access-group serve-only 92
R2(config)#ntp access-group peer 91

R3#show ntp association detail
2.2.2.2 configured, insane, invalid, unsynced, stratum 16

Well, that didn't work, but it should have.  We expected to have the peer access-list take priority over the serve-only.  Let's examine R2:

R2(config)#do show ntp association detail
127.127.7.1 configured, insane, invalid, unsynced, stratum 4

Uh-oh, R2 thinks its own clock is insane/invalid.
This is a big gotcha on older 12.4(x) code.  A master syncs to itself on 127.127.7.1.  If we don't allow it as a peer, we won't be getting time from ourselves!  Of important note, newer versions of 12.4(x) don't have this problem.  You can easily spot the difference, as modern versions sync to 127.127.1.1 instead.  So if you see that 7 in the 3rd octet, it's a warning sign to not forget to permit it.

R2(config)#access-list 91 permit 127.127.7.1

<wait a while>

R2(config)#do sh ntp association detail
127.127.7.1 configured, our_master, sane, valid, stratum 4

Now R2 is happy again.

R3(config)#do show ntp association detail
2.2.2.2 configured, our_master, sane, valid, stratum 5

And so is R3!

Now let's try authentication.
First rule of authentication: the client is authenticating the server.

There's not a whole lot of harm in giving time to someone, so there's no need for the server to authenticate the client -- if you want to control that, then use access groups.  However, the client needs to know they're not talking to a fake server, and that's where authentication comes in.

Let's have R3 authenticate R2.

The server-side configuration is rather light:

R2(config)#ntp authentication-key 1 md5 CISCO
(note, some newer versions of IOS, including ones potentially used on the lab exam, require "ntp trusted-key <key number>" on the server as well)

The client side configuration:

R3(config)#ntp authentication-key 1 md5 CISCO
R3(config)#ntp authenticate
R3(config)#ntp trusted-key 1
R3(config)#ntp server 2.2.2.2 key 1

The trusted-key command allows you to trust a variety of keys instead of just one.  You could, for example, trust keys 1-3 if you had keys 1, 2, & 3 defined.
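
For the curious, here's roughly what the client is checking.  In the NTP reference implementation, the MD5 authenticator appended to a packet is a 4-byte key ID followed by an MD5 digest over the key and the packet; I'm assuming IOS does the equivalent.  The Python below is a sketch under that assumption, with made-up function names and a dummy packet.

```python
import hashlib
import hmac
import struct


def make_mac(key_id, key, packet):
    """Key ID (4 bytes, network order) followed by MD5(key + packet)."""
    digest = hashlib.md5(key + packet).digest()
    return struct.pack("!I", key_id) + digest


def client_accepts(packet, mac, keys, trusted_ids):
    """The key ID must be defined and trusted, and the digest must match."""
    (key_id,) = struct.unpack("!I", mac[:4])
    if key_id not in trusted_ids or key_id not in keys:
        return False
    expected = make_mac(key_id, keys[key_id], packet)
    return hmac.compare_digest(mac, expected)


packet = bytes(48)  # stand-in for a 48-byte NTP header
mac = make_mac(1, b"CISCO", packet)
print(client_accepts(packet, mac, {1: b"CISCO"}, {1}))  # True
print(client_accepts(packet, mac, {1: b"WRONG"}, {1}))  # False: key mismatch
print(client_accepts(packet, mac, {1: b"CISCO"}, {2}))  # False: key 1 is not trusted
```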

And the outcome?

R3(config)#do show ntp association detail
2.2.2.2 configured, authenticated, our_master, sane, valid, stratum 5

Of important note, this doesn't in any fashion force authentication for the other clients.  Other NTP clients will still be able to request time without authentication.

And last, let's look at broadcast and multicast NTP.  We're going to have R2 broadcast time to R3 on a serial link connecting them.

Not shown here, I removed the static configuration on R3 that was referencing R2.

R3(config)#interface Serial1/3
R3(config-if)#ntp broadcast client
R3(config-if)#ntp broadcast key 1

We'll also keep authenticating the packets.  This is particularly important with broadcast or multicast NTP deployments.

R2(config)#interface Serial0/1
R2(config-if)#ntp broadcast key 1

And the outcome?

R3(config)#do show ntp association detail
155.1.23.2 dynamic, authenticated, our_master, sane, valid, stratum 5
<output omitted>

Note the dynamic peering this time.

To be fair, it wasn't quite as easy as I show above.  What looks like an IOS bug had R3 remembering that R2 was a configured peer, and it wasn't happy about getting broadcasts from it.  I found that with these commands:

debug ntp packet
debug ntp authentication

After doing a "no ntp" and re-entering just the vital commands, it was happy again.

I'm not going to do a thorough write-up of multicast basics here, but the multicast configuration is very similar to the broadcast one:

R2(config)#interface Serial0/1
R2(config-if)#no ntp broadcast
R2(config-if)#ntp multicast 239.0.0.1 key 1

R3(config)#interface Serial1/3
R3(config-if)#no ntp broadcast client
R3(config-if)#no ntp broadcast key 1
R3(config-if)#ntp multicast client 239.0.0.1
R3(config-if)#ntp multicast key 1

And the outcome?

R3(config-if)#do show ntp association detail
155.1.23.2 dynamic, authenticated, our_master, sane, valid, stratum 5

There's a potential gotcha on multicast NTP - the default TTL, at least on 12.4T, is 1.  That means it's basically only valid for that segment.  You can't multicast route it without changing the TTL:

R2(config-if)#ntp multicast ttl ?
  <1-255>  TTL
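
If the TTL math isn't obvious, this tiny Python model shows why a TTL of 1 never leaves the segment.  It's plain IP TTL handling, nothing NTP-specific, and the function name is made up.

```python
def routers_crossed(ttl, routers_on_path):
    """How many routers a packet gets through before its TTL runs out."""
    crossed = 0
    for _ in range(routers_on_path):
        ttl -= 1  # each router decrements the TTL before forwarding
        if ttl <= 0:
            break  # TTL expired: the packet is dropped at this router
        crossed += 1
    return crossed


print(routers_crossed(ttl=1, routers_on_path=1))  # 0: stays on the local segment
print(routers_crossed(ttl=2, routers_on_path=1))  # 1: makes it one hop further
```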

There you have it, folks!  Hope you enjoyed,

Jeff Kronlage

3 comments:

  1. Hi Jeff great article

    could you please assist me with explanation of output

    sh ntp association detail


    10.10.10.10 Configure, ipv4, our_master, sane, valid, stratum 3 ref ID 98.97.96.9

    In this line are mentioned 2 IPs.
    And my question here is: With which IPs has my server synchronized

    thanks
