Thursday, May 16, 2013

SNMP & RMON

There's quite a bit to Cisco's SNMP implementation.  Cisco also has a partial implementation of the RMON specification that, among other functions, can monitor local SNMP values and generate SNMP traps or syslog messages as a result.

I'm only going to give a high-level overview of how SNMP works and the basic differences between the versions.

SNMP has three basic functions:
- To expose system information that a network management station (NMS) can poll from the device (SNMP GET)
- To allow an NMS to make changes to the device (SNMP SET)
- To directly send urgent information to the NMS (SNMP TRAP or INFORM)

SNMP has four versions:
- v1 - The original specification. Its only security is a community string sent in plain text.
- v2 - The original specification, plus 64-bit counters, informs, and a complex user authentication and data encryption system.  v2 is not backwards compatible with v1; it uses a separate packet structure.  v2 is not widely used and is not supported on IOS, largely due to the complexity of its security system.
- v2c - 64-bit counters and informs, new packet structure, but without the authentication and data encryption.  This format is widely used and supported on IOS.
- v3 - Similar to v2c except it re-adds user authentication and data encryption.
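Here is why the 64-bit counters in v2c and v3 matter, in numbers. A 32-bit octet counter such as ifInOctets rolls over after 2^32 bytes. This small Python sketch computes the rollover time at full line rate. It also computes a wrap-aware difference between two polls, assuming the counter wraps at most once between polls (two samples can't reveal a second wrap).

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def seconds_until_wrap(link_bps: float, counter_bits: int = 32) -> float:
    """Seconds for an octet counter to roll over at full line rate."""
    return (2 ** counter_bits) * 8 / link_bps

def counter_delta(previous: int, current: int, counter_bits: int = 32) -> int:
    """Increase between two polls, assuming at most one rollover in between."""
    return (current - previous) % (2 ** counter_bits)

if __name__ == "__main__":
    for bps in (10e6, 100e6, 1e9):
        print(f"{bps / 1e6:>5.0f} Mbps: 32-bit counter wraps every "
              f"{seconds_until_wrap(bps):8.1f} s")
    years = seconds_until_wrap(1e9, 64) / SECONDS_PER_YEAR
    print(f"1 Gbps: 64-bit counter wraps every {years:,.0f} years")
    print(counter_delta(2**32 - 100, 50))  # 150: the counter wrapped past zero
```

At 1 Gbps a 32-bit octet counter wraps in about 34 seconds. That is shorter than many polling intervals.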

v1 and v2c both use "communities" for authentication, which is basically a weak, plain-text password system.  v3 uses users instead.

All versions use the concept of MIBs and OIDs.  OIDs are hierarchical numeric identifiers, each representing some near-arbitrary value on the device. Most OIDs you'll see on a Cisco device will start with one of the following prefixes:
1.3.6.1.2.1 "mib-2" (the standard MIBs, including interfaces under 1.3.6.1.2.1.2)
1.3.6.1.4.1.9 "Enterprises - Cisco"

MIBs are dictionaries that map OIDs to (fairly primitive) text names and descriptions.  MIBs are normally installed on your NMS.  If you're querying something like clogHistoryEntry.4.58, you're querying by MIB object name instead of by numeric OID.  The NMS uses the MIB to translate the name into the OID that is actually sent to the target device.
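Here's a toy illustration of that translation. A hand-built dictionary stands in for the MIB files a real NMS would load; it holds only a handful of well-known MIB-II and Cisco entries, and anything else is out of scope for the sketch. Name-to-OID is a lookup plus the instance suffix, while OID-to-name is a longest-prefix match:

```python
# Hand-built stand-in for MIB files (illustrative subset only).
MIB = {
    "mib-2": "1.3.6.1.2.1",
    "system": "1.3.6.1.2.1.1",
    "sysDescr": "1.3.6.1.2.1.1.1",
    "sysLocation": "1.3.6.1.2.1.1.6",
    "interfaces": "1.3.6.1.2.1.2",
    "ifType": "1.3.6.1.2.1.2.2.1.3",
    "cisco": "1.3.6.1.4.1.9",
}

def to_oid(name: str) -> str:
    """'sysLocation.0' -> '1.3.6.1.2.1.1.6.0' (object name plus instance suffix)."""
    base, _, suffix = name.partition(".")
    return f"{MIB[base]}.{suffix}" if suffix else MIB[base]

def to_name(oid: str) -> str:
    """The longest known prefix becomes the name; the rest stays numeric."""
    oid = oid.strip(".")
    matches = [n for n, p in MIB.items() if oid == p or oid.startswith(p + ".")]
    if not matches:
        return oid
    best = max(matches, key=lambda n: len(MIB[n]))
    rest = oid[len(MIB[best]) + 1:]
    return f"{best}.{rest}" if rest else best

if __name__ == "__main__":
    print(to_oid("sysLocation.0"))            # 1.3.6.1.2.1.1.6.0
    print(to_name("1.3.6.1.2.1.2.2.1.3.5"))   # ifType.5
```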

Basic tutorial over, let's look at Cisco's implementation.  Here's our very simple lab:


The Ubuntu VM will act as our NMS. 

R1(config)#snmp-server community public RO
R1(config)#snmp-server community private RW

This is probably the most basic (and miserably insecure) setup for demoing SNMP GET and SET.  The first command associates the community "public" with full read-only access to every OID used on the router.  The second command associates the community "private" with full read & write access to every OID used on the router.

I'm going to use snmpwalk on the Ubuntu box.  Snmpwalk is basically a macro that performs a series of SNMP GETNEXT requests, walking every OID in the tree below the starting point.
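Conceptually, a walk is just GETNEXT in a loop. You ask for the next OID after the last one returned, and stop once the answer falls outside the starting subtree. Here's a self-contained Python sketch against a hypothetical in-memory agent; the dictionary is made up, loosely based on this lab's output:

```python
def oid_key(oid: str) -> tuple:
    """OIDs order arc by arc as integers, not as strings (so .10 sorts after .9)."""
    return tuple(int(arc) for arc in oid.strip(".").split("."))

def get_next(agent: dict, oid: str):
    """Simulated GETNEXT: the first (oid, value) strictly after `oid`, or None."""
    later = [o for o in agent if oid_key(o) > oid_key(oid)]
    if not later:
        return None
    nxt = min(later, key=oid_key)
    return nxt, agent[nxt]

def walk(agent: dict, root: str):
    """Yield every (oid, value) under `root`, the way snmpwalk does."""
    prefix = oid_key(root)
    current = root
    while (reply := get_next(agent, current)) is not None:
        oid, value = reply
        if oid_key(oid)[:len(prefix)] != prefix:
            break  # left the subtree: this is where the walk ends
        yield oid, value
        current = oid

# Hypothetical agent contents
AGENT = {
    "1.3.6.1.2.1.2.2.1.3.1": 6,
    "1.3.6.1.2.1.1.5.0": "R1.lab.local",
    "1.3.6.1.2.1.1.6.0": "San Jose, CA",
    "1.3.6.1.2.1.2.2.1.3.5": 1,
}

if __name__ == "__main__":
    for oid, value in walk(AGENT, ".1.3.6.1.2.1.1"):
        print(oid, "=", value)
```

Net-snmp also ships snmpbulkwalk, which uses GETBULK to fetch several entries per request instead of one GETNEXT at a time.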

ubuntu@ubuntu:~$ snmpwalk -v2c -c public 10.0.0.1 .1.3.6.1.2.1

iso.3.6.1.2.1.1.1.0 = STRING: "Cisco IOS Software, 3700 Software (C3725-ADVENTERPRISEK9-M), Version 12.4(15)T14, RELEASE SOFTWARE (fc2)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2010 by Cisco Systems, Inc.
Compiled Tue 17-Aug-10 12:08 by prod_rel_team"
iso.3.6.1.2.1.1.2.0 = OID: iso.3.6.1.4.1.9.1.122
iso.3.6.1.2.1.1.3.0 = Timeticks: (87351) 0:14:33.51
iso.3.6.1.2.1.1.4.0 = ""
iso.3.6.1.2.1.1.5.0 = STRING: "R1.lab.local"
iso.3.6.1.2.1.1.6.0 = ""
iso.3.6.1.2.1.1.7.0 = INTEGER: 78
iso.3.6.1.2.1.1.8.0 = Timeticks: (0) 0:00:00.00
<majority of output omitted>

This would've produced tens of thousands of lines of output, so I just showed the first few.  The "public" community is both an identifier and a password.  Problem is, it's sent in plain-text, so anyone able to intercept an SNMP packet can make this same query.
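Here's how literal "plain text" is. The sketch below is a minimal hand-rolled BER encoder for a v2c GetRequest, for illustration only; a real tool would use a proper SNMP library. The message is a SEQUENCE of version (1 means v2c, 0 would be v1), the community as an OCTET STRING, and the GetRequest PDU. The community sits in the packet as raw ASCII bytes.

```python
def ber_length(n: int) -> bytes:
    """BER length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body

def tlv(tag: int, value: bytes) -> bytes:
    return bytes([tag]) + ber_length(len(value)) + value

def ber_int(value: int) -> bytes:
    """INTEGER for non-negative values (extra leading 0x00 when the top bit is set)."""
    return tlv(0x02, value.to_bytes(value.bit_length() // 8 + 1, "big"))

def ber_oid(oid: str) -> bytes:
    arcs = [int(a) for a in oid.strip(".").split(".")]
    body = bytearray([40 * arcs[0] + arcs[1]])   # first two arcs share one byte
    for arc in arcs[2:]:
        groups = [arc & 0x7F]                      # base-128; last group has no high bit
        arc >>= 7
        while arc:
            groups.append(0x80 | (arc & 0x7F))
            arc >>= 7
        body.extend(reversed(groups))
    return tlv(0x06, bytes(body))

def v2c_get_request(community: str, oid: str, request_id: int = 1) -> bytes:
    varbind = tlv(0x30, ber_oid(oid) + b"\x05\x00")                # {name, NULL}
    pdu = tlv(0xA0, ber_int(request_id) + ber_int(0) + ber_int(0)  # id, error-status, error-index
              + tlv(0x30, varbind))                                 # varbind list
    return tlv(0x30, ber_int(1)                                     # version: 1 means v2c
               + tlv(0x04, community.encode("ascii"))               # community, unprotected
               + pdu)

if __name__ == "__main__":
    pkt = v2c_get_request("public", "1.3.6.1.2.1.1.5.0")
    print(pkt.hex(" "))
    print(pkt[7:13])  # b'public'
```

Anyone capturing this packet reads the community straight out of bytes 7 through 12.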

Being able to find out router variables may be fairly innocuous, but let's look at a truly risky scenario.

R1(config)#snmp-server system-shutdown

This will allow an snmp-set to reboot the router.

ubuntu@ubuntu:~$ snmpset -v2c -c private 10.0.0.1 1.3.6.1.4.1.9.2.9.9.0 i 2
R1(config)#

***
*** --- SHUTDOWN in

***
*** --- SHUTDOWN NOW ---
*** Message from network to all terminals:
***

Now that would be truly awful. 

So how do you secure this thing?  There are a few options:
- Apply an access-list to the community
- Limit what MIBs the community can access
- Use SNMPv3 (covered later)

R1(config)#access-list 52 permit 10.0.0.5
R1(config)#snmp-server community private RW 52

ubuntu@ubuntu:~$ snmpset -v2c -c private 10.0.0.1 1.3.6.1.4.1.9.2.9.9.0 i 2
Timeout: No Response from 10.0.0.1

Well that solved the problem.  Sort of...

SNMP SET operations are acknowledged at the application layer (the router sends back a Response PDU), but the response doesn't need to arrive anywhere.  The target acts on the SET regardless.  This means that if you can spoof a source IP address that the ACL permits, you can still send SETs to the router and it will act on them!  The attacker presumably won't get the response (the real NMS would), but it doesn't matter.  Spoofing source addresses is generally very easy, so this is still a major flaw.
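The weakness is easier to see in a toy model. A standard ACL only ever looks at the source address field, and that field contains whatever the sender wrote into it. Here's a Python sketch; the packet dictionary and its `actual_sender` key are hypothetical, and exist only to show what the router can't see.

```python
import ipaddress

# Model of "access-list 52 permit 10.0.0.5" (implicit deny at the end).
ACL_52 = [("permit", ipaddress.ip_network("10.0.0.5/32"))]

def acl_permits(acl, source: str) -> bool:
    """Standard ACL: the first line whose network contains the source decides."""
    src = ipaddress.ip_address(source)
    for action, network in acl:
        if src in network:
            return action == "permit"
    return False  # implicit deny

def handle_set(packet: dict) -> str:
    """The router only sees the claimed source address, never the real sender."""
    if not acl_permits(ACL_52, packet["src"]):
        return "dropped"
    return f"SET applied, response sent to {packet['src']}"

if __name__ == "__main__":
    forged = {"src": "10.0.0.5", "actual_sender": "10.0.0.66"}  # hypothetical attacker
    print(handle_set(forged))
```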

On the bright side, this method is still better than ACLing SNMP at the interface.  This way the router can still transit SNMP traffic for other devices.  At the interface, it would block SNMP for devices downstream, as well.

Short of going to SNMPv3, the best option is to restrict which MIB objects a community can reach.  That is done with views.

First determine what you want to include:

R1#show snmp mib
dot1xPaeSystemAuthControl
dot1xPaePortProtocolVersion
dot1xPaePortCapabilities
dot1xPaePortInitialize
dot1xPaePortReauthenticate
dot1xAuthPaeState
dot1xAuthBackendAuthState
dot1xAuthAdminControlledDirections
dot1xAuthOperControlledDirections
dot1xAuthAuthControlledPortStatus
dot1xAuthAuthControlledPortControl
dot1xAuthQuietPeriod
dot1xAuthTxPeriod

This list is really, really long, and can even hang low-end routers while you're trying to display it.  Let's use a few I've pre-picked.

R1(config)#snmp-server view MYVIEW system.1 included
R1(config)#snmp-server view MYVIEW ifType included
R1(config)#snmp-server community private view MYVIEW RW

ubuntu@ubuntu:~$ snmpwalk -v2c -c private 10.0.0.1 .1.3.6

iso.3.6.1.2.1.1.1.0 = STRING: "Cisco IOS Software, 3700 Software (C3725-ADVENTERPRISEK9-M), Version 12.4(15)T14, RELEASE SOFTWARE (fc2)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2010 by Cisco Systems, Inc.
Compiled Tue 17-Aug-10 12:08 by prod_rel_team"
iso.3.6.1.2.1.2.2.1.3.1 = INTEGER: 6
iso.3.6.1.2.1.2.2.1.3.2 = INTEGER: 6
iso.3.6.1.2.1.2.2.1.3.3 = INTEGER: 6
iso.3.6.1.2.1.2.2.1.3.5 = INTEGER: 1
iso.3.6.1.2.1.2.2.1.3.5 = No more variables left in this MIB View (It is past the end of the MIB tree)
ubuntu@ubuntu:~$

Now we have a very short list of R/W variables.
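Views are subtree-based. An OID is in the view if it sits under an included subtree, and when both an included and an excluded entry match, the more specific (longer) one wins. That is the RFC 3415 view-family rule, minus its optional masks, and IOS also accepts `excluded` on `snmp-server view`. Here's a Python sketch of the check, with MYVIEW's MIB names already translated to OIDs:

```python
def oid_key(oid: str) -> tuple:
    return tuple(int(arc) for arc in oid.strip(".").split("."))

# MYVIEW from above: system.1 -> 1.3.6.1.2.1.1.1, ifType -> 1.3.6.1.2.1.2.2.1.3
MYVIEW = [
    ("1.3.6.1.2.1.1.1", "included"),
    ("1.3.6.1.2.1.2.2.1.3", "included"),
]

def in_view(view, oid: str) -> bool:
    """The longest matching subtree decides; no match at all means not in view."""
    key = oid_key(oid)
    matching = [(len(oid_key(subtree)), kind) for subtree, kind in view
                if key[:len(oid_key(subtree))] == oid_key(subtree)]
    if not matching:
        return False
    return max(matching, key=lambda m: m[0])[1] == "included"

if __name__ == "__main__":
    for oid in ("1.3.6.1.2.1.1.1.0", "1.3.6.1.2.1.1.5.0", "1.3.6.1.2.1.2.2.1.3.5"):
        print(oid, in_view(MYVIEW, oid))
```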

Of note, you can also add to SNMP views by OID.  For example:
snmp-server view MYVIEW 1.3.6.1.4.1.9.10.91.1.1.1.1 included

The router will translate this back to the MIB:
snmp-server view MYVIEW ceemEventMapEntry included

This might also be a handy way to look up the MIB object name for a given OID.

There are a few values that are commonly set by the config that are important to be aware of:
R1(config)#snmp-server contact Your Admins Name
R1(config)#snmp-server location San Jose, CA
R1(config)#snmp-server chassis-id 11223344

ubuntu@ubuntu:~$ snmpget -v2c -c public 10.0.0.1 1.3.6.1.2.1.1.6.0
iso.3.6.1.2.1.1.6.0 = STRING: "San Jose, CA"

Thus far I haven't spent much time explaining which version we're using.  We're still going to avoid v3 until later, but let's look at v1 and v2c.

To simplify our output, let's cut this down to just the read-only community, public.

Now let's run show snmp group:



We see "public" twice - security model "v1" and security model "v2c".  Just turning on the community enables both v1 and v2c by default.

If you just wanted v2c, you can do this... rather awkward... command.

R1(config)#no snmp-server group public v1

So what's so awkward about it?

R1(config)#do sh run | i snmp-server group
R1(config)#

It doesn't show up in the config.  If you do a wr mem and reboot, you get both v1 and v2c again.  Not sure how that plays out for the CCIE lab, but it sure makes me uneasy.

That all said, before the reboot, it does work:



You might wonder what else the snmp-server group command can do.  Best I can tell, before considering v3, there's not much else you can use it for besides selectively disabling v1 or v2c.

Before we consider SNMP traps or v3, let's look at some SNMP router tricks you should know about.

Did you know you could acquire the router config via SNMP?  Not so much by SNMP GETs, but by SETs!  How, you may ask?  You can instruct the router to TFTP the config somewhere!

I've pre-setup a TFTP server on the Ubuntu box.  I've also removed the view from the "private" community.  I'm not going to go over these commands one-by-one, as there's no way this could possibly be on the CCIE lab, but you may want to try them for yourself:

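These prep the transfer (each OID is a column of row 111 in the ccCopyTable from CISCO-CONFIG-COPY-MIB; 111 is just an arbitrary row index picked by the manager):
snmpset -c private -v 1 10.0.0.1 1.3.6.1.4.1.9.9.96.1.1.1.1.2.111 i 1            # ccCopyProtocol: tftp(1)
snmpset -c private -v 1 10.0.0.1 1.3.6.1.4.1.9.9.96.1.1.1.1.3.111 i 4            # ccCopySourceFileType: runningConfig(4)
snmpset -c private -v 1 10.0.0.1 1.3.6.1.4.1.9.9.96.1.1.1.1.4.111 i 1            # ccCopyDestFileType: networkFile(1)
snmpset -c private -v 1 10.0.0.1 1.3.6.1.4.1.9.9.96.1.1.1.1.5.111 a 10.0.0.2     # ccCopyServerAddress: the TFTP server
snmpset -c private -v 1 10.0.0.1 1.3.6.1.4.1.9.9.96.1.1.1.1.6.111 s running.cfg  # ccCopyFileName: file name on the server

This initiates:
snmpset -c private -v 1 10.0.0.1 1.3.6.1.4.1.9.9.96.1.1.1.1.14.111 i 1           # ccCopyEntryRowStatus: active(1) starts the copy

This resets the prep commands above:
snmpset -c private -v 1 10.0.0.1 1.3.6.1.4.1.9.9.96.1.1.1.1.14.111 i 6           # ccCopyEntryRowStatus: destroy(6) deletes the row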

Afterwards, you'll have a file called "running.cfg" on the TFTP server on 10.0.0.2.

Well that's kind of scary, too.  Someone could send the config to any TFTP server.
Here's how you leverage this feature but limit your risk:

R1(config)#snmp-server file-transfer access-group 10 protocol ?
  ftp   Configure acl for ftp transfer protocol
  rcp   Configure acl for rcp transfer protocol
  scp   Configure acl for scp transfer protocol
  sftp  Configure acl for sftp transfer protocol
  tftp  Configure acl for tftp transfer protocol

Configure an access-list, specify your protocol, and you can control where the router will send files to via SNMP.

On another topic, most items monitored on a Cisco device are interface-related.  Did you know the interface indexes (ifIndex), and therefore the interface OIDs, can shift between reboots?  If a new interface is added, the interfaces can be re-indexed when the router comes back up.  That's a problem if you're referencing them by either MIB object name or OID.  To prevent this default behavior, you have two options:

Globally:
snmp-server ifindex persist

Per-interface:
snmp ifindex persist

Either of these will store the interface indexes in NVRAM and reload them on reboot.

Now we'll take a look at SNMP v3.  We'll follow that up with traps and informs, and then RMON last.

SNMPv3 has three security levels:
noAuthNoPriv - This basically disables the security features of SNMPv3, and there's almost no reason to use it.
authNoPriv - This will use a hash to authenticate the user, but not encrypt the SNMP data
authPriv - This will use a hash to authenticate the user, as well as encrypt the SNMP data

First and foremost, SNMP v3 drops the community used by v1 and v2c.  It uses groups and users instead.

Let's start with the most basic noAuthNoPriv example:

R1(config)#snmp-server user LABUSER LABGROUP v3
R1(config)#snmp-server group LABGROUP v3 noauth

ubuntu@ubuntu:~$ snmpwalk -v3 -u LABUSER -l noAuthNoPriv 10.0.0.1 1.3.6.1.2.1.1.9
iso.3.6.1.2.1.1.9.1.2.1 = OID: iso.3.6.1.4.1.9.7.129
iso.3.6.1.2.1.1.9.1.2.2 = OID: iso.3.6.1.4.1.9.7.115
iso.3.6.1.2.1.1.9.1.2.3 = OID: iso.3.6.1.4.1.9.7.265
<output omitted>

As you can see, that's really not any better than SNMP v2c.  Basically we substituted a non-secure community for a non-secure username.  Of note, you can still use views, just like with previous versions of SNMP:

snmp-server view MYVIEW system.1 included
snmp-server view MYVIEW ifType included
snmp-server group LABGROUP v3 noauth read MYVIEW write MYVIEW

ubuntu@ubuntu:~$ snmpwalk -v3 -u LABUSER -l noAuthNoPriv 10.0.0.1 1.3.6.1.2
<output omitted>
iso.3.6.1.2.1.2.2.1.3.1 = INTEGER: 6
iso.3.6.1.2.1.2.2.1.3.2 = INTEGER: 6
iso.3.6.1.2.1.2.2.1.3.3 = INTEGER: 6
iso.3.6.1.2.1.2.2.1.3.5 = INTEGER: 1
iso.3.6.1.2.1.2.2.1.3.5 = No more variables left in this MIB View (It is past the end of the MIB tree)

Just for comparison purposes moving forward, let's take a look at the debug from debug snmp headers on R1:

Incoming SNMP packet
*Mar  1 00:26:29.311: v3 packet         security model: v3       security level: noauth
*Mar  1 00:26:29.311: username: LABUSER
<output omitted>

The important part here is the noauth, which is noAuthNoPriv in Cisco-speak.

Let's upgrade our security from noAuthNoPriv to authNoPriv.

R1(config)#snmp-server user LABUSER LABGROUP v3 auth sha MYSECRET
R1(config)#snmp-server group LABGROUP v3 noauth  ! NOTE, I am not changing this yet

ubuntu@ubuntu:~$ snmpwalk -v3 -u LABUSER -l authNoPriv -a sha -A MYSECRET 10.0.0.1 1.3.6.1.2
<output omitted>
iso.3.6.1.2.1.2.2.1.3.1 = INTEGER: 6
iso.3.6.1.2.1.2.2.1.3.2 = INTEGER: 6
iso.3.6.1.2.1.2.2.1.3.3 = INTEGER: 6
iso.3.6.1.2.1.2.2.1.3.5 = INTEGER: 1
iso.3.6.1.2.1.2.2.1.3.5 = No more variables left in this MIB View (It is past the end of the MIB tree)

We've asked for authNoPriv, SHA authentication, with the appropriate password.

Incoming SNMP packet
*Mar  1 00:29:52.723: v3 packet         security model: v3       security level: auth
*Mar  1 00:29:52.723: username: LABUSER

And we got auth (authNoPriv), per the debug.  So what does this actually do?

snmp-server group LABGROUP v3 noAuth

This doesn't force noAuth.  It just doesn't enforce a higher security level.  In other words, this group will accept no authentication, authentication, or authentication + encryption.  We did, however, have to configure an authentication algorithm and password on the snmp-server user command, or there would be no way to authenticate.

The catch, of course, is that you can still get in without the password:

ubuntu@ubuntu:~$ snmpwalk -v3 -u LABUSER -l noAuthNoPriv 10.0.0.1 1.3.6.1.2
<output omitted>
iso.3.6.1.2.1.2.2.1.3.1 = INTEGER: 6
iso.3.6.1.2.1.2.2.1.3.2 = INTEGER: 6
iso.3.6.1.2.1.2.2.1.3.3 = INTEGER: 6
iso.3.6.1.2.1.2.2.1.3.5 = INTEGER: 1
iso.3.6.1.2.1.2.2.1.3.5 = No more variables left in this MIB View (It is past the end of the MIB tree)

Incoming SNMP packet
*Mar  1 00:37:46.367: v3 packet         security model: v3       security level: noauth
*Mar  1 00:37:46.367: username: LABUSER

So that doesn't really protect anything.  As always, my focus is regarding the CCIE lab, and it's about knowing how to do things, not about producing good production configs.  So knowing this can be done is important.

Let's make this work in a more logical fashion:

R1(config)#no snmp-server group LABGROUP v3 noauth
R1(config)#snmp-server group LABGROUP v3 auth

ubuntu@ubuntu:~$ snmpwalk -v3 -u LABUSER -l noAuthNoPriv 10.0.0.1 1.3.6.1.2
Error in packet.
Reason: authorizationError (access denied to that object)

The door's no longer wide open...  let's try it with auth.

ubuntu@ubuntu:~$ snmpwalk -v3 -u LABUSER -l authNoPriv -a sha -A MYSECRET 10.0.0.1 1.3.6.1.2.1
<output omitted>
iso.3.6.1.2.1.1.2.0 = OID: iso.3.6.1.4.1.9.1.122
iso.3.6.1.2.1.1.3.0 = Timeticks: (271023) 0:45:10.23
iso.3.6.1.2.1.1.4.0 = ""
iso.3.6.1.2.1.1.5.0 = STRING: "R1.lab.local"
iso.3.6.1.2.1.1.6.0 = ""
iso.3.6.1.2.1.1.7.0 = INTEGER: 78
iso.3.6.1.2.1.1.8.0 = Timeticks: (0) 0:00:00.00
iso.3.6.1.2.1.1.9.1.2.1 = OID: iso.3.6.1.4.1.9.7.129
<output omitted>

Now we have a true authentication requirement on our SNMP process.
Let's add encryption.

R1(config)#snmp-server group LABGROUP v3 priv
R1(config)#snmp-server user LABUSER LABGROUP v3 auth sha MYSECRET priv aes 128 PW4ENCRYPTION

ubuntu@ubuntu:~$ snmpwalk -v3 -u LABUSER -l authPriv -a sha -A MYSECRET -x aes -X PW4ENCRYPTION 10.0.0.1 1.3.6.1.2
<output omitted>
iso.3.6.1.2.1.1.2.0 = OID: iso.3.6.1.4.1.9.1.122
iso.3.6.1.2.1.1.3.0 = Timeticks: (359166) 0:59:51.66
iso.3.6.1.2.1.1.4.0 = ""
iso.3.6.1.2.1.1.5.0 = STRING: "R1.lab.local"
iso.3.6.1.2.1.1.6.0 = ""
iso.3.6.1.2.1.1.7.0 = INTEGER: 78
<output omitted>

Incoming SNMP packet
*Mar  1 00:59:52.687: v3 packet         security model: v3       security level: priv
*Mar  1 00:59:52.687: username: LABUSER

Did you know you could mix the v3 authentication methods on the same group?

snmp-server group LABGROUP v3 noauth read MYVIEW
snmp-server group LABGROUP v3 priv

This will permit noAuthNoPriv read access to the MYVIEW view, and authPriv to everything else.
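This matches how RFC 3415 (VACM) picks an access entry when a group has several. Only entries whose security level is at or below the request's level qualify, and among those the highest level wins. The same rule explains the earlier "noauth doesn't force noAuth" behavior. Below is a simplified Python sketch that ignores contexts and security models. The view name "full" is a placeholder for the unrestricted read view the priv entry gets.

```python
from enum import IntEnum

class Level(IntEnum):
    NO_AUTH_NO_PRIV = 1
    AUTH_NO_PRIV = 2
    AUTH_PRIV = 3

# Model of:
#   snmp-server group LABGROUP v3 noauth read MYVIEW
#   snmp-server group LABGROUP v3 priv
LABGROUP = [(Level.NO_AUTH_NO_PRIV, "MYVIEW"), (Level.AUTH_PRIV, "full")]

def read_view(group, request_level: Level):
    """Entries at or below the request's level qualify; the highest one wins."""
    usable = [entry for entry in group if entry[0] <= request_level]
    if not usable:
        return None  # nothing qualifies: the request gets authorizationError
    return max(usable, key=lambda entry: entry[0])[1]

if __name__ == "__main__":
    for level in Level:
        print(level.name, "->", read_view(LABGROUP, level))
```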

ubuntu@ubuntu:~$ snmpwalk -v3 -u LABUSER -l authNoPriv -a sha -A MYSECRET 10.0.0.1 1.3.6.1.2.1
<output omitted>
iso.3.6.1.2.1.2.2.1.3.1 = INTEGER: 6
iso.3.6.1.2.1.2.2.1.3.2 = INTEGER: 6
iso.3.6.1.2.1.2.2.1.3.3 = INTEGER: 6
iso.3.6.1.2.1.2.2.1.3.5 = INTEGER: 1
iso.3.6.1.2.1.2.2.1.3.5 = No more variables left in this MIB View (It is past the end of the MIB tree)

With authNoPriv, we get the small sampling... what about with priv?

ubuntu@ubuntu:~$ snmpwalk -v3 -u LABUSER -l authPriv -a sha -A MYSECRET -x aes -X PW4ENCRYPTION 10.0.0.1 1.3.6.1.2.1
<output omitted>
iso.3.6.1.2.1.1.2.0 = OID: iso.3.6.1.4.1.9.1.122
iso.3.6.1.2.1.1.3.0 = Timeticks: (511295) 1:25:12.95
iso.3.6.1.2.1.1.4.0 = ""
iso.3.6.1.2.1.1.5.0 = STRING: "R1.lab.local"
iso.3.6.1.2.1.1.6.0 = ""
iso.3.6.1.2.1.1.7.0 = INTEGER: 78
iso.3.6.1.2.1.1.8.0 = Timeticks: (0) 0:00:00.00
iso.3.6.1.2.1.1.9.1.2.1 = OID: iso.3.6.1.4.1.9.7.129
iso.3.6.1.2.1.1.9.1.2.2 = OID: iso.3.6.1.4.1.9.7.115
iso.3.6.1.2.1.1.9.1.2.3 = OID: iso.3.6.1.4.1.9.7.265
iso.3.6.1.2.1.1.9.1.2.4 = OID: iso.3.6.1.4.1.9.7.112
iso.3.6.1.2.1.1.9.1.2.5 = OID: iso.3.6.1.4.1.9.7.106
iso.3.6.1.2.1.1.9.1.2.6 = OID: iso.3.6.1.4.1.9.7.47
iso.3.6.1.2.1.1.9.1.2.7 = OID: iso.3.6.1.4.1.9.7.122
iso.3.6.1.2.1.1.9.1.2.8 = OID: iso.3.6.1.4.1.9.7.135
iso.3.6.1.2.1.1.9.1.2.9 = OID: iso.3.6.1.4.1.9.7.43
iso.3.6.1.2.1.1.9.1.2.10 = OID: iso.3.6.1.4.1.9.7.37
iso.3.6.1.2.1.1.9.1.2.11 = OID: iso.3.6.1.4.1.9.7.92
iso.3.6.1.2.1.1.9.1.2.12 = OID: iso.3.6.1.4.1.9.7.53
iso.3.6.1.2.1.1.9.1.2.13 = OID: iso.3.6.1.4.1.9.7.54
iso.3.6.1.2.1.1.9.1.2.14 = OID: iso.3.6.1.4.1.9.7.52
^C
Obviously, I killed it there, so it didn't go on forever.

Here's an oddity...

R1#sh run | i snmp-server user
R1#

You saw me create the users.

They're not shown in the running-config for security reasons.  Annoying, in my opinion.  You can get some limited information from:

R1#show snmp user

User name: LABUSER
Engine ID: 800000090300C2000BF40000
storage-type: nonvolatile        active
Authentication Protocol: SHA
Privacy Protocol: AES128
Group-name: LABGROUP
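The Engine ID in that output matters because the USM model (RFC 3414) doesn't use the password as-is. It stretches the password into a key, then "localizes" that key to the agent's snmpEngineID, so the same password yields a different key on every agent. IOS interoperates with net-snmp here, so the RFC algorithm is a safe model. Below is my Python sketch of the RFC's SHA variant, not Cisco's code:

```python
import hashlib

def password_to_key_sha1(password: str) -> bytes:
    """RFC 3414 password-to-key: SHA-1 over 1,048,576 bytes of the repeated password."""
    pw = password.encode("utf-8")
    if not pw:
        raise ValueError("password must not be empty")
    repeats, remainder = divmod(1_048_576, len(pw))
    return hashlib.sha1(pw * repeats + pw[:remainder]).digest()

def localize_key_sha1(key: bytes, engine_id_hex: str) -> bytes:
    """Bind the key to one agent: SHA-1(key || snmpEngineID || key)."""
    engine_id = bytes.fromhex(engine_id_hex)
    return hashlib.sha1(key + engine_id + key).digest()

if __name__ == "__main__":
    ku = password_to_key_sha1("MYSECRET")
    print(localize_key_sha1(ku, "800000090300C2000BF40000").hex())
```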

Now let's take a look at traps.

An SNMP trap is sent to the NMS when an "urgent" event occurs on the device.  The idea is that the NMS can only poll the target device every so often, and the event may have cleared before the NMS revisits the target (my personal opinion: use a faster NMS so you can poll more often!).  Anyway, Cisco devices can send a plethora of events to the NMS, and here's how you configure it.

R1(config)#snmp-server enable traps
R1(config)#snmp-server host 10.0.0.2 traps version 2c MYCOMMUNITY

This is the most basic setup. Enable traps globally, then enable them per-NMS.  Note, if you just use the host command, and don't enable traps globally, you will get no results.

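With no keywords, snmp-server enable traps enables every trap type the platform supports.  It shows up in the running-config as a long list of individual commands, for example: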

snmp-server enable traps snmp authentication linkdown linkup coldstart warmstart
snmp-server enable traps vrrp
snmp-server enable traps ds1
snmp-server enable traps tty
snmp-server enable traps eigrp
snmp-server enable traps xgcp
snmp-server enable traps flash insertion removal
<output omitted>

The usage of the snmp-server host command is pretty simple:
snmp-server host 10.0.0.2 traps version 2c MYCOMMUNITY

traps indicates traps instead of informs (covered below), version indicates the trap format (defaults to v1 if unspecified), and the community gives a way of grouping the traps on the NMS.  Note, the trap community does not have to correspond with your "GET" communities whatsoever.

Now our config above produces an awful lot of traps.  You have two options for limiting them:

Globally:
no snmp-server enable traps   ! turns off the mass of traps we already enabled
snmp-server enable traps eigrp
snmp-server enable traps bgp
snmp-server enable traps ospf

That will globally send those traps towards any configured hosts.

Now if we had another host we only wanted BGP traps to, but not eigrp and ospf, you could additionally:

Per-host:
snmp-server host 10.0.0.2 version 2c MYCOMMUNITY bgp

Default destination port for traps is UDP 162, if you want to change it:
snmp-server host 10.0.0.2 version 2c MYCOMMUNITY bgp udp-port 100

One of the catches of traps is that they're an un-ACKed UDP packet.  The network device has no way of knowing whether or not they arrive.  Introducing informs.

Informs are basically ACKed traps.  They were introduced in SNMP v2.

Usage is:
snmp-server enable traps
snmp-server host 10.0.0.2 informs version 2c MYCOMMUNITY

Usage is exactly the same as traps except you put the word "informs" in the host statement.

Syslog messages can also be encapsulated in SNMP traps and sent to an NMS.

R1(config)#snmp-server enable traps syslog
R1(config)#snmp-server host 10.0.0.2 MYCOMMUNITY syslog

Both of these above options would be enabled by default if you just enabled trapping unfiltered globally and on the host, but here's the magic command:

R1(config)#logging history debugging

Logging history is the key.  You can enable whatever level of logging you like.  Let's give this a try.

Note I wasn't up for installing an NMS product, so we'll be using the magic of tcpdump to see this happen on 10.0.0.2.

R1(config)#exit
R1#
*Mar  3 00:25:46.499: %SYS-5-CONFIG_I: Configured from console by console

(exiting config mode is an easy way to generate syslog on debugging)

04:40:57.603943 IP 10.0.0.1.64738 > 10.0.0.2.snmp-trap:  C=MYCOMMUNITY Trap(182)  E:cisco.9.41.2 10.0.0.1 enterpriseSpecific s=1 17434657 E:cisco.9.41.1.2.3.1.2.4="SYS" E:cisco.9.41.1.2.3.1.3.4=6 E:cisco.9.41.1.2.3.1.4.4="CONFIG_I" E:cisco.9.41.1.2.3.1.5.4="Configured from console by console" E:cisco.9.41.1.2.3.1.6.4=17434651

And some miscellaneous items:

How to change the SNMP packet size:
snmp-server packetsize <size> ! defaults to 1500 bytes

How to change the source interface:
snmp-server trap-source <interface>

By default, all interfaces send link-status traps.  To individually disable them:
interface Fa0/0
  no snmp trap link-status

SNMP traps are rate-limited on egress.  If you have a large number of NMS stations or expect a great deal of traps, you may need to queue traps for delivery.  By default, the IOS device will hold 10 traps in its queue.  To change how many traps IOS will queue:
snmp-server queue-length <qty of traps>

SNMP informs have a similar queue-length, but it holds items in queue until they're acknowledged, meaning the queue-length needs to potentially be longer.  The default is 25.
snmp-server inform pending <qty of informs>

When to consider a lack of ACK a failure on an inform:
snmp-server inform timeout <seconds>

How many times to retry an un-acked inform:
snmp-server inform retries <qty of retries>
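Here is how timeout, retries, and the pending limit interact, as a Python sketch with a faked transport. One part is an assumption: that a full pending queue drops the oldest inform. That follows Cisco's command-reference wording for `snmp-server inform pending` as I understand it.

```python
from collections import deque
from typing import Callable

def deliver_inform(transmit: Callable[[float], bool], timeout_s: float, retries: int) -> int:
    """Send once plus up to `retries` retransmissions.

    `transmit` is a hypothetical callable that sends the inform and returns True
    if an acknowledgement arrives within `timeout_s`. Returns how many
    transmissions it took, or 0 if every attempt timed out.
    """
    for attempt in range(1, retries + 2):
        if transmit(timeout_s):
            return attempt
    return 0

def scripted(results):
    """Fake transport: replays a fixed list of ACK (True) / no-ACK (False) outcomes."""
    outcomes = iter(results)
    return lambda timeout_s: next(outcomes)

# Bounded pending queue; deque(maxlen=...) discards the oldest entry when full.
pending = deque(maxlen=25)

if __name__ == "__main__":
    print(deliver_inform(scripted([False, False, True]), 30, 3))  # 3
    print(deliver_inform(scripted([False, False, True]), 30, 1))  # 0
```

A longer timeout or more retries keeps each inform pending longer. That is why the pending queue may need to be bigger than the trap queue.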

RMON - Defined in RFC 2819 (RMON) and RFC 4502 (RMON2), RMON is a large, complex network monitoring suite.  Only a couple of Cisco devices implement the entire suite, and none of them are on the CCIE lab.  The important piece that is implemented on all IOS versions, and is on the lab, is the ability to monitor a local SNMP variable and generate syslog or an SNMP trap.

The easiest way for me to think about RMON's function on IOS is as an SNMP GET -> TRAP converter.  It polls its own SNMP variables and generates traps if certain criteria are met.

With RMON, you generate alarms and events.  Alarms call the events.  Events can trigger logging, traps, or both.

rmon event 1 trap MYCOMMUNITY owner config
rmon alarm 1 ip.3.0 15 delta rising-threshold 10 1 falling-threshold 5 1 owner config


This is the most basic usage possible.  Let's run over some of the variables.

rmon event 1 trap MYCOMMUNITY owner config

- The "1" in "event 1" is what the alarm will reference.
- The trap indicates sending a trap. 
- MYCOMMUNITY is the community string to send to.  Note, this must match a host statement's community, or no one will get these traps, i.e:
snmp-server host 10.0.0.2 informs version 2c MYCOMMUNITY
- I searched high and low for a better explanation on this, but best I can tell, the "owner" variable is just a text label.  It doesn't appear to do anything at all.  It's not included in the trap or with the log.  "config" is the default value for "owner" if you don't specify anything.

rmon alarm 1 ip.3.0 15 delta rising-threshold 10 1 falling-threshold 5 1 owner config

- The "1" in "alarm 1" is just an identifier, it doesn't reference anything
- "ip.3.0" is an arbitrary SNMP MIB value I picked, it can be any value exposed by the system
- "15" is the number of seconds between polling attempts
- "delta" will be explained later (options are either delta or absolute)
- The "10" in "rising threshold 10" is a high-water mark to trigger an event when hit.
- The following "1" is which event to call if the rising-threshold is triggered
- The "5" in "falling-threshold 5" is the low-water mark to trigger an event when hit
- The following "1" is which event to call if the falling-threshold is triggered
- "owner config", as referenced above, doesn't appear to actually have any meaning.
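Here is the positional layout above as a small Python parser. It only handles the fully spelled-out form used in this post. I believe IOS also lets you leave out the event numbers, and this sketch rejects that form.

```python
from dataclasses import dataclass

@dataclass
class RmonAlarm:
    number: int             # just an identifier
    variable: str           # MIB object (or OID) to poll
    interval_s: int         # seconds between polls
    sample_type: str        # "delta" or "absolute"
    rising_threshold: int   # high-water mark
    rising_event: int       # event called when it is hit
    falling_threshold: int  # low-water mark
    falling_event: int      # event called when it is hit
    owner: str = "config"   # free-text label

def parse_rmon_alarm(line: str) -> RmonAlarm:
    """Parse 'rmon alarm N VAR SECS delta|absolute rising-threshold X E falling-threshold Y E [owner S]'."""
    t = line.split()
    if (len(t) < 12 or t[:2] != ["rmon", "alarm"] or t[5] not in ("delta", "absolute")
            or t[6] != "rising-threshold" or t[9] != "falling-threshold"):
        raise ValueError(f"unsupported syntax: {line!r}")
    owner = t[13] if len(t) >= 14 and t[12] == "owner" else "config"
    return RmonAlarm(int(t[2]), t[3], int(t[4]), t[5],
                     int(t[7]), int(t[8]), int(t[10]), int(t[11]), owner)

if __name__ == "__main__":
    print(parse_rmon_alarm(
        "rmon alarm 1 ip.3.0 15 delta rising-threshold 10 1 falling-threshold 5 1 owner config"))
```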

Of important note is the command show snmp mib ifmib ifindex, which will help you find the index number assigned to a specific interface.  Sample questions often have you measuring interface load, and while many questions provide you with the general value, you have to find the index for yourself.  Remember the snmp-server ifindex persist command from above, to make these values static between reboots.
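For interface-load questions, the arithmetic is the usual stumbling block. ifInOctets (1.3.6.1.2.1.2.2.1.10) counts bytes, so a delta alarm threshold is bits per second times utilization times the polling interval, divided by 8. Here's a Python sketch with several hypothetical values: ifIndex 1, a 10 Mbps link, alarm number 10, and events 1 and 2 assumed to already exist. Get the real index from show snmp mib ifmib ifindex.

```python
def octet_threshold(bandwidth_bps: int, percent: int, interval_s: int) -> int:
    """Octets that must arrive during one polling interval to equal `percent` load."""
    return bandwidth_bps * percent * interval_s // (8 * 100)

def load_alarm(ifindex: int, bandwidth_bps: int, high_pct: int, low_pct: int,
               interval_s: int = 15) -> str:
    """Build an 'rmon alarm' line that polls ifInOctets.<ifindex> as a delta."""
    high = octet_threshold(bandwidth_bps, high_pct, interval_s)
    low = octet_threshold(bandwidth_bps, low_pct, interval_s)
    return (f"rmon alarm 10 ifInOctets.{ifindex} {interval_s} delta "
            f"rising-threshold {high} 1 falling-threshold {low} 2 owner config")

if __name__ == "__main__":
    # Hypothetical: ifIndex 1 on a 10 Mbps link, alarm at 50% and clear at 20%
    print(load_alarm(1, 10_000_000, 50, 20))
```

ifInOctets is a 32-bit counter. On a fast link, keep the interval short enough that it can't wrap between polls.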

As mentioned above, there are two operators that can be applied to alarms - delta and absolute.  These can be confusing.  Absolute is easier to work with, so I will start with it first.  Absolute should be used with values that can increase or decrease, such as CPU load.  If you wanted to get an alert when CPU load hit 80%, and then wanted to get notified when it settled back down to 10%, you would use absolute.

That's great in production, but it turns out getting a virtualized router to 80% CPU load in such a way that it can fall back down to a reasonable number afterwards without crashing actually takes quite a lot of effort.  So for our purposes, I'm going to test with CPU load at 10% being the high-water mark, and 5% being the low-water mark.  CPU load will be created on the device by enabling debug ip packet and flooding it with pings from another terminal (not pictured here).

This is our config:
rmon event 1 trap MYCOMMUNITY description "HIGH CPU LOAD" owner config
rmon event 2 trap MYCOMMUNITY description "CPU LOAD OK" owner config
rmon alarm 1 cpmCPUTotalTable.1.6.1 10 absolute rising-threshold 10 1 falling-threshold 5 2 owner config

When cpmCPUTotalTable.1.6.1 (the 5-second average of router CPU) hits 10%, event 1 will be called.  When it gets back down to 5% or lower, event 2 will be called.  It is important to note that the same event is not called twice in a row.  If the CPU load stays over 10% for several 10-second polling cycles, only one trap will be generated.  CPU load must first fall to 5% or below (calling event 2) before event 1 can be called again.

I've turned on debug ip packet on R1, and I'm now firing away at it with a huge number of pings.

<large amounts of debug output omitted>
*Mar  3 02:13:56.711: %RMON-5-RISINGTRAP: Rising trap is generated because the value of cpmCPUTotalTable.1.6.1 exceeded the rising-threshold value 10

Let's take a look at the output on our Linux box, monitoring traps via tcpdump:

15:01:55.204278 IP 10.0.0.1.64738 > 10.0.0.2.snmp-trap:  C=MYCOMMUNITY Trap(258)  E:cisco.9.41.2 10.0.0.1 enterpriseSpecific s=1 18142683 E:cisco.9.41.1.2.3.1.2.25="RMON" E:cisco.9.41.1.2.3.1.3.25=6 E:cisco.9.41.1.2.3.1.4.25="RISINGTRAP" E:cisco.9.41.1.2.3.1.5.25="Rising trap is generated because the value of cpmCPUTotalTable.1.6.1 exceeded the rising-threshold value 10" E:cisco.9.41.1.2.3.1.6.25=18142676

Terminating the ping stream, giving it a second...

<more debug omitted>
*Mar  3 02:14:06.731: %RMON-5-FALLINGTRAP: Falling trap is generated because the value of cpmCPUTotalTable.1.6.1 has fallen below the falling-threshold value 5

And on the Linux box:

15:02:05.225729 IP 10.0.0.1.64738 > 10.0.0.2.snmp-trap:  C=MYCOMMUNITY Trap(269)  E:cisco.9.41.2 10.0.0.1 enterpriseSpecific s=1 18143682 E:cisco.9.41.1.2.3.1.2.26="RMON" E:cisco.9.41.1.2.3.1.3.26=6 E:cisco.9.41.1.2.3.1.4.26="FALLINGTRAP" E:cisco.9.41.1.2.3.1.5.26="Falling trap is generated because the value of cpmCPUTotalTable.1.6.1 has fallen below the falling-threshold value 5" E:cisco.9.41.1.2.3.1.6.26=18143678

It works as expected.  Well, sort of.  If you were paying close attention, you'll notice I put descriptions in with the trap events:

rmon event 1 trap MYCOMMUNITY description "HIGH CPU LOAD" owner config
rmon event 2 trap MYCOMMUNITY description "CPU LOAD OK" owner config

I haven't found a use for these other than to provide a text description in the config.  I've read elsewhere that these are supposed to be included in the trap, but as you can plainly see from the tcpdump output, this text isn't passed along anywhere.  You'll also notice the "owner" value isn't passed in the trap, either.

RMON events can either log, trap, or log & trap. Let's try swapping our output to logging.

R1(config)#rmon event 1 log description "HIGH CPU LOAD"
R1(config)#rmon event 2 log description "CPU LOAD OK"
R1(config)#logging trap notifications
R1(config)#logging 10.0.0.2

I'm firing up the same ping flood to create cpu load.

R1#
*Mar  1 00:08:33.711: %RMON-5-RISINGTRAP: Rising trap is generated because the value of cpmCPUTotalTable.1.6.1 exceeded the rising-threshold value 10

And on our tcpdump "collector":

15:39:04.438924 IP (tos 0x0, ttl 255, id 45646, offset 0, flags [none], proto UDP (17), length 190)
    10.0.0.1.55641 > 10.0.0.2.syslog: [udp sum ok] SYSLOG, length: 162
        Facility local7 (23), Severity notice (5)
        Msg: 176719: *Mar  1 00:08:33.711: %RMON-5-RISINGTRAP: Rising trap is generated because the value of cpmCPUTotalTable.1.6.1 exceeded the rising-threshold value 10

There it is, facility local7, severity 5.
For brevity I'm excluding the falling output, but suffice to say it works as expected.

One would think this is a slam-dunk, but not so fast.

Let's put it back to trap, but leave the generalized syslog config in place:

R1(config)#no rmon event 1 log description "HIGH CPU LOAD"
R1(config)#no rmon event 2 log description "CPU LOAD OK"
R1(config)#rmon event 1 trap BOGUS description "HIGH CPU LOAD"
R1(config)#rmon event 2 trap BOGUS description "CPU LOAD OK"

Now we won't get the traps any longer, but check out what happened on our syslog output after I fired up the pings again:

16:03:49.160416 IP (tos 0x0, ttl 255, id 21, offset 0, flags [none], proto UDP (17), length 186)
    10.0.0.1.49286 > 10.0.0.2.syslog: [udp sum ok] SYSLOG, length: 158
        Facility local7 (23), Severity notice (5)
        Msg: 22: *Mar  1 00:07:08.415: %RMON-5-RISINGTRAP: Rising trap is generated because the value of cpmCPUTotalTable.1.6.1 exceeded the rising-threshold value 10

Best I can tell, the "log" keyword does absolutely nothing.  You get logging automatically when you enable an RMON event, even if you only specify traps.  Could be a bug in my version of IOS.  Testing on 12.4(15)T14.

Now that we've mastered absolute, let's look at the trickier delta.

Delta is best used for SNMP values that only climb.  For instance, number of packets across an interface will only go up, barring clearing the counters.  The command syntax is identical, which adds to the difficulty of getting your head around this, because the variables actually act differently.

Let's go back to a similar example to my original:

rmon alarm 1 ip.3.0 15 delta rising-threshold 10 1 falling-threshold 5 2 owner config

In this case, we're measuring how much ip.3.0 changed between one poll and the next.  Let's replace the values with some pseudo-variables for ease of reference:

rmon alarm 1 ip.3.0 $SEC$ delta rising-threshold $BIG$ 1 falling-threshold $SMALL$ 2

During period $SEC$, if ip.3.0 increased < $SMALL$, trigger event 2
During period $SEC$, if ip.3.0 increased > $BIG$, trigger event 1
During period $SEC$, if ip.3.0 increased > $SMALL$ but < $BIG$, do nothing

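So let's walk through this.  ip.3.0 is ipInReceives (1.3.6.1.2.1.4.3.0), the count of all IP datagrams received from interfaces (including errored ones), so it's a counter that only climbs.  To keep the arithmetic simple, I'll use a 10-second interval and a falling-threshold of 2 for this walk-through:

rmon alarm 1 ip.3.0 10 delta rising-threshold 10 1 falling-threshold 2 2 owner config
$SMALL$ = 2
$BIG$ = 10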

For simplicity, let's assume we are turning the router on fresh, and RMON kicks in at the same time IP forwarding does.

On seconds 0-9, if no packets are received whatsoever, our value for ip.3.0 = 0, and event 2 is triggered due to "ip.3.0 increased < 2".

On seconds 10-19, if five packets are received, our value for ip.3.0 = 5, our previous value for ip.3.0 = 0.  5 - 0 = 5, so we increased 5 packets.  Per my rule above:
"During period $SEC$, if ip.3.0 increased > $SMALL$ but < $BIG$, do nothing"
We defined $SMALL$ (falling-threshold) as 2 and $BIG$ (rising-threshold) as 10, so:
"During period 10 seconds, if ip.3.0 increased > 2 but < 10, do nothing"
No action is triggered.

On seconds 20-29, if fifty packets are received, our value for ip.3.0 = 55, our previous value for ip.3.0 = 5.  55-5 = 50, so we increased 50 packets.  Per my rule above:
During period 10 seconds, if ip.3.0 increased > 10, trigger event 1
Event 1 is triggered

On seconds 30-39, if one hundred packets are received, our value for ip.3.0 = 155, our previous value for ip.3.0 = 55.  155-55 = 100, so we increased 100 packets.  Per my rule above:
During period 10 seconds, if ip.3.0 increased > 10, trigger event 1
CATCH: a rising event cannot be triggered twice in a row.  The delta has to drop to the falling-threshold (triggering event 2) before event 1 can fire again, so actually, no action is triggered.

On seconds 40-49, if one packet is received, our value for ip.3.0 = 156, our previous value for ip.3.0 = 155.  156-155 = 1, so we increased 1 packet.  Per my rule above:
During period 10 seconds, if ip.3.0 increased < $SMALL$, trigger event 2
Event 2 is triggered
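The absolute and delta rules, plus the "no repeats in a row" catch, fit in a short Python model. It's a simplified reading of RFC 2819's alarm behavior, not IOS's code, with three simplifications:

- Thresholds are inclusive there (>= rising, <= falling); the walk-through never lands exactly on one.
- The first sample may fire either event. That is RFC 2819's risingOrFallingAlarm startup setting, and I'm assuming IOS uses it.
- For delta, the first poll only sets a baseline.

The demo feeds it two sets of readings:

- Counter readings for the delta walk-through (0, 0, 5, 55, 155, 156). The last increase of 1 is below the falling threshold of 2.
- Some made-up CPU percentages for the absolute example.

```python
from typing import Optional

class RmonAlarmModel:
    """Simplified RFC 2819 alarm logic: thresholds are inclusive, and the same
    event can't fire twice in a row (rising and falling must alternate)."""

    def __init__(self, sample_type, rising, rising_event, falling, falling_event):
        self.sample_type = sample_type          # "absolute" or "delta"
        self.rising, self.rising_event = rising, rising_event
        self.falling, self.falling_event = falling, falling_event
        self.last = None                        # "rising", "falling" or None
        self.previous_raw = None

    def poll(self, raw: int) -> Optional[int]:
        """Feed one polled value; return the event number fired, if any."""
        if self.sample_type == "delta":
            if self.previous_raw is None:       # first poll only sets a baseline
                self.previous_raw = raw
                return None
            value, self.previous_raw = raw - self.previous_raw, raw
        else:
            value = raw
        if value >= self.rising and self.last != "rising":
            self.last = "rising"
            return self.rising_event
        if value <= self.falling and self.last != "falling":
            self.last = "falling"
            return self.falling_event
        return None

if __name__ == "__main__":
    delta = RmonAlarmModel("delta", 10, 1, 2, 2)
    print([delta.poll(v) for v in (0, 0, 5, 55, 155, 156)])  # [None, 2, None, 1, None, 2]
    cpu = RmonAlarmModel("absolute", 10, 1, 5, 2)
    print([cpu.poll(v) for v in (3, 12, 15, 4, 11)])         # [2, 1, None, 2, 1]
```

The only difference between the two modes is what gets compared to the thresholds: the raw value (absolute) or the change since the last poll (delta).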

I had a complex lab solution worked out for delta, but it's so tied to being able to follow the timer that I've decided it didn't work well visually and went with the above example instead.

I really hope you enjoyed!

Jeff
