Saturday, December 14, 2013

[mini] BGP Auto-Summary

I recently got a task on a practice lab that was obviously regarding BGP auto summary.  I'm well-practiced in BGP on production systems, but who the heck uses auto-summary any longer?  It then occurred to me that I'd never even turned it on.

My first attempt was to:

int lo5
  ip address 5.5.5.5 255.255.255.0

router bgp 100
  auto-summary
  network 5.5.5.0 mask 255.255.255.0

I peered it up with another router, and expected to see "5.0.0.0/8" in the BGP table of the other router.

No such luck; I ended up with 5.5.5.0/24.

After some googling, I found two methods to make this work:

int lo5
  ip address 5.5.5.5 255.255.255.0

router bgp 100
  auto-summary
  network 5.0.0.0

That will produce 5.0.0.0 in both the local BGP table and anyone it peers to.

You can also:

int lo5
  ip address 5.5.5.5 255.255.255.0

router bgp 100
  auto-summary
  redistribute connected

That will also get you 5.0.0.0 in both the local BGP table and anyone it peers to.

Also of interest, if you configure:

int lo5
  ip address 5.5.5.5 255.255.255.0

int lo6
  ip address 5.5.6.6 255.255.255.0

router bgp 100
  auto-summary
  network 5.5.0.0 mask 255.255.0.0

That will also produce 5.0.0.0/8.
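
To make the "classful boundary" idea concrete, here's a minimal Python sketch of the math auto-summary is doing when it rolls a subnet up to its classful network. This is only an illustration of the address arithmetic, not how IOS implements it, and the function name classful_network is my own.

```python
import ipaddress


def classful_network(address: str) -> ipaddress.IPv4Network:
    """Return the classful (A/B/C) network that contains an IPv4 address."""
    first_octet = int(ipaddress.IPv4Address(address)) >> 24
    if first_octet < 128:      # class A
        prefix_len = 8
    elif first_octet < 192:    # class B
        prefix_len = 16
    elif first_octet < 224:    # class C
        prefix_len = 24
    else:
        raise ValueError("class D/E addresses have no classful network")
    # strict=False lets us pass a host address and get the enclosing network back
    return ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)


# The loopbacks used in this post both roll up to the same class A network
print(classful_network("5.5.5.5"))
print(classful_network("5.5.6.6"))
```

Both loopbacks land in 5.0.0.0/8, which is the prefix the neighbor ends up seeing in the working examples.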

Not a complex topic, but it works differently than the way IGPs do, and I thought it was worth mentioning.

Happy studying!

Jeff

Friday, November 29, 2013

[mini] PPPoE in the DocCD

I ran across a PPPoE problem a couple days ago, and let me tell you, this is not my favorite topic.  I've only used it in production once, and I don't come across it in practice labs enough to keep it fresh in my mind. I've been skipping these questions when doing time-trial practice labs and just using traditional Ethernet whenever this was called for, and just taking a hit on the points.  Not a good plan, but I felt there were more important things to focus on.

One of the other reasons I haven't wanted to focus on it, knowing that I only see it once in a blue moon, is that the documentation is so spread out I could never figure out where all the various pieces are.  The lab questions always call for server and client installs, and they're on different pages, and spread out across those two pages.

I decided a good interim step on this problem is to nutshell exactly where the pieces are in the documentation.

First, you want the Broadband Access Aggregation and DSL Configuration Guide.  It's on the main "Configuration" page for 12.4T that you've been going to in the DocCD.  See below.



The next page has a lot of options on it. Fortunately we only need two of them, and they're right on top of each other:

- PPPoE "server" is on Providing Protocol Support for Broadband Access Aggregation of PPPoE Sessions.
- PPPoE client is on "PPP over Ethernet Client"



We'll start on the server side first.  "R1" will be our server router.  I'm not providing a diagram; there are just two devices involved, connected on Fa0/0.

You need three sections on the "Providing Protocol Support for Broadband Access Aggregation of PPPoE Sessions" page:

- "Configuring a Virtual Template Interface"
- "Defining a PPPoE Profile"
- "Assigning a PPPoE Profile to an Ethernet Interface"



I put them in the order I felt they should be done in, so let's start with "Configuring a Virtual Template Interface".  Frankly, if you don't know how to do this, it's worth memorizing.  It comes up in more places than just PPPoE (PPP over Frame Relay, namely).


Let's apply the necessary pieces as we walk through this:

R1:
R1(config)#interface virtual-template 1
R1(config-if)#ip address 192.168.1.1 255.255.255.0  ! you don't actually have to use IP unnumbered
R1(config-if)#mtu 1492   ! not really a requirement but a really good idea
R1(config-if)#peer default ip address dhcp-pool TEST-POOL

To be fair, the "peer default" bit for assigning IP addresses to clients isn't actually in the above documentation snippet, but it is elsewhere on the page if you search for it.  It's also not a requirement; you could assign IPs statically.
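
As for why the mtu line uses 1492: PPPoE adds a 6-byte PPPoE header plus a 2-byte PPP protocol ID inside the standard 1500-byte Ethernet payload. A quick Python sketch of that arithmetic (the constant and function names are mine, just for illustration):

```python
ETHERNET_PAYLOAD_MTU = 1500   # standard Ethernet payload size in bytes
PPPOE_HEADER_BYTES = 6        # version/type, code, session ID, length
PPP_PROTOCOL_ID_BYTES = 2     # PPP protocol field carried inside PPPoE


def pppoe_ip_mtu(ethernet_mtu: int = ETHERNET_PAYLOAD_MTU) -> int:
    """Largest IP packet that fits in one Ethernet frame once PPPoE is added."""
    return ethernet_mtu - PPPOE_HEADER_BYTES - PPP_PROTOCOL_ID_BYTES


print(pppoe_ip_mtu())
```

That's where the 1492 on the virtual-template comes from; anything larger would have to be fragmented or dropped on its way across the PPPoE session.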

Next step -



R1(config-if)#bba-group pppoe global
R1(config-bba-group)# virtual-template 1

Yep, that's all you really must have to get the bba-group working.  Now let's assign it to an interface.



R1(config)#interface fa0/0
R1(config-if)#pppoe enable
R1(config-if)#no shut

The pppoe enable command will expand to pppoe enable group global on its own, if you do a "show run".

We did reference a DHCP pool up above; we'll need to create that.

R1(config)#ip dhcp pool TEST-POOL
R1(dhcp-config)#network 192.168.1.0

That's all - now for the client side.  As we saw earlier (same image repeated from above), the client side is directly underneath the "server" side.



Once you're in there, there are once again many options; however, the two you need are pretty easy to spot.  Note carefully that we are on the "12.2(13)T 12.4T and Later Releases" section.  There's one just above this for pre-12.2(13)T.



Configuring the dialer interface first makes more sense, so we'll start there:



R2(config)#int dialer 1
R2(config-if)#mtu 1492
R2(config-if)#encapsulation ppp
R2(config-if)#ip address negotiated
R2(config-if)#dialer pool 1



R2(config-if)#int fa0/0
R2(config-if)#pppoe-client dial-pool-number 1
R2(config-if)#no shut

That's it - if you did it correctly, you should get output something like this on your client:

*Mar  1 00:28:51.103: %DIALER-6-BIND: Interface Vi1 bound to profile Di1
*Mar  1 00:28:51.191: %LINK-3-UPDOWN: Interface Virtual-Access1, changed state to up
*Mar  1 00:28:52.235: %LINEPROTO-5-UPDOWN: Line protocol on Interface Virtual-Access1, changed state to up

R2(config-if)#do sh ip int dialer1 | i Internet address
  Internet address is 192.168.1.3/32

R2(config-if)#do ping 192.168.1.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.1.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 16/23/36 ms

Cheers,

Jeff

Sunday, November 10, 2013

[mini] Embarrassing BGP as-override misunderstanding

It can be hard to post on the Internet about dramatically misunderstanding a technology. 

In my defense, I've never worked for an MPLS provider, so I've never used as-override outside of a lab - actually I'm not sure I've ever used it in a lab before tonight, either.

For those unfamiliar with the basic idea, as-override is used in MP-BGP/VRF/MPLS scenarios where the customer wants to re-use an AS number on several sites.  Since the CE routers see their sessions with the PE routers as eBGP, they see their own AS number in the path and reject the update from the PE.  as-override is the PE mechanism to overcome this problem.

Let's take a four-router scenario - two CE routers and two PE.

It might look something like this:

CE1 (AS 100) -> PE1 (AS 250) -> PE2 (AS 250) -> CE2 (AS 100)

Clearly, when PE2 advertises CE1's routes to CE2, CE2 should reject them.

Fixing this on the CE side is very easy; you can change the AS number or use allowas-in to allow the CE to ignore the fact that its own AS number is present while receiving BGP updates.

As a network consultant I regularly deal with MPLS site activations, and twice now I've had the carrier offer to use as-override to fix the problem above. I declined both times, once opting to change the AS number on the CE and once using allowas-in. I'd gotten the idea that, since the carrier technician was signed into the PE connected to my CE, that was the only place where the as-override would go.  Boy was I wrong.

I spent about 90 minutes this evening trying to get as-override working in the scenario described above.  CE1 would send AS 100 to PE1.  PE1 was configured with as-override facing CE1, and what I expected to have happen was PE1 strip out AS 100 on its way to PE2.  Incorrect! 

I'd repeatedly pull up PE2's BGP table:

PE2#sh ip bgp vpnv4 vrf CCIE | s 1.1.1.1
*>i1.1.1.1/32       192.168.23.2             0    100      0 100 I

BGP output doesn't paste the best into a non-monospaced document, but in short, it shows the prefix is still learned from AS 100 (the other "100" adjacent to that is the local preference).  I sat there scratching my head, wondering how CE2 was going to be able to learn this (quick answer - it can't).

It turns out as-override is not an ingress setting at all.  It's an egress setting.  All it does is tell the PE it's configured on that, when it's passing routes to a CE, it should do a find-and-replace of the CE's AS number with the local PE's AS number.

In other words, in our scenario:

CE1 (AS 100) -> PE1 (AS 250) -> PE2 (AS 250) -> CE2 (AS 100)

If I were to set as-override on PE1, that would enable CE1 to receive CE2's routes - not vice-versa.

CE1(config)#do sh ip bgp | i 2.2.2.2
*> 2.2.2.2/32       192.168.12.2                           0 250 250 I

We see that CE1 sees 2.2.2.2 (CE2's loopback) as going through AS 250 twice, instead of AS 250 followed by AS 100.
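
Here's a small Python sketch of that find-and-replace, using the AS numbers from this scenario. It's a simplified model of the behavior described above, not IOS's actual code, and the function names are my own.

```python
def advertise_to_ce(as_path, local_as, neighbor_as, as_override):
    """Model what a PE sends to an eBGP CE neighbor.

    as_path is the path as the PE holds it (leftmost AS is the most recent).
    """
    if as_override:
        # Egress find-and-replace: swap the CE's AS for the PE's own AS
        as_path = [local_as if asn == neighbor_as else asn for asn in as_path]
    # Normal eBGP behavior: prepend our own AS on the way out
    return [local_as] + as_path


def ce_accepts(as_path, own_as):
    """Standard eBGP loop prevention: reject paths containing our own AS."""
    return own_as not in as_path


# CE2's loopback arrives at PE1 with AS path [100]; PE1 (AS 250) faces CE1 (AS 100)
without_override = advertise_to_ce([100], local_as=250, neighbor_as=100, as_override=False)
with_override = advertise_to_ce([100], local_as=250, neighbor_as=100, as_override=True)

print(without_override, ce_accepts(without_override, own_as=100))
print(with_override, ce_accepts(with_override, own_as=100))
```

With the override on, the path CE1 receives is 250 250, which matches the show output above, and CE1 no longer finds its own AS in the path.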

Thought this might help others out there stuck on a similar misunderstanding.

Cheers,

Jeff

Thursday, November 7, 2013

[mini] Why does LDP "require" a /32 Loopback?

A few days ago I asked a coworker why LDP sessions had issues if they weren't peered on /32s.  He answered, it doesn't have to be a /32, but the IGP and LDP had to agree on the mask length.  So I asked the more specific question - why does it have to agree on the mask length? He didn't know.  And neither did I.

Everyone seems to know that /32s are best practice for the LDP router ID.  But it's hard to find a good, clear explanation of why this is.

Let's start with some obvious facts.

- "The router considers all the IP addresses of all operational interfaces.... If these addresses include loopback interface addresses, the router selects the largest loopback address." http://www.cisco.com/en/US/docs/ios/12_4t/12_4t2/ftldp41.html#wp1654686

As always, my posts are geared for the CCIE lab, and it's a fair bet most of your gear on the lab is going to have a loopback.  So, expect the router ID to be a loopback, unless it's specified otherwise.

- You can specify the interface with mpls ldp router-id <interface>.  If you don't want it to be a loopback, or you want a certain loopback to be chosen over another, then use this command. If you want to change the router-id while LDP is already up you have to use the force keyword, i.e. mpls ldp router-id lo7 force.  If you don't use force, and LDP was already online, the change won't take place until the router next has to pick a router ID, which in practice usually means a reload or the current router-ID interface going down.

- You can set the range of labels that LDP is allowed to use with mpls label range <lower> <upper>  I find this useful in debugging, because you can make your labels match your router number and it's easier to read the output.  LDP show commands are not always easy to interpret if you're not used to reading them.

-  "The LDP default behavior is to allocate local labels for all non-BGP prefixes."
http://www.cisco.com/en/US/docs/ios/12_4t/12_4t2/ftldp41.html#wp1654686

So what's that mean to us?  It might be better phrased as "The LDP default behavior is to allocate local labels choosing the best administrative distance as long as it's not from BGP".

- This problem is most commonly seen with OSPF (although you could see it from a summary route as well).  The sure-fire way to demonstrate it is to create a /24 loopback and not change the default network type.  OSPF automatically uses network type LOOPBACK, which is always advertised as a /32.

- With MPLS VPNs, BGP actually distributes the labels for the VRFs, not LDP.  You learn the stacked VRF tag, relevant only to the egress PE, from BGP.  You also learn the global routing table's next hop.  The next-hop is used to find out the LDP label.
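
The default router ID selection rule quoted at the top of that list can be sketched in a few lines of Python. This is a rough model only: the interface names and addresses are made up for illustration, and I'm detecting loopbacks by name, which is my own shortcut.

```python
import ipaddress


def select_ldp_router_id(interfaces):
    """Pick a default LDP router ID.

    interfaces: list of (name, ip_address, is_operational) tuples.
    Highest operational loopback address wins; if there is no operational
    loopback, fall back to the highest operational interface address.
    """
    operational = [
        (name, ipaddress.IPv4Address(ip))
        for name, ip, is_up in interfaces
        if is_up
    ]
    loopbacks = [ip for name, ip in operational if name.lower().startswith("lo")]
    candidates = loopbacks or [ip for _, ip in operational]
    return str(max(candidates)) if candidates else None


# Hypothetical interface list for a router like R2
r2_interfaces = [
    ("Loopback0", "22.22.22.22", True),
    ("FastEthernet0/0", "192.168.12.2", True),
    ("FastEthernet0/1", "192.168.23.2", True),
]
print(select_ldp_router_id(r2_interfaces))
```

Note the loopback wins even though the FastEthernet addresses are numerically higher, which is why you can usually expect the router ID to be a loopback on the lab.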

Let's take a look at how this plays out.

R3 is trying to reach R1 in VRF CCIE.  R3's IP address is 3.3.3.3 and R1's IP address is 1.1.1.1.  R2 is sitting in the middle of the two.

R3#ping vrf CCIE 1.1.1.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
As we can see, ping is failing.

R3#sh ip route vrf CCIE 1.1.1.1
Routing entry for 1.1.1.1/32
  Known via "bgp 100", distance 200, metric 0, type internal
  Last update from 11.11.11.11 00:26:04 ago
  Routing Descriptor Blocks:
  * 11.11.11.11 (Default-IP-Routing-Table), from 22.22.22.22, 00:26:04 ago
      Route metric is 0, traffic share count is 1
      AS Hops 0
We have a route to reach it.

R3#show ip cef vrf CCIE 1.1.1.1
1.1.1.1/32, version 3, epoch 0, cached adjacency 192.168.23.2
0 packets, 0 bytes
  tag information set
    local tag: VPN-route-head
    fast tag rewrite with Fa0/0, 192.168.23.2, tags imposed: {200 103}
  via 11.11.11.11, 0 dependencies, recursive
    next hop 192.168.23.2, FastEthernet0/0 via 11.11.11.11/32
    valid cached adjacency
    tag rewrite with Fa0/0, 192.168.23.2, tags imposed: {200 103}

I used the mpls label range command (mentioned above) so that each router's tags start with its own router number.  In this case, we should be using an MPLS "transit" tag of 200, and an MPLS "VRF" tag of 103.

R3#show mpls ldp bindings | b 11.11.11.11
  tib entry: 11.11.11.11/32, rev 6
        local binding:  tag: 300
        remote binding: tsr: 22.22.22.22:0, tag: 200
<output omitted>

We know that tag 200 references R1's primary routing table loopback IP (11.11.11.11).

R3#show mpls forwarding-table 11.11.11.11
Local  Outgoing    Prefix            Bytes tag  Outgoing   Next Hop
tag    tag or VC   or Tunnel Id      switched   interface
300    200         11.11.11.11/32      0          Fa0/0      192.168.23.2

We know that means sending traffic out Fa0/0 towards R2 (192.168.23.2) with tag 200.

Ok, so this router should be able to send traffic, right?

R2#debug mpls packet
MPLS packet debugging is on
R3#ping vrf CCIE 1.1.1.1 rep 2 timeout 1
Type escape sequence to abort.
Sending 2, 100-byte ICMP Echos to 1.1.1.1, timeout is 1 seconds:
..
Success rate is 0 percent (0/2)

R2#
*Mar  1 00:36:08.651: MPLS: Fa0/1: recvd: CoS=6, TTL=255, Label(s)=0
*Mar  1 00:36:09.067: MPLS: Fa0/1: recvd: CoS=6, TTL=255, Label(s)=0
R2 gets the MPLS packet just fine!  And that's all it does.  Notice my debug doesn't say anything about forwarding it on.

R2#show mpls ldp binding | b 11.11.11.11
  tib entry: 11.11.11.11/32, rev 10
        local binding:  tag: 200
        remote binding: tsr: 33.33.33.33:0, tag: 300
<output omitted>

We see R2 has locally bound tag 200 for 11.11.11.11, and has received a tag from R3 for 11.11.11.11, but ... no tag from R1?

Let's look at the routing tables.

R2#sh ip route 11.11.11.11
Routing entry for 11.11.11.11/32
  Known via "ospf 1", distance 110, metric 2, type intra area
  Last update from 192.168.12.1 on FastEthernet0/0, 00:00:02 ago
  Routing Descriptor Blocks:
  * 192.168.12.1, from 11.11.11.11, 00:00:02 ago, via FastEthernet0/0
      Route metric is 2, traffic share count is 1
R2 sees this as a /32.

R3#sh ip route 11.11.11.11
Routing entry for 11.11.11.11/32
  Known via "ospf 1", distance 110, metric 3, type intra area
  Last update from 192.168.23.2 on FastEthernet0/0, 00:39:16 ago
  Routing Descriptor Blocks:
  * 192.168.23.2, from 11.11.11.11, 00:39:16 ago, via FastEthernet0/0
      Route metric is 3, traffic share count is 1

R3 sees this as a /32.  Consequently, R3 has no problem sending the MPLS packet to R2.

R1#sh ip route 11.11.11.11
Routing entry for 11.11.11.0/24
  Known via "connected", distance 0, metric 0 (connected, via interface)
  Routing Descriptor Blocks:
  * directly connected, via Loopback0
      Route metric is 0, traffic share count is 1
And R1 sees it as a ... /24 connected route.  As mentioned above, OSPF is the common culprit here. It's advertising a /32 to everyone else, except the local router, which still sees it as a /24.  In fact...

R2#sh mpls ldp binding | b 11.11.11.0/24
  tib entry: 11.11.11.0/24, rev 11
        remote binding: tsr: 11.11.11.0:0, tag: exp-null
<output omitted>

R1 is advertising a /24 to R2.  MPLS bindings work a bit differently than the routing table. R2's LDP process isn't simply going to choose the best route to R1; it's matching labels to prefixes, and the prefixes are considered unique if they're not identical.  R2's route is 11.11.11.11/32, but the only binding it has from R1 is for 11.11.11.0/24, so R2 has no label from R1 for the /32 and just drops the packet.

The fix is to just make the two prefix lengths the same. They don't need to be /32s!  The easiest way to make this happen in this scenario is to change the OSPF network type away from LOOPBACK and stop forcing the /32 advertisement:

R1(config)#int lo0
R1(config-if)#ip ospf network point-to-point
R2#sh mpls ldp binding | b 11.11.11.0/24
  tib entry: 11.11.11.0/24, rev 16
        local binding:  tag: 203
        remote binding: tsr: 11.11.11.11:0, tag: exp-null
        remote binding: tsr: 33.33.33.33:0, tag: 305
<output omitted>

We can see R2 now has a binding from R1 and R3 that matches the same prefix length.

R3#ping vrf CCIE 1.1.1.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 60/66/76 ms

And forwarding works end-to-end.

In a nutshell: LDP associates labels with both the IP address and subnet mask.  The prefix length does have to match to become part of the same MPLS forwarding path.  However, the prefix length does not have to be /32 - it's just a good, safe practice.
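
If it helps, here's a toy Python model of why the exact prefix match matters. The dictionaries mirror R2's binding output from before and after the fix; the data structure and the function name are my own simplification, not how IOS stores its label information base.

```python
def outgoing_label(lib, route_prefix, next_hop_peer):
    """Look up the label the next-hop peer advertised for exactly this prefix.

    The lookup is an exact match on address AND mask. There is no
    longest-match fallback like the routing table would do.
    """
    entry = lib.get(route_prefix, {})
    return entry.get("remote", {}).get(next_hop_peer)


# R2 before the fix: its route is a /32, but R1 only advertised a /24 binding
lib_before = {
    "11.11.11.11/32": {"local": 200, "remote": {"R3": 300}},
    "11.11.11.0/24": {"remote": {"R1": "exp-null"}},
}
print(outgoing_label(lib_before, "11.11.11.11/32", "R1"))

# R2 after the fix: OSPF now carries the /24, and both peers bound that same prefix
lib_after = {
    "11.11.11.0/24": {"local": 203, "remote": {"R1": "exp-null", "R3": 305}},
}
print(outgoing_label(lib_after, "11.11.11.0/24", "R1"))
```

Before the fix the lookup comes back empty, so R2 has nothing to swap its local tag 200 for. After the fix it finds R1's binding and the path is complete.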

Sunday, October 27, 2013

[mini] Static RP Address Blocks auto-RP Dense Flows

My first 40 posts were written while I was attempting to improve my understanding of a number of topics.  At this point in my studying, I've moved on to practicing interoperability of features, so I haven't written any new posts in some time.  My first posts were between five and twenty page topic deep-dives.  Now that I've moved on to review & practice, I'm planning on starting a new series of posts, which I will label with [mini] in front of the subject.  These will cover any small problems that really got me stuck while doing practice labs. Same quality as my old posts, but much smaller scope.

Today, I got stuck on a multicast problem.


I have EIGRP running on every interface, and pim sparse-dense mode on every interface.
Every IP address has reachability to every other IP address.  The last octet IP on every segment is the router number.  Every router has a loopback of Y.Y.Y.Y where Y is the router number.

I was working a lab for auto-RP. In the simpler equivalent scenario above, R1 was the mapping agent and R2 was the RP candidate.  R3 would then join 239.0.0.1, and R1 would send a ping towards 239.0.0.1 and expect a reply.

The setup was as follows (remember, PIM sparse-dense and routing are already set up):

R1:
ip pim send-rp-discovery Loopback0 scope 10 interval 2

R2:
ip pim send-rp-announce Loopback0 scope 10 interval 2

R3:
interface FastEthernet0/0
 ip igmp join-group 239.0.0.1

And be damned if I could get the join on R3 to work.  I discovered pretty quickly that R3 wasn't learning the dynamic RP address:

R3#sh ip pim rp mapping
PIM Group-to-RP Mappings

R3#

"Well there's your problem!"

After a lot of digging, I finally noticed some odd output on R2:

R2#sh ip mroute 224.0.1.40 | b 224
(*, 224.0.1.40), 00:15:00/stopped, RP 2.2.2.2, flags: SJCL
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    FastEthernet0/0, Forward/Sparse-Dense, 00:15:00/00:01:58

(1.1.1.1, 224.0.1.40), 00:14:47/00:02:57, flags: PLJTX
  Incoming interface: FastEthernet0/0, RPF nbr 192.168.12.1
  Outgoing interface list: Null

It's pretty evident that 224.0.1.40 (the mapping agent group) isn't going to reach R3, as the OIL lists "Null", and R3 isn't going to learn the RP address, and therefore isn't going to be able to join the group.  Let's look closer at that output:

R2#sh ip mroute 224.0.1.40 | i 224
(*, 224.0.1.40), 00:21:47/stopped, RP 2.2.2.2, flags: SJCL
(1.1.1.1, 224.0.1.40), 00:21:34/00:02:59, flags: PLJTX

What's up with those flags?

S=Sparse, P=Pruned ... wait a minute!  224.0.1.40 is supposed to be dense mode forwarded.
Just to verify that, look at R1:

R1#sh ip mroute 224.0.1.40 | i 224
(*, 224.0.1.40), 00:36:03/stopped, RP 0.0.0.0, flags: DCL
(1.1.1.1, 224.0.1.40), 00:29:21/00:02:58, flags: LT

D=Dense

What the heck is R2 up to?
Turns out I didn't remove some debugging config I'd put in earlier, which as a whole is really nothing new on these types of tasks, but this one struck me as odd:

R2:
ip pim rp-address 2.2.2.2

In fact, let's take it out and see what happens:

R2(config)#no ip pim rp-address 2.2.2.2
R2(config)#exit
R2#sh ip mroute 224.0.1.40 | b 224
(*, 224.0.1.40), 00:00:18/stopped, RP 0.0.0.0, flags: DCL
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    FastEthernet0/1, Forward/Sparse-Dense, 00:00:18/00:00:00
    FastEthernet0/0, Forward/Sparse-Dense, 00:00:18/00:00:00

(1.1.1.1, 224.0.1.40), 00:00:16/00:02:58, flags: LT
  Incoming interface: FastEthernet0/0, RPF nbr 192.168.12.1
  Outgoing interface list:
    FastEthernet0/1, Forward/Sparse-Dense, 00:00:16/00:00:00

I can't imagine why this behavior is there, but if you have ip pim rp-address Y.Y.Y.Y configured, the RP will automatically assume auto-RP groups originated by other routers are sparse mode instead of dense, which effectively breaks auto-RP.  That makes no sense to me, and it took me almost two hours to go pull this line of config out.  I also can't find any documentation on why this behavior happens.

In a nutshell: Configuring a static RP address on an auto-RP device will stop the device in question from sending auto-RP dense groups to downstream neighbors.
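
One possible way to reason about the output above, and I want to stress this is my assumption rather than something I found documented: in sparse-dense mode a group runs sparse if the router knows an RP for it and dense otherwise, and a static RP with no group list covers all of 224.0.0.0/4, which includes 224.0.1.40. A Python sketch of that decision (function name and data layout are mine):

```python
import ipaddress


def group_mode(group, rp_mappings):
    """Decide how a sparse-dense interface treats a group.

    rp_mappings: list of (group_range, rp_address) the router knows about.
    If any mapping covers the group it is treated as sparse, else dense.
    """
    address = ipaddress.IPv4Address(group)
    for group_range, rp in rp_mappings:
        if address in ipaddress.ip_network(group_range):
            return ("sparse", rp)
    return ("dense", None)


# ASSUMPTION: "ip pim rp-address 2.2.2.2" with no ACL maps every multicast group
static_rp = [("224.0.0.0/4", "2.2.2.2")]

print(group_mode("224.0.1.40", static_rp))  # with the leftover static RP
print(group_mode("224.0.1.40", []))         # after removing it
```

That at least lines up with the flags: "S" with RP 2.2.2.2 while the static RP was configured, "D" with RP 0.0.0.0 once it was gone.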

Cheers,

Jeff Kronlage

Sunday, July 14, 2013

PPP Authentication

Just a short article today.  How many of us use PPP authentication regularly at our day jobs?  Unless you're at an ISP, not likely very often, and even then you probably use a cut & paste template.

Anyway, I always forget how this thing works, and the syntax is hard to understand.

No drawing this time, only two routers involved, R1 and R2, with a single serial interface between them. R1 will be assigned 172.16.0.1 and R2 will be assigned 172.16.0.2.

R1:
interface Serial0/0
 ip address 172.16.0.1 255.255.255.0
 encapsulation ppp
 clock rate 2000000

R2:
interface Serial0/0
 ip address 172.16.0.2 255.255.255.0
 encapsulation ppp

R1#ping 172.16.0.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.0.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/11/28 ms

We have connectivity, let's enable PAP.  PAP sends the password in cleartext across the wire.

R1(config-if)#ppp authentication pap

...and we'll see the other side of the link go down.  Of note, you frequently need to bounce (shut/no shut) the link in order to get the authentication changes to kick in. This time it kicked in immediately for me:

R2(config-if)#
*Mar  1 00:03:41.323: %LINEPROTO-5-UPDOWN: Line protocol on Interface Serial0/0, changed state to down

So what's "ppp authentication pap" do?

ppp authentication pap says "I require my neighbor to authenticate PAP to me".  So now we need some way for R2 to authenticate PAP back to R1.

R2(config-if)#ppp pap sent-username PAPUSER password PAPPASSWORD

So now R2 has a way to authenticate back to R1.  But how does R1 check this password?

R1(config-if)#username PAPUSER password PAPPASSWORD
R1(config)#
*Mar  1 00:19:34.535: %LINEPROTO-5-UPDOWN: Line protocol on Interface Serial0/0, changed state to up

To recap, R1 asks R2 to authenticate, R2 sends the username and password configured on its interface, and then R1 confirms them against its local user database.  This is what the config looks like now:

R1:
interface Serial0/0
 ip address 172.16.0.1 255.255.255.0
 encapsulation ppp
 clock rate 2000000
 ppp authentication pap

username PAPUSER password PAPPASSWORD

R2:
interface Serial0/0
 ip address 172.16.0.2 255.255.255.0
 encapsulation ppp
 ppp pap sent-username PAPUSER password 0 PAPPASSWORD

If we wanted R2 to additionally authenticate R1, we'd add:

R2(config)#username R1USER password R1PASS
R2(config-if)#ppp authentication pap
R2(config-if)#
*Mar  1 00:26:11.471: %LINEPROTO-5-UPDOWN: Line protocol on Interface Serial0/0, changed state to down
R1(config-if)#ppp pap sent-username R1USER password R1PASS
R1(config-if)#
*Mar  1 00:28:04.067: %LINEPROTO-5-UPDOWN: Line protocol on Interface Serial0/0, changed state to up

In short, if you put ppp authentication pap on both sides of the link, you end up with both devices authenticating the other.
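
The direction of each command is the part I always forget, so here's a tiny Python model of it. The dictionaries stand in for each router's local username database; this is only a sketch of the logic, and the function name is my own.

```python
def pap_authenticate(local_users, sent_username, sent_password):
    """What the side running 'ppp authentication pap' does with a PAP request.

    The peer's username and password arrive in cleartext and are compared
    against the authenticator's local user database.
    """
    return sent_username in local_users and local_users[sent_username] == sent_password


# 'username ...' lines configured on each router
r1_users = {"PAPUSER": "PAPPASSWORD"}
r2_users = {"R1USER": "R1PASS"}

# R2's 'ppp pap sent-username' is checked by R1, and vice versa
print(pap_authenticate(r1_users, "PAPUSER", "PAPPASSWORD"))
print(pap_authenticate(r2_users, "R1USER", "R1PASS"))
```

Each side's sent-username has to exist in the other side's database, and the two directions are completely independent of each other.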

CHAP, the challenge-response method, uses similar rules. CHAP never actually puts the password on the wire, instead favoring a hash.

R1(config-if)#ppp authentication chap
R1(config-if)#shut
R1(config-if)#no shut
*Mar  1 01:23:10.919: %LINK-5-CHANGED: Interface Serial0/0, changed state to administratively down
*Mar  1 01:23:11.919: %LINEPROTO-5-UPDOWN: Line protocol on Interface Serial0/0, changed state to down
R2(config)#
*Mar  1 00:28:00.771: %LINEPROTO-5-UPDOWN: Line protocol on Interface Serial0/0, changed state to up
R2(config)#
*Mar  1 01:23:18.851: %LINEPROTO-5-UPDOWN: Line protocol on Interface Serial0/0, changed state to down

We can see the interface is now up/down:

R1(config-if)#do show int s0/0
Serial0/0 is up, line protocol is down
<output omitted>

To make the debug easier to read, I'm going to remove the PAP authentication requirement on R2.  There's no reason R2 can't authenticate R1 with PAP and R1 authenticate R2 with CHAP, other than it makes for complex debugs.

R2(config-if)#no ppp authentication pap
R2(config-if)#no ppp pap sent-username PAPUSER password 0 PAPPASSWORD
R2(config-if)#do debug ppp authentication
*Mar  1 01:56:10.271: Se0/0 PPP: Authorization required
*Mar  1 01:56:12.311: Se0/0 PPP: No authorization without authentication
*Mar  1 01:56:12.315: Se0/0 CHAP: I CHALLENGE id 164 len 23 from "R1"
*Mar  1 01:56:12.323: Se0/0 CHAP: Unable to authenticate for peer

What this tells us is that R1 is requesting CHAP and identifying itself as "R1" in the challenge. It also tells R2 that it should have a username entry for R1, take that entry's password, and send it back to R1 in hashed form.

R2(config)#username R1 password chappass
R2(config)#
*Mar  1 02:16:28.259: Se0/0 PPP: Authorization required
*Mar  1 02:16:28.303: Se0/0 PPP: No authorization without authentication
*Mar  1 02:16:28.307: Se0/0 CHAP: I CHALLENGE id 73 len 23 from "R1"
*Mar  1 02:16:28.315: Se0/0 CHAP: Using hostname from unknown source
*Mar  1 02:16:28.319: Se0/0 CHAP: Using password from AAA
*Mar  1 02:16:28.319: Se0/0 CHAP: O RESPONSE id 73 len 23 from "R2"
*Mar  1 02:16:28.343: Se0/0 CHAP: I FAILURE id 73 len 25 msg is "Authentication failed"

So we sent our hash to R1, but R1 told us we didn't match up.  That's because R1 needs a password for R2:

R1(config)#username R2 password chappass
R2(config)#
*Mar  1 02:21:16.047: Se0/0 PPP: Authorization required
*Mar  1 02:21:16.107: Se0/0 PPP: No authorization without authentication
*Mar  1 02:21:16.107: Se0/0 CHAP: I CHALLENGE id 165 len 23 from "R1"
*Mar  1 02:21:16.119: Se0/0 CHAP: Using hostname from unknown source
*Mar  1 02:21:16.123: Se0/0 CHAP: Using password from AAA
*Mar  1 02:21:16.123: Se0/0 CHAP: O RESPONSE id 165 len 23 from "R2"
*Mar  1 02:21:16.151: Se0/0 CHAP: I SUCCESS id 165 len 4
R2(config)#
*Mar  1 02:21:17.059: %LINEPROTO-5-UPDOWN: Line protocol on Interface Serial0/0, changed state to up

and we're up!
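
For the curious, the CHAP exchange above can be modeled in Python. The hash is MD5 over the identifier, the shared secret, and the challenge, per RFC 1994. The dictionaries stand in for each router's username database, and the function names are mine; this is a sketch of the logic, not of IOS internals.

```python
import hashlib
import os


def chap_response(identifier: int, secret: str, challenge: bytes) -> bytes:
    """RFC 1994 response value: MD5(identifier + secret + challenge)."""
    return hashlib.md5(bytes([identifier]) + secret.encode() + challenge).digest()


def authenticator_verifies(identifier, challenge, response, peer_name, local_users):
    """The challenger looks up the peer's name and recomputes the hash."""
    secret = local_users.get(peer_name)
    if secret is None:
        return False
    return response == chap_response(identifier, secret, challenge)


def chap_response_packet_len(name: str) -> int:
    """Code(1) + id(1) + length(2) + value-size(1) + MD5 value(16) + name."""
    return 4 + 1 + 16 + len(name)


# R1 challenges as "R1"; R2 finds 'username R1 password chappass' and answers as "R2"
challenge = os.urandom(16)
r2_users = {"R1": "chappass"}
response = chap_response(165, r2_users["R1"], challenge)

# Before 'username R2 password chappass' exists on R1 the check fails
print(authenticator_verifies(165, challenge, response, "R2", {}))
# After it is added, R1 computes the same hash and we get SUCCESS
print(authenticator_verifies(165, challenge, response, "R2", {"R2": "chappass"}))
print(chap_response_packet_len("R2"))
```

The last line is a nice sanity check against the debug: a response from "R2" works out to 23 bytes, which matches the "O RESPONSE ... len 23" lines.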

You can change what hostname you send to your neighbor.

R1(config-if)#ppp chap hostname FOO
R1(config-if)#shut
*Mar  1 02:42:08.943: %LINK-5-CHANGED: Interface Serial0/0, changed state to administratively down
R1(config-if)#no shut

A sample from the debug on R2:
*Mar  1 02:42:35.407: Se0/0 CHAP: I CHALLENGE id 173 len 24 from "FOO"
*Mar  1 02:42:35.411: Se0/0 CHAP: Unable to authenticate for peer
and the fix:

R2(config)#username FOO password chappass
*Mar  1 02:42:39.459: Se0/0 CHAP: I CHALLENGE id 174 len 24 from "FOO"
*Mar  1 02:42:39.467: Se0/0 CHAP: Using hostname from unknown source
*Mar  1 02:42:39.471: Se0/0 CHAP: Using password from AAA
*Mar  1 02:42:39.471: Se0/0 CHAP: O RESPONSE id 174 len 23 from "R2"
*Mar  1 02:42:39.527: Se0/0 CHAP: I SUCCESS id 174 len 4
R2(config)#
*Mar  1 02:42:40.527: %LINEPROTO-5-UPDOWN: Line protocol on Interface Serial0/0, changed state to up

To recap, R1 sends its hostname to R2 as "FOO" instead of "R1".  R2 looks up FOO in its local database and sends back "R2" along with a hash built from "chappass".  R1 already has a username for R2/chappass, and authentication succeeds.

It's possible to ask for both PAP and CHAP, in the order we prefer them:

R1(config-if)#ppp authen pap chap

I'm going to change debugs on R2, since the output from debug ppp authentication is lacking some details I want.

R2(config-if)#do u all
All possible debugging has been turned off
R2(config-if)#do debug ppp negotiation

debug ppp negotiation is great, but it produces a lot of output, so be careful enabling it.

R2(config-if)#shut
*Mar  1 03:16:01.343: %LINK-5-CHANGED: Interface Serial0/0, changed state to administratively down
R2(config-if)#no shut
R2(config-if)#
*Mar  1 03:16:04.171: %LINK-3-UPDOWN: Interface Serial0/0, changed state to up
*Mar  1 03:16:04.175: Se0/0 PPP: Using default call direction
*Mar  1 03:16:04.175: Se0/0 PPP: Treating connection as a dedicated line
*Mar  1 03:16:04.175: Se0/0 PPP: Session handle[2000047F] Session id[286]
*Mar  1 03:16:04.179: Se0/0 PPP: Phase is ESTABLISHING, Active Open
*Mar  1 03:16:04.179: Se0/0 LCP: O CONFREQ [Closed] id 185 len 10
*Mar  1 03:16:04.179: Se0/0 LCP:    MagicNumber 0x0245ACCB (0x05060245ACCB)
*Mar  1 03:16:04.235: Se0/0 LCP: I CONFREQ [REQsent] id 227 len 14
*Mar  1 03:16:04.235: Se0/0 LCP:    AuthProto PAP (0x0304C023)
*Mar  1 03:16:04.239: Se0/0 LCP:    MagicNumber 0x0145B453 (0x05060145B453)
*Mar  1 03:16:04.239: Se0/0 LCP: O CONFNAK [REQsent] id 227 len 9
*Mar  1 03:16:04.239: Se0/0 LCP:    AuthProto CHAP (0x0305C22305)


We see R1 request PAP first; R2 declines (the CONFNAK) and suggests CHAP instead.  CHAP eventually succeeds:

*Mar  1 03:16:04.387: Se0/0 IPCP: State is Open
*Mar  1 03:16:04.395: Se0/0 IPCP: Install route to 172.16.0.1
*Mar  1 03:16:05.319: %LINEPROTO-5-UPDOWN: Line protocol on Interface Serial0/0, changed state to up
R2(config-if)#

This leads into another interesting point. Other blogs mention using ppp pap refuse if you don't want to authenticate with PAP. There's some call for using this command with AAA-backed PAP, but you just don't need it for this simple config.

For example,
R2(config-if)#ppp pap refuse
R2(config-if)#ppp pap sent-username PAPUSER password PAPPASS

Will refuse PAP.

However, so will:
R2(config-if)#no ppp pap refuse
R2(config-if)#no ppp pap sent-username PAPUSER password PAPPASS

Simply not providing PAP authentication details will work.

On the other hand, ppp chap refuse does make some sense.  Let's say your neighbor (R1), prefers CHAP but also accepts PAP:

R1(config-if)#ppp authen chap pap

You don't want your user database being checked for matches against CHAP; you want to authenticate with PAP instead.  This is the way to accomplish that:

R2(config-if)#ppp chap refuse
R2(config-if)#ppp pap sent-username R2 password 0 chappass
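
Both the "pap chap" example earlier and this refuse example boil down to the same outcome, which can be sketched in Python. Real LCP gets there with a CONFREQ/CONFNAK exchange like the debug showed; this sketch only models the end result, and the function name is my own.

```python
def negotiate_auth(authenticator_prefs, peer_willing):
    """Return the first protocol, in the authenticator's order, the peer will do.

    authenticator_prefs: ordered list from 'ppp authentication ...'
    peer_willing: set of protocols the peer has credentials for and hasn't refused
    """
    for protocol in authenticator_prefs:
        if protocol in peer_willing:
            return protocol
    return None  # nothing in common, so the link never comes up


# R1: 'ppp authen pap chap', R2 has no PAP sent-username configured
print(negotiate_auth(["pap", "chap"], {"chap"}))

# R1: 'ppp authen chap pap', R2: 'ppp chap refuse' plus a PAP sent-username
print(negotiate_auth(["chap", "pap"], {"pap"}))
```

The first case lands on CHAP and the second on PAP, even though in both cases it was R1's second choice.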

Hopefully that clears up this easy to implement, but easily misunderstood, topic.

Cheers,

Jeff Kronlage

Sunday, June 30, 2013

Netflow

This post will be geared towards CCIE lab topics.  I will use Solarwinds' freebie Netflow analyzer in some examples, but the topics, in general, will be geared towards exporting data, not towards collecting it.

Let's kick off with a discussion of versions.  Anyone who's used Netflow before knows version 5 is the one typically used, with some newer implementations using version 9.  So what's the story on all the "lost versions"?

v1 - First implementation, still supported on 12.4T, restricted to IPv4 without subnet masks.
v2-v4 - Internal Cisco versions, never released
v5 - Most commonly used version, IPv4-only, supports subnet masks
v6 - I couldn't find any information at all
v7 - Extension of v5, reportedly used on some Catalyst switches
v8 - First version to support aggregation.  v8's improvements made it into v9
v9 - Latest Cisco standard, supports IPv6, aggregation, and Flexible Netflow (FNF).
"v10" - aka IPFIX, the open standard for Netflow, which will presumably replace it eventually.  It's called "v10" because the version field in the IPFIX packet header is 10; it is basically an open standard implementation of v9.

We will be focusing primarily on v5 and v9, and touching a little bit on v8.  There's no good argument for using v1, and IOS 12.4(15)T only supports v1, v5, v8 (limited) and v9.  IPFIX/v10 isn't available in 12.4(15)T.  Fortunately (or perhaps unfortunately, for those reading this for non-academic reasons), the Catalyst 3560 that is on the lab exam doesn't support Netflow, so we're not going to touch on Catalyst Netflow here.  Of note, more modern 3560s, such as the 3560-X, do support Netflow.

If you want to know more about the various Netflow versions, here is a fantastic explanation:
http://www.youtube.com/watch?v=rcDQi7M1uo4

At a high level, here is how Netflow works:
- "Flows" are identified by the router (the exporter), not by the collector.  Prior to v9, packets are considered part of the same flow when they share the same source IP, source port, destination IP, destination port, protocol (TCP, UDP, ICMP, etc), ToS byte, and input interface.  If they all match, they're considered the same flow.
- Flows are collected in the Netflow cache on the router.
- A flow is expired from the cache when it has been active longer than a maximum time, when it explicitly terminates (FIN/RST flag), or when no packets have been received for it for a length of time.  Expired flows are bundled together with other expired flows and sent to the Netflow collector.  The default timeouts are 30 minutes for active flows, and 15 seconds for inactive flows.
- The Netflow collector collects, and then presents, the data in whatever format you choose.
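
To make the cache behavior above concrete, here is a rough Python sketch of a flow cache keyed on the classic seven-field key (the fields listed above, with the ToS byte included) and expired with the default 30 minute / 15 second timers. The class and method names are invented for illustration, and FIN/RST handling is left out to keep it short. This is a model of the idea, not how IOS implements it.

```python
from collections import namedtuple

# Packets that agree on every one of these fields belong to the same flow.
FlowKey = namedtuple(
    "FlowKey", "src_ip src_port dst_ip dst_port protocol tos input_if")

ACTIVE_TIMEOUT = 30 * 60   # seconds; default for long-lived flows
INACTIVE_TIMEOUT = 15      # seconds; default for idle flows


class FlowCache:
    def __init__(self):
        self.flows = {}  # FlowKey -> counters and timestamps

    def add_packet(self, key, size, now):
        """Account one packet of `size` bytes seen at time `now` (seconds)."""
        entry = self.flows.setdefault(
            key, {"packets": 0, "bytes": 0, "first": now, "last": now})
        entry["packets"] += 1
        entry["bytes"] += size
        entry["last"] = now

    def expire(self, now):
        """Remove and return the flows that are due for export at `now`."""
        exported = []
        for key, entry in list(self.flows.items()):
            too_long = now - entry["first"] >= ACTIVE_TIMEOUT
            idle = now - entry["last"] >= INACTIVE_TIMEOUT
            if too_long or idle:
                exported.append((key, self.flows.pop(key)))
        return exported


cache = FlowCache()
key = FlowKey("7.7.7.7", 12570, "8.8.8.8", 19, 6, 0, "Fa0/1")
cache.add_packet(key, 100, now=0)
cache.add_packet(key, 200, now=5)
print(cache.expire(now=10))       # [] because the flow was seen 5 seconds ago
print(len(cache.expire(now=20)))  # 1 because the flow has been idle 15 seconds
```

Nothing leaves the cache until one of the timers fires, which is why a long-running session can sit on the router for a while before the collector ever hears about it.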

On a side note, I mentioned above that the protocol is determined on a high-level by protocol number: TCP, UDP, ICMP, etc.  In newer versions of IOS (15.0+), NBAR can be integrated into Netflow for more granular protocol results.  As that is presently outside the scope of the CCIE lab, I will not be discussing it here.

Let's look at some basic Netflow v5 usage.  Here is our lab topology:


R7 (Lo0 7.7.7.7) and R8 (Lo0 8.8.8.8) will be communicating with each other, with R1 running Netflow, and exporting to the Windows XP VM running Solarwinds Free Real-Time Netflow Analyzer:
http://www.solarwinds.com/products/freetools/netflow-analyzer.aspx

We'll enable TCP small servers so that we can utilize chargen to create TCP flows.
R7(config)#service tcp-small-servers
R8(config)#service tcp-small-servers

R7#telnet 8.8.8.8 19 /source-interface lo0
Trying 8.8.8.8, 19 ... Open
 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefg
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefgh
<ctrl-c>

R8#telnet 7.7.7.7 19 /source-int lo0
Trying 7.7.7.7, 19 ... Open
 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefg
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefgh
<ctrl-c>

Even after terminating the output with ctrl-c, the session is still running in the background:

R7#show tcp brief
TCB       Local Address               Foreign Address             (state)
6603F1F0  7.7.7.7.12570               8.8.8.8.19                  ESTAB
6603E708  7.7.7.7.19                  8.8.8.8.51405               ESTAB

Now let's set up Netflow on R1:
R1(config)#int fa0/1
R1(config-if)#ip flow ingress
R1(config-if)#int fa1/0
R1(config-if)#ip flow ingress
R1(config-if)#int fa2/0
R1(config-if)#ip flow ingress
R1(config-if)#exit
R1(config)#ip flow-export version 5
R1(config)#ip flow-export destination 172.16.0.100 2055

It's unlikely you'd be able to start collection in Solarwinds at this point.  The freebie Solarwinds will only let you start collection if it's receiving Netflow packets, and it's unlikely any flows have been sent yet, as the timeout for ongoing (active) flows is still at its default of 30 minutes.  Let's turn it down to 1 minute:

R1(config)#ip flow-cache timeout active 1

We'll go ahead and turn down the inactive flow timer as well:

R1(config)#ip flow-cache timeout inactive 10

I've fired up the collector on 172.16.0.100:



As you can see, we've started capturing some traffic.

You probably noticed my use of ip flow ingress on all interfaces that are passing traffic.  This is a new command with Netflow v9.  The old command was ip route-cache flow.  It's still supported, and it's almost functionally identical to ip flow ingress.  The one small difference is with sub-interfaces.  If you apply ip flow ingress to a main interface, you're only going to get the native VLAN traffic reported.  If you want the entire interface, you apply ip route-cache flow to the main interface, and it basically acts as a macro, applying ip flow ingress to every sub-interface (even ones created in the future) for you.

One of the most baffling things for me was the ip flow egress command.  There are some very important things to know about its usage.  First of all, do not use it unless you are using Netflow v9.  Netflow v5 doesn't have a concept of ingress and egress.  There's no field in the v5 packet for direction.

So how do you collect egress traffic information on v1 or v5?  This is simple.  ip flow ingress is applied to every interface and the collector reverses the information behind the scenes.  Logically, if the collector can see all the ingress flows, it would know about all the egress flows, too (what comes in must go out!). We'll talk more about ip flow egress when we get to Netflow v9.
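
Here is a small Python sketch of what "reversing the information" could look like on the collector side. It assumes each ingress record carries both the input and the output interface, and the records and byte counts below are made up for illustration. Real collectors will do this their own way.

```python
from collections import defaultdict

# Ingress records as a collector might store them: where the flow entered,
# where the router forwarded it, and how many bytes it carried.
ingress_records = [
    {"input_if": "Fa0/1", "output_if": "Fa2/0", "bytes": 44000},
    {"input_if": "Fa1/0", "output_if": "Fa2/0", "bytes": 4400},
    {"input_if": "Fa2/0", "output_if": "Fa0/1", "bytes": 1200},
]


def egress_bytes(records):
    """Sum bytes per output interface to build an egress view."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec["output_if"]] += rec["bytes"]
    return dict(totals)


print(egress_bytes(ingress_records))
# {'Fa2/0': 48400, 'Fa0/1': 1200}
```

The catch is visible in the data: the Fa2/0 egress total is only right if ingress is being collected on every interface that can feed Fa2/0.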

Random Sampled Netflow
As you might imagine, a busy Netflow exporter could not only create a lot of extra CPU and memory usage for the router, but it could create too much traffic on the wire or even swamp the collector.  Sampled Netflow was created to fix this problem.  Sampled Netflow samples exactly 1 out of every X packets.  The problem with this mechanism is that it may continuously miss traffic that happens in between 1 and X.  Say you are looking at 1 in every 100 packets, and you continuously have a burst that lands on the 50th packet of each hundred.  Sampled Netflow will never see this burst.  Enter random sampled Netflow, which still grabs 1 in every X packets on average, but introduces a random element so that it's not precisely every 100th packet.  Sampled Netflow isn't supported on any equipment on the CCIE lab, but random sampled Netflow is.
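
The difference between the two sampling modes is easy to see in a toy simulation. This Python sketch assumes a simplified model (one packet picked per block of 100, burst always at the same offset); it is not how IOS picks packets internally, and all names are mine.

```python
import random

WINDOW = 100        # sample 1 packet out of every 100
BURST_OFFSET = 50   # the burst always lands at this offset within a window


def is_burst(packet_index):
    return packet_index % WINDOW == BURST_OFFSET


def deterministic_sample(total_packets):
    """Always pick the first packet of each window."""
    return list(range(0, total_packets, WINDOW))


def random_sample(total_packets, rng):
    """Pick one packet at a random position within each window."""
    return [start + rng.randrange(WINDOW)
            for start in range(0, total_packets, WINDOW)]


total = WINDOW * 10000
rng = random.Random(1)  # fixed seed so the run is repeatable
fixed_hits = sum(is_burst(i) for i in deterministic_sample(total))
random_hits = sum(is_burst(i) for i in random_sample(total, rng))
print(fixed_hits)   # 0: the fixed sampler never lands on the burst
print(random_hits)  # small but non-zero: about 1 window in 100 catches it
```

The fixed sampler is blind to anything that stays in step with it, while the random sampler catches the burst at roughly the rate you'd expect from a 1 in 100 sample.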

Implementation is reasonably simple:

R1(config)#flow-sampler-map NETFLOW-TEST
R1(config-sampler)#mode random one-out-of 10
R1(config-sampler)#exit
R1(config)#int fa0/1
R1(config-if)#no ip flow ingress
R1(config-if)#flow-sampler NETFLOW-TEST
R1(config-if)#int fa1/0
R1(config-if)#no ip flow ingress
R1(config-if)#flow-sampler NETFLOW-TEST
R1(config-if)#int fa2/0
R1(config-if)#no ip flow ingress
R1(config-if)#flow-sampler NETFLOW-TEST

Note I've turned off ip flow ingress on all interfaces first.  ip flow ingress trumps random sampled Netflow. 



We see we're still getting output to the collector.

We can add input filters to random sampled Netflow.  This just tells the router to only sample flows that match the access list.

R1(config)#flow-sampler-map FILTERED_NETFLOW
R1(config-sampler)# mode random one-out-of 1
R1(config-sampler)#
R1(config-sampler)#ip access-list extended traffic_acl
R1(config-ext-nacl)# permit ip host 7.7.7.7 host 8.8.8.8
R1(config-ext-nacl)#
R1(config-ext-nacl)#class-map match-all traffic_cm
R1(config-cmap)# match access-group name traffic_acl
R1(config-cmap)#
R1(config-cmap)#policy-map netflow
R1(config-pmap)# class traffic_cm
R1(config-pmap-c)#   netflow-sampler FILTERED_NETFLOW
R1(config-pmap-c)#
R1(config-pmap-c)#int fa0/1
R1(config-if)# no flow-sampler NETFLOW-TEST
R1(config-if)# service-policy input netflow
R1(config-if)#
R1(config-if)#int fa1/0
R1(config-if)# no flow-sampler NETFLOW-TEST
R1(config-if)# service-policy input netflow
R1(config-if)#
R1(config-if)#int fa2/0
R1(config-if)# no flow-sampler NETFLOW-TEST
R1(config-if)# service-policy input netflow

Wordy configuration, isn't it?  You'll notice I changed the "random sampled" Netflow back to "one out of one" packets.  This isn't necessary, but it does demonstrate how you can have non-sampled Netflow but still have input filters.  The configuration isn't that complex: match the traffic you want to evaluate with an ACL, match the ACL in a class-map, match the class-map in a policy-map, and apply the netflow-sampler in the policy-map.  Then apply to interfaces!



And still collecting!

Netflow v9 is a big topic.  The first thing to understand is that there is no set number of fields in a Netflow v9 packet.  They can be defined.  This is known as Flexible Netflow (FNF).  Because of this, a template needs to be periodically sent out to define what the flows will contain, in order to instruct the collector what to do with the information.

The two other notable changes, due to its flexible nature, are that IPv6 and direction are now supported.  We'll discuss both of them.

Let's start with IPv4 egress collection.

R1(config-if)#int fa0/1
R1(config-if)#no service-policy input netflow
R1(config-if)#int fa1/0
R1(config-if)#no service-policy input netflow
R1(config-if)#int fa2/0
R1(config-if)#no service-policy input netflow
R1(config-if)#exit
R1(config)#ip flow-export version 9
R1(config)#int fa2/0
R1(config-if)#ip flow egress



There we have it - only egress data, as expected.  So why is this egress data any better than just using the inverse ingress on the collector side?  There are three main reasons:

- If you only want to collect flows on one interface and still want the egress traffic.  With the ingress-only method, egress can only be derived if you collect ingress on every interface.  With egress, you could put ip flow ingress and ip flow egress on the same interface and get both.
- If you want Netflow to sample multicast traffic.  Multicast traffic can't be effectively matched on ingress, because before the router processes the traffic, it's not known what interface or interfaces it will be exiting.
- If WAN links are using compression.  Using the "all interfaces ingress" method of calculating egress creates a problem with compression.  The "outbound" flow is calculated before the compression is applied with that method, potentially showing the link using more bandwidth than it has available.  Using ip flow egress calculates after compression.

Let's take a look at what the actual packets look like, courtesy of Wireshark.



Sorry for having to click the image, the Wireshark output is just too big to insert natively into the blog.

Note the final line: "no template found"

This is normal for Netflow v9.  Since Netflow exporting is inherently one-way, there's no way for the collector to ask for the template when it fires up.  The template is like a Rosetta stone: without it, the collector doesn't know what to do with the data it's given.
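
To show why the data is useless without the template, here is a stripped-down Python sketch. It is not the real v9 wire format: a "template" here is just an ordered list of field names and byte lengths, and the template ID, field names, and values are invented for the example.

```python
import socket

templates = {}  # template_id -> ordered list of (field name, length in bytes)


def learn_template(template_id, fields):
    templates[template_id] = fields


def decode_record(template_id, payload):
    """Decode one data record, or return None if the template is unknown."""
    fields = templates.get(template_id)
    if fields is None:
        return None  # the "no template found" situation
    record, offset = {}, 0
    for name, length in fields:
        chunk = payload[offset:offset + length]
        if name.endswith("_addr"):
            record[name] = socket.inet_ntoa(chunk)
        else:
            record[name] = int.from_bytes(chunk, "big")
        offset += length
    return record


# 13 bytes of data: two IPv4 addresses, a 4-byte counter, a 1-byte flag
payload = (socket.inet_aton("7.7.7.7") + socket.inet_aton("8.8.8.8")
           + (4400).to_bytes(4, "big") + bytes([1]))

print(decode_record(256, payload))  # None: the collector has no template yet
learn_template(256, [("src_addr", 4), ("dst_addr", 4),
                     ("bytes", 4), ("direction", 1)])
print(decode_record(256, payload))
# {'src_addr': '7.7.7.7', 'dst_addr': '8.8.8.8', 'bytes': 4400, 'direction': 1}
```

The same 13 bytes go from meaningless to readable the moment the template shows up, which is exactly what happens in the capture.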



Luckily the templates come pretty regularly.  Wait a minute, templates?  We didn't configure a template.  Technically speaking this isn't FNF.  Netflow v9 has a default template that's used unless you configure FNF, which we'll do further on in the blog.

The next packet contained a template.  Also included in the next packet was another data sample.  Now we can understand what's in the flow data:



And now we can also see the important "Direction" field.

You can also adjust how frequently the template is sent:
ip flow-export template refresh-rate 2

This would send the template every other packet.

Netflow Top Talkers is a feature supported on all versions of Netflow (except IPv6, in 12.4T) that will let you see the top talkers for performance debugging purposes.  It can be useful if you don't have a collector.

R1(config)#ip flow-top-talkers
R1(config-flow-top-talkers)#top 10
R1(config-flow-top-talkers)#sort-by bytes

R1#show ip flow top-talkers
SrcIf         SrcIPaddress    DstIf         DstIPaddress    Pr SrcP DstP Bytes
Fa0/1         7.7.7.7         Fa2/0*        8.8.8.8         06 0013 C8CD    44K
Fa1/0         7.7.7.7         Fa2/0*        8.8.8.8         06 311A 0013  4400
2 of 10 top talkers shown. 2 flows processed.

You'll notice the * next to DstIf.  This indicates an egress flow.
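
The Pr, SrcP and DstP columns in that output are in hex, which makes them a little hard to read at a glance. A quick Python sketch to convert them (the helper name is mine):

```python
PROTOCOLS = {0x01: "ICMP", 0x06: "TCP", 0x11: "UDP"}


def decode_top_talker(pr, srcp, dstp):
    """Convert the hex Pr/SrcP/DstP columns to readable values."""
    proto = int(pr, 16)
    return PROTOCOLS.get(proto, str(proto)), int(srcp, 16), int(dstp, 16)


print(decode_top_talker("06", "0013", "C8CD"))  # ('TCP', 19, 51405)
print(decode_top_talker("06", "311A", "0013"))  # ('TCP', 12570, 19)
```

Port 19 is chargen, and 51405 and 12570 are the same ephemeral ports that showed up in show tcp brief on R7 earlier.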

Let's take a look at IPv6 Netflow.

R1(config-if)#int fa2/0
R1(config-if)#no ip flow egress  ! disabling IPv4 Netflow
R1(config-if)#ipv6 flow ingress
R1(config-if)#ipv6 flow egress
R1(config-if)#exit
R1(config)#ipv6 flow-export version 9  !  somewhat redundant
R1(config)#ipv6 flow-export destination 172.16.0.100 2055

R8#telnet CC1E::7 19 /source-int lo0
Trying CC1E::7, 19 ... Open
 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefg
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefgh

R7#telnet CC1E::8 19 /source-int Lo0
Trying CC1E::8, 19 ... Open
 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefg
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefgh

I won't bother showing you any Solarwinds results at this point, because the freebie edition doesn't support IPv6.  We'll have to rely on Wireshark output.



Aside from optionally playing with the fields in FNF, that's about all there is to IPv6 Netflow.  Note there is no IPv6 edition of top-talkers in 12.4(15)T.

Something I found interesting in Netflow v9 is that basic modifications can be made to the data output without actually using FNF.  For example:

Rack1R4(config)#ip flow-export ver 9 ?
  bgp-nexthop  record BGP NextHop
  origin-as    record origin AS
  peer-as      record peer AS

You can optionally include some BGP information right off the "flow-export" command.  Before I'd used FNF I was confused as to how these fields would interact with FNF -- what if you included a parameter with ip flow-export but then didn't include it in FNF?  Then I discovered FNF doesn't even use the ip flow-export command, so it became a non-issue.

Before we move on to FNF, let's look at one last topic.  This is really a v8 topic, but since it also interfaces with v9, I stuck it in here:

Netflow Aggregation
It may be more efficient to group IPs rather than seeing individual flows for every source/destination.  What if we grouped all sources, or all destinations, based on the routing table?  This may be a better real-world use, as it's a fair bet that the routing table's prefix size is a pretty good indicator of how systems would be grouped.  This feature was the reason behind Netflow v8, and unless you enable Netflow v9 manually for the aggregation cache, you'll still get v8 packets.
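
Here is a rough Python sketch of destination-prefix aggregation. The routing table, flows and function names are invented for the example, and the minimum mask handling reflects my understanding of the idea (never bucket on a prefix shorter than the minimum) rather than a confirmed description of the IOS internals.

```python
import ipaddress
from collections import defaultdict

# Stand-in for the routing table; these prefixes are made up.
ROUTES = [ipaddress.ip_network(p)
          for p in ("8.8.8.8/32", "10.1.0.0/16", "10.0.0.0/8")]


def longest_match(address):
    """Return the most specific route covering the address, or None."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in ROUTES if addr in net]
    return max(matches, key=lambda net: net.prefixlen) if matches else None


def aggregate_by_destination(flows, minimum_mask=0):
    """Sum bytes per destination prefix, with a floor on the mask length."""
    totals = defaultdict(int)
    for dst, nbytes in flows:
        route = longest_match(dst)
        if route is None:
            continue
        if route.prefixlen < minimum_mask:
            # Route is too broad: narrow the bucket to the minimum mask
            route = ipaddress.ip_network(f"{dst}/{minimum_mask}", strict=False)
        totals[str(route)] += nbytes
    return dict(totals)


flows = [("8.8.8.8", 4400), ("10.1.2.3", 100),
         ("10.1.9.9", 50), ("10.200.0.1", 70)]
print(aggregate_by_destination(flows, minimum_mask=16))
# {'8.8.8.8/32': 4400, '10.1.0.0/16': 150, '10.200.0.0/16': 70}
```

Four flows collapse into three export records here; on a router with thousands of flows toward a handful of prefixes, the savings are much bigger.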

R1(config-if)#int fa2/0
R1(config-if)#no ipv6 flow ingress
R1(config-if)#no ipv6 flow egress
R1(config-if)#ip flow-aggregation cache destination-prefix
R1(config-flow-cache)# cache entries 1024
R1(config-flow-cache)# export destination 172.16.0.100 2055
R1(config-flow-cache)# mask destination minimum 16 ! indicates never go bigger than a /16
R1(config-flow-cache)# enabled
R1(config-flow-cache)#
R1(config-flow-cache)#int fa1/0
R1(config-if)#ip flow ingress
R1(config-if)#int fa0/1
R1(config-if)#ip flow ingress

This would aggregate based on destination prefix; if you wanted to aggregate based on source prefix, you would substitute source for destination:

R1(config-if)#ip flow-aggregation cache source-prefix
R1(config-flow-cache)# mask source minimum 16

I've only got one flow, and it's attached to a /32, so this isn't going to be too impressive for output, but I do want to show the v8 packet:



Now you can say you've seen a Netflow v8 packet!  Not exactly anything to write home about ...

R1(config)#ip flow-aggregation cache destination-prefix
R1(config-flow-cache)#export version 9

Now we're back to v9 packets.

Here's a rather curious command -- ip flow-egress input-interface

You may have wondered why I made such an elaborate lab for Netflow. Demonstrating this command is the reason why.  I've got two equal cost EIGRP routes from R7 to R8, via R2 and R3.  I'm using CEF per-packet load sharing on R7 (ip load-sharing per-packet) to be sure roughly half the packets from the chargen (TCP 19) go down each path. In this fashion, R1 will receive 50% of the packets destined for R8 on Fa0/1 and the other 50% on Fa1/0.

The reason is to demonstrate the ability to swap the egress and ingress interface fields as key fields. What is a key field?  As you're aware, in v9, the exported fields can be changed. The key fields are a "must match" - in other words, they need to be present in the flow, or that flow will not be cached & exported. The key fields also must all match across all packets for them to be considered part of the same flow. The non-key fields don't need to match, and will be exported only if they're present.  Not all fields are interchangeable; many that can be used as key fields cannot be used as non-key fields.

As an example, obvious key fields could be source & destination IP, with a non-key field of destination AS number. 

ip flow-egress input-interface shifts the default egress key field from the output interface to the input-interface.  What's that mean to us? 

I've stopped all the chargen sessions except one from R8 to R7 (in other words, R7 telnetted to R8's Lo0 on TCP port 19).  I've removed all interface-level Netflow commands and added ip flow egress to Fa2/0.  R1 will see one flow by default, because the egress-interface is the default key field for egress flows.

Let's double-check that theory.



Hard to prove over screenshots, but this is the recurring pattern - one template + one flow.  The template is coming consistently because I configured it to arrive every other packet (better for fast labbing). Then we see the one flow, which is the only one we'll see, because source/dest (and all other key fields) are the same, as well as egress interface.

What if we wanted to see the flows separately?

R1(config)#ip flow-egress input-interface

While both fields are still in the packet, the one that matters for matching the flow is now the input interface instead of egress interface.  Let's look at the change:



Now we see one template plus two flows, one for each ingress interface.
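
The one-flow versus two-flow result falls straight out of which interface is part of the key. A small Python sketch, with made-up packet sizes and only source, destination and one interface in the key to keep it readable (the real key has more fields):

```python
from collections import defaultdict

# R7 -> R8 packets from the one chargen session, arriving on two different
# interfaces because of per-packet load sharing, all leaving via Fa2/0.
packets = [
    {"src": "7.7.7.7", "dst": "8.8.8.8", "in_if": "Fa0/1", "out_if": "Fa2/0", "bytes": 40},
    {"src": "7.7.7.7", "dst": "8.8.8.8", "in_if": "Fa1/0", "out_if": "Fa2/0", "bytes": 40},
    {"src": "7.7.7.7", "dst": "8.8.8.8", "in_if": "Fa0/1", "out_if": "Fa2/0", "bytes": 40},
    {"src": "7.7.7.7", "dst": "8.8.8.8", "in_if": "Fa1/0", "out_if": "Fa2/0", "bytes": 40},
]


def build_flows(packets, interface_key):
    """Group packets into flows keyed on src, dst and the chosen interface."""
    flows = defaultdict(int)
    for p in packets:
        flows[(p["src"], p["dst"], p[interface_key])] += p["bytes"]
    return dict(flows)


print(len(build_flows(packets, "out_if")))  # 1: default egress key
print(len(build_flows(packets, "in_if")))   # 2: after ip flow-egress input-interface
```

Same packets, same byte total; the only thing that changed is which field decides whether two packets belong together.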

Flexible Netflow (FNF)

Let's build out a sample of FNF.  Solarwinds freebie edition doesn't support FNF, so once again we'll be looking at the outcome from Wireshark.

First, we need to remove all traditional Netflow v9 commands.  The command set we've been using thus far only works with the default v9 template; once we define our own, the rest of the traditional commands are unnecessary:

R1(config)#no ipv6 flow-export destination 172.16.0.100 2055
R1(config)#no ip flow-top-talkers
R1(config)#no ip flow-aggregation cache destination-prefix
R1(config)#no ip flow-export destination 172.16.0.100 2055
R1(config)#no ip flow-export template refresh-rate 2
R1(config)#no ip flow-export version 9
R1(config)#no ip flow-cache timeout active 1
R1(config)#no ip flow-cache timeout inactive 10
R1(config)#no ip flow-egress input-interface
R1(config)#int fa2/0
R1(config-if)#no ip flow egress

FNF reminds me a bit of building a MQC QoS policy.

You create:
Flow Records, which set your key and non-key fields
Flow Exporter, which details where and how to send the exports
Flow Monitors, which match the flow records and exporters, and are then applied to an interface.

On 12.4(15)T, IPv6 isn't supported on FNF; I had to use the default template to get IPv6 flows exported. 

There are over a hundred fields that can be exported, so I'm just going to show one sample here, as a small book could be written about FNF in and of itself.

R1(config)#flow record FLOW-RECORD-TEST
R1(config-flow-record)# match ipv4 source address
R1(config-flow-record)# match ipv4 destination address
R1(config-flow-record)# collect flow direction   ! IMPORTANT
R1(config-flow-record)# collect interface input
R1(config-flow-record)# collect routing next-hop address ipv4

match denotes a key field, collect denotes a non-key field.

Note I flagged the collect flow direction line.  By default, FNF does not export any field you haven't defined, so as best practice you should include collect flow direction.  Otherwise, the collector will not know if the flow is ingress or egress, although I've heard that most collectors assume ingress if this field is absent.

R1(config-flow-record)#flow exporter FLOW-EXPORTER-TEST
R1(config-flow-exporter)# destination 172.16.0.100
R1(config-flow-exporter)# source FastEthernet1/0
R1(config-flow-exporter)# transport udp 2055
R1(config-flow-exporter)# template data timeout 60

This is pretty obvious; setting the destination, port, template timeout, etc.

R1(config-flow-exporter)#flow monitor FLOW-MONITOR-TEST
R1(config-flow-monitor)# record FLOW-RECORD-TEST
R1(config-flow-monitor)# exporter FLOW-EXPORTER-TEST
R1(config-flow-monitor)# cache timeout active 60

R1(config-flow-monitor)#interface fa2/0
R1(config-if)# ip flow monitor FLOW-MONITOR-TEST input
R1(config-if)# ip flow monitor FLOW-MONITOR-TEST output

R1#show flow monitor
Flow Monitor FLOW-MONITOR-TEST:
  Description:       User defined
  Flow Record:       FLOW-RECORD-TEST
  Flow Exporter:     FLOW-EXPORTER-TEST
  Cache:
    Type:              normal
    Status:            allocated
    Size:              4096 entries / 196620 bytes
    Inactive Timeout:  15 secs
    Active Timeout:    60 secs
    Update Timeout:    1800 secs

And the packets?



There it is - just the fields we asked for.

Hope you enjoyed,

Jeff

Tuesday, June 11, 2013

Everything NAT

IOS has a plethora of NAT features that span from simple 1:1 NATs to policy NATs to basic round-robin load balancing. I've done a lot of NAT in my career, but most of it has been on an ASA.  Some of these features are not so obvious on IOS, and I've sometimes had a hard time producing specific functionality when I had to do anything beyond a basic NAT or PAT.  Here, I have deep-dived every NAT feature I can find, including use cases.  We will start with introducing the easy features, cover some directional issues, and then move on to advanced features.

Our lab is as follows:



The subnet between R1, R2, R3 and R4 will be 192.168.0.0/24, with the fourth octet being the router number.  The link between R4 and R5 will be 30.0.0.0/24 with the fourth octet being the router number. Each router will have a loopback of X.X.X.X where X is its router number (i.e. R3 = 3.3.3.3).  R4 will perform all the NATing.

Static 1:1 NAT
R1, R2, and R3 all have a default route pointing towards R4.  These will be our "inside".  R5 doesn't have a route for anything other than its own loopback and its connected 30.0.0.0/24 segment.  This will be our "outside".

Let's get R1 and R5 talking to each other.

R4(config)#int fa0/1
R4(config-if)#ip nat inside
R4(config-if)#int fa0/0
R4(config-if)#ip nat outside
R4(config)#ip nat inside source static 192.168.0.1 30.0.0.20

R1#ping 30.0.0.5
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 30.0.0.5, timeout is 2 seconds:
..!!!
Success rate is 60 percent (3/5), round-trip min/avg/max = 48/52/60 ms

R5#debug ip icmp
ICMP packet debugging is on
R5#
*Mar  1 00:26:35.911: ICMP: echo reply sent, src 30.0.0.5, dst 30.0.0.20
*Mar  1 00:26:37.899: ICMP: echo reply sent, src 30.0.0.5, dst 30.0.0.20
*Mar  1 00:26:37.979: ICMP: echo reply sent, src 30.0.0.5, dst 30.0.0.20
*Mar  1 00:26:38.027: ICMP: echo reply sent, src 30.0.0.5, dst 30.0.0.20

Really straightforward.  This flips the source address from 192.168.0.1 to 30.0.0.20 when moving from inside to outside.  From outside to inside the destination address will be flipped from 30.0.0.20 back to 192.168.0.1.  R4 will ARP for 30.0.0.20 on the outside, which we can see via the alias table:

R4(config)#do show ip alias
Address Type             IP Address      Port
Interface                4.4.4.4
Interface                30.0.0.4
Dynamic                  30.0.0.20
Interface                192.168.0.4

If for some reason we don't want R4 to ARP for 30.0.0.20, we could use no-alias:

R4(config)#ip nat inside source static 192.168.0.1 30.0.0.20 no-alias
R4(config)#do sh ip alias
Address Type             IP Address      Port
Interface                4.4.4.4
Interface                30.0.0.4
Interface                192.168.0.4

R5#clear arp
R5#ping 30.0.0.20
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 30.0.0.20, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

Let's make R4 ARP again.

R4(config)#no ip nat inside source static 192.168.0.1 30.0.0.20 no-alias
R4(config)#ip nat inside source static 192.168.0.1 30.0.0.20

R5#ping 30.0.0.20
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 30.0.0.20, timeout is 2 seconds:
.!!!!
Success rate is 80 percent (4/5), round-trip min/avg/max = 28/54/80 ms

R1#debug ip icmp
ICMP packet debugging is on
R1#
*Mar  1 00:38:39.007: ICMP: echo reply sent, src 192.168.0.1, dst 30.0.0.5
*Mar  1 00:38:39.067: ICMP: echo reply sent, src 192.168.0.1, dst 30.0.0.5
*Mar  1 00:38:39.103: ICMP: echo reply sent, src 192.168.0.1, dst 30.0.0.5
*Mar  1 00:38:39.175: ICMP: echo reply sent, src 192.168.0.1, dst 30.0.0.5

You see the mapping is bidirectional, R5 can reach R1.
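
The bidirectional behavior boils down to one mapping consulted in both directions. A minimal Python sketch of the idea, with function names of my own choosing (this models only the address rewrite, not ARP or the alias table):

```python
# One static mapping, as configured on R4: inside local -> inside global.
STATIC_NAT = {"192.168.0.1": "30.0.0.20"}
REVERSE_NAT = {glob: local for local, glob in STATIC_NAT.items()}


def inside_to_outside(src, dst):
    """Rewrite the source address of a packet leaving the inside."""
    return STATIC_NAT.get(src, src), dst


def outside_to_inside(src, dst):
    """Rewrite the destination address of a packet arriving from the outside."""
    return src, REVERSE_NAT.get(dst, dst)


print(inside_to_outside("192.168.0.1", "30.0.0.5"))  # ('30.0.0.20', '30.0.0.5')
print(outside_to_inside("30.0.0.5", "30.0.0.20"))    # ('30.0.0.5', '192.168.0.1')
```

Because the mapping is static, the outside-to-inside lookup works even if R1 has never sent a packet, which is why R5 could ping 30.0.0.20 cold.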

Let's create some more traffic and check out the NAT table.

R1#telnet 30.0.0.5
Trying 30.0.0.5 ... Open

Password required, but none set
[Connection to 30.0.0.5 closed by foreign host]

R4#sh ip nat translations
Pro Inside global      Inside local       Outside local      Outside global
tcp 30.0.0.20:62371    192.168.0.1:62371  30.0.0.5:23        30.0.0.5:23
--- 30.0.0.20          192.168.0.1        ---                ---

There's some interesting stuff here.  We see the entry created by our nat statement:
--- 30.0.0.20          192.168.0.1        ---                ---

We'll go over this in more detail further down the document.  Let's focus on:
Pro Inside global      Inside local       Outside local      Outside global
tcp 30.0.0.20:62371    192.168.0.1:62371  30.0.0.5:23        30.0.0.5:23

Why is this here?  I thought this was NAT, not PAT, so we shouldn't need all these port numbers.  For that matter we don't even care about the outside local/global addresses, really.

This is because of a feature activated by ip nat create flow-entries.  This is a default-on feature to accelerate the NAT process.  If you want to disable it, you'd use:

R4(config)#no ip nat create flow-entries

R1#telnet 30.0.0.5
Trying 30.0.0.5 ... Open

Password required, but none set
[Connection to 30.0.0.5 closed by foreign host]

R4(config)#do sh ip nat trans
Pro Inside global      Inside local       Outside local      Outside global
--- 30.0.0.20          192.168.0.1        ---                ---

That's more what you'd expect to see, even if it is slower.  I've now re-enabled ip nat create flow-entries.

Static PAT

First we'll remove our NAT.

R4(config)#no ip nat inside source static 192.168.0.1 30.0.0.20
R4(config)#ip nat inside source static tcp 192.168.0.1 19 30.0.0.20 5000

This should map port 19 (chargen) on the inside to port 5000 on the outside.

R1(config)#service tcp-small-servers   ! enable chargen on R1

R4(config)#do sh ip nat trans
Pro Inside global      Inside local       Outside local      Outside global
tcp 30.0.0.20:5000     192.168.0.1:19     ---                ---

There's the translation from our static PAT.

We see that we can no longer telnet there:

R5#telnet 30.0.0.20
Trying 30.0.0.20 ...
% Connection refused by remote host

What about telnetting to port 5000?

R5#telnet 30.0.0.20 5000
Trying 30.0.0.20, 5000 ... Open
 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefg
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefgh
"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghi

Clearly, we can reach chargen on R1.

We've also added the expected entry in the NAT table:

R4(config)#do sh ip nat trans
Pro Inside global      Inside local       Outside local      Outside global
tcp 30.0.0.20:5000     192.168.0.1:19     30.0.0.5:13368     30.0.0.5:13368
tcp 30.0.0.20:5000     192.168.0.1:19     ---                ---

What if we were translating some protocol that needed an ALG (Application Layer Gateway)?  Turns out IOS's NAT process has some fixups built-in for applications that contain IP and port information inside the packet.  This process happens by default.  If you want to disable it, you'd use:

ip nat inside source static tcp 192.168.0.1 19 30.0.0.20 5000 no-payload

Dynamic NAT

I've removed the static NAT/PAT config.

R4(config)#access-list 90 permit 192.168.0.0 0.0.0.255
R4(config)#ip nat pool nat-pool 30.0.0.50 30.0.0.70 netmask 255.255.255.0
R4(config)#ip nat inside source list 90 pool nat-pool

This will perform a 1:1 NAT translation, dynamically, for the first 21 hosts on 192.168.0.0/24 that send traffic, mapping them on to 30.0.0.50 through 30.0.0.70 (the range is inclusive).

R1#telnet 30.0.0.5
Trying 30.0.0.5 ... Open

Password required, but none set
[Connection to 30.0.0.5 closed by foreign host]

R4(config)#do sh ip nat trans
Pro Inside global      Inside local       Outside local      Outside global
tcp 30.0.0.50:52584    192.168.0.1:52584  30.0.0.5:23        30.0.0.5:23
--- 30.0.0.50          192.168.0.1        ---                ---

We see that 192.168.0.1 has translated to 30.0.0.50 as expected.  Now that this is set up we'll see that dynamic NAT is actually reversible:

R5#telnet 30.0.0.50
Trying 30.0.0.50 ... Open

Password required, but none set
[Connection to 30.0.0.50 closed by foreign host]

R4(config)#do sh ip nat trans
Pro Inside global      Inside local       Outside local      Outside global
tcp 30.0.0.50:23       192.168.0.1:23     30.0.0.5:63636     30.0.0.5:63636
tcp 30.0.0.50:52584    192.168.0.1:52584  30.0.0.5:23        30.0.0.5:23
--- 30.0.0.50          192.168.0.1        ---                ---

What about the other routers?

R2#ping 30.0.0.5
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 30.0.0.5, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 32/55/96 ms

R4(config)#do sh ip nat trans
Pro Inside global      Inside local       Outside local      Outside global
--- 30.0.0.50          192.168.0.1        ---                ---
icmp 30.0.0.51:1       192.168.0.2:1      30.0.0.5:1         30.0.0.5:1
--- 30.0.0.51          192.168.0.2        ---                ---

We now see the new dynamic mapping, 192.168.0.2 = 30.0.0.51. 
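
Dynamic NAT from a pool can be modeled as "hand out the next free address on first use, then remember it". Here is a Python sketch under that assumption; first-free allocation matches the .50 then .51 order seen above, but I haven't confirmed it's how IOS always behaves, and the class name is mine.

```python
import ipaddress


def build_pool(first, last):
    """Return every address from first to last, inclusive."""
    start = int(ipaddress.ip_address(first))
    end = int(ipaddress.ip_address(last))
    return [str(ipaddress.ip_address(n)) for n in range(start, end + 1)]


class DynamicNat:
    def __init__(self, pool):
        self.free = list(pool)
        self.table = {}  # inside local -> inside global

    def translate(self, inside_local):
        """Return the mapped address, allocating one on first use."""
        if inside_local not in self.table:
            if not self.free:
                return None  # pool exhausted: no translation for this host
            self.table[inside_local] = self.free.pop(0)
        return self.table[inside_local]


pool = build_pool("30.0.0.50", "30.0.0.70")
nat = DynamicNat(pool)
print(len(pool))                     # 21
print(nat.translate("192.168.0.1"))  # 30.0.0.50
print(nat.translate("192.168.0.2"))  # 30.0.0.51
print(nat.translate("192.168.0.1"))  # 30.0.0.50
```

Note the pool range is inclusive at both ends, so .50 through .70 is 21 addresses, and host number 22 gets nothing until an entry times out or is cleared.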

What if we wanted to do a bulk 1:1 NAT?

R4(config)#do clear ip nat trans *
R4(config)#no ip nat inside source list 90 pool nat-pool
R4(config)#ip nat inside source static network 192.168.0.0 30.0.0.0 /24

This will do a pretty clever thing, and match the fourth octet on a 1:1 basis when generating traffic from inside -> outside.  Outside -> inside is reversible after the inside->outside translation has taken place and is in the table.

R3#telnet 30.0.0.5
Trying 30.0.0.5 ... Open

Password required, but none set
[Connection to 30.0.0.5 closed by foreign host]

R4(config)#do sh ip nat trans
Pro Inside global      Inside local       Outside local      Outside global
--- 30.0.0.3           192.168.0.3        ---                ---
--- 30.0.0.0           192.168.0.0        ---                ---

We see .3 = .3, as anticipated.
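
The "clever thing" is just swapping the network portion and leaving the host bits alone. A short Python sketch of that arithmetic, assuming both networks use the same prefix length (the function name is mine):

```python
import ipaddress


def network_nat(address, inside_net, outside_net):
    """Swap the network portion of an address, keeping the host bits."""
    inside = ipaddress.ip_network(inside_net)
    outside = ipaddress.ip_network(outside_net)
    addr = ipaddress.ip_address(address)
    if addr not in inside:
        return address  # not covered by the static network mapping
    host_bits = int(addr) - int(inside.network_address)
    return str(ipaddress.ip_address(int(outside.network_address) + host_bits))


print(network_nat("192.168.0.3", "192.168.0.0/24", "30.0.0.0/24"))  # 30.0.0.3
print(network_nat("192.168.0.1", "192.168.0.0/24", "30.0.0.0/24"))  # 30.0.0.1
```

One config line covers all 254 hosts, which is a lot less typing than 254 static entries.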

Let's convert the earlier example to dynamic PAT instead.

Dynamic PAT

R4(config)#ip nat inside source list 90 pool nat-pool overload

Now our "nat-pool" NAT pool still references 21 IPs, which is unnecessary; this would work fine with one IP address.  But let's test anyway:

R1#telnet 30.0.0.5
Trying 30.0.0.5 ... Open

Password required, but none set
[Connection to 30.0.0.5 closed by foreign host]

R2#ping 30.0.0.5
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 30.0.0.5, timeout is 2 seconds:
.!!!!
Success rate is 80 percent (4/5), round-trip min/avg/max = 36/45/56 ms

R4(config)#do sh ip nat trans
Pro Inside global      Inside local       Outside local      Outside global
tcp 30.0.0.52:26904    192.168.0.1:26904  30.0.0.5:23        30.0.0.5:23
icmp 30.0.0.52:2       192.168.0.2:2      30.0.0.5:2         30.0.0.5:2

We see that both our sessions are now sourced dynamically off 30.0.0.52, instead of one IP per device.
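
If it helps to picture what the overload table is doing, here's a minimal Python sketch of PAT bookkeeping.  The class name, the "try the same port first, then count up" collision handling, and the lack of any port-range limit are all my assumptions for illustration; IOS's real port allocation is more involved.

```python
class PatTable:
    """Toy PAT table: many inside hosts share one inside global address."""

    def __init__(self, global_ip):
        self.global_ip = global_ip
        self.by_inside = {}  # (proto, local_ip, local_port) -> global_port
        self.by_global = {}  # (proto, global_port) -> (local_ip, local_port)

    def translate_out(self, proto, local_ip, local_port):
        key = (proto, local_ip, local_port)
        if key not in self.by_inside:
            port = local_port
            while (proto, port) in self.by_global:
                port += 1  # assumed collision handling, not IOS's algorithm
            self.by_inside[key] = port
            self.by_global[(proto, port)] = (local_ip, local_port)
        return self.global_ip, self.by_inside[key]

    def translate_in(self, proto, global_port):
        # Return traffic only matches if a session already created the entry.
        return self.by_global.get((proto, global_port))

pat = PatTable("30.0.0.52")
print(pat.translate_out("tcp", "192.168.0.1", 26904))  # ('30.0.0.52', 26904)
print(pat.translate_out("icmp", "192.168.0.2", 2))     # ('30.0.0.52', 2)
```

The point is just that the port (or ICMP identifier) is what keeps the sessions apart once everyone shares the same IP.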

You can also PAT to an interface:

R4(config)#no ip nat inside source list 90 pool nat-pool overload
R4(config)#ip nat inside source list 90 interface fa0/0

I ran the same telnet/ping from R1 and R2 here, not shown.

R4(config)#do sh ip nat trans
Pro Inside global      Inside local       Outside local      Outside global
tcp 30.0.0.4:30046     192.168.0.1:30046  30.0.0.5:23        30.0.0.5:23
icmp 30.0.0.4:3        192.168.0.2:3      30.0.0.5:3         30.0.0.5:3

We see all the sessions coming off the interface IP, 30.0.0.4.  Note I did not use the overload keyword above.  I could've, but it's implied when you PAT off an interface in this fashion.

Let's say you want a catch-all host behind your PAT.  It would get all the traffic not going somewhere else.  This is similar to the "DMZ host" feature that's on a lot of economy routers.  Let's make R2 our catch-all:

ip nat inside source static 192.168.0.2 interface Fa0/0

Just to reconfirm that R1 can still reach from inside->outside:

R1#ping 30.0.0.5
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 30.0.0.5, timeout is 2 seconds:
.!!!!
Success rate is 80 percent (4/5), round-trip min/avg/max = 36/46/68 ms

I've enabled local login on R2 (not shown here).

R5#telnet 30.0.0.4
Trying 30.0.0.4 ... Open

User Access Verification
Password:
R2>

R4(config)#do sh ip nat trans
Pro Inside global      Inside local       Outside local      Outside global
tcp 30.0.0.4:23        192.168.0.2:23     30.0.0.5:52330     30.0.0.5:52330
--- 30.0.0.4           192.168.0.2        ---                ---

That covers all the basics.  Let's move on...

NAT Table & Order of Operations

So far we've been looking at "domain-based" NAT, "domains" meaning inside & outside.  As we've been seeing, the NAT table for domain-based NAT is viewed with show ip nat translations.  But we haven't really examined what these entries mean.

We're actually going to cover four topics here:

- Examining the NAT table
- Eliminating routing by using NAT
- ip nat outside
- NAT Virtual Interface (NVI)

I've eliminated all the existing NAT configuration and we're starting from scratch.

I've also removed the default route on R1 that's pointing at R4:
R1(config)#no ip route 0.0.0.0 0.0.0.0 192.168.0.4

R4(config)#ip nat inside source static 192.168.0.1 30.0.0.50
R4(config)#ip nat outside source static 30.0.0.5 192.168.0.50

The idea is to have R1 perceive R5 as 192.168.0.50, and R5 to perceive R1 as 30.0.0.50.

Let's have a look at the NAT tables.



I personally don't care for the layout of the domain-based NAT table.  Here's the way I decipher it.

These are both source NATs.  The top one is created by ip nat outside, the bottom one is created by ip nat inside.  I'm going to generate a traffic flow so that we can see the outcome of this better.

(Note I've fixed something behind-the-scenes here so that I can demonstrate this point first.  We'll discuss later.)



I've created a ping on R1:
SOURCE: 192.168.0.1
DESTINATION: 192.168.0.50

R4 performed two NATs:
1) A source NAT from 192.168.0.1 to 30.0.0.50
2) A reverse source NAT of 192.168.0.50 to 30.0.0.5.  The source NAT is for the outside->inside direction (30.0.0.5 -> 192.168.0.50), and this is the "reversible" method we've been discussing.

That's all fine and dandy, but here's my quick method for seeing what this all means:



If inside->outside, our pre-translation packet is the inside pair (inside local, outside local) or "1 -> 2" (192.168.0.1 -> 192.168.0.50) and our post-translation packet is the outside pair (inside global -> outside global) or "3 -> 4" (30.0.0.50 -> 30.0.0.5).

Outside -> Inside is exactly flipped:



Original packet is (Outside Global, Inside Global) or "1 -> 2" (30.0.0.5 -> 30.0.0.50); and our post-translation packet is the inside pair, reversed (Outside Local -> Inside Local) or "3 -> 4" (192.168.0.50 -> 192.168.0.1).
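
The same decoding trick, written out as a small Python sketch.  The tuple fields follow the four columns of show ip nat translations; the function names are just mine.

```python
from collections import namedtuple

# Columns in the order "show ip nat translations" prints them.
NatEntry = namedtuple("NatEntry", "inside_global inside_local outside_local outside_global")

def inside_to_outside(e):
    """Return ((src, dst) before NAT, (src, dst) after NAT) for inside -> outside."""
    before = (e.inside_local, e.outside_local)
    after = (e.inside_global, e.outside_global)
    return before, after

def outside_to_inside(e):
    """Same idea for outside -> inside: the pairs are flipped."""
    before = (e.outside_global, e.inside_global)
    after = (e.outside_local, e.inside_local)
    return before, after

entry = NatEntry("30.0.0.50", "192.168.0.1", "192.168.0.50", "30.0.0.5")
print(inside_to_outside(entry))  # (('192.168.0.1', '192.168.0.50'), ('30.0.0.50', '30.0.0.5'))
print(outside_to_inside(entry))  # (('30.0.0.5', '30.0.0.50'), ('192.168.0.50', '192.168.0.1'))
```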

As such, we are now able to get by on translations and ARPs, no routing is required.... sort of.

I mentioned I'd "fixed" something undisclosed above.  Let's look at what would have gone wrong here.  I removed the fix.

Disabling CEF so that we can debug the transit packets...

R4(config)#int fa0/0
R4(config-if)#no ip route-cache
R4(config-if)#int fa0/1
R4(config-if)#no ip route-cache
R4(config-if)#do debug ip packet
IP packet debugging is on
R4(config-if)#do debug ip nat
IP NAT debugging is on

R1#telnet 192.168.0.50
Trying 192.168.0.50 ...
% Connection refused by remote host

Clearly broken, what'd R4's debug have to say?

R4(config-if)#
*Mar  1 07:41:33.458: IP: tableid=0, s=192.168.0.1 (FastEthernet0/1), d=192.168.0.50 (FastEthernet0/1), routed via RIB
*Mar  1 07:41:33.458: IP: s=192.168.0.1 (FastEthernet0/1), d=192.168.0.50 (FastEthernet0/1), len 44, rcvd 3
*Mar  1 07:41:33.466: IP: tableid=0, s=192.168.0.50 (local), d=192.168.0.1 (FastEthernet0/1), routed via FIB
*Mar  1 07:41:33.466: IP: s=192.168.0.50 (local), d=192.168.0.1 (FastEthernet0/1), len 40, sending

The issue is on line 1.  We're routing from Fa0/1 to Fa0/1.  That's because even though R4 ARPed for 192.168.0.50, it sees the egress interface as the one it came in on. 

This is where order of operations comes in.  Inside->Outside and Outside->Inside NAT are handled differently.

Inside->Outside "routes first" and NATs second.  I put "routes first" in quotes, because it's more like "picks an interface first" (which I suppose is routing). Outside->Inside NATs first and "routes second".

Problem is, the packet is basically deemed invalid before the NAT even happens.  We need a more specific route to fix this issue.  This is what I "fixed" earlier.

R4(config)#ip route 192.168.0.50 255.255.255.255 fa0/0

This /32 route will push traffic for 192.168.0.50 on to the outside interface.
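
Here's a toy Python model of the inside->outside order of operations, just to show why the /32 matters.  It assumes a plain longest-prefix lookup and hard-codes the two translations from this lab; the function names are hypothetical and this is a simplification of what the router really does.

```python
import ipaddress

def lookup(routes, dst):
    """Longest-prefix match over a list of (prefix, interface) tuples."""
    addr = ipaddress.ip_address(dst)
    matches = [(ipaddress.ip_network(p), ifc) for p, ifc in routes
               if addr in ipaddress.ip_network(p)]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]

def forward_from_inside(routes, src, dst, outside_ifs=("Fa0/0",)):
    """Inside -> outside: pick the egress interface first, NAT second."""
    egress = lookup(routes, dst)
    if egress not in outside_ifs:
        return egress, src, dst  # never crosses inside -> outside, so no NAT
    src = {"192.168.0.1": "30.0.0.50"}.get(src, src)   # ip nat inside source static
    dst = {"192.168.0.50": "30.0.0.5"}.get(dst, dst)   # ip nat outside source static
    return egress, src, dst

connected = [("192.168.0.0/24", "Fa0/1"), ("30.0.0.0/24", "Fa0/0")]
print(forward_from_inside(connected, "192.168.0.1", "192.168.0.50"))
# ('Fa0/1', '192.168.0.1', '192.168.0.50')  -> hairpins back inside, untranslated

fixed = connected + [("192.168.0.50/32", "Fa0/0")]
print(forward_from_inside(fixed, "192.168.0.1", "192.168.0.50"))
# ('Fa0/0', '30.0.0.50', '30.0.0.5')
```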

R1#telnet 192.168.0.50
Trying 192.168.0.50 ... Open

Password required, but none set
[Connection to 192.168.0.50 closed by foreign host]

Now we can connect!

R4(config)#
*Mar  1 07:53:13.510: IP: tableid=0, s=192.168.0.1 (FastEthernet0/1), d=192.168.0.50 (FastEthernet0/0), routed via RIB
*Mar  1 07:53:13.510: NAT: s=192.168.0.1->30.0.0.50, d=192.168.0.50 [49922]
*Mar  1 07:53:13.514: NAT: s=30.0.0.50, d=192.168.0.50->30.0.0.5 [49922]
*Mar  1 07:53:13.514: IP: s=30.0.0.50 (FastEthernet0/1), d=30.0.0.5 (FastEthernet0/0), g=30.0.0.5, len 44, forward

And now we're seeing traffic going from Fa0/1 to Fa0/0, and then the two pre-discussed NATs happening.

A slightly cleaner way to make this happen:

R4(config)#no ip route 192.168.0.50 255.255.255.255 fa0/0
R4(config)#ip nat outside source static 30.0.0.5 192.168.0.50 add-route

The "add-route" command creates the static route towards 192.168.0.50 on Fa0/0 automatically:

R4(config)#do sh ip route static
     192.168.0.0/24 is variably subnetted, 2 subnets, 2 masks
S       192.168.0.50/32 [1/0] via 30.0.0.5

There's another method, as well.  NVI-based NAT.
I've read a lot of blogs saying NVI-based NAT is the "new NAT method".  I don't think this is the case, or at least not yet.  Refer to this article from Cisco:
http://www.cisco.com/en/US/tech/tk648/tk361/technologies_q_and_a_item09186a00800e523b.shtml

"NVI stands for NAT Virtual Interface. It allows NAT to translate between two different VRFs."

There are a lot of features that aren't available on NVI-based NAT yet (such as SNAT, and some route-map configurations), and based on the above statement, I'm left wondering whether they're planned for the future.

Anyway, how does this help our NAT/route order-of-operation problem?

NVI-based NAT "double routes".  It picks an egress interface, NATs, and then re-picks an egress interface.  This behavior is symmetric for both "inside" and "outside".  In fact, as we will see, NVI NAT eliminates the concept of inside and outside completely.

I've removed all the existing NAT configuration from R4.

R4(config)#int fa0/0
R4(config-if)#ip nat enable
R4(config)#int fa0/1
R4(config-if)#ip nat enable
R4(config)#ip nat source static 192.168.0.1 30.0.0.50
R4(config)#ip nat source static 30.0.0.5 192.168.0.50

That's it - no inside, no outside, just simple translations.

R1#ping 192.168.0.50
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.0.50, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 40/56/76 ms

R4#show ip nat nvi translations
Pro Source global      Source local       Destin  local      Destin  global
--- 192.168.0.50       30.0.0.5           ---                ---
icmp 30.0.0.50:16      192.168.0.1:16     192.168.0.50:16    30.0.0.5:16
--- 30.0.0.50          192.168.0.1        ---                ---

Note the new "show" command for NVI.
You'll also note there's no reference to inside or outside in the show output; everything is considered source or destination.  This makes the debugging, NAT statements, etc. much easier to figure out.

A couple catches on NVI NAT.  As I mentioned above, SNAT (discussed later) is unsupported, as are route-maps for 1:1 static NATs (also discussed later).

Before we push on to policy NATs, let's take a quick look at a way to use IOS NAT as a poor man's load balancer.

I've enabled telnet on R1, R2 and R3; let's distribute inbound telnet connections from R5 amongst the three in a round-robin fashion.  I've also given all three routers a default route aimed at R4.  I've removed all the existing NAT config, again.

R4(config)#int fa0/0
R4(config-if)#ip nat outside
R4(config-if)#int fa0/1
R4(config-if)#ip nat inside
R4(config-if)#exit
R4(config)#ip access-list sta vip
R4(config-std-nacl)#permit 30.0.0.25
R4(config-std-nacl)#exit
R4(config)#ip nat pool server-pool 192.168.0.1 192.168.0.3 netmask 255.255.255.0 type rotary
R4(config)#ip nat inside destination list vip pool server-pool

In this case, 30.0.0.25 is our virtual IP (VIP) on the outside. 

and...

R5#telnet 30.0.0.25
Trying 30.0.0.25 ...
% Connection timed out; remote host not responding

That was anticlimactic.

I'm not sure what causes the problem, but sometimes when I set this up, the router refuses to automatically ARP for the VIP:

R4(config)#do sh ip alias
Address Type             IP Address      Port
Interface                4.4.4.4
Interface                30.0.0.4
Interface                192.168.0.4

But we can force the behavior:

R4(config)#ip alias 30.0.0.25 23
R4(config)#do sh ip alias
Address Type             IP Address      Port
Interface                4.4.4.4
Interface                30.0.0.4
Alias                    30.0.0.25      23
Interface                192.168.0.4

and now it should work:

R5#telnet 30.0.0.25
Trying 30.0.0.25 ... Open

User Access Verification
Password:
R1>exit

[Connection to 30.0.0.25 closed by foreign host]
R5#telnet 30.0.0.25
Trying 30.0.0.25 ... Open

User Access Verification
Password:
R2>exit
[Connection to 30.0.0.25 closed by foreign host]
R5#telnet 30.0.0.25
Trying 30.0.0.25 ... Open

User Access Verification
Password:
R3>
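
For what it's worth, the round-robin behavior we just saw boils down to something like this Python sketch.  The class and function names are mine, and it only models new connections to the VIP, nothing about session tracking.

```python
import itertools

class RotaryPool:
    """Hand out pool members in order, wrapping around at the end."""

    def __init__(self, servers):
        self._next = itertools.cycle(servers)

    def pick(self):
        return next(self._next)

def translate_destination(pool, dst, vip="30.0.0.25"):
    # Only traffic aimed at the VIP gets its destination rewritten.
    return pool.pick() if dst == vip else dst

pool = RotaryPool(["192.168.0.1", "192.168.0.2", "192.168.0.3"])
for _ in range(4):
    print(translate_destination(pool, "30.0.0.25"))
# 192.168.0.1, 192.168.0.2, 192.168.0.3, then back to 192.168.0.1
```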

Now that we have that working, what if R1-R3 want to access the outside?

R1#ping 30.0.0.5
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 30.0.0.5, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

No luck, we haven't set up an inside->outside translation.
Let's build a dynamic PAT.

R4(config)#ip access-list sta inside
R4(config-std-nacl)#permit 192.168.0.0 0.0.0.255
R4(config)#ip nat inside source list inside interface fa0/0

R1#ping 30.0.0.5
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 30.0.0.5, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 36/52/112 ms

Now we have outside->inside load balancing, and inside->outside dynamic PAT.

If you've ever used a "lesser" router and tried to forward a range of ports (say TCP 10 through 30) from the outside to an inside address, you probably did it with relative ease.  You may have also struggled trying to get this to work in the Cisco world, which does "port forwards" one at a time via static PAT.  There's a workaround to be had using this same NAT Rotary feature:

First we remove the old config:

R4(config)#no ip nat pool server-pool 192.168.0.1 192.168.0.3 netmask 255.255.255.0 type rotary
R4(config)#no ip nat inside destination list vip pool server-pool

Then we implement the workaround.  We want to forward TCP 10 - 30 to R1.

Create a rotary pool of just R1:

R4(config)#ip nat pool server-pool 192.168.0.1 192.168.0.1 netmask 255.255.255.0 type rotary

Create an access-list specifying the traffic to "rotary load-balance" to our single server:

R4(config)#ip access-list extended port-forwarding
R4(config-ext-nacl)#permit tcp any any range 10 30
R4(config-ext-nacl)#exit

And apply the policy:

R4(config)#ip nat inside destination list port-forwarding pool server-pool

And test:

R5#telnet 30.0.0.4
Trying 30.0.0.4 ... Open

User Access Verification
Password:
R1>exit

[Connection to 30.0.0.4 closed by foreign host]

Telnet (TCP 23) works.

R5#telnet 30.0.0.4 19
Trying 30.0.0.4, 19 ... Open
 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefg
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefgh
chargen (TCP 19) works as well.
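
Logically, the workaround amounts to the following Python sketch: an ACL that matches a destination port range, feeding a "pool" with one member.  The function name and defaults are hypothetical, and I'm assuming the destination port is left untouched, which is what the telnet and chargen tests suggest.

```python
def port_forward(dst_port, proto="tcp", port_range=(10, 30), server="192.168.0.1"):
    """Return (inside address, port) if the packet matches the ACL, else None."""
    low, high = port_range
    if proto == "tcp" and low <= dst_port <= high:
        return server, dst_port  # single-member rotary pool: always the same server
    return None

print(port_forward(23))  # ('192.168.0.1', 23)
print(port_forward(19))  # ('192.168.0.1', 19)
print(port_forward(80))  # None, outside the 10-30 range
```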

Let's move on to policy NAT now.

Policy NAT with Extended Access Lists

The simplest way to create a policy NAT is to just use an extended access list.  Up until now, we've been using standard access-lists, which create a simple logic: If the source is on this list, change it.  Now we can say things like "If source is on this list and you're headed towards a specific IP range, then change it".  In my experience, this is most useful for VPNs, where you want to PAT towards the Internet but dynamic NAT to another range over the VPN tunnel.  I'm not going to build that elaborate of a lab, but now you have a reference point for production use.

Some additions to our diagram:


I've given R4 two options for routing "outside": R5 and R6.
R5 has the same IPs as before - the interface IPs between R4 and R5 are 30.0.0.0/24
R6 is using 30.1.0.0/24 between R4 and R6.

Pretend R7 doesn't exist for now-- we'll get there.

I've removed all the existing NAT config on R4.

R4(config)#int fa0/1
R4(config-if)#ip nat inside
R4(config-if)#int fa0/0
R4(config-if)#ip nat outside
R4(config-if)#int fa1/0
R4(config-if)#ip nat outside

Now that you can reach either R5 or R6 from R4, we need to NAT differently depending on which direction we're going.

One access-list for traffic headed towards R5:
R4(config)#ip access-list ext towards-r5
R4(config-ext-nacl)#permit ip 192.168.0.0 0.0.0.255 30.0.0.0 0.0.0.255

Another access-list for traffic headed towards R6:
R4(config)#ip access-list ext towards-r6
R4(config-ext-nacl)#permit ip 192.168.0.0 0.0.0.255 30.1.0.0 0.0.0.255

Match them in the NAT statements with the appropriate interface:
R4(config)#ip nat inside source list towards-r5 interface fa0/0 overload
R4(config)#ip nat inside source list towards-r6 interface fa1/0 overload

R1#ping 30.0.0.5
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 30.0.0.5, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 16/47/92 ms

R1#ping 30.1.0.6
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 30.1.0.6, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 20/52/116 ms

Pings to R5 and R6 succeed -

R4#sh ip nat trans
Pro Inside global      Inside local       Outside local      Outside global
icmp 30.1.0.4:22       192.168.0.1:22     30.1.0.6:22        30.1.0.6:22
icmp 30.0.0.4:23       192.168.0.1:23     30.0.0.5:23        30.0.0.5:23

and are appropriately NATed.
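
As a mental model, the two extended access-lists act like a (source, destination) lookup that picks which interface address to PAT behind.  Here's a rough Python sketch of that; the rule table and function name are mine, with the addresses taken from this lab.

```python
import ipaddress

RULES = [
    # (source network, destination network, address to PAT behind)
    ("192.168.0.0/24", "30.0.0.0/24", "30.0.0.4"),  # towards-r5, Fa0/0
    ("192.168.0.0/24", "30.1.0.0/24", "30.1.0.4"),  # towards-r6, Fa1/0
]

def policy_nat(src, dst, rules=RULES):
    """Return the inside global address for the first rule matching src AND dst."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for src_net, dst_net, global_ip in rules:
        if s in ipaddress.ip_network(src_net) and d in ipaddress.ip_network(dst_net):
            return global_ip
    return None  # no rule matched, so no translation

print(policy_nat("192.168.0.1", "30.0.0.5"))  # 30.0.0.4
print(policy_nat("192.168.0.1", "30.1.0.6"))  # 30.1.0.4
```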

Now if you're paying attention, you may have already noticed the limitation of this method when used with my diagram.  "Pretend R7 doesn't exist", I said.  What if we're trying to reach 7.7.7.7 using the extended access-list policy NAT method?  Both our access-lists would read the same:

permit ip 192.168.0.0 0.0.0.255 7.7.7.7 0.0.0.0

That's not going to work. In fact, what if the destination was the Internet?  Your access-lists might look like:

permit ip 192.168.0.0 0.0.0.255 any

That's not going to work either. 

Here's where route-maps come into policy NAT.

Policy NAT with Route-Maps

Cleanup from earlier...
R4(config)#no ip nat inside source list towards-r5 interface fa0/0 overload
R4(config)#no ip nat inside source list towards-r6 interface fa1/0 overload

Build
R4(config)#ip access-list extended towards-outside
R4(config-ext-nacl)#permit ip 192.168.0.0 0.0.0.255 any

Route-maps can perform two functions:
1) match access-lists
2) match egress interfaces or next-hops

Some examples will also show them setting interfaces (or next-hops), but I've not seen a functional difference between using "set interface" and "match interface" with policy NAT.  If anyone knows a difference, please comment!

R4(config)#route-map towards-r6 permit 10
R4(config-route-map)#match ip address towards-outside
R4(config-route-map)#match interface FastEthernet1/0
R4(config)#route-map towards-r5 permit 10
R4(config-route-map)#match ip address towards-outside
R4(config-route-map)#match interface FastEthernet0/0

R4(config)#ip nat inside source route-map towards-r5 interface FastEthernet0/0 overload
R4(config)#ip nat inside source route-map towards-r6 interface FastEthernet1/0 overload
R4(config)#ip route 0.0.0.0 0.0.0.0 30.0.0.5
R4(config)#ip route 0.0.0.0 0.0.0.0 30.1.0.6 10

This would simulate a poor man's redundant Internet solution - different static IPs on different ISPs, routing out one at a time.  If Fa0/0 goes down, Fa1/0 will take over.  Let's give it a try:

R1#ping 7.7.7.7
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 7.7.7.7, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 60/76/112 ms

R4#sh ip nat trans
Pro Inside global      Inside local       Outside local      Outside global
icmp 30.0.0.4:40       192.168.0.1:40     7.7.7.7:40         7.7.7.7:40

And now for the failover test:

R4#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
R4(config)#int fa0/0
R4(config-if)#shut

R1#ping 7.7.7.7
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 7.7.7.7, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 48/74/100 ms

R4#sh ip nat trans
Pro Inside global      Inside local       Outside local      Outside global
icmp 30.1.0.4:41       192.168.0.1:41     7.7.7.7:41         7.7.7.7:41

Note that the "match ip address" clause in the route-map is really not necessary in this case, but I included it to show the functionality.  "match interface" is sufficient to make the NAT decision.
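
Put together, the failover logic looks roughly like this in Python.  It's a simplification: I'm tying each default route directly to an interface (IOS resolves the next-hop to get there), and the names are mine.

```python
# (next hop, egress interface, administrative distance) for the two defaults.
DEFAULTS = [("30.0.0.5", "Fa0/0", 1), ("30.1.0.6", "Fa1/0", 10)]

# What "match interface" buys us: the PAT address follows the egress interface.
NAT_BY_EGRESS = {"Fa0/0": "30.0.0.4", "Fa1/0": "30.1.0.4"}

def best_default(routes, up_interfaces):
    """Pick the usable default route with the lowest administrative distance."""
    usable = [r for r in routes if r[1] in up_interfaces]
    return min(usable, key=lambda r: r[2]) if usable else None

def pat_address(up_interfaces):
    route = best_default(DEFAULTS, up_interfaces)
    return NAT_BY_EGRESS[route[1]] if route else None

print(pat_address({"Fa0/0", "Fa1/0"}))  # 30.0.0.4
print(pat_address({"Fa1/0"}))           # 30.1.0.4, after Fa0/0 is shut
```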

We saw earlier that dynamic NAT is typically reversible.  Not so much with route-maps for dynamic NAT.

Reversible Dynamic NAT with Route-Maps

R4(config)#route-map towards-r5 permit 10
R4(config-route-map)#no match interface FastEthernet0/0  ! interface not supported w/ reversible

R4(config)#no ip nat inside source route-map towards-r5 interface FastEthernet0/0 overload
R4(config)#int fa0/0
R4(config-if)#no shut

R4(config)#ip nat pool dynamic-pool 30.0.0.10 30.0.0.100 netmask 255.255.255.0
R4(config)#ip nat inside source route-map towards-r5 pool dynamic-pool reversible

The reversible keyword is required in order to make this scenario happen with route-maps.

Static NAT with Route-Maps

I've cleared all the NAT off R4 again.

R4(config)#int fa0/1
R4(config-if)#ip nat inside
R4(config-if)#int fa0/0
R4(config-if)#ip nat outside
R4(config-if)#int fa1/0
R4(config-if)#ip nat outside

I showed how to do policy-PAT already, but 1:1 is a whole different story.  Let's say this is a server farm, we have two different ISPs, but we're not running BGP and we have separate IP ranges statically assigned from both ISPs.  How do we do a hot/cold failover but maintain the static NAT?

Let's make 192.168.0.1 our "server" and try to forward traffic from two outside IPs towards it.

R4(config)#ip nat inside source static 192.168.0.1 30.0.0.1
R4(config)#ip nat inside source static 192.168.0.1 30.1.0.1
% 192.168.0.1 already mapped (192.168.0.1 -> 30.0.0.1)

You had probably already guessed that that wasn't going to work.

Here's the route-map method to accomplish this:

R4(config)#no ip nat inside source static 192.168.0.1 30.0.0.1
R4(config)#ip access-list extended towards-outside
R4(config-ext-nacl)#permit ip 192.168.0.0 0.0.0.255 any

R4(config)#ip nat inside source static 192.168.0.1 30.0.0.1 route-map R1TOANY_VIAISP1
R4(config)#ip nat inside source static 192.168.0.1 30.1.0.1 route-map R1TOANY_VIAISP2

R4(config)#do sh ip nat trans
Pro Inside global      Inside local       Outside local      Outside global
--- 30.0.0.1           192.168.0.1        ---                ---
--- 30.1.0.1           192.168.0.1        ---                ---

We still have these in place:
ip route 0.0.0.0 0.0.0.0 30.0.0.5
ip route 0.0.0.0 0.0.0.0 30.1.0.6 10

Verification -

R5#telnet 30.0.0.1
Trying 30.0.0.1 ... Open

User Access Verification
Password:
R1>

R4(config)#int fa0/0
R4(config-if)#shut

R6#telnet 30.1.0.1
Trying 30.1.0.1 ... Open

User Access Verification
Password:
R1>

R4(config)#int fa0/0
R4(config-if)#no shut

There's another way to accomplish something similar.  The extendable command makes sort of a reverse-PAT function for 1:1 NATs.

R4(config)#no ip nat inside source static 192.168.0.1 30.0.0.1 route-map R1TOANY_VIAISP1
R4(config)#no ip nat inside source static 192.168.0.1 30.1.0.1 route-map R1TOANY_VIAISP2

R4(config)#ip nat inside source static 192.168.0.1 30.0.0.1 extendable
R4(config)#ip nat inside source static 192.168.0.1 30.1.0.1 extendable

R4(config)#do sh ip nat trans
Pro Inside global      Inside local       Outside local      Outside global
--- 30.0.0.1           192.168.0.1        ---                ---
--- 30.1.0.1           192.168.0.1        ---                ---

Hmm, the NAT table looks about the same.

R5#telnet 30.0.0.1
Trying 30.0.0.1 ... Open

User Access Verification
Password:
R1>

R4(config)#do sh ip nat trans
Pro Inside global      Inside local       Outside local      Outside global
tcp 30.0.0.1:23        192.168.0.1:23     30.0.0.5:16608     30.0.0.5:16608
--- 30.0.0.1           192.168.0.1        ---                ---
--- 30.1.0.1           192.168.0.1        ---                ---

We'd expect an entry like this based on the default ip nat create flow-entries.  However, this time, it's taken... more literally.  The router is doing what I can only describe as a bi-directional PAT. 

R6#telnet 30.1.0.1
Trying 30.1.0.1 ... Open

User Access Verification
Password:
R1>

R4(config)#do sh ip nat trans
Pro Inside global      Inside local       Outside local      Outside global
tcp 30.0.0.1:23        192.168.0.1:23     30.0.0.5:16608     30.0.0.5:16608
tcp 30.1.0.1:23        192.168.0.1:23     30.1.0.6:16284     30.1.0.6:16284
--- 30.0.0.1           192.168.0.1        ---                ---
--- 30.1.0.1           192.168.0.1        ---                ---

Those entries are there for more than just acceleration; they're actually required now.  In fact, I got curious and disabled ip nat create flow-entries:

R4(config)#no ip nat create flow-entries
R4(config)#do clear ip nat trans *

<sample telnets were done here, not shown>

R4(config)#do sh ip nat trans
Pro Inside global      Inside local       Outside local      Outside global
tcp 30.0.0.1:23        192.168.0.1:23     30.0.0.5:59295     30.0.0.5:59295
tcp 30.1.0.1:23        192.168.0.1:23     30.1.0.6:16284     30.1.0.6:16284
tcp 30.1.0.1:23        192.168.0.1:23     30.1.0.6:63205     30.1.0.6:63205
--- 30.0.0.1           192.168.0.1        ---                ---
--- 30.1.0.1           192.168.0.1        ---                ---

Tough luck, you're getting the flow entries anyway, because this process doesn't work without them.
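
My guess at why the flow entries are mandatory, sketched in Python: with two inside globals pointing at one inside local, the simple entries alone can't say which global address a reply from R1 should be sourced from, but a per-flow entry can.  This is my interpretation of the observed behavior, not documented IOS internals, and the class and method names are made up.

```python
class ExtendableNat:
    """Toy model: one inside local address reachable via several inside globals."""

    def __init__(self, inside_local, inside_globals):
        self.inside_local = inside_local
        self.inside_globals = set(inside_globals)
        self.flows = {}  # (local_port, outside_ip, outside_port) -> inside_global

    def inbound(self, dst_global, dst_port, src_ip, src_port):
        if dst_global not in self.inside_globals:
            return None
        # Remember which global address this particular flow arrived on.
        self.flows[(dst_port, src_ip, src_port)] = dst_global
        return self.inside_local, dst_port

    def outbound_reply(self, src_port, dst_ip, dst_port):
        # The reply is sourced from whichever global the flow came in on.
        return self.flows.get((src_port, dst_ip, dst_port))

nat = ExtendableNat("192.168.0.1", ["30.0.0.1", "30.1.0.1"])
nat.inbound("30.0.0.1", 23, "30.0.0.5", 16608)
nat.inbound("30.1.0.1", 23, "30.1.0.6", 16284)
print(nat.outbound_reply(23, "30.0.0.5", 16608))  # 30.0.0.1
print(nat.outbound_reply(23, "30.1.0.6", 16284))  # 30.1.0.1
```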

Arbitrary IPs; Redistributing NAT

Here's something I was always curious about - NATing to totally arbitrary IPs in IOS.
You've certainly gathered by now that you can NAT to anything, even IPs that aren't on any of your interfaces.  That's generally pretty useless because other devices aren't aware how to reach the IPs, whether your router ARPs for them or not.

I've cleared all the prior NAT configuration.  I'd like to NAT 192.168.0.0/24, our inside range, to 207.50.50.0/24.  I don't want to put 207.50.50.0/24 on an interface. I'm going to use NVI NAT for the example, but something similar could be accomplished with domain NAT. 

R4(config)#interface fa0/0
R4(config-if)#ip nat enable
R4(config)#interface fa0/1
R4(config-if)#ip nat enable

R4(config)#ip nat pool arbitrary 207.50.50.1 207.50.50.200 prefix-length 24 add-route

You may remember "add-route" from domain NAT; here it is for NVI NAT.  Note it's applied on the pool with NVI NAT instead of in the NAT statement itself.

R4(config)#ip access-list extended inside-range
R4(config-ext-nacl)#permit ip 192.168.0.0 0.0.0.255 any

R4(config)#ip nat source list inside-range pool arbitrary

NAT is all set up now.

R4(config)#do sh ip route static
S    207.50.50.0/24 [0/0] via 0.0.0.0, NVI0

This static route can now be introduced into our outside routing protocol through redistribution.  Or, you could just use a BGP "network" statement: network 207.50.50.0 mask 255.255.255.0.  In our case, the outside is presently running OSPF, so:

R4(config)#router ospf 1
R4(config-router)#redistribute static subnets

Verify -

R5#sh ip route ospf
O E2 207.50.50.0/24 [110/20] via 30.0.0.4, 00:19:24, FastEthernet0/0
R5#debug ip icmp

R1#ping 30.0.0.5
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 30.0.0.5, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 36/48/88 ms

R5#
*Mar  1 00:36:11.983: ICMP: echo reply sent, src 30.0.0.5, dst 207.50.50.1
*Mar  1 00:36:12.019: ICMP: echo reply sent, src 30.0.0.5, dst 207.50.50.1
*Mar  1 00:36:12.055: ICMP: echo reply sent, src 30.0.0.5, dst 207.50.50.1
*Mar  1 00:36:12.115: ICMP: echo reply sent, src 30.0.0.5, dst 207.50.50.1
*Mar  1 00:36:12.159: ICMP: echo reply sent, src 30.0.0.5, dst 207.50.50.1

Something similar could also be accomplished by creating a static route to null0 - ip route 207.50.50.0 255.255.255.0 null0 - and redistributing it. 

Stateful NAT (SNAT)

We're still missing one big topic in this article: SNAT, or Stateful NAT.  It's a way of sharing NAT tables across multiple routers, typically via HSRP, for the purpose of hot/hot shared NAT or hot/cold shared NAT.  This method could literally take a blog to itself... in fact, it did!  I had to put it into production a few months ago, so I took the time to blog about it then:

http://brbccie.blogspot.com/2013/03/stateful-nat-with-asymmetric-routing.html

Hope you enjoyed,

Jeff Kronlage