Friday, February 22, 2013

Catalyst 3560 QoS [1 of 2]: Order of Operations

Thus far, 3560 QoS is probably tied with OER as the subject I've had to spend the most time on to get to the bottom of.  There are a lot of blogs, videos, and books available that discuss the nuts & bolts of hardware-enabled QoS on the 3560.  However, I couldn't find a single document that really delved into the order of operations.  For example, you could trust IP Precedence, use a PREC->DSCP map, and police the internal DSCP label down, but which of those values would be used for ingress queuing?  The documentation doesn't make this clear at all.  After an exhaustive labbing experience, I will answer these questions!

I drew the network diagram with GNS3 objects simply because that was the easiest way to draw it.  The lab itself uses real routers and real 3560s, as GNS3/dynamips has no way of emulating a 3560.


R1, R2 and R3 are all running IPX and IP.  No routing is taking place here; this is all one big VLAN (Vlan 705).  All routers are using 192.168.0.X on Fa0/0.705, where X is their router number.  We're using a subinterface in order to use COS values, as COS is only carried on trunk links.  All routers also have an IPX address of 123.YYYY.YYYY.YYYY, where the Ys represent their MAC address.  We'll look at the MACs of my physical gear more closely as we proceed.  The switches have no IP addresses on them whatsoever.
 
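I'm going to start with the punchline, and work my way backwards.  This is the process, in order from left to right, by which the values are interpreted:

Trust COS/PREC/DSCP -> DSCP mutation or service-policy/policer (if applicable) -> Ingress queuing -> Egress queuing -> Sync L2 & L3 marking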
 
 
Obviously this is very high-level; we'll look at the entire process step-by-step. 

But first, let's talk about how I determined all this information, and how you can reproduce my findings.
 
A lot of blogs suggest using show mls qos interface fa0/x statistics.  This is a fantastic command for the lab, or for making quick clarifications.  A sample of the output looks like this:
 
Rack1SW1#show mls qos int fa0/1 statistics
FastEthernet0/1 (All statistics are in packets)
  dscp: incoming
-------------------------------
  0 -  4 :           0            0            0            0            0
  5 -  9 :           0            0            0            0            0
[output omitted]
 
  dscp: outgoing
-------------------------------
  0 -  4 :           0            0            0            0            0
  5 -  9 :           0            0            0            0            0
[output omitted]
 
  cos: incoming
-------------------------------
  0 -  4 :           2            0            0            0            0
  5 -  7 :           0            0            0
  cos: outgoing
-------------------------------
  0 -  4 :           0            0            0            0            0
  5 -  7 :           0            0            0

However, my findings - particularly in determining which features take place ahead of others - required me to know what was happening at the ingress & egress queuing steps.  For that you have to look at some ASIC statistics, which unfortunately require some really awful command syntax.
 
To see the four egress queues, use show platform port-asic stats enqueue fa0/1
 
The output looks like this:
 
Rack1SW1#show platform port-asic stats enqueue fa0/1
  Interface Fa0/1 TxQueue Enqueue Statistics
    Queue 0
      Weight 0 Frames 2
      Weight 1 Frames 0
      Weight 2 Frames 0
    Queue 1
      Weight 0 Frames 0
      Weight 1 Frames 3
      Weight 2 Frames 3766
    Queue 2
      Weight 0 Frames 0
      Weight 1 Frames 0
      Weight 2 Frames 0
    Queue 3
      Weight 0 Frames 0
      Weight 1 Frames 0
      Weight 2 Frames 532

The output of this command is reasonably simple.  You have the four egress queues and three thresholds.  The only thing funny about it is the egress queues are numbered 0-3, instead of 1-4, like they are in other commands.
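Since the rest of this article leans on reading these counters, here is a minimal Python sketch of a parser for the output above.  The function name parse_enqueue is my own, and it assumes that "Weight 0-2" lines up with thresholds 1-3, which is how the results later in the article read.  It shifts both numbers by one so they match the 1-based values used in the mls qos srr-queue commands.

```python
import re

def parse_enqueue(text):
    """Return {(queue, threshold): frames} from 'stats enqueue' output.

    The switch prints queues as 0-3 and weights as 0-2. This sketch shifts
    both by one so they line up with the 1-based numbers used in the
    'mls qos srr-queue' commands (an assumption based on the lab results).
    """
    counters = {}
    queue = None
    for line in text.splitlines():
        q = re.match(r"\s*Queue\s+(\d+)", line)
        if q:
            queue = int(q.group(1)) + 1
            continue
        w = re.match(r"\s*Weight\s+(\d+)\s+Frames\s+(\d+)", line)
        if w and queue is not None:
            counters[(queue, int(w.group(1)) + 1)] = int(w.group(2))
    return counters

# The Fa0/1 egress sample shown above.
SAMPLE = """\
  Interface Fa0/1 TxQueue Enqueue Statistics
    Queue 0
      Weight 0 Frames 2
      Weight 1 Frames 0
      Weight 2 Frames 0
    Queue 1
      Weight 0 Frames 0
      Weight 1 Frames 3
      Weight 2 Frames 3766
    Queue 2
      Weight 0 Frames 0
      Weight 1 Frames 0
      Weight 2 Frames 0
    Queue 3
      Weight 0 Frames 0
      Weight 1 Frames 0
      Weight 2 Frames 532
"""

egress = parse_enqueue(SAMPLE)
```

With the sample above, egress[(2, 3)] holds the 3766 frames that the switch displays under Queue 1, Weight 2.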
 
Ingress queues take some more steps.
 
Step 1 is to find the internal port number and ASIC number.  You use show platform pm if-numbers:

Rack1SW1#show platform pm if-numbers | i Fa0/1
Fa0/1     3    3    1    1/2  1    1    3    local     Yes     Yes
Fa0/10    12   12   10   1/11 1    10   12   local     Yes     Yes
[output omitted]

The 1/2 entry in the Fa0/1 row above is the interesting one.  It means port 2 on ASIC 1 is what we want for Fa0/1.  You can then get both the ingress and egress stats from show platform port-asic stats enqueue port X asic Y.  This produces a lot of output, so I always add | b RxQueue Enqueue to skip straight to the ingress section.
 
Rack1SW1#show platform port-asic stats enqueue port 2 asic 1 | b RxQueue Enqueue
  RxQueue Enqueue Statistics
    Queue 0
      Weight 0 Frames 0
      Weight 1 Frames 72
      Weight 2 Frames 0
    Queue 1
      Weight 0 Frames 0
      Weight 1 Frames 0
      Weight 2 Frames 0
    Queue 2  <-- ACTUALLY QUEUE 1
      Weight 0 Frames 375836
      Weight 1 Frames 0
      Weight 2 Frames 2321
    Queue 3 <-- ACTUALLY QUEUE 2
      Weight 0 Frames 164
      Weight 1 Frames 0
      Weight 2 Frames 240
 
Now if you're familiar with 3560 QoS at all, you're probably wondering why there are four ingress queues listed here and not just two.  Queue 0 and Queue 1 are some sort of internal queue system; this is beyond the scope of the lab exam, and I didn't investigate them any further.  Queue 2 and Queue 3 correspond with the ingress Queue 1 and Queue 2 that we're familiar with.
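The two lookups described above (interface to ASIC/port, and displayed RxQueue number to real ingress queue) are easy to get wrong by hand, so here is a small Python sketch of both.  The function names are hypothetical, and the parsing assumes the fifth whitespace-separated column of the if-numbers row is "asic/port", which is how the 1/2 entry for Fa0/1 is read above.

```python
def ingress_stats_command(if_numbers_line):
    """Build the ingress-stats command from one 'show platform pm if-numbers' row.

    Assumes the fifth whitespace-separated column is 'asic/port', as in the
    '1/2' entry for Fa0/1 in this article.
    """
    fields = if_numbers_line.split()
    asic, port = fields[4].split("/")
    return (f"show platform port-asic stats enqueue port {port} "
            f"asic {asic} | b RxQueue Enqueue")

def ingress_queue(displayed_queue):
    """Map a displayed RxQueue number to the configurable ingress queue.

    Displayed queues 0 and 1 are internal and return None; displayed
    queues 2 and 3 are ingress queues 1 and 2.
    """
    return {2: 1, 3: 2}.get(displayed_queue)

row = "Fa0/1     3    3    1    1/2  1    1    3    local     Yes     Yes"
command = ingress_stats_command(row)
```

For the Fa0/1 row this yields the same command used in the example above, with port 2 and asic 1.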
 
Another introductory topic is the very vague "Internal QoS label".  This is the value carried through the steps in the chart above, and it's what queuing decisions are actually made on.  I couldn't find any clear reference in the Cisco documentation on exactly what this is or how it's assigned, so I figured it out as best I could.  The value is always either a COS value or a DSCP value.  If you trust COS, it's a COS value.  If you trust DSCP or IPP, the internal QoS label is DSCP, even if the traffic is non-IP (such as IPX).  In that non-IP scenario, the COS->DSCP map is consulted for the internal QoS label.
 
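Now, back to the chart I described above.  One more time to avoid scrolling:

Trust COS/PREC/DSCP -> DSCP mutation or service-policy/policer (if applicable) -> Ingress queuing -> Egress queuing -> Sync L2 & L3 marking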


 
Step 1: Trust COS/PREC/DSCP
 
This first step has quite a lot of potential outcomes. 
 
Depending on which you choose to trust, you end up with one of the following outcomes for Internal QoS label:
IP Traffic - Trust COS = COS
Non-IP     - Trust COS = COS
IP Traffic - Trust PREC = DSCP
Non-IP     - Trust PREC = DSCP
IP Traffic - Trust DSCP = DSCP
Non-IP     - Trust DSCP = DSCP
 
The Prec and Non-IP Traffic items warrant some additional explanation:
 
If you trust PREC, the PREC -> DSCP map kicks in and internal decisions are made on DSCP.  In fact, the traffic is output as DSCP, not PREC, on the way to the next hop.  The PREC->DSCP map is also consulted for the egress DSCP value.
 
Non-IP traffic, if you trusted anything other than COS, is judged by DSCP value, based on the COS->DSCP table. 
 
In general, you can basically assume that unless you're talking about trusting COS, you're using DSCP as your internal QoS label.
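The outcomes listed above can be sketched as a small Python function.  This is only a model of the behavior I observed, not anything from Cisco: the name internal_label is hypothetical, and the two map tables are borrowed from the lab configuration used later in this article (COS 4 -> DSCP 18, PREC 4 -> DSCP 10).

```python
COS_DSCP = [0, 8, 16, 24, 18, 40, 48, 56]    # cos-dscp map from the lab config
PREC_DSCP = [0, 8, 16, 24, 10, 40, 48, 56]   # ip-prec-dscp map from the lab config

def internal_label(trust, is_ip, cos=0, prec=0, dscp=0):
    """Return (kind, value) for the internal QoS label.

    trust is "cos", "prec", or "dscp"; kind is "cos" or "dscp".
    """
    if trust not in ("cos", "prec", "dscp"):
        raise ValueError(f"unknown trust state: {trust}")
    if trust == "cos":
        return ("cos", cos)
    if not is_ip:
        # Non-IP frames have no DSCP/PREC to read, so the COS->DSCP map
        # supplies the DSCP label instead.
        return ("dscp", COS_DSCP[cos])
    if trust == "prec":
        return ("dscp", PREC_DSCP[prec])
    return ("dscp", dscp)
```

For example, internal_label("dscp", False, cos=4) gives ("dscp", 18): an IPX frame marked COS 4 ends up with a DSCP label of AF21 under this lab's maps.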
 
If you're trusting COS, you can also set the COS value if it's "not present".  This is done through:
 
Rack1SW1(config-if)#mls qos trust cos
Rack1SW1(config-if)#mls qos cos 5

This works on both trunk and access ports.  Tagged frames get trusted; untagged frames (access ports and the trunk native VLAN) get set to the assigned value.
 
On the other hand, you can also:

Rack1SW1(config-if)#mls qos cos override
Rack1SW1(config-if)#mls qos cos 5

This method works on both trunk and access ports.  This example would force COS 5 on to any frame entering this port, even if it already had another COS value set. 
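To make the difference between the two configurations concrete, here is a tiny Python sketch of the COS the switch ends up acting on.  The function and parameter names are hypothetical; the logic just restates the two behaviors described above.

```python
def effective_cos(tag_cos, port_cos, override=False):
    """COS the switch acts on for a frame arriving on the port.

    tag_cos  -- COS from the 802.1Q tag, or None for an untagged frame
    port_cos -- value configured with 'mls qos cos <n>'
    override -- True when 'mls qos cos override' is configured
    """
    if override or tag_cos is None:
        return port_cos      # untagged, or every frame when overriding
    return tag_cos           # tagged frame on a trust-cos port
```

So with mls qos cos 5, a tagged COS 3 frame stays at 3 under trust, an untagged frame becomes 5, and with override both become 5.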
 
You're probably wondering how I came up with all the facts I've been throwing around, so I'm going to take a moment to show some samples.  We're going to assume no policer or DSCP mutation is in place, and that we're going straight from the trust/classify state all the way to ingress queuing.
 
We'll be sending pings from R1 to R3.  R1 will mark COS, PREC, and DSCP separately, as well as send IPX pings.  We'll assign each of them to different ingress queues and thresholds.  We'll then check the ingress queue hits to see what the internal QoS label is.
 
These mls qos settings will be applied to SW1:
 
mls qos
mls qos map cos-dscp 0 8 16 24 18 40 48 56 <-- mark COS 4 to DSCP 18/AF21
mls qos map ip-prec-dscp 0 8 16 24 10 40 48 56 <-- mark PREC 4 to DSCP 10/AF11
mls qos srr-queue input cos-map queue 2 threshold 1  4 <-- COS 4
mls qos srr-queue input dscp-map queue 1 threshold 2  34 <-- AF 41
mls qos srr-queue input dscp-map queue 2 threshold 2  18 <-- for COS->DSCP18/AF21
mls qos srr-queue input dscp-map queue 2 threshold 3  10 <-- for IPP->DSCP10/AF11

What we're accomplishing here is setting a variety of QoS markings to different queues.  I'll cover them individually as I reference them.
 
These policy-maps will be applied to R1:
 
policy-map set-dscp-af41-cos-4
 class class-default
  set cos 4
  set dscp af41

policy-map set-prec-4-cos-4
 class class-default
  set cos 4
  set ip precedence 4
 
First, let's look at a sample of trusting COS on the port connected to R1:
 
interface FastEthernet0/1
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 705
 switchport mode trunk
 mls qos trust cos

We're going to set the set-dscp-af41-cos-4 map egressing R1 towards SW1:
 
interface FastEthernet0/0.705
 encapsulation dot1Q 705
 ip address 192.168.0.1 255.255.255.0
 ipx network 123 encapsulation ARPA
 service-policy output set-dscp-af41-cos-4

So we'll be sending AF41 and COS 4, and trusting COS.  Which queue would we expect this to come in on?  Per our config above:

mls qos srr-queue input cos-map queue 2 threshold 1 4

We'd expect Queue 2, threshold 1, for COS 4.

Now let's look at our counters on Fa0/1's ASIC:

Rack1SW1#show platform port-asic stats enqueue port 2 asic 1 | b RxQueue Enqueue
  RxQueue Enqueue Statistics
    Queue 0
      Weight 0 Frames 0
      Weight 1 Frames 72
      Weight 2 Frames 0
    Queue 1
      Weight 0 Frames 0
      Weight 1 Frames 0
      Weight 2 Frames 0
    Queue 2
      Weight 0 Frames 777444
      Weight 1 Frames 0
      Weight 2 Frames 4809
    Queue 3  <-- THIS IS ACTUALLY INGRESS QUEUE 2
      Weight 0 Frames 0
      Weight 1 Frames 0
      Weight 2 Frames 0

OK, so no frames yet in ingress queue 2 (displayed as Queue 3) at all.

Rack1R1#ping 192.168.0.3
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.0.3, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/3/4 ms

Rack1SW1#show platform port-asic stats enqueue port 2 asic 1 | b RxQueue Enqueue
  RxQueue Enqueue Statistics
    Queue 0
      Weight 0 Frames 0
      Weight 1 Frames 72
      Weight 2 Frames 0
    Queue 1
      Weight 0 Frames 0
      Weight 1 Frames 0
      Weight 2 Frames 0
    Queue 2
      Weight 0 Frames 785040
      Weight 1 Frames 35
      Weight 2 Frames 4854
    Queue 3
      Weight 0 Frames 5
      Weight 1 Frames 0
      Weight 2 Frames 0

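There you have it.  Five frames in displayed Queue 3, Weight 0, which is ingress queue 2, threshold 1, for COS 4.  (The Weight numbers run 0-2 and line up with thresholds 1-3.)  So we know that the ingress queuing decision is made based on COS, as expected.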
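Comparing two of these outputs by eye gets tedious, so here is a minimal Python sketch that reports only the counters that changed.  The helper name counter_deltas is my own; the numbers are copied from the before and after outputs above, keyed by the displayed (queue, weight) pair.

```python
def counter_deltas(before, after):
    """Return only the (queue, weight) counters that changed between snapshots."""
    return {key: after[key] - before.get(key, 0)
            for key in after
            if after[key] != before.get(key, 0)}

# Displayed (queue, weight) -> frames, copied from the two outputs above.
before = {(2, 0): 777444, (2, 1): 0, (2, 2): 4809,
          (3, 0): 0, (3, 1): 0, (3, 2): 0}
after = {(2, 0): 785040, (2, 1): 35, (2, 2): 4854,
         (3, 0): 5, (3, 1): 0, (3, 2): 0}

deltas = counter_deltas(before, after)
```

Several counters move between the two snapshots because other traffic crosses the port at the same time, so the thing to look for is the counter whose change matches the number of pings sent: here (3, 0) goes up by exactly 5.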

I'm going to be giving these samples over and over again during this document, so I'm going to shorten my format for the output moving forward.  All pings will be sent from R1, through Fa0/1 on SW1, to either R2 (via SW2) or R3 (via SW1).

Next up is non-IP traffic, same policy-map (AF41/COS 4).  We'd expect the same results as above.

Rack1R1#ping ipx 123.0012.43c1.6f20    ! 0012.43c1.6f20 is the MAC address of R3
Type escape sequence to abort.
Sending 5, 100-byte IPX Novell Echoes to 123.0012.43c1.6f20, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms

Rack1SW1#show platform port-asic stats enqueue port 2 asic 1 | b RxQueue Enqueue
  RxQueue Enqueue Statistics
  [output omitted]
    Queue 3
      Weight 0 Frames 10
      Weight 1 Frames 0
      Weight 2 Frames 0

As anticipated, IP and IPX traffic are handled the same for ingress queuing if trusting COS.  As PREC-flagged traffic is going to respond the same in this scenario, I'm going to skip over it for brevity.

Let's trust DSCP on Fa0/1 instead:

Rack1SW1(config)#int fa0/1
Rack1SW1(config-if)#mls qos trust dscp

Rack1R1#ping 192.168.0.3
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.0.3, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/4/4 ms

From earlier:
mls qos srr-queue input dscp-map queue 1 threshold 2  34 <-- AF 41

We'd expect queue 1 threshold 2.  I'm going to point out one last time that the input queues we're after are numbered queue "2" (actually 1) and queue "3" (actually 2).  From here forward I'm going to assume I've harped on that enough.

Rack1SW1#show platform port-asic stats enqueue port 2 asic 1 | b RxQueue Enqueue
  RxQueue Enqueue Statistics
  [output omitted]
    Queue 2
      Weight 0 Frames 819695
      Weight 1 Frames 5
      Weight 2 Frames 5069
    Queue 3
      Weight 0 Frames 10
      Weight 1 Frames 0
      Weight 2 Frames 0

As anticipated.  DSCP was trusted, AF41 was mapped to the appropriate queue.

Now IPX traffic.  But first let's hypothesize.
Based on what I announced at the beginning, this should go through a COS->DSCP map before the internal QoS label (DSCP) is selected. 

If the internal value was a COS->DSCP translation as I explained, per the config above, we would expect:

mls qos map cos-dscp 0 8 16 24 18 40 48 56 <-- mark COS 4 to DSCP 18/AF21
mls qos srr-queue input dscp-map queue 2 threshold 2  18 <-- for COS->DSCP18/AF21

COS 4 to get re-marked to AF21, and AF21 to be queued in queue 2/threshold 2.

If I was wrong, and the COS value was taken natively:

mls qos srr-queue input cos-map queue 2 threshold 1  4 <-- COS 4

We'd expect queue 2 threshold 1.

Rack1SW1#show platform port-asic stats enqueue port 2 asic 1 | b RxQueue Enqueue
  RxQueue Enqueue Statistics
  [output omitted]
    Queue 2
      Weight 0 Frames 841208
      Weight 1 Frames 5
      Weight 2 Frames 5206
    Queue 3
      Weight 0 Frames 10
      Weight 1 Frames 5
      Weight 2 Frames 0

Queue 2 threshold 2!  For non-IP traffic, if an IP marking is trusted (either PREC or DSCP), the COS value is converted to the DSCP value, and that's where the internal QoS label comes from.

Now, behind the scenes, I'm going to change the service-policy on R1 to send out PREC 4 / COS 4 instead of DSCP AF41 / COS 4.  I could just leave "mls qos trust dscp" on Fa0/1 on SW1, but then I would need a mapping for CS4 (DSCP 32).  It's probably obvious, but since PREC and DSCP share bit space and DSCP is "backwards compatible" with PREC, if you're clever about your mappings, you can send PREC and match on DSCP just fine.  One quick reminder: PREC 5 does not equal DSCP EF!  PREC 5 lines up with CS5 (DSCP 40), while EF is DSCP 46.  If you want it to work that way, you have to adjust the ip-prec-dscp map.
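For anyone who wants the bit arithmetic spelled out, here is a short Python sketch of how PREC, class selector, and AF values relate to decimal DSCP.  The helper names are my own; the arithmetic is the standard DSCP layout, where precedence occupies the top three of the six DSCP bits.

```python
def cs(prec):
    """Class selector DSCP for an IP Precedence value (0-7)."""
    return prec << 3              # precedence is the top three DSCP bits

def af(cls, drop):
    """Decimal DSCP for AF<cls><drop>, e.g. af(4, 1) for AF41."""
    return (cls << 3) | (drop << 1)

EF = 46                           # Expedited Forwarding

def prec_of(dscp):
    """IP Precedence seen by a device that only reads the top three bits."""
    return dscp >> 3
```

This is where the decimal values in the configs come from: af(4, 1) is 34, af(2, 1) is 18, and af(1, 1) is 10.  It also shows the PREC 5 trap: cs(5) is 40, not 46, even though prec_of(EF) is 5.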
 
Rather than go through what should be an elementary lesson on how PREC and DSCP line up, I'm going to move on to trusting PREC on SW1 Fa0/1:
 
Rack1SW1(config)#int fa0/1
Rack1SW1(config-if)#mls qos trust ip-precedence

Now what results would we expect?
 
mls qos map ip-prec-dscp 0 8 16 24 10 40 48 56 <-- mark PREC 4 to DSCP 10/AF11
mls qos srr-queue input dscp-map queue 2 threshold 3  10 <-- for IPP->DSCP 10/AF11
 
Remember there is no such thing as an internal PREC value, nor is there any way to make a queuing decision based on PREC.  So we have to convert it to DSCP first.  I've translated PREC 4 to DSCP AF11, and I'm putting AF11 in queue 2 threshold 3.
 
Rack1R1#ping 192.168.0.3
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.0.3, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/3/4 ms
 
Rack1SW1#show platform port-asic stats enqueue port 2 asic 1 | b RxQueue Enqueue
  RxQueue Enqueue Statistics
  [output omitted]
    Queue 2
      Weight 0 Frames 870451
      Weight 1 Frames 5
      Weight 2 Frames 5388
    Queue 3
      Weight 0 Frames 10
      Weight 1 Frames 5
      Weight 2 Frames 5

As expected!
 
I haven't shown every possible scenario here, and I'm going to get more sparse with samples moving forward.  If you'd like the full text-based results (I have tested every possible permutation), post your email address and I'll send them on.  For brevity, I'm only going to highlight the important pieces of each step as we move forward.
 


Step 2a: RE-MARK BY DSCP MUTATION (if applicable)

The DSCP mutation is normally applied at a network edge where you want to translate your neighbor's DSCP values into something different for your network.

Let's take the AF41 that the router is putting out:

Rack1R1(config-subif)#int fa0/0.705
Rack1R1(config-subif)#no service-policy output set-prec-4-cos-4
Rack1R1(config-subif)#service-policy output set-dscp-af41-cos-4

and turn it into AF11:

Rack1SW1(config)#mls qos map dscp-mutation af41-to-af11 34 to 10
Rack1SW1(config)#int fa0/1
Rack1SW1(config-if)#mls qos dscp-mutation af41-to-af11

We'd expect it in queue 2 threshold 3, per:
mls qos srr-queue input dscp-map queue 2 threshold 3  10

Rack1R1#ping 192.168.0.3 repeat 5
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.0.3, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms

Rack1SW1#show platform port-asic stats enqueue port 2 asic 1 | b RxQueue Enqueue
    Queue 2
      Weight 0 Frames 1175371
      Weight 1 Frames 5
      Weight 2 Frames 7265
    Queue 3
      Weight 0 Frames 10
      Weight 1 Frames 5
      Weight 2 Frames 10



Step 2b: SERVICE-POLICY/POLICER (if applicable)
I have removed the above dscp-mutation prior to this section.

There are two oddities about applying a service-policy:

- If you trusted COS/PREC/DSCP on the interface, and then apply a service policy, the packets matched by the service policy are not trusted. You need to reassign the QoS value in the policy-map, or trust the value (again, so to speak) in the policy-map.  To be clear, if your packet came in on the interface, was run past the criteria in the policy-map, and no match was found, the value is still trusted.

- "show policy-map interface" will show you what policy maps are applied to what interfaces, but the counters don't work at all.  Pretty typical Catalyst switch behavior, since the policies are applied in hardware, the counters are just broken.  So don't take a bunch of zeros on the command output literally.

Let's add some global config:

mls qos map policed-dscp 34 to 10

class-map match-all af41
 match ip dscp af41

policy-map police-down-dscp
 class af41
  trust dscp
  police 8000 8000 exceed-action policed-dscp-transmit

and on the interface in question:

Rack1SW1(config)#int fa0/1
Rack1SW1(config-if)#mls qos trust dscp
Rack1SW1(config-if)#service-policy input police-down-dscp

and on R1:

Rack1R1(config)#int fa0/0.705
Rack1R1(config-subif)#service-policy output set-dscp-af41-cos-4

So the logic should go as follows:
1) R1 should egress traffic as AF41
2) SW1 should trust DSCP, and bring AF41 in as the internal QoS label
3) The service policy should knock AF41 (DSCP 34) down to AF11 (DSCP 10) if it exceeds the policer.

Clearly, we're going to have to send more than five pings here in order for the policer to kick in.  But first, let's go ahead and send those five pings just to demonstrate a point.

Rack1R1#ping 192.168.0.3
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.0.3, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/4/4 ms

Rack1SW1#show platform port-asic stats enqueue port 2 asic 1 | b RxQueue Enqueue
  RxQueue Enqueue Statistics
  [output omitted]
    Queue 2
      Weight 0 Frames 1292827
      Weight 1 Frames 10
      Weight 2 Frames 5383
    Queue 3
      Weight 0 Frames 10
      Weight 1 Frames 5
      Weight 2 Frames 10

We're getting queue 1 threshold 2, which is what's expected for DSCP AF41.  If I hadn't put "trust dscp" in the policy map, this would have matched DSCP 0.  Alternatively I could have used "set dscp af41", which would have had the same effect in this case.

Now let's ramp up the traffic.

Rack1R1#ping 192.168.0.3 size 1000 repeat 100
Type escape sequence to abort.
Sending 100, 1000-byte ICMP Echos to 192.168.0.3, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Success rate is 100 percent (100/100), round-trip min/avg/max = 1/4/8 ms

Rack1SW1#show platform port-asic stats enqueue port 2 asic 1 | b RxQueue Enqueue
    [output omitted]
    Queue 2
      Weight 0 Frames 1350941
      Weight 1 Frames 18
      Weight 2 Frames 7481
    Queue 3
      Weight 0 Frames 10
      Weight 1 Frames 5
      Weight 2 Frames 97

We expected queue 1 threshold 2 for the AF41 traffic - we see that increased by 8 frames.  All traffic in excess of the policer we expected to be treated as AF11, which is mapped to queue 2 threshold 3, which we see increased by 92.  There's our 100 packets.
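The 8 / 92 split is roughly what a simple token bucket predicts.  Below is a minimal Python sketch of a single-rate policer with the same numbers as "police 8000 8000" (8000 bps, 8000-byte burst).  It is a simplification under stated assumptions: the bucket starts full, each ping costs exactly its 1000-byte size, and pings arrive about 4 ms apart (a guess based on the round-trip times above).  What the 3560 hardware actually counts per frame is not modeled.

```python
def police(packets, size_bytes, rate_bps=8000, burst_bytes=8000, gap_s=0.004):
    """Count conforming and exceeding packets for a single-rate policer.

    Simplified model: the bucket starts full, refills at rate_bps, and each
    packet costs size_bytes. gap_s is an assumed spacing between packets.
    """
    tokens = float(burst_bytes)
    refill = rate_bps / 8 * gap_s          # bytes added between packets
    conform = exceed = 0
    for i in range(packets):
        if i:
            tokens = min(burst_bytes, tokens + refill)
        if tokens >= size_bytes:
            tokens -= size_bytes           # conforming: spend tokens
            conform += 1
        else:
            exceed += 1                    # exceeding: marked down, not dropped
    return conform, exceed

result = police(100, 1000)
```

Under these assumptions the burst covers the first eight 1000-byte packets, and the slow refill (about 4 bytes per packet interval) never accumulates enough for another one, so police(100, 1000) comes out as 8 conforming and 92 exceeding.  The earlier five 100-byte pings fit comfortably inside the burst, which is why none of them were marked down.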

Step 2, SUMMARIZED

I'm sure some of you are wondering what happens if you try to put a policer in front of a DSCP mutation, or a DSCP mutation behind a policer.  I know I did!

First and foremost, these features don't "stack".  In fact, they're mutually exclusive.  If you apply both a dscp-mutation and a service-policy to the same interface, the dscp-mutation is ignored.  The command is accepted but does absolutely nothing, even if the two features match different traffic classes.  For example, if you were looking for AF21 on the policer, and AF41 on the dscp-mutation, the dscp-mutation still won't have any effect.
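Here is how I'd model that behavior in a few lines of Python.  This is a sketch of what I observed, with hypothetical names: mutation stands in for a dscp-mutation map, and policy is any function standing in for an input service-policy.

```python
def step2(dscp, mutation=None, policy=None):
    """Apply the re-marking stage to a trusted DSCP value.

    mutation -- dict of {incoming DSCP: new DSCP} (dscp-mutation map), or None
    policy   -- function taking a DSCP and returning a DSCP (a stand-in for
                an input service-policy), or None
    """
    if policy is not None:
        # With a service-policy attached, the mutation map has no effect,
        # even for traffic the policy does not match.
        return policy(dscp)
    if mutation is not None:
        return mutation.get(dscp, dscp)    # unmapped values pass through
    return dscp

def remark_af21(dscp):
    """Hypothetical policy that only touches AF21 (18), marking it to AF11 (10)."""
    return 10 if dscp == 18 else dscp

af41_to_af11 = {34: 10}
```

With only the mutation map, step2(34, af41_to_af11) returns 10.  Add the policy and step2(34, af41_to_af11, remark_af21) returns 34: the policy doesn't match AF41, and the mutation never gets a chance to.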



Step 3: INGRESS QUEUING

The 3560's backplane doesn't have enough bandwidth for every port to transmit at line rate simultaneously.  In order to cope with that problem, each port performs queueing to determine which traffic makes it on to the ingress ring in times of congestion.  As you've probably noticed from the information we've seen so far, the internal QoS label is what's used to pick the ingress queue.  There's really not much more to say here.  The information above gives everything you need to know about setting the internal QoS label, so getting to the correct queue is just a matter of setting the queue with the appropriate map.  For reference let's look at two of the maps from above:

mls qos srr-queue input cos-map queue 2 threshold 1 4 <- Map COS 4 to queue 2, threshold 1
mls qos srr-queue input dscp-map queue 1 threshold 2 34 <- Map AF41 to queue 1, threshold 2



Step 4: EGRESS QUEUING

Here's more great news - egress queuing is also very simple! It's very similar to ingress queuing in functionality.  It has 4 queues instead of the 2 we had at ingress.  However, the same internal QoS label we used at ingress is used at egress, so the order-of-operations is very easy to figure out here.  I'm going to give some brief examples for verification.

First, we need to specify which queue traffic gets put in, so that we can track our results.

mls qos srr-queue output dscp-map queue 3 threshold 1  34 <-- AF41
mls qos srr-queue output dscp-map queue 3 threshold 3  18 <-- AF21

I'm going to put a service-policy back in place.  Not a policer, because I don't want traffic spread out over multiple queues, but I'm going to re-mark traffic on ingress to prove that the ingress and egress queues always use the same internal QoS label for the purpose of queuing. 

class-map match-all af41
 match ip dscp af41

policy-map remark-af41
 class af41
  set dscp af21

interface FastEthernet0/1
 mls qos trust dscp
 service-policy input remark-af41

Based on our maps, we'd expect....

mls qos srr-queue input dscp-map queue 2 threshold 2 18 <-- AF21
mls qos srr-queue output dscp-map queue 3 threshold 3 18 <-- AF21

Ingress queue 2, threshold 2 and egress queue 3 threshold 3.

Before we ping, we need to see the stats on Fa0/3's egress queues for reference.  You may recall from the beginning of the article that the easiest way to see hits on the egress queues is show platform port-asic stats enqueue:

Rack1SW1#show platform port-asic stats enqueue fa0/3
  Interface Fa0/3 TxQueue Enqueue Statistics
    Queue 0
      Weight 0 Frames 2
      Weight 1 Frames 0
      Weight 2 Frames 0
    Queue 1
      Weight 0 Frames 23
      Weight 1 Frames 3
      Weight 2 Frames 67270
    Queue 2
      Weight 0 Frames 0
      Weight 1 Frames 0
      Weight 2 Frames 0   
    Queue 3
      Weight 0 Frames 1260
      Weight 1 Frames 169
      Weight 2 Frames 9791

Now let's send some pings.

Rack1R1#ping 192.168.0.3
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.0.3, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms

Rack1SW1#show platform port-asic stats enqueue port 2 asic 1 | b RxQueue Enqueue
    [output omitted]
    Queue 2
      Weight 0 Frames 1350941
      Weight 1 Frames 18
      Weight 2 Frames 7481
    Queue 3
      Weight 0 Frames 10
      Weight 1 Frames 10  <-- This incremented by 5; reference above
      Weight 2 Frames 97

Ingress queue 2, threshold 2...

Rack1SW1#show platform port-asic stats enqueue fa0/3   ! Router 3 is on Fa0/3
  Interface Fa0/3 TxQueue Enqueue Statistics
    Queue 0
      Weight 0 Frames 2
      Weight 1 Frames 0
      Weight 2 Frames 0
    Queue 1
      Weight 0 Frames 23
      Weight 1 Frames 3
      Weight 2 Frames 66973
    Queue 2
      Weight 0 Frames 0
      Weight 1 Frames 0
      Weight 2 Frames 5
    Queue 3
      Weight 0 Frames 1260
      Weight 1 Frames 169
      Weight 2 Frames 9744

As you can see, we can prove the order-of-operations in this case is Trust, Remark (via service-policy), Ingress Queuing, Egress Queuing.
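The whole test can be condensed into a short Python sketch of the order of operations for this one case.  The names are hypothetical, and the two map tables contain only the entries used in this article; the AF21 egress entry follows the expectation stated just before the test (queue 3, threshold 3).

```python
# (queue, threshold) per DSCP label, limited to the entries used in this lab.
INPUT_DSCP_MAP = {34: (1, 2), 18: (2, 2), 10: (2, 3)}
OUTPUT_DSCP_MAP = {34: (3, 1), 18: (3, 3)}

def remark_af41(dscp):
    """Stand-in for the remark-af41 policy-map: AF41 (34) becomes AF21 (18)."""
    return 18 if dscp == 34 else dscp

def forward(dscp_in):
    """Walk one trusted-DSCP IP packet through the order of operations."""
    label = dscp_in                        # step 1: trust DSCP
    label = remark_af41(label)             # step 2: service-policy re-marks
    ingress = INPUT_DSCP_MAP.get(label)    # step 3: ingress queue from label
    egress = OUTPUT_DSCP_MAP.get(label)    # step 4: egress queue, same label
    return label, ingress, egress

outcome = forward(34)
```

The point of the sketch is that the label is computed once, before either queuing step, so both lookups see AF21 rather than the AF41 that arrived on the wire.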

Interesting side note.  I tried this with putting IPX traffic on the wire at R1, with COS 4. The queuing decision does in fact take place based on DSCP (via COS->DSCP map), however, you can't slap a service policy on it that references DSCP and expect to get an outcome by matching the internal QoS label -- it just doesn't work.



Step 5: SYNC L2 & L3 MARKING

If you wanted to be nit-picky about it, I have no real way of proving this step happens last.  However, it makes no difference, as the outcome is exactly the same as if this were wedged in between ingress & egress queuing.

If you trusted an L3 value on ingress, this step sets the corresponding COS value on egress.  If you trusted a COS value on ingress, and this is an IP packet, this step sets the corresponding DSCP value on egress -- obviously there's no way to set a DSCP value on an IPX packet!

We'll be adding a series of class maps and a policy map to R3.  This policy-map will have a policer attached to the various COS levels, to prove which COS (via show policy-map interface) is incoming.  The policer itself is large and not important, it doesn't change any DSCP values, it's just there so we can match the COS easily. We will also match all the possible DSCP values via an access list called "all_qos"; we will show its output and see where the hits are.

The first example will be sending a ping from R1 to R3, via SW1.  SW1 Fa0/1 is set up to trust DSCP, and SW2 Fa0/13 (the connection point to SW1) is set up to trust DSCP as well.  R1 will set AF41 and COS 4.

We'll also be modifying the default DSCP->COS map on SW1 so that af41 aligns with COS 6:

mls qos map dscp-cos  34 to 6

Sending 100 pings now so that the service policy on R3 will report them:

Rack1R1#ping 192.168.0.3 repeat 100
Type escape sequence to abort.
Sending 100, 100-byte ICMP Echos to 192.168.0.3, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Success rate is 100 percent (100/100), round-trip min/avg/max = 1/3/4 ms

On R3, we see that COS 6 is getting matched.

Rack1R3#show policy-map int | s cos6
    Class-map: cos6 (match-any)
      100 packets, 11800 bytes
      30 second offered rate 0 bps, drop rate 0 bps
      Match: cos  6
        100 packets, 11800 bytes
        30 second rate 0 bps
      police:
          cir 100000 bps, bc 50000 bytes
        conformed 100 packets, 11800 bytes; actions:
          transmit
        exceeded 0 packets, 0 bytes; actions:
          drop
        conformed 0 bps, exceed 0 bps

This is happening because the initial COS value of 4 is meaningless to the switch.  We trusted DSCP, and the DSCP->COS map says synchronize to COS 6.  The same outcome would have happened if we'd not set the COS value on R1 at all.

The access-list verification always confuses me.  For some reason, I get three hits for every one packet when matching on an access-list during this lab.  So that 100 packets I sent translates to 300 ACL matches.  If someone knows why this behavior happens, I would love to know:

Rack1R3#sh ip access-list | i match
    360 permit ip any any dscp af41 (300 matches)

So we hit AF41 / COS 6, as expected.

Now let's try setting just a COS value on IP packets, trusting COS on SW1, and see what we get.

R1:
policy-map setcos4
 class class-default
  set cos 4

interface FastEthernet0/0.705
 service-policy output setcos4

SW1:
interface FastEthernet0/1
 mls qos trust cos

As mentioned at the beginning of the document, we will be setting AF21 when we trust COS 4:

mls qos map cos-dscp 0 8 16 24 18 40 48 56  <-- 18 / AF21

Clear counters on R3...

Rack1R3#clear counters
Clear "show interface" counters on all interfaces [confirm]

Rack1R3#clear ip access-list counters

and send the pings!

Rack1R1#ping 192.168.0.3 repeat 100
Type escape sequence to abort.
Sending 100, 100-byte ICMP Echos to 192.168.0.3, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Success rate is 100 percent (100/100), round-trip min/avg/max = 1/3/4 ms

First, check the COS:

Rack1R3#show policy-map int | s cos4
    Class-map: cos4 (match-any)
      100 packets, 11800 bytes
      30 second offered rate 4000 bps, drop rate 0 bps
      Match: cos  4
        100 packets, 11800 bytes
        30 second rate 4000 bps

Alright, COS 4, same as we sent from R1.

Rack1R3#sh ip access-list | i match
    200 permit ip any any dscp af21 (300 matches)

Alright, all synced up from L2 QoS -> L3 QoS!

Now for the tricky verification.  Let's trust DSCP but send IPX traffic.  As you may recall from above, this ends up with an internal QoS label of DSCP.  On egress, however, we still use the original COS.  The tricky part is proving it.  I'm not certain there's a way to match COS on IPX traffic on a router.  You can do it with a mac access-list on a switch, but using just one switch, there's no way for me to prove that value was actually sent to the next hop.  So my method, and the entire reason for including SW2 and R2 on the diagram (they haven't been used up until now), was to have SW2 match COS values in its ingress queuing, proving my theory.

So now, we will be IPX pinging R1 to R2, instead of R1 to R3, as we have been doing thus far.

R1 is still marking COS 4 on outbound traffic on Fa0/0.705.

Even though the internal queuing decision on SW1 will be made based on DSCP, the COS 4 value will pass through to SW2.  Obviously no DSCP is set as part of the L2/L3 synchronization, as there's no DSCP field on IPX traffic.

On SW2, when COS 4 enters, we need to put it in a queue where we can identify it:

Rack1SW2(config)#mls qos srr-queue input cos-map queue 1 threshold 2 4
Rack1SW2(config)#int fa0/13
Rack1SW2(config-if)#mls qos trust cos

So we'll expect hits on Queue 1, threshold 2, on Fa0/13 on SW2.  (SW1 and SW2 are connected on Fa0/13)

Let's check out the stats on that port before we begin:

Rack1SW2#$rm port-asic stats enqueue port 14 asic 1 | b RxQueue Enqueue
  RxQueue Enqueue Statistics
    Queue 0
      Weight 0 Frames 0
      Weight 1 Frames 72
      Weight 2 Frames 0
    Queue 1
      Weight 0 Frames 0
      Weight 1 Frames 0
      Weight 2 Frames 0
    Queue 2
      Weight 0 Frames 1078527
      Weight 1 Frames 0
      Weight 2 Frames 8831
    Queue 3
      Weight 0 Frames 0
      Weight 1 Frames 7358
      Weight 2 Frames 101

Cool, no frames on that queue/threshold yet.

Rack1R1#ping 123.0019.e880.09c0   ! the last 12 digits are the mac address of Fa0/0 on R2
Translating "123.0019.e880.09c0"
Type escape sequence to abort.
Sending 5, 100-byte IPX Novell Echoes to 123.0019.e880.09c0, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms

Rack1SW2#$rm port-asic stats enqueue port 14 asic 1 | b RxQueue Enqueue
  RxQueue Enqueue Statistics
    Queue 0
      Weight 0 Frames 0
      Weight 1 Frames 72
      Weight 2 Frames 0
    Queue 1
      Weight 0 Frames 0
      Weight 1 Frames 0
      Weight 2 Frames 0
    Queue 2
      Weight 0 Frames 1078527
      Weight 1 Frames 5     
      Weight 2 Frames 8831
    Queue 3
      Weight 0 Frames 5
      Weight 1 Frames 7358
      Weight 2 Frames 101

There it is, COS 4 unchanged.
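The three cases tested in this step can be summed up in a short Python sketch.  The names are hypothetical and the model only covers trusting COS or DSCP, as tested above.  The maps are the ones configured in this lab; for DSCP values without an explicit dscp-cos entry, the sketch assumes COS is simply the top three DSCP bits.

```python
COS_DSCP = [0, 8, 16, 24, 18, 40, 48, 56]   # cos-dscp map from the lab
DSCP_COS_OVERRIDES = {34: 6}                # 'mls qos map dscp-cos 34 to 6'

def dscp_to_cos(dscp):
    # Assumed default for unlisted values: COS is the top three DSCP bits.
    return DSCP_COS_OVERRIDES.get(dscp, dscp >> 3)

def egress_markings(trust, is_ip, cos_in, dscp_in=None):
    """Return (cos_out, dscp_out); dscp_out is None for non-IP frames."""
    if trust not in ("cos", "dscp"):
        raise ValueError("only the trust cos / trust dscp cases are modeled")
    if not is_ip:
        return cos_in, None                  # COS passes through untouched
    if trust == "cos":
        return cos_in, COS_DSCP[cos_in]      # DSCP rewritten from COS
    return dscp_to_cos(dscp_in), dscp_in     # COS rewritten from DSCP
```

Trusting DSCP with AF41 / COS 4 gives (6, 34), trusting COS 4 on an IP packet gives (4, 18), and trusting DSCP on an IPX frame with COS 4 gives (4, None), matching the three results above.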

I can't tell you how relieved I am to be done writing this.  This is definitely the hardest blog post I've written, mainly because all the moving parts - setting QoS, matching it in one spot, queueing it in another, handing it off to a 3rd device - make accurately labbing this topic extremely error-prone and taxing.  Not to mention all the decimal -> DSCP conversions, which are so easy to mix up or overlook when you're reading config.  Some of these samples I had to look over for upwards of 30 minutes to figure out where I messed up the first go-round.

Come back for 3560 QoS part 2 of 2, where I will cover all the common 3560 QoS features in detail, including many of the ones used in this article.

Hope you enjoyed,

Jeff

4 comments:

  1. Hey Jeff

    This is a truly excellent piece of work and has helped to clear up a lot of grey areas for me.

    Thanks for taking the time to share this!

    As for the ACL matching multiple times, is it used in multiple places, such as an inbound policy, an outbound policy, and on an interface? Just a hunch.

    I will read Part 2 tomorrow ;)

  2. Thanks, I think you'll find part 2 easier to digest. :P This article got a bit niche as I got deeper into it.

  3. Thanks Jeff for the extensive testing.
    The commands to verify both the input queue and output queue are really useful.

    I assume that I can see the packets being dropped using command
    show platform port-asic stats drop port 1 asic 9

  4. Thanks Jeff for this article! Please keep them coming :)
