Tuesday, February 26, 2013

Catalyst 3560 QoS [2 of 2]: Nuts & Bolts

In most of my posts, I set out to prove some sort of point.  If you read part 1, you'll see the outcome of my order-of-operations research.  This particular post, however, is just going to be a laundry list of functions.  I'll take some time to explain the production use of the functions that weren't used, or were only lightly used, in the first post.

We'll be reusing the diagram from the first post:



Quick disclaimer: as mentioned in part 1, all these examples were done on real equipment, since GNS3 obviously can't emulate a 3560.  I did use GNS3 for the diagram, though; it just lends itself to making easy diagrams.

R1, R2 and R3 are all running IPX and IP. No routing is taking place here; this is all one big VLAN (VLAN 705). All routers are using 192.168.0.X on Fa0/0.705, where X is the router number. We're using a subinterface in order to use COS values, as COS is only carried on trunk links. All routers also have an IPX address of 123.YYYY.YYYY.YYYY, where the Ys represent their MAC address. We'll look at the MACs of my physical gear more closely as we proceed. The switches have no IP addresses on them whatsoever.

These topics will be covered in this post:
- Setting COS, DSCP via "mls qos cos"
- DSCP Passthrough
- DSCP Mutations
- Per-port classification & remarking via service-policy
- Matching MAC Access Lists
- Trusting Cisco phones
- Ingress Policing
- Aggregate Ingress Policing
- VLAN-based Marking
- Per-Port, Per-VLAN Policing
- Shared Mode & Shaped Mode
- Weighted Tail Drop
- Queue Sets
- Priority Queuing

Let's get started!

Setting COS, DSCP via "mls qos cos"

Rack1SW1(config-if)#mls qos trust cos
Rack1SW1(config-if)#mls qos cos 5

This will set COS 5 on frames that arrive with no COS of their own, and, if you're sending IP traffic, will also mark the corresponding DSCP when forwarding the packet to the next device.  This works on both trunk and access ports.  Access ports are the easy case, as COS is only carried on trunks.  If you're using this on a trunk port, the tagged frames are trusted, and native VLAN traffic is set to the interface value.

Rack1SW1(config-if)#mls qos cos override
Rack1SW1(config-if)#mls qos cos 5

This works on all frames - and will set the COS regardless of what it already has assigned. 

Let's take a look at override in conjunction with IP traffic, and see that it does set the DSCP.  As with many of the labs we'll be doing today, I'll be sending pings from R1 Fa0/0.705, through SW1 - ingress Fa0/1, egress Fa0/3, to R3 Fa0/0.705.

I'll be setting COS 2 on R1, having SW1 override to COS 5, and expecting EF (explained shortly) to be marked when the packet arrives on R3.

Our config will look like so:

R1:

policy-map setcos2
 class class-default
  set cos 2

interface FastEthernet0/0.705
 encapsulation dot1Q 705
 ip address 192.168.0.1 255.255.255.0
 ipx network 123 encapsulation ARPA
 service-policy output setcos2

SW1:

mls qos map cos-dscp 0 8 16 24 32 46 48 56  ! this will have COS 5 mapped to DSCP EF (46)

interface FastEthernet0/1
 mls qos cos 5
 mls qos cos override

R3:

interface FastEthernet0/0.705
 ip access-group all_qos in
 service-policy input cos

R3's all_qos access list matches & permits every possible DSCP/PREC value.
R3's policy-map cos polices all COS values.  The policer is not important; it's just a way of matching COS values.

Rack1R1#ping 192.168.0.3
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.0.3, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/4/8 ms

Rack1R3#show policy-map int | s cos5
    Class-map: cos5 (match-any)
      5 packets, 590 bytes      30 second offered rate 0 bps, drop rate 0 bps
      Match: cos  5
        5 packets, 590 bytes
        30 second rate 0 bps
     [output omitted]

There's our COS 5.

Rack1R3#sh ip access-list | i match
    480 permit ip any any dscp ef (15 matches)

And there's DSCP EF.  One interesting note: I always get 3x the matches on the ACL that I'm expecting.  I'm really not sure why; if you have input, please post...
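To make the COS-to-DSCP sync concrete, here's the map from the SW1 config above as a quick Python sketch (a toy lookup table, obviously not switch code):

```python
# COS -> DSCP map as configured with:
#   mls qos map cos-dscp 0 8 16 24 32 46 48 56
# The eight values correspond to COS 0 through COS 7, in order.
COS_DSCP = dict(enumerate([0, 8, 16, 24, 32, 46, 48, 56]))

def cos_to_dscp(cos: int) -> int:
    """Return the DSCP the switch writes for a given internal COS value."""
    return COS_DSCP[cos]

print(cos_to_dscp(5))  # 46 - the override to COS 5 yields DSCP EF
```

With the default map, COS 5 would map to DSCP 40 instead; the config above changes that one entry to 46 (EF).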

DSCP Passthrough

Now let's say you don't want the L2/L3 sync to happen, and you don't want DSCP rewritten.

Same config as above.  Clearing counters on R3. 

Rack1SW1(config)#no mls qos rewrite ip dscp
Note this command is global.

Rack1R3#show policy-map int | s cos5
    Class-map: cos5 (match-any)
      5 packets, 590 bytes      30 second offered rate 0 bps, drop rate 0 bps
      Match: cos  5
        5 packets, 590 bytes
        30 second rate 0 bps

COS 5, as expected.

Rack1R3#sh ip access-list | i match
    660 permit ip any any precedence routine (15 matches)

But no DSCP EF!

DSCP Mutations

The idea behind DSCP mutations is that if you attach to a different administrative/DSCP domain, you may want to take their values and turn them into something else.

policy-map setdscpef
 class class-default
  set dscp ef

Rack1R1(config-subif)#no service-policy output setcos2
Rack1R1(config-subif)#service-policy output setdscpef

Rack1SW1(config)#mls qos map dscp-mutation ef-to-af11 46 to 10
Rack1SW1(config)#interface Fa0/1
Rack1SW1(config-if)#no mls qos cos 5
Rack1SW1(config-if)#mls qos trust dscp
Rack1SW1(config-if)#mls qos dscp-mutation ef-to-af11
...counters have been cleared on R3...

Rack1R1#ping 192.168.0.3
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.0.3, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms

Rack1R3#sh ip access-list | i match
    120 permit ip any any dscp af11 (15 matches)

EF translated to AF11.

Per-port classification & remarking via service-policy

Even on a layer 2 interface, you can still classify by IP.  This function is straightforward; it works much the same as it would on a router.

ip access-list extended icmp_acl
 permit icmp any any

class-map match-all icmp_cm
 match access-group name icmp_acl

policy-map remark_icmp
 class icmp_cm
  set dscp af13

Rack1SW1(config)#int fa0/1
Rack1SW1(config-if)#no mls qos dscp-mutation ef-to-af11
Rack1SW1(config-if)#service-policy input remark_icmp
...clearing counters on R3...

Rack1R1#ping 192.168.0.3
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.0.3, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/3/4 ms

Rack1R3#sh ip access-list | i match
    160 permit ip any any dscp af13 (15 matches)

There are a few gotchas.  For instance, if you've trusted DSCP, you'd expect values unmodified by the policy-map to still maintain their DSCP values.  Not so fast...

First off, there's only a handful of things you can do with a policy map on a 3560:

Rack1SW1(config-pmap-c)#?
QoS policy-map class configuration commands:
  exit            Exit from QoS class action configuration mode
  no              Negate or set default values of a command
  police          Police
  service-policy  Configure QoS Service Policy
  set             Set QoS values
  trust           Set trust value for the class

You can police, set the QoS value to something else, or trust the marking.

If you're setting the QoS to something else, then the original trust value doesn't matter.  If you're policing it...

policy-map remark_icmp
 class icmp_cm
  police 1000000 100000 exceed-action drop

Our 5 pings won't come close to that policer value.

Rack1R1#ping 192.168.0.3
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.0.3, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/4/8 ms

Rack1R3#sh ip access-list | i match
    660 permit ip any any precedence routine (15 matches)

If we have a match in a policy-map that doesn't set, trust, or remark the value, you get a default (0) value.  Mind you, we're still trusting DSCP on the interface, but it's ignored.  So be sure to:

policy-map remark_icmp
 class icmp_cm
  police 1000000 100000 exceed-action drop
  trust dscp

One important note: you do need to re-apply the service-policy on the 3560 for changes to the policy-map to take effect.  That stumped me a few times.

Rack1R3#sh ip access-list | i match
    480 permit ip any any dscp ef (15 matches)

Now traffic that isn't policed gets trusted.

Matching MAC Access Lists

I'll start with some basics on mac access lists. 

- These only work on switches; you can't create them on routers.
- Their primary use is to match non-IP traffic.
- We'll be focusing on IPX traffic in my examples.  The Ethertype for IPX is 0x8137.

The format on a MAC ACL is:

[permit | deny] [source mac address] [source mask] [destination mac address] [destination mask] [Ethertype] [Ethertype Mask] [optional cos keyword] [optional cos value]

For example,

permit host 0013.c460.2be0 host 0012.43c1.6f20 0x8137 0x0 cos 5

This would permit COS 5 IPX traffic between 0013.c460.2be0 and 0012.43c1.6f20.
The cos match is optional.

The only items I've used for the mask are either 0x0 (specific match - similar to /32 in IPv4) or 0xFFFF (match everything).  There are probably some more advanced uses but I haven't had a call for it.
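Here's my mental model of the mask as a Python sketch. I'm assuming the usual Cisco convention that set mask bits are "don't care" bits; as noted above, I've only ever used 0x0 and 0xFFFF in practice, so treat the general case as an assumption:

```python
def ethertype_matches(frame_etype: int, acl_value: int, acl_mask: int) -> bool:
    """Mask bits set to 1 are ignored; all remaining bits must match exactly."""
    care = ~acl_mask & 0xFFFF
    return (frame_etype & care) == (acl_value & care)

# 0x0 mask: exact match (similar to /32 in IPv4)
print(ethertype_matches(0x8137, 0x8137, 0x0))     # True  - IPX matches IPX
print(ethertype_matches(0x0800, 0x8137, 0x0))     # False - IP does not
# 0xFFFF mask: every bit is "don't care", so anything matches
print(ethertype_matches(0x0800, 0x8137, 0xFFFF))  # True
```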

Another interesting thing is that they basically can't be used to block IP traffic.  If you try to restrict 0x800 (Ethertype for IP), IP traffic still flows.  I have read a post that claims that filtering 0x800 will filter malformed IP packets, but pass the good ones.  I didn't have the desire to lab that up and test, but I did think it was worth mentioning.

And last but definitely not least, be really careful not to block ARP (0x806).  MAC ACLs may not block IP traffic, but they do block ARP, and no ARP = no IP functionality.

Let's give this thing a try.

The MAC addresses I listed in the example above are the real MACs for R1 and R3, respectively.  Let's verify IPX traffic before we begin:

Rack1R1#ping ipx 123.0012.43c1.6f20
Type escape sequence to abort.
Sending 5, 100-byte IPX Novell Echoes to 123.0012.43c1.6f20, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms

IPX has end-to-end reachability.

Now let's block IPX unless the COS is 5.

mac access-list extended block-ipx
 permit host 0013.c460.2be0 host 0012.43c1.6f20 0x8137 0x0 cos 5
 deny   any any

Rack1SW1(config)#int fa0/1
Rack1SW1(config-if)#mac access-group block-ipx in

Rack1R1#ping ipx 123.0012.43c1.6f20
Type escape sequence to abort.
Sending 5, 100-byte IPX Novell Echoes to 123.0012.43c1.6f20, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

Does IP still work?

Rack1R1#ping 192.168.0.3
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.0.3, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/4/4 ms

Yes ... for now.

Rack1R1#clear arp
Rack1R1#ping 192.168.0.3
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.0.3, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

Uh-oh!  Let's fix this.

Rack1SW1(config)#no mac access-list extended block-ipx
Rack1SW1(config-ext-macl)#permit host 0013.c460.2be0 host 0012.43c1.6f20 0x8137 0x0 cos 5
Rack1SW1(config-ext-macl)#permit any any 0x806 0x0  ! ARP
Rack1SW1(config-ext-macl)#deny any any

Rack1R1#ping 192.168.0.3
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.0.3, timeout is 2 seconds:
.!!!!
Success rate is 80 percent (4/5), round-trip min/avg/max = 4/4/4 ms

Great.  Now let's send some COS 5 IPX traffic and see if it goes through.

policy-map setcos5
 class class-default
  set cos 5

Rack1R1(config)#int fa0/0.705
Rack1R1(config-subif)#no service-policy output setdscpef
Rack1R1(config-subif)#service-policy output setcos5
Rack1R1#ping ipx 123.0012.43c1.6f20
Type escape sequence to abort.
Sending 5, 100-byte IPX Novell Echoes to 123.0012.43c1.6f20, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms

That's how to match on a MAC access-list in a nutshell.  Now let's classify on one; this is a QoS article, after all.

mac access-list extended IPX-COS5
 permit any any 0x8137 0x0 cos 5

class-map match-all match-macl
 match access-group name IPX-COS5

policy-map change_ipx_cos
 class match-macl
  set dscp af21

Any traffic that ACL permits (COS 5 IPX, in this case) will be remarked to... AF21?  There is no "set cos" command in a policy-map, so this is our option.  I go over these types of matches in great detail in part 1:
http://brbccie.blogspot.com/2013/02/catalyst-3560-qos-1-of-2-order-of.html

Verifying this is something of a pain.  We have to use a second switch to match the non-IP COS.  So now we'll be pinging R1 to R2, via SW1 and SW2.

Read part 1 for a walkthrough of what should happen here, but in short, we'll expect:
- R1 to send COS 5 IPX ping through SW1
- SW1 will take COS 5, set the internal QoS label to DSCP AF21 (via service policy)
- SW1 will then ingress & egress queue on DSCP AF21
- SW1 will then set COS 2 based on the DSCP -> COS table
- SW2 will receive COS 2 and trust it
- SW2 will ingress queue on COS 2, and we'll see the queue hits

A little extra configuration on SW2:

Rack1SW2(config)#mls qos srr-queue input cos-map queue 2 threshold 3  2
Rack1SW2(config)#interface fa0/13 ! this is the interface traffic will ingress from SW1 on
Rack1SW2(config-if)#mls qos trust cos

Rack1R1#ping ipx 123.0019.e880.09c0 ! 0019.e880.09c0 is R2's Fa0/0 mac address
Type escape sequence to abort.
Sending 5, 100-byte IPX Novell Echoes to 123.0019.e880.09c0, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/4 ms
Rack1SW2#show platform port-asic stats enqueue port 2 asic 1 | b RxQueue Enqueue
  RxQueue Enqueue Statistics
    Queue 0
      Weight 0 Frames 0
      Weight 1 Frames 72
      Weight 2 Frames 0
    Queue 1
      Weight 0 Frames 0
      Weight 1 Frames 0
      Weight 2 Frames 0
    Queue 2 <-- ACTUALLY QUEUE 1
      Weight 0 Frames 701325
      Weight 1 Frames 0
      Weight 2 Frames 5815
    Queue 3 <-- ACTUALLY QUEUE 2
      Weight 0 Frames 44
      Weight 1 Frames 0
      Weight 2 Frames 5

For clarification on the "show platform port-asic" command, read part 1. 

The hits on queue 2 prove COS 2 was the ingress value.

Trusting Cisco phones

If you're supporting a phone installation, you need to trust the QoS from the phone, and generally dump the QoS value from the PC behind it.

To trust the phone:
mls qos trust device cisco-phone
mls qos trust cos
OR
mls qos trust dscp

Of note, COS can be trusted on a phone interface, as it's technically a dot1q trunk (even if Cisco doesn't call it one).

Other tuning options are:
switchport priority extend trust ! this will trust the PC behind the phone
switchport priority extend cos 1 ! reset the PC's COS to 1

Ingress Policing

This uses policy-maps like the ones we saw earlier.  Adding policing is very easy.

ip access-list extended icmp_acl
 permit icmp any any

class-map match-all icmp_cm
 match access-group name icmp_acl

policy-map remark_icmp
 class icmp_cm
  police 8000 8000 exceed-action drop
  trust dscp

interface Fa0/1
 service-policy input remark_icmp

Recall from above that we must re-trust DSCP on the policy map.

I'm not going to bother labbing this; the outcome is pretty obvious.  Let's look at the DSCP markdown option instead.  The global policed-dscp map determines how the markdown works.

mls qos map policed-dscp 46 to 18

interface Fa0/1
 no service-policy input remark_icmp

no policy-map remark_icmp
policy-map remark_icmp
 class icmp_cm
  police 8000 8000 exceed-action policed-dscp-transmit
  trust dscp

interface Fa0/1
   service-policy input remark_icmp

Now we should end up with some mix of EF and AF21. 

Rack1R1#ping 192.168.0.3 repeat 100
Type escape sequence to abort.
Sending 100, 100-byte ICMP Echos to 192.168.0.3, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Success rate is 100 percent (100/100), round-trip min/avg/max = 1/3/4 ms

Rack1R3#show ip access-list | i match
    200 permit ip any any dscp af21 (87 matches)
    480 permit ip any any dscp ef (213 matches)

Big important note before we move on: the 3560 cannot hierarchically police outbound!  (It can, however, rate-limit the entire interface.)  To get a similar function to hierarchical policing, you must either be careful with ingress policing, or put the traffic in a queue with a small shaped flow on egress, which we'll talk about later.

Aggregate Ingress Policing

An aggregate ingress policer is a policer that performs its "counting" across multiple classes.

mls qos aggregate-policer POLICETEST 10000 8000 exceed-action policed-dscp-transmit

ip access-list extended icmp_acl
  permit icmp any any
ip access-list extended tcp_acl
  permit tcp any any

class-map match-all icmp_cm
  match access-group name icmp_acl
class-map match-all tcp_cm
  match access-group name tcp_acl

policy-map aggregate_policer
  class icmp_cm
    police aggregate POLICETEST
    trust dscp
  class tcp_cm
    police aggregate POLICETEST
    trust dscp

interface Fa0/1
  no service-policy input remark_icmp
  service-policy input aggregate_policer

Not going to bother labbing this one, but it works just like it looks!

VLAN-based Marking

The 3560 can mark traffic VLAN-wide.  It cannot, however, police VLAN-wide. 

The service policy gets applied to the SVI.  The SVI does not need an IP address.  Even though the policy is applied to the SVI, you still need to tell the ports you want involved that their QoS is derived from the VLAN.  This is done with mls qos vlan-based on the interfaces in question.

The ports involved can be access or trunk ports.

One oddity is that class-default is ignored on the policy-map.

ip access-list extended icmp_acl
  permit icmp any any

class-map match-all per_vlan_cm
 match access-group name icmp_acl

policy-map per_vlan
 class per_vlan_cm
   set dscp af13

interface fa0/1
 no service-policy input aggregate_policer
 mls qos vlan-based

interface fa0/13
 mls qos vlan-based

int vlan705
  service-policy input per_vlan

I've also added my aforementioned giant all-DSCP / all-precedence ACL to R1.  We'll be pinging toward R1 from R3, and toward R3 from R1, and we should see DSCP AF13 on each of them.  I've also prepped the port R3 is connected to on SW1:

Rack1SW1(config-if)#int fa0/3
Rack1SW1(config-if)#mls qos trust dscp

Rack1R1#ping 192.168.0.3
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.0.3, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/4/4 ms

Rack1R3#show ip access-list | i match
    160 permit ip any any dscp af13 (15 matches)

and vice-versa...

Rack1R3#ping 192.168.0.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.0.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/4/8 ms

Rack1R1#sh ip access-list | i match
    160 permit ip any any dscp af13 (15 matches)

I've temporarily added VLAN 706 to both R1 and R3, along with the appropriate underlying switching. We should see that the marking does not happen on VLAN 706.  Its subnet is 192.168.1.0/24, and it has a similar QoS ACL applied on the routers to check the matches.

Rack1R1#ping 192.168.1.3
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.1.3, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/4/8 ms

Rack1R3#clear ip access-list counters
Rack1R3#sh ip access-list | i match
    660 permit ip any any precedence routine (15 matches)

No AF13 .... now for vice-versa:

Rack1R3#ping 192.168.0.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.0.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/5/8 ms

Rack1R1#sh ip access-list | i match
    660 permit ip any any precedence routine (15 matches)

Per-Port, Per-VLAN Policing

The trick to PPPV policing is using a nested service policy and having the child class match the ports involved.  I mentioned above that you can't police VLAN-wide.  You can specify more than one port in the class-map, but every port is individually policed; the policing is not aggregated across ports.

! the traffic we want to match
ip access-list extended icmp_acl
  permit icmp any any

! this matches the traffic we want to match
class-map match-all PPPV_parent_cm
  match access-group name icmp_acl

! what interfaces are involved
class-map match-all PPPV_child_cm
  match input-interface FastEthernet0/1 FastEthernet0/3

! this is the actual (child) policer
policy-map PPPV_child_pm
  class PPPV_child_cm
    police 16000 8000 exceed-action drop

! the parent policy is what gets applied to the interface
policy-map PPPV_parent
  class PPPV_parent_cm
   trust dscp
   service-policy PPPV_child_pm

! we already set these ports to vlan-based QoS, but I added it here for clarity
interface Fa0/1
 mls qos vlan-based

interface Fa0/3
 mls qos vlan-based

! apply it to the SVI.  No IP required on the SVI.
interface Vlan705
 service-policy input PPPV_parent

The idea here is that R1 can ping R3, and R3 can ping R1, and both flows will be policed.  However, as I mentioned, the aggregate of R1 + R3's ICMP input is not added up.  They're both individually policed at 16,000 bits per second.  Now, this is really hard to show in a blog, so if I start both pings more-or-less at the same time, I can show you the output and we can come to some rough conclusions.

Rack1R1#ping 192.168.0.3 repeat 100
Type escape sequence to abort.
Sending 100, 100-byte ICMP Echos to 192.168.0.3, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!.!!!!!!!!!!!!!!!!
!!.!!!!!!!!!!!!!!!!!!.!!!!!!!!
Success rate is 97 percent (97/100), round-trip min/avg/max = 4/5/8 ms

97% success rate.

Rack1R3#ping 192.168.0.1 repeat 100
Type escape sequence to abort.
Sending 100, 100-byte ICMP Echos to 192.168.0.1, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!.!!!!!!!!!!!!!!!!!!.!!!!!!!!!
Success rate is 98 percent (98/100), round-trip min/avg/max = 4/5/8 ms

And 98% success rate.

If the two flows were somehow being policed aggregate, we'd expect a 50% success rate from both, or a 98% on one and 0% on the other.  Clearly, they are being individually policed.

Shared Mode & Shaped Mode

There are two forms of queuing supported on the 3560: shared mode & shaped mode.

Shared mode is based on relative weights; shaped mode is based on percentages.

For example, on the four egress queues, setting:
srr-queue bandwidth share 25 25 25 25

would give each queue 1/4 of the bandwidth, with the ability to burst to line rate if the other queues are not using their share.

However, this is not a percentage.

Using:
srr-queue bandwidth share 98 98 98 98

would have the same exact effect.

srr-queue bandwidth share 200 100 50 25

This would give the first queue about 53.3%, the second about 26.7%, the third about 13.3%, and the fourth about 6.7%.  The easy way to do the math is to add all four weights up and divide each queue's weight by the total.  In other words, the sum of all four queues is 375.  Divide the first queue's weight, 200, by 375, and you get 53.3%.
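The weight math above can be sketched in a few lines of Python (just the arithmetic from the paragraph, nothing switch-specific):

```python
def shared_percentages(weights):
    """Convert SRR shared weights into bandwidth percentages.
    Each queue gets weight / sum(weights) of the link; only ratios matter."""
    total = sum(weights)
    return [round(100 * w / total, 1) for w in weights]

print(shared_percentages([25, 25, 25, 25]))    # [25.0, 25.0, 25.0, 25.0]
print(shared_percentages([98, 98, 98, 98]))    # same even split - not a percentage
print(shared_percentages([200, 100, 50, 25]))  # [53.3, 26.7, 13.3, 6.7]
```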

For egress queues, the shared weights are configured per-port; for ingress queues, the weights are configured globally.  Importantly, the ingress queues only support shared mode, whereas the egress queues can be shared or shaped.  Ingress shared mode is configured globally via mls qos srr-queue input bandwidth Q1value Q2value.

Shaped mode is based on percentages.  Shaped mode is configured by:

srr-queue bandwidth shape 4 4 4 4

This would give 1/4 of the bandwidth to each queue, but in a totally different fashion than shared mode.  The percentages are calculated as 1/value.  So if you specify 4 for queue 1, you get 1/4th. 

Using:

srr-queue bandwidth shape 2 6 6 6

would give 50% of the bandwidth (1/2) to queue 1, and ~16% to each of the remaining queues (1/6).

Shaped mode's claim to fame is that it guarantees the bandwidth availability, but also polices the queue to that rate.  In other words, queue 1 could not borrow any more bandwidth if it needed to burst to 75% of the interface rate; 50% is its absolute limit.

Shaped mode and shared mode can be mixed.

srr-queue bandwidth shape 2 4 0 0
srr-queue bandwidth share 50 50 50 50

Shaped mode always gets its share of the bandwidth "first".  This isn't to imply some sort of priority, just that the math works that way.  In this case, 1/2 + 1/4 = 75% of the bandwidth is taken by shaped mode.  Shaped mode takes config priority over shared mode, so the first two queues' values of 50 are ignored.  The last two queues, having equal weights, would share the remaining 25% evenly amongst each other, resulting in 12.5% each.
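My reading of the mixed-mode math above can be sketched as follows (a simplified static model; real SRR schedules round by round, but the resulting shares come out the same):

```python
def srr_split(shape, share):
    """Approximate egress bandwidth split when shaped and shared queues mix.
    shape[i] > 0 shapes queue i to 1/shape[i] of the link and its shared
    weight is ignored; shaped queues are carved out first, then the
    remainder is divided among shared queues by weight."""
    shaped = [100 / s if s else 0 for s in shape]
    remaining = 100 - sum(shaped)
    shared_w = [0 if s else w for s, w in zip(shape, share)]
    total_w = sum(shared_w)
    return [shaped[i] if shape[i] else remaining * shared_w[i] / total_w
            for i in range(len(shape))]

# shape 2 4 0 0 combined with shared weights 50 50 50 50, per the example above
print(srr_split([2, 4, 0, 0], [50, 50, 50, 50]))  # [50.0, 25.0, 12.5, 12.5]
```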

I mentioned above you could rate-limit an interface on egress, but couldn't hierarchically police it.  Rate limiting is accomplished with: srr-queue bandwidth limit Value-in-Percent

Value-in-Percent can be any number from 10 to 90.  What if you want less than 10 Mb/s?  Simple answer:

interface Fa0/1
  speed 10
  srr-queue bandwidth limit 50

Reduce the interface to 10 Mb/s; that config would then give you 5 Mb/s outbound.

Weighted Tail Drop

If you've read part 1 and the rest of this post, you've seen me refer to thresholds over and over again.  The three thresholds are the basis for WTD (weighted tail drop) and buffers.  The idea behind WTD is that some traffic should be dropped before all the buffers assigned to the queue are used up (remember, buffers are per queue, not per interface).  There are three WTD thresholds per queue.  The first two are configurable; the third is locked at 100%.

For example, if you wanted ingress FTP packets to drop first when congestion begins, then ingress SQL traffic to drop when congestion is getting rather bad, then ingress voice traffic to drop dead last, you might map FTP to threshold 1, SQL to threshold 2, voice to threshold 3, and then configure:

mls qos srr-queue input threshold 1 25 75

The first 1 is the queue in question, the 25 and 75 are the percentage values at which to begin dropping.  So FTP drops at 25%, SQL drops at 75%, and voice drops at the implicit 100% when there's no other option, because we're so congested we're out of buffers.

I scratched my head just a little bit when I first saw this configuration.  For example, this is valid:

mls qos srr-queue input threshold 1 75 90

My brain just had a hard time with those numbers; it kept trying to add 75 + 90 (which is greater than 100).  There's no addition taking place here.  It just means tail drop happens for threshold 1 traffic when the buffers for the entire queue are 75% full.  Then, at 90%, tail drop happens for threshold 2.  Threshold 3 traffic drops at 100%, when we're so congested we're out of buffers.
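The drop decision can be modeled like so (a simplified sketch using the FTP/SQL/voice example values from above; the hardware actually works on buffer counts, not percentages):

```python
def wtd_drop(queue_fill_pct, frame_threshold, th1=25, th2=75):
    """Weighted tail drop decision for one queue.
    frame_threshold is 1, 2, or 3; threshold 3 is fixed at 100%."""
    limits = {1: th1, 2: th2, 3: 100}
    return queue_fill_pct >= limits[frame_threshold]

# mls qos srr-queue input threshold 1 25 75
print(wtd_drop(30, 1))   # True  - FTP (threshold 1) drops once the queue passes 25%
print(wtd_drop(30, 2))   # False - SQL (threshold 2) survives until 75%
print(wtd_drop(99, 3))   # False - voice (threshold 3) only drops at a full queue
```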

That covers ingress thresholds, now let's talk about egress.

Queue-Sets

Queue-sets are used to set buffers and WTD for egress queues.

show mls qos queue-set

will show you the existing queue-sets. 

Rack1SW1(config-if)#do show mls qos queue-set
Queueset: 1
Queue     :       1       2       3       4
----------------------------------------------
buffers   :      25      25      25      25
threshold1:     100     200     100     100
threshold2:     100     200     100     100
reserved  :      50      50      50      50
maximum   :     400     400     400     400
Queueset: 2
Queue     :       1       2       3       4
----------------------------------------------
buffers   :      25      25      25      25
threshold1:     100     200     100     100
threshold2:     100     200     100     100
reserved  :      50      50      50      50
maximum   :     400     400     400     400

All ports are members of queue set 1 by default.

The 3560 has both per-port buffers and a common buffer pool that ports can borrow from.
I found this chart rather hard to read at first, so let's talk about it more, using the defaults from queue-set 1 as an example:

Queue : 1 2 3 4
----------------------------------------------
buffers : 25 25 25 25
threshold1: 100 200 100 100
threshold2: 100 200 100 100
reserved : 50 50 50 50
maximum : 400 400 400 400

This chart is generally read vertically.  It says that queue 1 gets 25% of the interface buffers.  Queue 1, threshold 1 (explained in the previous section) is set at 100%.  Queue 1, threshold 2 is also set at 100%.  Queue 1, threshold 3 isn't on the chart, as it can't be modified (always 100%).  At least 50% of the queue's buffers are reserved for it at all times.  The queue may use up to 400% (4x its normal quantity) of buffers if they're available.  The extra buffers can be borrowed from other queues if they're not being used, or from the common buffer pool, if available.
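To make the vertical reading concrete, here's a sketch that turns the queue-set percentages into buffer counts. The 200-buffer per-port pool is purely a hypothetical number for illustration, not a 3560 spec:

```python
def queue_buffers(port_pool, buffers_pct, reserved_pct, maximum_pct):
    """Translate queue-set percentages into buffer counts for one queue."""
    allocation = port_pool * buffers_pct // 100   # nominal share of the port pool
    reserved   = allocation * reserved_pct // 100 # guaranteed floor
    maximum    = allocation * maximum_pct // 100  # ceiling, via borrowing
    return allocation, reserved, maximum

# Queue 1 defaults from queue-set 1: buffers 25, reserved 50, maximum 400,
# against a hypothetical pool of 200 buffers on the port.
print(queue_buffers(200, 25, 50, 400))  # (50, 25, 200)
```

So with these assumed numbers, queue 1 nominally owns 50 buffers, is always guaranteed 25, and can balloon to 200 if neighbors and the common pool have spare.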

The configuration works as so:

mls qos queue-set output 1 buffers 50 25 10 15
mls qos queue-set output 1 threshold 2 33 66 100 200

That would get you these results:

Queueset: 1
Queue     :       1       2       3       4
----------------------------------------------
buffers   :      50      25      10      15
threshold1:     100      33     100     100
threshold2:     100      66     100     100
reserved  :      50     100      50      50
maximum   :     400     200     400     400

The first command - mls qos queue-set output 1 buffers 50 25 10 15 - sets all four per-queue buffer percentages, horizontally across the chart.

The second command - mls qos queue-set output 1 threshold 2 33 66 100 200 - sets the thresholds, reserved, and maximum buffers vertically for one individual queue (queue 2 here).

Changing which queue-set an interface belongs to is easy:

Rack1SW1(config)#int fa0/1
Rack1SW1(config-if)#queue-set 2

Priority Queuing

PQ can be used both ingress and egress.  Contrary to what I've read in many other places, the PQ is enabled on ingress by default.  It is, however, disabled on egress by default.

On ingress, the PQ can be either queue 1 or queue 2.  It also has a configurable bandwidth parameter.  The bandwidth is set at 10% on queue 2 by default.

Setting/changing both the ingress queue and bandwidth can be done via:

mls qos srr-queue input priority-queue Queue-Num bandwidth Reserved-BW-Percentage

for example,

mls qos srr-queue input priority-queue 1 bandwidth 40

So, what happens to the shared mode values at this point?

Best I can tell, the PQ is serviced completely until its bandwidth is used up, and then the shared weights kick in.  So the priority queue is serviced up to its configured value, and can then continue to transmit as a shared queue alongside its neighbor queue.  The command above, setting a 40% PQ on queue 1, would take up to 40% of the bandwidth; the remaining 60% is split amongst the shared queues, including the same queue the PQ sits on.

So if you:

Rack1SW1(config)#mls qos srr-queue input bandwidth 50 100

Queue 1 would get another 1/3 of the bandwidth (ratio of 50:100), and since we have 60% left, that's 20% more in addition to whatever it got from the PQ.  Queue 2 would get the other 40% left over.
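That split can be double-checked with a little arithmetic (this encodes my "best I can tell" reading above, so treat it as a model, not gospel):

```python
def ingress_split(pq_queue, pq_bw_pct, share_weights):
    """Ingress bandwidth per queue: the PQ takes its percentage first, then
    the leftover is divided by the shared weights (the PQ's queue still
    participates in the shared round)."""
    remaining = 100 - pq_bw_pct
    total = sum(share_weights)
    split = [remaining * w / total for w in share_weights]
    split[pq_queue - 1] += pq_bw_pct  # add the PQ carve-out back to its queue
    return split

# priority-queue 1 bandwidth 40, then shared weights 50 100
print(ingress_split(1, 40, [50, 100]))  # [60.0, 40.0]
```

Queue 1 ends up with 40% (PQ) + 20% (its 1/3 share of the remaining 60%), and queue 2 gets the other 40%.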

If you want to turn off the PQ for ingress entirely, you can:

mls qos srr-queue input priority-queue 2 bandwidth 0

Enabling the egress PQ is easy:

Rack1SW1(config)#int fa0/1
Rack1SW1(config-if)#priority-queue out

The egress PQ is always queue 1.

There's an important difference, however: the egress PQ has no shaper, policer, or other limiter on it.  It will be serviced non-stop as long as it has traffic, and the other queues will starve.  Be mindful with your ingress policing that too much priority traffic doesn't arrive for egress!

That means even if you set a shaper of 20 percent on that queue, it will simply ignore the shaper value and keep using 100% of the interface until it's done transmitting.

Hope everyone enjoyed and learned.

Cheers,

Jeff
