Sunday, December 16, 2012

The nitty-gritty of WRED

WRED is such a simple topic at first glance.  It's effective congestion avoidance, and it can be enabled with one command!  If you dig a little deeper, however, there can be quite a lot there.  I've been avoiding doing the WRED deep-dive for a while now, but it finally caught up with me. 

I assume most anyone reading this understands what WRED does at a high level already, so I will only touch on the general idea.  Any interface whose transmit (egress) buffer fills up goes into tail drop.  Tail drop is a state where all newly arriving packets are dropped.  This is bad, because if TCP sessions are running through that interface, the packet loss will cause every TCP session caught in the tail drop to decrease its window size and go into slow start at the same time.  This process is called global synchronization.  It produces a saw-tooth effect on traffic graphs, as all TCP flows slow down at once, gradually speed back up, experience congestion/packet loss at the same time, and then repeat the slowdown, indefinitely.

RED (random early detection) solves this problem by randomly dropping packets prior to the transmit buffer filling up.  The idea is that some TCP flows will go into slow start instead of all of them.  Theoretically, tail drop is avoided, and therefore global synchronization is also avoided.  It's of note that RED/WRED does absolutely nothing for UDP flows, as UDP flows don't have a transport-layer ACK, and therefore there's no way to know at the UDP level if packets haven't been received.  Therefore, UDP cannot implement a slow-start at the transport layer.  If you have an entire interface full of UDP traffic, there's no benefit to running RED at all.

Cisco only implements WRED, as opposed to RED.  WRED is Weighted Random Early Detection, and takes into account IP Precedence (default) or DSCP values, making the "less important" flows get more aggressively dropped. 

WRED can be implemented in two fashions:
1) Directly on an interface
2) As part of a CBWFQ policy

The basic usage of either is simple:

interface Serial0/0
 random-detect

policy-map example
 class class-default
  fair-queue  ! or a bandwidth command
  random-detect

interface Serial0/0
 service-policy output example

To see the default settings...

...if you applied it on the interface:
show queueing interface Serial0/0

...if you applied it as part of CBWFQ:
show policy-map interface Serial0/0

You'll see in both cases that IP Precedence is used for the weighting by default.
If you look at each value, you'll see the default minimum threshold increases as the IPP goes up.

WRED turns any queue - be that the queueing on a physical interface, or the sub-queueing on a CBWFQ - into a FIFO.  The reasoning is obvious - it's not logical to apply a packet-dropping algorithm to a tiered queue.  If you enable WRED straight on a low-speed serial interface, where WFQ would be enabled by default, it will silently switch it back to a FIFO.

The minimum threshold is the FIFO's queue depth at which WRED kicks in.  The overall depth of the queue is specified by:

interface Serial0/0
  hold-queue X out

Where X is the overall depth of the queue.  In fact, it's completely possible to enable WRED but have it do absolutely nothing if you set the hold-queue smaller than the minimum threshold -- might be useful for some obscure lab question.

The default outbound hold-queue depth is 40.  You can check the setting & current hold-queue depth with show queueing interface serial0/0.

The minimum & maximum thresholds can be tuned on a per-IPP basis.  Let's say you want to enable WRED on a physical interface, but don't want it to drop IPP 5 & 6 until there's no other choice (tail drop).  Let's also move tail drop out to 100 packets.

interface Serial0/0
 random-detect precedence 5 101 102 10
 random-detect precedence 6 101 102 10
 hold-queue 100 out

We've stopped tail drop from occurring until the queue hits 100 packets by using hold-queue 100 out.
random-detect precedence 5 101 102 10
Specifies IPP 5 should start randomly dropping at 101 (min thresh), and go to full drop at 102 (max thresh), and uses an MPD of 10 (more on this later).  Technically in this case, min thresh and max thresh could be any value over 100.

MPD, Mark Probability Denominator, defines the maximum fraction of packets randomly dropped as the queue reaches max-threshold, right before full drop takes over.  In other words, assuming a hold-queue size of 100:

random-detect precedence 1 50 100 4

Would define that IPP 1 should begin dropping packets at a depth of 50, and go to full drop past 100.  If the queue was at 100, it would drop 25% of the packets at IPP 1.  25% is calculated by 1/MPD.  In our case, 1/4.  If the queue was just over 50, it would drop a much smaller percentage, and that percentage works its way up linearly to the full 25% as the queue deepens.  At 101, it goes to full drop.
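
To make the arithmetic concrete, here's a rough Python sketch of the drop curve for a single profile.  The function name is my own invention, and it assumes min thresh is lower than max thresh and that the climb between them is a straight line - treat it as an illustration, not as what IOS literally runs.

```python
def wred_drop_probability(avg_depth, min_th, max_th, mpd):
    """Approximate drop probability for one WRED profile.

    avg_depth: average queue depth, in packets
    min_th / max_th: minimum and maximum thresholds (assumes min_th < max_th)
    mpd: mark probability denominator (peak random drop rate is 1/mpd)
    """
    if avg_depth < min_th:
        return 0.0   # below min thresh: nothing is dropped
    if avg_depth > max_th:
        return 1.0   # past max thresh: full drop
    # Between the thresholds the probability climbs linearly up to 1/mpd.
    return (avg_depth - min_th) / (max_th - min_th) / mpd


# random-detect precedence 1 50 100 4
for depth in (40, 50, 75, 100, 101):
    print(depth, wred_drop_probability(depth, 50, 100, 4))
```

Feeding it the 101/102 thresholds from the IPP 5 example with any depth of 100 or less returns 0, which is the whole point of that configuration.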

Let's look at the DSCP implementation now.

interface Serial0/0
  random-detect dscp-based

show queueing interface Serial0/0

If you scan your eyes down the minimum threshold column, you'll probably notice the numbers aren't nearly as linear as they were with IPP.  Cisco follows the DSCP RFC.  For example, AF23 and AF33 have the same minimum threshold.  That's because the second digit - "3" - determines the drop precedence, while the first digit is just the class.
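
If the AFxy naming is fuzzy, this little Python sketch splits an AF DSCP value into its class and drop precedence.  It relies on the standard encoding (AFxy = 8x + 2y); the function name is just something I made up for the example.

```python
def decode_af(dscp):
    """Split an Assured Forwarding DSCP value into (class, drop precedence).

    AFxy is encoded as 8*x + 2*y, with x in 1-4 and y in 1-3.
    Returns None for values that are not AF code points.
    """
    af_class, remainder = divmod(dscp, 8)
    drop_precedence, odd_bit = divmod(remainder, 2)
    if odd_bit or not 1 <= af_class <= 4 or not 1 <= drop_precedence <= 3:
        return None
    return af_class, drop_precedence


for name, dscp in (("AF23", 22), ("AF33", 30), ("AF21", 18), ("EF", 46)):
    print(name, dscp, decode_af(dscp))
```

AF23 and AF33 come back with different classes but the same drop precedence, which is why they share a minimum threshold.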

Preventing EF and CS5 from being randomly dropped, similar to the task we performed above with IPP, would be accomplished as:

interface Serial0/0
 random-detect dscp-based
 random-detect dscp cs5 101 102 10
 random-detect dscp ef 101 102 10
 hold-queue 100 out

Another thing of note is that the queue depth is measured on the main interface queue, not the per-dscp or per-IPP queue.

WRED works very similarly in a CBWFQ.  It is applied per traffic class.  You can even match DSCP in some queues and IPP in others.  The show policy-map interface Serial0/0 output gets rather lengthy in these scenarios!

Another variable of note is the exponential weighting constant. 

WRED doesn't work based on the actual queue depth.  It works on the average queue depth.  The average is updated based on the exponential weighting constant.

The actual definition of the exponential weighting constant is complicated; please see if you want the mathematical definition.  The long and short of it is as follows:

- 9 is the default.  The valid range is 1-16
- Changing the number down makes WRED react more quickly
- Changing the number up makes WRED react more slowly

The command usage is:
random-detect exponential-weighting-constant 5
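
For a feel of what the constant does, here's a small Python sketch of the averaging step, using the commonly documented formula: average = old_average * (1 - 1/2^n) + current_depth * (1/2^n), where n is the exponential weighting constant.  The function names are mine, and recalculating once per arriving packet is an assumption made for the sake of the demo.

```python
def update_average(old_average, current_depth, n=9):
    """One averaging step; n is the exponential weighting constant."""
    weight = 2 ** -n
    return old_average * (1 - weight) + current_depth * weight


def average_after(packets, current_depth, n):
    """Average seen after `packets` updates with the real queue stuck at current_depth."""
    average = 0.0
    for _ in range(packets):
        average = update_average(average, current_depth, n)
    return average


# The real queue jumps to 40 and stays there for 100 packets.
for n in (5, 9):
    print(n, round(average_after(100, 40, n), 1))
```

With the lower constant the average has nearly caught up with the real depth after 100 packets; with the default of 9 it has barely started moving.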

ECN - Explicit Congestion Notification - is a mode of WRED that can be enabled to suggest that a traffic flow slow down, instead of actually dropping packets.

The WRED usage is exactly the same, except instead of dropping the packet, the packet is flagged with a congestion notification.

The system works like so-

If a TCP host supports ECN, it sets either (but not both) of the two low-order bits in the ToS byte - ECT or CE - to 1.  (The DSCP only uses the upper six bits of that byte.)  If it doesn't support ECN, both bits should be set to 0.  Now, when WRED with ECN enabled would have dropped a packet from an ECN-capable host, it instead sets both ECT and CE to 1, and the router forwards the packet on to the destination.  Packets with both bits at 0 are still dropped as usual.

When the destination receives a TCP packet with both ECT and CE set to 1, it sets the ECE (ECN-Echo) flag on its next TCP packet back to the sender.  When the sender receives that packet, it should cut its congestion window just as it would for a dropped packet, all without packet loss.  In addition, it sets the CWR (Congestion Window Reduced) flag on its next packet towards the receiver to let it know it got the message.
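
The router's side of that exchange can be sketched in a few lines of Python.  The names below are hypothetical, and this only models the random-drop region between the thresholds; what happens past max thresh is left out.

```python
NOT_ECT = 0b00  # both bits 0: sender is not ECN-capable
CE = 0b11       # both bits 1: congestion experienced


def wred_ecn_action(tos, wred_would_drop):
    """Return (action, tos) for a packet on a WRED queue with ECN enabled."""
    ecn = tos & 0b11                 # two low-order bits of the ToS byte
    if not wred_would_drop:
        return "forward", tos        # WRED left this packet alone
    if ecn == NOT_ECT:
        return "drop", tos           # endpoint can't be signalled, so drop it
    return "forward", tos | CE       # mark (or leave marked) and forward


ef_ect = (46 << 2) | 0b10            # DSCP EF, one ECN bit set
print(wred_ecn_action(ef_ect, wred_would_drop=True))
print(wred_ecn_action(46 << 2, wred_would_drop=True))
```

The DSCP portion of the byte is untouched either way; only the bottom two bits change.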

The configuration is simple:

interface Serial0/0
  random-detect ecn

Our last topic is flow-based WRED.  One of the disadvantages of WRED is that it truly does drop packets at random.  It might drop a bunch of TCP packets from the same flow, or it might target a flow that isn't taking much bandwidth.  Flow-based WRED addresses these issues by making a sort of pseudo-WFQ inside the WRED.  Of note, flow-based WRED does not work with CBWFQ, so we're only talking about on-the-interface WRED.  Since we can't specify any other type of queueing method on the interface while WRED is enabled, it makes sense for WRED to have a more complicated queueing method.

Flow-based WRED - sometimes referred to as FRED - will create a "pseudo-queue" (that's my own term, don't quote it) for each flow passing through the WRED.  It then divides the buffer size by the number of flows.  Let's say you have the default queue size of 40, and 20 flows.  Each flow would be allowed 2 packets in the queue, multiplied by a scaling factor to allow for bursts.  The default scaling factor is 4.  So, each flow would be allowed 8 packets.

In this algorithm, if a flow had more than 8 packets in the queue, it would be penalized more heavily. 
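
The per-flow allowance is simple enough to sketch in Python.  The function name is made up, and this only reproduces the arithmetic described above, not how IOS tracks flows internally.

```python
def per_flow_limit(queue_size, active_flows, scaling_factor=4):
    """Packets one flow may hold in the queue before it gets penalized."""
    fair_share = queue_size / active_flows   # even split of the buffer
    return fair_share * scaling_factor       # headroom for bursts


print(per_flow_limit(40, 20))       # default queue of 40, 20 flows, factor 4
print(per_flow_limit(40, 20, 8))    # same queue with a scaling factor of 8
```

Notice the allowance shrinks as more flows show up, so a single greedy flow stands out more on a busy interface.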

The configuration is relatively easy:

Interface Serial0/0
  random-detect flow
  random-detect flow count 128
  random-detect flow average-depth-factor 8 ! must be a power of 2

This would allow for 128 unique flows, and allow a multiplier of 8 for burst.

Hope you enjoyed my WRED deep-dive.




  1. Jeff, you did a great job on this topic I have always searched this topic on Cisco website and on Google but very few people explain it the way you did. Thank you so much man but unfortunately I'm only seeing this post in November 2015 and your post was written back in 2012. Therefore I think it's been a long time and what I was going to ask you will definitely not be applicable now. I was going to ask you to do more topics on QoS in general since this is one of the most misunderstood subject in the world of networking. Thank you so much for your post..

    1. Thanks for the kind words! However, it's unlikely I'll be writing any more QoS up in the immediate future. I threw a teaser on my main page that I'd be blogging CCNP Voice (Collab) in the near future, and I've actually dropped that as well, in favor of more CUBE/CUCM/3rd-party SBC research, as that's my voice niche. Those articles should be coming... eventually :)

  2. My name is Lungelo by the way from South Africa...

  3. Thank you for your post, it is very clear.
    I would like to ask you the following question:
    "In WRED, is it possible to dynamically tweak the min- and max-threshold values?"
    My problem is that I would like to change some WRED parameters according to the network performance.

  4. Jeff, thank you for this post. I'm really enjoying turning to your blog for clear answers when a lot of other articles talk about general and sometimes vague stuff.

    The way I've seen WRED implemented in my organization doesn't look correct. They have got WRED implemented in AF4x class without filtering UDP from TCP traffic causing poor quality live video streams and have set MPD value to 1. When I try to seek explanation, I'm told that the EWC is playing a role to exponentially increase drops as Average queue Size moves to maximum threshold from minimum threshold so that all packets are dropped by the time Average Queue size hits the maximum threshold. I'm unable to buy this logic. What would you say? Is it just plain wrong or my interpretation is wrong?

    1. It doesn't seem a good plan to mix important UDP flows into a queue with WRED enabled, ever. I can imagine some use cases where MPD should be set to 1. I can't see how adjusting EWC is going to change the situation - EWC just controls how "knee jerk" the process is. These sort of things are hard to judge without all the details, but what you're describing doesn't seem like a proper configuration.

  5. Great article, but have a question.

    You mention the following:

    "In fact, it's completely possible to enable WRED but have it do absolutely nothing if you set the hold-queue smaller than the minimum threshold -- might be useful for some obscure lab question."

    Then in the example below you appear to do just that for IPP 5 and 6?

    "interface Serial0/0
    random-detect precedence 5 101 102 10
    random-detect precedence 6 101 102 10
    hold-queue 100 out"

    So in this case, WRED will have no effect at all on IPP5 and 6.

    Once again, great article!