Thursday, December 27, 2012

A different perspective on CIR, PIR, Tc, Bc & Be

The best topics in my CCIE studies have been the ones where I've experienced a true paradigm shift in my thinking. With this topic, I've had three, and one of those came months after the first, when I thought I was most of the way done typing the first revision of this document. I will do my best to convey all three here and now, and perhaps save someone else the same long journey.

But first, an introduction....

Both policing and shaping are tools to deal with a service provider handing out a higher-speed physical interface with the understanding that the customer will only use a fraction of that speed. For the provider, this is an ingress tool to keep the core or egress links from becoming swamped. For example, an SP might hand out Gig-E interfaces with the understanding that each customer will only use 200Mbit of it. If all customers actually used the full 1Gbit, the edge router, core, or egress routers could easily run out of bandwidth.

Policing is the tool used at the SP side to enforce the traffic policy, and shaping is the tool used at the enterprise edge towards the policer, to conform to the policy.

By the way, I've had arguments about this with a couple of people in the past. The CIR CAN equal the line rate of the interface; marketing nonsense from some ISPs may make you believe otherwise. In a scenario where the CIR equals line rate, you don't need to shape or police, and none of this matters!

Before I delve into how CIR, PIR, Bc, Be, Tc, etc work, I will share with you the first two secrets to understanding all of this.

You'll see moving forward that the Bc / Tc / CIR interaction generally involves transmitting data during certain intervals and pausing during others. This is rather confusing to people looking at this topic for the first time. The common question is: why can't the interface just slow down and transmit all the time?

There are two extremely important concepts to grasp first:

1) Throw out the idea of Mbit/sec or Kbit/sec. What is a 100Mbit interface? A human measurement concept. Your router doesn't, and shouldn't, measure in one-second increments. Routers need to think much faster than one-second intervals. This makes Tc easier to understand.

2) Think serialization rate. "100Mbit" interfaces serialize at a certain speed that allows them to achieve "100Mbit per second". That speed is not variable. The interface always transmits at the same speed, which is full-tilt for the interface. For example, a "100Mbit" interface will always transmit at "100Mbit". It cannot become a "50Mbit" interface just because you want it to (although, ironically, it can become a 10Mbit interface by changing the port speed). To achieve 50Mbit, you tell a 100Mbit interface to transmit half the time.

Now that we understand that, how, at a high level, do we achieve 50Mbit on a 100Mbit interface?

Well, we'd have to transmit roughly half the time in one second.

This is another important concept. You could absolutely achieve 50Mbit by transmitting during the first half of the second and not the second half. That would eliminate the whole Bc/Tc concept entirely! Now, let's think about VoIP. Imagine transmitting during the first half of the second and not the second half. You'd have a pretty unpleasant-sounding phone call! This clearly won't work. You'd want even breaks - transmitting and pausing, transmitting and pausing - across a single second.

Hopefully now it's clear why we need a method to divvy up time.

Everyone here has seen the Tc = Bc/CIR formula. All the concepts needed to understand this are above.

Tc is the number of "segments" one second is divided into. Bc is the number of bits we're allowed to send per Tc. CIR is the per-second traffic commit.

Let's delve into that formula a little more. Let's say you want to send 50Mbit on a 100Mbit interface. Let's start from a shaping (enterprise/client) perspective.

Generally speaking, your service provider will determine either the Tc or the Bc for you, based on how their policing is set up. In our case, let's say the service provider has decided the Tc shall be .125 sec (125ms). This is 1/8th of a second, meaning our measurements happen in 1/8th-of-a-second blocks. Remember from above: "one second" is too long an interval to measure in. We'd end up with very choppy interactive sessions and terrible VoIP.

Now, if we're transmitting in eight 125ms segments, and we're trying to achieve 50Mbit/sec, we'd need to divide 50Mbit by 8. In bits, that would leave us with 6250000. This is your Bc - Burst Commit.

On a side note about Tc, if you're dealing with interactive traffic, it's best to have the smallest Tc possible.  That said, if you shrink your Tc too far, your Bc may be smaller than an average packet!  Packets aren't fragmented by the shaper, so you need to take this into account by dropping your MTU or using a technology like LFI to get the job done. 

PLEASE NOTE: When not specifying policing or shaping throughout this document, and just talking about general concepts, I reference bits. Shaping expresses everything in bits; policing, however, expresses CIR and PIR in bits, but Bc and Be in BYTES.

Note the math easily works in reverse: if the service provider says to use a Bc of 6250000 and you know the CIR is 50Mbit, you can deduce a Tc of 125ms.
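Since this is simple arithmetic, a quick sketch (plain Python, using the example values above) shows the formula working in both directions:

```python
# Tc = Bc / CIR, using the 50Mbit shaping example (everything in bits).
CIR = 50_000_000          # committed rate, bits per second
Tc = 0.125                # interval the provider chose, in seconds

Bc = int(CIR * Tc)        # bits we may send per interval
print(Bc)                 # 6250000

# ...and in reverse: given Bc and the CIR, deduce the Tc.
print(Bc / CIR)           # 0.125, i.e. 125ms
```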

So to reiterate, this leaves us sending 6250000 bits every 125ms. This particular interface is capable of sending 12500000 bits (100Mbit/sec) every 125ms. Clearly, we are going to only be transmitting part of the time.

I apologize in advance for the size of my diagrams. I couldn't think of a way to make them small and still get the point across. I always like PDFing good blogs to read later on my tablet, and that may be difficult with this one.

Here is a blank diagram, for reference moving forward. You'll see we'll be using a Tc of 125ms, resulting in 8 segments to a second (click for larger/legible image!):

At this point I need to give some brief insight into the token & bucket system that the RFCs refer to.
When I first got started with this topic, I had a hard time with the token/bucket analogies, so I'm not going to harp on this too much yet, favoring instead my per-second traffic diagrams.  In a nutshell, the token/bucket system says this:  Every bit you send requires a token.  The tokens are stored in the Bc bucket.  Every Tc interval, the Bc bucket gets filled back up with Bc worth of tokens.  (note, for policing, every byte you send requires a token)

I show the Bc "bucket" getting refilled every Tc interval in most of these diagrams.  This is true only in shaping.  The policing Bc bucket actually gets refilled a fraction of its tokens every time a packet is sent.
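As a rough sketch of the shaping behavior just described - this is an illustrative toy model in Python (class and method names are mine), not how IOS actually implements it:

```python
# A toy model of the shaper's single token bucket: every Tc the bucket is
# refilled to Bc tokens (one token per bit); tokens that would overflow Bc
# are simply lost. Sending spends tokens; out of tokens means wait for the
# next Tc.
class ShaperBucket:
    def __init__(self, bc):
        self.bc = bc
        self.tokens = bc          # starts full

    def tick(self):
        """Called every Tc: refill to Bc; anything beyond Bc spills away."""
        self.tokens = self.bc

    def send(self, bits):
        """Return how many of `bits` may go out now; the rest must wait."""
        sent = min(bits, self.tokens)
        self.tokens -= sent
        return sent

bucket = ShaperBucket(bc=6_250_000)
print(bucket.send(12_500_000))    # 6250000 -> only Bc worth may go this Tc
bucket.tick()                     # next Tc: bucket refilled
print(bucket.send(1_000_000))     # 1000000 -> under Bc, all of it is sent
```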

If we populate my blank diagram from above with a visual example of shaping from 100Mbit down to 50Mbit, using red as an indicator of traffic flow, and assuming (as mentioned above) that the client is moving at least 50Mbit at all times, keeping in mind that an interface is either transmitting at "full speed" or not at all, the visual representation might be:

Hopefully that makes sense - every 1/8th of a second, we transmit for half the time (1/16th of a second) until our shaper runs out of tokens from its virtual bucket, and then we pause for the other half (1/16th of a second).

To be clear with the diagrams, the flow above is assuming we're going at full speed.  The Bc does not have to get fully used up every Tc. 

Take the flip side and look at this from a policer standpoint. This is a single-rate, two-color policer. Here's where it comes from:

Single-rate = CIR (this will make more sense when we get to dual-rate)
Two-color = Green = Conforming to Bc; Red = Exceeding Bc

Don't worry so much about the single-rate yet. The two-color part is easy. On a two-color policer, traffic is either "good" - green (at or below the Bc) or "bad" - red (above the Bc). It's up to the policer what to do with the traffic. A very simple setup would transmit the "green" and drop the "red". A more complex setup might transmit the green and set a likely-to-be-dropped QoS flag on the red.
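The two-color decision can be sketched in a few lines (a hypothetical helper of my own naming, with the simple transmit-green/drop-red policy in mind):

```python
# Two-color policing decision: if enough tokens remain in the Bc bucket,
# the packet conforms (green) and spends tokens; otherwise it exceeds (red)
# and the bucket is left untouched.
def police_two_color(packet_bits, bucket_tokens):
    if packet_bits <= bucket_tokens:
        return "green", bucket_tokens - packet_bits   # conform: spend tokens
    return "red", bucket_tokens                       # exceed: no tokens spent

color, remaining = police_two_color(8_000, 6_250_000)
print(color)        # green
color, remaining = police_two_color(8_000, 4_000)
print(color)        # red
```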

Let's look at the configuration for CIR/Bc/Tc - a single-rate, two-color policer. We'll also look at the relevant shaper configuration. We'll use our same example of 50Mbit on a 100Mbit interface.

policy-map policer
  class class-default
   police cir 50000000 bc 781250  ! remember, policing bc is in bytes
     conform-action transmit
     violate-action drop

interface serial0/0
  service-policy input policer

Note, conform-action and violate-action are both configurable, and could do any of the following:
drop - drop packet
set-clp-transmit - set atm clp and send it
set-discard-class-transmit - set discard-class and send it
set-dscp-transmit - set dscp and send it
set-frde-transmit - set FR DE and send it
set-mpls-exp-imposition-transmit - set exp at tag imposition and send it
set-mpls-exp-topmost-transmit - set exp on topmost label and send it
set-prec-transmit - rewrite packet precedence and send it
set-qos-transmit - set qos-group and send it
transmit - transmit packet

Also note we didn't set the Tc at all. The majority of these commands interpret the Tc based off Bc/CIR. As shown above, we have 781250 bytes (6250000 bits) divided by 50Mbit: 6250000/50000000 = 0.125, or 1/8th of a second.
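Since the policing Bc is configured in bytes, the unit conversion is the easy thing to get wrong; a one-liner confirms the interval:

```python
# Policing Bc is entered in bytes, so convert to bits before dividing by CIR:
bc_bytes = 781_250
cir_bps = 50_000_000
tc = (bc_bytes * 8) / cir_bps
print(tc)            # 0.125, the 125ms interval IOS infers
```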

We can verify this with show policy-map interface.

R2#show policy-map int
  Service-policy input: policer
    Class-map: class-default (match-any)
      0 packets, 0 bytes
      30 second offered rate 0 bps, drop rate 0 bps
      Match: any
          cir 50000000 bps, bc 6250000 bytes, be 1500 bytes
        conformed 0 packets, 0 bytes; actions:
        exceeded 0 packets, 0 bytes; actions:
        violated 0 packets, 0 bytes; actions:
        conformed 0 bps, exceed 0 bps, violate 0 bps

You're probably wondering what "exceeded" is; that's used for a three-color policer, which we'll cover in just a bit.

Now, we could just rely on the service provider's policer to keep us in check, but having our packets dropped is not much fun. The avoidance technique is to delay packets to meet the CIR, which is where shaping comes in. Shaping buffers packets to prevent the service provider's policer from kicking in. Remember: always shape towards a policer.

Another useful function for a shaper is to delay traffic when the service provider doesn't drop traffic, but instead charges a premium for going over the CIR. Let's say you have redundant BGP-enabled Internet links, and you normally push 40Mbit out of both of them. Your CIR with your primary carrier is 50Mbit on a 100Mbit interface. You want to be able to burst to 100Mbit if your secondary carrier goes down, but you don't want this to happen automatically, because the carrier charges you 4x the normal rate per Mbit when you go over 50Mbit. They won't drop your traffic; they'll just make your bill massive! A shaper is useful to make sure you don't accidentally go over 50Mbit unless the secondary carrier is offline long-term. During the outage, just log in, manually remove the shaper, and you're back to full speed (and flat broke).

Here's what our matching shaper would look like:

policy-map shaper
 class class-default
  shape average 50000000 6249984 0
interface s0/0
  service-policy output shaper

We've got some interesting numbers here, let's check it out.

50000000 is obvious, it's the CIR.

6249984 is as close as I can get to 6250000 and still be a multiple of 128. This is an IOS limitation. You can enter any number you want, but as I understand it, IOS will round it to the closest multiple of 128. The CLI warns you about this:

R1(config-pmap-c)#shape average 50000000 ?

<256-154400000> bits per interval, sustained. Needs to be multiple of 128.
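That rounding is easy to reproduce (assuming nearest-multiple rounding, which matches the 6249984 figure used above):

```python
# The "multiple of 128" rounding IOS applies to the shaper's Bc:
ideal_bc = 6_250_000
rounded_bc = round(ideal_bc / 128) * 128
print(rounded_bc)    # 6249984
```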

The last number - 0 - is not to be overlooked!

0 is the Be - Burst Excess - which we'll cover next. If you don't say "0" and just leave the value empty, you get Be = Bc automatically, and this does not show up in the config. This is a major pitfall: using a "three-color shaper" with a two-color policer is far from ideal, and as this blog is mostly about passing a test, it would definitely not get you points on the lab!

R1#show policy-map int

  Service-policy output: shaper

    Class-map: class-default (match-any)
      16944 packets, 1154436 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
      Match: any
      Traffic Shaping
           Target/Average   Byte   Sustain   Excess    Interval  Increment
             Rate           Limit  bits/int  bits/int  (ms)      (bytes)
         50000000/50000000  781248 6249984   0         124       781248

        Adapt  Queue     Packets   Bytes     Packets   Bytes     Shaping
        Active Depth                         Delayed   Delayed   Active
        -      0         9         2997      0         0         no

Three color policers are reasonably easy if you can grasp the two-color we've been discussing thus far.

The concept of a single-rate, three-color policer is all about the Be - Burst Excess - value. The Be value makes policing more "fair". The idea is that if a host hasn't transmitted for a while, we let it accumulate a credit that lets it send a short burst when it's ready to send again.

This is also where the two-bucket system comes in.  As mentioned above, every time a bit needs to be transmitted, it needs a token. A single-rate, two-color policer has one bucket. If a host hasn't transmitted, the bucket is full up to Bc with tokens. The host may or may not use those tokens. Roughly every Tc, the bucket is refilled up to Bc. If the bucket was partially full (limited transmission occurred that Tc), or completely full (no transmission occurred that Tc), any new tokens that don't fit go to waste, "spilling" out of the token bucket and into the void.

My visuals from above assumed we were always using all our bits.  Don't get that too stuck in your head!  It's just an example.  Here's an example of a traffic pattern only using part of the Bc during the first Tc:

Unlike the diagram above, in a two-bucket system (single-rate, three-color), if the Bc bucket overflows, it spills into the Be bucket. Now the Be bucket fills until it's full, and if even more tokens come in, they spill from Bc -> Be -> "the void".

After the idle period is over, the host first consumes its Bc, and when it runs out of those tokens, it goes for the Be bucket.  The Be can basically be set to whatever the service provider wants, provided the link can physically support it. Obviously you can't burst beyond line rate! General practice, and the IOS default for shaping, is to set Be = Bc. So you're allowed to go up to double-speed (in our example, the line rate of 100Mbit) for one Tc, 1/8th of a second. Or you could go at 3/4 speed for two Tcs, etc. The long-term ratio on the policer will still equate to the CIR (or less), but it allows more burstiness from the customer.
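Here's a toy model of the single-rate, three-color (two-bucket) behavior described above - Bc overflow spills into Be, and a sender that has been idle can burst from Be. This is an illustrative sketch of my own (names included), not IOS internals:

```python
# Single-rate, three-color policer, two buckets: the Tc refill pours Bc
# tokens into the Bc bucket; overflow spills into the Be bucket; anything
# beyond a full Be is lost to "the void".
class SrTcmBuckets:
    def __init__(self, bc, be):
        self.bc_max, self.be_max = bc, be
        self.bc, self.be = bc, 0           # Bc starts full, Be starts empty

    def tick(self):
        """Every Tc: add Bc tokens; overflow spills Bc -> Be -> the void."""
        total = self.bc + self.bc_max
        self.bc = min(total, self.bc_max)
        spill = total - self.bc
        self.be = min(self.be + spill, self.be_max)

    def send(self, bits):
        if bits <= self.bc:                # fits in Bc: conform
            self.bc -= bits
            return "green"
        if bits <= self.be:                # Bc exhausted, Be pays: exceed
            self.be -= bits
            return "yellow"
        return "red"                       # both exhausted: violate

p = SrTcmBuckets(bc=6_250_000, be=6_250_000)
p.tick()                       # one idle Tc: the Bc refill spills into Be
print(p.be)                    # 6250000
print(p.send(6_000_000))       # green  (fits within the Bc tokens)
print(p.send(6_000_000))       # yellow (Bc nearly empty, Be pays)
print(p.send(6_000_000))       # red    (both buckets depleted)
```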

This is a single-rate, three-color policer. Here's where it comes from:

Single-rate = CIR (more on this later)
Three-color = Green = Conforming to Bc; Yellow = Exceeding Bc, but less than Be, Red = Exceeding Be

This gives us some additional QoS control. A common practice might be to transmit on green, give a higher drop probability on yellow (via a DSCP/IPP/DE-bit change), and completely drop on exceed.

Something of quick note before we move on to the policer configuration. Take a look at my output from earlier, from above:

R2#show policy-map int

  Service-policy input: policer

    Class-map: class-default (match-any)
      0 packets, 0 bytes
      30 second offered rate 0 bps, drop rate 0 bps
      Match: any
          cir 50000000 bps, bc 6250000 bytes, be 1500 bytes
        conformed 0 packets, 0 bytes; actions:
        exceeded 0 packets, 0 bytes; actions:
        violated 0 packets, 0 bytes; actions:
        conformed 0 bps, exceed 0 bps, violate 0 bps

Technically speaking, the MQC doesn't have a strict two-color policer, but you can force it to behave like one by making the exceed action "drop" (this is the default if the violate action is drop and you don't specify an exceed action).

So now we'll make "exceeded" something we can take advantage of:

policy-map policer
 class class-default
   police cir 50000000 bc 781250 be 781250
     conform-action transmit
     exceed-action set-dscp-transmit af13
     violate-action drop

interface Serial0/0
  service-policy input policer

This is an example of a very common setup; bc = be, and the exceed action sets a DSCP value more likely to be dropped by a fictional upstream router.

Shaping for this is very easy:

policy-map shaper
 class class-default
  shape average 50000000 6249984 6249984  ! 781250 bytes from above * 8 = 6250000 bits, rounded to 6249984

interface Serial0/0
  service-policy output shaper

I don't think that needs much explanation; your variables are CIR, Bc, and Be, just like before. This time we're just specifying that we want a Be. Of note, if you left Be out, you'd have the exact same result: as I mentioned above, Be is assumed to be equal to Bc unless otherwise specified. Hence the need for the "0" Be in the two-color example above.

The scenario above lends itself very easily to a diagram, because the Bc/Be interaction is so simple.  This is the way I thought of Bc/Be for a long time, and while it might work for this scenario, from a big-picture perspective it's wrong.  Do not use this diagram moving forward!  I am showing it only to point out where the logic breaks down.

Here, we'll see one complete Tc with no transmission.  The Be bucket has filled up to its maximum size, which is the same as one Bc.  In the 3rd Tc, the Bc bucket is completely emptied as in previous diagrams; the host still has more to transmit, so the Be bucket is then drained, resulting in transmission for the entire Tc.  The Be usage is shown in pink to make it obvious.

As mentioned, this works, for this scenario.  However, in order to grasp more advanced topics, you should never overlap the Bc and Be diagrams.  They're separate buckets with separate functionality.

Let's take a more advanced scenario.  Say Be is equal to twice the size of Bc.  There are no transmissions for two Tcs, so we've spilled two Bcs worth of tokens into the Be.  After that we have a lengthy burst.  Now we're no longer committing the full contents of the Be in just one Tc; we're treating it like rollover minutes on a cell phone plan: they accumulate in a separate pool, and they can be drawn on whenever needed.  Here's a diagram of how this might play out:

Tc 2 and 3 have no transmission, Be builds up to its maximum size of 2xBc, and at Tc 4 the host bursts at 3/4 of line rate (consuming the Bc refill plus half a Bc from Be each interval) non-stop through Tc 7.  At Tc 8 it is forced to slow down again after Be is depleted.  As you'll see in a moment, this is still an "easy" scenario, but get used to this thought process before moving forward.
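To sanity-check the rollover arithmetic: assuming the burst runs at 3/4 of line rate (1.5 x Bc per Tc), the Be accumulated during two idle Tcs lasts exactly four intervals:

```python
# Per-Tc accounting for the scenario above: Bc refills each interval, the
# host sends 1.5 * Bc per Tc (3/4 of the 100Mbit line rate), and the
# shortfall is drawn from a Be that two idle Tcs filled to 2 * Bc.
bc = 6_250_000
be = 2 * bc                       # filled during Tc 2 and 3
burst_per_tc = int(1.5 * bc)      # 9375000 bits per Tc
tcs_sustained = 0
while be >= burst_per_tc - bc:    # Be must cover what the Bc refill can't
    be -= burst_per_tc - bc
    tcs_sustained += 1
print(tcs_sustained)              # 4 -> Tc 4 through Tc 7
```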

In a dual-rate, three-color policer (note, there cannot be a dual-rate, two-color policer!), Tc and Bc work the same, but Be works very differently. Previously, in a single-rate, two bucket (three color) system, we had the imaginary "tap" filling the policer's token system into the first bucket, Bc, and then "spilling over" into Be if Bc was completely full.

There is no spillover in dual-rate. The two buckets are still there, but now there are two "taps" independently filling Bc and Be.

Let's put this down to a real-world example, with a real world motivator - money.

Let's say our service provider has been offering single-rate, three-color policing as their only offering. They've been charging you $2,500 a month for this 50Mbit capped circuit. They're selling this same 50Mbit capped circuit all over town, and they have roughly enough bandwidth on their upstream edge to handle it if all their customers transmit at 50Mbit all the time. But we all know that doesn't happen; generally speaking, not every customer bursts at the same time (the basic concept of oversubscription). However, they're still nervous to open their connections to the customer edge to 100Mbit, even though they know they could theoretically charge $5,000 a month for the same circuit, because everyone transmitting at the same time is still possible, and they want to provide a consistent customer experience.

Their sales guy (hah! - more likely the tech guy) wakes up in the middle of the night and has an "AHA!" moment. He can have the engineering team implement a DSCP QoS policy on the upstream edge that selectively drops customer traffic that exceeded 50Mbit if there is congestion. In other words, he can take that capped 50Mbit link that was selling for $2,500, uncap it as a 100Mbit link with a 50Mbit guarantee, and now sell it for $3,750 a month!

But in order to do this, he needs edge policing that will mark customer ingress traffic below 50Mbit with a favorable DSCP value, and customer ingress traffic above 50Mbit with a drop-likely DSCP value. He can then have the engineering team implement a WRED policy on the upstream routers to their Tier-1 ISPs that aggressively drops the traffic that was marked as exceeding on ingress.

Here's what our fictional ISP's setup looks like:
This ISP is a little behind the times - the upstream link (between ISP and INTERNET) is only 100Mb, and they've only got two customers, both with CIR of 50Mb, and a PIR of 100Mb.

The PIR is Peak Information Rate, and defines how fast the customer is allowed to transmit.  In our case, the PIR is the interface speed.

Here's what our relevant configs on "ISP" look like.

policy-map congestion_avoidance
 class class-default
  bandwidth 100000
  random-detect dscp-based
  random-detect dscp 14   5     100
  random-detect dscp 34   4095  4096  1

policy-map dual_rate_ingress
 class class-default
   police cir 50000000 bc 781250 pir 100000000 be 3125000
     conform-action set-dscp-transmit af41
     exceed-action set-dscp-transmit af13
     violate-action drop

interface FastEthernet1/0
 service-policy input dual_rate_ingress

interface FastEthernet2/0
 max-reserved-bandwidth 100
 service-policy output congestion_avoidance
 hold-queue 4096 out

Let's break this apart a bit.  On ingress from the customer, we'll be policing with a CIR of 50Mbit and a PIR of 100Mbit.  Disregard the Bc and Be for the moment; I made those deliberately tricky, so we'll come back to them.  If the rate is below the CIR, traffic conforms and gets a DSCP value of AF41. If the rate is above the CIR but below the PIR, it exceeds and is marked AF13.  If the rate exceeds the PIR (which is basically impossible, as this is a 100Mbit interface), the packet is dropped.

On the side facing the upstream, we've upped our hold-queue to maximum to allow for more software queuing.  We've then implemented DSCP-based WRED to aggressively drop AF13 (14 in the config), while not allowing AF41 to be dropped until full tail drop occurs, which theoretically should never happen.

You'll also note that I didn't show any shaper config on CUST1.  That's because there isn't any.  Technically speaking, you don't have to shape if your PIR is equal to line speed.  It's a particularly good idea to shape on interfaces that support an explicit "slow down" mechanism (Frame Relay FECN/BECN, for example); that way you can use shape adaptive to allow your router to slow down when it gets a congestion notification.  However, on Ethernet, there's no such function.  That said, I will cover shape adaptive in a future FRTS post.

Of note, there is a command for "peak shaping", shape peak, that can be used, but it's little more than a way of documenting what the CIR is.  Imagine a case where the prior engineer didn't apply shaping to an interface that had a sub-rate CIR.  He quits, and you come in and take his place.  It may not be apparent from the config that the CIR wasn't line rate.

In our case, shape peak would look like this:
shape peak 50000000 6249984 6249984

50000000 is the CIR
The first reference of 6249984 is Bc, the second is Be.  Bc+Be bits will be transmitted per Tc, so we will be averaging (6249984 + 6249984) * (1/Tc) bits per second.  In other words, (Bc + Be) (~12.5Mbit) * 8 = 100Mbit.
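The shape peak math is quick to verify (same example numbers):

```python
# shape peak sends Bc + Be bits every Tc, so the peak rate is (Bc + Be) / Tc.
CIR = 50_000_000
Bc = Be = 6_249_984
Tc = Bc / CIR                  # ~125ms, the usual Tc = Bc/CIR relationship
peak = (Bc + Be) / Tc
print(round(peak))             # 100000000, i.e. line rate
```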

Now for those crazy Bc and Be values on the policer!

This is what I applied above:
police cir 50000000 bc 781250 pir 100000000 be 3125000

But first, let's pretend it was a slightly easier set of values, and work back towards the hard ones.
police cir 50000000 bc 781250 pir 100000000 be 1562500

This is pretty hard to read.  We've got a CIR of 50Mbit and a Bc of 781250 bytes, which equates to 6250000 bits.  6250000/50000000 = Tc of 125ms.  We've then got a PIR of 100Mbit and a Be of 1562500 bytes, or 12500000 bits.  12500000/100000000 = Tc of 125ms.  But hold the phone!  That means we'd be putting twice as many bits into the Be bucket as into the Bc bucket each interval.  That straight-up wouldn't work, because Be would reach its maximum size twice as fast as it should.

Go figure, there's a catch.

On dual-rate, three-color policing, when a packet is conforming - enough tokens are present in the Bc bucket to pass the packet - the tokens are removed from the Bc bucket AND the Be bucket simultaneously.  So the Be bucket is deducted the same number of tokens as the Bc bucket at all times, and all those extra (double) tokens going into Be get removed as soon as the traffic flow starts.
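A toy model makes the two-tap behavior and the double-deduction clearer (illustrative Python of my own naming, not IOS internals; for simplicity it polices whole "sends" measured in bits):

```python
# Dual-rate, three-color policer: two independent "taps" refill the Bc (CIR)
# and Be (PIR) buckets - no spillover between them - and a conforming packet
# is charged to BOTH buckets.
class DualRatePolicer:
    def __init__(self, bc, be):
        self.bc_max, self.be_max = bc, be
        self.bc, self.be = bc, be          # both buckets start full

    def refill(self, cir_tokens, pir_tokens):
        """Each bucket has its own tap; excess in either is lost."""
        self.bc = min(self.bc + cir_tokens, self.bc_max)
        self.be = min(self.be + pir_tokens, self.be_max)

    def send(self, bits):
        if bits > self.be:
            return "red"                   # violates the PIR
        if bits > self.bc:
            self.be -= bits                # exceeds CIR but within PIR
            return "yellow"
        self.bc -= bits                    # conform: charge Bc AND Be
        self.be -= bits
        return "green"

p = DualRatePolicer(bc=6_250_000, be=12_500_000)
print(p.send(6_250_000))     # green  -> drains Bc, and the same from Be
print(p.send(6_250_000))     # yellow -> Bc is empty; Be still has tokens
print(p.send(6_250_000))     # red    -> Be is now empty too
```

Note how the first (green) send drained the Be bucket as well, which is exactly why the "extra" tokens flowing into Be don't accumulate unfairly.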

Now let's go back to my crazier scenario.
police cir 50000000 bc 781250 pir 100000000 be 3125000

The CIR math is the same: 6250000/50000000 = Tc of 125ms.  However, the PIR math now says 3125000 bytes against 100Mbit, or 25000000/100000000 = Tc of 250ms.  Mismatched Tcs.

I had a really, really hard time with this when I saw it. Best I can tell this is a valid configuration, although I haven't been able to find a single document supporting or denying it.

If you take the token & bucket concept to heart, understanding that we're not using spillover, there's no reason this couldn't work.  The CIR and PIR systems' refill rates don't have to be timed with one another.  It's probably a good idea to keep their Tcs in sync, but it should work even if they're not.  Let's look at a revision of my previous diagram showing how this might function.

As you can see here, having the refill rate change doesn't necessarily impact the traffic pattern.  This still blows my mind, and I can't think of why on earth you'd want to do this, but it can be done.

That's all for now - look for a FRTS post covering the peak/adaptive topics in the near future!




  1. Jeff,
    First, I want to say it's a great post; I like it.
    Second, as we know, traffic is transmitted in two directions. For example, a customer has upload and download speeds, and the same goes for the service provider PE side. So we should apply policy in both directions, right? Then in which direction should we apply policing and shaping?

  2. Hi Yaotian,
    Policing or marking are the major reasons for applying a QoS policy inbound. Obviously you can't control what your neighbor sends you, but you can stop it from overwhelming the rest of your network. That can be accomplished by policing ingress, or by marking exceeding traffic so that it can be dropped or de-prioritized later on. So, to answer your question, policing can be ingress or egress. Ingress is the more "normal" way, so that you stop your neighbor from flooding your network. Egress policing is less usual, but it still works fine. Normally shaping is used on egress, to stall (buffer) traffic to meet the CIR of your neighbor rather than policing, which would drop your excess traffic. Shaping is much better on TCP flows than policing, because policing TCP causes TCP packet loss and the window size decreases, causing the link usage to be inefficient. So, in short, you always shape towards a policer.

    1. This comment has been removed by the author.

    2. Hey Jeff, thanks for your reply, indeed.
      I drew a simple topology of where policing and shaping are used. Please see if I got your point. Btw, in my topology, all QoS is done on the PE router. Thanks. Or maybe you could draw one? Thanks!

      ←←←←←←←← ←←←←←←←←
      CE PE ISP orP
      →→→→→→→→ →→→→→→→→
      Marking Policing


  3. This is great, thanks! Studying for the R&S now and this was a great resource.

  4. Dear Jeff,
    Thank you for your article!

    I am struggling to understand your single-rate, three-color policing diagram. If Be = 2*Bc, then during the quiet period we should accumulate enough Be tokens for only 2 Be transmission intervals. Why are there 4 Be transmission intervals in your diagram?
    The Be token bucket should have been emptied after 2 transmission intervals...