Spanning-Tree “Flavors” (PVST+/Rapid-PVST/MST)

Wow.  Just wow.  I’ve made a career out of routing and switching.  Yet, it’s absolutely mind-blowing to me how much of this stuff I’d either forgotten or simply never learned in the first place.  That’s natural and expected of course… the CCIE program is intended to help us take our skills to the next level, well beyond the stuff we need to know for our day-to-day administration tasks.  So, without further ado… I hope you enjoy learning this as much as I did!  Here are our exam topics:

2.1.f Implement and troubleshoot spanning-tree
2.1.f [i] PVST+/RPVST+/MST

PVST+

In my introduction to STP, I stated that PVST should be replaced by Rapid-PVST as Cisco’s default.  The biggest reason for this is the behavior of these protocols in the event of a topology change.  For PVST, any change in the STP topology will result in a Topology Change Notification (TCN) BPDU.  This means that every single time a computer gets turned on or off, a laptop gets plugged in or unplugged, a device goes in or out of power-save mode, etc., basically any event that causes a port’s status to go up or down, a topology change occurs.  The TCN tells the switches that the topology has changed, and that they must therefore age out their CAM tables quickly.  Rather than using the normal aging time (300 seconds by default), switches will age out CAM entries after ForwardDelay seconds, which is 15 seconds by default.  In other words, if a host doesn’t send traffic within 15 seconds to refresh its CAM entry, the switch will have to begin flooding traffic to that host.  This can lead to excessive amounts of flooded traffic.

The whole reason for this is that a topology change means that we need to rediscover which port each MAC address is on.  The shortened aging time gets rid of stale entries, but it also means that it can take up to 15 seconds for a switch to detect that a host now needs to be reached through a different port.  This shortened CAM expiration condition lasts for MaxAge+ForwardDelay seconds (20 seconds and 15 seconds by default for a total of 35 seconds).

Or, to put it another way – “your PVST switches act like hubs.”  Ok, not exactly like hubs.  Hubs don’t have a CAM table at all.  A PVST switch in a Topology Change state sees increased flooding because of the shortened aging time, and basically says “let’s forget everything we know and start from scratch after 15 seconds.”  Worse yet, it falls back on this behavior EVERY TIME a link goes up or down.

Let’s also briefly review some of the key behaviors of PVST on an access port.  After a switch believes that it can transition a port into the forwarding state, it first waits in the listening and learning states for ForwardDelay seconds each (15 seconds each by default, so 30 seconds in total).  This can cause issues for hosts that run DHCP or otherwise come to full operation quickly, because they are unable to send any traffic on their links during that time.  A switch will believe that a port can be moved into the forwarding state if it does not receive BPDUs on it, if it receives only inferior BPDUs on it, or if it becomes a root port.

One special case that must be addressed is when you have a mix of Cisco and non-Cisco switches.  Remember that PVST+ is Cisco-proprietary, like its predecessor, PVST.  (The difference between the two is that the original encapsulated its information in ISL frames, and a new version was needed that was capable of using dot1q encapsulation, which is what PVST+ added.)  Technically, some vendors have implemented ways of interoperating with Cisco’s PVST(+) protocol, but from Cisco’s standpoint, they just want us to know how to interoperate with generic non-Cisco switches that do not have such interoperability.

In these cases where there is a mix of vendor switches, the non-Cisco portion of the network is said to be running the Common Spanning Tree (CST).  This CST region is essentially merged with PVST+’s VLAN1 instance, and interoperates with VLAN1 as would normally happen.  However, in cases where a non-VLAN1 VLAN exists on both sides of a CST, Cisco switches must handle BPDUs specially to compensate.  In cases where a CST region sits between Cisco PVST switches, the Cisco switches will send the BPDUs to a different MAC address.  Normally, they send to 0180.C200.0000, but in these instances, they send to MAC 0100.0CCC.CCCD.  The BPDUs will have a dot1q VLAN tag (STP normally leaves frames untagged), and use SNAP encapsulation instead of LLC.  Inside the BPDU, they also add the VLAN number (this is in addition to the dot1q tag), and the VLAN number inside of the BPDU is compared against the dot1q tag to help detect native VLAN mismatches.  The CST region will simply see these frames as ordinary multicast frames and will flood them accordingly.  On the other side, the Cisco switch will be intelligent enough to recognize these characteristics as those of a BPDU that has been tunneled across a CST region.

Access ports, when sending BPDUs, will send only the standard IEEE format BPDUs.  This means that they will not communicate the VLAN number in which they run.  That’s fine though, as the port will process any received BPDUs in the appropriate VLAN for that port, according to the access VLAN assigned to it.  So, only trunk ports will follow the above behavior of sending multiple BPDUs: the standard IEEE BPDU for VLAN1, plus one BPDU for each VLAN running on the trunk in Cisco’s PVST-tunneling format.

When receiving BPDUs, an access port must receive only the IEEE standard type BPDUs, or it will go into a type-inconsistent state.  This makes sense, as an access port that receives PVST-formatted BPDUs is basically detecting that the other end of the link is configured as a trunk, and this is a configuration issue that needs to be resolved.

RPVST+

I’m going to say it again, because it bears repeating: any new switch you get should be configured for rapid PVST at a bare minimum.  That Cisco still ships a great many switches with PVST as the default is a little surprising.  There are cases where RSTP can actually perform worse than IEEE STP, but that’s not the case in a properly configured environment.  We’ll talk about those tuning mechanisms when we get to our section on STP enhancements, the most notable of which is PortFast.  For today though, we’ll focus on the differences between these different flavors of STP.  Rapid Spanning Tree changes many of the STP concepts we know.
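For reference, moving a switch over to Rapid-PVST is a single global command.  Here is a minimal sketch (the Switch prompt is just a placeholder hostname, and it is worth confirming the result on your own platform):

```
Switch# configure terminal
Switch(config)# spanning-tree mode rapid-pvst
Switch(config)# end
Switch# show spanning-tree summary
```

The summary output should report which spanning-tree mode the switch is running, which makes it a quick sanity check after the change.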

There are two major differences between RSTP and PVST.  First, RSTP is event-driven rather than timer-driven, as PVST is.  Rather than a detected change kicking off timers to eventually converge the STP topology, switches will actively track, discover and negotiate states for rapid convergence.  Second, ports now have different states, roles and types.

Let’s first talk about these RPVST states.  Legacy Spanning-Tree had five, which were disabled, blocking, listening, learning and forwarding.  Rapid Spanning-Tree reduces these to three, which are discarding, learning and forwarding.

As for roles, RSTP still carries the concepts of root ports and designated ports, but adds alternate ports and backup ports, which can be used in the case of a failure of the root port or designated port, respectively.  The existence of these alternate ports is one of the main reasons Rapid STP can recover so quickly in the event of a link failure, as it will have already precalculated the resulting topology in the event that the root port goes down, and can therefore confidently begin forwarding immediately, without having to wait through the listening/learning process that legacy STP had to go through.  Designated ports, on the other hand, do not have a mechanism that can guarantee that the topology will be loop-free, so they must transition from Discarding to Learning to Forwarding.

Finally, RSTP also carries a concept of port types.  RSTP ports can be either of the Edge or Non-Edge type.  An edge port can immediately begin forwarding when it comes up.  It listens for BPDUs, but only for loop avoidance reasons… it certainly isn’t “expecting” to receive BPDUs.
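On Cisco switches, the edge port type is what you get when you enable PortFast on a port.  As a rough sketch, assuming an access port facing a single host (the interface name and VLAN 10 are placeholders, and newer IOS releases spell the last command spanning-tree portfast edge):

```
Switch(config)# interface GigabitEthernet0/1
Switch(config-if)# switchport mode access
Switch(config-if)# switchport access vlan 10
Switch(config-if)# spanning-tree portfast
```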

In addition to port types, RSTP also has multiple link types: point-to-point and shared links.  Point-to-point links connect a switch to no more than one neighboring switch on a given link.  It is only in these cases, where there are either zero or one neighboring switches, that RSTP can actually be “rapid.”  When there are multiple switches on the same link, RSTP cannot perform the pre-failure calculations to know exactly what the topology will be in the event of a link failure, and RSTP will be forced to operate as legacy STP.  Interestingly, Catalyst switches will assume that the link type is P2P if the link is running in full-duplex mode, and that it is a shared link if the link is running in half-duplex mode.  (Note: If you are trying to simulate this in GNS3/IOU, I’ve been unable to find an L2 image capable of having the interface duplex settings configured.  They always remain statically set at “auto” and never actually appear to go into full-duplex mode.  So in IOU, you will see Spanning-Tree come up with the shared link-type by default.)  This assumption will not always be right, and can be manually overridden with the interface spanning-tree link-type {point-to-point | shared} command.  (This WILL work in IOU, so while you cannot test the behavior of automatically becoming point-to-point vs. shared, you CAN test and see the resulting behaviors of these different link-types.)
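Overriding the link type looks something like this (the interface name is a placeholder):

```
Switch(config)# interface GigabitEthernet0/2
Switch(config-if)# spanning-tree link-type point-to-point
Switch(config-if)# end
Switch# show spanning-tree interface GigabitEthernet0/2
```

The Type column in the show output is where you should see the link type the switch has settled on for each VLAN on that port.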

RSTP does not use separate BPDUs for Hellos and TCNs.  The version number in the protocol version field is set to 2.  In STP, only the root bridge generates BPDUs, which then get relayed by downstream switches, whereas in RSTP, each switch generates its own individual BPDUs.  This means that instead of having to wait for the MaxAge-MessageAge timer to expire, the switch can age out its neighbor’s BPDU much more quickly, which happens after 3 hello intervals pass without hearing from that neighbor (6 seconds with the default 2-second hello).  MessageAge no longer affects actual timers and is now just used as a method of tracking the hop count to the root bridge.  In STP, switches don’t even bother sending BPDUs on segments where they know their BPDUs are inferior.  In RSTP, inferior BPDUs are indeed sent, and an inferior BPDU can even immediately replace the previous BPDU from the same neighbor, even though the previous BPDU was superior.  The reason for this is that a switch would only begin sending inferior BPDUs in the event of a configuration change or topology change, and the new inferior BPDU should then be the correct one.

Interestingly, the process of dealing with failed links is actually simpler than the process of adding a new link to the RSTP topology when the new link becomes the root port on the switch.  This is because RSTP can calculate in advance the results of a known link going down, but obviously cannot do so for a link that does not exist yet.  In cases where the new link will become the Root Port, some special processes must happen, and they can take a moment.  RSTP uses a proposal/agreement process to handle this.

Keep in mind that the new root port will be connected to a designated port on the neighboring (upstream) switch, and that this port will have JUST become a designated port.  Normally, it would have to wait out the timers in the discarding and learning states before it could forward.  To get around that wait, the upstream switch sends BPDUs with the proposal bit set out of its new designated port, which amounts to a request to begin forwarding on this link right away.  The switch that receives the proposal on its new root port cannot simply say yes without risking a switching loop, so it first goes into a sync state, meaning that it moves all of its non-edge designated ports into the discarding state.  This can include the ports that were root or alternate before the new link came up.  Once it is synced, it puts the new root port into the forwarding state and sends an agreement BPDU back, telling the upstream switch “go ahead and start forwarding on this link.”  It then initiates the same proposal process on its own designated ports with any of its downstream neighbors.  This proposal/sync/agreement process is what allows Rapid Spanning-Tree to be rapid in the event of a new root port being discovered.

RSTP still holds the concept of topology change notification, as did legacy STP.  However, rather than having a TCN be sent to the root bridge (which STP only did because non-root bridges don’t really have a way of communicating changes to other bridges without asking the root to perform these notifications on their behalf), switches will immediately flood BPDUs with the TC flag set.  Any switch sending or receiving these BPDUs with the TC bit set will start a tcWhile timer, which is the hello time plus one second.  It then immediately flushes all MACs learned on non-edge designated and root ports.  It then sends BPDUs with the TC flag set, and the process repeats, quickly informing all switches in the topology of the change.

MST

MST takes the simplicity of running a single Spanning-Tree instance and combines it with the advantages of running PVST.  Granted, the “simplicity” that we refer to is not in the configuration or theory of MST.  Rather, it’s in regard to the fact that MST is far less processor-intensive than its per-VLAN predecessors.  Running an instance of STP for every single VLAN you run is terribly inefficient, especially when you consider the fact that there are rarely more than a few actually feasible topologies that might be used.  You might run 750 VLANs, but it would be very rare to actually have 750 different unique optimized paths through your network.  So rather than run 750 different instances of STP, why not run a few, map the individual topologies to instances, and assign multiple VLANs to each instance?

MST uses the priority/system ID extension field of the bridge ID to indicate the MST instance.  Of note is the fact that multiple MST instances DO NOT generate multiple BPDUs.  Rather, MST uses a single BPDU to relay information about all instances.  MST also refers to instance 0 as the Internal Spanning Tree (IST).  It is the default instance to which VLANs are mapped, and like VLAN 1 at the edge between PVST+ and legacy STP, instance 0 is the only instance which interacts with STP outside the MST region.

Of critical importance when configuring MST is the fact that three things must match for MST neighbors to interoperate: Region Name, Revision Number, and VLAN-to-Instance mappings.  The name and revision number are carried in the BPDU as they are.  The mapping table itself is not transmitted, but an MD5-based digest of it is, and this digest can be viewed and compared with the show spanning-tree mst configuration digest command.

 

Because of the requirement that the VLAN-to-instance mapping tables match on all switches in the MST region, adding a new VLAN would normally create an outage in your environment.  This can be avoided simply by mapping VLANs in advance, before they actually exist or need to be created.  By having extra VLANs added to your mapping tables, you can later create the VLANs without causing an outage.  Also, because Cisco was the first vendor to the MST game, some older switches run a non-standard variant of MST.  In cases where this is true, newer switches must be configured to use this non-standard implementation on the ports facing those older switches, with the interface-level spanning-tree mst pre-standard command.

To configure a switch to use MST, first configure the MST region (we can “activate” it afterward).  Enter MST config mode with the global spanning-tree mst configuration command.  Specify the region with the name region-name command.  Give a revision number with the revision number command.  Assign VLANs to instances with the instance number vlan numbers command:

[Image: MSTConfig]
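As a sketch of the steps above, here is what the MST region configuration could look like.  The region name, revision number and VLAN ranges are all made-up values for illustration:

```
Switch(config)# spanning-tree mst configuration
Switch(config-mst)# name REGION1
Switch(config-mst)# revision 1
Switch(config-mst)# instance 1 vlan 10-19
Switch(config-mst)# instance 2 vlan 20-29
Switch(config-mst)# show pending
Switch(config-mst)# exit
```

The show pending command lets you review the region settings before they are applied, which happens when you exit MST config mode.  Any VLAN you don’t map explicitly stays in instance 0.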

Once you are ready to step into the MST world, activate MST with the global spanning-tree mode mst command.  Remember from our VTP blog that VTPv3 can carry MST information, so if converting to MST, it might be a great time to consider changing to VTPv3 as well.

At this point, you’ll already know what you need to know to configure MST, as most of our familiar spanning-tree commands can still be used by swapping in the mst keyword and configuring instances rather than individual VLANs.  For instance, rather than configuring VLAN 1 with a priority of 4096 with the spanning-tree vlan 1 priority 4096 command, we can configure MST instance 1 with the spanning-tree mst 1 priority 4096 command.  Show commands follow similar logic.
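Putting the activation and the priority example together, a minimal sketch might look like this (instance 1 is assumed to already have VLANs mapped to it):

```
Switch(config)# spanning-tree mode mst
Switch(config)# spanning-tree mst 1 priority 4096
Switch(config)# end
Switch# show spanning-tree mst 1
Switch# show spanning-tree mst configuration digest
```

Comparing the digest across switches is a quick way to confirm that they really do agree on the VLAN-to-instance mappings before you go hunting for other problems.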

There’s still plenty more to come in our overview of Spanning-Tree, so stay tuned!  Thanks again for stopping by!
