Often, it just works. Why should you care about it? STP kills loops.
Why dedicate a protocol to prevent loops in local area networks?
First, loops in most network topologies are bad. The network’s job is to relay a datagram between two or more points, with the expectation that the datagram spends a finite amount of time on that network. In IP-based networks, this finite lifetime is guaranteed by the TTL field and the explicit duty of IP forwarding devices to discard packets with a value of zero in this field. Ethernet-based networks, by contrast, have no such mechanism. A loop at layer 2 has the potential to relay an Ethernet frame for an indefinite period of time. Frames forwarded in perpetuity can be amplified, consume resources, prevent topologies from stabilizing, and often cause outright disruption to the forwarding and control planes.
How does STP prevent loops?
Spanning tree implements a means for devices in a network to share information and determine a loop-free topology. Each node shares its perspective of the network and eventually determines what local ports can safely forward frames without introducing a loop.
Before beginning to forward frames (and potentially causing a loop), devices running spanning tree must first follow a series of steps to determine the network topology.
Bootstrapping the Topology
At the core of a spanning tree domain is the root bridge. This device is responsible for determining time spent in convergence and how forwarding paths will be constructed.
- Elect a root. At first, all devices transmit messages, or Bridge Protocol Data Units, claiming to be the root of the topology. These messages are addressed to a destination MAC of 01:80:C2:00:00:00 (notice that the least significant bit of the first octet is set to 1 – this is a multicast destination) and carry the following format:
[Capture of a configuration BPDU]
The root bridge will be elected based on the bridge identifier field. Specifically, the device transmitting messages with the lowest value in this field will become the root of the topology. This value is composed of the device’s priority (32768 by default, 4096 in the above frame) and MAC address. Priority is a configurable value, and must be set in increments of 4096. If priorities are equal, devices will fall back to comparing MAC addresses, and the lowest MAC address wins.
The integer lying between the priority and the MAC address is the VLAN ID for the spanning tree domain, which is necessary for per-VLAN spanning tree. Note that this is a Cisco switch, with spanning-tree extend system-id configured. In this situation, a root is elected for each VLAN.
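The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a protocol implementation: the `BridgeId` class, the `elect_root` function, and the sample MAC addresses are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BridgeId:
    """Hypothetical model of a bridge identifier with extended system ID."""
    priority: int  # configurable, multiple of 4096
    vlan_id: int   # extended system ID used by per-VLAN spanning tree
    mac: str       # e.g. "00:1a:2b:00:00:0a"

    def __post_init__(self):
        if self.priority % 4096 != 0 or not 0 <= self.priority <= 61440:
            raise ValueError("priority must be a multiple of 4096 in 0..61440")
        if not 0 <= self.vlan_id <= 4095:
            raise ValueError("VLAN ID must fit in 12 bits")

    def sort_key(self):
        # Priority plus VLAN ID is compared first; the MAC address
        # only matters when those values are equal.
        mac_value = int(self.mac.replace(":", ""), 16)
        return (self.priority + self.vlan_id, mac_value)


def elect_root(bridges):
    """Return the bridge with the lowest identifier."""
    return min(bridges, key=BridgeId.sort_key)


bridges = [
    BridgeId(32768, 10, "00:1a:2b:00:00:0b"),
    BridgeId(32768, 10, "00:1a:2b:00:00:0a"),
    BridgeId(4096, 10, "00:1a:2b:00:00:0c"),
]

root = elect_root(bridges)
print(root.priority, root.mac)  # 4096 00:1a:2b:00:00:0c
```

Note that the bridge with the highest MAC address still wins here, because its priority was lowered to 4096. Among the two bridges left at the default priority, the one ending in `0a` would win on MAC address alone.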
- Elect a root port. Once a root bridge is elected, it becomes necessary for all devices to determine the lowest cost path back towards the root. This is achieved with the following steps:
Once elected, the root bridge will begin sending configuration BPDUs referred to as hello messages at a default interval of 2 seconds. The Root Path Cost field in these BPDUs will be 0 when leaving the root switch.
Devices directly connected to the root will receive these messages and elect a single port, referred to as the Root Port, on which to send datagrams back towards the root. The port receiving BPDUs with the lowest Root Path Cost is determined to be the Root Port. If Root Path Cost is equal across 2 or more ports, the sending device’s Bridge ID, then its port priority, then its port number are compared in that order to break the tie, with the lowest value winning.
Only the root bridge originates the BPDU configuration messages. Non-root devices are responsible for propagating these messages towards the edges of the topology. Before propagating a BPDU configuration message, a non-root device will write its bridge ID into the bridge identifier field, rewrite the port identifier field, increment the Root Path Cost field with the cost of its Root Port, and increment the message age. Message age serves a purpose similar to IP TTL, except that it counts up rather than down: it is incremented every time a message is propagated (much like a hop). This field also indicates to a non-root device how far away it is from the actual root of the topology.
Only hello messages received on Root Ports are propagated. These BPDU configuration messages are only propagated via designated ports. The root device will continue to send BPDU configuration messages at an interval of the hello timer. These messages serve as a keepalive mechanism – non-root devices can assume that they have lost contact with the root device if a configuration BPDU is not received within the max age timer.
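To make the Root Port election and the relay step concrete, here is a small Python sketch. The `ConfigBpdu` class, the two helper functions, and all sample values are hypothetical. The sketch adds the local port cost to the received Root Path Cost before comparing, which is a simplifying assumption about how the comparison is carried out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ConfigBpdu:
    """Hypothetical subset of the fields in a configuration BPDU."""
    root_id: tuple       # (priority, mac) of the root bridge
    root_path_cost: int  # 0 when leaving the root
    sender_id: tuple     # (priority, mac) of the transmitting bridge
    sender_port: int     # port identifier on the transmitting bridge
    message_age: int     # number of times the message was relayed


def elect_root_port(received, port_costs):
    """Pick the local port offering the cheapest path to the root.

    received   -- {local_port: ConfigBpdu heard on that port}
    port_costs -- {local_port: cost of that port}
    Ties fall back to sender bridge ID, sender port, then local port.
    """
    def key(port):
        bpdu = received[port]
        total = bpdu.root_path_cost + port_costs[port]
        return (total, bpdu.sender_id, bpdu.sender_port, port)
    return min(received, key=key)


def propagate(bpdu, my_id, out_port, root_port_cost):
    """Rewrite a BPDU heard on the Root Port before relaying it."""
    return replace(
        bpdu,
        root_path_cost=bpdu.root_path_cost + root_port_cost,
        sender_id=my_id,
        sender_port=out_port,
        message_age=bpdu.message_age + 1,
    )


# MAC strings use one consistent lowercase format so tuples compare correctly.
ROOT = (4096, "00:00:00:00:00:01")
NEIGHBOR = (32768, "00:00:00:00:00:02")
MY_ID = (32768, "00:00:00:00:00:03")

received = {
    1: ConfigBpdu(ROOT, 0, ROOT, 3, 0),      # heard directly from the root
    2: ConfigBpdu(ROOT, 4, NEIGHBOR, 1, 1),  # relayed by a neighbor
}
port_costs = {1: 4, 2: 19}

root_port = elect_root_port(received, port_costs)
relayed = propagate(received[root_port], MY_ID, out_port=5,
                    root_port_cost=port_costs[root_port])
print(root_port, relayed.root_path_cost, relayed.message_age)  # 1 4 1
```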
How is cost determined?
Cost is generally determined based on the interface speed of a port. The default values have been revised as interface speeds have increased (before the 1998 revision of the 802.1D specification, 1 and 10GE interfaces ended up with the same port cost). The most recent costs (802.1D-2004) accommodate ports with higher speeds. Listed below are the default costs for both specifications:
802.1D-1998
10Mbps = 100
100Mbps = 19
1Gbps = 4
10Gbps = 2
802.1D-2004
10Mbps = 2000000
100Mbps = 200000
1Gbps = 20000
10Gbps = 2000
On Cisco devices, you can specify whether to use the older or newer standards. This is done with the global command spanning-tree pathcost method [ short | long ].
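The two tables can be captured as a simple lookup, which also makes it easy to compare candidate paths. The sketch below is illustrative only: `port_cost` and `path_cost` are hypothetical helpers, and the `"short"` and `"long"` labels are borrowed from the command above purely as dictionary selectors.

```python
# Default port costs from the two tables above, keyed by speed in Mbps.
SHORT_COSTS = {10: 100, 100: 19, 1_000: 4, 10_000: 2}
LONG_COSTS = {10: 2_000_000, 100: 200_000, 1_000: 20_000, 10_000: 2_000}


def port_cost(speed_mbps, method="short"):
    """Look up the default cost for a port speed under either table."""
    if method not in ("short", "long"):
        raise ValueError("method must be 'short' or 'long'")
    table = SHORT_COSTS if method == "short" else LONG_COSTS
    return table[speed_mbps]


def path_cost(link_speeds_mbps, method="short"):
    """Sum the port costs along a path towards the root."""
    return sum(port_cost(speed, method) for speed in link_speeds_mbps)


# Two 1Gbps hops beat a single 100Mbps hop under both tables.
print(path_cost([1_000, 1_000]), path_cost([100]))                  # 8 19
print(path_cost([1_000, 1_000], "long"), path_cost([100], "long"))  # 40000 200000
```

One thing the lookup makes visible: every long value listed above equals 20,000,000 divided by the speed in Mbps, so the newer scale leaves room for much faster links before costs collapse to the same small integers.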
- Elect a Designated Port. Only a single device may propagate BPDU configuration messages from the root onto a LAN segment. That device, and its port on that segment, are elected based on the lowest Root Path Cost. If a tie exists, the lowest bridge ID and then the lowest port ID are used. Any port that is neither a Root Port nor a Designated Port transitions to blocking.
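A rough sketch of the Designated Port election on one segment follows. It assumes the usual tie-break order of Root Path Cost, then bridge ID, then port ID. The `SegmentPort` type, the `assign_roles` function, and the sample bridges are hypothetical.

```python
from typing import NamedTuple


class SegmentPort(NamedTuple):
    """One bridge port attached to a shared segment (hypothetical model)."""
    root_path_cost: int  # cost this bridge would advertise on the segment
    bridge_id: tuple     # (priority, mac) of the owning bridge
    port_id: int
    is_root_port: bool = False


def assign_roles(segment_ports):
    """Elect one Designated Port; block every other non-root port."""
    # NamedTuples compare field by field, which matches the tie-break
    # order: Root Path Cost, then bridge ID, then port ID.
    designated = min(segment_ports)
    roles = {}
    for port in segment_ports:
        if port == designated:
            role = "designated"
        elif port.is_root_port:
            role = "root"
        else:
            role = "blocking"
        roles[(port.bridge_id, port.port_id)] = role
    return roles


BRIDGE_B = (32768, "00:00:00:00:00:0b")
BRIDGE_C = (32768, "00:00:00:00:00:0c")

# Both bridges are equally far from the root, so bridge ID decides.
roles = assign_roles([
    SegmentPort(4, BRIDGE_C, 2),
    SegmentPort(4, BRIDGE_B, 2),
])
print(roles[(BRIDGE_B, 2)], roles[(BRIDGE_C, 2)])  # designated blocking
```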
Port States
One catch with spanning tree is that it must communicate with other devices to determine the network’s topology while not introducing a loop. To do this, a device will transition ports through a series of states. A port will wait for a period of the Forward Delay timer before transitioning.
- Blocking: Port does not forward any frames or record source MAC addresses of received frames.
- Listening: Port does not forward any frames or record source MAC addresses of received frames. In this state, the device will process received BPDUs.
- Learning: Port does not forward any frames, but does record source MAC addresses into the MAC address table. BPDU frames are processed.
- Forwarding: Port is sending and receiving data. BPDUs are processed regularly. This state can only be reached by transitioning through both listening and learning. A transition from blocking directly to forwarding cannot occur. Since a port must pass through two states before reaching forwarding, it takes 2 × Forward Delay for a port to transition from blocking to forwarding and begin transmitting and receiving frames.
- Disabled: Port is administratively down.
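The states above form a small state machine, sketched here in Python. The names are hypothetical, and the 15-second Forward Delay is an assumed default used only for illustration.

```python
from enum import Enum


class PortState(Enum):
    DISABLED = "disabled"
    BLOCKING = "blocking"
    LISTENING = "listening"
    LEARNING = "learning"
    FORWARDING = "forwarding"


# What each state is allowed to do: (forwards frames, learns MAC addresses).
CAPABILITIES = {
    PortState.DISABLED: (False, False),
    PortState.BLOCKING: (False, False),
    PortState.LISTENING: (False, False),
    PortState.LEARNING: (False, True),
    PortState.FORWARDING: (True, True),
}

# The only path to forwarding runs through listening and learning.
NEXT_STATE = {
    PortState.BLOCKING: PortState.LISTENING,
    PortState.LISTENING: PortState.LEARNING,
    PortState.LEARNING: PortState.FORWARDING,
}


def seconds_until_forwarding(state, forward_delay=15):
    """Time a port needs to reach forwarding from the given state."""
    if state is PortState.DISABLED:
        raise ValueError("a disabled port never reaches forwarding")
    elapsed = 0
    while state is not PortState.FORWARDING:
        # Only listening and learning are held for the Forward Delay timer.
        if state in (PortState.LISTENING, PortState.LEARNING):
            elapsed += forward_delay
        state = NEXT_STATE[state]
    return elapsed


print(seconds_until_forwarding(PortState.BLOCKING))  # 30
print(seconds_until_forwarding(PortState.LEARNING))  # 15
```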
Convergence
Topology changes in a network are inevitable. Devices may fail, the topology may be scaled, or cabling may be damaged – these are facts of life. Spanning tree was designed to propagate notice of changes throughout the network so that the topology can be re-computed on every device.
When a change is detected, a non-root device will send a Topology Change Notification (TCN) BPDU out its root port. It will resend this BPDU at every hello interval until it receives a configuration BPDU with the Topology Change Acknowledgement (TCA) bit set.
It is the responsibility of the designated switch on a segment to acknowledge a TCN BPDU by setting the TCA bit in its next configuration BPDU. TCN BPDUs are then propagated back towards the root by all devices in the path. An acknowledgement is expected at every propagation point towards the root.
Once the root has received the TCN BPDU, it is then responsible for sending out a notification to all devices in the network to rapidly age out MAC address entries. This BPDU is sent out with the TC flag bit set.
All non-root devices receiving this message will inspect the value of the Forward Delay field in the BPDU frame, and apply it as the aging timer for entries in the device’s MAC address table. Any MAC address not seen as the source of a frame for the period of Forward Delay will be removed from the MAC address table. Note that the default aging timer for MAC address table entries is typically 300 seconds:
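The effect of shortening the aging timer can be illustrated with a toy MAC address table. The `MacTable` class and its methods are hypothetical, timestamps are passed in explicitly to keep the example deterministic, and the 15-second Forward Delay is an assumed value.

```python
class MacTable:
    """Toy MAC address table with a configurable aging time (seconds)."""

    def __init__(self, aging_time=300):
        self.aging_time = aging_time
        self.entries = {}  # mac -> (port, time the address was last seen)

    def learn(self, mac, port, now):
        # Seeing a source address again refreshes its timestamp.
        self.entries[mac] = (port, now)

    def on_topology_change(self, forward_delay):
        # A BPDU with the TC flag set shortens aging to Forward Delay.
        self.aging_time = forward_delay

    def expire(self, now):
        # Drop every entry that has been silent longer than the aging time.
        self.entries = {
            mac: (port, seen)
            for mac, (port, seen) in self.entries.items()
            if now - seen <= self.aging_time
        }


table = MacTable()
table.learn("aa:aa:aa:aa:aa:aa", port=1, now=0)
table.learn("bb:bb:bb:bb:bb:bb", port=2, now=90)

table.expire(now=100)
print(len(table.entries))  # 2, both are well inside 300 seconds

table.on_topology_change(forward_delay=15)
table.expire(now=100)
print(sorted(table.entries))  # ['bb:bb:bb:bb:bb:bb']
```

The address last seen 100 seconds ago is flushed as soon as the shorter timer applies, while the one seen 10 seconds ago survives. Stale entries pointing at a path that no longer exists are cleared quickly without wiping the whole table.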
[Device output showing the MAC address table aging time]
Timers
Spanning tree uses 3 distinct timers, all of which are configured on the root device:
- Hello Timer: This is the interval at which the root device sends out configuration BPDUs.
- Forward Delay Timer: Time that a transitioning port must spend in the listening and learning states.
- Max Age Timer: The maximum amount of time that a root port will remain forwarding without receiving a configuration BPDU. If this timer expires, the port will transition to a listening state.
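As a rough illustration of how the three timers interact, the sketch below models them as a small Python class. The `StpTimers` name and its methods are hypothetical. The 2-second hello interval comes from the text above, while the 15-second Forward Delay and 20-second Max Age are assumed defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StpTimers:
    """Timer values advertised by the root, in seconds."""
    hello: int = 2
    forward_delay: int = 15  # assumed default
    max_age: int = 20        # assumed default

    def root_lost(self, seconds_since_last_bpdu):
        # Configuration BPDUs act as a keepalive from the root.
        return seconds_since_last_bpdu > self.max_age

    def listening_to_forwarding(self):
        # Listening plus learning, each held for Forward Delay.
        return 2 * self.forward_delay

    def worst_case_recovery(self):
        # Wait for Max Age to expire, then walk through both states.
        return self.max_age + self.listening_to_forwarding()


timers = StpTimers()
print(timers.root_lost(6))               # False, only three hellos missed
print(timers.root_lost(21))              # True
print(timers.listening_to_forwarding())  # 30
print(timers.worst_case_recovery())      # 50
```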
More
The biggest drawback of the 802.1D version of STP is that it is slow to converge. In some cases, it may take between 30 and 50 seconds for a port to transition to forwarding. This was addressed in the later standard, 802.1w, or Rapid Spanning Tree Protocol (RSTP). Likely a good topic for another post.