Ethernet Bridging

Introduction

The IEEE 802.1Q-2022 (Bridges and Bridged Networks) standard defines the operation of bridges in computer networks. A bridge, in the context of this standard, is a device that connects two or more network segments and operates at the data link layer (Layer 2) of the OSI (Open Systems Interconnection) model. The purpose of a bridge is to filter and forward frames between different segments based on the destination MAC (Media Access Control) address.

Bridge kAPI

Here are some core structures of bridge code. Note that the kAPI is unstable, and can be changed at any time.

struct net_bridge_vlan

per-vlan entry

Definition:

struct net_bridge_vlan {
    struct rhash_head               vnode;
    struct rhash_head               tnode;
    u16 vid;
    u16 flags;
    u16 priv_flags;
    u8 state;
    struct pcpu_sw_netstats __percpu *stats;
    union {
        struct net_bridge       *br;
        struct net_bridge_port  *port;
    };
    union {
        refcount_t refcnt;
        struct net_bridge_vlan  *brvlan;
    };
    struct br_tunnel_info           tinfo;
    union {
        struct net_bridge_mcast         br_mcast_ctx;
        struct net_bridge_mcast_port    port_mcast_ctx;
    };
    u16 msti;
    struct list_head                vlist;
    struct rcu_head                 rcu;
};

Members

vnode

rhashtable member

tnode

rhashtable member

vid

VLAN id

flags

bridge vlan flags

priv_flags

private (in-kernel) bridge vlan flags

state

STP state (e.g. blocking, learning, forwarding)

stats

per-cpu VLAN statistics

{unnamed_union}

anonymous

br

if MASTER flag set, this points to a bridge struct

port

if MASTER flag unset, this points to a port struct

{unnamed_union}

anonymous

refcnt

if MASTER flag set, this is bumped for each port referencing it

brvlan

if MASTER flag unset, this points to the global per-VLAN context for this VLAN entry

tinfo

bridge tunnel info

{unnamed_union}

anonymous

br_mcast_ctx

if MASTER flag set, this is the global vlan multicast context

port_mcast_ctx

if MASTER flag unset, this is the per-port/vlan multicast context

msti

if MASTER flag set, this holds the VLANs MST instance

vlist

sorted list of VLAN entries

rcu

used for entry destruction

Description

This structure is shared between the global per-VLAN entries contained in the bridge rhashtable and the local per-port per-VLAN entries contained in the port's rhashtable. The union entries should be interpreted depending on the entry flags that are set.

Bridge uAPI

Modern Linux bridge uAPI is accessed via Netlink interface. You can find below files where the bridge and bridge port netlink attributes are defined.

Bridge sysfs

The sysfs interface is deprecated and should not be extended if new options are added.

STP

The STP (Spanning Tree Protocol) implementation in the Linux bridge driver is a critical feature that helps prevent loops and broadcast storms in Ethernet networks by identifying and disabling redundant links. In a Linux bridge context, STP is crucial for network stability and availability.

STP is a Layer 2 protocol that operates at the Data Link Layer of the OSI model. It was originally developed as IEEE 802.1D and has since evolved into multiple versions, including Rapid Spanning Tree Protocol (RSTP) and Multiple Spanning Tree Protocol (MSTP).

The 802.1D-2004 removed the original Spanning Tree Protocol, instead incorporating the Rapid Spanning Tree Protocol (RSTP). By 2014, all the functionality defined by IEEE 802.1D has been incorporated into either IEEE 802.1Q (Bridges and Bridged Networks) or IEEE 802.1AC (MAC Service Definition). 802.1D has been officially withdrawn in 2022.

Bridge Ports and STP States

In the context of STP, bridge ports can be in one of the following states:
  • Blocking: The port is disabled for data traffic and only listens for BPDUs (Bridge Protocol Data Units) from other devices to determine the network topology.

  • Listening: The port begins to participate in the STP process and listens for BPDUs.

  • Learning: The port continues to listen for BPDUs and begins to learn MAC addresses from incoming frames but does not forward data frames.

  • Forwarding: The port is fully operational and forwards both BPDUs and data frames.

  • Disabled: The port is administratively disabled and does not participate in the STP process. The data frames forwarding are also disabled.

Root Bridge and Convergence

In the context of networking and Ethernet bridging in Linux, the root bridge is a designated switch in a bridged network that serves as a reference point for the spanning tree algorithm to create a loop-free topology.

Here's how the STP works and root bridge is chosen:
  1. Bridge Priority: Each bridge running a spanning tree protocol, has a configurable Bridge Priority value. The lower the value, the higher the priority. By default, the Bridge Priority is set to a standard value (e.g., 32768).

  2. Bridge ID: The Bridge ID is composed of two components: Bridge Priority and the MAC address of the bridge. It uniquely identifies each bridge in the network. The Bridge ID is used to compare the priorities of different bridges.

  3. Bridge Election: When the network starts, all bridges initially assume that they are the root bridge. They start advertising Bridge Protocol Data Units (BPDU) to their neighbors, containing their Bridge ID and other information.

  4. BPDU Comparison: Bridges exchange BPDUs to determine the root bridge. Each bridge examines the received BPDUs, including the Bridge Priority and Bridge ID, to determine if it should adjust its own priorities. The bridge with the lowest Bridge ID will become the root bridge.

  5. Root Bridge Announcement: Once the root bridge is determined, it sends BPDUs with information about the root bridge to all other bridges in the network. This information is used by other bridges to calculate the shortest path to the root bridge and, in doing so, create a loop-free topology.

  6. Forwarding Ports: After the root bridge is selected and the spanning tree topology is established, each bridge determines which of its ports should be in the forwarding state (used for data traffic) and which should be in the blocking state (used to prevent loops). The root bridge's ports are all in the forwarding state. while other bridges have some ports in the blocking state to avoid loops.

  7. Root Ports: After the root bridge is selected and the spanning tree topology is established, each non-root bridge processes incoming BPDUs and determines which of its ports provides the shortest path to the root bridge based on the information in the received BPDUs. This port is designated as the root port. And it is in the Forwarding state, allowing it to actively forward network traffic.

  8. Designated ports: A designated port is the port through which the non-root bridge will forward traffic towards the designated segment. Designated ports are placed in the Forwarding state. All other ports on the non-root bridge that are not designated for specific segments are placed in the Blocking state to prevent network loops.

STP ensures network convergence by calculating the shortest path and disabling redundant links. When network topology changes occur (e.g., a link failure), STP recalculates the network topology to restore connectivity while avoiding loops.

Proper configuration of STP parameters, such as the bridge priority, can influence network performance, path selection and which bridge becomes the Root Bridge.

User space STP helper

The user space STP helper bridge-stp is a program to control whether to use user mode spanning tree. The /sbin/bridge-stp <bridge> <start|stop> is called by the kernel when STP is enabled/disabled on a bridge (via brctl stp <bridge> <on|off> or ip link set <bridge> type bridge stp_state <0|1>). The kernel enables user_stp mode if that command returns 0, or enables kernel_stp mode if that command returns any other value.

VLAN

A LAN (Local Area Network) is a network that covers a small geographic area, typically within a single building or a campus. LANs are used to connect computers, servers, printers, and other networked devices within a localized area. LANs can be wired (using Ethernet cables) or wireless (using Wi-Fi).

A VLAN (Virtual Local Area Network) is a logical segmentation of a physical network into multiple isolated broadcast domains. VLANs are used to divide a single physical LAN into multiple virtual LANs, allowing different groups of devices to communicate as if they were on separate physical networks.

Typically there are two VLAN implementations, IEEE 802.1Q and IEEE 802.1ad (also known as QinQ). IEEE 802.1Q is a standard for VLAN tagging in Ethernet networks. It allows network administrators to create logical VLANs on a physical network and tag Ethernet frames with VLAN information, which is called VLAN-tagged frames. IEEE 802.1ad, commonly known as QinQ or Double VLAN, is an extension of the IEEE 802.1Q standard. QinQ allows for the stacking of multiple VLAN tags within a single Ethernet frame. The Linux bridge supports both the IEEE 802.1Q and 802.1AD protocol for VLAN tagging.

VLAN filtering on a bridge is disabled by default. After enabling VLAN filtering on a bridge, it will start forwarding frames to appropriate destinations based on their destination MAC address and VLAN tag (both must match).

Multicast

The Linux bridge driver has multicast support allowing it to process Internet Group Management Protocol (IGMP) or Multicast Listener Discovery (MLD) messages, and to efficiently forward multicast data packets. The bridge driver supports IGMPv2/IGMPv3 and MLDv1/MLDv2.

Multicast snooping

Multicast snooping is a networking technology that allows network switches to intelligently manage multicast traffic within a local area network (LAN).

The switch maintains a multicast group table, which records the association between multicast group addresses and the ports where hosts have joined these groups. The group table is dynamically updated based on the IGMP/MLD messages received. With the multicast group information gathered through snooping, the switch optimizes the forwarding of multicast traffic. Instead of blindly broadcasting the multicast traffic to all ports, it sends the multicast traffic based on the destination MAC address only to ports which have subscribed the respective destination multicast group.

When created, the Linux bridge devices have multicast snooping enabled by default. It maintains a Multicast forwarding database (MDB) which keeps track of port and group relationships.

IGMPv3/MLDv2 EHT support

The Linux bridge supports IGMPv3/MLDv2 EHT (Explicit Host Tracking), which was added by 474ddb37fa3a ("net: bridge: multicast: add EHT allow/block handling")

The explicit host tracking enables the device to keep track of each individual host that is joined to a particular group or channel. The main benefit of the explicit host tracking in IGMP is to allow minimal leave latencies when a host leaves a multicast group or channel.

The length of time between a host wanting to leave and a device stopping traffic forwarding is called the IGMP leave latency. A device configured with IGMPv3 or MLDv2 and explicit tracking can immediately stop forwarding traffic if the last host to request to receive traffic from the device indicates that it no longer wants to receive traffic. The leave latency is thus bound only by the packet transmission latencies in the multiaccess network and the processing time in the device.

Other multicast features

The Linux bridge also supports per-VLAN multicast snooping, which is disabled by default but can be enabled. And Multicast Router Discovery, which help identify the location of multicast routers.

Switchdev

Linux Bridge Switchdev is a feature in the Linux kernel that extends the capabilities of the traditional Linux bridge to work more efficiently with hardware switches that support switchdev. With Linux Bridge Switchdev, certain networking functions like forwarding, filtering, and learning of Ethernet frames can be offloaded to a hardware switch. This offloading reduces the burden on the Linux kernel and CPU, leading to improved network performance and lower latency.

To use Linux Bridge Switchdev, you need hardware switches that support the switchdev interface. This means that the switch hardware needs to have the necessary drivers and functionality to work in conjunction with the Linux kernel.

Please see the Ethernet switch device driver model (switchdev) document for more details.

Netfilter

The bridge netfilter module is a legacy feature that allows to filter bridged packets with iptables and ip6tables. Its use is discouraged. Users should consider using nftables for packet filtering.

The older ebtables tool is more feature-limited compared to nftables, but just like nftables it doesn't need this module either to function.

The br_netfilter module intercepts packets entering the bridge, performs minimal sanity tests on ipv4 and ipv6 packets and then pretends that these packets are being routed, not bridged. br_netfilter then calls the ip and ipv6 netfilter hooks from the bridge layer, i.e. ip(6)tables rulesets will also see these packets.

br_netfilter is also the reason for the iptables physdev match: This match is the only way to reliably tell routed and bridged packets apart in an iptables ruleset.

Note that ebtables and nftables will work fine without the br_netfilter module. iptables/ip6tables/arptables do not work for bridged traffic because they plug in the routing stack. nftables rules in ip/ip6/inet/arp families won't see traffic that is forwarded by a bridge either, but that's very much how it should be.

Historically the feature set of ebtables was very limited (it still is), this module was added to pretend packets are routed and invoke the ipv4/ipv6 netfilter hooks from the bridge so users had access to the more feature-rich iptables matching capabilities (including conntrack). nftables doesn't have this limitation, pretty much all features work regardless of the protocol family.

So, br_netfilter is only needed if users, for some reason, need to use ip(6)tables to filter packets forwarded by the bridge, or NAT bridged traffic. For pure link layer filtering, this module isn't needed.

Other Features

The Linux bridge also supports IEEE 802.11 Proxy ARP, Media Redundancy Protocol (MRP), Media Redundancy Protocol (MRP) LC mode, IEEE 802.1X port authentication, and MAC Authentication Bypass (MAB).

FAQ

What does a bridge do?

A bridge transparently forwards traffic between multiple network interfaces. In plain English this means that a bridge connects two or more physical Ethernet networks, to form one larger (logical) Ethernet network.

Is it L3 protocol independent?

Yes. The bridge sees all frames, but it uses only L2 headers/information. As such, the bridging functionality is protocol independent, and there should be no trouble forwarding IPX, NetBEUI, IP, IPv6, etc.

Contact Info

The code is currently maintained by Roopa Prabhu <roopa@nvidia.com> and Nikolay Aleksandrov <razor@blackwall.org>. Bridge bugs and enhancements are discussed on the linux-netdev mailing list netdev@vger.kernel.org and bridge@lists.linux-foundation.org.

The list is open to anyone interested: http://vger.kernel.org/vger-lists.html#netdev