Compute Express Link Memory Devices¶

A Compute Express Link Memory Device is a CXL component that implements the CXL.mem protocol. It contains some amount of volatile memory, persistent memory, or both. It is enumerated as a PCI device for configuration and passing messages over an MMIO mailbox. Its contribution to the System Physical Address space is handled via HDM (Host Managed Device Memory) decoders that optionally define a device’s contribution to an interleaved address range across multiple devices underneath a host-bridge or interleaved across host-bridges.

CXL Bus: Theory of Operation¶

Similar to how a RAID driver takes disk objects and assembles them into a new logical device, the CXL subsystem is tasked to take PCIe and ACPI objects and assemble them into a CXL.mem decode topology. The need for runtime configuration of the CXL.mem topology is also similar to RAID in that different environments with the same hardware configuration may decide to assemble the topology in contrasting ways. One may choose performance (RAID0) striping memory across multiple Host Bridges and endpoints while another may opt for fault tolerance and disable any striping in the CXL.mem topology.

Platform firmware enumerates a menu of interleave options at the “CXL root port” (Linux term for the top of the CXL decode topology). From there, PCIe topology dictates which endpoints can participate in which Host Bridge decode regimes. Each PCIe Switch in the path between the root and an endpoint introduces a point at which the interleave can be split. For example, platform firmware may specify that a given range decodes to only one Host Bridge, but that Host Bridge may in turn interleave cycles across multiple Root Ports. An intervening Switch between a port and an endpoint may interleave cycles across multiple Downstream Switch Ports, etc.
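
To make the split concrete, the sketch below (not kernel code; the helper name and the power-of-two-only math are illustrative assumptions) shows the arithmetic a decoder conceptually applies at each level of the hierarchy: the interleave granularity sets the stride, and the interleave ways select which downstream target receives a given cycle.

#include <stdint.h>

/*
 * Illustrative only: pick the downstream target index for a host
 * physical address given a decoder's interleave parameters.  Assumes
 * power-of-two ways and granularity; the HDM decoder hardware performs
 * this selection, the kernel merely programs the parameters.
 */
static unsigned int decode_target_index(uint64_t hpa, uint64_t base,
                                        unsigned int interleave_ways,
                                        unsigned int interleave_granularity)
{
  uint64_t offset = hpa - base;

  /* consecutive granularity-sized chunks rotate across the targets */
  return (offset / interleave_granularity) % interleave_ways;
}

For a 2-way, 256-byte-granularity decode, offsets 0x000-0x0ff fall to target 0, 0x100-0x1ff to target 1, 0x200-0x2ff back to target 0, and so on; each level of the hierarchy repeats this selection over its own target list.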

Here is a sample listing of a CXL topology defined by ‘cxl_test’. The ‘cxl_test’ module generates an emulated CXL topology of 2 Host Bridges, each with 2 Root Ports. Each of those Root Ports is connected to a 2-way switch with endpoints connected to the downstream ports, for a total of 8 endpoints:

# cxl list -BEMPu -b cxl_test
{
  "bus":"root3",
  "provider":"cxl_test",
  "ports:root3":[
    {
      "port":"port5",
      "host":"cxl_host_bridge.1",
      "ports:port5":[
        {
          "port":"port8",
          "host":"cxl_switch_uport.1",
          "endpoints:port8":[
            {
              "endpoint":"endpoint9",
              "host":"mem2",
              "memdev":{
                "memdev":"mem2",
                "pmem_size":"256.00 MiB (268.44 MB)",
                "ram_size":"256.00 MiB (268.44 MB)",
                "serial":"0x1",
                "numa_node":1,
                "host":"cxl_mem.1"
              }
            },
            {
              "endpoint":"endpoint15",
              "host":"mem6",
              "memdev":{
                "memdev":"mem6",
                "pmem_size":"256.00 MiB (268.44 MB)",
                "ram_size":"256.00 MiB (268.44 MB)",
                "serial":"0x5",
                "numa_node":1,
                "host":"cxl_mem.5"
              }
            }
          ]
        },
        {
          "port":"port12",
          "host":"cxl_switch_uport.3",
          "endpoints:port12":[
            {
              "endpoint":"endpoint17",
              "host":"mem8",
              "memdev":{
                "memdev":"mem8",
                "pmem_size":"256.00 MiB (268.44 MB)",
                "ram_size":"256.00 MiB (268.44 MB)",
                "serial":"0x7",
                "numa_node":1,
                "host":"cxl_mem.7"
              }
            },
            {
              "endpoint":"endpoint13",
              "host":"mem4",
              "memdev":{
                "memdev":"mem4",
                "pmem_size":"256.00 MiB (268.44 MB)",
                "ram_size":"256.00 MiB (268.44 MB)",
                "serial":"0x3",
                "numa_node":1,
                "host":"cxl_mem.3"
              }
            }
          ]
        }
      ]
    },
    {
      "port":"port4",
      "host":"cxl_host_bridge.0",
      "ports:port4":[
        {
          "port":"port6",
          "host":"cxl_switch_uport.0",
          "endpoints:port6":[
            {
              "endpoint":"endpoint7",
              "host":"mem1",
              "memdev":{
                "memdev":"mem1",
                "pmem_size":"256.00 MiB (268.44 MB)",
                "ram_size":"256.00 MiB (268.44 MB)",
                "serial":"0",
                "numa_node":0,
                "host":"cxl_mem.0"
              }
            },
            {
              "endpoint":"endpoint14",
              "host":"mem5",
              "memdev":{
                "memdev":"mem5",
                "pmem_size":"256.00 MiB (268.44 MB)",
                "ram_size":"256.00 MiB (268.44 MB)",
                "serial":"0x4",
                "numa_node":0,
                "host":"cxl_mem.4"
              }
            }
          ]
        },
        {
          "port":"port10",
          "host":"cxl_switch_uport.2",
          "endpoints:port10":[
            {
              "endpoint":"endpoint16",
              "host":"mem7",
              "memdev":{
                "memdev":"mem7",
                "pmem_size":"256.00 MiB (268.44 MB)",
                "ram_size":"256.00 MiB (268.44 MB)",
                "serial":"0x6",
                "numa_node":0,
                "host":"cxl_mem.6"
              }
            },
            {
              "endpoint":"endpoint11",
              "host":"mem3",
              "memdev":{
                "memdev":"mem3",
                "pmem_size":"256.00 MiB (268.44 MB)",
                "ram_size":"256.00 MiB (268.44 MB)",
                "serial":"0x2",
                "numa_node":0,
                "host":"cxl_mem.2"
              }
            }
          ]
        }
      ]
    }
  ]
}

In that listing each “root”, “port”, and “endpoint” object corresponds to a kernel ‘struct cxl_port’ object. A ‘cxl_port’ is a device that can decode CXL.mem to its descendants. So the “root” claims non-PCIe-enumerable platform decode ranges and decodes them to “ports”, “ports” decode to “endpoints”, and “endpoints” represent the decode from SPA (System Physical Address) to DPA (Device Physical Address).

Continuing the RAID analogy, disks have both topology metadata and on-device metadata that determine RAID set assembly. CXL Port topology and CXL Port link status are metadata for CXL.mem set assembly. The CXL Port topology is enumerated by the arrival of a CXL.mem device; i.e., unless and until the PCIe core attaches the cxl_pci driver to a CXL Memory Expander there is no role for CXL Port objects. Conversely, for hot-unplug / removal scenarios there is no need for the Linux PCI core to tear down switch-level CXL resources because the endpoint ->remove() event cleans up the port data that was established to support that Memory Expander.

The port metadata and potential decode schemes that a given memory device may participate in can be determined via a command like:

# cxl list -BDMu -d root -m mem3
{
  "bus":"root3",
  "provider":"cxl_test",
  "decoders:root3":[
    {
      "decoder":"decoder3.1",
      "resource":"0x8030000000",
      "size":"512.00 MiB (536.87 MB)",
      "volatile_capable":true,
      "nr_targets":2
    },
    {
      "decoder":"decoder3.3",
      "resource":"0x8060000000",
      "size":"512.00 MiB (536.87 MB)",
      "pmem_capable":true,
      "nr_targets":2
    },
    {
      "decoder":"decoder3.0",
      "resource":"0x8020000000",
      "size":"256.00 MiB (268.44 MB)",
      "volatile_capable":true,
      "nr_targets":1
    },
    {
      "decoder":"decoder3.2",
      "resource":"0x8050000000",
      "size":"256.00 MiB (268.44 MB)",
      "pmem_capable":true,
      "nr_targets":1
    }
  ],
  "memdevs:root3":[
    {
      "memdev":"mem3",
      "pmem_size":"256.00 MiB (268.44 MB)",
      "ram_size":"256.00 MiB (268.44 MB)",
      "serial":"0x2",
      "numa_node":0,
      "host":"cxl_mem.2"
    }
  ]
}

…which queries the CXL topology to ask “given the CXL Memory Expander with the kernel device name ‘mem3’, which platform-level decode ranges may this device participate in?”. A given expander can participate in multiple CXL.mem interleave sets simultaneously depending on how many decoder resources it has. In this example mem3 can participate in one or more of a PMEM interleave that spans 2 Host Bridges, a PMEM interleave that targets a single Host Bridge, a volatile memory interleave that spans 2 Host Bridges, and a volatile memory interleave that only targets a single Host Bridge.

Conversely the memory devices that can participate in a given platform level decode scheme can be determined via a command like the following:

# cxl list -MDu -d 3.2
[
  {
    "memdevs":[
      {
        "memdev":"mem1",
        "pmem_size":"256.00 MiB (268.44 MB)",
        "ram_size":"256.00 MiB (268.44 MB)",
        "serial":"0",
        "numa_node":0,
        "host":"cxl_mem.0"
      },
      {
        "memdev":"mem5",
        "pmem_size":"256.00 MiB (268.44 MB)",
        "ram_size":"256.00 MiB (268.44 MB)",
        "serial":"0x4",
        "numa_node":0,
        "host":"cxl_mem.4"
      },
      {
        "memdev":"mem7",
        "pmem_size":"256.00 MiB (268.44 MB)",
        "ram_size":"256.00 MiB (268.44 MB)",
        "serial":"0x6",
        "numa_node":0,
        "host":"cxl_mem.6"
      },
      {
        "memdev":"mem3",
        "pmem_size":"256.00 MiB (268.44 MB)",
        "ram_size":"256.00 MiB (268.44 MB)",
        "serial":"0x2",
        "numa_node":0,
        "host":"cxl_mem.2"
      }
    ]
  },
  {
    "root decoders":[
      {
        "decoder":"decoder3.2",
        "resource":"0x8050000000",
        "size":"256.00 MiB (268.44 MB)",
        "pmem_capable":true,
        "nr_targets":1
      }
    ]
  }
]

…where the naming scheme for decoders is “decoder<port_id>.<instance_id>”.

Driver Infrastructure¶

This section covers the driver infrastructure for a CXL memory device.

CXL Memory Device¶

This implements the PCI-exclusive functionality for a CXL device as it is defined by the Compute Express Link specification. A CXL device may surface certain functionality even if it is not CXL enabled. While this driver is focused on the PCI-specific aspects of a CXL device, it binds to the CXL memory device class code, and therefore the implementation of cxl_pci is focused on CXL memory devices.

The driver has several responsibilities, mainly:
  • Create the memX device and register it on the CXL bus.

  • Enumerate the device’s register interface and map it.

  • Register an nvdimm bridge device with cxl_core.

  • Register a CXL mailbox with cxl_core.

int __cxl_pci_mbox_send_cmd(struct cxl_dev_state *cxlds, struct cxl_mbox_cmd *mbox_cmd)¶

Execute a mailbox command

Parameters

struct cxl_dev_state *cxlds

The device state to communicate with.

struct cxl_mbox_cmd *mbox_cmd

Command to send to the memory device.

Context

Any context. Expects mbox_mutex to be held.

Return

-ETIMEDOUT if timeout occurred waiting for completion. 0 on success.

Caller should check the return code in mbox_cmd to make sure it succeeded.

Description

This is a generic form of the CXL mailbox send command, using only the registers defined by the mailbox capability ID (CXL 2.0, section 8.2.8.4). Memory devices, and perhaps other types of CXL devices, may have further information available upon error conditions. Driver facilities wishing to send mailbox commands should use the wrapper command.

The CXL spec allows for up to two mailboxes. The intention is for the primary mailbox to be OS controlled and the secondary mailbox to be used by system firmware. This allows the OS and firmware to communicate with the device and not need to coordinate with each other. The driver only uses the primary mailbox.
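
The following is a minimal sketch of a caller driving this primitive. The struct cxl_mbox_cmd field names, the location of mbox_mutex in struct cxl_dev_state, and the example opcode are assumptions for illustration; in-tree code goes through the cxl_core wrapper rather than calling __cxl_pci_mbox_send_cmd() directly.

/*
 * Hypothetical sketch: issue a single command through the primary
 * mailbox.  Field names and the opcode are assumed for illustration.
 */
static int example_mbox_send(struct cxl_dev_state *cxlds, void *out,
                             size_t out_size)
{
  struct cxl_mbox_cmd cmd = {
    .opcode = 0x4000,      /* e.g. an identify-style command */
    .payload_out = out,
    .size_out = out_size,
  };
  int rc;

  mutex_lock(&cxlds->mbox_mutex);   /* the send helper expects this held */
  rc = __cxl_pci_mbox_send_cmd(cxlds, &cmd);
  mutex_unlock(&cxlds->mbox_mutex);

  if (rc)                  /* transport failure, e.g. -ETIMEDOUT */
    return rc;

  if (cmd.return_code)     /* per the note above, check the device's own status */
    return -EIO;

  return 0;
}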

CXL memory endpoint devices and switches are CXL-capable devices that participate in the CXL.mem protocol. Their functionality builds on top of the CXL.io protocol, which allows enumerating and configuring components via standard PCI mechanisms.

The cxl_mem driver is responsible for kicking off the enumeration of this CXL.mem capability. Upon detection of a CXL-capable endpoint, the driver walks up to find the platform-specific port it is connected to and determines whether there are intervening switches in the path. If there are switches, a secondary action is to enumerate those (implemented in cxl_core). Finally, the cxl_mem driver adds the device it is bound to as a CXL endpoint-port for use in higher-level operations.

CXL Port¶

The port driver enumerates dports via PCI and scans for HDM (Host-managed Device Memory) decoder resources via the component_reg_phys value passed in by the agent that registered the port. All descendant ports of a CXL root port (described by platform firmware) are managed in this driver’s context. Each driver instance is responsible for tearing down the driver context of its immediate descendant ports. The locking for this is validated by CONFIG_PROVE_CXL_LOCKING.

The primary service this driver provides is presenting APIs to other drivers to utilize the decoders, and indicating to userspace (via bind status) the connectivity of the CXL.mem protocol throughout the PCIe topology.

CXL Core¶

The CXL core objects like ports, decoders, and regions are shared between the subsystem drivers cxl_acpi, cxl_pci, and core drivers (port-driver, region-driver, nvdimm object-drivers… etc).

struct cxl_register_map¶

DVSEC harvested register block mapping parameters

Definition

struct cxl_register_map {
  void __iomem *base;
  u64 block_offset;
  u8 reg_type;
  u8 barno;
  union {
    struct cxl_component_reg_map component_map;
    struct cxl_device_reg_map device_map;
  };
};

Members

base

virtual base of the register-block-BAR + block_offset

block_offset

offset to start of register block in barno

reg_type

see enum cxl_regloc_type

barno

PCI BAR number containing the register block

{unnamed_union}

anonymous

component_map

cxl_reg_map for component registers

device_map

cxl_reg_maps for device registers

struct cxl_decoder¶

CXL address range decode configuration

Definition

struct cxl_decoder {
  struct device dev;
  int id;
  union {
    struct resource platform_res;
    struct range decoder_range;
  };
  int interleave_ways;
  int interleave_granularity;
  enum cxl_decoder_type target_type;
  unsigned long flags;
  seqlock_t target_lock;
  int nr_targets;
  struct cxl_dport *target[];
};

Members

dev

this decoder’s device

id

kernel device name id

{unnamed_union}

anonymous

platform_res

address space resources considered by root decoder

decoder_range

address space resources considered by midlevel decoder

interleave_ways

number of cxl_dports in this decode

interleave_granularity

data stride per dport

target_type

accelerator vs expander (type2 vs type3) selector

flags

memory type capabilities and locking

target_lock

coordinate coherent reads of the target list

nr_targets

number of elements in target

target

active ordered target list in current decoder configuration

enum cxl_nvdimm_brige_state¶

state machine for managing bus rescans

Constants

CXL_NVB_NEW

Set at bridge create and after cxl_pmem_wq is destroyed

CXL_NVB_DEAD

Set at bridge unregistration to preclude async probing

CXL_NVB_ONLINE

Target state after successful ->probe()

CXL_NVB_OFFLINE

Target state after ->remove() or failed ->probe()

struct cxl_port¶

logical collection of upstream port devices and downstream port devices to construct a CXL memory decode hierarchy.

Definition

struct cxl_port {
  struct device dev;
  struct device *uport;
  int id;
  struct list_head dports;
  struct list_head endpoints;
  struct ida decoder_ida;
  resource_size_t component_reg_phys;
  bool dead;
  unsigned int depth;
};

Members

dev

this port’s device

uport

PCI or platform device implementing the upstream port capability

id

id for port device-name

dports

cxl_dport instances referenced by decoders

endpoints

cxl_ep instances, endpoints that are a descendant of this port

decoder_ida

allocator for decoder ids

component_reg_phys

component register capability base address (optional)

dead

last ep has been removed, force port re-creation

depth

How deep this port is relative to the root. depth 0 is the root.

struct cxl_dport¶

CXL downstream port

Definition

struct cxl_dport {
  struct device *dport;
  int port_id;
  resource_size_t component_reg_phys;
  struct cxl_port *port;
  struct list_head list;
};

Members

dport

PCI bridge or firmware device representing the downstream link

port_id

unique hardware identifier for dport in decoder target list

component_reg_phys

downstream port component registers

port

reference to cxl_port that contains this downstream port

list

node for a cxl_port’s list of cxl_dport instances

struct cxl_ep¶

track an endpoint’s interest in a port

Definition

struct cxl_ep {
  struct device *ep;
  struct list_head list;
};

Members

ep

device that hosts a generic CXL endpoint (expander or accelerator)

list

node on port->endpoints list

The CXL core provides a set of interfaces that can be consumed by CXL aware drivers. The interfaces allow for creation, modification, and destruction of regions, memory devices, ports, and decoders. CXL aware drivers must register with the CXL core via these interfaces in order to be able to participate in cross-device interleave coordination. The CXL core also establishes and maintains the bridge to the nvdimm subsystem.

The CXL core introduces a sysfs hierarchy to control the devices that are instantiated by the core.

struct cxl_port * devm_cxl_add_port(struct device *host, struct device *uport, resource_size_t component_reg_phys, struct cxl_port *parent_port)¶

register a cxl_port in CXL memory decode hierarchy

Parameters

struct device *host

host device for devm operations

struct device *uport

“physical” device implementing this upstream port

resource_size_t component_reg_phys

(optional) for configurable cxl_port instances

struct cxl_port *parent_port

next hop up in the CXL memory decode hierarchy

struct cxl_dport * devm_cxl_add_dport(struct cxl_port *port, struct device *dport_dev, int port_id, resource_size_t component_reg_phys)¶

append downstream port data to a cxl_port

Parameters

struct cxl_port *port

the cxl_port that references this dport

struct device *dport_dev

firmware or PCI device representing the dport

int port_id

identifier for this dport in a decoder’s target list

resource_size_t component_reg_phys

optional location of CXL component registers

Description

Note that dports are appended to the devm release action of either the port’s host (for root ports) or the port itself (for switch ports).
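
A hedged sketch of how an enumeration agent might pair this call with devm_cxl_add_port(); the CXL_RESOURCE_NONE placeholder for “no component registers” and the hardware port id of 0 are assumptions for illustration.

/*
 * Illustrative only: register a port for 'uport_dev' beneath
 * 'parent_port', then record one downstream port under it.
 */
static int example_add_port_and_dport(struct device *host,
                                      struct device *uport_dev,
                                      struct device *dport_dev,
                                      struct cxl_port *parent_port)
{
  struct cxl_port *port;
  struct cxl_dport *dport;

  port = devm_cxl_add_port(host, uport_dev, CXL_RESOURCE_NONE, parent_port);
  if (IS_ERR(port))
    return PTR_ERR(port);

  /* port_id 0 must match the id used in decoder target lists */
  dport = devm_cxl_add_dport(port, dport_dev, 0, CXL_RESOURCE_NONE);
  if (IS_ERR(dport))
    return PTR_ERR(dport);

  return 0;
}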

int cxl_add_ep(struct cxl_port *port, struct device *ep_dev)¶

register an endpoint’s interest in a port

Parameters

struct cxl_port *port

a port in the endpoint’s topology ancestry

struct device *ep_dev

device representing the endpoint

Description

Intermediate CXL ports are scanned based on the arrival of endpoints. When those endpoints depart, the port can be destroyed once all endpoints that care about that port have been removed.

struct cxl_decoder * cxl_decoder_alloc(struct cxl_port *port, unsigned int nr_targets)¶

Allocate a new CXL decoder

Parameters

struct cxl_port *port

owning port of this decoder

unsigned int nr_targets

downstream targets accessible by this decoder. All upstream ports and root ports must have at least 1 target. Endpoint devices will have 0 targets. Callers wishing to register an endpoint device should specify 0.

Description

A port should contain one or more decoders. Each of those decoders enable some address space for CXL.mem utilization. A decoder is expected to be configured by the caller before registering.

Return

A new cxl decoder to be registered by cxl_decoder_add(). The decoder is initialized to be a “passthrough” decoder.

struct cxl_decoder * cxl_root_decoder_alloc(struct cxl_port *port, unsigned int nr_targets)¶

Allocate a root level decoder

Parameters

struct cxl_port *port

owning CXL root of this decoder

unsigned int nr_targets

static number of downstream targets

Return

A new cxl decoder to be registered by cxl_decoder_add(). A ‘CXL root’ decoder is one that decodes from a top-level / static platform firmware description of CXL resources into a CXL standard decode topology.

struct cxl_decoder * cxl_switch_decoder_alloc(struct cxl_port *port, unsigned int nr_targets)¶

Allocate a switch level decoder

Parameters

struct cxl_port *port

owning CXL switch port of this decoder

unsigned int nr_targets

max number of dynamically addressable downstream targets

Return

A new cxl decoder to be registered by cxl_decoder_add(). A ‘switch’ decoder is any decoder that can be enumerated by PCIe topology and the HDM Decoder Capability. This includes the decoders that sit between Switch Upstream Ports / Switch Downstream Ports and Host Bridges / Root Ports.

struct cxl_decoder * cxl_endpoint_decoder_alloc(struct cxl_port *port)¶

Allocate an endpoint decoder

Parameters

struct cxl_port *port

owning port of this decoder

Return

A new cxl decoder to be registered by cxl_decoder_add()

int cxl_decoder_add_locked(struct cxl_decoder *cxld, int *target_map)¶

Add a decoder with targets

Parameters

struct cxl_decoder *cxld

The cxl decoder allocated by cxl_decoder_alloc()

int *target_map

A list of downstream ports that this decoder can direct memory traffic to. These numbers should correspond with the port number in the PCIe Link Capabilities structure.

Description

Certain types of decoders may not have any targets. The main example of this is an endpoint device. A more awkward example is a Host Bridge whose Root Ports get hot-added (technically possible, though unlikely).

This is the locked variant of cxl_decoder_add().

Context

Process context. Expects the device lock of the port that owns the cxld to be held.

Return

Negative error code if the decoder wasn’t properly configured; else returns 0.

int cxl_decoder_add(struct cxl_decoder *cxld, int *target_map)¶

Add a decoder with targets

Parameters

struct cxl_decoder *cxld

The cxl decoder allocated by cxl_decoder_alloc()

int *target_map

A list of downstream ports that this decoder can direct memory traffic to. These numbers should correspond with the port number in the PCIe Link Capabilities structure.

Description

This is the unlocked variant of cxl_decoder_add_locked(). See cxl_decoder_add_locked().

Context

Process context. Takes and releases the device lock of the port that owns the cxld.
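
Tying the allocation and registration halves together, a hedged sketch follows; the interleave values, the CXL_DECODER_EXPANDER target type, and the error cleanup are assumptions, and a real driver would program the decoder’s address range from its HDM registers before adding it.

/*
 * Illustrative only: allocate a 2-way switch decoder, configure it,
 * and register it with an explicit target map.
 */
static int example_register_switch_decoder(struct cxl_port *port)
{
  int target_map[2] = { 0, 1 };  /* hardware port ids of the dports */
  struct cxl_decoder *cxld;
  int rc;

  cxld = cxl_switch_decoder_alloc(port, 2);
  if (IS_ERR(cxld))
    return PTR_ERR(cxld);

  cxld->interleave_ways = 2;
  cxld->interleave_granularity = 256;
  cxld->target_type = CXL_DECODER_EXPANDER;  /* assumed type3 selector */
  /* ...a real driver would also fill in the decoder's address range... */

  rc = cxl_decoder_add(cxld, target_map);
  if (rc)
    put_device(&cxld->dev);  /* assumed cleanup path on failure */
  return rc;
}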

int __cxl_driver_register(struct cxl_driver *cxl_drv, struct module *owner, const char *modname)¶

register a driver for the cxl bus

Parameters

struct cxl_driver *cxl_drv

cxl driver structure to attach

struct module *owner

owning module/driver

const char *modname

KBUILD_MODNAME for parent driver
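
Drivers normally do not call this function directly. Below is a sketch of a minimal cxl bus driver, assuming the cxl_driver_register()/module_cxl_driver() convenience macros supply THIS_MODULE and KBUILD_MODNAME, and that cxl_core exports its symbols in the CXL namespace.

/* Hypothetical skeleton of a driver for devices on the cxl bus. */
static int example_cxl_probe(struct device *dev)
{
  /* dev is a device registered on the cxl bus, e.g. a port or memdev */
  return 0;
}

static void example_cxl_remove(struct device *dev)
{
}

static struct cxl_driver example_cxl_driver = {
  .name = "example_cxl",
  .probe = example_cxl_probe,
  .remove = example_cxl_remove,
};

module_cxl_driver(example_cxl_driver);  /* ends up in __cxl_driver_register() */
MODULE_LICENSE("GPL v2");
MODULE_IMPORT_NS(CXL);                  /* assumed symbol namespace of cxl_core */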

Compute Express Link protocols are layered on top of PCIe. CXL core provides a set of helpers for CXL interactions which occur via PCIe.

int devm_cxl_port_enumerate_dports(struct cxl_port *port)¶

enumerate downstream ports of the upstream port

Parameters

struct cxl_port *port

cxl_port whose ->uport is the upstream of dports to be enumerated

Description

Returns a positive number of dports enumerated or a negative error code.
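
For instance, a port driver might use the positive return value to bound the number of decoder targets it expects to enumerate next (a brief hedged sketch):

/* Illustrative: the dport count bounds subsequent decoder setup */
static int example_enumerate(struct cxl_port *port)
{
  int nr_dports;

  nr_dports = devm_cxl_port_enumerate_dports(port);
  if (nr_dports < 0)
    return nr_dports;  /* negative error code */

  /* ...allocate switch decoders with at most nr_dports targets... */
  return 0;
}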

The core CXL PMEM infrastructure supports persistent memory provisioning and serves as a bridge to the LIBNVDIMM subsystem. A CXL ‘bridge’ device is added at the root of a CXL device topology if platform firmware advertises at least one persistent memory capable CXL window. That root-level bridge corresponds to a LIBNVDIMM ‘bus’ device. Then, for each cxl_memdev in the CXL device topology, a bridge device is added to host a LIBNVDIMM dimm object. When these bridges are registered, native LIBNVDIMM uapis are translated to CXL operations, for example, namespace label access commands.

CXL device capabilities are enumerated by PCI DVSEC (Designated Vendor-Specific Extended Capability) structures and/or descriptors provided by platform firmware. They can be defined as a set, like the device and component registers mandated by CXL Section 8.1.12.2 Memory Device PCIe Capabilities and Extended Capabilities, or they can be individual capabilities appended to bridged and endpoint devices.

Provide common infrastructure for enumerating and mapping these discrete capabilities.

Core implementation of the CXL 2.0 Type-3 Memory Device Mailbox. The implementation is used by the cxl_pci driver to initialize the device and implement the cxl_mem.h IOCTL UAPI. It also implements the backend of the cxl_pmem_ctl() transport for LIBNVDIMM.

External Interfaces¶

CXL IOCTL Interface¶

Not all of the commands that the driver supports are always available for use by userspace. Userspace must check the results of the QUERY command to determine the live set of commands.

struct cxl_command_info¶

Command information returned from a query.

Definition

struct cxl_command_info {
  __u32 id;
  __u32 flags;
#define CXL_MEM_COMMAND_FLAG_MASK GENMASK(0, 0);
  __s32 size_in;
  __s32 size_out;
};

Members

id

ID number for the command.

flags

Flags that specify command behavior.

size_in

Expected input size, or -1 if variable length.

size_out

Expected output size, or -1 if variable length.

Description

Represents a single command that is supported by both the driver and the hardware. This is returned as part of an array from the query ioctl. The following would be a command that takes a variable length input and returns 0 bytes of output.

  • id = 10

  • flags = 0

  • size_in = -1

  • size_out = 0

See struct cxl_mem_query_commands.

struct cxl_mem_query_commands¶

Query supported commands.

Definition

struct cxl_mem_query_commands {
  __u32 n_commands;
  __u32 rsvd;
  struct cxl_command_info __user commands[];
};

Members

n_commands

In/out parameter. When n_commands is > 0, the driver will return min(num_support_commands, n_commands). When n_commands is 0, driver will return the number of total supported commands.

rsvd

Reserved for future use.

commands

Output array of supported commands. This array must be allocated by userspace to be at least min(num_support_commands, n_commands)

Description

Allow userspace to query the available commands supported by both the driver, and the hardware. Commands that aren’t supported by either the driver, or the hardware are not returned in the query.

Examples

  • { .n_commands = 0 } // Get number of supported commands

  • { .n_commands = 15, .commands = buf } // Return first 15 (or fewer) supported commands

See struct cxl_command_info.
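
A hedged userspace sketch of the two-call query pattern follows. The /dev/cxl/memX character device path and the CXL_MEM_QUERY_COMMANDS ioctl name come from the include/uapi/linux/cxl_mem.h header and are assumptions here rather than something this document defines.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/cxl_mem.h>

int main(void)
{
  struct cxl_mem_query_commands count = { .n_commands = 0 };
  struct cxl_mem_query_commands *query;
  int fd;

  fd = open("/dev/cxl/mem0", O_RDWR);
  if (fd < 0)
    return 1;

  /* first call: n_commands == 0 asks only for the number of commands */
  if (ioctl(fd, CXL_MEM_QUERY_COMMANDS, &count))
    return 1;

  /* second call: fetch the command descriptors themselves */
  query = calloc(1, sizeof(*query) +
                 count.n_commands * sizeof(query->commands[0]));
  if (!query)
    return 1;
  query->n_commands = count.n_commands;
  if (ioctl(fd, CXL_MEM_QUERY_COMMANDS, query))
    return 1;

  for (__u32 i = 0; i < query->n_commands; i++)
    printf("command id %u: size_in=%d size_out=%d\n",
           query->commands[i].id, query->commands[i].size_in,
           query->commands[i].size_out);
  return 0;
}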

struct cxl_send_command¶

Send a command to a memory device.

Definition

struct cxl_send_command {
  __u32 id;
  __u32 flags;
  union {
    struct {
      __u16 opcode;
      __u16 rsvd;
    } raw;
    __u32 rsvd;
  };
  __u32 retval;
  struct {
    __s32 size;
    __u32 rsvd;
    __u64 payload;
  } in;
  struct {
    __s32 size;
    __u32 rsvd;
    __u64 payload;
  } out;
};

Members

id

The command to send to the memory device. This must be one of the commands returned by the query command.

flags

Flags for the command (input).

{unnamed_union}

anonymous

raw

Special fields for raw commands

raw.opcode

Opcode passed to hardware when using the RAW command.

raw.rsvd

Must be zero.

rsvd

Must be zero.

retval

Return value from the memory device (output).

in

Parameters associated with input payload.

in.size

Size of the payload to provide to the device (input).

in.rsvd

Must be zero.

in.payload

Pointer to memory for payload input, payload is little endian.

out

Parameters associated with output payload.

out.size

Size of the payload received from the device (input/output). This field is filled in by userspace to let the driver know how much space was allocated for output. It is populated by the driver to let userspace know how large the output payload actually was.

out.rsvd

Must be zero.

out.payload

Pointer to memory for payload output, payload is little endian.

Description

Mechanism for userspace to send a command to the hardware for processing. The driver will do basic validation on the command sizes. In some cases even the payload may be introspected. Userspace is required to allocate buffers large enough for size_out, which can be variable length in certain situations.
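
A similarly hedged sketch of issuing one of the discovered commands; the CXL_MEM_SEND_COMMAND ioctl name is again taken from include/uapi/linux/cxl_mem.h as an assumption, and the command id is whatever the query above reported.

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/cxl_mem.h>

/*
 * Illustrative: send a command with no input payload and read back up
 * to 'out_size' bytes of output.  'id' comes from the query ioctl.
 */
static int example_send(int fd, __u32 id, void *out, __s32 out_size)
{
  struct cxl_send_command cmd = {
    .id = id,
    .out = {
      .size = out_size,
      .payload = (__u64)(uintptr_t)out,
    },
  };

  if (ioctl(fd, CXL_MEM_SEND_COMMAND, &cmd))
    return -1;

  /* retval is the device's status; out.size is the bytes actually returned */
  if (cmd.retval != 0)
    return -1;

  return cmd.out.size;
}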
