path: root/drivers/cxl
Age    Commit message    Author    Files    Lines
5 daystracing/treewide: Remove second parameter of __assign_str()Steven Rostedt (Google)1-16/+16
With the rework of how __string() handles dynamic strings, where it saves off the source string in a field in the helper structure[1], the assignment of that value to the trace event field is stored in the helper value and does not need to be passed in again. This means that __string(field, mystring), which used to be assigned with __assign_str(field, mystring), no longer needs the second parameter; it is unused. With this, __assign_str() will now only get a single parameter. There are over 700 users of __assign_str(), and because coccinelle does not handle the TRACE_EVENT() macro I ended up using the following sed script: git grep -l __assign_str | while read a ; do sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file; mv /tmp/test-file $a; done I then searched for __assign_str() calls that did not end with ';' as those were multi-line assignments that the sed script above would fail to catch. Note that the same updates will need to be done for: __assign_str_len() __assign_rel_str() __assign_rel_str_len() I tested this with both an allmodconfig and an allyesconfig (build only for both). [1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/ Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts. Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Darrick J. Wong <djwong@kernel.org> # xfs Tested-by: Guenter Roeck <linux@roeck-us.net>
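To make the shape of the conversion concrete, here is a minimal, hypothetical TRACE_EVENT() call site (event and field names are invented for illustration); only the __assign_str() arguments change:

  TRACE_EVENT(sample_event,
          TP_PROTO(const char *src),
          TP_ARGS(src),
          TP_STRUCT__entry(
                  __string(msg, src)      /* helper now also records the source */
          ),
          TP_fast_assign(
                  /* before: __assign_str(msg, src); */
                  __assign_str(msg);      /* source taken from __string() above */
          ),
          TP_printk("msg=%s", __get_str(msg))
  );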
6 daysMerge tag 'pci-v6.10-changes' of ↵Linus Torvalds5-6/+58
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Skip E820 checks for MCFG ECAM regions for new (2016+) machines, since there's no requirement to describe them in E820 and some platforms require ECAM to work (Bjorn Helgaas) - Rename PCI_IRQ_LEGACY to PCI_IRQ_INTX to be more specific (Damien Le Moal) - Remove last user and pci_enable_device_io() (Heiner Kallweit) - Wait for Link Training==0 to avoid possible race (Ilpo Järvinen) - Skip waiting for devices that have been disconnected while suspended (Ilpo Järvinen) - Clear Secondary Status errors after enumeration since Master Aborts and Unsupported Request errors are an expected part of enumeration (Vidya Sagar) MSI: - Remove unused IMS (Interrupt Message Store) support (Bjorn Helgaas) Error handling: - Mask Genesys GL975x SD host controller Replay Timer Timeout correctable errors caused by a hardware defect; the errors cause interrupts that prevent system suspend (Kai-Heng Feng) - Fix EDR-related _DSM support, which previously evaluated revision 5 but assumed revision 6 behavior (Kuppuswamy Sathyanarayanan) ASPM: - Simplify link state definitions and mask calculation (Ilpo Järvinen) Power management: - Avoid D3cold for HP Pavilion 17 PC/1972 PCIe Ports, where BIOS apparently doesn't know how to put them back in D0 (Mario Limonciello) CXL: - Support resetting CXL devices; special handling required because CXL Ports mask Secondary Bus Reset by default (Dave Jiang) DOE: - Support DOE Discovery Version 2 (Alexey Kardashevskiy) Endpoint framework: - Set endpoint BAR to be 64-bit if the driver says that's all the device supports, in addition to doing so if the size is >2GB (Niklas Cassel) - Simplify endpoint BAR allocation and setting interfaces (Niklas Cassel) Cadence PCIe controller driver: - Drop DT binding redundant msi-parent and pci-bus.yaml (Krzysztof Kozlowski) Cadence PCIe endpoint driver: - Configure endpoint BARs to be 64-bit based on the BAR type, not the BAR value (Niklas Cassel) Freescale Layerscape PCIe controller driver: - Convert DT binding to YAML (Frank Li) MediaTek MT7621 PCIe controller driver: - Add DT binding missing 'reg' property for child Root Ports (Krzysztof Kozlowski) - Fix theoretical string truncation in PHY name (Sergio Paracuellos) NVIDIA Tegra194 PCIe controller driver: - Return success for endpoint probe instead of falling through to the failure path (Vidya Sagar) Renesas R-Car PCIe controller driver: - Add DT binding missing IOMMU properties (Geert Uytterhoeven) - Add DT binding R-Car V4H compatible for host and endpoint mode (Yoshihiro Shimoda) Rockchip PCIe controller driver: - Configure endpoint BARs to be 64-bit based on the BAR type, not the BAR value (Niklas Cassel) - Add DT binding missing maxItems to ep-gpios (Krzysztof Kozlowski) - Set the Subsystem Vendor ID, which was previously zero because it was masked incorrectly (Rick Wertenbroek) Synopsys DesignWare PCIe controller driver: - Restructure DBI register access to accommodate devices where this requires Refclk to be active (Manivannan Sadhasivam) - Remove the deinit() callback, which was only need by the pcie-rcar-gen4, and do it directly in that driver (Manivannan Sadhasivam) - Add dw_pcie_ep_cleanup() so drivers that support PERST# can clean up things like eDMA (Manivannan Sadhasivam) - Rename dw_pcie_ep_exit() to dw_pcie_ep_deinit() to make it parallel to dw_pcie_ep_init() (Manivannan Sadhasivam) - Rename dw_pcie_ep_init_complete() to dw_pcie_ep_init_registers() to reflect the actual 
functionality (Manivannan Sadhasivam) - Call dw_pcie_ep_init_registers() directly from all the glue drivers, not just those that require active Refclk from the host (Manivannan Sadhasivam) - Remove the "core_init_notifier" flag, which was an obscure way for glue drivers to indicate that they depend on Refclk from the host (Manivannan Sadhasivam) TI J721E PCIe driver: - Add DT binding J784S4 SoC Device ID (Siddharth Vadapalli) - Add DT binding J722S SoC support (Siddharth Vadapalli) TI Keystone PCIe controller driver: - Add DT binding missing num-viewport, phys and phy-name properties (Jan Kiszka) Miscellaneous: - Constify and annotate with __ro_after_init (Heiner Kallweit) - Convert DT bindings to YAML (Krzysztof Kozlowski) - Check for kcalloc() failure in of_pci_prop_intr_map() (Duoming Zhou)" * tag 'pci-v6.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (97 commits) PCI: Do not wait for disconnected devices when resuming x86/pci: Skip early E820 check for ECAM region PCI: Remove unused pci_enable_device_io() ata: pata_cs5520: Remove unnecessary call to pci_enable_device_io() PCI: Update pci_find_capability() stub return types PCI: Remove PCI_IRQ_LEGACY scsi: vmw_pvscsi: Do not use PCI_IRQ_LEGACY instead of PCI_IRQ_LEGACY scsi: pmcraid: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: mpt3sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: megaraid_sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: ipr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: hpsa: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: arcmsr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY wifi: rtw89: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY dt-bindings: PCI: rockchip,rk3399-pcie: Add missing maxItems to ep-gpios Revert "genirq/msi: Provide constants for PCI/IMS support" Revert "x86/apic/msi: Enable PCI/IMS" Revert "iommu/vt-d: Enable PCI/IMS" Revert "iommu/amd: Enable PCI/IMS" Revert "PCI/MSI: Provide IMS (Interrupt Message Store) support" ...
12 daysMerge tag 'cxl-for-6.10' of ↵Linus Torvalds13-229/+385
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull CXL updates from Dave Jiang: - Three CXL mailbox passthrough commands are added to support populating and clearing vendor debug logs: - Get Log Capabilities - Get Supported Log Sub-List Commands - Clear Log - Add support for Device Physical Address (DPA) to Host Physical Address (HPA) translation for the cxl_dram and cxl_general_media CXL events. This allows user space to figure out, via the trace event, which CXL region an event occurred in. - Connect CXL to CPER reporting. If a device is configured for firmware first, CXL event records are not sent directly to the host. Those records are reported through EFI Common Platform Error Records (CPER). Add support to route the CPER records through the CXL sub-system in order to provide DPA to HPA translation and also event decoding and tracing. This is useful for users to determine which system issues may correspond to specific hardware events. - A number of misc cleanups and fixes: - Fix for compile warning of cxl_security_ops - Add debug message for invalid interleave granularity - Enhancement to cxl-test event testing - Add dev_warn() on unsupported mixed mode decoder - Fix use of phys_to_target_node() for x86 - Use helper function for decoder enum instead of open coding - Include missing headers for cxl-event - Fix MAINTAINERS file entry - Fix cxlr_pmem memory leak - Cleanup __cxl_parse_cfmws() via scope-based resource management - Convert cxl_pmem_region_alloc() to scope-based resource management * tag 'cxl-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (21 commits) cxl/cper: Remove duplicated GUID defines cxl/cper: Fix non-ACPI-APEI-GHES build cxl/pci: Process CPER events acpi/ghes: Process CXL Component Events cxl/region: Convert cxl_pmem_region_alloc to scope-based resource management cxl/acpi: Cleanup __cxl_parse_cfmws() cxl/region: Fix cxlr_pmem leaks cxl/core: Add region info to cxl_general_media and cxl_dram events cxl/region: Move cxl_trace_hpa() work to the region driver cxl/region: Move cxl_dpa_to_region() work to the region driver cxl/trace: Correct DPA field masks for general_media & dram events MAINTAINERS: repair file entry in COMPUTE EXPRESS LINK cxl/cxl-event: include missing <linux/types.h> and <linux/uuid.h> cxl/hdm: Debug, use decoder name function cxl: Fix use of phys_to_target_node() for x86 cxl/hdm: dev_warn() on unsupported mixed mode decoder cxl/test: Enhance event testing cxl/hdm: Add debug message for invalid interleave granularity cxl: Fix compile warning for cxl_security_ops extern cxl/mbox: Add Clear Log mailbox command ...
2024-05-08cxl: Add post-reset warning if reset results in loss of previously committed ↵Dave Jiang3-0/+53
HDM decoders Secondary Bus Reset (SBR) is equivalent to a device being hot removed and inserted again. Doing a SBR on a CXL type 3 device is problematic if the exported device memory is part of system memory that cannot be offlined. The event is equivalent to violently ripping out that range of memory from the kernel. While the hardware requires the "Unmask SBR" bit set in the Port Control Extensions register and the kernel currently does not unmask it, user can unmask this bit via setpci or similar tool. The driver does not have a way to detect whether a reset coming from the PCI subsystem is a Function Level Reset (FLR) or SBR. The only way to detect is to note if a decoder is marked as enabled in software but the decoder control register indicates it's not committed. Add a helper function to find discrepancy between the decoder software state versus the hardware register state. Suggested-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20240502165851.1948523-6-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
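A hedged sketch of the kind of consistency check being added (register and flag names follow drivers/cxl/cxl.h, but the helper shown here is illustrative rather than the exact one merged):

  /*
   * Sketch only: a decoder the kernel marked enabled whose hardware
   * commit bit is now clear implies an SBR (or similar reset) wiped
   * previously committed HDM state.
   */
  static bool decoder_committed_mismatch(struct cxl_decoder *cxld,
                                         void __iomem *hdm, int id)
  {
          u32 ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id));

          return (cxld->flags & CXL_DECODER_F_ENABLE) &&
                 !FIELD_GET(CXL_HDM_DECODER0_CTRL_COMMITTED, ctrl);
  }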
2024-05-08PCI/CXL: Move CXL Vendor ID to pci_ids.hDave Jiang4-6/+5
Move PCI_DVSEC_VENDOR_ID_CXL in CXL private code to PCI_VENDOR_ID_CXL in pci_ids.h in order to be utilized in PCI subsystem. While the CXL Vendor ID (0x1e98) is not listed in the PCI SIG "Member Companies" database at https://pcisig.com/membership/member-companies, the SIG has confirmed that it is reserved by CXL. Link: https://lore.kernel.org/r/20240502165851.1948523-2-dave.jiang@intel.com Suggested-by: Bjorn Helgaas <helgaas@kernel.org> Link: https://lore.kernel.org/linux-cxl/20240402172323.GA1818777@bhelgaas/ Signed-off-by: Dave Jiang <dave.jiang@intel.com> [bhelgaas: update commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
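In concrete terms the change boils down to relocating and renaming one define (values as stated above):

  /* previously private to drivers/cxl: */
  #define PCI_DVSEC_VENDOR_ID_CXL   0x1e98
  /* now shared via include/linux/pci_ids.h as: */
  #define PCI_VENDOR_ID_CXL         0x1e98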
2024-05-01Merge remote-tracking branch 'cxl/for-6.10/cper' into cxl-for-nextDave Jiang1-1/+70
Add support to send CPER records to CXL for more detailed parsing.
2024-05-01cxl/pci: Process CPER eventsIra Weiny1-1/+70
If the firmware has configured CXL event support to be firmware first the OS will receive those events through CPER records. The CXL layer has unique DPA to HPA knowledge and existing event trace parsing in place.[0] Add a CXL CPER work item and register it with the GHES code to process CPER events. Link: http://lore.kernel.org/r/cover.1711598777.git.alison.schofield@intel.com [0] Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Link: https://lore.kernel.org/r/20240426-cxl-cper3-v4-2-58076cce1624@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-05-01cxl/region: Convert cxl_pmem_region_alloc to scope-based resource managementDan Williams1-26/+17
A recent bugfix to cxl_pmem_region_alloc(), for an error-unwind memleak [1], highlighted a use case for scope-based resource management. Delete the goto for releasing @cxl_region_rwsem, and return error codes directly from error condition paths. The caller, devm_cxl_add_pmem_region(), is no longer given @cxlr_pmem directly; it must retrieve it from @cxlr->cxlr_pmem. This retrieval from @cxlr was already in place for @cxlr->cxl_nvb, and converting cxl_pmem_region_alloc() to return an int makes it less awkward to handle no_free_ptr(). Cc: Li Zhijian <lizhijian@fujitsu.com> Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Closes: http://lore.kernel.org/r/20240430174540.000039ce@Huawei.com Link: http://lore.kernel.org/r/20240428030748.318985-1-lizhijian@fujitsu.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/171451430965.1147997.15782562063090960666.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
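For readers unfamiliar with linux/cleanup.h, the locking half of such a conversion looks roughly like this (a sketch, not the literal function; the read/write flavor of the rwsem is unchanged by the conversion and the error path shown is only an example):

  /* before: down_read(&cxl_region_rwsem); ... goto out_unlock; */
  guard(rwsem_read)(&cxl_region_rwsem);

  if (!cxlr->cxl_nvb)
          return -ENODEV;   /* the rwsem is dropped automatically on return */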
2024-05-01cxl/acpi: Cleanup __cxl_parse_cfmws()Dan Williams2-42/+56
As a follow on to the recent rework of __cxl_parse_cfmws() to always return errors [1], use cleanup.h helpers to remove goto and other cleanups now that logging is moved to the cxl_parse_cfmws() wrapper. This ends up adding more code than it deletes, but __cxl_parse_cfmws() itself does get smaller. The takeaway from the cond_no_free_ptr() discussion [2] was to not add new macros to handle the cases where no_free_ptr() is awkward, instead rework the code to have helpers and clearer delineation of responsibility. Now one might say that __free(del_cxl_resource) is excessive given it is immediately registered with add_or_reset_cxl_resource(). The rationale for keeping it is that it forces use of "no_free_ptr()" on the argument passed to add_or_reset_cxl_resource(). That in turn makes it clear that @res is NULL for the rest of the function which is part of the point of the cleanup helpers, to turn subtle use after free errors [3] into loud NULL pointer de-references. Link: http://lore.kernel.org/r/170820177238.631006.1012639681618409284.stgit@dwillia2-xfh.jf.intel.com [1] Link: http://lore.kernel.org/r/CAHk-=whBVhnh=KSeBBRet=E7qJAwnPR_aj5em187Q3FiD+LXnA@mail.gmail.com [2] Link: http://lore.kernel.org/r/20230714093146.2253438-1-leitao@debian.org [3] Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Closes: http://lore.kernel.org/r/20240219124041.00002bda@Huawei.com Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/171235474028.2718248.14109646123143505522.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
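The __free()/no_free_ptr() hand-off being described looks roughly like this (sketch; alloc_cfmws_resource() is a stand-in name for however @res is actually allocated):

  /* freed via del_cxl_resource() on any early return... */
  struct resource *res __free(del_cxl_resource) = alloc_cfmws_resource(cfmws);
  if (!res)
          return -ENOMEM;

  /* ...unless ownership is explicitly handed over, which disarms the cleanup */
  rc = add_or_reset_cxl_resource(cxl_res, no_free_ptr(res));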
2024-04-30cxl/region: Fix cxlr_pmem leaksLi Zhijian1-0/+1
Before this error path, cxlr_pmem pointed to kzalloc()'d memory; free it to avoid leaking that memory. Fixes: f17b558d6663 ("cxl/pmem: Refactor nvdimm device registration, delete the workqueue") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20240428030748.318985-1-lizhijian@fujitsu.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30Merge remote-tracking branch 'cxl/for-6.10/dpa-to-hpa' into cxl-for-nextDave Jiang6-154/+216
Support for HPA to DPA translation for CXL events cxl_dram and cxl_general_media.
2024-04-30cxl/core: Add region info to cxl_general_media and cxl_dram eventsAlison Schofield2-15/+65
User space may need to know which region, if any, maps the DPAs (device physical addresses) reported in a cxl_general_media or cxl_dram event. Since the mapping can change, the kernel provides this information at the time the event occurs. This informs user space that at event <timestamp> this <region> mapped this <DPA> to this <HPA>. Add the same region info that is included in the cxl_poison trace event: the DPA->HPA translation, region name, and region uuid. The new fields are inserted in the trace event and no existing fields are modified. If the DPA is not mapped, user will see: hpa=ULLONG_MAX, region="", and uuid=0 This work must be protected by dpa_rwsem & region_rwsem since it is looking up region mappings. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/dd8d708b7a7ebfb64a27020a5eb338091336b34d.1714496730.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
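Schematically, the additions to the two TP_STRUCT__entry() blocks look something like the following (field names and exact types here are illustrative, not a verbatim copy; 'cxlr' is the region argument resolved from the DPA):

  TP_STRUCT__entry(
          /* ...existing cxl_general_media / cxl_dram fields... */
          __field(u64, hpa)            /* ULLONG_MAX when the DPA is unmapped */
          __string(region_name, cxlr ? dev_name(&cxlr->dev) : "")
          __array(u8, region_uuid, 16) /* zeroed when the DPA is unmapped */
  ),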
2024-04-30cxl/region: Move cxl_trace_hpa() work to the region driverAlison Schofield4-93/+98
This work belongs in the region driver as it is only useful with CONFIG_CXL_REGION. Add a stub in core.h for when the region driver is not built. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/183222631f11a43c5e6debc42ec22fe1bd4b818a.1714496730.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30cxl/region: Move cxl_dpa_to_region() work to the region driverAlison Schofield3-44/+51
This helper belongs in the region driver as it is only useful with CONFIG_CXL_REGION. Add a stub in core.h for when the region driver is not built. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/05e30f788d62b3dd398aff2d2ea50a6aaa7c3313.1714496730.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30cxl/trace: Correct DPA field masks for general_media & dram eventsAlison Schofield1-2/+2
The length of Physical Address in General Media and DRAM event records is 64-bit, so the field mask for extracting the DPA should be 64-bit also, otherwise the trace event reports DPA's with the upper 32 bits of a DPA address masked off. If users do DPA-to-HPA translations this could lead to incorrect page retirement decisions. Use GENMASK_ULL() for CXL_DPA_MASK to get all the DPA address bits. Tidy up CXL_DPA_FLAGS_MASK by using GENMASK() to only mask the exact flag bits. These bits are defined as part of the event record physical address descriptions of General Media and DRAM events in CXL Spec 3.1 Section 8.2.9.2 Events. Fixes: d54a531a430b ("cxl/mem: Trace General Media Event Record") Co-developed-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/2867fc43c57720a4a15a3179431829b8dbd2dc16.1714496730.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
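Concretely, the fix amounts to mask definitions along these lines (a sketch; the exact flag bit positions follow the event record layout in CXL 3.1 Section 8.2.9.2):

  /* only the low flag bits of the Physical Address field */
  #define CXL_DPA_FLAGS_MASK      GENMASK(1, 0)
  /* 64-bit wide so the upper half of the DPA is no longer truncated */
  #define CXL_DPA_MASK            GENMASK_ULL(63, 6)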
2024-04-30Merge remote-tracking branch 'cxl/for-6.10/add-log-mbox-cmds' into cxl-for-nextDave Jiang2-0/+15
Add CXL log related mailbox commands - Add Get Log Capabilities command - Add Get Supported Log Sub-List Commands command - Add Clear Log command
2024-04-30cxl/hdm: Debug, use decoder name functionIra Weiny1-2/+1
The decoder enum has a name conversion function defined now. Use that instead of open coding. Suggested-by: Navneet Singh <navneet.singh@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20230604-dcd-type2-upstream-v2-1-f740c47e7916@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30cxl: Fix use of phys_to_target_node() for x86Robert Richter1-0/+1
The CXL driver uses both functions phys_to_target_node() and memory_add_physaddr_to_nid(). The x86 architecture relies on the NUMA_KEEP_MEMINFO kernel option being enabled for both functions to work correctly. Update Kconfig to make sure the option is always enabled for the driver. Suggested-by: Dan Williams <dan.j.williams@intel.com> Link: http://lore.kernel.org/r/65f8b191c0422_aa222941b@dwillia2-mobl3.amr.corp.intel.com.notmuch Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Robert Richter <rrichter@amd.com> Link: https://lore.kernel.org/r/20240424154756.2152614-1-rrichter@amd.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30cxl/hdm: dev_warn() on unsupported mixed mode decoderAlison Schofield1-2/+2
A mixed mode decoder is programmed with device physical addresses that span both ram and pmem partitions of a memdev. Linux does not support mixed mode decoders. The driver rejects sysfs writes that try to set decoder mode to mixed, and if a resource being allocated is not wholly contained in either the pmem or ram partition of a memdev, it is also rejected. Basically, the CXL region driver is not going to create regions with mixed mode decoders, but the BIOS could. If the kernel driver sees a mixed mode decoder, it will fail to enable the region, and emit a dev_dbg() message. A dev_dbg() is not noisy enough in this case. Change the message to be a dev_warn() that explicitly says mixed mode is not supported. Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20230218013834.31237-1-alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30cxl/hdm: Add debug message for invalid interleave granularityHuang Ying1-1/+5
There is no debug message for an invalid interleave granularity, which makes related bugs hard to debug. Add one. Signed-off-by: Huang, Ying <ying.huang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20240402061016.388408-1-ying.huang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30cxl: Fix compile warning for cxl_security_ops externDave Jiang2-2/+2
Jonathan reported that he observed a compiler warning when building with W=1 C=1 for cxl_security_ops, which is declared as an extern in cxl/pmem.c. Move the declaration to cxl.h to make it visible to all cxl sources. Suggested-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Tested-by: Alison Schofield <alison.schofield@intel.com> Link: https://lore.kernel.org/linux-cxl/167771196186.3285982.18283746206612049722.stgit@djiang5-mobl3.local/ Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30cxl/mbox: Add Clear Log mailbox commandSrinivasulu Thanneeru2-0/+11
Adding UAPI support for CXL r3.1 8.2.9.5.4 Clear Log command. This proposed patch will be useful for clearing and populating the Vendor debug log in certain scenarios, allowing for the aggregation of results over time. Signed-off-by: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20240313071218.729-3-sthanneeru.opensrc@micron.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30cxl/mbox: Add Get Log Capabilities and Get Supported Logs Sub-List commandsSrinivasulu Thanneeru2-0/+4
Adding UAPI support for 1. CXL r3.1 8.2.9.5.3 Get Log Capabilities. 2. CXL r3.1 8.2.9.5.6 Get Supported Logs Sub-List. Signed-off-by: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20240313071218.729-2-sthanneeru.opensrc@micron.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-29cxl: Fix cxl_endpoint_get_perf_coordinate() support for RCHDave Jiang1-1/+14
Robert reported the following when booting a CXL host with Restricted CXL Host (RCH) topology: [ 39.815379] cxl_acpi ACPI0017:00: not a cxl_port device [ 39.827123] WARNING: CPU: 46 PID: 1754 at drivers/cxl/core/port.c:592 to_cxl_port+0x56/0x70 [cxl_core] ... plus some related subsequent NULL pointer dereference: [ 40.718708] BUG: kernel NULL pointer dereference, address: 00000000000002d8 The iterator to walk the PCIe path did not account for RCH topology. However RCH does not support hotplug and the memory exported by the Restricted CXL Device (RCD) should be covered by HMAT and therefore no access_coordinate is needed. Add check to see if the endpoint device is RCD and skip calculation. Also add a call to cxl_endpoint_get_perf_coordinates() in cxl_test in order to exercise the topology iterator. The dev_is_pci() check added is to help with this test and should be harmless for normal operation. Reported-by: Robert Richter <rrichter@amd.com> Closes: https://lore.kernel.org/all/Ziv8GfSMSbvlBB0h@rric.localdomain/ Fixes: 592780b8391f ("cxl: Fix retrieving of access_coordinates in PCIe path") Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Robert Richter <rrichter@amd.com> Reviewed-by: Robert Richter <rrichter@amd.com> Link: https://lore.kernel.org/r/20240426224913.1027420-1-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
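The added checks are roughly of this shape (sketch; cxlds->rcd is the driver's existing Restricted CXL Device flag):

  struct cxl_dev_state *cxlds = cxlmd->cxlds;

  /* RCD memory is described by HMAT; no PCIe path walk is needed */
  if (cxlds->rcd)
          return 0;

  /* cxl_test endpoints are platform devices, so only walk real PCI paths */
  if (!dev_is_pci(cxlds->dev))
          return -ENODEV;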
2024-04-22cxl/core: Fix potential payload size confusion in cxl_mem_get_poison()Dan Williams1-21/+17
A recent change to cxl_mem_get_records_log() [1] highlighted a subtle nuance of looping calls to cxl_internal_send_cmd(), i.e. that cxl_internal_send_cmd() modifies the 'size_out' member of the @mbox_cmd argument. That mechanism is useful for communicating underflow, but it is unwanted when reusing @mbox_cmd for a subsequent submission. It turns out that cxl_xfer_log() avoids this scenario by always redefining @mbox_cmd each iteration. Update cxl_mem_get_records_log() and cxl_mem_get_poison() to follow the same style as cxl_xfer_log(), i.e. re-define @mbox_cmd each iteration. The cxl_mem_get_records_log() change is just a style fixup, but the cxl_mem_get_poison() change is a potential fix, per Alison [2]: Poison list retrieval can hit this case if the MORE flag is set and a follow on read of the list delivers more records than the previous read. ie. device gives one record, sets the _MORE flag, then gives 5. Not an urgent fix since this behavior has not been seen in the wild, but worth tracking as a fix. Cc: Kwangjin Ko <kwangjin.ko@sk.com> Cc: Alison Schofield <alison.schofield@intel.com> Fixes: ed83f7ca398b ("cxl/mbox: Add GET_POISON_LIST mailbox command") Link: http://lore.kernel.org/r/20240402081404.1106-2-kwangjin.ko@sk.com [1] Link: http://lore.kernel.org/r/ZhAhAL/GOaWFrauw@aschofie-mobl2 [2] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://lore.kernel.org/r/171235441633.2716581.12330082428680958635.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
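The resulting loop style, paraphrased rather than copied verbatim (payload setup abbreviated):

  do {
          /* re-declared each pass so size_out starts from the full payload */
          struct cxl_mbox_cmd mbox_cmd = (struct cxl_mbox_cmd) {
                  .opcode = CXL_MBOX_OP_GET_POISON,
                  .size_in = sizeof(*pi),
                  .payload_in = pi,
                  .size_out = mds->payload_size,
                  .payload_out = po,
                  .min_out = struct_size(po, record, 0),
          };

          rc = cxl_internal_send_cmd(mds, &mbox_cmd);
          if (rc)
                  break;
          /* ...trace the returned poison records... */
  } while (po->flags & CXL_POISON_FLAG_MORE);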
2024-04-11Merge tag 'cxl-fixes-6.9-rc4' of ↵Linus Torvalds7-141/+158
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl fixes from Dave Jiang: - Fix index of Clear Event Record handles in cxl_clear_event_record() - Fix use before init of map->reg_type in cxl_decode_regblock() - Fix initialization of mbox_cmd.size_out in cxl_mem_get_records_log() - Fix CXL path access_coordinate computation: - Remove unneeded check of iter in loop - Fix retrieval of access_coordinate in the PCIe topology walk - Fix incorrect region access_coordinate data calculation - Consolidate access_coordinates attached to the downstream port context - Add a check to validate access_coordinate data to prevent incorrect data being exposed via sysfs * tag 'cxl-fixes-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl: Add checks to access_coordinate calculation to fail missing data cxl: Consolidate dport access_coordinate ->hb_coord and ->sw_coord into ->coord cxl: Fix incorrect region perf data calculation cxl: Fix retrieving of access_coordinates in PCIe path cxl: Remove checking of iter in cxl_endpoint_get_perf_coordinates() cxl/core: Fix initialization of mbox_cmd.size_out in get event cxl/core/regs: Fix usage of map->reg_type in cxl_decode_regblock() before assigned cxl/mem: Fix for the index of Clear Event Record Handle
2024-04-08cxl: Add checks to access_coordinate calculation to fail missing dataDave Jiang1-1/+18
Jonathan noted that the coordinates for host bridges and switches can be 0s if no actual data are retrieved, yet the calculation continues and the resulting numbers are inaccurate. Add checks to ensure that the calculation completes only if the numbers are valid. While not seen in the wild, the issue may show up with a BIOS that reports CXL root ports via Generic Ports (via a PCI handle in the SRAT entry). Fixes: 14a6960b3e92 ("cxl: Add helper function that calculate performance data for downstream ports") Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20240403154844.3403859-6-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-08cxl: Consolidate dport access_coordinate ->hb_coord and ->sw_coord into ->coordDave Jiang5-44/+88
The driver stores access_coordinate for host bridge in ->hb_coord and switch CDAT access_coordinate in ->sw_coord. Since neither of these access_coordinate clobber each other, the variable name can be consolidated into ->coord to simplify the code. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20240403154844.3403859-5-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-08cxl: Fix incorrect region perf data calculationDave Jiang4-100/+45
Current math in cxl_region_perf_data_calculate divides the latency by 1000 every time the function gets called. This causes the region latency to be divided by 1000 per memory device, and the math is incorrect. This is user visible as the latency access_coordinate exposed via sysfs shows incorrect latency data. Normalize values from CDAT to nanoseconds. Adjust sub-nanosecond latencies to at least 1. Remove the adjustment of perf numbers from the generic target since the hmat handling code has already normalized those numbers. Now all computed and stored numbers should be in nanoseconds. cxl_hb_get_perf_coordinates() is removed and HB coordinates are calculated in the port access_coordinate calculation path since they no longer need to be treated specially. Fixes: 3d9f4a197230 ("cxl/region: Calculate performance data for a region") Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20240403154844.3403859-4-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-08cxl: Fix retrieving of access_coordinates in PCIe pathDave Jiang1-13/+22
Current loop in cxl_endpoint_get_perf_coordinates() incorrectly assumes the Root Port (RP) dport is the one with generic port access_coordinate. However those coordinates are one level up in the Host Bridge (HB). Current code causes the computation code to pick up 0s as the coordinates and cause minimal bandwidth to result in 0. Add check to skip RP when combining coordinates. Fixes: 14a6960b3e92 ("cxl: Add helper function that calculate performance data for downstream ports") Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20240403154844.3403859-3-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-05cxl: Remove checking of iter in cxl_endpoint_get_perf_coordinates()Dave Jiang1-1/+1
The while() loop in cxl_endpoint_get_perf_coordinates() checks to see if 'iter' is valid as part of the condition breaking out of the loop. is_cxl_root() will stop the loop before the next iteration could go NULL. Remove the iter check. The presence of the iter or removing the iter does not impact the behavior of the code. This is a code clean up and not a bug fix. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20240403154844.3403859-2-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-03cxl/core: Fix initialization of mbox_cmd.size_out in get eventKwangjin Ko1-1/+2
Since mbox_cmd.size_out is overwritten with the actual output size in the function below, it needs to be initialized every time. cxl_internal_send_cmd -> __cxl_pci_mbox_send_cmd Problem scenario: 1) The size_out variable is initially set to the size of the mailbox. 2) Read an event. - size_out is set to 160 bytes (header 32B + one event 128B). - Two events are created while reading. 3) Read the new *two* events. - size_out is still set to 160 bytes. - Although the value of out_len is 288 bytes, only 160 bytes are copied from the mailbox register to the local variable. - record_count is set to 2. - Accessing records[1] will result in reading incorrect data. Fixes: 6ebe28f9ec72 ("cxl/mem: Read, trace, and clear events on driver load") Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Kwangjin Ko <kwangjin.ko@sk.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-03-26cxl/core/regs: Fix usage of map->reg_type in cxl_decode_regblock() before ↵Dave Jiang1-2/+3
assigned In the error path, map->reg_type is used for a kernel warning before its value is set up. Found by code inspection. The user-visible exposure is the wrong reg_type being emitted via the kernel log. Use a local variable for reg_type and retrieve the value before use. Fixes: 6c7f4f1e51c2 ("cxl/core/regs: Make cxl_map_{component, device}_regs() device generic") Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-03-26cxl/mem: Fix for the index of Clear Event Record HandleYuquan Wang1-1/+1
The dev_dbg info for Clear Event Records mailbox command would report the handle of the next record to clear not the current one. This was because the index 'i' had incremented before printing the current handle value. Fixes: 6ebe28f9ec72 ("cxl/mem: Read, trace, and clear events on driver load") Signed-off-by: Yuquan Wang <wangyuquan1236@phytium.com.cn> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-03-27cxl: remove CONFIG_CXL_PMU entry in drivers/cxl/KconfigMasahiro Yamada1-13/+0
Commit 5d7107c72796 ("perf: CXL Performance Monitoring Unit driver") added the config entries for CXL_PMU in drivers/cxl/Kconfig and drivers/perf/Kconfig, so it can be toggled from multiple locations: [1] Device Drivers -> PCI support -> CXL (Compute Express Link) Devices -> CXL Performance Monitoring Unit [2] Device Drivers -> Performance monitor support -> CXL Performance Monitoring Unit This complicates things, and nobody else does this. I kept the one in drivers/perf/Kconfig because CONFIG_CXL_PMU controls the compilation of drivers/perf/cxl_pmu.c. Acked-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-03-18Merge tag 'trace-v6.9-2' of ↵Linus Torvalds1-7/+7
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: "Main user visible change: - User events can now have "multi formats" The current user events have a single format. If another event is created with a different format, it will fail to be created. That is, once an event name is used, it cannot be used again with a different format. This can cause issues if a library is using an event and updates its format. An application using the older format will prevent an application using the new library from registering its event. A task could also DOS another application if it knows the event names, and it creates events with different formats. The multi-format event is in a different name space from the single format. Both the event name and its format are the unique identifier. This will allow two different applications to use the same user event name but with different payloads. - Added support to have ftrace_dump_on_oops dump out instances and not just the main top level tracing buffer. Other changes: - Add eventfs_root_inode Only the root inode has a dentry that is static (never goes away) and stores it upon creation. There's no reason that the thousands of other eventfs inodes should have a pointer that never gets set in its descriptor. Create a eventfs_root_inode desciptor that has a eventfs_inode descriptor and a dentry pointer, and only the root inode will use this. - Added WARN_ON()s in eventfs There's some conditionals remaining in eventfs that should never be hit, but instead of removing them, add WARN_ON() around them to make sure that they are never hit. - Have saved_cmdlines allocation also include the map_cmdline_to_pid array The saved_cmdlines structure allocates a large amount of data to hold its mappings. Within it, it has three arrays. Two are already apart of it: map_pid_to_cmdline[] and saved_cmdlines[]. More memory can be saved by also including the map_cmdline_to_pid[] array as well. - Restructure __string() and __assign_str() macros used in TRACE_EVENT() Dynamic strings in TRACE_EVENT() are declared with: __string(name, source) And assigned with: __assign_str(name, source) In the tracepoint callback of the event, the __string() is used to get the size needed to allocate on the ring buffer and __assign_str() is used to copy the string into the ring buffer. There's a helper structure that is created in the TRACE_EVENT() macro logic that will hold the string length and its position in the ring buffer which is created by __string(). There are several trace events that have a function to create the string to save. This function is executed twice. Once for __string() and again for __assign_str(). There's no reason for this. The helper structure could also save the string it used in __string() and simply copy that into __assign_str() (it also already has its length). By using the structure to store the source string for the assignment, it means that the second argument to __assign_str() is no longer needed. It will be removed in the next merge window, but for now add a warning if the source string given to __string() is different than the source string given to __assign_str(), as the source to __assign_str() isn't even used and will be going away. - Added checks to make sure that the source of __string() is also the source of __assign_str() so that it can be safely removed in the next merge window. Included fixes that the above check found. 
- Other minor clean ups and fixes" * tag 'trace-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (34 commits) tracing: Add __string_src() helper to help compilers not to get confused tracing: Use strcmp() in __assign_str() WARN_ON() check tracepoints: Use WARN() and not WARN_ON() for warnings tracing: Use div64_u64() instead of do_div() tracing: Support to dump instance traces by ftrace_dump_on_oops tracing: Remove second parameter to __assign_rel_str() tracing: Add warning if string in __assign_str() does not match __string() tracing: Add __string_len() example tracing: Remove __assign_str_len() ftrace: Fix most kernel-doc warnings tracing: Decrement the snapshot if the snapshot trigger fails to register tracing: Fix snapshot counter going between two tracers that use it tracing: Use EVENT_NULL_STR macro instead of open coding "(null)" tracing: Use ? : shortcut in trace macros tracing: Do not calculate strlen() twice for __string() fields tracing: Rework __assign_str() and __string() to not duplicate getting the string cxl/trace: Properly initialize cxl_poison region name net: hns3: tracing: fix hclgevf trace event strings drm/i915: Add missing ; to __assign_str() macros in tracepoint code NFSD: Fix nfsd_clid_class use of __string_len() macro ...
2024-03-18cxl/trace: Properly initialize cxl_poison region nameAlison Schofield1-7/+7
The TP_STRUCT__entry that gets assigned the region name, or an empty string if no region is present, is erroneously initialized to the cxl_region pointer. It needs to be properly initialized, otherwise its length is wrong and garbage chars can appear in the kernel trace output: /sys/kernel/tracing/trace The bad initialization was due in part to a naming conflict with the parameter: struct cxl_region *region. The field 'region' is already exposed externally as the region name, so changing that to something logical, like 'region_name', is not an option. Instead, rename the internal-only struct cxl_region parameter to the commonly used 'cxlr'. The impact is that tooling depending on that trace data can miss picking up a valid event when searching by region name. The TP_printk() output, if enabled, does emit the correct region names in the dmesg log. This was found during testing of the cxl-list option to report media-errors for a region. Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: stable@vger.kernel.org Fixes: ddf49d57b841 ("cxl/trace: Add TRACE support for CXL media-error records") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-03-13Merge branch 'for-6.9/cxl-fixes' into for-6.9/cxlDan Williams1-15/+15
Pick up a parsing fix for the CDAT SSLBIS structure for v6.9.
2024-03-13Merge branch 'for-6.9/cxl-einj' into for-6.9/cxlDan Williams1-0/+41
Pick up support for injecting errors via ACPI EINJ into the CXL protocol for v6.9.
2024-03-13Merge branch 'for-6.9/cxl-qos' into for-6.9/cxlDan Williams6-33/+342
Pick up support for CXL "HMEM reporting" for v6.9, i.e. build an HMAT from CXL CDAT and PCIe switch information.
2024-03-13lib/firmware_table: Provide buffer length argument to cdat_table_parse()Robert Richter2-4/+10
There exist card implementations with a CDAT table using a fixed-size buffer, but with entries filled in that do not fill the whole table length. Then, the last entry in the CDAT table may not mark the end of the CDAT table buffer specified by the length field in the CDAT header. It can be shorter with trailing unused (zero'ed) data. The actual table length is determined while reading all CDAT entries of the table with DOE. If the table is greater than expected (containing zero'ed trailing data), the CDAT parser fails with: [ 48.691717] Malformed DSMAS table length: (24:0) [ 48.702084] [CDAT:0x00] Invalid zero length [ 48.711460] cxl_port endpoint1: Failed to parse CDAT: -22 In addition, a check of the table buffer length is missing to prevent an out-of-bounds access when parsing the CDAT table. Harden the code against a device returning a borked table. Fix that by providing an optional buffer length argument to acpi_parse_entries_array() that can be used by cdat_table_parse() to propagate the buffer size down to its users to check the buffer length. This also prevents the possible out-of-bounds access mentioned above. Add a check to warn about a malformed CDAT table length. Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Len Brown <lenb@kernel.org> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/ZdEnopFO0Tl3t2O1@rric.localdomain Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12cxl/pci: Get rid of pointer arithmetic reading CDAT tableRobert Richter2-36/+65
Reading the CDAT table using DOE requires a Table Access Response Header in addition to the CDAT entry. In current implementation this has caused offsets with sizeof(__le32) to the actual buffers. This led to hardly readable code and even bugs. E.g., see fix of devm_kfree() in read_cdat_data(): commit c65efe3685f5 ("cxl/cdat: Free correct buffer on checksum error") Rework code to avoid calculations with sizeof(__le32). Introduce struct cdat_doe_rsp for this which contains the Table Access Response Header and a variable payload size for various data structures afterwards to access the CDAT table and its CDAT Data Structures without recalculating buffer offsets. Cc: Lukas Wunner <lukas@wunner.de> Cc: Fan Ni <nifan.cxl@gmail.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20240216155844.406996-3-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
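The new wrapper type is essentially the following (a sketch of the idea; the merged struct may differ in detail):

  struct cdat_doe_rsp {
          __le32 doe_header;      /* DOE Table Access Response Header */
          u8 data[];              /* CDAT header or CDAT structure payload */
  } __packed;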
2024-03-12cxl/pci: Rename DOE mailbox handle to doe_mbRobert Richter1-10/+10
Trivial variable rename for the DOE mailbox handle from cdat_doe to doe_mb. The variable name cdat_doe is too ambiguous, use doe_mb that is commonly used for the mailbox. Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20240216155844.406996-2-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12cxl: Fix the incorrect assignment of SSLBIS entry pointer initial locationDave Jiang1-15/+15
The 'entry' pointer in cdat_sslbis_handler() is set to header + sizeof(common header). However, the math missed the addition of the SSLBIS main header. It should be header + sizeof(common header) + sizeof(*sslbis). Use a defined struct for all the SSLBIS parts in order to avoid pointer math errors. The bug causes incorrect parsing of the SSLBIS table and introduces incorrect performance values to the access_coordinates during the CXL access_coordinate calculation path if there are CXL switches present in the topology. The issue was found during testing of new code being added to add additional checks for invalid CDAT values during CXL access_coordinate calculation. The testing was done on qemu with a CXL topology including a CXL switch. Fixes: 80aa780dda20 ("cxl: Add callback to parse the SSLBIS subtable from CDAT") Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Link: https://lore.kernel.org/r/20240301210948.1298075-1-dave.jiang@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12cxl/core: Add CXL EINJ debugfs filesBen Cheatham1-0/+41
Export CXL helper functions in einj-cxl.c for getting/injecting available CXL protocol error types to sysfs under kernel/debug/cxl. The kernel/debug/cxl/einj_types file will print the available CXL protocol errors in the same format as the available_error_types file provided by the einj module. The kernel/debug/cxl/$dport_dev/einj_inject file is functionally the same as the error_type and error_inject files provided by the EINJ module, i.e.: writing an error type into $dport_dev/einj_inject will inject said error type into the CXL dport represented by $dport_dev. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ben Cheatham <Benjamin.Cheatham@amd.com> Link: https://lore.kernel.org/r/20240311142508.31717-4-Benjamin.Cheatham@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12cxl/region: Deal with numa nodes not enumerated by SRATDave Jiang3-1/+12
For the numa nodes that are not created by SRAT, no memory_target is allocated and is not managed by the HMAT_REPORTING code. Therefore hmat_callback() memory hotplug notifier will exit early on those NUMA nodes. The CXL memory hotplug notifier will need to call node_set_perf_attrs() directly in order to setup the access sysfs attributes. In acpi_numa_init(), the last proximity domain (pxm) id created by SRAT is stored. Add a helper function acpi_node_backed_by_real_pxm() in order to check if a NUMA node id is defined by SRAT or created by CFMWS. node_set_perf_attrs() symbol is exported to allow update of perf attribs for a node. The sysfs path of /sys/devices/system/node/nodeX/access0/initiators/* is created by node_set_perf_attrs() for the various attributes where nodeX is matched to the NUMA node of the CXL region. Cc: Rafael J. Wysocki <rafael@kernel.org> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240308220055.2172956-13-dave.jiang@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12cxl/region: Add memory hotplug notifier for cxl regionDave Jiang4-0/+80
When the CXL region is formed, the driver computes the performance data for the region. However this data is not available at the node data collection that has been populated by the HMAT during kernel initialization. Add a memory hotplug notifier to update the access coordinates to the 'struct memory_target' context kept by the HMAT_REPORTING code. Add CXL_CALLBACK_PRI for a memory hotplug callback priority. Set the priority number to be called before HMAT_CALLBACK_PRI. The CXL update must happen before hmat_callback(). A new HMAT_REPORTING helper hmat_update_target_coordinates() is added in order to allow CXL to update the memory_target access coordinates. A new ext_updated member is added to the memory_target to indicate that the access coordinates within the memory_target has been updated by an external agent such as CXL. This prevents data being overwritten by the hmat_update_target_attrs() triggered by hmat_callback(). Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Reviewed-by: Huang, Ying <ying.huang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240308220055.2172956-12-dave.jiang@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
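Registration follows the standard memory-hotplug notifier pattern, roughly as below (callback, notifier, and init function names are illustrative):

  static int cxl_region_perf_attrs_callback(struct notifier_block *nb,
                                            unsigned long action, void *arg)
  {
          /* map the hot-added node to a CXL region, then push the region's
           * access_coordinate via hmat_update_target_coordinates() */
          return NOTIFY_OK;
  }

  static struct notifier_block cxl_memory_nb = {
          .notifier_call = cxl_region_perf_attrs_callback,
          /* CXL_CALLBACK_PRI > HMAT_CALLBACK_PRI, so this runs first */
          .priority = CXL_CALLBACK_PRI,
  };

  static int __init cxl_region_notifier_init(void)
  {
          return register_memory_notifier(&cxl_memory_nb);
  }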
2024-03-12cxl/region: Add sysfs attribute for locality attributes of CXL regionsDave Jiang1-0/+94
Add read/write latencies and bandwidth sysfs attributes for the enabled CXL region. The bandwidth is the aggregated bandwidth of all devices that contribute to the CXL region. The latency is the worst latency of the device amongst all the devices that contribute to the CXL region. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240308220055.2172956-11-dave.jiang@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12cxl/region: Calculate performance data for a regionDave Jiang3-0/+71
Calculate and store the performance data for a CXL region. Find the worst read and write latency for all the included ranges from each of the devices that contributes to the region and designate that as the latency data. Sum the read and write bandwidth data across each of the device regions; that is the total bandwidth for the region. The perf list is expected to be constructed before the endpoint decoders are registered, and thus there should be no early reading of the entries from the region assembly action. The region QoS calculation function is called under the protection of cxl_dpa_rwsem, which ensures that all DPA-associated work has completed. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240308220055.2172956-10-dave.jiang@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
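Per contributing endpoint, the aggregation rule described above reduces to something like this (sketch; 'perf' stands for the endpoint's stored CDAT-derived coordinates):

  /* worst-case latency across members, summed bandwidth across members */
  coord->read_latency = max_t(unsigned int, coord->read_latency,
                              perf->coord.read_latency);
  coord->write_latency = max_t(unsigned int, coord->write_latency,
                               perf->coord.write_latency);
  coord->read_bandwidth += perf->coord.read_bandwidth;
  coord->write_bandwidth += perf->coord.write_bandwidth;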
2024-03-12cxl: Set cxlmd->endpoint before adding port deviceDave Jiang1-1/+1
Move setting of cxlmd->endpoint to before calling add_device() on the port device. Otherwise when referencing cxlmd->endpoint in region discovery code that is triggered by the port driver probe function, the endpoint port pointer is not valid. Current code does not hit this issue yet since cxlmd->endpoint is not being referenced during region discovery. However follow on code that does performance calculations will. Tested-by: Wonjae Lee <wj28.lee@samsung.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240308220055.2172956-9-dave.jiang@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12cxl: Move QoS class to be calculated from the nearest CPUDave Jiang1-3/+3
Retrieve the qos_class (QTG ID) using the access coordinates from the nearest CPU rather than the nearest initiator, which may not be a CPU. This may be the more appropriate number for applications to care about. In most cases, access0 and access1 have the same values. Link: https://lore.kernel.org/linux-cxl/20240112113023.00006c50@Huawei.com/ Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240308220055.2172956-8-dave.jiang@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12cxl: Split out host bridge access coordinatesDave Jiang3-9/+56
The difference between access class 0 and access class 1 for 'struct access_coordinate', if any, is that class 0 is for the distance from the target to the closest initiator and that class 1 is for the distance from the target to the closest CPU. For CXL memory, the nearest initiator may not necessarily be a CPU node. The performance path from the CXL endpoint to the host bridge should remain the same. However, the numbers extracted and stored from HMAT is the difference for the two access classes. Split out the performance numbers for the host bridge (generic target) from the calculation of the entire path in order to allow calculation of both access classes for a CXL region. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240308220055.2172956-7-dave.jiang@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12cxl: Split out combine_coordinates() for common shared usageDave Jiang3-25/+29
Refactor the common code for combining coordinates in order to reduce duplication. Create a new function, cxl_coordinates_combine(), that combines two 'struct access_coordinate' instances. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240308220055.2172956-6-dave.jiang@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12ACPI: HMAT / cxl: Add retrieval of generic port coordinates for both access ↵Dave Jiang3-5/+7
classes Update acpi_get_genport_coordinates() to allow retrieval of both access classes of the 'struct access_coordinate' for a generic target. The update will allow CXL code to compute access coordinates for both access classes. Cc: Rafael J. Wysocki <rafael@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240308220055.2172956-5-dave.jiang@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-20cxl/acpi: Fix load failures due to single window creation failureDan Williams1-18/+28
The expectation is that cxl_parse_cfmws() continues in the face of failure as evidenced by code like: cxlrd = cxl_root_decoder_alloc(root_port, ways, cxl_calc_hb); if (IS_ERR(cxlrd)) return 0; There are other error paths in that function which mistakenly follow idiomatic expectations and return an error when they should not. Most of those mistakes are innocuous checks that hardly ever fail in practice. However, a recent change succeeded in making the implementation more fragile by applying an idiomatic, but still wrong "fix" [1]. In this failure case the kernel reports: cxl root0: Failed to populate active decoder targets cxl_acpi ACPI0017:00: Failed to add decode range: [mem 0x00000000-0x7fffffff flags 0x200] ...which is a real issue with that one window (to be fixed separately), but ends up failing the entirety of cxl_acpi_probe(). Undo that recent breakage while also removing the confusion about ignoring errors. Update all exit paths to return an error per typical expectations and let an outer wrapper function handle dropping the error. Fixes: 91019b5bc7c2 ("cxl/acpi: Return 'rc' instead of '0' in cxl_parse_cfmws()") [1] Cc: <stable@vger.kernel.org> Cc: Breno Leitao <leitao@debian.org> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-20Merge branch 'for-6.8/cxl-cper' into for-6.8/cxlDan Williams2-59/+4
Pick up CXL CPER notification removal for v6.8-rc6, to return in a later merge window.
2024-02-20acpi/ghes: Remove CXL CPER notificationsDan Williams1-56/+1
Initial tests with the CXL CPER implementation identified that error reports were being duplicated in the log and the trace event [1]. Then it was discovered that the notification handler took sleeping locks while the GHES event handling runs in spin_lock_irqsave() context [2]. While the duplicate reporting was fixed in v6.8-rc4, the fix for the sleeping-lock-vs-atomic collision would enjoy more time to settle and gain some test cycles. Given how late it is in the development cycle, remove the CXL hookup for now and try again during the next merge window. Note that the end result is that v6.8 does not emit CXL CPER payloads to the kernel log, but this is in line with the CXL trend to move error reporting to trace events instead of the kernel log. Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: http://lore.kernel.org/r/20240108165855.00002f5a@Huawei.com [1] Closes: http://lore.kernel.org/r/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain [2] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-16cxl/pci: Fix disabling memory if DVSEC CXL Range does not match a CFMWS windowRobert Richter1-3/+3
The Linux CXL subsystem is built on the assumption that HPA == SPA. That is, the host physical addresses (HPA) the HDM decoder registers are programmed with are system physical addresses (SPA). During HDM decoder setup, the DVSEC CXL range registers (cxl-3.1, 8.1.3.8) are checked if the memory is enabled and the CXL range is in an HPA window that is described in a CFMWS structure of the CXL host bridge (cxl-3.1, 9.18.1.3). Now, if the HPA is not an SPA, the CXL range does not match a CFMWS window and the CXL memory range will then be disabled. The HDM decoder stops working, which causes system memory to be disabled and further a system hang during HDM decoder initialization, typically when a CXL enabled kernel boots. Prevent a system hang and do not disable the HDM decoder if the decoder's CXL range is not found in a CFMWS window. Note the change only fixes a hardware hang, but does not implement HPA/SPA translation. Support for this can be added in a follow on patch series. Signed-off-by: Robert Richter <rrichter@amd.com> Fixes: 34e37b4c432c ("cxl/port: Enable HDM Capability after validating DVSEC Ranges") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20240216160113.407141-1-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-16cxl: Fix sysfs export of qos_class for memdevDave Jiang4-36/+66
Current implementation exports only to /sys/bus/cxl/devices/.../memN/qos_class. With both ram and pmem exposed, the second registered sysfs attribute is rejected as a duplicate. It's not possible to create qos_class under the dev_groups via the driver due to the ram and pmem sysfs sub-directories already created by the device sysfs groups. Move the ram and pmem qos_class to the device sysfs groups and add a call to sysfs_update_group() after the perf data are validated so the qos_class can be visible. The end result should be /sys/bus/cxl/devices/.../memN/ram/qos_class and /sys/bus/cxl/devices/.../memN/pmem/qos_class. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240206190431.1810289-4-dave.jiang@intel.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-16cxl: Remove unnecessary type cast in cxl_qos_class_verify()Dave Jiang1-2/+1
The passed in host bridge parameter for device_for_each_child() has unnecessary void * type cast. Remove the type cast. Suggested-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240206190431.1810289-3-dave.jiang@intel.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-16cxl: Change 'struct cxl_memdev_state' *_perf_list to single 'struct ↵Dave Jiang4-90/+34
cxl_dpa_perf' In order to address the issue with being able to expose qos_class sysfs attributes under 'ram' and 'pmem' sub-directories, the attributes must be defined as static attributes rather than under driver->dev_groups. To avoid implementing locking for accessing the 'struct cxl_dpa_perf' lists, convert the list to a single 'struct cxl_dpa_perf' entry in preparation for moving the attributes to static definitions. While theoretically a partition may have multiple qos_class via CDAT, this has not been encountered with testing on available hardware. The code is simplified for now to not support the complex case until a use case is needed to support that. Link: https://lore.kernel.org/linux-cxl/65b200ba228f_2d43c29468@dwillia2-mobl3.amr.corp.intel.com.notmuch/ Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240206190431.1810289-2-dave.jiang@intel.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-16cxl/region: Allow out of order assembly of autodiscovered regionsAlison Schofield1-10/+38
Autodiscovered regions can fail to assemble if they are not discovered in HPA decode order. The user will see failure messages like: [] cxl region0: endpoint5: HPA order violation region1 [] cxl region0: endpoint5: failed to allocate region reference The check that is causing the failure helps the CXL driver enforce a CXL spec mandate that decoders be committed in HPA order. The check is needless for autodiscovered regions since their decoders are already programmed. Trying to enforce order in the assembly of these regions is useless because they are assembled once all their member endpoints arrive, and there is no guarantee on the order in which endpoints are discovered during probe. Keep the existing check, but for autodiscovered regions, allow the out of order assembly after a sanity check that the lesser numbered decoder has the lesser HPA starting address. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Tested-by: Wonjae Lee <wj28.lee@samsung.com> Link: https://lore.kernel.org/r/3dec69ee97524ab229a20c6739272c3000b18408.1706736863.git.alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-16cxl/region: Handle endpoint decoders in cxl_region_find_decoder()Alison Schofield1-6/+8
In preparation for adding a new caller of cxl_region_find_decoder(), teach it to find a decoder from a cxl_endpoint_decoder structure. Combining switch and endpoint decoder lookup in one function prevents code duplication in call sites. Update the existing caller. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Tested-by: Wonjae Lee <wj28.lee@samsung.com> Link: https://lore.kernel.org/r/79ae6d72978ef9f3ceec9722e1cb793820553c8e.1706736863.git.alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-09Merge tag 'efi-fixes-for-v6.8-1' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: "The only notable change here is the patch that changes the way we deal with spurious errors from the EFI memory attribute protocol. This will be backported to v6.6, and is intended to ensure that we will not paint ourselves into a corner when we tighten this further in order to comply with MS requirements on signed EFI code. Note that this protocol does not currently exist in x86 production systems in the field, only in Microsoft's fork of OVMF, but it will be mandatory for Windows logo certification for x86 PCs in the future. - Tighten ELF relocation checks on the RISC-V EFI stub - Give up if the new EFI memory attributes protocol fails spuriously on x86 - Take care not to place the kernel in the lowest 16 MB of DRAM on x86 - Omit special purpose EFI memory from memblock - Some fixes for the CXL CPER reporting code - Make the PE/COFF layout of mixed-mode capable images comply with a strict interpretation of the spec" * tag 'efi-fixes-for-v6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: x86/efistub: Use 1:1 file:memory mapping for PE/COFF .compat section cxl/trace: Remove unnecessary memcpy's cxl/cper: Fix errant CPER prints for CXL events efi: Don't add memblocks for soft-reserved memory efi: runtime: Fix potential overflow of soft-reserved region size efi/libstub: Add one kernel-doc comment x86/efistub: Avoid placing the kernel below LOAD_PHYSICAL_ADDR x86/efistub: Give up if memory attribute protocol returns an error riscv/efistub: Tighten ELF relocation check riscv/efistub: Ensure GP-relative addressing is not used
2024-02-03cxl/trace: Remove unnecessary memcpy'sIra Weiny1-3/+3
CPER events don't have UUIDs. Therefore UUIDs were removed from the records passed to trace events and replaced with hard coded values. As pointed out by Jonathan, the new defines for the UUIDs present a more efficient way to assign UUID in trace records.[1] Replace memcpy's with the use of static data. [1] https://lore.kernel.org/all/20240108132325.00000e9c@Huawei.com/ Suggested-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-01-29cxl/pci: Skip to handle RAS errors if CXL.mem device is detachedLi Ming1-12/+31
The PCI AER model is an awkward fit for CXL error handling. While the expectation is that a PCI device can escalate to link reset to recover from an AER event, the same reset on CXL amounts to a surprise memory hotplug of massive amounts of memory. At present, the CXL error handler attempts some optimistic error handling to unbind the device from the cxl_mem driver after reaping some RAS register values. This results in a "hopeful" attempt to unplug the memory, but there is no guarantee that will succeed. A subsequent AER notification after the memdev unbind event can no longer assume the registers are mapped. Check for memdev bind before reaping status register values to avoid crashes of the form: BUG: unable to handle page fault for address: ffa00000195e9100 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page [...] RIP: 0010:__cxl_handle_ras+0x30/0x110 [cxl_core] [...] Call Trace: <TASK> ? __die+0x24/0x70 ? page_fault_oops+0x82/0x160 ? kernelmode_fixup_or_oops+0x84/0x110 ? exc_page_fault+0x113/0x170 ? asm_exc_page_fault+0x26/0x30 ? __pfx_dpc_reset_link+0x10/0x10 ? __cxl_handle_ras+0x30/0x110 [cxl_core] ? find_cxl_port+0x59/0x80 [cxl_core] cxl_handle_rp_ras+0xbc/0xd0 [cxl_core] cxl_error_detected+0x6c/0xf0 [cxl_core] report_error_detected+0xc7/0x1c0 pci_walk_bus+0x73/0x90 pcie_do_recovery+0x23f/0x330 Longer term, the unbind and PCI_ERS_RESULT_DISCONNECT behavior might need to be replaced with a new PCI_ERS_RESULT_PANIC. Fixes: 6ac07883dbb5 ("cxl/pci: Add RCH downstream port error logging") Cc: stable@vger.kernel.org Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Li Ming <ming4.li@intel.com> Link: https://lore.kernel.org/r/20240129131856.2458980-1-ming4.li@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-24cxl/region:Fix overflow issue in alloc_hpa()Quanquan Cao1-2/+2
Creating a region with 16 memory devices caused a problem. The div_u64_rem function, used for dividing an unsigned 64-bit number by a 32-bit one, faced an issue when the divisor was SZ_256M * p->interleave_ways. The result surpassed the maximum limit of the 32-bit divisor (4G), leading to an overflow and a remainder of 0. note: At this point, p->interleave_ways is 16, meaning 16 * 256M = 4G To fix this issue, I replaced the div_u64_rem function with div64_u64_rem and adjusted the type of the remainder. Signed-off-by: Quanquan Cao <caoqq@fujitsu.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Fixes: 23a22cd1c98b ("cxl/region: Allocate HPA capacity to regions") Cc: <stable@vger.kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
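The arithmetic can be illustrated with the math64.h helpers: div_u64_rem() takes a 32-bit divisor, so SZ_256M * 16 (4G) does not fit, while div64_u64_rem() keeps the divisor and remainder 64-bit wide. A sketch with a hypothetical helper name:

    #include <linux/math64.h>
    #include <linux/sizes.h>

    /* Sketch only: with 16-way interleave the divisor SZ_256M * 16 equals 4G,
     * which overflows the u32 divisor of div_u64_rem(). */
    static bool size_is_interleave_aligned(u64 size, u64 interleave_ways)
    {
            u64 remainder;

            div64_u64_rem(size, SZ_256M * interleave_ways, &remainder);
            return remainder == 0;
    }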
2024-01-22cxl/pci: Skip irq features if MSI/MSI-X are not supportedIra Weiny1-11/+15
CXL 3.1 Section 3.1.1 states: "A Function on a CXL device must not generate INTx messages if that Function participates in CXL.cache protocol or CXL.mem protocols." The generic CXL memory driver only supports devices which use the CXL.mem protocol. The current driver attempts to allocate MSI/MSI-X vectors in anticipation of their need for mailbox interrupts or event processing. However, the above requirement does not require a device to support interrupts, only that they use MSI/MSI-X. For example, a device may disable mailbox interrupts and either be configured for firmware first or skip event processing, and still function. Dave Larsen reported that the following Intel / Agilex card does not support interrupts on function 0. CXL: Intel Corporation Device 0ddb (rev 01) (prog-if 10 [CXL Memory Device (CXL 2.x)]) Rather than fail device probe if interrupts are not supported, flag that irqs are not enabled and avoid features which require interrupts. Emit messages appropriate for the situation to aid in debugging should device behavior be unexpected due to a failure to allocate vectors. Note that it is possible for a device to have host based event processing through polling. However, the driver does not support polling and it is not anticipated to be generally required. Leave that functionality to a future patch if such a device comes along. Reported-by: Dave Larsen <davelarsen58@gmail.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-and-tested-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://lore.kernel.org/r/20240117-dont-fail-irq-v2-1-f33f26b0e365@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
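A minimal sketch of the fallback behavior described above, with illustrative names rather than the driver's actual functions:

    #include <linux/pci.h>

    /* Sketch: try to allocate MSI/MSI-X vectors; on failure note that irqs
     * are unavailable and continue rather than failing probe. */
    static void example_setup_irqs(struct pci_dev *pdev, bool *irq_avail)
    {
            int rc;

            rc = pci_alloc_irq_vectors(pdev, 1, 16, PCI_IRQ_MSI | PCI_IRQ_MSIX);
            if (rc < 0) {
                    dev_dbg(&pdev->dev,
                            "MSI/MSI-X unavailable, continuing without irq features\n");
                    *irq_avail = false;
                    return;
            }
            *irq_avail = true;
    }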
2024-01-12cxl/core: use sysfs_emit() for attr's _show()Shiyang Ruan1-1/+1
sprintf() is deprecated for sysfs; use the preferred sysfs_emit() instead. Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Link: https://lore.kernel.org/r/20240112062709.2490947-1-ruansy.fnst@fujitsu.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
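A generic sketch of the sysfs_emit() pattern; the attribute and driver state here are hypothetical, not the cxl_core code:

    #include <linux/device.h>
    #include <linux/sysfs.h>

    struct example_state {
            int qos_class;
    };

    /* sysfs_emit() bounds the output to PAGE_SIZE, unlike raw sprintf(). */
    static ssize_t qos_class_show(struct device *dev,
                                  struct device_attribute *attr, char *buf)
    {
            struct example_state *st = dev_get_drvdata(dev);

            return sysfs_emit(buf, "%d\n", st->qos_class);
    }
    static DEVICE_ATTR_RO(qos_class);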
2024-01-09Merge branch 'for-6.8/cxl-cper' into for-6.8/cxlDan Williams4-138/+124
Pick up the CPER to CXL driver integration work for v6.8. Some additional cleanup of cper_estatus_print() messages is needed, but that is to be handled incrementally.
2024-01-09cxl/pci: Register for and process CPER eventsIra Weiny3-13/+89
If the firmware has configured CXL event support to be firmware first the OS can process those events through CPER records. The CXL layer has unique DPA to HPA knowledge and standard event trace parsing in place. CPER records contain Bus, Device, Function information which can be used to identify the PCI device which is sending the event. Change the PCI driver registration to include registration of a CXL CPER callback to process events through the trace subsystem. Use new scoped based management to simplify the handling of the PCI device object. Tested-by: Smita-Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Smita-Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-9-1bb8a4ca2c7a@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> [djbw: use new pci_dev guard, flip init order] Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-09cxl/events: Create a CXL event unionIra Weiny2-23/+17
The CXL CPER and event log records share everything but a UUID/GUID in their structures. Define a cxl_event union without the UUID/GUID to be shared between the CPER and event log record formats. Adjust the code to use this union. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-6-1bb8a4ca2c7a@intel.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-09cxl/events: Separate UUID from event structuresIra Weiny1-1/+1
The UEFI CXL CPER structure does not include the UUID. Now that the UUID is passed separately to the trace event there is no need to have the UUID in those structures. Move UUID from the event record header to the raw structures. Adjust cxl-test to create dummy structures for test records. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-5-1bb8a4ca2c7a@intel.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-09cxl/events: Remove passing a UUID to known event tracesIra Weiny2-15/+19
The UUID data is redundant in the known event trace types. The addition of static defines allows the trace macros to create the UUID data inside the trace thus removing unnecessary code. Have well known trace events use static data to set the uuid field based on the event type. Suggested-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-4-1bb8a4ca2c7a@intel.com Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-09cxl/events: Create common event UUID definesIra Weiny2-27/+27
Dan points out in review that the cxl_test code could be made better through the use of UUID defines rather than being open coded.[1] Create UUID defines and use them rather than open coding them. Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: http://lore.kernel.org/r/65738d09e30e2_45e0129451@dwillia2-xfh.jf.intel.com.notmuch [1] Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-3-1bb8a4ca2c7a@intel.com [djbw: clang-format uuid definitions] Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-05Merge branch 'for-6.7/cxl' into for-6.8/cxlDan Williams3-26/+14
Pick up a late locking change + fixup that is better as merge window material than rc material.
2024-01-05Merge branch 'for-6.8/cxl-misc' into for-6.8/cxlDan Williams1-1/+1
Pick up some miscellaneous fixups for v6.8.
2024-01-05Merge branch 'for-6.8/cxl-cdat' into for-6.8/cxlDan Williams6-27/+44
Pick up some follow-on fixes for 'cxl_root' reference count leaks.
2024-01-05cxl/events: Promote CXL event structures to a core headerIra Weiny1-89/+1
UEFI code can process CXL events through CPER records. Those records use almost the same format as the CXL events. Lift the CXL event structures to a core header to be shared in later patches. [jic123: drop "CXL rev 3.0" mention] Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-2-1bb8a4ca2c7a@intel.com [djbw: add F: entry to maintainers for include/linux/cxl-event.h] Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-05cxl: Refactor to use __free() for cxl_root allocation in ↵Dave Jiang1-3/+2
cxl_endpoint_port_probe() Use scope-based resource management __free() macro to drop the open coded put_device() in cxl_endpoint_port_probe(). Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/170449247973.3779673.15088722836135359275.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
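A sketch of the scope-based cleanup idiom being adopted in this series, using the cxl_root helpers referenced here and further down the log; treat it as illustrative rather than the exact cxl_port.c change:

    #include <linux/cleanup.h>

    /* Assumes find_cxl_root() returns a reference that put_cxl_root() drops. */
    DEFINE_FREE(put_cxl_root, struct cxl_root *, if (_T) put_cxl_root(_T))

    static int example_probe(struct cxl_port *endpoint)
    {
            struct cxl_root *cxl_root __free(put_cxl_root) =
                    find_cxl_root(endpoint);

            if (!cxl_root)
                    return -ENXIO;

            /* ... use cxl_root ...; the reference is dropped automatically
             * when cxl_root goes out of scope, on all return paths. */
            return 0;
    }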
2024-01-05cxl: Refactor to use __free() for cxl_root allocation in ↵Dave Jiang1-5/+3
cxl_find_nvdimm_bridge() Use scope-based resource management __free() macro to drop the open coded put_device() in cxl_find_nvdimm_bridge(). Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/170449247353.3779673.5963704495491343135.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-05cxl: Fix device reference leak in cxl_port_perf_data_calculate()Dave Jiang1-2/+5
cxl_port_perf_data_calculate() calls find_cxl_root() and does not dereference the 'struct device' in the cxl_root->port. find_cxl_root() calls get_device() and takes a reference on the port 'struct device' member. Use the __free() macro to ensure the dereference happens. Fixes: 7a4f148dd8d5 ("cxl: Compute the entire CXL path latency and bandwidth data") Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/170449246681.3779673.2288926019977963333.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-05cxl: Convert find_cxl_root() to return a 'struct cxl_root *'Dave Jiang6-23/+28
Commit 790815902ec6 ("cxl: Add support for _DSM Function for retrieving QTG ID") introduced 'struct cxl_root', however all usages have worked indirectly through cxl_port. Refactor code such as the find_cxl_root() function to use 'struct cxl_root' directly. Suggested-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/170449246044.3779673.13035770941393418591.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-05cxl: Introduce put_cxl_root() helperDave Jiang2-0/+12
Add a helper function put_cxl_root() to maintain symmetry for find_cxl_root() function instead of relying on open coding of the put_device() in order to dereference the 'struct device' that happens via get_device() in find_cxl_root(). Suggested-by: Robert Richter <rrichter@amd.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Robert Richter <rrichter@amd.com> Link: https://lore.kernel.org/r/170449245417.3779673.4566146351673989387.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-04cxl/port: Fix missing target list lockDan Williams2-17/+7
cxl_port_setup_targets() modifies the ->targets[] array of a switch decoder. target_list_show() expects to be able to emit a coherent snapshot of that array by "holding" ->target_lock for read. The target_lock is held for write during initialization of the ->targets[] array, but it is not held for write during cxl_port_setup_targets(). The ->target_lock() predates the introduction of @cxl_region_rwsem. That semaphore protects changes to host-physical-address (HPA) decode which is precisely what writes to a switch decoder's target list affects. Replace ->target_lock with @cxl_region_rwsem. Now the side-effect of snapshotting an unstable view of a decoder's target list is likely benign so the Fixes: tag is presumptive. Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-04cxl/port: Fix decoder initialization when nr_targets > interleave_waysHuang Ying1-1/+1
The decoder_populate_targets() helper walks all of the targets in a port and makes sure they can be looked up in @target_map. Where @target_map is a lookup table from target position to target id (corresponding to a cxl_dport instance). However @target_map is only responsible for conveying the active dport instances as indicated by interleave_ways. When nr_targets > interleave_ways it results in decoder_populate_targets() walking off the end of the valid entries in @target_map. Given target_map is initialized to 0 it results in the dport lookup failing if position 0 is not mapped to a dport with an id of 0: cxl_port port3: Failed to populate active decoder targets cxl_port port3: Failed to add decoder cxl_port port3: Failed to add decoder3.0 cxl_bus_probe: cxl_port port3: probe: -6 This bug also highlights that when the decoder's ->targets[] array is written in cxl_port_setup_targets() it is missing a hold of the targets_lock to synchronize against sysfs readers of the target list. A fix for that is saved for a later patch. Fixes: a5c258021689 ("cxl/bus: Populate the target list at decoder create") Cc: <stable@vger.kernel.org> Signed-off-by: Huang, Ying <ying.huang@intel.com> [djbw: rewrite the changelog, find the Fixes: tag] Co-developed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-03cxl/region: fix x9 interleave typoJim Harris1-1/+1
CXL supports x3, x6 and x12 - not x9. Fixes: 80d10a6cee050 ("cxl/region: Add interleave geometry attributes") Signed-off-by: Jim Harris <jim.harris@samsung.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Link: https://lore.kernel.org/r/169904271254.204936.8580772404462743630.stgit@ubuntu Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-03cxl/trace: Pass UUID explicitly to event tracesIra Weiny2-18/+18
CXL CPER events are identified by the CPER Section Type GUID. The GUID correlates with the CXL UUID for the event record. It turns out that a CXL CPER record is a strict subset of the CXL event record, only the UUID header field is chopped. In order to unify handling between native and CPER flavors of CXL events, prepare the code for the UUID to be passed in rather than inferred from the record itself. Later patches update the passed in record to only refer to the common data between the formats. Pass the UUID explicitly to each trace event to be able to remove the UUID from the event structures. Originally it was desirable to remove the UUID from the well known event because the UUID value was redundant. However, the trace API was already in place.[1] Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/all/36f2d12934d64a278f2c0313cbd01abc@huawei.com [1] Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-1-1bb8a4ca2c7a@intel.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-02Merge branch 'for-6.8/cxl-cdat' into for-6.8/cxlDan Williams17-39/+1009
Pick up the CDAT parsing and QOS class infrastructure for v6.8.
2024-01-02cxl/region: use %pap format to print resource_size_tRandy Dunlap1-2/+2
Use "%pap" to print a resource_size_t (phys_addr_t derived type) to prevent build warnings on 32-bit arches (seen on i386 and riscv-32). ../drivers/cxl/core/region.c: In function 'alloc_hpa': ../drivers/cxl/core/region.c:556:25: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 5 has type 'resource_size_t' {aka 'unsigned int'} [-Wformat=] 556 | "HPA allocation error (%ld) for size:%#llx in %s %pr\n", Fixes: 7984d22f1315 ("cxl/region: Add dev_dbg() detail on failure to allocate HPA space") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Fan Ni <fan.ni@samsung.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <linux-cxl@vger.kernel.org> Link: https://lore.kernel.org/r/20240102173917.19718-1-rdunlap@infradead.org Signed-off-by: Dan Williams <dan.j.williams@intel.com>
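For reference, a sketch of the %pap usage pattern; the function is illustrative, not the region.c hunk:

    #include <linux/device.h>

    /* %pap takes the address of a phys_addr_t/resource_size_t and prints it
     * correctly on both 32-bit and 64-bit builds, avoiding %llx casts. */
    static void report_hpa_alloc_failure(struct device *dev, int rc,
                                         resource_size_t size)
    {
            dev_dbg(dev, "HPA allocation error (%d) for size: %pap\n", rc, &size);
    }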
2023-12-24cxl/region: Add dev_dbg() detail on failure to allocate HPA spaceAlison Schofield1-2/+3
When the region driver fails while allocating HPA space for a new region it can be because the parent resource, the CXL Window, has no more available space. In that case, the debug user sees this message: cxl_core:alloc_hpa:555: cxl region2: failed to allocate HPA: -34 Expand the message like this: cxl_core:alloc_hpa:555: cxl region8: HPA allocation error (-34) for size:0x20000000 in CXL Window 0 [mem 0xf010000000-0xf04fffffff flags 0x200] Now the debug user can examine /proc/iomem and consider actions like removing other allocations in that space or reducing the size of their region request. Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Link: https://lore.kernel.org/r/20231223004740.1401858-1-alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22cxl: Check qos_class validity on memdev probeDave Jiang1-0/+103
Add a check to make sure the qos_class for the device will match one of the root decoders' qos_class. If no match is found, then the qos_class for the device is set to invalid. Also add a check to ensure that the device's host bridge matches one of the root decoder's downstream targets. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/170319626313.2212653.9021004640856081917.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22cxl: Export sysfs attributes for memory device QoS classDave Jiang1-6/+61
Export qos_class sysfs attributes for the CXL memory device. The QoS class should show up as /sys/bus/cxl/devices/memX/ram/qos_class for the volatile partition and /sys/bus/cxl/devices/memX/pmem/qos_class for the persistent partition. The QTG ID is retrieved via _DSM after supplying the calculated bandwidth and latency for the entire CXL path from device to the CPU. This ID is used to match up to the root decoder QoS class to determine which CFMWS the memory range of a hotplugged CXL mem device should be assigned under. While there may be multiple DSMAS exported by the device CDAT, the driver will only expose the first QTG ID per partition in sysfs for now. In the future when multiple QTG IDs are necessary, they can be exposed. [1] [1]: https://lore.kernel.org/linux-cxl/167571650007.587790.10040913293130712882.stgit@djiang5-mobl3.local/T/#md2a47b1ead3e1ba08f50eab29a4af1aed1d215ab Suggested-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/170319625698.2212653.17544381274847420961.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22cxl: Store QTG IDs and related info to the CXL memory device contextDave Jiang3-0/+92
Once the QTG ID _DSM is executed successfully, the QTG ID is retrieved from the return package. Create a list of entries in the cxl_memdev context and store the QTG ID as qos_class token and the associated DPA range. This information can be exposed to user space via sysfs in order to help region setup for hot-plugged CXL memory devices. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/170319625109.2212653.11872111896220384056.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22cxl: Compute the entire CXL path latency and bandwidth dataDave Jiang1-1/+58
CXL Memory Device SW Guide [1] rev1.0 2.11.2 provides instruction on how to calculate latency and bandwidth for CXL memory device. Calculate minimum bandwidth and total latency for the path from the CXL device to the root port. The QTG id is retrieved by providing the performance data as input and calling the root port callback ->get_qos_class(). The retrieved id is stored with the cxl_port of the CXL device. For example for a device that is directly attached to a host bus: Total Latency = Device Latency (from CDAT) + Dev to Host Bus (HB) Link Latency + Generic Port Latency Min Bandwidth = Min bandwidth for link bandwidth between HB and CXL device, device CDAT bandwidth, and Generic Port Bandwidth For a device that has a switch in between host bus and CXL device: Total Latency = Device (CDAT) Latency + Dev to Switch Link Latency + Switch (CDAT) Latency + Switch to HB Link Latency + Generic Port Latency Min Bandwidth = Min bandwidth for link bandwidth between CXL device to CXL switch, CXL device CDAT bandwidth, CXL switch CDAT bandwidth, CXL switch to HB bandwidth, and Generic Port Bandwidth. [1]: https://cdrdv2-public.intel.com/643805/643805_CXL%20Memory%20Device%20SW%20Guide_Rev1p0.pdf Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/170319624458.2212653.13252496567443656371.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
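A compact model of the path math above: latency accumulates hop by hop while bandwidth is limited by the slowest hop. Field and function names are assumptions for the sketch, not the exact 'struct access_coordinate' layout:

    #include <linux/minmax.h>

    struct path_perf {
            unsigned int latency;           /* picoseconds, summed over hops */
            unsigned int bandwidth;         /* MB/s, minimum of all hops */
    };

    static void path_perf_combine(struct path_perf *total,
                                  const struct path_perf *hop)
    {
            total->latency += hop->latency;
            total->bandwidth = min_not_zero(total->bandwidth, hop->bandwidth);
    }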
2023-12-22cxl: Add helper function that calculate performance data for downstream portsDave Jiang2-0/+78
The CDAT information from the switch, Switch Scoped Latency and Bandwidth Information Structure (SSLBIS), is parsed and stored under a cxl_dport based on the correlated downstream port id from the SSLBIS entry. Walk the entire CXL port paths and collect all the performance data. Also pick up the link latency number that's stored under the dports. The entire path PCIe bandwidth can be retrieved using the pcie_bandwidth_available() call. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/170319623824.2212653.10302079766473698427.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22cxl: Store the access coordinates for the generic portsDave Jiang2-0/+27
Each CXL host bridge is represented by an ACPI0016 device. A generic port device handle that is an ACPI device is represented by a string of ACPI0016 device HID and UID. Create a device handle from the ACPI device and retrieve the access coordinates from the stored memory targets. The access coordinates are stored under the cxl_dport that is associated with the CXL host bridge. The access coordinates struct is dynamically allocated under cxl_dport in order for code later on to detect whether the data exists or not. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/170319623196.2212653.17916695743464172534.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22cxl: Calculate and store PCI link latency for the downstream portsDave Jiang5-0/+61
The latency is calculated by dividing the flit size over the bandwidth. Add support to retrieve the flit size for the CXL switch device and calculate the latency of the PCIe link. Cache the latency number with cxl_dport. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/170319621931.2212653.6800240203604822886.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
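The latency model reduces to flit size divided by link bandwidth; below is a sketch with assumed units (bytes and MB/s) rather than the driver's exact constants:

    /* picoseconds = bytes * 10^6 / (MB/s), since 1 MB/s moves 1 byte per us.
     * Example: a 68-byte flit at 32000 MB/s is ~2125 ps of link latency. */
    static unsigned long link_latency_ps(unsigned long flit_bytes,
                                         unsigned long bandwidth_mbps)
    {
            return flit_bytes * 1000000UL / bandwidth_mbps;
    }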
2023-12-22cxl: Add support for _DSM Function for retrieving QTG IDDave Jiang3-13/+193
CXL spec v3.0 9.17.3 CXL Root Device Specific Methods (_DSM) Add support to retrieve QTG ID via ACPI _DSM call. The _DSM call requires an input of an ACPI package with 4 dwords (read latency, write latency, read bandwidth, write bandwidth). The call returns a package with 1 WORD that provides the max supported QTG ID and a package that may contain 0 or more WORDs as the recommended QTG IDs in the recommended order. Create a cxl_root container for the root cxl_port and provide a callback ->get_qos_class() in order to retrieve the QoS class. For the ACPI case, the _DSM helper is used to retrieve the QTG ID and returned. A devm_cxl_add_root() function is added for root port setup and registration of the cxl_root callback operation(s). Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/170319621294.2212653.1649682083061569256.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
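A hedged sketch of building that 4-dword _DSM input with the standard ACPI helpers; the GUID, revision, and function index are placeholders here, not the values from the CXL spec:

    #include <linux/acpi.h>
    #include <linux/kernel.h>

    static union acpi_object *example_eval_qtg_dsm(acpi_handle handle,
                                                   const guid_t *guid,
                                                   u32 rd_lat, u32 wr_lat,
                                                   u32 rd_bw, u32 wr_bw)
    {
            union acpi_object args[4] = {
                    { .integer = { .type = ACPI_TYPE_INTEGER, .value = rd_lat } },
                    { .integer = { .type = ACPI_TYPE_INTEGER, .value = wr_lat } },
                    { .integer = { .type = ACPI_TYPE_INTEGER, .value = rd_bw } },
                    { .integer = { .type = ACPI_TYPE_INTEGER, .value = wr_bw } },
            };
            union acpi_object argv4 = {
                    .package = {
                            .type = ACPI_TYPE_PACKAGE,
                            .count = ARRAY_SIZE(args),
                            .elements = args,
                    },
            };

            /* revision 1, function index 1 are placeholder values */
            return acpi_evaluate_dsm(handle, guid, 1, 1, &argv4);
    }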
2023-12-22cxl: Add callback to parse the SSLBIS subtable from CDATDave Jiang3-0/+104
Provide a callback to parse the Switch Scoped Latency and Bandwidth Information Structure (SSLBIS) in the CDAT structures. The SSLBIS contains the bandwidth and latency information that's tied to the CXL switch that the data table has been read from. The extracted values are stored to the cxl_dport correlated by the port_id depending on the SSLBIS entry. Coherent Device Attribute Table 1.03 2.1 Switch Scoped Latency and Bandwidth Information Structure (SSLBIS) Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/170319620635.2212653.5194389158785365150.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22cxl: Add callback to parse the DSLBIS subtable from CDATDave Jiang1-2/+100
Provide a callback to parse the Device Scoped Latency and Bandwidth Information Structure (DSLBIS) in the CDAT structures. The DSLBIS contains the bandwidth and latency information that's tied to a DSMAS handle. The driver will retrieve the read and write latency and bandwidth associated with the DSMAS which is tied to a DPA range. Coherent Device Attribute Table 1.03 2.1 Device Scoped Latency and Bandwidth Information Structure (DSLBIS) Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/170319620005.2212653.7475488478229720542.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22cxl: Add callback to parse the DSMAS subtables from CDATDave Jiang5-0/+99
Provide a callback function to the CDAT parser in order to parse the Device Scoped Memory Affinity Structure (DSMAS). Each DSMAS structure contains the DPA range and its associated attributes in each entry. See the CDAT specification for details. The device handle and the DPA range are saved to be associated with the DSLBIS locality data when the DSLBIS entries are parsed. The xarray is a local variable. When the total path performance data is calculated and stored, this xarray can be discarded. Coherent Device Attribute Table 1.03 2.1 Device Scoped Memory Affinity Structure (DSMAS) Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/170319619355.2212653.2675953129671561293.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-18cxl: Fix unregister_region() callback parameter assignmentDave Jiang1-4/+4
In devm_cxl_add_region(), devm_add_action_or_reset() is called by passing in unregister_region() with data ptr of 'cxlr'. However, in unregister_region(), the passed in parameter is incorrectly assumed to be a 'struct device' rather than the 'cxlr' pointer. The code has been working because 'struct device' is the first member of 'struct cxl_region'. Issue found by inspection. Fix the assignment so that cxlr is pointing directly to the passed in parameter. Not flagged for -stable since there is no functional impact of this fix. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/170258123810.952211.3907381447996426480.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
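A sketch of the pattern being corrected: the devres action callback receives exactly the data pointer that was registered, so it must be treated as that type. 'struct example_region' is a stand-in, not the real cxl_region:

    #include <linux/device.h>

    struct example_region {
            struct device dev;
            /* ... */
    };

    static void example_unregister_region(void *data)
    {
            struct example_region *reg = data;      /* the registered pointer */

            device_unregister(&reg->dev);
    }

    static int example_add_region(struct device *host, struct example_region *reg)
    {
            /* 'reg' is what example_unregister_region() receives on teardown */
            return devm_add_action_or_reset(host, example_unregister_region, reg);
    }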
2023-12-14cxl/pmu: Ensure put_device on pmu devicesIra Weiny1-1/+1
The following kmemleaks were detected when removing the cxl module stack: unreferenced object 0xffff88822616b800 (size 1024): ... backtrace: [<00000000bedc6f83>] kmalloc_trace+0x26/0x90 [<00000000448d1afc>] devm_cxl_pmu_add+0x3a/0x110 [cxl_core] [<00000000ca3bfe16>] 0xffffffffa105213b [<00000000ba7f78dc>] local_pci_probe+0x41/0x90 [<000000005bb027ac>] pci_device_probe+0xb0/0x1c0 ... unreferenced object 0xffff8882260abcc0 (size 16): ... hex dump (first 16 bytes): 70 6d 75 5f 6d 65 6d 30 2e 30 00 26 82 88 ff ff pmu_mem0.0.&.... backtrace: ... [<00000000152b5e98>] dev_set_name+0x43/0x50 [<00000000c228798b>] devm_cxl_pmu_add+0x102/0x110 [cxl_core] [<00000000ca3bfe16>] 0xffffffffa105213b [<00000000ba7f78dc>] local_pci_probe+0x41/0x90 [<000000005bb027ac>] pci_device_probe+0xb0/0x1c0 ... unreferenced object 0xffff8882272af200 (size 256): ... backtrace: [<00000000bedc6f83>] kmalloc_trace+0x26/0x90 [<00000000a14d1813>] device_add+0x4ea/0x890 [<00000000a3f07b47>] devm_cxl_pmu_add+0xbe/0x110 [cxl_core] [<00000000ca3bfe16>] 0xffffffffa105213b [<00000000ba7f78dc>] local_pci_probe+0x41/0x90 [<000000005bb027ac>] pci_device_probe+0xb0/0x1c0 ... devm_cxl_pmu_add() correctly registers a device remove function but it only calls device_del() which is only part of device unregistration. Properly call device_unregister() to free up the memory associated with the device. Fixes: 1ad3f701c399 ("cxl/pci: Find and register CXL PMU devices") Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20231016-pmu-unregister-fix-v1-1-1e2eb2fa3c69@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
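The distinction matters because device_unregister() is device_del() plus the final put_device(); a minimal sketch of the remove action, with illustrative names:

    #include <linux/device.h>

    static void example_pmu_remove(void *data)
    {
            struct device *dev = data;

            /* device_del() alone leaves the final reference (and the memory it
             * pins) behind; device_unregister() also drops that reference. */
            device_unregister(dev);
    }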
2023-12-08cxl/cdat: Free correct buffer on checksum errorIra Weiny1-7/+6
The new 6.7-rc1 kernel now checks the checksum on CDAT data. While using a branch of Fan's DCD qemu work (and specifying DCD devices), the following splat was observed. WARNING: CPU: 1 PID: 1384 at drivers/base/devres.c:1064 devm_kfree+0x4f/0x60 ... RIP: 0010:devm_kfree+0x4f/0x60 ... ? devm_kfree+0x4f/0x60 read_cdat_data+0x1a0/0x2a0 [cxl_core] cxl_port_probe+0xdf/0x200 [cxl_port] ... The issue in qemu is still unknown but the splat is a straightforward bug in the CDAT checksum processing code. Use a CDAT buffer variable to ensure the devm_kfree() works correctly on error. Fixes: 670e4e88f3b1 ("cxl: Add checksum verification to CDAT from CXL") Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Robert Richter <rrichter@amd.com> Link: http://lore.kernel.org/r/20231116-fix-cdat-devm-free-v1-1-b148b40707d7@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-07cxl/hdm: Fix dpa translation lockingDan Williams2-4/+3
The helper, cxl_dpa_resource_start(), snapshots the dpa-address of an endpoint-decoder after acquiring the cxl_dpa_rwsem. However, it is sufficient to assert that cxl_dpa_rwsem is held rather than acquire it in the helper. Otherwise, it triggers multiple lockdep reports: 1/ Tracing callbacks are in an atomic context that can not acquire sleeping locks: BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1525 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1288, name: bash preempt_count: 2, expected: 0 RCU nest depth: 0, expected: 0 [..] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023 Call Trace: <TASK> dump_stack_lvl+0x71/0x90 __might_resched+0x1b2/0x2c0 down_read+0x1a/0x190 cxl_dpa_resource_start+0x15/0x50 [cxl_core] cxl_trace_hpa+0x122/0x300 [cxl_core] trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core] 2/ The rwsem is already held in the inject poison path: WARNING: possible recursive locking detected 6.7.0-rc2+ #12 Tainted: G W OE N -------------------------------------------- bash/1288 is trying to acquire lock: ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_dpa_resource_start+0x15/0x50 [cxl_core] but task is already holding lock: ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_inject_poison+0x7d/0x1e0 [cxl_core] [..] Call Trace: <TASK> dump_stack_lvl+0x71/0x90 __might_resched+0x1b2/0x2c0 down_read+0x1a/0x190 cxl_dpa_resource_start+0x15/0x50 [cxl_core] cxl_trace_hpa+0x122/0x300 [cxl_core] trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core] __traceiter_cxl_poison+0x5c/0x80 [cxl_core] cxl_inject_poison+0x1bc/0x1e0 [cxl_core] This appears to have been an issue since the initial implementation and uncovered by the new cxl-poison.sh test [1]. That test is now passing with these changes. Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events") Link: http://lore.kernel.org/r/e4f2716646918135ddbadf4146e92abb659de734.1700615159.git.alison.schofield@intel.com [1] Cc: <stable@vger.kernel.org> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-07cxl: Add Support for Get TimestampDavidlohr Bueso2-0/+2
Add the call to the UAPI such that userspace may correlate the timestamps from the device log with system wall time, if, for example, there's any sort of inaccuracy or skew in the device. Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20230829152014.15452-1-dave@stgolabs.net Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-11-29cxl/memdev: Hold region_rwsem during inject and clear poison opsAlison Schofield1-2/+16
Poison inject and clear are supported via debugfs where a privileged user can inject and clear poison to a device physical address. Commit 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper") added a lockdep assert that highlighted a gap in poison inject and clear functions where holding the dpa_rwsem does not assure that a DPA is not added to a region. The impact for inject and clear is that if the DPA address being injected or cleared has been attached to a region, but not yet committed, the dev_dbg() message intended to alert the debug user that they are acting on a mapped address is not emitted. Also, the cxl_poison trace event that serves as a log of the inject and clear activity will not include region info. Close this gap by snapshotting an unchangeable region state during poison inject and clear operations. That means holding both the region_rwsem and the dpa_rwsem during the inject and clear ops. Fixes: d2fbc4865802 ("cxl/memdev: Add support for the Inject Poison mailbox command") Fixes: 9690b07748d1 ("cxl/memdev: Add support for the Clear Poison mailbox command") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/08721dc1df0a51e4e38fecd02425c3475912dfd5.1701041440.git.alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-11-29cxl/core: Always hold region_rwsem while reading poison listsAlison Schofield2-6/+8
A read of a device poison list is triggered via a sysfs attribute and the results are logged as kernel trace events of type cxl_poison. The work is managed by either: a) the region driver when one or more regions map the device, or by b) the memdev driver when no regions map the device. In the case of a) the region driver holds the region_rwsem while reading the poison by committed endpoint decoder mappings and for any unmapped resources. This makes sure that the cxl_poison trace event reports valid region info. (Region name, HPA, and UUID). In the case of b) the memdev driver holds the dpa_rwsem preventing new DPA resources from being attached to a region. However, it leaves a gap between region attach and decoder commit actions. If a DPA in the gap is in the poison list, the cxl_poison trace event will omit the region info. Close the gap by holding the region_rwsem and the dpa_rwsem when reading poison per memdev. Since both methods now hold both locks, down_read both from the caller. Doing so also addresses the lockdep assert that found this issue: Commit 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper") Fixes: f0832a586396 ("cxl/region: Provide region info to the cxl_poison trace event") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/08e8e7ec9a3413b91d51de39e385653494b1eed0.1701041440.git.alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
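A sketch of the resulting lock ordering (region lock first, then the DPA lock); the rwsem names mirror the cxl_core globals, while the function itself is illustrative:

    #include <linux/rwsem.h>

    static int example_read_poison(struct cxl_memdev *cxlmd)
    {
            int rc;

            rc = down_read_interruptible(&cxl_region_rwsem);
            if (rc)
                    return rc;
            rc = down_read_interruptible(&cxl_dpa_rwsem);
            if (rc) {
                    up_read(&cxl_region_rwsem);
                    return rc;
            }

            /* ... read the poison list for cxlmd and emit cxl_poison events,
             * with region info guaranteed stable while both locks are held ... */

            up_read(&cxl_dpa_rwsem);
            up_read(&cxl_region_rwsem);
            return 0;
    }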
2023-11-22cxl/hdm: Fix a benign lockdep splatDave Jiang1-0/+2
The new helper "cxl_num_decoders_committed()" added a lockdep assertion to validate that port->commit_end is protected against modification. That assertion fires in init_hdm_decoder() where it is initializing port->commit_end. Given that it is both accessing and writing that property it ostensibly needs the lock. In practice, CXL decoder commit rules (must commit in order) and the in-order discovery of device decoders makes the manipulation of ->commit_end in init_hdm_decoder() safe. However, rather than rely on the subtle rules of CXL hardware, just make the implementation obviously correct from a software perspective. The Fixes: tag is only for cleaning up a lockdep splat, there is no functional issue addressed by this fix. Fixes: 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper") Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/170025232811.2147250.16376901801315194121.stgit@djiang5-mobl3 Acked-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-11-02cxl/pci: Change CXL AER support check to use native AERTerry Bowman1-2/+2
Native CXL protocol errors are delivered to the OS through AER reporting. The owner of AER owns CXL Protocol error management with respect to _OSC negotiation.[1] CXL device errors are handled by a separate interrupt with native control gated by _OSC control field 'CXL Memory Error Reporting Control'. The CXL driver incorrectly checks for 'CXL Memory Error Reporting Control' before accessing AER registers and caching RCH downport AER registers. Replace the current check in these 2 cases with native AER checks. [1] CXL 3.0 - 9.17.2 CXL _OSC, Table-9-26, Interpretation of CXL _OSC Support Fields, p.641 Fixes: f05fd10d138d ("cxl/pci: Add RCH downstream port AER register discovery") Signed-off-by: Terry Bowman <terry.bowman@amd.com> Reviewed-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Link: https://lore.kernel.org/r/20231102155232.1421261-1-terry.bowman@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-31cxl/hdm: Remove broken error pathDan Williams2-17/+10
Dan reports that cxl_decoder_commit() potentially leaks a hold of cxl_dpa_rwsem. The potential error case is a "should not" happen scenario, turn it into a "can not" happen scenario by adding the error check to cxl_port_setup_targets() where other setting validation occurs. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: http://lore.kernel.org/r/63295673-5d63-4919-b851-3b06d48734c0@moroto.mountain Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-31cxl/hdm: Fix && vs || bugDan Carpenter1-1/+1
If "info" is NULL then this code will crash. || was intended instead of &&. Fixes: 8ce520fdea24 ("cxl/hdm: Use stored Component Register mappings to map HDM decoder capability") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Robert Richter <rrichter@amd.com> Link: https://lore.kernel.org/r/60028378-d3d5-4d6d-90fd-f915f061e731@moroto.mountain Signed-off-by: Dan Williams <dan.j.williams@intel.com>
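The bug class is easy to see in isolation; a hedged sketch follows, where the field name is illustrative rather than the exact hdm.c condition:

    struct example_info {
            bool mem_enabled;
    };

    /* Buggy: when info is NULL, '!info' is true and '&&' still evaluates
     * the right-hand side, dereferencing the NULL pointer. */
    static bool needs_fallback_buggy(const struct example_info *info)
    {
            return !info && !info->mem_enabled;
    }

    /* Fixed: '||' short-circuits on the NULL check first. */
    static bool needs_fallback_fixed(const struct example_info *info)
    {
            return !info || !info->mem_enabled;
    }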
2023-10-31Merge branch 'for-6.7/cxl-commited' into cxl/nextDan Williams5-6/+40
Add the committed decoder sysfs attribute for v6.7.
2023-10-31Merge branch 'for-6.7/cxl' into cxl/nextDan Williams4-5/+11
Pickup some misc. CXL updates for v6.7.
2023-10-31Merge branch 'for-6.7/cxl-qtg' into cxl/nextDan Williams5-12/+60
Merge some prep-work for CXL QOS class support. This cycle saw large collisions with mm on this topic, so the bulk of this topic needs to wait.
2023-10-31Merge branch 'for-6.7/cxl-rch-eh' into cxl/nextDan Williams10-129/+406
Restricted CXL Host (RCH) Error Handling undoes the topology munging of CXL 1.1 to enable some AER recovery, and lands some base infrastructure for handling Root-Complex-Event-Collectors (RCECs) with CXL. Include this long running series finally for v6.7.
2023-10-27cxl: Add support for reading CXL switch CDAT tableDave Jiang2-5/+20
Add a read_cdat_data() call in cxl_switch_port_probe() to allow reading of CDAT data for CXL switches. read_cdat_data() needs to be adjusted to retrieve the PCIe device depending on whether the passed in port is an endpoint or a switch. Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/169713682855.2205276.6418370379144967443.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl: Add checksum verification to CDAT from CXLDave Jiang1-7/+23
A CDAT table is available from a CXL device. The table is read by the driver and cached in software. With the CXL subsystem needing to parse the CDAT table, the checksum should be verified. Add checksum verification after the CDAT table is read from device. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/169713682277.2205276.2687265961314933628.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl: Export QTG ids from CFMWS to sysfs as qos_class attributeDave Jiang3-0/+17
Export the QoS Throttling Group ID from the CXL Fixed Memory Window Structure (CFMWS) under the root decoder sysfs attributes as qos_class. CXL rev3.0 9.17.1.3 CXL Fixed Memory Window Structure (CFMWS) The cxl cli will use this id to match against the _DSM-retrieved id for a hot-plugged CXL memory device's DPA memory range, to make sure that the DPA range is under the right CFMWS window. Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/169713681699.2205276.14475306324720093079.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl: Add decoders_committed sysfs attribute to cxl_portDave Jiang1-0/+25
This attribute allows cxl-cli to determine whether there are decoders committed to a memdev. This is only a snapshot of the state, and doesn't offer any protection or serialization against a concurrent disable-region operation. Reviewed-by: Jim Harris <jim.harris@samsung.com> Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/169747907439.272156.10261062080830155662.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl: Add cxl_decoders_committed() helperDave Jiang5-6/+15
Add a helper to retrieve the number of decoders committed for the port. Replace all the open coding of the calculation with the helper. Link: https://lore.kernel.org/linux-cxl/651c98472dfed_ae7e729495@dwillia2-xfh.jf.intel.com.notmuch/ Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Jim Harris <jim.harris@samsung.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://lore.kernel.org/r/169747906849.272156.1729290904857372335.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
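A minimal sketch of what such a helper can look like, assuming the port tracks the highest committed decoder id in a @commit_end field (an assumption here, with -1 meaning none committed):

    /* Illustrative helper; @commit_end is assumed to be the highest
     * committed decoder id for the port, or -1 when none are committed. */
    static inline int decoders_committed_sketch(struct cxl_port *port)
    {
            return port->commit_end + 1;
    }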
2023-10-27cxl/core/regs: Rework cxl_map_pmu_regs() to use map->dev for devmRobert Richter3-6/+4
struct cxl_register_map carries a @dev parameter for devm operations. Simplify the function interface to use that instead of a separate @dev argument. Signed-off-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20231018171713.1883517-21-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl/core/regs: Rename phys_addr in cxl_map_component_regs()Robert Richter1-3/+3
Trivial change that renames variable phys_addr in cxl_map_component_regs() to shorten its length to keep the 80 char size limit for the line and also for consistency between the different paths. Signed-off-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20231018171713.1883517-20-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl/pci: Disable root port interrupts in RCH modeTerry Bowman1-0/+32
The RCH root port contains root command AER registers that should not be enabled.[1] Disable these to prevent root port interrupts. [1] CXL 3.0 - 12.2.1.1 RCH Downstream Port-detected Errors Signed-off-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20231018171713.1883517-17-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl/pci: Add RCH downstream port error loggingTerry Bowman1-0/+96
RCH downstream port error logging is missing in the current CXL driver. The missing AER and RAS error logging is needed for communicating driver error details to userspace. Update the driver to include PCIe AER and CXL RAS error logging. Add RCH downstream port error handling into the existing RCiEP handler. The downstream port error handler is added to the RCiEP error handler because the downstream port is implemented in a RCRB, is not PCI enumerable, and as a result is not directly accessible to the PCI AER root port driver. The AER root port driver calls the RCiEP handler for handling RCD errors and RCH downstream port protocol errors. Update existing RCiEP correctable and uncorrectable handlers to also call the RCH handler. The RCH handler will read the RCH AER registers, check for error severity, and if an error exists will log using an existing kernel AER trace routine. The RCH handler will also log downstream port RAS errors if they exist. Co-developed-by: Robert Richter <rrichter@amd.com> Signed-off-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20231018171713.1883517-16-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl/pci: Map RCH downstream AER registers for logging protocol errorsTerry Bowman2-0/+46
The restricted CXL host (RCH) error handler will log protocol errors using AER and RAS status registers. The AER and RAS registers need to be virtually memory mapped before enabling interrupts. Create the initializer function devm_cxl_setup_parent_dport() for this when the endpoint is connected with the dport. The initialization sets up the RCH RAS and AER mappings. Add 'struct cxl_regs' to 'struct cxl_dport' for saving a pointer to the RCH downstream port's AER and RAS registers. Signed-off-by: Terry Bowman <terry.bowman@amd.com> Co-developed-by: Robert Richter <rrichter@amd.com> Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20231018171713.1883517-15-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl/pci: Update CXL error logging to use RAS register addressTerry Bowman1-13/+31
The CXL error handler currently only logs endpoint RAS status. The CXL topology includes several components providing RAS details to be logged during error handling.[1] Update the current handler's RAS logging to use a RAS register address. Also, update the error handler function names to be consistent with correctable and uncorrectable RAS. This will allow for adding support to log other CXL component's RAS details in the future. [1] CXL3.0 Table 8-22 CXL_Capability_ID Assignment Co-developed-by: Robert Richter <rrichter@amd.com> Signed-off-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20231018171713.1883517-14-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27PCI/AER: Refactor cper_print_aer() for use by CXL driver moduleTerry Bowman1-0/+1
The CXL driver plans to use cper_print_aer() for logging restricted CXL host (RCH) AER errors. cper_print_aer() is not currently exported and therefore not usable by the CXL drivers built as loadable modules. Export the cper_print_aer() function. Use the EXPORT_SYMBOL_NS_GPL() variant to restrict the export to CXL drivers. The CONFIG_ACPI_APEI_PCIEAER kernel config is currently used to enable cper_print_aer(). cper_print_aer() logs the AER registers and is useful in PCIE AER logging outside of APEI. Remove the CONFIG_ACPI_APEI_PCIEAER dependency to enable cper_print_aer(). The cper_print_aer() function name implies CPER specific use but is useful in non-CPER cases as well. Rename cper_print_aer() to pci_print_aer(). Also, update cxl_core to import the CXL namespace. Co-developed-by: Robert Richter <rrichter@amd.com> Signed-off-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Robert Richter <rrichter@amd.com> Cc: Mahesh J Salgaonkar <mahesh@linux.ibm.com> Cc: Oliver O'Halloran <oohall@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20231018171713.1883517-13-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl/pci: Add RCH downstream port AER register discoveryRobert Richter5-0/+61
Restricted CXL host (RCH) downstream port AER information is not currently logged while in the error state. One problem preventing the error logging is the AER and RAS registers are not accessible. The CXL driver requires changes to find RCH downstream port AER and RAS registers for purpose of error logging. RCH downstream ports are not enumerated during a PCI bus scan and are instead discovered using system firmware, ACPI in this case.[1] The downstream port is implemented as a Root Complex Register Block (RCRB). The RCRB is a 4k memory block containing PCIe registers based on the PCIe root port.[2] The RCRB includes AER extended capability registers used for reporting errors. Note, the RCH's AER Capability is located in the RCRB memory space instead of PCI configuration space, thus its register access is different. Existing kernel PCIe AER functions can not be used to manage the downstream port AER capabilities and RAS registers because the port was not enumerated during PCI scan and the registers are not PCI config accessible. Discover RCH downstream port AER extended capability registers. Use MMIO accesses to search for extended AER capability in RCRB register space. [1] CXL 3.0 Spec, 9.11.2 - System Firmware View of CXL 1.1 Hierarchy [2] CXL 3.0 Spec, 8.2.1.1 - RCH Downstream Port RCRB Co-developed-by: Robert Richter <rrichter@amd.com> Signed-off-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20231018171713.1883517-12-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl/port: Remove Component Register base address from struct cxl_portRobert Richter2-5/+1
The Component Register base address @component_reg_phys is no longer used after the rework of the Component Register setup which now uses struct member @reg_map instead. Remove the base address. Signed-off-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20231018171713.1883517-10-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl/pci: Remove Component Register base address from struct cxl_dev_stateRobert Richter2-5/+0
The Component Register base address @component_reg_phys is no longer used after the rework of the Component Register setup which now uses struct member @reg_map instead. Remove the base address. Signed-off-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20231018171713.1883517-9-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl/hdm: Use stored Component Register mappings to map HDM decoder capabilityRobert Richter3-39/+43
Now that the Component Register mappings are stored, use them to enable and map the HDM decoder capabilities. The Component Registers do not need to be probed again for this; remove the probing code. The HDM capability applies to Endpoints, USPs and VH Host Bridges. The endpoint's component register mappings are located in the cxlds, otherwise in the port's structure. Duplicate the cxlds->reg_map in port->reg_map for endpoint ports. Signed-off-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> [rework to drop cxl_port_get_comp_map()] Link: https://lore.kernel.org/r/20231018171713.1883517-8-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl/pci: Store the endpoint's Component Register mappings in struct ↵Robert Richter3-4/+9
cxl_dev_state Same as for ports and dports, also store the endpoint's Component Register mappings; use struct cxl_dev_state for that. Keep the Component Register base address @component_reg_phys around for now to not break functionality. It will be removed after the transition in a later patch. Signed-off-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20231018171713.1883517-7-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl/port: Pre-initialize component register mappingsRobert Richter1-5/+7
The component registers of a component may not exist and cxl_setup_comp_regs() will fail for that reason. In another case, software may not use or set up those registers. cxl_setup_comp_regs() is then called with a base address of CXL_RESOURCE_NONE. Both are valid cases, but the function returns without initializing the register map. Now, a missing component register block is not necessarily a reason to fail (the feature is optional or its existence is checked later). Change cxl_setup_comp_regs() to also accept components with the component register block missing. Thus, always initialize struct cxl_register_map with valid values, set @dev and make @resource CXL_RESOURCE_NONE. The change is in preparation for follow-on patches. Signed-off-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20231018171713.1883517-6-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl/port: Rename @comp_map to @reg_map in struct cxl_register_mapRobert Richter2-7/+7
Name the field @reg_map, because @reg_map->host will be used for mapping operations beyond component registers (i.e. AER registers). This is valid for all occurrences of @comp_map. Change them all. Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20231018171713.1883517-5-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl/port: Fix @host confusion in cxl_dport_setup_regs()Dan Williams1-12/+31
commit 5d2ffbe4b81a ("cxl/port: Store the downstream port's Component Register mappings in struct cxl_dport") ...moved the dport component registers from a raw component_reg_phys passed in at dport instantiation time to a 'struct cxl_register_map' populated with both the component register data *and* the "host" device for mapping operations. While typical CXL switch dports are mapped by their associated 'struct cxl_port', an RCH host bridge dport registered by cxl_acpi needs to wait until the cxl_mem driver makes the attachment to map the registers. This is because there are no intervening 'struct cxl_port' instances between the root cxl_port and the endpoint port in an RCH topology. For now just mark the host as NULL in the RCH dport case until code that needs to map the dport registers arrives. This patch is not flagged for -stable since nothing in the current driver uses the dport->comp_map. Now, I am slightly uneasy that cxl_setup_comp_regs() sets map->host to a wrong value and then cxl_dport_setup_regs() fixes it up, but the alternatives I came up with are more messy. For example, adding an @logdev to 'struct cxl_register_map' that the dev_printk()s can fall back to when @host is NULL. I settled on "post-fixup+comment" since it is only RCH dports that have this special case where register probing is split between a host-bridge RCRB lookup and when cxl_mem_probe() does the association of the cxl_memdev and endpoint port. [moved rename of @comp_map to @reg_map into next patch] Fixes: 5d2ffbe4b81a ("cxl/port: Store the downstream port's Component Register mappings in struct cxl_dport") Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20231018171713.1883517-4-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl/core/regs: Rename @dev to @host in struct cxl_register_mapRobert Richter5-20/+20
The primary role of @dev is to host the mappings for devm operations. @dev is too ambiguous as a name. I.e. when does @dev refer to the 'struct device *' instance that the registers belong to, and when does @dev refer to the 'struct device *' instance hosting the mapping for devm operations? Clarify the role of @dev in cxl_register_map by renaming it to @host. Also, rename local variables to 'host' where map->host is used. Signed-off-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20231018171713.1883517-3-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl/port: Fix delete_endpoint() vs parent unregistration raceDan Williams1-15/+19
The CXL subsystem, at cxl_mem ->probe() time, establishes a lineage of ports (struct cxl_port objects) between an endpoint and the root of a CXL topology. Each port including the endpoint port is attached to the cxl_port driver. Given that setup, it follows that when any port in that lineage goes through a cxl_port ->remove() event, or the memdev goes through a cxl_mem ->remove() event, the hierarchy below the removed port (or the entire hierarchy if the memdev is removed) needs to come down. The delete_endpoint() callback is careful to check whether it is being called to tear down the hierarchy, or if it is only being called to tear down the memdev because an ancestor port is going through ->remove(). That care needs to take the device_lock() of the endpoint's parent. Which requires 2 bugs to be fixed: 1/ A reference on the parent is needed to prevent use-after-free scenarios like this signature: BUG: spinlock bad magic on CPU#0, kworker/u56:0/11 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023 Workqueue: cxl_port detach_memdev [cxl_core] RIP: 0010:spin_bug+0x65/0xa0 Call Trace: do_raw_spin_lock+0x69/0xa0 __mutex_lock+0x695/0xb80 delete_endpoint+0xad/0x150 [cxl_core] devres_release_all+0xb8/0x110 device_unbind_cleanup+0xe/0x70 device_release_driver_internal+0x1d2/0x210 detach_memdev+0x15/0x20 [cxl_core] process_one_work+0x1e3/0x4c0 worker_thread+0x1dd/0x3d0 2/ In the case of RCH topologies, the parent device that needs to be locked is not always @port->dev as returned by cxl_mem_find_port(), use endpoint->dev.parent instead. Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver") Cc: <stable@vger.kernel.org> Reported-by: Robert Richter <rrichter@amd.com> Closes: http://lore.kernel.org/r/20231018171713.1883517-2-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
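For bug 1/ the general idiom is to pin the parent with a reference before taking its device_lock(); a minimal sketch of that pattern (illustrative, not the delete_endpoint() code itself):

    #include <linux/device.h>

    /* Generic "lock the parent safely" idiom, illustrative only */
    static void teardown_under_parent_lock(struct device *endpoint)
    {
            struct device *parent = get_device(endpoint->parent);

            if (!parent)
                    return;

            device_lock(parent);
            /* ... decide whether the hierarchy teardown is already underway ... */
            device_unlock(parent);
            put_device(parent);
    }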
2023-10-27cxl/region: Fix x1 root-decoder granularity calculationsJim Harris1-1/+8
Root decoder granularity must match the value from the CFMWS, which may not be the region's granularity for non-interleaved root decoders. So when calculating granularities for host bridge decoders, use the region's granularity instead of the root decoder's granularity to ensure the correct granularities are set for the host bridge decoders and any downstream switch decoders. Test configuration is 1 host bridge * 2 switches * 2 endpoints per switch. Region created with 2048 granularity using following command line: cxl create-region -m -d decoder0.0 -w 4 mem0 mem2 mem1 mem3 \ -g 2048 -s 2048M Use "cxl list -PDE | grep granularity" to get a view of the granularity set at each level of the topology. Before this patch: "interleave_granularity":2048, "interleave_granularity":2048, "interleave_granularity":512, "interleave_granularity":2048, "interleave_granularity":2048, "interleave_granularity":512, "interleave_granularity":256, After: "interleave_granularity":2048, "interleave_granularity":2048, "interleave_granularity":4096, "interleave_granularity":2048, "interleave_granularity":2048, "interleave_granularity":4096, "interleave_granularity":2048, Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Cc: <stable@vger.kernel.org> Signed-off-by: Jim Harris <jim.harris@samsung.com> Link: https://lore.kernel.org/r/169824893473.1403938.16110924262989774582.stgit@bgt-140510-bm03.eng.stellus.in [djbw: fixup the prebuilt cxl_test region] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl/region: Fix cxl_region_rwsem lock held when returning to user spaceLi Zhijian1-1/+1
Fix a missed "goto out" to unlock on error to cleanup this splat: WARNING: lock held when returning to user space! 6.6.0-rc3-lizhijian+ #213 Not tainted ------------------------------------------------ cxl/673 is leaving the kernel with locks still held! 1 lock held by cxl/673: #0: ffffffffa013b9d0 (cxl_region_rwsem){++++}-{3:3}, at: commit_store+0x7d/0x3e0 [cxl_core] In terms of user visible impact of this bug for backports: cxl_region_invalidate_memregion() on x86 invokes wbinvd which is a problematic instruction for virtualized environments. So, on virtualized x86, cxl_region_invalidate_memregion() returns an error. This failure case got missed because CXL memory-expander device passthrough is not a production use case, and emulation of CXL devices is typically limited to kernel development builds with CONFIG_CXL_REGION_INVALIDATION_TEST=y, that makes cxl_region_invalidate_memregion() succeed. In other words, the expected exposure of this bug is limited to CXL subsystem development environments using QEMU that neglected CONFIG_CXL_REGION_INVALIDATION_TEST=y. Fixes: d1257d098a5a ("cxl/region: Move cache invalidation before region teardown, and before setup") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20231025085450.2514906-1-lizhijian@fujitsu.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl/region: Use cxl_calc_interleave_pos() for auto-discoveryAlison Schofield1-112/+15
For auto-discovered regions the driver must assign each target to a valid position in the region interleave set based on the decoder topology. The current implementation fails to parse valid decode topologies, as it does not consider the child offset into a parent port. The sort put all targets of one port ahead of another port when an interleave was expected, causing the region assembly to fail. Replace the existing relative sort with cxl_calc_interleave_pos() that finds the exact position in a region interleave for an endpoint based on a walk up the ancestral tree from endpoint to root decoder. cxl_calc_interleave_pos() was introduced in a prior patch, so the work here is to use it in cxl_region_sort_targets(). Remove the obsoleted helper functions from the prior sort. Testing passes on pre-production hardware with BIOS defined regions that natively trigger this autodiscovery path of the region driver. Testing passes a CXL unit test using the dev_dbg() calculation test (see cxl_region_attach()) across an expanded set of region configs: 1, 1, 1+1, 1+1+1, 2, 2+2, 2+2+2, 2+2+2+2, 4, 4+4, where each number represents the count of endpoints per host bridge. Fixes: a32320b71f08 ("cxl/region: Add region autodiscovery") Reported-by: Dmytro Adamenko <dmytro.adamenko@intel.com> Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jim Harris <jim.harris@samsung.com> Link: https://lore.kernel.org/r/3946cc55ddc19678733eddc9de2c317749f43f3b.1698263080.git.alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl/region: Calculate a target position in a region interleaveAlison Schofield1-0/+127
Introduce a calculation to find a target's position in a region interleave. Perform a self-test of the calculation on user-defined regions. The region driver uses the kernel sort() function to put region targets in relative order. Positions are assigned based on each target's index in that sorted list. That relative sort doesn't consider the offset of a port into its parent port which causes some auto-discovered regions to fail creation. In one failure case, a 2 + 2 config (2 host bridges each with 2 endpoints), the sort puts all the targets of one port ahead of another port when they were expected to be interleaved. In preparation for repairing the autodiscovery region assembly, introduce a new method for discovering a target position in the region interleave. cxl_calc_interleave_pos() adds a method to find the target position by ascending from an endpoint to a root decoder. The calculation starts with the endpoint's local position and position in the parent port. It traverses towards the root decoder and examines both position and ways in order to allow the position to be refined all the way to the root decoder. This calculation: position = position * parent_ways + parent_pos; applied iteratively yields the correct position. Include a self-test that exercises this new position calculation against every successfully configured user-defined region. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Link: https://lore.kernel.org/r/0ac32c75cf81dd8b86bf07d70ff139d33c2300bc.1698263080.git.alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
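A compact sketch of that iteration (plain C; the per-level position/ways arrays and their parent-to-root ordering are assumptions for illustration):

    /* Illustrative only: apply "pos = pos * parent_ways + parent_pos" while
     * walking from the endpoint's parent port up toward the root decoder. */
    static int calc_interleave_pos_sketch(int endpoint_pos, int levels,
                                          const int *parent_pos,
                                          const int *parent_ways)
    {
            int pos = endpoint_pos;
            int i;

            for (i = 0; i < levels; i++)
                    pos = pos * parent_ways[i] + parent_pos[i];

            return pos;
    }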
2023-10-26cxl/region: Prepare the decoder match range helper for reuseAlison Schofield1-6/+11
match_decoder_by_range() and decoder_match_range() both determine if an HPA range matches a decoder. The first does it for root decoders and the second one operates on switch decoders. Tidy these up with clear naming and make the switch helper more like the root decoder helper in style and functionality. Make it take the actual range, rather than an endpoint decoder from which it extracts the range. Require an exact match on switch decoders, because unlike a root decoder that maps an entire region, Linux only supports 1:1 mapping of switch to endpoint decoders. Note that root-decoders are a super-set of switch-decoders and the range they cover is a super-set of a region, hence the use of range_contains() for that case. Aside from aesthetics and maintainability, this is in preparation for reuse. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Jim Harris <jim.harris@samsung.com> Link: https://lore.kernel.org/r/011b1f498e1758bb8df17c5951be00bd8d489e3b.1698263080.git.alison.schofield@intel.com [djbw: fixup root decoder vs switch decoder range checks] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-24cxl/mbox: Remove useless cast in cxl_mem_create_range_info()Alison Schofield1-2/+1
DEFINE_RES_MEM() is a wrapper around the DEFINE_RES_NAMED() macro which already has the (struct resource) for the compound literal. The user of the macro should not repeat the cast. Cleans up these sparse warnings: drivers/cxl/core/mbox.c:1184:18: warning: cast to non-scalar drivers/cxl/core/mbox.c:1184:18: warning: cast from non-scalar Fixes: 52c4d11f1dce ("resource: Convert DEFINE_RES_NAMED() to be compound literal") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20230815172052.22514-1-alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
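For illustration, the warning comes from casting a macro that already expands to a compound literal; a sketch of the before/after usage with hypothetical variable names:

    #include <linux/ioport.h>

    /* DEFINE_RES_MEM() already yields a (struct resource) compound literal
     * via DEFINE_RES_NAMED(), so repeating the cast is what sparse flags. */
    static void fill_res_sketch(struct resource *res, resource_size_t start,
                                resource_size_t size)
    {
            /* Before: *res = (struct resource)DEFINE_RES_MEM(start, size); */
            *res = DEFINE_RES_MEM(start, size);
    }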
2023-10-24cxl/region: Do not try to cleanup after cxl_region_setup_targets() failsJim Harris1-7/+7
Commit 5e42bcbc3fef ("cxl/region: decrement ->nr_targets on error in cxl_region_attach()") tried to avoid 'eiw' initialization errors when ->nr_targets exceeded 16, by just decrementing ->nr_targets when cxl_region_setup_targets() failed. Commit 86987c766276 ("cxl/region: Cleanup target list on attach error") extended that cleanup to also clear cxled->pos and p->targets[pos]. The initialization error was incidentally fixed separately by: Commit 8d4285425714 ("cxl/region: Fix port setup uninitialized variable warnings") which was merged a few days after 5e42bcbc3fef. But now the original cleanup when cxl_region_setup_targets() fails prevents endpoint and switch decoder resources from being reused: 1) the cleanup does not set the decoder's region to NULL, which results in future dpa_size_store() calls returning -EBUSY 2) the decoder is not properly freed, which results in future commit errors associated with the upstream switch Now that the initialization errors were fixed separately, the proper cleanup for this case is to just return immediately. Then the resources associated with this target get cleanup up as normal when the failed region is deleted. The ->nr_targets decrement in the error case also helped prevent a p->targets[] array overflow, so add a new check to prevent against that overflow. Tested by trying to create an invalid region for a 2 switch * 2 endpoint topology, and then following up with creating a valid region. Fixes: 5e42bcbc3fef ("cxl/region: decrement ->nr_targets on error in cxl_region_attach()") Cc: <stable@vger.kernel.org> Signed-off-by: Jim Harris <jim.harris@samsung.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/169703589120.1202031.14696100866518083806.stgit@bgt-140510-bm03.eng.stellus.in Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-09cxl/mem: Fix shutdown orderDan Williams1-1/+1
Ira reports that removing cxl_mock_mem causes a crash with the following trace: BUG: kernel NULL pointer dereference, address: 0000000000000044 [..] RIP: 0010:cxl_region_decode_reset+0x7f/0x180 [cxl_core] [..] Call Trace: <TASK> cxl_region_detach+0xe8/0x210 [cxl_core] cxl_decoder_kill_region+0x27/0x40 [cxl_core] cxld_unregister+0x29/0x40 [cxl_core] devres_release_all+0xb8/0x110 device_unbind_cleanup+0xe/0x70 device_release_driver_internal+0x1d2/0x210 bus_remove_device+0xd7/0x150 device_del+0x155/0x3e0 device_unregister+0x13/0x60 devm_release_action+0x4d/0x90 ? __pfx_unregister_port+0x10/0x10 [cxl_core] delete_endpoint+0x121/0x130 [cxl_core] devres_release_all+0xb8/0x110 device_unbind_cleanup+0xe/0x70 device_release_driver_internal+0x1d2/0x210 bus_remove_device+0xd7/0x150 device_del+0x155/0x3e0 ? lock_release+0x142/0x290 cdev_device_del+0x15/0x50 cxl_memdev_unregister+0x54/0x70 [cxl_core] This crash is due to the clearing out the cxl_memdev's driver context (@cxlds) before the subsystem is done with it. This is ultimately due to the region(s), that this memdev is a member, being torn down and expecting to be able to de-reference @cxlds, like here: static int cxl_region_decode_reset(struct cxl_region *cxlr, int count) ... if (cxlds->rcd) goto endpoint_reset; ... Fix it by keeping the driver context valid until memdev-device unregistration, and subsequently the entire stack of related dependencies, unwinds. Fixes: 9cc238c7a526 ("cxl/pci: Introduce cdevm_file_operations") Reported-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-06cxl/memdev: Fix sanitize vs decoder setup lockingDan Williams8-49/+90
The sanitize operation is destructive and the expectation is that the device is unmapped while in progress. The current implementation does a lockless check for decoders being active, but then does nothing to prevent decoders from racing to be committed. Introduce state tracking to resolve this race. This incidentally also stops unprivileged userspace from triggering mmio read cycles by spinning on reading the 'security/state' attribute. Which at a minimum is a waste since the kernel state machine can cache the completion result. Lastly cxl_mem_sanitize() was mistakenly marked EXPORT_SYMBOL() in the original implementation, but an export was never required. Fixes: 0c36b6ad436a ("cxl/mbox: Add sanitization handling machinery") Cc: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-06cxl/pci: Fix sanitize notifier setupDan Williams3-42/+50
Fix a race condition between the mailbox-background command interrupt firing and the security-state sysfs attribute being removed. The race is difficult to see due to the awkward placement of the sanitize-notifier setup code and the multiple places the teardown calls are made, cxl_memdev_security_init() and cxl_memdev_security_shutdown(). Unify setup in one place, cxl_sanitize_setup_notifier(). Arrange for the paired cxl_sanitize_teardown_notifier() to safely quiet the notifier and let the cxl_memdev + irq be unregistered later in the flow. Note: The special wrinkle of the sanitize notifier is that it interacts with interrupts, which are enabled early in the flow, and it interacts with memdev sysfs which is not initialized until late in the flow. Hence why this setup routine takes an @cxlmd argument, and not just @mds. This fix is also needed as a preparation fix for a memdev unregistration crash. Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Closes: http://lore.kernel.org/r/20230929100316.00004546@Huawei.com Cc: Dave Jiang <dave.jiang@intel.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Fixes: 0c36b6ad436a ("cxl/mbox: Add sanitization handling machinery") Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-06cxl/pci: Clarify devm host for memdev relative setupDan Williams3-12/+13
It is all too easy to get confused about @dev usage in the CXL driver stack. Before adding a new cxl_pci_probe() setup operation that has a devm lifetime dependent on @cxlds->dev binding, but also references @cxlmd->dev, and prints messages, rework the devm_cxl_add_memdev() and cxl_memdev_setup_fw_upload() function signatures to make this distinction explicit. I.e. pass in the devm context as an @host argument rather than infer it from other objects. This is in preparation for adding a devm_cxl_sanitize_setup_notifier(). Note the whitespace fixup near the change of the devm_cxl_add_memdev() signature. That uncaught typo originated in the patch that added cxl_memdev_security_init(). Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-06cxl/pci: Remove inconsistent usage of dev_err_probe()Dan Williams1-11/+2
If dev_err_probe() is to be used it should at least be used consistently within the same function. It is also worth questioning whether every potential -ENOMEM needs an explicit error message. Remove the cxl_setup_fw_upload() error prints for what are rare / hardware-independent failures. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-06cxl/pci: Remove hardirq handler for cxl_request_irq()Dan Williams1-6/+6
Now that all callers of cxl_request_irq() are using threaded irqs, drop the hardirq handler option. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-29cxl/pci: Cleanup 'sanitize' to always pollDan Williams3-39/+26
In preparation for fixing the init/teardown of the 'sanitize' workqueue and sysfs notification mechanism, arrange for cxl_mbox_sanitize_work() to be the single location where the sysfs attribute is notified. With that change there is no distinction between polled mode and interrupt mode. All the interrupt does is accelerate the polling interval. The change to check for "mds->security.sanitize_node" under the lock is there to ensure that the interrupt, the work routine and the setup/teardown code can all have a consistent view of the registered notifier and the workqueue state. I.e. the expectation is that the interrupt is live past the point that the sanitize sysfs attribute is published, and it may race teardown, so it must be consulted under a lock. Given that new locking requirement, cxl_pci_mbox_irq() is moved from hard to thread irq context. Lastly, some opportunistic replacements of "queue_delayed_work(system_wq, ...)", which is just open coded schedule_delayed_work(), are included. Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-29cxl/pci: Remove unnecessary device reference management in sanitize workDan Williams1-5/+0
Given that any particular put_device() could be the final put of the device, the fact that there are usages of cxlds->dev after put_device(cxlds->dev) is a red flag. Drop the reference counting since the device is pinned by being registered and will not be unregistered without triggering the driver + workqueue to shutdown. Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-22cxl/acpi: Annotate struct cxl_cxims_data with __counted_byKees Cook1-2/+2
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct cxl_cxims_data. Additionally, since the element count member must be set before accessing the annotated flexible array member, move its initialization earlier. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-cxl@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20230922175319.work.096-kees@kernel.org Signed-off-by: Dan Williams <dan.j.williams@intel.com>
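A minimal sketch of the annotation pattern being applied (the struct and allocator below are illustrative, not the exact cxl_cxims_data definitions):

    #include <linux/slab.h>

    /* Illustrative __counted_by() usage: the count member must be set
     * before the flexible array it describes is accessed. */
    struct xor_maps_sketch {
            int nr_maps;
            u64 xormaps[] __counted_by(nr_maps);
    };

    static struct xor_maps_sketch *alloc_xor_maps(int nr)
    {
            struct xor_maps_sketch *d;

            d = kzalloc(struct_size(d, xormaps, nr), GFP_KERNEL);
            if (!d)
                    return NULL;

            d->nr_maps = nr;        /* initialize the count first */
            return d;
    }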
2023-09-22cxl/port: Fix cxl_test register enumeration regressionDan Williams1-4/+9
The cxl_test unit test environment models a CXL topology for sysfs/user-ABI regression testing. It uses interface mocking via the "--wrap=" linker option to redirect cxl_core routines that parse hardware registers with versions that just publish objects, like devm_cxl_enumerate_decoders(). Starting with: Commit 19ab69a60e3b ("cxl/port: Store the port's Component Register mappings in struct cxl_port") ...port register enumeration is moved into devm_cxl_add_port(). This conflicts with the "cxl_test avoids emulating registers" stance, so either the port code needs to be refactored (too violent), or modified so that register enumeration is skipped on "fake" cxl_test ports (annoying, but straightforward). This conflict has happened previously and the "check for platform device" workaround to avoid intrusive refactoring was deployed in those scenarios. In general, refactoring should only benefit production code; test code needs to remain minimally intrusive to the greatest extent possible. This was missed previously because it may sometimes just cause warning messages to be emitted, but it can also cause test failures. The backport to -stable is only nice to have for clean cxl_test runs. Fixes: 19ab69a60e3b ("cxl/port: Store the port's Component Register mappings in struct cxl_port") Cc: stable@vger.kernel.org Reported-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/169476525052.1013896.6235102957693675187.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-15cxl/pci: Update commentIra Weiny1-1/+4
The existence of struct cxl_dev_id containing a single member is odd. The comment made sense when I wrote it but could be clarified. Update the comment and place it next to the odd looking structure. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20230426-cxl-fixes-v1-2-870c4c8b463a@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-15cxl/port: Quiet warning messages from the cxl_test environmentDan Williams2-2/+7
The cxl_test platform device CXL port hierarchy is useful for testing, but throws warning messages of the form: cxl_mem mem2: at cxl_root_port.1 no parent for dport: platform cxl_mem mem3: at cxl_root_port.2 no parent for dport: platform cxl_mem mem4: at cxl_root_port.3 no parent for dport: platform cxl_mem mem5: at cxl_root_port.0 no parent for dport: platform cxl_mem mem6: at cxl_root_port.1 no parent for dport: platform cxl_mem mem7: at cxl_root_port.2 no parent for dport: platform cxl_mem mem8: at cxl_root_port.3 no parent for dport: platform cxl_mem mem9: at cxl_root_port.4 no parent for dport: platform cxl_mem mem10: at cxl_root_port.4 no parent for dport: platform ...and this message when running testing in QEMU: cxl_region region4: Bypassing cpu_cache_invalidate_memregion() for testing! Noisy cxl_test warnings have caused other regressions to be missed. In the interest of using cxl_test for early detection of dev_err() and dev_warn() messages, silence platform device topology and cache-invalidation messages. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-14cxl/region: Refactor granularity select in cxl_port_setup_targets()Alison Schofield1-9/+8
In cxl_port_setup_targets() the region driver validates the configuration of auto-discovered region decoders, as well as decoders the driver is preparing to program. The existing calculations use the encoded interleave granularity value to create an interleave granularity that properly fans out when routing an x1 interleave to a greater than x1 interleave. That all worked well, until this config came along: Host Bridge: 2 way at 256 granularity Switch Decoder_A: 1 way at 512 Endpoint_X: 2 way at 256 Switch Decoder_B: 1 way at 512 Endpoint_Y: 2 way at 256 When the Host Bridge interleave is greater than 1 and the root decoder interleave is exactly 1, the region driver needs to consider the number of targets in the region when calculating the expected granularity. While examining the existing logic, and trying to cover the case above, a couple of simplifications appeared, hence this proposed refactoring. The first simplification is to apply the logic to the nominal values and use the existing helper function granularity_to_eig() to translate the desired granularity to the encoded form. This means the comment and code regarding setting address bits is discarded. Although that logic is not wrong, it adds a level of complexity that is not required in the granularity selection. The eig and eiw are indeed part of the routing instructions programmed into the decoders. Up-level the discussion to nominal ways and granularity for clearer analysis. The second simplification reduces the logic to a single granularity calculation that works for all cases. The new calculation doesn't care if parent_iw >= 1 because parent_iw is used as a multiplier. The refactor cleans up a useless assignment of eiw made after the iw is already calculated. Regression testing included an examination of all of the ways and granularity selections made during a run of the cxl_test unit tests. There were no differences in selections before and after this patch. Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20230822180928.117596-1-alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-14cxl/region: Match auto-discovered region decoders by HPA rangeAlison Schofield1-1/+23
Currently, when the region driver attaches a region to a port, it selects the ports next available decoder to program. With the addition of auto-discovered regions, a port decoder has already been programmed so grabbing the next available decoder can be a mismatch when there is more than one region using the port. The failure appears like this with CXL DEBUG enabled: [] cxl_core:alloc_region_ref:754: cxl region0: endpoint9: HPA order violation region0:[mem 0x14780000000-0x1478fffffff flags 0x200] vs [mem 0x880000000-0x185fffffff flags 0x200] [] cxl_core:cxl_port_attach_region:972: cxl region0: endpoint9: failed to allocate region reference When CXL DEBUG is not enabled, there is no failure message. The region just never materializes. Users can suspect this issue if they know their firmware has programmed decoders so that more than one region is using a port. Note that the problem may appear intermittently, ie not on every reboot. Add a matching method for auto-discovered regions that finds a decoder based on an HPA range. The decoder range must exactly match the region resource parameter. Fixes: a32320b71f08 ("cxl/region: Add region autodiscovery") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20230905211007.256385-1-alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-09-14cxl/mbox: Fix CEL logic for poison and security commandsIra Weiny1-11/+12
The following debug output was observed while testing CXL cxl_core:cxl_walk_cel:721: cxl_mock_mem cxl_mem.0: Opcode 0x4300 unsupported by driver opcode 0x4300 (Get Poison) is supported by the driver and the mock device supports it. The logic should be checking that the opcode is both not poison and not security. Fix the logic to allow poison and security commands. Fixes: ad64f5952ce3 ("cxl/memdev: Only show sanitize sysfs files when supported") Cc: <stable@vger.kernel.org> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20230903-cxl-cel-fix-v1-1-e260c9467be3@intel.com [cleanup cxl_walk_cel() to centralized "enabled" checks] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
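In boolean terms the bug is the difference between "not poison OR not security" (true for nearly every opcode) and "not poison AND not security"; a hedged sketch of the corrected enable decision, with helper names mirroring the driver's opcode classification:

    /* Sketch of the corrected CEL enable check; cxl_is_poison_command() and
     * cxl_is_security_command() stand for the driver's opcode classifiers. */
    static bool cel_opcode_enabled_sketch(struct cxl_mem_command *cmd, u16 opcode)
    {
            if (cmd)
                    return true;    /* opcode backed by a driver command */

            /*
             * Buggy form skipped on (!poison || !security), which is true for
             * almost all opcodes, so poison/security commands were rejected.
             */
            return cxl_is_poison_command(opcode) ||
                   cxl_is_security_command(opcode);
    }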
2023-09-11cxl/pci: Replace host_bridge->native_aer with pcie_aer_is_native()Smita Koralahalli1-2/+1
Use pcie_aer_is_native() to determine the native AER ownership as the usage of host_bride->native_aer does not cover command line override of AER ownership. Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20230823234305.27333-4-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
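A rough sketch of the intent, gating AER-dependent setup on the PCI core's ownership state instead of the cached ACPI flag (the surrounding function is hypothetical):

    #include <linux/pci.h>
    #include <linux/aer.h>

    /* Illustrative only: pcie_aer_is_native() reflects the effective AER
     * ownership, including command line overrides such as pcie_ports=native. */
    static int setup_protocol_error_reporting_sketch(struct pci_dev *pdev)
    {
            if (!pcie_aer_is_native(pdev))
                    return 0;       /* firmware-first AER, leave masks alone */

            /* ... unmask/handle CXL protocol errors via AER ... */
            return 0;
    }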
2023-09-11cxl/pci: Fix appropriate checking for _OSC while handling CXL RAS registersSmita Koralahalli1-3/+3
cxl_pci fails to unmask CXL protocol errors when CXL memory error reporting is not granted native control. Given that CXL memory error reporting uses the event interface and protocol errors use AER, unmask protocol errors based only on the native AER setting. Without this change end user deployments will fail to report protocol errors in the case where native memory error handling is not granted to Linux. Also, return zero instead of an error code to not block the communication with the cxl device when in native memory error reporting mode. Fixes: 248529edc86f ("cxl: add RAS status unmasking for CXL") Cc: <stable@vger.kernel.org> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20230823234305.27333-2-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-07-28cxl/memdev: Only show sanitize sysfs files when supportedDavidlohr Bueso3-1/+78
If the device does not support Sanitize or Secure Erase commands, hide the respective sysfs interfaces such that the operation can never be attempted. In order to be generic, keep track of the enabled security commands found in the CEL - the driver does not support Security Passthrough. Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://lore.kernel.org/r/20230726051940.3570-4-dave@stgolabs.net Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2023-07-28cxl/memdev: Document security state in kern-docDavidlohr Bueso1-0/+1
... as is the case with all members of struct cxl_memdev_state. Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://lore.kernel.org/r/20230726051940.3570-3-dave@stgolabs.net Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2023-07-18cxl/acpi: Return 'rc' instead of '0' in cxl_parse_cfmws()Breno Leitao1-1/+1
Driver initialization returned success (return 0) even if the initialization (cxl_decoder_add() or acpi_table_parse_cedt()) failed. Return the error instead of swallowing it. Fixes: f4ce1f766f1e ("cxl/acpi: Convert CFMWS parsing to ACPI sub-table helpers") Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20230714093146.2253438-2-leitao@debian.org Reviewed-by: Alison Schofield <alison.schofield@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2023-07-18cxl/acpi: Fix a use-after-free in cxl_parse_cfmws()Breno Leitao1-2/+1
KASAN and KFENCE detected a use-after-free in the CXL driver. This happens in the cxl_decoder_add() fail path. KASAN prints the following error: BUG: KASAN: slab-use-after-free in cxl_parse_cfmws (drivers/cxl/acpi.c:299) This happens in cxl_parse_cfmws(), where put_device() is called, releasing cxld, which is accessed later. Use the local variables in the dev_err() instead of pointing to the released memory. Since the dev_err() is printing a resource, change the open coded print format to use the %pr format specifier. Fixes: e50fe01e1f2a ("cxl/core: Drop ->platform_res attribute for root decoders") Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20230714093146.2253438-1-leitao@debian.org Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2023-07-14cxl/mem: Fix a double shift bugDan Carpenter1-1/+1
The CXL_FW_CANCEL macro is used with set/test_bit() so it should be a bit number and not the shifted value. The original code is the equivalent of using BIT(BIT(0)) so it's 0x2 instead of 0x1. This has no effect on runtime because it's done consistently and nothing else was using the 0x2 bit. Fixes: 9521875bbe00 ("cxl: add a firmware update mechanism using the sysfs firmware loader") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/a11b0c78-4717-4f4e-90be-f47f300d607c@moroto.mountain Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
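A sketch of the distinction, since set_bit()/test_bit() take a bit number while BIT() produces a shifted mask (the flag names below are illustrative):

    #include <linux/bitops.h>

    /*
     * Wrong: BIT(0) == 0x1, so using it as a bit number with set_bit() acts
     * on bit 1, i.e. the mask BIT(BIT(0)) == 0x2.
     * Right: pass a plain bit number to set_bit()/test_bit().
     */
    enum sketch_fw_flags {
            SKETCH_FW_CANCEL = 0,           /* bit number, not BIT(0) */
    };

    static bool fw_cancel_requested(unsigned long *state)
    {
            set_bit(SKETCH_FW_CANCEL, state);
            return test_bit(SKETCH_FW_CANCEL, state);
    }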
2023-07-14cxl: fix CONFIG_FW_LOADER dependencyArnd Bergmann1-1/+2
When FW_LOADER is disabled, cxl fails to link: arm-linux-gnueabi-ld: drivers/cxl/core/memdev.o: in function `cxl_memdev_setup_fw_upload': memdev.c:(.text+0x90e): undefined reference to `firmware_upload_register' memdev.c:(.text+0x93c): undefined reference to `firmware_upload_unregister' In order to use the firmware_upload_register() function, both FW_LOADER and FW_UPLOAD have to be enabled, which is a bit confusing. In addition, the dependency is on the wrong symbol, as the caller is part of the cxl_core.ko module, not the cxl_mem.ko module. Fixes: 9521875bbe005 ("cxl: add a firmware update mechanism using the sysfs firmware loader") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20230703112928.332321-1-arnd@kernel.org Reviewed-by: Xiao Yang <yangx.jy@fujitsu.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
2023-06-29cxl: Fix one kernel-doc commentYang Li1-1/+1
Fix a merge error that updated the argument to cxl_mem_get_fw_info() but not the kernel-doc. drivers/cxl/core/memdev.c:678: warning: Function parameter or member 'mds' not described in 'cxl_mem_get_fw_info' drivers/cxl/core/memdev.c:678: warning: Excess function parameter 'cxlds' description in 'cxl_mem_get_fw_info' Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Link: https://lore.kernel.org/r/20230629021118.102744-1-yang.lee@linux.alibaba.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-27cxl/pci: Use correct flag for sanitize pollingDavidlohr Bueso1-1/+1
This is a bogus value, left behind from a previous version. Fixes: 0c36b6ad436a ("cxl/mbox: Add sanitization handling machinery") Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://lore.kernel.org/r/7q3vcjqidtmxmys4n34g6b3mygvhaen7yikzxanpz56lw43fz7@7subbtbfkmyx Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25Merge branch 'for-6.5/cxl-rch-eh' into for-6.5/cxlDan Williams12-291/+443
Pick up the first half of the RCH error handling series. The back half needs some fixups for test regressions. Small conflicts with the PMU work around register enumeration and setup helpers.
2023-06-25Merge branch 'for-6.5/cxl-perf' into for-6.5/cxlDan Williams10-7/+224
Pick up initial support for the CXL 3.0 performance monitoring definition. Small conflicts with the firmware update work as they both placed their init code in the same location.
2023-06-25perf: CXL Performance Monitoring Unit driverJonathan Cameron1-0/+13
CXL rev 3.0 introduces a standard performance monitoring hardware block to CXL. Instances are discovered using CXL Register Locator DVSEC entries. Each CXL component may have multiple PMUs. This initial driver supports a subset of types of counter. It supports counters that are either fixed or configurable, but requires that they support the ability to freeze and write value whilst frozen. Development done with QEMU model which will be posted shortly. Example: $ perf stat -a -e cxl_pmu_mem0.0/h2d_req_snpcur/ -e cxl_pmu_mem0.0/h2d_req_snpdata/ -e cxl_pmu_mem0.0/clock_ticks/ sleep 1 Performance counter stats for 'system wide': 96,757,023,244,321 cxl_pmu_mem0.0/h2d_req_snpcur/ 96,757,023,244,365 cxl_pmu_mem0.0/h2d_req_snpdata/ 193,514,046,488,653 cxl_pmu_mem0.0/clock_ticks/ 1.090539600 seconds time elapsed Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20230526095824.16336-5-Jonathan.Cameron@huawei.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25Merge branch 'for-6.5/cxl-region-fixes' into for-6.5/cxlDan Williams2-46/+72
Pick up the recent fixes to how CPU caches are managed relative to region setup / teardown, and make sure that all decoders transition successfully before updating the region state from COMMIT => ACTIVE.
2023-06-25Merge branch 'for-6.5/cxl-type-2' into for-6.5/cxlDan Williams16-445/+515
Pick up the driver cleanups identified in preparation for CXL "type-2" (accelerator) device support. The major change here from a conflict generation perspective is the split of 'struct cxl_memdev_state' from the core 'struct cxl_dev_state', since an accelerator may not care about all the optional features that are standard on a CXL "type-3" (host-only memory expander) device. A silent conflict also occurs with the move of the endpoint port to be a formal property of a 'struct cxl_memdev' rather than drvdata.
2023-06-25Merge branch 'for-6.5/cxl-fwupd' into for-6.5/cxlDan Williams4-0/+395
Add the first typical (non-sanitization) consumer of the new background command infrastructure, firmware update. Given both firmware-update and sanitization were developed in parallel from the common background-command baseline, resolve some minor context conflicts.
2023-06-25Merge branch 'for-6.5/cxl-background' into for-6.5/cxlDan Williams6-21/+434
Pick up the sanitization work and the infrastructure for other background commands for 6.5. Sanitization has a different completion path than typical background commands so it was important to have both thought out and implemented before either went upstream.
2023-06-25cxl: add a firmware update mechanism using the sysfs firmware loaderVishal Verma4-0/+395
The sysfs based firmware loader mechanism was created to easily allow userspace to upload firmware images to FPGA cards. This also happens to be pretty suitable to create a user-initiated but kernel-controlled firmware update mechanism for CXL devices, using the CXL specified mailbox commands. Since firmware update commands can be long-running, and can be processed in the background by the endpoint device, it is desirable to have the ability to chunk the firmware transfer down to smaller pieces, so that one operation does not monopolize the mailbox, locking out any other long running background commands entirely - e.g. security commands like 'sanitize' or poison scanning operations. The firmware loader mechanism allows a natural way to perform this chunking, as after each mailbox command, that is restricted to the maximum mailbox payload size, the cxl memdev driver relinquishes control back to the fw_loader system and awaits the next chunk of data to transfer. This opens opportunities for other background commands to access the mailbox and send their own slices of background commands. Add the necessary helpers and state tracking to be able to perform the 'Get FW Info', 'Transfer FW', and 'Activate FW' mailbox commands as described in the CXL spec. Wire these up to the firmware loader callbacks, and register with that system to create the memX/firmware/ sysfs ABI. Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Cc: Russ Weight <russell.h.weight@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ben Widawsky <bwidawsk@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/20230602-vv-fw_update-v4-1-c6265bd7343b@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
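For orientation, a minimal sketch of how a driver wires into the sysfs firmware loader's upload path. The fw_upload_ops callbacks and firmware_upload_register() come from the generic firmware loader API; the cxl_*_sketch names and the chunking details are illustrative stand-ins, not the literal driver code:
    #include <linux/device.h>
    #include <linux/err.h>
    #include <linux/firmware.h>
    #include <linux/module.h>

    static enum fw_upload_err cxl_fw_prepare_sketch(struct fw_upload *fwl,
                                                    const u8 *data, u32 size)
    {
            /* validate the image size/slot and reset any prior transfer state */
            return FW_UPLOAD_ERR_NONE;
    }

    static enum fw_upload_err cxl_fw_write_sketch(struct fw_upload *fwl,
                                                  const u8 *data, u32 offset,
                                                  u32 size, u32 *written)
    {
            /*
             * Send one Transfer FW mailbox command bounded by the maximum
             * mailbox payload, then report how much was consumed; the
             * fw_loader core calls back with the next chunk, letting other
             * background commands interleave between chunks.
             */
            *written = size;
            return FW_UPLOAD_ERR_NONE;
    }

    static enum fw_upload_err cxl_fw_poll_complete_sketch(struct fw_upload *fwl)
    {
            /* wait for the device to finish processing the transferred image */
            return FW_UPLOAD_ERR_NONE;
    }

    static void cxl_fw_cancel_sketch(struct fw_upload *fwl) { }
    static void cxl_fw_cleanup_sketch(struct fw_upload *fwl) { }

    static const struct fw_upload_ops cxl_memdev_fw_ops_sketch = {
            .prepare = cxl_fw_prepare_sketch,
            .write = cxl_fw_write_sketch,
            .poll_complete = cxl_fw_poll_complete_sketch,
            .cancel = cxl_fw_cancel_sketch,
            .cleanup = cxl_fw_cleanup_sketch,
    };

    static int cxl_fw_upload_setup_sketch(struct device *dev, void *drvdata)
    {
            /* creates the memX/firmware/ loading, data and status files */
            struct fw_upload *fwl =
                    firmware_upload_register(THIS_MODULE, dev, dev_name(dev),
                                             &cxl_memdev_fw_ops_sketch, drvdata);

            return PTR_ERR_OR_ZERO(fwl);
    }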
2023-06-25cxl/mem: Support Secure EraseDavidlohr Bueso3-1/+34
Implement support for the non-pmem exclusive secure erase, per the CXL specs. Create a write-only 'security/erase' sysfs file to perform the requested operation. As with sanitization, this requires the device to be offline and thus no active HPA-DPA decoding. The expectation is that userspace can use it as follows:
  cxl disable-memdev memX
  echo 1 > /sys/bus/cxl/devices/memX/security/erase
  cxl enable-memdev memX
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://lore.kernel.org/r/20230612181038.14421-7-dave@stgolabs.net Signed-off-by: Dan Williams <dan.j.williams@intel.com>
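A rough sketch of the sysfs side only; the erase helper and its offline check are hypothetical stand-ins for the real mailbox plumbing, while the write-only attribute pattern matches the ABI above:
    #include <linux/device.h>
    #include <linux/kstrtox.h>
    #include <linux/sysfs.h>

    /* hypothetical helper: would issue the CXL Secure Erase mailbox command */
    static int cxl_do_secure_erase_sketch(struct device *dev)
    {
            return 0;
    }

    static ssize_t erase_store(struct device *dev, struct device_attribute *attr,
                               const char *buf, size_t len)
    {
            bool erase;
            int rc = kstrtobool(buf, &erase);

            if (rc)
                    return rc;
            if (erase) {
                    /* the real driver also rejects this while any HPA-DPA
                     * decode is still active for the memdev */
                    rc = cxl_do_secure_erase_sketch(dev);
                    if (rc)
                            return rc;
            }
            return len;
    }
    /* 0200 permissions: the file is write-only, like security/erase */
    static DEVICE_ATTR_WO(erase);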
2023-06-25cxl/mem: Wire up Sanitization supportDavidlohr Bueso4-0/+132
Implement support for CXL 3.0 8.2.9.8.5.1 Sanitize. This is done by adding a 'security/sanitize' memdev sysfs file to trigger the operation and extending the status file to make it poll(2)-capable for completion. Unlike all other background commands, this is the only operation that is special and monopolizes the device for long periods of time. In addition to the traditional pmem security requirements, all regions must also be offline in order to perform the operation. This permits avoiding explicit global CPU cache management, relying instead on the implicit cache management when a region transitions between CXL_CONFIG_ACTIVE and CXL_CONFIG_COMMIT. The expectation is that userspace can use it as follows:
  cxl disable-memdev memX
  echo 1 > /sys/bus/cxl/devices/memX/security/sanitize
  cxl wait-sanitize memX
  cxl enable-memdev memX
Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://lore.kernel.org/r/20230612181038.14421-5-dave@stgolabs.net Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25cxl/mbox: Add sanitization handling machineryDavidlohr Bueso3-3/+91
Sanitization is by definition a device-monopolizing operation, and thus the timeslicing rules for other background commands do not apply. As such handle this special case asynchronously and return immediately. Subsequent changes will allow completion to be pollable from userspace via a sysfs file interface. For devices that don't support interrupts for notifying background command completion, self-poll with the caveat that the poller can be out of sync with the ready hardware, and therefore care must be taken to not allow any new commands to go through until the poller sees the hw completion. The poller takes the mbox_mutex to stabilize the flagging, minimizing any runtime overhead in the send path to check for 'sanitize_tmo' for uncommon poll scenarios. The irq case is much simpler as hardware will serialize/error appropriately. Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20230612181038.14421-4-dave@stgolabs.net Signed-off-by: Dan Williams <dan.j.williams@intel.com>
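A simplified sketch of the self-poll path described above, assuming a delayed work item and a 'sanitize_tmo' countdown guarded by the mailbox mutex; the names and the 10-second interval are illustrative, not the driver's actual values:
    #include <linux/container_of.h>
    #include <linux/jiffies.h>
    #include <linux/mutex.h>
    #include <linux/workqueue.h>

    struct sanitize_ctx_sketch {
            struct mutex mbox_mutex;        /* serializes all mailbox users */
            struct delayed_work poll_dwork; /* self-poll when irqs are absent */
            int sanitize_tmo;               /* remaining poll budget, seconds */
    };

    static bool sanitize_complete_sketch(struct sanitize_ctx_sketch *ctx)
    {
            /* would read the background command status register here */
            return true;
    }

    static void sanitize_poll_sketch(struct work_struct *work)
    {
            struct sanitize_ctx_sketch *ctx =
                    container_of(work, struct sanitize_ctx_sketch, poll_dwork.work);

            mutex_lock(&ctx->mbox_mutex);
            if (!sanitize_complete_sketch(ctx) && ctx->sanitize_tmo > 0) {
                    /*
                     * Hardware may still be busy; keeping the countdown under
                     * mbox_mutex keeps new commands out until the poller
                     * observes completion.
                     */
                    ctx->sanitize_tmo -= 10;
                    schedule_delayed_work(&ctx->poll_dwork, 10 * HZ);
            } else {
                    ctx->sanitize_tmo = 0;  /* send path may submit again */
            }
            mutex_unlock(&ctx->mbox_mutex);
    }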
2023-06-25cxl/mem: Introduce security state sysfs fileDavidlohr Bueso3-0/+46
Add a read-only sysfs file to display the security state of a device (currently only pmem): /sys/bus/cxl/devices/memX/security/state This introduces a cxl_security_state structure that is to be the placeholder for common CXL security features. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20230612181038.14421-3-dave@stgolabs.net Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25cxl/mbox: Allow for IRQ_NONE case in the isrDavidlohr Bueso1-2/+4
For cases when the mailbox background operation is not complete, do not "handle" the interrupt, as it was not from this device. Furthermore, there are no racy scenarios such as the hw being out of sync with the driver and starting a new background op behind its back. Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Fixes: ccadf1310fb ("cxl/mbox: Add background cmd handling machinery") Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20230612181038.14421-2-dave@stgolabs.net Signed-off-by: Dan Williams <dan.j.williams@intel.com>
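A sketch of that discipline in isolation; the status check is a hypothetical placeholder, the point being the IRQ_NONE return when the completion is not ours:
    #include <linux/interrupt.h>

    /* hypothetical: would read the mailbox background command status */
    static bool cxl_bg_op_complete_sketch(void *dev_id)
    {
            return false;
    }

    static irqreturn_t cxl_mbox_isr_sketch(int irq, void *dev_id)
    {
            /* not our completion: let other handlers on the line claim it */
            if (!cxl_bg_op_complete_sketch(dev_id))
                    return IRQ_NONE;

            /* wake up the waiter / kick off completion handling */
            return IRQ_HANDLED;
    }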
2023-06-25Revert "cxl/port: Enable the HDM decoder capability for switch ports"Dan Williams3-33/+9
commit eb0764b822b9 ("cxl/port: Enable the HDM decoder capability for switch ports") ...was added on the observation of CXL memory not being accessible after setting up a region on a "cold-plugged" device. A "cold-plugged" CXL device is one that was not present at boot, so platform-firmware/BIOS has no chance to set it up. While it is true that the debug found the enable bit clear in the host-bridge's instance of the global control register (CXL 3.0 8.2.4.19.2 CXL HDM Decoder Global Control Register), that bit is described as: "This bit is only applicable to CXL.mem devices and shall return 0 on CXL Host Bridges and Upstream Switch Ports." So it is meant to be zero, and further testing confirmed that this "fix" had no effect on the failure. Revert it, and be more vigilant about proposed fixes in the future. Since the original copied stable@, flag this revert for stable@ as well. Cc: <stable@vger.kernel.org> Fixes: eb0764b822b9 ("cxl/port: Enable the HDM decoder capability for switch ports") Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168685882012.3475336.16733084892658264991.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25cxl/memdev: Formalize endpoint port linkageDan Williams4-5/+8
Move the endpoint port that the cxl_mem driver establishes from drvdata to a first class attribute. This is in preparation for device-memory drivers reusing the CXL core for memory region management. Those drivers need a type-safe method to retrieve their CXL port linkage. Leave drvdata for private usage of the cxl_mem driver not external consumers of a 'struct cxl_memdev' object. Reviewed-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168679264292.3436160.3901392135863405807.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
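Conceptually, the change is from a drvdata lookup to a typed member, roughly as below; the field names are illustrative, not the literal layout:
    #include <linux/device.h>

    struct cxl_port;

    struct cxl_memdev_sketch {
            struct device dev;
            struct cxl_port *endpoint;  /* formal, type-safe port linkage */
            void *drvdata;              /* left for cxl_mem-private use */
    };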
2023-06-25cxl/pci: Unconditionally unmask 256B Flit errorsDan Williams1-16/+2
The current check for 256B Flit mode is incomplete and unnecessary. It is incomplete because it fails to consider the link speed, or check for CXL link capabilities. It is unnecessary because unconditionally unmasking 256B Flit errors is a nop when 256B Flit operation is not available. Remove this check in preparation for creating a cxl_probe_link() helper to centralize this detection. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168679263124.3436160.6228910132469454346.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25cxl/region: Manage decoder target_type at decoder-attach timeDan Williams1-0/+12
Switch-level (mid-level) decoders between the platform root and an endpoint can dynamically switch modes between HDM-H and HDM-D[B] depending on which region they target. Use the region type to fixup each decoder that gets allocated to map the given region. Note that endpoint decoders are meant to determine the region type, so warn if those ever need to be fixed up, but continue since it is possible to do so. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168679262543.3436160.13053831955768440312.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
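A self-contained sketch of that fixup logic, using stand-in types rather than the real cxl core structures:
    #include <linux/bug.h>

    enum decoder_type_sketch { HOSTONLYMEM_SKETCH, DEVMEM_SKETCH };

    struct decoder_sketch {
            enum decoder_type_sketch target_type;
            bool is_endpoint;
    };

    static void fixup_decoder_type_sketch(struct decoder_sketch *d,
                                          enum decoder_type_sketch region_type)
    {
            if (d->target_type == region_type)
                    return;

            /* endpoint decoders are supposed to define the region type, so a
             * mismatch here is unexpected, but it is safe to continue */
            WARN_ONCE(d->is_endpoint,
                      "endpoint decoder needed a target_type fixup\n");
            d->target_type = region_type;
    }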
2023-06-25cxl/hdm: Default CXL_DEVTYPE_DEVMEM decoders to CXL_DECODER_DEVMEMDan Williams2-10/+27
In preparation for device-memory region creation, arrange for decoders of CXL_DEVTYPE_DEVMEM memdevs to default to CXL_DECODER_DEVMEM for their target type. Revisit this if a device ever shows up that wants to offer mixed HDM-H (Host-Only Memory) and HDM-DB support, or a CXL_DEVTYPE_DEVMEM device that supports HDM-H. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168679261945.3436160.11673393474107374595.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25cxl/port: Rename CXL_DECODER_{EXPANDER, ACCELERATOR} => {HOSTONLYMEM, DEVMEM}Dan Williams5-12/+13
In preparation for support for HDM-D and HDM-DB configuration (device-memory, and device-memory with back-invalidate). Rename the current type designators to use HOSTONLYMEM and DEVMEM as a suffix. HDM-DB can be supported by devices that are not accelerators, so DEVMEM is a more generic term for that case. Fixup one location where this type value was open coded. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168679261369.3436160.7042443847605280593.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25cxl/memdev: Make mailbox functionality optionalDan Williams3-1/+28
In support of the Linux CXL core scaling for a wider set of CXL devices, allow for the creation of memdevs with some memory device capabilities disabled. Specifically, allow for CXL devices outside of those claiming to be compliant with the generic CXL memory device class code, like vendor specific Type-2/3 devices that host CXL.mem. This implies allowing the creation of memdevs that only support component-registers, not necessarily memory-device-registers (like mailbox registers). A memdev derived from a CXL endpoint that does not support generic class code expectations is tagged "CXL_DEVTYPE_DEVMEM", while a memdev derived from a class-code compliant endpoint is tagged "CXL_DEVTYPE_CLASSMEM". The primary assumption of a CXL_DEVTYPE_DEVMEM memdev is that it optionally may not host a mailbox. Disable the command passthrough ioctl for memdevs that are not CXL_DEVTYPE_CLASSMEM, and return empty strings from memdev attributes associated with data retrieved via the class-device-standard IDENTIFY command. Note that empty strings were chosen over attribute visibility to maintain compatibility with shipping versions of cxl-cli that expect those attributes to always be present. Once cxl-cli has dropped that requirement this workaround can be deprecated. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168679260782.3436160.7587293613945445365.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
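A compressed sketch of the two policies described above (devtype-gated ioctl, empty-string attributes); the enum and helpers are illustrative, not the literal memdev code:
    #include <linux/errno.h>
    #include <linux/sysfs.h>

    enum devtype_sketch {
            DEVTYPE_DEVMEM_SKETCH,   /* vendor Type-2/3, possibly no mailbox */
            DEVTYPE_CLASSMEM_SKETCH, /* class-code compliant memory expander */
    };

    /* command passthrough is only offered to class-code compliant devices */
    static long memdev_ioctl_sketch(enum devtype_sketch type)
    {
            if (type != DEVTYPE_CLASSMEM_SKETCH)
                    return -ENOTTY;
            /* ... dispatch the mailbox passthrough command ... */
            return 0;
    }

    /* IDENTIFY-derived attributes stay visible but read back empty,
     * keeping existing cxl-cli versions working */
    static ssize_t identify_attr_show_sketch(char *buf, bool has_mbox)
    {
            if (!has_mbox)
                    return sysfs_emit(buf, "\n");
            return sysfs_emit(buf, "%s\n", "value-from-IDENTIFY");
    }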
2023-06-25cxl/mbox: Move mailbox related driver state to its own data structureDan Williams7-271/+312
'struct cxl_dev_state' makes too many assumptions about the capabilities of a CXL device. In particular it assumes a CXL device has a mailbox and all of the infrastructure and state that comes along with that. In preparation for supporting accelerator / Type-2 devices that may not have a mailbox and in general maintain a minimal core context structure, make mailbox functionality a super-set of 'struct cxl_dev_state' with 'struct cxl_memdev_state'. With this reorganization it allows for CXL devices that support HDM decoder mapping, but not other general-expander / Type-3 capabilities, to only enable that subset without the rest of the mailbox infrastructure coming along for the ride. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168679260240.3436160.15520641540463704524.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
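The layering can be pictured with a sketch like the following, where the mailbox-capable state embeds the minimal core state and is recovered with container_of(); the member lists are illustrative:
    #include <linux/container_of.h>
    #include <linux/device.h>
    #include <linux/mutex.h>

    /* minimal context shared by every CXL device (sketch) */
    struct dev_state_sketch {
            struct device *dev;
            /* component register mappings, DPA partitions, ... */
    };

    /* superset carrying mailbox and other Type-3 infrastructure (sketch) */
    struct memdev_state_sketch {
            struct dev_state_sketch cxlds;
            size_t payload_size;
            struct mutex mbox_mutex;
            /* event, poison, firmware-update state, ... */
    };

    static inline struct memdev_state_sketch *
    to_memdev_state_sketch(struct dev_state_sketch *cxlds)
    {
            /* only valid when the device is known to be mailbox-capable */
            return container_of(cxlds, struct memdev_state_sketch, cxlds);
    }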
2023-06-25cxl: Remove leftover attribute documentation in 'struct cxl_dev_state'Dan Williams1-1/+0
commit 14d788740774 ("cxl/mem: Consolidate CXL DVSEC Range enumeration in the core") ...removed @info from 'struct cxl_dev_state', but neglected to remove the corresponding kernel-doc entry. Complete the removal. Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Closes: http://lore.kernel.org/r/20230606121054.000069e1@Huawei.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168679259703.3436160.12583306507362357946.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25cxl: Fix kernel-doc warningsDan Williams1-3/+3
After Jonathan noticed [1] that 'struct cxl_dev_state' had a kernel-doc entry without a corresponding struct attribute I ran the kernel-doc script to see what else might be broken. Fix these warnings: drivers/cxl/cxlmem.h:199: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Event Interrupt Policy drivers/cxl/cxlmem.h:224: warning: Function parameter or member 'buf' not described in 'cxl_event_state' drivers/cxl/cxlmem.h:224: warning: Function parameter or member 'log_lock' not described in 'cxl_event_state' Note that scripts/kernel-doc only finds missing kernel-doc entries. It does not warn on too many kernel-doc entries, i.e. it did not catch the fact that @info refers to a not present member. Link: http://lore.kernel.org/r/20230606121054.000069e1@Huawei.com [1] Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168679259170.3436160.3686460404739136336.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25cxl/regs: Clarify when a 'struct cxl_register_map' is input vs outputDan Williams2-6/+6
The @map parameter to cxl_probe_X_registers() is filled in with the mapping parameters of the register block. The @map parameter to cxl_map_X_registers() only reads that information to perform the mapping. Mark @map const for cxl_map_X_registers() to clarify that it is only an input to those helpers. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168679258103.3436160.4941603739448763855.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
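In other words, the probe helpers fill the map in while the map helpers only consume it; a schematic of the signatures with abbreviated, illustrative names and members:
    #include <linux/device.h>
    #include <linux/types.h>

    struct register_map_sketch {
            resource_size_t resource;   /* physical base of the register block */
            /* per-capability offsets and sizes discovered by probing */
    };

    /* output parameter: probing fills @map in */
    int probe_component_regs_sketch(struct device *dev,
                                    struct register_map_sketch *map);

    /* input parameter: mapping only reads @map, hence const */
    int map_component_regs_sketch(struct device *dev,
                                  const struct register_map_sketch *map);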
2023-06-25cxl/region: Fix state transitions after reset failureDan Williams1-11/+15
Jonathan reports that failed attempts to reset a region (tear down its HDM decoder configuration) mistakenly advance the state of the region to "not committed". Revert to the previous state of the region on reset failure so that the reset can be re-attempted. Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Closes: http://lore.kernel.org/r/20230316171441.0000205b@Huawei.com Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168696507968.3590522.14484000711718573626.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25cxl/region: Flag partially torn down regions as unusableDan Williams2-0/+20
cxl_region_decode_reset() walks all the decoders associated with a given region and disables them. Due to decoder ordering rules it is possible that a switch in the topology notices that a given decoder cannot be shut down before another region with a higher HPA is shut down first. That can leave the region in a partially committed state. Capture that state in a new CXL_REGION_F_NEEDS_RESET flag and require that a successful cxl_region_decode_reset() attempt must be completed before cxl_region_probe() accepts the region. This is a corollary to the bug that Jonathan identified in "CXL/region : commit reset of out of order region appears to succeed." [1]. Cc: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Link: http://lore.kernel.org/r/20230316171441.0000205b@Huawei.com [1] Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168696507423.3590522.16254212607926684429.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
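A stand-alone sketch of the flag's life cycle, with hypothetical helper names; the real code walks the region's decoder list where the stub below simply returns success:
    #include <linux/bitops.h>
    #include <linux/errno.h>

    #define REGION_F_NEEDS_RESET_SKETCH 0   /* flag bit, illustrative */

    struct region_sketch {
            unsigned long flags;
    };

    /* stand-in for walking and disabling every decoder in the region */
    static int reset_all_decoders_sketch(struct region_sketch *r)
    {
            return 0;
    }

    static int region_decode_reset_sketch(struct region_sketch *r)
    {
            int rc = reset_all_decoders_sketch(r);

            if (rc) {
                    /* partially torn down: block probe until a reset succeeds */
                    set_bit(REGION_F_NEEDS_RESET_SKETCH, &r->flags);
                    return rc;
            }
            clear_bit(REGION_F_NEEDS_RESET_SKETCH, &r->flags);
            return 0;
    }

    static int region_probe_sketch(struct region_sketch *r)
    {
            if (test_bit(REGION_F_NEEDS_RESET_SKETCH, &r->flags))
                    return -ENXIO;
            return 0;
    }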
2023-06-25cxl/region: Move cache invalidation before region teardown, and before setupDan Williams2-36/+38
Vikram raised a concern with the theoretical case of a CPU sending MemClnEvict to a device that is not prepared to receive. MemClnEvict is a message that is sent after a CPU has taken ownership of a cacheline from accelerator memory (HDM-DB). In the case of hotplug or HDM decoder reconfiguration it is possible that the CPU is holding old contents for a new device that has taken over the physical address range being cached by the CPU. To avoid this scenario, invalidate caches prior to tearing down an HDM decoder configuration. Now, this poses another problem: it is possible for something to speculate into that space while the decode configuration is still up, so to close that gap also invalidate prior to establishing new contents behind a given physical address range. With this change the cache invalidation is now explicit and need not be checked in cxl_region_probe(), and that obviates the need for CXL_REGION_F_INCOHERENT. Cc: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Fixes: d18bc74aced6 ("cxl/region: Manage CPU caches relative to DPA invalidation events") Reported-by: Vikram Sethi <vsethi@nvidia.com> Closes: http://lore.kernel.org/r/BYAPR12MB33364B5EB908BF7239BB996BBD53A@BYAPR12MB3336.namprd12.prod.outlook.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168696506886.3590522.4597053660991916591.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
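The resulting ordering can be summarized with a sketch; all helpers are hypothetical stand-ins for the CXL core's reset/commit paths and for the cache invalidation primitive:
    struct region_ctx_sketch { int id; };

    static void invalidate_region_caches_sketch(struct region_ctx_sketch *r) { }
    static int decode_reset_sketch(struct region_ctx_sketch *r) { return 0; }
    static int decode_commit_sketch(struct region_ctx_sketch *r) { return 0; }

    static int region_teardown_sketch(struct region_ctx_sketch *r)
    {
            /* flush while the old decode is still live so dirty lines cannot
             * land on whatever next owns this physical address range */
            invalidate_region_caches_sketch(r);
            return decode_reset_sketch(r);
    }

    static int region_setup_sketch(struct region_ctx_sketch *r)
    {
            /* something may have speculated into this range while a decode
             * was up, so invalidate before establishing new contents */
            invalidate_region_caches_sketch(r);
            return decode_commit_sketch(r);
    }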
2023-06-25cxl/port: Store the downstream port's Component Register mappings in struct cxl_dportRobert Richter2-0/+13
Same as for ports, also store the downstream port's Component Register mappings, use struct cxl_dport for that. Signed-off-by: Robert Richter <rrichter@amd.com> Signed-off-by: Terry Bowman <terry.bowman@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20230622205523.85375-16-terry.bowman@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25cxl/port: Store the port's Component Register mappings in struct cxl_portRobert Richter2-0/+29
CXL capabilities are stored in the Component Registers. To use them, the specific I/O ranges of the capabilities must be determined by probing the registers. For this, the whole Component Register range needs to be mapped temporarily to detect the offset and length of a capability range. In order to use more than one capability of a component (e.g. RAS and HDM) the Component Registers are probed and their mappings created multiple times. This also causes overlapping I/O ranges as the whole Component Register range must be mapped again while a capability's I/O range is already mapped. Different capabilities cannot be set up at the same time. E.g. the RAS capability must be made available as soon as the PCI driver is bound, while the HDM decoder is set up later during port enumeration. Moreover, during early setup it is still unknown if a certain capability is needed. A central capability setup is therefore not possible, capabilities must be individually enabled once needed during initialization. To avoid a duplicate register probe and overlapping I/O mappings, only probe the Component Registers one time and store the Component Register mapping in struct cxl_port. The stored mappings can be used later to iomap the capability register range when enabling the capability, which will be implemented in a follow-on patch. Signed-off-by: Robert Richter <rrichter@amd.com> Signed-off-by: Terry Bowman <terry.bowman@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20230622205523.85375-15-terry.bowman@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
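A reduced sketch of the idea: record the probe results once on the port, then iomap only a capability's range when it is actually enabled. The structure members and the RAS example are illustrative, not the literal cxl_register_map layout:
    #include <linux/io.h>
    #include <linux/types.h>

    /* recorded once at port creation from a single Component Register probe */
    struct comp_reg_map_sketch {
            resource_size_t phys_base;      /* register block base address */
            u32 ras_offset, ras_length;     /* capability ranges found by probe */
            u32 hdm_offset, hdm_length;
    };

    struct port_sketch {
            struct comp_reg_map_sketch comp_map;
    };

    /* later, when e.g. the RAS capability is enabled, map only its range;
     * no re-probe and no overlapping full-block mapping is needed */
    static void __iomem *map_ras_capability_sketch(struct port_sketch *port)
    {
            return ioremap(port->comp_map.phys_base + port->comp_map.ras_offset,
                           port->comp_map.ras_length);
    }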