author | Linus Torvalds <torvalds@linux-foundation.org> | 2024-03-11 18:14:06 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2024-03-11 18:14:06 -0700 |
commit | b0402403e54ae9eb94ce1cbb53c7def776e97426 | |
tree | 64198f48106bd0909cedf604df42ad8c667ce388 /Documentation | |
parent | 1f75619a721d5149d9a947f2177d3cffc473fbb7 | |
parent | af65545a0f82d7336f62e34f69d3c644806f5f95 | |
Merge tag 'edac_updates_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

 - Add a FRU (Field Replaceable Unit) memory poison manager which
   collects and manages previously encountered hw errors in order to
   save them to persistent storage across reboots. Previously recorded
   errors are "replayed" upon reboot in order to poison memory which has
   caused said errors in the past.

   The main use case is stacked, on-chip memory which cannot simply be
   replaced so poisoning faulty areas of it and thus making them
   inaccessible is the only strategy to prolong its lifetime.

 - Add an AMD address translation library glue which converts the
   reported addresses of hw errors into system physical addresses in
   order to be used by other subsystems like memory failure, for
   example. Add support for MI300 accelerators to that library.

 - igen6: Add support for Alder Lake-N SoC

 - i10nm: Add Grand Ridge support

 - The usual fixlets and cleanups

* tag 'edac_updates_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/versal: Convert to platform remove callback returning void
  RAS/AMD/FMPM: Fix off by one when unwinding on error
  RAS/AMD/FMPM: Add debugfs interface to print record entries
  RAS/AMD/FMPM: Save SPA values
  RAS: Export helper to get ras_debugfs_dir
  RAS/AMD/ATL: Fix bit overflow in denorm_addr_df4_np2()
  RAS: Introduce a FRU memory poison manager
  RAS/AMD/ATL: Add MI300 row retirement support
  Documentation: Move RAS section to admin-guide
  EDAC/versal: Make the bit position of injected errors configurable
  EDAC/i10nm: Add Intel Grand Ridge micro-server support
  EDAC/igen6: Add one more Intel Alder Lake-N SoC support
  RAS/AMD/ATL: Add MI300 DRAM to normalized address translation support
  RAS/AMD/ATL: Fix array overflow in get_logical_coh_st_fabric_id_mi300()
  RAS/AMD/ATL: Add MI300 support
  Documentation: RAS: Add index and address translation section
  EDAC/amd64: Use new AMD Address Translation Library
  RAS: Introduce AMD Address Translation Library
  EDAC/synopsys: Convert to devm_platform_ioremap_resource()

Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/admin-guide/RAS/address-translation.rst | 24 |
-rw-r--r-- | Documentation/admin-guide/RAS/error-decoding.rst (renamed from Documentation/RAS/ras.rst) | 11 |
-rw-r--r-- | Documentation/admin-guide/RAS/index.rst | 7 |
-rw-r--r-- | Documentation/admin-guide/RAS/main.rst (renamed from Documentation/admin-guide/ras.rst) | 10 |
-rw-r--r-- | Documentation/admin-guide/index.rst | 2 |
-rw-r--r-- | Documentation/index.rst | 1 |
6 files changed, 42 insertions, 13 deletions
diff --git a/Documentation/admin-guide/RAS/address-translation.rst b/Documentation/admin-guide/RAS/address-translation.rst
new file mode 100644
index 00000000000000..f0ca17b43cd3de
--- /dev/null
+++ b/Documentation/admin-guide/RAS/address-translation.rst
@@ -0,0 +1,24 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Address translation
+===================
+
+x86 AMD
+-------
+
+Zen-based AMD systems include a Data Fabric that manages the layout of
+physical memory. Devices attached to the Fabric, like memory controllers,
+I/O, etc., may not have a complete view of the system physical memory map.
+These devices may provide a "normalized", i.e. device physical, address
+when reporting memory errors. Normalized addresses must be translated to
+a system physical address for the kernel to action on the memory.
+
+AMD Address Translation Library (CONFIG_AMD_ATL) provides translation for
+this case.
+
+Glossary of acronyms used in address translation for Zen-based systems
+
+* CCM = Cache Coherent Moderator
+* COD = Cluster-on-Die
+* COH_ST = Coherent Station
+* DF = Data Fabric
diff --git a/Documentation/RAS/ras.rst b/Documentation/admin-guide/RAS/error-decoding.rst
index 2556b397cd271f..26a72f3fe5de83 100644
--- a/Documentation/RAS/ras.rst
+++ b/Documentation/admin-guide/RAS/error-decoding.rst
@@ -1,15 +1,10 @@
 .. SPDX-License-Identifier: GPL-2.0
 
-Reliability, Availability and Serviceability features
-=====================================================
-
-This documents different aspects of the RAS functionality present in the
-kernel.
-
 Error decoding
----------------
+==============
 
-* x86
+x86
+---
 
 Error decoding on AMD systems should be done using the rasdaemon tool:
 https://github.com/mchehab/rasdaemon/
diff --git a/Documentation/admin-guide/RAS/index.rst b/Documentation/admin-guide/RAS/index.rst
new file mode 100644
index 00000000000000..f4087040a7c054
--- /dev/null
+++ b/Documentation/admin-guide/RAS/index.rst
@@ -0,0 +1,7 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. toctree::
+   :maxdepth: 2
+
+   main
+   error-decoding
+   address-translation
diff --git a/Documentation/admin-guide/ras.rst b/Documentation/admin-guide/RAS/main.rst
index 8e03751d126d01..7ac1d4ccc50993 100644
--- a/Documentation/admin-guide/ras.rst
+++ b/Documentation/admin-guide/RAS/main.rst
@@ -1,8 +1,12 @@
+.. SPDX-License-Identifier: GPL-2.0
 .. include:: <isonum.txt>
 
-============================================
-Reliability, Availability and Serviceability
-============================================
+==================================================
+Reliability, Availability and Serviceability (RAS)
+==================================================
+
+This documents different aspects of the RAS functionality present in the
+kernel.
 
 RAS concepts
 ************
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index fb40a1f6f79e18..dfc06fab943225 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -122,7 +122,7 @@ configure specific aspects of kernel behavior to your liking.
    pmf
    pnp
    rapidio
-   ras
+   RAS/index
    rtc
    serial-console
    svga
diff --git a/Documentation/index.rst b/Documentation/index.rst
index 36e61783437c10..9dfdc826618c08 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -113,7 +113,6 @@ to ReStructured Text format, or are simply too old.
    :maxdepth: 1
 
    staging/index
-   RAS/ras
 
 
 Translations