* NV-CLink

* NV-DLink

PMU Driver
----------

The PMU driver describes the available events and the configuration of each
PMU in sysfs. Please see the sections below to get the sysfs path of each PMU.
Like other uncore PMU drivers, the driver provides a "cpumask" sysfs attribute
to show the CPU id used to handle the PMU events. There is also an
"associated_cpus" sysfs attribute, which contains a list of CPUs associated
with the PMU instance.
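As a sketch of how these attributes can be consumed from userspace, the helper
below reads both of them for a given PMU instance. The helper name and the
`base` parameter are illustrative, not part of the driver; `base` defaults to
the standard event_source sysfs path and is overridable mainly for testing.

```python
from pathlib import Path


def pmu_cpu_attrs(pmu: str, base: str = "/sys/bus/event_source/devices") -> dict:
    """Return the "cpumask" and "associated_cpus" attributes of an uncore PMU.

    `pmu` is the sysfs directory name of the PMU instance, e.g. one of the
    nvidia_*_pmu_* names described in the sections below.
    """
    pmu_dir = Path(base) / pmu
    return {attr: (pmu_dir / attr).read_text().strip()
            for attr in ("cpumask", "associated_cpus")}
```

A tool would typically pin its reads to the CPU listed in "cpumask", since
that is the CPU the driver uses to handle the PMU events.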
UCF PMU
-------

The Unified Coherence Fabric (UCF) in the NVIDIA Tegra410 SoC serves as the
distributed last-level cache for CPU memory and CXL memory, and as a cache
coherent interconnect that supports hardware coherence across multiple
coherently caching agents, including:

  * CPU clusters
  * GPU
  * PCIe Ordering Controller Unit (OCU)
  * Other IO-coherent requesters

The events and configuration options of this PMU device are described in
sysfs, see /sys/bus/event_source/devices/nvidia_ucf_pmu_.

Some of the events available in this PMU can be used to measure bandwidth and
utilization:

  * slc_access_rd: count the number of read requests to the SLC.
  * slc_access_wr: count the number of write requests to the SLC.
  * slc_bytes_rd: count the number of bytes transferred by slc_access_rd.
  * slc_bytes_wr: count the number of bytes transferred by slc_access_wr.
  * mem_access_rd: count the number of read requests to local or remote memory.
  * mem_access_wr: count the number of write requests to local or remote memory.
  * mem_bytes_rd: count the number of bytes transferred by mem_access_rd.
  * mem_bytes_wr: count the number of bytes transferred by mem_access_wr.
  * cycles: count the UCF cycles.

The average bandwidth is calculated as::

  AVG_SLC_READ_BANDWIDTH_IN_GBPS  = SLC_BYTES_RD / ELAPSED_TIME_IN_NS
  AVG_SLC_WRITE_BANDWIDTH_IN_GBPS = SLC_BYTES_WR / ELAPSED_TIME_IN_NS
  AVG_MEM_READ_BANDWIDTH_IN_GBPS  = MEM_BYTES_RD / ELAPSED_TIME_IN_NS
  AVG_MEM_WRITE_BANDWIDTH_IN_GBPS = MEM_BYTES_WR / ELAPSED_TIME_IN_NS

The average request rate is calculated as::

  AVG_SLC_READ_REQUEST_RATE  = SLC_ACCESS_RD / CYCLES
  AVG_SLC_WRITE_REQUEST_RATE = SLC_ACCESS_WR / CYCLES
  AVG_MEM_READ_REQUEST_RATE  = MEM_ACCESS_RD / CYCLES
  AVG_MEM_WRITE_REQUEST_RATE = MEM_ACCESS_WR / CYCLES

More details about the other available events can be found in the Tegra410 SoC
technical reference manual.
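The formulas above can be sketched as two small helpers. Note that dividing a
byte count by a nanosecond interval directly yields GB/s, since one byte per
nanosecond equals one gigabyte per second; the helper names are illustrative.

```python
def avg_bandwidth_gbps(byte_count: int, elapsed_time_ns: float) -> float:
    # One byte per nanosecond equals one GB/s, so the ratio is already in GB/s.
    return byte_count / elapsed_time_ns


def avg_request_rate(access_count: int, cycles: int) -> float:
    # Requests issued per UCF cycle, e.g. SLC_ACCESS_RD / CYCLES.
    return access_count / cycles
```

For example, transferring 64 GB of slc_bytes_rd in one second (10^9 ns) gives
an average read bandwidth of 64 GB/s.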
The events can be filtered based on source or destination. The source filter
indicates the traffic initiator to the SLC, e.g. local CPU, non-CPU device, or
remote socket. The destination filter specifies the destination memory type,
e.g. local system memory (CMEM), local GPU memory (GMEM), or remote memory.
The local/remote classification of the destination filter is based on the home
socket of the address, not where the data actually resides. The available
filters are described in
/sys/bus/event_source/devices/nvidia_ucf_pmu_/format/.

The list of UCF PMU event filters:

  * Source filter:

    * src_loc_cpu: if set, count events from the local CPU
    * src_loc_noncpu: if set, count events from local non-CPU devices
    * src_rem: if set, count events from the CPU, GPU, and PCIE devices of the
      remote socket

  * Destination filter:

    * dst_loc_cmem: if set, count events to local system memory (CMEM)
      addresses
    * dst_loc_gmem: if set, count events to local GPU memory (GMEM) addresses
    * dst_loc_other: if set, count events to local CXL memory addresses
    * dst_rem: if set, count events to CPU, GPU, and CXL memory addresses of
      the remote socket

If the source is not specified, the PMU will count events from all sources. If
the destination is not specified, the PMU will count events to all
destinations.

Example usage:

  * Count event id 0x0 in socket 0 from all sources and to all destinations::

      perf stat -a -e nvidia_ucf_pmu_0/event=0x0/

  * Count event id 0x0 in socket 0 with source filter = local CPU and
    destination filter = local system memory (CMEM)::

      perf stat -a -e nvidia_ucf_pmu_0/event=0x0,src_loc_cpu=0x1,dst_loc_cmem=0x1/

  * Count event id 0x0 in socket 1 with source filter = local non-CPU device
    and destination filter = remote memory::

      perf stat -a -e nvidia_ucf_pmu_1/event=0x0,src_loc_noncpu=0x1,dst_rem=0x1/
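Counter values collected with `perf stat` can be fed straight into the
bandwidth formulas above. The sketch below assumes perf's CSV mode layout
(`perf stat -x,`: counter value, unit, event name, counter run time in ns,
...), which may vary between perf versions; the helper name is illustrative.

```python
def bandwidth_from_perf_csv(csv_text: str, byte_event: str) -> float:
    """Compute average bandwidth in GB/s for a byte-counting event from
    `perf stat -x,` output.  The event name is matched as a substring of the
    third CSV field; the fourth field is assumed to be the counter run time
    in nanoseconds."""
    for line in csv_text.splitlines():
        fields = line.split(",")
        if len(fields) >= 4 and byte_event in fields[2]:
            count, time_ns = float(fields[0]), float(fields[3])
            return count / time_ns  # bytes per ns == GB/s
    raise ValueError(f"event {byte_event!r} not found in perf output")
```

For instance, a line reporting 64000000000 for slc_bytes_rd over a 10^9 ns
run time works out to 64 GB/s of average SLC read bandwidth.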
PCIE PMU
--------

This PMU is located in the SoC fabric connecting the PCIE root complex (RC)
and the memory subsystem. It monitors all read/write traffic from the root
port(s) or a particular BDF in a PCIE RC to local or remote memory. There is
one PMU per PCIE RC in the SoC. Each RC can have up to 16 lanes that can be
bifurcated into up to 8 root ports. The traffic from each root port can be
filtered using the RP or BDF filter. For example, specifying
"src_rp_mask=0xFF" means the PMU counter will capture traffic from all RPs.
Please see below for more details.

The events and configuration options of this PMU device are described in
sysfs, see /sys/bus/event_source/devices/nvidia_pcie_pmu__rc_.

The events in this PMU can be used to measure bandwidth, utilization, and
latency:

  * rd_req: count the number of read requests by the PCIE device.
  * wr_req: count the number of write requests by the PCIE device.
  * rd_bytes: count the number of bytes transferred by rd_req.
  * wr_bytes: count the number of bytes transferred by wr_req.
  * rd_cum_outs: count the outstanding rd_req each cycle.
  * cycles: count the clock cycles of the SoC fabric connected to the PCIE
    interface.

The average bandwidth is calculated as::

  AVG_RD_BANDWIDTH_IN_GBPS = RD_BYTES / ELAPSED_TIME_IN_NS
  AVG_WR_BANDWIDTH_IN_GBPS = WR_BYTES / ELAPSED_TIME_IN_NS

The average request rate is calculated as::

  AVG_RD_REQUEST_RATE = RD_REQ / CYCLES
  AVG_WR_REQUEST_RATE = WR_REQ / CYCLES

The average latency is calculated as::

  FREQ_IN_GHZ           = CYCLES / ELAPSED_TIME_IN_NS
  AVG_LATENCY_IN_CYCLES = RD_CUM_OUTS / RD_REQ
  AVG_LATENCY_IN_NS     = AVG_LATENCY_IN_CYCLES / FREQ_IN_GHZ
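The latency derivation above chains three ratios; the sketch below combines
them into one helper (the name and argument order are illustrative):

```python
def avg_read_latency_ns(rd_cum_outs: int, rd_req: int,
                        cycles: int, elapsed_time_ns: float) -> float:
    """Average read latency per the formulas above: outstanding-requests per
    read gives latency in fabric cycles, then the measured fabric frequency
    converts cycles to nanoseconds."""
    freq_in_ghz = cycles / elapsed_time_ns        # FREQ_IN_GHZ
    avg_latency_in_cycles = rd_cum_outs / rd_req  # AVG_LATENCY_IN_CYCLES
    return avg_latency_in_cycles / freq_in_ghz    # latency in ns
```

For example, a fabric counting 2*10^9 cycles over 10^9 ns runs at 2 GHz; 1000
cumulative outstanding reads across 10 read requests is 100 cycles per read,
i.e. an average read latency of 50 ns.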
The PMU events can be filtered based on the traffic source and destination.
The source filter indicates the PCIE devices that will be monitored. The
destination filter specifies the destination memory type, e.g. local system
memory (CMEM), local GPU memory (GMEM), or remote memory. The local/remote
classification of the destination filter is based on the home socket of the
address, not where the data actually resides. These filters can be found in
/sys/bus/event_source/devices/nvidia_pcie_pmu__rc_/format/.

The list of event filters:

  * Source filter:

    * src_rp_mask: bitmask of the root ports that will be monitored. Each bit
      in this bitmask represents the RP index in the RC. If a bit is set, all
      devices under the associated RP will be monitored. E.g. "src_rp_mask=0xF"
      will monitor devices in root ports 0 to 3.
    * src_bdf: the BDF that will be monitored. This is a 16-bit value that
      follows the formula: (bus << 8) + (device << 3) + (function). For
      example, the value of BDF 27:01.1 is 0x2781.
    * src_bdf_en: enable the BDF filter. If this is set, the BDF filter value
      in "src_bdf" is used to filter the traffic.

    Note that the Root-Port and BDF filters are mutually exclusive, and the
    PMU in each RC can only have one BDF filter across all of its counters. If
    the BDF filter is enabled, the BDF filter value will be applied to all
    events.

  * Destination filter:

    * dst_loc_cmem: if set, count events to local system memory (CMEM)
      addresses
    * dst_loc_gmem: if set, count events to local GPU memory (GMEM) addresses
    * dst_loc_pcie_p2p: if set, count events to local PCIE peer addresses
    * dst_loc_pcie_cxl: if set, count events to local CXL memory addresses
    * dst_rem: if set, count events to remote memory addresses

If the source filter is not specified, the PMU will count events from all root
ports. If the destination filter is not specified, the PMU will count events
to all destinations.

Example usage:

  * Count event id 0x0 from root port 0 of PCIE RC-0 on socket 0 targeting all
    destinations::

      perf stat -a -e nvidia_pcie_pmu_0_rc_0/event=0x0,src_rp_mask=0x1/

  * Count event id 0x1 from root ports 0 and 1 of PCIE RC-1 on socket 0
    targeting just the local CMEM of socket 0::

      perf stat -a -e nvidia_pcie_pmu_0_rc_1/event=0x1,src_rp_mask=0x3,dst_loc_cmem=0x1/

  * Count event id 0x2 from root port 0 of PCIE RC-2 on socket 1 targeting all
    destinations::

      perf stat -a -e nvidia_pcie_pmu_1_rc_2/event=0x2,src_rp_mask=0x1/

  * Count event id 0x3 from root ports 0 and 1 of PCIE RC-3 on socket 1
    targeting just the local CMEM of socket 1::

      perf stat -a -e nvidia_pcie_pmu_1_rc_3/event=0x3,src_rp_mask=0x3,dst_loc_cmem=0x1/

  * Count event id 0x4 from BDF 01:01.0 of PCIE RC-4 on socket 0 targeting all
    destinations::

      perf stat -a -e nvidia_pcie_pmu_0_rc_4/event=0x4,src_bdf=0x0180,src_bdf_en=0x1/
.. _NVIDIA_T410_PCIE_PMU_RC_Mapping_Section:

Mapping the RC# to lspci segment number
---------------------------------------

Mapping the RC# to the lspci segment number can be non-trivial; hence a new
NVIDIA Designated Vendor Specific Capability (DVSEC) register is added into
the PCIE config space of each RP. This DVSEC has vendor id "10de" and a DVSEC
id of "0x4". The DVSEC register contains the following information to map PCIE
devices under the RP back to their RC#:

  - Bus# (byte 0xc): bus number as reported by the lspci output
  - Segment# (byte 0xd): segment number as reported by the lspci output
  - RP# (byte 0xe): port number as reported by the LnkCap attribute from lspci
    for a device with Root Port capability
  - RC# (byte 0xf): root complex number associated with the RP
  - Socket# (byte 0x10): socket number associated with the RP
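As an illustration of locating this DVSEC, the sketch below walks the standard
PCIe extended capability list (starting at config-space offset 0x100, DVSEC
capability id 0x23) in a config-space dump such as the one readable from
/sys/bus/pci/devices/<bdf>/config, then pulls out the mapping bytes listed
above. The function names are hypothetical and the byte offsets are relative
to the start of the DVSEC, as described in the text.

```python
import struct

PCI_EXT_CAP_START = 0x100     # extended capabilities begin here
PCI_EXT_CAP_ID_DVSEC = 0x0023 # Designated Vendor-Specific Extended Capability


def find_dvsec(cfg: bytes, vendor: int, dvsec_id: int):
    """Return the offset of the matching DVSEC in a config-space dump,
    or None if it is not present."""
    off = PCI_EXT_CAP_START
    while off:
        hdr, = struct.unpack_from("<I", cfg, off)
        cap_id = hdr & 0xFFFF
        nxt = (hdr >> 20) & 0xFFC  # next capability offset, DWORD aligned
        if cap_id == PCI_EXT_CAP_ID_DVSEC:
            vend = struct.unpack_from("<I", cfg, off + 4)[0] & 0xFFFF
            dv_id = struct.unpack_from("<I", cfg, off + 8)[0] & 0xFFFF
            if vend == vendor and dv_id == dvsec_id:
                return off
        off = nxt
    return None


def read_rc_mapping(cfg: bytes, dvsec_off: int) -> dict:
    # Byte offsets relative to the DVSEC start, per the list above.
    return {
        "bus":     cfg[dvsec_off + 0xC],
        "segment": cfg[dvsec_off + 0xD],
        "rp":      cfg[dvsec_off + 0xE],
        "rc":      cfg[dvsec_off + 0xF],
        "socket":  cfg[dvsec_off + 0x10],
    }
```

Calling `find_dvsec(cfg, 0x10DE, 0x4)` on the config space of a device under
an RP, followed by `read_rc_mapping`, recovers the RC# and socket# that name
the corresponding nvidia_pcie_pmu__rc_ instance.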
* wr_bytes: count the number of bytes transferred by wr_req. * cycles: count the clock cycles of SOC fabric connected to the PCIE interface. h]h)}(hhh](h)}(h2rd_req: count the number of read requests to PCIE.h]h)}(hj h]h2rd_req: count the number of read requests to PCIE.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM,hj ubah}(h]h ]h"]h$]h&]uh1hhj ubh)}(h3wr_req: count the number of write requests to PCIE.h]h)}(hj h]h3wr_req: count the number of write requests to PCIE.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM-hj ubah}(h]h ]h"]h$]h&]uh1hhj ubh)}(h:rd_bytes: count the number of bytes transferred by rd_req.h]h)}(hj" h]h:rd_bytes: count the number of bytes transferred by rd_req.}(hj$ hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM.hj ubah}(h]h ]h"]h$]h&]uh1hhj ubh)}(h:wr_bytes: count the number of bytes transferred by wr_req.h]h)}(hj9 h]h:wr_bytes: count the number of bytes transferred by wr_req.}(hj; hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM/hj7 ubah}(h]h ]h"]h$]h&]uh1hhj ubh)}(hNcycles: count the clock cycles of SOC fabric connected to the PCIE interface. 
h]h)}(hMcycles: count the clock cycles of SOC fabric connected to the PCIE interface.h]hMcycles: count the clock cycles of SOC fabric connected to the PCIE interface.}(hjR hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM0hjN ubah}(h]h ]h"]h$]h&]uh1hhj ubeh}(h]h ]h"]h$]h&]jjuh1hhhhM,hj ubah}(h]h ]h"]h$]h&]uh1jhhhM,hjy hhubh)}(h(The average bandwidth is calculated as::h]h'The average bandwidth is calculated as:}(hjr hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM2hjy hhubjR)}(hqAVG_RD_BANDWIDTH_IN_GBPS = RD_BYTES / ELAPSED_TIME_IN_NS AVG_WR_BANDWIDTH_IN_GBPS = WR_BYTES / ELAPSED_TIME_IN_NSh]hqAVG_RD_BANDWIDTH_IN_GBPS = RD_BYTES / ELAPSED_TIME_IN_NS AVG_WR_BANDWIDTH_IN_GBPS = WR_BYTES / ELAPSED_TIME_IN_NS}hj sbah}(h]h ]h"]h$]h&]jajbuh1jQhhhM4hjy hhubh)}(h+The average request rate is calculated as::h]h*The average request rate is calculated as:}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM7hjy hhubjR)}(hKAVG_RD_REQUEST_RATE = RD_REQ / CYCLES AVG_WR_REQUEST_RATE = WR_REQ / CYCLESh]hKAVG_RD_REQUEST_RATE = RD_REQ / CYCLES AVG_WR_REQUEST_RATE = WR_REQ / CYCLES}hj sbah}(h]h ]h"]h$]h&]jajbuh1jQhhhM9hjy hhubh)}(hXSThe PMU events can be filtered based on the destination root port or target address range. Filtering based on RP is only available for PCIE BAR traffic. Address filter works for both PCIE BAR and CXL HDM ranges. These filters can be found in sysfs, see /sys/bus/event_source/devices/nvidia_pcie_tgt_pmu__rc_/format/.h]hXSThe PMU events can be filtered based on the destination root port or target address range. Filtering based on RP is only available for PCIE BAR traffic. Address filter works for both PCIE BAR and CXL HDM ranges. These filters can be found in sysfs, see /sys/bus/event_source/devices/nvidia_pcie_tgt_pmu__rc_/format/.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM<hjy hhubh)}(hDestination filter settings:h]hDestination filter settings:}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMBhjy hhubh)}(hhh](h)}(hdst_rp_mask: bitmask to select the root port(s) to monitor. E.g. 
"dst_rp_mask=0xFF" corresponds to all root ports (from 0 to 7) in the PCIE RC. Note that this filter is only available for PCIE BAR traffic.h]h)}(hdst_rp_mask: bitmask to select the root port(s) to monitor. E.g. "dst_rp_mask=0xFF" corresponds to all root ports (from 0 to 7) in the PCIE RC. Note that this filter is only available for PCIE BAR traffic.h]hdst_rp_mask: bitmask to select the root port(s) to monitor. E.g. “dst_rp_mask=0xFF” corresponds to all root ports (from 0 to 7) in the PCIE RC. Note that this filter is only available for PCIE BAR traffic.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMDhj ubah}(h]h ]h"]h$]h&]uh1hhj hhhhhNubh)}(h2dst_addr_base: BAR or CXL HDM filter base address.h]h)}(hj h]h2dst_addr_base: BAR or CXL HDM filter base address.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMGhj ubah}(h]h ]h"]h$]h&]uh1hhj hhhhhNubh)}(h2dst_addr_mask: BAR or CXL HDM filter address mask.h]h)}(hj h]h2dst_addr_mask: BAR or CXL HDM filter address mask.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMHhj ubah}(h]h ]h"]h$]h&]uh1hhj hhhhhNubh)}(hXdst_addr_en: enable BAR or CXL HDM address range filter. If this is set, the address range specified by "dst_addr_base" and "dst_addr_mask" will be used to filter the PCIE BAR and CXL HDM traffic address. The PMU uses the following comparison to determine if the traffic destination address falls within the filter range:: (txn's addr & dst_addr_mask) == (dst_addr_base & dst_addr_mask) If the comparison succeeds, then the event will be counted. h](h)}(hXBdst_addr_en: enable BAR or CXL HDM address range filter. If this is set, the address range specified by "dst_addr_base" and "dst_addr_mask" will be used to filter the PCIE BAR and CXL HDM traffic address. The PMU uses the following comparison to determine if the traffic destination address falls within the filter range::h]hXIdst_addr_en: enable BAR or CXL HDM address range filter. 
If this is set, the address range specified by “dst_addr_base” and “dst_addr_mask” will be used to filter the PCIE BAR and CXL HDM traffic address. The PMU uses the following comparison to determine if the traffic destination address falls within the filter range:}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMIhj ubjR)}(h?(txn's addr & dst_addr_mask) == (dst_addr_base & dst_addr_mask)h]h?(txn's addr & dst_addr_mask) == (dst_addr_base & dst_addr_mask)}hj! sbah}(h]h ]h"]h$]h&]jajbuh1jQhhhMNhj ubh)}(h;If the comparison succeeds, then the event will be counted.h]h;If the comparison succeeds, then the event will be counted.}(hj/ hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMPhj ubeh}(h]h ]h"]h$]h&]uh1hhj hhhhhNubeh}(h]h ]h"]h$]h&]jjuh1hhhhMDhjy hhubh)}(hIf the destination filter is not specified, the RP filter will be configured by default to count PCIE BAR traffic to all root ports.h]hIf the destination filter is not specified, the RP filter will be configured by default to count PCIE BAR traffic to all root ports.}(hjI hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMRhjy hhubh)}(hExample usage:h]hExample usage:}(hjW hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMUhjy hhubh)}(hhh](h)}(hCount event id 0x0 to root port 0 and 1 of PCIE RC-0 on socket 0:: perf stat -a -e nvidia_pcie_tgt_pmu_0_rc_0/event=0x0,dst_rp_mask=0x3/ h](h)}(hBCount event id 0x0 to root port 0 and 1 of PCIE RC-0 on socket 0::h]hACount event id 0x0 to root port 0 and 1 of PCIE RC-0 on socket 0:}(hjl hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMWhjh ubjR)}(hEperf stat -a -e nvidia_pcie_tgt_pmu_0_rc_0/event=0x0,dst_rp_mask=0x3/h]hEperf stat -a -e nvidia_pcie_tgt_pmu_0_rc_0/event=0x0,dst_rp_mask=0x3/}hjz sbah}(h]h ]h"]h$]h&]jajbuh1jQhhhMYhjh ubeh}(h]h ]h"]h$]h&]uh1hhje hhhhhNubh)}(hCount event id 0x1 for accesses to PCIE BAR or CXL HDM address range 0x10000 to 0x100FF on socket 0's PCIE RC-1:: perf stat -a -e nvidia_pcie_tgt_pmu_0_rc_1/event=0x1,dst_addr_base=0x10000,dst_addr_mask=0xFFF00,dst_addr_en=0x1/ h](h)}(hqCount event id 0x1 for accesses to PCIE BAR or 
CXL HDM address range 0x10000 to 0x100FF on socket 0's PCIE RC-1::h]hrCount event id 0x1 for accesses to PCIE BAR or CXL HDM address range 0x10000 to 0x100FF on socket 0’s PCIE RC-1:}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM[hj ubjR)}(hqperf stat -a -e nvidia_pcie_tgt_pmu_0_rc_1/event=0x1,dst_addr_base=0x10000,dst_addr_mask=0xFFF00,dst_addr_en=0x1/h]hqperf stat -a -e nvidia_pcie_tgt_pmu_0_rc_1/event=0x1,dst_addr_base=0x10000,dst_addr_mask=0xFFF00,dst_addr_en=0x1/}hj sbah}(h]h ]h"]h$]h&]jajbuh1jQhhhM^hj ubeh}(h]h ]h"]h$]h&]uh1hhje hhhhhNubeh}(h]h ]h"]h$]h&]jjuh1hhhhMWhjy hhubeh}(h] pcie-tgt-pmuah ]h"] pcie-tgt pmuah$]h&]uh1hhhhhhhhMubh)}(hhh](h)}(hCPU Memory (CMEM) Latency PMUh]hCPU Memory (CMEM) Latency PMU}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hhhhhMaubh)}(hThis PMU monitors latency events of memory read requests from the edge of the Unified Coherence Fabric (UCF) to local CPU DRAM:h]hThis PMU monitors latency events of memory read requests from the edge of the Unified Coherence Fabric (UCF) to local CPU DRAM:}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMchj hhubj)}(h* RD_REQ counters: count read requests (32B per request). * RD_CUM_OUTS counters: accumulated outstanding request counter, which track how many cycles the read requests are in flight. * CYCLES counter: counts the number of elapsed cycles. 
h]h)}(hhh](h)}(h7RD_REQ counters: count read requests (32B per request).h]h)}(hj h]h7RD_REQ counters: count read requests (32B per request).}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMfhj ubah}(h]h ]h"]h$]h&]uh1hhj ubh)}(h{RD_CUM_OUTS counters: accumulated outstanding request counter, which track how many cycles the read requests are in flight.h]h)}(h{RD_CUM_OUTS counters: accumulated outstanding request counter, which track how many cycles the read requests are in flight.h]h{RD_CUM_OUTS counters: accumulated outstanding request counter, which track how many cycles the read requests are in flight.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMghj ubah}(h]h ]h"]h$]h&]uh1hhj ubh)}(h5CYCLES counter: counts the number of elapsed cycles. h]h)}(h4CYCLES counter: counts the number of elapsed cycles.h]h4CYCLES counter: counts the number of elapsed cycles.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMihj ubah}(h]h ]h"]h$]h&]uh1hhj ubeh}(h]h ]h"]h$]h&]jjuh1hhhhMfhj ubah}(h]h ]h"]h$]h&]uh1jhhhMfhj hhubh)}(h&The average latency is calculated as::h]h%The average latency is calculated as:}(hj; hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMkhj hhubjR)}(hFREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS AVG_LATENCY_IN_CYCLES = RD_CUM_OUTS / RD_REQ AVERAGE_LATENCY_IN_NS = AVG_LATENCY_IN_CYCLES / FREQ_IN_GHZh]hFREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS AVG_LATENCY_IN_CYCLES = RD_CUM_OUTS / RD_REQ AVERAGE_LATENCY_IN_NS = AVG_LATENCY_IN_CYCLES / FREQ_IN_GHZ}hjI sbah}(h]h ]h"]h$]h&]jajbuh1jQhhhMmhj hhubh)}(hThe events and configuration options of this PMU device are described in sysfs, see /sys/bus/event_source/devices/nvidia_cmem_latency_pmu_.h]hThe events and configuration options of this PMU device are described in sysfs, see /sys/bus/event_source/devices/nvidia_cmem_latency_pmu_.}(hjW hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMqhj hhubh)}(hExample usage::h]hExample usage:}(hje hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMthj hhubjR)}(h~perf stat -a -e 
'{nvidia_cmem_latency_pmu_0/rd_req/,nvidia_cmem_latency_pmu_0/rd_cum_outs/,nvidia_cmem_latency_pmu_0/cycles/}'h]h~perf stat -a -e '{nvidia_cmem_latency_pmu_0/rd_req/,nvidia_cmem_latency_pmu_0/rd_cum_outs/,nvidia_cmem_latency_pmu_0/cycles/}'}hjs sbah}(h]h ]h"]h$]h&]jajbuh1jQhhhMvhj hhubeh}(h]cpu-memory-cmem-latency-pmuah ]h"]cpu memory (cmem) latency pmuah$]h&]uh1hhhhhhhhMaubh)}(hhh](h)}(hNVLink-C2C PMUh]hNVLink-C2C PMU}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhj hhhhhMyubh)}(hThis PMU monitors latency events of memory read/write requests that pass through the NVIDIA Chip-to-Chip (C2C) interface. Bandwidth events are not available in this PMU, unlike the C2C PMU in Grace (Tegra241 SoC).h]hThis PMU monitors latency events of memory read/write requests that pass through the NVIDIA Chip-to-Chip (C2C) interface. Bandwidth events are not available in this PMU, unlike the C2C PMU in Grace (Tegra241 SoC).}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhM{hj hhubh)}(hThe events and configuration options of this PMU device are available in sysfs, see /sys/bus/event_source/devices/nvidia_nvlink_c2c_pmu_.h]hThe events and configuration options of this PMU device are available in sysfs, see /sys/bus/event_source/devices/nvidia_nvlink_c2c_pmu_.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj hhubh)}(hThe list of events:h]hThe list of events:}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj hhubj)}(hXe* IN_RD_CUM_OUTS: accumulated outstanding request (in cycles) of incoming read requests. * IN_RD_REQ: the number of incoming read requests. * IN_WR_CUM_OUTS: accumulated outstanding request (in cycles) of incoming write requests. * IN_WR_REQ: the number of incoming write requests. * OUT_RD_CUM_OUTS: accumulated outstanding request (in cycles) of outgoing read requests. * OUT_RD_REQ: the number of outgoing read requests. * OUT_WR_CUM_OUTS: accumulated outstanding request (in cycles) of outgoing write requests. * OUT_WR_REQ: the number of outgoing write requests. * CYCLES: NVLink-C2C interface cycle counts. 
h]h)}(hhh](h)}(hVIN_RD_CUM_OUTS: accumulated outstanding request (in cycles) of incoming read requests.h]h)}(hj h]hVIN_RD_CUM_OUTS: accumulated outstanding request (in cycles) of incoming read requests.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubah}(h]h ]h"]h$]h&]uh1hhj ubh)}(h0IN_RD_REQ: the number of incoming read requests.h]h)}(hj h]h0IN_RD_REQ: the number of incoming read requests.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubah}(h]h ]h"]h$]h&]uh1hhj ubh)}(hWIN_WR_CUM_OUTS: accumulated outstanding request (in cycles) of incoming write requests.h]h)}(hj h]hWIN_WR_CUM_OUTS: accumulated outstanding request (in cycles) of incoming write requests.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubah}(h]h ]h"]h$]h&]uh1hhj ubh)}(h1IN_WR_REQ: the number of incoming write requests.h]h)}(hj h]h1IN_WR_REQ: the number of incoming write requests.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubah}(h]h ]h"]h$]h&]uh1hhj ubh)}(hWOUT_RD_CUM_OUTS: accumulated outstanding request (in cycles) of outgoing read requests.h]h)}(hj) h]hWOUT_RD_CUM_OUTS: accumulated outstanding request (in cycles) of outgoing read requests.}(hj+ hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj' ubah}(h]h ]h"]h$]h&]uh1hhj ubh)}(h1OUT_RD_REQ: the number of outgoing read requests.h]h)}(hj@ h]h1OUT_RD_REQ: the number of outgoing read requests.}(hjB hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj> ubah}(h]h ]h"]h$]h&]uh1hhj ubh)}(hXOUT_WR_CUM_OUTS: accumulated outstanding request (in cycles) of outgoing write requests.h]h)}(hjW h]hXOUT_WR_CUM_OUTS: accumulated outstanding request (in cycles) of outgoing write requests.}(hjY hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjU ubah}(h]h ]h"]h$]h&]uh1hhj ubhᕒV)}(h2OUT_WR_REQ: the number of outgoing write requests.h]h)}(hjn h]h2OUT_WR_REQ: the number of outgoing write requests.}(hjp hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjl ubah}(h]h ]h"]h$]h&]uh1hhj ubh)}(h+CYCLES: NVLink-C2C interface cycle counts. 
h]h)}(h*CYCLES: NVLink-C2C interface cycle counts.h]h*CYCLES: NVLink-C2C interface cycle counts.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubah}(h]h ]h"]h$]h&]uh1hhj ubeh}(h]h ]h"]h$]h&]jjuh1hhhhMhj ubah}(h]h ]h"]h$]h&]uh1jhhhMhj hhubh)}(hThe incoming events count the reads/writes from remote device to the SoC. The outgoing events count the reads/writes from the SoC to remote device.h]hThe incoming events count the reads/writes from remote device to the SoC. The outgoing events count the reads/writes from the SoC to remote device.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj hhubh)}(hThe sysfs /sys/bus/event_source/devices/nvidia_nvlink_c2c_pmu_/peer contains the information about the connected device.h]hThe sysfs /sys/bus/event_source/devices/nvidia_nvlink_c2c_pmu_/peer contains the information about the connected device.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj hhubh)}(hX1When the C2C interface is connected to GPU(s), the user can use the "gpu_mask" parameter to filter traffic to/from specific GPU(s). Each bit represents the GPU index, e.g. "gpu_mask=0x1" corresponds to GPU 0 and "gpu_mask=0x3" is for GPU 0 and 1. The PMU will monitor all GPUs by default if not specified.h]hX=When the C2C interface is connected to GPU(s), the user can use the “gpu_mask” parameter to filter traffic to/from specific GPU(s). Each bit represents the GPU index, e.g. “gpu_mask=0x1” corresponds to GPU 0 and “gpu_mask=0x3” is for GPU 0 and 1. 
The PMU will monitor all GPUs by default if not specified.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj hhubh)}(hBWhen connected to another SoC, only the read events are available.h]hBWhen connected to another SoC, only the read events are available.}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj hhubh)}(hTThe events can be used to calculate the average latency of the read/write requests::h]hSThe events can be used to calculate the average latency of the read/write requests:}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj hhubjR)}(hX?C2C_FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS IN_RD_AVG_LATENCY_IN_CYCLES = IN_RD_CUM_OUTS / IN_RD_REQ IN_RD_AVG_LATENCY_IN_NS = IN_RD_AVG_LATENCY_IN_CYCLES / C2C_FREQ_IN_GHZ IN_WR_AVG_LATENCY_IN_CYCLES = IN_WR_CUM_OUTS / IN_WR_REQ IN_WR_AVG_LATENCY_IN_NS = IN_WR_AVG_LATENCY_IN_CYCLES / C2C_FREQ_IN_GHZ OUT_RD_AVG_LATENCY_IN_CYCLES = OUT_RD_CUM_OUTS / OUT_RD_REQ OUT_RD_AVG_LATENCY_IN_NS = OUT_RD_AVG_LATENCY_IN_CYCLES / C2C_FREQ_IN_GHZ OUT_WR_AVG_LATENCY_IN_CYCLES = OUT_WR_CUM_OUTS / OUT_WR_REQ OUT_WR_AVG_LATENCY_IN_NS = OUT_WR_AVG_LATENCY_IN_CYCLES / C2C_FREQ_IN_GHZh]hX?C2C_FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS IN_RD_AVG_LATENCY_IN_CYCLES = IN_RD_CUM_OUTS / IN_RD_REQ IN_RD_AVG_LATENCY_IN_NS = IN_RD_AVG_LATENCY_IN_CYCLES / C2C_FREQ_IN_GHZ IN_WR_AVG_LATENCY_IN_CYCLES = IN_WR_CUM_OUTS / IN_WR_REQ IN_WR_AVG_LATENCY_IN_NS = IN_WR_AVG_LATENCY_IN_CYCLES / C2C_FREQ_IN_GHZ OUT_RD_AVG_LATENCY_IN_CYCLES = OUT_RD_CUM_OUTS / OUT_RD_REQ OUT_RD_AVG_LATENCY_IN_NS = OUT_RD_AVG_LATENCY_IN_CYCLES / C2C_FREQ_IN_GHZ OUT_WR_AVG_LATENCY_IN_CYCLES = OUT_WR_CUM_OUTS / OUT_WR_REQ OUT_WR_AVG_LATENCY_IN_NS = OUT_WR_AVG_LATENCY_IN_CYCLES / C2C_FREQ_IN_GHZ}hj sbah}(h]h ]h"]h$]h&]jajbuh1jQhhhMhj hhubh)}(hExample usage:h]hExample usage:}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj hhubj)}(hX* Count incoming traffic from all GPUs connected via NVLink-C2C:: perf stat -a -e nvidia_nvlink_c2c_pmu_0/in_rd_req/ * Count incoming traffic from GPU 0 connected via NVLink-C2C:: perf stat -a -e 
nvidia_nvlink_c2c_pmu_0/in_rd_cum_outs,gpu_mask=0x1/ * Count incoming traffic from GPU 1 connected via NVLink-C2C:: perf stat -a -e nvidia_nvlink_c2c_pmu_0/in_rd_cum_outs,gpu_mask=0x2/ * Count outgoing traffic to all GPUs connected via NVLink-C2C:: perf stat -a -e nvidia_nvlink_c2c_pmu_0/out_rd_req/ * Count outgoing traffic to GPU 0 connected via NVLink-C2C:: perf stat -a -e nvidia_nvlink_c2c_pmu_0/out_rd_cum_outs,gpu_mask=0x1/ * Count outgoing traffic to GPU 1 connected via NVLink-C2C:: perf stat -a -e nvidia_nvlink_c2c_pmu_0/out_rd_cum_outs,gpu_mask=0x2/ h]h)}(hhh](h)}(hvCount incoming traffic from all GPUs connected via NVLink-C2C:: perf stat -a -e nvidia_nvlink_c2c_pmu_0/in_rd_req/ h](h)}(h?Count incoming traffic from all GPUs connected via NVLink-C2C::h]h>Count incoming traffic from all GPUs connected via NVLink-C2C:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubjR)}(h2perf stat -a -e nvidia_nvlink_c2c_pmu_0/in_rd_req/h]h2perf stat -a -e nvidia_nvlink_c2c_pmu_0/in_rd_req/}hj"sbah}(h]h ]h"]h$]h&]jajbuh1jQhhhMhjubeh}(h]h ]h"]h$]h&]uh1hhj ubh)}(hCount incoming traffic from GPU 0 connected via NVLink-C2C:: perf stat -a -e nvidia_nvlink_c2c_pmu_0/in_rd_cum_outs,gpu_mask=0x1/ h](h)}(h.h]hThe events and configuration options of this PMU device are available in sysfs, see /sys/bus/event_source/devices/nvidia_nvclink_pmu_.}(hj'hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjhhubh)}(hThe list of events:h]hThe list of events:}(hj5hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjhhubj)}(hXE* IN_RD_CUM_OUTS: accumulated outstanding request (in cycles) of incoming read requests. * IN_RD_REQ: the number of incoming read requests. * OUT_RD_CUM_OUTS: accumulated outstanding request (in cycles) of outgoing read requests. * OUT_RD_REQ: the number of outgoing read requests. * CYCLES: NV-CLINK interface cycle counts. 
h]h)}(hhh](h)}(hVIN_RD_CUM_OUTS: accumulated outstanding request (in cycles) of incoming read requests.h]h)}(hjLh]hVIN_RD_CUM_OUTS: accumulated outstanding request (in cycles) of incoming read requests.}(hjNhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjJubah}(h]h ]h"]h$]h&]uh1hhjGubh)}(h0IN_RD_REQ: the number of incoming read requests.h]h)}(hjch]h0IN_RD_REQ: the number of incoming read requests.}(hjehhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjaubah}(h]h ]h"]h$]h&]uh1hhjGubh)}(hWOUT_RD_CUM_OUTS: accumulated outstanding request (in cycles) of outgoing read requests.h]h)}(hjzh]hWOUT_RD_CUM_OUTS: accumulated outstanding request (in cycles) of outgoing read requests.}(hj|hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjxubah}(h]h ]h"]h$]h&]uh1hhjGubh)}(h1OUT_RD_REQ: the number of outgoing read requests.h]h)}(hjh]h1OUT_RD_REQ: the number of outgoing read requests.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1hhjGubh)}(h)CYCLES: NV-CLINK interface cycle counts. h]h)}(h(CYCLES: NV-CLINK interface cycle counts.h]h(CYCLES: NV-CLINK interface cycle counts.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1hhjGubeh}(h]h ]h"]h$]h&]jjuh1hhhhMhjCubah}(h]h ]h"]h$]h&]uh1jhhhMhjhhubh)}(hThe incoming events count the reads from remote device to the SoC. The outgoing events count the reads from the SoC to remote device.h]hThe incoming events count the reads from remote device to the SoC. 
The outgoing events count the reads from the SoC to remote device.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjhhubh)}(hNThe events can be used to calculate the average latency of the read requests::h]hMThe events can be used to calculate the average latency of the read requests:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjhhubjR)}(hX<CLINK_FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS IN_RD_AVG_LATENCY_IN_CYCLES = IN_RD_CUM_OUTS / IN_RD_REQ IN_RD_AVG_LATENCY_IN_NS = IN_RD_AVG_LATENCY_IN_CYCLES / CLINK_FREQ_IN_GHZ OUT_RD_AVG_LATENCY_IN_CYCLES = OUT_RD_CUM_OUTS / OUT_RD_REQ OUT_RD_AVG_LATENCY_IN_NS = OUT_RD_AVG_LATENCY_IN_CYCLES / CLINK_FREQ_IN_GHZh]hX<CLINK_FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS IN_RD_AVG_LATENCY_IN_CYCLES = IN_RD_CUM_OUTS / IN_RD_REQ IN_RD_AVG_LATENCY_IN_NS = IN_RD_AVG_LATENCY_IN_CYCLES / CLINK_FREQ_IN_GHZ OUT_RD_AVG_LATENCY_IN_CYCLES = OUT_RD_CUM_OUTS / OUT_RD_REQ OUT_RD_AVG_LATENCY_IN_NS = OUT_RD_AVG_LATENCY_IN_CYCLES / CLINK_FREQ_IN_GHZ}hjsbah}(h]h ]h"]h$]h&]jajbuh1jQhhhMhjhhubh)}(hExample usage:h]hExample usage:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjhhubj)}(h* Count incoming read traffic from remote SoC connected via NV-CLINK:: perf stat -a -e nvidia_nvclink_pmu_0/in_rd_req/ * Count outgoing read traffic to remote SoC connected via NV-CLINK:: perf stat -a -e nvidia_nvclink_pmu_0/out_rd_req/ h]h)}(hhh](h)}(hxCount incoming read traffic from remote SoC connected via NV-CLINK:: perf stat -a -e nvidia_nvclink_pmu_0/in_rd_req/ h](h)}(hDCount incoming read traffic from remote SoC connected via NV-CLINK::h]hCCount incoming read traffic from remote SoC connected via NV-CLINK:}(hj hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj ubjR)}(h/perf stat -a -e nvidia_nvclink_pmu_0/in_rd_req/h]h/perf stat -a -e nvidia_nvclink_pmu_0/in_rd_req/}hjsbah}(h]h ]h"]h$]h&]jajbuh1jQhhhMhj ubeh}(h]h ]h"]h$]h&]uh1hhjubh)}(hwCount outgoing read traffic to remote SoC connected via NV-CLINK:: perf stat -a -e nvidia_nvclink_pmu_0/out_rd_req/ h](h)}(hBCount outgoing read traffic to remote SoC 
connected via NV-CLINK::h]hACount outgoing read traffic to remote SoC connected via NV-CLINK:}(hj3hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj/ubjR)}(h0perf stat -a -e nvidia_nvclink_pmu_0/out_rd_req/h]h0perf stat -a -e nvidia_nvclink_pmu_0/out_rd_req/}hjAsbah}(h]h ]h"]h$]h&]jajbuh1jQhhhMhj/ubeh}(h]h ]h"]h$]h&]uh1hhjubeh}(h]h ]h"]h$]h&]jjuh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1jhhhMhjhhubeh}(h] nv-clink-pmuah ]h"] nv-clink pmuah$]h&]uh1hhhhhhhhMubh)}(hhh](h)}(h NV-DLink PMUh]h NV-DLink PMU}(hjlhhhNhNubah}(h]h ]h"]h$]h&]uh1hhjihhhhhMubh)}(hThis PMU monitors latency events of memory read requests that pass through the NV-DLINK interface. Bandwidth events are not available in this PMU. In Tegra410 SoC, this PMU only counts CXL memory read traffic.h]hThis PMU monitors latency events of memory read requests that pass through the NV-DLINK interface. Bandwidth events are not available in this PMU. In Tegra410 SoC, this PMU only counts CXL memory read traffic.}(hjzhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjihhubh)}(hThe events and configuration options of this PMU device are available in sysfs, see /sys/bus/event_source/devices/nvidia_nvdlink_pmu_.h]hThe events and configuration options of this PMU device are available in sysfs, see /sys/bus/event_source/devices/nvidia_nvdlink_pmu_.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjihhubh)}(hThe list of events:h]hThe list of events:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjihhubj)}(h* IN_RD_CUM_OUTS: accumulated outstanding read requests (in cycles) to CXL memory. * IN_RD_REQ: the number of read requests to CXL memory. * CYCLES: NV-DLINK interface cycle counts. 
h]h)}(hhh](h)}(hPIN_RD_CUM_OUTS: accumulated outstanding read requests (in cycles) to CXL memory.h]h)}(hjh]hPIN_RD_CUM_OUTS: accumulated outstanding read requests (in cycles) to CXL memory.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1hhjubh)}(h5IN_RD_REQ: the number of read requests to CXL memory.h]h)}(hjh]h5IN_RD_REQ: the number of read requests to CXL memory.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1hhjubh)}(h)CYCLES: NV-DLINK interface cycle counts. h]h)}(h(CYCLES: NV-DLINK interface cycle counts.h]h(CYCLES: NV-DLINK interface cycle counts.}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1hhjubeh}(h]h ]h"]h$]h&]jjuh1hhhhMhjubah}(h]h ]h"]h$]h&]uh1jhhhMhjihhubh)}(hNThe events can be used to calculate the average latency of the read requests::h]hMThe events can be used to calculate the average latency of the read requests:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjihhubjR)}(hDLINK_FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS IN_RD_AVG_LATENCY_IN_CYCLES = IN_RD_CUM_OUTS / IN_RD_REQ IN_RD_AVG_LATENCY_IN_NS = IN_RD_AVG_LATENCY_IN_CYCLES / DLINK_FREQ_IN_GHZh]hDLINK_FREQ_IN_GHZ = CYCLES / ELAPSED_TIME_IN_NS IN_RD_AVG_LATENCY_IN_CYCLES = IN_RD_CUM_OUTS / IN_RD_REQ IN_RD_AVG_LATENCY_IN_NS = IN_RD_AVG_LATENCY_IN_CYCLES / DLINK_FREQ_IN_GHZ}hj sbah}(h]h ]h"]h$]h&]jajbuh1jQhhhMhjihhubh)}(hExample usage:h]hExample usage:}(hjhhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhjihhubj)}(h* Count read events to CXL memory:: perf stat -a -e '{nvidia_nvdlink_pmu_0/in_rd_req/,nvidia_nvdlink_pmu_0/in_rd_cum_outs/}'h]h)}(hhh]h)}(h}Count read events to CXL memory:: perf stat -a -e '{nvidia_nvdlink_pmu_0/in_rd_req/,nvidia_nvdlink_pmu_0/in_rd_cum_outs/}'h](h)}(h!Count read events to CXL memory::h]h Count read events to CXL memory:}(hj2hhhNhNubah}(h]h ]h"]h$]h&]uh1hhhhMhj.ubjR)}(hXperf stat -a -e '{nvidia_nvdlink_pmu_0/in_rd_req/,nvidia_nvdlink_pmu_0/in_rd_cum_outs/}'h]hXperf stat -a -e 
'{nvidia_nvdlink_pmu_0/in_rd_req/,nvidia_nvdlink_pmu_0/in_rd_cum_outs/}'}hj@sbah}(h]h ]h"]h$]h&]jajbuh1jQhhhM hj.ubeh}(h]h ]h"]h$]h&]uh1hhj+ubah}(h]h ]h"]h$]h&]jjuh1hhhhMhj'ubah}(h]h ]h"]h$]h&]uh1jhhhMhjihhubeh}(h] nv-dlink-pmuah ]h"] nv-dlink pmuah$]h&]uh1hhhhhhhhMubeh}(h]:nvidia-tegra410-soc-uncore-performance-monitoring-unit-pmuah ]h"]