author    Vishal Verma <vishal.l.verma@intel.com>  2020-03-27 18:50:36 -0600
committer Vishal Verma <vishal.l.verma@intel.com>  2022-01-12 11:50:36 -0700
commit   d58c7b0e1a99a2ec17f2910a310835bafc50b4d1 (patch)
tree     cd73d9301bad86bcd216675188b92f7947dcb478
parent   e5e4a15a16e3b804faf9cf6e5487e50b362114bc (diff)
download tiering-0.8.tar.gz
Add a tiering specific README document to the toplevel (tag: tiering-0.8)

Add a README that details setup steps and notes for running and evaluating
the tiering specific features present in this branch.

Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
-rw-r--r--  README-tiering.txt | 117
1 file changed, 117 insertions(+), 0 deletions(-)
diff --git a/README-tiering.txt b/README-tiering.txt
new file mode 100644
index 00000000000000..9ed612ac2703b7
--- /dev/null
+++ b/README-tiering.txt
@@ -0,0 +1,117 @@
+Release Notes
+=============
+
+Updates in tiering-0.8:
+- Rebased on v5.15
+- Remove cgroup v1 support; we will switch to cgroup v2 support in a
+ future version. If you need cgroup v1 support, please stick with
+ v0.72.
+- Increase the hot threshold more quickly if too few pages pass it
+- Reset hot threshold if workload change is detected
+- Batch migrate_pages() to reduce TLB shootdown IPIs
+- Support to decrease hot threshold if the pages just demoted are hot
+- Support to promote pages asynchronously
+- Support to wake up kswapd earlier to make promotion smoother
+- Add more sysctl knobs for experimenting with new features
+- Change the interface to enable NUMA balancing for MPOL_PREFERRED_MANY
+
+
+0. Introduction
+===============
+
+This document describes the software setup steps and usage hints that are
+helpful in setting up a system to evaluate tiered memory.
+
+The document will go over:
+1. Any kernel config options required to enable the tiered memory features
+2. Any additional userspace tooling required, and related instructions
+3. Any post-boot setup for configurable or tunable knobs
+
+Note:
+The instructions and settings described here are tailored to the branch
+they appear under. Setup steps may change from release to release, so the
+setup document accompanying each release branch should be
+consulted.
+
+1. Kernel build and configuration
+=================================
+
+a. The recommended starting point is a distro-default kernel config; we
+ use and recommend a recent Fedora config.
+
+b. Ensure the following:
+ CONFIG_DEV_DAX_KMEM=m
+ CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n
+ CONFIG_NUMA_BALANCING=y
+
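A quick way to confirm these options is to grep the kernel config file. The snippet below is a sketch: the sample config it writes is illustrative, and on a real system you would set CONFIG_FILE to /boot/config-$(uname -r) or your build tree's .config instead.

```shell
# Sketch: verify the tiering-related kernel config options.
# The sample config generated here is illustrative; on a real system set
# CONFIG_FILE=/boot/config-$(uname -r) or your build tree's .config.
CONFIG_FILE="${CONFIG_FILE:-$(mktemp)}"
[ -s "$CONFIG_FILE" ] || cat > "$CONFIG_FILE" <<'EOF'
CONFIG_DEV_DAX_KMEM=m
# CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE is not set
CONFIG_NUMA_BALANCING=y
EOF

# Report whether each option matches the expected value; "=n" is spelled
# "# <option> is not set" in kernel config files.
check() {
    if grep -q "^$1=$2\$" "$CONFIG_FILE"; then
        echo "$1=$2 ok"
    elif [ "$2" = n ] && grep -q "^# $1 is not set" "$CONFIG_FILE"; then
        echo "$1=n ok"
    else
        echo "$1 MISMATCH"
    fi
}

check CONFIG_DEV_DAX_KMEM m
check CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE n
check CONFIG_NUMA_BALANCING y
```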
+
+2. Tooling setup
+================
+
+a. Install 'ndctl' and 'daxctl' from your distro, or from upstream:
+ https://github.com/pmem/ndctl
+ This may especially be required if the distro version of daxctl
+ is not new enough to support the daxctl reconfigure-device command [1].
+
+ [1]: https://pmem.io/ndctl/daxctl-reconfigure-device.html
+
+b. Assuming that persistent memory devices are the next demotion tier
+ for system memory, perform the following steps to allow a pmem device
+ to be hot-plugged as system RAM:
+
+ First, convert 'fsdax' namespace(s) to 'devdax':
+ ndctl create-namespace -fe namespaceX.Y -m devdax
+
+c. Reconfigure 'daxctl' devices to system-ram using the kmem facility:
+ daxctl reconfigure-device -m system-ram daxX.Y
+
+ The JSON emitted at this step contains the 'target_node' for this
+ hotplugged memory. This is the memory-only NUMA node where this
+ memory appears, and can be used explicitly with normal libnuma/numactl
+ techniques.
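Extracting 'target_node' from that JSON can be scripted. The snippet below is a sketch: the JSON string is a hypothetical sample of what daxctl emits, and on a real system you would pipe the actual `daxctl reconfigure-device` output instead.

```shell
# Sketch: pull 'target_node' out of daxctl's JSON output.
# This JSON is a hypothetical sample; on a real system substitute the
# actual output of 'daxctl reconfigure-device -m system-ram dax0.0'.
json='[{"chardev":"dax0.0","mode":"system-ram","movable":true,"target_node":2}]'

# Grab the numeric value following the "target_node" key.
target_node=$(printf '%s' "$json" | sed -n 's/.*"target_node":\([0-9][0-9]*\).*/\1/p')
echo "hotplugged memory is NUMA node $target_node"

# The node can then be targeted with normal numactl techniques, e.g.:
#   numactl --membind="$target_node" ./workload
```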
+
+d. Ensure the newly created NUMA nodes for the hotplugged memory are in
+ ZONE_MOVABLE. The JSON from daxctl in the above step should indicate
+ this with a 'movable: true' attribute. Based on the distribution, there
+ may be udev rules that interfere with memory onlining. They may race to
+ online memory into ZONE_NORMAL rather than movable. If this is the case,
+ find and disable any such udev rules.
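The ZONE_MOVABLE check can also be done directly against sysfs. This is a sketch under stated assumptions: SYSFS_ROOT defaults to a mock tree so the loop can run standalone, and NODE is a placeholder; on a real system use SYSFS_ROOT=/sys and the target_node reported by daxctl.

```shell
# Sketch: confirm every memory block of the hotplugged node lists
# ZONE_MOVABLE in its valid_zones file. SYSFS_ROOT and NODE are
# assumptions: the defaults build a mock tree so the logic is runnable
# standalone; on a real system use SYSFS_ROOT=/sys and the daxctl node.
NODE="${NODE:-2}"
SYSFS_ROOT="${SYSFS_ROOT:-$(mktemp -d)}"
node_dir="$SYSFS_ROOT/devices/system/node/node$NODE"
if [ ! -d "$node_dir" ]; then
    # Mock two memory blocks that were onlined movable.
    mkdir -p "$node_dir/memory40" "$node_dir/memory41"
    echo Movable > "$node_dir/memory40/valid_zones"
    echo Movable > "$node_dir/memory41/valid_zones"
fi

not_movable=0
for zones in "$node_dir"/memory*/valid_zones; do
    grep -q Movable "$zones" || not_movable=$((not_movable + 1))
done
echo "memory blocks not in ZONE_MOVABLE: $not_movable"
```

A non-zero count suggests a udev rule onlined some blocks into ZONE_NORMAL first.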
+
+3. Post boot setup
+==================
+
+a. Enable cold page demotion after the device-dax instances are
+ onlined to start migrating 'cold' pages from DRAM to PMEM:
+ # echo 1 > /sys/kernel/mm/numa/demotion_enabled
+
+b. Enable 'NUMA balancing' for promotion
+ # echo 2 > /proc/sys/kernel/numa_balancing
+ # echo 30 > /proc/sys/kernel/numa_balancing_rate_limit_mbps
+
+c. Optional: enable waking up kswapd earlier to make promotion smoother
+
+ # echo 1 > /proc/sys/kernel/numa_balancing_wake_up_kswapd_early
+
+d. Optional: enable decreasing the hot threshold if recently demoted pages are hot
+
+ # echo 1 > /proc/sys/kernel/numa_balancing_scan_demoted
+ # echo 16 > /proc/sys/kernel/numa_balancing_demoted_threshold
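The knobs from steps b-d can be applied and read back in one pass. This is a sketch: PROC_ROOT is an assumption that defaults to a mock tree so the sequence runs without root; on a real system run as root with PROC_ROOT=/proc.

```shell
# Sketch: apply and read back the NUMA-balancing knobs from steps b-d.
# PROC_ROOT is an assumption: it defaults to a mock tree so this runs
# without root; on a real system run as root with PROC_ROOT=/proc.
PROC_ROOT="${PROC_ROOT:-$(mktemp -d)}"
mkdir -p "$PROC_ROOT/sys/kernel"

# Write a value to a knob, then echo it back for confirmation.
set_knob() {
    echo "$2" > "$PROC_ROOT/sys/kernel/$1"
    echo "$1 = $(cat "$PROC_ROOT/sys/kernel/$1")"
}

set_knob numa_balancing 2
set_knob numa_balancing_rate_limit_mbps 30
set_knob numa_balancing_wake_up_kswapd_early 1
set_knob numa_balancing_scan_demoted 1
set_knob numa_balancing_demoted_threshold 16
```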
+
+4. Promotion/demotion statistics
+================================
+
+ The number of pages promoted can be checked via the following counter in
+ /proc/vmstat or /sys/devices/system/node/node[n]/vmstat:
+ pgpromote_success
+
+ The number of pages demoted can be checked by the following counters:
+ pgdemote_kswapd
+ pgdemote_direct
+
+ The number of pages that failed to be promoted can be checked via the
+ following counters:
+ pgmigrate_fail_dst_node_fail
+ pgmigrate_fail_numa_isolate_fail
+ pgmigrate_fail_nomem_fail
+ pgmigrate_fail_refcount_fail
+
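The counters above can be summarized with a short script. This is a sketch: the inline vmstat sample is fabricated for illustration, and on a real system you would set VMSTAT=/proc/vmstat (counts are in pages).

```shell
# Sketch: summarize promotion/demotion activity from vmstat counters.
# VMSTAT defaults to a fabricated inline sample for illustration; on a
# real system use VMSTAT=/proc/vmstat.
VMSTAT="${VMSTAT:-$(mktemp)}"
[ -s "$VMSTAT" ] || cat > "$VMSTAT" <<'EOF'
pgpromote_success 1200
pgdemote_kswapd 800
pgdemote_direct 150
pgmigrate_fail_dst_node_fail 3
pgmigrate_fail_numa_isolate_fail 1
pgmigrate_fail_nomem_fail 0
pgmigrate_fail_refcount_fail 2
EOF

# Look up one counter's value by name.
counter() { awk -v k="$1" '$1 == k { print $2 }' "$VMSTAT"; }

promoted=$(counter pgpromote_success)
demoted=$(( $(counter pgdemote_kswapd) + $(counter pgdemote_direct) ))
echo "promoted pages: $promoted"
echo "demoted pages:  $demoted"
```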