author: SeongJae Park <sj@kernel.org> 2023-11-05 18:49:16 +0000
committer: SeongJae Park <sj@kernel.org> 2023-11-05 18:49:16 +0000
commit: c5b207a868a6e76327f6e57ced80877577451cbd
tree: 9d1945e419bcb05e80c4f77e74234737aadc5e1e
parent: 61f9609f6d59e1e8808d804e59ffba273e1638ee
ideas: Add RFC IDEA for tiered memory management
Signed-off-by: SeongJae Park <sj@kernel.org>
-rw-r--r-- ideas/tiered_mem 185
1 files changed, 185 insertions, 0 deletions
diff --git a/ideas/tiered_mem b/ideas/tiered_mem
new file mode 100644
index 0000000..d49aafc
--- /dev/null
+++ b/ideas/tiered_mem
@@ -0,0 +1,185 @@
+Subject: [RFC IDEA] DAMOS-based Tiered-Memory Management
+
+Hello,
+
+
+There have been attempts to use DAMON for tiered memory system management
+since the pretty early days of DAMON[1]. I also wanted to dive deep into this
+topic[2], but unfortunately didn't get time for it. Meanwhile, a few people
+explored the approach in their own ways and thankfully shared their approaches
+and results with me, sometimes in public and sometimes in private.
+
+They used varying approaches, and the results were also very different. Some
+folks achieved nice results, while for others it was only a waste of time.
+Nonetheless, what I commonly heard from those who kindly shared was the
+difficulty of DAMON tuning[3,4]. My proposal for the tuning difficulty was
+auto-tuning of DAMOS aggressiveness based on user-provided or self-collectable
+feedback. It is still not done, but an early version of the implementation
+has recently been shared. I'd like to share my concept-level idea of using the
+auto-tuning for reasonable DAMOS-based tiered memory system management.
+
+Background
+==========
+
+Please read the cover letter of the DAMOS auto-tuning RFC patchset for
+details[5]. In short, the feature allows users to define a target system
+status (e.g., 0.05% memory PSI and/or 50% free memory ratio), and lets DAMOS
+control its aggressiveness to achieve the goals. If the status is far from
+the goal, DAMOS increases its aggressiveness, and vice versa. Under the
+limited aggressiveness, DAMOS applies the action first to pages that are more
+prioritized for the action. For example, colder pages are prioritized for the
+'pageout' action.
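As a rough illustration of the feedback idea only, the sketch below scales a
quota by how far a metric is from its goal. It is NOT the algorithm of the RFC
patchset; `tune_quota()`, its parameters, and the basis-point unit are
assumptions made up for this example. It also assumes a metric where a larger
value means the goal is closer to (or beyond) being met, such as a free memory
ratio.

```python
def tune_quota(quota, current_bp, goal_bp, min_quota=1, max_quota=1 << 30):
    """Return a new quota scaled by the distance between metric and goal.

    current_bp and goal_bp are in basis points (1/10000) so that only
    integer arithmetic is needed.  All names here are hypothetical.
    """
    if goal_bp <= 0:
        raise ValueError("goal must be positive")
    # Proportional feedback: quota * (1 + (goal - current) / goal).
    # Far below the goal -> factor near 2; at the goal -> factor 1;
    # beyond the goal -> factor below 1.
    new_quota = quota * (2 * goal_bp - current_bp) // goal_bp
    # Keep the result inside the allowed range.
    return max(min_quota, min(max_quota, new_quota))


if __name__ == "__main__":
    # Goal: 5% free memory (500 bp).  Only 1% is free, so the quota grows.
    print(tune_quota(100, current_bp=100, goal_bp=500))
```

With a fixed quota, the scheme would still pick the most prioritized pages
(e.g., the coldest ones for 'pageout') first; only the amount of work per
interval changes.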
+
+DAMOS-based Tiered-Memory Management Idea
+=========================================
+
+The idea is to set DAMOS schemes for each tiered memory node, like below.
+
+1. If the node has a lower node, demote cold pages of the node to the lower
+node using DAMOS, colder pages first. Let DAMOS auto-tune the aggressiveness
+of the demotion, aiming for a small amount of (e.g., 5%) free memory on the
+current node.
+
+2. If the node is not the highest node, promote hot pages of the node to the
+upper node using DAMOS, hotter pages first. Let DAMOS auto-tune the
+aggressiveness of the promotion, aiming for a high utilization rate (e.g.,
+96%) of the upper node.
+
+Discussion
+==========
+
+The simple scheme can be easily extended to multiple tiered memory nodes.
+Higher nodes will keep high utilization with hot pages, while lower nodes have
+only cold pages that cannot be accommodated in higher nodes due to lack of
+space.
+
+Because the utilization goal (e.g., 96%) and the free memory goal (e.g., 5%)
+overlap, both cannot be satisfied at the same time, so DAMOS will continue
+moving cold pages down and hot pages up. Since the demotion targets the
+coldest pages of the node, and the goal-based aggressiveness auto-tuning keeps
+the aggressiveness minimal for the overlapping case, the overhead will be only
+modest.
+
+If the system is memory over-committed, we can also apply DAMOS-based
+proactive reclamation of cold pages aiming for some level of PSI, and
+Access/Contiguity-aware Memory Auto-scaling[6], to the lowest node.
+
+Request For Comments
+====================
+
+This is at a very early stage. Not enough survey of related work has been
+done, and no implementation of the demote/promote DAMOS actions has been made.
+An early version of the aim-oriented DAMOS aggressiveness auto-tuning is
+available, though. I don't even have a good tiered memory test system setup
+for myself. I hope to share this approach, which I believe might work in
+general, and get any comments if possible, not only to succeed, but rather to
+learn and improve, or even fail fast.
+
+Example Operation Scenario
+==========================
+
+Let's suppose a system has a DRAM node and a slower CXL-based node, each of
+which can accommodate 10 pages.
+
+And the proposed idea is configured. The free memory ratio and memory
+utilization ratio goals are set as 5% and 96%, respectively. That is, we have
+the demote DAMOS scheme for the DRAM node, and the promote DAMOS scheme for
+the CXL node.
+
+Let's also represent the hotness of each page in five levels, from 0 (cold) to
+4 (hot). And let's represent free pages as 'F'.
+
+Then a state of the system may be represented like below:
+
+ Node 0 (DRAM): 43210 43210 (100% util, 0% free)
+ Node 1 (CXL): FFFFF FFFFF ( 0% util, 100% free)
+
+Demoting Cold Pages
+-------------------
+
+Since the DRAM node has a 0% free memory ratio, which is under the goal of the
+demote scheme (5%), the demote scheme is activated. Meanwhile, since the DRAM
+node utilization ratio is 100%, which is higher than the goal of the promote
+scheme (96%), the promote scheme does nothing. Hence, cold pages in Node 0
+are demoted, colder ones first.
+
+ Node 0 (DRAM): 4321F 43210 ( 95% util, 5% free)
+ Node 1 (CXL): FFFF0 FFFFF ( 5% util, 95% free)
+
+The goal of the demote scheme is met, so the demote scheme stops. The DRAM
+utilization ratio is now below the promote scheme's goal (96%), so the promote
+scheme is activated and promotes hot pages of the CXL node, hotter ones first.
+
+ Node 0 (DRAM): 43210 43210 (100% util, 0% free)
+ Node 1 (CXL): FFFFF FFFFF ( 0% util, 100% free)
+
+Then the promote scheme is deactivated again, and the demote scheme is
+activated. The coldest page is demoted.
+
+ Node 0 (DRAM): 43210 4321F ( 95% util, 5% free)
+ Node 1 (CXL): FFFFF FFFF0 ( 5% util, 95% free)
+
+Then the promote scheme is activated again, and the demote scheme is
+deactivated. The hottest page in CXL memory is promoted.
+
+ Node 0 (DRAM): 43210 43210 (100% util, 0% free)
+ Node 1 (CXL): FFFFF FFFFF ( 0% util, 100% free)
+
+In this way, the DRAM node keeps a high utilization ratio with hot pages,
+while only cold pages move back and forth between the two nodes.
+
+Promoting Hot Pages
+-------------------
+
+Let's assume the two cold pages are in the CXL node.
+
+ Node 0 (DRAM): 4321F 4321F ( 90% util, 10% free)
+ Node 1 (CXL): FFFF0 FFFF0 ( 10% util, 90% free)
+
+And let's assume the demoted pages become hot.
+
+ Node 0 (DRAM): 4321F 4321F ( 90% util, 10% free)
+ Node 1 (CXL): FFFF3 FFFF2 ( 10% util, 90% free)
+
+The promotion scheme will promote the hot pages, the hotter page first.
+
+ Node 0 (DRAM): 43213 4321F ( 95% util, 5% free)
+ Node 1 (CXL): FFFFF FFFF2 ( 5% util, 95% free)
+
+The other page in CXL also gets promoted, following the goals.
+
+ Node 0 (DRAM): 43213 43212 (100% util, 0% free)
+ Node 1 (CXL): FFFFF FFFFF ( 0% util, 100% free)
+
+Now, the demotion scheme demotes cold pages of the DRAM node, not the
+just-promoted, now-hot pages.
+
+ Node 0 (DRAM): 432F3 43212 ( 95% util, 5% free)
+ Node 1 (CXL): FFF1F FFFFF ( 5% util, 95% free)
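
The two scenarios above can be replayed with a toy simulation. This is only a
sketch under stated assumptions: the list-based node model, the `step()`
helper, and the one-page-per-step migration are made up for illustration, and
the demote/promote DAMOS actions themselves are not implemented yet. Ratios
are computed from page counts.

```python
FREE_GOAL = 0.05   # demote scheme goal: free memory ratio of the DRAM node
UTIL_GOAL = 0.96   # promote scheme goal: utilization ratio of the DRAM node
CAPACITY = 10      # pages per node, as in the example


def free_ratio(node):
    """Free memory ratio of a node given as a list of page hotness values."""
    return (CAPACITY - len(node)) / CAPACITY


def step(dram, cxl):
    """Migrate at most one page and return the name of the action taken."""
    if free_ratio(dram) < FREE_GOAL and len(cxl) < CAPACITY:
        page = min(dram)              # demote the coldest DRAM page first
        dram.remove(page)
        cxl.append(page)
        return "demote"
    if 1 - free_ratio(dram) < UTIL_GOAL and cxl:
        page = max(cxl)               # promote the hottest CXL page first
        cxl.remove(page)
        dram.append(page)
        return "promote"
    return "idle"


def run(dram, cxl, nr_steps):
    """Run nr_steps steps and return the list of actions taken."""
    return [step(dram, cxl) for _ in range(nr_steps)]


if __name__ == "__main__":
    # "Demoting Cold Pages": a cold page moves back and forth.
    dram, cxl = [4, 3, 2, 1, 0, 4, 3, 2, 1, 0], []
    print(run(dram, cxl, 4))

    # "Promoting Hot Pages": the pages that became hot return to DRAM, then
    # the coldest remaining DRAM page is demoted.
    dram, cxl = [4, 3, 2, 1, 4, 3, 2, 1], [3, 2]
    print(run(dram, cxl, 3), cxl)
```

Because the two goals overlap, the simulation never reaches an idle state
while the CXL node holds pages; in DAMOS the auto-tuned aggressiveness would
be what keeps this back-and-forth cheap.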
+
+Extension
+---------
+
+The schemes can be extended to scenarios with more than two nodes. For
+example, the system below can be configured.
+
+ Node 0 (DRAM): 44444 4443F ( 95% util, 5% free)
+ Node 1 (CXL): 33333 3321F ( 95% util, 5% free)
+ Node 2 (CXL2): 11111 100FF ( 90% util, 10% free)
+
+A demote scheme for Node 0, which aims for a 5% free memory rate of Node 0, is
+configured.
+
+Two schemes are set for Node 1: a promote scheme which aims for 96%
+utilization of Node 0, and a demote scheme which aims for 5% free memory of
+Node 1.
+
+And Node 2 also has one promote scheme which aims for 96% utilization of
+Node 1.
+
+In this way, higher nodes get high utilization with hot pages. If the working
+set is small enough to be accommodated in 95% of the highest node, the whole
+working set will be placed in that node.
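
The per-node rules can be sketched as a small generator of scheme
descriptions. This is a hypothetical model, not a DAMON interface: the
`Scheme` tuple, its field names, and the "demote"/"promote" action names are
assumptions for illustration, and Node 0 is assumed to be the highest tier as
in the example above.

```python
from typing import List, NamedTuple


class Scheme(NamedTuple):
    node: int          # node whose pages the scheme migrates
    action: str        # "demote" or "promote" (hypothetical action names)
    dest: int          # node the pages are moved to
    goal_metric: str   # "free_ratio" or "utilization"
    goal_node: int     # node on which the goal metric is measured
    goal_value: float  # target value of the goal metric


def build_schemes(nr_nodes: int, free_goal: float = 0.05,
                  util_goal: float = 0.96) -> List[Scheme]:
    """Build demote/promote schemes for nodes 0 (highest) to nr_nodes - 1."""
    schemes = []
    for node in range(nr_nodes):
        if node + 1 < nr_nodes:
            # Rule 1: the node has a lower node, so demote its cold pages,
            # aiming for a small free memory ratio on this node.
            schemes.append(Scheme(node, "demote", node + 1,
                                  "free_ratio", node, free_goal))
        if node > 0:
            # Rule 2: the node is not the highest, so promote its hot
            # pages, aiming for high utilization of the upper node.
            schemes.append(Scheme(node, "promote", node - 1,
                                  "utilization", node - 1, util_goal))
    return schemes


if __name__ == "__main__":
    for scheme in build_schemes(3):
        print(scheme)
```

For three nodes this yields one demote scheme for Node 0, a demote and a
promote scheme for Node 1, and one promote scheme for Node 2, matching the
configuration described above.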
+
+
+[1] https://lore.kernel.org/linux-mm/cover.1640171137.git.baolin.wang@linux.alibaba.com/
+[2] https://lore.kernel.org/linux-mm/20230105221109.53398-1-sj@kernel.org/
+[3] https://lore.kernel.org/linux-mm/20230219203138.4873-1-sj@kernel.org/
+[4] https://lwn.net/Articles/931812/
+[5] TODO: Add link to DAMOS autotuning patches
+[6] TODO: Add link to ACMA RFC IDEA