author | SeongJae Park <sj@kernel.org> | 2023-11-05 18:49:16 +0000 |
---|---|---|
committer | SeongJae Park <sj@kernel.org> | 2023-11-05 18:49:16 +0000 |
commit | c5b207a868a6e76327f6e57ced80877577451cbd (patch) | |
tree | 9d1945e419bcb05e80c4f77e74234737aadc5e1e | |
parent | 61f9609f6d59e1e8808d804e59ffba273e1638ee (diff) | |
ideas: Add RFC IDEA for tiered memory management
Signed-off-by: SeongJae Park <sj@kernel.org>
-rw-r--r-- | ideas/tiered_mem | 185 |
1 files changed, 185 insertions, 0 deletions
diff --git a/ideas/tiered_mem b/ideas/tiered_mem
new file mode 100644
index 0000000..d49aafc
--- /dev/null
+++ b/ideas/tiered_mem
@@ -0,0 +1,185 @@
+Subject: [RFC IDEA] DAMOS-based Tiered-Memory Management
+
+Hello,
+
+
+There were attempts to use DAMON for tiered memory system management, from the
+pretty early days of DAMON[1]. I also wanted to dive deep on this topic[2] but
+didn't get time for this, unfortunately. Meanwhile, a few people explored the
+approach in their own way, and thankfully shared their approaches and results
+with me, sometimes in public, and sometimes in private.
+
+They used varying approaches and the results were also very different. Some
+folks achieved nice results, while it was only a waste of time for others.
+Nonetheless, what I commonly heard from such grateful sharing was the
+difficulty of DAMON tuning[3,4]. My proposal for the tuning difficulty was
+auto-tuning of DAMOS aggressiveness based on user-provided or self-collectable
+feedback. It is still not done, but an early version of the implementation
+has recently been shared. I'd like to share my concept-level idea of using the
+auto-tuning for reasonable DAMOS-based tiered memory system management.
+
+Background
+==========
+
+Please read the DAMOS auto-tuning RFC patchset's cover letter for details[5].
+In short, the feature lets users define target system status (e.g., 0.05%
+memory PSI and/or 50% free memory ratio), and lets DAMOS control its
+aggressiveness to achieve the goals. If the status is far from the goal, DAMOS
+increases its aggressiveness, and vice versa. Under the limited
+aggressiveness, DAMOS applies the action to pages that are more prioritized for
+the action. For example, colder pages are prioritized for the 'pageout' action.
+
+DAMOS-based Tiered-Memory Management Idea
+=========================================
+
+The idea is to set DAMOS schemes for each tiered memory node, like below.
+
+1. If the node has a lower node, demote cold pages of the
+node to the lower node using DAMOS, colder pages first. Let DAMOS auto-tune
+the aggressiveness of the demotion aiming for a small amount (e.g., 5%) of free
+memory on the current node.
+
+2. If the node is not the highest node, promote hot pages in the node to the
+upper node using DAMOS, hotter pages first. Let DAMOS auto-tune the
+aggressiveness of the promotion aiming for a high utilization rate (e.g., 96%)
+of the upper node.
+
+Discussion
+==========
+
+The simple scheme can be easily extended to multiple tiered memory nodes.
+Higher nodes will keep high utilization, with hot pages, while lower nodes have
+only cold pages that cannot be accommodated in higher nodes due to lack of
+space.
+
+Because the utilization goal and free memory goal overlap, DAMOS will continue
+moving cold pages down and hot pages up. Since the demotion targets the coldest
+pages of the node, and the goal-based aggressiveness auto-tuning keeps the
+aggressiveness minimal in the overlapping case, the overhead will be only
+modest.
+
+If the system is memory over-committed, we can also apply DAMOS-based proactive
+reclamation of cold pages aiming for some level of PSI, and
+Access/Contiguity-aware Memory Auto-scaling[6] to the lowest node.
+
+Request For Comments
+====================
+
+This is at a very early stage. Not enough survey of related work is done, and
+no implementation of the demote/promote DAMOS actions is made. An early version
+of the aim-oriented DAMOS aggressiveness auto-tuning is available, though. I
+even have no good tiered memory test system setup for myself. I hope to share
+this approach that I believe might work in general, and get any comments if
+possible, not only to succeed but also to learn and improve, or even fail fast.
+
+Example Operation Scenario
+==========================
+
+Let's suppose a system has a DRAM node and a slower CXL-based node, each able to
+accommodate 10 pages.
+
+And the proposed idea is configured. The free memory ratio and memory
+utilization ratio goals are set as 5% and 96%. That is, we have the demote
+DAMOS scheme for the DRAM node, and the promote DAMOS scheme for the CXL node.
+
+Let's also represent the hotness of each page in five levels, from 0 (cold) to 4
+(hot). And let's represent free pages as 'F'.
+
+Then a state of the system may be represented like below:
+
+    Node 0 (DRAM): 43210 43210 (100% util,   0% free)
+    Node 1 (CXL):  FFFFF FFFFF (  0% util, 100% free)
+
+Demoting Cold Pages
+-------------------
+
+Since the DRAM node has a 0% free memory ratio, which is under the goal of the
+demote scheme, the demote scheme is activated. Meanwhile, since the DRAM node
+utilization ratio is 100%, which is higher than the goal of the promote scheme
+(96%), the promote scheme does nothing. Hence, cold pages in Node 0 are
+demoted, colder ones first.
+
+    Node 0 (DRAM): 4321F 43210 ( 95% util,   5% free)
+    Node 1 (CXL):  FFFF0 FFFFF (  5% util,  95% free)
+
+The goal of the demote scheme is met, so the demote scheme stops. The DRAM
+utilization ratio is below the promote scheme's goal (96%), so that scheme is
+activated and promotes hot pages of the CXL node, hotter ones first.
+
+    Node 0 (DRAM): 43210 43210 (100% util,   0% free)
+    Node 1 (CXL):  FFFFF FFFFF (  0% util, 100% free)
+
+Then the promote scheme is deactivated again, and the demote scheme activated.
+The coldest page is demoted.
+
+    Node 0 (DRAM): 43210 4321F ( 95% util,   5% free)
+    Node 1 (CXL):  FFFFF FFFF0 (  5% util,  95% free)
+
+Then the promote scheme is activated again, and the demote scheme deactivated.
+The hottest page in CXL memory is promoted.
+
+    Node 0 (DRAM): 43210 43210 (100% util,   0% free)
+    Node 1 (CXL):  FFFFF FFFFF (  0% util, 100% free)
+
+In this way, the DRAM node keeps a high utilization ratio with only hot pages,
+while only cold pages move back and forth between the two nodes.
+
+Promoting Hot Pages
+-------------------
+
+Let's assume the two cold pages are in the CXL node.
+
+    Node 0 (DRAM): 4321F 4321F ( 90% util,  10% free)
+    Node 1 (CXL):  FFFF0 FFFF0 ( 10% util,  90% free)
+
+And let's assume the demoted pages become hot.
+
+    Node 0 (DRAM): 4321F 4321F ( 90% util,  10% free)
+    Node 1 (CXL):  FFFF3 FFFF2 ( 10% util,  90% free)
+
+The promotion scheme will promote the hot pages, hotter page first.
+
+    Node 0 (DRAM): 43213 4321F ( 95% util,   5% free)
+    Node 1 (CXL):  FFFFF FFFF2 (  5% util,  95% free)
+
+The other page in CXL also gets promoted, following the goals.
+
+    Node 0 (DRAM): 43213 43212 (100% util,   0% free)
+    Node 1 (CXL):  FFFFF FFFFF (  0% util, 100% free)
+
+Now, the demotion scheme demotes cold pages of the DRAM node, not the now-hot,
+just-promoted pages.
+
+    Node 0 (DRAM): 432F3 43212 ( 95% util,   5% free)
+    Node 1 (CXL):  FFF1F FFFFF (  5% util,  95% free)
+
+Extension
+---------
+
+The schemes can be extended to scenarios with more than two nodes. For example,
+the system below can be configured.
+
+    Node 0 (DRAM): 44444 4443F ( 95% util,   5% free)
+    Node 1 (CXL):  33333 3321F ( 95% util,   5% free)
+    Node 2 (CXL2): 11111 100FF ( 90% util,  10% free)
+
+A demote scheme for Node 0, which aims for a 5% free memory ratio of Node 0, is
+configured.
+
+Two schemes are set for Node 1: a promote scheme which aims for 96%
+utilization of Node 0, and a demote scheme which aims for 5% free memory of
+Node 1.
+
+And Node 2 has one promote scheme which aims for 96% utilization of Node 1.
+
+In this way, higher nodes get high utilization with hot pages. If the working
+set is small enough to fit in 95% of the highest node, the whole working set
+will be placed in that node.
+
+
+[1] https://lore.kernel.org/linux-mm/cover.1640171137.git.baolin.wang@linux.alibaba.com/
+[2] https://lore.kernel.org/linux-mm/20230105221109.53398-1-sj@kernel.org/
+[3] https://lore.kernel.org/linux-mm/20230219203138.4873-1-sj@kernel.org/
+[4] https://lwn.net/Articles/931812/
+[5] TODO: Add link to DAMOS autotuning patches
+[6] TODO: Add link to ACMA RFC IDEA
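
The two-node walk-through in the RFC text can be reproduced with a small toy model. The sketch below is not DAMON code and touches no kernel interface: `Node`, `step`, `FREE_GOAL`, and `UTIL_GOAL` are hypothetical names chosen for illustration, and moving exactly one page per step is an assumed simplification of the goal-based aggressiveness tuning. Ratios are computed from the real page counts, so the printed percentages may differ from the ones in the diagrams.

```python
from dataclasses import dataclass, field
from typing import List, Optional

FREE_GOAL = 0.05  # demote while the upper node's free ratio is below this
UTIL_GOAL = 0.96  # promote while the upper node's utilization is below this


@dataclass
class Node:
    """A memory node holding the hotness (0 = cold .. 4 = hot) of each page."""
    name: str
    capacity: int
    pages: List[int] = field(default_factory=list)

    @property
    def util(self) -> float:
        return len(self.pages) / self.capacity

    @property
    def free(self) -> float:
        return 1.0 - self.util


def step(upper: Node, lower: Node) -> Optional[str]:
    """Move at most one page between the nodes and describe the move."""
    # Demote scheme on the upper node: coldest page first.
    if upper.free < FREE_GOAL and upper.pages and lower.free > 0:
        page = min(upper.pages)
        upper.pages.remove(page)
        lower.pages.append(page)
        return f"demote {page}"
    # Promote scheme on the lower node: hottest page first.
    if upper.util < UTIL_GOAL and lower.pages:
        page = max(lower.pages)
        lower.pages.remove(page)
        upper.pages.append(page)
        return f"promote {page}"
    return None  # both goals are met, nothing to do


if __name__ == "__main__":
    # "Promoting Hot Pages" scenario: two demoted pages became hot in CXL.
    dram = Node("DRAM", 10, [4, 3, 2, 1, 4, 3, 2, 1])
    cxl = Node("CXL", 10, [3, 2])
    for _ in range(3):
        move = step(dram, cxl)
        print(f"{move}: DRAM util {dram.util:.0%}, CXL util {cxl.util:.0%}")
```

Starting from a full DRAM node instead, repeated calls to `step` alternate between demoting and promoting the same coldest page, which is the back-and-forth movement of cold pages that the "Demoting Cold Pages" part describes.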