aboutsummaryrefslogtreecommitdiffstats
path: root/tools/mm
AgeCommit message (Collapse)AuthorFilesLines
2024-02-22tools/mm: add thpmaps script to dump THP usage infoRyan Roberts2-4/+680
With the proliferation of large folios for file-backed memory, and more recently the introduction of multi-size THP for anonymous memory, it is becoming useful to be able to see exactly how large folios are mapped into processes. For some architectures (e.g. arm64), if most memory is mapped using contpte-sized and -aligned blocks, TLB usage can be optimized so it's useful to see where these requirements are and are not being met. thpmaps is a Python utility that reads /proc/<pid>/smaps, /proc/<pid>/pagemap and /proc/kpageflags to print information about how transparent huge pages (both file and anon) are mapped to a specified process or cgroup. It aims to help users debug and optimize their workloads. In future we may wish to introduce stats directly into the kernel (e.g. smaps or similar), but for now this provides a short term solution without the need to introduce any new ABI. Run with help option for a full listing of the arguments: # ./thpmaps --help --8<-- usage: thpmaps [-h] [--pid pid | --cgroup path] [--rollup] [--cont size[KMG]] [--inc-smaps] [--inc-empty] [--periodic sleep_ms] Prints information about how transparent huge pages are mapped, either system-wide, or for a specified process or cgroup. When run with --pid, the user explicitly specifies the set of pids to scan. e.g. "--pid 10 [--pid 134 ...]". When run with --cgroup, the user passes either a v1 or v2 cgroup and all pids that belong to the cgroup subtree are scanned. When run with neither --pid nor --cgroup, the full set of pids on the system is gathered from /proc and scanned as if the user had provided "--pid 1 --pid 2 ...". A default set of statistics is always generated for THP mappings. However, it is also possible to generate additional statistics for "contiguous block mappings" where the block size is user-defined. Statistics are maintained independently for anonymous and file-backed (pagecache) memory and are shown both in kB and as a percentage of either total anonymous or total file-backed memory as appropriate. THP Statistics -------------- Statistics are always generated for fully- and contiguously-mapped THPs whose mapping address is aligned to their size, for each <size> supported by the system. Separate counters describe THPs mapped by PTE vs those mapped by PMD. (Although note a THP can only be mapped by PMD if it is PMD-sized): - anon-thp-pte-aligned-<size>kB - file-thp-pte-aligned-<size>kB - anon-thp-pmd-aligned-<size>kB - file-thp-pmd-aligned-<size>kB Similarly, statistics are always generated for fully- and contiguously- mapped THPs whose mapping address is *not* aligned to their size, for each <size> supported by the system. Due to the unaligned mapping, it is impossible to map by PMD, so there are only PTE counters for this case: - anon-thp-pte-unaligned-<size>kB - file-thp-pte-unaligned-<size>kB Statistics are also always generated for mapped pages that belong to a THP but where the is THP is *not* fully- and contiguously- mapped. These "partial" mappings are all counted in the same counter regardless of the size of the THP that is partially mapped: - anon-thp-pte-partial - file-thp-pte-partial Contiguous Block Statistics --------------------------- An optional, additional set of statistics is generated for every contiguous block size specified with `--cont <size>`. These statistics show how much memory is mapped in contiguous blocks of <size> and also aligned to <size>. A given contiguous block must all belong to the same THP, but there is no requirement for it to be the *whole* THP. Separate counters describe contiguous blocks mapped by PTE vs those mapped by PMD: - anon-cont-pte-aligned-<size>kB - file-cont-pte-aligned-<size>kB - anon-cont-pmd-aligned-<size>kB - file-cont-pmd-aligned-<size>kB As an example, if monitoring 64K contiguous blocks (--cont 64K), there are a number of sources that could provide such blocks: a fully- and contiguously-mapped 64K THP that is aligned to a 64K boundary would provide 1 block. A fully- and contiguously-mapped 128K THP that is aligned to at least a 64K boundary would provide 2 blocks. Or a 128K THP that maps its first 100K, but contiguously and starting at a 64K boundary would provide 1 block. A fully- and contiguously-mapped 2M THP would provide 32 blocks. There are many other possible permutations. options: -h, --help show this help message and exit --pid pid Process id of the target process. Maybe issued multiple times to scan multiple processes. --pid and --cgroup are mutually exclusive. If neither are provided, all processes are scanned to provide system-wide information. --cgroup path Path to the target cgroup in sysfs. Iterates over every pid in the cgroup and its children. --pid and --cgroup are mutually exclusive. If neither are provided, all processes are scanned to provide system-wide information. --rollup Sum the per-vma statistics to provide a summary over the whole system, process or cgroup. --cont size[KMG] Adds stats for memory that is mapped in contiguous blocks of <size> and also aligned to <size>. May be issued multiple times to track multiple sized blocks. Useful to infer e.g. arm64 contpte and hpa mappings. Size must be a power-of-2 number of pages. --inc-smaps Include all numerical, additive /proc/<pid>/smaps stats in the output. --inc-empty Show all statistics including those whose value is 0. --periodic sleep_ms Run in a loop, polling every sleep_ms milliseconds. Requires root privilege to access pagemap and kpageflags. --8<-- Example command to summarise fully and partially mapped THPs and 64K contiguous blocks over all VMAs in all processes in the system (--inc-empty forces printing stats that are 0): # ./thpmaps --cont 64K --rollup --inc-empty --8<-- anon-thp-pmd-aligned-2048kB: 139264 kB ( 6%) file-thp-pmd-aligned-2048kB: 0 kB ( 0%) anon-thp-pte-aligned-16kB: 0 kB ( 0%) anon-thp-pte-aligned-32kB: 0 kB ( 0%) anon-thp-pte-aligned-64kB: 72256 kB ( 3%) anon-thp-pte-aligned-128kB: 0 kB ( 0%) anon-thp-pte-aligned-256kB: 0 kB ( 0%) anon-thp-pte-aligned-512kB: 0 kB ( 0%) anon-thp-pte-aligned-1024kB: 0 kB ( 0%) anon-thp-pte-aligned-2048kB: 0 kB ( 0%) anon-thp-pte-unaligned-16kB: 0 kB ( 0%) anon-thp-pte-unaligned-32kB: 0 kB ( 0%) anon-thp-pte-unaligned-64kB: 0 kB ( 0%) anon-thp-pte-unaligned-128kB: 0 kB ( 0%) anon-thp-pte-unaligned-256kB: 0 kB ( 0%) anon-thp-pte-unaligned-512kB: 0 kB ( 0%) anon-thp-pte-unaligned-1024kB: 0 kB ( 0%) anon-thp-pte-unaligned-2048kB: 0 kB ( 0%) anon-thp-pte-partial: 63232 kB ( 3%) file-thp-pte-aligned-16kB: 809024 kB (47%) file-thp-pte-aligned-32kB: 43168 kB ( 3%) file-thp-pte-aligned-64kB: 98496 kB ( 6%) file-thp-pte-aligned-128kB: 17536 kB ( 1%) file-thp-pte-aligned-256kB: 0 kB ( 0%) file-thp-pte-aligned-512kB: 0 kB ( 0%) file-thp-pte-aligned-1024kB: 0 kB ( 0%) file-thp-pte-aligned-2048kB: 0 kB ( 0%) file-thp-pte-unaligned-16kB: 21712 kB ( 1%) file-thp-pte-unaligned-32kB: 704 kB ( 0%) file-thp-pte-unaligned-64kB: 896 kB ( 0%) file-thp-pte-unaligned-128kB: 44928 kB ( 3%) file-thp-pte-unaligned-256kB: 0 kB ( 0%) file-thp-pte-unaligned-512kB: 0 kB ( 0%) file-thp-pte-unaligned-1024kB: 0 kB ( 0%) file-thp-pte-unaligned-2048kB: 0 kB ( 0%) file-thp-pte-partial: 9252 kB ( 1%) anon-cont-pmd-aligned-64kB: 139264 kB ( 6%) file-cont-pmd-aligned-64kB: 0 kB ( 0%) anon-cont-pte-aligned-64kB: 100672 kB ( 4%) file-cont-pte-aligned-64kB: 161856 kB ( 9%) --8<-- Link: https://lkml.kernel.org/r/20240116141235.960842-1-ryan.roberts@arm.com Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Tested-by: Barry Song <v-songbaohua@oppo.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Zenghui Yu <yuzenghui@huawei.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18tools/mm: update the usage output to be more organizedAudra Mitchell1-13/+20
Organize the usage options alphabetically and improve the description of some options. Also separate the more complicated cull options from the single use compare options. Link: https://lkml.kernel.org/r/20231013190350.579407-6-audra@redhat.com Signed-off-by: Audra Mitchell <audra@redhat.com> Acked-by: Rafael Aquini <aquini@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Georgi Djakov <djakov@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18tools/mm: fix the default case for page_owner_sortAudra Mitchell1-8/+53
With the additional commands and timestamps added to the tool, the default case (-t) has been broken. Now that the allocation timestamps are saved outside of the txt field, allow us to properly sort the data by number of times the record has been seen. Furthermore prevent the misuse of the commandline arguments so only one compare option can be used. Link: https://lkml.kernel.org/r/20231013190350.579407-5-audra@redhat.com Signed-off-by: Audra Mitchell <audra@redhat.com> Acked-by: Rafael Aquini <aquini@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Georgi Djakov <djakov@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18tools/mm: filter out timestamps for correct collationAudra Mitchell1-7/+18
With the introduction of allocation timestamps being included in page_owner output, each record becomes unique due to the timestamp nanosecond granularity. Remove the check in add_list that tries to collate each record during processing as the memcmp() is just additional overhead at this point. Also keep the allocation timestamps, but allow collation to occur without consideration of the allocation timestamp except in the case were allocation timestamps are requested by the user (the -a option). Link: https://lkml.kernel.org/r/20231013190350.579407-4-audra@redhat.com Signed-off-by: Audra Mitchell <audra@redhat.com> Acked-by: Rafael Aquini <aquini@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Georgi Djakov <djakov@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18tools/mm: remove references to free_ts from page_owner_sortAudra Mitchell1-86/+12
With the removal of free timestamps from page_owner output, we no longer need to handle this case or the "unreleased" case. Remove all references to both cases. Link: https://lkml.kernel.org/r/20231013190350.579407-3-audra@redhat.com Signed-off-by: Audra Mitchell <audra@redhat.com> Acked-by: Rafael Aquini <aquini@redhat.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Georgi Djakov <djakov@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-05tools/mm: fix undefined reference to pthread_onceXie XiuQi1-2/+2
Commit 97d5f2e9ee12 ("tools api fs: More thread safety for global filesystem variables") introduces pthread_once, so the libpthread should be added at link time, or we'll meet the following compile error when 'make -C tools/mm': gcc -Wall -Wextra -I../lib/ -o page-types page-types.c ../lib/api/libapi.a ~/linux/tools/lib/api/fs/fs.c:146: undefined reference to `pthread_once' ~/linux/tools/lib/api/fs/fs.c:147: undefined reference to `pthread_once' ~/linux/tools/lib/api/fs/fs.c:148: undefined reference to `pthread_once' ~/linux/tools/lib/api/fs/fs.c:149: undefined reference to `pthread_once' ~/linux/tools/lib/api/fs/fs.c:150: undefined reference to `pthread_once' /usr/bin/ld: ../lib/api/libapi.a(libapi-in.o):~/linux/tools/lib/api/fs/fs.c:151: more undefined references to `pthread_once' follow collect2: error: ld returned 1 exit status make: *** [Makefile:22: page-types] Error 1 Link: https://lkml.kernel.org/r/20230831034205.2376653-1-xiexiuqi@huaweicloud.com Fixes: 97d5f2e9ee12 ("tools api fs: More thread safety for global filesystem variables") Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-25Merge tag 'slab-for-6.4' of ↵Linus Torvalds1-5/+1
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab updates from Vlastimil Babka: "The main change is naturally the SLOB removal. Since its deprecation in 6.2 I've seen no complaints so hopefully SLUB_(TINY) works well for everyone and we can proceed. Besides the code cleanup, the main immediate benefit will be allowing kfree() family of function to work on kmem_cache_alloc() objects, which was incompatible with SLOB. This includes kfree_rcu() which had no kmem_cache_free_rcu() counterpart yet and now it shouldn't be necessary anymore. Besides that, there are several small code and comment improvements from Thomas, Thorsten and Vernon" * tag 'slab-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm/slab: document kfree() as allowed for kmem_cache_alloc() objects mm/slob: remove slob.c mm/slab: remove CONFIG_SLOB code from slab common code mm, pagemap: remove SLOB and SLQB from comments and documentation mm, page_flags: remove PG_slob_free mm/slob: remove CONFIG_SLOB mm/slub: fix help comment of SLUB_DEBUG mm: slub: make kobj_type structure constant slab: Adjust comment after refactoring of gfp.h
2023-04-16tools/mm/page_owner_sort.c: fix TGID output when cull=tg is usedSteve Chou1-1/+1
When using cull option with 'tg' flag, the fprintf is using pid instead of tgid. It should use tgid instead. Link: https://lkml.kernel.org/r/20230411034929.2071501-1-steve_chou@pesi.com.tw Fixes: 9c8a0a8e599f4a ("tools/vm/page_owner_sort.c: support for user-defined culling rules") Signed-off-by: Steve Chou <steve_chou@pesi.com.tw> Cc: Jiajian Ye <yejiajian2018@email.szu.edu.cn> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-29mm, page_flags: remove PG_slob_freeVlastimil Babka1-5/+1
With SLOB removed we no longer need the PG_slob_free alias for PG_private. Also update tools/mm/page-types. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
2023-02-02tools/mm: allow users to provide additional cflags/ldflagsHerton R. Krzesinski1-2/+2
Right now there is no way to provide additional cflags/ldflags when building tools/vm binaries. And using eg. make CFLAGS=<options> will override the CFLAGS being set in the Makefile, making the build fail since it requires the include of the ../lib dir (for libapi). This change then allows you to specify: CFLAGS=<options> LDFLAGS=<options> make V=1 -C tools/vm And the options will be correctly appended as can be seen from the make output. Link: https://lkml.kernel.org/r/20230116224921.4106324-1-herton@redhat.com Signed-off-by: Herton R. Krzesinski <herton@redhat.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Justin Forbes <jforbes@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Scott Weaver <scweaver@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-18tools/vm: rename tools/vm to tools/mmSeongJae Park6-0/+4141
Rename tools/vm to tools/mm for being more consistent with the code and documentation directories, and won't be confused with virtual machines. Link: https://lkml.kernel.org/r/20230103180754.129637-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>