********************************************************************************
*                                                                              *
*                                                                              *
*                                Documentation                                 *
*                                                                              *
*                                                                              *
********************************************************************************

The motivation behind this suite is to exercise functions and regions of the
mm/ subsystem of the Linux kernel which are of interest to us. There are many
more regions which have not yet been covered by the test cases.

################################################################################
#                             KERNEL CONFIGURATION                             #
################################################################################

The test suite was developed with the help of the gcov code coverage analysis
tool. Gcov can be enabled as a configuration option in Linux kernels 2.6 and
later. Gcov gives a per-line execution count for the source files.

The supporting tool versions that were used in the development of the suite are

gcov   - 4.6.3
gcc    - 4.6.3
Kernel - 3.4

The test kernel needs to be configured with the following options set:

CONFIG_LOCK_STAT=y
CONFIG_GCOV_KERNEL=y
CONFIG_GCOV_PROFILE_ALL=y
CONFIG_PERF_EVENTS=y
CONFIG_FTRACE_SYSCALLS=y
CONFIG_TRANSPARENT_HUGEPAGE=y

Once the test kernel has been compiled and installed, a debugfs is mounted at
/sys/kernel/debug. Writing to the file /sys/kernel/debug/gcov/reset resets all
the counters. The directory /sys/kernel/debug/gcov/ also has a link to the
build directory on the test system. For more information about setting up
gcov, consult the gcov documentation.

################################################################################
#                                    FILES                                     #
################################################################################

hw_vars: This file is the interface between the system and the suite and
provides system configuration and environment information such as total
memory, number of CPUs, nr_hugepages, etc. All of the /proc/meminfo output can
be extracted using this file. The file also has functions to create and delete
sparse files in the $SPARSE_ROOT directory, which is set to
/tmp/vm-scalability. Most of the cases work on these files. The sparse root
can be made btrfs, ext4 or xfs by suitably editing the hw_vars file.

run_cases: This file first resets the counters and then executes all the test
cases one by one. Some kernel functions are invoked only during startup, and
their counters are reset before the cases are run. Hence some cases need to
enable them again from the case script before calling the executables, so that
the necessary invoking functions are accounted for in the source_file.gcov
output. Similarly, some functions are invoked dynamically when a system
parameter such as nr_hugepages changes during the execution of the program.
Such cases have been covered by resetting those parameters from within the
program.

cases-*: These files call the executable files with suitable options. Some of
them have a multithreaded option while others do not.

################################################################################
#                                    USAGE                                     #
################################################################################

cd /path/to/suite/directory
make all
./run
gcov -o /path/to/build/directory/with/the/.gcno source_file.c

The last command produces the source_file.c.gcov file, which has the coverage
information for each line of the mm/ source files.

Note: scripts to automatically gather the gcno/gcda files and extract the
source_file.gcov coverage data files are available in the gcov documentation.

The cases in the suite call an executable file with options. Most of the cases
work on usemem.
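
As a rough illustration of what most of these cases do, the sketch below (not
the suite's actual usemem code; the 128MB size is an arbitrary example) maps
an anonymous region and writes to every page so that the kernel's page fault
and memory allocation paths are exercised:

/*
 * Minimal sketch of the usemem-style access pattern: map an anonymous
 * region and touch every page sequentially. Size is illustrative only.
 */
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        size_t size = 128UL << 20;              /* arbitrary example size */
        long page = sysconf(_SC_PAGESIZE);
        char *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (buf == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        /* Each write to a new page triggers an anonymous page fault. */
        for (size_t off = 0; off < size; off += page)
                buf[off] = 1;
        munmap(buf, size);
        return 0;
}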
Some of the cases that call other executables have been written in separate
files in order to modularise the code, and have been named after the kernel
functionality they exercise. Some of the cases merely make trivial system
calls and do nothing else; they can be extended suitably as per need.

Some cases, like case-migrate, case-mbind etc., need a NUMA setup. This was
achieved using the numa=fake= kernel boot option, whose value is the number of
nodes to be emulated. The suite was tested with a value of 2, which is the
minimum for inter-node page migration. Those cases that require the NUMA setup
need to be linked with the -lnuma flag, and libnuma has to be installed on the
system. The executables that these cases call have been taken from the numactl
documentation and slightly modified. They have been found to work on a 2-node
NUMA-emulated machine.

Cases which set sysfs parameters using "echo value > sysfs_parameter" may need
tweaking based on the system configuration. The default values used in the
case scripts may not scale well when system parameters are scaled. For
example, on systems with more memory,
/sys/kernel/mm/transparent_hugepage/khugepaged/pages_to_scan may need to be
set to a higher value, or scan_sleep_millisecs reduced, or both. Failure to
scale the values may result in disproportionately small, or sometimes no,
observable coverage in the corresponding functions.

Cases can be run individually using ./case-name with the suite directory as
pwd; the scripts work under this assumption. Also, care has to be taken to
make sure that the sparse root is mounted. The run_cases script takes care of
mounting the sparse partition before running the scripts. Hugepages are
assumed to be 2MB in size.

################################################################################
#                                   WARNING                                    #
################################################################################

The coverage analysis enabled by setting the CONFIG_GCOV_KERNEL=y and
CONFIG_GCOV_PROFILE_ALL=y configuration options profiles the entire kernel.
Hence the system boot time is considerably increased and the system runs a
little slower too. Enabling these configuration parameters on high-end server
systems has been observed to cause boot problems or unstable kernels, with or
without a lot of errors.

################################################################################
#                               CASE DESCRIPTION                               #
################################################################################

case-000-anon: Fill 1/3 of total memory with anonymous pages by creating an
anonymous memory region and continuously writing to it. This test is used to
exercise the kernel's page fault handler and memory allocation.

case-000-shm: Fill 1/3 of total memory by continuously writing to a file
hosted on a tmpfs; the file is accessed with mmap.

The above two test cases are meant to eat memory; they should be used together
with other test cases.

Anonymous page related:

case-anon-cow-seq/rand: The parent allocates a portion of anonymous memory and
then forks several child processes; these child processes write data
sequentially/randomly to that memory region. This test is used to test the
kernel's copy-on-write functionality.

case-anon-cow-seq/rand-mt: Since threads share the same memory space, no COW
takes place here. This test is used to make sure no COW occurred, and thus its
performance should be much better than the above one.
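
The copy-on-write pattern exercised by the case-anon-cow cases can be sketched
roughly as follows; this is not the suite's code, and the region size and
child count are arbitrary illustration values:

/*
 * Minimal COW sketch: the parent populates a private anonymous region,
 * then forked children write to it, forcing the kernel to copy each
 * touched page.
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
        size_t size = 64UL << 20;               /* arbitrary example size */
        long page = sysconf(_SC_PAGESIZE);
        char *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (buf == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        memset(buf, 0xaa, size);                /* parent populates the pages */

        for (int i = 0; i < 4; i++) {           /* 4 children, illustrative */
                if (fork() == 0) {
                        /* Each write faults and copies one page. */
                        for (size_t off = 0; off < size; off += page)
                                buf[off] = (char)i;
                        _exit(0);
                }
        }
        while (wait(NULL) > 0)
                ;
        return 0;
}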
case-anon-r-rand(-mt): mmap an anonymous region of the specified size and read
at random positions to trigger page faults. Since this is a read-only test and
the kernel will always use the same ZERO page for all the faults, this test
checks whether this fast path works. The -mt version uses threads instead of
processes, which also brings the page table lock into play.

case-anon-r-seq(-mt): Sequential read instead of random read compared to the
above test. It should have the same effect, since the mapped physical page,
i.e. the ZERO page, is always accessible; whether it is read sequentially or
randomly, only page faults occur and no IO happens.

case-anon-rx-seq-mt: Almost the same as case-anon-r-seq-mt; the only
difference is that it does the allocation before the test starts. The
allocation is actually an mmap of a huge anonymous region, and that alone
shouldn't take much time (no populate). This test is meant to show the speed
difference between prefault (this case) and non-prefault (the above case).

case-anon-rx-rand-mt: Read randomly instead of sequentially compared to
case-anon-rx-seq-mt. Since the region is preallocated, it shouldn't matter
whether it is read randomly or sequentially; this test case is used to verify
that.

case-anon-w-seq/rand: Start N tasks, each of which mmaps an anonymous region
of 1/2N of the whole system memory size and writes sequentially/randomly to
that region. This will trigger page faults and memory allocation.

case-anon-w-seq/rand-mt: Use threads instead of tasks compared to the above
test case.

case-anon-wx-seq/rand-mt: Preallocate, i.e. mmap this anonymous memory region
before the test starts, compared to the above test case.

case-direct-write: Open a file with O_DIRECT and then continuously write data
to it till the world ends. The file is a sparse file.

case-fork: Start 20000 processes that do nothing. This test is used to test
fork performance.

case-fork-sleep: Almost the same as the above test, except that each task
sleeps 10s before exiting. The sleep is used to make sure there are many
processes alive at the same time.

case-hugetlb: Try to allocate 1/3 of the whole memory size as huge pages by
manipulating the /proc/sys/vm/nr_hugepages file, and then free them all.

case-ksm: Start a process per node; each process RW mmaps a private anonymous
region of MemTotal/1000 size and populates it at mmap time, then uses madvise
to set MERGEABLE to trigger KSM; after sleeping 1 minute, MERGEABLE is
disabled again. The populate will produce a lot of zero-filled pages, so KSM
should have a huge effect. We can measure CPU consumption and how much memory
gets freed.

case-ksm-hugepages: Start N processes, each of which mmaps a RW private
anonymous region of size MemTotal/2N. Transparent hugepage is always enabled
in this case, so all allocations are done as (zeroed) hugepages. The test then
sets MERGEABLE for this region, which should trigger KSM. After that, it
sleeps 30 seconds and clears the MERGEABLE flag for the region; the test is
then done. Again, we can measure CPU consumption and how much memory gets
freed.

File related:

case-lru-file-mmap-read/rand: Fork N processes; each process mmaps a separate
file whose size is ROTATE_SIZE/N and reads its data sequentially/randomly to
exercise LRU related functions (a rough sketch of this access pattern is given
below).

case-lru-file-mmap-write: Almost the same as the above test, except it writes
instead of reads.

case-lru-file-readonce: Quite similar to case-lru-file-mmap-read, except that
it uses dd to read the file.

case-lru-file-readtwice: The file is read twice instead of once compared to
the above test.
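
The mmap-and-read pattern used by the case-lru-file-mmap-read cases can be
sketched roughly as follows; this is not the suite's code, and the file path
is hypothetical:

/*
 * Minimal sketch of a file-backed sequential read through mmap: every
 * touched page is faulted into the page cache and put on the LRU lists.
 * The path below is an illustration only.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
        const char *path = "/tmp/vm-scalability/sparse-example";
        long page = sysconf(_SC_PAGESIZE);
        volatile char sum = 0;
        struct stat st;
        int fd = open(path, O_RDONLY);

        if (fd < 0 || fstat(fd, &st) < 0) {
                perror(path);
                return 1;
        }
        char *buf = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (buf == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        /* Reading each new page faults it in from the file. */
        for (off_t off = 0; off < st.st_size; off += page)
                sum += buf[off];
        munmap(buf, st.st_size);
        close(fd);
        return 0;
}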
case-lru-memcg: Set the memory limit to 1/3 of the total memory size and then
use case-lru-file-readonce to do the test.

case-lru-shm: Start N tasks; each creates a sparse file on a tmpfs with a size
of MemTotal and then reads 1/2N of its data. So the N processes will fill
memory half full after they are all done. It doesn't matter that the tasks
have exited, since the read portion of each file will still occupy memory.

case-lru-shm-rand: Almost the same as the above test case, except the read is
done in random order instead of sequentially.

case-mbind: Start N processes; each allocates 1/5N of MemFree as anonymous
pages, then moves these pages to node 0 with numa_move_pages and mbind (which
seems duplicated here, since mbind can also move existing pages), and uses
get_mempolicy to verify whether these pages are indeed on the desired node,
printing error messages if not. Then mbind is used to move these pages to
node 1 and the result is verified again with get_mempolicy. This test is used
to check whether mbind's moving of existing pages works. (A rough sketch of
this move-and-verify pattern is given at the end of this document.)

case-migrate: /* FIXME: */ migratepages is missing.

case-migrate-across-nodes: Start $nr_node processes; each allocates 1/2.5N of
MemFree as anonymous pages, then uses numa_move_pages to move these pages to
node 1 and verifies that the move was successful; it then uses
numa_migrate_pages to migrate these pages to node 0 and verifies that the
migration was successful. This test is used to check whether migrate_pages is
working as expected. KSM is enabled before the test; the impact of that is not
clear.

case-mincore: Start N threads; each thread has a separate file as its backing
store. Mmap that file and read MemTotal/N bytes of data at random positions
(which means some parts of the memory space will never be touched). After
this, use the mincore system call to see how many pages are in core. Then the
test exits.

case-mlock: Start N processes; each allocates 1/3N of the reclaimable memory
((nr_free_pages + nr_file_pages)*PAGE_SIZE) with mmap and then uses mlock to
lock all the allocated space into memory. The mlock system call also causes
the memory to actually be allocated.

case-mmap-pread-seq/rand: Create a sparse file with a size of 4T and then
start N processes to mmap the file and read its content sequentially/randomly.
This is used to put pressure on the LRU.

case-mmap-pread-seq/rand-mt: Use threads to do the read instead of processes
compared to the above case.

case-mmap-xread-seq/rand-mt: Preallocate, and then all the threads use the
same memory space instead of their own allocated ones. This should put less
pressure on the LRU list compared to the above test case.

case-msync: Create N sparse files, each with a size of $MemTotal. For each
sparse file, start a process that writes 1/2N of the sparse file's size. After
the write, do an msync to make sure the changes in memory have reached the
file (a rough sketch of this pattern is given below).

case-msync-mt: Create a sparse file with a size of $MemTotal. Before creating
N threads, preallocate and prefault 1/2 of memory with mmap, using this sparse
file as backing store; the N threads then all write data there using the
preallocated space. When this is done, use msync to flush the changes back to
the file.

case-shm-pread-seq/rand: Start N processes to read sequentially/randomly a
file hosted on a tmpfs. The file's size is the same as $MemTotal and the read
size is half the file's size, so the end result is the file occupying 1/2 of
$MemTotal.

case-shm-pread-seq/rand-mt: Use threads instead of processes compared to the
above case. This will generate some pressure on the page table lock.
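
The write-then-msync pattern of the case-msync cases above can be sketched
roughly as follows; this is not the suite's code, and the file path and size
are arbitrary illustration values:

/*
 * Minimal msync sketch: map a file MAP_SHARED, dirty its pages, then
 * flush the changes back to the file with msync(MS_SYNC).
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        const char *path = "/tmp/vm-scalability/msync-example";
        size_t size = 16UL << 20;               /* arbitrary example size */
        int fd = open(path, O_RDWR | O_CREAT, 0644);

        if (fd < 0 || ftruncate(fd, size) < 0) {
                perror(path);
                return 1;
        }
        char *buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                         fd, 0);
        if (buf == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        memset(buf, 0x5a, size);                /* dirty every page */
        /* MS_SYNC blocks until the dirty pages have been written back. */
        if (msync(buf, size, MS_SYNC) < 0)
                perror("msync");
        munmap(buf, size);
        close(fd);
        return 0;
}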
case-shm-xread-seq/rand: Preallocate the space using mmap before forking N
processes to do the read, compared to case-shm-pread-seq/rand. The difference
between them is that the preallocation saves these processes an mmap call.

case-shm-xread-seq/rand-mt: Use threads instead of processes compared to the
above case. Since threads share the same VM, the page faults for some pages
may occur concurrently, so this test may be able to exercise page fault
scalability.
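
The move-and-verify pattern used by case-mbind and case-migrate-across-nodes
can be sketched roughly as follows. This is not one of the suite's
executables; it assumes libnuma is installed (compile with -lnuma) and at
least two NUMA nodes are available (e.g. a numa=fake=2 boot), and the page
count is an arbitrary illustration value:

/*
 * Minimal sketch of moving anonymous pages to node 1 with
 * numa_move_pages() and verifying their location with get_mempolicy().
 */
#include <numa.h>
#include <numaif.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define NR_PAGES 8                              /* arbitrary example count */

int main(void)
{
        long page = sysconf(_SC_PAGESIZE);
        void *pages[NR_PAGES];
        int nodes[NR_PAGES], status[NR_PAGES];

        if (numa_available() < 0 || numa_max_node() < 1) {
                fprintf(stderr, "need a NUMA system with at least 2 nodes\n");
                return 1;
        }
        char *buf = mmap(NULL, NR_PAGES * page, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        for (int i = 0; i < NR_PAGES; i++) {
                buf[i * page] = 1;              /* fault the page in */
                pages[i] = buf + i * page;
                nodes[i] = 1;                   /* destination node */
        }
        /* Move this process's pages to node 1. */
        if (numa_move_pages(0, NR_PAGES, pages, nodes, status,
                            MPOL_MF_MOVE) < 0) {
                perror("numa_move_pages");
                return 1;
        }
        /* MPOL_F_NODE | MPOL_F_ADDR reports the node each page is on. */
        for (int i = 0; i < NR_PAGES; i++) {
                int node = -1;

                get_mempolicy(&node, NULL, 0, pages[i],
                              MPOL_F_NODE | MPOL_F_ADDR);
                if (node != 1)
                        fprintf(stderr, "page %d on node %d, expected 1\n",
                                i, node);
        }
        return 0;
}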