Huge Page Text Support
This is a proposal for huge page text extension support. The main changes are a kernel patch, a glibc patch, and the Linux binutils patches.
The following features are implemented:
- To the section attribute flags, add SHF_GNU_HUGE_PAGE
#define SHF_GNU_HUGE_PAGE 0x02000000
described as follows:
SHF_GNU_HUGE_PAGE
The section contains text that will be placed in huge TLB pages at run time. If an implementation doesn't support huge TLB pages, the flag should be ignored.
- To Special Sections, add the following:
.text..huge, type SHT_PROGBITS, attributes SHF_ALLOC+SHF_EXECINSTR+SHF_GNU_HUGE_PAGE
.rodata..huge, type SHT_PROGBITS, attributes SHF_ALLOC+SHF_GNU_HUGE_PAGE
.text..huge
This section holds text that contributes to the program's memory image. The section should be placed in huge TLB pages. If an implementation doesn't support huge TLB pages, it should be treated as a normal .text section.
.rodata..huge
This section holds read-only data that contributes to the program's memory image. The section should be placed in huge TLB pages. If an implementation doesn't support huge TLB pages, it should be treated as a normal .rodata section.
- To the Program Header section, add a segment type PT_GNU_HUGE_PAGE:
#define PT_GNU_HUGE_PAGE (PT_LOOS + 0x474e553)
The array element specifies the location of a huge TLB page segment. The interpretation of the huge TLB page segment is implementation-dependent. If an implementation doesn't support huge TLB pages, the segment can be ignored. The PT_GNU_HUGE_PAGE segment should only be used in dynamic or static executables. If a shared library or position-independent executable contains such a segment, the behavior is undefined. An executable can have at most one PT_GNU_HUGE_PAGE segment, and it must precede any PT_LOAD segments in the program header.
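The two placement rules for PT_GNU_HUGE_PAGE (at most one such segment, and it must precede every PT_LOAD) can be checked mechanically. The following is a minimal sketch, not part of the proposed patches: the function name huge_page_phdr_ok is hypothetical, and it works on a plain array of p_type values rather than on real program headers. PT_LOAD and PT_LOOS carry their standard ELF values; PT_GNU_HUGE_PAGE uses the value proposed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef PT_LOAD
#define PT_LOAD 1u                 /* standard ELF value */
#endif
#ifndef PT_LOOS
#define PT_LOOS 0x60000000u        /* standard ELF value */
#endif
#define PT_GNU_HUGE_PAGE (PT_LOOS + 0x474e553)  /* value from this proposal */

/* Hypothetical helper: returns 1 if the sequence of p_type values obeys
   the proposal's rules, 0 otherwise.  A program header table with no
   PT_GNU_HUGE_PAGE entry is trivially valid. */
int huge_page_phdr_ok(const uint32_t *p_types, size_t n)
{
    size_t huge_count = 0;
    int seen_load = 0;

    assert(p_types != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (p_types[i] == PT_LOAD) {
            seen_load = 1;
        } else if (p_types[i] == PT_GNU_HUGE_PAGE) {
            huge_count++;
            /* Reject a second huge page segment, or one after a PT_LOAD. */
            if (seen_load || huge_count > 1)
                return 0;
        }
    }
    return 1;
}
```

A linker or loader could run such a check before deciding whether to honor the segment; what to do on failure is left open here, since the proposal only says the behavior is undefined for shared libraries and position-independent executables.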
- .text..huge.*. It has the same section type and attributes as .text..huge.
- .gnu.linkonce.ht.*. It has the same section type and attributes as .text..huge. If the linker sees more than one section with the same name, only one section will be kept.
- .rodata..huge.*. It has the same section type and attributes as .rodata..huge.
- .gnu.linkonce.hr.*. It has the same section type and attributes as .rodata..huge. If the linker sees more than one section with the same name, only one section will be kept.
- The assembler will set the proper type and attributes for the special sections listed above, regardless of what the assembly directive specifies. No other sections with the SHF_GNU_HUGE_PAGE attribute are allowed.
- By default, for executable outputs, the linker will group input .text..huge, .text..huge.* and .gnu.linkonce.ht.* sections into a single output .text..huge section, and input .rodata..huge, .rodata..huge.* and .gnu.linkonce.hr.* sections into a single output .rodata..huge section. A customized linker script can be used to put other text sections into the output .text..huge section and other read-only data sections into the output .rodata..huge section.
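The default grouping of input sections by name can be illustrated with a small classifier. This is only a sketch of the name-matching rule stated above, not the linker's actual implementation (which would be expressed in a linker script); the enum and the function classify_huge_section are hypothetical names, and a trailing ".*" is treated as a simple prefix match.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical labels for the output section an input section lands in. */
enum huge_out { OUT_NONE, OUT_TEXT_HUGE, OUT_RODATA_HUGE };

/* Returns nonzero if s starts with prefix. */
static int has_prefix(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Map an input section name to the default output section described
   in the proposal.  Names that match none of the patterns are left
   to the normal rules (OUT_NONE). */
enum huge_out classify_huge_section(const char *name)
{
    assert(name != NULL);

    if (strcmp(name, ".text..huge") == 0 ||
        has_prefix(name, ".text..huge.") ||
        has_prefix(name, ".gnu.linkonce.ht."))
        return OUT_TEXT_HUGE;

    if (strcmp(name, ".rodata..huge") == 0 ||
        has_prefix(name, ".rodata..huge.") ||
        has_prefix(name, ".gnu.linkonce.hr."))
        return OUT_RODATA_HUGE;

    return OUT_NONE;
}
```

For example, classify_huge_section(".text..huge.hot") yields OUT_TEXT_HUGE, while an ordinary ".text" section yields OUT_NONE and is only moved into a huge page section if a customized linker script says so.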
- By default, sections from system libraries won't be put in huge TLB pages. A customized linker script can be used to remove this limit, but the script must ensure that the bootstrap code remains in normal pages.
- The output .text..huge and .rodata..huge sections will be placed in a separate PT_LOAD segment and aligned to the maximum huge TLB page size supported by the architecture. This PT_LOAD segment will have the same addresses, offset and sizes as the PT_GNU_HUGE_PAGE segment, but its alignment will be set to the maximum normal page size.
- The kernel loader should skip the final mapping of the PT_LOAD segment specified by the PT_GNU_HUGE_PAGE segment. It should pass the huge TLB page size in the auxiliary vector entry AT_HUGEPAGESZ, and a pointer to the PT_GNU_HUGE_PAGE segment in the program header in another auxiliary vector entry, AT_HUGEPAGEPHDR.
- If there are relocations in the loadable huge TLB page segment, the behavior is undefined.
- The run-time start-up code will do the following:
- Locate the PT_GNU_HUGE_PAGE segment by checking AT_HUGEPAGEPHDR for the executable. If it isn't available, skip the steps below.
- If AT_HUGEPAGESZ isn't available, treat the PT_GNU_HUGE_PAGE segment as a normal PT_LOAD segment.
- If AT_HUGEPAGESZ is greater than the p_align field of the PT_GNU_HUGE_PAGE segment, treat the PT_GNU_HUGE_PAGE segment as a normal PT_LOAD segment. This is because the PT_GNU_HUGE_PAGE segment will be rounded up to the next integral number of full pages of size p_align, and a larger page size may affect the memory layout.
- Check the environment variable LD_GNU_HUGE_PAGE_FS for the huge TLB page file system. It defaults to /mnt/hugepage if it isn't set.
- Lock the executable to prevent other processes from accessing it.
- Create a backup file under /mnt/hugepage for the PT_LOAD segment specified by the PT_GNU_HUGE_PAGE segment, with the filename /mnt/hugepage/st_dev/st_ino/text, where st_dev and st_ino are the device and inode of the executable.
- If /mnt/hugepage doesn't exist or isn't mounted, unlock the executable and treat the PT_GNU_HUGE_PAGE segment as a normal PT_LOAD segment.
- If the backup file doesn't exist or is different from the executable, the run time should mmap the backup file with the address and the size specified by the PT_LOAD segment and copy the segment from the executable to the mmapped area. The run time should set the time stamps of the backup file to match those of the executable and unlock the executable after copying is finished.
- If mmap fails on the backup file, unlock the executable and treat the PT_GNU_HUGE_PAGE segment as a normal PT_LOAD segment.
- If the backup file exists and its time stamp is the same as that of the executable, the existing backup file will be mmapped and the executable will be unlocked.
- The user is ultimately responsible for maintaining consistency between the executable and its backup file under /mnt/hugepage. The run time will make no attempt other than checking the time stamp when comparing the executable/shared library and its backup file.
- This should be backward compatible with existing kernels and the Linux C library if the kernel supports mapping the PT_GNU_HUGE_PAGE segment as a normal PT_LOAD segment. Otherwise, an executable with the PT_GNU_HUGE_PAGE segment will abort if the huge TLB page isn't available for any reason.
There is a new linker option, -z huge, which will create a PT_GNU_HUGE_PAGE segment in the executable automatically.
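The order of the checks performed by the run-time start-up code, as described in the list of start-up steps above, can be summarized as a pure decision function. This is a hedged sketch rather than the glibc implementation: the struct huge_state, the enum huge_action and the function choose_huge_action are hypothetical names, the inputs are assumed to have been gathered already (auxiliary vector lookups, a mount check, a time stamp comparison), and locking, copying and the fallback after a failed mmap are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome of the start-up checks. */
enum huge_action {
    HUGE_SKIP,            /* no AT_HUGEPAGEPHDR: nothing to do            */
    HUGE_AS_NORMAL_LOAD,  /* fall back to mapping as a normal PT_LOAD     */
    HUGE_COPY_TO_BACKUP,  /* mmap the backup file and copy the segment    */
    HUGE_REUSE_BACKUP     /* mmap the existing, up-to-date backup file    */
};

/* Hypothetical snapshot of what the start-up code has learned. */
struct huge_state {
    int have_phdr;            /* AT_HUGEPAGEPHDR was present              */
    unsigned long hugepagesz; /* AT_HUGEPAGESZ value, 0 if absent         */
    unsigned long p_align;    /* p_align of the PT_GNU_HUGE_PAGE segment  */
    int fs_mounted;           /* huge TLB page file system is usable      */
    int backup_exists;        /* backup file already exists               */
    int timestamps_match;     /* backup time stamp equals executable's    */
};

/* Apply the checks in the order the proposal lists them.  If a later
   mmap of the backup file fails, the caller is expected to fall back
   to HUGE_AS_NORMAL_LOAD; that step is not modeled here. */
enum huge_action choose_huge_action(const struct huge_state *s)
{
    assert(s != NULL);

    if (!s->have_phdr)
        return HUGE_SKIP;
    if (s->hugepagesz == 0)
        return HUGE_AS_NORMAL_LOAD;
    if (s->hugepagesz > s->p_align)
        return HUGE_AS_NORMAL_LOAD;
    if (!s->fs_mounted)
        return HUGE_AS_NORMAL_LOAD;
    if (s->backup_exists && s->timestamps_match)
        return HUGE_REUSE_BACKUP;
    return HUGE_COPY_TO_BACKUP;
}
```

Keeping the decision separate from the side effects makes the fallback paths easy to audit: every path except the last two ends with the segment being treated as a normal PT_LOAD segment or skipped entirely.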