From:

Alan made overcommit mode 2 and it doesn't work at all. A process passing the limit often does so at a moment of stack extension, and is killed by a segfault, which is no better than being OOM-killed.

Another problem is that close to the edge no other processes can be started, so that a sysadmin has problems logging in and investigating.

Below is a patch that does 3 things:

(1) It reserves a reasonable amount of virtual stack space (amount randomly chosen, no guarantees given) when the process is started, so that the common utilities will not be killed by a segfault on stack extension.

(2) It reserves a reasonable amount of virtual memory for root, so that root can do things when the system is out of memory.

(3) It limits a single process to 97% of what is left, so that an ordinary user is also able to use getty, login, bash, ps, kill and similar things when one of her processes gets out of control.

Since the current overcommit mode 2 is not really useful, I did not give this a new number.

The patch is just for playing, not to be applied by Linus. But, Andrew, I hope that you would be willing to put this in -mm so that people can experiment. Of course it only does something if one sets overcommit mode to 2.

Over the past month I have pressed people for feedback, and now have about a dozen reports, mostly positive, one very positive.
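The arithmetic behind points (1) to (3) can be modelled in user space. What follows is only a sketch under stated assumptions, not kernel code: `struct vm_model`, `allowed_pages()`, `enough_memory()` and `stack_vm_start()` are hypothetical names, the RAM term is a stand-in for whatever page count the kernel multiplies by `sysctl_overcommit_ratio` (that part of the expression is cut off in the hunk context), and `PAGE_SIZE` is assumed to be 4096. It follows the security/dummy.c hunk and the stack-grows-down branch of the fs/exec.c hunk.

```c
#include <stdbool.h>

#define PAGE_SIZE            4096UL /* assumption: 4 KiB pages */
#define EXTRA_STACK_VM_PAGES 20     /* same "random" value as the patch */

/* Hypothetical stand-ins for the kernel globals used by the check. */
struct vm_model {
    long ram_pages;        /* page count scaled by the ratio (assumed) */
    long swap_pages;       /* models total_swap_pages */
    int  overcommit_ratio; /* models sysctl_overcommit_ratio, in percent */
    long committed_pages;  /* models vm_committed_space */
};

/* Commit limit seen by one process, mirroring the dummy.c hunk. */
long allowed_pages(const struct vm_model *m, bool is_root, long proc_total_vm)
{
    long allowed = m->ram_pages * m->overcommit_ratio / 100;

    allowed += m->swap_pages;

    /* Leave the last 1/32 (about 3%) for root. */
    if (!is_root)
        allowed -= allowed / 32;

    /* Leave 1/32 of this process's own size for other processes. */
    allowed -= proc_total_vm / 32;

    return allowed;
}

/* True when the mode 2 check would let the caller proceed. */
bool enough_memory(const struct vm_model *m, bool is_root, long proc_total_vm)
{
    return m->committed_pages < allowed_pages(m, is_root, proc_total_vm);
}

/* Start of the initial stack VMA when the stack grows down:
 * the argument pages plus the extra reserved pages below stack_top. */
unsigned long stack_vm_start(unsigned long stack_top, unsigned long arg_size)
{
    arg_size += EXTRA_STACK_VM_PAGES * PAGE_SIZE;
    return stack_top - arg_size;
}
```

With 100000 RAM pages, a ratio of 50 and 14000 swap pages, the base limit is 64000 pages. A non-root process with a total_vm of 3200 pages is held to 61900, while root with the same total_vm gets 63900, so the difference is the slice kept back for root.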
Signed-off-by: Andrew Morton
---

 25-akpm/fs/exec.c            |   19 +++++++++++-------
 25-akpm/security/commoncap.c |    8 ++++++++
 25-akpm/security/dummy.c     |    8 ++++++++
 3 files changed, 27 insertions(+), 8 deletions(-)

diff -puN fs/exec.c~mm-overcommit-updates fs/exec.c
--- 25/fs/exec.c~mm-overcommit-updates	2004-11-30 01:22:43.969204160 -0800
+++ 25-akpm/fs/exec.c	2004-11-30 01:22:43.977202944 -0800
@@ -341,6 +341,8 @@ out_sig:
 	force_sig(SIGKILL, current);
 }
 
+#define EXTRA_STACK_VM_PAGES	20	/* random */
+
 int setup_arg_pages(struct linux_binprm *bprm, int executable_stack)
 {
 	unsigned long stack_base;
@@ -378,15 +380,15 @@ int setup_arg_pages(struct linux_binprm
 	memmove(to, to + offset, PAGE_SIZE - offset);
 	kunmap(bprm->page[j - 1]);
 
-	/* Adjust bprm->p to point to the end of the strings. */
-	bprm->p = PAGE_SIZE * i - offset;
-
 	/* Limit stack size to 1GB */
 	stack_base = current->signal->rlim[RLIMIT_STACK].rlim_max;
 	if (stack_base > (1 << 30))
 		stack_base = 1 << 30;
 	stack_base = PAGE_ALIGN(STACK_TOP - stack_base);
 
+	/* Adjust bprm->p to point to the end of the strings. */
+	bprm->p = stack_base + PAGE_SIZE * i - offset;
+
 	mm->arg_start = stack_base;
 	arg_size = i << PAGE_SHIFT;
 
@@ -395,11 +397,13 @@ int setup_arg_pages(struct linux_binprm
 		bprm->page[i++] = NULL;
 #else
 	stack_base = STACK_TOP - MAX_ARG_PAGES * PAGE_SIZE;
-	mm->arg_start = bprm->p + stack_base;
+	bprm->p += stack_base;
+	mm->arg_start = bprm->p;
 	arg_size = STACK_TOP - (PAGE_MASK & (unsigned long) mm->arg_start);
 #endif
 
-	bprm->p += stack_base;
+	arg_size += EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+
 	if (bprm->loader)
 		bprm->loader += stack_base;
 	bprm->exec += stack_base;
@@ -420,11 +424,10 @@ int setup_arg_pages(struct linux_binprm
 		mpnt->vm_mm = mm;
 #ifdef CONFIG_STACK_GROWSUP
 		mpnt->vm_start = stack_base;
-		mpnt->vm_end = PAGE_MASK &
-			(PAGE_SIZE - 1 + (unsigned long) bprm->p);
+		mpnt->vm_end = stack_base + arg_size;
 #else
-		mpnt->vm_start = PAGE_MASK & (unsigned long) bprm->p;
 		mpnt->vm_end = STACK_TOP;
+		mpnt->vm_start = mpnt->vm_end - arg_size;
 #endif
 		/* Adjust stack execute permissions; explicitly enable
 		 * for EXSTACK_ENABLE_X, disable for EXSTACK_DISABLE_X
diff -puN security/commoncap.c~mm-overcommit-updates security/commoncap.c
--- 25/security/commoncap.c~mm-overcommit-updates	2004-11-30 01:22:43.971203856 -0800
+++ 25-akpm/security/commoncap.c	2004-11-30 01:22:43.978202792 -0800
@@ -386,6 +386,14 @@ int cap_vm_enough_memory(long pages)
 		allowed -= allowed / 32;
 	allowed += total_swap_pages;
 
+	/* Leave the last 3% for root */
+	if (current->euid)
+		allowed -= allowed / 32;
+
+	/* Don't let a single process grow too big:
+	   leave 3% of the size of this process for other processes */
+	allowed -= current->mm->total_vm / 32;
+
 	if (atomic_read(&vm_committed_space) < allowed)
 		return 0;
 
diff -puN security/dummy.c~mm-overcommit-updates security/dummy.c
--- 25/security/dummy.c~mm-overcommit-updates	2004-11-30 01:22:43.972203704 -0800
+++ 25-akpm/security/dummy.c	2004-11-30 01:22:43.978202792 -0800
@@ -160,6 +160,14 @@ static int dummy_vm_enough_memory(long p
 		* sysctl_overcommit_ratio / 100;
 	allowed += total_swap_pages;
 
+	/* Leave the last 3% for root */
+	if (current->euid)
+		allowed -= allowed / 32;
+
+	/* Don't let a single process grow too big:
+	   leave 3% of the size of this process for other processes */
+	allowed -= current->mm->total_vm / 32;
+
 	if (atomic_read(&vm_committed_space) < allowed)
 		return 0;
 
_