|
S/390 needs this for its binfmt_elf32 module.
Signed-off-by: Christian Borntraeger <cborntra@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This changes almost no code (the constant is still "3"), but at least it uses
the right constants for device_suspend() and fixes types in a few places. It
also adds an explanation of the constants to the Documentation.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This adds a few missing statics to swsusp.c, prints errors even when not
debugging, and fixes the last "pmdisk: " message. Also fixes a few comments.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Matthew Wilcox just converted parisc over to doing the generic irq code and
we ran across the symbol probe_irq_mask being undefined (and thus
preventing yenta_socket from loading).
It looks like the EXPORT_SYMBOL() was accidentally missed from
kernel/irq/autoprobe.c and no-one noticed on x86 because it's still in
i386_ksyms.c.
This patch corrects the problem so that the generic irq code now works
completely on parisc.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
account_steal_time called for idle doesn't work correctly:
1) steal time while idle needs to be added to the system time of idle
to get correct uptime numbers
2) if there is an i/o request outstanding the steal time should be
added to iowait, even if the hypervisor scheduled another virtual
cpu since we are still waiting for i/o.
3) steal time while idle without an i/o request outstanding has to
be added to cpustat->idle and not to cpustat->system.
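The rules above can be sketched as a small decision function (a hedged user-space model; the function name, the bucket enum and `nr_iowait_tasks` are illustrative, not the s390 code):

```c
#include <assert.h>

/* Illustrative buckets for where idle steal time gets charged. */
enum steal_bucket { STEAL_TO_IDLE, STEAL_TO_IOWAIT };

/*
 * Model of the rule described above: steal time observed while the
 * cpu is idle is charged to iowait if an i/o request is outstanding
 * (we are still waiting for i/o even if the hypervisor ran another
 * virtual cpu), otherwise to idle -- never to system.
 */
enum steal_bucket charge_idle_steal(int nr_iowait_tasks)
{
	return nr_iowait_tasks > 0 ? STEAL_TO_IOWAIT : STEAL_TO_IDLE;
}
```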
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Compat syscalls need to start compat_sys_ otherwise PA-RISC's compat
syscall wrappers don't work. Not that the individual involved bothered
to patch PA-RISC ...
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Paul Mackerras points out that doing the _raw_spin_trylock each time
through the loop will generate tons of unnecessary bus traffic.
Instead, after we fail to get the lock we should poll it with simple
loads until we see that it is clear and then retry the atomic op.
Assuming a reasonable cache design, the loads won't generate any bus
traffic until another cpu writes to the cacheline containing the lock.
Agreed.
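In C11 atomics the suggested pattern (test-and-test-and-set) looks roughly like this; a portable sketch, not the actual arch assembly:

```c
#include <assert.h>
#include <stdatomic.h>

/* 0 = unlocked, 1 = locked */
void spin_acquire(atomic_int *lock)
{
	for (;;) {
		int expected = 0;
		/* The expensive atomic op: attempted only when the lock looks free. */
		if (atomic_compare_exchange_weak(lock, &expected, 1))
			return;
		/*
		 * Poll with plain loads: these hit the local cache and generate
		 * no bus traffic until the owner writes the cacheline.
		 */
		while (atomic_load_explicit(lock, memory_order_relaxed) != 0)
			;
	}
}

void spin_release(atomic_int *lock)
{
	atomic_store(lock, 0);
}
```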
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Radheka Godse <radheka.godse@intel.com> pointed out that parameter parsing
failures still allow a module to be loaded. Trivial fix.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
A couple of one liners to resolve two issues that have come up regarding
audit.
Roger reported a problem with audit.c:audit_receive_skb which improperly
negates the errno argument when netlink_ack is called.
The second issue was reported by Steve on the linux-audit list:
auditsc.c:audit_log_exit uses %u instead of %d in the audit_log_format
call.
Please note, there is a mailing list available for audit discussion at
https://www.redhat.com/archives/linux-audit/
Signed-off-by: Peter Martuccelli <peterm@redhat.com>
Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Roger Luethi <rl@hellgate.ch>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch adds the architecture magic to replace the jiffies based cputime
with microsecond based cputime and it adds code to calculate involuntary
wait time. With this patch the numbers reported by top and ps when running
on LPAR or z/VM are finally not junk anymore.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch introduces the concept of (virtual) cputime. Each architecture
can define its method to measure cputime. The main idea is to define a
cputime_t type and a set of operations on it (see asm-generic/cputime.h).
Then use the type for utime, stime, cutime, cstime, it_virt_value,
it_virt_incr, it_prof_value and it_prof_incr and use the cputime operations
for each access to these variables. The default implementation is jiffies
based, and the effect of this patch on architectures which use the default
implementation should be negligible.
There is a second type cputime64_t which is necessary for the kernel_stat
cpu statistics. The default cputime_t is 32 bit and based on HZ; this will
overflow after 49.7 days. This is not enough for kernel_stat (IMHO not
enough for processes either), so it is necessary to have a 64 bit type.
The third thing that gets introduced by this patch is an additional field
for the /proc/stat interface: cpu steal time. An architecture can account
cpu steal time by calls to the account_steal_time function. The cpu which
backs a virtual processor doesn't spend all of its time on the virtual
cpu. To get meaningful cpu usage numbers this involuntary wait time needs
to be accounted and exported to user space.
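A minimal sketch of what such a type-plus-operations header can look like, modeled on the jiffies-based default described above (treat the exact macro names as illustrative):

```c
#include <assert.h>

/* Default, jiffies-based model: one cputime unit == one jiffy. */
typedef unsigned long cputime_t;
typedef unsigned long long cputime64_t;	/* for kernel_stat: avoids the 49.7-day wrap */

#define cputime_zero		((cputime_t)0)
#define cputime_add(a, b)	((a) + (b))
#define cputime_sub(a, b)	((a) - (b))
#define jiffies_to_cputime(j)	((cputime_t)(j))
#define cputime_to_jiffies(ct)	((unsigned long)(ct))
```

An architecture such as s390 can then keep every call site unchanged and swap in a microsecond-based representation simply by redefining the type and these operations.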
From: Hugh Dickins <hugh@veritas.com>
The p->signal check in account_system_time is insufficient. If the timer
interrupt hits near the end of exit_notify, after EXIT_ZOMBIE has been set,
another cpu may release_task (NULLifying p->signal) in between
account_system_time's check and check_rlimit's dereference. Nor should
account_it_prof risk send_sig. But surely account_user_time is safe?
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch adds an extra check to the acct_update_integrals() function:
the routine now returns early if 'delta' is 0, taking a quick exit when
there is nothing to be done.
Signed-off-by: Jay Lan <jlan@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
During resume, my previous patch switches over to the saved swsusp image
without suspending all devices first. This patch fixes that oversight, so
that the state of the hardware upon resume more closely matches the state
it had at suspend time.
While my previous patch alone seemed to work fine in my testing, it is
not fully correct without this as well.
Signed-off-by: Barry K. Nathan <barryn@pobox.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Since at least kernel 2.6.9, if not earlier, swsusp fails to properly
suspend and resume all devices.
The most notable effect is that resuming fails to properly reconfigure
interrupt routers. In 2.6.9 this was obscured by other kernel code, but in
2.6.10 this often causes post-resume APIC errors and near-total failure of
some PCI devices (e.g. network, sound and USB controllers).
Even in cases where interrupt routing is unaffected, this bug causes other
problems. For instance, on one of my systems I have to run "ifdown
eth0;ifup eth0" after resume in order to have functional networking, if I
do not apply this patch.
By itself, this patch is not theoretically complete; my next patch fixes
that. However, this patch is the critical one for fixing swsusp's behavior
in the real world.
Signed-off-by: Barry K. Nathan <barryn@pobox.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
When a thread stops for ptrace exit tracing, it cannot be resumed by
SIGKILL. Once PF_EXITING is set, SIGKILL will not cause a wakeup from stop
(see wants_signal in kernel/signal.c). This patch moves the ptrace stop
for exit tracing before the setting of PF_EXITING.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Upon reevaluation we think it is indeed safe to permit the race between
a ptrace call and the traced thread waking up, as long as it will never
get back to user mode. This patch makes SIGKILL wake up threads in
TASK_TRACED. That alone resolves most of the deadlock issues that
became possible with the introduction of TASK_TRACED, getting us back to
the killing behavior of 2.6.8 and before.
This patch also further cleans up ptrace detaching, so that threads are
left in TASK_STOPPED only if a job control stop is actually in effect,
and otherwise resume. This removes the past nuisances requiring a
SIGCONT to resume a thread even when it had a pending SIGKILL.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The __ptrace_unlink code that checks for TASK_TRACED fixed the problem of a
thread being left in TASK_TRACED when no longer being ptraced.
However, an oversight in the original fix made it fail to handle the
case where the child is ptraced by its real parent.
Fixed thus.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
into kroah.com:/home/greg/linux/BK/usb-2.6
|
|
The patch below removes an unused function from kernel/sched.c
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: Vasia Pupkin <ptushnik@gmail.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Kernel core files converted to use the new lock initializers.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Fix (harmless?) smp_processor_id() usage in preemptible section of
cpu_down.
Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This is the current remove-BKL patch. I test-booted it on x86 and x64, trying
every conceivable combination of SMP, PREEMPT and PREEMPT_BKL. All other
architectures should compile as well. (most of the testing was done with the
zaphod patch undone but it applies cleanly on vanilla -mm3 as well and should
work fine.)
this is the debugging-enabled variant of the patch which has two main
debugging features:
- debug potentially illegal smp_processor_id() use. Has caught a number
of real bugs - e.g. look at the printk.c fix in the patch.
- make it possible to enable/disable the BKL via a .config. If this
goes upstream we don't want this of course, but for now it gives
people a chance to find out whether any particular problem was caused
by this patch.
This patch has one important fix over the previous BKL patch: on PREEMPT
kernels if we preempted BKL-using code then the code still auto-dropped the
BKL by mistake. This caused a number of breakages for testers, which went
away once this bug was fixed.
Also, the debugging mechanism has been improved a lot relative to the
previous BKL patch.
Would be nice to test-drive this in -mm. There will likely be some more
smp_processor_id() false positives but they are 1) harmless 2) easy to fix up.
We could also find more real smp_processor_id() related breakages.
The most noteworthy fact is that no BKL-using code was found yet that relied
on smp_processor_id(), which is promising from a compatibility POV.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Fix some unlikely races in respect of vm_truncate_count.
Firstly, it's supposed to be guarded by i_mmap_lock, but some places copy a
vma structure by *new_vma = *old_vma: if the compiler implements that with a
bytewise copy, new_vma->vm_truncate_count could be munged, and new_vma later
appear up-to-date when it's not; so set it properly once under lock.
vma_link sets vm_truncate_count to mapping->truncate_count when adding an
empty vma: if new vmas are being added profusely while vmtruncate is in
progress, this lets them be skipped without scanning.
vma_adjust has a vm_truncate_count problem much like it had with anon_vma
under mprotect merge: when merging, be careful not to leave the vma marked
as up-to-date when it might not be, lest unmap_mapping_range be in progress
- set vm_truncate_count to 0 when in doubt. Similarly when mremap moves
ptes from one vma to another.
Cut a little code from __anon_vma_merge: now vma_adjust sets "importer" in the
remove_next case (to get its vm_truncate_count right), its anon_vma is already
linked by the time __anon_vma_merge is called.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Change the sched-domain debug routine to be called on a per-CPU basis, and
executed before the domain is actually attached to the CPU. Previously, all
CPUs would have their new domains attached, and then the debug routine would
loop over all of them.
This has two advantages: First, there are no longer any theoretical races: we
are running the debug routine on a domain that isn't yet active, and should
have no racing access from another CPU. Second, if there is a problem with a
domain, the validator will have a better chance to catch the error and print a
diagnostic _before_ the domain is attached, which may take down the system.
Also, change reporting of detected error conditions to KERN_ERR instead of
KERN_DEBUG, so they have a better chance of being seen in a hang on boot
situation.
The patch also does an unrelated (and harmless) cleanup in migration_thread().
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch adds a handful of cond_resched() points to a number of key,
scheduling-latency related non-inlined functions.
This reduces preemption latency for !PREEMPT kernels. These are scheduling
points complementary to PREEMPT_VOLUNTARY scheduling points (might_sleep()
places) - i.e. these are all points where an explicit cond_resched() had
to be added.
Has been tested as part of the -VP patchset.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
We don't want to execute off keventd since it might hold a semaphore our
callers hold too. This can happen when kthread_create() is called from
within keventd. This happened due to the IRQ threading patches but it
could happen with other code too.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This adds cond_resched_softirq() which can be used by _process context_
softirqs-disabled codepaths to preempt if necessary. The function will
enable softirqs before scheduling. (Later patches will use this
primitive.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This is another generic fallout from the voluntary-preempt patchset: a
cleanup of the cond_resched() infrastructure, in preparation of the latency
reduction patches. The changes:
- uninline cond_resched() - this makes the footprint smaller,
especially once the number of cond_resched() points increases.
- add a 'was rescheduled' return value to cond_resched. This makes it
symmetric to cond_resched_lock() and later latency reduction patches
rely on the ability to tell whether there was any preemption.
- make cond_resched() more robust by using the same mechanism as
preempt_kernel(): PREEMPT_ACTIVE. This preserves the task's
state - e.g. if the task is in TASK_ZOMBIE but gets preempted via
cond_resched() just prior to scheduling off, this approach preserves
TASK_ZOMBIE.
- the patch also adds need_lockbreak() which critical sections can use
to detect lock-break requests.
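The 'was rescheduled' return value enables call sites to restart lockless scans only when a preemption actually happened. A user-space model of that contract (the `fake_need_resched` flag and counter are illustrative stand-ins for TIF_NEED_RESCHED and the scheduler):

```c
#include <assert.h>

int fake_need_resched;	/* stands in for the per-task TIF_NEED_RESCHED flag */
int reschedule_count;	/* counts how often we actually rescheduled */

/*
 * Model: reschedule only when a reschedule is pending, and report
 * to the caller whether we did, so it can e.g. restart a scan.
 */
int model_cond_resched(void)
{
	if (!fake_need_resched)
		return 0;
	fake_need_resched = 0;
	reschedule_count++;	/* stands in for scheduling with PREEMPT_ACTIVE */
	return 1;
}
```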
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
SMP locking latencies are one of the last architectural problems that cause
millisec-category scheduling delays. CONFIG_PREEMPT tries to solve some of
the SMP issues but there are still lots of problems remaining: spinlocks
nested at multiple levels, spinning with irqs turned off, and non-nested
spinning with preemption turned off permanently.
The nesting problem goes like this: if a piece of kernel code (e.g. the MM
or ext3's journalling code) does the following:
spin_lock(&spinlock_1);
...
spin_lock(&spinlock_2);
...
then even with CONFIG_PREEMPT enabled, current kernels may spin on
spinlock_2 indefinitely. A number of critical sections break their long
paths by using cond_resched_lock(), but this does not break the path on
SMP, because need_resched() *of the other CPU* is not set so
cond_resched_lock() doesn't notice that a reschedule is due.
To solve this problem I've introduced a new spinlock field,
lock->break_lock, which signals towards the holding CPU that a
spinlock-break is requested by another CPU. This field is only set if a
CPU is spinning in a spinlock function [at any locking depth], so the
default overhead is zero. I've extended cond_resched_lock() to check for
this flag - in this case we can also save a reschedule. I've added the
lock_need_resched(lock) and need_lockbreak(lock) methods to check for the
need to break out of a critical section.
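The two checks described above can be modeled like this (a hedged user-space sketch of the semantics; `struct model_spinlock` and `model_need_resched` are illustrative, not the kernel types):

```c
#include <assert.h>

/* Illustrative model of a spinlock carrying the new break_lock field. */
struct model_spinlock {
	int locked;
	int break_lock;	/* set by a spinning CPU to request a lock-break */
};

int model_need_resched;	/* stands in for need_resched() on this CPU */

/* Has another CPU requested that we break out of the critical section? */
int model_need_lockbreak(struct model_spinlock *lock)
{
	return lock->break_lock;
}

/* Break for either reason: a spinner is waiting, or a resched is due. */
int model_lock_need_resched(struct model_spinlock *lock)
{
	return model_need_lockbreak(lock) || model_need_resched;
}
```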
Another latency problem was that the stock kernel, even with CONFIG_PREEMPT
enabled, didn't have any spin-nicely preemption logic for the following,
commonly used SMP locking primitives: read_lock(), spin_lock_irqsave(),
spin_lock_irq(), spin_lock_bh(), read_lock_irqsave(), read_lock_irq(),
read_lock_bh(), write_lock_irqsave(), write_lock_irq(), write_lock_bh().
Only spin_lock() and write_lock() [the two simplest cases] were covered.
In addition to the preemption latency problems, the _irq() variants in the
above list didn't do any IRQ-enabling while spinning - possibly resulting in
excessive irqs-off sections of code!
preempt-smp.patch fixes all these latency problems by spinning irq-nicely
(if possible) and by requesting lock-breaks if needed. Two
architecture-level changes were necessary for this: the addition of the
break_lock field to spinlock_t and rwlock_t, and the addition of the
_raw_read_trylock() function.
Testing done by Mark H Johnson and myself indicate SMP latencies comparable
to the UP kernel - while they were basically indefinitely high without this
patch.
I successfully test-compiled and test-booted this patch on top of BK-curr
using the following .config combinations: SMP && PREEMPT, !SMP && PREEMPT,
SMP && !PREEMPT and !SMP && !PREEMPT on x86, !SMP && !PREEMPT and SMP &&
PREEMPT on x64. I also test-booted x86 with the generic_read_trylock
function to check that it works fine. Essentially the same patch has been
in testing as part of the voluntary-preempt patches for some time already.
NOTE to architecture maintainers: generic_raw_read_trylock() is a crude
version that should be replaced with the proper arch-optimized version
ASAP.
From: Hugh Dickins <hugh@veritas.com>
The i386 and x86_64 _raw_read_trylocks in preempt-smp.patch are too
successful: atomic_read() returns a signed integer.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Heiko Carstens figured out that offlining a cpu can leak mm_structs because
the dying cpu's idle task fails to switch to init_mm and mmdrop its
active_mm before the cpu is down. This patch introduces idle_task_exit,
which allows the idle task to do this as Ingo suggested.
I will follow this up with a patch for ppc64 which calls idle_task_exit
from cpu_die.
Signed-off-by: Nathan Lynch <nathanl@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch removes two outdated/misleading comments from the CPU scheduler.
1) The first comment removed is simply incorrect. The function it
comments on is not used for what the comments says it is anymore.
2) The second comment is a leftover from when the "if" block it comments
on contained a goto. It does not any more, and the comment doesn't make
sense.
There isn't really a reason to add different comments, though someone might
feel differently in the case of the second one. I'll leave adding a
comment to anybody who wants to - more important to just get rid of them
now.
Signed-off-by: Josh Aas <josha@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch exports sched_setscheduler() so that it can be used by a kernel
module to set a kthread's scheduling policy and associated parameters.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
no need to call task_rq in setscheduler; just use rq
Signed-Off-By: Robert Love <rml@novell.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
schedule() can use prev instead of get_current().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Special casing tasks by interactive credit was helpful for preventing fully
cpu bound tasks from easily rising to interactive status.
However it did not select out tasks that had periods of being fully cpu
bound and then sleeping while waiting on pipes, signals etc. This led to a
more disproportionate share of cpu time.
Backing this out will no longer special case only fully cpu bound tasks,
and prevents the variable behaviour that occurs at startup before tasks
declare themselves interactive or not, and speeds up application startup
slightly under certain circumstances. It does cost in interactivity
slightly as load rises but it is worth it for the fairness gains.
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Change the granularity code to requeue tasks at their best priority instead
of changing priority while they're running. This keeps tasks at their top
interactive level during their whole timeslice.
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
We can requeue tasks more cheaply than by doing a complete dequeue followed
by an enqueue. Add the requeue_task function and perform it where possible.
This will be hit frequently by upcoming changes to the requeueing in
timeslice granularity.
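The reason a requeue is cheaper is that the priority-bitmap bookkeeping of a full dequeue/enqueue can be skipped; the list operation itself is just pointer surgery. A generic doubly-linked-list sketch of that move-to-tail operation (not the runqueue code itself):

```c
#include <assert.h>

struct list_head { struct list_head *next, *prev; };

void list_init(struct list_head *h) { h->next = h->prev = h; }

void list_del_entry(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

void list_add_tail_entry(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

/* requeue: unlink and relink at the tail of the same queue */
void requeue(struct list_head *e, struct list_head *queue)
{
	list_del_entry(e);
	list_add_tail_entry(e, queue);
}
```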
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The minimum timeslice was decreased from 10ms to 5ms. In the process, the
timeslice granularity was leading to much more rapid round robinning of
interactive tasks at cache-thrashing levels.
Restore minimum granularity to 10ms.
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Timeslice proportion has been increased substantially for -niced tasks. As
a result of this kernel threads have much larger timeslices than they
previously had.
Change kernel threads' nice value to -5 to bring their timeslice back in
line with previous behaviour. This means kernel threads will be less
likely to cause large latencies under periods of system stress for normal
nice 0 tasks.
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Convert whitespace in sched.c to tabs
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
There is a small problem with the active_load_balance() patch that Darren
sent out last week. As soon as we discover a potential 'target_cpu' from
'cpu_group' to try to push tasks to, we cease considering other CPUs in
that group as potential 'target_cpu's. We break out of the
for_each_cpu_mask() loop and try to push tasks to that CPU. The problem is
that there may well be other idle cpus in that group that we should also
try to push tasks to. Here is a patch to fix that small problem. The
solution is to simply move the code that tries to push the tasks into the
for_each_cpu_mask() loop and do away with the whole 'target_cpu' thing
entirely. Compiled & booted on a 16-way x440.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Fix can_migrate to allow aggressive steal for idle cpus. This -was- in
mainline, but I believe sched_domains kind of blasted it outta there. IMO,
it's a no-brainer for an idle cpu (with all that cache going to waste) to
be allowed to steal a task. The one enhancement I have made was to make
sure the whole cpu was idle.
Signed-off-by: <habanero@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch addresses some problems with wake_idle(). Currently wake_idle()
will wake a task on an alternate cpu if:
1) task->cpu is not idle
2) an idle cpu can be found
However the span of cpus to look for is very limited (only the task->cpu's
sibling). The scheduler should find the closest idle cpu, starting with
the lowest level domain, then going to higher level domains if allowed
(domain has flag SD_WAKE_IDLE). This patch does this.
This and the other two patches (also to be submitted) combined have
provided as much as a 5% improvement on that "online transaction DB workload"
and 2% on the industry standard J2EE workload.
I asked Martin Bligh to test these for regression, and he did not find any.
I would like to submit for inclusion to -mm and barring any problems
eventually to mainline.
Signed-off-by: <habanero@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
|
|
On x86-64, the attached patch is required to fix
> kernel/sys.c: In function `sys_setsid':
> kernel/sys.c:1078: error: `tty_sem' undeclared (first use in this function)
> kernel/sys.c:1078: error: (Each undeclared identifier is reported only once
> kernel/sys.c:1078: error: for each function it appears in.)
kernel/sys.c needs the tty_sem declaration from linux/tty.h.
|
|
Use the existing "tty_sem" to protect against the process tty changes
too.
|
|
This introduces __GFP_ZERO as an additional gfp_mask element to allow
requesting zeroed pages from the page allocator:
- Modifies the page allocator so that it zeroes memory if __GFP_ZERO is
set
- Replaces page zeroing done after page allocation with allocations
using __GFP_ZERO
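The pattern is the familiar one of pushing zeroing into the allocator behind a flag; a user-space sketch of the idea (`MY_GFP_ZERO` and `my_alloc` are illustrative stand-ins, not the kernel API):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MY_GFP_ZERO 0x1	/* illustrative stand-in for __GFP_ZERO */

/*
 * Allocator-side zeroing: with the flag set, callers no longer
 * need their own memset(p, 0, size) after every allocation.
 */
void *my_alloc(size_t size, unsigned int flags)
{
	void *p = malloc(size);
	if (p && (flags & MY_GFP_ZERO))
		memset(p, 0, size);
	return p;
}
```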
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
into shinybook.infradead.org:/home/dwmw2/bk/mtd-2.6
|
|
into shinybook.infradead.org:/home/dwmw2/bk/mtd-2.6
|
|
__exit_mm() is an inlined version of exit_mm(). This patch unifies them.
Saves 356 byte in exit.o.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
I just did a quick audit of the use of exit_state and the EXIT_* bit
macros. I guess I didn't really review these changes very closely when you
did them originally. :-(
I found several places that seem like lossy cases of query-replace without
enough thought about the code. Linus has previously said the >= tests
ought to be & tests instead. But for exit_state, it can only ever be 0,
EXIT_DEAD, or EXIT_ZOMBIE--so a nonzero test is actually the same as
testing & (EXIT_DEAD|EXIT_ZOMBIE), and maybe its code is a tiny bit better.
The case like in choose_new_parent is just confusing, to have the
always-false test for EXIT_* bits in ->state there too.
The two cases in wants_signal and do_process_times are actual regressions
that will give us back old bugs in race conditions. These places had
s/TASK/EXIT/ but not s/state/exit_state/, and their tests for exiting
tasks are now wrong and never catch them. I take it back: there is no
regression in wants_signal in practice I think, because of the PF_EXITING
test that makes the EXIT_* state checks superfluous anyway. So that is
just another cosmetic case of confusing code. But in do_process_times,
there is that SIGXCPU-while-exiting race condition back again.
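The claimed equivalence is easy to check: since exit_state only ever holds 0, EXIT_ZOMBIE or EXIT_DEAD, a nonzero test and the bitmask test agree on every possible value (bit values below are illustrative single bits, as in the kernel):

```c
#include <assert.h>

#define EXIT_ZOMBIE 0x10	/* illustrative bit values */
#define EXIT_DEAD   0x20

/* The two forms compared in the text above. */
int nonzero_test(int exit_state)
{
	return exit_state != 0;
}

int bitmask_test(int exit_state)
{
	return (exit_state & (EXIT_ZOMBIE | EXIT_DEAD)) != 0;
}
```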
Signed-off-by: Roland McGrath <roland@redhat.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
There is really no point in each task_struct having its own waitchld_exit.
In the only use of it, the waitchld_exit of each thread in a group gets
woken up at the same time. So, there might as well just be one wait queue
for the whole thread group. This patch does that by moving the field from
task_struct to signal_struct. It should have no effect on the behavior,
but saves a little work and a little storage in the multithreaded case.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
There is a BUG_ON in ptrace_stop that hits if the thread is not ptraced.
However, there is no synchronization between a thread deciding to do a
ptrace stop and so going here, and its ptracer dying and so detaching from
it and clearing its ->ptrace field.
The RHEL3 2.4-based kernel has a backport of a slightly older version of
the 2.6 signals code, which has a different but equivalent BUG_ON. This
actually bit users in practice (when the debugger dies), but was
exceedingly difficult to reproduce in contrived circumstances. We moved
forward in RHEL3 just by removing the BUG_ON, and that fixed the real user
problems even though I was never able to reproduce the scenario myself.
So, to my knowledge this scenario has never actually been seen in practice
under 2.6. But it's plain to see from the code that it is indeed possible.
This patch removes that BUG_ON, but also goes further and tries to handle
this case more gracefully than simply avoiding the crash. By removing the
BUG_ON alone, it becomes possible for the real parent of a process to see
spurious SIGCHLD notifications intended for the debugger that has just
died, and have its child wind up stopped unexpectedly. This patch avoids
that possibility by detecting the case when we are about to do the ptrace
stop but our ptracer has gone away, and simply eliding that ptrace stop
altogether as if we hadn't been ptraced when we hit the interesting event
(signal or ptrace_notify call for syscall tracing or something like that).
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
After my last change, there are plenty of unused bits available in the new
flags word in signal_struct. This patch moves the `group_exit' flag into
one of those bits, saving a word in signal_struct.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The `sig_avoid_stop_race' checks fail to catch a related race scenario that
can happen. I don't think this has been seen in nature, but it could
happen in the same sorts of situations where the observed problems come up
that those checks work around. This patch takes a different approach to
catching this race condition. The new approach plugs the hole, and I think
is also cleaner.
The issue is a race between one CPU processing a stop signal while another
CPU processes a SIGCONT or SIGKILL. There is a window in stop-signal
processing where the siglock must be released. If a SIGCONT or SIGKILL
comes along here on another CPU, then the stop signal in the midst of being
processed needs to be discarded rather than having the stop take place
after the SIGCONT or SIGKILL has been generated. The existing workaround
checks for this case explicitly by looking for a pending SIGCONT or SIGKILL
after reacquiring the lock.
However, there is another problem related to the same race issue. In the
window where the processing of the stop signal has released the siglock,
the stop signal is not represented in the pending set any more, but it is
still "pending" and not "delivered" in POSIX terms. The SIGCONT coming in
this window is required to clear all pending stop signals. But, if a stop
signal has been dequeued but not yet processed, the SIGCONT generation will
fail to clear it (in handle_stop_signal). Likewise, a SIGKILL coming here
should prevent the stop processing and make the thread die immediately
instead. The `sig_avoid_stop_race' code checks for this by examining the
pending set to see if SIGCONT or SIGKILL is in it. But this fails to
handle the case where another CPU running another thread in the same
process has already dequeued the signal (so it can no longer be found in
the pending set). We must catch this as well, so that the same problems do
not arise when another thread on another CPU acts quickly.
I've fixed this by dumping the `sig_avoid_stop_race' kludge in favor of a
little explicit bookkeeping. Now, dequeuing any stop signal sets a flag
saying that a pending stop signal has been taken on by some CPU since the
last time all pending stop signals were cleared due to SIGCONT/SIGKILL.
The processing of stop signals checks the flag after the window where it
released the lock, and abandons the signal if the flag has been cleared.
code that clears pending stop signals on SIGCONT generation also clears
this flag. The various places that are trying to ensure the process dies
quickly (SIGKILL or other unhandled signals) also clear the flag. I've
made this a general flags word in signal_struct, and replaced the
stop_state field with flag bits in this word.
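The flag bookkeeping described above can be sketched in standalone C (the
flag name and bit value here are illustrative, not necessarily the kernel's):

```c
#include <assert.h>

#define SIGNAL_STOP_DEQUEUED 0x1	/* illustrative flag bit */

struct sig { unsigned int flags; };

/* Dequeuing a stop signal records that some CPU has taken one on. */
static void dequeue_stop(struct sig *s)
{
	s->flags |= SIGNAL_STOP_DEQUEUED;
}

/* Generating SIGCONT (or forcing a quick death) clears the flag along
 * with all pending stop signals. */
static void handle_cont(struct sig *s)
{
	s->flags &= ~SIGNAL_STOP_DEQUEUED;
}

/* After re-taking the siglock, the stop in progress is abandoned unless
 * the flag is still set. */
static int stop_still_valid(const struct sig *s)
{
	return (s->flags & SIGNAL_STOP_DEQUEUED) != 0;
}
```

A SIGCONT racing into the unlocked window thus cancels the in-flight stop
without any scan of the pending set.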
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Peter Chubb recently split out a standalone sys_ni.c file for the not
implemented syscalls. This patch removes the redundant sys_delete_module()
in module.c.
Signed-off-by: Coywolf Qi Hunt <coywolf@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
- Merge the sys32_rt_sigtimedwait functions in the X86_64, IA64, PPC64, MIPS,
SPARC64 and S390 32-bit layers into one compat_rt_sigtimedwait function. This
also fixes a bug where wrong information was copied to the 32-bit userspace
siginfo structure on X86_64, IA64 and SPARC64 when calling sigtimedwait from
the 32-bit layer.
- Change all name the of siginfo_t32 structure in X86_64, IA64, MIPS,
SPARC64 and S390 to the name compat_siginfo_t as used in PPC64.
- Patch introduced a macro __COMPAT_ENDIAN_SWAP__ in
include/asm-mips/compat.h when MIPS kernel is compiled in little-endian
mode. This macro is used to do byte swapping in function
sigset_from_compat.
- This patch is only tested on X86_64 and IA_64.
Signed-off-by: Zou Nan hai <Nanhai.zou@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
When setting the 'cpu_isolated_map' mask, check that the user input value
is valid (in range 0 .. NR_CPUS - 1). Also fix up kernel-parameters.txt
for this parameter.
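The range check amounts to something like this (a standalone sketch; the
real parser also has to handle the list syntax, and NR_CPUS is
configuration-dependent):

```c
#include <assert.h>

#define NR_CPUS 8	/* example value; config-dependent in the kernel */

/* Return 1 if a cpu number given to isolcpus= is usable. */
static int isolated_cpu_valid(long cpu)
{
	return cpu >= 0 && cpu < NR_CPUS;
}
```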
Signed-off-by: Randy Dunlap <rddunlap@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
A while back we added the PR_SET_NAME prctl, but no PR_GET_NAME. I guess
we should add this, if only to enable testing of PR_SET_NAME.
Signed-off-by: Prasanna Meda <pmeda@akamai.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Move 'panic_timeout' to linux/kernel.h.
ipmi_watchdog.c wanted to know why panic_timeout isn't in some header file.
However, ipmi_watchdog.c doesn't even use it, so that reference was
deleted. Other references now use kernel.h instead of straight extern int.
Signed-off-by: Randy Dunlap <rddunlap@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Based on an initial patch from Oleg Nesterov <oleg@tv-sign.ru>
rcu_data.last_qsctr is not needed. Actually, not even a counter is needed,
just a flag that indicates that there was a quiescent state.
Signed-Off-By: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The patch below makes two needlessly global structs static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
rcu_ctrlblk.lock is used to read the ->cur and ->next_pending
atomically in __rcu_process_callbacks(). It can be replaced
by a couple of memory barriers.
rcu_start_batch:
	rcp->next_pending = 0;
	smp_wmb();
	rcp->cur++;
__rcu_process_callbacks:
	rdp->batch = rcp->cur + 1;
	smp_rmb();
	if (!rcp->next_pending)
		rcu_start_batch(rcp, rsp, 1);
This way, if __rcu_process_callbacks() sees incremented ->cur value,
it must also see that ->next_pending == 0 (or rcu_start_batch() is
already in progress on another cpu).
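The pairing above can be exercised in plain C with stub barriers (the
macros below are illustrative stand-ins for the kernel's smp_wmb()/smp_rmb();
on real SMP hardware they emit barrier instructions):

```c
#include <assert.h>

/* Compiler-only barriers standing in for the kernel primitives. */
#define smp_wmb() __asm__ __volatile__("" ::: "memory")
#define smp_rmb() __asm__ __volatile__("" ::: "memory")

struct rcu_ctrl { long cur; int next_pending; };

/* Writer side: clear next_pending strictly before the new ->cur is
 * made visible. */
static void start_batch(struct rcu_ctrl *rcp)
{
	rcp->next_pending = 0;
	smp_wmb();		/* next_pending store before cur store */
	rcp->cur++;
}

/* Reader side: sample ->cur first; after the read barrier, seeing the
 * incremented cur guarantees seeing next_pending == 0. */
static int reader_sees_cleared(struct rcu_ctrl *rcp, long old_cur)
{
	long batch = rcp->cur;
	smp_rmb();		/* cur load before next_pending load */
	if (batch != old_cur)
		return rcp->next_pending == 0;
	return 1;		/* no new batch observed; nothing to check */
}
```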
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch corrects a problem that was originally added with the nanosecond
timestamps in stat patch. The problem is that some file systems don't have
enough space in their on-disk inode to save nanosecond timestamps, so they
truncate the c/a/mtime to seconds when flushing a dirty inode. In core the
inode would have full jiffies granularity.
This can be observed by programs as a timestamp that jumps backwards under
specific loads when an inode is flushed and then reloaded from disk.
The problem was already known when the original patch went in, but it
wasn't deemed important enough at that time. So far there has been only
one report of it causing problems. Now Tridge is worried that it will
break running Excel over samba4 because Excel seems to do very anal
timestamp checking and samba4 will supply 100ns timestamps over the
network.
This patch solves it by putting the time resolution into the superblock of
a fs and always rounding the in-core timestamps to that granularity.
This also supersedes some previous ext2/3 hacks to flush the inode less
often when only the subsecond timestamp changes.
I tried to keep the overhead low, in particular it tries to keep divisions
out of fast paths as far as possible.
The patch is quite big but 99% of it is just relatively straight forward
search'n'replace in a lot of filesystems. Unconverted filesystems will
default to a 1ns granularity, but may still show the problem if they
continue to use
CURRENT_TIME. I converted all in tree fs.
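The rounding can be sketched as a standalone helper (the struct and
function names here are illustrative; granularity is taken in nanoseconds,
and the common cases avoid a division, as described above):

```c
#include <assert.h>

struct ts { long tv_sec; long tv_nsec; };

/* Round an in-core timestamp down to the superblock's granularity. */
static struct ts trunc_ts(struct ts t, unsigned long gran)
{
	if (gran == 1000000000UL)	/* whole seconds: just drop nsec */
		t.tv_nsec = 0;
	else if (gran > 1)		/* general case needs a division */
		t.tv_nsec -= t.tv_nsec % gran;
	/* gran <= 1: full nanosecond resolution, nothing to do */
	return t;
}
```

With this applied consistently, a timestamp written to disk and reloaded
can never appear to jump backwards past the advertised resolution.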
One possible future extension of this would be to have two time
granularities per superblock - one that specifies the visible resolution,
and the other to specify how often timestamps should be flushed to disk,
which could be tuned with a mount option per fs (e.g. often m/atimes don't
need to be flushed every second). Would be easy to do as an addon if
someone is interested.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
I realized that the best way to get the sys_time/sys_stime problem fixed is
to make sys_time 64 bit safe by using "time_t *" instead of "int *" and to
introduce two proper compat functions compat_sys_time and compat_sys_stime.
The prototype change of sys_time is transparent for 32 bit architectures
because both "int" and "time_t" are 32 bit. For 64 bit the type change
would be wrong but luckily no 64 bit architecture uses sys_time/sys_stime
in 64 bit mode. The patch makes the following change:
ia64 : Remove sys32_time, use compat_sys_time and
add (!!) compat_sys_stime to compat syscall table.
mips : Use compat_sys_time/compat_sys_stime in 32 bit syscall table.
Add #ifdef magic to compile sys_time/sys_stime and
compat_sys_time/compat_sys_stime only if needed.
parisc : Remove sys32_time, use compat_sys_time and compat_sys_stime.
ppc64 : remove sys32_time, ppc64_sys32_stime and ppc64_sys_stime.
Use common compat_sys_time, compat_sys_stime and sys_stime.
s390 : Use compat_sys_stime. Add #ifdef magic to compile
sys_time/sys_stime and compat_sys_time/compat_sys_stime only
if needed.
sparc64 : Use compat_sys_time/compat_sys_stime in 32 bit syscall table.
um : Remove um_time and um_stime. Use common functions sys_time and
sys_stime. This adds a CAP_SYS_TIME check to UMs stime call.
x86_64 : Remove sys32_time. Use compat_sys_time and compat_sys_stime
in 32 bit syscall table.
The original stime bug is fixed for mips, parisc, s390, sparc64 and
x86_64. Can the arch-maintainers please take a look at this?
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
Convert compat_time_t to time_t in 32 bit emulation for sys_stime and
consolidate all the different implementation of sys_time, sys_stime and
their 32-bit emulation parts.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
We can call might_sleep() functions on the oops handling path (under do_exit).
There seems little point in emitting spurious might_sleep() warnings into the
logs after the kernel has oopsed.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Bring total_forks under tasklist_lock. When most of the fork code,
including nr_threads, was moved from do_fork() to copy_process() in 2.6,
this was left out.
Although accuracy of total_forks is not important, it would be nice to add
this. It does not involve additional cost, and the code will be cleaner if
it is grouped with nr_threads. The difference is, total_forks will
increase on fork, but nr_threads will increase on fork and decrease on
exit.
I also moved the extern declaration from proc_misc.c to sched.h.
Signed-off-by: Prasanna Meda <pmeda@akamai.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This code is the same for all architectures with the following invariants:
- arm guarantees irqs are disabled when calling irq_exit, so it can call
__do_softirq directly instead of do_softirq
- arm26 has been totally broken for about half a year; I didn't care for it
- some architectures use softirq_pending(smp_processor_id()) instead of
local_softirq_pending, but they always evaluate to the same
This patch moves the out of line irq_exit implementation from
kernel/irq/handle.c which depends on CONFIG_GENERIC_HARDIRQS to
kernel/softirq.c which is always compiled, tweaks it for the arm special
case and moves the irq_enter/irq_exit/nmi_enter/nmi_exit bits from
asm-*/hardirq.h to linux/hardirq.h
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Fix module parameter quote handling.
Module parameter strings (with spaces) are quoted like so:
"modprm=this test"
and not like this:
modprm="this test"
Signed-off-by: Randy Dunlap <rddunlap@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch offers a common accounting data-collection method for memory
usage, for various accounting packages including BSD accounting, ELSA, CSA
and any other acct packages that use a common layer of data collection.
New struct fields are added to mm_struct to save high watermarks of rss
usage as well as virtual memory usage.
New struct fields are added to task_struct to collect accumulated rss usage
and vm usages.
These data are collected on per process basis.
Signed-off-by: Jay Lan <jlan@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch offers a common accounting data-collection method for I/O, for
various accounting packages including BSD accounting, ELSA, CSA and any
other acct packages that use a common layer of data collection.
The patch modifies fs/read_write.c to collect per-process data on
characters read/written in bytes and the number of read/write syscalls made.
New struct fields are added to task_struct to store the data.
These data are collected on per process basis.
Signed-off-by: Jay Lan <jlan@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The attached patch fixes a number of problems in the VM routines:
(1) Some inline funcs don't compile if CONFIG_MMU is not set.
(2) swapper_pml4 needn't exist if CONFIG_MMU is not set.
(3) __free_pages_ok() doesn't compensate for set_page_refs()'s different
behaviour if CONFIG_MMU is not set.
(4) swsusp.c invokes TLB flushing functions without including the header file
that declares them.
CONFIG_SHMEM semantics:
- If MMU: Always enabled if !EMBEDDED
- If MMU && EMBEDDED: configurable
- If !MMU: disabled
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The attached patch makes it possible to support gp-rel addressing for small
variables. Since the FR-V CPUs have fixed-length instructions and plenty of
general-purpose registers, one register is nominated as a base for the small
data area. This makes it possible to use single-insn accesses to access
global and static variables instead of having to use multiple instructions.
This, however, causes problems with small variables used to pinpoint the
beginning and end of sections. The compiler assumes it can use gp-rel
addressing for these, but the linker then complains because the displacement
is out of range.
By declaring certain variables as arrays or by forcing them into named
sections, the compiler is persuaded to access them as if they can be outside
the displacement range. Declaring the variables as "const void" type also
works.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
In the current kernel/capability.c:sys_capset() code, permission is
denied if CAP_SETPCAP is not held and pid is positive. pid=0 means use
the current process, and this is allowed. But using the current
process' pid is not allowed. The man page for capset simply says that
CAP_SETPCAP is required to use this function, and does not mention the
exception for pid=0.
The current behavior seems inconsistent. The attached patch also
allows a process to call capset() on itself.
Signed-off-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The attached patch removes checks from kernel/capability.c which are
redundant with cap_capset_check() code, and moves the capset_check() calls
to immediately before the capset_set() calls. This allows capset_check()
to accurately check the setter's permission to set caps on the target.
Please apply.
Signed-off-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: Stephen Smalley <sds@epoch.ncsc.mil>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
into ppc970.osdl.org:/home/torvalds/v2.6/linux
|
|
Some machines spend minutes of CPU time during suspend in a stupid O(n^2)
algorithm. This patch replaces it with an O(n) algorithm, making swsusp usable
to some people.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This adds statics in a few places and fixes stale references to pmdisk.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
swsusp contains a few one-line helpers that only make reading/understanding the code
more difficult. Also warn the user when something goes wrong, instead of
waking machine with corrupt data.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
A variable used only for writing is a bad idea.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
At a few points we still refer to swsusp as "pmdisk"... it might confuse
someone not knowing the full history.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch exports to userspace the boot loader ID which has been exported
by (b)zImage boot loaders since boot protocol version 2.
It is needed so that update tools that update kernels from vendors know which
bootloader file they need to update; e.g. right now those tools do all kinds of
hairy heuristics to find out if it's grub or lilo or .. that installed the
kernel. Those heuristics are fragile in the presence of more than one
bootloader (which isn't that uncommon in OS upgrade situations).
Tested on i386 and x86-64; as far as I know those are the only
architectures which use zImage/bzImage format.
Signed-Off-By: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
* Treat the gate page as part of the kernel, to improve kernel backtraces.
* Honour CONFIG_KALLSYMS_ALL, all symbols are valid, not just text.
Signed-off-by: Keith Owens <kaos@ocs.com.au>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
|
switch_uid() doesn't care about tasklist_lock, so do it outside
the lock and avoid a subtle (and very very unlikely to trigger)
AB-BA deadlock.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
into kroah.com:/home/greg/linux/BK/usb-2.6
|
|
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
|
|
This patch reverts the additions of an ABI supporting thread and process
CPU clocks in the posix-timers code. This returns us to 2.6.9's condition,
there is no support for any new clockid_t values for process CPU clocks.
This also fixes the return value for clock_nanosleep when unsupported (I
think this is used only by sgi-timer at the moment). The POSIX-specified
code for valid clocks that don't support the sleep operation is ENOTSUP.
On most architectures the kernel doesn't define ENOTSUP and this name is
defined in userland the same as the kernel's EOPNOTSUPP.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Reimplement parameter attributes using attribute group.
This makes more sense, for, while they reside in a separate
subdirectory, they belong to the owning module and their
lifetime exactly equals the lifetime of the owning module,
and it's simpler.
Signed-off-by: Tejun Heo <tj@home-tj.org>
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
|
|
Reimplement section attributes using attribute group. This
makes more sense, for, while they reside in a separate
subdirectory, they belong to the owning module and their
lifetime exactly equals the lifetime of the owning module,
and it's simpler.
Signed-off-by: Tejun Heo <tj@home-tj.org>
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
|
|
Modify module_attribute show/store methods to accept self
argument to enable further extensions.
Signed-off-by: Tejun Heo <tj@home-tj.org>
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
|
|
Make module.mkobj inline, as this is simpler and is what's usually done
with kobjs when one represents an entity.
Signed-off-by: Tejun Heo <tj@home-tj.org>
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
|
|
Klaus Dittrich observed this bug and posted a test case for it.
This patch fixes both that failure mode and some other possible ones. What
Klaus saw was a false negative (i.e. ECHILD when there was a child)
when the group leader was a zombie but delayed because other children
live; in the test program this happens in a race between the two threads
dying on a signal.
The change to the TASK_TRACED case avoids a potential false positive
(blocking, or WNOHANG returning 0, when there are really no children
left), in the race condition where my_ptrace_child returns zero.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
into shinybook.infradead.org:/home/dwmw2/bk/mtd-2.6
|
|
This fixes types so that sparse has less stuff to complain about.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Fixes typo in header, please apply,
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This fixes a confusing printk.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This fixes a memory leak when we are low on memory during suspend. Ouch, and
nr_needed_pages is only used twice, and only written :-(. I guess that can
wait for 2.6.10.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This prevents an oops when not enough memory is available during resume.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Fix an oops in sched_domain_debug when using the isolcpus= option.
Also move a debug check for validating groups into the "for-each-group"
loop, where it should be.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The real bug was in the debugging code, not the actual domain data
structure setup.
Cset exclude: sivanich@sgi.com[torvalds]|ChangeSet|20041207160443|30564
|
|
The isolcpus option is broken in 2.6.10-rc2-bk2. The domains are no longer
being properly initialized (which results in a panic at bootup).
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Specify an initial value for signal_struct's stop_state field whenever a
signal_struct variable is created.
Bug was discovered through the occasional failure of telnet(1) to connect.
Signed-off-by: Joe Korty <joe.korty@ccur.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This change brings the semantics in line with 2.4 and with what the man
page says; it also optimises by avoiding an unneeded lookup in the uid
cache when who is the same as current->uid.
sys_set/getpriority was rewritten in 2.5/2.6, perhaps while transitioning
to the pid maps. It now has a semantic bug when uid is zero. Note that
akpm also fixed a refcount leak and locking in the new functions in
changeset
http://linus.bkbits.net:8080/linux-2.5/cset@1.1608.10.84
Signed-off-by: <pmeda@akamai.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The generic irq affinity code limits us to a single cpu target regardless
of what the architecture supports. If required this should be done in the
architecture specific ->set_affinity call.
With this patch ppc64 is able to select all cpus affinity again.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Stephen Rothwell noted a case where one CPU was sitting in userspace, one
in stop_machine() waiting for everyone to enter stopmachine(). This can
happen if migration occurs at exactly the wrong time with more than 2 CPUS.
Say we have 4 CPUS:
1) stop_machine() on CPU 0 creates stopmachine() threads for CPUS 1, 2
and 3, and yields waiting for them to migrate to their CPUs and
ack.
2) stopmachine(2) gets rebalanced (probably on exec) to CPU 1.
3) stopmachine(2) calls set_cpus_allowed on CPU 1, sleeps awaiting
migration thread.
4) stopmachine(1) calls set_cpus_allowed on CPU 0, moves onto CPU1 and
starts spinning.
Now the migration thread never runs, and we deadlock. The simplest
solution is for stopmachine() to yield until they are all in place.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Vadim says:
I was reading through the kernel/power/Kconfig file, and noticed that
the wording was slightly unclear. I poked at it a bit, hopefully making
the description a tad more straightforward, but you be the judge. :)
Diffed against 2.6.10-rc2.
From: Vadim Lobanov <vlobanov@speakeasy.net>
Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
If we're waiting on a futex and we are woken up, it's either because
someone did FUTEX_WAKE, we timed out, or have been signalled. However, the
WARN_ON(!signal_pending(current)) test is overzealous: with threads (a
common use of futexes), we share the signal handler and the other
thread might get to the signal before us. In addition, exit_notify()
can do a recalc_sigpending_tsk() on us, which will then clear our
TIF_SIGPENDING bit, making signal_pending(current) return false.
Returning EINTR is a little strange in this case, since this thread
hasn't handled a signal. However, with threads it's the best we can
do: there's always a race where another thread could have been the
actual one to handle the signal.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
into dwmw2.baythorne.internal:/inst/bk/mtd-2.6
|
|
We just spent some days fighting a rare race in one of the distros that
backported some of timer.c from 2.6 to 2.4 (though they missed a bit).
The actual race we found didn't happen in 2.6, _but_ code inspection showed
that a similar race is still present in 2.6; explanation below:
Code removing a timer from a list (run_timers or del_timer) takes that CPU's
list lock, does list_del, then timer->base = NULL.
It is mandatory that this timer->base = NULL is visible to other CPUs only
after the list_del() is complete. If not, then mod_timer could see it NULL,
and thus take its own CPU's list lock rather than the one for the CPU the
timer was being removed from, and thus the list_add in mod_timer() could
race with the list_del() from
run_timers() or del_timer().
Our race happened with run_timers(), which _DOES_ contain a proper smp_wmb() in the
right spot in 2.6, but didn't in the "backport" we were fighting with.
However, del_timer() doesn't have such a barrier, and thus is subject to this race in
2.6 as well. This patch fixes it.
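The required ordering can be sketched in standalone C (list_del() and
smp_wmb() here are simplified stand-ins for the kernel primitives; the
event log only exists to make the ordering visible):

```c
#include <assert.h>

/* Compiler-only barrier standing in for the kernel's smp_wmb(). */
#define smp_wmb() __asm__ __volatile__("" ::: "memory")

struct timer { struct timer *next; void *base; };

static char order[3];
static int n;

/* Simplified unlink; records that the list operation happened. */
static void list_del(struct timer *t)
{
	t->next = 0;
	order[n++] = 'd';
}

/* The unlink must be visible to other CPUs before base is cleared, or
 * mod_timer() there can pick the wrong per-CPU list lock. */
static void detach_timer(struct timer *t)
{
	list_del(t);	/* 1: unlink under the owning CPU's lock */
	smp_wmb();	/* 2: make the unlink visible first */
	t->base = 0;	/* 3: only now mark the timer as detached */
	order[n++] = 'b';
}
```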
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This adds an early polled-mode "uart" console driver, based on Andi Kleen's
early_printk work.
The difference is that this locates the UART device directly by its MMIO or
I/O port address, so we don't have to make assumptions about how ttyS
devices will be named. After the normal serial driver starts, we try to
locate the matching ttyS device and start a console there.
Sample usage:
console=uart,io,0x3f8
console=uart,mmio,0xff5e0000,115200n8
If the baud rate isn't specified, we peek at the UART to figure it out.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
PREEMPT_RT on SMP systems triggered weird (very high) load average
values rather easily, which turned out to be a mainline kernel
->nr_uninterruptible handling bug in try_to_wake_up().
the following code:
	if (old_state == TASK_UNINTERRUPTIBLE) {
		old_rq->nr_uninterruptible--;
executes with old_rq potentially being != rq, and hence updates
->nr_uninterruptible without the lock held. Given a
sufficiently concurrent preemption workload the count can get out of
whack and updates might get lost, permanently skewing the global count.
Nothing except the load-average uses nr_uninterruptible() so this
condition can go unnoticed quite easily.
the fix is to update ->nr_uninterruptible always on the runqueue where
the task currently is. (this is also a tiny performance plus for
try_to_wake_up() as a stackslot gets freed up.)
while fixing this bug i found three other ->nr_uninterruptible related
bugs:
- the update should be moved from deactivate_task() into schedule(),
because e.g. setscheduler() does deactivate_task()+activate_task(),
which in turn may result in a -1 counter skew if setscheduler() is
done asynchronously on a task that is still on the runqueue
but has already set ->state to TASK_UNINTERRUPTIBLE.
sys_sched_setscheduler() is used rarely, but the bug is real. (The
fix is also a small performance enhancement.)
The rules for ->nr_uninterruptible updating are the following: it
gets increased by schedule() only, when a task is moved off the
runqueue and it has a state of TASK_UNINTERRUPTIBLE. It is decreased
by try_to_wake_up(), by the first wakeup that materially changes the
state from TASK_UNINTERRUPTIBLE back to TASK_RUNNING, and moves the
task to the runqueue.
- on CPU-hotplug down we might zap a CPU that has a nonzero counter.
Due to the fuzzy nature of the global counter a CPU might hold a
nonzero ->nr_uninterruptible count even if it has no tasks anymore.
The solution is to 'migrate' the counter to another runqueue.
- we should not return negative counter values from the
nr_uninterruptible() function, since it accesses them without taking
the runqueue locks, so the total sum might be slightly above or
slightly below the real count.
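The clamping in the last point can be sketched in standalone C (the array
and its size are illustrative stand-ins for the per-runqueue counters):

```c
#include <assert.h>

#define NR_RQS 4	/* illustrative number of runqueues */

static long rq_nr_uninterruptible[NR_RQS];

/* Per-runqueue counts are summed without taking the runqueue locks, so
 * individual entries can be transiently negative; clamp the total
 * rather than report a huge bogus unsigned value. */
static unsigned long nr_uninterruptible_sum(void)
{
	long sum = 0;
	int i;

	for (i = 0; i < NR_RQS; i++)
		sum += rq_nr_uninterruptible[i];
	if (sum < 0)
		sum = 0;
	return (unsigned long)sum;
}
```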
I tested the attached patch on x86 SMP and it solves the load-average
problem. (I have tested CPU_HOTPLUG compilation but not functionality.)
I think this is a must-have for 2.6.10, because there are apps that go
berserk if the load average is too high (e.g. sendmail).
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
We should not touch "self_exec_id" here. The parent changed,
not we.
|
|
The work address is increasingly unreliable and incompetently run.
Time to remove all visible instances of it and rely only on one
which isn't run by crack-monkeys.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
|
|
The attached patch fixes the fork fix to avoid the divide-by-zero error I'd
previously fixed, but without using any sort of conditional.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The patch was wrong. Back it out, and add some commentary explaining why we
need to run queue_me() prior to the get_user().
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
task_nice() was exported for binfmt_elf; however, that's no longer modular.
normalize_rt_tasks() is used by the sysrq code only, which isn't modular.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
NPTL has 3 control counters (total/wake/woken),
so NPTL can know:
"how many threads entered the wait" (total),
"how many threads received a wake signal" (wake),
and "how many threads exited the wait" (woken).
Abstraction of pthread_cond_wait and pthread_cond_signal are:
A01 pthread_cond_wait {
A02 timeout = 0;
A03 lock(counters);
A04 total++;
A05 val = get_from(futex);
A06 unlock(counters);
A07
A08 sys_futex(futex, FUTEX_WAIT, val, timeout);
A09
A10 lock(counters);
A11 woken++;
A12 unlock(counters);
A13 }
B01 pthread_cond_signal {
B02 lock(counters);
B03 if(total>wake) { /* if there is waiter */
B04 wake++;
B05 update_val(futex);
B06 sys_futex(futex, FUTEX_WAKE, 1);
B07 }
B08 unlock(counters);
B09 }
What we have to notice is:
FUTEX_WAKE could be called before FUTEX_WAIT has been called (at A07).
In such a case, FUTEX_WAKE will fail if there is no thread in the waitqueue.
However, since pthread_cond_signal does not only wake++ but also
update_val(futex), the next FUTEX_WAIT will fail with -EWOULDBLOCK because
the val passed to WAIT no longer equals the updated val. Therefore, as a
result, it seems that the WAKE wakes the WAIT.
===
The bug will appear if 2 pair of wait & wake called at (nearly)once:
* Assume 4 threads, wait_A, wait_B, wake_X, and wake_Y
* counters start from [total/wake/woken]=[0/0/0]
* the val of futex starts from (0); update means an increment of the val.
* there is no thread in waitqueue on the futex.
[simulation]
wait_A: calls pthread_cond_wait:
total++, prepare to call FUTEX_WAIT with val=0.
# status: [1/0/0] (0) queue={}(empty) #
wake_X: calls pthread_cond_signal:
no one in waitqueue, just wake++ and update futex val.
# status: [1/1/0] (1) queue={}(empty) #
wait_B: calls pthread_cond_wait:
total++, prepare to call FUTEX_WAIT with val=1.
# status: [2/1/0] (1) queue={}(empty) #
wait_A: calls FUTEX_WAIT with val=0:
after queueing, compare val. 0!=1 ... this should be blocked...
# status: [2/1/0] (1) queue={A} #
wait_B: calls FUTEX_WAIT with val=1:
after queueing, compare val. 1==1 ... OK, let's schedule()...
# status: [2/1/0] (1) queue={A,B} (B=sleeping) #
wake_Y: calls pthread_cond_signal:
A is in waitqueue ... dequeue A, wake++ and update futex val.
# status: [2/2/0] (2) queue={B} (B=sleeping) #
wait_A: end of FUTEX_WAIT with val=0:
try to dequeue but already dequeued, return anyway.
# status: [2/2/0] (2) queue={B} (B=sleeping) #
wait_A: end of pthread_cond_wait:
woken++.
# status: [2/2/1] (2) queue={B} (B=sleeping) #
This is the bug:
wait_A: wakeup
wait_B: sleeping
wake_X: wake A
wake_Y: wake A again
if subsequent wake_Z try to wake B:
wake_Z: calls pthread_cond_signal:
since total==wake, do nothing.
# status: [2/2/1] (2) queue={B} (B=sleeping) #
If a wait_C comes along, B becomes able to be woken, but then C gets
trapped in the same way...
This bug makes the waitqueue trap some threads in it indefinitely.
====
> - According to man of futex:
> "If the futex was not equal to the expected value, the operation
> returns -EWOULDBLOCK."
> but now, there is no description about the rare case:
> "returns 0 if the futex was not equal to the expected value, but
> the process was woken by a FUTEX_WAKE call."
> this behavior on rare case causes the hang which I found.
So to avoid this problem, my patch closes the window that you mentioned:
> The patch certainly looks sensible - I can see that without the patch,
> there is a window in which this process is pointlessly queued up on the
> futex and that in this window a wakeup attempt might do a bad thing.
=====
In short:
There is an un-documented behavior of futex_wait. This behavior misleads
NPTL to wake a thread doubly, as the result, causes an application hang.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The attached patch fixes fork to get rid of the assumption that THREAD_SIZE
>= PAGE_SIZE (on the FR-V the smallest available page size is 16KB).
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
profile_hook unconditionally takes a read lock on profile_lock if kernel
profiling is enabled. The lock protects the profile_hook notifier chain
from being written while it's being called. The routine profile_hook is
called in a very hot path though: every timer tick on every CPU. As you
can imagine, on a large system, this makes the cacheline containing
profile_lock pretty hot. Since oprofile was the only user of the
profile_hook, I removed the notifier chain altogether in favor of a simple
function pointer with the help of John Levon. This removes all of the
contention in the hot path since the variable is very seldom written and
simplifies things a little to boot.
Acked-by: John Levon <levon@movementarian.org>
Signed-off-by: Jesse Barnes <jbarnes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
On PA-RISC, we have a unified syscall table for 32 and 64 bit that uses
macros to generate the appropriate syscall names (native vs compat). For
this to work, we need consistent compat syscall names. Unfortunately, some
recent additions drop the 'sys_' prefix.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Only set the flag in the cases when the exit state is not either
TASK_DEAD or TASK_ZOMBIE.
(TASK_DEAD or TASK_ZOMBIE will either race or we'll return the
information, so no need to note them).
I confirmed that this fixes the problem, and I also ran some LTP tests.
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
released the tasklist_lock.
Since it released the lock, the process lists may not
be valid any more, and we must repeat the loop rather than
continue with the next parent.
Use -EAGAIN to show this condition (separate from the
normal -EFAULT that may happen if rusage information could
not be copied to user space).
|
|
At unload, i8042 sets panic_blink to 0. This will cause problems if the
kernel panics later, as it will just use the pointer, assuming it is correct.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
- Kprobes structure has been modified to support copying of original
instruction as required by the architecture. On x86_64 normal pages we
get from kmalloc or vmalloc are not executable. Single-stepping an
instruction on such a page yields an oops. So instead of storing the
instruction copies in their respective kprobe objects, we allocate a
page, map it executable, and store all the instruction copies there and
store the pointer of the copied instruction in the specific kprobes
object.
- jprobe_return_end is moved into inline assembly to avoid compiler
optimization.
- arch_prepare_kprobe() now returns an integer,since
arch_prepare_kprobe() might fail on other architectures.
- added arch_remove_kprobe() routine, since other architectures requires
it.
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Teach sysrq-N to switch all rt-policy tasks to SCHED_OTHER. For recovering
from (and diagnosing) userspace bugs.
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Since 2.6.4 we've been ignoring the failure of try_stop_module: it will
normally fail if the module reference count is non-zero. This has gone
mainly unnoticed, since "modprobe -r" checks the usage count before
calling sys_delete_module(); however, there is a race which would cause a
hang in this case.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
kfifo_alloc tries to round up the buffer size to the next power of two.
But it accidentally uses the original size when calling kfifo_init,
which will BUG.
Acked-by: Stelian Pop <stelian@popies.net>
Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This clarifies more of the x86 caller/callee stack ownership
issues by making the exception and interrupt handler assembler
interfaces use register calling conventions.
System calls still use the stack.
Tested with "crashme" on UP/SMP.
|
|
This patch re-adds the panic blinking that was in 2.4 to 2.6. This is
useful to see, when you're in X, that the machine has panicked.
It addresses the previously raised criticism.
It should work now when the keyboard interrupt is off.
It doesn't fully emulate the handler, but has a timeout
for this case.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Ported from i386
Support a sysctl to raise an oops with an NMI
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Sticking the not-implemented syscall stuff in sys.c is a pain because the
cond_syscall()s explode when certain prototypes are in scope. And we need
those prototypes' header files for the C code in sys.c.
Fix all that up by moving all the sys_ni_syscall code into its own .c file.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
- fix broken IBM cyclone time interpolator support
- add support for cyclic timers through an addition of a mask
in the timer interpolator structure
- Allow time_interpolator_update() and time_interpolator_get_offset()
to be invoked without an active time interpolator
(necessary since the cyclone clock is initialized late in ACPI
processing)
- remove obsolete function time_interpolator_resolution()
- add a mask to all struct time_interpolator setups in the
kernel
- Make time interpolators work on 32bit platforms
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Move hotplug_path[] out of kmod.[ch] to kobject_uevent.[ch] where
it belongs now. At some time in the future we should fix the remaining bad
hotplug calls (no SEQNUM, no netlink uevent):
./drivers/input/input.c (no DEVPATH on some hotplug events!)
./drivers/pnp/pnpbios/core.c
./drivers/s390/crypto/z90main.c
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
|
|
In particular, a function that is called with a lock held, and
releases it only to re-acquire it needs to be annotated as such,
since otherwise sparse will complain about an unexpected unlock,
even though "globally" the lock is constant over the call.
|
|
This annotates the scheduler routines for locking, telling
what locks a function releases or acquires, allowing sparse
to check the lock usage (and documenting it at the same time).
|
|
Christoph suggests letting the compiler choose. No real compelling reason
to inline anyhow. I had some vmlinux size numbers suggesting inline was
better, but re-running them on newer kernel is giving different results,
favoring uninline. Best let compiler choose. Un-inline __sigqueue_alloc.
Signed-off-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
make dnotify configurable, via CONFIG_DNOTIFY. CONFIG_EMBEDDED is required
for disabling dnotify.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This adds typechecking to suspend types and powerdown types. This should
solve at least part of suspend type confusion. There should be no code
changes generated by this one.
Acked-by: Patrick Mochel <mochel@digitalimplant.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
__handle_sysrq already prints a newline, so the action_msg string doesn't
need yet another newline.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This was used by the early irqstacks implementation on s390 and has been
replaced by __ARCH_HAS_DO_SOFTIRQ now.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
module_attribute.show is defined to return ssize_t
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
To make spinlock/rwlock initialization consistent all over the kernel,
this patch converts explicit lock-initializers into spin_lock_init() and
rwlock_init() calls.
Currently, spinlocks and rwlocks are initialized in two different ways:
lock = SPIN_LOCK_UNLOCKED
spin_lock_init(&lock)
rwlock = RW_LOCK_UNLOCKED
rwlock_init(&rwlock)
this patch converts all explicit lock initializations to
spin_lock_init() or rwlock_init(). (Besides consistency this also helps
automatic lock validators and debugging code.)
The conversion was done with a script, it was verified manually and it
was reviewed, compiled and tested as far as possible on x86, ARM, PPC.
There is no runtime overhead or actual code change resulting out of this
patch, because spin_lock_init() and rwlock_init() are macros and are
thus equivalent to the explicit initialization method.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
add_timer_on() isn't used by modules (in fact it's only used ONCE, in
workqueue.c) and it's not even a good api for drivers, in fact, the comment
for it says
* This is not very scalable on SMP. Double adds are not possible.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This isn't exactly the kind of interface modules should use.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This recently added function is only used by the posix timers code, so
there is no need to export it.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The patch below unexports raise_softirq(). raise_softirq() is not the
right api for drivers to use, instead raise_softirq_irqoff() is, and
thankfully all in-kernel code is using that variant already. To avoid
future "accidents", unexport.
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Removes a redundant #ifdef CONFIG_SMP that is nested within an enclosing
#ifdef CONFIG_SMP.
Signed-off-by: <paulmck@us.ibm.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch gives some clues to the user when swapping is not enabled during
swsusp. Please apply.
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The third "shared" field of /proc/$pid/statm in 2.4 was a count of pages in
the mm whose page_count is more than 1 (oddly, including pages shared just
with swapcache). That's too costly to calculate each time, so 2.6 changed
it to the total file-backed extent. But Andrea knows of apps and users
surprised when (rss - shared) goes negative: we need to provide an rss-like
statistic, close to the 2.4 interpretation.
Something that's quick and easy to maintain accurately is mm->anon_rss, the
count of anonymous pages in the mm. Then shared = rss - anon_rss gives a
pretty good and meaningful approximation to 2.4's intention: wli confirms
that this will be useful to Oracle too.
Where to show it? I think it's best to treat this as a bugfix and show it
in the third field of /proc/$pid/statm, after resident, as before - there's
no evidence that the total file-backed extent was found useful.
Albert would like other fields to revert to page counts, but that's a lot
harder: if mprotect can change the category of a page, then it can't be
accounted as simply as this. Only go that route if real need shown.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: William Irwin <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Fix name, and make sure that it's listed as a conditional
system call so that we stub it out to ENOSYS if the kernel
isn't compiled with key management support.
|
|
All ARM binutils versions post 2.11.90 contains an extra "feature" which
interferes with the kernel in various ways - extra "mapping symbols"
in the ELF symbol table '$a', '$t' and '$d'. This causes two problems:
1. Since '$a' symbols have the same value as function names, this
causes anything which uses the kallsyms infrastructure to report
wrong values.
2. programs which parse System.map do not expect symbols to start with
'$'.
Signed-off-by: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
|
In general it is not safe to do any non-ptrace wakeup of a thread in
TASK_TRACED, because the waking thread could race with a ptrace call
that could be doing things like mucking directly with its kernel stack.
AFAIK noone has established that whatever clobberation ptrace can do to
a running thread is safe even if it will never return to user mode, so
we can't allow this even for SIGKILL.
What we _can_ safely do is make a thread switching out of TASK_TRACED
resume rather than sitting in TASK_STOPPED if it has a pending SIGKILL
or SIGCONT. The following patch does this.
This should be sufficient for the shutdown case. When killing all
processes, if the tracer gets killed first, the tracee goes into
TASK_STOPPED and will be woken and killed by the SIGKILL (same as
before). If the tracee gets killed first, it gets a pending SIGKILL and
doesn't wake up immediately--but, now, when the tracer gets killed, the
tracee will then wake up to die.
This will also fix the (same) situations that can arise now where you
have used gdb (or whatever ptrace caller), killed -9 the gdb and the
process being debugged, but still have to kill -CONT the process before
it goes away (now it should just go away either the first time or when
you kill gdb).
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This is needed for an mmtimer driver update that we are currently working
on. The mmtimer driver provides CLOCK_SGI_CYCLE via clock_gettime and
clock_settime.
With this api fix one will be able to use timer_create, timer_settime and
friends from userspace to schedule and receive signals via timer interrupts
of mmtimer.
Changelog
* Clean up timer api for drivers that use register_posix_clock. Drivers
will then be able to use posix timers to schedule interrupts.
* Change API for posix_clocks[].timer_create to only pass one pointer
to a k_itimer structure that is now allocated and managed by the
posix layer in the same way as for the other posix timer
functions.
* Isolate a posix_timer_event(timr) function in posix-timers.c that may
be called by the interrupt routine of a timer to signal that the
scheduled event has taken place.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Currently, only module parameters in loaded modules are exported in
/sys/modules/, while those of "modules" built into the kernel can be set by
the kernel command line, but not read or set via sysfs.
- move module parameters from /sys/modules/$(module_name)/$(parameter_name) to
/sys/modules/$(module_name)/parameters/$(parameter_name)
- remove dummy kernel_param for exporting refcnt, add "struct module *"-based
attribute instead
- also export module parameters for "modules" which are built into the kernel,
so parameters are always accessible at
/sys/modules/$(KBUILD_MODNAME)/$(parameter_name)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (modified)
Signed-off-by: Dominik Brodowski <linux@brodo.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The session leader should disassociate from its controlling terminal and
send SIGHUP signals only when the whole session leader process dies.
Currently, this gets done when any thread in that process dies, which is
wrong. This patch fixes it.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch changes process accounting to write just one record for a
process with many NPTL threads, rather than one record for each thread. No
record is written until the last thread exits. The process's record shows
the cumulative time of all the threads that ever lived in that process
(thread group). This seems like the clearly right thing and I assume it is
what anyone using process accounting really would like to see.
There is a race condition between multiple threads exiting at the same time
to decide which one should write the accounting record. I couldn't think
of anything clever using existing bookkeeping that would get this right, so
I added another counter for this. (There may be some potential to clean up
existing places that figure out how many non-zombie threads are in the
group, now that this count is available.)
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The following patch against the latest mm fixes several problems with
active_load_balance().
Rather than starting with the highest allowable domain (SD_LOAD_BALANCE is
still set) and depending on the order of the cpu groups, we start at the
lowest domain and work up until we find a suitable CPU or run out of
options (SD_LOAD_BALANCE is no longer set). This is a more robust approach
as it is more explicit and not subject to the construction order of the cpu
groups.
We move the test for busiest_rq->nr_running <=1 into the domain loop so we
don't continue to try and move tasks when there are none left to move.
This new logic (testing for nr_running in the domain loop) should make the
busiest_rq==target_rq condition really impossible, so we have replaced the
graceful continue on fail with a BUG_ON. (Bjorn Helgaas, please confirm)
We eliminate the exclusion of the busiest_cpu's group from the pool of
available groups to push to as it is the ideal group to push to, even if
not very likely to be available. Note that by removing the test for
group==busy_group and allowing it to also be tested for suitability, the
running time is nearly the same.
We no longer force the destination CPU to be in a group of completely idle
CPUs, nor to be the last in that group.
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The number of times schedule() left the processor idle in the
/proc/schedstat (runqueue.sched_goidle) seems to be wrong.
The schedule() statistics should satisfy the equation:
sched_cnt == sched_noswitch + sched_switch + sched_goidle
(http://eaglet.rain.com/rick/linux/schedstat/v10/format-10.html)
The patch below fixes this, and I have confirmed the fix with:
# grep ^cpu /proc/schedstat | awk '{print $6+$7+$9, $8}'
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
A large number of processes that are pinned to a single CPU results in
every other CPU's load_balance() seeing this overloaded CPU as "busiest",
yet move_tasks() never finds a task to pull-migrate. This condition occurs
during module unload, but can also occur as a denial-of-service using
sys_sched_setaffinity(). Several hundred CPUs performing this fruitless
load_balance() will livelock on the busiest CPU's runqueue lock. A smaller
number of CPUs will livelock if the pinned task count gets high. This
simple patch remedies the more common first problem: after a move_tasks()
failure to migrate anything, the balance_interval increments. Using a
simple increment, vs. the more dramatic doubling of the balance_interval,
is conservative and yet also effective.
Signed-off-by: John Hawkes <hawkes@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Small bug fix for domains that don't load balance (like those that only
balance on exec for example).
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Jesse Barnes <jbarnes@sgi.com>
Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Some small fixes for the SOFTWARE_SUSPEND help text.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
power_down may never ever fail, so it does not really need to return
anything. Kill obsolete code and fixup old comments.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This allows for low-latency BKL contention even with
preemption. Previously, since preemption is disabled
over context switches, re-acquiring the kernel lock when
resuming a process would be non-preemptible.
|
|
Makes sure msleep() sleeps at least the amount provided, since
schedule_timeout() doesn't guarantee a full jiffy.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Now that spinlocks are uninlined, it is silly to keep the
BKL inlined. And this should make it a lot easier for people
to play around with variations on the locking (ie Ingo's
semaphores etc).
|
|
This is indeed a new bug, and it is not architecture-specific. In my
recent changes to close some race conditions, I overlooked the case of a
process using PTRACE_ATTACH on its own children. The new PT_ATTACHED flag
does not really mean "PTRACE_ATTACH was used", it means "PTRACE_ATTACH is
changing the ->parent link".
This fixes the problem that Stephane Eranian's program demonstrates.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Oh, duh. The race is obvious. Sorry for the confusion there.
The BUG_ON's were useful for debugging, since they trigger on a lot of
errors, but they _also_ trigger on some unlikely (but valid) races.
So just remove them - just fall through to the regular exit code after
core-dumping (which does everything right).
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Doing access control checks with rq_lock held can cause deadlock when
audit messages are created (via printk or audit infrastructure) which
trigger a wakeup and deadlock, as noted by both SELinux and SubDomain
folks. This patch will let the security checks happen w/out lock held,
then re-sample the p->policy in case it was raced.
Originally from John Johansen <johansen@immunix.com>, reworked by me.
AFAIK, this version drew no objections from Ingo or Andrea.
From: John Johansen <johansen@immunix.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
From: Keshavamurthy Anil S <anil.s.keshavamurthy@intel.com>
Remove cpu_run_sbin_hotplug() - use kobject_hotplug() instead.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
|
|
kobject_set_name takes a printf style argument list. There are many
callers that pass only one string; if this string contains a '%' character,
bad things happen. The fix is simple.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
|
|
Posix timers preallocate sigqueue structures during timer creation
and keep them for reuse. This allocation happens in user context
with no locks held, however it's designated as an atomic allocation.
Loosen this restriction, and while we're at it let's do a bit of code
consolidation so signal sending uses same __sigqueue_alloc() helper.
Signed-off-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The current "generic" implementation of IRQ probing isn't well suited
for ppc in its current form, and causes issues with yenta_socket
(and possibly others) on pmac laptops. We didn't have a probe implementation
in the past, we probably don't need one anyway, so for now, the fix is to
make this optional and enable it on x86 and x86_64 but not ppc and ppc64
(the 4 archs to use the generic IRQ code).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Instead, tty_io.c will always copy user space data to
kernel space, leaving the drivers to worry only about
normal kernel buffers.
No more "from_user" flag, and having the user copy in
each driver.
This cleans up the code and also fixes a number of
locking bugs.
|
|
This makes us do the proper copy_to_user() for the new
posix timers code.
Acked by Christoph Lameter <clameter@sgi.com>.
|
|
into ppc970.osdl.org:/home/torvalds/v2.6/linux
|
|
into ppc970.osdl.org:/home/torvalds/v2.6/linux
|
|
Signed-off-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
I've noticed that under specific circumstances the "console=" kernel
parameter is ignored. This happens when EARLY_PRINTK is enabled and the
serial console is the only available. In this case unregister_console()
when called for the early console sets preferred_console back to -1
replacing the value that was recorded by console_setup() -- the order of
calls is as follows:
1. register_console() -- for the early console,
2. console_setup() -- recording the console index for the real console,
3. unregister_console() -- for the early console, erasing the console
index recorded above,
4. register_console() -- for the real console, picking up the first device
available, instead of the selected one.
I've observed this problem with a DECstation system using ttyS3 -- its
default console device from the firmware's point of view.
The solution is to restore the setting of "console=" upon
unregister_console(). This made a snapshot of 2.4.26 work for me. I
wasn't able to test the changes with 2.6 because DECstation drivers don't
support it yet, but the code responsible for console selection appears
functionally the same. So I've concluded it needs the same change. Here's
a patch.
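The shape of the fix can be modelled like this (a heavily simplified user-space mock of the bookkeeping in kernel/printk.c; the real structures and function signatures differ, and the names here are illustrative):

```c
/* console_setup() records the "console=" selection;
 * unregister_console() must not discard it. */
static int preferred_console = -1;  /* index used for registration */
static int selected_console  = -1;  /* index recorded from "console=" */

static void console_setup(int idx)
{
    selected_console  = idx;
    preferred_console = idx;
}

/* Old behaviour: tearing down the early console erases the choice. */
static void unregister_console_buggy(void)
{
    preferred_console = -1;
}

/* Fixed behaviour: restore the setting recorded by console_setup(),
 * so the later register_console() picks the selected device. */
static void unregister_console_fixed(void)
{
    preferred_console = -1;
    if (selected_console >= 0)
        preferred_console = selected_console;
}
```

With the buggy variant, step 4 above starts from `preferred_console == -1` and falls back to the first registered device; with the fix it starts from the index recorded in step 2.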
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
There's no reason to directly #include <asm/bitops.h> since it's
available on all architectures and also included by
#include <linux/bitops.h>.
This patch changes #include <asm/bitops.h> to #include <linux/bitops.h>.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch adds a "swap_token_timeout" parameter in /proc/sys/vm. The
parameter sets the expiry time of the token, in jiffies; the default value
is the same as the current SWAP_TOKEN_TIMEOUT (i.e. HZ * 300, or 300
seconds).
Signed-off-by: Hideo Aoki <aoki@sdl.hitachi.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Now there is no point in calling the costly find_pid(type) if
__detach_pid(type) returned a non-zero value.
Acked-By: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Kirill's kernel/pid.c rework broke the optimization logic in detach_pid().
A non-zero return from __detach_pid() was used to indicate that this pid
can probably be freed. The current version always (modulo idle threads)
returns a non-zero value, resulting in unnecessary pid_hash scanning.
Also, uninlining __detach_pid() reduces pid.o text size from 2492 to 1600
bytes.
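The restored optimization can be sketched as follows (a hypothetical single-file mock; the real detach_pid() in kernel/pid.c differs in detail, and the return value of __detach_pid() is supplied as a parameter here purely for illustration):

```c
/* Count of expensive pid_hash scans performed. */
static int scans;

/* Stand-in for find_pid(): pretend the pid is not in use elsewhere. */
static int find_pid_mock(int nr)
{
    (void)nr;
    scans++;
    return 0;
}

static void free_pid_mock(int nr) { (void)nr; }

/* detach_ret plays the role of __detach_pid()'s return value:
 * zero means the pid is definitely still in use, so the costly
 * hash scan is skipped; non-zero means it may now be freeable. */
static void detach_pid_mock(int detach_ret)
{
    if (!detach_ret)
        return;                       /* still in use: no scan */
    if (!find_pid_mock(detach_ret))   /* scan only when freeable */
        free_pid_mock(detach_ret);
}
```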
Acked-By: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
POSIX clocks are to be implemented in the following way according
to V3 of the Single Unix Specification:
1. CLOCK_PROCESS_CPUTIME_ID
Implementations shall also support the special clockid_t value
CLOCK_PROCESS_CPUTIME_ID, which represents the CPU-time clock of the
calling process when invoking one of the clock_*() or timer_*()
functions. For these clock IDs, the values returned by clock_gettime() and
specified by clock_settime() represent the amount of execution time of the
process associated with the clock.
2. CLOCK_THREAD_CPUTIME_ID
Implementations shall also support the special clockid_t value
CLOCK_THREAD_CPUTIME_ID, which represents the CPU-time clock of the
calling thread when invoking one of the clock_*() or timer_*()
functions. For these clock IDs, the values returned by clock_gettime()
and specified by clock_settime() shall represent the amount of
execution time of the thread associated with the clock.
These times mentioned are CPU processing times and not the time that has
passed since the startup of a process. Glibc currently provides its own
implementation of these two clocks which is designed to return the time
that passed since the startup of a process or a thread.
Moreover, Glibc's clocks are bound to CPU timers, which is problematic
when the frequency of the clock changes or the process is moved to a
different processor whose cpu timer may not be fully synchronized with the
cpu timer of the current CPU. This patchset results in both clocks working
reliably.
The patch also implements access to the thread and process clocks of other
Linux processes by using negative clockid's:
1. For CLOCK_PROCESS_CPUTIME_ID: -pid
2. For CLOCK_THREAD_CPUTIME_ID: -(pid + PID_MAX_LIMIT)
This allows
clock_getcpuclockid(pid) to return -pid
and
pthread_getcpuclockid(pid) to return -(pid + PID_MAX_LIMIT)
to allow access to the corresponding clocks.
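From user space, the two special clock IDs are used like this (a present-day POSIX example; the negative-clockid encoding above is an internal detail, normally reached via clock_getcpuclockid() rather than by hand-built values):

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Read both special CPU-time clocks for the caller.
 * Returns 0 on success, -1 on failure. */
static int read_cpu_clocks(struct timespec *proc_ts,
                           struct timespec *thread_ts)
{
    /* CPU time consumed by the whole process... */
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, proc_ts))
        return -1;
    /* ...and by the calling thread only. */
    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, thread_ts))
        return -1;
    return 0;
}
```

In a single-threaded program the two values track each other, as in the "Single Thread Testing" figures below; with many threads the process clock is the sum over all threads.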
Todo:
- The timer API to generate events by a non tick based timer is not
usable in its current state. The posix timer API seems to be only
useful at this point to define clock_get/set. Need to revise this.
- Implement timed interrupts in mmtimer after API is revised.
The mmtimer patch is unchanged from V6 and stays as is in 2.6.9-rc3-mm2.
But I expect to update the driver as soon as the interface to setup hardware
timer interrupts is usable.
Single Thread Testing
CLOCK_THREAD_CPUTIME_ID= 0.494140878 resolution= 0.000976563
CLOCK_PROCESS_CPUTIME_ID= 0.494140878 resolution= 0.000976563
Multi Thread Testing
Starting Thread: 0 1 2 3 4 5 6 7 8 9
Joining Thread: 0 1 2 3 4 5 6 7 8 9
0 Cycles= 0 Thread= 0.000000000ns Process= 0.495117441ns
1 Cycles=1000000 Thread= 0.140625072ns Process= 2.523438792ns
2 Cycles=2000000 Thread= 0.966797370ns Process= 8.512699671ns
3 Cycles=3000000 Thread= 0.806641038ns Process= 7.561527309ns
4 Cycles=4000000 Thread= 1.865235330ns Process= 12.891608163ns
5 Cycles=5000000 Thread= 1.604493009ns Process= 11.528326215ns
6 Cycles=6000000 Thread= 2.086915131ns Process= 13.500983475ns
7 Cycles=7000000 Thread= 2.245118337ns Process= 13.947272766ns
8 Cycles=8000000 Thread= 1.604493009ns Process= 12.252935961ns
9 Cycles=9000000 Thread= 2.160157356ns Process= 13.977546219ns
Clock status at the end of the timer tests:
Gettimeofday() = 1097084999.489938000
CLOCK_REALTIME= 1097084999.490116229 resolution= 0.000000040
CLOCK_MONOTONIC= 177.071675109 resolution= 0.000000040
CLOCK_PROCESS_CPUTIME_ID= 13.978522782 resolution= 0.000976563
CLOCK_THREAD_CPUTIME_ID= 0.497070567 resolution= 0.000976563
CLOCK_SGI_CYCLE= 229.967982280 resolution= 0.000000040
PROCESS clock of 1 (init)= 4.833986850 resolution= 0.000976563
THREAD clock of 1 (init)= 0.009765630 resolution= 0.000976563
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
I have received positive feedback from various individuals who have applied my
BSD Secure Levels LSM patch, and so at this point I am submitting it to you
with a request to merge it in. Nothing has changed in this patch since when I
last posted it to the LKML, so I am not re-sending it there.
This first patch adds hooks to catch attempts to set the system clock back.
Signed-off-by: Michael A. Halcrow <mahalcro@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Let's lighten the global spinlock mmlist_lock.
What's it for?
1. Its original role is to guard mmlist.
2. It later got a second role, to prevent get_task_mm from raising
mm_users from the dead, just after it went down to 0.
Firstly consider the second: __exit_mm sets tsk->mm NULL while holding
task_lock before calling mmput; so mmlist_lock only guards against the
exceptional case, of get_task_mm on a kernel workthread which did AIO's
use_mm (which transiently sets its tsk->mm without raising mm_users) on an
mm now exiting.
Well, I don't think get_task_mm should succeed at all on use_mm tasks.
It's mainly used by /proc/pid and ptrace; it seems at best confusing for
those to present the kernel thread as having a user mm, which it won't have
a moment later. Define PF_BORROWED_MM, set in use_mm and cleared in
unuse_mm (though we could just leave it set), and have get_task_mm return
NULL when it is set.
Secondly consider the first: and what's mmlist for?
1. Its original role was for swap_out to scan: rmap ended that in 2.5.27.
2. In 2.4.10 it got a second role, for try_to_unuse to scan for swapoff.
So, make mmlist a list of mms which maybe have pages on swap: add mm to
mmlist when first swap entry is assigned in try_to_unmap_one (pageout), or
in copy_page_range (fork); and mmput removes it from mmlist as before,
except that the list is usually empty and there is then no need to lock.
drain_mmlist is added to swapoff, to empty out the mmlist if no swap is
then in use.
mmput leaves the mm on mmlist until after its exit_mmap, so
try_to_unmap_one can still add an mm to mmlist without worrying about the
mm_users 0 case; but try_to_unuse must avoid the mm_users 0 case (when an
mm might be removed from mmlist, and freed, while it's down in
unuse_process): use atomic_inc_return, now that all architectures support
it.
Some of the detailed comments in try_to_unuse have grown out of date:
updated and trimmed some, but leave SWAP_MAP_MAX for another occasion.
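The "don't raise mm_users from the dead" check can be sketched with C11 atomics (a hypothetical user-space model; the function names here are invented, and the kernel's later mmget_not_zero()-style helpers use a compare-and-swap loop instead):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* atomic_inc_return() analogue: increment, return the new value. */
static int atomic_inc_return_mock(atomic_int *v)
{
    return atomic_fetch_add(v, 1) + 1;
}

/* Try to pin an mm for try_to_unuse-style scanning. If the count
 * had already dropped to 0, the mm is on its way out (it may be
 * unlinked from mmlist and freed at any moment): undo the increment
 * and fail rather than resurrect it. */
static bool pin_mm(atomic_int *mm_users)
{
    if (atomic_inc_return_mock(mm_users) == 1) {
        atomic_fetch_sub(mm_users, 1);
        return false;   /* was dead; do not use */
    }
    return true;
}
```

Note this mock has a benign window where a dead mm's count briefly reads 1; the commit's point is only that the caller never *acts* on such an mm.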
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
module_param_array() takes a variable to put the number of elements in.
Looking through the uses, many people don't care, so they declare a dummy
or share one variable between several parameters. The latter is
problematic because sysfs uses that number to decide how many to display.
The solution is to change the variable arg to a pointer, and if the pointer
is NULL, use the "max" value. This change is fairly small, but fixing up
the callers is a lot of (trivial) churn.
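The NULL-means-max convention can be illustrated with a plain C analogue (invented names; the real macro machinery in include/linux/moduleparam.h is considerably more involved):

```c
#include <stddef.h>

/* How many elements of a parameter array should sysfs display?
 * If the caller supplied a counter, use it; if the counter pointer
 * is NULL, fall back to the array's full capacity ("max"). */
static int array_elements(const int *arr, int max, const int *nump)
{
    (void)arr;
    return nump ? *nump : max;
}
```

This mirrors why sharing one dummy counter between parameters was broken: sysfs displayed whatever the shared variable last held, rather than each array's own element count.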
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
|
|
Used by sparc envctrl drivers, specifically envctrl.c and bbc_envctrl.c
under drivers/sbus/char/
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
I've been informed that /proc/profile livelocks some systems in the timer
interrupt, usually at boot. The following patch attempts to amortize the
atomic operations done on the profile buffer to address this stability
concern. This patch has nothing to do with performance; kernels using
periodic timer interrupts are under realtime constraints to complete
whatever work they perform within timer interrupts before the next timer
interrupt arrives lest they livelock, performing no work whatsoever apart
from servicing timer interrupts. The latency of the cacheline bounce for
prof_buffer contributes to the time spent in the timer interrupt, hence it
must be amortized when remote access latencies or deviations from fair
exclusive cacheline acquisition may cause cacheline bounces to take longer
than the interval between timer ticks.
What this patch does is to create a pair of per-cpu open-addressed
hashtables indexed by profile buffer slot holding values representing the
number of pending profile buffer hits for the profile buffer slot. When
this hashtable overflows, one iterates over the hashtable accounting each
of the pairs of profile buffer slots and hit counts to the global profile
buffer. Zero is a legitimate profile buffer slot, so zero hit counts
represent unused hashtable entries. The hashtable is furthermore protected
from flush IPI's by interrupt disablement.
In order to flush the pending profile hits for read_profile(), this patch
flips between the pairs of per-cpu profile buffers by signalling all cpus
to flip via IPI at the time of read_profile(). It then does all the work
of flushing the profile hits from the older per-cpu buffers in the context
of the caller of read_profile(). Exclusion is provided by a semaphore
ensuring that only one caller of profile_flip_buffers() may execute at a
time, and interrupt disablement prevents buffer flip IPI's from altering
the hashtables or flip state while an update is in progress.
interrupts locally for synchronization, which is both simple and
busywait-free for remote cpus. The flip states all change in tandem when
some cpu requests the hashtables be flipped, and the requester waits for
the completion of smp_call_function() for notification that all cpus have
finished flipping between their hashtables. The IPI handler merely toggles
the flip state (which is an array index) between 0 and 1.
This is expected to be a much stronger amortization than merely reducing
the frequency of profile buffer access by a factor of the size of the
hashtable, because numerous hits may be held for each of its entries.
Before the patch, the number of atomic increments equals the total number
of hits; after the patch, those hits accumulate in the hashtable and are
flushed with a number of atomic_add()'s equal to the number of occupied
entries in the per-cpu hashtable. This is nondeterministic, but as the profile hits
tend to be concentrated in a very small number of profile buffer slots
during any given timing interval, is likely to represent a very large
number of atomic increments. This amortization of atomic increments does
not depend on the hash function, only the sharp peakedness of the
distribution of profile buffer hits.
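The per-cpu hit hashtable can be sketched like this (a single-cpu, user-space model with invented names; the real code in kernel/profile.c uses per-cpu pages, IPI-driven buffer flips, and atomic_add() on the global buffer):

```c
#define SLOTS 1024          /* global profile buffer size */
#define HBITS 4
#define HSIZE (1 << HBITS)  /* tiny per-cpu hashtable */

struct hit { unsigned slot; unsigned count; };

static unsigned prof_buffer[SLOTS];  /* global; atomic in the kernel */
static struct hit hits[HSIZE];       /* per-cpu in the real code */

/* Flush every pending (slot, count) pair into the global buffer.
 * count == 0 marks an unused entry, since slot 0 is legitimate. */
static void profile_flush(void)
{
    for (int i = 0; i < HSIZE; i++) {
        if (hits[i].count) {
            prof_buffer[hits[i].slot] += hits[i].count;
            hits[i].count = 0;
        }
    }
}

/* Record one hit, open-addressed; on overflow, drain the table so
 * many hits are amortized into one add per occupied entry. */
static void profile_hit(unsigned slot)
{
    unsigned i = slot & (HSIZE - 1);

    for (int probe = 0; probe < HSIZE; probe++) {
        unsigned j = (i + probe) & (HSIZE - 1);
        if (hits[j].count && hits[j].slot == slot) {
            hits[j].count++;
            return;
        }
        if (!hits[j].count) {
            hits[j].slot = slot;
            hits[j].count = 1;
            return;
        }
    }
    profile_flush();        /* table full: drain, then retry */
    hits[i].slot = slot;
    hits[i].count = 1;
}
```

Because profile hits cluster in a few slots, most timer ticks end in the cheap `count++` path, and the contended global buffer is touched only on flush.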
This algorithm has two advantages over full-size per-cpu profile buffers.
The first is that the space footprint is much smaller. Per-cpu profile
buffers would increase the space requirements by a factor of
num_online_cpus(), where this algorithm only requires one page per cpu.
The second is that reading the profile state is much faster, because the
state that must be traversed is exactly the above space consumers, and the
relative reduction in size concomitantly reduces the time required for a
read operation.
I also took the liberty of adding some commentary to the comments at the
beginning of the file reflecting the major work done on profile.c in recent
months and describing what the file implements.
The reporters of this issue have verified that this resolves their timer
interrupt livelock on 512x Altixen. In my own testing on 4x logical
x86-64, this patch saw a rate of about 18 flushes per minute under load, or
about one flush every 3 seconds, for about 38.4 atomic accesses to the
profile buffer per second per cpu in one of the algorithm's worst cases,
about 3.84% of the number of atomic profile buffer accesses per second per
cpu as a normal kernel would commit. This represents a twenty-six-fold
increase in the scalability on SMP systems with 4KB PAGE_SIZE, i.e. with a
4KB PAGE_SIZE, the number of atomic profile buffer accesses per second per
cpu is reduced by a factor of 26, thereby increasing the number of cpus a
system must have before it would experience a timer interrupt livelock by a
factor of 26, with the proviso that cacheline bounces must take the same
amount of time to service. This increase in the scalability of the kernel
is expected to be much larger for ia64, which has a large PAGE_SIZE,
because the distribution of profile buffer hits is so sharply peaked that
doubling the hashtable size will much more than double the amortization
factor. In fact, only 19 flushes were observed on a 64x Altix over an
approximately 10 minute AIM7 run, and 1 flush on a 512x Altix over the
course of an entire AIM7 run, for truly vast effective amortization
factors.
A prior version of this patch, which did not include the node-local
hashtable allocation and bounded collision chains, has been successfully
tested on 64x and 512x ia64 vs 2.6.9-rc2, 8x ia64 vs. 2.6.9-rc2-mm1, 4x
x86-64 vs. 2.6.9-rc2-mm1, and 6x sparc64 vs. 2.6.9-rc2-mm1. This patch
minus the hashtable initialization fix has been successfully tested on 2x
ppc64, 2x alpha, 8x ia64, 6x sparc64, and 4x x86-64, all vs.
2.6.9-rc2-mm1. This precise version of the patch has been successfully
tested on 8x ia32 against 2.6.9-rc2-mm1 and 6x sparc64 vs. both
2.6.9-rc2-mm1 and 2.6.9-rc2-mm2.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Hugh and I both thought this would be generally useful.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This taint didn't appear to be reported.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
This patch adds machine check tainting. When a handled machine check
occurs the oops gets a new 'M' flag. This is useful to ignore machines
with hardware problems in oops reports.
On i386 a thermal failure also sets this flag.
Done for x86-64 and i386 so far.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
For non-smp kernels the call to update_process_times is done in the
do_timer function. It is more consistent with smp kernels to move this
call to the architecture file which calls do_timer.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|