|
Currently most driver events are not sent out when using initramfs as
driver_init() (which triggers the events) is called before init_workqueues.
This patch rearranges the init calls so that the hotplug event queue is
enabled prior to calling driver_init(), hence we're getting all hotplug
events again.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
into intel.com:/home/lenb/src/linux-acpi-test-2.6.8
|
|
|
|
|
|
Fixes two common boot failures due to buggy SMM BIOS code
SMP boot crash if SMI_CMD=ACPI written from CPU1
http://bugzilla.kernel.org/show_bug.cgi?id=2941
laptop crash due to LAPIC timer before SMI_CMD=ACPI
http://bugzilla.kernel.org/show_bug.cgi?id=1269
|
|
Move STANDALONE from init/Kconfig to drivers/base/Kconfig.
This way, it sits beside PREVENT_FIRMWARE_BUILD.
Signed-off-by: Adrian Bunk <bunk@fs.tum.de>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
|
|
This patch adds IA64 support to the audit infrastructure. The IA64
audit patch complements the existing audit support for the i386,
PPC64, and x86_64 architectures. This patch is based on work by Ray
Lanza.
Signed-off-by: Peter Martuccelli <peterm@redhat.com>
Signed-off-by: David Mosberger <davidm@hpl.hp.com>
|
|
Attached is a smallish patch for a couple of trivial sparse warnings in
the allnoconfig build and, more importantly, an "excuses" text file explaining
why the rest have not been fixed.
Basically all of them (with the exception of the one in Andrew's tree) need
some serious re-engineering.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
I don't think we're in K&R any more, Toto.
If you want a NULL pointer, use NULL. Don't use an integer.
Most of the users really didn't seem to know the proper type.
|
|
Rework the declaration, sizing and memcpying of saved_command_line[] so
that ARM doesn't need to implement unwelcome header file nestings.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Verify that linking kallsyms into vmlinux generates a stable System.map,
instead of assuming that it is stable.
Add CONFIG_KALLSYMS_EXTRA_PASS as a temporary workaround for unstable maps,
so users can proceed while waiting for kallsyms to be fixed.
Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
The sysv-ipc code uses mm/shmem.o, which in turn uses VM stuff and is
only compiled on MMU systems.
Signed-off-by: Miles Bader <miles@gnu.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
From: "Randy.Dunlap" <rddunlap@osdl.org>
Some elements of ikconfig have been removed, but the help text wasn't
updated to reflect those changes.
Signed-off-by: Randy Dunlap <rddunlap@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
From: Rusty Russell <rusty@rustcorp.com.au>
Currently every arch declares its own char saved_command_line[]. Make sure
every arch defines COMMAND_LINE_SIZE in asm/setup.h, and declare
saved_command_line in linux/init.h (init/main.c contains the definition).
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Here's a patch to allocate memory for big system hash tables with the
bootmem allocator rather than with main page allocator.
It is needed for three reasons:
(1) So that the size can be bigger than MAX_ORDER. IBM have done some
testing on their big PPC64 systems (64GB of RAM) with linux-2.4 and found
that they get better performance if the sizes of the inode cache hash,
dentry cache hash, buffer head hash and page cache hash are increased
beyond MAX_ORDER (order 11).
Now the main allocator can't allocate anything larger than MAX_ORDER, but
the bootmem allocator can.
In 2.6 it appears that only the inode and dentry hashes remain of those
four, but there are other hash tables that could use this service.
(2) Changing MAX_ORDER appears to have a number of effects beyond just
limiting the maximum size that can be allocated in one go.
(3) Should someone want a hash table in which each bucket isn't a power of
two in size, memory will be wasted as the chunk of memory allocated will
be a power of two in size (to hold a power of two number of buckets).
On the other hand, using the bootmem allocator means the allocation
will only take up sufficient pages to hold it, rather than the next power
of two up.
Admittedly, this point doesn't apply to the dentry and inode hashes,
but it might to another hash table that might want to use this service.
I've coalesced the meat of the inode and dentry allocation routines into
one such routine in mm/page_alloc.c that the respective initialisation
functions now call before mem_init() is called.
This routine gets its approximation of memory size by counting up the
ZONE_NORMAL and ZONE_DMA pages (and ZONE_HIGHMEM if requested) in all the
nodes passed to the main allocator by paging_init() (or wherever the arch
does it). It does not use max_low_pfn as that doesn't seem to be available
on all archs, and it doesn't use num_physpages since that includes highmem
pages not available to the kernel for allocating data structures upon -
which may not be appropriate when calculating hash table size.
On the off chance that the size of each hash bucket may not be exactly a
power of two, the routine will only allocate as many pages as is necessary
to ensure that the number of buckets is exactly a power of two, rather than
allocating the smallest power-of-two sized chunk of memory that will hold
the same array of buckets.
The maximum size of any single hash table is given by
MAX_SYS_HASH_TABLE_ORDER, as is now defined in linux/mmzone.h.
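The saving described above can be sketched in user space. This is only an illustration of the arithmetic, not the routine added to mm/page_alloc.c: the helper names pages_exact() and pages_pow2() are hypothetical, and a 4096-byte page is assumed.

```c
/* Assumed page size for illustration; the real value depends on the arch. */
#define TOY_PAGE_SIZE 4096UL

/* Pages needed to hold exactly `buckets` buckets of `bucket_size` bytes. */
unsigned long pages_exact(unsigned long buckets, unsigned long bucket_size)
{
	unsigned long bytes = buckets * bucket_size;

	return (bytes + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;
}

/* Pages consumed when the request is rounded up to a power-of-two page count. */
unsigned long pages_pow2(unsigned long buckets, unsigned long bucket_size)
{
	unsigned long need = pages_exact(buckets, bucket_size);
	unsigned long pages = 1;

	while (pages < need)
		pages <<= 1;
	return pages;
}
```

With 1024 buckets of 12 bytes each, the exact allocation needs three pages while the power-of-two allocation takes four. With 8-byte buckets the two agree, which matches the remark that the dentry and inode hashes do not benefit from this point.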
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
* Fixed argument processing bug in init/main.c (Eric Delaunay)
This fixes Debian BTS #58566.
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=58566
From: Eric Delaunay <delaunay@lix.polytechnique.fr>
Message-Id: <200002201918.UAA02327@jazz.pontchartrain.fr>
Subject: pb in handling parameters on kernel command line
To: submit@bugs.debian.org (debian bug tracking system)
Hello, I found some bugs in kernel command line parser. AFAIK, they are not
Debian nor sparc specific but I'm not subscribed to linux-kernel mailing list
and since I'm involved with boot-floppies (mainly for sparc), I think I'm right
to report it here. Feel free to forward it upstream (I checked the latest
2.3.46 sources and it seems these bugs are still there).
These bugs are not release critical. The latter just does not give the user a
chance to overwrite TERM env var at boot time. It could be just
inconvenient for serial console boot, and in this case, our busybox' init is
already enforcing TERM=vt102.
Nevertheless if it could not be fixed before the release, I could even write a
workaround in busybox' init (it's just a matter of rewriting getenv()).
At last, it does not affect sysvinit package because serial console tty is
controlled by a getty process which is reading terminal settings on its command
line (take a look in inittab for T0 entries, if any).
Ok, here is my modest contribution to kernel hacking. I don't know much about
kernel internals but it seems that argument parsing is a bit broken.
One trivial patch for command line like "init=/bin/sh console=prom" where
console=prom is replaced by lot of spaces in previous call to setup_arch() on
sparc, therefore the line parsed by parse_options() is really
"init=/bin/sh " and a lot of null args are pushed into argv_init.
The other patch is for command line like "TERM=vt100" where both default & user
TERM entries are pushed into the env array.
Taking a look into /proc/1/environ, it shows up:
HOME=/
TERM=linux
TERM=vt100
It appears that ash (maybe other shells too) is giving the latter entry but
glibc getenv() is giving the former. It is therefore impossible to get entry
from the user in a C program like busybox' init (used in Debian boot-floppies).
I guess getenv() is not written to support duplicate entries, therefore the
kernel should avoid such construct.
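One way to avoid the duplicate entry is to replace an existing NAME= slot instead of appending a second one. The sketch below is a user-space illustration under that assumption; env_set() is a hypothetical helper, not the code in init/main.c.

```c
#include <string.h>

/*
 * Store "NAME=value" in a NULL-terminated env array with room for `cap`
 * pointers (including the terminating NULL). An existing entry with the
 * same NAME is replaced, so no duplicates are created.
 * Returns 0 on success, -1 if the array is full or the entry has no '='.
 */
int env_set(const char *env[], size_t cap, const char *entry)
{
	const char *eq = strchr(entry, '=');
	size_t namelen, i;

	if (!eq)
		return -1;
	namelen = (size_t)(eq - entry) + 1;	/* compare up to and including '=' */

	for (i = 0; env[i]; i++) {
		if (strncmp(env[i], entry, namelen) == 0) {
			env[i] = entry;		/* user value overrides the default */
			return 0;
		}
	}
	if (i + 1 >= cap)
		return -1;			/* no room for entry plus NULL */
	env[i] = entry;
	env[i + 1] = NULL;
	return 0;
}
```

Starting from { "HOME=/", "TERM=linux" }, storing "TERM=vt100" leaves exactly one TERM entry, so getenv() and the shell see the same value.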
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
Suggested by Manfred Spraul.
__get_free_pages had a hack to do node interleaving allocation at boot
time. This patch sets an interleave process policy using the NUMA API for
init and the idle threads instead. Before entering the user space init the
policy is reset to default again. Result is the same.
The advantage is less code and the removal of a check from a fast path.
Removes more code than it adds.
I verified that the memory distribution after boot is roughly the same.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
BSD accounting format rework:
Use all explicit and implicit padding in struct acct to
- correctly report 32 bit uid/gid,
- correctly report jobs (e.g., daemons) running longer than 497 days,
- increase the precision of ac_etime from 2^-13 to 2^-20
(i.e., from ~6 hours to ~1 min. after a year)
- store the current AHZ value.
- allow cross-platform processing of the accounting file
(limited for m68k which has a different size struct acct).
- introduce versioning for smooth transition to incompatible formats in
the future. Currently the following version numbers are defined:
0: old format (until 2.6.7) with 16 bit uid/gid
1: extended variant (binary compatible to v0 on M68K)
2: extended variant (binary compatible to v0 on everything except M68K)
3: a new binary incompatible format (64 bytes)
4: new binary incompatible format (128 bytes).
layout of its first 64 bytes is the same as for v3.
5: marks second half of new binary incompatible format (128 bytes)
(layout is not yet defined)
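The 497-day figure in the list above is consistent with a 32-bit tick counter running at 100 ticks per second. That reading is an assumption, since the message does not spell out the derivation. A minimal check of the arithmetic, using the hypothetical helper wrap_days():

```c
/* Whole days until a 32-bit tick counter wraps at `hz` ticks per second. */
unsigned long long wrap_days(unsigned int hz)
{
	unsigned long long ticks = 1ULL << 32;	/* distinct values of the counter */

	return ticks / hz / 86400ULL;		/* 86400 seconds per day */
}
```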
All this is accomplished without breaking binary compatibility. 32 bit
uid/gid support is compatible with the patch previously floating around and
used e.g. by Red Hat.
This patch also introduces a config option for a new, binary incompatible
"version 3" format that
- is uniform across and properly aligned on all platforms
- stores pid and ppid
- uses AHZ==100 on all platforms (allows to report longer times)
Much of the compatibility glue goes away when v1/v2 support is removed from
the kernel. Such a patch is at
http://www.physik3.uni-rostock.de/tim/kernel/2.7/acct-cleanup-04.patch
and might be applied in the 2.7 timeframe.
The new v3 format is source compatible with current GNU acct tools (6.3.5).
However, current GNU acct tools can be compiled for only one format. As there
is no way to pass the kernel configuration to userspace, with my patch it will
still only support the old v2 format. Only if v1/v2 support is removed from
the kernel will recompiling GNU acct tools yield v3 support.
A preliminary take at the corresponding work on cross-platform userspace tools
(GNU acct package) is at
http://www.physik3.uni-rostock.de/tim/kernel/utils/acct/
This version of the package is able to read any of the v0/v2/v3 formats,
regardless of byte-order (untested), even within the same file.
Cross-platform compatibility with m68k (v1 format) is not yet implemented, but
native use on m68k should work (untested). pid and ppid are currently only
shown by the dump-acct utility.
Thanks to Arthur Corliss, Albert Cahalan and Ragnar Kjørstad for their
comments, and to Albert Cahalan for the u64->IEEE float conversion code.
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
1) Explicitly cast to __user in syscall invocations
where we know we are in KERNEL_DS
2) Explicitly test against zero in assignment expression
conditional.
|
|
From: Adrian Bunk <bunk@fs.tum.de>
POSIX_MQUEUE requires netlink.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
|
From: "Randy.Dunlap" <rddunlap@osdl.org>
Remove the outdated "POSIX conformance testing by UNIFIX" message.
There is a general desire to reduce the quantity of noisy and/or
outdated kernel boot-time messages...
Suggested by Andi Kleen.
Ulrich's (old) comments:
http://www.nsa.gov/selinux/list-archive/0107/0525.cfm
Certifying Linux (Linux Journal):
http://www.linuxjournal.com/article.php?sid=0131
Agreed by Tim Bird, no dissenters that I heard of:
http://marc.theaimsgroup.com/?l=linux-kernel&m=108362954024749&w=2
Signed-off-by: Andrew Morton <akpm@osdl.org>
|
|
From: <viro@parcelfarce.linux.theplanet.co.uk>
init/initramfs.c::do_skip() has an off-by-one that leads to unpacking
failures for some gzipped cpio images. We have
static int __init do_skip(void)
{
	if (this_header + count <= next_header) {
		eat(count);
		return 1;
	} else {
		eat(next_header - this_header);
		state = next_state;
		return 0;
	}
}
and that <= should actually be <. It almost never matters, since if we hit
the boundary case (header ending exactly on the gunzip window end) the
current variant will simply end up doing extra call of do_skip() when we
get to the next window and that will finish the work (assign state). The
only exception is when we hit that in the last window. That is, if there's
nothing after the final header (trailer). Then we miss the final state
transition (Skip -> Reset) and get "junk in archive" panic. Normally
cpio(1) pads the image to multiple of 512, so we actually have a bunch of
zeroes after the trailer. And that almost always saves our butts - trailer
is followed by zeroes, so we get to Reset state just fine.
So we never see that on small in-kernel image (it's less than 512 bytes, so
it gets a lot of padding) and we almost never see that on external ones
(1:127 odds of hitting the bug).
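The boundary case can be modelled with a toy predicate. This is a sketch of the comparison only, not the unpacker itself; skip_step_transitions() and its `strict` flag are hypothetical names introduced for illustration.

```c
/*
 * Toy model of one do_skip() step. `strict` selects the corrected
 * comparison (<) instead of the original one (<=).
 * Returns 1 if this step performs the state transition (the else branch),
 * 0 if it only consumes input and another call is needed.
 */
int skip_step_transitions(long this_header, long next_header, long count,
			  int strict)
{
	int only_eat = strict ? (this_header + count < next_header)
			      : (this_header + count <= next_header);

	return !only_eat;
}
```

When the header ends exactly at the end of the window (this_header + count == next_header), the original comparison defers the transition to a further call that never comes if no data follows, while the strict comparison performs it immediately.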
|
|
From: Hugh Dickins <hugh@veritas.com>
Andrea Arcangeli's anon_vma object-based reverse mapping scheme for anonymous
pages. Instead of tracking anonymous pages by pte_chains or by mm, this
tracks them by vma. But because vmas are frequently split and merged
(particularly by mprotect), a page cannot point directly to its vma(s), but
instead to an anon_vma list of those vmas likely to contain the page - a list
on which vmas can easily be linked and unlinked as they come and go. The vmas
on one list are all related, either by forking or by splitting.
This has three particular advantages over anonmm: that it can cope
effortlessly with mremap moves; and no longer needs page_table_lock to protect
an mm's vma tree, since try_to_unmap finds vmas via page -> anon_vma -> vma
instead of using find_vma; and should use less cpu for swapout since it can
locate its anonymous vmas more quickly.
It does have disadvantages too: a lot more change in mmap.c to deal with
anon_vmas, though small straightforward additions now that the vma merging has
been refactored there; more lowmem needed for each anon_vma and vma structure;
an additional restriction on the merging of vmas (cannot be merged if already
assigned different anon_vmas, since then their pages will be pointing to
different heads).
(There would be no need to enlarge the vma structure if anonymous pages
belonged only to anonymous vmas; but private file mappings accumulate
anonymous pages by copy-on-write, so need to be listed in both anon_vma and
prio_tree at the same time. A different implementation could avoid that by
using anon_vmas only for purely anonymous vmas, and use the existing prio_tree
to locate cow pages - but that would involve a long search for each single
private copy, probably not a good idea.)
Where before the vm_pgoff of a purely anonymous (not file-backed) vma was
meaningless, now it represents the virtual start address at which that vma is
mapped - which the standard file pgoff manipulations treat linearly as vmas
are split and merged. But if mremap moves the vma, then it generally carries
its original vm_pgoff to the new location, so pages shared with the old
location can still be found. Magic.
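The vm_pgoff trick can be illustrated with a toy model. The structure and helpers below (toy_vma, page_index(), page_address_in()) are hypothetical stand-ins, and 4 KiB pages are assumed; the real lookup lives in the rmap code.

```c
#define TOY_PAGE_SHIFT 12	/* assumed 4 KiB pages */

struct toy_vma {
	unsigned long vm_start;	/* current virtual start address */
	unsigned long vm_pgoff;	/* anonymous vma: original start >> TOY_PAGE_SHIFT */
};

/* Index recorded for an anonymous page first mapped at `addr` in `vma`. */
unsigned long page_index(const struct toy_vma *vma, unsigned long addr)
{
	return vma->vm_pgoff + ((addr - vma->vm_start) >> TOY_PAGE_SHIFT);
}

/* Virtual address at which the page with index `index` sits in `vma`. */
unsigned long page_address_in(const struct toy_vma *vma, unsigned long index)
{
	return vma->vm_start + ((index - vma->vm_pgoff) << TOY_PAGE_SHIFT);
}
```

A page mapped at 0x10003000 in a vma starting at 0x10000000 gets index 0x10003. If the vma is split, the upper half carries a correspondingly larger vm_pgoff and the same index still resolves to 0x10003000. If mremap moves the vma to 0x40000000 and keeps vm_pgoff, the index resolves to 0x40003000.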
Hugh has massaged it somewhat: building on the earlier rmap patches, this
patch is a fifth of the size of Andrea's original anon_vma patch. Please note
that this posting will be his first sight of this patch, which he may or may
not approve.
|
|
From: Hugh Dickins <hugh@veritas.com>
Rajesh Venkatasubramanian's implementation of a radix priority search tree of
vmas, to handle object-based reverse mapping corner cases well.
Amongst the objections to object-based rmap were test cases by akpm and by
mingo, in which large numbers of vmas mapping disjoint or overlapping parts of
a file showed strikingly poor performance of the i_mmap lists. Perhaps those
tests are irrelevant in the real world? We cannot be too sure: the prio_tree
is well-suited to solving precisely that problem, so unless it turns out to
bring too much overhead, let's include it.
Why is this prio_tree.c placed in mm rather than lib? See GET_INDEX: this
implementation is geared throughout to use with vmas, though the first half of
the file appears more general than the second half.
Each node of the prio_tree is itself (contained within) a vma: might save
memory by allocating distinct nodes from which to hang vmas, but wouldn't save
much, and would complicate the usage with preallocations. Off each node of
the prio_tree itself hangs a list of like vmas, if any.
The connection from node to list is a little awkward, but probably the best
compromise: it would be more straightforward to list likes directly from the
tree node, but that would use more memory per vma, for the list_head and to
identify that head. Instead, node's shared.vm_set.head points to next vma
(whose shared.vm_set.head points back to node vma), and that next contains the
list_head from which the rest hang - reusing fields already used in the
prio_tree node itself.
Currently lacks prefetch: Rajesh hopes to add some soon.
|
|
From: Hugh Dickins <hugh@veritas.com>
Lots of deletions: the next patch will put in the new anon rmap, which
should look clearer if first we remove all of the old pte-pointer-based
rmap from the core in this patch - which therefore leaves anonymous rmap
totally disabled, anon pages locked in memory until process frees them.
Leave arch files (and page table rmap) untouched for now, clean them up in
a later batch. A few constructive changes amidst all the deletions:
Choose names (e.g. page_add_anon_rmap) and args (e.g. no more pteps) now
so we need not revisit so many files in the next patch. Inline function
page_dup_rmap for fork's copy_page_range, simply bumps mapcount under lock.
cond_resched_lock in copy_page_range. Struct page rearranged: no pte
union, just mapcount moved next to atomic count, so two ints can occupy one
long on 64-bit; i386 struct page now 32 bytes even with PAE. Never pass
PageReserved to page_remove_rmap, only do_wp_page did so.
From: Hugh Dickins <hugh@veritas.com>
Move page_add_anon_rmap's BUG_ON(page_mapping(page)) inside the rmap_lock
(well, might as well just check mapping if !mapcount then): if this page is
being mapped or unmapped on another cpu at the same time, page_mapping's
PageAnon(page) and page->mapping are volatile.
But page_mapping(page) is used more widely: I've a nasty feeling that
clear_page_anon, page_add_anon_rmap and/or page_mapping need barriers added
(also in 2.6.6 itself).
|
|
From: Andrew Theurer <habanero@us.ibm.com>
Use num_online_cpus in smp_init instead of counting cpus which may or may not
really be brought up.
|
|
From: David Mosberger <davidm@napali.hpl.hp.com>
It used to be that loops_per_jiffy was a macro on ia64, hence it couldn't be
exported. That's no longer the case though, so there is no point in
inhibiting its export (not that it makes any _sense_ to export that value on
ia64).
|
|
From: Rusty Russell <rusty@rustcorp.com.au>
kallsyms contains only function names, but some debuggers (eg. xmon on
PPC/PPC64) use it to lookup symbols: it'd be much nicer if it included data
symbols too.
|
|
Split the system_state state `SYSTEM_SHUTDOWN' into SYSTEM_HALT,
SYSTEM_POWER_OFF and SYSTEM_RESTART and export system_state to modules.
This allows driver shutdown routines to know why they are being shut down. The
IDE subsystem wants this so that it knows to not spin the disks down across a
reboot.
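A driver shutdown routine could use the split states roughly as follows. This is a sketch only: the enum is an illustrative stand-in (SYSTEM_RUNNING is assumed here for completeness), and should_spin_down() is a hypothetical policy helper, not IDE code.

```c
/* Stand-in for the kernel's system_state values named in the message. */
enum toy_system_state {
	SYSTEM_RUNNING,		/* assumed; not mentioned in the message */
	SYSTEM_HALT,
	SYSTEM_POWER_OFF,
	SYSTEM_RESTART
};

/* Hypothetical disk policy: spin down unless the machine is only rebooting. */
int should_spin_down(enum toy_system_state state)
{
	return state == SYSTEM_HALT || state == SYSTEM_POWER_OFF;
}
```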
|
|
From: Paul Jackson <pj@sgi.com>
With a hotplug capable kernel, there is a requirement to distinguish a
possible CPU from one actually present. The set of possible CPU numbers
doesn't change during a single system boot, but the set of present CPUs
changes as CPUs are physically inserted into or removed from a system. The
cpu_possible_map does not change once initialized at boot, but the
cpu_present_map changes dynamically as CPUs are inserted or removed.
Paul Jackson <pj@sgi.com> provided an expanded explanation:
Ashok's cpu hot plug patch adds a cpu_present_map, resulting in the following
cpu maps being available. All the following maps are fixed size bitmaps of
size NR_CPUS.
#ifdef CONFIG_HOTPLUG_CPU
cpu_possible_map - map with all NR_CPUS bits set
cpu_present_map - map with bit 'cpu' set iff cpu is populated
cpu_online_map - map with bit 'cpu' set iff cpu available to scheduler
#else
cpu_possible_map - map with bit 'cpu' set iff cpu is populated
cpu_present_map - copy of cpu_possible_map
cpu_online_map - map with bit 'cpu' set iff cpu available to scheduler
#endif
In either case, NR_CPUS is fixed at compile time, as the static size of these
bitmaps. The cpu_possible_map is fixed at boot time, as the set of CPU id's
that it is possible might ever be plugged in at anytime during the life of
that system boot. The cpu_present_map is dynamic(*), representing which CPUs
are currently plugged in. And cpu_online_map is the dynamic subset of
cpu_present_map, indicating those CPUs available for scheduling.
If HOTPLUG is enabled, then cpu_possible_map is forced to have all NR_CPUS
bits set, otherwise it is just the set of CPUs that ACPI reports present at
boot.
If HOTPLUG is enabled, then cpu_present_map varies dynamically, depending on
what ACPI reports as currently plugged in, otherwise cpu_present_map is just a
copy of cpu_possible_map.
(*) Well, cpu_present_map is dynamic in the hotplug case. If not hotplug,
it's the same as cpu_possible_map, hence fixed at boot.
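The relationship between the three maps can be written down as a subset check. The sketch below uses a single unsigned long as a toy bitmap (so it assumes NR_CPUS fits in one word); toy_cpumask, mask_subset() and maps_consistent() are hypothetical names, not the kernel's cpumask API.

```c
typedef unsigned long toy_cpumask;	/* one bit per CPU */

/* True if every bit set in `sub` is also set in `super`. */
int mask_subset(toy_cpumask sub, toy_cpumask super)
{
	return (sub & ~super) == 0;
}

/* Invariant from the text: online is within present, present within possible. */
int maps_consistent(toy_cpumask possible, toy_cpumask present,
		    toy_cpumask online)
{
	return mask_subset(online, present) && mask_subset(present, possible);
}
```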
|
|
From: Ashok Raj <ashok.raj@intel.com>
This patch changes __init to __devinit for init_idle so that when a new cpu
arrives, it can call these functions at a later time.
|
|
gcc-3.4.0 sez:
init/do_mounts_rd.c:309: warning: conflicting types for built-in function 'malloc'
|
|
From: Ingo Molnar <mingo@elte.hu>
The trivial fixes.
- added recent trivial bits from Nick's and my patches.
- hotplug CPU fix
- early init cleanup
|
|
From: Nick Piggin <piggin@cyberone.com.au>
This is the core sched domains patch. It can handle any number of levels
in a scheduling hierarchy, and allows architectures to easily customize how
the scheduler behaves. It also provides progressive balancing backoff
needed by SGI on their large systems (although they have not yet tested
it).
It is built on top of (well, uses ideas from) my previous SMP/NUMA work, and
gets results very similar to them when using the default scheduling
description.
Benchmarks
==========
Martin was seeing I think 10-20% better system times in kernbench on the 32
way. I was seeing improvements in dbench, tbench, kernbench, reaim,
hackbench on a 16-way NUMAQ. Hackbench in fact had a non linear element
which is all but eliminated. Large improvements in volanomark.
Cross node task migration was decreased in all above benchmarks, sometimes by
a factor of 100!! Cross CPU migration was also generally decreased. See
this post:
http://groups.google.com.au/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&frame=right&th=a406c910b30cbac4&seekm=UAdQ.3hj.5%40gated-at.bofh.it#link2
Results on a hyperthreading P4 are equivalent to Ingo's shared runqueues
patch (which is a big improvement).
Some examples on the 16-way NUMAQ (this is slightly older sched domain code):
http://www.kerneltrap.org/~npiggin/w26/hbench.png
http://www.kerneltrap.org/~npiggin/w26/vmark.html
From: Jes Sorensen <jes@wildopensource.com>
Tiny patch to make -mm3 compile on a NUMA box with NR_CPUS >
BITS_PER_LONG.
From: "Martin J. Bligh" <mbligh@aracnet.com>
Fix a minor nit with the find_busiest_group code. No functional change,
but makes the code simpler and clearer. This patch does two things ...
adds some more expansive comments, and removes this if clause:
	if (*imbalance < SCHED_LOAD_SCALE
			&& max_load - this_load > SCHED_LOAD_SCALE)
		*imbalance = SCHED_LOAD_SCALE;
If we remove the scaling factor, we're basically conditionally doing:
	if (*imbalance < 1)
		*imbalance = 1;
Which is pointless, as the very next thing we do is to remove the
scaling factor, rounding up to the nearest integer as we do:
	*imbalance = (*imbalance + SCHED_LOAD_SCALE - 1) >> SCHED_LOAD_SHIFT;
Thus the if statement is redundant, and only makes the code harder to
read ;-)
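The rounding step can be checked in isolation. The sketch assumes SCHED_LOAD_SHIFT is 7 (so the scale is 128), which the message does not state; unscale_round_up() is a hypothetical wrapper around the quoted expression.

```c
#define SCHED_LOAD_SHIFT 7			/* assumed value */
#define SCHED_LOAD_SCALE (1UL << SCHED_LOAD_SHIFT)

/* Remove the scaling factor, rounding up to the nearest integer. */
unsigned long unscale_round_up(unsigned long imbalance)
{
	return (imbalance + SCHED_LOAD_SCALE - 1) >> SCHED_LOAD_SHIFT;
}
```

Any scaled imbalance from 1 up to SCHED_LOAD_SCALE already unscales to 1, so raising it to SCHED_LOAD_SCALE beforehand does not change the result for those values.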
From: Rick Lindsley <ricklind@us.ibm.com>
In find_busiest_group(), after we exit the do/while, we select our
imbalance. But max_load, avg_load, and this_load are all unsigned, so
min(x,y) will make a bad choice if max_load < avg_load < this_load (that
is, a choice between two negative [very large] numbers).
Unfortunately, there is a bug when max_load never gets changed from zero
(look in the loop and think what happens if the only load on the machine is
being created by cpu groups of which we are a member). And you have a
recipe for some really bogus values for imbalance.
Even if you fix the max_load == 0 bug, there will still be times when
avg_load - this_load will be negative (thus very large) and you'll make the
decision to move stuff when you shouldn't have.
This patch allows for this_load to set max_load, which if I understand
the logic properly is correct. With this patch applied, the algorithm is
*much* more conservative ... maybe *too* conservative but that's for
another round of testing ...
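The failure mode with unsigned loads is easy to reproduce outside the scheduler. The two helpers below are hypothetical and only illustrate the wraparound; the guarded variant is not the fix this patch applies (the patch lets this_load set max_load instead).

```c
/* Naive difference: wraps to a very large value when this_load > avg_load. */
unsigned long naive_diff(unsigned long avg_load, unsigned long this_load)
{
	return avg_load - this_load;
}

/* Guarded difference: a "negative" result is reported as zero. */
unsigned long guarded_diff(unsigned long avg_load, unsigned long this_load)
{
	return avg_load > this_load ? avg_load - this_load : 0;
}
```

With avg_load 100 and this_load 300, the naive difference is a huge positive number rather than -200, so a min() over such values can pick a bogus imbalance.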
From: Ingo Molnar <mingo@elte.hu>
sched-find-busiest-fix
|
|
I moved this a little too late - we need to run populate_rootfs() before
running initcalls because some driver initcalls need to open files for
firmware.
The populate_rootfs() call is still coming after init_idle(), so it won't
knock the scheduler over.
|
|
From: Matt Mackall <mpm@selenic.com>
|
|
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
s390 core changes:
- Fix race in do_call_softirq in regard to kernel preemption.
- Fix typo in compat mq system call wrappers.
- Add s390 to Kconfig for AUDITSYSCALL.
- Redefine TASK_SIZE to TASK31_SIZE for compilation of binfmt_elf32.
- Use correct error value for sys32_ipc when called with an invalid number.
- New default configuration.
|
|
populate_rootfs() is called rather early - before we've called init_idle().
But populate_rootfs() does file I/O, which involves calls to cond_resched(),
and downing of semaphores, etc. If it schedules, the scheduler emits
scheduling-while-atomic warnings and sometimes oopses.
So run populate_rootfs() later, after the scheduler is all set up.
|
|
From: Patrick Mochel <mochel@digitalimplant.org>
Here is a patch to make sysfs optional. Note that with CONFIG_SYSFS=n you
must specify the boot device's major:minor on the kernel boot command line
with
root=03:01
For embedded systems, it will save a significant amount of memory during
runtime. And, it saves 4k from the built kernel image for me.
|
|
From: Rik Faith <faith@redhat.com>
This patch provides a low-overhead system-call auditing framework for Linux
that is usable by LSM components (e.g., SELinux). This is an update of the
patch discussed in this thread:
http://marc.theaimsgroup.com/?t=107815888100001&r=1&w=2
In brief, it provides for netlink-based logging of audit records that have
been generated in other parts of the kernel (e.g., SELinux) as well as the
ability to audit system calls, either independently (using simple
filtering) or as a complement to the audit record that another part of the
kernel generated.
The main goals were to provide system call auditing with 1) as low overhead
as possible, and 2) without duplicating functionality that is already
provided by SELinux (and/or other security infrastructures). This
framework will work "stand-alone", but is not designed to provide, e.g.,
CAPP functionality without another security component in place.
This updated patch includes changes from feedback I have received,
including the ability to compile without CONFIG_NET (and better use of
tabs, so use -w if you diff against the older patch).
Please see http://people.redhat.com/faith/audit/ for an early example
user-space client (auditd-0.4.tar.gz) and instructions on how to try it.
My future intentions at the kernel level include improving filtering (e.g.,
syscall personality/exit codes) and syscall support for more architectures.
First, though, I'm going to work on documentation, a (real) audit daemon,
and patches for other user-space tools so that people can play with the
framework and understand how it can be used with and without SELinux.
Update:
Light-weight Auditing Framework receive filter fixes
From: Rik Faith <faith@redhat.com>
Since audit_receive_filter() is only called with audit_netlink_sem held, it
cannot race with either audit_del_rule() or audit_add_rule(), so the
list_for_each_entry_rcu()s may be replaced by list_for_each_entry()s, and
the rcu_read_{un,}lock()s removed. A fix for this is part of the attached
patch.
Other features of the attached patch are:
1) generalized the ability to test for inequality
2) added syscall exit status reporting and testing
3) added ability to report and test first 4 syscall arguments (this adds
a large amount of flexibility for little cost; not implemented or tested
on ppc64)
4) added ability to report and test personality
User-space demo program enhanced for new fields and inequality testing:
http://people.redhat.com/faith/audit/auditd-0.5.tar.gz
|
|
From: Matt Mackall <mpm@selenic.com>
Make CONFIG_EMBEDDED description more accurate
|
|
From: Manfred Spraul <manfred@colorfullife.com>
Actual implementation of the posix message queues, written by Krzysztof
Benedyczak and Michal Wronski. The complete implementation is dependent on
CONFIG_POSIX_MQUEUE.
It passed the openposix test suite with two exceptions: one mq_unlink test
was bad and tested undefined behavior. And Linux succeeds
mq_close(open(,,,)). The spec mandates EBADF, but we have decided to ignore
that: we would have to add a new syscall just for the right error code.
The patch intentionally doesn't use all helpers from fs/libfs for kernel-only
filesystems: step 5 allows user space mounts of the file system.
Signal changes:
The patch redefines SI_MESGQ using __SI_CODE: The generic Linux ABI uses
a negative value (i.e. from user) for SI_MESGQ, but the kernel internal
value must be positive to pass check_kill_value. Additionally, the patch
adds support into copy_siginfo_to_user to copy the "new" signal type to
user space.
Changes in signal code caused by POSIX message queues patch:
General & rationale:
mqueues generated signals (only upon notification) must have si_code
== SI_MESGQ. In fact such a signal is sent from one process which
caused notification (== sent message to empty message queue) to
another which requested it. Both processes can be of course unrelated
in terms of uids/euids. So SI_MESGQ signals must be classified as
SI_FROMKERNEL to pass check_kill_permissions (needless to say,
these signals ARE from kernel).
Signals generated by message queues notification need the same
fields in siginfo struct's union _sifields as POSIX.1b signals and we
can reuse its union entry.
SI_MESGQ was previously defined to -3 in kernel and also in glibc.
So in userspace SI_MESGQ must be still visible as -3.
Solution:
SI_MESGQ is defined in the same style as SI_TIMER using __SI_CODE macro.
Details:
Fortunately copy_siginfo_to_user copies si_code as short. So we
can use remaining part of int value freely. __SI_CODE does the
work. SI_MESGQ is in kernel:
6<<16 | (-3 & 0xffff) which is > 0
but what is copied to userspace is
(short) SI_MESGQ == -3
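The arithmetic in the details above can be checked with a small userspace sketch. The `DEMO_` names are hypothetical stand-ins for `__SI_MESGQ`, `__SI_CODE` and `SI_MESGQ`; only the values `6<<16` and `-3` come from the text, and the truncation to `short` assumes the usual two's-complement behaviour.

```c
/* Hypothetical stand-ins for the macros named in the text. */
#define DEMO_SI_MESGQ_PREFIX (6 << 16)                  /* in-kernel class prefix */
#define DEMO_SI_CODE(prefix, n) ((prefix) | ((n) & 0xffff))
#define DEMO_SI_MESGQ DEMO_SI_CODE(DEMO_SI_MESGQ_PREFIX, -3)

/* Value as kernel code would see it: prefix in the high half, code in the low half. */
static int demo_kernel_si_code(void)
{
    return DEMO_SI_MESGQ;
}

/* Value as user space would see it once si_code is copied as a short
 * (assumes the common two's-complement truncation). */
static short demo_user_si_code(int kernel_code)
{
    return (short)kernel_code;
}
```

Under these assumptions the in-kernel value is 0x6fffd, which is positive, while the truncated value handed to user space is still -3.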
Actual changes:
Changes in include/asm-generic/siginfo.h
__SI_MESGQ added in signal.h to represent inside-kernel prefix of
SI_MESGQ. SI_MESGQ is redefined from -3 to __SI_CODE(__SI_MESGQ, -3)
Except for the mips architecture, those changes should be arch independent
(asm-generic/siginfo.h is included in arch versions). On mips
SI_MESGQ is redefined to -4 in order to be compatible with IRIX, but
the same scheme can be used.
Change in copy_siginfo_to_user: We only add one line to get the
same copy semantics as for _SI_RT.
This change isn't very portable - some archs have their own
copy_siginfo_to_user. All of those should get a similar change (but
possibly not a one-liner, as the _SI_RT case was sometimes ignored because
it wasn't used yet, e.g. see ia64 signal.c).
Update:
mq: only fail with invalid timespec if mq_timed{send,receive} needs to block
From: Jakub Jelinek <jakub@redhat.com>
POSIX requires EINVAL to be set if:
"The process or thread would have blocked, and the abs_timeout parameter
specified a nanoseconds field value less than zero or greater than or equal
to 1000 million."
but 2.6.5-mm3 returns -EINVAL even if the process or thread would not block
(if the queue is not empty for timedreceive or not full for timedsend).
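A minimal sketch of the ordering described in this update: the timeout is validated only once the call would actually have to block. The function names and the integer queue length are hypothetical, and `-ETIMEDOUT` merely stands in for the real sleep.

```c
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* POSIX: tv_nsec must be in [0, 1000 million). */
static bool demo_timespec_valid(const struct timespec *ts)
{
    return ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000L;
}

/* Hypothetical timed receive: returns 0 if a message is available at once,
 * -EINVAL if it would have to block on a malformed timeout, and
 * -ETIMEDOUT as a placeholder for "would sleep until abs_timeout". */
static int demo_timedreceive(int queued_messages,
                             const struct timespec *abs_timeout)
{
    if (queued_messages > 0)
        return 0;               /* no blocking needed: timeout not inspected */
    if (!demo_timespec_valid(abs_timeout))
        return -EINVAL;         /* only now does a bad timeout matter */
    return -ETIMEDOUT;
}
```

With a non-empty queue, even a timeout whose nanoseconds field is out of range is accepted, which is the behaviour the quoted POSIX wording asks for.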
|
|
From: Olaf Hering <olh@suse.de>
initramfs can not be used in current 2.6 kernels, the files will never be
executed because prepare_namespace doesn't care about them. The only way to
work around that limitation is a root=0:0 cmdline option to force rootfs as
root filesystem. This will break further booting because rootfs is not the
final root filesystem.
This patch checks for the presence of /init which comes from the cpio archive
(and that's the only way to store files into the rootfs). This binary/script
has to do all the work of prepare_namespace().
|
|
From: Olof Johansson <olof@austin.ibm.com>
It's currently a boolean, but that means that system_running goes to zero
again when shutting down. So we then use code (in the page allocator) which
is only designed to be used during bootup - it is marked __init.
So we need to be able to distinguish early boot state from late shutdown
state. Rename system_running to system_state and give it the three
appropriate states.
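The difference between the boolean and the three-state variable can be sketched as follows. The enumerator names are illustrative only; the message does not spell out the names actually used.

```c
#include <stdbool.h>

/* Illustrative names for the three states; not taken from the patch. */
enum demo_system_state { DEMO_BOOTING, DEMO_RUNNING, DEMO_SHUTDOWN };

static enum demo_system_state demo_state = DEMO_BOOTING;

/* Old scheme: "not running" is true both in early boot and at shutdown. */
static bool demo_old_boot_check(bool system_running)
{
    return !system_running;
}

/* New scheme: boot-only (__init-style) code is allowed only while booting. */
static bool demo_may_use_boot_only_code(void)
{
    return demo_state == DEMO_BOOTING;
}
```

At shutdown the old check still answers "boot time", while the three-state check does not.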
|
|
From: Rusty Russell <rusty@rustcorp.com.au>
From: "Petri T. Koistinen" <petri.koistinen@iki.fi>
1) Various URLs in the Kconfig files are out of date: update them.
2) URLs should be of form <http://url-goes-here>.
3) References to files in the source should be of form
<file:path-from-top>
4) Email addresses should be of form <foo@bar.com>
|
|
If someone runs page_address() before page_address_init(), the kernel locks
up over uninitialised spinlocks.
This only happens with the 4:4 patch, but it is more robust to run
page_address_init() before setup_arch(). page_address_init() simply
initialises statically allocated storage.
|
|
|
|
From: NeilBrown <neilb@cse.unsw.edu.au>
kernel parameters:
raid=partitionable
will make all auto-detected md arrays partitionable
md=d....
will assemble an array as a partitionable array.
|
|
From: Arnd Bergmann <arnd@arndb.de>
Dave Jones already removed some of the useless __KERNEL_SYSCALLS__ defines
in various files, this gets rid of almost all the others. Replacing
execve() is nontrivial, so I left those in for now.
For all the other system calls that are currently used from inside the
kernel, calling the sys_* function directly should always have an identical
effect.
|
|
|
|
From: vda <vda@port.imtp.ilyichevsk.odessa.ua>
Add a missing test for the "root=/dev/ram" kernel boot option. It's just an
alias for /dev/ram0, but it worked in 2.4...
|
|
into ppc970.osdl.org:/home/torvalds/v2.5/linux
|
|
From Jan-Benedict Glaw <jbglaw@lug-owl.de>
|
|
The "bogolock" code was introduced in module.c, as a way of freezing
the machine when we wanted to remove a module. This patch moves it
out to stop_machine.c and stop_machine.h.
Since the code changes affinity and priority, it's impolite to hijack
the current context, so we use a kthread. This means we have to pass
the function rather than implement "stop_machine()" and
"restart_machine()".
|
|
|
|
parameters. Whenever such parameter is specified kernel
will complain that "Parameter %s is obsolete, ignored"
|
|
From: Adrian Bunk <bunk@fs.tum.de>
"swap" is better known than "Support for paging of anonymous memory". The
patch below adds "(swap)" to the prompt of CONFIG_SWAP.
|
|
I goofed in my last patch to this code..
It reported 1 less CPU than it should have. Doh.
|
|
From: "Randy.Dunlap" <rddunlap@osdl.org>
Add syscalls.h, which contains prototypes for the kernel's system calls.
Replace open-coded declarations all over the place. This patch found a
couple of prior bugs. It appears to be more important with -mregparm=3 as we
discover more asmlinkage mismatches.
Some syscalls have arch-dependent arguments, so their prototypes are in the
arch-specific unistd.h. Maybe it should have been asm/syscalls.h, but there
were already arch-specific syscall prototypes in asm/unistd.h...
Tested on x86, ia64, x86_64, ppc64, s390 and sparc64. May cause
trivial-to-fix build breakage on other architectures.
|
|
into kroah.com:/home/linux/BK/pci-2.6
|
|
gcc-3.4 incorrectly inlines rest_init() into start_kernel(), causing things to
crash when the .text.init section gets unloaded. Use noinline to prevent
that.
|
|
From: Tim Hockin <thockin@sun.com>
Remove the max_anon limit via dynamic allocation. We also change the
idr_pre_get() interface to take a gfp mask, which should have always been
there.
|
|
From: Pratik Solanki <pratik.solanki@timesys.com>
- Fix include path for build.c so that it finds asm/boot.h.
/usr/include/asm/boot.h may not be present when cross-compiling on a
non-Linux machine.
- $(CONFIG_SHELL) instead of sh.
|
|
From: Werner Almesberger <werner@almesberger.net>
When passing too many unrecognized boot command line options (which become
arguments or environment variables), the 2.6 kernel panics (unlike 2.4,
which just ignores the extra items). Unfortunately, this happens before
the console is initialized, so all you get is a kernel that dies quickly,
for no apparent reason.
This is particularly irritating if using UML with
init=something wi th a lot of ar gu men t s
The patch below delays the panic until after console_init.
(akpm: I mainly added this in because we have other places where the
panic-later-on machinery is needed).
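The panic-later idea can be sketched in userspace as "remember the first failure, report it once output is possible". All names here are hypothetical, and the report is written into a buffer instead of actually panicking.

```c
#include <stddef.h>
#include <stdio.h>

static const char *demo_panic_later;   /* first deferred message, or NULL */
static const char *demo_panic_param;   /* the option that triggered it */

/* Called during early command-line parsing, before any console exists. */
static void demo_defer_panic(const char *msg, const char *param)
{
    if (demo_panic_later == NULL) {     /* keep only the first failure */
        demo_panic_later = msg;
        demo_panic_param = param;
    }
}

/* Called after console init; returns 0 if nothing was deferred,
 * otherwise the length snprintf() reports for the message. */
static int demo_report_after_console_init(char *out, size_t len)
{
    if (demo_panic_later == NULL)
        return 0;
    return snprintf(out, len, "%s '%s'", demo_panic_later, demo_panic_param);
}
```

The early parser only records the problem; the fatal report happens at a point where the user can actually see it.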
|
|
As a bonus: cris, h8300, m68k and sparc can use CONFIG_HOTPLUG now.
|
|
Now that Al Viro fixed cramfs, it works beautifully as an initrd
filesystem.
So finally plumb it in.
|
|
From: Jes Sorensen <jes@trained-monkey.org>
I'd like to propose the following for 2.6.1-mm/2.6.2. On systems with a
large number of CPUs the number of printk's flowing by for each CPU
booting starts becoming a real console hog.
The following patch eliminates a couple of them (already sent a patch to
David for the ia64 specific ones) as well as changing
"Building zonelist : X" to "Built Y zonelists". IMHO it doesn't make any
sense to print for each zonelist since it's run in a for loop running
from 0 to Y-1 anyway.
The patch nukes a few new printk's that were introduced with the
scheduler changes to the NUMA code in -mm3, if these are still needed
then I won't fight for that part of the patch.
|
|
|
|
This currently prints out the maximum number of CPUs the
kernel is configured to support, instead of the actual
number that the kernel brought up. Which results in odd
displays that look like you have more CPUs than you do.
|
|
This patch arranges for the exception tables to be sorted on most
architectures. It sorts the main kernel exception table on startup
and the module exception tables when they get loaded. The main table
is sorted reasonably early - just after kmem_cache_init - but that
could be moved even earlier if necessary.
There is now a lib/extable.c which includes the sort_extable()
function from arch/ppc/mm/extable.c and the search_extable() function
from arch/i386/mm/extable.c, which had been copied to many
architectures. On many architectures, arch/$(ARCH)/mm/extable.c
became empty and so I have removed it.
There are four architectures which do things differently from i386:
alpha, ia64, sparc and sparc64. Alpha and ia64 store the offset from
the exception table entry to the instruction, and
sparc and sparc64 have range entries in the table. For those
architectures I have added empty sort_extable functions. The
maintainers for those architectures can implement something better if
they care to. As it is they are no worse off than before.
Although it is a moderately sizable patch, it ends up with a net
reduction of 377 lines in the size of the kernel source. :)
I have tested this on x86 and ppc with a module that uses __get_user
in an init function, deliberately laid out to get the exception table
out of order, and it works (whereas it oopsed without this patch).
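The sort-then-search scheme can be illustrated in userspace with `qsort()` and `bsearch()`. This is a sketch under the assumption of i386-style entries holding absolute addresses; the struct layout and the `demo_` names are not taken from lib/extable.c.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed i386-style entry: absolute addresses, one entry per instruction. */
struct demo_extable_entry {
    unsigned long insn;    /* address of the instruction that may fault */
    unsigned long fixup;   /* address to continue at if it does */
};

static int demo_cmp_ex(const void *a, const void *b)
{
    const struct demo_extable_entry *x = a, *y = b;

    if (x->insn < y->insn)
        return -1;
    if (x->insn > y->insn)
        return 1;
    return 0;
}

/* Sort once (at boot, or when a module is loaded)... */
static void demo_sort_extable(struct demo_extable_entry *start, size_t n)
{
    qsort(start, n, sizeof(*start), demo_cmp_ex);
}

/* ...so that every later lookup can be a binary search. */
static const struct demo_extable_entry *
demo_search_extable(const struct demo_extable_entry *start, size_t n,
                    unsigned long addr)
{
    struct demo_extable_entry key = { addr, 0 };

    return bsearch(&key, start, n, sizeof(*start), demo_cmp_ex);
}
```

A binary search over an unsorted table can miss entries that are present, which matches the failure mode described for the out-of-order module table.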
|
|
From: Jes Sorensen <jes@trained-monkey.org>
The following patch removes a couple of null-ilizers of global variables.
Not a big deal, but every byte helps in the .data segment ;-)
|
|
From: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
- remove unused "rows" and "cols"
- change the 2 variables to static
|
|
From: viro@parcelfarce.linux.theplanet.co.uk
When we register disks, we mangle the disk names that contain slashes (e.g.
cciss/c0d0) replacing them with '!' in corresponding sysfs names. So
name_to_dev_t() should mangle the name in the same way before looking for it
in /sys/block.
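The mangling itself is a one-character substitution. A minimal sketch, with a hypothetical helper name:

```c
/* Replace every '/' in a disk name with '!', in place, so the result
 * can be used as a single path component under /sys/block. */
static void demo_mangle_disk_name(char *name)
{
    for (char *p = name; *p != '\0'; p++)
        if (*p == '/')
            *p = '!';
}
```

Applied to `cciss/c0d0`, this yields `cciss!c0d0`, which is the form the lookup has to use.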
|
|
From: James Morris <jmorris@redhat.com>
The patch below removes the CLONE_FILES flag from the kernel_thread() call
which starts init.
This is to prevent other kernel threads from sharing file descriptors
opened by init (try 'lsof /dev/initctl' on a 2.6 system :-).
The reason this patch is being proposed is so that usermode helper apps
launched via kernel threads (e.g. modprobe, hotplug) do not then inherit
any such file descriptors. This is not a problem in itself so far (other
than being messy), but it is a problem for SELinux, which will otherwise
need to grant access to /dev/initctl by modprobe and hotplug, a somewhat
undesirable scenario.
As far as I can tell, there is no reason why init needs to be spawned with
CLONE_FILES. Please let me know if there are any objections to the
change, which I would like to propose for 2.6.0+ as a cleanup.
|
|
From: Adrian Bunk <bunk@fs.tum.de>
Allow the kernel to be built with `-Os'.
It requires CONFIG_EMBEDDED. This is to make it "hard to get at" because
one gcc version (3.2.x I think) from RH9 generates crashy kernels with this
option set.
|
|
It calls __init functions anyway.
|
|
This tunable refers to the amount of free memory which the VM will attempt to
sustain. It is mainly needed for atomic allocations (eg, networking
receive).
It is currently hardwired to 1024k, which is far too large for small machines
and too small for large machines.
Rework it to be 128k on tiny machines and 16M on huge machines.
|
|
From: Matt Tolentino <metolent@snoqualmie.dp.intel.com>
Attached is a patch that enables EFI boot-up support in ia32 kernels.
In order to continue to determine whether the kernel should initialize using
EFI tables, I've temporarily added a check on the LOADER_TYPE boot parameter.
Although I haven't requested that elilo be assigned an id for this yet, I've
used this to determine whether the kernel should use the EFI initialization
path as well as a check to see if the EFI_SYSTAB boot parameter contains
anything. If someone has a better suggestion for determining this, I'm
open...
This patch also uses the existing ioremapping functions to map the efi tables
into kernel virtual address space. I've added an option such that I could
use Dave Hansen's boot_ioremap() before paging_init(). After paging_init, I
then remap the efi memmap using bt_ioremap for use later. This has
eliminated the need for several functions...thanks for the suggestions and
thanks for your help Dave. Still this could use a look-see.
|
|
There seems to be no header file which declares system_running.
|
|
This fixes initrd with devfs. With that combination the late-boot code
does temporary mount of devfs over rootfs /dev, which made /dev/initrd
inaccessible. For setups without devfs that didn't happen.
The fix is trivial - put the file in question outside of /dev; IOW,
we simply replace "/dev/initrd" with "/initrd.image" in init/*.
Confirmed to fix the problem by Valdis Kletnieks
|
|
From: viro@parcelfarce.linux.theplanet.co.uk
* drivers/block/initrd.c gone
* chunk of memory where the current tree would look for initrd image is
checked for being a valid initramfs image first; then, it is either
unpacked (in addition to normal built-in image) or, if it wasn't a valid
image, copied into a regular file on rootfs called /dev/initrd. Then
memory is freed.
Result:
a) we can put initramfs image in place of initrd one and kernel will DTRT.
b) initrd images still work as usual; code that shoves the thing to
ramdisk, etc. doesn't care whether it reads from a block device or
regular file.
c) initrd.c is gone, so is fake block device and a lot of irregularities
with it.
It has been in -mm for almost two weeks with no reported problems.
|
|
|
|
into intel.com:/home/lenb/bk/linux-acpi-test-2.6.0
|
|
Real conversion to 32bit dev_t. Expansion to:
* mknod() - 32
* newstat() - 32 on 64bit platforms
* stat64() - 32 on mips, 64 on everything else (mips has weird struct
stat64 and can't get more than 32 bits). Note that right now the difference
is purely theoretical - we don't have internal values above 32 bits, so
huge_... vs. new_... only marks the places where 64bit conversion will need
extra work.
* arch-dependent stat variants - depending on width available.
* ustat et.al. - 32
* filesystems that can handle 32 bits right now - 32
* ext2 and ext3 - 32, with large dev_t inodes having 0 in the first
element of i_data[] (where we store dev_t value for small device numbers) and
keeping the value in the second element.
* nfsd - 32; it can be driven to 64, but we'll get several issues with
NFSv2 support.
* RAID - 32
* devmapper - with v1 it's still 16 (nothing to do here), with v4 it's
64.
* loop - 64
* initramfs - 32
* do_mounts code - 32. Parts that scan devfs tree are using newstat()
on 64bit platforms and stat64() on the rest (IOW, the latest stat variant on
given platform).
* old_valid_dev()/new_valid_dev() added where needed (stat variants,
mostly - we fail with -EOVERFLOW if values do not fit).
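The "check, then encode, else -EOVERFLOW" pattern for the old 16-bit format can be sketched like this. The 8-bit major / 8-bit minor split and the `demo_` helper names are assumptions made for illustration, not a copy of the kernel helpers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed old format: 8-bit major in the high byte, 8-bit minor in the low. */
static bool demo_old_valid_dev(unsigned major, unsigned minor)
{
    return major < 256 && minor < 256;
}

static uint16_t demo_old_encode_dev(unsigned major, unsigned minor)
{
    return (uint16_t)((major << 8) | minor);
}

static void demo_old_decode_dev(uint16_t val, unsigned *major, unsigned *minor)
{
    *major = (val >> 8) & 0xff;
    *minor = val & 0xff;
}

/* A stat variant with a 16-bit field fails instead of silently truncating. */
static int demo_fill_old_stat(unsigned major, unsigned minor, uint16_t *st_dev)
{
    if (!demo_old_valid_dev(major, minor))
        return -EOVERFLOW;
    *st_dev = demo_old_encode_dev(major, minor);
    return 0;
}
```

A device number that does not fit the narrow field is rejected up front, so user space never sees a wrapped value.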
|
|
Changed sys_mknod() prototype to have unsigned int passed to it
instead of current dev_t. Added old_decode_dev() in sys_mknod() and
made sure that its callers are passing it old_encode_dev(<value>)
Switched sys_ustat() and its variants from dev_t to unsigned (and
added old_decode_dev()).
Took care of assignments to ROOT_DEV - again, old_decode_dev().
Late-boot search in devfs (call sys_newstat() and compare with
st_rdev) also updated.
|
|
From: Erik Andersen <andersen@codepoet.org>
When someone specifies "init=" to select an alternative binary to run
instead of /sbin/init, argv[0] is not set correctly. This is a problem for
programs such as busybox that multiplex applications based on the value of
argv[0]. For example, even if you specify "init=/bin/sh" on the kernel
command line, busybox will still receive "/sbin/init" as argv[0] and will
therefore run init rather than /bin/sh...
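Why argv[0] matters to a multiplexing binary can be sketched as below. This is a hypothetical dispatcher written for illustration, not busybox's actual code.

```c
#include <string.h>

/* Pick the applet name from argv[0]: everything after the last '/'. */
static const char *demo_applet_name(const char *argv0)
{
    const char *slash = strrchr(argv0, '/');

    return slash ? slash + 1 : argv0;
}
```

If the kernel always passes "/sbin/init", such a dispatcher selects "init" no matter which path was actually executed; passing the init= value lets it select "sh".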
|
|
Restores CONFIG_ACPI_HT_ONLY as an alternative to CONFIG_ACPI
rather than a prerequisite
|
|
Note that this restores CONFIG_ACPI_HT_ONLY as a sub-set of CONFIG_ACPI rather than a dependency.
|
|
Don't print the contents of the initramfs, for any decent sized cpio it will
overflow the kernel ring buffer.
Also relax permissions on /dev (755 not 700).
|
|
Set correct permissions on initramfs directories and special files. We don't
want to obey the umask here, so do the same thing we do on normal files -
call sys_chmod.
|
|
Thanks to Stephen Hemminger for pointing out how obsolete modules.txt is.
modules.txt contains mainly ancient information which is replicated
in the kconfig help message, README, makefile.txt or the modprobe manual
page. The only part which is not covered elsewhere is the "building
external modules" which is still being debated (and belongs under the
kbuild docs). kmod.txt reference removed from index, too.
|
|
From: Andre McCurdy <armcc2000@yahoo.com>
There is some inconsistency within lib/inflate.c and its users about
whether the error message text or the error() function should provide
the '\n'.
This patch tries to make everyone consistent - by removing the
newline from all message texts, and adding one to the only error()
function which did not provide it (in init/do_mounts_rd.c).
|
|
From: "Randy.Dunlap" <randy.dunlap@verizon.net>
The SuSE kernels place their ikconfig info at /proc/config.gz: in a
different place, and compressed. We thought it was a good idea to do it
that way in 2.6 as well.
- gzip the /proc config file, put it in /proc/config.gz;
- Based on a SuSE patch by Oliver Xymoron <oxymoron@waste.org>, which was
derived from a patch by Nicholas Leon <nicholas@binary9.net>
- change /proc/ikconfig/built_with to /proc/config_build_info;
- cleanup ikconfig init/exit entry points (static, __init, __exit);
- Makefile help from Sam Ravnborg;
DESC
ikconfig cleanup
EDESC
From: Stephen Hemminger <shemminger@osdl.org>
Simplify and cleanup the code:
- use single interface to seq_file where possible
- don't need to do as much of the /proc interface, only read
- use copy_to_user to avoid char at a time copy
- remove unnecessary globals
- use const char[] rather than const char * where possible.
Didn't change the version since interface doesn't change.
|
|
files.
|
|
forced into double negatives.
|
|
This makes "allyesconfig" do a better job.
|
|
- let more drivers that don't compile depend on BROKEN
- MTD_BLKMTD is fixed, remove the dependency on BROKEN
- let all drivers that don't compile on SMP (due to cli/sti usage)
depend on a BROKEN_ON_SMP that is only defined if !SMP || BROKEN
- #include interrupt.h for dummy cli/sti/... in two files to fix the
UP compilation of these files
I marked only drivers that are broken for a long time and where I don't
know about existing fixes with BROKEN or BROKEN_ON_SMP.
|
|
From: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com>
High Precision Event Timer (HPET) is next generation timer
hardware and has various advantages over legacy 8254
(PIT) timer, like:
- Associated registers are mapped to memory space. So, we no
longer require in and out on legacy ioports
- Memory map address is reported by ACPI (and are not
hard-coded)
- Each timer can be configured to generate separate interrupts,
even sharing lines with PCI devices
- HPET has a minimum period of 100 nanosecs and is not fixed,
giving the flexibility to increase the resolution in future.
- Most current implementations have 3 counters, but in future,
we can have as many as 32 timers per block, and 8
HPET timer blocks (total 256 timers)
- Can support 32bit and 64bit counting
(Refer to http://www.intel.com/labs/platcomp/hpet/hpetspec.htm
for complete specs)
The patchset that follows adds support for High Precision Event
Timer (HPET) based timer in kernel. This uses the HPET in
LegacyReplacement mode (so that counter 0 will be tied to IRQ0,
and counter 1 will be tied to IRQ 8). In this mode, HPET overrides
PIT and RTC interrupt lines. The patch will enable HPET by default,
on systems where ACPI tables report this feature. The patch will
have no impact on systems that do not support this feature.
A major change from previous version is elimination of fixmap for HPET.
Based on Andrew Morton's suggestion, we have a new hook in init/main.c for
late_time_init(), at which time we can use ioremap, in place of fixmap.
Impact on other archs: Calibrate_delay() (and hence loops_per_jiffy
calculation) has moved down in main.c, from after time_init() to after
kmem_cache_init().
1/6 - hpet1.patch - main.c change to introduce late_time_init()
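The late_time_init() hook amounts to an optional function pointer that is called once later boot services are available. A userspace sketch, with hypothetical `demo_` names and a flag standing in for the ioremap-based setup:

```c
#include <stddef.h>

/* NULL unless the architecture's time_init() installs a late hook. */
static void (*demo_late_time_init)(void);
static int demo_timer_mapped;

/* Stand-in for the part of timer setup that needs ioremap(). */
static void demo_hpet_late_init(void)
{
    demo_timer_mapped = 1;
}

/* Early time init: too early to map anything, so just register the hook. */
static void demo_time_init(void)
{
    demo_late_time_init = demo_hpet_late_init;
}

static void demo_start_kernel(void)
{
    demo_time_init();
    /* ... memory allocators come up here ... */
    if (demo_late_time_init)
        demo_late_time_init();
}
```

Architectures that leave the pointer NULL are unaffected, which fits the statement that the patch has no impact on systems without the feature.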
|
|
From: Andrea Arcangeli <andrea@suse.de>
aka: "vmalloc allocations in ipc needs smp initialized (and vm must be
allowed to schedule in 2.6)"
In short if you change SEMMNI to 8192 the kernel will crash at boot, because
it tries to call vmalloc before the smp is initialized. The reason is that
vmalloc calls into the pte alloc code, and the fast pte alloc is tried
first, but that reads into the pte_quicklist, that requires the cpu_data to
be initialized (and that happens in smp_init()).
the patch is obviously safe, since no piece of kernel (especially the code
in the check_bugs and smp_init paths ;) calls into the ipc subsystem.
The reason this started to trigger wasn't really that we increased SEMMNI,
but what happened is that some IPC data structure grew, and for some reason
the corruption due to the uninitialized pte_quicklist triggers only for smp
boxes with less than 1G (not very common anymore ;). So it wasn't
immediately reproducible on all setups.
2.6 doesn't suffer from the same problem, simply because 2.6 isn't using
the quicklist anymore, but I think it would be much more correct to make
the same change in 2.6 too, since any cond_resched() in the vm paths
(and they're definitely allowed to call it), will lead to a crash since the
init task isn't initialized and the scheduler can't be invoked yet. (and
2.6 already has the bigger data structures that should trigger the vmalloc
all the time on all setups)
|
|
- nmi_watchdog documentation typo ("Randy.Dunlap" <rddunlap@osdl.org>)
- ikconfig proc requires CONFIG_PROC_FS ("Randy.Dunlap" <rddunlap@osdl.org>)
- visws build fix (Andrey Panin <pazke@donpac.ru>)
- VM lock ranking comment update
|
|
This one snuck in...
- debugging message for ACPI
- Intel guys removed it from their 2.4 tree (at my request)
- it's point-in-time specific (message becomes nearly useless after
ACPI bug fixes)
- b/c of the point-in-time issue, it's IMO much more appropriate for a
vendor kernel (where the message, I agree, may be helpful)
- can potentially mislead users as to the correct cause of root mount failure
- overall, I disagree with adding messages like this. The number one
bug report, by far, for networking drivers is ACPI-related (no
interrupts delivered). You don't see me adding "boot with acpi=off"
messages to the net subsystem.
|
|
From: Sean Estabrooks <seanlkml@rogers.com>
- fix space at end of line in config files;
- add error check on put_user(); (Daniele Bellucci <bellucda@tiscali.it>)
- add missing Kconfig piece for ikconfig;
|
|
|
|
into home.osdl.org:/home/torvalds/v2.5/linux
|
|
When we changed try_name() to handle new-style printable dev_t formatting we
broke lots of people's setups. Lilo, grub, etc.
Fix that by trying new-style formatting first, then fall back to old-style.
People should generally use new-style %u:%u major:minor formatting in the
future.
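The "new style first, then old style" fallback can be sketched as a small parser. It assumes the old style is a hexadecimal number with the major in the high byte and the minor in the low byte; the function name is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse a root device string. Returns 0 on success, -1 on failure. */
static int demo_parse_dev(const char *s, unsigned *major, unsigned *minor)
{
    unsigned maj, min;
    char tail;
    char *end;
    unsigned long v;

    /* New style: decimal "major:minor" with nothing trailing. */
    if (sscanf(s, "%u:%u%c", &maj, &min, &tail) == 2) {
        *major = maj;
        *minor = min;
        return 0;
    }

    /* Fallback (assumed old style): hex number, e.g. "0801" for 8:1. */
    v = strtoul(s, &end, 16);
    if (end == s || *end != '\0' || v > 0xffff)
        return -1;
    *major = (unsigned)((v >> 8) & 0xff);
    *minor = (unsigned)(v & 0xff);
    return 0;
}
```

Both `8:1` and `0801` resolve to major 8, minor 1 under these assumptions, so existing boot loader configurations keep working.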
|
|
|
|
- Move SMP check to software_suspend() (from software_resume()), so we will
not even attempt to sleep with it enabled.
- Make software_resume() a late initcall, removing the explicit call from
prepare_namespace().
- Initialize software_suspend_enabled to 1, instead of doing it manually in
software_resume().
- Don't explicitly initialize resume_file.
- Remove resume_status variable, as we can simply check for (non-) NULL
resume_file string.
- "noresume" setup function changed to simply zero first byte of resume_file
string, simplifying logic.
- Don't attempt to reset swap signature if noresume is specified.
- Downstream function (bdev_write_page()) wasn't implemented anyway, so we
can just remove that also.
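The simplification around resume_file can be sketched as "an empty string means do not resume". The buffer size and the `demo_` names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Static storage starts zeroed, so no explicit initializer is needed. */
static char demo_resume_file[256];

/* "resume=<device>" handler: remember where the image lives. */
static void demo_resume_setup(const char *arg)
{
    snprintf(demo_resume_file, sizeof(demo_resume_file), "%s", arg);
}

/* "noresume" handler: zero the first byte instead of keeping a status flag. */
static void demo_noresume_setup(void)
{
    demo_resume_file[0] = '\0';
}

static bool demo_should_resume(void)
{
    return demo_resume_file[0] != '\0';
}
```

One string carries both the device name and the "enabled" state, so no separate status variable is required.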
If noresume is specified, there will still be a suspend image left on the
swap partition. It may behoove us to never reset the swap signature, and
always leave the image intact on the disk, since it is a valid snapshot
that we can resume from at anytime.
This unconditional behavior would force the user to add 'mkswap <part>' to
their init scripts to reset the partition to swap use. IMO, this is better
anyway.
|
|
dmi_scan.c: delete some incomplete code that broke !SMP + APIC build; add ACPI blacklist comment,
move __i386__ out of do_mounts.c and into create mount_root_failed_msg()
|
|
|
|
build: add ACPI_HT, delete ACPI_HT_ONLY
boot: add acpi={force, off, ht}; delete "noht", "acpismp="
add DMI blacklist from UnitedLinux
|
|
This function tries to allocate increasingly large buffers, but it gets the
bounds wrong by a factor of PAGE_SIZE. It causes boot-time devfs mounting
to fail.
|
|
From: Greg KH <greg@kroah.com>
Different architectures use different types for dev_t, so it is hard to
print dev_t variables out correctly. Quite a lot of code is wrong now, and
will continue to be wrong when 64-bit dev_t is merged.
Greg's patch introduces a little wrapper function which can be used to
safely form a dev_t for printing. I added the format_dev_t function as
well, which is needed for direct insertion in a printk statement.
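A sketch of the formatting helper idea: always convert to fixed-width unsigned values and print through one function, so callers never need to know the underlying dev_t type. The signature below is hypothetical and takes the major and minor already extracted.

```c
#include <stddef.h>
#include <stdio.h>

/* Format a device number as "major:minor" into a caller-supplied buffer
 * and return the buffer, so the call can sit directly in an argument list. */
static const char *demo_format_dev_t(char *buf, size_t len,
                                     unsigned major, unsigned minor)
{
    snprintf(buf, len, "%u:%u", major, minor);
    return buf;
}
```

Because the helper returns the buffer, it can be passed straight to a `%s` conversion in a print statement.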
|
|
Add kconfig options to allow excluding either or both the I/O
schedulers. This can be useful for embedded systems (saves about ~13KB).
All schedulers are enabled by default for non-embedded.
|
|
This causes blk.h to print a warning and removes all uses of blk.h.
I've tested the compilation in 2.6.0-test1 with a .config that tries to
compile as many drivers as possible.
|
|
|
|
From: Diego Calleja Garcia <diegocg@teleline.es>
Move CONFIG_KALLSYMS out of the arch directory and into init/.
It defaults to "on" unless the user explicitly turns it off in the
"embedded systems" menu.
|
|
From: bert hubert <ahu@ds9a.nl>
Attached patch adds a range check to LOG_BUF_SHIFT and clarifies the
configuration somewhat. I managed to build a non-booting kernel because I
thought 64 was a nice power of two, which led to the kernel blocking when
it tried to actually use or allocate a 2^64 buffer.
|
|
Declare the parameter array as an array, rather than a single entry.
This doesn't matter for code generation, but may be less likely to cause
problems down the line, since we're telling gcc more about the real
situation.
|
|
As discussed before, this allows for early initialization of security
modules when compiled statically into the kernel. The standard
do_initcalls is too late for complete coverage of all filesystems and
threads, for example.
|
|
Trivial patch: when these were introduced cpu.h didn't exist.
|
|
From: Matthew Dobson <colpatch@us.ibm.com>
This resurrects the old /proc/sys/vm/free_pages functionality: the ability to
tell page reclaim how much free memory to maintain.
This may be needed for specialised networking applications, and it provides
an interesting way to stress the kernel: set it very low so atomic
allocations can easily fail.
Also, a 16G ppc64 box currently cruises along at 1M free memory, which is
surely too little to support high-speed networking. We have not changed that
setting here, but it is now possible to do so.
The patch also reduces the amount of free memory which the VM will maintain
in ZONE_HIGHMEM, as it is almost always wasted memory.
|
|
From: Rusty Russell <rusty@rustcorp.com.au>
OK, this does the *minimum* required to support DEFINE_PER_CPU inside
modules. If we decide to change kmalloc_percpu later, great, we can turf
this out.
Basically, overallocates the amount of per-cpu data at boot to at least
PERCPU_ENOUGH_ROOM if CONFIG_MODULES=y (arch-specific by default 32k: I have
only 7744 bytes of percpu data in my kernel here, so makes sense), and a
special allocator in module.c dishes it out.
|
|
Use strlcpy in root_dev_setup()
|
|
From: Hollis Blanchard <hollis@austin.ibm.com>
Another potential memory leak the Stanford checker caught at 2.5.48: while
closing and opening floppy disks, buf could be allocated and never freed.
|
|
From: Christopher Hoover <ch@murgatroid.com>
Here's a patch to drop some more text/data/bss out of 2.5. This time
the ``victim'' is eventpollfs (epoll).
|
|
From: Christopher Hoover <ch@murgatroid.com>
Not everyone needs futex support, so it should be optional. This is needed
for small platforms.
|
|
|
|
From: Manfred Spraul <manfred@colorfullife.com>
attached is the promised cleanup/bugfix patch for the slab bootstrap:
- kmem_cache_init & kmem_cache_sizes_init merged into one function,
called after mem_init(). It's impossible to bring slab to an operational
state without working gfp, thus the early partial initialization is not
necessary.
- g_cpucache_up set to FULL at the end of kmem_cache_init instead of the
module init call. This is a bugfix: slab was completely initialized,
just the update of the state was missing.
- some documentation for the bootstrap added.
The minimal fix for the bug is a two-liner: move g_cpucache_up=FULL from
cpucache_init to kmem_cache_sizes_init, but I want to get rid of
kmem_cache_sizes_init, too.
|
|
but the build was confused by the fact that they did share some files.
Move INITRD code from do_mounts_rd.c to new file do_mounts_initrd.c.
|
|
This is a patch from Robert P.J. Day that replaces www.linuxdoc.org
(which is outdated and unsupported according to www.tldp.org)
with www.tldp.org in lots of Kconfig files.
|
|
split the initrd stuff out of blk.h, it's only needed in the boot code
and the ramdisk driver.
|
|
Now that sparc64 is using gcc-3.x we can disallow gcc-2.91, etc.
Documentation/Changes already says 2.95.3, which is working fine for me.
With this change, we no longer require that per-cpu data definitions be
initialised. That was a workaround for a bug in older gccs. So remove the
build infrastructure which was checking for that.
Also, mention that nfs-utils-1.0.3 is required. It isn't required yet, but
will be once we enable larger dev_t: there is an interface for exportfs which
passes dev_t's into the kernel which breaks with larger dev_t. That
interface is old, deprecated and is not used in nfs-utils-1.0.3.
|
|
The patch is designed to help locate where the kernel is dying during the
startup sequence.
- Boot parameter "initcall_debug" causes the kernel to print out the
address of each initcall before calling it.
The kallsyms tables do not cover __init sections, so printing the
symbolic version of these symbols doesn't work. They need to be looked up
in System.map.
- Detect whether an initcall returns with interrupts disabled or with a
locking imbalance. If it does, complain and then try to fix it up.
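The two checks described above can be sketched as a loop over a table of init functions. Interrupt state is simulated with a flag, the loop prints an index where the kernel would print an address, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef int (*demo_initcall_t)(void);

static bool demo_irqs_disabled;   /* simulated interrupt state */
static int demo_fixups;           /* how many times we had to repair it */

static int demo_good_initcall(void)
{
    return 0;
}

static int demo_bad_initcall(void)
{
    demo_irqs_disabled = true;    /* "forgets" to re-enable interrupts */
    return 0;
}

static void demo_do_initcalls(const demo_initcall_t *calls, size_t n,
                              bool debug)
{
    for (size_t i = 0; i < n; i++) {
        if (debug)
            printf("calling initcall %zu\n", i);   /* print before calling */
        calls[i]();
        if (demo_irqs_disabled) {
            printf("initcall %zu returned with interrupts disabled\n", i);
            demo_irqs_disabled = false;             /* try to fix it up */
            demo_fixups++;
        }
    }
}
```

Printing before each call means the last line on the console identifies the initcall that hung, and the post-call check pins an imbalance on the call that caused it.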
|
|
bdevname returns a pointer to a static string. Change it so that the caller
passes in the buffer.
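The interface change can be sketched as follows: the caller owns the buffer, so two results can be live at once and nothing is shared between callers. The buffer size, output format and names are assumptions for illustration.

```c
#include <stdio.h>

#define DEMO_BDEVNAME_SIZE 32   /* assumed buffer size for this sketch */

/* Writes the name into the caller's buffer (at least DEMO_BDEVNAME_SIZE
 * bytes) and returns it; there is no hidden static storage. */
static const char *demo_bdevname(unsigned major, unsigned minor, char *buf)
{
    snprintf(buf, DEMO_BDEVNAME_SIZE, "dev(%u,%u)", major, minor);
    return buf;
}
```

With a static buffer, a second call would overwrite the string returned by the first; with caller-supplied buffers both remain valid.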
|
|
Patch from Anders Gustafsson <andersg@0x63.nu>
We're getting a division-by-zero in the writeback code during early rootfs
population, because writeback has not yet been initialised.
Fix that by performing an explicit initialisation rather than relying on
initcall ordering.
|
|
|
|
set. A few versions of gcc-2.96 generate seriously incorrect code.
|
|
|
|
Patch from Tom Rini <trini@kernel.crashing.org>
Take CONFIG_SWAP out of the top-level menu into the general setup menu. Make
it dependent on CONFIG_MMU and common to all architectures.
|
|
Added a new rule filechk used to check when a generated file
actually is changed. If there is no actual changes the file
is left without updating the timestamp.
When building a kernel from scratch two printouts occur:
CHK file-to-generate
UPD file-to-generate
The first line tells that kbuild checks the file, the second line tells that
the file is being updated (or created).
On successive runs only the first line is printed.
Output is the same in verbose and non-verbose mode.
This replaces the former update-if-changed which has been deleted.
generate-asm-offsets.h has been renamed as well.
All users are updated in next patch.
Output when generating compile.h follows the above style
|
|
Russell King investigated a failure case I introduced: When booting
with "load_ramdisk=1", we use the kernel root= parameter to determine
from what device to get the contents to copy into a ramdisk and then
mount that ramdisk as root.
For the copy to work, /dev/root needs to point to the device to load the
ramdisk from.
|
|
- Call driver_init() from init/main.c::do_basic_setup().
This ensures that all the driver model subsystems are initialized before
any drivers or devices can be registered.
It nearly frees up the core and postcore initcall levels, making them
available for other kernel code to use freely.
|
|
We don't really have a nice way to say "compile this when CONFIG_FOO
is y, don't otherwise".
Alternatives are:
obj-$(subst m,,$(CONFIG_FOO)) := foo.o
or
obj-$(CONFIG_FOO) := foo.o
obj-m :=
or
obj-y := do_foo.o
do_foo-$(CONFIG_FOO) := foo.o
I chose the last one, though I'm not particularly happy with any of them.
|
|
into home.transmeta.com:/home/torvalds/v2.5/linux
|
|
into tp1.ruhr-uni-bochum.de:/scratch/kai/kernel/v2.5/linux-2.5.do_mounts
|
|
For some people (though not me), the '+' indicating that a command will
invoke a sub-make didn't propagate properly, and caused a warning.
Putting the whole command on one line should fix that.
Plus some cosmetics and clean up the per_cpu check.
|
|
use bdevname() for block devices.
|
|
The only path where the /dev/root created by prepare_namespace() wouldn't
be overwritten by a later create_dev() call is mount_nfs_root(), so move
it there, so that all the create_dev() calls are now next to the
actual mount. Also simplify mount_root() a little.
|
|
Small savings, but somewhat nicer anyway.
|
|
There's still an #ifdef in there for CONFIG_BLK_DEV_INITRD, but at
some point it just gets too many files, so accept that for now.
There's also a #define BUILD_CRAMDISK in there, which should be either
made a config option or removed...
|
|
The mount_initrd check can be moved into initrd_load(), so that we
have all initrd code consolidated in one #ifdef'd section now.
|
|
Currently, we would try to read in a ramdisk image from floppy, if
o root device is /dev/fd*
o "load_ramdisk=1" on the command line
o CONFIG_BLK_DEV_INITRD is not set, or "noinitrd" on the command line.
Relax the last restriction, which only makes things more complicated for
no reason, and changes behavior depending on an unrelated config option.
|
|
This just moves a bit of logic out of prepare_namespace() into
initrd_load(), which means that we now don't reference handle_initrd()
anymore if CONFIG_BLK_DEV_INITRD is not set (we wouldn't call it anyway),
so we can put some bits of code under a common #ifdef CONFIG_BLK_DEV_INITRD.
|
|
When CONFIG_BLK_DEV_INITRD is not set, mount_initrd will always be
0, so initrd_load() won't be called. However, we need a stub to make
the compiler happy.
|
|
If initrd_start is 0, initrd_load() -> rd_load_image() -> open("/dev/initrd")
will fail anyway, so no need for the explicit check here.
|
|
As we all know, strncmp() returns 0 for match...
Obviously nobody uses this codepath since it never worked, but let's fix
it anyway.
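The affected call site is not shown here, so the following is only a sketch of the general pitfall: `strncmp()` returns 0 on a match, and a test written the other way round silently inverts the logic. Both helper names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if `str` starts with `prefix`, 0 otherwise. */
static int has_prefix(const char *str, const char *prefix)
{
    assert(str && prefix);
    /* strncmp() returns 0 on a match, so compare the result with 0. */
    return strncmp(str, prefix, strlen(prefix)) == 0;
}

/* The buggy form: treats a nonzero (mismatch) result as "matched". */
static int has_prefix_buggy(const char *str, const char *prefix)
{
    assert(str && prefix);
    return strncmp(str, prefix, strlen(prefix)) != 0;
}
```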
|
|
Again, just get rid of some #ifdefs by moving MD setup into
its own file which is only compiled when CONFIG_BLK_DEV_MD is set.
|
|
o prepare_namespace() is called before free_initmem(), so it can
be __init.
o all static data in do_mounts.c can be __initdata for the same reason.
o move the __init into its standard location between return value
and function name.
o root_device_name can be a char *.
|
|
Get rid of a couple scattered #ifdef CONFIG_DEVFS_FS in init/do_mounts.c
by moving the devfs code into its own file and using stubs when it's
not selected.
|
|
Since we'll always have to do module postprocessing shortly, we may as well
get rid of the special cased init/vermagic.o which needed to be compiled
before descending, and instead include the current version magic string
during post processing. For that purpose, the generation of the string is
moved from init/vermagic.c to include/linux/vermagic.h.
People who externally maintain modules will also be happy about that.
|
|
The Stanford Checker identified a memory leak in init/do_mounts.c.
This corrects it.
|
|
CONFIG_MODVERSIONING was a temporary name introduced to distinguish
between the old and new module version implementation. Since the
traces of the old implementation are now gone from the build system,
we rename the config option back in order to not confuse users more
than necessary in 2.6.
Also, remove some historic modversions cruft throughout the tree.
|
|
This patch adds the new config option CONFIG_MODVERSIONING which will
be the new way of checking for ABI changes between kernel and module
code.
This and the following patches are in part based on an initial
implementation by Rusty Russell, and I believe some of the ideas go back
to discussions on linux-kbuild with Keith Owens and Rusty.
Though I'm not sure, I think credit for the basic idea of
storing version info in sections goes to Keith Owens and Rusty.
o Rename __gpl_ksymtab to __ksymtab_gpl since that looks more consistent
and appending _gpl instead of putting it into the middle simplifies
sharing code for EXPORT_SYMBOL() and EXPORT_SYMBOL_GPL()
o Add CONFIG_MODVERSIONING
o If CONFIG_MODVERSIONING is set, add a section __kcrctab{,_gpl}, which
contains the ABI checksums for the exported symbols listed in
__ksymtab{,_crc}. Since we don't know the checksums yet at compilation
time, just make them an unresolved symbol which gets filled in by the
linker later.
|
|
Patch from Michael Hohnbaum
This adds a hook, sched_balance_exec(), to the exec code, to make it
place the exec'ed task on the least loaded queue. We have less state
to move at exec time than fork time, so this is the cheapest point
to cross-node migrate. Experience in Dynix/PTX and testing on Linux
has confirmed that this is the cheapest time to move tasks between nodes.
It also macro-wraps changes to nr_running, to allow us to keep track of
per-node nr_running as well. Again, no impact on non-NUMA machines.
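The placement decision reduces to picking the queue with the fewest runnable tasks. A minimal sketch, assuming a plain array of per-queue `nr_running` counts; the helper name `least_loaded` is hypothetical and the real scheduler hook does considerably more.

```c
#include <assert.h>

/* Return the index of the run queue with the fewest runnable tasks. */
static int least_loaded(const unsigned long *nr_running, int nqueues)
{
    int best = 0;

    assert(nr_running && nqueues > 0);
    for (int i = 1; i < nqueues; i++)
        if (nr_running[i] < nr_running[best])
            best = i;   /* strictly smaller: ties keep the earlier queue */
    return best;
}
```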
|
|
The current LOG_BUF size is a bit confusing the first
time that "make oldconfig" is used. It's difficult to
select anything other than the default value.
Also, you (Linus) expressed a desire to have this
configurable only if DEBUG_KERNEL or "kernel hacking"
was enabled, so I've changed it to accomplish that.
This patch also uses Kconfig in a way that Roman intended
since a patch in 2.5.52 which enables default values if
a prompt is not enabled, but lets values be chosen when
the prompt is enabled. You also asked for this in setting
this config option.
|
|
This patch, based on Rusty's implementation, adds a special section
to vmlinux and all modules, which contains the kernel version
string, values of some particularly important config options
(SMP,preempt,proc family) and the gcc version.
When inserting a module, the version string is checked against the
kernel version string and loading is rejected if they don't match.
The version string is actually added to the modules during the
final .ko generation, so that a changed version string only
causes relinking, not recompilation, which is a major performance
improvement over the old 2.4 way of doing things.
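The load-time check amounts to a string comparison. A minimal sketch, in which the magic string and the function name `check_vermagic` are made up for illustration:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical kernel-side version magic for this sketch. */
static const char kernel_vermagic[] = "2.5.59 SMP preempt PENTIUMIII gcc-3.2";

/* Returns 0 if the module may load, -1 if the version strings differ. */
static int check_vermagic(const char *module_vermagic)
{
    assert(module_vermagic);
    return strcmp(module_vermagic, kernel_vermagic) == 0 ? 0 : -1;
}
```

Any difference in the encoded options, such as a module built without SMP, changes the string and is therefore enough to reject the module.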
|
|
From: Adrian Bunk <bunk@fs.tum.de>
|
|
Remove unused prototype for init_modules()
|
|
This patch combines the common exception table searching functionality
for various architectures, to avoid unnecessary (and currently buggy)
duplication, and so that the exception table list and lock can be kept
private to module.c.
The archs provide "struct exception_table" and "search_extable": the
generic infrastructure drives the rest.
|
|
Patch from Bill Irwin. Prodding from me.
The hashtables in kernel/pid.c are 128 kbytes, which is far too large for
very small machines.
So we dynamically size them and allocate them from bootmem. From 16 buckets
on the very smallest machine up to 4096 buckets (effectively half the current
size) with one gigabyte of memory or more.
The patch also switches the hashing from a custom hash over to the more
powerful hash_long().
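A sketch of sizing the table from available memory. Only the endpoints (16 buckets at the small end, 4096 at one gigabyte or more) come from the description above; the scaling rule in between and the function names are assumptions.

```c
#include <assert.h>

/* 1-based position of the highest set bit; 0 when x is 0. */
static int highest_bit(unsigned long x)
{
    int bit = 0;

    while (x) {
        bit++;
        x >>= 1;
    }
    return bit;
}

/* Bucket count for a machine with `megabytes` of memory: 16 to 4096. */
static unsigned long pidhash_buckets(unsigned long megabytes)
{
    int shift = highest_bit(megabytes) + 1;  /* assumed scaling rule */

    if (shift < 4)
        shift = 4;      /* floor: 16 buckets */
    if (shift > 12)
        shift = 12;     /* ceiling: 4096 buckets */
    assert(shift >= 4 && shift <= 12);
    return 1UL << shift;
}
```

Keeping the count a power of two lets a bucket index be taken from a fixed number of bits of the hash value.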
|
|
Patch from Adam J. Richter <adam@yggdrasil.com> and
Milton Miller <miltonm@bga.com>
There's some init-time code which is supposed to read a devfs directory by
expanding the buffer until the whole directory fits. But the logic is wrong
and it only works if the whole directory fits into 512 bytes.
So fix that up, and also clean up some coding in there, and rationalise the
duplicated definition of linux_dirent64.
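The intended "grow until it fits" loop can be sketched as below. `read_all` is a hypothetical stand-in for the directory-reading call and simply reports failure when the buffer is too small; the real code reads directory entries through a syscall.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical reader: copies the whole "directory" into buf and
 * returns its length, or -1 if buf is too small.
 */
static long read_all(const char *dir, size_t dir_len, char *buf, size_t size)
{
    if (size < dir_len)
        return -1;
    memcpy(buf, dir, dir_len);
    return (long)dir_len;
}

/* Double the buffer until the whole directory fits; caller frees *out. */
static long read_dir_growing(const char *dir, size_t dir_len, char **out)
{
    size_t size = 512;
    char *buf = NULL;

    assert(out != NULL);
    for (;;) {
        char *tmp = realloc(buf, size);
        long n;

        if (!tmp) {
            free(buf);
            return -1;
        }
        buf = tmp;
        n = read_all(dir, dir_len, buf, size);
        if (n >= 0) {
            *out = buf;
            return n;
        }
        size *= 2;      /* did not fit: retry with a bigger buffer */
    }
}
```

The important property is that the read is retried from scratch with the larger buffer, so a directory bigger than the initial 512 bytes is still returned whole.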
|
|
Patch from "Randy.Dunlap" <rddunlap@osdl.org>
Convert the selection of LOG_BUF_SIZE from an ifdef tangle in
printk.c into config logic.
|
|
|
|
George Anzinger identified the following problem: when a secondary CPU is
coming up, it calls printk() before it is "online". It calls the console
drivers before its per-cpu storage has been prepared. And the vga console
driver does a mod_timer(). This CPU's timers have not yet been initialised;
it is not clear why this doesn't oops - George thinks it is because virtual
address zero is still accessible at that time.
I believe the right way to fix this is to change printk so that a not-online
CPU will not call the console drivers, because printk should always be
callable. If the CPU is not online the message is buffered, so the next
caller to printk who is online will actually display it.
ia64 has been doing exactly this for ages, so we can remove the
arch_consoles_callable() hook and just open-code the cpu_online() test in
printk.
That fixes things up for the secondary CPUs. But this change causes a
problem for the boot CPU: it is being marked online very late in boot, so the
printk buffer is being displayed much later than we would like.
I believe that the solution to this is to mark the boot CPU online much
earlier. So in this patch we call the new arch-provided function
smp_prepare_boot_cpu() immediately after the boot CPU's per-cpu areas are set
up. Its mandate is to (at least) mark the boot CPU "online".
The change has been reviewed by davem and rth. No comments were received
from the other arch maintainers.
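The buffering behaviour can be sketched in userspace as follows. The names `sketch_printk`, `log_buf` and `console` are hypothetical; the real printk deals with locking, log levels and a ring buffer that this sketch ignores.

```c
#include <assert.h>
#include <string.h>

#define LOG_SIZE 256

static char log_buf[LOG_SIZE];  /* pending text, not yet displayed */
static size_t log_len;
static char console[LOG_SIZE];  /* what the "console driver" has shown */

/* Append msg to the log; flush to the console only if this CPU is online. */
static void sketch_printk(int cpu_is_online, const char *msg)
{
    size_t n = strlen(msg);

    assert(log_len + n < LOG_SIZE);
    memcpy(log_buf + log_len, msg, n);
    log_len += n;
    if (!cpu_is_online)
        return;                 /* buffered for the next online caller */
    assert(strlen(console) + log_len < LOG_SIZE);
    strncat(console, log_buf, log_len);
    log_len = 0;
}
```

A message logged by a CPU that is not yet online stays in the buffer and appears, in order, the next time an online CPU logs anything.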
|
|
Restore the accidentally dropped code to handle "init=xxx"
|
|
This is the backwards compatibility code for MODULE_PARM, and moves
__MODULE_STRING() down to the graveyard at the bottom of module.h.
It's complicated by the fact that many modules place MODULE_PARM()
before the declaration (some do MODULE_PARM() for non-existent
variables, too). To avoid breaking them, we have to do the name
lookups at load time, rather than just storing a pointer 8(
CONFIG_OBSOLETE_MODPARM is set to y without prompting: it's a useful
marker for deprecating in 2.7.
|
|
This patch is a rewrite of the insmod and boot parameter handling,
to unify them.
The new format is fairly simple: built on top of __module_param_call there
are several helpers, eg "module_param(foo, int, 000)". The final argument
is the permissions bits, for exposing parameters in sysfs (if
non-zero) at a later stage.
|
|
Moves console_loglevel & friends to an array, as sysctl expects.
|
|
Makefiles no longer need to include Rules.make, which is currently an
empty file. This patch removes it from the remaining Makefiles, and
removes the empty Rules.make file.
|
|
Rewrite of the s390 channel subsystem driver for the new driver model
The channel subsystem driver a.k.a s390 common I/O layer is the low level
driver for most device drivers on s390 systems. The old code is largely
unchanged from the initial linux-2.2 port and there is a lot of bitrot
on it.
In particular, concepts from the 2.5 driver model are implemented in a
completely different and more complicated way here.
This rewrite tries to get the driver ready for 2.6. The new interface is
not compatible with the old one but should be rather stable now unless
someone finds major flaws.
The 's390dyn' and 'chandev' interfaces have been removed entirely (yippii!)
and are replaced by hotplug and sysfs interfaces.
Authors: Arnd Bergmann <arndb@de.ibm.com>,
Cornelia Huck <cohuck@de.ibm.com>,
Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
fixes warnings in acct.h and do_mounts.c
|
|
|
|
into kroah.com:/home/greg/linux/BK/lsm-2.5
|
|
init/do_mounts.c is using the BLKGETSIZE ioctl which expects a pointer to
an unsigned long but actually it passes a pointer to an int which of
course is blowing up on 64-bit systems.
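A sketch of the corrected calling pattern. `fake_blkgetsize` is a hypothetical stand-in for the ioctl so that the example runs without a block device, and the sector count is made up.

```c
#include <assert.h>

/* Hypothetical stand-in for the ioctl: stores an unsigned long via arg. */
static int fake_blkgetsize(void *arg)
{
    assert(arg);
    *(unsigned long *)arg = 2880UL;
    return 0;
}

/* Correct caller: the variable has the type the callee writes. */
static unsigned long device_sectors(void)
{
    unsigned long size = 0;     /* not int: the callee stores an unsigned long */

    if (fake_blkgetsize(&size) != 0)
        return 0;
    return size;
}
```

Where `unsigned long` is wider than `int`, passing the address of an `int` would let the callee write past the end of the variable, which is why the mismatch only showed up on 64-bit systems.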
|
|
|
|
|
|
Don't include the following headers implicitly through fs.h:
stddef.h, string.h, bitops.h, pipe_fs_i.h, ext3_fs_i.h, efs_fs_i.h
and fix up the fallout.
|
|
since they are needed for early arch initialization.
Thanks to Manfred for pointing this out.
|
|
This is the logical counterpoint to the code which marks modules
"[unsafe]" when obsolete (racy) interfaces are used. Allows "just
remove the damn thing" rmmod -f, and taints the kernel.
Mark it dangerous and experimental in the config file to make this
doubly clear.
|
|
This is a preparation to get rid of the implicit includes in
dcache.h and fs_struct.h.
|
|
Grrr... Two bugs in a patch that had moved md setup to late boot:
a) we need md_run_setup() run before parsing root name.
b) it's create_dev("/dev/md0",...), not create_dev("md0",...) ;-/
|
|
This is an implementation of the in-kernel module loader extending
the try_inc_mod_count() primitive and making its use compulsory.
This has the benefit of simplicity, and similarity to the existing
scheme. To reduce the cost of the constant increments and
decrements, reference counters are lockless and per-cpu.
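The per-cpu counting idea can be sketched as below, single-threaded and with an assumed CPU count. The structure and function names are hypothetical, and the sketch does not model how the total is read safely while other CPUs are running.

```c
#include <assert.h>

#define NR_CPUS 4   /* assumed CPU count for this sketch */

/* One counter slot per CPU; each CPU only ever touches its own slot. */
struct percpu_ref {
    long count[NR_CPUS];
};

static void ref_get(struct percpu_ref *r, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    r->count[cpu]++;
}

static void ref_put(struct percpu_ref *r, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    r->count[cpu]--;
}

/* The true count is the sum over all CPUs; single slots may go negative. */
static long ref_total(const struct percpu_ref *r)
{
    long sum = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += r->count[cpu];
    return sum;
}
```

Since a reference may be taken on one CPU and dropped on another, only the sum is meaningful; no slot is shared, so the common get/put path needs no lock.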
Eliminated (coming in following patches):
o Modversions
o Module parameters
o kallsyms
o EXPORT_SYMBOL_GPL and MODULE_LICENCE checks
o DEVICE_TABLE support.
New features:
o Typesafe symbol_get/symbol_put
o Single "insert this module" syscall interface allows trivial userspace.
o Raceless loading and unloading
You will need the trivial replacement module utilities from:
http://ozlabs.org/~rusty/module-init-tools-0.6.tar.gz
|
|
RAID autoconfig rewritten to use syscalls and moved into do_mounts.c;
use of devfs_get_handle() in do_mounts.c also rewritten in syscalls.
|
|
|
|
into redhat.com:/home/jgarzik/repo/minitramfs-2.5
|
|
(thanks to Al Viro for his patience, I owe him one)
|
|
|
|
Without the below patch, my HT 2-way prints out
"CPUS Done 4294967295" on boot, which whilst amusing
is somewhat exaggerated.
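The number printed is what -1 looks like when formatted as an unsigned 32-bit value. Whether that is exactly what happened in the boot code is an assumption; the sketch below only demonstrates the conversion, using a hypothetical helper.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/*
 * Format a signed value as an unsigned 32-bit number.
 * Returns the number of characters snprintf() produced.
 */
static int format_as_u32(int32_t value, char *out, size_t size)
{
    assert(out && size > 0);
    return snprintf(out, size, "%" PRIu32, (uint32_t)value);
}
```

Passing -1 yields the ten-digit string 4294967295, the same figure as in the boot message.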
|
|
Also, update path to look for devices in to reflect placement of block
subsystem at top level.
|