author    davem <davem>  2001-12-13 12:34:08 +0000
committer davem <davem>  2001-12-13 12:34:08 +0000
commit    426a063260c2afa557be2256f3f7cb07acc0f2b4
tree      44c6bd97b331323db803d203746803ce4f07f595
parent    6cd01a554443baeb90df209d3e279142aa1709da
download  netdev-vger-cvs-426a063260c2afa557be2256f3f7cb07acc0f2b4.tar.gz
Quick merge to 2.5.1-pre11
-rw-r--r--  Documentation/driver-model.txt | 598
-rw-r--r--  Documentation/filesystems/driverfs.txt | 211
-rw-r--r--  Makefile | 2
-rw-r--r--  arch/i386/lib/iodebug.c | 8
-rw-r--r--  drivers/block/cciss.c | 6
-rw-r--r--  drivers/block/cciss.h | 3
-rw-r--r--  drivers/block/cpqarray.c | 6
-rw-r--r--  drivers/block/cpqarray.h | 3
-rw-r--r--  drivers/block/floppy.c | 64
-rw-r--r--  drivers/block/ll_rw_blk.c | 33
-rw-r--r--  drivers/block/nbd.c | 12
-rw-r--r--  drivers/block/paride/pcd.c | 20
-rw-r--r--  drivers/block/paride/pf.c | 38
-rw-r--r--  drivers/block/ps2esdi.c | 2
-rw-r--r--  drivers/block/rd.c | 473
-rw-r--r--  drivers/ide/ide-probe.c | 2
-rw-r--r--  drivers/ide/ide.c | 24
-rw-r--r--  drivers/md/linear.c | 2
-rw-r--r--  drivers/md/md.c | 281
-rw-r--r--  drivers/md/raid0.c | 2
-rw-r--r--  drivers/md/raid1.c | 1383
-rw-r--r--  drivers/net/tulip/ChangeLog | 5
-rw-r--r--  drivers/net/tulip/eeprom.c | 17
-rw-r--r--  drivers/net/tulip/media.c | 37
-rw-r--r--  drivers/net/tulip/timer.c | 54
-rw-r--r--  drivers/net/tulip/tulip_core.c | 119
-rw-r--r--  drivers/scsi/eata.c | 24
-rw-r--r--  drivers/scsi/eata.h | 2
-rw-r--r--  drivers/scsi/scsi.c | 2
-rw-r--r--  drivers/scsi/scsi_error.c | 2
-rw-r--r--  drivers/scsi/scsi_lib.c | 34
-rw-r--r--  drivers/scsi/scsi_queue.c | 5
-rw-r--r--  drivers/scsi/u14-34f.c | 32
-rw-r--r--  drivers/scsi/u14-34f.h | 2
-rw-r--r--  fs/bio.c | 2
-rw-r--r--  fs/block_dev.c | 1
-rw-r--r--  fs/buffer.c | 6
-rw-r--r--  include/asm-i386/io.h | 3
-rw-r--r--  include/asm-s390/io.h | 2
-rw-r--r--  include/asm-s390x/io.h | 2
-rw-r--r--  include/linux/blkdev.h | 5
-rw-r--r--  include/linux/devfs_fs_kernel.h | 8
-rw-r--r--  include/linux/ide.h | 1
-rw-r--r--  include/linux/mempool.h | 1
-rw-r--r--  include/linux/nbd.h | 4
-rw-r--r--  include/linux/raid/md.h | 6
-rw-r--r--  include/linux/raid/md_compatible.h | 158
-rw-r--r--  include/linux/raid/md_k.h | 26
-rw-r--r--  include/linux/raid/raid1.h | 72
-rw-r--r--  init/do_mounts.c | 781
-rw-r--r--  mm/memory.c | 4
-rw-r--r--  mm/mempool.c | 76
52 files changed, 2452 insertions, 2214 deletions
diff --git a/Documentation/driver-model.txt b/Documentation/driver-model.txt
new file mode 100644
index 000000000..f77e051f0
--- /dev/null
+++ b/Documentation/driver-model.txt
@@ -0,0 +1,598 @@
+The (New) Linux Kernel Driver Model
+
+Version 0.04
+
+Patrick Mochel <mochel@osdl.org>
+
+03 December 2001
+
+
+Overview
+~~~~~~~~
+
+This driver model is a unification of all the disparate driver models
+currently in the kernel. It is intended to augment the bus-specific
+drivers for bridges and devices by consolidating a set of data and
+operations into globally accessible data structures.
+
+Current driver models implement some sort of tree-like structure (sometimes
+just a list) for the devices they control. But, there is no linkage between
+the different bus types.
+
+A common data structure can provide this linkage with little overhead: when a
+bus driver discovers a particular device, it can insert it into the global
+tree as well as its local tree. In fact, the local tree becomes just a subset
+of the global tree.
+
+Common data fields can also be moved out of the local bus models into the
+global model. Some of the manipulation of these fields can also be
+consolidated. Most likely, manipulation functions will become a set
+of helper functions, which the bus drivers wrap around to include any
+bus-specific items.
+
+The common device and bridge interface currently reflects the goals of the
+modern PC: namely the ability to do seamless Plug and Play, power management,
+and hot plug. (The model dictated by Intel and Microsoft (read: ACPI) assures
+us that any device in the system may fit any of these criteria.)
+
+In reality, not every bus will be able to support such operations. But, most
+buses will support a majority of those operations, and all future buses will.
+In other words, a bus that doesn't support an operation is the exception,
+instead of the other way around.
+
+
+Drivers
+~~~~~~~
+
+The callbacks for bridges and devices are intended to be singular for a
+particular type of bus. For each type of bus that has support compiled into
+the kernel, there should be one statically allocated structure with the
+appropriate callbacks that all devices (or bridges) of that type share.
+
+Each bus layer should implement the callbacks for these drivers. It then
+forwards the calls on to the device-specific callbacks. This means that
+device-specific drivers must still implement callbacks for each operation.
+But, they are not called from the top level driver layer.
+
+This does add another layer of indirection for calling one of these functions,
+but there are benefits that are believed to outweigh this slowdown.
+
+First, it prevents device-specific drivers from having to know about the
+global device layer. This speeds up integration time incredibly. It also
+allows drivers to be more portable across kernel versions. Note that the
+former was intentional; the latter is an added bonus.
+
+Second, this added indirection allows the bus to perform any additional logic
+necessary for its child devices. A bus layer may add additional information to
+the call, or translate it into something meaningful for its children.
+
+This could be done in the driver, but if it happens for every object of a
+particular type, it is best done at a higher level.
+
+Recap
+~~~~~
+
+Instances of devices and bridges are allocated dynamically as the system
+discovers their existence. Their fields describe the individual object.
+Drivers - in the global sense - are statically allocated and singular for a
+particular type of bus. They describe a set of operations that every type of
+bus could implement, the implementation following the bus's semantics.
+
+
+Downstream Access
+~~~~~~~~~~~~~~~~~
+
+Common data fields have been moved out of individual bus layers into a common
+data structure. But, these fields must still be accessed by the bus layers,
+and sometimes by the device-specific drivers.
+
+Other bus layers are encouraged to do what has been done for the PCI layer.
+struct pci_dev now looks like this:
+
+struct pci_dev {
+ ...
+
+ struct device device;
+};
+
+Note first that the struct device is embedded, not referenced via a pointer.
+This means only one allocation on device discovery. Note also that it is at
+the _end_ of struct pci_dev. This is to make people think about what they're
+doing when switching between the bus driver and the global driver, and to
+guard against mindless casts between the two.
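+
+For illustration, converting from the embedded struct device back to the
+bus-specific structure amounts to subtracting the member offset. A minimal
+sketch (the to_pci_dev() helper is hypothetical, not something this document
+defines):
+
+#define to_pci_dev(d) \
+	((struct pci_dev *)((char *)(d) - offsetof(struct pci_dev, device)))
+
+static int pci_device_suspend(struct device * dev, u32 state, u32 level)
+{
+	struct pci_dev * pci_dev = to_pci_dev(dev);
+
+	/* ... operate on the PCI-specific fields of pci_dev ... */
+	return 0;
+}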
+
+The PCI bus layer freely accesses the fields of struct device. It knows about
+the structure of struct pci_dev, and it should know the structure of struct
+device. PCI devices that have been converted generally do not touch the fields
+of struct device. More precisely, device-specific drivers should not touch
+fields of struct device unless there is a strong compelling reason to do so.
+
+This abstraction prevents unnecessary pain during transitional phases.
+If a field is renamed or removed, then every downstream driver that uses it
+will break. On the other hand, if only the bus layer (and not the device
+layer) accesses struct device, it is only the bus layer that needs to change.
+
+
+User Interface
+~~~~~~~~~~~~~~
+
+By virtue of having a complete hierarchical view of all the devices in the
+system, exporting a complete hierarchical view to userspace becomes relatively
+easy.
+
+Whenever a device is inserted into the tree, a directory is created for it.
+This directory may be populated at each layer of discovery - the global layer,
+the bus layer, or the device layer.
+
+The global layer currently creates two files - 'status' and 'power'. The
+former only reports the name of the device and its bus ID. The latter reports
+the current power state of the device. It can also be used to set the current
+power state.
+
+The bus layer may also create files for the devices it finds while probing the
+bus. For example, the PCI layer currently creates 'wake' and 'resource' files
+for each PCI device.
+
+A device-specific driver may also export files in its directory to expose
+device-specific data or tunable interfaces.
+
+These features were initially implemented using procfs. However, after one
+conversation with Linus, a new filesystem - driverfs - was created to
+implement these features. It is an in-memory filesystem, based heavily on
+ramfs, though it uses procfs as inspiration for its callback functionality.
+
+Each struct device has a 'struct driver_dir_entry' which encapsulates the
+device's directory and the files within.
+
+Device Structures
+~~~~~~~~~~~~~~~~~
+
+struct device {
+ struct list_head bus_list;
+ struct iobus *parent;
+ struct iobus *subordinate;
+
+ char name[DEVICE_NAME_SIZE];
+ char bus_id[BUS_ID_SIZE];
+
+ struct driver_dir_entry * dir;
+
+ spinlock_t lock;
+ atomic_t refcount;
+
+ struct device_driver *driver;
+ void *driver_data;
+ void *platform_data;
+
+ u32 current_state;
+ unsigned char *saved_state;
+};
+
+bus_list:
+ List of all devices on a particular bus; i.e. the device's siblings
+
+parent:
+ The parent bridge for the device.
+
+subordinate:
+ If the device is a bridge itself, this points to the struct iobus that is
+ created for it.
+
+name:
+ Human readable (descriptive) name of device. E.g. "Intel EEPro 100"
+
+bus_id:
+ Parsable (yet ASCII) bus id. E.g. "00:04.00" (PCI Bus 0, Device 4, Function
+ 0). It is necessary to have a searchable bus id for each device; making it
+ ASCII allows us to use it for its directory name without translating it.
+
+dir:
+ The device's driverfs directory.
+
+lock:
+ Per-device lock.
+
+refcount:
+ The device's usage count.
+ When this goes to 0, the device is assumed to be removed. It will be removed
+ from its parent's list of children. Its remove() callback will be called to
+ tell the driver to clean up after itself.
+
+driver:
+ Pointer to a struct device_driver, the common operations for each device. See
+ next section.
+
+driver_data:
+ Private data for the driver.
+ Much like the PCI implementation of this field, this allows device-specific
+ drivers to keep a pointer to device-specific data.
+
+platform_data:
+ Data that the platform (firmware) provides about the device.
+ For example, the ACPI BIOS or EFI may have additional information about the
+ device that is not directly mappable to any existing kernel data structure.
+ It also allows the platform driver (e.g. ACPI) to pass information to a
+ driver without the driver having to have explicit knowledge of (atrocities
+ like) ACPI.
+
+
+current_state:
+ Current power state of the device. For PCI and other modern devices, this is
+ 0-3, though it's not necessarily limited to those values.
+
+saved_state:
+ Pointer to driver-specific set of saved state.
+ Having it here allows modules to be unloaded on system suspend and reloaded
+ on resume and maintain state across transitions.
+ It also allows generic drivers to maintain state across system state
+ transitions.
+ (I've implemented a generic PCI driver for devices that don't have a
+ device-specific driver. Instead of managing some vector of saved state
+ for each device the generic driver supports, it can simply store it here.)
+
+
+
+struct device_driver {
+ int (*probe) (struct device *dev);
+ int (*remove) (struct device *dev);
+
+ int (*suspend) (struct device *dev, u32 state, u32 level);
+ int (*resume) (struct device *dev, u32 level);
+};
+
+probe:
+ Check for device existence and associate driver with it.
+
+remove:
+ Dissociate driver from device. Releases the device so that it can be used by
+ another driver. Also, if it is a hotplug device (hotplug PCI, CardBus), an
+ ejection event could take place here.
+
+suspend:
+ Perform one step of the device suspend process.
+
+resume:
+ Perform one step of the device resume process.
+
+The probe() and remove() callbacks are intended to be much simpler than their
+current PCI counterparts.
+
+probe() should do the following only:
+
+- Check if hardware is present
+- Register device interface
+- Disable DMA/interrupts, etc, just in case.
+
+Historically, some device initialisation was done in probe(). This should not
+be the case anymore. All initialisation should take place in the open() call
+for the device.
+
+Breaking initialisation code out must also be done for the resume() callback,
+as most devices will have to be completely reinitialised when coming back from
+a suspend state.
+
+remove() should simply unregister the device interface.
+
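+A skeletal driver wiring these callbacks up might look like this (a sketch
+only; the foo_* helpers are assumed, not part of this interface):
+
+static int foo_probe(struct device * dev)
+{
+	if (!foo_present(dev))		/* check if hardware is present */
+		return -ENODEV;
+	foo_register_interface(dev);	/* e.g. register a block device */
+	foo_quiesce(dev);		/* disable DMA/interrupts, just in case */
+	return 0;
+}
+
+static int foo_remove(struct device * dev)
+{
+	foo_unregister_interface(dev);
+	return 0;
+}
+
+static struct device_driver foo_driver = {
+	probe:	foo_probe,
+	remove:	foo_remove,
+};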
+
+Device power management can be quite complicated, based on exactly what is
+desired to be done. Four scenarios sum up most of it:
+
+- OS directed power management.
+ The OS takes care of notifying all drivers that a suspend is requested,
+ saving device state, and powering devices down.
+- Firmware controlled power management.
+ The OS only wants to notify devices that a suspend is requested.
+- Device power management.
+ A user wants to place only one device in a low power state, and maybe save
+ state.
+- System reboot.
+ The system wants to place devices in a quiescent state before the system is
+ reset.
+
+In an attempt to please all of these scenarios, the power management
+transition for any device is broken up into several stages - notify, save
+state, and power down. A disable stage, which would happen after notify and
+before save state, has been considered and may be implemented in the future.
+
+Depending on what the system-wide policy is (usually dictated by the power
+management scheme present), each driver's suspend callback may be called
+multiple times, each with a different stage.
+
+On all power management transitions, the stages should be called sequentially
+(notify before save state; save state before power down). However, drivers
+should not assume that any stage was called beforehand. (If a driver gets a
+power down call, it shouldn't assume notify or save state was called first.)
+This allows the framework to be used seamlessly by all power management
+actions. Hopefully.
+
+Resume transitions happen in a similar manner. They are broken up into two
+stages currently (power on and restore state), though a third stage (enable)
+may be added later.
+
+For suspend and resume transitions, the following values are defined to denote
+the stage:
+
+enum {
+ SUSPEND_NOTIFY,
+ SUSPEND_SAVE_STATE,
+ SUSPEND_POWER_DOWN,
+};
+
+enum {
+ RESUME_POWER_ON,
+ RESUME_RESTORE_STATE,
+};
+
+
+During a system power transition, the device tree must be walked in order,
+calling the suspend() or resume() callback for each node. This may happen
+several times.
+
+Initially, this was done in kernel space. However, it has occurred to me that
+recursion to an unbounded depth is dangerous, and that there are a lot
+of inherent race conditions in such an operation.
+
+Non-recursive walking of the device tree is possible. However, this makes for
+convoluted code.
+
+No matter what, if the transition happens in kernel space, it is difficult to
+gracefully recover from errors, or to implement a policy that prevents
+shutting down the device(s) that state is being saved to.
+
+Instead, the walking of the device tree has been moved to userspace. When a
+user requests that the system suspend, a userspace program will walk the
+device tree, as exported via driverfs, and tell each device to go to sleep.
+It will do this multiple times, based on what the system policy is.
+
+Device resume should happen in the same manner when the system awakens.
+
+Each suspend stage is described below:
+
+SUSPEND_NOTIFY:
+
+This stage notifies the driver that the device is going to sleep. If the
+driver knows that it cannot resume the hardware from the requested level, or
+feels that the device is too important to be put to sleep, it should return
+an error from this function.
+
+It does not have to stop I/O requests or actually save state at this point.
+
+SUSPEND_DISABLE:
+
+The driver should stop taking I/O requests at this stage. Because the save
+state stage happens afterwards, the driver may not want to physically disable
+the device, but only mark itself unavailable, if possible.
+
+SUSPEND_SAVE_STATE:
+
+The driver should allocate memory and save any device state that is relevant
+for the state it is going to enter.
+
+SUSPEND_POWER_DOWN:
+
+The driver should place the device in the power state requested.
+
+
+For resume, the stages are defined as follows:
+
+RESUME_POWER_ON:
+
+Devices should be powered on and reinitialised to some known working state.
+
+RESUME_RESTORE_STATE:
+
+The driver should restore device state to its pre-suspend state and free any
+memory allocated for its saved state.
+
+RESUME_ENABLE:
+
+The device should start taking I/O requests again.
+
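+Put together, a suspend() callback following these stage definitions might
+look like this (a sketch; the foo_* helpers are assumed):
+
+static int foo_suspend(struct device * dev, u32 state, u32 level)
+{
+	switch (level) {
+	case SUSPEND_NOTIFY:
+		/* fail here if we cannot come back from 'state' */
+		return foo_can_resume(dev, state) ? 0 : -EBUSY;
+	case SUSPEND_SAVE_STATE:
+		/* assumed to return an allocated buffer of device state */
+		dev->saved_state = foo_save_state(dev);
+		break;
+	case SUSPEND_POWER_DOWN:
+		foo_set_power(dev, state);
+		break;
+	}
+	return 0;
+}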
+
+A driver does not have to implement every stage. But if it does implement a
+stage, it should do what is described above. It should not assume that it
+performed any stage previously, or that it will perform any stage later.
+
+It is quite possible that a driver can fail during the suspend process, for
+whatever reason. In this event, the calling process must gracefully recover
+and restore everything to the state it was in before the suspend transition
+began.
+
+If a driver knows that it cannot suspend or resume properly, it should fail
+during the notify stage. Properly implemented power management schemes should
+make sure that this is the first stage that is called.
+
+If a driver gets a power down request, it should obey it, as it may very
+likely be during a reboot.
+
+
+Bus Structures
+~~~~~~~~~~~~~~
+
+struct iobus {
+ struct list_head node;
+ struct iobus *parent;
+ struct list_head children;
+ struct list_head devices;
+
+ struct list_head bus_list;
+
+ spinlock_t lock;
+ atomic_t refcount;
+
+ struct device *self;
+ struct driver_dir_entry * dir;
+
+ char name[DEVICE_NAME_SIZE];
+ char bus_id[BUS_ID_SIZE];
+
+ struct bus_driver *driver;
+};
+
+node:
+ Bus's node in sibling list (its parent's list of child buses).
+
+parent:
+ Pointer to parent bridge.
+
+children:
+ List of subordinate buses.
+ In each child, this corresponds to its 'node' field.
+
+devices:
+ List of devices on the bus this bridge controls.
+ This field corresponds to the 'bus_list' field in each child device.
+
+bus_list:
+ Each type of bus keeps a list of all bridges that it finds. This is the
+ bridge's entry in that list.
+
+self:
+ Pointer to the struct device for this bridge.
+
+lock:
+ Lock for the bus.
+
+refcount:
+ Usage count for the bus.
+
+dir:
+ Driverfs directory.
+
+name:
+ Human readable ASCII name of bus.
+
+bus_id:
+ Machine readable (though ASCII) description of position on parent bus.
+
+driver:
+ Pointer to operations for bus.
+
+
+struct iobus_driver {
+ char name[16];
+ struct list_head node;
+
+ int (*scan) (struct iobus*);
+ int (*add_device) (struct iobus*, char*);
+};
+
+name:
+ ASCII name of bus.
+
+node:
+ List of buses of this type in system.
+
+scan:
+ Search the bus for new devices. This may happen either at boot - where every
+ device discovered will be new - or later on, in which case there may be only
+ a few (or no) new devices.
+
+add_device:
+ Trigger a device insertion at a particular location.
+
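+For example, a bus layer might declare its driver like this (a sketch; the
+foo bus and its callbacks are hypothetical):
+
+static struct iobus_driver foo_bus_driver = {
+	name:		"foo",
+	scan:		foo_scan,	/* walk the bus looking for devices */
+	add_device:	foo_add_device,	/* insertion at a given location */
+};
+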
+
+
+The API
+~~~~~~~
+
+There are several functions exported by the global device layer, including
+several optional helper functions, written solely to try and make your life
+easier.
+
+void device_init_dev(struct device * dev);
+
+Initialise a device structure. It first zeros the device, then initialises
+all of the lists. (Note that this would have been called device_init(), but
+that name was already taken. :/)
+
+
+struct device * device_alloc(void)
+
+Allocate memory for a device structure and initialise it.
+It first allocates memory, then calls device_init_dev() with the new pointer.
+
+
+int device_register(struct device * dev);
+
+Register a device with the global device layer.
+The bus layer should call this function upon device discovery, e.g. when
+probing the bus.
+dev should be fully initialised when this is called.
+If dev->parent is not set, it sets its parent to be the device root.
+It then does the following:
+ - inserts it into its parent's list of children
+ - creates a driverfs directory for it
+ - creates a set of default files for the device in its directory
+ - calls platform_notify() to notify the firmware driver of its existence.
+
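+A typical discovery path in a bus layer, sketched (the names and the parent
+iobus here are assumed, not defined by this interface):
+
+struct device * dev = device_alloc();
+
+if (!dev)
+	return -ENOMEM;
+strcpy(dev->name, "Foo Widget 100");	/* human readable name */
+strcpy(dev->bus_id, "00:04.00");	/* position on the bus */
+dev->parent = foo_bus;			/* struct iobus it was found on */
+device_register(dev);
+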
+
+void get_device(struct device * dev);
+
+Increment the refcount for a device.
+
+
+int valid_device(struct device * dev);
+
+Check that the reference count for a device is positive (i.e. the device is
+not waiting to be freed). If it is positive, the reference count is
+incremented. It returns whether or not the device is usable.
+
+
+void put_device(struct device * dev);
+
+Decrement the reference count for the device. If it hits 0, it removes the
+device from its parent's list of children and calls the remove() callback for
+the device.
+
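+Typical usage when another agent might be removing the device concurrently
+(a sketch):
+
+if (valid_device(dev)) {
+	/* a reference is now held; the device will not go away */
+	foo_do_something(dev);
+	put_device(dev);	/* drop the reference valid_device() took */
+}
+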
+
+void lock_device(struct device * dev);
+
+Take the spinlock for the device.
+
+
+void unlock_device(struct device * dev);
+
+Release the spinlock for the device.
+
+
+
+void iobus_init(struct iobus * iobus);
+struct iobus * iobus_alloc(void);
+int iobus_register(struct iobus * iobus);
+void get_iobus(struct iobus * iobus);
+int valid_iobus(struct iobus * iobus);
+void put_iobus(struct iobus * iobus);
+void lock_iobus(struct iobus * iobus);
+void unlock_iobus(struct iobus * iobus);
+
+These functions provide the same functionality as their device_*
+counterparts, only operating on a struct iobus. One important thing to note,
+though, is that iobus_register() and iobus_unregister() operate recursively:
+it is possible to add an entire tree in one call.
+
+
+
+int device_driver_init(void);
+
+Main initialisation routine.
+
+This makes sure driverfs is up and running and initialises the device tree.
+
+
+void device_driver_exit(void);
+
+This frees up the device tree.
+
+
+
+
+Credits
+~~~~~~~
+
+The following people have been extremely helpful in solidifying this document
+and the driver model.
+
+Randy Dunlap rddunlap@osdl.org
+Jeff Garzik jgarzik@mandrakesoft.com
+Ben Herrenschmidt benh@kernel.crashing.org
+
+
diff --git a/Documentation/filesystems/driverfs.txt b/Documentation/filesystems/driverfs.txt
new file mode 100644
index 000000000..b1f2553b7
--- /dev/null
+++ b/Documentation/filesystems/driverfs.txt
@@ -0,0 +1,211 @@
+
+driverfs - The Device Driver Filesystem
+
+Patrick Mochel <mochel@osdl.org>
+
+3 December 2001
+
+
+What it is:
+~~~~~~~~~~~
+driverfs is a unified means for device drivers to export interfaces to
+userspace.
+
+Some drivers have a need for exporting interfaces for things like
+setting device-specific parameters, or tuning the device performance.
+For example, wireless networking cards export a file in procfs to set
+their SSID.
+
+Other times, the bus on which a device resides may export other
+information about the device. For example, PCI and USB both export
+device information via procfs or usbdevfs.
+
+In these cases, the files or directories are in nearly random places
+in /proc. One benefit of driverfs is that it can consolidate all of
+these interfaces to one standard location.
+
+
+Why it's better than procfs:
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+This of course can't happen without changing every single driver that
+exports a procfs interface, and having some coordination between all
+of them as to what the proper place for their files is. Or can it?
+
+
+driverfs was developed in conjunction with the new driver model for
+the 2.5 kernel. In that model, the system has one unified tree of all
+the devices that are present in the system. It follows naturally that
+this tree can be exported to userspace in the same form.
+
+So, every bus and every device gets a directory in the filesystem.
+This directory is created when the device is registered in the tree,
+before the driver actually gets initialised. The dentry for this
+directory is stored in the struct device for the device, so the
+driver has access to it.
+
+Now, every driver has one standard place to export its files.
+
+Granted, the location of the file is not as intuitive as it may have
+been under procfs. But, I argue that with the exception of
+/proc/bus/pci, none of the files had intuitive locations. I also argue
+that the development of userspace tools can help cope with these
+changes and inconsistencies in locations.
+
+
+Why we're not just using procfs:
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+When developing the new driver model, it was initially implemented
+with a procfs tree. In explaining the concept to Linus, he said "Don't
+use proc."
+
+I was a little shocked (especially considering I had already
+implemented it using procfs). "What do you mean 'don't use proc'?"
+
+His argument was that too many things use proc that shouldn't. And
+even more things that do use proc misuse it. On top of that, procfs
+was written before the VFS layer was written, so it doesn't use the
+dcache. It reimplements many of the same features that the dcache
+does, and is, in general, crufty.
+
+So, he told me to write my own. Soon after, he pointed me at ramfs,
+the simplest filesystem known to man.
+
+Consequently, we have a virtual filesystem based heavily on ramfs, and
+borrowing some conceptual functionality from procfs.
+
+It may suck, but it does what it was designed to. At least so far.
+
+
+How it works:
+~~~~~~~~~~~~~
+
+Directories are encapsulated like this:
+
+struct driver_dir_entry {
+ char * name;
+ struct dentry * dentry;
+ mode_t mode;
+ struct list_head files;
+};
+
+name:
+ Name of the directory.
+dentry:
+ Dentry for the directory.
+mode:
+ Permissions of the directory.
+files:
+ Linked list of driver_file_entry's that are in the directory.
+
+
+To create a directory, one first calls
+
+struct driver_dir_entry *
+driverfs_create_dir_entry(const char * name, mode_t mode);
+
+which allocates and initialises a struct driver_dir_entry. Then to actually
+create the directory:
+
+int driverfs_create_dir(struct driver_dir_entry *, struct driver_dir_entry *);
+
+To remove a directory:
+
+void driverfs_remove_dir(struct driver_dir_entry * entry);
+
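+Putting the two calls together (a sketch; the argument order of
+driverfs_create_dir() - new entry first, then parent - is assumed, and
+parent_dir stands for an existing directory):
+
+struct driver_dir_entry * dir;
+
+dir = driverfs_create_dir_entry("00:04.00", S_IFDIR | S_IRWXU);
+if (dir)
+	driverfs_create_dir(dir, parent_dir);
+
+/* ... and when the device goes away: */
+driverfs_remove_dir(dir);
+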
+
+Files are encapsulated like this:
+
+struct driver_file_entry {
+ struct driver_dir_entry * parent;
+ struct list_head node;
+ char * name;
+ mode_t mode;
+ struct dentry * dentry;
+ void * data;
+ struct driverfs_operations * ops;
+};
+
+struct driverfs_operations {
+ ssize_t (*read) (char *, size_t, loff_t, void *);
+ ssize_t (*write)(const char *, size_t, loff_t, void*);
+};
+
+parent:
+ The directory in which the file resides.
+
+node:
+ Node in its parent directory's list of files.
+
+name:
+ The name of the file.
+
+mode:
+ Permissions of the file.
+
+dentry:
+ The dentry for the file.
+
+data:
+ Caller specific data that is passed to the callbacks when they
+ are called.
+
+ops:
+ Operations for the file. Currently, this only contains read() and write()
+ callbacks for the file.
+
+To create a file, one first calls
+
+struct driver_file_entry *
+driverfs_create_entry (const char * name, mode_t mode,
+ struct driverfs_operations * ops, void * data);
+
+That allocates and initialises a struct driver_file_entry. Then, to actually
+create a file, one calls
+
+int driverfs_create_file(struct driver_file_entry * entry,
+ struct driver_dir_entry * parent);
+
+
+To remove a file, one calls
+
+void driverfs_remove_file(struct driver_dir_entry *, const char * name);
+
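+Exporting a simple read-only file might then look like this (a sketch; the
+callback semantics - formatting into the caller's buffer, at most one page -
+are assumed, and the foo names are hypothetical):
+
+static ssize_t foo_show_param(char * buf, size_t count, loff_t off, void * data)
+{
+	struct foo_device * foo = data;
+
+	if (off)	/* offset handling elided for brevity */
+		return 0;
+	return sprintf(buf, "%d\n", foo->param);
+}
+
+static struct driverfs_operations foo_ops = {
+	read:	foo_show_param,
+};
+
+struct driver_file_entry * entry;
+
+entry = driverfs_create_entry("param", S_IFREG | S_IRUGO, &foo_ops, foo);
+if (entry)
+	driverfs_create_file(entry, dev->dir);
+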
+
+The callback functionality is similar to the way procfs works. When a
+user performs a read(2) or write(2) on the file, it first calls a
+driverfs function. This function then checks for a non-NULL pointer in
+the file->private_data field, which it assumes to be a pointer to a
+struct driver_file_entry.
+
+It then checks for the appropriate callback and calls it.
+
+
+What driverfs is not:
+~~~~~~~~~~~~~~~~~~~~~
+It is not a replacement for either devfs or procfs.
+
+It does not handle device nodes, like devfs is intended to do. I think
+this functionality is possible, but do think that integration of
+the device nodes and control files should be done. Whether driverfs,
+devfs, or something else is the place to do it, I don't know.
+
+It is not intended to be a replacement for all of the procfs
+functionality. I think that many of the driver files should be moved
+out of /proc (and maybe a few other things as well ;).
+
+
+
+Limitations:
+~~~~~~~~~~~~
+The driverfs functions assume that at most a page is being either read
+or written each time.
+
+
+Possible bugs:
+~~~~~~~~~~~~~~
+It may not deal with offsets and/or seeks very well, especially if
+they cross a page boundary.
+
+There may be locking issues when dynamically adding/removing files and
+directories rapidly (like if you have a hot plug device).
+
+There are some people that believe that filesystems which add
+files/directories dynamically based on the presence of devices are
+inherently flawed. Though not as technically versed in this area as
+some of those people, I like to believe that they can be made to work,
+with the right guidance.
+
diff --git a/Makefile b/Makefile
index 02dbfd5dd..a62e69d0f 100644
--- a/Makefile
+++ b/Makefile
@@ -1,7 +1,7 @@
VERSION = 2
PATCHLEVEL = 5
SUBLEVEL = 1
-EXTRAVERSION =-pre10
+EXTRAVERSION =-pre11
KERNELRELEASE=$(VERSION).$(PATCHLEVEL).$(SUBLEVEL)$(EXTRAVERSION)
diff --git a/arch/i386/lib/iodebug.c b/arch/i386/lib/iodebug.c
index 701a07fe7..3f74de6a0 100644
--- a/arch/i386/lib/iodebug.c
+++ b/arch/i386/lib/iodebug.c
@@ -9,11 +9,3 @@ void * __io_virt_debug(unsigned long x, const char *file, int line)
return (void *)x;
}
-unsigned long __io_phys_debug(unsigned long x, const char *file, int line)
-{
- if (x < PAGE_OFFSET) {
- printk("io mapaddr 0x%05lx not valid at %s:%d!\n", x, file, line);
- return x;
- }
- return __pa(x);
-}
diff --git a/drivers/block/cciss.c b/drivers/block/cciss.c
index e3aeae4a6..feaf7217f 100644
--- a/drivers/block/cciss.c
+++ b/drivers/block/cciss.c
@@ -1237,7 +1237,7 @@ queue:
blkdev_dequeue_request(creq);
- spin_unlock_irq(&q->queue_lock);
+ spin_unlock_irq(q->queue_lock);
c->cmd_type = CMD_RWREQ;
c->rq = creq;
@@ -1298,7 +1298,7 @@ queue:
c->Request.CDB[8]= creq->nr_sectors & 0xff;
c->Request.CDB[9] = c->Request.CDB[11] = c->Request.CDB[12] = 0;
- spin_lock_irq(&q->queue_lock);
+ spin_lock_irq(q->queue_lock);
addQ(&(h->reqQ),c);
h->Qdepth++;
@@ -1866,7 +1866,7 @@ static int __init cciss_init_one(struct pci_dev *pdev,
q = BLK_DEFAULT_QUEUE(MAJOR_NR + i);
q->queuedata = hba[i];
- blk_init_queue(q, do_cciss_request);
+ blk_init_queue(q, do_cciss_request, &hba[i]->lock);
blk_queue_bounce_limit(q, hba[i]->pdev->dma_mask);
/* This is a hardware imposed limit. */
diff --git a/drivers/block/cciss.h b/drivers/block/cciss.h
index 357088d21..03afe43da 100644
--- a/drivers/block/cciss.h
+++ b/drivers/block/cciss.h
@@ -66,6 +66,7 @@ struct ctlr_info
unsigned int Qdepth;
unsigned int maxQsinceinit;
unsigned int maxSG;
+ spinlock_t lock;
//* pointers to command and error info pool */
CommandList_struct *cmd_pool;
@@ -242,7 +243,7 @@ struct board_type {
struct access_method *access;
};
-#define CCISS_LOCK(i) (&((BLK_DEFAULT_QUEUE(MAJOR_NR + i))->queue_lock))
+#define CCISS_LOCK(i) ((BLK_DEFAULT_QUEUE(MAJOR_NR + i))->queue_lock)
#endif /* CCISS_H */
diff --git a/drivers/block/cpqarray.c b/drivers/block/cpqarray.c
index 4062aea0b..2aa1d0365 100644
--- a/drivers/block/cpqarray.c
+++ b/drivers/block/cpqarray.c
@@ -467,7 +467,7 @@ int __init cpqarray_init(void)
q = BLK_DEFAULT_QUEUE(MAJOR_NR + i);
q->queuedata = hba[i];
- blk_init_queue(q, do_ida_request);
+ blk_init_queue(q, do_ida_request, &hba[i]->lock);
blk_queue_bounce_limit(q, hba[i]->pci_dev->dma_mask);
/* This is a hardware imposed limit. */
@@ -888,7 +888,7 @@ queue_next:
blkdev_dequeue_request(creq);
- spin_unlock_irq(&q->queue_lock);
+ spin_unlock_irq(q->queue_lock);
c->ctlr = h->ctlr;
c->hdr.unit = MINOR(creq->rq_dev) >> NWD_SHIFT;
@@ -921,7 +921,7 @@ DBGPX( printk("Submitting %d sectors in %d segments\n", creq->nr_sectors, seg);
c->req.hdr.cmd = (rq_data_dir(creq) == READ) ? IDA_READ : IDA_WRITE;
c->type = CMD_RWREQ;
- spin_lock_irq(&q->queue_lock);
+ spin_lock_irq(q->queue_lock);
/* Put the request on the tail of the request queue */
addQ(&h->reqQ, c);
diff --git a/drivers/block/cpqarray.h b/drivers/block/cpqarray.h
index bdb8e4108..80b4dba8b 100644
--- a/drivers/block/cpqarray.h
+++ b/drivers/block/cpqarray.h
@@ -106,6 +106,7 @@ struct ctlr_info {
cmdlist_t *cmd_pool;
dma_addr_t cmd_pool_dhandle;
__u32 *cmd_pool_bits;
+ spinlock_t lock;
unsigned int Qdepth;
unsigned int maxQsinceinit;
@@ -117,7 +118,7 @@ struct ctlr_info {
unsigned int misc_tflags;
};
-#define IDA_LOCK(i) (&((BLK_DEFAULT_QUEUE(MAJOR_NR + i))->queue_lock))
+#define IDA_LOCK(i) ((BLK_DEFAULT_QUEUE(MAJOR_NR + i))->queue_lock)
#endif
diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index ef79fcd0e..800c5b0ae 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -204,6 +204,8 @@ static int use_virtual_dma;
* record each buffers capabilities
*/
+static spinlock_t floppy_lock;
+
static unsigned short virtual_dma_port=0x3f0;
void floppy_interrupt(int irq, void *dev_id, struct pt_regs * regs);
static int set_dor(int fdc, char mask, char data);
@@ -2296,7 +2298,7 @@ static void request_done(int uptodate)
DRS->maxtrack = 1;
/* unlock chained buffers */
- spin_lock_irqsave(&QUEUE->queue_lock, flags);
+ spin_lock_irqsave(QUEUE->queue_lock, flags);
while (current_count_sectors && !QUEUE_EMPTY &&
current_count_sectors >= CURRENT->current_nr_sectors){
current_count_sectors -= CURRENT->current_nr_sectors;
@@ -2304,7 +2306,7 @@ static void request_done(int uptodate)
CURRENT->sector += CURRENT->current_nr_sectors;
end_request(1);
}
- spin_unlock_irqrestore(&QUEUE->queue_lock, flags);
+ spin_unlock_irqrestore(QUEUE->queue_lock, flags);
if (current_count_sectors && !QUEUE_EMPTY){
/* "unlock" last subsector */
@@ -2329,9 +2331,9 @@ static void request_done(int uptodate)
DRWE->last_error_sector = CURRENT->sector;
DRWE->last_error_generation = DRS->generation;
}
- spin_lock_irqsave(&QUEUE->queue_lock, flags);
+ spin_lock_irqsave(QUEUE->queue_lock, flags);
end_request(0);
- spin_unlock_irqrestore(&QUEUE->queue_lock, flags);
+ spin_unlock_irqrestore(QUEUE->queue_lock, flags);
}
}
@@ -2433,17 +2435,20 @@ static void rw_interrupt(void)
static int buffer_chain_size(void)
{
struct bio *bio;
- int size;
+ struct bio_vec *bv;
+ int size, i;
char *base;
- base = CURRENT->buffer;
+ base = bio_data(CURRENT->bio);
size = 0;
rq_for_each_bio(bio, CURRENT) {
- if (bio_data(bio) != base + size)
- break;
+ bio_for_each_segment(bv, bio, i) {
+ if (page_address(bv->bv_page) + bv->bv_offset != base + size)
+ break;
- size += bio->bi_size;
+ size += bv->bv_len;
+ }
}
return size >> 9;
@@ -2469,9 +2474,10 @@ static int transfer_size(int ssize, int max_sector, int max_size)
static void copy_buffer(int ssize, int max_sector, int max_sector_2)
{
int remaining; /* number of transferred 512-byte sectors */
+ struct bio_vec *bv;
struct bio *bio;
char *buffer, *dma_buffer;
- int size;
+ int size, i;
max_sector = transfer_size(ssize,
minimum(max_sector, max_sector_2),
@@ -2501,12 +2507,17 @@ static void copy_buffer(int ssize, int max_sector, int max_sector_2)
dma_buffer = floppy_track_buffer + ((fsector_t - buffer_min) << 9);
- bio = CURRENT->bio;
size = CURRENT->current_nr_sectors << 9;
- buffer = CURRENT->buffer;
- while (remaining > 0){
- SUPBOUND(size, remaining);
+ rq_for_each_bio(bio, CURRENT) {
+ bio_for_each_segment(bv, bio, i) {
+ if (!remaining)
+ break;
+
+ size = bv->bv_len;
+ SUPBOUND(size, remaining);
+
+ buffer = page_address(bv->bv_page) + bv->bv_offset;
#ifdef FLOPPY_SANITY_CHECK
if (dma_buffer + size >
floppy_track_buffer + (max_buffer_sectors << 10) ||
@@ -2526,24 +2537,14 @@ static void copy_buffer(int ssize, int max_sector, int max_sector_2)
if (((unsigned long)buffer) % 512)
DPRINT("%p buffer not aligned\n", buffer);
#endif
- if (CT(COMMAND) == FD_READ)
- memcpy(buffer, dma_buffer, size);
- else
- memcpy(dma_buffer, buffer, size);
- remaining -= size;
- if (!remaining)
- break;
+ if (CT(COMMAND) == FD_READ)
+ memcpy(buffer, dma_buffer, size);
+ else
+ memcpy(dma_buffer, buffer, size);
- dma_buffer += size;
- bio = bio->bi_next;
-#ifdef FLOPPY_SANITY_CHECK
- if (!bio){
- DPRINT("bh=null in copy buffer after copy\n");
- break;
+ remaining -= size;
+ dma_buffer += size;
}
-#endif
- size = bio->bi_size;
- buffer = bio_data(bio);
}
#ifdef FLOPPY_SANITY_CHECK
if (remaining){
@@ -4169,7 +4170,7 @@ int __init floppy_init(void)
blk_size[MAJOR_NR] = floppy_sizes;
blksize_size[MAJOR_NR] = floppy_blocksizes;
- blk_init_queue(BLK_DEFAULT_QUEUE(MAJOR_NR), DEVICE_REQUEST);
+ blk_init_queue(BLK_DEFAULT_QUEUE(MAJOR_NR), DEVICE_REQUEST, &floppy_lock);
reschedule_timeout(MAXTIMEOUT, "floppy init", MAXTIMEOUT);
config_types();
@@ -4477,6 +4478,7 @@ MODULE_LICENSE("GPL");
#else
__setup ("floppy=", floppy_setup);
+module_init(floppy_init)
/* eject the boot floppy (if we need the drive for a different root floppy) */
/* This should only be called at boot time when we're sure that there's no
diff --git a/drivers/block/ll_rw_blk.c b/drivers/block/ll_rw_blk.c
index 86fe87ce6..26bf06621 100644
--- a/drivers/block/ll_rw_blk.c
+++ b/drivers/block/ll_rw_blk.c
@@ -272,6 +272,12 @@ void blk_queue_segment_boundary(request_queue_t *q, unsigned long mask)
q->seg_boundary_mask = mask;
}
+void blk_queue_assign_lock(request_queue_t *q, spinlock_t *lock)
+{
+ spin_lock_init(lock);
+ q->queue_lock = lock;
+}
+
static char *rq_flags[] = { "REQ_RW", "REQ_RW_AHEAD", "REQ_BARRIER",
"REQ_CMD", "REQ_NOMERGE", "REQ_STARTED",
"REQ_DONTPREP", "REQ_DRIVE_CMD", "REQ_DRIVE_TASK",
@@ -641,9 +647,9 @@ void generic_unplug_device(void *data)
request_queue_t *q = (request_queue_t *) data;
unsigned long flags;
- spin_lock_irqsave(&q->queue_lock, flags);
+ spin_lock_irqsave(q->queue_lock, flags);
__generic_unplug_device(q);
- spin_unlock_irqrestore(&q->queue_lock, flags);
+ spin_unlock_irqrestore(q->queue_lock, flags);
}
static int __blk_cleanup_queue(struct request_list *list)
@@ -729,7 +735,6 @@ static int blk_init_free_list(request_queue_t *q)
init_waitqueue_head(&q->rq[READ].wait);
init_waitqueue_head(&q->rq[WRITE].wait);
- spin_lock_init(&q->queue_lock);
return 0;
nomem:
blk_cleanup_queue(q);
@@ -766,7 +771,7 @@ static int __make_request(request_queue_t *, struct bio *);
* blk_init_queue() must be paired with a blk_cleanup_queue() call
* when the block device is deactivated (such as at module unload).
**/
-int blk_init_queue(request_queue_t *q, request_fn_proc *rfn)
+int blk_init_queue(request_queue_t *q, request_fn_proc *rfn, spinlock_t *lock)
{
int ret;
@@ -787,6 +792,7 @@ int blk_init_queue(request_queue_t *q, request_fn_proc *rfn)
q->plug_tq.routine = &generic_unplug_device;
q->plug_tq.data = q;
q->queue_flags = (1 << QUEUE_FLAG_CLUSTER);
+ q->queue_lock = lock;
/*
* by default assume old behaviour and bounce for any highmem page
@@ -833,7 +839,7 @@ static struct request *get_request_wait(request_queue_t *q, int rw)
struct request_list *rl = &q->rq[rw];
struct request *rq;
- spin_lock_prefetch(&q->queue_lock);
+ spin_lock_prefetch(q->queue_lock);
generic_unplug_device(q);
add_wait_queue(&rl->wait, &wait);
@@ -841,9 +847,9 @@ static struct request *get_request_wait(request_queue_t *q, int rw)
set_current_state(TASK_UNINTERRUPTIBLE);
if (rl->count < batch_requests)
schedule();
- spin_lock_irq(&q->queue_lock);
+ spin_lock_irq(q->queue_lock);
rq = get_request(q, rw);
- spin_unlock_irq(&q->queue_lock);
+ spin_unlock_irq(q->queue_lock);
} while (rq == NULL);
remove_wait_queue(&rl->wait, &wait);
current->state = TASK_RUNNING;
@@ -1054,9 +1060,9 @@ void blk_attempt_remerge(request_queue_t *q, struct request *rq)
{
unsigned long flags;
- spin_lock_irqsave(&q->queue_lock, flags);
+ spin_lock_irqsave(q->queue_lock, flags);
__blk_attempt_remerge(q, rq);
- spin_unlock_irqrestore(&q->queue_lock, flags);
+ spin_unlock_irqrestore(q->queue_lock, flags);
}
static int __make_request(request_queue_t *q, struct bio *bio)
@@ -1079,7 +1085,7 @@ static int __make_request(request_queue_t *q, struct bio *bio)
*/
blk_queue_bounce(q, &bio);
- spin_lock_prefetch(&q->queue_lock);
+ spin_lock_prefetch(q->queue_lock);
latency = elevator_request_latency(elevator, rw);
barrier = test_bit(BIO_RW_BARRIER, &bio->bi_rw);
@@ -1088,7 +1094,7 @@ again:
req = NULL;
head = &q->queue_head;
- spin_lock_irq(&q->queue_lock);
+ spin_lock_irq(q->queue_lock);
insert_here = head->prev;
if (blk_queue_empty(q) || barrier) {
@@ -1171,7 +1177,7 @@ get_rq:
freereq = NULL;
} else if ((req = get_request(q, rw)) == NULL) {
- spin_unlock_irq(&q->queue_lock);
+ spin_unlock_irq(q->queue_lock);
/*
* READA bit set
@@ -1216,7 +1222,7 @@ get_rq:
out:
if (freereq)
blkdev_release_request(freereq);
- spin_unlock_irq(&q->queue_lock);
+ spin_unlock_irq(q->queue_lock);
return 0;
end_io:
@@ -1736,3 +1742,4 @@ EXPORT_SYMBOL(blk_dump_rq_flags);
EXPORT_SYMBOL(submit_bio);
EXPORT_SYMBOL(blk_phys_contig_segment);
EXPORT_SYMBOL(blk_hw_contig_segment);
+EXPORT_SYMBOL(blk_queue_assign_lock);
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 22e5b4a60..c16b6163a 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -62,6 +62,8 @@ static u64 nbd_bytesizes[MAX_NBD];
static struct nbd_device nbd_dev[MAX_NBD];
static devfs_handle_t devfs_handle;
+static spinlock_t nbd_lock;
+
#define DEBUG( s )
/* #define DEBUG( s ) printk( s )
*/
@@ -347,22 +349,22 @@ static void do_nbd_request(request_queue_t * q)
#endif
req->errors = 0;
blkdev_dequeue_request(req);
- spin_unlock_irq(&q->queue_lock);
+ spin_unlock_irq(q->queue_lock);
down (&lo->queue_lock);
list_add(&req->queuelist, &lo->queue_head);
nbd_send_req(lo->sock, req); /* Why does this block? */
up (&lo->queue_lock);
- spin_lock_irq(&q->queue_lock);
+ spin_lock_irq(q->queue_lock);
continue;
error_out:
req->errors++;
blkdev_dequeue_request(req);
- spin_unlock(&q->queue_lock);
+ spin_unlock(q->queue_lock);
nbd_end_request(req);
- spin_lock(&q->queue_lock);
+ spin_lock(q->queue_lock);
}
return;
}
@@ -515,7 +517,7 @@ static int __init nbd_init(void)
#endif
blksize_size[MAJOR_NR] = nbd_blksizes;
blk_size[MAJOR_NR] = nbd_sizes;
- blk_init_queue(BLK_DEFAULT_QUEUE(MAJOR_NR), do_nbd_request);
+ blk_init_queue(BLK_DEFAULT_QUEUE(MAJOR_NR), do_nbd_request, &nbd_lock);
for (i = 0; i < MAX_NBD; i++) {
nbd_dev[i].refcnt = 0;
nbd_dev[i].file = NULL;
diff --git a/drivers/block/paride/pcd.c b/drivers/block/paride/pcd.c
index 61e50fec5..9604464e3 100644
--- a/drivers/block/paride/pcd.c
+++ b/drivers/block/paride/pcd.c
@@ -146,6 +146,8 @@ static int pcd_drive_count;
#include <asm/uaccess.h>
+static spinlock_t pcd_lock;
+
#ifndef MODULE
#include "setup.h"
@@ -355,7 +357,7 @@ int pcd_init (void) /* preliminary initialisation */
}
}
- blk_init_queue(BLK_DEFAULT_QUEUE(MAJOR_NR), DEVICE_REQUEST);
+ blk_init_queue(BLK_DEFAULT_QUEUE(MAJOR_NR), DEVICE_REQUEST, &pcd_lock);
read_ahead[MAJOR_NR] = 8; /* 8 sector (4kB) read ahead */
for (i=0;i<PCD_UNITS;i++) pcd_blocksizes[i] = 1024;
@@ -821,11 +823,11 @@ static void pcd_start( void )
if (pcd_command(unit,rd_cmd,2048,"read block")) {
pcd_bufblk = -1;
- spin_lock_irqsave(&QUEUE->queue_lock,saved_flags);
+ spin_lock_irqsave(&pcd_lock,saved_flags);
pcd_busy = 0;
end_request(0);
do_pcd_request(NULL);
- spin_unlock_irqrestore(&QUEUE->queue_lock,saved_flags);
+ spin_unlock_irqrestore(&pcd_lock,saved_flags);
return;
}
@@ -845,11 +847,11 @@ static void do_pcd_read( void )
pcd_retries = 0;
pcd_transfer();
if (!pcd_count) {
- spin_lock_irqsave(&QUEUE->queue_lock,saved_flags);
+ spin_lock_irqsave(&pcd_lock,saved_flags);
end_request(1);
pcd_busy = 0;
do_pcd_request(NULL);
- spin_unlock_irqrestore(&QUEUE->queue_lock,saved_flags);
+ spin_unlock_irqrestore(&pcd_lock,saved_flags);
return;
}
@@ -868,19 +870,19 @@ static void do_pcd_read_drq( void )
pi_do_claimed(PI,pcd_start);
return;
}
- spin_lock_irqsave(&QUEUE->queue_lock,saved_flags);
+ spin_lock_irqsave(&pcd_lock,saved_flags);
pcd_busy = 0;
pcd_bufblk = -1;
end_request(0);
do_pcd_request(NULL);
- spin_unlock_irqrestore(&QUEUE->queue_lock,saved_flags);
+ spin_unlock_irqrestore(&pcd_lock,saved_flags);
return;
}
do_pcd_read();
- spin_lock_irqsave(&QUEUE->queue_lock,saved_flags);
+ spin_lock_irqsave(&pcd_lock,saved_flags);
do_pcd_request(NULL);
- spin_unlock_irqrestore(&QUEUE->queue_lock,saved_flags);
+ spin_unlock_irqrestore(&pcd_lock,saved_flags);
}
/* the audio_ioctl stuff is adapted from sr_ioctl.c */
diff --git a/drivers/block/paride/pf.c b/drivers/block/paride/pf.c
index a86ec90bd..cb4454267 100644
--- a/drivers/block/paride/pf.c
+++ b/drivers/block/paride/pf.c
@@ -164,6 +164,8 @@ static int pf_drive_count;
#include <asm/uaccess.h>
+static spinlock_t pf_spin_lock;
+
#ifndef MODULE
#include "setup.h"
@@ -358,7 +360,7 @@ int pf_init (void) /* preliminary initialisation */
return -1;
}
q = BLK_DEFAULT_QUEUE(MAJOR_NR);
- blk_init_queue(q, DEVICE_REQUEST);
+ blk_init_queue(q, DEVICE_REQUEST, &pf_spin_lock);
blk_queue_max_phys_segments(q, cluster);
blk_queue_max_hw_segments(q, cluster);
read_ahead[MAJOR_NR] = 8; /* 8 sector (4kB) read ahead */
@@ -878,9 +880,9 @@ static void pf_next_buf( int unit )
{ long saved_flags;
- spin_lock_irqsave(&QUEUE->queue_lock,saved_flags);
+ spin_lock_irqsave(&pf_spin_lock,saved_flags);
end_request(1);
- if (!pf_run) { spin_unlock_irqrestore(&QUEUE->queue_lock,saved_flags);
+ if (!pf_run) { spin_unlock_irqrestore(&pf_spin_lock,saved_flags);
return;
}
@@ -896,7 +898,7 @@ static void pf_next_buf( int unit )
pf_count = CURRENT->current_nr_sectors;
pf_buf = CURRENT->buffer;
- spin_unlock_irqrestore(&QUEUE->queue_lock,saved_flags);
+ spin_unlock_irqrestore(&pf_spin_lock,saved_flags);
}
static void do_pf_read( void )
@@ -920,11 +922,11 @@ static void do_pf_read_start( void )
pi_do_claimed(PI,do_pf_read_start);
return;
}
- spin_lock_irqsave(&QUEUE->queue_lock,saved_flags);
+ spin_lock_irqsave(&pf_spin_lock,saved_flags);
end_request(0);
pf_busy = 0;
do_pf_request(NULL);
- spin_unlock_irqrestore(&QUEUE->queue_lock,saved_flags);
+ spin_unlock_irqrestore(&pf_spin_lock,saved_flags);
return;
}
pf_mask = STAT_DRQ;
@@ -946,11 +948,11 @@ static void do_pf_read_drq( void )
pi_do_claimed(PI,do_pf_read_start);
return;
}
- spin_lock_irqsave(&QUEUE->queue_lock,saved_flags);
+ spin_lock_irqsave(&pf_spin_lock,saved_flags);
end_request(0);
pf_busy = 0;
do_pf_request(NULL);
- spin_unlock_irqrestore(&QUEUE->queue_lock,saved_flags);
+ spin_unlock_irqrestore(&pf_spin_lock,saved_flags);
return;
}
pi_read_block(PI,pf_buf,512);
@@ -961,11 +963,11 @@ static void do_pf_read_drq( void )
if (!pf_count) pf_next_buf(unit);
}
pi_disconnect(PI);
- spin_lock_irqsave(&QUEUE->queue_lock,saved_flags);
+ spin_lock_irqsave(&pf_spin_lock,saved_flags);
end_request(1);
pf_busy = 0;
do_pf_request(NULL);
- spin_unlock_irqrestore(&QUEUE->queue_lock,saved_flags);
+ spin_unlock_irqrestore(&pf_spin_lock,saved_flags);
}
static void do_pf_write( void )
@@ -987,11 +989,11 @@ static void do_pf_write_start( void )
pi_do_claimed(PI,do_pf_write_start);
return;
}
- spin_lock_irqsave(&QUEUE->queue_lock,saved_flags);
+ spin_lock_irqsave(&pf_spin_lock,saved_flags);
end_request(0);
pf_busy = 0;
do_pf_request(NULL);
- spin_unlock_irqrestore(&QUEUE->queue_lock,saved_flags);
+ spin_unlock_irqrestore(&pf_spin_lock,saved_flags);
return;
}
@@ -1004,11 +1006,11 @@ static void do_pf_write_start( void )
pi_do_claimed(PI,do_pf_write_start);
return;
}
- spin_lock_irqsave(&QUEUE->queue_lock,saved_flags);
+ spin_lock_irqsave(&pf_spin_lock,saved_flags);
end_request(0);
pf_busy = 0;
do_pf_request(NULL);
- spin_unlock_irqrestore(&QUEUE->queue_lock,saved_flags);
+ spin_unlock_irqrestore(&pf_spin_lock,saved_flags);
return;
}
pi_write_block(PI,pf_buf,512);
@@ -1034,19 +1036,19 @@ static void do_pf_write_done( void )
pi_do_claimed(PI,do_pf_write_start);
return;
}
- spin_lock_irqsave(&QUEUE->queue_lock,saved_flags);
+ spin_lock_irqsave(&pf_spin_lock,saved_flags);
end_request(0);
pf_busy = 0;
do_pf_request(NULL);
- spin_unlock_irqrestore(&QUEUE->queue_lock,saved_flags);
+ spin_unlock_irqrestore(&pf_spin_lock,saved_flags);
return;
}
pi_disconnect(PI);
- spin_lock_irqsave(&QUEUE->queue_lock,saved_flags);
+ spin_lock_irqsave(&pf_spin_lock,saved_flags);
end_request(1);
pf_busy = 0;
do_pf_request(NULL);
- spin_unlock_irqrestore(&QUEUE->queue_lock,saved_flags);
+ spin_unlock_irqrestore(&pf_spin_lock,saved_flags);
}
/* end of pf.c */
diff --git a/drivers/block/ps2esdi.c b/drivers/block/ps2esdi.c
index 01c8805b8..b248b437b 100644
--- a/drivers/block/ps2esdi.c
+++ b/drivers/block/ps2esdi.c
@@ -189,6 +189,8 @@ int __init ps2esdi_init(void)
return 0;
} /* ps2esdi_init */
+module_init(ps2esdi_init);
+
#ifdef MODULE
static int cyl[MAX_HD] = {-1,-1};
diff --git a/drivers/block/rd.c b/drivers/block/rd.c
index 67e124c4a..4cf2978d0 100644
--- a/drivers/block/rd.c
+++ b/drivers/block/rd.c
@@ -44,9 +44,6 @@
#include <linux/config.h>
#include <linux/sched.h>
-#include <linux/minix_fs.h>
-#include <linux/ext2_fs.h>
-#include <linux/romfs_fs.h>
#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/hdreg.h>
@@ -79,19 +76,10 @@ extern void wait_for_keypress(void);
/* The RAM disk size is now a parameter */
#define NUM_RAMDISKS 16 /* This cannot be overridden (yet) */
-#ifndef MODULE
-/* We don't have to load RAM disks or gunzip them in a module. */
-#define RD_LOADER
-#define BUILD_CRAMDISK
-
-void rd_load(void);
-static int crd_load(struct file *fp, struct file *outfp);
-
#ifdef CONFIG_BLK_DEV_INITRD
static int initrd_users;
static spinlock_t initrd_users_lock = SPIN_LOCK_UNLOCKED;
#endif
-#endif
/* Various static variables go here. Most are used only in the RAM disk code.
*/
@@ -542,6 +530,8 @@ int __init rd_init (void)
#ifdef CONFIG_BLK_DEV_INITRD
/* We ought to separate initrd operations here */
register_disk(NULL, MKDEV(MAJOR_NR,INITRD_MINOR), 1, &rd_bd_op, rd_size<<1);
+ devfs_register(devfs_handle, "initrd", DEVFS_FL_DEFAULT, MAJOR_NR,
+ INITRD_MINOR, S_IFBLK | S_IRUSR, &rd_bd_op, NULL);
#endif
blksize_size[MAJOR_NR] = rd_blocksizes; /* Avoid set_blocksize() check */
@@ -565,462 +555,3 @@ MODULE_PARM (rd_blocksize, "i");
MODULE_PARM_DESC(rd_blocksize, "Blocksize of each RAM disk in bytes.");
MODULE_LICENSE("GPL");
-
-/* End of non-loading portions of the RAM disk driver */
-
-#ifdef RD_LOADER
-/*
- * This routine tries to find a RAM disk image to load, and returns the
- * number of blocks to read for a non-compressed image, 0 if the image
- * is a compressed image, and -1 if an image with the right magic
- * numbers could not be found.
- *
- * We currently check for the following magic numbers:
- * minix
- * ext2
- * romfs
- * gzip
- */
-static int __init
-identify_ramdisk_image(kdev_t device, struct file *fp, int start_block)
-{
- const int size = 512;
- struct minix_super_block *minixsb;
- struct ext2_super_block *ext2sb;
- struct romfs_super_block *romfsb;
- int nblocks = -1;
- unsigned char *buf;
-
- buf = kmalloc(size, GFP_KERNEL);
- if (buf == 0)
- return -1;
-
- minixsb = (struct minix_super_block *) buf;
- ext2sb = (struct ext2_super_block *) buf;
- romfsb = (struct romfs_super_block *) buf;
- memset(buf, 0xe5, size);
-
- /*
- * Read block 0 to test for gzipped kernel
- */
- if (fp->f_op->llseek)
- fp->f_op->llseek(fp, start_block * BLOCK_SIZE, 0);
- fp->f_pos = start_block * BLOCK_SIZE;
-
- fp->f_op->read(fp, buf, size, &fp->f_pos);
-
- /*
- * If it matches the gzip magic numbers, return -1
- */
- if (buf[0] == 037 && ((buf[1] == 0213) || (buf[1] == 0236))) {
- printk(KERN_NOTICE
- "RAMDISK: Compressed image found at block %d\n",
- start_block);
- nblocks = 0;
- goto done;
- }
-
- /* romfs is at block zero too */
- if (romfsb->word0 == ROMSB_WORD0 &&
- romfsb->word1 == ROMSB_WORD1) {
- printk(KERN_NOTICE
- "RAMDISK: romfs filesystem found at block %d\n",
- start_block);
- nblocks = (ntohl(romfsb->size)+BLOCK_SIZE-1)>>BLOCK_SIZE_BITS;
- goto done;
- }
-
- /*
- * Read block 1 to test for minix and ext2 superblock
- */
- if (fp->f_op->llseek)
- fp->f_op->llseek(fp, (start_block+1) * BLOCK_SIZE, 0);
- fp->f_pos = (start_block+1) * BLOCK_SIZE;
-
- fp->f_op->read(fp, buf, size, &fp->f_pos);
-
- /* Try minix */
- if (minixsb->s_magic == MINIX_SUPER_MAGIC ||
- minixsb->s_magic == MINIX_SUPER_MAGIC2) {
- printk(KERN_NOTICE
- "RAMDISK: Minix filesystem found at block %d\n",
- start_block);
- nblocks = minixsb->s_nzones << minixsb->s_log_zone_size;
- goto done;
- }
-
- /* Try ext2 */
- if (ext2sb->s_magic == cpu_to_le16(EXT2_SUPER_MAGIC)) {
- printk(KERN_NOTICE
- "RAMDISK: ext2 filesystem found at block %d\n",
- start_block);
- nblocks = le32_to_cpu(ext2sb->s_blocks_count);
- goto done;
- }
-
- printk(KERN_NOTICE
- "RAMDISK: Couldn't find valid RAM disk image starting at %d.\n",
- start_block);
-
-done:
- if (fp->f_op->llseek)
- fp->f_op->llseek(fp, start_block * BLOCK_SIZE, 0);
- fp->f_pos = start_block * BLOCK_SIZE;
-
- kfree(buf);
- return nblocks;
-}
-
-/*
- * This routine loads in the RAM disk image.
- */
-static void __init rd_load_image(kdev_t device, int offset, int unit)
-{
- struct inode *inode, *out_inode;
- struct file infile, outfile;
- struct dentry in_dentry, out_dentry;
- mm_segment_t fs;
- kdev_t ram_device;
- int nblocks, i;
- char *buf;
- unsigned short rotate = 0;
- unsigned short devblocks = 0;
-#if !defined(CONFIG_ARCH_S390) && !defined(CONFIG_PPC_ISERIES)
- char rotator[4] = { '|' , '/' , '-' , '\\' };
-#endif
- ram_device = MKDEV(MAJOR_NR, unit);
-
- if ((inode = get_empty_inode()) == NULL)
- return;
- memset(&infile, 0, sizeof(infile));
- memset(&in_dentry, 0, sizeof(in_dentry));
- infile.f_mode = 1; /* read only */
- infile.f_dentry = &in_dentry;
- in_dentry.d_inode = inode;
- infile.f_op = &def_blk_fops;
- init_special_inode(inode, S_IFBLK | S_IRUSR, kdev_t_to_nr(device));
-
- if ((out_inode = get_empty_inode()) == NULL)
- goto free_inode;
- memset(&outfile, 0, sizeof(outfile));
- memset(&out_dentry, 0, sizeof(out_dentry));
- outfile.f_mode = 3; /* read/write */
- outfile.f_dentry = &out_dentry;
- out_dentry.d_inode = out_inode;
- outfile.f_op = &def_blk_fops;
- init_special_inode(out_inode, S_IFBLK | S_IRUSR | S_IWUSR, kdev_t_to_nr(ram_device));
-
- if (blkdev_open(inode, &infile) != 0) {
- iput(out_inode);
- goto free_inode;
- }
- if (blkdev_open(out_inode, &outfile) != 0)
- goto free_inodes;
-
- fs = get_fs();
- set_fs(KERNEL_DS);
-
- nblocks = identify_ramdisk_image(device, &infile, offset);
- if (nblocks < 0)
- goto done;
-
- if (nblocks == 0) {
-#ifdef BUILD_CRAMDISK
- if (crd_load(&infile, &outfile) == 0)
- goto successful_load;
-#else
- printk(KERN_NOTICE
- "RAMDISK: Kernel does not support compressed "
- "RAM disk images\n");
-#endif
- goto done;
- }
-
- /*
- * NOTE NOTE: nblocks suppose that the blocksize is BLOCK_SIZE, so
- * rd_load_image will work only with filesystem BLOCK_SIZE wide!
- * So make sure to use 1k blocksize while generating ext2fs
- * ramdisk-images.
- */
- if (nblocks > (rd_length[unit] >> BLOCK_SIZE_BITS)) {
- printk("RAMDISK: image too big! (%d/%ld blocks)\n",
- nblocks, rd_length[unit] >> BLOCK_SIZE_BITS);
- goto done;
- }
-
- /*
- * OK, time to copy in the data
- */
- buf = kmalloc(BLOCK_SIZE, GFP_KERNEL);
- if (buf == 0) {
- printk(KERN_ERR "RAMDISK: could not allocate buffer\n");
- goto done;
- }
-
- if (blk_size[MAJOR(device)])
- devblocks = blk_size[MAJOR(device)][MINOR(device)];
-
-#ifdef CONFIG_BLK_DEV_INITRD
- if (MAJOR(device) == MAJOR_NR && MINOR(device) == INITRD_MINOR)
- devblocks = nblocks;
-#endif
-
- if (devblocks == 0) {
- printk(KERN_ERR "RAMDISK: could not determine device size\n");
- goto done;
- }
-
- printk(KERN_NOTICE "RAMDISK: Loading %d blocks [%d disk%s] into ram disk... ",
- nblocks, ((nblocks-1)/devblocks)+1, nblocks>devblocks ? "s" : "");
- for (i=0; i < nblocks; i++) {
- if (i && (i % devblocks == 0)) {
- printk("done disk #%d.\n", i/devblocks);
- rotate = 0;
- if (infile.f_op->release(inode, &infile) != 0) {
- printk("Error closing the disk.\n");
- goto noclose_input;
- }
- printk("Please insert disk #%d and press ENTER\n", i/devblocks+1);
- wait_for_keypress();
- if (blkdev_open(inode, &infile) != 0) {
- printk("Error opening disk.\n");
- goto noclose_input;
- }
- infile.f_pos = 0;
- printk("Loading disk #%d... ", i/devblocks+1);
- }
- infile.f_op->read(&infile, buf, BLOCK_SIZE, &infile.f_pos);
- outfile.f_op->write(&outfile, buf, BLOCK_SIZE, &outfile.f_pos);
-#if !defined(CONFIG_ARCH_S390) && !defined(CONFIG_PPC_ISERIES)
- if (!(i % 16)) {
- printk("%c\b", rotator[rotate & 0x3]);
- rotate++;
- }
-#endif
- }
- printk("done.\n");
- kfree(buf);
-
-successful_load:
- ROOT_DEV = MKDEV(MAJOR_NR, unit);
- if (ROOT_DEVICE_NAME != NULL) strcpy (ROOT_DEVICE_NAME, "rd/0");
-
-done:
- infile.f_op->release(inode, &infile);
-noclose_input:
- blkdev_close(out_inode, &outfile);
- iput(inode);
- iput(out_inode);
- set_fs(fs);
- return;
-free_inodes: /* free inodes on error */
- iput(out_inode);
- infile.f_op->release(inode, &infile);
-free_inode:
- iput(inode);
-}
-
-#ifdef CONFIG_MAC_FLOPPY
-int swim3_fd_eject(int devnum);
-#endif
-
-static void __init rd_load_disk(int n)
-{
-
- if (rd_doload == 0)
- return;
-
- if (MAJOR(ROOT_DEV) != FLOPPY_MAJOR
-#ifdef CONFIG_BLK_DEV_INITRD
- && MAJOR(real_root_dev) != FLOPPY_MAJOR
-#endif
- )
- return;
-
- if (rd_prompt) {
-#ifdef CONFIG_BLK_DEV_FD
- floppy_eject();
-#endif
-#ifdef CONFIG_MAC_FLOPPY
- if(MAJOR(ROOT_DEV) == FLOPPY_MAJOR)
- swim3_fd_eject(MINOR(ROOT_DEV));
- else if(MAJOR(real_root_dev) == FLOPPY_MAJOR)
- swim3_fd_eject(MINOR(real_root_dev));
-#endif
- printk(KERN_NOTICE
- "VFS: Insert root floppy disk to be loaded into RAM disk and press ENTER\n");
- wait_for_keypress();
- }
-
- rd_load_image(ROOT_DEV,rd_image_start, n);
-
-}
-
-void __init rd_load(void)
-{
- rd_load_disk(0);
-}
-
-void __init rd_load_secondary(void)
-{
- rd_load_disk(1);
-}
-
-#ifdef CONFIG_BLK_DEV_INITRD
-void __init initrd_load(void)
-{
- rd_load_image(MKDEV(MAJOR_NR, INITRD_MINOR),rd_image_start,0);
-}
-#endif
-
-#endif /* RD_LOADER */
-
-#ifdef BUILD_CRAMDISK
-
-/*
- * gzip declarations
- */
-
-#define OF(args) args
-
-#ifndef memzero
-#define memzero(s, n) memset ((s), 0, (n))
-#endif
-
-typedef unsigned char uch;
-typedef unsigned short ush;
-typedef unsigned long ulg;
-
-#define INBUFSIZ 4096
-#define WSIZE 0x8000 /* window size--must be a power of two, and */
- /* at least 32K for zip's deflate method */
-
-static uch *inbuf;
-static uch *window;
-
-static unsigned insize; /* valid bytes in inbuf */
-static unsigned inptr; /* index of next byte to be processed in inbuf */
-static unsigned outcnt; /* bytes in output buffer */
-static int exit_code;
-static long bytes_out;
-static struct file *crd_infp, *crd_outfp;
-
-#define get_byte() (inptr < insize ? inbuf[inptr++] : fill_inbuf())
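
For readability, an equivalent function form of the get_byte() macro above; the name get_byte_sketch is hypothetical:

	static int get_byte_sketch(void)
	{
		/* serve bytes from inbuf until inptr catches up with insize,
		 * then refill via fill_inbuf(), which returns -1 on EOF/error */
		if (inptr < insize)
			return inbuf[inptr++];
		return fill_inbuf();
	}
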
-
-/* Diagnostic functions (stubbed out) */
-#define Assert(cond,msg)
-#define Trace(x)
-#define Tracev(x)
-#define Tracevv(x)
-#define Tracec(c,x)
-#define Tracecv(c,x)
-
-#define STATIC static
-
-static int fill_inbuf(void);
-static void flush_window(void);
-static void *malloc(int size);
-static void free(void *where);
-static void error(char *m);
-static void gzip_mark(void **);
-static void gzip_release(void **);
-
-#include "../../lib/inflate.c"
-
-static void __init *malloc(int size)
-{
- return kmalloc(size, GFP_KERNEL);
-}
-
-static void __init free(void *where)
-{
- kfree(where);
-}
-
-static void __init gzip_mark(void **ptr)
-{
-}
-
-static void __init gzip_release(void **ptr)
-{
-}
-
-
-/* ===========================================================================
- * Fill the input buffer. This is called only when the buffer is empty
- * and at least one byte is really needed.
- */
-static int __init fill_inbuf(void)
-{
- if (exit_code) return -1;
-
- insize = crd_infp->f_op->read(crd_infp, inbuf, INBUFSIZ,
- &crd_infp->f_pos);
- if (insize == 0) return -1;
-
- inptr = 1;
-
- return inbuf[0];
-}
-
-/* ===========================================================================
- * Write the output window window[0..outcnt-1] and update crc and bytes_out.
- * (Used for the decompressed data only.)
- */
-static void __init flush_window(void)
-{
- ulg c = crc; /* temporary variable */
- unsigned n;
- uch *in, ch;
-
- crd_outfp->f_op->write(crd_outfp, window, outcnt, &crd_outfp->f_pos);
- in = window;
- for (n = 0; n < outcnt; n++) {
- ch = *in++;
- c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8);
- }
- crc = c;
- bytes_out += (ulg)outcnt;
- outcnt = 0;
-}
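
The loop above folds each decompressed byte into the CRC-32 shift register using the table from inflate.c; a one-byte sketch of that step (crc32_step is a hypothetical name):

	static ulg crc32_step(ulg c, uch ch)
	{
		/* table-driven CRC-32: xor the byte in, then consume 8 bits */
		return crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8);
	}
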
-
-static void __init error(char *x)
-{
- printk(KERN_ERR "%s", x);
- exit_code = 1;
-}
-
-static int __init
-crd_load(struct file * fp, struct file *outfp)
-{
- int result;
-
- insize = 0; /* valid bytes in inbuf */
- inptr = 0; /* index of next byte to be processed in inbuf */
- outcnt = 0; /* bytes in output buffer */
- exit_code = 0;
- bytes_out = 0;
- crc = (ulg)0xffffffffL; /* shift register contents */
-
- crd_infp = fp;
- crd_outfp = outfp;
- inbuf = kmalloc(INBUFSIZ, GFP_KERNEL);
- if (inbuf == 0) {
- printk(KERN_ERR "RAMDISK: Couldn't allocate gzip buffer\n");
- return -1;
- }
- window = kmalloc(WSIZE, GFP_KERNEL);
- if (window == 0) {
- printk(KERN_ERR "RAMDISK: Couldn't allocate gzip window\n");
- kfree(inbuf);
- return -1;
- }
- makecrc();
- result = gunzip();
- kfree(inbuf);
- kfree(window);
- return result;
-}
-
-#endif /* BUILD_CRAMDISK */
-
diff --git a/drivers/ide/ide-probe.c b/drivers/ide/ide-probe.c
index f076200dd..3f93dcc90 100644
--- a/drivers/ide/ide-probe.c
+++ b/drivers/ide/ide-probe.c
@@ -597,7 +597,7 @@ static void ide_init_queue(ide_drive_t *drive)
int max_sectors;
q->queuedata = HWGROUP(drive);
- blk_init_queue(q, do_ide_request);
+ blk_init_queue(q, do_ide_request, &ide_lock);
blk_queue_segment_boundary(q, 0xffff);
/* IDE can do up to 128K per request, pdc4030 needs smaller limit */
diff --git a/drivers/ide/ide.c b/drivers/ide/ide.c
index 8a941308a..c1b19e1d9 100644
--- a/drivers/ide/ide.c
+++ b/drivers/ide/ide.c
@@ -177,8 +177,6 @@ static int initializing; /* set while initializing built-in drivers */
/*
* protects global structures etc, we want to split this into per-hwgroup
* instead.
- *
- * anti-deadlock ordering: ide_lock -> DRIVE_LOCK
*/
spinlock_t ide_lock __cacheline_aligned = SPIN_LOCK_UNLOCKED;
@@ -583,11 +581,9 @@ inline int __ide_end_request(ide_hwgroup_t *hwgroup, int uptodate, int nr_secs)
if (!end_that_request_first(rq, uptodate, nr_secs)) {
add_blkdev_randomness(MAJOR(rq->rq_dev));
- spin_lock(DRIVE_LOCK(drive));
blkdev_dequeue_request(rq);
hwgroup->rq = NULL;
end_that_request_last(rq);
- spin_unlock(DRIVE_LOCK(drive));
ret = 0;
}
@@ -900,11 +896,9 @@ void ide_end_drive_cmd (ide_drive_t *drive, byte stat, byte err)
}
}
- spin_lock(DRIVE_LOCK(drive));
blkdev_dequeue_request(rq);
HWGROUP(drive)->rq = NULL;
end_that_request_last(rq);
- spin_unlock(DRIVE_LOCK(drive));
spin_unlock_irqrestore(&ide_lock, flags);
}
@@ -1368,7 +1362,7 @@ repeat:
/*
* Issue a new request to a drive from hwgroup
- * Caller must have already done spin_lock_irqsave(DRIVE_LOCK(drive), ...)
+ * Caller must have already done spin_lock_irqsave(&ide_lock, ...)
*
* A hwgroup is a serialized group of IDE interfaces. Usually there is
* exactly one hwif (interface) per hwgroup, but buggy controllers (eg. CMD640)
@@ -1456,9 +1450,7 @@ static void ide_do_request(ide_hwgroup_t *hwgroup, int masked_irq)
/*
* maybe we are just continuing an interrupted request
*/
- spin_lock(DRIVE_LOCK(drive));
rq = hwgroup->rq = elv_next_request(&drive->queue);
- spin_unlock(DRIVE_LOCK(drive));
/*
* Some systems have trouble with IDE IRQs arriving while
@@ -1496,19 +1488,7 @@ request_queue_t *ide_get_queue (kdev_t dev)
*/
void do_ide_request(request_queue_t *q)
{
- unsigned long flags;
-
- /*
- * release queue lock, grab IDE global lock and restore when
- * we leave...
- */
- spin_unlock(&q->queue_lock);
-
- spin_lock_irqsave(&ide_lock, flags);
ide_do_request(q->queuedata, 0);
- spin_unlock_irqrestore(&ide_lock, flags);
-
- spin_lock(&q->queue_lock);
}
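
With the blk_init_queue() change in ide-probe.c above, the queue lock is &ide_lock itself, so the block layer already holds it around the request function; a sketch of the assumed calling convention (caller_sketch is hypothetical, not patch code):

	static void caller_sketch(request_queue_t *q)
	{
		unsigned long flags;

		spin_lock_irqsave(&ide_lock, flags);	/* the queue lock is now ide_lock */
		do_ide_request(q);			/* no lock juggling inside        */
		spin_unlock_irqrestore(&ide_lock, flags);
	}
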
/*
@@ -1875,7 +1855,6 @@ int ide_do_drive_cmd (ide_drive_t *drive, struct request *rq, ide_action_t actio
if (action == ide_wait)
rq->waiting = &wait;
spin_lock_irqsave(&ide_lock, flags);
- spin_lock(DRIVE_LOCK(drive));
if (blk_queue_empty(&drive->queue) || action == ide_preempt) {
if (action == ide_preempt)
hwgroup->rq = NULL;
@@ -1886,7 +1865,6 @@ int ide_do_drive_cmd (ide_drive_t *drive, struct request *rq, ide_action_t actio
queue_head = queue_head->next;
}
q->elevator.elevator_add_req_fn(q, rq, queue_head);
- spin_unlock(DRIVE_LOCK(drive));
ide_do_request(hwgroup, 0);
spin_unlock_irqrestore(&ide_lock, flags);
if (action == ide_wait) {
diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index c40dd3a1b..b65a67357 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -189,7 +189,7 @@ static mdk_personality_t linear_personality=
status: linear_status,
};
-static int md__init linear_init (void)
+static int __init linear_init (void)
{
return register_md_personality (LINEAR, &linear_personality);
}
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 8fe839aff..2ddc2d188 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -130,7 +130,7 @@ static struct gendisk md_gendisk=
/*
* Enables iteration over all existing md arrays
*/
-static MD_LIST_HEAD(all_mddevs);
+static LIST_HEAD(all_mddevs);
/*
* The mapping between kdev and mddev is not necessarily a simple
@@ -201,8 +201,8 @@ static mddev_t * alloc_mddev(kdev_t dev)
init_MUTEX(&mddev->reconfig_sem);
init_MUTEX(&mddev->recovery_sem);
init_MUTEX(&mddev->resync_sem);
- MD_INIT_LIST_HEAD(&mddev->disks);
- MD_INIT_LIST_HEAD(&mddev->all_mddevs);
+ INIT_LIST_HEAD(&mddev->disks);
+ INIT_LIST_HEAD(&mddev->all_mddevs);
atomic_set(&mddev->active, 0);
/*
@@ -211,7 +211,7 @@ static mddev_t * alloc_mddev(kdev_t dev)
* if necessary.
*/
add_mddev_mapping(mddev, dev, 0);
- md_list_add(&mddev->all_mddevs, &all_mddevs);
+ list_add(&mddev->all_mddevs, &all_mddevs);
MOD_INC_USE_COUNT;
@@ -221,7 +221,7 @@ static mddev_t * alloc_mddev(kdev_t dev)
mdk_rdev_t * find_rdev_nr(mddev_t *mddev, int nr)
{
mdk_rdev_t * rdev;
- struct md_list_head *tmp;
+ struct list_head *tmp;
ITERATE_RDEV(mddev,rdev,tmp) {
if (rdev->desc_nr == nr)
@@ -232,7 +232,7 @@ mdk_rdev_t * find_rdev_nr(mddev_t *mddev, int nr)
mdk_rdev_t * find_rdev(mddev_t * mddev, kdev_t dev)
{
- struct md_list_head *tmp;
+ struct list_head *tmp;
mdk_rdev_t *rdev;
ITERATE_RDEV(mddev,rdev,tmp) {
@@ -242,17 +242,17 @@ mdk_rdev_t * find_rdev(mddev_t * mddev, kdev_t dev)
return NULL;
}
-static MD_LIST_HEAD(device_names);
+static LIST_HEAD(device_names);
char * partition_name(kdev_t dev)
{
struct gendisk *hd;
static char nomem [] = "<nomem>";
dev_name_t *dname;
- struct md_list_head *tmp = device_names.next;
+ struct list_head *tmp = device_names.next;
while (tmp != &device_names) {
- dname = md_list_entry(tmp, dev_name_t, list);
+ dname = list_entry(tmp, dev_name_t, list);
if (dname->dev == dev)
return dname->name;
tmp = tmp->next;
@@ -275,8 +275,8 @@ char * partition_name(kdev_t dev)
}
dname->dev = dev;
- MD_INIT_LIST_HEAD(&dname->list);
- md_list_add(&dname->list, &device_names);
+ INIT_LIST_HEAD(&dname->list);
+ list_add(&dname->list, &device_names);
return dname->name;
}
@@ -311,7 +311,7 @@ static unsigned int zoned_raid_size(mddev_t *mddev)
{
unsigned int mask;
mdk_rdev_t * rdev;
- struct md_list_head *tmp;
+ struct list_head *tmp;
if (!mddev->sb) {
MD_BUG();
@@ -341,7 +341,7 @@ int md_check_ordering(mddev_t *mddev)
{
int i, c;
mdk_rdev_t *rdev;
- struct md_list_head *tmp;
+ struct list_head *tmp;
/*
* First, all devices must be fully functional
@@ -435,7 +435,7 @@ static int alloc_array_sb(mddev_t * mddev)
mddev->sb = (mdp_super_t *) __get_free_page (GFP_KERNEL);
if (!mddev->sb)
return -ENOMEM;
- md_clear_page(mddev->sb);
+ clear_page(mddev->sb);
return 0;
}
@@ -449,7 +449,7 @@ static int alloc_disk_sb(mdk_rdev_t * rdev)
printk(OUT_OF_MEM);
return -EINVAL;
}
- md_clear_page(rdev->sb);
+ clear_page(rdev->sb);
return 0;
}
@@ -564,7 +564,7 @@ static kdev_t dev_unit(kdev_t dev)
static mdk_rdev_t * match_dev_unit(mddev_t *mddev, kdev_t dev)
{
- struct md_list_head *tmp;
+ struct list_head *tmp;
mdk_rdev_t *rdev;
ITERATE_RDEV(mddev,rdev,tmp)
@@ -576,7 +576,7 @@ static mdk_rdev_t * match_dev_unit(mddev_t *mddev, kdev_t dev)
static int match_mddev_units(mddev_t *mddev1, mddev_t *mddev2)
{
- struct md_list_head *tmp;
+ struct list_head *tmp;
mdk_rdev_t *rdev;
ITERATE_RDEV(mddev1,rdev,tmp)
@@ -586,8 +586,8 @@ static int match_mddev_units(mddev_t *mddev1, mddev_t *mddev2)
return 0;
}
-static MD_LIST_HEAD(all_raid_disks);
-static MD_LIST_HEAD(pending_raid_disks);
+static LIST_HEAD(all_raid_disks);
+static LIST_HEAD(pending_raid_disks);
static void bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev)
{
@@ -605,7 +605,7 @@ static void bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev)
mdidx(mddev), partition_name(rdev->dev),
partition_name(same_pdev->dev));
- md_list_add(&rdev->same_set, &mddev->disks);
+ list_add(&rdev->same_set, &mddev->disks);
rdev->mddev = mddev;
mddev->nb_dev++;
printk(KERN_INFO "md: bind<%s,%d>\n", partition_name(rdev->dev), mddev->nb_dev);
@@ -617,8 +617,8 @@ static void unbind_rdev_from_array(mdk_rdev_t * rdev)
MD_BUG();
return;
}
- md_list_del(&rdev->same_set);
- MD_INIT_LIST_HEAD(&rdev->same_set);
+ list_del(&rdev->same_set);
+ INIT_LIST_HEAD(&rdev->same_set);
rdev->mddev->nb_dev--;
printk(KERN_INFO "md: unbind<%s,%d>\n", partition_name(rdev->dev),
rdev->mddev->nb_dev);
@@ -664,13 +664,13 @@ static void export_rdev(mdk_rdev_t * rdev)
MD_BUG();
unlock_rdev(rdev);
free_disk_sb(rdev);
- md_list_del(&rdev->all);
- MD_INIT_LIST_HEAD(&rdev->all);
+ list_del(&rdev->all);
+ INIT_LIST_HEAD(&rdev->all);
if (rdev->pending.next != &rdev->pending) {
printk(KERN_INFO "md: (%s was pending)\n",
partition_name(rdev->dev));
- md_list_del(&rdev->pending);
- MD_INIT_LIST_HEAD(&rdev->pending);
+ list_del(&rdev->pending);
+ INIT_LIST_HEAD(&rdev->pending);
}
#ifndef MODULE
md_autodetect_dev(rdev->dev);
@@ -688,7 +688,7 @@ static void kick_rdev_from_array(mdk_rdev_t * rdev)
static void export_array(mddev_t *mddev)
{
- struct md_list_head *tmp;
+ struct list_head *tmp;
mdk_rdev_t *rdev;
mdp_super_t *sb = mddev->sb;
@@ -723,14 +723,14 @@ static void free_mddev(mddev_t *mddev)
* Make sure nobody else is using this mddev
* (careful, we rely on the global kernel lock here)
*/
- while (md_atomic_read(&mddev->resync_sem.count) != 1)
+ while (atomic_read(&mddev->resync_sem.count) != 1)
schedule();
- while (md_atomic_read(&mddev->recovery_sem.count) != 1)
+ while (atomic_read(&mddev->recovery_sem.count) != 1)
schedule();
del_mddev_mapping(mddev, MKDEV(MD_MAJOR, mdidx(mddev)));
- md_list_del(&mddev->all_mddevs);
- MD_INIT_LIST_HEAD(&mddev->all_mddevs);
+ list_del(&mddev->all_mddevs);
+ INIT_LIST_HEAD(&mddev->all_mddevs);
kfree(mddev);
MOD_DEC_USE_COUNT;
}
@@ -793,7 +793,7 @@ static void print_rdev(mdk_rdev_t *rdev)
void md_print_devices(void)
{
- struct md_list_head *tmp, *tmp2;
+ struct list_head *tmp, *tmp2;
mdk_rdev_t *rdev;
mddev_t *mddev;
@@ -871,12 +871,12 @@ static int uuid_equal(mdk_rdev_t *rdev1, mdk_rdev_t *rdev2)
static mdk_rdev_t * find_rdev_all(kdev_t dev)
{
- struct md_list_head *tmp;
+ struct list_head *tmp;
mdk_rdev_t *rdev;
tmp = all_raid_disks.next;
while (tmp != &all_raid_disks) {
- rdev = md_list_entry(tmp, mdk_rdev_t, all);
+ rdev = list_entry(tmp, mdk_rdev_t, all);
if (rdev->dev == dev)
return rdev;
tmp = tmp->next;
@@ -980,7 +980,7 @@ static int sync_sbs(mddev_t * mddev)
{
mdk_rdev_t *rdev;
mdp_super_t *sb;
- struct md_list_head *tmp;
+ struct list_head *tmp;
ITERATE_RDEV(mddev,rdev,tmp) {
if (rdev->faulty || rdev->alias_device)
@@ -996,15 +996,15 @@ static int sync_sbs(mddev_t * mddev)
int md_update_sb(mddev_t * mddev)
{
int err, count = 100;
- struct md_list_head *tmp;
+ struct list_head *tmp;
mdk_rdev_t *rdev;
repeat:
mddev->sb->utime = CURRENT_TIME;
- if ((++mddev->sb->events_lo)==0)
+ if (!(++mddev->sb->events_lo))
++mddev->sb->events_hi;
- if ((mddev->sb->events_lo|mddev->sb->events_hi)==0) {
+ if (!(mddev->sb->events_lo | mddev->sb->events_hi)) {
/*
* oops, this 64-bit counter should never wrap.
* Either we are in around ~1 trillion A.C., assuming
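
events_lo and events_hi together behave as one 64-bit event counter, with the increment above carrying into the high word when the low word wraps to zero; a sketch of how the halves compose (md_events is a hypothetical helper):

	static inline u64 md_events(mdp_super_t *sb)
	{
		/* low word wraps to 0 -> the code above bumps events_hi */
		return ((u64)sb->events_hi << 32) | sb->events_lo;
	}
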
@@ -1128,8 +1128,8 @@ static int md_import_device(kdev_t newdev, int on_disk)
rdev->desc_nr = -1;
}
}
- md_list_add(&rdev->all, &all_raid_disks);
- MD_INIT_LIST_HEAD(&rdev->pending);
+ list_add(&rdev->all, &all_raid_disks);
+ INIT_LIST_HEAD(&rdev->pending);
if (rdev->faulty && rdev->sb)
free_disk_sb(rdev);
@@ -1167,7 +1167,7 @@ abort_free:
static int analyze_sbs(mddev_t * mddev)
{
int out_of_date = 0, i, first;
- struct md_list_head *tmp, *tmp2;
+ struct list_head *tmp, *tmp2;
mdk_rdev_t *rdev, *rdev2, *freshest;
mdp_super_t *sb;
@@ -1225,7 +1225,7 @@ static int analyze_sbs(mddev_t * mddev)
*/
if (calc_sb_csum(rdev->sb) != rdev->sb->sb_csum) {
if (rdev->sb->events_lo || rdev->sb->events_hi)
- if ((rdev->sb->events_lo--)==0)
+ if (!(rdev->sb->events_lo--))
rdev->sb->events_hi--;
}
@@ -1513,7 +1513,7 @@ static int device_size_calculation(mddev_t * mddev)
int data_disks = 0, persistent;
unsigned int readahead;
mdp_super_t *sb = mddev->sb;
- struct md_list_head *tmp;
+ struct list_head *tmp;
mdk_rdev_t *rdev;
/*
@@ -1572,7 +1572,7 @@ static int device_size_calculation(mddev_t * mddev)
md_size[mdidx(mddev)] = sb->size * data_disks;
readahead = MD_READAHEAD;
- if ((sb->level == 0) || (sb->level == 4) || (sb->level == 5)) {
+ if (!sb->level || (sb->level == 4) || (sb->level == 5)) {
readahead = (mddev->sb->chunk_size>>PAGE_SHIFT) * 4 * data_disks;
if (readahead < data_disks * (MAX_SECTORS>>(PAGE_SHIFT-9))*2)
readahead = data_disks * (MAX_SECTORS>>(PAGE_SHIFT-9))*2;
@@ -1608,7 +1608,7 @@ static int do_md_run(mddev_t * mddev)
{
int pnum, err;
int chunk_size;
- struct md_list_head *tmp;
+ struct list_head *tmp;
mdk_rdev_t *rdev;
@@ -1873,7 +1873,7 @@ int detect_old_array(mdp_super_t *sb)
static void autorun_array(mddev_t *mddev)
{
mdk_rdev_t *rdev;
- struct md_list_head *tmp;
+ struct list_head *tmp;
int err;
if (mddev->disks.prev == &mddev->disks) {
@@ -1913,8 +1913,8 @@ static void autorun_array(mddev_t *mddev)
*/
static void autorun_devices(kdev_t countdev)
{
- struct md_list_head candidates;
- struct md_list_head *tmp;
+ struct list_head candidates;
+ struct list_head *tmp;
mdk_rdev_t *rdev0, *rdev;
mddev_t *mddev;
kdev_t md_kdev;
@@ -1922,11 +1922,11 @@ static void autorun_devices(kdev_t countdev)
printk(KERN_INFO "md: autorun ...\n");
while (pending_raid_disks.next != &pending_raid_disks) {
- rdev0 = md_list_entry(pending_raid_disks.next,
+ rdev0 = list_entry(pending_raid_disks.next,
mdk_rdev_t, pending);
printk(KERN_INFO "md: considering %s ...\n", partition_name(rdev0->dev));
- MD_INIT_LIST_HEAD(&candidates);
+ INIT_LIST_HEAD(&candidates);
ITERATE_RDEV_PENDING(rdev,tmp) {
if (uuid_equal(rdev0, rdev)) {
if (!sb_equal(rdev0->sb, rdev->sb)) {
@@ -1936,8 +1936,8 @@ static void autorun_devices(kdev_t countdev)
continue;
}
printk(KERN_INFO "md: adding %s ...\n", partition_name(rdev->dev));
- md_list_del(&rdev->pending);
- md_list_add(&rdev->pending, &candidates);
+ list_del(&rdev->pending);
+ list_add(&rdev->pending, &candidates);
}
}
/*
@@ -1964,8 +1964,8 @@ static void autorun_devices(kdev_t countdev)
printk(KERN_INFO "md: created md%d\n", mdidx(mddev));
ITERATE_RDEV_GENERIC(candidates,pending,rdev,tmp) {
bind_rdev_to_array(rdev, mddev);
- md_list_del(&rdev->pending);
- MD_INIT_LIST_HEAD(&rdev->pending);
+ list_del(&rdev->pending);
+ INIT_LIST_HEAD(&rdev->pending);
}
autorun_array(mddev);
}
@@ -2025,7 +2025,7 @@ static int autostart_array(kdev_t startdev, kdev_t countdev)
partition_name(startdev));
goto abort;
}
- md_list_add(&start_rdev->pending, &pending_raid_disks);
+ list_add(&start_rdev->pending, &pending_raid_disks);
sb = start_rdev->sb;
@@ -2058,7 +2058,7 @@ static int autostart_array(kdev_t startdev, kdev_t countdev)
MD_BUG();
goto abort;
}
- md_list_add(&rdev->pending, &pending_raid_disks);
+ list_add(&rdev->pending, &pending_raid_disks);
}
/*
@@ -2091,7 +2091,7 @@ static int get_version(void * arg)
ver.minor = MD_MINOR_VERSION;
ver.patchlevel = MD_PATCHLEVEL_VERSION;
- if (md_copy_to_user(arg, &ver, sizeof(ver)))
+ if (copy_to_user(arg, &ver, sizeof(ver)))
return -EFAULT;
return 0;
@@ -2128,7 +2128,7 @@ static int get_array_info(mddev_t * mddev, void * arg)
SET_FROM_SB(layout);
SET_FROM_SB(chunk_size);
- if (md_copy_to_user(arg, &info, sizeof(info)))
+ if (copy_to_user(arg, &info, sizeof(info)))
return -EFAULT;
return 0;
@@ -2144,7 +2144,7 @@ static int get_disk_info(mddev_t * mddev, void * arg)
if (!mddev->sb)
return -EINVAL;
- if (md_copy_from_user(&info, arg, sizeof(info)))
+ if (copy_from_user(&info, arg, sizeof(info)))
return -EFAULT;
nr = info.number;
@@ -2156,7 +2156,7 @@ static int get_disk_info(mddev_t * mddev, void * arg)
SET_FROM_SB(raid_disk);
SET_FROM_SB(state);
- if (md_copy_to_user(arg, &info, sizeof(info)))
+ if (copy_to_user(arg, &info, sizeof(info)))
return -EFAULT;
return 0;
@@ -2191,7 +2191,7 @@ static int add_new_disk(mddev_t * mddev, mdu_disk_info_t *info)
return -EINVAL;
}
if (mddev->nb_dev) {
- mdk_rdev_t *rdev0 = md_list_entry(mddev->disks.next,
+ mdk_rdev_t *rdev0 = list_entry(mddev->disks.next,
mdk_rdev_t, same_set);
if (!uuid_equal(rdev0, rdev)) {
printk(KERN_WARNING "md: %s has different UUID to %s\n",
@@ -2223,7 +2223,7 @@ static int add_new_disk(mddev_t * mddev, mdu_disk_info_t *info)
SET_SB(raid_disk);
SET_SB(state);
- if ((info->state & (1<<MD_DISK_FAULTY))==0) {
+ if (!(info->state & (1<<MD_DISK_FAULTY))) {
err = md_import_device (dev, 0);
if (err) {
printk(KERN_WARNING "md: error, md_import_device() returned %d\n", err);
@@ -2566,7 +2566,7 @@ static int md_ioctl(struct inode *inode, struct file *file,
mddev_t *mddev = NULL;
kdev_t dev;
- if (!md_capable_admin())
+ if (!capable(CAP_SYS_ADMIN))
return -EACCES;
dev = inode->i_rdev;
@@ -2604,12 +2604,12 @@ static int md_ioctl(struct inode *inode, struct file *file,
MD_BUG();
goto abort;
}
- err = md_put_user(md_hd_struct[minor].nr_sects,
+ err = put_user(md_hd_struct[minor].nr_sects,
(unsigned long *) arg);
goto done;
case BLKGETSIZE64: /* Return device size */
- err = md_put_user((u64)md_hd_struct[minor].nr_sects << 9,
+ err = put_user((u64)md_hd_struct[minor].nr_sects << 9,
(u64 *) arg);
goto done;
@@ -2618,7 +2618,7 @@ static int md_ioctl(struct inode *inode, struct file *file,
case BLKFLSBUF:
case BLKBSZGET:
case BLKBSZSET:
- err = blk_ioctl (dev, cmd, arg);
+ err = blk_ioctl(dev, cmd, arg);
goto abort;
default:;
@@ -2670,7 +2670,7 @@ static int md_ioctl(struct inode *inode, struct file *file,
}
if (arg) {
mdu_array_info_t info;
- if (md_copy_from_user(&info, (void*)arg, sizeof(info))) {
+ if (copy_from_user(&info, (void*)arg, sizeof(info))) {
err = -EFAULT;
goto abort_unlock;
}
@@ -2753,17 +2753,17 @@ static int md_ioctl(struct inode *inode, struct file *file,
err = -EINVAL;
goto abort_unlock;
}
- err = md_put_user (2, (char *) &loc->heads);
+ err = put_user (2, (char *) &loc->heads);
if (err)
goto abort_unlock;
- err = md_put_user (4, (char *) &loc->sectors);
+ err = put_user (4, (char *) &loc->sectors);
if (err)
goto abort_unlock;
- err = md_put_user (md_hd_struct[mdidx(mddev)].nr_sects/8,
+ err = put_user (md_hd_struct[mdidx(mddev)].nr_sects/8,
(short *) &loc->cylinders);
if (err)
goto abort_unlock;
- err = md_put_user (get_start_sect(dev),
+ err = put_user (get_start_sect(dev),
(long *) &loc->start);
goto done_unlock;
}
@@ -2787,7 +2787,7 @@ static int md_ioctl(struct inode *inode, struct file *file,
case ADD_NEW_DISK:
{
mdu_disk_info_t info;
- if (md_copy_from_user(&info, (void*)arg, sizeof(info)))
+ if (copy_from_user(&info, (void*)arg, sizeof(info)))
err = -EFAULT;
else
err = add_new_disk(mddev, &info);
@@ -2828,7 +2828,7 @@ static int md_ioctl(struct inode *inode, struct file *file,
{
/* The data is never used....
mdu_param_t param;
- err = md_copy_from_user(&param, (mdu_param_t *)arg,
+ err = copy_from_user(&param, (mdu_param_t *)arg,
sizeof(param));
if (err)
goto abort_unlock;
@@ -2887,7 +2887,7 @@ static int md_release(struct inode *inode, struct file * file)
return 0;
}
-static struct block_device_operations md_fops=
+static struct block_device_operations md_fops =
{
owner: THIS_MODULE,
open: md_open,
@@ -2896,11 +2896,18 @@ static struct block_device_operations md_fops=
};
+static inline void flush_curr_signals(void)
+{
+ spin_lock(&current->sigmask_lock);
+ flush_signals(current);
+ spin_unlock(&current->sigmask_lock);
+}
+
int md_thread(void * arg)
{
mdk_thread_t *thread = arg;
- md_lock_kernel();
+ lock_kernel();
/*
* Detach thread
@@ -2909,8 +2916,9 @@ int md_thread(void * arg)
daemonize();
sprintf(current->comm, thread->name);
- md_init_signals();
- md_flush_signals();
+ current->exit_signal = SIGCHLD;
+ siginitsetinv(&current->blocked, sigmask(SIGKILL));
+ flush_curr_signals();
thread->tsk = current;
/*
@@ -2926,7 +2934,7 @@ int md_thread(void * arg)
*/
current->policy = SCHED_OTHER;
current->nice = -20;
- md_unlock_kernel();
+ unlock_kernel();
complete(thread->event);
while (thread->run) {
@@ -2949,8 +2957,8 @@ int md_thread(void * arg)
run(thread->data);
run_task_queue(&tq_disk);
}
- if (md_signal_pending(current))
- md_flush_signals();
+ if (signal_pending(current))
+ flush_curr_signals();
}
complete(thread->event);
return 0;
@@ -2976,7 +2984,7 @@ mdk_thread_t *md_register_thread(void (*run) (void *),
return NULL;
memset(thread, 0, sizeof(mdk_thread_t));
- md_init_waitqueue_head(&thread->wqueue);
+ init_waitqueue_head(&thread->wqueue);
init_completion(&event);
thread->event = &event;
@@ -3064,7 +3072,7 @@ static int status_unused(char * page)
{
int sz = 0, i = 0;
mdk_rdev_t *rdev;
- struct md_list_head *tmp;
+ struct list_head *tmp;
sz += sprintf(page + sz, "unused devices: ");
@@ -3150,7 +3158,7 @@ static int md_status_read_proc(char *page, char **start, off_t off,
int count, int *eof, void *data)
{
int sz = 0, j, size;
- struct md_list_head *tmp, *tmp2;
+ struct list_head *tmp, *tmp2;
mdk_rdev_t *rdev;
mddev_t *mddev;
@@ -3207,7 +3215,7 @@ static int md_status_read_proc(char *page, char **start, off_t off,
if (mddev->curr_resync) {
sz += status_resync (page+sz, mddev);
} else {
- if (md_atomic_read(&mddev->resync_sem.count) != 1)
+ if (atomic_read(&mddev->resync_sem.count) != 1)
sz += sprintf(page + sz, " resync=DELAYED");
}
sz += sprintf(page + sz, "\n");
@@ -3251,7 +3259,7 @@ mdp_disk_t *get_spare(mddev_t *mddev)
mdp_super_t *sb = mddev->sb;
mdp_disk_t *disk;
mdk_rdev_t *rdev;
- struct md_list_head *tmp;
+ struct list_head *tmp;
ITERATE_RDEV(mddev,rdev,tmp) {
if (rdev->faulty)
@@ -3288,7 +3296,7 @@ void md_sync_acct(kdev_t dev, unsigned long nr_sectors)
static int is_mddev_idle(mddev_t *mddev)
{
mdk_rdev_t * rdev;
- struct md_list_head *tmp;
+ struct list_head *tmp;
int idle;
unsigned long curr_events;
@@ -3311,7 +3319,7 @@ static int is_mddev_idle(mddev_t *mddev)
return idle;
}
-MD_DECLARE_WAIT_QUEUE_HEAD(resync_wait);
+DECLARE_WAIT_QUEUE_HEAD(resync_wait);
void md_done_sync(mddev_t *mddev, int blocks, int ok)
{
@@ -3333,7 +3341,7 @@ int md_do_sync(mddev_t *mddev, mdp_disk_t *spare)
unsigned long mark[SYNC_MARKS];
unsigned long mark_cnt[SYNC_MARKS];
int last_mark,m;
- struct md_list_head *tmp;
+ struct list_head *tmp;
unsigned long last_check;
@@ -3356,8 +3364,8 @@ recheck:
}
if (serialize) {
interruptible_sleep_on(&resync_wait);
- if (md_signal_pending(current)) {
- md_flush_signals();
+ if (signal_pending(current)) {
+ flush_curr_signals();
err = -EINTR;
goto out;
}
@@ -3365,8 +3373,7 @@ recheck:
}
mddev->curr_resync = 1;
-
- max_sectors = mddev->sb->size<<1;
+ max_sectors = mddev->sb->size << 1;
printk(KERN_INFO "md: syncing RAID array md%d\n", mdidx(mddev));
printk(KERN_INFO "md: minimum _guaranteed_ reconstruction speed: %d KB/sec/disc.\n",
@@ -3403,7 +3410,6 @@ recheck:
int sectors;
sectors = mddev->pers->sync_request(mddev, j);
-
if (sectors < 0) {
err = sectors;
goto out;
@@ -3432,13 +3438,13 @@ recheck:
}
- if (md_signal_pending(current)) {
+ if (signal_pending(current)) {
/*
* got a signal, exit.
*/
mddev->curr_resync = 0;
printk(KERN_INFO "md: md_do_sync() got signal ... exiting\n");
- md_flush_signals();
+ flush_curr_signals();
err = -EINTR;
goto out;
}
@@ -3451,7 +3457,7 @@ recheck:
* about not overloading the IO subsystem. (things like an
* e2fsck being done on the RAID array should execute fast)
*/
- if (md_need_resched(current))
+ if (current->need_resched)
schedule();
currspeed = (j-mddev->resync_mark_cnt)/2/((jiffies-mddev->resync_mark)/HZ +1) +1;
@@ -3462,7 +3468,7 @@ recheck:
if ((currspeed > sysctl_speed_limit_max) ||
!is_mddev_idle(mddev)) {
current->state = TASK_INTERRUPTIBLE;
- md_schedule_timeout(HZ/4);
+ schedule_timeout(HZ/4);
goto repeat;
}
} else
@@ -3474,7 +3480,7 @@ recheck:
* this also signals 'finished resyncing' to md_stop
*/
out:
- wait_event(mddev->recovery_wait, atomic_read(&mddev->recovery_active)==0);
+ wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));
up(&mddev->resync_sem);
out_nolock:
mddev->curr_resync = 0;
@@ -3497,7 +3503,7 @@ void md_do_recovery(void *data)
mddev_t *mddev;
mdp_super_t *sb;
mdp_disk_t *spare;
- struct md_list_head *tmp;
+ struct list_head *tmp;
printk(KERN_INFO "md: recovery thread got woken up ...\n");
restart:
@@ -3581,13 +3587,13 @@ restart:
int md_notify_reboot(struct notifier_block *this,
unsigned long code, void *x)
{
- struct md_list_head *tmp;
+ struct list_head *tmp;
mddev_t *mddev;
- if ((code == MD_SYS_DOWN) || (code == MD_SYS_HALT)
- || (code == MD_SYS_POWER_OFF)) {
+ if ((code == SYS_DOWN) || (code == SYS_HALT) || (code == SYS_POWER_OFF)) {
printk(KERN_INFO "md: stopping all md devices.\n");
+ return NOTIFY_DONE;
ITERATE_MDDEV(mddev,tmp)
do_md_stop (mddev, 1);
@@ -3597,7 +3603,7 @@ int md_notify_reboot(struct notifier_block *this,
* right place to handle this issue is the given
* driver, we do want to have a safe RAID driver ...
*/
- md_mdelay(1000*1);
+ mdelay(1000*1);
}
return NOTIFY_DONE;
}
@@ -3628,7 +3634,7 @@ static void md_geninit(void)
#endif
}
-int md__init md_init(void)
+int __init md_init(void)
{
static char * name = "mdrecoveryd";
int minor;
@@ -3665,7 +3671,7 @@ int md__init md_init(void)
printk(KERN_ALERT
"md: bug: couldn't allocate md_recovery_thread\n");
- md_register_reboot_notifier(&md_notifier);
+ register_reboot_notifier(&md_notifier);
raid_table_header = register_sysctl_table(raid_root_table, 1);
md_geninit();
@@ -3687,7 +3693,7 @@ int md__init md_init(void)
struct {
int set;
int noautodetect;
-} raid_setup_args md__initdata;
+} raid_setup_args __initdata;
/*
* Searches all registered partitions for autorun RAID arrays
@@ -3730,7 +3736,7 @@ static void autostart_arrays(void)
MD_BUG();
continue;
}
- md_list_add(&rdev->pending, &pending_raid_disks);
+ list_add(&rdev->pending, &pending_raid_disks);
}
dev_cnt = 0;
@@ -3742,7 +3748,7 @@ static struct {
int pers[MAX_MD_DEVS];
int chunk[MAX_MD_DEVS];
char *device_names[MAX_MD_DEVS];
-} md_setup_args md__initdata;
+} md_setup_args __initdata;
/*
* Parse the command-line parameters given to our kernel, but do not
@@ -3764,7 +3770,7 @@ static struct {
* Shifted name_to_kdev_t() and related operations to md_set_drive()
* for later execution. Rewrote section to make devfs compatible.
*/
-static int md__init md_setup(char *str)
+static int __init md_setup(char *str)
{
int minor, level, factor, fault;
char *pername = "";
@@ -3783,7 +3789,7 @@ static int md__init md_setup(char *str)
}
switch (get_option(&str, &level)) { /* RAID Personality */
case 2: /* could be 0 or -1.. */
- if (level == 0 || level == -1) {
+ if (!level || level == -1) {
if (get_option(&str, &factor) != 2 || /* Chunk Size */
get_option(&str, &fault) != 2) {
printk(KERN_WARNING "md: Too few arguments supplied to md=.\n");
@@ -3825,8 +3831,8 @@ static int md__init md_setup(char *str)
return 1;
}
-extern kdev_t name_to_kdev_t(char *line) md__init;
-void md__init md_setup_drive(void)
+extern kdev_t name_to_kdev_t(char *line) __init;
+void __init md_setup_drive(void)
{
int minor, i;
kdev_t dev;
@@ -3838,7 +3844,8 @@ void md__init md_setup_drive(void)
char *devname;
mdu_disk_info_t dinfo;
- if ((devname = md_setup_args.device_names[minor]) == 0) continue;
+ if (!(devname = md_setup_args.device_names[minor]))
+ continue;
for (i = 0; i < MD_SB_DISKS && devname != 0; i++) {
@@ -3857,7 +3864,7 @@ void md__init md_setup_drive(void)
devfs_get_maj_min(handle, &major, &minor);
dev = MKDEV(major, minor);
}
- if (dev == 0) {
+ if (!dev) {
printk(KERN_WARNING "md: Unknown device name: %s\n", devname);
break;
}
@@ -3869,7 +3876,7 @@ void md__init md_setup_drive(void)
}
devices[i] = 0;
- if (md_setup_args.device_set[minor] == 0)
+ if (!md_setup_args.device_set[minor])
continue;
if (mddev_map[minor].mddev) {
@@ -3933,7 +3940,7 @@ void md__init md_setup_drive(void)
}
}
-static int md__init raid_setup(char *str)
+static int __init raid_setup(char *str)
{
int len, pos;
@@ -3947,7 +3954,7 @@ static int md__init raid_setup(char *str)
wlen = (comma-str)-pos;
else wlen = (len-1)-pos;
- if (strncmp(str, "noautodetect", wlen) == 0)
+ if (!strncmp(str, "noautodetect", wlen))
raid_setup_args.noautodetect = 1;
pos += wlen+1;
}
@@ -3955,7 +3962,7 @@ static int md__init raid_setup(char *str)
return 1;
}
-int md__init md_run_setup(void)
+int __init md_run_setup(void)
{
if (raid_setup_args.noautodetect)
printk(KERN_INFO "md: Skipping autodetection of RAID arrays. (raid=noautodetect)\n");
@@ -4008,23 +4015,23 @@ void cleanup_module(void)
}
#endif
-MD_EXPORT_SYMBOL(md_size);
-MD_EXPORT_SYMBOL(register_md_personality);
-MD_EXPORT_SYMBOL(unregister_md_personality);
-MD_EXPORT_SYMBOL(partition_name);
-MD_EXPORT_SYMBOL(md_error);
-MD_EXPORT_SYMBOL(md_do_sync);
-MD_EXPORT_SYMBOL(md_sync_acct);
-MD_EXPORT_SYMBOL(md_done_sync);
-MD_EXPORT_SYMBOL(md_recover_arrays);
-MD_EXPORT_SYMBOL(md_register_thread);
-MD_EXPORT_SYMBOL(md_unregister_thread);
-MD_EXPORT_SYMBOL(md_update_sb);
-MD_EXPORT_SYMBOL(md_wakeup_thread);
-MD_EXPORT_SYMBOL(md_print_devices);
-MD_EXPORT_SYMBOL(find_rdev_nr);
-MD_EXPORT_SYMBOL(md_interrupt_thread);
-MD_EXPORT_SYMBOL(mddev_map);
-MD_EXPORT_SYMBOL(md_check_ordering);
-MD_EXPORT_SYMBOL(get_spare);
+EXPORT_SYMBOL(md_size);
+EXPORT_SYMBOL(register_md_personality);
+EXPORT_SYMBOL(unregister_md_personality);
+EXPORT_SYMBOL(partition_name);
+EXPORT_SYMBOL(md_error);
+EXPORT_SYMBOL(md_do_sync);
+EXPORT_SYMBOL(md_sync_acct);
+EXPORT_SYMBOL(md_done_sync);
+EXPORT_SYMBOL(md_recover_arrays);
+EXPORT_SYMBOL(md_register_thread);
+EXPORT_SYMBOL(md_unregister_thread);
+EXPORT_SYMBOL(md_update_sb);
+EXPORT_SYMBOL(md_wakeup_thread);
+EXPORT_SYMBOL(md_print_devices);
+EXPORT_SYMBOL(find_rdev_nr);
+EXPORT_SYMBOL(md_interrupt_thread);
+EXPORT_SYMBOL(mddev_map);
+EXPORT_SYMBOL(md_check_ordering);
+EXPORT_SYMBOL(get_spare);
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 8b21f612e..7203d97a2 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -334,7 +334,7 @@ static mdk_personality_t raid0_personality=
status: raid0_status,
};
-static int md__init raid0_init (void)
+static int __init raid0_init (void)
{
return register_md_personality (RAID0, &raid0_personality);
}
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 6c8a5bf21..57829582b 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1,7 +1,7 @@
/*
* raid1.c : Multiple Devices driver for Linux
*
- * Copyright (C) 1999, 2000 Ingo Molnar, Red Hat
+ * Copyright (C) 1999, 2000, 2001 Ingo Molnar, Red Hat
*
* Copyright (C) 1996, 1997, 1998 Ingo Molnar, Miguel de Icaza, Gadi Oxman
*
@@ -22,330 +22,208 @@
* Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*/
-#include <linux/module.h>
-#include <linux/slab.h>
#include <linux/raid/raid1.h>
-#include <asm/atomic.h>
#define MAJOR_NR MD_MAJOR
#define MD_DRIVER
#define MD_PERSONALITY
#define MAX_WORK_PER_DISK 128
-
-#define NR_RESERVED_BUFS 32
-
-
/*
- * The following can be used to debug the driver
+ * Number of guaranteed r1bios in case of extreme VM load:
*/
-#define RAID1_DEBUG 0
-
-#if RAID1_DEBUG
-#define PRINTK(x...) printk(x)
-#define inline
-#define __inline__
-#else
-#define PRINTK(x...) do { } while (0)
-#endif
-
+#define NR_RAID1_BIOS 256
static mdk_personality_t raid1_personality;
-static md_spinlock_t retry_list_lock = MD_SPIN_LOCK_UNLOCKED;
-struct raid1_bh *raid1_retry_list = NULL, **raid1_retry_tail;
+static spinlock_t retry_list_lock = SPIN_LOCK_UNLOCKED;
+static LIST_HEAD(retry_list_head);
-static struct buffer_head *raid1_alloc_bh(raid1_conf_t *conf, int cnt)
+static inline void check_all_w_bios_empty(r1bio_t *r1_bio)
{
- /* return a linked list of "cnt" struct buffer_heads.
- * don't take any off the free list unless we know we can
- * get all we need, otherwise we could deadlock
- */
- struct buffer_head *bh=NULL;
-
- while(cnt) {
- struct buffer_head *t;
- md_spin_lock_irq(&conf->device_lock);
- if (!conf->freebh_blocked && conf->freebh_cnt >= cnt)
- while (cnt) {
- t = conf->freebh;
- conf->freebh = t->b_next;
- t->b_next = bh;
- bh = t;
- t->b_state = 0;
- conf->freebh_cnt--;
- cnt--;
- }
- md_spin_unlock_irq(&conf->device_lock);
- if (cnt == 0)
- break;
- t = kmem_cache_alloc(bh_cachep, SLAB_NOIO);
- if (t) {
- t->b_next = bh;
- bh = t;
- cnt--;
- } else {
- PRINTK("raid1: waiting for %d bh\n", cnt);
- conf->freebh_blocked = 1;
- wait_disk_event(conf->wait_buffer,
- !conf->freebh_blocked ||
- conf->freebh_cnt > conf->raid_disks * NR_RESERVED_BUFS/2);
- conf->freebh_blocked = 0;
- }
- }
- return bh;
-}
+ int i;
-static inline void raid1_free_bh(raid1_conf_t *conf, struct buffer_head *bh)
-{
- unsigned long flags;
- spin_lock_irqsave(&conf->device_lock, flags);
- while (bh) {
- struct buffer_head *t = bh;
- bh=bh->b_next;
- if (t->b_pprev == NULL)
- kmem_cache_free(bh_cachep, t);
- else {
- t->b_next= conf->freebh;
- conf->freebh = t;
- conf->freebh_cnt++;
- }
- }
- spin_unlock_irqrestore(&conf->device_lock, flags);
- wake_up(&conf->wait_buffer);
+ return;
+ for (i = 0; i < MD_SB_DISKS; i++)
+ if (r1_bio->write_bios[i])
+ BUG();
}
-static int raid1_grow_bh(raid1_conf_t *conf, int cnt)
+static inline void check_all_bios_empty(r1bio_t *r1_bio)
{
- /* allocate cnt buffer_heads, possibly less if kmalloc fails */
- int i = 0;
-
- while (i < cnt) {
- struct buffer_head *bh;
- bh = kmem_cache_alloc(bh_cachep, SLAB_KERNEL);
- if (!bh) break;
-
- md_spin_lock_irq(&conf->device_lock);
- bh->b_pprev = &conf->freebh;
- bh->b_next = conf->freebh;
- conf->freebh = bh;
- conf->freebh_cnt++;
- md_spin_unlock_irq(&conf->device_lock);
-
- i++;
- }
- return i;
+ return;
+ if (r1_bio->read_bio)
+ BUG();
+ check_all_w_bios_empty(r1_bio);
}
-static void raid1_shrink_bh(raid1_conf_t *conf)
+static void * r1bio_pool_alloc(int gfp_flags, void *data)
{
- /* discard all buffer_heads */
-
- md_spin_lock_irq(&conf->device_lock);
- while (conf->freebh) {
- struct buffer_head *bh = conf->freebh;
- conf->freebh = bh->b_next;
- kmem_cache_free(bh_cachep, bh);
- conf->freebh_cnt--;
- }
- md_spin_unlock_irq(&conf->device_lock);
-}
-
+ r1bio_t *r1_bio;
-static struct raid1_bh *raid1_alloc_r1bh(raid1_conf_t *conf)
-{
- struct raid1_bh *r1_bh = NULL;
+ r1_bio = kmalloc(sizeof(r1bio_t), gfp_flags);
+ if (r1_bio)
+ memset(r1_bio, 0, sizeof(*r1_bio));
- do {
- md_spin_lock_irq(&conf->device_lock);
- if (!conf->freer1_blocked && conf->freer1) {
- r1_bh = conf->freer1;
- conf->freer1 = r1_bh->next_r1;
- conf->freer1_cnt--;
- r1_bh->next_r1 = NULL;
- r1_bh->state = (1 << R1BH_PreAlloc);
- r1_bh->bh_req.b_state = 0;
- }
- md_spin_unlock_irq(&conf->device_lock);
- if (r1_bh)
- return r1_bh;
- r1_bh = (struct raid1_bh *) kmalloc(sizeof(struct raid1_bh), GFP_NOIO);
- if (r1_bh) {
- memset(r1_bh, 0, sizeof(*r1_bh));
- return r1_bh;
- }
- conf->freer1_blocked = 1;
- wait_disk_event(conf->wait_buffer,
- !conf->freer1_blocked ||
- conf->freer1_cnt > NR_RESERVED_BUFS/2
- );
- conf->freer1_blocked = 0;
- } while (1);
+ return r1_bio;
}
-static inline void raid1_free_r1bh(struct raid1_bh *r1_bh)
+static void r1bio_pool_free(void *r1_bio, void *data)
{
- struct buffer_head *bh = r1_bh->mirror_bh_list;
- raid1_conf_t *conf = mddev_to_conf(r1_bh->mddev);
-
- r1_bh->mirror_bh_list = NULL;
-
- if (test_bit(R1BH_PreAlloc, &r1_bh->state)) {
- unsigned long flags;
- spin_lock_irqsave(&conf->device_lock, flags);
- r1_bh->next_r1 = conf->freer1;
- conf->freer1 = r1_bh;
- conf->freer1_cnt++;
- spin_unlock_irqrestore(&conf->device_lock, flags);
- /* don't need to wakeup wait_buffer because
- * raid1_free_bh below will do that
- */
- } else {
- kfree(r1_bh);
- }
- raid1_free_bh(conf, bh);
+ check_all_bios_empty(r1_bio);
+ kfree(r1_bio);
}
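
A sketch of how this alloc/free pair would back the guaranteed r1bio pool, assuming the 2.5 mempool interface (the assignment itself is not part of this hunk):

	conf->r1bio_pool = mempool_create(NR_RAID1_BIOS,
				r1bio_pool_alloc, r1bio_pool_free, NULL);
	if (!conf->r1bio_pool)
		return -ENOMEM;	/* sketch: pool creation can fail */
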
-static int raid1_grow_r1bh (raid1_conf_t *conf, int cnt)
-{
- int i = 0;
-
- while (i < cnt) {
- struct raid1_bh *r1_bh;
- r1_bh = (struct raid1_bh*)kmalloc(sizeof(*r1_bh), GFP_KERNEL);
- if (!r1_bh)
- break;
- memset(r1_bh, 0, sizeof(*r1_bh));
- set_bit(R1BH_PreAlloc, &r1_bh->state);
- r1_bh->mddev = conf->mddev;
-
- raid1_free_r1bh(r1_bh);
- i++;
- }
- return i;
-}
+#define RESYNC_BLOCK_SIZE (64*1024)
+#define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
+#define RESYNC_WINDOW (2048*1024)
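
Assuming the common 4 KB PAGE_SIZE (architecture-dependent), the macros above work out to 16 pages per 64 KB resync block and 32 such blocks per 2 MB window; a compile-time sketch:

	enum {	/* sketch, assuming PAGE_SIZE == 4096 */
		PAGES_PER_RESYNC_BLOCK	= (RESYNC_BLOCK_SIZE + 4096 - 1) / 4096, /* 16 */
		BLOCKS_PER_WINDOW	= RESYNC_WINDOW / RESYNC_BLOCK_SIZE	 /* 32 */
	};
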
-static void raid1_shrink_r1bh(raid1_conf_t *conf)
+static void * r1buf_pool_alloc(int gfp_flags, void *data)
{
- md_spin_lock_irq(&conf->device_lock);
- while (conf->freer1) {
- struct raid1_bh *r1_bh = conf->freer1;
- conf->freer1 = r1_bh->next_r1;
- conf->freer1_cnt--;
- kfree(r1_bh);
+ conf_t *conf = data;
+ struct page *page;
+ r1bio_t *r1_bio;
+ struct bio *bio;
+ int i, j;
+
+ r1_bio = mempool_alloc(conf->r1bio_pool, gfp_flags);
+ check_all_bios_empty(r1_bio);
+
+ bio = bio_alloc(gfp_flags, RESYNC_PAGES);
+ if (!bio)
+ goto out_free_r1_bio;
+
+ for (i = 0; i < RESYNC_PAGES; i++) {
+ page = alloc_page(gfp_flags);
+ if (unlikely(!page))
+ goto out_free_pages;
+
+ bio->bi_io_vec[i].bv_page = page;
+ bio->bi_io_vec[i].bv_len = PAGE_SIZE;
+ bio->bi_io_vec[i].bv_offset = 0;
}
- md_spin_unlock_irq(&conf->device_lock);
-}
-
-
-static inline void raid1_free_buf(struct raid1_bh *r1_bh)
-{
- unsigned long flags;
- struct buffer_head *bh = r1_bh->mirror_bh_list;
- raid1_conf_t *conf = mddev_to_conf(r1_bh->mddev);
- r1_bh->mirror_bh_list = NULL;
-
- spin_lock_irqsave(&conf->device_lock, flags);
- r1_bh->next_r1 = conf->freebuf;
- conf->freebuf = r1_bh;
- spin_unlock_irqrestore(&conf->device_lock, flags);
- raid1_free_bh(conf, bh);
+ /*
+ * All RESYNC_PAGES iovecs have a data page now; finish the bio setup.
+ */
+ bio->bi_vcnt = RESYNC_PAGES;
+ bio->bi_idx = 0;
+ bio->bi_size = RESYNC_BLOCK_SIZE;
+ bio->bi_end_io = NULL;
+ atomic_set(&bio->bi_cnt, 1);
+
+ r1_bio->master_bio = bio;
+
+ return r1_bio;
+
+out_free_pages:
+ for (j = 0; j < i; j++)
+ __free_page(bio->bi_io_vec[j].bv_page);
+ bio_put(bio);
+out_free_r1_bio:
+ mempool_free(r1_bio, conf->r1bio_pool);
+ return NULL;
}
-static struct raid1_bh *raid1_alloc_buf(raid1_conf_t *conf)
+static void r1buf_pool_free(void *__r1_bio, void *data)
{
- struct raid1_bh *r1_bh;
-
- md_spin_lock_irq(&conf->device_lock);
- wait_event_lock_irq(conf->wait_buffer, conf->freebuf, conf->device_lock);
- r1_bh = conf->freebuf;
- conf->freebuf = r1_bh->next_r1;
- r1_bh->next_r1= NULL;
- md_spin_unlock_irq(&conf->device_lock);
+ int i;
+ conf_t *conf = data;
+ r1bio_t *r1bio = __r1_bio;
+ struct bio *bio = r1bio->master_bio;
- return r1_bh;
+ check_all_bios_empty(r1bio);
+ if (atomic_read(&bio->bi_cnt) != 1)
+ BUG();
+ for (i = 0; i < RESYNC_PAGES; i++) {
+ __free_page(bio->bi_io_vec[i].bv_page);
+ bio->bi_io_vec[i].bv_page = NULL;
+ }
+ if (atomic_read(&bio->bi_cnt) != 1)
+ BUG();
+ bio_put(bio);
+ mempool_free(r1bio, conf->r1bio_pool);
}
-static int raid1_grow_buffers (raid1_conf_t *conf, int cnt)
+static void put_all_bios(conf_t *conf, r1bio_t *r1_bio)
{
- int i = 0;
-
- md_spin_lock_irq(&conf->device_lock);
- while (i < cnt) {
- struct raid1_bh *r1_bh;
- struct page *page;
-
- page = alloc_page(GFP_KERNEL);
- if (!page)
- break;
+ int i;
- r1_bh = (struct raid1_bh *) kmalloc(sizeof(*r1_bh), GFP_KERNEL);
- if (!r1_bh) {
- __free_page(page);
- break;
+ if (r1_bio->read_bio) {
+ if (atomic_read(&r1_bio->read_bio->bi_cnt) != 1)
+ BUG();
+ bio_put(r1_bio->read_bio);
+ r1_bio->read_bio = NULL;
+ }
+ for (i = 0; i < MD_SB_DISKS; i++) {
+ struct bio **bio = r1_bio->write_bios + i;
+ if (*bio) {
+ if (atomic_read(&(*bio)->bi_cnt) != 1)
+ BUG();
+ bio_put(*bio);
}
- memset(r1_bh, 0, sizeof(*r1_bh));
- r1_bh->bh_req.b_page = page;
- r1_bh->bh_req.b_data = page_address(page);
- r1_bh->next_r1 = conf->freebuf;
- conf->freebuf = r1_bh;
- i++;
+ *bio = NULL;
}
- md_spin_unlock_irq(&conf->device_lock);
- return i;
+ check_all_bios_empty(r1_bio);
}
-static void raid1_shrink_buffers (raid1_conf_t *conf)
+static inline void free_r1bio(r1bio_t *r1_bio)
{
- md_spin_lock_irq(&conf->device_lock);
- while (conf->freebuf) {
- struct raid1_bh *r1_bh = conf->freebuf;
- conf->freebuf = r1_bh->next_r1;
- __free_page(r1_bh->bh_req.b_page);
- kfree(r1_bh);
- }
- md_spin_unlock_irq(&conf->device_lock);
+ conf_t *conf = mddev_to_conf(r1_bio->mddev);
+
+ put_all_bios(conf, r1_bio);
+ mempool_free(r1_bio, conf->r1bio_pool);
+}
+
+static inline void put_buf(r1bio_t *r1_bio)
+{
+ conf_t *conf = mddev_to_conf(r1_bio->mddev);
+ struct bio *bio = r1_bio->master_bio;
+
+ /*
+ * undo any possible partial request fixup magic:
+ */
+ if (bio->bi_size != RESYNC_BLOCK_SIZE)
+ bio->bi_io_vec[bio->bi_vcnt-1].bv_len = PAGE_SIZE;
+ put_all_bios(conf, r1_bio);
+ mempool_free(r1_bio, conf->r1buf_pool);
}
-static int raid1_map (mddev_t *mddev, kdev_t *rdev)
+static int map(mddev_t *mddev, kdev_t *rdev)
{
- raid1_conf_t *conf = mddev_to_conf(mddev);
+ conf_t *conf = mddev_to_conf(mddev);
int i, disks = MD_SB_DISKS;
/*
- * Later we do read balancing on the read side
+ * Later we do read balancing on the read side
* now we use the first available disk.
*/
for (i = 0; i < disks; i++) {
if (conf->mirrors[i].operational) {
*rdev = conf->mirrors[i].dev;
- return (0);
+ return 0;
}
}
printk (KERN_ERR "raid1_map(): huh, no more operational devices?\n");
- return (-1);
+ return -1;
}
-static void raid1_reschedule_retry (struct raid1_bh *r1_bh)
+static void reschedule_retry(r1bio_t *r1_bio)
{
unsigned long flags;
- mddev_t *mddev = r1_bh->mddev;
- raid1_conf_t *conf = mddev_to_conf(mddev);
-
- md_spin_lock_irqsave(&retry_list_lock, flags);
- if (raid1_retry_list == NULL)
- raid1_retry_tail = &raid1_retry_list;
- *raid1_retry_tail = r1_bh;
- raid1_retry_tail = &r1_bh->next_r1;
- r1_bh->next_r1 = NULL;
- md_spin_unlock_irqrestore(&retry_list_lock, flags);
+ mddev_t *mddev = r1_bio->mddev;
+ conf_t *conf = mddev_to_conf(mddev);
+
+ spin_lock_irqsave(&retry_list_lock, flags);
+ list_add(&r1_bio->retry_list, &retry_list_head);
+ spin_unlock_irqrestore(&retry_list_lock, flags);
+
md_wakeup_thread(conf->thread);
}
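
reschedule_retry() pushes r1bios onto a plain list; a sketch of how the raid1 daemon side would pop them under the same lock (get_retry_sketch is a hypothetical name):

	static r1bio_t *get_retry_sketch(void)
	{
		unsigned long flags;
		r1bio_t *r1_bio = NULL;

		spin_lock_irqsave(&retry_list_lock, flags);
		if (!list_empty(&retry_list_head)) {
			r1_bio = list_entry(retry_list_head.next, r1bio_t, retry_list);
			list_del(&r1_bio->retry_list);
		}
		spin_unlock_irqrestore(&retry_list_lock, flags);
		return r1_bio;
	}
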
-static void inline io_request_done(unsigned long sector, raid1_conf_t *conf, int phase)
+static void inline raid_request_done(unsigned long sector, conf_t *conf, int phase)
{
unsigned long flags;
spin_lock_irqsave(&conf->segment_lock, flags);
@@ -359,9 +237,10 @@ static void inline io_request_done(unsigned long sector, raid1_conf_t *conf, int
spin_unlock_irqrestore(&conf->segment_lock, flags);
}
-static void inline sync_request_done (unsigned long sector, raid1_conf_t *conf)
+static void inline sync_request_done(sector_t sector, conf_t *conf)
{
unsigned long flags;
+
spin_lock_irqsave(&conf->segment_lock, flags);
if (sector >= conf->start_ready)
--conf->cnt_ready;
@@ -375,73 +254,80 @@ static void inline sync_request_done (unsigned long sector, raid1_conf_t *conf)
}
/*
- * raid1_end_bh_io() is called when we have finished servicing a mirrored
+ * raid_end_bio_io() is called when we have finished servicing a mirrored
* operation and are ready to return a success/failure code to the buffer
* cache layer.
*/
-static void raid1_end_bh_io (struct raid1_bh *r1_bh, int uptodate)
+static int raid_end_bio_io(r1bio_t *r1_bio, int uptodate, int nr_sectors)
{
- struct buffer_head *bh = r1_bh->master_bh;
+ struct bio *bio = r1_bio->master_bio;
- io_request_done(bh->b_rsector, mddev_to_conf(r1_bh->mddev),
- test_bit(R1BH_SyncPhase, &r1_bh->state));
+ raid_request_done(bio->bi_sector, mddev_to_conf(r1_bio->mddev),
+ test_bit(R1BIO_SyncPhase, &r1_bio->state));
- bh->b_end_io(bh, uptodate);
- raid1_free_r1bh(r1_bh);
+ bio_endio(bio, uptodate, nr_sectors);
+ free_r1bio(r1_bio);
+
+ return 0;
}
-void raid1_end_request (struct buffer_head *bh, int uptodate)
+
+static int end_request(struct bio *bio, int nr_sectors)
{
- struct raid1_bh * r1_bh = (struct raid1_bh *)(bh->b_private);
+ int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
+ r1bio_t * r1_bio = (r1bio_t *)(bio->bi_private);
/*
* this branch is our 'one mirror IO has finished' event handler:
*/
if (!uptodate)
- md_error (r1_bh->mddev, bh->b_dev);
+ md_error(r1_bio->mddev, bio->bi_dev);
else
/*
- * Set R1BH_Uptodate in our master buffer_head, so that
+ * Set R1BIO_Uptodate in our master bio, so that
* we will return a good error code to the higher
* levels even if IO on some other mirrored buffer fails.
*
- * The 'master' represents the complex operation to
+ * The 'master' represents the complex operation to
* user-side. So if something waits for IO, then it will
- * wait for the 'master' buffer_head.
+ * wait for the 'master' bio.
*/
- set_bit (R1BH_Uptodate, &r1_bh->state);
+ set_bit(R1BIO_Uptodate, &r1_bio->state);
/*
- * We split up the read and write side, imho they are
+ * We split up the read and write side, imho they are
* conceptually different.
*/
- if ( (r1_bh->cmd == READ) || (r1_bh->cmd == READA) ) {
+ if ((r1_bio->cmd == READ) || (r1_bio->cmd == READA)) {
+ if (!r1_bio->read_bio)
+ BUG();
/*
- * we have only one buffer_head on the read side
+ * we have only one bio on the read side
*/
-
if (uptodate) {
- raid1_end_bh_io(r1_bh, uptodate);
- return;
+ raid_end_bio_io(r1_bio, uptodate, nr_sectors);
+ return 0;
}
/*
* oops, read error:
*/
- printk(KERN_ERR "raid1: %s: rescheduling block %lu\n",
- partition_name(bh->b_dev), bh->b_blocknr);
- raid1_reschedule_retry(r1_bh);
- return;
+ printk(KERN_ERR "raid1: %s: rescheduling sector %lu\n",
+ partition_name(bio->bi_dev), r1_bio->sector);
+ reschedule_retry(r1_bio);
+ return 0;
}
+ if (r1_bio->read_bio)
+ BUG();
/*
* WRITE:
*
- * Let's see if all mirrored write operations have finished
+ * Let's see if all mirrored write operations have finished
* already.
*/
-
- if (atomic_dec_and_test(&r1_bh->remaining))
- raid1_end_bh_io(r1_bh, test_bit(R1BH_Uptodate, &r1_bh->state));
+ if (atomic_dec_and_test(&r1_bio->remaining))
+ raid_end_bio_io(r1_bio, uptodate, nr_sectors);
+ return 0;
}
/*
@@ -456,22 +342,20 @@ void raid1_end_request (struct buffer_head *bh, int uptodate)
* reads should be somehow balanced.
*/
-static int raid1_read_balance (raid1_conf_t *conf, struct buffer_head *bh)
+static int read_balance(conf_t *conf, struct bio *bio, r1bio_t *r1_bio)
{
- int new_disk = conf->last_used;
- const int sectors = bh->b_size >> 9;
- const unsigned long this_sector = bh->b_rsector;
- int disk = new_disk;
- unsigned long new_distance;
- unsigned long current_distance;
-
+ const int sectors = bio->bi_size >> 9;
+ const unsigned long this_sector = r1_bio->sector;
+ unsigned long new_distance, current_distance;
+ int new_disk = conf->last_used, disk = new_disk;
+
/*
* Check if it is sane at all to balance
*/
-
+
if (conf->resync_mirrors)
goto rb_out;
-
+
/* make sure that disk is operational */
while( !conf->mirrors[new_disk].operational) {
@@ -483,7 +367,7 @@ static int raid1_read_balance (raid1_conf_t *conf, struct buffer_head *bh)
* Nothing much to do, let's not change anything
* and hope for the best...
*/
-
+
new_disk = conf->last_used;
goto rb_out;
@@ -491,53 +375,51 @@ static int raid1_read_balance (raid1_conf_t *conf, struct buffer_head *bh)
}
disk = new_disk;
/* now disk == new_disk == starting point for search */
-
+
/*
* Don't touch anything for sequential reads.
*/
-
if (this_sector == conf->mirrors[new_disk].head_position)
goto rb_out;
-
+
/*
* If reads have been done only on a single disk
* for a time, let's give another disk a chance.
* This is for kicking those idling disks so that
* they would find work near some hotspot.
*/
-
if (conf->sect_count >= conf->mirrors[new_disk].sect_limit) {
conf->sect_count = 0;
do {
- if (new_disk<=0)
+ if (new_disk <= 0)
new_disk = conf->raid_disks;
new_disk--;
if (new_disk == disk)
break;
} while ((conf->mirrors[new_disk].write_only) ||
- (!conf->mirrors[new_disk].operational));
+ (!conf->mirrors[new_disk].operational));
goto rb_out;
}
-
+
current_distance = abs(this_sector -
conf->mirrors[disk].head_position);
-
+
/* Find the disk which is closest */
-
+
do {
if (disk <= 0)
disk = conf->raid_disks;
disk--;
-
+
if ((conf->mirrors[disk].write_only) ||
(!conf->mirrors[disk].operational))
continue;
-
+
new_distance = abs(this_sector -
conf->mirrors[disk].head_position);
-
+
if (new_distance < current_distance) {
conf->sect_count = 0;
current_distance = new_distance;
@@ -554,69 +436,73 @@ rb_out:
return new_disk;
}
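
The balancing loop above minimizes seek distance per mirror; the metric it compares reduces to the following (seek_distance_sketch is hypothetical):

	static unsigned long seek_distance_sketch(conf_t *conf, int disk,
						unsigned long sector)
	{
		/* gap between the request and the disk's last known head position */
		return abs(sector - conf->mirrors[disk].head_position);
	}
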
-static int raid1_make_request (mddev_t *mddev, int rw,
- struct buffer_head * bh)
-{
- raid1_conf_t *conf = mddev_to_conf(mddev);
- struct buffer_head *bh_req, *bhl;
- struct raid1_bh * r1_bh;
- int disks = MD_SB_DISKS;
- int i, sum_bhs = 0;
- struct mirror_info *mirror;
-
- if (!buffer_locked(bh))
- BUG();
-
/*
- * make_request() can abort the operation when READA is being
- * used and no empty request is available.
- *
- * Currently, just replace the command with READ/WRITE.
+ * Wait if the reconstruction state machine puts up a bar for
+ * new requests in this sector range:
*/
- if (rw == READA)
- rw = READ;
-
- r1_bh = raid1_alloc_r1bh (conf);
-
+static inline void new_request(conf_t *conf, r1bio_t *r1_bio)
+{
spin_lock_irq(&conf->segment_lock);
wait_event_lock_irq(conf->wait_done,
- bh->b_rsector < conf->start_active ||
- bh->b_rsector >= conf->start_future,
+ r1_bio->sector < conf->start_active ||
+ r1_bio->sector >= conf->start_future,
conf->segment_lock);
- if (bh->b_rsector < conf->start_active)
+ if (r1_bio->sector < conf->start_active)
conf->cnt_done++;
else {
conf->cnt_future++;
if (conf->phase)
- set_bit(R1BH_SyncPhase, &r1_bh->state);
+ set_bit(R1BIO_SyncPhase, &r1_bio->state);
}
spin_unlock_irq(&conf->segment_lock);
-
+}
+
+static int make_request(mddev_t *mddev, int rw, struct bio * bio)
+{
+ conf_t *conf = mddev_to_conf(mddev);
+ mirror_info_t *mirror;
+ r1bio_t *r1_bio;
+ struct bio *read_bio;
+ int i, sum_bios = 0, disks = MD_SB_DISKS;
+
/*
- * i think the read and write branch should be separated completely,
- * since we want to do read balancing on the read side for example.
- * Alternative implementations? :) --mingo
+ * make_request() can abort the operation when READA is being
+ * used and no empty request is available.
+ *
+ * Currently, just replace the command with READ.
*/
+ if (rw == READA)
+ rw = READ;
+
+ r1_bio = mempool_alloc(conf->r1bio_pool, GFP_NOIO);
+ check_all_bios_empty(r1_bio);
+
+ r1_bio->master_bio = bio;
+
+ r1_bio->mddev = mddev;
+ r1_bio->sector = bio->bi_sector;
+ r1_bio->cmd = rw;
- r1_bh->master_bh = bh;
- r1_bh->mddev = mddev;
- r1_bh->cmd = rw;
+ new_request(conf, r1_bio);
if (rw == READ) {
/*
* read balancing logic:
*/
- mirror = conf->mirrors + raid1_read_balance(conf, bh);
-
- bh_req = &r1_bh->bh_req;
- memcpy(bh_req, bh, sizeof(*bh));
- bh_req->b_blocknr = bh->b_rsector;
- bh_req->b_dev = mirror->dev;
- bh_req->b_rdev = mirror->dev;
- /* bh_req->b_rsector = bh->n_rsector; */
- bh_req->b_end_io = raid1_end_request;
- bh_req->b_private = r1_bh;
- generic_make_request (rw, bh_req);
+ mirror = conf->mirrors + read_balance(conf, bio, r1_bio);
+
+ read_bio = bio_clone(bio, GFP_NOIO);
+ if (r1_bio->read_bio)
+ BUG();
+ r1_bio->read_bio = read_bio;
+
+ read_bio->bi_sector = r1_bio->sector;
+ read_bio->bi_dev = mirror->dev;
+ read_bio->bi_end_io = end_request;
+ read_bio->bi_rw = rw;
+ read_bio->bi_private = r1_bio;
+
+ generic_make_request(read_bio);
return 0;
}
@@ -624,62 +510,35 @@ static int raid1_make_request (mddev_t *mddev, int rw,
* WRITE:
*/
- bhl = raid1_alloc_bh(conf, conf->raid_disks);
+ check_all_w_bios_empty(r1_bio);
+
for (i = 0; i < disks; i++) {
- struct buffer_head *mbh;
- if (!conf->mirrors[i].operational)
+ struct bio *mbio;
+ if (!conf->mirrors[i].operational)
continue;
-
- /*
- * We should use a private pool (size depending on NR_REQUEST),
- * to avoid writes filling up the memory with bhs
- *
- * Such pools are much faster than kmalloc anyways (so we waste
- * almost nothing by not using the master bh when writing and
- * win a lot of cleanness) but for now we are cool enough. --mingo
- *
- * It's safe to sleep here, buffer heads cannot be used in a shared
- * manner in the write branch. Look how we lock the buffer at the
- * beginning of this function to grok the difference ;)
- */
- mbh = bhl;
- if (mbh == NULL) {
- MD_BUG();
- break;
- }
- bhl = mbh->b_next;
- mbh->b_next = NULL;
- mbh->b_this_page = (struct buffer_head *)1;
-
- /*
- * prepare mirrored mbh (fields ordered for max mem throughput):
- */
- mbh->b_blocknr = bh->b_rsector;
- mbh->b_dev = conf->mirrors[i].dev;
- mbh->b_rdev = conf->mirrors[i].dev;
- mbh->b_rsector = bh->b_rsector;
- mbh->b_state = (1<<BH_Req) | (1<<BH_Dirty) |
- (1<<BH_Mapped) | (1<<BH_Lock);
-
- atomic_set(&mbh->b_count, 1);
- mbh->b_size = bh->b_size;
- mbh->b_page = bh->b_page;
- mbh->b_data = bh->b_data;
- mbh->b_list = BUF_LOCKED;
- mbh->b_end_io = raid1_end_request;
- mbh->b_private = r1_bh;
-
- mbh->b_next = r1_bh->mirror_bh_list;
- r1_bh->mirror_bh_list = mbh;
- sum_bhs++;
+
+ mbio = bio_clone(bio, GFP_NOIO);
+ if (r1_bio->write_bios[i])
+ BUG();
+ r1_bio->write_bios[i] = mbio;
+
+ mbio->bi_sector = r1_bio->sector;
+ mbio->bi_dev = conf->mirrors[i].dev;
+ mbio->bi_end_io = end_request;
+ mbio->bi_rw = rw;
+ mbio->bi_private = r1_bio;
+
+ sum_bios++;
}
- if (bhl) raid1_free_bh(conf,bhl);
- if (!sum_bhs) {
- /* Gag - all mirrors non-operational.. */
- raid1_end_bh_io(r1_bh, 0);
+ if (!sum_bios) {
+ /*
+ * If all mirrors are non-operational
+ * then return an IO error:
+ */
+ raid_end_bio_io(r1_bio, 0, 0);
return 0;
}
- md_atomic_set(&r1_bh->remaining, sum_bhs);
+ atomic_set(&r1_bio->remaining, sum_bios);
/*
	 * We have to be a bit careful about the semaphore above, that's
@@ -688,28 +547,30 @@ static int raid1_make_request (mddev_t *mddev, int rw,
* safer solution. Imagine, end_request decreasing the semaphore
* before we could have set it up ... We could play tricks with
* the semaphore (presetting it and correcting at the end if
- * sum_bhs is not 'n' but we have to do end_request by hand if
+ * sum_bios is not 'n' but we have to do end_request by hand if
	 * all requests finish before we have had a chance to set up the
* semaphore correctly ... lots of races).
*/
- bh = r1_bh->mirror_bh_list;
- while(bh) {
- struct buffer_head *bh2 = bh;
- bh = bh->b_next;
- generic_make_request(rw, bh2);
+ for (i = 0; i < disks; i++) {
+ struct bio *mbio;
+ mbio = r1_bio->write_bios[i];
+ if (!mbio)
+ continue;
+
+ generic_make_request(mbio);
}
- return (0);
+ return 0;
}
-static int raid1_status (char *page, mddev_t *mddev)
+static int status(char *page, mddev_t *mddev)
{
- raid1_conf_t *conf = mddev_to_conf(mddev);
+ conf_t *conf = mddev_to_conf(mddev);
int sz = 0, i;
-
- sz += sprintf (page+sz, " [%d/%d] [", conf->raid_disks,
- conf->working_disks);
+
+ sz += sprintf(page+sz, " [%d/%d] [", conf->raid_disks,
+ conf->working_disks);
for (i = 0; i < conf->raid_disks; i++)
- sz += sprintf (page+sz, "%s",
+ sz += sprintf(page+sz, "%s",
conf->mirrors[i].operational ? "U" : "_");
sz += sprintf (page+sz, "]");
return sz;
@@ -731,10 +592,10 @@ static int raid1_status (char *page, mddev_t *mddev)
#define ALREADY_SYNCING KERN_INFO \
"raid1: syncing already in progress.\n"
-static void mark_disk_bad (mddev_t *mddev, int failed)
+static void mark_disk_bad(mddev_t *mddev, int failed)
{
- raid1_conf_t *conf = mddev_to_conf(mddev);
- struct mirror_info *mirror = conf->mirrors+failed;
+ conf_t *conf = mddev_to_conf(mddev);
+ mirror_info_t *mirror = conf->mirrors+failed;
mdp_super_t *sb = mddev->sb;
mirror->operational = 0;
@@ -749,37 +610,36 @@ static void mark_disk_bad (mddev_t *mddev, int failed)
md_wakeup_thread(conf->thread);
if (!mirror->write_only)
conf->working_disks--;
- printk (DISK_FAILED, partition_name (mirror->dev),
- conf->working_disks);
+ printk(DISK_FAILED, partition_name(mirror->dev),
+ conf->working_disks);
}
-static int raid1_error (mddev_t *mddev, kdev_t dev)
+static int error(mddev_t *mddev, kdev_t dev)
{
- raid1_conf_t *conf = mddev_to_conf(mddev);
- struct mirror_info * mirrors = conf->mirrors;
+ conf_t *conf = mddev_to_conf(mddev);
+ mirror_info_t * mirrors = conf->mirrors;
int disks = MD_SB_DISKS;
int i;
- /* Find the drive.
+ /*
+ * Find the drive.
* If it is not operational, then we have already marked it as dead
	 * else if it is the last working disk, ignore the error, let the
* next level up know.
* else mark the drive as failed
*/
-
for (i = 0; i < disks; i++)
- if (mirrors[i].dev==dev && mirrors[i].operational)
+ if (mirrors[i].dev == dev && mirrors[i].operational)
break;
if (i == disks)
return 0;
- if (i < conf->raid_disks && conf->working_disks == 1) {
- /* Don't fail the drive, act as though we were just a
+ if (i < conf->raid_disks && conf->working_disks == 1)
+ /*
+ * Don't fail the drive, act as though we were just a
* normal single drive
*/
-
return 1;
- }
mark_disk_bad(mddev, i);
return 0;
}
@@ -790,41 +650,42 @@ static int raid1_error (mddev_t *mddev, kdev_t dev)
#undef START_SYNCING
-static void print_raid1_conf (raid1_conf_t *conf)
+static void print_conf(conf_t *conf)
{
int i;
- struct mirror_info *tmp;
+ mirror_info_t *tmp;
printk("RAID1 conf printout:\n");
if (!conf) {
- printk("(conf==NULL)\n");
+ printk("(!conf)\n");
return;
}
printk(" --- wd:%d rd:%d nd:%d\n", conf->working_disks,
- conf->raid_disks, conf->nr_disks);
+ conf->raid_disks, conf->nr_disks);
for (i = 0; i < MD_SB_DISKS; i++) {
tmp = conf->mirrors + i;
printk(" disk %d, s:%d, o:%d, n:%d rd:%d us:%d dev:%s\n",
- i, tmp->spare,tmp->operational,
- tmp->number,tmp->raid_disk,tmp->used_slot,
+ i, tmp->spare, tmp->operational,
+ tmp->number, tmp->raid_disk, tmp->used_slot,
partition_name(tmp->dev));
}
}
-static void close_sync(raid1_conf_t *conf)
+static void close_sync(conf_t *conf)
{
mddev_t *mddev = conf->mddev;
- /* If reconstruction was interrupted, we need to close the "active" and "pending"
- * holes.
- * we know that there are no active rebuild requests, os cnt_active == cnt_ready ==0
+ /*
+ * If reconstruction was interrupted, we need to close the "active"
+ * and "pending" holes.
+ * we know that there are no active rebuild requests,
+	 * so cnt_active == cnt_ready == 0
*/
- /* this is really needed when recovery stops too... */
spin_lock_irq(&conf->segment_lock);
conf->start_active = conf->start_pending;
conf->start_ready = conf->start_pending;
wait_event_lock_irq(conf->wait_ready, !conf->cnt_pending, conf->segment_lock);
- conf->start_active =conf->start_ready = conf->start_pending = conf->start_future;
+ conf->start_active = conf->start_ready = conf->start_pending = conf->start_future;
conf->start_future = mddev->sb->size+1;
conf->cnt_pending = conf->cnt_future;
conf->cnt_future = 0;
@@ -838,18 +699,18 @@ static void close_sync(raid1_conf_t *conf)
wake_up(&conf->wait_done);
}
-static int raid1_diskop(mddev_t *mddev, mdp_disk_t **d, int state)
+static int diskop(mddev_t *mddev, mdp_disk_t **d, int state)
{
int err = 0;
- int i, failed_disk=-1, spare_disk=-1, removed_disk=-1, added_disk=-1;
- raid1_conf_t *conf = mddev->private;
- struct mirror_info *tmp, *sdisk, *fdisk, *rdisk, *adisk;
+ int i, failed_disk = -1, spare_disk = -1, removed_disk = -1, added_disk = -1;
+ conf_t *conf = mddev->private;
+ mirror_info_t *tmp, *sdisk, *fdisk, *rdisk, *adisk;
mdp_super_t *sb = mddev->sb;
mdp_disk_t *failed_desc, *spare_desc, *added_desc;
mdk_rdev_t *spare_rdev, *failed_rdev;
- print_raid1_conf(conf);
- md_spin_lock_irq(&conf->device_lock);
+ print_conf(conf);
+ spin_lock_irq(&conf->device_lock);
/*
* find the disk ...
*/
@@ -871,7 +732,7 @@ static int raid1_diskop(mddev_t *mddev, mdp_disk_t **d, int state)
}
/*
* When we activate a spare disk we _must_ have a disk in
- * the lower (active) part of the array to replace.
+ * the lower (active) part of the array to replace.
*/
if ((failed_disk == -1) || (failed_disk >= conf->raid_disks)) {
MD_BUG();
@@ -982,7 +843,7 @@ static int raid1_diskop(mddev_t *mddev, mdp_disk_t **d, int state)
err = 1;
goto abort;
}
-
+
if (sdisk->raid_disk != spare_disk) {
MD_BUG();
err = 1;
@@ -1007,13 +868,14 @@ static int raid1_diskop(mddev_t *mddev, mdp_disk_t **d, int state)
spare_rdev = find_rdev_nr(mddev, spare_desc->number);
failed_rdev = find_rdev_nr(mddev, failed_desc->number);
- /* There must be a spare_rdev, but there may not be a
- * failed_rdev. That slot might be empty...
+ /*
+ * There must be a spare_rdev, but there may not be a
+ * failed_rdev. That slot might be empty...
*/
spare_rdev->desc_nr = failed_desc->number;
if (failed_rdev)
failed_rdev->desc_nr = spare_desc->number;
-
+
xchg_values(*spare_desc, *failed_desc);
xchg_values(*fdisk, *sdisk);
@@ -1024,7 +886,6 @@ static int raid1_diskop(mddev_t *mddev, mdp_disk_t **d, int state)
* give the proper raid_disk number to the now activated
* disk. (this means we switch back these values)
*/
-
xchg_values(spare_desc->raid_disk, failed_desc->raid_disk);
xchg_values(sdisk->raid_disk, fdisk->raid_disk);
xchg_values(spare_desc->number, failed_desc->number);
@@ -1054,7 +915,7 @@ static int raid1_diskop(mddev_t *mddev, mdp_disk_t **d, int state)
rdisk = conf->mirrors + removed_disk;
if (rdisk->spare && (removed_disk < conf->raid_disks)) {
- MD_BUG();
+ MD_BUG();
err = 1;
goto abort;
}
@@ -1068,14 +929,14 @@ static int raid1_diskop(mddev_t *mddev, mdp_disk_t **d, int state)
added_desc = *d;
if (added_disk != added_desc->number) {
- MD_BUG();
+ MD_BUG();
err = 1;
goto abort;
}
adisk->number = added_desc->number;
adisk->raid_disk = added_desc->raid_disk;
- adisk->dev = MKDEV(added_desc->major,added_desc->minor);
+ adisk->dev = MKDEV(added_desc->major, added_desc->minor);
adisk->operational = 0;
adisk->write_only = 0;
@@ -1087,17 +948,18 @@ static int raid1_diskop(mddev_t *mddev, mdp_disk_t **d, int state)
break;
default:
- MD_BUG();
+ MD_BUG();
err = 1;
goto abort;
}
abort:
- md_spin_unlock_irq(&conf->device_lock);
- if (state == DISKOP_SPARE_ACTIVE || state == DISKOP_SPARE_INACTIVE)
- /* should move to "END_REBUILD" when such exists */
- raid1_shrink_buffers(conf);
+ spin_unlock_irq(&conf->device_lock);
+ if (state == DISKOP_SPARE_ACTIVE || state == DISKOP_SPARE_INACTIVE) {
+ mempool_destroy(conf->r1buf_pool);
+ conf->r1buf_pool = NULL;
+ }
- print_raid1_conf(conf);
+ print_conf(conf);
return err;
}
@@ -1108,6 +970,122 @@ abort:
#define REDIRECT_SECTOR KERN_ERR \
"raid1: %s: redirecting sector %lu to another mirror\n"
+static int end_sync_read(struct bio *bio, int nr_sectors)
+{
+ int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
+ r1bio_t * r1_bio = (r1bio_t *)(bio->bi_private);
+
+ check_all_w_bios_empty(r1_bio);
+ if (r1_bio->read_bio != bio)
+ BUG();
+ /*
+ * we have read a block, now it needs to be re-written,
+ * or re-read if the read failed.
+ * We don't do much here, just schedule handling by raid1d
+ */
+ if (!uptodate)
+ md_error (r1_bio->mddev, bio->bi_dev);
+ else
+ set_bit(R1BIO_Uptodate, &r1_bio->state);
+ reschedule_retry(r1_bio);
+
+ return 0;
+}
+
+static int end_sync_write(struct bio *bio, int nr_sectors)
+{
+ int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
+ r1bio_t * r1_bio = (r1bio_t *)(bio->bi_private);
+ mddev_t *mddev = r1_bio->mddev;
+
+ if (!uptodate)
+ md_error(mddev, bio->bi_dev);
+
+ if (atomic_dec_and_test(&r1_bio->remaining)) {
+ sync_request_done(r1_bio->sector, mddev_to_conf(mddev));
+ md_done_sync(mddev, r1_bio->master_bio->bi_size >> 9, uptodate);
+ put_buf(r1_bio);
+ }
+ return 0;
+}
+
+static void sync_request_write(mddev_t *mddev, r1bio_t *r1_bio)
+{
+ conf_t *conf = mddev_to_conf(mddev);
+ int i, sum_bios = 0;
+ int disks = MD_SB_DISKS;
+ struct bio *bio, *mbio;
+
+ bio = r1_bio->master_bio;
+
+ /*
+ * have to allocate lots of bio structures and
+ * schedule writes
+ */
+ if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) {
+ /*
+ * There is no point trying a read-for-reconstruct as
+ * reconstruct is about to be aborted
+ */
+ printk(IO_ERROR, partition_name(bio->bi_dev), r1_bio->sector);
+ md_done_sync(mddev, r1_bio->master_bio->bi_size >> 9, 0);
+ return;
+ }
+
+ check_all_w_bios_empty(r1_bio);
+
+ for (i = 0; i < disks ; i++) {
+ if (!conf->mirrors[i].operational)
+ continue;
+ if (i == conf->last_used)
+ /*
+ * we read from here, no need to write
+ */
+ continue;
+ if (i < conf->raid_disks && !conf->resync_mirrors)
+ /*
+			 * don't need to write this, we are just rebuilding
+ */
+ continue;
+
+ mbio = bio_clone(bio, GFP_NOIO);
+ if (r1_bio->write_bios[i])
+ BUG();
+ r1_bio->write_bios[i] = mbio;
+ mbio->bi_dev = conf->mirrors[i].dev;
+ mbio->bi_sector = r1_bio->sector;
+ mbio->bi_end_io = end_sync_write;
+ mbio->bi_rw = WRITE;
+ mbio->bi_private = r1_bio;
+
+ sum_bios++;
+ }
+ if (i != disks)
+ BUG();
+ atomic_set(&r1_bio->remaining, sum_bios);
+
+
+ if (!sum_bios) {
+ /*
+ * Nowhere to write this to... I guess we
+ * must be done
+ */
+ printk(IO_ERROR, partition_name(bio->bi_dev), r1_bio->sector);
+ sync_request_done(r1_bio->sector, conf);
+ md_done_sync(mddev, r1_bio->master_bio->bi_size >> 9, 0);
+ put_buf(r1_bio);
+ return;
+ }
+ for (i = 0; i < disks ; i++) {
+ mbio = r1_bio->write_bios[i];
+ if (!mbio)
+ continue;
+
+ md_sync_acct(mbio->bi_dev, mbio->bi_size >> 9);
+ generic_make_request(mbio);
+ }
+}
+
/*
* This is a kernel thread which:
*
@@ -1115,134 +1093,56 @@ abort:
 * 2. Updates the raid superblock when problems are encountered.
 * 3. Performs writes following reads for array synchronising.
*/
-static void end_sync_write(struct buffer_head *bh, int uptodate);
-static void end_sync_read(struct buffer_head *bh, int uptodate);
-static void raid1d (void *data)
+static void raid1d(void *data)
{
- struct raid1_bh *r1_bh;
- struct buffer_head *bh;
+ struct list_head *head = &retry_list_head;
+ r1bio_t *r1_bio;
+ struct bio *bio;
unsigned long flags;
mddev_t *mddev;
kdev_t dev;
for (;;) {
- md_spin_lock_irqsave(&retry_list_lock, flags);
- r1_bh = raid1_retry_list;
- if (!r1_bh)
+ spin_lock_irqsave(&retry_list_lock, flags);
+ if (list_empty(head))
break;
- raid1_retry_list = r1_bh->next_r1;
- md_spin_unlock_irqrestore(&retry_list_lock, flags);
+ r1_bio = list_entry(head->prev, r1bio_t, retry_list);
+ list_del(head->prev);
+ spin_unlock_irqrestore(&retry_list_lock, flags);
+ check_all_w_bios_empty(r1_bio);
- mddev = r1_bh->mddev;
+ mddev = r1_bio->mddev;
if (mddev->sb_dirty) {
printk(KERN_INFO "raid1: dirty sb detected, updating.\n");
mddev->sb_dirty = 0;
md_update_sb(mddev);
}
- bh = &r1_bh->bh_req;
- switch(r1_bh->cmd) {
+ bio = r1_bio->master_bio;
+ switch(r1_bio->cmd) {
case SPECIAL:
- /* have to allocate lots of bh structures and
- * schedule writes
- */
- if (test_bit(R1BH_Uptodate, &r1_bh->state)) {
- int i, sum_bhs = 0;
- int disks = MD_SB_DISKS;
- struct buffer_head *bhl, *mbh;
- raid1_conf_t *conf;
-
- conf = mddev_to_conf(mddev);
- bhl = raid1_alloc_bh(conf, conf->raid_disks); /* don't really need this many */
- for (i = 0; i < disks ; i++) {
- if (!conf->mirrors[i].operational)
- continue;
- if (i==conf->last_used)
- /* we read from here, no need to write */
- continue;
- if (i < conf->raid_disks
- && !conf->resync_mirrors)
- /* don't need to write this,
- * we are just rebuilding */
- continue;
- mbh = bhl;
- if (!mbh) {
- MD_BUG();
- break;
- }
- bhl = mbh->b_next;
- mbh->b_this_page = (struct buffer_head *)1;
-
-
- /*
- * prepare mirrored bh (fields ordered for max mem throughput):
- */
- mbh->b_blocknr = bh->b_blocknr;
- mbh->b_dev = conf->mirrors[i].dev;
- mbh->b_rdev = conf->mirrors[i].dev;
- mbh->b_rsector = bh->b_blocknr;
- mbh->b_state = (1<<BH_Req) | (1<<BH_Dirty) |
- (1<<BH_Mapped) | (1<<BH_Lock);
- atomic_set(&mbh->b_count, 1);
- mbh->b_size = bh->b_size;
- mbh->b_page = bh->b_page;
- mbh->b_data = bh->b_data;
- mbh->b_list = BUF_LOCKED;
- mbh->b_end_io = end_sync_write;
- mbh->b_private = r1_bh;
-
- mbh->b_next = r1_bh->mirror_bh_list;
- r1_bh->mirror_bh_list = mbh;
-
- sum_bhs++;
- }
- md_atomic_set(&r1_bh->remaining, sum_bhs);
- if (bhl) raid1_free_bh(conf, bhl);
- mbh = r1_bh->mirror_bh_list;
-
- if (!sum_bhs) {
- /* nowhere to write this too... I guess we
- * must be done
- */
- sync_request_done(bh->b_blocknr, conf);
- md_done_sync(mddev, bh->b_size>>9, 0);
- raid1_free_buf(r1_bh);
- } else
- while (mbh) {
- struct buffer_head *bh1 = mbh;
- mbh = mbh->b_next;
- generic_make_request(WRITE, bh1);
- md_sync_acct(bh1->b_dev, bh1->b_size/512);
- }
- } else {
- /* There is no point trying a read-for-reconstruct
- * as reconstruct is about to be aborted
- */
-
- printk (IO_ERROR, partition_name(bh->b_dev), bh->b_blocknr);
- md_done_sync(mddev, bh->b_size>>9, 0);
- }
-
+ sync_request_write(mddev, r1_bio);
break;
case READ:
case READA:
- dev = bh->b_dev;
- raid1_map (mddev, &bh->b_dev);
- if (bh->b_dev == dev) {
- printk (IO_ERROR, partition_name(bh->b_dev), bh->b_blocknr);
- raid1_end_bh_io(r1_bh, 0);
- } else {
- printk (REDIRECT_SECTOR,
- partition_name(bh->b_dev), bh->b_blocknr);
- bh->b_rdev = bh->b_dev;
- bh->b_rsector = bh->b_blocknr;
- generic_make_request (r1_bh->cmd, bh);
+ dev = bio->bi_dev;
+ map(mddev, &bio->bi_dev);
+ if (bio->bi_dev == dev) {
+ printk(IO_ERROR, partition_name(bio->bi_dev), r1_bio->sector);
+ raid_end_bio_io(r1_bio, 0, 0);
+ break;
}
+ printk(REDIRECT_SECTOR,
+ partition_name(bio->bi_dev), r1_bio->sector);
+ bio->bi_sector = r1_bio->sector;
+ bio->bi_rw = r1_bio->cmd;
+
+ generic_make_request(bio);
break;
}
}
- md_spin_unlock_irqrestore(&retry_list_lock, flags);
+ spin_unlock_irqrestore(&retry_list_lock, flags);
}
#undef IO_ERROR
#undef REDIRECT_SECTOR
@@ -1251,9 +1151,9 @@ static void raid1d (void *data)
* Private kernel thread to reconstruct mirrors after an unclean
* shutdown.
*/
-static void raid1syncd (void *data)
+static void raid1syncd(void *data)
{
- raid1_conf_t *conf = data;
+ conf_t *conf = data;
mddev_t *mddev = conf->mddev;
if (!conf->resync_mirrors)
@@ -1271,7 +1171,56 @@ static void raid1syncd (void *data)
close_sync(conf);
up(&mddev->recovery_sem);
- raid1_shrink_buffers(conf);
+}
+
+static int init_resync(conf_t *conf)
+{
+ int buffs;
+
+ conf->start_active = 0;
+ conf->start_ready = 0;
+ conf->start_pending = 0;
+ conf->start_future = 0;
+ conf->phase = 0;
+
+ buffs = RESYNC_WINDOW / RESYNC_BLOCK_SIZE;
+ if (conf->r1buf_pool)
+ BUG();
+ conf->r1buf_pool = mempool_create(buffs, r1buf_pool_alloc, r1buf_pool_free, conf);
+ if (!conf->r1buf_pool)
+ return -ENOMEM;
+ conf->window = 2048;
+ conf->cnt_future += conf->cnt_done+conf->cnt_pending;
+ conf->cnt_done = conf->cnt_pending = 0;
+ if (conf->cnt_ready || conf->cnt_active)
+ MD_BUG();
+ return 0;
+}
+
+static void wait_sync_pending(conf_t *conf, sector_t sector_nr)
+{
+ spin_lock_irq(&conf->segment_lock);
+ while (sector_nr >= conf->start_pending) {
+// printk("wait .. sect=%lu start_active=%d ready=%d pending=%d future=%d, cnt_done=%d active=%d ready=%d pending=%d future=%d\n", sector_nr, conf->start_active, conf->start_ready, conf->start_pending, conf->start_future, conf->cnt_done, conf->cnt_active, conf->cnt_ready, conf->cnt_pending, conf->cnt_future);
+ wait_event_lock_irq(conf->wait_done, !conf->cnt_active,
+ conf->segment_lock);
+ wait_event_lock_irq(conf->wait_ready, !conf->cnt_pending,
+ conf->segment_lock);
+ conf->start_active = conf->start_ready;
+ conf->start_ready = conf->start_pending;
+ conf->start_pending = conf->start_future;
+ conf->start_future = conf->start_future+conf->window;
+
+ // Note: falling off the end is not a problem
+ conf->phase = conf->phase ^1;
+ conf->cnt_active = conf->cnt_ready;
+ conf->cnt_ready = 0;
+ conf->cnt_pending = conf->cnt_future;
+ conf->cnt_future = 0;
+ wake_up(&conf->wait_done);
+ }
+ conf->cnt_ready++;
+ spin_unlock_irq(&conf->segment_lock);
}
/*
@@ -1279,7 +1228,7 @@ static void raid1syncd (void *data)
*
* We need to make sure that no normal I/O request - particularly write
* requests - conflict with active sync requests.
- * This is achieved by conceptually dividing the device space into a
+ * This is achieved by conceptually dividing the block space into a
* number of sections:
* DONE: 0 .. a-1 These blocks are in-sync
* ACTIVE: a.. b-1 These blocks may have active sync requests, but
@@ -1322,149 +1271,81 @@ static void raid1syncd (void *data)
* issue suitable write requests
*/
-static int raid1_sync_request (mddev_t *mddev, unsigned long sector_nr)
+static int sync_request(mddev_t *mddev, sector_t sector_nr)
{
- raid1_conf_t *conf = mddev_to_conf(mddev);
- struct mirror_info *mirror;
- struct raid1_bh *r1_bh;
- struct buffer_head *bh;
- int bsize;
- int disk;
- int block_nr;
+ conf_t *conf = mddev_to_conf(mddev);
+ mirror_info_t *mirror;
+ r1bio_t *r1_bio;
+ struct bio *read_bio, *bio;
+ sector_t max_sector, nr_sectors;
+ int disk, partial;
- spin_lock_irq(&conf->segment_lock);
- if (!sector_nr) {
- /* initialize ...*/
- int buffs;
- conf->start_active = 0;
- conf->start_ready = 0;
- conf->start_pending = 0;
- conf->start_future = 0;
- conf->phase = 0;
- /* we want enough buffers to hold twice the window of 128*/
- buffs = 128 *2 / (PAGE_SIZE>>9);
- buffs = raid1_grow_buffers(conf, buffs);
- if (buffs < 2)
- goto nomem;
-
- conf->window = buffs*(PAGE_SIZE>>9)/2;
- conf->cnt_future += conf->cnt_done+conf->cnt_pending;
- conf->cnt_done = conf->cnt_pending = 0;
- if (conf->cnt_ready || conf->cnt_active)
- MD_BUG();
- }
- while (sector_nr >= conf->start_pending) {
- PRINTK("wait .. sect=%lu start_active=%d ready=%d pending=%d future=%d, cnt_done=%d active=%d ready=%d pending=%d future=%d\n",
- sector_nr, conf->start_active, conf->start_ready, conf->start_pending, conf->start_future,
- conf->cnt_done, conf->cnt_active, conf->cnt_ready, conf->cnt_pending, conf->cnt_future);
- wait_event_lock_irq(conf->wait_done,
- !conf->cnt_active,
- conf->segment_lock);
- wait_event_lock_irq(conf->wait_ready,
- !conf->cnt_pending,
- conf->segment_lock);
- conf->start_active = conf->start_ready;
- conf->start_ready = conf->start_pending;
- conf->start_pending = conf->start_future;
- conf->start_future = conf->start_future+conf->window;
- // Note: falling off the end is not a problem
- conf->phase = conf->phase ^1;
- conf->cnt_active = conf->cnt_ready;
- conf->cnt_ready = 0;
- conf->cnt_pending = conf->cnt_future;
- conf->cnt_future = 0;
- wake_up(&conf->wait_done);
- }
- conf->cnt_ready++;
- spin_unlock_irq(&conf->segment_lock);
-
+ if (!sector_nr)
+ if (init_resync(conf))
+ return -ENOMEM;
- /* If reconstructing, and >1 working disc,
+ wait_sync_pending(conf, sector_nr);
+
+ /*
+ * If reconstructing, and >1 working disc,
* could dedicate one to rebuild and others to
* service read requests ..
*/
disk = conf->last_used;
/* make sure disk is operational */
while (!conf->mirrors[disk].operational) {
- if (disk <= 0) disk = conf->raid_disks;
+ if (disk <= 0)
+ disk = conf->raid_disks;
disk--;
if (disk == conf->last_used)
break;
}
conf->last_used = disk;
-
+
mirror = conf->mirrors+conf->last_used;
-
- r1_bh = raid1_alloc_buf (conf);
- r1_bh->master_bh = NULL;
- r1_bh->mddev = mddev;
- r1_bh->cmd = SPECIAL;
- bh = &r1_bh->bh_req;
-
- block_nr = sector_nr;
- bsize = 512;
- while (!(block_nr & 1) && bsize < PAGE_SIZE
- && (block_nr+2)*(bsize>>9) < (mddev->sb->size *2)) {
- block_nr >>= 1;
- bsize <<= 1;
- }
- bh->b_size = bsize;
- bh->b_list = BUF_LOCKED;
- bh->b_dev = mirror->dev;
- bh->b_rdev = mirror->dev;
- bh->b_state = (1<<BH_Req) | (1<<BH_Mapped) | (1<<BH_Lock);
- if (!bh->b_page)
- BUG();
- if (!bh->b_data)
- BUG();
- if (bh->b_data != page_address(bh->b_page))
+
+ r1_bio = mempool_alloc(conf->r1buf_pool, GFP_NOIO);
+ check_all_bios_empty(r1_bio);
+
+ r1_bio->mddev = mddev;
+ r1_bio->sector = sector_nr;
+ r1_bio->cmd = SPECIAL;
+
+ max_sector = mddev->sb->size << 1;
+ if (sector_nr >= max_sector)
BUG();
- bh->b_end_io = end_sync_read;
- bh->b_private = r1_bh;
- bh->b_blocknr = sector_nr;
- bh->b_rsector = sector_nr;
- init_waitqueue_head(&bh->b_wait);
- generic_make_request(READ, bh);
- md_sync_acct(bh->b_dev, bh->b_size/512);
+ bio = r1_bio->master_bio;
+ nr_sectors = RESYNC_BLOCK_SIZE >> 9;
+ if (max_sector - sector_nr < nr_sectors)
+ nr_sectors = max_sector - sector_nr;
+ bio->bi_size = nr_sectors << 9;
+ bio->bi_vcnt = (bio->bi_size + PAGE_SIZE-1) / PAGE_SIZE;
+ /*
+ * Is there a partial page at the end of the request?
+ */
+ partial = bio->bi_size % PAGE_SIZE;
+ if (partial)
+ bio->bi_io_vec[bio->bi_vcnt-1].bv_len = partial;
- return (bsize >> 9);
-nomem:
- raid1_shrink_buffers(conf);
- spin_unlock_irq(&conf->segment_lock);
- return -ENOMEM;
-}
+ read_bio = bio_clone(r1_bio->master_bio, GFP_NOIO);
-static void end_sync_read(struct buffer_head *bh, int uptodate)
-{
- struct raid1_bh * r1_bh = (struct raid1_bh *)(bh->b_private);
+ read_bio->bi_sector = sector_nr;
+ read_bio->bi_dev = mirror->dev;
+ read_bio->bi_end_io = end_sync_read;
+ read_bio->bi_rw = READ;
+ read_bio->bi_private = r1_bio;
- /* we have read a block, now it needs to be re-written,
- * or re-read if the read failed.
- * We don't do much here, just schedule handling by raid1d
- */
- if (!uptodate)
- md_error (r1_bh->mddev, bh->b_dev);
- else
- set_bit(R1BH_Uptodate, &r1_bh->state);
- raid1_reschedule_retry(r1_bh);
-}
+ if (r1_bio->read_bio)
+ BUG();
+ r1_bio->read_bio = read_bio;
-static void end_sync_write(struct buffer_head *bh, int uptodate)
-{
- struct raid1_bh * r1_bh = (struct raid1_bh *)(bh->b_private);
-
- if (!uptodate)
- md_error (r1_bh->mddev, bh->b_dev);
- if (atomic_dec_and_test(&r1_bh->remaining)) {
- mddev_t *mddev = r1_bh->mddev;
- unsigned long sect = bh->b_blocknr;
- int size = bh->b_size;
- raid1_free_buf(r1_bh);
- sync_request_done(sect, mddev_to_conf(mddev));
- md_done_sync(mddev,size>>9, uptodate);
- }
+ md_sync_acct(read_bio->bi_dev, nr_sectors);
+
+ generic_make_request(read_bio);
+
+ return nr_sectors;
}
#define INVALID_LEVEL KERN_WARNING \
@@ -1506,15 +1387,15 @@ static void end_sync_write(struct buffer_head *bh, int uptodate)
#define START_RESYNC KERN_WARNING \
"raid1: raid set md%d not clean; reconstructing mirrors\n"
-static int raid1_run (mddev_t *mddev)
+static int run(mddev_t *mddev)
{
- raid1_conf_t *conf;
+ conf_t *conf;
int i, j, disk_idx;
- struct mirror_info *disk;
+ mirror_info_t *disk;
mdp_super_t *sb = mddev->sb;
mdp_disk_t *descriptor;
mdk_rdev_t *rdev;
- struct md_list_head *tmp;
+ struct list_head *tmp;
int start_recovery = 0;
MOD_INC_USE_COUNT;
@@ -1525,11 +1406,10 @@ static int raid1_run (mddev_t *mddev)
}
/*
* copy the already verified devices into our private RAID1
- * bookkeeping area. [whatever we allocate in raid1_run(),
- * should be freed in raid1_stop()]
+ * bookkeeping area. [whatever we allocate in run(),
+ * should be freed in stop()]
*/
-
- conf = kmalloc(sizeof(raid1_conf_t), GFP_KERNEL);
+ conf = kmalloc(sizeof(conf_t), GFP_KERNEL);
mddev->private = conf;
if (!conf) {
printk(MEM_ERROR, mdidx(mddev));
@@ -1537,7 +1417,16 @@ static int raid1_run (mddev_t *mddev)
}
memset(conf, 0, sizeof(*conf));
- ITERATE_RDEV(mddev,rdev,tmp) {
+ conf->r1bio_pool = mempool_create(NR_RAID1_BIOS, r1bio_pool_alloc,
+ r1bio_pool_free, NULL);
+ if (!conf->r1bio_pool) {
+ printk(MEM_ERROR, mdidx(mddev));
+ goto out;
+ }
+
+// for (tmp = (mddev)->disks.next; rdev = ((mdk_rdev_t *)((char *)(tmp)-(unsigned long)(&((mdk_rdev_t *)0)->same_set))), tmp = tmp->next, tmp->prev != &(mddev)->disks ; ) {
+
+ ITERATE_RDEV(mddev, rdev, tmp) {
if (rdev->faulty) {
printk(ERRORS, partition_name(rdev->dev));
} else {
@@ -1573,7 +1462,7 @@ static int raid1_run (mddev_t *mddev)
continue;
}
if ((descriptor->number > MD_SB_DISKS) ||
- (disk_idx > sb->raid_disks)) {
+ (disk_idx > sb->raid_disks)) {
printk(INCONSISTENT,
partition_name(rdev->dev));
@@ -1586,7 +1475,7 @@ static int raid1_run (mddev_t *mddev)
continue;
}
printk(OPERATIONAL, partition_name(rdev->dev),
- disk_idx);
+ disk_idx);
disk->number = descriptor->number;
disk->raid_disk = disk_idx;
disk->dev = rdev->dev;
@@ -1616,10 +1505,9 @@ static int raid1_run (mddev_t *mddev)
conf->raid_disks = sb->raid_disks;
conf->nr_disks = sb->nr_disks;
conf->mddev = mddev;
- conf->device_lock = MD_SPIN_LOCK_UNLOCKED;
+ conf->device_lock = SPIN_LOCK_UNLOCKED;
- conf->segment_lock = MD_SPIN_LOCK_UNLOCKED;
- init_waitqueue_head(&conf->wait_buffer);
+ conf->segment_lock = SPIN_LOCK_UNLOCKED;
init_waitqueue_head(&conf->wait_done);
init_waitqueue_head(&conf->wait_ready);
@@ -1628,25 +1516,8 @@ static int raid1_run (mddev_t *mddev)
goto out_free_conf;
}
-
- /* pre-allocate some buffer_head structures.
- * As a minimum, 1 r1bh and raid_disks buffer_heads
- * would probably get us by in tight memory situations,
- * but a few more is probably a good idea.
- * For now, try NR_RESERVED_BUFS r1bh and
- * NR_RESERVED_BUFS*raid_disks bufferheads
- * This will allow at least NR_RESERVED_BUFS concurrent
- * reads or writes even if kmalloc starts failing
- */
- if (raid1_grow_r1bh(conf, NR_RESERVED_BUFS) < NR_RESERVED_BUFS ||
- raid1_grow_bh(conf, NR_RESERVED_BUFS*conf->raid_disks)
- < NR_RESERVED_BUFS*conf->raid_disks) {
- printk(MEM_ERROR, mdidx(mddev));
- goto out_free_conf;
- }
-
for (i = 0; i < MD_SB_DISKS; i++) {
-
+
descriptor = sb->disks+i;
disk_idx = descriptor->raid_disk;
disk = conf->mirrors + disk_idx;
@@ -1691,10 +1562,10 @@ static int raid1_run (mddev_t *mddev)
}
if (!start_recovery && !(sb->state & (1 << MD_SB_CLEAN)) &&
- (conf->working_disks > 1)) {
+ (conf->working_disks > 1)) {
const char * name = "raid1syncd";
- conf->resync_thread = md_register_thread(raid1syncd, conf,name);
+ conf->resync_thread = md_register_thread(raid1syncd, conf, name);
if (!conf->resync_thread) {
printk(THREAD_ERROR, mdidx(mddev));
goto out_free_conf;
@@ -1731,9 +1602,8 @@ static int raid1_run (mddev_t *mddev)
return 0;
out_free_conf:
- raid1_shrink_r1bh(conf);
- raid1_shrink_bh(conf);
- raid1_shrink_buffers(conf);
+ if (conf->r1bio_pool)
+ mempool_destroy(conf->r1bio_pool);
kfree(conf);
mddev->private = NULL;
out:
@@ -1752,9 +1622,9 @@ out:
#undef NONE_OPERATIONAL
#undef ARRAY_IS_ACTIVE
-static int raid1_stop_resync (mddev_t *mddev)
+static int stop_resync(mddev_t *mddev)
{
- raid1_conf_t *conf = mddev_to_conf(mddev);
+ conf_t *conf = mddev_to_conf(mddev);
if (conf->resync_thread) {
if (conf->resync_mirrors) {
@@ -1769,9 +1639,9 @@ static int raid1_stop_resync (mddev_t *mddev)
return 0;
}
-static int raid1_restart_resync (mddev_t *mddev)
+static int restart_resync(mddev_t *mddev)
{
- raid1_conf_t *conf = mddev_to_conf(mddev);
+ conf_t *conf = mddev_to_conf(mddev);
if (conf->resync_mirrors) {
if (!conf->resync_thread) {
@@ -1785,46 +1655,45 @@ static int raid1_restart_resync (mddev_t *mddev)
return 0;
}
-static int raid1_stop (mddev_t *mddev)
+static int stop(mddev_t *mddev)
{
- raid1_conf_t *conf = mddev_to_conf(mddev);
+ conf_t *conf = mddev_to_conf(mddev);
md_unregister_thread(conf->thread);
if (conf->resync_thread)
md_unregister_thread(conf->resync_thread);
- raid1_shrink_r1bh(conf);
- raid1_shrink_bh(conf);
- raid1_shrink_buffers(conf);
+ if (conf->r1bio_pool)
+ mempool_destroy(conf->r1bio_pool);
kfree(conf);
mddev->private = NULL;
MOD_DEC_USE_COUNT;
return 0;
}
-static mdk_personality_t raid1_personality=
+static mdk_personality_t raid1_personality =
{
name: "raid1",
- make_request: raid1_make_request,
- run: raid1_run,
- stop: raid1_stop,
- status: raid1_status,
- error_handler: raid1_error,
- diskop: raid1_diskop,
- stop_resync: raid1_stop_resync,
- restart_resync: raid1_restart_resync,
- sync_request: raid1_sync_request
+ make_request: make_request,
+ run: run,
+ stop: stop,
+ status: status,
+ error_handler: error,
+ diskop: diskop,
+ stop_resync: stop_resync,
+ restart_resync: restart_resync,
+ sync_request: sync_request
};
-static int md__init raid1_init (void)
+static int __init raid_init(void)
{
- return register_md_personality (RAID1, &raid1_personality);
+ return register_md_personality(RAID1, &raid1_personality);
}
-static void raid1_exit (void)
+static void raid_exit(void)
{
- unregister_md_personality (RAID1);
+ unregister_md_personality(RAID1);
}
-module_init(raid1_init);
-module_exit(raid1_exit);
+module_init(raid_init);
+module_exit(raid_exit);
MODULE_LICENSE("GPL");
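
The raid1 hunks above replace the driver's private buffer_head pools (raid1_grow_bh(), raid1_shrink_buffers() and friends) with the generic mempool API. The r1bio_pool_alloc()/r1bio_pool_free() callbacks handed to mempool_create() are defined earlier in drivers/md/raid1.c and fall outside these hunks; the following is a minimal sketch of the callback contract only, assuming a plain kmalloc-backed r1bio_t (the real allocator must also size the per-disk write_bios[] array):

	/*
	 * Sketch, not the in-tree code: illustrates the 2.5 mempool
	 * contract also visible in fs/bio.c below -- alloc takes
	 * (gfp_mask, pool_data), free takes (element, pool_data).
	 */
	static void *r1bio_pool_alloc(int gfp_mask, void *data)
	{
		r1bio_t *r1_bio = kmalloc(sizeof(*r1_bio), gfp_mask);
		if (r1_bio)
			memset(r1_bio, 0, sizeof(*r1_bio));
		return r1_bio;
	}

	static void r1bio_pool_free(void *r1_bio, void *data)
	{
		kfree(r1_bio);
	}

	/* creation in run() and use in make_request(), as above: */
	conf->r1bio_pool = mempool_create(NR_RAID1_BIOS, r1bio_pool_alloc,
					  r1bio_pool_free, NULL);
	r1_bio = mempool_alloc(conf->r1bio_pool, GFP_NOIO);

mempool_alloc() with GFP_NOIO may block until an element is returned to the pool but does not fail, which is presumably why the rewrite can drop the old wait_buffer waitqueue.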
diff --git a/drivers/net/tulip/ChangeLog b/drivers/net/tulip/ChangeLog
index a515efcfd..8a1caaa28 100644
--- a/drivers/net/tulip/ChangeLog
+++ b/drivers/net/tulip/ChangeLog
@@ -1,3 +1,8 @@
+2001-12-11 Jeff Garzik <jgarzik@mandrakesoft.com>
+
+ * eeprom.c, timer.c, media.c, tulip_core.c:
+ Remove 21040 and 21041 chip support.
+
2001-11-13 David S. Miller <davem@redhat.com>
* tulip_core.c (tulip_mwi_config): Kill unused label early_out.
diff --git a/drivers/net/tulip/eeprom.c b/drivers/net/tulip/eeprom.c
index beb1430cc..8777cc1f3 100644
--- a/drivers/net/tulip/eeprom.c
+++ b/drivers/net/tulip/eeprom.c
@@ -136,23 +136,6 @@ void __devinit tulip_parse_eeprom(struct net_device *dev)
subsequent_board:
if (ee_data[27] == 0) { /* No valid media table. */
- } else if (tp->chip_id == DC21041) {
- unsigned char *p = (void *)ee_data + ee_data[27 + controller_index*3];
- int media = get_u16(p);
- int count = p[2];
- p += 3;
-
- printk(KERN_INFO "%s: 21041 Media table, default media %4.4x (%s).\n",
- dev->name, media,
- media & 0x0800 ? "Autosense" : medianame[media & MEDIA_MASK]);
- for (i = 0; i < count; i++) {
- unsigned char media_block = *p++;
- int media_code = media_block & MEDIA_MASK;
- if (media_block & 0x40)
- p += 6;
- printk(KERN_INFO "%s: 21041 media #%d, %s.\n",
- dev->name, media_code, medianame[media_code]);
- }
} else {
unsigned char *p = (void *)ee_data + ee_data[27];
unsigned char csr12dir = 0;
diff --git a/drivers/net/tulip/media.c b/drivers/net/tulip/media.c
index 5d1329776..e7160fca0 100644
--- a/drivers/net/tulip/media.c
+++ b/drivers/net/tulip/media.c
@@ -21,12 +21,6 @@
#include "tulip.h"
-/* This is a mysterious value that can be written to CSR11 in the 21040 (only)
- to support a pre-NWay full-duplex signaling mechanism using short frames.
- No one knows what it should be, but if left at its default value some
- 10base2(!) packets trigger a full-duplex-request interrupt. */
-#define FULL_DUPLEX_MAGIC 0x6969
-
/* The maximum data clock rate is 2.5 Mhz. The minimum timing is usually
met by back-to-back PCI I/O cycles, but we insert a delay to avoid
"overclocking" issues or future 66Mhz PCI. */
@@ -326,17 +320,6 @@ void tulip_select_media(struct net_device *dev, int startup)
printk(KERN_DEBUG "%s: Using media type %s, CSR12 is %2.2x.\n",
dev->name, medianame[dev->if_port],
inl(ioaddr + CSR12) & 0xff);
- } else if (tp->chip_id == DC21041) {
- int port = dev->if_port <= 4 ? dev->if_port : 0;
- if (tulip_debug > 1)
- printk(KERN_DEBUG "%s: 21041 using media %s, CSR12 is %4.4x.\n",
- dev->name, medianame[port == 3 ? 12: port],
- inl(ioaddr + CSR12));
- outl(0x00000000, ioaddr + CSR13); /* Reset the serial interface */
- outl(t21041_csr14[port], ioaddr + CSR14);
- outl(t21041_csr15[port], ioaddr + CSR15);
- outl(t21041_csr13[port], ioaddr + CSR13);
- new_csr6 = 0x80020000;
} else if (tp->chip_id == LC82C168) {
if (startup && ! tp->medialock)
dev->if_port = tp->mii_cnt ? 11 : 0;
@@ -363,26 +346,6 @@ void tulip_select_media(struct net_device *dev, int startup)
new_csr6 = 0x00420000;
outl(0x1F078, ioaddr + 0xB8);
}
- } else if (tp->chip_id == DC21040) { /* 21040 */
- /* Turn on the xcvr interface. */
- int csr12 = inl(ioaddr + CSR12);
- if (tulip_debug > 1)
- printk(KERN_DEBUG "%s: 21040 media type is %s, CSR12 is %2.2x.\n",
- dev->name, medianame[dev->if_port], csr12);
- if (tulip_media_cap[dev->if_port] & MediaAlwaysFD)
- tp->full_duplex = 1;
- new_csr6 = 0x20000;
- /* Set the full duplux match frame. */
- outl(FULL_DUPLEX_MAGIC, ioaddr + CSR11);
- outl(0x00000000, ioaddr + CSR13); /* Reset the serial interface */
- if (t21040_csr13[dev->if_port] & 8) {
- outl(0x0705, ioaddr + CSR14);
- outl(0x0006, ioaddr + CSR15);
- } else {
- outl(0xffff, ioaddr + CSR14);
- outl(0x0000, ioaddr + CSR15);
- }
- outl(0x8f01 | t21040_csr13[dev->if_port], ioaddr + CSR13);
} else { /* Unknown chip type with no media table. */
if (tp->default_port == 0)
dev->if_port = tp->mii_cnt ? 11 : 3;
diff --git a/drivers/net/tulip/timer.c b/drivers/net/tulip/timer.c
index 4079772ae..53c43912b 100644
--- a/drivers/net/tulip/timer.c
+++ b/drivers/net/tulip/timer.c
@@ -33,60 +33,6 @@ void tulip_timer(unsigned long data)
inl(ioaddr + CSR14), inl(ioaddr + CSR15));
}
switch (tp->chip_id) {
- case DC21040:
- if (!tp->medialock && csr12 & 0x0002) { /* Network error */
- printk(KERN_INFO "%s: No link beat found.\n",
- dev->name);
- dev->if_port = (dev->if_port == 2 ? 0 : 2);
- tulip_select_media(dev, 0);
- dev->trans_start = jiffies;
- }
- break;
- case DC21041:
- if (tulip_debug > 2)
- printk(KERN_DEBUG "%s: 21041 media tick CSR12 %8.8x.\n",
- dev->name, csr12);
- if (tp->medialock) break;
- switch (dev->if_port) {
- case 0: case 3: case 4:
- if (csr12 & 0x0004) { /*LnkFail */
- /* 10baseT is dead. Check for activity on alternate port. */
- tp->mediasense = 1;
- if (csr12 & 0x0200)
- dev->if_port = 2;
- else
- dev->if_port = 1;
- printk(KERN_INFO "%s: No 21041 10baseT link beat, Media switched to %s.\n",
- dev->name, medianame[dev->if_port]);
- outl(0, ioaddr + CSR13); /* Reset */
- outl(t21041_csr14[dev->if_port], ioaddr + CSR14);
- outl(t21041_csr15[dev->if_port], ioaddr + CSR15);
- outl(t21041_csr13[dev->if_port], ioaddr + CSR13);
- next_tick = 10*HZ; /* 2.4 sec. */
- } else
- next_tick = 30*HZ;
- break;
- case 1: /* 10base2 */
- case 2: /* AUI */
- if (csr12 & 0x0100) {
- next_tick = (30*HZ); /* 30 sec. */
- tp->mediasense = 0;
- } else if ((csr12 & 0x0004) == 0) {
- printk(KERN_INFO "%s: 21041 media switched to 10baseT.\n",
- dev->name);
- dev->if_port = 0;
- tulip_select_media(dev, 0);
- next_tick = (24*HZ)/10; /* 2.4 sec. */
- } else if (tp->mediasense || (csr12 & 0x0002)) {
- dev->if_port = 3 - dev->if_port; /* Swap ports. */
- tulip_select_media(dev, 0);
- next_tick = 20*HZ;
- } else {
- next_tick = 20*HZ;
- }
- break;
- }
- break;
case DC21140:
case DC21142:
case MX98713:
diff --git a/drivers/net/tulip/tulip_core.c b/drivers/net/tulip/tulip_core.c
index 601046c8e..fef6b4463 100644
--- a/drivers/net/tulip/tulip_core.c
+++ b/drivers/net/tulip/tulip_core.c
@@ -15,8 +15,8 @@
*/
#define DRV_NAME "tulip"
-#define DRV_VERSION "0.9.15-pre9"
-#define DRV_RELDATE "Nov 6, 2001"
+#define DRV_VERSION "1.1.0"
+#define DRV_RELDATE "Dec 11, 2001"
#include <linux/config.h>
#include <linux/module.h>
@@ -130,12 +130,8 @@ int tulip_debug = 1;
*/
struct tulip_chip_table tulip_tbl[] = {
- /* DC21040 */
- { "Digital DC21040 Tulip", 128, 0x0001ebef, 0, tulip_timer },
-
- /* DC21041 */
- { "Digital DC21041 Tulip", 128, 0x0001ebef,
- HAS_MEDIA_TABLE | HAS_NWAY, tulip_timer },
+ { }, /* placeholder for array, slot unused currently */
+ { }, /* placeholder for array, slot unused currently */
/* DC21140 */
{ "Digital DS21140 Tulip", 128, 0x0001ebef,
@@ -192,8 +188,6 @@ struct tulip_chip_table tulip_tbl[] = {
static struct pci_device_id tulip_pci_tbl[] __devinitdata = {
- { 0x1011, 0x0002, PCI_ANY_ID, PCI_ANY_ID, 0, 0, DC21040 },
- { 0x1011, 0x0014, PCI_ANY_ID, PCI_ANY_ID, 0, 0, DC21041 },
{ 0x1011, 0x0009, PCI_ANY_ID, PCI_ANY_ID, 0, 0, DC21140 },
{ 0x1011, 0x0019, PCI_ANY_ID, PCI_ANY_ID, 0, 0, DC21143 },
{ 0x11AD, 0x0002, PCI_ANY_ID, PCI_ANY_ID, 0, 0, LC82C168 },
@@ -224,19 +218,6 @@ MODULE_DEVICE_TABLE(pci, tulip_pci_tbl);
/* A full-duplex map for media types. */
const char tulip_media_cap[32] =
{0,0,0,16, 3,19,16,24, 27,4,7,5, 0,20,23,20, 28,31,0,0, };
-u8 t21040_csr13[] = {2,0x0C,8,4, 4,0,0,0, 0,0,0,0, 4,0,0,0};
-
-/* 21041 transceiver register settings: 10-T, 10-2, AUI, 10-T, 10T-FD*/
-u16 t21041_csr13[] = {
- csr13_mask_10bt, /* 10-T */
- csr13_mask_auibnc, /* 10-2 */
- csr13_mask_auibnc, /* AUI */
- csr13_mask_10bt, /* 10-T */
- csr13_mask_10bt, /* 10T-FD */
-};
-u16 t21041_csr14[] = { 0xFFFF, 0xF7FD, 0xF7FD, 0x7F3F, 0x7F3D, };
-u16 t21041_csr15[] = { 0x0008, 0x0006, 0x000E, 0x0008, 0x0008, };
-
static void tulip_tx_timeout(struct net_device *dev);
static void tulip_init_ring(struct net_device *dev);
@@ -388,19 +369,6 @@ media_picked:
outl(0x0008, ioaddr + CSR15);
}
tulip_select_media(dev, 1);
- } else if (tp->chip_id == DC21041) {
- dev->if_port = 0;
- tp->nway = tp->mediasense = 1;
- tp->nwayset = tp->lpar = 0;
- outl(0x00000000, ioaddr + CSR13);
- outl(0xFFFFFFFF, ioaddr + CSR14);
- outl(0x00000008, ioaddr + CSR15); /* Listen on AUI also. */
- tp->csr6 = 0x80020000;
- if (tp->sym_advertise & 0x0040)
- tp->csr6 |= FullDuplex;
- outl(tp->csr6, ioaddr + CSR6);
- outl(0x0000EF01, ioaddr + CSR13);
-
} else if (tp->chip_id == DC21142) {
if (tp->mii_cnt) {
tulip_select_media(dev, 1);
@@ -538,33 +506,6 @@ static void tulip_tx_timeout(struct net_device *dev)
if (tulip_debug > 1)
printk(KERN_WARNING "%s: Transmit timeout using MII device.\n",
dev->name);
- } else if (tp->chip_id == DC21040) {
- if ( !tp->medialock && inl(ioaddr + CSR12) & 0x0002) {
- dev->if_port = (dev->if_port == 2 ? 0 : 2);
- printk(KERN_INFO "%s: 21040 transmit timed out, switching to "
- "%s.\n",
- dev->name, medianame[dev->if_port]);
- tulip_select_media(dev, 0);
- }
- goto out;
- } else if (tp->chip_id == DC21041) {
- int csr12 = inl(ioaddr + CSR12);
-
- printk(KERN_WARNING "%s: 21041 transmit timed out, status %8.8x, "
- "CSR12 %8.8x, CSR13 %8.8x, CSR14 %8.8x, resetting...\n",
- dev->name, inl(ioaddr + CSR5), csr12,
- inl(ioaddr + CSR13), inl(ioaddr + CSR14));
- tp->mediasense = 1;
- if ( ! tp->medialock) {
- if (dev->if_port == 1 || dev->if_port == 2)
- if (csr12 & 0x0004) {
- dev->if_port = 2 - dev->if_port;
- } else
- dev->if_port = 0;
- else
- dev->if_port = 1;
- tulip_select_media(dev, 0);
- }
} else if (tp->chip_id == DC21140 || tp->chip_id == DC21142
|| tp->chip_id == MX98713 || tp->chip_id == COMPEX9881
|| tp->chip_id == DM910X) {
@@ -636,7 +577,6 @@ static void tulip_tx_timeout(struct net_device *dev)
tp->stats.tx_errors++;
-out:
spin_unlock_irqrestore (&tp->lock, flags);
dev->trans_start = jiffies;
netif_wake_queue (dev);
@@ -802,10 +742,6 @@ static void tulip_down (struct net_device *dev)
/* release any unconsumed transmit buffers */
tulip_clean_tx_ring(tp);
- /* 21040 -- Leave the card in 10baseT state. */
- if (tp->chip_id == DC21040)
- outl (0x00000004, ioaddr + CSR13);
-
if (inl (ioaddr + CSR6) != 0xffffffff)
tp->stats.rx_missed_errors += inl (ioaddr + CSR8) & 0xffff;
@@ -966,16 +902,14 @@ static int private_ioctl (struct net_device *dev, struct ifreq *rq, int cmd)
0x1848 +
((csr12&0x7000) == 0x5000 ? 0x20 : 0) +
((csr12&0x06) == 6 ? 0 : 4);
- if (tp->chip_id != DC21041)
- data->val_out |= 0x6048;
+ data->val_out |= 0x6048;
break;
case 4:
/* Advertised value, bogus 10baseTx-FD value from CSR6. */
data->val_out =
((inl(ioaddr + CSR6) >> 3) & 0x0040) +
((csr14 >> 1) & 0x20) + 1;
- if (tp->chip_id != DC21041)
- data->val_out |= ((csr14 >> 9) & 0x03C0);
+ data->val_out |= ((csr14 >> 9) & 0x03C0);
break;
case 5: data->val_out = tp->lpar; break;
default: data->val_out = 0; break;
@@ -1358,7 +1292,6 @@ static int __devinit tulip_init_one (struct pci_dev *pdev,
long ioaddr;
static int board_idx = -1;
int chip_idx = ent->driver_data;
- unsigned int t2104x_mode = 0;
unsigned int eeprom_missing = 0;
unsigned int force_csr0 = 0;
@@ -1527,31 +1460,12 @@ static int __devinit tulip_init_one (struct pci_dev *pdev,
/* Clear the missed-packet counter. */
inl(ioaddr + CSR8);
- if (chip_idx == DC21041) {
- if (inl(ioaddr + CSR9) & 0x8000) {
- chip_idx = DC21040;
- t2104x_mode = 1;
- } else {
- t2104x_mode = 2;
- }
- }
-
/* The station address ROM is read byte serially. The register must
be polled, waiting for the value to be read bit serially from the
EEPROM.
*/
sum = 0;
- if (chip_idx == DC21040) {
- outl(0, ioaddr + CSR9); /* Reset the pointer with a dummy write. */
- for (i = 0; i < 6; i++) {
- int value, boguscnt = 100000;
- do
- value = inl(ioaddr + CSR9);
- while (value < 0 && --boguscnt > 0);
- dev->dev_addr[i] = value;
- sum += value & 0xff;
- }
- } else if (chip_idx == LC82C168) {
+ if (chip_idx == LC82C168) {
for (i = 0; i < 3; i++) {
int value, boguscnt = 100000;
outl(0x600 | i, ioaddr + 0x98);
@@ -1719,10 +1633,6 @@ static int __devinit tulip_init_one (struct pci_dev *pdev,
dev->name, tulip_tbl[chip_idx].chip_name, chip_rev, ioaddr);
pci_set_drvdata(pdev, dev);
- if (t2104x_mode == 1)
- printk(" 21040 compatible mode,");
- else if (t2104x_mode == 2)
- printk(" 21041 mode,");
if (eeprom_missing)
printk(" EEPROM not present,");
for (i = 0; i < 6; i++)
@@ -1731,26 +1641,13 @@ static int __devinit tulip_init_one (struct pci_dev *pdev,
if (tp->chip_id == PNIC2)
tp->link_change = pnic2_lnk_change;
- else if ((tp->flags & HAS_NWAY) || tp->chip_id == DC21041)
+ else if (tp->flags & HAS_NWAY)
tp->link_change = t21142_lnk_change;
else if (tp->flags & HAS_PNICNWAY)
tp->link_change = pnic_lnk_change;
/* Reset the xcvr interface and turn on heartbeat. */
switch (chip_idx) {
- case DC21041:
- if (tp->sym_advertise == 0)
- tp->sym_advertise = 0x0061;
- outl(0x00000000, ioaddr + CSR13);
- outl(0xFFFFFFFF, ioaddr + CSR14);
- outl(0x00000008, ioaddr + CSR15); /* Listen on AUI also. */
- outl(inl(ioaddr + CSR6) | csr6_fd, ioaddr + CSR6);
- outl(0x0000EF01, ioaddr + CSR13);
- break;
- case DC21040:
- outl(0x00000000, ioaddr + CSR13);
- outl(0x00000004, ioaddr + CSR13);
- break;
case DC21140:
case DM910X:
default:
diff --git a/drivers/scsi/eata.c b/drivers/scsi/eata.c
index 1ce0fa803..fa97dfb7b 100644
--- a/drivers/scsi/eata.c
+++ b/drivers/scsi/eata.c
@@ -1,6 +1,9 @@
/*
* eata.c - Low-level driver for EATA/DMA SCSI host adapters.
*
+ * 11 Dec 2001 Rev. 7.00 for linux 2.5.1
+ * + Use host->host_lock instead of io_request_lock.
+ *
* 1 May 2001 Rev. 6.05 for linux 2.4.4
* + Clean up all pci related routines.
* + Fix data transfer direction for opcode SEND_CUE_SHEET (0x5d)
@@ -438,13 +441,6 @@ MODULE_AUTHOR("Dario Ballabio");
#include <linux/ctype.h>
#include <linux/spinlock.h>
-#define SPIN_FLAGS unsigned long spin_flags;
-#define SPIN_LOCK spin_lock_irq(&io_request_lock);
-#define SPIN_LOCK_SAVE spin_lock_irqsave(&io_request_lock, spin_flags);
-#define SPIN_UNLOCK spin_unlock_irq(&io_request_lock);
-#define SPIN_UNLOCK_RESTORE \
- spin_unlock_irqrestore(&io_request_lock, spin_flags);
-
/* Subversion values */
#define ISA 0
#define ESA 1
@@ -1589,10 +1585,12 @@ static inline int do_reset(Scsi_Cmnd *SCarg) {
#endif
HD(j)->in_reset = TRUE;
- SPIN_UNLOCK
+
+ spin_unlock_irq(&sh[j]->host_lock);
time = jiffies;
while ((jiffies - time) < (10 * HZ) && limit++ < 200000) udelay(100L);
- SPIN_LOCK
+ spin_lock_irq(&sh[j]->host_lock);
+
printk("%s: reset, interrupts disabled, loops %d.\n", BN(j), limit);
for (i = 0; i < sh[j]->can_queue; i++) {
@@ -2036,14 +2034,14 @@ static inline void ihdlr(int irq, unsigned int j) {
static void do_interrupt_handler(int irq, void *shap, struct pt_regs *regs) {
unsigned int j;
- SPIN_FLAGS
+ unsigned long spin_flags;
/* Check if the interrupt must be processed by this handler */
if ((j = (unsigned int)((char *)shap - sha)) >= num_boards) return;
- SPIN_LOCK_SAVE
+ spin_lock_irqsave(&sh[j]->host_lock, spin_flags);
ihdlr(irq, j);
- SPIN_UNLOCK_RESTORE
+ spin_unlock_irqrestore(&sh[j]->host_lock, spin_flags);
}
int eata2x_release(struct Scsi_Host *shpnt) {
@@ -2077,4 +2075,4 @@ static Scsi_Host_Template driver_template = EATA;
#ifndef MODULE
__setup("eata=", option_setup);
#endif /* end MODULE */
-MODULE_LICENSE("Dual BSD/GPL");
+MODULE_LICENSE("GPL");
diff --git a/drivers/scsi/eata.h b/drivers/scsi/eata.h
index afa5e2787..de0bad6ef 100644
--- a/drivers/scsi/eata.h
+++ b/drivers/scsi/eata.h
@@ -13,7 +13,7 @@ int eata2x_abort(Scsi_Cmnd *);
int eata2x_reset(Scsi_Cmnd *);
int eata2x_biosparam(Disk *, kdev_t, int *);
-#define EATA_VERSION "6.05.00"
+#define EATA_VERSION "7.00.00"
#define EATA { \
name: "EATA/DMA 2.0x rev. " EATA_VERSION " ", \
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 01fcfd78b..ad08c5bf2 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -182,7 +182,7 @@ void scsi_initialize_queue(Scsi_Device * SDpnt, struct Scsi_Host * SHpnt)
{
request_queue_t *q = &SDpnt->request_queue;
- blk_init_queue(q, scsi_request_fn);
+ blk_init_queue(q, scsi_request_fn, &SHpnt->host_lock);
q->queuedata = (void *) SDpnt;
/* Hardware imposed limit. */
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index af0bb409c..b6894649e 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1254,9 +1254,7 @@ STATIC void scsi_restart_operations(struct Scsi_Host *host)
break;
}
- spin_lock(&q->queue_lock);
q->request_fn(q);
- spin_unlock(&q->queue_lock);
}
spin_unlock_irqrestore(&host->host_lock, flags);
}
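
The scsi_error.c change follows from the queue-lock rework: scsi_initialize_queue() now hands &SHpnt->host_lock to blk_init_queue(), so q->queue_lock aliases the host lock that scsi_restart_operations() already holds here, and re-taking it around request_fn would be a recursive acquisition of the same spinlock. A sketch of the resulting calling convention:

	/*
	 * q->queue_lock now points at the host lock, so request_fn is
	 * entered with that lock already held:
	 */
	spin_lock_irqsave(&host->host_lock, flags);
	q->request_fn(q);	/* runs under host_lock; no nested queue_lock */
	spin_unlock_irqrestore(&host->host_lock, flags);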
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 7ce480796..9ddf48635 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -70,7 +70,7 @@ static void __scsi_insert_special(request_queue_t *q, struct request *rq,
{
unsigned long flags;
- ASSERT_LOCK(&q->queue_lock, 0);
+ ASSERT_LOCK(q->queue_lock, 0);
/*
* tell I/O scheduler that this isn't a regular read/write (ie it
@@ -91,10 +91,10 @@ static void __scsi_insert_special(request_queue_t *q, struct request *rq,
* head of the queue for things like a QUEUE_FULL message from a
* device, or a host that is unable to accept a particular command.
*/
- spin_lock_irqsave(&q->queue_lock, flags);
+ spin_lock_irqsave(q->queue_lock, flags);
__elv_add_request(q, rq, !at_head, 0);
q->request_fn(q);
- spin_unlock_irqrestore(&q->queue_lock, flags);
+ spin_unlock_irqrestore(q->queue_lock, flags);
}
@@ -250,9 +250,9 @@ void scsi_queue_next_request(request_queue_t * q, Scsi_Cmnd * SCpnt)
Scsi_Device *SDpnt;
struct Scsi_Host *SHpnt;
- ASSERT_LOCK(&q->queue_lock, 0);
+ ASSERT_LOCK(q->queue_lock, 0);
- spin_lock_irqsave(&q->queue_lock, flags);
+ spin_lock_irqsave(q->queue_lock, flags);
if (SCpnt != NULL) {
/*
@@ -325,7 +325,7 @@ void scsi_queue_next_request(request_queue_t * q, Scsi_Cmnd * SCpnt)
SHpnt->some_device_starved = 0;
}
}
- spin_unlock_irqrestore(&q->queue_lock, flags);
+ spin_unlock_irqrestore(q->queue_lock, flags);
}
/*
@@ -360,7 +360,7 @@ static Scsi_Cmnd *__scsi_end_request(Scsi_Cmnd * SCpnt,
request_queue_t *q = &SCpnt->device->request_queue;
struct request *req = &SCpnt->request;
- ASSERT_LOCK(&q->queue_lock, 0);
+ ASSERT_LOCK(q->queue_lock, 0);
/*
* If there are blocks left over at the end, set up the command
@@ -445,7 +445,7 @@ static void scsi_release_buffers(Scsi_Cmnd * SCpnt)
{
struct request *req = &SCpnt->request;
- ASSERT_LOCK(&SCpnt->device->request_queue.queue_lock, 0);
+ ASSERT_LOCK(&SCpnt->host->host_lock, 0);
/*
* Free up any indirection buffers we allocated for DMA purposes.
@@ -518,7 +518,7 @@ void scsi_io_completion(Scsi_Cmnd * SCpnt, int good_sectors,
* would be used if we just wanted to retry, for example.
*
*/
- ASSERT_LOCK(&q->queue_lock, 0);
+ ASSERT_LOCK(q->queue_lock, 0);
/*
* Free up any indirection buffers we allocated for DMA purposes.
@@ -746,8 +746,6 @@ struct Scsi_Device_Template *scsi_get_request_dev(struct request *req)
kdev_t dev = req->rq_dev;
int major = MAJOR(dev);
- ASSERT_LOCK(&req->q->queue_lock, 1);
-
for (spnt = scsi_devicelist; spnt; spnt = spnt->next) {
/*
* Search for a block device driver that supports this
@@ -804,7 +802,7 @@ void scsi_request_fn(request_queue_t * q)
struct Scsi_Host *SHpnt;
struct Scsi_Device_Template *STpnt;
- ASSERT_LOCK(&q->queue_lock, 1);
+ ASSERT_LOCK(q->queue_lock, 1);
SDpnt = (Scsi_Device *) q->queuedata;
if (!SDpnt) {
@@ -871,9 +869,9 @@ void scsi_request_fn(request_queue_t * q)
*/
SDpnt->was_reset = 0;
if (SDpnt->removable && !in_interrupt()) {
- spin_unlock_irq(&q->queue_lock);
+ spin_unlock_irq(q->queue_lock);
scsi_ioctl(SDpnt, SCSI_IOCTL_DOORLOCK, 0);
- spin_lock_irq(&q->queue_lock);
+ spin_lock_irq(q->queue_lock);
continue;
}
}
@@ -973,7 +971,7 @@ void scsi_request_fn(request_queue_t * q)
* another.
*/
req = NULL;
- spin_unlock_irq(&q->queue_lock);
+ spin_unlock_irq(q->queue_lock);
if (SCpnt->request.flags & REQ_CMD) {
/*
@@ -1003,7 +1001,7 @@ void scsi_request_fn(request_queue_t * q)
{
panic("Should not have leftover blocks\n");
}
- spin_lock_irq(&q->queue_lock);
+ spin_lock_irq(q->queue_lock);
SHpnt->host_busy--;
SDpnt->device_busy--;
continue;
@@ -1019,7 +1017,7 @@ void scsi_request_fn(request_queue_t * q)
{
panic("Should not have leftover blocks\n");
}
- spin_lock_irq(&q->queue_lock);
+ spin_lock_irq(q->queue_lock);
SHpnt->host_busy--;
SDpnt->device_busy--;
continue;
@@ -1040,7 +1038,7 @@ void scsi_request_fn(request_queue_t * q)
* Now we need to grab the lock again. We are about to mess
* with the request queue and try to find another command.
*/
- spin_lock_irq(&q->queue_lock);
+ spin_lock_irq(q->queue_lock);
}
}
diff --git a/drivers/scsi/scsi_queue.c b/drivers/scsi/scsi_queue.c
index b864fc045..1d9a90bbd 100644
--- a/drivers/scsi/scsi_queue.c
+++ b/drivers/scsi/scsi_queue.c
@@ -80,7 +80,6 @@ int scsi_mlqueue_insert(Scsi_Cmnd * cmd, int reason)
{
struct Scsi_Host *host;
unsigned long flags;
- request_queue_t *q = &cmd->device->request_queue;
SCSI_LOG_MLQUEUE(1, printk("Inserting command %p into mlqueue\n", cmd));
@@ -138,10 +137,10 @@ int scsi_mlqueue_insert(Scsi_Cmnd * cmd, int reason)
* Decrement the counters, since these commands are no longer
* active on the host/device.
*/
- spin_lock_irqsave(&q->queue_lock, flags);
+ spin_lock_irqsave(&cmd->host->host_lock, flags);
cmd->host->host_busy--;
cmd->device->device_busy--;
- spin_unlock_irqrestore(&q->queue_lock, flags);
+ spin_unlock_irqrestore(&cmd->host->host_lock, flags);
/*
	 * Insert this command at the head of the queue for its device.
diff --git a/drivers/scsi/u14-34f.c b/drivers/scsi/u14-34f.c
index 41cff9e57..adacf2fd4 100644
--- a/drivers/scsi/u14-34f.c
+++ b/drivers/scsi/u14-34f.c
@@ -1,6 +1,9 @@
/*
* u14-34f.c - Low-level driver for UltraStor 14F/34F SCSI host adapters.
*
+ * 11 Dec 2001 Rev. 7.00 for linux 2.5.1
+ * + Use host->host_lock instead of io_request_lock.
+ *
* 1 May 2001 Rev. 6.05 for linux 2.4.4
* + Fix data transfer direction for opcode SEND_CUE_SHEET (0x5d)
*
@@ -334,7 +337,6 @@
* the driver sets host->wish_block = TRUE for all ISA boards.
*/
-#include <linux/module.h>
#include <linux/version.h>
#ifndef LinuxVersionCode
@@ -343,6 +345,9 @@
#define MAX_INT_PARAM 10
+#if defined(MODULE)
+#include <linux/module.h>
+
MODULE_PARM(boot_options, "s");
MODULE_PARM(io_port, "1-" __MODULE_STRING(MAX_INT_PARAM) "i");
MODULE_PARM(linked_comm, "i");
@@ -352,6 +357,8 @@ MODULE_PARM(max_queue_depth, "i");
MODULE_PARM(ext_tran, "i");
MODULE_AUTHOR("Dario Ballabio");
+#endif
+
#include <linux/string.h>
#include <linux/sched.h>
#include <linux/kernel.h>
@@ -374,13 +381,6 @@ MODULE_AUTHOR("Dario Ballabio");
#include <linux/ctype.h>
#include <linux/spinlock.h>
-#define SPIN_FLAGS unsigned long spin_flags;
-#define SPIN_LOCK spin_lock_irq(&io_request_lock);
-#define SPIN_LOCK_SAVE spin_lock_irqsave(&io_request_lock, spin_flags);
-#define SPIN_UNLOCK spin_unlock_irq(&io_request_lock);
-#define SPIN_UNLOCK_RESTORE \
- spin_unlock_irqrestore(&io_request_lock, spin_flags);
-
/* Values for the PRODUCT_ID ports for the 14/34F */
#define PRODUCT_ID1 0x56
#define PRODUCT_ID2 0x40 /* NOTE: Only upper nibble is used */
@@ -672,10 +672,8 @@ static int board_inquiry(unsigned int j) {
/* Issue OGM interrupt */
outb(CMD_OGM_INTR, sh[j]->io_port + REG_LCL_INTR);
- SPIN_UNLOCK
time = jiffies;
while ((jiffies - time) < HZ && limit++ < 20000) udelay(100L);
- SPIN_LOCK
if (cpp->adapter_status || HD(j)->cp_stat[0] != FREE) {
HD(j)->cp_stat[0] = FREE;
@@ -1274,10 +1272,12 @@ static inline int do_reset(Scsi_Cmnd *SCarg) {
#endif
HD(j)->in_reset = TRUE;
- SPIN_UNLOCK
+
+ spin_unlock_irq(&sh[j]->host_lock);
time = jiffies;
while ((jiffies - time) < (10 * HZ) && limit++ < 200000) udelay(100L);
- SPIN_LOCK
+ spin_lock_irq(&sh[j]->host_lock);
+
printk("%s: reset, interrupts disabled, loops %d.\n", BN(j), limit);
for (i = 0; i < sh[j]->can_queue; i++) {
@@ -1718,14 +1718,14 @@ static inline void ihdlr(int irq, unsigned int j) {
static void do_interrupt_handler(int irq, void *shap, struct pt_regs *regs) {
unsigned int j;
- SPIN_FLAGS
+ unsigned long spin_flags;
/* Check if the interrupt must be processed by this handler */
if ((j = (unsigned int)((char *)shap - sha)) >= num_boards) return;
- SPIN_LOCK_SAVE
+ spin_lock_irqsave(&sh[j]->host_lock, spin_flags);
ihdlr(irq, j);
- SPIN_UNLOCK_RESTORE
+ spin_unlock_irqrestore(&sh[j]->host_lock, spin_flags);
}
int u14_34f_release(struct Scsi_Host *shpnt) {
@@ -1752,7 +1752,6 @@ int u14_34f_release(struct Scsi_Host *shpnt) {
return FALSE;
}
-MODULE_LICENSE("BSD without advertisement clause");
static Scsi_Host_Template driver_template = ULTRASTOR_14_34F;
#include "scsi_module.c"
@@ -1760,3 +1759,4 @@ static Scsi_Host_Template driver_template = ULTRASTOR_14_34F;
#ifndef MODULE
__setup("u14-34f=", option_setup);
#endif /* end MODULE */
+MODULE_LICENSE("GPL");
diff --git a/drivers/scsi/u14-34f.h b/drivers/scsi/u14-34f.h
index 1d2988d73..d8d1d400f 100644
--- a/drivers/scsi/u14-34f.h
+++ b/drivers/scsi/u14-34f.h
@@ -13,7 +13,7 @@ int u14_34f_abort(Scsi_Cmnd *);
int u14_34f_reset(Scsi_Cmnd *);
int u14_34f_biosparam(Disk *, kdev_t, int *);
-#define U14_34F_VERSION "6.05.00"
+#define U14_34F_VERSION "7.00.00"
#define ULTRASTOR_14_34F { \
name: "UltraStor 14F/34F rev. " U14_34F_VERSION " ", \
diff --git a/fs/bio.c b/fs/bio.c
index 085247a16..555b7ac14 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -48,7 +48,7 @@ static const int bvec_pool_sizes[BIOVEC_NR_POOLS] = { 1, 4, 16, 64, 128, 256 };
#define BIO_MAX_PAGES (bvec_pool_sizes[BIOVEC_NR_POOLS - 1])
-static void * slab_pool_alloc(int gfp_mask, void *data)
+static void *slab_pool_alloc(int gfp_mask, void *data)
{
return kmem_cache_alloc(data, gfp_mask);
}
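
slab_pool_alloc() is the glue for layering a mempool over an existing slab cache: pool_data carries the kmem_cache_t, so the pool degrades to ordinary slab operations when no reserve is needed. A sketch of the matching free side and the wiring -- the actual pool creation in fs/bio.c is outside this hunk, and bio_slab/BIO_POOL_SIZE are illustrative names:

	static void slab_pool_free(void *element, void *data)
	{
		kmem_cache_free(data, element);
	}

	/* hypothetical wiring; pool_data is the slab cache itself */
	bio_pool = mempool_create(BIO_POOL_SIZE, slab_pool_alloc,
				  slab_pool_free, bio_slab);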
diff --git a/fs/block_dev.c b/fs/block_dev.c
index de4cb8afa..301a62ef5 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -324,6 +324,7 @@ struct block_device *bdget(dev_t dev)
new_bdev->bd_dev = dev;
new_bdev->bd_op = NULL;
new_bdev->bd_inode = inode;
+ inode->i_mode = S_IFBLK;
inode->i_rdev = kdev;
inode->i_dev = kdev;
inode->i_bdev = new_bdev;
diff --git a/fs/buffer.c b/fs/buffer.c
index 405e81410..e724f5ade 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2005,12 +2005,12 @@ int generic_direct_IO(int rw, struct inode * inode, struct kiobuf * iobuf, unsig
{
int i, nr_blocks, retval;
sector_t *blocks = iobuf->blocks;
- struct buffer_head bh;
- bh.b_dev = inode->i_dev;
nr_blocks = iobuf->length / blocksize;
/* build the blocklist */
for (i = 0; i < nr_blocks; i++, blocknr++) {
+ struct buffer_head bh;
+
bh.b_state = 0;
bh.b_dev = inode->i_dev;
bh.b_size = blocksize;
@@ -2037,7 +2037,7 @@ int generic_direct_IO(int rw, struct inode * inode, struct kiobuf * iobuf, unsig
}
/* This does not understand multi-device filesystems currently */
- retval = brw_kiovec(rw, 1, &iobuf, bh.b_dev, blocks, blocksize);
+ retval = brw_kiovec(rw, 1, &iobuf, inode->i_dev, blocks, blocksize);
out:
return retval;
diff --git a/include/asm-i386/io.h b/include/asm-i386/io.h
index 975f0bf61..a140326a5 100644
--- a/include/asm-i386/io.h
+++ b/include/asm-i386/io.h
@@ -51,12 +51,9 @@
*/
#if CONFIG_DEBUG_IOVIRT
extern void *__io_virt_debug(unsigned long x, const char *file, int line);
- extern unsigned long __io_phys_debug(unsigned long x, const char *file, int line);
#define __io_virt(x) __io_virt_debug((unsigned long)(x), __FILE__, __LINE__)
-//#define __io_phys(x) __io_phys_debug((unsigned long)(x), __FILE__, __LINE__)
#else
#define __io_virt(x) ((void *)(x))
-//#define __io_phys(x) __pa(x)
#endif
/*
diff --git a/include/asm-s390/io.h b/include/asm-s390/io.h
index a9c1a917a..e044135ef 100644
--- a/include/asm-s390/io.h
+++ b/include/asm-s390/io.h
@@ -19,7 +19,7 @@
#define IO_SPACE_LIMIT 0xffffffff
#define __io_virt(x) ((void *)(PAGE_OFFSET | (unsigned long)(x)))
-#define __io_phys(x) ((unsigned long)(x) & ~PAGE_OFFSET)
+
/*
* Change virtual addresses to physical addresses and vv.
* These are pretty trivial
diff --git a/include/asm-s390x/io.h b/include/asm-s390x/io.h
index 2d0d2e79a..088e26498 100644
--- a/include/asm-s390x/io.h
+++ b/include/asm-s390x/io.h
@@ -19,7 +19,7 @@
#define IO_SPACE_LIMIT 0xffffffff
#define __io_virt(x) ((void *)(PAGE_OFFSET | (unsigned long)(x)))
-#define __io_phys(x) ((unsigned long)(x) & ~PAGE_OFFSET)
+
/*
* Change virtual addresses to physical addresses and vv.
* These are pretty trivial
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 8f82353c7..a32ec05bb 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -171,7 +171,7 @@ struct request_queue
/*
* protects queue structures from reentrancy
*/
- spinlock_t queue_lock;
+ spinlock_t *queue_lock;
/*
* queue settings
@@ -271,13 +271,14 @@ extern void blk_plug_device(request_queue_t *);
extern void blk_recount_segments(request_queue_t *, struct bio *);
extern inline int blk_phys_contig_segment(request_queue_t *q, struct bio *, struct bio *);
extern inline int blk_hw_contig_segment(request_queue_t *q, struct bio *, struct bio *);
+extern void blk_queue_assign_lock(request_queue_t *q, spinlock_t *);
extern int block_ioctl(kdev_t, unsigned int, unsigned long);
/*
* Access functions for manipulating queue properties
*/
-extern int blk_init_queue(request_queue_t *, request_fn_proc *);
+extern int blk_init_queue(request_queue_t *, request_fn_proc *, spinlock_t *);
extern void blk_cleanup_queue(request_queue_t *);
extern void blk_queue_make_request(request_queue_t *, make_request_fn *);
extern void blk_queue_bounce_limit(request_queue_t *, u64);
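In the new interface the queue only points at a lock: blk_init_queue() takes the driver's spinlock by pointer, and blk_queue_assign_lock() can point several queues at one shared lock. A hedged sketch of driver-side setup under this API (the mydev names are hypothetical):

#include <linux/blkdev.h>
#include <linux/spinlock.h>
#include <linux/init.h>

static spinlock_t mydev_lock = SPIN_LOCK_UNLOCKED;
static request_queue_t mydev_queue;

/* entered by the block layer with *q->queue_lock, i.e. &mydev_lock, held */
static void mydev_request(request_queue_t *q)
{
	/* dequeue and start requests here */
}

static int __init mydev_init(void)
{
	/* the driver now owns the queue lock and passes it by pointer */
	return blk_init_queue(&mydev_queue, mydev_request, &mydev_lock);
}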
diff --git a/include/linux/devfs_fs_kernel.h b/include/linux/devfs_fs_kernel.h
index df5aee5e7..c52d7eabb 100644
--- a/include/linux/devfs_fs_kernel.h
+++ b/include/linux/devfs_fs_kernel.h
@@ -46,14 +46,6 @@
typedef struct devfs_entry * devfs_handle_t;
-
-#ifdef CONFIG_BLK_DEV_INITRD
-# define ROOT_DEVICE_NAME ((real_root_dev ==ROOT_DEV) ? root_device_name:NULL)
-#else
-# define ROOT_DEVICE_NAME root_device_name
-#endif
-
-
#ifdef CONFIG_DEVFS_FS
struct unique_numspace
diff --git a/include/linux/ide.h b/include/linux/ide.h
index 38a17222c..5bcdab80f 100644
--- a/include/linux/ide.h
+++ b/include/linux/ide.h
@@ -1001,7 +1001,6 @@ unsigned long ide_get_or_set_dma_base (ide_hwif_t *hwif, int extra, const char *
void hwif_unregister (ide_hwif_t *hwif);
-#define DRIVE_LOCK(drive) (&(drive)->queue.queue_lock)
extern spinlock_t ide_lock;
#endif /* _IDE_H */
diff --git a/include/linux/mempool.h b/include/linux/mempool.h
index 07e97d109..bd3745152 100644
--- a/include/linux/mempool.h
+++ b/include/linux/mempool.h
@@ -25,6 +25,7 @@ struct mempool_s {
};
extern mempool_t * mempool_create(int min_nr, mempool_alloc_t *alloc_fn,
mempool_free_t *free_fn, void *pool_data);
+extern void mempool_resize(mempool_t *pool, int new_min_nr, int gfp_mask);
extern void mempool_destroy(mempool_t *pool);
extern void * mempool_alloc(mempool_t *pool, int gfp_mask);
extern void mempool_free(void *element, mempool_t *pool);
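mempool_resize() joins the existing create/alloc/free/destroy entry points; it adjusts the number of preallocated elements the pool guarantees. A usage sketch in the style of slab_pool_alloc() from fs/bio.c above (all my_* names are illustrative, and the slab cache is assumed to exist):

#include <linux/mempool.h>
#include <linux/slab.h>
#include <linux/init.h>

static kmem_cache_t *my_cache;	/* assumed: created with kmem_cache_create() */
static mempool_t *my_pool;

static void *my_pool_alloc(int gfp_mask, void *data)
{
	return kmem_cache_alloc((kmem_cache_t *) data, gfp_mask);
}

static void my_pool_free(void *element, void *data)
{
	kmem_cache_free((kmem_cache_t *) data, element);
}

static int __init my_pool_init(void)
{
	/* guarantee 16 preallocated elements up front ... */
	my_pool = mempool_create(16, my_pool_alloc, my_pool_free, my_cache);
	if (!my_pool)
		return -ENOMEM;
	/* ... and later raise the deadlock-free reserve to 64 */
	mempool_resize(my_pool, 64, GFP_KERNEL);
	return 0;
}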
diff --git a/include/linux/nbd.h b/include/linux/nbd.h
index 0dbf87851..6c8bc1e44 100644
--- a/include/linux/nbd.h
+++ b/include/linux/nbd.h
@@ -46,7 +46,7 @@ nbd_end_request(struct request *req)
#ifdef PARANOIA
requests_out++;
#endif
- spin_lock_irqsave(&q->queue_lock, flags);
+ spin_lock_irqsave(q->queue_lock, flags);
while((bio = req->bio) != NULL) {
nsect = bio_sectors(bio);
blk_finished_io(nsect);
@@ -55,7 +55,7 @@ nbd_end_request(struct request *req)
bio_endio(bio, uptodate, nsect);
}
blkdev_release_request(req);
- spin_unlock_irqrestore(&q->queue_lock, flags);
+ spin_unlock_irqrestore(q->queue_lock, flags);
}
#define MAX_NBD 128
diff --git a/include/linux/raid/md.h b/include/linux/raid/md.h
index a7e18913e..233163eb2 100644
--- a/include/linux/raid/md.h
+++ b/include/linux/raid/md.h
@@ -37,8 +37,12 @@
#include <linux/kernel_stat.h>
#include <asm/io.h>
#include <linux/completion.h>
+#include <linux/mempool.h>
+#include <linux/list.h>
+#include <linux/reboot.h>
+#include <linux/vmalloc.h>
+#include <linux/blkpg.h>
-#include <linux/raid/md_compatible.h>
/*
* 'md_p.h' holds the 'physical' layout of RAID devices
* 'md_u.h' holds the user <=> kernel API
diff --git a/include/linux/raid/md_compatible.h b/include/linux/raid/md_compatible.h
deleted file mode 100644
index 74dadd4bb..000000000
--- a/include/linux/raid/md_compatible.h
+++ /dev/null
@@ -1,158 +0,0 @@
-
-/*
- md.h : Multiple Devices driver compatibility layer for Linux 2.0/2.2
- Copyright (C) 1998 Ingo Molnar
-
- This program is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation; either version 2, or (at your option)
- any later version.
-
- You should have received a copy of the GNU General Public License
- (for example /usr/src/linux/COPYING); if not, write to the Free
- Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
-*/
-
-#include <linux/version.h>
-
-#ifndef _MD_COMPATIBLE_H
-#define _MD_COMPATIBLE_H
-
-/** 2.3/2.4 stuff: **/
-
-#include <linux/reboot.h>
-#include <linux/vmalloc.h>
-#include <linux/blkpg.h>
-
-/* 000 */
-#define md__get_free_pages(x,y) __get_free_pages(x,y)
-
-#if defined(__i386__) || defined(__x86_64__)
-/* 001 */
-static __inline__ int md_cpu_has_mmx(void)
-{
- return test_bit(X86_FEATURE_MMX, &boot_cpu_data.x86_capability);
-}
-#else
-#define md_cpu_has_mmx(x) (0)
-#endif
-
-/* 002 */
-#define md_clear_page(page) clear_page(page)
-
-/* 003 */
-#define MD_EXPORT_SYMBOL(x) EXPORT_SYMBOL(x)
-
-/* 004 */
-#define md_copy_to_user(x,y,z) copy_to_user(x,y,z)
-
-/* 005 */
-#define md_copy_from_user(x,y,z) copy_from_user(x,y,z)
-
-/* 006 */
-#define md_put_user put_user
-
-/* 007 */
-static inline int md_capable_admin(void)
-{
- return capable(CAP_SYS_ADMIN);
-}
-
-/* 008 */
-#define MD_FILE_TO_INODE(file) ((file)->f_dentry->d_inode)
-
-/* 009 */
-static inline void md_flush_signals (void)
-{
- spin_lock(&current->sigmask_lock);
- flush_signals(current);
- spin_unlock(&current->sigmask_lock);
-}
-
-/* 010 */
-static inline void md_init_signals (void)
-{
- current->exit_signal = SIGCHLD;
- siginitsetinv(&current->blocked, sigmask(SIGKILL));
-}
-
-/* 011 */
-#define md_signal_pending signal_pending
-
-/* 012 - md_set_global_readahead - nowhere used */
-
-/* 013 */
-#define md_mdelay(x) mdelay(x)
-
-/* 014 */
-#define MD_SYS_DOWN SYS_DOWN
-#define MD_SYS_HALT SYS_HALT
-#define MD_SYS_POWER_OFF SYS_POWER_OFF
-
-/* 015 */
-#define md_register_reboot_notifier register_reboot_notifier
-
-/* 016 */
-#define md_test_and_set_bit test_and_set_bit
-
-/* 017 */
-#define md_test_and_clear_bit test_and_clear_bit
-
-/* 018 */
-#define md_atomic_read atomic_read
-#define md_atomic_set atomic_set
-
-/* 019 */
-#define md_lock_kernel lock_kernel
-#define md_unlock_kernel unlock_kernel
-
-/* 020 */
-
-#include <linux/init.h>
-
-#define md__init __init
-#define md__initdata __initdata
-#define md__initfunc(__arginit) __initfunc(__arginit)
-
-/* 021 */
-
-
-/* 022 */
-
-#define md_list_head list_head
-#define MD_LIST_HEAD(name) LIST_HEAD(name)
-#define MD_INIT_LIST_HEAD(ptr) INIT_LIST_HEAD(ptr)
-#define md_list_add list_add
-#define md_list_del list_del
-#define md_list_empty list_empty
-
-#define md_list_entry(ptr, type, member) list_entry(ptr, type, member)
-
-/* 023 */
-
-#define md_schedule_timeout schedule_timeout
-
-/* 024 */
-#define md_need_resched(tsk) ((tsk)->need_resched)
-
-/* 025 */
-#define md_spinlock_t spinlock_t
-#define MD_SPIN_LOCK_UNLOCKED SPIN_LOCK_UNLOCKED
-
-#define md_spin_lock spin_lock
-#define md_spin_unlock spin_unlock
-#define md_spin_lock_irq spin_lock_irq
-#define md_spin_unlock_irq spin_unlock_irq
-#define md_spin_unlock_irqrestore spin_unlock_irqrestore
-#define md_spin_lock_irqsave spin_lock_irqsave
-
-/* 026 */
-typedef wait_queue_head_t md_wait_queue_head_t;
-#define MD_DECLARE_WAITQUEUE(w,t) DECLARE_WAITQUEUE((w),(t))
-#define MD_DECLARE_WAIT_QUEUE_HEAD(x) DECLARE_WAIT_QUEUE_HEAD(x)
-#define md_init_waitqueue_head init_waitqueue_head
-
-/* END */
-
-#endif
-
diff --git a/include/linux/raid/md_k.h b/include/linux/raid/md_k.h
index 360402d76..8755d806e 100644
--- a/include/linux/raid/md_k.h
+++ b/include/linux/raid/md_k.h
@@ -158,9 +158,9 @@ static inline void mark_disk_nonsync(mdp_disk_t * d)
*/
struct mdk_rdev_s
{
- struct md_list_head same_set; /* RAID devices within the same set */
- struct md_list_head all; /* all RAID devices */
- struct md_list_head pending; /* undetected RAID devices */
+ struct list_head same_set; /* RAID devices within the same set */
+ struct list_head all; /* all RAID devices */
+ struct list_head pending; /* undetected RAID devices */
kdev_t dev; /* Device number */
kdev_t old_dev; /* "" when it was last imported */
@@ -197,7 +197,7 @@ struct mddev_s
int __minor;
mdp_super_t *sb;
int nb_dev;
- struct md_list_head disks;
+ struct list_head disks;
int sb_dirty;
mdu_param_t param;
int ro;
@@ -212,9 +212,9 @@ struct mddev_s
atomic_t active;
atomic_t recovery_active; /* blocks scheduled, but not written */
- md_wait_queue_head_t recovery_wait;
+ wait_queue_head_t recovery_wait;
- struct md_list_head all_mddevs;
+ struct list_head all_mddevs;
};
struct mdk_personality_s
@@ -240,7 +240,7 @@ struct mdk_personality_s
int (*stop_resync)(mddev_t *mddev);
int (*restart_resync)(mddev_t *mddev);
- int (*sync_request)(mddev_t *mddev, unsigned long block_nr);
+ int (*sync_request)(mddev_t *mddev, sector_t sector_nr);
};
@@ -269,9 +269,9 @@ extern mdp_disk_t *get_spare(mddev_t *mddev);
*/
#define ITERATE_RDEV_GENERIC(head,field,rdev,tmp) \
\
- for (tmp = head.next; \
- rdev = md_list_entry(tmp, mdk_rdev_t, field), \
- tmp = tmp->next, tmp->prev != &head \
+ for ((tmp) = (head).next; \
+ (rdev) = (list_entry((tmp), mdk_rdev_t, field)), \
+ (tmp) = (tmp)->next, (tmp)->prev != &(head) \
; )
/*
* iterates through the 'same array disks' ringlist
@@ -305,7 +305,7 @@ extern mdp_disk_t *get_spare(mddev_t *mddev);
#define ITERATE_MDDEV(mddev,tmp) \
\
for (tmp = all_mddevs.next; \
- mddev = md_list_entry(tmp, mddev_t, all_mddevs), \
+ mddev = list_entry(tmp, mddev_t, all_mddevs), \
tmp = tmp->next, tmp->prev != &all_mddevs \
; )
@@ -325,7 +325,7 @@ static inline void unlock_mddev (mddev_t * mddev)
typedef struct mdk_thread_s {
void (*run) (void *data);
void *data;
- md_wait_queue_head_t wqueue;
+ wait_queue_head_t wqueue;
unsigned long flags;
struct completion *event;
struct task_struct *tsk;
@@ -337,7 +337,7 @@ typedef struct mdk_thread_s {
#define MAX_DISKNAME_LEN 64
typedef struct dev_name_s {
- struct md_list_head list;
+ struct list_head list;
kdev_t dev;
char namebuf [MAX_DISKNAME_LEN];
char *name;
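With the md_list_* compatibility aliases gone, the iterator macros run directly over list_head, and the added parentheses make ITERATE_RDEV_GENERIC safe for non-trivial argument expressions. A sketch of how the generic iterator is driven, using only the fields shown above (same_set, disks):

	mdk_rdev_t *rdev;
	struct list_head *tmp;

	/* visit each member device of 'mddev' via its same_set linkage */
	ITERATE_RDEV_GENERIC(mddev->disks, same_set, rdev, tmp) {
		/* per-device work; 'rdev' is valid throughout the body */
	}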
diff --git a/include/linux/raid/raid1.h b/include/linux/raid/raid1.h
index 40675b40c..c03eabf2e 100644
--- a/include/linux/raid/raid1.h
+++ b/include/linux/raid/raid1.h
@@ -3,6 +3,8 @@
#include <linux/raid/md.h>
+typedef struct mirror_info mirror_info_t;
+
struct mirror_info {
int number;
int raid_disk;
@@ -20,34 +22,21 @@ struct mirror_info {
int used_slot;
};
-struct raid1_private_data {
+typedef struct r1bio_s r1bio_t;
+
+struct r1_private_data_s {
mddev_t *mddev;
- struct mirror_info mirrors[MD_SB_DISKS];
+ mirror_info_t mirrors[MD_SB_DISKS];
int nr_disks;
int raid_disks;
int working_disks;
int last_used;
- unsigned long next_sect;
+ sector_t next_sect;
int sect_count;
mdk_thread_t *thread, *resync_thread;
int resync_mirrors;
- struct mirror_info *spare;
- md_spinlock_t device_lock;
-
- /* buffer pool */
- /* buffer_heads that we have pre-allocated have b_pprev -> &freebh
- * and are linked into a stack using b_next
- * raid1_bh that are pre-allocated have R1BH_PreAlloc set.
- * All these variable are protected by device_lock
- */
- struct buffer_head *freebh;
- int freebh_cnt; /* how many are on the list */
- int freebh_blocked;
- struct raid1_bh *freer1;
- int freer1_blocked;
- int freer1_cnt;
- struct raid1_bh *freebuf; /* each bh_req has a page allocated */
- md_wait_queue_head_t wait_buffer;
+ mirror_info_t *spare;
+ spinlock_t device_lock;
/* for use when syncing mirrors: */
unsigned long start_active, start_ready,
@@ -56,18 +45,21 @@ struct raid1_private_data {
cnt_pending, cnt_future;
int phase;
int window;
- md_wait_queue_head_t wait_done;
- md_wait_queue_head_t wait_ready;
- md_spinlock_t segment_lock;
+ wait_queue_head_t wait_done;
+ wait_queue_head_t wait_ready;
+ spinlock_t segment_lock;
+
+ mempool_t *r1bio_pool;
+ mempool_t *r1buf_pool;
};
-typedef struct raid1_private_data raid1_conf_t;
+typedef struct r1_private_data_s conf_t;
/*
* this is the only point in the RAID code where we violate
* C type safety. mddev->private is an 'opaque' pointer.
*/
-#define mddev_to_conf(mddev) ((raid1_conf_t *) mddev->private)
+#define mddev_to_conf(mddev) ((conf_t *) mddev->private)
/*
* this is our 'private' 'collective' RAID1 buffer head.
@@ -75,20 +67,32 @@ typedef struct raid1_private_data raid1_conf_t;
* for this RAID1 operation, and about their status:
*/
-struct raid1_bh {
+struct r1bio_s {
atomic_t remaining; /* 'have we finished' count,
* used from IRQ handlers
*/
int cmd;
+ sector_t sector;
unsigned long state;
mddev_t *mddev;
- struct buffer_head *master_bh;
- struct buffer_head *mirror_bh_list;
- struct buffer_head bh_req;
- struct raid1_bh *next_r1; /* next for retry or in free list */
+ /*
+ * original bio going to /dev/mdx
+ */
+ struct bio *master_bio;
+ /*
+ * if the IO is a READ, then this bio is used:
+ */
+ struct bio *read_bio;
+ /*
+ * if the IO is a WRITE, then one bio per mirror disk is used:
+ */
+ struct bio *write_bios[MD_SB_DISKS];
+
+ r1bio_t *next_r1; /* next for retry or in free list */
+ struct list_head retry_list;
};
-/* bits for raid1_bh.state */
-#define R1BH_Uptodate 1
-#define R1BH_SyncPhase 2
-#define R1BH_PreAlloc 3 /* this was pre-allocated, add to free list */
+
+/* bits for r1bio.state */
+#define R1BIO_Uptodate 1
+#define R1BIO_SyncPhase 2
#endif
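An r1bio_s now describes one logical RAID1 operation: master_bio is the original bio against /dev/mdX, read_bio or write_bios[] carry the per-mirror transfers, and 'remaining' counts outstanding mirror completions from IRQ context. A hedged sketch (not raid1.c itself) of the write-completion convention this layout implies, using the bio_endio() signature shown in the nbd.h hunk above:

/* Sketch only: the last mirror write to finish completes the master bio. */
static void sketch_end_write(r1bio_t *r1_bio, int uptodate)
{
	if (uptodate)
		set_bit(R1BIO_Uptodate, &r1_bio->state);

	if (atomic_dec_and_test(&r1_bio->remaining))
		bio_endio(r1_bio->master_bio, uptodate,
			  bio_sectors(r1_bio->master_bio));
}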
diff --git a/init/do_mounts.c b/init/do_mounts.c
index d34fdd7ae..e6a94292c 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -14,37 +14,44 @@
#include <linux/nfs_fs.h>
#include <linux/nfs_fs_sb.h>
#include <linux/nfs_mount.h>
+#include <linux/minix_fs.h>
+#include <linux/ext2_fs.h>
+#include <linux/romfs_fs.h>
#include <asm/uaccess.h>
-/* syscalls missing from unistd.h */
-
-static inline _syscall2(int,mkdir,char *,name,int,mode);
-static inline _syscall1(int,chdir,char *,name);
-static inline _syscall1(int,chroot,char *,name);
-static inline _syscall1(int,unlink,char *,name);
-static inline _syscall3(int,mknod,char *,name,int,mode,dev_t,dev);
-static inline _syscall5(int,mount,char *,dev,char *,dir,char *,type,
- unsigned long,flags,void *,data);
-static inline _syscall2(int,umount,char *,name,int,flags);
-
-extern void rd_load(void);
-extern void initrd_load(void);
+#define BUILD_CRAMDISK
+
extern int get_filesystem_list(char * buf);
extern void wait_for_keypress(void);
-asmlinkage long sys_mount(char * dev_name, char * dir_name, char * type,
- unsigned long flags, void * data);
+asmlinkage long sys_mount(char *dev_name, char *dir_name, char *type,
+ unsigned long flags, void *data);
+asmlinkage long sys_mkdir(char *name, int mode);
+asmlinkage long sys_chdir(char *name);
+asmlinkage long sys_chroot(char *name);
+asmlinkage long sys_unlink(char *name);
+asmlinkage long sys_symlink(char *old, char *new);
+asmlinkage long sys_mknod(char *name, int mode, dev_t dev);
+asmlinkage long sys_umount(char *name, int flags);
+asmlinkage long sys_ioctl(int fd, int cmd, unsigned long arg);
#ifdef CONFIG_BLK_DEV_INITRD
unsigned int real_root_dev; /* do_proc_dointvec cannot handle kdev_t */
#endif
-int root_mountflags = MS_RDONLY;
-char root_device_name[64];
+#ifdef CONFIG_BLK_DEV_RAM
+extern int rd_doload;
+#else
+static int rd_doload = 0;
+#endif
+int root_mountflags = MS_RDONLY | MS_VERBOSE;
+static char root_device_name[64];
/* this is initialized in init/main.c */
kdev_t ROOT_DEV;
+static int do_devfs = 0;
+
static int __init readonly(char *str)
{
if (*str)
@@ -275,91 +282,20 @@ static void __init get_fs_names(char *page)
}
*s = '\0';
}
-
-static void __init mount_root(void)
+static void __init mount_block_root(char *name, int flags)
{
- void *handle;
- char path[64];
- char *name = "/dev/root";
- char *fs_names, *p;
- int do_devfs = 0;
+ char *fs_names = __getname();
+ char *p;
- root_mountflags |= MS_VERBOSE;
-
- fs_names = __getname();
get_fs_names(fs_names);
-
-#ifdef CONFIG_ROOT_NFS
- if (MAJOR(ROOT_DEV) == UNNAMED_MAJOR) {
- void *data;
- data = nfs_root_data();
- if (data) {
- int err = mount("/dev/root", "/root", "nfs", root_mountflags, data);
- if (!err)
- goto done;
- }
- printk(KERN_ERR "VFS: Unable to mount root fs via NFS, trying floppy.\n");
- ROOT_DEV = MKDEV(FLOPPY_MAJOR, 0);
- }
-#endif
-
-#ifdef CONFIG_BLK_DEV_FD
- if (MAJOR(ROOT_DEV) == FLOPPY_MAJOR) {
-#ifdef CONFIG_BLK_DEV_RAM
- extern int rd_doload;
- extern void rd_load_secondary(void);
-#endif
- floppy_eject();
-#ifndef CONFIG_BLK_DEV_RAM
- printk(KERN_NOTICE "(Warning, this kernel has no ramdisk support)\n");
-#else
- /* rd_doload is 2 for a dual initrd/ramload setup */
- if(rd_doload==2)
- rd_load_secondary();
- else
-#endif
- {
- printk(KERN_NOTICE "VFS: Insert root floppy and press ENTER\n");
- wait_for_keypress();
- }
- }
-#endif
-
- devfs_make_root (root_device_name);
- handle = devfs_find_handle (NULL, ROOT_DEVICE_NAME,
- MAJOR (ROOT_DEV), MINOR (ROOT_DEV),
- DEVFS_SPECIAL_BLK, 1);
- if (handle) {
- int n;
- unsigned major, minor;
-
- devfs_get_maj_min (handle, &major, &minor);
- ROOT_DEV = MKDEV (major, minor);
- if (!ROOT_DEV)
- panic("I have no root and I want to scream");
- n = devfs_generate_path (handle, path + 5, sizeof (path) - 5);
- if (n >= 0) {
- name = path + n;
- devfs_mk_symlink (NULL, "root", DEVFS_FL_DEFAULT,
- name + 5, NULL, NULL);
- memcpy (name, "/dev/", 5);
- do_devfs = 1;
- }
- }
- chdir("/dev");
- unlink("root");
- mknod("root", S_IFBLK|0600, kdev_t_to_nr(ROOT_DEV));
- if (do_devfs)
- mount("devfs", ".", "devfs", 0, NULL);
retry:
for (p = fs_names; *p; p += strlen(p)+1) {
- int err;
- err = sys_mount(name,"/root",p,root_mountflags,root_mount_data);
+ int err = sys_mount(name, "/root", p, flags, root_mount_data);
switch (err) {
case 0:
- goto done;
+ goto out;
case -EACCES:
- root_mountflags |= MS_RDONLY;
+ flags |= MS_RDONLY;
goto retry;
case -EINVAL:
continue;
@@ -375,94 +311,324 @@ retry:
kdevname(ROOT_DEV));
}
panic("VFS: Unable to mount root fs on %s", kdevname(ROOT_DEV));
-
-done:
+out:
putname(fs_names);
- if (do_devfs)
- umount(".", 0);
+ sys_chdir("/root");
+ ROOT_DEV = current->fs->pwdmnt->mnt_sb->s_dev;
+ printk("VFS: Mounted root (%s filesystem)%s.\n",
+ current->fs->pwdmnt->mnt_sb->s_type->name,
+ (current->fs->pwdmnt->mnt_sb->s_flags & MS_RDONLY) ? " readonly" : "");
}
+
+#ifdef CONFIG_ROOT_NFS
+static int __init mount_nfs_root(void)
+{
+ void *data = nfs_root_data();
-#ifdef CONFIG_BLK_DEV_INITRD
+ if (data && sys_mount("/dev/root","/root","nfs",root_mountflags,data) == 0)
+ return 1;
+ return 0;
+}
+#endif
-static int __init change_root(kdev_t new_root_dev,const char *put_old)
+static int __init create_dev(char *name, kdev_t dev, char *devfs_name)
+{
+ void *handle;
+ char path[64];
+ int n;
+
+ sys_unlink(name);
+ if (!do_devfs)
+ return sys_mknod(name, S_IFBLK|0600, kdev_t_to_nr(dev));
+
+ handle = devfs_find_handle(NULL, dev ? NULL : devfs_name,
+ MAJOR(dev), MINOR(dev), DEVFS_SPECIAL_BLK, 1);
+ if (!handle)
+ return -1;
+ n = devfs_generate_path(handle, path + 5, sizeof (path) - 5);
+ if (n < 0)
+ return -1;
+ return sys_symlink(path + n + 5, name);
+}
+
+#ifdef CONFIG_MAC_FLOPPY
+int swim3_fd_eject(int devnum);
+#endif
+static void __init change_floppy(char *fmt, ...)
{
- struct vfsmount *old_rootmnt;
- struct nameidata devfs_nd;
- char *new_devname = kmalloc(strlen("/dev/root.old")+1, GFP_KERNEL);
- int error = 0;
-
- if (new_devname)
- strcpy(new_devname, "/dev/root.old");
-
- /* .. here is directory mounted over root */
- mount("..", ".", NULL, MS_MOVE, NULL);
- chdir("/old");
-
- read_lock(&current->fs->lock);
- old_rootmnt = mntget(current->fs->pwdmnt);
- read_unlock(&current->fs->lock);
-
- /* First unmount devfs if mounted */
- if (path_init("/old/dev", LOOKUP_FOLLOW|LOOKUP_POSITIVE, &devfs_nd))
- error = path_walk("/old/dev", &devfs_nd);
- if (!error) {
- if (devfs_nd.mnt->mnt_sb->s_magic == DEVFS_SUPER_MAGIC &&
- devfs_nd.dentry == devfs_nd.mnt->mnt_root)
- umount("/old/dev", 0);
- path_release(&devfs_nd);
+ extern void wait_for_keypress(void);
+ char buf[80];
+ va_list args;
+ va_start(args, fmt);
+ vsprintf(buf, fmt, args);
+ va_end(args);
+#ifdef CONFIG_BLK_DEV_FD
+ floppy_eject();
+#endif
+#ifdef CONFIG_MAC_FLOPPY
+ swim3_fd_eject(MINOR(ROOT_DEV));
+#endif
+ printk(KERN_NOTICE "VFS: Insert %s and press ENTER\n", buf);
+ wait_for_keypress();
+}
+
+#ifdef CONFIG_BLK_DEV_RAM
+
+static int __init crd_load(int in_fd, int out_fd);
+
+/*
+ * This routine tries to find a RAM disk image to load, and returns the
+ * number of blocks to read for a non-compressed image, 0 if the image
+ * is a compressed image, and -1 if an image with the right magic
+ * numbers could not be found.
+ *
+ * We currently check for the following magic numbers:
+ * minix
+ * ext2
+ * romfs
+ * gzip
+ */
+static int __init
+identify_ramdisk_image(int fd, int start_block)
+{
+ const int size = 512;
+ struct minix_super_block *minixsb;
+ struct ext2_super_block *ext2sb;
+ struct romfs_super_block *romfsb;
+ int nblocks = -1;
+ unsigned char *buf;
+
+ buf = kmalloc(size, GFP_KERNEL);
+ if (buf == 0)
+ return -1;
+
+ minixsb = (struct minix_super_block *) buf;
+ ext2sb = (struct ext2_super_block *) buf;
+ romfsb = (struct romfs_super_block *) buf;
+ memset(buf, 0xe5, size);
+
+ /*
+ * Read block 0 to test for gzipped kernel
+ */
+ lseek(fd, start_block * BLOCK_SIZE, 0);
+ read(fd, buf, size);
+
+ /*
+ * If it matches the gzip magic numbers, return 0 (compressed image)
+ */
+ if (buf[0] == 037 && ((buf[1] == 0213) || (buf[1] == 0236))) {
+ printk(KERN_NOTICE
+ "RAMDISK: Compressed image found at block %d\n",
+ start_block);
+ nblocks = 0;
+ goto done;
}
- ROOT_DEV = new_root_dev;
- mount_root();
+ /* romfs is at block zero too */
+ if (romfsb->word0 == ROMSB_WORD0 &&
+ romfsb->word1 == ROMSB_WORD1) {
+ printk(KERN_NOTICE
+ "RAMDISK: romfs filesystem found at block %d\n",
+ start_block);
+ nblocks = (ntohl(romfsb->size)+BLOCK_SIZE-1)>>BLOCK_SIZE_BITS;
+ goto done;
+ }
- chdir("/root");
- ROOT_DEV = current->fs->pwdmnt->mnt_sb->s_dev;
- printk("VFS: Mounted root (%s filesystem)%s.\n",
- current->fs->pwdmnt->mnt_sb->s_type->name,
- (current->fs->pwdmnt->mnt_sb->s_flags & MS_RDONLY) ? " readonly" : "");
+ /*
+ * Read block 1 to test for minix and ext2 superblock
+ */
+ lseek(fd, (start_block+1) * BLOCK_SIZE, 0);
+ read(fd, buf, size);
+
+ /* Try minix */
+ if (minixsb->s_magic == MINIX_SUPER_MAGIC ||
+ minixsb->s_magic == MINIX_SUPER_MAGIC2) {
+ printk(KERN_NOTICE
+ "RAMDISK: Minix filesystem found at block %d\n",
+ start_block);
+ nblocks = minixsb->s_nzones << minixsb->s_log_zone_size;
+ goto done;
+ }
+
+ /* Try ext2 */
+ if (ext2sb->s_magic == cpu_to_le16(EXT2_SUPER_MAGIC)) {
+ printk(KERN_NOTICE
+ "RAMDISK: ext2 filesystem found at block %d\n",
+ start_block);
+ nblocks = le32_to_cpu(ext2sb->s_blocks_count);
+ goto done;
+ }
-#if 1
- shrink_dcache();
- printk("change_root: old root has d_count=%d\n",
- atomic_read(&old_rootmnt->mnt_root->d_count));
+ printk(KERN_NOTICE
+ "RAMDISK: Couldn't find valid RAM disk image starting at %d.\n",
+ start_block);
+
+done:
+ lseek(fd, start_block * BLOCK_SIZE, 0);
+ kfree(buf);
+ return nblocks;
+}
#endif
- error = mount("/old", "/root/initrd", NULL, MS_MOVE, NULL);
- if (error) {
- int blivet;
- struct block_device *ramdisk = old_rootmnt->mnt_sb->s_bdev;
-
- atomic_inc(&ramdisk->bd_count);
- blivet = blkdev_get(ramdisk, FMODE_READ, 0, BDEV_FS);
- printk(KERN_NOTICE "Trying to unmount old root ... ");
- umount("/old", MNT_DETACH);
- if (!blivet) {
- blivet = ioctl_by_bdev(ramdisk, BLKFLSBUF, 0);
- blkdev_put(ramdisk, BDEV_FS);
- }
- if (blivet) {
- printk(KERN_ERR "error %d\n", blivet);
- } else {
- printk("okay\n");
- error = 0;
+static int __init rd_load_image(char *from)
+{
+ int res = 0;
+
+#ifdef CONFIG_BLK_DEV_RAM
+ int in_fd, out_fd;
+ int nblocks, rd_blocks, devblocks, i;
+ char *buf;
+ unsigned short rotate = 0;
+#if !defined(CONFIG_ARCH_S390) && !defined(CONFIG_PPC_ISERIES)
+ char rotator[4] = { '|' , '/' , '-' , '\\' };
+#endif
+
+ out_fd = open("/dev/ram", O_RDWR, 0);
+ if (out_fd < 0)
+ goto out;
+
+ in_fd = open(from, O_RDONLY, 0);
+ if (in_fd < 0)
+ goto noclose_input;
+
+ nblocks = identify_ramdisk_image(in_fd, rd_image_start);
+ if (nblocks < 0)
+ goto done;
+
+ if (nblocks == 0) {
+#ifdef BUILD_CRAMDISK
+ if (crd_load(in_fd, out_fd) == 0)
+ goto successful_load;
+#else
+ printk(KERN_NOTICE
+ "RAMDISK: Kernel does not support compressed "
+ "RAM disk images\n");
+#endif
+ goto done;
+ }
+
+ /*
+ * NOTE: nblocks assumes that the blocksize is BLOCK_SIZE, so
+ * rd_load_image works only with filesystems whose blocksize
+ * equals BLOCK_SIZE. Make sure to use a 1k blocksize when
+ * generating ext2fs ramdisk images.
+ */
+ if (sys_ioctl(out_fd, BLKGETSIZE, (unsigned long)&rd_blocks) < 0)
+ rd_blocks = 0;
+ else
+ rd_blocks >>= 1;
+
+ if (nblocks > rd_blocks) {
+ printk("RAMDISK: image too big! (%d/%d blocks)\n",
+ nblocks, rd_blocks);
+ goto done;
+ }
+
+ /*
+ * OK, time to copy in the data
+ */
+ buf = kmalloc(BLOCK_SIZE, GFP_KERNEL);
+ if (buf == 0) {
+ printk(KERN_ERR "RAMDISK: could not allocate buffer\n");
+ goto done;
+ }
+
+ if (sys_ioctl(in_fd, BLKGETSIZE, (unsigned long)&devblocks) < 0)
+ devblocks = 0;
+ else
+ devblocks >>= 1;
+
+ if (strcmp(from, "/dev/initrd") == 0)
+ devblocks = nblocks;
+
+ if (devblocks == 0) {
+ printk(KERN_ERR "RAMDISK: could not determine device size\n");
+ goto done;
+ }
+
+ printk(KERN_NOTICE "RAMDISK: Loading %d blocks [%d disk%s] into ram disk... ",
+ nblocks, ((nblocks-1)/devblocks)+1, nblocks>devblocks ? "s" : "");
+ for (i=0; i < nblocks; i++) {
+ if (i && (i % devblocks == 0)) {
+ printk("done disk #%d.\n", i/devblocks);
+ rotate = 0;
+ if (close(in_fd)) {
+ printk("Error closing the disk.\n");
+ goto noclose_input;
+ }
+ change_floppy("disk #%d", i/devblocks+1);
+ in_fd = open(from, O_RDONLY, 0);
+ if (in_fd < 0) {
+ printk("Error opening disk.\n");
+ goto noclose_input;
+ }
+ printk("Loading disk #%d... ", i/devblocks+1);
}
- } else {
- spin_lock(&dcache_lock);
- if (new_devname) {
- void *p = old_rootmnt->mnt_devname;
- old_rootmnt->mnt_devname = new_devname;
- new_devname = p;
+ read(in_fd, buf, BLOCK_SIZE);
+ write(out_fd, buf, BLOCK_SIZE);
+#if !defined(CONFIG_ARCH_S390) && !defined(CONFIG_PPC_ISERIES)
+ if (!(i % 16)) {
+ printk("%c\b", rotator[rotate & 0x3]);
+ rotate++;
}
- spin_unlock(&dcache_lock);
+#endif
}
+ printk("done.\n");
+ kfree(buf);
- /* put the old stuff */
- mntput(old_rootmnt);
- kfree(new_devname);
- return error;
+successful_load:
+ res = 1;
+done:
+ close(in_fd);
+noclose_input:
+ close(out_fd);
+out:
+ sys_unlink("/dev/ram");
+#endif
+ return res;
+}
+
+static int __init rd_load_disk(int n)
+{
+#ifdef CONFIG_BLK_DEV_RAM
+ extern int rd_prompt;
+ if (rd_prompt)
+ change_floppy("root floppy disk to be loaded into RAM disk");
+ create_dev("/dev/ram", MKDEV(RAMDISK_MAJOR, n), NULL);
+#endif
+ return rd_load_image("/dev/root");
}
+static void __init mount_root(void)
+{
+#ifdef CONFIG_ROOT_NFS
+ if (MAJOR(ROOT_DEV) == UNNAMED_MAJOR) {
+ if (mount_nfs_root()) {
+ sys_chdir("/root");
+ ROOT_DEV = current->fs->pwdmnt->mnt_sb->s_dev;
+ printk("VFS: Mounted root (nfs filesystem).\n");
+ return;
+ }
+ printk(KERN_ERR "VFS: Unable to mount root fs via NFS, trying floppy.\n");
+ ROOT_DEV = MKDEV(FLOPPY_MAJOR, 0);
+ }
#endif
+ devfs_make_root(root_device_name);
+ create_dev("/dev/root", ROOT_DEV, root_device_name);
+#ifdef CONFIG_BLK_DEV_FD
+ if (MAJOR(ROOT_DEV) == FLOPPY_MAJOR) {
+ /* rd_doload is 2 for a dual initrd/ramload setup */
+ if (rd_doload==2) {
+ if (rd_load_disk(1)) {
+ ROOT_DEV = MKDEV(RAMDISK_MAJOR, 1);
+ create_dev("/dev/root", ROOT_DEV, NULL);
+ }
+ } else
+ change_floppy("root floppy");
+ }
+#endif
+ mount_block_root("/dev/root", root_mountflags);
+}
#ifdef CONFIG_BLK_DEV_INITRD
static int do_linuxrc(void * shell)
@@ -470,9 +636,9 @@ static int do_linuxrc(void * shell)
static char *argv[] = { "linuxrc", NULL, };
extern char * envp_init[];
- chdir("/root");
- mount(".", "/", NULL, MS_MOVE, NULL);
- chroot(".");
+ sys_chdir("/root");
+ sys_mount(".", "/", NULL, MS_MOVE, NULL);
+ sys_chroot(".");
mount_devfs_fs ();
@@ -486,76 +652,247 @@ static int do_linuxrc(void * shell)
#endif
+static void __init handle_initrd(void)
+{
+#ifdef CONFIG_BLK_DEV_INITRD
+ int ram0 = kdev_t_to_nr(MKDEV(RAMDISK_MAJOR,0));
+ int error;
+ int i, pid;
+
+ create_dev("/dev/root.old", ram0, NULL);
+ mount_block_root("/dev/root.old", root_mountflags & ~MS_RDONLY);
+ sys_mkdir("/old", 0700);
+ sys_chdir("/old");
+
+ pid = kernel_thread(do_linuxrc, "/linuxrc", SIGCHLD);
+ if (pid > 0) {
+ while (pid != wait(&i)) {
+ current->policy |= SCHED_YIELD;
+ schedule();
+ }
+ }
+
+ sys_mount("..", ".", NULL, MS_MOVE, NULL);
+ sys_umount("/old/dev", 0);
+
+ if (real_root_dev == ram0) {
+ sys_chdir("/old");
+ return;
+ }
+
+ ROOT_DEV = real_root_dev;
+ mount_root();
+
+ printk(KERN_NOTICE "Trying to move old root to /initrd ... ");
+ error = sys_mount("/old", "/root/initrd", NULL, MS_MOVE, NULL);
+ if (!error)
+ printk("okay\n");
+ else {
+ int fd = open("/dev/root.old", O_RDWR, 0);
+ printk("failed\n");
+ printk(KERN_NOTICE "Unmounting old root\n");
+ sys_umount("/old", MNT_DETACH);
+ printk(KERN_NOTICE "Trying to free ramdisk memory ... ");
+ if (fd < 0) {
+ error = fd;
+ } else {
+ error = sys_ioctl(fd, BLKFLSBUF, 0);
+ close(fd);
+ }
+ printk(error ? "okay\n" : "failed\n");
+ }
+#endif
+}
+
+static int __init initrd_load(void)
+{
+#ifdef CONFIG_BLK_DEV_INITRD
+ create_dev("/dev/ram", MKDEV(RAMDISK_MAJOR, 0), NULL);
+ create_dev("/dev/initrd", MKDEV(RAMDISK_MAJOR, INITRD_MINOR), NULL);
+#endif
+ return rd_load_image("/dev/initrd");
+}
+
/*
* Prepare the namespace - decide what/where to mount, load ramdisks, etc.
*/
void prepare_namespace(void)
{
+ int do_initrd = 0;
+ int is_floppy = MAJOR(ROOT_DEV) == FLOPPY_MAJOR;
#ifdef CONFIG_BLK_DEV_INITRD
- int real_root_mountflags = root_mountflags;
if (!initrd_start)
mount_initrd = 0;
if (mount_initrd)
- root_mountflags &= ~MS_RDONLY;
+ do_initrd = 1;
real_root_dev = ROOT_DEV;
#endif
- mkdir("/dev", 0700);
- mkdir("/root", 0700);
-
-#ifdef CONFIG_BLK_DEV_RAM
-#ifdef CONFIG_BLK_DEV_INITRD
- if (mount_initrd)
- initrd_load();
- else
-#endif
- rd_load();
+ sys_mkdir("/dev", 0700);
+ sys_mkdir("/root", 0700);
+#ifdef CONFIG_DEVFS_FS
+ sys_mount("devfs", "/dev", "devfs", 0, NULL);
+ do_devfs = 1;
#endif
- /* Mount the root filesystem.. */
+ create_dev("/dev/root", ROOT_DEV, NULL);
+ if (do_initrd) {
+ if (initrd_load() && ROOT_DEV != MKDEV(RAMDISK_MAJOR, 0)) {
+ handle_initrd();
+ goto out;
+ }
+ } else if (is_floppy && rd_doload && rd_load_disk(0))
+ ROOT_DEV = MKDEV(RAMDISK_MAJOR, 0);
mount_root();
- chdir("/root");
- ROOT_DEV = current->fs->pwdmnt->mnt_sb->s_dev;
- printk("VFS: Mounted root (%s filesystem)%s.\n",
- current->fs->pwdmnt->mnt_sb->s_type->name,
- (current->fs->pwdmnt->mnt_sb->s_flags & MS_RDONLY) ? " readonly" : "");
+out:
+ sys_umount("/dev", 0);
+ sys_mount(".", "/", NULL, MS_MOVE, NULL);
+ sys_chroot(".");
+ mount_devfs_fs ();
+}
-#ifdef CONFIG_BLK_DEV_INITRD
- root_mountflags = real_root_mountflags;
- if (mount_initrd && ROOT_DEV != real_root_dev
- && MAJOR(ROOT_DEV) == RAMDISK_MAJOR && MINOR(ROOT_DEV) == 0) {
- int error;
- int i, pid;
- mkdir("/old", 0700);
- chdir("/old");
-
- pid = kernel_thread(do_linuxrc, "/linuxrc", SIGCHLD);
- if (pid > 0) {
- while (pid != wait(&i)) {
- current->policy |= SCHED_YIELD;
- schedule();
- }
- }
- if (MAJOR(real_root_dev) != RAMDISK_MAJOR
- || MINOR(real_root_dev) != 0) {
- error = change_root(real_root_dev,"/initrd");
- if (error)
- printk(KERN_ERR "Change root to /initrd: "
- "error %d\n",error);
-
- chdir("/root");
- mount(".", "/", NULL, MS_MOVE, NULL);
- chroot(".");
-
- mount_devfs_fs ();
- return;
- }
- chroot("..");
- chdir("/");
- return;
- }
+#ifdef BUILD_CRAMDISK
+
+/*
+ * gzip declarations
+ */
+
+#define OF(args) args
+
+#ifndef memzero
+#define memzero(s, n) memset ((s), 0, (n))
#endif
- mount(".", "/", NULL, MS_MOVE, NULL);
- chroot(".");
- mount_devfs_fs ();
+typedef unsigned char uch;
+typedef unsigned short ush;
+typedef unsigned long ulg;
+
+#define INBUFSIZ 4096
+#define WSIZE 0x8000 /* window size--must be a power of two, and */
+ /* at least 32K for zip's deflate method */
+
+static uch *inbuf;
+static uch *window;
+
+static unsigned insize; /* valid bytes in inbuf */
+static unsigned inptr; /* index of next byte to be processed in inbuf */
+static unsigned outcnt; /* bytes in output buffer */
+static int exit_code;
+static long bytes_out;
+static int crd_infd, crd_outfd;
+
+#define get_byte() (inptr < insize ? inbuf[inptr++] : fill_inbuf())
+
+/* Diagnostic functions (stubbed out) */
+#define Assert(cond,msg)
+#define Trace(x)
+#define Tracev(x)
+#define Tracevv(x)
+#define Tracec(c,x)
+#define Tracecv(c,x)
+
+#define STATIC static
+
+static int fill_inbuf(void);
+static void flush_window(void);
+static void *malloc(int size);
+static void free(void *where);
+static void error(char *m);
+static void gzip_mark(void **);
+static void gzip_release(void **);
+
+#include "../lib/inflate.c"
+
+static void __init *malloc(int size)
+{
+ return kmalloc(size, GFP_KERNEL);
+}
+
+static void __init free(void *where)
+{
+ kfree(where);
+}
+
+static void __init gzip_mark(void **ptr)
+{
}
+
+static void __init gzip_release(void **ptr)
+{
+}
+
+
+/* ===========================================================================
+ * Fill the input buffer. This is called only when the buffer is empty
+ * and at least one byte is really needed.
+ */
+static int __init fill_inbuf(void)
+{
+ if (exit_code) return -1;
+
+ insize = read(crd_infd, inbuf, INBUFSIZ);
+ if (insize == 0) return -1;
+
+ inptr = 1;
+
+ return inbuf[0];
+}
+
+/* ===========================================================================
+ * Write the output window window[0..outcnt-1] and update crc and bytes_out.
+ * (Used for the decompressed data only.)
+ */
+static void __init flush_window(void)
+{
+ ulg c = crc; /* temporary variable */
+ unsigned n;
+ uch *in, ch;
+
+ write(crd_outfd, window, outcnt);
+ in = window;
+ for (n = 0; n < outcnt; n++) {
+ ch = *in++;
+ c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8);
+ }
+ crc = c;
+ bytes_out += (ulg)outcnt;
+ outcnt = 0;
+}
+
+static void __init error(char *x)
+{
+ printk(KERN_ERR "%s", x);
+ exit_code = 1;
+}
+
+static int __init crd_load(int in_fd, int out_fd)
+{
+ int result;
+
+ insize = 0; /* valid bytes in inbuf */
+ inptr = 0; /* index of next byte to be processed in inbuf */
+ outcnt = 0; /* bytes in output buffer */
+ exit_code = 0;
+ bytes_out = 0;
+ crc = (ulg)0xffffffffL; /* shift register contents */
+
+ crd_infd = in_fd;
+ crd_outfd = out_fd;
+ inbuf = kmalloc(INBUFSIZ, GFP_KERNEL);
+ if (inbuf == 0) {
+ printk(KERN_ERR "RAMDISK: Couldn't allocate gzip buffer\n");
+ return -1;
+ }
+ window = kmalloc(WSIZE, GFP_KERNEL);
+ if (window == 0) {
+ printk(KERN_ERR "RAMDISK: Couldn't allocate gzip window\n");
+ kfree(inbuf);
+ return -1;
+ }
+ makecrc();
+ result = gunzip();
+ kfree(inbuf);
+ kfree(window);
+ return result;
+}
+
+#endif /* BUILD_CRAMDISK */
diff --git a/mm/memory.c b/mm/memory.c
index f455a5d3d..533529b1f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1231,8 +1231,10 @@ static int do_no_page(struct mm_struct * mm, struct vm_area_struct * vma,
*/
if (write_access && !(vma->vm_flags & VM_SHARED)) {
struct page * page = alloc_page(GFP_HIGHUSER);
- if (!page)
+ if (!page) {
+ page_cache_release(new_page);
return -1;
+ }
copy_user_highpage(page, new_page, address);
page_cache_release(new_page);
lru_cache_add(page);
diff --git a/mm/mempool.c b/mm/mempool.c
index 8116cac13..0c0bf9996 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -1,9 +1,9 @@
/*
* linux/mm/mempool.c
*
- * memory buffer pool support. Such pools are mostly used to
- * guarantee deadlock-free IO operations even during extreme
- * VM load.
+ * memory buffer pool support. Such pools are mostly used
+ * for guaranteed, deadlock-free memory allocations during
+ * extreme VM load.
*
* started by Ingo Molnar, Copyright (C) 2001
*/
@@ -75,6 +75,71 @@ mempool_t * mempool_create(int min_nr, mempool_alloc_t *alloc_fn,
}
/**
+ * mempool_resize - resize an existing memory pool
+ * @pool: pointer to the memory pool which was allocated via
+ * mempool_create().
+ * @new_min_nr: the new minimum number of elements guaranteed to be
+ * allocated for this pool.
+ * @gfp_mask: the usual allocation bitmask.
+ *
+ * This function shrinks/grows the pool. In the case of growing,
+ * it cannot be guaranteed that the pool will be grown to the new
+ * size immediately, but new mempool_free() calls will refill it.
+ *
+ * Note that the caller must guarantee that no mempool_destroy() is
+ * called while this function is running. mempool_alloc() and
+ * mempool_free() might be called (e.g. from IRQ contexts) while this
+ * function executes.
+ */
+void mempool_resize(mempool_t *pool, int new_min_nr, int gfp_mask)
+{
+ int delta;
+ void *element;
+ unsigned long flags;
+ struct list_head *tmp;
+
+ if (new_min_nr <= 0)
+ BUG();
+
+ spin_lock_irqsave(&pool->lock, flags);
+ if (new_min_nr < pool->min_nr) {
+ pool->min_nr = new_min_nr;
+ /*
+ * Free possible excess elements.
+ */
+ while (pool->curr_nr > pool->min_nr) {
+ tmp = pool->elements.next;
+ if (tmp == &pool->elements)
+ BUG();
+ list_del(tmp);
+ element = tmp;
+ pool->curr_nr--;
+ spin_unlock_irqrestore(&pool->lock, flags);
+
+ pool->free(element, pool->pool_data);
+
+ spin_lock_irqsave(&pool->lock, flags);
+ }
+ spin_unlock_irqrestore(&pool->lock, flags);
+ return;
+ }
+ delta = new_min_nr - pool->min_nr;
+ pool->min_nr = new_min_nr;
+ spin_unlock_irqrestore(&pool->lock, flags);
+
+ /*
+ * We refill the pool up to the new threshold - but we don't
+ * (cannot) guarantee that the refill succeeds.
+ */
+ while (delta) {
+ element = pool->alloc(gfp_mask, pool->pool_data);
+ if (!element)
+ break;
+ mempool_free(element, pool);
+ delta--;
+ }
+}
+
+/**
* mempool_destroy - deallocate a memory pool
* @pool: pointer to the memory pool which was allocated via
* mempool_create().
@@ -110,7 +175,7 @@ void mempool_destroy(mempool_t *pool)
* @gfp_mask: the usual allocation bitmask.
*
* this function only sleeps if the alloc_fn function sleeps or
- * returns NULL. Note that due to preallocation guarantees this function
+ * returns NULL. Note that due to preallocation, this function
* *never* fails.
*/
void * mempool_alloc(mempool_t *pool, int gfp_mask)
@@ -175,7 +240,7 @@ repeat_alloc:
/**
* mempool_free - return an element to the pool.
- * @gfp_mask: pool element pointer.
+ * @element: pool element pointer.
* @pool: pointer to the memory pool which was allocated via
* mempool_create().
*
@@ -200,6 +265,7 @@ void mempool_free(void *element, mempool_t *pool)
}
EXPORT_SYMBOL(mempool_create);
+EXPORT_SYMBOL(mempool_resize);
EXPORT_SYMBOL(mempool_destroy);
EXPORT_SYMBOL(mempool_alloc);
EXPORT_SYMBOL(mempool_free);