 Documentation/must-fix.txt |  354 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 354 insertions(+)

diff -puN /dev/null Documentation/must-fix.txt
--- /dev/null 2002-08-30 16:31:37.000000000 -0700
+++ 25-akpm/Documentation/must-fix.txt 2003-10-03 02:14:46.000000000 -0700
@@ -0,0 +1,354 @@
+
+Must-fix bugs
+=============
+
+drivers/char/
+~~~~~~~~~~~~~
+
+o TTY locking is broken.
+
+  o see FIXME in do_tty_hangup(). This causes ppp BUGs in local_bh_enable()
+
+  o Other problems: aviro, dipankar, Alan have details.
+
+  o somebody will have to document the tty driver and ldisc API
+
+o Lack of test cases and/or stress tests is a problem. Contributions and
+  suggestions are sought.
+
+o Lots of drivers are using cli/sti and are broken.
+
+drivers/tty
+~~~~~~~~~~~
+
+o viro: we need to fix refcounting for tty_driver (oopsable race, must fix
+  anyway, hopefully about a week until it's merged) then we can do
+  tty/misc/upper levels of sound.
+
+drivers/block/
+~~~~~~~~~~~~~~
+
+o ideraid hasn't been ported to 2.5 at all yet.
+
+  We need to understand whether the proposed BIO split code will suffice
+  for this.
+
+o CD burning. There are still a few quirks to solve wrt SG_IO and ide-cd.
+
+  Jens: The basic hang has been solved (double fault in ide-cd), there still
+  seems to be some cases that don't work too well. Don't really have a
+  handle on those :/
+
+o lmb: Last time I looked at the multipath code (2.5.50 or so) it also
+  looked pretty broken; I plan to port forward the changes we did on 2.4
+  before KS.
+
+drivers/input/
+~~~~~~~~~~~~~~
+
+o rmk: unconverted keyboard/mouse drivers (there's a deadline of 2.6.0
+  currently on these remaining in my/Linus' tree.)
+
+o viro: large absence of locking.
+
+o viro: parport is nearly as bad as that and there the code is more hairy.
+  IMO parport is more of "figure out what API changes are needed for its
+  users, get them done ASAP, then fix generic layer at leisure"
+
+o (Albert Cahalan) Lots of people (check Google) get this message from the
+  kernel:
+
+  psmouse.c: Lost synchronization, throwing 2 bytes away.
+
+  (the number of bytes will be 1, 2, or 3)
+
+  At work, I get it when there is heavy NFS traffic. The mouse goes crazy,
+  jumping around and doing random cut-and-paste all over everything. This
+  is with a decently fast and modern PC.
+
+o There seem to be too many reports of keyboards and mice failing or acting
+  strangely.
+
+
+drivers/misc/
+~~~~~~~~~~~~~
+
+o rmk: UCB1[23]00 drivers, currently sitting in drivers/misc in the ARM
+  tree. (touchscreen, audio, gpio, type device.)
+
+  These need to be moved out of drivers/misc/ and into real places
+
+o viro: actually, misc.c has a good chance to die. With cdev-cidr that's
+  trivial.
+
+drivers/net/
+~~~~~~~~~~~~
+
+o rmk: network drivers. ARM people like to add tonnes of #ifdefs into
+  these to customise them to their hardware platform (eg, chip access
+  methods, addresses, etc.) I cope with this by not integrating them into my
+  tree. The result is that many ARM platforms can't be built from even my
+  tree without extra patches. This isn't sane, and has bred a culture of
+  network drivers not being submitted. I don't see this changing for 2.6
+  though.
+
+drivers/net/irda/
+~~~~~~~~~~~~~~~~~
+
+o dongle drivers need to be converted to sir-dev
+
+o irport needs to be converted to sir-kthread
+
+o new drivers (irtty-sir/smsc-ircc2/donauboe) need more testing
+
+o rmk: Refuse IrDA initialisation if sizeof(structures) is incorrect (I'm
+  not sure if we still need this; I think gcc 2.95.3 on ARM shows this
+  problem though.)
+
+drivers/pci/
+~~~~~~~~~~~~
+
+o alan: Some cardbus crashes the system
+
+  (bugzilla, please?)
+
+drivers/pcmcia/
+~~~~~~~~~~~~~~~
+
+o alan: This is a locking disaster.
+
+  (rmk, brodo: in progress)
+
+drivers/pld/
+~~~~~~~~~~~~
+
+o rmk: EPXA (ARM platform) PLD hotswap drivers (drivers/pld)
+
+  (rmk: will work out what to do here. maybe drivers/arm/)
+
+drivers/video/
+~~~~~~~~~~~~~~
+
+o Lots of drivers don't compile, others do but don't work.
+
+drivers/scsi/
+~~~~~~~~~~~~~
+
+o hch: large parts of the locking are hosed or not existent
+
+  (Mike Anderson, Patrick Mansfield, Badari Pulavarty)
+
+  o shost->my_devices isn't locked down at all
+
+  o there are lots of members of struct Scsi_Host/scsi_device/scsi_cmnd
+    with very unclear locking, many of them probably want to become
+    atomic_t's or bitmaps (for the 1bit bitfields).
+
+  o there's lots of volatile abuse in the scsi code that needs to be
+    thought about.
+
+  o there's some global variables incremented without any locks
+
+o Convert am53c974, dpt_i2o, initio and pci2220i to DMA-mapping
+
+o Make inia100, cpqfc, pci2000 and dc390t compile
+
+o Convert
+
+  wd33c93 based: a2091 a3000 gvp11 mvme147 sgiwd93
+
+  53c7xx based: amiga7xxx bvme6000 mvme16x initio am53c974 pci2000
+    pci2220i dc390t
+
+  To new error handling
+
+  It also might be possible to shift the 53c7xx based drivers over to
+  53c700 which does the new EH stuff, but I don't have the hardware to check
+  such a shift.
+
+  For the non-compiling stuff, I've probably missed a few that just aren't
+  compilable on my platforms, so any updates would be welcome. Also, are
+  some of our non-compiling or unconverted drivers obsolete?
+
+o rmk: I have a pending todo: I need to put the scsi error handling through
+  a workout on my scsi bus from hell to make sure it does the right thing
+  and doesn't get wedged.
+
+o James B: USB hot-removal crash: "It's a known scsi refcounting issue."
+
+fs/
+~~~
+
+o AIO/direct-IO writes can race with truncate and wreck filesystems.
+  (Badari has a patch)
+
+o hch: devfs: there's a fundamental lookup vs devfsd race that's only
+  fixable by introducing a lookup vs devfs deadlock. I can't see how this is
+  fixable without getting rid of the current devfsd design. Mandrake seems
+  to have a workaround for this so this is at least not triggered so easily,
+  but that's not what I'd consider a fix.
+
+o viro: fs/char_dev.c needs removal of aeb stuff and merge of cdev-cidr.
+  In progress.
+
+o forward-port sct's O_DIRECT fixes (Badari has a patch)
+
+o viro: there is some generic stuff for namei/namespace/super, but that's a
+  slow-merge and can go in 2.6 just fine
+
+o andi: also soft needs to be fixed - there are quite a lot of
+  uninterruptible waits in sunrpc/nfs
+
+o trond: NFS has a mmap-versus-truncate problem
+
+kernel/sched.c
+~~~~~~~~~~~~~~
+
+o Starvation, general interactivity need close monitoring.
+
+kernel/
+~~~~~~~
+
+o Alan: 32bit uid support is *still* broken for process accounting.
+
+  Create a 32bit uid, turn accounting on. Shock horror it doesn't work
+  because the field is 16bit. We need an acct structure flag day for 2.6
+  IMHO
+
+  (alan has patch)
+
+o viro: core sysctl code is racy. And its interaction with sysfs
+
+o (ingo) rwsems (on x86) are limited to 32766 waiting processes. This
+  means that setting pid_max to above 32K is unsafe :-(
+
+  An option is to use CONFIG_RWSEM_GENERIC_SPINLOCK variant all the time,
+  for all archs, and not inline any part of the ops.
+
+lib/kobject.c
+~~~~~~~~~~~~~
+
+o kobject refcounting (comments from Al Viro):
+
+  _anything_ can grab a temporary reference to kobject. IOW, if kobject is
+  embedded into something that could be freed - it _MUST_ have a destructor
+  and that destructor _MUST_ be the destructor for containing object.
+
+  Any violation of the above (and we already have a bunch of those) is a
+  user-triggerable memory corruption.
+
+  We can tolerate it for a while in 2.5 (e.g. during work on subsystem we
+  can decide to switch to that way of handling objects and have subsystem
+  vulnerable for a while), but all such windows must be closed before 2.6
+  and during 2.6 we can't open them at all.
+
+o All block drivers which control multiple gendisks with a single
+  request_queue are broken, due to one-to-one assumptions in the request
+  queue sysfs hookup.
+
+mm/
+~~~
+
+o GFP_DMA32 (or something like that). Lots of ideas. jejb, zaitcev,
+  willy, arjan, wli.
+
+  Specifically, 64-bit systems need to be able to enforce 32-bit addressing
+  limits for device metadata like network cards' ring buffers and SCSI
+  command descriptors.
+
+o access_process_vm() doesn't flush right. We probably need new flushing
+  primitives to do this (davem?)
+
+
+modules
+~~~~~~~
+
+  (Rusty)
+
+net/
+~~~~
+
+  (davem)
+
+o UDP apps can in theory deadlock, because the ip_append_data path can end
+  up sleeping while the socket lock is held.
+
+  It is OK to sleep with the socket held, normally. But in this case
+  the sleep happens while waiting for socket memory/space to become
+  available, if another context needs to take the socket lock to free up the
+  space we could hang.
+
+  I sent a rough patch on how to fix this to Alexey, and he is analyzing
+  the situation. I expect a final fix from him next week or so.
+
+o Semantics for IPSEC during operations such as TCP connect suck currently.
+
+  When we first try to connect to a destination, we may need to ask the
+  IPSEC key management daemon to resolve the IPSEC routes for us. For the
+  purposes of what the kernel needs to do, you can think of it like ARP. We
+  can't send the packet out properly until we resolve the path.
+
+  What happens now for IPSEC is basically this:
+
+  O_NONBLOCK: returns -EAGAIN over and over until route is resolved
+
+  !O_NONBLOCK: Sleeps until route is resolved
+
+  These semantics are total crap. The solution, which Alexey is working
+  on, is to allow incomplete routes to exist. These "incomplete" routes
+  merely put the packet onto a "resolution queue", and once the key manager
+  does its thing we finish the output of the packet. This is precisely how
+  ARP works.
+
+  I don't know when Alexey will be done with this.
+
+net/*/netfilter/
+~~~~~~~~~~~~~~~~
+
+  (Rusty)
+
+o Rework conntrack hashing.
+
+o Module relationship bogosity fix (trivial, have patch).
+
+sound/
+~~~~~~
+
+o rmk: several OSS drivers for SA11xx-based hardware in need of
+  ALSA-ification and L3 bus support code for these.
+
+o rmk: linux/sound/drivers/mpu401/mpu401.c and
+  linux/sound/drivers/virmidi.c complained about 'errno' at some time in the
+  past, need to confirm whether this is still a problem.
+
+o rmk: need to complete ALSA-ification of the WaveArtist driver for both
+  NetWinder and other stuff (there's some fairly fundamental differences in
+  the way the mixer needs to be handled for the NetWinder.)
+
+
+  (Issues with forward-porting 2.4 bugfixes.)
+  (Killing off OSS is 2.7 material)
+
+
+global
+~~~~~~
+
+o 64-bit dev_t. Seems almost ready, but it's not really known how much
+  work is still to do. Patches exist in -mm but with the recent rise of the
+  neo-viro I'm not sure where things are at.
+
+o Lots of 2.4 fixes including some security are not in 2.5
+
+o There are about 60 or 70 security related checks that need doing
+  (copy_user etc) from Stanford tools. (badari is looking into this, and
+  hollisb)
+
+o A couple of hundred real looking bugzilla bugs
+
+o viro: cdev rework. Main group is pretty stable and I hope to feed it to
+  Linus RSN. That's cdev-cidr and ->i_cdev/->i_cindex stuff
+
+o Athlon prefetch oopses sometimes. It is currently disabled, and needs to
+  be fixed.
+
+
_