aboutsummaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2005-10-30[CRYPTO] Simplify one-member scatterlist expressionsHerbert Xu1-2/+2
This patch rewrites various occurences of &sg[0] where sg is an array of length one to simply sg. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-10-30[PATCH] Use sg_set_buf/sg_init_one where applicableDavid Hardeman2-24/+9
This patch uses sg_set_buf/sg_init_one in some places where it was duplicated. Signed-off-by: David Hardeman <david@2gen.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Greg KH <greg@kroah.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Garzik <jgarzik@pobox.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-10-29[PATCH] bluetooth hidp is broken on s390Al Viro1-1/+1
Bluetooth HIDP selects INPUT and it really needs it to be there - module depends on input core. And input core is never built on s390... Marked as broken on s390, for now; if somebody has better ideas, feel free to fix it and remove dependency... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29[IPV4]: Fix issue reported by Coverity in ipv4/fib_frontend.cJayachandran C1-1/+1
fib_del_ifaddr() dereferences ifa->ifa_dev, so the code already assumes that ifa->ifa_dev is non-NULL, the check is unnecessary. Signed-off-by: Jayachandran C. <c.jayachandran at gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-29[ETH]: ether address compareStephen Hemminger1-15/+2
Expose faster ether compare for use by protocols and other driver. And change name to be more consistent with other ether address manipulation routines in same file Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-28Merge master.kernel.org:/pub/scm/linux/kernel/git/sridhar/lksctp-2.6Arnaldo Carvalho de Melo3-38/+60
2005-10-28[SCTP] Do not allow unprivileged programs initiating new associations onIvan Skytte Jorgensen1-0/+26
privileged ports. Signed-off-by: Ivan Skytte Jorgensen <isj-sctp@i1.dk> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
2005-10-28[SCTP] Allow SCTP_MAXSEG to revert to default frag point with a '0' value.Ivan Skytte Jorgensen1-7/+5
Signed-off-by: Ivan Skytte Jorgensen <isj-sctp@i1.dk> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
2005-10-28[SCTP] Fix SCTP_SETADAPTION sockopt to use the correct structure.Ivan Skytte Jorgensen1-11/+9
Signed-off-by: Ivan Skytte Jorgensen <isj-sctp@i1.dk> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
2005-10-28[SCTP] Rename SCTP specific control message flags.Ivan Skytte Jorgensen3-20/+20
Rename SCTP specific control message flags to use SCTP_ prefix rather than MSG_ prefix as per the latest sctp sockets API draft. Signed-off-by: Ivan Skytte Jorgensen <isj-sctp@i1.dk> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
2005-10-28Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6Linus Torvalds1-5/+8
2005-10-28[MCAST] IPv6: Fix algorithm to compute Querier's Query IntervalYan Zheng1-1/+1
5.1.3. Maximum Response Code The Maximum Response Code field specifies the maximum time allowed before sending a responding Report. The actual time allowed, called the Maximum Response Delay, is represented in units of milliseconds, and is derived from the Maximum Response Code as follows: If Maximum Response Code < 32768, Maximum Response Delay = Maximum Response Code If Maximum Response Code >=32768, Maximum Response Code represents a floating-point value as follows: 0 1 2 3 4 5 6 7 8 9 A B C D E F +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |1| exp | mant | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Maximum Response Delay = (mant | 0x1000) << (exp+3) 5.1.9. QQIC (Querier's Query Interval Code) The Querier's Query Interval Code field specifies the [Query Interval] used by the Querier. The actual interval, called the Querier's Query Interval (QQI), is represented in units of seconds, and is derived from the Querier's Query Interval Code as follows: If QQIC < 128, QQI = QQIC If QQIC >= 128, QQIC represents a floating-point value as follows: 0 1 2 3 4 5 6 7 +-+-+-+-+-+-+-+-+ |1| exp | mant | +-+-+-+-+-+-+-+-+ QQI = (mant | 0x10) << (exp + 3) -- rfc3810 #define MLDV2_QQIC(value) MLDV2_EXP(0x80, 4, 3, value) #define MLDV2_MRC(value) MLDV2_EXP(0x8000, 12, 3, value) Above macro are defined in mcast.c. but 1 << 4 == 0x10 and 1 << 12 == 0x1000. So the result computed by original Macro is larger. Signed-off-by: Yan Zheng <yanzheng@21cn.com> Acked-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-28[IPv4/IPv6]: UFO Scatter-gather approachAnanda Raju5-6/+290
Attached is kernel patch for UDP Fragmentation Offload (UFO) feature. 1. This patch incorporate the review comments by Jeff Garzik. 2. Renamed USO as UFO (UDP Fragmentation Offload) 3. udp sendfile support with UFO This patches uses scatter-gather feature of skb to generate large UDP datagram. Below is a "how-to" on changes required in network device driver to use the UFO interface. UDP Fragmentation Offload (UFO) Interface: ------------------------------------------- UFO is a feature wherein the Linux kernel network stack will offload the IP fragmentation functionality of large UDP datagram to hardware. This will reduce the overhead of stack in fragmenting the large UDP datagram to MTU sized packets 1) Drivers indicate their capability of UFO using dev->features |= NETIF_F_UFO | NETIF_F_HW_CSUM | NETIF_F_SG NETIF_F_HW_CSUM is required for UFO over ipv6. 2) UFO packet will be submitted for transmission using driver xmit routine. UFO packet will have a non-zero value for "skb_shinfo(skb)->ufo_size" skb_shinfo(skb)->ufo_size will indicate the length of data part in each IP fragment going out of the adapter after IP fragmentation by hardware. skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[] contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW indicating that hardware has to do checksum calculation. Hardware should compute the UDP checksum of complete datagram and also ip header checksum of each fragmented IP packet. For IPV6 the UFO provides the fragment identification-id in skb_shinfo(skb)->ip6_frag_id. The adapter should use this ID for generating IPv6 fragments. Signed-off-by: Ananda Raju <ananda.raju@neterion.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (forwarded) Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-28Merge master.kernel.org:/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6Arnaldo Carvalho de Melo6-86/+52
2005-10-28[Bluetooth] Update security filter for Extended Inquiry ResponseMarcel Holtmann1-6/+6
This patch updates the HCI security filter with support for the Extended Inquiry Response (EIR) feature. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2005-10-28[Bluetooth] Make more functions staticMarcel Holtmann2-8/+2
This patch makes another bunch of functions static. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2005-10-28[Bluetooth] Move CRC table into RFCOMM coreMarcel Holtmann3-72/+44
This patch moves rfcomm_crc_table[] into the RFCOMM core, because there is no need to keep it in a separate file. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2005-10-28Merge ../bleed-2.6Greg KH55-2442/+4583
2005-10-28[PATCH] Input: convert net/bluetooth to dynamic input_dev allocationDmitry Torokhov1-5/+8
Input: convert net/bluetooth to dynamic input_dev allocation This is required for input_dev sysfs integration Signed-off-by: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-10-28Merge branch 'upstream' of ↵Linus Torvalds10-430/+1392
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
2005-10-28Merge master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.15Linus Torvalds17-404/+344
2005-10-28[PATCH] gfp_t: net/*Al Viro3-3/+3
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-27Merge /home/trondmy/scm/kernel/git/torvalds/linux-2.6Trond Myklebust1-0/+1
2005-10-27RPC: Ensure that nobody can queue up new upcalls after rpc_close_pipes()Trond Myklebust1-14/+15
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-27Merge branch 'master'Jeff Garzik3-3/+9
2005-10-27Merge /home/trondmy/scm/kernel/git/torvalds/linux-2.6Trond Myklebust11-56/+45
2005-10-27Revert "RPC: stops the release_pipe() funtion from being called twice"Trond Myklebust1-2/+0
This reverts 747c5534c9a6da4aa87e7cdc2209ea98ea27f381 commit.
2005-10-27[TCP]: Clear stale pred_flags when snd_wnd changesHerbert Xu1-0/+1
This bug is responsible for causing the infamous "Treason uncloaked" messages that's been popping up everywhere since the printk was added. It has usually been blamed on foreign operating systems. However, some of those reports implicate Linux as both systems are running Linux or the TCP connection is going across the loopback interface. In fact, there really is a bug in the Linux TCP header prediction code that's been there since at least 2.1.8. This bug was tracked down with help from Dale Blount. The effect of this bug ranges from harmless "Treason uncloaked" messages to hung/aborted TCP connections. The details of the bug and fix is as follows. When snd_wnd is updated, we only update pred_flags if tcp_fast_path_check succeeds. When it fails (for example, when our rcvbuf is used up), we will leave pred_flags with an out-of-date snd_wnd value. When the out-of-date pred_flags happens to match the next incoming packet we will again hit the fast path and use the current snd_wnd which will be wrong. In the case of the treason messages, it just happens that the snd_wnd cached in pred_flags is zero while tp->snd_wnd is non-zero. Therefore when a zero-window packet comes in we incorrectly conclude that the window is non-zero. In fact if the peer continues to send us zero-window pure ACKs we will continue making the same mistake. It's only when the peer transmits a zero-window packet with data attached that we get a chance to snap out of it. This is what triggers the treason message at the next retransmit timeout. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26[PATCH] svcsock timestamp fixAndrew Morton1-1/+1
Convert nanoseconds to microseconds correctly. Spotted by Steve Dickson <SteveD@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-26[PATCH] kill massive wireless-related log spamJeff Garzik1-2/+7
Although this message is having the intended effect of causing wireless driver maintainers to upgrade their code, I never should have merged this patch in its present form. Leading to tons of bug reports and unhappy users. Some wireless apps poll for statistics regularly, which leads to a printk() every single time they ask for stats. That's a little bit _too_ much of a reminder that the driver is using an old API. Change this to printing out the message once, per kernel boot. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-26Merge branch 'master'Jeff Garzik9-53/+37
2005-10-26[PATCH] ieee80211 build fixJames Ketrenos1-1/+1
James Ketrenos wrote: > [3/4] Use the tx_headroom and reserve requested space. This patch introduced a compile problem; patch below corrects this. Fixed compilation error due to not passing tx_headroom in ieee80211_tx_frame. Signed-off-by: James Ketrenos <jketreno@linux.intel.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-10-26[IPV4]: Fix setting broadcast for SIOCSIFNETMASKDavid Engel1-1/+2
Fix setting of the broadcast address when the netmask is set via SIOCSIFNETMASK in Linux 2.6. The code wanted the old value of ifa->ifa_mask but used it after it had already been overwritten with the new value. Signed-off-by: David Engel <gigem@comcast.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26[AX.25]: Use constant instead of magic numberRalf Baechle1-1/+1
Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26[SK_BUFF] kernel-doc: fix skbuff warningsRandy Dunlap1-0/+2
Add kernel-doc to skbuff.h, skbuff.c to eliminate kernel-doc warnings. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26[IPV4]: Remove dead code from ip_output.cJayachandran C1-4/+1
skb_prev is assigned from skb, which cannot be NULL. This patch removes the unnecessary NULL check. Signed-off-by: Jayachandran C. <c.jayachandran at gmail.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26[NETLINK]: Remove dead code in af_netlink.cJayachandran C1-3/+0
Remove the variable nlk & call to nlk_sk as it does not have any side effect. Signed-off-by: Jayachandran C. <c.jayachandran at gmail.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26[IPSEC]: Kill obsolete get_mss functionHerbert Xu2-43/+6
Now that we've switched over to storing MTUs in the xfrm_dst entries, we no longer need the dst's get_mss methods. This patch gets rid of them. It also documents the fact that our MTU calculation is not optimal for ESP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26[IPV4]: Kill redundant rcu_dereference on fa_infoHerbert Xu1-1/+1
This patch kills a redundant rcu_dereference on fa->fa_info in fib_trie.c. As this dereference directly follows a list_for_each_entry_rcu line, we have already taken a read barrier with respect to getting an entry from the list. This read barrier guarantees that all values read out of fa are valid. In particular, the contents of structure pointed to by fa->fa_info is initialised before fa->fa_info is actually set (see fn_trie_insert); the setting of fa->fa_info itself is further separated with a write barrier from the insertion of fa into the list. Therefore by taking a read barrier after obtaining fa from the list (which is given by list_for_each_entry_rcu), we can be sure that fa->fa_info contains a valid pointer, as well as the fact that the data pointed to by fa->fa_info is itself valid. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26[NETFILTER] ip_conntrack: Make "hashsize" conntrack parameter writableHarald Welte1-37/+95
It's fairly simple to resize the hash table, but currently you need to remove and reinsert the module. That's bad (we lose connection state). Harald has even offered to write a daemon which sets this based on load. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26[PKTGEN]: proc interface revisionStephen Hemminger1-257/+215
The code to handle the /proc interface can be cleaned up in several places: * use seq_file for read * don't need to remember all the filenames separately * use for_online_cpu's * don't vmalloc a buffer for small command from user. Committer note: This patch clashed with John Hawkes's "[NET]: Wider use of for_each_*cpu()", so I fixed it up manually. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26[PKTGEN]: Spelling and white spaceStephen Hemminger1-12/+12
Fix some cosmetic issues. Indentation, spelling errors, and some whitespace. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26[PKTGEN]: Use kzallocStephen Hemminger1-5/+2
These are cleanup patches for pktgen that can go in 2.6.15 Can use kzalloc in a couple of places. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-26[PKTGEN]: Sleeping function called under lockStephen Hemminger1-3/+3
pktgen is calling kmalloc GFP_KERNEL and vmalloc with lock held. The simplest fix is to turn the lock into a semaphore, since the thread lock is only used for admin control from user context. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-25[NET]: Wider use of for_each_*cpu()John Hawkes7-28/+8
In 'net' change the explicit use of for-loops and NR_CPUS into the general for_each_cpu() or for_each_online_cpu() constructs, as appropriate. This widens the scope of potential future optimizations of the general constructs, as well as takes advantage of the existing optimizations of first_cpu() and next_cpu(), which is advantageous when the true CPU count is much smaller than NR_CPUS. Signed-off-by: John Hawkes <hawkes@sgi.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-25[DECNET]: Remove some redundant ifdeffed codePatrick Caulfield1-13/+0
Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com> Signed-off-by: Steven Whitehouse <steve@chygwyn.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-25[TR]: Preserve RIF flag even for 2 byte RIF fields.Jochen Friedrich1-2/+3
Signed-off-by: Jochen Friedrich <jochen@scram.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-25[IPV6]: Fix refcnt of struct ip6_flowlabelYan Zheng1-1/+1
Signed-off-by: Yan Zheng <yanzheng@21cn.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-23[NEIGH] Fix timer leak in neigh_changeaddrHerbert Xu1-30/+13
neigh_changeaddr attempts to delete neighbour timers without setting nud_state. This doesn't work because the timer may have already fired when we acquire the write lock in neigh_changeaddr. The result is that the timer may keep firing for quite a while until the entry reaches NEIGH_FAILED. It should be setting the nud_state straight away so that if the timer has already fired it can simply exit once we relinquish the lock. In fact, this whole function is simply duplicating the logic in neigh_ifdown which in turn is already doing the right thing when it comes to deleting timers and setting nud_state. So all we have to do is take that code out and put it into a common function and make both neigh_changeaddr and neigh_ifdown call it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-10-23[NEIGH] Fix add_timer race in neigh_add_timerHerbert Xu1-2/+2
neigh_add_timer cannot use add_timer unconditionally. The reason is that by the time it has obtained the write lock someone else (e.g., neigh_update) could have already added a new timer. So it should only use mod_timer and deal with its return value accordingly. This bug would have led to rare neighbour cache entry leaks. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-10-23[NEIGH] Print stack trace in neigh_add_timerHerbert Xu1-0/+1
Stack traces are very helpful in determining the exact nature of a bug. So let's print a stack trace when the timer is added twice. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-10-22[SK_BUFF]: ipvs_property field must be copiedJulian Anastasov2-0/+9
IPVS used flag NFC_IPVS_PROPERTY in nfcache but as now nfcache was removed the new flag 'ipvs_property' still needs to be copied. This patch should be included in 2.6.14. Further comments from Harald Welte: Sorry, seems like the bug was introduced by me. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-21ieee80211 subsystem:Michael Buesch1-3/+6
* Use GFP mask on TX skb allocation. * Use the tx_headroom and reserve requested space. Signed-off-by: Michael Buesch <mbuesch@freenet.de> Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
2005-10-20[TCP] Allow len == skb->len in tcp_fragmentHerbert Xu1-11/+1
It is legitimate to call tcp_fragment with len == skb->len since that is done for FIN packets and the FIN flag counts as one byte. So we should only check for the len > skb->len case. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-20[DCCP]: Clear the IPCB areaHerbert Xu2-0/+3
Turns out the problem has nothing to do with use-after-free or double-free. It's just that we're not clearing the CB area and DCCP unlike TCP uses a CB format that's incompatible with IP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Ian McDonald <imcdnzl@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-20[DCCP]: Make dccp_write_xmit always free the packetHerbert Xu2-3/+2
icmp_send doesn't use skb->sk at all so even if skb->sk has already been freed it can't cause crash there (it would've crashed somewhere else first, e.g., ip_queue_xmit). I found a double-free on an skb that could explain this though. dccp_sendmsg and dccp_write_xmit are a little confused as to what should free the packet when something goes wrong. Sometimes they both go for the ball and end up in each other's way. This patch makes dccp_write_xmit always free the packet no matter what. This makes sense since dccp_transmit_skb which in turn comes from the fact that ip_queue_xmit always frees the packet. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-20[DCCP]: Use skb_set_owner_w in dccp_transmit_skb when skb->sk is NULLHerbert Xu1-4/+2
David S. Miller <davem@davemloft.net> wrote: > One thing you can probably do for this bug is to mark data packets > explicitly somehow, perhaps in the SKB control block DCCP already > uses for other data. Put some boolean in there, set it true for > data packets. Then change the test in dccp_transmit_skb() as > appropriate to test the boolean flag instead of "skb_cloned(skb)". I agree. In fact we already have that flag, it's called skb->sk. So here is patch to test that instead of skb_cloned(). Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Ian McDonald <imcdnzl@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2005-10-20Fixed oops if an uninitialized key is used for encryption.Hong Liu1-4/+7
Without this patch, if you try and use a key that has not been configured, for example: % iwconfig eth1 key deadbeef00 [2] without having configured key [1], then the active key will still be [1], but privacy will now be enabled. Transmission of a packet in this situation will result in a kernel oops. Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
2005-10-19Fixed problem with not being able to decrypt/encrypt broadcast packets.Hong Liu2-2/+4
Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
2005-10-18RPCSEC_GSS: krb5 cleanupJ. Bruce Fields3-33/+6
Remove some senseless wrappers. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18RPCSEC_GSS remove all qop parametersJ. Bruce Fields10-70/+33
Not only are the qop parameters that are passed around throughout the gssapi unused by any currently implemented mechanism, but there appears to be some doubt as to whether they will ever be used. Let's just kill them off for now. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18RPCSEC_GSS: Add support for privacy to krb5 rpcsec_gss mechanism.J. Bruce Fields6-6/+535
Add support for privacy to the krb5 rpcsec_gss mechanism. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18RPCSEC_GSS: krb5 pre-privacy cleanupJ. Bruce Fields3-59/+14
The code this was originally derived from processed wrap and mic tokens using the same functions. This required some contortions, and more would be required with the addition of xdr_buf's, so it's better to separate out the two code paths. In preparation for adding privacy support, remove the last vestiges of the old wrap token code. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18RPCSEC_GSS: Simplify rpcsec_gss crypto codeJ. Bruce Fields1-29/+77
Factor out some code that will be shared by privacy crypto routines Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18RPCSEC_GSS: client-side privacy supportJ. Bruce Fields1-1/+148
Add the code to the client side to handle privacy. This is dead code until we actually add privacy support to krb5. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18RPCSEC_GSS: cleanup au_rslack calculationJ. Bruce Fields1-14/+6
Various xdr encode routines use au_rslack to guess where the reply argument will end up, so we can set up the xdr_buf to recieve data into the right place for zero copy. Currently we calculate the au_rslack estimate when we check the verifier. Normally this only depends on the verifier size. In the integrity case we add a few bytes to allow for a length and sequence number. It's a bit simpler to calculate only the verifier size when we check the verifier, and delay the full calculation till we unwrap. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18SUNRPC: Retry wrap in case of memory allocation failure.J. Bruce Fields1-3/+10
For privacy we need to allocate extra pages to hold encrypted page data when wrapping requests. This allocation may fail, and we handle that case by waiting and retrying. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18SUNRPC: Provide a callback to allow free pages allocated during xdr encodingJ. Bruce Fields1-0/+3
For privacy, we need to allocate pages to store the encrypted data (passed in pages can't be used without the risk of corrupting data in the page cache). So we need a way to free that memory after the request has been transmitted. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18SUNRPC: Add support for privacy to generic gss-api code.J. Bruce Fields1-0/+22
Add support for privacy to generic gss-api code. This is dead code until we have both a mechanism that supports privacy and code in the client or server that uses it. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18RPC: stops the release_pipe() funtion from being called twiceSteve Dickson1-0/+2
This patch stops the release_pipe() funtion from being called twice by invalidating the ops pointer in the rpc_inode when rpc_pipe_release() is called. Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18[PATCH] ieee80211: division by zero fixJiri Benc1-9/+12
This fixes division by zero bug in ieee80211_wx_get_scan(). Signed-off-by: Jiri Benc <jbenc@suse.cz> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-10-18RPC: allow call_encode() to delay transmission of an RPC call.Trond Myklebust2-11/+20
Currently, call_encode will cause the entire RPC call to abort if it returns an error. This is unnecessarily rigid, and gets in the way of attempts to allow the NFSv4 layer to order RPC calls that carry sequence ids. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18SUNRPC: Retry rpcbind requests if the server's portmapper isn't upChuck Lever1-1/+2
After a server crash/reboot, rebinding should always retry, otherwise requests on "hard" mounts will fail when they shouldn't. Test plan: Run a lock-intensive workload against a server while rebooting the server repeatedly. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-10-18Merge branch 'master'Jeff Garzik1-1/+0
2005-10-18Merge /home/trondmy/scm/kernel/git/torvalds/linux-2.6Trond Myklebust133-1119/+1787
2005-10-15[NETFILTER]: Fix ip6_table.c build with NETFILTER_DEBUG enabled.Andrew Morton1-1/+0
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-13Merge branch 'master'Jeff Garzik64-220/+580
2005-10-13[TCP]: Ratelimit debugging warning.Herbert Xu1-5/+7
Better safe than sorry. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-13[NET]: Disable NET_SCH_CLK_CPU for SMP x86 hostsAndi Kleen1-1/+3
Opterons with frequency scaling have fully unsynchronized TSCs running at different frequencies, so using TSCs there is not a good idea. Also some other x86 boxes have this problem. gettimeofday should be good enough, so just disable it. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-13[NETFILTER]: Fix OOPSes on machines with discontiguous cpu numbering.David S. Miller4-26/+48
Original patch by Harald Welte, with feedback from Herbert Xu and testing by Sébastien Bernard. EBTABLES, ARP tables, and IP/IP6 tables all assume that cpus are numbered linearly. That is not necessarily true. This patch fixes that up by calculating the largest possible cpu number, and allocating enough per-cpu structure space given that. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-12[TCP]: Add code to help track down "BUG at net/ipv4/tcp_output.c:438!"Herbert Xu1-1/+8
This is the second report of this bug. Unfortunately the first reporter hasn't been able to reproduce it since to provide more debugging info. So let's apply this patch for 2.6.14 to 1) Make this non-fatal. 2) Provide the info we need to track it down. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-12[BRIDGE]: fix race on bridge del ifStephen Hemminger1-1/+1
This fixes the RCU race on bridge delete interface. Basically, the network device has to be detached from the bridge in the first step (pre-RCU), rather than later. At that point, no more bridge traffic will come in, and the other code will not think that network device is part of a bridge. This should also fix the XEN test problems. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10[TWSK]: Grab the module refcount for timewait socketsArnaldo Carvalho de Melo1-0/+1
This is required to avoid unloading a module that has active timewait sockets, such as DCCP. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10[DCCP]: Transition from PARTOPEN to OPEN when receiving DATA packetsArnaldo Carvalho de Melo1-1/+5
Noticed by Andrea Bittau, that provided a patch that was modified to not transition from RESPOND to OPEN when receiving DATA packets. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10[CCID]: Check if ccid is NULL in the hc_[tr]x_exit functionsArnaldo Carvalho de Melo1-2/+2
For consistency with ccid_exit and to fix a bug when IP_DCCP_UNLOAD_HACK is enabled as the control sock is not associated to any CCID. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10[NETFILTER] ctnetlink: add support to change protocol infoPablo Neira Ayuso1-0/+37
This patch add support to change the state of the private protocol information via conntrack_netlink. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10[NETFILTER] ctnetlink: allow userspace to change TCP statePablo Neira Ayuso1-0/+23
This patch adds the ability of changing the state a TCP connection. I know that this must be used with care but it's required to provide a complete conntrack creation via conntrack_netlink. So I'll document this aspect on the upcoming docs. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10[NETFILTER]: Use only 32bit counters for CONNTRACK_ACCTHarald Welte2-9/+12
Initially we used 64bit counters for conntrack-based accounting, since we had no event mechanism to tell userspace that our counters are about to overflow. With nfnetlink_conntrack, we now have such a event mechanism and thus can save 16bytes per connection. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10[IPSEC] Fix block size/MTU bugs in ESPHerbert Xu2-6/+7
This patch fixes the following bugs in ESP: * Fix transport mode MTU overestimate. This means that the inner MTU is smaller than it needs be. Worse yet, given an input MTU which is a multiple of 4 it will always produce an estimate which is not a multiple of 4. For example, given a standard ESP/3DES/MD5 transform and an MTU of 1500, the resulting MTU for transport mode is 1462 when it should be 1464. The reason for this is because IP header lengths are always a multiple of 4 for IPv4 and 8 for IPv6. * Ensure that the block size is at least 4. This is required by RFC2406 and corresponds to what the esp_output function does. At the moment this only affects crypto_null as its block size is 1. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10[IPSEC]: Use ALIGN macro in ESPHerbert Xu2-10/+12
This patch uses the macro ALIGN in all the applicable spots for ESP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10[NETFILTER] ctnetlink: add one nesting level for TCP statePablo Neira Ayuso1-0/+4
To keep consistency, the TCP private protocol information is nested attributes under CTA_PROTOINFO_TCP. This way the sequence of attributes to access the TCP state information looks like here below: CTA_PROTOINFO CTA_PROTOINFO_TCP CTA_PROTOINFO_TCP_STATE instead of: CTA_PROTOINFO CTA_PROTOINFO_TCP_STATE Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10[NETFILTER] ctnetlink: ICMP ID is not mandatoryPablo Neira Ayuso1-2/+1
The ID is only required by ICMP type 8 (echo), so it's not mandatory for all sort of ICMP connections. This patch makes mandatory only the type and the code for ICMP netlink messages. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10[NETFILTER] conntrack_netlink: Fix endian issue with status from userspaceHarald Welte1-1/+2
When we send "status" from userspace, we forget to convert the endianness. This patch adds the reqired conversion. Thanks to Pablo Neira for discovering this. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10[NETFILTER] nfnetlink: use highest bit of nfa_type to indicate nested TLVHarald Welte1-2/+2
As Henrik Nordstrom pointed out, all our efforts with "split endian" (i.e. host byte order tags, net byte order values) are useless, unless a parser can determine whether an attribute is nested or not. This patch steals the highest bit of nfattr.nfa_type to indicate whether the data payload contains a nested nfattr (1) or not (0). This will break userspace compatibility, but luckily no kernel with nfnetlink was released so far. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10[NETFILTER] ipt_ULOG: Mark ipt_ULOG as OBSOLETEHarald Welte1-1/+6
Similar to nfnetlink_queue and ip_queue, we mark ipt_ULOG as obsolete. This should have been part of the original nfnetlink_log merge, but I somehow missed it. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-10[NETFILTER] PPTP helper: Add missing Kconfig dependencyHarald Welte1-0/+1
PPTP should not be selectable without conntrack enabled Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-08[PATCH] gfp flags annotations - part 1Al Viro38-102/+95
- added typedef unsigned int __nocast gfp_t; - replaced __nocast uses for gfp flags with gfp_t - it gives exactly the same warnings as far as sparse is concerned, doesn't change generated code (from gcc point of view we replaced unsigned int with typedef) and documents what's going on far better. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-07[ATM]: [br2684] if we free the skb, we should return 0Jean-Denis Boyer1-1/+1
From: "Jean-Denis Boyer" <jdboyer@mediatrix.com> Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-06[ATM]: add support for LECS addresses learned from networkEric Kinzie3-23/+60
From: Eric Kinzie <ekinzie@cmf.nrl.navy.mil> Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-06[SCTP] Fix sctp_get{pl}addrs() API to work with 32-bit apps on 64-bit kernels.Ivan Skytte Jørgensen1-25/+227
The old socket options are marked with a _OLD suffix so that the existing 32-bit apps on 32-bit kernels do not break. Signed-off-by: Ivan Skytte Jørgensen <isj-sctp@i1.dk> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-05[AX.25]: Fix packet socket crashRalf Baechle2-2/+2
Since changeset 98a82febb6340466824c3a453738d4fbd05db81a AX.25 is passing received IP and ARP packets to the stack through netif_rx() but we don't set the skb->mac.raw to right value which may result in a crash with applications that use a packet socket. Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-05[IPSEC]: Document that policy direction is derived from the index.Herbert Xu2-5/+10
Here is a patch that adds a helper called xfrm_policy_id2dir to document the fact that the policy direction can be and is derived from the index. This is based on a patch by YOSHIFUJI Hideaki and 210313105@suda.edu.cn. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-05[IPV6]: Fix NS handing for proxy/anycast addressYOSHIFUJI Hideaki1-1/+1
Timer set up by pneigh_enqueue() ended up calling ndisc_rcv() via pndisc_redo(), which clears LOCALLY_ENQUEUED flag in NEIGH_CB(skb) and NS was queued again. Let's call ndisc_recv_ns() directly to avoid the loop. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-05[TCP]: BIC coding bug in Linux 2.6.13Stephen Hemminger1-1/+1
Missing parenthesis in causes BIC to be slow in increasing congestion window. Spotted by Injong Rhee. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-05[MCAST] ipv6: Fix address size in grec_sizeYan Zheng1-1/+1
Signed-Off-By: Yan Zheng <yanzheng@21cn.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-05Merge branch 'master'Jeff Garzik2-3/+24
2005-10-04[XFRM]: fix sparse gfp nocast warningsRandy Dunlap1-1/+1
Fix implicit nocast warnings in xfrm code: net/xfrm/xfrm_policy.c:232:47: warning: implicit cast to nocast type Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-04[RPC]: fix sparse gfp nocast warningsRandy Dunlap3-3/+3
Fix nocast sparse warnings: net/rxrpc/call.c:2013:25: warning: implicit cast to nocast type net/rxrpc/connection.c:538:46: warning: implicit cast to nocast type net/sunrpc/sched.c:730:36: warning: implicit cast to nocast type net/sunrpc/sched.c:734:56: warning: implicit cast to nocast type Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-04[AF_KEY]: fix sparse gfp nocast warningsRandy Dunlap1-3/+4
Fix implicit nocast warnings in net/key code: net/key/af_key.c:195:27: warning: implicit cast to nocast type net/key/af_key.c:1439:28: warning: implicit cast to nocast type Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-04[NETFILTER]: fix sparse gfp nocast warningsRandy Dunlap1-1/+2
Fix implicit nocast warnings in nfnetlink code: net/netfilter/nfnetlink.c:204:43: warning: implicit cast to nocast type Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-04[IPVS]: fix sparse gfp nocast warningsRandy Dunlap1-1/+1
From: Randy Dunlap <rdunlap@xenotime.net> Fix implicit nocast warnings in ip_vs code: net/ipv4/ipvs/ip_vs_app.c:631:54: warning: implicit cast to nocast type Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-04[DECNET]: fix sparse gfp nocast warningsRandy Dunlap2-11/+18
Fix implicit nocast warnings in decnet code: net/decnet/af_decnet.c:458:40: warning: implicit cast to nocast type net/decnet/dn_nsp_out.c:125:35: warning: implicit cast to nocast type net/decnet/dn_nsp_out.c:219:29: warning: implicit cast to nocast type Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-04[ATM]: fix sparse gfp nocast warningsRandy Dunlap1-1/+1
Fix implicit nocast warnings in atm code: net/atm/atm_misc.c:35:44: warning: implicit cast to nocast type drivers/atm/fore200e.c:183:33: warning: implicit cast to nocast type Also use kzalloc() instead of kmalloc(). Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-04[NETFILTER]: Fix Kconfig typoHorst H. von Brand1-1/+1
Signed-off-by: Horst H. von Brand <vonbrand@inf.utfsm.cl> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-04[IPV4]: fib_trie root-node expansionRobert Olsson1-2/+21
The patch below introduces special thresholds to keep root node in the trie large. This gives a flatter tree at the cost of a modest memory increase. Overall it seems to be gain and this was also proposed by one the authors of the paper in recent a seminar. Main table after loading 123 k routes. Aver depth: 3.30 Max depth: 9 Root-node size 12 bits Total size: 4044 kB With the patch: Aver depth: 2.78 Max depth: 8 Root-node size 15 bits Total size: 4150 kB An increase of 8-10% was seen in forwading performance for an rDoS attack. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-04[IPV6]: Fix infinite loop in udp_v6_get_port().YOSHIFUJI Hideaki1-1/+3
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-04Merge rsync://bughost.org/repos/ieee80211-delta/Jeff Garzik1-180/+134
2005-10-04Merge branch 'upstream-fixes'Jeff Garzik1-1/+1
2005-10-04[PATCH] ieee80211: fix gfp flags typeRandy Dunlap1-1/+1
Fix implicit nocast warnings in ieee80211 code, including __nocast: net/ieee80211/ieee80211_tx.c:215:9: warning: implicit cast to nocast type Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-10-03Merge branch 'upstream-fixes'Jeff Garzik85-946/+1246
2005-10-03[PATCH] ieee80211: fix gfp flags typeRandy Dunlap1-1/+1
Fix implicit nocast warnings in ieee80211 code: net/ieee80211/ieee80211_tx.c:215:9: warning: implicit cast to nocast type Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-10-03[IPV4]: Update icmp sysctl docs and disable broadcast ECHO/TIMESTAMP by defaultDavid S. Miller1-1/+1
It's not a good idea to be smurf'able by default. The few people who need this can turn it on. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03[IPV4]: Get rid of bogus __in_put_dev in pktgenHerbert Xu1-1/+0
This patch gets rid of a bogus __in_dev_put() in pktgen.c. This was spotted by Suzanne Wood. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03[IPV4]: Replace __in_dev_get with __in_dev_get_rcu/rtnlHerbert Xu17-37/+39
The following patch renames __in_dev_get() to __in_dev_get_rtnl() and introduces __in_dev_get_rcu() to cover the second case. 1) RCU with refcnt should use in_dev_get(). 2) RCU without refcnt should use __in_dev_get_rcu(). 3) All others must hold RTNL and use __in_dev_get_rtnl(). There is one exception in net/ipv4/route.c which is in fact a pre-existing race condition. I've marked it as such so that we remember to fix it. This patch is based on suggestions and prior work by Suzanne Wood and Paul McKenney. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03[IPV6]: Fix leak added by udp connect dst caching fix.David S. Miller1-4/+10
Based upon a patch from Mitsuru KANDA <mk@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03[IPV6]: Fix ipv6 fragment ID selection at slow pathYan Zheng1-1/+1
Signed-Off-By: Yan Zheng <yanzheng@21cn.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03[IPV4]: Fix "Proxy ARP seems broken"Herbert Xu1-6/+5
Meelis Roos <mroos@linux.ee> wrote: > RK> My firewall setup relies on proxyarp working. However, with 2.6.14-rc3, > RK> it appears to be completely broken. The firewall is 212.18.232.186, > > Same here with some kernel between 14-rc2 and 14-rc3 - no reposnse to > ARP on a proxyarp gateway. Sorry, no exact revison and no more debugging > yet since it'a a production gateway. The breakage is caused by the change to use the CB area for flagging whether a packet has been queued due to proxy_delay. This area gets cleared every time arp_rcv gets called. Unfortunately packets delayed due to proxy_delay also go through arp_rcv when they are reprocessed. In fact, I can't think of a reason why delayed proxy packets should go through netfilter again at all. So the easiest solution is to bypass that and go straight to arp_process. This is essentially what would've happened before netfilter support was added to ARP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03[NET]: Fix "sysctl_net.c:36: error: 'core_table' undeclared here"Russell King1-0/+2
During the build for ARM machine type "fortunet", this error occurred: CC net/sysctl_net.o net/sysctl_net.c:36: error: 'core_table' undeclared here (not in a function) It appears that the following configuration settings cause this error due to a missing include: CONFIG_SYSCTL=y CONFIG_NET=y # CONFIG_INET is not set core_table appears to be declared in net/sock.h. if CONFIG_INET were defined, net/sock.h would have been included via: sysctl_net.c -> net/ip.h -> linux/ip.h -> net/sock.h so include it directly. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03[INET]: speedup inet (tcp/dccp) lookupsEric Dumazet6-26/+29
Arnaldo and I agreed it could be applied now, because I have other pending patches depending on this one (Thank you Arnaldo) (The other important patch moves skc_refcnt in a separate cache line, so that the SMP/NUMA performance doesnt suffer from cache line ping pongs) 1) First some performance data : -------------------------------- tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established() The most time critical code is : sk_for_each(sk, node, &head->chain) { if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif)) goto hit; /* You sunk my battleship! */ } The sk_for_each() does use prefetch() hints but only the begining of "struct sock" is prefetched. As INET_MATCH first comparison uses inet_sk(__sk)->daddr, wich is far away from the begining of "struct sock", it has to bring into CPU cache cold cache line. Each iteration has to use at least 2 cache lines. This can be problematic if some chains are very long. 2) The goal ----------- The idea I had is to change things so that INET_MATCH() may return FALSE in 99% of cases only using the data already in the CPU cache, using one cache line per iteration. 3) Description of the patch --------------------------- Adds a new 'unsigned int skc_hash' field in 'struct sock_common', filling a 32 bits hole on 64 bits platform. struct sock_common { unsigned short skc_family; volatile unsigned char skc_state; unsigned char skc_reuse; int skc_bound_dev_if; struct hlist_node skc_node; struct hlist_node skc_bind_node; atomic_t skc_refcnt; + unsigned int skc_hash; struct proto *skc_prot; }; Store in this 32 bits field the full hash, not masked by (ehash_size - 1) Using this full hash as the first comparison done in INET_MATCH permits us immediatly skip the element without touching a second cache line in case of a miss. Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to sk_hash and tw_hash) already contains the slot number if we mask with (ehash_size - 1) File include/net/inet_hashtables.h 64 bits platforms : #define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\ (((__sk)->sk_hash == (__hash)) ((*((__u64 *)&(inet_sk(__sk)->daddr)))== (__cookie)) && \ ((*((__u32 *)&(inet_sk(__sk)->dport))) == (__ports)) && \ (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif)))) 32bits platforms: #define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\ (((__sk)->sk_hash == (__hash)) && \ (inet_sk(__sk)->daddr == (__saddr)) && \ (inet_sk(__sk)->rcv_saddr == (__daddr)) && \ (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif)))) - Adds a prefetch(head->chain.first) in __inet_lookup_established()/__tcp_v4_check_established() and __inet6_lookup_established()/__tcp_v6_check_established() and __dccp_v4_check_established() to bring into cache the first element of the list, before the {read|write}_lock(&head->lock); Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03[NET]: Fix packet timestamping.Herbert Xu7-17/+12
I've found the problem in general. It affects any 64-bit architecture. The problem occurs when you change the system time. Suppose that when you boot your system clock is forward by a day. This gets recorded down in skb_tv_base. You then wind the clock back by a day. From that point onwards the offset will be negative which essentially overflows the 32-bit variables they're stored in. In fact, why don't we just store the real time stamp in those 32-bit variables? After all, we're not going to overflow for quite a while yet. When we do overflow, we'll need a better solution of course. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03Lindent and trailing whitespace script executed ieee80211 subsystemJames Ketrenos1-14/+21
Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
2005-10-03When an assoc_resp is received the network structure is not completelyIvo van Doorn1-0/+32
initialized which can cause problems for drivers that expect the network structure to be completely filled in. This patch will make sure the network is filled in as much as possible. Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
2005-10-03Currently the info_element is parsed by 2 seperate functions, thisIvo van Doorn1-168/+99
results in a lot of duplicate code. This will move the parsing stage into a seperate function. Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
2005-10-03Fix implicit nocast warnings in ieee80211 code:Randy Dunlap1-1/+1
net/ieee80211/ieee80211_tx.c:215:9: warning: implicit cast to nocast type Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
2005-10-03This will move the ieee80211_is_ofdm_rate function to the ieee80211.hIvo van Doorn1-16/+0
header, and I also added the ieee80211_is_cck_rate counterpart. Various drivers currently create there own version of these functions, but I guess the ieee80211 stack is the best place to provide such routines. Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: James Ketrenos <jketreno@linux.intel.com>
2005-09-29[ATM]: [lec] reset retry counter when new arp issuedScott Talbert1-0/+6
From: Scott Talbert <scott.talbert@lmco.com> Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-29[ATM]: [lec] attempt to support cisco failoverScott Talbert1-3/+34
From: Scott Talbert <scott.talbert@lmco.com> Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-29[TCP]: Don't over-clamp window in tcp_clamp_window()Alexey Kuznetsov1-2/+0
From: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Handle better the case where the sender sends full sized frames initially, then moves to a mode where it trickles out small amounts of data at a time. This known problem is even mentioned in the comments above tcp_grow_window() in tcp_input.c, specifically: ... * The scheme does not work when sender sends good segments opening * window and then starts to feed us spagetti. But it should work * in common situations. Otherwise, we have to rely on queue collapsing. ... When the sender gives full sized frames, the "struct sk_buff" overhead from each packet is small. So we'll advertize a larger window. If the sender moves to a mode where small segments are sent, this ratio becomes tilted to the other extreme and we start overrunning the socket buffer space. tcp_clamp_window() tries to address this, but it's clamping of tp->window_clamp is a wee bit too aggressive for this particular case. Fix confirmed by Ion Badulescu. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-29[TCP]: Revert 6b251858d377196b8cea20e65cae60f584a42735David S. Miller1-5/+4
But retain the comment fix. Alexey Kuznetsov has explained the situation as follows: -------------------- I think the fix is incorrect. Look, the RFC function init_cwnd(mss) is not continuous: f.e. for mss=1095 it needs initial window 1095*4, but for mss=1096 it is 1096*3. We do not know exactly what mss sender used for calculations. If we advertised 1096 (and calculate initial window 3*1096), the sender could limit it to some value < 1096 and then it will need window his_mss*4 > 3*1096 to send initial burst. See? So, the honest function for inital rcv_wnd derived from tcp_init_cwnd() is: init_rcv_wnd(mss)= min { init_cwnd(mss1)*mss1 for mss1 <= mss } It is something sort of: if (mss < 1096) return mss*4; if (mss < 1096*2) return 1096*4; return mss*2; (I just scrablled a graph of piece of paper, it is difficult to see or to explain without this) I selected it differently giving more window than it is strictly required. Initial receive window must be large enough to allow sender following to the rfc (or just setting initial cwnd to 2) to send initial burst. But besides that it is arbitrary, so I decided to give slack space of one segment. Actually, the logic was: If mss is low/normal (<=ethernet), set window to receive more than initial burst allowed by rfc under the worst conditions i.e. mss*4. This gives slack space of 1 segment for ethernet frames. For msses slighlty more than ethernet frame, take 3. Try to give slack space of 1 frame again. If mss is huge, force 2*mss. No slack space. Value 1460*3 is really confusing. Minimal one is 1096*2, but besides that it is an arbitrary value. It was meant to be ~4096. 1460*3 is just the magic number from RFC, 1460*3 = 1095*4 is the magic :-), so that I guess hands typed this themselves. -------------------- Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-29Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds13-118/+150
2005-09-29[PATCH] proc_mkdir() should be used to create procfs directoriesAl Viro2-18/+7
A bunch of create_proc_dir_entry() calls creating directories had crept in since the last sweep; converted to proc_mkdir(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-28[NET]: Fix reversed logic in eth_type_trans().David S. Miller1-1/+1
I got the second compare_eth_addr() test reversed, oops. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-28[ATM]: fix bug in atm address list handlingMartin Whitaker1-2/+4
From: Martin Whitaker <atm@martin-whitaker.co.uk> Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
2005-09-28[ATM]: track and close listen sockets when sigd exitsChas Williams3-6/+7
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
2005-09-28[ATM]: net/atm/ioctl.c: autoload pppoatm and br2684Roman Kagan1-8/+26
Signed-off-by: Roman Kagan <rkagan@mail.ru> Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
2005-09-28[TCP]: Fix init_cwnd calculations in tcp_select_initial_window()David S. Miller1-5/+6
Match it up to what RFC2414 really specifies. Noticed by Rick Jones. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-27[APPLETALK]: Fix broadcast bug.Oliver Dawid1-9/+22
From: Oliver Dawid <oliver@helios.de> we found a bug in net/appletalk/ddp.c concerning broadcast packets. In kernel 2.4 it was working fine. The bug first occured 4 years ago when switching to new SNAP layer handling. This bug can be splitted up into a sending(1) and reception(2) problem: Sending(1) In kernel 2.4 broadcast packets were sent to a matching ethernet device and atalk_rcv() was called to receive it as "loopback" (so loopback packets were shortcutted and handled in DDP layer). When switching to the new SNAP structure, this shortcut was removed and the loopback packet was send to SNAP layer. The author forgot to replace the remote device pointer by the loopback device pointer before sending the packet to SNAP layer (by calling ddp_dl->request() ) therfor the packet was not sent back by underlying layers to ddp's atalk_rcv(). Reception(2) In atalk_rcv() a packet received by this loopback mechanism contains now the (rigth) loopback device pointer (in Kernel 2.4 it was the (wrong) remote ethernet device pointer) and therefor no matching socket will be found to deliver this packet to. Because a broadcast packet should be send to the first matching socket (as it is done in many other protocols (?)), we removed the network comparison in broadcast case. Below you will find a patch to correct this bug. Its diffed to kernel 2.6.14-rc1 Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-27[NET]: Slightly optimize ethernet address comparison.David S. Miller1-10/+21
We know the thing is at least 2-byte aligned, so take advantage of that instead of invoking memcmp() which results in truly horrifically inefficient code because it can't assume anything about alignment. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-27[ROSE]: fix typo (regeistration)Alexey Dobriyan1-1/+1
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-27[ROSE]: check rose_ndevs earlierAlexey Dobriyan1-9/+11
* Don't bother with proto registering if rose_ndevs is bad. * Make escape structure more coherent. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-27[ROSE]: return sane -E* from rose_proto_init()Alexey Dobriyan1-4/+6
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-27[ROSE]: do proto_unregister() on exit pathsAlexey Dobriyan1-0/+2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-27[NET]: Fix module reference counts for loadable protocol modulesFrank Filz2-13/+20
I have been experimenting with loadable protocol modules, and ran into several issues with module reference counting. The first issue was that __module_get failed at the BUG_ON check at the top of the routine (checking that my module reference count was not zero) when I created the first socket. When sk_alloc() is called, my module reference count was still 0. When I looked at why sctp didn't have this problem, I discovered that sctp creates a control socket during module init (when the module ref count is not 0), which keeps the reference count non-zero. This section has been updated to address the point Stephen raised about checking the return value of try_module_get(). The next problem arose when my socket init routine returned an error. This resulted in my module reference count being decremented below 0. My socket ops->release routine was also being called. The issue here is that sock_release() calls the ops->release routine and decrements the ref count if sock->ops is not NULL. Since the socket probably didn't get correctly initialized, this should not be done, so we will set sock->ops to NULL because we will not call try_module_get(). While searching for another bug, I also noticed that sys_accept() has a possibility of doing a module_put() when it did not do an __module_get so I re-ordered the call to security_socket_accept(). Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-27[NET]: Prefetch dev->qdisc_lock in dev_queue_xmit()Eric Dumazet1-0/+2
We know the lock is going to be taken. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-27[NET]: Use non-recursive algorithm in skb_copy_datagram_iovec()Daniel Phillips1-55/+26
Use iteration instead of recursion. Fraglists within fraglists should never occur, so we BUG check this. Signed-off-by: Daniel Phillips <phillips@istop.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-27[NEIGH]: Add debugging check when adding timers.David S. Miller1-9/+14
If we double-add a neighbour entry timer, which should be impossible but has been reported, dump the current state of the entry so that we can debug this. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-26Merge master.kernel.org:/pub/scm/linux/kernel/git/acme/llc-2.6David S. Miller18-604/+806
2005-09-26[NETFILTER]: Fix invalid module autoloading by splitting iptable_natHarald Welte4-34/+35
When you've enabled conntrack and NAT as a module (standard case in all distributions), and you've also enabled the new conntrack netlink interface, loading ip_conntrack_netlink.ko will auto-load iptable_nat.ko. This causes a huge performance penalty, since for every packet you iterate the nat code, even if you don't want it. This patch splits iptable_nat.ko into the NAT core (ip_nat.ko) and the iptables frontend (iptable_nat.ko). Threfore, ip_conntrack_netlink.ko will only pull ip_nat.ko, but not the frontend. ip_nat.ko will "only" allocate some resources, but not affect runtime performance. This separation is also a nice step in anticipation of new packet filters (nf-hipac, ipset, pkttables) being able to use the NAT core. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-26[AF_PACKET]: Remove bogus checks added to packet_sendmsg().David S. Miller1-6/+0
These broke existing apps, and the checks are superfluous as the values being verified aren't even used. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-26[IPV6]: Fix [Bug 5306] Oops on IPv6 route lookupHerbert Xu1-0/+2
> Steps to reproduce: > 1. Boot Linux, do NOT setup any IPv6 routes > 2. ip route get 2001::1 (or any unroutable address) Well caught. We never set rt6i_idev on ip6_null_entry. This patch should make the problem go away. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-26[NET]: Make sure ctl buffer is aligned properly in sys_sendmsg().Alex Williamson1-1/+3
It's on the stack and declared as "unsigned char[]", but pointers and similar can be in here thus we need to give it an explicit alignment attribute. Signed-off-by: Alex Williamson <alex.williamson@hp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-24[NETFILTER] ip_conntrack: Update event cache when status changesHarald Welte3-1/+4
The GRE, SCTP and TCP protocol helpers did not call ip_conntrack_event_cache() when updating ct->status. This patch adds the respective calls. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-24[IRDA]: *irttp cleanupAlexey Dobriyan1-11/+4
* Remove useless comment. * Remove useless assertions. * Remove useless comparison. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-24[IRDA]: Fix memory leak in irttp_init()Alexey Dobriyan1-0/+1
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-24[NET]: Protect neigh_stat_seq_fops by CONFIG_PROC_FSAmos Waterland1-0/+2
From: Amos Waterland <apw@us.ibm.com> If CONFIG_PROC_FS is not selected, the compiler emits this warning: net/core/neighbour.c:64: warning: `neigh_stat_seq_fops' defined but not used Which is correct, because neigh_stat_seq_fops is in fact only initialized and used by code that is protected by CONFIG_PROC_FS. So this patch fixes that up. Signed-off-by: Amos Waterland <apw@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-24[NETFILTER]: Fix ip[6]t_NFQUEUE Kconfig dependencyHarald Welte4-2/+24
We have to introduce a separate Kconfig menu entry for the NFQUEUE targets. They cannot "just" depend on nfnetlink_queue, since nfnetlink_queue could be linked into the kernel, whereas iptables can be a module. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-23SUNRPC: fix bug in patch "portmapper doesn't need a reserved port"Chuck Lever1-6/+7
The in-kernel portmapper does in fact need a reserved port when registering new services, but not when performing bind queries. Ensure that we distinguish between the two cases. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23Revert "[PATCH] RPC,NFS: new rpc_pipefs patch"Trond Myklebust3-133/+197
This reverts 17f4e6febca160a9f9dd4bdece9784577a2f4524 commit.
2005-09-23[PATCH] RPC,NFS: new rpc_pipefs patchChristoph Hellwig3-197/+133
Currently rpc_mkdir/rpc_rmdir and rpc_mkpipe/mk_unlink have an API that's a little unfortunate. They take a path relative to the rpc_pipefs root and thus need to perform a full lookup. If you look at debugfs or usbfs they always store the dentry for directories they created and thus can pass in a dentry + single pathname component pair into their equivalents of the above functions. And in fact rpc_pipefs actually stores a dentry for all but one component so this change not only simplifies the core rpc_pipe code but also the callers. Unfortuntately this code path is only used by the NFS4 idmapper and AUTH_GSSAPI for which I don't have a test enviroment. Could someone give it a spin? It's the last bit needed before we can rework the lookup_hash API Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: rationalize set_buffer_sizeChuck Lever2-23/+17
In fact, ->set_buffer_size should be completely functionless for non-UDP. Test-plan: Check socket buffer size on UDP sockets over time. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: parametrize various transport connect timeoutsChuck Lever3-6/+69
Each transport implementation can now set unique bind, connect, reestablishment, and idle timeout values. These are variables, allowing the values to be modified dynamically. This permits exponential backoff of any of these values, for instance. As an example, we implement exponential backoff for the connection reestablishment timeout. Test-plan: Destructive testing (unplugging the network temporarily). Connectathon with UDP and TCP. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: make sure to get the same local port number when reconnectingChuck Lever1-12/+53
Implement a best practice: if the remote end drops our connection, try to reconnect using the same port number. This is important because the NFS server's Duplicate Reply Cache often hashes on the source port number. If the client reuses the port number when it reconnects, the server's DRC will be more effective. Based on suggestions by Mike Eisler, Olaf Kirch, and Alexey Kuznetsky. Test-plan: Destructive testing. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: allow RPC client's port range to be adjustableChuck Lever2-15/+37
Select an RPC client source port between 650 and 1023 instead of between 1 and 800. The old range conflicts with a number of network services. Provide sysctls to allow admins to select a different port range. Note that this doesn't affect user-level RPC library behavior, which still uses 1 to 800. Based on a suggestion by Olaf Kirch <okir@suse.de>. Test-plan: Repeated mount and unmount. Destructive testing. Idle timeouts. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: clean up after nocong was removedChuck Lever2-12/+19
Clean-up: Move some macros that are specific to the Van Jacobson implementation into xprt.c. Get rid of the cong_wait field in rpc_xprt, which is no longer used. Get rid of xprt_clear_backlog. Test-plan: Compile with CONFIG_NFS enabled. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: remove xprt->nocongChuck Lever1-2/+0
Get rid of the "xprt->nocong" variable. Test-plan: Use WAN simulation to cause sporadic bursty packet loss with UDP mounts. Look for significant regression in performance or client stability. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: add a release_rqst callout to the RPC transport switchChuck Lever2-1/+14
The final place where congestion control state is adjusted is in xprt_release, where each request is finally released. Add a callout there to allow transports to perform additional processing when a request is about to be released. Test-plan: Use WAN simulation to cause sporadic bursty packet loss. Look for significant regression in performance or client stability. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: add generic interface for adjusting the congestion windowChuck Lever2-48/+31
A new interface that allows transports to adjust their congestion window using the Van Jacobson implementation in xprt.c is provided. Test-plan: Use WAN simulation to cause sporadic bursty packet loss. Look for significant regression in performance or client stability. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: separate xprt_timer implementationsChuck Lever2-25/+32
Allow transports to hook the retransmit timer interrupt. Some transports calculate their congestion window here so that a retransmit timeout has immediate effect on the congestion window. Test-plan: Use WAN simulation to cause sporadic bursty packet loss. Look for significant regression in performance or client stability. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: expose API for serializing access to RPC transportsChuck Lever2-14/+65
The next method we abstract is the one that releases a transport, allowing another task to have access to the transport. Again, one generic version of this is provided for transports that don't need the RPC client to perform congestion control, and one version is for transports that can use the original Van Jacobson implementation in xprt.c. Test-plan: Use WAN simulation to cause sporadic bursty packet loss. Look for significant regression in performance or client stability. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: expose API for serializing access to RPC transportsChuck Lever2-12/+54
The next several patches introduce an API that allows transports to choose whether the RPC client provides congestion control or whether the transport itself provides it. The first method we abstract is the one that serializes access to the RPC transport to prevent the bytes from different requests from mingling together. This method provides proper request serialization and the opportunity to prevent new requests from being started because the transport is congested. The normal situation is for the transport to handle congestion control itself. Although NFS over UDP was first, it has been recognized after years of experience that having the transport provide congestion control is much better than doing it in the RPC client. Thus TCP, and probably every future transport implementation, will use the default method, xprt_lock_write, provided in xprt.c, which does not provide any kind of congestion control. UDP can continue using the xprt.c-provided Van Jacobson congestion avoidance implementation. Test-plan: Use WAN simulation to cause sporadic bursty packet loss. Look for significant regression in performance or client stability. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: add API to set transport-specific timeoutsChuck Lever2-22/+47
Prepare the way to remove the "xprt->nocong" variable by adding a callout to the RPC client transport switch API to handle setting RPC retransmit timeouts. Add a pair of generic helper functions that provide the ability to set a simple fixed timeout, or to set a timeout based on the state of a round- trip estimator. Test-plan: Use WAN simulation to cause sporadic bursty packet loss. Look for significant regression in performance or client stability. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: get rid of xprt->streamChuck Lever2-12/+19
Now we can fix up the last few places that use the "xprt->stream" variable, and get rid of it from the rpc_xprt structure. Test-plan: Destructive testing (unplugging the network temporarily). Connectathon with UDP and TCP. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: skip over transport-specific heads automaticallyChuck Lever3-14/+21
Add a generic mechanism for skipping over transport-specific headers when constructing an RPC request. This removes another "xprt->stream" dependency. Test-plan: Write-intensive workload on a single mount point (try both UDP and TCP). Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: separate TCP and UDP socket write pathsChuck Lever1-87/+128
Split the RPC client's main socket write path into a TCP version and a UDP version to eliminate another dependency on the "xprt->stream" variable. Compiler optimization removes unneeded code from xs_sendpages, as this function is now called with some constant arguments. We can now cleanly perform transport protocol-specific return code testing and error recovery in each path. Test-plan: Millions of fsx operations. Performance characterization such as "sio" or "iozone". Examine oprofile results for any changes before and after this patch is applied. Version: Thu, 11 Aug 2005 16:08:46 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: separate TCP and UDP transport connection logicChuck Lever1-73/+91
Create separate connection worker functions for managing UDP and TCP transport sockets. This eliminates several dependencies on "xprt->stream". Test-plan: Destructive testing (unplugging the network temporarily). Connectathon with v2, v3, and v4. Version: Thu, 11 Aug 2005 16:08:18 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: separate TCP and UDP write space callbacksChuck Lever2-31/+87
Split the socket write space callback function into a TCP version and UDP version, eliminating one dependence on the "xprt->stream" variable. Keep the common pieces of this path in xprt.c so other transports can use it too. Test-plan: Write-intensive workload on a single mount point. Version: Thu, 11 Aug 2005 16:07:51 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: client-side transport switch cleanupChuck Lever3-20/+20
Clean-up: change some comments to reflect the realities of the new RPC transport switch mechanism. Get rid of unused xprt_receive() prototype. Also, organize function prototypes in xprt.h by usage and scope. Test-plan: Compile kernel with CONFIG_NFS enabled. Version: Thu, 11 Aug 2005 16:07:21 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: Add helper for waking tasks pending on a transportChuck Lever2-7/+18
Clean-up: remove only reference to xprt->pending from the socket transport implementation. This makes a cleaner interface for other transport implementations as well. Test-plan: Compile kernel with CONFIG_NFS enabled. Version: Thu, 11 Aug 2005 16:06:52 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: Eliminate socket.h includes in RPC clientChuck Lever8-11/+0
Clean-up: get rid of unnecessary socket.h and in.h includes in the generic parts of the RPC client. Test-plan: Compile kernel with CONFIG_NFS enabled. Version: Thu, 11 Aug 2005 16:06:23 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: rename the sockstate fieldChuck Lever2-11/+9
Clean-up: get rid of a name reference to sockets in the generic parts of the RPC client by renaming the sockstate field in the rpc_xprt structure. Test-plan: Compile kernel with CONFIG_NFS enabled. Version: Thu, 11 Aug 2005 16:05:53 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: Rename xprt_lockChuck Lever1-5/+5
Clean-up: Replace the xprt_lock with something more aptly named. This lock single-threads the XID and request slot reservation process. Test-plan: Compile kernel with CONFIG_NFS enabled. Version: Thu, 11 Aug 2005 16:05:26 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: Rename sock_lockChuck Lever2-33/+33
Clean-up: replace a name reference to sockets in the generic parts of the RPC client by renaming sock_lock in the rpc_xprt structure. Test-plan: Compile kernel with CONFIG_NFS enabled. Version: Thu, 11 Aug 2005 16:05:00 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: Reduce stack utilization in xs_sendpagesChuck Lever1-30/+43
Reduce stack utilization of the RPC socket transport's send path. A couple of unlikely()s are added to ensure the compiler places the tail processing at the end of the csect. Test-plan: Millions of fsx operations. Performance characterization such as "sio" or "iozone". Version: Thu, 11 Aug 2005 16:04:30 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: transport switch function namingChuck Lever2-301/+310
Introduce block header comments and a function naming convention to the socket transport implementation. Provide a debug setting for transports that is separate from RPCDBG_XPRT. Eliminate xprt_default_timeout(). Provide block comments for exposed interfaces in xprt.c, and eliminate the useless obvious comments. Convert printk's to dprintk's. Test-plan: Compile kernel with CONFIG_NFS enabled. Version: Thu, 11 Aug 2005 16:04:04 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: introduce client-side transport switchChuck Lever6-977/+1070
Move the bulk of client-side socket-specific code into a separate source file, net/sunrpc/xprtsock.c. Test-plan: Millions of fsx operations. Performance characterization such as "sio" or "iozone". Destructive testing (unplugging the network temporarily, server reboots). Connectathon with v2, v3, and v4. Version: Thu, 11 Aug 2005 16:03:38 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: extract socket logic common to both client and serverChuck Lever5-143/+176
Clean-up: Move some code that is common to both RPC client- and server-side socket transports into its own source file, net/sunrpc/socklib.c. Test-plan: Compile kernel with CONFIG_NFS enabled. Millions of fsx operations over UDP, client and server. Connectathon over UDP. Version: Thu, 11 Aug 2005 16:03:09 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: portmapper doesn't need a reserved portChuck Lever1-0/+1
The in-kernel portmapper does not require a reserved port for making bind queries. Test-plan: Tens of runs of the Connectathon locking suite with TCP and UDP against several other NFS server implementations using NFSv3, not NFSv4 (which doesn't require rpcbind). Version: Thu, 11 Aug 2005 16:02:43 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] NFS: use a constant value for TCP retransmit timeoutsChuck Lever1-2/+2
Implement a best practice: don't use exponential backoff when computing retransmit timeout values on TCP connections, but simply retransmit at regular intervals. This also fixes a bug introduced when xprt_reset_majortimeo() was added. Test-plan: Enable RPC debugging and watch timeout behavior on a NFS/TCP mount. Version: Thu, 11 Aug 2005 16:02:19 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: proper soft timeout behavior for rpcbindChuck Lever1-20/+77
Implement a best practice: for soft mounts, an rpcbind timeout should cause an RPC request to fail. This also provides an FSM hook for retrying an rpcbind with a different rpcbind protocol version. We'll use this later to try multiple rpcbind protocol versions when binding. To enable this, expose the RPC error code returned during a portmap request to the FSM so it can make some decision about how to report, retry, or fail the request. Test-plan: Hundreds of passes with connectathon NFSv3 locking suite, on the client and server. Version: Thu, 11 Aug 2005 16:01:53 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23[PATCH] RPC: Report connection errors properly when mounting with "soft"Chuck Lever1-9/+18
Fix up xprt_connect_status: the soft timeout logic was clobbering tk_status, so TCP connect errors were not properly reported on soft mounts. Test-plan: Destructive testing (unplugging the network temporarily). Connectathon with UDP and TCP. Version: Thu, 11 Aug 2005 16:01:28 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>