aboutsummaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2005-08-11[NETPOLL]: remove unused variableMatt Mackall1-1/+0
Remove unused variable Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-11[NETPOLL]: fix initialization/NAPI raceMatt Mackall2-4/+8
This fixes a race during initialization with the NAPI softirq processing by using an RCU approach. This race was discovered when refill_skbs() was added to the setup code. Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-11[NETPOLL]: pre-fill skb poolIngo Molnar1-0/+4
we could do one thing (see the patch below): i think it would be useful to fill up the netlogging skb queue straight at initialization time. Especially if netpoll is used for dumping alone, the system might not be in a situation to fill up the queue at the point of crash, so better be a bit more prepared and keep the pipeline filled. [ I've modified this to be called earlier - mpm ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-11[NETPOLL]: add retry timeoutMatt Mackall1-3/+10
Add limited retry logic to netpoll_send_skb Each time we attempt to send, decrement our per-device retry counter. On every successful send, we reset the counter. We delay 50us between attempts with up to 20000 retries for a total of 1 second. After we've exhausted our retries, subsequent failed attempts will try only once until reset by success. Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-11[NETPOLL]: netpoll_send_skb simplifyMatt Mackall1-20/+22
Minor netpoll_send_skb restructuring Restructure to avoid confusing goto and move some bits out of the retry loop. Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-11[NETPOLL]: deadlock bugfixJeff Moyer1-3/+0
This fixes an obvious deadlock in the netpoll code. netpoll_rx takes the npinfo->rx_lock. netpoll_rx is also the only caller of arp_reply (through __netpoll_rx). As such, it is not necessary to take this lock. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-11[NETPOLL]: rx_flags bugfixJeff Moyer1-0/+1
Initialize npinfo->rx_flags. The way it stands now, this will have random garbage, and so will incur a locking penalty even when an rx_hook isn't registered and we are not active in the netpoll polling code. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-10[TCP]: Adjust {p,f}ackets_out correctly in tcp_retransmit_skb()Herbert Xu1-4/+10
Well I've only found one potential cause for the assertion failure in tcp_mark_head_lost. First of all, this can only occur if cnt > 1 since tp->packets_out is never zero here. If it did hit zero we'd have much bigger problems. So cnt is equal to fackets_out - reordering. Normally fackets_out is less than packets_out. The only reason I've found that might cause fackets_out to exceed packets_out is if tcp_fragment is called from tcp_retransmit_skb with a TSO skb and the current MSS is greater than the MSS stored in the TSO skb. This might occur as the result of an expiring dst entry. In that case, packets_out may decrease (line 1380-1381 in tcp_output.c). However, fackets_out is unchanged which means that it may in fact exceed packets_out. Previously tcp_retrans_try_collapse was the only place where packets_out can go down and it takes care of this by decrementing fackets_out. So we should make sure that fackets_out is reduced by an appropriate amount here as well. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-10[DECNET]: Use sk_stream_error function rather than DECnet's ownSteven Whitehouse1-10/+1
Signed-off-by: Steven Whitehouse <steve@chygwyn.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-09[NET]: Fix memory leak in sys_{send,recv}msg() w/compatAndrew Morton1-9/+0
From: Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com> sendmsg()/recvmsg() syscalls from o32/n32 apps to a 64bit kernel will cause a kernel memory leak if iov_len > UIO_FASTIOV for each syscall! This is because both sys_sendmsg() and verify_compat_iovec() kmalloc a new iovec structure. Only the one from sys_sendmsg() is free'ed. I wrote a simple test program to confirm this after identifying the problem: http://davej.org/programs/testsendmsg.c Note that the below fix will break solaris_sendmsg()/solaris_recvmsg() as it also calls verify_compat_iovec() but expects it to malloc internally. [ I fixed that. -DaveM ] Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-09[SUNRPC]: Fix nsec --> usec conversion.David S. Miller1-1/+1
We need to divide, not multiply. While we're here, use NSEC_PER_USEC instead of a magic constant. Based upon a report from Josip Loncaric and a patch by Andrew Morton. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-08Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds6-33/+32
2005-08-08[IPV4]: Debug cleanupHeikki Orsila4-33/+26
Here's a small patch to cleanup NETDEBUG() use in net/ipv4/ for Linux kernel 2.6.13-rc5. Also weird use of indentation is changed in some places. Signed-off-by: Heikki Orsila <heikki.orsila@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-08[PATCH] don't try to do any NAT on untracked connectionsHarald Welte1-0/+4
With the introduction of 'rustynat' in 2.6.11, the old tricks of preventing NAT of 'untracked' connections (e.g. NOTRACK target in 'raw' table) are no longer sufficient. The ip_conntrack_untracked.status |= IPS_NAT_DONE_MASK effectively prevents iteration of the 'nat' table, but doesn't prevent nat_packet() to be executed. Since nr_manips is gone in 'rustynat', nat_packet() now implicitly thinks that it has to do NAT on the packet. This patch fixes that problem by explicitly checking for ip_conntrack_untracked in ip_nat_fn(). Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-06[IPSEC]: Restrict socket policy loading to CAP_NET_ADMIN.Herbert Xu2-0/+6
The interface needs much redesigning if we wish to allow normal users to do this in some way. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-06[Bluetooth] Add direction and timestamp to stack internal eventsMarcel Holtmann1-0/+3
This patch changes the direction to incoming and adds the timestamp to all stack internal events. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2005-08-06[Bluetooth] Remove unused functions and cleanup symbol exportsMarcel Holtmann3-28/+0
This patch removes the unused bt_dump() function and it also removes its BT_DMP macro. It also unexports the hci_dev_get(), hci_send_cmd() and hci_si_event() functions. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2005-08-06[Bluetooth] Revert session reference counting fixMarcel Holtmann1-4/+0
The fix for the reference counting problem of the signal DLC introduced a race condition which leads to an oops. The reason for it is not fully understood by now and so revert this fix, because the reference counting problem is not crashing the RFCOMM layer and its appearance it rare. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2005-08-05[IPV4]: Fix memory leak during fib_info hash expansion.David S. Miller1-1/+8
When we grow the tables, we forget to free the olds ones up. Noticed by Yan Zheng. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-08-04[PATCH] tcp: fix TSO cwnd caching bugHerbert Xu1-25/+9
tcp_write_xmit caches the cwnd value indirectly in cwnd_quota. When tcp_transmit_skb reduces the cwnd because of tcp_enter_cwr, the cached value becomes invalid. This patch ensures that the cwnd value is always reread after each tcp_transmit_skb call. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-04[PATCH] tcp: fix TSO sizing bugsDavid S. Miller1-28/+28
MSS changes can be lost since we preemptively initialize the tso_segs count for an SKB before we %100 commit to sending it out. So, by the time we send it out, the tso_size information can be stale due to PMTU events. This mucks up all of the logic in our send engine, and can even result in the BUG() triggering in tcp_tso_should_defer(). Another problem we have is that we're storing the tp->mss_cache, not the SACK block normalized MSS, as the tso_size. That's wrong too. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-30[NET] Fix too aggressive backoff in dst garbage collectionDenis Lunev1-4/+11
The bug is evident when it is seen once. dst gc timer was backed off, when gc queue is not empty. But this means that timer quickly backs off, if at least one destination remains in use. Normally, the bug is invisible, because adding new dst entry to queue cancels the backoff. But it shots deadly with destination cache overflow when new destinations are not released for long time f.e. after an interface goes down. The fix is to cancel backoff when something was released. Signed-off-by: Denis Lunev <den@sw.ru> Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-30[NET]: fix oops after tunnel module unloadAlexey Kuznetsov3-7/+55
Tunnel modules used to obtain module refcount each time when some tunnel was created, which meaned that tunnel could be unloaded only after all the tunnels are deleted. Since killing old MOD_*_USE_COUNT macros this protection has gone. It is possible to return it back as module_get/put, but it looks more natural and practically useful to force destruction of all the child tunnels on module unload. Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-30[NETFILTER] Inherit masq_index to slave connectionsHarald Welte1-0/+5
masq_index is used for cleanup in case the interface address changes (such as a dialup ppp link with dynamic addreses). Without this patch, slave connections are not evicted in such a case, since they don't inherit masq_index. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-30[NET]: Spelling mistakes threshoulds -> thresholdsBaruch Even1-3/+3
Just simple spelling mistake fixes. Signed-Off-By: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-28[NET]: Fix busy waiting in dev_close().David S. Miller1-2/+1
If the current task has signal_pending(), the loop we have to wait for the __LINK_STATE_RX_SCHED bit to clear becomes a pure busy-loop. Fixed by using msleep() instead of the hand-crafted version. Noticed by Andrew Morton. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-27Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds6-69/+46
2005-07-27[PATCH] clean up inline static vs static inlineJesper Juhl1-1/+1
`gcc -W' likes to complain if the static keyword is not at the beginning of the declaration. This patch fixes all remaining occurrences of "inline static" up with "static inline" in the entire kernel tree (140 occurrences in 47 files). While making this change I came across a few lines with trailing whitespace that I also fixed up, I have also added or removed a blank line or two here and there, but there are no functional changes in the patch. Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27[PATCH] turn many #if $undefined_string into #ifdef $undefined_stringOlaf Hering1-6/+1
turn many #if $undefined_string into #ifdef $undefined_string to fix some warnings after -Wno-def was added to global CFLAGS Signed-off-by: Olaf Hering <olh@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27[NET]: Move in_aton from net/ipv4/utils.c to net/core/utils.cMatt Mackall3-61/+37
Move in_aton to allow netpoll and pktgen to work without the rest of the IPv4 stack. Fix whitespace and add comment for the odd placement. Delete now-empty net/ipv4/utils.c Re-enable netpoll/netconsole without CONFIG_INET Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-27[NETFILTER]: Fix -Wunder error in ip_conntrack_core.cNick Sillik1-1/+1
Signed-off-by: Nick Sillik <n.sillik@temple.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-27[NET]: Fix setsockopt locking bugKyle Moffett1-6/+7
On Sparc, SO_DONTLINGER support resulted in sock_reset_flag being called without lock_sock(). Signed-off-by: Kyle Moffett <mrmacman_g4@mac.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-27[IPV4]: Fix Kconfig syntax errorHans-Juergen Tappe (SYSGO AG)1-1/+1
From: "Hans-Juergen Tappe (SYSGO AG)" <hjt@sysgo.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-26[XFRM]: Fix possible overflow of sock->sk_policyHerbert Xu1-0/+3
Spotted by, and original patch by, Balazs Scheidler. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-24[EMATCH]: Remove feature ifdefs in meta ematch.Patrick McHardy1-8/+8
Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-24[IPV6]: fix implicit declaration of function `xfrm6_tunnel_unregister'Cal Peake1-1/+1
Signed-off-by: Cal Peake <cp@absolutedigital.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-22[PKT_SCHED]: em_meta: Kill TCF_META_ID_{INDEV,SECURITY,TCVERDICT}David S. Miller1-25/+3
More unusable TCF_META_* match types that need to get eliminated before 2.6.13 goes out the door. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Thomas Graf <tgraf@suug.ch>
2005-07-22[NETFILTER]: Fix ip6t_LOG MAC formatPatrick McHardy1-4/+7
I broke this in the patch that consolidated MAC logging. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-22[NETFILTER]: Use correct byteorder in ICMP NATPatrick McHardy1-3/+4
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-22[NETFILTER]: Wait until all references to ip_conntrack_untracked are dropped ↵Patrick McHardy1-0/+3
on unload Fixes a crash when unloading ip_conntrack. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-22[NETFILTER]: Fix potential memory corruption in NAT code (aka memory NAT)Patrick McHardy2-2/+4
The portptr pointing to the port in the conntrack tuple is declared static, which could result in memory corruption when two packets of the same protocol are NATed at the same time and one conntrack goes away. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-22[NETFILTER]: Fix deadlock in ip6_queuePatrick McHardy1-0/+2
Already fixed in ip_queue, ip6_queue was missed. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-22[PKT_SCHED]: Kill TCF_META_ID_REALDEV from meta ematch.David S. Miller1-12/+0
It won't exist any longer when we shrink the SKB in 2.6.14, and we should kill this off before anyone in userspace starts using it. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Thomas Graf <tgraf@suug.ch>
2005-07-21[NETFILTER]: ip_conntrack_expect_related must not free expectationRusty Russell10-57/+39
If a connection tracking helper tells us to expect a connection, and we're already expecting that connection, we simply free the one they gave us and return success. The problem is that NAT helpers (eg. FTP) have to allocate the expectation first (to see what port is available) then rewrite the packet. If that rewrite fails, they try to remove the expectation, but it was freed in ip_conntrack_expect_related. This is one example of a larger problem: having registered the expectation, the pointer is no longer ours to use. Reference counting is needed for ctnetlink anyway, so introduce it now. To have a single "put" path, we need to grab the reference to the connection on creation, rather than open-coding it in the caller. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19[NET]: Fix tc_verd thinko in skb_clone()David S. Miller1-2/+2
It was overwriting the computer n->tc_verd value over and over with skb->tc_verd, by mistake. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19[NET]: Make ipip/ip6_tunnel independant of XFRMPatrick McHardy5-13/+66
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19[IPV4]: Fix up lots of little whitespace indentation stuff in fib_trie.Stephen Hemminger2-386/+388
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19[NET]: BRIDGE_EBT_ARPREPLY must depend on INETAdrian Bunk1-1/+1
BRIDGE_EBT_ARPREPLY=y and INET=n results in the following compile error: net/built-in.o: In function `ebt_target_reply': ebt_arpreply.c:(.text+0x68fb9): undefined reference to `arp_send' make: *** [.tmp_vmlinux1] Error 1 Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19[IPV4]: Don't select XFRM for ip_grePatrick McHardy1-1/+0
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19[NET]: Only build flow.o if CONFIG_XFRM=yPatrick McHardy1-1/+2
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19[ATM]: Trivial spelling fix patch for net/KconfigJesper Juhl1-2/+2
Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-19[ATM]: allow bind() on point-to-multpoint svcs (from Martin Whitaker ↵Chas Williams1-4/+0
<martin_whitaker@ntlworld.com>) Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18[EMATCH]: Kill TCF_META_ID_TCCLASSID reference from meta ematch as well.David S. Miller1-6/+0
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18[IPV4]: fix IP_FIB_HASH kconfig warningAdrian Bunk1-7/+3
This patch fixes the following kconfig warning: net/ipv4/Kconfig:92:warning: defaults for choice values not supported Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18[NET]: Kconfig: NETCONSOLE and NETPOLL togetherRandy Dunlap1-16/+0
Put NETCONSOLE and NETPOLL options together since they are related. This cuts down on the hassle of flipping back and forth between the Networking menu and the Network drivers menu to change their config settings. Tested with menuconfig, gconfig, and xconfig. gconfig has a small problem with this. I think that it's a bug in gconfig and I will take it up with Romain Lievin. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18[SCTP]: Fix potential null pointer dereference while handling an icmp errorSridhar Samudrala2-37/+15
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18[SCTP]: Audit return code of create_proc_*Christophe Lucas1-1/+5
From: Christophe Lucas <clucas@rotomalug.org> Audit return of create_proc_* functions. Signed-off-by: Christophe Lucas <clucas@rotomalug.org> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18[NETLINK]: Fix "nocast type" warningsVictor Fusco1-2/+3
From: Victor Fusco <victor@cetuc.puc-rio.br> Fix the sparse warning "implicit cast to nocast type" Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18[PKT_SCHED]: Reduce branch mispredictions in pfifo_fast_dequeueThomas Graf1-4/+3
The current call to __qdisc_dequeue_head leads to a branch misprediction for every loop iteration, the fact that the most common priority is 2 makes this even worse. This issue has been brought up by Eric Dumazet <dada1@cosmosbay.com> but unlike his solution which was to manually unroll the loop, this approach preserves the possibility to increase the number of bands at compile time. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-18[PKT_SCHED]: Remove debugging leftover from textsearch ematchThomas Graf1-3/+0
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-12[VLAN]: Fix early vlan adding leads to not functional deviceTommy Christensen1-0/+8
OK, I can see what's happening here. eth0 doesn't detect link-up until after a few seconds, so when the vlan interface is opened immediately after eth0 has been opened, it inherits the link-down state. Subsequently the vlan interface is never properly activated and are thus unable to transmit any packets. dev->state bits are not supposed to be manipulated directly. Something similar is probably needed for the netif_device_present() bit, although I don't know how this is meant to work for a virtual device. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-12[NET]: __be'ify *_type_trans()Alexey Dobriyan4-8/+7
tr_type_trans(), hippi_type_trans() left as-is. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-12[NETFILTER]: Revert nf_reset changePhil Oester3-9/+13
Revert the nf_reset change that caused so much trouble, drop conntrack references manually before packets are queued to packet sockets. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-11[NET]: move config options out to individual protocolsSam Ravnborg17-452/+447
Move the protocol specific config options out to the specific protocols. With this change net/Kconfig now starts to become readable and serve as a good basis for further re-structuring. The menu structure is left almost intact, except that indention is fixed in most cases. Most visible are the INET changes where several "depends on INET" are replaced with a single ifdef INET / endif pair. Several new files were created to accomplish this change - they are small but serve the purpose that config options are now distributed out where they belongs. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-11[NET]: add a top-level Networking menu to *configSam Ravnborg1-5/+5
Create a new top-level menu named "Networking" thus moving net related options and protocol selection way from the drivers menu and up on the top-level where they belong. To implement this all architectures has to source "net/Kconfig" before drivers/*/Kconfig in their Kconfig file. This change has been implemented for all architectures. Device drivers for ordinary NIC's are still to be found in the Device Drivers section, but Bluetooth, IrDA and ax25 are located with their corresponding menu entries under the new networking menu item. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-11[IPV4]: Prevent oops when printing martian sourceOlaf Kirch1-1/+1
In some cases, we may be generating packets with a source address that qualifies as martian. This can happen when we're in the middle of setting up the network, and netfilter decides to reject a packet with an RST. The IPv4 routing code would try to print a warning and oops, because locally generated packets do not have a valid skb->mac.raw pointer at this point. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-11[IPVS]: Add and reorder bh locks after moving to keventd.Julian Anastasov2-6/+9
An addition to the last ipvs changes that move update_defense_level/si_meminfo to keventd: - ip_vs_random_dropentry now runs in process context and should use _bh locks to protect from softirqs - update_defense_level still needs _bh locks after si_meminfo is called, for the same purpose Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-11[NET]: Trivial spelling fix patch for net/KconfigJesper Juhl1-2/+2
Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-11[SCTP]: __nocast annotationsAlexey Dobriyan11-44/+59
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08[SCTP]: Use struct list_head for chunk lists, not sk_buff_head.David S. Miller7-53/+79
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08[IPV6]: Fix warning in ip6_mc_msfilter.David S. Miller1-0/+1
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08[IPV4]: fix IPv4 leave-group group matchingDavid L Stevens1-12/+13
This patch fixes the multicast group matching for IP_DROP_MEMBERSHIP, similar to the IP_ADD_MEMBERSHIP fix in a prior patch. Groups are identifiedby <group address,interface> and including the interface address in the match will fail if a leave-group is done by address when the join was done by index, or if different addresses on the same interface are used in the join and leave. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08[IPV4]: (INCLUDE,empty)/leave-group equivalence for full-state MSF APIs & ↵David L Stevens2-2/+19
errno fix 1) Adds (INCLUDE, empty)/leave-group equivalence to the full-state multicast source filter APIs (IPv4 and IPv6) 2) Fixes an incorrect errno in the IPv6 leave-group (ENOENT should be EADDRNOTAVAIL) Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08[IPV4]: multicast API "join" issuesDavid L Stevens2-12/+24
1) In the full-state API when imsf_numsrc == 0 errno should be "0", but returns EADDRNOTAVAIL 2) An illegal filter mode change errno should be EINVAL, but returns EADDRNOTAVAIL 3) Trying to do an any-source option without IP_ADD_MEMBERSHIP errno should be EINVAL, but returns EADDRNOTAVAIL 4) Adds comments for the less obvious error return values Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08[IPV4]: multicast API "join" issuesDavid L Stevens2-3/+12
1) Changes IP_ADD_SOURCE_MEMBERSHIP and MCAST_JOIN_SOURCE_GROUP to ignore EADDRINUSE errors on a "courtesy join" -- prior membership or not is ok for these. 2) Adds "leave group" equivalence of (INCLUDE, empty) filters in the delta-based API. Without this, mixing delta-based API calls that end in an (INCLUDE, empty) filter would not allow a subsequent regular IP_ADD_MEMBERSHIP. It also frees socket buffer memory that isn't needed for both the multicast group record and source filter. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08[IPV4]: multicast API "join" issuesDavid L Stevens1-23/+12
This patch corrects a few problems with the IP_ADD_MEMBERSHIP socket option: 1) The existing code makes an attempt at reference counting joins when using the ip_mreqn/imr_ifindex interface. Joining the same group on the same socket is an error, whatever the API. This leads to unexpected results when mixing ip_mreqn by index with ip_mreqn by address, ip_mreq, or other API's. For example, ip_mreq followed by ip_mreqn of the same group will "work" while the same two reversed will not. Fixed to always return EADDRINUSE on a duplicate join and removed the (now unused) reference count in ip_mc_socklist. 2) The group-search list in ip_mc_join_group() is comparing a full ip_mreqn structure and all of it must match for it to find the group. This doesn't correctly match a group that was joined with ip_mreq or ip_mreqn with an address (with or without an index). It also doesn't match groups that are joined by different addresses on the same interface. All of these are the same multicast group, which is identified by group address and interface index. Fixed the check to correctly match groups so we don't get duplicate group entries on the ip_mc_socklist. 3) The old code allocates a multicast address before searching for duplicates requiring it to free in various error cases. This patch moves the allocate until after the search and igmp_max_memberships check, so never a need to allocate, then free an entry. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08[IPV4]: Apply sysctl_icmp_echo_ignore_broadcasts to ICMP_TIMESTAMP as well.Alexey Kuznetsov1-1/+2
This was the full intention of the original code. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08[NET]: Fix sparse warningsVictor Fusco4-13/+19
From: Victor Fusco <victor@cetuc.puc-rio.br> Fix the sparse warning "implicit cast to nocast type" Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-08[NET]: Transform skb_queue_len() binary tests into skb_queue_empty()David S. Miller17-50/+44
This is part of the grand scheme to eliminate the qlen member of skb_queue_head, and subsequently remove the 'list' member of sk_buff. Most users of skb_queue_len() want to know if the queue is empty or not, and that's trivially done with skb_queue_empty() which doesn't use the skb_queue_head->qlen member and instead uses the queue list emptyness as the test. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-07[PATCH] coverity: sunrpc/xprt task null checkKAMBAROV, ZAUR1-2/+0
In __xprt_lock_write() we check to see if `task' is NULL, but in other places we just go and dereference it. `task' shouldn't be NULL anyway, so remove this test. This defect was found automatically by Coverity Prevent, a static analysis tool. Signed-off-by: Zaur Kambarov <zkambarov@coverity.com> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-05[TCP]: Never TSO defer under periods of congestion.David S. Miller1-0/+3
Congestion window recover after loss depends upon the fact that if we have a full MSS sized frame at the head of the send queue, we will send it. TSO deferral can defeat the ACK clocking necessary to exit cleanly from recovery. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[PKT_SCHED]: Blackhole queueing disciplineThomas Graf2-1/+55
Useful in combination with classful qdiscs to drop or temporary disable certain flows, e.g. one could block specific ds flows with dsmark. Unlike the noop qdisc it can be controlled by the user and statistic accounting is done. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[TCP]: Move to new TSO segmenting scheme.David S. Miller5-237/+381
Make TSO segment transmit size decisions at send time not earlier. The basic scheme is that we try to build as large a TSO frame as possible when pulling in the user data, but the size of the TSO frame output to the card is determined at transmit time. This is guided by tp->xmit_size_goal. It is always set to a multiple of MSS and tells sendmsg/sendpage how large an SKB to try and build. Later, tcp_write_xmit() and tcp_push_one() chop up the packet if necessary and conditions warrant. These routines can also decide to "defer" in order to wait for more ACKs to arrive and thus allow larger TSO frames to be emitted. A general observation is that TSO elongates the pipe, thus requiring a larger congestion window and larger buffering especially at the sender side. Therefore, it is important that applications 1) get a large enough socket send buffer (this is accomplished by our dynamic send buffer expansion code) 2) do large enough writes. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[TCP]: Break out send buffer expansion test.David S. Miller1-4/+23
This makes it easier to understand, and allows easier tweaking of the heuristic later on. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[TCP]: Do not call tcp_tso_acked() if no work to do.David S. Miller1-1/+2
In tcp_clean_rtx_queue(), if the TSO packet is not even partially acked, do not waste time calling tcp_tso_acked(). Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[TCP]: Kill bogus comment above tcp_tso_acked().David S. Miller1-9/+0
Everything stated there is out of data. tcp_trim_skb() does adjust the available socket send buffer space and skb->truesize now. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[TCP]: Fix send-side cpu utiliziation regression.David S. Miller1-2/+11
Only put user data purely to pages when doing TSO. The extra page allocations cause two problems: 1) Add the overhead of the page allocations themselves. 2) Make us do small user copies when we get to the end of the TCP socket cache page. It is still beneficial to purely use pages for TSO, so we will do it for that case. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[TCP]: Eliminate redundant computations in tcp_write_xmit().David S. Miller1-9/+31
tcp_snd_test() is run for every packet output by a single call to tcp_write_xmit(), but this is not necessary. For one, the congestion window space needs to only be calculated one time, then used throughout the duration of the loop. This cleanup also makes experimenting with different TSO packetization schemes much easier. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[TCP]: Break out tcp_snd_test() into it's constituent parts.David S. Miller1-38/+82
tcp_snd_test() does several different things, use inline functions to express this more clearly. 1) It initializes the TSO count of SKB, if necessary. 2) It performs the Nagle test. 3) It makes sure the congestion window is adhered to. 4) It makes sure SKB fits into the send window. This cleanup also sets things up so that things like the available packets in the congestion window does not need to be calculated multiple times by packet sending loops such as tcp_write_xmit(). Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[TCP]: Fix __tcp_push_pending_frames() 'nonagle' handling.David S. Miller2-24/+8
'nonagle' should be passed to the tcp_snd_test() function as 'TCP_NAGLE_PUSH' if we are checking an SKB not at the tail of the write_queue. This is because Nagle does not apply to such frames since we cannot possibly tack more data onto them. However, while doing this __tcp_push_pending_frames() makes all of the packets in the write_queue use this modified 'nonagle' value. Fix the bug and simplify this function by just calling tcp_write_xmit() directly if sk_send_head is non-NULL. As a result, we can now make tcp_data_snd_check() just call tcp_push_pending_frames() instead of the specialized __tcp_data_snd_check(). Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[TCP]: Fix redundant calculations of tcp_current_mss()David S. Miller1-12/+4
tcp_write_xmit() uses tcp_current_mss(), but some of it's callers, namely __tcp_push_pending_frames(), already has this value available already. While we're here, fix the "cur_mss" argument to be "unsigned int" instead of plain "unsigned". Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[TCP]: tcp_write_xmit() tabbing cleanupDavid S. Miller1-34/+34
Put the main basic block of work at the top-level of tabbing, and mark the TCP_CLOSE test with unlikely(). Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[TCP]: Kill extra cwnd validate in __tcp_push_pending_frames().David S. Miller1-30/+49
The tcp_cwnd_validate() function should only be invoked if we actually send some frames, yet __tcp_push_pending_frames() will always invoke it. tcp_write_xmit() does the call for us, so the call here can simply be removed. Also, tcp_write_xmit() can be marked static. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[TCP]: Add missing skb_header_release() call to tcp_fragment().David S. Miller1-0/+1
When we add any new packet to the TCP socket write queue, we must call skb_header_release() on it in order for the TSO sharing checks in the drivers to work. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[TCP]: Move __tcp_data_snd_check into tcp_output.cDavid S. Miller2-10/+10
It reimplements portions of tcp_snd_check(), so it we move it to tcp_output.c we can consolidate it's logic much easier in a later change. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[TCP]: Move send test logic out of net/tcp.hDavid S. Miller1-21/+129
This just moves the code into tcp_output.c, no code logic changes are made by this patch. Using this as a baseline, we can begin to untangle the mess of comparisons for the Nagle test et al. We will also be able to reduce all of the redundant computation that occurs when outputting data packets. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[TCP]: Fix quick-ack decrementing with TSO.David S. Miller1-3/+3
On each packet output, we call tcp_dec_quickack_mode() if the ACK flag is set. It drops tp->ack.quick until it hits zero, at which time we deflate the ATO value. When doing TSO, we are emitting multiple packets with ACK set, so we should decrement tp->ack.quick that many segments. Note that, unlike this case, tcp_enter_cwr() should not take the tcp_skb_pcount(skb) into consideration. That function, one time, readjusts tp->snd_cwnd and moves into TCP_CA_CWR state. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[TCP]: Simplify SKB data portion allocation with NETIF_F_SG.David S. Miller1-11/+2
The ideal and most optimal layout for an SKB when doing scatter-gather is to put all the headers at skb->data, and all the user data in the page array. This makes SKB splitting and combining extremely simple, especially before a packet goes onto the wire the first time. So, when sk_stream_alloc_pskb() is given a zero size, make sure there is no skb_tailroom(). This is achieved by applying SKB_DATA_ALIGN() to the header length used here. Next, make select_size() in TCP output segmentation use a length of zero when NETIF_F_SG is true on the outgoing interface. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[NET]: improve readability of dev_set_promiscuity() in net/core/dev.cDavid Chau1-2/+3
A trivial patch to improve the readability of dev_set_promiscuity() in net/core/dev.c. New code does exactly the same thing as original code. Signed-off-by: David Chau <ddcc@mit.edu> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[IPV4]: More broken memory allocation fixes for fib_trieRobert Olsson1-32/+145
Below a patch to preallocate memory when doing resize of trie (inflate halve) If preallocations fails it just skips the resize of this tnode for this time. The oops we got when killing bgpd (with full routing) is now gone. Patrick memory patch is also used. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[DECNET]: Fix memset overflow on 64bit archs while dumping decnet routing rulesThomas Graf1-1/+2
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[IPV4]: Bug fix in rt_check_expire()Eric Dumazet1-7/+14
- rt_check_expire() fixes (an overflow occured if size of the hash was >= 65536) reminder of the bugfix: The rt_check_expire() has a serious problem on machines with large route caches, and a standard HZ value of 1000. With default values, ie ip_rt_gc_interval = 60*HZ = 60000 ; the loop count : for (t = ip_rt_gc_interval << rt_hash_log; t >= 0; overflows (t is a 31 bit value) as soon rt_hash_log is >= 16 (65536 slots in route cache hash table). In this case, rt_check_expire() does nothing at all Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[IPV4]: Use the fancy alloc_large_system_hash() function for route hash tableEric Dumazet1-27/+16
- rt hash table allocated using alloc_large_system_hash() function. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[NET]: Hashed spinlocks in net/ipv4/route.cEric Dumazet1-19/+47
- Locking abstraction - Spinlocks moved out of rt hash table : Less memory (50%) used by rt hash table. it's a win even on UP. - Sizing of spinlocks table depends on NR_CPUS Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[IPV4]: Handle large allocations in fib_triePatrick McHardy1-2/+23
Inflating a node a couple of times makes it exceed the 128k kmalloc limit. Use __get_free_pages for allocations > PAGE_SIZE, as in fib_hash. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Robert Olsson <Robert.Olsson@data.slu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[IPV6]: Makes IPv6 rcv registration happen last during initialisation.Herbert Xu1-2/+2
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[IPV4]: Fix crash in ip_rcv while booting related to netconsoleHerbert Xu2-15/+11
Makes IPv4 ip_rcv registration happen last in af_inet. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[PKT_SCHED]: Report rate estimator configuration errors during qdisc allocationThomas Graf1-5/+17
Current behaviour is to not report an error if a rate estimator is created together with a qdisc and the configuration of the rate estimator is bogus. This leads to unexpected behaviour because the user is not notified. New behaviour is to report the error and let the whole qdisc creation operation fail so the user is able to fix his mistake. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[PKT_SCHED]: Cleanup qdisc creation and alignment macrosThomas Graf2-43/+33
Adds qdisc_alloc() to share code between qdisc_create() and qdisc_create_dflt(). Hides the qdisc alignment behind macros and makes use of them. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[NET]: Remove unused security member in sk_buffThomas Graf4-10/+0
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[NET]: net/core/filter.c: make len cover the entire packetPatrick McHardy1-6/+2
As suggested by Herbert Xu: Since we don't require anything to be in the linear packet range anymore make len cover the entire packet. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[NET]: Consolidate common code in net/core/filter.cPatrick McHardy1-57/+33
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-07-05[NET]: Remove redundant code in net/core/filter.cPatrick McHardy1-12/+0
skb_header_pointer handles linear and non-linear data, no need to handle linear data again. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28[NETFILTER]: Fix connection tracking bug in 2.6.12Patrick McHardy2-2/+8
In 2.6.12 we started dropping the conntrack reference when a packet leaves the IP layer. This broke connection tracking on a bridge, because bridge-netfilter defers calling some NF_IP_* hooks to the bridge layer for locally generated packets going out a bridge, where the conntrack reference is no longer available. This patch keeps the reference in this case as a temporary solution, long term we will remove the defered hook calling. No attempt is made to drop the reference in the bridge-code when it is no longer needed, tc actions could already have sent the packet anywhere. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28[NET]: Micro optimization in eth_header()Denis Vlasenko1-4/+3
Signed-off-by: Denis Vlasenko <vda@ilport.com.ua> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28[IPV6]: remove more unused IPV6_AUTHHDR things.YOSHIFUJI Hideaki1-1/+0
Remove two more unused IPV6_AUTHHDR option things, which I failed to remove them last time, plus, mark IPV6_AUTHHDR obsolete. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28[IPVS]: Close race conditions on ip_vs_conn_tab list modificationNeil Horman1-21/+4
In an smp system, it is possible for an connection timer to expire, calling ip_vs_conn_expire while the connection table is being flushed, before ct_write_lock_bh is acquired. Since the list iterator loop in ip_vs_con_flush releases and re-acquires the spinlock (even though it doesn't re-enable softirqs), it is possible for the expiration function to modify the connection list, while it is being traversed in ip_vs_conn_flush. The result is that the next pointer gets set to NULL, and subsequently dereferenced, resulting in an oops. Signed-off-by: Neil Horman <nhorman@redhat.com> Acked-by: JulianAnastasov Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28[IPV4]: Broken memory allocation in fib_trieRobert Olsson1-17/+39
This should help up the insertion... but the resize is more crucial. and complex and needs some thinking. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28[SCTP] Make init & delayed sack timeouts configurable by user.Vlad Yasevich4-10/+22
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28[IPV4]: ipconfig.c: fix dhcp timeout behaviourMaxime Bizon1-1/+3
I think there is a small bug in ipconfig.c in case IPCONFIG_DHCP is set and dhcp is used. When a DHCPOFFER is received, ip address is kept until we get DHCPACK. If no ack is received, ic_dynamic() returns negatively, but leaves the offered ip address in ic_myaddr. This makes the main loop in ip_auto_config() break and uses the maybe incomplete configuration. Not sure if it's the best way to do, but the following trivial patch correct this. Signed-off-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28[IPV4]: Snmpv2 Mib IP counter ipInAddrErrors supportDietmar Eggemann2-3/+12
I followed Thomas' proposal to see every martian destination as a case where the ipInAddrErrors counter has to be incremented. There are two advantages by doing so: (1) The relation between the ipInReceive counter and all the other ipInXXX counters is more accurate in the case the RTN_UNICAST code check fails and (2) it makes the code in ip_route_input_slow easier. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28[IPV6]: Don't dump temporary addresses twiceYOSHIFUJI Hideaki1-14/+1
Each IPv6 Temporary Address (w/ CONFIG_IPV6_PRIVACY) is dumped twice to netlink. Because temporary addresses are listed in idev->addr_list, there's no need to dump idev->tempaddr separately. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28[NETLINK]: Missing padding fields in dumped structuresPatrick McHardy3-0/+5
Plug holes with padding fields and initialized them to zero. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28[NETLINK]: Missing initializations in dumped dataPatrick McHardy9-3/+32
Mostly missing initialization of padding fields of 1 or 2 bytes length, two instances of uninitialized nlmsgerr->msg of 16 bytes length. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28[NETLINK]: Clear padding in netlink messagesPatrick McHardy1-0/+1
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28[NETFILTER]: ipt_CLUSTERIP: fix ARP manglingHarald Welte1-3/+4
This patch adds mangling of ARP requests (in addition to replies), since ARP caches are made from snooping both requests and replies. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-28[EBTABLES]: Fix thinkos in ebt_log.cDavid S. Miller1-4/+2
When converting over the skb_header_pointer(), I converted parts of this module incorrectly. Kill the 'u' union in ebt_log() and all the bogus references to it. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-26[IPVS]: Fix for overflowspageexec2-5/+7
From: <pageexec@freemail.hu> $subject was fixed in 2.4 already, 2.6 needs it as well. The impact of the bugs is a kernel stack overflow and privilege escalation from CAP_NET_ADMIN via the IP_VS_SO_SET_STARTDAEMON/IP_VS_SO_GET_DAEMON ioctls. People running with 'root=all caps' (i.e., most users) are not really affected (there's nothing to escalate), but SELinux and similar users should take it seriously if they grant CAP_NET_ADMIN to other users. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-26[NETLINK]: Fix two socket hashing bugs.David S. Miller1-3/+8
1) netlink_release() should only decrement the hash entry count if the socket was actually hashed. This was causing hash->entries to underflow, which resulting in all kinds of troubles. On 64-bit systems, this would cause the following conditional to erroneously trigger: err = -ENOMEM; if (BITS_PER_LONG > 32 && unlikely(hash->entries >= UINT_MAX)) goto err; 2) netlink_autobind() needs to propagate the error return from netlink_insert(). Otherwise, callers will not see the error as they should and thus try to operate on a socket with a zero pid, which is very bad. However, it should not propagate -EBUSY. If two threads race to autobind the socket, that is fine. This is consistent with the autobind behavior in other protocols. So bug #1 above, combined with this one, resulted in hangs on netlink_sendmsg() calls to the rtnetlink socket. We'd try to do the user sendmsg() with the socket's pid set to zero, later we do a socket lookup using that pid (via the value we stashed away in NETLINK_CB(skb).pid), but that won't give us the user socket, it will give us the rtnetlink socket. So when we try to wake up the receive queue, we dive back into rtnetlink_rcv() which tries to recursively take the rtnetlink semaphore. Thanks to Jakub Jelink for providing backtraces. Also, thanks to Herbert Xu for supplying debugging patches to help track this down, and also finding a mistake in an earlier version of this fix. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-26[PKTGEN]: Fix random packet sizes causing panicRobert Olsson1-16/+13
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-26[TCP]: Let TCP_CONG_ADVANCED default to nAdrian Bunk1-1/+0
It doesn't seem to make much sense to let an "If unsure, say N." option default to y. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-26[IPV4]: Fix thinko in TCP_CONG_BIC default.David S. Miller1-1/+1
Since it is tristate when we offer it as a choice, we should definte it also as tristate when forcing it as the default. Otherwise kconfig warns. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-25Merge Christoph's freeze cleanup patchLinus Torvalds4-6/+6
2005-06-25[PATCH] Cleanup patch for process freezingChristoph Lameter4-6/+6
1. Establish a simple API for process freezing defined in linux/include/sched.h: frozen(process) Check for frozen process freezing(process) Check if a process is being frozen freeze(process) Tell a process to freeze (go to refrigerator) thaw_process(process) Restart process frozen_process(process) Process is frozen now 2. Remove all references to PF_FREEZE and PF_FROZEN from all kernel sources except sched.h 3. Fix numerous locations where try_to_freeze is manually done by a driver 4. Remove the argument that is no longer necessary from two function calls. 5. Some whitespace cleanup 6. Clear potential race in refrigerator (provides an open window of PF_FREEZE cleared before setting PF_FROZEN, recalc_sigpending does not check PF_FROZEN). This patch does not address the problem of freeze_processes() violating the rule that a task may only modify its own flags by setting PF_FREEZE. This is not clean in an SMP environment. freeze(process) is therefore not SMP safe! Signed-off-by: Christoph Lameter <christoph@lameter.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-24[SUNRPC]: Fix {s,}size_t printf format strings in xprt.cDavid S. Miller1-2/+2
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-24[TCP]: Do not present confusing congestion control options by default.David S. Miller1-1/+19
Create TCP_CONG_ADVANCED option, akin to IP_ADVANCED_ROUTER, which when disabled will bypass all of the congestion control Kconfig options and leave the user with a safe default. That safe default is currently BIC-TCP with new Reno as a fallback. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-24[IPV4]: Move FIB lookup algorithm choice under IP_ADVANCED_ROUTINGDavid S. Miller1-26/+38
Most users need not be concerned with a complex choice of what FIB lookup algorithm to use. So give them the safe default of IP_FIB_HASH if IP_ADVANCED_ROUTING is disabled. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-24[PKT_SCHED]: Make TEXTSEARCH* options only selected.David S. Miller1-2/+3
Do not present these confusing new options to the user unless he picked some facility that makes use of it, such as NET_EMATCH_TEXT. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-24Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds10-159/+420
2005-06-24[PATCH] make various thing staticAdrian Bunk1-2/+14
Another rollup of patches which give various symbols static scope Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-24[PATCH] knfsd: nfsd4: fix probe_callbackNeilBrown1-0/+1
rpc_create_client was modified recently to do its own (synchronous) NULL ping of the server. We'd rather do that on our own, asynchronously, so that we don't have to block the nfsd thread doing the probe, and so that setclientid handling (hence, client mounts) can proceed normally whether the callback is succesful or not. (We can still function fine without the callback channel--we just won't be able to give out delegations till it's verified to work.) Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23[PKT_SCHED]: Make NET_EMATCH_TEXT select TEXTSEARChDavid S. Miller1-0/+1
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23[PKT_SCHED]: Packet classification based on textsearch (ematch)Thomas Graf3-0/+169
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23[NET]: skb_find_text() - Find a text pattern in skb dataThomas Graf1-0/+40
Finds a pattern in the skb data according to the specified textsearch configuration. Use textsearch_next() to retrieve subsequent occurrences of the pattern. Returns the offset to the first occurrence or UINT_MAX if no match was found. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23[NET]: Zerocopy sequential reading of skb dataThomas Graf1-0/+117
Implements sequential reading for both linear and non-linear skb data at zerocopy cost. The data is returned in chunks of arbitary length, therefore random access is not possible. Usage: from := 0 to := 128 state := undef data := undef len := undef consumed := 0 skb_prepare_seq_read(skb, from, to, &state) while (len = skb_seq_read(consumed, &data, &state)) != 0 do /* do something with 'data' of length 'len' */ if abort then /* abort read if we don't wait for * skb_seq_read() to return 0 */ skb_abort_seq_read(&state) return endif /* not necessary to consume all of 'len' */ consumed += len done Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23[TCP]: Allow choosing TCP congestion control via sockopt.Stephen Hemminger4-5/+76
Allow using setsockopt to set TCP congestion control to use on a per socket basis. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23[NET]: Separate two usages of netdev_max_backlog.Stephen Hemminger2-3/+12
Separate out the two uses of netdev_max_backlog. One controls the upper bound on packets processed per softirq, the new name for this is netdev_budget; the other controls the limit on packets queued via netif_rx. Increase the max_backlog default to account for faster processors. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23[NET]: Eliminate netif_rx massive packet drops.Stephen Hemminger1-19/+2
Eliminate the throttling behaviour when the netif receive queue fills because it behaves badly when using high speed networks under load. The throttling cause multiple packet drops that cause TCP to go into slow start mode. The same effective patch has been part of BIC TCP and H-TCP as well as part of Web100. The existing code drops 100's of packets when the queue fills; this changes it to individual packet drop-tail. Signed-off-by: Stephen Hemmminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23[NET]: Remove obsolete netif_rx congestion sensing mechanism.Stephen Hemminger2-123/+1
Remove the congestion sensing mechanism from netif_rx, and always return either full or empty. Almost no driver checks the return value from netif_rx, and those that do only use it for debug messages. The original design of netif_rx was to do flow control based on the receive queue, but NAPI has supplanted this and no driver uses the feedback. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23[NET]: Remove obsolete fastroute stats.Stephen Hemminger2-9/+2
Remove last vestiages of fastroute code that is no longer used. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23[TCP]: Add Scalable TCP congestion control module.John Heffner3-0/+78
This patch implements Tom Kelly's Scalable TCP congestion control algorithm for the modular framework. The algorithm has some nice scaling properties, and has been used a fair bit in research, though is known to have significant fairness issues, so it's not really suitable for general purpose use. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23[TCP]: Add H-TCP congestion control module.Baruch Even3-0/+303
H-TCP is a congestion control algorithm developed at the Hamilton Institute, by Douglas Leith and Robert Shorten. It is extending the standard Reno algorithm with mode switching is thus a relatively simple modification. H-TCP is defined in a layered manner as it is still a research platform. The basic form includes the modification of beta according to the ratio of maxRTT to min RTT and the alpha=2*factor*(1-beta) relation, where factor is dependant on the time since last congestion. The other layers improve convergence by adding appropriate factors to alpha. The following patch implements the H-TCP algorithm in it's basic form. Signed-Off-By: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23[TCP]: Add TCP Vegas congestion control module.Stephen Hemminger3-0/+423
TCP Vegas code modified for the new TCP infrastructure. Vegas now uses microsecond resolution timestamps for better estimation of performance over higher speed links. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23[TCP]: Add TCP Hybla congestion control module.Daniele Lacamera3-0/+198
TCP Hybla congestion avoidance. - "In heterogeneous networks, TCP connections that incorporate a terrestrial or satellite radio link are greatly disadvantaged with respect to entirely wired connections, because of their longer round trip times (RTTs). To cope with this problem, a new TCP proposal, the TCP Hybla, is presented and discussed in the paper[1]. It stems from an analytical evaluation of the congestion window dynamics in the TCP standard versions (Tahoe, Reno, NewReno), which suggests the necessary modifications to remove the performance dependence on RTT.[...]"[1] [1]: Carlo Caini, Rosario Firrincieli, "TCP Hybla: a TCP enhancement for heterogeneous networks", International Journal of Satellite Communications and Networking Volume 22, Issue 5 , Pages 547 - 566. September 2004. Signed-off-by: Daniele Lacamera (root at danielinux.net)net Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23[TCP]: Add High Speed TCP congestion control module.John Heffner3-0/+193
Sally Floyd's high speed TCP congestion control. This is useful for comparison and research. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23[TCP]: Add TCP Westwood congestion control module.Stephen Hemminger3-0/+275
This is the existing 2.6.12 Westwood code moved from tcp_input to the new congestion framework. A lot of the inline functions have been eliminated to try and make it clearer. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23[TCP]: Add TCP BIC congestion control module.Stephen Hemminger3-0/+353
TCP BIC congestion control reworked to use the new congestion control infrastructure. This version is more up to date than the BIC code in 2.6.12; it incorporates enhancements from BICTCP 1.1, to handle low latency links. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23[TCP]: Report congestion control algorithm in tcp_diag.Stephen Hemminger1-0/+5
Enhancement to the tcp_diag interface used by the iproute2 ss command to report the tcp congestion control being used by a socket. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23[TCP]: Change tcp_diag to use the existing __RTA_PUT() macro.Stephen Hemminger1-7/+2
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23[TCP]: Add pluggable congestion control algorithm infrastructure.Stephen Hemminger10-790/+313
Allow TCP to have multiple pluggable congestion control algorithms. Algorithms are defined by a set of operations and can be built in or modules. The legacy "new RENO" algorithm is used as a starting point and fallback. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-23[PATCH] create a kstrdup library functionPaulo Marques5-27/+7
This patch creates a new kstrdup library function and changes the "local" implementations in several places to use this function. Most of the changes come from the sound and net subsystems. The sound part had already been acknowledged by Takashi Iwai and the net part by David S. Miller. I left UML alone for now because I would need more time to read the code carefully before making changes there. Signed-off-by: Paulo Marques <pmarques@grupopie.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds9-103/+203
2005-06-22[X25]: Fast select with no restriction on responseShaun Pereira3-14/+80
This patch is a follow up to patch 1 regarding "Selective Sub Address matching with call user data". It allows use of the Fast-Select-Acceptance optional user facility for X.25. This patch just implements fast select with no restriction on response (NRR). What this means (according to ITU-T Recomendation 10/96 section 6.16) is that if in an incoming call packet, the relevant facility bits are set for fast-select-NRR, then the called DTE can issue a direct response to the incoming packet using a call-accepted packet that contains call-user-data. This patch allows such a response. The called DTE can also respond with a clear-request packet that contains call-user-data. However, this feature is currently not implemented by the patch. How is Fast Select Acceptance used? By default, the system does not allow fast select acceptance (as before). To enable a response to fast select acceptance, After a listen socket in created and bound as follows socket(AF_X25, SOCK_SEQPACKET, 0); bind(call_soc, (struct sockaddr *)&locl_addr, sizeof(locl_addr)); but before a listen system call is made, the following ioctl should be used. ioctl(call_soc,SIOCX25CALLACCPTAPPRV); Now the listen system call can be made listen(call_soc, 4); After this, an incoming-call packet will be accepted, but no call-accepted packet will be sent back until the following system call is made on the socket that accepts the call ioctl(vc_soc,SIOCX25SENDCALLACCPT); The network (or cisco xot router used for testing here) will allow the application server's call-user-data in the call-accepted packet, provided the call-request was made with Fast-select NRR. Signed-off-by: Shaun Pereira <spereira@tusc.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[X25]: Selective sub-address matching with call user data.Shaun Pereira2-43/+48
From: Shaun Pereira <spereira@tusc.com.au> This is the first (independent of the second) patch of two that I am working on with x25 on linux (tested with xot on a cisco router). Details are as follows. Current state of module: A server using the current implementation (2.6.11.7) of the x25 module will accept a call request/ incoming call packet at the listening x.25 address, from all callers to that address, as long as NO call user data is present in the packet header. If the server needs to choose to accept a particular call request/ incoming call packet arriving at its listening x25 address, then the kernel has to allow a match of call user data present in the call request packet with its own. This is required when multiple servers listen at the same x25 address and device interface. The kernel currently matches ALL call user data, if present. Current Changes: This patch is a follow up to the patch submitted previously by Andrew Hendry, and allows the user to selectively control the number of octets of call user data in the call request packet, that the kernel will match. By default no call user data is matched, even if call user data is present. To allow call user data matching, a cudmatchlength > 0 has to be passed into the kernel after which the passed number of octets will be matched. Otherwise the kernel behavior is exactly as the original implementation. This patch also ensures that as is normally the case, no call user data will be present in the Call accepted / call connected packet sent back to the caller Future Changes on next patch: There are cases however when call user data may be present in the call accepted packet. According to the X.25 recommendation (ITU-T 10/96) section 5.2.3.2 call user data may be present in the call accepted packet provided the fast select facility is used. My next patch will include this fast select utility and the ability to send up to 128 octets call user data in the call accepted packet provided the fast select facility is used. I am currently testing this, again with xot on linux and cisco. Signed-off-by: Shaun Pereira <spereira@tusc.com.au> (With a fix from Alexey Dobriyan <adobriyan@gmail.com>) Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[EBTABLES]: vfree() checking cleanupsJames Lamanna1-14/+7
From: jlamanna@gmail.com ebtables.c vfree() checking cleanups. Signed-off by: James Lamanna <jlamanna@gmail.com> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[ATALK] aarp: replace schedule_timeout() with msleep()Nishanth Aravamudan1-4/+3
From: Nishanth Aravamudan <nacc@us.ibm.com> Use msleep() instead of schedule_timeout() to guarantee the task delays as expected. The current code is not wrong, but it does not account for early return due to signals, so I think msleep() should be appropriate. Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[IPV4]: Fix route.c gcc4 warningsChuck Short1-4/+4
Signed-off by: Chuck Short <zulcss@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[NETPOLL]: allow multiple netpoll_clients to register against one interfaceJeff Moyer1-10/+29
This patch provides support for registering multiple netpoll clients to the same network device. Only one of these clients may register an rx_hook, however. In practice, this restriction has not been problematic. It is worth mentioning, though, that the current design can be easily extended to allow for the registration of multiple rx_hooks. The basic idea of the patch is that the rx_np pointer in the netpoll_info structure points to the struct netpoll that has rx_hook filled in. Aside from this one case, there is no need for a pointer from the struct net_device to an individual struct netpoll. A lock is introduced to protect the setting and clearing of the np_rx pointer. The pointer will only be cleared upon netpoll client module removal, and the lock should be uncontested. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[NETPOLL]: Introduce a netpoll_info structJeff Moyer1-19/+38
This patch introduces a netpoll_info structure, which the struct net_device will now point to instead of pointing to a struct netpoll. The reason for this is two-fold: 1) fields such as the rx_flags, poll_owner, and poll_lock should be maintained per net_device, not per netpoll; and 2) this is a first step in providing support for multiple netpoll clients to register against the same net_device. The struct netpoll is now pointed to by the netpoll_info structure. As such, the previous behaviour of the code is preserved. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[NET]: dont use strlen() but the result from a prior sprintf()Eric Dumazet1-2/+1
Small patch to save an unecessary call to strlen() : sprintf() gave us the length, just trust it. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[PATCH] RPC: kick off socket connect operations fasterChuck Lever1-1/+8
Make the socket transport kick the event queue to start socket connects immediately. This should improve responsiveness of applications that are sensitive to slow mount operations (like automounters). We are now also careful to cancel the connect worker before destroying the xprt. This eliminates a race where xprt_destroy can finish before the connect worker is even allowed to run. Test-plan: Destructive testing (unplugging the network temporarily). Connectathon with UDP and TCP. Hard-code impossibly small connect timeout. Version: Fri, 29 Apr 2005 15:32:01 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: TCP reconnects are too slowChuck Lever1-2/+1
When the network layer reports a connection close, the RPC task waiting to reconnect should be notified so it can retry immediately instead of waiting for the normal connection establishment timeout. This reverts a change made in 2.6.6 as part of adding client support for RPC over TCP socket idle timeouts. Test-plan: Destructive testing with NFS over TCP mounts. Version: Fri, 29 Apr 2005 15:31:46 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: Clean up socket autodisconnectTrond Myklebust1-2/+2
Cancel autodisconnect requests inside xprt_transmit() in order to avoid races. Use more efficient del_singleshot_timer_sync() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: Ensure rpc calls respects the RPC_NOINTR flagTrond Myklebust1-34/+37
For internal purposes, the rpc_clnt_sigmask() call is replaced by a call to rpc_task_sigmask(), which ensures that the current task sigmask respects both the client cl_intr flag and the per-task NOINTR flag. Problem noted by Jiaying Zhang. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: Allow the sunrpc server to multiplex serveral programs on a ↵Andreas Gruenbacher1-17/+18
single port The NFS and NFSACL programs run on the same RPC transport. This patch adds support for this by converting svc_program into a chained list of programs (server-side). Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Olaf Kirch <okir@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: Encode and decode arbitrary XDR arraysAndreas Gruenbacher2-3/+257
Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Olaf Kirch <okir@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: fix accounting bug in the case of a truncated RPC messageTrond Myklebust2-16/+41
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: Lazy RPC receive buffer allocationOlaf Kirch2-7/+35
Signed-off-by: Olaf Kirch <okir@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: Allow multiple RPC client programs to share the same transportAndreas Gruenbacher3-0/+44
Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Olaf Kirch <okir@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: Return -EPFNOSUPPORT for RPC programs that are unavailableAndreas Gruenbacher1-10/+14
Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Olaf Kirch <okir@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: [PATCH] improve rpcauthauth_create error returnsJ. Bruce Fields3-9/+16
Currently we return -ENOMEM for every single failure to create a new auth. This is actually accurate for auth_null and auth_unix, but for auth_gss it's a bit confusing. Allow rpcauth_create (and the ->create methods) to return errors. With this patch, the user may sometimes see an EINVAL instead. Whee. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: Don't fall back from krb5p to krb5iJ. Bruce Fields1-3/+2
We shouldn't be silently falling back from krb5p to krb5i. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: Shrink struct rpc_task by switching to wait_on_bit()Trond Myklebust1-13/+18
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: Make rpc_create_client() probe server for RPC program+version ↵Trond Myklebust2-2/+59
support Ensure that we don't create an RPC client without checking that the server does indeed support the RPC program + version that we are trying to set up. This enables us to immediately return an error to "mount" if it turns out that the server is only supporting NFSv2, when we requested NFSv3 or NFSv4. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: Make rpc_create_client() destroy the transport on failure.Trond Myklebust3-4/+2
This saves us a couple of lines of cleanup code for each call. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: Ensure XDR iovec length is initialized correctly in call_headerTrond Myklebust3-4/+19
Fix up call_header() so that it calls xdr_adjust_iovec(). Fix calculation of the scratch buffer length in xdr_init_encode(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[PATCH] RPC: Fix a race with rpc_restart_call()Trond Myklebust1-23/+30
If the task->tk_exit() wants to restart the RPC call after delaying then the current RPC code will clobber the timer by calling rpc_delete_timer() immediately after re-entering the loop in __rpc_execute(). Problem noticed by Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-22[NETFILTER]: Fix handling of ICMP packets (RELATED) in ipt_CLUSTERIP target.Harald Welte1-1/+1
Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-22[PATCH] Fix extra double quote in IPV4 KconfigKumar Gala1-1/+1
Kconfig option had an extra double quote at the end of the line which was causing in warning when building. Signed-off-by: Kumar Gala <kumar.gala@freescale.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21[IPV4]: Fix fib_trie.c's args to fib_dump_info().David S. Miller1-1/+1
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21[NETFILTER]: Fix ip6t_LOG sit tunnel loggingPatrick McHardy1-35/+19
Sit tunnel logging is currently broken: MAC=01:23:45:67:89:ab->01:23:45:47:89:ac TUNNEL=123.123. 0.123-> 12.123. 6.123 Apart from the broken IP address, MAC addresses are printed differently for sit tunnels than for everything else. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21[NETFILTER]: Drop conntrack reference in ip_call_ra_chain()/ip_mr_input()Patrick McHardy2-0/+2
Drop reference before handing the packets to raw_rcv() Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21[NETFILTER]: Check TCP checksum in ipt_REJECTPatrick McHardy1-1/+12
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21[NETFILTER]: Avoid unncessary checksum validation in UDP connection trackingKeir Fraser1-0/+1
Signed-off-by: Keir Fraser <Keir.Fraser@xl.cam.ac.uk> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21[NETFILTER]: Missing owner-field initialization in ip6table_rawPatrick McHardy1-2/+4
I missed this one when fixing up iptable_raw. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21[NETFILTER]: expectation timeouts are compulsoryPhil Oester1-9/+5
Since expectation timeouts were made compulsory [1], there is no need to check for them in ip_conntrack_expect_insert. [1] https://lists.netfilter.org/pipermail/netfilter-devel/2005-January/018143.html Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21[NETFILTER]: Restore netfilter assumptions in IPv6 multicastPatrick McHardy1-21/+26
Netfilter assumes that skb->data == skb->nh.ipv6h Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21[NETFILTER]: Kill nf_debugPatrick McHardy11-220/+0
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21[NETFILTER]: Kill lockhelp.hPatrick McHardy19-168/+158
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21[IPV6]: multicast join and miscDavid L Stevens2-2/+24
Here is a simplified version of the patch to fix a bug in IPv6 multicasting. It: 1) adds existence check & EADDRINUSE error for regular joins 2) adds an exception for EADDRINUSE in the source-specific multicast join (where a prior join is ok) 3) adds a missing/needed read_lock on sock_mc_list; would've raced with destroying the socket on interface down without 4) adds a "leave group" in the (INCLUDE, empty) source filter case. This frees unneeded socket buffer memory, but also prevents an inappropriate interaction among the 8 socket options that mess with this. Some would fail as if in the group when you aren't really. Item #4 had a locking bug in the last version of this patch; rather than removing the idev->lock read lock only, I've simplified it to remove all lock state in the path and treat it as a direct "leave group" call for the (INCLUDE,empty) case it covers. Tested on an MP machine. :-) Much thanks to HoerdtMickael <hoerdt@clarinet.u-strasbg.fr> who reported the original bug. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>