aboutsummaryrefslogtreecommitdiffstats
path: root/net/netfilter/nf_conntrack_standalone.c
AgeCommit message (Collapse)AuthorFilesLines
2024-05-03netfilter: Remove the now superfluous sentinel elements from ctl_table arrayJoel Granados1-5/+1
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/) * Remove sentinel elements from ctl_table structs * Remove instances where an array element is zeroed out to make it look like a sentinel. This is not longer needed and is safe after commit c899710fe7f9 ("networking: Update to register_net_sysctl_sz") added the array size to the ctl_table registration * Remove the need for having __NF_SYSCTL_CT_LAST_SYSCTL as the sysctl array size is now in NF_SYSCTL_CT_LAST_SYSCTL * Remove extra element in ctl_table arrays declarations Acked-by: Kees Cook <keescook@chromium.org> # loadpin & yama Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-22sysctl: treewide: constify ctl_table_header::ctl_table_argThomas Weißschuh1-1/+1
To be able to constify instances of struct ctl_tables it is necessary to remove ways through which non-const versions are exposed from the sysctl core. One of these is the ctl_table_arg member of struct ctl_table_header. Constify this reference as a prerequisite for the full constification of struct ctl_table instances. No functional change. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-15netfilter: Update to register_net_sysctl_szJoel Granados1-1/+3
Move from register_net_sysctl to register_net_sysctl_sz for all the netfilter related files. Do this while making sure to mirror the NULL assignments with a table_size of zero for the unprivileged users. We need to move to the new function in preparation for when we change SIZE_MAX to ARRAY_SIZE() in the register_net_sysctl macro. Failing to do so would erroneously allow ARRAY_SIZE() to be called on a pointer. We hold off the SIZE_MAX to ARRAY_SIZE change until we have migrated all the relevant net sysctl registering functions to register_net_sysctl_sz in subsequent commits. Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-05-10netfilter: conntrack: fix possible bug_on with enable_hooks=1Florian Westphal1-1/+2
I received a bug report (no reproducer so far) where we trip over 712 rcu_read_lock(); 713 ct_hook = rcu_dereference(nf_ct_hook); 714 BUG_ON(ct_hook == NULL); // here In nf_conntrack_destroy(). First turn this BUG_ON into a WARN. I think it was triggered via enable_hooks=1 flag. When this flag is turned on, the conntrack hooks are registered before nf_ct_hook pointer gets assigned. This opens a short window where packets enter the conntrack machinery, can have skb->_nfct set up and a subsequent kfree_skb might occur before nf_ct_hook is set. Call nf_conntrack_init_end() to set nf_ct_hook before we register the pernet ops. Fixes: ba3fbe663635 ("netfilter: nf_conntrack: provide modparam to always register conntrack hooks") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-02-01netfilter: conntrack: remote a return value of the 'seq_print_acct' function.Gavrilov Ilia1-8/+4
The static 'seq_print_acct' function always returns 0. Change the return value to 'void' and remove unnecessary checks. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 1ca9e41770cb ("netfilter: Remove uses of seq_<foo> return values") Signed-off-by: Ilia.Gavrilov <Ilia.Gavrilov@infotecs.ru> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-01-24netfilter: conntrack: unify established states for SCTP pathsSriram Yagnaraman1-8/+0
An SCTP endpoint can start an association through a path and tear it down over another one. That means the initial path will not see the shutdown sequence, and the conntrack entry will remain in ESTABLISHED state for 5 days. By merging the HEARTBEAT_ACKED and ESTABLISHED states into one ESTABLISHED state, there remains no difference between a primary or secondary path. The timeout for the merged ESTABLISHED state is set to 210 seconds (hb_interval * max_path_retrans + rto_max). So, even if a path doesn't see the shutdown sequence, it will expire in a reasonable amount of time. With this change in place, there is now more than one state from which we can transition to ESTABLISHED, COOKIE_ECHOED and HEARTBEAT_SENT, so handle the setting of ASSURED bit whenever a state change has happened and the new state is ESTABLISHED. Removed the check for dir==REPLY since the transition to ESTABLISHED can happen only in the reply direction. Fixes: 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.") Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-01-24Revert "netfilter: conntrack: add sctp DATA_SENT state"Sriram Yagnaraman1-8/+0
This reverts commit (bff3d0534804: "netfilter: conntrack: add sctp DATA_SENT state") Using DATA/SACK to detect a new connection on secondary/alternate paths works only on new connections, while a HEARTBEAT is required on connection re-use. It is probably consistent to wait for HEARTBEAT to create a secondary connection in conntrack. Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-12-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-nextJakub Kicinski1-0/+8
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next 1) Incorrect error check in nft_expr_inner_parse(), from Dan Carpenter. 2) Add DATA_SENT state to SCTP connection tracking helper, from Sriram Yagnaraman. 3) Consolidate nf_confirm for ipv4 and ipv6, from Florian Westphal. 4) Add bitmask support for ipset, from Vishwanath Pai. 5) Handle icmpv6 redirects as RELATED, from Florian Westphal. 6) Add WARN_ON_ONCE() to impossible case in flowtable datapath, from Li Qiong. 7) A large batch of IPVS updates to replace timer-based estimators by kthreads to scale up wrt. CPUs and workload (millions of estimators). Julian Anastasov says: This patchset implements stats estimation in kthread context. It replaces the code that runs on single CPU in timer context every 2 seconds and causing latency splats as shown in reports [1], [2], [3]. The solution targets setups with thousands of IPVS services, destinations and multi-CPU boxes. Spread the estimation on multiple (configured) CPUs and multiple time slots (timer ticks) by using multiple chains organized under RCU rules. When stats are not needed, it is recommended to use run_estimation=0 as already implemented before this change. RCU Locking: - As stats are now RCU-locked, tot_stats, svc and dest which hold estimator structures are now always freed from RCU callback. This ensures RCU grace period after the ip_vs_stop_estimator() call. Kthread data: - every kthread works over its own data structure and all such structures are attached to array. For now we limit kthreads depending on the number of CPUs. - even while there can be a kthread structure, its task may not be running, eg. before first service is added or while the sysctl var is set to an empty cpulist or when run_estimation is set to 0 to disable the estimation. - the allocated kthread context may grow from 1 to 50 allocated structures for timer ticks which saves memory for setups with small number of estimators - a task and its structure may be released if all estimators are unlinked from its chains, leaving the slot in the array empty - every kthread data structure allows limited number of estimators. Kthread 0 is also used to initially calculate the max number of estimators to allow in every chain considering a sub-100 microsecond cond_resched rate. This number can be from 1 to hundreds. - kthread 0 has an additional job of optimizing the adding of estimators: they are first added in temp list (est_temp_list) and later kthread 0 distributes them to other kthreads. The optimization is based on the fact that newly added estimator should be estimated after 2 seconds, so we have the time to offload the adding to chain from controlling process to kthread 0. - to add new estimators we use the last added kthread context (est_add_ktid). The new estimators are linked to the chains just before the estimated one, based on add_row. This ensures their estimation will start after 2 seconds. If estimators are added in bursts, common case if all services and dests are initially configured, we may spread the estimators to more chains and as result, reducing the initial delay below 2 seconds. Many thanks to Jiri Wiesner for his valuable comments and for spending a lot of time reviewing and testing the changes on different platforms with 48-256 CPUs and 1-8 NUMA nodes under different cpufreq governors. The new IPVS estimators do not use workqueue infrastructure because: - The estimation can take long time when using multiple IPVS rules (eg. millions estimator structures) and especially when box has multiple CPUs due to the for_each_possible_cpu usage that expects packets from any CPU. With est_nice sysctl we have more control how to prioritize the estimation kthreads compared to other processes/kthreads that have latency requirements (such as servers). As a benefit, we can see these kthreads in top and decide if we will need some further control to limit their CPU usage (max number of structure to estimate per kthread). - with kthreads we run code that is read-mostly, no write/lock operations to process the estimators in 2-second intervals. - work items are one-shot: as estimators are processed every 2 seconds, they need to be re-added every time. This again loads the timers (add_timer) if we use delayed works, as there are no kthreads to do the timings. [1] Report from Yunhong Jiang: https://lore.kernel.org/netdev/D25792C1-1B89-45DE-9F10-EC350DC04ADC@gmail.com/ [2] https://marc.info/?l=linux-virtual-server&m=159679809118027&w=2 [3] Report from Dust: https://archive.linuxvirtualserver.org/html/lvs-devel/2020-12/msg00000.html * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: ipvs: run_estimation should control the kthread tasks ipvs: add est_cpulist and est_nice sysctl vars ipvs: use kthreads for stats estimation ipvs: use u64_stats_t for the per-cpu counters ipvs: use common functions for stats allocation ipvs: add rcu protection to stats netfilter: flowtable: add a 'default' case to flowtable datapath netfilter: conntrack: set icmpv6 redirects as RELATED netfilter: ipset: Add support for new bitmask parameter netfilter: conntrack: merge ipv4+ipv6 confirm functions netfilter: conntrack: add sctp DATA_SENT state netfilter: nft_inner: fix IS_ERR() vs NULL check ==================== Link: https://lore.kernel.org/r/20221211101204.1751-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-30netfilter: conntrack: add sctp DATA_SENT stateSriram Yagnaraman1-0/+8
SCTP conntrack currently assumes that the SCTP endpoints will probe secondary paths using HEARTBEAT before sending traffic. But, according to RFC 9260, SCTP endpoints can send any traffic on any of the confirmed paths after SCTP association is up. SCTP endpoints that sends INIT will confirm all peer addresses that upper layer configures, and the SCTP endpoint that receives COOKIE_ECHO will only confirm the address it sent the INIT_ACK to. So, we can have a situation where the INIT sender can start to use secondary paths without the need to send HEARTBEAT. This patch allows DATA/SACK packets to create new connection tracking entry. A new state has been added to indicate that a DATA/SACK chunk has been seen in the original direction - SCTP_CONNTRACK_DATA_SENT. State transitions mostly follows the HEARTBEAT_SENT, except on receiving HEARTBEAT/HEARTBEAT_ACK/DATA/SACK in the reply direction. State transitions in original direction: - DATA_SENT behaves similar to HEARTBEAT_SENT for all chunks, except that it remains in DATA_SENT on receving HEARTBEAT, HEARTBEAT_ACK/DATA/SACK chunks State transitions in reply direction: - DATA_SENT behaves similar to HEARTBEAT_SENT for all chunks, except that it moves to HEARTBEAT_ACKED on receiving HEARTBEAT/HEARTBEAT_ACK/DATA/SACK chunks Note: This patch still doesn't solve the problem when the SCTP endpoint decides to use primary paths for association establishment but uses a secondary path for association shutdown. We still have to depend on timeout for connections to expire in such a case. Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-11-18netfilter: conntrack: Fix data-races around ct markDaniel Xu1-1/+1
nf_conn:mark can be read from and written to in parallel. Use READ_ONCE()/WRITE_ONCE() for reads and writes to prevent unwanted compiler optimizations. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-31netfilter: remove nf_conntrack_helper sysctl and modparam togglesPablo Neira Ayuso1-10/+0
__nf_ct_try_assign_helper() remains in place but it now requires a template to configure the helper. A toggle to disable automatic helper assignment was added by: a9006892643a ("netfilter: nf_ct_helper: allow to disable automatic helper assignment") in 2012 to address the issues described in "Secure use of iptables and connection tracking helpers". Automatic conntrack helper assignment was disabled by: 3bb398d925ec ("netfilter: nf_ct_helper: disable automatic helper assignment") back in 2016. This patch removes the sysctl and modparam toggles, users now have to rely on explicit conntrack helper configuration via ruleset. Update tools/testing/selftests/netfilter/nft_conntrack_helper.sh to check that auto-assignment does not happen anymore. Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-07netfilter: conntrack: fix crash due to confirmed bit load reorderingFlorian Westphal1-0/+3
Kajetan Puchalski reports crash on ARM, with backtrace of: __nf_ct_delete_from_lists nf_ct_delete early_drop __nf_conntrack_alloc Unlike atomic_inc_not_zero, refcount_inc_not_zero is not a full barrier. conntrack uses SLAB_TYPESAFE_BY_RCU, i.e. it is possible that a 'newly' allocated object is still in use on another CPU: CPU1 CPU2 encounter 'ct' during hlist walk delete_from_lists refcount drops to 0 kmem_cache_free(ct); __nf_conntrack_alloc() // returns same object refcount_inc_not_zero(ct); /* might fail */ /* If set, ct is public/in the hash table */ test_bit(IPS_CONFIRMED_BIT, &ct->status); In case CPU1 already set refcount back to 1, refcount_inc_not_zero() will succeed. The expected possibilities for a CPU that obtained the object 'ct' (but no reference so far) are: 1. refcount_inc_not_zero() fails. CPU2 ignores the object and moves to the next entry in the list. This happens for objects that are about to be free'd, that have been free'd, or that have been reallocated by __nf_conntrack_alloc(), but where the refcount has not been increased back to 1 yet. 2. refcount_inc_not_zero() succeeds. CPU2 checks the CONFIRMED bit in ct->status. If set, the object is public/in the table. If not, the object must be skipped; CPU2 calls nf_ct_put() to un-do the refcount increment and moves to the next object. Parallel deletion from the hlists is prevented by a 'test_and_set_bit(IPS_DYING_BIT, &ct->status);' check, i.e. only one cpu will do the unlink, the other one will only drop its reference count. Because refcount_inc_not_zero is not a full barrier, CPU2 may try to delete an object that is not on any list: 1. refcount_inc_not_zero() successful (refcount inited to 1 on other CPU) 2. CONFIRMED test also successful (load was reordered or zeroing of ct->status not yet visible) 3. delete_from_lists unlinks entry not on the hlist, because IPS_DYING_BIT is 0 (already cleared). 2) is already wrong: CPU2 will handle a partially initited object that is supposed to be private to CPU1. Add needed barriers when refcount_inc_not_zero() is successful. It also inserts a smp_wmb() before the refcount is set to 1 during allocation. Because other CPU might still see the object, refcount_set(1) "resurrects" it, so we need to make sure that other CPUs will also observe the right content. In particular, the CONFIRMED bit test must only pass once the object is fully initialised and either in the hash or about to be inserted (with locks held to delay possible unlink from early_drop or gc worker). I did not change flow_offload_alloc(), as far as I can see it should call refcount_inc(), not refcount_inc_not_zero(): the ct object is attached to the skb so its refcount should be >= 1 in all cases. v2: prefer smp_acquire__after_ctrl_dep to smp_rmb (Will Deacon). v3: keep smp_acquire__after_ctrl_dep close to refcount_inc_not_zero call add comment in nf_conntrack_netlink, no control dependency there due to locks. Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/all/Yr7WTfd6AVTQkLjI@e126311.manchester.arm.com/ Reported-by: Kajetan Puchalski <kajetan.puchalski@arm.com> Diagnosed-by: Will Deacon <will@kernel.org> Fixes: 719774377622 ("netfilter: conntrack: convert to refcount_t api") Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Will Deacon <will@kernel.org>
2022-05-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-nextDavid S. Miller1-1/+1
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next This is v2 including deadlock fix in conntrack ecache rework reported by Jakub Kicinski. The following patchset contains Netfilter updates for net-next, mostly updates to conntrack from Florian Westphal. 1) Add a dedicated list for conntrack event redelivery. 2) Include event redelivery list in conntrack dumps of dying type. 3) Remove per-cpu dying list for event redelivery, not used anymore. 4) Add netns .pre_exit to cttimeout to zap timeout objects before synchronize_rcu() call. 5) Remove nf_ct_unconfirmed_destroy. 6) Add generation id for conntrack extensions for conntrack timeout and helpers. 7) Detach timeout policy from conntrack on cttimeout module removal. 8) Remove __nf_ct_unconfirmed_destroy. 9) Remove unconfirmed list. 10) Remove unconditional local_bh_disable in init_conntrack(). 11) Consolidate conntrack iterator nf_ct_iterate_cleanup(). 12) Detect if ctnetlink listeners exist to short-circuit event path early. 13) Un-inline nf_ct_ecache_ext_add(). 14) Add nf_conntrack_events autodetect ctnetlink listener mode and make it default. 15) Add nf_ct_ecache_exist() to check for event cache extension. 16) Extend flowtable reverse route lookup to include source, iif, tos and mark, from Sven Auhagen. 17) Do not verify zero checksum UDP packets in nf_reject, from Kevin Mitchell. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-13netfilter: conntrack: add nf_conntrack_events autodetect modeFlorian Westphal1-1/+1
This adds the new nf_conntrack_events=2 mode and makes it the default. This leverages the earlier flag in struct net to allow to avoid the event extension as long as no event listener is active in the namespace. This avoids, for most cases, allocation of ct->ext area. A followup patch will take further advantage of this by avoiding calls down into the event framework if the extension isn't present. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-04-27netfilter: conntrack: fix udp offload timeout sysctlVolodymyr Mytnyk1-1/+1
`nf_flowtable_udp_timeout` sysctl option is available only if CONFIG_NFT_FLOW_OFFLOAD enabled. But infra for this flow offload UDP timeout was added under CONFIG_NF_FLOW_TABLE config option. So, if you have CONFIG_NFT_FLOW_OFFLOAD disabled and CONFIG_NF_FLOW_TABLE enabled, the `nf_flowtable_udp_timeout` is not present in sysfs. Please note, that TCP flow offload timeout sysctl option is present even CONFIG_NFT_FLOW_OFFLOAD is disabled. I suppose it was a typo in commit that adds UDP flow offload timeout and CONFIG_NF_FLOW_TABLE should be used instead. Fixes: 975c57504da1 ("netfilter: conntrack: Introduce udp offload timeout configuration") Signed-off-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-01-09netfilter: conntrack: convert to refcount_t apiFlorian Westphal1-2/+2
Convert nf_conn reference counting from atomic_t to refcount_t based api. refcount_t api provides more runtime sanity checks and will warn on certain constructs, e.g. refcount_inc() on a zero reference count, which usually indicates use-after-free. For this reason template allocation is changed to init the refcount to 1, the subsequenct add operations are removed. Likewise, init_conntrack() is changed to set the initial refcount to 1 instead refcount_inc(). This is safe because the new entry is not (yet) visible to other cpus. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-09-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfJakub Kicinski1-2/+2
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Protect nft_ct template with global mutex, from Pavel Skripkin. 2) Two recent commits switched inet rt and nexthop exception hashes from jhash to siphash. If those two spots are problematic then conntrack is affected as well, so switch voer to siphash too. While at it, add a hard upper limit on chain lengths and reject insertion if this is hit. Patches from Florian Westphal. 3) Fix use-after-scope in nf_socket_ipv6 reported by KASAN, from Benjamin Hesmans. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf: netfilter: socket: icmp6: fix use-after-scope netfilter: refuse insertion if chain has grown too large netfilter: conntrack: switch to siphash netfilter: conntrack: sanitize table size default settings netfilter: nft_ct: protect nft_ct_pcpu_template_refcnt with mutex ==================== Link: https://lore.kernel.org/r/20210903163020.13741-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-30netfilter: refuse insertion if chain has grown too largeFlorian Westphal1-2/+2
Also add a stat counter for this that gets exported both via old /proc interface and ctnetlink. Assuming the old default size of 16536 buckets and max hash occupancy of 64k, this results in 128k insertions (origin+reply), so ~8 entries per chain on average. The revised settings in this series will result in about two entries per bucket on average. This allows a hard-limit ceiling of 64. This is not tunable at the moment, but its possible to either increase nf_conntrack_buckets or decrease nf_conntrack_max to reduce average lengths. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-08-30netfilter: add netfilter hooks to SRv6 data planeRyoga Saito1-0/+15
This patch introduces netfilter hooks for solving the problem that conntrack couldn't record both inner flows and outer flows. This patch also introduces a new sysctl toggle for enabling lightweight tunnel netfilter hooks. Signed-off-by: Ryoga Saito <contact@proelbtn.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-08-06netfilter: conntrack: remove offload_pickup sysctl againFlorian Westphal1-16/+0
These two sysctls were added because the hardcoded defaults (2 minutes, tcp, 30 seconds, udp) turned out to be too low for some setups. They appeared in 5.14-rc1 so it should be fine to remove it again. Marcelo convinced me that there should be no difference between a flow that was offloaded vs. a flow that was not wrt. timeout handling. Thus the default is changed to those for TCP established and UDP stream, 5 days and 120 seconds, respectively. Marcelo also suggested to account for the timeout value used for the offloading, this avoids increase beyond the value in the conntrack-sysctl and will also instantly expire the conntrack entry with altered sysctls. Example: nf_conntrack_udp_timeout_stream=60 nf_flowtable_udp_timeout=60 This will remove offloaded udp flows after one minute, rather than two. An earlier version of this patch also cleared the ASSURED bit to allow nf_conntrack to evict the entry via early_drop (i.e., table full). However, it looks like we can safely assume that connection timed out via HW is still in established state, so this isn't needed. Quoting Oz: [..] the hardware sends all packets with a set FIN flags to sw. [..] Connections that are aged in hardware are expected to be in the established state. In case it turns out that back-to-sw-path transition can occur for 'dodgy' connections too (e.g., one side disappeared while software-path would have been in RETRANS timeout), we can adjust this later. Cc: Oz Shlomo <ozsh@nvidia.com> Cc: Paul Blakey <paulb@nvidia.com> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-07-06netfilter: conntrack: add new sysctl to disable RST checkAli Abdallah1-0/+10
This patch adds a new sysctl tcp_ignore_invalid_rst to disable marking out of segments RSTs as INVALID. Signed-off-by: Ali Abdallah <aabdallah@suse.de> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-07netfilter: conntrack: Introduce udp offload timeout configurationOz Shlomo1-0/+22
UDP connections may be offloaded from nf conntrack to nf flow table. Offloaded connections are aged after 30 seconds of inactivity. Once aged, ownership is returned to conntrack with a hard coded pickup time of 30 seconds, after which the connection may be deleted. eted. The current aging intervals may be too aggressive for some users. Provide users with the ability to control the nf flow table offload aging and pickup time intervals via sysctl parameter as a pre-step for configuring the nf flow table GC timeout intervals. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-07netfilter: conntrack: Introduce tcp offload timeout configurationOz Shlomo1-0/+24
TCP connections may be offloaded from nf conntrack to nf flow table. Offloaded connections are aged after 30 seconds of inactivity. Once aged, ownership is returned to conntrack with a hard coded pickup time of 120 seconds, after which the connection may be deleted. eted. The current aging intervals may be too aggressive for some users. Provide users with the ability to control the nf flow table offload aging and pickup time intervals via sysctl parameter as a pre-step for configuring the nf flow table GC timeout intervals. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-06-07netfilter: nftables: add nf_ct_pernet() helper functionPablo Neira Ayuso1-5/+3
Consolidate call to net_generic(net, nf_conntrack_net_id) in this wrapper function. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller1-31/+35
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Add vlan match and pop actions to the flowtable offload, patches from wenxu. 2) Reduce size of the netns_ct structure, which itself is embedded in struct net Make netns_ct a read-mostly structure. Patches from Florian Westphal. 3) Add FLOW_OFFLOAD_XMIT_UNSPEC to skip dst check from garbage collector path, as required by the tc CT action. From Roi Dayan. 4) VLAN offload fixes for nftables: Allow for matching on both s-vlan and c-vlan selectors. Fix match of VLAN id due to incorrect byteorder. Add a new routine to properly populate flow dissector ethertypes. 5) Missing keys in ip{6}_route_me_harder() results in incorrect routes. This includes an update for selftest infra. Patches from Ido Schimmel. 6) Add counter hardware offload support through FLOW_CLS_STATS. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-0/+1
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c - keep the ZC code, drop the code related to reinit net/bridge/netfilter/ebtables.c - fix build after move to net_generic Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-13netfilter: conntrack: convert sysctls to u8Florian Westphal1-24/+18
log_invalid sysctl allows values of 0 to 255 inclusive so we no longer need a range check: the min/max values can be removed. This also removes all member variables that were moved to net_generic data in previous patches. This reduces size of netns_ct struct by one cache line. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-13netfilter: conntrack: move ct counter to net_generic dataFlorian Westphal1-3/+14
Its only needed from slowpath (sysctl, ctnetlink, gc worker) and when a new conntrack object is allocated. Furthermore, each write dirties the otherwise read-mostly pernet data in struct net.ct, which are accessed from packet path. Move it to the net_generic data. This makes struct netns_ct read-mostly. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-13netfilter: conntrack: move autoassign_helper sysctl to net_generic dataFlorian Westphal1-4/+3
While at it, make it an u8, no need to use an integer for a boolean. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-04-12netfilter: conntrack: Make global sysctls readonly in non-init netnsJonathon Reinhart1-8/+2
These sysctls point to global variables: - NF_SYSCTL_CT_MAX (&nf_conntrack_max) - NF_SYSCTL_CT_EXPECT_MAX (&nf_ct_expect_max) - NF_SYSCTL_CT_BUCKETS (&nf_conntrack_htable_size_user) Because their data pointers are not updated to point to per-netns structures, they must be marked read-only in a non-init_net ns. Otherwise, changes in any net namespace are reflected in (leaked into) all other net namespaces. This problem has existed since the introduction of net namespaces. The current logic marks them read-only only if the net namespace is owned by an unprivileged user (other than init_user_ns). Commit d0febd81ae77 ("netfilter: conntrack: re-visit sysctls in unprivileged namespaces") "exposes all sysctls even if the namespace is unpriviliged." Since we need to mark them readonly in any case, we can forego the unprivileged user check altogether. Fixes: d0febd81ae77 ("netfilter: conntrack: re-visit sysctls in unprivileged namespaces") Signed-off-by: Jonathon Reinhart <Jonathon.Reinhart@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-06netfilter: conntrack: move sysctl pointer to net_generic infraFlorian Westphal1-4/+6
No need to keep this in struct net, place it in the net_generic data. The sysctl pointer is removed from struct net in a followup patch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-03-31netfilter: conntrack: do not print icmpv6 as unknown via /procPablo Neira Ayuso1-0/+1
/proc/net/nf_conntrack shows icmpv6 as unknown. Fixes: 09ec82f5af99 ("netfilter: conntrack: remove protocol name from l4proto struct") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-01-10netfilter: conntrack: fix reading nf_conntrack_bucketsJesper Dangaard Brouer1-0/+3
The old way of changing the conntrack hashsize runtime was through changing the module param via file /sys/module/nf_conntrack/parameters/hashsize. This was extended to sysctl change in commit 3183ab8997a4 ("netfilter: conntrack: allow increasing bucket size via sysctl too"). The commit introduced second "user" variable nf_conntrack_htable_size_user which shadow actual variable nf_conntrack_htable_size. When hashsize is changed via module param this "user" variable isn't updated. This results in sysctl net/netfilter/nf_conntrack_buckets shows the wrong value when users update via the old way. This patch fix the issue by always updating "user" variable when reading the proc file. This will take care of changes to the actual variable without sysctl need to be aware. Fixes: 3183ab8997a4 ("netfilter: conntrack: allow increasing bucket size via sysctl too") Reported-by: Yoel Caspersen <yoel@kviknet.dk> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-09-22netfilter: conntrack: proc: rename stat columnFlorian Westphal1-2/+2
Rename 'searched' column to 'clashres' (same len). conntrack(8) using the old /proc interface (ctnetlink not available) shows: cpu=0 entries=4784 clashres=2292 [..] Another alternative is to add another column, but this increases the number of always-0 columns. Fixes: bc92470413f3af1 ("netfilter: conntrack: add clash resolution stat counter") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-28netfilter: conntrack: add clash resolution stat counterFlorian Westphal1-1/+1
There is a misconception about what "insert_failed" means. We increment this even when a clash got resolved, so it might not indicate a problem. Add a dedicated counter for clash resolution and only increment insert_failed if a clash cannot be resolved. For the old /proc interface, export this in place of an older stat that got removed a while back. For ctnetlink, export this with a new attribute. Also correct an outdated comment that implies we add a duplicate tuple -- we only add the (unique) reply direction. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-28netfilter: conntrack: remove ignore statsFlorian Westphal1-1/+1
This counter increments when nf_conntrack_in sees a packet that already has a conntrack attached or when the packet is marked as UNTRACKED. Neither is an error. The former is normal for loopback traffic. The second happens for certain ICMPv6 packets or when nftables/ip(6)tables rules are in place. In case someone needs to count UNTRACKED packets, or packets that are marked as untracked before conntrack_in this can be done with both nftables and ip(6)tables rules. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-07-22netfilter: Use fallthrough pseudo-keywordGustavo A. R. Silva1-1/+1
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-05-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller1-1/+1
Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-05-01 (v2) The following pull-request contains BPF updates for your *net-next* tree. We've added 61 non-merge commits during the last 6 day(s) which contain a total of 153 files changed, 6739 insertions(+), 3367 deletions(-). The main changes are: 1) pulled work.sysctl from vfs tree with sysctl bpf changes. 2) bpf_link observability, from Andrii. 3) BTF-defined map in map, from Andrii. 4) asan fixes for selftests, from Andrii. 5) Allow bpf_map_lookup_elem for SOCKMAP and SOCKHASH, from Jakub. 6) production cloudflare classifier as a selftes, from Lorenz. 7) bpf_ktime_get_*_ns() helper improvements, from Maciej. 8) unprivileged bpftool feature probe, from Quentin. 9) BPF_ENABLE_STATS command, from Song. 10) enable bpf_[gs]etsockopt() helpers for sock_ops progs, from Stanislav. 11) enable a bunch of common helpers for cg-device, sysctl, sockopt progs, from Stanislav. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-27sysctl: pass kernel pointers to ->proc_handlerChristoph Hellwig1-1/+1
Instead of having all the sysctl handlers deal with user pointers, which is rather hairy in terms of the BPF interaction, copy the input to and from userspace in common code. This also means that the strings are always NUL-terminated by the common code, making the API a little bit safer. As most handler just pass through the data to one of the common handlers a lot of the changes are mechnical. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-26netfilter: nf_conntrack: add IPS_HW_OFFLOAD status bitBodong Wang1-1/+3
This bit indicates that the conntrack entry is offloaded to hardware flow table. nf_conntrack entry will be tagged with [HW_OFFLOAD] if it's offload to hardware. cat /proc/net/nf_conntrack ipv4 2 tcp 6 \ src=1.1.1.17 dst=1.1.1.16 sport=56394 dport=5001 \ src=1.1.1.16 dst=1.1.1.17 sport=5001 dport=56394 [HW_OFFLOAD] \ mark=0 zone=0 use=3 Note that HW_OFFLOAD/OFFLOAD/ASSURED are mutually exclusive. Changelog: * V1->V2: - Remove check of lastused from stats. It was meant for cases such as removing driver module while traffic still running. Better to handle such cases from garbage collector. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-15netfilter: conntrack: re-visit sysctls in unprivileged namespacesFlorian Westphal1-11/+8
since commit b884fa46177659 ("netfilter: conntrack: unify sysctl handling") conntrack no longer exposes most of its sysctls (e.g. tcp timeouts settings) to network namespaces that are not owned by the initial user namespace. This patch exposes all sysctls even if the namespace is unpriviliged. compared to a 4.19 kernel, the newly visible and writeable sysctls are: net.netfilter.nf_conntrack_acct net.netfilter.nf_conntrack_timestamp .. to allow to enable accouting and timestamp extensions. net.netfilter.nf_conntrack_events .. to turn off conntrack event notifications. net.netfilter.nf_conntrack_checksum .. to disable checksum validation. net.netfilter.nf_conntrack_log_invalid .. to enable logging of packets deemed invalid by conntrack. newly visible sysctls that are only exported as read-only: net.netfilter.nf_conntrack_count .. current number of conntrack entries living in this netns. net.netfilter.nf_conntrack_max .. global upperlimit (maximum size of the table). net.netfilter.nf_conntrack_buckets .. size of the conntrack table (hash buckets). net.netfilter.nf_conntrack_expect_max .. maximum number of permitted expectations in this netns. net.netfilter.nf_conntrack_helper .. conntrack helper auto assignment. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-04netfilter: nf_conntrack: ct_cpu_seq_next should increase position indexVasily Averin1-1/+1
If .next function does not change position index, following .show function will repeat output related to current position index. Cc: stable@vger.kernel.org Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code ...") Link: https://bugzilla.kernel.org/show_bug.cgi?id=206283 Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-09-13netfilter: conntrack: move code to linux/nf_conntrack_common.h.Jeremy Sowden1-1/+0
Move some `struct nf_conntrack` code from linux/skbuff.h to linux/nf_conntrack_common.h. Together with a couple of helpers for getting and setting skb->_nfct, it allows us to remove CONFIG_NF_CONNTRACK checks from net/netfilter/nf_conntrack.h. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-09-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-0/+5
r8152 conflicts are the NAPI fixes in 'net' overlapping with some tasklet stuff in net-next Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27netfilter: conntrack: make sysctls per-namespace againFlorian Westphal1-0/+5
When I merged the extension sysctl tables with the main one I forgot to reset them on netns creation. They currently read/write init_net settings. Fixes: d912dec12428 ("netfilter: conntrack: merge acct and helper sysctl table with main one") Fixes: cb2833ed0044 ("netfilter: conntrack: merge ecache and timestamp sysctl tables with main one") Reported-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-03netfilter: conntrack: use shared sysctl constantsMatteo Croce1-18/+16
Use shared sysctl variables for zero and one constants, as in commit eec4844fae7c ("proc/sysctl: add shared variables for range check") Fixes: 8f14c99c7eda ("netfilter: conntrack: limit sysctl setting for boolean options") Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-30netfilter: conntrack: limit sysctl setting for boolean optionsTonghao Zhang1-15/+33
We use the zero and one to limit the boolean options setting. After this patch we only set 0 or 1 to boolean options for nf conntrack sysctl. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-28netfilter: conntrack: fix error path in nf_conntrack_pernet_init()Cong Wang1-2/+2
When nf_ct_netns_get() fails, it should clean up itself, its caller doesn't need to call nf_conntrack_fini_net(). nf_conntrack_init_net() is called after registering sysctl and proc, so its cleanup function should be called before unregistering sysctl and proc. Fixes: ba3fbe663635 ("netfilter: nf_conntrack: provide modparam to always register conntrack hooks") Fixes: b884fa461776 ("netfilter: conntrack: unify sysctl handling") Reported-and-tested-by: syzbot+fcee88b2d87f0539dfe9@syzkaller.appspotmail.com Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: nf_conntrack: provide modparam to always register conntrack hooksPablo Neira Ayuso1-4/+24
The connection tracking hooks can be optionally registered per netns when conntrack is specifically invoked from the ruleset since 0c66dc1ea3f0 ("netfilter: conntrack: register hooks in netns when needed by ruleset"). Then, since 4d3a57f23dec ("netfilter: conntrack: do not enable connection tracking unless needed"), the default behaviour is changed to always register them on demand. This patch provides a toggle that allows users to always register them. Without this toggle, in order to use conntrack for statistics collection, you need a dummy rule that refers to conntrack, eg. iptables -I INPUT -m state --state NEW This patch allows users to restore the original behaviour via modparam, ie. always register connection tracking, eg. modprobe nf_conntrack enable_hooks=1 Hence, no dummy rule is required. Reported-by: Laura Garcia <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: conntrack: remove nf_ct_l4proto_find_getFlorian Westphal1-2/+1
Its now same as __nf_ct_l4proto_find(), so rename that to nf_ct_l4proto_find and use it everywhere. It never returns NULL and doesn't need locks or reference counts. Before this series: 302824 net/netfilter/nf_conntrack.ko 21504 net/netfilter/nf_conntrack_proto_gre.ko text data bss dec hex filename 6281 1732 4 8017 1f51 nf_conntrack_proto_gre.ko 108356 20613 236 129205 1f8b5 nf_conntrack.ko After: 294864 net/netfilter/nf_conntrack.ko text data bss dec hex filename 106979 19557 240 126776 1ef38 nf_conntrack.ko so, even with builtin gre, total size got reduced. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-18netfilter: conntrack: unify sysctl handlingFlorian Westphal1-11/+383
Due to historical reasons, all l4 trackers register their own sysctls. This leads to copy&pasted boilerplate code, that does exactly same thing, just with different data structure. Place all of this in a single file. This allows to remove the various ctl_table pointers from the ct_netns structure and reduces overall code size. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-21netfilter: conntrack: merge ecache and timestamp sysctl tables with main oneFlorian Westphal1-0/+33
Similar to previous change, this time for eache and timestamp. Unlike helper and acct, these can be disabled at build time, so they need ifdef guards. Next patch will remove a few (now obsolete) functions. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-21netfilter: conntrack: merge acct and helper sysctl table with main oneFlorian Westphal1-1/+20
Needless copy&paste, just handle all in one. Next patch will handle acct and timestamp, which have similar functions. Intentionally leaves cruft behind, will be cleaned up in a followup patch. The obsolete sysctl pointers in netns_ct struct are left in place and removed in a single change, as changes to netns trigger rebuild of almost all files. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-21netfilter: conntrack: add mnemonics for sysctl tableFlorian Westphal1-11/+20
Its a bit hard to see what table[3] really lines up with, so add human-readable mnemonics and use them for initialisation. This makes it easier to see e.g. which sysctls are not exported to unprivileged userns. objdiff shows no changes. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-21netfilter: conntrack: un-export seq_print_acctFlorian Westphal1-0/+18
Only one caller, just place it where its needed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-20netfilter: conntrack: remove l3->l4 mapping informationFlorian Westphal1-1/+1
l4 protocols are demuxed by l3num, l4num pair. However, almost all l4 trackers are l3 agnostic. Only exceptions are: - gre, icmp (ipv4 only) - icmpv6 (ipv6 only) This commit gets rid of the l3 mapping, l4 trackers can now be looked up by their IPPROTO_XXX value alone, which gets rid of the additional l3 indirection. For icmp, ipcmp6 and gre, add a check on state->pf and return -NF_ACCEPT in case we're asked to track e.g. icmpv6-in-ipv4, this seems more fitting than using the generic tracker. Additionally we can kill the 2nd l4proto definitions that were needed for v4/v6 split -- they are now the same so we can use single l4proto struct for each protocol, rather than two. The EXPORT_SYMBOLs can be removed as all these object files are part of nf_conntrack with no external references. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-09-17netfilter: remove obsolete need_conntrack stubFlorian Westphal1-7/+0
as of a0ae2562c6c4b27 ("netfilter: conntrack: remove l3proto abstraction") there are no users anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-17netfilter: conntrack: remove l3proto abstractionFlorian Westphal1-10/+4
This unifies ipv4 and ipv6 protocol trackers and removes the l3proto abstraction. This gets rid of all l3proto indirect calls and the need to do a lookup on the function to call for l3 demux. It increases module size by only a small amount (12kbyte), so this reduces size because nf_conntrack.ko is useless without either nf_conntrack_ipv4 or nf_conntrack_ipv6 module. before: text data bss dec hex filename 7357 1088 0 8445 20fd nf_conntrack_ipv4.ko 7405 1084 4 8493 212d nf_conntrack_ipv6.ko 72614 13689 236 86539 1520b nf_conntrack.ko 19K nf_conntrack_ipv4.ko 19K nf_conntrack_ipv6.ko 179K nf_conntrack.ko after: text data bss dec hex filename 79277 13937 236 93450 16d0a nf_conntrack.ko 191K nf_conntrack.ko Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-07-16netfilter: conntrack: remove ctnetlink callbacks from l3 protocol trackersFlorian Westphal1-10/+4
handle everything from ctnetlink directly. After all these years we still only support ipv4 and ipv6, so it seems reasonable to remove l3 protocol tracker support and instead handle ipv4/ipv6 from a common, always builtin inet tracker. Step 1: Get rid of all the l3proto->func() calls. Start with ctnetlink, then move on to packet-path ones. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-16proc: introduce proc_create_net{,_data}Christoph Hellwig1-29/+4
Variants of proc_create{,_data} that directly take a struct seq_operations and deal with network namespaces in ->open and ->release. All callers of proc_create + seq_open_net converted over, and seq_{open,release}_net are removed entirely. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-03-27net: Drop pernet_operations::asyncKirill Tkhai1-1/+0
Synchronous pernet_operations are not allowed anymore. All are asynchronous. So, drop the structure member. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-26net: Use octal not symbolic permissionsJoe Perches1-1/+1
Prefer the direct use of octal for permissions. Done with checkpatch -f --types=SYMBOLIC_PERMS --fix-inplace and some typing. Miscellanea: o Whitespace neatening around these conversions. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-05net: Convert nf_conntrack_net_opsKirill Tkhai1-0/+1
These pernet_operations register and unregister sysctl and /proc entries. Exit batch method also waits till all per-net conntracks are dead. Thus, they are safe to be marked as async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19netfilter: delete /proc THIS_MODULE referencesAlexey Dobriyan1-2/+0
/proc has been ignoring struct file_operations::owner field for 10 years. Specifically, it started with commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where inode->i_fop is initialized with proxy struct file_operations for regular files: - if (de->proc_fops) - inode->i_fop = de->proc_fops; + if (de->proc_fops) { + if (S_ISREG(inode->i_mode)) + inode->i_fop = &proc_reg_file_ops; + else + inode->i_fop = de->proc_fops; + } VFS stopped pinning module at this point. # ipvs Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: nf_conntrack: add IPS_OFFLOAD status bitPablo Neira Ayuso1-4/+8
This new bit tells us that the conntrack entry is owned by the flow table offload infrastructure. # cat /proc/net/nf_conntrack ipv4 2 tcp 6 src=10.141.10.2 dst=147.75.205.195 sport=36392 dport=443 src=147.75.205.195 dst=192.168.2.195 sport=443 dport=36392 [OFFLOAD] mark=0 zone=0 use=2 Note the [OFFLOAD] tag in the listing. The timer of such conntrack entries look like stopped from userspace. In practise, to make sure the conntrack entry does not go away, the conntrack timer is periodically set to an arbitrary large value that gets refreshed on every iteration from the garbage collector, so it never expires- and they display no internal state in the case of TCP flows. This allows us to save a bitcheck from the packet path via nf_ct_is_expired(). Conntrack entries that have been offloaded to the flow table infrastructure cannot be deleted/flushed via ctnetlink. The flow table infrastructure is also responsible for releasing this conntrack entry. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-09-04net: Replace NF_CT_ASSERT() with WARN_ON().Varsha Rao1-3/+3
This patch removes NF_CT_ASSERT() and instead uses WARN_ON(). Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
2017-08-24netfilter: conntrack: place print_tuple in procfs partFlorian Westphal1-2/+56
CONFIG_NF_CONNTRACK_PROCFS is deprecated, no need to use a function pointer in the trackers for this. Place the printf formatting in the one place that uses it. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-08-24netfilter: conntrack: remove protocol name from l4proto structFlorian Westphal1-1/+16
no need to waste storage for something that is only needed in one place and can be deduced from protocol number. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-08-24netfilter: conntrack: remove protocol name from l3proto structFlorian Westphal1-1/+11
no need to waste storage for something that is only needed in one place and can be deduced from protocol number. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-07-31netfilter: conntrack: do not enable connection tracking unless neededFlorian Westphal1-10/+0
Discussion during NFWS 2017 in Faro has shown that the current conntrack behaviour is unreasonable. Even if conntrack module is loaded on behalf of a single net namespace, its turned on for all namespaces, which is expensive. Commit 481fa373476 ("netfilter: conntrack: add nf_conntrack_default_on sysctl") attempted to provide an alternative to the 'default on' behaviour by adding a sysctl to change it. However, as Eric points out, the sysctl only becomes available once the module is loaded, and then its too late. So we either have to move the sysctl to the core, or, alternatively, change conntrack to become active only once the rule set requires this. This does the latter, conntrack is only enabled when a rule needs it. Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-07netfilter: Use seq_puts()/seq_putc() where possiblesimran singhal1-3/+3
For string without format specifiers, use seq_puts(). For seq_printf("\n"), use seq_putc('\n'). Signed-off-by: simran singhal <singhalsimran0@gmail.com> Acked-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-02netfilter: merge ctinfo into nfct pointer storage areaFlorian Westphal1-0/+3
After this change conntrack operations (lookup, creation, matching from ruleset) only access one instead of two sk_buff cache lines. This works for normal conntracks because those are allocated from a slab that guarantees hw cacheline or 8byte alignment (whatever is larger) so the 3 bits needed for ctinfo won't overlap with nf_conn addresses. Template allocation now does manual address alignment (see previous change) on arches that don't have sufficent kmalloc min alignment. Some spots intentionally use skb->_nfct instead of skb_nfct() helpers, this is to avoid undoing the skb_nfct() use when we remove untracked conntrack object in the future. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-04netfilter: conntrack: add nf_conntrack_default_on sysctlFlorian Westphal1-0/+10
This switch (default on) can be used to disable automatic registration of connection tracking functionality in newly created network namespaces. This means that when net namespace goes down (or the tracker protocol module is unloaded) we *might* have to unregister the hooks. We can either add another per-netns variable that tells if the hooks got registered by default, or, alternatively, just call the protocol _put() function and have the callee deal with a possible 'extra' put() operation that doesn't pair with a get() one. This uses the latter approach, i.e. a put() without a get has no effect. Conntrack is still enabled automatically regardless of the new sysctl setting if the new net namespace requires connection tracking, e.g. when NAT rules are created. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-25netfilter: evict stale entries when user reads /proc/net/nf_conntrackFlorian Westphal1-0/+5
Fabian reports a possible conntrack memory leak (could not reproduce so far), however, one minor issue can be easily resolved: > cat /proc/net/nf_conntrack | wc -l = 5 > 4 minutes required to clean up the table. We should not report those timed-out entries to the user in first place. And instead of just skipping those timed-out entries while iterating over the table we can also zap them (we already do this during ctnetlink walks, but I forgot about the /proc interface). Fixes: f330a7fdbe16 ("netfilter: conntrack: get rid of conntrack timer") Reported-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-12netfilter: conntrack: remove packet hotpath statsFlorian Westphal1-4/+4
These counters sit in hot path and do show up in perf, this is especially true for 'found' and 'searched' which get incremented for every packet processed. Information like searched=212030105 new=623431 found=333613 delete=623327 does not seem too helpful nowadays: - on busy systems found and searched will overflow every few hours (these are 32bit integers), other more busy ones every few days. - for debugging there are better methods, such as iptables' trace target, the conntrack log sysctls. Nowadays we also have perf tool. This removes packet path stat counters except those that are expected to be 0 (or close to 0) on a normal system, e.g. 'insert_failed' (race happened) or 'invalid' (proto tracker rejects). The insert stat is retained for the ctnetlink case. The found stat is retained for the tuple-is-taken check when NAT has to determine if it needs to pick a different source address. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller1-2/+1
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree. Most relevant updates are the removal of per-conntrack timers to use a workqueue/garbage collection approach instead from Florian Westphal, the hash and numgen expression for nf_tables from Laura Garcia, updates on nf_tables hash set to honor the NLM_F_EXCL flag, removal of ip_conntrack sysctl and many other incremental updates on our Netfilter codebase. More specifically, they are: 1) Retrieve only 4 bytes to fetch ports in case of non-linear skb transport area in dccp, sctp, tcp, udp and udplite protocol conntrackers, from Gao Feng. 2) Missing whitespace on error message in physdev match, from Hangbin Liu. 3) Skip redundant IPv4 checksum calculation in nf_dup_ipv4, from Liping Zhang. 4) Add nf_ct_expires() helper function and use it, from Florian Westphal. 5) Replace opencoded nf_ct_kill() call in IPVS conntrack support, also from Florian. 6) Rename nf_tables set implementation to nft_set_{name}.c 7) Introduce the hash expression to allow arbitrary hashing of selector concatenations, from Laura Garcia Liebana. 8) Remove ip_conntrack sysctl backward compatibility code, this code has been around for long time already, and we have two interfaces to do this already: nf_conntrack sysctl and ctnetlink. 9) Use nf_conntrack_get_ht() helper function whenever possible, instead of opencoding fetch of hashtable pointer and size, patch from Liping Zhang. 10) Add quota expression for nf_tables. 11) Add number generator expression for nf_tables, this supports incremental and random generators that can be combined with maps, very useful for load balancing purpose, again from Laura Garcia Liebana. 12) Fix a typo in a debug message in FTP conntrack helper, from Colin Ian King. 13) Introduce a nft_chain_parse_hook() helper function to parse chain hook configuration, this is used by a follow up patch to perform better chain update validation. 14) Add rhashtable_lookup_get_insert_key() to rhashtable and use it from the nft_set_hash implementation to honor the NLM_F_EXCL flag. 15) Missing nulls check in nf_conntrack from nf_conntrack_tuple_taken(), patch from Florian Westphal. 16) Don't use the DYING bit to know if the conntrack event has been already delivered, instead a state variable to track event re-delivery states, also from Florian. 17) Remove the per-conntrack timer, use the workqueue approach that was discussed during the NFWS, from Florian Westphal. 18) Use the netlink conntrack table dump path to kill stale entries, again from Florian. 19) Add a garbage collector to get rid of stale conntracks, from Florian. 20) Reschedule garbage collector if eviction rate is high. 21) Get rid of the __nf_ct_kill_acct() helper. 22) Use ARPHRD_ETHER instead of hardcoded 1 from ARP logger. 23) Make nf_log_set() interface assertive on unsupported families. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-17netfilter: conntrack: do not dump other netns's conntrack entries via procLiping Zhang1-0/+4
We should skip the conntracks that belong to a different namespace, otherwise other unrelated netns's conntrack entries will be dumped via /proc/net/nf_conntrack. Fixes: 56d52d4892d0 ("netfilter: conntrack: use a single hashtable for all namespaces") Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-08-12netfilter: use_nf_conn_expires helper in more placesFlorian Westphal1-2/+1
... so we don't need to touch all of these places when we get rid of the timer in nf_conn. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-11netfilter: conntrack: fix race between nf_conntrack proc read and hash resizeLiping Zhang1-5/+9
When we do "cat /proc/net/nf_conntrack", and meanwhile resize the conntrack hash table via /sys/module/nf_conntrack/parameters/hashsize, race will happen, because reader can observe a newly allocated hash but the old size (or vice versa). So oops will happen like follows: BUG: unable to handle kernel NULL pointer dereference at 0000000000000017 IP: [<ffffffffa0418e21>] seq_print_acct+0x11/0x50 [nf_conntrack] Call Trace: [<ffffffffa0412f4e>] ? ct_seq_show+0x14e/0x340 [nf_conntrack] [<ffffffff81261a1c>] seq_read+0x2cc/0x390 [<ffffffff812a8d62>] proc_reg_read+0x42/0x70 [<ffffffff8123bee7>] __vfs_read+0x37/0x130 [<ffffffff81347980>] ? security_file_permission+0xa0/0xc0 [<ffffffff8123cf75>] vfs_read+0x95/0x140 [<ffffffff8123e475>] SyS_read+0x55/0xc0 [<ffffffff817c2572>] entry_SYSCALL_64_fastpath+0x1a/0xa4 It is very easy to reproduce this kernel crash. 1. open one shell and input the following cmds: while : ; do echo $RANDOM > /sys/module/nf_conntrack/parameters/hashsize done 2. open more shells and input the following cmds: while : ; do cat /proc/net/nf_conntrack done 3. just wait a monent, oops will happen soon. The solution in this patch is based on Florian's Commit 5e3c61f98175 ("netfilter: conntrack: fix lookup race during hash resize"). And add a wrapper function nf_conntrack_get_ht to get hash and hsize suggested by Florian Westphal. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller1-5/+31
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next, they are: 1) Don't use userspace datatypes in bridge netfilter code, from Tobin Harding. 2) Iterate only once over the expectation table when removing the helper module, instead of once per-netns, from Florian Westphal. 3) Extra sanitization in xt_hook_ops_alloc() to return error in case we ever pass zero hooks, xt_hook_ops_alloc(): 4) Handle NFPROTO_INET from the logging core infrastructure, from Liping Zhang. 5) Autoload loggers when TRACE target is used from rules, this doesn't change the behaviour in case the user already selected nfnetlink_log as preferred way to print tracing logs, also from Liping Zhang. 6) Conntrack slabs with SLAB_HWCACHE_ALIGN to allow rearranging fields by cache lines, increases the size of entries in 11% per entry. From Florian Westphal. 7) Skip zone comparison if CONFIG_NF_CONNTRACK_ZONES=n, from Florian. 8) Remove useless defensive check in nf_logger_find_get() from Shivani Bhardwaj. 9) Remove zone extension as place it in the conntrack object, this is always include in the hashing and we expect more intensive use of zones since containers are in place. Also from Florian Westphal. 10) Owner match now works from any namespace, from Eric Bierdeman. 11) Make sure we only reply with TCP reset to TCP traffic from nf_reject_ipv4, patch from Liping Zhang. 12) Introduce --nflog-size to indicate amount of network packet bytes that are copied to userspace via log message, from Vishwanath Pai. This obsoletes --nflog-range that has never worked, it was designed to achieve this but it has never worked. 13) Introduce generic macros for nf_tables object generation masks. 14) Use generation mask in table, chain and set objects in nf_tables. This allows fixes interferences with ongoing preparation phase of the commit protocol and object listings going on at the same time. This update is introduced in three patches, one per object. 15) Check if the object is active in the next generation for element deactivation in the rbtree implementation, given that deactivation happens from the commit phase path we have to observe the future status of the object. 16) Support for deletion of just added elements in the hash set type. 17) Allow to resize hashtable from /proc entry, not only from the obscure /sys entry that maps to the module parameter, from Florian Westphal. 18) Get rid of NFT_BASECHAIN_DISABLED, this code is not exercised anymore since we tear down the ruleset whenever the netdevice goes away. 19) Support for matching inverted set lookups, from Arturo Borrero. 20) Simplify the iptables_mangle_hook() by removing a superfluous extra branch. 21) Introduce ether_addr_equal_masked() and use it from the netfilter codebase, from Joe Perches. 22) Remove references to "Use netfilter MARK value as routing key" from the Netfilter Kconfig description given that this toggle doesn't exists already for 10 years, from Moritz Sichert. 23) Introduce generic NF_INVF() and use it from the xtables codebase, from Joe Perches. 24) Setting logger to NONE via /proc was not working unless explicit nul-termination was included in the string. This fixes seems to leave the former behaviour there, so we don't break backward. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-24netfilter: conntrack: allow increasing bucket size via sysctl tooFlorian Westphal1-5/+31
No need to restrict this to module parameter. We export a copy of the real hash size -- when user alters the value we allocate the new table, copy entries etc before we update the real size to the requested one. This is also needed because the real size is used by concurrent readers and cannot be changed without synchronizing the conntrack generation seqcnt. We only allow changing this value from the initial net namespace. Tested using http-client-benchmark vs. httpterm with concurrent while true;do echo $RANDOM > /proc/sys/net/netfilter/nf_conntrack_buckets done Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-2/+0
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Fix incorrect timestamp in nfnetlink_queue introduced when addressing y2038 safe timestamp, from Florian Westphal. 2) Get rid of leftover conntrack definition from the previous merge window, oneliner from Florian. 3) Make nf_queue handler pernet to resolve race on dereferencing the hook state structure with netns removal, from Eric Biederman. 4) Ensure clean exit on unregistered helper ports, from Taehee Yoo. 5) Restore FLOWI_FLAG_KNOWN_NH in nf_dup_ipv6. This got lost while generalizing xt_TEE to add packet duplication support in nf_tables, from Paolo Abeni. 6) Insufficient netlink NFTA_SET_TABLE attribute check in nf_tables_getset(), from Phil Turnbull. 7) Reject helper registration on duplicated ports via modparams. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25netfilter: conntrack: remove leftover binary sysctl defineFlorian Westphal1-2/+0
Users got removed in f8572d8f2a2ba ("sysctl net: Remove unused binary sysctl code"). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-05-05netfilter: conntrack: use a single hashtable for all namespacesFlorian Westphal1-8/+5
We already include netns address in the hash and compare the netns pointers during lookup, so even if namespaces have overlapping addresses entries will be spread across the table. Assuming 64k bucket size, this change saves 0.5 mbyte per namespace on a 64bit system. NAT bysrc and expectation hash is still per namespace, those will changed too soon. Future patch will also make conntrack object slab cache global again. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-11-25netfilter: Set /proc/net entries owner to root in namespacePhilip Whineray1-0/+7
Various files are owned by root with 0440 permission. Reading them is impossible in an unprivileged user namespace, interfering with firewall tools. For instance, iptables-save relies on /proc/net/ip_tables_names contents to dump only loaded tables. This patch assigned ownership of the following files to root in the current namespace: - /proc/net/*_tables_names - /proc/net/*_tables_matches - /proc/net/*_tables_targets - /proc/net/nf_conntrack - /proc/net/nf_conntrack_expect - /proc/net/netfilter/nfnetlink_log A mapping for root must be available, so this order should be followed: unshare(CLONE_NEWUSER); /* Setup the mapping */ unshare(CLONE_NEWNET); Signed-off-by: Philip Whineray <phil@firehol.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-18netfilter: nf_conntrack: add direction support for zonesDaniel Borkmann1-4/+26
This work adds a direction parameter to netfilter zones, so identity separation can be performed only in original/reply or both directions (default). This basically opens up the possibility of doing NAT with conflicting IP address/port tuples from multiple, isolated tenants on a host (e.g. from a netns) without requiring each tenant to NAT twice resp. to use its own dedicated IP address to SNAT to, meaning overlapping tuples can be made unique with the zone identifier in original direction, where the NAT engine will then allocate a unique tuple in the commonly shared default zone for the reply direction. In some restricted, local DNAT cases, also port redirection could be used for making the reply traffic unique w/o requiring SNAT. The consensus we've reached and discussed at NFWS and since the initial implementation [1] was to directly integrate the direction meta data into the existing zones infrastructure, as opposed to the ct->mark approach we proposed initially. As we pass the nf_conntrack_zone object directly around, we don't have to touch all call-sites, but only those, that contain equality checks of zones. Thus, based on the current direction (original or reply), we either return the actual id, or the default NF_CT_DEFAULT_ZONE_ID. CT expectations are direction-agnostic entities when expectations are being compared among themselves, so we can only use the identifier in this case. Note that zone identifiers can not be included into the hash mix anymore as they don't contain a "stable" value that would be equal for both directions at all times, f.e. if only zone->id would unconditionally be xor'ed into the table slot hash, then replies won't find the corresponding conntracking entry anymore. If no particular direction is specified when configuring zones, the behaviour is exactly as we expect currently (both directions). Support has been added for the CT netlink interface as well as the x_tables raw CT target, which both already offer existing interfaces to user space for the configuration of zones. Below a minimal, simplified collision example (script in [2]) with netperf sessions: +--- tenant-1 ---+ mark := 1 | netperf |--+ +----------------+ | CT zone := mark [ORIGINAL] [ip,sport] := X +--------------+ +--- gateway ---+ | mark routing |--| SNAT |-- ... + +--------------+ +---------------+ | +--- tenant-2 ---+ | ~~~|~~~ | netperf |--+ +-----------+ | +----------------+ mark := 2 | netserver |------ ... + [ip,sport] := X +-----------+ [ip,port] := Y On the gateway netns, example: iptables -t raw -A PREROUTING -j CT --zone mark --zone-dir ORIGINAL iptables -t nat -A POSTROUTING -o <dev> -j SNAT --to-source <ip> --random-fully iptables -t mangle -A PREROUTING -m conntrack --ctdir ORIGINAL -j CONNMARK --save-mark iptables -t mangle -A POSTROUTING -m conntrack --ctdir REPLY -j CONNMARK --restore-mark conntrack dump from gateway netns: netperf -H 10.1.1.2 -t TCP_STREAM -l60 -p12865,5555 from each tenant netns tcp 6 431995 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=1 src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=1024 [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1 tcp 6 431994 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=2 src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=5555 [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=1 tcp 6 299 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=39438 dport=33768 zone-orig=1 src=10.1.1.2 dst=10.1.1.1 sport=33768 dport=39438 [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1 tcp 6 300 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=32889 dport=40206 zone-orig=2 src=10.1.1.2 dst=10.1.1.1 sport=40206 dport=32889 [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=2 Taking this further, test script in [2] creates 200 tenants and runs original-tuple colliding netperf sessions each. A conntrack -L dump in the gateway netns also confirms 200 overlapping entries, all in ESTABLISHED state as expected. I also did run various other tests with some permutations of the script, to mention some: SNAT in random/random-fully/persistent mode, no zones (no overlaps), static zones (original, reply, both directions), etc. [1] http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/57412/ [2] https://paste.fedoraproject.org/242835/65657871/ Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-11netfilter: nf_conntrack: push zone object into functionsDaniel Borkmann1-5/+12
This patch replaces the zone id which is pushed down into functions with the actual zone object. It's a bigger one-time change, but needed for later on extending zones with a direction parameter, and thus decoupling this additional information from all call-sites. No functional changes in this patch. The default zone becomes a global const object, namely nf_ct_zone_dflt and will be returned directly in various cases, one being, when there's f.e. no zoning support. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-11-05netfilter: Remove checks of seq_printf() return valuesSteven Rostedt (Red Hat)1-31/+29
The return value of seq_printf() is soon to be removed. Remove the checks from seq_printf() in favor of seq_has_overflowed(). Link: http://lkml.kernel.org/r/20141104142236.GA10239@salvia Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: netfilter-devel@vger.kernel.org Cc: coreteam@netfilter.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-05netfilter: Convert print_tuple functions to return voidJoe Perches1-8/+7
Since adding a new function to seq_file (seq_has_overflowed()) there isn't any value for functions called from seq_show to return anything. Remove the int returns of the various print_tuple/<foo>_print_tuple functions. Link: http://lkml.kernel.org/p/f2e8cf8df433a197daa62cbaf124c900c708edc7.1412031505.git.joe@perches.com Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: netfilter-devel@vger.kernel.org Cc: coreteam@netfilter.org Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-05netfilter: Remove return values for print_conntrack callbacksSteven Rostedt (Red Hat)1-2/+2
The seq_printf() and friends are having their return values removed. The print_conntrack() returns the result of seq_printf(), which is meaningless when seq_printf() returns void. Might as well remove the return values of print_conntrack() as well. Link: http://lkml.kernel.org/r/20141029220107.465008329@goodmis.org Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: netfilter-devel@vger.kernel.org Cc: coreteam@netfilter.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-08-22net: use ktime_get_ns() and ktime_get_real_ns() helpersEric Dumazet1-1/+1
ktime_get_ns() replaces ktime_to_ns(ktime_get()) ktime_get_real_ns() replaces ktime_to_ns(ktime_get_real()) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13net: Convert uses of typedef ctl_table to struct ctl_tableJoe Perches1-2/+2
Reduce the uses of this unnecessary typedef. Done via perl script: $ git grep --name-only -w ctl_table net | \ xargs perl -p -i -e '\ sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \ s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge' Reflow the modified lines that now exceed 80 columns. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-18netfilter: add my copyright statementsPatrick McHardy1-0/+1
Add copyright statements to all netfilter files which have had significant changes done by myself in the past. Some notes: - nf_conntrack_ecache.c was incorrectly attributed to Rusty and Netfilter Core Team when it got split out of nf_conntrack_core.c. The copyrights even state a date which lies six years before it was written. It was written in 2005 by Harald and myself. - net/ipv{4,6}/netfilter.c, net/netfitler/nf_queue.c were missing copyright statements. I've added the copyright statement from net/netfilter/core.c, where this code originated - for nf_conntrack_proto_tcp.c I've also added Jozsef, since I didn't want it to give the wrong impression Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-04-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-0/+1
Conflicts: drivers/nfc/microread/mei.c net/netfilter/nfnetlink_queue_core.c Pull in 'net' to get Eric Biederman's AF_UNIX fix, upon which some cleanups are going to go on-top. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27netfilter: nf_conntrack: fix error return codeWei Yongjun1-0/+1
Fix to return a negative error code from the error handling case instead of 0, as returned elsewhere in function nf_conntrack_standalone_init(). Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-19netfilter: nf_conntrack: speed up module removal path if netns in useVladimir Davydov1-6/+10
The patch introduces nf_conntrack_cleanup_net_list(), which cleanups nf_conntrack for a list of netns and calls synchronize_net() only once for them all. This should reduce netns destruction time. I've measured cleanup time for 1k dummy net ns. Here are the results: <without the patch> # modprobe nf_conntrack # time modprobe -r nf_conntrack real 0m10.337s user 0m0.000s sys 0m0.376s <with the patch> # modprobe nf_conntrack # time modprobe -r nf_conntrack real 0m5.661s user 0m0.000s sys 0m0.216s Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Patrick McHardy <kaber@trash.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-02-18net: proc: change proc_net_remove to remove_proc_entryGao feng1-2/+2
proc_net_remove is only used to remove proc entries that under /proc/net,it's not a general function for removing proc entries of netns. if we want to remove some proc entries which under /proc/net/stat/, we still need to call remove_proc_entry. this patch use remove_proc_entry to replace proc_net_remove. we can remove proc_net_remove after this patch. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-18net: proc: change proc_net_fops_create to proc_createGao feng1-1/+1
Right now, some modules such as bonding use proc_create to create proc entries under /proc/net/, and other modules such as ipv4 use proc_net_fops_create. It looks a little chaos.this patch changes all of proc_net_fops_create to proc_create. we can remove proc_net_fops_create after this patch. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-23netfilter: nf_conntrack: fix compilation if sysctl are disabledPablo Neira Ayuso1-2/+9
In (f94161c netfilter: nf_conntrack: move initialization out of pernet operations), some ifdefs were missing for sysctl dependent code. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-23netfilter: nf_conntrack: move initialization out of pernet operationsGao feng1-21/+35
nf_conntrack initialization and cleanup codes happens in pernet operations function. This task should be done in module_init/exit. We can't use init_net to identify if it's the right time to initialize or cleanup since we cannot make assumption on the order netns are created/destroyed. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-12netfilter: nf_conntrack: fix BUG_ON while removing nf_conntrack with netnsPablo Neira Ayuso1-0/+1
canqun zhang reported that we're hitting BUG_ON in the nf_conntrack_destroy path when calling kfree_skb while rmmod'ing the nf_conntrack module. Currently, the nf_ct_destroy hook is being set to NULL in the destroy path of conntrack.init_net. However, this is a problem since init_net may be destroyed before any other existing netns (we cannot assume any specific ordering while releasing existing netns according to what I read in recent emails). Thanks to Gao feng for initial patch to address this issue. Reported-by: canqun zhang <canqunzhang@gmail.com> Acked-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-11-18net: Don't export sysctls to unprivileged usersEric W. Biederman1-0/+4
In preparation for supporting the creation of network namespaces by unprivileged users, modify all of the per net sysctl exports and refuse to allow them to unprivileged users. This makes it safe for unprivileged users in general to access per net sysctls, and allows sysctls to be exported to unprivileged users on an individual basis as they are deemed safe. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20net: Convert all sysctl registrations to register_net_sysctlEric W. Biederman1-8/+2
This results in code with less boiler plate that is a bit easier to read. Additionally stops us from using compatibility code in the sysctl core, hastening the day when the compatibility code can be removed. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20net: Move all of the network sysctls without a namespace into init_net.Eric W. Biederman1-3/+3
This makes it clearer which sysctls are relative to your current network namespace. This makes it a little less error prone by not exposing sysctls for the initial network namespace in other namespaces. This is the same way we handle all of our other network interfaces to userspace and I can't honestly remember why we didn't do this for sysctls right from the start. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-27netfilter: provide config option to disable ancient procfs partsJan Engelhardt1-2/+2
Using /proc/net/nf_conntrack has been deprecated in favour of the conntrack(8) tool. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-04-17netfilter: nf_conntrack_standalone: Fix set-but-unused variables.David S. Miller1-1/+1
The variable 'ret' is set but unused in ct_seq_show(). This was obviously meant to be used to propagate error codes to the caller, so make it so. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-19netfilter: nf_conntrack: fix lifetime display for disabled connectionsPatrick McHardy1-17/+12
When no tstamp extension exists, ct_delta_time() returns -1, which is then assigned to an u64 and tested for negative values to decide whether to display the lifetime. This obviously doesn't work, use a s64 and merge the two minor functions into one. Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-19netfilter: nf_conntrack_tstamp: add flow-based timestamp extensionPablo Neira Ayuso1-0/+41
This patch adds flow-based timestamping for conntracks. This conntrack extension is disabled by default. Basically, we use two 64-bits variables to store the creation timestamp once the conntrack has been confirmed and the other to store the deletion time. This extension is disabled by default, to enable it, you have to: echo 1 > /proc/sys/net/netfilter/nf_conntrack_timestamp This patch allows to save memory for user-space flow-based loogers such as ulogd2. In short, ulogd2 does not need to keep a hashtable with the conntrack in user-space to know when they were created and destroyed, instead we use the kernel timestamp. If we want to have a sane IPFIX implementation in user-space, this nanosecs resolution timestamps are also useful. Other custom user-space applications can benefit from this via libnetfilter_conntrack. This patch modifies the /proc output to display the delta time in seconds since the flow start. You can also obtain the flow-start date by means of the conntrack-tools. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-01-13Merge branch 'master' of ↵Simon Horman1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 into HEAD
2011-01-06netfilter: fix export secctx error handlingPablo Neira Ayuso1-1/+1
In 1ae4de0cdf855305765592647025bde55e85e451, the secctx was exported via the /proc/net/netfilter/nf_conntrack and ctnetlink interfaces instead of the secmark. That patch introduced the use of security_secid_to_secctx() which may return a non-zero value on error. In one of my setups, I have NF_CONNTRACK_SECMARK enabled but no security modules. Thus, security_secid_to_secctx() returns a negative value that results in the breakage of the /proc and `conntrack -L' outputs. To fix this, we skip the inclusion of secctx if the aforementioned function fails. This patch also fixes the dynamic netlink message size calculation if security_secid_to_secctx() returns an error, since its logic is also wrong. This problem exists in Linux kernel >= 2.6.37. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15netfilter: add __rcu annotationsEric Dumazet1-3/+6
Add some __rcu annotations and use helpers to reduce number of sparse warnings (CONFIG_SPARSE_RCU_POINTER=y) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-21secmark: export secctx, drop secmark in procfsEric Paris1-3/+25
The current secmark code exports a secmark= field which just indicates if there is special labeling on a packet or not. We drop this field as it isn't particularly useful and instead export a new field secctx= which is the actual human readable text label. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: James Morris <jmorris@namei.org>
2010-05-13netfilter: cleanup printk messagesStephen Hemminger1-1/+1
Make sure all printk messages have a severity level. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-23netfilter: nf_conntrack: extend with extra stat counterJesper Dangaard Brouer1-3/+4
I suspect an unfortunatly series of events occuring under a DDoS attack, in function __nf_conntrack_find() nf_contrack_core.c. Adding a stats counter to see if the search is restarted too often. Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo1-0/+1
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-02-15netfilter: nf_conntrack: add support for "conntrack zones"Patrick McHardy1-0/+6
Normally, each connection needs a unique identity. Conntrack zones allow to specify a numerical zone using the CT target, connections in different zones can use the same identity. Example: iptables -t raw -A PREROUTING -i veth0 -j CT --zone 1 iptables -t raw -A OUTPUT -o veth1 -j CT --zone 1 Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-08netfilter: nf_conntrack: fix hash resizing with namespacesPatrick McHardy1-3/+4
As noticed by Jon Masters <jonathan@jonmasters.org>, the conntrack hash size is global and not per namespace, but modifiable at runtime through /sys/module/nf_conntrack/hashsize. Changing the hash size will only resize the hash in the current namespace however, so other namespaces will use an invalid hash size. This can cause crashes when enlarging the hashsize, or false negative lookups when shrinking it. Move the hash size into the per-namespace data and only use the global hash size to initialize the per-namespace value when instanciating a new namespace. Additionally restrict hash resizing to init_net for now as other namespaces are not handled currently. Cc: stable@kernel.org Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-12sysctl net: Remove unused binary sysctl codeEric W. Biederman1-11/+3
Now that sys_sysctl is a compatiblity wrapper around /proc/sys all sysctl strategy routines, and all ctl_name and strategy entries in the sysctl tables are unused, and can be revmoed. In addition neigh_sysctl_register has been modified to no longer take a strategy argument and it's callers have been modified not to pass one. Cc: "David Miller" <davem@davemloft.net> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: netdev@vger.kernel.org Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2009-03-25netfilter: nf_conntrack: use SLAB_DESTROY_BY_RCU and get rid of call_rcu()Eric Dumazet1-24/+33
Use "hlist_nulls" infrastructure we added in 2.6.29 for RCUification of UDP & TCP. This permits an easy conversion from call_rcu() based hash lists to a SLAB_DESTROY_BY_RCU one. Avoiding call_rcu() delay at nf_conn freeing time has numerous gains. First, it doesnt fill RCU queues (up to 10000 elements per cpu). This reduces OOM possibility, if queued elements are not taken into account This reduces latency problems when RCU queue size hits hilimit and triggers emergency mode. - It allows fast reuse of just freed elements, permitting better use of CPU cache. - We delete rcu_head from "struct nf_conn", shrinking size of this structure by 8 or 16 bytes. This patch only takes care of "struct nf_conn". call_rcu() is still used for less critical conntrack parts, that may be converted later if necessary. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-12-29cpumask: prepare for iterators to only go to nr_cpu_ids/nr_cpumask_bits: netRusty Russell1-2/+2
In future all cpumask ops will only be valid (in general) for bit numbers < nr_cpu_ids. So use that instead of NR_CPUS in iterators and other comparisons. This is always safe: no cpu number can be >= nr_cpu_ids, and nr_cpu_ids is initialized to NR_CPUS at boot. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-03net: '&' reduxAlexey Dobriyan1-8/+8
I want to compile out proc_* and sysctl_* handlers totally and stub them to NULL depending on config options, however usage of & will prevent this, since taking adress of NULL pointer will break compilation. So, drop & in front of every ->proc_handler and every ->strategy handler, it was never needed in fact. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-08netfilter: netns nf_conntrack: per-netns ↵Alexey Dobriyan1-3/+3
net.netfilter.nf_conntrack_log_invalid sysctl Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08netfilter: netns nf_conntrack: per-netns net.netfilter.nf_conntrack_checksum ↵Alexey Dobriyan1-4/+3
sysctl Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08netfilter: netns nf_conntrack: per-netns net.netfilter.nf_conntrack_count sysctlAlexey Dobriyan1-32/+41
Note, sysctl table is always duplicated, this is simpler and less special-cased. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08netfilter: netns nf_conntrack: per-netns /proc/net/stat/nf_conntrack, ↵Alexey Dobriyan1-5/+9
/proc/net/stat/ip_conntrack Show correct conntrack count, while I'm at it. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08netfilter: netns nf_conntrack: per-netns statisticsAlexey Dobriyan1-2/+2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08netfilter: netns nf_conntrack: per-netns /proc/net/nf_conntrack, ↵Alexey Dobriyan1-20/+31
/proc/net/stat/nf_conntrack Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08netfilter: netns nf_conntrack: per-netns conntrack hashAlexey Dobriyan1-2/+2
* make per-netns conntrack hash Other solution is to add ->ct_net pointer to tuplehashes and still has one hash, I tried that it's ugly and requires more code deep down in protocol modules et al. * propagate netns pointer to where needed, e. g. to conntrack iterators. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08netfilter: netns nf_conntrack: per-netns conntrack countAlexey Dobriyan1-2/+2
Sysctls and proc files are stubbed to init_net's one. This is temporary. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-10-08netfilter: netns nf_conntrack: add netns boilerplateAlexey Dobriyan1-3/+18
One comment: #ifdefs around #include is necessary to overcome amazing compile breakages in NOTRACK-in-netns patch (see below). Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-08-06netfilter: fix two recent sysctl problemsKrzysztof Piotr Oledzki1-11/+17
Starting with 9043476f726802f4b00c96d0c4f418dde48d1304 ("[PATCH] sanitize proc_sysctl") we have two netfilter releated problems: - WARNING: at kernel/sysctl.c:1966 unregister_sysctl_table+0xcc/0x103(), caused by wrong order of ini/fini calls - net.netfilter is duplicated and has truncated set of records Thanks to very useful guidelines from Al Viro, this patch fixes both of them. Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-21netfilter: accounting rework: ct_extend + 64bit counters (v4)Krzysztof Piotr Oledzki1-15/+3
Initially netfilter has had 64bit counters for conntrack-based accounting, but it was changed in 2.6.14 to save memory. Unfortunately in-kernel 64bit counters are still required, for example for "connbytes" extension. However, 64bit counters waste a lot of memory and it was not possible to enable/disable it runtime. This patch: - reimplements accounting with respect to the extension infrastructure, - makes one global version of seq_print_acct() instead of two seq_print_counters(), - makes it possible to enable it at boot time (for CONFIG_SYSCTL/CONFIG_SYSFS=n), - makes it possible to enable/disable it at runtime by sysctl or sysfs, - extends counters from 32bit to 64bit, - renames ip_conntrack_counter -> nf_conn_counter, - enables accounting code unconditionally (no longer depends on CONFIG_NF_CT_ACCT), - set initial accounting enable state based on CONFIG_NF_CT_ACCT - removes buggy IPCT_COUNTER_FILLING event handling. If accounting is enabled newly created connections get additional acct extend. Old connections are not changed as it is not possible to add a ct_extend area to confirmed conntrack. Accounting is performed for all connections with acct extend regardless of a current state of "net.netfilter.nf_conntrack_acct". Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-05-02netfilter: assign PDE->fops before gluing PDE into /proc treeDenis V. Lunev1-3/+3
Replace create_proc_entry with specially created for this purpose proc_create. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-14[NETFILTER]: nf_conntrack: add tuplehash l3num/protonum accessorsPatrick McHardy1-11/+4
Add accessors for l3num and protonum and get rid of some overly long expressions. Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14[NETFILTER]: nf_conntrack: less hairy ifdefs around proc and sysctlAlexey Dobriyan1-40/+76
Patch splits creation of /proc/net/nf_conntrack, /proc/net/stat/nf_conntrack and net.netfilter hierarchy into their own functions with dummy ones if PROC_FS or SYSCTL is not set. Also, remove dead "ret = 0" write while I'm at it. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-03-27Merge branch 'master' of ↵David S. Miller1-6/+3
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/usb/rndis_host.c drivers/net/wireless/b43/dma.c net/ipv6/ndisc.c
2008-03-27[NETFILTER]: Replate direct proc_fops assignment with proc_create call.Denis V. Lunev1-6/+3
This elliminates infamous race during module loading when one could lookup proc entry without proc_fops assigned. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-02-29[NETFILTER/RXRPC]: Don't use seq_release_private where inappropriate.Pavel Emelyanov1-1/+1
Some netfilter code and rxrpc one use seq_open() to open a proc file, but seq_release_private to release one. This is harmless, but ambiguous. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31[NETFILTER]: nf_conntrack: fix sparse warningPatrick McHardy1-1/+1
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31[NETFILTER]: nf_conntrack: annotate l3protos with constJan Engelhardt1-5/+5
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31[NETFILTER]: nf_conntrack: naming unificationPatrick McHardy1-19/+19
Rename all "conntrack" variables to "ct" for more consistency and avoiding some overly long lines. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31[NETFILTER]: nf_conntrack: use RCU for conntrack hashPatrick McHardy1-8/+10
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-31[NETFILTER]: conntrack: get rid of sparse warningsStephen Hemminger1-0/+2
Teach sparse about locking here, and fix signed/unsigned warnings. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: nf_conntrack: make print_conntrack function optional for l4protosPatrick McHardy1-1/+1
Allows to remove five empty implementations. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: nf_conntrack: remove print_conntrack function from l3protosPatrick McHardy1-3/+0
Its unused and unlikely to ever be used. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: Switch to using ctl_paths in nf_queue and conntrack modulesPavel Emelyanov1-9/+6
This includes the most simple cases for netfilter. The first part is tne queue modules for ipv4 and ipv6, on which the net/ipv4/ and net/ipv6/ paths are reused from the appropriate ipv4 and ipv6 code. The conntrack module is also patched, but this hunk is very small and simple. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NETFILTER]: Make netfilter code use the seq_open_privatePavel Emelyanov1-16/+2
Just switch to the consolidated calls. ipt_recent() has to initialize the private, so use the __seq_open_private() helper. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[NET]: Make /proc/net per network namespaceEric W. Biederman1-6/+7
This patch makes /proc/net per network namespace. It modifies the global variables proc_net and proc_net_stat to be per network namespace. The proc_net file helpers are modified to take a network namespace argument, and all of their callers are fixed to pass &init_net for that argument. This ensures that all of the /proc/net files are only visible and usable in the initial network namespace until the code behind them has been updated to be handle multiple network namespaces. Making /proc/net per namespace is necessary as at least some files in /proc/net depend upon the set of network devices which is per network namespace, and even more files in /proc/net have contents that are relevant to a single network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-19Merge branch 'master' of ↵Linus Torvalds1-1/+1
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (25 commits) [TG3]: Fix msi issue with kexec/kdump. [NET] XFRM: Fix whitespace errors. [NET] TIPC: Fix whitespace errors. [NET] SUNRPC: Fix whitespace errors. [NET] SCTP: Fix whitespace errors. [NET] RXRPC: Fix whitespace errors. [NET] ROSE: Fix whitespace errors. [NET] RFKILL: Fix whitespace errors. [NET] PACKET: Fix whitespace errors. [NET] NETROM: Fix whitespace errors. [NET] NETFILTER: Fix whitespace errors. [NET] IPV4: Fix whitespace errors. [NET] DCCP: Fix whitespace errors. [NET] CORE: Fix whitespace errors. [NET] BLUETOOTH: Fix whitespace errors. [NET] AX25: Fix whitespace errors. [PATCH] mac80211: remove rtnl locking in ieee80211_sta.c [PATCH] mac80211: fix GCC warning on 64bit platforms [GENETLINK]: Dynamic multicast groups. [NETLIKN]: Allow removing multicast groups. ...
2007-07-19some kmalloc/memset ->kzalloc (tree wide)Yoann Padioleau1-2/+1
Transform some calls to kmalloc/memset to a single kzalloc (or kcalloc). Here is a short excerpt of the semantic patch performing this transformation: @@ type T2; expression x; identifier f,fld; expression E; expression E1,E2; expression e1,e2,e3,y; statement S; @@ x = - kmalloc + kzalloc (E1,E2) ... when != \(x->fld=E;\|y=f(...,x,...);\|f(...,x,...);\|x=E;\|while(...) S\|for(e1;e2;e3) S\) - memset((T2)x,0,E1); @@ expression E1,E2,E3; @@ - kzalloc(E1 * E2,E3) + kcalloc(E1,E2,E3) [akpm@linux-foundation.org: get kcalloc args the right way around] Signed-off-by: Yoann Padioleau <padator@wanadoo.fr> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Acked-by: Russell King <rmk@arm.linux.org.uk> Cc: Bryan Wu <bryan.wu@analog.com> Acked-by: Jiri Slaby <jirislaby@gmail.com> Cc: Dave Airlie <airlied@linux.ie> Acked-by: Roland Dreier <rolandd@cisco.com> Cc: Jiri Kosina <jkosina@suse.cz> Acked-by: Dmitry Torokhov <dtor@mail.ru> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org> Acked-by: Pierre Ossman <drzeus-list@drzeus.cx> Cc: Jeff Garzik <jeff@garzik.org> Cc: "David S. Miller" <davem@davemloft.net> Acked-by: Greg KH <greg@kroah.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19[NET] NETFILTER: Fix whitespace errors.YOSHIFUJI Hideaki1-1/+1
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2007-07-10[NET]: Make all initialized struct seq_operations const.Philippe De Muyter1-2/+2
Make all initialized struct seq_operations in net/ const Signed-off-by: Philippe De Muyter <phdm@macqel.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10[NETFILTER]: Convert DEBUGP to pr_debugPatrick McHardy1-6/+0
Convert DEBUGP to pr_debug and fix lots of non-compiling debug statements. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10[NETFILTER]: nf_conntrack_expect: introduce nf_conntrack_expect_max sysctPatrick McHardy1-1/+8
As a last step of preventing DoS by creating lots of expectations, this patch introduces a global maximum and a sysctl to control it. The default is initialized to 4 * the expectation hash table size, which results in 1/64 of the default maxmimum of conntracks. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10[NETFILTER]: nf_conntrack: move expectaton related init code to ↵Patrick McHardy1-9/+2
nf_conntrack_expect.c Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10[NETFILTER]: nf_conntrack: use hlists for conntrack hashPatrick McHardy1-8/+9
Convert conntrack hash to hlists to reduce its size and cache footprint. Since the default hashsize to max. entries ratio sucks (1:16), this patch doesn't reduce the amount of memory used for the hash by default, but instead uses a better ratio of 1:8, which results in the same max. entries value. One thing worth noting is early_drop. It really should use LRU, so it now has to iterate over the entire chain to find the last unconfirmed entry. Since chains shouldn't be very long and the entire operation is very rare this shouldn't be a problem. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NETFILTER]: Remove changelogs and CVS IDsPatrick McHardy1-11/+0
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-14[PATCH] sysctl: remove insert_at_head from register_sysctlEric W. Biederman1-1/+1
The semantic effect of insert_at_head is that it would allow new registered sysctl entries to override existing sysctl entries of the same name. Which is pain for caching and the proc interface never implemented. I have done an audit and discovered that none of the current users of register_sysctl care as (excpet for directories) they do not register duplicate sysctl entries. So this patch simply removes the support for overriding existing entries in the sys_sysctl interface since no one uses it or cares and it makes future enhancments harder. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Andi Kleen <ak@muc.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Corey Minyard <minyard@acm.org> Cc: Neil Brown <neilb@suse.de> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Jan Kara <jack@ucw.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12[NETFILTER]: Fix whitespace errorsYOSHIFUJI Hideaki1-1/+1
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12[PATCH] mark struct file_operations const 8Arjan van de Ven1-2/+2
Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2006-12-02[NETFILTER]: nf_conntrack: EXPORT_SYMBOL cleanupPatrick McHardy1-71/+3
- move EXPORT_SYMBOL next to exported symbol - use EXPORT_SYMBOL_GPL since this is what the original code used Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[NETFILTER]: Add NAT support for nf_conntrackJozsef Kadlecsik1-1/+4
Add NAT support for nf_conntrack. Joint work of Jozsef Kadlecsik, Yasuyuki Kozakai, Martin Josefsson and myself. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-02[NETFILTER]: conntrack: add '_get' to {ip, nf}_conntrack_expect_findYasuyuki Kozakai1-1/+1
We usually uses 'xxx_find_get' for function which increments reference count. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02[NETFILTER]: nf_conntrack: /proc compatibility with old connection trackingPatrick McHardy1-0/+1
This patch adds /proc/net/ip_conntrack, /proc/net/ip_conntrack_expect and /proc/net/stat/ip_conntrack files to keep old programs using them working. The /proc/net/ip_conntrack and /proc/net/ip_conntrack_expect files show only IPv4 entries, the /proc/net/stat/ip_conntrack shows global statistics. Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02[NETFILTER]: nf_conntrack: move conntrack protocol sysctls to individual modulesPatrick McHardy1-142/+0
Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02[NETFILTER]: nf_conntrack: move extern declaration to header filesPatrick McHardy1-7/+0
Using extern in a C file is a bad idea because the compiler can't catch type errors. Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02[NETFILTER]: nf_conntrack: remove ASSERT_{READ,WRITE}_LOCKMartin Josefsson1-4/+0
Remove the usage of ASSERT_READ_LOCK/ASSERT_WRITE_LOCK in nf_conntrack, it didn't do anything, it was just an empty define and it uglified the code. Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se> Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02[NETFILTER]: nf_conntrack: rename struct nf_conntrack_protocolMartin Josefsson1-15/+15
Rename 'struct nf_conntrack_protocol' to 'struct nf_conntrack_l4proto' in order to help distinguish it from 'struct nf_conntrack_l3proto'. It gets rather confusing with 'nf_conntrack_protocol'. Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se> Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02[NETFILTER]: nf_conntrack: split out protocol handlingMartin Josefsson1-116/+0
This patch splits out L3/L4 protocol handling into its own file nf_conntrack_proto.c Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se> Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02[NETFILTER]: nf_conntrack: split out the event cacheMartin Josefsson1-1/+1
This patch splits out the event cache into its own file nf_conntrack_ecache.c Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se> Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-12-02[NETFILTER]: nf_conntrack: split out expectation handlingMartin Josefsson1-79/+2
This patch splits out expectation handling into its own file nf_conntrack_expect.c Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se> Signed-off-by: Patrick McHardy <kaber@trash.net>
2006-09-22[NETFILTER]: kill listhelp.hPatrick McHardy1-1/+0
Kill listhelp.h and use the list.h functions instead. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22[NETFILTER]: Change tunables to __read_mostlyBrian Haley1-1/+1
Change some netfilter tunables to __read_mostly. Also fixed some incorrect file reference comments while I was in there. (this will be my last __read_mostly patch unless someone points out something else that needs it) Signed-off-by: Brian Haley <brian.haley@hp.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-24[NETFILTER]: conntrack: fix SYSCTL=n compileAdrian Bunk1-2/+2
Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-30Remove obsolete #include <linux/config.h>Jörn Engel1-1/+0
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-17[SECMARK]: Add secmark support to conntrackJames Morris1-0/+5
Add a secmark field to IP and NF conntracks, so that security markings on packets can be copied to their associated connections, and also copied back to packets as required. This is similar to the network mark field currently used with conntrack, although it is intended for enforcement of security policy rather than network policy. Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17[NETFILTER]: conntrack: add sysctl to disable checksummingPatrick McHardy1-0/+11
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-04-09[NETFILTER]: Fix section mismatch warningsPatrick McHardy1-59/+56
Fix section mismatch warnings caused by netfilter's init_or_cleanup functions used in many places by splitting the init from the cleanup parts. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-28[NETFILTER]: Rename init functions.Andrew Morton1-4/+4
Every netfilter module uses `init' for its module_init() function and `fini' or `cleanup' for its module_exit() function. Problem is, this creates uninformative initcall_debug output and makes ctags rather useless. So go through and rename them all to $(filename)_init and $(filename)_fini. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-22[NETFILTER]: nf_conntrack: support for layer 3 protocol load on demandPablo Neira Ayuso1-0/+2
x_tables matches and targets that require nf_conntrack_ipv[4|6] to work don't have enough information to load on demand these modules. This patch introduces the following changes to solve this issue: o nf_ct_l3proto_try_module_get: try to load the layer 3 connection tracker module and increases the refcount. o nf_ct_l3proto_module put: drop the refcount of the module. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[NETFILTER] nf_conntrack: clean up to reduce size of 'struct nf_conn'Harald Welte1-1/+0
This patch moves all helper related data fields of 'struct nf_conn' into a separate structure 'struct nf_conn_help'. This new structure is only present in conntrack entries for which we actually have a helper loaded. Also, this patch cleans up the nf_conntrack 'features' mechanism to resemble what the original idea was: Just glue the feature-specific data structures at the end of 'struct nf_conn', and explicitly re-calculate the pointer to it when needed rather than keeping pointers around. Saves 20 bytes per conntrack on my x86_64 box. A non-helped conntrack is 276 bytes. We still need to save another 20 bytes in order to fit into to target of 256bytes. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-12[NETFILTER] x_tables: Abstraction layer for {ip,ip6,arp}_tablesHarald Welte1-2/+2
This monster-patch tries to do the best job for unifying the data structures and backend interfaces for the three evil clones ip_tables, ip6_tables and arp_tables. In an ideal world we would never have allowed this kind of copy+paste programming... but well, our world isn't (yet?) ideal. o introduce a new x_tables module o {ip,arp,ip6}_tables depend on this x_tables module o registration functions for tables, matches and targets are only wrappers around x_tables provided functions o all matches/targets that are used from ip_tables and ip6_tables are now implemented as xt_FOOBAR.c files and provide module aliases to ipt_FOOBAR and ip6t_FOOBAR o header files for xt_matches are in include/linux/netfilter/, include/linux/netfilter_{ipv4,ipv6} contains compatibility wrappers around the xt_FOOBAR.h headers Based on this patchset we're going to further unify the code, gradually getting rid of all the layer 3 specific assumptions. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-10[NETFILTER]: Fix timeout sysctls on big-endian 64bit architecturesPatrick McHardy1-12/+12
The connection tracking timeout variables are unsigned long, but proc_dointvec_jiffies is used with sizeof(unsigned int) in the sysctl tables. Since there is no proc_doulongvec_jiffies function, change the timeout variables to unsigned int. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-05[NETFILTER]: Add ctnetlink port for nf_conntrackPablo Neira Ayuso1-10/+32
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-15[NETFILTER] Remove nf_conntrack stat proc file when cleaning upKOVACS Krisztian1-1/+1
Fix nf_conntrack statistics proc file removal. Looks like the old bug was forward-ported from ip_conntrack. :-] Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-11-09[NETFILTER]: Add nf_conntrack subsystem.Yasuyuki Kozakai1-0/+869
The existing connection tracking subsystem in netfilter can only handle ipv4. There were basically two choices present to add connection tracking support for ipv6. We could either duplicate all of the ipv4 connection tracking code into an ipv6 counterpart, or (the choice taken by these patches) we could design a generic layer that could handle both ipv4 and ipv6 and thus requiring only one sub-protocol (TCP, UDP, etc.) connection tracking helper module to be written. In fact nf_conntrack is capable of working with any layer 3 protocol. The existing ipv4 specific conntrack code could also not deal with the pecularities of doing connection tracking on ipv6, which is also cured here. For example, these issues include: 1) ICMPv6 handling, which is used for neighbour discovery in ipv6 thus some messages such as these should not participate in connection tracking since effectively they are like ARP messages 2) fragmentation must be handled differently in ipv6, because the simplistic "defrag, connection track and NAT, refrag" (which the existing ipv4 connection tracking does) approach simply isn't feasible in ipv6 3) ipv6 extension header parsing must occur at the correct spots before and after connection tracking decisions, and there were no provisions for this in the existing connection tracking design 4) ipv6 has no need for stateful NAT The ipv4 specific conntrack layer is kept around, until all of the ipv4 specific conntrack helpers are ported over to nf_conntrack and it is feature complete. Once that occurs, the old conntrack stuff will get placed into the feature-removal-schedule and we will fully kill it off 6 months later. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>