aboutsummaryrefslogtreecommitdiffstats
path: root/net/compat.c
AgeCommit message (Collapse)AuthorFilesLines
2023-12-12file: stop exposing receive_fd_user()Christian Brauner1-1/+1
Not every subsystem needs to have their own specialized helper. Just us the __receive_fd() helper. Link: https://lore.kernel.org/r/20231130-vfs-files-fixes-v1-4-e73ca6f4ea83@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-04-14net/compat: Update msg_control_is_user when setting a kernel pointerKevin Brodsky1-0/+1
cmsghdr_from_user_compat_to_kern() is an unusual case w.r.t. how the kmsg->msg_control* fields are used. The input struct msghdr holds a pointer to a user buffer, i.e. ksmg->msg_control_user is active. However, upon success, a kernel pointer is stored in kmsg->msg_control. kmsg->msg_control_is_user should therefore be updated accordingly. Cc: Christoph Hellwig <hch@lst.de> Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-14net: Ensure ->msg_control_user is used for user buffersKevin Brodsky1-6/+6
Since commit 1f466e1f15cf ("net: cleanly handle kernel vs user buffers for ->msg_control"), pointers to user buffers should be stored in struct msghdr::msg_control_user, instead of the msg_control field. Most users of msg_control have already been converted (where user buffers are involved), but not all of them. This patch attempts to address the remaining cases. An exception is made for null checks, as it should be safe to use msg_control unconditionally for that purpose. Cc: Christoph Hellwig <hch@lst.de> Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-25use less confusing names for iov_iter direction initializersAl Viro1-1/+2
READ/WRITE proved to be actively confusing - the meanings are "data destination, as used with read(2)" and "data source, as used with write(2)", but people keep interpreting those as "we read data from it" and "we write data to it", i.e. exactly the wrong way. Call them ITER_DEST and ITER_SOURCE - at least that is harder to misinterpret... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-09-20net: clear msg_get_inq in __get_compat_msghdr()Tetsuo Handa1-0/+1
syzbot is still complaining uninit-value in tcp_recvmsg(), for commit 1228b34c8d0ecf6d ("net: clear msg_get_inq in __sys_recvfrom() and __copy_msghdr_from_user()") missed that __get_compat_msghdr() is called instead of copy_msghdr_from_user() when MSG_CMSG_COMPAT is specified. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: 1228b34c8d0ecf6d ("net: clear msg_get_inq in __sys_recvfrom() and __copy_msghdr_from_user()") Reviewed-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/d06d0f7f-696c-83b4-b2d5-70b5f2730a37@I-love.SAKURA.ne.jp Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-24Merge branch 'for-5.20/io_uring' into for-5.20/io_uring-zerocopy-sendJens Axboe1-22/+17
* for-5.20/io_uring: (716 commits) io_uring: ensure REQ_F_ISREG is set async offload net: fix compat pointer in get_compat_msghdr() io_uring: Don't require reinitable percpu_ref io_uring: fix types in io_recvmsg_multishot_overflow io_uring: Use atomic_long_try_cmpxchg in __io_account_mem io_uring: support multishot in recvmsg net: copy from user before calling __get_compat_msghdr net: copy from user before calling __copy_msghdr io_uring: support 0 length iov in buffer select in compat io_uring: fix multishot ending when not polled io_uring: add netmsg cache io_uring: impose max limit on apoll cache io_uring: add abstraction around apoll cache io_uring: move apoll cache to poll.c io_uring: consolidate hash_locked io-wq handling io_uring: clear REQ_F_HASH_LOCKED on hash removal io_uring: don't race double poll setting REQ_F_ASYNC_DATA io_uring: don't miss setting REQ_F_DOUBLE_POLL io_uring: disable multishot recvmsg io_uring: only trace one of complete or overflow ... Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24net: fix compat pointer in get_compat_msghdr()Jens Axboe1-1/+1
A previous change enabled external users to copy the data before calling __get_compat_msghdr(), but didn't modify get_compat_msghdr() or __io_compat_recvmsg_copy_hdr() to take that into account. They are both stil passing in the __user pointer rather than the copied version. Ensure we pass in the kernel struct, not the pointer to the user data. Link: https://lore.kernel.org/all/46439555-644d-08a1-7d66-16f8f9a320f0@samsung.com/ Fixes: 1a3e4e94a1b9 ("net: copy from user before calling __get_compat_msghdr") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24net: copy from user before calling __get_compat_msghdrDylan Yudaken1-22/+17
this is in preparation for multishot receive from io_uring, where it needs to have access to the original struct user_msghdr. functionally this should be a no-op. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220714110258.1336200-3-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-19skbuff: carry external ubuf_info in msghdrPavel Begunkov1-0/+1
Make possible for network in-kernel callers like io_uring to pass in a custom ubuf_info by setting it in a new field of struct msghdr. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-06-03net: Return the correct errno codeZheng Yongjun1-1/+1
When kalloc or kmemdup failed, should return ENOMEM rather than ENOBUF. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-03iov_iter: transparently handle compat iovecs in import_iovecChristoph Hellwig1-2/+2
Use in compat_syscall to import either native or the compat iovecs, and remove the now superflous compat_import_iovec. This removes the need for special compat logic in most callers, and the remaining ones can still be simplified by using __import_iovec with a bool compat parameter. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-08-07net/scm: Fix typo in SCM_RIGHTS compat refactoringKees Cook1-1/+1
When refactoring the SCM_RIGHTS code, I accidentally mis-merged my native/compat diffs, which entirely broke using SCM_RIGHTS in compat mode. Use the correct helper. Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Link: https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-August/216156.html Reported-by: "Alex Xu (Hello71)" <alex_y_xu@yahoo.ca> Link: https://lore.kernel.org/lkml/1596812929.lz7fuo8r2w.none@localhost/ Suggested-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Fixes: c0029de50982 ("net/scm: Regularize compat handling of scm_detach_fds()") Tested-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2020-08-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds1-119/+3
Pull networking updates from David Miller: 1) Support 6Ghz band in ath11k driver, from Rajkumar Manoharan. 2) Support UDP segmentation in code TSO code, from Eric Dumazet. 3) Allow flashing different flash images in cxgb4 driver, from Vishal Kulkarni. 4) Add drop frames counter and flow status to tc flower offloading, from Po Liu. 5) Support n-tuple filters in cxgb4, from Vishal Kulkarni. 6) Various new indirect call avoidance, from Eric Dumazet and Brian Vazquez. 7) Fix BPF verifier failures on 32-bit pointer arithmetic, from Yonghong Song. 8) Support querying and setting hardware address of a port function via devlink, use this in mlx5, from Parav Pandit. 9) Support hw ipsec offload on bonding slaves, from Jarod Wilson. 10) Switch qca8k driver over to phylink, from Jonathan McDowell. 11) In bpftool, show list of processes holding BPF FD references to maps, programs, links, and btf objects. From Andrii Nakryiko. 12) Several conversions over to generic power management, from Vaibhav Gupta. 13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry Yakunin. 14) Various https url conversions, from Alexander A. Klimov. 15) Timestamping and PHC support for mscc PHY driver, from Antoine Tenart. 16) Support bpf iterating over tcp and udp sockets, from Yonghong Song. 17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov. 18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan. 19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several drivers. From Luc Van Oostenryck. 20) XDP support for xen-netfront, from Denis Kirjanov. 21) Support receive buffer autotuning in MPTCP, from Florian Westphal. 22) Support EF100 chip in sfc driver, from Edward Cree. 23) Add XDP support to mvpp2 driver, from Matteo Croce. 24) Support MPTCP in sock_diag, from Paolo Abeni. 25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic infrastructure, from Jakub Kicinski. 26) Several pci_ --> dma_ API conversions, from Christophe JAILLET. 27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel. 28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki. 29) Refactor a lot of networking socket option handling code in order to avoid set_fs() calls, from Christoph Hellwig. 30) Add rfc4884 support to icmp code, from Willem de Bruijn. 31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei. 32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin. 33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin. 34) Support TCP syncookies in MPTCP, from Flowian Westphal. 35) Fix several tricky cases of PMTU handling wrt. briding, from Stefano Brivio. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits) net: thunderx: initialize VF's mailbox mutex before first usage usb: hso: remove bogus check for EINPROGRESS usb: hso: no complaint about kmalloc failure hso: fix bailout in error case of probe ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM selftests/net: relax cpu affinity requirement in msg_zerocopy test mptcp: be careful on subflow creation selftests: rtnetlink: make kci_test_encap() return sub-test result selftests: rtnetlink: correct the final return value for the test net: dsa: sja1105: use detected device id instead of DT one on mismatch tipc: set ub->ifindex for local ipv6 address ipv6: add ipv6_dev_find() net: openvswitch: silence suspicious RCU usage warning Revert "vxlan: fix tos value before xmit" ptp: only allow phase values lower than 1 period farsync: switch from 'pci_' to 'dma_' API wan: wanxl: switch from 'pci_' to 'dma_' API hv_netvsc: do not use VF device if link is down dpaa2-eth: Fix passing zero to 'PTR_ERR' warning net: macb: Properly handle phylink on at91sam9x ...
2020-08-04Merge tag 'seccomp-v5.9-rc1' of ↵Linus Torvalds1-30/+25
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp updates from Kees Cook: "There are a bunch of clean ups and selftest improvements along with two major updates to the SECCOMP_RET_USER_NOTIF filter return: EPOLLHUP support to more easily detect the death of a monitored process, and being able to inject fds when intercepting syscalls that expect an fd-opening side-effect (needed by both container folks and Chrome). The latter continued the refactoring of __scm_install_fd() started by Christoph, and in the process found and fixed a handful of bugs in various callers. - Improved selftest coverage, timeouts, and reporting - Add EPOLLHUP support for SECCOMP_RET_USER_NOTIF (Christian Brauner) - Refactor __scm_install_fd() into __receive_fd() and fix buggy callers - Introduce 'addfd' command for SECCOMP_RET_USER_NOTIF (Sargun Dhillon)" * tag 'seccomp-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (30 commits) selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD seccomp: Introduce addfd ioctl to seccomp user notifier fs: Expand __receive_fd() to accept existing fd pidfd: Replace open-coded receive_fd() fs: Add receive_fd() wrapper for __receive_fd() fs: Move __scm_install_fd() to __receive_fd() net/scm: Regularize compat handling of scm_detach_fds() pidfd: Add missing sock updates for pidfd_getfd() net/compat: Add missing sock updates for SCM_RIGHTS selftests/seccomp: Check ENOSYS under tracing selftests/seccomp: Refactor to use fixture variants selftests/harness: Clean up kern-doc for fixtures seccomp: Use -1 marker for end of mode 1 syscall list seccomp: Fix ioctl number for SECCOMP_IOCTL_NOTIF_ID_VALID selftests/seccomp: Rename user_trap_syscall() to user_notif_syscall() selftests/seccomp: Make kcmp() less required seccomp: Use pr_fmt selftests/seccomp: Improve calibration loop selftests/seccomp: use 90s as timeout selftests/seccomp: Expand benchmark to per-filter measurements ...
2020-08-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-1/+1
Resolved kernel/bpf/btf.c using instructions from merge commit 69138b34a7248d2396ab85c8652e20c0c39beaba Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-27fix a braino in cmsghdr_from_user_compat_to_kern()Al Viro1-1/+1
commit 547ce4cfb34c ("switch cmsghdr_from_user_compat_to_kern() to copy_from_user()") missed one of the places where ucmlen should've been replaced with cmsg.cmsg_len, now that we are fetching the entire struct rather than doing it field-by-field. As the result, compat sendmsg() with several different-sized cmsg attached started to fail with EINVAL. Trivial to fix, fortunately. Fixes: 547ce4cfb34c ("switch cmsghdr_from_user_compat_to_kern() to copy_from_user()") Reported-by: Nick Bowler <nbowler@draconx.ca> Tested-by: Nick Bowler <nbowler@draconx.ca> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19net: remove compat_sys_{get,set}sockoptChristoph Hellwig1-76/+3
Now that the ->compat_{get,set}sockopt proto_ops methods are gone there is no good reason left to keep the compat syscalls separate. This fixes the odd use of unsigned int for the compat_setsockopt optlen and the missing sock_use_custom_sol_socket. It would also easily allow running the eBPF hooks for the compat syscalls, but such a large change in behavior does not belong into a consolidation patch like this one. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19net: simplify cBPF setsockopt compat handlingChristoph Hellwig1-44/+1
Add a helper that copies either a native or compat bpf_fprog from userspace after verifying the length, and remove the compat setsockopt handlers that now aren't required. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-13fs: Add receive_fd() wrapper for __receive_fd()Kees Cook1-1/+1
For both pidfd and seccomp, the __user pointer is not used. Update __receive_fd() to make writing to ufd optional via a NULL check. However, for the receive_fd_user() wrapper, ufd is NULL checked so an -EFAULT can be returned to avoid changing the SCM_RIGHTS interface behavior. Add new wrapper receive_fd() for pidfd and seccomp that does not use the ufd argument. For the new helper, the allocated fd needs to be returned on success. Update the existing callers to handle it. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Sargun Dhillon <sargun@sargun.me> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-13fs: Move __scm_install_fd() to __receive_fd()Kees Cook1-1/+1
In preparation for users of the "install a received file" logic outside of net/ (pidfd and seccomp), relocate and rename __scm_install_fd() from net/core/scm.c to __receive_fd() in fs/file.c, and provide a wrapper named receive_fd_user(), as future patches will change the interface to __receive_fd(). Additionally add a comment to fd_install() as a counterpoint to how __receive_fd() interacts with fput(). Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Dmitry Kadashev <dkadashev@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Sargun Dhillon <sargun@sargun.me> Cc: Ido Schimmel <idosch@idosch.org> Cc: Ioana Ciornei <ioana.ciornei@nxp.com> Cc: linux-fsdevel@vger.kernel.org Cc: netdev@vger.kernel.org Reviewed-by: Sargun Dhillon <sargun@sargun.me> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-13net/scm: Regularize compat handling of scm_detach_fds()Kees Cook1-31/+25
Duplicate the cleanups from commit 2618d530dd8b ("net/scm: cleanup scm_detach_fds") into the compat code. Replace open-coded __receive_sock() with a call to the helper. Move the check added in commit 1f466e1f15cf ("net: cleanly handle kernel vs user buffers for ->msg_control") to before the compat call, even though it should be impossible for an in-kernel call to also be compat. Correct the int "flags" argument to unsigned int to match fd_install() and similar APIs. Regularize any remaining differences, including a whitespace issue, a checkpatch warning, and add the check from commit 6900317f5eff ("net, scm: fix PaX detected msg_controllen overflow in scm_detach_fds") which fixed an overflow unique to 64-bit. To avoid confusion when comparing the compat handler to the native handler, just include the same check in the compat handler. Cc: Christoph Hellwig <hch@lst.de> Cc: Sargun Dhillon <sargun@sargun.me> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-13net/compat: Add missing sock updates for SCM_RIGHTSKees Cook1-0/+1
Add missed sock updates to compat path via a new helper, which will be used more in coming patches. (The net/core/scm.c code is left as-is here to assist with -stable backports for the compat path.) Cc: Christoph Hellwig <hch@lst.de> Cc: Sargun Dhillon <sargun@sargun.me> Cc: Jakub Kicinski <kuba@kernel.org> Cc: stable@vger.kernel.org Fixes: 48a87cc26c13 ("net: netprio: fd passed in SCM_RIGHTS datagram not set correctly") Fixes: d84295067fc7 ("net: net_cls: fd passed in SCM_RIGHTS datagram not set correctly") Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2020-06-01switch cmsghdr_from_user_compat_to_kern() to copy_from_user()Al Viro1-7/+8
no point getting compat_cmsghdr field-by-field Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-20get rid of compat_mc_setsockopt()Al Viro1-90/+0
not used anymore Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-20get rid of compat_mc_getsockopt()Al Viro1-79/+0
now we can do MCAST_MSFILTER in compat ->getsockopt() without playing silly buggers with copying things back and forth. We can form a native struct group_filter (sans the variable-length tail) on stack, pass that + pointer to the tail of original request to the helper doing the bulk of the work, then do the rest of copyout - same as the native getsockopt() does. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-20lift compat definitions of mcast [sg]etsockopt requests into net/compat.hAl Viro1-25/+0
We want to get rid of compat_mc_[sg]etsockopt() and to have that stuff handled without compat_alloc_user_space(), extra copying through userland, etc. To do that we'll need ipv4 and ipv6 instances of ->compat_[sg]etsockopt() to manipulate the 32bit variants of mcast requests, so we need to move the definitions of those out of net/compat.c and into a public header. This patch just does a mechanical move to include/net/compat.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-11net: cleanly handle kernel vs user buffers for ->msg_controlChristoph Hellwig1-2/+3
The msg_control field in struct msghdr can either contain a user pointer when used with the recvmsg system call, or a kernel pointer when used with sendmsg. To complicate things further kernel_recvmsg can stuff a kernel pointer in and then use set_fs to make the uaccess helpers accept it. Replace it with a union of a kernel pointer msg_control field, and a user pointer msg_control_user one, and allow kernel_recvmsg operate on a proper kernel pointer using a bitfield to override the normal choice of a user pointer for recvmsg. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10net: abstract out normal and compat msghdr importJens Axboe1-7/+23
This splits it into two parts, one that imports the message, and one that imports the iovec. This allows a caller to only do the first part, and import the iovec manually afterwards. No functional changes in this patch. Acked-by: David Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-15y2038: socket: use __kernel_old_timespec instead of timespecArnd Bergmann1-1/+1
The 'timespec' type definition and helpers like ktime_to_timespec() or timespec64_to_timespec() should no longer be used in the kernel so we can remove them and avoid introducing y2038 issues in new code. Change the socket code that needs to pass a timespec to user space for backward compatibility to use __kernel_old_timespec instead. This type has the same layout but with a clearer defined name. Slightly reformat tcp_recv_timestamp() for consistency after the removal of timespec64_to_timespec(). Acked-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-05-31uio: make import_iovec()/compat_import_iovec() return bytes on successJens Axboe1-1/+2
Currently these functions return < 0 on error, and 0 for success. Change that so that we return < 0 on error, but number of bytes for success. Some callers already treat the return value that way, others need a slight tweak. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-21treewide: Add SPDX license identifier for missed filesThomas Gleixner1-0/+1
Add SPDX license identifiers to all files which: - Have no license information of any form - Have EXPORT_.*_SYMBOL_GPL inside which was used in the initial scan/conversion to ignore the file These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-19net: rework SIOCGSTAMP ioctl handlingArnd Bergmann1-57/+0
The SIOCGSTAMP/SIOCGSTAMPNS ioctl commands are implemented by many socket protocol handlers, and all of those end up calling the same sock_get_timestamp()/sock_get_timestampns() helper functions, which results in a lot of duplicate code. With the introduction of 64-bit time_t on 32-bit architectures, this gets worse, as we then need four different ioctl commands in each socket protocol implementation. To simplify that, let's add a new .gettstamp() operation in struct proto_ops, and move ioctl implementation into the common sock_ioctl()/compat_sock_ioctl_trans() functions that these all go through. We can reuse the sock_get_timestamp() implementation, but generalize it so it can deal with both native and compat mode, as well as timeval and timespec structures. Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://lore.kernel.org/lkml/CAK8P3a038aDQQotzua_QtKGhq8O9n+rdiz2=WDCp82ys8eUT+A@mail.gmail.com/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-05Merge branch 'timers-2038-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull year 2038 updates from Thomas Gleixner: "Another round of changes to make the kernel ready for 2038. After lots of preparatory work this is the first set of syscalls which are 2038 safe: 403 clock_gettime64 404 clock_settime64 405 clock_adjtime64 406 clock_getres_time64 407 clock_nanosleep_time64 408 timer_gettime64 409 timer_settime64 410 timerfd_gettime64 411 timerfd_settime64 412 utimensat_time64 413 pselect6_time64 414 ppoll_time64 416 io_pgetevents_time64 417 recvmmsg_time64 418 mq_timedsend_time64 419 mq_timedreceiv_time64 420 semtimedop_time64 421 rt_sigtimedwait_time64 422 futex_time64 423 sched_rr_get_interval_time64 The syscall numbers are identical all over the architectures" * 'timers-2038-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) riscv: Use latest system call ABI checksyscalls: fix up mq_timedreceive and stat exceptions unicore32: Fix __ARCH_WANT_STAT64 definition asm-generic: Make time32 syscall numbers optional asm-generic: Drop getrlimit and setrlimit syscalls from default list 32-bit userspace ABI: introduce ARCH_32BIT_OFF_T config option compat ABI: use non-compat openat and open_by_handle_at variants y2038: add 64-bit time_t syscalls to all 32-bit architectures y2038: rename old time and utime syscalls y2038: remove struct definition redirects y2038: use time32 syscall names on 32-bit syscalls: remove obsolete __IGNORE_ macros y2038: syscalls: rename y2038 compat syscalls x86/x32: use time64 versions of sigtimedwait and recvmmsg timex: change syscalls to use struct __kernel_timex timex: use __kernel_timex internally sparc64: add custom adjtimex/clock_adjtime functions time: fix sys_timer_settime prototype time: Add struct __kernel_timex time: make adjtime compat handling available for 32 bit ...
2019-03-03net: fixup address-space warnings in compat_mc_{get,set}sockopt()Ben Dooks1-4/+4
Add __user attributes in some of the casts in this function to avoid the following sparse warnings: net/compat.c:592:57: warning: cast removes address space of expression net/compat.c:592:57: warning: incorrect type in initializer (different address spaces) net/compat.c:592:57: expected struct compat_group_req [noderef] <asn:1>*gr32 net/compat.c:592:57: got void *<noident> net/compat.c:613:65: warning: cast removes address space of expression net/compat.c:613:65: warning: incorrect type in initializer (different address spaces) net/compat.c:613:65: expected struct compat_group_source_req [noderef] <asn:1>*gsr32 net/compat.c:613:65: got void *<noident> net/compat.c:634:60: warning: cast removes address space of expression net/compat.c:634:60: warning: incorrect type in initializer (different address spaces) net/compat.c:634:60: expected struct compat_group_filter [noderef] <asn:1>*gf32 net/compat.c:634:60: got void *<noident> net/compat.c:672:52: warning: cast removes address space of expression net/compat.c:672:52: warning: incorrect type in initializer (different address spaces) net/compat.c:672:52: expected struct compat_group_filter [noderef] <asn:1>*gf32 net/compat.c:672:52: got void *<noident> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+5
Three conflicts, one of which, for marvell10g.c is non-trivial and requires some follow-up from Heiner or someone else. The issue is that Heiner converted the marvell10g driver over to use the generic c45 code as much as possible. However, in 'net' a bug fix appeared which makes sure that a new local mask (MDIO_AN_10GBT_CTRL_ADV_NBT_MASK) with value 0x01e0 is cleared. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-22net: socket: add check for negative optlen in compat setsockoptJann Horn1-1/+5
__sys_setsockopt() already checks for `optlen < 0`. Add an equivalent check to the compat path for robustness. This has to be `> INT_MAX` instead of `< 0` because the signedness of `optlen` is different here. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07y2038: syscalls: rename y2038 compat syscallsArnd Bergmann1-1/+1
A lot of system calls that pass a time_t somewhere have an implementation using a COMPAT_SYSCALL_DEFINEx() on 64-bit architectures, and have been reworked so that this implementation can now be used on 32-bit architectures as well. The missing step is to redefine them using the regular SYSCALL_DEFINEx() to get them out of the compat namespace and make it possible to build them on 32-bit architectures. Any system call that ends in 'time' gets a '32' suffix on its name for that version, while the others get a '_time32' suffix, to distinguish them from the normal version, which takes a 64-bit time argument in the future. In this step, only 64-bit architectures are changed, doing this rename first lets us avoid touching the 32-bit architectures twice. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-02-03socket: Use old_timeval types for socket timestampsDeepa Dinamani1-3/+3
As part of y2038 solution, all internal uses of struct timeval are replaced by struct __kernel_old_timeval and struct compat_timeval by struct old_timeval32. Make socket timestamps use these new types. This is mainly to be able to verify that the kernel build is y2038 safe when such non y2038 safe types are not supported anymore. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: isdn@linux-pingi.de Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03sockopt: Rename SO_TIMESTAMP* to SO_TIMESTAMP*_OLDDeepa Dinamani1-3/+3
SO_TIMESTAMP, SO_TIMESTAMPNS and SO_TIMESTAMPING options, the way they are currently defined, are not y2038 safe. Subsequent patches in the series add new y2038 safe versions of these options which provide 64 bit timestamps on all architectures uniformly. Hence, rename existing options with OLD tag suffixes. Also note that kernel will not use the untagged SO_TIMESTAMP* and SCM_TIMESTAMP* options internally anymore. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Cc: deller@gmx.de Cc: dhowells@redhat.com Cc: jejb@parisc-linux.org Cc: ralf@linux-mips.org Cc: rth@twiddle.net Cc: linux-afs@lists.infradead.org Cc: linux-alpha@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-parisc@vger.kernel.org Cc: linux-rdma@vger.kernel.org Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03socket: move compat timeout handling into sock.cArnd Bergmann1-65/+1
This is a cleanup to prepare for the addition of 64-bit time_t in O_SNDTIMEO/O_RCVTIMEO. The existing compat handler seems unnecessarily complex and error-prone, moving it all into the main setsockopt()/getsockopt() implementation requires half as much code and is easier to extend. 32-bit user space can now use old_timeval32 on both 32-bit and 64-bit machines, while 64-bit code can use __old_kernel_timeval. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-03Remove 'type' argument from access_ok() functionLinus Torvalds1-15/+15
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument of the user address range verification function since we got rid of the old racy i386-only code to walk page tables by hand. It existed because the original 80386 would not honor the write protect bit when in kernel mode, so you had to do COW by hand before doing any user access. But we haven't supported that in a long time, and these days the 'type' argument is a purely historical artifact. A discussion about extending 'user_access_begin()' to do the range checking resulted this patch, because there is no way we're going to move the old VERIFY_xyz interface to that model. And it's best done at the end of the merge window when I've done most of my merges, so let's just get this done once and for all. This patch was mostly done with a sed-script, with manual fix-ups for the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form. There were a couple of notable cases: - csky still had the old "verify_area()" name as an alias. - the iter_iov code had magical hardcoded knowledge of the actual values of VERIFY_{READ,WRITE} (not that they mattered, since nothing really used it) - microblaze used the type argument for a debug printout but other than those oddities this should be a total no-op patch. I tried to fix up all architectures, did fairly extensive grepping for access_ok() uses, and the changes are trivial, but I may have missed something. Any missed conversion should be trivially fixable, though. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-6/+9
Pull networking fixes from David Miller: "Several fixes here. Basically split down the line between newly introduced regressions and long existing problems: 1) Double free in tipc_enable_bearer(), from Cong Wang. 2) Many fixes to nf_conncount, from Florian Westphal. 3) op->get_regs_len() can throw an error, check it, from Yunsheng Lin. 4) Need to use GFP_ATOMIC in *_add_hash_mac_address() of fsl/fman driver, from Scott Wood. 5) Inifnite loop in fib_empty_table(), from Yue Haibing. 6) Use after free in ax25_fillin_cb(), from Cong Wang. 7) Fix socket locking in nr_find_socket(), also from Cong Wang. 8) Fix WoL wakeup enable in r8169, from Heiner Kallweit. 9) On 32-bit sock->sk_stamp is not thread-safe, from Deepa Dinamani. 10) Fix ptr_ring wrap during queue swap, from Cong Wang. 11) Missing shutdown callback in hinic driver, from Xue Chaojing. 12) Need to return NULL on error from ip6_neigh_lookup(), from Stefano Brivio. 13) BPF out of bounds speculation fixes from Daniel Borkmann" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits) ipv6: Consider sk_bound_dev_if when binding a socket to an address ipv6: Fix dump of specific table with strict checking bpf: add various test cases to selftests bpf: prevent out of bounds speculation on pointer arithmetic bpf: fix check_map_access smin_value test when pointer contains offset bpf: restrict unknown scalars of mixed signed bounds for unprivileged bpf: restrict stack pointer arithmetic for unprivileged bpf: restrict map value pointer arithmetic for unprivileged bpf: enable access to ax register also from verifier rewrite bpf: move tmp variable into ax register in interpreter bpf: move {prev_,}insn_idx into verifier env isdn: fix kernel-infoleak in capi_unlocked_ioctl ipv6: route: Fix return value of ip6_neigh_lookup() on neigh_create() error net/hamradio/6pack: use mod_timer() to rearm timers net-next/hinic:add shutdown callback net: hns3: call hns3_nic_net_open() while doing HNAE3_UP_CLIENT ip: validate header length on virtual device xmit tap: call skb_probe_transport_header after setting skb->dev ptr_ring: wrap back ->producer in __ptr_ring_swap_queue() net: rds: remove unnecessary NULL check ...
2019-01-01sock: Make sock->sk_stamp thread-safeDeepa Dinamani1-6/+9
Al Viro mentioned (Message-ID <20170626041334.GZ10672@ZenIV.linux.org.uk>) that there is probably a race condition lurking in accesses of sk_stamp on 32-bit machines. sock->sk_stamp is of type ktime_t which is always an s64. On a 32 bit architecture, we might run into situations of unsafe access as the access to the field becomes non atomic. Use seqlocks for synchronization. This allows us to avoid using spinlocks for readers as readers do not need mutual exclusion. Another approach to solve this is to require sk_lock for all modifications of the timestamps. The current approach allows for timestamps to have their own lock: sk_stamp_lock. This allows for the patch to not compete with already existing critical sections, and side effects are limited to the paths in the patch. The addition of the new field maintains the data locality optimizations from commit 9115e8cd2a0c ("net: reorganize struct sock for better data locality") Note that all the instances of the sk_stamp accesses are either through the ioctl or the syscall recvmsg. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18y2038: socket: Add compat_sys_recvmmsg_time64Arnd Bergmann1-22/+12
recvmmsg() takes two arguments to pointers of structures that differ between 32-bit and 64-bit architectures: mmsghdr and timespec. For y2038 compatbility, we are changing the native system call from timespec to __kernel_timespec with a 64-bit time_t (in another patch), and use the existing compat system call on both 32-bit and 64-bit architectures for compatibility with traditional 32-bit user space. As we now have two variants of recvmmsg() for 32-bit tasks that are both different from the variant that we use on 64-bit tasks, this means we also require two compat system calls! The solution I picked is to flip things around: The existing compat_sys_recvmmsg() call gets moved from net/compat.c into net/socket.c and now handles the case for old user space on all architectures that have set CONFIG_COMPAT_32BIT_TIME. A new compat_sys_recvmmsg_time64() call gets added in the old place for 64-bit architectures only, this one handles the case of a compat mmsghdr structure combined with __kernel_timespec. In the indirect sys_socketcall(), we now need to call either do_sys_recvmmsg() or __compat_sys_recvmmsg(), depending on what kind of architecture we are on. For compat_sys_socketcall(), no such change is needed, we always call __compat_sys_recvmmsg(). I decided to not add a new SYS_RECVMMSG_TIME64 socketcall: Any libc implementation for 64-bit time_t will need significant changes including an updated asm/unistd.h, and it seems better to consistently use the separate syscalls that configuration, leaving the socketcall only for backward compatibility with 32-bit time_t based libc. The naming is asymmetric for the moment, so both existing syscalls entry points keep their names, while the new ones are recvmmsg_time32 and compat_recvmmsg_time64 respectively. I expect that we will rename the compat syscalls later as we start using generated syscall tables everywhere and add these entry points. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-08-29y2038: socket: Change recvmmsg to use __kernel_timespecArnd Bergmann1-3/+3
This converts the recvmmsg() system call in all its variations to use 'timespec64' internally for its timeout, and have a __kernel_timespec64 argument in the native entry point. This lets us change the type to use 64-bit time_t at a later point while using the 32-bit compat system call emulation for existing user space. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-08-27y2038: globally rename compat_time to old_time32Arnd Bergmann1-2/+2
Christoph Hellwig suggested a slightly different path for handling backwards compatibility with the 32-bit time_t based system calls: Rather than simply reusing the compat_sys_* entry points on 32-bit architectures unchanged, we get rid of those entry points and the compat_time types by renaming them to something that makes more sense on 32-bit architectures (which don't have a compat mode otherwise), and then share the entry points under the new name with the 64-bit architectures that use them for implementing the compatibility. The following types and interfaces are renamed here, and moved from linux/compat_time.h to linux/time32.h: old new --- --- compat_time_t old_time32_t struct compat_timeval struct old_timeval32 struct compat_timespec struct old_timespec32 struct compat_itimerspec struct old_itimerspec32 ns_to_compat_timeval() ns_to_old_timeval32() get_compat_itimerspec64() get_old_itimerspec32() put_compat_itimerspec64() put_old_itimerspec32() compat_get_timespec64() get_old_timespec32() compat_put_timespec64() put_old_timespec32() As we already have aliases in place, this patch addresses only the instances that are relevant to the system call interface in particular, not those that occur in device drivers and other modules. Those will get handled separately, while providing the 64-bit version of the respective interfaces. I'm not renaming the timex, rusage and itimerval structures, as we are still debating what the new interface will look like, and whether we will need a replacement at all. This also doesn't change the names of the syscall entry points, which can be done more easily when we actually switch over the 32-bit architectures to use them, at that point we need to change COMPAT_SYSCALL_DEFINEx to SYSCALL_DEFINEx with a new name, e.g. with a _time32 suffix. Suggested-by: Christoph Hellwig <hch@infradead.org> Link: https://lore.kernel.org/lkml/20180705222110.GA5698@infradead.org/ Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-08-06net: avoid unnecessary sock_flag() check when enable timestampYafang Shao1-4/+2
The sock_flag() check is alreay inside sock_enable_timestamp(), so it is unnecessary checking it in the caller. void sock_enable_timestamp(struct sock *sk, int flag) { if (!sock_flag(sk, flag)) { ... } } Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27net: support compat 64-bit time in {s,g}etsockoptLance Richardson1-2/+4
For the x32 ABI, struct timeval has two 64-bit fields. However the kernel currently interprets the user-space values used for the SO_RCVTIMEO and SO_SNDTIMEO socket options as having a pair of 32-bit fields. When the seconds portion of the requested timeout is less than 2**32, the seconds portion of the effective timeout is correct but the microseconds portion is zero. When the seconds portion of the requested timeout is zero and the microseconds portion is non-zero, the kernel interprets the timeout as zero (never timeout). Fix by using 64-bit time for SO_RCVTIMEO/SO_SNDTIMEO as required for the ABI. The code included below demonstrates the problem. Results before patch: $ gcc -m64 -Wall -O2 -o socktmo socktmo.c && ./socktmo recv time: 2.008181 seconds send time: 2.015985 seconds $ gcc -m32 -Wall -O2 -o socktmo socktmo.c && ./socktmo recv time: 2.016763 seconds send time: 2.016062 seconds $ gcc -mx32 -Wall -O2 -o socktmo socktmo.c && ./socktmo recv time: 1.007239 seconds send time: 1.023890 seconds Results after patch: $ gcc -m64 -O2 -Wall -o socktmo socktmo.c && ./socktmo recv time: 2.010062 seconds send time: 2.015836 seconds $ gcc -m32 -O2 -Wall -o socktmo socktmo.c && ./socktmo recv time: 2.013974 seconds send time: 2.015981 seconds $ gcc -mx32 -O2 -Wall -o socktmo socktmo.c && ./socktmo recv time: 2.030257 seconds send time: 2.013383 seconds #include <stdio.h> #include <stdlib.h> #include <sys/socket.h> #include <sys/types.h> #include <sys/time.h> void checkrc(char *str, int rc) { if (rc >= 0) return; perror(str); exit(1); } static char buf[1024]; int main(int argc, char **argv) { int rc; int socks[2]; struct timeval tv; struct timeval start, end, delta; rc = socketpair(AF_UNIX, SOCK_STREAM, 0, socks); checkrc("socketpair", rc); /* set timeout to 1.999999 seconds */ tv.tv_sec = 1; tv.tv_usec = 999999; rc = setsockopt(socks[0], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv); rc = setsockopt(socks[0], SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv); checkrc("setsockopt", rc); /* measure actual receive timeout */ gettimeofday(&start, NULL); rc = recv(socks[0], buf, sizeof buf, 0); gettimeofday(&end, NULL); timersub(&end, &start, &delta); printf("recv time: %ld.%06ld seconds\n", (long)delta.tv_sec, (long)delta.tv_usec); /* fill send buffer */ do { rc = send(socks[0], buf, sizeof buf, 0); } while (rc > 0); /* measure actual send timeout */ gettimeofday(&start, NULL); rc = send(socks[0], buf, sizeof buf, 0); gettimeofday(&end, NULL); timersub(&end, &start, &delta); printf("send time: %ld.%06ld seconds\n", (long)delta.tv_sec, (long)delta.tv_usec); exit(0); } Fixes: 515c7af85ed9 ("x32: Use compat shims for {g,s}etsockopt") Reported-by: Gopal RajagopalSai <gopalsr83@gmail.com> Signed-off-by: Lance Richardson <lance.richardson.net@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-02net: socket: add __compat_sys_...msg() helpers; remove in-kernel calls to ↵Dominik Brodowski1-7/+30
compat syscalls Using the net-internal helpers __compat_sys_...msg() allows us to avoid the internal calls to the compat_sys_...msg() syscalls. compat_sys_recvmmsg() is handled in a different patch. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __compat_sys_recvmmsg() helper; remove in-kernel call to ↵Dominik Brodowski1-5/+12
compat syscall Using the net-internal helper __compat_sys_recvmmsg() allows us to avoid the internal calls to the compat_sys_recvmmsg() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __compat_sys_getsockopt() helper; remove in-kernel call to ↵Dominik Brodowski1-4/+12
compat syscall Using the net-internal helper __compat_sys_getsockopt() allows us to avoid the internal calls to the compat_sys_getsockopt() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __compat_sys_setsockopt() helper; remove in-kernel call to ↵Dominik Brodowski1-4/+10
compat syscall Using the net-internal helper __compat_sys_setsockopt() allows us to avoid the internal calls to the compat_sys_setsockopt() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __compat_sys_recvfrom() helper; remove in-kernel call to ↵Dominik Brodowski1-7/+16
compat syscall Using the net-internal helper __compat_sys_recvfrom() allows us to avoid the internal calls to the compat_sys_recvfrom() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: replace call to sys_recv() with __sys_recvfrom()Dominik Brodowski1-1/+2
sys_recv() merely expands the parameters to __sys_recvfrom() by NULL and NULL. Open-code this in the two places which used sys_recv() as a wrapper to __sys_recvfrom(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: replace calls to sys_send() with __sys_sendto()Dominik Brodowski1-1/+1
sys_send() merely expands the parameters to __sys_sendto() by NULL and 0. Open-code this in the two places which used sys_send() as a wrapper to __sys_sendto(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: move check for forbid_cmsg_compat to __sys_...msg()Dominik Brodowski1-3/+5
The non-compat codepaths for sys_...msg() verify that MSG_CMSG_COMPAT is not set. By moving this check to the __sys_...msg() functions (and making it dependent on a static flag passed to this function), we can call the __sys...msg() functions instead of the syscall functions in all cases. __sys_recvmmsg() does not need this trickery, as the check is handled within the do_sys_recvmmsg() function internal to net/socket.c. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __sys_shutdown() helper; remove in-kernel call to syscallDominik Brodowski1-1/+1
Using the net-internal helper __sys_shutdown() allows us to avoid the internal calls to the sys_shutdown() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __sys_socketpair() helper; remove in-kernel call to syscallDominik Brodowski1-1/+1
Using the net-internal helper __sys_socketpair() allows us to avoid the internal calls to the sys_socketpair() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __sys_getpeername() helper; remove in-kernel call to syscallDominik Brodowski1-1/+1
Using the net-internal helper __sys_getpeername() allows us to avoid the internal calls to the sys_getpeername() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __sys_getsockname() helper; remove in-kernel call to syscallDominik Brodowski1-1/+1
Using the net-internal helper __sys_getsockname() allows us to avoid the internal calls to the sys_getsockname() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __sys_listen() helper; remove in-kernel call to syscallDominik Brodowski1-1/+1
Using the net-internal helper __sys_listen() allows us to avoid the internal calls to the sys_listen() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __sys_connect() helper; remove in-kernel call to syscallDominik Brodowski1-1/+1
Using the net-internal helper __sys_connect() allows us to avoid the internal calls to the sys_connect() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __sys_bind() helper; remove in-kernel call to syscallDominik Brodowski1-1/+1
Using the net-internal helper __sys_bind() allows us to avoid the internal calls to the sys_bind() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __sys_socket() helper; remove in-kernel call to syscallDominik Brodowski1-1/+1
Using the net-internal helper __sys_socket() allows us to avoid the internal calls to the sys_socket() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __sys_accept4() helper; remove in-kernel call to syscallDominik Brodowski1-2/+2
Using the net-internal helper __sys_accept4() allows us to avoid the internal calls to the sys_accept4() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __sys_sendto() helper; remove in-kernel call to syscallDominik Brodowski1-1/+2
Using the net-internal helper __sys_sendto() allows us to avoid the internal calls to the sys_sendto() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2018-04-02net: socket: add __sys_recvfrom() helper; remove in-kernel call to syscallDominik Brodowski1-1/+2
Using the net-internal helper __sys_recvfrom() allows us to avoid the internal calls to the sys_recvfrom() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2017-09-20net: compat: assert the size of cmsg copied in is as expectedMeng Xu1-0/+7
The actual length of cmsg fetched in during the second loop (i.e., kcmsg - kcmsg_base) could be different from what we get from the first loop (i.e., kcmlen). The main reason is that the two get_user() calls in the two loops (i.e., get_user(ucmlen, &ucmsg->cmsg_len) and __get_user(ucmlen, &ucmsg->cmsg_len)) could cause ucmlen to have different values even they fetch from the same userspace address, as user can race to change the memory content in &ucmsg->cmsg_len across fetches. Although in the second loop, the sanity check if ((char *)kcmsg_base + kcmlen - (char *)kcmsg < CMSG_ALIGN(tmp)) is inplace, it only ensures that the cmsg fetched in during the second loop does not exceed the length of kcmlen, but not necessarily equal to kcmlen. But indicated by the assignment kmsg->msg_controllen = kcmlen, we should enforce that. This patch adds this additional sanity check and ensures that what is recorded in kmsg->msg_controllen is the actual cmsg length. Signed-off-by: Meng Xu <mengxu.gatech@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04get_compat_bpf_fprog(): don't copyin field-by-fieldAl Viro1-9/+9
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-07-04get_compat_msghdr(): get rid of field-by-field copyinAl Viro1-17/+14
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-02-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1-8/+9
Pull networking updates from David Miller: "Highlights: 1) Support TX_RING in AF_PACKET TPACKET_V3 mode, from Sowmini Varadhan. 2) Simplify classifier state on sk_buff in order to shrink it a bit. From Willem de Bruijn. 3) Introduce SIPHASH and it's usage for secure sequence numbers and syncookies. From Jason A. Donenfeld. 4) Reduce CPU usage for ICMP replies we are going to limit or suppress, from Jesper Dangaard Brouer. 5) Introduce Shared Memory Communications socket layer, from Ursula Braun. 6) Add RACK loss detection and allow it to actually trigger fast recovery instead of just assisting after other algorithms have triggered it. From Yuchung Cheng. 7) Add xmit_more and BQL support to mvneta driver, from Simon Guinot. 8) skb_cow_data avoidance in esp4 and esp6, from Steffen Klassert. 9) Export MPLS packet stats via netlink, from Robert Shearman. 10) Significantly improve inet port bind conflict handling, especially when an application is restarted and changes it's setting of reuseport. From Josef Bacik. 11) Implement TX batching in vhost_net, from Jason Wang. 12) Extend the dummy device so that VF (virtual function) features, such as configuration, can be more easily tested. From Phil Sutter. 13) Avoid two atomic ops per page on x86 in bnx2x driver, from Eric Dumazet. 14) Add new bpf MAP, implementing a longest prefix match trie. From Daniel Mack. 15) Packet sample offloading support in mlxsw driver, from Yotam Gigi. 16) Add new aquantia driver, from David VomLehn. 17) Add bpf tracepoints, from Daniel Borkmann. 18) Add support for port mirroring to b53 and bcm_sf2 drivers, from Florian Fainelli. 19) Remove custom busy polling in many drivers, it is done in the core networking since 4.5 times. From Eric Dumazet. 20) Support XDP adjust_head in virtio_net, from John Fastabend. 21) Fix several major holes in neighbour entry confirmation, from Julian Anastasov. 22) Add XDP support to bnxt_en driver, from Michael Chan. 23) VXLAN offloads for enic driver, from Govindarajulu Varadarajan. 24) Add IPVTAP driver (IP-VLAN based tap driver) from Sainath Grandhi. 25) Support GRO in IPSEC protocols, from Steffen Klassert" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1764 commits) Revert "ath10k: Search SMBIOS for OEM board file extension" net: socket: fix recvmmsg not returning error from sock_error bnxt_en: use eth_hw_addr_random() bpf: fix unlocking of jited image when module ronx not set arch: add ARCH_HAS_SET_MEMORY config net: napi_watchdog() can use napi_schedule_irqoff() tcp: Revert "tcp: tcp_probe: use spin_lock_bh()" net/hsr: use eth_hw_addr_random() net: mvpp2: enable building on 64-bit platforms net: mvpp2: switch to build_skb() in the RX path net: mvpp2: simplify MVPP2_PRS_RI_* definitions net: mvpp2: fix indentation of MVPP2_EXT_GLOBAL_CTRL_DEFAULT net: mvpp2: remove unused register definitions net: mvpp2: simplify mvpp2_bm_bufs_add() net: mvpp2: drop useless fields in mvpp2_bm_pool and related code net: mvpp2: remove unused 'tx_skb' field of 'struct mvpp2_tx_queue' net: mvpp2: release reference to txq_cpu[] entry after unmapping net: mvpp2: handle too large value in mvpp2_rx_time_coal_set() net: mvpp2: handle too large value handling in mvpp2_rx_pkts_coal_set() net: mvpp2: remove useless arguments in mvpp2_rx_{pkts, time}_coal_set ...
2017-02-21Merge branch 'stable-4.11' of git://git.infradead.org/users/pcmoore/auditLinus Torvalds1-3/+14
Pull audit updates from Paul Moore: "The audit changes for v4.11 are relatively small compared to what we did for v4.10, both in terms of size and impact. - two patches from Steve tweak the formatting for some of the audit records to make them more consistent with other audit records. - three patches from Richard record the name of a module on module load, fix the logging of sockaddr information when using socketcall() on 32-bit systems, and add the ability to reset audit's lost record counter. - my lone patch just fixes an annoying style nit that I was reminded about by one of Richard's patches. All these patches pass our test suite" * 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit: audit: remove unnecessary curly braces from switch/case statements audit: log module name on init_module audit: log 32-bit socketcalls audit: add feature audit_lost reset audit: Make AUDIT_ANOM_ABEND event normalized audit: Make AUDIT_KERNEL event conform to the specification
2017-01-18audit: log 32-bit socketcallsRichard Guy Briggs1-3/+14
32-bit socketcalls were not being logged by audit on x86_64 systems. Log them. This is basically a duplicate of the call from net/socket.c:sys_socketcall(), but it addresses the impedance mismatch between 32-bit userspace process and 64-bit kernel audit. See: https://github.com/linux-audit/audit-kernel/issues/14 Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-01-04net: Assert at build time the assumptions we make about the CMSG header.David S. Miller1-0/+3
It must always be the case that CMSG_ALIGN(sizeof(hdr)) == sizeof(hdr). Otherwise there are missing adjustments in the various calculations that parse and build these things. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-04scm: remove use CMSG{_COMPAT}_ALIGN(sizeof(struct {compat_}cmsghdr))yuan linyu1-8/+6
sizeof(struct cmsghdr) and sizeof(struct compat_cmsghdr) already aligned. remove use CMSG_ALIGN(sizeof(struct cmsghdr)) and CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr)) keep code consistent. Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-24Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds1-1/+1
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-09packet: compat support for sock_fprogWillem de Bruijn1-2/+15
Socket option PACKET_FANOUT_DATA takes a struct sock_fprog as argument if PACKET_FANOUT has mode PACKET_FANOUT_CBPF. This structure contains a pointer into user memory. If userland is 32-bit and kernel is 64-bit the two disagree about the layout of struct sock_fprog. Add compat setsockopt support to convert a 32-bit compat_sock_fprog to a 64-bit sock_fprog. This is analogous to compat_sock_fprog support for SO_REUSEPORT added in commit 1957598840f4 ("soreuseport: add compat case for setsockopt SO_ATTACH_REUSEPORT_CBPF"). Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-06soreuseport: add compat case for setsockopt SO_ATTACH_REUSEPORT_CBPFHelge Deller1-1/+2
Commit 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF") missed to add the compat case for the SO_ATTACH_REUSEPORT_CBPF option. Signed-off-by: Helge Deller <deller@gmx.de> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-09net: switch importing msghdr from userland to {compat_,}import_iovec()Al Viro1-11/+7
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-03-23net: socket: add support for async operationstadeusz.struk@intel.com1-0/+2
Add support for async operations. Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-0/+7
Conflicts: drivers/net/ethernet/emulex/benet/be_main.c net/core/sysctl_net_core.c net/ipv4/inet_diag.c The be_main.c conflict resolution was really tricky. The conflict hunks generated by GIT were very unhelpful, to say the least. It split functions in half and moved them around, when the real actual conflict only existed solely inside of one function, that being be_map_pci_bars(). So instead, to resolve this, I checked out be_main.c from the top of net-next, then I applied the be_main.c changes from 'net' since the last time I merged. And this worked beautifully. The inet_diag.c and sysctl_net_core.c conflicts were simple overlapping changes, and were easily to resolve. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() ↵Catalin Marinas1-0/+7
behaviour Commit db31c55a6fb2 (net: clamp ->msg_namelen instead of returning an error) introduced the clamping of msg_namelen when the unsigned value was larger than sizeof(struct sockaddr_storage). This caused a msg_namelen of -1 to be valid. The native code was subsequently fixed by commit dbb490b96584 (net: socket: error on a negative msg_namelen). In addition, the native code sets msg_namelen to 0 when msg_name is NULL. This was done in commit (6a2a2b3ae075 net:socket: set msg_namelen to 0 if msg_name is passed as NULL in msghdr struct from userland) and subsequently updated by 08adb7dabd48 (fold verify_iovec() into copy_msghdr_from_user()). This patch brings the get_compat_msghdr() in line with copy_msghdr_from_user(). Fixes: db31c55a6fb2 (net: clamp ->msg_namelen instead of returning an error) Cc: David S. Miller <davem@davemloft.net> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-9/+0
Conflicts: drivers/net/ethernet/rocker/rocker.c The rocker commit was two overlapping changes, one to rename the ->vport member to ->pport, and another making the bitmask expression use '1ULL' instead of plain '1'. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-23net: compat: Ignore MSG_CMSG_COMPAT in compat_sys_{send, recv}msgCatalin Marinas1-9/+0
With commit a7526eb5d06b (net: Unbreak compat_sys_{send,recv}msg), the MSG_CMSG_COMPAT flag is blocked at the compat syscall entry points, changing the kernel compat behaviour from the one before the commit it was trying to fix (1be374a0518a, net: Block MSG_CMSG_COMPAT in send(m)msg and recv(m)msg). On 32-bit kernels (!CONFIG_COMPAT), MSG_CMSG_COMPAT is 0 and the native 32-bit sys_sendmsg() allows flag 0x80000000 to be set (it is ignored by the kernel). However, on a 64-bit kernel, the compat ABI is different with commit a7526eb5d06b. This patch changes the compat_sys_{send,recv}msg behaviour to the one prior to commit 1be374a0518a. The problem was found running 32-bit LTP (sendmsg01) binary on an arm64 kernel. Arguably, LTP should not pass 0xffffffff as flags to sendmsg() but the general rule is not to break user ABI (even when the user behaviour is not entirely sane). Fixes: a7526eb5d06b (net: Unbreak compat_sys_{send,recv}msg) Cc: Andy Lutomirski <luto@amacapital.net> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-22net: __aligned(size) is preferred over __attribute__((aligned(size)))Ameen Ali1-5/+5
Signed-off-by: Ameen Ali <AmeenAli023@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09put iov_iter into msghdrAl Viro1-4/+6
Note that the code _using_ ->msg_iter at that point will be very unhappy with anything other than unshifted iovec-backed iov_iter. We still need to convert users to proper primitives. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-19fold verify_iovec() into copy_msghdr_from_user()Al Viro1-30/+22
... and do the same on the compat side of things. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-19{compat_,}verify_iovec(): switch to generic copying of iovecsAl Viro1-36/+15
use {compat_,}rw_copy_check_uvector(). As the result, we are guaranteed that all iovecs seen in ->msg_iov by ->sendmsg() and ->recvmsg() will pass access_ok(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-19separate kernel- and userland-side msghdrAl Viro1-2/+2
Kernel-side struct msghdr is (currently) using the same layout as userland one, but it's not a one-to-one copy - even without considering 32bit compat issues, we have msg_iov, msg_name and msg_control copied to kernel[1]. It's fairly localized, so we get away with a few functions where that knowledge is needed (and we could shrink that set even more). Pretty much everything deals with the kernel-side variant and the few places that want userland one just use a bunch of force-casts to paper over the differences. The thing is, kernel-side definition of struct msghdr is *not* exposed in include/uapi - libc doesn't see it, etc. So we can add struct user_msghdr, with proper annotations and let the few places that ever deal with those beasts use it for userland pointers. Saner typechecking aside, that will allow to change the layout of kernel-side msghdr - e.g. replace msg_iov/msg_iovlen there with struct iov_iter, getting rid of the need to modify the iovec as we copy data to/from it, etc. We could introduce kernel_msghdr instead, but that would create much more noise - the absolute majority of the instances would need to have the type switched to kernel_msghdr and definition of struct msghdr in include/linux/socket.h is not going to be seen by userland anyway. This commit just introduces user_msghdr and switches the few places that are dealing with userland-side msghdr to it. [1] actually, it's even trickier than that - we copy msg_control for sendmsg, but keep the userland address on recvmsg. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-07-29net: sendmsg: fix NULL pointer dereferenceAndrey Ryabinin1-4/+5
Sasha's report: > While fuzzing with trinity inside a KVM tools guest running the latest -next > kernel with the KASAN patchset, I've stumbled on the following spew: > > [ 4448.949424] ================================================================== > [ 4448.951737] AddressSanitizer: user-memory-access on address 0 > [ 4448.952988] Read of size 2 by thread T19638: > [ 4448.954510] CPU: 28 PID: 19638 Comm: trinity-c76 Not tainted 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813 > [ 4448.956823] ffff88046d86ca40 0000000000000000 ffff880082f37e78 ffff880082f37a40 > [ 4448.958233] ffffffffb6e47068 ffff880082f37a68 ffff880082f37a58 ffffffffb242708d > [ 4448.959552] 0000000000000000 ffff880082f37a88 ffffffffb24255b1 0000000000000000 > [ 4448.961266] Call Trace: > [ 4448.963158] dump_stack (lib/dump_stack.c:52) > [ 4448.964244] kasan_report_user_access (mm/kasan/report.c:184) > [ 4448.965507] __asan_load2 (mm/kasan/kasan.c:352) > [ 4448.966482] ? netlink_sendmsg (net/netlink/af_netlink.c:2339) > [ 4448.967541] netlink_sendmsg (net/netlink/af_netlink.c:2339) > [ 4448.968537] ? get_parent_ip (kernel/sched/core.c:2555) > [ 4448.970103] sock_sendmsg (net/socket.c:654) > [ 4448.971584] ? might_fault (mm/memory.c:3741) > [ 4448.972526] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3740) > [ 4448.973596] ? verify_iovec (net/core/iovec.c:64) > [ 4448.974522] ___sys_sendmsg (net/socket.c:2096) > [ 4448.975797] ? put_lock_stats.isra.13 (./arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254) > [ 4448.977030] ? lock_release_holdtime (kernel/locking/lockdep.c:273) > [ 4448.978197] ? lock_release_non_nested (kernel/locking/lockdep.c:3434 (discriminator 1)) > [ 4448.979346] ? check_chain_key (kernel/locking/lockdep.c:2188) > [ 4448.980535] __sys_sendmmsg (net/socket.c:2181) > [ 4448.981592] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600) > [ 4448.982773] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607) > [ 4448.984458] ? syscall_trace_enter (arch/x86/kernel/ptrace.c:1500 (discriminator 2)) > [ 4448.985621] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600) > [ 4448.986754] SyS_sendmmsg (net/socket.c:2201) > [ 4448.987708] tracesys (arch/x86/kernel/entry_64.S:542) > [ 4448.988929] ================================================================== This reports means that we've come to netlink_sendmsg() with msg->msg_name == NULL and msg->msg_namelen > 0. After this report there was no usual "Unable to handle kernel NULL pointer dereference" and this gave me a clue that address 0 is mapped and contains valid socket address structure in it. This bug was introduced in f3d3342602f8bcbf37d7c46641cb9bca7618eb1c (net: rework recvmsg handler msg_name and msg_namelen logic). Commit message states that: "Set msg->msg_name = NULL if user specified a NULL in msg_name but had a non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't affect sendto as it would bail out earlier while trying to copy-in the address." But in fact this affects sendto when address 0 is mapped and contains socket address structure in it. In such case copy-in address will succeed, verify_iovec() function will successfully exit with msg->msg_namelen > 0 and msg->msg_name == NULL. This patch fixes it by setting msg_namelen to 0 if msg_name == NULL. Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Eric Dumazet <edumazet@google.com> Cc: <stable@vger.kernel.org> Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-06net/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter typesHeiko Carstens1-4/+4
In order to allow the COMPAT_SYSCALL_DEFINE macro generate code that performs proper zero and sign extension convert all 64 bit parameters to their corresponding 32 bit compat counterparts. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2014-03-06net/compat: convert to COMPAT_SYSCALL_DEFINEHeiko Carstens1-12/+12
Convert all compat system call functions where all parameter types have a size of four or less than four bytes, or are pointer types to COMPAT_SYSCALL_DEFINE. The implicit casts within COMPAT_SYSCALL_DEFINE will perform proper zero and sign extension to 64 bit of all parameters if needed. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2014-01-30x86, x32: Correct invalid use of user timespec in the kernelPaX Team1-7/+2
The x32 case for the recvmsg() timout handling is broken: asmlinkage long compat_sys_recvmmsg(int fd, struct compat_mmsghdr __user *mmsg, unsigned int vlen, unsigned int flags, struct compat_timespec __user *timeout) { int datagrams; struct timespec ktspec; if (flags & MSG_CMSG_COMPAT) return -EINVAL; if (COMPAT_USE_64BIT_TIME) return __sys_recvmmsg(fd, (struct mmsghdr __user *)mmsg, vlen, flags | MSG_CMSG_COMPAT, (struct timespec *) timeout); ... The timeout pointer parameter is provided by userland (hence the __user annotation) but for x32 syscalls it's simply cast to a kernel pointer and is passed to __sys_recvmmsg which will eventually directly dereference it for both reading and writing. Other callers to __sys_recvmmsg properly copy from userland to the kernel first. The bug was introduced by commit ee4fa23c4bfc ("compat: Use COMPAT_USE_64BIT_TIME in net/compat.c") and should affect all kernels since 3.4 (and perhaps vendor kernels if they backported x32 support along with this code). Note that CONFIG_X86_X32_ABI gets enabled at build time and only if CONFIG_X86_X32 is enabled and ld can build x32 executables. Other uses of COMPAT_USE_64BIT_TIME seem fine. This addresses CVE-2014-0038. Signed-off-by: PaX Team <pageexec@freemail.hu> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> # v3.4+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-29net: clamp ->msg_namelen instead of returning an errorDan Carpenter1-1/+1
If kmsg->msg_namelen > sizeof(struct sockaddr_storage) then in the original code that would lead to memory corruption in the kernel if you had audit configured. If you didn't have audit configured it was harmless. There are some programs such as beta versions of Ruby which use too large of a buffer and returning an error code breaks them. We should clamp the ->msg_namelen value instead. Fixes: 1661bf364ae9 ("net: heap overflow in __audit_sockaddr()") Reported-by: Eric Wong <normalperson@yhbt.net> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Tested-by: Eric Wong <normalperson@yhbt.net> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-20net: rework recvmsg handler msg_name and msg_namelen logicHannes Frederic Sowa1-1/+2
This patch now always passes msg->msg_namelen as 0. recvmsg handlers must set msg_namelen to the proper size <= sizeof(struct sockaddr_storage) to return msg_name to the user. This prevents numerous uninitialized memory leaks we had in the recvmsg handlers and makes it harder for new code to accidentally leak uninitialized memory. Optimize for the case recvfrom is called with NULL as address. We don't need to copy the address at all, so set it to NULL before invoking the recvmsg handler. We can do so, because all the recvmsg handlers must cope with the case a plain read() is called on them. read() also sets msg_name to NULL. Also document these changes in include/linux/net.h as suggested by David Miller. Changes since RFC: Set msg->msg_name = NULL if user specified a NULL in msg_name but had a non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't affect sendto as it would bail out earlier while trying to copy-in the address. It also more naturally reflects the logic by the callers of verify_iovec. With this change in place I could remove " if (!uaddr || msg_sys->msg_namelen == 0) msg->msg_name = NULL ". This change does not alter the user visible error logic as we ignore msg_namelen as long as msg_name is NULL. Also remove two unnecessary curly brackets in ___sys_recvmsg and change comments to netdev style. Cc: David Miller <davem@davemloft.net> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-03net: heap overflow in __audit_sockaddr()Dan Carpenter1-0/+2
We need to cap ->msg_namelen or it leads to a buffer overflow when we to the memcpy() in __audit_sockaddr(). It requires CAP_AUDIT_CONTROL to exploit this bug. The call tree is: ___sys_recvmsg() move_addr_to_user() audit_sockaddr() __audit_sockaddr() Reported-by: Jüri Aedla <juri.aedla@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-06net: Unbreak compat_sys_{send,recv}msgAndy Lutomirski1-2/+11
I broke them in this commit: commit 1be374a0518a288147c6a7398792583200a67261 Author: Andy Lutomirski <luto@amacapital.net> Date: Wed May 22 14:07:44 2013 -0700 net: Block MSG_CMSG_COMPAT in send(m)msg and recv(m)msg This patch adds __sys_sendmsg and __sys_sendmsg as common helpers that accept MSG_CMSG_COMPAT and blocks MSG_CMSG_COMPAT at the syscall entrypoints. It also reverts some unnecessary checks in sys_socketcall. Apparently I was suffering from underscore blindness the first time around. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Tested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-26make get_file() return its argumentAl Viro1-2/+1
simplifies a bunch of callers... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-22net: Fix references to out-of-scope variables in put_cmsg_compat()Jesper Juhl1-2/+2
In net/compat.c::put_cmsg_compat() we may assign 'data' the address of either the 'ctv' or 'cts' local variables inside the 'if (!COMPAT_USE_64BIT_TIME)' branch. Those variables go out of scope at the end of the 'if' statement, so when we use 'data' further down in 'copy_to_user(CMSG_COMPAT_DATA(cm), data, cmlen - sizeof(struct compat_cmsghdr))' there's no telling what it may be refering to - not good. Fix the problem by simply giving 'ctv' and 'cts' function scope. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-21Merge branch 'next' of ↵Linus Torvalds1-8/+0
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "New notable features: - The seccomp work from Will Drewry - PR_{GET,SET}_NO_NEW_PRIVS from Andy Lutomirski - Longer security labels for Smack from Casey Schaufler - Additional ptrace restriction modes for Yama by Kees Cook" Fix up trivial context conflicts in arch/x86/Kconfig and include/linux/filter.h * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (65 commits) apparmor: fix long path failure due to disconnected path apparmor: fix profile lookup for unconfined ima: fix filename hint to reflect script interpreter name KEYS: Don't check for NULL key pointer in key_validate() Smack: allow for significantly longer Smack labels v4 gfp flags for security_inode_alloc()? Smack: recursive tramsmute Yama: replace capable() with ns_capable() TOMOYO: Accept manager programs which do not start with / . KEYS: Add invalidation support KEYS: Do LRU discard in full keyrings KEYS: Permit in-place link replacement in keyring list KEYS: Perform RCU synchronisation on keys prior to key destruction KEYS: Announce key type (un)registration KEYS: Reorganise keys Makefile KEYS: Move the key config into security/keys/Kconfig KEYS: Use the compat keyctl() syscall wrapper on Sparc64 for Sparc32 compat Yama: remove an unused variable samples/seccomp: fix dependencies on arch macros Yama: add additional ptrace scopes ...
2012-04-15net: cleanup unsigned to unsigned intEric Dumazet1-5/+5
Use of "unsigned int" is preferred to bare "unsigned" in net tree. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-14net/compat.c,linux/filter.h: share compat_sock_fprogWill Drewry1-8/+0
Any other users of bpf_*_filter that take a struct sock_fprog from userspace will need to be able to also accept a compat_sock_fprog if the arch supports compat calls. This change allows the existing compat_sock_fprog be shared. Signed-off-by: Will Drewry <wad@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Eric Paris <eparis@redhat.com> v18: tasered by the apostrophe police v14: rebase/nochanges v13: rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc v12: rebase on to linux-next v11: introduction Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-03-29Merge branch 'x86-x32-for-linus' of ↵Linus Torvalds1-25/+40
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x32 support for x86-64 from Ingo Molnar: "This tree introduces the X32 binary format and execution mode for x86: 32-bit data space binaries using 64-bit instructions and 64-bit kernel syscalls. This allows applications whose working set fits into a 32 bits address space to make use of 64-bit instructions while using a 32-bit address space with shorter pointers, more compressed data structures, etc." Fix up trivial context conflicts in arch/x86/{Kconfig,vdso/vma.c} * 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits) x32: Fix alignment fail in struct compat_siginfo x32: Fix stupid ia32/x32 inversion in the siginfo format x32: Add ptrace for x32 x32: Switch to a 64-bit clock_t x32: Provide separate is_ia32_task() and is_x32_task() predicates x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls x86/x32: Fix the binutils auto-detect x32: Warn and disable rather than error if binutils too old x32: Only clear TIF_X32 flag once x32: Make sure TS_COMPAT is cleared for x32 tasks fs: Remove missed ->fds_bits from cessation use of fd_set structs internally fs: Fix close_on_exec pointer in alloc_fdtable x32: Drop non-__vdso weak symbols from the x32 VDSO x32: Fix coding style violations in the x32 VDSO code x32: Add x32 VDSO support x32: Allow x32 to be configured x32: If configured, add x32 system calls to system call tables x32: Handle process creation x32: Signal-related system calls x86: Add #ifdef CONFIG_COMPAT to <asm/sys_ia32.h> ...
2012-03-11net: get rid of some pointless casts to sockaddrMaciej Żenczykowski1-1/+1
The following 4 functions: move_addr_to_kernel move_addr_to_user verify_iovec verify_compat_iovec are always effectively called with a sockaddr_storage. Make this explicit by changing their signature. This removes a large number of casts from sockaddr_storage to sockaddr. Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-20compat: Use COMPAT_USE_64BIT_TIME in net/compat.cH. J. Lu1-25/+40
Handle 64-bit time structures in the networking core compat code. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: David S. Miller <davem@davemloft.net>
2011-10-31net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modulesPaul Gortmaker1-0/+1
These files are non modular, but need to export symbols using the macros now living in export.h -- call out the include so that things won't break when we remove the implicit presence of module.h from everywhere. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-05-05net: Add sendmmsg socket system callAnton Blanchard1-3/+13
This patch adds a multiple message send syscall and is the send version of the existing recvmmsg syscall. This is heavily based on the patch by Arnaldo that added recvmmsg. I wrote a microbenchmark to test the performance gains of using this new syscall: http://ozlabs.org/~anton/junkcode/sendmmsg_test.c The test was run on a ppc64 box with a 10 Gbit network card. The benchmark can send both UDP and RAW ethernet packets. 64B UDP batch pkts/sec 1 804570 2 872800 (+ 8 %) 4 916556 (+14 %) 8 939712 (+17 %) 16 952688 (+18 %) 32 956448 (+19 %) 64 964800 (+20 %) 64B raw socket batch pkts/sec 1 1201449 2 1350028 (+12 %) 4 1461416 (+22 %) 8 1513080 (+26 %) 16 1541216 (+28 %) 32 1553440 (+29 %) 64 1557888 (+30 %) We see a 20% improvement in throughput on UDP send and 30% on raw socket send. [ Add sparc syscall entries. -DaveM ] Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-28net: Limit socket I/O iovec total length to INT_MAX.David S. Miller1-4/+6
This helps protect us from overflow issues down in the individual protocol sendmsg/recvmsg handlers. Once we hit INT_MAX we truncate out the rest of the iovec by setting the iov_len members to zero. This works because: 1) For SOCK_STREAM and SOCK_SEQPACKET sockets, partial writes are allowed and the application will just continue with another write to send the rest of the data. 2) For datagram oriented sockets, where there must be a one-to-one correspondance between write() calls and packets on the wire, INT_MAX is going to be far larger than the packet size limit the protocol is going to check for and signal with -EMSGSIZE. Based upon a patch by Linus Torvalds. Signed-off-by: David S. Miller <davem@davemloft.net>
2010-06-03From abbffa2aa9bd6f8df16d0d0a102af677510d8b9a Mon Sep 17 00:00:00 2001Eric Dumazet1-25/+22
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Thu, 3 Jun 2010 04:29:41 +0000 Subject: [PATCH 2/3] net: net/socket.c and net/compat.c cleanups cleanup patch, to match modern coding style. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> --- net/compat.c | 47 ++++++++--------- net/socket.c | 165 ++++++++++++++++++++++++++++------------------------------ 2 files changed, 102 insertions(+), 110 deletions(-) diff --git a/net/compat.c b/net/compat.c index 1cf7590..63d260e 100644 --- a/net/compat.c +++ b/net/compat.c @@ -81,7 +81,7 @@ int verify_compat_iovec(struct msghdr *kern_msg, struct iovec *kern_iov, int tot_len; if (kern_msg->msg_namelen) { - if (mode==VERIFY_READ) { + if (mode == VERIFY_READ) { int err = move_addr_to_kernel(kern_msg->msg_name, kern_msg->msg_namelen, kern_address); @@ -354,7 +354,7 @@ static int do_set_attach_filter(struct socket *sock, int level, int optname, static int do_set_sock_timeout(struct socket *sock, int level, int optname, char __user *optval, unsigned int optlen) { - struct compat_timeval __user *up = (struct compat_timeval __user *) optval; + struct compat_timeval __user *up = (struct compat_timeval __user *)optval; struct timeval ktime; mm_segment_t old_fs; int err; @@ -367,7 +367,7 @@ static int do_set_sock_timeout(struct socket *sock, int level, return -EFAULT; old_fs = get_fs(); set_fs(KERNEL_DS); - err = sock_setsockopt(sock, level, optname, (char *) &ktime, sizeof(ktime)); + err = sock_setsockopt(sock, level, optname, (char *)&ktime, sizeof(ktime)); set_fs(old_fs); return err; @@ -389,11 +389,10 @@ asmlinkage long compat_sys_setsockopt(int fd, int level, int optname, char __user *optval, unsigned int optlen) { int err; - struct socket *sock; + struct socket *sock = sockfd_lookup(fd, &err); - if ((sock = sockfd_lookup(fd, &err))!=NULL) - { - err = security_socket_setsockopt(sock,level,optname); + if (sock) { + err = security_socket_setsockopt(sock, level, optname); if (err) { sockfd_put(sock); return err; @@ -453,7 +452,7 @@ static int compat_sock_getsockopt(struct socket *sock, int level, int optname, int compat_sock_get_timestamp(struct sock *sk, struct timeval __user *userstamp) { struct compat_timeval __user *ctv = - (struct compat_timeval __user*) userstamp; + (struct compat_timeval __user *) userstamp; int err = -ENOENT; struct timeval tv; @@ -477,7 +476,7 @@ EXPORT_SYMBOL(compat_sock_get_timestamp); int compat_sock_get_timestampns(struct sock *sk, struct timespec __user *userstamp) { struct compat_timespec __user *ctv = - (struct compat_timespec __user*) userstamp; + (struct compat_timespec __user *) userstamp; int err = -ENOENT; struct timespec ts; @@ -502,12 +501,10 @@ asmlinkage long compat_sys_getsockopt(int fd, int level, int optname, char __user *optval, int __user *optlen) { int err; - struct socket *sock; + struct socket *sock = sockfd_lookup(fd, &err); - if ((sock = sockfd_lookup(fd, &err))!=NULL) - { - err = security_socket_getsockopt(sock, level, - optname); + if (sock) { + err = security_socket_getsockopt(sock, level, optname); if (err) { sockfd_put(sock); return err; @@ -557,7 +554,7 @@ struct compat_group_filter { int compat_mc_setsockopt(struct sock *sock, int level, int optname, char __user *optval, unsigned int optlen, - int (*setsockopt)(struct sock *,int,int,char __user *,unsigned int)) + int (*setsockopt)(struct sock *, int, int, char __user *, unsigned int)) { char __user *koptval = optval; int koptlen = optlen; @@ -640,12 +637,11 @@ int compat_mc_setsockopt(struct sock *sock, int level, int optname, } return setsockopt(sock, level, optname, koptval, koptlen); } - EXPORT_SYMBOL(compat_mc_setsockopt); int compat_mc_getsockopt(struct sock *sock, int level, int optname, char __user *optval, int __user *optlen, - int (*getsockopt)(struct sock *,int,int,char __user *,int __user *)) + int (*getsockopt)(struct sock *, int, int, char __user *, int __user *)) { struct compat_group_filter __user *gf32 = (void *)optval; struct group_filter __user *kgf; @@ -681,7 +677,7 @@ int compat_mc_getsockopt(struct sock *sock, int level, int optname, __put_user(interface, &kgf->gf_interface) || __put_user(fmode, &kgf->gf_fmode) || __put_user(numsrc, &kgf->gf_numsrc) || - copy_in_user(&kgf->gf_group,&gf32->gf_group,sizeof(kgf->gf_group))) + copy_in_user(&kgf->gf_group, &gf32->gf_group, sizeof(kgf->gf_group))) return -EFAULT; err = getsockopt(sock, level, optname, (char __user *)kgf, koptlen); @@ -714,21 +710,22 @@ int compat_mc_getsockopt(struct sock *sock, int level, int optname, copylen = numsrc * sizeof(gf32->gf_slist[0]); if (copylen > klen) copylen = klen; - if (copy_in_user(gf32->gf_slist, kgf->gf_slist, copylen)) + if (copy_in_user(gf32->gf_slist, kgf->gf_slist, copylen)) return -EFAULT; } return err; } - EXPORT_SYMBOL(compat_mc_getsockopt); /* Argument list sizes for compat_sys_socketcall */ #define AL(x) ((x) * sizeof(u32)) -static unsigned char nas[20]={AL(0),AL(3),AL(3),AL(3),AL(2),AL(3), - AL(3),AL(3),AL(4),AL(4),AL(4),AL(6), - AL(6),AL(2),AL(5),AL(5),AL(3),AL(3), - AL(4),AL(5)}; +static unsigned char nas[20] = { + AL(0), AL(3), AL(3), AL(3), AL(2), AL(3), + AL(3), AL(3), AL(4), AL(4), AL(4), AL(6), + AL(6), AL(2), AL(5), AL(5), AL(3), AL(3), + AL(4), AL(5) +}; #undef AL asmlinkage long compat_sys_sendmsg(int fd, struct compat_msghdr __user *msg, unsigned flags) @@ -827,7 +824,7 @@ asmlinkage long compat_sys_socketcall(int call, u32 __user *args) compat_ptr(a[4]), compat_ptr(a[5])); break; case SYS_SHUTDOWN: - ret = sys_shutdown(a0,a1); + ret = sys_shutdown(a0, a1); break; case SYS_SETSOCKOPT: ret = compat_sys_setsockopt(a0, a1, a[2], diff --git a/net/socket.c b/net/socket.c index 367d547..b63c051 100644 --- a/net/socket.c +++ b/net/socket.c @@ -124,7 +124,7 @@ static int sock_fasync(int fd, struct file *filp, int on); static ssize_t sock_sendpage(struct file *file, struct page *page, int offset, size_t size, loff_t *ppos, int more); static ssize_t sock_splice_read(struct file *file, loff_t *ppos, - struct pipe_inode_info *pipe, size_t len, + struct pipe_inode_info *pipe, size_t len, unsigned int flags); /* @@ -162,7 +162,7 @@ static const struct net_proto_family *net_families[NPROTO] __read_mostly; * Statistics counters of the socket lists */ -static DEFINE_PER_CPU(int, sockets_in_use) = 0; +static DEFINE_PER_CPU(int, sockets_in_use); /* * Support routines. @@ -309,9 +309,9 @@ static int init_inodecache(void) } static const struct super_operations sockfs_ops = { - .alloc_inode = sock_alloc_inode, - .destroy_inode =sock_destroy_inode, - .statfs = simple_statfs, + .alloc_inode = sock_alloc_inode, + .destroy_inode = sock_destroy_inode, + .statfs = simple_statfs, }; static int sockfs_get_sb(struct file_system_type *fs_type, @@ -411,6 +411,7 @@ int sock_map_fd(struct socket *sock, int flags) return fd; } +EXPORT_SYMBOL(sock_map_fd); static struct socket *sock_from_file(struct file *file, int *err) { @@ -422,7 +423,7 @@ static struct socket *sock_from_file(struct file *file, int *err) } /** - * sockfd_lookup - Go from a file number to its socket slot + * sockfd_lookup - Go from a file number to its socket slot * @fd: file handle * @err: pointer to an error code return * @@ -450,6 +451,7 @@ struct socket *sockfd_lookup(int fd, int *err) fput(file); return sock; } +EXPORT_SYMBOL(sockfd_lookup); static struct socket *sockfd_lookup_light(int fd, int *err, int *fput_needed) { @@ -540,6 +542,7 @@ void sock_release(struct socket *sock) } sock->file = NULL; } +EXPORT_SYMBOL(sock_release); int sock_tx_timestamp(struct msghdr *msg, struct sock *sk, union skb_shared_tx *shtx) @@ -586,6 +589,7 @@ int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t size) ret = wait_on_sync_kiocb(&iocb); return ret; } +EXPORT_SYMBOL(sock_sendmsg); int kernel_sendmsg(struct socket *sock, struct msghdr *msg, struct kvec *vec, size_t num, size_t size) @@ -604,6 +608,7 @@ int kernel_sendmsg(struct socket *sock, struct msghdr *msg, set_fs(oldfs); return result; } +EXPORT_SYMBOL(kernel_sendmsg); static int ktime2ts(ktime_t kt, struct timespec *ts) { @@ -664,7 +669,6 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk, put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPING, sizeof(ts), &ts); } - EXPORT_SYMBOL_GPL(__sock_recv_timestamp); inline void sock_recv_drops(struct msghdr *msg, struct sock *sk, struct sk_buff *skb) @@ -720,6 +724,7 @@ int sock_recvmsg(struct socket *sock, struct msghdr *msg, ret = wait_on_sync_kiocb(&iocb); return ret; } +EXPORT_SYMBOL(sock_recvmsg); static int sock_recvmsg_nosec(struct socket *sock, struct msghdr *msg, size_t size, int flags) @@ -752,6 +757,7 @@ int kernel_recvmsg(struct socket *sock, struct msghdr *msg, set_fs(oldfs); return result; } +EXPORT_SYMBOL(kernel_recvmsg); static void sock_aio_dtor(struct kiocb *iocb) { @@ -774,7 +780,7 @@ static ssize_t sock_sendpage(struct file *file, struct page *page, } static ssize_t sock_splice_read(struct file *file, loff_t *ppos, - struct pipe_inode_info *pipe, size_t len, + struct pipe_inode_info *pipe, size_t len, unsigned int flags) { struct socket *sock = file->private_data; @@ -887,7 +893,7 @@ static ssize_t sock_aio_write(struct kiocb *iocb, const struct iovec *iov, */ static DEFINE_MUTEX(br_ioctl_mutex); -static int (*br_ioctl_hook) (struct net *, unsigned int cmd, void __user *arg) = NULL; +static int (*br_ioctl_hook) (struct net *, unsigned int cmd, void __user *arg); void brioctl_set(int (*hook) (struct net *, unsigned int, void __user *)) { @@ -895,7 +901,6 @@ void brioctl_set(int (*hook) (struct net *, unsigned int, void __user *)) br_ioctl_hook = hook; mutex_unlock(&br_ioctl_mutex); } - EXPORT_SYMBOL(brioctl_set); static DEFINE_MUTEX(vlan_ioctl_mutex); @@ -907,7 +912,6 @@ void vlan_ioctl_set(int (*hook) (struct net *, void __user *)) vlan_ioctl_hook = hook; mutex_unlock(&vlan_ioctl_mutex); } - EXPORT_SYMBOL(vlan_ioctl_set); static DEFINE_MUTEX(dlci_ioctl_mutex); @@ -919,7 +923,6 @@ void dlci_ioctl_set(int (*hook) (unsigned int, void __user *)) dlci_ioctl_hook = hook; mutex_unlock(&dlci_ioctl_mutex); } - EXPORT_SYMBOL(dlci_ioctl_set); static long sock_do_ioctl(struct net *net, struct socket *sock, @@ -1047,6 +1050,7 @@ out_release: sock = NULL; goto out; } +EXPORT_SYMBOL(sock_create_lite); /* No kernel lock held - perfect */ static unsigned int sock_poll(struct file *file, poll_table *wait) @@ -1147,6 +1151,7 @@ call_kill: rcu_read_unlock(); return 0; } +EXPORT_SYMBOL(sock_wake_async); static int __sock_create(struct net *net, int family, int type, int protocol, struct socket **res, int kern) @@ -1265,11 +1270,13 @@ int sock_create(int family, int type, int protocol, struct socket **res) { return __sock_create(current->nsproxy->net_ns, family, type, protocol, res, 0); } +EXPORT_SYMBOL(sock_create); int sock_create_kern(int family, int type, int protocol, struct socket **res) { return __sock_create(&init_net, family, type, protocol, res, 1); } +EXPORT_SYMBOL(sock_create_kern); SYSCALL_DEFINE3(socket, int, family, int, type, int, protocol) { @@ -1474,7 +1481,8 @@ SYSCALL_DEFINE4(accept4, int, fd, struct sockaddr __user *, upeer_sockaddr, goto out; err = -ENFILE; - if (!(newsock = sock_alloc())) + newsock = sock_alloc(); + if (!newsock) goto out_put; newsock->type = sock->type; @@ -1861,8 +1869,7 @@ SYSCALL_DEFINE3(sendmsg, int, fd, struct msghdr __user *, msg, unsigned, flags) if (MSG_CMSG_COMPAT & flags) { if (get_compat_msghdr(&msg_sys, msg_compat)) return -EFAULT; - } - else if (copy_from_user(&msg_sys, msg, sizeof(struct msghdr))) + } else if (copy_from_user(&msg_sys, msg, sizeof(struct msghdr))) return -EFAULT; sock = sockfd_lookup_light(fd, &err, &fput_needed); @@ -1964,8 +1971,7 @@ static int __sys_recvmsg(struct socket *sock, struct msghdr __user *msg, if (MSG_CMSG_COMPAT & flags) { if (get_compat_msghdr(msg_sys, msg_compat)) return -EFAULT; - } - else if (copy_from_user(msg_sys, msg, sizeof(struct msghdr))) + } else if (copy_from_user(msg_sys, msg, sizeof(struct msghdr))) return -EFAULT; err = -EMSGSIZE; @@ -2191,10 +2197,10 @@ SYSCALL_DEFINE5(recvmmsg, int, fd, struct mmsghdr __user *, mmsg, /* Argument list sizes for sys_socketcall */ #define AL(x) ((x) * sizeof(unsigned long)) static const unsigned char nargs[20] = { - AL(0),AL(3),AL(3),AL(3),AL(2),AL(3), - AL(3),AL(3),AL(4),AL(4),AL(4),AL(6), - AL(6),AL(2),AL(5),AL(5),AL(3),AL(3), - AL(4),AL(5) + AL(0), AL(3), AL(3), AL(3), AL(2), AL(3), + AL(3), AL(3), AL(4), AL(4), AL(4), AL(6), + AL(6), AL(2), AL(5), AL(5), AL(3), AL(3), + AL(4), AL(5) }; #undef AL @@ -2340,6 +2346,7 @@ int sock_register(const struct net_proto_family *ops) printk(KERN_INFO "NET: Registered protocol family %d\n", ops->family); return err; } +EXPORT_SYMBOL(sock_register); /** * sock_unregister - remove a protocol handler @@ -2366,6 +2373,7 @@ void sock_unregister(int family) printk(KERN_INFO "NET: Unregistered protocol family %d\n", family); } +EXPORT_SYMBOL(sock_unregister); static int __init sock_init(void) { @@ -2490,13 +2498,13 @@ static int dev_ifconf(struct net *net, struct compat_ifconf __user *uifc32) ifc.ifc_req = NULL; uifc = compat_alloc_user_space(sizeof(struct ifconf)); } else { - size_t len =((ifc32.ifc_len / sizeof (struct compat_ifreq)) + 1) * - sizeof (struct ifreq); + size_t len = ((ifc32.ifc_len / sizeof(struct compat_ifreq)) + 1) * + sizeof(struct ifreq); uifc = compat_alloc_user_space(sizeof(struct ifconf) + len); ifc.ifc_len = len; ifr = ifc.ifc_req = (void __user *)(uifc + 1); ifr32 = compat_ptr(ifc32.ifcbuf); - for (i = 0; i < ifc32.ifc_len; i += sizeof (struct compat_ifreq)) { + for (i = 0; i < ifc32.ifc_len; i += sizeof(struct compat_ifreq)) { if (copy_in_user(ifr, ifr32, sizeof(struct compat_ifreq))) return -EFAULT; ifr++; @@ -2516,9 +2524,9 @@ static int dev_ifconf(struct net *net, struct compat_ifconf __user *uifc32) ifr = ifc.ifc_req; ifr32 = compat_ptr(ifc32.ifcbuf); for (i = 0, j = 0; - i + sizeof (struct compat_ifreq) <= ifc32.ifc_len && j < ifc.ifc_len; - i += sizeof (struct compat_ifreq), j += sizeof (struct ifreq)) { - if (copy_in_user(ifr32, ifr, sizeof (struct compat_ifreq))) + i + sizeof(struct compat_ifreq) <= ifc32.ifc_len && j < ifc.ifc_len; + i += sizeof(struct compat_ifreq), j += sizeof(struct ifreq)) { + if (copy_in_user(ifr32, ifr, sizeof(struct compat_ifreq))) return -EFAULT; ifr32++; ifr++; @@ -2567,7 +2575,7 @@ static int compat_siocwandev(struct net *net, struct compat_ifreq __user *uifr32 compat_uptr_t uptr32; struct ifreq __user *uifr; - uifr = compat_alloc_user_space(sizeof (*uifr)); + uifr = compat_alloc_user_space(sizeof(*uifr)); if (copy_in_user(uifr, uifr32, sizeof(struct compat_ifreq))) return -EFAULT; @@ -2601,9 +2609,9 @@ static int bond_ioctl(struct net *net, unsigned int cmd, return -EFAULT; old_fs = get_fs(); - set_fs (KERNEL_DS); + set_fs(KERNEL_DS); err = dev_ioctl(net, cmd, &kifr); - set_fs (old_fs); + set_fs(old_fs); return err; case SIOCBONDSLAVEINFOQUERY: @@ -2710,9 +2718,9 @@ static int compat_sioc_ifmap(struct net *net, unsigned int cmd, return -EFAULT; old_fs = get_fs(); - set_fs (KERNEL_DS); + set_fs(KERNEL_DS); err = dev_ioctl(net, cmd, (void __user *)&ifr); - set_fs (old_fs); + set_fs(old_fs); if (cmd == SIOCGIFMAP && !err) { err = copy_to_user(uifr32, &ifr, sizeof(ifr.ifr_name)); @@ -2734,7 +2742,7 @@ static int compat_siocshwtstamp(struct net *net, struct compat_ifreq __user *uif compat_uptr_t uptr32; struct ifreq __user *uifr; - uifr = compat_alloc_user_space(sizeof (*uifr)); + uifr = compat_alloc_user_space(sizeof(*uifr)); if (copy_in_user(uifr, uifr32, sizeof(struct compat_ifreq))) return -EFAULT; @@ -2750,20 +2758,20 @@ static int compat_siocshwtstamp(struct net *net, struct compat_ifreq __user *uif } struct rtentry32 { - u32 rt_pad1; + u32 rt_pad1; struct sockaddr rt_dst; /* target address */ struct sockaddr rt_gateway; /* gateway addr (RTF_GATEWAY) */ struct sockaddr rt_genmask; /* target network mask (IP) */ - unsigned short rt_flags; - short rt_pad2; - u32 rt_pad3; - unsigned char rt_tos; - unsigned char rt_class; - short rt_pad4; - short rt_metric; /* +1 for binary compatibility! */ + unsigned short rt_flags; + short rt_pad2; + u32 rt_pad3; + unsigned char rt_tos; + unsigned char rt_class; + short rt_pad4; + short rt_metric; /* +1 for binary compatibility! */ /* char * */ u32 rt_dev; /* forcing the device at add */ - u32 rt_mtu; /* per route MTU/Window */ - u32 rt_window; /* Window clamping */ + u32 rt_mtu; /* per route MTU/Window */ + u32 rt_window; /* Window clamping */ unsigned short rt_irtt; /* Initial RTT */ }; @@ -2793,29 +2801,29 @@ static int routing_ioctl(struct net *net, struct socket *sock, if (sock && sock->sk && sock->sk->sk_family == AF_INET6) { /* ipv6 */ struct in6_rtmsg32 __user *ur6 = argp; - ret = copy_from_user (&r6.rtmsg_dst, &(ur6->rtmsg_dst), + ret = copy_from_user(&r6.rtmsg_dst, &(ur6->rtmsg_dst), 3 * sizeof(struct in6_addr)); - ret |= __get_user (r6.rtmsg_type, &(ur6->rtmsg_type)); - ret |= __get_user (r6.rtmsg_dst_len, &(ur6->rtmsg_dst_len)); - ret |= __get_user (r6.rtmsg_src_len, &(ur6->rtmsg_src_len)); - ret |= __get_user (r6.rtmsg_metric, &(ur6->rtmsg_metric)); - ret |= __get_user (r6.rtmsg_info, &(ur6->rtmsg_info)); - ret |= __get_user (r6.rtmsg_flags, &(ur6->rtmsg_flags)); - ret |= __get_user (r6.rtmsg_ifindex, &(ur6->rtmsg_ifindex)); + ret |= __get_user(r6.rtmsg_type, &(ur6->rtmsg_type)); + ret |= __get_user(r6.rtmsg_dst_len, &(ur6->rtmsg_dst_len)); + ret |= __get_user(r6.rtmsg_src_len, &(ur6->rtmsg_src_len)); + ret |= __get_user(r6.rtmsg_metric, &(ur6->rtmsg_metric)); + ret |= __get_user(r6.rtmsg_info, &(ur6->rtmsg_info)); + ret |= __get_user(r6.rtmsg_flags, &(ur6->rtmsg_flags)); + ret |= __get_user(r6.rtmsg_ifindex, &(ur6->rtmsg_ifindex)); r = (void *) &r6; } else { /* ipv4 */ struct rtentry32 __user *ur4 = argp; - ret = copy_from_user (&r4.rt_dst, &(ur4->rt_dst), + ret = copy_from_user(&r4.rt_dst, &(ur4->rt_dst), 3 * sizeof(struct sockaddr)); - ret |= __get_user (r4.rt_flags, &(ur4->rt_flags)); - ret |= __get_user (r4.rt_metric, &(ur4->rt_metric)); - ret |= __get_user (r4.rt_mtu, &(ur4->rt_mtu)); - ret |= __get_user (r4.rt_window, &(ur4->rt_window)); - ret |= __get_user (r4.rt_irtt, &(ur4->rt_irtt)); - ret |= __get_user (rtdev, &(ur4->rt_dev)); + ret |= __get_user(r4.rt_flags, &(ur4->rt_flags)); + ret |= __get_user(r4.rt_metric, &(ur4->rt_metric)); + ret |= __get_user(r4.rt_mtu, &(ur4->rt_mtu)); + ret |= __get_user(r4.rt_window, &(ur4->rt_window)); + ret |= __get_user(r4.rt_irtt, &(ur4->rt_irtt)); + ret |= __get_user(rtdev, &(ur4->rt_dev)); if (rtdev) { - ret |= copy_from_user (devname, compat_ptr(rtdev), 15); + ret |= copy_from_user(devname, compat_ptr(rtdev), 15); r4.rt_dev = devname; devname[15] = 0; } else r4.rt_dev = NULL; @@ -2828,9 +2836,9 @@ static int routing_ioctl(struct net *net, struct socket *sock, goto out; } - set_fs (KERNEL_DS); + set_fs(KERNEL_DS); ret = sock_do_ioctl(net, sock, cmd, (unsigned long) r); - set_fs (old_fs); + set_fs(old_fs); out: return ret; @@ -2993,11 +3001,13 @@ int kernel_bind(struct socket *sock, struct sockaddr *addr, int addrlen) { return sock->ops->bind(sock, addr, addrlen); } +EXPORT_SYMBOL(kernel_bind); int kernel_listen(struct socket *sock, int backlog) { return sock->ops->listen(sock, backlog); } +EXPORT_SYMBOL(kernel_listen); int kernel_accept(struct socket *sock, struct socket **newsock, int flags) { @@ -3022,24 +3032,28 @@ int kernel_accept(struct socket *sock, struct socket **newsock, int flags) done: return err; } +EXPORT_SYMBOL(kernel_accept); int kernel_connect(struct socket *sock, struct sockaddr *addr, int addrlen, int flags) { return sock->ops->connect(sock, addr, addrlen, flags); } +EXPORT_SYMBOL(kernel_connect); int kernel_getsockname(struct socket *sock, struct sockaddr *addr, int *addrlen) { return sock->ops->getname(sock, addr, addrlen, 0); } +EXPORT_SYMBOL(kernel_getsockname); int kernel_getpeername(struct socket *sock, struct sockaddr *addr, int *addrlen) { return sock->ops->getname(sock, addr, addrlen, 1); } +EXPORT_SYMBOL(kernel_getpeername); int kernel_getsockopt(struct socket *sock, int level, int optname, char *optval, int *optlen) @@ -3056,6 +3070,7 @@ int kernel_getsockopt(struct socket *sock, int level, int optname, set_fs(oldfs); return err; } +EXPORT_SYMBOL(kernel_getsockopt); int kernel_setsockopt(struct socket *sock, int level, int optname, char *optval, unsigned int optlen) @@ -3072,6 +3087,7 @@ int kernel_setsockopt(struct socket *sock, int level, int optname, set_fs(oldfs); return err; } +EXPORT_SYMBOL(kernel_setsockopt); int kernel_sendpage(struct socket *sock, struct page *page, int offset, size_t size, int flags) @@ -3083,6 +3099,7 @@ int kernel_sendpage(struct socket *sock, struct page *page, int offset, return sock_no_sendpage(sock, page, offset, size, flags); } +EXPORT_SYMBOL(kernel_sendpage); int kernel_sock_ioctl(struct socket *sock, int cmd, unsigned long arg) { @@ -3095,33 +3112,11 @@ int kernel_sock_ioctl(struct socket *sock, int cmd, unsigned long arg) return err; } +EXPORT_SYMBOL(kernel_sock_ioctl); int kernel_sock_shutdown(struct socket *sock, enum sock_shutdown_cmd how) { return sock->ops->shutdown(sock, how); } - -EXPORT_SYMBOL(sock_create); -EXPORT_SYMBOL(sock_create_kern); -EXPORT_SYMBOL(sock_create_lite); -EXPORT_SYMBOL(sock_map_fd); -EXPORT_SYMBOL(sock_recvmsg); -EXPORT_SYMBOL(sock_register); -EXPORT_SYMBOL(sock_release); -EXPORT_SYMBOL(sock_sendmsg); -EXPORT_SYMBOL(sock_unregister); -EXPORT_SYMBOL(sock_wake_async); -EXPORT_SYMBOL(sockfd_lookup); -EXPORT_SYMBOL(kernel_sendmsg); -EXPORT_SYMBOL(kernel_recvmsg); -EXPORT_SYMBOL(kernel_bind); -EXPORT_SYMBOL(kernel_listen); -EXPORT_SYMBOL(kernel_accept); -EXPORT_SYMBOL(kernel_connect); -EXPORT_SYMBOL(kernel_getsockname); -EXPORT_SYMBOL(kernel_getpeername); -EXPORT_SYMBOL(kernel_getsockopt); -EXPORT_SYMBOL(kernel_setsockopt); -EXPORT_SYMBOL(kernel_sendpage); -EXPORT_SYMBOL(kernel_sock_ioctl); EXPORT_SYMBOL(kernel_sock_shutdown); + -- 1.7.0.4
2010-06-03net: use __packed annotationEric Dumazet1-3/+3
cleanup patch. Use new __packed annotation in net/ and include/ (except netfilter) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo1-0/+1
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2009-12-11net: use compat helper functions in compat_sys_recvmmsgHeiko Carstens1-5/+2
Use (get|put)_compat_timespec helper functions to simplify the code. Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-11net: fix compat_sys_recvmmsg parameter typeHeiko Carstens1-7/+5
compat_sys_recvmmsg has a compat_timespec parameter and not a timespec parameter. This way we also get rid of an odd cast. Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-12-02net: compat_sys_recvmmsg user timespec arg can be NULLJean-Mickael Guerin1-2/+6
We must test if user timespec is non-NULL before copying from userpace, same as sys_recvmmsg(). Commiter note: changed it so that we have just one branch. Signed-off-by: Jean-Mickael Guerin <jean-mickael.guerin@6wind.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-29net: Cleanup redundant tests on unsignedroel kluin1-3/+0
optlen is unsigned so the `< 0' test is never true. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-12net: Introduce recvmmsg socket syscallArnaldo Carvalho de Melo1-3/+30
Meaning receive multiple messages, reducing the number of syscalls and net stack entry/exit operations. Next patches will introduce mechanisms where protocols that want to optimize this operation will provide an unlocked_recvmsg operation. This takes into account comments made by: . Paul Moore: sock_recvmsg is called only for the first datagram, sock_recvmsg_nosec is used for the rest. . Caitlin Bestler: recvmmsg now has a struct timespec timeout, that works in the same fashion as the ppoll one. If the underlying protocol returns a datagram with MSG_OOB set, this will make recvmmsg return right away with as many datagrams (+ the OOB one) it has received so far. . Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen datagrams and then recvmsg returns an error, recvmmsg will return the successfully received datagrams, store the error and return it in the next call. This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg, where we will be able to acquire the lock only at batch start and end, not at every underlying recvmsg call. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-30net: Make setsockopt() optlen be unsigned.David S. Miller1-6/+6
This provides safety against negative optlen at the type level instead of depending upon (sometimes non-trivial) checks against this sprinkled all over the the place, in each and every implementation. Based upon work done by Arjan van de Ven and feedback from Linus Torvalds. Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-15net/compat/wext: send different messages to compat tasksJohannes Berg1-2/+15
Wireless extensions have the unfortunate problem that events are multicast netlink messages, and are not independent of pointer size. Thus, currently 32-bit tasks on 64-bit platforms cannot properly receive events and fail with all kinds of strange problems, for instance wpa_supplicant never notices disassociations, due to the way the 64-bit event looks (to a 32-bit process), the fact that the address is all zeroes is lost, it thinks instead it is 00:00:00:00:01:00. The same problem existed with the ioctls, until David Miller fixed those some time ago in an heroic effort. A different problem caused by this is that we cannot send the ASSOCREQIE/ASSOCRESPIE events because sending them causes a 32-bit wpa_supplicant on a 64-bit system to overwrite its internal information, which is worse than it not getting the information at all -- so we currently resort to sending a custom string event that it then parses. This, however, has a severe size limitation we are frequently hitting with modern access points; this limitation would can be lifted after this patch by sending the correct binary, not custom, event. A similar problem apparently happens for some other netlink users on x86_64 with 32-bit tasks due to the alignment for 64-bit quantities. In order to fix these problems, I have implemented a way to send compat messages to tasks. When sending an event, we send the non-compat event data together with a compat event data in skb_shinfo(main_skb)->frag_list. Then, when the event is read from the socket, the netlink code makes sure to pass out only the skb that is compatible with the task. This approach was suggested by David Miller, my original approach required always sending two skbs but that had various small problems. To determine whether compat is needed or not, I have used the MSG_CMSG_COMPAT flag, and adjusted the call path for recv and recvfrom to include it, even if those calls do not have a cmsg parameter. I have not solved one small part of the problem, and I don't think it is necessary to: if a 32-bit application uses read() rather than any form of recvmsg() it will still get the wrong (64-bit) event. However, neither do applications actually do this, nor would it be a regression. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-15net: socket infrastructure for SO_TIMESTAMPINGPatrick Ohly1-7/+12
The overlap with the old SO_TIMESTAMP[NS] options is handled so that time stamping in software (net_enable_timestamp()) is enabled when SO_TIMESTAMP[NS] and/or SO_TIMESTAMPING_RX_SOFTWARE is set. It's disabled if all of these are off. Signed-off-by: Patrick Ohly <patrick.ohly@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-19reintroduce accept4Ulrich Drepper1-45/+5
Introduce a new accept4() system call. The addition of this system call matches analogous changes in 2.6.27 (dup3(), evenfd2(), signalfd4(), inotify_init1(), epoll_create1(), pipe2()) which added new system calls that differed from analogous traditional system calls in adding a flags argument that can be used to access additional functionality. The accept4() system call is exactly the same as accept(), except that it adds a flags bit-mask argument. Two flags are initially implemented. (Most of the new system calls in 2.6.27 also had both of these flags.) SOCK_CLOEXEC causes the close-on-exec (FD_CLOEXEC) flag to be enabled for the new file descriptor returned by accept4(). This is a useful security feature to avoid leaking information in a multithreaded program where one thread is doing an accept() at the same time as another thread is doing a fork() plus exec(). More details here: http://udrepper.livejournal.com/20407.html "Secure File Descriptor Handling", Ulrich Drepper). The other flag is SOCK_NONBLOCK, which causes the O_NONBLOCK flag to be enabled on the new open file description created by accept4(). (This flag is merely a convenience, saving the use of additional calls fcntl(F_GETFL) and fcntl (F_SETFL) to achieve the same result. Here's a test program. Works on x86-32. Should work on x86-64, but I (mtk) don't have a system to hand to test with. It tests accept4() with each of the four possible combinations of SOCK_CLOEXEC and SOCK_NONBLOCK set/clear in 'flags', and verifies that the appropriate flags are set on the file descriptor/open file description returned by accept4(). I tested Ulrich's patch in this thread by applying against 2.6.28-rc2, and it passes according to my test program. /* test_accept4.c Copyright (C) 2008, Linux Foundation, written by Michael Kerrisk <mtk.manpages@gmail.com> Licensed under the GNU GPLv2 or later. */ #define _GNU_SOURCE #include <unistd.h> #include <sys/syscall.h> #include <sys/socket.h> #include <netinet/in.h> #include <stdlib.h> #include <fcntl.h> #include <stdio.h> #include <string.h> #define PORT_NUM 33333 #define die(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0) /**********************************************************************/ /* The following is what we need until glibc gets a wrapper for accept4() */ /* Flags for socket(), socketpair(), accept4() */ #ifndef SOCK_CLOEXEC #define SOCK_CLOEXEC O_CLOEXEC #endif #ifndef SOCK_NONBLOCK #define SOCK_NONBLOCK O_NONBLOCK #endif #ifdef __x86_64__ #define SYS_accept4 288 #elif __i386__ #define USE_SOCKETCALL 1 #define SYS_ACCEPT4 18 #else #error "Sorry -- don't know the syscall # on this architecture" #endif static int accept4(int fd, struct sockaddr *sockaddr, socklen_t *addrlen, int flags) { printf("Calling accept4(): flags = %x", flags); if (flags != 0) { printf(" ("); if (flags & SOCK_CLOEXEC) printf("SOCK_CLOEXEC"); if ((flags & SOCK_CLOEXEC) && (flags & SOCK_NONBLOCK)) printf(" "); if (flags & SOCK_NONBLOCK) printf("SOCK_NONBLOCK"); printf(")"); } printf("\n"); #if USE_SOCKETCALL long args[6]; args[0] = fd; args[1] = (long) sockaddr; args[2] = (long) addrlen; args[3] = flags; return syscall(SYS_socketcall, SYS_ACCEPT4, args); #else return syscall(SYS_accept4, fd, sockaddr, addrlen, flags); #endif } /**********************************************************************/ static int do_test(int lfd, struct sockaddr_in *conn_addr, int closeonexec_flag, int nonblock_flag) { int connfd, acceptfd; int fdf, flf, fdf_pass, flf_pass; struct sockaddr_in claddr; socklen_t addrlen; printf("=======================================\n"); connfd = socket(AF_INET, SOCK_STREAM, 0); if (connfd == -1) die("socket"); if (connect(connfd, (struct sockaddr *) conn_addr, sizeof(struct sockaddr_in)) == -1) die("connect"); addrlen = sizeof(struct sockaddr_in); acceptfd = accept4(lfd, (struct sockaddr *) &claddr, &addrlen, closeonexec_flag | nonblock_flag); if (acceptfd == -1) { perror("accept4()"); close(connfd); return 0; } fdf = fcntl(acceptfd, F_GETFD); if (fdf == -1) die("fcntl:F_GETFD"); fdf_pass = ((fdf & FD_CLOEXEC) != 0) == ((closeonexec_flag & SOCK_CLOEXEC) != 0); printf("Close-on-exec flag is %sset (%s); ", (fdf & FD_CLOEXEC) ? "" : "not ", fdf_pass ? "OK" : "failed"); flf = fcntl(acceptfd, F_GETFL); if (flf == -1) die("fcntl:F_GETFD"); flf_pass = ((flf & O_NONBLOCK) != 0) == ((nonblock_flag & SOCK_NONBLOCK) !=0); printf("nonblock flag is %sset (%s)\n", (flf & O_NONBLOCK) ? "" : "not ", flf_pass ? "OK" : "failed"); close(acceptfd); close(connfd); printf("Test result: %s\n", (fdf_pass && flf_pass) ? "PASS" : "FAIL"); return fdf_pass && flf_pass; } static int create_listening_socket(int port_num) { struct sockaddr_in svaddr; int lfd; int optval; memset(&svaddr, 0, sizeof(struct sockaddr_in)); svaddr.sin_family = AF_INET; svaddr.sin_addr.s_addr = htonl(INADDR_ANY); svaddr.sin_port = htons(port_num); lfd = socket(AF_INET, SOCK_STREAM, 0); if (lfd == -1) die("socket"); optval = 1; if (setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval)) == -1) die("setsockopt"); if (bind(lfd, (struct sockaddr *) &svaddr, sizeof(struct sockaddr_in)) == -1) die("bind"); if (listen(lfd, 5) == -1) die("listen"); return lfd; } int main(int argc, char *argv[]) { struct sockaddr_in conn_addr; int lfd; int port_num; int passed; passed = 1; port_num = (argc > 1) ? atoi(argv[1]) : PORT_NUM; memset(&conn_addr, 0, sizeof(struct sockaddr_in)); conn_addr.sin_family = AF_INET; conn_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); conn_addr.sin_port = htons(port_num); lfd = create_listening_socket(port_num); if (!do_test(lfd, &conn_addr, 0, 0)) passed = 0; if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, 0)) passed = 0; if (!do_test(lfd, &conn_addr, 0, SOCK_NONBLOCK)) passed = 0; if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, SOCK_NONBLOCK)) passed = 0; close(lfd); exit(passed ? EXIT_SUCCESS : EXIT_FAILURE); } [mtk.manpages@gmail.com: rewrote changelog, updated test program] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Tested-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Cc: <linux-api@vger.kernel.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-12net: put_cmsg_compat + SO_TIMESTAMP[NS]: use same name for value as callerPatrick Ohly1-2/+2
In __sock_recv_timestamp() the additional SCM_TIMESTAMP[NS] is used. This has the same value as SO_TIMESTAMP[NS], so this is a purely cosmetic change. Signed-off-by: Patrick Ohly <patrick.ohly@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-24flag parameters: pacceptUlrich Drepper1-4/+48
This patch is by far the most complex in the series. It adds a new syscall paccept. This syscall differs from accept in that it adds (at the userlevel) two additional parameters: - a signal mask - a flags value The flags parameter can be used to set flag like SOCK_CLOEXEC. This is imlpemented here as well. Some people argued that this is a property which should be inherited from the file desriptor for the server but this is against POSIX. Additionally, we really want the signal mask parameter as well (similar to pselect, ppoll, etc). So an interface change in inevitable. The flag value is the same as for socket and socketpair. I think diverging here will only create confusion. Similar to the filesystem interfaces where the use of the O_* constants differs, it is acceptable here. The signal mask is handled as for pselect etc. The mask is temporarily installed for the thread and removed before the call returns. I modeled the code after pselect. If there is a problem it's likely also in pselect. For architectures which use socketcall I maintained this interface instead of adding a system call. The symmetry shouldn't be broken. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <errno.h> #include <fcntl.h> #include <pthread.h> #include <signal.h> #include <stdio.h> #include <unistd.h> #include <netinet/in.h> #include <sys/socket.h> #include <sys/syscall.h> #ifndef __NR_paccept # ifdef __x86_64__ # define __NR_paccept 288 # elif defined __i386__ # define SYS_PACCEPT 18 # define USE_SOCKETCALL 1 # else # error "need __NR_paccept" # endif #endif #ifdef USE_SOCKETCALL # define paccept(fd, addr, addrlen, mask, flags) \ ({ long args[6] = { \ (long) fd, (long) addr, (long) addrlen, (long) mask, 8, (long) flags }; \ syscall (__NR_socketcall, SYS_PACCEPT, args); }) #else # define paccept(fd, addr, addrlen, mask, flags) \ syscall (__NR_paccept, fd, addr, addrlen, mask, 8, flags) #endif #define PORT 57392 #define SOCK_CLOEXEC O_CLOEXEC static pthread_barrier_t b; static void * tf (void *arg) { pthread_barrier_wait (&b); int s = socket (AF_INET, SOCK_STREAM, 0); struct sockaddr_in sin; sin.sin_family = AF_INET; sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK); sin.sin_port = htons (PORT); connect (s, (const struct sockaddr *) &sin, sizeof (sin)); close (s); pthread_barrier_wait (&b); s = socket (AF_INET, SOCK_STREAM, 0); sin.sin_port = htons (PORT); connect (s, (const struct sockaddr *) &sin, sizeof (sin)); close (s); pthread_barrier_wait (&b); pthread_barrier_wait (&b); sleep (2); pthread_kill ((pthread_t) arg, SIGUSR1); return NULL; } static void handler (int s) { } int main (void) { pthread_barrier_init (&b, NULL, 2); struct sockaddr_in sin; pthread_t th; if (pthread_create (&th, NULL, tf, (void *) pthread_self ()) != 0) { puts ("pthread_create failed"); return 1; } int s = socket (AF_INET, SOCK_STREAM, 0); int reuse = 1; setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse)); sin.sin_family = AF_INET; sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK); sin.sin_port = htons (PORT); bind (s, (struct sockaddr *) &sin, sizeof (sin)); listen (s, SOMAXCONN); pthread_barrier_wait (&b); int s2 = paccept (s, NULL, 0, NULL, 0); if (s2 < 0) { puts ("paccept(0) failed"); return 1; } int coe = fcntl (s2, F_GETFD); if (coe & FD_CLOEXEC) { puts ("paccept(0) set close-on-exec-flag"); return 1; } close (s2); pthread_barrier_wait (&b); s2 = paccept (s, NULL, 0, NULL, SOCK_CLOEXEC); if (s2 < 0) { puts ("paccept(SOCK_CLOEXEC) failed"); return 1; } coe = fcntl (s2, F_GETFD); if ((coe & FD_CLOEXEC) == 0) { puts ("paccept(SOCK_CLOEXEC) does not set close-on-exec flag"); return 1; } close (s2); pthread_barrier_wait (&b); struct sigaction sa; sa.sa_handler = handler; sa.sa_flags = 0; sigemptyset (&sa.sa_mask); sigaction (SIGUSR1, &sa, NULL); sigset_t ss; pthread_sigmask (SIG_SETMASK, NULL, &ss); sigaddset (&ss, SIGUSR1); pthread_sigmask (SIG_SETMASK, &ss, NULL); sigdelset (&ss, SIGUSR1); alarm (4); pthread_barrier_wait (&b); errno = 0 ; s2 = paccept (s, NULL, 0, &ss, 0); if (s2 != -1 || errno != EINTR) { puts ("paccept did not fail with EINTR"); return 1; } close (s); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [akpm@linux-foundation.org: make it compile] [akpm@linux-foundation.org: add sys_ni stub] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <linux-arch@vger.kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Roland McGrath <roland@redhat.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-19net: Use standard structures for generic socket address structures.YOSHIFUJI Hideaki1-1/+1
Use sockaddr_storage{} for generic socket address storage and ensures proper alignment. Use sockaddr{} for pointers to omit several casts. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-29net: Add compat support for getsockopt (MCAST_MSFILTER)David L Stevens1-0/+79
This patch adds support for getsockopt for MCAST_MSFILTER for both IPv4 and IPv6. It depends on the previous setsockopt patch, and uses the same method. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-29net: Several cleanups for the setsockopt compat support.David L Stevens1-4/+7
1) added missing "__user" for kgsr and kgf pointers 2) verify read for only GROUP_FILTER_SIZE(0). The group_filter structure definition (via RFC) includes space for one source in the source list array, but that source need not be present. So, sizeof(group_filter) > GROUP_FILTER_SIZE(0). Fixed the user read-check for minimum length to use the smaller size. 3) remove unneeded "&" for gf_slist addresses Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-27ipv4/ipv6 compat: Fix SSM applications on 64bit kernels.David L Stevens1-0/+117
Add support on 64-bit kernels for seting 32-bit compatible MCAST* socket options. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: ip6_tables: add compat supportPatrick McHardy1-106/+0
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: Introduce NF_INET_ hook valuesPatrick McHardy1-3/+3
The IPv4 and IPv6 hook values are identical, yet some code tries to figure out the "correct" value by looking at the address family. Introduce NF_INET_* values for both IPv4 and IPv6. The old values are kept in a #ifndef __KERNEL__ section for userspace compatibility. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-12-20[NET]: Fix function put_cmsg() which may cause usr application memory overflowWei Yongjun1-0/+2
When used function put_cmsg() to copy kernel information to user application memory, if the memory length given by user application is not enough, by the bad length calculate of msg.msg_controllen, put_cmsg() function may cause the msg.msg_controllen to be a large value, such as 0xFFFFFFF0, so the following put_cmsg() can also write data to usr application memory even usr has no valid memory to store this. This may cause usr application memory overflow. int put_cmsg(struct msghdr * msg, int level, int type, int len, void *data) { struct cmsghdr __user *cm = (__force struct cmsghdr __user *)msg->msg_control; struct cmsghdr cmhdr; int cmlen = CMSG_LEN(len); ~~~~~~~~~~~~~~~~~~~~~ int err; if (MSG_CMSG_COMPAT & msg->msg_flags) return put_cmsg_compat(msg, level, type, len, data); if (cm==NULL || msg->msg_controllen < sizeof(*cm)) { msg->msg_flags |= MSG_CTRUNC; return 0; /* XXX: return error? check spec. */ } if (msg->msg_controllen < cmlen) { ~~~~~~~~~~~~~~~~~~~~~~~~ msg->msg_flags |= MSG_CTRUNC; cmlen = msg->msg_controllen; } cmhdr.cmsg_level = level; cmhdr.cmsg_type = type; cmhdr.cmsg_len = cmlen; err = -EFAULT; if (copy_to_user(cm, &cmhdr, sizeof cmhdr)) goto out; if (copy_to_user(CMSG_DATA(cm), data, cmlen - sizeof(struct cmsghdr))) goto out; cmlen = CMSG_SPACE(len); ~~~~~~~~~~~~~~~~~~~~~~~~~~~ If MSG_CTRUNC flags is set, msg->msg_controllen is less than CMSG_SPACE(len), "msg->msg_controllen -= cmlen" will cause unsinged int type msg->msg_controllen to be a large value. ~~~~~~~~~~~~~~~~~~~~~~~~~~~ msg->msg_control += cmlen; msg->msg_controllen -= cmlen; ~~~~~~~~~~~~~~~~~~~~~ err = 0; out: return err; } The same promble exists in put_cmsg_compat(). This patch can fix this problem. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-16O_CLOEXEC for SCM_RIGHTSUlrich Drepper1-1/+2
Part two in the O_CLOEXEC saga: adding support for file descriptors received through Unix domain sockets. The patch is once again pretty minimal, it introduces a new flag for recvmsg and passes it just like the existing MSG_CMSG_COMPAT flag. I think this bit is not used otherwise but the networking people will know better. This new flag is not recognized by recvfrom and recv. These functions cannot be used for that purpose and the asymmetry this introduces is not worse than the already existing MSG_CMSG_COMPAT situations. The patch must be applied on the patch which introduced O_CLOEXEC. It has to remove static from the new get_unused_fd_flags function but since scm.c cannot live in a module the function still hasn't to be exported. Here's a test program to make sure the code works. It's so much longer than the actual patch... #include <errno.h> #include <error.h> #include <fcntl.h> #include <stdio.h> #include <string.h> #include <unistd.h> #include <sys/socket.h> #include <sys/un.h> #ifndef O_CLOEXEC # define O_CLOEXEC 02000000 #endif #ifndef MSG_CMSG_CLOEXEC # define MSG_CMSG_CLOEXEC 0x40000000 #endif int main (int argc, char *argv[]) { if (argc > 1) { int fd = atol (argv[1]); printf ("child: fd = %d\n", fd); if (fcntl (fd, F_GETFD) == 0 || errno != EBADF) { puts ("file descriptor valid in child"); return 1; } return 0; } struct sockaddr_un sun; strcpy (sun.sun_path, "./testsocket"); sun.sun_family = AF_UNIX; char databuf[] = "hello"; struct iovec iov[1]; iov[0].iov_base = databuf; iov[0].iov_len = sizeof (databuf); union { struct cmsghdr hdr; char bytes[CMSG_SPACE (sizeof (int))]; } buf; struct msghdr msg = { .msg_iov = iov, .msg_iovlen = 1, .msg_control = buf.bytes, .msg_controllen = sizeof (buf) }; struct cmsghdr *cmsg = CMSG_FIRSTHDR (&msg); cmsg->cmsg_level = SOL_SOCKET; cmsg->cmsg_type = SCM_RIGHTS; cmsg->cmsg_len = CMSG_LEN (sizeof (int)); msg.msg_controllen = cmsg->cmsg_len; pid_t child = fork (); if (child == -1) error (1, errno, "fork"); if (child == 0) { int sock = socket (PF_UNIX, SOCK_STREAM, 0); if (sock < 0) error (1, errno, "socket"); if (bind (sock, (struct sockaddr *) &sun, sizeof (sun)) < 0) error (1, errno, "bind"); if (listen (sock, SOMAXCONN) < 0) error (1, errno, "listen"); int conn = accept (sock, NULL, NULL); if (conn == -1) error (1, errno, "accept"); *(int *) CMSG_DATA (cmsg) = sock; if (sendmsg (conn, &msg, MSG_NOSIGNAL) < 0) error (1, errno, "sendmsg"); return 0; } /* For a test suite this should be more robust like a barrier in shared memory. */ sleep (1); int sock = socket (PF_UNIX, SOCK_STREAM, 0); if (sock < 0) error (1, errno, "socket"); if (connect (sock, (struct sockaddr *) &sun, sizeof (sun)) < 0) error (1, errno, "connect"); unlink (sun.sun_path); *(int *) CMSG_DATA (cmsg) = -1; if (recvmsg (sock, &msg, MSG_CMSG_CLOEXEC) < 0) error (1, errno, "recvmsg"); int fd = *(int *) CMSG_DATA (cmsg); if (fd == -1) error (1, 0, "no descriptor received"); char fdname[20]; snprintf (fdname, sizeof (fdname), "%d", fd); execl ("/proc/self/exe", argv[0], fdname, NULL); puts ("execl failed"); return 1; } [akpm@linux-foundation.org: Fix fastcall inconsistency noted by Michael Buesch] [akpm@linux-foundation.org: build fix] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Michael Buesch <mb@bu3sch.de> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-25[NET]: Adding SO_TIMESTAMPNS / SCM_TIMESTAMPNS supportEric Dumazet1-1/+9
Now that network timestamps use ktime_t infrastructure, we can add a new SOL_SOCKET sockopt SO_TIMESTAMPNS. This command is similar to SO_TIMESTAMP, but permits transmission of a 'timespec struct' instead of a 'timeval struct' control message. (nanosecond resolution instead of microsecond) Control message is labelled SCM_TIMESTAMPNS instead of SCM_TIMESTAMP A socket cannot mix SO_TIMESTAMP and SO_TIMESTAMPNS : the two modes are mutually exclusive. sock_recv_timestamp() became too big to be fully inlined so I added a __sock_recv_timestamp() helper function. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: linux-arch@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET] core: whitespace cleanupStephen Hemminger1-15/+15
Fix whitespace around keywords. Fix indentation especially of switch statements. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET]: Introduce SIOCGSTAMPNS ioctl to get timestamps with nanosec resolutionEric Dumazet1-0/+24
Now network timestamps use ktime_t infrastructure, we can add a new ioctl() SIOCGSTAMPNS command to get timestamps in 'struct timespec'. User programs can thus access to nanosecond resolution. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET]: convert network timestamps to ktime_tEric Dumazet1-5/+10
We currently use a special structure (struct skb_timeval) and plain 'struct timeval' to store packet timestamps in sk_buffs and struct sock. This has some drawbacks : - Fixed resolution of micro second. - Waste of space on 64bit platforms where sizeof(struct timeval)=16 I suggest using ktime_t that is a nice abstraction of high resolution time services, currently capable of nanosecond resolution. As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits a 8 byte shrink of this structure on 64bit architectures. Some other structures also benefit from this size reduction (struct ipq in ipv4/ip_fragment.c, struct frag_queue in ipv6/reassembly.c, ...) Once this ktime infrastructure adopted, we can more easily provide nanosecond resolution on top of it. (ioctl SIOCGSTAMPNS and/or SO_TIMESTAMPNS/SCM_TIMESTAMPNS) Note : this patch includes a bug correction in compat_sock_get_timestamp() where a "err = 0;" was missing (so this syscall returned -ENOENT instead of 0) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: Stephen Hemminger <shemminger@linux-foundation.org> CC: John find <linux.kernel@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-14[PATCH] remove many unneeded #includes of sched.hTim Schmielau1-1/+0
After Al Viro (finally) succeeded in removing the sched.h #include in module.h recently, it makes sense again to remove other superfluous sched.h includes. There are quite a lot of files which include it but don't actually need anything defined in there. Presumably these includes were once needed for macros that used to live in sched.h, but moved to other header files in the course of cleaning it up. To ease the pain, this time I did not fiddle with any header files and only removed #includes from .c-files, which tend to cause less trouble. Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha, arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig, allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all configs in arch/arm/configs on arm. I also checked that no new warnings were introduced by the patch (actually, some warnings are removed that were emitted by unnecessarily included header files). Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-10[NET]: Fix whitespace errors.YOSHIFUJI Hideaki1-8/+8
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-10-11[NET]: File descriptor loss while receiving SCM_RIGHTSMiklos Szeredi1-2/+1
If more than one file descriptor was sent with an SCM_RIGHTS message, and on the receiving end, after installing a nonzero (but not all) file descritpors the process runs out of fds, then the already installed fds will be lost (userspace will have no way of knowing about them). The following patch makes sure, that at least the already installed fds are sent to userspace. It doesn't solve the issue of losing file descriptors in case of an EFAULT on the userspace buffer. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-04-01[NETFILTER]: iptables 32bit compat layerDmitry Mishin1-2/+1
This patch extends current iptables compatibility layer in order to get 32bit iptables to work on 64bit kernel. Current layer is insufficient due to alignment checks both in kernel and user space tools. Patch is for current net-2.6.17 with addition of move of ipt_entry_{match| target} definitions to xt_entry_{match|target}. Signed-off-by: Dmitry Mishin <dim@openvz.org> Acked-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-21[NET]: socket timestamp 32 bit handler for 64 bit kernelShaun Pereira1-0/+19
Get socket timestamp handler function that does not use the ioctl32_hash_table. Signed-off-by: Shaun Pereira <spereira@tusc.com.au> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-03-20[NET]: {get|set}sockopt compatibility layerDmitry Mishin1-18/+77
This patch extends {get|set}sockopt compatibility layer in order to move protocol specific parts to their place and avoid huge universal net/compat.c file in the future. Signed-off-by: Dmitry Mishin <dim@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-08[PATCH] Fix 32bit sendmsg() flawAl Viro1-18/+26
When we copy 32bit ->msg_control contents to kernel, we walk the same userland data twice without sanity checks on the second pass. Second version of this patch: the original broke with 64-bit arches running 32-bit-compat-mode executables doing sendmsg() syscalls with unaligned CMSG data areas Another thing is that we use kmalloc() to allocate and sock_kfree_s() to free afterwards; less serious, but also needs fixing. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Chris Wright <chrisw@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-08-09[NET]: Fix memory leak in sys_{send,recv}msg() w/compatAndrew Morton1-9/+0
From: Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com> sendmsg()/recvmsg() syscalls from o32/n32 apps to a 64bit kernel will cause a kernel memory leak if iov_len > UIO_FASTIOV for each syscall! This is because both sys_sendmsg() and verify_compat_iovec() kmalloc a new iovec structure. Only the one from sys_sendmsg() is free'ed. I wrote a simple test program to confirm this after identifying the problem: http://davej.org/programs/testsendmsg.c Note that the below fix will break solaris_sendmsg()/solaris_recvmsg() as it also calls verify_compat_iovec() but expects it to malloc internally. [ I fixed that. -DaveM ] Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-16Linux-2.6.12-rc2v2.6.12-rc2Linus Torvalds1-0/+605
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!