aboutsummaryrefslogtreecommitdiffstats
path: root/fs/ocfs2/dlmglue.c
AgeCommit message (Collapse)AuthorFilesLines
2024-02-22ocfs2: spelling fixYongzhen Zhang1-1/+1
Modify reques to request in the comment. Link: https://lkml.kernel.org/r/20240108015604.38377-1-zhangyongzhen@kylinos.cn Signed-off-by: Yongzhen Zhang <zhangyongzhen@kylinos.cn> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18ocfs2: convert to new timestamp accessorsJeff Layton1-15/+14
Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-54-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-07-24ocfs2: convert to ctime accessor functionsJeff Layton1-2/+5
In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Signed-off-by: Jeff Layton <jlayton@kernel.org> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-60-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2022-08-28ocfs2: fix freeing uninitialized resource on ocfs2_dlm_shutdownHeming Zhao1-3/+5
After commit 0737e01de9c4 ("ocfs2: ocfs2_mount_volume does cleanup job before return error"), any procedure after ocfs2_dlm_init() fails will trigger crash when calling ocfs2_dlm_shutdown(). ie: On local mount mode, no dlm resource is initialized. If ocfs2_mount_volume() fails in ocfs2_find_slot(), error handling will call ocfs2_dlm_shutdown(), then does dlm resource cleanup job, which will trigger kernel crash. This solution should bypass uninitialized resources in ocfs2_dlm_shutdown(). Link: https://lkml.kernel.org/r/20220815085754.20417-1-heming.zhao@suse.com Fixes: 0737e01de9c4 ("ocfs2: ocfs2_mount_volume does cleanup job before return error") Signed-off-by: Heming Zhao <heming.zhao@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2021-09-24ocfs2: drop acl cache for directories tooWengang Wang1-1/+2
ocfs2_data_convert_worker() is currently dropping any cached acl info for FILE before down-converting meta lock. It should also drop for DIRECTORY. Otherwise the second acl lookup returns the cached one (from VFS layer) which could be already stale. The problem we are seeing is that the acl changes on one node doesn't get refreshed on other nodes in the following case: Node 1 Node 2 -------------- ---------------- getfacl dir1 getfacl dir1 <-- this is OK setfacl -m u:user1:rwX dir1 getfacl dir1 <-- see the change for user1 getfacl dir1 <-- can't see change for user1 Link: https://lkml.kernel.org/r/20210903012631.6099-1-wen.gang.wang@oracle.com Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-03ocfs2: ocfs2_downconvert_lock failure results in deadlockGang He1-0/+12
Usually, ocfs2_downconvert_lock() function always downconverts dlm lock to the expected level for satisfy dlm bast requests from the other nodes. But there is a rare situation. When dlm lock conversion is being canceled, ocfs2_downconvert_lock() function will return -EBUSY. You need to be aware that ocfs2_cancel_convert() function is asynchronous in fsdlm implementation. If we does not requeue this lockres entry, ocfs2 downconvert thread no longer handles this dlm lock bast request. Then, the other nodes will not get the dlm lock again, the current node's process will be blocked when acquire this dlm lock again. Link: https://lkml.kernel.org/r/20210830044621.12544-1-ghe@suse.com Signed-off-by: Gang He <ghe@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-03ocfs2: remove an unnecessary conditionDan Carpenter1-1/+1
The case where "tmp_oh" is NULL is handled at the start of the function. At this point we know it's non-NULL so this will always return 1. Link: https://lkml.kernel.org/r/YOcItgIXtisi3MaO@mwanda Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: Larry Chen <lchen@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-07treewide: remove editor modelines and cruftMasahiro Yamada1-3/+1
The section "19) Editor modelines and other cruft" in Documentation/process/coding-style.rst clearly says, "Do not include any of these in source files." I recently receive a patch to explicitly add a new one. Let's do treewide cleanups, otherwise some people follow the existing code and attempt to upstream their favoriate editor setups. It is even nicer if scripts/checkpatch.pl can check it. If we like to impose coding style in an editor-independent manner, I think editorconfig (patch [1]) is a saner solution. [1] https://lore.kernel.org/lkml/20200703073143.423557-1-danny@kdrag0n.dev/ Link: https://lkml.kernel.org/r/20210324054457.1477489-1-masahiroy@kernel.org Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> [auxdisplay] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-12ocfs2_inode_lock_update(): make sure we don't change the type bits of i_modeAl Viro1-2/+10
... even if another node has gone insane. Fail with -ESTALE if it detects an attempt to change the type bits. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-08-07ocfs2: fix unbalanced lockingPavel Machek1-1/+7
Based on what fails, function can return with nfs_sync_rwlock either locked or unlocked. That can not be right. Always return with lock unlocked on error. Fixes: 4cd9973f9ff6 ("ocfs2: avoid inode removal while nfsd is accessing it") Signed-off-by: Pavel Machek (CIP) <pavel@denx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/20200724124443.GA28164@duo.ucw.cz Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-26ocfs2: avoid inode removal while nfsd is accessing itJunxiao Bi1-1/+16
Patch series "ocfs2: fix nfsd over ocfs2 issues", v2. This is a series of patches to fix issues on nfsd over ocfs2. patch 1 is to avoid inode removed while nfsd access it patch 2 & 3 is to fix a panic issue. This patch (of 4): When nfsd is getting file dentry using handle or parent dentry of some dentry, one cluster lock is used to avoid inode removed from other node, but it still could be removed from local node, so use a rw lock to avoid this. Link: http://lkml.kernel.org/r/20200616183829.87211-1-junxiao.bi@oracle.com Link: http://lkml.kernel.org/r/20200616183829.87211-2-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02ocfs2: use OCFS2_SEC_BITS in macroAlex Shi1-1/+1
This macro should be used. Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/1579577840-251956-1-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31ocfs2: remove unneeded semicolonszhengbin1-1/+1
Fixes coccicheck warnings: fs/ocfs2/cluster/quorum.c:76:2-3: Unneeded semicolon fs/ocfs2/dlmglue.c:573:2-3: Unneeded semicolon Link: http://lkml.kernel.org/r/6ee3aa16-9078-30b1-df3f-22064950bd98@linux.alibaba.com Signed-off-by: zhengbin <zhengbin13@huawei.com> Reported-by: Hulk Robot <hulkci@huawei.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-04ocfs2: fix the crash due to call ocfs2_get_dlm_debug once lessGang He1-0/+1
Because ocfs2_get_dlm_debug() function is called once less here, ocfs2 file system will trigger the system crash, usually after ocfs2 file system is unmounted. This system crash is caused by a generic memory corruption, these crash backtraces are not always the same, for exapmle, ocfs2: Unmounting device (253,16) on (node 172167785) general protection fault: 0000 [#1] SMP PTI CPU: 3 PID: 14107 Comm: fence_legacy Kdump: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:__kmalloc+0xa5/0x2a0 Code: 00 00 4d 8b 07 65 4d 8b RSP: 0018:ffffaa1fc094bbe8 EFLAGS: 00010286 RAX: 0000000000000000 RBX: d310a8800d7a3faf RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000dc0 RDI: ffff96e68fc036c0 RBP: d310a8800d7a3faf R08: ffff96e6ffdb10a0 R09: 00000000752e7079 R10: 000000000001c513 R11: 0000000004091041 R12: 0000000000000dc0 R13: 0000000000000039 R14: ffff96e68fc036c0 R15: ffff96e68fc036c0 FS: 00007f699dfba540(0000) GS:ffff96e6ffd80000(0000) knlGS:00000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055f3a9d9b768 CR3: 000000002cd1c000 CR4: 00000000000006e0 Call Trace: ext4_htree_store_dirent+0x35/0x100 [ext4] htree_dirblock_to_tree+0xea/0x290 [ext4] ext4_htree_fill_tree+0x1c1/0x2d0 [ext4] ext4_readdir+0x67c/0x9d0 [ext4] iterate_dir+0x8d/0x1a0 __x64_sys_getdents+0xab/0x130 do_syscall_64+0x60/0x1f0 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f699d33a9fb This regression problem was introduced by commit e581595ea29c ("ocfs: no need to check return value of debugfs_create functions"). Link: http://lkml.kernel.org/r/20191225061501.13587-1-ghe@suse.com Fixes: e581595ea29c ("ocfs: no need to check return value of debugfs_create functions") Signed-off-by: Gang He <ghe@suse.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> [5.3+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-09locking/lockdep: Remove unused @nested argument from lock_release()Qian Cai1-1/+1
Since the following commit: b4adfe8e05f1 ("locking/lockdep: Remove unused argument in __lock_release") @nested is no longer used in lock_release(), so remove it from all lock_release() calls and friends. Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: airlied@linux.ie Cc: akpm@linux-foundation.org Cc: alexander.levin@microsoft.com Cc: daniel@iogearbox.net Cc: davem@davemloft.net Cc: dri-devel@lists.freedesktop.org Cc: duyuyang@gmail.com Cc: gregkh@linuxfoundation.org Cc: hannes@cmpxchg.org Cc: intel-gfx@lists.freedesktop.org Cc: jack@suse.com Cc: jlbec@evilplan.or Cc: joonas.lahtinen@linux.intel.com Cc: joseph.qi@linux.alibaba.com Cc: jslaby@suse.com Cc: juri.lelli@redhat.com Cc: maarten.lankhorst@linux.intel.com Cc: mark@fasheh.com Cc: mhocko@kernel.org Cc: mripard@kernel.org Cc: ocfs2-devel@oss.oracle.com Cc: rodrigo.vivi@intel.com Cc: sean@poorly.run Cc: st@kernel.org Cc: tj@kernel.org Cc: tytso@mit.edu Cc: vdavydov.dev@gmail.com Cc: vincent.guittot@linaro.org Cc: viro@zeniv.linux.org.uk Link: https://lkml.kernel.org/r/1568909380-32199-1-git-send-email-cai@lca.pw Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-24ocfs2: delete unnecessary checks before brelse()Markus Elfring1-5/+2
brelse() tests whether its argument is NULL and then returns immediately. Thus the tests around the shown calls are not needed. This issue was detected by using the Coccinelle software. Link: http://lkml.kernel.org/r/55cde320-394b-f985-56ce-1a2abea782aa@web.de Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24ocfs2: further debugfs cleanupsGreg Kroah-Hartman1-15/+5
There is no need to check return value of debugfs_create functions, but the last sweep through ocfs missed a number of places where this was happening. There is also no need to save the individual dentries for the debugfs files, as everything is can just be removed at once when the directory is removed. By getting rid of the file dentries for the debugfs entries, a bit of local memory can be saved as well. [colin.king@canonical.com: ensure ret is set to zero before returning] Link: http://lkml.kernel.org/r/20190807121929.28918-1-colin.king@canonical.com Link: http://lkml.kernel.org/r/20190731132119.GA12603@kroah.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jia Guo <guojia12@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-12fs/ocfs2/dlmglue.c: unneeded variable: "status"Hariprasad Kelam1-2/+1
fix below issue reported by coccicheck fs/ocfs2/dlmglue.c:4410:5-11: Unneeded variable: "status". Return "0" on line 4428 We can not change return type of ocfs2_downconvert_thread as its registered as callback of kthread_create. Link: http://lkml.kernel.org/r/20190702183237.GA13975@hari-Inspiron-1545 Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-12ocfs: no need to check return value of debugfs_create functionsGreg Kroah-Hartman1-23/+2
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Also, because there is no need to save the file dentry, remove all of the variables that were being saved, and just recursively delete the whole directory when shutting down, saving a lot of logic and local variables. [gregkh@linuxfoundation.org: v2] Link: http://lkml.kernel.org/r/20190613055455.GE19717@kroah.com Link: http://lkml.kernel.org/r/20190612152912.GA19151@kroah.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Jia Guo <guojia12@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-12ocfs2: add first lock wait time in locking_stateGang He1-3/+29
ocfs2 file system uses locking_state file under debugfs to dump each ocfs2 file system's dlm lock resources, but the users ever encountered some hang(deadlock) problems in ocfs2 file system. I'd like to add first lock wait time in locking_state file, which can help the upper scripts detect these deadlock problems via comparing the first lock wait time with the current time. Link: http://lkml.kernel.org/r/20190611015414.27754-3-ghe@suse.com Signed-off-by: Gang He <ghe@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-12ocfs2: add locking filter debugfs fileGang He1-0/+38
Add locking filter debugfs file, which is used to filter lock resources dump from locking_state debugfs file. We use d_filter_secs field to filter lock resources dump, the default d_filter_secs(0) value filters nothing, otherwise, only dump the last N seconds active lock resources. This enhancement can avoid dumping lots of old records. The d_filter_secs value can be changed via locking_filter file. [akpm@linux-foundation.org: fix undefined reference to `__udivdi3'] Link: http://lkml.kernel.org/r/20190611015414.27754-2-ghe@suse.com Signed-off-by: Gang He <ghe@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> [build-tested] Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-12ocfs2: add last unlock times in locking_stateGang He1-3/+15
ocfs2 file system uses locking_state file under debugfs to dump each ocfs2 file system's dlm lock resources, but the dlm lock resources in memory are becoming more and more after the files were touched by the user. it will become a bit difficult to analyze these dlm lock resource records in locking_state file by the upper scripts, though some files are not active for now, which were accessed long time ago. Then, I'd like to add last pr/ex unlock times in locking_state file for each dlm lock resource record, the the upper scripts can use last unlock time to filter inactive dlm lock resource record. Link: http://lkml.kernel.org/r/20190611015414.27754-1-ghe@suse.com Signed-off-by: Gang He <ghe@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 145Thomas Gleixner1-15/+1
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 59 temple place suite 330 boston ma 021110 1307 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 84 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190524100844.756442981@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-03-05ocfs2: fix the application IO timeout when fstrim is runningGang He1-0/+5
The user reported this problem, the upper application IO was timeout when fstrim was running on this ocfs2 partition. the application monitoring resource agent considered that this application did not work, then this node was fenced by the cluster brain (e.g. pacemaker). The root cause is that fstrim thread always holds main_bm meta-file related locks until all the cluster groups are trimmed. This patch will make fstrim thread release main_bm meta-file related locks when each cluster group is trimmed, this will let the current application IO has a chance to claim the clusters from main_bm meta-file. Link: http://lkml.kernel.org/r/20190111090014.31645-1-ghe@suse.com Signed-off-by: Gang He <ghe@suse.com> Reviewed-by: Changwei Ge <ge.changwei@h3c.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03ocfs2: dlmglue: clean up timestamp handlingArnd Bergmann1-17/+9
The handling of timestamps outside of the 1970..2038 range in the dlm glue is rather inconsistent: on 32-bit architectures, this has always wrapped around to negative timestamps in the 1902..1969 range, while on 64-bit kernels all timestamps are interpreted as positive 34 bit numbers in the 1970..2514 year range. Now that the VFS code handles 64-bit timestamps on all architectures, we can make the behavior more consistent here, and return the same result that we had on 64-bit already, making the file system y2038 safe in the process. Outside of dlmglue, it already uses 64-bit on-disk timestamps anway, so that part is fine. For consistency, I'm changing ocfs2_pack_timespec() to clamp anything outside of the supported range to the minimum and maximum values. This avoids a possible ambiguity of values before 1970 in particular, which used to be interpreted as times at the end of the 2514 range previously. Link: http://lkml.kernel.org/r/20180619155826.4106487-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-03ocfs2: remove ocfs2_is_o2cb_active()Gang He1-1/+1
Remove ocfs2_is_o2cb_active(). We have similar functions to identify which cluster stack is being used via osb->osb_cluster_stack. Secondly, the current implementation of ocfs2_is_o2cb_active() is not totally safe. Based on the design of stackglue, we need to get ocfs2_stack_lock before using ocfs2_stack related data structures, and that active_stack pointer can be NULL in the case of mount failure. Link: http://lkml.kernel.org/r/1495441079-11708-1-git-send-email-ghe@suse.com Signed-off-by: Gang He <ghe@suse.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Reviewed-by: Eric Ren <zren@suse.com> Acked-by: Changwei Ge <ge.changwei@h3c.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-13ocfs2: fix a GCC warningzhong jiang1-0/+2
Fix the following compile warning: fs/ocfs2/dlmglue.c:99:30: warning: ‘lockdep_keys’ defined but not used [-Wunused-variable] static struct lock_class_key lockdep_keys[OCFS2_NUM_LOCK_TYPES]; Link: http://lkml.kernel.org/r/1536938148-32110-1-git-send-email-zhongjiang@huawei.com Signed-off-by: zhong jiang <zhongjiang@huawei.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-08-17ocfs2: make several functions and variables static (and some const)Colin Ian King1-1/+1
There are a variety of functions and variables that are local to the source and do not need to be in global scope, so make them static. Also make a couple of char arrays static const. Cleans up sparse warnings: symbol 'o2hb_heartbeat_mode_desc' was not declared. Should it be static? symbol 'o2hb_heartbeat_mode' was not declared. Should it be static? symbol 'o2hb_dependent_users' was not declared. Should it be static? symbol 'o2hb_region_dec_user' was not declared. Should it be static? symbol 'o2nm_fence_method_desc' was not declared. Should it be static? symbol 'lockdep_keys' was not declared. Should it be static? Link: http://lkml.kernel.org/r/20180628131659.12133-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-15Merge tag 'vfs-timespec64' of ↵Linus Torvalds1-6/+14
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground Pull inode timestamps conversion to timespec64 from Arnd Bergmann: "This is a late set of changes from Deepa Dinamani doing an automated treewide conversion of the inode and iattr structures from 'timespec' to 'timespec64', to push the conversion from the VFS layer into the individual file systems. As Deepa writes: 'The series aims to switch vfs timestamps to use struct timespec64. Currently vfs uses struct timespec, which is not y2038 safe. The series involves the following: 1. Add vfs helper functions for supporting struct timepec64 timestamps. 2. Cast prints of vfs timestamps to avoid warnings after the switch. 3. Simplify code using vfs timestamps so that the actual replacement becomes easy. 4. Convert vfs timestamps to use struct timespec64 using a script. This is a flag day patch. Next steps: 1. Convert APIs that can handle timespec64, instead of converting timestamps at the boundaries. 2. Update internal data structures to avoid timestamp conversions' Thomas Gleixner adds: 'I think there is no point to drag that out for the next merge window. The whole thing needs to be done in one go for the core changes which means that you're going to play that catchup game forever. Let's get over with it towards the end of the merge window'" * tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground: pstore: Remove bogus format string definition vfs: change inode times to use struct timespec64 pstore: Convert internal records to timespec64 udf: Simplify calls to udf_disk_stamp_to_time fs: nfs: get rid of memcpys for inode times ceph: make inode time prints to be long long lustre: Use long long type to print inode time fs: add timespec64_truncate()
2018-06-07ocfs2: ocfs2_inode_lock_tracker does not distinguish lock levelLarry Chen1-30/+89
ocfs2_inode_lock_tracker as a variant of ocfs2_inode_lock, is used to prevent deadlock due to recursive lock acquisition. But this function does not distinguish whether the requested level is EX or PR. If a RP lock has been attained, this function will immediately return success afterwards even an EX lock is requested. But actually the return value does not mean that the process got a EX lock, because ocfs2_inode_lock has not been called. When taking lock levels into account, we face some different situations: 1. no lock is held In this case, just lock the inode and return 0 2. We are holding a lock For this situation, things diverges into several cases wanted holding what to do ex ex see 2.1 below ex pr see 2.2 below pr ex see 2.1 below pr pr see 2.1 below 2.1 lock level that is been held is compatible with the wanted level, so no lock action will be tacken. 2.2 Otherwise, an upgrade is needed, but it is forbidden. Reason why upgrade within a process is forbidden is that lock upgrade may cause dead lock. The following illustrate how it happens. process 1 process 2 ocfs2_inode_lock_tracker(ex=0) <====== ocfs2_inode_lock_tracker(ex=1) ocfs2_inode_lock_tracker(ex=1) For the status quo of ocfs2, without this patch, neither a bug nor end-user impact will be caused because the wrong logic is avoided. But I'm afraid this generic interface, may be called by other developers in future and used in this situation. a process ocfs2_inode_lock_tracker(ex=0) ocfs2_inode_lock_tracker(ex=1) Link: http://lkml.kernel.org/r/20180510053230.17217-1-lchen@suse.com Signed-off-by: Larry Chen <lchen@suse.com> Reviewed-by: Gang He <ghe@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-05vfs: change inode times to use struct timespec64Deepa Dinamani1-6/+14
struct timespec is not y2038 safe. Transition vfs to use y2038 safe struct timespec64 instead. The change was made with the help of the following cocinelle script. This catches about 80% of the changes. All the header file and logic changes are included in the first 5 rules. The rest are trivial substitutions. I avoid changing any of the function signatures or any other filesystem specific data structures to keep the patch simple for review. The script can be a little shorter by combining different cases. But, this version was sufficient for my usecase. virtual patch @ depends on patch @ identifier now; @@ - struct timespec + struct timespec64 current_time ( ... ) { - struct timespec now = current_kernel_time(); + struct timespec64 now = current_kernel_time64(); ... - return timespec_trunc( + return timespec64_trunc( ... ); } @ depends on patch @ identifier xtime; @@ struct \( iattr \| inode \| kstat \) { ... - struct timespec xtime; + struct timespec64 xtime; ... } @ depends on patch @ identifier t; @@ struct inode_operations { ... int (*update_time) (..., - struct timespec t, + struct timespec64 t, ...); ... } @ depends on patch @ identifier t; identifier fn_update_time =~ "update_time$"; @@ fn_update_time (..., - struct timespec *t, + struct timespec64 *t, ...) { ... } @ depends on patch @ identifier t; @@ lease_get_mtime( ... , - struct timespec *t + struct timespec64 *t ) { ... } @te depends on patch forall@ identifier ts; local idexpression struct inode *inode_node; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; identifier fn_update_time =~ "update_time$"; identifier fn; expression e, E3; local idexpression struct inode *node1; local idexpression struct inode *node2; local idexpression struct iattr *attr1; local idexpression struct iattr *attr2; local idexpression struct iattr attr; identifier i_xtime1 =~ "^i_[acm]time$"; identifier i_xtime2 =~ "^i_[acm]time$"; identifier ia_xtime1 =~ "^ia_[acm]time$"; identifier ia_xtime2 =~ "^ia_[acm]time$"; @@ ( ( - struct timespec ts; + struct timespec64 ts; | - struct timespec ts = current_time(inode_node); + struct timespec64 ts = current_time(inode_node); ) <+... when != ts ( - timespec_equal(&inode_node->i_xtime, &ts) + timespec64_equal(&inode_node->i_xtime, &ts) | - timespec_equal(&ts, &inode_node->i_xtime) + timespec64_equal(&ts, &inode_node->i_xtime) | - timespec_compare(&inode_node->i_xtime, &ts) + timespec64_compare(&inode_node->i_xtime, &ts) | - timespec_compare(&ts, &inode_node->i_xtime) + timespec64_compare(&ts, &inode_node->i_xtime) | ts = current_time(e) | fn_update_time(..., &ts,...) | inode_node->i_xtime = ts | node1->i_xtime = ts | ts = inode_node->i_xtime | <+... attr1->ia_xtime ...+> = ts | ts = attr1->ia_xtime | ts.tv_sec | ts.tv_nsec | btrfs_set_stack_timespec_sec(..., ts.tv_sec) | btrfs_set_stack_timespec_nsec(..., ts.tv_nsec) | - ts = timespec64_to_timespec( + ts = ... -) | - ts = ktime_to_timespec( + ts = ktime_to_timespec64( ...) | - ts = E3 + ts = timespec_to_timespec64(E3) | - ktime_get_real_ts(&ts) + ktime_get_real_ts64(&ts) | fn(..., - ts + timespec64_to_timespec(ts) ,...) ) ...+> ( <... when != ts - return ts; + return timespec64_to_timespec(ts); ...> ) | - timespec_equal(&node1->i_xtime1, &node2->i_xtime2) + timespec64_equal(&node1->i_xtime2, &node2->i_xtime2) | - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2) + timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2) | - timespec_compare(&node1->i_xtime1, &node2->i_xtime2) + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2) | node1->i_xtime1 = - timespec_trunc(attr1->ia_xtime1, + timespec64_trunc(attr1->ia_xtime1, ...) | - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2, + attr1->ia_xtime1 = timespec64_trunc(attr2->ia_xtime2, ...) | - ktime_get_real_ts(&attr1->ia_xtime1) + ktime_get_real_ts64(&attr1->ia_xtime1) | - ktime_get_real_ts(&attr.ia_xtime1) + ktime_get_real_ts64(&attr.ia_xtime1) ) @ depends on patch @ struct inode *node; struct iattr *attr; identifier fn; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; expression e; @@ ( - fn(node->i_xtime); + fn(timespec64_to_timespec(node->i_xtime)); | fn(..., - node->i_xtime); + timespec64_to_timespec(node->i_xtime)); | - e = fn(attr->ia_xtime); + e = fn(timespec64_to_timespec(attr->ia_xtime)); ) @ depends on patch forall @ struct inode *node; struct iattr *attr; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; identifier fn; @@ { + struct timespec ts; <+... ( + ts = timespec64_to_timespec(node->i_xtime); fn (..., - &node->i_xtime, + &ts, ...); | + ts = timespec64_to_timespec(attr->ia_xtime); fn (..., - &attr->ia_xtime, + &ts, ...); ) ...+> } @ depends on patch forall @ struct inode *node; struct iattr *attr; struct kstat *stat; identifier ia_xtime =~ "^ia_[acm]time$"; identifier i_xtime =~ "^i_[acm]time$"; identifier xtime =~ "^[acm]time$"; identifier fn, ret; @@ { + struct timespec ts; <+... ( + ts = timespec64_to_timespec(node->i_xtime); ret = fn (..., - &node->i_xtime, + &ts, ...); | + ts = timespec64_to_timespec(node->i_xtime); ret = fn (..., - &node->i_xtime); + &ts); | + ts = timespec64_to_timespec(attr->ia_xtime); ret = fn (..., - &attr->ia_xtime, + &ts, ...); | + ts = timespec64_to_timespec(attr->ia_xtime); ret = fn (..., - &attr->ia_xtime); + &ts); | + ts = timespec64_to_timespec(stat->xtime); ret = fn (..., - &stat->xtime); + &ts); ) ...+> } @ depends on patch @ struct inode *node; struct inode *node2; identifier i_xtime1 =~ "^i_[acm]time$"; identifier i_xtime2 =~ "^i_[acm]time$"; identifier i_xtime3 =~ "^i_[acm]time$"; struct iattr *attrp; struct iattr *attrp2; struct iattr attr ; identifier ia_xtime1 =~ "^ia_[acm]time$"; identifier ia_xtime2 =~ "^ia_[acm]time$"; struct kstat *stat; struct kstat stat1; struct timespec64 ts; identifier xtime =~ "^[acmb]time$"; expression e; @@ ( ( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1 ; | node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \); | node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \); | node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \); | stat->xtime = node2->i_xtime1; | stat1.xtime = node2->i_xtime1; | ( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1 ; | ( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2; | - e = node->i_xtime1; + e = timespec64_to_timespec( node->i_xtime1 ); | - e = attrp->ia_xtime1; + e = timespec64_to_timespec( attrp->ia_xtime1 ); | node->i_xtime1 = current_time(...); | node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = - e; + timespec_to_timespec64(e); | node->i_xtime1 = node->i_xtime3 = - e; + timespec_to_timespec64(e); | - node->i_xtime1 = e; + node->i_xtime1 = timespec_to_timespec64(e); ) Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: <anton@tuxera.com> Cc: <balbi@kernel.org> Cc: <bfields@fieldses.org> Cc: <darrick.wong@oracle.com> Cc: <dhowells@redhat.com> Cc: <dsterba@suse.com> Cc: <dwmw2@infradead.org> Cc: <hch@lst.de> Cc: <hirofumi@mail.parknet.co.jp> Cc: <hubcap@omnibond.com> Cc: <jack@suse.com> Cc: <jaegeuk@kernel.org> Cc: <jaharkes@cs.cmu.edu> Cc: <jslaby@suse.com> Cc: <keescook@chromium.org> Cc: <mark@fasheh.com> Cc: <miklos@szeredi.hu> Cc: <nico@linaro.org> Cc: <reiserfs-devel@vger.kernel.org> Cc: <richard@nod.at> Cc: <sage@redhat.com> Cc: <sfrench@samba.org> Cc: <swhiteho@redhat.com> Cc: <tj@kernel.org> Cc: <trond.myklebust@primarydata.com> Cc: <tytso@mit.edu> Cc: <viro@zeniv.linux.org.uk>
2018-04-06Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-13/+8
Merge updates from Andrew Morton: - a few misc things - ocfs2 updates - the v9fs maintainers have been missing for a long time. I've taken over v9fs patch slinging. - most of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (116 commits) mm,oom_reaper: check for MMF_OOM_SKIP before complaining mm/ksm: fix interaction with THP mm/memblock.c: cast constant ULLONG_MAX to phys_addr_t headers: untangle kmemleak.h from mm.h include/linux/mmdebug.h: make VM_WARN* non-rvals mm/page_isolation.c: make start_isolate_page_range() fail if already isolated mm: change return type to vm_fault_t mm, oom: remove 3% bonus for CAP_SYS_ADMIN processes mm, page_alloc: wakeup kcompactd even if kswapd cannot free more memory kernel/fork.c: detect early free of a live mm mm: make counting of list_lru_one::nr_items lockless mm/swap_state.c: make bool enable_vma_readahead and swap_vma_readahead() static block_invalidatepage(): only release page if the full page was invalidated mm: kernel-doc: add missing parameter descriptions mm/swap.c: remove @cold parameter description for release_pages() mm/nommu: remove description of alloc_vm_area zram: drop max_zpage_size and use zs_huge_class_size() zsmalloc: introduce zs_huge_class_size() mm: fix races between swapoff and flush dcache fs/direct-io.c: minor cleanups in do_blockdev_direct_IO ...
2018-04-05ocfs2: use 'osb' instead of 'OCFS2_SB()'piaojun1-13/+8
We could use 'osb' instead of 'OCFS2_SB()' to make code more elegant. Link: http://lkml.kernel.org/r/5A702111.7090907@huawei.com Signed-off-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-03-19vfs: Replace stray non-ASCII homoglyph characters with their ASCII equivalentsIngo Molnar1-1/+1
Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-31ocfs2: nowait aio supportGang He1-5/+15
Return EAGAIN if any of the following checks fail for direct I/O: - Cannot get the related locks immediately - Blocks are not allocated at the write location, it will trigger block allocation and block IO operations. [ghe@suse.com: v4] Link: http://lkml.kernel.org/r/1516007283-29932-4-git-send-email-ghe@suse.com [ghe@suse.com: v2] Link: http://lkml.kernel.org/r/1511944612-9629-4-git-send-email-ghe@suse.com Link: http://lkml.kernel.org/r/1511775987-841-4-git-send-email-ghe@suse.com Signed-off-by: Gang He <ghe@suse.com> Reviewed-by: Alex Chen <alex.chen@huawei.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31ocfs2: add ocfs2_try_rw_lock() and ocfs2_try_inode_lock()Gang He1-0/+21
Patch series "ocfs2: add nowait aio support", v4. VFS layer has introduced the non-blocking aio flag IOCB_NOWAIT, which tells the kernel to bail out if an AIO request will block for reasons such as file allocations, or writeback triggering, or would block while allocating requests while performing direct I/O. Subsequently, pwritev2/preadv2 also can leverage this part of kernel code. So far, ext4/xfs/btrfs have supported this feature. Add the related code for the ocfs2 file system. This patch (of 3): Add ocfs2_try_rw_lock and ocfs2_try_inode_lock functions, which will be used in non-blocking IO scenarios. [ghe@suse.com: v2] Link: http://lkml.kernel.org/r/1511944612-9629-2-git-send-email-ghe@suse.com Link: http://lkml.kernel.org/r/1511775987-841-2-git-send-email-ghe@suse.com Signed-off-by: Gang He <ghe@suse.com> Reviewed-by: Jun Piao <piaojun@huawei.com> Acked-by: alex chen <alex.chen@huawei.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31ocfs2: add trimfs dlm lock resourceGang He1-0/+86
Introduce a new dlm lock resource, which will be used to communicate during fstrimming of an ocfs2 device from cluster nodes. Link: http://lkml.kernel.org/r/1513228484-2084-1-git-send-email-ghe@suse.com Signed-off-by: Gang He <ghe@suse.com> Reviewed-by: Changwei Ge <ge.changwei@h3c.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31ocfs2: try a blocking lock before return AOP_TRUNCATED_PAGEGang He1-0/+9
If we can't get inode lock immediately in the function ocfs2_inode_lock_with_page() when reading a page, we should not return directly here, since this will lead to a softlockup problem when the kernel is configured with CONFIG_PREEMPT is not set. The method is to get a blocking lock and immediately unlock before returning, this can avoid CPU resource waste due to lots of retries, and benefits fairness in getting lock among multiple nodes, increase efficiency in case modifying the same file frequently from multiple nodes. The softlockup crash (when set /proc/sys/kernel/softlockup_panic to 1) looks like: Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 885 Comm: multi_mmap Tainted: G L 4.12.14-6.1-default #1 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Call Trace: <IRQ> dump_stack+0x5c/0x82 panic+0xd5/0x21e watchdog_timer_fn+0x208/0x210 __hrtimer_run_queues+0xcc/0x200 hrtimer_interrupt+0xa6/0x1f0 smp_apic_timer_interrupt+0x34/0x50 apic_timer_interrupt+0x96/0xa0 </IRQ> RIP: 0010:unlock_page+0x17/0x30 RSP: 0000:ffffaf154080bc88 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10 RAX: dead000000000100 RBX: fffff21e009f5300 RCX: 0000000000000004 RDX: dead0000000000ff RSI: 0000000000000202 RDI: fffff21e009f5300 RBP: 0000000000000000 R08: 0000000000000000 R09: ffffaf154080bb00 R10: ffffaf154080bc30 R11: 0000000000000040 R12: ffff993749a39518 R13: 0000000000000000 R14: fffff21e009f5300 R15: fffff21e009f5300 ocfs2_inode_lock_with_page+0x25/0x30 [ocfs2] ocfs2_readpage+0x41/0x2d0 [ocfs2] filemap_fault+0x12b/0x5c0 ocfs2_fault+0x29/0xb0 [ocfs2] __do_fault+0x1a/0xa0 __handle_mm_fault+0xbe8/0x1090 handle_mm_fault+0xaa/0x1f0 __do_page_fault+0x235/0x4b0 trace_do_page_fault+0x3c/0x110 async_page_fault+0x28/0x30 RIP: 0033:0x7fa75ded638e RSP: 002b:00007ffd6657db18 EFLAGS: 00010287 RAX: 000055c7662fb700 RBX: 0000000000000001 RCX: 000055c7662fb700 RDX: 0000000000001770 RSI: 00007fa75e909000 RDI: 000055c7662fb700 RBP: 0000000000000003 R08: 000000000000000e R09: 0000000000000000 R10: 0000000000000483 R11: 00007fa75ded61b0 R12: 00007fa75e90a770 R13: 000000000000000e R14: 0000000000001770 R15: 0000000000000000 About performance improvement, we can see the testing time is reduced, and CPU utilization decreases, the detailed data is as follows. I ran multi_mmap test case in ocfs2-test package in a three nodes cluster. Before applying this patch: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2754 ocfs2te+ 20 0 170248 6980 4856 D 80.73 0.341 0:18.71 multi_mmap 1505 root rt 0 222236 123060 97224 S 2.658 6.015 0:01.44 corosync 5 root 20 0 0 0 0 S 1.329 0.000 0:00.19 kworker/u8:0 95 root 20 0 0 0 0 S 1.329 0.000 0:00.25 kworker/u8:1 2728 root 20 0 0 0 0 S 0.997 0.000 0:00.24 jbd2/sda1-33 2721 root 20 0 0 0 0 S 0.664 0.000 0:00.07 ocfs2dc-3C8CFD4 2750 ocfs2te+ 20 0 142976 4652 3532 S 0.664 0.227 0:00.28 mpirun ocfs2test@tb-node2:~>multiple_run.sh -i ens3 -k ~/linux-4.4.21-69.tar.gz -o ~/ocfs2mullog -C hacluster -s pcmk -n tb-node2,tb-node1,tb-node3 -d /dev/sda1 -b 4096 -c 32768 -t multi_mmap /mnt/shared Tests with "-b 4096 -C 32768" Thu Dec 28 14:44:52 CST 2017 multi_mmap..................................................Passed. Runtime 783 seconds. After apply this patch: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2508 ocfs2te+ 20 0 170248 6804 4680 R 54.00 0.333 0:55.37 multi_mmap 155 root 20 0 0 0 0 S 2.667 0.000 0:01.20 kworker/u8:3 95 root 20 0 0 0 0 S 2.000 0.000 0:01.58 kworker/u8:1 2504 ocfs2te+ 20 0 142976 4604 3480 R 1.667 0.225 0:01.65 mpirun 5 root 20 0 0 0 0 S 1.000 0.000 0:01.36 kworker/u8:0 2482 root 20 0 0 0 0 S 1.000 0.000 0:00.86 jbd2/sda1-33 299 root 0 -20 0 0 0 S 0.333 0.000 0:00.13 kworker/2:1H 335 root 0 -20 0 0 0 S 0.333 0.000 0:00.17 kworker/1:1H 535 root 20 0 12140 7268 1456 S 0.333 0.355 0:00.34 haveged 1282 root rt 0 222284 123108 97224 S 0.333 6.017 0:01.33 corosync ocfs2test@tb-node2:~>multiple_run.sh -i ens3 -k ~/linux-4.4.21-69.tar.gz -o ~/ocfs2mullog -C hacluster -s pcmk -n tb-node2,tb-node1,tb-node3 -d /dev/sda1 -b 4096 -c 32768 -t multi_mmap /mnt/shared Tests with "-b 4096 -C 32768" Thu Dec 28 15:04:12 CST 2017 multi_mmap..................................................Passed. Runtime 487 seconds. Link: http://lkml.kernel.org/r/1514447305-30814-1-git-send-email-ghe@suse.com Fixes: 1cce4df04f37 ("ocfs2: do not lock/unlock() inode DLM lock") Signed-off-by: Gang He <ghe@suse.com> Reviewed-by: Eric Ren <zren@suse.com> Acked-by: alex chen <alex.chen@huawei.com> Acked-by: piaojun <piaojun@huawei.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-23ocfs2: fix deadlock caused by recursive locking in xattrEric Ren1-0/+4
Another deadlock path caused by recursive locking is reported. This kind of issue was introduced since commit 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()"). Two deadlock paths have been fixed by commit b891fa5024a9 ("ocfs2: fix deadlock issue when taking inode lock at vfs entry points"). Yes, we intend to fix this kind of case in incremental way, because it's hard to find out all possible paths at once. This one can be reproduced like this. On node1, cp a large file from home directory to ocfs2 mountpoint. While on node2, run setfacl/getfacl. Both nodes will hang up there. The backtraces: On node1: __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2] ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2] ocfs2_write_begin+0x43/0x1a0 [ocfs2] generic_perform_write+0xa9/0x180 __generic_file_write_iter+0x1aa/0x1d0 ocfs2_file_write_iter+0x4f4/0xb40 [ocfs2] __vfs_write+0xc3/0x130 vfs_write+0xb1/0x1a0 SyS_write+0x46/0xa0 On node2: __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2] ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2] ocfs2_xattr_set+0x12e/0xe80 [ocfs2] ocfs2_set_acl+0x22d/0x260 [ocfs2] ocfs2_iop_set_acl+0x65/0xb0 [ocfs2] set_posix_acl+0x75/0xb0 posix_acl_xattr_set+0x49/0xa0 __vfs_setxattr+0x69/0x80 __vfs_setxattr_noperm+0x72/0x1a0 vfs_setxattr+0xa7/0xb0 setxattr+0x12d/0x190 path_setxattr+0x9f/0xb0 SyS_setxattr+0x14/0x20 Fix this one by using ocfs2_inode_{lock|unlock}_tracker, which is exported by commit 439a36b8ef38 ("ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock"). Link: http://lkml.kernel.org/r/20170622014746.5815-1-zren@suse.com Fixes: 743b5f1434f5 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()") Signed-off-by: Eric Ren <zren@suse.com> Reported-by: Thomas Voegtle <tv@lio96.de> Tested-by: Thomas Voegtle <tv@lio96.de> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-02sched/headers: Prepare to move signal wakeup & sigpending methods from ↵Ingo Molnar1-0/+1
<linux/sched.h> into <linux/sched/signal.h> Fix up affected files that include this signal functionality via sched.h. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-22ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lockEric Ren1-3/+102
We are in the situation that we have to avoid recursive cluster locking, but there is no way to check if a cluster lock has been taken by a precess already. Mostly, we can avoid recursive locking by writing code carefully. However, we found that it's very hard to handle the routines that are invoked directly by vfs code. For instance: const struct inode_operations ocfs2_file_iops = { .permission = ocfs2_permission, .get_acl = ocfs2_iop_get_acl, .set_acl = ocfs2_iop_set_acl, }; Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR): do_sys_open may_open inode_permission ocfs2_permission ocfs2_inode_lock() <=== first time generic_permission get_acl ocfs2_iop_get_acl ocfs2_inode_lock() <=== recursive one A deadlock will occur if a remote EX request comes in between two of ocfs2_inode_lock(). Briefly describe how the deadlock is formed: On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in BAST(ocfs2_generic_handle_bast) when downconvert is started on behalf of the remote EX lock request. Another hand, the recursive cluster lock (the second one) will be blocked in in __ocfs2_cluster_lock() because of OCFS2_LOCK_BLOCKED. But, the downconvert never complete, why? because there is no chance for the first cluster lock on this node to be unlocked - we block ourselves in the code path. The idea to fix this issue is mostly taken from gfs2 code. 1. introduce a new field: struct ocfs2_lock_res.l_holders, to keep track of the processes' pid who has taken the cluster lock of this lock resource; 2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH; it means just getting back disk inode bh for us if we've got cluster lock. 3. export a helper: ocfs2_is_locked_by_me() is used to check if we have got the cluster lock in the upper code path. The tracking logic should be used by some of the ocfs2 vfs's callbacks, to solve the recursive locking issue cuased by the fact that vfs routines can call into each other. The performance penalty of processing the holder list should only be seen at a few cases where the tracking logic is used, such as get/set acl. You may ask what if the first time we got a PR lock, and the second time we want a EX lock? fortunately, this case never happens in the real world, as far as I can see, including permission check, (get|set)_(acl|attr), and the gfs2 code also do so. [sfr@canb.auug.org.au remove some inlines] Link: http://lkml.kernel.org/r/20170117100948.11657-2-zren@suse.com Signed-off-by: Eric Ren <zren@suse.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10ocfs2: fix crash caused by stale lvb with fsdlm pluginEric Ren1-0/+10
The crash happens rather often when we reset some cluster nodes while nodes contend fiercely to do truncate and append. The crash backtrace is below: dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover_grant 1 locks on 971 resources dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover 9 generation 5 done: 4 ms ocfs2: Begin replay journal (node 318952601, slot 2) on device (253,18) ocfs2: End replay journal (node 318952601, slot 2) on device (253,18) ocfs2: Beginning quota recovery on device (253,18) for slot 2 ocfs2: Finishing quota recovery on device (253,18) for slot 2 (truncate,30154,1):ocfs2_truncate_file:470 ERROR: bug expression: le64_to_cpu(fe->i_size) != i_size_read(inode) (truncate,30154,1):ocfs2_truncate_file:470 ERROR: Inode 290321, inode i_size = 732 != di i_size = 937, i_flags = 0x1 ------------[ cut here ]------------ kernel BUG at /usr/src/linux/fs/ocfs2/file.c:470! invalid opcode: 0000 [#1] SMP Modules linked in: ocfs2_stack_user(OEN) ocfs2(OEN) ocfs2_nodemanager ocfs2_stackglue(OEN) quota_tree dlm(OEN) configfs fuse sd_mod iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi af_packet iscsi_ibft iscsi_boot_sysfs softdog xfs libcrc32c ppdev parport_pc pcspkr parport joydev virtio_balloon virtio_net i2c_piix4 acpi_cpufreq button processor ext4 crc16 jbd2 mbcache ata_generic cirrus virtio_blk ata_piix drm_kms_helper ahci syscopyarea libahci sysfillrect sysimgblt fb_sys_fops ttm floppy libata drm virtio_pci virtio_ring uhci_hcd virtio ehci_hcd usbcore serio_raw usb_common sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4 Supported: No, Unsupported modules are loaded CPU: 1 PID: 30154 Comm: truncate Tainted: G OE N 4.4.21-69-default #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20151112_172657-sheep25 04/01/2014 task: ffff88004ff6d240 ti: ffff880074e68000 task.ti: ffff880074e68000 RIP: 0010:[<ffffffffa05c8c30>] [<ffffffffa05c8c30>] ocfs2_truncate_file+0x640/0x6c0 [ocfs2] RSP: 0018:ffff880074e6bd50 EFLAGS: 00010282 RAX: 0000000000000074 RBX: 000000000000029e RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246 RBP: ffff880074e6bda8 R08: 000000003675dc7a R09: ffffffff82013414 R10: 0000000000034c50 R11: 0000000000000000 R12: ffff88003aab3448 R13: 00000000000002dc R14: 0000000000046e11 R15: 0000000000000020 FS: 00007f839f965700(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007f839f97e000 CR3: 0000000036723000 CR4: 00000000000006e0 Call Trace: ocfs2_setattr+0x698/0xa90 [ocfs2] notify_change+0x1ae/0x380 do_truncate+0x5e/0x90 do_sys_ftruncate.constprop.11+0x108/0x160 entry_SYSCALL_64_fastpath+0x12/0x6d Code: 24 28 ba d6 01 00 00 48 c7 c6 30 43 62 a0 8b 41 2c 89 44 24 08 48 8b 41 20 48 c7 c1 78 a3 62 a0 48 89 04 24 31 c0 e8 a0 97 f9 ff <0f> 0b 3d 00 fe ff ff 0f 84 ab fd ff ff 83 f8 fc 0f 84 a2 fd ff RIP [<ffffffffa05c8c30>] ocfs2_truncate_file+0x640/0x6c0 [ocfs2] It's because ocfs2_inode_lock() get us stale LVB in which the i_size is not equal to the disk i_size. We mistakenly trust the LVB because the underlaying fsdlm dlm_lock() doesn't set lkb_sbflags with DLM_SBF_VALNOTVALID properly for us. But, why? The current code tries to downconvert lock without DLM_LKF_VALBLK flag to tell o2cb don't update RSB's LVB if it's a PR->NULL conversion, even if the lock resource type needs LVB. This is not the right way for fsdlm. The fsdlm plugin behaves different on DLM_LKF_VALBLK, it depends on DLM_LKF_VALBLK to decide if we care about the LVB in the LKB. If DLM_LKF_VALBLK is not set, fsdlm will skip recovering RSB's LVB from this lkb and set the right DLM_SBF_VALNOTVALID appropriately when node failure happens. The following diagram briefly illustrates how this crash happens: RSB1 is inode metadata lock resource with LOCK_TYPE_USES_LVB; The 1st round: Node1 Node2 RSB1: PR RSB1(master): NULL->EX ocfs2_downconvert_lock(PR->NULL, set_lvb==0) ocfs2_dlm_lock(no DLM_LKF_VALBLK) - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - dlm_lock(no DLM_LKF_VALBLK) convert_lock(overwrite lkb->lkb_exflags with no DLM_LKF_VALBLK) RSB1: NULL RSB1: EX reset Node2 dlm_recover_rsbs() recover_lvb() /* The LVB is not trustable if the node with EX fails and * no lock >= PR is left. We should set RSB_VALNOTVALID for RSB1. */ if(!(kb_exflags & DLM_LKF_VALBLK)) /* This means we miss the chance to return; * to invalid the LVB here. */ The 2nd round: Node 1 Node2 RSB1(become master from recovery) ocfs2_setattr() ocfs2_inode_lock(NULL->EX) /* dlm_lock() return the stale lvb without setting DLM_SBF_VALNOTVALID */ ocfs2_meta_lvb_is_trustable() return 1 /* so we don't refresh inode from disk */ ocfs2_truncate_file() mlog_bug_on_msg(disk isize != i_size_read(inode)) /* crash! */ The fix is quite straightforward. We keep to set DLM_LKF_VALBLK flag for dlm_lock() if the lock resource type needs LVB and the fsdlm plugin is uesed. Link: http://lkml.kernel.org/r/1481275846-6604-1-git-send-email-zren@suse.com Signed-off-by: Eric Ren <zren@suse.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26ocfs2: remove obscure BUG_ON in dlmglueJoseph Qi1-9/+0
These BUG_ON(!inode) are obscure because we have already used inode to get osb. And actually we can guarantee here inode is valid in the context. So we can safely remove them. Link: http://lkml.kernel.org/r/5776336A.6030104@huawei.com Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Eric Ren <zren@suse.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26ocfs2: cleanup unneeded goto in ocfs2_create_new_inode_locksJoseph Qi1-3/+1
The last goto is unneeded, so remove it. Link: http://lkml.kernel.org/r/576213D3.6080002@huawei.com Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-31posix_acl: Inode acl caching fixesAndreas Gruenbacher1-0/+3
When get_acl() is called for an inode whose ACL is not cached yet, the get_acl inode operation is called to fetch the ACL from the filesystem. The inode operation is responsible for updating the cached acl with set_cached_acl(). This is done without locking at the VFS level, so another task can call set_cached_acl() or forget_cached_acl() before the get_acl inode operation gets to calling set_cached_acl(), and then get_acl's call to set_cached_acl() results in caching an outdate ACL. Prevent this from happening by setting the cached ACL pointer to a task-specific sentinel value before calling the get_acl inode operation. Move the responsibility for updating the cached ACL from the get_acl inode operations to get_acl(). There, only set the cached ACL if the sentinel value hasn't changed. The sentinel values are chosen to have odd values. Likewise, the value of ACL_NOT_CACHED is odd. In contrast, ACL object pointers always have an even value (ACLs are aligned in memory). This allows to distinguish uncached ACLs values from ACL objects. In addition, switch from guarding inode->i_acl and inode->i_default_acl upates by the inode->i_lock spinlock to using xchg() and cmpxchg(). Filesystems that do not want ACLs returned from their get_acl inode operations to be cached must call forget_cached_acl() to prevent the VFS from doing so. (Patch written by Al Viro and Andreas Gruenbacher.) Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-21ocfs2: NFS hangs in __ocfs2_cluster_lock due to race with ocfs2_unblock_lockTariq Saeed1-0/+6
NFS on a 2 node ocfs2 cluster each node exporting dir. The lock causing the hang is the global bit map inode lock. Node 1 is master, has the lock granted in PR mode; Node 2 is in the converting list (PR -> EX). There are no holders of the lock on the master node so it should downconvert to NL and grant EX to node 2 but that does not happen. BLOCKED + QUEUED in lock res are set and it is on osb blocked list. Threads are waiting in __ocfs2_cluster_lock on BLOCKED. One thread wants EX, rest want PR. So it is as though the downconvert thread needs to be kicked to complete the conv. The hang is caused by an EX req coming into __ocfs2_cluster_lock on the heels of a PR req after it sets BUSY (drops l_lock, releasing EX thread), forcing the incoming EX to wait on BUSY without doing anything. PR has called ocfs2_dlm_lock, which sets the node 1 lock from NL -> PR, queues ast. At this time, upconvert (PR ->EX) arrives from node 2, finds conflict with node 1 lock in PR, so the lock res is put on dlm thread's dirty listt. After ret from ocf2_dlm_lock, PR thread now waits behind EX on BUSY till awoken by ast. Now it is dlm_thread that serially runs dlm_shuffle_lists, ast, bast, in that order. dlm_shuffle_lists ques a bast on behalf of node 2 (which will be run by dlm_thread right after the ast). ast does its part, sets UPCONVERT_FINISHING, clears BUSY and wakes its waiters. Next, dlm_thread runs bast. It sets BLOCKED and kicks dc thread. dc thread runs ocfs2_unblock_lock, but since UPCONVERT_FINISHING set, skips doing anything and reques. Inside of __ocfs2_cluster_lock, since EX has been waiting on BUSY ahead of PR, it wakes up first, finds BLOCKED set and skips doing anything but clearing UPCONVERT_FINISHING (which was actually "meant" for the PR thread), and this time waits on BLOCKED. Next, the PR thread comes out of wait but since UPCONVERT_FINISHING is not set, it skips updating the l_ro_holders and goes straight to wait on BLOCKED. So there, we have a hang! Threads in __ocfs2_cluster_lock wait on BLOCKED, lock res in osb blocked list. Only when dc thread is awoken, it will run ocfs2_unblock_lock and things will unhang. One way to fix this is to wake the dc thread on the flag after clearing UPCONVERT_FINISHING Orabug: 20933419 Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Eric Ren <zren@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14ocfs2: do not lock/unlock() inode DLM lockGoldwyn Rodrigues1-8/+0
DLM does not cache locks. So, blocking lock and unlock will only make the performance worse where contention over the locks is high. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05ocfs2: add uuid to ocfs2 thread name for problem analysisJoseph Qi1-1/+2
A node can mount multiple ocfs2 volumes. And if thread names are same for each volume/domain, it will bring inconvenience when analyzing problems because we have to identify which volume/domain the messages belong to. Since thread name will be printed to messages, so add volume uuid or dlm name to thread name can benefit problem analysis. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Gang He <ghe@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04ocfs2: remove unneeded code in ocfs2_dlm_initJoseph Qi1-2/+0
status is already initialized and it will only be 0 or negatives in the code flow. So remove the unneeded assignment after the lable 'local'. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07ocfs2: fix BUG in ocfs2_downconvert_thread_do_work()Joseph Qi1-3/+7
The "BUG_ON(list_empty(&osb->blocked_lock_list))" in ocfs2_downconvert_thread_do_work can be triggered in the following case: ocfs2dc has firstly saved osb->blocked_lock_count to local varibale processed, and then processes the dentry lockres. During the dentry put, it calls iput and then deletes rw, inode and open lockres from blocked list in ocfs2_mark_lockres_freeing. And this causes the variable `processed' to not reflect the number of blocked lockres to be processed, which triggers the BUG. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-21Revert "ocfs2: incorrect check for debugfs returns"Linus Torvalds1-1/+1
This reverts commit e2ac55b6a8e337fac7cc59c6f452caac92ab5ee6. Huang Ying reports that this causes a hang at boot with debugfs disabled. It is true that the debugfs error checks are kind of confusing, and this code certainly merits more cleanup and thinking about it, but there's something wrong with the trivial "check not just for NULL, but for error pointers too" patch. Yes, with debugfs disabled, we will end up setting the o2hb_debug_dir pointer variable to an error pointer (-ENODEV), and then continue as if everything was fine. But since debugfs is disabled, all the _users_ of that pointer end up being compiled away, so even though the pointer can not be dereferenced, that's still fine. So it's confusing and somewhat questionable, but the "more correct" error checks end up causing more trouble than they fix. Reported-by: Huang Ying <ying.huang@intel.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Chengyu Song <csong84@gatech.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14ocfs2: check if the ocfs2 lock resource has been initialized before calling ↵alex chen1-0/+5
ocfs2_dlm_lock If ocfs2 lockres has not been initialized before calling ocfs2_dlm_lock, the lock won't be dropped and then will lead umount hung. The case is described below: ocfs2_mknod ocfs2_mknod_locked __ocfs2_mknod_locked ocfs2_journal_access_di Failed because of -ENOMEM or other reasons, the inode lockres has not been initialized yet. iput(inode) ocfs2_evict_inode ocfs2_delete_inode ocfs2_inode_lock ocfs2_inode_lock_full_nested __ocfs2_cluster_lock Succeeds and allocates a new dlm lockres. ocfs2_clear_inode ocfs2_open_unlock ocfs2_drop_inode_locks ocfs2_drop_lock Since lockres has not been initialized, the lock can't be dropped and the lockres can't be migrated, thus umount will hang forever. Signed-off-by: Alex Chen <alex.chen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: joyce.xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14ocfs2: incorrect check for debugfs returnsChengyu Song1-1/+1
debugfs_create_dir and debugfs_create_file may return -ENODEV when debugfs is not configured, so the return value should be checked against ERROR_VALUE as well, otherwise the later dereference of the dentry pointer would crash the kernel. This patch tries to solve this problem by fixing certain checks. However, I have that found other call sites are protected by #ifdef CONFIG_DEBUG_FS. In current implementation, if CONFIG_DEBUG_FS is defined, then the above two functions will never return any ERROR_VALUE. So another possibility to fix this is to surround all the buggy checks/functions with the same #ifdef CONFIG_DEBUG_FS. But I'm not sure if this would break any functionality, as only OCFS2_FS_STATS declares dependency on DEBUG_FS. Signed-off-by: Chengyu Song <csong84@gatech.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-10ocfs2: prune the dcache before deleting the dentry of directoryalex chen1-0/+3
In ocfs2_dentry_convert_worker, we should prune the dcache before deleting the dentry of directory, otherwise, in the following cases the inode of directory will still remain in orphan directory until the device being umounted. Mount point: /mnt/ocfs2 Node A Node B mkdir /mnt/ocfs2/testdir ocfs2_mkdir ->ocfs2_mknod ->ocfs2_dentry_attach_lock ->ocfs2_dentry_lock(dentry, 0) ... ... touch /mnt/ocfs2/testdir/testfile unlink /mnt/test/testdir/testfile rmdir /mnt/ocfs2/testdir ocfs2_unlink ->ocfs2_remote_dentry_delete ->ocfs2_dentry_lock(dentry, 1) ... ... ... ... ocfs2_downconvert_thread ->ocfs2_unblock_lock ->ocfs2_dentry_convert_worker ->ocfs2_find_local_alias ->dget_dlock ->d_delete Here the dentry can not be released because the children's dentry is negative but still exist. Finally, this inode will still remain in orphan directory until its children are destroyed. So before deleting dentry of directory, we should prune the dcache to remove unused children of the parent dentry by shrink_dcache_parent(). Signed-off-by: Alex Chen <alex.chen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: joyce.xue <xuejiufei@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10Merge branch 'akpm' (patchbomb from Andrew)Linus Torvalds1-6/+31
Merge first patchbomb from Andrew Morton: - a few minor cifs fixes - dma-debug upadtes - ocfs2 - slab - about half of MM - procfs - kernel/exit.c - panic.c tweaks - printk upates - lib/ updates - checkpatch updates - fs/binfmt updates - the drivers/rtc tree - nilfs - kmod fixes - more kernel/exit.c - various other misc tweaks and fixes * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits) exit: pidns: fix/update the comments in zap_pid_ns_processes() exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting exit: exit_notify: re-use "dead" list to autoreap current exit: reparent: call forget_original_parent() under tasklist_lock exit: reparent: avoid find_new_reaper() if no children exit: reparent: introduce find_alive_thread() exit: reparent: introduce find_child_reaper() exit: reparent: document the ->has_child_subreaper checks exit: reparent: s/while_each_thread/for_each_thread/ in find_new_reaper() exit: reparent: fix the cross-namespace PR_SET_CHILD_SUBREAPER reparenting exit: reparent: fix the dead-parent PR_SET_CHILD_SUBREAPER reparenting exit: proc: don't try to flush /proc/tgid/task/tgid exit: release_task: fix the comment about group leader accounting exit: wait: drop tasklist_lock before psig->c* accounting exit: wait: don't use zombie->real_parent exit: wait: cleanup the ptrace_reparented() checks usermodehelper: kill the kmod_thread_locker logic usermodehelper: don't use CLONE_VFORK for ____call_usermodehelper() fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmp nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races ...
2014-12-10ocfs2: do not set OCFS2_LOCK_UPCONVERT_FINISHING if nonblocking lock can not ↵Xue jiufei1-6/+31
be granted at once ocfs2_readpages() use nonblocking flag to avoid page lock inversion. It will trigger cluster hang because that flag OCFS2_LOCK_UPCONVERT_FINISHING is not cleared if nonblocking lock cannot be granted at once. The flag would prevent dc thread from downconverting. So other nodes cannot acheive this lockres for ever. So we should not set OCFS2_LOCK_UPCONVERT_FINISHING when receiving ast if nonblocking lock had already returned. Signed-off-by: joyce.xue <xuejiufei@huawei.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-19assorted conversions to %p[dD]Al Viro1-2/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09fs/ocfs2/dlmglue.c: use __seq_open_private() not seq_open()Rob Jones1-18/+5
Reduce boilerplate code by using seq_open_private() instead of seq_open() Signed-off-by: Rob Jones <rob.jones@codethink.co.uk> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04ocfs2: remove some unused codeXue jiufei1-5/+0
dlm_recovery_ctxt.received is unused. ocfs2_should_refresh_lock_res() can only return 0 or 1, so the error handling code in ocfs2_super_lock() is unneeded. Signed-off-by: joyce.xue <xuejiufei@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03ocfs2: avoid blocking in ocfs2_mark_lockres_freeing() in downconvert threadJan Kara1-3/+41
If we are dropping last inode reference from downconvert thread, we will end up calling ocfs2_mark_lockres_freeing() which can block if the lock we are freeing is queued thus creating an A-A deadlock. Luckily, since we are the downconvert thread, we can immediately dequeue the lock and thus avoid waiting in this case. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21ocfs2: pass ocfs2_cluster_connection to ocfs2_this_nodeGoldwyn Rodrigues1-1/+1
This is done to differentiate between using and not using controld and use the connection information accordingly. We need to be backward compatible. So, we use a new enum ocfs2_connection_type to identify when controld is used and when it is not. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21ocfs2: add clustername to cluster connectionGoldwyn Rodrigues1-0/+2
This is an effort of removing ocfs2_controld.pcmk and getting ocfs2 DLM handling up to the times with respect to DLM (>=4.0.1) and corosync (2.3.x). AFAIK, cman also is being phased out for a unified corosync cluster stack. fs/dlm performs all the functions with respect to fencing and node management and provides the API's to do so for ocfs2. For all future references, DLM stands for fs/dlm code. The advantages are: + No need to run an additional userspace daemon (ocfs2_controld) + No controld device handling and controld protocol + Shifting responsibilities of node management to DLM layer For backward compatibility, we are keeping the controld handling code. Once enough time has passed we can remove a significant portion of the code. This was tested by using the kernel with changes on older unmodified tools. The kernel used ocfs2_controld as expected, and displayed the appropriate warning message. This feature requires modification in the userspace ocfs2-tools. The changes can be found at: https://github.com/goldwynr/ocfs2-tools branch: nocontrold Currently, not many checks are present in the userspace code, but that would change soon. This patch (of 6): Add clustername to cluster connection. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-15tree-wide: use reinit_completion instead of INIT_COMPLETIONWolfram Sang1-2/+2
Use this new function to make code more comprehensible, since we are reinitialzing the completion, not initializing. [akpm@linux-foundation.org: linux-next resyncs] Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Acked-by: Linus Walleij <linus.walleij@linaro.org> (personally at LCE13) Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07aio: remove retry-based AIOZach Brown1-1/+1
This removes the retry-based AIO infrastructure now that nothing in tree is using it. We want to remove retry-based AIO because it is fundemantally unsafe. It retries IO submission from a kernel thread that has only assumed the mm of the submitting task. All other task_struct references in the IO submission path will see the kernel thread, not the submitting task. This design flaw means that nothing of any meaningful complexity can use retry-based AIO. This removes all the code and data associated with the retry machinery. The most significant benefit of this is the removal of the locking around the unused run list in the submission path. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Kent Overstreet <koverstreet@google.com> Signed-off-by: Zach Brown <zab@redhat.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-25Merge branch 'for-linus' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace and namespace infrastructure changes from Eric W Biederman: "This set of changes starts with a few small enhnacements to the user namespace. reboot support, allowing more arbitrary mappings, and support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the user namespace root. I do my best to document that if you care about limiting your unprivileged users that when you have the user namespace support enabled you will need to enable memory control groups. There is a minor bug fix to prevent overflowing the stack if someone creates way too many user namespaces. The bulk of the changes are a continuation of the kuid/kgid push down work through the filesystems. These changes make using uids and gids typesafe which ensures that these filesystems are safe to use when multiple user namespaces are in use. The filesystems converted for 3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs. The changes for these filesystems were a little more involved so I split the changes into smaller hopefully obviously correct changes. XFS is the only filesystem that remains. I was hoping I could get that in this release so that user namespace support would be enabled with an allyesconfig or an allmodconfig but it looks like the xfs changes need another couple of days before it they are ready." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits) cifs: Enable building with user namespaces enabled. cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t cifs: Convert struct cifs_sb_info to use kuids and kgids cifs: Modify struct smb_vol to use kuids and kgids cifs: Convert struct cifsFileInfo to use a kuid cifs: Convert struct cifs_fattr to use kuid and kgids cifs: Convert struct tcon_link to use a kuid. cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t cifs: Convert from a kuid before printing current_fsuid cifs: Use kuids and kgids SID to uid/gid mapping cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc cifs: Use BUILD_BUG_ON to validate uids and gids are the same size cifs: Override unmappable incoming uids and gids nfsd: Enable building with user namespaces enabled. nfsd: Properly compare and initialize kuids and kgids nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids nfsd: Modify nfsd4_cb_sec to use kuids and kgids nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion nfsd: Convert nfsxdr to use kuids and kgids nfsd: Convert nfs3xdr to use kuids and kgids ...
2013-02-21ocfs2: unlock super lock if lockres refresh failedJunxiao Bi1-1/+4
If lockres refresh failed, the super lock will never be released which will cause some processes on other cluster nodes hung forever. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-13ocfs2: convert between kuids and kgids and DLM locksEric W. Biederman1-4/+4
Convert between uid and gids stored in the on the wire format of dlm locks aka struct ocfs2_meta_lvb and kuids and kgids stored in inode->i_uid and inode->i_gid. Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-07-03ocfs2: use spinlock irqsave for downconvert lock.patchSrinivas Eeda1-12/+19
When ocfs2dc thread holds dc_task_lock spinlock and receives soft IRQ it deadlock itself trying to get same spinlock in ocfs2_wake_downconvert_thread. Below is the stack snippet. The patch disables interrupts when acquiring dc_task_lock spinlock. ocfs2_wake_downconvert_thread ocfs2_rw_unlock ocfs2_dio_end_io dio_complete ..... bio_endio req_bio_endio .... scsi_io_completion blk_done_softirq __do_softirq do_softirq irq_exit do_IRQ ocfs2_downconvert_thread [kthread] Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>
2012-07-03ocfs2: Misplaced parens in unlikleyroel1-1/+1
Fix misplaced parentheses Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-12-01Merge branch 'upstream-linus' of ↵Linus Torvalds1-6/+15
git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (31 commits) ocfs2: avoid unaligned access to dqc_bitmap ocfs2: Use filemap_write_and_wait() instead of write_inode_now() ocfs2: honor O_(D)SYNC flag in fallocate ocfs2: Add a missing journal credit in ocfs2_link_credits() -v2 ocfs2: send correct UUID to cleancache initialization ocfs2: Commit transactions in error cases -v2 ocfs2: make direntry invalid when deleting it fs/ocfs2/dlm/dlmlock.c: free kmem_cache_zalloc'd data using kmem_cache_free ocfs2: Avoid livelock in ocfs2_readpage() ocfs2: serialize unaligned aio ocfs2: Implement llseek() ocfs2: Fix ocfs2_page_mkwrite() ocfs2: Add comment about orphan scanning ocfs2: Clean up messages in the fs ocfs2/cluster: Cluster up now includes network connections too ocfs2/cluster: Add new function o2net_fill_node_map() ocfs2/cluster: Fix output in file elapsed_time_in_ms ocfs2/dlm: dlmlock_remote() needs to account for remastery ocfs2/dlm: Take inflight reference count for remotely mastered resources too ocfs2/dlm: Cleanup dlm_wait_for_node_death() and dlm_wait_for_node_recovery() ...
2011-11-02filesystems: add set_nlink()Miklos Szeredi1-1/+1
Replace remaining direct i_nlink updates with a new set_nlink() updater function. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-05-31ocfs2: Bugfix for hard readonly mountTiger Yang1-6/+15
ocfs2 cannot currently mount a device that is readonly at the media ("hard readonly"). Fix the broken places. see detail: http://oss.oracle.com/bugzilla/show_bug.cgi?id=1322 [ Description edited -- Joel ] Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Reviewed-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-03-28Merge branch 'mlog_replace_for_39' of git://repo.or.cz/taoma-kernel into ↵Joel Becker1-143/+6
ocfs2-merge-window-fix
2011-03-07ocfs2: Remove EXIT from masklog.Tao Ma1-65/+6
mlog_exit is used to record the exit status of a function. But because it is added in so many functions, if we enable it, the system logs get filled up quickly and cause too much I/O. So actually no one can open it for a production system or even for a test. This patch just try to remove it or change it. So: 1. if all the error paths already use mlog_errno, it is just removed. Otherwise, it will be replaced by mlog_errno. 2. if it is used to print some return value, it is replaced with mlog(0,...). mlog_exit_ptr is changed to mlog(0. All those mlog(0,...) will be replaced with trace events later. Signed-off-by: Tao Ma <boyu.mt@taobao.com>
2011-02-21ocfs2: Remove ENTRY from masklog.Tao Ma1-78/+0
ENTRY is used to record the entry of a function. But because it is added in so many functions, if we enable it, the system logs get filled up quickly and cause too much I/O. So actually no one can open it for a production system or even for a test. So for mlog_entry_void, we just remove it. for mlog_entry(...), we replace it with mlog(0,...), and they will be replace by trace event later. Signed-off-by: Tao Ma <boyu.mt@taobao.com>
2011-02-20ocfs2: Use hrtimer to track ocfs2 fs lock statsSunil Mushran1-47/+50
Patch makes use of the hrtimer to track times in ocfs2 lock stats. The patch is a bit involved to ensure no additional impact on the memory footprint. The size of ocfs2_inode_cache remains 1280 bytes on 32-bit systems. A related change was to modify the unit of the max wait time from nanosec to microsec allowing us to track max time larger than 4 secs. This change necessitated the bumping of the output version in the debugfs file, locking_state, from 2 to 3. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>
2010-09-10Track negative entries v3Goldwyn Rodrigues1-0/+8
Track negative dentries by recording the generation number of the parent directory in d_fsdata. The generation number for the parent directory is recorded in the inode_info, which increments every time the lock on the directory is dropped. If the generation number of the parent directory and the negative dentry matches, there is no need to perform the revalidate, else a revalidate is forced. This improves performance in situations where nodes look for the same non-existent file multiple times. Thanks Mark for explaining the DLM sequence. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-07-20fs/ocfs2: Remove unnecessary casts of private_dataJoe Perches1-2/+2
Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-05-21ocfs2: Avoid unnecessary block mapping when refreshing quota infoJan Kara1-1/+2
The position of global quota file info does not change. So we do not have to do logical -> physical block translation every time we reread it from disk. Thus we can also avoid taking ip_alloc_sem. Acked-by: Joel Becker <Joel.Becker@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
2010-03-08Merge branch 'for-next' into for-linusJiri Kosina1-1/+1
Conflicts: Documentation/filesystems/proc.txt arch/arm/mach-u300/include/mach/debug-macro.S drivers/net/qlge/qlge_ethtool.c drivers/net/qlge/qlge_main.c drivers/net/typhoon.c
2010-02-27ocfs2: Use a separate masklog for AST and BASTsSunil Mushran1-25/+65
This patch adds a new masklog and uses it allow tracing ASTs and BASTs in the dlmglue layer. This has been found to be very useful in debugging cluster locking issues. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26ocfs2: Pass the locking protocol into ocfs2_cluster_connect().Joel Becker1-85/+83
Inside the stackglue, the locking protocol structure is hanging off of the ocfs2_cluster_connection. This takes it one further; the locking protocol is passed into ocfs2_cluster_connect(). Now different cluster connections can have different locking protocols with distinct asts. Note that all locking protocols have to keep their maximum protocol version in lock-step. With the protocol structure set in ocfs2_cluster_connect(), there is no need for the stackglue to have a static pointer to a specific protocol structure. We can change initialization to only pass in the maximum protocol version. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26ocfs2: Attach the connection to the lksbJoel Becker1-4/+4
We're going to want it in the ast functions, so we convert union ocfs2_dlm_lksb to struct ocfs2_dlm_lksb and let it carry the connection. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26ocfs2: Pass lksbs back from stackglue ast/bast functions.Joel Becker1-17/+17
The stackglue ast and bast functions tried to maintain the fiction that their arguments were void pointers. In reality, stack_user.c had to know that the argument was an ocfs2_lock_res in order to get the status off of the lksb. That's ugly. This changes stackglue to always pass the lksb as the argument to ast and bast functions. The caller can always use container_of() to get the ocfs2_lock_res or user_dlm_lock_res. The net effect to the caller is zero. They still get back the lockres in their ast. stackglue gets cleaner, and now can use the lksb itself. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-09tree-wide: Assorted spelling fixesDaniel Mack1-1/+1
In particular, several occurances of funny versions of 'success', 'unknown', 'therefore', 'acknowledge', 'argument', 'achieve', 'address', 'beginning', 'desirable', 'separate' and 'necessary' are fixed. Signed-off-by: Daniel Mack <daniel@caiaq.de> Cc: Joe Perches <joe@perches.com> Cc: Junio C Hamano <gitster@pobox.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-02-03ocfs2: Plugs race between the dc thread and an unlock ast messageSunil Mushran1-1/+3
This patch plugs a race between the downconvert thread and an unlock ast message. Specifically, after the downconvert worker has done its task, the dc thread needs to check whether an unlock ast made the downconvert moot. Reported-by: David Teigland <teigland@redhat.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Acked-by: Mark Fasheh <mfasheh@sus.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-02ocfs2: Remove overzealous BUG_ON during blocked lock processingSunil Mushran1-2/+10
During blocked lock processing, we should consider the possibility that the lock is no longer blocking. Joel Becker <joel.becker@oracle.com> assisted in fixing this issue. Reported-by: David Teigland <teigland@redhat.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-02ocfs2: Do not downconvert if the lock level is already compatibleSunil Mushran1-0/+13
During upconvert, if the master were to send a BAST, dlmglue will detect the upconversion in process and send a cancel convert to the master. Upon receiving the AST for the cancel convert, it will re-process the lock resource to determine whether it needs downconverting. Say, the up was from PR to EX and the BAST was for EX. After the cancel convert, it will need to downconvert to NL. However, if the node was originally upconverting from NL to EX, then there would be no reason to downconvert (assuming the same message sequence). This patch makes dlmglue consider the possibility that the current lock level is already compatible and that downconverting is not required. Joel Becker <joel.becker@oracle.com> assisted in fixing this issue. Fixes ossbz#1178 http://oss.oracle.com/bugzilla/show_bug.cgi?id=1178 Reported-by: Coly Li <coly.li@suse.de> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-02ocfs2: Prevent a livelock in dlmglueSunil Mushran1-3/+46
There is possibility of a livelock in __ocfs2_cluster_lock(). If a node were to get an ast for an upconvert request, followed immediately by a bast, there is a small window where the fs may downconvert the lock before the process requesting the upconvert is able to take the lock. This patch adds a new flag to indicate that the upconvert is still in progress and that the dc thread should not downconvert it right now. Wengang Wang <wen.gang.wang@oracle.com> and Joel Becker <joel.becker@oracle.com> contributed heavily to this patch. Reported-by: David Teigland <teigland@redhat.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-02ocfs2: Fix setting of OCFS2_LOCK_BLOCKED during bastWengang Wang1-2/+3
During bast, set the OCFS2_LOCK_BLOCKED flag only if the lock needs to downconverted. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Acked-by: Sunil Mushran <sunil.mushran@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-01-25ocfs2/trivial: Remove trailing whitespacesSunil Mushran1-1/+1
Patch removes trailing whitespaces. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-12-04tree-wide: fix assorted typos all over the placeAndré Goddard Rosa1-1/+1
That is "success", "unknown", "through", "performance", "[re|un]mapping" , "access", "default", "reasonable", "[con]currently", "temperature" , "channel", "[un]used", "application", "example","hierarchy", "therefore" , "[over|under]flow", "contiguous", "threshold", "enough" and others. Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-09-23dlmglue.c: add missed mlog linesColy Li1-1/+4
This patch adds the missed mlog_exit() and mlog_exit_void() lines when routines return. Signed-off-by: Coly Li <coly.li@suse.de> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-22ocfs2: Add new refcount tree lock resource in dlmglue.Tao Ma1-0/+80
refcount tree lock resource is used to protect refcount tree read/write among multiple nodes. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-22ocfs2: Abstract caching info checkpoint.Tao Ma1-5/+13
In meta downconvert, we need to checkpoint the metadata in an inode. For refcount tree, we also need it. So abstract the process out. Signed-off-by: Tao Ma <tao.ma@oracle.com>
2009-09-04ocfs2: Pass struct ocfs2_caching_info to the journal functions.Joel Becker1-1/+1
The next step in divorcing metadata I/O management from struct inode is to pass struct ocfs2_caching_info to the journal functions. Thus the journal locks a metadata cache with the cache io_lock function. It also can compare ci_last_trans and ci_created_trans directly. This is a large patch because of all the places we change ocfs2_journal_access..(handle, inode, ...) to ocfs2_journal_access..(handle, INODE_CACHE(inode), ...). Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04ocfs2: Take the inode out of the metadata read/write paths.Joel Becker1-1/+1
We are really passing the inode into the ocfs2_read/write_blocks() functions to get at the metadata cache. This commit passes the cache directly into the metadata block functions, divorcing them from the inode. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-22ocfs2: Add lockdep annotationsJan Kara1-16/+65
Add lockdep support to OCFS2. The support also covers all of the cluster locks except for open locks, journal locks, and local quotafile locks. These are special because they are acquired for a node, not for a particular process and lockdep cannot deal with such type of locking. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-22ocfs2: Disable orphan scanning for local and hard-ro mountsSunil Mushran1-10/+16
Local and Hard-RO mounts do not need orphan scanning. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-22ocfs2: Do not initialize lvb in ocfs2_orphan_scan_lock_res_init()Sunil Mushran1-4/+3
We don't access the LVB in our ocfs2_*_lock_res_init() functions. Since the LVB can become invalid during some cluster recovery operations, the dlmglue must be able to handle an uninitialized LVB. For the orphan scan lock, we initialized an uninitialzed LVB with our scan sequence number plus one. This starts a normal orphan scan cycle. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-22ocfs2: Provide the ocfs2_dlm_lvb_valid() stack API.Joel Becker1-3/+6
The Lock Value Block (LVB) of a DLM lock can be lost when nodes die and the DLM cannot reconstruct its state. Clients of the DLM need to know this. ocfs2's internal DLM, o2dlm, explicitly zeroes out the LVB when it loses track of the state. This is not a standard behavior, but ocfs2 has always relied on it. Thus, an o2dlm LVB is always "valid". ocfs2 now supports both o2dlm and fs/dlm via the stack glue. When fs/dlm loses track of an LVBs state, it sets a flag (DLM_SBF_VALNOTVALID) on the Lock Status Block (LKSB). The contents of the LVB may be garbage or merely stale. ocfs2 doesn't want to try to guess at the validity of the stale LVB. Instead, it should be checking the VALNOTVALID flag. As this is the 'standard' way of treating LVBs, we will promote this behavior. We add a stack glue API ocfs2_dlm_lvb_valid(). It returns non-zero when the LVB is valid. o2dlm will always return valid, while fs/dlm will check VALNOTVALID. Signed-off-by: Joel Becker <joel.becker@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com>
2009-06-03ocfs2: timer to queue scan of all orphan slotsSrinivas Eeda1-0/+51
When a dentry is unlinked, the unlinking node takes an EX on the dentry lock before moving the dentry to the orphan directory. Other nodes that have this dentry in cache have a PR on the same dentry lock. When the EX is requested, the other nodes flag the corresponding inode as MAYBE_ORPHANED during downconvert. The inode is finally deleted when the last node to iput the inode sees that i_nlink==0 and the MAYBE_ORPHANED flag is set. A problem arises if a node is forced to free dentry locks because of memory pressure. If this happens, the node will no longer get downconvert notifications for the dentries that have been unlinked on another node. If it also happens that node is actively using the corresponding inode and happens to be the one performing the last iput on that inode, it will fail to delete the inode as it will not have the MAYBE_ORPHANED flag set. This patch fixes this shortcoming by introducing a periodic scan of the orphan directories to delete such inodes. Care has been taken to distribute the workload across the cluster so that no one node has to perform the task all the time. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-04-03ocfs2: fix rare stale inode errors when exporting via nfswengang wang1-0/+46
For nfs exporting, ocfs2_get_dentry() returns the dentry for fh. ocfs2_get_dentry() may read from disk when the inode is not in memory, without any cross cluster lock. this leads to the file system loading a stale inode. This patch fixes above problem. Solution is that in case of inode is not in memory, we get the cluster lock(PR) of alloc inode where the inode in question is allocated from (this causes node on which deletion is done sync the alloc inode) before reading out the inode itsself. then we check the bitmap in the group (the inode in question allcated from) to see if the bit is clear. if it's clear then it's stale. if the bit is set, we then check generation as the existing code does. We have to read out the inode in question from disk first to know its alloc slot and allot bit. And if its not stale we read it out using ocfs2_iget(). The second read should then be from cache. And also we have to add a per superblock nfs_sync_lock to cover the lock for alloc inode and that for inode in question. this is because ocfs2_get_dentry() and ocfs2_delete_inode() lock on them in reverse order. nfs_sync_lock is locked in EX mode in ocfs2_get_dentry() and in PR mode in ocfs2_delete_inode(). so that mutliple ocfs2_delete_inode() can run concurrently in normal case. [mfasheh@suse.com: build warning fixes and comment cleanups] Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-02-26ocfs2: Cleanup the lockname print in dlmglue.cSunil Mushran1-3/+8
The dentry lock has a different format than other locks. This patch fixes ocfs2_log_dlm_error() macro to make it print the dentry lock correctly. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-02-02ocfs2: Wakeup the downconvert thread after a successful cancel convertSunil Mushran1-0/+4
When two nodes holding PR locks on a resource concurrently attempt to upconvert the locks to EX, the master sends a BAST to one of the nodes. This message tells that node to first cancel convert the upconvert request, followed by downconvert to a NL. Only when this lock is downconverted to NL, can the master upconvert the first node's lock to EX. While the fs was doing the cancel convert, it was forgetting to wake up the dc thread after a successful cancel, leading to a deadlock. Reported-and-Tested-by: David Teigland <teigland@redhat.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-08fix similar typos to successfullColy Li1-2/+2
When I review ocfs2 code, find there are 2 typos to "successfull". After doing grep "successfull " in kernel tree, 22 typos found totally -- great minds always think alike :) This patch fixes all the similar typos. Thanks for Randy's ack and comments. Signed-off-by: Coly Li <coyli@suse.de> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Roland Dreier <rolandd@cisco.com> Cc: Jeremy Kerr <jk@ozlabs.org> Cc: Jeff Garzik <jeff@garzik.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Vlad Yasevich <vladislav.yasevich@hp.com> Cc: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-05ocfs2: remove unneeded lvb castsMark Fasheh1-7/+5
dlmglue.c has lots of code which casts the return value of ocfs2_dlm_lvb(). This is pointless however, as ocfs2_dlm_lvb() returns void *. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2: Fix ocfs2_read_quota_block() error handling.Joel Becker1-3/+3
ocfs2_bread() has become ocfs2_read_virt_blocks(), with a prototype to match ocfs2_read_blocks(). The quota code, converting from ocfs2_bread(), wraps the call to ocfs2_read_virt_blocks() in ocfs2_read_quota_block(). Unfortunately, the prototype of ocfs2_read_quota_block() matches the old prototype of ocfs2_bread(). The problem is that ocfs2_bread() returned the buffer head, and callers assumed that a NULL pointer was indicative of error. It wasn't. This is why ocfs2_bread() took an int*err argument as well. The new prototype of ocfs2_read_virt_blocks() avoids this error handling confusion. Let's change ocfs2_read_quota_block() to match. Signed-off-by: Joel Becker <joel.becker@oracle.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2: Implementation of local and global quota file handlingJan Kara1-0/+146
For each quota type each node has local quota file. In this file it stores changes users have made to disk usage via this node. Once in a while this information is synced to global file (and thus with other nodes) so that limits enforcement at least aproximately works. Global quota files contain all the information about usage and limits. It's mostly handled by the generic VFS code (which implements a trie of structures inside a quota file). We only have to provide functions to convert structures from on-disk format to in-memory one. We also have to provide wrappers for various quota functions starting transactions and acquiring necessary cluster locks before the actual IO is really started. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05ocfs2: Wrap inode block reads in a dedicated function.Joel Becker1-8/+4
The ocfs2 code currently reads inodes off disk with a simple ocfs2_read_block() call. Each place that does this has a different set of sanity checks it performs. Some check only the signature. A couple validate the block number (the block read vs di->i_blkno). A couple others check for VALID_FL. Only one place validates i_fs_generation. A couple check nothing. Even when an error is found, they don't all do the same thing. We wrap inode reading into ocfs2_read_inode_block(). This will validate all the above fields, going readonly if they are invalid (they never should be). ocfs2_read_inode_block_full() is provided for the places that want to pass read_block flags. Every caller is passing a struct inode with a valid ip_blkno, so we don't need a separate blkno argument either. We will remove the validation checks from the rest of the code in a later commit, as they are no longer necessary. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-12-01ocfs2: fix wake_up in unlock_astDavid Teigland1-2/+1
In ocfs2_unlock_ast(), call wake_up() on lockres before releasing the spin lock on it. As soon as the spin lock is released, the lockres can be freed. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-14ocfs2: Simplify ocfs2_read_block()Joel Becker1-6/+2
More than 30 callers of ocfs2_read_block() pass exactly OCFS2_BH_CACHED. Only six pass a different flag set. Rather than have every caller care, let's make ocfs2_read_block() take no flags and always do a cached read. The remaining six places can call ocfs2_read_blocks() directly. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-10-14ocfs2: Require an inode for ocfs2_read_block(s)().Joel Becker1-5/+4
Now that synchronous readers are using ocfs2_read_blocks_sync(), all callers of ocfs2_read_blocks() are passing an inode. Use it unconditionally. Since it's there, we don't need to pass the ocfs2_super either. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-07-14ocfs2: fix printk format warnings with OCFS2_FS_STATS=nRandy Dunlap1-4/+4
Fix printk format warnings when OCFS2_FS_STATS=n: linux-next-20080528/fs/ocfs2/dlmglue.c: In function 'ocfs2_dlm_seq_show': linux-next-20080528/fs/ocfs2/dlmglue.c:2623: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'int' linux-next-20080528/fs/ocfs2/dlmglue.c:2623: warning: format '%llu' expects type 'long long unsigned int', but argument 4 has type 'int' linux-next-20080528/fs/ocfs2/dlmglue.c:2623: warning: format '%llu' expects type 'long long unsigned int', but argument 7 has type 'int' linux-next-20080528/fs/ocfs2/dlmglue.c:2623: warning: format '%llu' expects type 'long long unsigned int', but argument 8 has type 'int' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-07-14[PATCH 2/2] ocfs2: Instrument fs cluster locksSunil Mushran1-1/+121
This patch adds code to track the number of times the fs takes various cluster locks as well as the times associated with it. The information is made available to users via debugfs. This patch was originally written by Jan Kara <jack@suse.cz>. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-07-10ocfs2: Fix flags in ocfs2_file_lockMark Fasheh1-7/+7
The stack-glue merge changed the way we use flags in dlmglue in that we now use the fs/dlm equivalents. Unfortunately, a merge error left the new flock code only partially updated. This took a while to show up though, because the lock level constants are actually identical between o2dlm and fs/dlm. The *_CONVERT and *_NOQUEUE flags have different values though, which is eventually causing a crash in flags_to_o2dlm(). Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18ocfs2: Add the 'cluster_stack' sysfs file.Joel Becker1-1/+2
Userspace can now query and specify the cluster stack in use via the /sys/fs/ocfs2/cluster_stack file. By default, it is 'o2cb', which is the classic stack. Thus, old tools that do not know how to modify this file will work just fine. The stack cannot be modified if there is a live filesystem. ocfs2_cluster_connect() now takes the expected cluster stack as an argument. This way, the filesystem and the stack glue ensure they are speaking to the same backend. If the stack is 'o2cb', the o2cb stack plugin is used. For any other value, the fsdlm stack plugin is selected. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18ocfs2: Break out stackglue into modules.Joel Becker1-3/+4
We define the ocfs2_stack_plugin structure to represent a stack driver. The o2cb stack code is split into stack_o2cb.c. This becomes the ocfs2_stack_o2cb.ko module. The stackglue generic functions are similarly split into the ocfs2_stackglue.ko module. This module now provides an interface to register drivers. The ocfs2_stack_o2cb driver registers itself. As part of this interface, ocfs2_stackglue can load drivers on demand. This is accomplished in ocfs2_cluster_connect(). ocfs2_cluster_disconnect() is now notified when a _hangup() is pending. If a hangup is pending, it will not release the driver module and will let _hangup() do that. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2008-04-18ocfs2: Clean up stackglue initializationJoel Becker1-7/+2
The stack glue initialization function needs a better name so that it can be used cleanly when stackglue becomes a module. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18ocfs2: Abstract out a debugging function for underlying dlms.Joel Becker1-2/+1
dlmglue.c was still referencing a raw o2dlm lksb in one instance. Let's create a generic ocfs2_dlm_dump_lksb() function. This allows underlying DLMs to print whatever they want about their lock. We then move the o2dlm dump into stackglue.c where it belongs. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18ocfs2: handle async EAGAIN from NOQUEUE requestDavid Teigland1-4/+23
When using fsdlm, -EAGAIN is returned in the async callback for NOQUEUE requests. Fix up dlmglue to expect this. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18ocfs2: Remove CANCELGRANT from the view of dlmglue.Joel Becker1-28/+171
o2dlm has the non-standard behavior of providing a cancel callback (unlock_ast) even when the cancel has failed (the locking operation succeeded without canceling). This is called CANCELGRANT after the status code sent to the callback. fs/dlm does not provide this callback, so dlmglue must be changed to live without it. o2dlm_unlock_ast_wrapper() in stackglue now ignores CANCELGRANT calls. Because dlmglue no longer sees CANCELGRANT, ocfs2_unlock_ast() no longer needs to check for it. ocfs2_locking_ast() must catch that a cancel was tried and clear the cancel state. Making these changes opens up a locking race. dlmglue uses the the OCFS2_LOCK_BUSY flag to ensure only one thread is calling the dlm at any one time. But dlmglue must unlock the lockres before calling into the dlm. In the small window of time between unlocking the lockres and calling the dlm, the downconvert thread can try to cancel the lock. The downconvert thread is checking the OCFS2_LOCK_BUSY flag - it doesn't know that ocfs2_dlm_lock() has not yet been called. Because ocfs2_dlm_lock() has not yet been called, the cancel operation will just be a no-op. There's nothing to cancel. With CANCELGRANT, dlmglue uses the CANCELGRANT callback to clear up the cancel state. When it comes around again, it will retry the cancel. Eventually, the first thread will have called into ocfs2_dlm_lock(), and either the lock or the cancel will succeed. The downconvert thread can then do its downconvert. Without CANCELGRANT, there is nothing to clean up the cancellation state. The downconvert thread does not know to retry its operations. More importantly, the original lock may be blocking on the other node that is trying to cancel us. With neither able to make progress, the ast is never called and the cancellation state is never cleaned up that way. dlmglue is deadlocked. The OCFS2_LOCK_PENDING flag is introduced to remedy this window. It is set at the same time OCFS2_LOCK_BUSY is. Thus, the downconvert thread can check whether the lock is cancelable. If not, it just loops around to try again. Once ocfs2_dlm_lock() is called, the thread then clears OCFS2_LOCK_PENDING and wakes the downconvert thread. Now, if the downconvert thread finds the lock BUSY, it can safely try to cancel it. Whether the cancel works or not, the state will be properly set and the lock processing can continue. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18ocfs2: Fill node number during cluster stack initMark Fasheh1-1/+12
It doesn't make sense to query for a node number before connecting to the cluster stack. This should be safe to do because node_num is only just printed, and we're actually only moving the setting of node num a small amount further in the mount process. [ Disconnect when node query fails -- Joel ] Reviewed-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18ocfs2: Move o2hb functionality into the stack glue.Joel Becker1-4/+0
The last bit of classic stack used directly in ocfs2 code is o2hb. Specifically, the check for heartbeat during mount and the call to ocfs2_hb_ctl during unmount. We create an extra API, ocfs2_cluster_hangup(), to encapsulate the call to ocfs2_hb_ctl. Other stacks will just leave hangup() empty. The check for heartbeat is moved into ocfs2_cluster_connect(). It will be matched by a similar check for other stacks. With this change, only stackglue.c includes cluster/ headers. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18ocfs2: Introduce the new ocfs2_cluster_connect/disconnect() API.Joel Becker1-49/+48
This step introduces a cluster stack agnostic API for initializing and exiting. fs/ocfs2/dlmglue.c no longer uses o2cb/o2dlm knowledge to connect to the stack. It is all handled in stackglue.c. heartbeat.c no longer needs to know how it gets called. ocfs2_do_node_down() is now a clean recovery trigger. The big gotcha is the ordering of initializations and de-initializations done underneath ocfs2_cluster_connect(). ocfs2_dlm_init() used to do all o2dlm initialization in one block. Thus, the o2dlm functionality of ocfs2_cluster_connect() is very straightforward. ocfs2_dlm_shutdown(), however, did a few things between de-registration of the eviction callback and actually shutting down the domain. Now de-registration and shutdown of the domain are wrapped within the single ocfs2_cluster_disconnect() call. I've checked the code paths to make sure we can safely tear down things in ocfs2_dlm_shutdown() before calling ocfs2_cluster_disconnect(). The filesystem has already set itself to ignore the callback. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18ocfs2: Create the lock status block union.Joel Becker1-10/+13
Wrap the lock status block (lksb) in a union. Later we will add a union element for the fs/dlm lksb. Create accessors for the status and lvb fields. Other than a debugging function, dlmglue.c does not directly reference the o2dlm locking path anymore. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18ocfs2: Use -errno instead of dlm_status for ocfs2_dlm_lock/unlock() API.Joel Becker1-66/+50
Change the ocfs2_dlm_lock/unlock() functions to return -errno values. This is the first step towards elminiating dlm_status in fs/ocfs2/dlmglue.c. The change also passes -errno values to ->unlock_ast(). [ Fix a return code in dlmglue.c and change the error translation table into an array of ints. --Mark ] Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18ocfs2: Use global DLM_ constants in generic code.Joel Becker1-70/+70
The ocfs2 generic code should use the values in <linux/dlmconstants.h>. stackglue.c will convert them to o2dlm values. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18ocfs2: Separate out dlm lock functions.Joel Becker1-49/+61
This is the first in a series of patches to isolate ocfs2 from the underlying cluster stack. Here we wrap the dlm locking functions with ocfs2-specific calls. Because ocfs2 always uses the same dlm lock status callbacks, we can eliminate the callbacks from the filesystem visible functions. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18ocfs2: Change the recovery map to an array of node numbers.Joel Becker1-4/+2
The old recovery map was a bitmap of node numbers. This was sufficient for the maximum node number of 254. Going forward, we want node numbers to be UINT32. Thus, we need a new recovery map. Note that we can't keep track of slots here. We must write down the node number to recovery *before* we get the locks needed to convert a node number into a slot number. The recovery map is now an array of unsigned ints, max_slots in size. It moves to journal.c with the rest of recovery. Because it needs to be initialized, we move all of recovery initialization into a new function, ocfs2_recovery_init(). This actually cleans up ocfs2_initialize_super() a little as well. Following on, recovery cleaup becomes part of ocfs2_recovery_exit(). A number of node map functions are rendered obsolete and are removed. Finally, waiting on recovery is wrapped in a function rather than naked checks on the recovery_event. This is a cleanup from Mark. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18ocfs2: Move slot map access into slot_map.cMark Fasheh1-7/+1
journal.c and dlmglue.c would refresh the slot map by hand. Instead, have the update and clear functions do the work inside slot_map.c. The eventual result is to make ocfs2_slot_info defined privately in slot_map.c Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-03-10[PATCH] [OCFS2]: constify function pointer tablesJan Engelhardt1-1/+1
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-03[2.6 patch] make ocfs2_downconvert_thread() staticAdrian Bunk1-1/+1
This patch makes the needlessly global ocfs2_downconvert_thread() static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-03[2.6 patch] fs/ocfs2/: possible cleanupsAdrian Bunk1-2/+2
This patch contains the following cleanups that are now possible: - make the following needlessly global functions static: - dlmglue.c:ocfs2_process_blocked_lock() - heartbeat.c:ocfs2_node_map_init() - #if 0 the following unused global function plus support functions: - heartbeat.c:ocfs2_node_map_is_only() Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-03ocfs2: Fix writeout in ocfs2_data_convert_worker()Mark Fasheh1-1/+1
Commit f1f540688eae66c274ff1c1133b5d9c687b28f58 "optimized" ocfs2_data_convert_worker() to "only do work for regular files". Unfortunately, I left out a '!', which casued it to *skip* regular files. This was hidden from testing until recently because the default data journaling mode (data=ordered) doesn't exercise this code. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2008-02-06ocfs2: Negotiate locking protocol versions.Joel Becker1-1/+28
Currently, when ocfs2 nodes connect via TCP, they advertise their compatibility level. If the versions do not match, two nodes cannot speak to each other and they disconnect. As a result, this provides no forward or backwards compatibility. This patch implements a simple protocol negotiation at the dlm level by introducing a major/minor version number scheme for entities that communicate. Specifically, o2dlm has a major/minor version for interaction with o2dlm on other nodes, and ocfs2 itself has a major/minor version for interacting with the filesystem on other nodes. This will allow rolling upgrades of ocfs2 clusters when changes to the locking or network protocols can be done in a backwards compatible manner. In those cases, only the minor number is changed and the negotatied protocol minor is returned from dlm join. In the far less likely event that a required protocol change makes backwards compatibility impossible, we simply bump the major number. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25[PATCH 1/2] ocfs2: add flock lock typeMark Fasheh1-0/+267
This adds a new dlmglue lock type which is intended to back flock() requests. Since these locks are driven from userspace, usage rules are much more liberal than the typical Ocfs2 internal cluster lock. As a result, we can't make use of most dlmglue features - lock caching and lock level optimizations in particular. Additionally, userspace is free to deadlock itself, so we have to deal with that in the same way as the rest of the kernel - by allowing a signal to abort a lock request. In order to keep ocfs2_cluster_lock() complexity down, ocfs2_file_lock() does it's own dlm coordination. We still use the same helper functions though, so duplicated code is kept to a minimum. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25ocfs2: Rename ocfs2_meta_[un]lockMark Fasheh1-25/+25
Call this the "inode_lock" now, since it covers both data and meta data. This patch makes no functional changes. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25ocfs2: Remove data locksMark Fasheh1-104/+0
The meta lock now covers both meta data and data, so this just removes the now-redundant data lock. Combining locks saves us a round of lock mastery per inode and one less lock to ping between nodes during read/write. We don't lose much - since meta locks were always held before a data lock (and at the same level) ordered writeout mode (the default) ensured that flushing for the meta data lock also pushed out data anyways. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25ocfs2: Add data downconvert worker to inode lockMark Fasheh1-0/+5
In order to extend inode lock coverage to inode data, we use the same data downconvert worker with only a small modification to only do work for regular files. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25ocfs2: Remove mount/unmount votesMark Fasheh1-36/+128
The node maps that are set/unset by these votes are no longer relevant, thus we can remove the mount and umount votes. Since those are the last two remaining votes, we can also remove the entire vote infrastructure. The vote thread has been renamed to the downconvert thread, and the small amount of functionality related to managing it has been moved into fs/ocfs2/dlmglue.c. All references to votes have been removed or updated. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-06ocfs2: Create locks at initially requested levelMark Fasheh1-14/+9
If we have not yet created a cluster lock, ocfs2_cluster_lock() will first create it at NLMODE, and then convert the lock to either PRMODE or EXMODE (whichever is requested). Change ocfs2_cluster_lock() to just create the lock at the initially requested level. ocfs2_locking_ast() handles this case fine, so the only update required was in setup of locking state. This should reduce the number of network messages required for a new lock by one, providing an incremental performance enhancement. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-06[PATCH] Fix priority mistakes in fs/ocfs2/{alloc.c, dlmglue.c}Roel Kluin1-1/+1
Fixes priority mistakes similar to '!x & y' Signed-off-by: Roel Kluin <12o3l@tiscali.nl> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-12ocfs2: Structure updates for inline dataMark Fasheh1-0/+2
Add the disk, network and memory structures needed to support data in inode. Struct ocfs2_inline_data is defined and embedded in ocfs2_dinode for storing inline data. A new inode field, i_dyn_features, is added to facilitate tracking of dynamic inode state. Since it will be used often, we want to mirror it on ocfs2_inode_info, and transfer it via the meta data lvb. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-07-10[PATCH] ocfs2: use list_for_each_entry where beneficalChristoph Hellwig1-4/+2
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-08header cleaning: don't include smp_lock.h when not usedRandy Dunlap1-1/+0
Remove includes of <linux/smp_lock.h> where it is not used/needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-02[PATCH] fs/ocfs2/: make 3 functions staticAdrian Bunk1-25/+29
This patch makes the following needlessly global functions static: - aops.c: ocfs2_write_data_page() - dlmglue.c: ocfs2_dump_meta_lvb_info() - file.c: ocfs2_set_inode_size() Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: Cache extent recordsMark Fasheh1-0/+2
The extent map code was ripped out earlier because of an inability to deal with holes. This patch adds back a simpler caching scheme requiring far less code. Our old extent map caching was designed back when meta data block caching in Ocfs2 didn't work very well, resulting in many disk reads. These days our metadata caching is much better, resulting in no un-necessary disk reads. As a result, extent caching doesn't have to be as fancy, nor does it have to cache as many extents. Keeping the last 3 extents seen should be sufficient to give us a small performance boost on some streaming workloads. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: Fix up i_blocks calculation to know about holesMark Fasheh1-2/+1
Older file systems which didn't support holes did a dumb calculation of i_blocks based on i_size. This is no longer accurate, so fix things up to take actual allocation into account. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: temporarily remove extent map cachingMark Fasheh1-4/+0
The code in extent_map.c is not prepared to deal with a subtree being rotated between lookups. This can happen when filling holes in sparse files. Instead of a lengthy patch to update the code (which would likely lose the benefit of caching subtree roots), we remove most of the algorithms and implement a simple path based lookup. A less ambitious extent caching scheme will be added in a later patch. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: Remove delete inode voteTiger Yang1-2/+117
Ocfs2 currently does cluster-wide node messaging to check the open state of an inode during delete. This patch removes that mechanism in favor of an inode cluster lock which is taken at shared read when an inode is first read and dropped in clear_inode(). This allows a deleting node to test the liveness of an inode by attempting to take an exclusive lock. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2: Local mounts should skip inode updatesMark Fasheh1-12/+9
We don't want the extent map and uptodate cache destruction in ocfs2_meta_lock_update() on a local mount, so skip that. This fixes several bugs with uptodate being cleared on buffers and extent maps being corrupted. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-28ocfs2: always unmap in ocfs2_data_convert_worker()Mark Fasheh1-1/+9
Mmap-heavy clustered workloads were sometimes finding stale data on mmap reads. The solution is to call unmap_mapping_range() on any down convert of a data lock. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-07ocfs2: local mountsSunil Mushran1-20/+59
This allows users to format an ocfs2 file system with a special flag, OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT. When the file system sees this flag, it will not use any cluster services, nor will it require a cluster configuration, thus acting like a 'local' file system. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01ocfs2: core atime update functionsTiger Yang1-0/+39
This patch adds the core routines for updating atime in ocfs2. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01ocfs2: remove unused handle argument from ocfs2_meta_lock_full()Mark Fasheh1-5/+2
Now that this is unused and all callers pass NULL, we can safely remove it. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01ocfs2: remove unused ocfs2_handle_add_lock()Mark Fasheh1-6/+0
This gets us rid of a slab we no longer need, as well as removing the majority of what's left on ocfs2_journal_handle. ocfs2_commit_unstarted_handle() has no more real work to do, so remove that function too. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01[2.6 patch] make ocfs2_create_new_lock() staticAdrian Bunk1-4/+4
This patch makes the needlessly global ocfs2_create_new_lock() static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-27[PATCH] inode_diet: Replace inode.u.generic_ip with inode.i_privateTheodore Ts'o1-1/+1
The following patches reduce the size of the VFS inode structure by 28 bytes on a UP x86. (It would be more on an x86_64 system). This is a 10% reduction in the inode size on a UP kernel that is configured in a production mode (i.e., with no spinlock or other debugging functions enabled; if you want to save memory taken up by in-core inodes, the first thing you should do is disable the debugging options; they are responsible for a huge amount of bloat in the VFS inode structure). This patch: The filesystem or device-specific pointer in the inode is inside a union, which is pretty pointless given that all 30+ users of this field have been using the void pointer. Get rid of the union and rename it to i_private, with a comment to explain who is allowed to use the void pointer. This is just a cleanup, but it allows us to reuse the union 'u' for something something where the union will actually be used. [judith@osdl.org: powerpc build fix] Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Judith Lebzelter <judith@osdl.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-24ocfs2: Teach ocfs2_drop_lock() to use ->set_lvb() callbackMark Fasheh1-31/+28
With this, we don't need to pass an additional struct with function pointer. Now that the callbacks are fully used, comment the remaining API. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24ocfs2: Remove ->unblock lockres operationMark Fasheh1-146/+6
Have ocfs2_process_blocked_lock() call ocfs2_generic_unblock_lock(), which gets to be ocfs2_unblock_lock() now that it's the only possible unblock function. Remove the ->unblock() callback from the structure, and all lock type specific unblock functions. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24ocfs2: move downconvert worker to lockres opsMark Fasheh1-18/+32
This way lock types don't have to manually pass it to ocfs2_generic_unblock_lock(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24ocfs2: Remove unused dlmglue functionsMark Fasheh1-103/+0
The meta data unblocking code no longer needs ocfs2_do_unblock_meta() or ocfs2_can_downconvert_meta_lock(), so remove them. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24ocfs2: Have the metadata lock use generic dlmglue functionsMark Fasheh1-1/+31
Fill in the ->check_downconvert and ->set_lvb callbacks with meta data specific operations and switch ocfs2_unblock_meta() to call ocfs2_generic_unblock_lock() Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24ocfs2: Add ->set_lvb callback in dlmglueMark Fasheh1-2/+29
This allows a lock type to set the value block before downconvert. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24ocfs2: Add ->check_downconvert callback in dlmglueMark Fasheh1-1/+18
This will allow lock types to force a requeue of a lock downconvert. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24ocfs2: Check for refreshing locks in generic unblock functionMark Fasheh1-12/+19
Tidy up the exit path a bit too. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24ocfs2: don't unconditionally pass LVB flagsMark Fasheh1-3/+15
Allow a lock type to specifiy whether it makes use of the LVB. The only type which does this right now is the meta data lock. This should save us some space on network messages since they won't have to needlessly transmit value blocks. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24ocfs2: combine inode and generic blocking AST functionsMark Fasheh1-112/+11
There is extremely little difference between the two now. We can remove the callback from ocfs2_lock_res_ops as well. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24ocfs2: Add ->get_osb() dlmglue locking operationMark Fasheh1-0/+33
Will be used to find the ocfs2_super structure from a given lockres. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24ocfs2: remove ->unlock_ast() callback from ocfs2_lock_res_opsMark Fasheh1-13/+3
This was always defined to the same function in all locks, so clean things up by removing and passing ocfs2_unlock_ast() directly to the DLM. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24ocfs2: combine inode and generic AST functionsMark Fasheh1-110/+10
There is extremely little difference between the two now. We can remove the callback from ocfs2_lock_res_ops as well. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24ocfs2: Clean up lock resource refresh flagsMark Fasheh1-14/+35
Use of the refresh mechanism is lock-type wide, so move knowledge of that to the ocfs2_lock_res_ops structure. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24ocfs2: Remove i_generation from inode lock namesMark Fasheh1-6/+36
OCFS2 puts inode meta data in the "lock value block" provided by the DLM. Typically, i_generation is encoded in the lock name so that a deleted inode on and a new one in the same block don't share the same lvb. Unfortunately, that scheme means that the read in ocfs2_read_locked_inode() is potentially thrown away as soon as the meta data lock is taken - we cannot encode the lock name without first knowing i_generation, which requires a disk read. This patch encodes i_generation in the inode meta data lvb, and removes the value from the inode meta data lock name. This way, the read can be covered by a lock, and at the same time we can distinguish between an up to date and a stale LVB. This will help cold-cache stat(2) performance in particular. Since this patch changes the protocol version, we take the opportunity to do a minor re-organization of two of the LVB fields. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24ocfs2: Encode i_generation in the meta data lvbMark Fasheh1-5/+9
When i_generation is removed from the lockname, this will help us determine whether a meta data lvb has information that is in sync with the local struct inode. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24ocfs2: Free up some space in the lvbMark Fasheh1-3/+3
lvb_version doesn't need to be a whole 32 bits. Make it an 8 bit field to free up some space. This should be backwards compatible until we use one of the fields, in which case we'd bump the lvb version anyway. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24ocfs2: Add new cluster lock typeMark Fasheh1-104/+371
Replace the dentry vote mechanism with a cluster lock which covers a set of dentries. This allows us to force d_delete() only on nodes which actually care about an unlink. Every node that does a ->lookup() gets a read only lock on the dentry, until an unlink during which the unlinking node, will request an exclusive lock, forcing the other nodes who care about that dentry to d_delete() it. The effect is that we retain a very lightweight ->d_revalidate(), and at the same time get to make large improvements to the average case performance of the ocfs2 unlink and rename operations. This patch adds the cluster lock type which OCFS2 can attach to dentries. A small number of fs/ocfs2/dcache.c functions are stubbed out so that this change can compile. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-24ocfs2: Update dlmglue for new dlmlock() APIMark Fasheh1-0/+3
File system lock names are very regular right now, so we really only need to pass an extra parameter to dlmlock(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-09-20ocfs2: add ext2 attributesHerbert Poetzl1-2/+7
Support immutable, and other attributes. Some renaming and other minor fixes done by myself. Signed-off-by: Herbert Poetzl <herbert@13thfloor.at> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-29ocfs2: clean up some osb fieldsMark Fasheh1-2/+1
Get rid of osb->uuid, osb->proc_sub_dir, and osb->osb_id. Those fields were unused, or could easily be removed. As a result, we also no longer need MAX_OSB_ID or ocfs2_globals_lock. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-27[PATCH] spin/rwlock init cleanupsIngo Molnar1-1/+1
locking init cleanups: - convert " = SPIN_LOCK_UNLOCKED" to spin_lock_init() or DEFINE_SPINLOCK() - convert rwlocks in a similar manner this patch was generated automatically. Motivation: - cleanliness - lockdep needs control of lock initialization, which the open-coded variants do not give - it's also useful for -rt and for lock debugging in general Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28[PATCH] Make most file operations structs in fs/ constArjan van de Ven1-1/+1
This is a conversion to make the various file_operations structs in fs/ const. Basically a regexp job, with a few manual fixups The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24ocfs2: don't use MLF* in the file systemMark Fasheh1-49/+52
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-01-03[PATCH] OCFS2: The Second Oracle Cluster FilesystemMark Fasheh1-0/+2904
The OCFS2 file system module. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>