aboutsummaryrefslogtreecommitdiffstats
path: root/fs/ocfs2/dlm/dlmrecovery.c
AgeCommit message (Collapse)AuthorFilesLines
2022-11-18ocfs2/dlm: use bitmap API instead of hand-writing itJoseph Qi1-1/+1
Use bitmap_zero/bitmap_copy/bitmap_qeual directly for bitmap operations. Link: https://lkml.kernel.org/r/20221007124846.186453-3-joseph.qi@linux.alibaba.com Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jun Piao <piaojun@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-01-15all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriateYury Norov1-1/+1
find_first{,_zero}_bit is a more effective analogue of 'next' version if start == 0. This patch replaces 'next' with 'first' where things look trivial. Signed-off-by: Yury Norov <yury.norov@gmail.com> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2021-11-06ocfs2/dlm: remove redundant assignment of variable retColin Ian King1-1/+0
The variable ret is being assigned a value that is never read, it is updated later on with a different value. The assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Link: https://lkml.kernel.org/r/20211007233452.30815-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-07treewide: remove editor modelines and cruftMasahiro Yamada1-3/+1
The section "19) Editor modelines and other cruft" in Documentation/process/coding-style.rst clearly says, "Do not include any of these in source files." I recently receive a patch to explicitly add a new one. Let's do treewide cleanups, otherwise some people follow the existing code and attempt to upstream their favoriate editor setups. It is even nicer if scripts/checkpatch.pl can check it. If we like to impose coding style in an editor-independent manner, I think editorconfig (patch [1]) is a saner solution. [1] https://lore.kernel.org/lkml/20200703073143.423557-1-danny@kdrag0n.dev/ Link: https://lkml.kernel.org/r/20210324054457.1477489-1-masahiroy@kernel.org Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> [auxdisplay] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-30ocfs2/dlm: remove unused functionJiapeng Chong1-7/+0
Fix the following clang warning: fs/ocfs2/dlm/dlmrecovery.c:129:20: warning: unused function 'dlm_reset_recovery' [-Wunused-function]. Link: https://lkml.kernel.org/r/1618382761-5784-1-git-send-email-jiapeng.chong@linux.alibaba.com Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reported-by: Abaci Robot <abaci@linux.alibaba.com> Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31ocfs2/dlm: remove redundant assignment to retColin Ian King1-1/+1
The variable ret is being initialized with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Addresses Coverity ("Unused value") Link: http://lkml.kernel.org/r/20191202164833.62865-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-31ocfs2: make local header paths relative to C filesMasahiro Yamada1-4/+4
Gang He reports the failure of building fs/ocfs2/ as an external module of the kernel installed on the system: $ cd fs/ocfs2 $ make -C /lib/modules/`uname -r`/build M=`pwd` modules If you want to make it work reliably, I'd recommend to remove ccflags-y from the Makefiles, and to make header paths relative to the C files. I think this is the correct usage of the #include "..." directive. Link: http://lkml.kernel.org/r/20191227022950.14804-1-ghe@suse.com Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Gang He <ghe@suse.com> Reported-by: Gang He <ghe@suse.com> Reviewed-by: Gang He <ghe@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-12ocfs2/dlm: use struct_size() helperGustavo A. R. Silva1-5/+3
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct dlm_migratable_lockres { ... struct dlm_migratable_lock ml[0]; // 16 bytes each, begins at byte 112 }; Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. So, replace the following form: sizeof(struct dlm_migratable_lockres) + (mres->num_locks * sizeof(struct dlm_migratable_lock)) with: struct_size(mres, ml, mres->num_locks) Notice that, in this case, variable sz is not necessary, hence it is removed. This code was detected with the help of Coccinelle. Link: http://lkml.kernel.org/r/20190605204926.GA24467@embeddedor Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 145Thomas Gleixner1-16/+1
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 59 temple place suite 330 boston ma 021110 1307 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 84 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190524100844.756442981@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-05ocfs2/dlm: clean up unused variable in dlm_process_recovery_dataChangwei Ge1-4/+0
Link: http://lkml.kernel.org/r/1522734135-7933-1-git-send-email-ge.changwei@h3c.com Signed-off-by: Changwei Ge <ge.changwei@h3c.com> Acked-by: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05ocfs2/dlm: wait for dlm recovery done when migrating all lock resourcespiaojun1-3/+10
Wait for dlm recovery done when migrating all lock resources in case that new lock resource left after leaving dlm domain. And the left lock resource will cause other nodes BUG. NodeA NodeB NodeC umount: dlm_unregister_domain() dlm_migrate_all_locks() NodeB down do recovery for NodeB and collect a new lockres form other live nodes: dlm_do_recovery dlm_remaster_locks dlm_request_all_locks: dlm_mig_lockres_handler dlm_new_lockres __dlm_insert_lockres at last NodeA become the master of the new lockres and leave domain: dlm_leave_domain() mount: dlm_join_domain() touch file and request for the owner of the new lockres, but all the other nodes said 'NO', so NodeC decide to be the owner, and send do assert msg to other nodes: dlmlock() dlm_get_lock_resource() dlm_do_assert_master() other nodes receive the msg and found two masters exist. at last cause BUG in dlm_assert_master_handler() -->BUG(); Link: http://lkml.kernel.org/r/5AAA6E25.7090303@huawei.com Fixes: bc9838c4d44a ("dlm: allow dlm do recovery during shutdown") Signed-off-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Alex Chen <alex.chen@huawei.com> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> Acked-by: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05ocfs2/dlm: clean up unused argument for dlm_destroy_recovery_area()Changwei Ge1-4/+4
Link: http://lkml.kernel.org/r/1521116681-14602-1-git-send-email-ge.changwei@h3c.com Signed-off-by: Changwei Ge <ge.changwei@h3c.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05fs/ocfs2/dlm/dlmrecovery.c: remove unrelated commentChangwei Ge1-7/+0
Obviously, the comment before dlm_do_local_recovery_cleanup() has nothing to do with it. So remove it. Link: http://lkml.kernel.org/r/1519371054-4648-1-git-send-email-ge.changwei@h3c.com Signed-off-by: Changwei Ge <ge.changwei@h3c.com> Acked-by: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-05ocfs2/dlm: don't handle migrate lockres if already in shutdownJun Piao1-0/+9
We should not handle migrate lockres if we are already in 'DLM_CTXT_IN_SHUTDOWN', as that will cause lockres remains after leaving dlm domain. At last other nodes will get stuck into infinite loop when requsting lock from us. The problem is caused by concurrency umount between nodes. Before receiveing N1's DLM_BEGIN_EXIT_DOMAIN_MSG, N2 has picked up N1 as the migrate target. So N2 will continue sending lockres to N1 even though N1 has left domain. N1 N2 (owner) touch file access the file, and get pr lock begin leave domain and pick up N1 as new owner begin leave domain and migrate all lockres done begin migrate lockres to N1 end leave domain, but the lockres left unexpectedly, because migrate task has passed [piaojun@huawei.com: v3] Link: http://lkml.kernel.org/r/5A9CBD19.5020107@huawei.com Link: http://lkml.kernel.org/r/5A99F028.2090902@huawei.com Signed-off-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Reviewed-by: Changwei Ge <ge.changwei@h3c.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15ocfs2: fix cluster hang after a node diesChangwei Ge1-0/+1
When a node dies, other live nodes have to choose a new master for an existed lock resource mastered by the dead node. As for ocfs2/dlm implementation, this is done by function - dlm_move_lockres_to_recovery_list which marks those lock rsources as DLM_LOCK_RES_RECOVERING and manages them via a list from which DLM changes lock resource's master later. So without invoking dlm_move_lockres_to_recovery_list, no master will be choosed after dlm recovery accomplishment since no lock resource can be found through ::resource list. What's worse is that if DLM_LOCK_RES_RECOVERING is not marked for lock resources mastered a dead node, it will break up synchronization among nodes. So invoke dlm_move_lockres_to_recovery_list again. Fixs: 'commit ee8f7fcbe638 ("ocfs2/dlm: continue to purge recovery lockres when recovery master goes down")' Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373CED6E0F9@H3CMLB14-EX.srv.huawei-3com.com Signed-off-by: Changwei Ge <ge.changwei@h3c.com> Reported-by: Vitaly Mayatskih <v.mayatskih@gmail.com> Tested-by: Vitaly Mayatskikh <v.mayatskih@gmail.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-12ocfs2/dlm: clean up useless BUG_ON default case in dlm_finalize_reco_handler()piaojun1-2/+0
The value of 'stage' must be between 1 and 2, so the switch can't reach the default case. Link: http://lkml.kernel.org/r/57FB5EB2.7050002@huawei.com Signed-off-by: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02ocfs2/dlm: continue to purge recovery lockres when recovery master goes downpiaojun1-8/+21
We found a dlm-blocked situation caused by continuous breakdown of recovery masters described below. To solve this problem, we should purge recovery lock once detecting recovery master goes down. N3 N2 N1(reco master) go down pick up recovery lock and begin recoverying for N2 go down pick up recovery lock failed, then purge it: dlm_purge_lockres ->DROPPING_REF is set send deref to N1 failed, recovery lock is not purged find N1 go down, begin recoverying for N1, but blocked in dlm_do_recovery as DROPPING_REF is set: dlm_do_recovery ->dlm_pick_recovery_master ->dlmlock ->dlm_get_lock_resource ->__dlm_wait_on_lockres_flags(tmpres, DLM_LOCK_RES_DROPPING_REF); Fixes: 8c0343968163 ("ocfs2/dlm: clear DROPPING_REF flag when the master goes down") Link: http://lkml.kernel.org/r/578453AF.8030404@huawei.com Signed-off-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Jiufei Xue <xuejiufei@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25ocfs2/dlm: fix BUG in dlm_move_lockres_to_recovery_listJoseph Qi1-1/+0
When master handles convert request, it queues ast first and then returns status. This may happen that the ast is sent before the request status because the above two messages are sent by two threads. And right after the ast is sent, if master down, it may trigger BUG in dlm_move_lockres_to_recovery_list in the requested node because ast handler moves it to grant list without clear lock->convert_pending. So remove BUG_ON statement and check if the ast is processed in dlmconvert_remote. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reported-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Tariq Saeed <tariq.x.saeed@oracle.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15ocfs2: fix a tiny race that leads file system read-onlyJiufei Xue1-0/+8
when o2hb detect a node down, it first set the dead node to recovery map and create ocfs2rec which will replay journal for dead node. o2hb thread then call dlm_do_local_recovery_cleanup() to delete the lock for dead node. After the lock of dead node is gone, locks for other nodes can be granted and may modify the meta data without replaying journal of the dead node. The detail is described as follows. N1 N2 N3(master) modify the extent tree of inode, and commit dirty metadata to journal, then goes down. o2hb thread detects N1 goes down, set recovery map and delete the lock of N1. dlm_thread flush ast for the lock of N2. do not detect the death of N1, so recovery map is empty. read inode from disk without replaying the journal of N1 and modify the extent tree of the inode that N1 had modified. ocfs2rec recover the journal of N1. The modification of N2 is lost. The modification of N1 and N2 are not serial, and it will lead to read-only file system. We can set recovery_waiting flag to the lock resource after delete the lock for dead node to prevent other node from getting the lock before dlm recovery. After dlm recovery, the recovery map on N2 is not empty, ocfs2_inode_lock_full_nested() will wait for ocfs2 recovery. Signed-off-by: Jiufei Xue <xuejiufei@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15ocfs2/dlm: return EINVAL when the lockres on migration target is in ↵xuejiufei1-1/+13
DROPPING_REF state If master migrate this lock resource to node when it happened to purge it, a new lock resource will be created and inserted into hash list. If then master goes down, the lock resource being purged is recovered, so there exist two lock resource with different owner. So return error to master if the lock resource is in DROPPING state, master will retry to migrate this lock resource. Signed-off-by: xuejiufei <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-15ocfs2/dlm: clear DROPPING_REF flag when the master goes downxuejiufei1-8/+10
If the master goes down after return in-progress for deref message. The lock resource on non-master node can not be purged. Clear the DROPPING_REF flag and recovery it. Signed-off-by: xuejiufei <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-05ocfs2/dlm: clear refmap bit of recovery lock while doing local recovery cleanupxuejiufei1-0/+2
When recovery master down, dlm_do_local_recovery_cleanup() only remove the $RECOVERY lock owned by dead node, but do not clear the refmap bit. Which will make umount thread falling in dead loop migrating $RECOVERY to the dead node. Signed-off-by: xuejiufei <xuejiufei@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14ocfs2: dlm: remove redundant codeJunxiao Bi1-5/+1
Found this when do patch review, remove to make it clear and save a little cpu time. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14ocfs2/dlm: fix a race between purge and migrationXue jiufei1-1/+8
We found a race between purge and migration when doing code review. Node A put lockres to purgelist before receiving the migrate message from node B which is the master. Node A call dlm_mig_lockres_handler to handle this message. dlm_mig_lockres_handler dlm_lookup_lockres >>>>>> race window, dlm_run_purge_list may run and send deref message to master, waiting the response spin_lock(&res->spinlock); res->state |= DLM_LOCK_RES_MIGRATING; spin_unlock(&res->spinlock); dlm_mig_lockres_handler returns >>>>>> dlm_thread receives the response from master for the deref message and triggers the BUG because the lockres has the state DLM_LOCK_RES_MIGRATING with the following message: dlm_purge_lockres:209 ERROR: 6633EB681FA7474A9C280A4E1A836F0F: res M0000000000000000030c0300000000 in use after deref Signed-off-by: Jiufei Xue <xuejiufei@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05ocfs2: add uuid to ocfs2 thread name for problem analysisJoseph Qi1-1/+1
A node can mount multiple ocfs2 volumes. And if thread names are same for each volume/domain, it will bring inconvenience when analyzing problems because we have to identify which volume/domain the messages belong to. Since thread name will be printed to messages, so add volume uuid or dlm name to thread name can benefit problem analysis. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Gang He <ghe@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-23ocfs2/dlm: unlock lockres spinlock before dlm_lockres_putJoseph Qi1-1/+1
dlm_lockres_put will call dlm_lockres_release if it is the last reference, and then it may call dlm_print_one_lock_resource and take lockres spinlock. So unlock lockres spinlock before dlm_lockres_put to avoid deadlock. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-22ocfs2/dlm: fix deadlock when dispatch assert masterJoseph Qi1-2/+6
The order of the following three spinlocks should be: dlm_domain_lock < dlm_ctxt->spinlock < dlm_lock_resource->spinlock But dlm_dispatch_assert_master() is called while holding dlm_ctxt->spinlock and dlm_lock_resource->spinlock, and then it calls dlm_grab() which will take dlm_domain_lock. Once another thread (for example, dlm_query_join_handler) has already taken dlm_domain_lock, and tries to take dlm_ctxt->spinlock deadlock happens. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: "Junxiao Bi" <junxiao.bi@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-11revert "ocfs2/dlm: use list_for_each_entry instead of list_for_each"Andrew Morton1-2/+4
Revert commit f83c7b5e9fd6 ("ocfs2/dlm: use list_for_each_entry instead of list_for_each"). list_for_each_entry() will dereference its `pos' argument, which can be NULL in dlm_process_recovery_data(). Reported-by: Julia Lawall <julia.lawall@lip6.fr> Reported-by: Fengguang Wu <fengguang.wu@gmail.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04ocfs2/dlm: use list_for_each_entry instead of list_for_eachJoseph Qi1-4/+2
Use list_for_each_entry instead of list_for_each to simplify code. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-10ocfs2/dlm: add missing dlm_lock_put() when recovery master downXue jiufei1-0/+7
When the recovery master is down, the owner of $RECOVERY calls dlm_do_local_recovery_cleanup() to prune any $RECOVERY entries for dead nodes. The lock is in the granted list and the refcount must be 2. We should put twice to remove this lock. Otherwise, it will lead to a memory leak. Signed-off-by: joyce.xue <xuejiufei@huawei.com> Reported-by: yangwenfang <vicky.yangwenfang@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08ocfs2: remove bogus check in dlm_process_recovery_dataJoseph Qi1-4/+1
In dlm_process_recovery_data, only when dlm_new_lock failed the ret will be set to -ENOMEM. And in this case, newlock is definitely NULL. So test newlock is meaningless, remove it. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Alex Chen <alex.chen@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10ocfs2/dlm: let sender retry if dlm_dispatch_assert_master failed with -ENOMEMJoseph Qi1-5/+13
Do not BUG() if GFP_ATOMIC allocation fails in dlm_dispatch_assert_master. Instead, return -ENOMEM to the sender and then retry. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Alex Chen <alex.chen@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09ocfs2/dlm: call dlm_lockres_put without resource spinlockalex chen1-2/+5
dlm_lockres_put() should be called without &res->spinlock, otherwise a deadlock case may happen. spin_lock(&res->spinlock) ... dlm_lockres_put ->dlm_lockres_release ->dlm_print_one_lock_resource ->spin_lock(&res->spinlock) Signed-off-by: Alex Chen <alex.chen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-23ocfs2/dlm: do not purge lockres that is queued for assert masterXue jiufei1-1/+2
When workqueue is delayed, it may occur that a lockres is purged while it is still queued for master assert. it may trigger BUG() as follows. N1 N2 dlm_get_lockres() ->dlm_do_master_requery is the master of lockres, so queue assert_master work dlm_thread() start running and purge the lockres dlm_assert_master_worker() send assert master message to other nodes receiving the assert_master message, set master to N2 dlmlock_remote() send create_lock message to N2, but receive DLM_IVLOCKID, if it is RECOVERY lockres, it triggers the BUG(). Another BUG() is triggered when N3 become the new master and send assert_master to N1, N1 will trigger the BUG() because owner doesn't match. So we should not purge lockres when it is queued for assert master. Signed-off-by: joyce.xue <xuejiufei@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04ocfs2/dlm: fix possible convert=sion deadlockXue jiufei1-1/+9
We found there is a conversion deadlock when the owner of lockres happened to crash before send DLM_PROXY_AST_MSG for a downconverting lock. The situation is as follows: Node1 Node2 Node3 the owner of lockresA lock_1 granted at EX mode and call ocfs2_cluster_unlock to decrease ex_holders. converting lock_3 from NL to EX send DLM_PROXY_AST_MSG to Node1, asking Node 1 to downconvert. receiving DLM_PROXY_AST_MSG, thread ocfs2dc send DLM_CONVERT_LOCK_MSG to Node2 to downconvert lock_1(EX->NL). lock_1 can be granted and put it into pending_asts list, return DLM_NORMAL. then something happened and Node2 crashed. received DLM_NORMAL, waiting for DLM_PROXY_AST_MSG. selected as the recovery master, receving migrate lock from Node1, queue lock_1 to the tail of converting list. After dlm recovery, converting list in the master of lockresA(Node3) will be: converting list head <-> lock_3(NL->EX) <->lock_1(EX<->NL). Requested mode of lock_3 is not compatible with the granted mode of lock_1, so it can not be granted. and lock_1 can not downconvert because covnerting queue is strictly FIFO. So a deadlock is created. We think function dlm_process_recovery_data() should queue_ast for lock_1 or alter the order of lock_1 and lock_3, so dlm_thread can process lock_1 first. And if there are multiple downconverting locks, they must convert form PR to NL, so no need to sort them. Signed-off-by: joyce.xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03ocfs2: dlm: fix recovery hungJunxiao Bi1-2/+13
There is a race window in dlm_do_recovery() between dlm_remaster_locks() and dlm_reset_recovery() when the recovery master nearly finish the recovery process for a dead node. After the master sends FINALIZE_RECO message in dlm_remaster_locks(), another node may become the recovery master for another dead node, and then send the BEGIN_RECO message to all the nodes included the old master, in the handler of this message dlm_begin_reco_handler() of old master, dlm->reco.dead_node and dlm->reco.new_master will be set to the second dead node and the new master, then in dlm_reset_recovery(), these two variables will be reset to default value. This will cause new recovery master can not finish the recovery process and hung, at last the whole cluster will hung for recovery. old recovery master: new recovery master: dlm_remaster_locks() become recovery master for another dead node. dlm_send_begin_reco_message() dlm_begin_reco_handler() { if (dlm->reco.state & DLM_RECO_STATE_FINALIZE) { return -EAGAIN; } dlm_set_reco_master(dlm, br->node_idx); dlm_set_reco_dead_node(dlm, br->dead_node); } dlm_reset_recovery() { dlm_set_reco_dead_node(dlm, O2NM_INVALID_NODE_NUM); dlm_set_reco_master(dlm, O2NM_INVALID_NODE_NUM); } will hang in dlm_remaster_locks() for request dlm locks info Before send FINALIZE_RECO message, recovery master should set DLM_RECO_STATE_FINALIZE for itself and clear it after the recovery done, this can break the race windows as the BEGIN_RECO messages will not be handled before DLM_RECO_STATE_FINALIZE flag is cleared. A similar race may happen between new recovery master and normal node which is in dlm_finalize_reco_handler(), also fix it. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03ocfs2: dlm: fix lock migration crashJunxiao Bi1-6/+8
This issue was introduced by commit 800deef3f6f8 ("ocfs2: use list_for_each_entry where benefical") in 2007 where it replaced list_for_each with list_for_each_entry. The variable "lock" will point to invalid data if "tmpq" list is empty and a panic will be triggered due to this. Sunil advised reverting it back, but the old version was also not right. At the end of the outer for loop, that list_for_each_entry will also set "lock" to an invalid data, then in the next loop, if the "tmpq" list is empty, "lock" will be an stale invalid data and cause the panic. So reverting the list_for_each back and reset "lock" to NULL to fix this issue. Another concern is that this seemes can not happen because the "tmpq" list should not be empty. Let me describe how. old lock resource owner(node 1): migratation target(node 2): image there's lockres with a EX lock from node 2 in granted list, a NR lock from node x with convert_type EX in converting list. dlm_empty_lockres() { dlm_pick_migration_target() { pick node 2 as target as its lock is the first one in granted list. } dlm_migrate_lockres() { dlm_mark_lockres_migrating() { res->state |= DLM_LOCK_RES_BLOCK_DIRTY; wait_event(dlm->ast_wq, !dlm_lockres_is_dirty(dlm, res)); //after the above code, we can not dirty lockres any more, // so dlm_thread shuffle list will not run downconvert lock from EX to NR upconvert lock from NR to EX <<< migration may schedule out here, then <<< node 2 send down convert request to convert type from EX to <<< NR, then send up convert request to convert type from NR to <<< EX, at this time, lockres granted list is empty, and two locks <<< in the converting list, node x up convert lock followed by <<< node 2 up convert lock. // will set lockres RES_MIGRATING flag, the following // lock/unlock can not run dlm_lockres_release_ast(dlm, res); } dlm_send_one_lockres() dlm_process_recovery_data() for (i=0; i<mres->num_locks; i++) if (ml->node == dlm->node_num) for (j = DLM_GRANTED_LIST; j <= DLM_BLOCKED_LIST; j++) { list_for_each_entry(lock, tmpq, list) if (lock) break; <<< lock is invalid as grant list is empty. } if (lock->ml.node != ml->node) BUG() >>> crash here } I see the above locks status from a vmcore of our internal bug. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13ocfs2: skip locks in the blocked listXue jiufei1-0/+7
A parallel umount on 4 nodes triggered a bug in dlm_process_recovery_date(). Here's the situation: Receiving MIG_LOCKRES message, A node processes the locks in migratable lockres. It copys lvb from migratable lockres when processing the first valid lock. If there is a lock in the blocked list with the EX level, it triggers the BUG. Since valid lvbs are set when locks are granted with EX or PR levels, locks in the blocked list cannot have valid lvbs. Therefore I think we should skip the locks in the blocked list. Signed-off-by: Xuejiufei <xuejiufei@huawei.com> Signed-off-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11ocfs2/dlm: force clean refmap when doing local cleanupXue jiufei1-0/+8
dlm_do_local_recovery_cleanup() should force clean refmap if the owner of lockres is UNKNOWN. Otherwise node may hang when umounting filesystems. Here's the situation: Node1 Node2 dlmlock() -> dlm_get_lock_resource() send DLM_MASTER_REQUEST_MSG to other nodes. trying to master this lockres, return MAYBE. selected as the master of lockresA, set mle->master to Node1, and do assert_master, send DLM_ASSERT_MASTER_MSG to Node2. Node 2 has interest on lockresA and return DLM_ASSERT_RESPONSE_MASTERY_REF then something happened and Node2 crashed. Receiving DLM_ASSERT_RESPONSE_MASTERY_REF, set Node2 into refmap, and keep sending DLM_ASSERT_MASTER_MSG to other nodes o2hb found node2 down, calling dlm_hb_node_down() --> dlm_do_local_recovery_cleanup() the master of lockresA is still UNKNOWN, no need to call dlm_free_dead_locks(). Set the master of lockresA to Node1, but Node2 stills remains in refmap. When Node1 umount, it found that the refmap of lockresA is not empty and attempted to migrate it to Node2, But Node2 is already down, so umount hang, trying to migrate lockresA again and again. Signed-off-by: joyce <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11ocfs2: dlm_request_all_locks() should deal with the status sent from target nodeXue jiufei1-1/+4
dlm_request_all_locks() should deal with the status sent from target node if DLM_LOCK_REQUEST_MSG is sent successfully, or recovery master will fall into endless loop, waiting for other nodes to send locks and DLM_RECO_DATA_DONE_MSG to me. NodeA NodeB selected as recovery master dlm_remaster_locks() ->dlm_request_all_locks() send DLM_LOCK_REQUEST_MSG to nodeA It happened that NodeA cannot alloc memory when it processes this message. dlm_request_all_locks_handler() do not queue dlm_request_all_locks_worker and returns -ENOMEM. It will never send locks and DLM_RECO_DATA_DONE_MSG to NodeB. NodeB do not deal with the status sent from nodeA, and will fall in endless loop waiting for the recovery state of NodeA to be changed. Signed-off-by: joyce <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Jeff Liu <jeff.liu@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03ocfs2: add missing dlm_put() in dlm_begin_reco_handler()Xue jiufei1-0/+1
dlm_begin_reco_handler() returns without putting dlm when dlm recovery state is DLM_RECO_STATE_FINALIZE. Signed-off-by: joyce <xuejiufei@huawei.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Acked-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03fs/ocfs2/dlm/dlmrecovery.c:dlm_request_all_locks(): ret should be int ↵Joseph Qi1-2/+1
instead of enum In dlm_request_all_locks, ret is type enum. But o2net_send_message returns a type int value. Then it will never run into the following error branch. So we should change the ret type from enum to int. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: Sunil Mushran <sunil.mushran@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03fs/ocfs2/dlm/dlmrecovery.c: remove duplicate declarationsJoseph Qi1-3/+0
Below 3 functions have already been declared in dlmcommon.h, so we have no need to declare them again in dlmrecovery.c: dlm_complete_recovery_thread dlm_launch_recovery_thread dlm_kick_recovery_thread Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: Sunil Mushran <sunil.mushran@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-12ocfs2: add missing lockres put in dlm_mig_lockres_handlerXue jiufei1-0/+1
dlm_mig_lockres_handler() is missing a dlm_lockres_put() on an error path. Signed-off-by: joyce <xuejiufei@huawei.com> Reviewed-by: shencanquan <shencanquan@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29ocfs2/dlm: remove redundant null pointer checkSachin Kamat1-4/+2
kfree on a NULL pointer is a no-op. Remove the redundant null pointer check. Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Acked-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27hlist: drop the node parameter from iteratorsSasha Levin1-4/+2
I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S | hlist_for_each_entry_continue(a, - b, c) S | hlist_for_each_entry_from(a, - b, c) S | hlist_for_each_entry_rcu(a, - b, c, d) S | hlist_for_each_entry_rcu_bh(a, - b, c, d) S | hlist_for_each_entry_continue_rcu_bh(a, - b, c) S | for_each_busy_worker(a, c, - b, d) S | ax25_uid_for_each(a, - b, c) S | ax25_for_each(a, - b, c) S | inet_bind_bucket_for_each(a, - b, c) S | sctp_for_each_hentry(a, - b, c) S | sk_for_each(a, - b, c) S | sk_for_each_rcu(a, - b, c) S | sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S | sk_for_each_safe(a, - b, c, d) S | sk_for_each_bound(a, - b, c) S | hlist_for_each_entry_safe(a, - b, c, d, e) S | hlist_for_each_entry_continue_rcu(a, - b, c) S | nr_neigh_for_each(a, - b, c) S | nr_neigh_for_each_safe(a, - b, c, d) S | nr_node_for_each(a, - b, c) S | nr_node_for_each_safe(a, - b, c, d) S | - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S | - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S | for_each_host(a, - b, c) S | for_each_host_safe(a, - b, c, d) S | for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-24ocfs2/dlm: Cleanup dlm_wait_for_node_death() and dlm_wait_for_node_recovery()Sunil Mushran1-24/+22
dlm_wait_for_node_death() and dlm_wait_for_node_recovery() needed a facelift. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2011-07-24ocfs2/dlm: Clean up refmap helpersSunil Mushran1-4/+4
Patch cleans up helpers that set/clear refmap bits and grab/drop inflight lock ref counts. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2011-07-24ocfs2/dlm: Cleanup up dlm_finish_local_lockres_recovery()Sunil Mushran1-32/+25
dlm_finish_local_lockres_recovery() needed a facelift. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2011-07-24ocfs2/dlm: Clean up messages in o2dlmSunil Mushran1-22/+31
o2dlm messages needed a facelift. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
2011-05-25ocfs2/dlm: Add new dlm message DLM_BEGIN_EXIT_DOMAIN_MSGSunil Mushran1-0/+1
This patch adds a new dlm message DLM_BEGIN_EXIT_DOMAIN_MSG and ups the dlm protocol to 1.2. o2dlm sends this new message in dlm_unregister_domain() to mark the beginning of the exit domain. This message is sent to all nodes in the domain. Currently o2dlm has no way of informing other nodes of its impending exit. This information is useful as the other nodes could disregard the exiting node in certain operations. For example, in resource migration. If two or more nodes were umounting in parallel, it would be more efficient if o2dlm were to choose a non-exiting node to be the new master node rather than an exiting one. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>
2011-03-07ocfs2: Remove EXIT from masklog.Tao Ma1-4/+1
mlog_exit is used to record the exit status of a function. But because it is added in so many functions, if we enable it, the system logs get filled up quickly and cause too much I/O. So actually no one can open it for a production system or even for a test. This patch just try to remove it or change it. So: 1. if all the error paths already use mlog_errno, it is just removed. Otherwise, it will be replaced by mlog_errno. 2. if it is used to print some return value, it is replaced with mlog(0,...). mlog_exit_ptr is changed to mlog(0. All those mlog(0,...) will be replaced with trace events later. Signed-off-by: Tao Ma <boyu.mt@taobao.com>
2011-02-21ocfs2: Remove ENTRY from masklog.Tao Ma1-4/+0
ENTRY is used to record the entry of a function. But because it is added in so many functions, if we enable it, the system logs get filled up quickly and cause too much I/O. So actually no one can open it for a production system or even for a test. So for mlog_entry_void, we just remove it. for mlog_entry(...), we replace it with mlog(0,...), and they will be replace by trace event later. Signed-off-by: Tao Ma <boyu.mt@taobao.com>
2010-08-07ocfs2/dlm: avoid incorrect bit set in refmap on recovery masterWengang Wang1-12/+10
In the following situation, there remains an incorrect bit in refmap on the recovery master. Finally the recovery master will fail at purging the lockres due to the incorrect bit in refmap. 1) node A has no interest on lockres A any longer, so it is purging it. 2) the owner of lockres A is node B, so node A is sending de-ref message to node B. 3) at this time, node B crashed. node C becomes the recovery master. it recovers lockres A(because the master is the dead node B). 4) node A migrated lockres A to node C with a refbit there. 5) node A failed to send de-ref message to node B because it crashed. The failure is ignored. no other action is done for lockres A any more. For mormal, re-send the deref message to it to recovery master can fix it. Well, ignoring the failure of deref to the original master and not recovering the lockres to recovery master has the same effect. And the later is simpler. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Acked-by: Srinivas Eeda <srinivas.eeda@oracle.com> Cc: stable@kernel.org Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-07-12ocfs2/dlm: don't access beyond bitmap sizeWengang Wang1-1/+1
dlm->recovery_map is defined as unsigned long recovery_map[BITS_TO_LONGS(O2NM_MAX_NODES)]; We should treat O2NM_MAX_NODES as the bit map size in bits. This patches fixes a bit operation that takes O2NM_MAX_NODES + 1 as bitmap size. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-05-05ocfs2: print node # when tcp failsWengang Wang1-9/+18
Print the node number of a peer node if sending it a message failed. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26dlm: allow dlm do recovery during shutdownSrinivas Eeda1-1/+1
If a node down event happens while dlm shutdown in progress, dlm recovery should be done before dlm is shutdown. We can't migrate unrecovered locks, obviously. But dlm_reco_thread only does recovery if the dlm_state is in DLM_CTXT_JOINED. dlm_reco_thread should do recovery if dlm_state is in DLM_CTXT_JOINED or DLM_CTXT_IN_SHUTDOWN. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-03ocfs2/dlm: Remove BUG_ON in dlm recovery when freeing locks of a dead nodeSunil Mushran1-1/+6
During recovery, the dlm frees the locks for the dead node. If it finds a lock in a resource for the dead node, it expects that node to also have a ref in that lock resource. If not, it BUGs. ossbz#1175 was filed with the above BUG. Now, while it is correct that we should be expecting the ref, I see no reason why we have to BUG. After all, we are freeing up the lock and clearing the ref. This patch replaces the BUG_ON with a printk(). Hopefully, that will give us more clues next time this happens. http://oss.oracle.com/bugzilla/show_bug.cgi?id=1175 Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-02ocfs2/dlm: Handle EAGAIN for compatibility - v2Sunil Mushran1-1/+7
Mainline commit aad1b15310b9bcd59fa81ab8f2b1513b59553ea8 made the dlm_begin_reco_handler() return -EAGAIN instead of EAGAIN. As this error is transmitted over the wire, we want the receiver, dlm_send_begin_reco_message(), to understand both the older EAGAIN and the newer -EAGAIN, to allow rolling upgrade of the cluster nodes. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-01-25ocfs2/dlm: Print more messages during lock migrationSunil Mushran1-8/+38
When a lock resource is migrated, the dlm compares the migrated locks with that that was already existing on the new node. If the comparison fails, it BUGs. This patch prints more messages when the comparison fails inorder to help with the root cause analyis. http://oss.oracle.com/bugzilla/show_bug.cgi?id=1206 This does not fix bz1206. However, if we run into it again, we will have more information to chew on. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-01-25ocfs2/dlm: Ignore LVBs of locks in the Blocked listSunil Mushran1-14/+34
During lock resource migration, o2dlm fills the packet with a LVB from the first valid lock. For sanity, it ensures that the other valid locks have the same LVB. If not, it BUGs. The valid locks are ones that have granted EX or PR lock levels and are either on the Granted or Converting lists. Locks in the Blocked list cannot have a valid LVB. This patch ensures that we skip the locks in the Blocked list. Fixes oss bugzilla#1202 http://oss.oracle.com/bugzilla/show_bug.cgi?id=1202 Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-01-25ocfs2/trivial: Remove trailing whitespacesSunil Mushran1-19/+19
Patch removes trailing whitespaces. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-12-02ocfs2: return -EAGAIN instead of EAGAIN in dlmTiger Yang1-9/+9
We used to return positive EAGAIN to indicate a retry action is needed in dlm_begin_reco_handler(). Now we return negative -EAGAIN to erase the confusion caused by this error code. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-23headers: utsname.h reduxAlexey Dobriyan1-1/+0
* remove asm/atomic.h inclusion from linux/utsname.h -- not needed after kref conversion * remove linux/utsname.h inclusion from files which do not need it NOTE: it looks like fs/binfmt_elf.c do not need utsname.h, however due to some personality stuff it _is_ needed -- cowardly leave ELF-related headers and files alone. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-08ocfs2: trivial fix for s/migrate/migration/ in dlmrecovery.c loggingJeff Liu1-1/+1
in dlmrecovery.c:1121, replace 'migrate' to 'migration' to keep the consistency by comparing to other lines with the similar log info in the same file. Signed-off-by: Jeff Liu <jeff.liu@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2008-03-10ocfs2/dlm: Print message showing the recovery masterSunil Mushran1-3/+3
Knowing the dlm recovery master helps in debugging recovery issues. This patch prints a message on the recovery master node. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-10ocfs2/dlm: Add missing dlm_lockres_put()s in migration pathSunil Mushran1-6/+34
During migration, the recovery master node may be asked to master a lockres it may not know about. In that case, it would not only have to create a lockres and add it to the hash, but also remember to to do the _put_ corresponding to the kref_init in dlm_init_lockres(), as soon as the migration is completed. Yes, we don't wait for the dlm_purge_lockres() to do that matching put. Note the ref added for it being in the hash protects the lockres from being freed prematurely. This patch adds that missing put, as described above, to plug a memleak. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-10ocfs2/dlm: Add missing dlm_lock_put()sSunil Mushran1-0/+9
Normally locks for remote nodes are freed when that node sends an UNLOCK message to the master. The master node tags an DLM_UNLOCK_FREE_LOCK action to do an extra put on the lock at the end. However, there are times when the master node has to free the locks for the remote nodes forcibly. Two cases when this happens are: 1. When the master has migrated the lockres plus all locks to another node. 2. When the master is clearing all the locks of a dead node. It was in the above two conditions that the dlm was missing the extra put. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-03-10ocfs2: Use dlm_print_one_lock_resource for lock resource printTao Ma1-1/+1
__dlm_print_one_lock_resource must be called with spin_lock the res->spinlock. While in some cases, we use it without this precondition and lead to the failure of assert_spin_locked. So call dlm_print_one_lock_resource instead. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25ocfs2/dlm: Clear joining_node on hearbeat node downTao Ma1-6/+6
Currently the process of dlm join contains 2 steps: query join and assert join. After query join, the joined node will set its joining_node. So if the joining node happens to panic before the 2nd step, the joined node will fail to clear its joining_node flag because that node isn't in the domain map. It at least cause 2 problems. 1. All the new join request will fail. So no new node can mount the volume. 2. The joined node can't umount the volume since during the umount process it has to wait for the joining_node to be unknown. So the umount will be hanged. The solution is to clear the joining_node before we check the domain map. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25ocfs2_dlm: Call node eviction callbacks from heartbeat handlerMark Fasheh1-0/+7
With this, a dlm client can take advantage of the group protocol in the dlm to get full notification whenever a node within the dlm domain leaves unexpectedly. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-19Use helpers to obtain task pid in printksPavel Emelyanov1-5/+5
The task_struct->pid member is going to be deprecated, so start using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in the kernel. The first thing to start with is the pid, printed to dmesg - in this case we may safely use task_pid_nr(). Besides, printks produce more (much more) than a half of all the explicit pid usage. [akpm@linux-foundation.org: git-drm went and changed lots of stuff] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Dave Airlie <airlied@linux.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-10[KJ PATCH] Replacing memset(<addr>,0,PAGE_SIZE) with clear_page() in ↵Shani Moideen1-1/+1
fs/ocfs2/dlm/dlmrecovery.c Replacing memset(<addr>,0,PAGE_SIZE) with clear_page() in fs/ocfs2/dlm/dlmrecovery.c Signed-off-by: Shani Moideen <shani.moideen@wipro.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10[PATCH] ocfs2: use list_for_each_entry where beneficalChristoph Hellwig1-52/+25
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-02ocfs2: fix sparse warnings in fs/ocfs2/dlmMark Fasheh1-2/+2
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26ocfs2_dlm: fix race in dlm_remaster_locksSrinivas Eeda1-0/+2
There is a possibility that dlm_remaster_locks could overwride node->state with DLM_RECO_NODE_DATA_REQUESTED after dlm_reco_data_done_handler sets the node->state to DLM_RECO_NODE_DATA_DONE. This could lead to recovery getting stuck and requires a cluster reboot. Synchronize with dlm_reco_state_lock spinlock. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07ocfs2: Added post handler callable function in o2net message handlerKurt Hackel1-6/+12
Currently o2net allows one handler function per message type. This patch adds the ability to call another function to be called after the handler has returned the message to the other node. Handlers are now given the option of returning a context (in the form of a void **) which will be passed back into the post message handler function. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07ocfs2_dlm: Cookies in locks not being printed correctly in error messagesKurt Hackel1-6/+6
The dlm encodes the node number and a sequence number in the lock cookie. It also stores the cookie in the lockres in the big endian format to avoid swapping 8 bytes on each lock request. The bug here was that it was assuming the cookie to be in the cpu format when decoding it for printing the error message. This patch swaps the bytes before the print. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07ocfs2_dlm: wake up sleepers on the lockres waitqueueKurt Hackel1-0/+1
The dlm was not waking up threads waiting on the lockres wait queue, waiting for the lockres to be no longer be in the DLM_LOCK_RES_IN_PROGRESS and the DLM_LOCK_RES_MIGRATING states. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07ocfs2_dlm: Dlm dispatch was stopping too earlyKurt Hackel1-3/+0
dlm_dispatch_work was not processing the queued up tasks at the first sign of the node leaving the domain leading to not only incompleted tasks but also a mismatch in the dlm refcnt. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07ocfs2_dlm: Drop inflight refmap even if no locks found on the lockresKurt Hackel1-5/+3
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07ocfs2_dlm: Fix migrate lockres handler queue scanningKurt Hackel1-6/+20
The migrate lockres handler was only searching for its lock on migrated lockres on the expected queue. This could be problematic as the new master could have also issued a convert request during the migration and thus moved the lock to the convert queue. We now search for the lock on all three queues. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Sunil Mushran <Sunil.Mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07ocfs2_dlm: Make dlmunlock() wait for migration to completeKurt Hackel1-0/+1
dlmunlock() was not waiting for migration to complete before releasing locks on locally mastered locks. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Sunil Mushran <Sunil.Mushran@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-02-07ocfs2_dlm: fix cluster-wide refcounting of lock resourcesKurt Hackel1-7/+116
This was previously broken and migration of some locks had to be temporarily disabled. We use a new (and backward-incompatible) set of network messages to account for all references to a lock resources held across the cluster. once these are all freed, the master node may then free the lock resource memory once its local references are dropped. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-13[PATCH] Fix numerous kcalloc() calls, convert to kzalloc()Robert P. J. Day1-3/+3
All kcalloc() calls of the form "kcalloc(1,...)" are converted to the equivalent kzalloc() calls, and a few kcalloc() calls with the incorrect ordering of the first two arguments are fixed. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: Jeff Garzik <jeff@garzik.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Adam Belay <ambx1@neo.rr.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Greg KH <greg@kroah.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-22WorkStruct: make allyesconfigDavid Howells1-2/+3
Fix up for make allyesconfig. Signed-Off-By: David Howells <dhowells@redhat.com>
2006-09-24ocfs2: Allow binary names in the DLMMark Fasheh1-1/+2
The OCFS2 DLM uses strlen() to determine lock name length, which excludes the possibility of putting binary values in the name string. Fix this by requiring that string length be passed in as a parameter. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-29[PATCH] fs/ocfs2/dlm/dlmrecovery.c: make dlm_lockres_master_requery() staticAdrian Bunk1-2/+6
dlm_lockres_master_requery() became global without any external usage. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-27[PATCH] spin/rwlock init cleanupsIngo Molnar1-2/+2
locking init cleanups: - convert " = SPIN_LOCK_UNLOCKED" to spin_lock_init() or DEFINE_SPINLOCK() - convert rwlocks in a similar manner this patch was generated automatically. Motivation: - cleanliness - lockdep needs control of lock initialization, which the open-coded variants do not give - it's also useful for -rt and for lock debugging in general Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] fs/ocfs2/dlm/: cleanupsAdrian Bunk1-1/+1
This patch #if 0's the no longer used dlm_dump_lock_resources(). Since this makes dlmdebug.h empty, this patch also removes this header. Additionally, the needlessly global dlm_is_node_recovered() is made static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26ocfs2: move dlm work to a private work queueKurt Hackel1-2/+11
The work that is done can block for long periods of time and so is not appropriate for keventd. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26ocfs2: tune down some noisy messages during dlm recoveryKurt Hackel1-1/+1
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26ocfs2: display message before waiting for recovery to completeKurt Hackel1-0/+7
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26ocfs2: use GFP_NOFS in some dlm operationsKurt Hackel1-5/+5
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26ocfs2: wait for recovery when starting lock masteryKurt Hackel1-0/+30
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26ocfs2: continue recovery when a dead node is encounteredKurt Hackel1-0/+1
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26ocfs2: remove unneccesary spin_unlock() in dlm_remaster_locks()Kurt Hackel1-1/+0
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26ocfs2: dlm_remaster_locks() should never exit without completingKurt Hackel1-54/+62
We cannot restart recovery. Once we begin to recover a node, keep the state of the recovery intact and follow through, regardless of any other node deaths that may occur. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26ocfs2: special case recovery lock in dlmlock_remote()Kurt Hackel1-0/+4
If the previous master of the recovery lock dies, let calc_usage take it down completely and let the caller completely redo the dlmlock() call. Otherwise, there will never be an opportunity to re-master the lockres and recovery wont be able to progress. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26ocfs2: update lvb immediately during recoveryKurt Hackel1-18/+26
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26ocfs2: dump mismatching migrated lvbs before BUG()Kurt Hackel1-2/+13
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26ocfs2: make dlm recovery finalization 2 stageKurt Hackel1-17/+95
Makes it easier for the recovery process to deal with node death. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26ocfs2: dlm recovery / lockres reference count fixKurt Hackel1-3/+13
Take a reference on lockres structures while they are on the recovery list. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26ocfs2: clean up recovery related messagesKurt Hackel1-12/+90
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26ocfs2: handle network errors during recoveryKurt Hackel1-17/+36
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26ocfs2: only recover one dead node at a timeKurt Hackel1-3/+35
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26ocfs2: Better tracking for recovery state changesKurt Hackel1-9/+28
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26ocfs2: Fix empty lvb checkKurt Hackel1-5/+7
The check for an empty lvb should check the entire buffer not just the first byte. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26ocfs2: fix inverted logic in dlm_is_node_deadKurt Hackel1-1/+1
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26ocfs2: allocate lockres hash pages in an arrayDaniel Phillips1-2/+2
This allows us to have a hash table greater than a single page which greatly improves dlm performance on some tests. Signed-off-by: Daniel Phillips <phillips@google.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26ocfs2: calculate lockid hash values outside of the spinlockMark Fasheh1-1/+4
Fixes a performance bug - pointed out by Andrew. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-06-26[PATCH] fs: use list_move()Akinobu Mita1-6/+3
This patch converts the combination of list_del(A) and list_add(A, B) to list_move(A, B) under fs/. Cc: Ian Kent <raven@themaw.net> Acked-by: Joel Becker <joel.becker@oracle.com> Cc: Neil Brown <neilb@cse.unsw.edu.au> Cc: Hans Reiser <reiserfs-dev@namesys.com> Cc: Urban Widmark <urban@teststation.com> Acked-by: David Howells <dhowells@redhat.com> Acked-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24ocfs2: don't use MLF* in dlm/ filesKurt Hackel1-4/+8
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-03-24[PATCH] ocfs2: dlm recovery fixesKurt Hackel1-17/+21
when starting lock mastery (excepting the recovery lock) wait on any nodes needing recovery. fix one instance where lock resources were left attached to the recovery list after recovery completed. ensure that the node_down code is run uniformly regardless of which node found the dead node first. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-03-01[PATCH] ocfs2: use hlists for lockres hashMark Fasheh1-11/+12
Switch from list_head to hlist_head. Make the size of the hash dependent upon the allocated area, rather than a constant. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-02-16[PATCH] ocfs2: add dlm_wait_for_node_deathKurt Hackel1-0/+18
* add dlm_wait_for_node_death function to be used after receiving a network error. this will wait for the given timeout to allow the heartbeat callbacks to update the domain map. without this, some paths may spin and consume enough cpu that the heartbeat gets starved and never updates. Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-02-16[PATCH] ocfs2: recheck recovery state after getting lockKurt Hackel1-0/+24
* after successfully taking the $RECOVERY lock in EX mode, recheck to make sure that recovery has not already begun or completed on another node Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-02-03[PATCH] fs/ocfs2/dlm/dlmrecovery.c must #include <linux/delay.h>Adrian Bunk1-0/+1
fs/ocfs2/dlm/dlmrecovery.c does now use msleep(), and does therefore need to #include <linux/delay.h> for getting the prototype of this function. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-02-03[PATCH] ocfs2/dlm: fixesKurt Hackel1-37/+212
* fix a hang which can occur during shutdown migration * do not allow nodes to join during recovery * when restarting lock mastery, do not ignore nodes which come up * more than one node could become recovery master, fix this * sleep to allow some time for heartbeat state to catch up to network * extra debug info for bad recovery state problems * make DLM_RECO_NODE_DATA_DONE a valid state for non-master recovery nodes * prune all locks from dead nodes on $RECOVERY lock resources * do NOT automatically add new nodes to mle nodemaps until they have properly joined the domain * make sure dlm_pick_recovery_master only exits when all nodes have synced * properly handle dlmunlock errors in dlm_pick_recovery_master * do not propagate network errors in dlm_send_begin_reco_message * dead nodes were not being put in the recovery map sometimes, fix this * dlmunlock was failing to clear the unlock actions on DLM_DENIED Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-01-03[PATCH] OCFS2: The Second Oracle Cluster FilesystemKurt Hackel1-0/+2132
A distributed lock manager built with the cluster file system use case in mind. The OCFS2 dlm exposes a VMS style API, though things have been simplified internally. The only lock levels implemented currently are NLMODE, PRMODE and EXMODE. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>