mdmon: manage_member: fix race condition during slow meta data writes

In order to track kernel state changes, the monitor needs to notice changes in sysfs. If the changes are transient and the monitor is busy writing metadata, the changes can be missed. This causes the metadata to be inconsistent with the real state of the array.

I can reproduce this in a test scenario with a DDF container and two subarrays, where I set a disk to "failed" and then add a global hot-spare. On a typical MD test setup with loop devices, I can reliably reproduce a failure where the metadata shows degraded members although the kernel finished the recovery successfully.

This patch fixes the problem by applying two changes.

First, when a metadata update is queued, wait until it is certain that the monitor has actually applied the update (the wait loop is needed to avoid failures completely in my test case).

Second, after triggering the recovery, set prev_action of the changed array to "recover", in case the monitor misses the transient "recover" state.

Signed-off-by: Martin Wilck <mwilck@arcor.de>
Signed-off-by: NeilBrown <neilb@suse.de>
author: Martin Wilck <mwilck@arcor.de> 2013-07-30 23:18:33 +0200
committer: NeilBrown <neilb@suse.de> 2013-07-31 13:00:46 +1000
commit: 6ca1e6eccb2c6661ec111a455bcc2f3f5593cb06 (patch)
tree: 7b9872615725c2d76261a4dee6b51467a0616faf /managemon.c
parent: 30b83120ede68eb28f28118e3af4ff9c1de91fa0 (diff)
download: mdadm-6ca1e6eccb2c6661ec111a455bcc2f3f5593cb06.tar.gz
1 files changed, 7 insertions, 1 deletions
diff --git a/managemon.c b/managemon.c
index a6551081..40c863f1 100644
--- a/managemon.c
+++ b/managemon.c
@@ -535,8 +535,14 @@ static void manage_member(struct mdstat_ent *mdstat,
 		}
 		queue_metadata_update(updates);
 		updates = NULL;
+		while (update_queue_pending || update_queue) {
+			check_update_queue(container);
+			usleep(15*1000);
+		}
 		replace_array(container, a, newa);
-		sysfs_set_str(&a->info, NULL, "sync_action", "recover");
+		if (sysfs_set_str(&a->info, NULL, "sync_action", "recover")
+		    == 0)
+			newa->prev_action = recover;
 		dprintf("%s: recovery started on %s\n", __func__,
 			a->info.sys_name);
  out:
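
The wait loop added in this hunk can be illustrated with a minimal, single-threaded sketch. It is not mdmon code: the struct, the function names, and the hand-over rule are hypothetical simplifications, and the `yield` callback stands in for both the `usleep(15*1000)` pause and the progress the real monitor thread makes in the meantime. The point it shows is why the loop tests both queues: once pending updates have been handed to the monitor, the pending list is empty even though nothing has been written yet.

```c
#include <stddef.h>

/* Hypothetical, simplified model of the update hand-off.
 * None of these names are mdmon's real definitions. */
struct update {
	struct update *next;
};

struct queues {
	struct update *pending; /* queued by the manager, not yet handed over */
	struct update *active;  /* handed to the monitor, not yet applied */
	int applied;            /* number of updates the monitor has applied */
};

/* Manager side: append an update to the pending list. */
static void queue_update(struct queues *q, struct update *u)
{
	struct update **tail = &q->pending;

	while (*tail)
		tail = &(*tail)->next;
	u->next = NULL;
	*tail = u;
}

/* Manager side: hand the pending list over once the monitor is idle. */
static void hand_over(struct queues *q)
{
	if (q->active == NULL && q->pending != NULL) {
		q->active = q->pending;
		q->pending = NULL;
	}
}

/* Stand-in for the monitor thread: apply one update per call. */
static void monitor_step(struct queues *q)
{
	if (q->active) {
		q->active = q->active->next;
		q->applied++;
	}
}

/* Poll until BOTH lists are empty. Testing only `pending` would return
 * right after the hand-over, before the monitor has applied anything.
 * `yield` must not be NULL; it models the pause between polls.
 * Returns the number of polling rounds. */
static int wait_for_updates(struct queues *q, void (*yield)(struct queues *))
{
	int rounds = 0;

	while (q->pending || q->active) {
		hand_over(q);
		yield(q);
		rounds++;
	}
	return rounds;
}
```

In this model, queueing a few updates and then calling `wait_for_updates(&q, monitor_step)` returns only after every update has been counted in `q.applied`, which is the guarantee the patch wants before it replaces the array and triggers recovery.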