[PATCH] intrinsic automount and mountpoint degradation support

Here's a patch that I worked out with Al Viro that adds support for a filesystem (such as kAFS) to perform automounting intrinsically without the need for a userspace daemon. It also adds support for such mountpoints to be degraded at the filesystem's behest until they've been untouched long enough that they'll be removed. I've a patch (to follow) that removes some #ifdef's from fs/afs/* thus allowing it to make use of this facility. There are five pieces to this: (1) Any interested filesystem needs to have at least one list to which expirable mountpoints can be added. Access to this list is governed by the vfsmount_lock. (2) When a filesystem wants to create an expirable mount, it calls do_kern_mount() to get a handle on the filesystem it wants mounting, and then calls do_add_mount() to mount that filesystem on the designated mountpoint, supplying the list mentioned in (1) to which the vfsmount will be added. In kAFS's case, the mountpoint is a directory with a follow_link() method defined (fs/afs/mntpt.c). This uses the struct nameidata supplied as an argument as a determination of where the new filesystem should be mounted. (3) When something using a vfsmount finishes dealing with it, it calls mntput(). This unmarks the vfsmount for immediate expiry. There are two criteria for determining if a vfsmount may be expired - it mustn't be marked as in use for anything other than being a child of another vfsmount, and it must have an expiry mark against it already. (4) The filesystem then determines the policy on expiring the mounts created in (2). When it feels the need to, it passes the list mentioned in (1) to mark_mounts_for_expiry() to request everything on the list be expired. This function examines each mount listed. If the vfsmount meets the criteria mentioned in (3), then the vfsmount is deleted from the namespace and disposed of as for unmounting; otherwise the vfsmount is left untouched apart from now bearing an expiration mark if it didn't before. kAFS's expiration policy is simply to invoke this process at regular intervals for all the mounts on its list. (5) An expiration facility is also provided to userspace: by calling umount() with a MNT_EXPIRE flag, it can make a request to unmount only if the mountpoint hasn't been used since the last request and isn't in use now. This allows expiration to be driven by userspace instead of by the kernel if that is desirable. This also means that do_umount() has to use a different version of path_release() to everyone else... it can't call mntput() as that clears the expiration flag, thus rendering this unachievable; so it's version of path_release() calls _mntput(), which doesn't do the clear. My original idea was to give the kernel more knowledge of automounted things. This avoids a certain problem with stat() on a mountpoint causing it to mount (for example, do "ls -l /afs" on a machine with kAFS), but Al wanted it done this way. > Why is autofs unsuitable? Because: (1) Autofs is flat; AFS requires a tree - mounts on mounts on mounts on mounts... (2) AFS holds the data as to what the mountpoints are and where they go, and these may be cross-links to subtrees beyond your control. It's also not trivial to extract a list of mountpoints as is required for autofs. (3) Autofs is not namespace safe. (4) Ducking back to userspace to get that to do the mount is pretty tricky if namespaces are involved. In fact, autofs may well want to make use of this facility. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
author: David Howells <dhowells@redhat.com> 2004-07-10 19:30:11 -0700
committer: Linus Torvalds <torvalds@ppc970.osdl.org> 2004-07-10 19:30:11 -0700
commit: c8a6ba011918486195870aab9a8e3d61c76c1b02 (patch)
tree: 896c5960db3db754d7364802486044fa907eed50 /fs
parent: 694effe72032db1384b5a084977b8bbb2c3cdaa6 (diff)
download: history-c8a6ba011918486195870aab9a8e3d61c76c1b02.tar.gz
3 files changed, 209 insertions, 19 deletions
diff --git a/fs/namei.c b/fs/namei.c
index 3db4da5a2116ed..bbadcf782e92f7 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -279,6 +279,16 @@ void path_release(struct nameidata *nd)
 }
 
 /*
+ * umount() mustn't call path_release()/mntput() as that would clear
+ * mnt_expiry_mark
+ */
+void path_release_on_umount(struct nameidata *nd)
+{
+	dput(nd->dentry);
+	_mntput(nd->mnt);
+}
+
+/*
  * Internal lookup() using the new generic dcache.
  * SMP-safe
  */
diff --git a/fs/namespace.c b/fs/namespace.c
index f3ab6910fdf130..759527b1efe2ce 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -60,6 +60,7 @@ struct vfsmount *alloc_vfsmnt(const char *name)
 		INIT_LIST_HEAD(&mnt->mnt_child);
 		INIT_LIST_HEAD(&mnt->mnt_mounts);
 		INIT_LIST_HEAD(&mnt->mnt_list);
+		INIT_LIST_HEAD(&mnt->mnt_fslink);
 		if (name) {
 			int size = strlen(name)+1;
 			char *newname = kmalloc(size, GFP_KERNEL);
@@ -106,13 +107,9 @@ struct vfsmount *lookup_mnt(struct vfsmount *mnt, struct dentry *dentry)
 
 EXPORT_SYMBOL(lookup_mnt);
 
-static int check_mnt(struct vfsmount *mnt)
+static inline int check_mnt(struct vfsmount *mnt)
 {
-	spin_lock(&vfsmount_lock);
-	while (mnt->mnt_parent != mnt)
-		mnt = mnt->mnt_parent;
-	spin_unlock(&vfsmount_lock);
-	return mnt == current->namespace->root;
+	return mnt->mnt_namespace == current->namespace;
 }
 
 static void detach_mnt(struct vfsmount *mnt, struct nameidata *old_nd)
@@ -164,6 +161,14 @@ clone_mnt(struct vfsmount *old, struct dentry *root)
 		mnt->mnt_root = dget(root);
 		mnt->mnt_mountpoint = mnt->mnt_root;
 		mnt->mnt_parent = mnt;
+		mnt->mnt_namespace = old->mnt_namespace;
+
+		/* stick the duplicate mount on the same expiry list
+		 * as the original if that was on one */
+		spin_lock(&vfsmount_lock);
+		if (!list_empty(&old->mnt_fslink))
+			list_add(&mnt->mnt_fslink, &old->mnt_fslink);
+		spin_unlock(&vfsmount_lock);
 	}
 	return mnt;
 }
@@ -346,6 +351,7 @@ void umount_tree(struct vfsmount *mnt)
 	while (!list_empty(&kill)) {
 		mnt = list_entry(kill.next, struct vfsmount, mnt_list);
 		list_del_init(&mnt->mnt_list);
+		list_del_init(&mnt->mnt_fslink);
 		if (mnt->mnt_parent == mnt) {
 			spin_unlock(&vfsmount_lock);
 		} else {
@@ -369,6 +375,24 @@ static int do_umount(struct vfsmount *mnt, int flags)
 		return retval;
 
 	/*
+	 * Allow userspace to request a mountpoint be expired rather than
+	 * unmounting unconditionally. Unmount only happens if:
+	 *  (1) the mark is already set (the mark is cleared by mntput())
+	 *  (2) the usage count == 1 [parent vfsmount] + 1 [sys_umount]
+	 */
+	if (flags & MNT_EXPIRE) {
+		if (mnt == current->fs->rootmnt ||
+		    flags & (MNT_FORCE | MNT_DETACH))
+			return -EINVAL;
+
+		if (atomic_read(&mnt->mnt_count) != 2)
+			return -EBUSY;
+
+		if (!xchg(&mnt->mnt_expiry_mark, 1))
+			return -EAGAIN;
+	}
+
+	/*
 	 * If we may have to abort operations to get out of this
 	 * mount, and they will themselves hold resources we must
 	 * allow the fs to do things. In the Unix tradition of
@@ -461,7 +485,7 @@ asmlinkage long sys_umount(char __user * name, int flags)
 
 	retval = do_umount(nd.mnt, flags);
 dput_and_out:
-	path_release(&nd);
+	path_release_on_umount(&nd);
 out:
 	return retval;
 }
@@ -618,6 +642,11 @@ static int do_loopback(struct nameidata *nd, char *old_name, int recurse)
 	}
 
 	if (mnt) {
+		/* stop bind mounts from expiring */
+		spin_lock(&vfsmount_lock);
+		list_del_init(&mnt->mnt_fslink);
+		spin_unlock(&vfsmount_lock);
+
 		err = graft_tree(mnt, nd);
 		if (err) {
 			spin_lock(&vfsmount_lock);
@@ -638,7 +667,8 @@ static int do_loopback(struct nameidata *nd, char *old_name, int recurse)
  * on it - tough luck.
  */
 
-static int do_remount(struct nameidata *nd,int flags,int mnt_flags,void *data)
+static int do_remount(struct nameidata *nd, int flags, int mnt_flags,
+		      void *data)
 {
 	int err;
 	struct super_block * sb = nd->mnt->mnt_sb;
@@ -710,6 +740,10 @@ static int do_move_mount(struct nameidata *nd, char *old_name)
 
 	detach_mnt(old_nd.mnt, &parent_nd);
 	attach_mnt(old_nd.mnt, nd);
+
+	/* if the mount is moved, it should no longer be expire
+	 * automatically */
+	list_del_init(&old_nd.mnt->mnt_fslink);
 out2:
 	spin_unlock(&vfsmount_lock);
 out1:
@@ -722,11 +756,14 @@ out:
 	return err;
 }
 
-static int do_add_mount(struct nameidata *nd, char *type, int flags,
+/*
+ * create a new mount for userspace and request it to be added into the
+ * namespace's tree
+ */
+static int do_new_mount(struct nameidata *nd, char *type, int flags,
 			int mnt_flags, char *name, void *data)
 {
 	struct vfsmount *mnt;
-	int err;
 
 	if (!type || !memchr(type, 0, PAGE_SIZE))
 		return -EINVAL;
@@ -736,9 +773,20 @@ static int do_add_mount(struct nameidata *nd, char *type, int flags,
 		return -EPERM;
 
 	mnt = do_kern_mount(type, flags, name, data);
-	err = PTR_ERR(mnt);
 	if (IS_ERR(mnt))
-		goto out;
+		return PTR_ERR(mnt);
+
+	return do_add_mount(mnt, nd, mnt_flags, NULL);
+}
+
+/*
+ * add a mount into a namespace's mount tree
+ * - provide the option of adding the new mount to an expiration list
+ */
+int do_add_mount(struct vfsmount *newmnt, struct nameidata *nd,
+		 int mnt_flags, struct list_head *fslist)
+{
+	int err;
 
 	down_write(&current->namespace->sem);
 	/* Something was mounted here while we slept */
@@ -750,22 +798,143 @@ static int do_add_mount(struct nameidata *nd, char *type, int flags,
 
 	/* Refuse the same filesystem on the same mount point */
 	err = -EBUSY;
-	if (nd->mnt->mnt_sb == mnt->mnt_sb && nd->mnt->mnt_root == nd->dentry)
+	if (nd->mnt->mnt_sb == newmnt->mnt_sb &&
+	    nd->mnt->mnt_root == nd->dentry)
 		goto unlock;
 
 	err = -EINVAL;
-	if (S_ISLNK(mnt->mnt_root->d_inode->i_mode))
+	if (S_ISLNK(newmnt->mnt_root->d_inode->i_mode))
 		goto unlock;
 
-	mnt->mnt_flags = mnt_flags;
-	err = graft_tree(mnt, nd);
+	newmnt->mnt_flags = mnt_flags;
+	err = graft_tree(newmnt, nd);
+
+	if (err == 0 && fslist) {
+		/* add to the specified expiration list */
+		spin_lock(&vfsmount_lock);
+		list_add_tail(&newmnt->mnt_fslink, fslist);
+		spin_unlock(&vfsmount_lock);
+	}
+
 unlock:
 	up_write(&current->namespace->sem);
-	mntput(mnt);
-out:
+	mntput(newmnt);
 	return err;
 }
 
+EXPORT_SYMBOL_GPL(do_add_mount);
+
+/*
+ * process a list of expirable mountpoints with the intent of discarding any
+ * mountpoints that aren't in use and haven't been touched since last we came
+ * here
+ */
+void mark_mounts_for_expiry(struct list_head *mounts)
+{
+	struct namespace *namespace;
+	struct list_head graveyard, *_p, *_n;
+	struct vfsmount *mnt;
+
+	if (list_empty(mounts))
+		return;
+
+	INIT_LIST_HEAD(&graveyard);
+
+	spin_lock(&vfsmount_lock);
+
+	/* extract from the expiration list every vfsmount that matches the
+	 * following criteria:
+	 * - only referenced by its parent vfsmount
+	 * - still marked for expiry (marked on the last call here; marks are
+	 *   cleared by mntput())
+	 */
+	list_for_each_safe(_p, _n, mounts) {
+		mnt = list_entry(_p, struct vfsmount, mnt_fslink);
+
+		if (!xchg(&mnt->mnt_expiry_mark, 1) ||
+		    atomic_read(&mnt->mnt_count) != 1)
+			continue;
+
+		mntget(mnt);
+		list_move(&mnt->mnt_fslink, &graveyard);
+	}
+
+	/*
+	 * go through the vfsmounts we've just consigned to the graveyard to
+	 * - check that they're still dead
+	 * - delete the vfsmount from the appropriate namespace under lock
+	 * - dispose of the corpse
+	 */
+	while (!list_empty(&graveyard)) {
+		mnt = list_entry(graveyard.next, struct vfsmount, mnt_fslink);
+		list_del_init(&mnt->mnt_fslink);
+
+		/* don't do anything if the namespace is dead - all the
+		 * vfsmounts from it are going away anyway */
+		namespace = mnt->mnt_namespace;
+		if (!namespace || atomic_read(&namespace->count) <= 0)
+			continue;
+		get_namespace(namespace);
+
+		spin_unlock(&vfsmount_lock);
+		down_write(&namespace->sem);
+		spin_lock(&vfsmount_lock);
+
+		/* check that it is still dead: the count should now be 2 - as
+		 * contributed by the vfsmount parent and the mntget above */
+		if (atomic_read(&mnt->mnt_count) == 2) {
+			struct vfsmount *xdmnt;
+			struct dentry *xdentry;
+
+			/* delete from the namespace */
+			list_del_init(&mnt->mnt_list);
+			list_del_init(&mnt->mnt_child);
+			list_del_init(&mnt->mnt_hash);
+			mnt->mnt_mountpoint->d_mounted--;
+
+			xdentry = mnt->mnt_mountpoint;
+			mnt->mnt_mountpoint = mnt->mnt_root;
+			xdmnt = mnt->mnt_parent;
+			mnt->mnt_parent = mnt;
+
+			spin_unlock(&vfsmount_lock);
+
+			mntput(xdmnt);
+			dput(xdentry);
+
+			/* now lay it to rest if this was the last ref on the
+			 * superblock */
+			if (atomic_read(&mnt->mnt_sb->s_active) == 1) {
+				/* last instance - try to be smart */
+				lock_kernel();
+				DQUOT_OFF(mnt->mnt_sb);
+				acct_auto_close(mnt->mnt_sb);
+				unlock_kernel();
+			}
+
+			mntput(mnt);
+		}
+		else {
+			/* someone brought it back to life whilst we didn't
+			 * have any locks held so return it to the expiration
+			 * list */
+			list_add_tail(&mnt->mnt_fslink, mounts);
+			spin_unlock(&vfsmount_lock);
+		}
+
+		up_write(&namespace->sem);
+
+		mntput(mnt);
+		put_namespace(namespace);
+
+		spin_lock(&vfsmount_lock);
+	}
+
+	spin_unlock(&vfsmount_lock);
+}
+
+EXPORT_SYMBOL_GPL(mark_mounts_for_expiry);
+
 int copy_mount_options (const void __user *data, unsigned long *where)
 {
 	int i;
@@ -860,7 +1029,7 @@ long do_mount(char * dev_name, char * dir_name, char *type_page,
 	else if (flags & MS_MOVE)
 		retval = do_move_mount(&nd, dev_name);
 	else
-		retval = do_add_mount(&nd, type_page, flags, mnt_flags,
+		retval = do_new_mount(&nd, type_page, flags, mnt_flags,
 				      dev_name, data_page);
 dput_out:
 	path_release(&nd);
@@ -1185,6 +1354,7 @@ static void __init init_mount_tree(void)
 	init_rwsem(&namespace->sem);
 	list_add(&mnt->mnt_list, &namespace->list);
 	namespace->root = mnt;
+	mnt->mnt_namespace = namespace;
 
 	init_task.namespace = namespace;
 	read_lock(&tasklist_lock);
@@ -1252,8 +1422,15 @@ void __init mnt_init(unsigned long mempages)
 
 void __put_namespace(struct namespace *namespace)
 {
+	struct vfsmount *mnt;
+
 	down_write(&namespace->sem);
 	spin_lock(&vfsmount_lock);
+
+	list_for_each_entry(mnt, &namespace->list, mnt_list) {
+		mnt->mnt_namespace = NULL;
+	}
+
 	umount_tree(namespace->root);
 	spin_unlock(&vfsmount_lock);
 	up_write(&namespace->sem);
diff --git a/fs/super.c b/fs/super.c
index ca60ada578b55e..38c23bf558d498 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -793,6 +793,7 @@ do_kern_mount(const char *fstype, int flags, const char *name, void *data)
 	mnt->mnt_root = dget(sb->s_root);
 	mnt->mnt_mountpoint = sb->s_root;
 	mnt->mnt_parent = mnt;
+	mnt->mnt_namespace = current->namespace;
 	up_write(&sb->s_umount);
 	put_filesystem(type);
 	return mnt;
@@ -809,6 +810,8 @@ out:
 	return (struct vfsmount *)sb;
 }
 
+EXPORT_SYMBOL_GPL(do_kern_mount);
+
 struct vfsmount *kern_mount(struct file_system_type *type)
 {
 	return do_kern_mount(type->name, 0, type->name, NULL);
author	David Howells <dhowells@redhat.com>	2004-07-10 19:30:11 -0700
committer	Linus Torvalds <torvalds@ppc970.osdl.org>	2004-07-10 19:30:11 -0700
commit	c8a6ba011918486195870aab9a8e3d61c76c1b02 (patch)
tree	896c5960db3db754d7364802486044fa907eed50 /fs
parent	694effe72032db1384b5a084977b8bbb2c3cdaa6 (diff)
download	history-c8a6ba011918486195870aab9a8e3d61c76c1b02.tar.gz