From: Andrew Morton <akpm@osdl.org>

Cc: Robert Love <rml@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 Documentation/filesystems/inotify.txt |   91 +++++++++++++++++++++-------------
 1 files changed, 57 insertions(+), 34 deletions(-)

diff -puN Documentation/filesystems/inotify.txt~inotify-faq-fds Documentation/filesystems/inotify.txt
--- 25/Documentation/filesystems/inotify.txt~inotify-faq-fds	2005-06-21 13:39:09.000000000 -0700
+++ 25-akpm/Documentation/filesystems/inotify.txt	2005-06-21 13:39:09.000000000 -0700
@@ -83,41 +83,64 @@ See fs/inotify.c for the locking and lif
 
 (iii) Rationale
 
-Q: What is the design decision behind not tying the watch to the
-open fd of the watched object?
+Q: What is the design decision behind not tying the watch to the open fd of
+   the watched object?
 
-A: Watches are associated with an open inotify device, not an
-open file.  This solves the primary problem with dnotify:
-keeping the file open pins the file and thus, worse, pins the
-mount.  Dnotify is therefore infeasible for use on a desktop
-system with removable media as the media cannot be unmounted.
-
-Q: What is the design decision behind using an-fd-per-device as
-opposed to an fd-per-watch?
-
-A: An fd-per-watch quickly consumes more file descriptors than
-are allowed, more fd's than are feasible to manage, and more
-fd's than are ideally select()-able.  Yes, root can bump the
-per-process fd limit and yes, users can use epoll, but requiring
-both is silly and an extraneous requirement.  A watch consumes
-less memory than an open file, separating the number spaces is
-thus sensible.  The current design is what user-space developers
-want: Users open the device, once, and add n watches, requiring
-but one fd and no twiddling with fd limits.
-Opening /dev/inotify two thousand times is silly.  If we can
-implement user-space's preferences cleanly--and we can, the idr
-layer makes stuff like this trivial--then we should.
+A: Watches are associated with an open inotify device, not an open file.
+   This solves the primary problem with dnotify: keeping the file open pins
+   the file and thus, worse, pins the mount.  Dnotify is therefore infeasible
+   for use on a desktop system with removable media, as the media cannot be
+   unmounted.
+
+Q: What is the design decision behind using an fd-per-device as opposed to
+   an fd-per-watch?
+
+A: An fd-per-watch quickly consumes more file descriptors than are allowed,
+   more fd's than are feasible to manage, and more fd's than are ideally
+   select()-able.  Yes, root can bump the per-process fd limit and yes, users
+   can use epoll, but requiring both is a silly, extraneous burden.
+   A watch consumes less memory than an open file; separating the number
+   spaces is thus sensible.  The current design is what user-space developers
+   want: users open the device once and add n watches, requiring but one fd
+   and no twiddling with fd limits.  Opening /dev/inotify two thousand times
+   is silly.  If we can implement user-space's preferences cleanly--and we
+   can, the idr layer makes stuff like this trivial--then we should.
+
+   There are other good arguments.  With a single fd, there is a single
+   item to block on, which is mapped to a single queue of events.  The single
+   fd returns all watch events and also any potential out-of-band data.  If
+   every fd were a separate watch,
+
+   - There would be no way to get event ordering.  Events on file foo and
+     file bar would pop poll() on both fd's, but there would be no way to tell
+     which happened first.  A single queue trivially gives you ordering.
+
+   - We'd have to maintain n fd's and n internal queues with state,
+     versus just one.  It is a lot messier in the kernel.
+
+   - User-space developers prefer the current API.  The Beagle guys, for
+     example, love it.  Trust me, I asked.  It is not a surprise: Who'd want
+     to manage and block on 1000 fd's?
+
+   - You'd have to manage the fd's; for example, call close() when you
+     receive a delete event.
+
+   - There would be no way to get out-of-band data.
+
+   - The default limit of 1024 fd's is still too low.  ;-)
+
+   When you talk about designing a file change notification system that
+   scales to 1000s of directories, juggling 1000s of fd's just does not seem
+   like the right interface.  It is too heavy.
 
 Q: Why a device node?
 
-A: The second biggest problem with dnotify is that the user
-interface sucks ass.  Signals are a terrible, terrible interface
-for file notification.  Or for anything, for that matter.  The
-idea solution, from all perspectives, is a file descriptor based
-one that allows basic file I/O and poll/select.  Obtaining the
-fd and managing the watches could of been done either via a
-device file or a family of new system calls.  We decided to
-implement a device file because adding three or four new system
-calls that mirrored open, close, and ioctl seemed silly.  A
-character device makes sense from user-space and was easy to
-implement inside of the kernel.
+A: The second biggest problem with dnotify is that the user interface sucks
+   ass.  Signals are a terrible, terrible interface for file notification.  Or
+   for anything, for that matter.  The ideal solution, from all perspectives,
+   is a file-descriptor-based one that allows basic file I/O and poll/select.
+   Obtaining the fd and managing the watches could have been done either via
+   a device file or a family of new system calls.  We decided to implement a
+   device file because adding three or four new system calls that mirrored
+   open, close, and ioctl seemed silly.  A character device makes sense from
+   user-space and was easy to implement inside of the kernel.
_