From: David Howells The attached patch reworks the CacheFS documentation to reflect the new split between CacheFS and FS-Cache. Signed-Off-By: David Howells Signed-off-by: Andrew Morton --- /dev/null | 881 -------------- 25-akpm/Documentation/filesystems/caching/backend-api.txt | 317 +++++ 25-akpm/Documentation/filesystems/caching/cachefs.txt | 274 ++++ 25-akpm/Documentation/filesystems/caching/fscache.txt | 94 + 25-akpm/Documentation/filesystems/caching/netfs-api.txt | 583 +++++++++ 5 files changed, 1268 insertions(+), 881 deletions(-) diff -L Documentation/filesystems/cachefs.txt -puN Documentation/filesystems/cachefs.txt~rework-the-cachefs-documentation-to-reflect-fs-cache-split /dev/null --- 25/Documentation/filesystems/cachefs.txt +++ /dev/null Thu Apr 11 07:25:15 2002 @@ -1,892 +0,0 @@ - =========================== - CacheFS: Caching Filesystem - =========================== - -======== -OVERVIEW -======== - -CacheFS is a general purpose cache for network filesystems, though it could be -used for caching other things such as ISO9660 filesystems too. - -CacheFS uses a block device directly rather than a bunch of files under an -already mounted filesystem. For why this is so, see further on. If necessary, -however, a file can be loopback mounted as a cache. - -CacheFS does not follow the idea of completely loading every netfs file opened -into the cache before it can be operated upon, and then serving the pages out -of CacheFS rather than the netfs because: - - (1) It must be practical to operate without a cache. - - (2) The size of any accessible file must not be limited to the size of the - cache. - - (3) The combined size of all opened files (this includes mapped libraries) - must not be limited to the size of the cache. - - (4) The user should not be forced to download an entire file just to do a - one-off access of a small portion of it. - -It rather serves the cache out in PAGE_SIZE chunks as and when requested by -the netfs('s) using it. - - -CacheFS provides the following facilities: - - (1) More than one block device can be mounted as a cache. - - (2) Caches can be mounted / unmounted at any time. - - (3) The netfs is provided with an interface that allows either party to - withdraw caching facilities from a file (required for (2)). - - (4) The interface to the netfs returns as few errors as possible, preferring - rather to let the netfs remain oblivious. - - (5) Cookies are used to represent files and indexes to the netfs. The simplest - cookie is just a NULL pointer - indicating nothing cached there. - - (6) The netfs is allowed to propose - dynamically - any index hierarchy it - desires, though it must be aware that the index search function is - recursive and stack space is limited. - - (7) Data I/O is done direct to and from the netfs's pages. The netfs indicates - that page A is at index B of the data-file represented by cookie C, and - that it should be read or written. CacheFS may or may not start I/O on - that page, but if it does, a netfs callback will be invoked to indicate - completion. - - (8) Cookies can be "retired" upon release. At this point CacheFS will mark - them as obsolete and the index hierarchy rooted at that point will get - recycled. - - (9) The netfs provides a "match" function for index searches. In addition to - saying whether a match was made or not, this can also specify that an - entry should be updated or deleted. - -(10) All metadata modifications (this includes index contents) are performed - as journalled transactions. 
These are replayed on mounting. - - -============================================= -WHY A BLOCK DEVICE? WHY NOT A BUNCH OF FILES? -============================================= - -CacheFS is backed by a block device rather than being backed by a bunch of -files on a filesystem. This confers several advantages: - - (1) Performance. - - Going directly to a block device means that we can DMA directly to/from - the the netfs's pages. If another filesystem was managing the backing - store, everything would have to be copied between pages. Whilst DirectIO - does exist, it doesn't appear easy to make use of in this situation. - - New address space or file operations could be added to make it possible to - persuade a backing discfs to generate block I/O directly to/from disc - blocks under its control, but that then means the discfs has to keep track - of I/O requests to pages not under its control. - - Furthermore, we only have to do one lot of readahead calculations, not - two; in the discfs backing case, the netfs would do one and the discfs - would do one. - - (2) Memory. - - Using a block device means that we have a lower memory usage - all data - pages belong to the netfs we're backing. If we used a filesystem, we would - have twice as many pages at certain points - one from the netfs and one - from the backing discfs. In the backing discfs model, under situations of - memory pressure, we'd have to allocate or keep around a discfs page to be - able to write out a netfs page; or else we'd need to be able to punch a - hole in the backing file. - - Furthermore, whilst we have to keep a CacheFS inode around in memory for - every netfs inode we're backing, a backing discfs would have to keep the - dentry and possibly a file struct too. - - (3) Holes. - - The cache uses holes to indicate to the netfs that it hasn't yet - downloaded the data for that page. - - Since CacheFS is its own filesystem, it can support holes in files - trivially. Running on top of another discfs would limit us to using ones - that can support holes. - - Furthermore, it would have to be made possible to detect holes in a discfs - file, rather than just seeing zero filled blocks. - - (4) Data Consistency. - - Cachefs uses a pair of journals to keep track of the state of the cache - and all the pages contained therein. This means that it doesn't get into - an inconsistent state in the on-disc cache and it doesn't lose disc space. - - CacheFS takes especial care between the allocation of a block and its - splicing into the on-disc pointer tree, and the data having been written - to disc. If power is interrupted and then restored, the journals are - replayed and if it is seen that a block was allocated but not written it - is then punched out. Being backed by a discfs, I'm not certain what will - happen. It may well be possible to mark a discfs's journal, if it has one, - but how does the discfs deal with those marks? This also limits consistent - caching to running on journalled discfs's where there's a function to - write extraordinary marks into the journal. - - The alternative would be to keep flags in the superblock, and to - re-initialise the cache if it wasn't cleanly unmounted. - - Knowing that your cache is in a good state is vitally important if you, - say, put /usr on AFS. Some organisations put everything barring /etc, - /sbin, /lib and /var on AFS and have an enormous cache on every - computer. 
Imagine if the power goes out and renders every cache - inconsistent, requiring all the computers to re-initialise their caches - when the power comes back on... - - (5) Recycling. - - Recycling is simple on CacheFS. It can just scan the metadata index to - look for inodes that require reclamation/recycling; and it can also build - up a list of the least recently used inodes so that they can be reclaimed - later to make space. - - Doing this on a discfs would require a search going down through a nest - of directories, and would probably have to be done in userspace. - - (6) Disc Space. - - Whilst the block device does set a hard ceiling on the amount of space - available, CacheFS can guarantee that all that space will be available to - the cache. On a discfs-backed cache, the administrator would probably want - to set a cache size limit, but the system wouldn't be able guarantee that - all that space would be available to the cache - not unless that cache was - on a partition of its own. - - Furthermore, with a discfs-backed cache, if the recycler starts to reclaim - cache files to make space, the freed blocks may just be eaten directly by - userspace programs, potentially resulting in the entire cache being - consumed. Alternatively, netfs operations may end up being held up because - the cache can't get blocks on which to store the data. - - (7) Users. - - Users can't so easily go into CacheFS and run amok. The worst they can do - is cause bits of the cache to be recycled early. With a discfs-backed - cache, they can do all sorts of bad things to the files belonging to the - cache, and they can do this quite by accident. - - -On the other hand, there would be some advantages to using a file-based cache -rather than a blockdev-based cache: - - (1) Having to copy to a discfs's page would mean that a netfs could just make - the copy and then assume its own page is ready to go. - - (2) Backing onto a discfs wouldn't require a committed block device. You would - just nominate a directory and go from there. With CacheFS you have to - repartition or install an extra drive to make use of it in an existing - system (though the loopback device offers a way out). - - (3) CacheFS requires the netfs to store a key in any pertinent index entry, - and it also permits a limited amount arbitrary data to be stored there. - - A discfs could be requested to store the netfs's data in xattrs, and the - filename could be used to store the key, though the key would have to be - rendered as text not binary. Likewise indexes could be rendered as - directories with xattrs. - - (4) You could easily make your cache bigger if the discfs has plenty of space, - you could even go across multiple mountpoints. - - -====================== -GENERAL ON-DISC LAYOUT -====================== - -The filesystem is divided into a number of parts: - - 0 +---------------------------+ - | Superblock | - 1 +---------------------------+ - | Update Journal | - +---------------------------+ - | Validity Journal | - +---------------------------+ - | Write-Back Journal | - +---------------------------+ - | | - | Data | - | | - END +---------------------------+ - -The superblock contains the filesystem ID tags and pointers to all the other -regions. - -The update journal consists of a set of entries of sector size that keep track -of what changes have been made to the on-disc filesystem, but not yet -committed. - -The validity journal contains records of data blocks that have been allocated -but not yet written. 
Upon journal replay, all these blocks will be detached -from their pointers and recycled. - -The writeback journal keeps track of changes that have been made locally to -data blocks, but that have not yet been committed back to the server. This is -not yet implemented. - -The journals are replayed upon mounting to make sure that the cache is in a -reasonable state. - -The data region holds a number of things: - - (1) Index Files - - These are files of entries used by CacheFS internally and by filesystems - that wish to cache data here (such as AFS) to keep track of what's in - the cache at any given time. - - The first index file (inode 1) is special. It holds the CacheFS-specific - metadata for every file in the cache (including direct, single-indirect - and double-indirect block pointers). - - The second index file (inode 2) is also special. It has an entry for - each filesystem that's currently holding data in this cache. - - Every allocated entry in an index has an inode bound to it. This inode is - either another index file or it is a data file. - - (2) Cached Data Files - - These are caches of files from remote servers. Holes in these files - represent blocks not yet obtained from the server. - - (3) Indirection Blocks - - Should a file have more blocks than can be pointed to by the few - pointers in its storage management record, then indirection blocks will - be used to point to further data or indirection blocks. - - Three levels of indirection are currently supported: - - - single indirection - - double indirection - - (4) Allocation Nodes and Free Blocks - - The free blocks of the filesystem are kept in two single-branched - "trees". One tree is the blocks that are ready to be allocated, and the - other is the blocks that have just been recycled. When the former tree - becomes empty, the latter tree is decanted across. - - Each tree is arranged as a chain of "nodes", each node points to the next - node in the chain (unless it's at the end) and also up to 1022 free - blocks. - -Note that all blocks are PAGE_SIZE in size. The blocks are numbered starting -with the superblock at 0. Using 32-bit block pointers, a maximum number of -0xffffffff blocks can be accessed, meaning that the maximum cache size is ~16TB -for 4KB pages. - - -======== -MOUNTING -======== - -Since CacheFS is actually a quasi-filesystem, it requires a block device behind -it. The way to give it one is to mount it as cachefs type on a directory -somewhere. The mounted filesystem will then present the user with a set of -directories outlining the index structure resident in the cache. Indexes -(directories) and files can be turfed out of the cache by the sysadmin through -the use of rmdir and unlink. - -For instance, if a cache contains AFS data, the user might see the following: - - root>mount -t cachefs /dev/hdg9 /cache-hdg9 - root>ls -1 /cache-hdg9 - afs - root>ls -1 /cache-hdg9/afs - cambridge.redhat.com - root>ls -1 /cache-hdg9/afs/cambridge.redhat.com - root.afs - root.cell - -However, a block device that's going to be used for a cache must be prepared -before it can be mounted initially. This is done very simply by: - - echo "cachefs___" >/dev/hdg9 - -During the initial mount, the basic structure will be scribed into the cache, -and then a background thread will "recycle" the as-yet unused data blocks. - - -====================== -NETWORK FILESYSTEM API -====================== - -There is, of course, an API by which a network filesystem can make use of the -CacheFS facilities. 
This is based around a number of principles: - - (1) Every file and index is represented by a cookie. This cookie may or may - not have anything associated with it, but the netfs doesn't need to care. - - (2) Barring the top-level index (one entry per cached netfs), the index - hierarchy for each netfs is structured according the whim of the netfs. - - (3) Any netfs page being backed by the cache must have a small token - associated with it (possibly pointed to by page->private) so that CacheFS - can keep track of it. - -This API is declared in . - - -NETWORK FILESYSTEM DEFINITION ------------------------------ - -CacheFS needs a description of the network filesystem. This is specified using -a record of the following structure: - - struct cachefs_netfs { - const char *name; - unsigned version; - struct cachefs_netfs_operations *ops; - struct cachefs_cookie *primary_index; - ... - }; - -This first three fields should be filled in before registration, and the fourth -will be filled in by the registration function; any other fields should just be -ignored and are for internal use only. - -The fields are: - - (1) The name of the netfs (used as the key in the toplevel index). - - (2) The version of the netfs (if the name matches but the version doesn't, the - entire on-disc hierarchy for this netfs will be scrapped and begun - afresh). - - (3) The operations table is defined as follows: - - struct cachefs_netfs_operations { - struct cachefs_page *(*get_page_cookie)(struct page *page); - }; - - The functions here must all be present. Currently the only one is: - - (a) get_page_cookie(): Get the token used to bind a page to a block in a - cache. This function should allocate it if it doesn't exist. - - Return -ENOMEM if there's not enough memory and -ENODATA if the page - just shouldn't be cached. - - Set *_page_cookie to point to the token and return 0 if there is now a - cookie. Note that the netfs must keep track of the cookie itself (and - free it later). page->private can be used for this (see below). - - (4) The cookie representing the primary index will be allocated according to - another parameter passed into the registration function. - -For example, kAFS (linux/fs/afs/) uses the following definitions to describe -itself: - - static struct cachefs_netfs_operations afs_cache_ops = { - .get_page_cookie = afs_cache_get_page_cookie, - }; - - struct cachefs_netfs afs_cache_netfs = { - .name = "afs", - .version = 0, - .ops = &afs_cache_ops, - }; - - -INDEX DEFINITION ----------------- - -Indexes are used for two purposes: - - (1) To speed up the finding of a file based on a series of keys (such as AFS's - "cell", "volume ID", "vnode ID"). - - (2) To make it easier to discard a subset of all the files cached based around - a particular key - for instance to mirror the removal of an AFS volume. - -However, since it's unlikely that any two netfs's are going to want to define -their index hierarchies in quite the same way, CacheFS tries to impose as few -restraints as possible on how an index is structured and where it is placed in -the tree. The netfs can even mix indexes and data files at the same level, but -it's not recommended. - -There are some limits on indexes: - - (1) All entries in any given index must be the same size. An array of such - entries needn't fit exactly into a page, but they will be not laid across - a page boundary. - - The netfs supplies a blob of data for each index entry, and CacheFS - provides an inode number and a flag. 
- - (2) The entries in one index can be of a different size to the entries in - another index. - - (3) The entry data must be journallable, and thus must be able to fit into an - update journal entry - this limits the maximum size to a little over 400 - bytes at present. - - (4) The index data must start with the key. The layout of the key is described - in the index definition, and this is used to display the key in some - appropriate way. - - (5) The depth of the index tree should be judged with care as the search - function is recursive. Too many layers will run the kernel out of stack. - -To define an index, a structure of the following type should be filled out: - - struct cachefs_index_def - { - uint8_t name[8]; - uint16_t data_size; - struct { - uint8_t type; - uint16_t len; - } keys[4]; - - cachefs_match_val_t (*match)(void *target_netfs_data, - const void *entry); - - void (*update)(void *source_netfs_data, void *entry); - }; - -This has the following fields: - - (1) The name of the index (NUL terminated unless all 8 chars are used). - - (2) The size of the data blob provided by the netfs. - - (3) A definition of the key(s) at the beginning of the blob. The netfs is - permitted to specify up to four keys. The total length must not exceed the - data size. It is assumed that the keys will be laid end to end in order, - starting at the first byte of the data. - - The type field specifies the way the data should be displayed. It can be - one of: - - (*) CACHEFS_INDEX_KEYS_NOTUSED - key field not used - (*) CACHEFS_INDEX_KEYS_BIN - display byte-by-byte in hex - (*) CACHEFS_INDEX_KEYS_ASCIIZ - NUL-terminated ASCII - (*) CACHEFS_INDEX_KEYS_IPV4ADDR - display as IPv4 address - (*) CACHEFS_INDEX_KEYS_IPV6ADDR - display as IPv6 address - - (4) A function to compare an in-page-cache index entry blob with the data - passed to the cookie acquisition function. This function can also be used - to extract data from the blob and copy it into the netfs's structures. - - The values this function can return are: - - (*) CACHEFS_MATCH_FAILED - failed to match - (*) CACHEFS_MATCH_SUCCESS - successful match - (*) CACHEFS_MATCH_SUCCESS_UPDATE - successful match, entry needs update - (*) CACHEFS_MATCH_SUCCESS_DELETE - entry should be deleted - - For example, in linux/fs/afs/vnode.c: - - static cachefs_match_val_t - afs_vnode_cache_match(void *target, const void *entry) - { - const struct afs_cache_vnode *cvnode = entry; - struct afs_vnode *vnode = target; - - if (vnode->fid.vnode != cvnode->vnode_id) - return CACHEFS_MATCH_FAILED; - - if (vnode->fid.unique != cvnode->vnode_unique || - vnode->status.version != cvnode->data_version) - return CACHEFS_MATCH_SUCCESS_DELETE; - - return CACHEFS_MATCH_SUCCESS; - } - - (5) A function to initialise or update an in-page-cache index entry blob from - netfs data passed to CacheFS by the netfs. This function should not assume - that there's any data yet in the in-page-cache. 
- - Continuing the above example: - - static void afs_vnode_cache_update(void *source, void *entry) - { - struct afs_cache_vnode *cvnode = entry; - struct afs_vnode *vnode = source; - - cvnode->vnode_id = vnode->fid.vnode; - cvnode->vnode_unique = vnode->fid.unique; - cvnode->data_version = vnode->status.version; - } - -To finish the above example, the index definition for the "vnode" level is as -follows: - - struct cachefs_index_def afs_vnode_cache_index_def = { - .name = "vnode", - .data_size = sizeof(struct afs_cache_vnode), - .keys[0] = { CACHEFS_INDEX_KEYS_BIN, 4 }, - .match = afs_vnode_cache_match, - .update = afs_vnode_cache_update, - }; - -The first element of struct afs_cache_vnode is the vnode ID. - -And for contrast, the cell index definition is: - - struct cachefs_index_def afs_cache_cell_index_def = { - .name = "cell_ix", - .data_size = sizeof(afs_cell_t), - .keys[0] = { CACHEFS_INDEX_KEYS_ASCIIZ, 64 }, - .match = afs_cell_cache_match, - .update = afs_cell_cache_update, - }; - -The cell index is the primary index for kAFS. - - -NETWORK FILESYSTEM (UN)REGISTRATION ------------------------------------ - -The first step is to declare the network filesystem to the cache. This also -involves specifying the layout of the primary index (for AFS, this would be the -"cell" level). - -The registration function is: - - int cachefs_register_netfs(struct cachefs_netfs *netfs, - struct cachefs_index_def *primary_idef); - -It just takes pointers to the netfs definition and the primary index -definition. It returns 0 or an error as appropriate. - -For kAFS, registration is done as follows: - - ret = cachefs_register_netfs(&afs_cache_netfs, - &afs_cache_cell_index_def); - -The last step is, of course, unregistration: - - void cachefs_unregister_netfs(struct cachefs_netfs *netfs); - - -INDEX REGISTRATION ------------------- - -The second step is to inform cachefs about part of an index hierarchy that can -be used to locate files. This is done by requesting a cookie for each index in -the path to the file: - - struct cachefs_cookie * - cachefs_acquire_cookie(struct cachefs_cookie *iparent, - struct cachefs_index_def *idef, - void *netfs_data); - -This function creates an index entry in the index represented by iparent, -loading the associated blob by calling iparent's update method with the -supplied netfs_data. - -It also creates a new index inode, formatted according to the definition -supplied in idef. The new cookie is then returned in *_cookie. - -Note that this function never returns an error - all errors are handled -internally. It may also return CACHEFS_NEGATIVE_COOKIE. It is quite acceptable -to pass this token back to this function as iparent (or even to the relinquish -cookie, read page and write page functions - see below). - -Note also that no indexes are actually created on disc until a data file needs -to be created somewhere down the hierarchy. Furthermore, an index may be -created in several different caches independently at different times. This is -all handled transparently, and the netfs doesn't see any of it. - -For example, with AFS, a cell would be added to the primary index. 
This index -entry would have a dependent inode containing a volume location index for the -volume mappings within this cell: - - cell->cache = - cachefs_acquire_cookie(afs_cache_netfs.primary_index, - &afs_vlocation_cache_index_def, - cell); - -Then when a volume location was accessed, it would be entered into the cell's -index and an inode would be allocated that acts as a volume type and hash chain -combination: - - vlocation->cache = - cachefs_acquire_cookie(cell->cache, - &afs_volume_cache_index_def, - vlocation); - -And then a particular flavour of volume (R/O for example) could be added to -that index, creating another index for vnodes (AFS inode equivalents): - - volume->cache = - cachefs_acquire_cookie(vlocation->cache, - &afs_vnode_cache_index_def, - volume); - - -DATA FILE REGISTRATION ----------------------- - -The third step is to request a data file be created in the cache. This is -almost identical to index cookie acquisition. The only difference is that a -NULL index definition is passed. - - vnode->cache = - cachefs_acquire_cookie(volume->cache, - NULL, - vnode); - - - -PAGE ALLOC/READ/WRITE ---------------------- - -And the fourth step is to propose a page be cached. There are two functions -that are used to do this. - -Firstly, the netfs should ask CacheFS to examine the caches and read the -contents cached for a particular page of a particular file if present, or else -allocate space to store the contents if not: - - typedef - void (*cachefs_rw_complete_t)(void *cookie_data, - struct page *page, - void *end_io_data, - int error); - - int cachefs_read_or_alloc_page(struct cachefs_cookie *cookie, - struct page *page, - cachefs_rw_complete_t end_io_func, - void *end_io_data, - unsigned long gfp); - -The cookie argument must specify a data file cookie, the page specified will -have the data loaded into it (and is also used to specify the page number), and -the gfp argument is used to control how any memory allocations made are satisfied. - -If the cookie indicates the inode is not cached: - - (1) The function will return -ENOBUFS. - -Else if there's a copy of the page resident on disc: - - (1) The function will submit a request to read the data off the disc directly - into the page specified. - - (2) The function will return 0. - - (3) When the read is complete, end_io_func() will be invoked with: - - (*) The netfs data supplied when the cookie was created. - - (*) The page descriptor. - - (*) The data passed to the above function. - - (*) An argument that's 0 on success or negative for an error. - - If an error occurs, it should be assumed that the page contains no usable - data. - -Otherwise, if there's not a copy available on disc: - - (1) A block may be allocated in the cache and attached to the inode at the - appropriate place. - - (2) The validity journal will be marked to indicate this page does not yet - contain valid data. - - (3) The function will return -ENODATA. - - -Secondly, if the netfs changes the contents of the page (either due to an -initial download or if a user performs a write), then the page should be -written back to the cache: - - int cachefs_write_page(struct cachefs_cookie *cookie, - struct page *page, - cachefs_rw_complete_t end_io_func, - void *end_io_data, - unsigned long gfp); - -The cookie argument must specify a data file cookie, the page specified should -contain the data to be written (and is also used to specify the page number), -and the gfp argument is used to control how any memory allocations made are -satisfied. 
- -If the cookie indicates the inode is not cached then: - - (1) The function will return -ENOBUFS. - -Else if there's a block allocated on disc to hold this page: - - (1) The function will submit a request to write the data to the disc directly - from the page specified. - - (2) The function will return 0. - - (3) When the write is complete: - - (a) Any associated validity journal entry will be cleared (the block now - contains valid data as far as CacheFS is concerned). - - (b) end_io_func() will be invoked with: - - (*) The netfs data supplied when the cookie was created. - - (*) The page descriptor. - - (*) The data passed to the above function. - - (*) An argument that's 0 on success or negative for an error. - - If an error happens, it can be assumed that the page has been - discarded from the cache. - - -PAGE UNCACHING --------------- - -To uncache a page, this function should be called: - - void cachefs_uncache_page(struct cachefs_cookie *cookie, - struct page *page); - -This detaches the page specified from the data file indicated by the cookie and -unbinds it from the underlying block. - -Note that pages can't be explicitly detached from the a data file. The whole -data file must be retired (see the relinquish cookie function below). - -Furthermore, note that this does not cancel the asynchronous read or write -operation started by the read/alloc and write functions. - - -INDEX AND DATA FILE UPDATE --------------------------- - -To request an update of the index data for an index or data file, the following -function should be called: - - void cachefs_update_cookie(struct cachefs_cookie *cookie); - -This function will refer back to the netfs_data pointer stored in the cookie by -the acquisition function to obtain the data to write into each revised index -entry. The update method in the parent index definition will be called to -transfer the data. - - -INDEX AND DATA FILE UNREGISTRATION ----------------------------------- - -To get rid of a cookie, this function should be called. - - void cachefs_relinquish_cookie(struct cachefs_cookie *cookie, - int retire); - -If retire is non-zero, then the index or file will be marked for recycling, and -all copies of it will be removed from all active caches in which it is present. - -If retire is zero, then the inode may be available again next the the -acquisition function is called. - -One very important note - relinquish must NOT be called unless all "child" -indexes, files and pages have been relinquished first. - - -PAGE TOKEN MANAGEMENT ---------------------- - -As previously mentioned, the netfs must keep a token associated with each page -currently actively backed by the cache. This is used by CacheFS to go from a -page to the internal representation of the underlying block and back again. It -is particularly important for managing the withdrawal of a cache whilst it is -in active service (eg: it got unmounted). - -The token is this: - - struct cachefs_page { - ... - }; - -Note that all fields are for internal CacheFS use only. - -The token only needs to be allocated when CacheFS asks for it. This it will do -by calling the get_page_cookie() method in the netfs definition ops table. Once -allocated, the same token should be presented every time the method is called -again for a particular page. - -The token should be retained by the netfs, and should be deleted only after the -page has been uncached. - -One way to achieve this is to attach the token to page->private (and set the -PG_private bit on the page) once allocated. 
Shortcut routines are provided by -CacheFS to do this. Firstly, to retrieve if present and allocate if not: - - struct cachefs_page *cachefs_page_get_private(struct page *page, - unsigned gfp); - -Secondly to retrieve if present and BUG if not: - - static inline - struct cachefs_page *cachefs_page_grab_private(struct page *page); - -To clean up the tokens, the netfs inode hosting the page should be provided -with address space operations that circumvent the buffer-head operations for a -page. For instance: - - struct address_space_operations afs_fs_aops = { - ... - .sync_page = block_sync_page, - .set_page_dirty = __set_page_dirty_nobuffers, - .releasepage = afs_file_releasepage, - .invalidatepage = afs_file_invalidatepage, - }; - - static int afs_file_invalidatepage(struct page *page, - unsigned long offset) - { - struct afs_vnode *vnode = AFS_FS_I(page->mapping->host); - int ret = 1; - - BUG_ON(!PageLocked(page)); - if (!PagePrivate(page)) - return 1; - cachefs_uncache_page(vnode->cache,page); - if (offset == 0) - return 1; - BUG_ON(!PageLocked(page)); - if (PageWriteback(page)) - return 0; - return page->mapping->a_ops->releasepage(page, 0); - } - - static int afs_file_releasepage(struct page *page, int gfp_flags) - { - struct cachefs_page *token; - struct afs_vnode *vnode = AFS_FS_I(page->mapping->host); - - if (PagePrivate(page)) { - cachefs_uncache_page(vnode->cache, page); - token = (struct cachefs_page *) page->private; - page->private = 0; - ClearPagePrivate(page); - if (token) - kfree(token); - } - return 0; - } - - -INDEX AND DATA FILE INVALIDATION --------------------------------- - -There is no direct way to invalidate an index subtree or a data file. To do -this, the caller should relinquish and retire the cookie they have, and then -acquire a new one. diff -puN /dev/null Documentation/filesystems/caching/backend-api.txt --- /dev/null Thu Apr 11 07:25:15 2002 +++ 25-akpm/Documentation/filesystems/caching/backend-api.txt Wed Oct 6 16:03:29 2004 @@ -0,0 +1,317 @@ + ========================== + FS-CACHE CACHE BACKEND API + ========================== + +The FS-Cache system provides an API by which actual caches can be supplied to +FS-Cache for it to then serve out to network filesystems and other interested +parties.: + +This API is declared in . + + +==================================== +INITIALISING AND REGISTERING A CACHE +==================================== + +To start off, a cache definition must be initialised and registered for each +cache the backend wants to make available. For instance, CacheFS does this in +the fill_super() operation on mounting. + +The cache definition (struct fscache_cache) should be initialised by calling: + + void fscache_init_cache(struct fscache_cache *cache, + struct fscache_cache_ops *ops, + unsigned fsdef_ino, + const char *idfmt, + ...) + +Where: + + (*) "cache" is a pointer to the cache definition; + + (*) "ops" is a pointer to the table of operations that the backend supports on + this cache; + + (*) "fsdef_ino" is the reference number of the FileSystem DEFinition index + (the top-level index), which in CacheFS is its inode number; + + (*) and a format and printf-style arguments for constructing a label for the + cache. 
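+
+As a purely illustrative sketch - the mycache_sb structure, the
+mycache_cache_ops table and the MYCACHE_FSDEF_INO constant are hypothetical,
+and only fscache_init_cache() itself is part of the API described here - a
+blockdev-based backend might initialise its cache definition like so:
+
+	struct mycache_sb {
+		struct fscache_cache	cache;	/* FS-Cache's view of this cache */
+		/* backend-private fields follow */
+	};
+
+	static struct fscache_cache_ops mycache_cache_ops = {
+		.name	= "mycache",
+		/* the mandatory operations listed later in this document
+		 * would be filled in here */
+	};
+
+	static void mycache_describe_cache(struct super_block *sb)
+	{
+		struct mycache_sb *msb = sb->s_fs_info;
+
+		/* the label is constructed printf-style from the format
+		 * string and its arguments */
+		fscache_init_cache(&msb->cache,
+				   &mycache_cache_ops,
+				   MYCACHE_FSDEF_INO,
+				   "mycache:%s",
+				   sb->s_id);
+	}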
+ + +The cache should then be registered with FS-Cache by passing a pointer to the +previously initialised cache definition to: + + void fscache_add_cache(struct fscache_cache *cache) + + +===================== +UNREGISTERING A CACHE +===================== + +A cache can be withdrawn from the system by calling this function with a +pointer to the cache definition: + + void fscache_withdraw_cache(struct fscache_cache *cache) + +In CacheFS's case, this is called by put_super(). + +It is possible to check to see if a cache has been withdrawn by calling: + + int fscache_is_cache_withdrawn(struct fscache_cache *cache) + +Which will return non-zero if it has been, zero if it is still active. + + +================== +FS-CACHE UTILITIES +================== + +FS-Cache provides some utilities that a cache backend may make use of: + + (*) Find parent of node. + + struct fscache_node *fscache_find_parent_node(struct fscache_node *node) + + This allows a backend to find the logical parent of an index or data file + in the cache hierarchy. + + (*) Allocate a page token. + + struct fscache_page *fscache_page_get_private(struct page *page, + unsigned gfp); + + If the page has a page token attached, then this is returned by this + function. If it doesn't have one, then a page token is allocated with the + specified allocation flags and attached to the page's private value. The + error ENOMEM is returned if there's no memory available. + + (*) Grab an existing page token. + + struct fscache_page *fscache_page_grab_private(struct page *page) + + This function returns a pointer to the page token attached to the page's + private value if it exists, and BUG's if it does not. + + +======================== +RELEVANT DATA STRUCTURES +======================== + + (*) Index/Data file FS-Cache representation cookie. + + struct fscache_cookie { + struct fscache_index_def *idef; + struct fscache_netfs *netfs; + void *netfs_data; + ... + }; + + The fields that might be of use to the backend describe the index + definition (indexes only), the netfs definition and the netfs's data for + this cookie. The index definition contains a number of functions supplied + by the netfs for matching index entries; these are required to provide + some of the cache operations. + + (*) Cached search result. + + struct fscache_search_result { + unsigned ino; + ... + }; + + This is used by FS-Cache to keep track of what nodes it has found in what + caches. Some of the cache operations set the "cache node number" held + therein. + + (*) In-cache node representation. + + struct fscache_node { + struct fscache_cookie *cookie; + unsigned long flags; + #define FSCACHE_NODE_ISINDEX 0 + ... + }; + + Structures of this type should be allocated by the cache backend and + passed to FS-Cache when requested by the appropriate cache operation. In + the case of CacheFS, they're embedded in CacheFS's inode structure. + + Each node contains a pointer to the cookie that represents the index or + data file it is backing. It also contains a flag that indicates whether + this is an index or not. This should be initialised by calling + fscache_node_init(node). + + (*) Filesystem definition (FSDEF) index entry representation. + + struct fscache_fsdef_index_entry { + uint8_t name[24]; /* name of netfs */ + uint32_t version; /* version of layout */ + }; + + This structure defines the layout of the data in the FSDEF index + maintained by the FS-Cache facility for distinguishing between the caches + for separate netfs's. 
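+
+As a rough sketch of how a backend might hold these structures - the
+mycache_inode container, the MYCACHE_I() helper and mycache_init_node() are
+hypothetical, fscache_node_init() and FSCACHE_NODE_ISINDEX come from the
+description above, and setting the ISINDEX flag in the backend is an
+assumption - the node representation can be embedded in the backend's own
+inode structure:
+
+	struct mycache_inode {
+		struct fscache_node	node;		/* FS-Cache's handle */
+		struct inode		vfs_inode;	/* the backend's VFS inode */
+	};
+
+	static inline struct mycache_inode *MYCACHE_I(struct inode *inode)
+	{
+		return container_of(inode, struct mycache_inode, vfs_inode);
+	}
+
+	/* called when the backend instantiates an inode to back an index or
+	 * a data file on behalf of FS-Cache */
+	static void mycache_init_node(struct inode *inode, int isindex)
+	{
+		struct fscache_node *node = &MYCACHE_I(inode)->node;
+
+		fscache_node_init(node);
+		if (isindex)
+			set_bit(FSCACHE_NODE_ISINDEX, &node->flags);
+	}
+
+Embedding the node in the backend's inode avoids a separate allocation and
+lets the backend convert between the node pointers passed to the cache
+operations and its own inode with container_of().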
+ + +================ +CACHE OPERATIONS +================ + +The cache backend provides FS-Cache with a table of operations that can be +performed on the denizens of the cache. These are held in a structure of type + + struct fscache_cache_ops + + (*) Name of cache provider [mandatory]. + + const char *name + + This isn't strictly an operation, but should be pointed at a string naming + the backend. + + (*) Node lookup [mandatory]. + + struct fscache_node *(*lookup_node)(struct fscache_cache *cache, + unsigned ino) + + This method is used to turn a logical cache node number into a handle on a + represention of that node. + + (*) Increment node refcount [mandatory]. + + struct fscache_node *(*grab_node)(struct fscache_node *node) + + This method is called to increment the reference count on a node. It may + fail (for instance if the cache is being withdrawn). + + (*) Lock/Unlock node [mandatory]. + + void (*lock_node)(struct fscache_node *node) + void (*unlock_node)(struct fscache_node *node) + + These methods are used to exclusively lock a node. It must be possible to + schedule with the lock held, so a spinlock isn't sufficient. + + (*) Unreference node [mandatory]. + + void (*put_node)(struct fscache_node *node) + + This method is used to discard a reference to a node. The node may be + destroyed when all the references held by FS-Cache are released. + + (*) Search an index [mandatory]. + + int (*index_search)(struct fscache_node *index, + struct fscache_cookie *cookie, + struct fscache_search_result *result) + + This method is called to search an index for a node that matches the + criteria attached to the cookie (cookie->netfs_data). This should be + matched by calling index->cookie->idef->match(). + + The cache backend is responsible for dealing with the match result, + including updating or discarding existing index entries. An index entry + can be updated by calling index->cookie->idef->update(). + + If the search is successful, the node number should be stored in + result->ino and zero returned. If not successful, error ENOENT should be + returned if no entry was found, or some other error otherwise. + + (*) Create a new node [mandatory]. + + int (*index_add)(struct fscache_node *index, + struct fscache_cookie *cookie, + struct fscache_search_result *result) + + This method is called to create a new node on disc and add an entry for it + to the specified index. The index entry for the new node should be + obtained by calling index->cookie->idef->update() and passing it the + argument cookie. + + If successful, the node number should be stored in result->ino and zero + should be returned. + + (*) Update a node [mandatory]. + + int (*index_update)(struct fscache_node *index, + struct fscache_node *node) + + This is called to update the on-disc index entry for the specified + node. The new information should be in node->cookie->netfs_data. This can + be obtained by calling index->cookie->idef->update() and passing it + node->cookie. + + (*) Synchronise a cache to disc [mandatory]. + + void (*sync)(struct fscache_cache *cache) + + This is called to ask the backend to synchronise a cache with disc. + + (*) Dissociate a cache [mandatory]. + + void (*dissociate_pages)(struct fscache_cache *cache) + + This is called to ask the cache to dissociate all netfs pages from + mappings to disc. It is assumed that the backend cache will have some way + of finding all the page tokens that refer to its own blocks. + + (*) Request page be read from cache [mandatory]. 
+ + int (*read_or_alloc_page)(struct fscache_node *node, + struct page *page, + struct fscache_page *pageio, + fscache_rw_complete_t end_io_func, + void *end_io_data, + unsigned long gfp) + + This is called to attempt to read a netfs page from disc, or to allocate a + backing block if not. FS-Cache will have done as much checking as it can + before calling, but most of the work belongs to the backend. + + If there's no page on disc, then -ENODATA should be returned if the + backend managed to allocate a backing block; -ENOBUFS or -ENOMEM if it + didn't. + + If there is a page on disc, then a read operation should be queued and 0 + returned. When the read finishes, end_io_func() should be called with the + following arguments: + + (*end_io_func)(node->cookie->netfs_data, + page, + end_io_data, + error); + + (*) Request page be written to cache [mandatory]. + + int (*write_page)(struct fscache_node *node, + struct page *page, + struct fscache_page *pageio, + fscache_rw_complete_t end_io_func, + void *end_io_data, + unsigned long gfp) + + This is called to write from a page on which there was a previously + successful read_or_alloc_page() call. FS-Cache filters out pages that + don't have mappings. + + If there's no block on disc available, then -ENOBUFS should be returned + (or -ENOMEM if there wasn't any memory to be had). + + If the write operation could be queued, then 0 should be returned. When + the write completes, end_io_func() should be called with the following + arguments: + + (*end_io_func)(node->cookie->netfs_data, + page, + end_io_data, + error); + + (*) Discard mapping [mandatory]. + + void (*uncache_page)(struct fscache_node *node, + struct fscache_page *page_token) + + This is called when a page is being booted from the pagecache. The cache + backend needs to break the links between the page token and whatever + internal representations it maintains. diff -puN /dev/null Documentation/filesystems/caching/cachefs.txt --- /dev/null Thu Apr 11 07:25:15 2002 +++ 25-akpm/Documentation/filesystems/caching/cachefs.txt Wed Oct 6 16:03:29 2004 @@ -0,0 +1,274 @@ + =========================== + CacheFS: Caching Filesystem + =========================== + +======== +OVERVIEW +======== + +CacheFS is a backend for the general filesystem cache facility. + +CacheFS uses a block device directly rather than a bunch of files under an +already mounted filesystem. For why this is so, see further on. If necessary, +however, a file can be loopback mounted as a cache. + + +CacheFS provides the following facilities: + + (1) More than one block device can be mounted as a cache. + + (2) Caches can be mounted / unmounted at any time. + + (3) All metadata modifications (this includes index contents) are performed + as journalled transactions. These are replayed on mounting. + + +============================================= +WHY A BLOCK DEVICE? WHY NOT A BUNCH OF FILES? +============================================= + +CacheFS is backed by a block device rather than being backed by a bunch of +files on a filesystem. This confers several advantages: + + (1) Performance. + + Going directly to a block device means that we can DMA directly to/from + the the netfs's pages. If another filesystem was managing the backing + store, everything would have to be copied between pages. Whilst DirectIO + does exist, it doesn't appear easy to make use of in this situation. 
+ + New address space or file operations could be added to make it possible to + persuade a backing discfs to generate block I/O directly to/from disc + blocks under its control, but that then means the discfs has to keep track + of I/O requests to pages not under its control. + + Furthermore, we only have to do one lot of readahead calculations, not + two; in the discfs backing case, the netfs would do one and the discfs + would do one. + + (2) Memory. + + Using a block device means that we have a lower memory usage - all data + pages belong to the netfs we're backing. If we used a filesystem, we would + have twice as many pages at certain points - one from the netfs and one + from the backing discfs. In the backing discfs model, under situations of + memory pressure, we'd have to allocate or keep around a discfs page to be + able to write out a netfs page; or else we'd need to be able to punch a + hole in the backing file. + + Furthermore, whilst we have to keep a CacheFS inode around in memory for + every netfs inode we're backing, a backing discfs would have to keep the + dentry and possibly a file struct too. + + (3) Holes. + + The cache uses holes to indicate to the netfs that it hasn't yet + downloaded the data for that page. + + Since CacheFS is its own filesystem, it can support holes in files + trivially. Running on top of another discfs would limit us to using ones + that can support holes. + + Furthermore, it would have to be made possible to detect holes in a discfs + file, rather than just seeing zero filled blocks. + + (4) Data Consistency. + + Cachefs uses a pair of journals to keep track of the state of the cache + and all the pages contained therein. This means that it doesn't get into + an inconsistent state in the on-disc cache and it doesn't lose disc space. + + CacheFS takes especial care between the allocation of a block and its + splicing into the on-disc pointer tree, and the data having been written + to disc. If power is interrupted and then restored, the journals are + replayed and if it is seen that a block was allocated but not written it + is then punched out. Being backed by a discfs, I'm not certain what will + happen. It may well be possible to mark a discfs's journal, if it has one, + but how does the discfs deal with those marks? This also limits consistent + caching to running on journalled discfs's where there's a function to + write extraordinary marks into the journal. + + The alternative would be to keep flags in the superblock, and to + re-initialise the cache if it wasn't cleanly unmounted. + + Knowing that your cache is in a good state is vitally important if you, + say, put /usr on AFS. Some organisations put everything barring /etc, + /sbin, /lib and /var on AFS and have an enormous cache on every + computer. Imagine if the power goes out and renders every cache + inconsistent, requiring all the computers to re-initialise their caches + when the power comes back on... + + (5) Recycling. + + Recycling is simple on CacheFS. It can just scan the metadata index to + look for inodes that require reclamation/recycling; and it can also build + up a list of the least recently used inodes so that they can be reclaimed + later to make space. + + Doing this on a discfs would require a search going down through a nest + of directories, and would probably have to be done in userspace. + + (6) Disc Space. 
+ + Whilst the block device does set a hard ceiling on the amount of space + available, CacheFS can guarantee that all that space will be available to + the cache. On a discfs-backed cache, the administrator would probably want + to set a cache size limit, but the system wouldn't be able guarantee that + all that space would be available to the cache - not unless that cache was + on a partition of its own. + + Furthermore, with a discfs-backed cache, if the recycler starts to reclaim + cache files to make space, the freed blocks may just be eaten directly by + userspace programs, potentially resulting in the entire cache being + consumed. Alternatively, netfs operations may end up being held up because + the cache can't get blocks on which to store the data. + + (7) Users. + + Users can't so easily go into CacheFS and run amok. The worst they can do + is cause bits of the cache to be recycled early. With a discfs-backed + cache, they can do all sorts of bad things to the files belonging to the + cache, and they can do this quite by accident. + + +On the other hand, there would be some advantages to using a file-based cache +rather than a blockdev-based cache: + + (1) Having to copy to a discfs's page would mean that a netfs could just make + the copy and then assume its own page is ready to go. + + (2) Backing onto a discfs wouldn't require a committed block device. You would + just nominate a directory and go from there. With CacheFS you have to + repartition or install an extra drive to make use of it in an existing + system (though the loopback device offers a way out). + + (3) CacheFS requires the netfs to store a key in any pertinent index entry, + and it also permits a limited amount arbitrary data to be stored there. + + A discfs could be requested to store the netfs's data in xattrs, and the + filename could be used to store the key, though the key would have to be + rendered as text not binary. Likewise indexes could be rendered as + directories with xattrs. + + (4) You could easily make your cache bigger if the discfs has plenty of space, + you could even go across multiple mountpoints. + + +====================== +GENERAL ON-DISC LAYOUT +====================== + +The filesystem is divided into a number of parts: + + 0 +---------------------------+ + | Superblock | + 1 +---------------------------+ + | Update Journal | + +---------------------------+ + | Validity Journal | + +---------------------------+ + | Write-Back Journal | + +---------------------------+ + | | + | Data | + | | + END +---------------------------+ + +The superblock contains the filesystem ID tags and pointers to all the other +regions. + +The update journal consists of a set of entries of sector size that keep track +of what changes have been made to the on-disc filesystem, but not yet +committed. + +The validity journal contains records of data blocks that have been allocated +but not yet written. Upon journal replay, all these blocks will be detached +from their pointers and recycled. + +The writeback journal keeps track of changes that have been made locally to +data blocks, but that have not yet been committed back to the server. This is +not yet implemented. + +The journals are replayed upon mounting to make sure that the cache is in a +reasonable state. + +The data region holds a number of things: + + (1) Index Files + + These are files of entries used by CacheFS internally and by filesystems + that wish to cache data here (such as AFS) to keep track of what's in + the cache at any given time. 
+ + The first index file (inode 1) is special. It holds the CacheFS-specific + metadata for every file in the cache (including direct, single-indirect + and double-indirect block pointers). + + The second index file (inode 2) is also special. It has an entry for + each filesystem that's currently holding data in this cache. + + Every allocated entry in an index has an inode bound to it. This inode is + either another index file or it is a data file. + + (2) Cached Data Files + + These are caches of files from remote servers. Holes in these files + represent blocks not yet obtained from the server. + + (3) Indirection Blocks + + Should a file have more blocks than can be pointed to by the few + pointers in its storage management record, then indirection blocks will + be used to point to further data or indirection blocks. + + Three levels of indirection are currently supported: + + - single indirection + - double indirection + + (4) Allocation Nodes and Free Blocks + + The free blocks of the filesystem are kept in two single-branched + "trees". One tree is the blocks that are ready to be allocated, and the + other is the blocks that have just been recycled. When the former tree + becomes empty, the latter tree is decanted across. + + Each tree is arranged as a chain of "nodes", each node points to the next + node in the chain (unless it's at the end) and also up to 1022 free + blocks. + +Note that all blocks are PAGE_SIZE in size. The blocks are numbered starting +with the superblock at 0. Using 32-bit block pointers, a maximum number of +0xffffffff blocks can be accessed, meaning that the maximum cache size is ~16TB +for 4KB pages. + + +======== +MOUNTING +======== + +Since CacheFS is actually a quasi-filesystem, it requires a block device behind +it. The way to give it one is to mount it as cachefs type on a directory +somewhere. The mounted filesystem will then present the user with a set of +directories outlining the index structure resident in the cache. Indexes +(directories) and files can be turfed out of the cache by the sysadmin through +the use of rmdir and unlink. + +For instance, if a cache contains AFS data, the user might see the following: + + root>mount -t cachefs /dev/hdg9 /cache-hdg9 + root>ls -1 /cache-hdg9 + afs + root>ls -1 /cache-hdg9/afs + cambridge.redhat.com + root>ls -1 /cache-hdg9/afs/cambridge.redhat.com + root.afs + root.cell + +However, a block device that's going to be used for a cache must be prepared +before it can be mounted initially. This is done very simply by: + + echo "cachefs___" >/dev/hdg9 + +During the initial mount, the basic structure will be scribed into the cache, +and then a background thread will "recycle" the as-yet unused data blocks. diff -puN /dev/null Documentation/filesystems/caching/fscache.txt --- /dev/null Thu Apr 11 07:25:15 2002 +++ 25-akpm/Documentation/filesystems/caching/fscache.txt Wed Oct 6 16:03:29 2004 @@ -0,0 +1,94 @@ + ========================== + General Filesystem Caching + ========================== + +======== +OVERVIEW +======== + +This facility is a general purpose cache for network filesystems, though it +could be used for caching other things such as ISO9660 filesystems too. 
+ +FS-Cache mediates between cache backends (such as CacheFS) and network +filesystems: + + +---------+ + | | +-----------+ + | NFS |--+ | | + | | | +-->| CacheFS | + +---------+ | +----------+ | | /dev/hda5 | + | | | | +-----------+ + +---------+ +-->| | | + | | | |--+ +-------------+ + | AFS |----->| FS-Cache | | | + | | | |----->| Cache Files | + +---------+ +-->| | | /var/cache | + | | |--+ +-------------+ + +---------+ | +----------+ | + | | | | +-------------+ + | ISOFS |--+ | | | + | | +-->| ReiserCache | + +---------+ | / | + +-------------+ + +FS-Cache does not follow the idea of completely loading every netfs file +opened in its entirety into a cache before permitting it to be accessed and +then serving the pages out of that cache rather than the netfs inode because: + + (1) It must be practical to operate without a cache. + + (2) The size of any accessible file must not be limited to the size of the + cache. + + (3) The combined size of all opened files (this includes mapped libraries) + must not be limited to the size of the cache. + + (4) The user should not be forced to download an entire file just to do a + one-off access of a small portion of it (such as might be done with the + "file" program). + +It instead serves the cache out in PAGE_SIZE chunks as and when requested by +the netfs('s) using it. + + +FS-Cache provides the following facilities: + + (1) More than one cache can be used at once. + + (2) Caches can be added / removed at any time. + + (3) The netfs is provided with an interface that allows either party to + withdraw caching facilities from a file (required for (2)). + + (4) The interface to the netfs returns as few errors as possible, preferring + rather to let the netfs remain oblivious. + + (5) Cookies are used to represent files and indexes to the netfs. The simplest + cookie is just a NULL pointer - indicating nothing cached there. + + (6) The netfs is allowed to propose - dynamically - any index hierarchy it + desires, though it must be aware that the index search function is + recursive and stack space is limited. + + (7) Data I/O is done direct to and from the netfs's pages. The netfs indicates + that page A is at index B of the data-file represented by cookie C, and + that it should be read or written. The cache backend may or may not start + I/O on that page, but if it does, a netfs callback will be invoked to + indicate completion. The I/O may be either synchronous or asynchronous. + + (8) Cookies can be "retired" upon release. At this point FS-Cache will mark + them as obsolete and the index hierarchy rooted at that point will get + recycled. + + (9) The netfs provides a "match" function for index searches. In addition to + saying whether a match was made or not, this can also specify that an + entry should be updated or deleted. + + +The netfs API to FS-Cache can be found in: + + Documentation/filesystems/caching/netfs-api.txt + +The cache backend API to FS-Cache can be found in: + + Documentation/filesystems/caching/backend-api.txt diff -puN /dev/null Documentation/filesystems/caching/netfs-api.txt --- /dev/null Thu Apr 11 07:25:15 2002 +++ 25-akpm/Documentation/filesystems/caching/netfs-api.txt Wed Oct 6 16:03:29 2004 @@ -0,0 +1,583 @@ + =============================== + FS-CACHE NETWORK FILESYSTEM API + =============================== + +There's an API by which a network filesystem can make use of the FS-Cache +facilities. This is based around a number of principles: + + (1) Every file and index is represented by a cookie. 
+     not have anything associated with it, but the netfs doesn't need to care.
+
+ (2) Barring the top-level index (one entry per cached netfs), the index
+     hierarchy for each netfs is structured according to the whim of the
+     netfs.
+
+ (3) Any netfs page being backed by the cache must have a small token
+     associated with it (possibly pointed to by page->private) so that
+     FS-Cache can keep track of it.
+
+This API is declared in <linux/fscache.h>.
+
+
+=============================
+NETWORK FILESYSTEM DEFINITION
+=============================
+
+FS-Cache needs a description of the network filesystem. This is specified
+using a record of the following structure:
+
+        struct fscache_netfs {
+                const char                       *name;
+                unsigned                         version;
+                struct fscache_netfs_operations  *ops;
+                struct fscache_cookie            *primary_index;
+                ...
+        };
+
+The first three fields should be filled in before registration, and the fourth
+will be filled in by the registration function; any other fields should just
+be ignored and are for internal use only.
+
+The fields are:
+
+ (1) The name of the netfs (used as the key in the top-level index).
+
+ (2) The version of the netfs (if the name matches but the version doesn't,
+     the entire on-disc hierarchy for this netfs will be scrapped and begun
+     afresh).
+
+ (3) The operations table is defined as follows:
+
+        struct fscache_netfs_operations {
+                int (*get_page_token)(struct page *page,
+                                      struct fscache_page **_page_token);
+        };
+
+     The functions here must all be present. Currently the only one is:
+
+     (a) get_page_token(): Get the token used to bind a page to a block in a
+         cache. This function should allocate it if it doesn't exist.
+
+         Return -ENOMEM if there's not enough memory and -ENODATA if the page
+         just shouldn't be cached.
+
+         Set *_page_token to point to the token and return 0 if there is now
+         a token. Note that the netfs must keep track of the token itself
+         (and free it later). page->private can be used for this (see below);
+         a sketch of one possible implementation follows the kAFS example
+         below.
+
+ (4) The cookie representing the primary index will be allocated according to
+     another parameter passed into the registration function.
+
+For example, kAFS (linux/fs/afs/) uses the following definitions to describe
+itself:
+
+        static struct fscache_netfs_operations afs_cache_ops = {
+                .get_page_token = afs_cache_get_page_token,
+        };
+
+        struct fscache_netfs afs_cache_netfs = {
+                .name           = "afs",
+                .version        = 0,
+                .ops            = &afs_cache_ops,
+        };
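
For illustration, a get_page_token() implementation for a netfs that keeps the
token in page->private might look something like the sketch below. This is not
taken from the kAFS sources; it merely assumes that the
fscache_page_get_private() helper described in the page token management
section further on returns NULL if a token cannot be allocated.

        static int afs_cache_get_page_token(struct page *page,
                                            struct fscache_page **_page_token)
        {
                struct fscache_page *token;

                /* look up the token attached to page->private, allocating and
                 * attaching one if it isn't there yet (assumed to return NULL
                 * on allocation failure) */
                token = fscache_page_get_private(page, GFP_KERNEL);
                if (!token)
                        return -ENOMEM;

                *_page_token = token;
                return 0;
        }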
+
+
+================
+INDEX DEFINITION
+================
+
+Indexes are used for two purposes:
+
+ (1) To speed up the finding of a file based on a series of keys (such as
+     AFS's "cell", "volume ID", "vnode ID").
+
+ (2) To make it easier to discard a subset of all the files cached based
+     around a particular key - for instance to mirror the removal of an AFS
+     volume.
+
+However, since it's unlikely that any two netfs's are going to want to define
+their index hierarchies in quite the same way, FS-Cache tries to impose as few
+restraints as possible on how an index is structured and where it is placed in
+the tree. The netfs can even mix indexes and data files at the same level, but
+it's not recommended.
+
+There are some limits on indexes:
+
+ (1) All entries in any given index must be the same size. The netfs supplies
+     a blob of data for each index entry.
+
+ (2) The entries in one index can be of a different size to the entries in
+     another index.
+
+ (3) The entry data must be atomically journallable, so it is limited to 400
+     bytes at present.
+
+ (4) The index data must start with the key. The layout of the key is
+     described in the index definition, and this is used to display the key
+     in some appropriate way.
+
+ (5) The depth of the index tree should be judged with care as the search
+     function is recursive. Too many layers will run the kernel out of stack.
+
+To define an index, a structure of the following type should be filled out:
+
+        struct fscache_index_def
+        {
+                uint8_t name[8];
+                uint16_t data_size;
+                struct {
+                        uint8_t type;
+                        uint16_t len;
+                } keys[4];
+
+                fscache_match_val_t (*match)(void *target_netfs_data,
+                                             const void *entry);
+
+                void (*update)(void *source_netfs_data, void *entry);
+        };
+
+This has the following fields:
+
+ (1) The name of the index (NUL terminated unless all 8 chars are used).
+
+ (2) The size of the data blob provided by the netfs.
+
+ (3) A definition of the key(s) at the beginning of the blob. The netfs is
+     permitted to specify up to four keys. The total length must not exceed
+     the data size. It is assumed that the keys will be laid end to end in
+     order, starting at the first byte of the data.
+
+     The type field specifies the way the data should be displayed. It can be
+     one of:
+
+        (*) FSCACHE_INDEX_KEYS_NOTUSED   - key field not used
+        (*) FSCACHE_INDEX_KEYS_BIN       - display byte-by-byte in hex
+        (*) FSCACHE_INDEX_KEYS_BIN_SZ1   - as above, BE size in byte 0
+        (*) FSCACHE_INDEX_KEYS_BIN_SZ2   - as above, BE size in bytes 0-1
+        (*) FSCACHE_INDEX_KEYS_BIN_SZ4   - as above, BE size in bytes 0-3
+        (*) FSCACHE_INDEX_KEYS_ASCIIZ    - NUL-terminated ASCII
+        (*) FSCACHE_INDEX_KEYS_IPV4ADDR  - display as IPv4 address
+        (*) FSCACHE_INDEX_KEYS_IPV6ADDR  - display as IPv6 address
+
+ (4) A function to compare an in-page-cache index entry blob with the data
+     passed to the cookie acquisition function. This function can also be
+     used to extract data from the blob and copy it into the netfs's
+     structures.
+
+     The values this function can return are:
+
+        (*) FSCACHE_MATCH_FAILED          - failed to match
+        (*) FSCACHE_MATCH_SUCCESS         - successful match
+        (*) FSCACHE_MATCH_SUCCESS_UPDATE  - successful match, entry needs update
+        (*) FSCACHE_MATCH_SUCCESS_DELETE  - entry should be deleted
+
+     For example, in linux/fs/afs/vnode.c:
+
+        static fscache_match_val_t
+        afs_vnode_cache_match(void *target, const void *entry)
+        {
+                const struct afs_cache_vnode *cvnode = entry;
+                struct afs_vnode *vnode = target;
+
+                if (vnode->fid.vnode != cvnode->vnode_id)
+                        return FSCACHE_MATCH_FAILED;
+
+                if (vnode->fid.unique != cvnode->vnode_unique ||
+                    vnode->status.version != cvnode->data_version)
+                        return FSCACHE_MATCH_SUCCESS_DELETE;
+
+                return FSCACHE_MATCH_SUCCESS;
+        }
+
+ (5) A function to initialise or update an in-page-cache index entry blob from
+     netfs data passed to FS-Cache by the netfs. This function should not
+     assume that there's any data yet in the in-page-cache.
+
+     Continuing the above example:
+
+        static void afs_vnode_cache_update(void *source, void *entry)
+        {
+                struct afs_cache_vnode *cvnode = entry;
+                struct afs_vnode *vnode = source;
+
+                cvnode->vnode_id = vnode->fid.vnode;
+                cvnode->vnode_unique = vnode->fid.unique;
+                cvnode->data_version = vnode->status.version;
+        }
+
+     Any dead space in the index entry should be filled with a pattern defined
+     by FS-Cache:
+
+        FSCACHE_INDEX_DEADFILL_PATTERN
+
+To finish the above example, the index definition for the "vnode" level is as
+follows:
+
+        struct fscache_index_def afs_vnode_cache_index_def = {
+                .name           = "vnode",
+                .data_size      = sizeof(struct afs_cache_vnode),
+                .keys[0]        = { FSCACHE_INDEX_KEYS_BIN, 4 },
+                .match          = afs_vnode_cache_match,
+                .update         = afs_vnode_cache_update,
+        };
+
+The first element of struct afs_cache_vnode is the vnode ID.
+
+And for contrast, the cell index definition is:
+
+        struct fscache_index_def afs_cache_cell_index_def = {
+                .name           = "cell_ix",
+                .data_size      = sizeof(struct afs_cell),
+                .keys[0]        = { FSCACHE_INDEX_KEYS_ASCIIZ, 64 },
+                .match          = afs_cell_cache_match,
+                .update         = afs_cell_cache_update,
+        };
+
+The cell index is the primary index for kAFS.
+
+
+===================================
+NETWORK FILESYSTEM (UN)REGISTRATION
+===================================
+
+The first step is to declare the network filesystem to the cache. This also
+involves specifying the layout of the primary index (for AFS, this would be
+the "cell" level).
+
+The registration function is:
+
+        int fscache_register_netfs(struct fscache_netfs *netfs,
+                                   struct fscache_index_def *primary_idef);
+
+It just takes pointers to the netfs definition and the primary index
+definition. It returns 0 or an error as appropriate.
+
+For kAFS, registration is done as follows:
+
+        ret = fscache_register_netfs(&afs_cache_netfs,
+                                     &afs_cache_cell_index_def);
+
+The last step is, of course, unregistration:
+
+        void fscache_unregister_netfs(struct fscache_netfs *netfs);
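
By way of illustration, the registration and unregistration calls would
typically be tied to the netfs's module initialisation and cleanup paths. The
sketch below is not taken from the kAFS sources; only the two fscache_*()
calls and the afs_cache_* definitions shown above belong to the API being
described here.

        static int __init afs_init(void)
        {
                int ret;

                /* declare the netfs and the layout of its primary index */
                ret = fscache_register_netfs(&afs_cache_netfs,
                                             &afs_cache_cell_index_def);
                if (ret < 0)
                        return ret;

                /* ... the rest of the filesystem's initialisation ... */
                return 0;
        }

        static void __exit afs_exit(void)
        {
                /* ... tear down filesystem state and relinquish all cookies
                 * before the netfs definition goes away ... */
                fscache_unregister_netfs(&afs_cache_netfs);
        }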
+
+
+==================
+INDEX REGISTRATION
+==================
+
+The second step is to inform FS-Cache about part of an index hierarchy that
+can be used to locate files. This is done by requesting a cookie for each
+index in the path to the file:
+
+        struct fscache_cookie *
+        fscache_acquire_cookie(struct fscache_cookie *iparent,
+                               struct fscache_index_def *idef,
+                               void *netfs_data);
+
+This function creates an index entry in the index represented by iparent,
+loading the associated blob by calling iparent's update method with the
+supplied netfs_data.
+
+It also creates a new index inode, formatted according to the definition
+supplied in idef. The new cookie is then returned.
+
+Note that this function never returns an error - all errors are handled
+internally. It may also return FSCACHE_NEGATIVE_COOKIE. It is quite acceptable
+to pass this cookie back to this function as iparent (or even to the
+relinquish cookie, read page and write page functions - see below).
+
+Note also that no indexes are actually created on disc until a data file needs
+to be created somewhere down the hierarchy. Furthermore, an index may be
+created in several different caches independently at different times. This is
+all handled transparently, and the netfs doesn't see any of it.
+
+For example, with AFS, a cell would be added to the primary index. This index
+entry would have a dependent inode containing a volume location index for the
+volume mappings within this cell:
+
+        cell->cache =
+                fscache_acquire_cookie(afs_cache_netfs.primary_index,
+                                       &afs_vlocation_cache_index_def,
+                                       cell);
+
+Then when a volume location was accessed, it would be entered into the cell's
+index and an inode would be allocated that acts as a volume type and hash
+chain combination:
+
+        vlocation->cache =
+                fscache_acquire_cookie(cell->cache,
+                                       &afs_volume_cache_index_def,
+                                       vlocation);
+
+And then a particular flavour of volume (R/O for example) could be added to
+that index, creating another index for vnodes (AFS inode equivalents):
+
+        volume->cache =
+                fscache_acquire_cookie(vlocation->cache,
+                                       &afs_vnode_cache_index_def,
+                                       volume);
+
+
+======================
+DATA FILE REGISTRATION
+======================
+
+The third step is to request a data file be created in the cache. This is
+almost identical to index cookie acquisition. The only difference is that a
+NULL index definition is passed.
+
+        vnode->cache =
+                fscache_acquire_cookie(volume->cache,
+                                       NULL,
+                                       vnode);
+
+
+=====================
+PAGE ALLOC/READ/WRITE
+=====================
+
+And the fourth step is to propose a page be cached. There are two functions
+that are used to do this.
+
+Firstly, the netfs should ask FS-Cache to examine the caches and read the
+contents cached for a particular page of a particular file if present, or else
+allocate space to store the contents if not:
+
+        typedef
+        void (*fscache_rw_complete_t)(void *cookie_data,
+                                      struct page *page,
+                                      void *end_io_data,
+                                      int error);
+
+        int fscache_read_or_alloc_page(struct fscache_cookie *cookie,
+                                       struct page *page,
+                                       fscache_rw_complete_t end_io_func,
+                                       void *end_io_data,
+                                       unsigned long gfp);
+
+The cookie argument must specify a data file cookie, the page specified will
+have the data loaded into it (and is also used to specify the page number),
+and the gfp argument is used to control how any memory allocations made are
+satisfied.
+
+If the cookie indicates the inode is not cached:
+
+ (1) The function will return -ENOBUFS.
+
+Else if there's a copy of the page resident on disc:
+
+ (1) The function will submit a request to read the data off the disc directly
+     into the page specified.
+
+ (2) The function will return 0.
+
+ (3) When the read is complete, end_io_func() will be invoked with:
+
+     (*) The netfs data supplied when the cookie was created.
+
+     (*) The page descriptor.
+
+     (*) The data passed to the above function.
+
+     (*) An argument that's 0 on success or negative for an error.
+
+     If an error occurs, it should be assumed that the page contains no usable
+     data.
+
+Otherwise, if there's not a copy available on disc:
+
+ (1) A block may be allocated in the cache and attached to the inode at the
+     appropriate place.
+
+ (2) The validity journal will be marked to indicate this page does not yet
+     contain valid data.
+
+ (3) The function will return -ENODATA.
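
To illustrate how these return values might be handled, the following sketch
shows one possible readpage implementation. It is not the actual kAFS code:
afs_fetch_page_from_server() is a hypothetical helper standing in for whatever
RPC the netfs uses to fill the page from the server, and the error handling is
deliberately minimal.

        /* completion function: called by the cache when it finishes reading
         * the page on our behalf (error is 0 on success) */
        static void afs_readpage_complete(void *cookie_data, struct page *page,
                                          void *end_io_data, int error)
        {
                if (error)
                        SetPageError(page);     /* a real netfs would probably
                                                 * retry from the server */
                else
                        SetPageUptodate(page);
                unlock_page(page);
        }

        static int afs_readpage(struct file *file, struct page *page)
        {
                struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
                int ret;

                ret = fscache_read_or_alloc_page(vnode->cache, page,
                                                 afs_readpage_complete, NULL,
                                                 GFP_KERNEL);
                switch (ret) {
                case 0:
                        /* the cache is reading the page; completion will be
                         * signalled through afs_readpage_complete() */
                        return 0;

                case -ENOBUFS:  /* inode not cached at all */
                case -ENODATA:  /* block allocated, but no valid data yet */
                        /* fetch the page from the server; once filled it
                         * should also be handed to fscache_write_page()
                         * (described next) so that the cache copy becomes
                         * valid */
                        return afs_fetch_page_from_server(vnode, page);

                default:
                        unlock_page(page);
                        return ret;
                }
        }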
+
+
+Secondly, if the netfs changes the contents of the page (either due to an
+initial download or if a user performs a write), then the page should be
+written back to the cache:
+
+        int fscache_write_page(struct fscache_cookie *cookie,
+                               struct page *page,
+                               fscache_rw_complete_t end_io_func,
+                               void *end_io_data,
+                               unsigned long gfp);
+
+The cookie argument must specify a data file cookie, the page specified should
+contain the data to be written (and is also used to specify the page number),
+and the gfp argument is used to control how any memory allocations made are
+satisfied.
+
+If the cookie indicates the inode is not cached then:
+
+ (1) The function will return -ENOBUFS.
+
+Else if there's a block allocated on disc to hold this page:
+
+ (1) The function will submit a request to write the data to the disc directly
+     from the page specified.
+
+ (2) The function will return 0.
+
+ (3) When the write is complete:
+
+     (a) Any associated validity journal entry will be cleared (the block now
+         contains valid data as far as FS-Cache is concerned).
+
+     (b) end_io_func() will be invoked with:
+
+         (*) The netfs data supplied when the cookie was created.
+
+         (*) The page descriptor.
+
+         (*) The data passed to the above function.
+
+         (*) An argument that's 0 on success or negative for an error.
+
+         If an error happens, it can be assumed that the page has been
+         discarded from the cache.
+
+
+==============
+PAGE UNCACHING
+==============
+
+To uncache a page, this function should be called:
+
+        void fscache_uncache_page(struct fscache_cookie *cookie,
+                                  struct page *page);
+
+This detaches the page specified from the data file indicated by the cookie
+and unbinds it from the underlying block.
+
+Note that cached data can't be explicitly deleted from a data file a page at a
+time. The whole data file must be retired (see the relinquish cookie function
+below).
+
+Furthermore, note that this does not cancel the asynchronous read or write
+operation started by the read/alloc and write functions.
+
+
+==========================
+INDEX AND DATA FILE UPDATE
+==========================
+
+To request an update of the index data for an index or data file, the
+following function should be called:
+
+        void fscache_update_cookie(struct fscache_cookie *cookie);
+
+This function will refer back to the netfs_data pointer stored in the cookie
+by the acquisition function to obtain the data to write into each revised
+index entry. The update method in the parent index definition will be called
+to transfer the data.
+
+
+==================================
+INDEX AND DATA FILE UNREGISTRATION
+==================================
+
+To get rid of a cookie, this function should be called:
+
+        void fscache_relinquish_cookie(struct fscache_cookie *cookie,
+                                       int retire);
+
+If retire is non-zero, then the index or file will be marked for recycling,
+and all copies of it will be removed from all active caches in which it is
+present.
+
+If retire is zero, then the inode may be available again the next time the
+acquisition function is called.
+
+One very important note - relinquish must NOT be called until all "child"
+indexes and files have been relinquished and all their pages uncached.
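
As a rough illustration of where these last two calls might sit in a netfs,
consider the sketch below. The helper functions are hypothetical; the points
being illustrated are that fscache_update_cookie() is what pushes changed
netfs data back out through the update method, and that a cookie is only
relinquished once everything beneath it has been dealt with.

        /* the server has told us the file changed - have the index entry
         * rewritten from the new status via the update method */
        static void afs_vnode_note_remote_change(struct afs_vnode *vnode)
        {
                fscache_update_cookie(vnode->cache);
        }

        /* the vnode is being discarded; all pages backed by the cache must
         * already have been uncached by this point */
        static void afs_vnode_discard_cache(struct afs_vnode *vnode,
                                            int deleted)
        {
                /* retire the on-disc data if the file was deleted on the
                 * server, otherwise just let go of it */
                fscache_relinquish_cookie(vnode->cache, deleted ? 1 : 0);
                vnode->cache = NULL;
        }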
+
+
+=====================
+PAGE TOKEN MANAGEMENT
+=====================
+
+As previously mentioned, the netfs must keep a token associated with each page
+currently actively backed by the cache. This is used by FS-Cache to go from a
+page to the internal representation of the underlying block and back again. It
+is particularly important for managing the withdrawal of a cache whilst it is
+in active service (eg: it got unmounted).
+
+The token is this:
+
+        struct fscache_page {
+                ...
+        };
+
+Note that all fields are for internal FS-Cache use only.
+
+The token only needs to be allocated when FS-Cache asks for it, which it does
+by calling the get_page_token() method in the netfs definition ops table. Once
+allocated, the same token should be presented every time the method is called
+again for a particular page.
+
+The token should be retained by the netfs, and should be deleted only after
+the page has been uncached.
+
+One way to achieve this is to attach the token to page->private (and set the
+PG_private bit on the page) once allocated. Shortcut routines are provided by
+FS-Cache to do this. Firstly, to retrieve the token if present, allocating it
+if not:
+
+        struct fscache_page *fscache_page_get_private(struct page *page,
+                                                      unsigned gfp);
+
+Secondly, to retrieve the token if present and BUG if not:
+
+        static inline
+        struct fscache_page *fscache_page_grab_private(struct page *page);
+
+To clean up the tokens, the netfs inode hosting the page should be provided
+with address space operations that circumvent the buffer-head operations for a
+page. For instance:
+
+        struct address_space_operations afs_fs_aops = {
+                ...
+                .sync_page      = block_sync_page,
+                .set_page_dirty = __set_page_dirty_nobuffers,
+                .releasepage    = afs_file_releasepage,
+                .invalidatepage = afs_file_invalidatepage,
+        };
+
+        static int afs_file_invalidatepage(struct page *page,
+                                           unsigned long offset)
+        {
+                struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
+                int ret = 1;
+
+                BUG_ON(!PageLocked(page));
+
+                if (!PagePrivate(page))
+                        return 1;
+
+                fscache_uncache_page(vnode->cache, page);
+
+                /* only attempt to release the page if the whole of it is
+                 * being invalidated */
+                if (offset == 0) {
+                        ret = 0;
+                        if (!PageWriteback(page))
+                                ret = page->mapping->a_ops->releasepage(page,
+                                                                        0);
+                }
+
+                return ret;
+        }
+
+        static int afs_file_releasepage(struct page *page, int gfp_flags)
+        {
+                struct fscache_page *token;
+                struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
+
+                if (PagePrivate(page)) {
+                        fscache_uncache_page(vnode->cache, page);
+
+                        token = (struct fscache_page *) page->private;
+                        page->private = 0;
+                        ClearPagePrivate(page);
+
+                        if (token)
+                                kfree(token);
+                }
+
+                return 0;
+        }
+
+
+================================
+INDEX AND DATA FILE INVALIDATION
+================================
+
+There is no direct way to invalidate an index subtree or a data file. To do
+this, the caller should relinquish and retire the cookie they have, and then
+acquire a new one.
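
For instance, a netfs that needed to throw away the cached contents of a file
(say because the server indicates it has been replaced wholesale) might do so
along the following lines. The function name is hypothetical, the structures
are those used in the earlier examples, and - as noted above - every page must
have been uncached from the old cookie before it is relinquished.

        static void afs_vnode_invalidate_cache(struct afs_vnode *vnode,
                                               struct afs_volume *volume)
        {
                /* retire the old data file; its blocks will be recycled */
                fscache_relinquish_cookie(vnode->cache, 1);

                /* ...and start afresh with a new, empty data file */
                vnode->cache = fscache_acquire_cookie(volume->cache,
                                                      NULL,
                                                      vnode);
        }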