Linux Filesystems API summary
This section contains API-level documentation, mostly taken from the source code itself.
The Linux VFS
The Filesystem types
-
enum positive_aop_returns - aop return codes with specific semantics
Constants
AOP_WRITEPAGE_ACTIVATE
- Informs the caller that page writeback has completed, that the page is still locked, and that it should be considered active. The VM uses this hint to return the page to the active list, so it won’t be a candidate for writeback again in the near future. Other callers must be careful to unlock the page if they get this return. Returned by writepage().
AOP_TRUNCATED_PAGE
- The AOP method that was handed a locked page has unlocked it and the page might have been truncated. The caller should back up to acquiring a new page and trying again. The aop will be taking reasonable precautions not to livelock. If the caller held a page reference, it should drop it before retrying. Returned by readpage().
Description
address_space_operations functions return these large constants to indicate special semantics to the caller. They are much larger than the number of bytes in a page, so they cannot be confused with the return value of functions that report how many bytes of a page they operated on.
-
struct address_space - Contents of a cacheable, mappable object.
Definition
struct address_space {
struct inode *host;
struct xarray i_pages;
gfp_t gfp_mask;
atomic_t i_mmap_writable;
#ifdef CONFIG_READ_ONLY_THP_FOR_FS
atomic_t nr_thps;
#endif
struct rb_root_cached i_mmap;
struct rw_semaphore i_mmap_rwsem;
unsigned long nrpages;
unsigned long nrexceptional;
pgoff_t writeback_index;
const struct address_space_operations *a_ops;
unsigned long flags;
errseq_t wb_err;
spinlock_t private_lock;
struct list_head private_list;
void *private_data;
};
Members
host
- Owner, either the inode or the block_device.
i_pages
- Cached pages.
gfp_mask
- Memory allocation flags to use for allocating pages.
i_mmap_writable
- Number of VM_SHARED mappings.
nr_thps
- Number of THPs in the pagecache (non-shmem only).
i_mmap
- Tree of private and shared mappings.
i_mmap_rwsem
- Protects i_mmap and i_mmap_writable.
nrpages
- Number of page entries, protected by the i_pages lock.
nrexceptional
- Shadow or DAX entries, protected by the i_pages lock.
writeback_index
- Writeback starts here.
a_ops
- Methods.
flags
- Error bits and flags (AS_*).
wb_err
- The most recent error which has occurred.
private_lock
- For use by the owner of the address_space.
private_list
- For use by the owner of the address_space.
private_data
- For use by the owner of the address_space.
-
void sb_end_write(struct super_block * sb) - drop write access to a superblock
Parameters
struct super_block * sb
- the super we wrote to
Description
Decrement the number of writers to the filesystem. Wake up any waiters wanting to freeze the filesystem.
-
void sb_end_pagefault(struct super_block * sb) - drop write access to a superblock from a page fault
Parameters
struct super_block * sb
- the super we wrote to
Description
Decrement the number of processes handling write page faults to the filesystem. Wake up any waiters wanting to freeze the filesystem.
-
void sb_end_intwrite(struct super_block * sb) - drop write access to a superblock for internal fs purposes
Parameters
struct super_block * sb
- the super we wrote to
Description
Decrement the fs-internal number of writers to the filesystem. Wake up any waiters wanting to freeze the filesystem.
-
void sb_start_write(struct super_block * sb) - get write access to a superblock
Parameters
struct super_block * sb
- the super we write to
Description
When a process wants to write data or metadata to a file system (i.e. dirty a page or an inode), it should embed the operation in an sb_start_write()/sb_end_write() pair to get exclusion against file system freezing. This function increments the number of writers, preventing freezing. If the file system is already frozen, the function waits until the file system is thawed.
Since freeze protection behaves as a lock, users have to preserve ordering of freeze protection and other filesystem locks. Generally, freeze protection should be the outermost lock. In particular, both of the following locks nest inside it:
- sb_start_write -> i_mutex (write path, truncate, directory ops, …)
- sb_start_write -> s_umount (freeze_super, thaw_super)
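The counting behind sb_start_write() and sb_end_write() can be sketched as a small, single-threaded userspace model. Everything named model_* below is hypothetical: the real functions use per-CPU rw-semaphores and sleep while the filesystem is frozen, and the real freezer stops new writers before waiting for existing ones to drain, whereas this model simply reports failure instead of blocking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a superblock's freeze protection. */
struct model_sb {
    int writers;   /* sections currently inside model_start_write() */
    bool frozen;   /* set once a freeze has succeeded */
};

/* Models sb_start_write(). The real function sleeps until the fs is
 * thawed; this single-threaded model returns false instead. */
static bool model_start_write(struct model_sb *sb)
{
    if (sb->frozen)
        return false;
    sb->writers++;
    return true;
}

/* Models sb_end_write(): drop write access. */
static void model_end_write(struct model_sb *sb)
{
    assert(sb->writers > 0);
    sb->writers--;
}

/* Models the freezer: a freeze can only complete once no writers remain. */
static bool model_try_freeze(struct model_sb *sb)
{
    if (sb->writers > 0)
        return false;
    sb->frozen = true;
    return true;
}

/* Models thawing: writers may proceed again. */
static void model_thaw(struct model_sb *sb)
{
    sb->frozen = false;
}
```

In kernel code the pattern this stands for is a bracketed section: sb_start_write(sb), then take i_mutex and dirty pages or inodes, then release i_mutex and call sb_end_write(sb), which keeps freeze protection outermost as required above.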
-
void sb_start_pagefault(struct super_block * sb) - get write access to a superblock from a page fault
Parameters
struct super_block * sb
- the super we write to
Description
When a process starts handling a write page fault, it should embed the operation in an sb_start_pagefault()/sb_end_pagefault() pair to get exclusion against file system freezing. This is needed since the page fault is going to dirty a page. This function increments the number of running page faults, preventing freezing. If the file system is already frozen, the function waits until the file system is thawed.
Since page fault freeze protection behaves as a lock, users have to preserve ordering of freeze protection and other filesystem locks. It is advised to put sb_start_pagefault() close to mmap_sem in lock ordering. Page fault handling code implies the lock dependency:
- mmap_sem -> sb_start_pagefault
-
void filemap_set_wb_err(struct address_space * mapping, int err) - set a writeback error on an address_space
Parameters
struct address_space * mapping
- mapping in which to set writeback error
int err
- error to be set in mapping
Description
When writeback fails in some way, we must record that error so that userspace can be informed when fsync and the like are called. We endeavor to report errors on any file that was open at the time of the error. Some internal callers also need to know when writeback errors have occurred.
When a writeback error occurs, most filesystems will want to call filemap_set_wb_err to record the error in the mapping so that it will be automatically reported whenever fsync is called on the file.
-
int filemap_check_wb_err(struct address_space * mapping, errseq_t since) - has an error occurred since the mark was sampled?
Parameters
struct address_space * mapping
- mapping to check for writeback errors
errseq_t since
- previously-sampled errseq_t
Description
Grab the errseq_t value from the mapping and see if it has changed “since” the given value was sampled.
If it has, report the latest error set; otherwise, return 0.
-
errseq_t filemap_sample_wb_err(struct address_space * mapping) - sample the current errseq_t to test for later errors
Parameters
struct address_space * mapping
- mapping to be sampled
Description
Writeback errors are always reported relative to a particular sample point in the past. This function provides those sample points.
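The sample/check protocol of filemap_sample_wb_err() and filemap_check_wb_err() can be illustrated with a simplified userspace model. This is a hypothetical sketch, not the kernel's errseq_t: the real type packs the error, a counter and a "seen" flag into one 32-bit value, and the real sampler also treats an error that nobody has observed yet specially. The model_* names are invented.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for errseq_t; the kernel packs the error, a
 * counter and a "seen" flag into one 32-bit value instead. */
typedef struct {
    int err;          /* most recent error (negative errno), or 0 */
    unsigned int seq; /* bumped each time an error is recorded */
} model_errseq_t;

struct model_mapping {
    model_errseq_t wb_err;
};

/* Models filemap_set_wb_err(): record a new writeback error. */
static void model_set_wb_err(struct model_mapping *m, int err)
{
    m->wb_err.err = err;
    m->wb_err.seq++;
}

/* Models filemap_sample_wb_err(): take a sample point. */
static model_errseq_t model_sample_wb_err(const struct model_mapping *m)
{
    return m->wb_err;
}

/* Models filemap_check_wb_err(): report the latest error only if one
 * was recorded after the sample was taken, otherwise return 0. */
static int model_check_wb_err(const struct model_mapping *m,
                              model_errseq_t since)
{
    return m->wb_err.seq == since.seq ? 0 : m->wb_err.err;
}
```

Because each caller keeps its own sample, two openers of the same mapping can each be told about an error recorded after their own sample point, rather than the first reader clearing it for everyone.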
The Directory Cache
-
struct dentry *d_find_any_alias(struct inode * inode) - find any alias for a given inode
Parameters
struct inode * inode
- inode to find an alias for
Description
If any aliases exist for the given inode, take and return a reference for one of them. If no aliases exist, return NULL.
-
void shrink_dcache_sb(struct super_block * sb) - shrink dcache for a superblock
Parameters
struct super_block * sb
- superblock
Description
Shrink the dcache for the specified super block. This is used to free the dcache before unmounting a file system.
-
int path_has_submounts(const struct path * parent) - check for mounts over a dentry in the current namespace.
Parameters
const struct path * parent
- path to check.
Description
Return true if the parent or its subdirectories contain a mount point in the current namespace.
-
void shrink_dcache_parent(struct dentry * parent) - prune dcache
Parameters
struct dentry * parent
- parent of entries to prune
Description
Prune the dcache to remove unused children of the parent dentry.
-
void d_invalidate(struct dentry * dentry) - detach submounts, prune dcache, and drop
Parameters
struct dentry * dentry
- dentry to invalidate (aka detach, prune and drop)
-
struct dentry *d_alloc(struct dentry * parent, const struct qstr * name) - allocate a dcache entry
Parameters
struct dentry * parent
- parent of entry to allocate
const struct qstr * name
- qstr of the name
Description
Allocates a dentry. It returns NULL if there is insufficient memory available. On success, the dentry is returned. The name passed in is copied, so the caller’s copy may be reused after this call.
-
void d_instantiate(struct dentry * entry, struct inode * inode) - fill in inode information for a dentry
Parameters
struct dentry * entry
- dentry to complete
struct inode * inode
- inode to attach to this dentry
Description
Fill in inode information in the entry.
This turns negative dentries into productive full members of society.
NOTE! This assumes that the inode count has been incremented (or otherwise set) by the caller to indicate that it is now in use by the dcache.
-
struct dentry *d_obtain_alias(struct inode * inode) - find or allocate a DISCONNECTED dentry for a given inode
Parameters
struct inode * inode
- inode to allocate the dentry for
Description
Obtain a dentry for an inode resulting from NFS filehandle conversion or similar open by handle operations. The returned dentry may be anonymous, or may have a full name (if the inode was already in the cache).
When called on a directory inode, we must ensure that the inode only ever has one dentry. If a dentry is found, that is returned instead of allocating a new one.
On successful return, the reference to the inode has been transferred to the dentry. In case of an error, the reference on the inode is released.
To make it easier to use in export operations, a NULL or IS_ERR inode may be passed in and the error will be propagated to the return value, with a NULL inode replaced by ERR_PTR(-ESTALE).
-
struct dentry *d_obtain_root(struct inode * inode) - find or allocate a dentry for a given inode
Parameters
struct inode * inode
- inode to allocate the dentry for
Description
Obtain an IS_ROOT dentry for the root of a filesystem.
We must ensure that directory inodes only ever have one dentry. If a dentry is found, that is returned instead of allocating a new one.
On successful return, the reference to the inode has been transferred to the dentry. In case of an error, the reference on the inode is released. A NULL or IS_ERR inode may be passed in, and the error will be propagated to the return value, with a NULL inode replaced by ERR_PTR(-ESTALE).
-
struct dentry *d_add_ci(struct dentry * dentry, struct inode * inode, struct qstr * name) - lookup or allocate new dentry with case-exact name
Parameters
struct dentry * dentry
- the negative dentry that was passed to the parent’s lookup func
struct inode * inode
- the inode case-insensitive lookup has found
struct qstr * name
- the case-exact name to be associated with the returned dentry
Description
This avoids filling the dcache with case-insensitive names for the same inode: for case-insensitive filesystems, only the actual correct case is stored in the dcache.
For a case-insensitive lookup match, if the case-exact dentry already exists in the dcache, use it and return it.
If no entry exists with the exact case name, allocate a new dentry with the exact case, and return the spliced entry.
-
struct dentry *d_lookup(const struct dentry * parent, const struct qstr * name) - search for a dentry
Parameters
const struct dentry * parent
- parent dentry
const struct qstr * name
- qstr of name we wish to find
Return
dentry, or NULL
d_lookup searches the children of the parent dentry for the name in question. If the dentry is found, its reference count is incremented and the dentry is returned. The caller must use dput() to free the entry when it has finished using it. NULL is returned if the dentry does not exist.
-
struct dentry *d_hash_and_lookup(struct dentry * dir, struct qstr * name) - hash the qstr then search for a dentry
Parameters
struct dentry * dir
- Directory to search in
struct qstr * name
- qstr of name we wish to find
Description
On lookup failure, NULL is returned; on a bad name, ERR_PTR(-error) is returned.
-
void d_delete(struct dentry * dentry) - delete a dentry
Parameters
struct dentry * dentry
- The dentry to delete
Description
Turn the dentry into a negative dentry if possible; otherwise, remove it from the hash queues so it can be deleted later.
-
void d_rehash(struct dentry * entry) - add an entry back to the hash
Parameters
struct dentry * entry
- dentry to add to the hash
Description
Adds a dentry to the hash according to its name.
-
void d_add(struct dentry * entry, struct inode * inode) - add dentry to hash queues
Parameters
struct dentry * entry
- dentry to add
struct inode * inode
- The inode to attach to this dentry
Description
This adds the entry to the hash queues and initializes inode. The entry was actually filled in earlier during d_alloc().
-
struct dentry *d_exact_alias(struct dentry * entry, struct inode * inode) - find and hash an exact unhashed alias
Parameters
struct dentry * entry
- dentry to add
struct inode * inode
- The inode to go with this dentry
Description
If an unhashed dentry with the same name/parent and desired inode already exists, hash and return it. Otherwise, return NULL.
Parent directory should be locked.
-
struct dentry *d_splice_alias(struct inode * inode, struct dentry * dentry) - splice a disconnected dentry into the tree if one exists
Parameters
struct inode * inode
- the inode which may have a disconnected dentry
struct dentry * dentry
- a negative dentry which we want to point to the inode.
Description
If inode is a directory and has an IS_ROOT alias, then d_move that in place of the given dentry and return it, else simply d_add the inode to the dentry and return NULL.
If a non-IS_ROOT directory is found, the filesystem is corrupt, and we should error out: directories can’t have multiple aliases.
This is needed in the lookup routine of any filesystem that is exportable (via knfsd) so that we can build dcache paths to directories effectively.
If a dentry was found and moved, then it is returned. Otherwise NULL is returned. This matches the expected return value of ->lookup.
Cluster filesystems may call this function with a negative, hashed dentry. In that case, we know that the inode will be a regular file, and also this will only occur during atomic_open. So we need to check for the dentry being already hashed only in the final case.
-
bool is_subdir(struct dentry * new_dentry, struct dentry * old_dentry) - is new dentry a subdirectory of old_dentry
Parameters
struct dentry * new_dentry
- new dentry
struct dentry * old_dentry
- old dentry
Description
Returns true if new_dentry is a subdirectory of old_dentry (at any depth).
Returns false otherwise.
Caller must ensure that “new_dentry” is pinned before calling is_subdir().
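The ancestry test itself is a walk up the parent pointers. The sketch below is a hypothetical userspace model: model_dentry keeps only a parent link, a root is its own parent (as an IS_ROOT dentry is), and the locking the real function needs (it retries under rename_lock and RCU so a concurrent rename cannot give a torn answer) is omitted. The model returns true when both arguments are the same dentry, which to our understanding matches the kernel's behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, minimal dentry: only the parent link matters here.
 * A root dentry is its own parent. */
struct model_dentry {
    struct model_dentry *parent;
};

/* Walk from new_dentry towards the root, looking for old_dentry. */
static bool model_is_subdir(const struct model_dentry *new_dentry,
                            const struct model_dentry *old_dentry)
{
    const struct model_dentry *d = new_dentry;

    for (;;) {
        if (d == old_dentry)
            return true;
        if (d->parent == d)   /* reached the root without a match */
            return false;
        d = d->parent;
    }
}
```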
-
struct dentry *dget_dlock(struct dentry * dentry) - get a reference to a dentry
Parameters
struct dentry * dentry
- dentry to get a reference to
Description
Given a dentry or NULL pointer, increment the reference count if appropriate and return the dentry. A dentry will not be destroyed while it has references.
-
int d_unhashed(const struct dentry * dentry) - is dentry hashed
Parameters
const struct dentry * dentry
- entry to check
Description
Returns true if the dentry passed is not currently hashed.
-
bool d_really_is_negative(const struct dentry * dentry) - Determine if a dentry is really negative (ignoring fallthroughs)
Parameters
const struct dentry * dentry
- The dentry in question
Description
Returns true if the dentry represents either an absent name or a name that doesn’t map to an inode (ie. ->d_inode is NULL). The dentry could represent a true miss, a whiteout that isn’t represented by a 0,0 chardev or a fallthrough marker in an opaque directory.
Note! (1) This should be used only by a filesystem to examine its own dentries. It should not be used to look at some other filesystem’s dentries. (2) It should also be used in combination with d_inode() to get the inode. (3) The dentry may have something attached to ->d_lower and the type field of the flags may be set to something other than miss or whiteout.
-
bool d_really_is_positive(const struct dentry * dentry) - Determine if a dentry is really positive (ignoring fallthroughs)
Parameters
const struct dentry * dentry
- The dentry in question
Description
Returns true if the dentry represents a name that maps to an inode (ie. ->d_inode is not NULL). The dentry might still represent a whiteout if that is represented on medium as a 0,0 chardev.
Note! (1) This should be used only by a filesystem to examine its own dentries. It should not be used to look at some other filesystem’s dentries. (2) It should also be used in combination with d_inode() to get the inode.
-
struct inode *d_inode(const struct dentry * dentry) - Get the actual inode of this dentry
Parameters
const struct dentry * dentry
- The dentry to query
Description
This is the helper normal filesystems should use to get at their own inodes in their own dentries and ignore the layering superimposed upon them.
-
struct inode *d_inode_rcu(const struct dentry * dentry) - Get the actual inode of this dentry with READ_ONCE()
Parameters
const struct dentry * dentry
- The dentry to query
Description
This is the helper normal filesystems should use to get at their own inodes in their own dentries and ignore the layering superimposed upon them.
-
struct inode *d_backing_inode(const struct dentry * upper) - Get upper or lower inode we should be using
Parameters
const struct dentry * upper
- The upper layer
Description
This is the helper that should be used to get at the inode that will be used if this dentry were to be opened as a file. The inode may be on the upper dentry or it may be on a lower dentry pinned by the upper.
Normal filesystems should not use this to access their own inodes.
-
struct dentry *d_backing_dentry(struct dentry * upper) - Get upper or lower dentry we should be using
Parameters
struct dentry * upper
- The upper layer
Description
This is the helper that should be used to get the dentry of the inode that will be used if this dentry were opened as a file. It may be the upper dentry or it may be a lower dentry pinned by the upper.
Normal filesystems should not use this to access their own dentries.
-
struct dentry *d_real(struct dentry * dentry, const struct inode * inode) - Return the real dentry
Parameters
struct dentry * dentry
- the dentry to query
const struct inode * inode
- inode to select the dentry from multiple layers (can be NULL)
Description
If dentry is on a union/overlay, then return the underlying, real dentry. Otherwise return the dentry itself.
See also: Documentation/filesystems/vfs.rst
-
struct inode *d_real_inode(const struct dentry * dentry) - Return the real inode
Parameters
const struct dentry * dentry
- The dentry to query
Description
If dentry is on a union/overlay, then return the underlying, real inode.
Otherwise return d_inode().
Inode Handling
-
int inode_init_always(struct super_block * sb, struct inode * inode) - perform inode structure initialisation
Parameters
struct super_block * sb
- superblock inode belongs to
struct inode * inode
- inode to initialise
Description
These are initializations that need to be done on every inode allocation as the fields are not initialised by slab allocation.
-
void drop_nlink(struct inode * inode) - directly drop an inode’s link count
Parameters
struct inode * inode
- inode
Description
This is a low-level filesystem helper to replace any direct filesystem manipulation of i_nlink. In cases where we are attempting to track writes to the filesystem, a decrement to zero means an imminent write when the file is truncated and actually unlinked on the filesystem.
-
void clear_nlink(struct inode * inode) - directly zero an inode’s link count
Parameters
struct inode * inode
- inode
Description
This is a low-level filesystem helper to replace any direct filesystem manipulation of i_nlink. See drop_nlink() for why we care about i_nlink hitting zero.
-
void set_nlink(struct inode * inode, unsigned int nlink) - directly set an inode’s link count
Parameters
struct inode * inode
- inode
unsigned int nlink
- new nlink (should be non-zero)
Description
This is a low-level filesystem helper to replace any direct filesystem manipulation of i_nlink.
-
void inc_nlink(struct inode * inode) - directly increment an inode’s link count
Parameters
struct inode * inode
- inode
Description
This is a low-level filesystem helper to replace any direct filesystem manipulation of i_nlink. Currently, it is only here for parity with dec_nlink().
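The bookkeeping that makes these helpers preferable to touching i_nlink directly can be sketched in userspace. This is a hypothetical model: the struct and function names are invented, and remove_count stands in for the per-superblock counter that, as far as we know, the kernel keeps for the pending write implied by a link count reaching zero. The kernel warns on misuse rather than asserting.

```c
#include <assert.h>

/* Hypothetical superblock: counts inodes whose link count has dropped
 * to zero, i.e. files with a pending truncate-and-unlink write. */
struct model_sb {
    long remove_count;
};

struct model_inode {
    struct model_sb *sb;
    unsigned int nlink;
};

static void model_drop_nlink(struct model_inode *inode)
{
    assert(inode->nlink > 0);          /* the kernel WARNs instead */
    if (--inode->nlink == 0)
        inode->sb->remove_count++;     /* removal write now pending */
}

static void model_clear_nlink(struct model_inode *inode)
{
    if (inode->nlink) {
        inode->nlink = 0;
        inode->sb->remove_count++;
    }
}

static void model_set_nlink(struct model_inode *inode, unsigned int nlink)
{
    if (!nlink) {
        model_clear_nlink(inode);
        return;
    }
    if (inode->nlink == 0)
        inode->sb->remove_count--;     /* no longer pending removal */
    inode->nlink = nlink;
}

static void model_inc_nlink(struct model_inode *inode)
{
    if (inode->nlink == 0)
        inode->sb->remove_count--;     /* e.g. re-linking an unlinked file */
    inode->nlink++;
}
```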
-
void inode_sb_list_add(struct inode * inode) - add inode to the superblock list of inodes
Parameters
struct inode * inode
- inode to add
-
void __insert_inode_hash(struct inode * inode, unsigned long hashval) - hash an inode
Parameters
struct inode * inode
- unhashed inode
unsigned long hashval
- unsigned long value used to locate this object in the inode_hashtable.
Description
Add an inode to the inode hash for this superblock.
-
void __remove_inode_hash(struct inode * inode) - remove an inode from the hash
Parameters
struct inode * inode
- inode to unhash
Description
Remove an inode from the inode hash of its superblock.
-
void evict_inodes(struct super_block * sb) - evict all evictable inodes for a superblock
Parameters
struct super_block * sb
- superblock to operate on
Description
Make sure that no inodes with zero refcount are retained. This is called by superblock shutdown after having SB_ACTIVE flag removed, so any inode reaching zero refcount during or after that call will be immediately evicted.
-
struct inode *new_inode(struct super_block * sb) - obtain an inode
Parameters
struct super_block * sb
- superblock
Description
Allocates a new inode for the given superblock. The default gfp_mask for allocations related to inode->i_mapping is GFP_HIGHUSER_MOVABLE. If HIGHMEM pages are unsuitable, or it is known that pages allocated for the page cache are not reclaimable or migratable, mapping_set_gfp_mask() must be called with suitable flags on the newly created inode’s mapping.
-
void unlock_new_inode(struct inode * inode) - clear the I_NEW state and wake up any waiters
Parameters
struct inode * inode
- new inode to unlock
Description
Called when the inode is fully initialised to clear the new state of the inode and wake up anyone waiting for the inode to finish initialisation.
-
void lock_two_nondirectories(struct inode * inode1, struct inode * inode2) - take two i_mutexes on non-directory objects
Parameters
struct inode * inode1
- first inode to lock
struct inode * inode2
- second inode to lock
Description
Lock any non-NULL argument that is not a directory. Zero, one or two objects may be locked by this function.
-
void unlock_two_nondirectories(struct inode * inode1, struct inode * inode2) - release locks from lock_two_nondirectories()
Parameters
struct inode * inode1
- first inode to unlock
struct inode * inode2
- second inode to unlock
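Taking two inode locks safely depends on a fixed lock order. The sketch below is a hypothetical userspace model of that idea: locks are plain flags, a log records the order in which they are taken, and the lower address is always locked first so that two tasks locking the same pair in opposite argument order cannot deadlock. As far as we know the kernel orders by pointer value in the same way, but the model_* names and the log are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inode: a directory flag plus a flag standing in for the
 * inode lock. */
struct model_inode {
    bool is_dir;
    bool locked;
};

/* Order in which locks were taken, so the ordering rule can be checked. */
static struct model_inode *lock_log[2];
static size_t lock_log_len;

static void model_lock(struct model_inode *inode)
{
    assert(!inode->locked);
    inode->locked = true;
    lock_log[lock_log_len++] = inode;
}

/* Lock the lower address first; skip NULL, directories and duplicates. */
static void model_lock_two_nondirectories(struct model_inode *a,
                                          struct model_inode *b)
{
    if ((uintptr_t)a > (uintptr_t)b) {
        struct model_inode *tmp = a;
        a = b;
        b = tmp;
    }
    if (a && !a->is_dir)
        model_lock(a);
    if (b && !b->is_dir && b != a)
        model_lock(b);
}

static void model_unlock_two_nondirectories(struct model_inode *a,
                                            struct model_inode *b)
{
    if (a && !a->is_dir)
        a->locked = false;
    if (b && !b->is_dir && b != a)
        b->locked = false;
}
```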
-
struct inode *inode_insert5(struct inode * inode, unsigned long hashval, int (*test)(struct inode *, void *), int (*set)(struct inode *, void *), void * data) - obtain an inode from a mounted file system
Parameters
struct inode * inode
- pre-allocated inode to use for insert to cache
unsigned long hashval
- hash value (usually inode number) to get
int (*)(struct inode *, void *) test
- callback used for comparisons between inodes
int (*)(struct inode *, void *) set
- callback used to initialize a new struct inode
void * data
- opaque data pointer to pass to test and set
Description
Search for the inode specified by hashval and data in the inode cache, and if present, return it with an increased reference count. This is a variant of iget5_locked() for callers that don’t want to fail on memory allocation of inode.
If the inode is not in cache, insert the pre-allocated inode to cache and return it locked, hashed, and with the I_NEW flag set. The file system gets to fill it in before unlocking it via unlock_new_inode().
Note both test and set are called with the inode_hash_lock held, so they can’t sleep.
-
struct inode *iget5_locked(struct super_block * sb, unsigned long hashval, int (*test)(struct inode *, void *), int (*set)(struct inode *, void *), void * data) - obtain an inode from a mounted file system
Parameters
struct super_block * sb
- super block of file system
unsigned long hashval
- hash value (usually inode number) to get
int (*)(struct inode *, void *) test
- callback used for comparisons between inodes
int (*)(struct inode *, void *) set
- callback used to initialize a new struct inode
void * data
- opaque data pointer to pass to test and set
Description
Search for the inode specified by hashval and data in the inode cache, and if present, return it with an increased reference count. This is a generalized version of iget_locked() for file systems where the inode number is not sufficient for unique identification of an inode.
If the inode is not in cache, allocate a new inode and return it locked, hashed, and with the I_NEW flag set. The file system gets to fill it in before unlocking it via unlock_new_inode().
Note both test and set are called with the inode_hash_lock held, so they can’t sleep.
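The division of labour between the cache and the test/set callbacks can be sketched with a small userspace model. Everything here is hypothetical: the cache is a fixed array rather than the kernel's inode hash, "key" stands in for whatever extra identity the filesystem passes through data, is_new stands in for I_NEW, and the model does not wait on a cached inode that is still I_NEW, which the real function does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CACHE_SIZE 16

/* Hypothetical cached inode. */
struct model_inode {
    bool used;
    bool is_new;            /* stands in for I_NEW */
    unsigned long hashval;
    long key;               /* fs-specific identity beyond hashval */
    int refcount;
};

static struct model_inode model_cache[MODEL_CACHE_SIZE];

typedef int (*test_fn)(struct model_inode *, void *);
typedef int (*set_fn)(struct model_inode *, void *);

/* Models iget5_locked(): return a matching cached inode with an extra
 * reference, or insert and return a new one marked is_new. */
static struct model_inode *model_iget5(unsigned long hashval, test_fn test,
                                       set_fn set, void *data)
{
    struct model_inode *free_slot = NULL;

    for (size_t i = 0; i < MODEL_CACHE_SIZE; i++) {
        struct model_inode *inode = &model_cache[i];

        if (!inode->used) {
            if (!free_slot)
                free_slot = inode;
            continue;
        }
        if (inode->hashval == hashval && test(inode, data)) {
            inode->refcount++;          /* cache hit */
            return inode;
        }
    }
    if (!free_slot)
        return NULL;                    /* stands in for allocation failure */
    free_slot->used = true;
    free_slot->is_new = true;
    free_slot->hashval = hashval;
    free_slot->refcount = 1;
    if (set(free_slot, data)) {         /* set() failed: give the slot back */
        free_slot->used = false;
        return NULL;
    }
    return free_slot;
}

/* Models unlock_new_inode(): the filesystem has finished filling it in. */
static void model_unlock_new_inode(struct model_inode *inode)
{
    inode->is_new = false;
}

/* Example callbacks matching and initialising on a long key. */
static int key_test(struct model_inode *inode, void *data)
{
    return inode->key == *(long *)data;
}

static int key_set(struct model_inode *inode, void *data)
{
    inode->key = *(long *)data;
    return 0;
}
```

Two inodes can share a hash value (7 in the test) and still be told apart by test(), which is exactly the case iget5_locked() exists for.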
-
struct inode *iget_locked(struct super_block * sb, unsigned long ino) - obtain an inode from a mounted file system
Parameters
struct super_block * sb
- super block of file system
unsigned long ino
- inode number to get
Description
Search for the inode specified by ino in the inode cache and if present return it with an increased reference count. This is for file systems where the inode number is sufficient for unique identification of an inode.
If the inode is not in cache, allocate a new inode and return it locked, hashed, and with the I_NEW flag set. The file system gets to fill it in before unlocking it via unlock_new_inode().
-
ino_t iunique(struct super_block * sb, ino_t max_reserved) - get a unique inode number
Parameters
struct super_block * sb
- superblock
ino_t max_reserved
- highest reserved inode number
Description
Obtain an inode number that is unique on the system for a given superblock. This is used by file systems that have no natural permanent inode numbering system. An inode number is returned that is higher than the reserved limit but unique.
BUGS: With a large number of inodes live on the file system this function currently becomes quite slow.
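The allocation loop, and why it slows down as more inodes are live, can be sketched with a userspace model. This is hypothetical: the set of live inode numbers is a plain array scanned linearly (the kernel checks its inode hash instead), and model_sb, ino_in_use and model_iunique are invented names. The shared static counter restarts just above max_reserved whenever it is at or below the reserved range, including after it wraps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long model_ino_t;

/* Hypothetical view of the inode numbers already live on a superblock. */
struct model_sb {
    const model_ino_t *live;
    size_t nr_live;
};

static bool ino_in_use(const struct model_sb *sb, model_ino_t ino)
{
    for (size_t i = 0; i < sb->nr_live; i++)   /* the costly part */
        if (sb->live[i] == ino)
            return true;
    return false;
}

/* Models iunique(): hand out counter values above max_reserved,
 * skipping any number that is already in use. */
static model_ino_t model_iunique(const struct model_sb *sb,
                                 model_ino_t max_reserved)
{
    static model_ino_t counter;
    model_ino_t res;

    do {
        if (counter <= max_reserved)
            counter = max_reserved + 1;
        res = counter++;
    } while (ino_in_use(sb, res));
    return res;
}
```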
-
struct inode *ilookup5_nowait(struct super_block * sb, unsigned long hashval, int (*test)(struct inode *, void *), void * data) - search for an inode in the inode cache
Parameters
struct super_block * sb
- super block of file system to search
unsigned long hashval
- hash value (usually inode number) to search for
int (*)(struct inode *, void *) test
- callback used for comparisons between inodes
void * data
- opaque data pointer to pass to test
Description
Search for the inode specified by hashval and data in the inode cache. If the inode is in the cache, the inode is returned with an incremented reference count.
Note
I_NEW is not waited upon, so you have to be very careful what you do with the returned inode. You probably should be using ilookup5() instead.
Note2: test is called with the inode_hash_lock held, so it can’t sleep.
-
struct inode *ilookup5(struct super_block * sb, unsigned long hashval, int (*test)(struct inode *, void *), void * data) - search for an inode in the inode cache
Parameters
struct super_block * sb
- super block of file system to search
unsigned long hashval
- hash value (usually inode number) to search for
int (*)(struct inode *, void *) test
- callback used for comparisons between inodes
void * data
- opaque data pointer to pass to test
Description
Search for the inode specified by hashval and data in the inode cache, and if the inode is in the cache, return the inode with an incremented reference count. Waits on I_NEW before returning the inode.
This is a generalized version of ilookup() for file systems where the inode number is not sufficient for unique identification of an inode.
Note
test is called with the inode_hash_lock held, so it can’t sleep.
-
struct inode *ilookup(struct super_block * sb, unsigned long ino) - search for an inode in the inode cache
Parameters
struct super_block * sb
- super block of file system to search
unsigned long ino
- inode number to search for
Description
Search for the inode ino in the inode cache, and if the inode is in the cache, the inode is returned with an incremented reference count.
-
struct inode *find_inode_nowait(struct super_block * sb, unsigned long hashval, int (*match)(struct inode *, unsigned long, void *), void * data) - find an inode in the inode cache
Parameters
struct super_block * sb
- super block of file system to search
unsigned long hashval
- hash value (usually inode number) to search for
int (*)(struct inode *, unsigned long, void *) match
- callback used for comparisons between inodes
void * data
- opaque data pointer to pass to match
Description
Search for the inode specified by hashval and data in the inode cache, where the helper function match will return 0 if the inode does not match, 1 if the inode does match, and -1 if the search should be stopped. The match function must be responsible for taking the i_lock spin_lock and checking i_state for an inode being freed or being initialized, and incrementing the reference count before returning 1. It also must not sleep, since it is called with the inode_hash_lock spinlock held.
This is an even more generalized version of ilookup5() for when the function must never block (find_inode() can block in __wait_on_freeing_inode()) or when the caller cannot increment the reference count because the resulting iput() might cause an inode eviction. The tradeoff is that the match function must be very carefully implemented.
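The three-way match protocol can be sketched as a userspace model over a single hash bucket. This is hypothetical: model_inode, model_find_nowait and match_live are invented, the freeing flag stands in for the i_state checks, and the superblock filtering and inode_hash_lock of the real function are omitted. The example callback compares the hash value itself, gives up if the matching inode is being freed, and otherwise takes the reference before reporting a match, as the description above requires.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cached inode living in one hash bucket. */
struct model_inode {
    unsigned long hashval;
    int freeing;   /* stands in for I_FREEING/I_WILL_FREE in i_state */
    int refcount;
};

typedef int (*match_fn)(struct model_inode *, unsigned long, void *);

/* Models find_inode_nowait(): match() returns 0 to keep looking, 1 for a
 * hit (having already taken a reference), or -1 to abort the search. */
static struct model_inode *model_find_nowait(struct model_inode *bucket,
                                             size_t n, unsigned long hashval,
                                             match_fn match, void *data)
{
    for (size_t i = 0; i < n; i++) {
        int mval = match(&bucket[i], hashval, data);

        if (mval == 0)
            continue;
        return mval == 1 ? &bucket[i] : NULL;
    }
    return NULL;
}

/* Example match callback (a hypothetical policy, not a kernel one). */
static int match_live(struct model_inode *inode, unsigned long hashval,
                      void *data)
{
    (void)data;
    if (inode->hashval != hashval)
        return 0;
    if (inode->freeing)
        return -1;
    inode->refcount++;
    return 1;
}
```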
-
void iput(struct inode * inode) - put an inode
Parameters
struct inode * inode
- inode to put
Description
Puts an inode, dropping its usage count. If the inode use count hits zero, the inode is then freed and may also be destroyed.
Consequently, iput() can sleep.
-
sector_t bmap(struct inode * inode, sector_t block) - find a block number in a file
Parameters
struct inode * inode
- inode of file
sector_t block
- block to find
Description
Returns the block number on the device holding the inode that is the disk block number for the block of the file requested. That is, when asked for block 4 of inode 1, the function returns the disk block, relative to the disk start, that holds that block of the file.
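The logical-to-physical translation can be illustrated with a hypothetical extent table. In the kernel, bmap() delegates to the filesystem's ->bmap address_space operation; the model_extent table and model_bmap below are invented stand-ins for a filesystem's block map. The model returns 0 for a hole, following what we understand to be the convention that block 0 means "not mapped".

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long model_sector_t;

/* Hypothetical extent: file blocks [logical, logical + len) live on the
 * device starting at block "physical". */
struct model_extent {
    model_sector_t logical;
    model_sector_t physical;
    model_sector_t len;
};

/* Translate a file-relative block into a device-relative block,
 * returning 0 when the block falls in a hole. */
static model_sector_t model_bmap(const struct model_extent *ext, size_t n,
                                 model_sector_t block)
{
    for (size_t i = 0; i < n; i++) {
        if (block >= ext[i].logical && block < ext[i].logical + ext[i].len)
            return ext[i].physical + (block - ext[i].logical);
    }
    return 0;
}
```

With extents {0, 1000, 4} and {4, 5000, 2}, block 4 of the file maps to device block 5000 and block 6 is a hole.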
-
int file_update_time(struct file * file) - update mtime and ctime
Parameters
struct file * file
- file accessed
Description
Update the mtime and ctime members of an inode and mark the inode for writeback. Note that this function is meant exclusively for usage in the file write path of filesystems, and filesystems may choose to explicitly ignore updates via this function with the S_NOCMTIME inode flag, e.g. for network filesystems where these timestamps are handled by the server. This can return an error for file systems that need to allocate space in order to update an inode.
-
void inode_init_owner(struct inode * inode, const struct inode * dir, umode_t mode) - initialize uid, gid and mode for a new inode according to POSIX standards
Parameters
struct inode * inode
- New inode
const struct inode * dir
- Directory inode
umode_t mode
- mode of the new inode
-
bool inode_owner_or_capable(const struct inode * inode) - check current task permissions to inode
Parameters
const struct inode * inode
- inode being checked
Description
Return true if current either has CAP_FOWNER in a namespace with the inode owner uid mapped, or owns the file.
-
void inode_dio_wait(struct inode * inode) - wait for outstanding DIO requests to finish
Parameters
struct inode * inode
- inode to wait for
Description
Waits for all pending direct I/O requests to finish so that we can proceed with a truncate or equivalent operation.
Must be called under a lock that serializes taking new references to i_dio_count, usually by inode->i_mutex.
-
struct timespec64 timespec64_trunc(struct timespec64 t, unsigned gran) - Truncate timespec64 to a granularity
Parameters
struct timespec64 t
- Timespec64
unsigned gran
- Granularity in ns.
Description
Truncate a timespec64 to a granularity. Always rounds down. gran must be neither 0 nor greater than one second (NSEC_PER_SEC, or 10^9 ns).
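Rounding down to a granularity only touches the nanosecond field, since gran never exceeds one second. A userspace model of that arithmetic, assuming a local `struct ts64` in place of the kernel's struct timespec64; `ts64_trunc()` is a hypothetical name, and where the kernel would report an invalid granularity this sketch simply asserts.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000L

/* Userspace stand-in for struct timespec64. */
struct ts64 {
	int64_t tv_sec;
	long tv_nsec;   /* 0 .. NSEC_PER_SEC - 1 */
};

/* Round t down to a multiple of gran nanoseconds (1 <= gran <= NSEC_PER_SEC). */
static struct ts64 ts64_trunc(struct ts64 t, unsigned gran)
{
	assert(gran >= 1 && gran <= NSEC_PER_SEC);
	if (gran == NSEC_PER_SEC)
		t.tv_nsec = 0;                  /* whole-second granularity */
	else if (gran > 1)
		t.tv_nsec -= t.tv_nsec % gran;  /* drop the sub-granule remainder */
	return t;                               /* gran == 1: already exact */
}
```

For example, 5.123456789 s truncated to a 1000 ns granularity becomes 5.123456000 s, and the seconds field is never changed.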
-
struct timespec64
timestamp_truncate
(struct timespec64 t, struct inode * inode)¶ Truncate timespec to a granularity
Parameters
struct timespec64 t
- Timespec
struct inode * inode
- inode being updated
Description
Truncate a timespec to the granularity supported by the fs containing the inode. Always rounds down. The filesystem's granularity (the superblock's s_time_gran) must be neither 0 nor greater than one second (NSEC_PER_SEC, or 10^9 ns).
-
struct timespec64
current_time
(struct inode * inode)¶ Return FS time
Parameters
struct inode * inode
- inode.
Description
Return the current time truncated to the time granularity supported by the fs.
Note that inode and inode->i_sb cannot be NULL. Otherwise, the function warns and returns the time without truncation.
-
void
make_bad_inode
(struct inode * inode)¶ mark an inode bad due to an I/O error
Parameters
struct inode * inode
- Inode to mark bad
Description
When an inode cannot be read due to a media or remote network failure this function makes the inode “bad” and causes I/O operations on it to fail from this point on.
-
bool
is_bad_inode
(struct inode * inode)¶ is an inode errored
Parameters
struct inode * inode
- inode to test
Description
Returns true if the inode in question has been marked as bad.
-
void
iget_failed
(struct inode * inode)¶ Mark an under-construction inode as dead and release it
Parameters
struct inode * inode
- The inode to discard
Description
Mark an under-construction inode as dead and release it.
Registration and Superblocks¶
-
void
deactivate_locked_super
(struct super_block * s)¶ drop an active reference to superblock
Parameters
struct super_block * s
- superblock to deactivate
Description
Drops an active reference to the superblock, converting it into a temporary one if there are no other active references left. In that case, we tell the fs driver to shut it down and drop the temporary reference we had just acquired.
Caller holds exclusive lock on superblock; that lock is released.
-
void
deactivate_super
(struct super_block * s)¶ drop an active reference to superblock
Parameters
struct super_block * s
- superblock to deactivate
Description
Variant of deactivate_locked_super(), except that the superblock is not locked by the caller. If we are going to drop the final active reference, the lock will be acquired prior to that.
-
void
generic_shutdown_super
(struct super_block * sb)¶ common helper for ->kill_sb()
Parameters
struct super_block * sb
- superblock to kill
Description
generic_shutdown_super() does all fs-independent work on superblock shutdown. A typical ->kill_sb() should pick all fs-specific objects that need destruction out of the superblock, call generic_shutdown_super() and release the aforementioned objects. Note: dentries and inodes _are_ taken care of and do not need specific handling.
Upon calling this function, the filesystem may no longer alter or rearrange the set of dentries belonging to this super_block, nor may it change the attachments of dentries to inodes.
-
struct super_block *
sget_fc
(struct fs_context * fc, int (*test) (struct super_block *, struct fs_context *), int (*set) (struct super_block *, struct fs_context *))¶ Find or create a superblock
Parameters
struct fs_context * fc
- Filesystem context.
int (*)(struct super_block *, struct fs_context *) test
- Comparison callback
int (*)(struct super_block *, struct fs_context *) set
- Setup callback
Description
Find or create a superblock using the parameters stored in the filesystem context and the two callback functions.
If an extant superblock is matched, then that will be returned with an elevated reference count that the caller must transfer or discard.
If no match is made, a new superblock will be allocated and basic initialisation will be performed (s_type, s_fs_info and s_id will be set and the set() callback will be invoked), the superblock will be published and it will be returned in a partially constructed state with SB_BORN and SB_ACTIVE as yet unset.
-
struct super_block *
sget
(struct file_system_type * type, int (*test) (struct super_block *, void *), int (*set) (struct super_block *, void *), int flags, void * data)¶ find or create a superblock
Parameters
struct file_system_type * type
- filesystem type superblock should belong to
int (*)(struct super_block *,void *) test
- comparison callback
int (*)(struct super_block *,void *) set
- setup callback
int flags
- mount flags
void * data
- argument to each of them
-
void
iterate_supers_type
(struct file_system_type * type, void (*f) (struct super_block *, void *), void * arg)¶ call function for superblocks of given type
Parameters
struct file_system_type * type
- fs type
void (*)(struct super_block *, void *) f
- function to call
void * arg
- argument to pass to it
Description
Scans the superblock list and calls the given function, passing it the locked superblock and the given argument.
-
struct super_block *
get_super
(struct block_device * bdev)¶ get the superblock of a device
Parameters
struct block_device * bdev
- device to get the superblock for
Description
Scans the superblock list and finds the superblock of the file system mounted on the device given. NULL is returned if no match is found.
-
struct super_block *
get_super_thawed
(struct block_device * bdev)¶ get thawed superblock of a device
Parameters
struct block_device * bdev
- device to get the superblock for
Description
Scans the superblock list and finds the superblock of the file system mounted on the device. The superblock is returned once it is thawed (or immediately if it was not frozen). NULL is returned if no match is found.
-
struct super_block *
get_super_exclusive_thawed
(struct block_device * bdev)¶ get thawed superblock of a device
Parameters
struct block_device * bdev
- device to get the superblock for
Description
Scans the superblock list and finds the superblock of the file system mounted on the device. The superblock is returned once it is thawed (or immediately if it was not frozen) and the s_umount semaphore is held in exclusive mode. NULL is returned if no match is found.
-
int
get_anon_bdev
(dev_t * p)¶ Allocate a block device for filesystems which don’t have one.
Parameters
dev_t * p
- Pointer to a dev_t.
Description
Filesystems which don’t use real block devices can call this function to allocate a virtual block device.
Context
Any context. Frequently called while holding sb_lock.
Return
0 on success, -EMFILE if there are no anonymous bdevs left or -ENOMEM if memory allocation failed.
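The allocation itself amounts to handing out unused minor numbers under major 0. A userspace sketch with a deliberately tiny pool so exhaustion is easy to observe; `toy_get_anon_bdev()`, `toy_free_anon_bdev()`, the pool size and the choice to start minors at 1 are all assumptions of this sketch (the kernel releases such numbers with free_anon_bdev()).

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

#define TOY_MAX_ANON 8   /* tiny hypothetical pool */

static bool toy_anon_used[TOY_MAX_ANON];   /* slot i tracks minor i + 1 */

/* Allocate an unused anonymous dev_t with major 0. */
static int toy_get_anon_bdev(dev_t *p)
{
	for (unsigned int i = 0; i < TOY_MAX_ANON; i++) {
		if (!toy_anon_used[i]) {
			toy_anon_used[i] = true;
			*p = makedev(0, i + 1);   /* minor 0 left unused in this sketch */
			return 0;
		}
	}
	return -EMFILE;                     /* no anonymous devices left */
}

/* Return a previously allocated anonymous dev_t to the pool. */
static void toy_free_anon_bdev(dev_t dev)
{
	unsigned int m = minor(dev);

	if (major(dev) == 0 && m >= 1 && m <= TOY_MAX_ANON)
		toy_anon_used[m - 1] = false;
}
```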
-
int
vfs_get_super
(struct fs_context * fc, enum vfs_get_super_keying keying, int (*fill_super) (struct super_block *sb, struct fs_context *fc))¶ Get a superblock with a search key set in s_fs_info.
Parameters
struct fs_context * fc
- The filesystem context holding the parameters
enum vfs_get_super_keying keying
- How to distinguish superblocks
int (*)(struct super_block *sb, struct fs_context *fc) fill_super
- Helper to initialise a new superblock
Description
Search for a superblock and create a new one if not found. The search criterion is controlled by keying. If the search fails, a new superblock is created and fill_super() is called to initialise it.
keying can take one of a number of values:
- vfs_get_single_super - Only one superblock of this type may exist on the system. This is typically used for special system filesystems.
- vfs_get_keyed_super - Multiple superblocks may exist, but they must have distinct keys (where the key is in s_fs_info). Searching for the same key again will turn up the superblock for that key.
- vfs_get_independent_super - Multiple superblocks may exist and are unkeyed. Each call will get a new superblock.
A permissions check is made by sget_fc() unless we’re getting a superblock for a kernel-internal mount or a submount.
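One natural way to read the three keying modes is as different sget_fc()-style test callbacks: "single" matches any existing superblock, "keyed" matches on the s_fs_info key, and "independent" never searches. The userspace model below illustrates that find-or-create logic; the `toy_*` names, the fixed-size table and the reference counting are hypothetical, and the "set" and fill_super() steps are reduced to recording the key.

```c
#include <stddef.h>

/* Hypothetical miniature superblock: just a key and a reference count. */
struct toy_sb {
	const void *fs_info;   /* search key, like s_fs_info for keyed supers */
	int refs;
};

#define TOY_MAX_SB 8
static struct toy_sb toy_sbs[TOY_MAX_SB];
static int toy_nr_sb;

enum toy_keying { TOY_SINGLE, TOY_KEYED, TOY_INDEPENDENT };

/* Test callbacks: nonzero means "reuse this superblock". */
static int test_single(const struct toy_sb *sb, const void *key)
{
	(void)sb;
	(void)key;
	return 1;                        /* any existing superblock will do */
}

static int test_keyed(const struct toy_sb *sb, const void *key)
{
	return sb->fs_info == key;       /* same key, same superblock */
}

/* Find a matching superblock or create one; NULL if the table is full. */
static struct toy_sb *toy_get_super(enum toy_keying keying, const void *key)
{
	int (*test)(const struct toy_sb *, const void *) = NULL;

	if (keying == TOY_SINGLE)
		test = test_single;
	else if (keying == TOY_KEYED)
		test = test_keyed;
	/* TOY_INDEPENDENT: no test, so every call creates a new superblock */

	if (test) {
		for (int i = 0; i < toy_nr_sb; i++) {
			if (test(&toy_sbs[i], key)) {
				toy_sbs[i].refs++;   /* extant superblock: elevated ref */
				return &toy_sbs[i];
			}
		}
	}
	if (toy_nr_sb == TOY_MAX_SB)
		return NULL;
	struct toy_sb *sb = &toy_sbs[toy_nr_sb++];
	sb->fs_info = key;   /* the "set" step: record the key */
	sb->refs = 1;        /* fill_super() would initialise the rest here */
	return sb;
}
```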
-
int
get_tree_bdev
(struct fs_context * fc, int (*fill_super) (struct super_block *, struct fs_context *))¶ Get a superblock based on a single block device
Parameters
struct fs_context * fc
- The filesystem context holding the parameters
int (*)(struct super_block *, struct fs_context *) fill_super
- Helper to initialise a new superblock
-
int
vfs_get_tree
(struct fs_context * fc)¶ Get the mountable root
Parameters
struct fs_context * fc
- The superblock configuration context.
Description
The filesystem is invoked to get or create a superblock which can then later be used for mounting. The filesystem places a pointer to the root to be used for mounting in fc->root.
-
int
freeze_super
(struct super_block * sb)¶ lock the filesystem and force it into a consistent state
Parameters
struct super_block * sb
- the super to lock
Description
Syncs the super to make sure the filesystem is consistent and calls the fs’s freeze_fs. Subsequent calls to this without first thawing the fs will return -EBUSY.
During this function, sb->s_writers.frozen goes through these values:
- SB_UNFROZEN: File system is normal, all writes progress as usual.
- SB_FREEZE_WRITE: The file system is in the process of being frozen. New writes should be blocked, though page faults are still allowed. We wait for all writes to complete and then proceed to the next stage.
- SB_FREEZE_PAGEFAULT: Freezing continues. Now page faults are also blocked, but internal fs threads can still modify the filesystem (although they should not dirty new pages or inodes), writeback can run, etc. After waiting for all running page faults, we sync the filesystem, which will clean all dirty pages and inodes (no new dirty pages or inodes can be created while sync is running).
- SB_FREEZE_FS: The file system is frozen. Now all internal sources of fs modification are blocked (e.g. XFS preallocation truncation on inode reclaim). This is usually implemented by blocking new transactions for filesystems that have them and need this additional guard. After all internal writers are finished, we call ->freeze_fs() to finish filesystem freezing. Then we transition to the SB_FREEZE_COMPLETE state. This state is mostly auxiliary for filesystems to verify they do not modify a frozen fs.
sb->s_writers.frozen is protected by sb->s_umount.
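Because each stage blocks everything the previous stage blocked plus one more class of modification, the states form an ordered sequence, and "is this kind of modification still allowed?" reduces to a comparison. This is a userspace model only: the `toy_*` names are hypothetical, and the kernel enforces the stages by making writers wait rather than by returning a boolean.

```c
#include <stdbool.h>

/* Same ordering as the freeze states listed above. */
enum toy_freeze {
	TOY_UNFROZEN,
	TOY_FREEZE_WRITE,
	TOY_FREEZE_PAGEFAULT,
	TOY_FREEZE_FS,
	TOY_FREEZE_COMPLETE,
};

/* Hypothetical classes of filesystem modification. */
enum toy_op { TOY_OP_WRITE, TOY_OP_PAGEFAULT, TOY_OP_INTERNAL };

/* True once the freeze has progressed far enough to block op. */
static bool toy_op_blocked(enum toy_freeze frozen, enum toy_op op)
{
	static const enum toy_freeze blocked_at[] = {
		[TOY_OP_WRITE]     = TOY_FREEZE_WRITE,      /* new writes */
		[TOY_OP_PAGEFAULT] = TOY_FREEZE_PAGEFAULT,  /* page faults */
		[TOY_OP_INTERNAL]  = TOY_FREEZE_FS,         /* internal fs writers */
	};
	return frozen >= blocked_at[op];
}
```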
File Locks¶
-
int
locks_delete_block
(struct file_lock * waiter)¶ stop waiting for a file lock
Parameters
struct file_lock * waiter
- the lock which was waiting
Description
lockd/nfsd need to disconnect the lock while working on it.
-
int
posix_lock_file
(struct file * filp, struct file_lock * fl, struct file_lock * conflock)¶ Apply a POSIX-style lock to a file
Parameters
struct file * filp
- The file to apply the lock to
struct file_lock * fl
- The lock to be applied
struct file_lock * conflock
- Place to return a copy of the conflicting lock, if found.
Description
Add a POSIX style lock to a file. We merge adjacent & overlapping locks whenever possible. POSIX locks are sorted by owner task, then by starting address.
Note that if called with an FL_EXISTS argument, the caller may determine whether or not a lock was successfully freed by testing the return value for -ENOENT.
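For a single owner and a single lock type, the merging rule can be sketched as an insertion into a sorted list of inclusive byte ranges. This is a userspace sketch only: `struct toy_range` and `toy_lock_merge()` are hypothetical, and it ignores the harder case where a lock of a different type has to split or shrink existing locks.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical inclusive byte range held by one owner with one lock type. */
struct toy_range {
	long long start, end;
};

/*
 * Insert [start, end] into ranges[0..*n), which is sorted and contains no
 * overlapping or adjacent ranges, merging with every range it overlaps or
 * touches. ranges[] must have room for *n + 1 entries; assumes end < LLONG_MAX.
 */
static void toy_lock_merge(struct toy_range *ranges, size_t *n,
			   long long start, long long end)
{
	size_t lo = 0, hi, tail;

	/* Skip ranges that end before the new one and are not even adjacent. */
	while (lo < *n && ranges[lo].end + 1 < start)
		lo++;

	/* Absorb every range that overlaps or touches [start, end]. */
	for (hi = lo; hi < *n && ranges[hi].start <= end + 1; hi++) {
		if (ranges[hi].start < start)
			start = ranges[hi].start;
		if (ranges[hi].end > end)
			end = ranges[hi].end;
	}

	/* Replace ranges[lo..hi) with the single merged range. */
	tail = *n - hi;
	memmove(&ranges[lo + 1], &ranges[hi], tail * sizeof(*ranges));
	ranges[lo] = (struct toy_range){ start, end };
	*n = lo + 1 + tail;
}
```

For example, holding 10..19 and 30..39 and then locking 20..29 leaves a single range 10..39, because the new range is adjacent to both neighbours.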
-
int
locks_mandatory_area
(struct inode * inode, struct file * filp, loff_t start, loff_t end, unsigned char type)¶ Check for a conflicting lock
Parameters
struct inode * inode
- the file to check
struct file * filp
- how the file was opened (if it was)
loff_t start
- first byte in the file to check
loff_t end
- last byte in the file to check
unsigned char type
- F_WRLCK for a write lock, else F_RDLCK
Description
Searches the inode’s list of locks to find any POSIX locks which conflict.
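The conflict test applied during this search can be sketched as a pure predicate. This is a userspace model: `struct toy_lock` and `toy_locks_conflict()` are hypothetical, and it covers only POSIX byte-range semantics, not leases, flock locks or mandatory-locking policy.

```c
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down POSIX lock: owner, inclusive range, type. */
struct toy_lock {
	int owner;              /* stands in for the lock owner */
	long long start, end;   /* inclusive byte range */
	short type;             /* F_RDLCK or F_WRLCK */
};

/* Two POSIX locks conflict when they have different owners, their ranges
 * overlap, and at least one of them is a write lock. */
static bool toy_locks_conflict(const struct toy_lock *a, const struct toy_lock *b)
{
	if (a->owner == b->owner)
		return false;               /* an owner never conflicts with itself */
	if (a->end < b->start || b->end < a->start)
		return false;               /* disjoint ranges */
	return a->type == F_WRLCK || b->type == F_WRLCK;
}
```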
-
int
__break_lease
(struct inode * inode, unsigned int mode, unsigned int type)¶ revoke all outstanding leases on file
Parameters
struct inode * inode
- the inode of the file whose leases should be broken
unsigned int mode
- O_RDONLY: break only write leases; O_WRONLY or O_RDWR: break all leases
unsigned int type
- FL_LEASE: break leases and delegations; FL_DELEG: break only delegations
Description
break_lease() (inlined for speed) has already checked that there is at least some kind of lock (maybe a lease) on this file. Leases are broken on a call to open() or truncate(). This function can sleep unless you specified O_NONBLOCK to your open().
-
void
lease_get_mtime
(struct inode * inode, struct timespec64 * time)¶ update modified time of an inode with exclusive lease
Parameters
struct inode * inode
- the inode
struct timespec64 * time
- pointer to a timespec which contains the last modified time
Description
This is to force NFS clients to flush their caches for files with exclusive leases. The justification is that if someone has an exclusive lease, then they could be modifying it.
-
int
generic_setlease
(struct file * filp, long arg, struct file_lock ** flp, void ** priv)¶ sets a lease on an open file
Parameters
struct file * filp
- file pointer
long arg
- type of lease to obtain
struct file_lock ** flp
- input - file_lock to use, output - file_lock inserted
void ** priv
- private data for lm_setup (may be NULL if lm_setup doesn’t require it)
Description
The (input) flp->fl_lmops->lm_break function is required by break_lease().
-
int
vfs_setlease
(struct file * filp, long arg, struct file_lock ** lease, void ** priv)¶ sets a lease on an open file
Parameters
struct file * filp
- file pointer
long arg
- type of lease to obtain
struct file_lock ** lease
- file_lock to use when adding a lease
void ** priv
- private info for lm_setup when adding a lease (may be NULL if lm_setup doesn’t require it)
Description
Call this to establish a lease on the file. The “lease” argument is not used for F_UNLCK requests and may be NULL. For commands that set or alter an existing lease, the (*lease)->fl_lmops->lm_break operation must be set; if not, this function will return -ENOLCK (and generate a scary-looking stack trace).
The “priv” pointer is passed directly to the lm_setup function as-is. It may be NULL if the lm_setup operation doesn’t require it.
-
int
locks_lock_inode_wait
(struct inode * inode, struct file_lock * fl)¶ Apply a lock to an inode
Parameters
struct inode * inode
- inode of the file to apply to
struct file_lock * fl
- The lock to be applied
Description
Apply a POSIX or FLOCK style lock request to an inode.
-
int
vfs_test_lock
(struct file * filp, struct file_lock * fl)¶ test file byte range lock
Parameters
struct file * filp
- The file to test lock for
struct file_lock * fl
- The lock to test; also used to hold result
Description
Returns -ERRNO on failure. Indicates the presence of a conflicting lock by setting fl->fl_type to something other than F_UNLCK.
-
int
vfs_lock_file
(struct file * filp, unsigned int cmd, struct file_lock * fl, struct file_lock * conf)¶ file byte range lock
Parameters
struct file * filp
- The file to apply the lock to
unsigned int cmd
- type of locking operation (F_SETLK, F_GETLK, etc.)
struct file_lock * fl
- The lock to be applied
struct file_lock * conf
- Place to return a copy of the conflicting lock, if found.
Description
A caller that doesn’t care about the conflicting lock may pass NULL as the final argument.
If the filesystem defines a private ->lock() method, then conf will be left unchanged; so a caller that cares should initialize it to some acceptable default.
To avoid blocking kernel daemons, such as lockd, that need to acquire POSIX locks, the ->lock() interface may return asynchronously, before the lock has been granted or denied by the underlying filesystem, if (and only if) lm_grant is set. Callers expecting ->lock() to return asynchronously will only use F_SETLK, not F_SETLKW; they will set FL_SLEEP if (and only if) the request is for a blocking lock. When ->lock() does return asynchronously, it must return FILE_LOCK_DEFERRED, and call ->lm_grant() when the lock request completes. If the request is for a non-blocking lock, the file system should return FILE_LOCK_DEFERRED, then try to get the lock and call the callback routine with the result. If the request timed out, the callback routine will return a nonzero return code and the file system should release the lock. The file system is also responsible for keeping a corresponding POSIX lock when it grants a lock, so that the VFS can find out which locks are locally held and do the correct lock cleanup when required. The underlying filesystem must not drop the kernel lock or call ->lm_grant() before returning to the caller with a FILE_LOCK_DEFERRED return code.
-
int
vfs_cancel_lock
(struct file * filp, struct file_lock * fl)¶ file byte range unblock lock
Parameters
struct file * filp
- The file to apply the unblock to
struct file_lock * fl
- The lock to be unblocked
Description
Used by lock managers to cancel blocked requests.
-
int
posix_lock_inode_wait
(struct inode * inode, struct file_lock * fl)¶ Apply a POSIX-style lock to a file
Parameters
struct inode * inode
- inode of file to which lock request should be applied
struct file_lock * fl
- The lock to be applied
Description
Apply a POSIX style lock request to an inode.
-
int
locks_mandatory_locked
(struct file * file)¶ Check for an active lock
Parameters
struct file * file
- the file to check
Description
Searches the inode’s list of locks to find any POSIX locks which conflict. This function is called from locks_verify_locked() only.
-
int
fcntl_getlease
(struct file * filp)¶ Enquire what lease is currently active
Parameters
struct file * filp
- the file
Description
The value returned by this function will be one of the following.
If no lease break is pending:
- F_RDLCK to indicate a shared lease is held.
- F_WRLCK to indicate an exclusive lease is held.
- F_UNLCK to indicate no lease is held.
If a lease break is pending:
- F_RDLCK to indicate an exclusive lease needs to be changed to a shared lease (or removed).
- F_UNLCK to indicate the lease needs to be removed.
XXX: sfr & willy disagree over whether F_INPROGRESS should be returned to userspace.
-
int
check_conflicting_open
(struct file * filp, const long arg, int flags)¶ see if the given file points to an inode that has an existing open that would conflict with the desired lease.
Parameters
struct file * filp
- file to check
const long arg
- type of lease that we’re trying to acquire
int flags
- current lock flags
Description
Check to see if there’s an existing open fd on this file that would conflict with the lease we’re trying to set.
-
int
fcntl_setlease
(unsigned int fd, struct file * filp, long arg)¶ sets a lease on an open file
Parameters
unsigned int fd
- open file descriptor
struct file * filp
- file pointer
long arg
- type of lease to obtain
Description
Call this fcntl to establish a lease on the file. Note that you also need to call F_SETSIG to receive a signal when the lease is broken.
-
int
flock_lock_inode_wait
(struct inode * inode, struct file_lock * fl)¶ Apply a FLOCK-style lock to a file
Parameters
struct inode * inode
- inode of the file to apply to
struct file_lock * fl
- The lock to be applied
Description
Apply a FLOCK style lock request to an inode.
-
long
sys_flock
(unsigned int fd, unsigned int cmd)¶ flock() system call.
Parameters
unsigned int fd
- the file descriptor to lock.
unsigned int cmd
- the type of lock to apply.
Description
Apply a FL_FLOCK style lock to an open file descriptor. The cmd can be one of:
- LOCK_SH: a shared lock.
- LOCK_EX: an exclusive lock.
- LOCK_UN: remove an existing lock.
- LOCK_MAND: a ‘mandatory’ flock. This exists to emulate Windows Share Modes.
LOCK_MAND can be combined with LOCK_READ or LOCK_WRITE to allow other processes read and write access respectively.
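From userspace these semantics are reached through flock(2). The program below is a small demonstration, not part of the kernel; the helper name `flock_demo()` and the temporary path are arbitrary. It relies on flock locks being associated with the open file description, so two separate open() calls on the same file contend with each other even inside one process. It assumes /tmp is writable and on a local filesystem.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Returns a bitmask describing what flock() allowed, or -1 on setup failure:
 *   bit 0: two LOCK_SH holders coexisted
 *   bit 1: LOCK_EX|LOCK_NB was refused with EWOULDBLOCK while the other held LOCK_EX
 *   bit 2: LOCK_EX succeeded once the other holder did LOCK_UN
 */
static int flock_demo(void)
{
	char path[] = "/tmp/flock-demo-XXXXXX";
	int a = mkstemp(path);
	if (a < 0)
		return -1;
	int b = open(path, O_RDWR);          /* a second, independent open file description */
	unlink(path);                        /* the file lives on until both fds close */
	if (b < 0) {
		close(a);
		return -1;
	}

	int bits = 0;
	if (flock(a, LOCK_SH) == 0 && flock(b, LOCK_SH | LOCK_NB) == 0)
		bits |= 1;                   /* shared locks coexist */
	flock(a, LOCK_UN);
	flock(b, LOCK_UN);

	if (flock(a, LOCK_EX) == 0 &&
	    flock(b, LOCK_EX | LOCK_NB) == -1 && errno == EWOULDBLOCK)
		bits |= 2;                   /* exclusive excludes everyone else */

	flock(a, LOCK_UN);
	if (flock(b, LOCK_EX | LOCK_NB) == 0)
		bits |= 4;                   /* free again after LOCK_UN */

	close(a);
	close(b);
	return bits;
}
```

Passing LOCK_NB turns the would-block case into an immediate EWOULDBLOCK failure instead of sleeping, which is what makes the second bit observable in a single-threaded program.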
-
pid_t
locks_translate_pid
(struct file_lock * fl, struct pid_namespace * ns)¶ translate a file_lock’s fl_pid number into a namespace
Parameters
struct file_lock * fl
- The file_lock whose fl_pid should be translated
struct pid_namespace * ns
- The namespace into which the pid should be translated
Description
Used to translate a fl_pid into a namespace virtual pid number.
Other Functions¶
-
int
mpage_readpages
(struct address_space * mapping, struct list_head * pages, unsigned nr_pages, get_block_t get_block)¶ populate an address space with some pages & start reads against them
Parameters
struct address_space * mapping
- the address_space
struct list_head * pages
- The address of a list_head which contains the target pages. These pages have their ->index populated and are otherwise uninitialised. The page at pages->prev has the lowest file offset, and reads should be issued in pages->prev to pages->next order.
unsigned nr_pages
- The number of pages at *pages
get_block_t get_block
- The filesystem’s block mapper function.
Description
This function walks the pages and the blocks within each page, building and emitting large BIOs.
If anything unusual happens, such as:
- encountering a page which has buffers
- encountering a page which has a non-hole after a hole
- encountering a page with non-contiguous blocks
then this code just gives up and calls the buffer_head-based read function. It does handle a page which has holes at the end - that is a common case: the end-of-file on blocksize < PAGE_SIZE setups.
BH_Boundary explanation:
There is a problem. The mpage read code assembles several pages, gets all their disk mappings, and then submits them all. That’s fine, but obtaining the disk mappings may require I/O. Reads of indirect blocks, for example.
So an mpage read of the first 16 blocks of an ext2 file will cause I/O to be submitted in the following order:
12 0 1 2 3 4 5 6 7 8 9 10 11 13 14 15 16
because the indirect block has to be read to get the mappings of blocks 13,14,15,16. Obviously, this impacts performance.
So what we do is to allow the filesystem’s get_block() function to set BH_Boundary when it maps block 11. BH_Boundary says: mapping of the block after this one will require I/O against a block which is probably close to this one. So you should push what I/O you have currently accumulated.
This all causes the disk requests to be issued in the correct order.
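The effect of the hint on submission order can be reproduced with a small simulation of the 16-block ext2 example above. This is a userspace model, not mpage code: the `toy_*` names are hypothetical, "issuing I/O" just means appending a disk block number to an array, and the indirect-block read is modelled as happening immediately when file block 12 is mapped.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NDIR 12          /* ext2-style direct pointers (assumption) */
#define TOY_INDIRECT_DISK 12 /* disk block holding the indirect block, as in the example */
#define TOY_MAX_PENDING 32

/* Layout from the example: file blocks 0..11 -> disk 0..11, indirect block
 * on disk 12, file blocks 12 onwards -> disk 13 onwards. */
static long toy_map_block(long fblock, bool *reads_indirect, bool *boundary)
{
	*reads_indirect = (fblock == TOY_NDIR);  /* first lookup via the indirect block */
	*boundary = (fblock == TOY_NDIR - 1);    /* the next mapping will need I/O */
	return fblock < TOY_NDIR ? fblock : fblock + 1;
}

/* Record in order[] the disk blocks in the order I/O is issued while reading
 * file blocks 0..nr-1. order[] needs nr + 1 slots; nr <= TOY_MAX_PENDING. */
static size_t toy_read_order(long nr, bool honour_boundary, long *order)
{
	long pending[TOY_MAX_PENDING];   /* data reads accumulated, not yet submitted */
	size_t np = 0, n = 0;

	if (nr > TOY_MAX_PENDING)
		return 0;
	for (long b = 0; b < nr; b++) {
		bool indirect, boundary;
		long disk = toy_map_block(b, &indirect, &boundary);

		if (indirect)
			order[n++] = TOY_INDIRECT_DISK;  /* mapping needs this read now */
		pending[np++] = disk;
		if (honour_boundary && boundary) {
			for (size_t i = 0; i < np; i++)  /* push what we have so far */
				order[n++] = pending[i];
			np = 0;
		}
	}
	for (size_t i = 0; i < np; i++)          /* final submission */
		order[n++] = pending[i];
	return n;
}
```

With honour_boundary false, the recorded order is 12 0 1 ... 11 13 14 15 16, matching the text; with it true, the reads for blocks 0 to 11 are pushed before the indirect block is read, giving 0 1 ... 16.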
-
int
mpage_writepages
(struct address_space * mapping, struct writeback_control * wbc, get_block_t get_block)¶ walk the list of dirty pages of the given address space & writepage() all of them
Parameters
struct address_space * mapping
- address space structure to write
struct writeback_control * wbc
- subtract the number of written pages from *wbc->nr_to_write
get_block_t get_block
- the filesystem’s block mapper function. If this is NULL then use a_ops->writepage. Otherwise, go direct-to-BIO.
Description
This is a library function, which implements the writepages() address_space_operation.
If a page is already under I/O, generic_writepages() skips it, even if it’s dirty. This is desirable behaviour for memory-cleaning writeback, but it is INCORRECT for data-integrity system calls such as fsync(). fsync() and msync() need to guarantee that all the data which was dirty at the time the call was made gets new I/O started against it. If wbc->sync_mode is WB_SYNC_ALL then we were called for data integrity and we must wait for existing IO to complete.
-
int
generic_permission
(struct inode * inode, int mask)¶ check for access rights on a Posix-like filesystem
Parameters
struct inode * inode
- inode to check access rights for
int mask
- right to check for (MAY_READ, MAY_WRITE, MAY_EXEC, …)
Description
Used to check for read/write/execute permissions on a file. We use “fsuid” for this, letting us set arbitrary permissions for filesystem access without changing the “normal” uids which are used for other things.
generic_permission is rcu-walk aware. It returns -ECHILD in case an rcu-walk request cannot be satisfied (e.g. it requires blocking or too much complexity). It would then be called again in ref-walk mode.
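Stripped of ACLs, capability overrides (such as CAP_DAC_OVERRIDE) and the rcu-walk handling, the core of a POSIX permission check is choosing one class of mode bits (owner, group or other) and comparing it with the requested rights. A userspace sketch under those simplifications: `toy_mode_permission()` is hypothetical, supplementary groups are ignored, and MAY_* are redefined locally to mirror the kernel's bit values.

```c
#include <errno.h>
#include <sys/types.h>

/* Local copies mirroring the kernel's MAY_* bit values. */
#define MAY_EXEC  0x1
#define MAY_WRITE 0x2
#define MAY_READ  0x4

/*
 * Check mask against the low nine mode bits using the caller's fsuid/fsgid.
 * Returns 0 if every requested right is granted, -EACCES otherwise.
 */
static int toy_mode_permission(mode_t mode, uid_t i_uid, gid_t i_gid,
			       uid_t fsuid, gid_t fsgid, int mask)
{
	unsigned int bits = mode;

	if (fsuid == i_uid)
		bits >>= 6;             /* owner class */
	else if (fsgid == i_gid)        /* supplementary groups ignored here */
		bits >>= 3;             /* group class */
	/* otherwise: "other" class, already in the low three bits */

	mask &= MAY_READ | MAY_WRITE | MAY_EXEC;
	return (mask & ~bits & 7) ? -EACCES : 0;
}
```

Note that only one class is consulted: an owner of a mode 0077 file is denied even though group and other members would be allowed.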
-
int
inode_permission
(struct inode * inode, int mask)¶ Check for access rights to a given inode
Parameters
struct inode * inode
- Inode to check permission on
int mask
- Right to check for (MAY_READ, MAY_WRITE, MAY_EXEC)
Description
Check for read/write/execute permissions on an inode. We use fs[ug]id for this, letting us set arbitrary permissions for filesystem access without changing the “normal” UIDs which are used for other things.
When checking for MAY_APPEND, MAY_WRITE must also be set in mask.
-
void
path_get
(const struct path * path)¶ get a reference to a path
Parameters
const struct path * path
- path to get the reference to
Description
Given a path, increment the reference count of the dentry and the vfsmount.
-
void
path_put
(const struct path * path)¶ put a reference to a path
Parameters
const struct path * path
- path to put the reference to
Description
Given a path, decrement the reference count of the dentry and the vfsmount.
-
int
vfs_path_lookup
(struct dentry * dentry, struct vfsmount * mnt, const char * name, unsigned int flags, struct path * path)¶ lookup a file path relative to a dentry-vfsmount pair
Parameters
struct dentry * dentry
- pointer to dentry of the base directory
struct vfsmount * mnt
- pointer to vfs mount of the base directory
const char * name
- pointer to file name
unsigned int flags
- lookup flags
struct path * path
- pointer to struct path to fill
-
struct dentry *
try_lookup_one_len
(const char * name, struct dentry * base, int len)¶ filesystem helper to lookup single pathname component
Parameters
const char * name
- pathname component to lookup
struct dentry * base
- base directory to lookup from
int len
- maximum length to which name should be interpreted
Description
Look up a dentry by name in the dcache, returning NULL if it does not currently exist. The function does not try to create a dentry.
Note that this routine is purely a helper for filesystem usage and should not be called by generic code.
The caller must hold base->i_mutex.
-
struct dentry *
lookup_one_len
(const char * name, struct dentry * base, int len)¶ filesystem helper to lookup single pathname component
Parameters
const char * name
- pathname component to lookup
struct dentry * base
- base directory to lookup from
int len
- maximum length to which name should be interpreted
Description
Note that this routine is purely a helper for filesystem usage and should not be called by generic code.
The caller must hold base->i_mutex.
-
struct dentry *
lookup_one_len_unlocked
(const char * name, struct dentry * base, int len)¶ filesystem helper to lookup single pathname component
Parameters
const char * name
- pathname component to lookup
struct dentry * base
- base directory to lookup from
int len
- maximum length to which name should be interpreted
Description
Note that this routine is purely a helper for filesystem usage and should not be called by generic code.
Unlike lookup_one_len, it should be called without the parent i_mutex held, and will take the i_mutex itself if necessary.
-
int
vfs_unlink
(struct inode * dir, struct dentry * dentry, struct inode ** delegated_inode)¶ unlink a filesystem object
Parameters
struct inode * dir
- parent directory
struct dentry * dentry
- victim
struct inode ** delegated_inode
- returns victim inode, if the inode is delegated.
Description
The caller must hold dir->i_mutex.
If vfs_unlink discovers a delegation, it will return -EWOULDBLOCK and return a reference to the inode in delegated_inode. The caller should then break the delegation on that inode and retry. Because breaking a delegation may take a long time, the caller should drop dir->i_mutex before doing so.
Alternatively, a caller may pass NULL for delegated_inode. This may be appropriate for callers that expect the underlying filesystem not to be NFS exported.
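The caller-side protocol (call with the directory locked, drop the lock, break the delegation, retry) can be sketched as a loop. Everything here is a userspace model with hypothetical names: `toy_vfs_unlink()` stands in for vfs_unlink(), `toy_break_deleg_wait()` for the delegation break, and a boolean stands in for holding dir->i_mutex. The same shape applies to vfs_link() and vfs_rename().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

static bool toy_dir_locked;   /* hypothetical stand-in for dir->i_mutex */

/* Hypothetical inode that may carry a delegation. */
struct toy_inode {
	bool delegated;
	bool unlinked;
};

/* Model of vfs_unlink(): refuses while delegated, reporting the inode if asked. */
static int toy_vfs_unlink(struct toy_inode *victim, struct toy_inode **delegated_inode)
{
	assert(toy_dir_locked);                  /* caller must hold the dir lock */
	if (victim->delegated) {
		if (delegated_inode)
			*delegated_inode = victim;   /* tell the caller what to break */
		return -EWOULDBLOCK;
	}
	victim->unlinked = true;
	return 0;
}

/* Model of breaking a delegation; this may take a long time. */
static void toy_break_deleg_wait(struct toy_inode *inode)
{
	assert(!toy_dir_locked);                 /* never done under the dir lock */
	inode->delegated = false;
}

/* Caller: retry until no delegation stands in the way. */
static int toy_do_unlink(struct toy_inode *victim, int *attempts)
{
	int err;

	*attempts = 0;
	for (;;) {
		struct toy_inode *delegated = NULL;

		toy_dir_locked = true;           /* lock the parent directory */
		err = toy_vfs_unlink(victim, &delegated);
		toy_dir_locked = false;          /* drop it before breaking */
		(*attempts)++;
		if (!delegated)
			return err;
		toy_break_deleg_wait(delegated);
	}
}
```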
-
int
vfs_link
(struct dentry * old_dentry, struct inode * dir, struct dentry * new_dentry, struct inode ** delegated_inode)¶ create a new link
Parameters
struct dentry * old_dentry
- object to be linked
struct inode * dir
- new parent
struct dentry * new_dentry
- where to create the new link
struct inode ** delegated_inode
- returns inode needing a delegation break
Description
The caller must hold dir->i_mutex.
If vfs_link discovers a delegation on the to-be-linked file in need of breaking, it will return -EWOULDBLOCK and return a reference to the inode in delegated_inode. The caller should then break the delegation and retry. Because breaking a delegation may take a long time, the caller should drop the i_mutex before doing so.
Alternatively, a caller may pass NULL for delegated_inode. This may be appropriate for callers that expect the underlying filesystem not to be NFS exported.
-
int
vfs_rename
(struct inode * old_dir, struct dentry * old_dentry, struct inode * new_dir, struct dentry * new_dentry, struct inode ** delegated_inode, unsigned int flags)¶ rename a filesystem object
Parameters
struct inode * old_dir
- parent of source
struct dentry * old_dentry
- source
struct inode * new_dir
- parent of destination
struct dentry * new_dentry
- destination
struct inode ** delegated_inode
- returns an inode needing a delegation break
unsigned int flags
- rename flags
Description
The caller must hold multiple mutexes (see lock_rename()).
If vfs_rename discovers a delegation in need of breaking at either the source or destination, it will return -EWOULDBLOCK and return a reference to the inode in delegated_inode. The caller should then break the delegation and retry. Because breaking a delegation may take a long time, the caller should drop all locks before doing so.
Alternatively, a caller may pass NULL for delegated_inode. This may be appropriate for callers that expect the underlying filesystem not to be NFS exported.
The worst of all namespace operations - renaming a directory. “Perverted” doesn’t even start to describe it. Somebody in UCB had a heck of a trip… Problems:
- we can get into loop creation.
- race potential - two innocent renames can create a loop together. That’s where 4.4 screws up. Current fix: serialization on sb->s_vfs_rename_mutex. We might be more accurate, but that’s another story.
- we have to lock _four_ objects - parents and victim (if it exists), and source (if it is not a directory). And that - after we got ->i_mutex on parents (until then we don’t know whether the target exists). Solution: try to be smart with locking order for inodes. We rely on the fact that tree topology may change only under ->s_vfs_rename_mutex _and_ that parent of the object we move will be locked. Thus we can rank directories by the tree (ancestors first) and rank all non-directories after them. That works since everybody except rename does “lock parent, lookup, lock child” and rename is under ->s_vfs_rename_mutex. HOWEVER, it relies on the assumption that any object with ->lookup() has no more than 1 dentry. If “hybrid” objects will ever appear, we’d better make sure that there’s no link(2) for them.
- conversion from fhandle to dentry may come in the wrong moment - when we are removing the target. Solution: we will have to grab ->i_mutex in the fhandle_to_dentry code. [FIXME - current nfsfh.c relies on ->i_mutex on parents, which works but leads to some truly excessive locking].
-
int
vfs_readlink
(struct dentry * dentry, char __user * buffer, int buflen)¶ copy symlink body into userspace buffer
Parameters
struct dentry * dentry
- dentry on which to get symbolic link
char __user * buffer
- user memory pointer
int buflen
- size of buffer
Description
Does not touch atime. That’s up to the caller if necessary.
Does not call security hook.
-
const char *
vfs_get_link
(struct dentry * dentry, struct delayed_call * done)¶ get symlink body
Parameters
struct dentry * dentry
- dentry on which to get symbolic link
struct delayed_call * done
- caller needs to free returned data with this
Description
Calls security hook and i_op->get_link() on the supplied inode.
It does not touch atime. That’s up to the caller if necessary.
Does not work on “special” symlinks like /proc/$$/fd/N
-
int
sync_mapping_buffers
(struct address_space * mapping)¶ write out & wait upon a mapping’s “associated” buffers
Parameters
struct address_space * mapping
- the mapping which wants those buffers written
Description
Starts I/O against the buffers at mapping->private_list, and waits upon that I/O.
Basically, this is a convenience function for fsync(). mapping is a file or directory which needs those buffers to be written for a successful fsync().
-
void
mark_buffer_dirty
(struct buffer_head * bh)¶ mark a buffer_head as needing writeout
Parameters
struct buffer_head * bh
- the buffer_head to mark dirty
Description
mark_buffer_dirty() will set the dirty bit against the buffer, then set its backing page dirty, then tag the page as dirty in the page cache and then attach the address_space’s inode to its superblock’s dirty inode list.
mark_buffer_dirty() is atomic. It takes bh->b_page->mapping->private_lock, i_pages lock and mapping->host->i_lock.
-
struct buffer_head *
__bread_gfp
(struct block_device * bdev, sector_t block, unsigned size, gfp_t gfp)¶ reads a specified block and returns the bh
Parameters
struct block_device * bdev
- the block_device to read from
sector_t block
- number of block
unsigned size
- size (in bytes) to read
gfp_t gfp
- page allocation flag
Description
Reads a specified block and returns the buffer head that contains it. If gfp is zero, the page cache page is allocated from a non-movable area so that it does not prevent page migration. Returns NULL if the block was unreadable.
-
void
block_invalidatepage
(struct page * page, unsigned int offset, unsigned int length)¶ invalidate part or all of a buffer-backed page
Parameters
struct page * page
- the page which is affected
unsigned int offset
- start of the range to invalidate
unsigned int length
- length of the range to invalidate
Description
block_invalidatepage() is called when all or part of the page has become invalidated by a truncate operation.
block_invalidatepage() does not have to release all buffers, but it must ensure that no dirty buffer is left outside offset and that no I/O is underway against any of the blocks which are outside the truncation point, because the caller is about to free (and possibly reuse) those blocks on-disk.
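Which buffers get discarded can be modelled in userspace. This is a sketch of our reading of the loop in fs/buffer.c, not the kernel code itself: `buffers_to_discard()` is hypothetical and returns a bitmask instead of touching real buffer_heads. A buffer is discarded only if it starts at or after offset and ends at or before offset + length; a buffer straddling either edge is left alone.

```c
#include <assert.h>

/* Hypothetical userspace model: bit i of the result is set when buffer i
 * of the page lies entirely inside [offset, offset + length). */
static unsigned long buffers_to_discard(unsigned int page_size,
					unsigned int blocksize,
					unsigned int offset,
					unsigned int length)
{
	unsigned int stop = offset + length;
	unsigned int curr_off;
	unsigned int i = 0;
	unsigned long mask = 0;

	assert(offset + length <= page_size);
	for (curr_off = 0; curr_off < page_size; curr_off += blocksize, i++) {
		unsigned int next_off = curr_off + blocksize;

		/* A buffer reaching past the end of the range stops the walk. */
		if (next_off > stop)
			break;
		/* Only buffers starting at or after offset are discarded. */
		if (offset <= curr_off)
			mask |= 1UL << i;
	}
	return mask;
}
```

For a 4096-byte page of 1024-byte buffers, truncating at offset 1500 discards only buffers 2 and 3; buffer 1 still holds valid data before the truncation point.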
-
void
clean_bdev_aliases
(struct block_device * bdev, sector_t block, sector_t len)¶ clean a range of buffers in block device
Parameters
struct block_device * bdev
- Block device to clean buffers in
sector_t block
- Start of a range of blocks to clean
sector_t len
- Number of blocks to clean
Description
We are taking a range of blocks for data and we don’t want writeback of any buffer-cache aliases from the moment this function returns until something explicitly marks the buffer dirty (hopefully that will not happen until we free that block ;-) We don’t even need to mark it not-uptodate - nobody can expect anything from a newly allocated buffer anyway. We used to use unmap_buffer() for such invalidation, but that was wrong. We definitely don’t want to mark the alias unmapped, for example - it would confuse anyone who might pick it up with bread() afterwards…
Also note that bforget() doesn’t lock the buffer, so there can be writeout I/O going on against recently-freed buffers. We don’t wait on that I/O in bforget() - it’s more efficient to wait on the I/O only if we really need to. That happens here.
-
void
ll_rw_block
(int op, int op_flags, int nr, struct buffer_head * bhs)¶ low-level access to block devices (DEPRECATED)
Parameters
int op
- whether to READ or WRITE
int op_flags
- req_flag_bits
int nr
- number of struct buffer_heads in the array
struct buffer_head * bhs
- array of pointers to struct buffer_head
Description
ll_rw_block() takes an array of pointers to struct buffer_heads, and requests an I/O operation on them, either a REQ_OP_READ or a REQ_OP_WRITE. op_flags contains flags modifying the detailed I/O behavior, most notably REQ_RAHEAD.
This function drops any buffer that it cannot get a lock on (with the BH_Lock state bit), any buffer that appears to be clean when doing a write request, and any buffer that appears to be up-to-date when doing a read request. Further, it marks buffers that are processed for writing as clean (the buffer cache won’t assume that they are actually clean until the buffer gets unlocked).
ll_rw_block sets b_end_io to simple completion handler that marks the buffer up-to-date (if appropriate), unlocks the buffer and wakes any waiters.
All of the buffers must be for the same device, and must also be a multiple of the current approved size for the device.
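The filtering rules above can be sketched as a userspace model. `struct model_bh` and `model_ll_rw_block()` are hypothetical; the booleans stand in for the BH_Lock, BH_Dirty and BH_Uptodate state bits, and no I/O is actually issued. As in the description, a submitted buffer stays locked (the completion handler would unlock it), while a skipped buffer is unlocked again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_op { MODEL_READ, MODEL_WRITE };

/* Hypothetical stand-in for the buffer_head state bits that matter here. */
struct model_bh {
	bool locked;    /* BH_Lock */
	bool dirty;     /* BH_Dirty */
	bool uptodate;  /* BH_Uptodate */
	bool submitted; /* set by the model where real I/O would be issued */
};

/* Returns how many buffers would be submitted. */
static size_t model_ll_rw_block(enum model_op op, struct model_bh *bhs, size_t nr)
{
	size_t submitted = 0;

	for (size_t i = 0; i < nr; i++) {
		struct model_bh *bh = &bhs[i];

		if (bh->locked)          /* cannot get the lock: drop it */
			continue;
		bh->locked = true;
		if (op == MODEL_WRITE) {
			if (bh->dirty) {
				bh->dirty = false; /* marked clean when processed */
				bh->submitted = true;
				submitted++;
				continue;  /* stays locked until "completion" */
			}
		} else if (!bh->uptodate) {
			bh->submitted = true;
			submitted++;
			continue;
		}
		bh->locked = false;      /* nothing to do: unlock again */
	}
	return submitted;
}
```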
-
int
bh_uptodate_or_lock
(struct buffer_head * bh)¶ Test whether the buffer is uptodate
Parameters
struct buffer_head * bh
- struct buffer_head
Description
Return true if the buffer is up-to-date and false, with the buffer locked, if not.
-
int
bh_submit_read
(struct buffer_head * bh)¶ Submit a locked buffer for reading
Parameters
struct buffer_head * bh
- struct buffer_head
Description
Returns zero on success and -EIO on error.
-
void
bio_reset
(struct bio * bio)¶ reinitialize a bio
Parameters
struct bio * bio
- bio to reset
Description
After calling bio_reset(), bio will be in the same state as a freshly allocated bio returned by bio_alloc_bioset() - the only fields that are preserved are the ones that are initialized by bio_alloc_bioset(). See comment in struct bio.
-
void
bio_chain
(struct bio * bio, struct bio * parent)¶ chain bio completions
Parameters
struct bio * bio
- the target bio
struct bio * parent
- the bio’s parent bio
Description
The caller won’t have a bi_end_io called when bio completes - instead, parent’s bi_end_io won’t be called until both parent and bio have completed; the chained bio will also be freed when it completes.
The caller must not set bi_private or bi_end_io in bio.
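The completion rule can be illustrated with a userspace model of a "remaining" counter. Everything here (`struct model_bio`, `model_bio_chain()`, `model_bio_endio()`, `count_endio()`) is hypothetical. As far as we understand it, the kernel keeps a similar count in `__bi_remaining` but only honours it for bios flagged BIO_CHAIN, and it also frees the chained bio; the model omits both details.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of chained completion. */
struct model_bio {
	int remaining;                      /* completions still expected */
	void (*end_io)(struct model_bio *); /* like bi_end_io */
	struct model_bio *parent;           /* like bi_private for a chain */
	int completed;                      /* counts end_io invocations */
};

static void model_bio_endio(struct model_bio *bio);

/* end_io installed on the child: completing it completes one
 * outstanding reference on the parent. */
static void model_chain_endio(struct model_bio *bio)
{
	bio->completed++;
	model_bio_endio(bio->parent);
}

static void model_bio_init(struct model_bio *bio, void (*end_io)(struct model_bio *))
{
	bio->remaining = 1;
	bio->end_io = end_io;
	bio->parent = NULL;
	bio->completed = 0;
}

static void model_bio_chain(struct model_bio *bio, struct model_bio *parent)
{
	/* The caller must not have set end_io or private data on bio. */
	assert(bio->end_io == NULL && bio->parent == NULL);
	bio->parent = parent;
	bio->end_io = model_chain_endio;
	parent->remaining++; /* parent now also waits for bio */
}

/* Only the final completion runs the end_io callback. */
static void model_bio_endio(struct model_bio *bio)
{
	if (--bio->remaining > 0)
		return;
	if (bio->end_io)
		bio->end_io(bio);
}

/* Example parent callback that just counts completions. */
static void count_endio(struct model_bio *bio)
{
	bio->completed++;
}
```

Even if the parent’s own I/O finishes first, its callback runs only once the chained child has completed as well.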
-
struct bio *
bio_alloc_bioset
(gfp_t gfp_mask, unsigned int nr_iovecs, struct bio_set * bs)¶ allocate a bio for I/O
Parameters
gfp_t gfp_mask
- the GFP_* mask given to the slab allocator
unsigned int nr_iovecs
- number of iovecs to pre-allocate
struct bio_set * bs
- the bio_set to allocate from.
Description
If bs is NULL, uses kmalloc() to allocate the bio; else the allocation is backed by the bs’s mempool.
When bs is not NULL, if __GFP_DIRECT_RECLAIM is set then bio_alloc will always be able to allocate a bio. This is due to the mempool guarantees. To make this work, callers must never allocate more than 1 bio at a time from this pool. Callers that need to allocate more than 1 bio must always submit the previously allocated bio for IO before attempting to allocate a new one. Failure to do so can cause deadlocks under memory pressure.
Note that when running under generic_make_request() (i.e. any block driver), bios are not submitted until after you return - see the code in generic_make_request() that converts recursion into iteration, to prevent stack overflows.
This would normally mean allocating multiple bios under generic_make_request() would be susceptible to deadlocks, but we have deadlock avoidance code that resubmits any blocked bios from a rescuer thread.
However, we do not guarantee forward progress for allocations from other mempools. Doing multiple allocations from the same mempool under generic_make_request() should be avoided - instead, use bio_set’s front_pad for per bio allocations.
Return
Pointer to new bio on success, NULL on failure.
-
void
bio_put
(struct bio * bio)¶ release a reference to a bio
Parameters
struct bio * bio
- bio to release reference to
Description
Put a reference to a struct bio, either one you have gotten with bio_alloc, bio_get or bio_clone_*. The last put of a bio will free it.
-
void
__bio_clone_fast
(struct bio * bio, struct bio * bio_src)¶ clone a bio that shares the original bio’s biovec
Parameters
struct bio * bio
- destination bio
struct bio * bio_src
- bio to clone
Description
Clone a bio. Caller will own the returned bio, but not the actual data it points to. Reference count of returned bio will be one.
Caller must ensure that bio_src is not freed before bio.
-
struct bio *
bio_clone_fast
(struct bio * bio, gfp_t gfp_mask, struct bio_set * bs)¶ clone a bio that shares the original bio’s biovec
Parameters
struct bio * bio
- bio to clone
gfp_t gfp_mask
- allocation priority
struct bio_set * bs
- bio_set to allocate from
Description
Like __bio_clone_fast(), but also allocates the returned bio.
-
bool
__bio_try_merge_page
(struct bio * bio, struct page * page, unsigned int len, unsigned int off, bool * same_page)¶ try appending data to an existing bvec.
Parameters
struct bio * bio
- destination bio
struct page * page
- start page to add
unsigned int len
- length of the data to add
unsigned int off
- offset of the data relative to page
bool * same_page
- return if the segment has been merged inside the same page
Description
Try to add the data at page + off to the last bvec of bio. This is a useful optimisation for file systems with a block size smaller than the page size.
Warn if (len, off) crosses pages when same_page is true.
Return true on success or false on failure.
-
void
__bio_add_page
(struct bio * bio, struct page * page, unsigned int len, unsigned int off)¶ add page(s) to a bio in a new segment
Parameters
struct bio * bio
- destination bio
struct page * page
- start page to add
unsigned int len
- length of the data to add, may cross pages
unsigned int off
- offset of the data relative to page, may cross pages
Description
Add the data at page + off to bio as a new bvec. The caller must ensure that bio has space for another bvec.
-
int
bio_add_page
(struct bio * bio, struct page * page, unsigned int len, unsigned int offset)¶ attempt to add page(s) to bio
Parameters
struct bio * bio
- destination bio
struct page * page
- start page to add
unsigned int len
- vec entry length, may cross pages
unsigned int offset
- vec entry offset relative to page, may cross pages
Description
Attempt to add page(s) to the bio_vec maplist. This will only fail if either bio->bi_vcnt == bio->bi_max_vecs or it’s a cloned bio.
-
int
submit_bio_wait
(struct bio * bio)¶ submit a bio, and wait until it completes
Parameters
struct bio * bio
- The struct bio which describes the I/O
Description
Simple wrapper around submit_bio(). Returns 0 on success, or the error from bio_endio() on failure.
WARNING: Unlike how submit_bio() is usually used, this function does not consume the bio reference. The caller must drop the reference themselves.
-
void
bio_advance
(struct bio * bio, unsigned bytes)¶ increment/complete a bio by some number of bytes
Parameters
struct bio * bio
- bio to advance
unsigned bytes
- number of bytes to complete
Description
This updates bi_sector, bi_size and bi_idx; if the number of bytes to complete doesn’t align with a bvec boundary, then bv_len and bv_offset will be updated on the last bvec as well.
bio will then represent the remaining, uncompleted portion of the io.
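The bookkeeping can be sketched as a userspace model. The names (`struct model_bvec`, `struct model_iter`, `model_bio_advance()`) are hypothetical, and the sketch follows the immutable-biovec scheme, which is our assumption about current kernels: partial progress inside a bvec is recorded in `bi_bvec_done` rather than by rewriting `bv_len`/`bv_offset` as the description above states.

```c
#include <assert.h>

/* Hypothetical stand-ins for struct bio_vec and struct bvec_iter. */
struct model_bvec {
	unsigned int bv_len;          /* bytes in this segment */
};

struct model_iter {
	unsigned long long bi_sector; /* device address, 512-byte sectors */
	unsigned int bi_size;         /* residual I/O count in bytes */
	unsigned int bi_idx;          /* current index into the bvec array */
	unsigned int bi_bvec_done;    /* bytes completed in current bvec */
};

static void model_bio_advance(struct model_iter *iter,
			      const struct model_bvec *bv, unsigned int bytes)
{
	assert(bytes <= iter->bi_size);
	iter->bi_sector += bytes >> 9;  /* 512-byte sectors completed */
	iter->bi_size -= bytes;
	bytes += iter->bi_bvec_done;    /* count from start of current bvec */
	while (bytes && bytes >= bv[iter->bi_idx].bv_len) {
		bytes -= bv[iter->bi_idx].bv_len; /* whole bvec consumed */
		iter->bi_idx++;
	}
	iter->bi_bvec_done = bytes;     /* partial progress in the new bvec */
}
```

Advancing an 8 KiB bio at sector 100 by 1 KiB and then by 4 KiB leaves it at sector 110, 1 KiB into its second bvec, with 3 KiB remaining.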
-
void
bio_copy_data
(struct bio * dst, struct bio * src)¶ copy contents of data buffers from one bio to another
Parameters
struct bio * dst
- destination bio
struct bio * src
- source bio
Description
Stops when it reaches the end of either src or dst - that is, copies min(src->bi_size, dst->bi_size) bytes (or the equivalent for lists of bios).
-
void
bio_list_copy_data
(struct bio * dst, struct bio * src)¶ copy contents of data buffers from one chain of bios to another
Parameters
struct bio * dst
- destination bio list
struct bio * src
- source bio list
Description
Stops when it reaches the end of either the src list or dst list - that is, copies min(src->bi_size, dst->bi_size) bytes (or the equivalent for lists of bios).
-
void
bio_endio
(struct bio * bio)¶ end I/O on a bio
Parameters
struct bio * bio
- bio
Description
bio_endio() will end I/O on the whole bio. bio_endio() is the preferred way to end I/O on a bio. No one should call bi_end_io() directly on a bio unless they own it and thus know that it has an end_io function.
bio_endio() can be called several times on a bio that has been chained using bio_chain(). The ->bi_end_io() function will only be called the last time. At this point the BLK_TA_COMPLETE tracing event will be generated if BIO_TRACE_COMPLETION is set.
-
struct bio *
bio_split
(struct bio * bio, int sectors, gfp_t gfp, struct bio_set * bs)¶ split a bio
Parameters
struct bio * bio
- bio to split
int sectors
- number of sectors to split from the front of bio
gfp_t gfp
- gfp mask
struct bio_set * bs
- bio set to allocate from
Description
Allocates and returns a new bio which represents sectors from the start of bio, and updates bio to represent the remaining sectors.
Unless this is a discard request, the newly allocated bio will point to bio’s bi_io_vec. It is the caller’s responsibility to ensure that neither bio nor bs is freed before the split bio.
-
void
bio_trim
(struct bio * bio, int offset, int size)¶ trim a bio
Parameters
struct bio * bio
- bio to trim
int offset
- number of sectors to trim from the front of bio
int size
- size we want to trim bio to, in sectors
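bio_trim() has no description, so here is a sketch of the unit arithmetic only. It assumes the function advances the bio past offset sectors and then sets the remaining size to size sectors; `struct model_iter` and `model_bio_trim()` are hypothetical, and the bvec bookkeeping that bio_advance() would also perform is omitted.

```c
#include <assert.h>

/* Hypothetical model of the iterator fields that bio_trim() changes. */
struct model_iter {
	unsigned long long bi_sector; /* start, in 512-byte sectors */
	unsigned int bi_size;         /* remaining length, in bytes */
};

/* offset and size are in sectors; bi_size is in bytes. */
static void model_bio_trim(struct model_iter *it, int offset, int size)
{
	unsigned int bytes = (unsigned int)size << 9;

	assert(offset >= 0 && size >= 0);
	assert(((unsigned int)offset << 9) + bytes <= it->bi_size);
	it->bi_sector += (unsigned int)offset; /* skip the trimmed front */
	it->bi_size = bytes;                   /* keep only size sectors */
}
```

Trimming a 16-sector bio at sector 1000 with offset 4 and size 8 leaves a bio starting at sector 1004 and covering 4096 bytes.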
-
int
bioset_init
(struct bio_set * bs, unsigned int pool_size, unsigned int front_pad, int flags)¶ Initialize a bio_set
Parameters
struct bio_set * bs
- pool to initialize
unsigned int pool_size
- Number of bio and bio_vecs to cache in the mempool
unsigned int front_pad
- Number of bytes to allocate in front of the returned bio
int flags
- Flags to modify behavior, currently BIOSET_NEED_BVECS and BIOSET_NEED_RESCUER
Description
Set up a bio_set to be used with bio_alloc_bioset. Allows the caller to ask for a number of bytes to be allocated in front of the bio. Front pad allocation is useful for embedding the bio inside another structure, to avoid allocating extra data to go with the bio. Note that the bio must always be embedded at the END of that structure, or things will break badly. If BIOSET_NEED_BVECS is set in flags, a separate pool will be allocated for allocating iovecs. This pool is not needed e.g. for bio_clone_fast(). If BIOSET_NEED_RESCUER is set, a workqueue is created which can be used to dispatch queued requests when the mempool runs out of space.
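The front_pad layout can be modelled in userspace to show why the bio has to sit at the END of the embedding structure: the allocator returns a pointer front_pad bytes into the block, and any inline data follows the bio. All names here (`struct model_bio`, `struct my_io`, `model_alloc_bio()`, `model_free_bio()`, `model_container_of()`) are hypothetical stand-ins, not the kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct bio. */
struct model_bio {
	unsigned long long bi_sector;
	unsigned short bi_max_vecs;
};

/* A driver's per-I/O structure with the bio embedded at the END. */
struct my_io {
	int tag;
	struct model_bio bio; /* must be last: inline data follows it */
};

#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Allocate front_pad bytes, then the bio, then inline_bytes of inline
 * data (e.g. bvecs), and return a pointer to the bio itself. */
static struct model_bio *model_alloc_bio(size_t front_pad, size_t inline_bytes)
{
	char *p = calloc(1, front_pad + sizeof(struct model_bio) + inline_bytes);

	return p ? (struct model_bio *)(p + front_pad) : NULL;
}

static void model_free_bio(struct model_bio *bio, size_t front_pad)
{
	if (bio)
		free((char *)bio - front_pad);
}
```

Passing `offsetof(struct my_io, bio)` as front_pad lets the owner recover its structure from the bio with container_of-style arithmetic, with no second allocation.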
-
void
bio_disassociate_blkg
(struct bio * bio)¶ puts back the blkg reference if associated
Parameters
struct bio * bio
- target bio
Description
Helper to disassociate the blkg from bio if a blkg is associated.
-
void
bio_associate_blkg_from_css
(struct bio * bio, struct cgroup_subsys_state * css)¶ associate a bio with a specified css
Parameters
struct bio * bio
- target bio
struct cgroup_subsys_state * css
- target css
Description
Associate bio with the blkg found by combining the css’s blkg and the request_queue of the bio. This falls back to the queue’s root_blkg if the association fails with the css.
-
void
bio_associate_blkg
(struct bio * bio)¶ associate a bio with a blkg
Parameters
struct bio * bio
- target bio
Description
Associate bio with the blkg found from the bio’s css and request_queue. If one is not found, bio_lookup_blkg() creates the blkg. If a blkg is already associated, the css is reused and association redone as the request_queue may have changed.
-
void
bio_clone_blkg_association
(struct bio * dst, struct bio * src)¶ clone blkg association from src to dst bio
Parameters
struct bio * dst
- destination bio
struct bio * src
- source bio
-
int
seq_open
(struct file * file, const struct seq_operations * op)¶ initialize sequential file
Parameters
struct file * file
- file we initialize
const struct seq_operations * op
- method table describing the sequence
Description
seq_open() sets file, associating it with a sequence described by op. op->start() sets the iterator up and returns the first element of sequence. op->stop() shuts it down. op->next() returns the next element of sequence. op->show() prints element into the buffer. In case of error ->start() and ->next() return ERR_PTR(error). At the end of the sequence they return NULL. ->show() returns 0 in case of success and negative number in case of error. Returning SEQ_SKIP means “discard this element and move on”.
Note
seq_open() will allocate a struct seq_file and store its pointer in file->private_data. This pointer should not be modified.
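The start/show/next/stop protocol can be sketched as a userspace model. Everything here is hypothetical (`struct model_seq`, `model_seq_run()`, the `ex_*` callbacks, `MODEL_SEQ_SKIP`); the driver loop only approximates how seq_read() calls the operations and ignores ERR_PTR() returns and buffer growth. It does show how SEQ_SKIP discards whatever ->show() already printed for that element.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MODEL_SEQ_SKIP 1 /* hypothetical stand-in for SEQ_SKIP */

/* Hypothetical, much simplified seq_file: a fixed output buffer. */
struct model_seq {
	char buf[256];
	size_t count;
};

struct model_seq_ops {
	void *(*start)(struct model_seq *m, long long *pos);
	void *(*next)(struct model_seq *m, void *v, long long *pos);
	void (*stop)(struct model_seq *m, void *v);
	int (*show)(struct model_seq *m, void *v);
};

static void model_seq_puts(struct model_seq *m, const char *s)
{
	size_t n = strlen(s);

	if (m->count + n < sizeof(m->buf)) {
		memcpy(m->buf + m->count, s, n + 1);
		m->count += n;
	}
}

/* start, then show/next until NULL, then stop. */
static int model_seq_run(const struct model_seq_ops *op, struct model_seq *m)
{
	long long pos = 0;
	int err = 0;
	void *v;

	m->count = 0;
	m->buf[0] = '\0';
	for (v = op->start(m, &pos); v; v = op->next(m, v, &pos)) {
		size_t offs = m->count;

		err = op->show(m, v);
		if (err < 0)
			break;
		if (err > 0) { /* SEQ_SKIP: roll back this element's output */
			m->count = offs;
			m->buf[offs] = '\0';
		}
		err = 0;
	}
	op->stop(m, v);
	return err;
}

/* Example sequence over a static array. */
static int items[] = { 10, 20, 30, 40 };
#define NITEMS ((long long)(sizeof(items) / sizeof(items[0])))

static void *ex_start(struct model_seq *m, long long *pos)
{
	(void)m;
	return *pos < NITEMS ? &items[*pos] : NULL;
}

static void *ex_next(struct model_seq *m, void *v, long long *pos)
{
	(void)m;
	(void)v;
	++*pos;
	return *pos < NITEMS ? &items[*pos] : NULL;
}

static void ex_stop(struct model_seq *m, void *v)
{
	(void)m;
	(void)v;
}

static int ex_show(struct model_seq *m, void *v)
{
	char line[16];
	int val = *(int *)v;

	snprintf(line, sizeof(line), "%d\n", val);
	model_seq_puts(m, line);
	return val == 30 ? MODEL_SEQ_SKIP : 0; /* drop 30 after printing it */
}

static const struct model_seq_ops ex_ops = { ex_start, ex_next, ex_stop, ex_show };
```

Running `ex_ops` produces "10\n20\n40\n": the element 30 is printed and then discarded.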
-
ssize_t
seq_read
(struct file * file, char __user * buf, size_t size, loff_t * ppos)¶ ->read() method for sequential files.
Parameters
struct file * file
- the file to read from
char __user * buf
- the buffer to read to
size_t size
- the maximum number of bytes to read
loff_t * ppos
- the current position in the file
Description
Ready-made ->f_op->read()
-
loff_t
seq_lseek
(struct file * file, loff_t offset, int whence)¶ ->llseek() method for sequential files.
Parameters
struct file * file
- the file in question
loff_t offset
- new position
int whence
- 0 for absolute, 1 for relative position
Description
Ready-made ->f_op->llseek()
-
int
seq_release
(struct inode * inode, struct file * file)¶ free the structures associated with sequential file.
Parameters
struct inode * inode
- its inode
struct file * file
- file in question
Description
Frees the structures associated with sequential file; can be used as ->f_op->release() if you don’t have private data to destroy.
-
void
seq_escape
(struct seq_file * m, const char * s, const char * esc)¶ print string into buffer, escaping some characters
Parameters
struct seq_file * m
- target buffer
const char * s
- string
const char * esc
- set of characters that need escaping
Description
Puts string into buffer, replacing each occurrence of character from esc with usual octal escape. Use seq_has_overflowed() to check for errors.
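The “usual octal escape” is a backslash followed by three octal digits. A userspace sketch of the transformation, with the hypothetical helper `model_escape()` standing in for the seq_file machinery and returning -1 where the kernel would record an overflow:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy s into out, replacing every character that appears in esc with
 * '\' and three octal digits. Returns the number of bytes written
 * (excluding the NUL), or -1 if out is too small. */
static int model_escape(char *out, size_t size, const char *s, const char *esc)
{
	size_t n = 0;

	for (; *s; s++) {
		unsigned char c = (unsigned char)*s;

		if (strchr(esc, c)) {
			if (n + 4 >= size)
				return -1;
			out[n++] = '\\';
			out[n++] = (char)('0' + ((c >> 6) & 3));
			out[n++] = (char)('0' + ((c >> 3) & 7));
			out[n++] = (char)('0' + (c & 7));
		} else {
			if (n + 1 >= size)
				return -1;
			out[n++] = (char)c;
		}
	}
	if (n >= size)
		return -1;
	out[n] = '\0';
	return (int)n;
}
```

For example, /proc/mounts escapes space, tab, newline and backslash this way, so a space in a mount point appears as `\040`.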
-
char *
mangle_path
(char * s, const char * p, const char * esc)¶ mangle and copy path to buffer beginning
Parameters
char * s
- buffer start
const char * p
- beginning of path in above buffer
const char * esc
- set of characters that need escaping
Description
Copy the path from p to s, replacing each occurrence of character from esc with usual octal escape. Returns pointer past last written character in s, or NULL in case of failure.
-
int
seq_path
(struct seq_file * m, const struct path * path, const char * esc)¶ seq_file interface to print a pathname
Parameters
struct seq_file * m
- the seq_file handle
const struct path * path
- the struct path to print
const char * esc
- set of characters to escape in the output
Description
Return the absolute path of ‘path’, as represented by the dentry / mnt pair in the path parameter.
-
int
seq_file_path
(struct seq_file * m, struct file * file, const char * esc)¶ seq_file interface to print a pathname of a file
Parameters
struct seq_file * m
- the seq_file handle
struct file * file
- the struct file to print
const char * esc
- set of characters to escape in the output
Description
Return the absolute path to the file.
-
int
seq_write
(struct seq_file * seq, const void * data, size_t len)¶ write arbitrary data to buffer
Parameters
struct seq_file * seq
- seq_file identifying the buffer to which data should be written
const void * data
- data address
size_t len
- number of bytes
Description
Return 0 on success, non-zero otherwise.
-
void
seq_pad
(struct seq_file * m, char c)¶ write padding spaces to buffer
Parameters
struct seq_file * m
- seq_file identifying the buffer to which data should be written
char c
- the byte to append after padding if non-zero
-
struct hlist_node *
seq_hlist_start
(struct hlist_head * head, loff_t pos)¶ start an iteration of a hlist
Parameters
struct hlist_head * head
- the head of the hlist
loff_t pos
- the start position of the sequence
Description
Called at seq_file->op->start().
-
struct hlist_node *
seq_hlist_start_head
(struct hlist_head * head, loff_t pos)¶ start an iteration of a hlist
Parameters
struct hlist_head * head
- the head of the hlist
loff_t pos
- the start position of the sequence
Description
Called at seq_file->op->start(). Call this function if you want to print a header at the top of the output.
-
struct hlist_node *
seq_hlist_next
(void * v, struct hlist_head * head, loff_t * ppos)¶ move to the next position of the hlist
Parameters
void * v
- the current iterator
struct hlist_head * head
- the head of the hlist
loff_t * ppos
- the current position
Description
Called at seq_file->op->next().
-
struct hlist_node *
seq_hlist_start_rcu
(struct hlist_head * head, loff_t pos)¶ start an iteration of a hlist protected by RCU
Parameters
struct hlist_head * head
- the head of the hlist
loff_t pos
- the start position of the sequence
Description
Called at seq_file->op->start().
This list-traversal primitive may safely run concurrently with the _rcu list-mutation primitives such as hlist_add_head_rcu() as long as the traversal is guarded by rcu_read_lock().
-
struct hlist_node *
seq_hlist_start_head_rcu
(struct hlist_head * head, loff_t pos)¶ start an iteration of a hlist protected by RCU
Parameters
struct hlist_head * head
- the head of the hlist
loff_t pos
- the start position of the sequence
Description
Called at seq_file->op->start(). Call this function if you want to print a header at the top of the output.
This list-traversal primitive may safely run concurrently with the _rcu list-mutation primitives such as hlist_add_head_rcu() as long as the traversal is guarded by rcu_read_lock().
-
struct hlist_node *
seq_hlist_next_rcu
(void * v, struct hlist_head * head, loff_t * ppos)¶ move to the next position of the hlist protected by RCU
Parameters
void * v
- the current iterator
struct hlist_head * head
- the head of the hlist
loff_t * ppos
- the current position
Description
Called at seq_file->op->next().
This list-traversal primitive may safely run concurrently with the _rcu list-mutation primitives such as hlist_add_head_rcu() as long as the traversal is guarded by rcu_read_lock().
-
struct hlist_node *
seq_hlist_start_percpu
(struct hlist_head __percpu * head, int * cpu, loff_t pos)¶ start an iteration of a percpu hlist array
Parameters
struct hlist_head __percpu * head
- pointer to percpu array of struct hlist_heads
int * cpu
- pointer to cpu “cursor”
loff_t pos
- start position of sequence
Description
Called at seq_file->op->start().
-
struct hlist_node *
seq_hlist_next_percpu
(void * v, struct hlist_head __percpu * head, int * cpu, loff_t * pos)¶ move to the next position of the percpu hlist array
Parameters
void * v
- pointer to current hlist_node
struct hlist_head __percpu * head
- pointer to percpu array of struct hlist_heads
int * cpu
- pointer to cpu “cursor”
loff_t * pos
- start position of sequence
Description
Called at seq_file->op->next().
-
int
register_filesystem
(struct file_system_type * fs)¶ register a new filesystem
Parameters
struct file_system_type * fs
- the file system structure
Description
Adds the file system passed to the list of file systems the kernel is aware of for mount and other syscalls. Returns 0 on success, or a negative errno code on an error.
The struct file_system_type that is passed is linked into the kernel structures and must not be freed until the file system has been unregistered.
-
int
unregister_filesystem
(struct file_system_type * fs)¶ unregister a file system
Parameters
struct file_system_type * fs
- filesystem to unregister
Description
Remove a file system that was previously successfully registered with the kernel. An error is returned if the file system is not found. Zero is returned on a success.
Once this function has returned, the struct file_system_type structure may be freed or reused.
-
void
wbc_attach_and_unlock_inode
(struct writeback_control * wbc, struct inode * inode)¶ associate wbc with target inode and unlock it
Parameters
struct writeback_control * wbc
- writeback_control of interest
struct inode * inode
- target inode
Description
inode is locked and about to be written back under the control of wbc. Record inode’s writeback context into wbc and unlock the i_lock. On writeback completion, wbc_detach_inode() should be called. This is used to track the cgroup writeback context.
-
void
wbc_detach_inode
(struct writeback_control * wbc)¶ disassociate wbc from inode and perform foreign detection
Parameters
struct writeback_control * wbc
- writeback_control of the just finished writeback
Description
To be called after a writeback attempt of an inode finishes and undoes wbc_attach_and_unlock_inode(). Can be called under any context.
As concurrent write sharing of an inode is expected to be very rare and memcg only tracks page ownership on first-use basis severely confining the usefulness of such sharing, cgroup writeback tracks ownership per-inode. While the support for concurrent write sharing of an inode is deemed unnecessary, an inode being written to by different cgroups at different points in time is a lot more common, and, more importantly, charging only by first-use can too readily lead to grossly incorrect behaviors (single foreign page can lead to gigabytes of writeback to be incorrectly attributed).
To resolve this issue, cgroup writeback detects the majority dirtier of an inode and transfers the ownership to it. To avoid unnecessary oscillation, the detection mechanism keeps track of history and gives out the switch verdict only if the foreign usage pattern is stable over a certain amount of time and/or writeback attempts.
On each writeback attempt, wbc tries to detect the majority writer using Boyer-Moore majority vote algorithm. In addition to the byte count from the majority voting, it also counts the bytes written for the current wb and the last round’s winner wb (max of last round’s current wb, the winner from two rounds ago, and the last round’s majority candidate). Keeping track of the historical winner helps the algorithm to semi-reliably detect the most active writer even when it’s not the absolute majority.
Once the winner of the round is determined, whether the winner is foreign or not and how much IO time the round consumed is recorded in inode->i_wb_frn_history. If the amount of recorded foreign IO time is over a certain threshold, the switch verdict is given.
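The majority vote step can be sketched as a byte-weighted Boyer-Moore vote. This follows our reading of the voting part of wbc_account_cgroup_owner() and leaves out the extra per-wb and last-round counters described above; `struct vote`, `vote_add()` and the integer writer ids are hypothetical stand-ins for cgroup writeback ids.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-weighted Boyer-Moore majority vote over (writer id, bytes). */
struct vote {
	int cand_id;              /* current majority candidate */
	unsigned long cand_bytes; /* candidate's remaining lead, in bytes */
};

static void vote_add(struct vote *v, int id, unsigned long bytes)
{
	if (v->cand_bytes == 0)
		v->cand_id = id;  /* no lead left: adopt a new candidate */
	if (id == v->cand_id)
		v->cand_bytes += bytes;
	else  /* another writer cancels out part of the lead */
		v->cand_bytes -= bytes < v->cand_bytes ? bytes : v->cand_bytes;
}
```

The vote is only guaranteed to find the writer that owns more than half of the bytes; when no such majority exists the surviving candidate may be arbitrary, which is why the history of winners is tracked as well.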
-
void
wbc_account_cgroup_owner
(struct writeback_control * wbc, struct page * page, size_t bytes)¶ account writeback to update inode cgroup ownership
Parameters
struct writeback_control * wbc
- writeback_control of the writeback in progress
struct page * page
- page being written out
size_t bytes
- number of bytes being written out
Description
bytes from page are about to be written out during the writeback controlled by wbc. Keep the books for foreign inode detection. See wbc_detach_inode().
-
int
inode_congested
(struct inode * inode, int cong_bits)¶ test whether an inode is congested
Parameters
struct inode * inode
- inode to test for congestion (may be NULL)
int cong_bits
- mask of WB_[a]sync_congested bits to test
Description
Tests whether inode is congested. cong_bits is the mask of congestion bits to test and the return value is the mask of set bits.
If cgroup writeback is enabled for inode, the congestion state is determined by whether the cgwb (cgroup bdi_writeback) for the blkcg associated with inode is congested; otherwise, the root wb’s congestion state is used.
inode is allowed to be NULL as this function is often called on mapping->host which is NULL for the swapper space.
-
void
__mark_inode_dirty
(struct inode * inode, int flags)¶ internal function
Parameters
struct inode * inode
- inode to mark
int flags
- what kind of dirty (e.g. I_DIRTY_SYNC)
Description
Mark an inode as dirty. Callers should use mark_inode_dirty or mark_inode_dirty_sync.
Put the inode on the super block’s dirty list.
CAREFUL! We mark it dirty unconditionally, but move it onto the dirty list only if it is hashed or if it refers to a blockdev. If it was not hashed, it will never be added to the dirty list even if it is later hashed, as it will have been marked dirty already.
In short, make sure you hash any inodes _before_ you start marking them dirty.
Note that for blockdevs, inode->dirtied_when represents the dirtying time of the block-special inode (/dev/hda1) itself. And the ->dirtied_when field of the kernel-internal blockdev inode represents the dirtying time of the blockdev’s pages. This is why for I_DIRTY_PAGES we always use page->mapping->host, so the page-dirtying time is recorded in the internal blockdev inode.
-
void
writeback_inodes_sb_nr
(struct super_block * sb, unsigned long nr, enum wb_reason reason)¶ writeback dirty inodes from given super_block
Parameters
struct super_block * sb
- the superblock
unsigned long nr
- the number of pages to write
enum wb_reason reason
- reason why some writeback work was initiated
Description
Start writeback on some inodes on this super_block. No guarantees are made on how many (if any) will be written, and this function does not wait for IO completion of submitted IO.
-
void
writeback_inodes_sb
(struct super_block * sb, enum wb_reason reason)¶ writeback dirty inodes from given super_block
Parameters
struct super_block * sb
- the superblock
enum wb_reason reason
- reason why some writeback work was initiated
Description
Start writeback on some inodes on this super_block. No guarantees are made on how many (if any) will be written, and this function does not wait for IO completion of submitted IO.
-
void
try_to_writeback_inodes_sb
(struct super_block * sb, enum wb_reason reason)¶ try to start writeback if none underway
Parameters
struct super_block * sb
- the superblock
enum wb_reason reason
- reason why some writeback work was initiated
Description
Invoke __writeback_inodes_sb_nr if no writeback is currently underway.
-
void
sync_inodes_sb
(struct super_block * sb)¶ sync sb inode pages
Parameters
struct super_block * sb
- the superblock
Description
This function writes and waits on any dirty inode belonging to this super_block.
-
int
write_inode_now
(struct inode * inode, int sync)¶ write an inode to disk
Parameters
struct inode * inode
- inode to write to disk
int sync
- whether the write should be synchronous or not
Description
This function commits an inode to disk immediately if it is dirty. This is primarily needed by knfsd.
The caller must either have a ref on the inode or must have set I_WILL_FREE.
-
int
sync_inode
(struct inode * inode, struct writeback_control * wbc)¶ write an inode and its pages to disk.
Parameters
struct inode * inode
- the inode to sync
struct writeback_control * wbc
- controls the writeback mode
Description
sync_inode() will write an inode and its pages to disk. It will also correctly update the inode on its superblock’s dirty inode lists and will update inode->i_state.
The caller must have a ref on the inode.
-
int
sync_inode_metadata
(struct inode * inode, int wait)¶ write an inode to disk
Parameters
struct inode * inode
- the inode to sync
int wait
- wait for I/O to complete.
Description
Write an inode to disk and adjust its dirty state after completion.
Note
Only writes the actual inode, not any associated data or other metadata.
-
struct super_block *
freeze_bdev
(struct block_device * bdev)¶ lock a filesystem and force it into a consistent state
Parameters
struct block_device * bdev
- blockdevice to lock
Description
If a superblock is found on this device, we take the s_umount semaphore on it to make sure nobody unmounts until the snapshot creation is done. The reference counter (bd_fsfreeze_count) guarantees that, when multiple freeze requests arrive simultaneously, only the last unfreeze actually unfreezes the frozen filesystem. It counts up in freeze_bdev() and counts down in thaw_bdev(). When it reaches 0, thaw_bdev() actually unfreezes the filesystem.
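The nesting rule for bd_fsfreeze_count can be modelled in a few lines. `struct model_bdev`, `model_freeze()` and `model_thaw()` are hypothetical; the sketch is single-threaded (the real functions serialise on a mutex), uses a boolean in place of the superblock’s frozen state, and returns -1 where we understand thaw_bdev() to return -EINVAL for a device that is not frozen.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the bd_fsfreeze_count nesting rule. */
struct model_bdev {
	int fsfreeze_count;
	bool frozen; /* stands in for the superblock's frozen state */
};

static void model_freeze(struct model_bdev *b)
{
	if (b->fsfreeze_count++ > 0)
		return;      /* already frozen: only count the request */
	b->frozen = true;    /* first freezer does the real work */
}

/* Returns 0, or -1 if the device was not frozen. */
static int model_thaw(struct model_bdev *b)
{
	if (b->fsfreeze_count == 0)
		return -1;
	if (--b->fsfreeze_count > 0)
		return 0;    /* other freezers still hold it frozen */
	b->frozen = false;   /* last thaw actually unfreezes */
	return 0;
}
```

Two freezes followed by one thaw leave the filesystem frozen; only the second thaw unfreezes it.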
-
int
thaw_bdev
(struct block_device * bdev, struct super_block * sb)¶ unlock filesystem
Parameters
struct block_device * bdev
- blockdevice to unlock
struct super_block * sb
- associated superblock
Description
Unlocks the filesystem and marks it writeable again after freeze_bdev().
-
int
bdev_read_page
(struct block_device * bdev, sector_t sector, struct page * page)¶ Start reading a page from a block device
Parameters
struct block_device * bdev
- The device to read the page from
sector_t sector
- The offset on the device to read the page to (need not be aligned)
struct page * page
- The page to read
Description
On entry, the page should be locked. It will be unlocked when the page has been read. If the block driver implements rw_page synchronously, that will be true on exit from this function, but it need not be.
Errors returned by this function are usually “soft”, e.g. out of memory, or queue full; callers should try a different route to read this page rather than propagate an error back up the stack.
Return
negative errno if an error occurs, 0 if submission was successful.
-
int
bdev_write_page
(struct block_device * bdev, sector_t sector, struct page * page, struct writeback_control * wbc)¶ Start writing a page to a block device
Parameters
struct block_device * bdev
- The device to write the page to
sector_t sector
- The offset on the device to write the page to (need not be aligned)
struct page * page
- The page to write
struct writeback_control * wbc
- The writeback_control for the write
Description
On entry, the page should be locked and not currently under writeback. On exit, if the write started successfully, the page will be unlocked and under writeback. If the write failed already (e.g. the driver failed to queue the page to the device), the page will still be locked. If the caller is a ->writepage implementation, it will need to unlock the page.
Errors returned by this function are usually “soft”, e.g. out of memory or queue full; callers should try a different route to write this page rather than propagate an error back up the stack.
Return
negative errno if an error occurs, 0 if submission was successful.
-
struct block_device *
bdgrab
(struct block_device * bdev)¶ Grab a reference to an already referenced block device
Parameters
struct block_device * bdev
- Block device to grab a reference to.
-
struct block_device *
bd_start_claiming
(struct block_device * bdev, void * holder)¶ start claiming a block device
Parameters
struct block_device * bdev
- block device of interest
void * holder
- holder trying to claim bdev
Description
bdev is about to be opened exclusively. Check that bdev can be opened exclusively and mark that an exclusive open is in progress. Each successful call to this function must be matched with a call to either bd_finish_claiming() or bd_abort_claiming() (which do not fail).
This function is used to gain exclusive access to the block device without actually causing other exclusive open attempts to fail. It should be used when the open sequence itself requires exclusive access but may subsequently fail.
Context
Might sleep.
Return
Pointer to the block device containing bdev on success, ERR_PTR() value on failure.
-
void
bd_finish_claiming
(struct block_device * bdev, struct block_device * whole, void * holder)¶ finish claiming of a block device
Parameters
struct block_device * bdev
- block device of interest
struct block_device * whole
- whole block device (returned from bd_start_claiming())
void * holder
- holder that has claimed bdev
Description
Finish exclusive open of a block device. Mark the device as exclusively open by the holder and wake up all waiters for exclusive open to finish.
-
void
bd_abort_claiming
(struct block_device * bdev, struct block_device * whole, void * holder)¶ abort claiming of a block device
Parameters
struct block_device * bdev
- block device of interest
struct block_device * whole
- whole block device (returned from bd_start_claiming())
void * holder
- holder that has claimed bdev
Description
Abort claiming of a block device when the exclusive open failed. This can also be used when an exclusive open is not actually desired and we just need to block other exclusive openers for a while.
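The pairing rule for bd_start_claiming(), bd_finish_claiming() and bd_abort_claiming() can be sketched as a small userspace model. All names here are hypothetical; the two fields only loosely mirror bd_holder and bd_claiming, and where the kernel would sleep until another claimer finishes, the model simply returns -EBUSY.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a block device's exclusive-open bookkeeping. */
struct model_dev {
	void *holder;    /* current exclusive holder (like bd_holder) */
	void *claiming;  /* holder with a claim in progress (like bd_claiming) */
};

/* Step 1: reserve the device while the open sequence runs. */
static int model_start_claiming(struct model_dev *d, void *holder)
{
	assert(holder != NULL);
	if (d->holder && d->holder != holder)
		return -EBUSY;         /* held exclusively by someone else */
	if (d->claiming && d->claiming != holder)
		return -EBUSY;         /* the kernel would wait and retry instead */
	d->claiming = holder;
	return 0;
}

/* Step 2a: the open succeeded, so the claim becomes a real hold. */
static void model_finish_claiming(struct model_dev *d, void *holder)
{
	assert(d->claiming == holder);
	d->holder = holder;
	d->claiming = NULL;
}

/* Step 2b: the open failed, or exclusivity was only needed briefly. */
static void model_abort_claiming(struct model_dev *d, void *holder)
{
	assert(d->claiming == holder);
	d->claiming = NULL;
}

/* Hypothetical open steps used to exercise both outcomes. */
static int open_ok(void) { return 0; }
static int open_fails(void) { return -EIO; }

/* Every successful start is matched with exactly one finish or abort. */
static int model_exclusive_open(struct model_dev *d, void *holder,
				int (*do_open)(void))
{
	int err = model_start_claiming(d, holder);

	if (err)
		return err;
	err = do_open();
	if (err)
		model_abort_claiming(d, holder);
	else
		model_finish_claiming(d, holder);
	return err;
}
```

A failed open leaves the device unclaimed, so a later attempt by the same or another holder can proceed.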
-
int
bd_link_disk_holder
(struct block_device * bdev, struct gendisk * disk)¶ create symlinks between holding disk and slave bdev
Parameters
struct block_device * bdev
- the claimed slave bdev
struct gendisk * disk
- the holding disk
Description
DON’T USE THIS UNLESS YOU’RE ALREADY USING IT.
This function creates the following sysfs symlinks.
- from “slaves” directory of the holder disk to the claimed bdev
- from “holders” directory of the bdev to the holder disk
For example, if /dev/dm-0 maps to /dev/sda and the disk for dm-0 is passed to bd_link_disk_holder(), then:
/sys/block/dm-0/slaves/sda -> /sys/block/sda
/sys/block/sda/holders/dm-0 -> /sys/block/dm-0
The caller must have claimed bdev before calling this function and ensure that both bdev and disk are valid during the creation and lifetime of these symlinks.
Context
Might sleep.
Return
0 on success, -errno on failure.
-
void
bd_unlink_disk_holder
(struct block_device * bdev, struct gendisk * disk)¶ destroy symlinks created by bd_link_disk_holder()
Parameters
struct block_device * bdev
- the claimed slave bdev
struct gendisk * disk
- the holding disk
Description
DON’T USE THIS UNLESS YOU’RE ALREADY USING IT.
Context
Might sleep.
-
int
revalidate_disk
(struct gendisk * disk)¶ wrapper for lower-level driver’s revalidate_disk call-back
Parameters
struct gendisk * disk
- struct gendisk to be revalidated
Description
This routine is a wrapper for lower-level driver’s revalidate_disk call-backs. It is used to do common pre and post operations needed for all revalidate_disk operations.
-
int
blkdev_get
(struct block_device * bdev, fmode_t mode, void * holder)¶ open a block device
Parameters
struct block_device * bdev
- block_device to open
fmode_t mode
- FMODE_* mask
void * holder
- exclusive holder identifier
Description
Open bdev with mode. If mode includes FMODE_EXCL, bdev is opened with exclusive access. Specifying FMODE_EXCL with a NULL holder is invalid. Exclusive opens may nest for the same holder.
On success, the reference count of bdev is unchanged. On failure, bdev is put.
Context
Might sleep.
Return
0 on success, -errno on failure.
-
struct block_device *
blkdev_get_by_path
(const char * path, fmode_t mode, void * holder)¶ open a block device by name
Parameters
const char * path
- path to the block device to open
fmode_t mode
- FMODE_* mask
void * holder
- exclusive holder identifier
Description
Open the block device described by the device file at path. mode and holder have the same meaning as for blkdev_get().
On success, the returned block_device has reference count of one.
Context
Might sleep.
Return
Pointer to block_device on success, ERR_PTR(-errno) on failure.
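Callers distinguish the two outcomes with the kernel's error-pointer convention. The sketch below is a userspace re-creation of the ERR_PTR()/IS_ERR()/PTR_ERR() helpers from include/linux/err.h (the last MAX_ERRNO = 4095 addresses encode errno values); model_get_by_path() and struct model_blockdev are hypothetical stand-ins for blkdev_get_by_path() and struct block_device, and the error code they return is illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_ERRNO 4095

/* Encode a negative errno as a pointer near the top of the address space. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

/* Decode the errno back out of an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* The last MAX_ERRNO addresses never hold valid objects. */
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical device object and lookup; only "/dev/sda" exists here. */
struct model_blockdev { int id; };
static struct model_blockdev sda_model = { 1 };

static struct model_blockdev *model_get_by_path(const char *path)
{
	if (strcmp(path, "/dev/sda") != 0)
		return ERR_PTR(-ENOENT);
	return &sda_model;
}
```

The caller checks IS_ERR() before dereferencing and uses PTR_ERR() to propagate the error, so no separate error out-parameter is needed.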
-
struct block_device *
blkdev_get_by_dev
(dev_t dev, fmode_t mode, void * holder)¶ open a block device by device number
Parameters
dev_t dev
- device number of block device to open
fmode_t mode
- FMODE_* mask
void * holder
- exclusive holder identifier
Description
Open the block device described by device number dev. mode and holder have the same meaning as for blkdev_get().
Use it ONLY if you really do not have anything better - i.e. when you are behind a truly sucky interface and all you are given is a device number. _Never_ to be used for internal purposes. If you ever need it - reconsider your API.
On success, the returned block_device has reference count of one.
Context
Might sleep.
Return
Pointer to block_device on success, ERR_PTR(-errno) on failure.
-
struct block_device *
lookup_bdev
(const char * pathname)¶ lookup a struct block_device by name
Parameters
const char * pathname
- special file representing the block device
Description
Get a reference to the block device at pathname in the current namespace if possible and return it. Return ERR_PTR(error) otherwise.
-
struct file *
anon_inode_getfile
(const char * name, const struct file_operations * fops, void * priv, int flags)¶ creates a new file instance by hooking it up to an anonymous inode, and a dentry that describes the “class” of the file
Parameters
const char * name
- [in] name of the “class” of the new file
const struct file_operations * fops
- [in] file operations for the new file
void * priv
- [in] private data for the new file (will be file’s private_data)
int flags
- [in] flags
Description
Creates a new file by hooking it on a single inode. This is useful for files that do not need to have a full-fledged inode in order to operate correctly. All the files created with anon_inode_getfile() will share a single inode, hence saving memory and avoiding code duplication for the file/inode/dentry setup. Returns the newly created file* or an error pointer.
-
int
anon_inode_getfd
(const char * name, const struct file_operations * fops, void * priv, int flags)¶ creates a new file instance by hooking it up to an anonymous inode, and a dentry that describes the “class” of the file
Parameters
const char * name
- [in] name of the “class” of the new file
const struct file_operations * fops
- [in] file operations for the new file
void * priv
- [in] private data for the new file (will be file’s private_data)
int flags
- [in] flags
Description
Creates a new file by hooking it on a single inode. This is useful for files that do not need to have a full-fledged inode in order to operate correctly. All the files created with anon_inode_getfd() will share a single inode, hence saving memory and avoiding code duplication for the file/inode/dentry setup. Returns new descriptor or an error code.
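From userspace, anonymous-inode files can be observed through interfaces such as eventfd(2), which in kernels of this era obtains its descriptor through these helpers. The sketch assumes Linux with /proc mounted; fd_is_anon_inode() is a hypothetical helper that reads the /proc/self/fd link, which on typical systems names the file's “class”, e.g. anon_inode:[eventfd].

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Returns 1 if fd's /proc/self/fd link names an anonymous inode,
 * 0 if it does not, and -1 if the link cannot be read.
 */
static int fd_is_anon_inode(int fd, char *target, size_t len)
{
	char link[64];
	ssize_t n;

	assert(len > 1);
	snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
	n = readlink(link, target, len - 1);   /* readlink() does not NUL-terminate */
	if (n < 0)
		return -1;
	target[n] = '\0';
	return strncmp(target, "anon_inode:", strlen("anon_inode:")) == 0;
}
```

Because such files share one inode, there is no on-disk object behind the descriptor; the link text is the only name it has.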
-
int
setattr_prepare
(struct dentry * dentry, struct iattr * attr)¶ check if attribute changes to a dentry are allowed
Parameters
struct dentry * dentry
- dentry to check
struct iattr * attr
- attributes to change
Description
Check if we are allowed to change the attributes contained in attr in the given dentry. This includes the normal Unix access permission checks, as well as checks for rlimits and others. The function also clears the SGID bit from the mode if the user is not allowed to set it. File capabilities and IMA extended attributes are also cleared if ATTR_KILL_PRIV is set.
Should be called as the first thing in ->setattr implementations, possibly after taking additional locks.
-
int
inode_newsize_ok
(const struct inode * inode, loff_t offset)¶ may this inode be truncated to a given size
Parameters
const struct inode * inode
- the inode to be truncated
loff_t offset
- the new size to assign to the inode
Description
inode_newsize_ok must be called with i_mutex held.
inode_newsize_ok will check filesystem limits and ulimits to check that the new inode size is within limits. inode_newsize_ok will also send SIGXFSZ when necessary. Caller must not proceed with inode size change if failure is returned. inode must be a file (not directory), with appropriate permissions to allow truncate (inode_newsize_ok does NOT check these conditions).
Return
0 on success, -ve errno on failure
-
void
setattr_copy
(struct inode * inode, const struct iattr * attr)¶ copy simple metadata updates into the generic inode
Parameters
struct inode * inode
- the inode to be updated
const struct iattr * attr
- the new attributes
Description
setattr_copy must be called with i_mutex held.
setattr_copy updates the inode’s metadata with that specified in attr. Notably missing is the inode size update, which is more complex as it requires pagecache updates.
The inode is not marked as dirty after this operation. The rationale is that for “simple” filesystems, the struct inode is the inode storage. The caller is free to mark the inode dirty afterwards if needed.
-
int
notify_change
(struct dentry * dentry, struct iattr * attr, struct inode ** delegated_inode)¶ modify attributes of a filesystem object
Parameters
struct dentry * dentry
- object affected
struct iattr * attr
- new attributes
struct inode ** delegated_inode
- returns inode, if the inode is delegated
Description
The caller must hold the i_mutex on the affected object.
If notify_change discovers a delegation in need of breaking, it will return -EWOULDBLOCK and return a reference to the inode in delegated_inode. The caller should then break the delegation and retry. Because breaking a delegation may take a long time, the caller should drop the i_mutex before doing so.
Alternatively, a caller may pass NULL for delegated_inode. This may be appropriate for callers that expect the underlying filesystem not to be NFS exported. Also, passing NULL is fine for callers holding the file open for write, as there can be no conflicting delegation in that case.
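The delegation protocol amounts to a retry loop in the caller. The sketch below is a userspace model: model_notify_change() and model_break_deleg_wait() are hypothetical stand-ins for notify_change() and the kernel's break_deleg_wait(), and the locked flag stands in for i_mutex. The point it shows is that the lock is dropped before the potentially slow delegation break and retaken on retry.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical inode with an outstanding delegation. */
struct model_inode {
	int delegated;   /* nonzero while a delegation must be broken */
	int locked;      /* stands in for i_mutex */
	int mode;
};

/* Stand-in for notify_change(): refuses while a delegation is held. */
static int model_notify_change(struct model_inode *inode, int new_mode,
			       struct model_inode **delegated_inode)
{
	assert(inode->locked);                 /* caller must hold the lock */
	if (inode->delegated) {
		if (delegated_inode)
			*delegated_inode = inode;  /* hand the inode back to the caller */
		return -EWOULDBLOCK;
	}
	inode->mode = new_mode;
	return 0;
}

/* Stand-in for break_deleg_wait(): pretend the holder gave the delegation back. */
static int model_break_deleg_wait(struct model_inode **delegated_inode)
{
	(*delegated_inode)->delegated = 0;
	*delegated_inode = NULL;
	return 0;
}

/* Caller-side retry loop, shaped like a chmod-style path. */
static int model_chmod(struct model_inode *inode, int new_mode)
{
	struct model_inode *delegated_inode = NULL;
	int error;

	for (;;) {
		inode->locked = 1;
		error = model_notify_change(inode, new_mode, &delegated_inode);
		inode->locked = 0;             /* unlock before breaking the delegation */
		if (!delegated_inode)
			return error;
		error = model_break_deleg_wait(&delegated_inode);
		if (error)
			return error;
	}
}
```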
-
char *
d_path
(const struct path * path, char * buf, int buflen)¶ return the path of a dentry
Parameters
const struct path * path
- path to report
char * buf
- buffer to return value in
int buflen
- buffer length
Description
Convert a dentry into an ASCII path name. If the entry has been deleted, the string “ (deleted)” is appended. Note that this is ambiguous.
Returns a pointer into the buffer or an error code if the path was too long. Note: Callers should use the returned pointer, not the passed in buffer, to use the name! The implementation often starts at an offset into the buffer, and may leave 0 bytes at the start.
“buflen” should be positive.
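Why the returned pointer, not buf, must be used becomes clearer from how such a path is assembled: walking up the dentry tree yields components from the leaf towards the root, so the string is naturally built from the end of the buffer backwards. The sketch below is a hypothetical userspace model of that buffer handling only (no mounts, no “ (deleted)” suffix); it returns NULL where the kernel would return an ERR_PTR(-ENAMETOOLONG) error pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * names[] holds path components leaf first, e.g. {"ls", "bin", "usr"}.
 * Returns a pointer into buf (usually not buf itself), or NULL if the
 * path does not fit in buflen bytes.
 */
static char *model_d_path(const char *const *names, size_t n,
			  char *buf, int buflen)
{
	char *end;

	assert(buflen > 0);               /* "buflen should be positive" */
	end = buf + buflen;
	*--end = '\0';                    /* build the string from the end backwards */
	if (n == 0) {                     /* the root directory itself */
		if (end == buf)
			return NULL;
		*--end = '/';
		return end;
	}
	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(names[i]);

		if ((size_t)(end - buf) < len + 1)
			return NULL;      /* component plus '/' does not fit */
		end -= len;
		memcpy(end, names[i], len);
		*--end = '/';
	}
	return end;
}
```

The unused bytes at the start of buf are left untouched, which is exactly why printing buf instead of the returned pointer gives garbage.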
-
struct page *
dax_layout_busy_page
(struct address_space * mapping)¶ find first pinned page in mapping
Parameters
struct address_space * mapping
- address space to scan for a page with ref count > 1
Description
DAX requires ZONE_DEVICE mapped pages. These pages are never ‘onlined’ to the page allocator, so they are considered idle when page->count == 1. A filesystem uses this interface to determine whether any page in the mapping is busy, e.g. pinned for DMA or by other get_user_pages() users.
It is expected that the filesystem is holding locks to block the establishment of new mappings in this address_space. That is, it expects to be able to run unmap_mapping_range() and subsequently not race with mapping_mapped() becoming true.
-
ssize_t
dax_iomap_rw
(struct kiocb * iocb, struct iov_iter * iter, const struct iomap_ops * ops)¶ Perform I/O to a DAX file
Parameters
struct kiocb * iocb
- The control block for this I/O
struct iov_iter * iter
- The addresses to do I/O from or to
const struct iomap_ops * ops
- iomap ops passed from the file system
Description
This function performs read and write operations to directly mapped persistent memory. The caller needs to take care of read/write exclusion and of evicting any page cache pages in the region under I/O.
-
vm_fault_t
dax_iomap_fault
(struct vm_fault * vmf, enum page_entry_size pe_size, pfn_t * pfnp, int * iomap_errp, const struct iomap_ops * ops)¶ handle a page fault on a DAX file
Parameters
struct vm_fault * vmf
- The description of the fault
enum page_entry_size pe_size
- Size of the page to fault in
pfn_t * pfnp
- PFN to insert for synchronous faults if fsync is required
int * iomap_errp
- Storage for detailed error code in case of error
const struct iomap_ops * ops
- Iomap ops passed from the file system
Description
When a page fault occurs, filesystems may call this helper in their fault handler for DAX files. dax_iomap_fault() assumes the caller has done all the necessary locking for the page fault to proceed successfully.
-
vm_fault_t
dax_finish_sync_fault
(struct vm_fault * vmf, enum page_entry_size pe_size, pfn_t pfn)¶ finish synchronous page fault
Parameters
struct vm_fault * vmf
- The description of the fault
enum page_entry_size pe_size
- Size of entry to be inserted
pfn_t pfn
- PFN to insert
Description
This function ensures that the file range touched by the page fault is stored persistently on the media and handles inserting of appropriate page table entry.
-
void
dio_end_io
(struct bio * bio)¶ handle the end io action for the given bio
Parameters
struct bio * bio
- The direct I/O bio that is being completed
Description
This is meant to be called by any filesystem that uses its own dio_submit_t so that the DIO-specific endio actions are dealt with after the filesystem has done its completion work.
-
int
simple_setattr
(struct dentry * dentry, struct iattr * iattr)¶ setattr for simple filesystem
Parameters
struct dentry * dentry
- dentry
struct iattr * iattr
- iattr structure
Description
Returns 0 on success, -error on failure.
simple_setattr is a simple ->setattr implementation without a proper implementation of size changes.
It can either be used for in-memory filesystems or special files on simple regular filesystems. Anything that needs to change on-disk or wire state on size changes needs its own setattr method.
-
int
simple_write_end
(struct file * file, struct address_space * mapping, loff_t pos, unsigned len, unsigned copied, struct page * page, void * fsdata)¶ .write_end helper for non-block-device FSes
Parameters
struct file * file
- See .write_end of address_space_operations
struct address_space * mapping
- as above
loff_t pos
- as above
unsigned len
- as above
unsigned copied
- as above
struct page * page
- as above
void * fsdata
- as above
Description
simple_write_end does the minimum needed for updating a page after writing is done. It has the same API signature as the .write_end of address_space_operations vector. So it can just be set onto .write_end for FSes that don’t need any other processing. i_mutex is assumed to be held. Block based filesystems should use generic_write_end().
NOTE
Even though i_size might get updated by this function, mark_inode_dirty is not called, so a filesystem that actually does store data in .write_inode should extend what’s done here with a call to mark_inode_dirty() in the case that i_size has changed.
Use ONLY with simple_readpage()
-
ssize_t
simple_read_from_buffer
(void __user * to, size_t count, loff_t * ppos, const void * from, size_t available)¶ copy data from the buffer to user space
Parameters
void __user * to
- the user space buffer to read to
size_t count
- the maximum number of bytes to read
loff_t * ppos
- the current position in the buffer
const void * from
- the buffer to read from
size_t available
- the size of the buffer
Description
The simple_read_from_buffer() function reads up to count bytes from the buffer from at offset ppos into the user space address starting at to.
On success, the number of bytes read is returned and the offset ppos is advanced by this number; on error, a negative value is returned.
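The clamping and position rules can be checked with a small userspace model. model_read_from_buffer() is hypothetical: memcpy() stands in for copy_to_user(), so the -EFAULT path for a bad user pointer is not modelled, and off_t stands in for loff_t.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Userspace model of simple_read_from_buffer()'s offset handling. */
static ssize_t model_read_from_buffer(void *to, size_t count, off_t *ppos,
				      const void *from, size_t available)
{
	off_t pos = *ppos;

	if (pos < 0)
		return -EINVAL;
	if ((size_t)pos >= available || count == 0)
		return 0;                        /* at or past the end: EOF */
	if (count > available - (size_t)pos)
		count = available - (size_t)pos; /* clamp to what is left */
	memcpy(to, (const char *)from + pos, count);
	*ppos = pos + (off_t)count;              /* advance the file position */
	return (ssize_t)count;
}
```

Successive calls therefore behave like successive read(2) calls on a small file: a short final read, then 0 at end of file.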
-
ssize_t
simple_write_to_buffer
(void * to, size_t available, loff_t * ppos, const void __user * from, size_t count)¶ copy data from user space to the buffer
Parameters
void * to
- the buffer to write to
size_t available
- the size of the buffer
loff_t * ppos
- the current position in the buffer
const void __user * from
- the user space buffer to read from
size_t count
- the maximum number of bytes to read
Description
The simple_write_to_buffer() function reads up to count bytes from the user space address starting at from into the buffer to at offset ppos.
On success, the number of bytes written is returned and the offset ppos is advanced by this number; on error, a negative value is returned.
-
ssize_t
memory_read_from_buffer
(void * to, size_t count, loff_t * ppos, const void * from, size_t available)¶ copy data from the buffer
Parameters
void * to
- the kernel space buffer to read to
size_t count
- the maximum number of bytes to read
loff_t * ppos
- the current position in the buffer
const void * from
- the buffer to read from
size_t available
- the size of the buffer
Description
The memory_read_from_buffer() function reads up to count bytes from the buffer from at offset ppos into the kernel space address starting at to.
On success, the number of bytes read is returned and the offset ppos is advanced by this number; on error, a negative value is returned.
-
struct dentry *
generic_fh_to_dentry
(struct super_block * sb, struct fid * fid, int fh_len, int fh_type, struct inode *(*get_inode) (struct super_block *sb, u64 ino, u32 gen))¶ generic helper for the fh_to_dentry export operation
Parameters
struct super_block * sb
- filesystem to do the file handle conversion on
struct fid * fid
- file handle to convert
int fh_len
- length of the file handle in bytes
int fh_type
- type of file handle
struct inode *(*) (struct super_block *sb, u64 ino, u32 gen) get_inode
- filesystem callback to retrieve inode
Description
This function decodes fid as long as it has one of the well-known Linux filehandle types and calls get_inode on it to retrieve the inode for the object specified in the file handle.
-
struct dentry *
generic_fh_to_parent
(struct super_block * sb, struct fid * fid, int fh_len, int fh_type, struct inode *(*get_inode) (struct super_block *sb, u64 ino, u32 gen))¶ generic helper for the fh_to_parent export operation
Parameters
struct super_block * sb
- filesystem to do the file handle conversion on
struct fid * fid
- file handle to convert
int fh_len
- length of the file handle in bytes
int fh_type
- type of file handle
struct inode *(*) (struct super_block *sb, u64 ino, u32 gen) get_inode
- filesystem callback to retrieve inode
Description
This function decodes fid as long as it has one of the well-known Linux filehandle types and calls get_inode on it to retrieve the inode for the _parent_ object specified in the file handle if it is specified in the file handle, or NULL otherwise.
-
int
__generic_file_fsync
(struct file * file, loff_t start, loff_t end, int datasync)¶ generic fsync implementation for simple filesystems
Parameters
struct file * file
- file to synchronize
loff_t start
- start offset in bytes
loff_t end
- end offset in bytes (inclusive)
int datasync
- only synchronize essential metadata if true
Description
This is a generic implementation of the fsync method for simple filesystems which track all non-inode metadata in the buffers list hanging off the address_space structure.
-
int
generic_file_fsync
(struct file * file, loff_t start, loff_t end, int datasync)¶ generic fsync implementation for simple filesystems with flush
Parameters
struct file * file
- file to synchronize
loff_t start
- start offset in bytes
loff_t end
- end offset in bytes (inclusive)
int datasync
- only synchronize essential metadata if true
-
int
generic_check_addressable
(unsigned blocksize_bits, u64 num_blocks)¶ Check addressability of file system
Parameters
unsigned blocksize_bits
- log of file system block size
u64 num_blocks
- number of blocks in file system
Description
Determine whether a file system with num_blocks blocks (and a block size of 2^blocksize_bits bytes) is addressable by the sector_t and page cache of the system. Return 0 if so and -EFBIG otherwise.
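As a worked example, the sketch below models the two limits under stated assumptions: 4 KiB pages, a 64-bit sector_t and a 32-bit page index (pgoff_t is an unsigned long, so 32 bits on a typical 32-bit build). Under those assumptions the page cache tops out at 2^32 pages × 4 KiB = 16 TiB. model_check_addressable() is hypothetical and mirrors our reading of the check; it assumes 9 <= blocksize_bits <= 12.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12           /* assumption: 4 KiB pages */
#define MODEL_SECTOR_MAX UINT64_MAX   /* assumption: 64-bit sector_t */
#define MODEL_PGOFF_MAX  UINT32_MAX   /* assumption: 32-bit pgoff_t */

static int model_check_addressable(unsigned blocksize_bits, uint64_t num_blocks)
{
	uint64_t last_block, last_page;

	assert(blocksize_bits >= 9 && blocksize_bits <= MODEL_PAGE_SHIFT);
	if (num_blocks == 0)
		return 0;
	last_block = num_blocks - 1;
	/* Blocks smaller than a page share pages: 2^(PAGE_SHIFT - bits) per page. */
	last_page = last_block >> (MODEL_PAGE_SHIFT - blocksize_bits);
	/* sector_t counts 512-byte sectors: 2^(bits - 9) sectors per block. */
	if (last_block > (MODEL_SECTOR_MAX >> (blocksize_bits - 9)) ||
	    last_page > MODEL_PGOFF_MAX)
		return -EFBIG;
	return 0;
}
```

With 4 KiB blocks, 2^32 blocks (exactly 16 TiB) still fit, but one more block needs page index 2^32 and is rejected; with 512-byte blocks the same limit is reached at 2^35 blocks.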
-
int
simple_nosetlease
(struct file * filp, long arg, struct file_lock ** flp, void ** priv)¶ generic helper for prohibiting leases
Parameters
struct file * filp
- file pointer
long arg
- type of lease to obtain
struct file_lock ** flp
- new lease supplied for insertion
void ** priv
- private data for lm_setup operation
Description
Generic helper for filesystems that do not wish to allow leases to be set. All arguments are ignored and it just returns -EINVAL.
-
const char *
simple_get_link
(struct dentry * dentry, struct inode * inode, struct delayed_call * done)¶ generic helper to get the target of “fast” symlinks
Parameters
struct dentry * dentry
- not used here
struct inode * inode
- the symlink inode
struct delayed_call * done
- not used here
Description
Generic helper for filesystems to use for symlink inodes where a pointer to the symlink target is stored in ->i_link. NOTE: this isn’t normally called, since as an optimization the path lookup code uses any non-NULL ->i_link directly, without calling ->get_link(). But ->get_link() still must be set, to mark the inode_operations as being for a symlink.
Return
the symlink target
-
int
posix_acl_update_mode
(struct inode * inode, umode_t * mode_p, struct posix_acl ** acl)¶ update mode in set_acl
Parameters
struct inode * inode
- target inode
umode_t * mode_p
- mode (pointer) for update
struct posix_acl ** acl
- acl pointer
Description
Update the file mode when setting an ACL: compute the new file permission bits based on the ACL. In addition, if the ACL is equivalent to the new file mode, set *acl to NULL to indicate that no ACL should be set.
As with chmod, clear the setgid bit if the caller is not in the owning group or capable of CAP_FSETID (see inode_change_ok).
Called from set_acl inode operations.
-
void
generic_fillattr
(struct inode * inode, struct kstat * stat)¶ Fill in the basic attributes from the inode struct
Parameters
struct inode * inode
- Inode to use as the source
struct kstat * stat
- Where to fill in the attributes
Description
Fill in the basic attributes in the kstat structure from data that’s to be found on the VFS inode structure. This is the default if no getattr inode operation is supplied.
-
int
vfs_getattr_nosec
(const struct path * path, struct kstat * stat, u32 request_mask, unsigned int query_flags)¶ getattr without security checks
Parameters
const struct path * path
- file to get attributes from
struct kstat * stat
- structure to return attributes in
u32 request_mask
- STATX_xxx flags indicating what the caller wants
unsigned int query_flags
- Query mode (KSTAT_QUERY_FLAGS)
Description
Get attributes without calling security_inode_getattr.
Currently the only caller other than vfs_getattr is internal to the filehandle lookup code, which uses only the inode number and returns no attributes to any user. Any other code probably wants vfs_getattr.
-
int
vfs_statx_fd
(unsigned int fd, struct kstat * stat, u32 request_mask, unsigned int query_flags)¶ Get the enhanced basic attributes by file descriptor
Parameters
unsigned int fd
- The file descriptor referring to the file of interest
struct kstat * stat
- The result structure to fill in.
u32 request_mask
- STATX_xxx flags indicating what the caller wants
unsigned int query_flags
- Query mode (KSTAT_QUERY_FLAGS)
Description
This function is a wrapper around vfs_getattr(). The main difference is that it uses a file descriptor to determine the file location.
0 will be returned on success, and a -ve error code if unsuccessful.
-
int
vfs_statx
(int dfd, const char __user * filename, int flags, struct kstat * stat, u32 request_mask)¶ Get basic and extra attributes by filename
Parameters
int dfd
- A file descriptor representing the base dir for a relative filename
const char __user * filename
- The name of the file of interest
int flags
- Flags to control the query
struct kstat * stat
- The result structure to fill in.
u32 request_mask
- STATX_xxx flags indicating what the caller wants
Description
This function is a wrapper around vfs_getattr(). The main difference is that it uses a filename and base directory to determine the file location. Additionally, the use of AT_SYMLINK_NOFOLLOW in flags will prevent a symlink at the given name from being referenced.
0 will be returned on success, and a -ve error code if unsuccessful.
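The effect of AT_SYMLINK_NOFOLLOW can be seen from userspace. The sketch uses fstatat(2), which on kernels of this vintage is, to our understanding, served by vfs_statx() just like statx(2); it assumes a writable /tmp, and check_nofollow() and the file names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates target and link -> target in a temp dir, then stats "link" both ways. */
static int check_nofollow(void)
{
	char dir[] = "/tmp/statx-demo-XXXXXX";
	struct stat followed, not_followed;
	int dfd, fd, ok = 0;

	if (!mkdtemp(dir))
		return -1;
	dfd = open(dir, O_RDONLY | O_DIRECTORY);   /* base dir for relative names */
	if (dfd < 0)
		goto out_rmdir;
	fd = openat(dfd, "target", O_CREAT | O_WRONLY, 0644);
	if (fd < 0)
		goto out_close;
	close(fd);
	if (symlinkat("target", dfd, "link") != 0)
		goto out_unlink_target;

	/* Without the flag the link is followed; with it, the link itself is reported. */
	ok = fstatat(dfd, "link", &followed, 0) == 0 &&
	     fstatat(dfd, "link", &not_followed, AT_SYMLINK_NOFOLLOW) == 0 &&
	     S_ISREG(followed.st_mode) && S_ISLNK(not_followed.st_mode);

	unlinkat(dfd, "link", 0);
out_unlink_target:
	unlinkat(dfd, "target", 0);
out_close:
	close(dfd);
out_rmdir:
	rmdir(dir);
	return ok ? 0 : -1;
}
```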
-
int
vfs_fsync_range
(struct file * file, loff_t start, loff_t end, int datasync)¶ helper to sync a range of data & metadata to disk
Parameters
struct file * file
- file to sync
loff_t start
- offset in bytes of the beginning of data range to sync
loff_t end
- offset in bytes of the end of data range (inclusive)
int datasync
- perform only datasync
Description
Write back data in range start..end and metadata for file to disk. If datasync is set, only the metadata needed to access modified file data is written.
-
int
vfs_fsync
(struct file * file, int datasync)¶ perform a fsync or fdatasync on a file
Parameters
struct file * file
- file to sync
int datasync
- only perform a fdatasync operation
Description
Write back data and metadata for file to disk. If datasync is set only metadata needed to access modified file data is written.
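From userspace, fsync(2) and fdatasync(2) are the usual entry points, corresponding to datasync = 0 and datasync = 1. The sketch below, with a hypothetical append_durably() helper, appends a record and then requests durability; with want_metadata == 0 it asks only for the metadata needed to read the data back (such as the new file size).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Appends rec to path, then syncs with fsync() or fdatasync(). */
static int append_durably(const char *path, const char *rec, int want_metadata)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
	size_t len = strlen(rec);
	int rc;

	if (fd < 0)
		return -1;
	if (write(fd, rec, len) != (ssize_t)len) {
		close(fd);
		return -1;
	}
	rc = want_metadata ? fsync(fd) : fdatasync(fd);
	if (close(fd) != 0)
		rc = -1;
	return rc;
}
```

fdatasync() can skip writing metadata such as timestamps, which is why it is often preferred for append-only logs.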
-
const char *
xattr_full_name
(const struct xattr_handler * handler, const char * name)¶ Compute full attribute name from suffix
Parameters
const struct xattr_handler * handler
- handler of the xattr_handler operation
const char * name
- name passed to the xattr_handler operation
Description
The get and set xattr handler operations are called with the remainder of the attribute name after skipping the handler’s prefix: for example, “foo” is passed to the get operation of a handler with prefix “user.” to get attribute “user.foo”. The full name is still “there” in the name though.
Note
the list xattr handler operation when called from the vfs is passed a NULL name; some file systems use this operation internally, with varying semantics.
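The trick behind xattr_full_name() is pointer arithmetic: the suffix handed to a handler still points into the full name, so stepping back by the prefix length recovers it. The sketch below is a hypothetical userspace model; struct model_xattr_handler carries only a prefix, and the result is valid only when name really is a suffix of a full attribute name passed down by the VFS.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, trimmed-down handler carrying only its prefix. */
struct model_xattr_handler {
	const char *prefix;
};

/* Step back over the prefix that the VFS skipped before calling the handler. */
static const char *model_xattr_full_name(const struct model_xattr_handler *h,
					  const char *name)
{
	return name - strlen(h->prefix);
}
```

For a handler with prefix “user.”, the handler sees “foo” and recovers “user.foo” without any copying.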
The proc filesystem¶
sysctl interface¶
-
int
proc_dostring
(struct ctl_table * table, int write, void __user * buffer, size_t * lenp, loff_t * ppos)¶ read a string sysctl
Parameters
struct ctl_table * table
- the sysctl table
int write
- TRUE if this is a write to the sysctl file
void __user * buffer
- the user buffer
size_t * lenp
- the size of the user buffer
loff_t * ppos
- file position
Description
Reads/writes a string from/to the user buffer. If the kernel buffer provided is not large enough to hold the string, the string is truncated. The copied string is NUL-terminated.
If the string is being read by the user process, it is copied and a newline ‘\n’ is added. It is truncated if the buffer is not large enough.
Returns 0 on success.
-
int
proc_dointvec
(struct ctl_table * table, int write, void __user * buffer, size_t * lenp, loff_t * ppos)¶ read a vector of integers
Parameters
struct ctl_table * table
- the sysctl table
int write
- TRUE if this is a write to the sysctl file
void __user * buffer
- the user buffer
size_t * lenp
- the size of the user buffer
loff_t * ppos
- file position
Description
Reads/writes up to table->maxlen/sizeof(unsigned int) integer values from/to the user buffer, treated as an ASCII string.
Returns 0 on success.
-
int
proc_douintvec
(struct ctl_table * table, int write, void __user * buffer, size_t * lenp, loff_t * ppos)¶ read a vector of unsigned integers
Parameters
struct ctl_table * table
- the sysctl table
int write
- TRUE if this is a write to the sysctl file
void __user * buffer
- the user buffer
size_t * lenp
- the size of the user buffer
loff_t * ppos
- file position
Description
Reads/writes up to table->maxlen/sizeof(unsigned int) unsigned integer values from/to the user buffer, treated as an ASCII string.
Returns 0 on success.
-
int
proc_dointvec_minmax
(struct ctl_table * table, int write, void __user * buffer, size_t * lenp, loff_t * ppos)¶ read a vector of integers with min/max values
Parameters
struct ctl_table * table
- the sysctl table
int write
- TRUE if this is a write to the sysctl file
void __user * buffer
- the user buffer
size_t * lenp
- the size of the user buffer
loff_t * ppos
- file position
Description
Reads/writes up to table->maxlen/sizeof(unsigned int) integer values from/to the user buffer, treated as an ASCII string.
This routine will ensure the values are within the range specified by table->extra1 (min) and table->extra2 (max).
Returns 0 on success or -EINVAL on write when the range check fails.
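The write side of the range check can be sketched in userspace, e.g. for a table entry whose extra1 and extra2 point at 0 and 10. model_dointvec_minmax_write() is hypothetical: it parses up to maxvals whitespace-separated integers and stores each one as it is parsed, treating a NULL min or max as “no bound”; details such as trailing garbage handling are not modelled.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Returns 0 on success, -EINVAL on a malformed or out-of-range value. */
static int model_dointvec_minmax_write(const char *text, int *vals, size_t maxvals,
				       const int *min, const int *max)
{
	size_t n = 0;

	while (n < maxvals) {
		char *end;
		long v;

		while (isspace((unsigned char)*text))
			text++;
		if (*text == '\0')
			break;
		errno = 0;
		v = strtol(text, &end, 10);
		if (end == text || errno == ERANGE || v < INT_MIN || v > INT_MAX)
			return -EINVAL;          /* not a valid int */
		if ((min && v < *min) || (max && v > *max))
			return -EINVAL;          /* range check fails */
		vals[n++] = (int)v;              /* stored as parsed */
		text = end;
	}
	return 0;
}
```

A rejected write leaves a single-value entry unchanged, which is what makes extra1/extra2 useful for guarding tunables.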
-
int
proc_douintvec_minmax
(struct ctl_table * table, int write, void __user * buffer, size_t * lenp, loff_t * ppos)¶ read a vector of unsigned ints with min/max values
Parameters
struct ctl_table * table
- the sysctl table
int write
- TRUE if this is a write to the sysctl file
void __user * buffer
- the user buffer
size_t * lenp
- the size of the user buffer
loff_t * ppos
- file position
Description
Reads/writes up to table->maxlen/sizeof(unsigned int) unsigned integer values from/to the user buffer, treated as an ASCII string. Negative strings are not allowed.
This routine will ensure the values are within the range specified by table->extra1 (min) and table->extra2 (max). There is a final sanity check for UINT_MAX to avoid having to support wrap around uses from userspace.
Returns 0 on success or -ERANGE on write when the range check fails.
-
int
proc_doulongvec_minmax
(struct ctl_table * table, int write, void __user * buffer, size_t * lenp, loff_t * ppos)¶ read a vector of long integers with min/max values
Parameters
struct ctl_table * table
- the sysctl table
int write
- TRUE if this is a write to the sysctl file
void __user * buffer
- the user buffer
size_t * lenp
- the size of the user buffer
loff_t * ppos
- file position
Description
Reads/writes up to table->maxlen/sizeof(unsigned long) unsigned long values from/to the user buffer, treated as an ASCII string.
This routine will ensure the values are within the range specified by table->extra1 (min) and table->extra2 (max).
Returns 0 on success.
-
int
proc_doulongvec_ms_jiffies_minmax
(struct ctl_table * table, int write, void __user * buffer, size_t * lenp, loff_t * ppos)¶ read a vector of millisecond values with min/max values
Parameters
struct ctl_table * table
- the sysctl table
int write
- TRUE if this is a write to the sysctl file
void __user * buffer
- the user buffer
size_t * lenp
- the size of the user buffer
loff_t * ppos
- file position
Description
Reads/writes up to table->maxlen/sizeof(unsigned long) unsigned long values from/to the user buffer, treated as an ASCII string. The values are treated as milliseconds, and converted to jiffies when they are stored.
This routine will ensure the values are within the range specified by table->extra1 (min) and table->extra2 (max).
Returns 0 on success.
-
int
proc_dointvec_jiffies
(struct ctl_table * table, int write, void __user * buffer, size_t * lenp, loff_t * ppos)
read a vector of integers as seconds
Parameters
struct ctl_table * table
- the sysctl table
int write
- TRUE if this is a write to the sysctl file
void __user * buffer
- the user buffer
size_t * lenp
- the size of the user buffer
loff_t * ppos
- file position
Description
Reads/writes up to table->maxlen/sizeof(unsigned int) integer values from/to the user buffer, treated as an ASCII string. The values read are assumed to be in seconds, and are converted into jiffies.
Returns 0 on success.
-
int
proc_dointvec_userhz_jiffies
(struct ctl_table * table, int write, void __user * buffer, size_t * lenp, loff_t * ppos)
read a vector of integers as 1/USER_HZ seconds
Parameters
struct ctl_table * table
- the sysctl table
int write
- TRUE if this is a write to the sysctl file
void __user * buffer
- the user buffer
size_t * lenp
- the size of the user buffer
loff_t * ppos
- pointer to the file position
Description
Reads/writes up to table->maxlen/sizeof(unsigned int) integer values from/to the user buffer, treated as an ASCII string. The values read are assumed to be in 1/USER_HZ seconds, and are converted into jiffies.
Returns 0 on success.
-
int
proc_dointvec_ms_jiffies
(struct ctl_table * table, int write, void __user * buffer, size_t * lenp, loff_t * ppos)
read a vector of integers as milliseconds
Parameters
struct ctl_table * table
- the sysctl table
int write
- TRUE if this is a write to the sysctl file
void __user * buffer
- the user buffer
size_t * lenp
- the size of the user buffer
loff_t * ppos
- the current position in the file
Description
Reads/writes up to table->maxlen/sizeof(unsigned int) integer values from/to the user buffer, treated as an ASCII string. The values read are assumed to be in 1/1000 seconds, and are converted into jiffies.
Returns 0 on success.
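The millisecond conversion above is lossy whenever a value is not a whole number of jiffies. A minimal userspace sketch, assuming a kernel built with HZ=250 (a hypothetical choice; HZ is a build-time option) and mirroring the round-up behaviour of msecs_to_jiffies() when HZ divides 1000, shows the write and read paths. The model_* names are hypothetical.

```c
#include <assert.h>

/* Assumption: a kernel configured with HZ=250. HZ is a build-time choice. */
#define MODEL_HZ           250u
#define MODEL_MSEC_PER_SEC 1000u

/*
 * Hypothetical model of the write path: milliseconds to jiffies, rounding
 * any partial jiffy up, as msecs_to_jiffies() does when HZ divides 1000.
 */
static unsigned long model_msecs_to_jiffies(unsigned int ms)
{
    const unsigned long ms_per_jiffy = MODEL_MSEC_PER_SEC / MODEL_HZ; /* 4 */

    return ((unsigned long)ms + ms_per_jiffy - 1) / ms_per_jiffy;
}

/* Hypothetical model of the read path: jiffies back to milliseconds. */
static unsigned int model_jiffies_to_msecs(unsigned long j)
{
    return (unsigned int)(j * (MODEL_MSEC_PER_SEC / MODEL_HZ));
}
```

Under these assumptions one jiffy is 4 ms, so writing 10 stores 3 jiffies and reading the file back reports 12.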
-
int
proc_do_large_bitmap
(struct ctl_table * table, int write, void __user * buffer, size_t * lenp, loff_t * ppos)
read/write from/to a large bitmap
Parameters
struct ctl_table * table
- the sysctl table
int write
- TRUE if this is a write to the sysctl file
void __user * buffer
- the user buffer
size_t * lenp
- the size of the user buffer
loff_t * ppos
- file position
Description
The bitmap is stored at table->data and the bitmap length (in bits) in table->maxlen.
The file uses a comma-separated range format (e.g. 1,3-4,10-10) so that large bitmaps can be represented compactly. Writing to the file clears the bitmap and then updates it with the given input.
Returns 0 on success.
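To make the range format concrete, here is a minimal userspace parser sketch. It is a hypothetical model, not the kernel's parser: it clears a bitmap of nbits bits, then sets every listed bit or inclusive a-b range, and rejects reversed or out-of-range entries with -EINVAL (an assumption of this sketch). Unlike a careful implementation, it may leave the bitmap partly filled when it hits an error.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

static void model_set_bit(unsigned long *map, unsigned long bit)
{
    map[bit / MODEL_BITS_PER_LONG] |= 1UL << (bit % MODEL_BITS_PER_LONG);
}

static int model_test_bit(const unsigned long *map, unsigned long bit)
{
    return (int)((map[bit / MODEL_BITS_PER_LONG] >>
                  (bit % MODEL_BITS_PER_LONG)) & 1UL);
}

/*
 * Hypothetical parser for the "1,3-4,10-10" format. As on a write to the
 * file, the bitmap is cleared first. Returns 0, or -EINVAL for malformed
 * input; on error map may be left partly filled.
 */
static int model_parse_ranges(const char *s, unsigned long *map,
                              unsigned long nbits)
{
    size_t words = (nbits + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG;

    memset(map, 0, words * sizeof(*map));
    while (*s != '\0' && *s != '\n') {
        char *end;
        unsigned long first, last, b;

        if (*s < '0' || *s > '9')
            return -EINVAL;
        first = last = strtoul(s, &end, 10);
        s = end;
        if (*s == '-') {                    /* "a-b" inclusive range */
            s++;
            if (*s < '0' || *s > '9')
                return -EINVAL;
            last = strtoul(s, &end, 10);
            s = end;
        }
        if (first > last || last >= nbits)  /* reversed or out of range */
            return -EINVAL;
        for (b = first; b <= last; b++)
            model_set_bit(map, b);
        if (*s == ',')
            s++;
        else if (*s != '\0' && *s != '\n')
            return -EINVAL;
    }
    return 0;
}
```

Parsing "1,3-4,10-10" sets bits 1, 3, 4 and 10 and leaves every other bit clear.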
proc filesystem interface
-
void
proc_flush_task
(struct task_struct * task)
Remove dcache entries for task from the /proc dcache.
Parameters
struct task_struct * task
- task that should be flushed.
Description
When flushing dentries from proc, they must be flushed from the global proc mount (proc_mnt) and from the proc mounts of every namespace this task was seen in. This call does all of that work.
Looks in the dcache for /proc/pid and /proc/tgid/task/pid; if either directory is present, flushes it and all of its children from the dcache.
It is safe and reasonable to cache /proc entries for a task until that task exits. After that they just clog up the dcache with useless entries, possibly causing useful dcache entries to be flushed instead. This routine is provided to flush those useless dcache entries at process exit time.
NOTE
- This routine is just an optimization, so it does not guarantee that no dcache entries will exist at process exit time; it just makes it very unlikely that any will persist.
Events based on file descriptors
-
__u64
eventfd_signal
(struct eventfd_ctx * ctx, __u64 n)
Adds n to the eventfd counter.
Parameters
struct eventfd_ctx * ctx
- [in] Pointer to the eventfd context.
__u64 n
- [in] Value of the counter to be added to the eventfd internal counter. The value cannot be negative.
Description
This function is intended to be called by the kernel in paths that do not allow sleeping. The counter is allowed to reach ULLONG_MAX, and this overflow condition is signalled by returning EPOLLERR from poll(2).
Returns the amount by which the counter was incremented. This will be less than n if the counter has overflowed.
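The saturating counter arithmetic described above can be sketched in userspace as follows. The struct and function names are hypothetical, and locking and wake-ups are omitted; only the clamp at ULLONG_MAX and the "amount actually added" return value are modelled.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the counter inside struct eventfd_ctx. */
struct model_eventfd_ctx {
    unsigned long long count;
};

/*
 * Model of the saturating add described above: the counter stops at
 * ULLONG_MAX instead of wrapping, and the return value is the amount
 * actually added, which is smaller than n once the counter saturates.
 */
static unsigned long long model_eventfd_signal(struct model_eventfd_ctx *ctx,
                                               unsigned long long n)
{
    if (ULLONG_MAX - ctx->count < n)
        n = ULLONG_MAX - ctx->count;
    ctx->count += n;
    return n;
}

/* A saturated counter is the condition poll(2) reports as EPOLLERR. */
static int model_eventfd_overflowed(const struct model_eventfd_ctx *ctx)
{
    return ctx->count == ULLONG_MAX;
}
```

Starting two below ULLONG_MAX, a request to add 5 adds only 2 and leaves the counter in the overflow state.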
-
void
eventfd_ctx_put
(struct eventfd_ctx * ctx)
Releases a reference to the internal eventfd context.
Parameters
struct eventfd_ctx * ctx
- [in] Pointer to eventfd context.
Description
The eventfd context reference must have been previously acquired either with eventfd_ctx_fdget() or eventfd_ctx_fileget().
-
int
eventfd_ctx_remove_wait_queue
(struct eventfd_ctx * ctx, wait_queue_entry_t * wait, __u64 * cnt)
Reads the current counter and removes the wait queue entry.
Parameters
struct eventfd_ctx * ctx
- [in] Pointer to eventfd context.
wait_queue_entry_t * wait
- [in] Wait queue to be removed.
__u64 * cnt
- [out] Pointer to the 64-bit counter value.
Description
Returns 0 if successful, or the following error code:
-EAGAIN: The operation would have blocked.
This is used to atomically remove a wait queue entry from the eventfd wait queue head, and read/reset the counter value.
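The read/reset half of this operation can be modelled in userspace as a minimal sketch, assuming the context was not created with EFD_SEMAPHORE. The names are hypothetical, and the wait queue removal and locking are not modelled.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the counter inside struct eventfd_ctx. */
struct model_eventfd_ctx {
    unsigned long long count;
};

/*
 * Model of the read/reset step only, assuming no EFD_SEMAPHORE: the whole
 * counter is returned through cnt and reset to zero, and -EAGAIN means
 * there was nothing to read.
 */
static int model_read_and_reset(struct model_eventfd_ctx *ctx,
                                unsigned long long *cnt)
{
    *cnt = ctx->count;
    ctx->count = 0;
    return *cnt != 0 ? 0 : -EAGAIN;
}
```

A second call right after a successful one finds the counter at zero and reports -EAGAIN.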
-
struct file *
eventfd_fget
(int fd)
Acquire a reference to an eventfd file.
Parameters
int fd
- [in] Eventfd file descriptor.
Description
Returns a pointer to the eventfd file structure in case of success, or one of the following error pointers:
-EBADF: Invalid fd file descriptor.
-EINVAL: The fd file descriptor is not an eventfd file.
-
struct eventfd_ctx *
eventfd_ctx_fdget
(int fd)
Acquires a reference to the internal eventfd context.
Parameters
int fd
- [in] Eventfd file descriptor.
Description
Returns a pointer to the internal eventfd context, otherwise the error pointers returned by eventfd_fget().
-
struct eventfd_ctx *
eventfd_ctx_fileget
(struct file * file)
Acquires a reference to the internal eventfd context.
Parameters
struct file * file
- [in] Eventfd file pointer.
Description
Returns a pointer to the internal eventfd context, otherwise the error pointer:
-EINVAL: The file is not an eventfd file.
The Filesystem for Exporting Kernel Objects
-
int
sysfs_create_file_ns
(struct kobject * kobj, const struct attribute * attr, const void * ns)
create an attribute file for an object with custom ns
Parameters
struct kobject * kobj
- object we’re creating for
const struct attribute * attr
- attribute descriptor
const void * ns
- namespace the new file should belong to
-
int
sysfs_add_file_to_group
(struct kobject * kobj, const struct attribute * attr, const char * group)
add an attribute file to a pre-existing group.
Parameters
struct kobject * kobj
- object we’re acting for.
const struct attribute * attr
- attribute descriptor.
const char * group
- group name.
-
int
sysfs_chmod_file
(struct kobject * kobj, const struct attribute * attr, umode_t mode)
update the modified mode value on an object attribute.
Parameters
struct kobject * kobj
- object we’re acting for.
const struct attribute * attr
- attribute descriptor.
umode_t mode
- file permissions.
-
struct kernfs_node *
sysfs_break_active_protection
(struct kobject * kobj, const struct attribute * attr)
break “active” protection
Parameters
struct kobject * kobj
- The kernel object attr is associated with.
const struct attribute * attr
- The attribute to break the “active” protection for.
Description
With sysfs, just like kernfs, deletion of an attribute is postponed until all active .show() and .store() callbacks have finished, unless this function is called. Hence this function is useful in methods that implement self-deletion.
-
void
sysfs_unbreak_active_protection
(struct kernfs_node * kn)
restore “active” protection
Parameters
struct kernfs_node * kn
- Pointer returned by sysfs_break_active_protection().
Description
Undo the effects of sysfs_break_active_protection(). Since this function calls kernfs_put() on the kernfs node that corresponds to the ‘attr’ argument passed to sysfs_break_active_protection(), and that attribute may have been removed between the sysfs_break_active_protection() and sysfs_unbreak_active_protection() calls, it is not safe to access kn after this function has returned.
-
void
sysfs_remove_file_ns
(struct kobject * kobj, const struct attribute * attr, const void * ns)
remove an object attribute with a custom ns tag
Parameters
struct kobject * kobj
- object we’re acting for
const struct attribute * attr
- attribute descriptor
const void * ns
- namespace tag of the file to remove
Description
Hashes the attribute name and namespace tag to find the file, then removes it.
-
void
sysfs_remove_file_from_group
(struct kobject * kobj, const struct attribute * attr, const char * group)
remove an attribute file from a group.
Parameters
struct kobject * kobj
- object we’re acting for.
const struct attribute * attr
- attribute descriptor.
const char * group
- group name.
-
int
sysfs_create_bin_file
(struct kobject * kobj, const struct bin_attribute * attr)
create binary file for object.
Parameters
struct kobject * kobj
- object.
const struct bin_attribute * attr
- attribute descriptor.
-
void
sysfs_remove_bin_file
(struct kobject * kobj, const struct bin_attribute * attr)
remove binary file for object.
Parameters
struct kobject * kobj
- object.
const struct bin_attribute * attr
- attribute descriptor.
-
int
sysfs_create_link
(struct kobject * kobj, struct kobject * target, const char * name)
create symlink between two objects.
Parameters
struct kobject * kobj
- object whose directory we’re creating the link in.
struct kobject * target
- object we’re pointing to.
const char * name
- name of the symlink.
-
int
sysfs_create_link_nowarn
(struct kobject * kobj, struct kobject * target, const char * name)
create symlink between two objects.
Parameters
struct kobject * kobj
- object whose directory we’re creating the link in.
struct kobject * target
- object we’re pointing to.
const char * name
- name of the symlink.
Description
This function does the same as sysfs_create_link(), but it doesn’t warn if the link already exists.
-
void
sysfs_remove_link
(struct kobject * kobj, const char * name)
remove symlink in object’s directory.
Parameters
struct kobject * kobj
- object we’re acting for.
const char * name
- name of the symlink to remove.
-
int
sysfs_rename_link_ns
(struct kobject * kobj, struct kobject * targ, const char * old, const char * new, const void * new_ns)
rename symlink in object’s directory.
Parameters
struct kobject * kobj
- object we’re acting for.
struct kobject * targ
- object we’re pointing to.
const char * old
- previous name of the symlink.
const char * new
- new name of the symlink.
const void * new_ns
- new namespace of the symlink.
Description
A helper function for the common rename symlink idiom.
The debugfs filesystem
debugfs interface
-
struct dentry *
debugfs_lookup
(const char * name, struct dentry * parent)
look up an existing debugfs file
Parameters
const char * name
- a pointer to a string containing the name of the file to look up.
struct dentry * parent
- a pointer to the parent dentry of the file.
Description
This function will return a pointer to a dentry if it succeeds. If the file doesn’t exist or an error occurs, NULL will be returned. The returned dentry must be passed to dput() when it is no longer needed.
If debugfs is not enabled in the kernel, the value -ENODEV will be returned.
-
struct dentry *
debugfs_create_file
(const char * name, umode_t mode, struct dentry * parent, void * data, const struct file_operations * fops)
create a file in the debugfs filesystem
Parameters
const char * name
- a pointer to a string containing the name of the file to create.
umode_t mode
- the permission that the file should have.
struct dentry * parent
- a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.
void * data
- a pointer to something that the caller will want to get to later on. The inode.i_private pointer will point to this value on the open() call.
const struct file_operations * fops
- a pointer to a struct file_operations that should be used for this file.
Description
This is the basic “create a file” function for debugfs. It allows for a wide range of flexibility in creating a file, or a directory (if you want to create a directory, the debugfs_create_dir() function is recommended instead).
This function will return a pointer to a dentry if it succeeds. This pointer must be passed to the debugfs_remove() function when the file is to be removed (no automatic cleanup happens if your module is unloaded; you are responsible here). If an error occurs, ERR_PTR(-ERROR) will be returned.
If debugfs is not enabled in the kernel, the value -ENODEV will be returned.
-
struct dentry *
debugfs_create_file_unsafe
(const char * name, umode_t mode, struct dentry * parent, void * data, const struct file_operations * fops)
create a file in the debugfs filesystem
Parameters
const char * name
- a pointer to a string containing the name of the file to create.
umode_t mode
- the permission that the file should have.
struct dentry * parent
- a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.
void * data
- a pointer to something that the caller will want to get to later on. The inode.i_private pointer will point to this value on the open() call.
const struct file_operations * fops
- a pointer to a struct file_operations that should be used for this file.
Description
debugfs_create_file_unsafe() is completely analogous to debugfs_create_file(), the only difference being that the fops handed to it will not get protected against file removals by the debugfs core.
It is your responsibility to protect your struct file_operations methods against file removals by means of debugfs_file_get() and debugfs_file_put(). ->open() is still protected by debugfs though.
Any struct file_operations defined by means of DEFINE_DEBUGFS_ATTRIBUTE() is protected against file removals and thus may be used here.
-
struct dentry *
debugfs_create_file_size
(const char * name, umode_t mode, struct dentry * parent, void * data, const struct file_operations * fops, loff_t file_size)
create a file in the debugfs filesystem
Parameters
const char * name
- a pointer to a string containing the name of the file to create.
umode_t mode
- the permission that the file should have.
struct dentry * parent
- a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.
void * data
- a pointer to something that the caller will want to get to later on. The inode.i_private pointer will point to this value on the open() call.
const struct file_operations * fops
- a pointer to a struct file_operations that should be used for this file.
loff_t file_size
- initial file size
Description
This is the basic “create a file” function for debugfs. It allows for a wide range of flexibility in creating a file, or a directory (if you want to create a directory, the debugfs_create_dir() function is recommended instead).
This function will return a pointer to a dentry if it succeeds. This pointer must be passed to the debugfs_remove() function when the file is to be removed (no automatic cleanup happens if your module is unloaded; you are responsible here). If an error occurs, ERR_PTR(-ERROR) will be returned.
If debugfs is not enabled in the kernel, the value -ENODEV will be returned.
-
struct dentry *
debugfs_create_dir
(const char * name, struct dentry * parent)
create a directory in the debugfs filesystem
Parameters
const char * name
- a pointer to a string containing the name of the directory to create.
struct dentry * parent
- a pointer to the parent dentry for this directory. This should be a directory dentry if set. If this parameter is NULL, then the directory will be created in the root of the debugfs filesystem.
Description
This function creates a directory in debugfs with the given name.
This function will return a pointer to a dentry if it succeeds. This pointer must be passed to the debugfs_remove() function when the directory is to be removed (no automatic cleanup happens if your module is unloaded; you are responsible here). If an error occurs, ERR_PTR(-ERROR) will be returned.
If debugfs is not enabled in the kernel, the value -ENODEV will be returned.
-
struct dentry *
debugfs_create_automount
(const char * name, struct dentry * parent, debugfs_automount_t f, void * data)
create automount point in the debugfs filesystem
Parameters
const char * name
- a pointer to a string containing the name of the file to create.
struct dentry * parent
- a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.
debugfs_automount_t f
- function to be called when pathname resolution reaches this automount point.
void * data
- opaque argument to pass to f().
Description
f should return what ->d_automount() would.
-
struct dentry *
debugfs_create_symlink
(const char * name, struct dentry * parent, const char * target)
create a symbolic link in the debugfs filesystem
Parameters
const char * name
- a pointer to a string containing the name of the symbolic link to create.
struct dentry * parent
- a pointer to the parent dentry for this symbolic link. This should be a directory dentry if set. If this parameter is NULL, then the symbolic link will be created in the root of the debugfs filesystem.
const char * target
- a pointer to a string containing the path to the target of the symbolic link.
Description
This function creates a symbolic link with the given name in debugfs that links to the given target path.
This function will return a pointer to a dentry if it succeeds. This pointer must be passed to the debugfs_remove() function when the symbolic link is to be removed (no automatic cleanup happens if your module is unloaded; you are responsible here). If an error occurs, ERR_PTR(-ERROR) will be returned.
If debugfs is not enabled in the kernel, the value -ENODEV will be returned.
-
void
debugfs_remove
(struct dentry * dentry)
removes a file or directory from the debugfs filesystem
Parameters
struct dentry * dentry
- a pointer to the dentry of the file or directory to be removed. If this parameter is NULL or an error value, nothing will be done.
Description
This function removes a file or directory in debugfs that was previously created with a call to another debugfs function (like debugfs_create_file() or variants thereof).
This function must be called for the file to be removed; no automatic cleanup of files happens when a module is removed, so you are responsible here.
-
void
debugfs_remove_recursive
(struct dentry * dentry)
recursively removes a directory
Parameters
struct dentry * dentry
- a pointer to the dentry of the directory to be removed. If this parameter is NULL or an error value, nothing will be done.
Description
This function recursively removes a directory tree in debugfs that was previously created with a call to another debugfs function (like debugfs_create_file() or variants thereof).
This function must be called for the files to be removed; no automatic cleanup of files happens when a module is removed, so you are responsible here.
-
struct dentry *
debugfs_rename
(struct dentry * old_dir, struct dentry * old_dentry, struct dentry * new_dir, const char * new_name)
rename a file/directory in the debugfs filesystem
Parameters
struct dentry * old_dir
- a pointer to the parent dentry for the renamed object. This should be a directory dentry.
struct dentry * old_dentry
- dentry of an object to be renamed.
struct dentry * new_dir
- a pointer to the parent dentry where the object should be moved. This should be a directory dentry.
const char * new_name
- a pointer to a string containing the target name.
Description
This function renames a file/directory in debugfs. The target must not exist for rename to succeed.
This function will return a pointer to old_dentry (which is updated to reflect renaming) if it succeeds. If an error occurs, NULL will be returned.
If debugfs is not enabled in the kernel, the value -ENODEV will be returned.
-
bool
debugfs_initialized
(void)
Tells whether debugfs has been registered
Parameters
void
- no arguments
-
int
debugfs_file_get
(struct dentry * dentry)
mark the beginning of file data access
Parameters
struct dentry * dentry
- the dentry object whose data is being accessed.
Description
Up to a matching call to debugfs_file_put(), any subsequent call into the file-removing functions debugfs_remove() and debugfs_remove_recursive() will block. Since associated private file data may only get freed after a successful return of any of the removal functions, you may safely access it after a successful call to debugfs_file_get() without worrying about lifetime issues.
If -EIO is returned, the file has already been removed and thus it is not safe to access any of its data. If, on the other hand, it is allowed to access the file data, zero is returned.
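The get/put protocol above can be illustrated with a single-threaded userspace model. Everything here is hypothetical and much simpler than the kernel: the real debugfs_remove() blocks until all users have called put, whereas model_remove() only reports how many users it would wait for. The model_read() function shows the access pattern a file method is expected to follow.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical single-threaded model of one debugfs file. */
struct model_debugfs_file {
    int removed;        /* set once removal has started */
    int active_users;   /* callers between get and put */
    int private_data;   /* data that removal would free */
};

static int model_file_get(struct model_debugfs_file *f)
{
    if (f->removed)
        return -EIO;            /* already removed: do not touch the data */
    f->active_users++;
    return 0;
}

static void model_file_put(struct model_debugfs_file *f)
{
    assert(f->active_users > 0);
    f->active_users--;
}

/*
 * Starts removal. The real debugfs_remove() blocks until every user has
 * called put; this model just reports how many users it would wait for.
 */
static int model_remove(struct model_debugfs_file *f)
{
    f->removed = 1;
    return f->active_users;
}

/* The access pattern the description above implies for a file method. */
static int model_read(struct model_debugfs_file *f, int *out)
{
    if (model_file_get(f))
        return -EIO;
    *out = f->private_data;     /* safe: cannot be freed before put */
    model_file_put(f);
    return 0;
}
```

Once removal has started, new accesses fail with -EIO, while an access that was already between get and put can still finish safely.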
-
void
debugfs_file_put
(struct dentry * dentry)
mark the end of file data access
Parameters
struct dentry * dentry
- the dentry object formerly passed to debugfs_file_get().
Description
Allow any ongoing concurrent call into debugfs_remove() or debugfs_remove_recursive() blocked by a former call to debugfs_file_get() to proceed and return to its caller.
-
void
debugfs_create_u8
(const char * name, umode_t mode, struct dentry * parent, u8 * value)
create a debugfs file that is used to read and write an unsigned 8-bit value
Parameters
const char * name
- a pointer to a string containing the name of the file to create.
umode_t mode
- the permission that the file should have
struct dentry * parent
- a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.
u8 * value
- a pointer to the variable that the file should read to and write from.
Description
This function creates a file in debugfs with the given name that contains the value of the variable value. If the mode variable is so set, it can be read from, and written to.
-
void
debugfs_create_u16
(const char * name, umode_t mode, struct dentry * parent, u16 * value)
create a debugfs file that is used to read and write an unsigned 16-bit value
Parameters
const char * name
- a pointer to a string containing the name of the file to create.
umode_t mode
- the permission that the file should have
struct dentry * parent
- a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.
u16 * value
- a pointer to the variable that the file should read to and write from.
Description
This function creates a file in debugfs with the given name that contains the value of the variable value. If the mode variable is so set, it can be read from, and written to.
-
struct dentry *
debugfs_create_u32
(const char * name, umode_t mode, struct dentry * parent, u32 * value)
create a debugfs file that is used to read and write an unsigned 32-bit value
Parameters
const char * name
- a pointer to a string containing the name of the file to create.
umode_t mode
- the permission that the file should have
struct dentry * parent
- a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.
u32 * value
- a pointer to the variable that the file should read to and write from.
Description
This function creates a file in debugfs with the given name that contains the value of the variable value. If the mode variable is so set, it can be read from, and written to.
This function will return a pointer to a dentry if it succeeds. This pointer must be passed to the debugfs_remove() function when the file is to be removed (no automatic cleanup happens if your module is unloaded; you are responsible here). If an error occurs, ERR_PTR(-ERROR) will be returned.
If debugfs is not enabled in the kernel, the value ERR_PTR(-ENODEV) will be returned.
-
void
debugfs_create_u64
(const char * name, umode_t mode, struct dentry * parent, u64 * value)
create a debugfs file that is used to read and write an unsigned 64-bit value
Parameters
const char * name
- a pointer to a string containing the name of the file to create.
umode_t mode
- the permission that the file should have
struct dentry * parent
- a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.
u64 * value
- a pointer to the variable that the file should read to and write from.
Description
This function creates a file in debugfs with the given name that contains the value of the variable value. If the mode variable is so set, it can be read from, and written to.
-
struct dentry *
debugfs_create_ulong
(const char * name, umode_t mode, struct dentry * parent, unsigned long * value)
create a debugfs file that is used to read and write an unsigned long value.
Parameters
const char * name
- a pointer to a string containing the name of the file to create.
umode_t mode
- the permission that the file should have
struct dentry * parent
- a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.
unsigned long * value
- a pointer to the variable that the file should read to and write from.
Description
This function creates a file in debugfs with the given name that contains the value of the variable value. If the mode variable is so set, it can be read from, and written to.
This function will return a pointer to a dentry if it succeeds. This pointer must be passed to the debugfs_remove() function when the file is to be removed (no automatic cleanup happens if your module is unloaded; you are responsible here). If an error occurs, ERR_PTR(-ERROR) will be returned.
If debugfs is not enabled in the kernel, the value ERR_PTR(-ENODEV) will be returned.
-
void
debugfs_create_x8
(const char * name, umode_t mode, struct dentry * parent, u8 * value)
create a debugfs file that is used to read and write an unsigned 8-bit value
Parameters
const char * name
- a pointer to a string containing the name of the file to create.
umode_t mode
- the permission that the file should have
struct dentry * parent
- a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.
u8 * value
- a pointer to the variable that the file should read to and write from.
-
void
debugfs_create_x16
(const char * name, umode_t mode, struct dentry * parent, u16 * value)
create a debugfs file that is used to read and write an unsigned 16-bit value
Parameters
const char * name
- a pointer to a string containing the name of the file to create.
umode_t mode
- the permission that the file should have
struct dentry * parent
- a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.
u16 * value
- a pointer to the variable that the file should read to and write from.
-
void
debugfs_create_x32
(const char * name, umode_t mode, struct dentry * parent, u32 * value)
create a debugfs file that is used to read and write an unsigned 32-bit value
Parameters
const char * name
- a pointer to a string containing the name of the file to create.
umode_t mode
- the permission that the file should have
struct dentry * parent
- a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.
u32 * value
- a pointer to the variable that the file should read to and write from.
-
void
debugfs_create_x64
(const char * name, umode_t mode, struct dentry * parent, u64 * value)
create a debugfs file that is used to read and write an unsigned 64-bit value
Parameters
const char * name
- a pointer to a string containing the name of the file to create.
umode_t mode
- the permission that the file should have
struct dentry * parent
- a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.
u64 * value
- a pointer to the variable that the file should read to and write from.
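The x8/x16/x32/x64 helpers differ from the u8/u16/u32/u64 ones in how the value is displayed: in hexadecimal rather than decimal. Assuming the zero-padded "0x%0Nllx" format strings used for these files in fs/debugfs/file.c in kernels of this era (check the source for your version), the text returned on read can be sketched as below. The function name model_format_hex is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch of the text an x8/x16/x32/x64 file returns on read:
 * a 0x-prefixed hex value zero-padded to the width of the type, then a
 * newline. Returns the snprintf() length, or -1 for an unsupported width.
 */
static int model_format_hex(char *buf, size_t len, unsigned long long val,
                            int bits)
{
    switch (bits) {
    case 8:  return snprintf(buf, len, "0x%02llx\n", val);
    case 16: return snprintf(buf, len, "0x%04llx\n", val);
    case 32: return snprintf(buf, len, "0x%08llx\n", val);
    case 64: return snprintf(buf, len, "0x%016llx\n", val);
    default: return -1;
    }
}
```

Under this assumption an x8 file holding 5 reads back as "0x05", and an x32 file holding 1 as "0x00000001".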
-
void
debugfs_create_size_t
(const char * name, umode_t mode, struct dentry * parent, size_t * value)
create a debugfs file that is used to read and write a size_t value
Parameters
const char * name
- a pointer to a string containing the name of the file to create.
umode_t mode
- the permission that the file should have
struct dentry * parent
- a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.
size_t * value
- a pointer to the variable that the file should read to and write from.
-
void
debugfs_create_atomic_t
(const char * name, umode_t mode, struct dentry * parent, atomic_t * value)
create a debugfs file that is used to read and write an atomic_t value
Parameters
const char * name
- a pointer to a string containing the name of the file to create.
umode_t mode
- the permission that the file should have
struct dentry * parent
- a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.
atomic_t * value
- a pointer to the variable that the file should read to and write from.
-
struct dentry *
debugfs_create_bool
(const char * name, umode_t mode, struct dentry * parent, bool * value)
create a debugfs file that is used to read and write a boolean value
Parameters
const char * name
- a pointer to a string containing the name of the file to create.
umode_t mode
- the permission that the file should have
struct dentry * parent
- a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.
bool * value
- a pointer to the variable that the file should read to and write from.
Description
This function creates a file in debugfs with the given name that contains the value of the variable value. If the mode variable is so set, it can be read from, and written to.
This function will return a pointer to a dentry if it succeeds. This pointer must be passed to the debugfs_remove() function when the file is to be removed (no automatic cleanup happens if your module is unloaded; you are responsible here). If an error occurs, ERR_PTR(-ERROR) will be returned.
If debugfs is not enabled in the kernel, the value ERR_PTR(-ENODEV) will be returned.
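As a minimal userspace sketch of what the file's contents look like: reading is assumed to return a single 'Y' or 'N' followed by a newline, and writes are assumed to be parsed like the kernel's kstrtobool(), which looks only at the leading characters. Both behaviours are assumptions to verify against your kernel version, and the model_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Read side (assumed): one character, 'Y' or 'N', plus a newline. */
static void model_bool_show(bool v, char buf[3])
{
    buf[0] = v ? 'Y' : 'N';
    buf[1] = '\n';
    buf[2] = '\0';
}

/* Write side, modelled on kstrtobool(): only leading characters matter. */
static int model_bool_parse(const char *s, bool *res)
{
    switch (s[0]) {
    case 'y': case 'Y': case '1':
        *res = true;
        return 0;
    case 'n': case 'N': case '0':
        *res = false;
        return 0;
    case 'o': case 'O':                 /* "on" / "off" */
        if (s[1] == 'n' || s[1] == 'N') {
            *res = true;
            return 0;
        }
        if (s[1] == 'f' || s[1] == 'F') {
            *res = false;
            return 0;
        }
        break;
    default:
        break;
    }
    return -EINVAL;
}
```

Under these assumptions, writing "on" or "1\n" sets the variable, writing "0" clears it, and an unrecognised string such as "x" is rejected.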
-
struct dentry *
debugfs_create_blob
(const char * name, umode_t mode, struct dentry * parent, struct debugfs_blob_wrapper * blob)¶ create a debugfs file that is used to read a binary blob
Parameters
const char * name
- a pointer to a string containing the name of the file to create.
umode_t mode
- the permission that the file should have
struct dentry * parent
- a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.
struct debugfs_blob_wrapper * blob
- a pointer to a struct debugfs_blob_wrapper which contains a pointer to the blob data and the size of the data.
Description
This function creates a file in debugfs with the given name that exports blob->data as a binary blob. If mode permits, the file can be read. Writing is not supported.
This function will return a pointer to a dentry if it succeeds. This pointer must be passed to the debugfs_remove() function when the file is to be removed (no automatic cleanup happens if your module is unloaded, you are responsible here.) If an error occurs, ERR_PTR(-ERROR) will be returned.
If debugfs is not enabled in the kernel, the value ERR_PTR(-ENODEV) will be returned.
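A blob file returns the raw bytes of blob->data, and successive reads continue from the current file offset. The userspace sketch below models that behaviour, assuming the read path follows the usual simple_read_from_buffer() semantics (copy at most the remaining bytes, advance the offset, return 0 at the end). struct model_blob and model_blob_read are hypothetical stand-ins for struct debugfs_blob_wrapper and the kernel's read handler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for struct debugfs_blob_wrapper: a pointer to the data and
 * its size in bytes (a model, not the kernel header). */
struct model_blob {
    const void *data;
    size_t size;
};

/* Model of one read() on the blob file: copy at most count bytes
 * starting at *pos, advance *pos, and return the number of bytes
 * copied (0 once the end of the blob has been reached). */
size_t model_blob_read(const struct model_blob *blob, char *out,
                       size_t count, size_t *pos)
{
    if (*pos >= blob->size || count == 0)
        return 0;
    if (count > blob->size - *pos)
        count = blob->size - *pos;   /* clamp to the remaining bytes */
    memcpy(out, (const char *)blob->data + *pos, count);
    *pos += count;
    return count;
}
```

Since the blob is exported as-is, the data and size pointed to by the wrapper must stay valid for as long as the file exists.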
-
void
debugfs_create_u32_array
(const char * name, umode_t mode, struct dentry * parent, u32 * array, u32 elements)
create a debugfs file that is used to read a u32 array.
Parameters
const char * name
- a pointer to a string containing the name of the file to create.
umode_t mode
- the permission that the file should have.
struct dentry * parent
- a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.
u32 * array
- u32 array that provides data.
u32 elements
- total number of elements in the array.
Description
This function creates a file in debugfs with the given name that exports array as data. If mode permits, the file can be read. Writing is not supported, and neither is seeking within the file. Once the file is created, the size of the array cannot be changed.
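The array is exported as text rather than as raw binary. The userspace sketch below assumes the format used by the kernel's u32 array file: each element in decimal, separated by single spaces, with a newline after the last element. model_u32_array_format is a hypothetical name used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the text a u32 array file returns: elements in
 * decimal, space-separated, newline-terminated. Returns the number of
 * characters written (excluding the terminating NUL). */
size_t model_u32_array_format(char *buf, size_t bufsize,
                              const uint32_t *array, uint32_t elements)
{
    size_t len = 0;

    for (uint32_t i = 0; i < elements; i++) {
        char term = (i + 1 < elements) ? ' ' : '\n';
        int n = snprintf(buf + len, bufsize - len, "%u%c",
                         (unsigned)array[i], term);
        if (n < 0 || (size_t)n >= bufsize - len)
            break;             /* out of space: the truncated entry is not counted */
        len += (size_t)n;
    }
    return len;
}
```

For example, a three-element array {1, 20, 300} would read back as the single line "1 20 300".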
-
void
debugfs_print_regs32
(struct seq_file * s, const struct debugfs_reg32 * regs, int nregs, void __iomem * base, char * prefix)
use seq_printf() to describe a set of registers
Parameters
struct seq_file * s
- the seq_file structure being used to generate output
const struct debugfs_reg32 * regs
- an array of struct debugfs_reg32 structures
int nregs
- the length of the above array
void __iomem * base
- the base address to be used in reading the registers
char * prefix
- a string to be prefixed to every output line
Description
This function outputs a text block describing the current values of some 32-bit hardware registers. It is meant to be used within seq_file-based debugfs files that need to show registers intermixed with other information. The prefix argument may be used to specify a leading string, because some peripherals have several blocks of identical registers, for example, the configuration of DMA channels.
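To show how the prefix distinguishes identical register blocks, here is a userspace model of the output. It assumes each line has the form "<prefix><name> = 0x<value>" with the value printed as eight hex digits, and it reads values from a plain array indexed by offset / 4 instead of calling readl() on an __iomem base. struct model_reg32 and model_print_regs32 are hypothetical stand-ins for struct debugfs_reg32 and the kernel function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for struct debugfs_reg32: register name and byte offset
 * from the start of the register block. */
struct model_reg32 {
    const char *name;
    unsigned long offset;
};

/* Model of the per-register output: optional prefix, the name, then
 * the 32-bit value as 0x followed by eight hex digits. The "register
 * bank" is an ordinary array, so offset / 4 selects a 32-bit word. */
size_t model_print_regs32(char *buf, size_t bufsize,
                          const struct model_reg32 *regs, int nregs,
                          const uint32_t *base, const char *prefix)
{
    size_t len = 0;

    for (int i = 0; i < nregs; i++) {
        int n = snprintf(buf + len, bufsize - len, "%s%s = 0x%08x\n",
                         prefix ? prefix : "", regs[i].name,
                         (unsigned)base[regs[i].offset / 4]);
        if (n < 0 || (size_t)n >= bufsize - len)
            break;             /* stop when the buffer is full */
        len += (size_t)n;
    }
    return len;
}
```

Calling the model once per DMA channel with prefixes such as "ch0: " and "ch1: " keeps otherwise identical register names apart in the output.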
-
struct dentry *
debugfs_create_regset32
(const char * name, umode_t mode, struct dentry * parent, struct debugfs_regset32 * regset)
create a debugfs file that returns register values
Parameters
const char * name
- a pointer to a string containing the name of the file to create.
umode_t mode
- the permission that the file should have
struct dentry * parent
- a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.
struct debugfs_regset32 * regset
- a pointer to a struct debugfs_regset32, which contains a pointer to an array of register definitions, the array size and the base address where the register bank is to be found.
Description
This function creates a file in debugfs with the given name that reports the names and values of a set of 32-bit registers. If mode permits, the file can be read. Writing is not supported.
This function will return a pointer to a dentry if it succeeds. This pointer must be passed to the debugfs_remove() function when the file is to be removed (no automatic cleanup happens if your module is unloaded, you are responsible here.) If an error occurs, ERR_PTR(-ERROR) will be returned.
If debugfs is not enabled in the kernel, the value ERR_PTR(-ENODEV) will be returned.
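Several of the creation functions in this section report failure through ERR_PTR() rather than by returning NULL, so a plain NULL check does not detect errors. The sketch below re-creates, in userspace, the error-pointer convention of the kernel's <linux/err.h> (the top MAX_ERRNO addresses are never valid pointers, so a negative errno can travel in a pointer). The helpers are redefined here only so the example compiles outside the kernel, and model_create_file is a hypothetical stand-in for a debugfs_create_*() call.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace copies of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
    /* Error pointers occupy the last MAX_ERRNO values of the address space. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical creation call: returns a valid pointer on success and
 * ERR_PTR(-ENODEV) when debugfs is not enabled. */
void *model_create_file(bool debugfs_enabled)
{
    static int dummy_dentry;
    return debugfs_enabled ? (void *)&dummy_dentry : ERR_PTR(-ENODEV);
}
```

Note that IS_ERR(NULL) is false: a caller that only compares the result against NULL would treat ERR_PTR(-ENODEV) as a valid dentry.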
-
struct dentry *
debugfs_create_devm_seqfile
(struct device * dev, const char * name, struct dentry * parent, int (*read_fn)(struct seq_file *s, void *data))
create a debugfs file that is bound to a device.
Parameters
struct device * dev
- device related to this debugfs file.
const char * name
- name of the debugfs file.
struct dentry * parent
- a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.
int (*)(struct seq_file *s, void *data) read_fn
- function pointer called to print the seq_file content.