Buffer Heads

Linux uses buffer heads to maintain state about individual filesystem blocks. Buffer heads are deprecated and new filesystems should use iomap instead.

Functions

void brelse(struct buffer_head *bh)

Release a buffer.

Parameters

struct buffer_head *bh

The buffer to release.

Description

Decrement a buffer_head’s reference count. If bh is NULL, this function is a no-op.

If all buffers on a folio have zero reference count, are clean and unlocked, and if the folio is unlocked and not under writeback then try_to_free_buffers() may strip the buffers from the folio in preparation for freeing it (sometimes, rarely, buffers are removed from a folio but it ends up not being freed, and buffers may later be reattached).

Context

Any context.
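
A typical read-use-release sequence might look like the following sketch; the superblock pointer and block number are hypothetical placeholders.

    /*
     * Sketch: read a metadata block, inspect it, and drop our
     * reference. `sb` and `blk_nr` are placeholders for this example.
     */
    struct buffer_head *bh = sb_bread(sb, blk_nr);

    if (bh) {
            /* ... examine bh->b_data ... */
            brelse(bh);     /* NULL-safe, but bh is non-NULL here */
    }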

void bforget(struct buffer_head *bh)

Discard any dirty data in a buffer.

Parameters

struct buffer_head *bh

The buffer to forget.

Description

Call this function instead of brelse() if the data written to a buffer no longer needs to be written back. It will clear the buffer’s dirty flag so writeback of this buffer will be skipped.

Context

Any context.
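
For example, an error path that abandons an in-memory update might discard the buffer rather than let stale contents reach the disk. A sketch; the surrounding condition is hypothetical:

    /*
     * Sketch: abandon a modified buffer. `transaction_failed` is a
     * hypothetical condition for this example.
     */
    if (transaction_failed)
            bforget(bh);    /* clear the dirty bit, then release */
    else
            brelse(bh);     /* release; writeback proceeds normally */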

struct buffer_head *__bread(struct block_device *bdev, sector_t block, unsigned size)

Read a block.

Parameters

struct block_device *bdev

The block device to read from.

sector_t block

Block number in units of block size.

unsigned size

The block size of this device in bytes.

Description

Read a specified block, and return the buffer head that refers to it. The memory is allocated from the movable area so that it can be migrated. The returned buffer head has its refcount increased. The caller should call brelse() when it has finished with the buffer.

Context

May sleep waiting for I/O.

Return

NULL if the block was unreadable.
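
A minimal usage sketch, assuming a 4096-byte block size:

    struct buffer_head *bh = __bread(bdev, blk_nr, 4096);

    if (!bh)
            return -EIO;            /* the block was unreadable */
    /* ... use bh->b_data ... */
    brelse(bh);                     /* drop the reference we were given */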

struct buffer_head *get_nth_bh(struct buffer_head *bh, unsigned int count)

Get a reference on the n’th buffer after this one.

Parameters

struct buffer_head *bh

The buffer to start counting from.

unsigned int count

How many buffers to skip.

Description

This is primarily useful for finding the nth buffer in a folio; in that case you pass the head buffer and the byte offset in the folio divided by the block size. It can be used for other purposes, but it will wrap at the end of the folio rather than returning NULL or proceeding to the next folio for you.

Return

The requested buffer with an elevated refcount.
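
For instance, to locate the buffer covering a byte offset within a folio (a sketch assuming the folio already has buffers attached; `offset` and `blocksize` are placeholders):

    struct buffer_head *head = folio_buffers(folio);
    struct buffer_head *bh;

    bh = get_nth_bh(head, offset / blocksize);
    /* ... use bh ... */
    brelse(bh);     /* drop the refcount elevated by get_nth_bh() */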

int sync_mapping_buffers(struct address_space *mapping)

write out & wait upon a mapping’s “associated” buffers

Parameters

struct address_space *mapping

the mapping which wants those buffers written

Description

Starts I/O against the buffers at mapping->i_private_list, and waits upon that I/O.

Basically, this is a convenience function for fsync(). mapping is a file or directory which needs those buffers to be written for a successful fsync().
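
A hand-rolled fsync method built on this helper might look like the following simplified sketch (it ignores the datasync distinction and inode writeback; myfs_fsync is a hypothetical name). Most filesystems should prefer generic_buffers_fsync(), described below.

    static int myfs_fsync(struct file *file, loff_t start, loff_t end,
                          int datasync)
    {
            struct inode *inode = file->f_mapping->host;
            int err;

            /* Write and wait on the data pages first. */
            err = file_write_and_wait_range(file, start, end);
            if (err)
                    return err;

            /* Then write and wait on the "associated" metadata buffers. */
            return sync_mapping_buffers(inode->i_mapping);
    }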

int generic_buffers_fsync_noflush(struct file *file, loff_t start, loff_t end, bool datasync)

generic buffer fsync implementation for simple filesystems with no inode lock

Parameters

struct file *file

file to synchronize

loff_t start

start offset in bytes

loff_t end

end offset in bytes (inclusive)

bool datasync

only synchronize essential metadata if true

Description

This is a generic implementation of the fsync method for simple filesystems which track all non-inode metadata in the buffers list hanging off the address_space structure.

int generic_buffers_fsync(struct file *file, loff_t start, loff_t end, bool datasync)

generic buffer fsync implementation for simple filesystems with no inode lock

Parameters

struct file *file

file to synchronize

loff_t start

start offset in bytes

loff_t end

end offset in bytes (inclusive)

bool datasync

only synchronize essential metadata if true

Description

This is a generic implementation of the fsync method for simple filesystems which track all non-inode metadata in the buffers list hanging off the address_space structure. This also makes sure that a device cache flush operation is called at the end.
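
When the generic helper fits, the fsync method reduces to a thin wrapper, replacing the hand-rolled version sketched under sync_mapping_buffers() above (myfs_fsync and myfs_file_operations are hypothetical names):

    static int myfs_fsync(struct file *file, loff_t start, loff_t end,
                          int datasync)
    {
            return generic_buffers_fsync(file, start, end, datasync);
    }

    static const struct file_operations myfs_file_operations = {
            .fsync = myfs_fsync,
            /* other methods omitted */
    };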

bool block_dirty_folio(struct address_space *mapping, struct folio *folio)

Mark a folio as dirty.

Parameters

struct address_space *mapping

The address space containing this folio.

struct folio *folio

The folio to mark dirty.

Description

Filesystems which use buffer_heads can use this function as their ->dirty_folio implementation. Some filesystems need to do a little work before calling this function. Filesystems which do not use buffer_heads should call filemap_dirty_folio() instead.

If the folio has buffers, the uptodate buffers are set dirty, to preserve dirty-state coherency between the folio and the buffers. Buffers added to a dirty folio are created dirty.

The buffers are dirtied before the folio is dirtied. There’s a small race window in which writeback may see the folio cleanness but not the buffer dirtiness. That’s fine. If this code were to set the folio dirty before the buffers, writeback could clear the folio dirty flag, see a bunch of clean buffers and we’d end up with dirty buffers/clean folio on the dirty folio list.

We use i_private_lock to lock against try_to_free_buffers() while using the folio’s buffer list. This also prevents clean buffers being added to the folio after it was set dirty.

Context

May only be called from process context. Does not sleep. Caller must ensure that folio cannot be truncated during this call, typically by holding the folio lock or having a page in the folio mapped and holding the page table lock.

Return

True if the folio was dirtied; false if it was already dirtied.
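
A buffer_head-based filesystem that needs no extra work can install this function directly in its address_space_operations; a sketch (myfs_aops is a hypothetical name):

    static const struct address_space_operations myfs_aops = {
            .dirty_folio      = block_dirty_folio,
            .invalidate_folio = block_invalidate_folio,
            /* ->read_folio, ->writepages etc. omitted */
    };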

void mark_buffer_dirty(struct buffer_head *bh)

mark a buffer_head as needing writeout

Parameters

struct buffer_head *bh

the buffer_head to mark dirty

Description

mark_buffer_dirty() sets the dirty bit on the buffer, marks its backing page dirty, tags the page as dirty in the page cache, and finally attaches the address_space’s inode to its superblock’s dirty inode list.

mark_buffer_dirty() is atomic. It takes bh->b_folio->mapping->i_private_lock, i_pages lock and mapping->host->i_lock.
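
A typical modify-and-dirty sequence, sketched under the assumption that the caller provides its own higher-level locking (sb, blk_nr, off, src and len are placeholders):

    struct buffer_head *bh = sb_bread(sb, blk_nr);

    if (!bh)
            return -EIO;
    lock_buffer(bh);
    memcpy(bh->b_data + off, src, len);     /* update the block in memory */
    unlock_buffer(bh);
    mark_buffer_dirty(bh);                  /* schedule it for writeback */
    brelse(bh);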

void __brelse(struct buffer_head *bh)

Release a buffer.

Parameters

struct buffer_head *bh

The buffer to release.

Description

This variant of brelse() can be called if bh is guaranteed to not be NULL.

void __bforget(struct buffer_head *bh)

Discard any dirty data in a buffer.

Parameters

struct buffer_head *bh

The buffer to forget.

Description

This variant of bforget() can be called if bh is guaranteed to not be NULL.

struct buffer_head *bdev_getblk(struct block_device *bdev, sector_t block, unsigned size, gfp_t gfp)

Get a buffer_head in a block device’s buffer cache.

Parameters

struct block_device *bdev

The block device.

sector_t block

The block number.

unsigned size

The size of buffer_heads for this bdev.

gfp_t gfp

The memory allocation flags to use.

Description

The returned buffer head has its reference count incremented, but is not locked. The caller should call brelse() when it has finished with the buffer. The buffer may not be uptodate. If needed, the caller can bring it uptodate either by reading it or overwriting it.

Return

The buffer head, or NULL if memory could not be allocated.
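
For a block that will be completely overwritten, no read from disk is needed; a sketch (the GFP flags shown are one plausible choice, not a requirement):

    struct buffer_head *bh;

    bh = bdev_getblk(bdev, blk_nr, 4096, GFP_NOFS | __GFP_MOVABLE);
    if (!bh)
            return -ENOMEM;
    lock_buffer(bh);
    memset(bh->b_data, 0, bh->b_size);      /* overwrite the whole block */
    set_buffer_uptodate(bh);                /* contents now valid */
    unlock_buffer(bh);
    mark_buffer_dirty(bh);
    brelse(bh);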

struct buffer_head *__bread_gfp(struct block_device *bdev, sector_t block, unsigned size, gfp_t gfp)

Read a block.

Parameters

struct block_device *bdev

The block device to read from.

sector_t block

Block number in units of block size.

unsigned size

The block size of this device in bytes.

gfp_t gfp

Not page allocation flags; see below.

Description

You are not expected to call this function. You should use one of sb_bread(), sb_bread_unmovable() or __bread().

Read a specified block, and return the buffer head that refers to it. If gfp is 0, the memory will be allocated using the block device’s default GFP flags. If gfp is __GFP_MOVABLE, the memory may be allocated from a movable area. Do not pass in a complete set of GFP flags.

The returned buffer head has its refcount increased. The caller should call brelse() when it has finished with the buffer.

Context

May sleep waiting for I/O.

Return

NULL if the block was unreadable.

void block_invalidate_folio(struct folio *folio, size_t offset, size_t length)

Invalidate part or all of a buffer-backed folio.

Parameters

struct folio *folio

The folio which is affected.

size_t offset

start of the range to invalidate

size_t length

length of the range to invalidate

Description

block_invalidate_folio() is called when all or part of the folio has been invalidated by a truncate operation.

block_invalidate_folio() does not have to release all buffers, but it must ensure that no dirty buffer is left outside offset and that no I/O is underway against any of the blocks which are outside the truncation point, because the caller is about to free (and possibly reuse) those blocks on-disk.

void clean_bdev_aliases(struct block_device *bdev, sector_t block, sector_t len)

clean a range of buffers in block device

Parameters

struct block_device *bdev

Block device to clean buffers in

sector_t block

Start of a range of blocks to clean

sector_t len

Number of blocks to clean

Description

We are taking a range of blocks for data, and we do not want writeback of any buffer-cache aliases from the moment this function returns until something explicitly marks a buffer dirty (which, with luck, will not happen until we free the block). There is no need to mark the aliases not-uptodate, since nobody can expect anything from a newly allocated buffer anyway. We used to use unmap_buffer() for this invalidation, but that was wrong: marking the alias unmapped would confuse anyone who might pick it up with bread() afterwards.

Note also that bforget() does not lock the buffer, so writeout I/O may still be in flight against recently freed buffers. bforget() does not wait on that I/O, since it is more efficient to wait only when we really need to; that waiting happens here.
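
In a filesystem's block allocator, the call typically follows immediately after blocks previously used for metadata are handed out for data; a sketch (blk_nr is a placeholder):

    /*
     * Sketch: blk_nr was just allocated for file data. Any stale
     * buffer-cache aliases left over from its previous life as
     * metadata must not be written back over the new contents.
     */
    clean_bdev_aliases(sb->s_bdev, blk_nr, 1);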

bool try_to_free_buffers(struct folio *folio)

Release buffers attached to this folio.

Parameters

struct folio *folio

The folio.

Description

If any buffers are in use (dirty, under writeback, elevated refcount), no buffers will be freed.

If the folio is dirty but all the buffers are clean then we need to be sure to mark the folio clean as well. This is because the folio may be against a block device, and a later reattachment of buffers to a dirty folio will set all buffers dirty. Which would corrupt filesystem data on the same device.

The same applies to regular filesystem folios: if all the buffers are clean then we set the folio clean and proceed. To do that, we require total exclusion from block_dirty_folio(). That is obtained with i_private_lock.

Exclusion against try_to_free_buffers may be obtained by either locking the folio or by holding its mapping’s i_private_lock.

Context

Process context. folio must be locked. Will not sleep.

Return

true if all buffers attached to this folio were freed.

int bh_uptodate_or_lock(struct buffer_head *bh)

Test whether the buffer is uptodate

Parameters

struct buffer_head *bh

The buffer to test.

Description

Returns true if the buffer is up-to-date; otherwise returns false with the buffer locked.

int __bh_read(struct buffer_head *bh, blk_opf_t op_flags, bool wait)

Submit read for a locked buffer

Parameters

struct buffer_head *bh

The buffer to read; must already be locked.

blk_opf_t op_flags

Additional REQ_* flags to combine with REQ_OP_READ.

bool wait

Wait until the read finishes.

Description

Returns zero on success. If wait is false, the read is submitted and zero is returned without waiting for completion. Returns -EIO if the buffer is not up-to-date after a waited-for read.
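
Taken together with bh_uptodate_or_lock(), this gives the canonical synchronous read pattern, essentially what bh_read() does:

    /*
     * If the buffer is not uptodate, it comes back locked, which is
     * exactly the state __bh_read() requires.
     */
    if (!bh_uptodate_or_lock(bh)) {
            int err = __bh_read(bh, 0, true);   /* submit and wait */

            if (err)
                    return err;
    }
    /* bh is now up-to-date */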

void __bh_read_batch(int nr, struct buffer_head *bhs[], blk_opf_t op_flags, bool force_lock)

Submit read for a batch of unlocked buffers

Parameters

int nr

The number of buffers in the batch.

struct buffer_head *bhs[]

The array of buffers to read.

blk_opf_t op_flags

Additional REQ_* flags to combine with REQ_OP_READ.

bool force_lock

If set, wait to lock each buffer; otherwise, skip any buffer that cannot be locked immediately.

Description

Submits reads for any buffers in the batch that are not already up-to-date. Unlike __bh_read(), this function returns nothing and does not wait for the reads to complete.