kernel/git/dgc/linux-xfs.git - XFS Kernel Development Tree

tag name	xfs-fstrim-busy-tag (bce6020a00e28168b230f4e9dba16351588a27db)
tag date	2023-10-04 10:05:09 +1100
tagged by	Dave Chinner <david@fromorbit.com>
tagged object	commit e78a40b851...

xfs: reduce AGF hold times during fstrim operations

A recent log space overflow and recovery failure was root-caused to a long-running truncate blocking on the AGF and ending up pinning the tail of the log. The filesystem then hung, the machine was rebooted, and log recovery then refused to run because there wasn't enough space in the log for the EFI transaction reservation.

The reason the long-running truncate got blocked on the AGF for so long was that an fstrim was being run. The underlying block device was large and very slow (a 10TB Ceph RBD volume), and so discarding all the free space in the AG took a really long time.

The current fstrim implementation holds the AGF across the entire operation - both the free space scan and the issuing of all the discards. The discards are synchronous and single depth, so if there are millions of free spaces, we hold the AGF lock across millions of discard operations.

It doesn't really need to be said that this is a Bad Thing.

This series reworks the fstrim discard path to use the same mechanisms as online discard. This allows discards to be issued asynchronously without holding the AGF locked, enabling higher discard queue depths (much faster on fast devices) and only requiring the AGF lock to be held whilst we are scanning free space.

To do this, we make use of busy extents - we lock the AGF, mark all the extents we want to discard as "busy under discard" so that nothing will be allowed to allocate them, and then drop the AGF lock. We then issue discards on the gathered busy extents and, on discard completion, remove them from the busy list.

This results in AGF lock hold times for fstrim dropping to a few milliseconds for each batch of free extents we scan, and so the hours-long hold times that can currently occur on large, slow, badly fragmented devices no longer occur.
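
The busy-extent scheme described above can be illustrated with a small user-space model. This is only a sketch and not the kernel code: `ToyAG`, `toy_fstrim`, the `issue_discard` callback, the `batch_size` and `queue_depth` parameters, and the separate `busy_lock` are all hypothetical names invented for the illustration. It also assumes that each batch is waited on before the next one is gathered, which the text above does not state.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyAG:
    """Toy allocation group (hypothetical model, not an XFS structure)."""

    def __init__(self, free_extents):
        self.agf_lock = threading.Lock()   # stands in for the AGF lock
        self.busy_lock = threading.Lock()  # protects only the busy set
        self.free = list(free_extents)     # (start, length) tuples
        self.busy = set()                  # extents "busy under discard"

    def gather_busy_batch(self, cursor, batch_size):
        # The AGF is held only while scanning and marking extents busy.
        with self.agf_lock:
            batch = self.free[cursor:cursor + batch_size]
            with self.busy_lock:
                self.busy.update(batch)
        return batch

    def try_allocate(self, extent):
        # An allocator takes the AGF but must skip busy extents.
        with self.agf_lock:
            with self.busy_lock:
                if extent in self.busy:
                    return False
            if extent not in self.free:
                return False
            self.free.remove(extent)
            return True

    def discard_done(self, extent):
        # Discard completion clears the busy state without touching the AGF.
        with self.busy_lock:
            self.busy.discard(extent)


def toy_fstrim(ag, issue_discard, batch_size=2, queue_depth=4):
    """Discard all free extents in batches, never holding the AGF during I/O."""

    def discard_one(extent):
        issue_discard(extent)      # the slow device operation
        ag.discard_done(extent)
        return extent

    cursor = 0                     # toy cursor: an index into the free list
    discarded = []
    with ThreadPoolExecutor(max_workers=queue_depth) as pool:
        while True:
            batch = ag.gather_busy_batch(cursor, batch_size)
            if not batch:
                break
            # The AGF lock is already dropped here; discards run concurrently.
            discarded.extend(pool.map(discard_one, batch))
            cursor += len(batch)
    return discarded
```

In this model an allocation attempted while a discard is in flight is simply refused for the busy extent, and the AGF stand-in is held only for the short scan-and-mark step of each batch.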

Signed-off-by: Dave Chinner <dchinner@redhat.com>
-----BEGIN PGP SIGNATURE-----

iQJIBAABCgAyFiEEmJOoJ8GffZYWSjj/regpR/R1+h0FAmUcnnAUHGRhdmlkQGZy
b21vcmJpdC5jb20ACgkQregpR/R1+h3J+Q//d1PSmdSqS6GbqXy8/YDQnOvYvd0z
ErUdZR3Uzd4arvOdYjqCbmshoHxqcG3ajUH4H96l0Dr/a0Y3cznYWcGGnL/fYupF
PhlPSJgtnQtkM713rvZD/m7EiWU1dWOejrN++3VJrxLcrhZNu6oSej2ivFMnd0F1
xUcJLj068ztUwS2Q21/pNMaQO6QFdGkp2lfnVtAgwTkoJcjO6eFgYuB1Vqj3e09F
SN+SETvoBWhr1mjQpVzP5SBj/42f6pUXQa0XvWdZoAo1D/hQGIu9G7NaoyGuv5V8
j5xpn+BmPg2iUGPOCa+D+z/WIjASOgBZTG1q/MgOL674p52qr9W7eWoNjKI6GHmH
YJGZwufqXa0ud0VX3L8bOFIzHO/lg9o1mUw3asEpZloNPWcjorrU5kS4wAfKKoL8
mr6uiplpq00p4jDTtMFtpFn+ma86rUJhv51Wdqc5OMOf0iihauq0lecjwAgXSe7Y
YupfyoHcHEKFTZXEwH1KXDmHMOxePH3bLHYYVy1CfgZj2jEgsz7ss9kpm4KNIAB3
TIizV5TY/ttNdO0hNHLXsXe6xx2kqW9/WeRkAD49Ao6++NZVWVoEagDmVSfSxJhs
Xb3g0M7UBU55X4XVNgLJDDp7waxjRNOneBj2aGfiHkJsmIQzgDKt8l2fFCr1PbkZ
5OxZ15gsbNP3+u4=
=NknK
-----END PGP SIGNATURE-----