Idle/background work classes design doc:

Right now, our behaviour at idle isn’t ideal. It was designed for servers under sustained load: keep pending work at a “medium” level, so that work can build up and be processed in more efficient batches, while still leaving headroom for bursts in load.

But for desktops or mobile - scenarios where work is less sustained and power usage is more important - we want to operate differently, with a “rush to idle” so the system can go to sleep. We don’t want to be dribbling out background work while the system should be idle.

The complicating factor is that there are a number of background tasks, which form a hierarchy (or a digraph, depending on how you divide it up) - one background task may generate work for another.

Thus proper idle detection needs to model this hierarchy.

Foreground writes
Page cache writeback
Copygc, rebalance
Journal reclaim
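
One way to picture this hierarchy is as a small directed graph. The sketch below is only an illustration, not existing code: the `UPSTREAM` table and `all_upstream()` are hypothetical names, and the edges are an assumption read off the ordering of the list above (each class is fed by the one listed before it, with copygc and rebalance sharing a level).

```python
# Hypothetical model: each work class maps to the classes directly above it,
# i.e. the classes that can generate work for it. The edges are assumed from
# the ordering of the list in this document.
UPSTREAM = {
    "foreground_writes": [],
    "page_cache_writeback": ["foreground_writes"],
    "copygc": ["page_cache_writeback"],
    "rebalance": ["page_cache_writeback"],
    "journal_reclaim": ["copygc", "rebalance"],
}


def all_upstream(name):
    """Return every class that can directly or indirectly feed `name`."""
    seen = set()
    stack = list(UPSTREAM[name])
    while stack:
        cls = stack.pop()
        if cls not in seen:
            seen.add(cls)
            stack.extend(UPSTREAM[cls])
    return seen
```

Dividing the tasks up differently would only change the edges in the table; the traversal stays the same.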

When we implement idle detection and rush to idle, we need to be careful not to disturb the existing behaviour too much, since it works reasonably well when the system is under sustained load (we may even be able to improve it in the case of rebalance, which currently does not actively attempt to let work batch up).

SUSTAINED LOAD REGIME

When the system is under continuous load, we want these jobs to run continuously - this is perhaps best modelled with a P/D (proportional/derivative) controller, where they’ll be trying to keep a target value (e.g. fragmented disk space, available journal space) roughly in the middle of some range.
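
As a rough illustration of the controller idea, here is a minimal proportional/derivative sketch. Everything in it is an assumption made for the example: the class name `PDController`, the gains `kp` and `kd`, and the convention that the measured value is something like the fraction of fragmented space, where a reading above the target means more work should be issued.

```python
class PDController:
    """Hypothetical P/D controller: maps a measured level to a work rate."""

    def __init__(self, target, kp, kd):
        self.target = target      # desired level, e.g. the middle of the range
        self.kp = kp              # proportional gain (illustrative value)
        self.kd = kd              # derivative gain (illustrative value)
        self.prev_error = None    # no history before the first sample

    def update(self, measured, dt):
        """Return a non-negative work rate for this interval of length dt."""
        error = measured - self.target
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        # Work cannot be "undone", so clamp negative outputs to zero.
        return max(0.0, self.kp * error + self.kd * derivative)
```

A pure P/D term outputs zero exactly at the target, so a real implementation would probably want a baseline rate or carefully chosen gains; that tuning is outside the scope of this sketch.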

The goal under sustained load is to balance our ability to handle load spikes without running out of some resource (free disk space, free space in the journal), while also letting some work accumulate to be batched (or become unnecessary).

For example, we don’t want to run copygc too aggressively, because then it will be evacuating buckets that would have become empty (been overwritten or deleted) anyway. Nor do we want to wait until we’re almost out of free space, because then the system will behave unpredictably - suddenly we’re doing a lot more work to service each write and the system becomes much slower.

IDLE REGIME

When the system becomes idle, we should start flushing our pending work more quickly so the system can go to sleep.

Note that the definition of “idle” depends on where in the hierarchy a task is - a task should start flushing work more quickly when the task above it has stopped generating new work.

For example, rebalance should start flushing more quickly when page cache writeback is idle, and journal reclaim should only start flushing more quickly when both copygc and rebalance are idle.

It’s important to let work accumulate when more work is still incoming and we still have room, because flushing is always more efficient if we let it batch up. New writes may overwrite data before rebalance moves it, and tasks may be generating more updates for the btree nodes that journal reclaim needs to flush.

On idle, how much work we do at each interval should be proportional to the length of time we have been idle for. If we’re idle only for a short duration, we shouldn’t flush everything right away; the system might wake up and start generating new work soon, and flushing immediately might end up doing a lot of work that would have been unnecessary if we’d allowed things to batch more.
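
The proportionality rule above could be sketched as follows. This is a hypothetical helper, not an existing interface: the name `idle_flush_budget` and the 30-second ramp (`ramp_ms`) are assumptions picked purely for illustration, and "work" is counted in abstract units.

```python
def idle_flush_budget(pending, idle_ms, ramp_ms=30_000):
    """Units of pending work to issue this interval.

    The budget grows linearly with the time spent idle and reaches
    "flush everything" once idle_ms hits ramp_ms (an assumed tunable).
    """
    if pending <= 0 or idle_ms <= 0:
        return 0
    if idle_ms >= ramp_ms:
        return pending
    # Integer arithmetic avoids float rounding; always make some progress
    # once we are idle at all.
    return max(1, pending * idle_ms // ramp_ms)
```

With these assumed numbers, a class that has been idle for 3 seconds and has 100 units pending would issue 10 of them, so a brief pause in activity flushes only a small part of the backlog.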

To summarize, we will need:

A list of classes for background tasks that generate work, which will include one “foreground” class.

Tracking for each class - “Am I doing work, or have I gone to sleep?”

And each class should check the class above it when deciding how much work to issue.
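
The three requirements above might fit together roughly as in the sketch below. All names here (`WorkClass`, `start_work`, `go_to_sleep`, `upstream_idle_ms`) are hypothetical, and treating "the class above" as the set of direct upstream classes is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WorkClass:
    """Hypothetical per-class tracking of "working" versus "asleep"."""
    name: str
    upstream: List["WorkClass"] = field(default_factory=list)
    idle_since_ms: Optional[int] = None  # None means "currently doing work"

    def start_work(self):
        self.idle_since_ms = None

    def go_to_sleep(self, now_ms):
        # Keep the earliest timestamp if we were already asleep.
        if self.idle_since_ms is None:
            self.idle_since_ms = now_ms

    def upstream_idle_ms(self, now_ms):
        """How long all classes directly above have been idle (0 if any is active).

        A class with nothing above it (the foreground class) also gets 0.
        """
        if not self.upstream:
            return 0
        if any(u.idle_since_ms is None for u in self.upstream):
            return 0
        return min(now_ms - u.idle_since_ms for u in self.upstream)
```

Each class would consult `upstream_idle_ms()` when deciding how much work to issue: a result of 0 means more work may still be incoming, so batching is preferred, while a larger value suggests it is time to flush harder.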
