4.5.2. Memory-to-Memory Stateful Video Encoder Interface
A stateful video encoder takes raw video frames in display order and encodes them into a bytestream. It generates complete chunks of the bytestream, including all metadata, headers, etc. The resulting bytestream does not require any further post-processing by the client.
Performing software stream processing, header generation etc. in the driver in order to support this interface is strongly discouraged. In case such operations are needed, use of the Stateless Video Encoder Interface (in development) is strongly advised.
4.5.2.1. Conventions and Notations Used in This Document
The general V4L2 API rules apply if not specified in this document otherwise.
The meaning of words “must”, “may”, “should”, etc. is as per RFC 2119.
All steps not marked “optional” are required.
VIDIOC_G_EXT_CTRLS() and VIDIOC_S_EXT_CTRLS() may be used interchangeably with VIDIOC_G_CTRL() and VIDIOC_S_CTRL(), unless specified otherwise.
Single-planar API (see Single- and multi-planar APIs) and applicable structures may be used interchangeably with multi-planar API, unless specified otherwise, depending on encoder capabilities and following the general V4L2 guidelines.
i = [a..b]: sequence of integers from a to b, inclusive, e.g. i = [0..2]: i = 0, 1, 2.
Given an OUTPUT buffer A, then A’ represents a buffer on the CAPTURE queue containing data that resulted from processing buffer A.
4.5.2.2. Glossary
Refer to Glossary.
4.5.2.3. State Machine
Figure: Encoder State Machine
4.5.2.4. Querying Capabilities
To enumerate the set of coded formats supported by the encoder, the client may call VIDIOC_ENUM_FMT() on CAPTURE.

- The full set of supported formats will be returned, regardless of the format set on OUTPUT.

To enumerate the set of supported raw formats, the client may call VIDIOC_ENUM_FMT() on OUTPUT.

- Only the formats supported for the format currently active on CAPTURE will be returned.
- In order to enumerate raw formats supported by a given coded format, the client must first set that coded format on CAPTURE and then enumerate the formats on OUTPUT.

The client may use VIDIOC_ENUM_FRAMESIZES() to detect supported resolutions for a given format, passing the desired pixel format in v4l2_frmsizeenum pixel_format.

- Values returned by VIDIOC_ENUM_FRAMESIZES() for a coded pixel format will include all possible coded resolutions supported by the encoder for the given coded pixel format.
- Values returned by VIDIOC_ENUM_FRAMESIZES() for a raw pixel format will include all possible frame buffer resolutions supported by the encoder for the given raw pixel format and coded format currently set on CAPTURE.

The client may use VIDIOC_ENUM_FRAMEINTERVALS() to detect supported frame intervals for a given format and resolution, passing the desired pixel format in v4l2_frmivalenum pixel_format and the resolution in v4l2_frmivalenum width and v4l2_frmivalenum height.

- Values returned by VIDIOC_ENUM_FRAMEINTERVALS() for a coded pixel format and coded resolution will include all possible frame intervals supported by the encoder for the given coded pixel format and resolution.
- Values returned by VIDIOC_ENUM_FRAMEINTERVALS() for a raw pixel format and resolution will include all possible frame intervals supported by the encoder for the given raw pixel format and resolution and for the coded format, coded resolution and coded frame interval currently set on CAPTURE.
- Support for VIDIOC_ENUM_FRAMEINTERVALS() is optional. If it is not implemented, then there are no special restrictions other than the limits of the codec itself.

Supported profiles and levels for the coded format currently set on CAPTURE, if applicable, may be queried using their respective controls via VIDIOC_QUERYCTRL().

Any additional encoder capabilities may be discovered by querying their respective controls.
4.5.2.5. Initialization
Set the coded format on the CAPTURE queue via VIDIOC_S_FMT().

Required fields:
- type: a V4L2_BUF_TYPE_* enum appropriate for CAPTURE.
- pixelformat: the coded format to be produced.
- sizeimage: desired size of CAPTURE buffers; the encoder may adjust it to match hardware requirements.
- width, height: ignored (read-only).
- other fields: follow standard semantics.

Return fields:
- sizeimage: adjusted size of CAPTURE buffers.
- width, height: the coded size selected by the encoder based on current state, e.g. OUTPUT format, selection rectangles, etc. (read-only).

Important
Changing the CAPTURE format may change the currently set OUTPUT format. How the new OUTPUT format is determined is up to the encoder and the client must ensure it matches its needs afterwards.

Optional. Enumerate supported OUTPUT formats (raw formats for source) for the selected coded format via VIDIOC_ENUM_FMT().

Required fields:
- type: a V4L2_BUF_TYPE_* enum appropriate for OUTPUT.
- other fields: follow standard semantics.

Return fields:
- pixelformat: raw format supported for the coded format currently selected on the CAPTURE queue.
- other fields: follow standard semantics.

Set the raw source format on the OUTPUT queue via VIDIOC_S_FMT().

Required fields:
- type: a V4L2_BUF_TYPE_* enum appropriate for OUTPUT.
- pixelformat: raw format of the source.
- width, height: source resolution.
- other fields: follow standard semantics.

Return fields:
- width, height: may be adjusted to match encoder minimums, maximums and alignment requirements, as required by the currently selected formats, as reported by VIDIOC_ENUM_FRAMESIZES().
- other fields: follow standard semantics.

Setting the OUTPUT format will reset the selection rectangles to their default values, based on the new resolution, as described in the next step.

Set the raw frame interval on the OUTPUT queue via VIDIOC_S_PARM(). This also sets the coded frame interval on the CAPTURE queue to the same value.

Required fields:
- type: a V4L2_BUF_TYPE_* enum appropriate for OUTPUT.
- parm.output: set all fields except parm.output.timeperframe to 0.
- parm.output.timeperframe: the desired frame interval; the encoder may adjust it to match hardware requirements.

Return fields:
- parm.output.timeperframe: the adjusted frame interval.

Important
Changing the OUTPUT frame interval also sets the framerate that the encoder uses to encode the video. So setting the frame interval to 1/24 (or 24 frames per second) will produce a coded video stream that can be played back at that speed. The frame interval for the OUTPUT queue is just a hint; the application may provide raw frames at a different rate. It can be used by the driver to help schedule multiple encoders running in parallel.

In the next step the CAPTURE frame interval can optionally be changed to a different value. This is useful for off-line encoding, where the coded frame interval can be different from the rate at which raw frames are supplied.

Important
timeperframe deals with frames, not fields. So for interlaced formats this is the time per two fields, since a frame consists of a top and a bottom field.

Note
It is due to historical reasons that changing the OUTPUT frame interval also changes the coded frame interval on the CAPTURE queue. Ideally these would be independent settings, but that would break the existing API.

Optional. Set the coded frame interval on the CAPTURE queue via VIDIOC_S_PARM(). This is only necessary if the coded frame interval is different from the raw frame interval, which is typically the case for off-line encoding. Support for this feature is signalled by the V4L2_FMT_FLAG_ENC_CAP_FRAME_INTERVAL format flag.

Required fields:
- type: a V4L2_BUF_TYPE_* enum appropriate for CAPTURE.
- parm.capture: set all fields except parm.capture.timeperframe to 0.
- parm.capture.timeperframe: the desired coded frame interval; the encoder may adjust it to match hardware requirements.

Return fields:
- parm.capture.timeperframe: the adjusted frame interval.

Important
Changing the CAPTURE frame interval sets the framerate for the coded video. It does not set the rate at which buffers arrive on the CAPTURE queue; that depends on how fast the encoder is and how fast raw frames are queued on the OUTPUT queue.

Important
timeperframe deals with frames, not fields. So for interlaced formats this is the time per two fields, since a frame consists of a top and a bottom field.

Note
Not all drivers support this functionality; in that case, just set the desired coded frame interval for the OUTPUT queue.

However, drivers that can schedule multiple encoders based on the OUTPUT frame interval must support this optional feature.

Optional. Set the visible resolution for the stream metadata via VIDIOC_S_SELECTION() on the OUTPUT queue if it should differ from the full OUTPUT resolution.

Required fields:
- type: a V4L2_BUF_TYPE_* enum appropriate for OUTPUT.
- target: set to V4L2_SEL_TGT_CROP.
- r.left, r.top, r.width, r.height: visible rectangle; this must fit within the V4L2_SEL_TGT_CROP_BOUNDS rectangle and may be subject to adjustment to match codec and hardware constraints.

Return fields:
- r.left, r.top, r.width, r.height: visible rectangle adjusted by the encoder.

The following selection targets are supported on OUTPUT:
- V4L2_SEL_TGT_CROP_BOUNDS: equal to the full source frame, matching the active OUTPUT format.
- V4L2_SEL_TGT_CROP_DEFAULT: equal to V4L2_SEL_TGT_CROP_BOUNDS.
- V4L2_SEL_TGT_CROP: rectangle within the source buffer to be encoded into the CAPTURE stream; defaults to V4L2_SEL_TGT_CROP_DEFAULT.

Note
A common use case for this selection target is encoding a source video with a resolution that is not a multiple of a macroblock, e.g. the common 1920x1080 resolution may require the source buffers to be aligned to 1920x1088 for codecs with 16x16 macroblock size. To avoid encoding the padding, the client needs to explicitly configure this selection target to 1920x1080.

Warning
The encoder may adjust the crop/compose rectangles to the nearest supported ones to meet codec and hardware requirements. The client needs to check the adjusted rectangle returned by VIDIOC_S_SELECTION().

Allocate buffers for both OUTPUT and CAPTURE via VIDIOC_REQBUFS(). This may be performed in any order.

Required fields:
- count: requested number of buffers to allocate; greater than zero.
- type: a V4L2_BUF_TYPE_* enum appropriate for OUTPUT or CAPTURE.
- other fields: follow standard semantics.

Return fields:
- count: actual number of buffers allocated.

Warning
The actual number of allocated buffers may differ from the count given. The client must check the updated value of count after the call returns.

Note
To allocate more than the minimum number of OUTPUT buffers (for pipeline depth), the client may query the V4L2_CID_MIN_BUFFERS_FOR_OUTPUT control to get the minimum number of buffers required, and pass the obtained value plus the number of additional buffers needed in the count field to VIDIOC_REQBUFS().

Alternatively, VIDIOC_CREATE_BUFS() can be used to have more control over buffer allocation.

Required fields:
- count: requested number of buffers to allocate; greater than zero.
- type: a V4L2_BUF_TYPE_* enum appropriate for OUTPUT.
- other fields: follow standard semantics.

Return fields:
- count: adjusted to the number of allocated buffers.

Begin streaming on both OUTPUT and CAPTURE queues via VIDIOC_STREAMON(). This may be performed in any order. The actual encoding process starts when both queues start streaming.
Note
If the client stops the CAPTURE queue during the encode process and then
restarts it again, the encoder will begin generating a stream independent
from the stream generated before the stop. The exact constraints depend
on the coded format, but may include the following implications:
- encoded frames produced after the restart must not reference any frames produced before the stop, e.g. no long term references for H.264/HEVC,
- any headers that must be included in a standalone stream must be produced again, e.g. SPS and PPS for H.264/HEVC.
4.5.2.6. Encoding
This state is reached after the Initialization sequence finishes
successfully. In this state, the client queues and dequeues buffers to both
queues via VIDIOC_QBUF() and VIDIOC_DQBUF(), following the
standard semantics.
The content of encoded CAPTURE buffers depends on the active coded pixel
format and may be affected by codec-specific extended controls, as stated
in the documentation of each format.
Both queues operate independently, following standard behavior of V4L2 buffer
queues and memory-to-memory devices. In addition, the order of encoded frames
dequeued from the CAPTURE queue may differ from the order of queuing raw
frames to the OUTPUT queue, due to properties of the selected coded format,
e.g. frame reordering.
The client must not assume any direct relationship between CAPTURE and
OUTPUT buffers and any specific timing of buffers becoming
available to dequeue. Specifically:
- a buffer queued to OUTPUT may result in more than one buffer produced on CAPTURE (for example, if returning an encoded frame allowed the encoder to return a frame that preceded it in display, but succeeded it in the decode order; however, there may be other reasons for this as well),
- a buffer queued to OUTPUT may result in a buffer being produced on CAPTURE later in the encode process, and/or after processing further OUTPUT buffers, or be returned out of order, e.g. if display reordering is used,
- buffers may become available on the CAPTURE queue without additional buffers queued to OUTPUT (e.g. during drain or EOS), because of the OUTPUT buffers queued in the past whose encoding results are only available at a later time, due to specifics of the encoding process,
- buffers queued to OUTPUT may not become available to dequeue instantly after being encoded into a corresponding CAPTURE buffer, e.g. if the encoder needs to use the frame as a reference for encoding further frames.
Note
To allow matching encoded CAPTURE buffers with OUTPUT buffers they
originated from, the client can set the timestamp field of the
v4l2_buffer struct when queuing an OUTPUT buffer. The
CAPTURE buffer(s), which resulted from encoding that OUTPUT buffer
will have their timestamp field set to the same value when dequeued.
In addition to the straightforward case of one OUTPUT buffer producing
one CAPTURE buffer, the following cases are defined:
- one OUTPUT buffer generates multiple CAPTURE buffers: the same OUTPUT timestamp will be copied to multiple CAPTURE buffers,
- the encoding order differs from the presentation order (i.e. the CAPTURE buffers are out-of-order compared to the OUTPUT buffers): CAPTURE timestamps will not retain the order of OUTPUT timestamps.
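
One way a client might use the copied timestamps is a small lookup table keyed by the value it wrote into the timestamp field. The sketch below is only an illustration under stated assumptions: the table, its fixed size and the helper names are hypothetical, and the client is assumed to give every OUTPUT buffer a unique timestamp.

```c
#include <stddef.h>
#include <sys/time.h>

#define MAX_PENDING 32

/* One raw frame that has been queued to OUTPUT. */
struct pending_frame {
    struct timeval timestamp;   /* value written to v4l2_buffer.timestamp */
    long frame_number;          /* client-side sequence number */
    int in_use;
};

static struct pending_frame pending[MAX_PENDING];

static int same_time(struct timeval a, struct timeval b)
{
    return a.tv_sec == b.tv_sec && a.tv_usec == b.tv_usec;
}

/* Record a frame when queuing it. Returns 0, or -1 if the table is full. */
static int remember_frame(struct timeval ts, long frame_number)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!pending[i].in_use) {
            pending[i].timestamp = ts;
            pending[i].frame_number = frame_number;
            pending[i].in_use = 1;
            return 0;
        }
    }
    return -1;
}

/*
 * Find the source frame for a dequeued CAPTURE buffer. The entry is kept,
 * because one OUTPUT buffer may produce several CAPTURE buffers.
 * Returns the frame number, or -1 if the timestamp is unknown.
 */
static long find_frame(struct timeval ts)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (pending[i].in_use && same_time(pending[i].timestamp, ts))
            return pending[i].frame_number;
    }
    return -1;
}

/* Drop an entry once the client no longer expects output for it. */
static void forget_frame(struct timeval ts)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (pending[i].in_use && same_time(pending[i].timestamp, ts))
            pending[i].in_use = 0;
    }
}
```

Looking frames up by value instead of by arrival order keeps the matching correct when CAPTURE buffers come back reordered. When an entry can safely be forgotten is a client policy decision that the interface does not define.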
Note
To let the client distinguish between frame types (keyframes, intermediate
frames; the exact list of types depends on the coded format), the
CAPTURE buffers will have corresponding flag bits set in their
v4l2_buffer struct when dequeued. See the documentation of
v4l2_buffer and each coded pixel format for exact list of flags
and their meanings.
Should an encoding error occur, it will be reported to the client with the level of details depending on the encoder capabilities. Specifically:
- the CAPTURE buffer (if any) that contains the results of the failed encode operation will be returned with the V4L2_BUF_FLAG_ERROR flag set,
- if the encoder is able to precisely report the OUTPUT buffer(s) that triggered the error, such buffer(s) will be returned with the V4L2_BUF_FLAG_ERROR flag set.
Note
If a CAPTURE buffer is too small then it is just returned with the
V4L2_BUF_FLAG_ERROR flag set. More work is needed to detect that this
error occurred because the buffer was too small, and to provide support to
free existing buffers that were too small.
In case of a fatal failure that does not allow the encoding to continue, any further operations on the corresponding encoder file handle will return the -EIO error code. The client may close the file handle and open a new one, or alternatively reinitialize the instance by stopping streaming on both queues, releasing all buffers and performing the Initialization sequence again.
4.5.2.7. Encoding Parameter Changes
The client is allowed to use VIDIOC_S_CTRL() to change encoder
parameters at any time. The availability of parameters is encoder-specific
and the client must query the encoder to find the set of available controls.
The ability to change each parameter during encoding is encoder-specific, as
per the standard semantics of the V4L2 control interface. The client may
attempt to set a control during encoding and if the operation fails with the
-EBUSY error code, the CAPTURE queue needs to be stopped for the
configuration change to be allowed. To do this, it may follow the Drain
sequence to avoid losing the already queued/encoded frames.
The timing of parameter updates is encoder-specific, as per the standard semantics of the V4L2 control interface. If the client needs to apply the parameters at a specific frame, using the Request API should be considered, if supported by the encoder.
4.5.2.8. Drain
To ensure that all the queued OUTPUT buffers have been processed and the
related CAPTURE buffers are given to the client, the client must follow the
drain sequence described below. After the drain sequence ends, the client has
received all encoded frames for all OUTPUT buffers queued before the
sequence was started.
Begin the drain sequence by issuing VIDIOC_ENCODER_CMD().

Required fields:
- cmd: set to V4L2_ENC_CMD_STOP.
- flags: set to 0.
- pts: set to 0.

Warning
The sequence can only be initiated if both OUTPUT and CAPTURE queues are streaming. For compatibility reasons, the call to VIDIOC_ENCODER_CMD() will not fail even if any of the queues is not streaming, but at the same time it will not initiate the Drain sequence and so the steps described below would not be applicable.

Any OUTPUT buffers queued by the client before the VIDIOC_ENCODER_CMD() was issued will be processed and encoded as normal. The client must continue to handle both queues independently, similarly to normal encode operation. This includes:

- queuing and dequeuing CAPTURE buffers, until a buffer marked with the V4L2_BUF_FLAG_LAST flag is dequeued,

  Warning
  The last buffer may be empty (with v4l2_buffer bytesused = 0) and in that case it must be ignored by the client, as it does not contain an encoded frame.

  Note
  Any attempt to dequeue more CAPTURE buffers beyond the buffer marked with V4L2_BUF_FLAG_LAST will result in a -EPIPE error from VIDIOC_DQBUF().

- dequeuing processed OUTPUT buffers, until all the buffers queued before the V4L2_ENC_CMD_STOP command are dequeued,
- dequeuing the V4L2_EVENT_EOS event, if the client subscribes to it.

Note
For backwards compatibility, the encoder will signal a V4L2_EVENT_EOS event when the last frame has been encoded and all frames are ready to be dequeued. It is deprecated behavior and the client must not rely on it. The V4L2_BUF_FLAG_LAST buffer flag should be used instead.

Once all OUTPUT buffers queued before the V4L2_ENC_CMD_STOP call are dequeued and the last CAPTURE buffer is dequeued, the encoder is stopped and it will accept, but not process any newly queued OUTPUT buffers until the client issues any of the following operations:

- V4L2_ENC_CMD_START - the encoder will not be reset and will resume operation normally, with all the state from before the drain,
- a pair of VIDIOC_STREAMOFF() and VIDIOC_STREAMON() on the CAPTURE queue - the encoder will be reset (see the Reset sequence) and then resume encoding,
- a pair of VIDIOC_STREAMOFF() and VIDIOC_STREAMON() on the OUTPUT queue - the encoder will resume operation normally, however any source frames queued to the OUTPUT queue between V4L2_ENC_CMD_STOP and VIDIOC_STREAMOFF() will be discarded.
Note
Once the drain sequence is initiated, the client needs to drive it to
completion, as described by the steps above, unless it aborts the process by
issuing VIDIOC_STREAMOFF() on any of the OUTPUT or CAPTURE
queues. The client is not allowed to issue V4L2_ENC_CMD_START or
V4L2_ENC_CMD_STOP again while the drain sequence is in progress and they
will fail with -EBUSY error code if attempted.
For reference, handling of various corner cases is described below:
- In case of no buffer in the OUTPUT queue at the time the V4L2_ENC_CMD_STOP command was issued, the drain sequence completes immediately and the encoder returns an empty CAPTURE buffer with the V4L2_BUF_FLAG_LAST flag set.
- In case of no buffer in the CAPTURE queue at the time the drain sequence completes, the next time the client queues a CAPTURE buffer it is returned at once as an empty buffer with the V4L2_BUF_FLAG_LAST flag set.
- If VIDIOC_STREAMOFF() is called on the CAPTURE queue in the middle of the drain sequence, the drain sequence is canceled and all CAPTURE buffers are implicitly returned to the client.
- If VIDIOC_STREAMOFF() is called on the OUTPUT queue in the middle of the drain sequence, the drain sequence completes immediately and the next CAPTURE buffer will be returned empty with the V4L2_BUF_FLAG_LAST flag set.
Although not mandatory, the availability of encoder commands may be queried
using VIDIOC_TRY_ENCODER_CMD().
4.5.2.9. Reset
The client may want to request the encoder to reinitialize the encoding, so that the following stream data becomes independent from the stream data generated before. Depending on the coded format, that may imply that:
- encoded frames produced after the restart must not reference any frames produced before the stop, e.g. no long term references for H.264/HEVC,
- any headers that must be included in a standalone stream must be produced again, e.g. SPS and PPS for H.264/HEVC.
This can be achieved by performing the reset sequence.
Perform the Drain sequence to ensure all the in-flight encoding finishes and respective buffers are dequeued.

Stop streaming on the CAPTURE queue via VIDIOC_STREAMOFF(). This will return all currently queued CAPTURE buffers to the client, without valid frame data.

Start streaming on the CAPTURE queue via VIDIOC_STREAMON() and continue with regular encoding sequence. The encoded frames produced into CAPTURE buffers from now on will contain a standalone stream that can be decoded without the need for frames encoded before the reset sequence, starting at the first OUTPUT buffer queued after issuing the V4L2_ENC_CMD_STOP of the Drain sequence.
This sequence may be also used to change encoding parameters for encoders without the ability to change the parameters on the fly.
4.5.2.10. Commit Points
Setting formats and allocating buffers triggers changes in the behavior of the encoder.
- Setting the format on the CAPTURE queue may change the set of formats supported/advertised on the OUTPUT queue. In particular, it also means that the OUTPUT format may be reset and the client must not rely on the previously set format being preserved.
- Enumerating formats on the OUTPUT queue always returns only formats supported for the current CAPTURE format.
- Setting the format on the OUTPUT queue does not change the list of formats available on the CAPTURE queue. An attempt to set the OUTPUT format that is not supported for the currently selected CAPTURE format will result in the encoder adjusting the requested OUTPUT format to a supported one.
- Enumerating formats on the CAPTURE queue always returns the full set of supported coded formats, irrespective of the current OUTPUT format.
- While buffers are allocated on any of the OUTPUT or CAPTURE queues, the client must not change the format on the CAPTURE queue. Drivers will return the -EBUSY error code for any such format change attempt.
To summarize, setting formats and allocation must always start with the
CAPTURE queue and the CAPTURE queue is the master that governs the
set of supported formats for the OUTPUT queue.