Header-Based Patch Attestation

Author: Konstantin Ryabitsev <konstantin@linuxfoundation.org>

Status: Beta, soliciting comments

Preamble

Projects participating in decentralized development continue to use RFC-2822 (email) formatted messages for code submissions and review. This remains the only widely accepted mechanism for code collaboration that does not rely on centralized infrastructure maintained by a single entity, which necessarily introduces a single point of dependency and a single point of failure.

RFC-2822 formatted messages can be delivered via a variety of means. To name a few of the more common ones:

email

usenet

aggregated archives (e.g. public-inbox)

Among these, email remains the most widely used transport mechanism for RFC-2822 messages, most commonly delivered via subscription-based services (mailing lists).

Email and end-to-end attestation

There are two commonly used standards for cryptographic email attestation: PGP and S/MIME. When it comes to patches sent via email, there are significant drawbacks to both:

Mailing list software may modify email body contents to add subscription information footers, causing message attestation to fail.

Attestation via detached MIME signatures may not be preserved by mailing list software that aggressively quarantines attachments.

Inline PGP attestation generally frustrates developers working with patches due to extra surrounding content and the escaping it performs for strings containing dashes at the start of the line for canonicalization purposes.

Only the body of the message is attested, leaving metadata such as "From", "Subject", and "Date" open to tampering. Git uses this metadata to formulate git commits, so leaving them unattested is suboptimal (they can be duplicated into the body of the message, but git format-patch will not do this by default).

PGP key distribution and trust delegation remains a difficult problem to solve. Even if PGP attestation is available, the developer on the receiving end of the patches may not make any use of it due to not having the sender's key in their keyring.

S/MIME certificates are increasingly difficult to obtain for developers not working in corporate environments. At the time of writing, only two commercial CAs continue to provide this service -- and only one does it for free.

For these reasons, end-to-end attestation is rarely used in communities that continue to use email as their main conduit for code submissions and review.

Email and domain-level attestation

Since unsolicited emails (SPAM) frequently forge headers in order to appear to be coming from trusted sources, most major service providers have adopted DKIM (RFC-6376) to provide cryptographic attestation for header and body contents. A message that originates from gmail.com will contain a "DKIM-Signature" header that attests the contents of the following headers (among others):

from

date

message-id

subject

The "DKIM-Signature" header also includes a hash of the message body (bh=) that is included in the final verification hash. When a DKIM signature is successfully verified using a public key that is published via gmail.com DNS records, this provides a degree of assurance that the email message has not been modified since leaving gmail.com infrastructure.

Just as with PGP and S/MIME attestation, this approach has significant problems when it comes to patches sent via mailing lists:

ML software commonly modifies the subject header in order to insert list identification (e.g. [some-topic]). Since the "subject" header is almost always included into the list of headers attested by DKIM, this causes DKIM signatures to fail verification.

ML software also routinely modifies the message body for the purposes of stripping attachments or inserting list subscription metadata. Since the bh= hash is included in the final signature hash, this results in a failed DKIM signature check.

Even if none of the above applies and the DKIM signature is successfully verified, body canonicalization routines mandated by the DKIM RFC may result in a false-positive successful attestation for patches. The "relaxed" body canonicalization collapses every run of whitespace into a single space, so for languages like Python or GNU Make, where whitespace is syntactically significant, two patches containing different code may produce the same hash.
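The whitespace problem can be illustrated with a minimal sketch. It approximates the "relaxed" body canonicalization from RFC 6376 (collapse whitespace runs, strip trailing whitespace, drop trailing empty lines); the function name and the two sample patches are made up for this illustration and are not part of the POC.

```python
import hashlib
import re


def relaxed_body(body: str) -> bytes:
    """Approximate DKIM 'relaxed' body canonicalization (illustrative only)."""
    lines = body.replace("\r\n", "\n").split("\n")
    canon = []
    for line in lines:
        line = re.sub(r"[ \t]+", " ", line)  # collapse each whitespace run to one space
        canon.append(line.rstrip(" "))       # ignore whitespace at the end of the line
    while canon and canon[-1] == "":
        canon.pop()                          # ignore empty lines at the end of the body
    return "".join(line + "\r\n" for line in canon).encode("utf-8")


# Two hypothetical patch fragments: "return x" sits at a different indentation
# level, so the resulting Python functions behave differently.
patch_a = "+def f(x):\n+    if x:\n+        x += 1\n+    return x\n"
patch_b = "+def f(x):\n+    if x:\n+        x += 1\n+        return x\n"

hash_a = hashlib.sha256(relaxed_body(patch_a)).hexdigest()
hash_b = hashlib.sha256(relaxed_body(patch_b)).hexdigest()
print(hash_a == hash_b)
```

Both fragments canonicalize to the same bytes, so a body hash computed this way cannot tell them apart.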

So, while DKIM works well enough for regular domain-level email attestation, it still has significant drawbacks for attesting patches. Similarly, it does not provide significant developer identity assurances for patches sent via large public hosting services like Gmail, Fastmail, or others -- at best, we have proof that the email traversed their mail gateways (hopefully, after being properly authenticated).

Proposal

The goal of this document is to propose a scheme that would provide cryptographic attestation for all message contents necessary for trusted distributed code collaboration. It draws on the success of the DKIM standard in order to adapt (and adopt) it for this purpose.

X-Developer-Signature header

We implement a compatible subset of DKIM (RFC-6376) for developer attestation signatures, with some extra steps taken to make the workflow fit better with patches sent via DKIM-non-compliant mailing lists.

Differences from DKIM:

the d= field is not used (no domain signatures involved)

the q= field is not used (end-user tooling handles key lookup)

the c= field is not used (see below for canonicalization)

the i= field is optional, but if the canonical email address of the sender is not the same as the From: field, i= MUST be set to that canonical address

Canonicalization

We use the "relaxed/simple" canonicalization as defined by the DKIM standard, but the message is first parsed by "git-mailinfo" in order to achieve the following:

normalize any content-transfer-encoding modifications (convert back from base64/quoted-printable/etc into 8-bit)

use any encountered in-body git headers (From:, Subject:, Date:) to rewrite the outer message headers

perform any subject-line normalization in order to strip content not considered by git-am when applying the patch

To achieve this, the message is passed through git-mailinfo with the following flags:

cat orig.msg | git mailinfo --encoding=utf-8 m p > i

We then use the data found in "i" to replace the From:, Subject: and Date: headers of the original message, and concatenate "m" and "p" back together to form the body of the message, which is then normalized using CRLF line endings and the DKIM "simple" body canonicalization (any trailing blank lines are removed).

Any other headers included in signing are canonicalized using the "relaxed" header canonicalization routines defined in the DKIM standard.

In other words, the body and some of the headers are normalized and reconstituted using the "git-mailinfo" command, and then canonicalized using DKIM's relaxed/simple standard.
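The steps above can be sketched roughly as follows. This is not the POC's code: the function names are hypothetical, error handling is omitted, and rebuilding From: as "Author <Email>" from the git-mailinfo output is an assumption about how the fields would be recombined.

```python
import subprocess
import tempfile
from pathlib import Path


def parse_info(text: str) -> dict:
    """Parse the 'Key: value' lines that git-mailinfo prints on stdout."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            info[key] = value
    return info


def rewritten_headers(info: dict) -> dict:
    """Headers to substitute into the message (From: format is an assumption)."""
    return {
        "From": f'{info["Author"]} <{info["Email"]}>',
        "Subject": info["Subject"],
        "Date": info["Date"],
    }


def simple_body(body: bytes) -> bytes:
    """CRLF line endings plus DKIM 'simple' handling of trailing empty lines."""
    lines = body.replace(b"\r\n", b"\n").split(b"\n")
    while lines and lines[-1] == b"":
        lines.pop()
    return b"".join(line + b"\r\n" for line in lines) or b"\r\n"


def run_mailinfo(raw_msg: bytes) -> tuple:
    """Run git-mailinfo on a raw message; return (info dict, canonical body)."""
    with tempfile.TemporaryDirectory() as tmp:
        m, p = Path(tmp, "m"), Path(tmp, "p")
        out = subprocess.run(
            ["git", "mailinfo", "--encoding=utf-8", str(m), str(p)],
            input=raw_msg, capture_output=True, check=True,
        ).stdout
        info = parse_info(out.decode("utf-8", errors="replace"))
        body = m.read_bytes() + p.read_bytes()  # concatenate "m" and "p"
    return info, simple_body(body)
```

Calling run_mailinfo() requires git in the PATH; the other helpers are pure functions and can be exercised on their own.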

Algorithms

The DKIM standard mostly relies on RSA signatures, though RFC 8463 extends it to support ED25519 keys as well. Since our implementation is fully backward compatible with the DKIM standard, it is possible to use any of the DKIM-defined algorithms. However, for the purposes of this POC, we only support the following two signing/hashing algorithms:

ed25519-sha256: exactly as defined in RFC 8463

openpgp-sha256: uses OpenPGP to create the signature
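Both algorithms hash with SHA-256. As a rough sketch of the hashing side only (no signing), the following shows how a bh= value and a "relaxed" header line could be derived following the DKIM definitions. The function names are hypothetical, and how the POC assembles the final data to sign is not shown here.

```python
import base64
import hashlib
import re


def body_hash(canon_body: bytes, length=None) -> str:
    """Base64 SHA-256 of the canonicalized body, optionally limited to l= bytes."""
    data = canon_body if length is None else canon_body[:length]
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


def relaxed_header(name: str, value: str) -> str:
    """DKIM 'relaxed' header canonicalization for a single header field."""
    value = re.sub(r"\r?\n", "", value)      # unfold continuation lines
    value = re.sub(r"[ \t]+", " ", value)    # collapse whitespace runs
    value = value.strip(" \t")               # no whitespace around the value
    return f"{name.lower()}:{value}\r\n"     # lowercase the header name


print(relaxed_header("Subject", "  [PATCH]  fix\r\n\tthe   thing "))
print(body_hash(b"example body\r\n"))
```

For ed25519-sha256, RFC 8463 specifies that the SHA-256 digest of the canonicalized header data is what gets signed with the Ed25519 key.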

POC code

The provided POC code in main.py is pretty feature-complete, though it probably needs further improvements to properly deal with corner-cases. You will notice that it's only a few hundred lines of Python code and does not require any external libraries/programs except libsodium and GnuPG for crypto, plus git for message canonicalization. All of these are already likely to be present on a developer's workstation.

Running the code

The POC code is written in Python and requires the PyNaCl library in order to work. Chances are, PyNaCl is already installed on your platform, but if it isn't, you can install it via a venv:

$ python3 -m venv .venv
$ source .venv/bin/activate
$ pip install --upgrade pip
$ pip install -r requirements.txt

Or you can achieve the same using OS packaging:

# dnf install python3-pynacl
# apt install python3-nacl

You should also have git and gpg available as external commands in your PATH.

ED25519 signatures

ED25519 is the "nothing up my sleeve" implementation of Elliptic-Curve Cryptography (ECC) favoured by free software enthusiasts. Its primary benefits are algorithmic speed of all crypto operations and relative smallness of both public/private keys and generated signatures.

To sign an email using a bundled ed25519 key, run:

$ ./main.py sign-ed25519 -k dev.key
SIGNING : ED25519 using dev.key
MSGSRC  : emails/dev-unsigned.eml
--- SIGNED MESSAGE STARTS ---
[...]
X-Developer-Signature: v=1; a=ed25519-sha256; h=from:subject:date:message-id;
 l=1003; bh=Pfwl/zDlAoe9nkYNQPcgDFscfSQdrGvx4kAzrnQdNQ8=;
 b=WyAu9nzYMUg2ntOfnvEBpa1vLQemK7axjAVu+hhYh6VyeFmB5jKzC2TcF+2IOjfG3eGl/XNY0EWc
 HUh2tF02AQwiKDVDG7mTmP1/SPpNvotD0mTWQk6LyltWKFBUpRhn

If you've ever seen email headers, you'll notice how very similar the X-Developer-Signature is to the DKIM-Signature header.
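Because the header uses the same tag=value syntax as DKIM-Signature, it can be split into its fields with a few lines of code. The parser below is a minimal sketch rather than the POC's implementation; the function name is made up, and the sample value is the header from the example above.

```python
import re


def parse_signature(value: str) -> dict:
    """Split a folded tag=value header into a dict of tags."""
    unfolded = re.sub(r"\r?\n[ \t]+", " ", value)  # undo header folding
    tags = {}
    for part in unfolded.split(";"):
        name, sep, val = part.partition("=")       # split on the first "=" only
        if not sep:
            continue
        name, val = name.strip(), val.strip()
        if name in ("b", "bh"):
            val = re.sub(r"\s+", "", val)          # base64 data may be folded
        tags[name] = val
    return tags


header = (
    "v=1; a=ed25519-sha256; h=from:subject:date:message-id;\n"
    " l=1003; bh=Pfwl/zDlAoe9nkYNQPcgDFscfSQdrGvx4kAzrnQdNQ8=;\n"
    " b=WyAu9nzYMUg2ntOfnvEBpa1vLQemK7axjAVu+hhYh6VyeFmB5jKzC2TcF+2IOjfG3eGl/XNY0EWc\n"
    " HUh2tF02AQwiKDVDG7mTmP1/SPpNvotD0mTWQk6LyltWKFBUpRhn"
)
tags = parse_signature(header)
print(tags["a"], tags["h"].split(":"))
```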

OpenPGP signatures

OpenPGP is not really an "algorithm," so this is merely an indicator that the signature is created using an OpenPGP-compliant application. Here it is in action, though you will need to use your own PGP key if you want to try it:

$ ./main.py -m emails/mricon-unsigned.eml sign-pgp -k B6C41CE35664996C
SIGNING : PGP using B6C41CE35664996C
MSGSRC  : emails/mricon-unsigned.eml
--- SIGNED MESSAGE STARTS ---
[...]
X-Developer-Signature: v=1; a=openpgp-sha256; h=from:subject:date:message-id;
 l=1002; bh=g2Sv1ZR+jIrWukzdXbqb+aeiqyFQOBLDQY6z0BBnGg4=;
 b=owGbwMvMwCG27YjM47CUmTmMp9WSGBK6vn316Z1bbjJ5DWNEgimHTc6Kx4HfTpzYcOzp9e/2jc/v
 Lg7J7ChlYRDjYJAVU2Qp2xe7KajwoYdceo8pzBxWJpAhDFycAjCRBn5Ghrc/7otaV1yX6I4/sNf056
 vmzjen3bn2Rk8X9GTuZd2/aQ0jw7fZJ2Pi36/X2fTK4cSnX/++nbAzsm0TObX4SpbBsrRHe/gA

OpenPGP supports ed25519 keys as well, so in reality the signature is made with my own ed25519 subkey, but it is further wrapped in the OpenPGP header data, which is why it is longer than the ed25519 signature in the example above. It is created using the following GnuPG parameters:

gpg -s -u KEYID < binary-hash-to-sign

Distributing keys

The difficult part of various PKI schemes is not really the cryptography, but initial trust bootstrap and key distribution. In our case, we sidestep trust bootstrap entirely and focus solely on developer key distribution. We propose doing it via the git repository itself, borrowing the idea from the people behind the did:git project.

Using git to track contributor keys

Consider the workflow of a Linux kernel subsystem maintainer. While a single maintainer may receive patches from hundreds of people, they will likely have a fairly small subset of developers with whom they collaborate on an ongoing basis. As their relationship trust builds, the maintainer may wish to implement an attestation mechanism to verify that patches submitted by trusted lieutenants are not corrupted or modified by malicious actors en route.

The proposed POC offers several ways of achieving this:

tracking the keys in a regular development branch
tracking the keys in a special dedicated branch
tracking the keys in a dedicated git repository

Using the regular development branch

Smaller projects with fewer contributors may simply choose to bundle developer keys with their source code. The POC uses the toplevel .keys directory for this purpose, with the following structure:

.keys
 \- sigtype
  \- domain
   \- local
    \- selector

So, for an ed25519 signature from dev@example.org, the public key needed for signature verification would be contained in:

.keys
 \- ed25519
  \- example.org
   \- dev
    \- default

The "default" filename is used when there is no other s= selector specified in the signature header.

NB: Since domain/local/selector values are taken from untrusted sources, they should be urlencoded before attempting to locate the public key on disk or via any commands passed to "git show".
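A minimal sketch of such a lookup, assuming the directory layout shown above. The function name is hypothetical. Note that urlencoding alone leaves "." and ".." unchanged, so this sketch additionally rejects those components; that extra check is an assumption of this example, not something the POC is known to do.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def key_path(sigtype: str, identity: str, selector: str = "default",
             topdir: str = ".keys") -> str:
    """Build the in-repository path of a public key from untrusted values."""
    local, sep, domain = identity.rpartition("@")
    if not sep:
        raise ValueError("identity must look like local@domain")
    # safe="" makes quote() encode "/" as well, so no value can add a path level
    parts = [quote(p, safe="") for p in (sigtype, domain, local, selector)]
    if any(p in ("", ".", "..") for p in parts):
        raise ValueError("empty or relative path component")
    return str(PurePosixPath(topdir, *parts))


print(key_path("ed25519", "dev@example.org"))
print(key_path("ed25519", "../../etc@example.org"))
```

The first call yields .keys/ed25519/example.org/dev/default; in the second, the slashes in the local part are percent-encoded, so the result stays inside the key directory.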

Using a dedicated ref

For a project the size of the Linux kernel, it would be too onerous to track the keys of all contributors centrally, so individual subsystem maintainers will likely want to track their own subsets of keys from just the developers with whom they work on a regular basis. Using the regular development branch would be too inconvenient in this case, since it would interfere with upstream work, so it makes sense to use a separate branch for this purpose, e.g. "refs/heads/keys", that contains just the keys directory with no other content.

Participating contributors can then submit key additions and changes as regular patches or pull requests and the maintainer merely needs to remember to apply them to the proper key management branch.

Using a dedicated git repository

Similarly, instead of using a dedicated branch, maintainers may choose to use a wholly separate git repository for this purpose. This may be useful if the same set of developers work on multiple projects.

Key formats for ED25519 and OpenPGP

The public keys should be in the following format:

ed25519: base64-encoded string
openpgp: any format that can be passed to "gpg --import", but preferably an ascii-armored key export

In the case of verifying PGP signatures, the POC implementation will create a temporary keyring containing just the imported key, so it should never clash with the default keyring.

Using the default GnuPG keyring

It is up to the implementation whether to fall back to the default GnuPG keyring when checking openpgp signatures. The POC code will do so and will additionally warn if the key has insufficient trust (this check is meaningless for in-git bundled keys, so it is not performed).

Rotating and revoking keys

Keys can be retired or replaced at any time by merely changing them in the repository, committing, and pushing (or submitting a pull request/patch to the maintainer with the change). Maintainers can then pull the change or apply the patch and push it out to all other participating co-maintainers.

Contributors can have multiple valid keys if they properly specify the selector when adding signatures -- or the verification tooling can simply iterate through all keys listed in the directory for that domain/local to find the matching one.
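The "iterate through all keys" fallback could look roughly like the sketch below. The function name and the verify callback are hypothetical: verify stands in for whatever routine checks the signature against one candidate public key and returns True on success.

```python
from pathlib import Path


def find_matching_key(keydir, verify):
    """Return the selector (file name) of the first key accepted by verify()."""
    keydir = Path(keydir)
    if not keydir.is_dir():
        return None
    for candidate in sorted(keydir.iterdir()):
        if not candidate.is_file():
            continue  # subdirectories are not searched
        if verify(candidate.read_bytes()):
            return candidate.name
    return None
```

For example, find_matching_key(".keys/ed25519/example.org/dev", verify) would try each key file stored for dev@example.org in turn. Only regular files directly inside the directory are tried, so keys that have been moved into a subdirectory are never considered.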

Revoked keys can simply be deleted, or moved into the revoked/ subdirectory, perhaps with an explanation of why they were revoked.

Verifying keys before accepting them

As stated earlier, bootstrapping trust remains a hard problem. We do not aim to resolve it here and will cowardly defer to the participating maintainers to pick their preferred key verification strategy, e.g.:

meeting up in person at a conference and exchanging keys
holding a video session and reciting fingerprints (or entire keys, in the case of ed25519)
using an email round-trip as proof of key ownership

This can be as lax or as strict as maintainers choose (though if the procedure is too lax, then the whole point of cryptographic attestation becomes moot).

Trusting the git repository

Obviously, if keys are distributed via git, then one must trust git itself and the commit provenance. This, again, is a "bootstrapping trust" sort of problem that we promised to side-step, but we can at least give the following recommendations:

the person maintaining the keyring should PGP-sign all commits modifying public key contents
the repository itself should initially be cloned from trusted sources over secure protocols

We hope to provide a separate best-practices document aimed at keyring maintainers, should this scheme be adopted.

Automating patch attestation

The git-send-email application supports executing a validation hook before sending out patches. The end-user tooling should provide git hook integration so that patches are automatically attested every time "git-send-email" is used.

We aim to provide a lightweight attestation utility for this purpose, as well as implement all necessary verification routines in "b4" client-side tooling used by many Linux developers for their patch workflow.