author | Junio C Hamano <gitster@pobox.com> | 2022-08-14 23:19:27 -0700 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2022-08-14 23:19:28 -0700 |
commit | c0f6dd49f19b6a5c74863c42c2677aade3a142ec (patch) | |
tree | 0992ce81ac210d0a45499c9597eeb54a77ec8413 /Documentation/technical | |
parent | 3adacc2817bf4644928b9430c7c6ed1ca2ef2655 (diff) | |
parent | 1e2320161d27684205f55ffa91f7f481d32863d5 (diff) | |
download | git-c0f6dd49f19b6a5c74863c42c2677aade3a142ec.tar.gz |
Merge branch 'ab/tech-docs-to-help'

Expose a lot of "tech docs" via "git help" interface.

* ab/tech-docs-to-help:
  docs: move http-protocol docs to man section 5
  docs: move cruft pack docs to gitformat-pack
  docs: move pack format docs to man section 5
  docs: move signature docs to man section 5
  docs: move index format docs to man section 5
  docs: move protocol-related docs to man section 5
  docs: move commit-graph format docs to man section 5
  git docs: add a category for file formats, protocols and interfaces
  git docs: add a category for user-facing file, repo and command UX
  git help doc: use "<doc>" instead of "<guide>"
  help.c: remove common category behavior from drop_prefix() behavior
  help.c: refactor drop_prefix() to use a "switch" statement"
Diffstat (limited to 'Documentation/technical')
-rw-r--r-- | Documentation/technical/api-simple-ipc.txt | 2 |
-rw-r--r-- | Documentation/technical/bundle-format.txt | 81 |
-rw-r--r-- | Documentation/technical/chunk-format.txt | 116 |
-rw-r--r-- | Documentation/technical/commit-graph-format.txt | 166 |
-rw-r--r-- | Documentation/technical/cruft-packs.txt | 123 |
-rw-r--r-- | Documentation/technical/hash-function-transition.txt | 2 |
-rw-r--r-- | Documentation/technical/http-protocol.txt | 522 |
-rw-r--r-- | Documentation/technical/index-format.txt | 404 |
-rw-r--r-- | Documentation/technical/long-running-process-protocol.txt | 2 |
-rw-r--r-- | Documentation/technical/pack-format.txt | 484 |
-rw-r--r-- | Documentation/technical/pack-protocol.txt | 709 |
-rw-r--r-- | Documentation/technical/packfile-uri.txt | 2 |
-rw-r--r-- | Documentation/technical/partial-clone.txt | 2 |
-rw-r--r-- | Documentation/technical/protocol-capabilities.txt | 380 |
-rw-r--r-- | Documentation/technical/protocol-common.txt | 99 |
-rw-r--r-- | Documentation/technical/protocol-v2.txt | 568 |
-rw-r--r-- | Documentation/technical/signature-format.txt | 202 |
17 files changed, 5 insertions, 3859 deletions
diff --git a/Documentation/technical/api-simple-ipc.txt b/Documentation/technical/api-simple-ipc.txt index d79ad323e6..d44ada98e7 100644 --- a/Documentation/technical/api-simple-ipc.txt +++ b/Documentation/technical/api-simple-ipc.txt @@ -78,7 +78,7 @@ client and an optional response message from the server. Both the client and server messages are unlimited in length and are terminated with a flush packet. -The pkt-line routines (Documentation/technical/protocol-common.txt) +The pkt-line routines (linkgit:gitprotocol-common[5]) are used to simplify buffer management during message generation, transmission, and reception. A flush packet is used to mark the end of the message. This allows the sender to incrementally generate and diff --git a/Documentation/technical/bundle-format.txt b/Documentation/technical/bundle-format.txt deleted file mode 100644 index b9be8644cf..0000000000 --- a/Documentation/technical/bundle-format.txt +++ /dev/null @@ -1,81 +0,0 @@ -= Git bundle v2 format - -The Git bundle format is a format that represents both refs and Git objects. - -== Format - -We will use ABNF notation to define the Git bundle format. See -protocol-common.txt for the details. - -A v2 bundle looks like this: - ----- -bundle = signature *prerequisite *reference LF pack -signature = "# v2 git bundle" LF - -prerequisite = "-" obj-id SP comment LF -comment = *CHAR -reference = obj-id SP refname LF - -pack = ... ; packfile ----- - -A v3 bundle looks like this: - ----- -bundle = signature *capability *prerequisite *reference LF pack -signature = "# v3 git bundle" LF - -capability = "@" key ["=" value] LF -prerequisite = "-" obj-id SP comment LF -comment = *CHAR -reference = obj-id SP refname LF -key = 1*(ALPHA / DIGIT / "-") -value = *(%01-09 / %0b-FF) - -pack = ... ; packfile ----- - -== Semantics - -A Git bundle consists of several parts. - -* "Capabilities", which are only in the v3 format, indicate functionality that - the bundle requires to be read properly. 
- -* "Prerequisites" lists the objects that are NOT included in the bundle and the - reader of the bundle MUST already have, in order to use the data in the - bundle. The objects stored in the bundle may refer to prerequisite objects and - anything reachable from them (e.g. a tree object in the bundle can reference - a blob that is reachable from a prerequisite) and/or expressed as a delta - against prerequisite objects. - -* "References" record the tips of the history graph, iow, what the reader of the - bundle CAN "git fetch" from it. - -* "Pack" is the pack data stream "git fetch" would send, if you fetch from a - repository that has the references recorded in the "References" above into a - repository that has references pointing at the objects listed in - "Prerequisites" above. - -In the bundle format, there can be a comment following a prerequisite obj-id. -This is a comment and it has no specific meaning. The writer of the bundle MAY -put any string here. The reader of the bundle MUST ignore the comment. - -=== Note on the shallow clone and a Git bundle - -Note that the prerequisites does not represent a shallow-clone boundary. The -semantics of the prerequisites and the shallow-clone boundaries are different, -and the Git bundle v2 format cannot represent a shallow clone repository. - -== Capabilities - -Because there is no opportunity for negotiation, unknown capabilities cause 'git -bundle' to abort. - -* `object-format` specifies the hash algorithm in use, and can take the same - values as the `extensions.objectFormat` configuration value. - -* `filter` specifies an object filter as in the `--filter` option in - linkgit:git-rev-list[1]. The resulting pack-file must be marked as a - `.promisor` pack-file after it is unbundled. 
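The header grammar in the removed bundle-format.txt can be read with a short line-oriented parser. The following is a minimal sketch under stated assumptions, not Git's implementation: `parse_bundle_header` and `SIGNATURES` are hypothetical names, the order of capability, prerequisite and reference lines is not enforced, and rejecting unknown capabilities (which the text says makes 'git bundle' abort) is left to the caller.

```python
import io

# Signature lines from the v2 and v3 grammars quoted above.
SIGNATURES = {b"# v2 git bundle\n": 2, b"# v3 git bundle\n": 3}


def parse_bundle_header(stream):
    """Parse a bundle header from a binary stream (hypothetical helper).

    Returns (version, capabilities, prerequisites, references) and leaves
    the stream positioned at the first byte of the pack data.
    """
    version = SIGNATURES.get(stream.readline())
    if version is None:
        raise ValueError("not a v2 or v3 git bundle")

    capabilities, prerequisites, references = {}, [], []
    while True:
        line = stream.readline()
        if line == b"\n":
            break  # the lone LF separates the header from the pack
        if not line.endswith(b"\n"):
            raise ValueError("truncated bundle header")
        body = line[:-1]
        if body.startswith(b"@"):
            # capability = "@" key ["=" value] LF (v3 only)
            if version < 3:
                raise ValueError("capabilities require a v3 bundle")
            key, sep, value = body[1:].partition(b"=")
            capabilities[key.decode("ascii")] = value if sep else None
        elif body.startswith(b"-"):
            # prerequisite = "-" obj-id SP comment LF; the comment has no meaning
            obj_id, _, comment = body[1:].partition(b" ")
            prerequisites.append((obj_id.decode("ascii"), comment))
        else:
            # reference = obj-id SP refname LF
            obj_id, _, refname = body.partition(b" ")
            references.append((obj_id.decode("ascii"), refname.decode()))
    return version, capabilities, prerequisites, references


# Usage with a made-up header; the object IDs are placeholders.
sample = io.BytesIO(
    b"# v3 git bundle\n"
    b"@object-format=sha1\n"
    b"-" + b"a" * 40 + b" base commit\n"
    + b"b" * 40 + b" refs/heads/main\n"
    b"\n"
    b"PACK"
)
version, caps, prereqs, refs = parse_bundle_header(sample)
```

Capability values are kept as bytes because the grammar allows nearly any byte in `value`; only the key is restricted to ASCII letters, digits and "-".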
diff --git a/Documentation/technical/chunk-format.txt b/Documentation/technical/chunk-format.txt deleted file mode 100644 index 593614fced..0000000000 --- a/Documentation/technical/chunk-format.txt +++ /dev/null @@ -1,116 +0,0 @@ -Chunk-based file formats -======================== - -Some file formats in Git use a common concept of "chunks" to describe -sections of the file. This allows structured access to a large file by -scanning a small "table of contents" for the remaining data. This common -format is used by the `commit-graph` and `multi-pack-index` files. See -link:technical/pack-format.html[the `multi-pack-index` format] and -link:technical/commit-graph-format.html[the `commit-graph` format] for -how they use the chunks to describe structured data. - -A chunk-based file format begins with some header information custom to -that format. That header should include enough information to identify -the file type, format version, and number of chunks in the file. From this -information, that file can determine the start of the chunk-based region. - -The chunk-based region starts with a table of contents describing where -each chunk starts and ends. This consists of (C+1) rows of 12 bytes each, -where C is the number of chunks. Consider the following table: - - | Chunk ID (4 bytes) | Chunk Offset (8 bytes) | - |--------------------|------------------------| - | ID[0] | OFFSET[0] | - | ... | ... | - | ID[C] | OFFSET[C] | - | 0x0000 | OFFSET[C+1] | - -Each row consists of a 4-byte chunk identifier (ID) and an 8-byte offset. -Each integer is stored in network-byte order. - -The chunk identifier `ID[i]` is a label for the data stored within this -fill from `OFFSET[i]` (inclusive) to `OFFSET[i+1]` (exclusive). Thus, the -size of the `i`th chunk is equal to the difference between `OFFSET[i+1]` -and `OFFSET[i]`. This requires that the chunk data appears contiguously -in the same order as the table of contents. 
- -The final entry in the table of contents must be four zero bytes. This -confirms that the table of contents is ending and provides the offset for -the end of the chunk-based data. - -Note: The chunk-based format expects that the file contains _at least_ a -trailing hash after `OFFSET[C+1]`. - -Functions for working with chunk-based file formats are declared in -`chunk-format.h`. Using these methods provide extra checks that assist -developers when creating new file formats. - -Writing chunk-based file formats --------------------------------- - -To write a chunk-based file format, create a `struct chunkfile` by -calling `init_chunkfile()` and pass a `struct hashfile` pointer. The -caller is responsible for opening the `hashfile` and writing header -information so the file format is identifiable before the chunk-based -format begins. - -Then, call `add_chunk()` for each chunk that is intended for write. This -populates the `chunkfile` with information about the order and size of -each chunk to write. Provide a `chunk_write_fn` function pointer to -perform the write of the chunk data upon request. - -Call `write_chunkfile()` to write the table of contents to the `hashfile` -followed by each of the chunks. This will verify that each chunk wrote -the expected amount of data so the table of contents is correct. - -Finally, call `free_chunkfile()` to clear the `struct chunkfile` data. The -caller is responsible for finalizing the `hashfile` by writing the trailing -hash and closing the file. - -Reading chunk-based file formats --------------------------------- - -To read a chunk-based file format, the file must be opened as a -memory-mapped region. The chunk-format API expects that the entire file -is mapped as a contiguous memory region. - -Initialize a `struct chunkfile` pointer with `init_chunkfile(NULL)`. 
- -After reading the header information from the beginning of the file, -including the chunk count, call `read_table_of_contents()` to populate -the `struct chunkfile` with the list of chunks, their offsets, and their -sizes. - -Extract the data information for each chunk using `pair_chunk()` or -`read_chunk()`: - -* `pair_chunk()` assigns a given pointer with the location inside the - memory-mapped file corresponding to that chunk's offset. If the chunk - does not exist, then the pointer is not modified. - -* `read_chunk()` takes a `chunk_read_fn` function pointer and calls it - with the appropriate initial pointer and size information. The function - is not called if the chunk does not exist. Use this method to read chunks - if you need to perform immediate parsing or if you need to execute logic - based on the size of the chunk. - -After calling these methods, call `free_chunkfile()` to clear the -`struct chunkfile` data. This will not close the memory-mapped region. -Callers are expected to own that data for the timeframe the pointers into -the region are needed. - -Examples --------- - -These file formats use the chunk-format API, and can be used as examples -for future formats: - -* *commit-graph:* see `write_commit_graph_file()` and `parse_commit_graph()` - in `commit-graph.c` for how the chunk-format API is used to write and - parse the commit-graph file format documented in - link:technical/commit-graph-format.html[the commit-graph file format]. - -* *multi-pack-index:* see `write_midx_internal()` and `load_multi_pack_index()` - in `midx.c` for how the chunk-format API is used to write and - parse the multi-pack-index file format documented in - link:technical/pack-format.html[the multi-pack-index file format]. 
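The table-of-contents layout described in the removed chunk-format.txt (C+1 rows of a 4-byte ID and an 8-byte offset, network byte order, terminated by a zero ID) can be illustrated outside of Git's C API. This is a sketch only: `parse_toc` is a hypothetical helper, unrelated to the `read_table_of_contents()` function declared in `chunk-format.h`, and the header bytes and chunk IDs in the usage lines are made up.

```python
import struct


def parse_toc(data, toc_offset, chunk_count):
    """Return {chunk_id: (start, end)} for a chunk-based file image.

    `data` is the whole file as bytes, `toc_offset` is where the table of
    contents begins, and `chunk_count` is C from the format-specific header.
    """
    rows = []
    for i in range(chunk_count + 1):
        # Each row: 4-byte chunk ID followed by an 8-byte big-endian offset.
        chunk_id, offset = struct.unpack_from(">4sQ", data, toc_offset + 12 * i)
        rows.append((chunk_id, offset))

    if rows[-1][0] != b"\0\0\0\0":
        raise ValueError("table of contents lacks the terminating zero ID")

    chunks = {}
    # Chunk i spans OFFSET[i] (inclusive) to OFFSET[i+1] (exclusive).
    for (chunk_id, start), (_, end) in zip(rows, rows[1:]):
        if end < start or end > len(data):
            raise ValueError("chunk offsets are not contiguous and in order")
        if chunk_id in chunks:
            raise ValueError("duplicate chunk ID")
        chunks[chunk_id] = (start, end)
    return chunks


# Usage: an 8-byte stand-in header, a 3-row table (C = 2), then chunk data.
header = b"HDRxxxxx"
toc = (
    struct.pack(">4sQ", b"AAAA", 44)
    + struct.pack(">4sQ", b"BBBB", 49)
    + struct.pack(">4sQ", b"\0\0\0\0", 52)
)
image = header + toc + b"hello" + b"abc" + b"<trailing hash>"
chunks = parse_toc(image, len(header), 2)
```

The size of each chunk falls out of adjacent offsets, which is why the terminating row is needed even though it names no chunk.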
diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt deleted file mode 100644 index 484b185ba9..0000000000 --- a/Documentation/technical/commit-graph-format.txt +++ /dev/null @@ -1,166 +0,0 @@ -Git commit graph format -======================= - -The Git commit graph stores a list of commit OIDs and some associated -metadata, including: - -- The generation number of the commit. - -- The root tree OID. - -- The commit date. - -- The parents of the commit, stored using positional references within - the graph file. - -- The Bloom filter of the commit carrying the paths that were changed between - the commit and its first parent, if requested. - -These positional references are stored as unsigned 32-bit integers -corresponding to the array position within the list of commit OIDs. Due -to some special constants we use to track parents, we can store at most -(1 << 30) + (1 << 29) + (1 << 28) - 1 (around 1.8 billion) commits. - -== Commit graph files have the following format: - -In order to allow extensions that add extra data to the graph, we organize -the body into "chunks" and provide a binary lookup table at the beginning -of the body. The header includes certain values, such as number of chunks -and hash type. - -All multi-byte numbers are in network byte order. - -HEADER: - - 4-byte signature: - The signature is: {'C', 'G', 'P', 'H'} - - 1-byte version number: - Currently, the only valid version is 1. - - 1-byte Hash Version - We infer the hash length (H) from this value: - 1 => SHA-1 - 2 => SHA-256 - If the hash type does not match the repository's hash algorithm, the - commit-graph file should be ignored with a warning presented to the - user. - - 1-byte number (C) of "chunks" - - 1-byte number (B) of base commit-graphs - We infer the length (H*B) of the Base Graphs chunk - from this value. 
- -CHUNK LOOKUP: - - (C + 1) * 12 bytes listing the table of contents for the chunks: - First 4 bytes describe the chunk id. Value 0 is a terminating label. - Other 8 bytes provide the byte-offset in current file for chunk to - start. (Chunks are ordered contiguously in the file, so you can infer - the length using the next chunk position if necessary.) Each chunk - ID appears at most once. - - The CHUNK LOOKUP matches the table of contents from - link:technical/chunk-format.html[the chunk-based file format]. - - The remaining data in the body is described one chunk at a time, and - these chunks may be given in any order. Chunks are required unless - otherwise specified. - -CHUNK DATA: - - OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes) - The ith entry, F[i], stores the number of OIDs with first - byte at most i. Thus F[255] stores the total - number of commits (N). - - OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes) - The OIDs for all commits in the graph, sorted in ascending order. - - Commit Data (ID: {'C', 'D', 'A', 'T' }) (N * (H + 16) bytes) - * The first H bytes are for the OID of the root tree. - * The next 8 bytes are for the positions of the first two parents - of the ith commit. Stores value 0x70000000 if no parent in that - position. If there are more than two parents, the second value - has its most-significant bit on and the other bits store an array - position into the Extra Edge List chunk. - * The next 8 bytes store the topological level (generation number v1) - of the commit and - the commit time in seconds since EPOCH. The generation number - uses the higher 30 bits of the first 4 bytes, while the commit - time uses the 32 bits of the second 4 bytes, along with the lowest - 2 bits of the lowest byte, storing the 33rd and 34th bit of the - commit time. 
- - Generation Data (ID: {'G', 'D', 'A', '2' }) (N * 4 bytes) [Optional] - * This list of 4-byte values store corrected commit date offsets for the - commits, arranged in the same order as commit data chunk. - * If the corrected commit date offset cannot be stored within 31 bits, - the value has its most-significant bit on and the other bits store - the position of corrected commit date into the Generation Data Overflow - chunk. - * Generation Data chunk is present only when commit-graph file is written - by compatible versions of Git and in case of split commit-graph chains, - the topmost layer also has Generation Data chunk. - - Generation Data Overflow (ID: {'G', 'D', 'O', '2' }) [Optional] - * This list of 8-byte values stores the corrected commit date offsets - for commits with corrected commit date offsets that cannot be - stored within 31 bits. - * Generation Data Overflow chunk is present only when Generation Data - chunk is present and atleast one corrected commit date offset cannot - be stored within 31 bits. - - Extra Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional] - This list of 4-byte values store the second through nth parents for - all octopus merges. The second parent value in the commit data stores - an array position within this list along with the most-significant bit - on. Starting at that array position, iterate through this list of commit - positions for the parents until reaching a value with the most-significant - bit on. The other bits correspond to the position of the last parent. - - Bloom Filter Index (ID: {'B', 'I', 'D', 'X'}) (N * 4 bytes) [Optional] - * The ith entry, BIDX[i], stores the number of bytes in all Bloom filters - from commit 0 to commit i (inclusive) in lexicographic order. The Bloom - filter for the i-th commit spans from BIDX[i-1] to BIDX[i] (plus header - length), where BIDX[-1] is 0. - * The BIDX chunk is ignored if the BDAT chunk is not present. 
- - Bloom Filter Data (ID: {'B', 'D', 'A', 'T'}) [Optional] - * It starts with header consisting of three unsigned 32-bit integers: - - Version of the hash algorithm being used. We currently only support - value 1 which corresponds to the 32-bit version of the murmur3 hash - implemented exactly as described in - https://en.wikipedia.org/wiki/MurmurHash#Algorithm and the double - hashing technique using seed values 0x293ae76f and 0x7e646e2 as - described in https://doi.org/10.1007/978-3-540-30494-4_26 "Bloom Filters - in Probabilistic Verification" - - The number of times a path is hashed and hence the number of bit positions - that cumulatively determine whether a file is present in the commit. - - The minimum number of bits 'b' per entry in the Bloom filter. If the filter - contains 'n' entries, then the filter size is the minimum number of 64-bit - words that contain n*b bits. - * The rest of the chunk is the concatenation of all the computed Bloom - filters for the commits in lexicographic order. - * Note: Commits with no changes or more than 512 changes have Bloom filters - of length one, with either all bits set to zero or one respectively. - * The BDAT chunk is present if and only if BIDX is present. - - Base Graphs List (ID: {'B', 'A', 'S', 'E'}) [Optional] - This list of H-byte hashes describe a set of B commit-graph files that - form a commit-graph chain. The graph position for the ith commit in this - file's OID Lookup chunk is equal to i plus the number of commits in all - base graphs. If B is non-zero, this chunk must exist. - -TRAILER: - - H-byte HASH-checksum of all of the above. - -== Historical Notes: - -The Generation Data (GDA2) and Generation Data Overflow (GDO2) chunks have -the number '2' in their chunk IDs because a previous version of Git wrote -possibly erroneous data in these chunks with the IDs "GDAT" and "GDOV". 
By -changing the IDs, newer versions of Git will silently ignore those older -chunks and write the new information without trusting the incorrect data. diff --git a/Documentation/technical/cruft-packs.txt b/Documentation/technical/cruft-packs.txt deleted file mode 100644 index d81f3a8982..0000000000 --- a/Documentation/technical/cruft-packs.txt +++ /dev/null @@ -1,123 +0,0 @@ -= Cruft packs - -The cruft packs feature offer an alternative to Git's traditional mechanism of -removing unreachable objects. This document provides an overview of Git's -pruning mechanism, and how a cruft pack can be used instead to accomplish the -same. - -== Background - -To remove unreachable objects from your repository, Git offers `git repack -Ad` -(see linkgit:git-repack[1]). Quoting from the documentation: - -[quote] -[...] unreachable objects in a previous pack become loose, unpacked objects, -instead of being left in the old pack. [...] loose unreachable objects will be -pruned according to normal expiry rules with the next 'git gc' invocation. - -Unreachable objects aren't removed immediately, since doing so could race with -an incoming push which may reference an object which is about to be deleted. -Instead, those unreachable objects are stored as loose objects and stay that way -until they are older than the expiration window, at which point they are removed -by linkgit:git-prune[1]. - -Git must store these unreachable objects loose in order to keep track of their -per-object mtimes. If these unreachable objects were written into one big pack, -then either freshening that pack (because an object contained within it was -re-written) or creating a new pack of unreachable objects would cause the pack's -mtime to get updated, and the objects within it would never leave the expiration -window. Instead, objects are stored loose in order to keep track of the -individual object mtimes and avoid a situation where all cruft objects are -freshened at once. 
- -This can lead to undesirable situations when a repository contains many -unreachable objects which have not yet left the grace period. Having large -directories in the shards of `.git/objects` can lead to decreased performance in -the repository. But given enough unreachable objects, this can lead to inode -starvation and degrade the performance of the whole system. Since we -can never pack those objects, these repositories often take up a large amount of -disk space, since we can only zlib compress them, but not store them in delta -chains. - -== Cruft packs - -A cruft pack eliminates the need for storing unreachable objects in a loose -state by including the per-object mtimes in a separate file alongside a single -pack containing all loose objects. - -A cruft pack is written by `git repack --cruft` when generating a new pack. -linkgit:git-pack-objects[1]'s `--cruft` option. Note that `git repack --cruft` -is a classic all-into-one repack, meaning that everything in the resulting pack is -reachable, and everything else is unreachable. Once written, the `--cruft` -option instructs `git repack` to generate another pack containing only objects -not packed in the previous step (which equates to packing all unreachable -objects together). This progresses as follows: - - 1. Enumerate every object, marking any object which is (a) not contained in a - kept-pack, and (b) whose mtime is within the grace period as a traversal - tip. - - 2. Perform a reachability traversal based on the tips gathered in the previous - step, adding every object along the way to the pack. - - 3. Write the pack out, along with a `.mtimes` file that records the per-object - timestamps. - -This mode is invoked internally by linkgit:git-repack[1] when instructed to -write a cruft pack. Crucially, the set of in-core kept packs is exactly the set -of packs which will not be deleted by the repack; in other words, they contain -all of the repository's reachable objects. 
- -When a repository already has a cruft pack, `git repack --cruft` typically only -adds objects to it. An exception to this is when `git repack` is given the -`--cruft-expiration` option, which allows the generated cruft pack to omit -expired objects instead of waiting for linkgit:git-gc[1] to expire those objects -later on. - -It is linkgit:git-gc[1] that is typically responsible for removing expired -unreachable objects. - -== Caution for mixed-version environments - -Repositories that have cruft packs in them will continue to work with any older -version of Git. Note, however, that previous versions of Git which do not -understand the `.mtimes` file will use the cruft pack's mtime as the mtime for -all of the objects in it. In other words, do not expect older (pre-cruft pack) -versions of Git to interpret or even read the contents of the `.mtimes` file. - -Note that having mixed versions of Git GC-ing the same repository can lead to -unreachable objects never being completely pruned. This can happen under the -following circumstances: - - - An older version of Git running GC explodes the contents of an existing - cruft pack loose, using the cruft pack's mtime. - - A newer version running GC collects those loose objects into a cruft pack, - where the .mtime file reflects the loose object's actual mtimes, but the - cruft pack mtime is "now". - -Repeating this process will lead to unreachable objects not getting pruned as a -result of repeatedly resetting the objects' mtimes to the present time. - -If you are GC-ing repositories in a mixed version environment, consider omitting -the `--cruft` option when using linkgit:git-repack[1] and linkgit:git-gc[1], and -leaving the `gc.cruftPacks` configuration unset until all writers understand -cruft packs. - -== Alternatives - -Notable alternatives to this design include: - - - The location of the per-object mtime data, and - - Storing unreachable objects in multiple cruft packs. 
- -On the location of mtime data, a new auxiliary file tied to the pack was chosen -to avoid complicating the `.idx` format. If the `.idx` format were ever to gain -support for optional chunks of data, it may make sense to consolidate the -`.mtimes` format into the `.idx` itself. - -Storing unreachable objects among multiple cruft packs (e.g., creating a new -cruft pack during each repacking operation including only unreachable objects -which aren't already stored in an earlier cruft pack) is significantly more -complicated to construct, and so aren't pursued here. The obvious drawback to -the current implementation is that the entire cruft pack must be re-written from -scratch. diff --git a/Documentation/technical/hash-function-transition.txt b/Documentation/technical/hash-function-transition.txt index 260224b033..e2ac36dd21 100644 --- a/Documentation/technical/hash-function-transition.txt +++ b/Documentation/technical/hash-function-transition.txt @@ -205,7 +205,7 @@ SHA-1 content. Object storage ~~~~~~~~~~~~~~ Loose objects use zlib compression and packed objects use the packed -format described in Documentation/technical/pack-format.txt, just like +format described in linkgit:gitformat-pack[5], just like today. The content that is compressed and stored uses SHA-256 content instead of SHA-1 content. diff --git a/Documentation/technical/http-protocol.txt b/Documentation/technical/http-protocol.txt deleted file mode 100644 index cc5126cfed..0000000000 --- a/Documentation/technical/http-protocol.txt +++ /dev/null @@ -1,522 +0,0 @@ -HTTP transfer protocols -======================= - -Git supports two HTTP based transfer protocols. A "dumb" protocol -which requires only a standard HTTP server on the server end of the -connection, and a "smart" protocol which requires a Git aware CGI -(or server module). This document describes both protocols. - -As a design feature smart clients can automatically upgrade "dumb" -protocol URLs to smart URLs. 
This permits all users to have the -same published URL, and the peers automatically select the most -efficient transport available to them. - - -URL Format ----------- - -URLs for Git repositories accessed by HTTP use the standard HTTP -URL syntax documented by RFC 1738, so they are of the form: - - http://<host>:<port>/<path>?<searchpart> - -Within this documentation the placeholder `$GIT_URL` will stand for -the http:// repository URL entered by the end-user. - -Servers SHOULD handle all requests to locations matching `$GIT_URL`, as -both the "smart" and "dumb" HTTP protocols used by Git operate -by appending additional path components onto the end of the user -supplied `$GIT_URL` string. - -An example of a dumb client requesting for a loose object: - - $GIT_URL: http://example.com:8080/git/repo.git - URL request: http://example.com:8080/git/repo.git/objects/d0/49f6c27a2244e12041955e262a404c7faba355 - -An example of a smart request to a catch-all gateway: - - $GIT_URL: http://example.com/daemon.cgi?svc=git&q= - URL request: http://example.com/daemon.cgi?svc=git&q=/info/refs&service=git-receive-pack - -An example of a request to a submodule: - - $GIT_URL: http://example.com/git/repo.git/path/submodule.git - URL request: http://example.com/git/repo.git/path/submodule.git/info/refs - -Clients MUST strip a trailing `/`, if present, from the user supplied -`$GIT_URL` string to prevent empty path tokens (`//`) from appearing -in any URL sent to a server. Compatible clients MUST expand -`$GIT_URL/info/refs` as `foo/info/refs` and not `foo//info/refs`. - - -Authentication --------------- - -Standard HTTP authentication is used if authentication is required -to access a repository, and MAY be configured and enforced by the -HTTP server software. - -Because Git repositories are accessed by standard path components -server administrators MAY use directory based permissions within -their HTTP server to control repository access. 
- -Clients SHOULD support Basic authentication as described by RFC 2617. -Servers SHOULD support Basic authentication by relying upon the -HTTP server placed in front of the Git server software. - -Servers SHOULD NOT require HTTP cookies for the purposes of -authentication or access control. - -Clients and servers MAY support other common forms of HTTP based -authentication, such as Digest authentication. - - -SSL ---- - -Clients and servers SHOULD support SSL, particularly to protect -passwords when relying on Basic HTTP authentication. - - -Session State -------------- - -The Git over HTTP protocol (much like HTTP itself) is stateless -from the perspective of the HTTP server side. All state MUST be -retained and managed by the client process. This permits simple -round-robin load-balancing on the server side, without needing to -worry about state management. - -Clients MUST NOT require state management on the server side in -order to function correctly. - -Servers MUST NOT require HTTP cookies in order to function correctly. -Clients MAY store and forward HTTP cookies during request processing -as described by RFC 2616 (HTTP/1.1). Servers SHOULD ignore any -cookies sent by a client. - - -General Request Processing --------------------------- - -Except where noted, all standard HTTP behavior SHOULD be assumed -by both client and server. This includes (but is not necessarily -limited to): - -If there is no repository at `$GIT_URL`, or the resource pointed to by a -location matching `$GIT_URL` does not exist, the server MUST NOT respond -with `200 OK` response. A server SHOULD respond with -`404 Not Found`, `410 Gone`, or any other suitable HTTP status code -which does not imply the resource exists as requested. - -If there is a repository at `$GIT_URL`, but access is not currently -permitted, the server MUST respond with the `403 Forbidden` HTTP -status code. - -Servers SHOULD support both HTTP 1.0 and HTTP 1.1. 
-Servers SHOULD support chunked encoding for both request and response -bodies. - -Clients SHOULD support both HTTP 1.0 and HTTP 1.1. -Clients SHOULD support chunked encoding for both request and response -bodies. - -Servers MAY return ETag and/or Last-Modified headers. - -Clients MAY revalidate cached entities by including If-Modified-Since -and/or If-None-Match request headers. - -Servers MAY return `304 Not Modified` if the relevant headers appear -in the request and the entity has not changed. Clients MUST treat -`304 Not Modified` identical to `200 OK` by reusing the cached entity. - -Clients MAY reuse a cached entity without revalidation if the -Cache-Control and/or Expires header permits caching. Clients and -servers MUST follow RFC 2616 for cache controls. - - -Discovering References ----------------------- - -All HTTP clients MUST begin either a fetch or a push exchange by -discovering the references available on the remote repository. - -Dumb Clients -~~~~~~~~~~~~ - -HTTP clients that only support the "dumb" protocol MUST discover -references by making a request for the special info/refs file of -the repository. - -Dumb HTTP clients MUST make a `GET` request to `$GIT_URL/info/refs`, -without any search/query parameters. - - C: GET $GIT_URL/info/refs HTTP/1.0 - - S: 200 OK - S: - S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint - S: d049f6c27a2244e12041955e262a404c7faba355 refs/heads/master - S: 2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0 - S: a3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{} - -The Content-Type of the returned info/refs entity SHOULD be -`text/plain; charset=utf-8`, but MAY be any content type. -Clients MUST NOT attempt to validate the returned Content-Type. -Dumb servers MUST NOT return a return type starting with -`application/x-git-`. - -Cache-Control headers MAY be returned to disable caching of the -returned entity. - -When examining the response clients SHOULD only examine the HTTP -status code. 
Valid responses are `200 OK`, or `304 Not Modified`. - -The returned content is a UNIX formatted text file describing -each ref and its known value. The file SHOULD be sorted by name -according to the C locale ordering. The file SHOULD NOT include -the default ref named `HEAD`. - - info_refs = *( ref_record ) - ref_record = any_ref / peeled_ref - - any_ref = obj-id HTAB refname LF - peeled_ref = obj-id HTAB refname LF - obj-id HTAB refname "^{}" LF - -Smart Clients -~~~~~~~~~~~~~ - -HTTP clients that support the "smart" protocol (or both the -"smart" and "dumb" protocols) MUST discover references by making -a parameterized request for the info/refs file of the repository. - -The request MUST contain exactly one query parameter, -`service=$servicename`, where `$servicename` MUST be the service -name the client wishes to contact to complete the operation. -The request MUST NOT contain additional query parameters. - - C: GET $GIT_URL/info/refs?service=git-upload-pack HTTP/1.0 - -dumb server reply: - - S: 200 OK - S: - S: 95dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint - S: d049f6c27a2244e12041955e262a404c7faba355 refs/heads/master - S: 2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0 - S: a3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{} - -smart server reply: - - S: 200 OK - S: Content-Type: application/x-git-upload-pack-advertisement - S: Cache-Control: no-cache - S: - S: 001e# service=git-upload-pack\n - S: 0000 - S: 004895dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint\0multi_ack\n - S: 003fd049f6c27a2244e12041955e262a404c7faba355 refs/heads/master\n - S: 003c2cb58b79488a98d2721cea644875a8dd0026b115 refs/tags/v1.0\n - S: 003fa3c2e2402b99163d1d59756e5f207ae21cccba4c refs/tags/v1.0^{}\n - S: 0000 - -The client may send Extra Parameters (see -Documentation/technical/pack-protocol.txt) as a colon-separated string -in the Git-Protocol HTTP header. - -Uses the `--http-backend-info-refs` option to -linkgit:git-upload-pack[1]. 
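The four-character prefixes in the smart server reply above (`001e`, `0048`, `0000`, ...) are pkt-line length headers: four hex digits giving the total length of the record, including the four header bytes, with `0000` acting as the flush marker. The following is a minimal sketch of that framing in Python, not code taken from Git; the names `pkt_line`, `FLUSH` and `parse_pkt_lines` are made up for illustration.

```python
FLUSH = b"0000"  # flush-pkt: terminates a section of the stream


def pkt_line(payload: bytes) -> bytes:
    """Frame payload as one pkt-line: 4 hex digits of total length, then payload."""
    return b"%04x" % (len(payload) + 4) + payload


def parse_pkt_lines(data: bytes):
    """Yield each payload in a pkt-line stream; a flush-pkt is yielded as None."""
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 4], 16)
        if length == 0:
            # "0000" carries no payload and only occupies its 4 header bytes.
            yield None
            pos += 4
            continue
        if length < 4:
            raise ValueError("invalid pkt-line length: %d" % length)
        yield data[pos + 4:pos + length]
        pos += length


# Rebuild the start of the smart server reply shown above.
reply = (
    pkt_line(b"# service=git-upload-pack\n")
    + FLUSH
    + pkt_line(b"95dcfa3633004da0049d3d0fa03f80589cbcaf31 refs/heads/maint\0multi_ack\n")
    + FLUSH
)
records = list(parse_pkt_lines(reply))
```

Framing `# service=git-upload-pack\n` this way yields the `001e` prefix seen in the example, since the payload is 26 bytes long and the header adds 4.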
- -Dumb Server Response -^^^^^^^^^^^^^^^^^^^^ -Dumb servers MUST respond with the dumb server reply format. - -See the prior section under dumb clients for a more detailed -description of the dumb server response. - -Smart Server Response -^^^^^^^^^^^^^^^^^^^^^ -If the server does not recognize the requested service name, or the -requested service name has been disabled by the server administrator, -the server MUST respond with the `403 Forbidden` HTTP status code. - -Otherwise, smart servers MUST respond with the smart server reply -format for the requested service name. - -Cache-Control headers SHOULD be used to disable caching of the -returned entity. - -The Content-Type MUST be `application/x-$servicename-advertisement`. -Clients SHOULD fall back to the dumb protocol if another content -type is returned. When falling back to the dumb protocol clients -SHOULD NOT make an additional request to `$GIT_URL/info/refs`, but -instead SHOULD use the response already in hand. Clients MUST NOT -continue if they do not support the dumb protocol. - -Clients MUST validate the status code is either `200 OK` or -`304 Not Modified`. - -Clients MUST validate the first five bytes of the response entity -matches the regex `^[0-9a-f]{4}#`. If this test fails, clients -MUST NOT continue. - -Clients MUST parse the entire response as a sequence of pkt-line -records. - -Clients MUST verify the first pkt-line is `# service=$servicename`. -Servers MUST set $servicename to be the request parameter value. -Servers SHOULD include an LF at the end of this line. -Clients MUST ignore an LF at the end of the line. - -Servers MUST terminate the response with the magic `0000` end -pkt-line marker. - -The returned response is a pkt-line stream describing each ref and -its known value. The stream SHOULD be sorted by name according to -the C locale ordering. The stream SHOULD include the default ref -named `HEAD` as the first ref. 
The stream MUST include capability
-declarations behind a NUL on the first ref.
-
-The returned response contains "version 1" if "version=1" was sent as an
-Extra Parameter.
-
-  smart_reply     =  PKT-LINE("# service=$servicename" LF)
-                     "0000"
-                     *1("version 1")
-                     ref_list
-                     "0000"
-  ref_list        =  empty_list / non_empty_list
-
-  empty_list      =  PKT-LINE(zero-id SP "capabilities^{}" NUL cap-list LF)
-
-  non_empty_list  =  PKT-LINE(obj-id SP name NUL cap-list LF)
-                     *ref_record
-
-  cap-list        =  capability *(SP capability)
-  capability      =  1*(LC_ALPHA / DIGIT / "-" / "_")
-  LC_ALPHA        =  %x61-7A
-
-  ref_record      =  any_ref / peeled_ref
-  any_ref         =  PKT-LINE(obj-id SP name LF)
-  peeled_ref      =  PKT-LINE(obj-id SP name LF)
-                     PKT-LINE(obj-id SP name "^{}" LF)
-
-
-Smart Service git-upload-pack
-------------------------------
-This service reads from the repository pointed to by `$GIT_URL`.
-
-Clients MUST first perform ref discovery with
-`$GIT_URL/info/refs?service=git-upload-pack`.
-
-   C: POST $GIT_URL/git-upload-pack HTTP/1.0
-   C: Content-Type: application/x-git-upload-pack-request
-   C:
-   C: 0032want 0a53e9ddeaddad63ad106860237bbf53411d11a7\n
-   C: 0032have 441b40d833fdfa93eb2908e52742248faf0ee993\n
-   C: 0000
-
-   S: 200 OK
-   S: Content-Type: application/x-git-upload-pack-result
-   S: Cache-Control: no-cache
-   S:
-   S: ....ACK %s, continue
-   S: ....NAK
-
-Clients MUST NOT reuse or revalidate a cached response.
-Servers MUST include sufficient Cache-Control headers
-to prevent caching of the response.
-
-Servers SHOULD support all capabilities defined here.
-
-Clients MUST send at least one "want" command in the request body.
-Clients MUST NOT reference an id in a "want" command which did not
-appear in the response obtained through ref discovery unless the
-server advertises capability `allow-tip-sha1-in-want` or
-`allow-reachable-sha1-in-want`.
- - compute_request = want_list - have_list - request_end - request_end = "0000" / "done" - - want_list = PKT-LINE(want SP cap_list LF) - *(want_pkt) - want_pkt = PKT-LINE(want LF) - want = "want" SP id - cap_list = capability *(SP capability) - - have_list = *PKT-LINE("have" SP id LF) - -TODO: Document this further. - -The Negotiation Algorithm -~~~~~~~~~~~~~~~~~~~~~~~~~ -The computation to select the minimal pack proceeds as follows -(C = client, S = server): - -'init step:' - -C: Use ref discovery to obtain the advertised refs. - -C: Place any object seen into set `advertised`. - -C: Build an empty set, `common`, to hold the objects that are later - determined to be on both ends. - -C: Build a set, `want`, of the objects from `advertised` the client - wants to fetch, based on what it saw during ref discovery. - -C: Start a queue, `c_pending`, ordered by commit time (popping newest - first). Add all client refs. When a commit is popped from - the queue its parents SHOULD be automatically inserted back. - Commits MUST only enter the queue once. - -'one compute step:' - -C: Send one `$GIT_URL/git-upload-pack` request: - - C: 0032want <want #1>............................... - C: 0032want <want #2>............................... - .... - C: 0032have <common #1>............................. - C: 0032have <common #2>............................. - .... - C: 0032have <have #1>............................... - C: 0032have <have #2>............................... - .... - C: 0000 - -The stream is organized into "commands", with each command -appearing by itself in a pkt-line. Within a command line, -the text leading up to the first space is the command name, -and the remainder of the line to the first LF is the value. -Command lines are terminated with an LF as the last byte of -the pkt-line value. - -Commands MUST appear in the following order, if they appear -at all in the request stream: - -* "want" -* "have" - -The stream is terminated by a pkt-line flush (`0000`). 
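The command ordering above ("want" lines first, then "have" lines, then a flush) can be illustrated with a small request-body builder. This is a sketch under stated assumptions rather than Git's own code: `build_upload_pack_request` and its parameters are hypothetical names, object names are passed as hex strings, and capabilities ride on the first "want" line as in the `want_list` grammar.

```python
def pkt_line(payload: bytes) -> bytes:
    """Frame payload as one pkt-line: 4 hex digits of total length, then payload."""
    return b"%04x" % (len(payload) + 4) + payload


def build_upload_pack_request(wants, haves, capabilities=()):
    """Return a git-upload-pack request body: wants, then haves, then a flush."""
    if not wants:
        raise ValueError("at least one 'want' command is required")
    out = []
    for i, obj_id in enumerate(wants):
        line = "want " + obj_id
        if i == 0 and capabilities:
            # Only the first want line carries the capability list.
            line += " " + " ".join(capabilities)
        out.append(pkt_line(line.encode("ascii") + b"\n"))
    for obj_id in haves:
        out.append(pkt_line(("have " + obj_id).encode("ascii") + b"\n"))
    out.append(b"0000")  # pkt-line flush terminates the stream
    return b"".join(out)


body = build_upload_pack_request(
    wants=["0a53e9ddeaddad63ad106860237bbf53411d11a7"],
    haves=["441b40d833fdfa93eb2908e52742248faf0ee993"],
)
```

With one want and one have and no capabilities, each command line is 46 bytes of payload plus the 4-byte header, which gives the `0032` prefix used in the examples.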
-
-A single "want" or "have" command MUST have one hex-formatted
-object name as its value. Multiple object names MUST be sent by sending
-multiple commands. Object names MUST be given using the object format
-negotiated through the `object-format` capability (default SHA-1).
-
-The `have` list is created by popping the first 32 commits
-from `c_pending`. Fewer can be supplied if `c_pending` empties.
-
-If the client has sent 256 "have" commits and has not yet
-received one of those back from `s_common`, or the client has
-emptied `c_pending`, it SHOULD include a "done" command to let
-the server know it won't proceed:
-
-   C: 0009done
-
-S: Parse the git-upload-pack request:
-
-Verify all objects in `want` are directly reachable from refs.
-
-The server MAY walk backwards through history or through
-the reflog to permit slightly stale requests.
-
-If no "want" objects are received, send an error:
-TODO: Define error if no "want" lines are requested.
-
-If any "want" object is not reachable, send an error:
-TODO: Define error if an invalid "want" is requested.
-
-Create an empty list, `s_common`.
-
-If "have" was sent:
-
-Loop through the objects in the order supplied by the client.
-
-For each object, if the server has the object reachable from
-a ref, add it to `s_common`. If a commit is added to `s_common`,
-do not add any ancestors, even if they also appear in `have`.
-
-S: Send the git-upload-pack response:
-
-If the server has found a closed set of objects to pack or the
-request ends with "done", it replies with the pack.
-TODO: Document the pack based response
-
-   S: PACK...
-
-The returned stream is the side-band-64k protocol supported
-by the git-upload-pack service, and the pack is embedded into
-stream 1. Progress messages from the server side MAY appear
-in stream 2.
-
-Here a "closed set of objects" is defined to have at least
-one path from every "want" to at least one "common" object.
- -If the server needs more information, it replies with a -status continue response: -TODO: Document the non-pack response - -C: Parse the upload-pack response: - TODO: Document parsing response - -'Do another compute step.' - - -Smart Service git-receive-pack ------------------------------- -This service reads from the repository pointed to by `$GIT_URL`. - -Clients MUST first perform ref discovery with -`$GIT_URL/info/refs?service=git-receive-pack`. - - C: POST $GIT_URL/git-receive-pack HTTP/1.0 - C: Content-Type: application/x-git-receive-pack-request - C: - C: ....0a53e9ddeaddad63ad106860237bbf53411d11a7 441b40d833fdfa93eb2908e52742248faf0ee993 refs/heads/maint\0 report-status - C: 0000 - C: PACK.... - - S: 200 OK - S: Content-Type: application/x-git-receive-pack-result - S: Cache-Control: no-cache - S: - S: .... - -Clients MUST NOT reuse or revalidate a cached response. -Servers MUST include sufficient Cache-Control headers -to prevent caching of the response. - -Servers SHOULD support all capabilities defined here. - -Clients MUST send at least one command in the request body. -Within the command portion of the request body clients SHOULD send -the id obtained through ref discovery as old_id. - - update_request = command_list - "PACK" <binary data> - - command_list = PKT-LINE(command NUL cap_list LF) - *(command_pkt) - command_pkt = PKT-LINE(command LF) - cap_list = *(SP capability) SP - - command = create / delete / update - create = zero-id SP new_id SP name - delete = old_id SP zero-id SP name - update = old_id SP new_id SP name - -TODO: Document this further. 
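As a rough illustration of the `update_request` grammar above, the command list can be assembled as follows. This sketch follows the grammar as written, including the trailing LF and the NUL-separated capability list on the first command, and mirrors the `\0 report-status` form of the example; the function name `build_command_list` and the 40-digit SHA-1 zero-id are assumptions made for the example, and the pack data that follows the flush is omitted.

```python
ZERO_ID = "0" * 40  # assumed SHA-1 zero-id, used for create and delete commands


def pkt_line(payload: bytes) -> bytes:
    """Frame payload as one pkt-line: 4 hex digits of total length, then payload."""
    return b"%04x" % (len(payload) + 4) + payload


def build_command_list(updates, capabilities=("report-status",)):
    """updates is a list of (old_id, new_id, refname) tuples; returns bytes."""
    if not updates:
        raise ValueError("at least one command is required")
    out = []
    for i, (old_id, new_id, name) in enumerate(updates):
        command = "%s %s %s" % (old_id, new_id, name)
        payload = command.encode("ascii")
        if i == 0:
            # First command: NUL, then each capability preceded by a space.
            payload += b"\0" + "".join(" " + c for c in capabilities).encode("ascii")
        out.append(pkt_line(payload + b"\n"))
    out.append(b"0000")  # flush; "PACK" <binary data> would follow here
    return b"".join(out)


commands = build_command_list([
    ("0a53e9ddeaddad63ad106860237bbf53411d11a7",
     "441b40d833fdfa93eb2908e52742248faf0ee993",
     "refs/heads/maint"),
])
```

A create command would pass `ZERO_ID` as the old id, and a delete would pass it as the new id, matching the `create` and `delete` productions.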
- - -References ----------- - -http://www.ietf.org/rfc/rfc1738.txt[RFC 1738: Uniform Resource Locators (URL)] -http://www.ietf.org/rfc/rfc2616.txt[RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1] -link:technical/pack-protocol.html -link:technical/protocol-capabilities.html diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt deleted file mode 100644 index f691c20ab0..0000000000 --- a/Documentation/technical/index-format.txt +++ /dev/null @@ -1,404 +0,0 @@ -Git index format -================ - -== The Git index file has the following format - - All binary numbers are in network byte order. - In a repository using the traditional SHA-1, checksums and object IDs - (object names) mentioned below are all computed using SHA-1. Similarly, - in SHA-256 repositories, these values are computed using SHA-256. - Version 2 is described here unless stated otherwise. - - - A 12-byte header consisting of - - 4-byte signature: - The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache") - - 4-byte version number: - The current supported versions are 2, 3 and 4. - - 32-bit number of index entries. - - - A number of sorted index entries (see below). - - - Extensions - - Extensions are identified by signature. Optional extensions can - be ignored if Git does not understand them. - - 4-byte extension signature. If the first byte is 'A'..'Z' the - extension is optional and can be ignored. - - 32-bit size of the extension - - Extension data - - - Hash checksum over the content of the index file before this checksum. - -== Index entry - - Index entries are sorted in ascending order on the name field, - interpreted as a string of unsigned bytes (i.e. memcmp() order, no - localization, no special casing of directory separator '/'). Entries - with the same name are sorted by their stage field. - - An index entry typically represents a file. 
However, if sparse-checkout - is enabled in cone mode (`core.sparseCheckoutCone` is enabled) and the - `extensions.sparseIndex` extension is enabled, then the index may - contain entries for directories outside of the sparse-checkout definition. - These entries have mode `040000`, include the `SKIP_WORKTREE` bit, and - the path ends in a directory separator. - - 32-bit ctime seconds, the last time a file's metadata changed - this is stat(2) data - - 32-bit ctime nanosecond fractions - this is stat(2) data - - 32-bit mtime seconds, the last time a file's data changed - this is stat(2) data - - 32-bit mtime nanosecond fractions - this is stat(2) data - - 32-bit dev - this is stat(2) data - - 32-bit ino - this is stat(2) data - - 32-bit mode, split into (high to low bits) - - 4-bit object type - valid values in binary are 1000 (regular file), 1010 (symbolic link) - and 1110 (gitlink) - - 3-bit unused - - 9-bit unix permission. Only 0755 and 0644 are valid for regular files. - Symbolic links and gitlinks have value 0 in this field. - - 32-bit uid - this is stat(2) data - - 32-bit gid - this is stat(2) data - - 32-bit file size - This is the on-disk size from stat(2), truncated to 32-bit. - - Object name for the represented object - - A 16-bit 'flags' field split into (high to low bits) - - 1-bit assume-valid flag - - 1-bit extended flag (must be zero in version 2) - - 2-bit stage (during merge) - - 12-bit name length if the length is less than 0xFFF; otherwise 0xFFF - is stored in this field. - - (Version 3 or later) A 16-bit field, only applicable if the - "extended flag" above is 1, split into (high to low bits). - - 1-bit reserved for future - - 1-bit skip-worktree flag (used by sparse checkout) - - 1-bit intent-to-add flag (used by "git add -N") - - 13-bit unused, must be zero - - Entry path name (variable length) relative to top level directory - (without leading slash). '/' is used as path separator. The special - path components ".", ".." 
and ".git" (without quotes) are disallowed. - Trailing slash is also disallowed. - - The exact encoding is undefined, but the '.' and '/' characters - are encoded in 7-bit ASCII and the encoding cannot contain a NUL - byte (iow, this is a UNIX pathname). - - (Version 4) In version 4, the entry path name is prefix-compressed - relative to the path name for the previous entry (the very first - entry is encoded as if the path name for the previous entry is an - empty string). At the beginning of an entry, an integer N in the - variable width encoding (the same encoding as the offset is encoded - for OFS_DELTA pack entries; see pack-format.txt) is stored, followed - by a NUL-terminated string S. Removing N bytes from the end of the - path name for the previous entry, and replacing it with the string S - yields the path name for this entry. - - 1-8 nul bytes as necessary to pad the entry to a multiple of eight bytes - while keeping the name NUL-terminated. - - (Version 4) In version 4, the padding after the pathname does not - exist. - - Interpretation of index entries in split index mode is completely - different. See below for details. - -== Extensions - -=== Cache tree - - Since the index does not record entries for directories, the cache - entries cannot describe tree objects that already exist in the object - database for regions of the index that are unchanged from an existing - commit. The cache tree extension stores a recursive tree structure that - describes the trees that already exist and completely match sections of - the cache entries. This speeds up tree object generation from the index - for a new commit by only computing the trees that are "new" to that - commit. It also assists when comparing the index to another tree, such - as `HEAD^{tree}`, since sections of the index can be skipped when a tree - comparison demonstrates equality. 
- - The recursive tree structure uses nodes that store a number of cache - entries, a list of subnodes, and an object ID (OID). The OID references - the existing tree for that node, if it is known to exist. The subnodes - correspond to subdirectories that themselves have cache tree nodes. The - number of cache entries corresponds to the number of cache entries in - the index that describe paths within that tree's directory. - - The extension tracks the full directory structure in the cache tree - extension, but this is generally smaller than the full cache entry list. - - When a path is updated in index, Git invalidates all nodes of the - recursive cache tree corresponding to the parent directories of that - path. We store these tree nodes as being "invalid" by using "-1" as the - number of cache entries. Invalid nodes still store a span of index - entries, allowing Git to focus its efforts when reconstructing a full - cache tree. - - The signature for this extension is { 'T', 'R', 'E', 'E' }. - - A series of entries fill the entire extension; each of which - consists of: - - - NUL-terminated path component (relative to its parent directory); - - - ASCII decimal number of entries in the index that is covered by the - tree this entry represents (entry_count); - - - A space (ASCII 32); - - - ASCII decimal number that represents the number of subtrees this - tree has; - - - A newline (ASCII 10); and - - - Object name for the object that would result from writing this span - of index as a tree. - - An entry can be in an invalidated state and is represented by having - a negative number in the entry_count field. In this case, there is no - object name and the next entry starts immediately after the newline. - When writing an invalid entry, -1 should always be used as entry_count. - - The entries are written out in the top-down, depth-first order. 
The - first entry represents the root level of the repository, followed by the - first subtree--let's call this A--of the root level (with its name - relative to the root level), followed by the first subtree of A (with - its name relative to A), and so on. The specified number of subtrees - indicates when the current level of the recursive stack is complete. - -=== Resolve undo - - A conflict is represented in the index as a set of higher stage entries. - When a conflict is resolved (e.g. with "git add path"), these higher - stage entries will be removed and a stage-0 entry with proper resolution - is added. - - When these higher stage entries are removed, they are saved in the - resolve undo extension, so that conflicts can be recreated (e.g. with - "git checkout -m"), in case users want to redo a conflict resolution - from scratch. - - The signature for this extension is { 'R', 'E', 'U', 'C' }. - - A series of entries fill the entire extension; each of which - consists of: - - - NUL-terminated pathname the entry describes (relative to the root of - the repository, i.e. full pathname); - - - Three NUL-terminated ASCII octal numbers, entry mode of entries in - stage 1 to 3 (a missing stage is represented by "0" in this field); - and - - - At most three object names of the entry in stages from 1 to 3 - (nothing is written for a missing stage). - -=== Split index - - In split index mode, the majority of index entries could be stored - in a separate file. This extension records the changes to be made on - top of that to produce the final index. - - The signature for this extension is { 'l', 'i', 'n', 'k' }. - - The extension consists of: - - - Hash of the shared index file. The shared index file path - is $GIT_DIR/sharedindex.<hash>. If all bits are zero, the - index does not require a shared index file. - - - An ewah-encoded delete bitmap, each bit represents an entry in the - shared index. 
If a bit is set, its corresponding entry in the - shared index will be removed from the final index. Note, because - a delete operation changes index entry positions, but we do need - original positions in replace phase, it's best to just mark - entries for removal, then do a mass deletion after replacement. - - - An ewah-encoded replace bitmap, each bit represents an entry in - the shared index. If a bit is set, its corresponding entry in the - shared index will be replaced with an entry in this index - file. All replaced entries are stored in sorted order in this - index. The first "1" bit in the replace bitmap corresponds to the - first index entry, the second "1" bit to the second entry and so - on. Replaced entries may have empty path names to save space. - - The remaining index entries after replaced ones will be added to the - final index. These added entries are also sorted by entry name then - stage. - -== Untracked cache - - Untracked cache saves the untracked file list and necessary data to - verify the cache. The signature for this extension is { 'U', 'N', - 'T', 'R' }. - - The extension starts with - - - A sequence of NUL-terminated strings, preceded by the size of the - sequence in variable width encoding. Each string describes the - environment where the cache can be used. - - - Stat data of $GIT_DIR/info/exclude. See "Index entry" section from - ctime field until "file size". - - - Stat data of core.excludesFile - - - 32-bit dir_flags (see struct dir_struct) - - - Hash of $GIT_DIR/info/exclude. A null hash means the file - does not exist. - - - Hash of core.excludesFile. A null hash means the file does - not exist. - - - NUL-terminated string of per-dir exclude file name. This usually - is ".gitignore". - - - The number of following directory blocks, variable width - encoding. If this number is zero, the extension ends here with a - following NUL. 
- - - A number of directory blocks in depth-first-search order, each - consists of - - - The number of untracked entries, variable width encoding. - - - The number of sub-directory blocks, variable width encoding. - - - The directory name terminated by NUL. - - - A number of untracked file/dir names terminated by NUL. - -The remaining data of each directory block is grouped by type: - - - An ewah bitmap, the n-th bit marks whether the n-th directory has - valid untracked cache entries. - - - An ewah bitmap, the n-th bit records "check-only" bit of - read_directory_recursive() for the n-th directory. - - - An ewah bitmap, the n-th bit indicates whether hash and stat data - is valid for the n-th directory and exists in the next data. - - - An array of stat data. The n-th data corresponds with the n-th - "one" bit in the previous ewah bitmap. - - - An array of hashes. The n-th hash corresponds with the n-th "one" bit - in the previous ewah bitmap. - - - One NUL. - -== File System Monitor cache - - The file system monitor cache tracks files for which the core.fsmonitor - hook has told us about changes. The signature for this extension is - { 'F', 'S', 'M', 'N' }. - - The extension starts with - - - 32-bit version number: the current supported versions are 1 and 2. - - - (Version 1) - 64-bit time: the extension data reflects all changes through the given - time which is stored as the nanoseconds elapsed since midnight, - January 1, 1970. - - - (Version 2) - A null terminated string: an opaque token defined by the file system - monitor application. The extension data reflects all changes relative - to that token. - - - 32-bit bitmap size: the size of the CE_FSMONITOR_VALID bitmap. - - - An ewah bitmap, the n-th bit indicates whether the n-th index entry - is not CE_FSMONITOR_VALID. - -== End of Index Entry - - The End of Index Entry (EOIE) is used to locate the end of the variable - length index entries and the beginning of the extensions. 
Code can take - advantage of this to quickly locate the index extensions without having - to parse through all of the index entries. - - Because it must be able to be loaded before the variable length cache - entries and other index extensions, this extension must be written last. - The signature for this extension is { 'E', 'O', 'I', 'E' }. - - The extension consists of: - - - 32-bit offset to the end of the index entries - - - Hash over the extension types and their sizes (but not - their contents). E.g. if we have "TREE" extension that is N-bytes - long, "REUC" extension that is M-bytes long, followed by "EOIE", - then the hash would be: - - Hash("TREE" + <binary representation of N> + - "REUC" + <binary representation of M>) - -== Index Entry Offset Table - - The Index Entry Offset Table (IEOT) is used to help address the CPU - cost of loading the index by enabling multi-threading the process of - converting cache entries from the on-disk format to the in-memory format. - The signature for this extension is { 'I', 'E', 'O', 'T' }. - - The extension consists of: - - - 32-bit version (currently 1) - - - A number of index offset entries each consisting of: - - - 32-bit offset from the beginning of the file to the first cache entry - in this block of entries. - - - 32-bit count of cache entries in this block - -== Sparse Directory Entries - - When using sparse-checkout in cone mode, some entire directories within - the index can be summarized by pointing to a tree object instead of the - entire expanded list of paths within that tree. An index containing such - entries is a "sparse index". Index format versions 4 and less were not - implemented with such entries in mind. Thus, for these versions, an - index containing sparse directory entries will include this extension - with signature { 's', 'd', 'i', 'r' }. Like the split-index extension, - tools should avoid interacting with a sparse index unless they understand - this extension. 
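To make the fixed-width parts of the index format described above more concrete, here is a minimal sketch that reads the 12-byte header and splits the 16-bit 'flags' field of an index entry. It is an illustration only, not Git's implementation; `parse_index_header` and `decode_entry_flags` are hypothetical helper names, and the bit positions follow the "high to low bits" order given in the "Index entry" section.

```python
import struct


def parse_index_header(data: bytes):
    """Return (version, entry_count) from the first 12 bytes of an index file."""
    if len(data) < 12:
        raise ValueError("index header is 12 bytes long")
    # All binary numbers are in network byte order (big-endian).
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("bad index signature: %r" % signature)
    if version not in (2, 3, 4):
        raise ValueError("unsupported index version: %d" % version)
    return version, entry_count


def decode_entry_flags(flags: int):
    """Split the 16-bit flags field into its four parts (high to low bits)."""
    return {
        "assume_valid": bool(flags & 0x8000),  # 1-bit assume-valid flag
        "extended": bool(flags & 0x4000),      # 1-bit extended flag
        "stage": (flags >> 12) & 0x3,          # 2-bit stage (during merge)
        "name_length": flags & 0x0FFF,         # 12-bit name length, capped at 0xFFF
    }


header = b"DIRC" + struct.pack(">II", 2, 5)
version, entry_count = parse_index_header(header)
flags = decode_entry_flags(0xA005)
```

A name length of `0xFFF` only says that the path is at least that long, so a real reader would need to scan for the terminating NUL in that case.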
diff --git a/Documentation/technical/long-running-process-protocol.txt b/Documentation/technical/long-running-process-protocol.txt index aa0aa9af1c..6f33654b42 100644 --- a/Documentation/technical/long-running-process-protocol.txt +++ b/Documentation/technical/long-running-process-protocol.txt @@ -3,7 +3,7 @@ Long-running process protocol This protocol is used when Git needs to communicate with an external process throughout the entire life of a single Git command. All -communication is in pkt-line format (see technical/protocol-common.txt) +communication is in pkt-line format (see linkgit:gitprotocol-common[5]) over standard input and standard output. Handshake diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt deleted file mode 100644 index b520aa9c45..0000000000 --- a/Documentation/technical/pack-format.txt +++ /dev/null @@ -1,484 +0,0 @@ -Git pack format -=============== - -== Checksums and object IDs - -In a repository using the traditional SHA-1, pack checksums, index checksums, -and object IDs (object names) mentioned below are all computed using SHA-1. -Similarly, in SHA-256 repositories, these values are computed using SHA-256. - -== pack-*.pack files have the following format: - - - A header appears at the beginning and consists of the following: - - 4-byte signature: - The signature is: {'P', 'A', 'C', 'K'} - - 4-byte version number (network byte order): - Git currently accepts version number 2 or 3 but - generates version 2 only. - - 4-byte number of objects contained in the pack (network byte order) - - Observation: we cannot have more than 4G versions ;-) and - more than 4G objects in a pack. 
-
-   - The header is followed by a number of object entries, each of
-     which looks like this:
-
-     (undeltified representation)
-     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
-     compressed data
-
-     (deltified representation)
-     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
-     base object name if OBJ_REF_DELTA or a negative relative
-         offset from the delta object's position in the pack if this
-         is an OBJ_OFS_DELTA object
-     compressed delta data
-
-     Observation: length of each object is encoded in a variable
-     length format and is not constrained to 32-bit or anything.
-
-  - The trailer records a pack checksum of all of the above.
-
-=== Object types
-
-Valid object types are:
-
-- OBJ_COMMIT (1)
-- OBJ_TREE (2)
-- OBJ_BLOB (3)
-- OBJ_TAG (4)
-- OBJ_OFS_DELTA (6)
-- OBJ_REF_DELTA (7)
-
-Type 5 is reserved for future expansion. Type 0 is invalid.
-
-=== Size encoding
-
-This document uses the following "size encoding" of non-negative
-integers: From each byte, the seven least significant bits are
-used to form the resulting integer. As long as the most significant
-bit is 1, this process continues; the byte with MSB 0 provides the
-last seven bits. The seven-bit chunks are concatenated. Later
-values are more significant.
-
-This size encoding should not be confused with the "offset encoding",
-which is also used in this document.
-
-=== Deltified representation
-
-Conceptually there are only four object types: commit, tree, tag and
-blob. However, to save space, an object could be stored as a "delta" of
-another "base" object. These representations are assigned new types
-ofs-delta and ref-delta, which are only valid in a pack file.
-
-Both ofs-delta and ref-delta store the "delta" to be applied to
-another object (called 'base object') to reconstruct the object. The
-difference between them is that ref-delta directly encodes the base
-object name. If the base object is in the same pack, ofs-delta encodes
-the offset of the base object in the pack instead.
-
-The base object could also be deltified if it's in the same pack.
-Ref-delta can also refer to an object outside the pack (i.e. the
-so-called "thin pack"). When stored on disk, however, the pack should
-be self-contained to avoid cyclic dependency.
-
-The delta data starts with the size of the base object and the
-size of the object to be reconstructed. These sizes are
-encoded using the size encoding from above. The remainder of
-the delta data is a sequence of instructions to reconstruct the object
-from the base object. If the base object is deltified, it must be
-converted to canonical form first. Each instruction appends more and
-more data to the target object until it's complete. There are two
-supported instructions so far: one for copying a byte range from the
-source object and one for inserting new data embedded in the
-instruction itself.
-
-Each instruction has variable length. Instruction type is determined
-by bit seven (the most significant bit) of the first octet. The
-following diagrams follow the convention in RFC 1951 (Deflate
-compressed data format).
-
-==== Instruction to copy from base object
-
-  +----------+---------+---------+---------+---------+-------+-------+-------+
-  | 1xxxxxxx | offset1 | offset2 | offset3 | offset4 | size1 | size2 | size3 |
-  +----------+---------+---------+---------+---------+-------+-------+-------+
-
-This is the instruction format to copy a byte range from the source
-object. It encodes the offset to copy from and the number of bytes to
-copy. Offset and size are in little-endian order.
-
-All offset and size bytes are optional. This is to reduce the
-instruction size when encoding small offsets or sizes. The seven
-low-order bits of the first octet determine which of the next seven
-octets are present. If bit zero is set, offset1 is present. If bit one
-is set, offset2 is present, and so on.
-
-Note that a more compact instruction does not change offset and size
-encoding.
For example, if only offset2 is omitted like below, offset3 -still contains bits 16-23. It does not become offset2 and contains -bits 8-15 even if it's right next to offset1. - - +----------+---------+---------+ - | 10000101 | offset1 | offset3 | - +----------+---------+---------+ - -In its most compact form, this instruction only takes up one byte -(0x80) with both offset and size omitted, which will have default -values zero. There is another exception: size zero is automatically -converted to 0x10000. - -==== Instruction to add new data - - +----------+============+ - | 0xxxxxxx | data | - +----------+============+ - -This is the instruction to construct target object without the base -object. The following data is appended to the target object. The first -seven bits of the first octet determines the size of data in -bytes. The size must be non-zero. - -==== Reserved instruction - - +----------+============ - | 00000000 | - +----------+============ - -This is the instruction reserved for future expansion. - -== Original (version 1) pack-*.idx files have the following format: - - - The header consists of 256 4-byte network byte order - integers. N-th entry of this table records the number of - objects in the corresponding pack, the first byte of whose - object name is less than or equal to N. This is called the - 'first-level fan-out' table. - - - The header is followed by sorted 24-byte entries, one entry - per object in the pack. Each entry is: - - 4-byte network byte order integer, recording where the - object is stored in the packfile as the offset from the - beginning. - - one object name of the appropriate size. - - - The file is concluded with a trailer: - - A copy of the pack checksum at the end of the corresponding - packfile. - - Index checksum of all of the above. - -Pack Idx file: - - -- +--------------------------------+ -fanout | fanout[0] = 2 (for example) |-. 
-table +--------------------------------+ | - | fanout[1] | | - +--------------------------------+ | - | fanout[2] | | - ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | - | fanout[255] = total objects |---. - -- +--------------------------------+ | | -main | offset | | | -index | object name 00XXXXXXXXXXXXXXXX | | | -table +--------------------------------+ | | - | offset | | | - | object name 00XXXXXXXXXXXXXXXX | | | - +--------------------------------+<+ | - .-| offset | | - | | object name 01XXXXXXXXXXXXXXXX | | - | +--------------------------------+ | - | | offset | | - | | object name 01XXXXXXXXXXXXXXXX | | - | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | - | | offset | | - | | object name FFXXXXXXXXXXXXXXXX | | - --| +--------------------------------+<--+ -trailer | | packfile checksum | - | +--------------------------------+ - | | idxfile checksum | - | +--------------------------------+ - .-------. - | -Pack file entry: <+ - - packed object header: - 1-byte size extension bit (MSB) - type (next 3 bit) - size0 (lower 4-bit) - n-byte sizeN (as long as MSB is set, each 7-bit) - size0..sizeN form 4+7+7+..+7 bit integer, size0 - is the least significant part, and sizeN is the - most significant part. - packed object data: - If it is not DELTA, then deflated bytes (the size above - is the size before compression). - If it is REF_DELTA, then - base object name (the size above is the - size of the delta data that follows). - delta data, deflated. - If it is OFS_DELTA, then - n-byte offset (see below) interpreted as a negative - offset from the type-byte of the header of the - ofs-delta entry (the size above is the size of - the delta data that follows). - delta data, deflated. - - offset encoding: - n bytes with MSB set in all but the last one. - The offset is then the number constructed by - concatenating the lower 7 bit of each byte, and - for n >= 2 adding 2^7 + 2^14 + ... + 2^(7*(n-1)) - to the result. 
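The packed object header and the ofs-delta "offset encoding" differ in subtle ways, so a small sketch may help. The function names are hypothetical. The code assumes the first offset byte is the most significant one, which matches Git's implementation as far as I know but is not spelled out in the text above.

```python
from typing import Tuple


def decode_entry_header(buf: bytes, pos: int = 0) -> Tuple[int, int, int]:
    """Return (type, size, next_pos) for a packed object header."""
    byte = buf[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x07   # 3-bit type after the extension bit
    size = byte & 0x0F              # size0: the four lowest size bits
    shift = 4
    while byte & 0x80:              # MSB set: another size byte follows
        byte = buf[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return obj_type, size, pos


def decode_ofs_delta_offset(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (offset, next_pos) using the offset encoding.

    Adding 1 before each shift is equivalent to adding
    2^7 + 2^14 + ... + 2^(7*(n-1)) to the concatenated value.
    """
    byte = buf[pos]
    pos += 1
    offset = byte & 0x7F
    while byte & 0x80:
        byte = buf[pos]
        pos += 1
        offset = ((offset + 1) << 7) | (byte & 0x7F)
    return offset, pos
```

The resulting offset is subtracted from the position of the ofs-delta entry's own header to find the base object.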
- - - -== Version 2 pack-*.idx files support packs larger than 4 GiB, and - have some other reorganizations. They have the format: - - - A 4-byte magic number '\377tOc' which is an unreasonable - fanout[0] value. - - - A 4-byte version number (= 2) - - - A 256-entry fan-out table just like v1. - - - A table of sorted object names. These are packed together - without offset values to reduce the cache footprint of the - binary search for a specific object name. - - - A table of 4-byte CRC32 values of the packed object data. - This is new in v2 so compressed data can be copied directly - from pack to pack during repacking without undetected - data corruption. - - - A table of 4-byte offset values (in network byte order). - These are usually 31-bit pack file offsets, but large - offsets are encoded as an index into the next table with - the msbit set. - - - A table of 8-byte offset entries (empty for pack files less - than 2 GiB). Pack files are organized with heavily used - objects toward the front, so most object references should - not need to refer to this table. - - - The same trailer as a v1 pack file: - - A copy of the pack checksum at the end of - corresponding packfile. - - Index checksum of all of the above. - -== pack-*.rev files have the format: - - - A 4-byte magic number '0x52494458' ('RIDX'). - - - A 4-byte version identifier (= 1). - - - A 4-byte hash function identifier (= 1 for SHA-1, 2 for SHA-256). - - - A table of index positions (one per packed object, num_objects in - total, each a 4-byte unsigned integer in network order), sorted by - their corresponding offsets in the packfile. - - - A trailer, containing a: - - checksum of the corresponding packfile, and - - a checksum of all of the above. - -All 4-byte numbers are in network order. - -== pack-*.mtimes files have the format: - -All 4-byte numbers are in network byte order. - - - A 4-byte magic number '0x4d544d45' ('MTME'). - - - A 4-byte version identifier (= 1). 
- - - A 4-byte hash function identifier (= 1 for SHA-1, 2 for SHA-256). - - - A table of 4-byte unsigned integers. The ith value is the - modification time (mtime) of the ith object in the corresponding - pack by lexicographic (index) order. The mtimes count standard - epoch seconds. - - - A trailer, containing a checksum of the corresponding packfile, - and a checksum of all of the above (each having length according - to the specified hash function). - -== multi-pack-index (MIDX) files have the following format: - -The multi-pack-index files refer to multiple pack-files and loose objects. - -In order to allow extensions that add extra data to the MIDX, we organize -the body into "chunks" and provide a lookup table at the beginning of the -body. The header includes certain length values, such as the number of packs, -the number of base MIDX files, hash lengths and types. - -All 4-byte numbers are in network order. - -HEADER: - - 4-byte signature: - The signature is: {'M', 'I', 'D', 'X'} - - 1-byte version number: - Git only writes or recognizes version 1. - - 1-byte Object Id Version - We infer the length of object IDs (OIDs) from this value: - 1 => SHA-1 - 2 => SHA-256 - If the hash type does not match the repository's hash algorithm, - the multi-pack-index file should be ignored with a warning - presented to the user. - - 1-byte number of "chunks" - - 1-byte number of base multi-pack-index files: - This value is currently always zero. - - 4-byte number of pack files - -CHUNK LOOKUP: - - (C + 1) * 12 bytes providing the chunk offsets: - First 4 bytes describe chunk id. Value 0 is a terminating label. - Other 8 bytes provide offset in current file for chunk to start. - (Chunks are provided in file-order, so you can infer the length - using the next chunk position if necessary.) - - The CHUNK LOOKUP matches the table of contents from - link:technical/chunk-format.html[the chunk-based file format]. 
- - The remaining data in the body is described one chunk at a time, and - these chunks may be given in any order. Chunks are required unless - otherwise specified. - -CHUNK DATA: - - Packfile Names (ID: {'P', 'N', 'A', 'M'}) - Stores the packfile names as concatenated, null-terminated strings. - Packfiles must be listed in lexicographic order for fast lookups by - name. This is the only chunk not guaranteed to be a multiple of four - bytes in length, so should be the last chunk for alignment reasons. - - OID Fanout (ID: {'O', 'I', 'D', 'F'}) - The ith entry, F[i], stores the number of OIDs with first - byte at most i. Thus F[255] stores the total - number of objects. - - OID Lookup (ID: {'O', 'I', 'D', 'L'}) - The OIDs for all objects in the MIDX are stored in lexicographic - order in this chunk. - - Object Offsets (ID: {'O', 'O', 'F', 'F'}) - Stores two 4-byte values for every object. - 1: The pack-int-id for the pack storing this object. - 2: The offset within the pack. - If all offsets are less than 2^32, then the large offset chunk - will not exist and offsets are stored as in IDX v1. - If there is at least one offset value larger than 2^32-1, then - the large offset chunk must exist, and offsets larger than - 2^31-1 must be stored in it instead. If the large offset chunk - exists and the 31st bit is on, then removing that bit reveals - the row in the large offsets containing the 8-byte offset of - this object. - - [Optional] Object Large Offsets (ID: {'L', 'O', 'F', 'F'}) - 8-byte offsets into large packfiles. - - [Optional] Bitmap pack order (ID: {'R', 'I', 'D', 'X'}) - A list of MIDX positions (one per object in the MIDX, num_objects in - total, each a 4-byte unsigned integer in network byte order), sorted - according to their relative bitmap/pseudo-pack positions. - -TRAILER: - - Index checksum of the above contents. 
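As a rough illustration of the MIDX layout above, the fixed header and the chunk lookup table can be read with fixed-width big-endian fields. This is a sketch under stated assumptions: the names `parse_midx_header` and `resolve_offset` and the returned dictionary layout are invented for this example. It does not validate the trailer checksum or any chunk contents.

```python
import struct

# signature, version, OID version, chunk count, base MIDX count, pack count
MIDX_HEADER = struct.Struct(">4sBBBBI")   # 12 bytes, network byte order
# 4-byte chunk id followed by an 8-byte file offset
CHUNK_ENTRY = struct.Struct(">4sQ")       # 12 bytes per lookup row


def parse_midx_header(data: bytes) -> dict:
    """Parse the header and chunk lookup of a multi-pack-index (sketch)."""
    (sig, version, oid_version,
     num_chunks, num_bases, num_packs) = MIDX_HEADER.unpack_from(data, 0)
    if sig != b"MIDX":
        raise ValueError("not a multi-pack-index file")
    if version != 1:
        raise ValueError("unsupported MIDX version %d" % version)

    # (C + 1) rows: C chunks plus the terminating row with id 0.
    table = [
        CHUNK_ENTRY.unpack_from(data, MIDX_HEADER.size + i * CHUNK_ENTRY.size)
        for i in range(num_chunks + 1)
    ]
    # Chunks are in file order, so each length is the distance to the
    # next row's offset.
    chunks = {
        chunk_id: (start, end - start)
        for (chunk_id, start), (_, end) in zip(table, table[1:])
    }
    return {"oid_version": oid_version, "num_bases": num_bases,
            "num_packs": num_packs, "chunks": chunks}


def resolve_offset(raw: int, large_offsets=None) -> int:
    """Map a 4-byte OOFF offset value to the real pack offset."""
    if large_offsets is not None and raw & 0x80000000:
        # High bit set: the remaining bits index the LOFF chunk.
        return large_offsets[raw & 0x7FFFFFFF]
    return raw
```

`chunks` maps each 4-byte chunk ID to an `(offset, length)` pair, which is enough to slice the individual chunks out of the file.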
- -== multi-pack-index reverse indexes - -Similar to the pack-based reverse index, the multi-pack index can also -be used to generate a reverse index. - -Instead of mapping between offset, pack-, and index position, this -reverse index maps between an object's position within the MIDX, and -that object's position within a pseudo-pack that the MIDX describes -(i.e., the ith entry of the multi-pack reverse index holds the MIDX -position of ith object in pseudo-pack order). - -To clarify the difference between these orderings, consider a multi-pack -reachability bitmap (which does not yet exist, but is what we are -building towards here). Each bit needs to correspond to an object in the -MIDX, and so we need an efficient mapping from bit position to MIDX -position. - -One solution is to let bits occupy the same position in the oid-sorted -index stored by the MIDX. But because oids are effectively random, their -resulting reachability bitmaps would have no locality, and thus compress -poorly. (This is the reason that single-pack bitmaps use the pack -ordering, and not the .idx ordering, for the same purpose.) - -So we'd like to define an ordering for the whole MIDX based around -pack ordering, which has far better locality (and thus compresses more -efficiently). We can think of a pseudo-pack created by the concatenation -of all of the packs in the MIDX. E.g., if we had a MIDX with three packs -(a, b, c), with 10, 15, and 20 objects respectively, we can imagine an -ordering of the objects like: - - |a,0|a,1|...|a,9|b,0|b,1|...|b,14|c,0|c,1|...|c,19| - -where the ordering of the packs is defined by the MIDX's pack list, -and then the ordering of objects within each pack is the same as the -order in the actual packfile. - -Given the list of packs and their counts of objects, you can -naïvely reconstruct that pseudo-pack ordering (e.g., the object at -position 27 must be (c,1) because packs "a" and "b" consumed 25 of the -slots). But there's a catch. 
Objects may be duplicated between packs, in -which case the MIDX only stores one pointer to the object (and thus we'd -want only one slot in the bitmap). - -Callers could handle duplicates themselves by reading objects in order -of their bit-position, but that's linear in the number of objects, and -much too expensive for ordinary bitmap lookups. Building a reverse index -solves this, since it is the logical inverse of the index, and that -index has already removed duplicates. But, building a reverse index on -the fly can be expensive. Since we already have an on-disk format for -pack-based reverse indexes, let's reuse it for the MIDX's pseudo-pack, -too. - -Objects from the MIDX are ordered as follows to string together the -pseudo-pack. Let `pack(o)` return the pack from which `o` was selected -by the MIDX, and define an ordering of packs based on their numeric ID -(as stored by the MIDX). Let `offset(o)` return the object offset of `o` -within `pack(o)`. Then, compare `o1` and `o2` as follows: - - - If one of `pack(o1)` and `pack(o2)` is preferred and the other - is not, then the preferred one sorts first. -+ -(This is a detail that allows the MIDX bitmap to determine which -pack should be used by the pack-reuse mechanism, since it can ask -the MIDX for the pack containing the object at bit position 0). - - - If `pack(o1) ≠ pack(o2)`, then sort the two objects in descending - order based on the pack ID. - - - Otherwise, `pack(o1) = pack(o2)`, and the objects are sorted in - pack-order (i.e., `o1` sorts ahead of `o2` exactly when `offset(o1) - < offset(o2)`). - -In short, a MIDX's pseudo-pack is the de-duplicated concatenation of -objects in packs stored by the MIDX, laid out in pack order, and the -packs arranged in MIDX order (with the preferred pack coming first). - -The MIDX's reverse index is stored in the optional 'RIDX' chunk within -the MIDX itself.
diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt deleted file mode 100644 index e13a2c064d..0000000000 --- a/Documentation/technical/pack-protocol.txt +++ /dev/null @@ -1,709 +0,0 @@ -Packfile transfer protocols -=========================== - -Git supports transferring data in packfiles over the ssh://, git://, http:// and -file:// transports. There exist two sets of protocols, one for pushing -data from a client to a server and another for fetching data from a -server to a client. The three transports (ssh, git, file) use the same -protocol to transfer data. http is documented in http-protocol.txt. - -The processes invoked in the canonical Git implementation are 'upload-pack' -on the server side and 'fetch-pack' on the client side for fetching data; -then 'receive-pack' on the server and 'send-pack' on the client for pushing -data. The protocol functions to have a server tell a client what is -currently on the server, then for the two to negotiate the smallest amount -of data to send in order to fully update one or the other. - -pkt-line Format ---------------- - -The descriptions below build on the pkt-line format described in -protocol-common.txt. When the grammar indicate `PKT-LINE(...)`, unless -otherwise noted the usual pkt-line LF rules apply: the sender SHOULD -include a LF, but the receiver MUST NOT complain if it is not present. - -An error packet is a special pkt-line that contains an error string. - ----- - error-line = PKT-LINE("ERR" SP explanation-text) ----- - -Throughout the protocol, where `PKT-LINE(...)` is expected, an error packet MAY -be sent. Once this packet is sent by a client or a server, the data transfer -process defined in this protocol is terminated. - -Transports ----------- -There are three transports over which the packfile protocol is -initiated. 
The Git transport is a simple, unauthenticated server that -takes the command (almost always 'upload-pack', though Git -servers can be configured to be globally writable, in which 'receive- -pack' initiation is also allowed) with which the client wishes to -communicate and executes it and connects it to the requesting -process. - -In the SSH transport, the client just runs the 'upload-pack' -or 'receive-pack' process on the server over the SSH protocol and then -communicates with that invoked process over the SSH connection. - -The file:// transport runs the 'upload-pack' or 'receive-pack' -process locally and communicates with it over a pipe. - -Extra Parameters ----------------- - -The protocol provides a mechanism in which clients can send additional -information in its first message to the server. These are called "Extra -Parameters", and are supported by the Git, SSH, and HTTP protocols. - -Each Extra Parameter takes the form of `<key>=<value>` or `<key>`. - -Servers that receive any such Extra Parameters MUST ignore all -unrecognized keys. Currently, the only Extra Parameter recognized is -"version" with a value of '1' or '2'. See protocol-v2.txt for more -information on protocol version 2. - -Git Transport -------------- - -The Git transport starts off by sending the command and repository -on the wire using the pkt-line format, followed by a NUL byte and a -hostname parameter, terminated by a NUL byte. 
- - 0033git-upload-pack /project.git\0host=myserver.com\0 - -The transport may send Extra Parameters by adding an additional NUL -byte, and then adding one or more NUL-terminated strings: - - 003egit-upload-pack /project.git\0host=myserver.com\0\0version=1\0 - --- - git-proto-request = request-command SP pathname NUL - [ host-parameter NUL ] [ NUL extra-parameters ] - request-command = "git-upload-pack" / "git-receive-pack" / - "git-upload-archive" ; case sensitive - pathname = *( %x01-ff ) ; exclude NUL - host-parameter = "host=" hostname [ ":" port ] - extra-parameters = 1*extra-parameter - extra-parameter = 1*( %x01-ff ) NUL --- - -host-parameter is used for the -git-daemon name based virtual hosting. See --interpolated-path -option to git daemon, with the %H/%CH format characters. - -Basically what the Git client is doing to connect to an 'upload-pack' -process on the server side over the Git protocol is this: - - $ echo -e -n \ - "003agit-upload-pack /schacon/gitbook.git\0host=example.com\0" | - nc -v example.com 9418 - - -SSH Transport -------------- - -Initiating the upload-pack or receive-pack processes over SSH is -executing the binary on the server via SSH remote execution. -It is basically equivalent to running this: - - $ ssh git.example.com "git-upload-pack '/project.git'" - -For a server to support Git pushing and pulling for a given user over -SSH, that user needs to be able to execute one or both of those -commands via the SSH shell that they are provided on login. On some -systems, that shell access is limited to only being able to run those -two commands, or even just one of them. - -In an ssh:// format URI, it's absolute in the URI, so the '/' after -the host name (or port number) is sent as an argument, which is then -read by the remote git-upload-pack exactly as is, so it's effectively -an absolute path in the remote filesystem. 
- - git clone ssh://user@example.com/project.git - | - v - ssh user@example.com "git-upload-pack '/project.git'" - -In a "user@host:path" format URI, it's relative to the user's home -directory, because the Git client will run: - - git clone user@example.com:project.git - | - v - ssh user@example.com "git-upload-pack 'project.git'" - -The exception is if a '~' is used, in which case -we execute it without the leading '/'. - - ssh://user@example.com/~alice/project.git, - | - v - ssh user@example.com "git-upload-pack '~alice/project.git'" - -Depending on the value of the `protocol.version` configuration variable, -Git may attempt to send Extra Parameters as a colon-separated string in -the GIT_PROTOCOL environment variable. This is done only if -the `ssh.variant` configuration variable indicates that the ssh command -supports passing environment variables as an argument. - -A few things to remember here: - -- The "command name" is spelled with a dash (e.g. git-upload-pack), but - this can be overridden by the client; - -- The repository path is always quoted with single quotes. - -Fetching Data From a Server ---------------------------- - -When one Git repository wants to get data that a second repository -has, the first can 'fetch' from the second. This operation determines -what data the server has that the client does not, then streams that -data down to the client in packfile format. - - -Reference Discovery -------------------- - -When the client initially connects, the server will immediately respond -with a version number (if "version=1" is sent as an Extra Parameter), -and a listing of each reference it has (all branches and tags) along -with the object name that each reference currently points to.
- - $ echo -e -n "0045git-upload-pack /schacon/gitbook.git\0host=example.com\0\0version=1\0" | - nc -v example.com 9418 - 000eversion 1 - 00887217a7c7e582c46cec22a130adf4b9d7d950fba0 HEAD\0multi_ack thin-pack - side-band side-band-64k ofs-delta shallow no-progress include-tag - 00441d3fcd5ced445d1abc402225c0b8a1299641f497 refs/heads/integration - 003f7217a7c7e582c46cec22a130adf4b9d7d950fba0 refs/heads/master - 003cb88d2441cac0977faf98efc80305012112238d9d refs/tags/v0.9 - 003c525128480b96c89e6418b1e40909bf6c5b2d580f refs/tags/v1.0 - 003fe92df48743b7bc7d26bcaabfddde0a1e20cae47c refs/tags/v1.0^{} - 0000 - -The returned response is a pkt-line stream describing each ref and -its current value. The stream MUST be sorted by name according to -the C locale ordering. - -If HEAD is a valid ref, HEAD MUST appear as the first advertised -ref. If HEAD is not a valid ref, HEAD MUST NOT appear in the -advertisement list at all, but other refs may still appear. - -The stream MUST include capability declarations behind a NUL on the -first ref. The peeled value of a ref (that is "ref^{}") MUST be -immediately after the ref itself, if presented. A conforming server -MUST peel the ref if it's an annotated tag. - ----- - advertised-refs = *1("version 1") - (no-refs / list-of-refs) - *shallow - flush-pkt - - no-refs = PKT-LINE(zero-id SP "capabilities^{}" - NUL capability-list) - - list-of-refs = first-ref *other-ref - first-ref = PKT-LINE(obj-id SP refname - NUL capability-list) - - other-ref = PKT-LINE(other-tip / other-peeled) - other-tip = obj-id SP refname - other-peeled = obj-id SP refname "^{}" - - shallow = PKT-LINE("shallow" SP obj-id) - - capability-list = capability *(SP capability) - capability = 1*(LC_ALPHA / DIGIT / "-" / "_") - LC_ALPHA = %x61-7A ----- - -Server and client MUST use lowercase for obj-id, both MUST treat obj-id -as case-insensitive. - -See protocol-capabilities.txt for a list of allowed server capabilities -and descriptions. 
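To make the advertisement grammar above concrete, here is a minimal sketch of a parser for the pkt-line stream a client receives. It assumes the pkt-line framing from protocol-common.txt (four hex digits giving the total length including the length field itself, `0000` as flush-pkt). The function names and the shape of the returned dictionary are illustrative choices, not Git APIs.

```python
from typing import Iterator, Optional


def pkt_line(payload: bytes) -> bytes:
    """Frame a payload: 4 hex digits of total length (header included)."""
    return b"%04x" % (len(payload) + 4) + payload


def read_pkt_lines(stream: bytes) -> Iterator[Optional[bytes]]:
    """Yield each payload in turn; a flush-pkt ("0000") is yielded as None."""
    pos = 0
    while pos < len(stream):
        length = int(stream[pos:pos + 4], 16)
        if length == 0:
            yield None
            pos += 4
        else:
            yield stream[pos + 4:pos + length]
            pos += length


def parse_advertisement(stream: bytes) -> dict:
    """Collect version, capabilities, refs and shallow lines (sketch)."""
    result = {"version": None, "capabilities": [], "refs": [], "shallow": []}
    seen_first_ref = False
    for payload in read_pkt_lines(stream):
        if payload is None:          # flush-pkt ends the advertisement
            break
        line = payload.rstrip(b"\n")
        if line.startswith(b"version "):
            result["version"] = int(line[len(b"version "):])
        elif line.startswith(b"shallow "):
            result["shallow"].append(line[len(b"shallow "):].decode())
        else:
            if not seen_first_ref:
                # Capabilities sit behind a NUL on the first ref only.
                line, _, caps = line.partition(b"\0")
                result["capabilities"] = caps.decode().split()
                seen_first_ref = True
            obj_id, _, name = line.partition(b" ")
            if name != b"capabilities^{}":   # the no-refs placeholder
                result["refs"].append((name.decode(), obj_id.decode()))
    return result
```

Peeled entries such as `refs/tags/v1.0^{}` are kept as ordinary list items here; a real client would attach them to the preceding tag.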
- -Packfile Negotiation --------------------- -After reference and capabilities discovery, the client can decide to -terminate the connection by sending a flush-pkt, telling the server it can -now gracefully terminate, and disconnect, when it does not need any pack -data. This can happen with the ls-remote command, and also can happen when -the client already is up to date. - -Otherwise, it enters the negotiation phase, where the client and -server determine what the minimal packfile necessary for transport is, -by telling the server what objects it wants, its shallow objects -(if any), and the maximum commit depth it wants (if any). The client -will also send a list of the capabilities it wants to be in effect, -out of what the server said it could do with the first 'want' line. - ----- - upload-request = want-list - *shallow-line - *1depth-request - [filter-request] - flush-pkt - - want-list = first-want - *additional-want - - shallow-line = PKT-LINE("shallow" SP obj-id) - - depth-request = PKT-LINE("deepen" SP depth) / - PKT-LINE("deepen-since" SP timestamp) / - PKT-LINE("deepen-not" SP ref) - - first-want = PKT-LINE("want" SP obj-id SP capability-list) - additional-want = PKT-LINE("want" SP obj-id) - - depth = 1*DIGIT - - filter-request = PKT-LINE("filter" SP filter-spec) ----- - -Clients MUST send all the obj-ids it wants from the reference -discovery phase as 'want' lines. Clients MUST send at least one -'want' command in the request body. Clients MUST NOT mention an -obj-id in a 'want' command which did not appear in the response -obtained through ref discovery. - -The client MUST write all obj-ids which it only has shallow copies -of (meaning that it does not have the parents of a commit) as -'shallow' lines so that the server is aware of the limitations of -the client's history. 
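The `upload-request` grammar above can be assembled mechanically. The sketch below is only an illustration: `build_upload_request` and its parameters are hypothetical names, pkt-line framing is assumed to follow protocol-common.txt, and filter requests are left out.

```python
from typing import Optional, Sequence


def pkt_line(payload: bytes) -> bytes:
    """Frame a payload: 4 hex digits of total length (header included)."""
    return b"%04x" % (len(payload) + 4) + payload


FLUSH_PKT = b"0000"


def build_upload_request(wants: Sequence[str],
                         capabilities: Sequence[str],
                         shallow: Sequence[str] = (),
                         depth: Optional[int] = None) -> bytes:
    """Serialize want-list, shallow lines, optional deepen, then flush."""
    if not wants:
        raise ValueError("at least one 'want' is required")
    # Only the first want line carries the capability list.
    first = "want {} {}\n".format(wants[0], " ".join(capabilities))
    out = [pkt_line(first.encode())]
    out += [pkt_line("want {}\n".format(w).encode()) for w in wants[1:]]
    out += [pkt_line("shallow {}\n".format(s).encode()) for s in shallow]
    if depth is not None:
        out.append(pkt_line("deepen {}\n".format(depth).encode()))
    out.append(FLUSH_PKT)
    return b"".join(out)
```

The helper does not check that each wanted obj-id was actually advertised by the server; a real client has to enforce that rule itself.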
- -The client now sends the maximum commit history depth it wants for -this transaction, which is the number of commits it wants from the -tip of the history, if any, as a 'deepen' line. A depth of 0 is the -same as not making a depth request. The client does not want to receive -any commits beyond this depth, nor does it want objects needed only to -complete those commits. Commits whose parents are not received as a -result are defined as shallow and marked as such in the server. This -information is sent back to the client in the next step. - -The client can optionally request that pack-objects omit various -objects from the packfile using one of several filtering techniques. -These are intended for use with partial clone and partial fetch -operations. An object that does not meet a filter-spec value is -omitted unless explicitly requested in a 'want' line. See `rev-list` -for possible filter-spec values. - -Once all the 'want's and 'shallow's (and optional 'deepen') are -transferred, clients MUST send a flush-pkt, to tell the server side -that it is done sending the list. - -Otherwise, if the client sent a positive depth request, the server -will determine which commits will and will not be shallow and -send this information to the client. If the client did not request -a positive depth, this step is skipped. - ----- - shallow-update = *shallow-line - *unshallow-line - flush-pkt - - shallow-line = PKT-LINE("shallow" SP obj-id) - - unshallow-line = PKT-LINE("unshallow" SP obj-id) ----- - -If the client has requested a positive depth, the server will compute -the set of commits which are no deeper than the desired depth. The set -of commits start at the client's wants. - -The server writes 'shallow' lines for each -commit whose parents will not be sent as a result. The server writes -an 'unshallow' line for each commit which the client has indicated is -shallow, but is no longer shallow at the currently requested depth -(that is, its parents will now be sent). 
The server MUST NOT mark -as unshallow anything which the client has not indicated was shallow. - -Now the client will send a list of the obj-ids it has using 'have' -lines, so the server can make a packfile that only contains the objects -that the client needs. In multi_ack mode, the canonical implementation -will send up to 32 of these at a time, then will send a flush-pkt. The -canonical implementation will skip ahead and send the next 32 immediately, -so that there is always a block of 32 "in-flight on the wire" at a time. - ----- - upload-haves = have-list - compute-end - - have-list = *have-line - have-line = PKT-LINE("have" SP obj-id) - compute-end = flush-pkt / PKT-LINE("done") ----- - -If the server reads 'have' lines, it then will respond by ACKing any -of the obj-ids the client said it had that the server also has. The -server will ACK obj-ids differently depending on which ack mode is -chosen by the client. - -In multi_ack mode: - - * the server will respond with 'ACK obj-id continue' for any common - commits. - - * once the server has found an acceptable common base commit and is - ready to make a packfile, it will blindly ACK all 'have' obj-ids - back to the client. - - * the server will then send a 'NAK' and then wait for another response - from the client - either a 'done' or another list of 'have' lines. - -In multi_ack_detailed mode: - - * the server will differentiate the ACKs where it is signaling - that it is ready to send data with 'ACK obj-id ready' lines, and - signals the identified common commits with 'ACK obj-id common' lines. - -Without either multi_ack or multi_ack_detailed: - - * upload-pack sends "ACK obj-id" on the first common object it finds. - After that it says nothing until the client gives it a "done". - - * upload-pack sends "NAK" on a flush-pkt if no common object - has been found yet. If one has been found, and thus an ACK - was already sent, it's silent on the flush-pkt. 
- -After the client has gotten enough ACK responses that it can determine -that the server has enough information to send an efficient packfile -(in the canonical implementation, this is determined when it has received -enough ACKs that it can color everything left in the --date-order queue -as common with the server, or the --date-order queue is empty), or the -client determines that it wants to give up (in the canonical implementation, -this is determined when the client sends 256 'have' lines without getting -any of them ACKed by the server - meaning there is nothing in common and -the server should just send all of its objects), then the client will send -a 'done' command. The 'done' command signals to the server that the client -is ready to receive its packfile data. - -However, the 256 limit *only* turns on in the canonical client -implementation if we have received at least one "ACK %s continue" -during a prior round. This helps to ensure that at least one common -ancestor is found before we give up entirely. - -Once the 'done' line is read from the client, the server will either -send a final 'ACK obj-id' or it will send a 'NAK'. 'obj-id' is the object -name of the last commit determined to be common. The server only sends -ACK after 'done' if there is at least one common base and multi_ack or -multi_ack_detailed is enabled. The server always sends NAK after 'done' -if there is no common base found. - -Instead of 'ACK' or 'NAK', the server may send an error message (for -example, if it does not recognize an object in a 'want' line received -from the client). - -Then the server will start sending its packfile data. 
- ----- - server-response = *ack_multi ack / nak - ack_multi = PKT-LINE("ACK" SP obj-id ack_status) - ack_status = "continue" / "common" / "ready" - ack = PKT-LINE("ACK" SP obj-id) - nak = PKT-LINE("NAK") ----- - -A simple clone may look like this (with no 'have' lines): - ----- - C: 0054want 74730d410fcb6603ace96f1dc55ea6196122532d multi_ack \ - side-band-64k ofs-delta\n - C: 0032want 7d1665144a3a975c05f1f43902ddaf084e784dbe\n - C: 0032want 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a\n - C: 0032want 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01\n - C: 0032want 74730d410fcb6603ace96f1dc55ea6196122532d\n - C: 0000 - C: 0009done\n - - S: 0008NAK\n - S: [PACKFILE] ----- - -An incremental update (fetch) response might look like this: - ----- - C: 0054want 74730d410fcb6603ace96f1dc55ea6196122532d multi_ack \ - side-band-64k ofs-delta\n - C: 0032want 7d1665144a3a975c05f1f43902ddaf084e784dbe\n - C: 0032want 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a\n - C: 0000 - C: 0032have 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01\n - C: [30 more have lines] - C: 0032have 74730d410fcb6603ace96f1dc55ea6196122532d\n - C: 0000 - - S: 003aACK 7e47fe2bd8d01d481f44d7af0531bd93d3b21c01 continue\n - S: 003aACK 74730d410fcb6603ace96f1dc55ea6196122532d continue\n - S: 0008NAK\n - - C: 0009done\n - - S: 0031ACK 74730d410fcb6603ace96f1dc55ea6196122532d\n - S: [PACKFILE] ----- - - -Packfile Data -------------- - -Now that the client and server have finished negotiation about what -the minimal amount of data that needs to be sent to the client is, the server -will construct and send the required data in packfile format. - -See pack-format.txt for what the packfile itself actually looks like. - -If 'side-band' or 'side-band-64k' capabilities have been specified by -the client, the server will send the packfile data multiplexed. - -Each packet starting with the packet-line length of the amount of data -that follows, followed by a single byte specifying the sideband the -following data is coming in on. 
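A client that negotiated 'side-band' or 'side-band-64k' has to split the multiplexed stream back into its channels. The following is a minimal sketch, assuming the band indicator is a single byte with numeric value 1, 2 or 3 (as in Git's implementation, to my knowledge) and that band 1 carries pack data, band 2 progress text and band 3 an error message, as this section describes. `demux_sideband` is a hypothetical name.

```python
from typing import List, Tuple


def demux_sideband(stream: bytes) -> Tuple[bytes, List[bytes]]:
    """Split a side-band pkt-line stream into (pack_data, progress_chunks).

    Raises RuntimeError when the server reports an error on band 3.
    """
    pack = bytearray()
    progress = []
    pos = 0
    while pos < len(stream):
        length = int(stream[pos:pos + 4], 16)   # total length incl. header
        if length == 0:                         # flush-pkt: end of stream
            break
        band = stream[pos + 4]                  # one control byte
        data = stream[pos + 5:pos + length]
        if band == 1:
            pack += data                        # packfile bytes
        elif band == 2:
            progress.append(data)               # usually shown on stderr
        elif band == 3:
            raise RuntimeError(data.decode(errors="replace"))
        else:
            raise ValueError("unknown sideband %d" % band)
        pos += length
    return bytes(pack), progress
```

The function accepts packets of either size limit, so the same loop serves both 'side-band' and 'side-band-64k'.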
- -In 'side-band' mode, it will send up to 999 data bytes plus 1 control -code, for a total of up to 1000 bytes in a pkt-line. In 'side-band-64k' -mode it will send up to 65519 data bytes plus 1 control code, for a -total of up to 65520 bytes in a pkt-line. - -The sideband byte will be a '1', '2' or a '3'. Sideband '1' will contain -packfile data, sideband '2' will be used for progress information that the -client will generally print to stderr and sideband '3' is used for error -information. - -If no 'side-band' capability was specified, the server will stream the -entire packfile without multiplexing. - - -Pushing Data To a Server ------------------------- - -Pushing data to a server will invoke the 'receive-pack' process on the -server, which will allow the client to tell it which references it should -update and then send all the data the server will need for those new -references to be complete. Once all the data is received and validated, -the server will then update its references to what the client specified. - -Authentication --------------- - -The protocol itself contains no authentication mechanisms. That is to be -handled by the transport, such as SSH, before the 'receive-pack' process is -invoked. If 'receive-pack' is configured over the Git transport, those -repositories will be writable by anyone who can access that port (9418) as -that transport is unauthenticated. - -Reference Discovery -------------------- - -The reference discovery phase is done nearly the same way as it is in the -fetching protocol. Each reference obj-id and name on the server is sent -in packet-line format to the client, followed by a flush-pkt. The only -real difference is that the capability listing is different - the only -possible values are 'report-status', 'report-status-v2', 'delete-refs', -'ofs-delta', 'atomic' and 'push-options'. 
- -Reference Update Request and Packfile Transfer ----------------------------------------------- - -Once the client knows what references the server is at, it can send a -list of reference update requests. For each reference on the server -that it wants to update, it sends a line listing the obj-id currently on -the server, the obj-id the client would like to update it to and the name -of the reference. - -This list is followed by a flush-pkt. - ----- - update-requests = *shallow ( command-list | push-cert ) - - shallow = PKT-LINE("shallow" SP obj-id) - - command-list = PKT-LINE(command NUL capability-list) - *PKT-LINE(command) - flush-pkt - - command = create / delete / update - create = zero-id SP new-id SP name - delete = old-id SP zero-id SP name - update = old-id SP new-id SP name - - old-id = obj-id - new-id = obj-id - - push-cert = PKT-LINE("push-cert" NUL capability-list LF) - PKT-LINE("certificate version 0.1" LF) - PKT-LINE("pusher" SP ident LF) - PKT-LINE("pushee" SP url LF) - PKT-LINE("nonce" SP nonce LF) - *PKT-LINE("push-option" SP push-option LF) - PKT-LINE(LF) - *PKT-LINE(command LF) - *PKT-LINE(gpg-signature-lines LF) - PKT-LINE("push-cert-end" LF) - - push-option = 1*( VCHAR | SP ) ----- - -If the server has advertised the 'push-options' capability and the client has -specified 'push-options' as part of the capability list above, the client then -sends its push options followed by a flush-pkt. - ----- - push-options = *PKT-LINE(push-option) flush-pkt ----- - -For backwards compatibility with older Git servers, if the client sends a push -cert and push options, it MUST send its push options both embedded within the -push cert and after the push cert. (Note that the push options within the cert -are prefixed, but the push options after the cert are not.) Both these lists -MUST be the same, modulo the prefix. - -After that the packfile that -should contain all the objects that the server will need to complete the new -references will be sent. 
- ----- - packfile = "PACK" 28*(OCTET) ----- - -If the receiving end does not support delete-refs, the sending end MUST -NOT ask for delete command. - -If the receiving end does not support push-cert, the sending end -MUST NOT send a push-cert command. When a push-cert command is -sent, command-list MUST NOT be sent; the commands recorded in the -push certificate is used instead. - -The packfile MUST NOT be sent if the only command used is 'delete'. - -A packfile MUST be sent if either create or update command is used, -even if the server already has all the necessary objects. In this -case the client MUST send an empty packfile. The only time this -is likely to happen is if the client is creating -a new branch or a tag that points to an existing obj-id. - -The server will receive the packfile, unpack it, then validate each -reference that is being updated that it hasn't changed while the request -was being processed (the obj-id is still the same as the old-id), and -it will run any update hooks to make sure that the update is acceptable. -If all of that is fine, the server will then update the references. - -Push Certificate ----------------- - -A push certificate begins with a set of header lines. After the -header and an empty line, the protocol commands follow, one per -line. Note that the trailing LF in push-cert PKT-LINEs is _not_ -optional; it must be present. - -Currently, the following header fields are defined: - -`pusher` ident:: - Identify the GPG key in "Human Readable Name <email@address>" - format. - -`pushee` url:: - The repository URL (anonymized, if the URL contains - authentication material) the user who ran `git push` - intended to push into. - -`nonce` nonce:: - The 'nonce' string the receiving repository asked the - pushing user to include in the certificate, to prevent - replay attacks. - -The GPG signature lines are a detached signature for the contents -recorded in the push certificate before the signature block begins. 
-The detached signature is used to certify that the commands were -given by the pusher, who must be the signer. - -Report Status -------------- - -After receiving the pack data from the sender, the receiver sends a -report if 'report-status' or 'report-status-v2' capability is in effect. -It is a short listing of what happened in that update. It will first -list the status of the packfile unpacking as either 'unpack ok' or -'unpack [error]'. Then it will list the status for each of the references -that it tried to update. Each line is either 'ok [refname]' if the -update was successful, or 'ng [refname] [error]' if the update was not. - ----- - report-status = unpack-status - 1*(command-status) - flush-pkt - - unpack-status = PKT-LINE("unpack" SP unpack-result) - unpack-result = "ok" / error-msg - - command-status = command-ok / command-fail - command-ok = PKT-LINE("ok" SP refname) - command-fail = PKT-LINE("ng" SP refname SP error-msg) - - error-msg = 1*(OCTET) ; where not "ok" ----- - -The 'report-status-v2' capability extends the protocol by adding new option -lines in order to support reporting of reference rewritten by the -'proc-receive' hook. The 'proc-receive' hook may handle a command for a -pseudo-reference which may create or update one or more references, and each -reference may have different name, different new-oid, and different old-oid. 
- ----- - report-status-v2 = unpack-status - 1*(command-status-v2) - flush-pkt - - unpack-status = PKT-LINE("unpack" SP unpack-result) - unpack-result = "ok" / error-msg - - command-status-v2 = command-ok-v2 / command-fail - command-ok-v2 = command-ok - *option-line - - command-ok = PKT-LINE("ok" SP refname) - command-fail = PKT-LINE("ng" SP refname SP error-msg) - - error-msg = 1*(OCTET) ; where not "ok" - - option-line = *1(option-refname) - *1(option-old-oid) - *1(option-new-oid) - *1(option-forced-update) - - option-refname = PKT-LINE("option" SP "refname" SP refname) - option-old-oid = PKT-LINE("option" SP "old-oid" SP obj-id) - option-new-oid = PKT-LINE("option" SP "new-oid" SP obj-id) - option-force = PKT-LINE("option" SP "forced-update") - ----- - -Updates can be unsuccessful for a number of reasons. The reference can have -changed since the reference discovery phase was originally sent, meaning -someone pushed in the meantime. The reference being pushed could be a -non-fast-forward reference and the update hooks or configuration could be -set to not allow that, etc. Also, some references can be updated while others -can be rejected. 
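
As a rough illustration of the 'report-status' grammar, the sketch below parses payloads that have already been stripped of their pkt-line length prefix. The function name `parse_report_status` and the return shape are hypothetical, and the 'report-status-v2' option lines are deliberately not handled.

```python
def parse_report_status(lines):
    """Parse de-framed report-status payloads.

    Returns (unpack_result, results). unpack_result is "ok" or an error
    message; results maps each refname to None on success ("ok") or to
    the error message on failure ("ng").
    """
    if not lines:
        raise ValueError("empty report")
    first = lines[0].rstrip("\n")
    if not first.startswith("unpack "):
        raise ValueError("expected unpack-status as the first line")
    unpack_result = first[len("unpack "):]

    results = {}
    for raw in lines[1:]:
        line = raw.rstrip("\n")
        status, _, rest = line.partition(" ")
        if status == "ok":
            results[rest] = None
        elif status == "ng":
            refname, _, error = rest.partition(" ")
            results[refname] = error
        else:
            raise ValueError(f"unexpected status line: {line!r}")
    return unpack_result, results
```

Keeping per-reference results in a mapping reflects the point made above that some references can be updated while others are rejected within the same push.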
- -An example client/server communication might look like this: - ----- - S: 006274730d410fcb6603ace96f1dc55ea6196122532d refs/heads/local\0report-status delete-refs ofs-delta\n - S: 003e7d1665144a3a975c05f1f43902ddaf084e784dbe refs/heads/debug\n - S: 003f74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/master\n - S: 003d74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/team\n - S: 0000 - - C: 00677d1665144a3a975c05f1f43902ddaf084e784dbe 74730d410fcb6603ace96f1dc55ea6196122532d refs/heads/debug\n - C: 006874730d410fcb6603ace96f1dc55ea6196122532d 5a3f6be755bbb7deae50065988cbfa1ffa9ab68a refs/heads/master\n - C: 0000 - C: [PACKDATA] - - S: 000eunpack ok\n - S: 0018ok refs/heads/debug\n - S: 002ang refs/heads/master non-fast-forward\n ----- diff --git a/Documentation/technical/packfile-uri.txt b/Documentation/technical/packfile-uri.txt index 1eb525fe76..9d453d4765 100644 --- a/Documentation/technical/packfile-uri.txt +++ b/Documentation/technical/packfile-uri.txt @@ -18,7 +18,7 @@ a `packfile-uris` argument, the server MAY send a `packfile-uris` section directly before the `packfile` section (right after `wanted-refs` if it is sent) containing URIs of any of the given protocols. The URIs point to packfiles that use only features that the client has declared that it supports -(e.g. ofs-delta and thin-pack). See protocol-v2.txt for the documentation of +(e.g. ofs-delta and thin-pack). See linkgit:gitprotocol-v2[5] for the documentation of this section. Clients should then download and index all the given URIs (in addition to diff --git a/Documentation/technical/partial-clone.txt b/Documentation/technical/partial-clone.txt index 99f0eb3040..92fcee2bff 100644 --- a/Documentation/technical/partial-clone.txt +++ b/Documentation/technical/partial-clone.txt @@ -79,7 +79,7 @@ Design Details upload-pack negotiation. + This uses the existing capability discovery mechanism. -See "filter" in Documentation/technical/pack-protocol.txt. 
+See "filter" in linkgit:gitprotocol-pack[5]. - Clients pass a "filter-spec" to clone and fetch which is passed to the server to request filtering during packfile construction. diff --git a/Documentation/technical/protocol-capabilities.txt b/Documentation/technical/protocol-capabilities.txt deleted file mode 100644 index 9dfade930d..0000000000 --- a/Documentation/technical/protocol-capabilities.txt +++ /dev/null @@ -1,380 +0,0 @@ -Git Protocol Capabilities -========================= - -NOTE: this document describes capabilities for versions 0 and 1 of the pack -protocol. For version 2, please refer to the link:protocol-v2.html[protocol-v2] -doc. - -Servers SHOULD support all capabilities defined in this document. - -On the very first line of the initial server response of either -receive-pack and upload-pack the first reference is followed by -a NUL byte and then a list of space delimited server capabilities. -These allow the server to declare what it can and cannot support -to the client. - -Client will then send a space separated list of capabilities it wants -to be in effect. The client MUST NOT ask for capabilities the server -did not say it supports. - -Server MUST diagnose and abort if capabilities it does not understand -was sent. Server MUST NOT ignore capabilities that client requested -and server advertised. As a consequence of these rules, server MUST -NOT advertise capabilities it does not understand. - -The 'atomic', 'report-status', 'report-status-v2', 'delete-refs', 'quiet', -and 'push-cert' capabilities are sent and recognized by the receive-pack -(push to server) process. - -The 'ofs-delta' and 'side-band-64k' capabilities are sent and recognized -by both upload-pack and receive-pack protocols. The 'agent' and 'session-id' -capabilities may optionally be sent in both protocols. - -All other capabilities are only recognized by the upload-pack (fetch -from server) process. 
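
The first-line layout described above (reference, NUL byte, space-delimited capabilities) could be handled along these lines. This is a sketch under assumptions: the helper names `split_first_ref_line` and `choose_capabilities` are invented for the example, and the payload is assumed to have had its pkt-line length prefix removed already.

```python
def split_first_ref_line(payload: bytes):
    """Split the first advertised line into (obj_id, refname, capabilities)."""
    ref_part, _, cap_part = payload.rstrip(b"\n").partition(b"\0")
    obj_id, _, refname = ref_part.partition(b" ")
    # split() without arguments drops empty entries, so a missing or
    # empty capability list yields [].
    capabilities = cap_part.decode().split()
    return obj_id.decode(), refname.decode(), capabilities


def choose_capabilities(advertised, wanted):
    """Keep only the wanted capabilities that the server advertised.

    The client MUST NOT ask for capabilities the server did not say it
    supports, so anything else is dropped rather than sent.
    """
    return [cap for cap in wanted if cap in advertised]
```

The comparison is an exact string match, so parameterized capabilities such as `agent=X` would need their key extracted first; that step is omitted to keep the sketch short.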
- -multi_ack ---------- - -The 'multi_ack' capability allows the server to return "ACK obj-id -continue" as soon as it finds a commit that it can use as a common -base, between the client's wants and the client's have set. - -By sending this early, the server can potentially head off the client -from walking any further down that particular branch of the client's -repository history. The client may still need to walk down other -branches, sending have lines for those, until the server has a -complete cut across the DAG, or the client has said "done". - -Without multi_ack, a client sends have lines in --date-order until -the server has found a common base. That means the client will send -have lines that are already known by the server to be common, because -they overlap in time with another branch that the server hasn't found -a common base on yet. - -For example suppose the client has commits in caps that the server -doesn't and the server has commits in lower case that the client -doesn't, as in the following diagram: - - +---- u ---------------------- x - / +----- y - / / - a -- b -- c -- d -- E -- F - \ - +--- Q -- R -- S - -If the client wants x,y and starts out by saying have F,S, the server -doesn't know what F,S is. Eventually the client says "have d" and -the server sends "ACK d continue" to let the client know to stop -walking down that line (so don't send c-b-a), but it's not done yet, -it needs a base for x. The client keeps going with S-R-Q, until a -gets reached, at which point the server has a clear base and it all -ends. - -Without multi_ack the client would have sent that c-b-a chain anyway, -interleaved with S-R-Q. - -multi_ack_detailed ------------------- -This is an extension of multi_ack that permits client to better -understand the server's in-memory state. See pack-protocol.txt, -section "Packfile Negotiation" for more information. - -no-done -------- -This capability should only be used with the smart HTTP protocol. 
If -multi_ack_detailed and no-done are both present, then the sender is -free to immediately send a pack following its first "ACK obj-id ready" -message. - -Without no-done in the smart HTTP protocol, the server session would -end and the client has to make another trip to send "done" before -the server can send the pack. no-done removes the last round and -thus slightly reduces latency. - -thin-pack ---------- - -A thin pack is one with deltas which reference base objects not -contained within the pack (but are known to exist at the receiving -end). This can reduce the network traffic significantly, but it -requires the receiving end to know how to "thicken" these packs by -adding the missing bases to the pack. - -The upload-pack server advertises 'thin-pack' when it can generate -and send a thin pack. A client requests the 'thin-pack' capability -when it understands how to "thicken" it, notifying the server that -it can receive such a pack. A client MUST NOT request the -'thin-pack' capability if it cannot turn a thin pack into a -self-contained pack. - -Receive-pack, on the other hand, is assumed by default to be able to -handle thin packs, but can ask the client not to use the feature by -advertising the 'no-thin' capability. A client MUST NOT send a thin -pack if the server advertises the 'no-thin' capability. - -The reasons for this asymmetry are historical. The receive-pack -program did not exist until after the invention of thin packs, so -historically the reference implementation of receive-pack always -understood thin packs. Adding 'no-thin' later allowed receive-pack -to disable the feature in a backwards-compatible manner. - - -side-band, side-band-64k ------------------------- - -This capability means that server can send, and client understand multiplexed -progress reports and error info interleaved with the packfile itself. - -These two options are mutually exclusive. A modern client always -favors 'side-band-64k'. 
- -Either mode indicates that the packfile data will be streamed broken -up into packets of up to either 1000 bytes in the case of 'side_band', -or 65520 bytes in the case of 'side_band_64k'. Each packet is made up -of a leading 4-byte pkt-line length of how much data is in the packet, -followed by a 1-byte stream code, followed by the actual data. - -The stream code can be one of: - - 1 - pack data - 2 - progress messages - 3 - fatal error message just before stream aborts - -The "side-band-64k" capability came about as a way for newer clients -that can handle much larger packets to request packets that are -actually crammed nearly full, while maintaining backward compatibility -for the older clients. - -Further, with side-band and its up to 1000-byte messages, it's actually -999 bytes of payload and 1 byte for the stream code. With side-band-64k, -same deal, you have up to 65519 bytes of data and 1 byte for the stream -code. - -The client MUST send only maximum of one of "side-band" and "side- -band-64k". Server MUST diagnose it as an error if client requests -both. - -ofs-delta ---------- - -Server can send, and client understand PACKv2 with delta referring to -its base by position in pack rather than by an obj-id. That is, they can -send/read OBJ_OFS_DELTA (aka type 6) in a packfile. - -agent ------ - -The server may optionally send a capability of the form `agent=X` to -notify the client that the server is running version `X`. The client may -optionally return its own agent string by responding with an `agent=Y` -capability (but it MUST NOT do so if the server did not mention the -agent capability). The `X` and `Y` strings may contain any printable -ASCII characters except space (i.e., the byte range 32 < x < 127), and -are typically of the form "package/version" (e.g., "git/1.8.3.1"). 
The -agent strings are purely informative for statistics and debugging -purposes, and MUST NOT be used to programmatically assume the presence -or absence of particular features. - -object-format -------------- - -This capability, which takes a hash algorithm as an argument, indicates -that the server supports the given hash algorithms. It may be sent -multiple times; if so, the first one given is the one used in the ref -advertisement. - -When provided by the client, this indicates that it intends to use the -given hash algorithm to communicate. The algorithm provided must be one -that the server supports. - -If this capability is not provided, it is assumed that the only -supported algorithm is SHA-1. - -symref ------- - -This parameterized capability is used to inform the receiver which symbolic ref -points to which ref; for example, "symref=HEAD:refs/heads/master" tells the -receiver that HEAD points to master. This capability can be repeated to -represent multiple symrefs. - -Servers SHOULD include this capability for the HEAD symref if it is one of the -refs being sent. - -Clients MAY use the parameters from this capability to select the proper initial -branch when cloning a repository. - -shallow -------- - -This capability adds "deepen", "shallow" and "unshallow" commands to -the fetch-pack/upload-pack protocol so clients can request shallow -clones. - -deepen-since ------------- - -This capability adds "deepen-since" command to fetch-pack/upload-pack -protocol so the client can request shallow clones that are cut at a -specific time, instead of depth. Internally it's equivalent of doing -"rev-list --max-age=<timestamp>" on the server side. "deepen-since" -cannot be used with "deepen". - -deepen-not ----------- - -This capability adds "deepen-not" command to fetch-pack/upload-pack -protocol so the client can request shallow clones that are cut at a -specific revision, instead of depth. 
Internally it's equivalent of -doing "rev-list --not <rev>" on the server side. "deepen-not" -cannot be used with "deepen", but can be used with "deepen-since". - -deepen-relative ---------------- - -If this capability is requested by the client, the semantics of -"deepen" command is changed. The "depth" argument is the depth from -the current shallow boundary, instead of the depth from remote refs. - -no-progress ------------ - -The client was started with "git clone -q" or something, and doesn't -want that side band 2. Basically the client just says "I do not -wish to receive stream 2 on sideband, so do not send it to me, and if -you did, I will drop it on the floor anyway". However, the sideband -channel 3 is still used for error responses. - -include-tag ------------ - -The 'include-tag' capability is about sending annotated tags if we are -sending objects they point to. If we pack an object to the client, and -a tag object points exactly at that object, we pack the tag object too. -In general this allows a client to get all new annotated tags when it -fetches a branch, in a single network connection. - -Clients MAY always send include-tag, hardcoding it into a request when -the server advertises this capability. The decision for a client to -request include-tag only has to do with the client's desires for tag -data, whether or not a server had advertised objects in the -refs/tags/* namespace. - -Servers MUST pack the tags if their referrant is packed and the client -has requested include-tags. - -Clients MUST be prepared for the case where a server has ignored -include-tag and has not actually sent tags in the pack. In such -cases the client SHOULD issue a subsequent fetch to acquire the tags -that include-tag would have otherwise given the client. - -The server SHOULD send include-tag, if it supports it, regardless -of whether or not there are tags available. 
- -report-status -------------- - -The receive-pack process can receive a 'report-status' capability, -which tells it that the client wants a report of what happened after -a packfile upload and reference update. If the pushing client requests -this capability, after unpacking and updating references the server -will respond with whether the packfile unpacked successfully and if -each reference was updated successfully. If any of those were not -successful, it will send back an error message. See pack-protocol.txt -for example messages. - -report-status-v2 ----------------- - -Capability 'report-status-v2' extends capability 'report-status' by -adding new "option" directives in order to support reference rewritten by -the "proc-receive" hook. The "proc-receive" hook may handle a command -for a pseudo-reference which may create or update a reference with -different name, new-oid, and old-oid. While the capability -'report-status' cannot report for such case. See pack-protocol.txt -for details. - -delete-refs ------------ - -If the server sends back the 'delete-refs' capability, it means that -it is capable of accepting a zero-id value as the target -value of a reference update. It is not sent back by the client, it -simply informs the client that it can be sent zero-id values -to delete references. - -quiet ------ - -If the receive-pack server advertises the 'quiet' capability, it is -capable of silencing human-readable progress output which otherwise may -be shown when processing the received pack. A send-pack client should -respond with the 'quiet' capability to suppress server-side progress -reporting if the local progress reporting is also being suppressed -(e.g., via `push -q`, or if stderr does not go to a tty). - -atomic ------- - -If the server sends the 'atomic' capability it is capable of accepting -atomic pushes. If the pushing client requests this capability, the server -will update the refs in one atomic transaction. 
Either all refs are -updated or none. - -push-options ------------- - -If the server sends the 'push-options' capability it is able to accept -push options after the update commands have been sent, but before the -packfile is streamed. If the pushing client requests this capability, -the server will pass the options to the pre- and post- receive hooks -that process this push request. - -allow-tip-sha1-in-want ----------------------- - -If the upload-pack server advertises this capability, fetch-pack may -send "want" lines with object names that exist at the server but are not -advertised by upload-pack. For historical reasons, the name of this -capability contains "sha1". Object names are always given using the -object format negotiated through the 'object-format' capability. - -allow-reachable-sha1-in-want ----------------------------- - -If the upload-pack server advertises this capability, fetch-pack may -send "want" lines with object names that exist at the server but are not -advertised by upload-pack. For historical reasons, the name of this -capability contains "sha1". Object names are always given using the -object format negotiated through the 'object-format' capability. - -push-cert=<nonce> ------------------ - -The receive-pack server that advertises this capability is willing -to accept a signed push certificate, and asks the <nonce> to be -included in the push certificate. A send-pack client MUST NOT -send a push-cert packet unless the receive-pack server advertises -this capability. - -filter ------- - -If the upload-pack server advertises the 'filter' capability, -fetch-pack may send "filter" commands to request a partial clone -or partial fetch and request that the server omit various objects -from the packfile. - -session-id=<session id> ------------------------ - -The server may advertise a session ID that can be used to identify this process -across multiple requests. The client may advertise its own session ID back to -the server as well. 
- -Session IDs should be unique to a given process. They must fit within a -packet-line, and must not contain non-printable or whitespace characters. The -current implementation uses trace2 session IDs (see -link:api-trace2.html[api-trace2] for details), but this may change and users of -the session ID should not rely on this fact. diff --git a/Documentation/technical/protocol-common.txt b/Documentation/technical/protocol-common.txt deleted file mode 100644 index ecedb34bba..0000000000 --- a/Documentation/technical/protocol-common.txt +++ /dev/null @@ -1,99 +0,0 @@ -Documentation Common to Pack and Http Protocols -=============================================== - -ABNF Notation -------------- - -ABNF notation as described by RFC 5234 is used within the protocol documents, -except the following replacement core rules are used: ----- - HEXDIG = DIGIT / "a" / "b" / "c" / "d" / "e" / "f" ----- - -We also define the following common rules: ----- - NUL = %x00 - zero-id = 40*"0" - obj-id = 40*(HEXDIGIT) - - refname = "HEAD" - refname /= "refs/" <see discussion below> ----- - -A refname is a hierarchical octet string beginning with "refs/" and -not violating the 'git-check-ref-format' command's validation rules. -More specifically, they: - -. They can include slash `/` for hierarchical (directory) - grouping, but no slash-separated component can begin with a - dot `.`. - -. They must contain at least one `/`. This enforces the presence of a - category like `heads/`, `tags/` etc. but the actual names are not - restricted. - -. They cannot have two consecutive dots `..` anywhere. - -. They cannot have ASCII control characters (i.e. bytes whose - values are lower than \040, or \177 `DEL`), space, tilde `~`, - caret `^`, colon `:`, question-mark `?`, asterisk `*`, - or open bracket `[` anywhere. - -. They cannot end with a slash `/` or a dot `.`. - -. They cannot end with the sequence `.lock`. - -. They cannot contain a sequence `@{`. - -. They cannot contain a `\\`. 
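
The refname rules listed above can be approximated with a small check. The sketch below covers only the rules as written in this list and is not a substitute for 'git-check-ref-format'; the function name `is_valid_refname` and the constant `FORBIDDEN_CHARS` are hypothetical.

```python
# space, tilde, caret, colon, question-mark, asterisk, open bracket, backslash
FORBIDDEN_CHARS = set(" ~^:?*[\\")


def is_valid_refname(name: str) -> bool:
    """Check a refname against the rules listed in this section only."""
    if name == "HEAD":
        return True
    # Requiring the "refs/" prefix also guarantees at least one "/".
    if not name.startswith("refs/"):
        return False
    for ch in name:
        # ASCII control characters: below octal 040, or octal 177 (DEL).
        if ord(ch) < 0o40 or ord(ch) == 0o177 or ch in FORBIDDEN_CHARS:
            return False
    if ".." in name or "@{" in name:
        return False
    if name.endswith(("/", ".", ".lock")):
        return False
    # No slash-separated component may begin with a dot.
    if any(part.startswith(".") for part in name.split("/")):
        return False
    return True
```

Each condition maps to one bullet of the list, which makes it easy to see which rule rejected a given name when debugging.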
- - -pkt-line Format ---------------- - -Much (but not all) of the payload is described around pkt-lines. - -A pkt-line is a variable length binary string. The first four bytes -of the line, the pkt-len, indicates the total length of the line, -in hexadecimal. The pkt-len includes the 4 bytes used to contain -the length's hexadecimal representation. - -A pkt-line MAY contain binary data, so implementors MUST ensure -pkt-line parsing/formatting routines are 8-bit clean. - -A non-binary line SHOULD BE terminated by an LF, which if present -MUST be included in the total length. Receivers MUST treat pkt-lines -with non-binary data the same whether or not they contain the trailing -LF (stripping the LF if present, and not complaining when it is -missing). - -The maximum length of a pkt-line's data component is 65516 bytes. -Implementations MUST NOT send pkt-line whose length exceeds 65520 -(65516 bytes of payload + 4 bytes of length data). - -Implementations SHOULD NOT send an empty pkt-line ("0004"). - -A pkt-line with a length field of 0 ("0000"), called a flush-pkt, -is a special case and MUST be handled differently than an empty -pkt-line ("0004"). - ----- - pkt-line = data-pkt / flush-pkt - - data-pkt = pkt-len pkt-payload - pkt-len = 4*(HEXDIG) - pkt-payload = (pkt-len - 4)*(OCTET) - - flush-pkt = "0000" ----- - -Examples (as C-style strings): - ----- - pkt-line actual value - --------------------------------- - "0006a\n" "a\n" - "0005a" "a" - "000bfoobar\n" "foobar\n" - "0004" "" ----- diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt deleted file mode 100644 index 8a877d27e2..0000000000 --- a/Documentation/technical/protocol-v2.txt +++ /dev/null @@ -1,568 +0,0 @@ -Git Wire Protocol, Version 2 -============================ - -This document presents a specification for a version 2 of Git's wire -protocol. 
Protocol v2 will improve upon v1 in the following ways: - - * Instead of multiple service names, multiple commands will be - supported by a single service - * Easily extendable as capabilities are moved into their own section - of the protocol, no longer being hidden behind a NUL byte and - limited by the size of a pkt-line - * Separate out other information hidden behind NUL bytes (e.g. agent - string as a capability and symrefs can be requested using 'ls-refs') - * Reference advertisement will be omitted unless explicitly requested - * ls-refs command to explicitly request some refs - * Designed with http and stateless-rpc in mind. With clear flush - semantics the http remote helper can simply act as a proxy - -In protocol v2 communication is command oriented. When first contacting a -server a list of capabilities will advertised. Some of these capabilities -will be commands which a client can request be executed. Once a command -has completed, a client can reuse the connection and request that other -commands be executed. - -Packet-Line Framing -------------------- - -All communication is done using packet-line framing, just as in v1. See -`Documentation/technical/pack-protocol.txt` and -`Documentation/technical/protocol-common.txt` for more information. - -In protocol v2 these special packets will have the following semantics: - - * '0000' Flush Packet (flush-pkt) - indicates the end of a message - * '0001' Delimiter Packet (delim-pkt) - separates sections of a message - * '0002' Response End Packet (response-end-pkt) - indicates the end of a - response for stateless connections - -Initial Client Request ----------------------- - -In general a client can request to speak protocol v2 by sending -`version=2` through the respective side-channel for the transport being -used which inevitably sets `GIT_PROTOCOL`. More information can be -found in `pack-protocol.txt` and `http-protocol.txt`, as well as the -`GIT_PROTOCOL` definition in `git.txt`. 
In all cases the -response from the server is the capability advertisement. - -Git Transport -~~~~~~~~~~~~~ - -When using the git:// transport, you can request to use protocol v2 by -sending "version=2" as an extra parameter: - - 003egit-upload-pack /project.git\0host=myserver.com\0\0version=2\0 - -SSH and File Transport -~~~~~~~~~~~~~~~~~~~~~~ - -When using either the ssh:// or file:// transport, the GIT_PROTOCOL -environment variable must be set explicitly to include "version=2". -The server may need to be configured to allow this environment variable -to pass. - -HTTP Transport -~~~~~~~~~~~~~~ - -When using the http:// or https:// transport a client makes a "smart" -info/refs request as described in `http-protocol.txt` and requests that -v2 be used by supplying "version=2" in the `Git-Protocol` header. - - C: GET $GIT_URL/info/refs?service=git-upload-pack HTTP/1.0 - C: Git-Protocol: version=2 - -A v2 server would reply: - - S: 200 OK - S: <Some headers> - S: ... - S: - S: 000eversion 2\n - S: <capability-advertisement> - -Subsequent requests are then made directly to the service -`$GIT_URL/git-upload-pack`. (This works the same for git-receive-pack). - -Uses the `--http-backend-info-refs` option to -linkgit:git-upload-pack[1]. - -The server may need to be configured to pass this header's contents via -the `GIT_PROTOCOL` variable. See the discussion in `git-http-backend.txt`. - -Capability Advertisement ------------------------- - -A server which decides to communicate (based on a request from a client) -using protocol version 2, notifies the client by sending a version string -in its initial response followed by an advertisement of its capabilities. -Each capability is a key with an optional value. Clients must ignore all -unknown keys. Semantics of unknown values are left to the definition of -each key. Some capabilities will describe commands which can be requested -to be executed by the client. 
- - capability-advertisement = protocol-version - capability-list - flush-pkt - - protocol-version = PKT-LINE("version 2" LF) - capability-list = *capability - capability = PKT-LINE(key[=value] LF) - - key = 1*(ALPHA | DIGIT | "-_") - value = 1*(ALPHA | DIGIT | " -_.,?\/{}[]()<>!@#$%^&*+=:;") - -Command Request ---------------- - -After receiving the capability advertisement, a client can then issue a -request to select the command it wants with any particular capabilities -or arguments. There is then an optional section where the client can -provide any command specific parameters or queries. Only a single -command can be requested at a time. - - request = empty-request | command-request - empty-request = flush-pkt - command-request = command - capability-list - delim-pkt - command-args - flush-pkt - command = PKT-LINE("command=" key LF) - command-args = *command-specific-arg - - command-specific-args are packet line framed arguments defined by - each individual command. - -The server will then check to ensure that the client's request is -comprised of a valid command as well as valid capabilities which were -advertised. If the request is valid the server will then execute the -command. A server MUST wait till it has received the client's entire -request before issuing a response. The format of the response is -determined by the command being executed, but in all cases a flush-pkt -indicates the end of the response. - -When a command has finished, and the client has received the entire -response from the server, a client can either request that another -command be executed or can terminate the connection. A client may -optionally send an empty request consisting of just a flush-pkt to -indicate that no more requests will be made. 
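
To make the command-request grammar concrete, here is a minimal sketch that serializes one request exactly as the grammar above is written (command, capability list, delim-pkt, arguments, flush-pkt). The helper names `pkt_line` and `command_request` are assumptions for illustration; the sketch does not check that the server actually advertised the command or capabilities.

```python
FLUSH_PKT = b"0000"  # indicates the end of a message
DELIM_PKT = b"0001"  # separates sections of a message


def pkt_line(payload: bytes) -> bytes:
    """Frame a payload: 4 hex digits of total length, then the payload."""
    return b"%04x" % (len(payload) + 4) + payload


def command_request(command, capabilities=(), args=()):
    """Serialize a protocol v2 command-request as a single byte string."""
    out = [pkt_line(b"command=" + command.encode() + b"\n")]
    out += [pkt_line(cap.encode() + b"\n") for cap in capabilities]
    out.append(DELIM_PKT)
    out += [pkt_line(arg.encode() + b"\n") for arg in args]
    out.append(FLUSH_PKT)
    return b"".join(out)
```

For instance, `command_request("ls-refs", ["agent=git/2.0"], ["peel"])` would produce one framed request ending in a flush-pkt; the agent value here is a made-up placeholder.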
- -Capabilities ------------- - -There are two different types of capabilities: normal capabilities, -which can be used to convey information or alter the behavior of a -request, and commands, which are the core actions that a client wants to -perform (fetch, push, etc). - -Protocol version 2 is stateless by default. This means that all commands -must only last a single round and be stateless from the perspective of the -server side, unless the client has requested a capability indicating that -state should be maintained by the server. Clients MUST NOT require state -management on the server side in order to function correctly. This -permits simple round-robin load-balancing on the server side, without -needing to worry about state management. - -agent -~~~~~ - -The server can advertise the `agent` capability with a value `X` (in the -form `agent=X`) to notify the client that the server is running version -`X`. The client may optionally send its own agent string by including -the `agent` capability with a value `Y` (in the form `agent=Y`) in its -request to the server (but it MUST NOT do so if the server did not -advertise the agent capability). The `X` and `Y` strings may contain any -printable ASCII characters except space (i.e., the byte range 32 < x < -127), and are typically of the form "package/version" (e.g., -"git/1.8.3.1"). The agent strings are purely informative for statistics -and debugging purposes, and MUST NOT be used to programmatically assume -the presence or absence of particular features. - -ls-refs -~~~~~~~ - -`ls-refs` is the command used to request a reference advertisement in v2. -Unlike the current reference advertisement, ls-refs takes in arguments -which can be used to limit the refs sent from the server. 
- -Additional features not supported in the base command will be advertised -as the value of the command in the capability advertisement in the form -of a space separated list of features: "<command>=<feature 1> <feature 2>" - -ls-refs takes in the following arguments: - - symrefs - In addition to the object pointed by it, show the underlying ref - pointed by it when showing a symbolic ref. - peel - Show peeled tags. - ref-prefix <prefix> - When specified, only references having a prefix matching one of - the provided prefixes are displayed. Multiple instances may be - given, in which case references matching any prefix will be - shown. Note that this is purely for optimization; a server MAY - show refs not matching the prefix if it chooses, and clients - should filter the result themselves. - -If the 'unborn' feature is advertised the following argument can be -included in the client's request. - - unborn - The server will send information about HEAD even if it is a symref - pointing to an unborn branch in the form "unborn HEAD - symref-target:<target>". - -The output of ls-refs is as follows: - - output = *ref - flush-pkt - obj-id-or-unborn = (obj-id | "unborn") - ref = PKT-LINE(obj-id-or-unborn SP refname *(SP ref-attribute) LF) - ref-attribute = (symref | peeled) - symref = "symref-target:" symref-target - peeled = "peeled:" obj-id - -fetch -~~~~~ - -`fetch` is the command used to fetch a packfile in v2. It can be looked -at as a modified version of the v1 fetch where the ref-advertisement is -stripped out (since the `ls-refs` command fills that role) and the -message format is tweaked to eliminate redundancies and permit easy -addition of future extensions. 
- -Additional features not supported in the base command will be advertised -as the value of the command in the capability advertisement in the form -of a space separated list of features: "<command>=<feature 1> <feature 2>" - -A `fetch` request can take the following arguments: - - want <oid> - Indicates to the server an object which the client wants to - retrieve. Wants can be anything and are not limited to - advertised objects. - - have <oid> - Indicates to the server an object which the client has locally. - This allows the server to make a packfile which only contains - the objects that the client needs. Multiple 'have' lines can be - supplied. - - done - Indicates to the server that negotiation should terminate (or - not even begin if performing a clone) and that the server should - use the information supplied in the request to construct the - packfile. - - thin-pack - Request that a thin pack be sent, which is a pack with deltas - which reference base objects not contained within the pack (but - are known to exist at the receiving end). This can reduce the - network traffic significantly, but it requires the receiving end - to know how to "thicken" these packs by adding the missing bases - to the pack. - - no-progress - Request that progress information that would normally be sent on - side-band channel 2, during the packfile transfer, should not be - sent. However, the side-band channel 3 is still used for error - responses. - - include-tag - Request that annotated tags should be sent if the objects they - point to are being sent. - - ofs-delta - Indicate that the client understands PACKv2 with delta referring - to its base by position in pack rather than by an oid. That is, - they can read OBJ_OFS_DELTA (aka type 6) in a packfile. - -If the 'shallow' feature is advertised the following arguments can be -included in the clients request as well as the potential addition of the -'shallow-info' section in the server's response as explained below. 
- - shallow <oid> - A client must notify the server of all commits for which it only - has shallow copies (meaning that it doesn't have the parents of - a commit) by supplying a 'shallow <oid>' line for each such - object so that the server is aware of the limitations of the - client's history. This is so that the server is aware that the - client may not have all objects reachable from such commits. - - deepen <depth> - Requests that the fetch/clone should be shallow having a commit - depth of <depth> relative to the remote side. - - deepen-relative - Requests that the semantics of the "deepen" command be changed - to indicate that the depth requested is relative to the client's - current shallow boundary, instead of relative to the requested - commits. - - deepen-since <timestamp> - Requests that the shallow clone/fetch should be cut at a - specific time, instead of depth. Internally it's equivalent to - doing "git rev-list --max-age=<timestamp>". Cannot be used with - "deepen". - - deepen-not <rev> - Requests that the shallow clone/fetch should be cut at a - specific revision specified by '<rev>', instead of a depth. - Internally it's equivalent of doing "git rev-list --not <rev>". - Cannot be used with "deepen", but can be used with - "deepen-since". - -If the 'filter' feature is advertised, the following argument can be -included in the client's request: - - filter <filter-spec> - Request that various objects from the packfile be omitted - using one of several filtering techniques. These are intended - for use with partial clone and partial fetch operations. See - `rev-list` for possible "filter-spec" values. When communicating - with other processes, senders SHOULD translate scaled integers - (e.g. "1k") into a fully-expanded form (e.g. "1024") to aid - interoperability with older receivers that may not understand - newly-invented scaling suffixes. 
However, receivers SHOULD - accept the following suffixes: 'k', 'm', and 'g' for 1024, - 1048576, and 1073741824, respectively. - -If the 'ref-in-want' feature is advertised, the following argument can -be included in the client's request as well as the potential addition of -the 'wanted-refs' section in the server's response as explained below. - - want-ref <ref> - Indicates to the server that the client wants to retrieve a - particular ref, where <ref> is the full name of a ref on the - server. - -If the 'sideband-all' feature is advertised, the following argument can be -included in the client's request: - - sideband-all - Instruct the server to send the whole response multiplexed, not just - the packfile section. All non-flush and non-delim PKT-LINE in the - response (not only in the packfile section) will then start with a byte - indicating its sideband (1, 2, or 3), and the server may send "0005\2" - (a PKT-LINE of sideband 2 with no payload) as a keepalive packet. - -If the 'packfile-uris' feature is advertised, the following argument -can be included in the client's request as well as the potential -addition of the 'packfile-uris' section in the server's response as -explained below. - - packfile-uris <comma-separated list of protocols> - Indicates to the server that the client is willing to receive - URIs of any of the given protocols in place of objects in the - sent packfile. Before performing the connectivity check, the - client should download from all given URIs. Currently, the - protocols supported are "http" and "https". - -If the 'wait-for-done' feature is advertised, the following argument -can be included in the client's request. - - wait-for-done - Indicates to the server that it should never send "ready", but - should wait for the client to say "done" before sending the - packfile. - -The response of `fetch` is broken into a number of sections separated by -delimiter packets (0001), with each section beginning with its section -header. 
Most sections are sent only when the packfile is sent. - - output = acknowledgments flush-pkt | - [acknowledgments delim-pkt] [shallow-info delim-pkt] - [wanted-refs delim-pkt] [packfile-uris delim-pkt] - packfile flush-pkt - - acknowledgments = PKT-LINE("acknowledgments" LF) - (nak | *ack) - (ready) - ready = PKT-LINE("ready" LF) - nak = PKT-LINE("NAK" LF) - ack = PKT-LINE("ACK" SP obj-id LF) - - shallow-info = PKT-LINE("shallow-info" LF) - *PKT-LINE((shallow | unshallow) LF) - shallow = "shallow" SP obj-id - unshallow = "unshallow" SP obj-id - - wanted-refs = PKT-LINE("wanted-refs" LF) - *PKT-LINE(wanted-ref LF) - wanted-ref = obj-id SP refname - - packfile-uris = PKT-LINE("packfile-uris" LF) *packfile-uri - packfile-uri = PKT-LINE(40*(HEXDIGIT) SP *%x20-ff LF) - - packfile = PKT-LINE("packfile" LF) - *PKT-LINE(%x01-03 *%x00-ff) - - acknowledgments section - * If the client determines that it is finished with negotiations by - sending a "done" line (thus requiring the server to send a packfile), - the acknowledgments section MUST be omitted from the server's - response. - - * Always begins with the section header "acknowledgments" - - * The server will respond with "NAK" if none of the object ids sent - as have lines were common. - - * The server will respond with "ACK obj-id" for all of the - object ids sent as have lines which are common. - - * A response cannot have both "ACK" lines as well as a "NAK" - line. - - * The server will respond with a "ready" line indicating that - the server has found an acceptable common base and is ready to - make and send a packfile (which will be found in the packfile - section of the same response) - - * If the server has found a suitable cut point and has decided - to send a "ready" line, then the server can decide to (as an - optimization) omit any "ACK" lines it would have sent during - its response.
This is because the server will have already - determined the objects it plans to send to the client and no - further negotiation is needed. - - shallow-info section - * If the client has requested a shallow fetch/clone, a shallow - client requests a fetch or the server is shallow then the - server's response may include a shallow-info section. The - shallow-info section will be included if (due to one of the - above conditions) the server needs to inform the client of any - shallow boundaries or adjustments to the clients already - existing shallow boundaries. - - * Always begins with the section header "shallow-info" - - * If a positive depth is requested, the server will compute the - set of commits which are no deeper than the desired depth. - - * The server sends a "shallow obj-id" line for each commit whose - parents will not be sent in the following packfile. - - * The server sends an "unshallow obj-id" line for each commit - which the client has indicated is shallow, but is no longer - shallow as a result of the fetch (due to its parents being - sent in the following packfile). - - * The server MUST NOT send any "unshallow" lines for anything - which the client has not indicated was shallow as a part of - its request. - - wanted-refs section - * This section is only included if the client has requested a - ref using a 'want-ref' line and if a packfile section is also - included in the response. - - * Always begins with the section header "wanted-refs". - - * The server will send a ref listing ("<oid> <refname>") for - each reference requested using 'want-ref' lines. - - * The server MUST NOT send any refs which were not requested - using 'want-ref' lines. - - packfile-uris section - * This section is only included if the client sent - 'packfile-uris' and the server has at least one such URI to - send. - - * Always begins with the section header "packfile-uris". 
- - * For each URI the server sends, it sends a hash of the pack's - contents (as output by git index-pack) followed by the URI. - - * The hashes are 40 hex characters long. When Git upgrades to a new - hash algorithm, this might need to be updated. (It should match - whatever index-pack outputs after "pack\t" or "keep\t".) - - packfile section - * This section is only included if the client has sent 'want' - lines in its request and either requested that no more - negotiation be done by sending 'done' or if the server has - decided it has found a sufficient cut point to produce a - packfile. - - * Always begins with the section header "packfile" - - * The transmission of the packfile begins immediately after the - section header - - * The data transfer of the packfile is always multiplexed, using - the same semantics as the 'side-band-64k' capability from - protocol version 1. This means that each packet, during the - packfile data stream, is made up of a leading 4-byte pkt-line - length (typical of the pkt-line format), followed by a 1-byte - stream code, followed by the actual data. - - The stream code can be one of: - 1 - pack data - 2 - progress messages - 3 - fatal error message just before stream aborts - -server-option -~~~~~~~~~~~~~ - -If advertised, indicates that any number of server specific options can be -included in a request. This is done by sending each option as a -"server-option=<option>" capability line in the capability-list section of -a request. - -The provided options must not contain a NUL or LF character. - -object-format -~~~~~~~~~~~~~ - -The server can advertise the `object-format` capability with a value `X` (in the -form `object-format=X`) to notify the client that the server is able to deal -with objects using hash algorithm X. If not specified, the server is assumed to -only handle SHA-1. If the client would like to use a hash algorithm other than -SHA-1, it should specify its object-format string.
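The side-band framing that the packfile section of a `fetch` response uses (a 4-byte pkt-line length, a 1-byte stream code, then data) might be demultiplexed roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, the input is assumed to start right after the "packfile" section header, and the whole response is assumed to be in memory already.

```python
def read_pkt_lines(stream: bytes):
    """Yield each pkt-line payload; yield None when a flush-pkt (0000) is seen."""
    pos = 0
    while pos < len(stream):
        length = int(stream[pos:pos + 4], 16)
        if length == 0:
            yield None  # flush-pkt: end of the response
            pos += 4
        elif length < 4:
            # Other special packets (e.g. delim-pkt) are not expected here.
            raise ValueError(f"unexpected special packet {length:04x}")
        else:
            yield stream[pos + 4:pos + length]
            pos += length


def demux_packfile(stream: bytes):
    """Split side-band packets into (pack data, progress text)."""
    pack, progress = bytearray(), bytearray()
    for payload in read_pkt_lines(stream):
        if payload is None:
            break
        band, data = payload[0], payload[1:]
        if band == 1:    # stream code 1: pack data
            pack += data
        elif band == 2:  # stream code 2: progress messages
            progress += data
        elif band == 3:  # stream code 3: fatal error, the stream aborts
            raise RuntimeError(data.decode("utf-8", "replace"))
        else:
            raise ValueError(f"unknown stream code {band}")
    return bytes(pack), bytes(progress)
```

A real client would read from a socket incrementally and feed stream 1 to its pack indexer rather than buffering everything.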
- -session-id=<session id> -~~~~~~~~~~~~~~~~~~~~~~~ - -The server may advertise a session ID that can be used to identify this process -across multiple requests. The client may advertise its own session ID back to -the server as well. - -Session IDs should be unique to a given process. They must fit within a -packet-line, and must not contain non-printable or whitespace characters. The -current implementation uses trace2 session IDs (see -link:api-trace2.html[api-trace2] for details), but this may change and users of -the session ID should not rely on this fact. - -object-info -~~~~~~~~~~~ - -`object-info` is the command to retrieve information about one or more objects. -Its main purpose is to allow a client to make decisions based on this -information without having to fully fetch objects. Object size is the only -information that is currently supported. - -An `object-info` request takes the following arguments: - - size - Requests size information to be returned for each listed object id. - - oid <oid> - Indicates to the server an object which the client wants to obtain - information for. - -The response of `object-info` is a list of the requested object ids -and associated requested information, each separated by a single space. - - output = info flush-pkt - - info = PKT-LINE(attrs LF) - *PKT-LINE(obj-info LF) - - attrs = attr | attrs SP attrs - - attr = "size" - - obj-info = obj-id SP obj-size diff --git a/Documentation/technical/signature-format.txt b/Documentation/technical/signature-format.txt deleted file mode 100644 index 166721be6f..0000000000 --- a/Documentation/technical/signature-format.txt +++ /dev/null @@ -1,202 +0,0 @@ -Git signature format -==================== - -== Overview - -Git uses cryptographic signatures in various places, currently objects (tags, -commits, mergetags) and transactions (pushes).
In every case, the command which -is about to create an object or transaction determines a payload from that, -calls gpg to obtain a detached signature for the payload (`gpg -bsa`) and -embeds the signature into the object or transaction. - -Signatures always begin with `-----BEGIN PGP SIGNATURE-----` -and end with `-----END PGP SIGNATURE-----`, unless gpg is told to -produce RFC1991 signatures which use `MESSAGE` instead of `SIGNATURE`. - -Signatures sometimes appear as a part of the normal payload -(e.g. a signed tag has the signature block appended after the payload -that the signature applies to), and sometimes appear in the value of -an object header (e.g. a merge commit that merged a signed tag would -have the entire tag contents on its "mergetag" header). In the case -of the latter, the usual multi-line formatting rule for object -headers applies. I.e. the second and subsequent lines are prefixed -with a SP to signal that the line is continued from the previous -line. - -This is even true for an originally empty line. In the following -examples, the end of line that ends with a whitespace letter is -highlighted with a `$` sign; if you are trying to recreate these -examples by hand, do not cut and paste them---they are there -primarily to highlight extra whitespace at the end of some lines. - -The signed payload and the way the signature is embedded depend -on the type of the object or transaction.
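The multi-line header rule from the overview above (every continuation line gets a leading SP, including lines that were originally empty) can be illustrated with a short Python sketch. The helper names `fold_header` and `unfold_header` are hypothetical and are not part of Git; the signature body is a placeholder, not real signature data.

```python
def fold_header(name: str, value: str) -> str:
    """Embed a multi-line value as an object header entry."""
    first, *rest = value.split("\n")
    # Every continuation line is prefixed with one space, even an empty one.
    return "\n".join([f"{name} {first}"] + [" " + line for line in rest])


def unfold_header(folded: str, name: str) -> str:
    """Recover the original value from a folded header entry."""
    first, *rest = folded.split("\n")
    if not first.startswith(name + " "):
        raise ValueError(f"not a {name!r} header")
    # Drop the header name from the first line and one SP from each later line.
    return "\n".join([first[len(name) + 1:]] + [line[1:] for line in rest])


# Placeholder signature; the value is passed without a trailing newline.
signature = "\n".join([
    "-----BEGIN PGP SIGNATURE-----",
    "Version: GnuPG v1",
    "",
    "<base64 data>",
    "-----END PGP SIGNATURE-----",
])
folded = fold_header("gpgsig", signature)
```

In `folded`, the originally empty line becomes a line holding a single space, which is the trailing whitespace the examples mark with `$`.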
- -== Tag signatures - -- created by: `git tag -s` -- payload: annotated tag object -- embedding: append the signature to the unsigned tag object -- example: tag `signedtag` with subject `signed tag` - ----- -object 04b871796dc0420f8e7561a895b52484b701d51a -type commit -tag signedtag -tagger C O Mitter <committer@example.com> 1465981006 +0000 - -signed tag - -signed tag message body ------BEGIN PGP SIGNATURE----- -Version: GnuPG v1 - -iQEcBAABAgAGBQJXYRhOAAoJEGEJLoW3InGJklkIAIcnhL7RwEb/+QeX9enkXhxn -rxfdqrvWd1K80sl2TOt8Bg/NYwrUBw/RWJ+sg/hhHp4WtvE1HDGHlkEz3y11Lkuh -8tSxS3qKTxXUGozyPGuE90sJfExhZlW4knIQ1wt/yWqM+33E9pN4hzPqLwyrdods -q8FWEqPPUbSJXoMbRPw04S5jrLtZSsUWbRYjmJCHzlhSfFWW4eFd37uquIaLUBS0 -rkC3Jrx7420jkIpgFcTI2s60uhSQLzgcCwdA2ukSYIRnjg/zDkj8+3h/GaROJ72x -lZyI6HWixKJkWw8lE9aAOD9TmTW9sFJwcVAzmAuFX2kUreDUKMZduGcoRYGpD7E= -=jpXa ------END PGP SIGNATURE----- ----- - -- verify with: `git verify-tag [-v]` or `git tag -v` - ----- -gpg: Signature made Wed Jun 15 10:56:46 2016 CEST using RSA key ID B7227189 -gpg: Good signature from "Eris Discordia <discord@example.net>" -gpg: WARNING: This key is not certified with a trusted signature! -gpg: There is no indication that the signature belongs to the owner. 
-Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA 29A4 6109 2E85 B722 7189 -object 04b871796dc0420f8e7561a895b52484b701d51a -type commit -tag signedtag -tagger C O Mitter <committer@example.com> 1465981006 +0000 - -signed tag - -signed tag message body ----- - -== Commit signatures - -- created by: `git commit -S` -- payload: commit object -- embedding: header entry `gpgsig` - (content is preceded by a space) -- example: commit with subject `signed commit` - ----- -tree eebfed94e75e7760540d1485c740902590a00332 -parent 04b871796dc0420f8e7561a895b52484b701d51a -author A U Thor <author@example.com> 1465981137 +0000 -committer C O Mitter <committer@example.com> 1465981137 +0000 -gpgsig -----BEGIN PGP SIGNATURE----- - Version: GnuPG v1 - $ - iQEcBAABAgAGBQJXYRjRAAoJEGEJLoW3InGJ3IwIAIY4SA6GxY3BjL60YyvsJPh/ - HRCJwH+w7wt3Yc/9/bW2F+gF72kdHOOs2jfv+OZhq0q4OAN6fvVSczISY/82LpS7 - DVdMQj2/YcHDT4xrDNBnXnviDO9G7am/9OE77kEbXrp7QPxvhjkicHNwy2rEflAA - zn075rtEERDHr8nRYiDh8eVrefSO7D+bdQ7gv+7GsYMsd2auJWi1dHOSfTr9HIF4 - HJhWXT9d2f8W+diRYXGh4X0wYiGg6na/soXc+vdtDYBzIxanRqjg8jCAeo1eOTk1 - EdTwhcTZlI0x5pvJ3H0+4hA2jtldVtmPM4OTB0cTrEWBad7XV6YgiyuII73Ve3I= - =jKHM - -----END PGP SIGNATURE----- - -signed commit - -signed commit message body ----- - -- verify with: `git verify-commit [-v]` (or `git show --show-signature`) - ----- -gpg: Signature made Wed Jun 15 10:58:57 2016 CEST using RSA key ID B7227189 -gpg: Good signature from "Eris Discordia <discord@example.net>" -gpg: WARNING: This key is not certified with a trusted signature! -gpg: There is no indication that the signature belongs to the owner. 
-Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA 29A4 6109 2E85 B722 7189 -tree eebfed94e75e7760540d1485c740902590a00332 -parent 04b871796dc0420f8e7561a895b52484b701d51a -author A U Thor <author@example.com> 1465981137 +0000 -committer C O Mitter <committer@example.com> 1465981137 +0000 - -signed commit - -signed commit message body ----- - -== Mergetag signatures - -- created by: `git merge` on signed tag -- payload/embedding: the whole signed tag object is embedded into - the (merge) commit object as header entry `mergetag` -- example: merge of the signed tag `signedtag` as above - ----- -tree c7b1cff039a93f3600a1d18b82d26688668c7dea -parent c33429be94b5f2d3ee9b0adad223f877f174b05d -parent 04b871796dc0420f8e7561a895b52484b701d51a -author A U Thor <author@example.com> 1465982009 +0000 -committer C O Mitter <committer@example.com> 1465982009 +0000 -mergetag object 04b871796dc0420f8e7561a895b52484b701d51a - type commit - tag signedtag - tagger C O Mitter <committer@example.com> 1465981006 +0000 - $ - signed tag - $ - signed tag message body - -----BEGIN PGP SIGNATURE----- - Version: GnuPG v1 - $ - iQEcBAABAgAGBQJXYRhOAAoJEGEJLoW3InGJklkIAIcnhL7RwEb/+QeX9enkXhxn - rxfdqrvWd1K80sl2TOt8Bg/NYwrUBw/RWJ+sg/hhHp4WtvE1HDGHlkEz3y11Lkuh - 8tSxS3qKTxXUGozyPGuE90sJfExhZlW4knIQ1wt/yWqM+33E9pN4hzPqLwyrdods - q8FWEqPPUbSJXoMbRPw04S5jrLtZSsUWbRYjmJCHzlhSfFWW4eFd37uquIaLUBS0 - rkC3Jrx7420jkIpgFcTI2s60uhSQLzgcCwdA2ukSYIRnjg/zDkj8+3h/GaROJ72x - lZyI6HWixKJkWw8lE9aAOD9TmTW9sFJwcVAzmAuFX2kUreDUKMZduGcoRYGpD7E= - =jpXa - -----END PGP SIGNATURE----- - -Merge tag 'signedtag' into downstream - -signed tag - -signed tag message body - -# gpg: Signature made Wed Jun 15 08:56:46 2016 UTC using RSA key ID B7227189 -# gpg: Good signature from "Eris Discordia <discord@example.net>" -# gpg: WARNING: This key is not certified with a trusted signature! -# gpg: There is no indication that the signature belongs to the owner. 
-# Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA 29A4 6109 2E85 B722 7189 ----- - -- verify with: verification is embedded in merge commit message by default, - alternatively with `git show --show-signature`: - ----- -commit 9863f0c76ff78712b6800e199a46aa56afbcbd49 -merged tag 'signedtag' -gpg: Signature made Wed Jun 15 10:56:46 2016 CEST using RSA key ID B7227189 -gpg: Good signature from "Eris Discordia <discord@example.net>" -gpg: WARNING: This key is not certified with a trusted signature! -gpg: There is no indication that the signature belongs to the owner. -Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA 29A4 6109 2E85 B722 7189 -Merge: c33429b 04b8717 -Author: A U Thor <author@example.com> -Date: Wed Jun 15 09:13:29 2016 +0000 - - Merge tag 'signedtag' into downstream - - signed tag - - signed tag message body - - # gpg: Signature made Wed Jun 15 08:56:46 2016 UTC using RSA key ID B7227189 - # gpg: Good signature from "Eris Discordia <discord@example.net>" - # gpg: WARNING: This key is not certified with a trusted signature! - # gpg: There is no indication that the signature belongs to the owner. - # Primary key fingerprint: D4BE 2231 1AD3 131E 5EDA 29A4 6109 2E85 B722 7189 -----