author Junio C Hamano <gitster@pobox.com> 2021-03-02 23:07:49 -0800
committer Junio C Hamano <gitster@pobox.com> 2021-03-02 23:07:49 -0800
commit b66f8a5b45f896b6a49ddc1a1b10518671dbfa60
tree 9889c649f0e3f51c138862042531cf77a23d8284
parent a372d5bb569776befd395d9c66067ecc09f13b96
download git-htmldocs-b66f8a5b45f896b6a49ddc1a1b10518671dbfa60.tar.gz
Autogenerated HTML docs for v2.31.0-rc1
-rw-r--r-- RelNotes/2.31.0.txt 31
-rw-r--r-- SubmittingPatches.html 2
-rw-r--r-- git-for-each-ref.html 10
-rw-r--r-- git-for-each-ref.txt 8
-rw-r--r-- git-http-fetch.html 17
-rw-r--r-- git-http-fetch.txt 10
-rw-r--r-- git-index-pack.html 8
-rw-r--r-- git-index-pack.txt 7
-rw-r--r-- gitdiffcore.html 21
-rw-r--r-- gitdiffcore.txt 20
-rw-r--r-- howto-index.html 2
-rw-r--r-- howto/keep-canonical-history-correct.html 2
-rw-r--r-- howto/maintain-git.html 2
-rw-r--r-- howto/new-command.html 2
-rw-r--r-- howto/rebase-from-internal-branch.html 2
-rw-r--r-- howto/rebuild-from-update-hook.html 2
-rw-r--r-- howto/recover-corrupted-blob-object.html 2
-rw-r--r-- howto/recover-corrupted-object-harder.html 2
-rw-r--r-- howto/revert-a-faulty-merge.html 2
-rw-r--r-- howto/revert-branch-rebase.html 2
-rw-r--r-- howto/separating-topic-branches.html 2
-rw-r--r-- howto/setup-git-server-over-http.html 2
-rw-r--r-- howto/update-hook-example.html 2
-rw-r--r-- howto/use-git-daemon.html 2
-rw-r--r-- howto/using-merge-subtree.html 2
-rw-r--r-- howto/using-signed-tag-in-pull-request.html 2
-rw-r--r-- technical/api-index.html 2
-rw-r--r-- technical/chunk-format.txt 116
-rw-r--r-- technical/commit-graph-format.txt 3
-rw-r--r-- technical/pack-format.html 7
-rw-r--r-- technical/pack-format.txt 3
-rw-r--r-- technical/reftable.html 39
-rw-r--r-- technical/reftable.txt 42
33 files changed, 307 insertions, 71 deletions
diff --git a/RelNotes/2.31.0.txt b/RelNotes/2.31.0.txt
index ef8b0d158..04bd5b70a 100644
--- a/RelNotes/2.31.0.txt
+++ b/RelNotes/2.31.0.txt
@@ -197,6 +197,31 @@ Performance, Internal Implementation, Development Support etc.
* The code to implement "git merge-base --independent" was poorly
done and was kept from the very beginning of the feature.
+ * Preliminary changes to fsmonitor integration.
+
+ * Performance optimization work on the rename detection continues.
+
+ * The common code to deal with "chunked file format" that is shared
+ by the multi-pack-index and commit-graph files has been factored
+ out, to help codepaths for both filetypes to become more robust.
+
+ * The approach to "fsck" the incoming objects in "index-pack" is
+ attractive for performance reasons (we have them already in core,
+ inflated and ready to be inspected), but fundamentally cannot be
+ applied fully when we receive more than one pack stream, as a tree
+ object in one pack may refer to a blob object in another pack as
+ ".gitmodules", when we want to inspect blobs that are used as
+ ".gitmodules" file, for example. Teach "index-pack" to emit
+ objects that must be inspected later and check them in the calling
+ "fetch-pack" process.
+
+ * The logic to handle "trailer" related placeholders in the
+ "--format=" mechanisms in the "log" family and "for-each-ref"
+ family is getting unified.
+
+ * Raise the buffer size used when writing the index file out from
+ (obviously too small) 8kB to (clearly sufficiently large) 128kB.
+
Fixes since v2.30
-----------------
@@ -318,6 +343,12 @@ Fixes since v2.30
corrected.
(merge 20e416409f jc/push-delete-nothing later to maint).
+ * Test script modernization.
+ (merge 488acf15df sv/t7001-modernize later to maint).
+
+ * An under-allocation for the untracked cache data has been corrected.
+ (merge 6347d649bc jh/untracked-cache-fix later to maint).
+
* Other code cleanup, docfix, build fix, etc.
(merge e3f5da7e60 sg/t7800-difftool-robustify later to maint).
(merge 9d336655ba js/doc-proto-v2-response-end later to maint).
diff --git a/SubmittingPatches.html b/SubmittingPatches.html
index 0a4ac313f..feb77597f 100644
--- a/SubmittingPatches.html
+++ b/SubmittingPatches.html
@@ -1393,7 +1393,7 @@ this problem around.</p></div>
<div id="footer">
<div id="footer-text">
Last updated
- 2021-02-25 17:29:50 PST
+ 2021-03-02 23:05:12 PST
</div>
</div>
</body>
diff --git a/git-for-each-ref.html b/git-for-each-ref.html
index 728a5fbf3..0dbb45fa5 100644
--- a/git-for-each-ref.html
+++ b/git-for-each-ref.html
@@ -1163,11 +1163,9 @@ contents:lines=N
</dd>
</dl></div>
<div class="paragraph"><p>Additionally, the trailers as interpreted by <a href="git-interpret-trailers.html">git-interpret-trailers(1)</a>
-are obtained as <code>trailers</code> (or by using the historical alias
-<code>contents:trailers</code>). Non-trailer lines from the trailer block can be omitted
-with <code>trailers:only</code>. Whitespace-continuations can be removed from trailers so
-that each trailer appears on a line by itself with its full content with
-<code>trailers:unfold</code>. Both can be used together as <code>trailers:unfold,only</code>.</p></div>
+are obtained as <code>trailers[:options]</code> (or by using the historical alias
+<code>contents:trailers[:options]</code>). For valid [:option] values, see the <code>trailers</code>
+section of <a href="git-log.html">git-log(1)</a>.</p></div>
<div class="paragraph"><p>For sorting purposes, fields with numeric values sort in numeric order
(<code>objectsize</code>, <code>authordate</code>, <code>committerdate</code>, <code>creatordate</code>, <code>taggerdate</code>).
All other fields are used to sort in their byte-value order.</p></div>
@@ -1326,7 +1324,7 @@ commits and from none of the <code>--no-merged</code> commits are shown.</p></di
<div id="footer">
<div id="footer-text">
Last updated
- 2020-09-22 13:11:20 PDT
+ 2021-03-02 23:05:01 PST
</div>
</div>
</body>
diff --git a/git-for-each-ref.txt b/git-for-each-ref.txt
index 2962f85a5..2ae2478de 100644
--- a/git-for-each-ref.txt
+++ b/git-for-each-ref.txt
@@ -260,11 +260,9 @@ contents:lines=N::
The first `N` lines of the message.
Additionally, the trailers as interpreted by linkgit:git-interpret-trailers[1]
-are obtained as `trailers` (or by using the historical alias
-`contents:trailers`). Non-trailer lines from the trailer block can be omitted
-with `trailers:only`. Whitespace-continuations can be removed from trailers so
-that each trailer appears on a line by itself with its full content with
-`trailers:unfold`. Both can be used together as `trailers:unfold,only`.
+are obtained as `trailers[:options]` (or by using the historical alias
+`contents:trailers[:options]`). For valid [:option] values, see the `trailers`
+section of linkgit:git-log[1].
For sorting purposes, fields with numeric values sort in numeric order
(`objectsize`, `authordate`, `committerdate`, `creatordate`, `taggerdate`).
diff --git a/git-http-fetch.html b/git-http-fetch.html
index e2b53b65b..7e4411ab1 100644
--- a/git-http-fetch.html
+++ b/git-http-fetch.html
@@ -819,11 +819,22 @@ commit-id
</dt>
<dd>
<p>
- Instead of a commit id on the command line (which is not expected in
+ For internal use only. Instead of a commit id on the command
+ line (which is not expected in
this case), <em>git http-fetch</em> fetches the packfile directly at the given
URL and uses index-pack to generate corresponding .idx and .keep files.
The hash is used to determine the name of the temporary file and is
- arbitrary. The output of index-pack is printed to stdout.
+ arbitrary. The output of index-pack is printed to stdout. Requires
+ --index-pack-args.
+</p>
+</dd>
+<dt class="hdlist1">
+--index-pack-args=&lt;args&gt;
+</dt>
+<dd>
+<p>
+ For internal use only. The command to run on the contents of the
+ downloaded pack. Arguments are URL-encoded and separated by spaces.
</p>
</dd>
<dt class="hdlist1">
@@ -849,7 +860,7 @@ commit-id
<div id="footer">
<div id="footer-text">
Last updated
- 2020-06-25 14:07:29 PDT
+ 2021-03-02 23:05:01 PST
</div>
</div>
</body>
diff --git a/git-http-fetch.txt b/git-http-fetch.txt
index 4deb4893f..9fa17b60e 100644
--- a/git-http-fetch.txt
+++ b/git-http-fetch.txt
@@ -41,11 +41,17 @@ commit-id::
<commit-id>['\t'<filename-as-in--w>]
--packfile=<hash>::
- Instead of a commit id on the command line (which is not expected in
+ For internal use only. Instead of a commit id on the command
+ line (which is not expected in
this case), 'git http-fetch' fetches the packfile directly at the given
URL and uses index-pack to generate corresponding .idx and .keep files.
The hash is used to determine the name of the temporary file and is
- arbitrary. The output of index-pack is printed to stdout.
+ arbitrary. The output of index-pack is printed to stdout. Requires
+ --index-pack-args.
+
+--index-pack-args=<args>::
+ For internal use only. The command to run on the contents of the
+ downloaded pack. Arguments are URL-encoded and separated by spaces.
--recover::
Verify that everything reachable from target is fetched. Used after
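The `--index-pack-args` encoding described in this hunk (each argument URL-encoded, arguments joined by spaces) can be sketched in Python. This is only an illustration under stated assumptions, not Git's implementation: the helper names `encode_index_pack_args` and `decode_index_pack_args` are hypothetical, and the sketch assumes plain percent-encoding in which every character outside the RFC 3986 unreserved set is escaped.

```python
from urllib.parse import quote, unquote


def encode_index_pack_args(args):
    """Percent-encode each argument, then join the results with spaces.

    Hypothetical helper: a space inside an argument becomes %20, so the
    joined string can be split on spaces again without ambiguity.
    """
    return " ".join(quote(arg, safe="") for arg in args)


def decode_index_pack_args(encoded):
    """Split on spaces and undo the percent-encoding of each argument."""
    if not encoded:
        return []
    return [unquote(part) for part in encoded.split(" ")]


if __name__ == "__main__":
    # Example arguments chosen for illustration only.
    args = ["index-pack", "--stdin", "--keep=fetched by example"]
    encoded = encode_index_pack_args(args)
    print(encoded)  # index-pack --stdin --keep%3Dfetched%20by%20example
    assert decode_index_pack_args(encoded) == args
```

Encoding each argument separately is what makes the space a safe separator: no encoded argument can contain a literal space.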
diff --git a/git-index-pack.html b/git-index-pack.html
index 4cf02f29a..6169e4ae2 100644
--- a/git-index-pack.html
+++ b/git-index-pack.html
@@ -885,8 +885,12 @@ the objects/pack/ directory of a Git repository.</p></div>
</dt>
<dd>
<p>
- Die if the pack contains broken objects. For internal use only.
+ For internal use only.
</p>
+<div class="paragraph"><p>Die if the pack contains broken objects. If the pack contains a tree
+pointing to a .gitmodules blob that does not exist, prints the hash of
+that blob (for the caller to check) after the hash that goes into the
+name of the pack/idx file (see "Notes").</p></div>
</dd>
<dt class="hdlist1">
--threads=&lt;n&gt;
@@ -954,7 +958,7 @@ mentioned above.</p></div>
<div id="footer">
<div id="footer-text">
Last updated
- 2021-02-12 14:43:56 PST
+ 2021-03-02 23:05:01 PST
</div>
</div>
</body>
diff --git a/git-index-pack.txt b/git-index-pack.txt
index 69ba904d4..7fa74b9e7 100644
--- a/git-index-pack.txt
+++ b/git-index-pack.txt
@@ -86,7 +86,12 @@ OPTIONS
Die if the pack contains broken links. For internal use only.
--fsck-objects::
- Die if the pack contains broken objects. For internal use only.
+ For internal use only.
++
+Die if the pack contains broken objects. If the pack contains a tree
+pointing to a .gitmodules blob that does not exist, prints the hash of
+that blob (for the caller to check) after the hash that goes into the
+name of the pack/idx file (see "Notes").
--threads=<n>::
Specifies the number of threads to spawn when resolving
diff --git a/gitdiffcore.html b/gitdiffcore.html
index 7fe36ab8a..bfb02082e 100644
--- a/gitdiffcore.html
+++ b/gitdiffcore.html
@@ -936,6 +936,25 @@ files are "similar enough", and can be customized to use
a similarity score different from the default of 50% by giving a
number after the "-M" or "-C" option (e.g. "-M8" to tell it to use
8/10 = 80%).</p></div>
+<div class="paragraph"><p>Note that when rename detection is on but both copy and break
+detection are off, rename detection adds a preliminary step that first
+checks if files are moved across directories while keeping their
+filename the same. If there is a file added to a directory whose
+contents are sufficiently similar to a file with the same name that got
+deleted from a different directory, it will mark them as renames and
+exclude them from the later quadratic step (the one that pairwise
+compares all unmatched files to find the "best" matches, determined by
+the highest content similarity). So, for example, if a deleted
+docs/ext.txt and an added docs/config/ext.txt are similar enough, they
+will be marked as a rename and prevent an added docs/ext.md that may
+be even more similar to the deleted docs/ext.txt from being considered
+as the rename destination in the later step. For this reason, the
+preliminary "match same filename" step uses a slightly higher threshold to
+mark a file pair as a rename and stop considering other candidates for
+better matches. At most, one comparison is done per file in this
+preliminary pass; so if there are several remaining ext.txt files
+throughout the directory hierarchy after exact rename detection, this
+preliminary step will be skipped for those files.</p></div>
<div class="paragraph"><p>Note. When the "-C" option is used with <code>--find-copies-harder</code>
option, <em>git diff-&#42;</em> commands feed unmodified filepairs to
diffcore mechanism as well as modified ones. This lets the copy
@@ -1089,7 +1108,7 @@ not sorted when diffcore-order is in effect.</p></div>
<div id="footer">
<div id="footer-text">
Last updated
- 2021-02-25 17:29:29 PST
+ 2021-03-02 23:05:01 PST
</div>
</div>
</body>
diff --git a/gitdiffcore.txt b/gitdiffcore.txt
index 2bd122047..1c7269655 100644
--- a/gitdiffcore.txt
+++ b/gitdiffcore.txt
@@ -169,6 +169,26 @@ a similarity score different from the default of 50% by giving a
number after the "-M" or "-C" option (e.g. "-M8" to tell it to use
8/10 = 80%).
+Note that when rename detection is on but both copy and break
+detection are off, rename detection adds a preliminary step that first
+checks if files are moved across directories while keeping their
+filename the same. If there is a file added to a directory whose
+contents are sufficiently similar to a file with the same name that got
+deleted from a different directory, it will mark them as renames and
+exclude them from the later quadratic step (the one that pairwise
+compares all unmatched files to find the "best" matches, determined by
+the highest content similarity). So, for example, if a deleted
+docs/ext.txt and an added docs/config/ext.txt are similar enough, they
+will be marked as a rename and prevent an added docs/ext.md that may
+be even more similar to the deleted docs/ext.txt from being considered
+as the rename destination in the later step. For this reason, the
+preliminary "match same filename" step uses a slightly higher threshold to
+mark a file pair as a rename and stop considering other candidates for
+better matches. At most, one comparison is done per file in this
+preliminary pass; so if there are several remaining ext.txt files
+throughout the directory hierarchy after exact rename detection, this
+preliminary step will be skipped for those files.
+
Note. When the "-C" option is used with `--find-copies-harder`
option, 'git diff-{asterisk}' commands feed unmodified filepairs to
diffcore mechanism as well as modified ones. This lets the copy
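The preliminary "match same filename" step added to gitdiffcore above can be illustrated with a small Python sketch. It is not Git's algorithm: `match_same_basename` is a hypothetical name, `difflib.SequenceMatcher` stands in for Git's own similarity measure, and the 0.75 threshold is an assumed value chosen only because the text says the step uses a higher threshold than the 50% default.

```python
import posixpath
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity score in [0, 1]; Git computes its own measure."""
    return SequenceMatcher(None, a, b).ratio()


def match_same_basename(deleted, added, threshold=0.75):
    """Pair deleted and added paths that share a filename.

    `deleted` and `added` map path -> content. A basename is considered
    only when exactly one deleted path and one added path carry it, so at
    most one comparison is made per file. Matched pairs are removed from
    both dicts, which keeps them out of any later pairwise step.
    """
    deleted_by_name, added_by_name = {}, {}
    for path in deleted:
        deleted_by_name.setdefault(posixpath.basename(path), []).append(path)
    for path in added:
        added_by_name.setdefault(posixpath.basename(path), []).append(path)

    renames = []
    for name, old_paths in deleted_by_name.items():
        new_paths = added_by_name.get(name, [])
        if len(old_paths) != 1 or len(new_paths) != 1:
            continue  # ambiguous basename: skip the preliminary step
        old, new = old_paths[0], new_paths[0]
        if similarity(deleted[old], added[new]) >= threshold:
            renames.append((old, new))

    for old, new in renames:
        del deleted[old]
        del added[new]
    return renames
```

With a deleted `docs/ext.txt`, an added `docs/config/ext.txt` and an added `docs/ext.md`, the sketch pairs the two `ext.txt` paths when they are similar enough and leaves `docs/ext.md` unmatched, which is the trade-off the paragraph describes.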
diff --git a/howto-index.html b/howto-index.html
index 5914844f2..b1b0ef8cf 100644
--- a/howto-index.html
+++ b/howto-index.html
@@ -885,7 +885,7 @@ later validate it.</p></div>
<div id="footer">
<div id="footer-text">
Last updated
- 2021-02-25 17:29:38 PST
+ 2021-03-02 23:05:05 PST
</div>
</div>
</body>
diff --git a/howto/keep-canonical-history-correct.html b/howto/keep-canonical-history-correct.html
index 99bc0c2e9..1e3aa8220 100644
--- a/howto/keep-canonical-history-correct.html
+++ b/howto/keep-canonical-history-correct.html
@@ -938,7 +938,7 @@ tip of your <em>master</em> again and redo the two merges:</p></div>
<div id="footer">
<div id="footer-text">
Last updated
- 2021-02-25 17:29:46 PST
+ 2021-03-02 23:05:10 PST
</div>
</div>
</body>
diff --git a/howto/maintain-git.html b/howto/maintain-git.html
index 75a24710f..e90b6aad1 100644
--- a/howto/maintain-git.html
+++ b/howto/maintain-git.html
@@ -1469,7 +1469,7 @@ $ git update-ref -d $mf/ai/topic</code></pre>
<div id="footer">
<div id="footer-text">
Last updated
- 2021-02-25 17:29:46 PST
+ 2021-03-02 23:05:10 PST
</div>
</div>
</body>
diff --git a/howto/new-command.html b/howto/new-command.html
index 215c063b9..9fed0f734 100644
--- a/howto/new-command.html
+++ b/howto/new-command.html
@@ -863,7 +863,7 @@ letter [PATCH 0/n].
<div id="footer">
<div id="footer-text">
Last updated
- 2021-02-25 17:29:40 PST
+ 2021-03-02 23:05:07 PST
</div>
</div>
</body>
diff --git a/howto/rebase-from-internal-branch.html b/howto/rebase-from-internal-branch.html
index bf1e2eff7..9a945d376 100644
--- a/howto/rebase-from-internal-branch.html
+++ b/howto/rebase-from-internal-branch.html
@@ -895,7 +895,7 @@ the #1' commit.</p></div>
<div id="footer">
<div id="footer-text">
Last updated
- 2021-02-25 17:29:45 PST
+ 2021-03-02 23:05:10 PST
</div>
</div>
</body>
diff --git a/howto/rebuild-from-update-hook.html b/howto/rebuild-from-update-hook.html
index a85f4f8af..9a139d066 100644
--- a/howto/rebuild-from-update-hook.html
+++ b/howto/rebuild-from-update-hook.html
@@ -847,7 +847,7 @@ This is still crude and does not protect against simultaneous
<div id="footer">
<div id="footer-text">
Last updated
- 2021-02-25 17:29:45 PST
+ 2021-03-02 23:05:09 PST
</div>
</div>
</body>
diff --git a/howto/recover-corrupted-blob-object.html b/howto/recover-corrupted-blob-object.html
index add827fdb..f79462520 100644
--- a/howto/recover-corrupted-blob-object.html
+++ b/howto/recover-corrupted-blob-object.html
@@ -880,7 +880,7 @@ thing.</p></div>
<div id="footer">
<div id="footer-text">
Last updated
- 2021-02-25 17:29:45 PST
+ 2021-03-02 23:05:09 PST
</div>
</div>
</body>
diff --git a/howto/recover-corrupted-object-harder.html b/howto/recover-corrupted-object-harder.html
index 2257e89ee..f05a619f2 100644
--- a/howto/recover-corrupted-object-harder.html
+++ b/howto/recover-corrupted-object-harder.html
@@ -1189,7 +1189,7 @@ int main(int argc, char **argv)
<div id="footer">
<div id="footer-text">
Last updated
- 2021-02-25 17:29:45 PST
+ 2021-03-02 23:05:09 PST
</div>
</div>
</body>
diff --git a/howto/revert-a-faulty-merge.html b/howto/revert-a-faulty-merge.html
index 1ba4e1ab7..b7318594e 100644
--- a/howto/revert-a-faulty-merge.html
+++ b/howto/revert-a-faulty-merge.html
@@ -1025,7 +1025,7 @@ P---o---o---M---x---x---W---x---M2
<div id="footer">
<div id="footer-text">
Last updated
- 2021-02-25 17:29:44 PST
+ 2021-03-02 23:05:09 PST
</div>
</div>
</body>
diff --git a/howto/revert-branch-rebase.html b/howto/revert-branch-rebase.html
index 478209f04..541d2b1b8 100644
--- a/howto/revert-branch-rebase.html
+++ b/howto/revert-branch-rebase.html
@@ -907,7 +907,7 @@ Committed merge 7fb9b7262a1d1e0a47bbfdcbbcf50ce0635d3f8f
<div id="footer">
<div id="footer-text">
Last updated
- 2021-02-25 17:29:41 PST
+ 2021-03-02 23:05:07 PST
</div>
</div>
</body>
diff --git a/howto/separating-topic-branches.html b/howto/separating-topic-branches.html
index 561d8a1c4..975558e54 100644
--- a/howto/separating-topic-branches.html
+++ b/howto/separating-topic-branches.html
@@ -841,7 +841,7 @@ o---o"master"</code></pre>
<div id="footer">
<div id="footer-text">
Last updated
- 2021-02-25 17:29:43 PST
+ 2021-03-02 23:05:09 PST
</div>
</div>
</body>
diff --git a/howto/setup-git-server-over-http.html b/howto/setup-git-server-over-http.html
index aa3b9e917..687b12e27 100644
--- a/howto/setup-git-server-over-http.html
+++ b/howto/setup-git-server-over-http.html
@@ -1071,7 +1071,7 @@ help diagnosing the problem, but removes security checks.</p></div>
<div id="footer">
<div id="footer-text">
Last updated
- 2021-02-25 17:29:43 PST
+ 2021-03-02 23:05:08 PST
</div>
</div>
</body>
diff --git a/howto/update-hook-example.html b/howto/update-hook-example.html
index f470fe56c..7f5d8b0ec 100644
--- a/howto/update-hook-example.html
+++ b/howto/update-hook-example.html
@@ -930,7 +930,7 @@ that JC can make non-fast-forward pushes on it.</p></div>
<div id="footer">
<div id="footer-text">
Last updated
- 2021-02-25 17:29:42 PST
+ 2021-03-02 23:05:08 PST
</div>
</div>
</body>
diff --git a/howto/use-git-daemon.html b/howto/use-git-daemon.html
index 4380a5015..9888c8a49 100644
--- a/howto/use-git-daemon.html
+++ b/howto/use-git-daemon.html
@@ -791,7 +791,7 @@ a good practice to put the paths after a "--" separator.</p></div>
<div id="footer">
<div id="footer-text">
Last updated
- 2021-02-25 17:29:42 PST
+ 2021-03-02 23:05:08 PST
</div>
</div>
</body>
diff --git a/howto/using-merge-subtree.html b/howto/using-merge-subtree.html
index fa5ba7ce0..03d24bbe5 100644
--- a/howto/using-merge-subtree.html
+++ b/howto/using-merge-subtree.html
@@ -848,7 +848,7 @@ Please note that if the other project merges from you, then it will
<div id="footer">
<div id="footer-text">
Last updated
- 2021-02-25 17:29:41 PST
+ 2021-03-02 23:05:07 PST
</div>
</div>
</body>
diff --git a/howto/using-signed-tag-in-pull-request.html b/howto/using-signed-tag-in-pull-request.html
index d59c08bc3..2180fad9f 100644
--- a/howto/using-signed-tag-in-pull-request.html
+++ b/howto/using-signed-tag-in-pull-request.html
@@ -952,7 +952,7 @@ as part of the merge commit.</p></div>
<div id="footer">
<div id="footer-text">
Last updated
- 2021-02-25 17:29:41 PST
+ 2021-03-02 23:05:07 PST
</div>
</div>
</body>
diff --git a/technical/api-index.html b/technical/api-index.html
index bdd0c29f4..35a7a1130 100644
--- a/technical/api-index.html
+++ b/technical/api-index.html
@@ -770,7 +770,7 @@ documents them.</p></div>
<div id="footer">
<div id="footer-text">
Last updated
- 2021-02-25 17:29:59 PST
+ 2021-03-02 23:05:16 PST
</div>
</div>
</body>
diff --git a/technical/chunk-format.txt b/technical/chunk-format.txt
new file mode 100644
index 000000000..593614fce
--- /dev/null
+++ b/technical/chunk-format.txt
@@ -0,0 +1,116 @@
+Chunk-based file formats
+========================
+
+Some file formats in Git use a common concept of "chunks" to describe
+sections of the file. This allows structured access to a large file by
+scanning a small "table of contents" for the remaining data. This common
+format is used by the `commit-graph` and `multi-pack-index` files. See
+link:technical/pack-format.html[the `multi-pack-index` format] and
+link:technical/commit-graph-format.html[the `commit-graph` format] for
+how they use the chunks to describe structured data.
+
+A chunk-based file format begins with some header information custom to
+that format. That header should include enough information to identify
+the file type, format version, and number of chunks in the file. From this
+information, a reader can determine the start of the chunk-based region.
+
+The chunk-based region starts with a table of contents describing where
+each chunk starts and ends. This consists of (C+1) rows of 12 bytes each,
+where C is the number of chunks. Consider the following table:
+
+  | Chunk ID (4 bytes) | Chunk Offset (8 bytes) |
+  |--------------------|------------------------|
+  | ID[0]              | OFFSET[0]              |
+  | ...                | ...                    |
+  | ID[C]              | OFFSET[C]              |
+  | 0x0000             | OFFSET[C+1]            |
+
+Each row consists of a 4-byte chunk identifier (ID) and an 8-byte offset.
+Each integer is stored in network-byte order.
+
+The chunk identifier `ID[i]` is a label for the data stored within this
+file from `OFFSET[i]` (inclusive) to `OFFSET[i+1]` (exclusive). Thus, the
+size of the `i`th chunk is equal to the difference between `OFFSET[i+1]`
+and `OFFSET[i]`. This requires that the chunk data appears contiguously
+in the same order as the table of contents.
+
+The final entry in the table of contents must be four zero bytes. This
+confirms that the table of contents is ending and provides the offset for
+the end of the chunk-based data.
+
+Note: The chunk-based format expects that the file contains _at least_ a
+trailing hash after `OFFSET[C+1]`.
+
+Functions for working with chunk-based file formats are declared in
+`chunk-format.h`. Using these methods provides extra checks that assist
+developers when creating new file formats.
+
+Writing chunk-based file formats
+--------------------------------
+
+To write a chunk-based file format, create a `struct chunkfile` by
+calling `init_chunkfile()` and pass a `struct hashfile` pointer. The
+caller is responsible for opening the `hashfile` and writing header
+information so the file format is identifiable before the chunk-based
+format begins.
+
+Then, call `add_chunk()` for each chunk that is intended for write. This
+populates the `chunkfile` with information about the order and size of
+each chunk to write. Provide a `chunk_write_fn` function pointer to
+perform the write of the chunk data upon request.
+
+Call `write_chunkfile()` to write the table of contents to the `hashfile`
+followed by each of the chunks. This will verify that each chunk wrote
+the expected amount of data so the table of contents is correct.
+
+Finally, call `free_chunkfile()` to clear the `struct chunkfile` data. The
+caller is responsible for finalizing the `hashfile` by writing the trailing
+hash and closing the file.
+
+Reading chunk-based file formats
+--------------------------------
+
+To read a chunk-based file format, the file must be opened as a
+memory-mapped region. The chunk-format API expects that the entire file
+is mapped as a contiguous memory region.
+
+Initialize a `struct chunkfile` pointer with `init_chunkfile(NULL)`.
+
+After reading the header information from the beginning of the file,
+including the chunk count, call `read_table_of_contents()` to populate
+the `struct chunkfile` with the list of chunks, their offsets, and their
+sizes.
+
+Extract the data information for each chunk using `pair_chunk()` or
+`read_chunk()`:
+
+* `pair_chunk()` assigns a given pointer with the location inside the
+ memory-mapped file corresponding to that chunk's offset. If the chunk
+ does not exist, then the pointer is not modified.
+
+* `read_chunk()` takes a `chunk_read_fn` function pointer and calls it
+ with the appropriate initial pointer and size information. The function
+ is not called if the chunk does not exist. Use this method to read chunks
+ if you need to perform immediate parsing or if you need to execute logic
+ based on the size of the chunk.
+
+After calling these methods, call `free_chunkfile()` to clear the
+`struct chunkfile` data. This will not close the memory-mapped region.
+Callers are expected to own that data for the timeframe the pointers into
+the region are needed.
+
+Examples
+--------
+
+These file formats use the chunk-format API, and can be used as examples
+for future formats:
+
+* *commit-graph:* see `write_commit_graph_file()` and `parse_commit_graph()`
+ in `commit-graph.c` for how the chunk-format API is used to write and
+ parse the commit-graph file format documented in
+ link:technical/commit-graph-format.html[the commit-graph file format].
+
+* *multi-pack-index:* see `write_midx_internal()` and `load_multi_pack_index()`
+ in `midx.c` for how the chunk-format API is used to write and
+ parse the multi-pack-index file format documented in
+ link:technical/pack-format.html[the multi-pack-index file format].
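The table of contents described in `technical/chunk-format.txt` above can be sketched in Python. This is an illustration, not the C API in `chunk-format.h`: `build_chunk_region` and `parse_toc` are hypothetical names, the sketch follows the "(C+1) rows of 12 bytes" statement (C chunk rows plus one terminating row), and it assumes offsets are counted from the start of the file.

```python
import struct

# One row: 4-byte chunk ID followed by an 8-byte offset, network byte order.
ROW = struct.Struct(">4sQ")
TERMINATOR = b"\0\0\0\0"


def build_chunk_region(header, chunks):
    """Return header + table of contents + chunk data for (id, payload) pairs."""
    offset = len(header) + (len(chunks) + 1) * ROW.size
    toc, body = b"", b""
    for chunk_id, payload in chunks:
        toc += ROW.pack(chunk_id, offset)
        body += payload
        offset += len(payload)
    toc += ROW.pack(TERMINATOR, offset)  # final row marks the end of chunk data
    return header + toc + body


def parse_toc(data, toc_start, chunk_count):
    """Map each chunk ID to (offset, size), using the next row's offset as the end."""
    rows = [ROW.unpack_from(data, toc_start + i * ROW.size)
            for i in range(chunk_count + 1)]
    if rows[-1][0] != TERMINATOR:
        raise ValueError("table of contents lacks the terminating zero ID")
    table = {}
    for (chunk_id, start), (_, end) in zip(rows, rows[1:]):
        if end < start or end > len(data):
            raise ValueError("chunk offsets decrease or point past the file")
        table[chunk_id] = (start, end - start)
    return table


if __name__ == "__main__":
    header = b"HDR\x02"  # made-up header; real formats define their own
    data = build_chunk_region(header, [(b"AAAA", b"hello"), (b"BBBB", b"xy")])
    data += b"trailing-hash-placeholder"
    for chunk_id, (start, size) in parse_toc(data, len(header), 2).items():
        print(chunk_id, start, size, data[start:start + size])
```

Because each chunk's size is derived from the following row, the sketch also shows why chunk data must appear contiguously and in table order.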
diff --git a/technical/commit-graph-format.txt b/technical/commit-graph-format.txt
index b6658eff1..87971c27d 100644
--- a/technical/commit-graph-format.txt
+++ b/technical/commit-graph-format.txt
@@ -61,6 +61,9 @@ CHUNK LOOKUP:
the length using the next chunk position if necessary.) Each chunk
ID appears at most once.
+ The CHUNK LOOKUP matches the table of contents from
+ link:technical/chunk-format.html[the chunk-based file format].
+
The remaining data in the body is described one chunk at a time, and
these chunks may be given in any order. Chunks are required unless
otherwise specified.
diff --git a/technical/pack-format.html b/technical/pack-format.html
index c8c0e3768..c17d8c21d 100644
--- a/technical/pack-format.html
+++ b/technical/pack-format.html
@@ -1222,6 +1222,11 @@ the number of base MIDX files, hash lengths and types.</p></div>
</div></div>
<div class="literalblock">
<div class="content">
+<pre><code>The CHUNK LOOKUP matches the table of contents from
+link:technical/chunk-format.html[the chunk-based file format].</code></pre>
+</div></div>
+<div class="literalblock">
+<div class="content">
<pre><code>The remaining data in the body is described one chunk at a time, and
these chunks may be given in any order. Chunks are required unless
otherwise specified.</code></pre>
@@ -1280,7 +1285,7 @@ otherwise specified.</code></pre>
<div id="footer">
<div id="footer-text">
Last updated
- 2021-02-12 14:43:56 PST
+ 2021-03-02 23:05:01 PST
</div>
</div>
</body>
diff --git a/technical/pack-format.txt b/technical/pack-format.txt
index 8833b71c8..1faa949bf 100644
--- a/technical/pack-format.txt
+++ b/technical/pack-format.txt
@@ -336,6 +336,9 @@ CHUNK LOOKUP:
(Chunks are provided in file-order, so you can infer the length
using the next chunk position if necessary.)
+ The CHUNK LOOKUP matches the table of contents from
+ link:technical/chunk-format.html[the chunk-based file format].
+
The remaining data in the body is described one chunk at a time, and
these chunks may be given in any order. Chunks are required unless
otherwise specified.
diff --git a/technical/reftable.html b/technical/reftable.html
index 6f68bf20a..544baf859 100644
--- a/technical/reftable.html
+++ b/technical/reftable.html
@@ -1704,16 +1704,11 @@ sorted descending by update index.</p></div>
</div>
<div class="sect3">
<h4 id="_layout">Layout</h4>
-<div class="paragraph"><p>A collection of reftable files are stored in the <code>$GIT_DIR/reftable/</code>
-directory:</p></div>
-<div class="literalblock">
-<div class="content">
-<pre><code>00000001-00000001.log
-00000002-00000002.ref
-00000003-00000003.ref</code></pre>
-</div></div>
-<div class="paragraph"><p>where reftable files are named by a unique name such as produced by the
-function <code>${min_update_index}-${max_update_index}.ref</code>.</p></div>
+<div class="paragraph"><p>A collection of reftable files are stored in the <code>$GIT_DIR/reftable/</code> directory.
+Their names should have a random element, such that each filename is globally
+unique; this helps avoid spurious failures on Windows, where open files cannot
+be removed or overwritten. It is suggested to use
+<code>${min_update_index}-${max_update_index}-${random}.ref</code> as a naming convention.</p></div>
<div class="paragraph"><p>Log-only files use the <code>.log</code> extension, while ref-only and mixed ref
and log files use <code>.ref</code>. extension.</p></div>
<div class="paragraph"><p>The stack ordering file is <code>$GIT_DIR/reftable/tables.list</code> and lists the
@@ -1722,9 +1717,9 @@ current files, one per line, in order, from oldest (base) to newest
<div class="literalblock">
<div class="content">
<pre><code>$ cat .git/reftable/tables.list
-00000001-00000001.log
-00000002-00000002.ref
-00000003-00000003.ref</code></pre>
+00000001-00000001-RANDOM1.log
+00000002-00000002-RANDOM2.ref
+00000003-00000003-RANDOM3.ref</code></pre>
</div></div>
<div class="paragraph"><p>Readers must read <code>$GIT_DIR/reftable/tables.list</code> to determine which
files are relevant right now, and search through the stack in reverse
@@ -1812,7 +1807,7 @@ Prepare temp reftable <code>tmp_XXXXXX</code>, including log entries.
</li>
<li>
<p>
-Rename <code>tmp_XXXXXX</code> to <code>${update_index}-${update_index}.ref</code>.
+Rename <code>tmp_XXXXXX</code> to <code>${update_index}-${update_index}-${random}.ref</code>.
</p>
</li>
<li>
@@ -1896,7 +1891,7 @@ the locking protocol.
<li>
<p>
Rename <code>${min_update_index}-${max_update_index}_XXXXXX</code> to
-<code>${min_update_index}-${max_update_index}.ref</code>.
+<code>${min_update_index}-${max_update_index}-${random}.ref</code>.
</p>
</li>
<li>
@@ -1921,6 +1916,18 @@ readers to backtrack.
<div class="paragraph"><p>Each reftable (compacted or not) is uniquely identified by its name, so
open reftables can be cached by their name.</p></div>
</div>
+<div class="sect3">
+<h4 id="_windows">Windows</h4>
+<div class="paragraph"><p>On Windows, and other systems that do not allow deleting or renaming open
+files, compaction may succeed, but other readers may prevent obsolete tables
+from being deleted.</p></div>
+<div class="paragraph"><p>On these platforms, the following strategy can be followed: on closing a
+reftable stack, reload <code>tables.list</code>, and delete any tables no longer mentioned
+in <code>tables.list</code>.</p></div>
+<div class="paragraph"><p>Irregular program exit may still leave unused files behind. In this case, a
+cleanup operation can read <code>tables.list</code>, note its modification timestamp, and
+delete any unreferenced <code>*.ref</code> files that are older.</p></div>
+</div>
</div>
<div class="sect2">
<h3 id="_alternatives_considered">Alternatives considered</h3>
@@ -2021,7 +2028,7 @@ impossible.</p></div>
<div id="footer">
<div id="footer-text">
Last updated
- 2021-01-15 16:12:09 PST
+ 2021-03-02 23:05:01 PST
</div>
</div>
</body>
diff --git a/technical/reftable.txt b/technical/reftable.txt
index 8095ab259..3ef169af2 100644
--- a/technical/reftable.txt
+++ b/technical/reftable.txt
@@ -872,17 +872,11 @@ A repository must set its `$GIT_DIR/config` to configure reftable:
Layout
^^^^^^
-A collection of reftable files are stored in the `$GIT_DIR/reftable/`
-directory:
-
-....
-00000001-00000001.log
-00000002-00000002.ref
-00000003-00000003.ref
-....
-
-where reftable files are named by a unique name such as produced by the
-function `${min_update_index}-${max_update_index}.ref`.
+A collection of reftable files are stored in the `$GIT_DIR/reftable/` directory.
+Their names should have a random element, such that each filename is globally
+unique; this helps avoid spurious failures on Windows, where open files cannot
+be removed or overwritten. It is suggested to use
+`${min_update_index}-${max_update_index}-${random}.ref` as a naming convention.
Log-only files use the `.log` extension, while ref-only and mixed ref
and log files use `.ref`. extension.
@@ -893,9 +887,9 @@ current files, one per line, in order, from oldest (base) to newest
....
$ cat .git/reftable/tables.list
-00000001-00000001.log
-00000002-00000002.ref
-00000003-00000003.ref
+00000001-00000001-RANDOM1.log
+00000002-00000002-RANDOM2.ref
+00000003-00000003-RANDOM3.ref
....
Readers must read `$GIT_DIR/reftable/tables.list` to determine which
@@ -940,7 +934,7 @@ new reftable and atomically appending it to the stack:
3. Select `update_index` to be most recent file's
`max_update_index + 1`.
4. Prepare temp reftable `tmp_XXXXXX`, including log entries.
-5. Rename `tmp_XXXXXX` to `${update_index}-${update_index}.ref`.
+5. Rename `tmp_XXXXXX` to `${update_index}-${update_index}-${random}.ref`.
6. Copy `tables.list` to `tables.list.lock`, appending file from (5).
7. Rename `tables.list.lock` to `tables.list`.
@@ -993,7 +987,7 @@ prevents other processes from trying to compact these files.
should always be the case, assuming that other processes are adhering to
the locking protocol.
7. Rename `${min_update_index}-${max_update_index}_XXXXXX` to
-`${min_update_index}-${max_update_index}.ref`.
+`${min_update_index}-${max_update_index}-${random}.ref`.
8. Write the new stack to `tables.list.lock`, replacing `B` and `C`
with the file from (4).
9. Rename `tables.list.lock` to `tables.list`.
@@ -1005,6 +999,22 @@ This strategy permits compactions to proceed independently of updates.
Each reftable (compacted or not) is uniquely identified by its name, so
open reftables can be cached by their name.
+Windows
+^^^^^^^
+
+On Windows, and other systems that do not allow deleting or renaming open
+files, compaction may succeed, but other readers may prevent obsolete tables
+from being deleted.
+
+On these platforms, the following strategy can be followed: on closing a
+reftable stack, reload `tables.list`, and delete any tables no longer mentioned
+in `tables.list`.
+
+Irregular program exit may still leave unused files behind. In this case, a
+cleanup operation can read `tables.list`, note its modification timestamp, and
+delete any unreferenced `*.ref` files that are older.
+
+
Alternatives considered
~~~~~~~~~~~~~~~~~~~~~~~
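The reftable naming convention and the cleanup strategy from the "Windows" section of this diff can be sketched in Python. This is an illustration under assumptions, not the reftable implementation: `reftable_name` and `remove_stale_tables` are hypothetical names, the update indexes are assumed to be zero-padded to eight hexadecimal digits (the text does not state the radix), and the random element is assumed to be eight hex characters.

```python
import os
import secrets


def reftable_name(min_update_index, max_update_index, ext="ref"):
    """Build ${min_update_index}-${max_update_index}-${random}.ref style names."""
    return "%08x-%08x-%s.%s" % (
        min_update_index, max_update_index, secrets.token_hex(4), ext)


def remove_stale_tables(reftable_dir):
    """Delete *.ref files that tables.list no longer mentions and that are older.

    Files newer than tables.list are kept, since another process may be
    about to add them to the stack.
    """
    list_path = os.path.join(reftable_dir, "tables.list")
    list_mtime = os.stat(list_path).st_mtime
    with open(list_path) as f:
        live = {line.strip() for line in f if line.strip()}

    removed = []
    for name in sorted(os.listdir(reftable_dir)):
        if not name.endswith(".ref") or name in live:
            continue
        path = os.path.join(reftable_dir, name)
        if os.stat(path).st_mtime < list_mtime:
            try:
                os.remove(path)
            except OSError:
                continue  # still open elsewhere; a later cleanup can retry
            removed.append(name)
    return removed
```

The random element means a compacted table never reuses the name of a table that a reader may still hold open, so a failed delete only delays cleanup instead of breaking the rename.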