aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/technical
diff options
context:
space:
mode:
authorDerrick Stolee <derrickstolee@github.com>2022-03-02 09:45:13 -0500
committerJunio C Hamano <gitster@pobox.com>2022-03-07 09:17:03 -0800
commit6dbf4b8172ef9edd50bdf9ca2da4681ba9153f75 (patch)
treecac810fdbeb65f58aca4ea682570a2a955bfd575 /Documentation/technical
parentc8d67b9a682e2a52beafab60f057ac317e35358d (diff)
downloadgit-6dbf4b8172ef9edd50bdf9ca2da4681ba9153f75.tar.gz
commit-graph: declare bankruptcy on GDAT chunks
The Generation Data (GDAT) and Generation Data Overflow (GDOV) chunks store corrected commit date offsets, used for generation number v2. Recent changes have demonstrated that previous versions of Git were incorrectly parsing data from these chunks, but might have also been writing them incorrectly. I asserted [1] that the previous fixes were sufficient because the known reasons for incorrectly writing generation number v2 data relied on parsing the information incorrectly out of a commit-graph file, but the previous versions of Git were not reading the generation number v2 data. However, Patrick demonstrated [2] a case where in split commit-graphs across an alternate boundary (and possibly some other special conditions) it was possible to have a commit-graph that was generated by a previous version of Git have incorrect generation number v2 data which results in errors like the following: commit-graph generation for commit <oid> is 1623273624 < 1623273710 [1] https://lore.kernel.org/git/f50e74f0-9ffa-f4f2-4663-269801495ed3@github.com/ [2] https://lore.kernel.org/git/Yh93vOkt2DkrGPh2@ncase/ Clearly, there is something else going on. The situation is not completely understood, but the errors do not reproduce if the commit-graphs are all generated by a Git version including these recent fixes. If we cannot trust the existing data in the GDAT and GDOV chunks, then we can alter the format to change the chunk IDs for these chunks. This causes the new version of Git to silently ignore the older chunks (and disabling generation number v2 in the process) while writing new commit-graph files with correct data in the GDA2 and GDO2 chunks. Update commit-graph-format.txt including a historical note about these deprecated chunks. Reported-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'Documentation/technical')
-rw-r--r--Documentation/technical/commit-graph-format.txt12
1 files changed, 10 insertions, 2 deletions
diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
index 87971c27dd..484b185ba9 100644
--- a/Documentation/technical/commit-graph-format.txt
+++ b/Documentation/technical/commit-graph-format.txt
@@ -93,7 +93,7 @@ CHUNK DATA:
2 bits of the lowest byte, storing the 33rd and 34th bit of the
commit time.
- Generation Data (ID: {'G', 'D', 'A', 'T' }) (N * 4 bytes) [Optional]
+ Generation Data (ID: {'G', 'D', 'A', '2' }) (N * 4 bytes) [Optional]
* This list of 4-byte values store corrected commit date offsets for the
commits, arranged in the same order as commit data chunk.
* If the corrected commit date offset cannot be stored within 31 bits,
@@ -104,7 +104,7 @@ CHUNK DATA:
by compatible versions of Git and in case of split commit-graph chains,
the topmost layer also has Generation Data chunk.
- Generation Data Overflow (ID: {'G', 'D', 'O', 'V' }) [Optional]
+ Generation Data Overflow (ID: {'G', 'D', 'O', '2' }) [Optional]
* This list of 8-byte values stores the corrected commit date offsets
for commits with corrected commit date offsets that cannot be
stored within 31 bits.
@@ -156,3 +156,11 @@ CHUNK DATA:
TRAILER:
H-byte HASH-checksum of all of the above.
+
+== Historical Notes:
+
+The Generation Data (GDA2) and Generation Data Overflow (GDO2) chunks have
+the number '2' in their chunk IDs because a previous version of Git wrote
+possibly erroneous data in these chunks with the IDs "GDAT" and "GDOV". By
+changing the IDs, newer versions of Git will silently ignore those older
+chunks and write the new information without trusting the incorrect data.