'\" t
.\"     Title: gitformat-commit-graph
.\"    Author: [FIXME: author] [see http://www.docbook.org/tdg5/en/html/author]
.\" Generator: DocBook XSL Stylesheets vsnapshot <http://docbook.sf.net/>
.\"      Date: 2024-04-12
.\"    Manual: Git Manual
.\"    Source: Git 2.44.0.591.g8f7582d995
.\"  Language: English
.\"
.TH "GITFORMAT\-COMMIT\-GRAPH" "5" "2024\-04\-12" "Git 2\&.44\&.0\&.591\&.g8f7582" "Git Manual"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
gitformat-commit-graph \- Git commit\-graph format
.SH "SYNOPSIS"
.sp
.nf
$GIT_DIR/objects/info/commit\-graph
$GIT_DIR/objects/info/commit\-graphs/*
.fi
.sp
.SH "DESCRIPTION"
.sp
The Git commit\-graph stores a list of commit OIDs and some associated metadata, including:
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
The generation number of the commit\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
The root tree OID\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
The commit date\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
The parents of the commit, stored using positional references within the graph file\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
The Bloom filter of the commit carrying the paths that were changed between the commit and its first parent, if requested\&.
.RE
.sp
These positional references are stored as unsigned 32\-bit integers corresponding to the array position within the list of commit OIDs\&. Due to some special constants we use to track parents, we can store at most (1 << 30) + (1 << 29) + (1 << 28) \- 1 (around 1\&.8 billion) commits\&.
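.sp
The limit above can be checked with a few lines of arithmetic\&. The following is a minimal Python sketch, not code taken from Git; the name MAX_COMMITS is made up for illustration\&. It also shows that the limit sits just below 0x70000000, the value that the Commit Data chunk uses to mark a missing parent\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
# Illustrative only: MAX_COMMITS is a hypothetical name, not a Git constant.
MAX_COMMITS = (1 << 30) + (1 << 29) + (1 << 28) - 1

# Valid positions run from 0 to MAX_COMMITS - 1, so they never collide
# with the documented "no parent" value 0x70000000.
assert MAX_COMMITS == 0x70000000 - 1
print(MAX_COMMITS)
```
.fi
.if n \{\
.RE
.\}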
.SH "COMMIT\-GRAPH FILES HAVE THE FOLLOWING FORMAT:"
.sp
In order to allow extensions that add extra data to the graph, we organize the body into "chunks" and provide a binary lookup table at the beginning of the body\&. The header includes certain values, such as number of chunks and hash type\&.
.sp
All multi\-byte numbers are in network byte order\&.
.SS "HEADER:"
.sp
.if n \{\
.RS 4
.\}
.nf
4\-byte signature:
    The signature is: {\*(AqC\*(Aq, \*(AqG\*(Aq, \*(AqP\*(Aq, \*(AqH\*(Aq}
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
1\-byte version number:
    Currently, the only valid version is 1\&.
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
1\-byte Hash Version
    We infer the hash length (H) from this value:
      1 => SHA\-1
      2 => SHA\-256
    If the hash type does not match the repository\*(Aqs hash algorithm, the
    commit\-graph file should be ignored with a warning presented to the
    user\&.
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
1\-byte number (C) of "chunks"
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
1\-byte number (B) of base commit\-graphs
    We infer the length (H*B) of the Base Graphs chunk
    from this value\&.
.fi
.if n \{\
.RE
.\}
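.sp
As an illustration of the header layout, the following Python sketch unpacks the 8 header bytes described above\&. It is not Git\*(Aqs parser: the function name, the dictionary keys and the error handling are assumptions made for this example, and the hash lengths (20 bytes for SHA\-1, 32 bytes for SHA\-256) are filled in from the hash version\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
import struct

# Hash lengths in bytes for the two documented hash versions.
HASH_LENGTHS = {1: 20, 2: 32}

def parse_header(buf):
    """Parse the 8-byte header; all names here are illustrative."""
    signature, version, hash_version, num_chunks, num_base = struct.unpack(
        ">4sBBBB", buf[:8])
    if signature != b"CGPH":
        raise ValueError("bad signature")
    if version != 1:
        raise ValueError("unsupported version")
    if hash_version not in HASH_LENGTHS:
        raise ValueError("unknown hash version")
    return {
        "hash_len": HASH_LENGTHS[hash_version],  # H
        "num_chunks": num_chunks,                # C
        "num_base_graphs": num_base,             # B
    }

# A made-up header: version 1, SHA-1, 3 chunks, no base graphs.
header = parse_header(b"CGPH" + bytes([1, 1, 3, 0]))
print(header)
```
.fi
.if n \{\
.RE
.\}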
.SS "CHUNK LOOKUP:"
.sp
.if n \{\
.RS 4
.\}
.nf
(C + 1) * 12 bytes listing the table of contents for the chunks:
    First 4 bytes describe the chunk id\&. Value 0 is a terminating label\&.
    Other 8 bytes provide the byte\-offset in current file for chunk to
    start\&. (Chunks are ordered contiguously in the file, so you can infer
    the length using the next chunk position if necessary\&.) Each chunk
    ID appears at most once\&.
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
The CHUNK LOOKUP matches the table of contents from
the chunk\-based file format, see \fBgitformat\-chunk\fR(5)\&.
.fi
.if n \{\
.RE
.\}
.sp
.if n \{\
.RS 4
.\}
.nf
The remaining data in the body is described one chunk at a time, and
these chunks may be given in any order\&. Chunks are required unless
otherwise specified\&.
.fi
.if n \{\
.RE
.\}
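.sp
The table of contents can be read as in the following Python sketch\&. It assumes that the table starts right after the 8\-byte header and that entries are listed in file order, so that the next offset bounds the current chunk; the function name and return shape are hypothetical\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
import struct

def parse_chunk_lookup(buf, num_chunks, start=8):
    """Return {chunk_id: (offset, length)}; a sketch, not Git's parser."""
    entries = []
    for i in range(num_chunks + 1):
        chunk_id, offset = struct.unpack_from(">4sQ", buf, start + 12 * i)
        entries.append((chunk_id, offset))
    if entries[-1][0] != bytes(4):
        raise ValueError("missing terminating label")
    chunks = {}
    # Chunks are contiguous, so the next offset bounds the current chunk.
    for (chunk_id, offset), (_, next_offset) in zip(entries, entries[1:]):
        if chunk_id in chunks:
            raise ValueError("duplicate chunk id")
        chunks[chunk_id] = (offset, next_offset - offset)
    return chunks

# Made-up table for a file with one SHA-1 commit and two chunks.
toc = (struct.pack(">4sQ", b"OIDF", 44)
       + struct.pack(">4sQ", b"OIDL", 1068)
       + struct.pack(">4sQ", bytes(4), 1088))
chunks = parse_chunk_lookup(bytes(8) + toc, 2)
print(chunks)
```
.fi
.if n \{\
.RE
.\}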
.SS "CHUNK DATA:"
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBOID Fanout (ID: {O, I, D, F}) (256 * 4 bytes)\fR
.RS 4
.sp
.if n \{\
.RS 4
.\}
.nf
The ith entry, F[i], stores the number of OIDs with first
byte at most i\&. Thus F[255] stores the total
number of commits (N)\&.
.fi
.if n \{\
.RE
.\}
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBOID Lookup (ID: {O, I, D, L}) (N * H bytes)\fR
.RS 4
.sp
.if n \{\
.RS 4
.\}
.nf
The OIDs for all commits in the graph, sorted in ascending order\&.
.fi
.if n \{\
.RE
.\}
.RE
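.sp
The fanout table narrows the range that has to be searched in the OID Lookup chunk\&. The Python sketch below shows one way to combine the two chunks; the function name and the in\-memory slicing are assumptions chosen for readability, not a description of how Git performs the lookup\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
import struct
from bisect import bisect_left

def find_commit(fanout, oid_lookup, oid, hash_len=20):
    """Return the position of oid in the OID Lookup chunk, or None."""
    first = oid[0]
    # F[i] counts OIDs whose first byte is at most i.
    lo = 0 if first == 0 else struct.unpack_from(">I", fanout, 4 * (first - 1))[0]
    hi = struct.unpack_from(">I", fanout, 4 * first)[0]
    oids = [oid_lookup[hash_len * i:hash_len * (i + 1)] for i in range(lo, hi)]
    pos = bisect_left(oids, oid)
    if pos < len(oids) and oids[pos] == oid:
        return lo + pos
    return None

# Three made-up OIDs in ascending order, and the matching fanout table.
demo_oids = [bytes([1]) + bytes(19),
             bytes([1]) + bytes(18) + bytes([5]),
             bytes([200]) + bytes(19)]
demo_fanout = b"".join(
    struct.pack(">I", sum(1 for o in demo_oids if o[0] <= i))
    for i in range(256))
demo_lookup = b"".join(demo_oids)
print(find_commit(demo_fanout, demo_lookup, demo_oids[2]))
```
.fi
.if n \{\
.RE
.\}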
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBCommit Data (ID: {C, D, A, T }) (N * (H + 16) bytes)\fR
.RS 4
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
The first H bytes are for the OID of the root tree\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
The next 8 bytes are for the positions of the first two parents of the ith commit\&. The value 0x70000000 is stored if there is no parent in that position\&. If there are more than two parents, the second value has its most\-significant bit on and the other bits store an array position into the Extra Edge List chunk\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
The next 8 bytes store the topological level (generation number v1) of the commit and the commit time in seconds since the Unix epoch\&. The generation number uses the upper 30 bits of the first 4 bytes\&. The commit time uses all 32 bits of the second 4 bytes, together with the lowest 2 bits of the first 4 bytes, which store the 33rd and 34th bits of the commit time\&.
.RE
.RE
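.sp
The record layout can be decoded as in the following Python sketch\&. The function name, the dictionary keys and the constant names are hypothetical; the bit layout follows the description above, with the two low bits of the first 4\-byte word taken as bits 33 and 34 of the commit time\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
import struct

PARENT_NONE = 0x70000000       # documented "no parent" value
EXTRA_EDGES_FLAG = 0x80000000  # most-significant bit of the second parent

def parse_commit_data(record, hash_len=20):
    """Decode one (H + 16)-byte record; names are illustrative."""
    tree_oid = record[:hash_len]
    parent1, parent2, word1, word2 = struct.unpack_from(
        ">IIII", record, hash_len)
    result = {
        "tree": tree_oid,
        "parent1": None if parent1 == PARENT_NONE else parent1,
        "parent2": None,
        "extra_edges_at": None,
        "generation": word1 >> 2,                      # upper 30 bits
        "commit_time": ((word1 & 0x3) << 32) | word2,  # 34-bit value
    }
    if parent2 & EXTRA_EDGES_FLAG:
        # More than two parents: the low bits index the Extra Edge List.
        result["extra_edges_at"] = parent2 & 0x7FFFFFFF
    elif parent2 != PARENT_NONE:
        result["parent2"] = parent2
    return result

# Made-up record: one parent at position 5, generation 7.
demo_record = bytes(range(20)) + struct.pack(
    ">IIII", 5, PARENT_NONE, (7 << 2) | 1, 10)
commit = parse_commit_data(demo_record)
print(commit["generation"], commit["commit_time"])
```
.fi
.if n \{\
.RE
.\}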
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBGeneration Data (ID: {G, D, A, 2 }) (N * 4 bytes) [Optional]\fR
.RS 4
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
This list of 4\-byte values stores corrected commit date offsets for the commits, arranged in the same order as the Commit Data chunk\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
If the corrected commit date offset cannot be stored within 31 bits, the value has its most\-significant bit on and the other bits store the position of the corrected commit date offset in the Generation Data Overflow chunk\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
The Generation Data chunk is present only when the commit\-graph file is written by a compatible version of Git and, in the case of split commit\-graph chains, only when the topmost layer also has a Generation Data chunk\&.
.RE
.RE
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBGeneration Data Overflow (ID: {G, D, O, 2 }) [Optional]\fR
.RS 4
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
This list of 8\-byte values stores the corrected commit date offsets for commits with corrected commit date offsets that cannot be stored within 31 bits\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
The Generation Data Overflow chunk is present only when the Generation Data chunk is present and at least one corrected commit date offset cannot be stored within 31 bits\&.
.RE
.RE
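.sp
The interplay between the Generation Data and Generation Data Overflow chunks can be sketched in Python as follows\&. The function and parameter names are hypothetical, and the two arguments are assumed to hold the raw bytes of the respective chunks\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
import struct

def corrected_date_offset(gda2, gdo2, pos):
    """Return the corrected commit date offset for the commit at pos."""
    value = struct.unpack_from(">I", gda2, 4 * pos)[0]
    if value & 0x80000000:
        # The low 31 bits index the 8-byte Generation Data Overflow list.
        overflow_pos = value & 0x7FFFFFFF
        return struct.unpack_from(">Q", gdo2, 8 * overflow_pos)[0]
    return value

# Made-up chunks: commit 0 fits in 31 bits, commit 1 overflows.
demo_gda2 = struct.pack(">II", 42, 0x80000000 | 0)
demo_gdo2 = struct.pack(">Q", 1 << 33)
print(corrected_date_offset(demo_gda2, demo_gdo2, 1))
```
.fi
.if n \{\
.RE
.\}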
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBExtra Edge List (ID: {E, D, G, E}) [Optional]\fR
.RS 4
.sp
.if n \{\
.RS 4
.\}
.nf
This list of 4\-byte values stores the second through nth parents for
all octopus merges\&. The second parent value in the commit data stores
an array position within this list along with the most\-significant bit
on\&. Starting at that array position, iterate through this list of commit
positions for the parents until reaching a value with the most\-significant
bit on\&. The other bits correspond to the position of the last parent\&.
.fi
.if n \{\
.RE
.\}
.RE
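.sp
The iteration described above can be sketched in Python as follows\&. The function name is hypothetical and the argument is assumed to hold the raw bytes of the Extra Edge List chunk; a truncated chunk simply raises an error from the struct module\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
import struct

def extra_parents(edge_chunk, start):
    """Return parent positions starting at array index start."""
    parents = []
    index = start
    while True:
        value = struct.unpack_from(">I", edge_chunk, 4 * index)[0]
        parents.append(value & 0x7FFFFFFF)
        if value & 0x80000000:
            return parents  # the flagged entry is the last parent
        index += 1

# Made-up list: parents at positions 3, 9 and 12, the last one flagged.
demo_edges = struct.pack(">III", 3, 9, 0x80000000 | 12)
print(extra_parents(demo_edges, 0))
```
.fi
.if n \{\
.RE
.\}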
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBBloom Filter Index (ID: {B, I, D, X}) (N * 4 bytes) [Optional]\fR
.RS 4
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
The ith entry, BIDX[i], stores the number of bytes in all Bloom filters from commit 0 to commit i (inclusive) in lexicographic order\&. The Bloom filter for the i\-th commit spans from BIDX[i\-1] to BIDX[i] (plus header length), where BIDX[\-1] is 0\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
The BIDX chunk is ignored if the BDAT chunk is not present\&.
.RE
.RE
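.sp
The cumulative sizes in BIDX translate into byte ranges as in the Python sketch below\&. The names are hypothetical, and the returned offsets are relative to the end of the BDAT header, which is assumed to be 12 bytes long (three unsigned 32\-bit integers)\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
import struct

BDAT_HEADER_LEN = 12  # three unsigned 32-bit integers

def bloom_filter_span(bidx, i):
    """Return (start, end) of filter i, relative to the end of the header."""
    end = struct.unpack_from(">I", bidx, 4 * i)[0]
    # BIDX[-1] is defined as 0.
    start = 0 if i == 0 else struct.unpack_from(">I", bidx, 4 * (i - 1))[0]
    return start, end

# Made-up index: filters of 8, 0 and 16 bytes.
demo_bidx = struct.pack(">III", 8, 8, 24)
start, end = bloom_filter_span(demo_bidx, 2)
print(BDAT_HEADER_LEN + start, BDAT_HEADER_LEN + end)
```
.fi
.if n \{\
.RE
.\}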
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBBloom Filter Data (ID: {B, D, A, T}) [Optional]\fR
.RS 4
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
It starts with a header consisting of three unsigned 32\-bit integers:
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Version of the hash algorithm being used\&. We currently only support value 1 which corresponds to the 32\-bit version of the murmur3 hash implemented exactly as described in
\m[blue]\fBhttps://en\&.wikipedia\&.org/wiki/MurmurHash#Algorithm\fR\m[]
and the double hashing technique using seed values 0x293ae76f and 0x7e646e2c as described in
\m[blue]\fBhttps://doi\&.org/10\&.1007/978\-3\-540\-30494\-4_26\fR\m[]
"Bloom Filters in Probabilistic Verification"
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
The number of times a path is hashed and hence the number of bit positions that cumulatively determine whether a file is present in the commit\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
The minimum number of bits
\fIb\fR
per entry in the Bloom filter\&. If the filter contains
\fIn\fR
entries, then the filter size is the minimum number of 64\-bit words that contain n*b bits\&.
.RE
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
The rest of the chunk is the concatenation of all the computed Bloom filters for the commits in lexicographic order\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
Note: Commits with no changes or more than 512 changes have Bloom filters of length one, with either all bits set to zero or one respectively\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04'\(bu\h'+03'\c
.\}
.el \{\
.sp -1
.IP \(bu 2.3
.\}
The BDAT chunk is present if and only if BIDX is present\&.
.RE
.RE
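.sp
The header fields and the size rule can be sketched in Python as follows\&. The function names are hypothetical and the header values in the example are arbitrary\&. The sketch covers only the general rule of n*b bits rounded up to whole 64\-bit words; the special one\-length filters mentioned in the note above are not handled\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
import struct

def parse_bdat_header(bdat):
    """Return (hash_version, num_hashes, bits_per_entry)."""
    return struct.unpack_from(">III", bdat, 0)

def filter_words(num_entries, bits_per_entry):
    """Minimum number of 64-bit words that hold n * b bits."""
    total_bits = num_entries * bits_per_entry
    return (total_bits + 63) // 64

# Arbitrary example header: hash version 1, 7 hashes, 10 bits per entry.
demo_bdat = struct.pack(">III", 1, 7, 10)
hash_version, num_hashes, bits_per_entry = parse_bdat_header(demo_bdat)
print(filter_words(10, bits_per_entry))
```
.fi
.if n \{\
.RE
.\}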
.sp
.it 1 an-trap
.nr an-no-space-flag 1
.nr an-break-flag 1
.br
.ps +1
\fBBase Graphs List (ID: {B, A, S, E}) [Optional]\fR
.RS 4
.sp
.if n \{\
.RS 4
.\}
.nf
This list of H\-byte hashes describes a set of B commit\-graph files that
form a commit\-graph chain\&. The graph position for the ith commit in this
file\*(Aqs OID Lookup chunk is equal to i plus the number of commits in all
base graphs\&.  If B is non\-zero, this chunk must exist\&.
.fi
.if n \{\
.RE
.\}
.RE
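.sp
The position rule for chained files can be sketched in Python as follows\&. Both function names are hypothetical, and the per\-file commit counts are assumed to have been read from the base graphs beforehand\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
def split_base_graphs(base_chunk, num_base, hash_len=20):
    """Split the Base Graphs List into B hex-encoded hashes."""
    return [base_chunk[hash_len * i:hash_len * (i + 1)].hex()
            for i in range(num_base)]

def graph_position(i, base_commit_counts):
    """Position of the ith commit of this file within the whole chain."""
    return i + sum(base_commit_counts)

# Made-up chain: two base graphs holding 100 and 25 commits.
demo_base = bytes([0xAA]) * 20 + bytes([0xBB]) * 20
print(split_base_graphs(demo_base, 2))
print(graph_position(3, [100, 25]))
```
.fi
.if n \{\
.RE
.\}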
.SS "TRAILER:"
.sp
.if n \{\
.RS 4
.\}
.nf
H\-byte HASH\-checksum of all of the above\&.
.fi
.if n \{\
.RE
.\}
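.sp
Checking the trailer can be sketched in Python as follows\&. The function name is hypothetical, and the sketch assumes that the checksum uses the hash function selected by the hash version in the header (SHA\-1 for 1, SHA\-256 for 2)\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
import hashlib

def verify_trailer(data, hash_version=1):
    """Compare the trailing checksum with a hash of everything before it."""
    name = {1: "sha1", 2: "sha256"}[hash_version]
    digest_len = hashlib.new(name).digest_size
    body, trailer = data[:-digest_len], data[-digest_len:]
    return hashlib.new(name, body).digest() == trailer

# Made-up file: a bare header followed by its SHA-1 checksum.
demo_body = b"CGPH" + bytes([1, 1, 0, 0])
demo_file = demo_body + hashlib.sha1(demo_body).digest()
print(verify_trailer(demo_file))
```
.fi
.if n \{\
.RE
.\}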
.SH "HISTORICAL NOTES:"
.sp
The Generation Data (GDA2) and Generation Data Overflow (GDO2) chunks have the number \fI2\fR in their chunk IDs because a previous version of Git wrote possibly erroneous data in these chunks with the IDs "GDAT" and "GDOV"\&. By changing the IDs, newer versions of Git will silently ignore those older chunks and write the new information without trusting the incorrect data\&.
.SH "GIT"
.sp
Part of the \fBgit\fR(1) suite