author | René Scharfe <l.s.r@web.de> | 2022-06-15 19:02:33 +0200 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2022-06-15 13:19:47 -0700 |
commit | 76d7602631a9d0cb67cc1b848d580b862dc5de8b (patch) | |
tree | fb72986437cfb791d2a109f891ab08382014abe5 /Documentation/git-archive.txt | |
parent | dfce1186c6034d6f4ea283f5178fd25cbd8f4fc0 (diff) | |
download | git-76d7602631a9d0cb67cc1b848d580b862dc5de8b.tar.gz |
archive-tar: add internal gzip implementation
Git uses zlib for its own object store, but calls gzip when creating tgz
archives. Add an option to perform the gzip compression for the latter
using zlib, without depending on the external gzip binary.
Plug it in by making write_block a function pointer and switching to a
compressing variant if the filter command has the magic value "git
archive gzip". Does that indirection slow down tar creation? Not
really, at least not in this test:
$ hyperfine -w3 -L rev HEAD,origin/main -p 'git checkout {rev} && make' \
'./git -C ../linux archive --format=tar HEAD # {rev}'
Benchmark #1: ./git -C ../linux archive --format=tar HEAD # HEAD
Time (mean ± σ): 4.044 s ± 0.007 s [User: 3.901 s, System: 0.137 s]
Range (min … max): 4.038 s … 4.059 s 10 runs
Benchmark #2: ./git -C ../linux archive --format=tar HEAD # origin/main
Time (mean ± σ): 4.047 s ± 0.009 s [User: 3.903 s, System: 0.138 s]
Range (min … max): 4.038 s … 4.066 s 10 runs
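The dispatch described above can be sketched outside of C. This is a minimal Python illustration, not the code in archive-tar.c: `make_write_block`, `archive`, and the inner function names are hypothetical, and only the magic value `git archive gzip` comes from this commit. It relies on Python's `zlib.compressobj` with `wbits=31`, which makes zlib emit gzip framing.

```python
import gzip
import io
import zlib

MAGIC_COMMAND = "git archive gzip"  # magic value named in the commit message


def make_write_block(filter_command, out):
    """Return (write_block, finish) callables for the given filter command.

    Hypothetical helper: it mirrors the idea of swapping a function
    pointer, not the actual implementation in Git.
    """
    if filter_command == MAGIC_COMMAND:
        # wbits=31 (16 + 15) asks zlib for a gzip header and trailer.
        compressor = zlib.compressobj(wbits=31)

        def write_block(block):
            out.write(compressor.compress(block))

        def finish():
            # Flush buffered data and write the gzip trailer.
            out.write(compressor.flush())
    else:
        # Plain variant: blocks pass through unchanged. In this sketch an
        # external filter, if any, would consume the stream afterwards.
        def write_block(block):
            out.write(block)

        def finish():
            pass

    return write_block, finish


def archive(blocks, filter_command):
    """Feed blocks through the selected writer and return the output bytes."""
    out = io.BytesIO()
    write_block, finish = make_write_block(filter_command, out)
    for block in blocks:
        write_block(block)
    finish()
    return out.getvalue()
```

The caller never checks which variant it got; it only calls `write_block`, which is why the indirection costs so little in the uncompressed case.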
How does tgz creation perform?
$ hyperfine -w3 -L command 'gzip -cn','git archive gzip' \
'./git -c tar.tgz.command="{command}" -C ../linux archive --format=tgz HEAD'
Benchmark #1: ./git -c tar.tgz.command="gzip -cn" -C ../linux archive --format=tgz HEAD
Time (mean ± σ): 20.404 s ± 0.006 s [User: 23.943 s, System: 0.401 s]
Range (min … max): 20.395 s … 20.414 s 10 runs
Benchmark #2: ./git -c tar.tgz.command="git archive gzip" -C ../linux archive --format=tgz HEAD
Time (mean ± σ): 23.807 s ± 0.023 s [User: 23.655 s, System: 0.145 s]
Range (min … max): 23.782 s … 23.857 s 10 runs
Summary
'./git -c tar.tgz.command="gzip -cn" -C ../linux archive --format=tgz HEAD' ran
1.17 ± 0.00 times faster than './git -c tar.tgz.command="git archive gzip" -C ../linux archive --format=tgz HEAD'
So the internal implementation takes 17% longer on the Linux repo, but
uses 2% less CPU time. That's because the external gzip can run in
parallel on its own processor, while the internal one works sequentially
and avoids the inter-process communication overhead.
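The two percentages can be re-derived from the mean values in the hyperfine output. The sketch below only redoes that arithmetic. The dictionary layout and the `cpu_time` helper are illustrative, and CPU time is assumed to mean user plus system time.

```python
# Mean times in seconds, copied from the hyperfine output above.
external = {"wall": 20.404, "user": 23.943, "system": 0.401}  # gzip -cn
internal = {"wall": 23.807, "user": 23.655, "system": 0.145}  # git archive gzip


def cpu_time(run):
    """Total CPU time, assumed here to be user + system."""
    return run["user"] + run["system"]


# Relative increase in wall-clock time of the internal implementation.
wall_increase = internal["wall"] / external["wall"] - 1

# Relative decrease in total CPU time of the internal implementation.
cpu_decrease = 1 - cpu_time(internal) / cpu_time(external)

print(f"wall-clock time: {wall_increase:+.1%}")
print(f"CPU time:        {-cpu_decrease:+.1%}")
```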
What are the benefits? Only an internal sequential implementation can
offer this eco mode, and it allows avoiding the gzip(1) requirement.
This implementation uses the helper functions from our zlib.c instead of
the convenient gz* functions from zlib, because the latter don't give
the control over the generated gzip header that the next patch requires.
Original-patch-by: Rohit Ashiwal <rohit.ashiwal265@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'Documentation/git-archive.txt')
-rw-r--r-- | Documentation/git-archive.txt | 3 |
1 files changed, 2 insertions, 1 deletions
diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index ff3f7b0344..b2d1b63d31 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -148,7 +148,8 @@ tar.<format>.command::
 	to the command (e.g., `-9`).
 +
 The `tar.gz` and `tgz` formats are defined automatically and use the
-command `gzip -cn` by default.
+command `gzip -cn` by default. An internal gzip implementation can be
+used by specifying the value `git archive gzip`.
 
 tar.<format>.remote::
 	If true, enable the format for use by remote clients via