aboutsummaryrefslogtreecommitdiffstats
path: root/fetch-pack.c
AgeCommit message (Collapse)AuthorFilesLines
11 dayscocci: apply rules to rewrite callers of "refs" interfacesPatrick Steinhardt1-2/+4
Apply the rules that rewrite callers of "refs" interfaces to explicitly pass `struct ref_store`. The resulting patch has been applied with the `--whitespace=fix` option. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-30Merge branch 'sd/negotiate-trace-fix'Junio C Hamano1-1/+1
Tracing fix. * sd/negotiate-trace-fix: push: region_leave trace for negotiate_using_fetch
2024-01-08Merge branch 'en/header-cleanup'Junio C Hamano1-2/+0
Remove unused header "#include". * en/header-cleanup: treewide: remove unnecessary includes in source files treewide: add direct includes currently only pulled in transitively trace2/tr2_tls.h: remove unnecessary include submodule-config.h: remove unnecessary include pkt-line.h: remove unnecessary include line-log.h: remove unnecessary include http.h: remove unnecessary include fsmonitor--daemon.h: remove unnecessary includes blame.h: remove unnecessary includes archive.h: remove unnecessary include treewide: remove unnecessary includes in source files treewide: remove unnecessary includes from header files
2024-01-03push: region_leave trace for negotiate_using_fetchSam Delmerico1-1/+1
There were two region_enter events for negotiate_using_fetch instead of one enter and one leave. This commit replaces the second region_enter event with a region_leave. Signed-off-by: Sam Delmerico <delmerico@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-26treewide: remove unnecessary includes in source filesElijah Newren1-2/+0
Each of these were checked with gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE} to ensure that removing the direct inclusion of the header actually resulted in that header no longer being included at all (i.e. that no other header pulled it in transitively). ...except for a few cases where we verified that although the header was brought in transitively, nothing from it was directly used in that source file. These cases were: * builtin/credential-cache.c * builtin/pull.c * builtin/send-pack.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-09fsck: handle NULL value when parsing message configJeff King1-4/+8
When parsing fsck.*, receive.fsck.*, or fetch.fsck.*, we don't check for an implicit bool. So any of: [fsck] badTree [receive "fsck"] badTree [fetch "fsck"] badTree will cause us to segfault. We can fix it with config_error_nonbool() in the usual way, but we have to make a few more changes to get good error messages. The problem is that all three spots do: if (skip_prefix(var, "fsck.", &var)) to match and parse the actual message id. But that means that "var" now just says "badTree" instead of "receive.fsck.badTree", making the resulting message confusing. We can fix that by storing the parsed message id in its own separate variable. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-30Merge branch 'ts/unpacklimit-config-fix'Junio C Hamano1-3/+3
transfer.unpackLimit ought to be used as a fallback, but overrode fetch.unpackLimit and receive.unpackLimit instead. * ts/unpacklimit-config-fix: transfer.unpackLimit: fetch/receive.unpackLimit takes precedence
2023-08-22transfer.unpackLimit: fetch/receive.unpackLimit takes precedenceJunio C Hamano1-3/+3
The transfer.unpackLimit configuration variable is documented to be used only as a fallback value when the more operation-specific fetch.unpackLimit and receive.unpackLimit variables are not set, but the implementation had the precedence reversed. Apparently this was broken since the transfer.unpackLimit was introduced in e28714c5 (Consolidate {receive,fetch}.unpackLimit, 2007-01-24). Often when documentation and code have diverged for so long, we prefer to change the documentation instead, to avoid disrupting users. But doing so would make these weirdly unlike most other "specific overrides general" config options. And the fact that the bug has existed for so long without anyone noticing implies to me that nobody really tries to mix and match them much. Signed-off-by: Taylor Santiago <taylorsantiago@google.com> [jc: rewrote the log message, added tests, covered receive-pack as well] Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-17Merge branch 'cw/compat-util-header-cleanup'Junio C Hamano1-2/+0
Further shuffling of declarations across header files to streamline file dependencies. * cw/compat-util-header-cleanup: git-compat-util: move alloc macros to git-compat-util.h treewide: remove unnecessary includes for wrapper.h kwset: move translation table from ctype sane-ctype.h: create header for sane-ctype macros git-compat-util: move wrapper.c funcs to its header git-compat-util: move strbuf.c funcs to its header
2023-07-06Merge branch 'gc/config-context'Junio C Hamano1-2/+3
Reduce reliance on a global state in the config reading API. * gc/config-context: config: pass source to config_parser_event_fn_t config: add kvi.path, use it to evaluate includes config.c: remove config_reader from configsets config: pass kvi to die_bad_number() trace2: plumb config kvi config.c: pass ctx with CLI config config: pass ctx with config files config.c: pass ctx in configsets config: add ctx arg to config_fn_t urlmatch.h: use config_fn_t type config: inline git_color_default_config
2023-07-05git-compat-util: move alloc macros to git-compat-util.hCalvin Wan1-1/+0
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for dynamic array allocation. Moving these macros to git-compat-util.h with the other alloc macros focuses alloc.[ch] to allocation for Git objects and additionally allows us to remove inclusions to alloc.h from files that solely used the above macros. Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-05treewide: remove unnecessary includes for wrapper.hCalvin Wan1-1/+0
Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-28config: add ctx arg to config_fn_tGlen Choo1-2/+3
Add a new "const struct config_context *ctx" arg to config_fn_t to hold additional information about the config iteration operation. config_context has a "struct key_value_info kvi" member that holds metadata about the config source being read (e.g. what kind of config source it is, the filename, etc). In this series, we're only interested in .kvi, so we could have just used "struct key_value_info" as an arg, but config_context makes it possible to add/adjust members in the future without changing the config_fn_t signature. We could also consider other ways of organizing the args (e.g. moving the config name and value into config_context or key_value_info), but in my experiments, the incremental benefit doesn't justify the added complexity (e.g. a config_fn_t will sometimes invoke another config_fn_t but with a different config value). In subsequent commits, the .kvi member will replace the global "struct config_reader" in config.c, making config iteration a global-free operation. It requires much more work for the machinery to provide meaningful values of .kvi, so for now, merely change the signature and call sites, pass NULL as a placeholder value, and don't rely on the arg in any meaningful way. Most of the changes are performed by contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every config_fn_t: - Modifies the signature to accept "const struct config_context *ctx" - Passes "ctx" to any inner config_fn_t, if needed - Adds UNUSED attributes to "ctx", if needed Most config_fn_t instances are easily identified by seeing if they are called by the various config functions. Most of the remaining ones are manually named in the .cocci patch. Manual cleanups are still needed, but the majority of it is trivial; it's either adjusting config_fn_t that the .cocci patch didn't catch, or adding forward declarations of "struct config_context ctx" to make the signatures make sense. The non-trivial changes are in cases where we are invoking a config_fn_t outside of config machinery, and we now need to decide what value of "ctx" to pass. These cases are: - trace2/tr2_cfg.c:tr2_cfg_set_fl() This is indirectly called by git_config_set() so that the trace2 machinery can notice the new config values and update its settings using the tr2 config parsing function, i.e. tr2_cfg_cb(). - builtin/checkout.c:checkout_main() This calls git_xmerge_config() as a shorthand for parsing a CLI arg. This might be worth refactoring away in the future, since git_xmerge_config() can call git_default_config(), which can do much more than just parsing. Handle them by creating a KVI_INIT macro that initializes "struct key_value_info" to a reasonable default, and use that to construct the "ctx" arg. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21object-store-ll.h: split this header out of object-store.hElijah Newren1-1/+1
The vast majority of files including object-store.h did not need dir.h nor khash.h. Split the header into two files, and let most just depend upon object-store-ll.h, while letting the two callers that need it depend on the full object-store.h. After this patch: $ git grep -h include..object-store | sort | uniq -c 2 #include "object-store.h" 129 #include "object-store-ll.h" Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21repository: remove unnecessary include of path.hElijah Newren1-0/+1
This also made it clear that several .c files that depended upon path.h were missing a #include for it; add the missing includes while at it. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-09Merge branch 'en/header-split-cache-h-part-2'Junio C Hamano1-0/+1
More header clean-up. * en/header-split-cache-h-part-2: (22 commits) reftable: ensure git-compat-util.h is the first (indirect) include diff.h: reduce unnecessary includes object-store.h: reduce unnecessary includes commit.h: reduce unnecessary includes fsmonitor: reduce includes of cache.h cache.h: remove unnecessary headers treewide: remove cache.h inclusion due to previous changes cache,tree: move basic name compare functions from read-cache to tree cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.c hash-ll.h: split out of hash.h to remove dependency on repository.h tree-diff.c: move S_DIFFTREE_IFXMIN_NEQ define from cache.h dir.h: move DTYPE defines from cache.h versioncmp.h: move declarations for versioncmp.c functions from cache.h ws.h: move declarations for ws.c functions from cache.h match-trees.h: move declarations for match-trees.c functions from cache.h pkt-line.h: move declarations for pkt-line.c functions from cache.h base85.h: move declarations for base85.c functions from cache.h copy.h: move declarations for copy.c functions from cache.h server-info.h: move declarations for server-info.c functions from cache.h packfile.h: move pack_window and pack_entry from cache.h ...
2023-04-25Merge branch 'jk/protocol-cap-parse-fix'Junio C Hamano1-2/+2
The code to parse capability list for v0 on-wire protocol fell into an infinite loop when a capability appears multiple times, which has been corrected. * jk/protocol-cap-parse-fix: v0 protocol: use size_t for capability length/offset t5512: test "ls-remote --heads --symref" filtering with v0 and v2 t5512: allow any protocol version for filtered symref test t5512: add v2 support for "ls-remote --symref" test v0 protocol: fix sha1/sha256 confusion for capabilities^{} t5512: stop referring to "v1" protocol v0 protocol: fix infinite loop when parsing multi-valued capabilities
2023-04-24commit.h: reduce unnecessary includesElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14v0 protocol: use size_t for capability length/offsetJeff King1-2/+2
When parsing server capabilities, we use "int" to store lengths and offsets. At first glance this seems like a spot where our parser may be confused by integer overflow if somebody sent us a malicious response. In practice these strings are all bounded by the 64k limit of a pkt-line, so using "int" is OK. However, it makes the code simpler to audit if they just use size_t everywhere. Note that because we take these parameters as pointers, this also forces many callers to update their declared types. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: remove unnecessary cache.h inclusionElijah Newren1-1/+1
Several files were including cache.h solely to get other headers, such as trace.h and trace2.h. Since the last few commits have modified files to make these dependencies more explicit, the inclusion of cache.h is no longer needed in several cases. Remove it. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: be explicit about dependence on trace.h & trace2.hElijah Newren1-0/+1
Dozens of files made use of trace and trace2 functions, without explicitly including trace.h or trace2.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include trace.h or trace2.h if they are using them. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-06Merge branch 'en/header-split-cleanup'Junio C Hamano1-1/+4
Split key function and data structure definitions out of cache.h to new header files and adjust the users. * en/header-split-cleanup: csum-file.h: remove unnecessary inclusion of cache.h write-or-die.h: move declarations for write-or-die.c functions from cache.h treewide: remove cache.h inclusion due to setup.h changes setup.h: move declarations for setup.c functions from cache.h treewide: remove cache.h inclusion due to environment.h changes environment.h: move declarations for environment.c functions from cache.h treewide: remove unnecessary includes of cache.h wrapper.h: move declarations for wrapper.c functions from cache.h path.h: move function declarations for path.c functions from cache.h cache.h: remove expand_user_path() abspath.h: move absolute path functions from cache.h environment: move comment_line_char from cache.h treewide: remove unnecessary cache.h inclusion from several sources treewide: remove unnecessary inclusion of gettext.h treewide: be explicit about dependence on gettext.h treewide: remove unnecessary cache.h inclusion from a few headers
2023-04-06Merge branch 'ab/remove-implicit-use-of-the-repository'Junio C Hamano1-4/+4
Code clean-up around the use of the_repository. * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_* migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"
2023-04-04Merge branch 'ab/remove-implicit-use-of-the-repository' into ↵Junio C Hamano1-4/+4
en/header-split-cache-h * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_* migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"
2023-03-28cocci: apply the "object-store.h" part of "the_repository.pending"Ævar Arnfjörð Bjarmason1-4/+4
Apply the part of "the_repository.pending.cocci" pertaining to "object-store.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21csum-file.h: remove unnecessary inclusion of cache.hElijah Newren1-1/+1
With the change in the last commit to move several functions to write-or-die.h, csum-file.h no longer needs to include cache.h. However, removing that include forces several other C files, which directly or indirectly dependend upon csum-file.h's inclusion of cache.h, to now be more explicit about their dependencies. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21environment.h: move declarations for environment.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21wrapper.h: move declarations for wrapper.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: be explicit about dependence on gettext.hElijah Newren1-0/+1
Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-17Merge branch 'jk/unused-post-2.39-part2'Junio C Hamano1-1/+1
More work towards -Wunused. * jk/unused-post-2.39-part2: (21 commits) help: mark unused parameter in git_unknown_cmd_config() run_processes_parallel: mark unused callback parameters userformat_want_item(): mark unused parameter for_each_commit_graft(): mark unused callback parameter rewrite_parents(): mark unused callback parameter fetch-pack: mark unused parameter in callback function notes: mark unused callback parameters prio-queue: mark unused parameters in comparison functions for_each_object: mark unused callback parameters list-objects: mark unused callback parameters mark unused parameters in signal handlers run-command: mark error routine parameters as unused mark "pointless" data pointers in callbacks ref-filter: mark unused callback parameters http-backend: mark unused parameters in virtual functions http-backend: mark argc/argv unused object-name: mark unused parameters in disambiguate callbacks serve: mark unused parameters in virtual functions serve: use repository pointer to get config ls-refs: drop config caching ...
2023-02-24fetch-pack: mark unused parameter in callback functionJeff King1-1/+1
The for_each_cached_alternate() interface requires a callback that takes a negotiator parameter, but not all implementations need it. Mark the unused one as such to appease -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23cache.h: remove dependence on hex.h; make other files include it explicitlyElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23alloc.h: move ALLOC_GROW() functions from cache.hElijah Newren1-1/+2
This allows us to replace includes of cache.h with includes of the much smaller alloc.h in many places. It does mean that we also need to add includes of alloc.h in a number of C files. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-13server_supports_v2(): use a separate function for die_on_errorJeff King1-6/+6
The server_supports_v2() helper lets a caller find out if the server supports a feature, and will optionally die if it's not supported. This makes the return value confusing, as it's only meaningful when the function is not asked to die. Coverity flagged a new call like: /* check that we support "foo" */ server_supports_v2("foo", 1); complaining that we usually checked the return value, but this time we didn't. But this call is correct, and other ones that did: if (server_supports_v2("foo", 1)) do_something_with_foo(); are "wrong", in the sense that we know the conditional will always be true (but there's no bug; the code is simply misleading). Let's split the "die" behavior into its own function which returns void, and modify each caller to use the correct one. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-14Merge branch 'ab/unused-annotation'Junio C Hamano1-6/+6
Undoes 'jk/unused-annotation' topic and redoes it to work around Coccinelle rules misfiring false positives in unrelated codepaths. * ab/unused-annotation: git-compat-util.h: use "deprecated" for UNUSED variables git-compat-util.h: use "UNUSED", not "UNUSED(var)"
2022-09-14Merge branch 'jk/unused-annotation'Junio C Hamano1-5/+9
Annotate function parameters that are not used (but cannot be removed for structural reasons), to prepare us to later compile with -Wunused warning turned on. * jk/unused-annotation: is_path_owned_by_current_uid(): mark "report" parameter as unused run-command: mark unused async callback parameters mark unused read_tree_recursive() callback parameters hashmap: mark unused callback parameters config: mark unused callback parameters streaming: mark unused virtual method parameters transport: mark bundle transport_options as unused refs: mark unused virtual method parameters refs: mark unused reflog callback parameters refs: mark unused each_ref_fn parameters git-compat-util: add UNUSED macro
2022-09-01git-compat-util.h: use "UNUSED", not "UNUSED(var)"Ævar Arnfjörð Bjarmason1-6/+6
As reported in [1] the "UNUSED(var)" macro introduced in 2174b8c75de (Merge branch 'jk/unused-annotation' into next, 2022-08-24) breaks coccinelle's parsing of our sources in files where it occurs. Let's instead partially go with the approach suggested in [2] of making this not take an argument. As noted in [1] "coccinelle" will ignore such tokens in argument lists that it doesn't know about, and it's less of a surprise to syntax highlighters. This undoes the "help us notice when a parameter marked as unused is actually use" part of 9b240347543 (git-compat-util: add UNUSED macro, 2022-08-19), a subsequent commit will further tweak the macro to implement a replacement for that functionality. 1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/ 2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19run-command: mark unused async callback parametersJeff King1-1/+1
The start_async(), etc, functions need a "proc" callback that conforms to a particular interface. Not every callback needs every parameter (e.g., the caller might not even ask to open an input descriptor, in which case there is no point in the callback looking at it). Let's mark these for -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19refs: mark unused each_ref_fn parametersJeff King1-4/+8
Functions used with for_each_ref(), etc, need to conform to the each_ref_fn interface. But most of them don't need every parameter; let's annotate the unused ones to quiet -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-15fetch-pack: add tracing for negotiation roundsJosh Steadmon1-1/+55
Currently, negotiation for V0/V1/V2 fetch have trace2 regions covering the entire negotiation process. However, we'd like additional data, such as timing for each round of negotiation or the number of "haves" in each round. Additionally, "independent negotiation" (AKA push negotiation) has no tracing at all. Having this data would allow us to compare the performance of the various negotation implementations, and to debug unexpectedly slow fetch & push sessions. Add per-round trace2 regions for all negotiation implementations (V0+V1, V2, and independent negotiation), as well as an overall region for independent negotiation. Add trace2 data logging for the number of haves and "in vain" objects for each round, and for the total number of rounds once negotiation completes. Finally, add a few checks into various tests to verify that the number of rounds is logged as expected. Signed-off-by: Josh Steadmon <steadmon@google.com> Acked-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-05Merge branch 'jt/fetch-pack-trace2-filter-spec'Junio C Hamano1-14/+26
"git fetch" client logs the partial clone filter used in the trace2 output. * jt/fetch-pack-trace2-filter-spec: fetch-pack: write effective filter to trace2
2022-08-03Merge branch 'rs/mergesort'Junio C Hamano1-0/+8
Make our mergesort implementation type-safe. * rs/mergesort: mergesort: remove llist_mergesort() packfile: use DEFINE_LIST_SORT fetch-pack: use DEFINE_LIST_SORT commit: use DEFINE_LIST_SORT blame: use DEFINE_LIST_SORT test-mergesort: use DEFINE_LIST_SORT test-mergesort: use DEFINE_LIST_SORT_DEBUG mergesort: add macros for typed sort of linked lists mergesort: tighten merge loop mergesort: unify ranks loops
2022-07-26fetch-pack: write effective filter to trace2Jonathan Tan1-14/+26
Administrators of a managed Git environment (like the one at $DAYJOB) might want to quantify the performance change of fetches with and without filters from the client's point of view, and also detect if a server does not support it. Therefore, log the filter information being sent to the server whenever a fetch (or clone) occurs. Note that this is not necessarily the same as what's specified on the CLI, because during a fetch, the configured filter is used whenever a filter is not specified on the CLI. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-17fetch-pack: use DEFINE_LIST_SORTRené Scharfe1-0/+8
Build a static typed ref sorting function using DEFINE_LIST_SORT along with a typed comparison function near its only two callers instead of having an exported version that calls llist_mergesort(). This gets rid of the next pointer accessor functions and their calling overhead at the cost of a slightly increased object text size. Before: __TEXT __DATA __OBJC others dec hex 23231 389 0 113689 137309 2185d fetch-pack.o 29158 80 0 146864 176102 2afe6 remote.o With this patch: __TEXT __DATA __OBJC others dec hex 23591 389 0 117759 141739 229ab fetch-pack.o 29070 80 0 145718 174868 2ab14 remote.o Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-03Merge branch 'ds/bundle-uri'Junio C Hamano1-15/+30
Preliminary code refactoring around transport and bundle code. * ds/bundle-uri: bundle.h: make "fd" version of read_bundle_header() public remote: allow relative_url() to return an absolute url remote: move relative_url() http: make http_get_file() external fetch-pack: move --keep=* option filling to a function fetch-pack: add a deref_without_lazy_fetch_extended() dir API: add a generalized path_match_flags() function connect.c: refactor sending of agent & object-format
2022-05-25Merge branch 'jt/fetch-peek-optional-section'Junio C Hamano1-8/+11
"git fetch" unnecessarily failed when an unexpected optional section appeared in the output, which has been corrected. * jt/fetch-peek-optional-section: fetch-pack: make unexpected peek result non-fatal
2022-05-16fetch-pack: move --keep=* option filling to a functionÆvar Arnfjörð Bjarmason1-8/+12
Move the populating of the --keep=* option argument to "index-pack" to a static function, a subsequent commit will make use of it in another function. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-16fetch-pack: add a deref_without_lazy_fetch_extended()Ævar Arnfjörð Bjarmason1-7/+18
Add a version of the deref_without_lazy_fetch function which can be called with custom oi_flags and to grab information about the "object_type". This will be used for the bundle-uri client in a subsequent commit. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-16fetch-pack: make unexpected peek result non-fatalJonathan Tan1-8/+11
When a Git server responds to a fetch request, it may send optional sections before the packfile section. To handle this, the Git client calls packet_reader_peek() (see process_section_header()) in order to see what's next without consuming the line. However, as implemented, Git errors out whenever what's peeked is not an ordinary line. This is not only unexpected (here, we only need to know whether the upcoming line is the section header we want) but causes errors to include the name of a section header that is irrelevant to the cause of the error. For example, at $DAYJOB, we have seen "fatal: error reading section header 'shallow-info'" error messages when none of the repositories involved are shallow. Therefore, fix this so that the peek returns 1 if the upcoming line is the wanted section header and nothing else. Because of this change, reader->line may now be NULL later in the function, so update the error message printing code accordingly. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-28fetch-pack: add refetchRobert Coup1-17/+29
Allow a "refetch" where the contents of the local object store are ignored and a full fetch is performed, not attempting to find or negotiate common commits with the remote. A key use case is to apply a new partial clone blob/tree filter and refetch all the associated matching content, which would otherwise not be transferred when the commit objects are already present locally. Signed-off-by: Robert Coup <robert@coup.net.nz> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-23Merge branch 'ps/fetch-optim-with-commit-graph'Junio C Hamano1-12/+16
A couple of optimization to "git fetch". * ps/fetch-optim-with-commit-graph: fetch: skip computing output width when not printing anything fetch-pack: use commit-graph when computing cutoff
2022-02-23Merge branch 'bs/forbid-i18n-of-protocol-token-in-fetch-pack'Junio C Hamano1-2/+10
L10n support for a few error messages. * bs/forbid-i18n-of-protocol-token-in-fetch-pack: fetch-pack: parameterize message containing 'ready' keyword
2022-02-11fetch-pack: parameterize message containing 'ready' keywordBagas Sanjaya1-2/+10
The protocol keyword 'ready' isn't meant for translation. Pass it as parameter instead of spell it in die() message (and potentially confuse translators). Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-10fetch-pack: use commit-graph when computing cutoffPatrick Steinhardt1-12/+16
During packfile negotiation we iterate over all refs announced by the remote side to check whether their IDs refer to commits already known to us. If a commit is known to us already, then its date is a potential cutoff point for commits we have in common with the remote side. There is potentially a lot of commits announced by the remote depending on how many refs there are in the remote repository, and for every one of them we need to search for it in our object database and, if found, parse the corresponding object to find out whether it is a candidate for the cutoff date. This can be sped up by trying to look up commits via the commit-graph first, which is a lot more efficient. Benchmarks in a repository with about 2,1 million refs and an up-to-date commit-graph show an almost 20% speedup when mirror-fetching: Benchmark 1: git fetch +refs/*:refs/* (v2.35.0) Time (mean ± σ): 115.587 s ± 2.009 s [User: 109.874 s, System: 11.305 s] Range (min … max): 113.584 s … 118.820 s 5 runs Benchmark 2: git fetch +refs/*:refs/* (HEAD) Time (mean ± σ): 96.859 s ± 0.624 s [User: 91.948 s, System: 10.980 s] Range (min … max): 96.180 s … 97.875 s 5 runs Summary 'git fetch +refs/*:refs/* (HEAD)' ran 1.19 ± 0.02 times faster than 'git fetch +refs/*:refs/* (v2.35.0)' Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-05i18n: factorize "--foo requires --bar" and the likeJean-Noël Avila1-1/+1
They are all replaced by "the option '%s' requires '%s'", which is a new string but replaces 17 previous unique strings. Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Reviewed-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-10Merge branch 'jk/fetch-pack-avoid-sigpipe-to-index-pack'Junio C Hamano1-0/+5
"git fetch", when received a bad packfile, can fail with SIGPIPE. This wasn't wrong per-se, but we now detect the situation and fail in a more predictable way. * jk/fetch-pack-avoid-sigpipe-to-index-pack: fetch-pack: ignore SIGPIPE when writing to index-pack
2021-11-19fetch-pack: ignore SIGPIPE when writing to index-packJeff King1-0/+5
When fetching, we send the incoming pack to index-pack (or unpack-objects) via the sideband demuxer. If index-pack hits an error (e.g., because an object fails fsck), then it will die immediately. This may cause us to get SIGPIPE on the fetch, as we're still trying to write pack contents from the sideband demuxer (which is typically a thread, and thus takes down the whole fetch process). You can see this in action with: ./t5702-protocol-v2.sh --stress --run=59 which ends with (wrapped for readability): test_must_fail: died by signal 13: git -c protocol.version=2 \ -c transfer.fsckobjects=1 -c fetch.uriprotocols=http,https \ clone http://127.0.0.1:5708/smart/http_parent http_child not ok 59 - packfile-uri with transfer.fsckobjects fails on bad object This is mostly cosmetic. The actual error of interest (in this case, the object that failed the fsck check) comes from index-pack straight to stderr, so the user still sees it. They _might_ even see fetch-pack complaining about index-pack failing, because the main thread is racing with the sideband-demuxer. But they'll definitely see the signal death in the exit code, which is what the test is complaining about. We can make this more predictable by just ignoring SIGPIPE. The sideband demuxer uses write_or_die(), so it will notice and stop (gracefully, because we hook die_routine() to exit just the thread). And during this section we're not writing anywhere else where we'd be concerned about SIGPIPE preventing us from wasting effort writing to nowhere. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-11fetch-pack: redact packfile urls in tracesIvan Frade1-0/+5
In some setups, packfile uris act as bearer token. It is not recommended to expose them plainly in logs, although in special circunstances (e.g. debug) it makes sense to write them. Redact the packfile URL paths by default, unless the GIT_TRACE_REDACT variable is set to false. This mimics the redacting of the Authorization header in HTTP. Signed-off-by: Ivan Frade <ifrade@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01fetch-pack: optimize loading of refs via commit graphPatrick Steinhardt1-0/+5
In order to negotiate a packfile, we need to dereference refs to see which commits we have in common with the remote. To do so, we first look up the object's type -- if it's a tag, we peel until we hit a non-tag object. If we hit a commit eventually, then we return that commit. In case the object ID points to a commit directly, we can avoid the initial lookup of the object type by opportunistically looking up the commit via the commit-graph, if available, which gives us a slight speed bump of about 2% in a huge repository with about 2.3M refs: Benchmark #1: HEAD~: git-fetch Time (mean ± σ): 31.634 s ± 0.258 s [User: 28.400 s, System: 5.090 s] Range (min … max): 31.280 s … 31.896 s 5 runs Benchmark #2: HEAD: git-fetch Time (mean ± σ): 31.129 s ± 0.543 s [User: 27.976 s, System: 5.056 s] Range (min … max): 30.172 s … 31.479 s 5 runs Summary 'HEAD: git-fetch' ran 1.02 ± 0.02 times faster than 'HEAD~: git-fetch' In case this fails, we fall back to the old code which peels the objects to a commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01connected: refactor iterator to return next object ID directlyPatrick Steinhardt1-4/+3
The object ID iterator used by the connectivity checks returns the next object ID via an out-parameter and then uses a return code to indicate whether an item was found. This is a bit roundabout: instead of a separate error code, we can just return the next object ID directly and use `NULL` pointers as indicator that the iterator got no items left. Furthermore, this avoids a copy of the object ID. Refactor the iterator and all its implementations to return object IDs directly. This brings a tiny performance improvement when doing a mirror-fetch of a repository with about 2.3M refs: Benchmark #1: 328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch Time (mean ± σ): 30.110 s ± 0.148 s [User: 27.161 s, System: 5.075 s] Range (min … max): 29.934 s … 30.406 s 10 runs Benchmark #2: 328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch Time (mean ± σ): 29.899 s ± 0.109 s [User: 26.916 s, System: 5.104 s] Range (min … max): 29.696 s … 29.996 s 10 runs Summary '328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch' ran 1.01 ± 0.01 times faster than '328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch' While this 1% speedup could be labelled as statistically insignificant, the speedup is consistent on my machine. Furthermore, this is an end to end test, so it is expected that the improvement in the connectivity check itself is more significant. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-24Merge branch 'ps/fetch-pack-load-refs-optim'Junio C Hamano1-2/+8
Loading of ref tips to prepare for common ancestry negotiation in "git fetch-pack" has been optimized by taking advantage of the commit graph when available. * ps/fetch-pack-load-refs-optim: fetch-pack: speed up loading of refs via commit graph
2021-08-04fetch-pack: speed up loading of refs via commit graphPatrick Steinhardt1-2/+8
When doing reference negotiation, git-fetch-pack(1) is loading all refs from disk in order to determine which commits it has in common with the remote repository. This can be quite expensive in repositories with many references though: in a real-world repository with around 2.2 million refs, fetching a single commit by its ID takes around 44 seconds. Dominating the loading time is decompression and parsing of the objects which are referenced by commits. Given the fact that we only care about commits (or tags which can be peeled to one) in this context, there is thus an easy performance win by switching the parsing logic to make use of the commit graph in case we have one available. Like this, we avoid hitting the object database to parse these commits but instead only load them from the commit-graph. This results in a significant performance boost when executing git-fetch in said repository with 2.2 million refs: Benchmark #1: HEAD~: git fetch $remote $commit Time (mean ± σ): 44.168 s ± 0.341 s [User: 42.985 s, System: 1.106 s] Range (min … max): 43.565 s … 44.577 s 10 runs Benchmark #2: HEAD: git fetch $remote $commit Time (mean ± σ): 19.498 s ± 0.724 s [User: 18.751 s, System: 0.690 s] Range (min … max): 18.629 s … 20.454 s 10 runs Summary 'HEAD: git fetch $remote $commit' ran 2.27 ± 0.09 times faster than 'HEAD~: git fetch $remote $commit' Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20fetch-pack: signal v2 server that we are done making requestsJeff King1-0/+9
When fetching with the v0 protocol over ssh (or a local upload-pack with pipes), the server closes the connection as soon as it is finished sending the pack. So even though the client may still be operating on the data via index-pack (e.g., resolving deltas, checking connectivity, etc), the server has released all resources. With the v2 protocol, however, the server considers the ssh session only as a transport, with individual requests coming over it. After sending the pack, it goes back to its main loop, waiting for another request to come from the client. As a result, the ssh session hangs around until the client process ends, which may be much later (because resolving deltas, etc, may consume a lot of CPU). This is bad for two reasons: - it's consuming resources on the server to leave open a connection that won't see any more use - if something bad happens to the ssh connection in the meantime (say, it gets killed by the network because it's idle, as happened in a real-world report), then ssh will exit non-zero, and we'll propagate the error up the stack. The server is correct here not to hang up after serving the pack. The v2 protocol's design is meant to allow multiple requests like this, and hanging up would be the wrong thing for a hypothetical client which was planning to make more requests (though in practice, the git.git client never would, and I doubt any other implementations would either). The right thing is instead for the client to signal to the server that it's not interested in making more requests. We can do that by closing the pipe descriptor we use to write to ssh. This will propagate to the server upload-pack as an EOF when it tries to read the next request (and then it will close its half, and the whole connection will go away). It's important to do this "half duplex" shutdown, because we have to do it _before_ we actually receive the pack. This is an artifact of the way fetch-pack and index-pack (or unpack-objects) interact. We hand the connection off to index-pack (really, a sideband demuxer which feeds it), and then wait until it returns. And it doesn't do that until it has resolved all of the deltas in the pack, even though it was done reading from the server long before. So just closing the connection fully after index-pack returns would be too late; we'd have held it open much longer than was necessary. And teaching index-pack to close the connection is awkward. It's not even seeing the whole conversation (the sideband demuxer is, but it doesn't actually know what's in the packets, or when the end comes). Note that this close() is happening deep within the transport code. It's possible that a caller would want to perform other operations over the same ssh transport after receiving the pack. But as of the current code, none of the callers do, and there haven't been discussions of any plans to change this. If we need to support that later, we can probably do so by passing down a flag for "you're the last request on the transport; it's OK to close" instead of the code just assuming that's true. The description above all discusses v2 ssh, so it's worth thinking about how this interacts with other protocols: - in v0 protocols, we could do the same half-duplex shutdown (it just goes into the v0 do_fetch_pack() instead). This does work, but since it doesn't have the same persistence problem in the first place, there's little reason to change it at this point. - local fetches against git-upload-pack on the same machine will behave the same as ssh (they are talking over two pipes, and see EOF on their input pipe) - fetches against git-daemon will run this same code, and close one of the descriptors. In practice, this won't do anything, since there our two descriptors are dups of each other, and not part of a half-duplex pair. The right thing would probably be to call shutdown(SHUT_WR) on it. I didn't bother with that here. It doesn't face the same error-code problem (since it's just a TCP connection), so it's really only an optimization problem. And git:// is not that widely used these days, and has less impact on server resources than an ssh termination. - v2 http doesn't suffer from this problem in the first place, as our pipes terminate at a local git-remote-https, which is passing data along as individual requests via curl. Probably curl is keeping the TCP/TLS connection open for more requests, and we might be able to tell it manually "hey, we are done making requests now". But I think that's much less important. It again doesn't suffer from the error-code problem, and HTTP keepalive is pretty well understood (importantly, the timeouts can be set low, because clients like curl know how to reconnect for subsequent requests if necessary). So it's probably not worth figuring out how to tell curl that we're done (though if we do, this patch is probably the first step anyway; fetch-pack closes the pipe back to remote-https, which would be the signal that it should tell curl we're done). The code is pretty straightforward. We close the pipe at the right moment, and set it to -1 to mark it as invalid. I modified the later cleanup code to avoid calling close(-1). That's not strictly necessary, since close(-1) is a noop, but hopefully makes things a bit more obvious to a reader. I suspect that trying to call more transport functions after the close() (e.g., calling transport_fetch_refs() again) would fail, as it's not smart enough to realize we need to re-open the ssh connection. But that's already true when v0 is in use. And no current callers want to do that (and again, the solution is probably a flag in the transport code to keep things open, which can be added later). There's no test here, as the situation it covers is inherently racy (the question is when upload-pack exits, compared to when index-pack finishes resolving deltas and exits). The rather gross shell snippet below does recreate the problematic situation; when run on a sufficiently-large repository (git.git works fine), it kills an "idle" upload-pack while the client is resolving deltas, leading to a failed clone. ( git clone --no-local --progress . foo.git 2>&1 echo >&2 "clone exit code=$?" ) | tr '\r' '\n' | while read line do case "$done,$line" in ,Resolving*) echo "hit resolving deltas; killing upload-pack" killall -9 git-upload-pack done=t ;; esac done Reported-by: Greg Pflaum <greg.pflaum@pnp-hcl.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-05fetch: teach independent negotiation (no packfile)Jonathan Tan1-4/+107
Currently, the packfile negotiation step within a Git fetch cannot be done independent of sending the packfile, even though there is at least one application wherein this is useful. Therefore, make it possible for this negotiation step to be done independently. A subsequent commit will use this for one such application - push negotiation. This feature is for protocol v2 only. (An implementation for protocol v0 would require a separate implementation in the fetch, transport, and transport helper code.) In the protocol, the main hindrance towards independent negotiation is that the server can unilaterally decide to send the packfile. This is solved by a "wait-for-done" argument: the server will then wait for the client to say "done". In practice, the client will never say it; instead it will cease requests once it is satisfied. In the client, the main change lies in the transport and transport helper code. fetch_refs_via_pack() performs everything needed - protocol version and capability checks, and the negotiation itself. There are 2 code paths that do not go through fetch_refs_via_pack() that needed to be individually excluded: the bundle transport (excluded through requiring smart_options, which the bundle transport doesn't support) and transport helpers that do not support takeover. If or when we support independent negotiation for protocol v0, we will need to modify these 2 code paths to support it. But for now, report failure if independent negotiation is requested in these cases. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-08fetch-pack: refactor command and capability writeJonathan Tan1-17/+24
A subsequent commit will need this functionality independent of the rest of send_fetch_request(), so put this into its own function. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-08fetch-pack: refactor add_haves()Jonathan Tan1-16/+12
A subsequent commit will need part, but not all, of the functionality in add_haves(), so move some of its functionality to its sole caller send_fetch_request(). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-08fetch-pack: refactor process_acks()Jonathan Tan1-48/+22
A subsequent commit will need part, but not all, of the functionality in process_acks(), so move some of its functionality to its sole caller do_fetch_pack_v2(). As a side effect, the resulting code is also shorter. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-08Merge branch 'jt/fetch-pack-request-fix' into jt/push-negotiationJunio C Hamano1-1/+1
* jt/fetch-pack-request-fix: fetch-pack: buffer object-format with other args
2021-04-08fetch-pack: buffer object-format with other argsJonathan Tan1-1/+1
In send_fetch_request(), "object-format" is written directly to the file descriptor, as opposed to the other arguments, which are buffered. Buffer "object-format" as well. "object-format" must be buffered; in particular, it must appear after "command=fetch" in the request. This divergence was introduced in 4b831208bb ("fetch-pack: parse and advertise the object-format capability", 2020-05-27), perhaps as an oversight (the surrounding code at the point of this commit has already been using a request buffer.) Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-08Merge branch 'll/clone-reject-shallow'Junio C Hamano1-4/+8
"git clone --reject-shallow" option fails the clone as soon as we notice that we are cloning from a shallow repository. * ll/clone-reject-shallow: builtin/clone.c: add --reject-shallow option
2021-04-07Merge branch 'ab/fsck-api-cleanup'Junio C Hamano1-23/+8
Fsck API clean-up. * ab/fsck-api-cleanup: fetch-pack: use new fsck API to printing dangling submodules fetch-pack: use file-scope static struct for fsck_options fetch-pack: don't needlessly copy fsck_options fsck.c: move gitmodules_{found,done} into fsck_options fsck.c: add an fsck_set_msg_type() API that takes enums fsck.c: pass along the fsck_msg_id in the fsck_error callback fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h fsck.c: give "FOREACH_MSG_ID" a more specific name fsck.c: undefine temporary STR macro after use fsck.c: call parse_msg_type() early in fsck_set_msg_type() fsck.h: re-order and re-assign "enum fsck_msg_type" fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type" fsck.c: rename remaining fsck_msg_id "id" to "msg_id" fsck.c: remove (mostly) redundant append_msg_id() function fsck.c: rename variables in fsck_set_msg_type() for less confusion fsck.h: use "enum object_type" instead of "int" fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT} fsck.c: refactor and rename common config callback
2021-04-01builtin/clone.c: add --reject-shallow optionLi Linchao1-4/+8
In some scenarios, users may want more history than the repository offered for cloning, which happens to be a shallow repository, can give them. But because users don't know it is a shallow repository until they download it to local, we may want to refuse to clone this kind of repository, without creating any unnecessary files. The '--depth=x' option cannot be used as a solution; the source may be deep enough to give us 'x' commits when cloned, but the user may later need to deepen the history to arbitrary depth. Teach '--reject-shallow' option to "git clone" to abort as soon as we find out that we are cloning from a shallow repository. Signed-off-by: Li Linchao <lilinchao@oschina.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-28fetch-pack: use new fsck API to printing dangling submodulesÆvar Arnfjörð Bjarmason1-23/+8
Refactor the check added in 5476e1efde (fetch-pack: print and use dangling .gitmodules, 2021-02-22) to make use of us now passing the "msg_id" to the user defined "error_func". We can now compare against the FSCK_MSG_GITMODULES_MISSING instead of parsing the generated message. Let's also replace register_found_gitmodules() with directly manipulating the "gitmodules_found" member. A recent commit moved it into "fsck_options" so we could do this here. I'm sticking this callback in fsck.c. Perhaps in the future we'd like to accumulate such callbacks into another file (maybe fsck-cb.c, similar to parse-options-cb.c?), but while we've got just the one let's just put it into fsck.c. A better alternative in this case would be some library some more obvious library shared by fetch-pack.c ad builtin/index-pack.c, but there isn't such a thing. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-28fetch-pack: use file-scope static struct for fsck_optionsÆvar Arnfjörð Bjarmason1-3/+3
Change code added in 5476e1efde (fetch-pack: print and use dangling .gitmodules, 2021-02-22) so that we use a file-scoped "static struct fsck_options" instead of defining one in the "fsck_gitmodules_oids()" function. We use this pattern in all of builtin/{fsck,index-pack,mktag,unpack-objects}.c. It's odd to see fetch-pack be the odd one out. One might think that we're using other fsck_options structs in fetch-pack, or doing on fsck twice there, but we're not. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-28fsck.c: move gitmodules_{found,done} into fsck_optionsÆvar Arnfjörð Bjarmason1-1/+1
Move the gitmodules_{found,done} static variables added in 159e7b080bf (fsck: detect gitmodules files, 2018-05-02) into the fsck_options struct. It makes sense to keep all the context in the same place. This requires changing the recently added register_found_gitmodules() function added in 5476e1efde (fetch-pack: print and use dangling .gitmodules, 2021-02-22) to take fsck_options. That function will be removed in a subsequent commit, but as it'll require the new gitmodules_found attribute of "fsck_options" we need this intermediate step first. An earlier version of this patch removed the small amount of duplication we now have between FSCK_OPTIONS_{DEFAULT,STRICT} with a FSCK_OPTIONS_COMMON macro. I don't think such de-duplication is worth it for this amount of copy/pasting. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-13use CALLOC_ARRAYRené Scharfe1-1/+1
Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-08Merge branch 'jt/transfer-fsck-across-packs-fix'Junio C Hamano1-2/+2
The code to fsck objects received across multiple packs during a single git fetch session has been broken when the packfile URI feature was in use. A workaround has been added by disabling the codepath to avoid keeping a packfile that is too small. * jt/transfer-fsck-across-packs-fix: fetch-pack: do not mix --pack_header and packfile uri
2021-03-05fetch-pack: do not mix --pack_header and packfile uriJonathan Tan1-2/+2
When fetching (as opposed to cloning) from a repository with packfile URIs enabled, an error like this may occur: fatal: pack has bad object at offset 12: unknown object type 5 fatal: finish_http_pack_request gave result -1 fatal: fetch-pack: expected keep then TAB at start of http-fetch output This bug was introduced in b664e9ffa1 ("fetch-pack: with packfile URIs, use index-pack arg", 2021-02-22), when the index-pack args used when processing the inline packfile of a fetch response and when processing packfile URIs were unified. This bug happens because fetch, by default, partially reads (and consumes) the header of the inline packfile to determine if it should store the downloaded objects as a packfile or loose objects, and thus passes --pack_header=<...> to index-pack to inform it that some bytes are missing. However, when it subsequently fetches the additional packfiles linked by URIs, it reuses the same index-pack arguments, thus wrongly passing --index-pack-arg=--pack_header=<...> when no bytes are missing. This does not happen when cloning because "git clone" always passes do_keep, which instructs the fetch mechanism to always retain the packfile, eliminating the need to read the header. There are a few ways to fix this, including filtering out pack_header arguments when downloading the additional packfiles, but I decided to stick to always using index-pack throughout when packfile URIs are present - thus, Git no longer needs to read the bytes, and no longer needs --pack_header here. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-01Merge branch 'jt/transfer-fsck-across-packs'Junio C Hamano1-17/+86
The approach to "fsck" the incoming objects in "index-pack" is attractive for performance reasons (we have them already in core, inflated and ready to be inspected), but fundamentally cannot be applied fully when we receive more than one pack stream, as a tree object in one pack may refer to a blob object in another pack as ".gitmodules", when we want to inspect blobs that are used as ".gitmodules" file, for example. Teach "index-pack" to emit objects that must be inspected later and check them in the calling "fetch-pack" process. * jt/transfer-fsck-across-packs: fetch-pack: print and use dangling .gitmodules fetch-pack: with packfile URIs, use index-pack arg http-fetch: allow custom index-pack args http: allow custom index-pack args
2021-02-22fetch-pack: print and use dangling .gitmodulesJonathan Tan1-12/+66
Teach index-pack to print dangling .gitmodules links after its "keep" or "pack" line instead of declaring an error, and teach fetch-pack to check such lines printed. This allows the tree side of the .gitmodules link to be in one packfile and the blob side to be in another without failing the fsck check, because it is now fetch-pack which checks such objects after all packfiles have been downloaded and indexed (and not index-pack on an individual packfile, as it is before this commit). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-22fetch-pack: with packfile URIs, use index-pack argJonathan Tan1-11/+23
Unify the index-pack arguments used when processing the inline pack and when downloading packfiles referenced by URIs. This is done by teaching get_pack() to also store the index-pack arguments whenever at least one packfile URI is given, and then when processing the packfile URI(s), using the stored arguments. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-22http-fetch: allow custom index-pack argsJonathan Tan1-0/+3
This is the next step in teaching fetch-pack to pass its index-pack arguments when processing packfiles referenced by URIs. The "--keep" in fetch-pack.c will be replaced with a full message in a subsequent commit. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-12fetch-pack: refactor writing promisor fileChristian Couder1-7/+1
Let's replace the 2 different pieces of code that write a promisor file in 'builtin/repack.c' and 'fetch-pack.c' with a new function called 'write_promisor_file()' in 'pack-write.c' and 'pack.h'. This might also help us in the future, if we want to put back the ref names and associated hashes that were in the promisor files we are repacking in 'builtin/repack.c' as suggested by a NEEDSWORK comment just above the code we are refactoring. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-12fetch-pack: rename helper to create_promisor_file()Christian Couder1-4/+4
As we are going to refactor the code that actually writes the promisor file into a separate function in a following commit, let's rename the current write_promisor_file() function to create_promisor_file(). Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08Merge branch 'rs/fetch-pack-invalid-lockfile'Junio C Hamano1-2/+3
"fetch-pack" could pass NULL pointer to unlink(2) when it sees an invalid filename; the error checking has been tightened to make this impossible. * rs/fetch-pack-invalid-lockfile: fetch-pack: disregard invalid pack lockfiles
2020-11-30fetch-pack: disregard invalid pack lockfilesRené Scharfe1-2/+3
9da69a6539 (fetch-pack: support more than one pack lockfile, 2020-06-10) started to use a string_list for pack lockfile names instead of a single string pointer. It removed a NULL check from transport_unlock_pack() as well, which is the function that eventually deletes these lockfiles and releases their name strings. index_pack_lockfile() can return NULL if it doesn't like the contents it reads from the file descriptor passed to it. unlink(2) is declared to not accept NULL pointers (at least with glibc). Undefined Behavior Sanitizer together with Address Sanitizer detects a case where a NULL lockfile name is passed to unlink(2) by transport_unlock_pack() in t1060 (make SANITIZE=address,undefined; cd t; ./t1060-object-corruption.sh). Reinstate the NULL check to avoid undefined behavior, but put it right at the source, so that the number of items in the string_list reflects the number of valid lockfiles. Signed-off-by: René Scharfe <l.s.r@web.de> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-11fetch-pack: advertise session ID in capabilitiesJosh Steadmon1-0/+9
When the server sent a session-id capability and transfer.advertiseSID is true, advertise fetch-pack's own session ID back to the server. Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-03Merge branch 'jt/lazy-fetch'Junio C Hamano1-115/+74
Updates to on-demand fetching code in lazily cloned repositories. * jt/lazy-fetch: fetch: no FETCH_HEAD display if --no-write-fetch-head fetch-pack: remove no_dependents code promisor-remote: lazy-fetch objects in subprocess fetch-pack: do not lazy-fetch during ref iteration fetch: only populate existing_refs if needed fetch: avoid reading submodule config until needed fetch: allow refspecs specified through stdin negotiator/noop: add noop fetch negotiator
2020-09-03Merge branch 'jt/fetch-pack-loosen-validation-with-packfile-uri'Junio C Hamano1-1/+5
Bugfix for "git fetch" when the packfile URI capability is in use. * jt/fetch-pack-loosen-validation-with-packfile-uri: fetch-pack: make packfile URIs work with transfer.fsckobjects fetch-pack: document only_packfile in get_pack() (various): document from_promisor parameter
2020-08-24fetch-pack: make packfile URIs work with transfer.fsckobjectsJonathan Tan1-1/+1
When fetching with packfile URIs and transfer.fsckobjects=1, use the --fsck-objects instead of the --strict flag when invoking index-pack so that links are not checked, only objects. This is because incomplete links are expected. (A subsequent connectivity check will be done when all the packs have been downloaded regardless of whether transfer.fsckobjects is set.) This is similar to 98a2ea46c2 ("fetch-pack: do not check links for partial fetch", 2018-03-15), but for packfile URIs instead of partial clones. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-24fetch-pack: document only_packfile in get_pack()Jonathan Tan1-0/+4
dd4b732df7 ("upload-pack: send part of packfile response as uri", 2020-06-10) added the "only_packfile" parameter to get_pack() but did not document it. Add documentation. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-20fetch-pack: in partial clone, pass --promisorJonathan Tan1-7/+10
When fetching a pack from a promisor remote, the corresponding .promisor file needs to be created. "fetch-pack" originally did this by passing "--promisor" to "index-pack", but in 5374a290aa ("fetch-pack: write fetched refs to .promisor", 2019-10-16), "fetch-pack" was taught to do this itself instead, because it needed to store ref information in the .promisor file. This causes a problem with superprojects when transfer.fsckobjects is set, because in the current implementation, it is "index-pack" that calls fsck_finish() to check the objects; before 5374a290aa, fsck_finish() would see that .gitmodules is a promisor object and tolerate it being missing, but after, there is no .promisor file (at the time of the invocation of fsck_finish() by "index-pack") to tell it that .gitmodules is a promisor object, so it returns an error. Therefore, teach "fetch-pack" to pass "--promisor" to index pack once again. "fetch-pack" will subsequently overwrite this file with the ref information. An alternative is to instead move object checking to "fetch-pack", and let "index-pack" only index the files. However, since "index-pack" has to inflate objects in order to index them, it seems reasonable to also let it check the objects (which also require inflated files). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-18fetch-pack: remove no_dependents codeJonathan Tan1-80/+30
Now that Git has switched to using a subprocess to lazy-fetch missing objects, remove the no_dependents code as it is no longer used. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-18fetch-pack: do not lazy-fetch during ref iterationJonathan Tan1-35/+44
In order to determine negotiation tips, "fetch-pack" iterates over all refs and dereferences all annotated tags found. This causes the existence of targets of refs and annotated tags to be checked. Avoiding this is especially important when we use "git fetch" (which invokes "fetch-pack") to perform lazy fetches in a partial clone because a target of such a ref or annotated tag may need to be itself lazy-fetched (and otherwise causing an infinite loop). Therefore, teach "fetch-pack" not to lazy fetch whenever iterating over refs. This is done by using the raw form of ref iteration and by dereferencing tags ourselves. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28strvec: fix indentation in renamed callsJeff King1-6/+6
Code which split an argv_array call across multiple lines, like: argv_array_pushl(&args, "one argument", "another argument", "and more", NULL); was recently mechanically renamed to use strvec, which results in mis-matched indentation like: strvec_pushl(&args, "one argument", "another argument", "and more", NULL); Let's fix these up to align the arguments with the opening paren. I did this manually by sifting through the results of: git jump grep 'strvec_.*,$' and liberally applying my editor's auto-format. Most of the changes are of the form shown above, though I also normalized a few that had originally used a single-tab indentation (rather than our usual style of aligning with the open paren). I also rewrapped a couple of obvious cases (e.g., where previously too-long lines became short enough to fit on one), but I wasn't aggressive about it. In cases broken to three or more lines, the grouping of arguments is sometimes meaningful, and it wasn't worth my time or reviewer time to ponder each case individually. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28strvec: convert more callers away from argv_array nameJeff King1-17/+17
We eventually want to drop the argv_array name and just use strvec consistently. There's no particular reason we have to do it all at once, or care about interactions between converted and unconverted bits. Because of our preprocessor compat layer, the names are interchangeable to the compiler (so even a definition and declaration using different names is OK). This patch converts remaining files from the first half of the alphabet, to keep the diff to a manageable size. The conversion was done purely mechanically with: git ls-files '*.c' '*.h' | xargs perl -i -pe ' s/ARGV_ARRAY/STRVEC/g; s/argv_array/strvec/g; ' and then selectively staging files with "git add '[abcdefghjkl]*'". We'll deal with any indentation/style fallouts separately. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-06Merge branch 'bc/sha-256-part-2'Junio C Hamano1-0/+14
SHA-256 migration work continues. * bc/sha-256-part-2: (44 commits) remote-testgit: adapt for object-format bundle: detect hash algorithm when reading refs t5300: pass --object-format to git index-pack t5704: send object-format capability with SHA-256 t5703: use object-format serve option t5702: offer an object-format capability in the test t/helper: initialize the repository for test-sha1-array remote-curl: avoid truncating refs with ls-remote t1050: pass algorithm to index-pack when outside repo builtin/index-pack: add option to specify hash algorithm remote-curl: detect algorithm for dumb HTTP by size builtin/ls-remote: initialize repository based on fetch t5500: make hash independent serve: advertise object-format capability for protocol v2 connect: parse v2 refs with correct hash algorithm connect: pass full packet reader when parsing v2 refs Documentation/technical: document object-format for protocol v2 t1302: expect repo format version 1 for SHA-256 builtin/show-index: provide options to determine hash algo t5302: modernize test formatting ...
2020-06-25Merge branch 'jt/cdn-offload'Junio C Hamano1-16/+121
The "fetch/clone" protocol has been updated to allow the server to instruct the clients to grab pre-packaged packfile(s) in addition to the packed object data coming over the wire. * jt/cdn-offload: upload-pack: fix a sparse '0 as NULL pointer' warning upload-pack: send part of packfile response as uri fetch-pack: support more than one pack lockfile upload-pack: refactor reading of pack-objects out Documentation: add Packfile URIs design doc Documentation: order protocol v2 sections http-fetch: support fetching packfiles by URL http-fetch: refactor into function http: refactor finish_http_pack_request() http: use --stdin when indexing dumb HTTP pack
2020-06-10upload-pack: send part of packfile response as uriJonathan Tan1-4/+108
Teach upload-pack to send part of its packfile response as URIs. An administrator may configure a repository with one or more "uploadpack.blobpackfileuri" lines, each line containing an OID, a pack hash, and a URI. A client may configure fetch.uriprotocols to be a comma-separated list of protocols that it is willing to use to fetch additional packfiles - this list will be sent to the server. Whenever an object with one of those OIDs would appear in the packfile transmitted by upload-pack, the server may exclude that object, and instead send the URI. The client will then download the packs referred to by those URIs before performing the connectivity check. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-10fetch-pack: support more than one pack lockfileJonathan Tan1-14/+15
Whenever a fetch results in a packfile being downloaded, a .keep file is generated, so that the packfile can be preserved (from, say, a running "git repack") until refs are written referring to the contents of the packfile. In a subsequent patch, a successful fetch using protocol v2 may result in more than one .keep file being generated. Therefore, teach fetch_pack() and the transport mechanism to support multiple .keep files. Implementation notes: - builtin/fetch-pack.c normally does not generate .keep files, and thus is unaffected by this or future changes. However, it has an undocumented "--lock-pack" feature, used by remote-curl.c when implementing the "fetch" remote helper command. In keeping with the remote helper protocol, only one "lock" line will ever be written; the rest will result in warnings to stderr. However, in practice, warnings will never be written because the remote-curl.c "fetch" is only used for protocol v0/v1 (which will not generate multiple .keep files). (Protocol v2 uses the "stateless-connect" command, not the "fetch" command.) - connected.c has an optimization in that connectivity checks on a ref need not be done if the target object is in a pack known to be self-contained and connected. If there are multiple packfiles, this optimization can no longer be done. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-27fetch-pack: parse and advertise the object-format capabilitybrian m. carlson1-0/+12
Parse the server's object-format capability and respond accordingly, dying if there is a mismatch. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-27fetch-pack: detect when the server doesn't support our hashbrian m. carlson1-0/+2
Detect when the server doesn't support our hash algorithm and abort. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-24stateless-connect: send response end packetDenton Liu1-0/+13
Currently, remote-curl acts as a proxy and blindly forwards packets between an HTTP server and fetch-pack. In the case of a stateless RPC connection where the connection is terminated before the transaction is complete, remote-curl will blindly forward the packets before waiting on more input from fetch-pack. Meanwhile, fetch-pack will read the transaction and continue reading, expecting more input to continue the transaction. This results in a deadlock between the two processes. This can be seen in the following command which does not terminate: $ git -c protocol.version=2 clone https://github.com/git/git.git --shallow-since=20151012 Cloning into 'git'... whereas the v1 version does terminate as expected: $ git -c protocol.version=1 clone https://github.com/git/git.git --shallow-since=20151012 Cloning into 'git'... fatal: the remote end hung up unexpectedly Instead of blindly forwarding packets, make remote-curl insert a response end packet after proxying the responses from the remote server when using stateless_connect(). On the RPC client side, ensure that each response ends as described. A separate control packet is chosen because we need to be able to differentiate between what the remote server sends and remote-curl's control packets. By ensuring in the remote-curl code that a server cannot send response end packets, we prevent a malicious server from being able to perform a denial of service attack in which they spoof a response end packet and cause the described deadlock to happen. Reported-by: Force Charlie <charlieio@outlook.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-13Merge branch 'tb/shallow-cleanup'Junio C Hamano1-1/+2
Code cleanup. * tb/shallow-cleanup: shallow: use struct 'shallow_lock' for additional safety shallow.h: document '{commit,rollback}_shallow_file' shallow: extract a header file for shallow-related functions commit: make 'commit_graft_pos' non-static
2020-05-01Merge branch 'jt/v2-fetch-nego-fix'Junio C Hamano1-12/+38
The upload-pack protocol v2 gave up too early before finding a common ancestor, resulting in a wasteful fetch from a fork of a project. This has been corrected to match the behaviour of v0 protocol. * jt/v2-fetch-nego-fix: fetch-pack: in protocol v2, reset in_vain upon ACK fetch-pack: in protocol v2, in_vain only after ACK fetch-pack: return enum from process_acks()
2020-05-01Merge branch 'tb/reset-shallow'Junio C Hamano1-5/+5
Fix in-core inconsistency after fetching into a shallow repository that broke the code to write out commit-graph. * tb/reset-shallow: shallow.c: use '{commit,rollback}_shallow_file' t5537: use test_write_lines and indented heredocs for readability
2020-04-30shallow: use struct 'shallow_lock' for additional safetyTaylor Blau1-1/+1
In previous patches, the functions 'commit_shallow_file' and 'rollback_shallow_file' were introduced to reset the shallowness validity checks on a repository after potentially modifying '.git/shallow'. These functions can be made safer by wrapping the 'struct lockfile *' in a new type, 'shallow_lock', so that they cannot be called with a raw lock (and potentially misused by other code that happens to possess a lockfile, but has nothing to do with shallowness). This patch introduces that type as a thin wrapper around 'struct lockfile', and updates the two aforementioned functions and their callers to use it. Suggested-by: Junio C Hamano <gitster@pobox.com> Helped-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-30shallow: extract a header file for shallow-related functionsTaylor Blau1-0/+1
There are many functions in commit.h that are more related to shallow repositories than they are to any sort of generic commit machinery. Likely this began when there were only a few shallow-related functions, and commit.h seemed a reasonable enough place to put them. But, now there are a good number of shallow-related functions, and placing them all in 'commit.h' doesn't make sense. This patch extracts a 'shallow.h', which takes all of the declarations from 'commit.h' for functions which already exist in 'shallow.c'. We will bring the remaining shallow-related functions defined in 'commit.c' in a subsequent patch. For now, move only the ones that already are implemented in 'shallow.c', and update the necessary includes. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-28fetch-pack: in protocol v2, reset in_vain upon ACKJonathan Tan1-0/+1
In the function process_acks() in fetch-pack.c, the variable received_ack is meant to track that an ACK was received, but it was never set. This results in negotiation terminating prematurely through the in_vain counter, when the counter should have been reset upon every ACK. Therefore, reset the in_vain counter upon every ACK. Helped-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-28fetch-pack: in protocol v2, in_vain only after ACKJonathan Tan1-4/+9
When fetching, Git stops negotiation when it has sent at least MAX_IN_VAIN (which is 256) "have" lines without having any of them ACK-ed. But this is supposed to trigger only after the first ACK, as pack-protocol.txt says: However, the 256 limit *only* turns on in the canonical client implementation if we have received at least one "ACK %s continue" during a prior round. This helps to ensure that at least one common ancestor is found before we give up entirely. The code path for protocol v0 observes this, but not protocol v2, resulting in shorter negotiation rounds but significantly larger packfiles. Teach the code path for protocol v2 to check this criterion only after at least one ACK was received. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-28fetch-pack: return enum from process_acks()Jonathan Tan1-8/+28
process_acks() returns 0, 1, or 2, depending on whether "ready" was received and if not, whether at least one commit was found to be common. Replace these magic numbers with a documented enum. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-24shallow.c: use '{commit,rollback}_shallow_file'Taylor Blau1-5/+5
In bd0b42aed3 (fetch-pack: do not take shallow lock unnecessarily, 2019-01-10), the author noted that 'is_repository_shallow' produces visible side-effect(s) by setting 'is_shallow' and 'shallow_stat'. This is a problem for e.g., fetching with '--update-shallow' in a shallow repository with 'fetch.writeCommitGraph' enabled, since the update to '.git/shallow' will cause Git to think that the repository isn't shallow when it is, thereby circumventing the commit-graph compatibility check. This causes problems in shallow repositories with at least shallow refs that have at least one ancestor (since the client won't have those objects, and therefore can't take the reachability closure over commits when writing a commit-graph). Address this by introducing thin wrappers over 'commit_lock_file' and 'rollback_lock_file' for use specifically when the lock is held over '.git/shallow'. These wrappers (appropriately called 'commit_shallow_file' and 'rollback_shallow_file') call into their respective functions in 'lockfile.h', but additionally reset validity checks used by the shallow machinery. Replace each instance of 'commit_lock_file' and 'rollback_lock_file' with 'commit_shallow_file' and 'rollback_shallow_file' when the lock being held is over the '.git/shallow' file. As a result, 'prune_shallow' can now only be called once (since 'check_shallow_file_for_update' will die after calling 'reset_repository_shallow'). But, this is OK since we only call 'prune_shallow' at most once per process. Helped-by: Jonathan Tan <jonathantanmy@google.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30oid_array: rename source file from sha1-arrayJeff King1-1/+1
We renamed the actual data structure in 910650d2f8 (Rename sha1_array to oid_array, 2017-03-31), but the file is still called sha1-array. Besides being slightly confusing, it makes it more annoying to grep for leftover occurrences of "sha1" in various files, because the header is included in so many places. Let's complete the transition by renaming the source and header files (and fixing up a few comment references). I kept the "-" in the name, as that seems to be our style; cf. fc1395f4a4 (sha1_file.c: rename to use dash in file name, 2018-04-10). We also have oidmap.h and oidset.h without any punctuation, but those are "struct oidmap" and "struct oidset" in the code. We _could_ make this "oidarray" to match, but somehow it looks uglier to me because of the length of "array" (plus it would be a very invasive patch for little gain). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-05Merge branch 'ec/fetch-mark-common-refs-trace2'Junio C Hamano1-1/+12
Trace2 annotation. * ec/fetch-mark-common-refs-trace2: fetch: add trace2 instrumentation
2019-12-01Merge branch 'jt/fetch-remove-lazy-fetch-plugging'Junio C Hamano1-15/+34
"git fetch" codepath had a big "do not lazily fetch missing objects when I ask if something exists" switch. This has been corrected by marking the "does this thing exist?" calls with "if not please do not lazily fetch it" flag. * jt/fetch-remove-lazy-fetch-plugging: promisor-remote: remove fetch_if_missing=0 clone: remove fetch_if_missing=0 fetch: remove fetch_if_missing=0
2019-11-20fetch: add trace2 instrumentationErik Chen1-1/+12
Add trace2 regions to fetch-pack.c to better track time spent in the various phases of a fetch: * parsing remote refs and finding a cutoff * marking local refs as complete * marking complete remote refs as common All stages could potentially be slow for repositories with many refs. Signed-off-by: Erik Chen <erikchen@chromium.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-13promisor-remote: remove fetch_if_missing=0Jonathan Tan1-14/+32
Commit 6462d5eb9a ("fetch: remove fetch_if_missing=0", 2019-11-08) strove to remove the need for fetch_if_missing=0 from the fetching mechanism, so it is plausible to attempt removing fetch_if_missing=0 from the lazy-fetching mechanism in promisor-remote as well. But doing so reveals a bug - when the server does not send an object pointed to by a tag object, an infinite loop occurs: Git attempts to fetch the missing object, which causes a deferencing of all refs (for negotiation), which causes a lazy fetch of that missing object, and so on. This bug is because of unnecessary use of the fetch negotiator during lazy fetching - it is not used after initialization, but it is still initialized (which causes the dereferencing of all refs). Thus, when the negotiator is not used during fetching, refrain from initializing it. Then, remove fetch_if_missing from promisor-remote. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-10Merge branch 'jt/fetch-pack-record-refs-in-the-dot-promisor'Junio C Hamano1-4/+43
Debugging support for lazy cloning has been a bit improved. * jt/fetch-pack-record-refs-in-the-dot-promisor: fetch-pack: write fetched refs to .promisor
2019-11-08fetch: remove fetch_if_missing=0Jonathan Tan1-1/+2
In fetch_pack() (and all functions it calls), pass OBJECT_INFO_SKIP_FETCH_OBJECT whenever we query an object that could be a tree or blob that we do not want to be lazy-fetched even if it is absent. Thus, the only lazy-fetches occurring for trees and blobs are when resolving deltas. Thus, we can remove fetch_if_missing=0 from builtin/fetch.c. Remove this, and also add a test ensuring that such objects are not lazy-fetched. (We might be able to remove fetch_if_missing=0 from other places too, but I have limited myself to builtin/fetch.c in this commit because I have not written tests for the other commands yet.) Note that commits and tags may still be lazy-fetched. I limited myself to objects that could be trees or blobs here because Git does not support creating such commit- and tag-excluding clones yet, and even if such a clone were manually created, Git does not have good support for fetching a single commit (when fetching a commit, it and all its ancestors would be sent). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-16fetch-pack: write fetched refs to .promisorJonathan Tan1-4/+43
The specification of promisor packfiles (in partial-clone.txt) states that the .promisor files that accompany packfiles do not matter (just like .keep files), so whenever a packfile is fetched from the promisor remote, Git has been writing empty .promisor files. But these files could contain more useful information. So instead of writing empty files, write the refs fetched to these files. This makes it easier to debug issues with partial clones, as we can identify what refs (and their associated hashes) were fetched at the time the packfile was downloaded, and if necessary, compare those hashes against what the promisor remote reports now. This is implemented by teaching fetch-pack to write its own non-empty .promisor file whenever it knows the name of the pack's lockfile. This covers the case wherein the user runs "git fetch" with an internal protocol or HTTP protocol v2 (fetch_refs_via_pack() in transport.c sets lock_pack) and with HTTP protocol v0/v1 (fetch_git() in remote-curl.c passes "--lock-pack" to "fetch-pack"). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Acked-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-15Merge branch 'js/trace2-fetch-push'Junio C Hamano1-1/+12
Dev support. * js/trace2-fetch-push: transport: push codepath can take arbitrary repository push: add trace2 instrumentation fetch: add trace2 instrumentation
2019-10-11Merge branch 'bc/object-id-part17'Junio C Hamano1-6/+6
Preparation for SHA-256 upgrade continues. * bc/object-id-part17: (26 commits) midx: switch to using the_hash_algo builtin/show-index: replace sha1_to_hex rerere: replace sha1_to_hex builtin/receive-pack: replace sha1_to_hex builtin/index-pack: replace sha1_to_hex packfile: replace sha1_to_hex wt-status: convert struct wt_status to object_id cache: remove null_sha1 builtin/worktree: switch null_sha1 to null_oid builtin/repack: write object IDs of the proper length pack-write: use hash_to_hex when writing checksums sequencer: convert to use the_hash_algo bisect: switch to using the_hash_algo sha1-lookup: switch hard-coded constants to the_hash_algo config: use the_hash_algo in abbrev comparison combine-diff: replace GIT_SHA1_HEXSZ with the_hash_algo bundle: switch to use the_hash_algo connected: switch GIT_SHA1_HEXSZ to the_hash_algo show-index: switch hard-coded constants to the_hash_algo blame: remove needless comparison with GIT_SHA1_HEXSZ ...
2019-10-03fetch: add trace2 instrumentationJosh Steadmon1-1/+12
Add trace2 regions to fetch-pack.c and builtins/fetch.c to better track time spent in the various phases of a fetch: * listing refs * negotiation for protocol versions v0-v2 * fetching refs * consuming refs Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-18Merge branch 'md/list-objects-filter-combo'Junio C Hamano1-13/+7
The list-objects-filter API (used to create a sparse/lazy clone) learned to take a combined filter specification. * md/list-objects-filter-combo: list-objects-filter-options: make parser void list-objects-filter-options: clean up use of ALLOC_GROW list-objects-filter-options: allow mult. --filter strbuf: give URL-encoding API a char predicate fn list-objects-filter-options: make filter_spec a string_list list-objects-filter-options: move error check up list-objects-filter: implement composite filters list-objects-filter-options: always supply *errbuf list-objects-filter: put omits set in filter struct list-objects-filter: encapsulate filter components
2019-08-19fetch-pack: use parse_oid_hexbrian m. carlson1-6/+6
Instead of hard-coding constants, use parse_oid_hex to compute a pointer and use it in further parsing operations. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-13repo-settings: create feature.experimental settingDerrick Stolee1-6/+5
The 'feature.experimental' setting includes config options that are not committed to become defaults, but could use additional testing. Update the following config settings to take new defaults, and to use the repo_settings struct if not already using them: * 'pack.useSparse=true' * 'fetch.negotiationAlgorithm=skipping' In the case of fetch.negotiationAlgorithm, the existing logic would load the config option only when about to use the setting, so had a die() statement on an unknown string value. This is removed as now the config is parsed under prepare_repo_settings(). In general, this die() is probably misplaced and not valuable. A test was removed that checked this die() statement executed. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-09Merge branch 'nd/fetch-capability-tweak'Junio C Hamano1-24/+37
Protocol capabilities that go over wire should never be translated, but it was incorrectly marked for translation, which has been corrected. The output of protocol capabilities for debugging has been tweaked a bit. * nd/fetch-capability-tweak: fetch-pack: print server version at the top in -v -v fetch-pack: print all relevant supported capabilities with -v -v fetch-pack: move capability names out of i18n strings
2019-06-28list-objects-filter-options: make filter_spec a string_listMatthew DeVore1-13/+7
Make the filter_spec string a string_list rather than a raw C string. The list of strings must be concatted together to make a complete filter_spec. A future patch will use this capability to build "combine:" filter specs gradually. A strbuf would seem to be a more natural choice for this object, but it unfortunately requires initialization besides just zero'ing out the memory. This results in all container structs, and all containers of those structs, etc., to also require initialization. Initializing them all would be more cumbersome that simply using a string_list, which behaves properly when its contents are zero'd. For the purposes of code simplification, change behavior in how filter specs are conveyed over the protocol: do not normalize the tree:<depth> filter specs since there should be no server in existence that supports tree:# but not tree:#k etc. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20fetch-pack: print server version at the top in -v -vNguyễn Thái Ngọc Duy1-6/+7
Before the previous patch, the server version is printed after all the "Server supports" lines. The previous one puts the version in the middle of "Server supports" group. Instead of moving it to the bottom, I move it to the top. Version may stand out more at the top as we will have even more debug out after capabilities. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20fetch-pack: print all relevant supported capabilities with -v -vNguyễn Thái Ngọc Duy1-9/+21
When we check if some capability is supported, we do print something in verbose mode. Some capabilities are not printed though (and it made me think it's not supported; I was more used to GIT_TRACE_PACKET) so let's print them all. It's a bit more code. And one could argue for printing all supported capabilities the server sends us. But I think it's still valuable this way because we see the capabilities that the client cares about. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20fetch-pack: move capability names out of i18n stringsNguyễn Thái Ngọc Duy1-9/+9
This reduces the work on translators since they only have one string to translate (and I think it's still enough context to translate). It also makes sure no capability name is translated by accident. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20object: convert lookup_object() to use object_idJeff King1-6/+6
There are no callers left of lookup_object() that aren't just passing us the "hash" member of a "struct object_id". Let's take the whole struct, which gets us closer to removing all raw sha1 variables. It also matches the existing conversions of lookup_blob(), etc. The conversions of callers were done by hand, but they're all mechanical one-liners. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-30Merge branch 'jt/clone-server-option'Junio C Hamano1-1/+1
A brown-paper-bag bugfix to a change already in 'master'. * jt/clone-server-option: fetch-pack: send server options after command
2019-05-28fetch-pack: send server options after commandJonathan Tan1-1/+1
Currently, if any server options are specified during a protocol v2 fetch, server options will be sent before "command=fetch". Write server options to the request buffer in send_fetch_request() so that the components of the request are sent in the correct order. The protocol documentation states that the command must come first. The Git server implementation in serve.c (see process_request() in that file) tolerates any order of command and capability, which is perhaps why we haven't noticed this. This was noticed when testing against a JGit server implementation, which follows the documentation in this regard. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Acked-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-25Merge branch 'jk/fetch-reachability-error-fix'Junio C Hamano1-7/+9
Code clean-up and a fix for "git fetch" by an explicit object name (as opposed to fetching refs by name). * jk/fetch-reachability-error-fix: fetch: do not consider peeled tags as advertised tips remote.c: make singular free_ref() public fetch: use free_refs() pkt-line: prepare buffer before handling ERR packets upload-pack: send ERR packet for non-tip objects t5530: check protocol response for "not our ref" t5516: drop ok=sigpipe from unreachable-want tests
2019-04-25Merge branch 'jt/fetch-no-update-shallow-in-proto-v2'Junio C Hamano1-10/+41
Fix for protocol v2 support in "git fetch-pack" of shallow clones. * jt/fetch-no-update-shallow-in-proto-v2: fetch-pack: respect --no-update-shallow in v2 fetch-pack: call prepare_shallow_info only if v0
2019-04-25Merge branch 'jt/fetch-pack-wanted-refs-optim'Junio C Hamano1-9/+10
Performance fix around "git fetch" that grabs many refs. * jt/fetch-pack-wanted-refs-optim: fetch-pack: binary search when storing wanted-refs
2019-04-15fetch: do not consider peeled tags as advertised tipsJeff King1-3/+8
Our filter_refs() function accidentally considers the target of a peeled tag to be advertised by the server, even though upload-pack on the server side does not consider it so. This can result in the client making a bogus fetch to the server, which will end with the server complaining "not our ref". Whereas the correct behavior is for the client to notice that the server will not allow the request and error out immediately. So as bugs go, this is not very serious (the outcome is the same either way -- the fetch fails). But it's worth making the logic here correct and consistent with other related cases (e.g., fetching an oid that the server did not mention at all). The crux of the issue comes from fdb69d33c4 (fetch-pack: always allow fetching of literal SHA1s, 2017-05-15). After that, the strategy of filter_refs() is basically: - for each advertised ref, try to match it with a "sought" ref provided by the user. Skip any malformed refs (which includes peeled values like "refs/tags/foo^{}"), and place any unmatched items onto the unmatched list. - if there are unmatched sought refs, then put all of the advertised tips into an oidset, including the unmatched ones. - for each sought ref, see if it's in the oidset, in which case it's legal for us to ask the server for it The problem is in the second step. Our list of unmatched refs includes the peeled refs, even though upload-pack does not allow them to be directly fetched. So the simplest fix would be to exclude them during that step. However, we can observe that the unmatched list isn't used for anything else, and is freed at the end. We can just free those malformed refs immediately. That saves us having to check each ref a second time to see if it's malformed. Note that this code only kicks in when "strict" is in effect. I.e., if we are using the v0 protocol and uploadpack.allowReachableSHA1InWant is not in effect. With v2, all oids are allowed, and we do not bother creating or consulting the oidset at all. To future-proof our test against the upcoming GIT_TEST_PROTOCOL_VERSION flag, we'll manually mark it as a v0-only test. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-15fetch: use free_refs()Jeff King1-4/+1
There's no need for us to write this loop manually when a helper function can already do it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01fetch-pack: binary search when storing wanted-refsJonathan Tan1-9/+10
In do_fetch_pack_v2(), the "sought" array is sorted by name, and it is not subsequently reordered (within the function). Therefore, receive_wanted_refs() can assume that "sought" is sorted, and can thus use a binary search when storing wanted-refs retrieved from the server. Replace the existing linear search with a binary search. This improves performance significantly when mirror cloning a repository with more than 1 million refs. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01fetch-pack: respect --no-update-shallow in v2Jonathan Tan1-7/+34
In protocol v0, when sending "shallow" lines, the server distinguishes between lines caused by the remote repo being shallow and lines caused by client-specified depth settings. Unless "--update-shallow" is specified, there is a difference in behavior: refs that reach the former "shallow" lines, but not the latter, are rejected. But in v2, the server does not, and the client treats all "shallow" lines like lines caused by client-specified depth settings. Full restoration of v0 functionality is not possible without protocol change, but we can implement a heuristic: if we specify any depth setting, treat all "shallow" lines like lines caused by client-specified depth settings (that is, unaffected by "--no-update-shallow"), but otherwise, treat them like lines caused by the remote repo being shallow (that is, affected by "--no-update-shallow"). This restores most of v0 behavior, except in the case where a client fetches from a shallow repository with depth settings. This patch causes a test that previously failed with GIT_TEST_PROTOCOL_VERSION=2 to pass. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01fetch-pack: call prepare_shallow_info only if v0Jonathan Tan1-3/+7
In fetch_pack(), be clearer that there is no shallow information before the fetch when v2 is used - memset the struct shallow_info to 0 instead of calling prepare_shallow_info(). This patch is in preparation for a future patch in which a v2 fetch might call prepare_shallow_info() after shallow info has been retrieved during the fetch, so I needed to ensure that prepare_shallow_info() is not called before the fetch. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-20fetch_pack(): drop unused parametersJeff King1-2/+1
We don't need the caller of fetch_pack() to pass in "dest", which is the remote URL. Since ba227857d2 (Reduce the number of connects when fetching, 2008-02-04), the caller is responsible for calling git_connect() itself, and our "dest" parameter is unused. That commit also started passing us the resulting "conn" child_process from git_connect(). But likewise, we do not need do anything with it. The descriptors in "fd" are enough for us, and the caller is responsible for cleaning up "conn". We can just drop both parameters. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-20Merge branch 'jk/no-sigpipe-during-network-transport'Junio C Hamano1-3/+6
On platforms where "git fetch" is killed with SIGPIPE (e.g. OSX), the upload-pack that runs on the other end that hangs up after detecting an error could cause "git fetch" to die with a signal, which led to a flakey test. "git fetch" now ignores SIGPIPE during the network portion of its operation (this is not a problem as we check the return status from our write(2)s). * jk/no-sigpipe-during-network-transport: fetch: ignore SIGPIPE during network operation fetch: avoid calling write_or_die()
2019-03-05fetch: avoid calling write_or_die()Jeff King1-3/+6
The write_or_die() function has one quirk that a caller might not expect: when it sees EPIPE from the write() call, it translates that into a death by SIGPIPE. This doesn't change the overall behavior (the program exits either way), but it does potentially confuse test scripts looking for a non-signal exit code. Let's switch away from using write_or_die() in a few code paths, which will give us more consistent exit codes. It also gives us the opportunity to write more descriptive error messages, since we have context that write_or_die() does not. Note that this won't do much by itself, since we'd typically be killed by SIGPIPE before write_or_die() even gets a chance to do its thing. That will be addressed in the next patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-06Merge branch 'bc/fetch-pack-clear-alternate-shallow'Junio C Hamano1-0/+5
"git fetch" over protocol v2 that needs to make a second connection to backfill tags did not clear a variable that holds shallow repository information correctly, leading to an access of freed piece of memory. * bc/fetch-pack-clear-alternate-shallow: fetch-pack: clear alternate shallow in one more place fetch-pack: clear alternate shallow when complete
2019-02-06fetch-pack: clear alternate shallow in one more placebrian m. carlson1-0/+2
The previous one did not clear the variable in one codepath, but we should aim to be complete. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> [jc: made a reroll into incremental, as the previous one already is in the next branch] Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-05Merge branch 'jt/fetch-v2-sideband'Junio C Hamano1-31/+47
"git fetch" and "git upload-pack" learned to send all exchange over the sideband channel while talking the v2 protocol. * jt/fetch-v2-sideband: tests: define GIT_TEST_SIDEBAND_ALL {fetch,upload}-pack: sideband v2 fetch response sideband: reverse its dependency on pkt-line pkt-line: introduce struct packet_writer pack-protocol.txt: accept error packets in any context Use packet_reader instead of packet_read_line
2019-02-05Merge branch 'js/filter-options-should-use-plain-int'Junio C Hamano1-3/+12
Update the protocol message specification to allow only the limited use of scaled quantities. This is ensure potential compatibility issues will not go out of hand. * js/filter-options-should-use-plain-int: filter-options: expand scaled numbers tree:<depth>: skip some trees even when collecting omits list-objects-filter: teach tree:# how to handle >0
2019-02-04fetch-pack: clear alternate shallow when completebrian m. carlson1-0/+3
When we write an alternate shallow file in update_shallow, we write it into the lock file. The string stored in alternate_shallow_file is copied from the lock file path, but it is freed the moment that the lock file is closed, since we call strbuf_release to free that path. This used to work, since we did not invoke git index-pack more than once, but now that we do, we reuse the freed memory. Ensure we reset the value to NULL to avoid using freed memory. git index-pack will read the repository's shallow file, which will have been updated with the correct information. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-17tests: define GIT_TEST_SIDEBAND_ALLJonathan Tan1-1/+2
Define a GIT_TEST_SIDEBAND_ALL environment variable meant to be used from tests. When set to true, this overrides uploadpack.allowsidebandall to true, allowing the entire test suite to be run as if this configuration is in place for all repositories. As of this patch, all tests pass whether GIT_TEST_SIDEBAND_ALL is unset or set to 1. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-17{fetch,upload}-pack: sideband v2 fetch responseJonathan Tan1-2/+10
Currently, a response to a fetch request has sideband support only while the packfile is being sent, meaning that the server cannot send notices until the start of the packfile. Extend sideband support in protocol v2 fetch responses to the whole response. upload-pack will advertise it if the uploadpack.allowsidebandall configuration variable is set, and fetch-pack will automatically request it if advertised. If the sideband is to be used throughout the whole response, upload-pack will use it to send errors instead of prefixing a PKT-LINE payload with "ERR ". This will be tested in a subsequent patch. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-15filter-options: expand scaled numbersJosh Steadmon1-3/+12
When communicating with a remote server or a subprocess, use expanded numbers rather than numbers with scaling suffix in the object filter spec (e.g. "limit:blob=1k" becomes "limit:blob=1024"). Update the protocol docs to note that clients should always perform this expansion, to allow for more compatibility between server implementations. Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-14Merge branch 'ms/packet-err-check' into jt/fetch-v2-sidebandJunio C Hamano1-29/+36
* ms/packet-err-check: pack-protocol.txt: accept error packets in any context Use packet_reader instead of packet_read_line
2019-01-10upload-pack: teach deepen-relative in protocol v2Jonathan Tan1-0/+2
Commit 685fbd3291 ("fetch-pack: perform a fetch using v2", 2018-03-15) attempted to teach Git deepen-relative in protocol v2 (among other things), but it didn't work: (1) fetch-pack.c needs to emit "deepen-relative". (2) upload-pack.c needs to ensure that the correct deepen_relative variable is passed to deepen() (there are two - the static variable and the one in struct upload_pack_data). (3) Before deepen() computes the list of reachable shallows, it first needs to mark all "our" refs as OUR_REF. v2 currently does not do this, because unlike v0, it is not needed otherwise. Fix all this and include a test demonstrating that it works now. For (2), the static variable deepen_relative is also eliminated, removing a source of confusion. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Reviewed-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-10fetch-pack: do not take shallow lock unnecessarilyJonathan Tan1-2/+9
When fetching using protocol v2, the remote may send a "shallow-info" section if the client is shallow. If so, Git as the client currently takes the shallow file lock, even if the "shallow-info" section is empty. This is not a problem except that Git does not support taking the shallow file lock after modifying the shallow file, because is_repository_shallow() stores information that is never cleared. And this take-after-modify occurs when Git does a tag-following fetch from a shallow repository on a transport that does not support tag following (since in this case, 2 fetches are performed). To solve this issue, take the shallow file lock (and perform all other shallow processing) only if the "shallow-info" section is non-empty; otherwise, behave as if it were empty. A full solution (probably, ensuring that any action of committing shallow file locks also includes clearing the information stored by is_repository_shallow()) would solve the issue without need for this patch, but this patch is independently useful (as an optimization to prevent writing a file in an unnecessary case), hence why I wrote it. I have included a NEEDSWORK outlining the full solution. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-04Merge branch 'jk/loose-object-cache'Junio C Hamano1-37/+2
Code clean-up with optimization for the codepath that checks (non-)existence of loose objects. * jk/loose-object-cache: odb_load_loose_cache: fix strbuf leak fetch-pack: drop custom loose object cache sha1-file: use loose object cache for quick existence check object-store: provide helpers for loose_objects_cache sha1-file: use an object_directory for the main object dir handle alternates paths the same as the main object dir sha1_file_name(): overwrite buffer instead of appending rename "alternate_object_database" to "object_directory" submodule--helper: prefer strip_suffix() to ends_with() fsck: do not reuse child_process structs
2019-01-02pack-protocol.txt: accept error packets in any contextMasaya Suzuki1-4/+4
In the Git pack protocol definition, an error packet may appear only in a certain context. However, servers can face a runtime error (e.g. I/O error) at an arbitrary timing. This patch changes the protocol to allow an error packet to be sent instead of any packet. Without this protocol spec change, when a server cannot process a request, there's no way to tell that to a client. Since the server cannot produce a valid response, it would be forced to cut a connection without telling why. With this protocol spec change, the server can be more gentle in this situation. An old client may see these error packets as an unexpected packet, but this is not worse than having an unexpected EOF. Following this protocol spec change, the error packet handling code is moved to pkt-line.c. Implementation wise, this implementation uses pkt-line to communicate with a subprocess. Since this is not a part of Git protocol, it's possible that a packet that is not supposed to be an error packet is mistakenly parsed as an error packet. This error packet handling is enabled only for the Git pack protocol parsing code considering this. Signed-off-by: Masaya Suzuki <masayasuzuki@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-02Use packet_reader instead of packet_read_lineMasaya Suzuki1-27/+34
By using and sharing a packet_reader while handling a Git pack protocol request, the same reader option is used throughout the code. This makes it easy to set a reader option to the request parsing code. Signed-off-by: Masaya Suzuki <masayasuzuki@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-13fetch-pack: drop custom loose object cacheJeff King1-37/+2
Commit 024aa4696c (fetch-pack.c: use oidset to check existence of loose object, 2018-03-14) added a cache to avoid calling stat() for a bunch of loose objects we don't have. Now that OBJECT_INFO_QUICK handles this caching itself, we can drop the custom solution. Note that this might perform slightly differently, as the original code stopped calling readdir() when we saw more loose objects than there were refs. So: 1. The old code might have spent work on readdir() to fill the cache, but then decided there were too many loose objects, wasting that effort. 2. The new code might spend a lot of time on readdir() if you have a lot of loose objects, even though there are very few objects to ask about. In practice it probably won't matter either way; see the previous commit for some discussion of the tradeoff. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-01fetch-pack: be more precise in parsing v2 responseJonathan Tan1-0/+12
Each section in a protocol v2 response is followed by either a DELIM packet (indicating more sections to follow) or a FLUSH packet (indicating none to follow). But when parsing the "acknowledgments" section, do_fetch_pack_v2() is liberal in accepting both, but determines whether to continue reading or not based solely on the contents of the "acknowledgments" section, not on whether DELIM or FLUSH was read. There is no issue with a protocol-compliant server, but can result in confusing error messages when communicating with a server that serves unexpected additional sections. Consider a server that sends "new-section" after "acknowledgments": - client writes request - client reads the "acknowledgments" section which contains no "ready", then DELIM - since there was no "ready", client needs to continue negotiation, and writes request - client reads "new-section", and reports to the end user "expected 'acknowledgments', received 'new-section'" For the person debugging the involved Git implementation(s), the error message is confusing in that "new-section" was not received in response to the latest request, but to the first one. One solution is to always continue reading after DELIM, but in this case, we can do better. We know from the protocol that "ready" means at least the packfile section is coming (hence, DELIM) and that no "ready" means that no sections are to follow (hence, FLUSH). So teach process_acks() to enforce this. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-19Merge branch 'tb/filter-alternate-refs'Junio C Hamano1-2/+1
When pushing into a repository that borrows its objects from an alternate object store, "git receive-pack" that responds to the push request on the other side lists the tips of refs in the alternate to reduce the amount of objects transferred. This sometimes is detrimental when the number of refs in the alternate is absurdly large, in which case the bandwidth saved in potentially fewer objects transferred is wasted in excessively large ref advertisement. The alternate refs that are advertised are now configurable with a pair of configuration variables. * tb/filter-alternate-refs: transport.c: introduce core.alternateRefsPrefixes transport.c: introduce core.alternateRefsCommand transport.c: extract 'fill_alternate_refs_command' transport: drop refnames from for_each_alternate_ref
2018-10-19Merge branch 'jt/avoid-ls-refs'Junio C Hamano1-1/+1
Over some transports, fetching objects with an exact commit object name can be done without first seeing the ref advertisements. The code has been optimized to exploit this. * jt/avoid-ls-refs: fetch: do not list refs if fetching only hashes transport: list refs before fetch if necessary transport: do not list refs if possible transport: allow skipping of ref listing
2018-10-19Merge branch 'jt/non-blob-lazy-fetch'Junio C Hamano1-42/+73
A partial clone that is configured to lazily fetch missing objects will on-demand issue a "git fetch" request to the originating repository to fill not-yet-obtained objects. The request has been optimized for requesting a tree object (and not the leaf blob objects contained in it) by telling the originating repository that no blobs are needed. * jt/non-blob-lazy-fetch: fetch-pack: exclude blobs when lazy-fetching trees fetch-pack: avoid object flags if no_dependents
2018-10-09transport: drop refnames from for_each_alternate_refJeff King1-2/+1
None of the current callers use the refname parameter we pass to their callbacks. In theory somebody _could_ do so, but it's actually quite weird if you think about it: it's a ref in somebody else's repository. So the name has no meaning locally, and in fact there may be duplicates if there are multiple alternates. The users of this interface really only care about seeing some ref tips, since that promises that the alternate has the full commit graph reachable from there. So let's keep the information we pass back to the bare minimum. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07transport: do not list refs if possibleJonathan Tan1-1/+1
When all refs to be fetched are exact OIDs, it is possible to perform a fetch without requiring the remote to list refs if protocol v2 is used. Teach Git to do this. This currently has an effect only for lazy fetches done from partial clones. The change necessary to likewise optimize "git fetch <remote> <sha-1>" will be done in a subsequent patch. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-04fetch-pack: load tip_oids eagerly iff neededRené Scharfe1-21/+15
tip_oids_contain() lazily loads refs into an oidset at its first call. It abuses the internal (sub)member .map.tablesize of that oidset to check if it has done that already. Determine if the oidset needs to be populated upfront and then do that instead. This duplicates a loop, but simplifies the existing one by separating concerns between the two. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-04fetch-pack: factor out is_unmatched_ref()René Scharfe1-8/+11
Move the code to determine if a request is unmatched to its own little helper. This allows us to reuse it in a subsequent patch. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-04fetch-pack: exclude blobs when lazy-fetching treesJonathan Tan1-0/+14
A partial clone with missing trees can be obtained using "git clone --filter=tree:none <repo>". In such a repository, when a tree needs to be lazily fetched, any tree or blob it directly or indirectly references is fetched as well, regardless of whether the original command required those objects, or if the local repository already had some of them. This is because the fetch protocol, which the lazy fetch uses, does not allow clients to request that only the wanted objects be sent, which would be the ideal solution. This patch implements a partial solution: specify the "blob:none" filter, somewhat reducing the fetch payload. This change has no effect when lazily fetching blobs (due to how filters work). And if lazily fetching a commit (such repositories are difficult to construct and is not a use case we support very well, but it is possible), referenced commits and trees are still fetched - only the blobs are not fetched. The necessary code change is done in fetch_pack() instead of somewhere closer to where the "filter" instruction is written to the wire so that only one part of the code needs to be changed in order for users of all protocol versions to benefit from this optimization. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-04fetch-pack: avoid object flags if no_dependentsJonathan Tan1-42/+59
When fetch_pack() is invoked as part of another Git command (due to a lazy fetch from a partial clone, for example), it uses object flags that may already be used by the outer Git command. The commit that introduced the lazy fetch feature (88e2f9ed8e ("introduce fetch-object: fetch one promisor object", 2017-12-05)) tried to avoid this overlap, but it did not avoid it totally. It was successful in avoiding writing COMPLETE, but did not avoid reading COMPLETE, and did not avoid writing and reading ALTERNATE. Ensure that no flags are written or read by fetch_pack() in the case where it is used to perform a lazy fetch. To do this, it is sufficient to avoid checking completeness of wanted refs (unnecessary in the case of lazy fetches), and to avoid negotiation-related work (in the current implementation, already, no negotiation is performed). After that was done, the lack of overlap was verified by checking all direct and indirect usages of COMPLETE and ALTERNATE - that they are read or written only if no_dependents is false. There are other possible solutions to this issue: (1) Split fetch-pack.{c,h} into a flag-using part and a non-flag-using part, and whenever no_dependents is set, only use the non-flag-using part. (2) Make fetch_pack() be able to be used with arbitrary repository objects. fetch_pack() should then create its own repository object based on the given repository object, with its own object hashtable, so that the flags do not conflict. (1) is possible but invasive - some functions would need to be split; and such invasiveness would potentially be unnecessary if we ever were to need (2) anyway. (2) would be useful if we were to support, say, submodules that were partial clones themselves, but I don't know when or if the Git project plans to support those. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-29convert "oidcmp() != 0" to "!oideq()"Jeff King1-1/+1
This is the flip side of the previous two patches: checking for a non-zero oidcmp() can be more strictly expressed as inequality. Like those patches, we write "!= 0" in the coccinelle transformation, which covers by isomorphism the more common: if (oidcmp(E1, E2)) As with the previous two patches, this patch can be achieved almost entirely by running "make coccicheck"; the only differences are manual line-wrap fixes to match the original code. There is one thing to note for anybody replicating this, though: coccinelle 1.0.4 seems to miss the case in builtin/tag.c, even though it's basically the same as all the others. Running with 1.0.7 does catch this, so presumably it's just a coccinelle bug that was fixed in the interim. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-17Merge branch 'ab/fsck-transfer-updates'Junio C Hamano1-2/+30
The test performed at the receiving end of "git push" to prevent bad objects from entering repository can be customized via receive.fsck.* configuration variables; we now have gained a counterpart to do the same on the "git fetch" side, with fetch.fsck.* configuration variables. * ab/fsck-transfer-updates: fsck: test and document unknown fsck.<msg-id> values fsck: add stress tests for fsck.skipList fsck: test & document {fetch,receive}.fsck.* config fallback fetch: implement fetch.fsck.* transfer.fsckObjects tests: untangle confusing setup config doc: elaborate on fetch.fsckObjects security config doc: elaborate on what transfer.fsckObjects does config doc: unify the description of fsck.* and receive.fsck.* config doc: don't describe *.fetchObjects twice receive.fsck.<msg-id> tests: remove dead code
2018-08-15Merge branch 'jt/connectivity-check-after-unshallow'Junio C Hamano1-15/+15
"git fetch" sometimes failed to update the remote-tracking refs, which has been corrected. * jt/connectivity-check-after-unshallow: fetch-pack: unify ref in and out param
2018-08-15Merge branch 'bw/fetch-pack-i18n'Junio C Hamano1-8/+8
i18n updates. * bw/fetch-pack-i18n: fetch-pack: mark die strings for translation
2018-08-02Merge branch 'jt/fetch-negotiator-skipping'Junio C Hamano1-2/+5
Add a server-side knob to skip commits in exponential/fibbonacci stride in an attempt to cover wider swath of history with a smaller number of iterations, potentially accepting a larger packfile transfer, instead of going back one commit a time during common ancestor discovery during the "git fetch" transaction. * jt/fetch-negotiator-skipping: negotiator/skipping: skip commits during fetch
2018-08-02Merge branch 'jt/fetch-nego-tip'Junio C Hamano1-2/+18
"git fetch" learned a new option "--negotiation-tip" to limit the set of commits it tells the other end as "have", to reduce wasted bandwidth and cycles, which would be helpful when the receiving repository has a lot of refs that have little to do with the history at the remote it is fetching from. * jt/fetch-nego-tip: fetch-pack: support negotiation tip whitelist
2018-08-02Merge branch 'sb/object-store-lookup'Junio C Hamano1-15/+21
lookup_commit_reference() and friends have been updated to find in-core object for a specific in-core repository instance. * sb/object-store-lookup: (32 commits) commit.c: allow lookup_commit_reference to handle arbitrary repositories commit.c: allow lookup_commit_reference_gently to handle arbitrary repositories tag.c: allow deref_tag to handle arbitrary repositories object.c: allow parse_object to handle arbitrary repositories object.c: allow parse_object_buffer to handle arbitrary repositories commit.c: allow get_cached_commit_buffer to handle arbitrary repositories commit.c: allow set_commit_buffer to handle arbitrary repositories commit.c: migrate the commit buffer to the parsed object store commit-slabs: remove realloc counter outside of slab struct commit.c: allow parse_commit_buffer to handle arbitrary repositories tag: allow parse_tag_buffer to handle arbitrary repositories tag: allow lookup_tag to handle arbitrary repositories commit: allow lookup_commit to handle arbitrary repositories tree: allow lookup_tree to handle arbitrary repositories blob: allow lookup_blob to handle arbitrary repositories object: allow lookup_object to handle arbitrary repositories object: allow object_as_type to handle arbitrary repositories tag: add repository argument to deref_tag tag: add repository argument to parse_tag_buffer tag: add repository argument to lookup_tag ...
2018-08-02Merge branch 'jt/fetch-pack-negotiator'Junio C Hamano1-168/+87
Code restructuring and a small fix to transport protocol v2 during fetching. * jt/fetch-pack-negotiator: fetch-pack: introduce negotiator API fetch-pack: move common check and marking together fetch-pack: make negotiation-related vars local fetch-pack: use ref adv. to prune "have" sent fetch-pack: directly end negotiation if ACK ready fetch-pack: clear marks before re-marking fetch-pack: split up everything_local()
2018-08-01fetch-pack: unify ref in and out paramJonathan Tan1-15/+15
When a user fetches: - at least one up-to-date ref and at least one non-up-to-date ref, - using HTTP with protocol v0 (or something else that uses the fetch command of a remote helper) some refs might not be updated after the fetch. This bug was introduced in commit 989b8c4452 ("fetch-pack: put shallow info in output parameter", 2018-06-28) which allowed transports to report the refs that they have fetched in a new out-parameter "fetched_refs". If they do so, transport_fetch_refs() makes this information available to its caller. Users of "fetched_refs" rely on the following 3 properties: (1) it is the complete list of refs that was passed to transport_fetch_refs(), (2) it has shallow information (REF_STATUS_REJECT_SHALLOW set if relevant), and (3) it has updated OIDs if ref-in-want was used (introduced after 989b8c4452). In an effort to satisfy (1), whenever transport_fetch_refs() filters the refs sent to the transport, it re-adds the filtered refs to whatever the transport supplies before returning it to the user. However, the implementation in 989b8c4452 unconditionally re-adds the filtered refs without checking if the transport refrained from reporting anything in "fetched_refs" (which it is allowed to do), resulting in an incomplete list, no longer satisfying (1). An earlier effort to resolve this [1] solved the issue by readding the filtered refs only if the transport did not refrain from reporting in "fetched_refs", but after further discussion, it seems that the better solution is to revert the API change that introduced "fetched_refs". This API change was first suggested as part of a ref-in-want implementation that allowed for ref patterns and, thus, there could be drastic differences between the input refs and the refs actually fetched [2]; we eventually decided to only allow exact ref names, but this API change remained even though its necessity was decreased. Therefore, revert this API change by reverting commit 989b8c4452, and make receive_wanted_refs() update the OIDs in the sought array (like how update_shallow() updates shallow information in the sought array) instead. A test is also included to show that the user-visible bug discussed at the beginning of this commit message no longer exists. [1] https://public-inbox.org/git/20180801171806.GA122458@google.com/ [2] https://public-inbox.org/git/86a128c5fb710a41791e7183207c4d64889f9307.1485381677.git.jonathantanmy@google.com/ Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-27fetch: implement fetch.fsck.*Ævar Arnfjörð Bjarmason1-2/+30
Implement support for fetch.fsck.* corresponding with the existing receive.fsck.*. This allows for pedantically cloning repositories with specific issues without turning off fetch.fsckObjects. One such repository is https://github.com/robbyrussell/oh-my-zsh.git which before this change will emit this error when cloned with fetch.fsckObjects: error: object 2b7227859263b6aabcc28355b0b994995b7148b6: zeroPaddedFilemode: contains zero-padded file modes fatal: Error in object fatal: index-pack failed Now with fetch.fsck.zeroPaddedFilemode=warn we'll warn about that issue, but the clone will succeed: warning: object 2b7227859263b6aabcc28355b0b994995b7148b6: zeroPaddedFilemode: contains zero-padded file modes warning: object a18c4d13c2a5fa2d4ecd5346c50e119b999b807d: zeroPaddedFilemode: contains zero-padded file modes warning: object 84df066176c8da3fd59b13731a86d90f4f1e5c9d: zeroPaddedFilemode: contains zero-padded file modes The motivation for this is to be able to turn on fetch.fsckObjects globally across a fleet of computers but still be able to manually clone various legacy repositories by either white-listing specific issues, or better yet whitelist specific objects. The use of --git-dir=* instead of -C in the tests could be considered somewhat archaic, but the tests I'm adding here are duplicating the corresponding receive.* tests with as few changes as possible. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-24Merge branch 'jt/connectivity-check-after-unshallow'Junio C Hamano1-10/+74
"git fetch" failed to correctly validate the set of objects it received when making a shallow history deeper, which has been corrected. * jt/connectivity-check-after-unshallow: fetch-pack: write shallow, then check connectivity fetch-pack: implement ref-in-want fetch-pack: put shallow info in output parameter fetch: refactor to make function args narrower fetch: refactor fetch_refs into two functions fetch: refactor the population of peer ref OIDs upload-pack: test negotiation with changing repository upload-pack: implement ref-in-want test-pkt-line: add unpack-sideband subcommand
2018-07-23fetch-pack: mark die strings for translationBrandon Williams1-8/+8
Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-18Merge branch 'sb/object-store-grafts'Junio C Hamano1-7/+8
The conversion to pass "the_repository" and then "a_repository" throughout the object access API continues. * sb/object-store-grafts: commit: allow lookup_commit_graft to handle arbitrary repositories commit: allow prepare_commit_graft to handle arbitrary repositories shallow: migrate shallow information into the object parser path.c: migrate global git_path_* to take a repository argument cache: convert get_graft_file to handle arbitrary repositories commit: convert read_graft_file to handle arbitrary repositories commit: convert register_commit_graft to handle arbitrary repositories commit: convert commit_graft_pos() to handle arbitrary repositories shallow: add repository argument to is_repository_shallow shallow: add repository argument to check_shallow_file_for_update shallow: add repository argument to register_shallow shallow: add repository argument to set_alternate_shallow_file commit: add repository argument to lookup_commit_graft commit: add repository argument to prepare_commit_graft commit: add repository argument to read_graft_file commit: add repository argument to register_commit_graft commit: add repository argument to commit_graft_pos object: move grafts to object parser object-store: move object access functions to object-store.h
2018-07-16negotiator/skipping: skip commits during fetchJonathan Tan1-2/+5
Introduce a new negotiation algorithm used during fetch that skips commits in an effort to find common ancestors faster. The skips grow similarly to the Fibonacci sequence as the commit walk proceeds further away from the tips. The skips may cause unnecessary commits to be included in the packfile, but the negotiation step typically ends more quickly. Usage of this algorithm is guarded behind the configuration flag fetch.negotiationAlgorithm. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03fetch-pack: support negotiation tip whitelistJonathan Tan1-2/+18
During negotiation, fetch-pack eventually reports as "have" lines all commits reachable from all refs. Allow the user to restrict the commits sent in this way by providing a whitelist of tips; only the tips themselves and their ancestors will be sent. Both globs and single objects are supported. This feature is only supported for protocols that support connect or stateless-connect (such as HTTP with protocol v2). This will speed up negotiation when the repository has multiple relatively independent branches (for example, when a repository interacts with multiple repositories, such as with linux-next [1] and torvalds/linux [2]), and the user knows which local branch is likely to have commits in common with the upstream branch they are fetching. [1] https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next/ [2] https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux/ Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03fetch-pack: write shallow, then check connectivityJonathan Tan1-0/+31
When fetching, connectivity is checked after the shallow file is updated. There are 2 issues with this: (1) the connectivity check is only performed up to ancestors of existing refs (which is not thorough enough if we were deepening an existing ref in the first place), and (2) there is no rollback of the shallow file if the connectivity check fails. To solve (1), update the connectivity check to check the ancestry chain completely in the case of a deepening fetch by refraining from passing "--not --all" when invoking rev-list in connected.c. To solve (2), have fetch_pack() perform its own connectivity check before updating the shallow file. To support existing use cases in which "git fetch-pack" is used to download objects without much regard as to the connectivity of the resulting objects with respect to the existing repository, the connectivity check is only done if necessary (that is, the fetch is not a clone, and the fetch involves shallow/deepen functionality). "git fetch" still performs its own connectivity check, preserving correctness but sometimes performing redundant work. This redundancy is mitigated by the fact that fetch_pack() reports if it has performed a connectivity check itself, and if the transport supports connect or stateless-connect, it will bubble up that report so that "git fetch" knows not to perform the connectivity check in such a case. This was noticed when a user tried to deepen an existing repository by fetching with --no-shallow from a server that did not send all necessary objects - the connectivity check as run by "git fetch" succeeded, but a subsequent "git fsck" failed. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29tag: add repository argument to deref_tagStefan Beller1-3/+6
Add a repository argument to allow the callers of deref_tag to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29commit: add repository argument to lookup_commitStefan Beller1-2/+3
Add a repository argument to allow callers of lookup_commit to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29object: add repository argument to lookup_objectStefan Beller1-6/+7
Add a repository argument to allow callers of lookup_object to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29object: add repository argument to parse_objectStefan Beller1-8/+10
Add a repository argument to allow the callers of parse_object to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29Merge branch 'sb/object-store-grafts' into sb/object-store-lookupJunio C Hamano1-7/+8
* sb/object-store-grafts: commit: allow lookup_commit_graft to handle arbitrary repositories commit: allow prepare_commit_graft to handle arbitrary repositories shallow: migrate shallow information into the object parser path.c: migrate global git_path_* to take a repository argument cache: convert get_graft_file to handle arbitrary repositories commit: convert read_graft_file to handle arbitrary repositories commit: convert register_commit_graft to handle arbitrary repositories commit: convert commit_graft_pos() to handle arbitrary repositories shallow: add repository argument to is_repository_shallow shallow: add repository argument to check_shallow_file_for_update shallow: add repository argument to register_shallow shallow: add repository argument to set_alternate_shallow_file commit: add repository argument to lookup_commit_graft commit: add repository argument to prepare_commit_graft commit: add repository argument to read_graft_file commit: add repository argument to register_commit_graft commit: add repository argument to commit_graft_pos object: move grafts to object parser object-store: move object access functions to object-store.h
2018-06-28Merge branch 'jk/fetch-all-peeled-fix'Junio C Hamano1-4/+4
"git fetch-pack --all" used to unnecessarily fail upon seeing an annotated tag that points at an object other than a commit. * jk/fetch-all-peeled-fix: fetch-pack: test explicitly that --all can fetch tag references pointing to non-commits fetch-pack: don't try to fetch peel values with --all
2018-06-28fetch-pack: implement ref-in-wantBrandon Williams1-3/+35
Implement ref-in-want on the client side so that when a server supports the "ref-in-want" feature, a client will send "want-ref" lines for each reference the client wants to fetch. This feature allows clients to tolerate inconsistencies that exist when a remote repository's refs change during the course of negotiation. This allows a client to request to request a particular ref without specifying the OID of the ref. This means that instead of hitting an error when a ref no longer points at the OID it did at the beginning of negotiation, negotiation can continue and the value of that ref will be sent at the termination of negotiation, just before a packfile is sent. More information on the ref-in-want feature can be found in Documentation/technical/protocol-v2.txt. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-28fetch-pack: put shallow info in output parameterBrandon Williams1-7/+8
Expand the transport fetch method signature, by adding an output parameter, to allow transports to return information about the refs they have fetched. Then communicate shallow status information through this mechanism instead of by modifying the input list of refs. This does require clients to sometimes generate the ref map twice: once from the list of refs provided by the remote (as is currently done) and potentially once from the new list of refs that the fetch mechanism provides. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-15fetch-pack: introduce negotiator APIJonathan Tan1-166/+39
Introduce the new files fetch-negotiator.{h,c}, which contains an API behind which the details of negotiation are abstracted. Currently, only one algorithm is available: the existing one. This patch is written to be easily reviewed: static functions are moved verbatim from fetch-pack.c to negotiator/default.c, and it can be seen that the lines replaced by negotiator->X() calls are present in the X() functions respectively. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-15fetch-pack: move common check and marking togetherJonathan Tan1-2/+4
When receiving 'ACK <object-id> continue' for a common commit, check if the commit was already known to be common and mark it as such if not up front. This should make future refactoring of how the information about common commits is stored more straightforward. No visible change intended. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-15fetch-pack: make negotiation-related vars localJonathan Tan1-47/+69
Reduce the number of global variables by making the priority queue and the count of non-common commits in it local, passing them as a struct to various functions where necessary. This also helps in the case that fetch_pack() is invoked twice in the same process (when tag following is required when using a transport that does not support tag following), in that different priority queues will now be used in each invocation, instead of reusing the possibly non-empty one. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-15fetch-pack: use ref adv. to prune "have" sentJonathan Tan1-3/+3
In negotiation using protocol v2, fetch-pack sometimes does not make full use of the information obtained in the ref advertisement: specifically, that if the server advertises a commit that the client also has, the client never needs to inform the server that it has the commit's parents, since it can just tell the server that it has the advertised commit and it knows that the server can and will infer the rest. This is because, in do_fetch_pack_v2(), rev_list_insert_ref_oid() is invoked before mark_complete_and_common_ref(). This means that if we have a commit that is both our ref and their ref, it would be enqueued by rev_list_insert_ref_oid() as SEEN, and since it is thus already SEEN, mark_complete_and_common_ref() would not enqueue it. If mark_complete_and_common_ref() were invoked first, as it is in do_fetch_pack() for protocol v0, then mark_complete_and_common_ref() would enqueue it with COMMON_REF | SEEN. The addition of COMMON_REF ensures that its parents are not sent as "have" lines. Change the order in do_fetch_pack_v2() to be consistent with do_fetch_pack(), and to avoid sending unnecessary "have" lines. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-15fetch-pack: directly end negotiation if ACK readyJonathan Tan1-4/+5
When "ACK %s ready" is received, find_common() clears rev_list in an attempt to stop further "have" lines from being sent [1]. It is much more readable to explicitly break from the loop instead. So explicitly break from the loop, and make the clearing of the rev_list happen unconditionally. [1] The rationale is further described in the originating commit f2cba9299b ("fetch-pack: Finish negotation if remote replies "ACK %s ready"", 2011-03-14). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-14fetch-pack: clear marks before re-markingJonathan Tan1-3/+3
If tag following is required when using a transport that does not support tag following, fetch_pack() will be invoked twice in the same process, necessitating a clearing of the object flags used by fetch_pack() sometime during the second invocation. This is currently done in find_common(), which means that the invocation of mark_complete_and_common_ref() in do_fetch_pack() is useless. (This cannot be reproduced with Git alone, because all transports that come with Git support tag following.) Therefore, move this clearing from find_common() to its parent function do_fetch_pack(), right before it calls mark_complete_and_common_ref(). This has been occurring since the commit that introduced the clearing of marks, 420e9af498 ("Fix tag following", 2008-03-19). The corresponding code for protocol v2 in do_fetch_pack_v2() does not have this problem, as the clearing of flags is done before any marking (whether by rev_list_insert_ref_oid() or mark_complete_and_common_ref()). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>