aboutsummaryrefslogtreecommitdiffstats
path: root/refs
AgeCommit message (Collapse)AuthorFilesLines
6 daysMerge branch 'ps/reftable-write-optim'Junio C Hamano1-22/+53
Code to write out reftable has seen some optimization and simplification. * ps/reftable-write-optim: reftable/block: reuse compressed array reftable/block: reuse zstream when writing log blocks reftable/writer: reset `last_key` instead of releasing it reftable/writer: unify releasing memory reftable/writer: refactorings for `writer_flush_nonempty_block()` reftable/writer: refactorings for `writer_add_record()` refs/reftable: don't recompute committer ident reftable: remove name checks refs/reftable: skip duplicate name checks refs/reftable: perform explicit D/F check when writing symrefs refs/reftable: fix D/F conflict error message on ref copy
2024-04-08refs/reftable: don't recompute committer identPatrick Steinhardt1-18/+34
In order to write reflog entries we need to compute the committer's identity as it gets encoded in the log record itself. The reftable backend does this via `git_committer_info()` and `split_ident_line()` in `fill_reftable_log_record()`, which use the Git config as well as environment variables to figure out the identity. While most callers would only call `fill_reftable_log_record()` once or twice, `write_transaction_table()` will call it as many times as there are queued ref updates. This can be quite a waste of effort when writing many refs with reflog entries in a single transaction. Refactor the code to pre-compute the committer information. This results in a small speedup when writing 100000 refs in a single transaction: Benchmark 1: update-ref: create many refs (HEAD~) Time (mean ± σ): 2.895 s ± 0.020 s [User: 1.516 s, System: 1.374 s] Range (min … max): 2.868 s … 2.983 s 100 runs Benchmark 2: update-ref: create many refs (HEAD) Time (mean ± σ): 2.845 s ± 0.017 s [User: 1.461 s, System: 1.379 s] Range (min … max): 2.803 s … 2.913 s 100 runs Summary update-ref: create many refs (HEAD) ran 1.02 ± 0.01 times faster than update-ref: create many refs (HEAD~) Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-08reftable: remove name checksPatrick Steinhardt1-5/+0
In the preceding commit we have disabled name checks in the "reftable" backend. These checks were responsible for verifying multiple things when writing records to the reftable stack: - Detecting file/directory conflicts. Starting with the preceding commits this is now handled by the reftable backend itself via `refs_verify_refname_available()`. - Validating refnames. This is handled by `check_refname_format()` in the generic ref transacton layer. The code in the reftable library is thus not used anymore and likely to bitrot over time. Remove it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-08refs/reftable: skip duplicate name checksPatrick Steinhardt1-0/+5
All the callback functions which write refs in the reftable backend perform D/F conflict checks via `refs_verify_refname_available()`. But in reality we perform these D/F conflict checks a second time in the reftable library via `stack_check_addition()`. Interestingly, the code in the reftable library is inferior compared to the generic function: - It is slower than `refs_verify_refname_available()`, even though this can probably be optimized. - It does not provide a proper error message to the caller, and thus all the user would see is a generic "file/directory conflict" message. Disable the D/F conflict checks in the reftable library by setting the `skip_name_check` write option. This results in a non-negligible speedup when writing many refs. The following benchmark writes 100k refs in a single transaction: Benchmark 1: update-ref: create many refs (HEAD~) Time (mean ± σ): 3.241 s ± 0.040 s [User: 1.854 s, System: 1.381 s] Range (min … max): 3.185 s … 3.454 s 100 runs Benchmark 2: update-ref: create many refs (HEAD) Time (mean ± σ): 2.878 s ± 0.024 s [User: 1.506 s, System: 1.367 s] Range (min … max): 2.838 s … 2.960 s 100 runs Summary update-ref: create many refs (HEAD~) ran 1.13 ± 0.02 times faster than update-ref: create many refs (HEAD) Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-08refs/reftable: perform explicit D/F check when writing symrefsPatrick Steinhardt1-3/+17
We already perform explicit D/F checks in all reftable callbacks which write refs, except when writing symrefs. For one this leads to an error message which isn't perfectly actionable because we only tell the user that there was a D/F conflict, but not which refs conflicted with each other. But second, once all ref updating callbacks explicitly check for D/F conflicts, we can disable the D/F checks in the reftable library itself and thus avoid some duplicated efforts. Refactor the code that writes symref tables to explicitly call into `refs_verify_refname_available()` when writing symrefs. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-08refs/reftable: fix D/F conflict error message on ref copyPatrick Steinhardt1-1/+2
The `write_copy_table()` function is shared between the reftable implementations for renaming and copying refs. The only difference between those two cases is that the rename will also delete the old reference, whereas copying won't. This has resulted in a bug though where we don't properly verify refname availability. When calling `refs_verify_refname_available()`, we always add the old ref name to the list of refs to be skipped when computing availability, which indicates that the name would be available even if it already exists at the current point in time. This is only the right thing to do for renames though, not for copies. The consequence of this bug is quite harmless because the reftable backend has its own checks for D/F conflicts further down in the call stack, and thus we refuse the update regardless of the bug. But all the user gets in this case is an uninformative message that copying the ref has failed, without any further details. Fix the bug and only add the old name to the skip-list in case we rename the ref. Consequently, this error case will now be handled by `refs_verify_refname_available()`, which knows to provide a proper error message. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-08reftable/stack: add env to disable autocompactionJustin Tobler1-0/+3
In future tests it will be neccesary to create repositories with a set number of tables. To make this easier, introduce the `GIT_TEST_REFTABLE_AUTOCOMPACTION` environment variable that, when set to false, disables autocompaction of reftables. Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-05Merge branch 'ps/pack-refs-auto' into jt/reftable-geometric-compactionJunio C Hamano1-2/+9
* ps/pack-refs-auto: builtin/gc: pack refs when using `git maintenance run --auto` builtin/gc: forward git-gc(1)'s `--auto` flag when packing refs t6500: extract objects with "17" prefix builtin/gc: move `struct maintenance_run_opts` builtin/pack-refs: introduce new "--auto" flag builtin/pack-refs: release allocated memory refs/reftable: expose auto compaction via new flag refs: remove `PACK_REFS_ALL` flag refs: move `struct pack_refs_opts` to where it's used t/helper: drop pack-refs wrapper refs/reftable: print errors on compaction failure reftable/stack: gracefully handle failed auto-compaction due to locks reftable/stack: use error codes when locking fails during compaction reftable/error: discern locked/outdated errors reftable/stack: fix error handling in `reftable_stack_init_addition()`
2024-03-25refs/reftable: expose auto compaction via new flagPatrick Steinhardt1-1/+4
Under normal circumstances, the "reftable" backend will automatically perform compaction after appending to the stack. It is thus not necessary and may even be considered wasteful to run git-pack-refs(1) in "reftable"-backed repositories as it will cause the backend to compact all tables into a single one. We do exactly that though when running `git maintenance run --auto` or `git gc --auto`, which gets spawned by Git after running some specific commands. The `--auto` mode is typically only executing optimizations as needed. To do so, we already use several heuristics for the various different data structures in Git to determine whether to optimize them or not. We do not use any heuristics for refs though and instead always optimize them. Introduce a new `PACK_REFS_AUTO` flag that can be passed to the backend. When not handled by the backend we will continue to behave the exact same as we do right now, that is we optimize refs unconditionally. This is done for the "files" backend for now to retain current behaviour, even though we may eventually also want to introduce heuristics here. For the "reftable" backend though we already do have auto-compaction, so we can easily reuse that logic to implement the new auto-packing flag. Note that under normal circumstances, this should always end up being a no-op. After all, we already invoke the code for every single addition to the stack. But there are special cases where it can still be helpful to execute the auto-compaction code explicitly: - Concurrent writers may cause compaction to not run due to locks. - Callers may decide to disable compaction altogether and then pack refs at a later point due to various reasons. - Other implementations of the reftable format may do compaction differently or even not at all. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-25refs/reftable: print errors on compaction failurePatrick Steinhardt1-1/+5
When git-pack-refs(1) fails in the reftable backend we end up printing no error message at all, leaving the caller puzzled as to why compaction has failed. Fix this. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-21Merge branch 'ps/reftable-reflog-iteration-perf'Junio C Hamano1-36/+16
The code to iterate over reflogs in the reftable has been optimized to reduce memory allocation and deallocation. Reviewed-by: Josh Steadmon <steadmon@google.com> cf. <Ze9eX-aaWoVaqsPP@google.com> * ps/reftable-reflog-iteration-perf: refs/reftable: track last log record name via strbuf reftable/record: use scratch buffer when decoding records reftable/record: reuse message when decoding log records reftable/record: reuse refnames when decoding log records reftable/record: avoid copying author info reftable/record: convert old and new object IDs to arrays refs/reftable: reload correct stack when creating reflog iter
2024-03-14Merge branch 'ps/reftable-iteration-perf-part2'Junio C Hamano1-2/+4
The code to iterate over refs with the reftable backend has seen some optimization. * ps/reftable-iteration-perf-part2: refs/reftable: precompute prefix length reftable: allow inlining of a few functions reftable/record: decode keys in place reftable/record: reuse refname when copying reftable/record: reuse refname when decoding reftable/merged: avoid duplicate pqueue emptiness check reftable/merged: circumvent pqueue with single subiter reftable/merged: handle subiter cleanup on close only reftable/merged: remove unnecessary null check for subiters reftable/merged: make subiters own their records reftable/merged: advance subiter on subsequent iteration reftable/merged: make `merged_iter` structure private reftable/pq: use `size_t` to track iterator index
2024-03-07Merge branch 'ps/reftable-repo-init-fix'Junio C Hamano1-0/+1
Clear the fallout from a fix for 2.44 regression. * ps/reftable-repo-init-fix: t0610: remove unused variable assignment refs/reftable: don't fail empty transactions in repo without HEAD
2024-03-05Merge branch 'kn/for-all-refs'Junio C Hamano3-38/+106
"git for-each-ref" learned "--include-root-refs" option to show even the stuff outside the 'refs/' hierarchy. * kn/for-all-refs: for-each-ref: add new option to include root refs ref-filter: rename 'FILTER_REFS_ALL' to 'FILTER_REFS_REGULAR' refs: introduce `refs_for_each_include_root_refs()` refs: extract out `loose_fill_ref_dir_regular_file()` refs: introduce `is_pseudoref()` and `is_headref()`
2024-03-05refs/reftable: track last log record name via strbufPatrick Steinhardt1-5/+6
The reflog iterator enumerates all reflogs known to a ref backend. In the "reftable" backend there is no way to list all existing reflogs directly. Instead, we have to iterate through all reflog entries and discard all those redundant entries for which we have already returned a reflog entry. This logic is implemented by tracking the last reflog name that we have emitted to the iterator's user. If the next log record has the same name we simply skip it until we find another record with a different refname. This last reflog name is stored in a simple C string, which requires us to free and reallocate it whenever we need to update the reflog name. Convert it to use a `struct strbuf` instead, which reduces the number of allocations. Before: HEAP SUMMARY: in use at exit: 13,473 bytes in 122 blocks total heap usage: 1,068,485 allocs, 1,068,363 frees, 281,122,886 bytes allocated After: HEAP SUMMARY: in use at exit: 13,473 bytes in 122 blocks total heap usage: 68,485 allocs, 68,363 frees, 256,234,072 bytes allocated Note that even after this change we still allocate quite a lot of data, even though the number of allocations does not scale with the number of log records anymore. This remainder comes mostly from decompressing the log blocks, where we decompress each block into newly allocated memory. This will be addressed at a later point in time. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-05reftable/record: convert old and new object IDs to arraysPatrick Steinhardt1-30/+9
In 7af607c58d (reftable/record: store "val1" hashes as static arrays, 2024-01-03) and b31e3cc620 (reftable/record: store "val2" hashes as static arrays, 2024-01-03) we have converted ref records to store their object IDs in a static array. Convert log records to do the same so that their old and new object IDs are arrays, too. This change results in two allocations less per log record that we're iterating over. Before: HEAP SUMMARY: in use at exit: 13,473 bytes in 122 blocks total heap usage: 8,068,495 allocs, 8,068,373 frees, 401,011,862 bytes allocated After: HEAP SUMMARY: in use at exit: 13,473 bytes in 122 blocks total heap usage: 6,068,489 allocs, 6,068,367 frees, 361,011,822 bytes allocated Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-05refs/reftable: reload correct stack when creating reflog iterPatrick Steinhardt1-1/+1
When creating a new reflog iterator, we first have to reload the stack that the iterator is being created. This is done so that any concurrent writes to the stack are reflected. But `reflog_iterator_for_stack()` always reloads the main stack, which is wrong. Fix this and reload the correct stack. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-05Merge branch 'ps/reftable-iteration-perf-part2' into ↵Junio C Hamano1-2/+4
ps/reftable-reflog-iteration-perf * ps/reftable-iteration-perf-part2: refs/reftable: precompute prefix length reftable: allow inlining of a few functions reftable/record: decode keys in place reftable/record: reuse refname when copying reftable/record: reuse refname when decoding reftable/merged: avoid duplicate pqueue emptiness check reftable/merged: circumvent pqueue with single subiter reftable/merged: handle subiter cleanup on close only reftable/merged: remove unnecessary null check for subiters reftable/merged: make subiters own their records reftable/merged: advance subiter on subsequent iteration reftable/merged: make `merged_iter` structure private reftable/pq: use `size_t` to track iterator index
2024-03-04refs/reftable: precompute prefix lengthPatrick Steinhardt1-2/+4
We're recomputing the prefix length on every iteration of the ref iterator. Precompute it for another speedup when iterating over 1 million refs: Benchmark 1: show-ref: single matching ref (revision = HEAD~) Time (mean ± σ): 100.3 ms ± 3.7 ms [User: 97.3 ms, System: 2.8 ms] Range (min … max): 97.5 ms … 139.7 ms 1000 runs Benchmark 2: show-ref: single matching ref (revision = HEAD) Time (mean ± σ): 95.8 ms ± 3.4 ms [User: 92.9 ms, System: 2.8 ms] Range (min … max): 93.0 ms … 121.9 ms 1000 runs Summary show-ref: single matching ref (revision = HEAD) ran 1.05 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~) Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-27refs/reftable: don't fail empty transactions in repo without HEADPatrick Steinhardt1-0/+1
Under normal circumstances, it shouldn't ever happen that a repository has no HEAD reference. In fact, git-update-ref(1) would fail any request to delete the HEAD reference, and a newly initialized repository always pre-creates it, too. We have however changed git-clone(1) to partially initialize the refdb just up to the point where remote helpers can find the repository. With that change, we are going to run into a situation where repositories have no refs at all. Now there is a very particular edge case in this situation: when preparing an empty ref transacton, we end up returning whatever value `read_ref_without_reload()` returned to the caller. Under normal conditions this would be fine: "HEAD" should usually exist, and thus the function would return `0`. But if "HEAD" doesn't exist, the function returns a positive value which we end up returning to the caller. Fix this bug by resetting the return code to `0` and add a test. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-23for-each-ref: add new option to include root refsKarthik Nayak1-4/+7
The git-for-each-ref(1) command doesn't provide a way to print root refs i.e pseudorefs and HEAD with the regular "refs/" prefixed refs. This commit adds a new option "--include-root-refs" to git-for-each-ref(1). When used this would also print pseudorefs and HEAD for the current worktree. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-23refs: introduce `refs_for_each_include_root_refs()`Karthik Nayak2-5/+66
Introduce a new ref iteration flag `DO_FOR_EACH_INCLUDE_ROOT_REFS`, which will be used to iterate over regular refs plus pseudorefs and HEAD. Refs which fall outside the `refs/` and aren't either pseudorefs or HEAD are more of a grey area. This is because we don't block the users from creating such refs but they are not officially supported. Introduce `refs_for_each_include_root_refs()` which calls `do_for_each_ref()` with this newly introduced flag. In `refs/files-backend.c`, introduce a new function `add_pseudoref_and_head_entries()` to add pseudorefs and HEAD to the `ref_dir`. We then finally call `add_pseudoref_and_head_entries()` whenever the `DO_FOR_EACH_INCLUDE_ROOT_REFS` flag is set. Any new ref backend will also have to implement similar changes on its end. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-23refs: extract out `loose_fill_ref_dir_regular_file()`Karthik Nayak1-29/+33
Extract out the code for adding a single file to the loose ref dir as `loose_fill_ref_dir_regular_file()` from `loose_fill_ref_dir()` in `refs/files-backend.c`. This allows us to use this function independently in the following commits where we add code to also add pseudorefs to the ref dir. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-21refs: stop resolving ref corresponding to reflogsPatrick Steinhardt2-14/+4
The reflog iterator tries to resolve the corresponding ref for every reflog that it is about to yield. Historically, this was done due to multiple reasons: - It ensures that the refname is safe because we end up calling `check_refname_format()`. Also, non-conformant refnames are skipped altogether. - The iterator used to yield the resolved object ID as well as its flags to the callback. This info was never used though, and the corresponding parameters were dropped in the preceding commit. - When a ref is corrupt then the reflog is not emitted at all. We're about to introduce a new `git reflog list` subcommand that will print all reflogs that the refdb knows about. Skipping over reflogs whose refs are corrupted would be quite counterproductive in this case as the user would have no way to learn about reflogs which may still exist in their repository to help and rescue such a corrupted ref. Thus, the only remaining reason for why we'd want to resolve the ref is to verify its refname. Refactor the code to call `check_refname_format()` directly instead of trying to resolve the ref. This is significantly more efficient given that we don't have to hit the object database anymore to list reflogs. And second, it ensures that we end up showing reflogs of broken refs, which will help to make the reflog more useful. Note that this really only impacts the case where the corresponding ref is corrupt. Reflogs for nonexistent refs would have been returned to the caller beforehand already as we did not pass `RESOLVE_REF_READING` to the function, and thus `refs_resolve_ref_unsafe()` would have returned successfully in that case. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-21refs: drop unused params from the reflog iterator callbackPatrick Steinhardt2-14/+2
The ref and reflog iterators share much of the same underlying code to iterate over the corresponding entries. This results in some weird code because the reflog iterator also exposes an object ID as well as a flag to the callback function. Neither of these fields do refer to the reflog though -- they refer to the corresponding ref with the same name. This is quite misleading. In practice at least the object ID cannot really be implemented in any other way as a reflog does not have a specific object ID in the first place. This is further stressed by the fact that none of the callbacks except for our test helper make use of these fields. Split up the infrastucture so that ref and reflog iterators use separate callback signatures. This allows us to drop the nonsensical fields from the reflog iterator. Note that internally, the backends still use the same shared infra to iterate over both types. As the backends should never end up being called directly anyway, this is not much of a problem and thus kept as-is for simplicity's sake. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-21refs: always treat iterators as orderedPatrick Steinhardt7-46/+20
In the preceding commit we have converted the reflog iterator of the "files" backend to be ordered, which was the only remaining ref iterator that wasn't ordered. Refactor the ref iterator infrastructure so that we always assume iterators to be ordered, thus simplifying the code. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-21refs/files: sort merged worktree and common reflogsPatrick Steinhardt4-73/+56
When iterating through reflogs in a worktree we create a merged iterator that merges reflogs from both refdbs. The resulting refs are ordered so that instead we first return all worktree reflogs before we return all common refs. This is the only remaining case where a ref iterator returns entries in a non-lexicographic order. The result would look something like the following (listed with a command we introduce in a subsequent commit): ``` $ git reflog list HEAD refs/worktree/per-worktree refs/heads/main refs/heads/wt ``` So we first print the per-worktree reflogs in lexicographic order, then the common reflogs in lexicographic order. This is confusing and not consistent with how we print per-worktree refs, which are exclusively sorted lexicographically. Sort reflogs lexicographically in the same way as we sort normal refs. As this is already implemented properly by the "reftable" backend via a separate selection function, we simply pull out that logic and reuse it for the "files" backend. As logs are properly sorted now, mark the merged reflog iterator as sorted. Tests will be added in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-21refs/files: sort reflogs returned by the reflog iteratorPatrick Steinhardt1-2/+2
We use a directory iterator to return reflogs via the reflog iterator. This iterator returns entries in the same order as readdir(3P) would and will thus yield reflogs with no discernible order. Set the new `DIR_ITERATOR_SORTED` flag that was introduced in the preceding commit so that the order is deterministic. While the effect of this can only been observed in a test tool, a subsequent commit will start to expose this functionality to users via a new `git reflog list` subcommand. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-07refs/reftable: fix leak when copying reflog failsPatrick Steinhardt1-1/+1
When copying a ref with the reftable backend we also copy the corresponding log records. When seeking the first log record that we're about to copy fails though we directly return from `write_copy_table()` without doing any cleanup, leaking several allocated data structures. Fix this by exiting via our common cleanup logic instead. Reported-by: Jeff King <peff@peff.net> via Coverity Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-07refs: introduce reftable backendPatrick Steinhardt2-0/+2298
Due to scalability issues, Shawn Pearce has originally proposed a new "reftable" format more than six years ago [1]. Initially, this new format was implemented in JGit with promising results. Around two years ago, we have then added the "reftable" library to the Git codebase via a4bbd13be3 (Merge branch 'hn/reftable', 2021-12-15). With this we have landed all the low-level code to read and write reftables. Notably missing though was the integration of this low-level code into the Git code base in the form of a new ref backend that ties all of this together. This gap is now finally closed by introducing a new "reftable" backend into the Git codebase. This new backend promises to bring some notable improvements to Git repositories: - It becomes possible to do truly atomic writes where either all refs are committed to disk or none are. This was not possible with the "files" backend because ref updates were split across multiple loose files. - The disk space required to store many refs is reduced, both compared to loose refs and packed-refs. This is enabled both by the reftable format being a binary format, which is more compact, and by prefix compression. - We can ignore filesystem-specific behaviour as ref names are not encoded via paths anymore. This means there is no need to handle case sensitivity on Windows systems or Unicode precomposition on macOS. - There is no need to rewrite the complete refdb anymore every time a ref is being deleted like it was the case for packed-refs. This means that ref deletions are now constant time instead of scaling linearly with the number of refs. - We can ignore file/directory conflicts so that it becomes possible to store both "refs/heads/foo" and "refs/heads/foo/bar". - Due to this property we can retain reflogs for deleted refs. We have previously been deleting reflogs together with their refs to avoid file/directory conflicts, which is not necessary anymore. - We can properly enumerate all refs. With the "files" backend it is not easily possible to distinguish between refs and non-refs because they may live side by side in the gitdir. Not all of these improvements are realized with the current "reftable" backend implementation. At this point, the new backend is supposed to be a drop-in replacement for the "files" backend that is used by basically all Git repositories nowadays. It strives for 1:1 compatibility, which means that a user can expect the same behaviour regardless of whether they use the "reftable" backend or the "files" backend for most of the part. Most notably, this means we artificially limit the capabilities of the "reftable" backend to match the limits of the "files" backend. It is not possible to create refs that would end up with file/directory conflicts, we do not retain reflogs, we perform stricter-than-necessary checks. This is done intentionally due to two main reasons: - It makes it significantly easier to land the "reftable" backend as tests behave the same. It would be tough to argue for each and every single test that doesn't pass with the "reftable" backend. - It ensures compatibility between repositories that use the "files" backend and repositories that use the "reftable" backend. Like this, hosters can migrate their repositories to use the "reftable" backend without causing issues for clients that use the "files" backend in their clones. It is expected that these artificial limitations may eventually go away in the long term. Performance-wise things very much depend on the actual workload. The following benchmarks compare the "files" and "reftable" backends in the current version: - Creating N refs in separate transactions shows that the "files" backend is ~50% faster. This is not surprising given that creating a ref only requires us to create a single loose ref. The "reftable" backend will also perform auto compaction on updates. In real-world workloads we would likely also want to perform pack loose refs, which would likely change the picture. Benchmark 1: update-ref: create refs sequentially (refformat = files, refcount = 1) Time (mean ± σ): 2.1 ms ± 0.3 ms [User: 0.6 ms, System: 1.7 ms] Range (min … max): 1.8 ms … 4.3 ms 133 runs Benchmark 2: update-ref: create refs sequentially (refformat = reftable, refcount = 1) Time (mean ± σ): 2.7 ms ± 0.1 ms [User: 0.6 ms, System: 2.2 ms] Range (min … max): 2.4 ms … 2.9 ms 132 runs Benchmark 3: update-ref: create refs sequentially (refformat = files, refcount = 1000) Time (mean ± σ): 1.975 s ± 0.006 s [User: 0.437 s, System: 1.535 s] Range (min … max): 1.969 s … 1.980 s 3 runs Benchmark 4: update-ref: create refs sequentially (refformat = reftable, refcount = 1000) Time (mean ± σ): 2.611 s ± 0.013 s [User: 0.782 s, System: 1.825 s] Range (min … max): 2.597 s … 2.622 s 3 runs Benchmark 5: update-ref: create refs sequentially (refformat = files, refcount = 100000) Time (mean ± σ): 198.442 s ± 0.241 s [User: 43.051 s, System: 155.250 s] Range (min … max): 198.189 s … 198.670 s 3 runs Benchmark 6: update-ref: create refs sequentially (refformat = reftable, refcount = 100000) Time (mean ± σ): 294.509 s ± 4.269 s [User: 104.046 s, System: 190.326 s] Range (min … max): 290.223 s … 298.761 s 3 runs - Creating N refs in a single transaction shows that the "files" backend is significantly slower once we start to write many refs. The "reftable" backend only needs to update two files, whereas the "files" backend needs to write one file per ref. Benchmark 1: update-ref: create many refs (refformat = files, refcount = 1) Time (mean ± σ): 1.9 ms ± 0.1 ms [User: 0.4 ms, System: 1.4 ms] Range (min … max): 1.8 ms … 2.6 ms 151 runs Benchmark 2: update-ref: create many refs (refformat = reftable, refcount = 1) Time (mean ± σ): 2.5 ms ± 0.1 ms [User: 0.7 ms, System: 1.7 ms] Range (min … max): 2.4 ms … 3.4 ms 148 runs Benchmark 3: update-ref: create many refs (refformat = files, refcount = 1000) Time (mean ± σ): 152.5 ms ± 5.2 ms [User: 19.1 ms, System: 133.1 ms] Range (min … max): 148.5 ms … 167.8 ms 15 runs Benchmark 4: update-ref: create many refs (refformat = reftable, refcount = 1000) Time (mean ± σ): 58.0 ms ± 2.5 ms [User: 28.4 ms, System: 29.4 ms] Range (min … max): 56.3 ms … 72.9 ms 40 runs Benchmark 5: update-ref: create many refs (refformat = files, refcount = 1000000) Time (mean ± σ): 152.752 s ± 0.710 s [User: 20.315 s, System: 131.310 s] Range (min … max): 152.165 s … 153.542 s 3 runs Benchmark 6: update-ref: create many refs (refformat = reftable, refcount = 1000000) Time (mean ± σ): 51.912 s ± 0.127 s [User: 26.483 s, System: 25.424 s] Range (min … max): 51.769 s … 52.012 s 3 runs - Deleting a ref in a fully-packed repository shows that the "files" backend scales with the number of refs. The "reftable" backend has constant-time deletions. Benchmark 1: update-ref: delete ref (refformat = files, refcount = 1) Time (mean ± σ): 1.7 ms ± 0.1 ms [User: 0.4 ms, System: 1.2 ms] Range (min … max): 1.6 ms … 2.1 ms 316 runs Benchmark 2: update-ref: delete ref (refformat = reftable, refcount = 1) Time (mean ± σ): 1.8 ms ± 0.1 ms [User: 0.4 ms, System: 1.3 ms] Range (min … max): 1.7 ms … 2.1 ms 294 runs Benchmark 3: update-ref: delete ref (refformat = files, refcount = 1000) Time (mean ± σ): 2.0 ms ± 0.1 ms [User: 0.5 ms, System: 1.4 ms] Range (min … max): 1.9 ms … 2.5 ms 287 runs Benchmark 4: update-ref: delete ref (refformat = reftable, refcount = 1000) Time (mean ± σ): 1.9 ms ± 0.1 ms [User: 0.5 ms, System: 1.3 ms] Range (min … max): 1.8 ms … 2.1 ms 217 runs Benchmark 5: update-ref: delete ref (refformat = files, refcount = 1000000) Time (mean ± σ): 229.8 ms ± 7.9 ms [User: 182.6 ms, System: 46.8 ms] Range (min … max): 224.6 ms … 245.2 ms 6 runs Benchmark 6: update-ref: delete ref (refformat = reftable, refcount = 1000000) Time (mean ± σ): 2.0 ms ± 0.0 ms [User: 0.6 ms, System: 1.3 ms] Range (min … max): 2.0 ms … 2.1 ms 3 runs - Listing all refs shows no significant advantage for either of the backends. The "files" backend is a bit faster, but not by a significant margin. When repositories are not packed the "reftable" backend outperforms the "files" backend because the "reftable" backend performs auto-compaction. Benchmark 1: show-ref: print all refs (refformat = files, refcount = 1, packed = true) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 2.0 ms 1729 runs Benchmark 2: show-ref: print all refs (refformat = reftable, refcount = 1, packed = true) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 1.8 ms 1816 runs Benchmark 3: show-ref: print all refs (refformat = files, refcount = 1000, packed = true) Time (mean ± σ): 4.3 ms ± 0.1 ms [User: 0.9 ms, System: 3.3 ms] Range (min … max): 4.1 ms … 4.6 ms 645 runs Benchmark 4: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = true) Time (mean ± σ): 4.5 ms ± 0.2 ms [User: 1.0 ms, System: 3.3 ms] Range (min … max): 4.2 ms … 5.9 ms 643 runs Benchmark 5: show-ref: print all refs (refformat = files, refcount = 1000000, packed = true) Time (mean ± σ): 2.537 s ± 0.034 s [User: 0.488 s, System: 2.048 s] Range (min … max): 2.511 s … 2.627 s 10 runs Benchmark 6: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = true) Time (mean ± σ): 2.712 s ± 0.017 s [User: 0.653 s, System: 2.059 s] Range (min … max): 2.692 s … 2.752 s 10 runs Benchmark 7: show-ref: print all refs (refformat = files, refcount = 1, packed = false) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 1.9 ms 1834 runs Benchmark 8: show-ref: print all refs (refformat = reftable, refcount = 1, packed = false) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.4 ms … 2.0 ms 1840 runs Benchmark 9: show-ref: print all refs (refformat = files, refcount = 1000, packed = false) Time (mean ± σ): 13.8 ms ± 0.2 ms [User: 2.8 ms, System: 10.8 ms] Range (min … max): 13.3 ms … 14.5 ms 208 runs Benchmark 10: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = false) Time (mean ± σ): 4.5 ms ± 0.2 ms [User: 1.2 ms, System: 3.3 ms] Range (min … max): 4.3 ms … 6.2 ms 624 runs Benchmark 11: show-ref: print all refs (refformat = files, refcount = 1000000, packed = false) Time (mean ± σ): 12.127 s ± 0.129 s [User: 2.675 s, System: 9.451 s] Range (min … max): 11.965 s … 12.370 s 10 runs Benchmark 12: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = false) Time (mean ± σ): 2.799 s ± 0.022 s [User: 0.735 s, System: 2.063 s] Range (min … max): 2.769 s … 2.836 s 10 runs - Printing a single ref shows no real difference between the "files" and "reftable" backends. Benchmark 1: show-ref: print single ref (refformat = files, refcount = 1) Time (mean ± σ): 1.5 ms ± 0.1 ms [User: 0.4 ms, System: 1.0 ms] Range (min … max): 1.4 ms … 1.8 ms 1779 runs Benchmark 2: show-ref: print single ref (refformat = reftable, refcount = 1) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.4 ms … 2.5 ms 1753 runs Benchmark 3: show-ref: print single ref (refformat = files, refcount = 1000) Time (mean ± σ): 1.5 ms ± 0.1 ms [User: 0.3 ms, System: 1.1 ms] Range (min … max): 1.4 ms … 1.9 ms 1840 runs Benchmark 4: show-ref: print single ref (refformat = reftable, refcount = 1000) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 2.0 ms 1831 runs Benchmark 5: show-ref: print single ref (refformat = files, refcount = 1000000) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 2.1 ms 1848 runs Benchmark 6: show-ref: print single ref (refformat = reftable, refcount = 1000000) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 2.1 ms 1762 runs So overall, performance depends on the usecases. Except for many sequential writes the "reftable" backend is roughly on par or significantly faster than the "files" backend though. Given that the "files" backend has received 18 years of optimizations by now this can be seen as a win. Furthermore, we can expect that the "reftable" backend will grow faster over time when attention turns more towards optimizations. The complete test suite passes, except for those tests explicitly marked to require the REFFILES prerequisite. Some tests in t0610 are marked as failing because they depend on still-in-flight bug fixes. Tests can be run with the new backend by setting the GIT_TEST_DEFAULT_REF_FORMAT environment variable to "reftable". There is a single known conceptual incompatibility with the dumb HTTP transport. As "info/refs" SHOULD NOT contain the HEAD reference, and because the "HEAD" file is not valid anymore, it is impossible for the remote client to figure out the default branch without changing the protocol. This shortcoming needs to be handled in a subsequent patch series. As the reftable library has already been introduced a while ago, this commit message will not go into the details of how exactly the on-disk format works. Please refer to our preexisting technical documentation at Documentation/technical/reftable for this. [1]: https://public-inbox.org/git/CAJo=hJtyof=HRy=2sLP0ng0uZ4=S-DpZ5dR1aF+VHVETKG20OQ@mail.gmail.com/ Original-idea-by: Shawn Pearce <spearce@spearce.org> Based-on-patch-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-26Merge branch 'ps/worktree-refdb-initialization'Junio C Hamano4-9/+37
Instead of manually creating refs/ hierarchy on disk upon a creation of a secondary worktree, which is only usable via the files backend, use the refs API to populate it. * ps/worktree-refdb-initialization: builtin/worktree: create refdb via ref backend worktree: expose interface to look up worktree by name builtin/worktree: move setup of commondir file earlier refs/files: skip creation of "refs/{heads,tags}" for worktrees setup: move creation of "refs/" into the files backend refs: prepare `refs_init_db()` for initializing worktree refs
2024-01-16Merge branch 'ps/refstorage-extension'Junio C Hamano4-4/+0
Introduce a new extension "refstorage" so that we can mark a repository that uses a non-default ref backend, like reftable. * ps/refstorage-extension: t9500: write "extensions.refstorage" into config builtin/clone: introduce `--ref-format=` value flag builtin/init: introduce `--ref-format=` value flag builtin/rev-parse: introduce `--show-ref-format` flag t: introduce GIT_TEST_DEFAULT_REF_FORMAT envvar setup: introduce GIT_DEFAULT_REF_FORMAT envvar setup: introduce "extensions.refStorage" extension setup: set repository's formats on init setup: start tracking ref storage format refs: refactor logic to look up storage backends worktree: skip reading HEAD when repairing worktrees t: introduce DEFAULT_REPO_FORMAT prereq
2024-01-08Merge branch 'en/header-cleanup'Junio C Hamano3-4/+0
Remove unused header "#include". * en/header-cleanup: treewide: remove unnecessary includes in source files treewide: add direct includes currently only pulled in transitively trace2/tr2_tls.h: remove unnecessary include submodule-config.h: remove unnecessary include pkt-line.h: remove unnecessary include line-log.h: remove unnecessary include http.h: remove unnecessary include fsmonitor--daemon.h: remove unnecessary includes blame.h: remove unnecessary includes archive.h: remove unnecessary include treewide: remove unnecessary includes in source files treewide: remove unnecessary includes from header files
2024-01-08refs/files: skip creation of "refs/{heads,tags}" for worktreesPatrick Steinhardt1-8/+14
The files ref backend will create both "refs/heads" and "refs/tags" in the Git directory. While this logic makes sense for normal repositories, it does not for worktrees because those refs are "common" refs that would always be contained in the main repository's ref database. Introduce a new flag telling the backend that it is expected to create a per-worktree ref database and skip creation of these dirs in the files backend when the flag is set. No other backends (currently) need worktree-specific logic, so this is the only required change to start creating per-worktree ref databases via `refs_init_db()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-08setup: move creation of "refs/" into the files backendPatrick Steinhardt1-0/+17
When creating the ref database we unconditionally create the "refs/" directory in "setup.c". This is a mandatory prerequisite for all Git repositories regardless of the ref backend in use, because Git will be unable to detect the directory as a repository if "refs/" doesn't exist. We are about to add another new caller that will want to create a ref database when creating worktrees. We would require the same logic to create the "refs/" directory even though the caller really should not care about such low-level details. Ideally, the ref database should be fully initialized after calling `refs_init_db()`. Move the code to create the directory into the files backend itself to make it so. This means that future ref backends will also need to have equivalent logic around to ensure that the directory exists, but it seems a lot more sensible to have it this way round than to require callers to create the directory themselves. An alternative to this would be to create "refs/" in `refs_init_db()` directly. This feels conceptually unclean though as the creation of the refdb is now cluttered across different callsites. Furthermore, both the "files" and the upcoming "reftable" backend write backend-specific data into the "refs/" directory anyway, so splitting up this logic would only make it harder to reason about. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-08refs: prepare `refs_init_db()` for initializing worktree refsPatrick Steinhardt4-4/+9
The purpose of `refs_init_db()` is to initialize the on-disk files of a new ref database. The function is quite inflexible right now though, as callers can neither specify the `struct ref_store` nor can they pass any flags. Refactor the interface to accept both of these. This will be required so that we can start initializing per-worktree ref databases via the ref backend instead of open-coding the initialization in "worktree.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-02refs: refactor logic to look up storage backendsPatrick Steinhardt4-4/+0
In order to look up ref storage backends, we're currently using a linked list of backends, where each backend is expected to set up its `next` pointer to the next ref storage backend. This is kind of a weird setup as backends need to be aware of other backends without much of a reason. Refactor the code so that the array of backends is centrally defined in "refs.c", where each backend is now identified by an integer constant. Expose functions to translate from those integer constants to the name and vice versa, which will be required by subsequent patches. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-26treewide: remove unnecessary includes in source filesElijah Newren3-4/+0
Each of these were checked with gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE} to ensure that removing the direct inclusion of the header actually resulted in that header no longer being included at all (i.e. that no other header pulled it in transitively). ...except for a few cases where we verified that although the header was brought in transitively, nothing from it was directly used in that source file. These cases were: * builtin/credential-cache.c * builtin/pull.c * builtin/send-pack.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-17refs: remove `delete_refs` callback from backendsPatrick Steinhardt4-32/+0
Now that `refs_delete_refs` is implemented in a generic way via the ref transaction interfaces there are no callers left that invoke the `delete_refs` callback anymore. Remove it from all of our backends. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-17refs: deduplicate code to delete referencesPatrick Steinhardt2-89/+2
Both the files and the packed-refs reference backends now use the same generic transactions-based code to delete references. Let's pull these implementations up into `refs_delete_refs()` to deduplicate the code. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-17refs/files: use transactions to delete referencesPatrick Steinhardt1-32/+34
In the `files_delete_refs()` callback function of the files backend we implement deletion of references. This is done in two steps: 1. We lock the packed-refs file and delete all references from it in a single transaction. 2. We delete all loose references via separate calls to `refs_delete_ref()`. These steps essentially duplicate the logic around locking and deletion order that we already have in the transactional interfaces, where we do know to lock and evict references from the packed-refs file. Despite the fact that we duplicate the logic, it's also less efficient than if we used a single generic transaction: - The transactional interface knows to skip locking of the packed refs in case they don't contain any of the refs which are about to be deleted. - We end up creating N+1 separate reference transactions, one for the packed-refs file and N for the individual loose references. Refactor the code to instead delete references via a single transaction. As we don't assert the expected old object ID this is equivalent to the previous behaviour, and we already do the same in the packed-refs backend. Despite the fact that the result is simpler to reason about, this change also results in improved performance. The following benchmarks have been executed in linux.git: ``` $ hyperfine -n '{rev}, packed={packed} refcount={refcount}' \ -L packed true,false -L refcount 1,1000 -L rev master,pks-ref-store-generic-delete-refs \ --setup 'git -C /home/pks/Development/git switch --detach {rev} && make -C /home/pks/Development/git -j17' \ --prepare 'printf "create refs/heads/new-branch-%d HEAD\n" $(seq {refcount}) | git -C /home/pks/Reproduction/linux.git update-ref --stdin && if test {packed} = true; then git pack-refs --all; fi' \ --warmup=10 \ '/home/pks/Development/git/bin-wrappers/git -C /home/pks/Reproduction/linux.git branch -d new-branch-{1..{refcount}}' Benchmark 1: master packed=true refcount=1 Time (mean ± σ): 7.8 ms ± 1.6 ms [User: 3.4 ms, System: 4.4 ms] Range (min … max): 5.5 ms … 11.0 ms 120 runs Benchmark 2: master packed=false refcount=1 Time (mean ± σ): 7.0 ms ± 1.1 ms [User: 3.2 ms, System: 3.8 ms] Range (min … max): 5.7 ms … 9.8 ms 180 runs Benchmark 3: master packed=true refcount=1000 Time (mean ± σ): 283.8 ms ± 5.2 ms [User: 45.7 ms, System: 231.5 ms] Range (min … max): 276.7 ms … 291.6 ms 10 runs Benchmark 4: master packed=false refcount=1000 Time (mean ± σ): 284.4 ms ± 5.3 ms [User: 44.2 ms, System: 233.6 ms] Range (min … max): 277.1 ms … 293.3 ms 10 runs Benchmark 5: generic-delete-refs packed=true refcount=1 Time (mean ± σ): 6.2 ms ± 1.8 ms [User: 2.3 ms, System: 3.9 ms] Range (min … max): 4.1 ms … 12.2 ms 142 runs Benchmark 6: generic-delete-refs packed=false refcount=1 Time (mean ± σ): 7.1 ms ± 1.4 ms [User: 2.8 ms, System: 4.3 ms] Range (min … max): 4.2 ms … 10.8 ms 157 runs Benchmark 7: generic-delete-refs packed=true refcount=1000 Time (mean ± σ): 198.9 ms ± 1.7 ms [User: 29.5 ms, System: 165.7 ms] Range (min … max): 196.1 ms … 201.4 ms 10 runs Benchmark 8: generic-delete-refs packed=false refcount=1000 Time (mean ± σ): 199.7 ms ± 7.8 ms [User: 32.2 ms, System: 163.2 ms] Range (min … max): 193.8 ms … 220.7 ms 10 runs Summary generic-delete-refs packed=true refcount=1 ran 1.14 ± 0.37 times faster than master packed=false refcount=1 1.15 ± 0.40 times faster than generic-delete-refs packed=false refcount=1 1.26 ± 0.44 times faster than master packed=true refcount=1 32.24 ± 9.17 times faster than generic-delete-refs packed=true refcount=1000 32.36 ± 9.29 times faster than generic-delete-refs packed=false refcount=1000 46.00 ± 13.10 times faster than master packed=true refcount=1000 46.10 ± 13.13 times faster than master packed=false refcount=1000 ``` Especially in the case where we have many references we can see a clear performance speedup of nearly 30%. This is in contrast to the stated objecive in a27dcf89b68 (refs: make delete_refs() virtual, 2016-09-04), where the virtual `delete_refs()` function was introduced with the intent to speed things up rather than making things slower. So it seems like we have outlived the need for a virtual function. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-09files-backend.c: avoid stat in 'loose_fill_ref_dir'Victoria Dye1-9/+5
Modify the 'readdir' loop in 'loose_fill_ref_dir' to, rather than 'stat' a file to determine whether it is a directory or not, use 'get_dtype'. Currently, the loop uses 'stat' to determine whether each dirent is a directory itself or not in order to construct the appropriate ref cache entry. If 'stat' fails (returning a negative value), the dirent is silently skipped; otherwise, 'S_ISDIR(st.st_mode)' is used to check whether the entry is a directory. On platforms that include an entry's d_type in in the 'dirent' struct, this extra 'stat' check is redundant. We can use the 'get_dtype' method to extract this information on platforms that support it (i.e. where NO_D_TYPE_IN_DIRENT is unset), and derive it with 'stat' on platforms that don't. Because 'stat' is an expensive call, this confers a modest-but-noticeable performance improvement when iterating over large numbers of refs (approximately 20% speedup in 'git for-each-ref' in a 30k ref repo). Unlike other existing usage of 'get_dtype', the 'follow_symlinks' arg is set to 1 to replicate the existing handling of symlink dirents. This unfortunately requires calling 'stat' on the associated entry regardless of platform, but symlinks in the loose ref store are highly unlikely since they'd need to be created manually by a user. Note that this patch also changes the condition for skipping creation of a ref entry from "when 'stat' fails" to "when the d_type is anything other than DT_REG or DT_DIR". If a dirent's d_type is DT_UNKNOWN (either because the platform doesn't support d_type in dirents or some other reason) or DT_LNK, 'get_dtype' will try to derive the underlying type with 'stat'. If the 'stat' fails, the d_type will remain 'DT_UNKNOWN' and dirent will be skipped. However, it will also be skipped if it is any other valid d_type (e.g. DT_FIFO for named pipes, DT_LNK for a nested symlink). Git does not handle these properly anyway, so we can safely constrain accepted types to directories and regular files. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-09ref-cache.c: fix prefix matching in ref iterationVictoria Dye1-1/+2
Update 'cache_ref_iterator_advance' to skip over refs that are not matched by the given prefix. Currently, a ref entry is considered "matched" if the entry name is fully contained within the prefix: * prefix: "refs/heads/v1" * entry: "refs/heads/v1.0" OR if the prefix is fully contained in the entry name: * prefix: "refs/heads/v1.0" * entry: "refs/heads/v1" The first case is always correct, but the second is only correct if the ref cache entry is a directory, for example: * prefix: "refs/heads/example" * entry: "refs/heads/" Modify the logic in 'cache_ref_iterator_advance' to reflect these expectations: 1. If 'overlaps_prefix' returns 'PREFIX_EXCLUDES_DIR', then the prefix and ref cache entry do not overlap at all. Skip this entry. 2. If 'overlaps_prefix' returns 'PREFIX_WITHIN_DIR', then the prefix matches inside this entry if it is a directory. Skip if the entry is not a directory, otherwise iterate over it. 3. Otherwise, 'overlaps_prefix' returned 'PREFIX_CONTAINS_DIR', indicating that the cache entry (directory or not) is fully contained by or equal to the prefix. Iterate over this entry. Note that condition 2 relies on the names of directory entries having the appropriate trailing slash. The existing function documentation of 'create_dir_entry' explicitly calls out the trailing slash requirement, so this is a safe assumption to make. This bug generally doesn't have any user-facing impact, since it requires: 1. using a non-empty prefix without a trailing slash in an iteration like 'for_each_fullref_in', 2. the callback to said iteration not reapplying the original filter (as for-each-ref does) to ensure unmatched refs are skipped, and 3. the repository having one or more refs that match part of, but not all of, the prefix. However, there are some niche scenarios that meet those criteria (specifically, 'rev-parse --bisect' and '(log|show|shortlog) --bisect'). Add tests covering those cases to demonstrate the fix in this patch. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-21Merge branch 'tb/refs-exclusion-and-packed-refs'Junio C Hamano4-29/+192
Enumerating refs in the packed-refs file, while excluding refs that match certain patterns, has been optimized. * tb/refs-exclusion-and-packed-refs: ls-refs.c: avoid enumerating hidden refs where possible upload-pack.c: avoid enumerating hidden refs where possible builtin/receive-pack.c: avoid enumerating hidden references refs.h: implement `hidden_refs_to_excludes()` refs.h: let `for_each_namespaced_ref()` take excluded patterns revision.h: store hidden refs in a `strvec` refs/packed-backend.c: add trace2 counters for jump list refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) refs/packed-backend.c: refactor `find_reference_location()` refs: plumb `exclude_patterns` argument throughout builtin/for-each-ref.c: add `--exclude` option ref-filter.c: parameterize match functions over patterns ref-filter: add `ref_filter_clear()` ref-filter: clear reachable list pointers after freeing ref-filter.h: provide `REF_FILTER_INIT` refs.c: rename `ref_filter`
2023-07-10refs/packed-backend.c: add trace2 counters for jump listTaylor Blau1-0/+2
The previous commit added low-level tests to ensure that the packed-refs iterator did not enumerate excluded sections of the refspace. However, there was no guarantee that these sections weren't being visited, only that they were being suppressed from the output. To harden these tests, add a trace2 counter which tracks the number of regions skipped by the packed-refs iterator, and assert on its value. Suggested-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10refs/packed-backend.c: implement jump lists to avoid excluded pattern(s)Taylor Blau1-6/+157
When iterating through the `packed-refs` file in order to answer a query like: $ git for-each-ref --exclude=refs/__hidden__ it would be useful to avoid walking over all of the entries in `refs/__hidden__/*` when possible, since we know that the ref-filter code is going to throw them away anyways. In certain circumstances, doing so is possible. The algorithm for doing so is as follows: - For each excluded pattern, find the first record that matches it, and the first record that *doesn't* match it (i.e. the location you'd next want to consider when excluding that pattern). - Sort the set of excluded regions from the previous step in ascending order of the first location within the `packed-refs` file that matches. - Clean up the results from the previous step: discard empty regions, and combine adjacent regions. The set of regions which remains is referred to as the "jump list", and never contains any references which should be included in the result set. Then when iterating through the `packed-refs` file, if `iter->pos` is ever contained in one of the regions from the previous steps, advance `iter->pos` past the end of that region, and continue enumeration. Note that we only perform this optimization when none of the excluded pattern(s) have special meta-characters in them. For a pattern like "refs/foo[ac]", the excluded regions ("refs/fooa", "refs/fooc", and everything underneath them) are not connected. A future implementation that handles this case may split the character class (pretending as if two patterns were excluded: "refs/fooa", and "refs/fooc"). There are a few other gotchas worth considering. First, note that the jump list is sorted, so once we jump past a region, we can avoid considering it (or any regions preceding it) again. The member `jump_pos` is used to track the first next-possible region to jump through. Second, note that the jump list is best-effort, since we do not handle loose references, and because of the meta-character issue above. The jump list may not skip past all references which won't appear in the results, but will never skip over a reference which does appear in the result set. In repositories with a large number of hidden references, the speed-up can be significant. Tests here are done with a copy of linux.git with a reference "refs/pull/N" pointing at every commit, as in: $ git rev-list HEAD | awk '{ print "create refs/pull/" NR " " $0 }' | git update-ref --stdin $ git pack-refs --all , it is significantly faster to have `for-each-ref` jump over the excluded references, as opposed to filtering them out after the fact: $ hyperfine \ 'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"' \ 'git.prev for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' \ 'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' Benchmark 1: git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/" Time (mean ± σ): 798.1 ms ± 3.3 ms [User: 687.6 ms, System: 146.4 ms] Range (min … max): 794.5 ms … 805.5 ms 10 runs Benchmark 2: git.prev for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull" Time (mean ± σ): 98.9 ms ± 1.4 ms [User: 93.1 ms, System: 5.7 ms] Range (min … max): 97.0 ms … 104.0 ms 29 runs Benchmark 3: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull" Time (mean ± σ): 4.5 ms ± 0.2 ms [User: 0.7 ms, System: 3.8 ms] Range (min … max): 4.1 ms … 5.8 ms 524 runs Summary 'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' ran 21.87 ± 1.05 times faster than 'git.prev for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' 176.52 ± 8.19 times faster than 'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"' (Comparing stock git and this patch isn't quite fair, since an earlier commit in this series adds a naive implementation of the `--exclude` option. `git.prev` is built from the previous commit and includes this naive implementation). Using the jump list is fairly straightforward (see the changes to `refs/packed-backend.c::next_record()`), but constructing the list is not. To ensure that the construction is correct, add a new suite of tests in t1419 covering various corner cases (overlapping regions, partially overlapping regions, adjacent regions, etc.). Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10refs/packed-backend.c: refactor `find_reference_location()`Taylor Blau1-16/+22
The function `find_reference_location()` is used to perform a binary search-like function over the contents of a repository's `$GIT_DIR/packed-refs` file. The search it implements is unlike a standard binary search in that the records it searches over are not of a fixed width, so the comparison must locate the end of a record before comparing it. Extract the core routine of `find_reference_location()` in order to implement a function in the following patch which will find the first location in the `packed-refs` file that *doesn't* match the given pattern. The behavior of `find_reference_location()` is unchanged. Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10refs: plumb `exclude_patterns` argument throughoutTaylor Blau4-9/+13
The subsequent patch will want to access an optional `excluded_patterns` array within `refs/packed-backend.c` that will cull out certain references matching any of the given patterns on a best-effort basis. To do so, the refs subsystem needs to be updated to pass this value across a number of different locations. Prepare for a future patch by introducing this plumbing now, passing NULLs at top-level APIs in order to make that patch less noisy and more easily readable. Signed-off-by: Taylor Blau <me@ttaylorr.co> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21repository: remove unnecessary include of path.hElijah Newren1-0/+1
This also made it clear that several .c files that depended upon path.h were missing a #include for it; add the missing includes while at it. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21cache.h: remove this no-longer-used headerElijah Newren2-2/+2
Since this header showed up in some places besides just #include statements, update/clean-up/remove those other places as well. Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got away with violating the rule that all files must start with an include of git-compat-util.h (or a short-list of alternate headers that happen to include it first). This change exposed the violation and caused it to stop building correctly; fix it by having it include git-compat-util.h first, as per policy. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21statinfo: move stat_{data,validity} functions from cache/read-cacheElijah Newren1-0/+1
These functions do not depend upon struct cache_entry or struct index_state in any way, and it seems more logical to break them out into this file, especially since statinfo.h already has the struct stat_data declaration. Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-13Merge branch 'jc/pack-ref-exclude-include'Junio C Hamano4-14/+22
"git pack-refs" learns "--include" and "--exclude" to tweak the ref hierarchy to be packed using pattern matching. * jc/pack-ref-exclude-include: pack-refs: teach pack-refs --include option pack-refs: teach --exclude option to exclude refs from being packed docs: clarify git-pack-refs --all will pack all refs
2023-05-12pack-refs: teach pack-refs --include optionJohn Cai1-8/+10
Allow users to be more selective over which refs to pack by adding an --include option to git-pack-refs. The existing options allow some measure of selectivity. By default git-pack-refs packs all tags. --all can be used to include all refs, and the previous commit added the ability to exclude certain refs with --exclude. While these knobs give the user some selection over which refs to pack, it could be useful to give more control. For instance, a repository may have a set of branches that are rarely updated and would benefit from being packed. --include would allow the user to easily include a set of branches to be packed while leaving everything else unpacked. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-12pack-refs: teach --exclude option to exclude refs from being packedJohn Cai4-10/+15
At GitLab, we have a system that creates ephemeral internal refs that don't live long before getting deleted. Having an option to exclude certain refs from a packed-refs file allows these internal references to be deleted much more efficiently. Add an --exclude option to the pack-refs builtin, and use the ref exclusions API to exclude certain refs from being packed into the final packed-refs file Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-09Merge branch 'en/header-split-cache-h-part-2'Junio C Hamano5-1/+8
More header clean-up. * en/header-split-cache-h-part-2: (22 commits) reftable: ensure git-compat-util.h is the first (indirect) include diff.h: reduce unnecessary includes object-store.h: reduce unnecessary includes commit.h: reduce unnecessary includes fsmonitor: reduce includes of cache.h cache.h: remove unnecessary headers treewide: remove cache.h inclusion due to previous changes cache,tree: move basic name compare functions from read-cache to tree cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.c hash-ll.h: split out of hash.h to remove dependency on repository.h tree-diff.c: move S_DIFFTREE_IFXMIN_NEQ define from cache.h dir.h: move DTYPE defines from cache.h versioncmp.h: move declarations for versioncmp.c functions from cache.h ws.h: move declarations for ws.c functions from cache.h match-trees.h: move declarations for match-trees.c functions from cache.h pkt-line.h: move declarations for pkt-line.c functions from cache.h base85.h: move declarations for base85.c functions from cache.h copy.h: move declarations for copy.c functions from cache.h server-info.h: move declarations for server-info.c functions from cache.h packfile.h: move pack_window and pack_entry from cache.h ...
2023-05-02Merge branch 'tb/ban-strtok'Junio C Hamano1-1/+1
Mark strtok() and strtok_r() to be banned. * tb/ban-strtok: banned.h: mark `strtok()` and `strtok_r()` as banned t/helper/test-json-writer.c: avoid using `strtok()` t/helper/test-oidmap.c: avoid using `strtok()` t/helper/test-hashmap.c: avoid using `strtok()` string-list: introduce `string_list_setlen()` string-list: multi-delimiter `string_list_split_in_place()`
2023-04-24string-list: multi-delimiter `string_list_split_in_place()`Taylor Blau1-1/+1
Enhance `string_list_split_in_place()` to accept multiple characters as delimiters instead of a single character. Instead of using `strchr(2)` to locate the first occurrence of the given delimiter character, `string_list_split_in_place_multi()` uses `strcspn(2)` to move past the initial segment of characters comprised of any characters in the delimiting set. When only a single delimiting character is provided, `strpbrk(2)` (which is implemented with `strcspn(2)`) has equivalent performance to `strchr(2)`. Modern `strcspn(2)` implementations treat an empty delimiter or the singleton delimiter as a special case and fall back to calling strchrnul(). Both glibc[1] and musl[2] implement `strcspn(2)` this way. This change is one step to removing `strtok(2)` from the tree. Note that `string_list_split_in_place()` is not a strict replacement for `strtok()`, since it will happily turn sequential delimiter characters into empty entries in the resulting string_list. For example: string_list_split_in_place(&xs, "foo:;:bar:;:baz", ":;", -1) would yield a string list of: ["foo", "", "", "bar", "", "", "baz"] Callers that wish to emulate the behavior of strtok(2) more directly should call `string_list_remove_empty_items()` after splitting. To avoid regressions for the new multi-character delimter cases, update t0063 in this patch as well. [1]: https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strcspn.c;hb=glibc-2.37#l35 [2]: https://git.musl-libc.org/cgit/musl/tree/src/string/strcspn.c?h=v1.2.3#n11 Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24commit.h: reduce unnecessary includesElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24hash-ll.h: split out of hash.h to remove dependency on repository.hElijah Newren4-1/+6
hash.h depends upon and includes repository.h, due to the definition and use of the_hash_algo (defined as the_repository->hash_algo). However, most headers trying to include hash.h are only interested in the layout of the structs like object_id. Move the parts of hash.h that do not depend upon repository.h into a new file hash-ll.h (the "low level" parts of hash.h), and adjust other files to use this new header where the convenience inline functions aren't needed. This allows hash.h and object.h to be fairly small, minimal headers. It also exposes a lot of hidden dependencies on both path.h (which was brought in by repository.h) and repository.h (which was previously implicitly brought in by object.h), so also adjust other files to be more explicit about what they depend upon. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24copy.h: move declarations for copy.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: reduce includes of cache.h in other headersElijah Newren1-1/+1
We had a handful of headers including cache.h that didn't need to anymore. Drop those includes and replace them with includes of smaller files, or forward declarations. However, note that two .c files now need to directly include cache.h, though they should have been including it all along given they are directly using structs defined in it. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: remove double forward declaration of read_in_fullElijah Newren1-0/+1
cache.h's nature of a dumping ground of includes prevented it from being included in some compat/ files, forcing us into a workaround of having a double forward declaration of the read_in_full() function (see commit 14086b0a13 ("compat/pread.c: Add a forward declaration to fix a warning", 2007-11-17)). Now that we have moved functions like read_in_full() from cache.h to wrapper.h, and wrapper.h isn't littered with unrelated and scary #defines, get rid of the extra forward declaration and just have compat/pread.c include wrapper.h. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11object-file.h: move declarations for object-file.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21write-or-die.h: move declarations for write-or-die.c functions from cache.hElijah Newren2-0/+2
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21setup.h: move declarations for setup.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21environment.h: move declarations for environment.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21wrapper.h: move declarations for wrapper.c functions from cache.hElijah Newren1-1/+2
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: remove unnecessary cache.h inclusion from several sourcesElijah Newren1-1/+1
A number of files were apparently including cache.h solely to get gettext.h. By making those files explicitly include gettext.h, we can already drop the include of cache.h in these files. On top of that, there were some files using cache.h that didn't need to for any reason. Remove these unnecessary includes. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: be explicit about dependence on gettext.hElijah Newren2-0/+2
Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: remove unnecessary cache.h inclusion from a few headersElijah Newren1-1/+1
Ever since a64215b6cd ("object.h: stop depending on cache.h; make cache.h depend on object.h", 2023-02-24), we have a few headers that could have replaced their include of cache.h with an include of object.h. Make that change now. Some C files had to start including cache.h after this change (or some smaller header it had brought in), because the C files were depending on things from cache.h but were only formerly implicitly getting cache.h through one of these headers being modified in this patch. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23ident.h: move ident-related declarations out of cache.hElijah Newren1-0/+1
These functions were all defined in a separate ident.c already, so create ident.h and move the declarations into that file. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23cache.h: remove dependence on hex.h; make other files include it explicitlyElijah Newren3-0/+3
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23alloc.h: move ALLOC_GROW() functions from cache.hElijah Newren2-2/+4
This allows us to replace includes of cache.h with includes of the much smaller alloc.h in many places. It does mean that we also need to add includes of alloc.h in a number of C files. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23treewide: remove unnecessary cache.h includesElijah Newren1-1/+0
We had several header files include cache.h unnecessarily. Remove those. These have all been verified via both ensuring that gcc -E $HEADER | grep '"cache.h"' found no hits and that cat >temp.c <<EOF && #include "git-compat-util.h" #include "$HEADER" int main() {} EOF gcc -c temp.c successfully compiles without warnings. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23treewide: ensure one of the appropriate headers is sourced firstElijah Newren1-1/+1
We had several C files ignoring the rule to include one of the appropriate headers first; fix that. While at it, the rule in Documentation/CodingGuidelines about which header to include has also fallen out of sync, so update the wording to mention other allowed headers. Unfortunately, C files in reftable/ don't actually follow the previous or updated rule. If you follow the #include chain in its C files, reftable/system.h _tends_ to be first (i.e. record.c first includes record.h, which first includes basics.h, which first includees system.h), but not always (e.g. publicbasics.c includes another header first that does not include system.h). However, I'm going to punt on making actual changes to the C files in reftable/ since I do not want to risk bringing it out-of-sync with any version being used externally. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-02Merge branch 'ps/fsync-refs-fix'Junio C Hamano1-1/+2
Fix the sequence to fsync $GIT_DIR/packed-refs file that forgot to flush its output to the disk.. * ps/fsync-refs-fix: refs: fix corruption by not correctly syncing packed-refs to disk
2022-12-25refs: fix corruption by not correctly syncing packed-refs to diskPatrick Steinhardt1-1/+2
At GitLab we have recently received a report where a repository was left with a corrupted `packed-refs` file after the node hard-crashed even though `core.fsync=reference` was set. This is something that in theory should not happen if we correctly did the atomic-rename dance to: 1. Write the data into a temporary file. 2. Synchronize the temporary file to disk. 3. Rename the temporary file into place. So if we crash in the middle of writing the `packed-refs` file we should only ever see either the old or the new state of the file. And while we do the dance when writing the `packed-refs` file, there is indeed one gotcha: we use a `FILE *` stream to write the temporary file, but don't flush it before synchronizing it to disk. As a consequence any data that is still buffered will not get synchronized and a crash of the machine may cause corruption. Fix this bug by flushing the file stream before we fsync. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-19refs: unify parse_worktree_ref() and ref_type()Han-Wen Nienhuys2-45/+37
The logic to handle worktree refs (worktrees/NAME/REF and main-worktree/REF) existed in two places: * ref_type() in refs.c * parse_worktree_ref() in worktree.c Collapse this logic together in one function parse_worktree_ref(): this avoids having to cross-check the result of parse_worktree_ref() and ref_type(). Introduce enum ref_worktree_type, which is slightly different from enum ref_type. The latter is a misleading name (one would think that 'ref_type' would have the symref option). Instead, enum ref_worktree_type only makes explicit how a refname relates to a worktree. From this point of view, HEAD and refs/bisect/abc are the same: they specify the current worktree implicitly. The files-backend must avoid packing refs/bisect/* and friends into packed-refs, so expose is_per_worktree_ref() separately. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01git-compat-util.h: use "UNUSED", not "UNUSED(var)"Ævar Arnfjörð Bjarmason3-18/+18
As reported in [1] the "UNUSED(var)" macro introduced in 2174b8c75de (Merge branch 'jk/unused-annotation' into next, 2022-08-24) breaks coccinelle's parsing of our sources in files where it occurs. Let's instead partially go with the approach suggested in [2] of making this not take an argument. As noted in [1] "coccinelle" will ignore such tokens in argument lists that it doesn't know about, and it's less of a surprise to syntax highlighters. This undoes the "help us notice when a parameter marked as unused is actually use" part of 9b240347543 (git-compat-util: add UNUSED macro, 2022-08-19), a subsequent commit will further tweak the macro to implement a replacement for that functionality. 1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/ 2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19refs: mark unused virtual method parametersJeff King3-14/+16
The refs code uses various polymorphic types (e.g., loose vs packed ref_stores, abstracted iterators). Not every virtual function or callback needs all of its parameters. Let's mark the unused ones to quiet -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19refs: mark unused each_ref_fn parametersJeff King1-1/+3
Functions used with for_each_ref(), etc, need to conform to the each_ref_fn interface. But most of them don't need every parameter; let's annotate the unused ones to quiet -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-20Merge branch 'ep/maint-equals-null-cocci'Junio C Hamano1-1/+1
Introduce and apply coccinelle rule to discourage an explicit comparison between a pointer and NULL, and applies the clean-up to the maintenance track. * ep/maint-equals-null-cocci: tree-wide: apply equals-null.cocci tree-wide: apply equals-null.cocci contrib/coccinnelle: add equals-null.cocci
2022-05-02Merge branch 'ep/maint-equals-null-cocci' for maint-2.35Junio C Hamano1-1/+1
* ep/maint-equals-null-cocci: tree-wide: apply equals-null.cocci contrib/coccinnelle: add equals-null.cocci
2022-05-02tree-wide: apply equals-null.cocciJunio C Hamano1-1/+1
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13Revert "Merge branch 'ps/avoid-unnecessary-hook-invocation-with-packed-refs'"Junio C Hamano4-48/+14
This reverts commit 991b4d47f0accd3955d05927d5ce434e03ffbdb6, reversing changes made to bcd020f88e1e22f38422ac3f73ab06b34ec4bef1.
2022-03-29Merge branch 'ab/refs-various-fixes'Junio C Hamano5-171/+127
Code clean-up. * ab/refs-various-fixes: refs debug: add a wrapper for "read_symbolic_ref" packed-backend: remove stub BUG(...) functions misc *.c: use designated initializers for struct assignments refs: use designated initializers for "struct ref_iterator_vtable" refs: use designated initializers for "struct ref_storage_be"
2022-03-25Merge branch 'ps/fsync-refs'Junio C Hamano2-1/+3
Updates to refs traditionally weren't fsync'ed, but we can configure using core.fsync variable to do so. * ps/fsync-refs: core.fsync: new option to harden references
2022-03-17refs debug: add a wrapper for "read_symbolic_ref"Ævar Arnfjörð Bjarmason1-1/+26
In cd475b3b038 (refs: add ability for backends to special-case reading of symbolic refs, 2022-03-01) when the "read_symbolic_ref" callback was added we'd fall back on "refs_read_raw_ref" if there wasn't any backend implementation of "read_symbolic_ref". As discussed in the preceding commit this would only happen if we were running the "debug" backend, e.g. in the "setup for ref completion" test in t9902-completion.sh with: GIT_TRACE_REFS=1 git fetch --no-tags other Let's improve the trace output, but and also eliminate the now-redundant refs_read_raw_ref() fallback case. As noted in the preceding commit the "packed" backend will never call refs_read_symbolic_ref() (nor is it ever going to). For any future backend such as reftable it's OK to ask that they either implement this (or a wrapper) themselves. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-17packed-backend: remove stub BUG(...) functionsÆvar Arnfjörð Bjarmason1-79/+9
Remove the stub BUG(...) functions previously used by the "struct ref_storage_be refs_be_packed" backend. We never call any functions in the packed backend by using it as a "normal" primary ref store, instead we'll always initialize a "files" backend ref-store. It will then via the "packed_ref_store" member of "struct files_ref_store" call selected functions in the "packed" backend, and we'll in addition call others via wrappers in refs.c. So while these would arguably give us *slightly* more meaningful error messages we'll NULL the missing members in the initializer anyway, so we'll reliably get a segfault if we're ever changing the backend and having it call something it doesn't have. So there's no need for this verbose boilerplate, and as shown in a subsequent commit it might even lead to some confusion about the packed backend being a "real" backend. Let's make it clear that it's not. As an aside, this also fixes a warning emitted by SunCC in at least versions 12.5 and 12.6 of Oracle Developer Studio: "refs/packed-backend.c", line 1599: warning: Function has no return statement : packed_create_symref "refs/packed-backend.c", line 1606: warning: Function has no return statement : packed_rename_ref) "refs/packed-backend.c", line 1613: warning: Function has no return statement : packed_copy_ref "refs/packed-backend.c", line 1648: warning: Function has no return statement : packed_create_reflog Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-17refs: use designated initializers for "struct ref_iterator_vtable"Ævar Arnfjörð Bjarmason5-23/+24
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-17refs: use designated initializers for "struct ref_storage_be"Ævar Arnfjörð Bjarmason3-78/+78
Change the definition of the three refs backends we currently carry to use designated initializers. The "= NULL" assignments being retained here are redundant, and could be removed, but let's keep them for clarity. All of these backends define almost all fields, so we're not saving much in terms of line count by omitting these, but e.g. for "refs_be_debug" it's immediately apparent that we're omitting "init" when comparing its assignment to the others. This is a follow-up to similar work merged in bd4232fac33 (Merge branch 'ab/struct-init', 2021-07-16), a4b9fb6a5cf (Merge branch 'ab/designated-initializers-more', 2021-10-18) and a30321b9eae (Merge branch 'ab/designated-initializers' into ab/designated-initializers-more, 2021-09-27). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-15core.fsync: new option to harden referencesPatrick Steinhardt2-1/+3
When writing both loose and packed references to disk we first create a lockfile, write the updated values into that lockfile, and on commit we rename the file into place. According to filesystem developers, this behaviour is broken because applications should always sync data to disk before doing the final rename to ensure data consistency [1][2][3]. If applications fail to do this correctly, a hard crash of the machine can easily result in corrupted on-disk data. This kind of corruption can in fact be easily observed with Git when the machine hard-resets shortly after writing references to disk. On machines with ext4, this will likely lead to the "empty files" problem: the file has been renamed, but its data has not been synced to disk. The result is that the reference is corrupt, and in the worst case this can lead to data loss. Implement a new option to harden references so that users and admins can avoid this scenario by syncing locked loose and packed references to disk before we rename them into place. [1]: https://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/ [2]: https://btrfs.wiki.kernel.org/index.php/FAQ (What are the crash guarantees of overwrite-by-rename) [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/ext4.rst (see auto_da_alloc) Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01refs/files-backend: optimize reading of symbolic refsPatrick Steinhardt1-6/+28
When reading references via `files_read_raw_ref()` we always consult both the loose reference, and if that wasn't found, we also consult the packed-refs file. While this makes sense to read a generic reference, it is wasteful in the case where we only care about symbolic references: the packed-refs backend does not support them, and thus it cannot ever return one for us. Special-case reading of symbolic references for the files backend such that we always skip asking the packed-refs backend. We use `refs_read_symbolic_ref()` extensively to determine whether we need to skip updating local symbolic references during a fetch, which is why the change results in a significant speedup when doing fetches in repositories with huge numbers of references. The following benchmark executes a mirror-fetch in a repository with about 2 million references via `git fetch --prune --no-write-fetch-head +refs/*:refs/*`: Benchmark 1: HEAD~ Time (mean ± σ): 68.372 s ± 2.344 s [User: 65.629 s, System: 8.786 s] Range (min … max): 65.745 s … 70.246 s 3 runs Benchmark 2: HEAD Time (mean ± σ): 60.259 s ± 0.343 s [User: 61.019 s, System: 7.245 s] Range (min … max): 60.003 s … 60.649 s 3 runs Summary 'HEAD' ran 1.13 ± 0.04 times faster than 'HEAD~' Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01refs: add ability for backends to special-case reading of symbolic refsPatrick Steinhardt4-0/+19
Reading of symbolic and non-symbolic references is currently treated the same in reference backends: we always call `refs_read_raw_ref()` and then decide based on the returned flags what type it is. This has one downside though: symbolic references may be treated different from normal references in a backend from normal references. The packed-refs backend for example doesn't even know about symbolic references, and as a result it is pointless to even ask it for one. There are cases where we really only care about whether a reference is symbolic or not, but don't care about whether it exists at all or may be a non-symbolic reference. But it is not possible to optimize for this case right now, and as a consequence we will always first check for a loose reference to exist, and if it doesn't, we'll query the packed-refs backend for a known-to-not-be-symbolic reference. This is inefficient and requires us to search all packed references even though we know to not care for the result at all. Introduce a new function `refs_read_symbolic_ref()` which allows us to fix this case. This function will only ever return symbolic references and can thus optimize for the scenario layed out above. By default, if the backend doesn't provide an implementation for it, we just use the old code path and fall back to `read_raw_ref()`. But in case the backend provides its own, more efficient implementation, we will use that one instead. Note that this function is explicitly designed to not distinguish between missing references and non-symbolic references. If it did, we'd be forced to always search the packed-refs backend to see whether the symbolic reference the user asked for really doesn't exist, or if it exists as a non-symbolic reference. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-18Merge branch 'ps/avoid-unnecessary-hook-invocation-with-packed-refs'Junio C Hamano4-14/+48
Because a deletion of ref would need to remove it from both the loose ref store and the packed ref store, a delete-ref operation that logically removes one ref may end up invoking ref-transaction hook twice, which has been corrected. * ps/avoid-unnecessary-hook-invocation-with-packed-refs: refs: skip hooks when deleting uncovered packed refs refs: do not execute reference-transaction hook on packing refs refs: demonstrate excessive execution of the reference-transaction hook refs: allow skipping the reference-transaction hook refs: allow passing flags when beginning transactions refs: extract packed_refs_delete_refs() to allow control of transaction
2022-01-26refs API: remove "failure_errno" from refs_resolve_ref_unsafe()Ævar Arnfjörð Bjarmason1-22/+9
Remove the now-unused "failure_errno" parameter from the refs_resolve_ref_unsafe() signature. In my recent 96f6623ada0 (Merge branch 'ab/refs-errno-cleanup', 2021-11-29) series we made all of its callers explicitly request the errno via an output parameter. As that series shows all but one caller ended up passing in a boilerplate "ignore_errno", since they only cared about whether the return value was NULL or not, i.e. if the ref could be resolved. There was one small issue with that series fixed with a follow-up in 31e39123695 (Merge branch 'ab/refs-errno-cleanup', 2022-01-14) a small bug in that series was fixed. After those two there was one caller left in sequencer.c that used the "failure_errno', but as of the preceding commit it uses a boilerplate "ignore_errno" instead. This leaves the public refs API without any use of "failure_errno" at all. We could still do with a bit of cleanup and generalization between refs.c and refs/files-backend.c before the "reftable" integration lands, but that's all internal to the reference code itself. So let's remove this output parameter. Not only isn't it used now, but it's unlikely that we'll want it again in the future. We'd like to slowly move the refs API to a more file-backend independent way of communicating error codes, having it use a "failure_errno" was only the first step in that direction. If this or any other function needs to communicate what specifically is wrong with the requested "refname" it'll be better to have the function set some output enum of well-defined error states than piggy-backend on "errno". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-17refs: skip hooks when deleting uncovered packed refsPatrick Steinhardt1-3/+6
When deleting refs from the loose-files refs backend, then we need to be careful to also delete the same ref from the packed refs backend, if it exists. If we don't, then deleting the loose ref would "uncover" the packed ref. We thus always have to queue up deletions of refs for both the loose and the packed refs backend. This is done in two separate transactions, where the end result is that the reference-transaction hook is executed twice for the deleted refs. This behaviour is quite misleading: it's exposing implementation details of how the files backend works to the user, in contrast to the logical updates that we'd really want to expose via the hook. Worse yet, whether the hook gets executed once or twice depends on how well-packed the repository is: if the ref only exists as a loose ref, then we execute it once, otherwise if it is also packed then we execute it twice. Fix this behaviour and don't execute the reference-transaction hook at all when refs in the packed-refs backend if it's driven by the files backend. This works as expected even in case the refs to be deleted only exist in the packed-refs backend because the loose-backend always queues refs in its own transaction even if they don't exist such that they can be locked for concurrent creation. And it also does the right thing in case neither of the backends has the ref because that would cause the transaction to fail completely. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-17refs: do not execute reference-transaction hook on packing refsPatrick Steinhardt1-2/+4
The reference-transaction hook is supposed to track logical changes to references, but it currently also gets executed when packing refs in a repository. This is unexpected and ultimately not all that useful: packing refs is not supposed to result in any user-visible change to the refs' state, and it ultimately is an implementation detail of how refs stores work. Fix this excessive execution of the hook when packing refs. Reported-by: Waleed Khan <me@waleedkhan.name> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-17refs: allow passing flags when beginning transactionsPatrick Steinhardt3-6/+7
We do not currently have any flags when creating reference transactions, but we'll add one to disable execution of the reference transaction hook in some cases. Allow passing flags to `ref_store_transaction_begin()` to prepare for this change. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-17refs: extract packed_refs_delete_refs() to allow control of transactionPatrick Steinhardt3-9/+37
When deleting loose refs, then we also have to delete the refs in the packed backend. This is done by calling `refs_delete_refs()`, which then uses the packed-backend's logic to delete refs. This doesn't allow us to exercise any control over the reference transaction which is being created in the packed backend, which is required in a subsequent commit. Extract a new function `packed_refs_delete_refs()`, which hosts most of the logic to delete refs except for creating the transaction itself. Like this, we can easily create the transaction in the files backend and thus exert more control over it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-14Merge branch 'ab/refs-errno-cleanup'Junio C Hamano1-2/+1
A brown-paper-bag fix on top of a topic that was merged during this cycle. * ab/refs-errno-cleanup: refs API: use "failure_errno", not "errno"
2022-01-13refs API: use "failure_errno", not "errno"Ævar Arnfjörð Bjarmason1-2/+1
Fix a logic error in refs_resolve_ref_unsafe() introduced in a recent series of mine to abstract the refs API away from errno. See 96f6623ada0 (Merge branch 'ab/refs-errno-cleanup', 2021-11-29)for that series. In that series introduction of "failure_errno" to refs_resolve_ref_unsafe came in ef18119dec8 (refs API: add a version of refs_resolve_ref_unsafe() with "errno", 2021-10-16). There we'd set "errno = 0" immediately before refs_read_raw_ref(), and then set "failure_errno" to "errno" if errno was non-zero afterwards. Then in the next commit 8b72fea7e91 (refs API: make refs_read_raw_ref() not set errno, 2021-10-16) we started expecting "refs_read_raw_ref()" to set "failure_errno". It would do that if refs_read_raw_ref() failed, but it wouldn't be the same errno. So we might set the "errno" here to any arbitrary bad value, and end up e.g. returning NULL when we meant to return the refname from refs_resolve_ref_unsafe(), or the other way around. Instrumenting this code will reveal cases where refs_read_raw_ref() will fail, and "errno" and "failure_errno" will be set to different values. In practice I haven't found a case where this scary bug changed anything in practice. The reason for that is that we'll not care about the actual value of "errno" here per-se, but only whether: 1. We have an errno 2. If it's one of ENOENT, EISDIR or ENOTDIR. See the adjacent code added in a1c1d8170db (refs_resolve_ref_unsafe: handle d/f conflicts for writes, 2017-10-06) I.e. if we clobber "failure_errno" with "errno", but it happened to be one of those three, and we'll clobber it with another one of the three we were OK. Perhaps there are cases where the difference ended up mattering, but I haven't found them. Instrumenting the test suite to fail if "errno" and "failure_errno" are different shows a lot of failures, checking if they're different *and* one is but not the other is outside that list of three "errno" values yields no failures. But let's fix the obvious bug. We should just stop paying attention to "errno" in refs_resolve_ref_unsafe(). In addition let's change the partial resetting of "errno" in files_read_raw_ref() to happen just before the "return", to ensure that any such bug will be more easily spotted in the future. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-10Merge branch 'ab/reflog-prep'Junio C Hamano1-24/+20
Code refactoring in the reflog part of refs API. * ab/reflog-prep: reflog + refs-backend: move "verbose" out of the backend refs files-backend: assume cb->newlog if !EXPIRE_REFLOGS_DRY_RUN reflog: reduce scope of "struct rev_info" reflog expire: don't use lookup_commit_reference_gently() reflog expire: refactor & use "tip_commit" only for UE_NORMAL reflog expire: use "switch" over enum values reflog: change one->many worktree->refnames to use a string_list reflog expire: narrow scope of "cb" in cmd_reflog_expire() reflog delete: narrow scope of "cmd" passed to count_reflog_ent()
2021-12-22reflog + refs-backend: move "verbose" out of the backendÆvar Arnfjörð Bjarmason1-24/+20
Move the handling of the "verbose" flag entirely out of "refs/files-backend.c" and into "builtin/reflog.c". This allows the backend to stop knowing about the EXPIRE_REFLOGS_VERBOSE flag. The expire_reflog_ent() function shouldn't need to deal with the implementation detail of whether or not we're emitting verbose output, by doing this the --verbose output becomes backend-agnostic, so reftable will get the same output. I think the output is rather bad currently, and should e.g. be implemented with some better future mode of progress.[ch], but that's a topic for another improvement. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-22refs files-backend: assume cb->newlog if !EXPIRE_REFLOGS_DRY_RUNÆvar Arnfjörð Bjarmason1-2/+2
It's not possible for "cb->newlog" to be NULL if !EXPIRE_REFLOGS_DRY_RUN, since files_reflog_expire() would have error()'d and taken the "goto failure" branch if it couldn't open the file. By not using the "newlog" field private to "file-backend.c"'s "struct expire_reflog_cb", we can move this verbosity logging to "builtin/reflog.c" in a subsequent commit. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-22refs: centralize initialization of the base ref_store.Han-Wen Nienhuys4-11/+7
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-22refs: print error message in debug outputHan-Wen Nienhuys1-1/+2
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-22refs: pass gitdir to packed_ref_store_createHan-Wen Nienhuys3-8/+8
This is consistent with the calling convention for ref backend creation, and avoids storing ".git/packed-refs" (the name of a regular file) in a variable called ref_store::gitdir. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-15Merge branch 'hn/allow-bogus-oid-in-ref-tests'Junio C Hamano1-23/+30
The test helper for refs subsystem learned to write bogus and/or nonexistent object name to refs to simulate error situations we want to test Git in. * hn/allow-bogus-oid-in-ref-tests: t1430: create valid symrefs using test-helper t1430: remove refs using test-tool refs: introduce REF_SKIP_REFNAME_VERIFICATION flag refs: introduce REF_SKIP_OID_VERIFICATION flag refs: update comment. test-ref-store: plug memory leak in cmd_delete_refs test-ref-store: parse symbolic flag constants test-ref-store: remove force-create argument for create-reflog
2021-12-15Merge branch 'hn/reflog-tests'Junio C Hamano1-2/+5
Prepare tests on ref API to help testing reftable backends. * hn/reflog-tests: refs/debug: trim trailing LF from reflog message test-ref-store: tweaks to for-each-reflog-ent format t1405: check for_each_reflog_ent_reverse() more thoroughly test-ref-store: don't add newline to reflog message show-branch: show reflog message
2021-12-10Merge branch 'hn/create-reflog-simplify'Junio C Hamano4-9/+6
A small simplification of API. * hn/create-reflog-simplify: refs: drop force_create argument of create_reflog API
2021-12-07refs: introduce REF_SKIP_OID_VERIFICATION flagHan-Wen Nienhuys1-21/+29
This lets the ref-store test helper write non-existent or unparsable objects into the ref storage. Use this to make t1006 and t3800 independent of the files storage backend. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-07refs: update comment.Han-Wen Nienhuys1-2/+1
REF_IS_PRUNING is right below this comment, so it clearly does not belong in this comment. This was apparently introduced in commit 5ac95fee (Nov 5, 2017 "refs: tidy up and adjust visibility of the `ref_update` flags"). Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-02refs/debug: trim trailing LF from reflog messageHan-Wen Nienhuys1-2/+5
On iteration, the reflog message is always terminated by a newline. Trim it to avoid clobbering the console with is this extra newline. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-29Merge branch 'ab/refs-errno-cleanup'Junio C Hamano3-73/+113
The "remainder" of hn/refs-errno-cleanup topic. * ab/refs-errno-cleanup: (21 commits) refs API: post-migration API renaming [2/2] refs API: post-migration API renaming [1/2] refs API: don't expose "errno" in run_transaction_hook() refs API: make expand_ref() & repo_dwim_log() not set errno refs API: make resolve_ref_unsafe() not set errno refs API: make refs_ref_exists() not set errno refs API: make refs_resolve_refdup() not set errno refs tests: ignore ignore errno in test-ref-store helper refs API: ignore errno in worktree.c's find_shared_symref() refs API: ignore errno in worktree.c's add_head_info() refs API: make files_copy_or_rename_ref() et al not set errno refs API: make loose_fill_ref_dir() not set errno refs API: make resolve_gitlink_ref() not set errno refs API: remove refs_read_ref_full() wrapper refs/files: remove "name exist?" check in lock_ref_oid_basic() reflog tests: add --updateref tests refs API: make refs_rename_ref_available() static refs API: make parse_loose_ref_contents() not set errno refs API: make refs_read_raw_ref() not set errno refs API: add a version of refs_resolve_ref_unsafe() with "errno" ...
2021-11-22refs: drop force_create argument of create_reflog APIHan-Wen Nienhuys4-9/+6
There is only one caller, builtin/checkout.c, and it hardcodes force_create=1. This argument was introduced in abd0cd3a301 (refs: new public ref function: safe_create_reflog, 2015-07-21), which promised to immediately use it in a follow-on commit, but that never happened. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-25Merge branch 'jt/no-abuse-alternate-odb-for-submodules'Junio C Hamano6-11/+44
Follow through the work to use the repo interface to access submodule objects in-process, instead of abusing the alternate object database interface. * jt/no-abuse-alternate-odb-for-submodules: submodule: trace adding submodule ODB as alternate submodule: pass repo to check_has_commit() object-file: only register submodule ODB if needed merge-{ort,recursive}: remove add_submodule_odb() refs: peeling non-the_repository iterators is BUG refs: teach arbitrary repo support to iterators refs: plumb repo into ref stores
2021-10-16refs API: post-migration API renaming [2/2]Ævar Arnfjörð Bjarmason1-9/+9
Rename the transitory refs_werrres_ref_unsafe() function to refs_resolve_ref_unsafe(), now that all callers of the old function have learned to pass in a "failure_errno" parameter. The coccinelle semantic patch added in the preceding commit works, but I couldn't figure out how to get spatch(1) to re-flow these argument lists (and sometimes make lines way too long), so this rename was done with: perl -pi -e 's/refs_werrres_ref_unsafe/refs_resolve_ref_unsafe/g' \ $(git grep -l refs_werrres_ref_unsafe -- '*.c') But after that "make contrib/coccinelle/refs.cocci.patch" comes up empty, so the result would have been the same. Let's remove that transitory semantic patch file, we won't need to retain it for any other in-flight changes, refs_werrres_ref_unsafe() only existed within this patch series. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-16refs API: make files_copy_or_rename_ref() et al not set errnoÆvar Arnfjörð Bjarmason1-4/+6
None of the callers of rename_ref() and copy_ref() care about errno, and as seen in the context here we already emit our own non-errno using error() in the case where we'd use it. So let's have it explicitly ignore errno, and do the same in commit_ref_update(), which is only used within other code in files_copy_or_rename_ref() itself which doesn't care about errno either. It might actually be sensible to have the callers use errno if the failure was filesystem-specific, and with the upcoming reftable backend we don't want to rely on that sort of thing, so let's keep ignoring that for now. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-16refs API: make loose_fill_ref_dir() not set errnoÆvar Arnfjörð Bjarmason1-2/+3
Change the refs_resolve_ref_unsafe() invoked in loose_fill_ref_dir() to a form that ignores errno. The only eventual caller of this function is create_ref_cache(), whose callers in turn don't have their failure depend on any errno set here. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-16refs API: remove refs_read_ref_full() wrapperÆvar Arnfjörð Bjarmason1-14/+22
Remove the refs_read_ref_full() wrapper in favor of migrating various refs.c API users to the underlying refs_werrres_ref_unsafe() function. A careful reading of these callers shows that the callers of this function did not care about "errno", by moving away from the refs_resolve_ref_unsafe() wrapper we can be sure that nothing relies on it anymore. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-16refs/files: remove "name exist?" check in lock_ref_oid_basic()Ævar Arnfjörð Bjarmason1-24/+24
In lock_ref_oid_basic() we'll happily lock a reference that doesn't exist yet. That's normal, and is how references are initially born, but we don't need to retain checks here in lock_ref_oid_basic() about the state of the ref, when what we're checking is either checked already, or something we're about to discover by trying to lock the ref with raceproof_create_file(). The one exception is the caller in files_reflog_expire(), who passes us a "type" to find out if the reference is a symref or not. We can move the that logic over to that caller, which can now defer its discovery of whether or not the ref is a symref until it's needed. In the preceding commit an exhaustive regression test was added for that case in a new test in "t1417-reflog-updateref.sh". The improved diagnostics here were added in 5b2d8d6f218 (lock_ref_sha1_basic(): improve diagnostics for ref D/F conflicts, 2015-05-11), and then much of the surrounding code went away recently in my 245fbba46d6 (refs/files: remove unused "errno == EISDIR" code, 2021-08-23). The refs_resolve_ref_unsafe() code being removed here looks like it should be tasked with doing that, but it's actually redundant to other code. The reason for that is as noted in 245fbba46d6 this once widely used function now only has a handful of callers left, which all handle this case themselves. To the extent that we're racy between their check and ours removing this check actually improves the situation, as we'll be doing fewer things between the not-under-lock initial check and acquiring the lock. Why this is OK for all the remaining callers of lock_ref_oid_basic() is noted below. There are only two of those callers: * "git branch -[cm] <oldbranch> <newbranch>": In files_copy_or_rename_ref() we'll call this when we copy or rename refs via rename_ref() and copy_ref(). but only after we've checked if the refname exists already via its own call to refs_resolve_ref_unsafe() and refs_rename_ref_available(). As the updated comment to the latter here notes neither of those are actually needed. If we delete not only this code but also refs_rename_ref_available() we'll do just fine, we'll just emit a less friendly error message if e.g. "git branch -m A B/C" would have a D/F conflict with a "B" file. Actually we'd probably die before that in case reflogs for the branch existed, i.e. when the try to rename() or copy_file() the relevant reflog, since if we've got a D/F conflict with a branch name we'll probably also have the same with its reflogs (but not necessarily, we might have reflogs, but it might not). As some #leftoverbits that code seems buggy to me, i.e. the reflog "protocol" should be to get a lock on the main ref, and then perform ref and/or reflog operations. That code dates back to c976d415e53 (git-branch: add options and tests for branch renaming, 2006-11-28) and probably pre-dated the solidifying of that convention. But in any case, that edge case is not our bug or problem right now. * "git reflog expire <ref>": In files_reflog_expire() we'll call this without previous ref existence checking in files-backend.c, but that code is in turn called by code that's just finished checking if the refname whose reflog we're expiring exists. See ae35e16cd43 (reflog expire: don't lock reflogs using previously seen OID, 2021-08-23) for the current state of that code, and 5e6f003ca8a (reflog_expire(): ignore --updateref for symbolic references, 2015-03-03) for the code we'd break if we only did a "update = !!ref" here, which is covered by the aforementioned regression test in "t1417-reflog-updateref.sh". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-16refs API: make refs_rename_ref_available() staticÆvar Arnfjörð Bjarmason2-14/+29
Move the refs_rename_ref_available() function into "refs/files-backend.c". It is file-backend specific. This function was added in 5fe7d825da8 (refs.c: pass a list of names to skip to is_refname_available, 2014-05-01) as rename_ref_available() and was only ever used in this one file-backend specific codepath. So let's move it there. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-16refs API: make parse_loose_ref_contents() not set errnoHan-Wen Nienhuys2-13/+24
Change the parse_loose_ref_contents() function to stop setting "errno" and failure, and to instead pass up a "failure_errno" via a parameter. This requires changing its callers to do the same. The EINVAL error from parse_loose_ref_contents is used in files-backend to create a custom error message. In untangling this we discovered a tricky edge case. The refs_read_special_head() function was relying on parse_loose_ref_contents() setting EINVAL. By converting it to use "saved_errno" we can migrate away from "errno" in this part of the code entirely, and do away with an existing "save_errno" pattern, its only purpose was to not clobber the "errno" we previously needed at the end of files_read_raw_ref(). Let's assert that we can do that by not having files_read_raw_ref() itself operate on *failure_errno in addition to passing it on. Instead we'll assert that if we return non-zero we actually do set errno, thus assuring ourselves and callers that they can trust the resulting "failure_errno". Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-16refs API: make refs_read_raw_ref() not set errnoHan-Wen Nienhuys3-10/+13
Add a "failure_errno" to refs_read_raw_ref(), his allows refs_werrres_ref_unsafe() to pass along its "failure_errno", as a first step before its own callers are migrated to pass it further up the chain. We are leaving out out the refs_read_special_head() in refs_read_raw_ref() for now, as noted in a subsequent commit moving it to "failure_errno" will require some special consideration. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-11Merge branch 'jk/ref-paranoia'Junio C Hamano2-21/+40
The ref iteration code used to optionally allow dangling refs to be shown, which has been tightened up. * jk/ref-paranoia: refs: drop "broken" flag from for_each_fullref_in() ref-filter: drop broken-ref code entirely ref-filter: stop setting FILTER_REFS_INCLUDE_BROKEN repack, prune: drop GIT_REF_PARANOIA settings refs: turn on GIT_REF_PARANOIA by default refs: omit dangling symrefs when using GIT_REF_PARANOIA refs: add DO_FOR_EACH_OMIT_DANGLING_SYMREFS flag refs-internal.h: reorganize DO_FOR_EACH_* flag documentation refs-internal.h: move DO_FOR_EACH_* flags next to each other t5312: be more assertive about command failure t5312: test non-destructive repack t5312: create bogus ref as necessary t5312: drop "verbose" helper t5600: provide detached HEAD for corruption failures t5516: don't use HEAD ref for invalid ref-deletion tests t7900: clean up some more broken refs
2021-10-08refs: peeling non-the_repository iterators is BUGJonathan Tan4-2/+17
There is currently no support for peeling the current ref of an iterator iterating over a non-the_repository ref store, and none is needed. Thus, for now, BUG() if that happens. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08refs: teach arbitrary repo support to iteratorsJonathan Tan3-3/+9
Note that should_pack_ref() is called when writing refs, which is only supported for the_repository, hence the_repository is hardcoded there. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08refs: plumb repo into ref storesJonathan Tan4-6/+18
In preparation for the next 2 patches that adds (partial) support for arbitrary repositories to ref iterators, plumb a repository into all ref stores. There are no changes to program logic. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-06Merge branch 'ab/retire-refs-unused-funcs'Junio C Hamano3-100/+19
Code cleanup. * ab/retire-refs-unused-funcs: refs/ref-cache.[ch]: remove "incomplete" from create_dir_entry() refs/ref-cache.c: remove "mkdir" parameter from find_containing_dir() refs/ref-cache.[ch]: remove unused add_ref_entry() refs/ref-cache.[ch]: remove unused remove_entry_from_dir() refs.[ch]: remove unused ref_storage_backend_exists()
2021-10-03Merge branch 'hn/refs-errno-cleanup'Junio C Hamano4-39/+145
Futz with the way 'errno' is relied on in the refs API to carry the failure modes up the call chain. * hn/refs-errno-cleanup: refs: make errno output explicit for read_raw_ref_fn refs/files-backend: stop setting errno from lock_ref_oid_basic refs: remove EINVAL errno output from specification of read_raw_ref_fn refs file backend: move raceproof_create_file() here
2021-10-03Merge branch 'ab/refs-files-cleanup'Junio C Hamano4-103/+46
Continued work on top of the hn/refs-errno-cleanup topic. * ab/refs-files-cleanup: refs/files: remove unused "errno != ENOTDIR" condition refs/files: remove unused "errno == EISDIR" code refs/files: remove unused "oid" in lock_ref_oid_basic() refs API: remove OID argument to reflog_expire() reflog expire: don't lock reflogs using previously seen OID refs/files: add a comment about refs_reflog_exists() call refs: make repo_dwim_log() accept a NULL oid refs/debug: re-indent argument list for "prepare" refs/files: remove unused "skip" in lock_raw_ref() too refs/files: remove unused "extras/skip" in lock_ref_oid_basic() refs: drop unused "flags" parameter to lock_ref_oid_basic() refs/files: remove unused REF_DELETING in lock_ref_oid_basic() refs/packet: add missing BUG() invocations to reflog callbacks
2021-09-28refs/ref-cache.[ch]: remove "incomplete" from create_dir_entry()Ævar Arnfjörð Bjarmason3-9/+7
Remove the now-unused "incomplete" parameter from create_dir_entry(), all its callers specify it as "1", so let's drop the "incomplete=0" case. The last caller to use it was search_for_subdir(), but that code was removed in the preceding commit. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-28refs/ref-cache.c: remove "mkdir" parameter from find_containing_dir()Ævar Arnfjörð Bjarmason1-24/+12
Remove the "mkdir" parameter from the find_containing_dir() function, the add_ref_entry() function removed in the preceding commit was its last user. Since "mkdir" is always "0" we can also remove the parameter from search_for_subdir(), which in turn means that we can delete most of that function. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-28refs/ref-cache.[ch]: remove unused add_ref_entry()Ævar Arnfjörð Bjarmason2-16/+0
This function has not been used since 9dd389f3d8d (packed_ref_store: get rid of the `ref_cache` entirely, 2017-09-25). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-28refs/ref-cache.[ch]: remove unused remove_entry_from_dir()Ævar Arnfjörð Bjarmason2-51/+0
This function was missed in 9939b33d6a3 (packed-backend: rip out some now-unused code, 2017-09-08), and has been orphaned since then. Let's delete it. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27refs: add DO_FOR_EACH_OMIT_DANGLING_SYMREFS flagJeff King2-0/+11
When the DO_FOR_EACH_INCLUDE_BROKEN flag is used, we include both actual corrupt refs (illegal names, missing objects), but also symrefs that point to nothing. This latter is not really a corruption, but just something that may happen normally. For example, the symref at refs/remotes/origin/HEAD may point to a tracking branch which is later deleted. (The local HEAD may also be unborn, of course, but we do not access it through ref iteration). Most callers of for_each_ref() etc, do not care. They don't pass INCLUDE_BROKEN, so don't see it at all. But for those which do pass it, this somewhat-normal state causes extra warnings (e.g., from for-each-ref) or even aborts operations (destructive repacks with GIT_REF_PARANOIA set). This patch just introduces the flag and the mechanism; there are no callers yet (and hence no tests). Two things to note on the implementation: - we actually skip any symref that does not resolve to a ref. This includes ones which point to an invalidly-named ref. You could argue this is a more serious breakage than simple dangling. But the overall effect is the same (we could not follow the symref), as well as the impact on things like REF_PARANOIA (either way, a symref we can't follow won't impact reachability, because we'll see the ref itself during iteration). The underlying resolution function doesn't distinguish these two cases (they both get REF_ISBROKEN). - we change the iterator in refs/files-backend.c where we check INCLUDE_BROKEN. There's a matching spot in refs/packed-backend.c, but we don't know need to do anything there. The packed backend does not support symrefs at all. The resulting set of flags might be a bit easier to follow if we broke this down into "INCLUDE_CORRUPT_REFS" and "INCLUDE_DANGLING_SYMREFS". But there are a few reasons not do so: - adding a new OMIT_DANGLING_SYMREFS flag lets us leave existing callers intact, without changing their behavior (and some of them really do want to see the dangling symrefs; e.g., t5505 has a test which expects us to report when a symref becomes dangling) - they're not actually independent. You cannot say "include dangling symrefs" without also including refs whose objects are not reachable, because dangling symrefs by definition do not have an object. We could tweak the implementation to distinguish this, but in practice nobody wants to ask for that. Adding the OMIT flag keeps the implementation simple and makes sure we don't regress the current behavior. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27refs-internal.h: reorganize DO_FOR_EACH_* flag documentationJeff King1-19/+27
The documentation for the DO_FOR_EACH_* flags is sprinkled over the refs-internal.h file. We define the two flags in one spot, and then describe them in more detail far away from there, in the definitions of refs_ref_iterator_begin() and ref_iterator_advance_fn(). Let's try to organize this a bit better: - convert the #defines to an enum. This makes it clear that they are related, and that the enum shows the complete set of flags. - combine all descriptions for each flag in a single spot, next to the flag's definition - use the enum rather than a bare int for functions which take the flags. This helps readers realize which flags can be used. - clarify the mention of flags for ref_iterator_advance_fn(). It does not take flags itself, but is meant to depend on ones set up earlier. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27refs-internal.h: move DO_FOR_EACH_* flags next to each otherJeff King1-8/+8
There are currently two DO_FOR_EACH_* flags, which must not have their bits overlap. Yet they're defined hundreds of lines apart. Let's move them next to each other to make it clear that they are related and are a complete set (which matters if you are adding a new flag and would like to know what the next available bit is). Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09refs/files-backend: remove unused open mode parameterRené Scharfe1-1/+1
We only need to provide a mode if we are willing to let open(2) create the file, which is not the case here, so drop the unnecessary parameter. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25refs: make errno output explicit for read_raw_ref_fnHan-Wen Nienhuys4-30/+34
This makes it explicit how alternative ref backends should report errors in read_raw_ref_fn. read_raw_ref_fn needs to supply a credible errno for a number of cases. These are primarily: 1) The files backend calls read_raw_ref from lock_raw_ref, and uses the resulting error codes to create/remove directories as needed. 2) ENOENT should be translated in a zero OID, optionally with REF_ISBROKEN set, returning the last successfully resolved symref. This is necessary so read_raw_ref("HEAD") on an empty repo returns refs/heads/main (or the default branch du-jour), and we know on which branch to create the first commit. Make this information flow explicit by adding a failure_errno to the signature of read_raw_ref. All errnos from the files backend are still propagated unchanged, even though inspection suggests only ENOTDIR, EISDIR and ENOENT are relevant. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25refs/files-backend: stop setting errno from lock_ref_oid_basicHan-Wen Nienhuys1-9/+2
refs/files-backend.c::lock_ref_oid_basic() tries to signal how it failed to its callers using errno. It is safe to stop setting errno here, because the callers of this file-scope static function are * files_copy_or_rename_ref() * files_create_symref() * files_reflog_expire() None of them looks at errno after seeing a negative return from lock_ref_oid_basic() to make any decision, and no caller of these three functions looks at errno after they signal a failure by returning a negative value. In particular, * files_copy_or_rename_ref() - here, calls are followed by error() (which performs I/O) or write_ref_to_lockfile() (which calls parse_object() which may perform I/O) * files_create_symref() - here, calls are followed by error() or create_symref_locked() (which performs I/O and does not inspect errno) * files_reflog_expire() - here, calls are followed by error() or refs_reflog_exists() (which calls a function in a vtable that is not documented to use and/or preserve errno) Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25refs: remove EINVAL errno output from specification of read_raw_ref_fnHan-Wen Nienhuys1-3/+3
This commit does not change code; it documents the fact that an alternate ref backend does not need to return EINVAL from read_raw_ref_fn to function properly. This is correct, because refs_read_raw_ref is only called from; * resolve_ref_unsafe(), which does not care for the EINVAL errno result. * refs_verify_refname_available(), which does not inspect errno. * files-backend.c, where errno is overwritten on failure. * packed-backend.c (is_packed_transaction_needed), which calls it for the packed ref backend, which never emits EINVAL. A grep for EINVAL */*c reveals that no code checks errno against EINVAL after reading references. In addition, the refs.h file does not mention errno at all. A grep over resolve_ref_unsafe() turned up the following callers that inspect errno: * sequencer.c::print_commit_summary, which uses it for die_errno * lock_ref_oid_basic(), which only treats EISDIR and ENOTDIR specially. The files ref backend does use EINVAL. The files backend does not call into the generic API (refs_read_raw), but into the files-specific function (files_read_raw_ref), which we are not changing in this commit. As the errno sideband is unintuitive and error-prone, remove EINVAL value, as a step towards getting rid of the errno sideband altogether. Spotted by Ævar Arnfjörð Bjarmason <avarab@gmail.com>. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25refs file backend: move raceproof_create_file() hereÆvar Arnfjörð Bjarmason1-0/+109
Move the raceproof_create_file() API added to cache.h and object-file.c in 177978f56ad (raceproof_create_file(): new function, 2017-01-06) to its only user, refs/files-backend.c. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25refs/files: remove unused "errno != ENOTDIR" conditionÆvar Arnfjörð Bjarmason1-2/+1
As a follow-up to the preceding commit where we removed the adjacent "errno == EISDIR" condition in the same function, remove the "last_errno != ENOTDIR" condition here. It's not possible for us to hit this condition added in 5b2d8d6f218 (lock_ref_sha1_basic(): improve diagnostics for ref D/F conflicts, 2015-05-11). Since a1c1d8170db (refs_resolve_ref_unsafe: handle d/f conflicts for writes, 2017-10-06) we've explicitly caught these in refs_resolve_ref_unsafe() before returning NULL: if (errno != ENOENT && errno != EISDIR && errno != ENOTDIR) return NULL; We'd then always return the refname from refs_resolve_ref_unsafe() even if we were in a broken state as explained in the preceding commit. The elided context here is a call to refs_resolve_ref_unsafe(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25refs/files: remove unused "errno == EISDIR" codeÆvar Arnfjörð Bjarmason1-25/+3
When we lock a reference like "foo" we need to handle the case where "foo" exists, but is an empty directory. That's what this code added in bc7127ef0f (ref locking: allow 'foo' when 'foo/bar' used to exist but not anymore., 2006-09-30) seems like it should be dealing with. Except it doesn't, and we never take this branch. The reason is that when bc7127ef0f was written this looked like: ref = resolve_ref([...]); if (!ref && errno == EISDIR) { [...] And in resolve_ref() we had this code: fd = open(path, O_RDONLY); if (fd < 0) return NULL; I.e. we would attempt to read "foo" with open(), which would fail with EISDIR and we'd return NULL. We'd then take this branch, call remove_empty_directories() and continue. Since a1c1d8170d (refs_resolve_ref_unsafe: handle d/f conflicts for writes, 2017-10-06) we don't. E.g. in the case of files_copy_or_rename_ref() our callstack will look something like: [...] -> files_copy_or_rename_ref() -> lock_ref_oid_basic() -> refs_resolve_ref_unsafe() At that point the first (now only) refs_resolve_ref_unsafe() call in lock_ref_oid_basic() would do the equivalent of this in the resulting call to refs_read_raw_ref() in refs_resolve_ref_unsafe(): /* Via refs_read_raw_ref() */ fd = open(path, O_RDONLY); if (fd < 0) /* get errno == EISDIR */ /* later, in refs_resolve_ref_unsafe() */ if ([...] && errno != EISDIR) return NULL; [...] /* returns the refs/heads/foo to the caller, even though it's a directory */ return refname; I.e. even though we got an "errno == EISDIR" we won't take this branch, since in cases of EISDIR "resolved" is always non-NULL. I.e. we pretend at this point as though everything's OK and there is no "foo" directory. We then proceed with the entire ref update and don't call remove_empty_directories() until we call commit_ref_update(). See 5387c0d883 (commit_ref(): if there is an empty dir in the way, delete it, 2016-05-05) for the addition of that code, and a1c1d8170db (refs_resolve_ref_unsafe: handle d/f conflicts for writes, 2017-10-06) for the commit that changed the original codepath added in bc7127ef0f to use this "EISDIR" handling. Further historical commentary: Before the two preceding commits the caller in files_reflog_expire() was the only one out of our 4 callers that would pass non-NULL as an oid. We would then set a (now gone) "resolve_flags" to "RESOLVE_REF_READING" and just before that "errno != EISDIR" check do: if (resolve_flags & RESOLVE_REF_READING) return NULL; There may have been some case where this ended up mattering and we couldn't safely make this change before we removed the "oid" parameter, but I don't think there was, see [1] for some discussion on that. In any case, now that we've removed the "oid" parameter in a preceding commit we can be sure that this code is redundant, so let's remove it. 1. http://lore.kernel.org/git/871r801yp6.fsf@evledraar.gmail.com Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25refs/files: remove unused "oid" in lock_ref_oid_basic()Ævar Arnfjörð Bjarmason1-56/+14
In the preceding commit the last caller that passed a non-NULL OID was changed to pass NULL to lock_ref_oid_basic(). As noted in preceding commits use of this API has been going away (we should use ref transactions, or lock_raw_ref()), so we're unlikely to gain new callers that want to pass the "oid". So let's remove it, doing so means we can remove the "mustexist" condition, and therefore anything except the "flags = RESOLVE_REF_NO_RECURSE" case. Furthermore, since the verify_lock() function we called did most of its work when the "oid" was passed (as "old_oid") we can inline the trivial part of it that remains in its only remaining caller. Without a NULL "oid" passed it was equivalent to calling refs_read_ref_full() followed by oidclr(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25refs API: remove OID argument to reflog_expire()Ævar Arnfjörð Bjarmason4-6/+5
Since the the preceding commit the "oid" parameter to reflog_expire() is always NULL, but it was not cleaned up to reduce the size of the diff. Let's do that subsequent API and documentation cleanup now. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25reflog expire: don't lock reflogs using previously seen OIDÆvar Arnfjörð Bjarmason1-2/+5
During reflog expiry, the cmd_reflog_expire() function first iterates over all reflogs in logs/*, and then one-by-one acquires the lock for each one and expires it. This behavior has been with us since this command was implemented in 4264dc15e1 ("git reflog expire", 2006-12-19). Change this to stop calling lock_ref_oid_basic() with the OID we saw when we looped over the logs, instead have it pass the OID it managed to lock. This mostly mitigates a race condition where e.g. "git gc" will fail in a concurrently updated repository because the branch moved since "git reflog expire --all" was started. I.e. with: error: cannot lock ref '<refname>': ref '<refname>' is at <OID-A> but expected <OID-B> This behavior of passing in an "oid" was needed for an edge-case that I've untangled in this and preceding commits though, namely that we needed this OID because we'd: 1. Lookup the reflog name/OID via dwim_log() 2. With that OID, lock the reflog 3. Later in builtin/reflog.c we use the OID we looked as input to lookup_commit_reference_gently(), assured that it's equal to the OID we got from dwim_log(). We can be sure that this change is safe to make because between dwim_log (step #1) and lock_ref_oid_basic (step #2) there was no other logic relevant to the OID or expiry run in the cmd_reflog_expire() caller. We can thus treat that code as a black box, before and after this change it would get an OID that's been locked, the only difference is that now we mostly won't be failing to get the lock due to the TOCTOU race[0]. That failure was purely an implementation detail in how the "current OID" was looked up, it was divorced from the locking mechanism. What do we mean with "mostly"? It mostly mitigates it because we'll still run into cases where the ref is locked and being updated as we want to expire it, and other git processes wanting to update the refs will in turn race with us as we expire the reflog. That remaining race can in turn be mitigated with the core.filesRefLockTimeout setting, see 4ff0f01cb7 ("refs: retry acquiring reference locks for 100ms", 2017-08-21). In practice if that value is high enough we'll probably never have ref updates or reflog expiry failing, since the clients involved will retry for far longer than the time any of those operations could take. See [1] for an initial report of how this impacted "git gc" and a large discussion about this change in early 2019. In particular patch looked good to Michael Haggerty, see his[2]. That message seems to not have made it to the ML archive, its content is quoted in full in my [3]. I'm leaving behind now-unused code the refs API etc. that takes the now-NULL "unused_oid" argument, and other code that can be simplified now that we never have on OID in that context, that'll be cleaned up in subsequent commits, but for now let's narrowly focus on fixing the "git gc" issue. As the modified assert() shows we always pass a NULL oid to reflog_expire() now. Unfortunately this sort of probabilistic contention is hard to turn into a test. I've tested this by running the following three subshells in concurrent terminals: ( rm -rf /tmp/git && git init /tmp/git && while true do head -c 10 /dev/urandom | hexdump >/tmp/git/out && git -C /tmp/git add out && git -C /tmp/git commit -m"out" done ) ( rm -rf /tmp/git-clone && git clone file:///tmp/git /tmp/git-clone && while git -C /tmp/git-clone pull do date done ) ( while git -C /tmp/git-clone reflog expire --all do date done ) Before this change the "reflog expire" would fail really quickly with the "but expected" error noted above. After this change both the "pull" and "reflog expire" will run for a while, but eventually fail because I get unlucky with core.filesRefLockTimeout (the "reflog expire" is in a really tight loop). As noted above that can in turn be mitigated with higher values of core.filesRefLockTimeout than the 100ms default. As noted in the commentary added in the preceding commit there's also the case of branches being racily deleted, that can be tested by adding this to the above: ( while git -C /tmp/git-clone branch topic master && git -C /tmp/git-clone branch -D topic do date done ) With core.filesRefLockTimeout set to 10 seconds (it can probably be a lot lower) I managed to run all four of these concurrently for about an hour, and accumulated ~125k commits, auto-gc's and all, and didn't have a single failure. The loops visibly stall while waiting for the lock, but that's expected and desired behavior. 0. https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use 1. https://lore.kernel.org/git/87tvg7brlm.fsf@evledraar.gmail.com/ 2. http://lore.kernel.org/git/b870a17d-2103-41b8-3cbc-7389d5fff33a@alum.mit.edu 3. https://lore.kernel.org/git/87pnqkco8v.fsf@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25refs/files: add a comment about refs_reflog_exists() callÆvar Arnfjörð Bjarmason1-0/+13
Add a comment about why it is that we need to check for the the existence of a reflog we're deleting after we've successfully acquired the lock in files_reflog_expire(). As noted in [1] the lock protocol for reflogs is somewhat intuitive. This early exit code the comment applies to dates all the way back to 4264dc15e19 (git reflog expire, 2006-12-19). 1. https://lore.kernel.org/git/54DCDA42.2060800@alum.mit.edu/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25refs/debug: re-indent argument list for "prepare"Ævar Arnfjörð Bjarmason1-2/+2
Re-indent this argument list that's been mis-indented since it was added in 34c319970d1 (refs/debug: trace into reflog expiry too, 2021-04-23). This makes a subsequent change smaller. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25refs/files: remove unused "skip" in lock_raw_ref() tooÆvar Arnfjörð Bjarmason1-5/+4
Remove the unused "skip" parameter to lock_raw_ref(), it was never used. We do use it when passing "skip" to the refs_rename_ref_available() function in files_copy_or_rename_ref(), but not here. This is part of a larger series that modifies lock_ref_oid_basic() extensively, there will be no more modifications of this function in this series, but since the preceding commit removed this unused parameter from lock_ref_oid_basic(), let's do it here too for consistency. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25refs/files: remove unused "extras/skip" in lock_ref_oid_basic()Ævar Arnfjörð Bjarmason1-15/+7
The lock_ref_oid_basic() function has gradually been replaced by use of the file transaction API, there are only 4 remaining callers of it. None of those callers pass non-NULL "extras" and "skip" parameters, the last such caller went away in 92b1551b1d4 (refs: resolve symbolic refs first, 2016-04-25), so let's remove the parameters. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25refs: drop unused "flags" parameter to lock_ref_oid_basic()Jeff King1-7/+6
In the last commit we removed the REF_DELETING flag from lock_ref_oid_basic(). Since then all of the remaining callers do pass REF_NO_DEREF, but that has been ignored completely since 7a418f3a17 (lock_ref_sha1_basic(): only handle REF_NODEREF mode, 2016-04-22). So we can simply get rid of the parameter entirely. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-19refs/files: remove unused REF_DELETING in lock_ref_oid_basic()Ævar Arnfjörð Bjarmason1-2/+0
The lock_ref_oid_basic() function has gradually been replaced by most callers no longer performing a low-level "acquire lock, update and release", and instead using the ref transaction API. So there are only 4 remaining callers of lock_ref_oid_basic(). None of those callers pass REF_DELETING anymore, the last caller went away in 92b1551b1d (refs: resolve symbolic refs first, 2016-04-25). Before that we'd refactored and moved this code in: - 8df4e511387 (struct ref_update: move "have_old" into "flags", 2015-02-17) - 7bd9bcf372d (refs: split filesystem-based refs code into a new file, 2015-11-09) - 165056b2fc (lock_ref_for_update(): new function, 2016-04-24) We then finally stopped using it in 92b1551b1d (noted above). So let's remove the handling of this parameter. By itself this change doesn't benefit us much, but it's the start of even more removal of unused code in and around this function in subsequent commits. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-19refs/packet: add missing BUG() invocations to reflog callbacksÆvar Arnfjörð Bjarmason1-0/+5
In e0cc8ac8202 (packed_ref_store: make class into a subclass of `ref_store`, 2017-06-23) a die() was added to packed_create_reflog(), but not to any of the other reflog callbacks, let's do that. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-28Merge branch 'hn/refs-debug-empty-prefix'Junio C Hamano1-1/+2
Debugging aid. * hn/refs-debug-empty-prefix: refs/debug: quote prefix
2021-07-19refs/debug: quote prefixHan-Wen Nienhuys1-1/+2
This makes the empty prefix ("") stand out better. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-16Merge branch 'ab/struct-init'Junio C Hamano1-1/+1
Code cleanup around struct_type_init() functions. * ab/struct-init: string-list.h users: change to use *_{nodup,dup}() string-list.[ch]: add a string_list_init_{nodup,dup}() dir.[ch]: replace dir_init() with DIR_INIT *.c *_init(): define in terms of corresponding *_INIT macro *.h: move some *_INIT to designated initializers
2021-07-16Merge branch 'hn/refs-iterator-peel-returns-boolean'Junio C Hamano3-2/+5
Tiny API tweak. * hn/refs-iterator-peel-returns-boolean: refs: make explicit that ref_iterator_peel returns boolean
2021-07-01string-list.h users: change to use *_{nodup,dup}()Ævar Arnfjörð Bjarmason1-1/+1
Change all in-tree users of the string_list_init(LIST, BOOL) API to use string_list_init_{nodup,dup}(LIST) instead. As noted in the preceding commit let's leave the now-unused string_list_init() wrapper in-place for any in-flight users, it can be removed at some later date. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20refs: make explicit that ref_iterator_peel returns booleanHan-Wen Nienhuys3-2/+5
Use -1 as error return value throughout. This removes spurious differences in the GIT_TRACE_REFS output, depending on the ref storage backend active. Before, the cached ref_iterator (but only that iterator!) would return peel_object() output directly. No callers relied on the peel_status values beyond success/failure. All calls to these functions go through peel_iterated_oid(), which returns peel_object() as a fallback, but also squashing the error values. The iteration interface already passes REF_ISSYMREF and REF_ISBROKEN through the flags argument, so the additional error values in enum peel_status provide no value. The ref iteration interface provides a separate peel() function because certain formats (eg. packed-refs and reftable) can store the peeled object next to the tag SHA1. Passing the peeled SHA1 as an optional argument to each_ref_fn maps more naturally to the implementation of ref databases. Changing the code in this way is left for a future refactoring. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-16Merge branch 'wc/packed-ref-removal-cleanup'Junio C Hamano1-6/+6
When "git update-ref -d" removes a ref that is packed, it left empty directories under $GIT_DIR/refs/ for * wc/packed-ref-removal-cleanup: refs: cleanup directories when deleting packed ref
2021-05-11refs: cleanup directories when deleting packed refWill Chandler1-6/+6
When deleting a packed ref via 'update-ref -d', a lockfile is made in the directory that would contain the loose copy of that ref, creating any directories in the ref's path that do not exist. When the transaction completes, the lockfile is deleted, but any empty parent directories made when creating the lockfile are left in place. These empty directories are not removed by 'pack-refs' or other housekeeping tasks and will accumulate over time. When deleting a loose ref, we remove all empty parent directories at the end of the transaction. This commit applies the parent directory cleanup logic used when deleting loose refs to packed refs as well. Signed-off-by: Will Chandler <wfc@wfchandler.org> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-10Merge branch 'bc/hash-transition-interop-part-1'Junio C Hamano2-2/+2
SHA-256 transition. * bc/hash-transition-interop-part-1: hex: print objects using the hash algorithm member hex: default to the_hash_algo on zero algorithm value builtin/pack-objects: avoid using struct object_id for pack hash commit-graph: don't store file hashes as struct object_id builtin/show-index: set the algorithm for object IDs hash: provide per-algorithm null OIDs hash: set, copy, and use algo field in struct object_id builtin/pack-redundant: avoid casting buffers to struct object_id Use the final_oid_fn to finalize hashing of object IDs hash: add a function to finalize object IDs http-push: set algorithm when reading object ID Always use oidread to read into struct object_id hash: add an algo member to struct object_id
2021-05-07Merge branch 'hn/trace-reflog-expiry'Junio C Hamano1-3/+44
The reflog expiry machinery has been taught to emit trace events. * hn/trace-reflog-expiry: refs/debug: trace into reflog expiry too
2021-04-27hash: provide per-algorithm null OIDsbrian m. carlson2-2/+2
Up until recently, object IDs did not have an algorithm member, only a hash. Consequently, it was possible to share one null (all-zeros) object ID among all hash algorithms. Now that we're going to be handling objects from multiple hash algorithms, it's important to make sure that all object IDs have a correct algorithm field. Introduce a per-algorithm null OID, and add it to struct hash_algo. Introduce a wrapper function as well, and use it everywhere we used to use the null_oid constant. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-27refs/debug: trace into reflog expiry tooHan-Wen Nienhuys1-3/+44
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-12refs: print errno for read_raw_ref if GIT_TRACE_REFS is setHan-Wen Nienhuys1-1/+4
The ref backend API uses errno as a sideband error channel. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-13use CALLOC_ARRAYRené Scharfe4-9/+9
Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-06refs/files-backend: don't peek into `struct lock_file`Martin Ågren1-2/+2
Similar to the previous commits, avoid peeking into the `struct lock_file`. Use the lock file API instead. Note how we obtain the path to the lock file if `fdopen_lock_file()` failed and that this is not a problem: as documented in lockfile.h, failure to "fdopen" does not roll back the lock file and we're free to, e.g., query it for its path. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-22Merge branch 'hn/refs-trace-backend'Junio C Hamano2-0/+403
Developer support. * hn/refs-trace-backend: refs: add GIT_TRACE_REFS debugging mechanism
2020-09-09refs: add GIT_TRACE_REFS debugging mechanismHan-Wen Nienhuys2-0/+403
When set in the environment, GIT_TRACE_REFS makes git print operations and results as they flow through the ref storage backend. This helps debug discrepancies between different ref backends. Example: $ GIT_TRACE_REFS="1" ./git branch 15:42:09.769631 refs/debug.c:26 ref_store for .git 15:42:09.769681 refs/debug.c:249 read_raw_ref: HEAD: 0000000000000000000000000000000000000000 (=> refs/heads/ref-debug) type 1: 0 15:42:09.769695 refs/debug.c:249 read_raw_ref: refs/heads/ref-debug: 3a238e539bcdfe3f9eb5010fd218640c1b499f7a (=> refs/heads/ref-debug) type 0: 0 15:42:09.770282 refs/debug.c:233 ref_iterator_begin: refs/heads/ (0x1) 15:42:09.770290 refs/debug.c:189 iterator_advance: refs/heads/b4 (0) 15:42:09.770295 refs/debug.c:189 iterator_advance: refs/heads/branch3 (0) Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-08refs: move REF_LOG_ONLY to refs-internal.hHan-Wen Nienhuys2-7/+7
REF_LOG_ONLY is used in the transaction preparation: if a symref is involved in a transaction, the referent of the symref should be updated, and the symref itself should only be updated in the reflog. Other ref backends will need to duplicate this logic too, so move it to a central place. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-19refs: move gitdir into base ref_storeHan-Wen Nienhuys3-9/+10
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-19refs: fix comment about submodule ref_storesHan-Wen Nienhuys1-1/+1
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-19refs: split off reading loose ref data in separate functionHan-Wen Nienhuys2-15/+25
This prepares for handling FETCH_HEAD (which is not a regular ref) separately from the ref backend. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-31refs: move the logic to add \t to reflog to the files backendHan-Wen Nienhuys1-1/+3
523fa69c (reflog: cleanse messages in the refs.c layer, 2020-07-10) centralized reflog normalizaton. However, the normalizaton added a leading "\t" to the message. This is an artifact of the reflog storage format in the files backend, so it should be added there. Routines that parse back the reflog (such as grab_nth_branch_switch) expect the "\t" to not be in the message, so without this fix, git with reftable cannot process the "@{-1}" syntax. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-10reflog: cleanse messages in the refs.c layerJunio C Hamano2-7/+1
Regarding reflog messages: - We expect that a reflog message consists of a single line. The file format used by the files backend may add a LF after the message as a delimiter, and output by commands like "git log -g" may complete such an incomplete line by adding a LF at the end, but philosophically, the terminating LF is not a part of the message. - We however allow callers of refs API to supply a random sequence of NUL terminated bytes. We cleanse caller-supplied message by squashing a run of whitespaces into a SP, and by trimming trailing whitespace, before storing the message. This is how we tolerate, instead of erring out, a message with LF in it (be it at the end, in the middle, or both). Currently, the cleansing of the reflog message is done by the files backend, before the log is written out. This is sufficient with the current code, as that is the only backend that writes reflogs. But new backends can be added that write reflogs, and we'd want the resulting log message we would read out of "log -g" the same no matter what backend is used, and moving the code to do so to the generic layer is a way to do so. An added benefit is that the "cleansing" function could be updated later, independent from individual backends, to e.g. allow multi-line log messages if we wanted to, and when that happens, it would help a lot to ensure we covered all bases if the cleansing function (which would be updated) is called from the generic layer. Side note: I am not interested in supporting multi-line reflog messages right at the moment (nobody is asking for it), but I envision that instead of the "squash a run of whitespaces into a SP and rtrim" cleansing, we can %urlencode problematic bytes in the message *AND* append a SP at the end, when a new version of Git that supports multi-line and/or verbatim reflog messages writes a reflog record. The reading side can detect the presense of SP at the end (which should have been rtrimmed out if it were written by existing versions of Git) as a signal that decoding %urlencode recovers the original reflog message. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-20refs: improve documentation for ref iteratorHan-Wen Nienhuys1-3/+15
Document some of the flag options in refs_ref_iterator_begin, and explain how ref_iterator_advance_fn should handle them. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30refs: fix segfault when aborting empty transactionPatrick Steinhardt1-8/+10
When cleaning up a transaction that has no updates queued, then the transaction's backend data will not have been allocated. We correctly handle this for the packed backend, where the cleanup function checks whether the backend data has been allocated at all -- if not, then there is nothing to clean up. For the files backend we do not check this and as a result will hit a segfault due to dereferencing a `NULL` pointer when cleaning up such a transaction. Fix the issue by checking whether `backend_data` is set in the files backend, too. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31C: use skip_prefix() to avoid hardcoded string lengthJunio C Hamano1-2/+1
We often skip an optional prefix in a string with a hardcoded constant, e.g. if (starts_with(string, "prefix")) string += 6; which is less error prone when written skip_prefix(string, "prefix", &string); Note that this changes a few error messages from "git reflog expire --expire=nonsense.timestamp", which used to complain by saying '--expire=nonsense.timestamp' is not a valid timestamp but with this change, we say 'nonsense.timestamp' is not a valid timestamp which is more technically correct (the string with --expire= as a prefix obviously cannot be a valid timestamp, but the error is about the part of the input without that prefix). Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-11refs: pass NULL to refs_read_ref_full() because object ID is not neededRené Scharfe1-2/+2
refs_read_ref_full() wraps refs_resolve_ref_unsafe(), which handles a NULL oid pointer of callers not interested in the resolved object ID. Pass NULL from files_copy_or_rename_ref() to clarify that it is one such caller. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-10Fix spelling errors in code commentsElijah Newren1-1/+1
Reported-by: Jens Schleusener <Jens.Schleusener@fossies.org> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-22Merge branch 'sc/pack-refs-deletion-racefix'Junio C Hamano1-7/+16
"git pack-refs" can lose refs that are created while running, which is getting corrected. * sc/pack-refs-deletion-racefix: pack-refs: always refresh after taking the lock file
2019-08-07dir-iterator: release strbuf after useRené Scharfe1-1/+3
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-02pack-refs: always refresh after taking the lock fileSun Chao1-7/+16
When a packed ref is deleted, the whole packed-refs file is rewritten to omit the ref that no longer exists. However if another gc command is running and calls `pack-refs --all` simultaneously, there is a chance that a ref that was just updated lose the newly created commits. Through these steps, losing commits on newly updated refs can be demonstrated: # step 1: compile git without `USE_NSEC` option Some kernel releases do enable it by default while some do not. And if we compile git without `USE_NSEC`, it will be easier demonstrated by the following steps. # step 2: setup a repository and add the first commit git init repo && (cd repo && git config core.logallrefupdates true && git commit --allow-empty -m foo) # step 3: in one terminal, repack the refs repeatedly cd repo && while true do git pack-refs --all done # step 4: in another terminal, simultaneously update the # master with update-ref, and create and delete an # unrelated ref also with update-ref cd repo && while true do us=$(git commit-tree -m foo -p HEAD HEAD^{tree}) && git update-ref refs/heads/newbranch $us && git update-ref refs/heads/master $us && git update-ref -d refs/heads/newbranch && them=$(git rev-parse master) && if test "$them" != "$us" then echo >&2 "lost commit: $us" exit 1 fi # eye candy printf . done Though we have the packed-refs lock file and loose refs lock files to avoid updating conflicts, a ref will lost its newly commits if racy stat-validity of `packed-refs` file happens (which is quite same as the racy-git described in `Documentation/technical/racy-git.txt`), the following specific set of operations demonstrates the problem: 1. Call `pack-refs --all` to pack all the loose refs to packed-refs, and let say the modify time of the packed-refs is DATE_M. 2. Call `update-ref` to update a new commit to master while it is already packed. the old value (let us call it OID_A) remains in the packed-refs file and write the new value (let us call it OID_B) to $GIT_DIR/refs/heads/master. 3. Call `update-ref -d` within the same DATE_M from the 1th step to delete a different ref newbranch which is packed in the packed-refs file. It check newbranch's oid from packed-refs file without locking it. Meanwhile it keeps a snapshot of the packed-refs file in memory and record the file's attributes with the snapshot. The oid of master in the packed-refs's snapshot is OID_A. 4. Call a new `pack-refs --all` to pack the loose refs, the oid of master in packe-refs file is OID_B, and the loose refs $GIT_DIR/refs/heads/master is removed. Let's say the `pack-refs --all` is very quickly done and the new packed-refs file's modify time is still DATE_M, and it has the same file size, even the same inode. 5. 3th step now goes on after checking the newbranch, it begin to rewrite the packed-refs file. After get the lock file of packed-ref file, it checks it's on-disk file attributes with the snapshot, suck as the timestamp, the file size and the inode value. If they are both the same values, and the snapshot is not refreshed. Because the loose ref of master is removed by 4th step, `update-ref -d` will updates the new packed-ref to disk which contains master with the oid OID_A. So now the newly commit OID_B of master is lost. The best path forward is just always refreshing after take the lock file of `packed-refs` file. Traditionally we avoided that because refreshing it implied parsing the whole file. But these days we mmap it, so it really is just an extra open()/mmap() and a quick read of the header. That doesn't seem like an outrageous cost to pay when we're already taking the lock. Signed-off-by: Sun Chao <sunchao9@huawei.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Sun Chao <sunchao9@huawei.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-11dir-iterator: add flags parameter to dir_iterator_beginMatheus Tavares1-1/+1
Add the possibility of giving flags to dir_iterator_begin to initialize a dir-iterator with special options. Currently possible flags are: - DIR_ITERATOR_PEDANTIC, which makes dir_iterator_advance abort immediately in the case of an error, instead of keep looking for the next valid entry; - DIR_ITERATOR_FOLLOW_SYMLINKS, which makes the iterator follow symlinks and include linked directories' contents in the iteration. These new flags will be used in a subsequent patch. Also add tests for the flags' usage and adjust refs/files-backend.c to the new dir_iterator_begin signature. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-11dir-iterator: refactor state machine modelMatheus Tavares1-4/+13
dir_iterator_advance() is a large function with two nested loops. Let's improve its readability factoring out three functions and simplifying its mechanics. The refactored model will no longer depend on level.initialized and level.dir_state to keep track of the iteration state and will perform on a single loop. Also, dir_iterator_begin() currently does not check if the given string represents a valid directory path. Since the refactored model will have to stat() the given path at initialization, let's also check for this kind of error and make dir_iterator_begin() return NULL, on failures, with errno appropriately set. And add tests for this new behavior. Improve documentation at dir-iteration.h and code comments at dir-iterator.c to reflect the changes and eliminate possible ambiguities. Finally, adjust refs/files-backend.c to check for now possible dir_iterator_begin() failures. Original-patch-by: Daniel Ferreira <bnmvco@gmail.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-16Merge branch 'jk/refs-double-abort'Junio C Hamano1-1/+15
A corner case bug in the refs API has been corrected. * jk/refs-double-abort: refs/files-backend: don't look at an aborted transaction refs/files-backend: handle packed transaction prepare failure
2019-04-10Merge branch 'nd/rewritten-ref-is-per-worktree'Junio C Hamano1-22/+28
"git rebase" uses the refs/rewritten/ hierarchy to store its intermediate states, which inherently makes the hierarchy per worktree, but it didn't quite work well. * nd/rewritten-ref-is-per-worktree: Make sure refs/rewritten/ is per-worktree files-backend.c: reduce duplication in add_per_worktree_entries_to_dir() files-backend.c: factor out per-worktree code in loose_fill_ref_dir()
2019-03-22refs/files-backend: don't look at an aborted transactionJeff King1-1/+5
When deleting refs, we hold packed-refs.lock and prepare a packed transaction to drop the refs from the packed-refs file. If it turns out that we don't need to rewrite the packed refs (e.g., because none of the deletions were present in the file), then we abort the transaction. If that abort succeeds, then the transaction struct will have been freed, and we set our local pointer to NULL so we don't look at it again. However, if it fails, then the struct will _still_ have been freed (because ref_transaction_abort() always frees). But we don't clean up the pointer, and will jump to our cleanup code, which will try to abort it again, causing a use-after-free. It's actually impossible for this to trigger in practice, since packed_transaction_abort() will never return anything but success. But let's fix it anyway, since that's more than we should assume about the packed-refs code (after all, we are already bothering to check for an error result which cannot be triggered). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-22refs/files-backend: handle packed transaction prepare failureJeff King1-0/+10
In files_transaction_prepare(), if we have to delete some refs, we use a subordinate packed_transaction to do so. It's rare for that sub-transaction's prepare step to fail, since we hold the packed-refs lock. But if it does, we trigger a BUG() due to these steps: - we've attached the packed transaction to the files transaction as backend_data->packed_transaction - when the prepare step fails, the packed transaction cleans itself up, putting itself into the CLOSED state - the error value from preparing the packed transaction lets us know in files_transaction_prepare() that we should also clean up and return an error. We call files_transaction_cleanup(), which tries to abort backend_data->packed_transaction. Since it's already CLOSED, that triggers an assertion in ref_transaction_abort(). We can fix that by disconnecting the packed transaction from the outer files transaction, and then free-ing (not aborting!) it ourselves. A few other options/alternatives I considered: - we could just make it a noop to abort a CLOSED transaction. But that seems less safe, since clearly this code expects (and enforces) a particular set of state transitions. - we could have files_transaction_cleanup() selectively call abort() vs free() based on the state of the on the packed transaction. That's basically a more restricted version of the above, but also potentially unsafe. - instead of disconnecting backend_data->packed_transaction on error, we could wait to install it until we successfully prepare. That might make the flow a little simpler, but it introduces a hassle. Earlier parts of files_transaction_prepare() that encounter an error will jump to the cleanup label, and expect that cleaning up the outer transaction will clean up the packed transaction, too. We'd have to adjust those sites to clean up the packed transaction. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-08Make sure refs/rewritten/ is per-worktreeNguyễn Thái Ngọc Duy1-2/+2
a9be29c981 (sequencer: make refs generated by the `label` command worktree-local, 2018-04-25) adds refs/rewritten/ as per-worktree reference space. Unfortunately (my bad) there are a couple places that need update to make sure it's really per-worktree. - add_per_worktree_entries_to_dir() is updated to make sure ref listing look at per-worktree refs/rewritten/ instead of per-repo one [1] - common_list[] is updated so that git_path() returns the correct location. This includes "rev-parse --git-path". This mess is created by me. I started trying to fix it with the introduction of refs/worktree, where all refs will be per-worktree without special treatments. Unfortunate refs/rewritten came before refs/worktree so this is all we can do. This also fixes logs/refs/worktree not being per-worktree. [1] note that ref listing still works sometimes. For example, if you have .git/worktrees/foo/refs/rewritten/bar AND the directory .git/worktrees/refs/rewritten, refs/rewritten/bar will show up. add_per_worktree_entries_to_dir() is only needed when the directory .git/worktrees/refs/rewritten is missing. Reported-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-08files-backend.c: reduce duplication in add_per_worktree_entries_to_dir()Nguyễn Thái Ngọc Duy1-11/+11
This function is duplicated to handle refs/bisect/ and refs/worktree/ and a third prefix is coming. Time to clean up. This also fixes incorrect "refs/worktrees/" length in this code. The correct length is 14 not 11. The test in the next patch will also cover this. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-08files-backend.c: factor out per-worktree code in loose_fill_ref_dir()Nguyễn Thái Ngọc Duy1-22/+28
This is the first step for further cleaning up and extending this function. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-14files-backend: drop refs parameter from split_symref_update()Jeff King1-3/+2
This parameter was added in fcc42ea0c9 (split_symref_update(): add a files_ref_store argument, 2016-09-04) without comment, but never used. The splitting is purely mechanical, and doesn't depend on the particular ref-store. Let's drop this parameter in the name of simplicity. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-26Merge branch 'nd/per-worktree-ref-iteration'Junio C Hamano1-1/+2
Build fix. * nd/per-worktree-ref-iteration: files-backend.c: fix build error on Solaris
2018-11-26files-backend.c: fix build error on SolarisNguyễn Thái Ngọc Duy1-1/+2
This function files_reflog_path returns void, which usually means "return;" not returning "void value" from another function. Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-13Merge branch 'nd/per-worktree-ref-iteration'Junio C Hamano1-3/+39
The code to traverse objects for reachability, used to decide what objects are unreferenced and expendable, have been taught to also consider per-worktree refs of other worktrees as starting points to prevent data loss. * nd/per-worktree-ref-iteration: git-worktree.txt: correct linkgit command name reflog expire: cover reflog from all worktrees fsck: check HEAD and reflog from other worktrees fsck: move fsck_head_link() to get_default_heads() to avoid some globals revision.c: better error reporting on ref from different worktrees revision.c: correct a parameter name refs: new ref types to make per-worktree refs visible to all worktrees Add a place for (not) sharing stuff between worktrees refs.c: indent with tabs, not spaces