aboutsummaryrefslogtreecommitdiffstats
path: root/list-objects.c
AgeCommit message (Collapse)AuthorFilesLines
2024-03-28Merge branch 'eb/hash-transition'Junio C Hamano1-1/+1
Work to support a repository that work with both SHA-1 and SHA-256 hash algorithms has started. * eb/hash-transition: (30 commits) t1016-compatObjectFormat: add tests to verify the conversion between objects t1006: test oid compatibility with cat-file t1006: rename sha1 to oid test-lib: compute the compatibility hash so tests may use it builtin/ls-tree: let the oid determine the output algorithm object-file: handle compat objects in check_object_signature tree-walk: init_tree_desc take an oid to get the hash algorithm builtin/cat-file: let the oid determine the output algorithm rev-parse: add an --output-object-format parameter repository: implement extensions.compatObjectFormat object-file: update object_info_extended to reencode objects object-file-convert: convert commits that embed signed tags object-file-convert: convert commit objects when writing object-file-convert: don't leak when converting tag objects object-file-convert: convert tag objects when writing object-file-convert: add a function to convert trees between algorithms object: factor out parse_mode out of fast-import and tree-walk into in object.h cache: add a function to read an OID of a specific algorithm tag: sign both hashes commit: export add_header_signature to support handling signatures on tags ...
2023-11-08Merge branch 'tb/rev-list-unpacked-fix'Junio C Hamano1-0/+3
"git rev-list --unpacked --objects" failed to exclude packed non-commit objects, which has been corrected. * tb/rev-list-unpacked-fix: pack-bitmap: drop --unpacked non-commit objects from results list-objects: drop --unpacked non-commit objects from results
2023-11-07list-objects: drop --unpacked non-commit objects from resultsTaylor Blau1-0/+3
In git-rev-list(1), we describe the `--unpacked` option as: Only useful with `--objects`; print the object IDs that are not in packs. This is true of commits, which we discard via get_commit_action(), but not of the objects they reach. So if we ask for an --objects traversal with --unpacked, we may get arbitrarily many objects which are indeed packed. I am nearly certain this behavior dates back to the introduction of `--unpacked` via 12d2a18780 ("git rev-list --unpacked" shows only unpacked commits, 2005-07-03), but I couldn't get that revision of Git to compile for me. At least as early as v2.0.0 this has been subtly broken: $ git.compile --version git version 2.0.0 $ git.compile rev-list --objects --all --unpacked 72791fe96c93f9ec5c311b8bc966ab349b3b5bbe 05713d991c18bbeef7e154f99660005311b5004d v1.0 153ed8b7719c6f5a68ce7ffc43133e95a6ac0fdb 8e4020bb5a8d8c873b25de15933e75cc0fc275df one 9200b628cf9dc883a85a7abc8d6e6730baee589c two 3e6b46e1b7e3b91acce99f6a823104c28aae0b58 unpacked.t There, only the first, third, and sixth entries are loose, with the remaining set of objects belonging to at least one pack. The implications for this are relatively benign: bare 'git repack' invocations which invoke pack-objects with --unpacked are impacted, and at worst we'll store a few extra objects that should have been excluded. Arguably changing this behavior is a backwards-incompatible change, since it alters the set of objects emitted from rev-list queries with `--objects` and `--unpacked`. But I argue that this change is still sensible, since the existing implementation deviates from clearly-written documentation. The fix here is straightforward: avoid showing any non-commit objects which are contained in packs by discarding them within list-objects.c, before they are shown to the user. Note that similar treatment for `list-objects.c::show_commit()` is not needed, since that case is already handled by `revision.c::get_commit_action()`. Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-01rev-list: add commit object support in `--missing` optionKarthik Nayak1-0/+3
The `--missing` object option in rev-list currently works only with missing blobs/trees. For missing commits the revision walker fails with a fatal error. Let's extend the functionality of `--missing` option to also support commit objects. This is done by adding a `missing_objects` field to `rev_info`. This field is an `oidset` to which we'll add the missing commits as we encounter them. The revision walker will now continue the traversal and call `show_commit()` even for missing commits. In rev-list we can then check if the commit is a missing commit and call the existing code for parsing `--missing` objects. A scenario where this option would be used is to find the boundary objects between different object directories. Consider a repository with a main object directory (GIT_OBJECT_DIRECTORY) and one or more alternate object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES). In such a repository, using the `--missing=print` option while disabling the alternate object directory allows us to find the boundary objects between the main and alternate object directory. Helped-by: Patrick Steinhardt <ps@pks.im> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-01revision: rename bit to `do_not_die_on_missing_objects`Karthik Nayak1-1/+1
The bit `do_not_die_on_missing_tree` is used in revision.h to ensure the revision walker does not die when encountering a missing tree. This is currently exclusively set within `builtin/rev-list.c` to ensure the `--missing` option works with missing trees. In the upcoming commits, we will extend `--missing` to also support missing commits. So let's rename the bit to `do_not_die_on_missing_objects`, which is object type agnostic and can be used for both trees/commits. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02tree-walk: init_tree_desc take an oid to get the hash algorithmEric W. Biederman1-1/+1
To make it possible for git ls-tree to display the tree encoded in the hash algorithm of the oid specified to git ls-tree, update init_tree_desc to take as a parameter the oid of the tree object. Update all callers of init_tree_desc and init_tree_desc_gently to pass the oid of the tree object. Use the oid of the tree object to discover the hash algorithm of the oid and store that hash algorithm in struct tree_desc. Use the hash algorithm in decode_tree_entry and update_tree_entry_internal to handle reading a tree object encoded in a hash algorithm that differs from the repositories hash algorithm. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-31list-objects: respect max_allowed_tree_depthJeff King1-0/+8
The tree traversal in list-objects.c, which is used by "rev-list --objects", etc, uses recursion and may run out of stack space. Let's teach it about the new core.maxTreeDepth config option. We unfortunately can't return an error here, as this code doesn't produce an error return at all. We'll die() instead, which matches the behavior when we see an otherwise broken tree. Note that this will also generally reject such deep trees from entering the repository from a fetch or push, due to the use of rev-list in the connectivity check. But it's not foolproof! We stop traversing when we see an UNINTERESTING object, and the connectivity check marks existing ref tips as UNINTERESTING. So imagine commit X has a tree with maximum depth N. If you then create a new commit Y with a tree entry "Y:subdir" that points to "X^{tree}", then the depth of Y will be N+1. But a connectivity check running "git rev-list --objects Y --not X" won't realize that; it will stop traversing at X^{tree}, since that was already reachable. So this will stop naive pushes of too-deep trees, but not carefully crafted malicious ones. Doing it robustly and efficiently would require caching the maximum depth of each tree (i.e., the longest path to any leaf entry). That's much more complex and not strictly needed. If each recursive algorithm limits itself already, then that's sufficient. Blocking the objects from entering the repo would be a nice belt-and-suspenders addition, but it's not worth the extra cost. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-02Merge branch 'jc/tree-walk-drop-base-offset'Junio C Hamano1-1/+1
Code simplification. * jc/tree-walk-drop-base-offset: tree-walk: drop unused base_offset from do_match() tree-walk: lose base_offset that is never used in tree_entry_interesting
2023-07-07tree-walk: lose base_offset that is never used in tree_entry_interestingJunio C Hamano1-1/+1
The tree_entry_interesting() function takes base_offset, allowing its callers to potentially pass a non-zero number to skip the early part of the path string. The feature is never exercised and we do not even know what bugs are lurking there, as all callers pass 0 to the parameter. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21object-store-ll.h: split this header out of object-store.hElijah Newren1-1/+1
The vast majority of files including object-store.h did not need dir.h nor khash.h. Split the header into two files, and let most just depend upon object-store-ll.h, while letting the two callers that need it depend on the full object-store.h. After this patch: $ git grep -h include..object-store | sort | uniq -c 2 #include "object-store.h" 129 #include "object-store-ll.h" Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: remove cache.h inclusion due to object.h changesElijah Newren1-1/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-04Merge branch 'ab/remove-implicit-use-of-the-repository' into ↵Junio C Hamano1-8/+12
en/header-split-cache-h * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_* migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"
2023-03-28cocci: apply the "object-store.h" part of "the_repository.pending"Ævar Arnfjörð Bjarmason1-1/+1
Apply the part of "the_repository.pending.cocci" pertaining to "object-store.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28cocci: apply the "commit.h" part of "the_repository.pending"Ævar Arnfjörð Bjarmason1-7/+11
Apply the part of "the_repository.pending.cocci" pertaining to "commit.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: be explicit about dependence on gettext.hElijah Newren1-0/+1
Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23cache.h: remove dependence on hex.h; make other files include it explicitlyElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-13list-objects: drop process_gitlink() functionJeff King1-32/+1
Our object graph traversal code has a process_gitlink() function which we call when we see a gitlink entry. The function does nothing; it was added in the early days of gitlinks by 6e2f441bd4 (Teach git list-objects logic to not follow gitlinks, 2007-04-13). The comment above the function talks about some things we _could_ do. But in the intervening 15 years, nobody has touched the function, and the submodule code usually makes its own decisions about when and how to examine the links. At the generic traversal layer, we can't assume that the pointed-to commit is available. Let's drop this placeholder that isn't really helping anything. This silences some -Wunused-parameter warnings, and also gets rid of a crufty use of "const unsigned char *" to pass a raw hash value. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-09list-objects: handle NULL function pointersÆvar Arnfjörð Bjarmason1-5/+22
If a caller to traverse_commit_list() specifies the options for the --objects flag but does not specify a show_object function pointer, the result is a segfault. This is currently visible by running 'git bundle create --objects HEAD'. We could fix this problem by supplying a no-op callback in builtin/bundle.c, but that only solves the problem for one builtin, leaving this segfault open for other callers. Replace all callers of the show_commit and show_object function pointers in list-objects.c to call helper functions show_commit() and show_object() which check that the given context has non-NULL functions before passing the necessary data. One extra benefit is that it reduces duplication due to passing ctx->show_data to every caller. Test that this segfault no longer occurs for 'git bundle'. Co-authored-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-09list-objects: consolidate traverse_commit_list[_filtered]Derrick Stolee1-22/+12
Now that all consumers of traverse_commit_list_filtered() populate the 'filter' member of 'struct rev_info', we can drop that parameter from the method prototype to simplify things. In addition, the only thing different now between traverse_commit_list_filtered() and traverse_commit_list() is the presence of the 'omitted' parameter, which is only non-NULL for one caller. We can consolidate these two methods by having one call the other and use the simpler form everywhere the 'omitted' parameter would be NULL. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-12list-objects.c: rename "traverse_trees_and_blobs" to "traverse_non_commits"Teng Long1-4/+4
Function `traverse_trees_and_blobs` not only works on trees and blobs, but also on tags, the function name is somewhat misleading. This commit rename it to `traverse_non_commits`. Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-15bitmaps: don't recurse into trees already in the bitmapJeff King1-0/+3
If an object is already mentioned in a reachability bitmap we are building, then by definition so are all of the objects it can reach. We have an optimization to stop traversing commits when we see they are already in the bitmap, but we don't do the same for trees. It's generally unavoidable to recurse into trees for commits not yet covered by bitmaps (since most commits generally do have unique top-level trees). But they usually have subtrees that are shared with other commits (i.e., all of the subtrees the commit _didn't_ touch). And some of those commits (and their trees) may be covered by the bitmap. Usually this isn't _too_ big a deal, because we'll visit those subtrees only once in total for the whole walk. But if you have a large number of unbitmapped commits, and if your tree is big, then you may end up opening a lot of sub-trees for no good reason. We can use the same optimization we do for commits here: when we are about to open a tree, see if it's in the bitmap (either the one we are building, or the "seen" bitmap which covers the UNINTERESTING side of the bitmap when doing a set-difference). This works especially well because we'll visit all commits before hitting any trees. So even in a history like: A -- B if "A" has a bitmap on disk but "B" doesn't, we'll already have OR-ed in the results from A before looking at B's tree (so we really will only look at trees touched by B). For most repositories, the timings produced by p5310 are unspectacular. Here's linux.git: Test HEAD^ HEAD -------------------------------------------------------------------- 5310.4: simulated clone 6.00(5.90+0.10) 5.98(5.90+0.08) -0.3% 5310.5: simulated fetch 2.98(5.45+0.18) 2.85(5.31+0.18) -4.4% 5310.7: rev-list (commits) 0.32(0.29+0.03) 0.33(0.30+0.03) +3.1% 5310.8: rev-list (objects) 1.48(1.44+0.03) 1.49(1.44+0.05) +0.7% Any improvement there is within the noise (the +3.1% on test 7 has to be noise, since we are not recursing into trees, and thus the new code isn't even run). The results for git.git are likewise uninteresting. But here are numbers from some other real-world repositories (that are not public). This one's tree is comparable in size to linux.git, but has ~16k refs (and so less complete bitmap coverage): Test HEAD^ HEAD ------------------------------------------------------------------------- 5310.4: simulated clone 38.34(39.86+0.74) 33.95(35.53+0.76) -11.5% 5310.5: simulated fetch 2.29(6.31+0.35) 2.20(5.97+0.41) -3.9% 5310.7: rev-list (commits) 0.99(0.86+0.13) 0.96(0.85+0.11) -3.0% 5310.8: rev-list (objects) 11.32(11.04+0.27) 6.59(6.37+0.21) -41.8% And here's another with a very large tree (~340k entries), and a fairly large number of refs (~10k): Test HEAD^ HEAD ------------------------------------------------------------------------- 5310.3: simulated clone 53.83(54.71+1.54) 39.77(40.76+1.50) -26.1% 5310.4: simulated fetch 19.91(20.11+0.56) 19.79(19.98+0.67) -0.6% 5310.6: rev-list (commits) 0.54(0.44+0.11) 0.51(0.43+0.07) -5.6% 5310.7: rev-list (objects) 24.32(23.59+0.73) 9.85(9.49+0.36) -59.5% This patch provides substantial improvements in these larger cases, and have any drawbacks for smaller ones (the cost of the bitmap check is quite small compared to an actual tree traversal). Note that we have to add a version of revision.c's include_check callback which handles non-commits. We could possibly consolidate this into a single callback for all objects types, as there's only one user of the feature which would need converted (pack-bitmap.c:should_include). That would in theory let us avoid duplicating any logic. But when I tried it, the code ended up much worse to read, with lots of repeated "if it's a commit do this, otherwise do that". Having two separate callbacks splits that naturally, and matches the existing split of show_commit/show_object callbacks. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-12list-objects: support filtering by tag and commitPatrick Steinhardt1-3/+20
Object filters currently only support filtering blobs or trees based on some criteria. This commit lays the foundation to also allow filtering of tags and commits. No change in behaviour is expected from this commit given that there are no filters yet for those object types. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-10list-objects: move tag processing into its own functionPatrick Steinhardt1-2/+9
Move processing of tags into its own function to make the logic easier to extend when we're going to implement filtering for tags. No change in behaviour is expected from this commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07Merge branch 'jk/list-objects-optim-wo-trees'Junio C Hamano1-1/+3
The object traversal machinery has been optimized not to load tree objects when we are only interested in commit history. * jk/list-objects-optim-wo-trees: list-objects: don't queue root trees unless revs->tree_objects is set
2019-09-12list-objects: don't queue root trees unless revs->tree_objects is setJeff King1-1/+3
When traverse_commit_list() processes each commit, it queues the commit's root tree in the pending array. Then, after all commits are processed, it calls traverse_trees_and_blobs() to walk over the pending list, calling process_tree() on each. But if revs->tree_objects is not set, process_tree() just exists immediately! We can save ourselves some work by not even bothering to queue these trees in the first place. There are a few subtle points to make: - we also detect commits with a NULL tree pointer here. But this isn't an interesting check for broken commits, since the lookup_tree() we'd have done during commit parsing doesn't actually check that we have the tree on disk. So we're not losing any robustness. - besides queueing, we also set the NOT_USER_GIVEN flag on the tree object. This is used by the traverse_commit_list_filtered() variant. But if we're not exploring trees, then we won't actually care about this flag, which is used only inside process_tree() code-paths. - queueing trees eventually leads to us queueing blobs, too. But we don't need to check revs->blob_objects here. Even in the current code, we still wouldn't find those blobs, because we'd never open up the tree objects to list their contents. - the user-visible impact to the caller is minimal. The pending trees are all cleared by the time the function returns anyway, by traverse_trees_and_blobs(). We do call a show_commit() callback, which technically could be looking at revs->pending during the callback. But it seems like a rather unlikely thing to do (if you want the tree of the current commit, then accessing the tree struct member is a lot simpler). So this should be safe to do. Let's look at the benefits: [before] Benchmark #1: git -C linux rev-list HEAD >/dev/null Time (mean ± σ): 7.651 s ± 0.021 s [User: 7.399 s, System: 0.252 s] Range (min … max): 7.607 s … 7.683 s 10 runs [after] Benchmark #1: git -C linux rev-list HEAD >/dev/null Time (mean ± σ): 7.593 s ± 0.023 s [User: 7.329 s, System: 0.264 s] Range (min … max): 7.565 s … 7.634 s 10 runs Not too impressive, but then we're really just avoiding sticking a pointer into a growable array. But still, I'll take a free 0.75% speedup. Let's try it after running "git commit-graph write": [before] Benchmark #1: git -C linux rev-list HEAD >/dev/null Time (mean ± σ): 1.458 s ± 0.011 s [User: 1.199 s, System: 0.259 s] Range (min … max): 1.447 s … 1.481 s 10 runs [after] Benchmark #1: git -C linux rev-list HEAD >/dev/null Time (mean ± σ): 1.126 s ± 0.023 s [User: 896.5 ms, System: 229.0 ms] Range (min … max): 1.106 s … 1.181 s 10 runs Now that's more like it. We saved over 22% of the total time. Part of that is because the runtime is shorter overall, but the absolute improvement is also much larger. What's going on? When we fill in a commit struct using the commit graph, we don't bother to set the tree pointer, and instead lazy-load it when somebody calls get_commit_tree(). So we're not only skipping the pointer write to the pending queue, but we're skipping the lazy-load of the tree entirely. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-28list-objects-filter: encapsulate filter componentsMatthew DeVore1-33/+22
Encapsulate filter_fn, filter_free_fn, and filter_data into their own opaque struct. Due to opaqueness, filter_fn and filter_free_fn can no longer be accessed directly by users. Currently, all usages of filter_fn are guarded by a necessary check: (obj->flags & NOT_USER_GIVEN) && filter_fn Take the opportunity to include this check into the new function list_objects_filter__filter_object(), so that we no longer need to write this check at every caller of the filter function. Also, the init functions in list-objects-filter.c no longer need to confusingly return the filter constituents in various places (filter_fn and filter_free_fn as out parameters, and filter_data as the function's return value); they can just initialize the "struct filter" passed in. Helped-by: Jeff Hostetler <git@jeffhostetler.com> Helped-by: Jonathan Tan <jonathantanmy@google.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-10rev-list: detect broken root treesJeff King1-0/+3
When the traversal machinery sees a commit without a root tree, it assumes that the tree was part of a BOUNDARY commit, and quietly ignores the tree. But it could also be caused by a commit whose root tree is broken or missing. Instead, let's die() when we see a NULL root tree. We can differentiate it from the BOUNDARY case by seeing if the commit was actually parsed. This covers that case, plus future-proofs us against any others where we might try to show an unparsed commit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-10list-objects.c: handle unexpected non-tree entriesTaylor Blau1-0/+5
Apply similar treatment as the previous commit for non-tree entries, too. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-10list-objects.c: handle unexpected non-blob entriesTaylor Blau1-0/+5
Fix one of the cases described in the previous commit where a tree-entry that is promised to a blob is in fact a non-blob. When 'lookup_blob()' returns NULL, it is because Git has cached the requested object as a non-blob. In this case, prevent a SIGSEGV by 'die()'-ing immediately before attempting to dereference the result. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-06Merge branch 'ds/push-sparse-tree-walk'Junio C Hamano1-11/+59
"git pack-objects" learned another algorithm to compute the set of objects to send, that trades the resulting packfile off to save traversal cost to favor small pushes. * ds/push-sparse-tree-walk: pack-objects: create GIT_TEST_PACK_SPARSE pack-objects: create pack.useSparse setting revision: implement sparse algorithm list-objects: consume sparse tree walk revision: add mark_tree_uninteresting_sparse
2019-01-29Merge branch 'bc/tree-walk-oid'Junio C Hamano1-3/+3
The code to walk tree objects has been taught that we may be working with object names that are not computed with SHA-1. * bc/tree-walk-oid: cache: make oidcpy always copy GIT_MAX_RAWSZ bytes tree-walk: store object_id in a separate member match-trees: use hashcpy to splice trees match-trees: compute buffer offset correctly when splicing tree-walk: copy object ID before use
2019-01-17list-objects: consume sparse tree walkDerrick Stolee1-11/+59
When creating a pack-file using 'git pack-objects --revs' we provide a list of interesting and uninteresting commits. For example, a push operation would make the local topic branch be interesting and the known remote refs as uninteresting. We want to discover the set of new objects to send to the server as a thin pack. We walk these commits until we discover a frontier of commits such that every commit walk starting at interesting commits ends in a root commit or unintersting commit. We then need to discover which non-commit objects are reachable from uninteresting commits. This commit walk is not changing during this series. The mark_edges_uninteresting() method in list-objects.c iterates on the commit list and does the following: * If the commit is UNINTERSTING, then mark its root tree and every object it can reach as UNINTERESTING. * If the commit is interesting, then mark the root tree of every UNINTERSTING parent (and all objects that tree can reach) as UNINTERSTING. At the very end, we repeat the process on every commit directly given to the revision walk from stdin. This helps ensure we properly cover shallow commits that otherwise were not included in the frontier. The logic to recursively follow trees is in the mark_tree_uninteresting() method in revision.c. The algorithm avoids duplicate work by not recursing into trees that are already marked UNINTERSTING. Add a new 'sparse' option to the mark_edges_uninteresting() method that performs this logic in a slightly different way. As we iterate over the commits, we add all of the root trees to an oidset. Then, call mark_trees_uninteresting_sparse() on that oidset. Note that we include interesting trees in this process. The current implementation of mark_trees_unintersting_sparse() will walk the same trees as the old logic, but this will be replaced in a later change. Add a '--sparse' flag in 'git pack-objects' to call this new logic. Add a new test script t/t5322-pack-objects-sparse.sh that tests this option. The tests currently demonstrate that the resulting object list is the same as the old algorithm. This includes a case where both algorithms pack an object that is not needed by a remote due to limits on the explored set of trees. When the sparse algorithm is changed in a later commit, we will add a test that demonstrates a change of behavior in some cases. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-15tree-walk: store object_id in a separate memberbrian m. carlson1-3/+3
When parsing a tree, we read the object ID directly out of the tree buffer. This is normally fine, but such an object ID cannot be used with oidcpy, which copies GIT_MAX_RAWSZ bytes, because if we are using SHA-1, there may not be that many bytes to copy. Instead, store the object ID in a separate struct member. Since we can no longer efficiently compute the path length, store that information as well in struct name_entry. Ensure we only copy the object ID into the new buffer if the path length is nonzero, as some callers will pass us an empty path with no object ID following it, and we will not want to read past the end of the buffer. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-14Merge branch 'nd/attr-pathspec-in-tree-walk'Junio C Hamano1-1/+2
The traversal over tree objects has learned to honor ":(attr:label)" pathspec match, which has been implemented only for enumerating paths on the filesystem. * nd/attr-pathspec-in-tree-walk: tree-walk: support :(attr) matching dir.c: move, rename and export match_attrs() pathspec.h: clean up "extern" in function declarations tree-walk.c: make tree_entry_interesting() take an index tree.c: make read_tree*() take 'struct repository *'
2018-11-19tree-walk.c: make tree_entry_interesting() take an indexNguyễn Thái Ngọc Duy1-1/+2
In order to support :(attr) when matching pathspec on a tree, tree_entry_interesting() needs to take an index (because git_check_attr() needs it). This is the preparation step for it. This also makes it clearer what index we fall back to when looking up attributes during an unpack-trees operation: the source index. This also fixes revs->pruning.repo initialization that should have been done in 2abf350385 (revision.c: remove implicit dependency on the_index - 2018-09-21). Without it, skip_uninteresting() will dereference a NULL pointer through this call chain get_revision(revs) get_revision_internal get_revision_1 try_to_simplify_commit rev_compare_tree diff_tree_oid(..., &revs->pruning) ll_diff_tree_oid diff_tree_paths ll_diff_tree skip_uninteresting Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-12list-objects.c: reduce the_repository referencesNguyễn Thái Ngọc Duy1-2/+2
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-12list-objects-filter.c: remove implicit dependency on the_indexNguyễn Thái Ngọc Duy1-3/+6
While at there, since we have access to struct repository now, eliminate the only the_repository reference in this file. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-30Merge branch 'md/filter-trees'Junio C Hamano1-110/+125
The "rev-list --filter" feature learned to exclude all trees via "tree:0" filter. * md/filter-trees: list-objects: support for skipping tree traversal filter-trees: code clean-up of tests list-objects-filter: implement filter tree:0 list-objects-filter-options: do not over-strbuf_init list-objects-filter: use BUG rather than die revision: mark non-user-given objects instead rev-list: handle missing tree objects properly list-objects: always parse trees gently list-objects: refactor to process_tree_contents list-objects: store common func args in struct
2018-10-18list-objects: support for skipping tree traversalMatthew DeVore1-1/+4
The tree:0 filter does not need to traverse the trees that it has filtered out, so optimize list-objects and list-objects-filter to skip traversing the trees entirely. Before this patch, we iterated over all children of the tree, and did nothing for all of them, which was wasteful. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07revision: mark non-user-given objects insteadMatthew DeVore1-13/+18
Currently, list-objects.c incorrectly treats all root trees of commits as USER_GIVEN. Also, it would be easier to mark objects that are non-user-given instead of user-given, since the places in the code where we access an object through a reference are more obvious than the places where we access an object that was given by the user. Resolve these two problems by introducing a flag NOT_USER_GIVEN that marks blobs and trees that are non-user-given, replacing USER_GIVEN. (Only blobs and trees are marked because this mark is only used when filtering objects, and filtering of other types of objects is not supported yet.) This fixes a bug in that git rev-list behaved differently from git pack-objects. pack-objects would *not* filter objects given explicitly on the command line and rev-list would filter. This was because the two commands used a different function to add objects to the rev_info struct. This seems to have been an oversight, and pack-objects has the correct behavior, so I added a test to make sure that rev-list now behaves properly. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07rev-list: handle missing tree objects properlyMatthew DeVore1-3/+8
Previously, we assumed only blob objects could be missing. This patch makes rev-list handle missing trees like missing blobs. The --missing=* and --exclude-promisor-objects flags now work for trees as they already do for blobs. This is demonstrated in t6112. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-21revision.c: reduce implicit dependency the_repositoryNguyễn Thái Ngọc Duy1-3/+5
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-15list-objects: always parse trees gentlyMatthew DeVore1-3/+1
If parsing fails when revs->ignore_missing_links and revs->exclude_promisor_objects are both false, we print the OID anyway in the die("bad tree object...") call, so any message printed by parse_tree_gently() is superfluous. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13list-objects: refactor to process_tree_contentsMatthew DeVore1-27/+41
This will be used in a follow-up patch to reduce indentation needed when invoking the logic conditionally. i.e. rather than: if (foo) { while (...) { /* this is very indented */ } } we will have: if (foo) process_tree_contents(...); Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13list-objects: store common func args in structMatthew DeVore1-84/+74
This will make utility functions easier to create, as done by the next patch. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-02Merge branch 'sb/object-store-lookup'Junio C Hamano1-2/+2
lookup_commit_reference() and friends have been updated to find in-core object for a specific in-core repository instance. * sb/object-store-lookup: (32 commits) commit.c: allow lookup_commit_reference to handle arbitrary repositories commit.c: allow lookup_commit_reference_gently to handle arbitrary repositories tag.c: allow deref_tag to handle arbitrary repositories object.c: allow parse_object to handle arbitrary repositories object.c: allow parse_object_buffer to handle arbitrary repositories commit.c: allow get_cached_commit_buffer to handle arbitrary repositories commit.c: allow set_commit_buffer to handle arbitrary repositories commit.c: migrate the commit buffer to the parsed object store commit-slabs: remove realloc counter outside of slab struct commit.c: allow parse_commit_buffer to handle arbitrary repositories tag: allow parse_tag_buffer to handle arbitrary repositories tag: allow lookup_tag to handle arbitrary repositories commit: allow lookup_commit to handle arbitrary repositories tree: allow lookup_tree to handle arbitrary repositories blob: allow lookup_blob to handle arbitrary repositories object: allow lookup_object to handle arbitrary repositories object: allow object_as_type to handle arbitrary repositories tag: add repository argument to deref_tag tag: add repository argument to parse_tag_buffer tag: add repository argument to lookup_tag ...
2018-07-24Merge branch 'jt/partial-clone-fsck-connectivity'Junio C Hamano1-3/+3
Partial clone support of "git clone" has been updated to correctly validate the objects it receives from the other side. The server side has been corrected to send objects that are directly requested, even if they may match the filtering criteria (e.g. when doing a "lazy blob" partial clone). * jt/partial-clone-fsck-connectivity: clone: check connectivity even if clone is partial upload-pack: send refs' objects despite "filter"
2018-07-18Merge branch 'sb/object-store-grafts'Junio C Hamano1-0/+1
The conversion to pass "the_repository" and then "a_repository" throughout the object access API continues. * sb/object-store-grafts: commit: allow lookup_commit_graft to handle arbitrary repositories commit: allow prepare_commit_graft to handle arbitrary repositories shallow: migrate shallow information into the object parser path.c: migrate global git_path_* to take a repository argument cache: convert get_graft_file to handle arbitrary repositories commit: convert read_graft_file to handle arbitrary repositories commit: convert register_commit_graft to handle arbitrary repositories commit: convert commit_graft_pos() to handle arbitrary repositories shallow: add repository argument to is_repository_shallow shallow: add repository argument to check_shallow_file_for_update shallow: add repository argument to register_shallow shallow: add repository argument to set_alternate_shallow_file commit: add repository argument to lookup_commit_graft commit: add repository argument to prepare_commit_graft commit: add repository argument to read_graft_file commit: add repository argument to register_commit_graft commit: add repository argument to commit_graft_pos object: move grafts to object parser object-store: move object access functions to object-store.h
2018-07-09upload-pack: send refs' objects despite "filter"Jonathan Tan1-3/+3
A filter line in a request to upload-pack filters out objects regardless of whether they are directly referenced by a "want" line or not. This means that cloning with "--filter=blob:none" (or another filter that excludes blobs) from a repository with at least one ref pointing to a blob (for example, the Git repository itself) results in output like the following: error: missing object referenced by 'refs/tags/junio-gpg-pub' and if that particular blob is not referenced by a fetched tree, the resulting clone fails fsck because there is no object from the remote to vouch that the missing object is a promisor object. Update both the protocol and the upload-pack implementation to include all explicitly specified "want" objects in the packfile regardless of the filter specification. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29tree: add repository argument to lookup_treeStefan Beller1-1/+1
Add a repository argument to allow the callers of lookup_tree to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29blob: add repository argument to lookup_blobStefan Beller1-1/+1
Add a repository argument to allow the callers of lookup_blob to be more specific about which repository to act on. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29Merge branch 'sb/object-store-grafts' into sb/object-store-lookupJunio C Hamano1-0/+1
* sb/object-store-grafts: commit: allow lookup_commit_graft to handle arbitrary repositories commit: allow prepare_commit_graft to handle arbitrary repositories shallow: migrate shallow information into the object parser path.c: migrate global git_path_* to take a repository argument cache: convert get_graft_file to handle arbitrary repositories commit: convert read_graft_file to handle arbitrary repositories commit: convert register_commit_graft to handle arbitrary repositories commit: convert commit_graft_pos() to handle arbitrary repositories shallow: add repository argument to is_repository_shallow shallow: add repository argument to check_shallow_file_for_update shallow: add repository argument to register_shallow shallow: add repository argument to set_alternate_shallow_file commit: add repository argument to lookup_commit_graft commit: add repository argument to prepare_commit_graft commit: add repository argument to read_graft_file commit: add repository argument to register_commit_graft commit: add repository argument to commit_graft_pos object: move grafts to object parser object-store: move object access functions to object-store.h
2018-05-16object-store: move object access functions to object-store.hStefan Beller1-0/+1
This should make these functions easier to find and cache.h less overwhelming to read. In particular, this moves: - read_object_file - oid_object_info - write_object_file As a result, most of the codebase needs to #include object-store.h. In this patch the #include is only added to files that would fail to compile otherwise. It would be better to #include wherever identifiers from the header are used. That can happen later when we have better tooling for it. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11treewide: replace maybe_tree with accessor methodsDerrick Stolee1-5/+5
In anticipation of making trees load lazily, create a Coccinelle script (contrib/coccinelle/commit.cocci) to ensure that all references to the 'maybe_tree' member of struct commit are either mutations or accesses through get_commit_tree() or get_commit_tree_oid(). Apply the Coccinelle script to create the rest of the patch. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11treewide: rename tree to maybe_treeDerrick Stolee1-5/+5
Using the commit-graph file to walk commit history removes the large cost of parsing commits during the walk. This exposes a performance issue: lookup_tree() takes a large portion of the computation time, even when Git never uses those trees. In anticipation of lazy-loading these trees, rename the 'tree' member of struct commit to 'maybe_tree'. This serves two purposes: it hints at the future role of possibly being NULL even if the commit has a valid tree, and it allows for unambiguous transformation from simple member access (i.e. commit->maybe_tree) to method access. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-13Merge branch 'jh/fsck-promisors'Junio C Hamano1-1/+28
In preparation for implementing narrow/partial clone, the machinery for checking object connectivity used by gc and fsck has been taught that a missing object is OK when it is referenced by a packfile specially marked as coming from trusted repository that promises to make them available on-demand and lazily. * jh/fsck-promisors: gc: do not repack promisor packfiles rev-list: support termination at promisor objects sha1_file: support lazily fetching missing objects introduce fetch-object: fetch one promisor object index-pack: refactor writing of .keep files fsck: support promisor objects as CLI argument fsck: support referenced promisor objects fsck: support refs pointing to promisor objects fsck: introduce partialclone extension extension.partialclone: introduce partial clone extension
2017-12-28Merge branch 'sb/describe-blob'Junio C Hamano1-21/+46
"git describe" was taught to dig trees deeper to find a <commit-ish>:<path> that refers to a given blob object. * sb/describe-blob: builtin/describe.c: describe a blob builtin/describe.c: factor out describe_commit builtin/describe.c: print debug statements earlier builtin/describe.c: rename `oid` to avoid variable shadowing revision.h: introduce blob/tree walking in order of the commits list-objects.c: factor out traverse_trees_and_blobs t6120: fix typo in test name
2017-12-08rev-list: support termination at promisor objectsJonathan Tan1-1/+28
Teach rev-list to support termination of an object traversal at any object from a promisor remote (whether one that the local repo also has, or one that the local repo knows about because it has another promisor object that references it). This will be used subsequently in gc and in the connectivity check used by fetch. For efficiency, if an object is referenced by a promisor object, and is in the local repo only as a non-promisor object, object traversal will not stop there. This is to avoid building the list of promisor object references. (In list-objects.c, the case where obj is NULL in process_blob() and process_tree() do not need to be changed because those happen only when there is a conflict between the expected type and the existing object. If the object doesn't exist, an object will be synthesized, which is fine.) Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-22list-objects: filter objects in traverse_commit_listJeff Hostetler1-16/+79
Create traverse_commit_list_filtered() and add filtering interface to allow certain objects to be omitted from the traversal. Update traverse_commit_list() to be a wrapper for the above with a null filter to minimize the number of callers that needed to be changed. Object filtering will be used in a future commit by rev-list and pack-objects for partial clone and fetch to omit unwanted objects from the result. traverse_bitmap_commit_list() does not work with filtering. If a packfile bitmap is present, it will not be used. It should be possible to extend such support in the future (at least to simple filters that do not require object pathnames), but that is beyond the scope of this patch series. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-16revision.h: introduce blob/tree walking in order of the commitsStefan Beller1-0/+8
The functionality to list tree objects in the order they were seen while traversing the commits will be used in one of the next commits, where we teach `git describe` to describe not only commits, but blobs, too. The change in list-objects.c is rather minimal as we'll be re-using the infrastructure put in place of the revision walking machinery. For example one could expect that add_pending_tree is not called, but rather commit->tree is directly passed to the tree traversal function. This however requires a lot more code than just emptying the queue containing trees after each commit. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-03list-objects.c: factor out traverse_trees_and_blobsStefan Beller1-19/+31
With traverse_trees_and_blobs factored out of the main traverse function, the next patch can introduce an in-order revision walking with ease. In the next patch we'll call `traverse_trees_and_blobs` from within the loop walking the commits, such that we'll have one invocation of that function per commit. That is why we do not want to have memory allocations in that function, such as we'd have if we were to use a strbuf locally. Pass a strbuf from traverse_commit_list into the blob and tree traversing function as a scratch pad that only needs to be allocated once. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08Convert lookup_tree to struct object_idbrian m. carlson1-1/+1
Convert the lookup_tree function to take a pointer to struct object_id. The commit was created with manual changes to tree.c, tree.h, and object.c, plus the following semantic patch: @@ @@ - lookup_tree(EMPTY_TREE_SHA1_BIN) + lookup_tree(&empty_tree_oid) @@ expression E1; @@ - lookup_tree(E1.hash) + lookup_tree(&E1) @@ expression E1; @@ - lookup_tree(E1->hash) + lookup_tree(E1) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08Convert lookup_blob to struct object_idbrian m. carlson1-1/+1
Convert lookup_blob to take a pointer to struct object_id. The commit was created with manual changes to blob.c and blob.h, plus the following semantic patch: @@ expression E1; @@ - lookup_blob(E1.hash) + lookup_blob(&E1) @@ expression E1; @@ - lookup_blob(E1->hash) + lookup_blob(E1) Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-25struct name_entry: use struct object_id instead of unsigned char sha1[20]brian m. carlson1-3/+3
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-12list-objects: pass full pathname to callbacksJeff King1-5/+9
When we find a blob at "a/b/c", we currently pass this to our show_object_fn callbacks as two components: "a/b/" and "c". Callbacks which want the full value then call path_name(), which concatenates the two. But this is an inefficient interface; the path is a strbuf, and we could simply append "c" to it temporarily, then roll back the length, without creating a new copy. So we could improve this by teaching the callsites of path_name() this trick (and there are only 3). But we can also notice that no callback actually cares about the broken-down representation, and simply pass each callback the full path "a/b/c" as a string. The callback code becomes even simpler, then, as we do not have to worry about freeing an allocated buffer, nor rolling back our modification to the strbuf. This is theoretically less efficient, as some callbacks would not bother to format the final path component. But in practice this is not measurable. Since we use the same strbuf over and over, our work to grow it is amortized, and we really only pay to memcpy a few bytes. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-12list-objects: drop name_path entirelyJeff King1-7/+5
In the previous commit, we left name_path as a thin wrapper around a strbuf. This patch drops it entirely. As a result, every show_object_fn callback needs to be adjusted. However, none of their code needs to be changed at all, because the only use was to pass it to path_name(), which now handles the bare strbuf. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-12list-objects: convert name_path to a strbufJeff King1-13/+9
The "struct name_path" data is examined in only two places: we generate it in process_tree(), and we convert it to a single string in path_name(). Everyone else just passes it through to those functions. We can further note that process_tree() already keeps a single strbuf with the leading tree path, for use with tree_entry_interesting(). Instead of building a separate name_path linked list, let's just use the one we already build in "base". This reduces the amount of code (especially tricky code in path_name() which did not check for integer overflows caused by deep or large pathnames). It is also more efficient in some instances. Any time we were using tree_entry_interesting, we were building up the strbuf anyway, so this is an immediate and obvious win there. In cases where we were not, we trade off storing "pathname/" in a strbuf on the heap for each level of the path, instead of two pointers and an int on the stack (with one pointer into the tree object). On a 64-bit system, the latter is 20 bytes; so if path components are less than that on average, this has lower peak memory usage. In practice it probably doesn't matter either way; we are already holding in memory all of the tree objects leading up to each pathname, and for normal-depth pathnames, we are only talking about hundreds of bytes. This patch leaves "struct name_path" as a thin wrapper around the strbuf, to avoid disrupting callbacks. We should fix them, but leaving it out makes this diff easier to view. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-11-20Convert struct object to object_idbrian m. carlson1-2/+2
struct object is one of the major data structures dealing with object IDs. Convert it to use struct object_id instead of an unsigned char array. Convert get_object_hash to refer to the new member as well. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Jeff King <peff@peff.net>
2015-06-11Merge branch 'jk/squelch-missing-link-warning-for-unreachable'Junio C Hamano1-1/+1
Recent "git prune" traverses young unreachable objects to safekeep old objects in the reachability chain from them, which sometimes caused error messages that are unnecessarily alarming. * jk/squelch-missing-link-warning-for-unreachable: suppress errors on missing UNINTERESTING links silence broken link warnings with revs->ignore_missing_links add quieter versions of parse_{tree,commit}
2015-06-01silence broken link warnings with revs->ignore_missing_linksJeff King1-1/+1
We set revs->ignore_missing_links to instruct the revision-walking machinery that we know the history graph may be incomplete. For example, we use it when walking unreachable but recent objects; we want to add what we can, but it's OK if the history is incomplete. However, we still print error messages for the missing objects, which can be confusing. This is not an error, but just a normal situation when transitioning from a repository last pruned by an older git (which can leave broken segments of history) to a more recent one (where we try to preserve whole reachable segments). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-29rev-list: add an option to mark fewer edges as uninterestingbrian m. carlson1-2/+2
In commit fbd4a70 (list-objects: mark more commits as edges in mark_edges_uninteresting - 2013-08-16), we marked an increasing number of edges uninteresting. This change, and the subsequent change to make this conditional on --objects-edge, are used by --thin to make much smaller packs for shallow clones. Unfortunately, they cause a significant performance regression when pushing non-shallow clones with lots of refs (23.322 seconds vs. 4.785 seconds with 22400 refs). Add an option to git rev-list, --objects-edge-aggressive, that preserves this more aggressive behavior, while leaving --objects-edge to provide more performant behavior. Preserve the current behavior for the moment by using the aggressive option. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19traverse_commit_list: support pending blobs/trees with pathsJeff King1-2/+5
When we call traverse_commit_list, we may have trees and blobs in the pending array. As we process these, we pass the "name" field from the pending entry as the path of the object within the tree (which then becomes the root path if we recurse into subtrees). When we set up the traversal in prepare_revision_walk, though, the "name" field of any pending trees and blobs is likely to be the ref at which we found the object. We would not want to make this part of the path (e.g., doing so would make "git rev-list --objects v2.6.11-tree" in linux.git show paths like "v2.6.11-tree/Makefile", which is nonsensical). Therefore prepare_revision_walk sets the name field of each pending tree and blobs to the empty string. However, this leaves no room for a caller who does know the correct path of a pending object to propagate that information to the revision walker. We can fix this by making two related changes: 1. Use the "path" field as the path instead of the "name" field in traverse_commit_list. If the path is not set, default to "" (which is what we always ended up with in the current code, because of prepare_revision_walk). 2. In prepare_revision_walk, make a complete copy of the entry. This makes the path field available to the walker (if there is one), solving our problem. Leaving the name field intact is now OK, as we do not use it as a path due to point (1) above (and we can use it to make more meaningful error messages if we want). We also make the original "mode" field available to the walker, though it does not actually use it. Note that we still re-add the pending objects and free the old ones (so we may strdup the path and name only to free the old ones). This could be made more efficient by simply copying the object_array entries that we are keeping. However, that would require more restructuring of the code, and is not done here. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16object_array: add a "clear" functionJeff King1-6/+1
There's currently no easy way to free the memory associated with an object_array (and in most cases, we simply leak the memory in a rev_info's pending array). Let's provide a helper to make this easier to handle. We can make use of it in list-objects.c, which does the same thing by hand (but fails to free the "name" field of each entry, potentially leaking memory). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-08Merge branch 'jk/pack-bitmap'Junio C Hamano1-1/+4
* jk/pack-bitmap: pack-objects: do not reuse packfiles without --delta-base-offset add `ignore_missing_links` mode to revwalk
2014-04-04add `ignore_missing_links` mode to revwalkVicent Marti1-1/+4
When pack-objects is computing the reachability bitmap to serve a fetch request, it can erroneously die() if some of the UNINTERESTING objects are not present. Upload-pack throws away HAVE lines from the client for objects we do not have, but we may have a tip object without all of its ancestors (e.g., if the tip is no longer reachable and was new enough to survive a `git prune`, but some of its reachable objects did get pruned). In the non-bitmap case, we do a revision walk with the HAVE objects marked as UNINTERESTING. The revision walker explicitly ignores errors in accessing UNINTERESTING commits to handle this case (and we do not bother looking at UNINTERESTING trees or blobs at all). When we have bitmaps, however, the process is quite different. The bitmap index for a pack-objects run is calculated in two separate steps: First, we perform an extensive walk from all the HAVEs to find the full set of objects reachable from them. This walk is usually optimized away because we are expected to hit an object with a bitmap during the traversal, which allows us to terminate early. Secondly, we perform an extensive walk from all the WANTs, which usually also terminates early because we hit a commit with an existing bitmap. Once we have the resulting bitmaps from the two walks, we AND-NOT them together to obtain the resulting set of objects we need to pack. When we are walking the HAVE objects, the revision walker does not know that we are walking it only to mark the results as uninteresting. We strip out the UNINTERESTING flag, because those objects _are_ interesting to us during the first walk. We want to keep going to get a complete set of reachable objects if we can. We need some way to tell the revision walker that it's OK to silently truncate the HAVE walk, just like it does for the UNINTERESTING case. This patch introduces a new `ignore_missing_links` flag to the `rev_info` struct, which we set only for the HAVE walk. It also adds tests to cover UNINTERESTING objects missing from several positions: a missing blob, a missing tree, and a missing parent commit. The missing blob already worked (as we do not care about its contents at all), but the other two cases caused us to die(). Note that there are a few cases we do not need to test: 1. We do not need to test a missing tree, with the blob still present. Without the tree that refers to it, we would not know that the blob is relevant to our walk. 2. We do not need to test a tip commit that is missing. Upload-pack omits these for us (and in fact, we complain even in the non-bitmap case if it fails to do so). Reported-by: Siddharth Agarwal <sid0@fb.com> Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-27Merge branch 'jk/mark-edges-uninteresting'Junio C Hamano1-9/+11
Fix performance regression in v1.8.4.x and later. * jk/mark-edges-uninteresting: list-objects: only look at cmdline trees with edge_hint t/perf: time rev-list with UNINTERESTING commits
2014-01-21list-objects: only look at cmdline trees with edge_hintJeff King1-9/+11
When rev-list is given a command-line like: git rev-list --objects $commit --not --all the most accurate answer is the difference between the set of objects reachable from $commit and the set reachable from all of the existing refs. However, we have not historically provided that answer, because it is very expensive to calculate. We would have to open every tree of every commit in the entire history. Instead, we find the accurate set difference of the reachable commits, and then mark the trees at the boundaries as uninteresting. This misses objects which appear in the trees of both the interesting commits and deep within the uninteresting history. Commit fbd4a70 (list-objects: mark more commits as edges in mark_edges_uninteresting, 2013-08-16) noticed that we miss those objects during pack-objects, and added code to examine the trees of all of the "--not" refs given on the command-line. Note that this is still not the complete set difference, because we look only at the tips of the command-line arguments, not all of their reachable commits. But it increases the set of boundary objects we consider, which is especially important for shallow fetches. So we are trading extra CPU time for a larger set of boundary objects, which can improve the resulting pack size for a --thin pack. This tradeoff probably makes sense in the context of pack-objects, where we have set revs->edge_hint to have the traversal feed us the set of boundary objects. For a regular rev-list, though, it is probably not a good tradeoff. It is true that it makes our list slightly closer to a true set difference, but it is a rare case where this is important. And because we do not have revs->edge_hint set, we do nothing useful with the larger set of boundary objects. This patch therefore ties the extra tree examination to the revs->edge_hint flag; it is the presence of that flag that makes the tradeoff worthwhile. Here is output from the p0001-rev-list showing the improvement in performance: Test HEAD^ HEAD ----------------------------------------------------------------------------------------- 0001.1: rev-list --all 0.69(0.65+0.02) 0.69(0.66+0.02) +0.0% 0001.2: rev-list --all --objects 3.22(3.19+0.03) 3.23(3.20+0.03) +0.3% 0001.4: rev-list $commit --not --all 0.04(0.04+0.00) 0.04(0.04+0.00) +0.0% 0001.5: rev-list --objects $commit --not --all 0.27(0.26+0.01) 0.04(0.04+0.00) -85.2% Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-20Merge branch 'nd/fetch-into-shallow'Junio C Hamano1-4/+20
When there is no sufficient overlap between old and new history during a fetch into a shallow repository, we unnecessarily sent objects the sending side knows the receiving end has. * nd/fetch-into-shallow: Add testcase for needless objects during a shallow fetch list-objects: mark more commits as edges in mark_edges_uninteresting list-objects: reduce one argument in mark_edges_uninteresting upload-pack: delegate rev walking in shallow fetch to pack-objects shallow: add setup_temporary_shallow() shallow: only add shallow graft points to new shallow file move setup_alternate_shallow and write_shallow_commits to shallow.c
2013-08-28list-objects: mark more commits as edges in mark_edges_uninterestingNguyễn Thái Ngọc Duy1-0/+17
The purpose of edge commits is to let pack-objects know what objects it can use as base, but does not need to include in the thin pack because the other side is supposed to already have them. So far we mark uninteresting parents of interesting commits as edges. But even an unrelated uninteresting commit (that the other side has) may become a good base for pack-objects and help produce more efficient packs. This is especially true for shallow clone, when the client issues a fetch with a depth smaller or equal to the number of commits the server is ahead of the client. For example, in this commit history the client has up to "A" and the server has up to "B": -------A---B have--^ ^ / want--+ If depth 1 is requested, the commit list to send to the client includes only B. The way m_e_u is working, it checks if parent commits of B are uninteresting, if so mark them as edges. Due to shallow effect, commit B is grafted to have no parents and the revision walker never sees A as the parent of B. In fact it marks no edges at all in this simple case and sends everything B has to the client even if it could have excluded what A and also the client already have. In a slightly different case where A is not a direct parent of B (iow there are commits in between A and B), marking A as an edge can still save some because B may still have stuff from the far ancestor A. There is another case from the earlier patch, when we deepen a ref from C->E to A->E: ---A---B C---D---E want--^ ^ ^ shallow-+ / have-------+ In this case we need to send A and B to the client, and C (i.e. the current shallow point that the client informs the server) is a very good base because it's closet to A and B. Normal m_e_u won't recognize C as an edge because it only looks back to parents (i.e. A<-B) not the opposite way B->C even if C is already marked as uninteresting commit by the previous patch. This patch includes all uninteresting commits from command line as edges and lets pack-objects decide what's best to do. The upside is we have better chance of producing better packs in certain cases. The downside is we may need to process some extra objects on the server side. For the shallow case on git.git, when the client is 5 commits behind and does "fetch --depth=3", the result pack is 99.26 KiB instead of 4.92 MiB. Reported-and-analyzed-by: Matthijs Kooijman <matthijs@stdin.nl> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-28list-objects: reduce one argument in mark_edges_uninterestingNguyễn Thái Ngọc Duy1-4/+3
mark_edges_uninteresting() is always called with this form mark_edges_uninteresting(revs->commits, revs, ...); Remove the first argument and let mark_edges_uninteresting figure that out by itself. It helps answer the question "are this commit list and revs related in any way?" when looking at mark_edges_uninteresting implementation. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-06clear parsed flag when we free tree buffersJeff King1-2/+1
Many code paths will free a tree object's buffer and set it to NULL after finishing with it in order to keep memory usage down during a traversal. However, out of 8 sites that do this, only one actually unsets the "parsed" flag back. Those sites that don't are setting a trap for later users of the tree object; even after calling parse_tree, the buffer will remain NULL, causing potential segfaults. It is not known whether this is triggerable in the current code. Most commands do not do an in-memory traversal followed by actually using the objects again. However, it does not hurt to be safe for future callers. In most cases, we can abstract this out to a "free_tree_buffer" helper. However, there are two exceptions: 1. The fsck code relies on the parsed flag to know that we were able to parse the object at one point. We can switch this to using a flag in the "flags" field. 2. The index-pack code sets the buffer to NULL but does not free it (it is freed by a caller). We should still unset the parsed flag here, but we cannot use our helper, as we do not want to free the buffer. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-27tree_entry_interesting(): give meaningful names to return valuesNguyễn Thái Ngọc Duy1-4/+5
It is a basic code hygiene to avoid magic constants that are unnamed. Besides, this helps extending the value later on for "interesting, but cannot decide if the entry truely matches yet" (ie. prefix matches) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-09-01list-objects: pass callback data to show_objects()Junio C Hamano1-11/+17
The traverse_commit_list() API takes two callback functions, one to show commit objects, and the other to show other kinds of objects. Even though the former has a callback data parameter, so that the callback does not have to rely on global state, the latter does not. Give the show_objects() callback the same callback data parameter. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-06Merge branch 'nd/struct-pathspec'Junio C Hamano1-11/+7
* nd/struct-pathspec: pathspec: rename per-item field has_wildcard to use_wildcard Improve tree_entry_interesting() handling code Convert read_tree{,_recursive} to support struct pathspec Reimplement read_tree_recursive() using tree_entry_interesting()
2011-03-25Improve tree_entry_interesting() handling codeNguyễn Thái Ngọc Duy1-11/+7
t_e_i() can return -1 or 2 to early shortcut a search. Current code may use up to two variables to handle it. One for saving return value from t_e_i temporarily, one for saving return code 2. The second variable is not needed. If we make sure the first variable does not change until the next t_e_i() call, then we can do something like this: int ret = 0; while (...) { if (ret != 2) { ret = t_e_i(); if (ret < 0) /* no longer interesting */ break; if (ret == 0) /* skip this round */ continue; } /* ret > 0, interesting */ } Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-22Merge branch 'jc/maint-rev-list-culled-boundary'Junio C Hamano1-1/+6
* jc/maint-rev-list-culled-boundary: list-objects.c: don't add an unparsed NULL as a pending tree Conflicts: list-objects.c
2011-03-14list-objects.c: don't add an unparsed NULL as a pending treeJunio C Hamano1-1/+6
"git rev-list --first-parent --boundary $commit^..$commit" segfaults on a merge commit since 8d2dfc4 (process_{tree,blob}: show objects without buffering, 2009-04-10), as it tried to dereference a commit that was discarded as UNINTERESTING without being parsed (hence lacking "tree"). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-03Make rev-list --objects work together with pathspecsElijah Newren1-2/+28
When traversing commits, the selection of commits would heed the list of pathspecs passed, but subsequent walking of the trees of those commits would not. This resulted in 'rev-list --objects HEAD -- <paths>' displaying objects at unwanted paths. Have process_tree() call tree_entry_interesting() to determine which paths are interesting and should be walked. Naturally, this change can provide a large speedup when paths are specified together with --objects, since many tree entries are now correctly ignored. Interestingly, though, this change also gives me a small (~1%) but repeatable speedup even when no paths are specified with --objects. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-18Merge branch 'lt/pack-object-memuse'Junio C Hamano1-16/+17
* lt/pack-object-memuse: show_object(): push path_name() call further down process_{tree,blob}: show objects without buffering Conflicts: builtin-pack-objects.c builtin-rev-list.c list-objects.c list-objects.h upload-pack.c
2009-04-12process_{tree,blob}: show objects without bufferingLinus Torvalds1-17/+18
Here's a less trivial thing, and slightly more dubious one. I was looking at that "struct object_array objects", and wondering why we do that. I have honestly totally forgotten. Why not just call the "show()" function as we encounter the objects? Rather than add the objects to the object_array, and then at the very end going through the array and doing a 'show' on all, just do things more incrementally. Now, there are possible downsides to this: - the "buffer using object_array" _can_ in theory result in at least better I-cache usage (two tight loops rather than one more spread out one). I don't think this is a real issue, but in theory.. - this _does_ change the order of the objects printed. Instead of doing a "process_tree(revs, commit->tree, &objects, NULL, "");" in the loop over the commits (which puts all the root trees _first_ in the object list, this patch just adds them to the list of pending objects, and then we'll traverse them in that order (and thus show each root tree object together with the objects we discover under it) I _think_ the new ordering actually makes more sense, but the object ordering is actually a subtle thing when it comes to packing efficiency, so any change in order is going to have implications for packing. Good or bad, I dunno. - There may be some reason why we did it that odd way with the object array, that I have simply forgotten. Anyway, now that we don't buffer up the objects before showing them that may actually result in lower memory usage during that whole traverse_commit_list() phase. This is seriously not very deeply tested. It makes sense to me, it seems to pass all the tests, it looks ok, but... Does anybody remember why we did that "object_array" thing? It used to be an "object_list" a long long time ago, but got changed into the array due to better memory usage patterns (those linked lists of obejcts are horrible from a memory allocation standpoint). But I wonder why we didn't do this back then. Maybe there's a reason for it. Or maybe there _used_ to be a reason, and no longer is. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-12show_object(): push path_name() call further downLinus Torvalds1-5/+5
In particular, pushing the "path_name()" call _into_ the show() function would seem to allow - more clarity into who "owns" the name (ie now when we free the name in the show_object callback, it's because we generated it ourselves by calling path_name()) - not calling path_name() at all, either because we don't care about the name in the first place, or because we are actually happy walking the linked list of "struct name_path *" and the last component. Now, I didn't do that latter optimization, because it would require some more coding, but especially looking at "builtin-pack-objects.c", we really don't even want the whole pathname, we really would be better off with the list of path components. Why? We use that name for two things: - add_preferred_base_object(), which actually _wants_ to traverse the path, and now does it by looking for '/' characters! - for 'name_hash()', which only cares about the last 16 characters of a name, so again, generating the full name seems to be just unnecessary work. Anyway, so I didn't look any closer at those things, but it did convince me that the "show_object()" calling convention was crazy, and we're actually better off doing _less_ in list-objects.c, and giving people access to the internal data structures so that they can decide whether they want to generate a path-name or not. This patch does that, and then for people who did use the name (even if they might do something more clever in the future), it just does the straightforward "name = path_name(path, component); .. free(name);" thing. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-12Merge branch 'cc/bisect-filter'Junio C Hamano1-4/+5
* cc/bisect-filter: (21 commits) rev-list: add "int bisect_show_flags" in "struct rev_list_info" rev-list: remove last static vars used in "show_commit" list-objects: add "void *data" parameter to show functions bisect--helper: string output variables together with "&&" rev-list: pass "int flags" as last argument of "show_bisect_vars" t6030: test bisecting with paths bisect: use "bisect--helper" and remove "filter_skipped" function bisect: implement "read_bisect_paths" to read paths in "$GIT_DIR/BISECT_NAMES" bisect--helper: implement "git bisect--helper" bisect: use the new generic "sha1_pos" function to lookup sha1 rev-list: call new "filter_skip" function patch-ids: use the new generic "sha1_pos" function to lookup sha1 sha1-lookup: add new "sha1_pos" function to efficiently lookup sha1 rev-list: pass "revs" to "show_bisect_vars" rev-list: make "show_bisect_vars" non static rev-list: move code to show bisect vars into its own function rev-list: move bisect related code into its own file rev-list: make "bisect_list" variable local to "cmd_rev_list" refs: add "for_each_ref_in" function to refactor "for_each_*_ref" functions quote: add "sq_dequote_to_argv" to put unwrapped args in an argv array ...
2009-04-08process_{tree,blob}: Remove useless xstrdup callsBjörn Steinbrink1-2/+0
The name of the processed object was duplicated for passing it to add_object(), but that already calls path_name, which allocates a new string anyway. So the memory allocated by the xstrdup calls just went nowhere, leaking memory. This reduces the RSS usage for a "rev-list --all --objects" by about 10% on the gentoo repo (fully packed) as well as linux-2.6.git: gentoo: | old | new ----------------|------------------------------- RSS | 1537284 | 1388408 VSZ | 1816852 | 1667952 time elapsed | 1:49.62 | 1:48.99 min. page faults| 417178 | 379919 linux-2.6.git: | old | new ----------------|------------------------------- RSS | 324452 | 292996 VSZ | 491792 | 460376 time elapsed | 0:14.53 | 0:14.28 min. page faults| 89360 | 81613 Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-07list-objects: add "void *data" parameter to show functionsChristian Couder1-4/+5
The goal of this patch is to get rid of the "static struct rev_info revs" static variable in "builtin-rev-list.c". To do that, we need to pass the revs to the "show_commit" function in "builtin-rev-list.c" and this in turn means that the "traverse_commit_list" function in "list-objects.c" must be passed functions pointers to functions with 2 parameters instead of one. So we have to change all the callers and all the functions passed to "traverse_commit_list". Anyway this makes the code more clean and more generic, so it should be a good thing in the long run. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-18list-objects.c::process_tree/blob: check for NULLMartin Koegler1-0/+4
As these functions are directly called with the result from lookup_tree/blob, they must handle NULL. Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-10Fix memory leak in traverse_commit_listShawn O. Pearce1-0/+7
If we were listing objects too then the objects were buffered in an array only reachable from a stack allocated structure. When this function returns that array would be leaked as nobody would have a reference to it anymore. Historically this hasn't been a problem as the primary user of traverse_commit_list() (the noble git-rev-list) would terminate as soon as the function was finished, thus allowing the operating system to cleanup memory. However we have been leaking this data in git-pack-objects ever since that program learned how to run the revision listing internally, rather than relying on reading object names from git-rev-list. To better facilitate reuse of traverse_commit_list during other builtin tools (such as git-fetch) we shouldn't leak temporary memory like this and instead we need to clean up properly after ourselves. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-05-21rename dirlink to gitlink.Martin Waitz1-1/+1
Unify naming of plumbing dirlink/gitlink concept: git ls-files -z '*.[ch]' | xargs -0 perl -pi -e 's/dirlink/gitlink/g;' -e 's/DIRLNK/GITLINK/g;' Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-14Teach git list-objects logic to not follow gitlinksLinus Torvalds1-0/+34
This allows us to pack superprojects and thus clone them (but not yet check them out on the receiving side.. That's the next patch) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-21Initialize tree descriptors with a helper function rather than by hand.Linus Torvalds1-2/+1
This removes slightly more lines than it adds, but the real reason for doing this is that future optimizations will require more setup of the tree descriptor, and so we want to do it in one place. Also renamed the "desc.buf" field to "desc.buffer" just to trigger compiler errors for old-style manual initializations, making sure I didn't miss anything. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-07pack-objects: further work on internal rev-list logic.Junio C Hamano1-0/+33
This teaches the internal rev-list logic to understand options that are needed for pack handling: --all, --unpacked, and --thin. It also moves two functions from builtin-rev-list to list-objects so that the two programs can share more code. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-07Separate object listing routines out of rev-listJunio C Hamano1-0/+107
Create a separate file, list-objects.c, and move object listing routines from rev-list to it. The next round will use it in pack-objects directly. Signed-off-by: Junio C Hamano <junkio@cox.net>