aboutsummaryrefslogtreecommitdiffstats
path: root/revision.h
AgeCommit message (Collapse)AuthorFilesLines
2024-04-03revision: optionally record matches with pathspec elementsJunio C Hamano1-0/+1
Unlike "git add" and other end-user facing commands, where it is diagnosed as an error to give a pathspec with an element that does not match any path, the diff machinery does not care if some elements of the pathspec do not match. Given that the diff machinery is heavily used in pathspec-limited "git log" machinery, and it is common for a path to come and go while traversing the project history, this is usually a good thing. However, in some cases we would want to know if all the pathspec elements matched. For example, "git add -u <pathspec>" internally uses the machinery used by "git diff-files" to decide contents from what paths to add to the index, and as an end-user facing command, "git add -u" would want to report an unmatched pathspec element. Add a new .ps_matched member next to the .prune_data member in "struct rev_info" so that we can optionally keep track of the use of .prune_data pathspec elements that can be inspected by the caller. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-01rev-list: add commit object support in `--missing` optionKarthik Nayak1-0/+4
The `--missing` object option in rev-list currently works only with missing blobs/trees. For missing commits the revision walker fails with a fatal error. Let's extend the functionality of `--missing` option to also support commit objects. This is done by adding a `missing_objects` field to `rev_info`. This field is an `oidset` to which we'll add the missing commits as we encounter them. The revision walker will now continue the traversal and call `show_commit()` even for missing commits. In rev-list we can then check if the commit is a missing commit and call the existing code for parsing `--missing` objects. A scenario where this option would be used is to find the boundary objects between different object directories. Consider a repository with a main object directory (GIT_OBJECT_DIRECTORY) and one or more alternate object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES). In such a repository, using the `--missing=print` option while disabling the alternate object directory allows us to find the boundary objects between the main and alternate object directory. Helped-by: Patrick Steinhardt <ps@pks.im> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-01revision: rename bit to `do_not_die_on_missing_objects`Karthik Nayak1-8/+9
The bit `do_not_die_on_missing_tree` is used in revision.h to ensure the revision walker does not die when encountering a missing tree. This is currently exclusively set within `builtin/rev-list.c` to ensure the `--missing` option works with missing trees. In the upcoming commits, we will extend `--missing` to also support missing commits. So let's rename the bit to `do_not_die_on_missing_objects`, which is object type agnostic and can be used for both trees/commits. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-19range-diff: treat notes like `log`Kristoffer Haugsbakk1-0/+1
Currently, `range-diff` shows the default notes if no notes-related arguments are given. This is also how `log` behaves. But unlike `range-diff`, `log` does *not* show the default notes if `--notes=<custom>` are given. In other words, this: git log --notes=custom is equivalent to this: git log --no-notes --notes=custom While: git range-diff --notes=custom acts like this: git log --notes --notes-custom This can’t be how the user expects `range-diff` to behave given that the man page for `range-diff` under `--[no-]notes[=<ref>]` says: > This flag is passed to the `git log` program (see git-log(1)) that > generates the patches. This behavior also affects `format-patch` since it uses `range-diff` for the cover letter. Unlike `log`, though, `format-patch` is not supposed to show the default notes if no notes-related arguments are given.[1] But this promise is broken when the range-diff happens to have something to say about the changes to the default notes, since that will be shown in the cover letter. Remedy this by introducing `--show-notes-by-default` that `range-diff` can use to tell the `log` subprocess what to do. § Authors • Fix by Johannes • Tests by Kristoffer † 1: See e.g. 66b2ed09c2 (Fix "log" family not to be too agressive about showing notes, 2010-01-20). Co-authored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-25Merge branch 'jk/unused-parameter'Junio C Hamano1-1/+1
Mark-up unused parameters in the code so that we can eventually enable -Wunused-parameter by default. * jk/unused-parameter: t/helper: mark unused callback void data parameters tag: mark unused parameters in each_tag_name_fn callbacks rev-parse: mark unused parameter in for_each_abbrev callback replace: mark unused parameter in each_mergetag_fn callback replace: mark unused parameter in ref callback merge-tree: mark unused parameter in traverse callback fsck: mark unused parameters in various fsck callbacks revisions: drop unused "opt" parameter in "tweak" callbacks count-objects: mark unused parameter in alternates callback am: mark unused keep_cr parameters http-push: mark unused parameter in xml callback http: mark unused parameters in curl callbacks do_for_each_ref_helper(): mark unused repository parameter test-ref-store: drop unimplemented reflog-expire command
2023-07-21Merge branch 'tb/refs-exclusion-and-packed-refs'Junio C Hamano1-2/+3
Enumerating refs in the packed-refs file, while excluding refs that match certain patterns, has been optimized. * tb/refs-exclusion-and-packed-refs: ls-refs.c: avoid enumerating hidden refs where possible upload-pack.c: avoid enumerating hidden refs where possible builtin/receive-pack.c: avoid enumerating hidden references refs.h: implement `hidden_refs_to_excludes()` refs.h: let `for_each_namespaced_ref()` take excluded patterns revision.h: store hidden refs in a `strvec` refs/packed-backend.c: add trace2 counters for jump list refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) refs/packed-backend.c: refactor `find_reference_location()` refs: plumb `exclude_patterns` argument throughout builtin/for-each-ref.c: add `--exclude` option ref-filter.c: parameterize match functions over patterns ref-filter: add `ref_filter_clear()` ref-filter: clear reachable list pointers after freeing ref-filter.h: provide `REF_FILTER_INIT` refs.c: rename `ref_filter`
2023-07-13revisions: drop unused "opt" parameter in "tweak" callbacksJeff King1-1/+1
The setup_revision_opt struct has a "tweak" function pointer, which can be used to adjust parameters after setup_revisions() parses arguments, but before it finalizes setup. In addition to the rev_info struct, the callback receives a pointer to the setup_revision_opt, as well. But none of the existing callbacks looks at the extra "opt" parameter, leading to -Wunused-parameter warnings. We could mark it as UNUSED, but instead let's remove it entirely. It's conceivable that it could be useful for a callback to have access to the "opt" struct. But in the 13 years that this mechanism has existed, nobody has used it. So let's just drop it in the name of simplifying. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10revision.h: store hidden refs in a `strvec`Taylor Blau1-2/+3
In subsequent commits, it will be convenient to have a 'const char **' of hidden refs (matching `transfer.hiderefs`, `uploadpack.hideRefs`, etc.), instead of a `string_list`. Convert spots throughout the tree that store the list of hidden refs from a `string_list` to a `strvec`. Note that in `parse_hide_refs_config()` there is an ugly const-cast used to avoid an extra copy of each value before trimming any trailing slash characters. This could instead be written as: ref = xstrdup(value); len = strlen(ref); while (len && ref[len - 1] == '/') ref[--len] = '\0'; strvec_push(hide_refs, ref); free(ref); but the double-copy (once when calling `xstrdup()`, and another via `strvec_push()`) is wasteful. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-13Merge branch 'jc/pack-ref-exclude-include'Junio C Hamano1-1/+1
"git pack-refs" learns "--include" and "--exclude" to tweak the ref hierarchy to be packed using pattern matching. * jc/pack-ref-exclude-include: pack-refs: teach pack-refs --include option pack-refs: teach --exclude option to exclude refs from being packed docs: clarify git-pack-refs --all will pack all refs
2023-05-12pack-refs: teach --exclude option to exclude refs from being packedJohn Cai1-1/+1
At GitLab, we have a system that creates ephemeral internal refs that don't live long before getting deleted. Having an option to exclude certain refs from a packed-refs file allows these internal references to be deleted much more efficiently. Add an --exclude option to the pack-refs builtin, and use the ref exclusions API to exclude certain refs from being packed into the final packed-refs file Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-09Merge branch 'en/header-split-cache-h-part-2'Junio C Hamano1-0/+1
More header clean-up. * en/header-split-cache-h-part-2: (22 commits) reftable: ensure git-compat-util.h is the first (indirect) include diff.h: reduce unnecessary includes object-store.h: reduce unnecessary includes commit.h: reduce unnecessary includes fsmonitor: reduce includes of cache.h cache.h: remove unnecessary headers treewide: remove cache.h inclusion due to previous changes cache,tree: move basic name compare functions from read-cache to tree cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.c hash-ll.h: split out of hash.h to remove dependency on repository.h tree-diff.c: move S_DIFFTREE_IFXMIN_NEQ define from cache.h dir.h: move DTYPE defines from cache.h versioncmp.h: move declarations for versioncmp.c functions from cache.h ws.h: move declarations for ws.c functions from cache.h match-trees.h: move declarations for match-trees.c functions from cache.h pkt-line.h: move declarations for pkt-line.c functions from cache.h base85.h: move declarations for base85.c functions from cache.h copy.h: move declarations for copy.c functions from cache.h server-info.h: move declarations for server-info.c functions from cache.h packfile.h: move pack_window and pack_entry from cache.h ...
2023-04-24commit.h: reduce unnecessary includesElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-06Merge branch 'ab/remove-implicit-use-of-the-repository'Junio C Hamano1-3/+0
Code clean-up around the use of the_repository. * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_* migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"
2023-04-04Merge branch 'ab/remove-implicit-use-of-the-repository' into ↵Junio C Hamano1-3/+0
en/header-split-cache-h * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_* migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"
2023-03-30Merge branch 'sg/parse-options-h-users'Junio C Hamano1-1/+2
Code clean-up to include and/or uninclude parse-options.h file as needed. * sg/parse-options-h-users: treewide: remove unnecessary inclusions of parse-options.h from headers treewide: include parse-options.h in source files
2023-03-28cocci: apply the "revision.h" part of "the_repository.pending"Ævar Arnfjörð Bjarmason1-3/+0
Apply the part of "the_repository.pending.cocci" pertaining to "revision.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20treewide: remove unnecessary inclusions of parse-options.h from headersSZEDER Gábor1-1/+2
The headers 'diagnose.h', 'list-objects-filter-options.h', 'ref-filter.h' and 'remote.h' declare option parsing callback functions with a 'struct option*' parameter, and 'revision.h' declares an option parsing helper function taking 'struct parse_opt_ctx_t*' and 'struct option*' parameters. These headers all include 'parse-options.h', although they don't need any of the type definitions from that header file. Furthermore, 'list-objects-filter-options.h' and 'ref-filter.h' also define some OPT_* macros to initialize a 'struct option', but these don't necessitate the inclusion of parse-options.h in these headers either, because these macros are only expanded in source files. Remove these unnecessary inclusions of parse-options.h and use forward declarations to declare the necessary types. After this patch none of the header files include parse-options.h anymore. With these changes, the build time after modifying only parse-options.h is reduced by about 30%, and the number of targets built is almost 20% less: Before: $ touch parse-options.h && time make -j4 |wc -l 353 real 1m1.527s user 3m32.205s sys 0m15.903s After: 289 real 0m39.285s user 2m12.540s sys 0m11.164s Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23ident.h: move ident-related declarations out of cache.hElijah Newren1-0/+1
These functions were all defined in a separate ident.c already, so create ident.h and move the declarations into that file. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-23Merge branch 'ps/receive-use-only-advertised'Junio C Hamano1-6/+37
"git receive-pack" used to use all the local refs as the boundary for checking connectivity of the data "git push" sent, but now it uses only the refs that it advertised to the pusher. In a repository with the .hideRefs configuration, this reduces the resources needed to perform the check. cf. <221028.86bkpw805n.gmgdl@evledraar.gmail.com> cf. <xmqqr0yrizqm.fsf@gitster.g> * ps/receive-use-only-advertised: receive-pack: only use visible refs for connectivity check rev-parse: add `--exclude-hidden=` option revision: add new parameter to exclude hidden refs revision: introduce struct to handle exclusions revision: move together exclusion-related functions refs: get rid of global list of hidden refs refs: fix memory leak when parsing hideRefs config
2022-11-17revision: add new parameter to exclude hidden refsPatrick Steinhardt1-0/+16
Users can optionally hide refs from remote users in git-upload-pack(1), git-receive-pack(1) and others via the `transfer.hideRefs`, but there is not an easy way to obtain the list of all visible or hidden refs right now. We'll require just that though for a performance improvement in our connectivity check. Add a new option `--exclude-hidden=` that excludes any hidden refs from the next pseudo-ref like `--all` or `--branches`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-17revision: introduce struct to handle exclusionsPatrick Steinhardt1-6/+21
The functions that handle exclusion of refs work on a single string list. We're about to add a second mechanism for excluding refs though, and it makes sense to reuse much of the same architecture for both kinds of exclusion. Introduce a new `struct ref_exclusions` that encapsulates all the logic related to excluding refs and move the `struct string_list` that holds all wildmatch patterns of excluded refs into it. Rename functions that operate on this struct to match its name. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-08revisions API: extend the nascent REV_INFO_INIT macroÆvar Arnfjörð Bjarmason1-1/+17
Have the REV_INFO_INIT macro added in [1] declare more members of "struct rev_info" that we can initialize statically, and have repo_init_revisions() do so with the memcpy(..., &blank) idiom introduced in [2]. As the comment for the "REV_INFO_INIT" macro notes this still isn't sufficient to initialize a "struct rev_info" for use yet. But we are getting closer to that eventual goal. Even though we can't fully initialize a "struct rev_info" with REV_INFO_INIT it's useful for readability to clearly separate those things that we can statically initialize, and those that we can't. This change could replace the: list_objects_filter_init(&revs->filter); In the repo_init_revisions() with this line, at the end of the REV_INFO_INIT deceleration in revisions.h: .filter = LIST_OBJECTS_FILTER_INIT, \ But doing so would produce a minor conflict with an outstanding topic[3]. Let's skip that for now. I have follow-ups to initialize more of this statically, e.g. changes to get rid of grep_init(). We can initialize more members with the macro in a future series. 1. f196c1e908d (revisions API users: use release_revisions() needing REV_INFO_INIT, 2022-04-13) 2. 5726a6b4012 (*.c *_init(): define in terms of corresponding *_INIT macro, 2021-07-01) 3. https://lore.kernel.org/git/265b292ed5c2de19b7118dfe046d3d9d932e2e89.1667901510.git.ps@pks.im/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-09-09Merge branch 'jc/format-patch-force-in-body-from'Junio C Hamano1-0/+1
"git format-patch --from=<ident>" can be told to add an in-body "From:" line even for commits that are authored by the given <ident> with "--force-in-body-from"option. * jc/format-patch-force-in-body-from: format-patch: learn format.forceInBodyFrom configuration variable format-patch: allow forcing the use of in-body From: header pretty: separate out the logic to decide the use of in-body from
2022-08-29format-patch: allow forcing the use of in-body From: headerJunio C Hamano1-0/+1
Users may be authoring and committing their commits under the same e-mail address they use to send their patches from, in which case they shouldn't need to use the in-body From: line in their outgoing e-mails. At the receiving end, "git am" will use the address on the "From:" header of the incoming e-mail and all should be well. Some mailing lists, however, mangle the From: address from what the original sender had; in such a situation, the user may want to add the in-body "From:" header even for their own patches. "git format-patch --[no-]force-in-body-from" was invented for such users. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19revision: allow --ancestry-path to take an argumentElijah Newren1-0/+9
We have long allowed users to run e.g. git log --ancestry-path master..seen which shows all commits which satisfy all three of these criteria: * are an ancestor of seen * are not an ancestor of master * have master as an ancestor This commit allows another variant: git log --ancestry-path=$TOPIC master..seen which shows all commits which satisfy all of these criteria: * are an ancestor of seen * are not an ancestor of master * have $TOPIC in their ancestry-path that last bullet can be defined as commits meeting any of these criteria: * are an ancestor of $TOPIC * have $TOPIC as an ancestor * are $TOPIC This also allows multiple --ancestry-path arguments, which can be used to find commits with any of the given topics in their ancestry path. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-03revisions API: don't leak memory on argv elements that need free()-ingÆvar Arnfjörð Bjarmason1-1/+2
Add a "free_removed_argv_elements" member to "struct setup_revision_opt", and use it to fix several memory leaks. We have various memory leaks in APIs that take and munge "const char **argv", e.g. parse_options(). Sometimes these APIs are given the "argv" we get to the "main" function, in which case we don't leak memory, but other times we're giving it the "v" member of a "struct strvec" we created. There's several potential ways to fix those sort of leaks, we could add a "nodup" mode to "struct strvec", which would work for the cases where we push constant strings to it. But that wouldn't work as soon as we used strvec_pushf(), or otherwise needed to duplicate or create a string for that "struct strvec". Let's instead make it the responsibility of the revisions API. If it's going to clobber elements of argv it can also free() them, which it will now do if instructed to do so via "free_removed_argv_elements". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-07Merge branch 'ab/plug-leak-in-revisions'Junio C Hamano1-25/+48
Plug the memory leaks from the trickiest API of all, the revision walker. * ab/plug-leak-in-revisions: (27 commits) revisions API: add a TODO for diff_free(&revs->diffopt) revisions API: have release_revisions() release "topo_walk_info" revisions API: have release_revisions() release "date_mode" revisions API: call diff_free(&revs->pruning) in revisions_release() revisions API: release "reflog_info" in release revisions() revisions API: clear "boundary_commits" in release_revisions() revisions API: have release_revisions() release "prune_data" revisions API: have release_revisions() release "grep_filter" revisions API: have release_revisions() release "filter" revisions API: have release_revisions() release "cmdline" revisions API: have release_revisions() release "mailmap" revisions API: have release_revisions() release "commits" revisions API users: use release_revisions() for "prune_data" users revisions API users: use release_revisions() with UNLEAK() revisions API users: use release_revisions() in builtin/log.c revisions API users: use release_revisions() in http-push.c revisions API users: add "goto cleanup" for release_revisions() stash: always have the owner of "stash_info" free it revisions API users: use release_revisions() needing REV_INFO_INIT revision.[ch]: document and move code declared around "init" ...
2022-04-23log: "--since-as-filter" option is a non-terminating "--since" variantMiklos Vajna1-0/+1
The "--since=<time>" option of "git log" limits the commits displayed by the command by stopping the traversal once it sees a commit whose timestamp is older than the given time and not digging further into its parents. This is OK in a history where a commit always has a newer timestamp than any of its parents'. Once you see a commit older than the given <time>, all ancestor commits of it are even older than the time anyway. It poses, however, a problem when there is a commit with a wrong timestamp that makes it appear older than its parents. Stopping traversal at the "incorrectly old" commit will hide its ancestors that are newer than that wrong commit and are newer than the cut-off time given with the --since option. --max-age and --after being the synonyms to --since, they share the same issue. Add a new "--since-as-filter" option that is a variant of "--since=<time>". Instead of stopping the traversal to hide an old enough commit and its all ancestors, exclude commits with an old timestamp from the output but still keep digging the history. Without other traversal stopping options, this will force the command in "git log" family to dig down the history to the root. It may be an acceptable cost for a small project with short history and many commits with screwy timestamps. It is quite unlikely for us to add traversal stopper other than since, so have this as a --since-as-filter option, rather than a separate --as-filter, that would be probably more confusing. Signed-off-by: Miklos Vajna <vmiklos@vmiklos.hu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13revisions API users: use release_revisions() needing REV_INFO_INITÆvar Arnfjörð Bjarmason1-1/+20
Use release_revisions() to various users of "struct rev_list" which need to have their "struct rev_info" zero-initialized before we can start using it. For the bundle.c code see the early exit case added in 3bbbe467f29 (bundle verify: error out if called without an object database, 2019-05-27). For the relevant bisect.c code see 45b6370812c (bisect: libify `check_good_are_ancestors_of_bad` and its dependents, 2020-02-17). For the submodule.c code see the "goto" on "(!left || !right || !sub)" added in 8e6df65015f (submodule: refactor show_submodule_summary with helper function, 2016-08-31). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13revision.[ch]: document and move code declared around "init"Ævar Arnfjörð Bjarmason1-26/+24
A subsequent commit will add "REV_INFO_INIT" macro adjacent to repo_init_revisions(), unfortunately between the "struct rev_info" itself and that function we've added various miscellaneous code between the two over the years. Let's move that code either lower in revision.h, giving it API docs while we're at it, or in cases where it wasn't public API at all move it into revision.c No lines of code are changed here, only moved around. The only changes are the addition of new API comments. The "tree_difference" variable could also be declared like this, which I think would be a lot clearer, but let's leave that for now to keep this a move-only change: static enum { REV_TREE_SAME, REV_TREE_NEW, /* Only new files */ REV_TREE_OLD, /* Only files removed */ REV_TREE_DIFFERENT, /* Mixed changes */ } tree_difference = REV_TREE_SAME; Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13revision.[ch]: provide and start using a release_revisions()Ævar Arnfjörð Bjarmason1-0/+6
The users of the revision.[ch] API's "struct rev_info" are a major source of memory leaks in the test suite under SANITIZE=leak, which in turn adds a lot of noise when trying to mark up tests with "TEST_PASSES_SANITIZE_LEAK=true". The users of that API are largely one-shot, e.g. "git rev-list" or "git log", or the "git checkout" and "git stash" being modified here For these callers freeing the memory is arguably a waste of time, but in many cases they've actually been trying to free the memory, and just doing that in a buggy manner. Let's provide a release_revisions() function for these users, and start migrating them over per the plan outlined in [1]. Right now this only handles the "pending" member of the struct, but more will be added in subsequent commits. Even though we only clear the "pending" member now, let's not leave a trap in code like the pre-image of index_differs_from(), where we'd start doing the wrong thing as soon as the release_revisions() learned to clear its "diffopt". I.e. we need to call release_revisions() after we've inspected any state in "struct rev_info". This leaves in place e.g. clear_pathspec(&rev.prune_data) in stash_working_tree() in builtin/stash.c, subsequent commits will teach release_revisions() to free "prune_data" and other members that in some cases are individually cleared by users of "struct rev_info" by reaching into its members. Those subsequent commits will remove the relevant calls to e.g. clear_pathspec(). We avoid amending code in index_differs_from() in diff-lib.c as well as wt_status_collect_changes_index(), has_unstaged_changes() and has_uncommitted_changes() in wt-status.c in a way that assumes that we are already clearing the "diffopt" member. That will be handled in a subsequent commit. 1. https://lore.kernel.org/git/87a6k8daeu.fsf@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-09revision: put object filter into struct rev_infoDerrick Stolee1-0/+7
Placing a 'struct list_objects_filter_options' within 'struct rev_info' will assist making some bookkeeping around object filters in the future. For now, let's use this new member to remove a static global instance of the struct from builtin/rev-list.c. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-23Merge branch 'ah/log-no-graph'Junio C Hamano1-0/+1
"git log --graph --graph" used to leak a graph structure, and there was no way to countermand "--graph" that appear earlier on the command line. A "--no-graph" option has been added and resource leakage has been plugged. * ah/log-no-graph: log: add a --no-graph option log: fix memory leak if --graph is passed multiple times
2022-02-17Merge branch 'jz/rev-list-exclude-first-parent-only'Junio C Hamano1-1/+2
"git log" and friends learned an option --exclude-first-parent-only to propagate UNINTERESTING bit down only along the first-parent chain, just like --first-parent option shows commits that lack the UNINTERESTING bit only along the first-parent chain. * jz/rev-list-exclude-first-parent-only: git-rev-list: add --exclude-first-parent-only flag
2022-02-16Merge branch 'en/remerge-diff'Junio C Hamano1-1/+5
"git log --remerge-diff" shows the difference from mechanical merge result and the result that is actually recorded in a merge commit. * en/remerge-diff: diff-merges: avoid history simplifications when diffing merges merge-ort: mark conflict/warning messages from inner merges as omittable show, log: include conflict/warning messages in --remerge-diff headers diff: add ability to insert additional headers for paths merge-ort: format messages slightly different for use in headers merge-ort: mark a few more conflict messages as omittable merge-ort: capture and print ll-merge warnings in our preferred fashion ll-merge: make callers responsible for showing warnings log: clean unneeded objects during `log --remerge-diff` show, log: provide a --remerge-diff capability
2022-02-11log: add a --no-graph optionAlex Henrie1-0/+1
It's useful to be able to countermand a previous --graph option, for example if `git log --graph` is run via an alias. Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-02log: clean unneeded objects during `log --remerge-diff`Elijah Newren1-0/+3
The --remerge-diff option will need to create new blobs and trees representing the "automatic merge" state. If one is traversing a long project history, one can easily get hundreds of thousands of loose objects generated during `log --remerge-diff`. However, none of those loose objects are needed after we have completed our diff operation; they can be summarily deleted. Add a new helper function to tmp_objdir to discard all the contained objects, and call it after each merge is handled. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-02show, log: provide a --remerge-diff capabilityElijah Newren1-1/+2
When this option is specified, we remerge all (two parent) merge commits and diff the actual merge commit to the automatically created version, in order to show how users removed conflict markers, resolved the different conflict versions, and potentially added new changes outside of conflict regions in order to resolve semantic merge problems (or, possibly, just to hide other random changes). This capability works by creating a temporary object directory and marking it as the primary object store. This makes it so that any blobs or trees created during the automatic merge are easily removable afterwards by just deleting all objects from the temporary object directory. There are a few ways that this implementation is suboptimal: * `log --remerge-diff` becomes slow, because the temporary object directory can fill with many loose objects while running * the log output can be muddied with misplaced "warning: cannot merge binary files" messages, since ll-merge.c unconditionally writes those messages to stderr while running instead of allowing callers to manage them. * important conflict and warning messages are simply dropped; thus for conflicts like modify/delete or rename/rename or file/directory which are not representable with content conflict markers, there may be no way for a user of --remerge-diff to know that there had been a conflict which was resolved (and which possibly motivated other changes in the merge commit). * when fixing the previous issue, note that some unimportant conflict and warning messages might start being included. We should instead make sure these remain dropped. Subsequent commits will address these issues. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-12git-rev-list: add --exclude-first-parent-only flagJerry Zhang1-1/+2
It is useful to know when a branch first diverged in history from some integration branch in order to be able to enumerate the user's local changes. However, these local changes can include arbitrary merges, so it is necessary to ignore this merge structure when finding the divergence point. In order to do this, teach the "rev-list" family to accept "--exclude-first-parent-only", which restricts the traversal of excluded commits to only follow first parent links. -A-----E-F-G--main \ / / B-C-D--topic In this example, the goal is to return the set {B, C, D} which represents a topic branch that has been merged into main branch. `git rev-list topic ^main` will end up returning no commits since excluding main will end up traversing the commits on topic as well. `git rev-list --exclude-first-parent-only topic ^main` however will return {B, C, D} as desired. Add docs for the new flag, and clarify the doc for --first-parent to indicate that it applies to traversing the set of included commits only. Signed-off-by: Jerry Zhang <jerry@skydio.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-17log: let --invert-grep only invert --grepRené Scharfe1-2/+0
The option --invert-grep is documented to filter out commits whose messages match the --grep filters. However, it also affects the header matches (--author, --committer), which is not intended. Move the handling of that option to grep.c, as only the code there can distinguish between matches in the header from those in the message body. If --invert-grep is given then enable extended expressions (not the regex type, we just need git grep's --not to work), negate the body patterns and check if any of them match by piggy-backing on the collect_hits mechanism of grep_source_1(). Collecting the matches in struct grep_opt is a bit iffy, but with "last_shown" we have a precedent for writing state information to that struct. Reported-by: Dotan Cohen <dotancohen@gmail.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-06Merge branch 'jt/add-submodule-odb-clean-up'Junio C Hamano1-1/+0
More code paths that use the hack to add submodule's object database to the set of alternate object store have been cleaned up. * jt/add-submodule-odb-clean-up: revision: remove "submodule" from opt struct repository: support unabsorbed in repo_submodule_init submodule: remove unnecessary unabsorbed fallback
2021-09-09revision: remove "submodule" from opt structJonathan Tan1-1/+0
Clean up a TODO in revision.h by removing the "submodule" field from struct setup_revision_opt. This field is only used to specify the ref store to use, so use rev_info->repo to determine the ref store instead. The only users of this field are merge-ort.c and merge-recursive.c. However, both these files specify the superproject as rev_info->repo and the submodule as setup_revision_opt->submodule. In order to be able to pass the submodule as rev_info->repo, all commits must be parsed with the submodule explicitly specified; this patch does that as well. (An incremental solution in which only some commits are parsed with explicit submodule will not work, because if the same commit is parsed twice in different repositories, there will be 2 heap-allocated object structs corresponding to that commit, and any flag set by the revision walking mechanism on one of them will not be reflected onto the other.) Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-05revision: separate walk and unsorted flagsPatrick Steinhardt1-5/+2
The `--no-walk` flag supports two modes: either it sorts the revisions given as input input or it doesn't. This is reflected in a single `no_walk` flag, which reflects one of the three states "walk", "don't walk but without sorting" and "don't walk but with sorting". Split up the flag into two separate bits, one indicating whether we should walk or not and one indicating whether the input should be sorted or not. This will allow us to more easily introduce a new flag `--unsorted-input`, which only impacts the sorting bit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-22Merge branch 'bc/rev-list-without-commit-line'Junio C Hamano1-1/+2
"git rev-list" learns to omit the "commit <object-name>" header lines from the output with the `--no-commit-header` option. * bc/rev-list-without-commit-line: rev-list: add option for --pretty=format without header
2021-07-12rev-list: add option for --pretty=format without headerbrian m. carlson1-1/+2
In general, we encourage users to use plumbing commands, like git rev-list, over porcelain commands, like git log, when scripting. However, git rev-list has one glaring problem that prevents it from being used in certain cases: when --pretty is used with a custom format, it always prints out a line containing "commit" and the object ID. This makes it unsuitable for many scripting needs, and forces users to use git log instead. While we can't change this behavior for backwards compatibility, we can add an option to suppress this behavior, so let's do so, and call it "--no-commit-header". Additionally, add the corresponding positive option to switch it back on. Note that this option doesn't affect the built-in formats, only custom formats. This is exactly the same behavior as users already have from git log and is what most users will be used to. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-08Merge branch 'jk/bitmap-tree-optim'Junio C Hamano1-0/+1
Avoid duplicated work while building reachability bitmaps. * jk/bitmap-tree-optim: bitmaps: don't recurse into trees already in the bitmap
2021-06-15bitmaps: don't recurse into trees already in the bitmapJeff King1-0/+1
If an object is already mentioned in a reachability bitmap we are building, then by definition so are all of the objects it can reach. We have an optimization to stop traversing commits when we see they are already in the bitmap, but we don't do the same for trees. It's generally unavoidable to recurse into trees for commits not yet covered by bitmaps (since most commits generally do have unique top-level trees). But they usually have subtrees that are shared with other commits (i.e., all of the subtrees the commit _didn't_ touch). And some of those commits (and their trees) may be covered by the bitmap. Usually this isn't _too_ big a deal, because we'll visit those subtrees only once in total for the whole walk. But if you have a large number of unbitmapped commits, and if your tree is big, then you may end up opening a lot of sub-trees for no good reason. We can use the same optimization we do for commits here: when we are about to open a tree, see if it's in the bitmap (either the one we are building, or the "seen" bitmap which covers the UNINTERESTING side of the bitmap when doing a set-difference). This works especially well because we'll visit all commits before hitting any trees. So even in a history like: A -- B if "A" has a bitmap on disk but "B" doesn't, we'll already have OR-ed in the results from A before looking at B's tree (so we really will only look at trees touched by B). For most repositories, the timings produced by p5310 are unspectacular. Here's linux.git: Test HEAD^ HEAD -------------------------------------------------------------------- 5310.4: simulated clone 6.00(5.90+0.10) 5.98(5.90+0.08) -0.3% 5310.5: simulated fetch 2.98(5.45+0.18) 2.85(5.31+0.18) -4.4% 5310.7: rev-list (commits) 0.32(0.29+0.03) 0.33(0.30+0.03) +3.1% 5310.8: rev-list (objects) 1.48(1.44+0.03) 1.49(1.44+0.05) +0.7% Any improvement there is within the noise (the +3.1% on test 7 has to be noise, since we are not recursing into trees, and thus the new code isn't even run). The results for git.git are likewise uninteresting. But here are numbers from some other real-world repositories (that are not public). This one's tree is comparable in size to linux.git, but has ~16k refs (and so less complete bitmap coverage): Test HEAD^ HEAD ------------------------------------------------------------------------- 5310.4: simulated clone 38.34(39.86+0.74) 33.95(35.53+0.76) -11.5% 5310.5: simulated fetch 2.29(6.31+0.35) 2.20(5.97+0.41) -3.9% 5310.7: rev-list (commits) 0.99(0.86+0.13) 0.96(0.85+0.11) -3.0% 5310.8: rev-list (objects) 11.32(11.04+0.27) 6.59(6.37+0.21) -41.8% And here's another with a very large tree (~340k entries), and a fairly large number of refs (~10k): Test HEAD^ HEAD ------------------------------------------------------------------------- 5310.3: simulated clone 53.83(54.71+1.54) 39.77(40.76+1.50) -26.1% 5310.4: simulated fetch 19.91(20.11+0.56) 19.79(19.98+0.67) -0.6% 5310.6: rev-list (commits) 0.54(0.44+0.11) 0.51(0.43+0.07) -5.6% 5310.7: rev-list (objects) 24.32(23.59+0.73) 9.85(9.49+0.36) -59.5% This patch provides substantial improvements in these larger cases, and have any drawbacks for smaller ones (the cost of the bitmap check is quite small compared to an actual tree traversal). Note that we have to add a version of revision.c's include_check callback which handles non-commits. We could possibly consolidate this into a single callback for all objects types, as there's only one user of the feature which would need converted (pack-bitmap.c:should_include). That would in theory let us avoid duplicating any logic. But when I tried it, the code ended up much worse to read, with lots of repeated "if it's a commit do this, otherwise do that". Having two separate callbacks splits that naturally, and matches the existing split of show_commit/show_object callbacks. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-14Merge branch 'so/log-m-implies-p'Junio C Hamano1-1/+1
The "-m" option in "git log -m" that does not specify which format, if any, of diff is desired did not have any visible effect; it now implies some form of diff (by default "--patch") is produced. * so/log-m-implies-p: diff-merges: let "-m" imply "-p" diff-merges: rename "combined_imply_patch" to "merges_imply_patch" stash list: stop passing "-m" to "git log" git-svn: stop passing "-m" to "git rev-list" diff-merges: move specific diff-index "-m" handling to diff-index t4013: test "git diff-index -m" t4013: test "git diff-tree -m" t4013: test "git log -m --stat" t4013: test "git log -m --raw" t4013: test that "-m" alone has no effect in "git log"
2021-05-21diff-merges: rename "combined_imply_patch" to "merges_imply_patch"Sergey Organov1-1/+1
This is refactoring change in preparation for the next commit that will let -m imply -p. The old name doesn't match the intention to let not only -c/-cc imply -p, but also -m, that is not a "combined" format, so we rename the flag accordingly. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-10revision: mark commit parents as NOT_USER_GIVENPatrick Steinhardt1-3/+0
The NOT_USER_GIVEN flag of an object marks whether a flag was explicitly provided by the user or not. The most important use case for this is when filtering objects: only objects that were not explicitly requested will get filtered. The flag is currently only set for blobs and trees, which has been fine given that there are no filters for tags or commits currently. We're about to extend filtering capabilities to add object type filter though, which requires us to set up the NOT_USER_GIVEN flag correctly -- if it's not set, the object wouldn't get filtered at all. Mark unseen commit parents as NOT_USER_GIVEN when processing parents. Like this, explicitly provided parents stay user-given and thus unfiltered, while parents which get loaded as part of the graph walk can be filtered. This commit shouldn't have any user-visible impact yet as there is no logic to filter commits yet. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-02Merge branch 'zh/format-patch-fractional-reroll-count'Junio C Hamano1-1/+1
"git format-patch -v<n>" learned to allow a reroll count that is not an integer. * zh/format-patch-fractional-reroll-count: format-patch: allow a non-integral version numbers
2021-03-23format-patch: allow a non-integral version numbersZheNing Hu1-1/+1
The `-v<n>` option of `format-patch` can give nothing but an integral iteration number to patches in a series.  Some people, however, prefer to mark a new iteration with only a small fixup with a non integral iteration number (e.g. an "oops, that was wrong" fix-up patch for v4 iteration may be labeled as "v4.1"). Allow `format-patch` to take such a non-integral iteration number. `<n>` can be any string, such as '3.1' or '4rev2'. In the case where it is a non-integral value, the "Range-diff" and "Interdiff" headers will not include the previous version. Signed-off-by: ZheNing Hu <adlternative@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-22revision: learn '--no-kept-objects'Taylor Blau1-0/+4
A future caller will want to be able to perform a reachability traversal which terminates when visiting an object found in a kept pack. The closest existing option is '--honor-pack-keep', but this isn't quite what we want. Instead of halting the traversal midway through, a full traversal is always performed, and the results are only trimmed afterwords. Besides needing to introduce a new flag (since culling results post-facto can be different than halting the traversal as it's happening), there is an additional wrinkle handling the distinction in-core and on-disk kept packs. That is: what kinds of kept pack should stop the traversal? Introduce '--no-kept-objects[=<on-disk|in-core>]' to specify which kinds of kept packs, if any, should stop a traversal. This can be useful for callers that want to perform a reachability analysis, but want to leave certain packs alone (for e.g., when doing a geometric repack that has some "large" packs which are kept in-core that it wants to leave alone). Note that this option is not guaranteed to produce exactly the set of objects that aren't in kept packs, since it's possible the traversal order may end up in a situation where a non-kept ancestor was "cut off" by a kept object (at which point we would stop traversing). But, we don't care about absolute correctness here, since this will eventually be used as a purely additive guide in an upcoming new repack mode. Explicitly avoid documenting this new flag, since it is only used internally. In theory we could avoid even adding it rev-list, but being able to spell this option out on the command-line makes some special cases easier to test without promising to keep it behaving consistently forever. Those tricky cases are exercised in t6114. Signed-off-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-05Merge branch 'so/log-diff-merge'Junio C Hamano1-2/+7
"git log" learned a new "--diff-merges=<how>" option. * so/log-diff-merge: (32 commits) t4013: add tests for --diff-merges=first-parent doc/git-show: include --diff-merges description doc/rev-list-options: document --first-parent changes merges format doc/diff-generate-patch: mention new --diff-merges option doc/git-log: describe new --diff-merges options diff-merges: add '--diff-merges=1' as synonym for 'first-parent' diff-merges: add old mnemonic counterparts to --diff-merges diff-merges: let new options enable diff without -p diff-merges: do not imply -p for new options diff-merges: implement new values for --diff-merges diff-merges: make -m/-c/--cc explicitly mutually exclusive diff-merges: refactor opt settings into separate functions diff-merges: get rid of now empty diff_merges_init_revs() diff-merges: group diff-merge flags next to each other inside 'rev_info' diff-merges: split 'ignore_merges' field diff-merges: fix -m to properly override -c/--cc t4013: add tests for -m failing to override -c/--cc t4013: support test_expect_failure through ':failure' magic diff-merges: revise revs->diff flag handling diff-merges: handle imply -p on -c/--cc logic for log.c ...
2020-12-21diff-merges: let new options enable diff without -pSergey Organov1-0/+1
New options don't have any visible effect unless -p is either given or implied, as unlike -c/-cc we don't imply -p with --diff-merges. To fix this, this patch adds new functionality by letting new options enable output of diffs for merge commits only. Add 'merges_need_diff' field and set it whenever diff output for merges is enabled by any of the new options. Extend diff output logic accordingly, to output diffs for merges when 'merges_need_diff' is set even when no -p has been provided. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-21diff-merges: do not imply -p for new optionsSergey Organov1-0/+1
Add 'combined_imply_patch' field and set it only for old --cc/-c options, then imply -p if this flag is set instead of implying -p whenever 'combined_merge' flag is set. We don't want new --diff-merge options to imply -p, to make it possible to enable output of diffs for merges independently from non-merge commits. At the same time we want to preserve behavior of old --c/-c/-m options and their interactions with --first-parent, to stay backward-compatible. This patch is first step in this direction: it separates old "--cc/-c imply -p" logic from the rest of the options. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-21diff-merges: group diff-merge flags next to each other inside 'rev_info'Sergey Organov1-2/+3
The relevant flags were somewhat scattered over definition of 'struct rev_info'. Rearrange them to group them together. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-21diff-merges: split 'ignore_merges' fieldSergey Organov1-1/+2
'ignore_merges' was 3-way field that served two distinct purposes that we now assign to 2 new independent flags: 'separate_merges', and 'explicit_diff_merges'. 'separate_merges' tells that we need to output diff format containing separate diff for every parent (as opposed to 'combine_merges'). 'explicit_diff_merges' tells that at least one of diff-merges options has been explicitly specified on the command line, so no defaults should apply. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-21diff-merges: introduce revs->first_parent_merges flagSergey Organov1-0/+1
This new field allows us to separate format of diff for merges from 'first_parent_only' flag which primary purpose is limiting history traversal. This change further localizes diff format selection logic into the diff-merges.c file. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-21revision: move diff merges functions to its own diff-merges.cSergey Organov1-3/+0
Create separate diff-merges.c and diff-merges.h files, and move all the code related to handling of diff merges there. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-21revision: provide implementation for diff merges tweaksSergey Organov1-0/+3
Use these implementations from show_setup_revisions_tweak() and log_setup_revisions_tweak() in builtin/log.c. This completes moving of management of diff merges parameters to a single place, where we can finally observe them simultaneously. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-09format-patch: make output filename configurableJunio C Hamano1-0/+1
For the past 15 years, we've used the hardcoded 64 as the length limit of the filename of the output from the "git format-patch" command. Since the value is shorter than the 80-column terminal, it could grow without line wrapping a bit. At the same time, since the value is longer than half of the 80-column terminal, we could fit two or more of them in "ls" output on such a terminal if we allowed to lower it. Introduce a new command line option --filename-max-length=<n> and a new configuration variable format.filenameMaxLength to override the hardcoded default. While we are at it, remove a check that the name of output directory does not exceed PATH_MAX---this check is pointless in that by the time control reaches the function, the caller would already have done an equivalent of "mkdir -p", so if the system does not like an overly long directory name, the control wouldn't have reached here, and otherwise, we know that the system allowed the output directory to exist. In the worst case, we will get an error when we try to open the output file and handle the error correctly anyway. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-31revision: add separate field for "-m" of "diff-index -m"Sergey Organov1-0/+1
Add separate 'match_missing' field for diff-index to use and set it when we encounter "-m" option. This field won't then be cleared when another meaning of "-m" is reverted (e.g., by "--no-diff-merges"), nor it will be affected by future option(s) that might drive 'ignore_merges' field. Use this new field from diff-lib:do_oneway_diff() instead of reusing 'ignore_merges' field. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-17Merge branch 'jk/log-fp-implies-m'Junio C Hamano1-1/+1
"git log --first-parent -p" showed patches only for single-parent commits on the first-parent chain; the "--first-parent" option has been made to imply "-m". Use "--no-diff-merges" to restore the previous behaviour to omit patches for merge commits. * jk/log-fp-implies-m: doc/git-log: clarify handling of merge commit diffs doc/git-log: move "-t" into diff-options list doc/git-log: drop "-r" diff option doc/git-log: move "Diff Formatting" from rev-list-options log: enable "-m" automatically with "--first-parent" revision: add "--no-diff-merges" option to counteract "-m" log: drop "--cc implies -m" logic
2020-07-30Merge branch 'ds/commit-graph-bloom-updates' into masterJunio C Hamano1-2/+4
Updates to the changed-paths bloom filter. * ds/commit-graph-bloom-updates: commit-graph: check all leading directories in changed path Bloom filters revision: empty pathspecs should not use Bloom filters revision.c: fix whitespace commit-graph: check chunk sizes after writing commit-graph: simplify chunk writes into loop commit-graph: unify the signatures of all write_graph_chunk_*() functions commit-graph: persist existence of changed-paths bloom: fix logic in get_bloom_filter() commit-graph: change test to die on parse, not load commit-graph: place bloom_settings in context
2020-07-29revision: add "--no-diff-merges" option to counteract "-m"Jeff King1-1/+1
The "-m" option sets revs->ignore_merges to "0", but there's no way to undo it. This probably isn't something anybody overly cares about, since "1" is already the default, but it will serve as an escape hatch when we flip the default for ignore_merges to "0" in more situations. We'll also add a few extra niceties: - initialize the value to "-1" to indicate "not set", and then resolve it to the normal 0/1 bool in setup_revisions(). This lets any tweak functions, as well as setup_revisions() itself, avoid clobbering the user's preference (which until now they couldn't actually express). - since we now have --no-diff-merges, let's add the matching --diff-merges, which is just a synonym for "-m". Then we don't even need to document --no-diff-merges separately; it countermands the long form of "-m" in the usual way. The new test shows that this behaves just the same as the current behavior without "-m". Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-01commit-graph: check all leading directories in changed path Bloom filtersSZEDER Gábor1-2/+4
The file 'dir/subdir/file' can only be modified if its leading directories 'dir' and 'dir/subdir' are modified as well. So when checking modified path Bloom filters looking for commits modifying a path with multiple path components, then check not only the full path in the Bloom filters, but all its leading directories as well. Take care to check these paths in "deepest first" order, because it's the full path that is least likely to be modified, and the Bloom filter queries can short circuit sooner. This can significantly reduce the average false positive rate, by about an order of magnitude or three(!), and can further speed up pathspec-limited revision walks. The table below compares the average false positive rate and runtime of git rev-list HEAD -- "$path" before and after this change for 5000+ randomly* selected paths from each repository: Average false Average Average positive rate runtime runtime before after before after difference ------------------------------------------------------------------ git 3.220% 0.7853% 0.0558s 0.0387s -30.6% linux 2.453% 0.0296% 0.1046s 0.0766s -26.8% tensorflow 2.536% 0.6977% 0.0594s 0.0420s -29.2% *Path selection was done with the following pipeline: git ls-tree -r --name-only HEAD | sort -R | head -n 5000 The improvements in runtime are much smaller than the improvements in average false positive rate, as we are clearly reaching diminishing returns here. However, all these timings depend on that accessing tree objects is reasonably fast (warm caches). If we had a partial clone and the tree objects had to be fetched from a promisor remote, e.g.: $ git clone --filter=tree:0 --bare file://.../webkit.git webkit.notrees.git $ git -C webkit.git -c core.modifiedPathBloomFilters=1 \ commit-graph write --reachable $ cp webkit.git/objects/info/commit-graph webkit.notrees.git/objects/info/ $ git -C webkit.notrees.git -c core.modifiedPathBloomFilters=1 \ rev-list HEAD -- "$path" then checking all leading path component can reduce the runtime from over an hour to a few seconds (and this is with the clone and the promisor on the same machine). This adjusts the tracing values in t4216-log-bloom.sh, which provides a concrete way to notice the improvement. Helped-by: Taylor Blau <me@ttaylorr.com> Helped-by: René Scharfe <l.s.r@web.de> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-24revision: reallocate TOPO_WALK object flagsRené Scharfe1-3/+4
The bit fields in struct object have an unfortunate layout. Here's what pahole reports on x86_64 GNU/Linux: struct object { unsigned int parsed:1; /* 0: 0 4 */ unsigned int type:3; /* 0: 1 4 */ /* XXX 28 bits hole, try to pack */ /* Force alignment to the next boundary: */ unsigned int :0; unsigned int flags:29; /* 4: 0 4 */ /* XXX 3 bits hole, try to pack */ struct object_id oid; /* 8 32 */ /* size: 40, cachelines: 1, members: 4 */ /* sum members: 32 */ /* sum bitfield members: 33 bits, bit holes: 2, sum bit holes: 31 bits */ /* last cacheline: 40 bytes */ }; Notice the 1+3+29=33 bits in bit fields and 28+3=31 bits in holes. There are holes inside the flags bit field as well -- while some object flags are used for more than one purpose, 22, 23 and 24 are still free. Use 23 and 24 instead of 27 and 28 for TOPO_WALK_EXPLORED and TOPO_WALK_INDEGREE. This allows us to reduce FLAG_BITS by one so that all bitfields combined fit into a single 32-bit slot: struct object { unsigned int parsed:1; /* 0: 0 4 */ unsigned int type:3; /* 0: 1 4 */ unsigned int flags:28; /* 0: 4 4 */ struct object_id oid; /* 4 32 */ /* size: 36, cachelines: 1, members: 4 */ /* last cacheline: 36 bytes */ }; With this tight packing the size of struct object is reduced by 10%. Other architectures probably benefit as well. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-01Merge branch 'gs/commit-graph-path-filter'Junio C Hamano1-0/+11
Introduce an extension to the commit-graph to make it efficient to check for the paths that were modified at each commit using Bloom filters. * gs/commit-graph-path-filter: bloom: ignore renames when computing changed paths commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag t4216: add end to end tests for git log with Bloom filters revision.c: add trace2 stats around Bloom filter usage revision.c: use Bloom filters to speed up path based revision walks commit-graph: add --changed-paths option to write subcommand commit-graph: reuse existing Bloom filters during write commit-graph: write Bloom filters to commit graph file commit-graph: examine commits by generation number commit-graph: examine changed-path objects in pack order commit-graph: compute Bloom filters for changed paths diff: halt tree-diff early after max_changes bloom.c: core Bloom filter implementation for changed paths. bloom.c: introduce core Bloom filter constructs bloom.c: add the murmur3 hash implementation commit-graph: define and use MAX_NUM_CHUNKS
2020-04-22Merge branch 'ds/revision-show-pulls'Junio C Hamano1-1/+5
"git log" learned "--show-pulls" that helps pathspec limited history views; a merge commit that takes the whole change from a side branch, which is normally omitted from the output, is shown in addition to the commits that introduce real changes. * ds/revision-show-pulls: revision: --show-pulls adds helpful merges
2020-04-10revision: --show-pulls adds helpful mergesDerrick Stolee1-1/+5
The default file history simplification of "git log -- <path>" or "git rev-list -- <path>" focuses on providing the smallest set of commits that first contributed a change. The revision walk greatly restricts the set of walked commits by visiting only the first TREESAME parent of a merge commit, when one exists. This means that portions of the commit-graph are not walked, which can be a performance benefit, but can also "hide" commits that added changes but were ignored by a merge resolution. The --full-history option modifies this by walking all commits and reporting a merge commit as "interesting" if it has _any_ parent that is not TREESAME. This tends to be an over-representation of important commits, especially in an environment where most merge commits are created by pull request completion. Suppose we have a commit A and we create a commit B on top that changes our file. When we merge the pull request, we create a merge commit M. If no one else changed the file in the first-parent history between M and A, then M will not be TREESAME to its first parent, but will be TREESAME to B. Thus, the simplified history will be "B". However, M will appear in the --full-history mode. However, suppose that a number of topics T1, T2, ..., Tn were created based on commits C1, C2, ..., Cn between A and M as follows: A----C1----C2--- ... ---Cn----M------P1---P2--- ... ---Pn \ \ \ \ / / / / \ \__.. \ \/ ..__T1 / Tn \ \__.. /\ ..__T2 / \_____________________B \____________________/ If the commits T1, T2, ... Tn did not change the file, then all of P1 through Pn will be TREESAME to their first parent, but not TREESAME to their second. This means that all of those merge commits appear in the --full-history view, with edges that immediately collapse into the lower history without introducing interesting single-parent commits. The --simplify-merges option was introduced to remove these extra merge commits. By noticing that the rewritten parents are reachable from their first parents, those edges can be simplified away. Finally, the commits now look like single-parent commits that are TREESAME to their "only" parent. Thus, they are removed and this issue does not cause issues anymore. However, this also ends up removing the commit M from the history view! Even worse, the --simplify-merges option requires walking the entire history before returning a single result. Many Git users are using Git alongside a Git service that provides code storage alongside a code review tool commonly called "Pull Requests" or "Merge Requests" against a target branch. When these requests are accepted and merged, they typically create a merge commit whose first parent is the previous branch tip and the second parent is the tip of the topic branch used for the request. This presents a valuable order to the parents, but also makes that merge commit slightly special. Users may want to see not only which commits changed a file, but which pull requests merged those commits into their branch. In the previous example, this would mean the users want to see the merge commit "M" in addition to the single- parent commit "C". Users are even more likely to want these merge commits when they use pull requests to merge into a feature branch before merging that feature branch into their trunk. In some sense, users are asking for the "first" merge commit to bring in the change to their branch. As long as the parent order is consistent, this can be handled with the following rule: Include a merge commit if it is not TREESAME to its first parent, but is TREESAME to a later parent. These merges look like the merge commits that would result from running "git pull <topic>" on a main branch. Thus, the option to show these commits is called "--show-pulls". This has the added benefit of showing the commits created by closing a pull request or merge request on any of the Git hosting and code review platforms. To test these options, extend the standard test example to include a merge commit that is not TREESAME to its first parent. It is surprising that that option was not already in the example, as it is instructive. In particular, this extension demonstrates a common issue with file history simplification. When a user resolves a merge conflict using "-Xours" or otherwise ignoring one side of the conflict, they create a TREESAME edge that probably should not be TREESAME. This leads users to become frustrated and complain that "my change disappeared!" In my experience, showing them history with --full-history and --simplify-merges quickly reveals the problematic merge. As mentioned, this option is expensive to compute. The --show-pulls option _might_ show the merge commit (usually titled "resolving conflicts") more quickly. Of course, this depends on the user having the correct parent order, which is backwards when using "git pull master" from a topic branch. There are some special considerations when combining the --show-pulls option with --simplify-merges. This requires adding a new PULL_MERGE object flag to store the information from the initial TREESAME comparisons. This helps avoid dropping those commits in later filters. This is covered by a test, including how the parents can be simplified. Since "struct object" has already ruined its 32-bit alignment by using 33 bits across parsed, type, and flags member, let's not make it worse. PULL_MERGE is used in revision.c with the same value (1u<<15) as REACHABLE in commit-graph.c. The REACHABLE flag is only used when writing a commit-graph file, and a revision walk using --show-pulls does not happen in the same process. Care must be taken in the future to ensure this remains the case. Update Documentation/rev-list-options.txt with significant details around this option. This requires updating the example in the History Simplification section to demonstrate some of the problems with TREESAME second parents. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-07format-patch: teach --no-encode-email-headersEmma Brooks1-1/+2
When commit subjects or authors have non-ASCII characters, git format-patch Q-encodes them so they can be safely sent over email. However, if the patch transfer method is something other than email (web review tools, sneakernet), this only serves to make the patch metadata harder to read without first applying it (unless you can decode RFC 2047 in your head). git am as well as some email software supports non-Q-encoded mail as described in RFC 6531. Add --[no-]encode-email-headers and format.encodeEmailHeaders to let the user control this behavior. Signed-off-by: Emma Brooks <me@pluvano.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-06revision.c: use Bloom filters to speed up path based revision walksGarima Singh1-0/+11
Revision walk will now use Bloom filters for commits to speed up revision walks for a particular path (for computing history for that path), if they are present in the commit-graph file. We load the Bloom filters during the prepare_revision_walk step, currently only when dealing with a single pathspec. Extending it to work with multiple pathspecs can be explored and built on top of this series in the future. While comparing trees in rev_compare_trees(), if the Bloom filter says that the file is not different between the two trees, we don't need to compute the expensive diff. This is where we get our performance gains. The other response of the Bloom filter is '`:maybe', in which case we fall back to the full diff calculation to determine if the path was changed in the commit. We do not try to use Bloom filters when the '--walk-reflogs' option is specified. The '--walk-reflogs' option does not walk the commit ancestry chain like the rest of the options. Incorporating the performance gains when walking reflog entries would add more complexity, and can be explored in a later series. Performance Gains: We tested the performance of `git log -- <path>` on the git repo, the linux and some internal large repos, with a variety of paths of varying depths. On the git and linux repos: - we observed a 2x to 5x speed up. On a large internal repo with files seated 6-10 levels deep in the tree: - we observed 10x to 20x speed ups, with some paths going up to 28 times faster. Helped-by: Derrick Stolee <dstolee@microsoft.com Helped-by: SZEDER Gábor <szeder.dev@gmail.com> Helped-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-25Merge branch 'dl/format-patch-notes-config-fixup'Junio C Hamano1-1/+1
"git format-patch" can take a set of configured format.notes values to specify which notes refs to use in the log message part of the output. The behaviour of this was not consistent with multiple --notes command line options, which has been corrected. * dl/format-patch-notes-config-fixup: notes.h: fix typos in comment notes: break set_display_notes() into smaller functions config/format.txt: clarify behavior of multiple format.notes format-patch: move git_config() before repo_init_revisions() format-patch: use --notes behavior for format.notes notes: extract logic into set_display_notes() notes: create init_display_notes() helper notes: rename to load_display_notes()
2019-12-16Merge branch 'hw/doc-in-header'Junio C Hamano1-0/+59
* hw/doc-in-header: trace2: move doc to trace2.h submodule-config: move doc to submodule-config.h tree-walk: move doc to tree-walk.h trace: move doc to trace.h run-command: move doc to run-command.h parse-options: add link to doc file in parse-options.h credential: move doc to credential.h argv-array: move doc to argv-array.h cache: move doc to cache.h sigchain: move doc to sigchain.h pathspec: move doc to pathspec.h revision: move doc to revision.h attr: move doc to attr.h refs: move doc to refs.h remote: move doc to remote.h and refspec.h sha1-array: move doc to sha1-array.h merge: move doc to ll-merge.h graph: move doc to graph.h and graph.c dir: move doc to dir.h diff: move doc to diff.h and diffcore.h
2019-12-13notes: break set_display_notes() into smaller functionsDenton Liu1-1/+1
In 8164c961e1 (format-patch: use --notes behavior for format.notes, 2019-12-09), we introduced set_display_notes() which was a monolithic function with three mutually exclusive branches. Break the function up into three small and simple functions that each are only responsible for one task. This family of functions accepts an `int *show_notes` instead of returning a value suitable for assignment to `show_notes`. This is for two reasons. First of all, this guarantees that the external `show_notes` variable changes in lockstep with the `struct display_notes_opt`. Second, this prompts future developers to be careful about doing something meaningful with this value. In fact, a NULL check is intentionally omitted because causing a segfault here would tell the future developer that they are meant to use the value for something meaningful. One alternative was making the family of functions accept a `struct rev_info *` instead of the `struct display_notes_opt *`, since the former contains the `show_notes` field as well. This does not work because we have to call git_config() before repo_init_revisions(). However, if we had a `struct rev_info`, we'd need to initialize it before it gets assigned values from git_config(). As a result, we break the circular dependency by having standalone `int show_notes` and `struct display_notes_opt notes_opt` variables which temporarily hold values from git_config() until the values are copied over to `rev`. To implement this change, we need to get a pointer to `rev_info::show_notes`. Unfortunately, this is not possible with bitfields and only direct-assignment is possible. Change `rev_info::show_notes` to a non-bitfield int so that we can get its address. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-20revision: make get_revision_mark() return const pointerDenton Liu1-2/+2
get_revision_mark() used to return a `char *`, even though all of the strings it was returning were string literals. Make get_revision_mark() return a `const char *` so that callers won't be tempted to modify the returned string. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-18revision: move doc to revision.hHeba Waly1-0/+59
Move the documentation from Documentation/technical/api-revision-walking.txt to revision.h as it's easier for the developers to find the usage information beside the code instead of looking for it in another doc file. Also documentation/technical/api-revision-walking.txt is removed because the information it has is now redundant and it'll be hard to keep it up to date and synchronized with the documentation in the header file. Signed-off-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-07Merge branch 'en/combined-all-paths'Junio C Hamano1-0/+1
Output from "diff --cc" did not show the original paths when the merge involved renames. A new option adds the paths in the original trees to the output. * en/combined-all-paths: log,diff-tree: add --combined-all-paths option
2019-02-07log,diff-tree: add --combined-all-paths optionElijah Newren1-0/+1
The combined diff format for merges will only list one filename, even if rename or copy detection is active. For example, with raw format one might see: ::100644 100644 100644 fabadb8 cc95eb0 4866510 MM describe.c ::100755 100755 100755 52b7a2d 6d1ac04 d2ac7d7 RM bar.sh ::100644 100644 100644 e07d6c5 9042e82 ee91881 RR phooey.c This doesn't let us know what the original name of bar.sh was in the first parent, and doesn't let us know what either of the original names of phooey.c were in either of the parents. In contrast, for non-merge commits, raw format does provide original filenames (and a rename score to boot). In order to also provide original filenames for merge commits, add a --combined-all-paths option (which must be used with either -c or --cc, and is likely only useful with rename or copy detection active) so that we can print tab-separated filenames when renames are involved. This transforms the above output to: ::100644 100644 100644 fabadb8 cc95eb0 4866510 MM desc.c desc.c desc.c ::100755 100755 100755 52b7a2d 6d1ac04 d2ac7d7 RM foo.sh bar.sh bar.sh ::100644 100644 100644 e07d6c5 9042e82 ee91881 RR fooey.c fuey.c phooey.c Further, in patch format, this changes the from/to headers so that instead of just having one "from" header, we get one for each parent. For example, instead of having --- a/phooey.c +++ b/phooey.c we would see --- a/fooey.c --- a/fuey.c +++ b/phooey.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-06Merge branch 'ds/push-sparse-tree-walk'Junio C Hamano1-0/+2
"git pack-objects" learned another algorithm to compute the set of objects to send, that trades the resulting packfile off to save traversal cost to favor small pushes. * ds/push-sparse-tree-walk: pack-objects: create GIT_TEST_PACK_SPARSE pack-objects: create pack.useSparse setting revision: implement sparse algorithm list-objects: consume sparse tree walk revision: add mark_tree_uninteresting_sparse
2019-01-17revision: add mark_tree_uninteresting_sparseDerrick Stolee1-0/+2
In preparation for a new algorithm that walks fewer trees when creating a pack from a set of revisions, create a method that takes an oidset of tree oids and marks reachable objects as UNINTERESTING. The current implementation uses the existing mark_tree_uninteresting to recursively walk the trees and blobs. This will walk the same number of trees as the old mechanism. To ensure that mark_tree_uninteresting walks the tree, we need to remove the UNINTERESTING flag before calling the method. This implementation will be replaced entirely in a later commit. There is one new assumption in this approach: we are also given the oids of the interesting trees. This implementation does not use those trees at the moment, but we will use them in a later rewrite of this method. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-14Merge branch 'md/exclude-promisor-objects-fix-cleanup'Junio C Hamano1-2/+2
Code clean-up. * md/exclude-promisor-objects-fix-cleanup: revision.c: put promisor option in specialized struct
2018-12-06revision.c: put promisor option in specialized structMatthew DeVore1-2/+2
Put the allow_exclude_promisor_objects flag in setup_revision_opt. When it was in rev_info, it was unclear when it was used, since rev_info is passed to functions that don't use the flag. This resulted in unnecessary setting of the flag in prune.c, so fix that as well. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-18Merge branch 'ds/reachable-topo-order'Junio C Hamano1-0/+7
The revision walker machinery learned to take advantage of the commit generation numbers stored in the commit-graph file. * ds/reachable-topo-order: t6012: make rev-list tests more interesting revision.c: generation-based topo-order algorithm commit/revisions: bookkeeping before refactoring revision.c: begin refactoring --topo-order logic test-reach: add rev-list tests test-reach: add run_three_modes method prio-queue: add 'peek' operation
2018-11-06Merge branch 'md/exclude-promisor-objects-fix'Junio C Hamano1-0/+1
Operations on promisor objects make sense in the context of only a small subset of the commands that internally use the revisions machinery, but the "--exclude-promisor-objects" option were taken and led to nonsense results by commands like "log", to which it didn't make much sense. This has been corrected. * md/exclude-promisor-objects-fix: exclude-promisor-objects: declare when option is allowed Documentation/git-log.txt: do not show --exclude-promisor-objects
2018-11-02revision.c: generation-based topo-order algorithmDerrick Stolee1-0/+2
The current --topo-order algorithm requires walking all reachable commits up front, topo-sorting them, all before outputting the first value. This patch introduces a new algorithm which uses stored generation numbers to incrementally walk in topo-order, outputting commits as we go. This can dramatically reduce the computation time to write a fixed number of commits, such as when limiting with "-n <N>" or filling the first page of a pager. When running a command like 'git rev-list --topo-order HEAD', Git performed the following steps: 1. Run limit_list(), which parses all reachable commits, adds them to a linked list, and distributes UNINTERESTING flags. If all unprocessed commits are UNINTERESTING, then it may terminate without walking all reachable commits. This does not occur if we do not specify UNINTERESTING commits. 2. Run sort_in_topological_order(), which is an implementation of Kahn's algorithm. It first iterates through the entire set of important commits and computes the in-degree of each (plus one, as we use 'zero' as a special value here). Then, we walk the commits in priority order, adding them to the priority queue if and only if their in-degree is one. As we remove commits from this priority queue, we decrement the in-degree of their parents. 3. While we are peeling commits for output, get_revision_1() uses pop_commit on the full list of commits computed by sort_in_topological_order(). In the new algorithm, these three steps correspond to three different commit walks. We run these walks simultaneously, and advance each only as far as necessary to satisfy the requirements of the 'higher order' walk. We know when we can pause each walk by using generation numbers from the commit- graph feature. Recall that the generation number of a commit satisfies: * If the commit has at least one parent, then the generation number is one more than the maximum generation number among its parents. * If the commit has no parent, then the generation number is one. There are two special generation numbers: * GENERATION_NUMBER_INFINITY: this value is 0xffffffff and indicates that the commit is not stored in the commit-graph and the generation number was not previously calculated. * GENERATION_NUMBER_ZERO: this value (0) is a special indicator to say that the commit-graph was generated by a version of Git that does not compute generation numbers (such as v2.18.0). Since we use generation_numbers_enabled() before using the new algorithm, we do not need to worry about GENERATION_NUMBER_ZERO. However, the existence of GENERATION_NUMBER_INFINITY implies the following weaker statement than the usual we expect from generation numbers: If A and B are commits with generation numbers gen(A) and gen(B) and gen(A) < gen(B), then A cannot reach B. Thus, we will walk in each of our stages until the "maximum unexpanded generation number" is strictly lower than the generation number of a commit we are about to use. The walks are as follows: 1. EXPLORE: using the explore_queue priority queue (ordered by maximizing the generation number), parse each reachable commit until all commits in the queue have generation number strictly lower than needed. During this walk, update the UNINTERESTING flags as necessary. 2. INDEGREE: using the indegree_queue priority queue (ordered by maximizing the generation number), add one to the in- degree of each parent for each commit that is walked. Since we walk in order of decreasing generation number, we know that discovering an in-degree value of 0 means the value for that commit was not initialized, so should be initialized to two. (Recall that in-degree value "1" is what we use to say a commit is ready for output.) As we iterate the parents of a commit during this walk, ensure the EXPLORE walk has walked beyond their generation numbers. 3. TOPO: using the topo_queue priority queue (ordered based on the sort_order given, which could be commit-date, author- date, or typical topo-order which treats the queue as a LIFO stack), remove a commit from the queue and decrement the in-degree of each parent. If a parent has an in-degree of one, then we add it to the topo_queue. Before we decrement the in-degree, however, ensure the INDEGREE walk has walked beyond that generation number. The implementations of these walks are in the following methods: * explore_walk_step and explore_to_depth * indegree_walk_step and compute_indegrees_to_depth * next_topo_commit and expand_topo_walk These methods have some patterns that may seem strange at first, but they are probably carry-overs from their equivalents in limit_list and sort_in_topological_order. One thing that is missing from this implementation is a proper way to stop walking when the entire queue is UNINTERESTING, so this implementation is not enabled by comparisions, such as in 'git rev-list --topo-order A..B'. This can be updated in the future. In my local testing, I used the following Git commands on the Linux repository in three modes: HEAD~1 with no commit-graph, HEAD~1 with a commit-graph, and HEAD with a commit-graph. This allows comparing the benefits we get from parsing commits from the commit-graph and then again the benefits we get by restricting the set of commits we walk. Test: git rev-list --topo-order -100 HEAD HEAD~1, no commit-graph: 6.80 s HEAD~1, w/ commit-graph: 0.77 s HEAD, w/ commit-graph: 0.02 s Test: git rev-list --topo-order -100 HEAD -- tools HEAD~1, no commit-graph: 9.63 s HEAD~1, w/ commit-graph: 6.06 s HEAD, w/ commit-graph: 0.06 s This speedup is due to a few things. First, the new generation- number-enabled algorithm walks commits on order of the number of results output (subject to some branching structure expectations). Since we limit to 100 results, we are running a query similar to filling a single page of results. Second, when specifying a path, we must parse the root tree object for each commit we walk. The previous benefits from the commit-graph are entirely from reading the commit-graph instead of parsing commits. Since we need to parse trees for the same number of commits as before, we slow down significantly from the non-path-based query. For the test above, I specifically selected a path that is changed frequently, including by merge commits. A less-frequently-changed path (such as 'README') has similar end-to-end time since we need to walk the same number of commits (before determining we do not have 100 hits). However, get the benefit that the output is presented to the user as it is discovered, much the same as a normal 'git log' command (no '--topo-order'). This is an improved user experience, even if the command has the same runtime. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-02revision.c: begin refactoring --topo-order logicDerrick Stolee1-0/+4
When running 'git rev-list --topo-order' and its kin, the topo_order setting in struct rev_info implies the limited setting. This means that the following things happen during prepare_revision_walk(): * revs->limited implies we run limit_list() to walk the entire reachable set. There are some short-cuts here, such as if we perform a range query like 'git rev-list COMPARE..HEAD' and we can stop limit_list() when all queued commits are uninteresting. * revs->topo_order implies we run sort_in_topological_order(). See the implementation of that method in commit.c. It implies that the full set of commits to order is in the given commit_list. These two methods imply that a 'git rev-list --topo-order HEAD' command must walk the entire reachable set of commits _twice_ before returning a single result. If we have a commit-graph file with generation numbers computed, then there is a better way. This patch introduces some necessary logic redirection when we are in this situation. In v2.18.0, the commit-graph file contains zero-valued bytes in the positions where the generation number is stored in v2.19.0 and later. Thus, we use generation_numbers_enabled() to check if the commit-graph is available and has non-zero generation numbers. When setting revs->limited only because revs->topo_order is true, only do so if generation numbers are not available. There is no reason to use the new logic as it will behave similarly when all generation numbers are INFINITY or ZERO. In prepare_revision_walk(), if we have revs->topo_order but not revs->limited, then we trigger the new logic. It breaks the logic into three pieces, to fit with the existing framework: 1. init_topo_walk() fills a new struct topo_walk_info in the rev_info struct. We use the presence of this struct as a signal to use the new methods during our walk. In this patch, this method simply calls limit_list() and sort_in_topological_order(). In the future, this method will set up a new data structure to perform that logic in-line. 2. next_topo_commit() provides get_revision_1() with the next topo- ordered commit in the list. Currently, this simply pops the commit from revs->commits. 3. expand_topo_walk() provides get_revision_1() with a way to signal walking beyond the latest commit. Currently, this calls add_parents_to_list() exactly like the old logic. While this commit presents method redirection for performing the exact same logic as before, it allows the next commit to focus only on the new logic. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-30Merge branch 'md/filter-trees'Junio C Hamano1-2/+24
The "rev-list --filter" feature learned to exclude all trees via "tree:0" filter. * md/filter-trees: list-objects: support for skipping tree traversal filter-trees: code clean-up of tests list-objects-filter: implement filter tree:0 list-objects-filter-options: do not over-strbuf_init list-objects-filter: use BUG rather than die revision: mark non-user-given objects instead rev-list: handle missing tree objects properly list-objects: always parse trees gently list-objects: refactor to process_tree_contents list-objects: store common func args in struct
2018-10-23exclude-promisor-objects: declare when option is allowedMatthew DeVore1-0/+1
The --exclude-promisor-objects option causes some funny behavior in at least two commands: log and blame. It causes a BUG crash: $ git log --exclude-promisor-objects BUG: revision.c:2143: exclude_promisor_objects can only be used when fetch_if_missing is 0 Aborted [134] Fix this such that the option is treated like any other unknown option. The commands that must support it are limited, so declare in those commands that the flag is supported. In particular: pack-objects prune rev-list The commands were found by searching for logic which parses --exclude-promisor-objects outside of revision.c. Extra logic outside of revision.c is needed because fetch_if_missing must be turned on before revision.c sees the option or it will BUG-crash. The above list is supported by the fact that no other command is introspectively invoked by another command passing --exclude-promisor-object. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-19Merge branch 'nd/the-index'Junio C Hamano1-4/+11
Various codepaths in the core-ish part learn to work on an arbitrary in-core index structure, not necessarily the default instance "the_index". * nd/the-index: (23 commits) revision.c: reduce implicit dependency the_repository revision.c: remove implicit dependency on the_index ws.c: remove implicit dependency on the_index tree-diff.c: remove implicit dependency on the_index submodule.c: remove implicit dependency on the_index line-range.c: remove implicit dependency on the_index userdiff.c: remove implicit dependency on the_index rerere.c: remove implicit dependency on the_index sha1-file.c: remove implicit dependency on the_index patch-ids.c: remove implicit dependency on the_index merge.c: remove implicit dependency on the_index merge-blobs.c: remove implicit dependency on the_index ll-merge.c: remove implicit dependency on the_index diff-lib.c: remove implicit dependency on the_index read-cache.c: remove implicit dependency on the_index diff.c: remove implicit dependency on the_index grep.c: remove implicit dependency on the_index diff.c: remove the_index dependency in textconv() functions blame.c: rename "repo" argument to "r" combine-diff.c: remove implicit dependency on the_index ...
2018-10-07revision: mark non-user-given objects insteadMatthew DeVore1-2/+9
Currently, list-objects.c incorrectly treats all root trees of commits as USER_GIVEN. Also, it would be easier to mark objects that are non-user-given instead of user-given, since the places in the code where we access an object through a reference are more obvious than the places where we access an object that was given by the user. Resolve these two problems by introducing a flag NOT_USER_GIVEN that marks blobs and trees that are non-user-given, replacing USER_GIVEN. (Only blobs and trees are marked because this mark is only used when filtering objects, and filtering of other types of objects is not supported yet.) This fixes a bug in that git rev-list behaved differently from git pack-objects. pack-objects would *not* filter objects given explicitly on the command line and rev-list would filter. This was because the two commands used a different function to add objects to the rev_info struct. This seems to have been an oversight, and pack-objects has the correct behavior, so I added a test to make sure that rev-list now behaves properly. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07rev-list: handle missing tree objects properlyMatthew DeVore1-0/+15
Previously, we assumed only blob objects could be missing. This patch makes rev-list handle missing trees like missing blobs. The --missing=* and --exclude-promisor-objects flags now work for trees as they already do for blobs. This is demonstrated in t6112. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-21revision.c: reduce implicit dependency the_repositoryNguyễn Thái Ngọc Duy1-1/+1
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-21revision.c: remove implicit dependency on the_indexNguyễn Thái Ngọc Duy1-3/+10
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17Merge branch 'es/format-patch-rangediff'Junio C Hamano1-0/+6
"git format-patch" learned a new "--range-diff" option to explain the difference between this version and the previous attempt in the cover letter (or after the tree-dashes as a comment). * es/format-patch-rangediff: format-patch: allow --range-diff to apply to a lone-patch format-patch: add --creation-factor tweak for --range-diff format-patch: teach --range-diff to respect -v/--reroll-count format-patch: extend --range-diff to accept revision range format-patch: add --range-diff option to embed diff in cover letter range-diff: relieve callers of low-level configuration burden range-diff: publish default creation factor range-diff: respect diff_option.file rather than assuming 'stdout'
2018-09-17Merge branch 'es/format-patch-interdiff'Junio C Hamano1-0/+5
"git format-patch" learned a new "--interdiff" option to explain the difference between this version and the previous atttempt in the cover letter (or after the tree-dashes as a comment). * es/format-patch-interdiff: format-patch: allow --interdiff to apply to a lone-patch log-tree: show_log: make commentary block delimiting reusable interdiff: teach show_interdiff() to indent interdiff format-patch: teach --interdiff to respect -v/--reroll-count format-patch: add --interdiff option to embed diff in cover letter format-patch: allow additional generated content in make_cover_letter()
2018-09-17Merge branch 'jk/rev-list-stdin-noop-is-ok'Junio C Hamano1-0/+5
"git rev-list --stdin </dev/null" used to be an error; it now shows no output without an error. "git rev-list --stdin --default HEAD" still falls back to the given default when nothing is given on the standard input. * jk/rev-list-stdin-noop-is-ok: rev-list: make empty --stdin not an error
2018-08-22rev-list: make empty --stdin not an errorJeff King1-0/+5
When we originally did the series that contains 7ba826290a (revision: add rev_input_given flag, 2017-08-02) the intent was that "git rev-list --stdin </dev/null" would similarly become a successful noop. However, an attempt at the time to do that did not work[1]. The problem is that rev_input_given serves two roles: - it tells rev-list.c that it should not error out - it tells revision.c that it should not have the "default" ref kick (e.g., "HEAD" in "git log") We want to trigger the former, but not the latter. This is technically possible with a single flag, if we set the flag only after revision.c's revs->def check. But this introduces a rather subtle ordering dependency. Instead, let's keep two flags: one to denote when we got actual input (which triggers both roles) and one for when we read stdin (which triggers only the first). This does mean a caller interested in the first role has to check both flags, but there's only one such caller. And any future callers might want to make the distinction anyway (e.g., if they care less about erroring out, and more about whether revision.c soaked up our stdin). In fact, we already keep such a flag internally in revision.c for this purpose, so this is really just exposing that to the caller (and the old function-local flag can go away in favor of our new one). [1] https://public-inbox.org/git/20170802223416.gwiezhbuxbdmbjzx@sigill.intra.peff.net/ Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20Merge branch 'en/incl-forward-decl'Junio C Hamano1-0/+1
Code hygiene improvement for the header files. * en/incl-forward-decl: Remove forward declaration of an enum compat/precompose_utf8.h: use more common include guard style urlmatch.h: fix include guard Move definition of enum branch_track from cache.h to branch.h alloc: make allocate_alloc_state and clear_alloc_state more consistent Add missing includes and forward declarations
2018-08-15Add missing includes and forward declarationsElijah Newren1-0/+1
I looped over the toplevel header files, creating a temporary two-line C program for each consisting of #include "git-compat-util.h" #include $HEADER This patch is the result of manually fixing errors in compiling those tiny programs. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-14format-patch: teach --range-diff to respect -v/--reroll-countEric Sunshine1-0/+1
The --range-diff option announces the embedded range-diff generically as "Range-diff:", however, we can do better when --reroll-count is specified by emitting "Range-diff against v{n}:" instead. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-14format-patch: add --range-diff option to embed diff in cover letterEric Sunshine1-0/+5
When submitting a revised version of a patch series, it can be helpful (to reviewers) to include a summary of changes since the previous attempt in the form of a range-diff, however, doing so involves manually copy/pasting the diff into the cover letter. Add a --range-diff option to automate this process. The argument to --range-diff specifies the tip of the previous attempt against which to generate the range-diff. For example: git format-patch --cover-letter --range-diff=v1 -3 v2 (At this stage, the previous attempt and the patch series being formatted must share a common base, however, a subsequent enhancement will make it possible to specify an explicit revision range for the previous attempt.) Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-14Merge branch 'es/format-patch-interdiff' into es/format-patch-rangediffJunio C Hamano1-0/+5
* es/format-patch-interdiff: format-patch: allow --interdiff to apply to a lone-patch log-tree: show_log: make commentary block delimiting reusable interdiff: teach show_interdiff() to indent interdiff format-patch: teach --interdiff to respect -v/--reroll-count format-patch: add --interdiff option to embed diff in cover letter format-patch: allow additional generated content in make_cover_letter()
2018-08-03revision.h: drop extern from function declarationNguyễn Thái Ngọc Duy1-34/+35
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-24Merge branch 'jt/partial-clone-fsck-connectivity'Junio C Hamano1-1/+2
Partial clone support of "git clone" has been updated to correctly validate the objects it receives from the other side. The server side has been corrected to send objects that are directly requested, even if they may match the filtering criteria (e.g. when doing a "lazy blob" partial clone). * jt/partial-clone-fsck-connectivity: clone: check connectivity even if clone is partial upload-pack: send refs' objects despite "filter"
2018-07-23format-patch: teach --interdiff to respect -v/--reroll-countEric Sunshine1-0/+1
The --interdiff option introduces the embedded interdiff generically as "Interdiff:", however, we can do better when --reroll-count is specified by emitting "Interdiff against v{n}:" instead. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-23format-patch: add --interdiff option to embed diff in cover letterEric Sunshine1-0/+4
When submitting a revised version of a patch series, it can be helpful (to reviewers) to include a summary of changes since the previous attempt in the form of an interdiff, however, doing so involves manually copy/pasting the diff into the cover letter. Add an --interdiff option to automate this process. The argument to --interdiff specifies the tip of the previous attempt against which to generate the interdiff. For example: git format-patch --cover-letter --interdiff=v1 -3 v2 The previous attempt and the patch series being formatted must share a common base. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-09upload-pack: send refs' objects despite "filter"Jonathan Tan1-1/+2
A filter line in a request to upload-pack filters out objects regardless of whether they are directly referenced by a "want" line or not. This means that cloning with "--filter=blob:none" (or another filter that excludes blobs) from a repository with at least one ref pointing to a blob (for example, the Git repository itself) results in output like the following: error: missing object referenced by 'refs/tags/junio-gpg-pub' and if that particular blob is not referenced by a fetched tree, the resulting clone fails fsck because there is no object from the remote to vouch that the missing object is a promisor object. Update both the protocol and the upload-pack implementation to include all explicitly specified "want" objects in the packfile regardless of the filter specification. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-21revision.c: use commit-slab for show_sourceNguyễn Thái Ngọc Duy1-1/+4
Instead of relying on commit->util to store the source string, let the user provide a commit-slab to store the source strings in. It's done so that commit->util can be removed. See more explanation in the commit that removes commit->util. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-06Merge branch 'jk/cached-commit-buffer'Junio C Hamano1-1/+0
Code clean-up. * jk/cached-commit-buffer: revision: drop --show-all option commit: drop uses of get_cached_commit_buffer()
2018-02-22revision: drop --show-all optionJeff King1-1/+0
This was an undocumented debugging aid that does not seem to have come in handy in the past decade, judging from its lack of mentions on the mailing list. Let's drop it in the name of simplicity. This is morally a revert of 3131b71301 (Add "--show-all" revision walker flag for debugging, 2008-02-09), but note that I did leave in the mapping of UNINTERESTING to "^" in get_revision_mark(). I don't think this would be possible to trigger with the current code, but it's the only sensible marker. We'll skip the usual deprecation period because this was explicitly a debugging aid that was never documented. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-15Merge branch 'rs/lose-leak-pending' into maintJunio C Hamano1-12/+0
API clean-up around revision traversal. * rs/lose-leak-pending: commit: remove unused function clear_commit_marks_for_object_array() revision: remove the unused flag leak_pending checkout: avoid using the rev_info flag leak_pending bundle: avoid using the rev_info flag leak_pending bisect: avoid using the rev_info flag leak_pending object: add clear_commit_marks_all() ref-filter: use clear_commit_marks_many() in do_merge_filter() commit: use clear_commit_marks_many() in remove_redundant() commit: avoid allocation in clear_commit_marks_many()
2018-02-13Merge branch 'jh/fsck-promisors'Junio C Hamano1-1/+4
In preparation for implementing narrow/partial clone, the machinery for checking object connectivity used by gc and fsck has been taught that a missing object is OK when it is referenced by a packfile specially marked as coming from trusted repository that promises to make them available on-demand and lazily. * jh/fsck-promisors: gc: do not repack promisor packfiles rev-list: support termination at promisor objects sha1_file: support lazily fetching missing objects introduce fetch-object: fetch one promisor object index-pack: refactor writing of .keep files fsck: support promisor objects as CLI argument fsck: support referenced promisor objects fsck: support refs pointing to promisor objects fsck: introduce partialclone extension extension.partialclone: introduce partial clone extension
2018-01-23Merge branch 'rs/lose-leak-pending'Junio C Hamano1-12/+0
API clean-up around revision traversal. * rs/lose-leak-pending: commit: remove unused function clear_commit_marks_for_object_array() revision: remove the unused flag leak_pending checkout: avoid using the rev_info flag leak_pending bundle: avoid using the rev_info flag leak_pending bisect: avoid using the rev_info flag leak_pending object: add clear_commit_marks_all() ref-filter: use clear_commit_marks_many() in do_merge_filter() commit: use clear_commit_marks_many() in remove_redundant() commit: avoid allocation in clear_commit_marks_many()
2017-12-28Merge branch 'sb/describe-blob'Junio C Hamano1-1/+2
"git describe" was taught to dig trees deeper to find a <commit-ish>:<path> that refers to a given blob object. * sb/describe-blob: builtin/describe.c: describe a blob builtin/describe.c: factor out describe_commit builtin/describe.c: print debug statements earlier builtin/describe.c: rename `oid` to avoid variable shadowing revision.h: introduce blob/tree walking in order of the commits list-objects.c: factor out traverse_trees_and_blobs t6120: fix typo in test name
2017-12-28revision: remove the unused flag leak_pendingRené Scharfe1-12/+0
Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-12format: create pretty.h fileOlga Telezhnaya1-1/+1
Create header for pretty.c to make formatting interface more structured. This is a middle point, this file would be merged further with other files which contain formatting stuff. Signed-off-by: Olga Telezhnaia <olyatelezhnaya@gmail.com> Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08rev-list: support termination at promisor objectsJonathan Tan1-1/+4
Teach rev-list to support termination of an object traversal at any object from a promisor remote (whether one that the local repo also has, or one that the local repo knows about because it has another promisor object that references it). This will be used subsequently in gc and in the connectivity check used by fetch. For efficiency, if an object is referenced by a promisor object, and is in the local repo only as a non-promisor object, object traversal will not stop there. This is to avoid building the list of promisor object references. (In list-objects.c, the case where obj is NULL in process_blob() and process_tree() do not need to be changed because those happen only when there is a conflict between the expected type and the existing object. If the object doesn't exist, an object will be synthesized, which is fine.) Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-16revision.h: introduce blob/tree walking in order of the commitsStefan Beller1-1/+2
The functionality to list tree objects in the order they were seen while traversing the commits will be used in one of the next commits, where we teach `git describe` to describe not only commits, but blobs, too. The change in list-objects.c is rather minimal as we'll be re-using the infrastructure put in place of the revision walking machinery. For example one could expect that add_pending_tree is not called, but rather commit->tree is directly passed to the tree traversal function. This however requires a lot more code than just emptying the queue containing trees after each commit. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-29Merge branch 'ma/leakplugs'Junio C Hamano1-0/+11
Memory leaks in various codepaths have been plugged. * ma/leakplugs: pack-bitmap[-write]: use `object_array_clear()`, don't leak object_array: add and use `object_array_pop()` object_array: use `object_array_clear()`, not `free()` leak_pending: use `object_array_clear()`, not `free()` commit: fix memory leak in `reduce_heads()` builtin/commit: fix memory leak in `prepare_index()`
2017-09-24leak_pending: use `object_array_clear()`, not `free()`Martin Ågren1-0/+11
Setting `leak_pending = 1` tells `prepare_revision_walk()` not to release the `pending` array, and makes that the caller's responsibility. See 4a43d374f (revision: add leak_pending flag, 2011-10-01) and 353f5657a (bisect: use leak_pending flag, 2011-10-01). Commit 1da1e07c8 (clean up name allocation in prepare_revision_walk, 2014-10-15) fixed a memory leak in `prepare_revision_walk()` by switching from `free()` to `object_array_clear()`. However, where we use the `leak_pending`-mechanism, we're still only calling `free()`. Use `object_array_clear()` instead. Copy some helpful comments from 353f5657a to the other callers that we update to clarify the memory responsibilities, and to highlight that the commits are not affected when we clear the array -- it is indeed correct to both tidy up the commit flags and clear the object array. Document `leak_pending` in revision.h to help future users get this right. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-24revision.h: new flag in struct rev_info wrt. worktree-related refsNguyễn Thái Ngọc Duy1-0/+1
The revision walker can walk through per-worktree refs like HEAD or SHA-1 references in the index. These currently are from the current worktree only. This new flag is added to change rev-list behavior in this regard: When single_worktree is set, only current worktree is considered. When it is not set (which is the default), all worktrees are considered. The default is chosen so because the two big components that rev-list works with are object database (entirely shared between worktrees) and refs (mostly shared). It makes sense that default behavior goes per-repo too instead of per-worktree. The flag will eventually be exposed as a rev-list argument with documents. For now it stays internal until the new behavior is fully implemented. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-02revision: add rev_input_given flagJeff King1-0/+7
Normally a caller that invokes setup_revisions() has to check rev.pending to see if anything was actually queued for the traversal. But they can't tell the difference between two cases: 1. The user gave us no tip from which to start a traversal. 2. The user tried to give us tips via --glob, --all, etc, but their patterns ended up being empty. Let's set a flag in the rev_info struct that callers can use to tell the difference. We can set this from the init_all_refs_cb() function. That's a little funny because it's not exactly about initializing the "cb" struct itself. But that function is the common setup place for doing pattern traversals that is used by --glob, --all, etc. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-22Merge branch 'sg/revision-parser-skip-prefix'Junio C Hamano1-2/+3
Code clean-up. * sg/revision-parser-skip-prefix: revision.c: use skip_prefix() in handle_revision_pseudo_opt() revision.c: use skip_prefix() in handle_revision_opt() revision.c: stricter parsing of '--early-output' revision.c: stricter parsing of '--no-{min,max}-parents' revision.h: turn rev_info.early_output back into an unsigned int
2017-06-12revision.h: turn rev_info.early_output back into an unsigned intSZEDER Gábor1-2/+3
rev_info.early_output started out as an unsigned int in cdcefbc97 (Add "--early-output" log flag for interactive GUI use, 2007-11-03), but later it was turned into a single bit in a bit field in cc243c3ce (show: --ignore-missing, 2011-05-18) without explanation, though the code using it still expects it to be a regular integer type and uses it as a counter. Consequently, any even number given via '--early-output=<N>', or indeed a plain '--early-output' defaulting to 100 effectively disabled the feature. Turn rev_info.early_output back into its origin unsigned int data type, making '--early-output' work again. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-29Merge branch 'bc/object-id'Junio C Hamano1-3/+3
Conversion from uchar[20] to struct object_id continues. * bc/object-id: (53 commits) object: convert parse_object* to take struct object_id tree: convert parse_tree_indirect to struct object_id sequencer: convert do_recursive_merge to struct object_id diff-lib: convert do_diff_cache to struct object_id builtin/ls-tree: convert to struct object_id merge: convert checkout_fast_forward to struct object_id sequencer: convert fast_forward_to to struct object_id builtin/ls-files: convert overlay_tree_on_cache to object_id builtin/read-tree: convert to struct object_id sha1_name: convert internals of peel_onion to object_id upload-pack: convert remaining parse_object callers to object_id revision: convert remaining parse_object callers to object_id revision: rename add_pending_sha1 to add_pending_oid http-push: convert process_ls_object and descendants to object_id refs/files-backend: convert many internals to struct object_id refs: convert struct ref_update to use struct object_id ref-filter: convert some static functions to struct object_id Convert struct ref_array_item to struct object_id Convert the verify_pack callback to struct object_id Convert lookup_tag to struct object_id ...
2017-05-08revision: rename add_pending_sha1 to add_pending_oidbrian m. carlson1-3/+3
Rename this function and convert it to take a pointer to struct object_id. This is a prerequisite for converting get_reference, which is needed to convert parse_object. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-27timestamp_t: a new data type for timestampsJohannes Schindelin1-2/+2
Git's source code assumes that unsigned long is at least as precise as time_t. Which is incorrect, and causes a lot of problems, in particular where unsigned long is only 32-bit (notably on Windows, even in 64-bit versions). So let's just use a more appropriate data type instead. In preparation for this, we introduce the new `timestamp_t` data type. By necessity, this is a very, very large patch, as it has to replace all timestamps' data type in one go. As we will use a data type that is not necessarily identical to `time_t`, we need to be very careful to use `time_t` whenever we interact with the system functions, and `timestamp_t` everywhere else. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-24Merge branch 'rs/path-name-safety-cleanup'Junio C Hamano1-2/+0
Code clean-up. * rs/path-name-safety-cleanup: revision: remove declaration of path_name()
2017-03-18revision: remove declaration of path_name()René Scharfe1-2/+0
The definition of path_name() was removed by 2824e1841 (list-objects: pass full pathname to callbacks); remove its declaration as well. Signed-off-by: Rene Scharfe <l.s.r@web.de> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-13Merge branch 'lt/pretty-expand-tabs'Junio C Hamano1-0/+2
When "git log" shows the log message indented by 4-spaces, the remainder of a line after a HT does not align in the way the author originally intended. The command now expands tabs by default in such a case, and allows the users to override it with a new option, '--no-expand-tabs'. * lt/pretty-expand-tabs: pretty: test --expand-tabs pretty: allow tweaking tabwidth in --expand-tabs pretty: enable --expand-tabs by default for selected pretty formats pretty: expand tabs in indented logs to make things line up properly
2016-03-30pretty: enable --expand-tabs by default for selected pretty formatsJunio C Hamano1-1/+2
"git log --pretty={medium,full,fuller}" and "git log" by default prepend 4 spaces to the log message, so it makes sense to enable the new "expand-tabs" facility by default for these formats. Add --no-expand-tabs option to override the new default. The change alone breaks a test in t4201 that runs "git shortlog" on the output from "git log", and expects that the output from "git log" does not do such a tab expansion. Adjust the test to explicitly disable expand-tabs with --no-expand-tabs. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-30pretty: expand tabs in indented logs to make things line up properlyLinus Torvalds1-0/+1
A commit log message sometimes tries to line things up using tabs, assuming fixed-width font with the standard 8-place tab settings. Viewing such a commit however does not work well in "git log", as we indent the lines by prefixing 4 spaces in front of them. This should all line up: Column 1 Column 2 -------- -------- A B ABCD EFGH SPACES Instead of Tabs Even with multi-byte UTF8 characters: Column 1 Column 2 -------- -------- Ä B åäö 100 A Møøse once bit my sister.. Tab-expand the lines in "git log --expand-tabs" output before prefixing 4 spaces. This is based on the patch by Linus Torvalds, but at this step, we require an explicit command line option to enable the behaviour. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-16list-objects: pass full pathname to callbacksJeff King1-2/+1
When we find a blob at "a/b/c", we currently pass this to our show_object_fn callbacks as two components: "a/b/" and "c". Callbacks which want the full value then call path_name(), which concatenates the two. But this is an inefficient interface; the path is a strbuf, and we could simply append "c" to it temporarily, then roll back the length, without creating a new copy. So we could improve this by teaching the callsites of path_name() this trick (and there are only 3). But we can also notice that no callback actually cares about the broken-down representation, and simply pass each callback the full path "a/b/c" as a string. The callback code becomes even simpler, then, as we do not have to worry about freeing an allocated buffer, nor rolling back our modification to the strbuf. This is theoretically less efficient, as some callbacks would not bother to format the final path component. But in practice this is not measurable. Since we use the same strbuf over and over, our work to grow it is amortized, and we really only pay to memcpy a few bytes. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-16list-objects: drop name_path entirelyJeff King1-6/+2
In the previous commit, we left name_path as a thin wrapper around a strbuf. This patch drops it entirely. As a result, every show_object_fn callback needs to be adjusted. However, none of their code needs to be changed at all, because the only use was to pass it to path_name(), which now handles the bare strbuf. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-16list-objects: convert name_path to a strbufJeff King1-3/+1
The "struct name_path" data is examined in only two places: we generate it in process_tree(), and we convert it to a single string in path_name(). Everyone else just passes it through to those functions. We can further note that process_tree() already keeps a single strbuf with the leading tree path, for use with tree_entry_interesting(). Instead of building a separate name_path linked list, let's just use the one we already build in "base". This reduces the amount of code (especially tricky code in path_name() which did not check for integer overflows caused by deep or large pathnames). It is also more efficient in some instances. Any time we were using tree_entry_interesting, we were building up the strbuf anyway, so this is an immediate and obvious win there. In cases where we were not, we trade off storing "pathname/" in a strbuf on the heap for each level of the path, instead of two pointers and an int on the stack (with one pointer into the tree object). On a 64-bit system, the latter is 20 bytes; so if path components are less than that on average, this has lower peak memory usage. In practice it probably doesn't matter either way; we are already holding in memory all of the tree objects leading up to each pathname, and for normal-depth pathnames, we are only talking about hundreds of bytes. This patch leaves "struct name_path" as a thin wrapper around the strbuf, to avoid disrupting callbacks. We should fix them, but leaving it out makes this diff easier to view. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-12list-objects: pass full pathname to callbacksJeff King1-2/+1
When we find a blob at "a/b/c", we currently pass this to our show_object_fn callbacks as two components: "a/b/" and "c". Callbacks which want the full value then call path_name(), which concatenates the two. But this is an inefficient interface; the path is a strbuf, and we could simply append "c" to it temporarily, then roll back the length, without creating a new copy. So we could improve this by teaching the callsites of path_name() this trick (and there are only 3). But we can also notice that no callback actually cares about the broken-down representation, and simply pass each callback the full path "a/b/c" as a string. The callback code becomes even simpler, then, as we do not have to worry about freeing an allocated buffer, nor rolling back our modification to the strbuf. This is theoretically less efficient, as some callbacks would not bother to format the final path component. But in practice this is not measurable. Since we use the same strbuf over and over, our work to grow it is amortized, and we really only pay to memcpy a few bytes. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-12list-objects: drop name_path entirelyJeff King1-6/+2
In the previous commit, we left name_path as a thin wrapper around a strbuf. This patch drops it entirely. As a result, every show_object_fn callback needs to be adjusted. However, none of their code needs to be changed at all, because the only use was to pass it to path_name(), which now handles the bare strbuf. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-12list-objects: convert name_path to a strbufJeff King1-3/+1
The "struct name_path" data is examined in only two places: we generate it in process_tree(), and we convert it to a single string in path_name(). Everyone else just passes it through to those functions. We can further note that process_tree() already keeps a single strbuf with the leading tree path, for use with tree_entry_interesting(). Instead of building a separate name_path linked list, let's just use the one we already build in "base". This reduces the amount of code (especially tricky code in path_name() which did not check for integer overflows caused by deep or large pathnames). It is also more efficient in some instances. Any time we were using tree_entry_interesting, we were building up the strbuf anyway, so this is an immediate and obvious win there. In cases where we were not, we trade off storing "pathname/" in a strbuf on the heap for each level of the path, instead of two pointers and an int on the stack (with one pointer into the tree object). On a 64-bit system, the latter is 20 bytes; so if path components are less than that on average, this has lower peak memory usage. In practice it probably doesn't matter either way; we are already holding in memory all of the tree objects leading up to each pathname, and for normal-depth pathnames, we are only talking about hundreds of bytes. This patch leaves "struct name_path" as a thin wrapper around the strbuf, to avoid disrupting callbacks. We should fix them, but leaving it out makes this diff easier to view. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-15format-patch: add an option to suppress commit hashbrian m. carlson1-0/+1
Oftentimes, patches created by git format-patch will be stored in version control or compared with diff. In these cases, two otherwise identical patches can have different commit hashes, leading to diff noise. Teach git format-patch a --zero-commit option that instead produces an all-zero hash to avoid this diff noise. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-29convert "enum date_mode" into a structJeff King1-1/+1
In preparation for adding date modes that may carry extra information beyond the mode itself, this patch converts the date_mode enum into a struct. Most of the conversion is fairly straightforward; we pass the struct as a pointer and dereference the type field where necessary. Locations that declare a date_mode can use a "{}" constructor. However, the tricky case is where we use the enum labels as constants, like: show_date(t, tz, DATE_NORMAL); Ideally we could say: show_date(t, tz, &{ DATE_NORMAL }); but of course C does not allow that. Likewise, we cannot cast the constant to a struct, because we need to pass an actual address. Our options are basically: 1. Manually add a "struct date_mode d = { DATE_NORMAL }" definition to each caller, and pass "&d". This makes the callers uglier, because they sometimes do not even have their own scope (e.g., they are inside a switch statement). 2. Provide a pre-made global "date_normal" struct that can be passed by address. We'd also need "date_rfc2822", "date_iso8601", and so forth. But at least the ugliness is defined in one place. 3. Provide a wrapper that generates the correct struct on the fly. The big downside is that we end up pointing to a single global, which makes our wrapper non-reentrant. But show_date is already not reentrant, so it does not matter. This patch implements 3, along with a minor macro to keep the size of the callers sane. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-02-11Merge branch 'jc/unused-symbols'Junio C Hamano1-8/+4
Mark file-local symbols as "static", and drop functions that nobody uses. * jc/unused-symbols: shallow.c: make check_shallow_file_for_update() static remote.c: make clear_cas_option() static urlmatch.c: make match_urls() static revision.c: make save_parents() and free_saved_parents() static line-log.c: make line_log_data_init() static pack-bitmap.c: make pack_bitmap_filename() static prompt.c: remove git_getpass() nobody uses http.c: make finish_active_slot() and handle_curl_result() static
2015-02-11Merge branch 'cj/log-invert-grep'Junio C Hamano1-0/+2
"git log --invert-grep --grep=WIP" will show only commits that do not have the string "WIP" in their messages. * cj/log-invert-grep: log: teach --invert-grep option
2015-01-15revision.c: make save_parents() and free_saved_parents() staticJunio C Hamano1-8/+4
No external callers exist. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-01-13log: teach --invert-grep optionChristoph Junghans1-0/+2
"git log --grep=<string>" shows only commits with messages that match the given string, but sometimes it is useful to be able to show only commits that do *not* have certain messages (e.g. "show me ones that are not FIXUP commits"). Originally, we had the invert-grep flag in grep_opt, but because "git grep --invert-grep" does not make sense except in conjunction with "--files-with-matches", which is already covered by "--files-without-matches", it was moved it to revisions structure. To have the flag there expresses the function to the feature better. When the newly inserted two tests run, the history would have commits with messages "initial", "second", "third", "fourth", "fifth", "sixth" and "Second", committed in this order. The commits that does not match either "th" or "Sec" is "second" and "initial". For the case insensitive case only "initial" matches. Signed-off-by: Christoph Junghans <ottxor@gentoo.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-29rev-list: add an option to mark fewer edges as uninterestingbrian m. carlson1-0/+1
In commit fbd4a70 (list-objects: mark more commits as edges in mark_edges_uninteresting - 2013-08-16), we marked an increasing number of edges uninteresting. This change, and the subsequent change to make this conditional on --objects-edge, are used by --thin to make much smaller packs for shallow clones. Unfortunately, they cause a significant performance regression when pushing non-shallow clones with lots of refs (23.322 seconds vs. 4.785 seconds with 22400 refs). Add an option to git rev-list, --objects-edge-aggressive, that preserves this more aggressive behavior, while leaving --objects-edge to provide more performant behavior. Preserve the current behavior for the moment by using the aggressive option. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19revision: remove definition of unused 'add_object' functionRamsay Jones1-5/+0
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19rev-list: add --indexed-objects optionJeff King1-0/+1
There is currently no easy way to ask the revision traversal machinery to include objects reachable from the index (e.g., blobs and trees that have not yet been committed). This patch adds an option to do so. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16reachable: reuse revision.c "add all reflogs" codeJeff King1-0/+1
We want to add all reflog entries as tips for finding reachable objects. The revision machinery can already do this (to support "rev-list --reflog"); we can reuse that code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-08Merge branch 'jk/pack-bitmap'Junio C Hamano1-1/+2
* jk/pack-bitmap: pack-objects: do not reuse packfiles without --delta-base-offset add `ignore_missing_links` mode to revwalk
2014-04-04add `ignore_missing_links` mode to revwalkVicent Marti1-1/+2
When pack-objects is computing the reachability bitmap to serve a fetch request, it can erroneously die() if some of the UNINTERESTING objects are not present. Upload-pack throws away HAVE lines from the client for objects we do not have, but we may have a tip object without all of its ancestors (e.g., if the tip is no longer reachable and was new enough to survive a `git prune`, but some of its reachable objects did get pruned). In the non-bitmap case, we do a revision walk with the HAVE objects marked as UNINTERESTING. The revision walker explicitly ignores errors in accessing UNINTERESTING commits to handle this case (and we do not bother looking at UNINTERESTING trees or blobs at all). When we have bitmaps, however, the process is quite different. The bitmap index for a pack-objects run is calculated in two separate steps: First, we perform an extensive walk from all the HAVEs to find the full set of objects reachable from them. This walk is usually optimized away because we are expected to hit an object with a bitmap during the traversal, which allows us to terminate early. Secondly, we perform an extensive walk from all the WANTs, which usually also terminates early because we hit a commit with an existing bitmap. Once we have the resulting bitmaps from the two walks, we AND-NOT them together to obtain the resulting set of objects we need to pack. When we are walking the HAVE objects, the revision walker does not know that we are walking it only to mark the results as uninteresting. We strip out the UNINTERESTING flag, because those objects _are_ interesting to us during the first walk. We want to keep going to get a complete set of reachable objects if we can. We need some way to tell the revision walker that it's OK to silently truncate the HAVE walk, just like it does for the UNINTERESTING case. This patch introduces a new `ignore_missing_links` flag to the `rev_info` struct, which we set only for the HAVE walk. It also adds tests to cover UNINTERESTING objects missing from several positions: a missing blob, a missing tree, and a missing parent commit. The missing blob already worked (as we do not care about its contents at all), but the other two cases caused us to die(). Note that there are a few cases we do not need to test: 1. We do not need to test a missing tree, with the blob still present. Without the tree that refers to it, we would not know that the blob is relevant to our walk. 2. We do not need to test a tip commit that is missing. Upload-pack omits these for us (and in fact, we complain even in the non-bitmap case if it fails to do so). Reported-by: Siddharth Agarwal <sid0@fb.com> Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-03Merge branch 'nd/log-show-linear-break'Junio C Hamano1-1/+10
Attempts to show where a single-strand-of-pearls break in "git log" output. * nd/log-show-linear-break: log: add --show-linear-break to help see non-linear history object.h: centralize object flag allocation
2014-03-25log: add --show-linear-break to help see non-linear historyNguyễn Thái Ngọc Duy1-1/+9
Option explanation is in rev-list-options.txt. The interaction with -z is left undecided. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-25object.h: centralize object flag allocationNguyễn Thái Ngọc Duy1-0/+1
While the field "flags" is mainly used by the revision walker, it is also used in many other places. Centralize the whole flag allocation to one place for a better overview (and easier to move flags if we have too). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-27Merge branch 'jk/pack-bitmap'Junio C Hamano1-0/+2
Borrow the bitmap index into packfiles from JGit to speed up enumeration of objects involved in a commit range without having to fully traverse the history. * jk/pack-bitmap: (26 commits) ewah: unconditionally ntohll ewah data ewah: support platforms that require aligned reads read-cache: use get_be32 instead of hand-rolled ntoh_l block-sha1: factor out get_be and put_be wrappers do not discard revindex when re-preparing packfiles pack-bitmap: implement optional name_hash cache t/perf: add tests for pack bitmaps t: add basic bitmap functionality tests count-objects: recognize .bitmap in garbage-checking repack: consider bitmaps when performing repacks repack: handle optional files created by pack-objects repack: turn exts array into array-of-struct repack: stop using magic number for ARRAY_SIZE(exts) pack-objects: implement bitmap writing rev-list: add bitmap mode to speed up object lists pack-objects: use bitmaps when packing objects pack-objects: split add_object_entry pack-bitmap: add support for bitmap indexes documentation: add documentation for the bitmap format ewah: compressed bitmap implementation ...
2013-12-05Merge branch 'jc/ref-excludes'Junio C Hamano1-0/+8
People often wished a way to tell "git log --branches" (and "git log --remotes --not --branches") to exclude some local branches from the expansion of "--branches" (similarly for "--tags", "--all" and "--glob=<pattern>"). Now they have one. * jc/ref-excludes: rev-parse: introduce --exclude=<glob> to tame wildcards rev-list --exclude: export add/clear-ref-exclusion and ref-excluded API rev-list --exclude: tests document --exclude option revision: introduce --exclude=<glob> to tame wildcards
2013-11-01rev-list --exclude: export add/clear-ref-exclusion and ref-excluded APIJunio C Hamano1-0/+5
... while updating their function signature. To be squashed into the initial patch to rev-list. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-31revision: add missing includeFelipe Contreras1-0/+1
Otherwise we might not have 'struct diff_options'. [jc: needs a matching follow-up patch to remove inclusion of diff.h from *.c files that do not themselves use anything from diff.h] Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-24revision: allow setting custom limiter functionVicent Marti1-0/+2
This commit enables users of `struct rev_info` to peform custom limiting during a revision walk (i.e. `get_revision`). If the field `include_check` has been set to a callback, this callback will be issued once for each commit before it is added to the "pending" list of the revwalk. If the include check returns 0, the commit will be marked as added but won't be pushed to the pending list, effectively limiting the walk. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-30revision: introduce --exclude=<glob> to tame wildcardsJunio C Hamano1-0/+3
People often find "git log --branches" etc. that includes _all_ branches is cumbersome to use when they want to grab most but except some. The same applies to --tags, --all and --glob. Teach the revision machinery to remember patterns, and then upon the next such a globbing option, exclude those that match the pattern. With this, I can view only my integration branches (e.g. maint, master, etc.) without topic branches, which are named after two letters from primary authors' names, slash and topic name. git rev-list --no-walk --exclude=??/* --branches | git name-rev --refs refs/heads/* --stdin This one shows things reachable from local and remote branches that have not been merged to the integration branches. git log --remotes --branches --not --exclude=??/* --branches It may be a bit rough around the edges, in that the pattern to give the exclude option depends on what globbing option follows. In these examples, the pattern "??/*" is used, not "refs/heads/??/*", because the globbing option that follows the -"-exclude=<pattern>" is "--branches". As each use of globbing option resets previously set "--exclude", this may not be such a bad thing, though. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-01log: use true parents for diff even when rewritingThomas Rast1-0/+20
When using pathspec filtering in combination with diff-based log output, parent simplification happens before the diff is computed. The diff is therefore against the *simplified* parents. This works okay, arguably by accident, in the normal case: simplification reduces to one parent as long as the commit is TREESAME to it. So the simplified parent of any given commit must have the same tree contents on the filtered paths as its true (unfiltered) parent. However, --full-diff breaks this guarantee, and indeed gives pretty spectacular results when comparing the output of git log --graph --stat ... git log --graph --full-diff --stat ... (--graph internally kicks in parent simplification, much like --parents). To fix it, store a copy of the parent list before simplification (in a slab) whenever --full-diff is in effect. Then use the stored parents instead of the simplified ones in the commit display code paths. The latter do not actually check for --full-diff to avoid duplicated code; they just grab the original parents if save_parents() has not been called for this revision walk. For ordinary commits it should be obvious that this is the right thing to do. Merge commits are a bit subtle. Observe that with default simplification, merge simplification is an all-or-nothing decision: either the merge is TREESAME to one parent and disappears, or it is different from all parents and the parent list remains intact. Redundant parents are not pruned, so the existing code also shows them as a merge. So if we do show a merge commit, the parent list just consists of the rewrite result on each parent. Running, e.g., --cc on this in --full-diff mode is not very useful: if any commits were skipped, some hunks will disagree with all sides of the merge (with one side, because commits were skipped; with the others, because they didn't have those changes in the first place). This triggers --cc showing these hunks spuriously. Therefore I believe that even for merge commits it is better to show the diffs wrt. the original parents. Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-03teach format-patch to place other authors into in-body "From"Jeff King1-0/+1
Format-patch generates emails with the "From" address set to the author of each patch. If you are going to send the emails, however, you would want to replace the author identity with yours (if they are not the same), and bump the author identity to an in-body header. Normally this is handled by git-send-email, which does the transformation before sending out the emails. However, some workflows may not use send-email (e.g., imap-send, or a custom script which feeds the mbox to a non-git MUA). They could each implement this feature themselves, but getting it right is non-trivial (one must canonicalize the identities by reversing any RFC2047 encoding or RFC822 quoting of the headers, which has caused many bugs in send-email over the years). This patch takes a different approach: it teaches format-patch a "--from" option which handles the ident check and in-body header while it is writing out the email. It's much simpler to do at this level (because we haven't done any quoting yet), and any workflow based on format-patch can easily turn it on. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-01Merge branch 'jc/topo-author-date-sort'Junio C Hamano1-1/+5
"git log" learned the "--author-date-order" option, with which the output is topologically sorted and commits in parallel histories are shown intermixed together based on the author timestamp. * jc/topo-author-date-sort: t6003: add --author-date-order test topology tests: teach a helper to set author dates as well t6003: add --date-order test topology tests: teach a helper to take abbreviated timestamps t/lib-t6000: style fixes log: --author-date-order sort-in-topological-order: use prio-queue prio-queue: priority queue of pointers to structs toposort: rename "lifo" field
2013-06-14Merge branch 'mh/reflife'Junio C Hamano1-11/+21
Define memory ownership and lifetime rules for what for-each-ref feeds to its callbacks (in short, "you do not own it, so make a copy if you want to keep it"). * mh/reflife: (25 commits) refs: document the lifetime of the args passed to each_ref_fn register_ref(): make a copy of the bad reference SHA-1 exclude_existing(): set existing_refs.strdup_strings string_list_add_refs_by_glob(): add a comment about memory management string_list_add_one_ref(): rename first parameter to "refname" show_head_ref(): rename first parameter to "refname" show_head_ref(): do not shadow name of argument add_existing(): do not retain a reference to sha1 do_fetch(): clean up existing_refs before exiting do_fetch(): reduce scope of peer_item object_array_entry: fix memory handling of the name field find_first_merges(): remove unnecessary code find_first_merges(): initialize merges variable using initializer fsck: don't put a void*-shaped peg in a char*-shaped hole object_array_remove_duplicates(): rewrite to reduce copying revision: use object_array_filter() in implementation of gc_boundary() object_array: add function object_array_filter() revision: split some overly-long lines cmd_diff(): make it obvious which cases are exclusive of each other cmd_diff(): rename local variable "list" -> "entry" ...
2013-06-14Merge branch 'kb/full-history-compute-treesame-carefully-2'Junio C Hamano1-1/+3
Major update to the revision traversal logic to improve culling of irrelevant parents while traversing a mergy history. * kb/full-history-compute-treesame-carefully-2: revision.c: make default history consider bottom commits revision.c: don't show all merges for --parents revision.c: discount side branches when computing TREESAME revision.c: add BOTTOM flag for commits simplify-merges: drop merge from irrelevant side branch simplify-merges: never remove all TREESAME parents t6012: update test for tweaked full-history traversal revision.c: Make --full-history consider more merges Documentation: avoid "uninteresting" rev-list-options.txt: correct TREESAME for P t6111: add parents to tests t6111: allow checking the parents as well t6111: new TREESAME test set t6019: test file dropped in -s ours merge decorate.c: compact table when growing
2013-06-11toposort: rename "lifo" fieldJunio C Hamano1-1/+5
The primary invariant of sort_in_topological_order() is that a parent commit is not emitted until all children of it are. When traversing a forked history like this with "git log C E": A----B----C \ D----E we ensure that A is emitted after all of B, C, D, and E are done, B has to wait until C is done, and D has to wait until E is done. In some applications, however, we would further want to control how these child commits B, C, D and E on two parallel ancestry chains are shown. Most of the time, we would want to see C and B emitted together, and then E and D, and finally A (i.e. the --topo-order output). The "lifo" parameter of the sort_in_topological_order() function is used to control this behaviour. We start the traversal by knowing two commits, C and E. While keeping in mind that we also need to inspect E later, we pick C first to inspect, and we notice and record that B needs to be inspected. By structuring the "work to be done" set as a LIFO stack, we ensure that B is inspected next, before other in-flight commits we had known that we will need to inspect, e.g. E. When showing in --date-order, we would want to see commits ordered by timestamps, i.e. show C, E, B and D in this order before showing A, possibly mixing commits from two parallel histories together. When "lifo" parameter is set to false, the function keeps the "work to be done" set sorted in the date order to realize this semantics. After inspecting C, we add B to the "work to be done" set, but the next commit we inspect from the set is E which is newer than B. The name "lifo", however, is too strongly tied to the way how the function implements its behaviour, and does not describe what the behaviour _means_. Replace this field with an enum rev_sort_order, with two possible values: REV_SORT_IN_GRAPH_ORDER and REV_SORT_BY_COMMIT_DATE, and update the existing code. The mechanical replacement rule is: "lifo == 0" is equivalent to "sort_order == REV_SORT_BY_COMMIT_DATE" "lifo == 1" is equivalent to "sort_order == REV_SORT_IN_GRAPH_ORDER" Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-02Merge branch 'tr/line-log'Junio C Hamano1-1/+15
* tr/line-log: git-log(1): remove --full-line-diff description line-log: fix documentation formatting log -L: improve comments in process_all_files() log -L: store the path instead of a diff_filespec log -L: test merge of parallel modify/rename t4211: pass -M to 'git log -M -L...' test log -L: fix overlapping input ranges log -L: check range set invariants when we look it up Speed up log -L... -M log -L: :pattern:file syntax to find by funcname Implement line-history search (git log -L) Export rewrite_parents() for 'log -L' Refactor parse_loc
2013-05-28revision: split some overly-long linesMichael Haggerty1-11/+21
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-16revision.c: add BOTTOM flag for commitsKevin Bracey1-1/+2
When performing edge-based operations on the revision graph, it can be useful to be able to identify the INTERESTING graph's connection(s) to the bottom commit(s) specified by the user. Conceptually when the user specifies "A..B" (== B ^A), they are asking for the history from A to B. The first connection from A onto the INTERESTING graph is part of that history, and should be considered. If we consider only INTERESTING nodes and their connections, then we're really only considering the history from A's immediate descendants to B. This patch does not change behaviour, but adds a new BOTTOM flag to indicate the bottom commits specified by the user, ready to be used by following patches. We immediately use the BOTTOM flag to return collect_bottom_commits() to its original approach of examining the pending commit list rather than the command line. This will ensure alignment of the definition of "bottom" with future patches. Signed-off-by: Kevin Bracey <kevin@bracey.fi> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-16revision.c: Make --full-history consider more mergesKevin Bracey1-0/+1
History simplification previously always treated merges as TREESAME if they were TREESAME to any parent. While this was consistent with the default behaviour, this could be extremely unhelpful when searching detailed history, and could not be overridden. For example, if a merge had ignored a change, as if by "-s ours", then: git log -m -p --full-history -Schange file would successfully locate "change"'s addition but would not locate the merge that resolved against it. Futher, simplify_merges could drop the actual parent that a commit was TREESAME to, leaving it as a normal commit marked TREESAME that isn't actually TREESAME to its remaining parent. Now redefine a commit's TREESAME flag to be true only if a commit is TREESAME to _all_ of its parents. This doesn't affect either the default simplify_history behaviour (because partially TREESAME merges are turned into normal commits), or full-history with parent rewriting (because all merges are output). But it does affect other modes. The clearest difference is that --full-history will show more merges - sufficient to ensure that -m -p --full-history log searches can really explain every change to the file, including those changes' ultimate fate in merges. Also modify simplify_merges to recalculate TREESAME after removing a parent. This is achieved by storing per-parent TREESAME flags on the initial scan, so the combined flag can be easily recomputed. This fixes some t6111 failures, but creates a couple of new ones - we are now showing some merges that don't need to be shown. Signed-off-by: Kevin Bracey <kevin@bracey.fi> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-16revision.c: treat A...B merge bases as if manually specifiedKevin Bracey1-0/+1
The documentation assures users that "A...B" is defined as "A B --not $(git merge-base --all A B)". This wasn't in fact quite true, because the calculated merge bases were not sent to add_rev_cmdline(). The main effect of this was that although git rev-list --ancestry-path A B --not $(git merge-base --all A B) worked, the simpler form git rev-list --ancestry-path A...B failed with a "no bottom commits" error. Other potential users of bottom commits could also be affected by this problem, if they examine revs->cmdline_info; I came across the issue in my proposed history traversal refinements series. So ensure that the calculated merge bases are sent to add_rev_cmdline(), flagged with new 'whence' enum value REV_CMD_MERGE_BASE. Signed-off-by: Kevin Bracey <kevin@bracey.fi> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-01Merge branch 'bc/append-signed-off-by'Junio C Hamano1-1/+1
Consolidate codepaths that inspect log-message-to-be and decide to add a new Signed-off-by line in various commands. * bc/append-signed-off-by: git-commit: populate the edit buffer with 2 blank lines before s-o-b Unify appending signoff in format-patch, commit and sequencer format-patch: update append_signoff prototype t4014: more tests about appending s-o-b lines sequencer.c: teach append_signoff to avoid adding a duplicate newline sequencer.c: teach append_signoff how to detect duplicate s-o-b sequencer.c: always separate "(cherry picked from" from commit body sequencer.c: require a conforming footer to be preceded by a blank line sequencer.c: recognize "(cherry picked from ..." as part of s-o-b footer t/t3511: add some tests of 'cherry-pick -s' functionality t/test-lib-functions.sh: allow to specify the tag name to test_commit commit, cherry-pick -s: remove broken support for multiline rfc2822 fields sequencer.c: rework search for start of footer to improve clarity
2013-03-28Implement line-history search (git log -L)Thomas Rast1-1/+5
This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-28Export rewrite_parents() for 'log -L'Bo Yang1-0/+10
The function rewrite_one is used to rewrite a single parent of the current commit, and is used by rewrite_parents to rewrite all the parents. Decouple the dependence between them by making rewrite_one a callback function that is passed to rewrite_parents. Then export rewrite_parents for reuse by the line history browser. We will use this function in line-log.c. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-12format-patch: update append_signoff prototypeNguyễn Thái Ngọc Duy1-1/+1
This is a preparation step for merging with append_signoff from sequencer.c Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Brandon Casey <bcasey@nvidia.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-20Merge branch 'ap/log-mailmap'Junio C Hamano1-0/+1
Teach commands in the "log" family to optionally pay attention to the mailmap. * ap/log-mailmap: log --use-mailmap: optimize for cases without --author/--committer search log: add log.mailmap configuration option log: grep author/committer using mailmap test: add test for --use-mailmap option log: add --use-mailmap option pretty: use mailmap to display username and email mailmap: add mailmap structure to rev_info and pp mailmap: simplify map_user() interface mailmap: remove email copy and length limitation Use split_ident_line to parse author and committer string-list: allow case-insensitive string list
2013-01-11Merge branch 'jc/format-patch-reroll'Junio C Hamano1-0/+1
Teach "format-patch" to prefix v4- to its output files for the fourth iteration of a patch series, to make it easier for the submitter to keep separate copies for iterations. * jc/format-patch-reroll: format-patch: give --reroll-count a short synonym -v format-patch: document and test --reroll-count format-patch: add --reroll-count=$N option get_patch_filename(): split into two functions get_patch_filename(): drop "just-numbers" hack get_patch_filename(): simplify function signature builtin/log.c: stop using global patch_suffix builtin/log.c: drop redundant "numbered_files" parameter from make_cover_letter() builtin/log.c: drop unused "numbered" parameter from make_cover_letter()
2013-01-10mailmap: add mailmap structure to rev_info and ppAntoine Pelisse1-0/+1
Pass a mailmap from rev_info to pretty_print_context to so that the pretty printer can use rewritten name and email address when showing commits. Signed-off-by: Antoine Pelisse <apelisse@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-22format-patch: add --reroll-count=$N optionJunio C Hamano1-0/+1
The --reroll-count=$N option, when given a positive integer: - Adds " v$N" to the subject prefix specified. As the default subject prefix string is "PATCH", --reroll-count=2 makes it "PATCH v2". - Prefixes "v$N-" to the names used for output files. The cover letter, whose name is usually 0000-cover-letter.patch, becomes v2-0000-cover-letter.patch when given --reroll-count=2. This allows users to use the same --output-directory for multiple iterations of the same series, without letting the output for a newer round overwrite output files from the earlier rounds. The user can incorporate materials from earlier rounds to update the newly minted iteration, and use "send-email v2-*.patch" to send out the patches belonging to the second iteration easily. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-17format-patch --notes: show notes after three-dashesJunio C Hamano1-0/+1
When inserting the note after the commit log message to format-patch output, add three dashes before the note. Record the fact that we did so in the rev_info and omit showing duplicated three dashes in the usual codepath that is used when notes are not being shown. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-10Merge branch 'mz/cherry-pick-cmdline-order'Junio C Hamano1-1/+5
"git cherry-pick A C B" used to replay changes in A and then B and then C if these three commits had committer timestamps in that order, which is not what the user who said "A C B" naturally expects. * mz/cherry-pick-cmdline-order: cherry-pick/revert: respect order of revisions to pick demonstrate broken 'git cherry-pick three one two' teach log --no-walk=unsorted, which avoids sorting
2012-08-30teach log --no-walk=unsorted, which avoids sortingMartin von Zweigbergk1-1/+5
When 'git log' is passed the --no-walk option, no revision walk takes place, naturally. Perhaps somewhat surprisingly, however, the provided revisions still get sorted by commit date. So e.g 'git log --no-walk HEAD HEAD~1' and 'git log --no-walk HEAD~1 HEAD' give the same result (unless the two revisions share the commit date, in which case they will retain the order given on the command line). As the commit that introduced --no-walk (8e64006 (Teach revision machinery about --no-walk, 2007-07-24)) points out, the sorting is intentional, to allow things like git log --abbrev-commit --pretty=oneline --decorate --all --no-walk to show all refs in order by commit date. But there are also other cases where the sorting is not wanted, such as <command producing revisions in order> | git log --oneline --no-walk --stdin To accomodate both cases, leave the decision of whether or not to sort up to the caller, by allowing --no-walk={sorted,unsorted}, defaulting to 'sorted' for backward-compatibility reasons. Signed-off-by: Martin von Zweigbergk <martinvonz@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-22Merge branch 'jc/sha1-name-more'Junio C Hamano1-1/+4
Teaches the object name parser things like a "git describe" output is always a commit object, "A" in "git log A" must be a committish, and "A" and "B" in "git log A...B" both must be committish, etc., to prolong the lifetime of abbreviated object names. * jc/sha1-name-more: (27 commits) t1512: match the "other" object names t1512: ignore whitespaces in wc -l output rev-parse --disambiguate=<prefix> rev-parse: A and B in "rev-parse A..B" refer to committish reset: the command takes committish commit-tree: the command wants a tree and commits apply: --build-fake-ancestor expects blobs sha1_name.c: add support for disambiguating other types revision.c: the "log" family, except for "show", takes committish revision.c: allow handle_revision_arg() to take other flags sha1_name.c: introduce get_sha1_committish() sha1_name.c: teach lookup context to get_sha1_with_context() sha1_name.c: many short names can only be committish sha1_name.c: get_sha1_1() takes lookup flags sha1_name.c: get_describe_name() by definition groks only commits sha1_name.c: teach get_short_sha1() a commit-only option sha1_name.c: allow get_short_sha1() to take other flags get_sha1(): fix error status regression sha1_name.c: restructure disambiguation of short names sha1_name.c: correct misnamed "canonical" and "res" ...
2012-07-09revision.c: the "log" family, except for "show", takes committishJunio C Hamano1-0/+2
Add a field to setup_revision_opt structure and allow these callers to tell the setup_revisions command parsing machinery that short SHA1 it encounters are meant to name committish. This step does not go all the way to connect the setup_revisions() to sha1_name.c yet. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-09revision.c: allow handle_revision_arg() to take other flagsJunio C Hamano1-1/+2
The existing "cant_be_filename" that tells the function that the caller knows the arg is not a path (hence it does not have to be checked for absense of the file whose name matches it) is made into a bit in the flag word. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-27Merge branch 'cb/cherry-pick-rev-path-confusion'Junio C Hamano1-0/+1
The command line parser choked "git cherry-pick $name" when $name can be both revision name and a pathname, even though $name can never be a path in the context of the command. The issue the patch addresses is real, but the way it is implemented felt unnecessarily invasive a bit. It may be cleaner for this caller to add the "--" to the end of the argv_array it passes to setup_revisions(). By Clemens Buchacher * cb/cherry-pick-rev-path-confusion: cherry-pick: do not expect file arguments
2012-04-15cherry-pick: do not expect file argumentsClemens Buchacher1-0/+1
If a commit-ish passed to cherry-pick or revert happens to have a file of the same name, git complains that the argument is ambiguous and advises to use '--'. To make things worse, the '--' argument is removed by parse_options, und so passing '--' has no effect. Instead, always interpret cherry-pick/revert arguments as revisions. Signed-off-by: Clemens Buchacher <drizzd@aon.at> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-30Teach revision walking machinery to walk multiple times sequenciallyHeiko Voigt1-0/+1
Previously it was not possible to iterate revisions twice using the revision walking api. We add a reset_revision_walk() which clears the used flags. This allows us to do multiple sequencial revision walks. We add the appropriate calls to the existing submodule machinery doing revision walks. This is done to avoid surprises if future code wants to call these functions more than once during the processes lifetime. Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-12log: --show-signatureJunio C Hamano1-0/+1
This teaches the "log" family of commands to pass the GPG signature in the commit objects to "gpg --verify" via the verify_signed_buffer() interface used to verify signed tag objects. E.g. $ git show --show-signature -s HEAD shows GPG output in the header part of the output. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-13Merge branch 'rs/pending'Junio C Hamano1-0/+2
* rs/pending: commit: factor out clear_commit_marks_for_object_array checkout: use leak_pending flag bundle: use leak_pending flag bisect: use leak_pending flag revision: add leak_pending flag checkout: use add_pending_{object,sha1} in orphan check revision: factor out add_pending_sha1 checkout: check for "Previous HEAD" notice in t2020 Conflicts: builtin/checkout.c revision.c
2011-10-05Merge branch 'jc/fetch-verify'Junio C Hamano1-0/+1
* jc/fetch-verify: fetch: verify we have everything we need before updating our ref rev-list --verify-object list-objects: pass callback data to show_objects()
2011-10-05Merge branch 'jc/traverse-commit-list'Junio C Hamano1-0/+2
* jc/traverse-commit-list: revision.c: update show_object_with_name() without using malloc() revision.c: add show_object_with_name() helper function rev-list: fix finish_object() call
2011-10-05Merge branch 'bk/ancestry-path'Junio C Hamano1-0/+20
* bk/ancestry-path: t6019: avoid refname collision on case-insensitive systems revision: do not include sibling history in --ancestry-path output revision: keep track of the end-user input from the command line rev-list: Demonstrate breakage with --ancestry-path --all
2011-10-03revision: add leak_pending flagRené Scharfe1-0/+1
The new flag leak_pending in struct rev_info can be used to prevent prepare_revision_walk from freeing the list of pending objects. It will still forget about them, so it really is leaked. This behaviour may look weird at first, but it can be useful if the pointer to the list is saved before calling prepare_revision_walk. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-03revision: factor out add_pending_sha1René Scharfe1-0/+1
This function is a combination of the static get_reference and add_pending_object. It can be used to easily queue objects by hash. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-09-01rev-list --verify-objectJunio C Hamano1-0/+1
Often we want to verify everything reachable from a given set of commits are present in our repository and connected without a gap to the tips of our refs. We used to do this for this purpose: $ rev-list --objects $commits_to_be_tested --not --all Even though this is good enough for catching missing commits and trees, we show the object name but do not verify their existence, let alone their well-formedness, for the blob objects at the leaf level. Add a new "--verify-object" option so that we can catch missing and broken blobs as well. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-25revision: keep track of the end-user input from the command lineJunio C Hamano1-0/+20
Given a complex set of revision specifiers on the command line, it is too late to look at the flags of the objects in the initial traversal list at the beginning of limit_list() in order to determine what the objects the end-user explicitly listed on the command line were. The process to move objects from the pending array to the traversal list may have marked objects that are not mentioned as UNINTERESTING, when handle_commit() marked the parents of UNINTERESTING commits mentioned on the command line by calling mark_parents_uninteresting(). This made "rev-list --ancestry-path ^A ..." to mistakenly list commits that are descendants of A's parents but that are not descendants of A itself, as ^A from the command line causes A and its parents marked as UNINTERESTING before coming to limit_list(), and we try to enumerate the commits that are descendants of these commits that are UNINTERESTING before we start walking the history. It actually is too late even if we inspected the pending object array before calling prepare_revision_walk(), as some of the same objects might have been mentioned twice, once as positive and another time as negative. The "rev-list --some-option A --not --all" command may want to notice, even if the resulting set is empty, that the user showed some interest in "A" and do something special about it. Prepare a separate array to keep track of what syntactic element was used to cause each object to appear in the pending array from the command line, and populate it as setup_revisions() parses the command line. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-22revision.c: add show_object_with_name() helper functionJunio C Hamano1-0/+2
There are two copies of traverse_commit_list callback that show the object name followed by pathname the object was found, to produce output similar to "rev-list --objects". Unify them. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-31Merge branch 'jk/format-patch-am'Junio C Hamano1-1/+2
* jk/format-patch-am: format-patch: preserve subject newlines with -k clean up calling conventions for pretty.c functions pretty: add pp_commit_easy function for simple callers mailinfo: always clean up rfc822 header folding t: test subject handling in format-patch / am pipeline Conflicts: builtin/branch.c builtin/log.c commit.h