aboutsummaryrefslogtreecommitdiffstats
path: root/convert.c
AgeCommit message (Collapse)AuthorFilesLines
2024-02-12use xstrncmpz()René Scharfe1-1/+1
Add and apply a semantic patch for calling xstrncmpz() to compare a NUL-terminated string with a buffer of a known length instead of using strncmp() and checking the terminating NUL explicitly. This simplifies callers by reducing code duplication. I had to adjust remote.c manually because Coccinelle inexplicably changed the indent of the else branches. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-17Merge branch 'cw/compat-util-header-cleanup'Junio C Hamano1-1/+0
Further shuffling of declarations across header files to streamline file dependencies. * cw/compat-util-header-cleanup: git-compat-util: move alloc macros to git-compat-util.h treewide: remove unnecessary includes for wrapper.h kwset: move translation table from ctype sane-ctype.h: create header for sane-ctype macros git-compat-util: move wrapper.c funcs to its header git-compat-util: move strbuf.c funcs to its header
2023-07-06Merge branch 'gc/config-context'Junio C Hamano1-1/+3
Reduce reliance on a global state in the config reading API. * gc/config-context: config: pass source to config_parser_event_fn_t config: add kvi.path, use it to evaluate includes config.c: remove config_reader from configsets config: pass kvi to die_bad_number() trace2: plumb config kvi config.c: pass ctx with CLI config config: pass ctx with config files config.c: pass ctx in configsets config: add ctx arg to config_fn_t urlmatch.h: use config_fn_t type config: inline git_color_default_config
2023-07-06Merge branch 'rs/strbuf-expand-step'Junio C Hamano1-12/+10
Code clean-up around strbuf_expand() API. * rs/strbuf-expand-step: strbuf: simplify strbuf_expand_literal_cb() replace strbuf_expand() with strbuf_expand_step() replace strbuf_expand_dict_cb() with strbuf_expand_step() strbuf: factor out strbuf_expand_step() pretty: factor out expand_separator()
2023-07-05treewide: remove unnecessary includes for wrapper.hCalvin Wan1-1/+0
Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-28config: add ctx arg to config_fn_tGlen Choo1-1/+3
Add a new "const struct config_context *ctx" arg to config_fn_t to hold additional information about the config iteration operation. config_context has a "struct key_value_info kvi" member that holds metadata about the config source being read (e.g. what kind of config source it is, the filename, etc). In this series, we're only interested in .kvi, so we could have just used "struct key_value_info" as an arg, but config_context makes it possible to add/adjust members in the future without changing the config_fn_t signature. We could also consider other ways of organizing the args (e.g. moving the config name and value into config_context or key_value_info), but in my experiments, the incremental benefit doesn't justify the added complexity (e.g. a config_fn_t will sometimes invoke another config_fn_t but with a different config value). In subsequent commits, the .kvi member will replace the global "struct config_reader" in config.c, making config iteration a global-free operation. It requires much more work for the machinery to provide meaningful values of .kvi, so for now, merely change the signature and call sites, pass NULL as a placeholder value, and don't rely on the arg in any meaningful way. Most of the changes are performed by contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every config_fn_t: - Modifies the signature to accept "const struct config_context *ctx" - Passes "ctx" to any inner config_fn_t, if needed - Adds UNUSED attributes to "ctx", if needed Most config_fn_t instances are easily identified by seeing if they are called by the various config functions. Most of the remaining ones are manually named in the .cocci patch. Manual cleanups are still needed, but the majority of it is trivial; it's either adjusting config_fn_t that the .cocci patch didn't catch, or adding forward declarations of "struct config_context ctx" to make the signatures make sense. The non-trivial changes are in cases where we are invoking a config_fn_t outside of config machinery, and we now need to decide what value of "ctx" to pass. These cases are: - trace2/tr2_cfg.c:tr2_cfg_set_fl() This is indirectly called by git_config_set() so that the trace2 machinery can notice the new config values and update its settings using the tr2 config parsing function, i.e. tr2_cfg_cb(). - builtin/checkout.c:checkout_main() This calls git_xmerge_config() as a shorthand for parsing a CLI arg. This might be worth refactoring away in the future, since git_xmerge_config() can call git_default_config(), which can do much more than just parsing. Handle them by creating a KVI_INIT macro that initializes "struct key_value_info" to a reasonable default, and use that to construct the "ctx" arg. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21object-store-ll.h: split this header out of object-store.hElijah Newren1-1/+1
The vast majority of files including object-store.h did not need dir.h nor khash.h. Split the header into two files, and let most just depend upon object-store-ll.h, while letting the two callers that need it depend on the full object-store.h. After this patch: $ git grep -h include..object-store | sort | uniq -c 2 #include "object-store.h" 129 #include "object-store-ll.h" Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21merge-ll: rename from ll-mergeElijah Newren1-1/+1
A long term (but rather minor) pet-peeve of mine was the name ll-merge.[ch]. I thought it made it harder to realize what stuff was related to merging when I was working on the merge machinery and trying to improve it. Further, back in d1cbe1e6d8a ("hash-ll.h: split out of hash.h to remove dependency on repository.h", 2023-04-22), we have split the portions of hash.h that do not depend upon repository.h into a "hash-ll.h" (due to the recommendation to use "ll" for "low-level" in its name[1], but which I used as a suffix precisely because of my distaste for "ll-merge"). When we discussed adding additional "*-ll.h" files, a request was made that we use "ll" consistently as either a prefix or a suffix. Since it is already in use as both a prefix and a suffix, the only way to do so is to rename some files. Besides my distaste for the ll-merge.[ch] name, let me also note that the files ll-fsmonitor.h, ll-hash.h, ll-merge.h, ll-object-store.h, ll-read-cache.h would have essentially nothing to do with each other and make no sense to group. But giving them the common "ll-" prefix would group them. Using "-ll" as a suffix thus seems just much more logical to me. Rename ll-merge.[ch] to merge-ll.[ch] to achieve this consistency, and to ensure we get a more logical grouping of files. [1] https://lore.kernel.org/git/kl6lsfcu1g8w.fsf@chooglen-macbookpro.roam.corp.google.com/ Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21cache.h: remove this no-longer-used headerElijah Newren1-1/+1
Since this header showed up in some places besides just #include statements, update/clean-up/remove those other places as well. Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got away with violating the rule that all files must start with an include of git-compat-util.h (or a short-list of alternate headers that happen to include it first). This change exposed the violation and caused it to stop building correctly; fix it by having it include git-compat-util.h first, as per policy. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21read-cache*.h: move declarations for read-cache.c functions from cache.hElijah Newren1-0/+1
For the functions defined in read-cache.c, move their declarations from cache.h to a new header, read-cache-ll.h. Also move some related inline functions from cache.h to read-cache.h. The purpose of the read-cache-ll.h/read-cache.h split is that about 70% of the sites don't need the inline functions and the extra headers they include. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-18replace strbuf_expand_dict_cb() with strbuf_expand_step()René Scharfe1-12/+10
Avoid the overhead of setting up a dictionary and passing it via strbuf_expand() to strbuf_expand_dict_cb() by using strbuf_expand_step() in a loop instead. It requires explicit handling of %% and unrecognized placeholders, but is more direct and simpler overall, and expands only on demand. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-17Merge branch 'jc/attr-source-tree'Junio C Hamano1-1/+1
"git --attr-source=<tree> cmd $args" is a new way to have any command to read attributes not from the working tree but from the given tree object. * jc/attr-source-tree: attr: teach "--attr-source=<tree>" global option to "git"
2023-05-06attr: teach "--attr-source=<tree>" global option to "git"John Cai1-1/+1
Earlier, 47cfc9bd (attr: add flag `--source` to work with tree-ish, 2023-01-14) taught "git check-attr" the "--source=<tree>" option to allow it to read attribute files from a tree-ish, but did so only for the command. Just like "check-attr" users wanted a way to use attributes from a tree-ish and not from the working tree files, users of other commands (like "git diff") would benefit from the same. Undo most of the UI change the commit made, while keeping the internal logic to read attributes from a given tree-ish. Expose the internal logic via a new "--attr-source=<tree>" command line option given to "git", so that it can be used with any git command that runs as part of the main git process. Additionally, add an environment variable GIT_ATTR_SOURCE that is set when --attr-source is passed in, so that subprocesses use the same value for the attributes source tree. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24copy.h: move declarations for copy.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: be explicit about dependence on convert.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: be explicit about dependence on advice.hElijah Newren1-0/+1
Dozens of files made use of advice functions, without explicitly including advice.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include advice.h if they are using it. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: be explicit about dependence on trace.h & trace2.hElijah Newren1-0/+1
Dozens of files made use of trace and trace2 functions, without explicitly including trace.h or trace2.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include trace.h or trace2.h if they are using them. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21wrapper.h: move declarations for wrapper.c functions from cache.hElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: be explicit about dependence on gettext.hElijah Newren1-0/+1
Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23cache.h: remove dependence on hex.h; make other files include it explicitlyElijah Newren1-0/+1
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-14attr: add flag `--source` to work with tree-ishKarthik Nayak1-1/+1
The contents of the .gitattributes files may evolve over time, but "git check-attr" always checks attributes against them in the working tree and/or in the index. It may be beneficial to optionally allow the users to check attributes taken from a commit other than HEAD against paths. Add a new flag `--source` which will allow users to check the attributes against a commit (actually any tree-ish would do). When the user uses this flag, we go through the stack of .gitattributes files but instead of checking the current working tree and/or in the index, we check the blobs from the provided tree-ish object. This allows the command to also be used in bare repositories. Since we use a tree-ish object, the user can pass "--source HEAD:subdirectory" and all the attributes will be looked up as if subdirectory was the root directory of the repository. We cannot simply use the `<rev>:<path>` syntax without the `--source` flag, similar to how it is used in `git show` because any non-flag parameter before `--` is treated as an attribute and any parameter after `--` is treated as a pathname. The change involves creating a new function `read_attr_from_blob`, which given the path reads the blob for the path against the provided source and parses the attributes line by line. This function is plugged into `read_attr()` function wherein we go through the stack of attributes files. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Toon Claes <toon@iotcl.com> Co-authored-by: toon@iotcl.com Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-17convert: mark unused parameter in null stream filterJeff King1-2/+2
The null stream filter unsurprisingly does not look at its "filter" argument, since it just eats bytes. But we can't drop it, since it has to conform to the same virtual interface that real filters do. Mark the unused parameter to appease -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01git-compat-util.h: use "UNUSED", not "UNUSED(var)"Ævar Arnfjörð Bjarmason1-2/+2
As reported in [1] the "UNUSED(var)" macro introduced in 2174b8c75de (Merge branch 'jk/unused-annotation' into next, 2022-08-24) breaks coccinelle's parsing of our sources in files where it occurs. Let's instead partially go with the approach suggested in [2] of making this not take an argument. As noted in [1] "coccinelle" will ignore such tokens in argument lists that it doesn't know about, and it's less of a surprise to syntax highlighters. This undoes the "help us notice when a parameter marked as unused is actually use" part of 9b240347543 (git-compat-util: add UNUSED macro, 2022-08-19), a subsequent commit will further tweak the macro to implement a replacement for that functionality. 1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/ 2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19run-command: mark unused async callback parametersJeff King1-1/+1
The start_async(), etc, functions need a "proc" callback that conforms to a particular interface. Not every callback needs every parameter (e.g., the caller might not even ask to open an input descriptor, in which case there is no point in the callback looking at it). Let's mark these for -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19config: mark unused callback parametersJeff King1-1/+1
The callback passed to git_config() must conform to a particular interface. But most callbacks don't actually look at the extra "void *data" parameter. Let's mark the unused parameters to make -Wunused-parameter happy. Note there's one unusual case here in get_remote_default() where we actually ignore the "value" parameter. That's because it's only checking whether the option is found at all, and not parsing its value. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-08convert: clarify line ending conversion warningAlex Henrie1-6/+6
The warning about converting line endings is extremely confusing. Its two sentences each use the word "will" without specifying a timeframe, which makes it sound like both sentences are referring to the same timeframe. On top of that, it uses the term "original line endings" without saying whether "original" means LF or CRLF. Rephrase the warning to be clear about when the line endings will be changed and what they will be changed to. On a platform whose native line endings are not CRLF (e.g. Linux), the "git add" step in the following sequence triggers the warning in question: $ git config core.autocrlf true $ echo 'Hello world!' >hello.txt $ git add hello.txt warning: LF will be replaced by CRLF in hello.txt The file will have its original line endings in your working directory Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-16Merge branch 'ab/object-file-api-updates'Junio C Hamano1-1/+1
Object-file API shuffling. * ab/object-file-api-updates: object-file API: pass an enum to read_object_with_reference() object-file.c: add a literal version of write_object_file_prepare() object-file API: have hash_object_file() take "enum object_type" object API: rename hash_object_file_literally() to write_*() object-file API: split up and simplify check_object_signature() object API users + docs: check <0, not !0 with check_object_signature() object API docs: move check_object_signature() docs to cache.h object API: correct "buf" v.s. "map" mismatch in *.c and *.h object-file API: have write_object_file() take "enum object_type" object-file API: add a format_object_header() function object-file API: return "void", not "int" from hash_object_file() object-file.c: split up declaration of unrelated variables
2022-02-25object-file API: have hash_object_file() take "enum object_type"Ævar Arnfjörð Bjarmason1-1/+1
Change the hash_object_file() function to take an "enum object_type". Since a preceding commit all of its callers are passing either "{commit,tree,blob,tag}_type", or the result of a call to type_name(), the parse_object() caller that would pass NULL is now using stream_object_signature(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-24convert.c: use designated initializers for "struct stream_filter*"Ævar Arnfjörð Bjarmason1-9/+9
Change the "struct stream_filter_vtbl" and "struct stream_filter" assignments in convert.c to use designated initializers. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-29Merge branch 'mc/clean-smudge-with-llp64'Junio C Hamano1-1/+1
The clean/smudge conversion code path has been prepared to better work on platforms where ulong is narrower than size_t. * mc/clean-smudge-with-llp64: clean/smudge: allow clean filters to process extremely large files odb: guard against data loss checking out a huge file git-compat-util: introduce more size_t helpers odb: teach read_blob_entry to use size_t t1051: introduce a smudge filter test for extremely large files test-lib: add prerequisite for 64-bit platforms test-tool genzeros: generate large amounts of data more efficiently test-genzeros: allow more than 2G zeros in Windows
2021-11-03clean/smudge: allow clean filters to process extremely large filesMatt Cooper1-1/+1
The filter system allows for alterations to file contents when they're moved between the database and the worktree. We already made sure that it is possible for smudge filters to produce contents that are larger than `unsigned long` can represent (which matters on systems where `unsigned long` is narrower than `size_t`, most notably 64-bit Windows). Now we make sure that clean filters can _consume_ contents that are larger than that. Note that this commit only allows clean filters' _input_ to be larger than can be represented by `unsigned long`. This change makes only a very minute dent into the much larger project to teach Git to use `size_t` instead of `unsigned long` wherever appropriate. Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Matt Cooper <vtbassmatt@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-26convert: release strbuf to avoid leakAndrzej Hunt1-0/+2
apply_multi_file_filter and async_query_available_blobs both query subprocess output using subprocess_read_status, which writes data into the identically named filter_status strbuf. We add a strbuf_release to avoid leaking their contents. Leak output seen when running t0021 with LSAN: Direct leak of 24 byte(s) in 1 object(s) allocated from: #0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3 #1 0xa8c2b5 in xrealloc wrapper.c:126:8 #2 0x9ff99d in strbuf_grow strbuf.c:98:2 #3 0x9ff99d in strbuf_addbuf strbuf.c:304:2 #4 0xa101d6 in subprocess_read_status sub-process.c:45:5 #5 0x77793c in apply_multi_file_filter convert.c:886:8 #6 0x77793c in apply_filter convert.c:1042:10 #7 0x77a0b5 in convert_to_git_filter_fd convert.c:1492:7 #8 0x8b48cd in index_stream_convert_blob object-file.c:2156:2 #9 0x8b48cd in index_fd object-file.c:2248:9 #10 0x597411 in hash_fd builtin/hash-object.c:43:9 #11 0x596be1 in hash_object builtin/hash-object.c:59:2 #12 0x596be1 in cmd_hash_object builtin/hash-object.c:153:3 #13 0x4ce83e in run_builtin git.c:475:11 #14 0x4ccafe in handle_builtin git.c:729:3 #15 0x4cb01c in run_argv git.c:818:4 #16 0x4cb01c in cmd_main git.c:949:19 #17 0x6bdc2d in main common-main.c:52:11 #18 0x7f42acf79349 in __libc_start_main (/lib64/libc.so.6+0x24349) SUMMARY: AddressSanitizer: 24 byte(s) leaked in 1 allocation(s). Direct leak of 120 byte(s) in 5 object(s) allocated from: #0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3 #1 0xa8c295 in xrealloc wrapper.c:126:8 #2 0x9ff97d in strbuf_grow strbuf.c:98:2 #3 0x9ff97d in strbuf_addbuf strbuf.c:304:2 #4 0xa101b6 in subprocess_read_status sub-process.c:45:5 #5 0x775c73 in async_query_available_blobs convert.c:960:8 #6 0x80029d in finish_delayed_checkout entry.c:183:9 #7 0xa65d1e in check_updates unpack-trees.c:493:10 #8 0xa5f469 in unpack_trees unpack-trees.c:1747:8 #9 0x525971 in checkout builtin/clone.c:815:6 #10 0x525971 in cmd_clone builtin/clone.c:1409:8 #11 0x4ce83e in run_builtin git.c:475:11 #12 0x4ccafe in handle_builtin git.c:729:3 #13 0x4cb01c in run_argv git.c:818:4 #14 0x4cb01c in cmd_main git.c:949:19 #15 0x6bdc2d in main common-main.c:52:11 #16 0x7fa253fce349 in __libc_start_main (/lib64/libc.so.6+0x24349) SUMMARY: AddressSanitizer: 120 byte(s) leaked in 5 allocation(s). Signed-off-by: Andrzej Hunt <andrzej@ahunt.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-30Merge branch 'ds/sparse-index-protections'Junio C Hamano1-10/+10
Builds on top of the sparse-index infrastructure to mark operations that are not ready to mark with the sparse index, causing them to fall back on fully-populated index that they always have worked with. * ds/sparse-index-protections: (47 commits) name-hash: use expand_to_path() sparse-index: expand_to_path() name-hash: don't add directories to name_hash revision: ensure full index resolve-undo: ensure full index read-cache: ensure full index pathspec: ensure full index merge-recursive: ensure full index entry: ensure full index dir: ensure full index update-index: ensure full index stash: ensure full index rm: ensure full index merge-index: ensure full index ls-files: ensure full index grep: ensure full index fsck: ensure full index difftool: ensure full index commit: ensure full index checkout: ensure full index ...
2021-04-14*: remove 'const' qualifier for struct index_stateDerrick Stolee1-13/+13
Several methods specify that they take a 'struct index_state' pointer with the 'const' qualifier because they intend to only query the data, not change it. However, we will be introducing a step very low in the method stack that might modify a sparse-index to become a full index in the case that our queries venture inside a sparse-directory entry. This change only removes the 'const' qualifiers that are necessary for the following change which will actually modify the implementation of index_name_stage_pos(). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-02Merge branch 'jh/simple-ipc'Junio C Hamano1-3/+8
A simple IPC interface gets introduced to build services like fsmonitor on top. * jh/simple-ipc: t0052: add simple-ipc tests and t/helper/test-simple-ipc tool simple-ipc: add Unix domain socket implementation unix-stream-server: create unix domain socket under lock unix-socket: disallow chdir() when creating unix domain sockets unix-socket: add backlog size option to unix_stream_listen() unix-socket: eliminate static unix_stream_socket() helper function simple-ipc: add win32 implementation simple-ipc: design documentation for new IPC mechanism pkt-line: add options argument to read_packetized_to_strbuf() pkt-line: add PACKET_READ_GENTLE_ON_READ_ERROR option pkt-line: do not issue flush packets in write_packetized_*() pkt-line: eliminate the need for static buffer in packet_write_gently()
2021-04-02Merge branch 'mt/parallel-checkout-part-1'Junio C Hamano1-70/+73
Preparatory API changes for parallel checkout. * mt/parallel-checkout-part-1: entry: add checkout_entry_ca() taking preloaded conv_attrs entry: move conv_attrs lookup up to checkout_entry() entry: extract update_ce_after_write() from write_entry() entry: make fstat_output() and read_blob_entry() public entry: extract a header file for entry.c functions convert: add classification for conv_attrs struct convert: add get_stream_filter_ca() variant convert: add [async_]convert_to_working_tree_ca() variants convert: make convert_attrs() and convert structs public
2021-03-22Merge branch 'mt/cleanly-die-upon-missing-required-filter'Junio C Hamano1-1/+0
We had a code to diagnose and die cleanly when a required clean/smudge filter is missing, but an assert before that unnecessarily fired, hiding the end-user facing die() message. * mt/cleanly-die-upon-missing-required-filter: convert: fail gracefully upon missing clean cmd on required filter
2021-03-18convert: add classification for conv_attrs structJeff Hostetler1-7/+19
Create `enum conv_attrs_classification` to express the different ways that attributes are handled for a blob during checkout. This will be used in a later commit when deciding whether to add a file to the parallel or delayed queue during checkout. For now, we can also use it in get_stream_filter_ca() to simplify the function (as the classifying logic is the same). Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-18convert: add get_stream_filter_ca() variantJeff Hostetler1-11/+17
Like the previous patch, we will also need to call get_stream_filter() with a precomputed `struct conv_attrs`, when we add support for parallel checkout workers. So add the _ca() variant which takes the conversion attributes struct as a parameter. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-18convert: add [async_]convert_to_working_tree_ca() variantsJeff Hostetler1-28/+32
Separate the attribute gathering from the actual conversion by adding _ca() variants of the conversion functions. These variants receive a precomputed 'struct conv_attrs', not relying, thus, on an index state. They will be used in a future patch adding parallel checkout support, for two reasons: - We will already load the conversion attributes in checkout_entry(), before conversion, to decide whether a path is eligible for parallel checkout. Therefore, it would be wasteful to load them again later, for the actual conversion. - The parallel workers will be responsible for reading, converting and writing blobs to the working tree. They won't have access to the main process' index state, so they cannot load the attributes. Instead, they will receive the preloaded ones and call the _ca() variant of the conversion functions. Furthermore, the attributes machinery is optimized to handle paths in sequential order, so it's better to leave it for the main process, anyway. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-18convert: make convert_attrs() and convert structs publicJeff Hostetler1-27/+8
Move convert_attrs() declaration from convert.c to convert.h, together with the conv_attrs struct and the crlf_action enum. This function and the data structures will be used outside convert.c in the upcoming parallel checkout implementation. Note that crlf_action is renamed to convert_crlf_action, which is more appropriate for the global namespace. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-15pkt-line: add options argument to read_packetized_to_strbuf()Johannes Schindelin1-1/+2
Update the calling sequence of `read_packetized_to_strbuf()` to take an options argument and not assume a fixed set of options. Update the only existing caller accordingly to explicitly pass the formerly-assumed flags. The `read_packetized_to_strbuf()` function calls `packet_read()` with a fixed set of assumed options (`PACKET_READ_GENTLE_ON_EOF`). This assumption has been fine for the single existing caller `apply_multi_file_filter()` in `convert.c`. In a later commit we would like to add other callers to `read_packetized_to_strbuf()` that need a different set of options. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-15pkt-line: do not issue flush packets in write_packetized_*()Johannes Schindelin1-2/+6
Remove the `packet_flush_gently()` call in `write_packetized_from_buf() and `write_packetized_from_fd()` and require the caller to call it if desired. Rename both functions to `write_packetized_from_*_no_flush()` to prevent later merge accidents. `write_packetized_from_buf()` currently only has one caller: `apply_multi_file_filter()` in `convert.c`. It always wants a flush packet to be written after writing the payload. However, we are about to introduce a caller that wants to write many packets before a final flush packet, so let's make the caller responsible for emitting the flush packet. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-13use CALLOC_ARRAYRené Scharfe1-1/+1
Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-26convert: fail gracefully upon missing clean cmd on required filterMatheus Tavares1-1/+0
The gitattributes documentation mentions that either the clean cmd or the smudge cmd can be left unspecified in a filter definition. However, when the filter is marked as 'required', the absence of any one of these two should be treated as an error. Git already fails under these circumstances, but not always in a pleasant way: omitting a clean cmd in a required filter triggers an assertion error which leaves the user with a quite verbose message: git: convert.c:1459: convert_to_git_filter_fd: Assertion "ca.drv->clean || ca.drv->process" failed. This assertion is not really necessary, as the apply_filter() call below it already performs the same check. And when this condition is not met, the function returns 0, making the caller die() with a much nicer message. (Also note that die()-ing here is the right behavior as `would_convert_to_git_filter_fd() == true` is a precondition to use convert_to_git_filter_fd(), and the former is only true when the filter is required.) So remove the assertion and add two regression tests to make sure that git fails nicely when either the smudge or clean command is missing on a required filter. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-30convert: drop unused crlf_action from check_global_conv_flags_eol()Jeff King1-2/+2
The crlf_action parameter hasn't been used since a0ad53c181 (convert: Correct NNO tests and missing `LF will be replaced by CRLF`, 2016-08-13), where that part of the function was hoisted out to a separate will_convert_lf_to_crlf() helper. Let's drop the useless parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-26run_command: teach API users to use embedded 'args' moreJunio C Hamano1-4/+1
The child_process structure has an embedded strvec for formulating the command line argument list these days, but code that predates the wide use of it prepared a separate char *argv[] array and manually set the child_process.argv pointer point at it. Teach these old-style code to lose the separate argv[] array. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-10parse_config_key(): return subsection len as size_tJeff King1-1/+1
We return the length to a subset of a string using an "int *" out-parameter. This is fine most of the time, as we'd expect config keys to be relatively short, but it could behave oddly if we had a gigantic config key. A more appropriate type is size_t. Let's switch over, which lets our callers use size_t as appropriate (they are bound by our type because they must pass the out-parameter as a pointer). This is mostly just a cleanup to make it clear this code handles long strings correctly. In practice, our config parser already chokes on long key names (because of a similar int/size_t mixup!). When doing an int/size_t conversion, we have to be careful that nobody was trying to assign a negative value to the variable. I manually confirmed that for each case here. They tend to just feed the result to xmemdupz() or similar; in a few cases I adjusted the parameter types for helper functions to make sure the size_t is preserved. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-16convert: provide additional metadata to filtersbrian m. carlson1-0/+22
Now that we have the codebase wired up to pass any additional metadata to filters, let's collect the additional metadata that we'd like to pass. The two main places we pass this metadata are checkouts and archives. In these two situations, reading HEAD isn't a valid option, since HEAD isn't updated for checkouts until after the working tree is written and archives can accept an arbitrary tree. In other situations, HEAD will usually reflect the refname of the branch in current use. We pass a smaller amount of data in other cases, such as git cat-file, where we can really only logically know about the blob. This commit updates only the parts of the checkout code where we don't use unpack_trees. That function and callers of it will be handled in a future commit. In the archive code, we leak a small amount of memory, since nothing we pass in the archiver argument structure is freed. Signed-off-by: brian m. carlson <bk2204@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-16convert: permit passing additional metadata to filter processesbrian m. carlson1-10/+34
There are a variety of situations where a filter process can make use of some additional metadata. For example, some people find the ident filter too limiting and would like to include the commit or the branch in their smudged files. This information isn't available during checkout as HEAD hasn't been updated at that point, and it wouldn't be available in archives either. Let's add a way to pass this metadata down to the filter. We pass the blob we're operating on, the treeish (preferring the commit over the tree if one exists), and the ref we're operating on. Note that we won't pass this information in all cases, such as when renormalizing or when we're performing diffs, since it doesn't make sense in those cases. The data we currently get from the filter process looks like the following: command=smudge pathname=git.c 0000 With this change, we'll get data more like this: command=smudge pathname=git.c refname=refs/tags/v2.25.1 treeish=c522f061d551c9bb8684a7c3859b2ece4499b56b blob=7be7ad34bd053884ec48923706e70c81719a8660 0000 There are a couple things to note about this approach. For operations like checkout, treeish will always be a commit, since we cannot check out individual trees, but for other operations, like archive, we can end up operating on only a particular tree, so we'll provide only a tree as the treeish. Similar comments apply for refname, since there are a variety of cases in which we won't have a ref. This commit wires up the code to print this information, but doesn't pass any of it at this point. In a future commit, we'll have various code paths pass the actual useful data down. Signed-off-by: brian m. carlson <bk2204@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-14Merge branch 'mt/use-passed-repo-more-in-funcs'Junio C Hamano1-1/+1
Some codepaths were given a repository instance as a parameter to work in the repository, but passed the_repository instance to its callees, which has been cleaned up (somewhat). * mt/use-passed-repo-more-in-funcs: sha1-file: allow check_object_signature() to handle any repo sha1-file: pass git_hash_algo to hash_object_file() sha1-file: pass git_hash_algo to write_object_file_prepare() streaming: allow open_istream() to handle any repo pack-check: use given repo's hash_algo at verify_packfile() cache-tree: use given repo's hash_algo at verify_one() diff: make diff_populate_filespec() honor its repo argument
2020-02-12Merge branch 'js/convert-typofix'Junio C Hamano1-1/+1
Typofix. * js/convert-typofix: convert: fix typo
2020-02-11convert: fix typoJohannes Schindelin1-1/+1
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31sha1-file: pass git_hash_algo to hash_object_file()Matheus Tavares1-1/+1
Allow hash_object_file() to work on arbitrary repos by introducing a git_hash_algo parameter. Change callers which have a struct repository pointer in their scope to pass on the git_hash_algo from the said repo. For all other callers, pass on the_hash_algo, which was already being used internally at hash_object_file(). This functionality will be used in the following patch to make check_object_signature() be able to work on arbitrary repos (which, in turn, will be used to fix an inconsistency at object.c:parse_object()). Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-01Merge branch 'rs/skip-iprefix'Junio C Hamano1-14/+8
Code simplification. * rs/skip-iprefix: convert: use skip_iprefix() in validate_encoding() utf8: use skip_iprefix() in same_utf_encoding()
2019-11-10convert: use skip_iprefix() in validate_encoding()René Scharfe1-14/+8
Use skip_iprefix() to parse "UTF" case-insensitively instead of checking with istarts_with(), building an upper-case version and then using skip_prefix() on it. This gets rid of duplicate code and of a small allocation. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-10Fix spelling errors in code commentsElijah Newren1-1/+1
Reported-by: Jens Schleusener <Jens.Schleusener@fossies.org> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-09Merge branch 'rs/convert-fix-utf-without-dash'Junio C Hamano1-4/+4
The code to skip "UTF" and "UTF-" prefix, when computing an advice message, did not work correctly when the prefix was "UTF", which has been fixed. * rs/convert-fix-utf-without-dash: convert: fix handling of dashless UTF prefix in validate_encoding()
2019-10-06convert: fix handling of dashless UTF prefix in validate_encoding()René Scharfe1-4/+4
Strip "UTF" and an optional dash from the start of 'upper' without passing a NULL pointer to skip_prefix() in the second call, as it cannot handle that. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-03am: reload .gitattributes after patching itbrian m. carlson1-1/+20
When applying multiple patches with git am, or when rebasing using the am backend, it's possible that one of our patches has updated a gitattributes file. Currently, we cache this information, so if a file in a subsequent patch has attributes applied, the file will be written out with the attributes in place as of the time we started the rebase or am operation, not with the attributes applied by the previous patch. This problem does not occur when using the -m or -i flags to rebase. To ensure we write the correct data into the working tree, expire the cache after each patch that touches a path ending in ".gitattributes". Since we load these attributes in multiple separate files, we must expire them accordingly. Verify that both the am and rebase code paths work correctly, including the conflict marker size with am -3. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-10Merge branch 'jh/resize-convert-scratch-buffer'Junio C Hamano1-1/+1
When the "clean" filter can reduce the size of a huge file in the working tree down to a small "token" (a la Git LFS), there is no point in allocating a huge scratch area upfront, but the buffer is sized based on the original file size. The convert mechanism now allocates very minimum and reallocates as it receives the output from the clean filter process. * jh/resize-convert-scratch-buffer: convert: avoid malloc of original file size
2019-03-08convert: avoid malloc of original file sizeJoey Hess1-1/+1
We write the output of a "clean" filter into a strbuf. Rather than growing the strbuf dynamically as we read its output, we make the initial allocation as large as the original input file. This is a good guess when the filter is just tweaking a few bytes, but it's disastrous when the point of the filter is to condense a very large file into a short identifier (e.g., the way git-lfs and git-annex do). We may ask to allocate many gigabytes, causing the allocation to fail and Git to die(). Instead, let's just let strbuf do its usual growth. When the clean filter does output something around the same size as the worktree file, the buffer will need to be reallocated until it fits, starting at 8192 and doubling in size. Benchmarking indicates that reallocation is not a significant overhead for outputs up to a few MB in size. Signed-off-by: Joey Hess <id@joeyh.name> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-06Merge branch 'jk/unused-parameter-cleanup'Junio C Hamano1-14/+14
Code cleanup. * jk/unused-parameter-cleanup: convert: drop path parameter from actual conversion functions convert: drop len parameter from conversion checks config: drop unused parameter from maybe_remove_section() show_date_relative(): drop unused "tz" parameter column: drop unused "opts" parameter in item_length() create_bundle(): drop unused "header" parameter apply: drop unused "def" parameter from find_name_gnu() match-trees: drop unused path parameter from score functions
2019-02-06Merge branch 'nd/the-index-final'Junio C Hamano1-1/+0
The assumption to work on the single "in-core index" instance has been reduced from the library-ish part of the codebase. * nd/the-index-final: cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch read-cache.c: remove the_* from index_has_changes() merge-recursive.c: remove implicit dependency on the_repository merge-recursive.c: remove implicit dependency on the_index sha1-name.c: remove implicit dependency on the_index read-cache.c: replace update_index_if_able with repo_& read-cache.c: kill read_index() checkout: avoid the_index when possible repository.c: replace hold_locked_index() with repo_hold_locked_index() notes-utils.c: remove the_repository references grep: use grep_opt->repo instead of explict repo argument
2019-01-24convert: drop path parameter from actual conversion functionsJeff King1-7/+7
The caller is responsible for looking up the attributes, after which point we no longer care about the path at which the content is found. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-24convert: drop len parameter from conversion checksJeff King1-7/+7
We've already extracted the useful information into our text_stat struct, so the length is no longer needed. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-24cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switchNguyễn Thái Ngọc Duy1-1/+0
By default, index compat macros are off from now on, because they could hide the_index dependency. Only those in builtin can use it. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-18Merge branch 'nd/style-opening-brace'Junio C Hamano1-1/+2
Code clean-up. * nd/style-opening-brace: style: the opening '{' of a function is in a separate line
2018-12-10style: the opening '{' of a function is in a separate lineNguyễn Thái Ngọc Duy1-1/+2
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-12-09Indent code with TABsNguyễn Thái Ngọc Duy1-3/+3
We indent with TABs and sometimes for fine alignment, TABs followed by spaces, but never all spaces (unless the indentation is less than 8 columns). Indenting with spaces slips through in some places. Fix them. Imported code and compat/ are left alone on purpose. The former should remain as close as upstream as possible. The latter pretty much has separate maintainers, it's up to them to decide. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-12Make git_check_attr() a void functionTorsten Bögershausen1-23/+19
git_check_attr() returns always 0. Remove all the error handling code of the callers, which is never executed. Change git_check_attr() to be a void function. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20Merge branch 'nd/no-the-index'Junio C Hamano1-17/+24
The more library-ish parts of the codebase learned to work on the in-core index-state instance that is passed in by their callers, instead of always working on the singleton "the_index" instance. * nd/no-the-index: (24 commits) blame.c: remove implicit dependency on the_index apply.c: remove implicit dependency on the_index apply.c: make init_apply_state() take a struct repository apply.c: pass struct apply_state to more functions resolve-undo.c: use the right index instead of the_index archive-*.c: use the right repository archive.c: avoid access to the_index grep: use the right index instead of the_index attr: remove index from git_attr_set_direction() entry.c: use the right index instead of the_index submodule.c: use the right index instead of the_index pathspec.c: use the right index instead of the_index unpack-trees: avoid the_index in verify_absent() unpack-trees: convert clear_ce_flags* to avoid the_index unpack-trees: don't shadow global var the_index unpack-trees: add a note about path invalidation unpack-trees: remove 'extern' on function declaration ls-files: correct index argument to get_convert_attr_ascii() preload-index.c: use the right index instead of the_index dir.c: remove an implicit dependency on the_index in pathspec code ...
2018-08-15Merge branch 'jk/size-t'Junio C Hamano1-3/+3
Code clean-up to use size_t/ssize_t when they are the right type. * jk/size-t: strbuf_humanise: use unsigned variables pass st.st_size as hint for strbuf_readlink() strbuf_readlink: use ssize_t strbuf: use size_t for length in intermediate variables reencode_string: use size_t for string lengths reencode_string: use st_add/st_mult helpers
2018-08-15Merge branch 'nd/i18n'Junio C Hamano1-20/+22
Many more strings are prepared for l10n. * nd/i18n: (23 commits) transport-helper.c: mark more strings for translation transport.c: mark more strings for translation sha1-file.c: mark more strings for translation sequencer.c: mark more strings for translation replace-object.c: mark more strings for translation refspec.c: mark more strings for translation refs.c: mark more strings for translation pkt-line.c: mark more strings for translation object.c: mark more strings for translation exec-cmd.c: mark more strings for translation environment.c: mark more strings for translation dir.c: mark more strings for translation convert.c: mark more strings for translation connect.c: mark more strings for translation config.c: mark more strings for translation commit-graph.c: mark more strings for translation builtin/replace.c: mark more strings for translation builtin/pack-objects.c: mark more strings for translation builtin/grep.c: mark strings for translation builtin/config.c: mark more strings for translation ...
2018-08-13convert.c: remove an implicit dependency on the_indexNguyễn Thái Ngọc Duy1-17/+24
Make the convert API take an index_state instead of assuming the_index in convert.c. All external call sites are converted blindly to keep the patch simple and retain current behavior. Individual call sites may receive further updates to use the right index instead of the_index. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13attr: remove an implicit dependency on the_indexNguyễn Thái Ngọc Duy1-1/+1
Make the attr API take an index_state instead of assuming the_index in attr code. All call sites are converted blindly to keep the patch simple and retain current behavior. Individual call sites may receive further updates to use the right index instead of the_index. There is one ugly temporary workaround added in attr.c that needs some more explanation. Commit c24f3abace (apply: file commited with CRLF should roundtrip diff and apply - 2017-08-19) forces one convert_to_git() call to NOT read the index at all. But what do you know, we read it anyway by falling back to the_index. When "istate" from convert_to_git is now propagated down to read_attr_from_array() we will hit segfault somewhere inside read_blob_data_from_index. The right way of dealing with this is to kill "use_index" variable and only follow "istate" but at this stage we are not ready for that: while most git_attr_set_direction() calls just passes the_index to be assigned to use_index, unpack-trees passes a different one which is used by entry.c code, which has no way to know what index to use if we delete use_index. So this has to be done later. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-24Merge branch 'bb/pedantic'Junio C Hamano1-1/+1
The codebase has been updated to compile cleanly with -pedantic option. * bb/pedantic: utf8.c: avoid char overflow string-list.c: avoid conversion from void * to function pointer sequencer.c: avoid empty statements at top level convert.c: replace "\e" escapes with "\033". fixup! refs/refs-internal.h: avoid forward declaration of an enum refs/refs-internal.h: avoid forward declaration of an enum fixup! connect.h: avoid forward declaration of an enum connect.h: avoid forward declaration of an enum
2018-07-24reencode_string: use size_t for string lengthsJeff King1-3/+3
The iconv interface takes a size_t, which is the appropriate type for an in-memory buffer. But our reencode_string_* functions use integers, meaning we may get confusing results when the sizes exceed INT_MAX. Let's use size_t consistently. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-23convert.c: mark more strings for translationNguyễn Thái Ngọc Duy1-18/+20
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-23Update messages in preparation for i18nNguyễn Thái Ngọc Duy1-3/+3
Many messages will be marked for translation in the following commits. This commit updates some of them to be more consistent and reduce diff noise in those commits. Changes are - keep the first letter of die(), error() and warning() in lowercase - no full stop in die(), error() or warning() if it's single sentence messages - indentation - some messages are turned to BUG(), or prefixed with "BUG:" and will not be marked for i18n - some messages are improved to give more information - some messages are broken down by sentence to be i18n friendly (on the same token, combine multiple warning() into one big string) - the trailing \n is converted to printf_ln if possible, or deleted if not redundant - errno_errno() is used instead of explicit strerror() Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-18Merge branch 'sb/object-store-grafts'Junio C Hamano1-0/+1
The conversion to pass "the_repository" and then "a_repository" throughout the object access API continues. * sb/object-store-grafts: commit: allow lookup_commit_graft to handle arbitrary repositories commit: allow prepare_commit_graft to handle arbitrary repositories shallow: migrate shallow information into the object parser path.c: migrate global git_path_* to take a repository argument cache: convert get_graft_file to handle arbitrary repositories commit: convert read_graft_file to handle arbitrary repositories commit: convert register_commit_graft to handle arbitrary repositories commit: convert commit_graft_pos() to handle arbitrary repositories shallow: add repository argument to is_repository_shallow shallow: add repository argument to check_shallow_file_for_update shallow: add repository argument to register_shallow shallow: add repository argument to set_alternate_shallow_file commit: add repository argument to lookup_commit_graft commit: add repository argument to prepare_commit_graft commit: add repository argument to read_graft_file commit: add repository argument to register_commit_graft commit: add repository argument to commit_graft_pos object: move grafts to object parser object-store: move object access functions to object-store.h
2018-07-09convert.c: replace "\e" escapes with "\033".Beat Bolli1-1/+1
The "\e" escape is not defined in ISO C. While on this line, add a missing space after the comma. Signed-off-by: Beat Bolli <dev+git@drbeat.li> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-16object-store: move object access functions to object-store.hStefan Beller1-0/+1
This should make these functions easier to find and cache.h less overwhelming to read. In particular, this moves: - read_object_file - oid_object_info - write_object_file As a result, most of the codebase needs to #include object-store.h. In this patch the #include is only added to files that would fail to compile otherwise. It would be better to #include wherever identifiers from the header are used. That can happen later when we have better tooling for it. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-08Merge branch 'ls/checkout-encoding'Junio C Hamano1-1/+275
The new "checkout-encoding" attribute can ask Git to convert the contents to the specified encoding when checking out to the working tree (and the other way around when checking in). * ls/checkout-encoding: convert: add round trip check based on 'core.checkRoundtripEncoding' convert: add tracing for 'working-tree-encoding' attribute convert: check for detectable errors in UTF encodings convert: add 'working-tree-encoding' attribute utf8: add function to detect a missing UTF-16/32 BOM utf8: add function to detect prohibited UTF-16/32 BOM utf8: teach same_encoding() alternative UTF encoding names strbuf: add a case insensitive starts_with() strbuf: add xstrdup_toupper() strbuf: remove unnecessary NUL assignment in xstrdup_tolower()
2018-04-16convert: add round trip check based on 'core.checkRoundtripEncoding'Lars Schneider1-0/+77
UTF supports lossless conversion round tripping and conversions between UTF and other encodings are mostly round trip safe as Unicode aims to be a superset of all other character encodings. However, certain encodings (e.g. SHIFT-JIS) are known to have round trip issues [1]. Add 'core.checkRoundtripEncoding', which contains a comma separated list of encodings, to define for what encodings Git should check the conversion round trip if they are used in the 'working-tree-encoding' attribute. Set SHIFT-JIS as default value for 'core.checkRoundtripEncoding'. [1] https://support.microsoft.com/en-us/help/170559/prb-conversion-problem-between-shift-jis-and-unicode Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16convert: add tracing for 'working-tree-encoding' attributeLars Schneider1-0/+25
Add the GIT_TRACE_WORKING_TREE_ENCODING environment variable to enable tracing for content that is reencoded with the 'working-tree-encoding' attribute. This is useful to debug encoding issues. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16convert: check for detectable errors in UTF encodingsLars Schneider1-0/+61
Check that new content is valid with respect to the user defined 'working-tree-encoding' attribute. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16convert: add 'working-tree-encoding' attributeLars Schneider1-1/+112
Git recognizes files encoded with ASCII or one of its supersets (e.g. UTF-8 or ISO-8859-1) as text files. All other encodings are usually interpreted as binary and consequently built-in Git text processing tools (e.g. 'git diff') as well as most Git web front ends do not visualize the content. Add an attribute to tell Git what encoding the user has defined for a given file. If the content is added to the index, then Git reencodes the content to a canonical UTF-8 representation. On checkout Git will reverse this operation. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14convert: convert to struct object_idbrian m. carlson1-6/+6
Convert convert.c to struct object_id. Add a use of the_hash_algo to replace hard-coded constants and change a strbuf_add to a strbuf_addstr to avoid another hard-coded constant. Note that a strict conversion using the hexsz constant would cause problems in the future if the internal and user-visible hash algorithms differed, as anticipated by the hash function transition plan. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-15Merge branch 'po/object-id'Junio C Hamano1-3/+3
Conversion from uchar[20] to struct object_id continues. * po/object-id: sha1_file: rename hash_sha1_file_literally sha1_file: convert write_loose_object to object_id sha1_file: convert force_object_loose to object_id sha1_file: convert write_sha1_file to object_id notes: convert write_notes_tree to object_id notes: convert combine_notes_* to object_id commit: convert commit_tree* to object_id match-trees: convert splice_tree to object_id cache: clear whole hash buffer with oidclr sha1_file: convert hash_sha1_file to object_id dir: convert struct sha1_stat to use object_id sha1_file: convert pretend_sha1_file to object_id
2018-01-30sha1_file: convert hash_sha1_file to object_idPatryk Obara1-3/+3
Convert the declaration and definition of hash_sha1_file to use struct object_id and adjust all function calls. Rename this function to hash_object_file. Signed-off-by: Patryk Obara <patryk.obara@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-16convert_to_git(): safe_crlf/checksafe becomes int conv_flagsTorsten Bögershausen1-19/+19
When calling convert_to_git(), the checksafe parameter defined what should happen if the EOL conversion (CRLF --> LF --> CRLF) does not roundtrip cleanly. In addition, it also defined if line endings should be renormalized (CRLF --> LF) or kept as they are. checksafe was an safe_crlf enum with these values: SAFE_CRLF_FALSE: do nothing in case of EOL roundtrip errors SAFE_CRLF_FAIL: die in case of EOL roundtrip errors SAFE_CRLF_WARN: print a warning in case of EOL roundtrip errors SAFE_CRLF_RENORMALIZE: change CRLF to LF SAFE_CRLF_KEEP_CRLF: keep all line endings as they are In some cases the integer value 0 was passed as checksafe parameter instead of the correct enum value SAFE_CRLF_FALSE. That was no problem because SAFE_CRLF_FALSE is defined as 0. FALSE/FAIL/WARN are different from RENORMALIZE and KEEP_CRLF. Therefore, an enum is not ideal. Let's use a integer bit pattern instead and rename the parameter to conv_flags to make it more generically usable. This allows us to extend the bit pattern in a subsequent commit. Reported-By: Randall S. Becker <rsbecker@nexbridge.com> Helped-By: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-27Merge branch 'tb/check-crlf-for-safe-crlf'Junio C Hamano1-5/+14
The "safe crlf" check incorrectly triggered for contents that does not use CRLF as line endings, which has been corrected. * tb/check-crlf-for-safe-crlf: t0027: Adapt the new MIX tests to Windows convert: tighten the safe autocrlf handling
2017-11-27convert: tighten the safe autocrlf handlingTorsten Bögershausen1-5/+14
When a text file had been commited with CRLF and the file is commited again, the CRLF are kept if .gitattributs has "text=auto". This is done by analyzing the content of the blob stored in the index: If a '\r' is found, Git assumes that the blob was commited with CRLF. The simple search for a '\r' does not always work as expected: A file is encoded in UTF-16 with CRLF and commited. Git treats it as binary. Now the content is converted into UTF-8. At the next commit Git treats the file as text, the CRLF should be converted into LF, but isn't. Replace has_cr_in_index() with has_crlf_in_index(). When no '\r' is found, 0 is returned directly, this is the most common case. If a '\r' is found, the content is analyzed more deeply. Reported-By: Ashish Negi <ashishnegi33@gmail.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-23Merge branch 'ma/ts-cleanups' into maintJunio C Hamano1-2/+3
Assorted bugfixes and clean-ups. * ma/ts-cleanups: ThreadSanitizer: add suppressions strbuf_setlen: don't write to strbuf_slopbuf pack-objects: take lock before accessing `remaining` convert: always initialize attr_action in convert_attrs
2017-09-22consistently use "fallthrough" comments in switchesJeff King1-1/+2
Gcc 7 adds -Wimplicit-fallthrough, which can warn when a switch case falls through to the next case. The general idea is that the compiler can't tell if this was intentional or not, so you should annotate any intentional fall-throughs as such, leaving it to complain about any unannotated ones. There's a GNU __attribute__ which can be used for annotation, but of course we'd have to #ifdef it away on non-gcc compilers. Gcc will also recognize specially-formatted comments, which matches our current practice. Let's extend that practice to all of the unannotated sites (which I did look over and verify that they were behaving as intended). Ideally in each case we'd actually give some reasons in the comment about why we're falling through, or what we're falling through to. And gcc does support that with -Wimplicit-fallthrough=2, which relaxes the comment pattern matching to anything that contains "fallthrough" (or a variety of spelling variants). However, this isn't the default for -Wimplicit-fallthrough, nor for -Wextra. In the name of simplicity, it's probably better for us to support the default level, which requires "fallthrough" to be the only thing in the comment (modulo some window dressing like "else" and some punctuation; see the gcc manual for the complete set of patterns). This patch suppresses all warnings due to -Wimplicit-fallthrough. We might eventually want to add that to the DEVELOPER Makefile knob, but we should probably wait until gcc 7 is more widely adopted (since earlier versions will complain about the unknown warning type). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-19Merge branch 'rs/strbuf-leakfix'Junio C Hamano1-1/+3
Many leaks of strbuf have been fixed. * rs/strbuf-leakfix: (34 commits) wt-status: release strbuf after use in wt_longstatus_print_tracking() wt-status: release strbuf after use in read_rebase_todolist() vcs-svn: release strbuf after use in end_revision() utf8: release strbuf on error return in strbuf_utf8_replace() userdiff: release strbuf after use in userdiff_get_textconv() transport-helper: release strbuf after use in process_connect_service() sequencer: release strbuf after use in save_head() shortlog: release strbuf after use in insert_one_record() sha1_file: release strbuf on error return in index_path() send-pack: release strbuf on error return in send_pack() remote: release strbuf after use in set_url() remote: release strbuf after use in migrate_file() remote: release strbuf after use in read_remote_branches() refs: release strbuf on error return in write_pseudoref() notes: release strbuf after use in notes_copy_from_stdin() merge: release strbuf after use in write_merge_heads() merge: release strbuf after use in save_state() mailinfo: release strbuf on error return in handle_boundary() mailinfo: release strbuf after use in handle_from() help: release strbuf on error return in exec_woman_emacs() ...
2017-09-10Merge branch 'ma/ts-cleanups'Junio C Hamano1-2/+3
Assorted bugfixes and clean-ups. * ma/ts-cleanups: ThreadSanitizer: add suppressions strbuf_setlen: don't write to strbuf_slopbuf pack-objects: take lock before accessing `remaining` convert: always initialize attr_action in convert_attrs
2017-09-10Merge branch 'tb/apply-with-crlf' into maintJunio C Hamano1-4/+6
"git apply" that is used as a better "patch -p1" failed to apply a taken from a file with CRLF line endings to a file with CRLF line endings. The root cause was because it misused convert_to_git() that tried to do "safe-crlf" processing by looking at the index entry at the same path, which is a nonsense---in that mode, "apply" is not working on the data in (or derived from) the index at all. This has been fixed. * tb/apply-with-crlf: apply: file commited with CRLF should roundtrip diff and apply convert: add SAFE_CRLF_KEEP_CRLF
2017-09-07convert: release strbuf on error return in filter_buffer_or_fd()Rene Scharfe1-1/+3
Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-26Merge branch 'tb/apply-with-crlf'Junio C Hamano1-4/+6
"git apply" that is used as a better "patch -p1" failed to apply a taken from a file with CRLF line endings to a file with CRLF line endings. The root cause was because it misused convert_to_git() that tried to do "safe-crlf" processing by looking at the index entry at the same path, which is a nonsense---in that mode, "apply" is not working on the data in (or derived from) the index at all. This has been fixed. * tb/apply-with-crlf: apply: file commited with CRLF should roundtrip diff and apply convert: add SAFE_CRLF_KEEP_CRLF
2017-08-23convert: always initialize attr_action in convert_attrsMartin Ågren1-2/+3
convert_attrs contains an "if-else". In the "if", we set attr_action twice, and the first assignment has no effect. In the "else", we do not set it at all. Since git_check_attr always returns the same value, we'll always end up in the "if", so there is no problem right now. But convert_attrs is obviously trying not to rely on such an implementation-detail of another component. Make the initialization of attr_action after the if-else. Remove the earlier assignments. Suggested-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-16convert: add SAFE_CRLF_KEEP_CRLFTorsten Bögershausen1-4/+6
When convert_to_git() is called, the caller may want to keep CRLF to be kept as CRLF (and not converted into LF). This will be used in the next commit, when apply works with files that have CRLF and patches are applied onto these files. Add the new value "SAFE_CRLF_KEEP_CRLF" to safe_crlf. Prepare convert_to_git() to be able to run the clean filter, skip the CRLF conversion and run the ident filter. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-11Merge branch 'jt/subprocess-handshake'Junio C Hamano1-68/+7
Code cleanup. * jt/subprocess-handshake: sub-process: refactor handshake to common function Documentation: migrate sub-process docs to header
2017-08-11Merge branch 'sb/hashmap-cleanup'Junio C Hamano1-2/+1
Many uses of comparision callback function the hashmap API uses cast the callback function type when registering it to hashmap_init(), which defeats the compile time type checking when the callback interface changes (e.g. gaining more parameters). The callback implementations have been updated to take "void *" pointers and cast them to the type they expect instead. * sb/hashmap-cleanup: t/helper/test-hashmap: use custom data instead of duplicate cmp functions name-hash.c: drop hashmap_cmp_fn cast submodule-config.c: drop hashmap_cmp_fn cast remote.c: drop hashmap_cmp_fn cast patch-ids.c: drop hashmap_cmp_fn cast convert/sub-process: drop cast to hashmap_cmp_fn config.c: drop hashmap_cmp_fn cast builtin/describe: drop hashmap_cmp_fn cast builtin/difftool.c: drop hashmap_cmp_fn cast attr.c: drop hashmap_cmp_fn cast
2017-08-11Merge branch 'ls/filter-process-delayed'Junio C Hamano1-52/+150
The filter-process interface learned to allow a process with long latency give a "delayed" response. * ls/filter-process-delayed: convert: add "status=delayed" to filter process protocol convert: refactor capabilities negotiation convert: move multiple file filter error handling to separate function convert: put the flags field before the flag itself for consistent style t0021: write "OUT <size>" only on success t0021: make debug log file name configurable t0021: keep filter log files on comparison
2017-07-26sub-process: refactor handshake to common functionJonathan Tan1-68/+7
Refactor, into a common function, the version and capability negotiation done when invoking a long-running process as a clean or smudge filter. This will be useful for other Git code that needs to interact similarly with a long-running process. As you can see in the change to t0021, this commit changes the error message reported when the long-running process does not introduce itself with the expected "server"-terminated line. Originally, the error message reports that the filter "does not support filter protocol version 2", differentiating between the old single-file filter protocol and the new multi-file filter protocol - I have updated it to something more generic and useful. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-26Merge branch 'ls/filter-process-delayed' into jt/subprocess-handshakeJunio C Hamano1-52/+150
* ls/filter-process-delayed: convert: add "status=delayed" to filter process protocol convert: refactor capabilities negotiation convert: move multiple file filter error handling to separate function convert: put the flags field before the flag itself for consistent style t0021: write "OUT <size>" only on success t0021: make debug log file name configurable t0021: keep filter log files on comparison
2017-07-05convert/sub-process: drop cast to hashmap_cmp_fnStefan Beller1-2/+1
Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30convert: add "status=delayed" to filter process protocolLars Schneider1-16/+94
Some `clean` / `smudge` filters may require a significant amount of time to process a single blob (e.g. the Git LFS smudge filter might perform network requests). During this process the Git checkout operation is blocked and Git needs to wait until the filter is done to continue with the checkout. Teach the filter process protocol, introduced in edcc8581 ("convert: add filter.<driver>.process option", 2016-10-16), to accept the status "delayed" as response to a filter request. Upon this response Git continues with the checkout operation. After the checkout operation Git calls "finish_delayed_checkout" which queries the filter for remaining blobs. If the filter is still working on the completion, then the filter is expected to block. If the filter has completed all remaining blobs then an empty response is expected. Git has a multiple code paths that checkout a blob. Support delayed checkouts only in `clone` (in unpack-trees.c) and `checkout` operations for now. The optimization is most effective in these code paths as all files of the tree are processed. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30convert: refactor capabilities negotiationLars Schneider1-12/+27
The code to negotiate long running filter capabilities was very repetitive for new capabilities. Replace the repetitive conditional statements with a table-driven approach. This is useful for the subsequent patch 'convert: add "status=delayed" to filter process protocol'. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30hashmap.h: compare function has access to a data fieldStefan Beller1-1/+2
When using the hashmap a common need is to have access to caller provided data in the compare function. A couple of times we abuse the keydata field to pass in the data needed. This happens for example in patch-ids.c. This patch changes the function signature of the compare function to have one more void pointer available. The pointer given for each invocation of the compare function must be defined in the init function of the hashmap and is just passed through. Documentation of this new feature is deferred to a later patch. This is a rather mechanical conversion, just adding the new pass-through parameter. However while at it improve the naming of the fields of all compare functions used by hashmaps by ensuring unused parameters are prefixed with 'unused_' and naming the parameters what they are (instead of 'unused' make it 'unused_keydata'). Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-29convert: move multiple file filter error handling to separate functionLars Schneider1-21/+26
Refactoring the filter error handling is useful for the subsequent patch 'convert: add "status=delayed" to filter process protocol'. In addition, replace the parentheses around the empty "if" block with a single semicolon to adhere to the Git style guide. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-29convert: put the flags field before the flag itself for consistent styleLars Schneider1-5/+5
Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-24Merge branch 'bw/config-h'Junio C Hamano1-0/+1
Fix configuration codepath to pay proper attention to commondir that is used in multi-worktree situation, and isolate config API into its own header file. * bw/config-h: config: don't implicitly use gitdir or commondir config: respect commondir setup: teach discover_git_directory to respect the commondir config: don't include config.h by default config: remove git_config_iter config: create config.h
2017-06-15config: don't include config.h by defaultBrandon Williams1-0/+1
Stop including config.h by default in cache.h. Instead only include config.h in those files which require use of the config system. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-13convert: convert renormalize_buffer to take an indexBrandon Williams1-2/+4
Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-13convert: convert convert_to_git to take an indexBrandon Williams1-3/+4
Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-13convert: convert convert_to_git_filter_fd to take an indexBrandon Williams1-2/+3
Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-13convert: convert crlf_to_git to take an indexBrandon Williams1-6/+8
Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-13convert: convert get_cached_convert_stats_ascii to take an indexBrandon Williams1-2/+3
Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-15convert: update subprocess_read_status() to not die on EOFBen Peart1-2/+8
Enable sub-processes to gracefully handle when the process dies by updating subprocess_read_status to return an error on EOF instead of dying. Update apply_multi_file_filter to take advantage of the revised subprocess_read_status. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-15sub-process: move sub-process functions into separate filesBen Peart1-103/+1
Move the sub-proces functions into sub-process.h/c. Add documentation for the new module in Documentation/technical/api-sub-process.txt Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-15convert: rename reusable sub-process functionsBen Peart1-20/+20
Do a mechanical rename of the functions that will become the reusable sub-process module. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-15convert: update generic functions to only use generic data structuresBen Peart1-18/+23
Update all functions that are going to be moved into a reusable module so that they only work with the reusable data structures. Move code that is specific to the filter out into the filter specific functions. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-15convert: separate generic structures and variables from the filter specific onesBen Peart1-15/+20
To enable future reuse of the filter.<driver>.process infrastructure, split the cmd2process structure into two separate parts. subprocess_entry will now contain the generic data required to manage the creation and tracking of the child process in a hashmap. cmd2process is a filter protocol specific structure that is used to track the negotiated capabilities of the filter. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-15convert: split start_multi_file_filter() into two separate functionsBen Peart1-24/+34
To enable future reuse of the filter.<driver>.process infrastructure, split start_multi_file_filter() into two separate parts. start_multi_file_filter() will now only contain the generic logic to manage the creation and tracking of the child process in a hashmap. start_multi_file_filter_fn() is a protocol specific initialization function that will negotiate the multi-file-filter interface version and capabilities. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08convert: move packet_write_line() into pkt-line as packet_writel()Ben Peart1-21/+2
Add packet_writel() which writes multiple lines in a single call and then calls packet_flush_gently(). Update convert.c to use the new packet_writel() function from pkt-line. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08convert: remove erroneous tests for errno == EPIPEBen Peart1-2/+2
start_multi_file_filter() and apply_multi_file_filter() currently test for errno == EPIPE but treating EPIPE as an error is already happening from one of the packet_write() functions. Signed-off-by: Ben Peart <benpeart@microsoft.com> Found/Fixed-by: Jeff King <peff@peff.net> Acked-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-01attr: convert git_check_attrs() callers to use the new APIJunio C Hamano1-11/+6
The remaining callers are all simple "I have N attributes I am interested in. I'll ask about them with various paths one by one". After this step, no caller to git_check_attrs() remains. After removing it, we can extend "struct attr_check" struct with data that can be used in optimizing the query for the specific N attributes it contains. Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-01attr: rename function and struct related to checking attributesJunio C Hamano1-6/+6
The traditional API to check attributes is to prepare an N-element array of "struct git_attr_check" and pass N and the array to the function "git_check_attr()" as arguments. In preparation to revamp the API to pass a single structure, in which these N elements are held, rename the type used for these individual array elements to "struct attr_check_item" and rename the function to "git_check_attrs()". Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-19Merge branch 'jc/renormalize-merge-kill-safer-crlf'Junio C Hamano1-6/+7
Fix a corner case in merge-recursive regression that crept in during 2.10 development cycle. * jc/renormalize-merge-kill-safer-crlf: convert: git cherry-pick -Xrenormalize did not work merge-recursive: handle NULL in add_cacheinfo() correctly cherry-pick: demonstrate a segmentation fault
2016-12-01convert: git cherry-pick -Xrenormalize did not workTorsten Bögershausen1-6/+7
Working with a repo that used to be all CRLF. At some point it was changed to all LF, with `text=auto` in .gitattributes. Trying to cherry-pick a commit from before the switchover fails: $ git cherry-pick -Xrenormalize <commit> fatal: CRLF would be replaced by LF in [path] Commit 65237284 "unify the "auto" handling of CRLF" introduced a regression: Whenever crlf_action is CRLF_TEXT_XXX and not CRLF_AUTO_XXX, SAFE_CRLF_RENORMALIZE was feed into check_safe_crlf(). This is wrong because here everything else than SAFE_CRLF_WARN is treated as SAFE_CRLF_FAIL. Call check_safe_crlf() only if checksafe is SAFE_CRLF_WARN or SAFE_CRLF_FAIL. Reported-by: Eevee (Lexy Munroe) <eevee@veekun.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-31Merge branch 'ls/filter-process'Junio C Hamano1-41/+335
The smudge/clean filter API expect an external process is spawned to filter the contents for each path that has a filter defined. A new type of "process" filter API has been added to allow the first request to run the filter for a path to spawn a single process, and all filtering need is served by this single process for multiple paths, reducing the process creation overhead. * ls/filter-process: contrib/long-running-filter: add long running filter example convert: add filter.<driver>.process option convert: prepare filter.<driver>.process option convert: make apply_filter() adhere to standard Git error handling pkt-line: add functions to read/write flush terminated packet streams pkt-line: add packet_write_gently() pkt-line: add packet_flush_gently() pkt-line: add packet_write_fmt_gently() pkt-line: extract set_packet_header() pkt-line: rename packet_write() to packet_write_fmt() run-command: add clean_on_exit_handler run-command: move check_pipe() from write_or_die to run_command convert: modernize tests convert: quote filter names in error messages
2016-10-17i18n: convert mark error messages for translationVasco Almeida1-4/+8
Mark error messages about CRLF for translation. Update test to reflect changes. Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-17convert: add filter.<driver>.process optionLars Schneider1-6/+295
Git's clean/smudge mechanism invokes an external filter process for every single blob that is affected by a filter. If Git filters a lot of blobs then the startup time of the external filter processes can become a significant part of the overall Git execution time. In a preliminary performance test this developer used a clean/smudge filter written in golang to filter 12,000 files. This process took 364s with the existing filter mechanism and 5s with the new mechanism. See details here: https://github.com/github/git-lfs/pull/1382 This patch adds the `filter.<driver>.process` string option which, if used, keeps the external filter process running and processes all blobs with the packet format (pkt-line) based protocol over standard input and standard output. The full protocol is explained in detail in `Documentation/gitattributes.txt`. A few key decisions: * The long running filter process is referred to as filter protocol version 2 because the existing single shot filter invocation is considered version 1. * Git sends a welcome message and expects a response right after the external filter process has started. This ensures that Git will not hang if a version 1 filter is incorrectly used with the filter.<driver>.process option for version 2 filters. In addition, Git can detect this kind of error and warn the user. * The status of a filter operation (e.g. "success" or "error) is set before the actual response and (if necessary!) re-set after the response. The advantage of this two step status response is that if the filter detects an error early, then the filter can communicate this and Git does not even need to create structures to read the response. * All status responses are pkt-line lists terminated with a flush packet. This allows us to send other status fields with the same protocol in the future. Helped-by: Martin-Louis Bright <mlbright@gmail.com> Reviewed-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-17convert: prepare filter.<driver>.process optionLars Schneider1-26/+34
Refactor the existing 'single shot filter mechanism' and prepare the new 'long running filter mechanism'. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-17convert: make apply_filter() adhere to standard Git error handlingLars Schneider1-9/+6
apply_filter() returns a boolean that tells the caller if it "did convert or did not convert". The variable `ret` was used throughout the function to track errors whereas `1` denoted success and `0` failure. This is unusual for the Git source where `0` denotes success. Rename the variable and flip its value to make the function easier readable for Git developers. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-17convert: quote filter names in error messagesLars Schneider1-6/+6
Git filter driver commands with spaces (e.g. `filter.sh foo`) are hard to read in error messages. Quote them to improve the readability. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-14convert: Correct NNO tests and missing `LF will be replaced by CRLF`Torsten Bögershausen1-40/+57
When a non-reversible CRLF conversion is done in "git add", a warning is printed on stderr (or Git dies, depending on checksafe) The function commit_chk_wrnNNO() in t0027 was written to test this, but did the wrong thing: Instead of looking at the warning from "git add", it looked at the warning from "git commit". This is racy because "git commit" may not have to do CRLF conversion at all if it can use the sha1 value from the index (which depends on whether "add" and "commit" run in a single second). Correct t0027 and replace the commit for each and every file with a commit of all files in one go. The function commit_chk_wrnNNO() should be renamed in a separate commit. Now that t0027 does the right thing, it detects a bug in covert.c: This sequence should generate the warning `LF will be replaced by CRLF`, but does not: $ git init $ git config core.autocrlf false $ printf "Line\r\n" >file $ git add file $ git commit -m "commit with CRLF" $ git config core.autocrlf true $ printf "Line\n" >file $ git add file "git add" calls crlf_to_git() in convert.c, which calls check_safe_crlf(). When has_cr_in_index(path) is true, crlf_to_git() returns too early and check_safe_crlf() is not called at all. Factor out the code which determines if "git checkout" converts LF->CRLF into will_convert_lf_to_crlf(). Update the logic around check_safe_crlf() and "simulate" the possible LF->CRLF conversion at "git checkout" with help of will_convert_lf_to_crlf(). Thanks to Jeff King <peff@peff.net> for analyzing t0027. Reported-By: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-06convert: unify the "auto" handling of CRLFTorsten Bögershausen1-20/+22
Before this change, $ echo "* text=auto" >.gitattributes $ echo "* eol=crlf" >>.gitattributes would have the same effect as $ echo "* text" >.gitattributes $ git config core.eol crlf Since the 'eol' attribute had higher priority than 'text=auto', this may corrupt binary files and is not what most users expect to happen. Make the 'eol' attribute to obey 'text=auto' and now $ echo "* text=auto" >.gitattributes $ echo "* eol=crlf" >>.gitattributes behaves the same as $ echo "* text=auto" >.gitattributes $ git config core.eol crlf In other words, $ echo "* text=auto eol=crlf" >.gitattributes has the same effect as $ git config core.autocrlf true and $ echo "* text=auto eol=lf" >.gitattributes has the same effect as $ git config core.autocrlf input Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-25convert.c: ident + core.autocrlf didn't workTorsten Bögershausen1-12/+7
When the ident attributes is set, get_stream_filter() did not obey core.autocrlf=true, and the file was checked out with LF. Change the rule when a streaming filter can be used: - if an external filter is specified, don't use a stream filter. - if the worktree eol is CRLF and "auto" is active, don't use a stream filter. - Otherwise the stream filter can be used. Add test cases in t0027. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-26Merge branch 'tb/conversion'Junio C Hamano1-90/+100
Code simplification. * tb/conversion: convert.c: correct attr_action() convert.c: simplify text_stat convert.c: refactor crlf_action convert.c: use text_eol_is_crlf() convert.c: remove input_crlf_action() convert.c: remove unused parameter 'path' t0027: add tests for get_stream_filter()
2016-02-23convert.c: correct attr_action()Torsten Bögershausen1-9/+9
df747b81 (convert.c: refactor crlf_action, 2016-02-10) introduced a bug to "git ls-files --eol". The "text" attribute was shown as "text eol=lf" or "text eol=crlf", depending on core.autocrlf or core.eol. Correct this and add test cases in t0027. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-10convert.c: simplify text_statTorsten Bögershausen1-25/+22
Simplify the statistics: lonecr counts the CR which is not followed by a LF, lonelf counts the LF which is not preceded by a CR, crlf counts CRLF combinations. This simplifies the evaluation of the statistics. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-10convert.c: refactor crlf_actionTorsten Bögershausen1-31/+46
Refactor the determination and usage of crlf_action. Today, when no "crlf" attribute are set on a file, crlf_action is set to CRLF_GUESS. Use CRLF_UNDEFINED instead, and search for "text" or "eol" as before. After searching for line ending attributes, save the value in struct conv_attrs.crlf_action attr_action, so that get_convert_attr_ascii() is able report the attributes. Replace the old CRLF_GUESS usage: CRLF_GUESS && core.autocrlf=true -> CRLF_AUTO_CRLF CRLF_GUESS && core.autocrlf=false -> CRLF_BINARY CRLF_GUESS && core.autocrlf=input -> CRLF_AUTO_INPUT Save the action in conv_attrs.crlf_action (as before) and change all callers. Make more clear, what is what, by defining: - CRLF_UNDEFINED : No attributes set. Temparally used, until core.autocrlf and core.eol is evaluated and one of CRLF_BINARY, CRLF_AUTO_INPUT or CRLF_AUTO_CRLF is selected - CRLF_BINARY : No processing of line endings. - CRLF_TEXT : attribute "text" is set, line endings are processed. - CRLF_TEXT_INPUT: attribute "input" or "eol=lf" is set. This implies text. - CRLF_TEXT_CRLF : attribute "eol=crlf" is set. This implies text. - CRLF_AUTO : attribute "auto" is set. - CRLF_AUTO_INPUT: core.autocrlf=input (no attributes) - CRLF_AUTO_CRLF : core.autocrlf=true (no attributes) Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-10Merge branch 'ls/clean-smudge-override-in-config'Junio C Hamano1-1/+1
Clean/smudge filters defined in a configuration file of lower precedence can now be overridden to be a pass-through no-op by setting the variable to an empty string. * ls/clean-smudge-override-in-config: convert: treat an empty string for clean/smudge filters as "cat"
2016-02-08convert.c: use text_eol_is_crlf()Torsten Bögershausen1-6/+14
Add a helper function to find out, which line endings text files should get at checkout, depending on core.autocrlf and core.eol configuration variables. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-08convert.c: remove input_crlf_action()Torsten Bögershausen1-23/+14
Integrate the code of input_crlf_action() into convert_attrs(), so that ca.crlf_action is always valid after calling convert_attrs(). Keep a copy of crlf_action in attr_action, this is needed for get_convert_attr_ascii(). Remove eol_attr from struct conv_attrs, as it is now used temporally. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-08convert.c: remove unused parameter 'path'Torsten Bögershausen1-10/+9
Some functions get a parameter path, but don't use it. Remove the unused parameter. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-01-29convert: treat an empty string for clean/smudge filters as "cat"Lars Schneider1-1/+1
Once a lower-priority configuration file defines a clean or smudge filter, there is no convenient way to override it to produce as-is output. Even though the configuration mechanism implements "the last one wins" semantics, you cannot set them to an empty string and expect them to work, as apply_filter() would try to run the empty string as an external command and fail. The conversion is not done, but the function would still report a failure to convert. Even though resetting the variable to "cat" (i.e. pass the data back as-is and report success) is an obvious and a viable way to solve this, it is wasteful to spawn an external process just as a workaround. Instead, teach apply_filter() to treat an empty string as a no-op filter that always returns successfully its input as-is without conversion. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-01-18ls-files: add eol diagnosticsTorsten Bögershausen1-28/+91
When working in a cross-platform environment, a user may want to check if text files are stored normalized in the repository and if .gitattributes are set appropriately. Make it possible to let Git show the line endings in the index and in the working tree and the effective text/eol attributes. The end of line ("eolinfo") are shown like this: "-text" binary (or with bare CR) file "none" text file without any EOL "lf" text file with LF "crlf" text file with CRLF "mixed" text file with mixed line endings. The effective text/eol attribute is one of these: "", "-text", "text", "text=auto", "text eol=lf", "text eol=crlf" git ls-files --eol gives an output like this: i/none w/none attr/text=auto t/t5100/empty i/-text w/-text attr/-text t/test-binary-2.png i/lf w/lf attr/text eol=lf t/t5100/rfc2047-info-0007 i/lf w/crlf attr/text eol=crlf doit.bat i/mixed w/mixed attr/ locale/XX.po to show what eol convention is used in the data in the index ('i'), and in the working tree ('w'), and what attribute is in effect, for each path that is shown. Add test cases in t0027. Helped-By: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-25convert trivial sprintf / strcpy calls to xsnprintfJeff King1-1/+2
We sometimes sprintf into fixed-size buffers when we know that the buffer is large enough to fit the input (either because it's a constant, or because it's numeric input that is bounded in size). Likewise with strcpy of constant strings. However, these sites make it hard to audit sprintf and strcpy calls for buffer overflows, as a reader has to cross-reference the size of the array with the input. Let's use xsnprintf instead, which communicates to a reader that we don't expect this to overflow (and catches the mistake in case we do). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-20filter_buffer_or_fd(): ignore EPIPEJunio C Hamano1-1/+6
We are explicitly ignoring SIGPIPE, as we fully expect that the filter program may not read our output fully. Ignore EPIPE that may come from writing to it as well. A new test was stolen from Jeff's suggestion. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-08Merge branch 'sp/stream-clean-filter'Junio C Hamano1-6/+49
When running a required clean filter, we do not have to mmap the original before feeding the filter. Instead, stream the file contents directly to the filter and process its output. * sp/stream-clean-filter: sha1_file: don't convert off_t to size_t too early to avoid potential die() convert: stream from fd to required clean filter to reduce used address space copy_fd(): do not close the input file descriptor mmap_limit: introduce GIT_MMAP_LIMIT to allow testing expected mmap size memory_limit: use git_env_ulong() to parse GIT_ALLOC_LIMIT config.c: add git_env_ulong() to parse environment variable convert: drop arguments other than 'path' from would_convert_to_git()
2014-08-28convert: stream from fd to required clean filter to reduce used address spaceSteffen Prohaska1-6/+49
The data is streamed to the filter process anyway. Better avoid mapping the file if possible. This is especially useful if a clean filter reduces the size, for example if it computes a sha1 for binary data, like git media. The file size that the previous implementation could handle was limited by the available address space; large files for example could not be handled with (32-bit) msysgit. The new implementation can filter files of any size as long as the filter output is small enough. The new code path is only taken if the filter is required. The filter consumes data directly from the fd. If it fails, the original data is not immediately available. The condition can easily be handled as a fatal error, which is expected for a required filter anyway. If the filter was not required, the condition would need to be handled in a different way, like seeking to 0 and reading the data. But this would require more restructuring of the code and is probably not worth it. The obvious approach of falling back to reading all data would not help achieving the main purpose of this patch, which is to handle large files with limited address space. If reading all data is an option, we can simply take the old code path right away and mmap the entire file. The environment variable GIT_MMAP_LIMIT, which has been introduced in a previous commit is used to test that the expected code path is taken. A related test that exercises required filters is modified to verify that the data actually has been modified on its way from the file system to the object store. Signed-off-by: Steffen Prohaska <prohaska@zib.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-20run-command: introduce CHILD_PROCESS_INITRené Scharfe1-2/+1
Most struct child_process variables are cleared using memset first after declaration. Provide a macro, CHILD_PROCESS_INIT, that can be used to initialize them statically instead. That's shorter, doesn't require a function call and is slightly more readable (especially given that we already have STRBUF_INIT, ARGV_ARRAY_INIT etc.). Helped-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-20use skip_prefix to avoid magic numbersJeff King1-2/+2
It's a common idiom to match a prefix and then skip past it with a magic number, like: if (starts_with(foo, "bar")) foo += 3; This is easy to get wrong, since you have to count the prefix string yourself, and there's no compiler check if the string changes. We can use skip_prefix to avoid the magic numbers here. Note that some of these conversions could be much shorter. For example: if (starts_with(arg, "--foo=")) { bar = arg + 6; continue; } could become: if (skip_prefix(arg, "--foo=", &bar)) continue; However, I have left it as: if (skip_prefix(arg, "--foo=", &v)) { bar = v; continue; } to visually match nearby cases which need to actually process the string. Like: if (skip_prefix(arg, "--foo=", &v)) { bar = atoi(v); continue; } Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-05replace {pre,suf}fixcmp() with {starts,ends}_with()Christian Couder1-1/+1
Leaving only the function definitions and declarations so that any new topic in flight can still make use of the old functions, replace existing uses of the prefixcmp() and suffixcmp() with new API functions. The change can be recreated by mechanically applying this: $ git grep -l -e prefixcmp -e suffixcmp -- \*.c | grep -v strbuf\\.c | xargs perl -pi -e ' s|!prefixcmp\(|starts_with\(|g; s|prefixcmp\(|!starts_with\(|g; s|!suffixcmp\(|ends_with\(|g; s|suffixcmp\(|!ends_with\(|g; ' on the result of preparatory changes in this series. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-22typofix: in-code commentsOndřej Bílka1-1/+1
Signed-off-by: Ondřej Bílka <neleai@seznam.cz> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-21Merge branch 'lf/read-blob-data-from-index'Junio C Hamano1-25/+2
Reduce duplicated code between convert.c and attr.c. * lf/read-blob-data-from-index: convert.c: remove duplicate code read_blob_data_from_index(): optionally return the size of blob data attr.c: extract read_index_data() as read_blob_data_from_index()
2013-04-17convert.c: remove duplicate codeLukas Fleischer1-25/+2
The has_cr_in_index() function is an almost 1:1 copy of read_blob_data_from_index() with some additions. Use the latter instead of using copy-pasted code. Signed-off-by: Lukas Fleischer <git@cryptocrack.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-23convert some config callbacks to parse_config_keyJeff King1-9/+5
These callers can drop some inline pointer arithmetic and magic offset constants, making them more readable and less error-prone (those constants had to match the lengths of strings, but there is no automatic verification of that fact). The "ep" pointer (presumably for "end pointer"), which points to the final key segment of the config variable, is given the more standard name "key" to describe its function rather than its derivation. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-28Merge branch 'jb/required-filter'Junio C Hamano1-4/+24
* jb/required-filter: Add a setting to require a filter to be successful Conflicts: convert.c
2012-02-26Merge branch 'jk/maint-avoid-streaming-filtered-contents'Junio C Hamano1-4/+25
* jk/maint-avoid-streaming-filtered-contents: do not stream large files to pack when filters are in use teach dry-run convert_to_git not to require a src buffer teach convert_to_git a "dry run" mode
2012-02-24teach dry-run convert_to_git not to require a src bufferJeff King1-2/+10
When we call convert_to_git in dry-run mode, it may still want to look at the source buffer, because some CRLF conversion modes depend on analyzing the source to determine whether it is in fact convertible CRLF text. However, the main motivation for convert_to_git's dry-run mode is that we would decide which method to use to acquire the blob's data (streaming versus in-core). Requiring this source analysis creates a chicken-and-egg problem. We are better off simply guessing that anything we can't analyze will end up needing conversion. This patch lets a caller specify a NULL src buffer when using dry-run mode (and only dry-run mode). A non-zero return value goes from "we would convert" to "we might convert"; a zero return value remains "we would definitely not convert". Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-24teach convert_to_git a "dry run" modeJeff King1-2/+15
Some callers may want to know whether convert_to_git will actually do anything before performing the conversion itself (e.g., to decide whether to stream or handle blobs in-core). This patch lets callers specify the dry run mode by passing a NULL destination buffer. The return value, instead of indicating whether conversion happened, will indicate whether conversion would occur. For readability, we also include a wrapper function which makes it more obvious we are not actually performing the conversion. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-21Ignore SIGPIPE when running a filter driverJehan Bing1-0/+5
If a filter is not defined or if it fails, git should behave as if the filter is a no-op passthru. However, if the filter exits before reading all the content, depending on the timing, git could be killed with SIGPIPE when it tries to write to the pipe connected to the filter. Ignore SIGPIPE while processing the filter to give us a chance to check the return value from a failed write, in order to detect and act on this mode of failure in a more controlled way. Signed-off-by: Jehan Bing <jehan@orb.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-17Add a setting to require a filter to be successfulJehan Bing1-4/+24
By default, a missing filter driver or a failure from the filter driver is not an error, but merely makes the filter operation a no-op pass through. This is useful to massage the content into a shape that is more convenient for the platform, filesystem, and the user to use, and the content filter mechanism is not used to turn something unusable into usable. However, we could also use of the content filtering mechanism and store the content that cannot be directly used in the repository (e.g. a UUID that refers to the true content stored outside git, or an encrypted content) and turn it into a usable form upon checkout (e.g. download the external content, or decrypt the encrypted content). For such a use case, the content cannot be used when filter driver fails, and we need a way to tell Git to abort the whole operation for such a failing or missing filter driver. Add a new "filter.<driver>.required" configuration variable to mark the second use case. When it is set, git will abort the operation when the filter driver does not exist or exits with a non-zero status code. Signed-off-by: Jehan Bing <jehan@orb.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-28Merge branch 'jc/maint-lf-to-crlf-keep-crlf' into maintJunio C Hamano1-10/+50
* jc/maint-lf-to-crlf-keep-crlf: lf_to_crlf_filter(): resurrect CRLF->CRLF hack
2011-12-22Merge branch 'jc/maint-lf-to-crlf-keep-crlf'Junio C Hamano1-10/+50
* jc/maint-lf-to-crlf-keep-crlf: lf_to_crlf_filter(): resurrect CRLF->CRLF hack
2011-12-21Merge branch 'cn/maint-lf-to-crlf-filter' into maintJunio C Hamano1-14/+40
* cn/maint-lf-to-crlf-filter: lf_to_crlf_filter(): tell the caller we added "\n" when draining convert: track state in LF-to-CRLF filter
2011-12-18lf_to_crlf_filter(): resurrect CRLF->CRLF hackJunio C Hamano1-10/+50
The non-streaming version of the filter counts CRLF and LF in the whole buffer, and returns without doing anything when they match (i.e. what is recorded in the object store already uses CRLF). This was done to help people who added files from the DOS world before realizing they want to go cross platform and adding .gitattributes to tell Git that they only want CRLF in their working tree. The streaming version of the filter does not want to read the whole thing before starting to work, as that defeats the whole point of streaming. So we instead check what byte follows CR whenever we see one, and add CR before LF only when the LF does not immediately follow CR already to keep CRLF as is. Reported-and-tested-by: Ralf Thielow Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-16lf_to_crlf_filter(): tell the caller we added "\n" when drainingJunio C Hamano1-5/+7
This can only happen when the input size is multiple of the buffer size of the cascade filter (16k) and ends with an LF, but in such a case, the code forgot to tell the caller that it added the "\n" it could not add during the last round. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-28convert: track state in LF-to-CRLF filterCarlos Martín Nieto1-13/+37
There may not be enough space to store CRLF in the output. If we don't fill the buffer, then the filter will keep getting called with the same short buffer and will loop forever. Instead, always store the CR and record whether there's a missing LF if so we store it in the output buffer the next time the function gets called. Reported-by: Henrik Grubbström <grubba@roxen.com> Signed-off-by: Carlos Martín Nieto <cmn@elego.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-21convert.c: Fix return type of git_path_check_eol()Ramsay Jones1-1/+1
The git_path_check_eol() function converts a string value to the corresponding 'enum eol' value. However, the function is currently declared to return an 'enum crlf_action', which causes sparse to complain thus: SP convert.c convert.c:736:50: warning: mixing different enum types convert.c:736:50: int enum crlf_action versus convert.c:736:50: int enum eol In order to suppress the warning, we simply correct the return type in the function declaration. Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-15convert: don't mix enum with intRamkumar Ramachandra1-3/+3
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-09-02Merge branch 'tr/maint-ident-to-git-memmove'Junio C Hamano1-2/+2
* tr/maint-ident-to-git-memmove: Use memmove in ident_to_git
2011-08-29Use memmove in ident_to_gitThomas Rast1-2/+2
convert_to_git sets src=dst->buf if any of the preceding conversions actually did any work. Thus in ident_to_git we have to use memmove instead of memcpy as far as src->dst copying is concerned. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-04Rename git_checkattr() to git_check_attr()Michael Haggerty1-1/+1
Suggested by: Junio Hamano <gitster@pobox.com> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-26streaming: filter cascadingJunio C Hamano1-14/+112
This implements an internal "cascade" filter mechanism that plugs two filters in series. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-26streaming filter: ident filterJunio C Hamano1-8/+169
Add support for "ident" filter on the output codepath. This does not work with lf-to-crlf filter together (yet). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-26Add LF-to-CRLF streaming conversionJunio C Hamano1-0/+41
If we do not have to guess or validate by scanning the input, we can just stream this through. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-26stream filter: add "no more input" to the filtersJunio C Hamano1-1/+5
Some filters may need to buffer the input and look-ahead inside it to decide what to output, and they may consume more than zero bytes of input and still not produce any output. After feeding all the input, pass NULL as input as keep calling stream_filter() to let such filters know there is no more input coming, and it is time for them to produce the remaining output based on the buffered input. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-26Add streaming filter APIJunio C Hamano1-7/+77
This introduces an API to plug custom filters to an input stream. The caller gets get_stream_filter("path") to obtain an appropriate filter for the path, and then uses it when opening an input stream via open_istream(). After that, the caller can read from the stream with read_istream(), and close it with close_istream(), just like an unfiltered stream. This only adds a "null" filter that is a pass-thru filter, but later changes can add LF-to-CRLF and other filters, and the callers of the streaming API do not have to change. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20convert: CRLF_INPUT is a no-op in the output codepathJunio C Hamano1-1/+1
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20streaming_write_entry(): use streaming API in write_entry()Junio C Hamano1-0/+23
When the output to a path does not have to be converted, we can read from the object database from the streaming API and write to the file in the working tree, without having to hold everything in the memory. The ident, auto- and safe- crlf conversions inherently require you to read the whole thing before deciding what to do, so while it is technically possible to support them by using a buffer of an unbound size or rewinding and reading the stream twice, it is less practical than the traditional "read the whole thing in core and convert" approach. Adding streaming filters for the other conversions on top of this should be doable by tweaking the can_bypass_conversion() function (it should be renamed to can_filter_stream() when it happens). Then the streaming API can be extended to wrap the git_istream streaming_write_entry() opens on the underlying object in another git_istream that reads from it, filters what is read, and let the streaming_write_entry() read the filtered result. But that is outside the scope of this series. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09convert: make it harder to screw up adding a conversion attributeJunio C Hamano1-41/+38
The current internal API requires the callers of setup_convert_check() to supply the git_attr_check structures (hence they need to know how many to allocate), but they grab the same set of attributes for given path. Define a new convert_attrs() API that fills a higher level information that the callers (convert_to_git and convert_to_working_tree) really want, and move the common code to interact with the attributes system to it. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09convert: make it safer to add conversion attributesJunio C Hamano1-26/+22
The places that need to pass an array of "struct git_attr_check" needed to be careful to pass a large enough array and know what index each element lied. Make it safer and easier to code these. Besides, the hard-coded sequence of initializing various attributes was too ugly after we gained more than a few attributes. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09convert: give saner names to crlf/eol variables, types and functionsJunio C Hamano1-30/+31
Back when the conversion was only about the end-of-line convention, it might have made sense to call what we do upon seeing CR/LF simply an "action", but these days the conversion routines do a lot more than just tweaking the line ending. Raname "action" to "crlf_action". The function that decides what end of line conversion to use on the output codepath was called "determine_output_conversion", as if there is no other kind of output conversion. Rename it to "output_eol"; it is a function that returns what EOL convention is to be used. A function that decides what "crlf_action" needs to be used on the input codepath, given what conversion attribute is set to the path and global end-of-line convention, was called "determine_action". Rename it to "input_crlf_action". Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09convert: rename the "eol" global variable to "core_eol"Junio C Hamano1-2/+2
Yes, it is clear that "eol" wants to mean some sort of end-of-line thing, but as the name of a global variable, it is way too short to describe what kind of end-of-line thing it wants to represent. Besides, there are many codepaths that want to use their own local "char *eol" variable to point at the end of the current line they are processing. This global variable holds what we read from core.eol configuration variable. Name it as such. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-16enums: omit trailing comma for portabilityJonathan Nieder1-1/+1
Since v1.7.2-rc0~23^2~2 (Add per-repository eol normalization, 2010-05-19), building with gcc -std=gnu89 -pedantic produces warnings like the following: convert.c:21:11: warning: comma at end of enumerator list [-pedantic] gcc is right to complain --- these commas are not permitted in C89. In the spirit of v1.7.2-rc0~32^2~16 (2010-05-14), remove them. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-12-22convert filter: supply path to external driverPete Wyckoff1-1/+22
Filtering to support keyword expansion may need the name of the file being filtered. In particular, to support p4 keywords like $File: //depot/product/dir/script.sh $ the smudge filter needs to know the name of the file it is smudging. Allow "%f" in the custom filter command line specified in the configuration. This will be substituted by the filename inside a single-quote pair to be passed to the shell. Signed-off-by: Pete Wyckoff <pw@padd.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-07-02Don't expand CRLFs when normalizing text during mergeEyvind Bernhardsen1-7/+20
Disable CRLF expansion when convert_to_working_tree() is called from normalize_buffer(). This improves performance when merging branches with conflicting line endings when core.eol=crlf or core.autocrlf=true by making the normalization act as if core.eol=lf. Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-07-02Avoid conflicts when merging branches with mixed normalizationEyvind Bernhardsen1-2/+14
Currently, merging across changes in line ending normalization is painful since files containing CRLF will conflict with normalized files, even if the only difference between the two versions is the line endings. Additionally, any "real" merge conflicts that exist are obscured because every line in the file has a conflict. Assume you start out with a repo that has a lot of text files with CRLF checked in (A): o---C / \ A---B---D B: Add "* text=auto" to .gitattributes and normalize all files to LF-only C: Modify some of the text files D: Try to merge C You will get a ridiculous number of LF/CRLF conflicts when trying to merge C into D, since the repository contents for C are "wrong" wrt the new .gitattributes file. Fix ll-merge so that the "base", "theirs" and "ours" stages are passed through convert_to_worktree() and convert_to_git() before a three-way merge. This ensures that all three stages are normalized in the same way, removing from consideration differences that are only due to normalization. This feature is optional for now since it changes a low-level mechanism and is not necessary for the majority of users. The "merge.renormalize" config variable enables it. Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-06-21Merge branch 'eb/core-eol'Junio C Hamano1-40/+110
* eb/core-eol: Add "core.eol" config variable Rename the "crlf" attribute "text" Add per-repository eol normalization Add tests for per-repository eol normalization Conflicts: Documentation/config.txt Makefile
2010-06-21Merge branch 'fg/autocrlf'Junio C Hamano1-0/+49
* fg/autocrlf: autocrlf: Make it work also for un-normalized repositories
2010-06-21Merge branch 'gv/portable'Junio C Hamano1-1/+3
* gv/portable: test-lib: use DIFF definition from GIT-BUILD-OPTIONS build: propagate $DIFF to scripts Makefile: Tru64 portability fix Makefile: HP-UX 10.20 portability fixes Makefile: HPUX11 portability fixes Makefile: SunOS 5.6 portability fix inline declaration does not work on AIX Allow disabling "inline" Some platforms lack socklen_t type Make NO_{INET_NTOP,INET_PTON} configured independently Makefile: some platforms do not have hstrerror anywhere git-compat-util.h: some platforms with mmap() lack MAP_FAILED definition test_cmp: do not use "diff -u" on platforms that lack one fixup: do not unconditionally disable "diff -u" tests: use "test_cmp", not "diff", when verifying the result Do not use "diff" found on PATH while building and installing enums: omit trailing comma for portability Makefile: -lpthread may still be necessary when libc has only pthread stubs Rewrite dynamic structure initializations to runtime assignment Makefile: pass CPPFLAGS through to fllow customization Conflicts: Makefile wt-status.h
2010-06-06Add "core.eol" config variableEyvind Bernhardsen1-24/+36
Introduce a new configuration variable, "core.eol", that allows the user to set which line endings to use for end-of-line-normalized files in the working directory. It defaults to "native", which means CRLF on Windows and LF everywhere else. Note that "core.autocrlf" overrides core.eol. This means that [core] autocrlf = true puts CRLFs in the working directory even if core.eol is set to "lf". Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-31Rewrite dynamic structure initializations to runtime assignmentGary V. Vaughan1-1/+3
Unfortunately, there are still plenty of production systems with vendor compilers that choke unless all compound declarations can be determined statically at compile time, for example hpux10.20 (I can provide a comprehensive list of our supported platforms that exhibit this problem if necessary). This patch simply breaks apart any compound declarations with dynamic initialisation expressions, and moves the initialisation until after the last declaration in the same block, in all the places necessary to have the offending compilers accept the code. Signed-off-by: Gary V. Vaughan <gary@thewrittenword.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>