Age | Commit message (Collapse) | Author | Files | Lines |
|
Add and apply a semantic patch for calling xstrncmpz() to compare a
NUL-terminated string with a buffer of a known length instead of using
strncmp() and checking the terminating NUL explicitly. This simplifies
callers by reducing code duplication.
I had to adjust remote.c manually because Coccinelle inexplicably
changed the indent of the else branches.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Further shuffling of declarations across header files to streamline
file dependencies.
* cw/compat-util-header-cleanup:
git-compat-util: move alloc macros to git-compat-util.h
treewide: remove unnecessary includes for wrapper.h
kwset: move translation table from ctype
sane-ctype.h: create header for sane-ctype macros
git-compat-util: move wrapper.c funcs to its header
git-compat-util: move strbuf.c funcs to its header
|
|
Reduce reliance on a global state in the config reading API.
* gc/config-context:
config: pass source to config_parser_event_fn_t
config: add kvi.path, use it to evaluate includes
config.c: remove config_reader from configsets
config: pass kvi to die_bad_number()
trace2: plumb config kvi
config.c: pass ctx with CLI config
config: pass ctx with config files
config.c: pass ctx in configsets
config: add ctx arg to config_fn_t
urlmatch.h: use config_fn_t type
config: inline git_color_default_config
|
|
Code clean-up around strbuf_expand() API.
* rs/strbuf-expand-step:
strbuf: simplify strbuf_expand_literal_cb()
replace strbuf_expand() with strbuf_expand_step()
replace strbuf_expand_dict_cb() with strbuf_expand_step()
strbuf: factor out strbuf_expand_step()
pretty: factor out expand_separator()
|
|
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add a new "const struct config_context *ctx" arg to config_fn_t to hold
additional information about the config iteration operation.
config_context has a "struct key_value_info kvi" member that holds
metadata about the config source being read (e.g. what kind of config
source it is, the filename, etc). In this series, we're only interested
in .kvi, so we could have just used "struct key_value_info" as an arg,
but config_context makes it possible to add/adjust members in the future
without changing the config_fn_t signature. We could also consider other
ways of organizing the args (e.g. moving the config name and value into
config_context or key_value_info), but in my experiments, the
incremental benefit doesn't justify the added complexity (e.g. a
config_fn_t will sometimes invoke another config_fn_t but with a
different config value).
In subsequent commits, the .kvi member will replace the global "struct
config_reader" in config.c, making config iteration a global-free
operation. It requires much more work for the machinery to provide
meaningful values of .kvi, so for now, merely change the signature and
call sites, pass NULL as a placeholder value, and don't rely on the arg
in any meaningful way.
Most of the changes are performed by
contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every
config_fn_t:
- Modifies the signature to accept "const struct config_context *ctx"
- Passes "ctx" to any inner config_fn_t, if needed
- Adds UNUSED attributes to "ctx", if needed
Most config_fn_t instances are easily identified by seeing if they are
called by the various config functions. Most of the remaining ones are
manually named in the .cocci patch. Manual cleanups are still needed,
but the majority of it is trivial; it's either adjusting config_fn_t
that the .cocci patch didn't catch, or adding forward declarations of
"struct config_context ctx" to make the signatures make sense.
The non-trivial changes are in cases where we are invoking a config_fn_t
outside of config machinery, and we now need to decide what value of
"ctx" to pass. These cases are:
- trace2/tr2_cfg.c:tr2_cfg_set_fl()
This is indirectly called by git_config_set() so that the trace2
machinery can notice the new config values and update its settings
using the tr2 config parsing function, i.e. tr2_cfg_cb().
- builtin/checkout.c:checkout_main()
This calls git_xmerge_config() as a shorthand for parsing a CLI arg.
This might be worth refactoring away in the future, since
git_xmerge_config() can call git_default_config(), which can do much
more than just parsing.
Handle them by creating a KVI_INIT macro that initializes "struct
key_value_info" to a reasonable default, and use that to construct the
"ctx" arg.
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The vast majority of files including object-store.h did not need dir.h
nor khash.h. Split the header into two files, and let most just depend
upon object-store-ll.h, while letting the two callers that need it
depend on the full object-store.h.
After this patch:
$ git grep -h include..object-store | sort | uniq -c
2 #include "object-store.h"
129 #include "object-store-ll.h"
Diff best viewed with `--color-moved`.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A long term (but rather minor) pet-peeve of mine was the name
ll-merge.[ch]. I thought it made it harder to realize what stuff was
related to merging when I was working on the merge machinery and trying
to improve it.
Further, back in d1cbe1e6d8a ("hash-ll.h: split out of hash.h to remove
dependency on repository.h", 2023-04-22), we have split the portions of
hash.h that do not depend upon repository.h into a "hash-ll.h" (due to
the recommendation to use "ll" for "low-level" in its name[1], but which
I used as a suffix precisely because of my distaste for "ll-merge").
When we discussed adding additional "*-ll.h" files, a request was made
that we use "ll" consistently as either a prefix or a suffix. Since it
is already in use as both a prefix and a suffix, the only way to do so
is to rename some files.
Besides my distaste for the ll-merge.[ch] name, let me also note that
the files
ll-fsmonitor.h, ll-hash.h, ll-merge.h, ll-object-store.h, ll-read-cache.h
would have essentially nothing to do with each other and make no sense
to group. But giving them the common "ll-" prefix would group them. Using
"-ll" as a suffix thus seems just much more logical to me. Rename
ll-merge.[ch] to merge-ll.[ch] to achieve this consistency, and to
ensure we get a more logical grouping of files.
[1] https://lore.kernel.org/git/kl6lsfcu1g8w.fsf@chooglen-macbookpro.roam.corp.google.com/
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since this header showed up in some places besides just #include
statements, update/clean-up/remove those other places as well.
Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got
away with violating the rule that all files must start with an include
of git-compat-util.h (or a short-list of alternate headers that happen
to include it first). This change exposed the violation and caused it
to stop building correctly; fix it by having it include
git-compat-util.h first, as per policy.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
For the functions defined in read-cache.c, move their declarations from
cache.h to a new header, read-cache-ll.h. Also move some related inline
functions from cache.h to read-cache.h. The purpose of the
read-cache-ll.h/read-cache.h split is that about 70% of the sites don't
need the inline functions and the extra headers they include.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Avoid the overhead of setting up a dictionary and passing it via
strbuf_expand() to strbuf_expand_dict_cb() by using strbuf_expand_step()
in a loop instead. It requires explicit handling of %% and unrecognized
placeholders, but is more direct and simpler overall, and expands only
on demand.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git --attr-source=<tree> cmd $args" is a new way to have any
command to read attributes not from the working tree but from the
given tree object.
* jc/attr-source-tree:
attr: teach "--attr-source=<tree>" global option to "git"
|
|
Earlier, 47cfc9bd (attr: add flag `--source` to work with tree-ish,
2023-01-14) taught "git check-attr" the "--source=<tree>" option to
allow it to read attribute files from a tree-ish, but did so only
for the command. Just like "check-attr" users wanted a way to use
attributes from a tree-ish and not from the working tree files,
users of other commands (like "git diff") would benefit from the
same.
Undo most of the UI change the commit made, while keeping the
internal logic to read attributes from a given tree-ish. Expose the
internal logic via a new "--attr-source=<tree>" command line option
given to "git", so that it can be used with any git command that
runs as part of the main git process.
Additionally, add an environment variable GIT_ATTR_SOURCE that is set
when --attr-source is passed in, so that subprocesses use the same value
for the attributes source tree.
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Dozens of files made use of advice functions, without explicitly
including advice.h. This made it more difficult to find which files
could remove a dependence on cache.h. Make C files explicitly include
advice.h if they are using it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Dozens of files made use of trace and trace2 functions, without
explicitly including trace.h or trace2.h. This made it more difficult
to find which files could remove a dependence on cache.h. Make C files
explicitly include trace.h or trace2.h if they are using them.
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Dozens of files made use of gettext functions, without explicitly
including gettext.h. This made it more difficult to find which files
could remove a dependence on cache.h. Make C files explicitly include
gettext.h if they are using it.
However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an
include of gettext.h, it was left out to avoid conflicting with an
in-flight topic.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The contents of the .gitattributes files may evolve over time, but "git
check-attr" always checks attributes against them in the working tree
and/or in the index. It may be beneficial to optionally allow the users
to check attributes taken from a commit other than HEAD against paths.
Add a new flag `--source` which will allow users to check the
attributes against a commit (actually any tree-ish would do). When the
user uses this flag, we go through the stack of .gitattributes files but
instead of checking the current working tree and/or in the index, we
check the blobs from the provided tree-ish object. This allows the
command to also be used in bare repositories.
Since we use a tree-ish object, the user can pass "--source
HEAD:subdirectory" and all the attributes will be looked up as if
subdirectory was the root directory of the repository.
We cannot simply use the `<rev>:<path>` syntax without the `--source`
flag, similar to how it is used in `git show` because any non-flag
parameter before `--` is treated as an attribute and any parameter after
`--` is treated as a pathname.
The change involves creating a new function `read_attr_from_blob`, which
given the path reads the blob for the path against the provided source and
parses the attributes line by line. This function is plugged into
`read_attr()` function wherein we go through the stack of attributes
files.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Toon Claes <toon@iotcl.com>
Co-authored-by: toon@iotcl.com
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The null stream filter unsurprisingly does not look at its "filter"
argument, since it just eats bytes. But we can't drop it, since it has
to conform to the same virtual interface that real filters do. Mark the
unused parameter to appease -Wunused-parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
As reported in [1] the "UNUSED(var)" macro introduced in
2174b8c75de (Merge branch 'jk/unused-annotation' into next,
2022-08-24) breaks coccinelle's parsing of our sources in files where
it occurs.
Let's instead partially go with the approach suggested in [2] of
making this not take an argument. As noted in [1] "coccinelle" will
ignore such tokens in argument lists that it doesn't know about, and
it's less of a surprise to syntax highlighters.
This undoes the "help us notice when a parameter marked as unused is
actually use" part of 9b240347543 (git-compat-util: add UNUSED macro,
2022-08-19), a subsequent commit will further tweak the macro to
implement a replacement for that functionality.
1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The start_async(), etc, functions need a "proc" callback that conforms
to a particular interface. Not every callback needs every parameter
(e.g., the caller might not even ask to open an input descriptor, in
which case there is no point in the callback looking at it). Let's mark
these for -Wunused-parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The callback passed to git_config() must conform to a particular
interface. But most callbacks don't actually look at the extra "void
*data" parameter. Let's mark the unused parameters to make
-Wunused-parameter happy.
Note there's one unusual case here in get_remote_default() where we
actually ignore the "value" parameter. That's because it's only checking
whether the option is found at all, and not parsing its value.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The warning about converting line endings is extremely confusing. Its
two sentences each use the word "will" without specifying a timeframe,
which makes it sound like both sentences are referring to the same
timeframe. On top of that, it uses the term "original line endings"
without saying whether "original" means LF or CRLF.
Rephrase the warning to be clear about when the line endings will be
changed and what they will be changed to.
On a platform whose native line endings are not CRLF (e.g. Linux), the
"git add" step in the following sequence triggers the warning in
question:
$ git config core.autocrlf true
$ echo 'Hello world!' >hello.txt
$ git add hello.txt
warning: LF will be replaced by CRLF in hello.txt
The file will have its original line endings in your working directory
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Object-file API shuffling.
* ab/object-file-api-updates:
object-file API: pass an enum to read_object_with_reference()
object-file.c: add a literal version of write_object_file_prepare()
object-file API: have hash_object_file() take "enum object_type"
object API: rename hash_object_file_literally() to write_*()
object-file API: split up and simplify check_object_signature()
object API users + docs: check <0, not !0 with check_object_signature()
object API docs: move check_object_signature() docs to cache.h
object API: correct "buf" v.s. "map" mismatch in *.c and *.h
object-file API: have write_object_file() take "enum object_type"
object-file API: add a format_object_header() function
object-file API: return "void", not "int" from hash_object_file()
object-file.c: split up declaration of unrelated variables
|
|
Change the hash_object_file() function to take an "enum
object_type".
Since a preceding commit all of its callers are passing either
"{commit,tree,blob,tag}_type", or the result of a call to type_name(),
the parse_object() caller that would pass NULL is now using
stream_object_signature().
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change the "struct stream_filter_vtbl" and "struct stream_filter"
assignments in convert.c to use designated initializers.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The clean/smudge conversion code path has been prepared to better
work on platforms where ulong is narrower than size_t.
* mc/clean-smudge-with-llp64:
clean/smudge: allow clean filters to process extremely large files
odb: guard against data loss checking out a huge file
git-compat-util: introduce more size_t helpers
odb: teach read_blob_entry to use size_t
t1051: introduce a smudge filter test for extremely large files
test-lib: add prerequisite for 64-bit platforms
test-tool genzeros: generate large amounts of data more efficiently
test-genzeros: allow more than 2G zeros in Windows
|
|
The filter system allows for alterations to file contents when they're
moved between the database and the worktree. We already made sure that
it is possible for smudge filters to produce contents that are larger
than `unsigned long` can represent (which matters on systems where
`unsigned long` is narrower than `size_t`, most notably 64-bit Windows).
Now we make sure that clean filters can _consume_ contents that are
larger than that.
Note that this commit only allows clean filters' _input_ to be larger
than can be represented by `unsigned long`.
This change makes only a very minute dent into the much larger project
to teach Git to use `size_t` instead of `unsigned long` wherever
appropriate.
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Matt Cooper <vtbassmatt@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
apply_multi_file_filter and async_query_available_blobs both query
subprocess output using subprocess_read_status, which writes data into
the identically named filter_status strbuf. We add a strbuf_release to
avoid leaking their contents.
Leak output seen when running t0021 with LSAN:
Direct leak of 24 byte(s) in 1 object(s) allocated from:
#0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
#1 0xa8c2b5 in xrealloc wrapper.c:126:8
#2 0x9ff99d in strbuf_grow strbuf.c:98:2
#3 0x9ff99d in strbuf_addbuf strbuf.c:304:2
#4 0xa101d6 in subprocess_read_status sub-process.c:45:5
#5 0x77793c in apply_multi_file_filter convert.c:886:8
#6 0x77793c in apply_filter convert.c:1042:10
#7 0x77a0b5 in convert_to_git_filter_fd convert.c:1492:7
#8 0x8b48cd in index_stream_convert_blob object-file.c:2156:2
#9 0x8b48cd in index_fd object-file.c:2248:9
#10 0x597411 in hash_fd builtin/hash-object.c:43:9
#11 0x596be1 in hash_object builtin/hash-object.c:59:2
#12 0x596be1 in cmd_hash_object builtin/hash-object.c:153:3
#13 0x4ce83e in run_builtin git.c:475:11
#14 0x4ccafe in handle_builtin git.c:729:3
#15 0x4cb01c in run_argv git.c:818:4
#16 0x4cb01c in cmd_main git.c:949:19
#17 0x6bdc2d in main common-main.c:52:11
#18 0x7f42acf79349 in __libc_start_main (/lib64/libc.so.6+0x24349)
SUMMARY: AddressSanitizer: 24 byte(s) leaked in 1 allocation(s).
Direct leak of 120 byte(s) in 5 object(s) allocated from:
#0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
#1 0xa8c295 in xrealloc wrapper.c:126:8
#2 0x9ff97d in strbuf_grow strbuf.c:98:2
#3 0x9ff97d in strbuf_addbuf strbuf.c:304:2
#4 0xa101b6 in subprocess_read_status sub-process.c:45:5
#5 0x775c73 in async_query_available_blobs convert.c:960:8
#6 0x80029d in finish_delayed_checkout entry.c:183:9
#7 0xa65d1e in check_updates unpack-trees.c:493:10
#8 0xa5f469 in unpack_trees unpack-trees.c:1747:8
#9 0x525971 in checkout builtin/clone.c:815:6
#10 0x525971 in cmd_clone builtin/clone.c:1409:8
#11 0x4ce83e in run_builtin git.c:475:11
#12 0x4ccafe in handle_builtin git.c:729:3
#13 0x4cb01c in run_argv git.c:818:4
#14 0x4cb01c in cmd_main git.c:949:19
#15 0x6bdc2d in main common-main.c:52:11
#16 0x7fa253fce349 in __libc_start_main (/lib64/libc.so.6+0x24349)
SUMMARY: AddressSanitizer: 120 byte(s) leaked in 5 allocation(s).
Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Builds on top of the sparse-index infrastructure to mark operations
that are not ready to mark with the sparse index, causing them to
fall back on fully-populated index that they always have worked with.
* ds/sparse-index-protections: (47 commits)
name-hash: use expand_to_path()
sparse-index: expand_to_path()
name-hash: don't add directories to name_hash
revision: ensure full index
resolve-undo: ensure full index
read-cache: ensure full index
pathspec: ensure full index
merge-recursive: ensure full index
entry: ensure full index
dir: ensure full index
update-index: ensure full index
stash: ensure full index
rm: ensure full index
merge-index: ensure full index
ls-files: ensure full index
grep: ensure full index
fsck: ensure full index
difftool: ensure full index
commit: ensure full index
checkout: ensure full index
...
|
|
Several methods specify that they take a 'struct index_state' pointer
with the 'const' qualifier because they intend to only query the data,
not change it. However, we will be introducing a step very low in the
method stack that might modify a sparse-index to become a full index in
the case that our queries venture inside a sparse-directory entry.
This change only removes the 'const' qualifiers that are necessary for
the following change which will actually modify the implementation of
index_name_stage_pos().
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A simple IPC interface gets introduced to build services like
fsmonitor on top.
* jh/simple-ipc:
t0052: add simple-ipc tests and t/helper/test-simple-ipc tool
simple-ipc: add Unix domain socket implementation
unix-stream-server: create unix domain socket under lock
unix-socket: disallow chdir() when creating unix domain sockets
unix-socket: add backlog size option to unix_stream_listen()
unix-socket: eliminate static unix_stream_socket() helper function
simple-ipc: add win32 implementation
simple-ipc: design documentation for new IPC mechanism
pkt-line: add options argument to read_packetized_to_strbuf()
pkt-line: add PACKET_READ_GENTLE_ON_READ_ERROR option
pkt-line: do not issue flush packets in write_packetized_*()
pkt-line: eliminate the need for static buffer in packet_write_gently()
|
|
Preparatory API changes for parallel checkout.
* mt/parallel-checkout-part-1:
entry: add checkout_entry_ca() taking preloaded conv_attrs
entry: move conv_attrs lookup up to checkout_entry()
entry: extract update_ce_after_write() from write_entry()
entry: make fstat_output() and read_blob_entry() public
entry: extract a header file for entry.c functions
convert: add classification for conv_attrs struct
convert: add get_stream_filter_ca() variant
convert: add [async_]convert_to_working_tree_ca() variants
convert: make convert_attrs() and convert structs public
|
|
We had a code to diagnose and die cleanly when a required
clean/smudge filter is missing, but an assert before that
unnecessarily fired, hiding the end-user facing die() message.
* mt/cleanly-die-upon-missing-required-filter:
convert: fail gracefully upon missing clean cmd on required filter
|
|
Create `enum conv_attrs_classification` to express the different ways
that attributes are handled for a blob during checkout.
This will be used in a later commit when deciding whether to add a file
to the parallel or delayed queue during checkout. For now, we can also
use it in get_stream_filter_ca() to simplify the function (as the
classifying logic is the same).
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Like the previous patch, we will also need to call get_stream_filter()
with a precomputed `struct conv_attrs`, when we add support for parallel
checkout workers. So add the _ca() variant which takes the conversion
attributes struct as a parameter.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Separate the attribute gathering from the actual conversion by adding
_ca() variants of the conversion functions. These variants receive a
precomputed 'struct conv_attrs', not relying, thus, on an index state.
They will be used in a future patch adding parallel checkout support,
for two reasons:
- We will already load the conversion attributes in checkout_entry(),
before conversion, to decide whether a path is eligible for parallel
checkout. Therefore, it would be wasteful to load them again later,
for the actual conversion.
- The parallel workers will be responsible for reading, converting and
writing blobs to the working tree. They won't have access to the main
process' index state, so they cannot load the attributes. Instead,
they will receive the preloaded ones and call the _ca() variant of
the conversion functions. Furthermore, the attributes machinery is
optimized to handle paths in sequential order, so it's better to leave
it for the main process, anyway.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Move convert_attrs() declaration from convert.c to convert.h, together
with the conv_attrs struct and the crlf_action enum. This function and
the data structures will be used outside convert.c in the upcoming
parallel checkout implementation. Note that crlf_action is renamed to
convert_crlf_action, which is more appropriate for the global namespace.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Update the calling sequence of `read_packetized_to_strbuf()` to take
an options argument and not assume a fixed set of options. Update the
only existing caller accordingly to explicitly pass the
formerly-assumed flags.
The `read_packetized_to_strbuf()` function calls `packet_read()` with
a fixed set of assumed options (`PACKET_READ_GENTLE_ON_EOF`). This
assumption has been fine for the single existing caller
`apply_multi_file_filter()` in `convert.c`.
In a later commit we would like to add other callers to
`read_packetized_to_strbuf()` that need a different set of options.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Remove the `packet_flush_gently()` call in `write_packetized_from_buf() and
`write_packetized_from_fd()` and require the caller to call it if desired.
Rename both functions to `write_packetized_from_*_no_flush()` to prevent
later merge accidents.
`write_packetized_from_buf()` currently only has one caller:
`apply_multi_file_filter()` in `convert.c`. It always wants a flush packet
to be written after writing the payload.
However, we are about to introduce a caller that wants to write many
packets before a final flush packet, so let's make the caller responsible
for emitting the flush packet.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add and apply a semantic patch for converting code that open-codes
CALLOC_ARRAY to use it instead. It shortens the code and infers the
element size automatically.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The gitattributes documentation mentions that either the clean cmd or
the smudge cmd can be left unspecified in a filter definition. However,
when the filter is marked as 'required', the absence of any one of these
two should be treated as an error. Git already fails under these
circumstances, but not always in a pleasant way: omitting a clean cmd in
a required filter triggers an assertion error which leaves the user with
a quite verbose message:
git: convert.c:1459: convert_to_git_filter_fd: Assertion "ca.drv->clean || ca.drv->process" failed.
This assertion is not really necessary, as the apply_filter() call below
it already performs the same check. And when this condition is not met,
the function returns 0, making the caller die() with a much nicer
message. (Also note that die()-ing here is the right behavior as
`would_convert_to_git_filter_fd() == true` is a precondition to use
convert_to_git_filter_fd(), and the former is only true when the filter
is required.) So remove the assertion and add two regression tests to
make sure that git fails nicely when either the smudge or clean command
is missing on a required filter.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The crlf_action parameter hasn't been used since a0ad53c181 (convert:
Correct NNO tests and missing `LF will be replaced by CRLF`,
2016-08-13), where that part of the function was hoisted out to a
separate will_convert_lf_to_crlf() helper. Let's drop the useless
parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The child_process structure has an embedded strvec for formulating
the command line argument list these days, but code that predates
the wide use of it prepared a separate char *argv[] array and
manually set the child_process.argv pointer point at it.
Teach these old-style code to lose the separate argv[] array.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We return the length to a subset of a string using an "int *"
out-parameter. This is fine most of the time, as we'd expect config keys
to be relatively short, but it could behave oddly if we had a gigantic
config key. A more appropriate type is size_t.
Let's switch over, which lets our callers use size_t as appropriate
(they are bound by our type because they must pass the out-parameter as
a pointer). This is mostly just a cleanup to make it clear this code
handles long strings correctly. In practice, our config parser already
chokes on long key names (because of a similar int/size_t mixup!).
When doing an int/size_t conversion, we have to be careful that nobody
was trying to assign a negative value to the variable. I manually
confirmed that for each case here. They tend to just feed the result to
xmemdupz() or similar; in a few cases I adjusted the parameter types for
helper functions to make sure the size_t is preserved.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Now that we have the codebase wired up to pass any additional metadata
to filters, let's collect the additional metadata that we'd like to
pass.
The two main places we pass this metadata are checkouts and archives.
In these two situations, reading HEAD isn't a valid option, since HEAD
isn't updated for checkouts until after the working tree is written and
archives can accept an arbitrary tree. In other situations, HEAD will
usually reflect the refname of the branch in current use.
We pass a smaller amount of data in other cases, such as git cat-file,
where we can really only logically know about the blob.
This commit updates only the parts of the checkout code where we don't
use unpack_trees. That function and callers of it will be handled in a
future commit.
In the archive code, we leak a small amount of memory, since nothing we
pass in the archiver argument structure is freed.
Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
There are a variety of situations where a filter process can make use of
some additional metadata. For example, some people find the ident
filter too limiting and would like to include the commit or the branch
in their smudged files. This information isn't available during
checkout as HEAD hasn't been updated at that point, and it wouldn't be
available in archives either.
Let's add a way to pass this metadata down to the filter. We pass the
blob we're operating on, the treeish (preferring the commit over the
tree if one exists), and the ref we're operating on. Note that we won't
pass this information in all cases, such as when renormalizing or when
we're performing diffs, since it doesn't make sense in those cases.
The data we currently get from the filter process looks like the
following:
command=smudge
pathname=git.c
0000
With this change, we'll get data more like this:
command=smudge
pathname=git.c
refname=refs/tags/v2.25.1
treeish=c522f061d551c9bb8684a7c3859b2ece4499b56b
blob=7be7ad34bd053884ec48923706e70c81719a8660
0000
There are a couple things to note about this approach. For operations
like checkout, treeish will always be a commit, since we cannot check
out individual trees, but for other operations, like archive, we can end
up operating on only a particular tree, so we'll provide only a tree as
the treeish. Similar comments apply for refname, since there are a
variety of cases in which we won't have a ref.
This commit wires up the code to print this information, but doesn't
pass any of it at this point. In a future commit, we'll have various
code paths pass the actual useful data down.
Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Some codepaths were given a repository instance as a parameter to
work in the repository, but passed the_repository instance to its
callees, which has been cleaned up (somewhat).
* mt/use-passed-repo-more-in-funcs:
sha1-file: allow check_object_signature() to handle any repo
sha1-file: pass git_hash_algo to hash_object_file()
sha1-file: pass git_hash_algo to write_object_file_prepare()
streaming: allow open_istream() to handle any repo
pack-check: use given repo's hash_algo at verify_packfile()
cache-tree: use given repo's hash_algo at verify_one()
diff: make diff_populate_filespec() honor its repo argument
|
|
Typofix.
* js/convert-typofix:
convert: fix typo
|
|
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Allow hash_object_file() to work on arbitrary repos by introducing a
git_hash_algo parameter. Change callers which have a struct repository
pointer in their scope to pass on the git_hash_algo from the said repo.
For all other callers, pass on the_hash_algo, which was already being
used internally at hash_object_file(). This functionality will be used
in the following patch to make check_object_signature() be able to work
on arbitrary repos (which, in turn, will be used to fix an
inconsistency at object.c:parse_object()).
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code simplification.
* rs/skip-iprefix:
convert: use skip_iprefix() in validate_encoding()
utf8: use skip_iprefix() in same_utf_encoding()
|
|
Use skip_iprefix() to parse "UTF" case-insensitively instead of checking
with istarts_with(), building an upper-case version and then using
skip_prefix() on it. This gets rid of duplicate code and of a small
allocation.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Reported-by: Jens Schleusener <Jens.Schleusener@fossies.org>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The code to skip "UTF" and "UTF-" prefix, when computing an advice
message, did not work correctly when the prefix was "UTF", which
has been fixed.
* rs/convert-fix-utf-without-dash:
convert: fix handling of dashless UTF prefix in validate_encoding()
|
|
Strip "UTF" and an optional dash from the start of 'upper' without
passing a NULL pointer to skip_prefix() in the second call, as it cannot
handle that.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When applying multiple patches with git am, or when rebasing using the
am backend, it's possible that one of our patches has updated a
gitattributes file. Currently, we cache this information, so if a
file in a subsequent patch has attributes applied, the file will be
written out with the attributes in place as of the time we started the
rebase or am operation, not with the attributes applied by the previous
patch. This problem does not occur when using the -m or -i flags to
rebase.
To ensure we write the correct data into the working tree, expire the
cache after each patch that touches a path ending in ".gitattributes".
Since we load these attributes in multiple separate files, we must
expire them accordingly.
Verify that both the am and rebase code paths work correctly, including
the conflict marker size with am -3.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When the "clean" filter can reduce the size of a huge file in the
working tree down to a small "token" (a la Git LFS), there is no
point in allocating a huge scratch area upfront, but the buffer is
sized based on the original file size. The convert mechanism now
allocates very minimum and reallocates as it receives the output
from the clean filter process.
* jh/resize-convert-scratch-buffer:
convert: avoid malloc of original file size
|
|
We write the output of a "clean" filter into a strbuf. Rather than
growing the strbuf dynamically as we read its output, we make the
initial allocation as large as the original input file. This is a good
guess when the filter is just tweaking a few bytes, but it's disastrous
when the point of the filter is to condense a very large file into a
short identifier (e.g., the way git-lfs and git-annex do). We may ask to
allocate many gigabytes, causing the allocation to fail and Git to
die().
Instead, let's just let strbuf do its usual growth.
When the clean filter does output something around the same size as the
worktree file, the buffer will need to be reallocated until it fits,
starting at 8192 and doubling in size. Benchmarking indicates that
reallocation is not a significant overhead for outputs up to a
few MB in size.
Signed-off-by: Joey Hess <id@joeyh.name>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code cleanup.
* jk/unused-parameter-cleanup:
convert: drop path parameter from actual conversion functions
convert: drop len parameter from conversion checks
config: drop unused parameter from maybe_remove_section()
show_date_relative(): drop unused "tz" parameter
column: drop unused "opts" parameter in item_length()
create_bundle(): drop unused "header" parameter
apply: drop unused "def" parameter from find_name_gnu()
match-trees: drop unused path parameter from score functions
|
|
The assumption to work on the single "in-core index" instance has
been reduced from the library-ish part of the codebase.
* nd/the-index-final:
cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch
read-cache.c: remove the_* from index_has_changes()
merge-recursive.c: remove implicit dependency on the_repository
merge-recursive.c: remove implicit dependency on the_index
sha1-name.c: remove implicit dependency on the_index
read-cache.c: replace update_index_if_able with repo_&
read-cache.c: kill read_index()
checkout: avoid the_index when possible
repository.c: replace hold_locked_index() with repo_hold_locked_index()
notes-utils.c: remove the_repository references
grep: use grep_opt->repo instead of explict repo argument
|
|
The caller is responsible for looking up the attributes,
after which point we no longer care about the path at which
the content is found.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We've already extracted the useful information into our text_stat
struct, so the length is no longer needed.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
By default, index compat macros are off from now on, because they
could hide the_index dependency.
Only those in builtin can use it.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code clean-up.
* nd/style-opening-brace:
style: the opening '{' of a function is in a separate line
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We indent with TABs and sometimes for fine alignment, TABs followed by
spaces, but never all spaces (unless the indentation is less than 8
columns). Indenting with spaces slips through in some places. Fix
them.
Imported code and compat/ are left alone on purpose. The former should
remain as close as upstream as possible. The latter pretty much has
separate maintainers, it's up to them to decide.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
git_check_attr() returns always 0.
Remove all the error handling code of the callers, which is never executed.
Change git_check_attr() to be a void function.
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The more library-ish parts of the codebase learned to work on the
in-core index-state instance that is passed in by their callers,
instead of always working on the singleton "the_index" instance.
* nd/no-the-index: (24 commits)
blame.c: remove implicit dependency on the_index
apply.c: remove implicit dependency on the_index
apply.c: make init_apply_state() take a struct repository
apply.c: pass struct apply_state to more functions
resolve-undo.c: use the right index instead of the_index
archive-*.c: use the right repository
archive.c: avoid access to the_index
grep: use the right index instead of the_index
attr: remove index from git_attr_set_direction()
entry.c: use the right index instead of the_index
submodule.c: use the right index instead of the_index
pathspec.c: use the right index instead of the_index
unpack-trees: avoid the_index in verify_absent()
unpack-trees: convert clear_ce_flags* to avoid the_index
unpack-trees: don't shadow global var the_index
unpack-trees: add a note about path invalidation
unpack-trees: remove 'extern' on function declaration
ls-files: correct index argument to get_convert_attr_ascii()
preload-index.c: use the right index instead of the_index
dir.c: remove an implicit dependency on the_index in pathspec code
...
|
|
Code clean-up to use size_t/ssize_t when they are the right type.
* jk/size-t:
strbuf_humanise: use unsigned variables
pass st.st_size as hint for strbuf_readlink()
strbuf_readlink: use ssize_t
strbuf: use size_t for length in intermediate variables
reencode_string: use size_t for string lengths
reencode_string: use st_add/st_mult helpers
|
|
Many more strings are prepared for l10n.
* nd/i18n: (23 commits)
transport-helper.c: mark more strings for translation
transport.c: mark more strings for translation
sha1-file.c: mark more strings for translation
sequencer.c: mark more strings for translation
replace-object.c: mark more strings for translation
refspec.c: mark more strings for translation
refs.c: mark more strings for translation
pkt-line.c: mark more strings for translation
object.c: mark more strings for translation
exec-cmd.c: mark more strings for translation
environment.c: mark more strings for translation
dir.c: mark more strings for translation
convert.c: mark more strings for translation
connect.c: mark more strings for translation
config.c: mark more strings for translation
commit-graph.c: mark more strings for translation
builtin/replace.c: mark more strings for translation
builtin/pack-objects.c: mark more strings for translation
builtin/grep.c: mark strings for translation
builtin/config.c: mark more strings for translation
...
|
|
Make the convert API take an index_state instead of assuming the_index
in convert.c. All external call sites are converted blindly to keep
the patch simple and retain current behavior. Individual call sites
may receive further updates to use the right index instead of
the_index.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Make the attr API take an index_state instead of assuming the_index in
attr code. All call sites are converted blindly to keep the patch
simple and retain current behavior. Individual call sites may receive
further updates to use the right index instead of the_index.
There is one ugly temporary workaround added in attr.c that needs some
more explanation.
Commit c24f3abace (apply: file commited with CRLF should roundtrip
diff and apply - 2017-08-19) forces one convert_to_git() call to NOT
read the index at all. But what do you know, we read it anyway by
falling back to the_index. When "istate" from convert_to_git is now
propagated down to read_attr_from_array() we will hit segfault
somewhere inside read_blob_data_from_index.
The right way of dealing with this is to kill "use_index" variable and
only follow "istate" but at this stage we are not ready for that:
while most git_attr_set_direction() calls just passes the_index to be
assigned to use_index, unpack-trees passes a different one which is
used by entry.c code, which has no way to know what index to use if we
delete use_index. So this has to be done later.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The codebase has been updated to compile cleanly with -pedantic
option.
* bb/pedantic:
utf8.c: avoid char overflow
string-list.c: avoid conversion from void * to function pointer
sequencer.c: avoid empty statements at top level
convert.c: replace "\e" escapes with "\033".
fixup! refs/refs-internal.h: avoid forward declaration of an enum
refs/refs-internal.h: avoid forward declaration of an enum
fixup! connect.h: avoid forward declaration of an enum
connect.h: avoid forward declaration of an enum
|
|
The iconv interface takes a size_t, which is the appropriate
type for an in-memory buffer. But our reencode_string_*
functions use integers, meaning we may get confusing results
when the sizes exceed INT_MAX. Let's use size_t
consistently.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Many messages will be marked for translation in the following
commits. This commit updates some of them to be more consistent and
reduce diff noise in those commits. Changes are
- keep the first letter of die(), error() and warning() in lowercase
- no full stop in die(), error() or warning() if it's single sentence
messages
- indentation
- some messages are turned to BUG(), or prefixed with "BUG:" and will
not be marked for i18n
- some messages are improved to give more information
- some messages are broken down by sentence to be i18n friendly
(on the same token, combine multiple warning() into one big string)
- the trailing \n is converted to printf_ln if possible, or deleted
if not redundant
- errno_errno() is used instead of explicit strerror()
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The conversion to pass "the_repository" and then "a_repository"
throughout the object access API continues.
* sb/object-store-grafts:
commit: allow lookup_commit_graft to handle arbitrary repositories
commit: allow prepare_commit_graft to handle arbitrary repositories
shallow: migrate shallow information into the object parser
path.c: migrate global git_path_* to take a repository argument
cache: convert get_graft_file to handle arbitrary repositories
commit: convert read_graft_file to handle arbitrary repositories
commit: convert register_commit_graft to handle arbitrary repositories
commit: convert commit_graft_pos() to handle arbitrary repositories
shallow: add repository argument to is_repository_shallow
shallow: add repository argument to check_shallow_file_for_update
shallow: add repository argument to register_shallow
shallow: add repository argument to set_alternate_shallow_file
commit: add repository argument to lookup_commit_graft
commit: add repository argument to prepare_commit_graft
commit: add repository argument to read_graft_file
commit: add repository argument to register_commit_graft
commit: add repository argument to commit_graft_pos
object: move grafts to object parser
object-store: move object access functions to object-store.h
|
|
The "\e" escape is not defined in ISO C.
While on this line, add a missing space after the comma.
Signed-off-by: Beat Bolli <dev+git@drbeat.li>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This should make these functions easier to find and cache.h less
overwhelming to read.
In particular, this moves:
- read_object_file
- oid_object_info
- write_object_file
As a result, most of the codebase needs to #include object-store.h.
In this patch the #include is only added to files that would fail to
compile otherwise. It would be better to #include wherever
identifiers from the header are used. That can happen later
when we have better tooling for it.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The new "checkout-encoding" attribute can ask Git to convert the
contents to the specified encoding when checking out to the working
tree (and the other way around when checking in).
* ls/checkout-encoding:
convert: add round trip check based on 'core.checkRoundtripEncoding'
convert: add tracing for 'working-tree-encoding' attribute
convert: check for detectable errors in UTF encodings
convert: add 'working-tree-encoding' attribute
utf8: add function to detect a missing UTF-16/32 BOM
utf8: add function to detect prohibited UTF-16/32 BOM
utf8: teach same_encoding() alternative UTF encoding names
strbuf: add a case insensitive starts_with()
strbuf: add xstrdup_toupper()
strbuf: remove unnecessary NUL assignment in xstrdup_tolower()
|
|
UTF supports lossless conversion round tripping and conversions between
UTF and other encodings are mostly round trip safe as Unicode aims to be
a superset of all other character encodings. However, certain encodings
(e.g. SHIFT-JIS) are known to have round trip issues [1].
Add 'core.checkRoundtripEncoding', which contains a comma separated
list of encodings, to define for what encodings Git should check the
conversion round trip if they are used in the 'working-tree-encoding'
attribute.
Set SHIFT-JIS as default value for 'core.checkRoundtripEncoding'.
[1] https://support.microsoft.com/en-us/help/170559/prb-conversion-problem-between-shift-jis-and-unicode
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add the GIT_TRACE_WORKING_TREE_ENCODING environment variable to enable
tracing for content that is reencoded with the 'working-tree-encoding'
attribute. This is useful to debug encoding issues.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Check that new content is valid with respect to the user defined
'working-tree-encoding' attribute.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Git recognizes files encoded with ASCII or one of its supersets (e.g.
UTF-8 or ISO-8859-1) as text files. All other encodings are usually
interpreted as binary and consequently built-in Git text processing
tools (e.g. 'git diff') as well as most Git web front ends do not
visualize the content.
Add an attribute to tell Git what encoding the user has defined for a
given file. If the content is added to the index, then Git reencodes
the content to a canonical UTF-8 representation. On checkout Git will
reverse this operation.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Convert convert.c to struct object_id. Add a use of the_hash_algo to
replace hard-coded constants and change a strbuf_add to a strbuf_addstr
to avoid another hard-coded constant.
Note that a strict conversion using the hexsz constant would cause
problems in the future if the internal and user-visible hash algorithms
differed, as anticipated by the hash function transition plan.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Conversion from uchar[20] to struct object_id continues.
* po/object-id:
sha1_file: rename hash_sha1_file_literally
sha1_file: convert write_loose_object to object_id
sha1_file: convert force_object_loose to object_id
sha1_file: convert write_sha1_file to object_id
notes: convert write_notes_tree to object_id
notes: convert combine_notes_* to object_id
commit: convert commit_tree* to object_id
match-trees: convert splice_tree to object_id
cache: clear whole hash buffer with oidclr
sha1_file: convert hash_sha1_file to object_id
dir: convert struct sha1_stat to use object_id
sha1_file: convert pretend_sha1_file to object_id
|
|
Convert the declaration and definition of hash_sha1_file to use
struct object_id and adjust all function calls.
Rename this function to hash_object_file.
Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When calling convert_to_git(), the checksafe parameter defined what
should happen if the EOL conversion (CRLF --> LF --> CRLF) does not
roundtrip cleanly. In addition, it also defined if line endings should
be renormalized (CRLF --> LF) or kept as they are.
checksafe was an safe_crlf enum with these values:
SAFE_CRLF_FALSE: do nothing in case of EOL roundtrip errors
SAFE_CRLF_FAIL: die in case of EOL roundtrip errors
SAFE_CRLF_WARN: print a warning in case of EOL roundtrip errors
SAFE_CRLF_RENORMALIZE: change CRLF to LF
SAFE_CRLF_KEEP_CRLF: keep all line endings as they are
In some cases the integer value 0 was passed as checksafe parameter
instead of the correct enum value SAFE_CRLF_FALSE. That was no problem
because SAFE_CRLF_FALSE is defined as 0.
FALSE/FAIL/WARN are different from RENORMALIZE and KEEP_CRLF. Therefore,
an enum is not ideal. Let's use a integer bit pattern instead and rename
the parameter to conv_flags to make it more generically usable. This
allows us to extend the bit pattern in a subsequent commit.
Reported-By: Randall S. Becker <rsbecker@nexbridge.com>
Helped-By: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The "safe crlf" check incorrectly triggered for contents that does
not use CRLF as line endings, which has been corrected.
* tb/check-crlf-for-safe-crlf:
t0027: Adapt the new MIX tests to Windows
convert: tighten the safe autocrlf handling
|
|
When a text file had been commited with CRLF and the file is commited
again, the CRLF are kept if .gitattributs has "text=auto".
This is done by analyzing the content of the blob stored in the index:
If a '\r' is found, Git assumes that the blob was commited with CRLF.
The simple search for a '\r' does not always work as expected:
A file is encoded in UTF-16 with CRLF and commited. Git treats it as binary.
Now the content is converted into UTF-8. At the next commit Git treats the
file as text, the CRLF should be converted into LF, but isn't.
Replace has_cr_in_index() with has_crlf_in_index(). When no '\r' is found,
0 is returned directly, this is the most common case.
If a '\r' is found, the content is analyzed more deeply.
Reported-By: Ashish Negi <ashishnegi33@gmail.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Assorted bugfixes and clean-ups.
* ma/ts-cleanups:
ThreadSanitizer: add suppressions
strbuf_setlen: don't write to strbuf_slopbuf
pack-objects: take lock before accessing `remaining`
convert: always initialize attr_action in convert_attrs
|
|
Gcc 7 adds -Wimplicit-fallthrough, which can warn when a
switch case falls through to the next case. The general idea
is that the compiler can't tell if this was intentional or
not, so you should annotate any intentional fall-throughs as
such, leaving it to complain about any unannotated ones.
There's a GNU __attribute__ which can be used for
annotation, but of course we'd have to #ifdef it away on
non-gcc compilers. Gcc will also recognize
specially-formatted comments, which matches our current
practice. Let's extend that practice to all of the
unannotated sites (which I did look over and verify that
they were behaving as intended).
Ideally in each case we'd actually give some reasons in the
comment about why we're falling through, or what we're
falling through to. And gcc does support that with
-Wimplicit-fallthrough=2, which relaxes the comment pattern
matching to anything that contains "fallthrough" (or a
variety of spelling variants). However, this isn't the
default for -Wimplicit-fallthrough, nor for -Wextra. In the
name of simplicity, it's probably better for us to support
the default level, which requires "fallthrough" to be the
only thing in the comment (modulo some window dressing like
"else" and some punctuation; see the gcc manual for the
complete set of patterns).
This patch suppresses all warnings due to
-Wimplicit-fallthrough. We might eventually want to add that
to the DEVELOPER Makefile knob, but we should probably wait
until gcc 7 is more widely adopted (since earlier versions
will complain about the unknown warning type).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Many leaks of strbuf have been fixed.
* rs/strbuf-leakfix: (34 commits)
wt-status: release strbuf after use in wt_longstatus_print_tracking()
wt-status: release strbuf after use in read_rebase_todolist()
vcs-svn: release strbuf after use in end_revision()
utf8: release strbuf on error return in strbuf_utf8_replace()
userdiff: release strbuf after use in userdiff_get_textconv()
transport-helper: release strbuf after use in process_connect_service()
sequencer: release strbuf after use in save_head()
shortlog: release strbuf after use in insert_one_record()
sha1_file: release strbuf on error return in index_path()
send-pack: release strbuf on error return in send_pack()
remote: release strbuf after use in set_url()
remote: release strbuf after use in migrate_file()
remote: release strbuf after use in read_remote_branches()
refs: release strbuf on error return in write_pseudoref()
notes: release strbuf after use in notes_copy_from_stdin()
merge: release strbuf after use in write_merge_heads()
merge: release strbuf after use in save_state()
mailinfo: release strbuf on error return in handle_boundary()
mailinfo: release strbuf after use in handle_from()
help: release strbuf on error return in exec_woman_emacs()
...
|
|
Assorted bugfixes and clean-ups.
* ma/ts-cleanups:
ThreadSanitizer: add suppressions
strbuf_setlen: don't write to strbuf_slopbuf
pack-objects: take lock before accessing `remaining`
convert: always initialize attr_action in convert_attrs
|
|
"git apply" that is used as a better "patch -p1" failed to apply a
taken from a file with CRLF line endings to a file with CRLF line
endings. The root cause was because it misused convert_to_git()
that tried to do "safe-crlf" processing by looking at the index
entry at the same path, which is a nonsense---in that mode, "apply"
is not working on the data in (or derived from) the index at all.
This has been fixed.
* tb/apply-with-crlf:
apply: file commited with CRLF should roundtrip diff and apply
convert: add SAFE_CRLF_KEEP_CRLF
|
|
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
"git apply" that is used as a better "patch -p1" failed to apply a
taken from a file with CRLF line endings to a file with CRLF line
endings. The root cause was because it misused convert_to_git()
that tried to do "safe-crlf" processing by looking at the index
entry at the same path, which is a nonsense---in that mode, "apply"
is not working on the data in (or derived from) the index at all.
This has been fixed.
* tb/apply-with-crlf:
apply: file commited with CRLF should roundtrip diff and apply
convert: add SAFE_CRLF_KEEP_CRLF
|
|
convert_attrs contains an "if-else". In the "if", we set attr_action
twice, and the first assignment has no effect. In the "else", we do not
set it at all. Since git_check_attr always returns the same value, we'll
always end up in the "if", so there is no problem right now. But
convert_attrs is obviously trying not to rely on such an
implementation-detail of another component.
Make the initialization of attr_action after the if-else. Remove the
earlier assignments.
Suggested-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When convert_to_git() is called, the caller may want to keep CRLF to
be kept as CRLF (and not converted into LF).
This will be used in the next commit, when apply works with files
that have CRLF and patches are applied onto these files.
Add the new value "SAFE_CRLF_KEEP_CRLF" to safe_crlf.
Prepare convert_to_git() to be able to run the clean filter, skip
the CRLF conversion and run the ident filter.
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code cleanup.
* jt/subprocess-handshake:
sub-process: refactor handshake to common function
Documentation: migrate sub-process docs to header
|
|
Many uses of comparision callback function the hashmap API uses
cast the callback function type when registering it to
hashmap_init(), which defeats the compile time type checking when
the callback interface changes (e.g. gaining more parameters).
The callback implementations have been updated to take "void *"
pointers and cast them to the type they expect instead.
* sb/hashmap-cleanup:
t/helper/test-hashmap: use custom data instead of duplicate cmp functions
name-hash.c: drop hashmap_cmp_fn cast
submodule-config.c: drop hashmap_cmp_fn cast
remote.c: drop hashmap_cmp_fn cast
patch-ids.c: drop hashmap_cmp_fn cast
convert/sub-process: drop cast to hashmap_cmp_fn
config.c: drop hashmap_cmp_fn cast
builtin/describe: drop hashmap_cmp_fn cast
builtin/difftool.c: drop hashmap_cmp_fn cast
attr.c: drop hashmap_cmp_fn cast
|
|
The filter-process interface learned to allow a process with long
latency give a "delayed" response.
* ls/filter-process-delayed:
convert: add "status=delayed" to filter process protocol
convert: refactor capabilities negotiation
convert: move multiple file filter error handling to separate function
convert: put the flags field before the flag itself for consistent style
t0021: write "OUT <size>" only on success
t0021: make debug log file name configurable
t0021: keep filter log files on comparison
|
|
Refactor, into a common function, the version and capability negotiation
done when invoking a long-running process as a clean or smudge filter.
This will be useful for other Git code that needs to interact similarly
with a long-running process.
As you can see in the change to t0021, this commit changes the error
message reported when the long-running process does not introduce itself
with the expected "server"-terminated line. Originally, the error
message reports that the filter "does not support filter protocol
version 2", differentiating between the old single-file filter protocol
and the new multi-file filter protocol - I have updated it to something
more generic and useful.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* ls/filter-process-delayed:
convert: add "status=delayed" to filter process protocol
convert: refactor capabilities negotiation
convert: move multiple file filter error handling to separate function
convert: put the flags field before the flag itself for consistent style
t0021: write "OUT <size>" only on success
t0021: make debug log file name configurable
t0021: keep filter log files on comparison
|
|
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Some `clean` / `smudge` filters may require a significant amount of
time to process a single blob (e.g. the Git LFS smudge filter might
perform network requests). During this process the Git checkout
operation is blocked and Git needs to wait until the filter is done to
continue with the checkout.
Teach the filter process protocol, introduced in edcc8581 ("convert: add
filter.<driver>.process option", 2016-10-16), to accept the status
"delayed" as response to a filter request. Upon this response Git
continues with the checkout operation. After the checkout operation Git
calls "finish_delayed_checkout" which queries the filter for remaining
blobs. If the filter is still working on the completion, then the filter
is expected to block. If the filter has completed all remaining blobs
then an empty response is expected.
Git has a multiple code paths that checkout a blob. Support delayed
checkouts only in `clone` (in unpack-trees.c) and `checkout` operations
for now. The optimization is most effective in these code paths as all
files of the tree are processed.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The code to negotiate long running filter capabilities was very
repetitive for new capabilities. Replace the repetitive conditional
statements with a table-driven approach. This is useful for the
subsequent patch 'convert: add "status=delayed" to filter process
protocol'.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When using the hashmap a common need is to have access to caller provided
data in the compare function. A couple of times we abuse the keydata field
to pass in the data needed. This happens for example in patch-ids.c.
This patch changes the function signature of the compare function
to have one more void pointer available. The pointer given for each
invocation of the compare function must be defined in the init function
of the hashmap and is just passed through.
Documentation of this new feature is deferred to a later patch.
This is a rather mechanical conversion, just adding the new pass-through
parameter. However while at it improve the naming of the fields of all
compare functions used by hashmaps by ensuring unused parameters are
prefixed with 'unused_' and naming the parameters what they are (instead
of 'unused' make it 'unused_keydata').
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Refactoring the filter error handling is useful for the subsequent patch
'convert: add "status=delayed" to filter process protocol'.
In addition, replace the parentheses around the empty "if" block with a
single semicolon to adhere to the Git style guide.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.
* bw/config-h:
config: don't implicitly use gitdir or commondir
config: respect commondir
setup: teach discover_git_directory to respect the commondir
config: don't include config.h by default
config: remove git_config_iter
config: create config.h
|
|
Stop including config.h by default in cache.h. Instead only include
config.h in those files which require use of the config system.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Enable sub-processes to gracefully handle when the process dies by
updating subprocess_read_status to return an error on EOF instead of
dying.
Update apply_multi_file_filter to take advantage of the revised
subprocess_read_status.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Move the sub-proces functions into sub-process.h/c. Add documentation
for the new module in Documentation/technical/api-sub-process.txt
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Do a mechanical rename of the functions that will become the reusable
sub-process module.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Update all functions that are going to be moved into a reusable module
so that they only work with the reusable data structures. Move code
that is specific to the filter out into the filter specific functions.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
To enable future reuse of the filter.<driver>.process infrastructure,
split the cmd2process structure into two separate parts.
subprocess_entry will now contain the generic data required to manage
the creation and tracking of the child process in a hashmap.
cmd2process is a filter protocol specific structure that is used to
track the negotiated capabilities of the filter.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
To enable future reuse of the filter.<driver>.process infrastructure,
split start_multi_file_filter() into two separate parts.
start_multi_file_filter() will now only contain the generic logic to
manage the creation and tracking of the child process in a hashmap.
start_multi_file_filter_fn() is a protocol specific initialization
function that will negotiate the multi-file-filter interface version
and capabilities.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add packet_writel() which writes multiple lines in a single call and
then calls packet_flush_gently(). Update convert.c to use the new
packet_writel() function from pkt-line.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
start_multi_file_filter() and apply_multi_file_filter() currently test
for errno == EPIPE but treating EPIPE as an error is already happening
from one of the packet_write() functions.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Found/Fixed-by: Jeff King <peff@peff.net>
Acked-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The remaining callers are all simple "I have N attributes I am
interested in. I'll ask about them with various paths one by one".
After this step, no caller to git_check_attrs() remains. After
removing it, we can extend "struct attr_check" struct with data
that can be used in optimizing the query for the specific N
attributes it contains.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The traditional API to check attributes is to prepare an N-element
array of "struct git_attr_check" and pass N and the array to the
function "git_check_attr()" as arguments.
In preparation to revamp the API to pass a single structure, in
which these N elements are held, rename the type used for these
individual array elements to "struct attr_check_item" and rename
the function to "git_check_attrs()".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Fix a corner case in merge-recursive regression that crept in
during 2.10 development cycle.
* jc/renormalize-merge-kill-safer-crlf:
convert: git cherry-pick -Xrenormalize did not work
merge-recursive: handle NULL in add_cacheinfo() correctly
cherry-pick: demonstrate a segmentation fault
|
|
Working with a repo that used to be all CRLF. At some point it
was changed to all LF, with `text=auto` in .gitattributes.
Trying to cherry-pick a commit from before the switchover fails:
$ git cherry-pick -Xrenormalize <commit>
fatal: CRLF would be replaced by LF in [path]
Commit 65237284 "unify the "auto" handling of CRLF" introduced
a regression:
Whenever crlf_action is CRLF_TEXT_XXX and not CRLF_AUTO_XXX,
SAFE_CRLF_RENORMALIZE was feed into check_safe_crlf(). This is
wrong because here everything else than SAFE_CRLF_WARN is treated as
SAFE_CRLF_FAIL.
Call check_safe_crlf() only if checksafe is SAFE_CRLF_WARN or
SAFE_CRLF_FAIL.
Reported-by: Eevee (Lexy Munroe) <eevee@veekun.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The smudge/clean filter API expect an external process is spawned
to filter the contents for each path that has a filter defined. A
new type of "process" filter API has been added to allow the first
request to run the filter for a path to spawn a single process, and
all filtering need is served by this single process for multiple
paths, reducing the process creation overhead.
* ls/filter-process:
contrib/long-running-filter: add long running filter example
convert: add filter.<driver>.process option
convert: prepare filter.<driver>.process option
convert: make apply_filter() adhere to standard Git error handling
pkt-line: add functions to read/write flush terminated packet streams
pkt-line: add packet_write_gently()
pkt-line: add packet_flush_gently()
pkt-line: add packet_write_fmt_gently()
pkt-line: extract set_packet_header()
pkt-line: rename packet_write() to packet_write_fmt()
run-command: add clean_on_exit_handler
run-command: move check_pipe() from write_or_die to run_command
convert: modernize tests
convert: quote filter names in error messages
|
|
Mark error messages about CRLF for translation.
Update test to reflect changes.
Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Git's clean/smudge mechanism invokes an external filter process for
every single blob that is affected by a filter. If Git filters a lot of
blobs then the startup time of the external filter processes can become
a significant part of the overall Git execution time.
In a preliminary performance test this developer used a clean/smudge
filter written in golang to filter 12,000 files. This process took 364s
with the existing filter mechanism and 5s with the new mechanism. See
details here: https://github.com/github/git-lfs/pull/1382
This patch adds the `filter.<driver>.process` string option which, if
used, keeps the external filter process running and processes all blobs
with the packet format (pkt-line) based protocol over standard input and
standard output. The full protocol is explained in detail in
`Documentation/gitattributes.txt`.
A few key decisions:
* The long running filter process is referred to as filter protocol
version 2 because the existing single shot filter invocation is
considered version 1.
* Git sends a welcome message and expects a response right after the
external filter process has started. This ensures that Git will not
hang if a version 1 filter is incorrectly used with the
filter.<driver>.process option for version 2 filters. In addition,
Git can detect this kind of error and warn the user.
* The status of a filter operation (e.g. "success" or "error) is set
before the actual response and (if necessary!) re-set after the
response. The advantage of this two step status response is that if
the filter detects an error early, then the filter can communicate
this and Git does not even need to create structures to read the
response.
* All status responses are pkt-line lists terminated with a flush
packet. This allows us to send other status fields with the same
protocol in the future.
Helped-by: Martin-Louis Bright <mlbright@gmail.com>
Reviewed-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Refactor the existing 'single shot filter mechanism' and prepare the
new 'long running filter mechanism'.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
apply_filter() returns a boolean that tells the caller if it
"did convert or did not convert". The variable `ret` was used throughout
the function to track errors whereas `1` denoted success and `0`
failure. This is unusual for the Git source where `0` denotes success.
Rename the variable and flip its value to make the function easier
readable for Git developers.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Git filter driver commands with spaces (e.g. `filter.sh foo`) are hard
to read in error messages. Quote them to improve the readability.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When a non-reversible CRLF conversion is done in "git add",
a warning is printed on stderr (or Git dies, depending on checksafe)
The function commit_chk_wrnNNO() in t0027 was written to test this,
but did the wrong thing: Instead of looking at the warning
from "git add", it looked at the warning from "git commit".
This is racy because "git commit" may not have to do CRLF conversion
at all if it can use the sha1 value from the index (which depends on
whether "add" and "commit" run in a single second).
Correct t0027 and replace the commit for each and every file with a commit
of all files in one go.
The function commit_chk_wrnNNO() should be renamed in a separate commit.
Now that t0027 does the right thing, it detects a bug in covert.c:
This sequence should generate the warning `LF will be replaced by CRLF`,
but does not:
$ git init
$ git config core.autocrlf false
$ printf "Line\r\n" >file
$ git add file
$ git commit -m "commit with CRLF"
$ git config core.autocrlf true
$ printf "Line\n" >file
$ git add file
"git add" calls crlf_to_git() in convert.c, which calls check_safe_crlf().
When has_cr_in_index(path) is true, crlf_to_git() returns too early and
check_safe_crlf() is not called at all.
Factor out the code which determines if "git checkout" converts LF->CRLF
into will_convert_lf_to_crlf().
Update the logic around check_safe_crlf() and "simulate" the possible
LF->CRLF conversion at "git checkout" with help of will_convert_lf_to_crlf().
Thanks to Jeff King <peff@peff.net> for analyzing t0027.
Reported-By: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Before this change,
$ echo "* text=auto" >.gitattributes
$ echo "* eol=crlf" >>.gitattributes
would have the same effect as
$ echo "* text" >.gitattributes
$ git config core.eol crlf
Since the 'eol' attribute had higher priority than 'text=auto', this may
corrupt binary files and is not what most users expect to happen.
Make the 'eol' attribute to obey 'text=auto' and now
$ echo "* text=auto" >.gitattributes
$ echo "* eol=crlf" >>.gitattributes
behaves the same as
$ echo "* text=auto" >.gitattributes
$ git config core.eol crlf
In other words,
$ echo "* text=auto eol=crlf" >.gitattributes
has the same effect as
$ git config core.autocrlf true
and
$ echo "* text=auto eol=lf" >.gitattributes
has the same effect as
$ git config core.autocrlf input
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When the ident attributes is set, get_stream_filter() did not obey
core.autocrlf=true, and the file was checked out with LF.
Change the rule when a streaming filter can be used:
- if an external filter is specified, don't use a stream filter.
- if the worktree eol is CRLF and "auto" is active, don't use a stream filter.
- Otherwise the stream filter can be used.
Add test cases in t0027.
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Code simplification.
* tb/conversion:
convert.c: correct attr_action()
convert.c: simplify text_stat
convert.c: refactor crlf_action
convert.c: use text_eol_is_crlf()
convert.c: remove input_crlf_action()
convert.c: remove unused parameter 'path'
t0027: add tests for get_stream_filter()
|
|
df747b81 (convert.c: refactor crlf_action, 2016-02-10) introduced a
bug to "git ls-files --eol".
The "text" attribute was shown as "text eol=lf" or "text eol=crlf",
depending on core.autocrlf or core.eol.
Correct this and add test cases in t0027.
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Simplify the statistics:
lonecr counts the CR which is not followed by a LF,
lonelf counts the LF which is not preceded by a CR,
crlf counts CRLF combinations.
This simplifies the evaluation of the statistics.
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Refactor the determination and usage of crlf_action.
Today, when no "crlf" attribute are set on a file, crlf_action is set to
CRLF_GUESS. Use CRLF_UNDEFINED instead, and search for "text" or "eol" as
before.
After searching for line ending attributes, save the value in
struct conv_attrs.crlf_action attr_action,
so that get_convert_attr_ascii() is able report the attributes.
Replace the old CRLF_GUESS usage:
CRLF_GUESS && core.autocrlf=true -> CRLF_AUTO_CRLF
CRLF_GUESS && core.autocrlf=false -> CRLF_BINARY
CRLF_GUESS && core.autocrlf=input -> CRLF_AUTO_INPUT
Save the action in conv_attrs.crlf_action (as before) and change
all callers.
Make more clear, what is what, by defining:
- CRLF_UNDEFINED : No attributes set. Temparally used, until core.autocrlf
and core.eol is evaluated and one of CRLF_BINARY,
CRLF_AUTO_INPUT or CRLF_AUTO_CRLF is selected
- CRLF_BINARY : No processing of line endings.
- CRLF_TEXT : attribute "text" is set, line endings are processed.
- CRLF_TEXT_INPUT: attribute "input" or "eol=lf" is set. This implies text.
- CRLF_TEXT_CRLF : attribute "eol=crlf" is set. This implies text.
- CRLF_AUTO : attribute "auto" is set.
- CRLF_AUTO_INPUT: core.autocrlf=input (no attributes)
- CRLF_AUTO_CRLF : core.autocrlf=true (no attributes)
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Clean/smudge filters defined in a configuration file of lower
precedence can now be overridden to be a pass-through no-op by
setting the variable to an empty string.
* ls/clean-smudge-override-in-config:
convert: treat an empty string for clean/smudge filters as "cat"
|
|
Add a helper function to find out, which line endings text files
should get at checkout, depending on core.autocrlf and core.eol
configuration variables.
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Integrate the code of input_crlf_action() into convert_attrs(),
so that ca.crlf_action is always valid after calling convert_attrs().
Keep a copy of crlf_action in attr_action, this is needed for
get_convert_attr_ascii().
Remove eol_attr from struct conv_attrs, as it is now used temporally.
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Some functions get a parameter path, but don't use it.
Remove the unused parameter.
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Once a lower-priority configuration file defines a clean or smudge
filter, there is no convenient way to override it to produce as-is
output. Even though the configuration mechanism implements "the
last one wins" semantics, you cannot set them to an empty string and
expect them to work, as apply_filter() would try to run the empty
string as an external command and fail. The conversion is not done,
but the function would still report a failure to convert.
Even though resetting the variable to "cat" (i.e. pass the data back
as-is and report success) is an obvious and a viable way to solve
this, it is wasteful to spawn an external process just as a
workaround.
Instead, teach apply_filter() to treat an empty string as a no-op
filter that always returns successfully its input as-is without
conversion.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When working in a cross-platform environment, a user may want to
check if text files are stored normalized in the repository and
if .gitattributes are set appropriately.
Make it possible to let Git show the line endings in the index and
in the working tree and the effective text/eol attributes.
The end of line ("eolinfo") are shown like this:
"-text" binary (or with bare CR) file
"none" text file without any EOL
"lf" text file with LF
"crlf" text file with CRLF
"mixed" text file with mixed line endings.
The effective text/eol attribute is one of these:
"", "-text", "text", "text=auto", "text eol=lf", "text eol=crlf"
git ls-files --eol gives an output like this:
i/none w/none attr/text=auto t/t5100/empty
i/-text w/-text attr/-text t/test-binary-2.png
i/lf w/lf attr/text eol=lf t/t5100/rfc2047-info-0007
i/lf w/crlf attr/text eol=crlf doit.bat
i/mixed w/mixed attr/ locale/XX.po
to show what eol convention is used in the data in the index ('i'),
and in the working tree ('w'), and what attribute is in effect,
for each path that is shown.
Add test cases in t0027.
Helped-By: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We sometimes sprintf into fixed-size buffers when we know
that the buffer is large enough to fit the input (either
because it's a constant, or because it's numeric input that
is bounded in size). Likewise with strcpy of constant
strings.
However, these sites make it hard to audit sprintf and
strcpy calls for buffer overflows, as a reader has to
cross-reference the size of the array with the input. Let's
use xsnprintf instead, which communicates to a reader that
we don't expect this to overflow (and catches the mistake in
case we do).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We are explicitly ignoring SIGPIPE, as we fully expect that the
filter program may not read our output fully. Ignore EPIPE that
may come from writing to it as well.
A new test was stolen from Jeff's suggestion.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When running a required clean filter, we do not have to mmap the
original before feeding the filter. Instead, stream the file
contents directly to the filter and process its output.
* sp/stream-clean-filter:
sha1_file: don't convert off_t to size_t too early to avoid potential die()
convert: stream from fd to required clean filter to reduce used address space
copy_fd(): do not close the input file descriptor
mmap_limit: introduce GIT_MMAP_LIMIT to allow testing expected mmap size
memory_limit: use git_env_ulong() to parse GIT_ALLOC_LIMIT
config.c: add git_env_ulong() to parse environment variable
convert: drop arguments other than 'path' from would_convert_to_git()
|
|
The data is streamed to the filter process anyway. Better avoid mapping
the file if possible. This is especially useful if a clean filter
reduces the size, for example if it computes a sha1 for binary data,
like git media. The file size that the previous implementation could
handle was limited by the available address space; large files for
example could not be handled with (32-bit) msysgit. The new
implementation can filter files of any size as long as the filter output
is small enough.
The new code path is only taken if the filter is required. The filter
consumes data directly from the fd. If it fails, the original data is
not immediately available. The condition can easily be handled as
a fatal error, which is expected for a required filter anyway.
If the filter was not required, the condition would need to be handled
in a different way, like seeking to 0 and reading the data. But this
would require more restructuring of the code and is probably not worth
it. The obvious approach of falling back to reading all data would not
help achieving the main purpose of this patch, which is to handle large
files with limited address space. If reading all data is an option, we
can simply take the old code path right away and mmap the entire file.
The environment variable GIT_MMAP_LIMIT, which has been introduced in
a previous commit is used to test that the expected code path is taken.
A related test that exercises required filters is modified to verify
that the data actually has been modified on its way from the file system
to the object store.
Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Most struct child_process variables are cleared using memset first after
declaration. Provide a macro, CHILD_PROCESS_INIT, that can be used to
initialize them statically instead. That's shorter, doesn't require a
function call and is slightly more readable (especially given that we
already have STRBUF_INIT, ARGV_ARRAY_INIT etc.).
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
It's a common idiom to match a prefix and then skip past it
with a magic number, like:
if (starts_with(foo, "bar"))
foo += 3;
This is easy to get wrong, since you have to count the
prefix string yourself, and there's no compiler check if the
string changes. We can use skip_prefix to avoid the magic
numbers here.
Note that some of these conversions could be much shorter.
For example:
if (starts_with(arg, "--foo=")) {
bar = arg + 6;
continue;
}
could become:
if (skip_prefix(arg, "--foo=", &bar))
continue;
However, I have left it as:
if (skip_prefix(arg, "--foo=", &v)) {
bar = v;
continue;
}
to visually match nearby cases which need to actually
process the string. Like:
if (skip_prefix(arg, "--foo=", &v)) {
bar = atoi(v);
continue;
}
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.
The change can be recreated by mechanically applying this:
$ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
grep -v strbuf\\.c |
xargs perl -pi -e '
s|!prefixcmp\(|starts_with\(|g;
s|prefixcmp\(|!starts_with\(|g;
s|!suffixcmp\(|ends_with\(|g;
s|suffixcmp\(|!ends_with\(|g;
'
on the result of preparatory changes in this series.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Ondřej Bílka <neleai@seznam.cz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Reduce duplicated code between convert.c and attr.c.
* lf/read-blob-data-from-index:
convert.c: remove duplicate code
read_blob_data_from_index(): optionally return the size of blob data
attr.c: extract read_index_data() as read_blob_data_from_index()
|
|
The has_cr_in_index() function is an almost 1:1 copy of
read_blob_data_from_index() with some additions. Use the
latter instead of using copy-pasted code.
Signed-off-by: Lukas Fleischer <git@cryptocrack.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
These callers can drop some inline pointer arithmetic and
magic offset constants, making them more readable and less
error-prone (those constants had to match the lengths of
strings, but there is no automatic verification of that
fact).
The "ep" pointer (presumably for "end pointer"), which
points to the final key segment of the config variable, is
given the more standard name "key" to describe its function
rather than its derivation.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* jb/required-filter:
Add a setting to require a filter to be successful
Conflicts:
convert.c
|
|
* jk/maint-avoid-streaming-filtered-contents:
do not stream large files to pack when filters are in use
teach dry-run convert_to_git not to require a src buffer
teach convert_to_git a "dry run" mode
|
|
When we call convert_to_git in dry-run mode, it may still
want to look at the source buffer, because some CRLF
conversion modes depend on analyzing the source to determine
whether it is in fact convertible CRLF text.
However, the main motivation for convert_to_git's dry-run
mode is that we would decide which method to use to acquire
the blob's data (streaming versus in-core). Requiring this
source analysis creates a chicken-and-egg problem. We are
better off simply guessing that anything we can't analyze
will end up needing conversion.
This patch lets a caller specify a NULL src buffer when
using dry-run mode (and only dry-run mode). A non-zero
return value goes from "we would convert" to "we might
convert"; a zero return value remains "we would definitely
not convert".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Some callers may want to know whether convert_to_git will
actually do anything before performing the conversion
itself (e.g., to decide whether to stream or handle blobs
in-core). This patch lets callers specify the dry run mode
by passing a NULL destination buffer. The return value,
instead of indicating whether conversion happened, will
indicate whether conversion would occur.
For readability, we also include a wrapper function which
makes it more obvious we are not actually performing the
conversion.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
If a filter is not defined or if it fails, git should behave as if the
filter is a no-op passthru.
However, if the filter exits before reading all the content, depending on
the timing, git could be killed with SIGPIPE when it tries to write to the
pipe connected to the filter.
Ignore SIGPIPE while processing the filter to give us a chance to check
the return value from a failed write, in order to detect and act on this
mode of failure in a more controlled way.
Signed-off-by: Jehan Bing <jehan@orb.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
By default, a missing filter driver or a failure from the filter driver is
not an error, but merely makes the filter operation a no-op pass through.
This is useful to massage the content into a shape that is more convenient
for the platform, filesystem, and the user to use, and the content filter
mechanism is not used to turn something unusable into usable.
However, we could also use of the content filtering mechanism and store
the content that cannot be directly used in the repository (e.g. a UUID
that refers to the true content stored outside git, or an encrypted
content) and turn it into a usable form upon checkout (e.g. download the
external content, or decrypt the encrypted content). For such a use case,
the content cannot be used when filter driver fails, and we need a way to
tell Git to abort the whole operation for such a failing or missing filter
driver.
Add a new "filter.<driver>.required" configuration variable to mark the
second use case. When it is set, git will abort the operation when the
filter driver does not exist or exits with a non-zero status code.
Signed-off-by: Jehan Bing <jehan@orb.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* jc/maint-lf-to-crlf-keep-crlf:
lf_to_crlf_filter(): resurrect CRLF->CRLF hack
|
|
* jc/maint-lf-to-crlf-keep-crlf:
lf_to_crlf_filter(): resurrect CRLF->CRLF hack
|
|
* cn/maint-lf-to-crlf-filter:
lf_to_crlf_filter(): tell the caller we added "\n" when draining
convert: track state in LF-to-CRLF filter
|
|
The non-streaming version of the filter counts CRLF and LF in the whole
buffer, and returns without doing anything when they match (i.e. what is
recorded in the object store already uses CRLF). This was done to help
people who added files from the DOS world before realizing they want to go
cross platform and adding .gitattributes to tell Git that they only want
CRLF in their working tree.
The streaming version of the filter does not want to read the whole thing
before starting to work, as that defeats the whole point of streaming. So
we instead check what byte follows CR whenever we see one, and add CR
before LF only when the LF does not immediately follow CR already to keep
CRLF as is.
Reported-and-tested-by: Ralf Thielow
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This can only happen when the input size is multiple of the
buffer size of the cascade filter (16k) and ends with an LF,
but in such a case, the code forgot to tell the caller that
it added the "\n" it could not add during the last round.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
There may not be enough space to store CRLF in the output. If we don't
fill the buffer, then the filter will keep getting called with the same
short buffer and will loop forever.
Instead, always store the CR and record whether there's a missing LF
if so we store it in the output buffer the next time the function gets
called.
Reported-by: Henrik Grubbström <grubba@roxen.com>
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The git_path_check_eol() function converts a string value to the
corresponding 'enum eol' value. However, the function is currently
declared to return an 'enum crlf_action', which causes sparse to
complain thus:
SP convert.c
convert.c:736:50: warning: mixing different enum types
convert.c:736:50: int enum crlf_action versus
convert.c:736:50: int enum eol
In order to suppress the warning, we simply correct the return type
in the function declaration.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* tr/maint-ident-to-git-memmove:
Use memmove in ident_to_git
|
|
convert_to_git sets src=dst->buf if any of the preceding conversions
actually did any work. Thus in ident_to_git we have to use memmove
instead of memcpy as far as src->dst copying is concerned.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Suggested by: Junio Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This implements an internal "cascade" filter mechanism that plugs
two filters in series.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add support for "ident" filter on the output codepath. This does not work
with lf-to-crlf filter together (yet).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
If we do not have to guess or validate by scanning the input, we can
just stream this through.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Some filters may need to buffer the input and look-ahead inside it
to decide what to output, and they may consume more than zero bytes
of input and still not produce any output. After feeding all the
input, pass NULL as input as keep calling stream_filter() to let
such filters know there is no more input coming, and it is time for
them to produce the remaining output based on the buffered input.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This introduces an API to plug custom filters to an input stream.
The caller gets get_stream_filter("path") to obtain an appropriate
filter for the path, and then uses it when opening an input stream
via open_istream(). After that, the caller can read from the stream
with read_istream(), and close it with close_istream(), just like an
unfiltered stream.
This only adds a "null" filter that is a pass-thru filter, but later
changes can add LF-to-CRLF and other filters, and the callers of the
streaming API do not have to change.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When the output to a path does not have to be converted, we can read from
the object database from the streaming API and write to the file in the
working tree, without having to hold everything in the memory.
The ident, auto- and safe- crlf conversions inherently require you to read
the whole thing before deciding what to do, so while it is technically
possible to support them by using a buffer of an unbound size or rewinding
and reading the stream twice, it is less practical than the traditional
"read the whole thing in core and convert" approach.
Adding streaming filters for the other conversions on top of this should
be doable by tweaking the can_bypass_conversion() function (it should be
renamed to can_filter_stream() when it happens). Then the streaming API
can be extended to wrap the git_istream streaming_write_entry() opens on
the underlying object in another git_istream that reads from it, filters
what is read, and let the streaming_write_entry() read the filtered
result. But that is outside the scope of this series.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The current internal API requires the callers of setup_convert_check() to
supply the git_attr_check structures (hence they need to know how many to
allocate), but they grab the same set of attributes for given path.
Define a new convert_attrs() API that fills a higher level information that
the callers (convert_to_git and convert_to_working_tree) really want, and
move the common code to interact with the attributes system to it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The places that need to pass an array of "struct git_attr_check" needed to
be careful to pass a large enough array and know what index each element
lied. Make it safer and easier to code these.
Besides, the hard-coded sequence of initializing various attributes was
too ugly after we gained more than a few attributes.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Back when the conversion was only about the end-of-line convention, it
might have made sense to call what we do upon seeing CR/LF simply an
"action", but these days the conversion routines do a lot more than just
tweaking the line ending. Raname "action" to "crlf_action".
The function that decides what end of line conversion to use on the output
codepath was called "determine_output_conversion", as if there is no other
kind of output conversion. Rename it to "output_eol"; it is a function
that returns what EOL convention is to be used.
A function that decides what "crlf_action" needs to be used on the input
codepath, given what conversion attribute is set to the path and global
end-of-line convention, was called "determine_action". Rename it to
"input_crlf_action".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Yes, it is clear that "eol" wants to mean some sort of end-of-line thing,
but as the name of a global variable, it is way too short to describe what
kind of end-of-line thing it wants to represent. Besides, there are many
codepaths that want to use their own local "char *eol" variable to point
at the end of the current line they are processing.
This global variable holds what we read from core.eol configuration
variable. Name it as such.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Since v1.7.2-rc0~23^2~2 (Add per-repository eol normalization,
2010-05-19), building with gcc -std=gnu89 -pedantic produces warnings
like the following:
convert.c:21:11: warning: comma at end of enumerator list [-pedantic]
gcc is right to complain --- these commas are not permitted in C89.
In the spirit of v1.7.2-rc0~32^2~16 (2010-05-14), remove them.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Filtering to support keyword expansion may need the name of
the file being filtered. In particular, to support p4 keywords
like
$File: //depot/product/dir/script.sh $
the smudge filter needs to know the name of the file it is
smudging.
Allow "%f" in the custom filter command line specified in the
configuration. This will be substituted by the filename
inside a single-quote pair to be passed to the shell.
Signed-off-by: Pete Wyckoff <pw@padd.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Disable CRLF expansion when convert_to_working_tree() is called from
normalize_buffer(). This improves performance when merging branches
with conflicting line endings when core.eol=crlf or core.autocrlf=true
by making the normalization act as if core.eol=lf.
Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Currently, merging across changes in line ending normalization is
painful since files containing CRLF will conflict with normalized files,
even if the only difference between the two versions is the line
endings. Additionally, any "real" merge conflicts that exist are
obscured because every line in the file has a conflict.
Assume you start out with a repo that has a lot of text files with CRLF
checked in (A):
o---C
/ \
A---B---D
B: Add "* text=auto" to .gitattributes and normalize all files to
LF-only
C: Modify some of the text files
D: Try to merge C
You will get a ridiculous number of LF/CRLF conflicts when trying to
merge C into D, since the repository contents for C are "wrong" wrt the
new .gitattributes file.
Fix ll-merge so that the "base", "theirs" and "ours" stages are passed
through convert_to_worktree() and convert_to_git() before a three-way
merge. This ensures that all three stages are normalized in the same
way, removing from consideration differences that are only due to
normalization.
This feature is optional for now since it changes a low-level mechanism
and is not necessary for the majority of users. The "merge.renormalize"
config variable enables it.
Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* eb/core-eol:
Add "core.eol" config variable
Rename the "crlf" attribute "text"
Add per-repository eol normalization
Add tests for per-repository eol normalization
Conflicts:
Documentation/config.txt
Makefile
|
|
* fg/autocrlf:
autocrlf: Make it work also for un-normalized repositories
|
|
* gv/portable:
test-lib: use DIFF definition from GIT-BUILD-OPTIONS
build: propagate $DIFF to scripts
Makefile: Tru64 portability fix
Makefile: HP-UX 10.20 portability fixes
Makefile: HPUX11 portability fixes
Makefile: SunOS 5.6 portability fix
inline declaration does not work on AIX
Allow disabling "inline"
Some platforms lack socklen_t type
Make NO_{INET_NTOP,INET_PTON} configured independently
Makefile: some platforms do not have hstrerror anywhere
git-compat-util.h: some platforms with mmap() lack MAP_FAILED definition
test_cmp: do not use "diff -u" on platforms that lack one
fixup: do not unconditionally disable "diff -u"
tests: use "test_cmp", not "diff", when verifying the result
Do not use "diff" found on PATH while building and installing
enums: omit trailing comma for portability
Makefile: -lpthread may still be necessary when libc has only pthread stubs
Rewrite dynamic structure initializations to runtime assignment
Makefile: pass CPPFLAGS through to fllow customization
Conflicts:
Makefile
wt-status.h
|
|
Introduce a new configuration variable, "core.eol", that allows the user
to set which line endings to use for end-of-line-normalized files in the
working directory. It defaults to "native", which means CRLF on Windows
and LF everywhere else.
Note that "core.autocrlf" overrides core.eol. This means that
[core]
autocrlf = true
puts CRLFs in the working directory even if core.eol is set to "lf".
Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Unfortunately, there are still plenty of production systems with
vendor compilers that choke unless all compound declarations can be
determined statically at compile time, for example hpux10.20 (I can
provide a comprehensive list of our supported platforms that exhibit
this problem if necessary).
This patch simply breaks apart any compound declarations with dynamic
initialisation expressions, and moves the initialisation until after
the last declaration in the same block, in all the places necessary to
have the offending compilers accept the code.
Signed-off-by: Gary V. Vaughan <gary@thewrittenword.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|