aboutsummaryrefslogtreecommitdiffstats
path: root/pack-objects.c
AgeCommit message (Collapse)AuthorFilesLines
2023-12-14pack-objects: free packing_data in more placesTaylor Blau1-0/+15
The pack-objects internals use a packing_data struct to track what objects are part of the pack(s) being formed. Since these structures contain allocated fields, failing to appropriately free() them results in a leak. Plug that leak by introducing a clear_packing_data() function, and call it in the appropriate spots. This is a fairly straightforward leak to plug, since none of the callers expect to read any values or have any references to parts of the address space being freed. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-29parse: separate out parsing functions from config.hCalvin Wan1-1/+1
The files config.{h,c} contain functions that have to do with parsing, but not config. In order to further reduce all-in-one headers, separate out functions in config.c that do not operate on config into its own file, parse.h, and update the include directives in the .c files that need only such functions accordingly. Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-05git-compat-util: move alloc macros to git-compat-util.hCalvin Wan1-1/+0
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for dynamic array allocation. Moving these macros to git-compat-util.h with the other alloc macros focuses alloc.[ch] to allocation for Git objects and additionally allows us to remove inclusions to alloc.h from files that solely used the above macros. Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23alloc.h: move ALLOC_GROW() functions from cache.hElijah Newren1-1/+2
This allows us to replace includes of cache.h with includes of the much smaller alloc.h in many places. It does mean that we also need to add includes of alloc.h in a number of C files. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26pack-mtimes: support writing pack .mtimes filesTaylor Blau1-0/+6
Now that the `.mtimes` format is defined, supplement the pack-write API to be able to conditionally write an `.mtimes` file along with a pack by setting an additional flag and passing an oidmap that contains the timestamps corresponding to each object in the pack. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-13use CALLOC_ARRAYRené Scharfe1-1/+1
Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-24pack-objects: convert oe_set_delta_ext() to use object_idJeff King1-2/+2
We already store an object_id internally, and now our sole caller also has one. Let's stop passing around the internal hash array, which adds a bit of type safety. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-01Merge branch 'jk/optim-in-pack-idx-conversion'Junio C Hamano1-1/+4
Code clean-up. * jk/optim-in-pack-idx-conversion: pack-objects: avoid pointless oe_map_new_pack() calls
2019-11-12pack-objects: avoid pointless oe_map_new_pack() callsJeff King1-1/+4
This patch fixes an extreme slowdown in pack-objects when you have more than 1023 packs. See below for numbers. Since 43fa44fa3b (pack-objects: move in_pack out of struct object_entry, 2018-04-14), we use a complicated system to save some per-object memory. Each object_entry structs gets a 10-bit field to store the index of the pack it's in. We map those indices into pointers using packing_data->in_pack_by_idx, which we initialize at the start of the program. If we have 2^10 or more packs, then we instead create an array of pack pointers, one per object. This is packing_data->in_pack. So far so good. But there's one other tricky case: if a new pack arrives after we've initialized in_pack_by_idx, it won't have an index yet. We solve that by calling oe_map_new_pack(), which just switches on the fly to the less-optimal in_pack mechanism, allocating the array and back-filling it for already-seen objects. But that logic kicks in even when we've switched to it already (whether because we really did see a new pack, or because we had too many packs in the first place). The result doesn't produce a wrong outcome, but it's very slow. What happens is this: - imagine you have a repo with 500k objects and 2000 packs that you want to repack. - before looking at any objects, we call prepare_in_pack_by_idx(). It starts allocating an index for each pack. On the 1024th pack, it sees there are too many, so it bails, leaving in_pack_by_idx as NULL. - while actually adding objects to the packing list, we call oe_set_in_pack(), which checks whether the pack already has an index. If it's one of the packs after the first 1023, then it doesn't have one, and we'll call oe_map_new_pack(). But there's no useful work for that function to do. We're already using in_pack, so it just uselessly walks over the complete list of objects, trying to backfill in_pack. And we end up doing this for almost 1000 packs (each of which may be triggered by more than one object). And each time it triggers, we may iterate over up to 500k objects. So in the absolute worst case, this is quadratic in the number of objects. The solution is simple: we don't need to bother checking whether the pack has an index if we've already converted to using in_pack, since by definition we're not going to use it. So we can just push the "does the pack have a valid index" check down into that half of the conditional, where we know we're going to use it. The current test in p5303 sadly doesn't notice this problem, since it maxes out at 1000 packs. If we add a new test to it at 2000 packs, it does show the improvement: Test HEAD^ HEAD ---------------------------------------------------------------------- 5303.12: repack (2000) 26.72(39.68+0.67) 15.70(28.70+0.66) -41.2% However, these many-pack test cases are rather expensive to run, so adding larger and larger numbers isn't appealing. Instead, we can show it off more easily by using GIT_TEST_FULL_IN_PACK_ARRAY, which forces us into the absolute worst case: no pack has an index, so we'll trigger oe_map_new_pack() pointlessly for every single object, making it truly quadratic. Here are the numbers (on git.git) with the included change to p5303: Test HEAD^ HEAD ---------------------------------------------------------------------- 5303.3: rev-list (1) 2.05(1.98+0.06) 2.06(1.99+0.06) +0.5% 5303.4: repack (1) 33.45(33.46+0.19) 2.75(2.73+0.22) -91.8% 5303.6: rev-list (50) 2.07(2.01+0.06) 2.06(2.01+0.05) -0.5% 5303.7: repack (50) 34.21(35.18+0.16) 3.49(4.50+0.12) -89.8% 5303.9: rev-list (1000) 2.87(2.78+0.08) 2.88(2.80+0.07) +0.3% 5303.10: repack (1000) 41.26(51.30+0.47) 10.75(20.75+0.44) -73.9% Again, those improvements aren't realistic for the 1-pack case (because in the real world, the full-array solution doesn't kick in), but it's more useful to be testing the more-complicated code path. While we're looking at this issue, we'll tweak one more thing: in oe_map_new_pack(), we call REALLOC_ARRAY(pack->in_pack). But we'd never expect to get here unless we're back-filling it for the first time, in which case it would be NULL. So let's switch that to ALLOC_ARRAY() for clarity, and add a BUG() to document the expectation. Unfortunately this code isn't well-covered in the test suite because it's inherently racy (it only kicks in if somebody else adds a new pack while we're in the middle of repacking). Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-06pack-objects: drop packlist index_pos optimizationJeff King1-9/+11
Once upon a time, the code to add an object to our packing list in pack-objects all lived in a single function. It computed the position within the hash table once, then used it to check if the object was already present, and if not, to add it. Later, in 2834bc27c1 (pack-objects: refactor the packing list, 2013-10-24), this was split into two functions: packlist_find() and packlist_alloc(). We ended up with an "index_pos" variable that gets passed through several functions to make it from one to the other. The resulting code is rather confusing to follow. The "index_pos" variable is sometimes undefined, if we don't yet have a hash table. This works out in practice because in that case packlist_alloc() won't use it at all, since it will have to create/grow the hash table. But it's hard to verify that, and it does cause gcc 9.2.1's -Wmaybe-uninitialized to complain when compiled with "-flto -O3" (rightfully, since we do pass the uninitialized value as a function parameter, even if nobody ends up using it). All of this is to save computing the hash index again when we're inserting into the hash table, which I found doesn't make a measurable difference in the program runtime (which is not surprising, since we're doing all kinds of other heavyweight things for each object). Let's just drop this index_pos variable entirely, simplifying the code (and pleasing the compiler). We might be better still refactoring this custom hash table to use one of our existing implementations (an oidmap, or a kh_oid_map). I stopped short of that here, but this would be the likely first step towards that anyway. Reported-by: Stephan Beyer <s-beyer@gmx.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-06pack-objects: use object_id in packlist_alloc()Jeff King1-2/+2
The only caller of packlist_alloc() already has a "struct object_id", and we immediately copy the hash they pass us into our own object_id. Let's avoid the unnecessary round-trip to a raw sha1 pointer. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20hashmap: convert sha1hash() to oidhash()Jeff King1-1/+1
There are no callers left of sha1hash() that do not simply pass the "hash" member of a "struct object_id". Let's get rid of the outdated sha1-specific function and provide one that operates on the whole struct (even though the technique, taking the first few bytes of the hash, will remain the same). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20pack-objects: convert locate_object_entry_hash() to object_idJeff King1-5/+5
There are no callers of locate_object_entry_hash() that aren't just passing us the "hash" member of a "struct object_id". Let's take the whole struct, which gets us closer to removing all raw sha1 variables. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20pack-objects: convert packlist_find() to use object_idJeff King1-2/+2
We take a raw hash pointer, but most of our callers have a "struct object_id" already. Let's switch to taking the full struct, which will let us continue removing uses of raw sha1 buffers. There are two callers that do need special attention: - in rebuild_existing_bitmaps(), we need to switch to nth_packed_object_oid(). This incurs an extra hash copy over pointing straight to the mmap'd sha1, but it shouldn't be measurable compared to the rest of the operation. - in can_reuse_delta() we already spent the effort to copy the sha1 into a "struct object_id", but now we just have to do so a little earlier in the function (we can't easily convert that function's callers because they may be pointing at mmap'd REF_DELTA blocks). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-14pack-objects: drop unused parameter from oe_map_new_pack()Jeff King1-2/+1
Since 43fa44fa3b (pack-objects: move in_pack out of struct object_entry, 2018-04-14), we store the source pack for each object as a small index rather than as a pointer. When we see a new pack that has no allocated index, we fall back to generating an array of pointers by calling oe_map_new_pack(). Perhaps counter-intuitively, that function does not need to actually see our new index-less pack. It only allocates and populates the array with the existing packs, after which oe_set_in_pack() actually adds the new pack to the array. Let's drop the unused "struct packed_git" argument to oe_map_new_pack() to avoid confusion. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-05Merge branch 'ph/pack-objects-mutex-fix'Junio C Hamano1-3/+1
"git pack-objects" incorrectly used uninitialized mutex, which has been corrected. * ph/pack-objects-mutex-fix: pack-objects: merge read_lock and lock in packing_data struct pack-objects: move read mutex to packing_data struct
2019-01-28pack-objects: merge read_lock and lock in packing_data structPatrick Hogg1-4/+1
Rename the packing_data lock to obd_lock and upgrade it to a recursive mutex to make it suitable for current read_lock usages. Additionally remove the superfluous #ifndef NO_PTHREADS guard around mutex initialization in prepare_packing_data as the mutex functions themselves are already protected. Signed-off-by: Patrick Hogg <phogg@novamoon.net> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-28pack-objects: move read mutex to packing_data structPatrick Hogg1-0/+1
ac77d0c37 ("pack-objects: shrink size field in struct object_entry", 2018-04-14) added an extra usage of read_lock/read_unlock in the newly introduced oe_get_size_slow for thread safety in parallel calls to try_delta(). Unfortunately oe_get_size_slow is also used in serial code, some of which is called before the first invocation of ll_find_deltas. As such the read mutex is not guaranteed to be initialized. Resolve this by moving the read mutex to packing_data and initializing it in prepare_packing_data which is initialized in cmd_pack_objects. Signed-off-by: Patrick Hogg <phogg@novamoon.net> Reviewed-by: Duy Nguyen <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-12pack-*.c: remove the_repository referencesNguyễn Thái Ngọc Duy1-2/+4
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-30Merge branch 'js/pack-objects-mutex-init-fix'Junio C Hamano1-0/+3
A mutex used in "git pack-objects" were not correctly initialized and this caused "git repack" to dump core on Windows. * js/pack-objects-mutex-init-fix: pack-objects (mingw): initialize `packing_data` mutex in the correct spot pack-objects (mingw): demonstrate a segmentation fault with large deltas pack-objects: fix typo 'detla' -> 'delta'
2018-10-19pack-objects (mingw): initialize `packing_data` mutex in the correct spotJohannes Schindelin1-0/+3
In 9ac3f0e5b3e4 (pack-objects: fix performance issues on packing large deltas, 2018-07-22), a mutex was introduced that is used to guard the call to set the delta size. This commit even added code to initialize it, but at an incorrect spot: in `init_threaded_search()`, while the call to `oe_set_delta_size()` (and hence to `packing_data_lock()`) can happen in the call chain `check_object()` <- `get_object_details()` <- `prepare_pack()` <- `cmd_pack_objects()`, which is long before the `prepare_pack()` function calls `ll_find_deltas()` (which initializes the threaded search). Another tell-tale that the mutex was initialized in an incorrect spot is that the function to initialize it lives in builtin/, while the code that uses the mutex is defined in a libgit.a header file. Let's use a more appropriate function: `prepare_packing_data()`, which not only lives in libgit.a, but *has* to be called before the `packing_data` struct is used that contains that mutex. This fixes https://github.com/git-for-windows/git/issues/1839. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17Merge branch 'jk/cocci'Junio C Hamano1-1/+1
spatch transformation to replace boolean uses of !hashcmp() to newly introduced oideq() is added, and applied, to regain performance lost due to support of multiple hash algorithms. * jk/cocci: show_dirstat: simplify same-content check read-cache: use oideq() in ce_compare functions convert hashmap comparison functions to oideq() convert "hashcmp() != 0" to "!hasheq()" convert "oidcmp() != 0" to "!oideq()" convert "hashcmp() == 0" to hasheq() convert "oidcmp() == 0" to oideq() introduce hasheq() and oideq() coccinelle: use <...> for function exclusion
2018-09-17Merge branch 'cc/delta-islands'Junio C Hamano1-0/+12
Lift code from GitHub to restrict delta computation so that an object that exists in one fork is not made into a delta against another object that does not appear in the same forked repository. * cc/delta-islands: pack-objects: move 'layer' into 'struct packing_data' pack-objects: move tree_depth into 'struct packing_data' t5320: tests for delta islands repack: add delta-islands support pack-objects: add delta-islands support pack-objects: refactor code into compute_layer_order() Add delta-islands.{c,h}
2018-09-17Merge branch 'jk/pack-delta-reuse-with-bitmap'Junio C Hamano1-0/+19
When creating a thin pack, which allows objects to be made into a delta against another object that is not in the resulting pack but is known to be present on the receiving end, the code learned to take advantage of the reachability bitmap; this allows the server to send a delta against a base beyond the "boundary" commit. * jk/pack-delta-reuse-with-bitmap: pack-objects: reuse on-disk deltas for thin "have" objects pack-bitmap: save "have" bitmap from walk t/perf: add perf tests for fetches from a bitmapped server t/perf: add infrastructure for measuring sizes t/perf: factor out percent calculations t/perf: factor boilerplate out of test_perf
2018-09-17Merge branch 'ds/multi-pack-index'Junio C Hamano1-1/+1
When there are too many packfiles in a repository (which is not recommended), looking up an object in these would require consulting many pack .idx files; a new mechanism to have a single file that consolidates all of these .idx files is introduced. * ds/multi-pack-index: (32 commits) pack-objects: consider packs in multi-pack-index midx: test a few commands that use get_all_packs treewide: use get_all_packs packfile: add all_packs list midx: fix bug that skips midx with alternates midx: stop reporting garbage midx: mark bad packed objects multi-pack-index: store local property multi-pack-index: provide more helpful usage info midx: clear midx on repack packfile: skip loading index if in multi-pack-index midx: prevent duplicate packfile loads midx: use midx in approximate_object_count midx: use existing midx when writing new one midx: use midx in abbreviation calculations midx: read objects from multi-pack-index config: create core.multiPackIndex setting midx: write object offsets midx: write object id fanout chunk midx: write object ids in a chunk ...
2018-08-29convert "hashcmp() == 0" to hasheq()Jeff King1-1/+1
This is the partner patch to the previous one, but covering the "hash" variants instead of "oid". Note that our coccinelle rule is slightly more complex to avoid triggering the call in hasheq(). I didn't bother to add a new rule to convert: - hasheq(E1->hash, E2->hash) + oideq(E1, E2) Since these are new functions, there won't be any such existing callers. And since most of the code is already using oideq, we're not likely to introduce new ones. We might still see "!hashcmp(E1->hash, E2->hash)" from topics in flight. But because our new rule comes after the existing ones, that should first get converted to "!oidcmp(E1, E2)" and then to "oideq(E1, E2)". Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-22Merge branch 'nd/pack-deltify-regression-fix'Junio C Hamano1-0/+4
In a recent update in 2.18 era, "git pack-objects" started producing a larger than necessary packfiles by missing opportunities to use large deltas. * nd/pack-deltify-regression-fix: pack-objects: fix performance issues on packing large deltas
2018-08-21pack-objects: reuse on-disk deltas for thin "have" objectsJeff King1-0/+19
When we serve a fetch, we pass the "wants" and "haves" from the fetch negotiation to pack-objects. That tells us not only which objects we need to send, but we also use the boundary commits as "preferred bases": their trees and blobs are candidates for delta bases, both for reusing on-disk deltas and for finding new ones. However, this misses some opportunities. Modulo some special cases like shallow or partial clones, we know that every object reachable from the "haves" could be a preferred base. We don't use all of them for two reasons: 1. It's expensive to traverse the whole history and enumerate all of the objects the other side has. 2. The delta search is expensive, so we want to keep the number of candidate bases sane. The boundary commits are the most likely to work. When we have reachability bitmaps, though, reason 1 no longer applies. We can efficiently compute the set of reachable objects on the other side (and in fact already did so as part of the bitmap set-difference to get the list of interesting objects). And using this set conveniently covers the shallow and partial cases, since we have to disable the use of bitmaps for those anyway. The second reason argues against using these bases in the search for new deltas. But there's one case where we can use this information for free: when we have an existing on-disk delta that we're considering reusing, we can do so if we know the other side has the base object. This in fact saves time during the delta search, because it's one less delta we have to compute. And that's exactly what this patch does: when we're considering whether to reuse an on-disk delta, if bitmaps tell us the other side has the object (and we're making a thin-pack), then we reuse it. Here are the results on p5311 using linux.git, which simulates a client fetching after `N` days since their last fetch: Test origin HEAD -------------------------------------------------------------------------- 5311.3: server (1 days) 0.27(0.27+0.04) 0.12(0.09+0.03) -55.6% 5311.4: size (1 days) 0.9M 237.0K -73.7% 5311.5: client (1 days) 0.04(0.05+0.00) 0.10(0.10+0.00) +150.0% 5311.7: server (2 days) 0.34(0.42+0.04) 0.13(0.10+0.03) -61.8% 5311.8: size (2 days) 1.5M 347.7K -76.5% 5311.9: client (2 days) 0.07(0.08+0.00) 0.16(0.15+0.01) +128.6% 5311.11: server (4 days) 0.56(0.77+0.08) 0.13(0.10+0.02) -76.8% 5311.12: size (4 days) 2.8M 566.6K -79.8% 5311.13: client (4 days) 0.13(0.15+0.00) 0.34(0.31+0.02) +161.5% 5311.15: server (8 days) 0.97(1.39+0.11) 0.30(0.25+0.05) -69.1% 5311.16: size (8 days) 4.3M 1.0M -76.0% 5311.17: client (8 days) 0.20(0.22+0.01) 0.53(0.52+0.01) +165.0% 5311.19: server (16 days) 1.52(2.51+0.12) 0.30(0.26+0.03) -80.3% 5311.20: size (16 days) 8.0M 2.0M -74.5% 5311.21: client (16 days) 0.40(0.47+0.03) 1.01(0.98+0.04) +152.5% 5311.23: server (32 days) 2.40(4.44+0.20) 0.31(0.26+0.04) -87.1% 5311.24: size (32 days) 14.1M 4.1M -70.9% 5311.25: client (32 days) 0.70(0.90+0.03) 1.81(1.75+0.06) +158.6% 5311.27: server (64 days) 11.76(26.57+0.29) 0.55(0.50+0.08) -95.3% 5311.28: size (64 days) 89.4M 47.4M -47.0% 5311.29: client (64 days) 5.71(9.31+0.27) 15.20(15.20+0.32) +166.2% 5311.31: server (128 days) 16.15(36.87+0.40) 0.91(0.82+0.14) -94.4% 5311.32: size (128 days) 134.8M 100.4M -25.5% 5311.33: client (128 days) 9.42(16.86+0.49) 25.34(25.80+0.46) +169.0% In all cases we save CPU time on the server (sometimes significant) and the resulting pack is smaller. We do spend more CPU time on the client side, because it has to reconstruct more deltas. But that's the right tradeoff to make, since clients tend to outnumber servers. It just means the thin pack mechanism is doing its job. From the user's perspective, the end-to-end time of the operation will generally be faster. E.g., in the 128-day case, we saved 15s on the server at a cost of 16s on the client. Since the resulting pack is 34MB smaller, this is a net win if the network speed is less than 270Mbit/s. And that's actually the worst case. The 64-day case saves just over 11s at a cost of just under 11s. So it's a slight win at any network speed, and the 40MB saved is pure bonus. That trend continues for the smaller fetches. The implementation itself is mostly straightforward, with the new logic going into check_object(). But there are two tricky bits. The first is that check_object() needs access to the relevant information (the thin flag and bitmap result). We can do this by pushing these into program-lifetime globals. The second is that the rest of the code assumes that any reused delta will point to another "struct object_entry" as its base. But of course the case we are interested in here is the one where don't have such an entry! I looked at a number of options that didn't quite work: - we could use a flag to signal a reused delta, but it's not a single bit. We have to actually store the oid of the base, which is normally done by pointing to the existing object_entry. And we'd have to modify all the code which looks at deltas. - we could add the reused bases to the end of the existing object_entry array. While this does create some extra work as later stages consider the extra entries, it's actually not too bad (we're not sending them, so they don't cost much in the delta search, and at most we'd have 2*N of them). But there's a more subtle problem. Adding to the existing array means we might need to grow it with realloc, which could move the earlier entries around. While many of the references to other entries are done by integer index, some (including ones on the stack) use pointers, which would become invalidated. This isn't insurmountable, but it would require quite a bit of refactoring (and it's hard to know that you've got it all, since it may work _most_ of the time and then fail subtly based on memory allocation patterns). - we could allocate a new one-off entry for the base. In fact, this is what an earlier version of this patch did. However, since the refactoring brought in by ad635e82d6 (Merge branch 'nd/pack-objects-pack-struct', 2018-05-23), the delta_idx code requires that both entries be in the main packing list. So taking all of those options into account, what I ended up with is a separate list of "external bases" that are not part of the main packing list. Each delta entry that points to an external base has a single-bit flag to do so; we have a little breathing room in the bitfield section of object_entry. This lets us limit the change primarily to the oe_delta() and oe_set_delta_ext() functions. And as a bonus, most of the rest of the code does not consider these dummy entries at all, saving both runtime CPU and code complexity. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20treewide: use get_all_packsDerrick Stolee1-1/+1
There are many places in the codebase that want to iterate over all packfiles known to Git. The purposes are wide-ranging, and those that can take advantage of the multi-pack-index already do. So, use get_all_packs() instead of get_packed_git() to be sure we are iterating over all packfiles. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-16pack-objects: move 'layer' into 'struct packing_data'Christian Couder1-0/+6
This reduces the size of 'struct object_entry' from 88 bytes to 80 and therefore makes packing objects more efficient. For example on a Linux repo with 12M objects, `git pack-objects --all` needs extra 96MB memory even if the layer feature is not used. Helped-by: Jeff King <peff@peff.net> Helped-by: Duy Nguyen <pclouds@gmail.com> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-16pack-objects: move tree_depth into 'struct packing_data'Christian Couder1-0/+6
This reduces the size of 'struct object_entry' and therefore makes packing objects more efficient. This also renames cmp_tree_depth() into tree_depth_compare(), as it is more modern to have the name of the compare functions end with "compare". Helped-by: Jeff King <peff@peff.net> Helped-by: Duy Nguyen <pclouds@gmail.com> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-23pack-objects: fix performance issues on packing large deltasNguyễn Thái Ngọc Duy1-0/+4
Let's start with some background about oe_delta_size() and oe_set_delta_size(). If you already know, skip the next paragraph. These two are added in 0aca34e826 (pack-objects: shrink delta_size field in struct object_entry - 2018-04-14) to help reduce 'struct object_entry' size. The delta size field in this struct is reduced to only contain max 1MB. So if any new delta is produced and larger than 1MB, it's dropped because we can't really save such a large size anywhere. Fallback is provided in case existing packfiles already have large deltas, then we can retrieve it from the pack. While this should help small machines repacking large repos without large deltas (i.e. less memory pressure), dropping large deltas during the delta selection process could end up with worse pack files. And if existing packfiles already have >1MB delta and pack-objects is instructed to not reuse deltas, all of them will be dropped on the floor, and the resulting pack would be definitely bigger. There is also a regression in terms of CPU/IO if we have large on-disk deltas because fallback code needs to parse the pack every time the delta size is needed and just access to the mmap'd pack data is enough for extra page faults when memory is under pressure. Both of these issues were reported on the mailing list. Here's some numbers for comparison. Version Pack (MB) MaxRSS(kB) Time (s) ------- --------- ---------- -------- 2.17.0 5498 43513628 2494.85 2.18.0 10531 40449596 4168.94 This patch provides a better fallback that is - cheaper in terms of cpu and io because we won't have to read existing pack files as much - better in terms of pack size because the pack heuristics is back to 2.17.0 time, we do not drop large deltas at all If we encounter any delta (on-disk or created during try_delta phase) that is larger than the 1MB limit, we stop using delta_size_ field for this because it can't contain such size anyway. A new array of delta size is dynamically allocated and can hold all the deltas that 2.17.0 can. This array only contains delta sizes that delta_size_ can't contain. With this, we do not have to drop deltas in try_delta() anymore. Of course the downside is we use slightly more memory, even compared to 2.17.0. But since this is considered an uncommon case, a bit more memory consumption should not be a problem. Delta size limit is also raised from 1MB to 16MB to better cover common case and avoid that extra memory consumption (99.999% deltas in this reported repo are under 12MB; Jeff noted binary artifacts topped out at about 3MB in some other private repos). Other fields are shuffled around to keep this struct packed tight. We don't use more memory in common case even with this limit update. A note about thread synchronization. Since this code can be run in parallel during delta searching phase, we need a mutex. The realloc part in packlist_alloc() is not protected because it only happens during the object counting phase, which is always single-threaded. Access to e->delta_size_ (and by extension pack->delta_size[e - pack->objects]) is unprotected as before, the thread scheduler in pack-objects must make sure "e" is never updated by two different threads. The area under the new lock is as small as possible, avoiding locking at all in common case, since lock contention with high thread count could be expensive (most blobs are small enough that delta compute time is short and we end up taking the lock very often). The previous attempt to always hold a lock in oe_delta_size() and oe_set_delta_size() increases execution time by 33% when repacking linux.git with with 40 threads. Reported-by: Elijah Newren <newren@gmail.com> Helped-by: Elijah Newren <newren@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-30Merge branch 'js/use-bug-macro'Junio C Hamano1-1/+1
Developer support update, by using BUG() macro instead of die() to mark codepaths that should not happen more clearly. * js/use-bug-macro: BUG_exit_code: fix sparse "symbol not declared" warning Convert remaining die*(BUG) messages Replace all die("BUG: ...") calls by BUG() ones run-command: use BUG() to report bugs, not die() test-tool: help verifying BUG() code paths
2018-05-06Replace all die("BUG: ...") calls by BUG() onesJohannes Schindelin1-1/+1
In d8193743e08 (usage.c: add BUG() function, 2017-05-12), a new macro was introduced to use for reporting bugs instead of die(). It was then subsequently used to convert one single caller in 588a538ae55 (setup_git_env: convert die("BUG") to BUG(), 2017-05-12). The cover letter of the patch series containing this patch (cf 20170513032414.mfrwabt4hovujde2@sigill.intra.peff.net) is not terribly clear why only one call site was converted, or what the plan is for other, similar calls to die() to report bugs. Let's just convert all remaining ones in one fell swoop. This trick was performed by this invocation: sed -i 's/die("BUG: /BUG("/g' $(git grep -l 'die("BUG' \*.c) Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16pack-objects: shrink size field in struct object_entryNguyễn Thái Ngọc Duy1-0/+3
It's very very rare that an uncompressed object is larger than 4GB (partly because Git does not handle those large files very well to begin with). Let's optimize it for the common case where object size is smaller than this limit. Shrink size field down to 31 bits and one overflow bit. If the size is too large, we read it back from disk. As noted in the previous patch, we need to return the delta size instead of canonical size when the to-be-reused object entry type is a delta instead of a canonical one. Add two compare helpers that can take advantage of the overflow bit (e.g. if the file is 4GB+, chances are it's already larger than core.bigFileThreshold and there's no point in comparing the actual value). Another note about oe_get_size_slow(). This function MUST be thread safe because SIZE() macro is used inside try_delta() which may run in parallel. Outside parallel code, no-contention locking should be dirt cheap (or insignificant compared to i/o access anyway). To exercise this code, it's best to run the test suite with something like make test GIT_TEST_OE_SIZE=4 which forces this code on all objects larger than 3 bytes. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16pack-objects: move in_pack out of struct object_entryNguyễn Thái Ngọc Duy1-0/+65
Instead of using 8 bytes (on 64 bit arch) to store a pointer to a pack. Use an index instead since the number of packs should be relatively small. This limits the number of packs we can handle to 1k. Since we can't be sure people can never run into the situation where they have more than 1k pack files. Provide a fall back route for it. If we find out they have too many packs, the new in_pack_by_idx[] array (which has at most 1k elements) will not be used. Instead we allocate in_pack[] array that holds nr_objects elements. This is similar to how the optional in_pack_pos field is handled. The new simple test is just to make sure the too-many-packs code path is at least executed. The true test is running make test GIT_TEST_FULL_IN_PACK_ARRAY=1 to take advantage of other special case tests. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08pack: convert struct pack_idx_entry to struct object_idbrian m. carlson1-3/+5
Convert struct pack_idx_entry to use struct object_id by changing the definition and applying the following semantic patch, plus the standard object_id transforms: @@ struct pack_idx_entry E1; @@ - E1.sha1 + E1.oid.hash @@ struct pack_idx_entry *E1; @@ - E1->sha1 + E1->oid.hash Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-18use REALLOC_ARRAY for changing the allocation size of arraysRené Scharfe1-2/+1
Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-07hashmap: factor out getting a hash code from a SHA1Karsten Blees1-3/+2
Copying the first bytes of a SHA1 is duplicated in six places, however, the implications (the actual value would depend on the endianness of the platform) is documented only once. Add a properly documented API for this. Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-02pack-objects: use free()+xcalloc() instead of xrealloc()+memset()René Scharfe1-2/+2
Whenever the hash table becomes too small then its size is increased, the original part (and the added space) is zerod out using memset(), and the table is rebuilt from scratch. Simplify this proceess by returning the old memory using free() and allocating the new buffer using xcalloc(), which already clears the buffer for us. That way we avoid copying the old hash table contents needlessly inside xrealloc(). While at it, use the first array member with sizeof instead of a specific type. The old code used uint32_t and int, while index is actually an array of int32_t. Their sizes are the same basically everywhere, so it's not actually a problem, but the new code is cleaner and doesn't have to be touched should the type be changed. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-24pack-objects: refactor the packing listVicent Marti1-0/+111
The hash table that stores the packing list for a given `pack-objects` run was tightly coupled to the pack-objects code. In this commit, we refactor the hash table and the underlying storage array into a `packing_data` struct. The functionality for accessing and adding entries to the packing list is hence accessible from other parts of Git besides the `pack-objects` builtin. This refactoring is a requirement for further patches in this series that will require accessing the commit packing list from outside of `pack-objects`. The hash table implementation has been minimally altered: we now use table sizes which are always a power of two, to ensure a uniform index distribution in the array. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2006-08-03Make git-pack-objects a builtinMatthias Kestenholz1-1376/+0
Signed-off-by: Matthias Kestenholz <matthias@spinlock.ch> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-23pack-objects: check pack.window for default window sizeJeff King1-1/+12
For some repositories, deltas simply don't make sense. One can disable them for git-repack by adding --window, but git-push insists on making the deltas which can be very CPU-intensive for little benefit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-10Fix more typos, primarily in the codePavel Roskin1-2/+2
The only visible change is that git-blame doesn't understand "--compability" anymore, but it does accept "--compatibility" instead, which is already documented. Signed-off-by: Pavel Roskin <proski@gnu.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-30don't load objects needlessly when repackingNicolas Pitre1-17/+28
If no delta is attempted on some objects then it is useless to load them in memory, neither create any delta index for them. The best thing to do is therefore to load and index them only when really needed. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-29consider previous pack undeltified object state only when reusing delta dataNicolas Pitre1-2/+3
Without this there would never be a chance to improve packing for previously undeltified objects. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-29Do not try futile object pairs when repacking.Linus Torvalds1-0/+7
In the repacking window, if both objects we are looking at already came from the same (old) pack-file, don't bother delta'ing them against each other. That means that we'll still always check for better deltas for (and against!) _unpacked_ objects, but assuming incremental repacks, you'll avoid the delta creation 99% of the time. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-21Merge branch 'ff/c99' into nextJunio C Hamano1-2/+2
* ff/c99: Remove all void-pointer arithmetic.
2006-06-21upload-pack: prepare for sideband message support.Junio C Hamano1-0/+4
This does not implement sideband for propagating the status to the downloader yet, but add code to capture the standard error output from the pack-objects process in preparation for sending it off to the client when the protocol extension allows us to do so. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-20Remove all void-pointer arithmetic.Florian Forster1-2/+2
ANSI C99 doesn't allow void-pointer arithmetic. This patch fixes this in various ways. Usually the strategy that required the least changes was used. Signed-off-by: Florian Forster <octo@verplant.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-05pack-objects: improve path grouping heuristics.Linus Torvalds1-50/+19
This trivial patch not only simplifies the name hashing, it actually improves packing for both git and the kernel. The git archive pack shrinks from 6824090->6622627 bytes (a 3% improvement), and the kernel pack shrinks from 108756213 to 108219021 (a mere 0.5% improvement, but still, it's an improvement from making the hashing much simpler!) We just create a 32-bit hash, where we "age" previous characters by two bits, so the last characters in a filename count most. So when we then compare the hashes in the sort routine, filenames that end the same way sort the same way. It takes the subdirectory into account (unless the filename is > 16 characters), but files with the same name within the same subdirectory will obviously sort closer than files in different subdirectories. And, incidentally (which is why I tried the hash change in the first place, of course) builtin-rev-list.c will sort fairly close to rev-list.c. And no, it's not a "good hash" in the sense of being secure or unique, but that's not what we're looking for. The whole "hash" thing is misnamed here. It's not so much a hash as a "sorting number". [jc: rolled in simplification for computing the sorting number computation for thin pack base objects] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-30tree_entry(): new tree-walking helper functionLinus Torvalds1-16/+11
This adds a "tree_entry()" function that combines the common operation of doing a "tree_entry_extract()" + "update_tree_entry()". It also has a simplified calling convention, designed for simple loops that traverse over a whole tree: the arguments are pointers to the tree descriptor and a name_entry structure to fill in, and it returns a boolean "true" if there was an entry left to be gotten in the tree. This allows tree traversal with struct tree_desc desc; struct name_entry entry; desc.buf = tree->buffer; desc.size = tree->size; while (tree_entry(&desc, &entry) { ... use "entry.{path, sha1, mode, pathlen}" ... } which is not only shorter than writing it out in full, it's hopefully less error prone too. [ It's actually a tad faster too - we don't need to recalculate the entry pathlength in both extract and update, but need to do it only once. Also, some callers can avoid doing a "strlen()" on the result, since it's returned as part of the name_entry structure. However, by now we're talking just 1% speedup on "git-rev-list --objects --all", and we're definitely at the point where tree walking is no longer the issue any more. ] NOTE! Not everybody wants to use this new helper function, since some of the tree walkers very much on purpose do the descriptor update separately from the entry extraction. So the "extract + update" sequence still remains as the core sequence, this is just a simplified interface. We should probably add a silly two-line inline helper function for initializing the descriptor from the "struct tree" too, just to cut down on the noise from that common "desc" initializer. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-16Merge branch 'np/pack'Junio C Hamano1-13/+14
* np/pack: improve depth heuristic for maximum delta size pack-object: slightly more efficient simple euristic for further free packing improvements
2006-05-16improve depth heuristic for maximum delta sizeNicolas Pitre1-2/+5
This provides a linear decrement on the penalty related to delta depth instead of being an 1/x function. With this another 5% reduction is observed on packs for both the GIT repo and the Linux kernel repo, as well as fixing a pack size regression in another sample repo I have. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-15Merge branch 'fix'Junio C Hamano1-2/+1
* fix: Fix pack-index issue on 64-bit platforms a bit more portably. Install git-send-email by default Fix compilation on newer NetBSD systems git config syntax updates Another config file parsing fix. checkout: use --aggressive when running a 3-way merge (-m).
2006-05-15Fix pack-index issue on 64-bit platforms a bit more portably.v1.3.3Junio C Hamano1-2/+1
Apparently <stdint.h> is not enough for uint32_t on OpenBSD; use "unsigned int" -- hopefully that would stay 32-bit on every platform we care about, at least until we update the pack-index file format. Our sha1 routines optimized for architectures use uint32_t and expects '#include <stdint.h>' to be enough, so OpenBSD on arm or ppc might have similar issues down the road, I dunno. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-15pack-object: slightly more efficientNicolas Pitre1-7/+8
Avoid creating a delta index for objects with maximum depth since they are not going to be used as delta base anyway. This also reduce peak memory usage slightly as the current object's delta index is not useful until the next object in the loop is considered for deltification. This saves a bit more than 1% on CPU usage. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-15simple euristic for further free packing improvementsNicolas Pitre1-5/+2
Given that the early eviction of objects with maximum delta depth may exhibit bad packing on its own, why not considering a bias against deep base objects in try_delta() to mitigate that bad behavior. This patch adjust the MAX_size allowed for a delta based on the depth of the base object as well as enabling the early eviction of max depth objects from the object window. When used separately, those two things produce slightly better and much worse results respectively. But their combined effect is a surprising significant packing improvement. With this really simple patch the GIT repo gets nearly 15% smaller, and the Linux kernel repo about 5% smaller, with no significantly measurable CPU usage difference. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-14Merge branch 'fix'Junio C Hamano1-0/+1
* fix: include header to define uint32_t, necessary on Mac OS X
2006-05-14include header to define uint32_t, necessary on Mac OS XBen Clifford1-0/+1
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-13Merge branch 'fix'Junio C Hamano1-1/+1
* fix: Fix git-pack-objects for 64-bit platforms
2006-05-13Fix git-pack-objects for 64-bit platformsDennis Stosberg1-1/+1
The offset of an object in the pack is recorded as a 4-byte integer in the index file. When reading the offset from the mmap'ed index in prepare_pack_revindex(), the address is dereferenced as a long*. This works fine as long as the long type is four bytes wide. On NetBSD/sparc64, however, a long is 8 bytes wide and so dereferencing the offset produces garbage. [jc: taking suggestion by Linus to use uint32_t] Signed-off-by: Dennis Stosberg <dennis@stosberg.net> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-09Merge branch 'np/delta'Junio C Hamano1-35/+39
* np/delta: improve diff-delta with sparse and/or repetitive data tiny optimization to diff-delta replace adler32 with Rabin's polynomial in diff-delta use delta index data when finding best delta matches split the diff-delta interface
2006-05-05pack-object: squelch eye-candy on non-ttyJunio C Hamano1-0/+5
One of my post-update scripts runs a git-fetch into a separate repository and sends the results back to me (2>&1); I end up getting this in the mail: Generating pack... Done counting 180 objects. Result has 131 objects. Deltifying 131 objects. 0% (0/131) done^M 1% (2/131) done^M... This defaults not to do the progress report when not on a tty. You could give --progress to force the progress report, but let's not bother even documenting it nor mentioning it in the usage string. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-27pack-objects: update size heuristucs.Junio C Hamano1-6/+6
We used to omit delta base candidates that is much bigger than the target, but delta size does not grow when we delete more, so that was not a very good heuristics. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-26use delta index data when finding best delta matchesNicolas Pitre1-37/+41
This patch allows for computing the delta index for each base object only once and reuse it when trying to find the best delta match. This should set the mark and pave the way for possibly better delta generator algorithms. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-21fix pack-object buffer sizeNicolas Pitre1-1/+1
The input line has 40 _chars_ of sha1 and no 20 _bytes_. It should also account for the space before the pathname, and the terminating \n and \0. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-20pack-objects: do not stop at object that is "too small"Junio C Hamano1-1/+1
Because we sort the delta window by name-hash and then size, hitting an object that is too small to consider as a delta base for the current object does not mean we do not have better candidate in the window beyond it. Noticed by Shawn Pearce, analyzed by Nico, Linus and me. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-07Thin pack generation: optimization.Junio C Hamano1-48/+236
Jens Axboe noticed that recent "git push" has become very slow since we made --thin transfer the default. Thin pack generation to push a handful revisions that touch relatively small number of paths out of huge tree was stupid; it registered _everything_ from the excluded revisions. As a result, "Counting objects" phase was unnecessarily expensive. This changes the logic to register the blobs and trees from excluded revisions only for paths we are actually going to send to the other end. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-04Merge branch 'pe/cleanup'Junio C Hamano1-6/+10
* pe/cleanup: Replace xmalloc+memset(0) with xcalloc. Use blob_, commit_, tag_, and tree_type throughout.
2006-04-04Merge branch 'lt/fix-sol-pack'Junio C Hamano1-9/+31
* lt/fix-sol-pack: Use sigaction and SA_RESTART in read-tree.c; add option in Makefile. safe_fgets() - even more anal fgets() pack-objects: be incredibly anal about stdio semantics Fix Solaris stdio signal handling stupidities
2006-04-04Use blob_, commit_, tag_, and tree_type throughout.Peter Eriksen1-6/+10
This replaces occurences of "blob", "commit", "tag", and "tree", where they're really used as type specifiers, which we already have defined global constants for. Signed-off-by: Peter Eriksen <s022018@student.dtu.dk> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-03safe_fgets() - even more anal fgets()Junio C Hamano1-3/+4
This is from Linus -- the previous round forgot to clear error after EINTR case. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-02pack-objects: be incredibly anal about stdio semanticsLinus Torvalds1-1/+11
This is the "letter of the law" version of using fgets() properly in the face of incredibly broken stdio implementations. We can work around the Solaris breakage with SA_RESTART, but in case anybody else is ever that stupid, here's the "safe" (read: "insanely anal") way to use fgets. It probably goes without saying that I'm not terribly impressed by Solaris libc. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-02Fix Solaris stdio signal handling stupiditiesLinus Torvalds1-8/+19
This uses sigaction() to install the SIGALRM handler with SA_RESTART, so that Solaris stdio doesn't break completely when a signal interrupts a read. Thanks to Jason Riedy for confirming the silly Solaris signal behaviour. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-29tree/diff header cleanup.Junio C Hamano1-1/+1
Introduce tree-walk.[ch] and move "struct tree_desc" and associated functions from various places. Rename DIFF_FILE_CANON_MODE(mode) macro to canon_mode(mode) and move it to cache.h. This macro returns the canonicalized st_mode value in the host byte order for files, symlinks and directories -- to be compared with a tree_desc entry. create_ce_mode(mode) in cache.h is similar but is intended to be used for index entries (so it does not work for directories) and returns the value in the network byte order. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-06pack-objects: simplify "thin" pack.Junio C Hamano1-27/+11
There was a misguided logic to overly prefer using objects that we are not going to pack as the base object. This was unnecessary. It does not matter to the unpacking side where the base object is -- it matters more to make the resulting delta smaller. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-01Re-fix compilation warnings.Luck, Tony1-1/+1
Commit 8fcf1ad9c68e15d881194c8544e7c11d33529c2b has a combination of double cast and Andreas' switch to using unsigned long ... just the latter is sufficient (and a lot less ugly than using the double cast). Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-26Use setenv(), fix warningsTimo Hirvonen1-1/+1
- Fix -Wundef -Wold-style-definition warnings - Make pll_free() static [jc: original patch by Timo had another unrelated bits: - Use setenv() instead of putenv() I'm postponing that part for now.] Signed-off-by: Timo Hirvonen <tihirvon@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-24fix warning from pack-objects.cLuck, Tony1-1/+1
When compiling on ia64 I get this warning (from gcc 3.4.3): gcc -o pack-objects.o -c -g -O2 -Wall -DSHA1_HEADER='<openssl/sha.h>' pack-objects.c pack-objects.c: In function `pack_revindex_ix': pack-objects.c:94: warning: cast from pointer to integer of different size A double cast (first to long, then to int) shuts gcc up, but is there a better way? [jc: Andreas Ericsson suggests to use ulong instead. ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-24Merge branches 'jc/rev-list' and 'jc/pack-thin'Junio C Hamano1-100/+317
* jc/rev-list: rev-list --objects: use full pathname to help hashing. rev-list --objects-edge: remove duplicated edge commit output. rev-list --objects-edge * jc/pack-thin: pack-objects: hash basename and direname a bit differently. pack-objects: allow "thin" packs to exceed depth limits pack-objects: use full pathname to help hashing with "thin" pack. pack-objects: thin pack micro-optimization. Use thin pack transfer in "git fetch". Add git-push --thin. send-pack --thin: use "thin pack" delta transfer. Thin pack - create packfile with missing delta base. Conflicts: pack-objects.c (taking "next") send-pack.c (taking "next")
2006-02-23pack-objects: hash basename and direname a bit differently.Junio C Hamano1-1/+32
...so that "Makefile"s from different revs are sorted together, separate from "t/Makefile"s, but close enough. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-23pack-objects: allow "thin" packs to exceed depth limitsJunio C Hamano1-4/+17
When creating a new pack to be used in .git/objects/pack/ directory, we carefully count the depth of deltified objects to be reused, so that the generated pack does not to exceed the specified depth limit for runtime efficiency. However, when we are generating a thin pack that does not contain base objects, such a pack can only be used during network transfer that is expanded on the other end upon reception, so being careful and artificially cutting the delta chain does not buy us anything except increased bandwidth requirement. This patch disables the delta chain depth limit check when reusing an existing delta. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22pack-objects: use full pathname to help hashing with "thin" pack.Junio C Hamano1-16/+44
This uses the same hashing algorithm to the "preferred base tree" objects and the incoming pathnames, to group the same files from different revs together, while spreading files with the same basename in different directories. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22pack-objects: thin pack micro-optimization.Junio C Hamano1-6/+10
Since we sort objects by type, hash, preferredness and then size, after we have a delta against preferred base, there is no point trying a delta with non-preferred base. This seems to save expensive calls to diff-delta and it also seems to save the output space as well. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22pack-objects eye-candy: finishing touches.Junio C Hamano1-10/+33
This updates the progress output to match "every one second or every percent whichever comes early" used by unpack-objects, as discussed on the list. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22also adds progress when actually writing a packNicolas Pitre1-3/+16
If that pack is big, it takes significant time to write and might benefit from some more eye candies as well. This is however disabled when the pack is written to stdout since in that case the output is usually piped into unpack_objects which already does its own progress reporting. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22nicer eye candies for pack-objectsNicolas Pitre1-31/+25
This provides a stable and simpler progress reporting mechanism that updates progress as often as possible but accurately not updating more than once a second. The deltification phase is also made more interesting to watch (since repacking a big repository and only seeing a dot appear once every many seconds is rather boring and doesn't provide much food for anticipation). Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22pack-objects: avoid delta chains that are too long.Junio C Hamano1-8/+35
This tries to rework the solution for the excess delta chain problem. An earlier commit worked it around ``cheaply'', but repeated repacking risks unbound growth of delta chains. This version counts the length of delta chain we are reusing from the existing pack, and makes sure a base object that has sufficiently long delta chain does not get deltified. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22pack-objects: finishing touches.Junio C Hamano1-31/+71
This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-22pack-objects: reuse data from existing packs.Junio C Hamano1-57/+310
When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-19Thin pack - create packfile with missing delta base.Junio C Hamano1-81/+219
This goes together with "rev-list --object-edge" change, to feed pack-objects list of edge commits in addition to the usual object list. Upon seeing such list, pack-objects loosens the usual "self contained delta" constraints, and can produce delta against blobs and trees contained in the edge commits without storing the delta base objects themselves. The resulting packfile is not usable in .git/object/packs, but is a good way to implement "delta-only" transfer. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-17pack-objects: avoid delta chains that are too long.Junio C Hamano1-8/+35
This tries to rework the solution for the excess delta chain problem. An earlier commit worked it around ``cheaply'', but repeated repacking risks unbound growth of delta chains. This version counts the length of delta chain we are reusing from the existing pack, and makes sure a base object that has sufficiently long delta chain does not get deltified. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-17pack-objects: finishing touches.Junio C Hamano1-31/+71
This introduces --no-reuse-delta option to disable reusing of existing delta, which is a large part of the optimization introduced by this series. This may become necessary if repeated repacking makes delta chain too long. With this, the output of the command becomes identical to that of the older implementation. But the performance suffers greatly. It still allows reusing non-deltified representations; there is no point uncompressing and recompressing the whole text. It also adds a couple more statistics output, while squelching it under -q flag, which the last round forgot to do. $ time old-git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... real 12m8.530s user 11m1.450s sys 0m57.920s $ time git-pack-objects --stdout >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081) real 0m59.549s user 0m56.670s sys 0m2.400s $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... Total 184141, written 184141 (delta 134833), reused 47904 (delta 0) real 11m13.830s user 9m45.240s sys 0m44.330s There is one remaining issue when --no-reuse-delta option is not used. It can create delta chains that are deeper than specified. A<--B<--C<--D E F G Suppose we have a delta chain A to D (A is stored in full either in a pack or as a loose object. B is depth1 delta relative to A, C is depth2 delta relative to B...) with loose objects E, F, G. And we are going to pack all of them. B, C and D are left as delta against A, B and C respectively. So A, E, F, and G are examined for deltification, and let's say we decided to keep E expanded, and store the rest as deltas like this: E<--F<--G<--A Oops. We ended up making D a bit too deep, didn't we? B, C and D form a chain on top of A! This is because we did not know what the final depth of A would be, when we checked objects and decided to keep the existing delta. Unfortunately, deferring the decision until just before the deltification is not an option. To be able to make B, C, and D candidates for deltification with the rest, we need to know the type and final unexpanded size of them, but the major part of the optimization comes from the fact that we do not read the delta data to do so -- getting the final size is quite an expensive operation. To prevent this from happening, we should keep A from being deltified. But how would we tell that, cheaply? To do this most precisely, after check_object() runs, each object that is used as the base object of some existing delta needs to be marked with the maximum depth of the objects we decided to keep deltified (in this case, D is depth 3 relative to A, so if no other delta chain that is longer than 3 based on A exists, mark A with 3). Then when attempting to deltify A, we would take that number into account to see if the final delta chain that leads to D becomes too deep. However, this is a bit cumbersome to compute, so we would cheat and reduce the maximum depth for A arbitrarily to depth/4 in this implementation. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-17pack-objects: reuse data from existing packs.Junio C Hamano1-57/+310
When generating a new pack, notice if we have already needed objects in existing packs. If an object is stored deltified, and its base object is also what we are going to pack, then reuse the existing deltified representation unconditionally, bypassing all the expensive find_deltas() and try_deltas() calls. Also, notice if what we are going to write out exactly match what is already in an existing pack (either deltified or just compressed). In such a case, we can just copy it instead of going through the usual uncompressing & recompressing cycle. Without this patch, in linux-2.6 repository with about 1500 loose objects and a single mega pack: $ git-rev-list --objects v2.6.16-rc3 >RL $ wc -l RL 184141 RL $ time git-pack-objects p <RL Generating pack... Done counting 184141 objects. Packing 184141 objects.................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 real 12m4.323s user 11m2.560s sys 0m55.950s With this patch, the same input: $ time ../git.junio/git-pack-objects q <RL Generating pack... Done counting 184141 objects. Packing 184141 objects..................... a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2 Total 184141, written 184141, reused 182441 real 1m2.608s user 0m55.090s sys 0m1.830s Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-12Make pack-objects chattier.Junio C Hamano1-4/+6
You could give -q to squelch it, but currently no tool does it. This would make 'git clone host:repo here' over ssh not silent again. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-11fetch-clone progress: finishing touches.Junio C Hamano1-2/+41
This makes fetch-pack also report the progress of packing part. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-12-29code comments: spellJunio C Hamano1-1/+1
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-12-08Document the --non-empty command-line option to git-pack-objects.Nikolai Weibull1-1/+1
This provides (minimal) documentation for the --non-empty command-line option to the pack-objects command. Signed-off-by: Nikolai Weibull <nikolai@bitwi.se> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-11-28Make the rest of commands work from a subdirectory.Junio C Hamano1-0/+2
These commands are converted to run from a subdirectory. commit-tree convert-objects merge-base merge-index mktag pack-objects pack-redundant prune-packed read-tree tar-tree unpack-file unpack-objects update-server-info write-tree Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-11-21git-repack: Properly abort in corrupt repositoryLinus Torvalds1-1/+1
In a corrupt repository, git-repack produces a pack that does not contain needed objects without complaining, and the result of this combined with -d flag can be very painful -- e.g. a lossage of one tree object can lead to lossage of blobs reachable only through that tree. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-26pack-objects: Allow use of pre-generated pack.Junio C Hamano1-11/+73
git-pack-objects can reuse pack files stored in $GIT_DIR/pack-cache directory, when a necessary pack is found. This is hopefully useful when upload-pack (called from git-daemon) is expected to receive requests for the same set of objects many times (e.g full cloning request of any project, or updates from the set of heads previous day to the latest for a slow moving project). Currently git-pack-objects does *not* keep pack files it creates for reusing. It might be useful to add --update-cache option to it, which would allow it store pack files it created in the pack-cache directory, and prune rarely used ones from it. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-14Unlocalized isspace and friendsLinus Torvalds1-1/+0
Do our own ctype.h, just to get the sane semantics: we want locale-independence, _and_ we want the right signed behaviour. Plus we only use a very small subset of ctype.h anyway (isspace, isalpha, isdigit and isalnum). Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-13Add support for "local" packingLinus Torvalds1-3/+21
This adds the "--local" flag to git-pack-objects, which acts like "--incremental", except that instead of ignoring all packed objects, it only ignores objects that are packed and in an alternate object tree. As a result, it effectively only does a local re-pack: any remote-packed objects will stay in the alternate object directories. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-10-12Fix packname hash generation.Junio C Hamano1-4/+10
This changes the generation of hash packfiles have in their names, from "hash of object names as fed to us" to "hash of object names in the resulting pack, in the order they appear in the index file". The new "git-index-pack" command is taught to output the computed hash value to its standard output. With this, we can store downloaded pack in a temporary file without knowing its final name, run git-index-pack to generate idx for it while finding out its final name, and then rename the pack and idx to their final names. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-08-08[PATCH] Plug memory leak in git-pack-objectsSergey Vlasov1-0/+4
find_deltas() should free its temporary objects before returning. [jc: Sergey, if you have [PATCH] title on the Subject line of your e-mail, please do not repeat it on the first line in your message body. Thanks.] Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-07-03Make the name of a pack-file depend on the objects packed there-in.Linus Torvalds1-6/+14
This means that the .git/objects/pack directory is also rsync'able, since the filenames created there-in are either unique or refer to the same data. Otherwise you might not be able to pull from a directory that is partly packed without having to worry about missing objects due to pack-file name clashes.
2005-07-03Add "--non-empty" flag to git-pack-objectsLinus Torvalds1-0/+7
It skips writing the pack-file if it ends up being empty.
2005-07-03Add "--incremental" flag to git-pack-objectsLinus Torvalds1-1/+9
It won't add an object that is already in a pack to the new pack.
2005-06-29[PATCH] assorted delta code cleanupNicolas Pitre1-1/+1
This is a wrap-up patch including all the cleanups I've done to the delta code and its usage. The most important change is the factorization of the delta header handling code. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-28Make git pack files use little-endian size encodingLinus Torvalds1-21/+8
This makes it match the new delta encoding, and admittedly makes the code easier to follow. This also updates the PACK file version to 2, since this (and the delta encoding change in the previous commit) are incompatible with the old format.
2005-06-28[PATCH] Emit base objects of a delta chain when the delta is output.Junio C Hamano1-5/+20
Deltas are useless by themselves and when you use them you need to get to their base objects. A base object should inherit recency from the most recent deltified object that is based on it and that is what this patch teaches git-pack-objects. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-28[PATCH] Fix unpack-objects for header length information.Junio C Hamano1-3/+3
Standalone unpack-objects command was not adjusted for header length encoding change when dealing with deltified entry. This fixes it. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-28Change pack file format. Hopefully for the last time.Linus Torvalds1-22/+62
This also adds a header with a signature, version info, and the number of objects to the pack file. It also encodes the file length and type more efficiently.
2005-06-28git-pack-objects: add "--stdout" flag to write the pack file to stdoutLinus Torvalds1-6/+15
This also suppresses creation of the index file.
2005-06-28Teach packing about "tag" objectsLinus Torvalds1-13/+15
(And teach sha1_file and unpack-object know how to unpack them too, of course)
2005-06-27[PATCH] Enhance sha1_file_size() into sha1_object_info()Junio C Hamano1-22/+16
This lets us eliminate one use of map_sha1_file() outside sha1_file.c, to bring us one step closer to the packed GIT. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-27[PATCH] Remove "delta" object representation.Junio C Hamano1-1/+1
Packed delta files created by git-pack-objects seems to be the way to go, and existing "delta" object handling code has exposed the object representation details to too many places. Remove it while we refactor code to come up with a proper interface in sha1_file.c. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-26csum-file interface updates: return resulting SHA1Linus Torvalds1-3/+5
Also, make the writing of the SHA1 as a end-header be conditional: not every user will necessarily want to write the SHA1 to the file itself, even though current users do (but we migh end up using the same helper functions for the object files themselves, that don't do this). This also makes the packed index file contain the SHA1 of the packed data file at the end (just before its own SHA1). That way you can validate the pairing of the two if you want to.
2005-06-26git-pack-objects: write the pack files with a SHA1 csumLinus Torvalds1-55/+11
We want to be able to check their integrity later, and putting the sha1-sum of the contents at the end is a good thing. The writing routines are generic, so we could try to re-use them for the index file, instead of having the same logic duplicated. Update unpack-objects to know about the extra 20 bytes at the end of the index.
2005-06-26git-pack-objects: use name information (if any) to sort objects for packing.Linus Torvalds1-4/+23
This is incredibly cheezy. But it's cheap, and it works pretty well.
2005-06-26git-pack-objects: do the delta search in reverse size orderLinus Torvalds1-12/+16
Starting from big objects and going backwards means that we end up picking a delta that goes from a bigger object to a smaller one. That's advantageous for two reasons: the bigger object is likely the newer one (since things tend to grow, rather than shrink), and doing a delete tends to be smaller than doing an add. So the deltas don't tend to be top-of-tree, and the packed end result is just slightly smaller.
2005-06-26Fix object packing/unpacking.Linus Torvalds1-5/+5
This actually successfully packed and unpacked a git archive down to 1.3MB (17MB unpacked). Right now unpacking is way too noisy, lots of debug messages left.
2005-06-26[PATCH] Finish initial cut of git-pack-object/git-unpack-object pair.Junio C Hamano1-2/+4
This finishes the initial round of git-pack-object / git-unpack-object pair. They are now good enough to be used as a transport medium: - Fix delta direction in pack-objects; the original was computing delta to create the base object from the object to be squashed, which was quite unfriendly for unpacker ;-). - Add a script to test the very basics. - Implement unpacker for both regular and deltified objects. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25Add "--depth=N" parameter to git-pack-objects to limit maximum delta depthLinus Torvalds1-7/+17
It too defaults to 10. A nice round random number.
2005-06-25git-pack-objects: make "--window=x" semantics more logical.Linus Torvalds1-4/+4
A zero disables delta generation (like before), but we make the window be one bigger than specified, since we use one entry for the one to be tested (it used to be that "--window=1" was meaningless, since we'd have used up the single-entry window with the entry to be tested, and had no chance of actually ever finding a delta). The default window remains at 10, but now it really means "test the 10 closest objects", not "test the 9 closest objects".
2005-06-25Add a "max_size" parameter to diff_delta()Linus Torvalds1-10/+12
Anything that generates a delta to see if two objects are close usually isn't interested in the delta ends up being bigger than some specified size, and this allows us to stop delta generation early when that happens.
2005-06-25Fix delta "sliding window" codeLinus Torvalds1-4/+5
When Junio fixed the lack of a successful error code from try_delta(), that uncovered an off-by-one error in the caller. Also, some testing made it clear that we now find a lot more deltas, because we used to (incorrectly) break early on bogus "failure" cases.
2005-06-25[PATCH] (patchlet) pack-objects.c: try_delta()Junio C Hamano1-0/+1
Return value of try_delta is checked for negativeness, but the success path does not return anything, letting compiler warn and presumably return garbage. Signed-off-by: Junio C Hamano <junkio@cox.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-25git-pack-objects: mark the delta packing with a 'D'.Linus Torvalds1-0/+1
When writing a delta, we take the real type from the object we're doing the delta against, and just write a 'D' as the type of the current object.
2005-06-25git-pack-objects: fix typoLinus Torvalds1-1/+1
("<" should be "=")
2005-06-25git-pack-objects: create a packed object representation.Linus Torvalds1-0/+403
This is kind of like a tar-ball for a set of objects, ready to be shipped off to another end. Alternatively, you could use is as a packed representation of the object database directly, if you changed "read_sha1_file()" to read these kinds of packs. The latter is partiularly useful to generate a "packed history", ie you could pack up your old history efficiently, but still have it available (at a performance hit, of course). I haven't actually written an unpacker yet, so the end result has not been verified in any way yet. I obviously always write bug-free code, so it just has to work, no?