reftable/merged: circumvent pqueue with single subiter

The merged iterator uses a priority queue to order records so that we can yielid them in the expected order. This priority queue of course comes with some overhead as we need to add, compare and remove entries in that priority queue. In the general case, that overhead cannot really be avoided. But when we have a single subiter left then there is no need to use the priority queue anymore because the order is exactly the same as what that subiter would return. While having a single subiter may sound like an edge case, it happens more frequently than one might think. In the most common scenario, you can expect a repository to have a single large table that contains most of the records and then a set of smaller tables which contain later additions to the reftable stack. In this case it is quite likely that we exhaust subiters of those smaller stacks before exhausting the large table. Special-case this and return records directly from the remaining subiter. This results in a sizeable speedup when iterating over 1m refs in a repository with a single table: Benchmark 1: show-ref: single matching ref (revision = HEAD~) Time (mean ± σ): 135.4 ms ± 4.4 ms [User: 132.5 ms, System: 2.8 ms] Range (min … max): 131.0 ms … 166.3 ms 1000 runs Benchmark 2: show-ref: single matching ref (revision = HEAD) Time (mean ± σ): 126.3 ms ± 3.9 ms [User: 123.3 ms, System: 2.8 ms] Range (min … max): 122.7 ms … 157.0 ms 1000 runs Summary show-ref: single matching ref (revision = HEAD) ran 1.07 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~) Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
author: Patrick Steinhardt <ps@pks.im> 2024-03-04 11:49:13 +0100
committer: Junio C Hamano <gitster@pobox.com> 2024-03-04 10:19:40 -0800
commit: f8c1a8e2e18f71b46549ba722fc5e4a62db0864b (patch)
tree: 8eab6179d8972154a41a3d35e268af6269030c9c /reftable
parent: 3b6dd6ad1da614dec42576de416e819cb77b1592 (diff)
download: git-f8c1a8e2e18f71b46549ba722fc5e4a62db0864b.tar.gz
1 files changed, 22 insertions, 2 deletions
diff --git a/reftable/merged.c b/reftable/merged.c
index d9ed4a19dd..29161a32cf 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -87,16 +87,36 @@ static int merged_iter_next_entry(struct merged_iter *mi,
 				  struct reftable_record *rec)
 {
 	struct pq_entry entry = { 0 };
-	int err = 0;
+	int err = 0, empty;
+
+	empty = merged_iter_pqueue_is_empty(mi->pq);
 
 	if (mi->advance_index >= 0) {
+		/*
+		 * When there are no pqueue entries then we only have a single
+		 * subiter left. There is no need to use the pqueue in that
+		 * case anymore as we know that the subiter will return entries
+		 * in the correct order already.
+		 *
+		 * While this may sound like a very specific edge case, it may
+		 * happen more frequently than you think. Most repositories
+		 * will end up having a single large base table that contains
+		 * most of the refs. It's thus likely that we exhaust all
+		 * subiters but the one from that base ref.
+		 */
+		if (empty)
+			return iterator_next(&mi->subiters[mi->advance_index].iter,
+					     rec);
+
 		err = merged_iter_advance_subiter(mi, mi->advance_index);
 		if (err < 0)
 			return err;
+		if (!err)
+			empty = 0;
 		mi->advance_index = -1;
 	}
 
-	if (merged_iter_pqueue_is_empty(mi->pq))
+	if (empty)
 		return 1;
 
 	entry = merged_iter_pqueue_remove(&mi->pq);
author	Patrick Steinhardt <ps@pks.im>	2024-03-04 11:49:13 +0100
committer	Junio C Hamano <gitster@pobox.com>	2024-03-04 10:19:40 -0800
commit	f8c1a8e2e18f71b46549ba722fc5e4a62db0864b (patch)
tree	8eab6179d8972154a41a3d35e268af6269030c9c /reftable
parent	3b6dd6ad1da614dec42576de416e819cb77b1592 (diff)
download	git-f8c1a8e2e18f71b46549ba722fc5e4a62db0864b.tar.gz