find multi-byte comment chars in unterminated buffers

As with the previous patch, we need to swap out single-byte matching for something like starts_with() to match all bytes of a multi-byte comment character. But for cases where the buffer is not NUL-terminated (and we instead have an explicit size or end pointer), it's not safe to use starts_with(), as it might walk off the end of the buffer. Let's introduce a new starts_with_mem() that does the same thing but also accepts the length of the "haystack" str and makes sure not to walk past it. Note that in most cases the existing code did not need a length check at all, since it was written in a way that knew we had at least one byte available (and that was all we checked). So I had to read each one to find the appropriate bounds. The one exception is sequencer.c's add_commented_lines(), where we can actually get rid of the length check. Just like starts_with(), our starts_with_mem() handles an empty haystack variable by not matching (assuming a non-empty prefix). A few notes on the implementation of starts_with_mem(): - it would be equally correct to take an "end" pointer (and indeed, many of the callers have this and have to subtract to come up with the length). I think taking a ptr/size combo is a more usual interface for our codebase, though, and has the added benefit that the function signature makes it harder to mix up the three parameters. - we could obviously build starts_with() on top of this by passing strlen(str) as the length. But it's possible that starts_with() is a relatively hot code path, and it should not pay that penalty (it can generally return an answer proportional to the size of the prefix, not the whole string). - it naively feels like xstrncmpz() should be able to do the same thing, but that's not quite true. If you pass the length of the haystack buffer, then strncmp() finds that a shorter prefix string is "less than" than the haystack, even if the haystack starts with the prefix. If you pass the length of the prefix, then you risk reading past the end of the haystack if it is shorter than the prefix. So I think we really do need a new function. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
author: Jeff King <peff@peff.net> 2024-03-12 05:17:39 -0400
committer: Junio C Hamano <gitster@pobox.com> 2024-03-12 13:28:10 -0700
commit: 2ec225d397d564d9c6bb907d85a58507dec75989 (patch)
tree: 8158ad80d8691ddb1d497106e17f6d3330b2fa2b /commit.c
parent: 600559b7169c1353e3d46f773f39cf018d38ebbc (diff)
download: git-2ec225d397d564d9c6bb907d85a58507dec75989.tar.gz
1 files changed, 2 insertions, 1 deletions
diff --git a/commit.c b/commit.c
index ef679a0b93..531a666cba 100644
--- a/commit.c
+++ b/commit.c
@@ -1796,7 +1796,8 @@ size_t ignored_log_message_bytes(const char *buf, size_t len)
 		else
 			next_line++;
 
-		if (buf[bol] == comment_line_char || buf[bol] == '\n') {
+		if (starts_with_mem(buf + bol, cutoff - bol, comment_line_str) ||
+		    buf[bol] == '\n') {
 			/* is this the first of the run of comments? */
 			if (!boc)
 				boc = bol;
author	Jeff King <peff@peff.net>	2024-03-12 05:17:39 -0400
committer	Junio C Hamano <gitster@pobox.com>	2024-03-12 13:28:10 -0700
commit	2ec225d397d564d9c6bb907d85a58507dec75989 (patch)
tree	8158ad80d8691ddb1d497106e17f6d3330b2fa2b /commit.c
parent	600559b7169c1353e3d46f773f39cf018d38ebbc (diff)
download	git-2ec225d397d564d9c6bb907d85a58507dec75989.tar.gz