Subject: kern/23196: libkern/pmatch.c bug fix & man page enhancement
To: None <>
From: None <kre@munnari.OZ.AU>
List: netbsd-bugs
Date: 10/20/2003 01:56:14
>Number:         23196
>Category:       kern
>Synopsis:       libkern/pmatch.c bug fix & man page enhancement
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Oct 19 19:01:00 UTC 2003
>Originator:     Robert Elz
>Release:        NetBSD 1.6T	(really -current of 2003-10-17)
	Prince of Songkla University
System: NetBSD 1.6T NetBSD 1.6T (DELTA) #42: Thu May 29 12:23:16 ICT 2003 i386
Architecture: i386
Machine: i386
	Manuel's description of libkern/pmatch.c in his new pmatch(9)
	inspired me to go check the code to see if the man page matched
	reality (which seemed just a little unlikely).   While doing that
	I noticed a bug in the code that should be fixed (sometime).
	That is, the pattern [^-z] matches any character that isn't
	within the range ^-z (using the '^' for two different purposes)
	which isn't what any other unix pattern matching facility does,
	and was almost certainly not intended.
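
	To make the intended reading concrete, here is a minimal sketch
	of matching one character against a set.  It is not the libkern
	code: set_match() and its arguments are hypothetical names, and
	the set text is assumed to start just after the '['.

```c
/*
 * Hypothetical helper (not part of libkern): test whether c belongs to
 * the character set whose text starts just after the '['.
 * Returns 1 if c matches, 0 if it does not, -1 if the set is never
 * closed.  On a 0 or 1 return, *end points just past the closing ']'.
 */
int
set_match(const char *set, unsigned char c, const char **end)
{
	const char *p = set;
	const char *first;
	int negate = 0, match = 0;

	if (*p == '^') {	/* '^' negates only right after '[' */
		negate = 1;
		p++;
	}
	first = p;		/* a ']' or '-' here is an ordinary member */

	for (; *p != '\0'; p++) {
		unsigned char lo = (unsigned char)*p;

		if (lo == ']' && p != first) {
			*end = p + 1;
			return match != negate;
		}
		/* "lo-hi" is a range unless the '-' is the last member */
		if (p[1] == '-' && p[2] != ']' && p[2] != '\0') {
			unsigned char hi = (unsigned char)p[2];

			if (lo <= c && c <= hi)
				match = 1;
			p += 2;	/* skip the '-' and the range end */
		} else if (lo == c)
			match = 1;
	}
	return -1;		/* ran off the pattern: missing ']' */
}
```

	With this reading, the set text "^-z]" (that is, the pattern
	[^-z]) is a negated set holding '-' and 'z': set_match() gives 1
	for 'a' and 0 for '-' or 'z', rather than treating ^-z as a
	range.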

	While fixing just that is trivial, the code also didn't handle
	']' inside the set - fixing that along with the previous one
	was almost as trivial (but altered the way the fix was done a
	little).

	Then, prompted by Manuel, I also made some updates for his
	man page, to slightly better document this function.

	By code inspection.

	This is a non-critical/low PR, as there is (at least that I
	could find) only one use of pmatch(9) in the kernel currently,
	and as used there, memcmp() would be just as good (for now
	anyway), so the bug/enhancement here are irrelevant (no-one
	needs the fix here for now - but sometime in the future, who knows).

	I enclose a patch for libkern/pmatch.c and man9/pmatch.9
	(in the form of a diff -u).

	I also include a really small crappy test program that I found
	in the trash on the way to work the other day, and some test
	data to drive it - the program seems to read stdin, say nothing,
	and exit(0) if pmatch() behaves as expected, and make some noise,
	and exit(1) if it doesn't.   Those two files (the makefile and
	any doc must have been in some other bin...) are in a shar file.
	I have no idea if regression tests are possible for libkern
	functions with the current setup, but if they are, maybe this
	can be bent into some productive use?

Index: sys/lib/libkern/pmatch.c
RCS file: /local/NetBSD/repository/src/sys/lib/libkern/pmatch.c,v
retrieving revision 1.3
diff -u -r1.3 pmatch.c
--- sys/lib/libkern/pmatch.c	2003/08/07 16:32:10	1.3
+++ sys/lib/libkern/pmatch.c	2003/10/19 14:12:07
@@ -37,7 +37,7 @@
  *	Return 1 on substring match.
  *	Return 0 on no match.
  *	Return -1 on error.
- * *estr will point to the end of thelongest exact or substring match.
+ * *estr will point to the end of the longest exact or substring match.
 pmatch(string, pattern, estr)
@@ -46,6 +46,7 @@
 	u_char stringc, patternc, rangec;
 	int match, negate_range;
 	const char *oestr, *pestr, *testr;
+	const char *eclass;
 	if (estr == NULL)
 		estr = &testr;
@@ -101,13 +102,15 @@
 			match = 0;
 			if ((negate_range = (*pattern == '^')) != 0)
+			/* char class cannot end at char after '[' (or "[^") */
+			eclass = pattern + 1;
 			while ((rangec = *pattern++) != '\0') {
-				if (rangec == ']')
+				if (rangec == ']' && pattern != eclass)
 				if (match)
-				if (rangec == '-' && *(pattern - 2) != '[' &&
-				    *pattern != ']') {
+				if (rangec == '-' && *pattern != ']' &&
+				    pattern != eclass) {
 					match = 
 					    stringc <= (u_char)*pattern &&
 					    (u_char)*(pattern - 2) <= stringc;

Index: share/man/man9/pmatch.9
RCS file: /local/NetBSD/repository/src/share/man/man9/pmatch.9,v
retrieving revision 1.2
diff -u -r1.2 pmatch.9
--- share/man/man9/pmatch.9	2003/10/14 06:49:51	1.2
+++ share/man/man9/pmatch.9	2003/10/19 18:18:56
@@ -39,14 +39,26 @@
 .Ft int
 .Fn pmatch "const char *string" "const char *pattern" "const char **estr"
-Extract substring matching
+Extract leading substring matching
 .Fa pattern
 .Fa string .
-If not
-.Dv NULL ,
 .Fa estr
-points to the end of the longest exact or substring match.
+is not
+.Dv NULL ,
+.Fa *estr
+is set to point to character in
+.Fa string
+after the last character that matched,
+which will be the terminating
+.Sq Li \&\e0
+if the entire string matched.
+If the pattern does not match the string,
+or a leading substring of the string,
+the result in
+.Fa *estr
+is undefined.
 .Fn pmatch
 uses the following metacharacters:
@@ -54,20 +66,70 @@
 .It Li \&?
 match any single character.
 .It Li *
-match any character 0 or more times.
+match any string of characters,
+including the empty string.
 .It Li \&[
-define a range of characters that will match.
-The range is defined by 2 characters separated by a
-.Sq Li \&- .
-The range definition has to end with a
+match any of a set of characters against a
+single character from the string.
+The set consists of individual characters,
+and ranges of characters.
+A range is defined by 2 characters separated by a
+.Sq Li \&\- ,
+and identifies all characters in the ASCII
+character set from the first to the second,
+The set definition ends with a
 .Sq Li \&] .
 .Sq Li ^
 following the
 .Sq Li \&[
-will negate the range.
+will negate the set,
+matching any character not included.
+The set of characters cannot be empty, a
+.Sq Li \&]
+occurring in a position that would create an empty
+set is instead treated as a character in the set.
+So, to include
+.Sq Li \&]
+in the set, place it first, immediately after
+.Sq Li \&[ ,
+or the
+.Sq Li \&^
+for a negated set.
+To include
+.Sq Li \&\-
+in the set, place it first or last.
+To include
+.Sq Li \&^
+place it anywhere but immediately after the
+.Sq Li \&[ .
+There are no other special characters inside a set,
+.Sq Li \&* ,
+.Sq Li \&?
+.Sq Li \&[
+are all treated simply as characters.
+There are no escape characters anywhere in the pattern.
+To match a literal
+.Sq Li \&*
+Use the sequence
+.Dq Li \&[*] ,
+and similarly to match a literal
+.Sq Li \&?
+.Sq Li \&[ .
 .Fn pmatch
-will return 2 for an exact match, 1 for a substring match, 0 for no match and
-\-1 if an error occurs.
+will return 2 if the whole string matched,
+1 for a substring match,
+0 for no match and
+\-1 if there was an error in the pattern,
+for which currently the only possible cause is a missing
+.Sq Li \&]
+to close a set.

# This is a shell archive.  Save it in a file, remove anything before
# this line, and then unpack it by entering "sh file".  Note, it may
# create directories; files and directories will be owned by you and
# have default permissions.
# This archive contains:
#	tpmatch.c
#	Data
echo x - tpmatch.c
sed 's/^X//' >tpmatch.c << 'END-of-tpmatch.c'
X/*
X * $NetBSD$
X *
X * Unlicensed undocumented anonymous public domain code.
X * (and very crappy code at that...)
X */
X
X#include <stdio.h>
X#include <stdlib.h>
X
Xint pmatch(const char *string, const char *pattern, const char **estr);
X
Xint
Xmain(void)
X{
X	char string[256];
X	char pattern[128];
X	int matchlen, result;
X	const char *estr;
X	int errs = 0;
X	int n;
X	int line = 0;
X
X	/* each input line: string pattern expected-match-length expected-result */
X	while (scanf("%s %s %d %d\n", string, pattern, &matchlen, &result) == 4) {
X		line++;
X		n = pmatch(string, pattern, &estr);
X		if (n != result) {
X			printf("pmatch: %d: result %d, expected %d\n",
X			    line, n, result);
X			errs++;
X			continue;
X		}
X		/* *estr is only meaningful when something matched */
X		if (n < 1)
X			continue;
X		if (string + matchlen != estr) {
X			printf("pmatch: %d: matched %d, expected %d\n",
X				line, (int)(estr - string), matchlen);
X			errs++;
X			continue;
X		}
X	}
X	exit(errs != 0);
X}
END-of-tpmatch.c
echo x - Data
sed 's/^X//' >Data << 'END-of-Data'
Xa a 1 2
Xa b 0 0
Xab b 0 0
Xab ab 2 2
Xab AB 0 0
Xab [aA][Bb] 2 2
Xabc ab 2 1
Xabc a?c 3 2
Xabc a* 3 2
Xabc *c 3 2
Xabc *b 2 1
Xabc bc 0 0
Xabc ??? 3 2
Xabc * 3 2
Xabc a*b*c 3 2
Xabc a*b 2 1
Xabc [a-c][a-z][a-c] 3 2
Xabc [abc][^d-z]? 3 2
Xabc a[^b]c 0 0
X[?] [[][?] 2 1
X[?] [[][?]] 3 2
X[?] [[][?][]] 3 2
X*[? [*][[][?] 3 2
Xabc [hello 0 -1
Xabc []abc 0 -1
Xabc []abc[] 1 1
Xa-b [a-b]-[a-b] 3 2
Xa-b [a-b][-] 2 1
Xa-c [^-z]-c 3 2
Xa-c [^z-][---]c 3 2
Xa^b a[^^]b 0 0
Xa^b a^b 3 2
Xa^b a[!^]b 3 2
END-of-Data
exit