NetBSD-Bugs archive

bin/50179: sh(1) variable expansion bug

>Number:         50179
>Category:       bin
>Synopsis:       ${var#?} removes two instead of one character if $var starts with a \201-byte
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Aug 27 00:40:00 +0000 2015
>Originator:     Timo Buhrmester
>Release:        NetBSD 7.99.20
>Environment:
System: NetBSD frozen.localdomain 7.99.20 NetBSD 7.99.20 (FRZKERN32) #0: Wed Aug 12 01:07:44 CEST 2015 build@frozen.localdomain:/usr/build/obj-frozen32-i386-i386/sys/arch/i386/compile/FRZKERN32 i386
Architecture: i386
Machine: i386
>Description:
	There is a bug in sh(1) that can be triggered with the following lines:

	$ orig="$(printf '\201foo')" # ---------------------(1)
	$ oneless="${orig#?}"        # ---------------------(2)
	$ echo "$oneless"            # prints 'oo' instead of 'foo'

	In (1), a string that starts with a \201-byte (aka 0x81, CTLESC) followed by 'foo' is created and assigned to `orig`.
	In (2) we're then trying to nibble off the first byte, so $oneless ought to be 'foo'.  However, sh(1) removes *two* bytes, leaving $oneless as just 'oo'.  For prefixes other than \201, the shell behaves correctly.
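
	To confirm at the byte level that two bytes go missing rather than one, the result can be inspected with wc(1) and od(1).  This is only a sketch: `byte_count` is a hypothetical helper name that is not part of the report, and LC_ALL=C is an assumption made so that `?` stands for exactly one byte.

```shell
#!/bin/sh
# Assumption: a single-byte locale, so that '?' matches exactly one byte.
LC_ALL=C; export LC_ALL

# Hypothetical helper: print the number of bytes in $1.
byte_count() {
	printf '%s' "$1" | wc -c | tr -d ' '
}

orig="$(printf '\201foo')"
oneless="${orig#?}"

# If '#?' works as intended, oneless is exactly one byte shorter than orig.
printf 'orig:    %s bytes\n' "$(byte_count "$orig")"
printf 'oneless: %s bytes\n' "$(byte_count "$oneless")"

# Dump the remaining bytes so that a missing 'f' is visible.
printf '%s' "$oneless" | od -c
```

	A difference of two between the two counts would match the behavior described above.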

	Other shells (ksh from base, bash from pkgsrc and FreeBSD's sh(1)) do not have this behavior.
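
	For comparing shells, or for checking that only the \201 prefix is affected, a small script along the following lines could be run under each shell in turn.  It is a sketch, not a test suite: `strip_first` is a hypothetical helper, the list of prefixes is an arbitrary choice, and LC_ALL=C is assumed.

```shell
#!/bin/sh
# Assumption: a single-byte locale, so that '?' matches exactly one byte.
LC_ALL=C; export LC_ALL

# Hypothetical helper: build "<prefix>foo", strip one character with ${var#?}
# and print what is left.  $1 is a printf(1) escape such as '\201'.
strip_first() {
	orig="$(printf "${1}foo")"
	printf '%s\n' "${orig#?}"
}

# Arbitrary sample of prefixes: a plain ASCII byte and the neighbors of \201.
for prefix in 'x' '\200' '\201' '\202'; do
	rest="$(strip_first "$prefix")"
	if [ "$rest" = foo ]; then
		printf '%s: ok\n' "$prefix"
	else
		printf "%s: expected 'foo' but got '%s'\n" "$prefix" "$rest"
	fi
done
```

	Saved under a name of your choosing, it can be run as, for example, `sh script`, `ksh script` and `bash script` to see which shells report a mismatch.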

	The code responsible seems to be (src/bin/sh/expand.c, function subevalvar):
	| case VSTRIMLEFT:  // <------------------- the ${var#prefix} case
	| 	for (loc = startp; loc < str; loc++) {
	| 		c = *loc;
	| 		*loc = '\0';
	| 		if (patmatch(str, startp, varflags & VSQUOTE))
	| 			goto recordleft;
	| 		*loc = c;
	| 		if ((varflags & VSQUOTE) && *loc == CTLESC)
	| 		        loc++; // <--------- Oops.
	| 	}
	The loop tries to match the `pattern`-part of ${var#pattern}, pointed to by `str`, against bigger and bigger prefixes of $var, pointed to by `startp`.
	In the second iteration, *loc is \201, also known as CTLESC, and the VSQUOTE bit in varflags is set as well.  The bottom conditional therefore (mistakenly) executes `loc++` and so also consumes the byte that follows, the 'f' in 'foo'.

	I'm not familiar with when and why the shell needs to insert its own control characters (like CTLESC) into strings, but in this case it clearly should not interpret them.


>How-To-Repeat:
	badbyte='\201'  # printf(1) escape for the 0x81 (CTLESC) byte, as in the example above
	orig="$(printf "${badbyte}foo")" # -------------------------(1)
	oneless="${orig#?}"  # -------------------------------------(2)

	if [ "$oneless" != foo ]; then
		printf "Expected 'foo' but got '%s'\n" "$oneless"
	fi
>Fix:
	None known so far.
