tech-userlevel archive


Re: Ordering of sh(1) pathname expansion



    Date:        Tue, 31 Jan 2023 02:29:02 +0000
    From:        Taylor R Campbell <campbell+netbsd-tech-userlevel%NetBSD.org@localhost>
    Message-ID:  <20230131022904.A3A4B60897%jupiter.mumble.net@localhost>

  | In sh(1) pathname expansion, the pattern doesn't constrain ordering,
  | only inclusion criteria.  It appears that NetBSD's sh(1) always sorts
  | lexicographically with strcmp(3), however.
  |
  | 1. Does POSIX sh guarantee strcmp(3) lexicographic ordering?

What POSIX says is (from XCU 2.13.3):

	those filenames and pathnames, sorted according to the
	collating sequence in effect in the current locale. If
	this collating sequence does not have a total ordering of
	all characters (see XBD Section 7.3.2, on page 127), any
	filenames or pathnames that collate equally shall be further
	compared byte-by-byte using the collating sequence for the
	POSIX locale.

So, "no".

That quote is from a draft of the next version that has not been publicly
released (now expected in 2024, I believe, by the time everything is done),
so the section number (perhaps) and page number (certainly) will not
match anything.  The reference in the quote is to the section which
defines LC_COLLATE.   "XBD" is the "Base Definitions" volume of
the standard (which contains definitions of all kinds of things, plus
specs of all required header files).

I think the actual section which contains that quote is new in the
draft; lots of text relating to pattern matching has been revised.
XCU is "Commands and Utilities", and XCU 2 is the shell.
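As a rough illustration of the quoted rule (not code from any shell), here is a minimal Python sketch: names are compared with locale.strcoll() under the current LC_COLLATE, and names that collate equally are then tie-broken on their raw bytes, which is what comparison in the POSIX locale amounts to in practice. The helper names posix_pathname_cmp and posix_sort are hypothetical.

```python
import functools
import locale
import os


def posix_pathname_cmp(a: str, b: str) -> int:
    """Compare two filenames following the quoted POSIX draft text.

    Primary key: the collating sequence of the current locale (LC_COLLATE),
    via locale.strcoll().  Names that collate equally are then compared
    byte-by-byte, as the POSIX locale would.
    """
    c = locale.strcoll(a, b)
    if c != 0:
        return c
    # Tie-break on the raw bytes of the filenames.
    ab, bb = os.fsencode(a), os.fsencode(b)
    return (ab > bb) - (ab < bb)


def posix_sort(names):
    """Return a new list of names ordered by posix_pathname_cmp()."""
    return sorted(names, key=functools.cmp_to_key(posix_pathname_cmp))
```

A script would normally call locale.setlocale(locale.LC_COLLATE, "") first so that the LANG/LC_ALL settings from the environment take effect; otherwise Python leaves LC_COLLATE at "C" and the result is essentially byte order.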

  | 2. Does NetBSD sh(1) guarantee strcmp(3) lexicographic ordering?

That's what it does, because sh(1) (along with many NetBSD utilities)
really knows nothing about locales.

If anyone would like to work on that, feel free, but note that it is
a minefield of contradictions: almost nothing about the way the
charset parts of locales are defined makes much sense at all, other than
as a way to allow users to select a (single) character encoding and
use that for everything.   The Plan 9 solution was much better.
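To make the strcmp(3) behaviour concrete, here is a rough Python model of single-component pathname expansion in a shell that ignores the locale. It is an approximation, not NetBSD's implementation: fnmatch.fnmatchcase() stands in for sh's own matcher (the two differ in corner cases such as backslash quoting), and the function name expand_like_strcmp_sh is hypothetical.

```python
import fnmatch
import os


def expand_like_strcmp_sh(pattern, names):
    """Model expansion of a one-component pattern against directory entries.

    Matching is approximated with fnmatch.fnmatchcase(); ordering is a
    strcmp(3)-style comparison of raw (unsigned) bytes, with no locale
    involvement at all.
    """
    matches = [
        n for n in names
        if fnmatch.fnmatchcase(n, pattern)
        # As in sh, a leading '.' must be matched explicitly.
        and (not n.startswith(".") or pattern.startswith("."))
    ]
    if not matches:
        # sh leaves a pattern that matches nothing unexpanded.
        return [pattern]
    # strcmp(3) compares unsigned bytes, so sort on the encoded name.
    return sorted(matches, key=os.fsencode)
```

For example, expand_like_strcmp_sh("*.c", os.listdir(".")) yields the matches in byte order whatever LANG or LC_ALL say.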

  | 3. Should the sh(1) man page be amended to specify the order?

Probably not, because it isn't guaranteed not to change.   The exception
is the C (aka POSIX) locale, which is what you get when you haven't
explicitly selected anything: for that locale, we are already doing
what is required.
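A quick way to check the C-locale point on a given system is a sketch like the following (the function name is hypothetical): with LC_COLLATE set to "C", ordering by locale.strcoll() should agree with a byte-wise, strcmp(3)-style sort, at least for ordinary filenames.

```python
import functools
import locale
import os


def c_locale_order_matches_strcmp(names):
    """Return True if, under LC_COLLATE="C", sorting with locale.strcoll()
    gives the same order as sorting on the raw bytes (strcmp-style)."""
    saved = locale.setlocale(locale.LC_COLLATE)  # query only, no change
    try:
        locale.setlocale(locale.LC_COLLATE, "C")
        by_collation = sorted(names, key=functools.cmp_to_key(locale.strcoll))
    finally:
        # Restore whatever collation the caller had selected.
        locale.setlocale(locale.LC_COLLATE, saved)
    by_bytes = sorted(names, key=os.fsencode)
    return by_collation == by_bytes
```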

kre


