On Mon, Dec 03, 2012 at 04:06:06AM +0100, Alistair Crooks wrote:
> On Sun, Dec 02, 2012 at 07:35:51PM -0500, Thomas Dickey wrote:
> > On Mon, Dec 03, 2012 at 12:16:57AM +0100, Alistair Crooks wrote:
> > >
> > > It's long been a pet peeve of mine that regexp matching for word
> > > boundaries has an annoying dependency on the implementation.
> > >
> > > My thanks to the many people who helped out with this table, reproduced
> > > below, which shows what works, and what doesn't work, when attempting
> > > to match the zero-width pattern at a word boundary.
> >
> > \b is a perl feature
> >
> > man perlre explains about that, and \<
> > (BRE's versus ERE's, essentially).
>
> Let's look at what happens:
>
> vile-9.8nb1 on NetBSD/amd64 6.99.10, vile /usr/share/dict/words
> /\<arch
> cursor is placed at the start of the word "arch" on line 12397, as I
> would expect.
>
> vile-9.8e on FreeBSD/amd64 9.0-RC1 (I know, I know), vile
> /usr/share/dict/words
> /\<arch
> cursor is placed on the "arch" in "agonistarch" on line 4109
> i.e. \< as a word boundary is not respected.
> /\barch results in a "not found"
>
> now another try:
hmm - 9.8j is current.  I don't recall any recent regex changes or related
fixes.  I have 9.8i on FreeBSD 9, and don't see this behavior.  I'll make
a to-do to investigate the port's configuration...

\b in vile means almost the same as \s:

	\b is [[:blank:]]
	\s is [[:space:]]

so \s includes \r, \n and \f while \b does not.

(\b has been there since early 2001 - seems that I added it as part
of the character-class changes).
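
To make the difference between the two classes concrete, here is a small
sketch in Python.  Python's re module has no POSIX bracket classes, so the
two sets are written out by hand for the C locale; the names BLANK, SPACE
and only_in_space are purely illustrative and are not taken from vile.

```python
# Hand-written C-locale equivalents of the two POSIX classes.
# Assumption: these literal sets stand in for [[:blank:]] and [[:space:]],
# which Python's re module does not provide.
BLANK = set(" \t")              # [[:blank:]]: space and tab
SPACE = set(" \t\n\v\f\r")      # [[:space:]]: blank plus \n, \v, \f, \r

# Characters that \s ([[:space:]]) accepts but \b ([[:blank:]]) does not.
only_in_space = sorted(SPACE - BLANK)
print([repr(c) for c in only_in_space])
```
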
> vile-9.8e on FreeBSD/amd64 9.0-RC1, vile /etc/motd
> The text reads
> ...
> Welcome to FreeBSD!
> ...
> /\<to
> Results in "not found"
> so now let's use the one derived from perl regexps
> /\bto
> and the word "to" is found.
> (unfortunately, the cursor is placed on the space before the word "to".
> So, it's not quite zero-width, and some people may find that close
> enough.
> Again, unfortunately, I'm not one of them).
>
> not quite what I'd expect from RTFM, but thanks for the suggestion.
>
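
The difference between a zero-width boundary and a pattern that consumes
the preceding blank can be sketched with Python's re module, where \b is
the Perl-style assertion.  This only illustrates the two behaviours
described above; it is not vile's regex engine, and the [ \t] pattern is an
assumed stand-in for a \b that means [[:blank:]].

```python
import re

text = "Welcome to FreeBSD!"

# Perl-style \b is a zero-width assertion: the match starts on the "t".
zero_width = re.search(r"\bto", text)

# A \b that stands for [[:blank:]] consumes one character instead, so
# the match starts on the space before "to" (emulated here with [ \t]).
consuming = re.search(r"[ \t]to", text)

# A zero-width boundary also rejects the "arch" inside "agonistarch".
inside_word = re.search(r"\barch", "agonistarch")

print(zero_width.span(), consuming.span(), inside_word)
```

The one-column difference between the two start offsets is the "cursor on
the space" effect reported above.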
> > > regexp word boundaries
> > >                 \<      \b      [[:<:]]
> > > perl            not     works   not
> >
> > (see manpage, as noted)
>
> I think it would probably be best if you viewed what I wrote as a general
> criticism of the trainwreck that is regexp word boundary matching, rather
> than pointing me at a manual page for one of the programs involved.
sure - but reading it, I look for things to improve (or fix).
\< should work because it's the most standardized.
\b is perl (perl will never be standardized - so I've read :-)
[[:<:]] is... (let's not digress)
I'd forgotten about \b in vile actually, but given compatibility issues
I could change its behavior to more closely match perl's (and added
it to my to-do list to investigate). Doing that would lose the nice
feature that all of the character classes have an abbreviation.
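
For comparison, the word-boundary spellings from the table can be mapped
onto a single engine.  The following is a naive sketch in Python, whose re
module only understands \b; to_python, START and END are hypothetical names
invented for this example, and the plain textual replacement is an
assumption that ignores escaped backslashes and bracket expressions.

```python
import re

# Start-of-word and end-of-word expressed with Python's \b plus a
# lookaround, so that both stay zero-width.
START = r"\b(?=\w)"     # stands in for \< and [[:<:]]
END = r"\b(?<=\w)"      # stands in for \> and [[:>:]]

def to_python(pattern: str) -> str:
    """Rewrite \\<, \\>, [[:<:]] and [[:>:]] into Python re syntax.

    Naive: does not handle an escaped backslash before "<" or ">", nor
    these sequences appearing inside a bracket expression.
    """
    for old, new in ((r"\<", START), ("[[:<:]]", START),
                     (r"\>", END), ("[[:>:]]", END)):
        pattern = pattern.replace(old, new)
    return pattern

# Usage: both spellings find the standalone word, not the embedded one.
print(re.search(to_python(r"\<arch"), "agonistarch"))
print(re.search(to_python("[[:<:]]to"), "Welcome to FreeBSD!").span())
```
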
> > > freebsd vile    not     works   not
> > > netbsd vile     works   not     not
> >
> > without version numbers, I can only guess what you're referring to with
> > vile. \< has been part of vile for a long time; \b is different from perl
> > (vile matches whitespace rather than a word boundary). Both are in the
> > help-file. See
> >
> > http://invisible-island.net/vile/vile-toc.html
> > http://invisible-island.net/vile/vile-hlp.html#regular-expressions2
>
> Thanks - I remember fixing the \< zero-width matching in the mid 1990s
> on vile, and Paul merged the fix. Unfortunately, your change log only
> goes back as far as 1999 when the license was changed to the GPL (and
> when I stopped working on vile), so there's no record of anything going
> back that far.
All of the changelogs are in the sources - the practice used to be that
we would rename CHANGES to CHANGES.Rx, but I stopped doing that a while
back (filesizes aren't as important).
--
Thomas E. Dickey
http://invisible-island.net
ftp://invisible-island.net