NetBSD-Bugs archive


bin/38108: single regexp implementation for NetBSD base system



>Number:         38108
>Category:       bin
>Synopsis:       single regexp implementation for NetBSD base system
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    bin-bug-people
>State:          open
>Class:          change-request
>Submitter-Id:   net
>Arrival-Date:   Tue Feb 26 20:30:00 +0000 2008
>Originator:     cheusov%tut.by@localhost
>Release:        NetBSD 4.0_STABLE
>Organization:
>Environment:
System: NetBSD chen.chizhovka.net 4.0_STABLE NetBSD 4.0_STABLE (GENERIC) #2: 
Tue Dec 25 17:42:38 EET 2007 
cheusov%chen.chizhovka.net@localhost:/srv/obj/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:
It would be nice for AWK, SED and GREP to use the same regexp engine
from NetBSD libc. Or at least AWK should be buildable with an external
regexp engine that supports UTF-8. The same goes for usr.bin/grep and sed.

Sun did this for their AWK years ago; see wip/heirloom-awk.
I did this for MAWK (wip/mawk-uxre) too.

I think the following function can convert an AWK regexp to an ERE, in place.

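/*
 * Rewrite an AWK regexp in place as a POSIX ERE: the C-style escapes
 * that AWK accepts (\n, \t, ...) are replaced by the characters they
 * denote; any other escape is passed through for the ERE engine.
 * The result is never longer than the input.
 */
void prepare_regexp (char *regexp)
{
   int bs = 0;            /* non-zero if the previous char was a backslash */
   char *tail = regexp;   /* write position; never runs ahead of the reader */
   char ch;

   while (ch = *regexp++, ch != 0){
      if (bs){
         switch (ch){
         case 'n': *tail++ = '\n';   break;
         case 't': *tail++ = '\t';   break;
         case 'f': *tail++ = '\f';   break;
         case 'b': *tail++ = '\b';   break;
         case 'r': *tail++ = '\r';   break;
         case 'a': *tail++ = '\07';  break;
         case 'v': *tail++ = '\013'; break;
         /* not an AWK-specific escape: keep the backslash */
         default:  *tail++ = '\\'; *tail++ = ch;
         }

         bs = 0;
      }else{
         if (ch == '\\'){
            bs = 1;
         }else{
            *tail++ = ch;
         }
      }
   }

   /* keep a dangling backslash so that regcomp() can report it */
   if (bs)
      *tail++ = '\\';

   *tail = 0;
}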
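
For illustration only, here is a rough sketch of how the converted
pattern could be handed to the libc regexp engine with regcomp(3) and
regexec(3). It repeats the conversion from this report (with a dangling
backslash preserved) so that it compiles on its own. The helper name
awk_regexp_match is hypothetical and is not part of any existing AWK
source; octal escapes and other AWK details are not handled.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Conversion from the report: rewrite AWK escapes in place. */
static void prepare_regexp(char *regexp)
{
    int bs = 0;
    char *tail = regexp;
    char ch;

    while ((ch = *regexp++) != 0) {
        if (bs) {
            switch (ch) {
            case 'n': *tail++ = '\n';   break;
            case 't': *tail++ = '\t';   break;
            case 'f': *tail++ = '\f';   break;
            case 'b': *tail++ = '\b';   break;
            case 'r': *tail++ = '\r';   break;
            case 'a': *tail++ = '\07';  break;
            case 'v': *tail++ = '\013'; break;
            default:  *tail++ = '\\'; *tail++ = ch;
            }
            bs = 0;
        } else if (ch == '\\') {
            bs = 1;
        } else {
            *tail++ = ch;
        }
    }
    if (bs)                 /* dangling backslash: let regcomp() reject it */
        *tail++ = '\\';
    *tail = 0;
}

/*
 * Hypothetical helper (an assumption, not existing code):
 * returns 1 if text matches the AWK regexp, 0 if it does not,
 * and -1 if memory allocation or regcomp() fails.
 */
static int awk_regexp_match(const char *awk_re, const char *text)
{
    regex_t re;
    int rc;
    size_t len = strlen(awk_re) + 1;
    char *ere = malloc(len);    /* the conversion works in place, so copy */

    if (ere == NULL)
        return -1;
    memcpy(ere, awk_re, len);
    prepare_regexp(ere);

    rc = regcomp(&re, ere, REG_EXTENDED | REG_NOSUB);
    free(ere);
    if (rc != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With this sketch, awk_regexp_match("a\\tb", "a\tb") is expected to
report a match, because the two-character escape \t in the AWK regexp
is turned into a literal TAB before regcomp() sees it.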

usr.bin/grep can also be adapted to use BRE/ERE without hacks for -f.

>Fix:

Unknown

