Subject: Re: PR/33392 CVS commit: src/dist/nawk
To: None <gnats-bugs@NetBSD.org>
From: Aleksey Cheusov <cheusov@tut.by>
List: netbsd-bugs
Date: 07/03/2006 13:41:08
>  | >  I think int is too wide. I made it unsigned short.
>  | 
>  | "640k is enough for everyone" ;)
>  | Seriously, I often use awk with very large regexps for my work.
>  | AFAIR, according to theory, the NFA for a regexp that looks like
>  | (a|b)*a(a|b)^(n-1) has an equivalent DFA with 2^n states, so 65536
>  | states of the DFA may correspond to an NFA with only 16 (!!!) terminal
>  | symbols.  IMHO this kind of internal limit is bad. I have read the
>  | NetBSD philosophy, but the reality is that hardware changes fast.
>  | My 5-year-old Athlon-800 with 384 MB of RAM is capable of processing
>  | DFAs with more than 2^16 states.
>  | So, I personally would prefer the 'int' type for the states.
>  
>  I thought that this is limited by NCHARS+3. I will change it.
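
To make the state-count argument concrete, here is a minimal sketch of
subset construction for the textbook regexp (a|b)*a(a|b)^(n-1), i.e.
"the n-th symbol from the end is a". The NFA layout, the bitmask
encoding and the names count_dfa_states() and as_16bit_counter() are my
own assumptions for illustration; this is not how nawk represents its
automaton internally.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Count the DFA states that subset construction reaches for
 * (a|b)*a(a|b)^(n-1).
 *
 * Hypothetical NFA (not nawk's internal representation):
 *   state 0:          loops on 'a' and 'b', also moves to state 1 on 'a'
 *   state i (1..n-1): moves to state i+1 on 'a' or 'b'
 *   state n:          accepting, no outgoing transitions
 * A DFA state is a set of NFA states, stored as a bitmask (bit i = state i).
 * Returns 0 if memory allocation fails.
 */
size_t
count_dfa_states(unsigned n)
{
	assert(n >= 1 && n <= 20);	/* keeps the tables below small */

	const uint32_t full = (UINT32_C(1) << (n + 1)) - 1;
	size_t nsets = (size_t)full + 1;
	unsigned char *seen = calloc(nsets, 1);
	uint32_t *queue = malloc(nsets * sizeof(*queue));
	size_t head = 0, tail = 0, count = 0;

	if (seen == NULL || queue == NULL) {
		free(seen);
		free(queue);
		return 0;
	}

	seen[1] = 1;			/* the start set is {0} */
	queue[tail++] = 1;

	while (head < tail) {
		uint32_t s = queue[head++];
		uint32_t next[2];
		int i;

		count++;
		/* on 'a': every state advances, and state 0 also stays */
		next[0] = ((s << 1) | 1) & full;
		/* on 'b': state 0 only stays, states 1..n-1 advance */
		next[1] = (((s & ~UINT32_C(1)) << 1) | 1) & full;

		for (i = 0; i < 2; i++) {
			if (!seen[next[i]]) {
				seen[next[i]] = 1;
				queue[tail++] = next[i];
			}
		}
	}
	free(seen);
	free(queue);
	return count;
}

/* What a 16-bit counter would hold after counting that many states. */
uint16_t
as_16bit_counter(size_t nstates)
{
	return (uint16_t)nstates;	/* reduces modulo 65536 */
}
```

Under these assumptions count_dfa_states(16) is 65536, which is already
one more than a 16-bit unsigned value can hold, so as_16bit_counter()
wraps it to 0. Each additional (a|b) in the regexp doubles the count.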

The changes you committed to HEAD for this PR look good to me:
everything works correctly and, for huge regexps, much faster than
gawk, which I have used for years.

1) Do you plan to notify Brian about the bug that was found?
2) Do you plan to add a regression test for awk?

>  | P.S.
>  | I saw the HEAD changes in the awk code and was surprised that
>  | lots of snprintf calls were changed to sprintf,
>  | and strlcpy to strcpy. Is this really ok?
>  
>  They were not done carefully so bugs were introduced and we decided
>  to back them out until someone does them carefully.
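
For reference, this is the kind of bounded copy I had in mind when I
asked. It is only a sketch: bounded_copy() is a made-up helper, not
something from the nawk sources, and I use snprintf with "%s" as a
stand-in for strlcpy because strlcpy is not available in every libc.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Copy src into dst, whose capacity is dstsize bytes including the NUL.
 * Returns 0 if the whole string fit, -1 if it was truncated or on error.
 * bounded_copy() is a hypothetical helper, not a function from nawk.
 */
int
bounded_copy(char *dst, size_t dstsize, const char *src)
{
	int n;

	assert(dst != NULL && src != NULL);
	if (dstsize == 0)
		return -1;	/* no room even for the terminating NUL */

	n = snprintf(dst, dstsize, "%s", src);
	/* snprintf returns the length it wanted to write, excluding the NUL */
	if (n < 0 || (size_t)n >= dstsize)
		return -1;
	return 0;
}
```

With an 8-byte buffer, copying "hello" succeeds, while copying a longer
string returns -1 and leaves a NUL-terminated 7-character prefix in the
buffer. A plain strcpy or sprintf in the same place would write past the
end of the buffer instead.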
>  
>  | P.P.S
>  | Where is nawk upstream? Who maintains that YYYYMMDD versions?
>  
>  From /usr/src/doc/3RDPARTY:
>  Package:        nawk
>  Version:        2005-04-24
>  Current Vers:   2005-04-24
>  Maintainer:     Brian Kernighan <bwk@bell-labs.com> (Lucent Technologies)
>  Archive Site:   http://cm.bell-labs.com/who/bwk/
>  Home Page:      http://cm.bell-labs.com/who/bwk/
>  


-- 
Best regards, Aleksey Cheusov.