Re: Google SoC Proposal Ideas

To: tech-userlevel%NetBSD.org@localhost
Subject: Re: Google SoC Proposal Ideas
From: Joerg Sonnenberger <joerg%britannica.bec.de@localhost>
Date: Tue, 24 Mar 2009 18:27:55 +0100

On Tue, Mar 24, 2009 at 07:17:48PM +0200, Aleksey Cheusov wrote:
> Using the fact that diff(1) should not ALWAYS work 100% correctly (100%
> correctness is when diff generates MINIMAL difference) you
> can implement heuristic even with O(N1+N2) algorithm that can work much
> faster in practice for huge files.

...and such heuristics can easily fall apart rather badly. Look at the
history of diff(1) in OpenBSD for the mentioned case. Also note that
correctness is different from giving optimal results, but that is
nitpicking.

> The same for grep. nfa2dfa has O(I^N) complexity where I is a number of
> elements in input alphabet and N is a number of terminals in input nfa.
> Matching is O(N) or O(N^2) depending on implementation (I'm not about
> submatching).

Pure DFAs are known to be rather suboptimal for many critical cases due
to the space complexity. Any implementation that is not O(n) for
backreference-free regular expression is broken. Look at the extensive
research before correcting me, please.

In short: a replacement for either tool must be at least mostly feature
complete and should not fall apart in terms of performance or suboptimal
output.

Joerg

Follow-Ups:
- Re: Google SoC Proposal Ideas
  - From: Aleksey Cheusov

References:
- Google SoC Proposal Ideas
  - From: cberardi
- Re: Google SoC Proposal Ideas
  - From: Joerg Sonnenberger
- Re: Google SoC Proposal Ideas
  - From: Aleksey Cheusov

Prev by Date: Re: Google SoC Proposal Ideas
Next by Date: Re: Google SoC Proposal Ideas
Previous by Thread: Re: Google SoC Proposal Ideas
Next by Thread: Re: Google SoC Proposal Ideas
Indexes:

Home | Main Index | Thread Index | Old Index