tech-userlevel archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: Google SoC Proposal Ideas



On Tue, Mar 24, 2009 at 07:17:48PM +0200, Aleksey Cheusov wrote:
> Using the fact that diff(1) should not ALWAYS work 100% correctly (100%
> correctness is when diff generates MINIMAL difference) you
> can implement heuristic even with O(N1+N2) algorithm that can work much
> faster in practice for huge files.

...and such heuristics can easily fall apart rather badly. Look at the
history of diff(1) in OpenBSD for the mentioned case. Also note that
correctness is different from giving optimal results, but that is
nitpicking.

> The same for grep. nfa2dfa has O(I^N) complexity where I is a number of
> elements in input alphabet and N is a number of terminals in input nfa.
> Matching is O(N) or O(N^2) depending on implementation (I'm not about
> submatching).

Pure DFAs are known to be rather suboptimal for many critical cases due
to the space complexity. Any implementation that is not O(n) for
backreference-free regular expression is broken. Look at the extensive
research before correcting me, please.

In short: a replacement for either tool must be at least mostly feature
complete and should not fall apart in terms of performance or suboptimal
output.

Joerg


Home | Main Index | Thread Index | Old Index