Re: Google SoC Proposal Ideas

To: tech-userlevel%NetBSD.org@localhost
Subject: Re: Google SoC Proposal Ideas
From: Aleksey Cheusov <vle%gmx.net@localhost>
Date: Tue, 24 Mar 2009 19:17:48 +0200

 >> My own idea is to reimplement a GNU/GPL utility under a BSD license,
 >> like grep or diff/diff3. 

> Both grep and diff can be a bit ambiguous in terms of algorithm
> complexity.
What exactly is ambiguous about diff's complexity?
Its complexity is O(N1*N2), where N1 and N2 are the numbers of lines
in the two files. I don't know how the GNU and OpenBSD diffs are
implemented, but diff is one of the well-known examples of "dynamic
programming".
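
A minimal sketch of the textbook dynamic-programming (longest common
subsequence) diff, assuming each file is already split into a list of
lines. The name `lcs_diff` is hypothetical, and this is not a claim
about how GNU or OpenBSD diff are implemented. The table has
(N1+1)*(N2+1) cells, which is where O(N1*N2) time and memory come from.

```python
def lcs_diff(a, b):
    """Minimal line diff via the classic LCS dynamic program.

    Fills an (len(a)+1) x (len(b)+1) table, so time and memory are
    O(N1*N2). Returns a list of (op, line) pairs, op in ' ', '-', '+'.
    """
    n1, n2 = len(a), len(b)
    # L[i][j] = length of the LCS of the suffixes a[i:] and b[j:]
    L = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for i in range(n1 - 1, -1, -1):
        for j in range(n2 - 1, -1, -1):
            if a[i] == b[j]:
                L[i][j] = L[i + 1][j + 1] + 1
            else:
                L[i][j] = max(L[i + 1][j], L[i][j + 1])
    # Walk the table from the top-left corner to emit the edit script.
    out, i, j = [], 0, 0
    while i < n1 and j < n2:
        if a[i] == b[j]:
            out.append((" ", a[i]))
            i += 1
            j += 1
        elif L[i + 1][j] >= L[i][j + 1]:
            out.append(("-", a[i]))
            i += 1
        else:
            out.append(("+", b[j]))
            j += 1
    # Whatever is left over on either side is a pure delete or insert.
    out.extend(("-", x) for x in a[i:])
    out.extend(("+", y) for y in b[j:])
    return out
```

For example, lcs_diff(["a", "b", "c", "d"], ["a", "c", "d", "e"])
keeps a, c and d, deletes b and inserts e.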

Since diff(1) does not ALWAYS have to be 100% correct (100% correctness
meaning that diff produces the MINIMAL difference), you can implement a
heuristic, even an O(N1+N2) one, that works much faster in practice on
huge files.
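
To illustrate the trade-off, here is a deliberately crude O(N1+N2)
heuristic (the name `fast_diff` and the strategy are hypothetical, for
illustration only): strip the longest common prefix and suffix, then
report everything in between as deleted from the old file and inserted
from the new one. The result is always a valid edit script, just not
necessarily a minimal one.

```python
def fast_diff(a, b):
    """Non-minimal O(N1+N2) line diff heuristic (illustrative only).

    Strips the longest common prefix and suffix, then reports the whole
    remaining middle of `a` as deleted and the middle of `b` as inserted.
    """
    n1, n2 = len(a), len(b)
    # Longest common prefix.
    p = 0
    while p < n1 and p < n2 and a[p] == b[p]:
        p += 1
    # Longest common suffix that does not overlap the prefix.
    s = 0
    while s < n1 - p and s < n2 - p and a[n1 - 1 - s] == b[n2 - 1 - s]:
        s += 1
    out = [(" ", x) for x in a[:p]]
    out += [("-", x) for x in a[p:n1 - s]]
    out += [("+", y) for y in b[p:n2 - s]]
    out += [(" ", x) for x in a[n1 - s:]]
    return out
```

With a = [a, b, c, d, e] and b = [a, c, d, x] this reports 7 changed
lines, where the minimal diff needs only 3 (-b, -e, +x); on inputs with
one localized change it is as good as the exact algorithm.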

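The same holds for grep. nfa2dfa (the subset construction) is
exponential in the worst case: the resulting DFA can have up to 2^N
states, each with I transitions, i.e. O(I*2^N), where I is the number
of elements in the input alphabet and N is the number of states in the
input NFA. Matching is O(N) or O(N^2), depending on the implementation
(I'm not talking about submatching).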
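
A minimal sketch of the subset construction, assuming an NFA without
epsilon moves, given as a dict `delta` mapping (state, symbol) to a set
of next states. The function names and representation are hypothetical,
not those of any real grep. Each DFA state is a set of NFA states, which
is why there can be up to 2^N of them, each with one transition per
alphabet symbol. Once the DFA exists, matching a string costs one table
lookup per input character.

```python
from collections import deque


def nfa_to_dfa(alphabet, delta, start, accepting):
    """Subset construction for an NFA without epsilon moves.

    Each DFA state is a frozenset of NFA states; only reachable subsets
    are built, but in the worst case there are 2**N of them.
    """
    start_set = frozenset([start])
    dfa = {}                      # frozenset -> {symbol: frozenset}
    queue = deque([start_set])
    while queue:
        cur = queue.popleft()
        if cur in dfa:
            continue
        dfa[cur] = {}
        for sym in alphabet:      # I transitions per DFA state
            nxt = frozenset(t for s in cur for t in delta.get((s, sym), ()))
            dfa[cur][sym] = nxt
            if nxt not in dfa:
                queue.append(nxt)
    # A DFA state accepts if it contains any accepting NFA state.
    dfa_accepting = {S for S in dfa if S & accepting}
    return dfa, start_set, dfa_accepting


def dfa_match(dfa, start, accepting, text):
    """Whole-string match: one lookup per character, O(len(text))."""
    state = start
    for ch in text:
        if ch not in dfa[state]:
            return False          # symbol outside the alphabet
        state = dfa[state][ch]
    return state in accepting
```

For instance, the NFA for (a|b)*a with delta = {(0, "a"): {0, 1},
(0, "b"): {0}} and accepting state 1 yields a 2-state DFA, whereas the
NFA for "the n-th symbol from the end is a" has n+1 states and yields
2^n DFA states.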

> grep for example is demanding if you want to implement
> multibyte support correctly and full GNU grep compatibility needs
> changes to the system regex library.
agc@ said he has a UTF-8 aware regexp engine.

> diff has some critical edge cases like a number of small changes for
> large files (think configure).
Please explain.

-- 
Best regards, Aleksey Cheusov.

Follow-Ups:
- Re: Google SoC Proposal Ideas
  - From: Joerg Sonnenberger

References:
- Google SoC Proposal Ideas
  - From: cberardi
- Re: Google SoC Proposal Ideas
  - From: Joerg Sonnenberger
