pkgsrc-Users archive


Re: textproc/cabocha memory usage fix



Hi,

The issues only occurred with the tournament models, which will be
deprecated in the next upstream release (as I heard from the author).
So I've removed those models from our package.

On Tue, 18 Feb 2014 10:54:58 +0900, Joerg Sonnenberger 
<joerg%britannica.bec.de@localhost> wrote:

Hi all,
the attached patch pushes the memory use of the "new" cabocha version
finally down below 2GB again. That's the limit I see in my bulk builds
and a very reasonable limit in general. The specific RAM use depends on
the STL implementation, e.g. libc++ has a 32-byte std::string
class as it is optimised for short strings. The patch works by using two
ideas:

(1) Avoid resizing the feature_trie_output vector. It is very large
(8M+ elements) and the pair_weight hash map is already huge (32M
elements).

(2) Avoid storing the stringified keys for as long as possible. Most
importantly, defer the stringification until after the point where
pair_weight has been freed again.
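
The two ideas above can be illustrated with a small, self-contained
sketch. This is not the actual patch: build_output, build_keys and the
PairWeight alias are hypothetical stand-ins, and only the names
pair_weight and feature_trie_output are borrowed from the description.
It assumes numeric ids that can be turned into strings late.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the large hash map described in the mail.
using PairWeight = std::unordered_map<std::uint64_t, float>;

// Idea (1): reserve the final size once, so push_back never has to
// reallocate and copy a multi-million element vector.
std::vector<int> build_output(std::size_t expected) {
  std::vector<int> feature_trie_output;
  feature_trie_output.reserve(expected);
  const std::size_t cap = feature_trie_output.capacity();
  for (std::size_t i = 0; i < expected; ++i)
    feature_trie_output.push_back(static_cast<int>(i));
  // No growth happened: capacity is unchanged after filling.
  assert(feature_trie_output.capacity() == cap);
  return feature_trie_output;
}

// Idea (2): keep keys in compact numeric form while the big map is
// alive, and only build std::string keys after the map is destroyed.
std::vector<std::string> build_keys(std::size_t n) {
  std::vector<std::uint64_t> ids;
  {
    PairWeight pair_weight;
    for (std::uint64_t i = 0; i < n; ++i)
      pair_weight[i] = 0.5f * static_cast<float>(i);

    ids.reserve(pair_weight.size());
    for (const auto &kv : pair_weight)
      ids.push_back(kv.first);
  }  // pair_weight goes out of scope here and its memory is released

  // unordered_map iteration order is unspecified, so sort for stable output.
  std::sort(ids.begin(), ids.end());

  std::vector<std::string> keys;
  keys.reserve(ids.size());
  for (std::uint64_t id : ids)
    keys.push_back(std::to_string(id));  // stringify only now
  return keys;
}
```

With this ordering the peak never includes both the hash map and the
full set of string keys at the same time, which is the point of (2).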

The code can likely be optimised for speed since e.g. compareIds can
probably avoid the stringification, but getting it to work was my
priority.

Joerg



--
OBATA Akio / obata%lins.jp@localhost

