tech-userlevel archive


Re: List of Keywords for apropos(1) Which Should Not be Stemmed



On Mon, Jul 11, 2016 at 08:59:05PM +0530, Abhinav Upadhyay wrote:
> 
> Thanks, that would be a good starting point too. I guess we will still
> have to add few words to the list manually later, but it should be
> good to begin with.
> 

How about checking the length of the word? Technical abbreviations tend
to be short (predominantly <= 4 characters).  According to grep,
/usr/share/dict/words contains 155 two-letter words, 1358 three-letter
words and 5124 four-letter words (assuming my driving of grep is
correct).  So it could be feasible to hash just the short words in the
dictionary: if a short word matches, stem it; otherwise assume it is a
technical abbreviation and don't stem.
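
A minimal sketch of that check, in Python purely for illustration
(apropos(1) itself is written in C).  The names load_short_words,
should_stem and MAX_SHORT_LEN are hypothetical, and the sketch assumes
that words longer than four characters are always stemmed:

```python
MAX_SHORT_LEN = 4  # assumed cutoff, from the "<= 4 characters" observation


def load_short_words(lines, max_len=MAX_SHORT_LEN):
    """Collect dictionary words of at most max_len characters into a set."""
    short = set()
    for line in lines:
        word = line.strip().lower()
        if word and len(word) <= max_len:
            short.add(word)
    return short


def should_stem(word, short_words, max_len=MAX_SHORT_LEN):
    """Decide whether a word should be passed to the stemmer.

    Long words are always stemmed (an assumption of this sketch).
    Short words are stemmed only if they appear in the dictionary set;
    otherwise they are treated as technical abbreviations.
    """
    if len(word) > max_len:
        return True
    return word.lower() in short_words


if __name__ == "__main__":
    # Tiny in-memory stand-in for /usr/share/dict/words.
    sample = ["cat", "file", "lock", "directory", "of"]
    short_words = load_short_words(sample)
    for w in ("file", "mmap", "directories"):
        print(w, should_stem(w, short_words))
```

Against the real dictionary, the set could be built by passing an open
file, e.g. load_short_words(open("/usr/share/dict/words")).  A set
lookup stands in here for whatever hash table the real code would use.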

-- 
Brett Lymn

