[Gardeners] library for tokenization of natural languages?
Jean-Christophe Helary
fusion at mx6.tiki.ne.jp
Mon Feb 5 19:12:06 CST 2007
On 5 Feb. 07, at 22:29, Ian Eslick wrote:
> Check out langutils, and remind me to do another release soon!
Thanks!
Do another release soon!
:)
JC Helary
> On Feb 5, 2007, at 3:11 AM, Jean-Christophe Helary wrote:
>
>> I am looking for a library that would do basic to reasonably smart
>> tokenization of natural language strings.
>>
>> For example, given English or French text, it would create tokens
>> for the things between the spaces; given Japanese, it would segment
>> the unspaced strings in a rule-based fashion.
>>
>> I think Lucene can do that, so montezuma would be a candidate (?),
>> but I wonder whether any of you have experience with such tools,
>> especially for languages that do not use spaces.
>>
>> Jean-Christophe Helary
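For readers unfamiliar with the distinction drawn in the question, here is a minimal Python sketch of the two behaviours described: splitting on whitespace for English or French, and a naive rule-based segmentation for unspaced Japanese that starts a new token whenever the script changes (kanji, hiragana, katakana, other). This is not the API of langutils, montezuma, or Lucene; the names `script_of` and `tokenize` are hypothetical, and the Unicode ranges cover only the basic Hiragana, Katakana, and CJK Unified Ideographs blocks.

```python
from __future__ import annotations

import unicodedata


def script_of(ch: str) -> str:
    """Assign a character to a coarse class (heuristic, basic blocks only)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    if ch.isspace():  # also matches the ideographic space U+3000
        return "space"
    if unicodedata.category(ch).startswith("P"):  # any punctuation category
        return "punct"
    return "other"  # Latin letters, digits, symbols, everything else


def tokenize(text: str) -> list[str]:
    """Split on whitespace, emit punctuation as separate tokens, and
    break unspaced runs wherever the character class changes."""
    tokens: list[str] = []
    current = ""
    current_class = None
    for ch in text:
        cls = script_of(ch)
        if cls in ("space", "punct"):
            # Whitespace and punctuation both close the current token.
            if current:
                tokens.append(current)
            if cls == "punct":
                tokens.append(ch)
            current, current_class = "", None
            continue
        if current and cls != current_class:
            # Script boundary inside an unspaced run: start a new token.
            tokens.append(current)
            current = ""
        current += ch
        current_class = cls
    if current:
        tokens.append(current)
    return tokens


if __name__ == "__main__":
    print(tokenize("Hello, world!"))
    print(tokenize("私は東京に行きます。"))
```

With these inputs the sketch yields `['Hello', ',', 'world', '!']` and `['私', 'は', '東京', 'に', '行', 'きます', '。']`. The second result shows the limits of script-change rules: the verb 行きます is split into its kanji stem and kana ending. Production Japanese segmenters are typically dictionary-based morphological analyzers such as MeCab.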