[Gardeners] library for tokenization of natural languages ?

Jean-Christophe Helary fusion at mx6.tiki.ne.jp
Mon Feb 5 19:12:06 CST 2007


On 5 Feb 07, at 22:29, Ian Eslick wrote:

> check out langutils and remind me to do another release soon!

Thanks!

Do another release soon!

:)

JC Helary

> On Feb 5, 2007, at 3:11 AM, Jean-Christophe Helary wrote:
>
>> I am looking for a library that would do basic to reasonably smart
>> tokenization of natural language strings.
>>
>> For example, given English or French input, it would create tokens from
>> the text between spaces; for Japanese, it would segment the unspaced
>> strings in a rule-based fashion.
>>
>> I think Lucene can do that, so Montezuma would be a candidate (?),
>> but I wonder whether any of you have experience with such tools, especially
>> for languages that do not use spaces.
>>
>> Jean-Christophe Helary
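
For illustration only, here is a rough sketch of the behaviour asked for
above, written in Python rather than Lisp. It is not how langutils, Lucene
or Montezuma work; the function names and the script ranges are assumptions
made up for this example. Spaced text is split on whitespace, and unspaced
Japanese is cut wherever the script changes (kanji, hiragana, katakana),
which is about the simplest rule-based approach one could try.

```python
from itertools import groupby


def script_of(ch):
    """Classify a character into a coarse class (illustrative ranges only)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    if ch.isspace():
        return "space"
    if ch.isalnum():
        return "alnum"
    return "other"  # punctuation and anything not covered above


def tokenize(text):
    """Group consecutive characters of the same class; drop whitespace runs."""
    tokens = []
    for cls, run in groupby(text, key=script_of):
        if cls == "space":
            continue
        tokens.append("".join(run))
    return tokens


print(tokenize("check out langutils"))
print(tokenize("私は東京タワーに行く"))
```

Splitting on script changes is crude: it separates a verb stem from its
hiragana ending and cannot find word boundaries inside a run of a single
script, so a dictionary-based analyzer would likely be needed for anything
beyond basic use.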

