[Gardeners] library for tokenization of natural languages?
Ian Eslick
eslick at csail.mit.edu
Mon Feb 5 07:29:33 CST 2007
Check out langutils, and remind me to do another release soon!
On Feb 5, 2007, at 3:11 AM, Jean-Christophe Helary wrote:
> I am looking for a library that would do basic to reasonably smart
> tokenization of natural language strings.
>
> For example, if fed something in English or French, it creates
> tokens for the things between the spaces; for Japanese, it deals
> with the non-spaced strings in a rule-based fashion.
>
> I think Lucene can do that and so montezuma would be a candidate (?),
> but I wonder if any of you has experience with such tools, especially
> for languages that do not use spaces.
>
> Jean-Christophe Helary
>
>
>
>
> _______________________________________________
> Gardeners mailing list
> Gardeners at lispniks.com
> http://www.lispniks.com/mailman/listinfo/gardeners
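
To make the request above concrete, here is a minimal sketch of the behavior being asked for: split on whitespace for spaced languages, and fall back to a crude rule for Japanese by breaking wherever the script changes (kanji, hiragana, katakana). This is an illustrative assumption only, not how langutils, Lucene, or Montezuma work; the names char_class and tokenize and the chosen Unicode ranges are hypothetical.

```python
def char_class(ch):
    """Return a coarse class for one character (ranges are an assumed simplification)."""
    code = ord(ch)
    if ch.isspace():
        return "space"
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    if ch.isalnum():
        return "word"   # Latin letters (including accented ones) and digits
    return "punct"


def tokenize(text):
    """Split on whitespace, isolate punctuation, and break on script changes."""
    tokens = []
    current = ""
    current_class = None
    for ch in text:
        cls = char_class(ch)
        if cls == current_class and cls not in ("space", "punct"):
            # Same script as the running token: keep extending it.
            current += ch
            continue
        # Any other case ends the running token, if there is one.
        if current:
            tokens.append(current)
        if cls == "space":
            current, current_class = "", None
        elif cls == "punct":
            tokens.append(ch)          # punctuation is its own token
            current, current_class = "", None
        else:
            current, current_class = ch, cls
    if current:
        tokens.append(current)
    return tokens


if __name__ == "__main__":
    print(tokenize("Bonjour tout le monde"))
    print(tokenize("私は学生です"))
```

A script-change rule like this is only a rough heuristic: it splits a verb such as 行く into its kanji stem and its hiragana ending, and it cannot separate two adjacent words written in the same script. Dictionary-based or statistical segmenters are the usual choice when that matters.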