Wednesday, 19 December 2012

Apache UIMA HMM Tagger FR Models

Download here: Models for the Apache UIMA Hidden Markov Model Tagger Annotator [1] (from the sandbox UIMA Addons)

The models cover the following tasks:
* Part of speech tagging (POS)
* Grammatical subcategorization (Subcat)
* Morphological inflection analysis (Mph)
* Lemmatization (canonical form)
* Ee analysis (POS + Subcat + Mph)

The models were built with version 2.4 of the addon, using the French Treebank (FTB) corpus [2] (2010 version). The FTB licence does not prevent distributing analysis results under any licence, but it states that the FTB should be used for research purposes only. Consequently, we restrict the use of these models to research purposes.

To get the '.dat' files, unzip the archive and look in the '/HMMTrainerTagger/french/' directory.
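
For readers who prefer to script the extraction, here is a minimal Python sketch that pulls only the '.dat' files out of the downloaded archive. The function name `extract_dat_models` and the archive file name used in the usage note are hypothetical. Only the 'HMMTrainerTagger/french/' directory comes from this post, and the sketch assumes the download is a regular zip file.

```python
import zipfile
from pathlib import Path


def extract_dat_models(archive_path, dest_dir, subdir="HMMTrainerTagger/french/"):
    """Extract every '.dat' entry located under `subdir`; return the extracted paths.

    `subdir` defaults to the directory mentioned in the post. The substring
    match also accepts archives that wrap everything in a top-level folder.
    """
    extracted = []
    with zipfile.ZipFile(archive_path) as archive:
        for name in archive.namelist():
            if subdir in name and name.endswith(".dat"):
                # extract() creates missing parent directories under dest_dir
                extracted.append(Path(archive.extract(name, dest_dir)))
    return sorted(extracted)
```

Calling `extract_dat_models("french-models.zip", "models")` (with a placeholder archive name) would return the paths of the extracted model files, which can then be referenced from the tagger's configuration.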

[1] http://uima.apache.org/sandbox.html#tagger.annotator
[2] For more on the French Treebank, see Abeillé, A., L. Clément, and F. Toussenel. 2003. 'Building a treebank for French', in A. Abeillé (ed.) Treebanks, Kluwer, Dordrecht. http://www.llf.cnrs.fr/Gens/Abeille/French-Treebank-fr.php
