Wednesday, 19 December 2012

Apache OpenNLP FR Models


Download here: the latest version of the models for several common Natural Language Processing tasks in French with Apache OpenNLP [1]

The models cover the following tasks: sentence segmentation, word tokenization, part-of-speech tagging, morphological inflection analysis*, lemmatization*, chunking, and Person|Organization|Location named entity recognition**

Except for the Named Entity models, the models have been built using the French Treebank corpus [2] (version 2010). Its licence does not prevent distributing analysis results under any licence, but it states that the FTB should be used for research purposes only. Consequently, we restrict the use of these models to research purposes.


* To be used with the tagger 

** Name Finder models have been built by Olivier Grisel. See [3] for more details.

To get the '.bin' files, unzip the archive and have a look at the '/opennlp/models/fr/' directory.
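
The unzip step can also be scripted. The following is a minimal sketch in Python (standard library only), assuming the download is an ordinary zip archive that contains the 'opennlp/models/fr/' directory, possibly under a top-level folder. The function names are hypothetical and only serve the illustration.

```python
import zipfile

# Directory named in the post (leading slash dropped, since zip entries are relative).
MODEL_DIR = "opennlp/models/fr/"


def list_model_files(archive_path, model_dir=MODEL_DIR):
    """Return the sorted names of the '.bin' entries located under model_dir."""
    with zipfile.ZipFile(archive_path) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if model_dir in name and name.endswith(".bin")
        )


def extract_model_files(archive_path, target_dir, model_dir=MODEL_DIR):
    """Extract only the '.bin' model files and return the paths written to disk."""
    names = list_model_files(archive_path, model_dir)
    with zipfile.ZipFile(archive_path) as archive:
        return [archive.extract(name, target_dir) for name in names]
```

Calling `extract_model_files` with the path of the downloaded archive and a target directory of your choice leaves only the model files on disk, keeping their relative paths.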

[2] For more on the French Treebank, see Abeillé, A., L. Clément, and F. Toussenel. 2003. 'Building a treebank for French', in A. Abeillé (ed.), Treebanks, Kluwer, Dordrecht. http://www.llf.cnrs.fr/Gens/Abeille/French-Treebank-fr.php

9 comments:

  1. I got this error message: "Model is a snapshot models are not supported by release versions!"
    How can I switch your model to a release version?

  2. As a matter of fact, the models were generated using a snapshot version of OpenNLP.
    Consequently, they are not supported by release versions.
    I am waiting for the upcoming 1.5.3 release to generate them again. This should come
    soon. Right now, the simple workaround I can propose is to provide the jar files I use. I have added them to the repository (you will have to fill in the form to get them).

    By the way, we are working on moving away from the current source corpus,
    whose licence is too restrictive. Leave me your email if you wish to be informed about that.

  3. From now on, always check the "Download here" link at the beginning of the article.
    It will lead you to a page where you can download the latest version of the models.

  4. Hello,

    I have been trying to use your models and I also get the snapshot version error. In the latest version, however, there are no jar files. I downloaded them from the previous version, but I don't know where to put them. Where should the jar files go?

    I created sentence recognizers, tokenizers, POS taggers and NER components from the available models (in this sequence) but, strangely, I only get this error for the tokenizer and not for any other enhancer in the chain.

  5. Hello, could you please explain a little how to use these models? I am unable to use them by putting them inside the datafiles directory. It seems that not all models are found there. Also, is it possible to have access to the training sets you used, especially for NER?

    Thanks!

  6. Dear Mohammad,

    As mentioned in my post:
    * the latest version of the models for several common Natural Language Processing tasks in French with Apache OpenNLP is available here: https://docs.google.com/spreadsheet/viewform?formkey=dGJ5ZmRvZnF0WEVUSXBPbWRzNjBIaEE6MQ
    * information about these models is available here:
    https://sites.google.com/site/nicolashernandez/resources/opennlp

    Models for the sentence splitter, tokenizer, part-of-speech tagger, morphological analysers and chunker have been built using the French Treebank corpus (version 2010) for OpenNLP 1.5.3.

    Models for Named Entity recognition (Person|Organization|Location) have been built by Olivier Grisel using Wikipedia and DBpedia dumps. These models have been built with/for OpenNLP 1.5.1 and can be used with the Name Finder.
    See more here http://blogs.nuxeo.com/development/2011/01/mining-wikipedia-with-hadoop-and-pig-for-natural-language-processing/

    Which OpenNLP version do you use?



  7. Hello,

    Thank you for your response. Yes, I saw the details that you mentioned in your comments, including the links for the NER model. To answer your question, I am actually using OpenNLP as part of Apache Stanbol, and I believe it uses the latest version (1.5.3).

    Since your models were also made with the same version, I thought they would work, but they don't seem to. I will try to use them directly with OpenNLP and see how it goes!

    As for the data sets, I noticed that you mentioned your sources. I will try to contact Olivier Grisel to see if he has a training/test set available. I have neither the infrastructure nor the goal to do what he did with Hadoop. A more reasonable goal for me would be to enhance the training set and build a new model suited to my use case.

    Thanks again!

  8. Can you give more specific steps on how to use your models?
    I tried installing the .tar.gz package in RStudio, but it always fails.

    Replies
    1. Dear,

      I have no experience with RStudio. Check version compatibility first: the models were generated for OpenNLP 1.5.3.
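
One way to check version compatibility, and to see whether a model triggers the snapshot error reported in the comments above, is to inspect the model file itself. The sketch below is in Python (standard library only) and rests on two assumptions: that an OpenNLP '.bin' model is a zip package containing a 'manifest.properties' entry, and that this manifest records the generating version under the key 'OpenNLP-Version' (for example '1.5.3-SNAPSHOT'). The function names are hypothetical, and the properties parsing is deliberately simplified.

```python
import zipfile


def read_model_manifest(model_path):
    """Parse 'manifest.properties' from a model package into a dict.

    Assumption: the model file is a zip package with this entry at its root.
    Simplified parser: plain 'key=value' lines, no escapes or continuations.
    """
    with zipfile.ZipFile(model_path) as package:
        text = package.read("manifest.properties").decode("iso-8859-1")
    manifest = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        manifest[key.strip()] = value.strip()
    return manifest


def is_snapshot_model(manifest):
    """True if the recorded version (assumed key 'OpenNLP-Version') is a snapshot."""
    return manifest.get("OpenNLP-Version", "").endswith("-SNAPSHOT")
```

If `is_snapshot_model` returns true for a given file, that model was generated with a snapshot build and a release version of OpenNLP is expected to reject it. Comparing the recorded version with the OpenNLP version you run is a quick first check before debugging anything else.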
