Friday, 27 July 2012

Example uima-connectors CAS to CSV

This example uses uima-connectors to serialize some CAS annotations into CSV-formatted files. This is performed in two steps:
  1. First, create a text view formatted in CSV with the information you wish to serialize.
  2. Second, write the views to the file system. The second step is optional; you can do whatever you want with your view.


The first step is performed by the CAS2CSVAE. Each line of the '_CSVView' created in this step contains some feature values of one annotation whose type has been configured via a parameter. The columns of each line contain the values of the features of that annotation type; which features are exported is also configured via parameters.
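To make the "one annotation per line, one feature per column" layout concrete, here is a minimal sketch in plain Java. It does not use the UIMA API and is not the CAS2CSVAE implementation: the class CsvViewSketch, its methods, the comma separator and the quoting rule are all assumptions made for illustration. Each annotation is modelled as a simple map from feature name to value, and the feature names coveredText, posTag and stem are borrowed from the example project described further down.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of uima-connectors.
final class CsvViewSketch {

    // Quote a value when it contains the separator, a double quote or a line break
    // (assumed quoting rule; the real AE may behave differently).
    static String escape(String value, char separator) {
        if (value == null) {
            return "";
        }
        boolean needsQuotes = value.indexOf(separator) >= 0
                || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // One annotation becomes one line; the configured feature names give the column order.
    static String toCsvLine(Map<String, String> featureValues, List<String> featureNames, char separator) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < featureNames.size(); i++) {
            if (i > 0) {
                line.append(separator);
            }
            line.append(escape(featureValues.get(featureNames.get(i)), separator));
        }
        return line.toString();
    }

    // The whole view is the concatenation of the lines, one per annotation.
    static String toCsvView(List<Map<String, String>> annotations, List<String> featureNames, char separator) {
        StringBuilder view = new StringBuilder();
        for (Map<String, String> annotation : annotations) {
            view.append(toCsvLine(annotation, featureNames, separator)).append('\n');
        }
        return view.toString();
    }

    public static void main(String[] args) {
        Map<String, String> token = new HashMap<>();
        token.put("coveredText", "running");
        token.put("posTag", "VBG");
        token.put("stem", "run");
        List<String> columns = Arrays.asList("coveredText", "posTag", "stem");
        // Prints: running,VBG,run
        System.out.print(toCsvView(Collections.singletonList(token), columns, ','));
    }
}
```

In the real component, the annotation type and the feature list come from the descriptor parameters rather than from hard-coded values as in this sketch.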
The second step is performed by the ViewWriterAE. This AE requires each CAS to contain an org.apache.uima.examples.SourceDocumentInformation annotation, which is the case when you use the FileSystemCollectionReader or any tool from the Apache UIMA SDK (such as the DocumentAnalyzer). This annotation is used to name the exported views in the file system.
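As an illustration of how a source document reference can drive the name of an exported view, here is a small sketch in plain Java. It is not the ViewWriterAE code: the class ViewFileNameSketch, its methods and the ".csv" suffix are hypothetical, and the naming scheme actually used by the AE may differ. The input string stands for the document URI that a SourceDocumentInformation annotation would typically carry.

```java
// Hypothetical helper, not part of uima-connectors.
final class ViewFileNameSketch {

    // Keep only what follows the last '/' of the source URI, e.g. "doc1.txt".
    static String lastSegment(String sourceUri) {
        int slash = sourceUri.lastIndexOf('/');
        return slash >= 0 ? sourceUri.substring(slash + 1) : sourceUri;
    }

    // Assumed naming scheme: source file name followed by a configurable suffix.
    static String outputFileName(String sourceUri, String suffix) {
        return lastSegment(sourceUri) + suffix;
    }

    public static void main(String[] args) {
        // Prints: doc1.txt.csv
        System.out.println(outputFileName("file:/data/input/doc1.txt", ".csv"));
    }
}
```

Deriving the output name from the source document keeps a one-to-one mapping between input files and exported views, which is why the annotation has to be present in every CAS.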


Requirements
The uima-connectors library is available at http://code.google.com/p/uima-connectors.
It requires the uima-common library: http://code.google.com/p/uima-common/.


Example of use
In the example Eclipse project you will see how to turn TokenAnnotations, with the coveredText, posTag and stem values produced by the Apache UIMA addons analysis, into CSV-formatted files.
See the descriptor desc/xample-apacheAddons-uimaConnectors/example-apacheAddons-uimaConnectors-CAS2CSV-ViewWriter-AAE.xml
In particular, have a look at the aggregate and the parameter settings tabs of the descriptor editor.
The example project assumes you have installed the apache-uima and apache-uima-addons binaries. Check the build path before using it.