This directory contains the data of the Potsdam Twitter Sentiment
Corpus (ISLRN 714-621-985-491-3). To open the files of this
corpus, you need to download and launch MMAX2, a freely distributed
annotation tool, and then select one of the *.mmax projects from the
directories corpus/annotator-1/ or corpus/annotator-2/.
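If you are unsure which projects are available, a short script can list them. The sketch below is only an illustration: the function name find_mmax_projects is hypothetical, and it assumes that the *.mmax files sit directly inside the annotator directory named above and that you run it from the root of this project.

```python
from pathlib import Path


def find_mmax_projects(corpus_root, annotator="annotator-2"):
    """Return the sorted *.mmax project files of one annotator directory.

    Assumption: the project files are located directly in
    <corpus_root>/<annotator>/ and not in a subdirectory.
    """
    annotator_dir = Path(corpus_root) / annotator
    # Only the top level of the annotator directory is searched.
    return sorted(annotator_dir.glob("*.mmax"))


if __name__ == "__main__":
    # Print every project that could be opened in MMAX2.
    for project in find_mmax_projects("corpus"):
        print(project)
```

Any of the printed paths can then be selected in MMAX2.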
The folders of this project are structured as follows:
- corpus/ – directory containing corpus files;
  - annotator-1/ – directory containing MMAX projects for the first annotator;
    - markables/ – directory containing annotation files for the first annotator;
  - annotator-2/ – directory containing MMAX projects for the second annotator;
    - markables/ – directory containing annotation files for the second annotator;
  - basedata/ and source/ – original corpus tokenization;
  - custom/, scheme/, and style/ – auxiliary MMAX2 data;
- docs/ – directory containing annotation guidelines and other accompanying documents;
- scripts/ – directory containing scripts that were used to process corpus data;
  - examples/ – directory containing examples of input files for the scripts;
  - align.py – auxiliary module used for annotation alignment;
  - alt_fio.py – auxiliary module for AWK-like input/output operations;
  - conll.py – auxiliary module for handling CONLL sentences;
  - measure_corpus_agreement.py – script for measuring corpus agreement;
  - merge_conll_mmax.py – script for aligning annotation from the corpus with the automatically processed CONLL data.
You can find example invocations in the script files themselves, or
run a script with the --help option to print its usage.
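As a rough sketch, the usage messages of both scripts can be requested in one go. The helper help_command is hypothetical, and the code assumes that it is run from the root of this project and that the scripts work with the interpreter running the sketch; if they need a different Python version, pass that interpreter explicitly.

```python
import subprocess
import sys
from pathlib import Path

# Script names are taken from the directory listing in this README; the
# location of scripts/ relative to the working directory is an assumption.
SCRIPTS = ("measure_corpus_agreement.py", "merge_conll_mmax.py")


def help_command(script, scripts_dir="scripts", interpreter=sys.executable):
    """Build the argument list that asks one script for its usage message."""
    return [interpreter, str(Path(scripts_dir) / script), "--help"]


if __name__ == "__main__":
    for script in SCRIPTS:
        # check=False: show whatever the script prints, even if it exits
        # with a non-zero status.
        subprocess.run(help_command(script), check=False)
```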
I strongly recommend using the annotation of annotator-2 on the branch eexpression-revision (run git checkout eexpression-revision after cloning this project).