A web-derived Spanish corpus crawled in the ES top-level domain in late 2011. Created with texrex-mrvain (deprecated) and TreeTagger.
Category Archives: Corpora
NLCOW12
A web-derived Dutch corpus crawled in the NL top-level domain in 2012. Created with texrex-mrvain (deprecated) and TreeTagger.
SECOW12
A web-derived Swedish corpus crawled in the SE top-level domain in late 2011. Created with texrex-mrvain (deprecated) and HunPos.
DECOW12
A web-derived German corpus crawled in the DE top-level domain in late 2011. Created with texrex-mrvain (deprecated) and TreeTagger. DECOW12Q is a subset containing only documents written in a quasi-spontaneous register, selected based on occurrences of cliticized variants of the indefinite article, as described in Schäfer and Sayatz (2014), Die Kurzformen des Indefinitartikels im Deutschen, Zeitschrift für Sprachwissenschaft 33(2).