Category Archives: Corpora

Read this: You can only query and download sentence shuffles!

As explained in Section 4.1 of Roland Schäfer (2015), Processing and querying large web corpora with the COW14 architecture, we have to take certain measures in order to stay within the bounds of German copyright law. This means that we only release sentence shuffles, i.e., corpora which are just bags of sentences. In other words, there are no documents in released versions of COW corpora, just single sentences without context. The original URL and some other metadata are recorded for each sentence, however.
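To make the idea of a sentence shuffle concrete, here is a minimal sketch in Python. It is not the COW toolchain and does not reflect the actual COW file format: the `Sentence` class, the `sentence_shuffle` function, and the example URLs are all hypothetical and serve only as an illustration. The sketch flattens documents into one bag of sentences, keeps the source URL with each sentence, and randomizes the order so that the original documents can no longer be read in sequence.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentence:
    """One sentence plus the metadata that survives the shuffle (hypothetical layout)."""
    text: str
    url: str  # source URL kept per sentence


def sentence_shuffle(documents, seed=None):
    """Flatten {url: [sentences]} into a single list and shuffle it.

    Document boundaries are discarded; only the per-sentence URL remains.
    """
    bag = [
        Sentence(text=text, url=url)
        for url, sentences in documents.items()
        for text in sentences
    ]
    random.Random(seed).shuffle(bag)  # seeded only to make the sketch repeatable
    return bag


# Made-up example input, not real corpus data.
documents = {
    "http://example.org/a": ["First sentence of A.", "Second sentence of A."],
    "http://example.org/b": ["Only sentence of B."],
}
shuffled = sentence_shuffle(documents, seed=0)
```

Each element of `shuffled` still carries its URL, but neighbouring sentences in the list no longer have to come from the same document.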

FRCOW16

FRCOW16 is the French web corpus by COW, created with the 2016 technology of the COW initiative. It is available through the portal for COW corpora at webcorpora.org. STATUS UPDATE: The planned release date is 30 June 2017 for NoSketchEngine and 15 July 2017 for the shuffle XML version.

ENCOW16

ENCOW16 is the English web corpus by COW, created with the 2016 technology of the COW initiative. It is available through the portal for COW corpora at webcorpora.org, which runs our custom Colibri² corpus portal software.

ENCOW14

ENCOW14 is the English web corpus by COW, created with the 2014 technology of the COW initiative. It is available through the portal for COW corpora at webcorpora.org, which runs our custom Colibri² corpus portal software.

FRCOW14

FRCOW14 was the planned French web corpus by COW, to be created with the 2014 technology of the COW initiative. Its release was delayed several times due to a high workload and because we had to improve the quality of the annotation. Since the corpus is now made entirely with COW16 technology, it is released as FRCOW16.