Category Archives: COW16

RanDECOW17

RanDECOW17 is a German web corpus by COW created with the 2016 technology of the COW initiative. It is not based on breadth-first crawls; instead, it was “crawled” using the ClaraX research crawler developed in Roland Schäfer’s third-party-funded project Linguistic Web Characterisation.

RanDECOW17 was released in 2019 including the COReX document feature annotation. A version of RanDECOW17 is available through NoSketchEngine at webcorpora.org. It is not useful for most normal corpus studies.

The development of RanDECOW17 was funded by the DFG (SCHA1916/1-1).

DECOW16 (A and B)

DECOW16 is the German web corpus by COW created with the 2016 technology of the COW initiative. DECOW16A was released in 2017 including the COReX document feature annotation. DECOW16B was released in 2018. The B iteration contains minor fixes and significantly improved topological parses and COReX data. DECOW16 is available through NoSketchEngine at webcorpora.org.

The development of DECOW16A/B was funded by the DFG (SCHA1916/1-1).

FRCOW16

FRCOW16 is the French web corpus by COW created with the 2014 technology of the COW initiative. It is available through the portal for COW corpora at webcorpora.org. STATUS UPDATE: The planned release date is 30 June 2017 for NoSketchEngine and 15 July 2017 for the shuffled XML version.

ENCOW16

ENCOW16 is the English web corpus by COW created with the 2016 technology of the COW initiative. It is available through the portal for COW corpora at webcorpora.org, which runs our custom Colibri² corpus portal software.