Category Archives: Third-party


RanDECOW17 is a German web corpus created with the 2016 technology of the COW initiative. It is not based on breadth-first crawls; instead, it was “crawled” using the ClaraX research crawler developed in Roland Schäfer’s third-party funded project Linguistic Web Characterisation.

RanDECOW17 was released in 2019, including the COReX document feature annotation. A version of RanDECOW17 is available through NoSketchEngine. It is not useful for most normal corpus studies.

The development of RanDECOW17 was funded by the DFG (SCHA1916/1-1).