What is COW?

The COW (COrpora from the Web) corpora are the result of an ongoing project that aims to determine the value of linguistic material collected from the World Wide Web for fundamental linguistic research. The data are made available to a limited audience of collaborators within the linguistic community. Work on COW is supported by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) through the project "Linguistic web characterization and web corpus creation" (SCHA1916/1-1). In essence, COW is a collection of linguistically processed gigatoken web corpora created by Felix Bildhauer and Roland Schäfer at Freie Universität Berlin.

We have corpora in Dutch, English, French, German, Spanish, and Swedish. The fourth-generation COW16 corpora, available for English, French, German, and Spanish, add substantially more linguistic annotation and offer much higher data quality, especially for German and French.

Access to COW is provided at webcorpora.org.

Correct citation of COW corpora

If you use the COW corpora in publications (including talks and presentations), you must cite a paper by the corpus creators. Which paper to cite depends on when you submit your publication, so check the COW citation link at the time of submission.