Category Archives: Access

Correct citation of COW corpora

If you use the COW corpora in publications (including talks and presentations), you must cite a paper by the corpus creators. To find out which paper to cite, visit the COW citation link at the time you submit your publication.

Read this: You can only query and download sentence shuffles!

As explained in Section 4.1 of Roland Schäfer (2015), Processing and querying large web corpora with the COW14 architecture, we have to take certain measures to stay within the bounds of German copyright law. This means that we only release sentence shuffles, i.e., corpora that are just bags of sentences. In other words, released versions of COW corpora contain no documents, only single sentences without context. However, the original URL and some other metadata are recorded for each sentence.
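To illustrate the idea of a sentence shuffle, here is a minimal Python sketch. It is not the COW14 pipeline: the `Sentence` class, the `sentence_shuffle` function, and the example URLs are hypothetical names chosen for illustration, and the only metadata assumed here is the source URL.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentence:
    """One sentence plus its per-sentence metadata (hypothetical layout)."""
    text: str
    url: str  # source URL, kept even though the document itself is gone


def sentence_shuffle(documents, seed=None):
    """Flatten documents into one bag of sentences and shuffle it.

    `documents` maps a source URL to that document's sentences in their
    original order. The result keeps no document boundaries and no
    sentence order, only the URL attached to each sentence.
    """
    bag = [
        Sentence(text=text, url=url)
        for url, sentences in documents.items()
        for text in sentences
    ]
    # A dedicated Random instance keeps the shuffle reproducible per seed
    # without touching the global random state.
    random.Random(seed).shuffle(bag)
    return bag


# Toy input: two "documents" with made-up URLs.
docs = {
    "http://example.org/a": ["First sentence of A.", "Second sentence of A."],
    "http://example.org/b": ["Only sentence of B."],
}

shuffled = sentence_shuffle(docs, seed=0)
for sentence in shuffled:
    print(sentence.url, sentence.text, sep="\t")
```

Every sentence can still be traced back to its URL, but the order of sentences within a document cannot be recovered from the shuffled list.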

RStudio Server and Python

We have an RStudio Server installation running on webcorpora.org. This allows users to:

  • run scripted queries in Python using the convenient ManaCOW wrappers (in development), with RStudio serving as a minimal Python IDE, and
  • do statistical analyses of the results in RStudio directly without downloading them.

Users who have already registered on webcorpora.org can apply for an RStudio account by sending us an email.

Note: The open-source version of RStudio Server, which we use, cannot be integrated into a single sign-on environment. Therefore, your NoSkE/download account and your RStudio account are two separate accounts with two separate passwords. If you have lost your RStudio Server password, please send us an email to have it reset. Please use the email address that you used when you registered on webcorpora.org. For security reasons, we cannot reset passwords requested from a different address.