Author Archives: grosshose

Moving corporafromtheweb.org to webcorpora.org!

Bookmark http://www.webcorpora.org now in order to stay in touch with the COW!

We are in the process of reorganising our infrastructure. Our corpus server webcorpora.org will move to a site at Humboldt-Universität zu Berlin in 2022. (Registered users will be informed in time, and there won’t be any significant downtime.) The information provided here on corporafromtheweb.org will also be moved to the new server in the form of concise technical descriptions of the corpora provided. Please update your bookmarks!

COReX lexico-grammatical feature extractor

COReX is our lexico-grammatical feature extraction system, developed at FU Berlin and IDS Mannheim. Please go here for preliminary information. Data will be released in 2017.

Based in part on the previous COWCat experiments.

COReCO content classification

COReCO is our document topic classification system, developed at FU Berlin and IDS Mannheim. Please go here for preliminary information. Data will be released in 2017.

Based in part on the previous COWCat experiments.

ENCOW14

ENCOW14 is the English web corpus of the COW initiative, created with the initiative's 2014 technology. It is available through the portal for COW corpora at webcorpora.org, which runs our custom Colibri² corpus portal software.