- crawling (ClaraX random walker with texrex)
- linguistic web characterization
- document classification (COReCo and COReX frameworks)
- web page cleaning/processing (texrex software suite)
- linguistic annotation (COW toolchain)
- languages: English, German, Swedish
The linguistic annotation of NLCOW14 was made possible by financial support from the Dutch Linguistics Department (Matthias Hüning) of Freie Universität Berlin in 2014.
COW was supported by Stefan Müller (German Grammar, Freie Universität Berlin) from 2011 to 2014. Felix Bildhauer and Roland Schäfer both worked for the German Grammar Department while doing the fundamental research that helped establish the COW initiative. Stefan Müller also made the department's computing infrastructure available for COW.
Enrique was responsible for the linguistic annotation of NLCOW14.
Areas of expertise:
- named entity recognition for German
- web corpus graph visualization