Link data sets (CC-BY)

You can download link databases derived from the COW14/COW16 corpora from this repository:

COW link databases

These link databases were created from the COW14A/COW16A corpora using texrex (available on GitHub). They contain all HTTP links between the web pages in the corpora. Unlike the corpora themselves, the link databases can be used freely under the permissive Creative Commons CC-BY license. Note that the BY (attribution) in CC-BY means that you must cite the appropriate papers listed here (always check this link immediately before you publish anything based on COW data):

http://corporafromtheweb.org/category/cow-citation/