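Database Overview
English Corpora is one of the most widely used corpus platforms available today, with more than 170,000 users each month, and it brings together many commonly used corpus resources. Under this bundled subscription, every corpus on the platform is accessible, including the two most widely used, COCA and BNC.
Corpus of Contemporary American English (COCA): about 450 million words in 190,000 texts, covering 1990-2012 and divided into five genres: spoken, fiction, magazine, newspaper, and academic.
British National Corpus (BNC): 100 million words, covering 1980-1993. The BNC was originally compiled by Oxford University Press between 1980 and 1990. English Corpora provides the complete BNC data, using the version tagged with the CLAWS 7 tagset.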
Access URL
https://www.english-corpora.org
Access Method
Register a personal account, then sign in with IP-based authentication. There is no limit on concurrent users.
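Login Guide
Open the website and select the corpus you want to use. You will need to register a personal account; in the institution field, choose "Other".
After submitting the registration, return to the site and log in. If you have already registered a personal account, simply log in with it. After you log in, you will be prompted for institutional authentication: select "academic license", then on the next page select "Join by IP address", as shown in the figure below:
Then return to the home page and select the corpus you want to access again. If the account icon in the upper-right corner turns green and shows "LOGGED IN", institutional access has been authenticated, as shown in the figure below: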
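Institutional Authentication
After the IP address has been bound for the first time, if you are later asked to select an institution when logging in, bind the institution name as follows; you will not be prompted again on later visits to the corpora.
Licensed Corpora
All of the following corpora can be searched on the platform, with detailed entry information available: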
| Corpus (online access) | Download | # words | Dialect | Time period | Genre(s) |
| iWeb: The Intelligent Web-based Corpus | √ | 14 billion | 6 countries | 2017 | Web |
| News on the Web (NOW) | √ | 12.1 billion+ | 20 countries | 2010-yesterday | Web: News |
| Global Web-Based English (GloWBE) | √ | 1.9 billion | 20 countries | 2012-13 | Web (incl blogs) |
| Wikipedia Corpus | √ | 1.9 billion | (Various) | 2014 | Wikipedia |
| Corpus of Contemporary American English (COCA) | √ | 1.0 billion | American | 1990-2019 | Balanced |
| Coronavirus Corpus | √ | 894 million+ | 20 countries | Jan 2020-yesterday | Web: News |
| Corpus of Historical American English (COHA) | √ | 400 million | American | 1810-2009 | Balanced |
| The TV Corpus | √ | 325 million | 6 countries | 1950-2018 | TV shows |
| The Movie Corpus | √ | 200 million | 6 countries | 1930-2018 | Movies |
| Corpus of American Soap Operas | √ | 100 million | American | 2001-2012 | TV shows |
| Hansard Corpus | | 1.6 billion | British | 1803-2005 | Parliament |
| Early English Books Online | | 755 million | British | 1470s-1690s | (Various) |
| Corpus of US Supreme Court Opinions | | 130 million | American | 1790s-present | Legal opinions |
| TIME Magazine Corpus | | 100 million | American | 1923-2006 | Magazine |
| British National Corpus (BNC) * | | 100 million | British | 1980s-1993 | Balanced |
| Strathy Corpus (Canada) | | 50 million | Canadian | 1970s-2000s | Balanced |
| CORE Corpus | | 50 million | 6 countries | 2014 | Web |
| From Google Books n-grams | | | | | |
| American English | | 155 billion | American | 1500s-2000s | (Various) |
| British English | | 34 billion | British | 1500s-2000 | (Various) |