id | sameAs | financier | programme | workType | name | year | targetedTask | modality | textualGenre | normalizedTextualGenre | domain | Sub-Domain | tagging | usedTypology | construction | language | size | normalizedSize | format | license | normalizedLicense | availability | catalogueReference | islrn | link | publication |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | - | DARPA | MUC | shared task | MUC-6 | 1995 | NER | W | NW | NW | SPEC | BUS | NE | MUC | manual | eng | 200 articles, 318 docs slot filling | 207,200 | sgml | LDC | LDC | non-free | LDC2003T13 | 402-267-910-068-8 | - | - |
2 | - | DARPA | MUC | shared task | MET-2 | 1998 | NER | W | NW | NW | SPEC | airline crashes, launch events | NE | MUC | manual | jpn | 414 docs | 165,600 | sgml | - | - | downloadable | - | - | http://www.itl.nist.gov/iaui/894.02/related_projects/muc/proceedings/muc_7_proceedings/overview.html, http://www.itl.nist.gov/iaui/894.02/related_projects/muc/ | - |
3 | - | DARPA | MUC | shared task | MET-2 | 1998 | NER | W | NW | NW | SPEC | airline crashes, launch events | - | MUC | manual | zho | 308 docs | 123,200 | sgml | - | - | downloadable | - | - | - | - |
4 | - | DARPA | MUC | shared task | MUC-7 | 1998 | NER | W | NW | NW | SPEC | MIL | NE | MUC | manual | eng | 400 articles | 160,000 | - | LDC | LDC | non-free | LDC2001T02 | 783-262-033-141-8 | http://www.aclweb.org/anthology/M98-1028 | - |
5 | - | DARPA | MUC | shared task | HUB-4 | 1998 | NER | S | BN | BN | GEN | - | NE | MUC | manual | eng | 3h | 37,500 | - | - | - | - | LDC2000S86 | - | http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.28.520&rep=rep1&type=pdf | - |
6 | - | NIST | ACE | shared task | ACE-2 | 2001 | EDT, RDC | WS | NW, BN, NP | NW, BN, NP | GEN | - | NE, RDC | Basic,GSP,FAC | manual | eng | 180k | 180,000 | sgml,xml,tab | LDC | LDC | non-free | LDC2003T11 | 498-363-793-174-9 | ftp://jaguar.ncsl.nist.gov/ace/phase1/edt_phase1_v2.2.pdf, ftp://jaguar.ncsl.nist.gov/ace/phase2/docs/RDC-Guidelines-v2.3.doc | - |
7 | - | NIST | CoNLL | shared task | CoNLL 2002 | 2002 | NERC | W | NW | NW | GEN | - | NE | POS,CONLL | manual | spa | 370k | 370,000 | IOB | private | private | non-free | - | - | http://www.cnts.ua.ac.be/conll2002/ner/ | - |
8 | - | NIST | CoNLL | shared task | CoNLL 2002 | 2002 | NERC | W | NW | NW | GEN | - | NE | POS,CONLL | manual | nld | 310k | 310,000 | IOB | private | private | non-free | - | - | - | |
9 | - | NIST | CoNLL | shared task | CoNLL 2003 | 2003 | NERC | W | NW | NW | GEN | - | NE | POS,CONLL | manual | eng | 210k | 210,000 | IOB | private | private | non-free | - | - | http://www.cnts.ua.ac.be/conll2003/ner/ | - |
10 | - | NIST | CoNLL | shared task | CoNLL 2003 | 2003 | NERC | W | NW | NW | GEN | - | NE | POS,CONLL | manual | deu | 310k | 310,000 | IOB | private | private | non-free | - | - | - | - |
11 | - | NIST | ACE/TIDES | shared task | ACE 2003 | 2003 | EDT, RDC | WS | NW, BN, TS | NW, BN, TS | GEN | - | NE | - | manual | eng | 91k | 91,000 | sgml,xml,tab | LDC | LDC | non-free | LDC2004T09 | 685-740-491-198-0 | - | - |
12 | - | US NSF | Berkeley | research project | BioText | 2003 | RDC | W | articles | medline abstracts | BIO | - | NE,RDC | - | manual | eng | 1100 medline abstracts | 385,000 | XML | - | - | downloadable | - | - | http://biotext.berkeley.edu/data/dis_treat_data.html | - |
13 | - | NIST | ACE/TIDES | shared task | ACE 2003 | 2003 | EDT, RDC | WS | - | - | GEN | - | - | - | manual | ara | 43k | 43,000 | sgml,xml,tab | LDC | LDC | non-free | - | - | - | - |
14 | - | NIST | ACE/TIDES | shared task | ACE 2003 | 2003 | EDT, RDC | WS | - | - | GEN | - | - | - | manual | zho | 98k | 98,000 | sgml,xml,tab | LDC | LDC | non-free | - | - | - | - |
15 | - | NIST | ACE | shared task | ACE 2004 | 2004 | EDT, RDC | WS | NW, BN | NW, BN | GEN | - | NE, TE, RD | ACE | manual | eng | 158k | 158,000 | sgml,xml,tab | LDC | LDC | non-free | LDC2005T09 | 789-870-824-708-5 | - | - |
16 | - | - | Biocreative | shared task | BIOCREATIVE I | 2004 | NERC | W | medline articles (from Genetag!) | medline abstracts | BIO | Gene mentions | NE | - | semi-automatic | eng | 15,000 sentences | 315,000 | - | - | - | - | - | - | http://www.biocreative.org/tasks/biocreative-i/first-task-gm/ | https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-6-S1-S1 |
17 | - | - | - | shared task | JNLPBA | 2004 | NERC | W | medline abstracts | medline abstracts | BIO | Bio medical | NE | - | semi-automatic | eng | 401 medline abstracts | 140,350 | - | - | - | - | - | - | - | http://dl.acm.org/citation.cfm?id=1567610 |
18 | - | NIST | GALE, AQUAINT, ACE, TIDES | - | BBN Pronoun Coreference and Entity Type | 2005 | NERC | W | NP | NP | GEN | - | NE | Extended | - | eng | 1M | 1,000,000 | txt (stand-off annotation) | LDC | LDC | non-free | LDC2005T33 | 375-520-999-436-0 | - | - |
19 | - | NIST | ACE | shared task | ACE 2005 SpatialML | 2005 | - | WS | NW, BN and BC | NW, BN, BC | GEN | - | PLN | N/A | manual | eng | 300k | 300,000 | sgml,xml,tab | LDC | LDC | non-free | LDC2008T03 | 472-226-418-389-7 | - | - |
20 | - | NIST | ACE | shared task | ACE 2005 SpatialML v2 | 2005 | - | WS | NW, BN and BC | NW, BN, BC | GEN | - | - | N/A | manual | eng | 210k | 210,000 | sgml,xml,tab | LDC | LDC | non-free | LDC2011T02 | 912-956-774-503-2 | - | - |
21 | - | NIST | ACE | shared task | ACE 2005 SpatialML | 2005 | - | WS | NW, BN and BC | NW, BN, BC | GEN | - | - | N/A | manual | zho | 298 docs | 119,200 | sgml,xml,tab | LDC | LDC | non-free | LDC2010T09 | 951-452-048-245-8 | - | - |
22 | - | NIST | ACE | shared task | ACE 2005 | 2005 | EDR, RDC | WS | NW, BN, BC, WL, TS | NW, BN, BC, WL, TS | GEN | - | NE, TE, RD, events | ACE | manual | eng | 303k | 303,000 | sgml,xml,tab | LDC | LDC | non-free | LDC2006T06 | 458-031-085-383-4 | - | - |
23 | - | - | Biocreative | shared task | GENETAG-05 | 2005 | NERC | W | medline articles | medline abstracts | BIO | - | NE | - | semi-automatic | eng | 20,000 sentences, 547801 words | 547,801 | - | - | - | downloadable | - | - | http://biocreative.sourceforge.net/bio_corpora_links.html | https://www.researchgate.net/publication/7782145_GENETAG_A_tagged_corpus_for_geneprotein_named_entity_recognition |
24 | - | - | - | shared task | LLL05 | 2005 | RDC | W | articles | medline abstracts | BIO | - | NE,RDC | - | - | eng | - | - | - | - | - | - | - | - | http://genome.jouy.inra.fr/texte/LLLchallenge/#task1 | - |
25 | - | NIST | ACE | shared task | ACE 2005 | 2005 | EDR, RDC | WS | NW, BN, BC, WL, TS | NW, BN, BC, WL, TS | GEN | - | NE, TE, RD, events | ACE | manual | ara | 112k | 112,000 | sgml,xml,tab | LDC | LDC | non-free | - | - | - | - |
26 | - | NIST | ACE | shared task | ACE 2005 | 2005 | EDR, RDC | WS | NW, BN, BC, WL, TS | NW, BN, BC, WL, TS | GEN | - | NE, TE, RD, events | ACE | manual | zho | 334k | 334,000 | sgml,xml,tab | LDC | LDC | non-free | - | - | - | - |
27 | - | - | Biocreative | shared task | GENETAG | 2005 | NERC | W | Medline | medline abstracts | BIO | - | NE | - | semi-automatic | eng | 547k | 547,000 | - | - | - | - | - | - | http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-6-S1-S3 | - |
28 | - | - | Biocreative | shared task | BIOCREATIVE II | 2006 | NERC,RDC | W | articles | medline abstracts | BIO | - | NE,RDC | - | - | eng | 4,171 sentences | 87,591 | IeXML | - | - | downloadable | - | - | http://www.biocreative.org/tasks/biocreative-ii/, ftp://ftp.ebi.ac.uk/pub/software/textmining/corpora/BioCreative/ | - |
29 | - | - | - | research project | Yapex | 2002 | NERC | W | medline abstracts | medline abstracts | BIO | protein | NERC | - | manual | eng | 55,616 | 55,616 | - | - | - | - | - | - | http://universal.elra.info/product_info.php?cPath=42_43&products_id=1460 | www.mpb.unige.ch/reports/rap_SanaaChtioui.pdf |
30 | - | - | - | research project | Genia | 2006 | NERC | W | articles | medline abstracts | BIO | - | ML | Mesh | - | eng | 1999 abstracts | 699,650 | XML | - | - | downloadable | - | - | http://www.geniaproject.org/genia-corpus/term-corpus, http://universal.elra.info/product_info.php?cPath=42_43&products_id=1460 | - |
31 | - | EVALDA | ESTER | shared task | ESTER 1 | 2006 | NERC | S | BN | BN | GEN | - | NE | Extended | manual | fra | 100 hours | 1,250,000 | xml | - | - | low fee | - | - | - | - |
32 | - | - | - | research project | ROCO - Romanian journalistic corpus | 2006 | several,NERC | W | NP | NP | GEN | - | ML, NE | not said | semi-automatic | ron | 7.1M | 7,100,000 | xml | ELRA | ELRA | free of charge | ELRA-W0085 | 312-617-089-348-7 | http://www.lrec-conf.org/proceedings/lrec2006/pdf/451_pdf.pdf | - |
33 | - | - | Technolangue | shared task | ARCADE 2 | 2006 | several,NERC | W | NP, parallel corpora | NP, parallel corpora | GEN | - | alignment, NE | not said | manual | fra | 316k | 316,000 | xml | ELRA | ELRA | low fee | ELRA-E0018 | 875-865-064-331-9 | - | |
34 | - | EU | CESAR PROJECT | - | Szeged NER corpus | 2006 | NERC | W | short business news | NW | GEN | - | NE | CONLL | manual | hun | 200k tokens | 200,000 | - | Academic - Non Commercial Use | CC-BY-NC-SA | - | - | - | http://www.lrec-conf.org/proceedings/lrec2006/pdf/365_pdf.pdf | - |
35 | - | - | - | research project | BulTreeBank | 2006 | several,NERC | W | VAR | VAR | GEN | news, literature | ML,NE | - | manual | bul | 15k sentences | 315,000 | xml | free license | CC-BY | apparently downloadable | - | - | http://www.bultreebank.org | - |
36 | - | NIST | ACE | shared task | ACE 2007 | 2007 | EDT,RDR | W | NW | NW | GEN | - | EDR, RDR, EMD, RMD | ACE | manual | spa | 150k | 150,000 | sgml,xml,tab | LDC | LDC | non-free | LDC2014T18 | 600-375-253-846-9 | - | - |
37 | - | NIST | ACE | shared task | ACE 2007 | 2007 | EDT,RDR | W | NW, WL, BN | NW, WL, BN | GEN | - | EDR, RDR, EMD, RMD | ACE | manual | ara | 205k | 205,000 | sgml,xml,tab | LDC | LDC | non-free | LDC2014T18 | 600-375-253-846-9 | - | - |
38 | - | US NSF | TREC | shared task | TREC Genomics 2007 | 2007 | IR | W | articles | scientific articles | BIO | - | passages | - | - | eng | - | - | - | - | - | - | - | - | - | - |
39 | - | NIST | ACE | shared task | ACE 2007 | 2007 | EDT,RDR | WS | BC,NW,BN,WL,Conversation | BC,NW,BN,WL,Conversation | GEN | - | EDR, RDR, EMD, RMD | ACE | manual | eng | 265k | 265,000 | sgml,xml,tab | LDC | LDC | non-free | - | - | - | - |
40 | - | NIST | ACE | shared task | ACE 2007 | 2007 | EDT,RDR | W | BN, NW, WL | BN, NW, WL | GEN | - | EDR, RDR, EMD, RMD | ACE | manual | zho | 125k | 125,000 | sgml,xml,tab | LDC | LDC | non-free | - | - | - | - |
41 | - | - | EVALITA | shared task | I-CAB 4.1, Evalita 2007 | 2007 | NERC | W | NP | NP | GEN | - | ML, NE | P,O,L,GPE | manual | ita | 182k | 182,000 | IOB,xml | free license | CC-BY | free | - | - | - | - |
42 | - | - | Metanet4u Project | shared task | CHIL 2007+ Evaluation Package | 2007 | several,NERC | S | seminars | seminars | GEN | - | multi layer | not said | not said | eng | - | - | DB | ELRA | ELRA | non-free | ELRA-E0041 | 639-487-568-289-4 | - | - |
43 | - | - | CNEC | research project | Czech Named Entity Corpus 1.1 (CNEC) | 2007 | NERC | W | VAR | VAR | GEN | - | NE | Extended | manual | ces | 5800 sentences | 121,800 | plain text, xml, html, treex | CC-BY-NC-SA 3.0 | CC-BY-NC-SA | downloadable | - | - | http://ufal.mff.cuni.cz/cnec | - |
44 | - | - | - | - | ANERCorp | 2007 | NERC | W | NW,WEB | NW,WEB | GEN | - | NE | CoNLL | manual | ara | 150,000 | 150,000 | BIO | - | - | downloadable | - | - | http://users.dsic.upv.es/~ybenajiba/ | http://link.springer.com/chapter/10.1007/978-3-540-70939-8_13 |
45 | - | US NSF | - | research project | Manually Annotated Sub-Corpus Third Release (MASC) | 2008 | several,NERC | WS | ANC, contemporary English | VAR | GEN | - | ML,NE | Basic,D | semi-automatic | eng | 500k | 500,000 | - | - | - | - | LDC2013T12 | - | http://www.lrec-conf.org/proceedings/lrec2008/pdf/617_paper.pdf | - |
46 | - | - | - | research project | PennBioIE CYP 1.0 Corpus | 2008 | NERC | W | PubMed abstracts | medline abstracts | BIO | - | ML,NE | 5 biomedical entities | manual | eng | 274k | 274,000 | standoff | private | fee | - | LDC2008T20 | - | - | - |
47 | - | - | - | research project | PennBioIE Oncology Corpus | 2008 | NERC | W | PubMed abstracts | medline abstracts | BIO | - | oncological NE | - | manual | eng | 380k | 380,000 | standoff | private | fee | - | LDC2008T21 | - | http://anthology.aclweb.org/W/W04/W04-3111.pdf | - |
48 | - | - | - | - | Arizona Disease | 2008 | NERC | W | PubMed abstracts | medline abstracts | BIO | Bio medical | NE | diseases | - | eng | 2,775 sentences | 58,275 | - | - | - | downloadable | - | - | ftp://ftp.ebi.ac.uk/pub/software/textmining/corpora/diseases/ | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2352871/ |
49 | - | - | - | research project | SCAI-Test | 2008 | NERC | W | - | - | BIO | Bio medical | NE | - | manual | eng | 100 medline abstracts | 35,000 | IOB | - | - | downloadable | - | - | https://www.scai.fraunhofer.de/en/business-research-areas/bioinformatics/downloads/corpora-for-chemical-entity-recognition.html | https://pub.uni-bielefeld.de/download/2603498/2624539 |
50 | - | - | Biocreative | shared task | BIOCREATIVE III | 2009 | RDC | W | PubMed abstracts | medline abstracts | BIO | - | RDC | - | - | - | - | 265,000 | xml | - | - | - | - | - | http://www.biocreative.org/tasks/biocreative-iii/ppi/ | |
51 | - | - | - | shared task | ESTER 2 corpus | 2009 | NERC | S | BN | BN | GEN | - | NE | Extended | manual | fra | 100 hours | 1,250,000 | xml | - | - | low fee | ELRA-S0338 | 123-207-221-143-8 | - | - |
52 | - | NIST | ACE | shared task | REFLEX Entity Translation | 2009 | NERC, Entity Translation | W | NW,WL | NW,WL | GEN | - | NE, TE | ACE | manual | eng | 22.5k | 22,500 | - | LDC | LDC | - | LDC2009T09 | - | http://www.itl.nist.gov/iad/mig//tests/ace/2007/et/ | |
53 | - | NIST | ACE | shared task | REFLEX Entity Translation | 2009 | NERC, Entity Translation | W | NW,WL | NW,WL | GEN | - | NE, TE | ACE | manual | zho | 22.5k | 22,500 | - | LDC | LDC | - | LDC2009T10 | - | - | - |
54 | - | NIST | ACE | shared task | REFLEX Entity Translation | 2009 | NERC, Entity Translation | W | NW,WL | NW,WL | GEN | - | NE, TE | ACE | manual | ara | 22.5k | 22,500 | - | LDC | LDC | - | LDC2009T11 | - | - | - |
55 | - | SCHWA | WikiNER | research project | WikiGold | 2009 | NERC | W | WKPD | WKPD | GEN | - | NE | CONLL | manual | eng | 39k | 39,000 | IOB | CC BY 3.0 | CC BY | downloadable | - | - | http://schwa.org/projects/resources/wiki/Wikiner | http://www.joelnothman.com/downloads/PeoplesWeb02.pdf |
56 | - | EU FP7 | - | research project | CALBC-SSC-III-Small | 2009 | NERC | W | medline abstracts | medline abstracts | BIO | Bio medical | NE | - | automatic | eng | 179,999 medline abstracts | 62,999,650 | - | - | - | - | - | - | http://www.ebi.ac.uk/Rebholz-srv/CALBC/corpora/resources.html | http://lbm2009.biopathway.org/papers/long/The_CALBC_Silver_Standard_Corpus_-_Harmonizing_multiple_semantic_annotations_in_a_large_biomedical_corpus.pdf |
57 | - | EU FP7 | - | research project | CALBC-SSC-III-Big | 2009 | NERC | W | medline abstracts | medline abstracts | BIO | Bio medical | NE | - | automatic | eng | 714,283 medline abstracts | 249,999,050 | - | - | - | - | - | - | - | - |
58 | - | - | - | research project | FSU-PRGE | 2009 | - | W | medline abstracts | medline abstracts | BIO | Bio medical | NE | - | semi-automatic | eng | 3306 medline abstracts | 1,157,100 | - | - | - | - | - | - | http://pubannotation-old.dbcls.jp/projects/FSU-PRGE | http://aclweb.org/anthology/W/W10/W10-1838.pdf |
59 | - | - | - | research project | LINNAEUS | 2010 | NERC | W | PubMed articles | scientific articles | BIO | - | NE | - | - | eng | 10000 articles | 4,000,000 | - | - | - | - | - | - | http://linnaeus.sourceforge.net/ | - |
60 | - | - | - | research project | Finin-tweets | 2010 | NERC | W | tweets | tweets | GEN | - | NE | Basic | crowd-sourced | eng | 12,800 tweets | 102,400 | IOB | - | - | downloadable | - | - | http://www.cs.jhu.edu/~mdredze/publications/amt_ner.pdf | - |
61 | - | - | - | shared task | Evalita 2011 | 2011 | NERC | S | BN | BN | GEN | - | NE | P,O,L,GPE | manual | ita | 79238 | 79,238 | - | - | - | - | - | - | http://www.evalita.it/2011/tasks/NER | - |
62 | - | - | - | research project | Ratinov-AQUAINT subset | 2011 | NERC,EL | W | NW | NW | GEN | - | NE, references | - | manual | eng | - | - | sgml | - | - | - | LDC2002T31 | - | http://cogcomp.cs.illinois.edu/papers/RRDA11.pdf | - |
63 | - | - | - | research project | Ratinov-MSNBC | 2011 | NERC,EL | W | NW | NW | GEN | - | NE, references | - | manual | eng | - | - | NIF | - | - | - | - | - | http://cogcomp.cs.illinois.edu/papers/RRDA11.pdf | - |
64 | - | EU | - | research project | Polish Sejm Corpus | 2011 | NERC | S | parliament | parliament | GEN | - | POS, syntax, NE | Basic | automatic | pol | 14M tokens | 14,000,000 | - | unrestricted use | CC BY | free of charge | - | - | - | - |
65 | - | - | - | research project | AIDA CONLL YAGO Dataset | 2011 | NERC, EL | W | Reuters news corpora | NW | GEN | - | NE, references | CONLL | automatic | eng | - | - | NIF | free of charge, CoNLL Licence | CoNLL | downloadable | - | - | https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/aida/downloads/ | - |
66 | - | - | - | research project | Polish Coreference Corpus | 2011 | NERC | WS | various data (National Corpus of Polish) | VAR | GEN | - | ML,NE | Basic | automatic | pol | 500k | 500,000 | xml | CC-BY | CC-BY | broken link | - | - | - | - |
67 | - | - | - | - | Ratinov-ACE-coref 2004 | 2011 | NERC,EL | W | - | - | GEN | - | NE, references | - | manual | eng | - | - | NIF | - | - | - | - | - | see also for NIF: http://dashboard.nlp2rdf.aksw.org/ | - |
68 | - | - | - | - | Ratinov-Wiki | 2011 | NERC | W | - | WKPD | WKPD | - | - | - | semi-automatic | eng | - | - | NIF | - | - | - | - | - | - | - |
69 | - | - | ETAPE | shared task | ETAPE | 2012 | NERC | S | BC,BN | BC,BN | GEN | - | NE, speech | 7 types | manual | fra | 30 hours | 375,000 | ? | ? | - | ? | - | - | - | - |
70 | - | - | Ancora | research project | Ancora corpus | 2012 | several, NERC | W | NP | NP | GEN | - | ML, NE | CONLL, NUM | - | spa | 500k | 500,000 | ? | GPL | GPL | free of charge | - | - | http://clic.ub.edu/corpus/en/ancora | - |
71 | - | - | Ancora | research project | Ancora corpus | 2012 | several, NERC | W | NP | NP | GEN | - | ML, NE | CONLL, NUM | - | cat | 500k | 500,000 | ? | GPL | GPL | free of charge | - | - | - | - |
72 | - | CESAR | - | research project | OpinHuBank | 2012 | OM | W | NW,SocialMedia,Blog | VAR | GEN | - | NE,OM | not said | automatic | hun | 80k (8000 sent.) | 80,000 | txt/csv | CC-BY-SA | CC-BY-SA | - | - | - | - | |
73 | - | - | - | research project | Szeged Criminal NE Corpus | 2012 | NERC | W | texts on criminal offences | texts on criminal offences | SPEC | CRIME | NE | CONLL | manual | hun | 540k | 540,000 | - | Academic - Non Commercial Use | CC-BY-NC-SA | - | - | - | - | - |
74 | - | - | - | shared task | Entity profiling ORM Twitter (Meij) | 2012 | ORM, C2KB | W | tweets | tweets | GEN | - | terms, NE | O | semi-automatic | eng | - | - | tsv | - | - | - | - | - | http://nlp.uned.es/~damiano/datasets/entityProfiling_ORM_Twitter.html | http://nlp.uned.es/~damiano/pdf/spina2012corpusEntityProfiling.pdf |
75 | - | - | - | - | ROMBAC - Romanian balanced corpus | 2012 | several,NERC | W | VAR | VAR | VAR | journalism, law, fiction, medicine, biographical | ML, NE | not said | semi-automatic | ron | 41M | 41,000,000 | xml | ELRA | ELRA | free of charge | ELRA-W0088 | 162-192-982-061-0 | http://www.lrec-conf.org/proceedings/lrec2012/pdf/218_Paper.pdf | |
76 | - | PANACEA | - | research project | Panacea | 2012 | several,NERC | W | WEB | WEB | ENV | - | ML,NE | not said | automatic | ell | 34.6M | 34,600,000 | - | CC-BY-SA | CC-BY-SA | - | - | - | - | - |
77 | - | PANACEA | - | research project | Panacea | 2012 | several,NERC | W | WEB | WEB | ENV | - | ML,NE | not said | automatic | ita | 36M | 36,000,000 | txt/plain | CC-BY-SA | CC-BY-SA | - | - | - | - | - |
78 | - | PANACEA | - | research project | Panacea | 2012 | several,NERC | W | WEB | WEB | ENV | - | ML,NE | not said | automatic | spa | 30M | 30,000,000 | txt/plain | CC-BY-SA | CC-BY-SA | - | - | - | - | - |
79 | - | PANACEA | - | research project | Panacea | 2012 | several,NERC | W | WEB | WEB | LAB | - | ML,NE | not said | automatic | ita | 70M | 70,000,000 | txt/plain | CC-BY-SA | CC-BY-SA | - | - | - | - | - |
80 | - | PANACEA | - | research project | Panacea | 2012 | several,NERC | W | WEB | WEB | LAB | - | ML,NE | not said | automatic | spa | 60M | 60,000,000 | txt/plain | CC-BY-SA | CC-BY-SA | - | - | - | - | - |
81 | - | PANACEA | - | research project | Panacea | 2012 | several,NERC | W | WEB | WEB | LAB | - | ML,NE | not said | automatic | ell | 26M | 26,000,000 | txt/plain | CC-BY-SA | CC-BY-SA | - | - | - | - | - |
82 | - | - | - | - | HunNERwiki | 2012 | NERC | W | WKPD | WKPD | GEN | - | ML,NE | CONLL | automatic | hun | 19M | 19,000,000 | txt,csv | CC-BY-SA 3.0 | CC-BY-SA | - | - | - | Automatically generated NE tagged corpora for English and Hungarian. http://hlt.sztaki.hu/resources/hunnerwiki.html | - |
83 | - | - | - | research project | Corpus NE | 2012 | QA | W | - | - | GEN | - | NE | Basic | manual | eng | 60k (6000 sent.) | 60,000 | - | CC-BY-SA | CC-BY-SA | downloadable | - | - | http://metashare.elda.org/repository/browse/corpusne/8a5694f8a19911e1ab95080027f903f25bf18b8246744a13b65da3b6515cb0d4/ | - |
84 | - | Appen | - | - | Appen Named Entity Corpora | 2012 | NERC | WS | - | - | GEN | - | NE | Basic,GPE,Nationality,Religion,Facility,Titles,Quantities | - | ara | 500k | 500,000 | - | - | - | non-free | - | - | http://isca-speech.org/iscapad/iscapad.php?module=article&id=11576 | - |
85 | - | Appen | - | - | Appen Named Entity Corpora | 2012 | NERC | WS | - | - | GEN | - | NE | Basic,GPE,Nationality,Religion,Facility,Titles,Quantities | - | eng | 500k | 500,000 | - | - | - | non-free | - | - | http://isca-speech.org/iscapad/iscapad.php?module=article&id=11577 | |
86 | - | Appen | - | - | Appen Named Entity Corpora | 2012 | NERC | WS | - | - | GEN | - | NE | Basic,GPE,Nationality,Religion,Facility,Titles,Quantities | - | fas | 500k | 500,000 | - | - | - | non-free | - | - | http://isca-speech.org/iscapad/iscapad.php?module=article&id=11578 | |
87 | - | Appen | - | - | Appen Named Entity Corpora | 2012 | NERC | WS | - | - | GEN | - | NE | Basic,GPE,Nationality,Religion,Facility,Titles,Quantities | - | kor | 500k | 500,000 | - | - | - | non-free | - | - | http://isca-speech.org/iscapad/iscapad.php?module=article&id=11579 | |
88 | - | Appen | - | - | Appen Named Entity Corpora | 2012 | NERC | WS | - | - | GEN | - | NE | Basic,GPE,Nationality,Religion,Facility,Titles,Quantities | - | jpn | 500k | 500,000 | - | - | - | non-free | - | - | http://isca-speech.org/iscapad/iscapad.php?module=article&id=11580 | |
89 | - | Appen | - | - | Appen Named Entity Corpora | 2012 | NERC | WS | - | - | GEN | - | NE | Basic,GPE,Nationality,Religion,Facility,Titles,Quantities | - | rus | 500k | 500,000 | - | - | - | non-free | - | - | http://isca-speech.org/iscapad/iscapad.php?module=article&id=11581 | |
90 | - | Appen | - | - | Appen Named Entity Corpora | 2012 | NERC | WS | - | - | GEN | - | NE | Basic,GPE,Nationality,Religion,Facility,Titles,Quantities | - | zho | 500k | 500,000 | - | - | - | non-free | - | - | http://isca-speech.org/iscapad/iscapad.php?module=article&id=11582 | |
91 | - | Appen | - | - | Appen Named Entity Corpora | 2012 | NERC | WS | - | - | GEN | - | NE | Basic,GPE,Nationality,Religion,Facility,Titles,Quantities | - | urd | 500k | 500,000 | - | - | - | non-free | - | - | http://isca-speech.org/iscapad/iscapad.php?module=article&id=11583 | |
92 | - | EU FP7 | Accurat | research project | TildeNER | 2012 | NERC | W | - | - | VAR | - | NE | MUC extended | manual | lav | 72k | 72,000 | - | Apache 2.0 | Apache | - | - | - | http://www.accurat-project.eu/index.php?p=accurat-toolkit | www.lrec-conf.org/proceedings/lrec2012/pdf/948_Paper.pdf |
93 | - | EU FP7 | Accurat | research project | TildeNER | 2012 | NERC | W | - | - | VAR | - | NE | MUC extended | manual | lit | 73k | 73,000 | - | Apache 2.0 | Apache | - | - | - | - | - |
94 | - | DARPA | Gale | research project | OntoNotes 5.0 | 2013 | several, NERC | WS | BN,BC,NW,WEB | BN,BC,NW,WEB | GEN | - | ML, NE | Extended | - | eng | 1.445M | 1,445,000 | txt (stand-off annotation), sql DB with Python API | LDC | LDC | non-free | LDC2013T19 | 151-738-649-048-2 | - | - |
95 | - | DARPA | Gale | research project | OntoNotes 5.0 | 2013 | several, NERC | WS | BN,BC,NW,WEB | BN,BC,NW,WEB | GEN | - | ML, NE | Extended | - | cmn | 690k | 690,000 | txt (stand-off annotation), sql DB with Python API | LDC | LDC | non-free | LDC2013T19 | 151-738-649-048-2 | - | - |
96 | - | DARPA | Gale | research project | OntoNotes 5.0 | 2013 | several, NERC | W | BN,BC,NW,WEB | BN,BC,NW,WEB | GEN | - | ML, NE | Extended | - | ara | 300k | 300,000 | txt (stand-off annotation), sql DB with Python API | LDC | LDC | non-free | LDC2013T19 | 151-738-649-048-2 | - | - |
97 | - | - | Biocreative | shared task | CHEMDNER | 2013 | NERC | W | PubMed abstracts | medline abstracts | BIO | - | NE | - | - | eng | - | - | - | - | - | - | - | - | http://www.biocreative.org/tasks/biocreative-iv/chemdner/ | https://jcheminf.springeropen.com/articles/10.1186/1758-2946-7-S1-S2 |
98 | - | - | - | research project | Estonian NER corpus | 2013 | NERC | W | NP | NP | GEN | - | NE | CONLL | manual | est | 184k | 184,000 | IOB | CC-BY-NC | CC-BY-NC | downloadable | - | - | http://www.aclweb.org/anthology/W/W13/W13-24.pdf#page=90 | - |
99 | - | - | - | shared task | BioNLP-ST 2013 | 2013 | NERC | W | public web site on biology | WEB | SPEC | BIO | NE | OntoBioTope Ontology (1700 concepts) | manual | eng | 2040 docs | 816,000 | OBO format | free license | CC-BY | downloadable | - | - | http://2013.bionlp-st.org/tasks/bacteria-biotopes | - |
100 | - | - | WWW 2013 | shared task | Microposts2013 | 2013 | NERC | W | tweets | tweets | VAR | - | NE | CONLL | manual | eng | 4300 tweets | 34,400 | tsv | CC-BY-NC-SA | CC-BY-NC-SA | downloadable | - | - | http://oak.dcs.shef.ac.uk/msm2013/challenge.html | http://ceur-ws.org/Vol-1019/msm2013-challenge-report.pdf |
101 | - | - | WikiNER | research project | Silver-corpus | 2013 | NERC | W | WKPD | WKPD | GEN | - | NE | CONLL,Extended | automatic | deu | 3.5M | 3,500,000 | tsc | CC BY 3.0 | CC BY | downloadable | - | - | - | - |
102 | - | - | WikiNER | research project | Silver-corpus | 2013 | NERC | W | WKPD | WKPD | GEN | - | NE | CONLL,Extended | automatic | eng | 3.5M | 3,500,000 | tsc | CC BY 3.0 | CC BY | downloadable | - | - | - | - |
103 | - | - | WikiNER | research project | Silver-corpus | 2013 | NERC | W | WKPD | WKPD | GEN | - | NE | CONLL,Extended | automatic | spa | 3.5M | 3,500,000 | tsc | CC BY 3.0 | CC BY | downloadable | - | - | - | - |
104 | - | - | WikiNER | research project | Silver-corpus | 2013 | NERC | W | WKPD | WKPD | GEN | - | NE | CONLL,Extended | automatic | fra | 3.5M | 3,500,000 | tsc | CC BY 3.0 | CC BY | downloadable | - | - | - | - |
105 | - | - | WikiNER | research project | Silver-corpus | 2013 | NERC | W | WKPD | WKPD | GEN | - | NE | CONLL,Extended | automatic | ita | 3.5M | 3,500,000 | tsc | CC BY 3.0 | CC BY | downloadable | - | - | - | - |
106 | - | - | WikiNER | research project | Silver-corpus | 2013 | NERC | W | WKPD | WKPD | GEN | - | NE | CONLL,Extended | automatic | nld | 3.5M | 3,500,000 | tsc | CC BY 3.0 | CC BY | downloadable | - | - | - | - |
107 | - | - | WikiNER | research project | Silver-corpus | 2013 | NERC | W | WKPD | WKPD | GEN | - | NE | CONLL,Extended | automatic | pol | 3.5M | 3,500,000 | tsc | CC BY 3.0 | CC BY | downloadable | - | - | - | - |
108 | - | - | WikiNER | research project | Silver-corpus | 2013 | NERC | W | WKPD | WKPD | GEN | - | NE | CONLL,Extended | automatic | por | 3.5M | 3,500,000 | tsc | CC BY 3.0 | CC BY | downloadable | - | - | - | - |
109 | - | - | WikiNER | research project | Silver-corpus | 2013 | NERC | W | WKPD | WKPD | GEN | - | NE | CONLL,Extended | automatic | rus | 3.5M | 3,500,000 | tsc | CC BY 3.0 | CC BY | downloadable | - | - | - | - |
110 | - | - | - | research project | AIDA-EE Dataset | 2014 | - | W | gigaword5 corpus | NW | GEN | - | NE, references | - | - | eng | - | - | - | - | - | - | - | - | https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/aida/downloads/ | - |
111 | - | - | NE3L | research project | NE Arabic corpus | 2014 | NERC | W | NP | NP | GEN | - | NE | MUC | - | ara | 100k | 100,000 | txt | ELRA | ELRA | non-free | ELRA-W0078 | 398-979-151-557-0 | - | - |
112 | - | - | NE3L | research project | NE Chinese corpus | 2014 | NERC | W | NP | NP | GEN | - | NE | MUC | - | zho | 80k | 80,000 | txt | ELRA | ELRA | non-free | ELRA-W0079 | 187-154-782-686-9 | - | - |
113 | - | - | NE3L | research project | NE Russian corpus | 2014 | NERC | W | NP | NP | GEN | - | NE | MUC | - | rus | 75k | 75,000 | txt | ELRA | ELRA | non-free | ELRA-W0080 | 024-620-556-146-2 | - | - |
114 | - | - | N3-Collection | research project | News 100 | 2014 | EL | W | NP | NP | GEN | - | NE, references | Basic | manual | deu | 48k | 48,000 | NIF | CC-BY-NC-SA-4 | CC-BY-NC-SA | downloadable | - | - | http://svn.aksw.org/papers/2014/LREC_N3NIFNERNED/public.pdf | - |
115 | - | - | N3-Collection | research project | Reuters 128 | 2014 | EL, SA2KB | W | NP | NP | SPEC | CRIME | NE, references | Basic | manual | eng | 33k | 33,000 | NIF | CC-BY-NC-SA-4 | CC-BY-NC-SA | downloadable | - | - | - | - |
116 | - | - | N3-Collection | research project | RSS 500 | 2014 | EL, SA2KB | W | NP | NP | GEN | POL,ECO,SCI | NE, references | Basic | manual | eng | 31k | 31,000 | NIF | CC-BY-NC-SA-4 | CC-BY-NC-SA | downloadable | - | - | - | - |
117 | - | - | - | research project | DBpedia Spotlight NIF NER Corpus | 2014 | EL, SA2KB | W | NP | NP | GEN | - | NE, references | Basic | manual | eng | 3500 | 3500 | NIF | CC BY 4.0 | CC-BY | downloadable | - | - | https://datahub.io/dataset/dbpedia-spotlight-nif-ner-corpus | |
118 | - | - | - | research project | Abstract Meaning Representation (AMR) Annotation Release 1.0 | 2014 | several,NERC | W | NW,WEB | NW,WEB | GEN | - | ML,NE | - | manual | eng | - | - | treebank | LDC | LDC | - | LDC2014T12 | - | - | - |
119 | - | - | Twitter Adverse Drug Reaction Mentions - ASU DIEGO Lab | research project | binary corpus | 2014 | NERC, EL | W | tweets | tweets | SPEC | BIO | presence of adverse drug reaction | adverse drug reactions | manual | eng | 7574 tweets | 60,592 | various files and script | unrestricted use | CC-BY | downloadable | - | - | - | - |
120 | - | - | Twitter Adverse Drug Reaction Mentions - ASU DIEGO Lab | research project | full annotation corpus | 2014 | NERC, EL | W | tweets | tweets | SPEC | BIO | annotation of adverse drug reaction | adverse drug reactions | manual | eng | 1784 tweets | 14,272 | various files and script | unrestricted use | CC-BY | downloadable | - | - | - | - |
121 | - | - | - | research project | Named Entity Recognition on Turkish Tweets | 2014 | NER | W | tweets | tweets | GEN | - | NE | MUC | manual | tur | - | - | - | - | - | downloadable | - | 764-177-227-350-7 | https://ec.europa.eu/jrc/en/language-technologies | - |
122 | - | - | WWW 2014 | shared task | Microposts2014 | 2014 | NERC,EL | W | tweets | tweets | GEN | events | NE, references | NERD | semi-automatic | eng | 3505 tweets | 28,040 | tsv | Twitter license | - | need to subscribe | - | - | http://www.wikicfp.com/cfp/servlet/event.showcfp?eventid=34440&copyownerid=2 | http://ceur-ws.org/Vol-1141/microposts2014_neel-challenge-report.pdf |
123 | - | - | - | research project | TaLAPi: A Thai Linguistically Annotated Corpus for Language Processing | 2014 | several,NERC | W | VAR | VAR | GEN | news, entertainment, lifestyle | ML,NE | Extended | manual | tha | 1M | 1,000,000 | - | - | - | - | - | - | http://www.lrec-conf.org/proceedings/lrec2014/pdf/59_Paper.pdf | |
124 | - | - | - | research project | KAIST silver standard corpus (404 error) | 2014 | NERC | W | Wikipedia,DBpedia | WKPD | GEN | - | NE | - | automatic | multi | - | - | - | - | - | - | - | - | http://www.lrec-conf.org/proceedings/lrec2014/pdf/688_Paper.pdf | |
125 | - | - | GERMEVAL | shared task | NoSta-D | 2014 | NERC | W | WKPD,NP | WKPD,NP | GEN | - | NE | Extended | manual | deu | 590k | 590,000 | IOB | CC-BY | CC-BY | - | - | - | https://www.tk.informatik.tu-darmstadt.de/fileadmin/user_upload/Group_LangTech/publications/BenikovaBiemannReznicek_LREC2014_GermanNER.pdf | - |
126 | - | - | - | research project | German Reference Corpus DeReKo (not found) | 2014 | - | W | - | - | VAR | - | NE | - | manual | deu | - | - | - | - | - | - | - | - | Paper: Named Entity Tagging a Very Large Unbalanced Corpus: Training and Evaluating NE Classifiers. | - |
127 | - | - | - | - | KORE 50 | 2014 | EL, SA2KB | W | NW | NW | GEN | MUS,BUS,CELEB | NE, references | CONLL | manual | eng | 1300 | 1300 | NIF | CC BY 4.0 | CC-BY | downloadable | - | - | KORE: Keyphrase Overlap Relatedness for Entity Disambiguation, http://www.yovisto.com/labs/ner-benchmarks/, https://datahub.io/dataset/kore-50-nif-ner-corpus | - |
128 | - | - | BioCreative | shared task | CHEMDNER | 2015 | NERC | W | patents | patents | BIO | - | NE | - | - | eng | - | - | - | - | - | - | - | - | http://www.biocreative.org/tasks/biocreative-v/track-2-chemdner/ | - |
129 | - | - | - | shared task | Quaero French Medical Corpus | 2015 | NERC (concepts and linking) | W | titles (medline) and articles (emea) | scientific articles | MED | - | NE, references | Quaero based on UMLS | manual | fra | 103,056 words | 103,056 | standoff format | GFDL | GFDL | downloadable | - | - | https://quaerofrenchmed.limsi.fr/ | - |
130 | - | - | WWW 2015 | shared task | Microposts2015 | 2015 | NERC,EL | W | tweets | tweets | GEN | event (re-use of Microposts2014) | NE, references | NERD, other types | semi-automatic | eng | 6025 tweets | 48,200 | - | - | - | - | - | - | - | |
131 | - | - | WNUT | shared task | WNUT2015 (test data) | 2015 | NERC | W | tweets | tweets | GEN | re-use of Ritter data | NE | Extended | manual | eng | 1425 tweets | 11,400 | - | - | - | upon registration | - | - | http://noisy-text.github.io/2015/ner-shared-task.html | http://www.anthology.aclweb.org/W/W15/W15-43.pdf#page=138 |
132 | - | - | - | research project | Arboretum treebank | 2015 | several,NERC | W | VAR | VAR | VAR | - | morphosyntax, syntax, NE | not said | not said | dan | 425k | 425,000 | xml | ELRA | ELRA | non-free | ELRA-W0084 | 025-729-182-451-2 | http://metashare.elda.org/repository/browse/arboretum-treebank/f8c4509e983d11e5a51c00259011f6ead47bd1ee2f67436083f627cf3d252a6d/ | - |
133 | - | none | none | shared task | OKE 2015 Task 1 | 2015 | NERC,EL, KBP | W | Wikipedia | WKP | SPEC | scholar biographies | NE,EL | BASIC,R | manual | eng | 196 sentences | 4116 | NIF | - | - | downloadable | - | - | https://github.com/anuzzolese/oke-challenge | http://link.springer.com/chapter/10.1007/978-3-319-25518-7_1 |
134 | - | - | - | shared task | SemEval 2015 Task 13 | 2015 | WSD, EL | W | emea, KDEdoc, EU bookshop corpus | VAR | SPEC | Bio-medical, Maths, Social issues | WSD,EL | - | manual | eng | 1.2k | 1200 | tsv | - | - | downloadable | - | - | http://anthology.aclweb.org/S/S15/S15-2.pdf#page=330, http://alt.qcri.org/semeval2015/task13/index.php?id=data-and-tools | |
135 | - | - | - | shared task | SemEval 2015 Task 13 | 2015 | WSD, EL | W | emea, KDEdoc, EU bookshop corpus | VAR | SPEC | Bio-medical, Maths, Social issues | WSD,EL | - | manual | spa | 1.2k | 1200 | tsv | - | - | downloadable | - | - | - | |
136 | - | - | - | shared task | SemEval 2015 Task 13 | 2015 | WSD, EL | W | emea, KDEdoc, EU bookshop corpus | VAR | SPEC | Bio-medical, Maths, Social issues | WSD,EL | - | manual | ita | 1.2k | 1200 | tsv | - | - | downloadable | - | - | - | |
137 | - | - | NewsReader | shared task | MEANTIME | 2016 | several,NERC, EL | W | NP | NP | GEN | ECO, FIN | ML,NE | P,O,L,PROD,EVENT | manual | eng | 40k | 40,000 | xml | CC-BY 4.0 | CC-BY | - | - | - | http://www.newsreader-project.eu, MEANTIME, the NewsReader Multilingual Event and Time Corpus. In Proceedings of LREC 2016. TO APPEAR | - |
138 | - | - | - | shared task | Microposts2016 | 2016 | NERC,EL | W | tweets | tweets | GEN | events and non events (re-use of Microposts2015) | NE, references | NERD, other types | semi-automatic | eng | 9289 tweets | 74,312 | tsv, neel format | CC-BY 4.0 | CC-BY | - | - | - | - | - |
139 | - | - | WNUT | shared task | WNUT2016 | 2016 | NERC | W | tweets | tweets | GEN, VAR | re-use of Ritter and WNUT2015 | NE | Extended | manual | eng | 3473 tweets | 27,784 | - | - | - | - | - | - | - | http://www.aclweb.org/anthology/W/W16/W16-39.pdf#page=150 |
140 | - | - | - | shared task | MEANTIME | 2016 | several,NERC, EL | W | NP | NP | GEN | ECO, FIN | ML,NE | P,O,L,PROD,EVENT | semi-automatic | spa | 40k | 40,000 | xml | CC-BY 4.0 | CC-BY | - | - | - | - | |
141 | - | - | - | shared task | MEANTIME | 2016 | several,NERC, EL | W | NP | NP | GEN | ECO, FIN | ML,NE | P,O,L,PROD,EVENT | semi-automatic | nld | 40k | 40,000 | xml | CC-BY 4.0 | CC-BY | - | - | - | - | - |
142 | - | - | - | shared task | MEANTIME | 2016 | several,NERC, EL | W | NP | NP | GEN | ECO, FIN | ML,NE | P,O,L,PROD,EVENT | semi-automatic | ita | 40k | 40,000 | xml | CC-BY 4.0 | CC-BY | - | - | - | - | - |
143 | - | - | - | research project | Japanese Basic NE corpus (BCCWJ Basic NE) | 2016 | NERC | W | various / balanced | VAR | VAR | - | NE | IREX | manual | jpn | 136 docs / 2561 NEs | 54,400 | - | - | - | downloadable | - | - | https://sites.google.com/site/projectnextnlpne/en | - |
144 | - | - | - | research project | KDD-D KDD-T | 2016 | NERC | W | web queries | web queries | GEN | - | NE | CONLL | manual | eng | 3000 queries | 12,000 | - | - | - | - | - | - | http://anthology.aclweb.org/P/P11/P11-1097.pdf | |
145 | - | - | - | - | CRAFT | 2016 | NERC | W | journal articles | scientific articles | BIO | Bio-medical | NE | - | manual | eng | - | - | - | - | - | - | - | - | http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml | |
146 | - | EU, Ireland, Italy | Multilingual Entity Linking | shared task | EVALITA NEEL | 2016 | NERC,EL | W | tweets | tweets | GEN | - | NE, references | - | manual | ita | 1301 tweets | 10,408 | - | - | - | downloadable upon registration to the task | - | - | http://neel-it.github.io/ | http://ceur-ws.org/Vol-1749/paper_007.pdf |
147 | - | - | NEMLAR | - | Broadcast News Speech Corpus | 2005 | - | S | BN | BN | GEN | - | NE | - | - | ara | 40 hours | 500,000 | DB | - | - | low fee | ELRA-S0219 | 479-507-036-103-9 | - | |
148 | - | - | QUAERO | shared task | Broadcast News corpus | 2011 | NERC | S | BN, BC | BN, BC | GEN | - | NE | QUAERO | manual | fra | 1.2M | 1,200,000 | - | Academic - Non Commercial Use | CC-BY-NC-SA | free of charge | ELRA-S0349 | 074-668-446-920-0 | - | |
149 | - | - | QUAERO | shared task | Old Press corpus | 2011 | NERC | W | ONP | ONP | GEN | - | NE | QUAERO | manual | fra | 1.8M | 1,800,000 | - | Academic - Non Commercial Use | CC-BY-NC-SA | free of charge | ELRA-W0073 | 864-217-681-552-4 | - | |
150 | - | - | QUAERO | shared task | Pharmacology Patents corpus for Quaero | 2011 | - | W | patents | patents | SPEC | - | - | QUAERO | manual | fra | - | - | - | - | - | - | - | - | - | - |
151 | - | - | CINTIL | research project | Cintil-corpus | 2006 | several, NERC | WS | VAR | VAR | VAR | - | ML, NE | MUC | automatic | por | 1.1M | 1,100,000 | IOB | ELRA | ELRA | low fee | ELRA-W0050 | 176-775-844-396-0 | http://www.academia.edu/download/32290046/BarretoEtAl2006a.pdf | |
152 | - | - | CNEC | research project | Czech Named Entity Corpus 2.0 | 2007 | NERC | W | VAR | VAR | GEN | - | NE | Extended | manual | ces | 8993 sentences | 186,900 | plain text, xml, html, treex | CC-BY-NC-SA 3.0 | CC-BY-NC-SA | downloadable | - | - | http://ufal.mff.cuni.cz/cnec | - |
153 | - | - | HAREM | shared task | Harem Golden Collection | 2008 | NERC | WS | WEB, BN, NW, email, reports | WEB, BN, NW, email, reports | VAR | - | NE | Extended | - | por | 80k | 80,000 | xml | - | - | free of charge | - | - | http://www.linguateca.pt/HAREM/ | - |
154 | - | - | - | research project | EIEC Basque Named Entities Corpus v1.0 | 2004 | NERC | W | NW | NW | GEN | - | NE | CONLL | manual | eus | 50044 | 50,044 | IOB | CC BY 4.0 | CC-BY | downloadable | N/A | - | http://ixa2.si.ehu.es/eiec/eiec_v1.0.tgz | http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.302.8999 |
155 | - | - | - | research project | Original Short-Message Data Collation I | 2007 | - | W | SMS, SMess | SMS, SMess | VAR | - | NE | - | manual | zho | 265k SMS | 2,120,000 | manually tagged | ELRA | ELRA | non-free | ELRA-W0045-04 | 169-161-744-054-8 | - | - |
156 | - | - | - | research project | Original Short-Message Data Collation II | 2007 | - | W | SMS, SMess | SMS, SMess | VAR | - | NE | - | manual | zho | 202k SMS | 1,616,000 | manually tagged | ELRA | ELRA | non-free | ELRA-W0045-08 | 753-094-616-225-9 | - | - |
157 | - | - | - | research project | NER-Tweets | 2011 | NERC | W | tweets | tweets | GEN | - | NE | Extended | manual | eng | 2400 tweets | 19,200 | IOB | - | - | downloadable | - | - | http://github.com/aritter/twitter_nlp | http://dl.acm.org/citation.cfm?id=2145595 |
158 | - | - | - | - | IITB | 2009 | SA2KB | W | web sites | web sites | GEN | - | NE, references | Wikipedia | semi-automatic | eng | 107 documents | 42,800 | - | - | - | - | - | - | http://dl.acm.org/citation.cfm?id=1557073 | - |
159 | - | - | - | research project | Yapex | 2002 | NERC | W | medline abstracts | medline abstracts | BIO | - | NE | - | manual | eng | 101 medline abstracts | 35,350 | - | - | - | - | - | - | - | |
160 | - | - | - | research project | Linnaeus | 2010 | NERC | W | PubMed abstracts | medline abstracts | BIO | species | NE | - | manual | eng | 100 full texts | 40,000 | - | - | - | - | - | - | http://linnaeus.sourceforge.net/ | http://www.lrec-conf.org/proceedings/lrec2012/summaries/222.html |
161 | - | - | - | research project | BioInfer | 2008 | ERD | W | PubMed abstracts | medline abstracts | BIO | protein | NE | - | manual | eng | 836 abstracts | 292,600 | - | - | - | downloadable | - | - | http://mars.cs.utu.fi/BioInfer/ | - |
162 | - | - | - | research project | AImed | 2003 | NERC | W | PubMed abstracts | medline abstracts | BIO | protein | NE | - | manual | eng | 748 abstracts | 261,800 | - | - | - | downloadable | - | - | ftp://ftp.cs.utexas.edu/pub/mooney/bio-data/ | - |
163 | - | - | - | research project | FetchProt | 2008 | NERC | W | scientific articles | scientific articles | BIO | proteins | NE | - | manual | eng | 177 articles | 61,950 | - | - | - | downloadable | - | - | http://soda.swedishict.se/2712/ | - |
164 | - | - | - | shared task | LLL05 | 2005 | ERD | W | medline abstracts | medline abstracts | BIO | proteins/genes | NE | - | manual | eng | 80 sentences | 1,680 | - | - | - | downloadable | - | - | http://genome.jouy.inra.fr/texte/LLLchallenge/#training_download | - |
165 | - | - | - | shared task | OKE 2016 Task 1 | 2016 | NERC,EL, KBP | W | Wikipedia | WKP | SPEC | scholar biographies | NE,EL | BASIC,R | crowd-sourced | eng | 55 sentences | 1155 | NIF | - | - | downloadable | - | - | https://github.com/anuzzolese/oke-challenge-2016 | http://link.springer.com/chapter/10.1007/978-3-319-46565-4_1 |
166 | - | none | none | shared task | BSNLP 2017 | 2017 | NERC, EN, ECC | W | web pages | WEB | NEWS | politics | NE, normalization | CONLL | manual? | hrv | 200 documents | 80,000 | - | - | - | - | - | - | http://bsnlp-2017.cs.helsinki.fi/shared_task.html | |
167 | - | none | none | shared task | BSNLP 2017 | 2017 | NERC, EN, ECC | W | web pages | WEB | NEWS | politics | NE, normalization | CONLL | manual? | ces | 200 documents | 80,000 | - | - | - | - | - | - | http://bsnlp-2017.cs.helsinki.fi/shared_task.html | |
168 | - | none | none | shared task | BSNLP 2017 | 2017 | NERC, EN, ECC | W | web pages | WEB | NEWS | politics | NE, normalization | CONLL | manual? | pol | 200 documents | 80,000 | - | - | - | - | - | - | http://bsnlp-2017.cs.helsinki.fi/shared_task.html | |
169 | - | none | none | shared task | BSNLP 2017 | 2017 | NERC, EN, ECC | W | web pages | WEB | NEWS | politics | NE, normalization | CONLL | manual? | rus | 200 documents | 80,000 | - | - | - | - | - | - | http://bsnlp-2017.cs.helsinki.fi/shared_task.html | |
170 | - | none | none | shared task | BSNLP 2017 | 2017 | NERC, EN, ECC | W | web pages | WEB | NEWS | politics | NE, normalization | CONLL | manual? | slk | 200 documents | 80,000 | - | - | - | - | - | - | http://bsnlp-2017.cs.helsinki.fi/shared_task.html | |
171 | - | none | none | shared task | BSNLP 2017 | 2017 | NERC, EN, ECC | W | web pages | WEB | NEWS | politics | NE, normalization | CONLL | manual? | slv | 200 documents | 80,000 | - | - | - | - | - | - | http://bsnlp-2017.cs.helsinki.fi/shared_task.html | |
172 | - | none | none | shared task | BSNLP 2017 | 2017 | NERC, EN, ECC | W | web pages | WEB | NEWS | politics | NE, normalization | CONLL | manual? | ukr | 200 documents | 80,000 | - | - | - | - | - | - | http://bsnlp-2017.cs.helsinki.fi/shared_task.html | |
173 | - | - | - | shared task | Pascal challenge | 2005 | IE | W | workshop call for papers | WEB | VAR | science | NE | specific | manual | eng | 600 documents | 240,000 | - | - | - | downloadable | - | - | http://nlp.shef.ac.uk/pascal/Corpus.html | http://machinelearning.org/proceedings/icml2005/papers/044_Evaluating_IresonEtAl.pdf |
174 | - | - | - | shared task | OKE 2017 | 2017 | EL | - | - | - | - | - | - | - | - | eng | - | - | - | - | - | - | - | - | https://project-hobbit.eu/challenges/oke2017-challenge-eswc-2017/ | |
175 | - | - | - | research project | Cucerzan MSNBC | 2007 | NERC,EL | W | news | NEWS | VAR | - | NE, references | Wikipedia | manual | eng | 20 news stories | 8000 | txt | - | - | downloadable | - | - | http://research.microsoft.com/en-us/um/people/silviu/WebAssistant/TestData/ | https://www.microsoft.com/en-us/research/publication/large-scale-named-entity-disambiguation-based-on-wikipedia-data/ |
176 | - | - | - | research project | Cucerzan Wikipedia | 2007 | NERC,EL | W | Wikipedia | WKP | VAR | - | NE, references | Wikipedia | manual | eng | 350 wikipedia pages | 140,000 | txt | - | - | downloadable | - | - | http://research.microsoft.com/en-us/um/people/silviu/WebAssistant/TestData/ | https://www.microsoft.com/en-us/research/publication/large-scale-named-entity-disambiguation-based-on-wikipedia-data/ |
177 | - | - | - | research project | EDIEC Basque Disambiguated Named Entities Corpus | 2011 | NED | W | NP | NP | GEN | - | NE, references | Wikipedia | manual | eus | 1032 text documents | 412,800 | txt | CC BY 4.0 | CC-BY | downloadable | - | - | http://ixa2.si.ehu.es/ediec/ediec_v1.0.tgz | http://link.springer.com/chapter/10.1007%2F978-3-642-23538-2_35 |
Acronyms