Corpora for Named Entities

id | sameAs | financier | programme | workType | name | year | targetedTask | modality | textualGenre | normalizedTextualGenre | domain | subDomain | tagging | usedTypology | construction | language | size | normalizedSize | format | license | normalizedLicense | availability | catalogueReference | islrn | link | publication
1 | - | DARPA | MUC | shared task | MUC-6 | 1995 | NER | W | NW | NW | SPEC | BUS | NE | MUC | manual | eng | 200 articles, 318 docs slot filling | 207,200 | sgml | LDC | LDC | non-free | LDC2003T13 | 402-267-910-068-8 | - | -
2 | - | DARPA | MUC | shared task | MET-2 | 1998 | NER | W | NW | NW | SPEC | airline crashes, launch events | NE | MUC | manual | jpn | 414 docs | 165,600 | sgml | - | - | downloadable | - | - | http://www.itl.nist.gov/iaui/894.02/related_projects/muc/proceedings/muc_7_proceedings/overview.html, http://www.itl.nist.gov/iaui/894.02/related_projects/muc/ | -
3 | - | DARPA | MUC | shared task | MET-2 | 1998 | NER | W | NW | NW | SPEC | airline crashes, launch events | - | MUC | manual | zho | 308 docs | 123,200 | sgml | - | - | downloadable | - | - | - | -
4 | - | DARPA | MUC | shared task | MUC-7 | 1998 | NER | W | NW | NW | SPEC | MIL | NE | MUC | manual | eng | 400 articles | 160,000 | - | LDC | LDC | non-free | LDC2001T02 | 783-262-033-141-8 | http://www.aclweb.org/anthology/M98-1028 | -
5 | - | DARPA | MUC | shared task | HUB-4 | 1998 | NER | S | BN | BN | GEN | - | NE | MUC | manual | eng | 3h | 37,500 | - | - | - | - | LDC2000S86 | - | http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.28.520&rep=rep1&type=pdf | -
6 | - | NIST | ACE | shared task | ACE-2 | 2001 | EDT, RDC | WS | NW, BN, NP | NW, BN, NP | GEN | - | NE, RDC | Basic,GSP,FAC | manual | eng | 180k | 180,000 | sgml,xml,tab | LDC | LDC | non-free | LDC2003T11 | 498-363-793-174-9 | ftp://jaguar.ncsl.nist.gov/ace/phase1/edt_phase1_v2.2.pdf, ftp://jaguar.ncsl.nist.gov/ace/phase2/docs/RDC-Guidelines-v2.3.doc | -
7 | - | NIST | CoNLL | shared task | CoNLL 2002 | 2002 | NERC | W | NW | NW | GEN | - | NE | POS,CoNLL | manual | spa | 370k | 370,000 | IOB | private | private | non-free | - | - | http://www.cnts.ua.ac.be/conll2002/ner/ | -
8 | - | NIST | CoNLL | shared task | CoNLL 2002 | 2002 | NERC | W | NW | NW | GEN | - | NE | POS,CoNLL | manual | nld | 310k | 310,000 | IOB | private | private | non-free | - | - | - | -
9 | - | NIST | CoNLL | shared task | CoNLL 2003 | 2003 | NERC | W | NW | NW | GEN | - | NE | POS,CoNLL | manual | eng | 210k | 210,000 | IOB | private | private | non-free | - | - | http://www.cnts.ua.ac.be/conll2003/ner/ | -
10 | - | NIST | CoNLL | shared task | CoNLL 2003 | 2003 | NERC | W | NW | NW | GEN | - | NE | POS,CoNLL | manual | deu | 310k | 310,000 | IOB | private | private | non-free | - | - | - | -
11 | - | NIST | ACE/TIDES | shared task | ACE 2003 | 2003 | EDT, RDC | WS | NW, BN, TS | NW, BN, TS | GEN | - | NE | - | manual | eng | 91k | 91,000 | sgml,xml,tab | LDC | LDC | non-free | LDC2004T09 | 685-740-491-198-0 | - | -
12 | - | US NSF | Berkeley | research project | BioText | 2003 | RDC | W | articles | medline abstracts | BIO | - | NE,RDC | - | manual | eng | 1100 medline abstracts | 385,000 | XML | - | - | downloadable | - | - | http://biotext.berkeley.edu/data/dis_treat_data.html | -
13 | - | NIST | ACE/TIDES | shared task | ACE 2003 | 2003 | EDT, RDC | WS | - | - | GEN | - | - | - | manual | ara | 43k | 43,000 | sgml,xml,tab | LDC | LDC | non-free | - | - | - | -
14 | - | NIST | ACE/TIDES | shared task | ACE 2003 | 2003 | EDT, RDC | WS | - | - | GEN | - | - | - | manual | zho | 98k | 98,000 | sgml,xml,tab | LDC | LDC | non-free | - | - | - | -
15 | - | NIST | ACE | shared task | ACE 2004 | 2004 | EDT, RDC | WS | NW, BN | NW, BN | GEN | - | NE, TE, RD | ACE | manual | eng | 158k | 158,000 | sgml,xml,tab | LDC | LDC | non-free | LDC2005T09 | 789-870-824-708-5 | - | -
16 | - | - | Biocreative | shared task | BIOCREATIVE I | 2004 | NERC | W | medline articles (from GENETAG) | medline abstracts | BIO | Gene mentions | NE | - | semi-automatic | eng | 15,000 sentences | 315,000 | - | - | - | - | - | - | http://www.biocreative.org/tasks/biocreative-i/first-task-gm/ | https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-6-S1-S1
17 | - | - | - | shared task | JNLPBA | 2004 | NERC | W | medline abstracts | medline abstracts | BIO | Biomedical | NE | - | semi-automatic | eng | 401 medline abstracts | 140,350 | - | - | - | - | - | - | - | http://dl.acm.org/citation.cfm?id=1567610
18 | - | NIST | GALE, AQUAINT, ACE, TIDES | - | BBN Pronoun Coreference and Entity Type | 2005 | NERC | W | NP | NP | GEN | - | NE | Extended | - | eng | 1M | 1,000,000 | txt (stand-off annotation) | LDC | LDC | non-free | LDC2005T33 | 375-520-999-436-0 | - | -
19 | - | NIST | ACE | shared task | ACE 2005 SpatialML | 2005 | - | WS | NW, BN and BC | NW, BN, BC | GEN | - | PLN | N/A | manual | eng | 300k | 300,000 | sgml,xml,tab | LDC | LDC | non-free | LDC2008T03 | 472-226-418-389-7 | - | -
20 | - | NIST | ACE | shared task | ACE 2005 SpatialML v2 | 2005 | - | WS | NW, BN and BC | NW, BN, BC | GEN | - | - | N/A | manual | eng | 210k | 210,000 | sgml,xml,tab | LDC | LDC | non-free | LDC2011T02 | 912-956-774-503-2 | - | -
21 | - | NIST | ACE | shared task | ACE 2005 SpatialML | 2005 | - | WS | NW, BN and BC | NW, BN, BC | GEN | - | - | N/A | manual | zho | 298 docs | 119,200 | sgml,xml,tab | LDC | LDC | non-free | LDC2010T09 | 951-452-048-245-8 | - | -
22 | - | NIST | ACE | shared task | ACE 2005 | 2005 | EDR, RDC | WS | NW, BN, BC, WL, TS | NW, BN, BC, WL, TS | GEN | - | NE, TE, RD, events | ACE | manual | eng | 303k | 303,000 | sgml,xml,tab | LDC | LDC | non-free | LDC2006T06 | 458-031-085-383-4 | - | -
23 | - | - | Biocreative | shared task | GENETAG-05 | 2005 | NERC | W | medline articles | medline abstracts | BIO | - | NE | - | semi-automatic | eng | 20,000 sentences, 547,801 words | 547,801 | - | - | - | downloadable | - | - | http://biocreative.sourceforge.net/bio_corpora_links.html | https://www.researchgate.net/publication/7782145_GENETAG_A_tagged_corpus_for_geneprotein_named_entity_recognition
24 | - | - | - | shared task | LLL05 | 2005 | RDC | W | articles | medline abstracts | BIO | - | NE,RDC | - | - | eng | - | - | - | - | - | - | - | - | http://genome.jouy.inra.fr/texte/LLLchallenge/#task1 | -
25 | - | NIST | ACE | shared task | ACE 2005 | 2005 | EDR, RDC | WS | NW, BN, BC, WL, TS | NW, BN, BC, WL, TS | GEN | - | NE, TE, RD, events | ACE | manual | ara | 112k | 112,000 | sgml,xml,tab | LDC | LDC | non-free | - | - | - | -
26 | - | NIST | ACE | shared task | ACE 2005 | 2005 | EDR, RDC | WS | NW, BN, BC, WL, TS | NW, BN, BC, WL, TS | GEN | - | NE, TE, RD, events | ACE | manual | zho | 334k | 334,000 | sgml,xml,tab | LDC | LDC | non-free | - | - | - | -
27 | - | - | Biocreative | shared task | GENETAG | 2005 | NERC | W | Medline | medline abstracts | BIO | - | NE | - | semi-automatic | eng | 547k | 547,000 | - | - | - | - | - | - | http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-6-S1-S3 | -
28 | - | - | Biocreative | shared task | BIOCREATIVE II | 2006 | NERC,RDC | W | articles | medline abstracts | BIO | - | NE,RDC | - | - | eng | 4,171 sentences | 87,591 | IeXML | - | - | downloadable | - | - | http://www.biocreative.org/tasks/biocreative-ii/, ftp://ftp.ebi.ac.uk/pub/software/textmining/corpora/BioCreative/ | -
29 | - | - | - | research project | Yapex | 2002 | NERC | W | medline abstracts | medline abstracts | BIO | protein | NERC | - | manual | eng | 55,616 | 55,616 | - | - | - | - | - | - | http://universal.elra.info/product_info.php?cPath=42_43&products_id=1460 | www.mpb.unige.ch/reports/rap_SanaaChtioui.pdf
30 | - | - | - | research project | Genia | 2006 | NERC | W | articles | medline abstracts | BIO | - | ML | MeSH | - | eng | 1999 abstracts | 699,650 | XML | - | - | downloadable | - | - | http://www.geniaproject.org/genia-corpus/term-corpus, http://universal.elra.info/product_info.php?cPath=42_43&products_id=1460 | -
31 | - | EVALDA | ESTER | shared task | ESTER 1 | 2006 | NERC | S | BN | BN | GEN | - | NE | Extended | manual | fra | 100 hours | 1,250,000 | xml | - | - | low fee | - | - | - | -
32 | - | - | - | research project | ROCO - Romanian journalistic corpus | 2006 | several, NERC | W | NP | NP | GEN | - | ML, NE | not said | semi-automatic | ron | 7.1M | 7,100,000 | xml | ELRA | ELRA | free of charge | ELRA-W0085 | 312-617-089-348-7 | http://www.lrec-conf.org/proceedings/lrec2006/pdf/451_pdf.pdf | -
33 | - | - | Technolangue | shared task | ARCADE 2 | 2006 | several, NERC | W | NP, parallel corpora | NP, parallel corpora | GEN | - | alignment, NE | not said | manual | fra | 316k | 316,000 | xml | ELRA | ELRA | low fee | ELRA-E0018 | 875-865-064-331-9 | - | -
34 | - | EU | CESAR PROJECT | - | Szeged NER corpus | 2006 | NERC | W | short business news | NW | GEN | - | NE | CoNLL | manual | hun | 200k tokens | 200,000 | - | Academic - Non Commercial Use | CC-BY-NC-SA | - | - | - | http://www.lrec-conf.org/proceedings/lrec2006/pdf/365_pdf.pdf | -
35 | - | - | - | research project | BulTreeBank | 2006 | several, NERC | W | VAR | VAR | GEN | news, literature | ML,NE | - | manual | bul | 15k sentences | 315,000 | xml | free license | CC-BY | apparently downloadable | - | - | http://www.bultreebank.org | -
36 | - | NIST | ACE | shared task | ACE 2007 | 2007 | EDT,RDR | W | NW | NW | GEN | - | EDR, RDR, EMD, RMD | ACE | manual | spa | 150k | 150,000 | sgml,xml,tab | LDC | LDC | non-free | LDC2014T18 | 600-375-253-846-9 | - | -
37 | - | NIST | ACE | shared task | ACE 2007 | 2007 | EDT,RDR | W | NW, WL, BN | NW, WL, BN | GEN | - | EDR, RDR, EMD, RMD | ACE | manual | ara | 205k | 205,000 | sgml,xml,tab | LDC | LDC | non-free | LDC2014T18 | 600-375-253-846-9 | - | -
38 | - | US NSF | TREC | shared task | TREC Genomics 2007 | 2007 | IR | W | articles | scientific articles | BIO | - | passages | - | - | eng | - | - | - | - | - | - | - | - | - | -
39 | - | NIST | ACE | shared task | ACE 2007 | 2007 | EDT,RDR | WS | BC,NW,BN,WL,Conversation | BC,NW,BN,WL,Conversation | GEN | - | EDR, RDR, EMD, RMD | ACE | manual | eng | 265k | 265,000 | sgml,xml,tab | LDC | LDC | non-free | - | - | - | -
40 | - | NIST | ACE | shared task | ACE 2007 | 2007 | EDT,RDR | W | BN, NW, WL | BN, NW, WL | GEN | - | EDR, RDR, EMD, RMD | ACE | manual | zho | 125k | 125,000 | sgml,xml,tab | LDC | LDC | non-free | - | - | - | -
41 | - | - | EVALITA | shared task | I-CAB 4.1, Evalita 2007 | 2007 | NERC | W | NP | NP | GEN | - | ML, NE | P,O,L,GPE | manual | ita | 182k | 182,000 | IOB,xml | free license | CC-BY | free | - | - | - | -
42 | - | - | Metanet4u Project | shared task | CHIL 2007+ Evaluation Package | 2007 | several, NERC | S | seminars | seminars | GEN | - | multi layer | not said | not said | eng | - | - | DB | ELRA | ELRA | non-free | ELRA-E0041 | 639-487-568-289-4 | - | -
43 | - | - | CNEC | research project | Czech Named Entity Corpus 1.1 (CNEC) | 2007 | NERC | W | VAR | VAR | GEN | - | NE | Extended | manual | ces | 5800 sentences | 121,800 | plain text, xml, html, treex | CC-BY-NC-SA 3.0 | CC-BY-NC-SA | downloadable | - | - | http://ufal.mff.cuni.cz/cnec | -
44 | - | - | - | - | ANERCorp | 2007 | NERC | W | NW,WEB | NW,WEB | GEN | - | NE | CoNLL | manual | ara | 150,000 | 150,000 | BIO | - | - | downloadable | - | - | http://users.dsic.upv.es/~ybenajiba/ | http://link.springer.com/chapter/10.1007/978-3-540-70939-8_13
45 | - | US NSF | - | research project | Manually Annotated Sub-Corpus Third Release (MASC) | 2008 | several, NERC | WS | ANC, contemporary English | VAR | GEN | - | ML,NE | Basic,D | semi-automatic | eng | 500k | 500,000 | - | - | - | - | LDC2013T12 | - | http://www.lrec-conf.org/proceedings/lrec2008/pdf/617_paper.pdf | -
46 | - | - | - | research project | PennBioIE CYP 1.0 Corpus | 2008 | NERC | W | PubMed abstracts | medline abstracts | BIO | - | ML,NE | 5 biomedical entities | manual | eng | 274k | 274,000 | standoff | private | fee | - | LDC2008T20 | - | - | -
47 | - | - | - | research project | PennBioIE Oncology Corpus | 2008 | NERC | W | PubMed abstracts | medline abstracts | BIO | - | oncological NE | - | manual | eng | 380k | 380,000 | standoff | private | fee | - | LDC2008T21 | - | http://anthology.aclweb.org/W/W04/W04-3111.pdf | -
48 | - | - | - | - | Arizona Disease | 2008 | NERC | W | PubMed abstracts | medline abstracts | BIO | Biomedical | NE | diseases | - | eng | 2,775 sentences | 58,275 | - | - | - | downloadable | - | - | ftp://ftp.ebi.ac.uk/pub/software/textmining/corpora/diseases/ | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2352871/
49 | - | - | - | research project | SCAI-Test | 2008 | NERC | W | - | - | BIO | Biomedical | NE | - | manual | eng | 100 medline abstracts | 35,000 | IOB | - | - | downloadable | - | - | https://www.scai.fraunhofer.de/en/business-research-areas/bioinformatics/downloads/corpora-for-chemical-entity-recognition.html | https://pub.uni-bielefeld.de/download/2603498/2624539
50 | - | - | Biocreative | shared task | BIOCREATIVE III | 2009 | RDC | W | PubMed abstracts | medline abstracts | BIO | - | RDC | - | - | - | - | 265,000 | xml | - | - | - | - | - | http://www.biocreative.org/tasks/biocreative-iii/ppi/ | -
51 | - | - | - | shared task | ESTER 2 corpus | 2009 | NERC | S | BN | BN | GEN | - | NE | Extended | manual | fra | 100 hours | 1,250,000 | xml | - | - | low fee | ELRA-S0338 | 123-207-221-143-8 | - | -
52 | - | NIST | ACE | shared task | REFLEX Entity Translation | 2009 | NERC, Entity Translation | W | NW,WL | NW,WL | GEN | - | NE, TE | ACE | manual | eng | 22.5k | 22,500 | - | LDC | LDC | - | LDC2009T09 | - | http://www.itl.nist.gov/iad/mig//tests/ace/2007/et/ | -
53 | - | NIST | ACE | shared task | REFLEX Entity Translation | 2009 | NERC, Entity Translation | W | NW,WL | NW,WL | GEN | - | NE, TE | ACE | manual | zho | 22.5k | 22,500 | - | LDC | LDC | - | LDC2009T10 | - | - | -
54 | - | NIST | ACE | shared task | REFLEX Entity Translation | 2009 | NERC, Entity Translation | W | NW,WL | NW,WL | GEN | - | NE, TE | ACE | manual | ara | 22.5k | 22,500 | - | LDC | LDC | - | LDC2009T11 | - | - | -
55 | - | SCHWA | WikiNER | research project | WikiGold | 2009 | NERC | W | WKPD | WKPD | GEN | - | NE | CoNLL | manual | eng | 39k | 39,000 | IOB | CC BY 3.0 | CC BY | downloadable | - | - | http://schwa.org/projects/resources/wiki/Wikiner | http://www.joelnothman.com/downloads/PeoplesWeb02.pdf
56 | - | EU FP7 | - | research project | CALBC-SSC-III-Small | 2009 | NERC | W | medline abstracts | medline abstracts | BIO | Biomedical | NE | - | automatic | eng | 179,999 medline abstracts | 62,999,650 | - | - | - | - | - | - | http://www.ebi.ac.uk/Rebholz-srv/CALBC/corpora/resources.html | http://lbm2009.biopathway.org/papers/long/The_CALBC_Silver_Standard_Corpus_-_Harmonizing_multiple_semantic_annotations_in_a_large_biomedical_corpus.pdf
57 | - | EU FP7 | - | research project | CALBC-SSC-III-Big | 2009 | NERC | W | medline abstracts | medline abstracts | BIO | Biomedical | NE | - | automatic | eng | 714,283 medline abstracts | 249,999,050 | - | - | - | - | - | - | - | -
58 | - | - | - | research project | FSU-PRGE | 2009 | - | W | medline abstracts | medline abstracts | BIO | Biomedical | NE | - | semi-automatic | eng | 3306 medline abstracts | 1,157,100 | - | - | - | - | - | - | http://pubannotation-old.dbcls.jp/projects/FSU-PRGE | http://aclweb.org/anthology/W/W10/W10-1838.pdf
59 | - | - | - | research project | LINNAEUS | 2010 | NERC | W | PubMed articles | scientific articles | BIO | - | NE | - | - | eng | 10000 articles | 4,000,000 | - | - | - | - | - | - | http://linnaeus.sourceforge.net/ | -
60 | - | - | - | research project | Finin-tweets | 2010 | NERC | W | tweets | tweets | GEN | - | NE | Basic | crowd-sourced | eng | 12,800 tweets | 102,400 | IOB | - | - | downloadable | - | - | http://www.cs.jhu.edu/~mdredze/publications/amt_ner.pdf | -
61 | - | - | - | shared task | Evalita 2011 | 2011 | NERC | S | BN | BN | GEN | - | NE | P,O,L,GPE | manual | ita | 79238 | 79,238 | - | - | - | - | - | - | http://www.evalita.it/2011/tasks/NER | -
62 | - | - | - | research project | Ratinov-AQUAINT subset | 2011 | NERC,EL | W | NW | NW | GEN | - | NE, references | - | manual | eng | - | - | sgml | - | - | - | LDC2002T31 | - | http://cogcomp.cs.illinois.edu/papers/RRDA11.pdf | -
63 | - | - | - | research project | Ratinov-MSNBC | 2011 | NERC,EL | W | NW | NW | GEN | - | NE, references | - | manual | eng | - | - | NIF | - | - | - | - | - | http://cogcomp.cs.illinois.edu/papers/RRDA11.pdf | -
64 | - | EU | - | research project | Polish Sejm Corpus | 2011 | NERC | S | parliament | parliament | GEN | - | POS, syntax, NE | Basic | automatic | pol | 14M tokens | 14,000,000 | - | unrestricted use | CC BY | free of charge | - | - | - | -
65 | - | - | - | research project | AIDA CoNLL-YAGO Dataset | 2011 | NERC, EL | W | Reuters news corpora | NW | GEN | - | NE, references | CoNLL | automatic | eng | - | - | NIF | free of charge, CoNLL licence | CoNLL | downloadable | - | - | https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/aida/downloads/ | -
66 | - | - | - | research project | Polish Coreference Corpus | 2011 | NERC | WS | various data (National Corpus of Polish) | VAR | GEN | - | ML,NE | Basic | automatic | pol | 500k | 500,000 | xml | CC-BY | CC-BY | broken link | - | - | - | -
67 | - | - | - | - | Ratinov-ACE-coref 2004 | 2011 | NERC,EL | W | - | - | GEN | - | NE, references | - | manual | eng | - | - | NIF | - | - | - | - | - | see also for NIF: http://dashboard.nlp2rdf.aksw.org/ | -
68 | - | - | - | - | Ratinov-Wiki | 2011 | NERC | W | WKPD | WKPD | - | - | - | - | semi-automatic | eng | - | - | NIF | - | - | - | - | - | - | -
69 | - | - | ETAPE | shared task | ETAPE | 2012 | NERC | S | BC,BN | BC,BN | GEN | - | NE, speech | 7 types | manual | fra | 30 hours | 375,000 | ? | ? | - | ? | - | - | - | -
70 | - | - | Ancora | research project | Ancora corpus | 2012 | several, NERC | W | NP | NP | GEN | - | ML, NE | CoNLL, NUM | - | spa | 500k | 500,000 | ? | GPL | GPL | free of charge | - | - | http://clic.ub.edu/corpus/en/ancora | -
71 | - | - | Ancora | research project | Ancora corpus | 2012 | several, NERC | W | NP | NP | GEN | - | ML, NE | CoNLL, NUM | - | cat | 500k | 500,000 | ? | GPL | GPL | free of charge | - | - | - | -
72 | - | CESAR | - | research project | OpinHuBank | 2012 | OM | W | NW,SocialMedia,Blog | VAR | GEN | - | NE,OM | not said | automatic | hun | 80k (8000 sent.) | 80,000 | txt/csv | CC-BY-SA | CC-BY-SA | - | - | - | - | -
73 | - | - | - | research project | Szeged Criminal NE Corpus | 2012 | NERC | W | texts on criminal offences | texts on criminal offences | SPEC | CRIME | NE | CoNLL | manual | hun | 540k | 540,000 | - | Academic - Non Commercial Use | CC-BY-NC-SA | - | - | - | - | -
74 | - | - | - | shared task | Entity profiling ORM Twitter (Meij) | 2012 | ORM, C2KB | W | tweets | tweets | GEN | - | terms, NE | O | semi-automatic | eng | - | - | tsv | - | - | - | - | - | http://nlp.uned.es/~damiano/datasets/entityProfiling_ORM_Twitter.html | http://nlp.uned.es/~damiano/pdf/spina2012corpusEntityProfiling.pdf
75 | - | - | - | - | ROMBAC - Romanian balanced corpus | 2012 | several, NERC | W | VAR | VAR | VAR | journalism, law, fiction, medicine, biographical | ML, NE | not said | semi-automatic | ron | 41M | 41,000,000 | xml | ELRA | ELRA | free of charge | ELRA-W0088 | 162-192-982-061-0 | - | http://www.lrec-conf.org/proceedings/lrec2012/pdf/218_Paper.pdf
76 | - | PANACEA | - | research project | Panacea | 2012 | several, NERC | W | WEB | WEB | ENV | - | ML,NE | not said | automatic | ell | 34.6M | 34,600,000 | - | CC-BY-SA | CC-BY-SA | - | - | - | - | -
77 | - | PANACEA | - | research project | Panacea | 2012 | several, NERC | W | WEB | WEB | ENV | - | ML,NE | not said | automatic | ita | 36M | 36,000,000 | txt/plain | CC-BY-SA | CC-BY-SA | - | - | - | - | -
78 | - | PANACEA | - | research project | Panacea | 2012 | several, NERC | W | WEB | WEB | ENV | - | ML,NE | not said | automatic | spa | 30M | 30,000,000 | txt/plain | CC-BY-SA | CC-BY-SA | - | - | - | - | -
79 | - | PANACEA | - | research project | Panacea | 2012 | several, NERC | W | WEB | WEB | LAB | - | ML,NE | not said | automatic | ita | 70M | 70,000,000 | txt/plain | CC-BY-SA | CC-BY-SA | - | - | - | - | -
80 | - | PANACEA | - | research project | Panacea | 2012 | several, NERC | W | WEB | WEB | LAB | - | ML,NE | not said | automatic | spa | 60M | 60,000,000 | txt/plain | CC-BY-SA | CC-BY-SA | - | - | - | - | -
81 | - | PANACEA | - | research project | Panacea | 2012 | several, NERC | W | WEB | WEB | LAB | - | ML,NE | not said | automatic | ell | 26M | 26,000,000 | txt/plain | CC-BY-SA | CC-BY-SA | - | - | - | - | -
82 | - | - | - | - | HunNERwiki | 2012 | NERC | W | WKPD | WKPD | GEN | - | ML,NE | CoNLL | automatic | hun | 19M | 19,000,000 | txt,csv | CC-BY-SA 3.0 | CC-BY-SA | - | - | - | Automatically generated NE tagged corpora for English and Hungarian: http://hlt.sztaki.hu/resources/hunnerwiki.html | -
83 | - | - | - | research project | Corpus NE | 2012 | QA | W | - | - | GEN | - | NE | Basic | manual | eng | 60k (6000 sent.) | 60,000 | - | CC-BY-SA | CC-BY-SA | downloadable | - | - | http://metashare.elda.org/repository/browse/corpusne/8a5694f8a19911e1ab95080027f903f25bf18b8246744a13b65da3b6515cb0d4/ | -
84 | - | Appen | - | - | Appen Named Entity Corpora | 2012 | NERC | WS | - | - | GEN | - | NE | Basic,GPE,Nationality,Religion,Facility,Titles,Quantities | - | ara | 500k | 500,000 | - | - | - | non-free | - | - | http://isca-speech.org/iscapad/iscapad.php?module=article&id=11576 | -
85 | - | Appen | - | - | Appen Named Entity Corpora | 2012 | NERC | WS | - | - | GEN | - | NE | Basic,GPE,Nationality,Religion,Facility,Titles,Quantities | - | eng | 500k | 500,000 | - | - | - | non-free | - | - | http://isca-speech.org/iscapad/iscapad.php?module=article&id=11577 | -
86 | - | Appen | - | - | Appen Named Entity Corpora | 2012 | NERC | WS | - | - | GEN | - | NE | Basic,GPE,Nationality,Religion,Facility,Titles,Quantities | - | fas | 500k | 500,000 | - | - | - | non-free | - | - | http://isca-speech.org/iscapad/iscapad.php?module=article&id=11578 | -
87 | - | Appen | - | - | Appen Named Entity Corpora | 2012 | NERC | WS | - | - | GEN | - | NE | Basic,GPE,Nationality,Religion,Facility,Titles,Quantities | - | kor | 500k | 500,000 | - | - | - | non-free | - | - | http://isca-speech.org/iscapad/iscapad.php?module=article&id=11579 | -
88 | - | Appen | - | - | Appen Named Entity Corpora | 2012 | NERC | WS | - | - | GEN | - | NE | Basic,GPE,Nationality,Religion,Facility,Titles,Quantities | - | jpn | 500k | 500,000 | - | - | - | non-free | - | - | http://isca-speech.org/iscapad/iscapad.php?module=article&id=11580 | -
89 | - | Appen | - | - | Appen Named Entity Corpora | 2012 | NERC | WS | - | - | GEN | - | NE | Basic,GPE,Nationality,Religion,Facility,Titles,Quantities | - | rus | 500k | 500,000 | - | - | - | non-free | - | - | http://isca-speech.org/iscapad/iscapad.php?module=article&id=11581 | -
90 | - | Appen | - | - | Appen Named Entity Corpora | 2012 | NERC | WS | - | - | GEN | - | NE | Basic,GPE,Nationality,Religion,Facility,Titles,Quantities | - | zho | 500k | 500,000 | - | - | - | non-free | - | - | http://isca-speech.org/iscapad/iscapad.php?module=article&id=11582 | -
91 | - | Appen | - | - | Appen Named Entity Corpora | 2012 | NERC | WS | - | - | GEN | - | NE | Basic,GPE,Nationality,Religion,Facility,Titles,Quantities | - | urd | 500k | 500,000 | - | - | - | non-free | - | - | http://isca-speech.org/iscapad/iscapad.php?module=article&id=11583 | -
92 | - | EU FP7 | Accurat | research project | TildeNER | 2012 | NERC | W | - | - | VAR | - | NE | MUC extended | manual | lav | 72k | 72,000 | - | Apache 2.0 | Apache | - | - | - | http://www.accurat-project.eu/index.php?p=accurat-toolkit | www.lrec-conf.org/proceedings/lrec2012/pdf/948_Paper.pdf
93 | - | EU FP7 | Accurat | research project | TildeNER | 2012 | NERC | W | - | - | VAR | - | NE | MUC extended | manual | lit | 73k | 73,000 | - | Apache 2.0 | Apache | - | - | - | - | -
94 | - | DARPA | GALE | research project | OntoNotes 5.0 | 2013 | several, NERC | WS | BN,BC,NW,WEB | BN,BC,NW,WEB | GEN | - | ML, NE | Extended | - | eng | 1.445M | 1,445,000 | txt (stand-off annotation), sql DB with Python API | LDC | LDC | non-free | LDC2013T19 | 151-738-649-048-2 | - | -
95 | - | DARPA | GALE | research project | OntoNotes 5.0 | 2013 | several, NERC | WS | BN,BC,NW,WEB | BN,BC,NW,WEB | GEN | - | ML, NE | Extended | - | cmn | 690k | 690,000 | txt (stand-off annotation), sql DB with Python API | LDC | LDC | non-free | LDC2013T19 | 151-738-649-048-2 | - | -
96 | - | DARPA | GALE | research project | OntoNotes 5.0 | 2013 | several, NERC | W | BN,BC,NW,WEB | BN,BC,NW,WEB | GEN | - | ML, NE | Extended | - | ara | 300k | 300,000 | txt (stand-off annotation), sql DB with Python API | LDC | LDC | non-free | LDC2013T19 | 151-738-649-048-2 | - | -
97 | - | - | Biocreative | shared task | CHEMDNER | 2013 | NERC | W | PubMed abstracts | medline abstracts | BIO | - | NE | - | - | eng | - | - | - | - | - | - | - | - | http://www.biocreative.org/tasks/biocreative-iv/chemdner/ | https://jcheminf.springeropen.com/articles/10.1186/1758-2946-7-S1-S2
98 | - | - | - | research project | Estonian NER corpus | 2013 | NERC | W | NP | NP | GEN | - | NE | CoNLL | manual | est | 184k | 184,000 | IOB | CC-BY-NC | CC-BY-NC | downloadable | - | - | http://www.aclweb.org/anthology/W/W13/W13-24.pdf#page=90 | -
99 | - | - | - | shared task | BioNLP-ST 2013 | 2013 | NERC | W | public web site on biology | WEB | SPEC | BIO | NE | OntoBioTope Ontology (1700 concepts) | manual | eng | 2040 docs | 816,000 | OBO format | free license | CC-BY | downloadable | - | - | http://2013.bionlp-st.org/tasks/bacteria-biotopes | -
100 | - | - | WWW 2013 | shared task | Microposts2013 | 2013 | NERC | W | tweets | tweets | VAR | - | NE | CoNLL | manual | eng | 4300 tweets | 34,400 | tsv | CC-BY-NC-SA | CC-BY-NC-SA | downloadable | - | - | http://oak.dcs.shef.ac.uk/msm2013/challenge.html | http://ceur-ws.org/Vol-1019/msm2013-challenge-report.pdf
101 | - | - | WikiNER | research project | Silver-corpus | 2013 | NERC | W | WKPD | WKPD | GEN | - | NE | CoNLL,Extended | automatic | deu | 3.5M | 3,500,000 | tsc | CC BY 3.1 | CC BY | downloadable | - | - | - | -
102 | - | - | WikiNER | research project | Silver-corpus | 2013 | NERC | W | WKPD | WKPD | GEN | - | NE | CoNLL,Extended | automatic | eng | 3.5M | 3,500,000 | tsc | CC BY 3.1 | CC BY | downloadable | - | - | - | -
103 | - | - | WikiNER | research project | Silver-corpus | 2013 | NERC | W | WKPD | WKPD | GEN | - | NE | CoNLL,Extended | automatic | spa | 3.5M | 3,500,000 | tsc | CC BY 3.1 | CC BY | downloadable | - | - | - | -
104 | - | - | WikiNER | research project | Silver-corpus | 2013 | NERC | W | WKPD | WKPD | GEN | - | NE | CoNLL,Extended | automatic | fra | 3.5M | 3,500,000 | tsc | CC BY 3.1 | CC BY | downloadable | - | - | - | -
105 | - | - | WikiNER | research project | Silver-corpus | 2013 | NERC | W | WKPD | WKPD | GEN | - | NE | CoNLL,Extended | automatic | ita | 3.5M | 3,500,000 | tsc | CC BY 3.1 | CC BY | downloadable | - | - | - | -
106 | - | - | WikiNER | research project | Silver-corpus | 2013 | NERC | W | WKPD | WKPD | GEN | - | NE | CoNLL,Extended | automatic | nld | 3.5M | 3,500,000 | tsc | CC BY 3.1 | CC BY | downloadable | - | - | - | -
107 | - | - | WikiNER | research project | Silver-corpus | 2013 | NERC | W | WKPD | WKPD | GEN | - | NE | CoNLL,Extended | automatic | pol | 3.5M | 3,500,000 | tsc | CC BY 3.1 | CC BY | downloadable | - | - | - | -
108 | - | - | WikiNER | research project | Silver-corpus | 2013 | NERC | W | WKPD | WKPD | GEN | - | NE | CoNLL,Extended | automatic | por | 3.5M | 3,500,000 | tsc | CC BY 3.1 | CC BY | downloadable | - | - | - | -
109 | - | - | WikiNER | research project | Silver-corpus | 2013 | NERC | W | WKPD | WKPD | GEN | - | NE | CoNLL,Extended | automatic | rus | 3.5M | 3,500,000 | tsc | CC BY 3.1 | CC BY | downloadable | - | - | - | -
110 | - | - | - | research project | AIDA-EE Dataset | 2014 | - | W | gigaword5 corpus | NW | GEN | - | NE, references | - | - | eng | - | - | - | - | - | - | - | - | https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/aida/downloads/ | -
111 | - | - | NE3L | research project | NE Arabic corpus | 2014 | NERC | W | NP | NP | GEN | - | NE | MUC | - | ara | 100k | 100,000 | txt | ELRA | ELRA | non-free | ELRA-W0078 | 398-979-151-557-0 | - | -
112 | - | - | NE3L | research project | NE Chinese corpus | 2014 | NERC | W | NP | NP | GEN | - | NE | MUC | - | zho | 80k | 80,000 | txt | ELRA | ELRA | non-free | ELRA-W0079 | 187-154-782-686-9 | - | -
113 | - | - | NE3L | research project | NE Russian corpus | 2014 | NERC | W | NP | NP | GEN | - | NE | MUC | - | rus | 75k | 75,000 | txt | ELRA | ELRA | non-free | ELRA-W0080 | 024-620-556-146-2 | - | -
114 | - | - | N3-Collection | research project | News 100 | 2014 | EL | W | NP | NP | GEN | - | NE, references | Basic | manual | deu | 48k | 48,000 | NIF | CC-BY-NC-SA-4 | CC-BY-NC-SA | downloadable | - | - | http://svn.aksw.org/papers/2014/LREC_N3NIFNERNED/public.pdf | -
115 | - | - | N3-Collection | research project | Reuters 128 | 2014 | EL, SA2KB | W | NP | NP | SPEC | CRIME | NE, references | Basic | manual | eng | 33k | 33,000 | NIF | CC-BY-NC-SA-4 | CC-BY-NC-SA | downloadable | - | - | - | -
116 | - | - | N3-Collection | research project | RSS 500 | 2014 | EL, SA2KB | W | NP | NP | GEN | POL,ECO,SCI | NE, references | Basic | manual | eng | 31k | 31,000 | NIF | CC-BY-NC-SA-4 | CC-BY-NC-SA | downloadable | - | - | - | -
117 | - | - | - | research project | DBpedia Spotlight NIF NER Corpus | 2014 | EL, SA2KB | W | NP | NP | GEN | - | NE, references | Basic | manual | eng | 3500 | 3500 | NIF | CC BY 4.0 | CC-BY | downloadable | - | - | https://datahub.io/dataset/dbpedia-spotlight-nif-ner-corpus | -
118 | - | - | - | research project | Abstract Meaning Representation (AMR) Annotation Release 1.0 | 2014 | several, NERC | W | NW,WEB | NW,WEB | GEN | - | ML,NE | - | manual | eng | - | - | treebank | LDC | LDC | - | LDC2014T12 | - | - | -
119 | - | - | Twitter Adverse Drug Reaction Mentions - ASU DIEGO Lab | research project | binary corpus | 2014 | NERC, EL | W | tweets | tweets | SPEC | BIO | presence of adverse drug reaction | adverse drug reactions | manual | eng | 7574 tweets | 60,592 | various files and script | unrestricted use | CC-BY | downloadable | - | - | - | -
120 | - | - | Twitter Adverse Drug Reaction Mentions - ASU DIEGO Lab | research project | full annotation corpus | 2014 | NERC, EL | W | tweets | tweets | SPEC | BIO | annotation of adverse drug reaction | adverse drug reactions | manual | eng | 1784 tweets | 14,272 | various files and script | unrestricted use | CC-BY | downloadable | - | - | - | -
121 | - | - | - | research project | Named Entity Recognition on Turkish Tweets | 2014 | NER | W | tweets | tweets | GEN | - | NE | MUC | manual | tur | - | - | - | - | - | downloadable | - | 764-177-227-350-7 | https://ec.europa.eu/jrc/en/language-technologies | -
122 | - | - | WWW 2014 | shared task | Microposts2014 | 2014 | NERC,EL | W | tweets | tweets | GEN | events | NE, references | NERD | semi-automatic | eng | 3505 tweets | 28,040 | tsv | Twitter license | Twitter | need to subscribe | - | - | http://www.wikicfp.com/cfp/servlet/event.showcfp?eventid=34440&copyownerid=2 | http://ceur-ws.org/Vol-1141/microposts2014_neel-challenge-report.pdf
123 | - | - | - | research project | TaLAPi: A Thai Linguistically Annotated Corpus for Language Processing | 2014 | several, NERC | W | VAR | VAR | GEN | news, entertainment, lifestyle | ML,NE | Extended | manual | tha | 1M | 1,000,000 | - | - | - | - | - | - | - | http://www.lrec-conf.org/proceedings/lrec2014/pdf/59_Paper.pdf
124 | - | - | - | research project | KAIST silver standard corpus (404 error) | 2014 | NERC | W | Wikipedia,DBpedia | WKPD | GEN | - | NE | - | automatic | multi | - | - | - | - | - | - | - | - | - | http://www.lrec-conf.org/proceedings/lrec2014/pdf/688_Paper.pdf
125 | - | - | GERMEVAL | shared task | NoSta-D | 2014 | NERC | W | WKPD,NP | WKPD,NP | GEN | - | NE | Extended | manual | deu | 590k | 590,000 | IOB | CC-BY | CC-BY | - | - | - | https://www.tk.informatik.tu-darmstadt.de/fileadmin/user_upload/Group_LangTech/publications/BenikovaBiemannReznicek_LREC2014_GermanNER.pdf | -
126 | - | - | - | research project | German Reference Corpus DeReKo (not found) | 2014 | - | W | - | - | VAR | - | NE | - | manual | deu | - | - | - | - | - | - | - | - | Paper: Named Entity Tagging a Very Large Unbalanced Corpus: Training and Evaluating NE Classifiers. | -
127 | - | - | - | - | KORE 50 | 2014 | EL, SA2KB | W | NW | NW | GEN | MUS,BUS,CELEB | NE, references | CoNLL | manual | eng | 1300 | 1300 | NIF | CC BY 4.0 | CC-BY | downloadable | - | - | KORE (Keyphrase Overlap Relatedness for Entity Disambiguation): http://www.yovisto.com/labs/ner-benchmarks/, https://datahub.io/dataset/kore-50-nif-ner-corpus | -
128 | - | - | Biocreative | shared task | CHEMDNER | 2015 | NERC | W | patents | patents | BIO | - | NE | - | - | eng | - | - | - | - | - | - | - | - | http://www.biocreative.org/tasks/biocreative-v/track-2-chemdner/ | -
129 | - | - | - | shared task | Quaero French Medical Corpus | 2015 | NERC (concepts and linking) | W | titles (medline) and articles (emea) | scientific articles | MED | - | NE, references | Quaero based on UMLS | manual | fra | 103,056 words | 103,056 | standoff format | GFDL | GFDL | downloadable | - | - | https://quaerofrenchmed.limsi.fr/ | -
130 | - | - | WWW 2015 | shared task | Microposts2015 | 2015 | NERC,EL | W | tweets | tweets | GEN | event (re-use of Microposts2014) | NE, references | NERD, other types | semi-automatic | eng | 6025 tweets | 48,200 | - | - | - | - | - | - | - | -
131 | - | - | WNUT | shared task | WNUT2015 (test data) | 2015 | NERC | W | tweets | tweets | GEN | re-use of Ritter data | NE | Extended | manual | eng | 1425 tweets | 11,400 | - | - | - | upon registration | - | - | http://noisy-text.github.io/2015/ner-shared-task.html | http://www.anthology.aclweb.org/W/W15/W15-43.pdf#page=138
132 | - | - | - | research project | Arboretum treebank | 2015 | several, NERC | W | VAR | VAR | VAR | - | morphosyntax, syntax, NE | not said | not said | dan | 425k | 425,000 | xml | ELRA | ELRA | non-free | ELRA-W0084 | 025-729-182-451-2 | http://metashare.elda.org/repository/browse/arboretum-treebank/f8c4509e983d11e5a51c00259011f6ead47bd1ee2f67436083f627cf3d252a6d/ | -
133 | - | none | none | shared task | OKE 2015 Task 1 | 2015 | NERC,EL, KBP | W | Wikipedia | WKPD | SPEC | scholar biographies | NE,EL | BASIC,R | manual | eng | 196 sentences | 4116 | NIF | - | - | downloadable | - | - | https://github.com/anuzzolese/oke-challenge | http://link.springer.com/chapter/10.1007/978-3-319-25518-7_1
134 | - | - | - | shared task | SemEval 2015 Task 13 | 2015 | WSD, EL | W | emea, KDEdoc, EU bookshop corpus | VAR | SPEC | Bio-medical, Maths, Social issues | WSD,EL | - | manual | eng | 1.2k | 1200 | tsv | - | - | downloadable | - | - | http://anthology.aclweb.org/S/S15/S15-2.pdf#page=330, http://alt.qcri.org/semeval2015/task13/index.php?id=data-and-tools | -
135 | - | - | - | shared task | SemEval 2015 Task 13 | 2015 | WSD, EL | W | emea, KDEdoc, EU bookshop corpus | VAR | SPEC | Bio-medical, Maths, Social issues | WSD,EL | - | manual | spa | 1.2k | 1200 | tsv | - | - | downloadable | - | - | - | -
136 | - | - | - | shared task | SemEval 2015 Task 13 | 2015 | WSD, EL | W | emea, KDEdoc, EU bookshop corpus | VAR | SPEC | Bio-medical, Maths, Social issues | WSD,EL | - | manual | ita | 1.2k | 1200 | tsv | - | - | downloadable | - | - | - | -
137 | - | - | NewsReader | shared task | MEANTIME | 2016 | several, NERC, EL | W | NP | NP | GEN | ECO, FIN | ML,NE | P,O,L,PROD,EVENT | manual | eng | 40k | 40,000 | xml | CC-BY 4.0 | CC-BY | - | - | - | http://www.newsreader-project.eu | MEANTIME, the NewsReader Multilingual Event and Time Corpus. In Proceedings of LREC 2016.
138 | - | - | - | shared task | Microposts2016 | 2016 | NERC,EL | W | tweets | tweets | GEN | events and non-events (re-use of Microposts2015) | NE, references | NERD, other types | semi-automatic | eng | 9289 tweets | 74,312 | tsv, neel format | CC-BY 4.0 | CC-BY | - | - | - | - | -
139 | - | - | WNUT | shared task | WNUT2016 | 2016 | NERC | W | tweets | tweets | GEN, VAR | re-use of Ritter and WNUT2015 | NE | Extended | manual | eng | 3473 tweets | 27,784 | - | - | - | - | - | - | - | http://www.aclweb.org/anthology/W/W16/W16-39.pdf#page=150
140 | - | - | - | shared task | MEANTIME | 2016 | several, NERC, EL | W | NP | NP | GEN | ECO, FIN | ML,NE | P,O,L,PROD,EVENT | semi-automatic | spa | 40k | 40,000 | xml | CC-BY 4.0 | CC-BY | - | - | - | - | -
141 | - | - | - | shared task | MEANTIME | 2016 | several, NERC, EL | W | NP | NP | GEN | ECO, FIN | ML,NE | P,O,L,PROD,EVENT | semi-automatic | nld | 40k | 40,000 | xml | CC-BY 4.0 | CC-BY | - | - | - | - | -
142 | - | - | - | shared task | MEANTIME | 2016 | several, NERC, EL | W | NP | NP | GEN | ECO, FIN | ML,NE | P,O,L,PROD,EVENT | semi-automatic | ita | 40k | 40,000 | xml | CC-BY 4.0 | CC-BY | - | - | - | - | -
143 | - | - | - | research project | Japanese Basic NE corpus (BCCWJ Basic NE) | 2016 | NERC | W | various / balanced | VAR | VAR | - | NE | IREX | manual | jpn | 136 docs / 2561 NEs | 54,400 | - | - | - | downloadable | - | - | https://sites.google.com/site/projectnextnlpne/en | -
144 | - | - | - | research project | KDD-D, KDD-T | 2016 | NERC | W | web queries | web queries | GEN | - | NE | CoNLL | manual | eng | 3000 queries | 12,000 | - | - | - | - | - | - | - | anthology.aclweb.org/P/P11/P11-1097.pdf
145 | - | - | - | - | CRAFT | 2016 | NERC | W | journal articles | scientific articles | BIO | Biomedical | NE | - | manual | eng | - | - | - | - | - | - | - | - | http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml | -
146 | - | EU, Ireland, Italy | Multilingual Entity Linking | shared task | EVALITA NEEL | 2016 | NERC,EL | W | tweets | tweets | GEN | - | NE, references | - | manual | ita | 1301 tweets | 10,408 | - | - | - | downloadable upon registration to the task | - | - | http://neel-it.github.io/ | http://ceur-ws.org/Vol-1749/paper_007.pdf
147 | - | - | NEMLAR | - | Broadcast News Speech Corpus | 2005 | - | S | BN | BN | GEN | - | NE | - | - | ara | 40 hours | 500,000 | DB | - | - | low fee | ELRA-S0219 | 479-507-036-103-9 | - | -
148 | - | - | QUAERO | shared task | Broadcast News corpus | 2011 | NERC | S | BN, BC | BN, BC | GEN | - | NE | QUAERO | manual | fra | 1.2M | 1,200,000 | - | Academic - Non Commercial Use | CC-BY-NC-SA | free of charge | ELRA-S0349 | 074-668-446-920-0 | - | -
149 | - | - | QUAERO | shared task | Old Press corpus | 2011 | NERC | W | ONP | ONP | GEN | - | NE | QUAERO | manual | fra | 1.8M | 1,800,000 | - | Academic - Non Commercial Use | CC-BY-NC-SA | free of charge | ELRA-W0073 | 864-217-681-552-4 | - | -
150 | - | - | QUAERO | shared task | Pharmacology Patents corpus for Quaero | 2011 | - | W | patents | patents | SPEC | - | - | QUAERO | manual | fra | - | - | - | - | - | - | - | - | - | -
151 | - | - | CINTIL | research project | Cintil-corpus | 2006 | several, NERC | WS | VAR | VAR | VAR | - | ML, NE | MUC | automatic | por | 1.1M | 1,100,000 | IOB | ELRA | ELRA | low fee | ELRA-W0050 | 176-775-844-396-0 | - | http://www.academia.edu/download/32290046/BarretoEtAl2006a.pdf
152 | - | - | CNEC | research project | Czech Named Entity Corpus 2.0 | 2007 | NERC | W | VAR | VAR | GEN | - | NE | Extended | manual | ces | 8993 sentences | 186,900 | plain text, xml, html, treex | CC-BY-NC-SA 3.0 | CC-BY-NC-SA | downloadable | - | - | http://ufal.mff.cuni.cz/cnec | -
153 | - | - | HAREM | shared task | HAREM Golden Collection | 2008 | NERC | WS | WEB, BN, NW, email, reports | WEB, BN, NW, email, reports | VAR | - | NE | Extended | - | por | 80k | 80,000 | xml | - | - | free of charge | - | - | http://www.linguateca.pt/HAREM/ | -
154 | - | - | - | research project | EIEC Basque Named Entities Corpus v1.0 | 2004 | NERC | W | NW | NW | GEN | - | NE | CoNLL | manual | eus | 50044 | 50,044 | IOB | CC BY 4.0 | CC BY | downloadable | N/A | - | http://ixa2.si.ehu.es/eiec/eiec_v1.0.tgz | http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.302.8999
155 | - | - | - | research project | Original Short-Message Data Collation I | 2007 | - | W | SMS, SMess | SMS, SMess | VAR | - | NE | - | manual | zho | 265k SMS | 2,120,000 | manually tagged | ELRA | ELRA | non-free | ELRA-W0045-04 | 169-161-744-054-8 | - | -
156 | - | - | - | research project | Original Short-Message Data Collation II | 2007 | - | W | SMS, SMess | SMS, SMess | VAR | - | NE | - | manual | zho | 202k SMS | 1,616,000 | manually tagged | ELRA | ELRA | non-free | ELRA-W0045-08 | 753-094-616-225-9 | - | -
157 | - | - | - | research project | NER-Tweets | 2011 | NERC | W | tweets | tweets | GEN | - | NE | Extended | manual | eng | 2400 tweets | 19,200 | IOB | - | - | downloadable | - | - | http://github.com/aritter/twitter_nlp | http://dl.acm.org/citation.cfm?id=2145595
158 | - | - | - | - | IITB | 2009 | SA2KB | W | web sites | web sites | GEN | - | NE, references | Wikipedia | semi-automatic | eng | 107 documents | 42,800 | - | - | - | - | - | - | http://dl.acm.org/citation.cfm?id=1557073 | -
159 | - | - | - | research project | Yapex | 2002 | NERC | W | medline abstracts | medline abstracts | BIO | - | NE | - | manual | eng | 101 medline abstracts | 35,350 | - | - | - | - | - | - | - | -
160 | - | - | - | research project | Linnaeus | 2010 | NERC | W | PubMed abstracts | medline abstracts | BIO | species | NE | - | manual | eng | 100 full texts | 40,000 | - | - | - | - | - | - | http://linnaeus.sourceforge.net/ | http://www.lrec-conf.org/proceedings/lrec2012/summaries/222.html
161 | - | - | - | research project | BioInfer | 2008 | ERD | W | PubMed abstracts | medline abstracts | BIO | protein | NE | - | manual | eng | 836 abstracts | 292,600 | - | - | - | downloadable | - | - | http://mars.cs.utu.fi/BioInfer/ | -
162 | - | - | - | research project | AImed | 2003 | NERC | W | PubMed abstracts | medline abstracts | BIO | protein | NE | - | manual | eng | 748 abstracts | 261,800 | - | - | - | downloadable | - | - | ftp://ftp.cs.utexas.edu/pub/mooney/bio-data/ | -
163 | - | - | - | research project | FetchProt | 2008 | NERC | W | scientific articles | scientific articles | BIO | proteins | NE | - | manual | eng | 177 articles | 61,950 | - | - | - | downloadable | - | - | http://soda.swedishict.se/2712/ | -
164 | - | - | - | shared task | LLL05 | 2005 | ERD | W | medline abstracts | medline abstracts | BIO | proteins/genes | NE | - | manual | eng | 80 sentences | 1,680 | - | - | - | downloadable | - | - | http://genome.jouy.inra.fr/texte/LLLchallenge/#training_download | -
165 | - | - | - | shared task | OKE 2016 Task 1 | 2016 | NERC,EL, KBP | W | Wikipedia | WKPD | SPEC | scholar biographies | NE,EL | BASIC,R | crowd-sourced | eng | 55 sentences | 1155 | NIF | - | - | downloadable | - | - | https://github.com/anuzzolese/oke-challenge-2016 | http://link.springer.com/chapter/10.1007/978-3-319-46565-4_1
166 | - | none | none | shared task | BSNLP 2017 | 2017 | NERC, EN, ECC | W | web pages | WEB | NEWS | politics | NE, normalization | CoNLL | manual? | hrv | 200 documents | 80,000 | - | - | - | - | - | - | http://bsnlp-2017.cs.helsinki.fi/shared_task.html | -
167 | - | none | none | shared task | BSNLP 2017 | 2017 | NERC, EN, ECC | W | web pages | WEB | NEWS | politics | NE, normalization | CoNLL | manual? | ces | 200 documents | 80,000 | - | - | - | - | - | - | http://bsnlp-2017.cs.helsinki.fi/shared_task.html | -
168 | - | none | none | shared task | BSNLP 2017 | 2017 | NERC, EN, ECC | W | web pages | WEB | NEWS | politics | NE, normalization | CoNLL | manual? | pol | 200 documents | 80,000 | - | - | - | - | - | - | http://bsnlp-2017.cs.helsinki.fi/shared_task.html | -
169 | - | none | none | shared task | BSNLP 2017 | 2017 | NERC, EN, ECC | W | web pages | WEB | NEWS | politics | NE, normalization | CoNLL | manual? | rus | 200 documents | 80,000 | - | - | - | - | - | - | http://bsnlp-2017.cs.helsinki.fi/shared_task.html | -
170 | - | none | none | shared task | BSNLP 2017 | 2017 | NERC, EN, ECC | W | web pages | WEB | NEWS | politics | NE, normalization | CoNLL | manual? | slk | 200 documents | 80,000 | - | - | - | - | - | - | http://bsnlp-2017.cs.helsinki.fi/shared_task.html | -
171 | - | none | none | shared task | BSNLP 2017 | 2017 | NERC, EN, ECC | W | web pages | WEB | NEWS | politics | NE, normalization | CoNLL | manual? | slv | 200 documents | 80,000 | - | - | - | - | - | - | http://bsnlp-2017.cs.helsinki.fi/shared_task.html | -
172 | - | none | none | shared task | BSNLP 2017 | 2017 | NERC, EN, ECC | W | web pages | WEB | NEWS | politics | NE, normalization | CoNLL | manual? | ukr | 200 documents | 80,000 | - | - | - | - | - | - | http://bsnlp-2017.cs.helsinki.fi/shared_task.html | -
173 | - | - | - | shared task | Pascal challenge | 2005 | IE | W | workshop call for papers | WEB | VAR | science | NE | specific | manual | eng | 600 documents | 240,000 | - | - | - | downloadable | - | - | http://nlp.shef.ac.uk/pascal/Corpus.html | http://machinelearning.org/proceedings/icml2005/papers/044_Evaluating_IresonEtAl.pdf
174 | - | - | - | shared task | OKE 2017 | 2017 | EL | - | - | - | - | - | - | - | - | eng | - | - | - | - | - | - | - | - | https://project-hobbit.eu/challenges/oke2017-challenge-eswc-2017/ | -
175 | - | - | - | research project | Cucerzan MSNBC | 2007 | NERC,EL | W | news | NEWS | VAR | - | NE, references | Wikipedia | manual | eng | 20 news stories | 8000 | txt | - | - | downloadable | - | - | http://research.microsoft.com/en-us/um/people/silviu/WebAssistant/TestData/ | https://www.microsoft.com/en-us/research/publication/large-scale-named-entity-disambiguation-based-on-wikipedia-data/
176 | - | - | - | research project | Cucerzan Wikipedia | 2007 | NERC,EL | W | Wikipedia | WKPD | VAR | - | NE, references | Wikipedia | manual | eng | 350 wikipedia pages | 140,000 | txt | - | - | downloadable | - | - | http://research.microsoft.com/en-us/um/people/silviu/WebAssistant/TestData/ | https://www.microsoft.com/en-us/research/publication/large-scale-named-entity-disambiguation-based-on-wikipedia-data/
177 | - | - | - | research project | EDIEC Basque Disambiguated Named Entities Corpus | 2011 | NED | W | NP | NP | GEN | - | NE, references | Wikipedia | manual | eus | 1032 text documents | 412,800 | txt | CC BY 4.0 | CC BY | downloadable | - | - | http://ixa2.si.ehu.es/ediec/ediec_v1.0.tgz | http://link.springer.com/chapter/10.1007%2F978-3-642-23538-2_35

Acronyms