Damien Nouvel
PhD & Assoc. Prof., Inalco, Paris, France
"The question whether machines can think is about as relevant as the question whether submarines can swim" (E. Dijkstra)
I work on various topics within Natural Language Processing, such as:
- Text Mining
- Named Entity Recognition
- Discourse Analysis
- Multilingual Text Analysis
- Lexical Incompleteness
My PhD focused on combining knowledge-based and data-driven learning approaches for Named Entity Recognition over speech transcripts. To this end, we proposed to discover hierarchical sequential patterns by mining large annotated corpora exhaustively and objectively. From these patterns we induce automata (more precisely, transducers), called annotation rules, that can be used to recognize named entities. The originality of the proposal is that it does not categorize words. Instead, it marks the boundaries of entities individually, treating each annotation tag as a separate local instruction: the system inserts beginning and ending tags, and entities are recognized from those tags. Recent information is available in the Publications section.
For the implementation, see the mXS page, which covers the French version of this system and its adaptations to other languages.
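To make the boundary-marking idea concrete, here is a minimal Python sketch. It is not the mXS code: in the actual system the annotation rules are mined from annotated corpora, whereas the two rules below (`open_after_title`, `close_at_name_end`), the `TITLES` trigger list and the `pers` label are hand-written, hypothetical stand-ins chosen only for illustration. What it tries to show is that each opening or closing tag is decided locally and independently, and that entities only emerge once the markers are paired.

```python
from typing import List, Set, Tuple

# Hypothetical trigger words; a real system would learn such contexts from data.
TITLES = {"Mr", "Mrs", "Dr"}


def open_after_title(tokens: List[str], i: int) -> bool:
    """Local rule: insert an opening tag before a capitalized token that follows a title."""
    return i > 0 and tokens[i - 1] in TITLES and tokens[i][:1].isupper()


def close_at_name_end(tokens: List[str], i: int) -> bool:
    """Local rule: insert a closing tag after a capitalized token that is not
    followed by another capitalized token (or that ends the sentence)."""
    nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
    return tokens[i][:1].isupper() and not nxt[:1].isupper()


def pair_markers(opens: Set[int], closes: Set[int]) -> List[Tuple[int, int]]:
    """Pair each opening marker with the nearest closing marker at or after it.
    Unmatched markers and openings inside an already paired span are dropped."""
    spans: List[Tuple[int, int]] = []
    last_end = -1
    for start in sorted(opens):
        if start <= last_end:
            continue
        ends = [c for c in sorted(closes) if c >= start]
        if ends:
            spans.append((start, ends[0]))
            last_end = ends[0]
    return spans


def annotate(tokens: List[str], label: str = "pers") -> str:
    """Apply both rules at every position, pair the markers, and render the tags inline."""
    opens = {i for i in range(len(tokens)) if open_after_title(tokens, i)}
    closes = {i for i in range(len(tokens)) if close_at_name_end(tokens, i)}
    spans = pair_markers(opens, closes)
    starts = {s for s, _ in spans}
    ends = {e for _, e in spans}
    out: List[str] = []
    for i, tok in enumerate(tokens):
        if i in starts:
            out.append(f"<{label}>")
        out.append(tok)
        if i in ends:
            out.append(f"</{label}>")
    return " ".join(out)


print(annotate("Yesterday Dr Marie Curie visited Paris".split()))
# prints: Yesterday Dr <pers> Marie Curie </pers> visited Paris
```

In this example the closing rule also fires after "Paris", but no opening marker precedes it, so that marker is simply discarded at pairing time. No token is ever assigned a category on its own.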
If you are looking for references on Named Entity Recognition, here are some I would suggest as a starting point:
- History and survey:
  - Grishman, R., & Sundheim, B. (1996, August). Message Understanding Conference-6: A Brief History. In COLING (Vol. 96, pp. 466-471).
  - Nadeau, D., & Sekine, S. (2007). A survey of named entity recognition and classification. Lingvisticae Investigationes, 30(1), 3-26.
- Task and approaches (sorted by date):
  - Bikel, D. M., Schwartz, R., & Weischedel, R. M. (1999). An algorithm that learns what's in a name. Machine Learning, 34(1-3), 211-231.
  - Mikheev, A., Moens, M., & Grover, C. (1999, June). Named entity recognition without gazetteers. In Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics (pp. 1-8). Association for Computational Linguistics.
  - Miller, D., Boisen, S., Schwartz, R., Stone, R., & Weischedel, R. (2000, April). Named entity extraction from noisy input: speech and OCR. In Proceedings of the sixth conference on Applied natural language processing (pp. 316-324). Association for Computational Linguistics.
  - Tjong Kim Sang, E. F., & De Meulder, F. (2003, May). Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003-Volume 4 (pp. 142-147). Association for Computational Linguistics.
  - McCallum, A., & Li, W. (2003, May). Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons. In Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003-Volume 4 (pp. 188-191). Association for Computational Linguistics.
  - Bunescu, R. C., & Pasca, M. (2006, April). Using Encyclopedic Knowledge for Named Entity Disambiguation. In EACL (Vol. 6, pp. 9-16).
  - Ratinov, L., & Roth, D. (2009, June). Design challenges and misconceptions in named entity recognition. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (pp. 147-155). Association for Computational Linguistics.