Date: 15 June 2017
Time: 12:00 - 13:00
Location: Wolfson Lecture Theatre, Computing, Queen Mother Building
Host: Professor Chris Reed / Martin Pereira Farina
Title: Distributional Semantics and Compositionality and Open Source Modules for NLP (Linguakit)
Abstract: Distributional models of word meaning have become increasingly popular in the last 25 years. To date, distributional approaches have successfully dealt with individual words out of context, as in the case of recent research relying on deep learning strategies (word embeddings). However, methods for constructing distributional-based semantic representations for phrases or sentences have received little attention in the literature. I will present a purely distributional and compositional model, based on the notion of co-selection and incremental interpretation. Central to this approach is vector composition, which I operationalize in terms of very simple additive and multiplicative functions. A useful application of this approach is the possibility of computing distributional-based similarity between phrases and sentences in order to identify paraphrases.
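As a rough illustration of what additive and multiplicative vector composition can look like, here is a minimal sketch. It is not the co-selection model presented in the talk: the word vectors, their four context dimensions, and the function names are all invented for the example. It composes two-word phrases component-wise and compares the resulting phrase vectors with cosine similarity.

```python
import math

# Toy distributional vectors over four hypothetical context dimensions.
# The numbers are invented for illustration; they come from no real corpus.
VECTORS = {
    "red":     [2.0, 0.0, 1.0, 3.0],
    "crimson": [2.0, 0.0, 1.0, 2.0],
    "car":     [1.0, 4.0, 0.0, 2.0],
    "auto":    [1.0, 3.0, 0.0, 2.0],
}


def compose_additive(u, v):
    """Compose two word vectors by component-wise addition."""
    return [a + b for a, b in zip(u, v)]


def compose_multiplicative(u, v):
    """Compose two word vectors by component-wise multiplication."""
    return [a * b for a, b in zip(u, v)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    for name, compose in [("additive", compose_additive),
                          ("multiplicative", compose_multiplicative)]:
        red_car = compose(VECTORS["red"], VECTORS["car"])
        crimson_auto = compose(VECTORS["crimson"], VECTORS["auto"])
        print(name, round(cosine(red_car, crimson_auto), 3))
```

In this sketch, a similarity close to 1 between two composed phrase vectors would mark the phrases as paraphrase candidates; any threshold for that decision would have to be chosen empirically.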
At the end of the talk, I will introduce Linguakit, a multilingual toolkit containing a package of linguistic tools for tasks such as PoS tagging, syntactic analysis, summarization, sentiment analysis, relation extraction and multiword extraction.
The source code is available at https://github.com/citiususc/Linguakit
The web application is available at https://linguakit.com/
Bio: I defended my Linguistics thesis in 1998 at the Université Blaise Pascal. Then, I worked as a post-doc in the Artificial Intelligence Center of the Universidade Nova de Lisboa (Portugal) for several years. Since 2004 I have been working at the University of Santiago de Compostela (Spain), first as an Isidro Parga Pondal research fellow and now as a Ramón y Cajal research fellow. I am a promoter and founding partner of Cilenis, a spin-off of the University of Santiago de Compostela working on language technologies.
My main scientific interests are Natural Language Processing and Information Extraction. Currently, I am developing projects related to information extraction from bilingual comparable corpora; extraction of semantic relations; measures of similarity between words/concepts; dependency parsing; and morphosyntactic tagging.