Diachronic word embedding aims to reveal the semantic evolution of words over time. Previous work first learned word embeddings separately for each time period and then aligned them into the same vector space. In contrast, we iteratively identify stable words, whose meanings remain acceptably stable across time periods, and use them as anchors to ensure the performance of both embedding learning and alignment. To learn word embeddings in the same vector space, tw...
One of the main challenges in paraphrase research is the lack of large-scale, high-quality corpora, a problem that is particularly serious for languages other than English. In this paper, we present a simple and effective unsupervised learning model that automatically extracts high-quality sentence-level paraphrases from multiple Chinese translations of the same source texts. By applying this model, we obtain a large-scale paraphrase corpus, which contains 509,832 pairs of paraph...
Automatic generation of texts with different sentiment labels is widely used in artificial intelligence applications such as conversational agents, and it is an important problem to address in order to achieve emotional intelligence. In this paper, we propose two novel models, SentiGAN and C-SentiGAN, which have multiple generators and one multi-class discriminator, to address this problem. In our models, multiple generators are trained simultaneously, aiming at generating texts of different s...
In this paper, we explore a new approach to automated chess commentary generation, which aims to generate chess commentary texts in different categories (e.g., description, comparison, and planning). We introduce a neural chess engine into text generation models to help with encoding boards, predicting moves, and analyzing situations. By jointly training the neural chess engine and the generation models for different categories, the models become more effective. We conduct experiments on 5 ca...
Drug effectiveness describes the capacity of a drug to cure a disease and is of great importance for drug safety. Assessing it requires a number of real-world patient-oriented outcomes. However, current surveillance systems capture only a small portion of them, and there is a time lag in processing the reported data. Since social media provides large quantities of patient-oriented user posts in real time, it is of great value to automatically extract drug effectiveness from thes...
We present a phenomenon-oriented comparative analysis of the two dominant approaches in task-independent semantic parsing: classic, knowledge-intensive and neural, data-intensive models. To reflect state-of-the-art neural NLP technologies, we introduce a new target structure-centric parser that can produce semantic graphs much more accurately than previous data-driven parsers. We then show that, in spite of comparable performance overall, knowledge- and data-intensive models produce different ty...