Yuki Arase


Professor, Tokyo Institute of Technology

View My GitHub Profile

Research Projects

More information will be added soon.

Paraphrase generation & recognition

Paraphrasing encompasses various forms of monolingual text transformation, such as text simplification, rewriting, and style transfer. We work on both paraphrase recognition and generation. Our core technologies are intelligent phrase alignment and controllable paraphrase generation.

Phrase alignment aims to identify phrasal paraphrases together with their syntactic structures. This technology is also valuable for estimating semantic similarity between texts, for example when evaluating text generation models such as machine translation and automatic question answering systems.
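As a rough illustration of similarity estimation (not our alignment method), text similarity is often approximated by comparing vector representations. The toy sketch below uses made-up three-dimensional word vectors and averages them into sentence vectors before taking the cosine; real systems use learned embeddings of much higher dimension.

```python
import math

# Toy word vectors, purely illustrative; real systems use learned embeddings.
VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "sat": [0.1, 0.9, 0.2],
    "ran": [0.2, 0.8, 0.3],
}

def sentence_vector(tokens):
    """Average the word vectors of the given tokens."""
    dims = len(next(iter(VECTORS.values())))
    total = [0.0] * dims
    for t in tokens:
        for i, v in enumerate(VECTORS[t]):
            total[i] += v
    return [x / len(tokens) for x in total]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Two paraphrase-like token sequences score highly under this measure.
sim = cosine(sentence_vector(["cat", "sat"]), sentence_vector(["dog", "ran"]))
```

Phrase alignment goes further than such bag-of-vectors measures by identifying which phrases in one sentence correspond to which in the other.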

Current text generation models are largely black boxes; we do not know what comes out until the end. In paraphrasing, we are interested in realising controllability in text generation, allowing fine-grained control of output texts.
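One widely used way to obtain such controllability (a generic sketch, not a description of our specific model) is to prepend control tokens to the model input so that a sequence-to-sequence model conditions its output on the requested attributes. The token names below are hypothetical; actual systems define their own control vocabularies.

```python
def build_controlled_input(source: str, length: str = "short",
                           style: str = "simple") -> str:
    """Prepend hypothetical control tokens encoding desired output attributes.

    A seq2seq model trained with such tokens learns to condition its
    paraphrase on them (e.g., producing a shorter or simpler output).
    """
    control_tokens = [f"<len_{length}>", f"<style_{style}>"]
    return " ".join(control_tokens) + " " + source

# Example: request a short, simple paraphrase of a complex sentence.
model_input = build_controlled_input("The feline reposed upon the rug.")
```

At training time, the control tokens are derived from the reference outputs (e.g., their length or readability), so the model learns the association between token and attribute.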


We have created datasets for phrase alignment that (1) provide ground-truth tree structures (HPSG) and (2) provide ground-truth phrase alignments.

A parallel corpus of direct and indirect utterances is beneficial for natural language understanding in conversation systems and advancing paraphrase recognition in challenging and realistic settings.

Representation learning

Vector representations of words, phrases, and sentences are the very basis of NLP research. We study

  1. sophisticated representations for word meaning in context and multilingual sentences,
  2. efficient pre-trained models for words and phrases, and
  3. representations for few-shot learning.

NLP for language education & learning

As a central application of our research outcomes, we develop technologies to support language learning and education. Our technologies range from fine-grained lexical-level transformations to coarse-grained text-level processing.