EpiReader: a 2-stage approach to machine comprehension
When humans think about the world, we create hypotheses and test them out through experimentation. We see new things, we think about them, we act, and we learn.
At Maluuba, we believe this principle of testing theories with observed data can be applied to machine comprehension of natural language. We’ve verified this belief by developing EpiReader, an end-to-end neural model that uses a two-stage reasoning process to make sense of written text.
We conceived EpiReader because machine comprehension is a prerequisite for an extremely broad class of AI applications. Indeed, most human knowledge is recorded in natural-language text. There are trillions of unstructured documents: books, emails, chat logs, messages, notes, reports, user guides, and much more. Moreover, comprehension entails a range of important abilities, from basic understanding to causal reasoning to inference.
Turning machines into bookworms
As a comprehension model, EpiReader’s task is to answer questions by reading and comprehending a supporting passage of text. This is just like the reading tests we all took in school. In particular, we focused on the two best-known large-scale benchmarks for machine comprehension: CNN, which Google DeepMind released last summer, and the Children’s Book Test (CBT), which Facebook released in February.
Both of these datasets were generated synthetically through similar processes. The CNN corpus consists of news articles scraped from the CNN website with corresponding question-answer pairs. The questions are in “fill-in-the-blank”, or cloze, format: they are constructed by deleting a single entity from the highlight points that accompany each article. The target answer is simply the deleted word(s). See the example below for a clearer picture.
Similarly, CBT is based on 20-sentence excerpts from children’s books available through Project Gutenberg; question-answer pairs are generated by deleting a single word in the next (i.e., 21st) sentence.
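To make the construction concrete, here is a minimal sketch of how a CBT-style cloze example might be built. The function name and the placeholder token are illustrative, not the dataset's exact format:

```python
def make_cloze(context_sentences, next_sentence, answer, placeholder="XXXXX"):
    """Blank out `answer` in the following (21st) sentence to form the question."""
    question = [placeholder if w == answer else w
                for w in next_sentence.split()]
    return {"passage": context_sentences,
            "question": " ".join(question),
            "answer": answer}

example = make_cloze(["Snow White ran into the forest."],
                     "The dwarfs welcomed Snow White .",
                     "dwarfs")
# example["question"] is "The XXXXX welcomed Snow White ."
```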
We use these datasets in a supervised learning framework, wherein EpiReader “reads” both the text and the question and then outputs an answer that we compare to the correct target. We use stochastic gradient descent with backpropagation to tune the model parameters based on this error signal.
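The error signal here is the standard one for classification: cross-entropy between the model's answer distribution and the correct target, reduced by gradient steps. A toy numpy illustration (not EpiReader's actual parameters, just the training principle):

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=5)   # stand-in for model outputs over 5 candidates
target = 2                    # index of the correct answer

def softmax(z):
    z = z - z.max()           # numerical stability
    e = np.exp(z)
    return e / e.sum()

def loss(z):
    # cross-entropy: negative log-probability assigned to the correct answer
    return -np.log(softmax(z)[target])

# one gradient step: the gradient of cross-entropy w.r.t. the logits
# is (softmax(z) - one_hot(target))
grad = softmax(logits).copy()
grad[target] -= 1.0
updated = logits - 0.5 * grad  # learning rate 0.5 (illustrative)
```

After the step, `loss(updated)` is lower than `loss(logits)`, which is the signal SGD exploits at scale.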
The structure of EpiReader can be seen in the figure below. At a high level, it consists of two connected modules that form hypotheses and then test them. The first module, the Extractor, selects a small set of potential answers by pointing to their locations in the text. It accomplishes this by first encoding both the passage and the question using bidirectional GRU networks, then measuring the concordance between the passage encodings and the question encoding with an inner product in the encoding space.
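The pointer scoring step can be sketched numerically. This toy version assumes the biGRU encodings are already computed (here, random arrays of illustrative shape) and shows only the inner-product concordance followed by a softmax over token positions:

```python
import numpy as np

def extractor_scores(passage_enc, question_enc):
    """passage_enc: (T, d) per-token passage encodings (assumed biGRU output);
    question_enc: (d,) question encoding.
    Returns (T,) probabilities over token positions."""
    logits = passage_enc @ question_enc   # inner-product concordance
    logits -= logits.max()                # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

rng = np.random.default_rng(1)
T, d = 6, 4                               # toy sizes
scores = extractor_scores(rng.normal(size=(T, d)), rng.normal(size=d))
```

The highest-scoring positions point at the candidate answers the Extractor passes on to the next stage.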
Based on this concordance, the Extractor generates hypotheses by replacing the placeholder token in the question with an answer candidate. The result is a statement that can be verified as true or false.
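Hypothesis generation itself is a simple substitution. A sketch, with an illustrative placeholder token:

```python
def make_hypotheses(question_tokens, candidates, placeholder="XXXXX"):
    # Substitute each candidate answer for the cloze placeholder,
    # yielding one complete statement per candidate.
    return [[c if tok == placeholder else tok for tok in question_tokens]
            for c in candidates]

hyps = make_hypotheses(["the", "XXXXX", "laughed"], ["dwarfs", "queen"])
# hyps[0] is ["the", "dwarfs", "laughed"]
```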
The second module, the Reasoner, then tests the hypotheses. It compares each hypothesis to the text passage, split into sentences, to measure textual entailment, and then aggregates entailment over all sentences in the passage. This is achieved using convolutional networks and a final GRU.
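The comparison step can be sketched as follows. This is a deliberate simplification: the convolution-plus-max-pooling encoder is standard, but where EpiReader aggregates per-sentence entailment with a GRU, this sketch uses a plain mean purely for illustration:

```python
import numpy as np

def conv_maxpool(token_embs, filters):
    """token_embs: (T, d) word embeddings; filters: (k, w, d) conv filters.
    Returns a (k,) sentence vector via convolution + max-over-time pooling."""
    T, d = token_embs.shape
    k, w, _ = filters.shape
    out = np.empty(k)
    for i in range(k):
        acts = [np.sum(token_embs[t:t + w] * filters[i])
                for t in range(T - w + 1)]
        out[i] = np.tanh(max(acts))
    return out

def aggregate_entailment(hyp_embs, sentence_embs, filters):
    # Compare the encoded hypothesis against each encoded passage sentence
    # (inner product), then aggregate over sentences (mean, in this sketch;
    # the actual model uses a GRU here).
    h = conv_maxpool(hyp_embs, filters)
    return float(np.mean([h @ conv_maxpool(s, filters)
                          for s in sentence_embs]))

rng = np.random.default_rng(2)
filters = rng.normal(size=(3, 2, 4))           # 3 filters of width 2
hyp = rng.normal(size=(5, 4))                  # 5-token hypothesis
sents = [rng.normal(size=(n, 4)) for n in (6, 7)]
score = aggregate_entailment(hyp, sents, filters)
```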
We combine the Reasoner’s entailment score with the concordance scores from the Extractor to determine a final likelihood of each hypothesis.
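One simple way to blend the two signals is a weighted mixture, shown below. The mixing weight `lam` and the softmax over entailment scores are illustrative choices, not the paper's exact combination rule:

```python
import numpy as np

def combine(extractor_probs, entailment_scores, lam=0.5):
    """Blend Extractor probabilities with normalized entailment scores
    and return the index of the most likely hypothesis."""
    p = np.asarray(extractor_probs, dtype=float)
    e = np.asarray(entailment_scores, dtype=float)
    e = np.exp(e - e.max())
    e /= e.sum()                       # normalize entailment to a distribution
    final = lam * p + (1.0 - lam) * e
    return int(np.argmax(final))

best = combine([0.5, 0.3, 0.2], [0.1, 2.0, -1.0])
# → 1: the strong entailment score outweighs the Extractor's mild preference
```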
We tested EpiReader against several baselines on CNN and CBT. It achieved state-of-the-art results on both datasets, outperforming all previous approaches: 74% accuracy on CNN and 67.4% on CBT.
Applying the research
In the video below, we demonstrate EpiReader answering a few example questions. We use a twenty-sentence excerpt from Snow White (à la CBT), then take the twenty-first sentence and delete one of its words (in this case, ‘dwarfs’).
EpiReader first reads the question sentence and then reviews the passage. It extracts some potential words, formulates its hypotheses, then reasons about what the missing word is most likely to be.
Thereafter the video shows an example of comprehension over a news article.
Machine comprehension capabilities like this can work across multiple verticals: in essence, on any task where information must be reviewed and understood. These capabilities can be applied to extracting information from reports, studies, user guides, and manuals, as well as to supporting human agents or customers searching for answers to their questions. In our future research, we will move beyond fill-in-the-blank questions to cases where the answer is an arbitrary span of text in the corresponding passage.