We are making strong progress by focusing on challenging problems, driving new techniques in reinforcement learning with a focus on machine comprehension and conversational interfaces. Our research team publishes peer-reviewed papers that provide insight into the work we’re doing to advance knowledge in this field.
Hybrid Reward Architecture for Reinforcement Learning
One of the main challenges in reinforcement learning (RL) is generalisation. In typical deep RL methods this is achieved by approximating the optimal value function with a low-dimensional representation using a deep network. While this approach works well in many domains, in domains where the optimal value function cannot easily be reduced to a low-dimensional representation, learning can be very slow and unstable. This paper contributes towards tackling such challenging domains, by proposing a new method, called Hybrid Reward Architecture (HRA). HRA takes as input a decomposed reward function and learns
A Frame Tracking Model for Memory-Enhanced Dialogue Systems
RepL4NLP @ ACL 2017
Recently, resources and tasks were proposed to go beyond state tracking in dialogue systems. An example is the frame tracking task, which requires recording multiple frames, one for each user goal set during the dialogue. This allows a user, for instance, to compare items corresponding to different goals. This paper proposes a model which takes as input the...
A Joint Model for Question Answering and Question Generation
We propose a generative machine comprehension model that learns jointly to ask and answer questions based on documents. The proposed model uses a sequence-to-sequence framework that encodes the document and generates a question (answer) given an answer (question). Significant improvement in model performance is...
Machine Comprehension by Text-to-Text Neural Question Generation
RepL4NLP @ ACL 2017
We propose a recurrent neural model that generates natural-language questions from documents, conditioned on answers. We show how to train the model using a combination of supervised and reinforcement learning. After teacher forcing for standard maximum likelihood training, we ﬁne-tune the model using policy gradient techniques to...
Multi-Advisor Reinforcement Learning
This article deals with a novel branch of Separation of Concerns, called Multi-Advisor Reinforcement Learning (MAd-RL), where a single-agent RL problem is distributed to
n learners, called advisors. Each advisor tries to solve the problem with a different focus. Their advice is then communicated to an aggregator, which is in control of...
Learning Algorithms for Active Learning
We present a model that learns active learning algorithms via metalearning. For each metatask, our model jointly learns: a data representation, an item selection heuristic, and a one-shot classiﬁer. Our model uses the item selection heuristic to construct a labeled support set for the one-shot classiﬁer. Using metatasks...
Transfer Reinforcement Learning with Shared Dynamics
This article addresses a particular Transfer Reinforcement Learning (RL) problem: when dynamics do not change from one task to another, and only the reward function does. Our method relies on two ideas, the first one is that transition samples obtained from a task can be reused to learn on…
Algorithm selection of off-policy reinforcement learning algorithm
Dialogue systems rely on a careful reinforcement learning design: the learning algorithm and its state space representation. In lack of more rigorous knowledge, the designer resorts to its practical experience to choose the best option. In order to automate and to improve the performance of the aforementioned...
Towards Information-Seeking Agents
We develop a general problem setting for training and testing the ability of agents to gather information efficiently. Specifically, we present a collection of tasks in which success requires searching through a partially-observed environment, for fragments of information which can be pieced together to accomplish…
Frames: A Corpus For Adding Memory To Goal-Oriented Dialogue Systems
This paper presents the Frames dataset, a corpus of 1369 human-human dialogues with an average of 15 turns per dialogue. We developed this dataset to study the role of memory in goal-oriented dialogue systems. Based on Frames, we introduce a task called frame tracking, which generalizes..
Separation of Concerns in Reinforcement Learning
In this paper, we propose a framework for solving a single-agent task by using multiple agents, each focusing on different aspects of the task. This approach has two main advantages: 1) it allows for specialized agents for different parts of the task, and 2) it provides a new way to transfer…
Calibrating Energy-based Generative Adversarial Networks
In this paper, we propose to equip Generative Adversarial Networks with the ability to produce direct energy estimates for samples. Specifically, we propose a flexible adversarial training framework, and prove this framework not only ensures the generator converges to...
NewsQA: A Machine Comprehension Dataset
RepL4NLP @ ACL 2017
We present NewsQA, a challenging machine comprehension dataset of over 100,000 question-answer pairs. Crowd-workers supply questions and answers based on a set of over 10,000 news articles from CNN, with answers consisting in spans of text from the corresponding articles. We collect this dataset through a four-stage process designed to solicit exploratory questions that require reasoning. A thorough analysis confirms...
An Architecture for Deep, Hierarchical Generative Models
We present an architecture which makes it easy to train deep, directed generative models with many layers of latent variables…
Effective Multi-step Temporal-Difference Learning for Non-Linear Function Approximation
Multi-step temporal-difference (TD) learning, where the update targets contain information from multiple time steps ahead, is one of the most popular forms of TD learning for linear function approximation. The reason is that multi-step methods often yield substantially better performance than their single-step counter-parts, due to a lower bias of...
Natural Language Comprehension with the EpiReader
We present the EpiReader, a novel model for machine comprehension of text. Machine comprehension of unstructured, real-world text is a major research goal for natural language processing. Current tests of machine comprehension pose questions whose answers can be inferred from some supporting text, and evaluate a model's response to the questions. The EpiReader is an end-to-end neural model…
Iterative Alternating Neural Attention for Machine Reading
We propose a novel neural attention architecture to tackle machine comprehension tasks, such as answering Cloze-style queries with respect to a document. Unlike previous models, we do not collapse the query into a single vector, instead we deploy an iterative alternating attention mechanism that allows a fine-grained exploration of both the query and…
Policy Networks with Two-Stage Training for Dialogue Systems
In this paper, we propose to use deep policy networks which are trained with an advantage actor-critic method for statistically optimised dialogue systems. First, we show that, on summary state and action spaces, deep Reinforcement Learning (RL) outperforms Gaussian Processes methods. Summary state and action spaces lead to good performance but…
A Sequence-to-Sequence Model for User Simulation in Spoken Dialogue Systems
User simulation is essential for generating enough data to train a statistical spoken dialogue system. Previous models for user simulation suffer from several drawbacks, such as the inability to take dialogue history into account, the need of rigid structure to ensure coherent user behaviour, heavy dependence on a specific…
Natural Language Generation in Dialogue using Lexicalized and Delexicalized Data
Natural language generation plays a critical role in any spoken dialogue system. We present a new approach to natural language generation using recurrent neural networks in an encoder-decoder framework. In contrast with previous work, our model uses both lexicalized and delexicalized versions of slot-value pairs for each dialogue…
Score-based Inverse Reinforcement Learning
This paper reports theoretical and empirical results obtained for the score-based Inverse Reinforcement Learning (IRL) algorithm. It relies on a non-standard setting for IRL consisting of learning a reward from a set of globally...
A Parallel-Hierarchical Model for Machine Comprehension on Sparse Data
Understanding unstructured text is a major goal within natural language processing. Comprehension tests pose questions based on short text passages to evaluate such understanding. In this work, we investigate machine comprehension on the challenging MCTest benchmark. Partly because of its limited size, prior work on MCTest has focused mainly on engineering…