Abstract

Robot programming by demonstration encompasses a wide range of learning strategies, from simple mimicking of the demonstrator's actions to the higher-level extraction of the underlying intent. Focusing on this last form, we study the problem of extracting, from a set of candidate reward functions, the reward function that explains the demonstrations, and of using this information for self-refinement of the skill. This definition of the problem has links with inverse reinforcement learning problems, in which the robot autonomously extracts an optimal reward function that defines the goal of the task. By relying on a Gaussian mixture model, the proposed approach learns how the different candidate reward functions are combined, and in which contexts or phases of the overall task they are relevant for explaining the user's demonstrations. The extracted reward profile is then exploited to improve the skill with an expectation-maximization-based self-refinement approach, allowing the imitator to reach a higher skill level than the demonstrator. The approach can be used to reproduce the same skill in different ways or to transfer it across agents of different structures. The proposed approach is tested in simulation with a new type of continuum robot, using kinesthetic demonstrations coming from a Barrett WAM manipulator.
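A minimal sketch of the idea of context-dependent reward blending, under illustrative assumptions: a 1-D Gaussian mixture over normalized time defines the contexts, and per-context weights gate two hypothetical candidate reward functions (a via-point reward and an effort penalty). The component parameters and weights below are fixed by hand for illustration; in the paper they are learned from the demonstrations.

```python
import numpy as np

# Hypothetical candidate reward functions of the state x (illustrative only).
def r_viapoint(x):
    # Reward for passing near an assumed via-point at (0.5, 0.5).
    return -np.sum((x - np.array([0.5, 0.5]))**2)

def r_effort(x):
    # Reward penalizing distance from a rest posture at the origin.
    return -np.sum(x**2)

candidates = [r_viapoint, r_effort]

# A 1-D GMM over normalized time t in [0, 1] defines the "contexts".
means = np.array([0.25, 0.75])    # assumed component centers
stds = np.array([0.15, 0.15])
priors = np.array([0.5, 0.5])

# Per-context weights on the candidate rewards (rows: contexts, cols: candidates).
W = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def context_responsibilities(t):
    """Posterior probability of each mixture component at time t."""
    lik = priors * np.exp(-0.5 * ((t - means) / stds)**2) / stds
    return lik / lik.sum()

def blended_reward(t, x):
    """Time-varying reward: responsibilities gate the per-context weights."""
    h = context_responsibilities(t)   # shape (K,)
    w = h @ W                         # blended candidate weights
    return sum(wi * r(x) for wi, r in zip(w, candidates))
```

Early in the movement the blended reward is dominated by the via-point term, later by the effort term, so a single scalar reward signal can express intent that changes over the phases of the task.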

Bibtex reference

@inproceedings{Malekzadeh13IROS,
author="Malekzadeh, M. S. and Bruno, D. and Calinon, S. and Nanayakkara, T. and Caldwell, D. G.",
title="Skills transfer across dissimilar robots by learning context-dependent rewards",
booktitle="Proc. {IEEE/RSJ} Intl Conf. on Intelligent Robots and Systems ({IROS})",
year="2013",
month="November",
pages="1746--1751"
}

Video

Tomorrow's robots will not only be numerous: they will also have many different forms. It will become difficult for end users to exploit this ecosystem of robots if each robot needs to be re-programmed separately to make it achieve a new task.

A user-friendly approach to transferring new skills to robots is to take inspiration from the way we teach skills to each other by demonstration. Current research in robot learning by imitation aims at simplifying the re-programming process by directly demonstrating the task to the robot (human-robot skill transfer). The democratization of robots will soon require that robots also teach each other new skills (robot-robot skill transfer). Due to the large variety of robots and the large spectrum of possible embodiments, the correspondence problem will become a bottleneck for skill transfer based only on action-level representations. Instead, the skill transfer process will likely require (but not be limited to) higher-level forms of imitation capable of extracting and reproducing the intent underlying the demonstrated actions.

The approach that we show in this video shares connections with inverse reinforcement learning (IRL), in which the aim is to extract an unknown reward function that underlies the executed actions. In contrast to most IRL problems, which attempt to explain the observations with reward functions defined for the entire task (or pre-defined parts of the task), our approach is based on context-dependent reward-weighted regression, where the robot can learn (in the policy parameter space) the relevance of candidate reward functions with respect to time or situation.
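The self-refinement side of reward-weighted regression can be sketched as a generic EM-like loop over policy parameters: explore by sampling around the current parameter mean, weight the samples by their exponentiated reward, and re-estimate the mean as the weighted average. The reward function, scaling factor, and decay schedule below are illustrative stand-ins, not the paper's exact update.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(theta):
    # Stand-in reward: negative squared distance of the policy parameters
    # to an optimum unknown to the learner (illustrative only).
    return -np.sum((theta - np.array([1.0, -0.5]))**2)

mu = np.zeros(2)   # current policy parameter mean
sigma = 0.5        # exploration noise
beta = 5.0         # assumed reward scaling (inverse temperature)

for _ in range(50):
    # E-step: sample exploratory policy parameters and score them.
    samples = mu + sigma * rng.standard_normal((20, 2))
    r = np.array([reward(s) for s in samples])
    # Soft-max weights (shifted by the max for numerical stability).
    w = np.exp(beta * (r - r.max()))
    w /= w.sum()
    # M-step: reward-weighted mean of the samples becomes the new policy.
    mu = w @ samples
    sigma *= 0.95   # gradually shrink exploration
```

Because the update only needs reward evaluations of the imitator's own rollouts, the refined policy can exploit the new embodiment and surpass the demonstrations it started from.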

The approach is tested with the transfer of a via-point task from a standard 7-DOF manipulator to a very different form of robot. This experiment takes place within the STIFF-FLOP European project, with the aim of transferring skills from a surgeon teleoperator to a flexible robot that can selectively stiffen its body to navigate within the patient through a trocar port. This form of continuum robot is inspired by the way the octopus makes use of its embodiment to achieve skillful movements.

The video shows that the continuum robot can use the extracted context-dependent rewards to refine the skill on its own. The nature of the approach leaves the robot with the freedom to exploit its own body characteristics to fulfill the task, with the possibility of reaching a level of skill that goes beyond that of the demonstrator (learning from suboptimal demonstrations).