Skills transfer from a robot manipulator to a flexible snake-like robot
Tomorrow's robots will not only be numerous: they will also take many different forms. It will become difficult for end users to exploit this ecosystem of robots if each robot needs to be re-programmed separately to achieve a new task.
A user-friendly approach to transferring new skills to robots is to take inspiration from the way we teach skills to each other by demonstration. Current research in robot learning by imitation aims at simplifying the re-programming process by directly demonstrating the task to the robot (human-robot skill transfer). The democratization of robots will soon require that robots also teach each other new skills (robot-robot skill transfer). Due to the large variety of robots and the large spectrum of possible embodiments, the correspondence problem will become a bottleneck for skill transfer based only on action-level representations. Instead, the skill transfer process will likely require (but not be limited to) higher-level forms of imitation capable of extracting and reproducing the intent underlying demonstrated actions.
The approach that we show in this video is related to inverse reinforcement learning (IRL), in which the aim is to extract an unknown reward function that underlies the executed actions. In contrast to most IRL problems, which attempt to explain the observations with reward functions defined for the entire task (or for pre-defined parts of the task), our approach is based on context-dependent reward-weighted regression, where the robot can learn (in the policy parameter space) the relevance of candidate reward functions with respect to time or situation.
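To make the reward-weighted regression idea more concrete, the following is a minimal sketch of a plain reward-weighted update in policy parameter space. It is an illustration under stated assumptions, not the implementation used in the video: the function names (`rwr_update`, `via_point_reward`, `refine`), the toy exponential via-point reward, the diagonal Gaussian exploration, and the choice of letting the policy parameters directly encode the point to reach are all hypothetical. The context-dependent weighting of several candidate reward functions is not reproduced here; a single fixed reward is used instead.

```python
import math
import random


def rwr_update(thetas, rewards, min_var=1e-6):
    """One reward-weighted regression step (diagonal covariance, an assumption).

    thetas:  list of sampled policy-parameter vectors (lists of floats).
    rewards: one non-negative reward per sample.
    Returns the reward-weighted mean and per-dimension variance.
    """
    total = sum(rewards)
    weights = [r / total for r in rewards]
    dim = len(thetas[0])
    mean = [sum(w * t[d] for w, t in zip(weights, thetas)) for d in range(dim)]
    var = [
        max(sum(w * (t[d] - mean[d]) ** 2 for w, t in zip(weights, thetas)), min_var)
        for d in range(dim)
    ]
    return mean, var


def via_point_reward(theta, via_point):
    """Toy reward (hypothetical): exp of minus the squared distance to the via-point."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(theta, via_point)))


def refine(via_point, n_iters=30, n_samples=500, seed=0):
    """Iterate sampling and reward-weighted updates, starting from the origin."""
    rng = random.Random(seed)
    mean = [0.0] * len(via_point)
    var = [1.0] * len(via_point)
    for _ in range(n_iters):
        # Explore around the current policy parameters.
        thetas = [
            [rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)]
            for _ in range(n_samples)
        ]
        rewards = [via_point_reward(t, via_point) for t in thetas]
        # Re-estimate the parameter distribution, weighting samples by reward.
        mean, var = rwr_update(thetas, rewards)
    return mean
```

For example, `refine([0.5, -0.3])` should drift from the origin toward the via-point as samples with higher reward dominate each update. In a context-dependent variant, the single reward above would be replaced by a combination of candidate rewards whose relevance varies with time or situation.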
The approach is tested with the transfer of a via-point task from a standard 7-DOF manipulator to a very different form of robot. This experiment takes place within the STIFF-FLOP European project, with the aim of transferring skills from a surgeon teleoperator to a flexible robot that can selectively stiffen its body to navigate within the patient through a trocar port. This form of continuum robot is inspired by the way the octopus makes use of its embodiment to achieve skillful movements.
The video shows that the continuum robot can use the extracted context-dependent rewards to refine the skill on its own. The nature of the approach leaves the robot with the freedom to exploit its own body characteristics to fulfill the task, with the possibility of reaching a level of skill that goes beyond that of the demonstrator (learning from suboptimal demonstrations).