### Abstract

A summary of the state of the art in reinforcement learning for robotics is given, covering both algorithms and policy representations. Numerous challenges faced by the policy representation in robotics are identified. Two recent examples of applying reinforcement learning to real robots are described: a pancake flipping task and a bipedal walking energy minimization task. In both examples, a state-of-the-art Expectation-Maximization-based reinforcement learning algorithm is used, but a different policy representation is proposed and evaluated for each task. The two proposed policy representations offer viable solutions to four rarely addressed challenges in policy representations: correlations, adaptability, multi-resolution, and globality. Both the successes and the practical difficulties encountered in these examples are discussed.
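The Expectation-Maximization-based reinforcement learning mentioned above can be understood as a reward-weighted policy update: exploratory perturbations of the policy parameters are sampled, and the parameters are moved toward the perturbations that earned higher returns. The sketch below is a hypothetical toy illustration of that idea, not the paper's actual algorithm or code; the function names, hyperparameters, and the quadratic toy reward are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def em_policy_search(policy_params, rollout_reward, n_iters=100,
                     n_rollouts=10, noise_std=0.2):
    """Minimal reward-weighted EM-style policy update (toy sketch):
    sample exploratory perturbations of the parameters, then shift the
    parameters by the perturbations averaged with reward-based weights."""
    theta = np.asarray(policy_params, dtype=float)
    for _ in range(n_iters):
        # Exploration: Gaussian perturbations of the parameter vector
        perturbations = noise_std * rng.standard_normal((n_rollouts, theta.size))
        returns = np.array([rollout_reward(theta + d) for d in perturbations])
        weights = returns - returns.min()  # shift so weights are non-negative
        if weights.sum() > 0:
            # Reward-weighted average of the perturbations (the "M-step")
            theta = theta + weights @ perturbations / weights.sum()
    return theta

# Toy task (assumed for illustration): reward peaks at theta = [1, -2]
target = np.array([1.0, -2.0])
reward = lambda th: -np.sum((th - target) ** 2)
theta_opt = em_policy_search(np.zeros(2), reward)
```

Note that, unlike gradient-based policy search, this update needs no step-size tuning: the reward weights themselves scale the update, which is one practical appeal of EM-style methods.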

### Bibtex reference

@inproceedings{Kormushev12IJCNN,
  title="Challenges for the Policy Representation when Applying Reinforcement Learning in Robotics",
  author="Kormushev, P. and Calinon, S. and Ugurlu, B. and Caldwell, D. G.",
  booktitle="Intl Joint Conf. on Neural Networks ({IJCNN})",
  year="2012",
  pages="2819--2826"
}

### Video

The compliant humanoid robot COMAN learns to walk with two different gaits: one with a fixed height of the center of mass, and one with a varying height. The varying-height center-of-mass trajectory was learned via reinforcement learning in order to minimize the electric energy consumption during walking. The optimized walking gait achieves an 18% reduction in energy consumption in the sagittal plane, owing to the passive compliance: the springs in the knees and ankles of the robot store and release energy efficiently. In addition, the varying-height walking looks more natural and smooth than conventional fixed-height walking.