Gradient-based minimization for multi-expert Inverse Reinforcement Learning
We present a model-free method for solving the Inverse Reinforcement Learning (IRL) problem given a set of trajectories generated by different experts' policies. In many applications, the observed demonstrations are not produced by a single policy: they may be provided by multiple experts who follow different (but similar) policies, or even by the same expert who does not always replicate the same policy exactly.
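
The abstract does not spell out the optimization itself, so the following is only a minimal sketch of a generic gradient-based IRL building block under stated assumptions: the reward is linear in unknown weights omega, the policy gradient is estimated from the demonstrations in a REINFORCE-style, model-free way, and the helpers `score_fn` (the expert policy's score, grad_theta log pi_theta(a|s)) and `feature_fn` (reward features phi(s, a)) are hypothetical names introduced here, not taken from the paper.

```python
import numpy as np

def estimate_gradient_jacobian(trajectories, score_fn, feature_fn, gamma=0.99):
    """REINFORCE-style, model-free estimate of the policy gradient as a
    (d_theta x d_reward) Jacobian G, so that grad_theta J(theta; omega) ~ G @ omega
    for a reward that is linear in the weights omega."""
    jac = None
    for traj in trajectories:  # traj: list of (state, action) pairs
        # sum of the policy score vectors along the trajectory
        score_sum = sum(np.asarray(score_fn(s, a)) for s, a in traj)
        # discounted sum of reward features along the trajectory
        feat_sum = sum((gamma ** t) * np.asarray(feature_fn(s, a))
                       for t, (s, a) in enumerate(traj))
        contrib = np.outer(score_sum, feat_sum)
        jac = contrib if jac is None else jac + contrib
    return jac / len(trajectories)

def recover_reward_weights(jac):
    """Gradient-based IRL step: choose a unit-norm omega minimizing ||G @ omega||^2,
    i.e. a reward under which the demonstrated policy is (approximately) a
    stationary point of the expected return. The minimizer is the right singular
    vector of G associated with its smallest singular value."""
    _, _, vt = np.linalg.svd(jac)
    return vt[-1]
```

In a multi-expert setting, one natural extension (again only a sketch, not necessarily the paper's algorithm) is to estimate one Jacobian per expert, recover a weight vector for each, and cluster the recovered weights so that demonstrations sharing the same underlying intention are grouped together.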
