LS-IQ: implicit reward regularization for inverse reinforcement learning
Recent methods for imitation learning directly learn a Q-function using an implicit reward formulation rather than an explicit reward function. However, these methods generally require implicit reward regularization to improve stability and often mistreat absorbing states. Previous works show that a squared norm regularization on the implicit reward function is effective, but do not provide a theore
