Abstract
Counterfactual Regret Minimization (CFR) is a fundamental and effective
technique for solving Imperfect Information Games (IIG). However, the original
CFR algorithm only works for discrete state and action spaces, and the
resulting strategy is maintained as a tabular representation. Such a tabular
representation prevents the method from being applied directly to large games
and from continuing to improve when starting from a poor strategy profile. In
this paper, we propose a
double neural representation for imperfect information games, in which one
neural network represents the cumulative regret and the other represents the
average strategy. Furthermore, we adopt the counterfactual regret minimization
algorithm to optimize this double neural representation. To make neural
learning efficient, we also develop several novel techniques, including a
robust sampling method, mini-batch Monte Carlo Counterfactual Regret
Minimization (MCCFR), and Monte Carlo Counterfactual Regret Minimization Plus
(MCCFR+), which may be of independent interest. Experimentally, we demonstrate
that the proposed double neural algorithm converges significantly better than
the reinforcement learning counterpart.
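
The abstract does not spell out how a strategy is obtained from the cumulative regret. As a minimal sketch, the standard regret-matching rule used in tabular CFR can be written as follows. The function name and the list-based representation are illustrative assumptions, not the paper's implementation; in the proposed method the cumulative regret would come from a neural network rather than a table.

```python
from typing import List, Sequence


def regret_matching(cumulative_regret: Sequence[float]) -> List[float]:
    """Turn the cumulative regrets at one information set into a strategy.

    Hypothetical helper for illustration: each action is played with
    probability proportional to its positive cumulative regret.
    """
    # Only positive regret contributes; negative regret is clipped to zero.
    positive = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positive)
    if total > 0.0:
        return [p / total for p in positive]
    # No action has positive regret: fall back to the uniform strategy.
    n = len(cumulative_regret)
    return [1.0 / n] * n


if __name__ == "__main__":
    print(regret_matching([2.0, -1.0, 2.0]))  # [0.5, 0.0, 0.5]
    print(regret_matching([-1.0, -3.0]))      # [0.5, 0.5]
```

The table that stores one such regret vector per information set is the tabular representation the abstract refers to; its size grows with the number of information sets, which is the scaling limitation the double neural representation is meant to address.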