Abstract
The AlphaZero algorithm has achieved superhuman performance in two-player,
deterministic, zero-sum games where perfect information of the game state is
available. This success has been demonstrated in Chess, Shogi, and Go where
learning occurs solely through self-play. Many real-world applications (e.g.,
equity trading) require the consideration of a multiplayer environment. In this
work, we suggest novel modifications of the AlphaZero algorithm to support
multiplayer environments, and evaluate the approach in two simple 3-player
games. Our experiments show that multiplayer AlphaZero learns successfully and
consistently outperforms a competing approach: Monte Carlo tree search. These
results suggest that our modified AlphaZero can learn effective strategies in
multiplayer game scenarios. Our work supports the use of AlphaZero in
multiplayer games and suggests future research for more complex environments.
Users
Please
log in to take part in the discussion (add own reviews or comments).