Constructing Stochastic Mixture Policies for Episodic Multiobjective Reinforcement Learning Tasks.
P. Vamplew, R. Dazeley, E. Barker, and A. Kelarev. Australasian Conference on Artificial Intelligence, volume 5866 of Lecture Notes in Computer Science, pp. 340-349. Springer, (2009)