@csaba

An Evaluation Criterion for Macro-Learning and Some Results

, and . TR-99-01. Mindmaker Ltd., Budapest 1121, Konkoly Th. M. u. 29-33, HUNGARY, (1999)

Abstract

It is known that a well-chosen set of macros makes it possible to considerably speed-up the solution of planning problems. Recently, macros have been considered in the planning framework, built on Markovian decision problem. However, so far no systematic approach was put forth to investigate the utility of macros within this framework. In this article we begin to systematically study this problem by introducing the concept of multi-task MDPs defined with a distribution over the tasks. We propose an evaluation criterion for macro-sets that is based on the expected planning speed-up due to the usage of a macro-set, where the expectation is taken over the set of tasks. The consistency of the empirical speed-up maximization algorithm is shown in the finite case. For acyclic systems, the expected planning speed-up is shown to be proportional to the amount of ``time-compression'' due to the macros. Based on these observations a heuristic algorithm for learning of macros is proposed. The algorithm is shown to return macros identical with those that one would like to design by hand in the case of a particular navigation like multi-task MDP. Some related questions, in particular the problem of breaking up MDPs into multiple tasks, factorizing MDPs and learning generalizations over actions to enhance the amount of transfer are also considered in brief at the end of the paper.

Links and resources

Tags