APRICODD: Approximate Policy Construction using Decision Diagrams
Abstract
We propose a method of approximate dynamic programming
for Markov decision processes (MDPs) using algebraic decision
diagrams (ADDs). We produce near-optimal value functions and policies
with much lower time and space requirements than exact dynamic programming.
Our method reduces the sizes of the intermediate value functions
generated during value iteration
by replacing the values at the terminals of the ADD
with ranges of values.
Our method is demonstrated on a class of large MDPS (with up
to 2 billion states), and we compare the results
with the optimal value functions.