Bellman's Principle of Optimality

So far, one essential ingredient is missing: efficiency. Although the size of the search space may be exponential in the input size, an ADP problem can be solved in polynomial time and space, provided the objective function satisfies a condition known as Bellman's principle of optimality. In his own words:

"An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision."

We formalize this principle:

Definition (Algebraic version of Bellman's principle):

For each k-ary operator f in Σ, and all answer lists z1,...,zk, the objective function h satisfies

h([f(x1,...,xk) | x1 <- z1,...,xk <- zk])

= h([f(x1,...,xk) | x1 <- h(z1),...,xk <- h(zk)])

Additionally, an analogous property must hold for the concatenation of answer lists:

h(z1 ++ z2) = h(h(z1) ++ h(z2))
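As a concrete illustration of both equations, consider a sketch in Python (illustrative only, not part of the ADP framework): take the binary operator f(x, y) = x + y over integer answer lists, and let h keep only the minimal answer. Choosing within each subproblem first and then combining yields the same result as combining everything and choosing once.

```python
# Sketch: Bellman's principle for h = "keep the minimal answer"
# and a binary operator f(x, y) = x + y. Names are illustrative.

def h(zs):
    """Objective function: reduce an answer list to its minimum."""
    return [min(zs)] if zs else []

def f(x, y):
    return x + y

z1 = [3, 7, 2]
z2 = [5, 1]

# Left-hand side: build all candidates first, then choose.
lhs = h([f(x, y) for x in z1 for y in z2])

# Right-hand side: choose within each subproblem, then combine.
rhs = h([f(x, y) for x in h(z1) for y in h(z2)])

assert lhs == rhs == [3]  # min(z1) + min(z2) = 2 + 1

# The concatenation law holds as well:
assert h(z1 + z2) == h(h(z1) + h(z2)) == [1]
```

The minimum of all pairwise sums is the sum of the minima, which is exactly what the definition demands of h.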

The practical meaning of the optimality principle is that we may push the application of the objective function inside the computation of subproblems, thus preventing combinatorial explosion. We shall annotate the tree grammar to indicate the cases where h is to be applied.
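To see why pushing h inward prevents combinatorial explosion, consider repeatedly combining candidate lists (again a hypothetical Python sketch with h = minimization): without intermediate applications of h the candidate list grows multiplicatively with each step, whereas applying h after every combination keeps it at constant size while preserving the optimum.

```python
# Sketch: effect of applying the choice function h inside the
# computation. Setup is illustrative, not the ADP framework itself.

def h(zs):
    return [min(zs)] if zs else []

def combine(z1, z2):
    """All pairwise sums of two answer lists (binary operator +)."""
    return [x + y for x in z1 for y in z2]

steps = [[1, 2], [3, 4], [5, 6], [7, 8]]

# Naive: combine everything, apply h once at the end.
naive = steps[0]
for z in steps[1:]:
    naive = combine(naive, z)       # list length multiplies: 4, 8, 16
final_naive = h(naive)

# With Bellman's principle: apply h after every combination step.
pruned = h(steps[0])
for z in steps[1:]:
    pruned = h(combine(pruned, z))  # list length stays 1

assert final_naive == pruned        # same optimum, far fewer candidates
```

The annotation of the tree grammar marks exactly the points where this intermediate application of h is inserted.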