# Introduction

## General DP Formulation

Typically a DP problem involves searching for a sequence of decisions that arrives at a maximum or minimum value. Let's define:

$t = 1...N$
: A "stage" of the problem. When $t=N$, we have reached the end of the problem. This is usually the variable representing time / planning horizon, but not necessarily

$s_t\in S$  
: State at stage $t$; $S$ is the space of all possible states

$a_t\in A$  
: Action at stage $t$; $A$ is the space of all possible actions

$r_t(s_t, a_t)$  
: Reward function if we take action $a_t$ at state $s_t$

$R(s_N)$
: Reward if we end on state $s_N$. This is the cost/reward of our base case, a subproblem that we can immediately solve without recursion. If we reach $s_N$, we have no possible actions.

Now the **optimality equation** / **value function** / **objective recurrence** is:

$$
v_t(s_t) = \left.\underset{a_t\in A}{\max/\min}\,\middle\{v_{t+1}(s_{t+1}) + r_t(s_t, a_t) \right\}
$$

## Examples

### Rod Cutting

For a rod of length $N$,

$$
\text{CutRod}(n)  = \begin{cases}
    0       &   \text{if } n = 0\\
    p_1     &   \text{if }n = 1\\
    \displaystyle\max_{1\leqslant i\leqslant n}\{
    \text{CutRod}(n-i) + p_i
    \}  &   \text{otherwise}
\end{cases}
$$

In this case, the remaining length of the rod is the state $s_t$, and the number of units of rod to cut is the action $a_t$. We immediately encounter the first exception of the definition of stage $t$. The only thing that's constant is the **initial** length of the rod, or the maximum length we can cut in 1 go. Therefore the stage is technically time, but it ranges from $1...N$ since the max number of cuts we can make is just the total length $N$ (just cut 1 unit at a time).

The rest of the variables are straightforward to find: the revenue of a cut is the reward function; final reward is either 0 or $p_1$, and $s_{n+1}$ is the new length after the cut.

Let's list each component:

$t = 1...N$
: Stage is the "time" elapsed or the total number of cuts we have made

$s_t\in S$  
: State is the length of the remaining rod

$a_t\in A$  
: Action the the possible cuts we can make. Ranges from 1 to $s_t$

$r_t(s_t, a_t)$  
: Reward is the revenue of the cut

$R(s_N)$
: Base cases, $s_N = 0$ or $s_N = 1$ which means $R(0) = 0, R(1) = p_1$

optimality equation is:

$$
\max_{1\leqslant i\leqslant n}\{\text{CutRod}(n-i) + p_i\}
$$

### Edit Distance

For strings `A[1...n], B[1...m]`

$$
{\text{EditDist}(i, j)} = \begin{cases}
i & \text{if } j = 0\\j & \text{if } i = 0\\
\min \left.\begin{cases}

\text{EditDist}(i, j-1) + 1\\
\text{EditDist}(i-1, j) + 1\\
\text{EditDist}(i-1, j-1) + \begin{cases}1 & \text{if }A[i]\ne B[j]\\
0 & \text{if }A[i]= B[j]
\end{cases}\\

\end{cases}\right\} & \text{Otherwise}
\end{cases}
$$
