#
Pretty Printing
Question. Given an array of
N
words and a maximum page width, what is the most optimal printing strategy that minimizes the total cost? Assume that any individual word can fit on one line.
words[1...N]: List<string>
- the words we want to print neatly on a page
page_width: int
- the maximum number of characters on a single line
cost: (spaces: int) -> int
- a function that measures how "bad" a line is. The more empty spaces there are, the worse a line is. Let’s use x^3 here, where x is the number of trailing spaces.
We want to find:
List<int>
- a list of line break indices
int
- the total cost after applying the line breaks
Depending on the variant of this question, sometimes the last line is free, which means we don’t calculate the cost of the final line.
Def. Line Break
To avoid weird indexing problems, let’s define a break at index i to mean putting \tt words[i] on a new line.
#
1. Solve the Backtracking Problem
#
1.1 Function Signature & Base Case
To keep track of how many words are remaining, we need an index j
to indicate the last word that we need to place onto the paper. So the function looks like:
PrettyPrint(j: int) -> int
and the initial call will be:
PrettyPrint(N)
There’s only 1 base case here: when there are no words, then there’s no cost, return 0.
PrettyPrint(0) = 0
In this case we are scanning from the last word to the first word, but reversing the scanning order also works. Flip the base case and initial call accordingly.
#
1.2 Line Cost
From the problem inputs, we know that the cost of a line is given by:
function cost(spaces) {
return pow(spaces, 3) // raise to 3rd power
}
Then number of empty spaces is:
num_empty_spaces = page_width - sentence_length
For the length of the sentence, we also need to count the number of spaces between the words.
Let’s define a function LineCost(i, j)
.
- i is the index of the first word on this line
- j is the index of the last word on this line
- Both i and j are indices for the \texttt{words} array
\text{LineCost}(i, j) = \begin{cases} \infty&\text{if the words doesn't fit}\\ 0 &\text{if the words fit and $j$ is the last}\\ (\text{num empty spaces})^3 & \text{Otherwise} \end{cases}
This is a bit different from the previous backtracking problems. We are incorporating the find valid choices
step into this cost function by using \infty as 1 of the costs.
To compute the number of empty spaces, lets define:
\text{Extras}(i, j) = \texttt{page\_width} -\underbrace{ (j-i)}_\text{space between words} - \underbrace{\sum_{k=i}^j \texttt{len(words[k])}}_\text{actual letters}
where i, j are the indices of the original words array. We can see that computing \text{Extras} will take O(n) time where n is the number of words.
const pageWidth = ...
const words = [...]
function Extras(i: int, j: int) -> int:
numSpaces = j - i
numLetters = 0
for word in words:
numLetters += len(word)
return pageWidth - numSpaces - numLetters
Now we can translate \text{LineCost} into expressions:
\text{LineCost}(i, j) = \begin{cases} \infty&\text{if Extras}(i, j) < 0\\ 0 &\text{if Extras}(i, j) \ge 0\text{ and }j= n\\ \text{Extras}(i, j)^3 & \text{Otherwise} \end{cases}
function LineCost(i, j) -> int:
extras = Extras(i, j)
if extras < 0:
return Infinity
else if extras >= 0 and j == n:
return 0 // last line is free
return pow(extras, 3)
So \text{LineCost} also runs in O(n) time.
#
1.3 Recursive Cases
This falls under the Best Solution case with the FindNext
strategy, so we need:
Choices
- Every index before the current index is a choice. We can place a line break at any of them.
- The "valid" choice part is handled by the \text{LineCost} function.
- If the choice is not valid, \text{LineCost} will return \infty and it’s definitely not going to be selected.
How to make a choice
To make a choice, we go to the "next" word we want to place. We can either go from the end or go from the beginning.
- If we go from the end j=n, the next one is j-1.
- If we go from the beginning j=1, the next one is j+1.
Since we want to minimize the total cost, we will also need a min(...)
call
#
1.4 Recurrence & Pseudocode
Combining 1.1 ~ 1.3, the recurrence is:
\text{PrettyPrint}(j) =\begin{cases} 0 &\text{if }j = 0\\ \displaystyle\min_{\text{possible breaks }i}\left\{\text{total cost before break} + \text{LineCost}(i, j)\right\}&\text{Otherwise} \end{cases}
In expressions:
\text{PrettyPrint}(j) =\begin{cases} 0 &\text{if }j = 0\\ \displaystyle\min_{1\le i\le j}\left\{\text{PrettyPrint}(i-1) + \text{LineCost}(i, j)\right\}&\text{Otherwise} \end{cases}
Convert to pseudocode:
const words = [...]
const pageWidth = ...
function Extras(i: int, j: int) -> int:
numSpaces = j - i
numLetters = 0
for word in words:
numLetters += len(word)
return pageWidth - numSpaces - numLetters
function LineCost(i, j) -> int:
extras = Extras(i, j)
if extras < 0:
return Infinity
else if extras >= 0 and j == n:
return 0 // last line is free
return pow(extras, 3)
function PrettyPrint(j) -> int:
if j == 0:
return 0
best = Infinity
for i = 1 to j:
breakHereCost = PrettyPrint(i - 1) + LineCost(i, j)
best = min(breakHereCost, best)
return best
Don’t forget i\red{-1} in the recursive call, otherwise we will call PrettyPrint(j)
again when i == j
, which leads to infinite recursion.
#
1.5 Python. Backtracking Pretty Printing
Sanity Check
- At each recursion level, we “choose” a line break
- All indices before the current j is a “valid” choice
- We take the
min
from all the choice results
Make sure you feel comfortable with the brute force solution before moving on to section 2.
#
2. Convert to Dynamic Programming
#
2.1 Order of Evaluation & Data Structure
From the recurrence:
\text{PrettyPrint}(j) =\begin{cases} 0 &\text{if }j = 0\\ \displaystyle\min_{\red{1\le i\le j}}\left\{\text{PrettyPrint}(i-1) + \text{LineCost}(i, j)\right\}&\text{Otherwise} \end{cases}
Unwrapping the \min call shows that \text{PrettyPrint}(j) needs: @Hint
Plug in possible values of i.
\text{PrettyPrint}(0)\\ \text{PrettyPrint}(1)\\ \hspace{0.5em} \vdots\\ \text{PrettyPrint}(j-1)
We need to compute the function results of the smaller j’s first, thus the order of evaluation for j goes in ascending order.
The function only needs 1 parameter, so we can use a 1D array.
#
2.2 Pseudocode
function PrettyPrint_DP(words[1...N], page_width) -> int:
dpTable = Array(shape=(N + 1))
// Pretend 0 is a valid index here
// we can put it at the end or handle it separately
dpTable[0] = 0 // PrettyPrint(0) = 0
// for loop for simulating recursive calls, iterate through:
// PrettyPrint(0), PrettyPrint(1), PrettyPrint(2), PrettyPrint(3)
for j = 1 to N:
best = Infinity
// for loop from the backtracking solution
for i = 1 to j:
// PrettyPrint(i - 1) + LineCost(i, j)
breakHereCost = dpTable[i - 1] + LineCost(i, j)
best = min(breakHereCost, best)
dpTable[j] = best
return dpTable[N] // PrettyPrint(N)
function PrettyPrint_DP(words[1...N], page_width) -> int:
dpTable = Array(shape=(N + 1))
dpTable[0] = 0
for j = 1 to N:
best = Infinity
for i = 1 to j:
breakHereCost = dpTable[i - 1] + LineCost(i, j)
best = min(breakHereCost, best)
dpTable[j] = best
return dpTable[N]
Time complexity is O(n^3) and space complexity is O(n).
Implementation Tip
If handling special indices like 0, -1 feels cumbersome, we can use a map.
dpTable = {-1: 0}
#
❖ 3. Memoizing LineCost
and Extras
Calling LineCost
inside nested for loops can be expensive, so we should isolate that from the main memoization loop. Since LineCost
needs 2 parameters, we can use a 2D array to memoize the results. But directly going through all combinations of i, j doesn’t really help because that’s still O(n^3)
lineCost = Array(shape=(N, N))
for i = 1 to N:
for j = 1 to N:
lineCost[i, j] = ...// normal LineCost evaluation here is O(n)
// total work is still O(n^3)
So we need to somehow use previous results to make the evaluation lineCost[i, j]
O(1).
Observe that the actual work happens inside \text{Extras}(i, j). Instead of recomputing the total number of characters every time we increase j, we can build on top of the previous result like this:
\underbrace{\text{Extras}(i, j)}_{\text{including }\texttt{word[j]}} = \underbrace{\text{Extras}(i, j-1)}_{\text{without }\texttt{word[j]}} - \underbrace{1}_\text{space} - \underbrace{\text{len(words[$j$])}}_\text{\texttt{word[j]} itself}
We can assume the \text{len} function for finding the number of characters is O(1) because the work is done before the \text{PrettyPrint} routine starts.
The base case is a single word:
Extras(i, i) = page_width - len(words[i])
\text{Extras}(i, j) requires \text{Extras}(i, j-1), so the order of evaluation goes in ascending order:
extras = Array(shape=(N, N))
for i = 1 to N:
for j = i to N:
if j == i: // Extras(i, i)
extras[i, j] = pageWidth - len(words[i])
else:
extras[i, j] = extras[i, j-1] - 1 - len(words[j]) // recurrence
Now memoizing \text{Extras} becomes O(n^2).
#
4. Putting everything together
The recurrences are:
\begin{aligned} \text{PrettyPrint}(j) &= \begin{cases} 0 &\text{if }j = 0\\[5pt] \displaystyle\min_{1\le i\le j}\left\{\text{PrettyPrint}(i-1) + \text{LineCost}(i, j)\right\}&\text{Otherwise} \end{cases} \\[25pt] \underbrace{\text{Extras}(i, j)}_{\text{including }\texttt{word[j]}} &= \underbrace{\text{Extras}(i, j-1)}_{\text{without }\texttt{word[j]}} - \underbrace{1}_\text{space} - \underbrace{\text{len(words[$j$])}}_\text{\texttt{word[j]} itself} \end{aligned}
Line cost itself isn’t really a recurrence but it’s important:
\text{LineCost}(i, j) = \begin{cases} \infty&\text{if Extras}(i, j) < 0\\ 0 &\text{if Extras}(i, j) \ge 0\text{ and }j= n\\ \text{Extras}(i, j)^3 & \text{Otherwise} \end{cases}
function PrettyPrintDpFull(words[1...N], pageWidth: int) -> int:
totalCost = Array(shape=(N + 1))
// base case: PrettyPrint(0) = 0
totalCost[0] = 0
// Memoizing Extras(i, j)
extras = Array(shape=(N, N))
for i = 1 to N:
for j = i to N:
if j == i: // base case: Extras(i, i)
extras[i, j] = pageWidth - words[i].length
else:
extras[i, j] = extras[i, j - 1] - 1 - words[j].length
// Memoizing the main recurrence
for j = 1 to N: // PrettyPrint(1), PrettyPrint(2) ... PrettyPrint(N)
bestCost = Infinity
for i = 1 to j:
// simulate LineCost(i, j)
lineCost = Infinity // default to inf for emptySpaces < 0
emptySpaces = extras[i, j]
if emptySpaces >= 0:
if j == N:
lineCost = 0 // last line is free
else:
lineCost = pow(emptySpaces, 3)
// Simulate PrettyPrint(i - 1) + LineCost(i, j)
breakCost = totalCost[i - 1] + lineCost
bestCost = min(bestCost, breakCost)
totalCost[j] = bestCost // the result of PrettyPrint(j)
// Initial call: PrettyPrint(N)
return totalCost[N]
function PrettyPrintDpFull(words[1...N], pageWidth: int) -> int:
totalCost = Array(shape=(N + 1))
totalCost[0] = 0
extras = Array(shape=(N, N))
for i = 1 to N:
for j = i to N:
if j == i:
extras[i, j] = pageWidth - words[i].length
else:
extras[i, j] = extras[i, j - 1] - 1 - words[j].length
for j = 1 to N:
bestCost = Infinity
for i = 1 to j:
lineCost = Infinity
emptySpaces = extras[i, j]
if emptySpaces >= 0:
if j == N:
lineCost = 0
else:
lineCost = pow(emptySpaces, 3)
breakCost = totalCost[i - 1] + lineCost
bestCost = min(bestCost, breakCost)
totalCost[j] = bestCost
return totalCost[N]
Runtime is O(n^2) with O(n^2) space.