# Levenshtein Edit Distance

Leetcode

Question. Given 2 strings A[1...n] and B[1...m], what is minimum number of edits to A required to make 2 strings identical? An edit could be 1 of the following:

  • Insert a character
  • Delete a character
  • Replace a character

For example, when A = "FOOD", B = "MONEY", the edit distance is 4.

FOOD -> MOOD -> MOND -> MONED -> MONEY

when A = "HORSE", B = "ROS", the edit distance is 3.

HORSE -> RORSE -> ROSE -> ROS

# 1. Observing the Problem

From the example we can see that there are 3 operations at any given index i:

  1. Insert(i) Cost = 1
  2. Delete(i) Cost = 1
  3. Replace(i, newChar) Cost = 1 if A[i] != B[i], else Cost = 0

# 2. Solve the backtracking problem first

# 2.1 Decide the Function Signature

Since we eventually need to convert this to DP, it’s easier to use string indices to keep track of which subproblem we are solving. Let’s use i for string A[1...n] and j for string B[1...n].

\text{EditDist}(i, j)

Our final result is:

\text{EditDist}(n, m)

# 2.2 Base Cases

If any of the string is empty, then the cost would be the length of that non-empty string. This will happen when we call:

{\text{EditDist}}(\red0, j)\\
\text{ or }\\
\text{EditDist}(i, \red0)

# 2.3 Recursive Cases

If both strings are not empty, we will try to insert, delete, or replace. These are the 3 possible choices. To make a choice:

  1. Let’s consider what it really means to “insert”.

    @EX A=\texttt{"FOOD"}, B=\texttt{"MONEY"}

     \begin{matrix}
     &&&&&\Downarrow\\
     A:&\tt{F} & \tt{O}&\tt{O}&\tt{D}&\texttt{\red?}\\
     B:&\tt{M} & \tt{O}&\tt{N}&\tt{E} &\tt{\red Y} 
     \end{matrix}

    If we insert a \tt Y at the questions mark, we have an aligned column. Then we can move to the "next" by considering the column to the left.

    Maybe we can do this by i -= 1, j -= 1, now we are here:

     \begin{matrix}
     &&&&\Downarrow\\
     A:&\tt{F} & \tt{O}&\tt{O}&\tt{\red D}&\tt{\green Y}\\
     B:&\tt{M} & \tt{O}&\tt{N}&\tt{\red E} &\tt{\green Y} 
     \end{matrix}

    But notice that the question mark technically doesn’t exist because A is shorter.

     \begin{matrix}
     &&&&\overset{i}{\Downarrow}\\
     A:&\tt{F} & \tt{O}&\tt{O}&\tt{\red D}&\\
     B:&\tt{M} & \tt{O}&\tt{N}&\tt{E} &\tt{\red Y} \\
     &&&&&\underset{j}{\Uparrow}\\
     \end{matrix}

    So i is already at the position after insertion. All we need to do is move j to j-1.

     \begin{matrix}
     &&&&\overset{i}{\Downarrow}\\
     A:&\tt{F} & \tt{O}&\tt{O}&\tt{\red D}&\tt{\green Y}\\
     B:&\tt{M} & \tt{O}&\tt{N}&\tt{\red E} &\tt{\green Y} \\
     &&&&\underset{j}{\Uparrow}\\
     \end{matrix}

    Therefore the recursive call for insert is:

     \text{EditDist}(i, j-1) + 1

    +1 for the cost of insertion

  2. Delete

    Delete is more straightforward. We simply don’t consider the i–th character in A.

    We can do this by moving i to i - 1.

    Therefore the recursive call for delete is:

     \text{EditDist}(i-1, j) + 1

    +1 for the cost of deletion

  3. Replace

    To replace a character, we move both i and j.

    @EX A=\texttt{"FOOD"}, B=\texttt{"MONEY"}

     \begin{array}{c:}
     \text{\blue {Index}}&\blue1 & \blue2 & \blue 3 & \blue4  & \blue 5 \\
     &&&&\Downarrow\\
     A:&\tt{F} & \tt{O}&\tt{O}&\tt{\red D}&\tt{Y}\\
     B:&\tt{M} & \tt{O}&\tt{N}&\tt{\red E} &\tt{Y} 
     \end{array}\hspace{1em}\xrightarrow{\text{replace(cost 1) at index 4}}\hspace{1em}\begin{matrix}
     \blue1 & \blue2 & \blue 3 & \blue4  & \blue 5 \\
     &&\Downarrow\\
     \tt{F} & \tt{O}&\tt{\red O}&\tt{\green E}&\tt{Y}\\
     \tt{M} & \tt{O}&\tt{\red N}&\tt{\green E} &\tt{Y} 
     \end{matrix}
     \begin{array}{c:}
     \text{\blue {Index}}&\blue1 & \blue2 & \blue 3 & \blue4  & \blue 5 \\
     &&\Downarrow\\
     A:&\tt{F} & \tt{\purple O}&\tt{N}&\tt{E}&\tt{Y}\\
     B:&\tt{M} & \tt{\purple O}&\tt{N}&\tt{E} &\tt{Y} 
     \end{array}\hspace{1em}\xrightarrow{\text{replace(free) at index 2}}\hspace{1em}\begin{matrix}
     \blue1 & \blue2 & \blue 3 & \blue4  & \blue 5 \\\Downarrow\\
     \tt{\red F} & \tt{\green O}&\tt{ N}&\tt{E}&\tt{Y}\\
     \tt{\red M} & \tt{\green O}&\tt{N}&\tt{E} &\tt{Y} 
     \end{matrix}

    To find the cost, we check if the characters A[i] and B[j] are the same.

    1. If A[i] = B[j] then the replacement is free
    2. If A[i] \ne B[j] then the cost is 1

    Therefore the recursive call for replace is:

     \text{EditDist}(i-1, j-1) + \begin{cases}1 & \text{if }A[i]\ne B[j]\\
     0 & \text{if }A[i]= B[j]
     \end{cases}

Since we want the shortest distance, this falls under the best solution case and we will need a min(…) call.

# 2.4 Forming the recurrence

Now we combine 2.2.2 and 2.2.3,

{\text{EditDist}(i, j)} = \begin{cases}
i & \text{if } j = 0\\j & \text{if } i = 0\\
\min \left.\begin{cases}

\text{Insert}\\
\text{Delete}\\
\text{Replace}\\

\end{cases}\right\} & \text{Otherwise}
\end{cases}

Convert to expressions:

{\text{EditDist}(i, j)} = \begin{cases}
i & \text{if } j = 0\\j & \text{if } i = 0\\
\min \left.\begin{cases}

\text{EditDist}(i, j-1) + 1\\
\text{EditDist}(i-1, j) + 1\\
\text{EditDist}(i-1, j-1) + \begin{cases}1 & \text{if }A[i]\ne B[j]\\
0 & \text{if }A[i]= B[j]
\end{cases}\\

\end{cases}\right\} & \text{Otherwise}
\end{cases}

# 2.5 Python: Backtracking Edit Distance

Implements the above recurrence. Github

# 3. Convert Backtracking to Dynamic Programming

# 3.1 Data Structure

The EditDistance(i, j) call requires 2 arguments, so we can use a 2D array.

EditDist: List<List<int>>

We can access the result of a previous function call by EditDist[i, j]

The final result will be in EditDist[n, m]

# 3.2 Order of Evaluation

\text{EditDist}(i, j) needs:

  • \text{EditDist}(i, j - 1)
  • \text{EditDist}(i-1, j)
  • \text{EditDist}(i-1, j-1)

So both the loop for i and j need to go in ascending order.

# 4.2 Python: DP Edit Distance

We can also save the recursion results by using the library @cache decorator. See LRU Cache.

Github

This file has DP and @cache brute force.