Pages

Computing the Edit Distance Between Two Strings

Given two strings, we should compute their edit distance. It is a well know problem, commonly solved by Dynamic Programming.

As we should expect, the idea is very close to the one seen in the previous problem, with the noticeable difference that here we are working on two lists, so our cache is going to be a bidimensional matrix and the complexity of the algorithm is moving to the O(n * m) realm, being n and m the sizes of the two strings in input.

def solution_dp(lhs, rhs):
    table = [[x for x in range(len(rhs) + 1)]]  # 1
    for k in range(1, len(lhs) + 1):
        table.append([k] + [0] * len(rhs))

    for i in range(1, len(table)):
        for j in range(1, len(table[0])):  # 2
            if lhs[i - 1] == rhs[j - 1]:
                table[i][j] = table[i-1][j-1]  # 3
            else:
                table[i][j] = min(table[i - 1][j], table[i][j - 1], table[i - 1][j - 1]) + 1  # 4

    return table[-1][-1]  # 5
1. This is our cache. Instead of having a single dummy cell, here we have both zeroth row and column just filled with zeroes and not touched anymore. Again, not strictly a necessity, still the code is much more readable in this way.
2. Let's loop on all "real" elements in the matrix.
3. If the corresponding characters in the strings are the same, we have a match. Meaning the edit distance won't change, so we just copy in the current cell the value of the one on the left top corner.
4. Otherwise we have seen a change. Since we are looking to the minimal distance, we get the lowest value in the top / left cells, and increase it by one.
5. At the end of the loop, the bottom-right cell contains the result.

Incredible simple, isn't it?

Python code and testcase on GitHub.

No comments:

Post a Comment