Pages

The closest pair of points

Given n points on a plane, find the smallest distance between a pair of two (different) points.

Let's see first a naive solution:
def solution_naive(points):
    size = len(points)
    if size < 2:  # 1
        return 0.0

    result = float('inf')  # 2
    for i in range(size - 1):
        for j in range(i+1, size):
            result = min(result, distance(points[i], points[j]))
    return result
1. If only one point, or no point at all, is passed, the minimum distance is defaulted to zero.
2. Otherwise initialize it to infinite, then loop on all the possible couples of points to find the one with minimum distance.

Where the distance function could be implemented as:
def distance(a, b):
    return math.sqrt((a[0]-b[0])**2 + (a[1]-b[1])**2)

The flaw of this naive implementation is that it has an obvious O(n^2) time complexity. Meaning that it rapidly becomes useless with the growing of the number points in input.

Let's apply a divide and conquer approach to this problem. The thing is that to do that we need a bit of mathematics that is beyond my knowledge. You could read this document by Subhash Suri from UC Santa Barbara for more details.

Firstly we sort the points for their x component, then we call the recursive function:
def solution_dac(points):
    points.sort()
    return closest_distance(points)
We firstly divide the problem up to get pieces that are so simple that they can be solved with the naive algorithm, and apply a merging part to "conquer" that step:
def closest_distance(points):
    size = len(points)
    if size < 4:  # 1
        return solution_naive(points)

    lhs = points[:size // 2]  # 2
    rhs = points[size // 2:]

    left = closest_distance(lhs)
    right = closest_distance(rhs)
    result = min(left, right)
    if result == 0.0:
        return result

    median = (lhs[-1][0] + rhs[0][0]) / 2  # 3
    closers = []
    for i in range(len(lhs)):
        if abs(lhs[i][0] - median) < result:
            closers.append(lhs[i])
    for i in range(len(rhs)):
        if abs(rhs[i][0] - median) < result:
            closers.append(rhs[i])
    closers.sort(key=lambda p: p[1])  # 4

    for i in range(len(closers) - 1):  # 5
        for j in range(i + 1, min(i + 6, len(closers))):
            if abs(closers[i][1] - closers[j][1]) < result:
                result = min(result, distance(closers[i], closers[j]))
    return result
1. For up to 3 points we could use the naive approach.
2. Split the list of points in two parts, left hand side and right hand side, and iterate on them the algorithm. If we see a minimal distance of zero, meaning two overlapping points, there is no use in going on.
3. Draw a line between the rightmost point in the left part of point collection and the leftmost in right part. Then add to a buffer list all the points that have a distance to this median line that is less than the currently found closest distance.
4. Sort the points on their "y" component.
5. And here happens the magic. There is a theorem stating that we could stop checking at 6 when looking for a closest distance in this condition.

The cutoff at 6 ensures that the time complexity of this algorithm if O(n log n).

Full python code on GitHub.

No comments:

Post a Comment