Number of disc intersections - O(N square) solution

The beta 2010 test from Codility training requires knowing, at least at intuitive level, about binomial coefficients. Otherwise, considering the short time available for pondering on such problems, and the pressure you could feel if this is part of an interview process, you are doomed. Or you should fall back to a sub-optimal solution.

The problem looks quite innocent. We have an array of integers, each element representing a one-dimensional "disc" having as center the index of the element itself, and as radius the element's value. So, if A[3] is 1, this means that we have a "disc" centered on three with radius of one. You can visualize it as a segment [2, 4]. Our job is writing a function that counts the intersections among these "discs".

Setup

As usual, I started working on the problem thinking for a C++ implementation, and writing a few test cases (I use gtest) to help me to design the code. Here is three of the tests I have written, with a few constants and a type definition that would help keeping the code readable:
const int MAX_INTERSECTING_PAIRS = 10000000; // 1
const unsigned int MAX_ELEMENTS = 100000; // 2
const int MAX_VALUE = 2147483647; // 3

typedef std::vector<int>::size_type SIZE_T;

int beta(const std::vector<int>& input); // 4

TEST(TestBeta, CaseSimple) // 5
{
  std::vector<int> input = {1, 5, 2, 1, 4, 0};
  ASSERT_EQ(11, beta(input));
}

TEST(TestBeta, CaseOverflow) // 6
{
  std::vector<int> input = { 0, MAX_VALUE, 0 };
  ASSERT_EQ(2, beta(input));
}

TEST(TestBeta, CaseTooMany)
{
  std::vector<int> input;
  input.resize(MAX_ELEMENTS + 1);

  ASSERT_EQ(-1, beta(input));
}
1. If the solution is getting huge, we are allowed to (actually, we have to) abort the job and return an error code (minus one).
2. Ditto if there are too many elements in the array.
3. We can assume that the values in input are 32 bit signed integer, that means, this is the max value we could get.
4. Declaration for the function we have to implement.
5. Test case provided by Codility. We have six segments, [-1, 1], [-4, 6], [0, 4], [2, 4], [0, 8], [5, 5], that should give eleven intersections. Note that the last element has length zero, so it collapses to a single point, on x = 5.
6. Codility problems are often written to lead to surprises on the extreme cases. Better to think of a few test cases on that neighborhood. Here I ensure that a huge "disk" is checked correctly. I expect just two intersections here.
7. Let's check what happens when the input is too big.

An inefficient solution

The first solution that jumped to my mind is naive but effective, at least for a small number of elements. Just loop on all the disks, compare the right limit of the current one against the left one of all the other ones. The good thing about this algorithm is that is pretty easy to write. You should have it written and tested in a whiff. On the flip side, it has a Big Oh N Square time complexity. Meaning, it would take forever to run, in case of a large vector in input.
int beta(const std::vector<int>& input)
{
  if(input.size() < 2) // 1
    return 0;
  if(input.size() > MAX_ELEMENTS) // 2
    return -1;

  int result = 0;
  for(int64_t i = 0; i < static_cast<int64_t>(input.size() - 1); ++i) // 3
    for(int64_t j = i+1; j < static_cast<int64_t>(input.size()); ++j) // 4
      if(i + input[i] >= j - input[j]) // 5
      {
        if(result == MAX_INTERSECTING_PAIRS) // 6
          return -1;
        ++result;
      }

  return result;
}
1. If there are less than two "discs", we can't have any intersection.
2. No more than MAX_ELEMENTS are allowed in input.
3. Let's loop on all the discs (but the last one, all the job would be already done at that point).
4. As said above, I compare each disc against all the other. More precisely, I compare it against all the next ones, since the intersections with the previous ones have been already checked by the previous iterations.
5. Check if the "i" disc is overlapping the "j" one. If you wondered why I have used int64_t integers to represent the vector's indices, here is the answer. If not, in case of huge values stored as vector values, we could get here an overflow. I could have used a more natural type for the indices, and write this line something like "if((int64_t)input[i] + input[j] + i - j >= 0)". Use whichever version looks more readable to you.
6. We have an intersection, before keeping track of it, we should check if we haven't already reached the limit.

Testing time duration

To see how this implementation is bound to fail, we could write a test case that checks how long it takes to run it. Boost has a time library that comes handy for this task:
#include <boost/date_time/posix_time/posix_time.hpp>

// ...

TEST(TestBeta, Case20K)
{
  std::vector<int> input;
  input.resize(20000); // 1

  boost::posix_time::ptime start = boost::posix_time::microsec_clock::local_time();

  ASSERT_EQ(0, beta(input));

  boost::posix_time::time_duration diff = boost::posix_time::microsec_clock::local_time() - start;
  ASSERT_LT(diff.total_milliseconds(), 5); // 2
}
1. In this test I don't care much on the vector content, a twenty thousand zeroes company is good enough to test the time complexity of my function.
2. I want that checking this vector would take less than five milliseconds. Tune this value to your system and expectations.

Improve the result

We have got a solution, and this is sweet, but it is a weak one (it scores 65/100). Now we have a couple of alternative. Working on this code to make it more performant, or looking for a better algorithm that would solve the structural issues of the current one. More about it in the following posts.

No comments:

Post a Comment