Number of disc intersections - linear solution

I described this Codility problem in previous post. There I took an alluring path that drove me to a weak O(N square) solution. Then I tried to save the day, tweaking that algorithm so that, in the best case, it could lead to an acceptable O(N log N) result. But we could not avoid to acknowledge that we still have a Big Oh N Square time complexity in the worst case.

Thinking a bit more on the problem nature, we should find a much more elegant algorithm, that gives us a shiny O(N) time complexity.

A different approach

Instead of focusing on the discs, let's check how a generic point contributes to the total number of intersections. We should just know how many other discs have started before, and how may are starting in that point. An example should clarify the matter.

The red vertical line in the picture represent the current point.

We have three discs (A, B, C - represented by black lines) that have already started, and a new one (D - the blue line) that starts there.

It is easy to see that we'll have three new intersections, AD, BD, CD. The blue line could be accounted for one intersection for each black line.

Things get a bit more complicated if we have many new discs starting in a point.
Say that we have three running discs (A, B, C - black lines), and three new ones (D, E, F - blue lines) are starting there (the red line).

Similarly to the previous example, we'll have a new bunch of intersections deriving from the contact between the first group and the second, namely, AD, AE, AF; BD, BE, BF; CD, CE, CF. That is 3 * 3.

But this is just part of the story, now we need to consider also of the intersection internal to the second group. In this case they are three: DE, DF, EF.

It is easy to count them by hand, but, what is the rule that determine this number?

Mathematically speaking, we are looking for the k-combinations of n-elements, where k is the number of parts involved in the intersection (in our case it is always 2), and n the number of elements we are dealing to (the new discs, represented by blue lines). This n-choose-k value is formally named binomial coefficient (n k) and is calculated using the formula n! / (k! * (n-k)!).

Being k = 2, we want to get the number of intersection between couple of discs, we have just n, the number of new discs, as variable. And we can simplify the formula to n! / (2! * (n-2)!) = n * (n-1) / 2

Let's get the number of intersections using it.

In the first case, there is just one new disc, so we have 3 intersection, 1 blue against 3 blacks, plus 1-choose-2 for the blue: n * (n - 1) / 2 = 1 * 0 / 2 = 0. That means, in the first case we have just three intersections.

What happens in the case of 2 new discs against 3 already existing? We'll have 6 intersections, 2 blues * 3 blacks, plus 2-choose-2 for the blues: 2 * 1 / 2 = 1. A total of seven intersections.

If we have three blues and three blacks, we'll have 3 * 3 = 9 intersections, and 3-choose-2 for the blues alone: (3 2) = 3 * 2 / 2 = 3. Twelve intersections.

The code
int beta(const std::vector<int>& input)
{
  const SIZE_T SIZE = input.size();

  if(SIZE < 2)
    return 0;
  if(SIZE > MAX_ELEMENTS)
    return -1;

  std::pair<int, int> disks[SIZE]; // 1
  for(SIZE_T i = 0; i < input.size(); ++i)
  {
    ++disks[(int)i < input[i] ? 0 : i - input[i]].first;
    ++disks[i+input[i] >= input.size() ? input.size() - 1 : i + input[i]].second;
  }

  int result = 0;
  int open = 0; // 2
  for(SIZE_T i = 0; i < input.size(); ++i)
  {
    result += open * disks[i].first + (disks[i].first * (disks[i].first - 1)) / 2; // 3
    if(result > MAX_INTERSECTING_PAIRS)
      return -1;

    open += disks[i].first - disks[i].second; // 4
  }

  return result;
}
1. Same data structure as used in the previous implementation, but here it has a different meaning. Each pair represents a point in the interval [0, input size); its first component keep tracks of how may discs start in that point (note that I cut off to zero this value, for the same reason discussed in the previous post); its second component represents the number of discs ending there.
2. Initially we have no "open" discs.
3. This is the core of the algorithm. The first part calculates the intersections between the already existing discs and the number of discs starting in this point. The second part is the binomial coefficient discussed above.
4. Add to "open" the number of discs starting here, and subtract the one that terminates here.

7 comments:

  1. Well, I've been looking over this very nice solution, but it is making me feel pretty stupid. I understand counting the starts and stops for the discs. That is straightforward. However, I'm not clear on the n-choose-k nature of the problem from the explanation.

    Multiplying 'open' with the number of starting discs is clear. Then you add the binomial coefficient, but that is calculated as the number of opening disks times itself minus 1, all divided by 2. Call opening disks of the i'th element d1[i], you you give d1[i]*(d1[i]-1)/2.

    I am confused by the equivalence you draw between (n k) = n! / (2 * (k! * (n-k)!)) and the statement that the binomial coefficient (3 2) is 3! / 2 * 1. The problems in understanding I have is as follows:

    The binomial coefficient doesn't include '2' in it's formula; n-choose-k is (n k) = n!/(k!(n-k)!) - where did your '2' come from?

    Of course, the main point of confusion for me is what the n-choose-k portion of the program actually means. I've looked a little into multisets on Wikipedia and think it might have something to do with that, but I'm still unclear. I suppose I'm missing the leap in logic that says, "Aha! This is an n-choose-k problem."

    Thanks for the very elegant O(n) solution. If you have some clarification, it would be highly appreciated!

    ReplyDelete
    Replies
    1. Actually, Thomas, you were right to be confused, the "2" at the denominator was a typo. Now I corrected it, sorry about that.

      I modified the post, adding some substance to the part that explains where the n-choose-k comes from. Hope I made myself a bit clearer.

      The idea is that besides the intersections between the already existing discs and the new ones, that is calculated with a mere multiplication, we have to consider also the intersection in the group of the new discs. And this is a n-choose-k, where k is 2, because we are interested in the intersections between couple of discs.

      Delete
    2. Excellent explanation; it is all clear now. Very clever. Thank you very much!

      Delete
  2. This comment has been removed by the author.

    ReplyDelete
  3. Nice. At first I was skeptical, but having looked through the code, I'm deeply impressed. It seems to me that this solution cannot be extended for intersecting circles (as opposed to discs in this case), whereas the O(nlogn) solution can be.

    ReplyDelete
    Replies
    1. Thank you! I reckon you are right. I have tailored this solution on the problem's requirements. Changing them, even in a way that at first sight doesn't seem substantial, could lead to unforeseen troubles in the algorithm.

      Delete