Pages

Covering prefix

Another Codility training test is the alpha 2010 one, also known as prefix_set.

We have an array of integers, we want to identify its prefix subset that contains all the values of the complete lot, if it exists. To get the sense of the problem, there is nothing better than a few test cases.

I am coding in C++, and I use gtest as test framework, but you should be able to port this stuff to you preferred developing platform with a limited effort:
typedef std::vector<int>::size_type SIZE;
const SIZE MAX_SIZE = 1000000; // 1

int alpha(const std::vector<int>& input); // 2

TEST(TestAlpha, CaseSimple) // 3
{
  std::vector<int> input = {2, 2, 1, 0, 1};

  ASSERT_EQ(3, alpha(input)); 
}

TEST(TestAlpha, CaseSimple2) // 4
{
  std::vector<int> input = {2, 2, 1, 2, 1};

  ASSERT_EQ(2, alpha(input));
}

TEST(TestAlpha, CaseHugeFlat) // 5
{
  std::vector<int> input;
  input.resize(MAX_SIZE, 0);

  ASSERT_EQ(0, alpha(input));
}

TEST(TestAlpha, CaseHuge) // 6
{
  std::vector<int> input;
  input.reserve(MAX_SIZE);
  input.resize(MAX_SIZE - 1, 1);
  input.push_back(42);

  ASSERT_EQ(MAX_SIZE - 1, alpha(input));
}
1. The biggest beast we could meet has a one million element size.
2. Our function gets in input a vector of int, and returns the index of the last element we need to complete the required subset.
3. Test case provided by Codility. Given in input {2, 2, 1, 0, 1}, we should return three, the index of the element containing the value 0, because the next element contains a 1, that has already showed up.
4. A slight variation of (3). For {2, 2, 1, 2, 1} a return value of two is expected.
5. A huge vector containing just copies of the same value. We should return zero.
6. Variation on (5), the last element is the only different one, we should return its index.

The point (1) leads to a couple of constraints that I didn't rigorized here, to keep the solution as simple as possible (but see below). The vector size should be in the interval (0 .. 1,000,000], the values in the vector in [0 .. 1,000,000).

We are looking for something that is more difficult to say than see, the position of the first copy of the last value in vector. Let's see it in CaseSimple. There are three values in its input vector, zero, one and two. The last one we see is zero, and it appears in only one instance, at the element at position three. So three is the value that we have to return.

The algorithm should be something like this: loop on all the items in the vector; if the current item is a value that we have not already met, its index could be the one that we should return. Pretty simple, isn't it?

We just need to keep track of each possible vector value, to have a way to detect which value we have already checked. I used a vector of boolean for that:
int alpha(const std::vector<int>& input)
{
  std::vector<bool> check;
  check.resize(input.size(), false); // 1

  SIZE candidate = MAX_SIZE; // 2
  for(SIZE i = 0; i < input.size() ; ++i)
  {
    SIZE cur = input[i];
    if(!check[cur]) // 3
    {
      check[cur].flip(); // 4
      candidate = i; // 5
    }
  }

  return candidate;
}
1. The support vector, initialized with all-false values.
2. Initialize the candidate solution to a bad value, so that the caller could know if anything weird happened.
3. I consider only the values that have not been checked before.
4. Mark the current value as already checked.
5. The current index will be returned as solution, if we won't find any better candidate.

This code is fast and it is accepted by Codility with full marks. Still, we could do better, even if only for personal fun. Let's add some check in the code to enforce the limits on size and values for the input vector. I want my function throwing an int exception when it detects something wrong, like showed by these test cases:
TEST(TestAlpha, CaseEmpty) // 1
{
  std::vector<int> input;

  ASSERT_THROW(alpha(input), int);
}

TEST(TestAlpha, CaseTooBig) // 2
{
  std::vector<int> input;
  input.resize(MAX_SIZE + 1);

  ASSERT_THROW(alpha(input), int);
}

TEST(TestAlpha, CaseOverflow) // 3
{
  std::vector<int> input;
  input.resize(1, 1);

  ASSERT_THROW(alpha(input), int);
}
1. There should be at least an item in the vector. Passing an empty vector should result in an (int) exception.
2. The vector size should be smaller than one million.
3. Any value in the vector should be smaller than its size.

It is easy (and cheap) to enforce these requirements in the code:
int alpha(const std::vector<int>& input)
{
  if(input.empty() || input.size() > MAX_SIZE) // 1
    throw 0;

  std::vector<bool> check;
  check.resize(input.size(), false);

  SIZE candidate = MAX_SIZE;
  for(SIZE i = 0; i < input.size() ; ++i)
  {
    SIZE cur = input[i];
    if(cur >= input.size()) // 2
      throw 1;

    if(!check[cur])
    {
      check[cur].flip();
      candidate = i;
    }
  }

  return candidate;
}
1. Check the vector input size.
2. Check each value against the vector size.

No comments:

Post a Comment