Pages

Looking for a sequence in a vector

We want to write a function that checks if a vector of integers contains the natural sequence from 1 to a given value. If so, we should return the index of the highest element in the vector we need to complete the sequence. We are interested only in positive integer numbers, and we won't have to manage anything bigger than 100,000.

This same problem is expressed in a more colorful way in the Frog-River-One exercise in the Codility train page, Counting Elements section. There you could find the function prototype and the description of a test case, that I have ported to GoogleTest, preparing myself to write a C++ solution:
int solution(int x, const std::vector<int>& input);

TEST(TestFro, CaseSample)
{
  ASSERT_EQ(6, solution(5, { 1, 3, 1, 4, 2, 3, 5, 4 }));
}
Notice that I written the test case for a C++11 compiler (GCC 4.8.x) and not for the C++98 used by Codility. In this way I could use the handy list initializer constructor for STL vectors, that makes the code simpler and more readable. For this reason, I have also added a const attribute to the vector reference parameter in the function prototype. All this looks more natural to me, and shouldn't be a big issue for you to adapt it to the original requirements.

The problem looked quite straightforward to me, so I didn't spend anymore time writing more test cases. Beware that this could easily be a fatal mistake.

Inefficient solution

You should spot that this approach is flawed just thinking to it. It is obviously too expensive to be useful for other than trivial cases.

I mimicked what I would naturally do in a real life case. I'd check one by one all the numbers in the sequence, ensuring they are available, keeping track of which one is the rightmost element:
int solution(int x, const std::vector<int>& input)
{
  int result = -1; // 1
  for(int i = 1; i <= x; ++i) // 2
  {
    bool found = false;
    for(unsigned j = 0; j < input.size(); ++j) // 3
    {
      if(input[j] == i) // 4
      {
        if(result < static_cast<int>(j))
          result = j;
        found = true;
        break;
      }
    }
    if(!found) // 5
      return -1;
  }

  return result;
}
1. Initialize the result to the "not found" value.
2. Loop on all the sequence value.
3. Loop on the vector. This loop-in-the-loop makes this piece of code weak. In the worst case the time complexity is in the realm of the Big Oh N Squared family.
4. I am looking for the leftmost "i" elements, when I find it, I check if it is the current rightmost element of the sequence I have currently found, if so, I keep track of it.
5. If a value of the sequence is missing, there is no need of going on checking for the others.

Linear solution

A typical way of reducing the time complexity of an algorithm is buying time with space. And this second solution does just that. Instead of having a loop in a loop, I have one after the other, using a buffer to store the results of the first loop to make them available to the second one:
int solution(int x, const std::vector<int>& input)
{
  std::vector<int> buffer(x, -1); // 1

  for(unsigned i = 0; i < input.size(); ++i) // 2
  {
    unsigned pos = input[i] - 1; // 3
    if(x < (int)pos)
      continue;

    if(buffer[pos] == -1)
      buffer[pos] = i;
  }

  int time = -1;
  for(unsigned i = 0; i < buffer.size(); ++i) // 4
  {
    if(buffer[i] == -1)
      return -1;
    if(buffer[i] > time)
      time = buffer[i];
  }

  return time;
}
1. I'm looking for the natural sequence ranging from 1 to x, so I need a vector sized x. Each element is initialized to -1, meaning that the associated value has not been found yet.
2. First loop, I'm checking all the values in input.
3. I convert the currently checked value in input to an index for the buffer. If the value is out of range, we skip it. Otherwise, if the current value has not already been found, we keep track of its position.
4. Second loop, I'm checking all the value in the buffer. If I found a value not set, I return error. Otherwise I keep track of the highest value, and the end I return it.

[After one year, I came back to this problem. As often happens, I devised another solution. Maybe it would look more intuitive to you.]

No comments:

Post a Comment