
Partial sum

Given a numeric sequence, its partial summation is another sequence where each element represents the sum of all the elements of the original sequence till that point. So, given the sequence { 1, 1, 1, ... }, its partial sum is { 1, 2, 3, 4, ... }. It is also known as prefix sum, prefix reduction, scan, or cumulative sum.

There is a C++ STL function that implements this concept, std::partial_sum(), available in two overloads. The first is a proper partial sum implementation; the second is a generic one that lets the caller specify which operation to apply when transforming the original sequence.
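
To make the idea concrete, here is a hand-rolled sketch of what the two overloads conceptually do. It is not the actual library implementation, and scanSketch is a name I made up for the occasion:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper, not part of the STL: an inclusive scan with a generic operation.
template <typename T, typename Op>
std::vector<T> scanSketch(const std::vector<T>& input, Op op)
{
    std::vector<T> output;
    output.reserve(input.size());

    for(std::size_t i = 0; i < input.size(); ++i)
    {
        if(i == 0)
            output.push_back(input[0]); // the first element is copied as is
        else
            output.push_back(op(output[i - 1], input[i])); // previous result combined with the current element
    }

    return output;
}

// Plain partial sum: the operation is simply an addition.
template <typename T>
std::vector<T> scanSketch(const std::vector<T>& input)
{
    return scanSketch(input, std::plus<T>());
}
```

Calling scanSketch() on { 1, 1, 1, 1 } gives { 1, 2, 3, 4 }, while passing std::multiplies<int>() as second argument turns { 1, 3, 1, 4 } into { 1, 3, 3, 12 }.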

I have written a few test cases (for GoogleTest) that should clarify its usage.

Typical case

I have an input sequence, I want to get its partial sum in a new container:
TEST(TestParSum, CaseStandard)
{
  std::vector<int> input { 1, 3, 1, 4, 2, 3, 5, 4 }; // 1
  std::vector<int> output(input.size()); // 2
  std::partial_sum(input.begin(), input.end(), output.begin()); // 3

  ASSERT_EQ(8, output.size());
  ASSERT_EQ(1, output[0]);
  ASSERT_EQ(4, output[1]);
  ASSERT_EQ(5, output[2]);
  ASSERT_EQ(9, output[3]);
  ASSERT_EQ(11, output[4]);
  ASSERT_EQ(14, output[5]);
  ASSERT_EQ(19, output[6]);
  ASSERT_EQ(23, output[7]);
}
1. The original container. I have used the handy C++11 list initializer to set it up on construction.
2. The generated result will be stored in this container. It should have the same size as the input sequence.
3. Calculate the partial sum for the passed sequence, writing the result from the beginning of the output container.
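
If you prefer not to size the output container in advance, a possible variation is letting std::back_inserter grow it. This is just a sketch, and partialSumAppend is a name of my own choosing:

```cpp
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical helper: the running sums are appended to an initially empty vector.
std::vector<int> partialSumAppend(const std::vector<int>& input)
{
    std::vector<int> output;
    output.reserve(input.size()); // optional, it just avoids reallocations
    std::partial_sum(input.begin(), input.end(), std::back_inserter(output));
    return output;
}
```

The result is the same as in the test above, the only difference is in who takes care of the output size.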

In place generation

The standard partial_sum() function is designed in such a way that it could overwrite the original data:
TEST(TestParSum, CaseOverwrite)
{
  std::vector<int> data { 1, 3, 1, 4, 2, 3, 5, 4 };
  std::partial_sum(data.begin(), data.end(), data.begin()); // 1

  ASSERT_EQ(8, data.size());
  ASSERT_EQ(1, data[0]);
  ASSERT_EQ(4, data[1]);
  ASSERT_EQ(5, data[2]);
  ASSERT_EQ(9, data[3]);
  ASSERT_EQ(11, data[4]);
  ASSERT_EQ(14, data[5]);
  ASSERT_EQ(19, data[6]);
  ASSERT_EQ(23, data[7]);
}
1. The output iterator is the same as the input begin iterator. We lose the original data, but we save some memory.

Not only summations

We could need something similar to a partial sum, with the variation that the operator applied is not an addition:
TEST(TestParSum, CaseMultiply)
{
  std::vector<int> data { 1, 3, 1, 4, 2, 3, 5, 4 };
  std::partial_sum(data.begin(), data.end(), data.begin(),
    std::multiplies<int>()); // 1

  ASSERT_EQ(8, data.size());
  ASSERT_EQ(1, data[0]);
  ASSERT_EQ(3, data[1]);
  ASSERT_EQ(3, data[2]);
  ASSERT_EQ(12, data[3]);
  ASSERT_EQ(24, data[4]);
  ASSERT_EQ(72, data[5]);
  ASSERT_EQ(360, data[6]);
  ASSERT_EQ(1440, data[7]);
}
1. Instead of summing, I want to multiply the values. So I pass the STL multiplies functor as last parameter. Nothing else changes from a "normal" partial_sum() call.

Even more generalized

Obviously, we are not restricted to the STL arithmetic functors as binary operation. We could use our own specifically tailored functor or, if our compiler is C++11 compliant, a lambda function.

Here I just rewrite the previous test:
TEST(TestParSum, CaseMultiplyLambda)
{
  std::vector<int> data { 1, 3, 1, 4, 2, 3, 5, 4 };
  std::partial_sum(data.begin(), data.end(), data.begin(),
    [](int a, int b) { return a*b; });

  ASSERT_EQ(8, data.size());
  ASSERT_EQ(1, data[0]);
  ASSERT_EQ(3, data[1]);
  ASSERT_EQ(3, data[2]);
  ASSERT_EQ(12, data[3]);
  ASSERT_EQ(24, data[4]);
  ASSERT_EQ(72, data[5]);
  ASSERT_EQ(360, data[6]);
  ASSERT_EQ(1440, data[7]);
}
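
The operation does not even have to be arithmetic. As a further sketch (runningMax is a hypothetical name, not something from the STL), a lambda based on std::max() gives the running maximum of a sequence:

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical helper: each output element is the maximum of the input seen so far.
std::vector<int> runningMax(const std::vector<int>& input)
{
    std::vector<int> output(input.size());
    std::partial_sum(input.begin(), input.end(), output.begin(),
        [](int a, int b) { return std::max(a, b); });
    return output;
}
```

For the usual input { 1, 3, 1, 4, 2, 3, 5, 4 } I expect { 1, 3, 3, 4, 4, 4, 5, 5 } as result.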


Simple finite state automaton

This problem is about writing a trivial finite state machine that gets in input the number of elements that it should work with, and a vector of integers that represents the operations we want it to execute on them. As output we expect a vector containing its final state.

It comes from the Codility Train problems' collection, the section about Counting Elements, identified by the codename Max-Counters.

Our function has to create a vector of integers initialized to zero and then, according to the values specified in the input vector, increase by one the specified element, or set all of them to the current maximum value.

A test case (written in C++ and the Google Test framework) should clarify the expected behavior:
std::vector<int> solution(int N, std::vector<int> & input);

TEST(MaCo, Given)
{
    std::vector<int> input;
    input.push_back(3);
    input.push_back(4);
    input.push_back(4);
    input.push_back(6);
    input.push_back(1);
    input.push_back(4);
    input.push_back(4);

    std::vector<int> output = solution(5, input);
    ASSERT_EQ(5, output.size());

    ASSERT_EQ(3, output[0]);
    ASSERT_EQ(2, output[1]);
    ASSERT_EQ(2, output[2]);
    ASSERT_EQ(4, output[3]);
    ASSERT_EQ(2, output[4]);
}
The first parameter we pass to our function is five, meaning we want it to work on an int vector of five elements.
The second parameter is the vector containing the (sort of) program we want our function to execute. Each integer represents an instruction:
- 3: increase the third element, that should store now 1.
- 4: increase the fourth element, set to 1.
- 4: increase again the fourth element, now it is 2.
- 6: assign the top value (2) to all the elements.
- 1: increase the first element to 3.
- 4: increase the fourth element to 3.
- 4: increase the fourth element to 4.
The resulting vector should be { 3, 2, 2, 4, 2 }.

As always happens in this kind of problem, we can skip any error handling, trusting the user to pass correct input. Besides, we expect to work with reasonably small values (100,000 is the biggest integer we should see in input).

Once you get what the problem is asking, you should already be close to the solution. I could quickly figure out a simple solution without writing other test cases (though I wouldn't recommend following my example: better to play it safe and spend some time improving the tests).

Inefficient solution

The Codility evaluation for this solution is 77%. It works fine for small input, but it gets far too slow as N and M (the size of the input vector) grow. We should spot immediately that it has an O(N*M) time complexity:
std::vector<int> solution(int N, std::vector<int> & input)
{
    std::vector<int> result(N);

    int highest = 0; // 1
    for(int i = 0; i < input.size(); ++i)
    {
        int value = input[i];
        if(value > 0 && value <= N) // 2
        {
            int newValue = ++result[value-1];
            if(highest < newValue)
                highest = newValue;
        }
        else if(value == N + 1) // 3
        {
            for(int j = 0; j < result.size(); ++j) // 4
                result[j] = highest;
        }
    }
    
    return result;
}
1. The current highest value in the vector of results. It is going to be used for the "leveling" operation.
2. If the current value in the input vector is in [1..N], I should apply the "increase the specified element value" operation. If this gives me a new highest value, I keep track of the change.
3. A less paranoid developer would have probably used a plain "else" here. I couldn't help adding a minimal error handling, checking that the value is actually the expected one before applying the leveling. In this way any unexpected input value leads to a no-op.
4. This is the obvious weak point in the algorithm I have chosen. A loop in a loop that we should avoid.

Linear solution

We have immediately spotted the issue in the first implementation. What we want to do is move the leveling loop outside the loop that checks the operations we have to perform on our data.

If you think about it, this is nothing complicated. We just have to use another integer to remember which is the value we have used for the latest leveling operation.

The refactoring is not a complicated task:
std::vector<int> solution(int N, std::vector<int>& input)
{
    std::vector<int> result(N);

    int highest = 0;
    int watermark = 0; // 1

    for(unsigned i = 0; i < input.size(); ++i)
    {
        int index = input[i] - 1; // 2
        if(index >= 0 && index < N)
        {
            result[index] = std::max(watermark, result[index]) + 1; // 3
            highest = std::max(highest, result[index]); // 4
        }
        else if(index == N)
        {
            watermark = highest; // 5
        }
    }

    for(unsigned i = 0; i < result.size(); ++i) // 6
    {
        result[i] = std::max(result[i], watermark);
    }

    return result;
}
1. I think of the leveling operation as a sort of flooding. This variable keeps track of the level reached by the water.
2. It gets complicated to keep adjusting between the index as perceived by the user (one-based) and as managed internally (zero-based), so I added a utility variable that should make the following code more pleasant to read.
3. The new value for the current element is determined by its previous value and the watermark. I take the higher of the two, and I increase it.
4. I could have left the if-check as in the original version, choose whichever you feel is more readable.
5. This is the core of the change. I don't loop anymore, I just remember the new level. This leads to the nuisance in (3) but saves us loads of time.
6. No more loop in a loop. However, we should pay attention to apply the watermark value only when required. We want to only level up the elements.
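
Just as a sketch, this is how the same lazy leveling idea could look in C++11 style, with range-based for loops and a const reference in input. I named it maxCounters to avoid clashing with the solution() above, so take the name as hypothetical:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical C++11 rewrite of the linear solution, same algorithm.
std::vector<int> maxCounters(int n, const std::vector<int>& ops)
{
    std::vector<int> counters(n);
    int highest = 0;   // highest counter value seen so far
    int watermark = 0; // level set by the latest leveling operation

    for(int op : ops)
    {
        if(op >= 1 && op <= n)
        {
            int& counter = counters[op - 1];
            counter = std::max(counter, watermark) + 1; // apply the pending leveling lazily
            highest = std::max(highest, counter);
        }
        else if(op == n + 1)
        {
            watermark = highest; // constant time, the counters are not touched here
        }
    }

    for(int& counter : counters) // level up the counters not increased after the last leveling
        counter = std::max(counter, watermark);

    return counters;
}
```

Called with 5 and { 3, 4, 4, 6, 1, 4, 4 } it should return { 3, 2, 2, 4, 2 }, as in the given test case.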


Looking for a sequence in a vector

We want to write a function that checks if a vector of integers contains the natural sequence from 1 to a given value. If so, we should return the index of the highest element in the vector we need to complete the sequence. We are interested only in positive integer numbers, and we won't have to manage anything bigger than 100,000.

This same problem is expressed in a more colorful way in the Frog-River-One exercise in the Codility train page, Counting Elements section. There you could find the function prototype and the description of a test case, that I have ported to GoogleTest, preparing myself to write a C++ solution:
int solution(int x, const std::vector<int>& input);

TEST(TestFro, CaseSample)
{
  ASSERT_EQ(6, solution(5, { 1, 3, 1, 4, 2, 3, 5, 4 }));
}
Notice that I have written the test case for a C++11 compiler (GCC 4.8.x) and not for the C++98 used by Codility. In this way I could use the handy list initializer constructor for STL vectors, that makes the code simpler and more readable. For this reason, I have also added a const qualifier to the vector reference parameter in the function prototype. All this looks more natural to me, and it shouldn't be a big issue for you to adapt it to the original requirements.

The problem looked quite straightforward to me, so I didn't spend any more time writing more test cases. Beware that this could easily be a fatal mistake.

Inefficient solution

You should spot that this approach is flawed just by thinking about it. It is obviously too expensive to be useful for anything but trivial cases.

I mimicked what I would naturally do in a real life case. I'd check one by one all the numbers in the sequence, ensuring they are available, keeping track of which one is the rightmost element:
int solution(int x, const std::vector<int>& input)
{
  int result = -1; // 1
  for(int i = 1; i <= x; ++i) // 2
  {
    bool found = false;
    for(unsigned j = 0; j < input.size(); ++j) // 3
    {
      if(input[j] == i) // 4
      {
        if(result < static_cast<int>(j))
          result = j;
        found = true;
        break;
      }
    }
    if(!found) // 5
      return -1;
  }

  return result;
}
1. Initialize the result to the "not found" value.
2. Loop on all the sequence values.
3. Loop on the vector. This loop-in-the-loop makes this piece of code weak. In the worst case the time complexity is in the realm of the Big Oh N Squared family.
4. I am looking for the leftmost element with value "i". When I find it, I check if it is the rightmost element of the sequence found so far and, if so, I keep track of it.
5. If a value of the sequence is missing, there is no need to go on checking for the others.

Linear solution

A typical way of reducing the time complexity of an algorithm is buying time with space. And this second solution does just that. Instead of having a loop in a loop, I have one after the other, using a buffer to store the results of the first loop to make them available to the second one:
int solution(int x, const std::vector<int>& input)
{
  std::vector<int> buffer(x, -1); // 1

  for(unsigned i = 0; i < input.size(); ++i) // 2
  {
    unsigned pos = input[i] - 1; // 3
    if(pos >= static_cast<unsigned>(x)) // also catches values below 1, that wrap to a huge unsigned
      continue;

    if(buffer[pos] == -1)
      buffer[pos] = i;
  }

  int time = -1;
  for(unsigned i = 0; i < buffer.size(); ++i) // 4
  {
    if(buffer[i] == -1)
      return -1;
    if(buffer[i] > time)
      time = buffer[i];
  }

  return time;
}
1. I'm looking for the natural sequence ranging from 1 to x, so I need a vector sized x. Each element is initialized to -1, meaning that the associated value has not been found yet.
2. First loop, I'm checking all the values in input.
3. I convert the currently checked value in input to an index for the buffer. If the value is out of range, I skip it. Otherwise, if the current value has not already been found, I keep track of its position.
4. Second loop, I'm checking all the values in the buffer. If I find a value not set, I return the error code. Otherwise I keep track of the highest value, and at the end I return it.
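
A possible variation, sketched here under the assumption that we only need to know when the last missing value shows up, is counting down the values still to be found in a single pass. The function name earliestFullCover is made up for this example:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical single-pass variant: returns the index at which the sequence 1..x gets completed, or -1.
int earliestFullCover(int x, const std::vector<int>& input)
{
    std::vector<bool> seen(x, false);
    int missing = x; // how many values in [1..x] are still to be found

    for(std::size_t i = 0; i < input.size(); ++i)
    {
        int value = input[i];
        if(value < 1 || value > x) // out of range, nothing to do
            continue;

        if(!seen[value - 1])
        {
            seen[value - 1] = true;
            if(--missing == 0) // the last missing value has just been found
                return static_cast<int>(i);
        }
    }

    return -1;
}
```

Here the second loop disappears, and the function can return before having scanned the whole input. For the sample, 5 and { 1, 3, 1, 4, 2, 3, 5, 4 }, it returns 6.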

[After one year, I came back to this problem. As often happens, I devised another solution. Maybe it would look more intuitive to you.]


Is it a permutation?

We want to write a function that gets in input a vector and checks if it contains a permutation of the sequence of natural integers starting with 1.

This is an input that we should accept: { 4, 1, 3, 2 }
And this one should be rejected: { 4, 1, 3 }

You could find this problem in the Codility Train page, section Counting Elements, codename Perm-Check. You can submit your solution in one of a few different supported programming languages, to check it against their acceptance test.

My solution is written in C++98 (alas C++11 is not supported) with test cases for GoogleTest.

Test cases

A couple of test cases are given in the problem presentation, I just added a few more trivial ones:
int solution(std::vector<int>& input); // 1

TEST(PeCe, GivenGood) // 2
{
    std::vector<int> input;
    input.push_back(4);
    input.push_back(1);
    input.push_back(3);
    input.push_back(2);

    EXPECT_EQ(1, solution(input));
}

TEST(PeCe, GivenBad) // 3
{
    std::vector<int> input;
    input.push_back(4);
    input.push_back(1);
    input.push_back(3);

    EXPECT_EQ(0, solution(input));
}

TEST(PeCe, OneGood) // 4
{
    std::vector<int> input(1, 1);

    EXPECT_EQ(1, solution(input));
}

TEST(PeCe, OneBad) // 5
{
    std::vector<int> input(1, 42);

    EXPECT_EQ(0, solution(input));
}

TEST(PeCe, OneBigBad) // 6
{
    std::vector<int> input(1, 1000000000);

    EXPECT_EQ(0, solution(input));
}
1. The function prototype we have to implement. For some reason, instead of returning a boolean, it returns an integer that would act as a K&R C boolean emulation, 0 means false, 1 true.
2. First given test, it should detect the permutation.
3. Second given test, no permutation in input.
4. Simplest possible good case.
5. Simplest possible bad case.
6. A curious case. We should consider the case we have huge integers in input. Up to one billion, actually. This is a bit strange, since the max expected size for the vector is just a modest 100 thousand, however we should expect some trickery in the Codility acceptance test based on this point.

A too expensive solution

What we could think of is repeatedly scanning the input, looking for each of the sequence values. If we don't find an expected one, we return failure, otherwise the input is accepted. We should smell immediately something bad. Looping on all the N elements of a container, checking for N values, leads to a Big Oh N Squared worst case time complexity, that we should avoid.

In any case, here it is:
int solution(std::vector<int>& input)
{
    for(unsigned i = 1; i <= input.size(); ++i) // 1
    {
        bool found = false;
        for(unsigned j = 0; j < input.size(); ++j) // 2
        {
            if(static_cast<unsigned>(input[j]) == i) // 3
            {
                found = true;
                break;
            }
        }
        if(!found)
            return 0;
    }

    return 1;
}
1. Our scrambled sequence should contain all the integers from one up to the number of elements in input.
2. Let's look for the current value.
3. The vector in input should have been parametrized for unsigned values, Codility decided otherwise, so here I say explicitly to the compiler not to worry about comparison with what it perceives as objects of different types. Trust me, they are actually both unsigned ints.

The code works fine for small input, but it rapidly gets too slow to be acceptable when the input size grows.

Linear solution

Let's divide the job in two consecutive steps, and use a buffer to store the temporary result. In this way, instead of having one loop nested in another one, we'll have one loop after the other, reducing the time complexity to O(N). We pay for this improvement by increasing the algorithm space complexity, but we are happy to pay this price:
int solution(std::vector<int>& input)
{
    std::vector<bool> buffer(input.size()); // 1

    for(unsigned i = 0; i < input.size(); ++i) // 2
    {
        unsigned value = input[i]-1;
        if(value < input.size()) // 3
            buffer[value] = true;
        else
            return 0;
    }

    return std::count(buffer.begin(), buffer.end(), false) == 0 ? 1 : 0; // 4
}
1. Flags for the expected values in the input vector. The elements of a container are initialized with the default value for the underlying type. So at this point we have a bunch of false elements.
2. Loop on all the input values.
3. The OneBigBad test case we have seen above was written to help me remember to write this check. If input contains (at least) a value that is bigger than expected, we know that we have to reject it. Only if the value is in the correct range we set its flag to true. Moreover, notice that in the line above I have also silently converted the value from signed to unsigned. If a rogue input included a zero or a negative value, it has been converted there to a huge positive number, and here the anomaly is caught. It would have been better to require that the input vector was of unsigned elements.
4. I have hidden the second loop, that uses the buffer partial result, in this call to the STL count() function. We expect no flag to be set to false anymore. If this is what happens, we can return success.

Sleeker and (maybe) faster

Actually, as suggested by Karsten, see his comments below, there is no actual need for the final buffer checking, if we have already carefully checked each value in input. The advantage of moving the check here is that we can fail fast, as soon as we detect a duplicate element in input:
int solution(std::vector<int>& input)
{
    std::vector<bool> buffer(input.size());

    for(unsigned i = 0; i < input.size(); ++i)
    {
        unsigned value = input[i]-1;
        if(value < input.size() && buffer[value] == false) // 1
            buffer[value] = true;
        else
            return 0;
    }

    return 1; // 2
}
1. If we see that the flag has been already set, it means that we have spotted a duplicate, so we can reject the current input.
2. If we get here, each element in input has been accepted, we just have to return success.
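
For comparison, here is a sketch of a different trade-off: sorting instead of flagging. It costs O(N log N) time, so it is slower than the linear solutions above, but it needs no flag buffer. In this hypothetical isPermutationSorted() I sort a copy to leave the caller's data untouched; sort the input itself if you are allowed to change it:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical alternative: after sorting, a permutation of 1..N must be exactly 1, 2, ..., N.
int isPermutationSorted(std::vector<int> input) // passed by value, so we work on a copy
{
    std::sort(input.begin(), input.end());

    for(std::size_t i = 0; i < input.size(); ++i)
    {
        if(input[i] != static_cast<int>(i) + 1) // missing, duplicated or out of range value
            return 0;
    }

    return 1;
}
```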


Equilibrium in a vector

We have an array of integers in input, whose size we know for sure is in the range [2..100,000] and each element is in [−1,000..1,000]. We'd like to split it in two parts, so that the sums of the elements on the two sides are as close as possible. For some weird reason, we are not asked to return the index for which such condition is fulfilled, but the minimal difference, as an absolute value, between left and right sum.

You could find this problem in the Codility Train page, in the Time Complexity section, under the nickname Tape-Equilibrium. You could submit there your solution for evaluation in one of the many languages supported. C++11 is not one of them, so I have written my code for C++98.

[Good news, now Codility supports C++11. I have refreshed the code accordingly. Please follow the link for details.]

Notice that the number of elements in the vector and the value for each element are such that we can happily work with plain integers, since the maximum sum we could get (in absolute value) is a mere hundred million. The real issue in this problem is time complexity, we should strive for a linear solution, if we want to get a 100% score.

Firstly, some test cases (written for GoogleTest):
int equilibrium(std::vector<int>& input); // 1

TEST(TapEq, Given) // 2
{
    std::vector<int> input;
    input.push_back(3);
    input.push_back(1);
    input.push_back(2);
    input.push_back(4);
    input.push_back(3);

    EXPECT_EQ(1, equilibrium(input));
}

TEST(TapEq, TwoBig) // 3
{
    std::vector<int> input;
    input.push_back(-1000);
    input.push_back(1000);

    EXPECT_EQ(2000, equilibrium(input));
}

TEST(TapEq, HundredK) // 4
{
    std::vector<int> input(100000, 1000);

    EXPECT_EQ(0, equilibrium(input));
}

TEST(TapEq, AlmostHundredK) // 5
{
    std::vector<int> input(99999, 1000);

    EXPECT_EQ(1000, equilibrium(input));
}
1. I don't feel right passing around a non-const reference to an STL container if there is no compelling reason to do that. Here the reason is just that Codility wants us to do that.
2. This is the test given with the problem description. The equilibrium point is such that the vector is split between (3, 1, 2) and (4, 3) leading to a difference of 1, that is going to be returned to the caller.
3. Minimal case, just two elements.
4. Biggest vector I could get in input, 100,000 elements, each of them with value 1,000, so the result is zero.
5. Almost like the case (4), but here we have an odd number of elements, all of them have the same value, so it is not possible to get a perfect equilibrium.

Bad solution O(N**2)

We could think of looping on all the elements in the vector, from the first to the last but one. That would be our pivot, dividing the vector in two parts. Then we'll sum all the left and right elements, and compare the results:
int equilibrium(std::vector<int>& input)
{
    int result = std::numeric_limits<int>::max(); // 1
    for(unsigned i = 0; i < input.size() - 1; ++i)
    {
        int left = 0; // 2
        for(unsigned j = 0; j <= i; ++j)
            left += input[j];

        int right = 0; // 3
        for(unsigned j = i + 1; j < input.size(); ++j)
            right += input[j];

        int difference = std::abs(left - right); // 4
        if(difference < result)
            result = difference;
    }

    return result;
}
1. In the beginning we have no result, let's remark this initializing the variable to the highest available value.
2. Sum up all the elements on the left of the current pivot (included).
3. Sum up all the elements on the right of the current pivot (excluded).
4. Get the difference between left and right, and keep it, if it is the current minimum value.

This algorithm is straightforward, but it has a major issue. Its time complexity is in the order of N squared, as we can see immediately, given the twin for-loops in a for-loop.

Besides, all this summing up is not required. Or better, we repeat many times the same additions we have already done. A better solution would reduce them to the bare necessity.

Linear solution

Let's start by splitting the vector in two parts. One contains just the leftmost element, the other one all the other elements. Then it would be just a matter of adding the current border element to the left and simultaneously subtracting it from the right. We still have two O(N) for-loops, but they are one after the other, so the resulting time complexity falls to linear:
int equilibrium(std::vector<int>& input)
{
    std::vector<int>::iterator pivot = input.begin();
    int left = *pivot;
    int right = std::accumulate(++pivot, input.end(), 0); // 1
    int result = std::abs(left - right);

    for(; pivot < input.end() - 1; ++pivot) // 2
    {
        left += *pivot;
        right -= *pivot;
        int diff = std::abs(left - right);
        if(diff < result)
            result = diff;
    }

    return result;
}
1. The first for-loop is hidden in this call to STL accumulate() function.
2. Second for-loop. Notice that I loop till the last but one element, since I want the right side of the vector to contain at least one element.
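
The same idea could also be sketched keeping only the total and the running left sum, since the right sum is always the total minus the left one. The name equilibriumByTotal is hypothetical, and I assume, as the problem guarantees, that the input has at least two elements:

```cpp
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical variant: right is derived as (total - left) instead of being tracked.
// Assumption: input has at least two elements, otherwise the int max value is returned.
int equilibriumByTotal(const std::vector<int>& input)
{
    int total = std::accumulate(input.begin(), input.end(), 0);
    int best = std::numeric_limits<int>::max();

    int left = 0;
    for(std::size_t i = 0; i + 1 < input.size(); ++i) // the right side keeps at least one element
    {
        left += input[i];
        int diff = std::abs(left - (total - left));
        if(diff < best)
            best = diff;
    }

    return best;
}
```

The left variable here is nothing more than a partial sum of the input, computed on the fly.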


The missing element

An easy problem, that you can also find in the Codility Time Complexity train section. I have written my solution in C++, you could test and submit yours in your preferred language (when available). Its Codility codename is Perm-Missing-Elem.

In a few words: we are given in input a vector of size N containing all the integers (with no duplicates) but one in the [1..(N+1)] range. Our job is returning the missing one. We know that N is 100,000 at most, and we want linear time and constant space complexity.

First step, I have written the function prototype and I have tried to figure out a few test cases to help me in its design and development (I use Google Test, but any xUnit framework would do):
int missing(const std::vector<int>& input);

TEST(TestPME, CaseSample) // 1
{
  ASSERT_EQ(4, missing({ 2, 3, 1, 5 }));
}

TEST(TestPME, CaseEmpty) // 2
{
  ASSERT_EQ(1, missing({ }));
}

TEST(TestPME, CaseThree) // 3
{
  ASSERT_EQ(3, missing({ 2, 4, 1 }));
}
1. This is the test case provided by Codility. The vector is sized four, the biggest value contained could be five, it is easy to see that the missing element is four. A technicality, I have used the handy C++11 notation to create a vector on the fly from an initialization list.
2. It is better to keep an eye on special cases. Here I check what happens when an empty vector is passed. N is zero, so only 1 could be the missing value.
3. There's not much to speculate on this function behavior. But for reasons that will become evident in a little while, it is better to test it for both even and odd input sizes.

Naive solution

We could think of repeatedly scanning the vector looking for the missing value. But we should immediately see that this is not a viable solution. We need to loop for each possible value on all the vector elements, that means we are in the domain of O(N squared) time complexity.
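
Just to show what we are discarding, here is a minimal sketch of the naive approach. The name missingNaive is made up, and the inner loop is hidden in the call to STL find():

```cpp
#include <algorithm>
#include <vector>

// Hypothetical naive version: for each candidate in [1..N+1], scan the whole vector.
int missingNaive(const std::vector<int>& input)
{
    int top = static_cast<int>(input.size()) + 1;

    for(int candidate = 1; candidate <= top; ++candidate)
    {
        // std::find() is a linear scan, so we have a loop in a loop
        if(std::find(input.begin(), input.end(), candidate) == input.end())
            return candidate;
    }

    return 0; // not reachable if the input respects the problem constraints
}
```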

Buying time with space

We can radically improve the time complexity using some more memory. Instead of performing a loop in a loop, we perform one after the other, reducing the time complexity to a mere O(N). Sure we have to pay for it, since we have to store the intermediate result somewhere, increasing the space complexity.

The idea is to scan the input vector, using the value read as index in another vector, to keep track that we have found it.
Then we scan the buffer, as soon as we find an element not set, we know that the original element was not present.

Here is my implementation:
int missing(const std::vector<int>& input)
{
  std::vector<bool> check(input.size()+1); // 1

  for(unsigned i = 0; i < input.size(); ++i) // 2
    check[input[i] - 1] = true;

  for(unsigned i = 0; i < check.size(); ++i) // 3
    if(check[i] == false)
      return i + 1;

  return 0;
}
1. I use as buffer a vector of boolean. Remember that we need one element more than the original vector. This STL vector constructor ensures that all the elements contained are set to the default value, that for boolean is false.
2. Loop on the input vector. Quite frighteningly, I haven't done any error handling, trusting the user to provide good stuff in. This is usually a very bad idea. In any case, I read the current value, I decrease it by one, getting a C-style 0-based index, and set the associated element in the check buffer to true, meaning "yeah, we have got this element".
3. Loop on the check buffer. The first element I find set to false is the one I am interested in (actually, it should be the first and only one). Convert back from zero-based index to actual value, increasing by one, and return it.

This solution is good enough to pass the Codility check, but doesn't look satisfactory to me.

Cheaper and more elegant

We are not so much interested in all the values we have in the input container. The unique value that is missing is what we are really interested in. We can spot it reasoning on the input nature. What we get is basically a scrambled sequence of the first N + 1 natural numbers from which we removed a single value. It is very easy to calculate an arithmetic series, if we don't consider the removal. At least since Gauss explained the trick. His classic formula for arithmetic series is usually written like this:
x = 1/2 * top * (top + 1)
What we do is add the first and last elements, getting top + 1, and multiply the result by half the number of elements. Think about it. To get the sum of all the integers in [1 .. 10], you can add (1 + 10), (2 + 9), ..., (5 + 6). That is 5 * 11.

Once we get the expected result for the full sequence, and remove from it the result we get summing the actual elements, what we get is the missing value:
int missing(const std::vector<int>& input)
{
  int64_t actual = std::accumulate(input.begin(), input.end(), int64_t(0)); // 1

  int64_t top = input.size() + 1;
  int64_t expected = top * (top + 1) / 2; // 2

  return expected - actual; // 3
}
1. The STL accumulate function sums up all the elements in the passed interval, using the third parameter as initial value. I have used as type an explicit 64 bit integer to get rid of problems related to the different platforms on which this code could run. Pay attention to the fact that the size of the container and its biggest element could be 100K, so we could easily reach values in the range of billions.
2. I have rewritten the Gauss formula in this slightly different form to avoid the nuisance of multiplying by a floating point number (1/2 = 0.5) when I know for sure that I am actually working with integer numbers only.
3. Implicit conversion from 64 bit integer to a plain integer (whatever that means in your environment). Since I am sure that the result is not bigger than 100,001, I am playing it safe.
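
Another well-known trick, that I sketch here as a possible alternative and not as part of the solution above, is using XOR instead of sums. Any value XOR-ed with itself gives zero, so if we XOR together all the expected values and all the actual ones, only the missing one survives, and no 64 bit arithmetic is needed. The name missingByXor is hypothetical:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical XOR-based variant, linear time and constant space, no risk of overflow.
int missingByXor(const std::vector<int>& input)
{
    unsigned acc = 0;

    for(std::size_t i = 0; i < input.size(); ++i)
    {
        acc ^= static_cast<unsigned>(i + 1);    // expected values, 1..N
        acc ^= static_cast<unsigned>(input[i]); // actual values
    }
    acc ^= static_cast<unsigned>(input.size() + 1); // the last expected value, N+1

    return static_cast<int>(acc);
}
```

As for the sum-based version, it relies on the input being well-formed.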


Process name by procfs

In a UNIX(-like) environment, the init process is the parent of all the other processes. It is usually created first at startup, with 1 as pid (process identifier). However, there is no guarantee that this is true in a specific environment. In my case, I found out that Ubuntu 13.10, Saucy Salamander, follows a different convention. We have an init process with pid 1, but our user processes are managed by a "user" init.

Here I show how we could get on Linux the name of the executable associated to a specific process.

We get Linux system information checking the /proc (pseudo) file system. There we could also find a subdirectory for each process currently running on the system, identified by its pid. Here we are interested in checking the "comm" file. Notice that "comm" is available only from recent versions of the Linux kernel. In case it is not available on your environment, you could fall back to "cmdline". Besides, the value stored in "comm" is truncated to TASK_COMM_LEN characters, currently 16, including the terminating null byte.
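
In case you need the "cmdline" fallback, here is a minimal sketch of how it could be done. Both function names are hypothetical. The "cmdline" file holds the command line arguments separated by null bytes, so I take the first one and strip its path. Be aware that the result could differ from what "comm" reports, and that it could be empty, for instance for kernel threads:

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: extracts the executable base name from the raw "cmdline" content.
std::string nameFromCmdline(const std::string& raw)
{
    std::string first = raw.substr(0, raw.find('\0')); // first argument only
    std::string::size_type slash = first.rfind('/');
    return slash == std::string::npos ? first : first.substr(slash + 1);
}

// Hypothetical fallback: reads /proc/{pid}/cmdline, returns an empty string on failure.
std::string getProcessNameFromCmdline(int pid)
{
    std::ifstream ifs("/proc/" + std::to_string(pid) + "/cmdline", std::ios::binary);
    if(!ifs)
        return std::string();

    std::string raw((std::istreambuf_iterator<char>(ifs)), std::istreambuf_iterator<char>());
    return nameFromCmdline(raw);
}
```

I kept the parsing in a function of its own, so that it could be tested without relying on the pids available on a specific machine.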

It is easy to write a C++ function that does the trick, a bit more work is required for its C little brother. Here are their prototypes:
std::string getProcessName(int pid); // 1
bool getProcessName(int pid, char* name, int size); // 2
1. C++ lets me define a clean interface. I pass in input the pid, I get back the name. If the pid has no associated process, I expect an empty string as output.
2. The C version is a bit clumsier. I need to pass as input parameter the buffer where to write the name, and its size. To simplify its usage I also return a boolean (you can change it to int, to support older C compilers) reporting success or failure.

As usual, I write a few test cases (for Google Test) to let them drive me in the development:
TEST(ProcessName, GetInitPlus)
{
  ASSERT_STREQ("init", getProcessName(1).c_str()); // 1
  ASSERT_STREQ("init", getProcessName(1680).c_str()); // 2
}

TEST(ProcessName, GetOnePlus)
{
  ASSERT_STREQ("bash", getProcessName(2292).c_str()); // 3
}

TEST(ProcessName, GetMissingPlus)
{
  ASSERT_TRUE(getProcessName(8799).empty()); // 4
}
1. Pid 1 should refer to the init process.
2. My "user" init had that specific pid when I tested my code. You should change it as required.
3. Same as for (2), the pid of my bash shell was 2292, do not expect this test to succeed without proper editing.
4. I had no process with such id, so I expected an empty string as result.

TEST(ProcessName, GetInit)
{
  const int size = 16; // 1
  char buffer[size];
  ASSERT_TRUE(getProcessName(1, buffer, size)); // 2
  ASSERT_STREQ("init", buffer);
}

TEST(ProcessName, GetUserInit)
{
  const int size = 16;
  char buffer[size];
  ASSERT_TRUE(getProcessName(1680, buffer, size)); // 3
  ASSERT_STREQ("init", buffer);
}

TEST(ProcessName, GetOne)
{
  const int size = 16;
  char buffer[size];
  ASSERT_TRUE(getProcessName(2292, buffer, size));
  ASSERT_STREQ("bash", buffer);
}

TEST(ProcessName, GetMissing)
{
  const int size = 16;
  char buffer[size];
  ASSERT_FALSE(getProcessName(8799, buffer, size));
}
1. According to the official documentation, the constant TASK_COMM_LEN should be defined in "linux/sched.h". For some reason, I couldn't find it there or in any linux include. Maybe I have outdated includes on my machine. I didn't investigate much, and I used a local constant instead.
2. I assert that a process should be found and, in the next line, that the name should be "init".
3. As for the C++ version, check the actual pid for the "user" init on your current environment before testing.

Here is the C++11 version of my function:
std::string getProcessName(int pid)
{
  std::ifstream ifs { ("/proc/" + std::to_string(pid) + "/comm").c_str() }; // 1
  if(!ifs.is_open()) // 2
    return {};

  std::string command;
  std::getline(ifs, command);
  return command;
}
1. I try to open an input file stream for the /proc/{pid}/comm file. The C++11 to_string() function converts an integer value to a string, so that its result could make use of the operator+ std::string overload to join it to the plain C-strings on its left and right. Since C++11, ifstream accepts a std::string as well as a plain C-string, so the c_str() conversion on the result is not strictly required.
2. If the file can't be opened, I return an empty string. Otherwise the first line of the file content is extracted and returned.

And this is my C99 version:
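bool getProcessName(int pid, char* name, int size)
{
  char filename[80];
  snprintf(filename, sizeof(filename), "/proc/%d/comm", pid);

  FILE* file = fopen(filename, "r"); // 1
  if(!file)
    return false;

  char* line = fgets(name, size, file); // 2
  fclose(file);
  if(!line)
    return false;

  name[strcspn(name, "\n")] = '\0'; // 3
  return true;
}
1. If I can't open the "comm" file for reading, I return an error, without caring about setting the buffer.
2. Otherwise I use fgets() to read the file content, and I close the file in any case, so that it is not leaked. If nothing could be read, I return an error.
3. I get rid of the backslash-n at the end of the string, if there is one. The strcspn() function, from string.h, gives me its position, or the string length when the name has been truncated and no newline is there.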
