Is it a permutation?

We want to write a function that gets in input a vector and check if it contains a permutation of the sequence of natural integers starting with 1.

This is an input that we should accept: { 4, 1, 3, 2 }
And this one should be rejected: { 4, 1, 3 }

You could find this problem in the Codility Train page, section Counting Elements, codename Perm-Check. You can submit you solution in one of a few different supported programming languages, to check it against their acceptance test.

My solution is written in C++98 (alas C++11 is not supported) with test cases for GoogleTest.

Test cases

A couple of test cases are given in the problem presentation, I just added a couple of trivial ones more:
int solution(std::vector<int>& input); // 1

TEST(PeCe, GivenGood) // 2
{
    std::vector<int> input;
    input.push_back(4);
    input.push_back(1);
    input.push_back(3);
    input.push_back(2);

    EXPECT_EQ(1, solution(input));
}

TEST(PeCe, GivenBad) // 3
{
    std::vector<int> input;
    input.push_back(4);
    input.push_back(1);
    input.push_back(3);

    EXPECT_EQ(0, solution(input));
}

TEST(PeCe, OneGood) // 4
{
    std::vector<int> input(1, 1);

    EXPECT_EQ(1, solution(input));
}

TEST(PeCe, OneBad) // 5
{
    std::vector<int> input(1, 42);

    EXPECT_EQ(0, solution(input));
}

TEST(PeCe, OneBigBad) // 6
{
    std::vector<int> input(1, 1000000000);

    EXPECT_EQ(0, solution(input));
}
1. The function prototype we have to implement. For some reason, instead of returning a boolean, it returns an integer that would act as a K&R C boolean emulation, 0 means false, 1 true.
2. First given test, it should detect the permutation.
3. Second given test, no permutation in input.
4. Simplest possible good case.
5. Simplest possible bad case.
6. A curious case. We should consider the case we have huge integers in input. Up to one billion, actually. This is a bit strange, since the max expected size for the vector is just a modest 100 thousand, however we should expect some trickery in the Codility acceptance test based on this point.

A too expensive solution

What we can think is repeatedly scan the input up looking for all the sequence values. If we don't find an expected one, we return failure, otherwise the input is accepted. We should smell immediately something bad. Looping on all the N elements of a container, checking for N values, leads to a Big Oh N Squared worst case time complexity, that we should avoid.

In any case, here it is:
int solution(std::vector<int>& input)
{
    for(unsigned i = 1; i <= input.size(); ++i) // 1
    {
        bool found = false;
        for(unsigned j = 0; j < input.size(); ++j) // 2
        {
            if(static_cast<unsigned>(input[j]) == i) // 3
            {
                found = true;
                break;
            }
        }
        if(!found)
            return 0;
    }

    return 1;
}
1. Our scrambled sequence should contains all the integers from one up to the number of elements in input.
2. Let's look for the current value.
3. The vector in input should have been parametrized for unsigned values, Codility decided otherwise, so here I say explicitly to the compiler not to worry about comparison with what it perceives as objects of different types. Trust me, they are actually both unsigned ints.

The code works fine for small input, but it rapidly gets too slow to be acceptable when the input size grows.

Linear solution

Let's divide the job in two consecutive steps, and use a buffer to store the temporary result. In this way, instead of having one loop nested in another one, we'll have one loop after the other, reducing the time complexity to O(N). We pay this improvement increasing the algorithm space complexity, but we happy to pay this price:
int solution(std::vector<int>& input)
{
    std::vector<bool> buffer(input.size()); // 1

    for(unsigned i = 0; i < input.size(); ++i) // 2
    {
        unsigned value = input[i]-1;
        if(value < input.size()) // 3
            buffer[value] = true;
        else
            return 0;
    }

    return std::count(buffer.begin(), buffer.end(), false) == 0 ? 1 : 0; // 4
}
1. Flags for the expected values in the input vector. The elements of a container are initialized with the default value for the underlying type. So in this moment here we have a bunch of false elements.
2. Loop on all the input values.
3. The OneBigBad test case we have seen above was written to help me to remember to write this check. If input contains (at least) a value that is bigger than expected, we know that we have to reject it. Only if the value is in the correct range we set its flag to true. Moreover, notice that in the line above I have also silently converted the value from signed to unsigned, if a rouge input included a negative value, it has been converted there to a huge positive numbers, and here the anomaly is caught. Better would have been to impose that the input vector was of unsigned elements.
4. I have hidden the second loop that uses the buffer partial result in this call to the STL count() function. We expect all no flag set to false anymore. If this is what happens, we can return success.

Sleeker and (maybe) faster

Actually, as suggested by Karsten, see his comments below, there is no actual need of the final buffer checking, if we have already carefully checked each value in input. The advantage of moving the check here is that we can fast fail as soon as we detect a duplicate element in input:
int solution(std::vector<int>& input)
{
    std::vector<bool> buffer(input.size());

    for(unsigned i = 0; i < input.size(); ++i)
    {
        unsigned value = input[i]-1;
        if(value < input.size() && buffer[value] == false) // 1
            buffer[value] = true;
        else
            return 0;
    }

    return 1; // 2
}
1. If we see that the flag has been already set, it means that we have spotted a duplicate, so we can reject the current input.
2. If we get here, each element in input has been accepted, we just have to return success.

7 comments:

  1. You can speed up your algorithm if you check in line 8 if buffer[3] is already true. In this case you have a duplicate and you can return true.

    ReplyDelete
    Replies
    1. Sorry, i mean buffer[ value ] == true. If this is the case you can return false.

      Delete
    2. Yes, Karsten, thank you. You are right, testing for duplicates is cheap, and it could save substantial computational time. I have modified the code following your suggestion.

      Delete
    3. If you have tested for duplicates and have checked that no values larger then N exist, I think you can also remove the last line and simply return true.

      Delete
    4. Absolutely right. The final check on flags doesn't make any sense anymore.

      Delete
  2. PHP 100/100 for PermCheck Codility : Ajeet Singh

    function solution($A) {
    $count_of_elements = array_fill(0, count($A)+1, null);
    for($i=0; $i < count($A); $i++ ){

    $count_of_elements[$A[$i]] += 1;

    if($A[$i] > count($A) || $A[$i] <= 0 || $count_of_elements[$A[$i]] > 1)
    return 0;

    }
    if(array_sum($count_of_elements) < count($A))
    return 0;
    return 1;
    }

    ReplyDelete
    Replies
    1. Thank you for contributing, Ajeet.

      Delete