Codility Distinct

We want to know how many distinct values in a passed integer vector.

This simple problem is available also on the Codility Sorting section. This fact is a huge hint to solve it, because sorting it before do any check makes our job a lot easier.

Reading the codility description, you could also find the description of a test case, that here I have ported to GoogleTest in C++11:
TEST(Distinct, Given)
{
    std::vector<int> input { 2, 1, 1, 2, 3, 1 };
    ASSERT_EQ(3, solution(input));
}
There are three distinct values in the input vector, so our function should return 3.

Here is a possible solution:
int solution(std::vector<int>& input)
{
    if(input.size() < 2) // 1
        return input.size();

    std::sort(input.begin(), input.end()); // 2

    int result = 1;
    int prev = input.front(); // 3
    std::for_each(input.begin() + 1, input.end(), [&](int cur){ // 4
        if(cur != prev) // 6
        {
            ++result;
            prev = cur;
        }
    });

    return result;
}
1. Trivial cases.
2. We are modifying the input data. This is not commonly seen as a good idea, since the caller usually doesn't want having such kind of surprises. On codility problems we usually assume that performances are of paramount importance.
3. There is at least a distinct value in the vector, the first one.
4. Loop on all the other values. Notice that I capture by reference the external values, so that I could change them in the lambda function.
5. Each time I see a new value, increase the result and keep track of the new value that I have to check.

Unique

The above solution works fine, still, meetingcpp tweeted me, what about std::unique?

Right, std::unique(). Here you can see it in action:
int solution(std::vector<int>& input)
{
    if(input.size() < 2)
        return input.size();

    std::sort(input.begin(), input.end());
    return std::unique(input.begin(), input.end()) - input.begin();
}
Isn't that a beauty?

The STL unique() function rearrange the elements in a sorted collection moving the duplicates to its end and returning an iterator to the new logical end - the actual collection size is not changed.

3 comments:

  1. #include
    int solution(vector &A) {
    // write your code in C++11
    set myset;
    for(auto&a : A) myset.insert(a);
    return myset.size();
    }

    ReplyDelete
  2. int solution(vector &A) {

    set distinct(A.begin(), A.end());
    return distinct.size();

    }

    ReplyDelete
    Replies
    1. It's a pity that copy and paste ruined the code for both solution, still it is clear the idea behind. There is no need of sorting, just think to the set container.
      A set has the cool property of discarding duplicates, so I should just copy the elements from the vector to the set, and then checking how many elements make it to it.
      But now that I think about it, we can do even better. If we use unordered_set, the underlying data structure would be an hash table (and not a balanced tree as for a plain set), so we could enjoy linear time execution.
      The dark side of this solutions is that they have a substantially higher cost in term of space, still usually we are happy to pay that price.
      Thank you to both contributors (and sorry for the late reply, especially to the first one).

      Delete