We have a collection of integers (whichever 32 bit signed int is possible, and up to one million of them), we want to know if there is a dominant value in it. Meaning, a value that occurs more than half the times. If so, the index of one of such item (anyone would do) should be returned. Otherwise a -1 would mean "no dominant value".

To make things a bit more spicy, linear time and constant space complexity is required.

I found this problem in the Codility train page, section Leader, under the nickname Dominator. There you would find more details and the free words description of the test case I have here translated for C++ and GoogleTest:
int solution(const std::vector<int>& input);

TEST(Domi, Given)
    int h[] = { 3, 4, 3, 2, 3, -1, 3, 3 };
    std::vector<int> input(h, h + 8);

    ASSERT_EQ(3, input[solution(input)]);
More than half the elements in the collection have value 3, so I expect as output the zero-based index of an element whose value is 3.

Not considering the space complexity requirement, is not difficult to find an (almost) suitable solution.

What I would do, is creating a map where the values of the input vector are associated to a counter that keep track of their sequence. Or better, since we want to know also the index of one of such elements, besides the counter I am going to store also the first index I get.

Then it would be just a matter of scanning the map.

Here is a possible C++98 implementation:
typedef std::pair<unsigned, unsigned> CountInd; // 1
typedef std::map<unsigned, CountInd> Counter; // 2

int solution(const std::vector<int>& input)
    Counter counter;
    for (unsigned i = 0; i < input.size(); ++i)
        Counter::iterator pos = counter.find(input[i]);
        if (pos != counter.end()) // 3
            counter[input[i]] = std::pair<unsigned, unsigned>(1, i);

    for (Counter::const_iterator it = counter.begin(); it != counter.end(); ++it)
        if (it->second.first > input.size() / 2)
            return it->second.second;

    return -1;
1. The value in the map would be a pair of frequency and index in the original vector.
2. I don't care about keeping the elements in the map ordered. If I were allowed to use C++11, the STL unordered_map would have been a much better choice. I am paying in term of time complexity something that I am not using. On the other hand, defining a custom hash map for this problem would be kind of an overkilling.
3. If I have already added an element in the map for the current value, I have just to increase its counter. Otherwise I add a new element where the key is the current input value, and the value is a pair having as first the current counter for this element (one), and as second the index to the current element in the input vector.

This solution scores full marks on Codility, but we know that is not the right answer. Time complexity is so close to be linear that their tested didn't detect the logarithmic multiplicator caused by the STL map. The real issue is the space complexity (even though not detected, too) that is linear instead of constant.

To achieve this result, we could apply an astute algorithm that you could see described in this linked document from Codility, where is given in a Python implementation in a function called goldenLeader.

I am going to use this hint to refactor my solution.

No comments:

Post a Comment