Couples and Single

A vector of integers is given. All values are in the range [1..100,000,000] and present in couples, but one that is single. Return it.

A C++11 GoogleTest test case to clarify the problem:
TEST(CouplesAndSingle, Simple)
{
    ASSERT_EQ(42, solution({ 1, 42, 1 }));
}
My solution uses an hash table where I store each new value value I see in input. If I see it a second time, I remove it from the table. In the end, I should have just one element in the hash table, the single value. It is not required, however I decided to return 0 in case of unexpected input.

Here is how I have implemented it:
int solution(const std::vector<int>& input)
{
    std::unordered_set<int> buffer; // 1
    for(int value : input) // 2
    {
        if(value < 1 || value > 100000000) // 3
            return 0;

        auto it = buffer.find(value); // 4
        if(it == buffer.end())
            buffer.insert(value);
        else
            buffer.erase(it);
    }
    return (buffer.size() != 1) ? 0 : *buffer.begin(); // 5
}
1. STL unordered_set implements hash table in C++11.
2. For each element in input.
3. Paranoid check.
4. If the current value is not in buffer, insert it. Otherwise remove it.
5. The single value is the first (and unique) element in buffer.

6 comments:

  1. That solution is not suitable for large vectors. If you have enough RAM check it on the amount of data in the 1-2GB with values in the array, which first increase monotonically till the array middle, and then again increase monotonically(starting with the same value as in previous part) till the end.
    Better solution is to xor all array elements - result is the required unique value [O(n) - time complexity, O(1) - space complexity].

    ReplyDelete
    Replies
    1. XOR-ing is a smarter solution, thank you for pointing it out. I am not so happy with it just for my lack of trust in input data. It works fine assuming that we actually have a bunch of pairs and just one single element. For more singles it returns some unexpected value.
      Moreover, if you pay attention to the problem definition, you should see how it refers to relatively small vectors. In the worst case the hash table would reach a size of 100 million before we start removing elements.
      Maybe it was worthy to add a check on the vector size (less than 200 millions) as first line of my solution.

      Delete
  2. would it be faster to sort input and check if input[i]==input[i+1] in a 2-step loop ?

    ReplyDelete
    Replies
    1. Nice try but no. Sorting can't cost you less than a O(n lg n) execution time, and that would become the dominant operation in your algorithm. Insert and lookup in a hash table are expected to be constant time operations, so my proposal runs in linear time.

      Delete
  3. Replies
    1. Oh, come on, anonymous. Writing in all caps is considered rude, don't you know?
      And, please, read my answer to the first comment.

      Delete