Pages

Majority element

Given a list of integers, return 1 if there is an element that appears more than half the time in the list, otherwise 0.

Naive solution

Loop on all the items in the list, and count each of them. If we find a result that is bigger than half the size of the list, we have a positive result.

We can try to apply some improvement to this algorithm, for instance, we can stop checking as soon as we reach the first element after the half size, and we don't need to check each element from the beginning of the list, since if there is an instance of that element before the current point we have already count it, and if there is not, there is no use in check it. Still, the time complexity is O(n^2), so we can use it only for very short input lists.

Divide and conquer

We divide the problem until we reach the base case of one or two elements - that could be solved immediately. Then we "conquer" this step determining which element is majority in the current subinterval. I described in detail my python code that implements this approach in the previous post.

Sorting

This is a simple improvement of the naive solution, still it is powerful enough to give better results than the divide and conquer approach.

The idea is sorting the input data, and then compare the i-th element against the i-th + half size of the list. If it is the same, that is a positive result. The most expensive part of this approach is in the sorting phase, O(n log n).

Hash table

If we don't mind to use some relevant extra space, we can scan the input list pushing in a hash table each new element that we find, increasing its value each time we see it again. Than we scan the values in the hash table looking for a more than half list size one.

The python Counter collection allows us to solve this problem as a one-liner:
def solution_hash(data):
    return True if Counter(data).most_common(1)[0][1] > len(data) // 2 else False
The time complexity is linear.

Boyer–Moore

The Boyer–Moore algorithm leads to a fun, elegant, and superfast solution.

We perform a first scan of the list, looking for a candidate:
def bm_candidate(data):
    result = None
    count = 0
    for element in data:  # 1
        if count == 0:  # 2
            result = element
            count = 1
        elif element == result:  # 3
            count += 1
        else:
            count -= 1

    return result
1. Loop on all the elements
2. If the candidate counter is zero, we get the current element as candidate, setting the count to one.
3. If we see another instance of the current candidate, we increase the counter. Otherwise we decrease the counter.

This scan ensures to identify the majority element, if it exists. However, if there is not a majority element, it returns the last candidate seen. So we need to perform another linear scan to ensure that the candidate is actually the majority element.

Nonetheless, Boyer–Moore wins easily against the other seen algorithms.

See my full python code for this problem, and the other from the same lot, on GitHub.

No comments:

Post a Comment