
HackerRank Array Manipulation

We have an array of integers of a given size, which could be on the order of tens of millions, with all elements initialized to zero.
We modify it many times, up to tens of thousands, each time adding a positive number to a given subinterval.
At the end, we want to know the highest value stored in the array.

This is a Data Structures HackerRank problem. Below I show a naive solution and a smarter one, both implemented in Python.

An example

First, I wrote a test following the provided example. As a welcome side effect, it helped me clarify the problem to myself.
Given an array of size 5, let's modify it three times: add 100 to the first and second elements, then 100 to the elements from the second up to the fifth, then again 100 to the third and fourth.

The array should go through these states:
  0   0   0   0   0
100 100   0   0   0
100 200 100 100 100
100 200 200 200 100
At the end, 200 should be the highest value in it.
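A quick way to double-check the states listed above is to replay the three updates on a plain Python list, treating each interval as 1-based and inclusive. This is only a throwaway sketch; the names `updates` and `states` are illustrative and not part of the original code.

```python
# Replay the three updates of the example on a plain list,
# recording the state after each one (1-based, inclusive intervals).
size = 5
data = [0] * size
updates = [(1, 2, 100), (2, 5, 100), (3, 4, 100)]

states = [list(data)]
for first, last, value in updates:
    for i in range(first - 1, last):  # 0-based indices of [first, last]
        data[i] += value
    states.append(list(data))

for state in states:
    print(" ".join(f"{v:3}" for v in state))
print("max:", max(data))
```

The printed rows should match the four states in the table, and the final line should report 200.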

My test code is:
def test_provided_naive_0(self):
    manipulator = NaiveManipulator(5)
    manipulator.set(1, 2, 100)
    manipulator.set(2, 5, 100)
    manipulator.set(3, 4, 100)
    self.assertEqual(200, manipulator.solution())
As I'm sure you have guessed, I have implemented my naive solution as a class, NaiveManipulator, which is initialized by passing the size of the underlying array and has two methods: set() to perform a transformation, and solution() to get the requested value at the end.

Let's see its code.
class NaiveManipulator:
    def __init__(self, sz):
        self.data = [0] * sz  # 1
    
    def set(self, first, last, value):
        for i in range(first-1, last):  # 2
            self.data[i] += value  # 3

    def solution(self):
        return max(self.data)  # 4
1. The array, initialized with the passed size and with all zero elements, is kept in the member named "data".
2. The indices are given as 1-based, so I convert them to 0-based before using them.
3. Each element in the specified interval is increased by the passed value.
4. Just a call to the built-in function max().
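The conversion in note 2 is easy to get wrong by one, so here is a small sanity check of it. `zero_based_indices()` is a hypothetical helper introduced only for this illustration; it uses the same `range(first - 1, last)` expression as set().

```python
# The problem uses 1-based, inclusive intervals [first, last].
# range(first - 1, last) yields exactly the matching 0-based indices.
def zero_based_indices(first, last):
    """Hypothetical helper: 0-based indices covered by a 1-based [first, last]."""
    return list(range(first - 1, last))

print(zero_based_indices(1, 2))  # [0, 1]
print(zero_based_indices(2, 5))  # [1, 2, 3, 4]
print(zero_based_indices(3, 4))  # [2, 3]
```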

This implementation is really naive. It works fine, but only for limited input data.

A more challenging example

What happens if I have thousands of transformations on a moderately large array, where the subinterval sizes are on the order of the array size?

Let's write a test to see it.
def test_naive(self):
    manipulator = NaiveManipulator(1_000)
    for _ in range(2_000):
        manipulator.set(10, 800, 1)
    self.assertEqual(2_000, manipulator.solution())
It works. However, we can start to see that it is getting time-consuming. In test_naive() we have a for-loop, and inside it we call the manipulator's set(), which contains another for-loop. This algorithm has O(N*M) time complexity, where N is the number of transformations and M the (average) size of the subintervals. Having both N and M in the thousands is enough to make this algorithm perform poorly.
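To put a number on the N*M cost, we can count how many single-element additions the naive set() performs for the test_naive workload. This sketch only does the arithmetic; it does not run the manipulator.

```python
# Count how many single-element additions the naive set() performs
# for the workload in test_naive: 2_000 calls, each on the 1-based
# interval [10, 800].
calls = 2_000
first, last = 10, 800

per_call = len(range(first - 1, last))  # elements touched by one set()
total = calls * per_call

print(per_call)  # 791
print(total)     # 1582000
```

So even this modest test performs roughly 1.6 million Python-level additions inside the nested loops.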

Be lazy to be faster

Do we really need to increase all the values in the subinterval each time? Why don't we just record where each increase starts and ends, and then apply all the increases in a single pass at the end? That could save us a lot of time.

I have refactored my NaiveManipulator into a smarter ArrayManipulator, keeping the same interface so that user code is not affected.
import itertools

class ArrayManipulator:
    def __init__(self, sz):
        self.data = [0] * (sz + 2)  # 1
    
    def set(self, first, last, value):  # 2
        self.data[first] += value
        self.data[last+1] -= value

    def solution(self):  # 3
        return max(itertools.accumulate(self.data))
1. The data array changes its meaning. It no longer stores the actual value of each element, but the difference between the current element and the previous one. This is why I increase its size by two, adding a couple of sentinel elements: one at the beginning (slot 0, unused because the indices are 1-based) and one at the end (slot sz+1, which receives the decrement when an interval ends at the last element).
2. Instead of changing all the elements in the subinterval, now I change just the first one, to signal an increment of "value", and the one after the last, to signal that the increment ends there.
3. Most of the work is now done here. I call accumulate() from the standard library itertools module to turn the differences stored in the data array into the actual values, then I pass the result to max() to select the biggest one.
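To see the difference array at work, here is a trace of the original size-5 example using the same two updates per interval as ArrayManipulator.set(), followed by the accumulate() step from solution(). It is an illustrative sketch written inline rather than through the class.

```python
import itertools

# Difference array for the size-5 example: two extra slots, so that
# 1-based index `first` and `last + 1` (up to size + 1) are always valid.
size = 5
data = [0] * (size + 2)
for first, last, value in [(1, 2, 100), (2, 5, 100), (3, 4, 100)]:
    data[first] += value     # the increment starts here
    data[last + 1] -= value  # and is cancelled right after the interval

print(data)                               # [0, 100, 100, 0, 0, -100, -100]
prefix = list(itertools.accumulate(data))
print(prefix)                             # [0, 100, 200, 200, 200, 100, 0]
print(max(prefix))                        # 200
```

Slots 1 to 5 of the running sum reproduce the last state of the naive version, 100 200 200 200 100, while the two sentinels stay at zero. Because all the increments are positive, those zeros never affect the maximum.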

On my machine, this algorithm is about 200 times faster than the previous one on test_naive. That is enough to pass HackerRank's tests.
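The speedup depends on the machine, so here is a sketch for measuring it yourself, after first checking that both classes agree on random inputs. Both classes are copied in so the snippet runs on its own; the helper `run()`, the random sizes and the seed are illustrative choices, not part of the original code.

```python
import itertools
import random
import timeit

class NaiveManipulator:
    def __init__(self, sz):
        self.data = [0] * sz

    def set(self, first, last, value):
        for i in range(first - 1, last):
            self.data[i] += value

    def solution(self):
        return max(self.data)

class ArrayManipulator:
    def __init__(self, sz):
        self.data = [0] * (sz + 2)

    def set(self, first, last, value):
        self.data[first] += value
        self.data[last + 1] -= value

    def solution(self):
        return max(itertools.accumulate(self.data))

def run(cls, size, updates):
    """Hypothetical helper: apply (first, last, value) updates, return the max."""
    manipulator = cls(size)
    for first, last, value in updates:
        manipulator.set(first, last, value)
    return manipulator.solution()

# Cross-check on random small inputs (1-based, inclusive intervals,
# positive values as in the problem statement).
rng = random.Random(42)
for _ in range(200):
    size = rng.randint(1, 30)
    updates = []
    for _ in range(rng.randint(1, 20)):
        first = rng.randint(1, size)
        last = rng.randint(first, size)
        updates.append((first, last, rng.randint(1, 100)))
    assert run(NaiveManipulator, size, updates) == run(ArrayManipulator, size, updates)

# Time both on the test_naive workload; absolute numbers depend on the machine.
workload = [(10, 800, 1)] * 2_000
for cls in (NaiveManipulator, ArrayManipulator):
    seconds = timeit.timeit(lambda: run(cls, 1_000, workload), number=5)
    print(f"{cls.__name__}: {seconds:.4f} s for 5 runs")
```

The random cross-check is cheap insurance when replacing a naive algorithm with a clever one: the naive version serves as the reference implementation.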

Full Python code and test case pushed to GitHub.
