
HackerRank Array Manipulation

We have an array of integers of a given size, which could be in the order of tens of millions. It is initialized with all zero elements.
We change it a few times, up to tens of thousands of operations, each time adding a positive number to a given subinterval.
At the end, we want to know the highest value stored in the array.

This is a Data Structures HackerRank problem. Here below I show you a naive solution, and a smarter one, both of them using Python as implementation language.

An example

First thing, I wrote a test following the provided example. As a welcome side effect, it helped me clarify the problem to myself.
Given an array sized 5, let's modify it three times, adding 100 to the first and second elements, then 100 to the elements from the second up to the fifth, then again 100 to the third and fourth.

The array should go through these states:
  0   0   0   0   0
100 100   0   0   0
100 200 100 100 100
100 200 200 200 100
At the end, 200 should be the highest value in it.

My test code is:
def test_provided_naive_0(self):
    manipulator = NaiveManipulator(5)
    manipulator.set(1, 2, 100)
    manipulator.set(2, 5, 100)
    manipulator.set(3, 4, 100)
    self.assertEqual(200, manipulator.solution())
As I'm sure you have guessed, I have implemented my naive solution as a class, NaiveManipulator, that is initialized passing the size of the underlying array, and that has a couple of methods, set() to perform a transformation, and solution() to get the requested value at the end.

Let's see its code.
class NaiveManipulator:
    def __init__(self, sz):
        self.data = [0] * sz  # 1

    def set(self, first, last, value):
        for i in range(first-1, last):  # 2
            self.data[i] += value  # 3

    def solution(self):
        return max(self.data)  # 4
1. The array, initialized with the passed size and with all zero elements, is kept in the member named "data".
2. The indices are given as 1-based, so I convert them to 0-based before using them.
3. Each element in the specified interval is increased by the passed value.
4. Just a call to the built-in function max()

This implementation is really naive. It works fine, but only for limited input data.

A more challenging example

What happens if I have thousands of transformations on a moderately large array, where the subinterval sizes are in the order of the array size?

Let's write a test to see it.
def test_naive(self):
    manipulator = NaiveManipulator(1_000)
    for _ in range(2_000):
        manipulator.set(10, 800, 1)
    self.assertEqual(2_000, manipulator.solution())
It works fine. However, we start seeing how it is getting time consuming. The fact is that in test_naive() we have a for-loop, and inside it we call the manipulator's set(), where there is another for-loop. This algorithm has O(N*M) time complexity, where N is the number of transformations and M the (average) size of the subintervals. It is enough to have both N and M in the order of thousands to get poor performance from this algorithm.

Be lazy to be faster

Do we really need to increase all the values in the subinterval each time? Why don't we just mark where each increase would start and end, and then perform the actual summing just once? That could save us a lot of time.

I have refactored my NaiveManipulator to a smarter ArrayManipulator, keeping the same interface, so to nullify the impact on the user code.
import itertools

class ArrayManipulator:
    def __init__(self, sz):
        self.data = [0] * (sz + 2)  # 1

    def set(self, first, last, value):  # 2
        self.data[first] += value
        self.data[last+1] -= value

    def solution(self):  # 3
        return max(itertools.accumulate(self.data))
1. The data array is going to change its meaning. It no longer stores the actual value of each element, but the difference between the previous element and the current one. This explains why I need to increase its size by two: I add a couple of sentinel elements, one at the beginning, the other at the end.
2. Instead of changing all the elements in the subinterval, now I change just the first, to signal an increment sized "value", and the one after the last one, to signal that we are getting back to the original value.
3. A large part of the job is now done here. I call accumulate() from the standard itertools library to convert the values stored in the data array to the actual values, then I pass its result to max(), to select the biggest one.

On my machine, this algorithm is about 200 times faster than the previous one on test_naive. Enough to pass the HackerRank scrutiny.
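If you want to reproduce the comparison, a rough sketch like this one would do; it assumes both classes above are in scope, run() is just a helper of mine, and the exact numbers will obviously vary from machine to machine:

import timeit

def run(manipulator_class):
    # same workload as test_naive: 2,000 transformations on a 1,000-sized array
    manipulator = manipulator_class(1_000)
    for _ in range(2_000):
        manipulator.set(10, 800, 1)
    return manipulator.solution()

naive = timeit.timeit(lambda: run(NaiveManipulator), number=1)
smart = timeit.timeit(lambda: run(ArrayManipulator), number=1)
print(f"naive: {naive:.3f}s, smart: {smart:.3f}s, ratio: {naive / smart:.0f}x")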

Full python code and test case pushed to GitHub.


HackerRank Java Dequeue

Given n integers in input, consider each window of size m in it, where m is less than or equal to n, and count the number of unique elements in it. Return the maximum such count.

I found this problem in the HackerRank Java Challenges section under the name Java Dequeue, a hint that a Deque would be helpful to solve it elegantly.

The HackerRank settings imply that data is passed to us from standard input. To develop a method that could be easily tested, I made it accept an InputStream as a parameter; then I create a Scanner on it, opened in a try-with-resources block.
public static int solution(InputStream is) {
    try (Scanner scanner = new Scanner(is)) {
        // ... see below
The first two integers we expect in the input stream are the above described n and m. A problem constraint states that m is not bigger than n; I make it clear in the code with an assertion.
int n = scanner.nextInt();
int m = scanner.nextInt();
assert m <= n;
Now I should be ready to get the next n integers and work on them. The big hint in the problem name suggests using a Deque. But which one? ArrayDeque looks promising, but it is a flawed choice here, since we'll be pushing almost all items in from one side and popping them from the other; this would lead to useless continuous data shifting in it. Much better to use a LinkedList.

Since we work with the count of items in the window, a second container would be useful. We'll push any item entering the window in a hash table, storing as value its current count. When an item exits the window we decrease its count in the hash. If the count is zero, we remove it.

Since we are going to call our method many times in a row, it would be a waste to create a deque and a hash each time. Better to make them class data members, and simply reset them when the method is called.
public class Solution {
    private static Deque<Integer> window = new LinkedList<>();
    private static Map<Integer, Integer> counter = new HashMap<>();

    public static int solution(InputStream is) {
        // ... see above

        // ... see below
Let's fill up the window for the first time. The hash map 'counter' is set up as described above.
for (int i = 0; i < m; i++) {
    int in = scanner.nextInt();
    window.add(in);
    counter.merge(in, 1, Integer::sum);
}

int result = counter.size();

// ... see below
To count how many different integers are in the current window, I simply check the size of the hash map.

Now it is just a matter of iterating on all the other integers in the input stream.
for (int i = m; i < n && result < m; i++) {
    Integer out = window.remove();
    Integer in = scanner.nextInt();
    window.add(in);

    // ... see below
}

return result;
I have started looping on the m-th item, willing to go up to the n-th. But, wait, if I find a window with all different numbers, I have already found the problem solution, so the loop can also stop when result reaches m. That could save some CPU time.

Firstly I have adjusted the window, removing its first element and adding a new last element. The exiting and entering elements are used to modify the counter map.
if (out.intValue() != in.intValue()) { // 1
    counter.merge(in, 1, Integer::sum);  // 2
    counter.merge(out, -1, (a, b) -> a == 1 ? null : a + b); // 3
    result = Math.max(result, counter.size()); // 4
}
1. If what enters the window is the same as what exits from it, I don't have to do anything.
2. I merge the item 'in' into the map. That means: if the map contains 'in', Integer::sum is used to adjust its value (applied to its current value and the '1' I passed in); otherwise a new item is created in the map, key 'in', value '1'.
3. Similar to the previous line, but now I'm performing a sort of 'merge down'. I'm not satisfied with this line I wrote, even though it is kind of fun. Its point is that we know for sure that 'out' is in the map, so we know that we are going to execute the lambda passed to merge(). It returns null if the value associated to 'out' is 1, removing it; otherwise it returns a + b, but b is set to -1, so it decreases it. The weak spot is: what if 'out' is not in the map? Well, merge() would push it in, with value -1. Damn it. There is no way to avoid this inconsistency, since merge() would throw an exception if null is passed as the value. End of the story is, I would not use this line in production code.
4. Maybe this check-and-set looks a bit too pythonic, what do you think about it?

Full Java code and test case available on GitHub.


HackerRank Java Sort

We have a Student class, containing an int, a String and a double data member. We want to read a bunch of Students from a data stream, push them all into a list, sort them by the double (decreasing), then by the string, and then by the int. In the end we'll output just the student names.

This apparently boring little HackerRank problem shows how much more fun Java has become since functional programming was introduced in it.

I won't touch the Student class, limiting my job to the main class, where I create a Scanner on the input stream. Once I have read the number of students to work with, Stream is my good friend.
Stream.generate(() -> new Student(sc.nextInt(), sc.next(), sc.nextDouble())) // 1
    .limit(count) // 2
    .sorted(Comparator // 3
        .comparingDouble(Student::getCgpa).reversed() // 4
        .thenComparing(Student::getFname) // 5
        .thenComparingInt(Student::getId)) // 6
    .map(Student::getFname) // 7
    .forEach(System.out::println); // 8
1. The stream comes up from data provided on the fly, read from the scanner, named sc. I read an int, a string, and a double from it, and I shovel them into the Student ctor, used in a lambda as the supplier for the stream generate() method.
2. I don't want generate() to be called forever; the stream size is limited by the number of students, which I have previously read into 'count'.
3. I have to sort my stream, to do that I provide a Comparator.
4. Firstly I compare (primitive) doubles as provided by the Student.getCgpa() method. But, wait, I don't want the natural order, so I reverse it.
5. Second level of comparison, the Student first name.
6. Third level, the (primitive) int representing the Student id should be used.
7. Almost ready to print the result. Since I'm interested only in the student names, I map each student to her/his name only.
8. Loop on the stream, containing the opportunely sorted student names, and print each one on a new line.

You could get the full Java code from GitHub.


HackerRank Java 1D Array (Part 2)

Don't be misled by its puzzling name; this fun little HackerRank problem has its main interest not in its implementation language (I should admit that I designed my solution thinking more pythonesquely than javaly) nor in the use of the array data structure, but in the algorithm you need to think up to solve it.

There is a one dimensional board. Each cell in it could be set to zero or one. At startup the pawn is placed on the first cell to the left, and its job is to get out to the right. It could be moved only onto zero cells, forward by 1 or 'leap' positions, and backward just by one.

Having specified the zeros and ones on the board, and the value of leap (non negative and not bigger than the board size), can we say if the game can be won?

It looks to me that Dynamic Programming is a good approach to use here. The problem splits naturally into a series of simple steps that contribute to the requested solution. There's one issue that muddies the water, the backward move. The rest of the algorithm is pretty straightforward. Start from the last cell on the board. Determine if a pawn could be placed there and if this would lead to a win. Move to the next one on its left, up to the leftmost one. Then return the win condition for the first cell.

I implemented my idea in this way using, as required, Java as implementation language.

Firstly, I create a cache to store all the intermediate values:
boolean[] memo = new boolean[game.length];
if (game[game.length - 1] == 0)
    memo[memo.length - 1] = true;
They are all initialized to false; the last one is set to true, but only if a pawn could go there.

Having set this initial condition, I could loop on all the other elements.
for (int i = memo.length - 2; i >= 0; i--) {
    // ... see below
}

return memo[0];
In the end, the first cell of my cache would contain the solution.

Each cell in the board is set to true if one of these conditions holds:
if (game[i] == 0 && (i + leap >= game.length || memo[i + 1] || memo[i + leap])) {
    memo[i] = true;
    // ... see below
First thing, check the current value in the board. If it is not zero, I can't put the pawn there, so there is nothing to do, the cache value surely stays set to false.
Otherwise, one of the following three conditions lead to true in cache:
- adding the current position to the leap value pushes the pawn out of the board
- the next cell in the board is a good one
- the next leap cell is a good one.

Now the funny part. When I set to true a cell, I should ensure that a few cells to the right are marked as good, too. This is due to the backward move. It is a sort of domino effect, that should stop when we find a cell where our pawn can't go.
for (int j = i + 1; j < memo.length && game[j] == 0; j++)
    memo[j] = true;
Full Java code and test case pushed on GitHub.


Partitioning Souvenirs (patched)

Given a list of integers, we want to know if there is a way to split it evenly in three parts, so that the sum of each part is the same as the other ones.

Problem given in week six of the edX MOOC Algs200x Algorithmic Design and Techniques by the UC San Diego.

My first solution, as you can see in a previous post (please check it out for more background information), was accepted in the MOOC but didn't work in a few cases, as Shuai Zhao pointed out.

So, I added the Shuai test case to my testing suite, and then a couple of other ones, that I felt would help me in writing a more robust solution:
def test_shuai(self):
    self.assertFalse(solution([7, 2, 2, 2, 2, 2, 2, 2, 3]))

def test_confounding_choice(self):
    self.assertFalse(solution([3, 2, 2, 2, 3]))

def test_duplicates(self):
    self.assertTrue(solution([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]))
Firstly, I rewrote the initial check, that now looks in this way:
def solution(values):
    third, module = divmod(sum(values), 3)  # 1
    if len(values) < 3 or module or max(values) > third:  # 2
        return False
1. Add up the passed values; our target is a third of that, and surely we can't split the list if there is a remainder after the division. Using divmod() we get both values in a single shot.
2. If we have less than three values in input, a remainder when dividing their sum by three, or a value bigger than a third of the sum, our job is already done.

Then, since I saw that the problem was due to the lack of a check on the ownership of values, I added a shadow table to the one I'm using for the Dynamic Programming procedure. I named it 'taken', and it keeps the id of the owner of the corresponding value for the current row. Zero means the value is currently not taken.
table = [[0] * (len(values) + 1) for _ in range(third + 1)]
taken = [[0] * (len(values) + 1) for _ in range(third + 1)]
As before, I apply the standard DP procedure, looping on all the "valid" elements.
for i in range(1, third + 1):
    for j in range(1, len(values) + 1):
        # ...
Another not so substantial change: it occurred to me that I keep track of a maximum of two positive matches in the table, so, if I have already reached that value, there's no need to do any other check, just set the current value to two.
if table[i][j-1] == 2:
    table[i][j] = 2
The next check was already in; I just refactored it out, because the combined if clause it was included in was getting too complicated, and I added the first use of the taken table.
if values[j-1] == i:
    table[i][j] = 1 if table[i][j-1] == 0 else 2  # 1
    taken[i][j] = j  # 2

ii = i - values[j-1]  # 3
1. If the current value is the same as the current row value, I have found a match, so I increase its value in the DP table.
2. And I mark it as taken for the j value in the current row.
3. If j is not an exact match for i, I split it, looking for the difference in the previous column.

Now it's getting complicated. If there is a good value in the previous column, before using it we have to ensure it is available. And, since it could have been the result of a splitting, we need to check on the full row to do our job correctly.
available = True
taken_id = taken[ii][j-1]  # 1
if taken_id:
    for jj in range(1, j):  # 2
        if taken[ii][jj] == taken_id:  # 3
            if taken[i][jj]:
                available = False
1. The current value id from the taken table is zero if it was not used in the 'ii' row.
2. If the current value has a valid id, we need to ensure it has not already been used in the current 'i' row.
3. The current jj is marked as used in the ii row for the id we care about; if it is also taken in the i row, we have a problem: we are trying to use in 'i' a value that has already been used. This was the check I missed.

Good, now I can safely go on with my DP algorithm.
if ii > 0 and table[ii][j - 1] > 0 and not taken[i][j - 1] and available:
    table[i][j] = 1 if table[i][j - 1] == 0 else 2
    # ...
The annoying thing is that I have to set the taken table too:
taken[i][j] = j  # 1
taken[i][j-1] = j

matcher = values[j-1] + values[j-2]  # 2
if taken_id:
    for jj in range(1, len(values)):  # 3
        if taken[ii][jj] == taken_id:
            taken[i][jj] = j
            matcher += values[jj-1]
if matcher < i:
    for jj in range(j-2, 0, -1):  # 4
        if taken[i][jj] or matcher + values[jj-1] > i:
            continue
        matcher += values[jj-1]
        taken[i][jj] = j
        if matcher == i:
            break
1. This is nice and simple. The current j and the previous one should be marked as j.
2. We have to mark all the other values concurring to generate the current sum. These are the two just marked 'j' on this row, all the ones marked with taken_id on 'ii' and, if they are not enough, also the ones free on 'i', until we get to the expected result. To perform this last check I need another variable, which I called 'matcher'.
3. Mark on 'i' as 'j' all the items marked taken_id on 'ii', and adjust matcher accordingly.
4. If needed, loop on the elements to the left looking for the missing values.

And finally:
        # ...
        else:  # 1
            table[i][j] = table[i][j - 1]

return True if table[-1][-1] == 2 else False  # 2
1. This 'else' follows the 'if ii > 0 and ...' above, ensuring the current cell is updated correctly if nothing else happened before.
2. The bottom right cell in our DP table should be set to 2 only if we find a correct 3-way split.

See full python code and test cases on GitHub.


HackerRank Deque-STL

Given an array of integers, find the max value for each contiguous subarray in it sized k. This HackerRank problem is meant to be solved in C++ and, as its name suggests, using a deque.

For instance, if we are given the array {3, 4, 6, 3, 4} and k is 2, we have to consider four subarrays sized 2:
{3,4} {4,6} {6,3} {3,4} 
And the expected solution is
{4, 6, 6, 4}
An adapter

The original HackerRank problem asks to write a function that outputs its result to standard output. I didn't like this requirement much. As a TDD developer, I'm used to letting tests drive the code development, and having to check standard output to verify a function's behavior is not fun. So I slightly changed the function signature, making it return a vector containing the results, and I used the original function as a simple adapter to the original problem. Something like that:
std::vector<int> maxInSubs(int data[], int n, int k)
{
    // ...
}

// ...

void printKMax(int arr[], int n, int k)
{
    auto data = maxInSubs(arr, n, k);
    std::copy(data.begin(), data.end(), std::ostream_iterator<int>(std::cout, " "));
    std::cout << '\n';
}
First (naive) attempt

Just do what we are asked to do. For each subarray find its maximum value and push it to the result vector.
std::vector<int> maxInSubs(int data[], int n, int k)
{
    std::vector<int> results;
    for (int i = 0; i < n - k + 1; ++i)
        results.push_back(*std::max_element(data + i, data + i + k));
    return results;
}
Clean and simple and, when k and n are small, not even too slow. However, for k comparable to a large n we can say bye bye to performance.

Patched naive attempt

We could be tempted to salvage the algorithm explained above, observing that the slowness is due to the repeated calls to max_element(). We could avoid calling it a substantial number of times by checking the values of the elements exiting and entering the current window, for instance in this way:
std::vector<int> results{ *std::max_element(data, data + k) };  // 1

for (size_t beg = 1, end = k + 1; end <= n; ++beg, ++end)  // 2
    if (data[end - 1] > results[results.size() - 1])  // 3
        results.push_back(data[end - 1]);
    else if (data[beg - 1] < results[results.size() - 1])  // 4
        results.push_back(results[results.size() - 1]);
    else  // 5
        results.push_back(*std::max_element(data + beg, data + end));
1. Initialize the result vector with the max element for the first interval.
2. Keep beg and end as loop variable, describing the current window to check.
3. The new right element of the window is bigger than the max for the previous window. Surely it is the max for this one.
4. The element that has just left the window is smaller than the previous max. Surely the max is still in the window.
5. Otherwise, we'd better check which is the current max.

A smartly designed input array could beat this simple algorithm. However, on HackerRank they didn't spend too much time on this matter, and this solution is accepted with full marks.

Solution with a deque

For a more elegant solution, we should minimize the repeated checks we perform on the data elements. Right, but how? Until this moment, I hadn't paid attention to the huge hint HackerRank gave us: "Use a deque!", they shout from the name of the problem itself.

The point is that I want to perform a cheap cleanup on each window, so that I could just pick a given element in it, without scanning the entire interval.

Let's use the deque as a buffer to store only the reasonable candidates for the max. Since we want to remove from this buffer the candidates that are no longer valid when the window moves, instead of their values we keep their indices into the original data array.

Here is how I initialize it:
std::deque<int> candidates{ 0 };  // 1
for (int i = 1; i < k; ++i)
    pushBack(candidates, data, i);  // 2
1. We could safely say that the first element in data is a good candidate as max for its first subarray.
2. Push back to candidates the "i" index from data, but first ensure the previous candidates are useful.

Since the code in pushBack() is going to be used also afterwards, I made a function for it:
void pushBack(std::deque<int>& candidates, int data[], int i)
{
    while (!candidates.empty() && data[i] >= data[candidates.back()])  // 1
        candidates.pop_back();
    candidates.push_back(i);
}
1. There is no use in a candidate, if the newcomer is bigger, so remove it.

Now candidates contains the indices of the elements in the first window of data that could still be the max of some window, in decreasing value order; its front is the max of the first window. Possibly just one element, but for sure the deque is not empty.

We are ready for the main loop:
for (int i = k; i < n; ++i)
{
    results.push_back(data[candidates.front()]);  // 1

    if (candidates.front() <= i - k)  // 2
        candidates.pop_front();

    pushBack(candidates, data, i);  // 3
}
results.push_back(data[candidates.front()]);  // 4
1. As said above, we know that candidates is not empty and its front is the index of a max value in the current window. Good. Push it to results.
2. Now we prepare for the next window. If the front candidate is out, we remove it.
3. Push back the new element index among the candidates, following the algorithm described above. It would kill the candidates that are not bigger than it, ending up with a deque where the biggest element is surely on front.
4. Remember to push the last candidate in the results, and then the job is done.

Does this solution look more convincing to you? Full C++ code and test case on GitHub.


HackerRank Equal

We have a list of integers, and we want to know in how many rounds we could make them all equal. In each round we add the same number, chosen among 1, 2, and 5, to all the items in the list but one. Pay special attention when you solve this problem on HackerRank. Currently (April 2018) its description says each increase should be of one, three, or five. However, this is not what the solution is tested against.
It looks like someone decided to change 2 to 3 a few months ago, edited the description and then forgot to modify the testing part. Who knows what is going to happen next. Be ready to be surprised.
Besides, it is placed in the Algorithms - Dynamic Programming section, but I don't think that is the right approach to follow.

Simplifying the problem

Instead of applying the weird addition process stated in the problem, we could decrease a single item each time. For instance, given in input [2, 2, 3, 7], a solution to the original problem is:
2, 2, 3, 7
+5 +5 +5  = (1)
 7, 7, 8, 7
+1 +1  = +1 (2)
 8, 8, 8, 8
We could solve it in this way instead:
2, 2, 3, 7
 =  =  = -5 (1)
 2, 2, 3, 2
 =  = -1  = (2)
 2, 2, 2, 2
Since we are asked to calculate the number of steps to get to the equal state, in both ways we get the same result.

Base cases

A sort of rough algorithm is already emerging. We get the lowest value in the list, and decrease all the other elements using the available alternatives until we reach it. We start from the biggest step (5) and fall back to the shorter steps only when we are forced to.

To better understand how we should manage the last steps, I put the base cases on paper.
If x is at level 1, 2 or 5, we could get to zero at a cost of 1. But if x is at level 3 or 4, it costs us 2. Notice that if, instead of getting to level zero, we move both the lowest element and the x element to level -2, we get to the result in the same number of moves. For instance, if our input list is [10, 13]:
10, 13
    -2 (1)
    -1 (2)
10, 10
10, 13
-2     (1)
    -5 (2)
 8,  8
If the input list is [10, 13, 13], aiming two levels below the minimum lets us close each gap of 3 in just one move, and that gives us an advantage:
10, 13, 13
    -2      (1)
    -1      (2)
        -2  (3)
        -1  (4)
10, 10, 10
10, 13, 13
-2          (1)
    -5      (2)
        -5  (3)
 8,  8,  8
The algorithm

I think I've got it. Firstly I find the minimum value, M, in the input list. That is a possible target for all the items to reach. However, I have to check two other alternatives, M-1 and M-2.
I loop on all the items in the list. For each of the three possible targets, I calculate the difference between it and the current value, count the number of steps to get there, and add it to the total number of steps required to get to that target.
And then I choose as the result the cheapest target to reach.

The code

Using Python as implementation language, I started with a couple of test cases, then added a few more along the way, when I bumped into troubles, and I ended up with this code.
SHIFT = [0, 1, 2]  # 1

def solution(data):
    lowest = min(data)  # 2

    results = [0] * len(SHIFT)  # 3
    for item in data:
        for i in SHIFT:
            gap = item - lowest + i  # 4
            results[i] += gap // 5 + SHIFT[(gap%5 + 1) // 2]  # 5
    return min(results)  # 6
1. Represents the three possible targets, from the minimal value in the list down to -2.
2. Get the minimal value in the list.
3. Buffer for the results when using the different targets.
4. Distance that has to be split.
5. Add the current number of steps to the current buffer. Firstly I get the number of long moves by dividing gap by 5. Now I am in the base case, as shown in the picture above. Notice that the cost of moving from x to the target is [0, 1, 1, 2, 2] for gap in [0..4]; if we take gap modulo five, increase it by one and then divide by two, we get the index in SHIFT of the number of steps actually needed (see also the sketch after this list). Admittedly an obscure way to get there; if this were production code, I would probably have resisted the temptation to use it.
6. Get the minimal result and return it.
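
If that expression feels too obscure, here is an equivalent, more explicit way to compute the per-item cost. It is just a restatement of point 5 with my own (hypothetical) naming, not the code I submitted:

STEPS_FOR_REMAINDER = [0, 1, 1, 2, 2]  # moves of size 1/2/5 needed to cover a remainder of 0..4

def steps(gap):
    # minimum number of 5/2/1 decreases needed to cover 'gap'
    return gap // 5 + STEPS_FOR_REMAINDER[gap % 5]

# gaps 0..7 cost 0, 1, 1, 2, 2, 1, 2, 2 moves respectively
assert [steps(g) for g in range(8)] == [0, 1, 1, 2, 2, 1, 2, 2]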

All tests passed on HackerRank; python script and test case pushed to GitHub.


HackerRank Roads and Libraries

We are given n nodes and a (possibly huge) number of edges. We are also given the cost of building a library in a city (i.e. a node) and a road (i.e. an edge). Based on these data we want to minimize the cost of creating a forest of graphs from the given nodes and edges, with the requirement that each graph should have a library on one of its nodes. This is a HackerRank problem on Graph Theory algorithms, and I am about to describe my python solution to it.

If a library is cheaper than a road, the solution is immediate. Build a library on every node.
def solution(n, library, road, edges):
    if road >= library:
        return n * library

    # ...
Otherwise, we want to create a minimum spanning forest, so as to minimize the number of roads, keeping track of the number of edges used and how many graphs are actually generated. I found it natural to use an adapted form of the Kruskal MST (Minimum Spanning Tree) algorithm, which looks very close to our needs.

Kruskal needs a union-find to work, and this ADT is not commonly available out of the box. So, I first implemented a python UnionFind class, see the previous post for details.
Then, while working on this problem, I made a slight change to it. My algorithm was simpler and faster if its union() method returned False when nothing was actually done in it, and True only if it led to a join of two existing graphs.
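The real class is the one described in the previous post; to make this snippet readable on its own, here is a minimal sketch of the interface it relies on - a count member holding the number of disjoint components and a union() that returns a boolean. The internals (plain path halving, arbitrary root choice) are my own simplification, not necessarily what I pushed to GitHub:

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.count = n  # number of disjoint components still around

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return False  # nothing done, a and b were already connected
        self.parent[root_b] = root_a
        self.count -= 1
        return True  # two components have actually been joined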

Using such refactored UnionFind.union(), I wrote this piece of code based on Kruskal algorithm:
uf = UnionFind(n)
road_count = 0  # 1

for edge in edges:
    if uf.union(edge[0] - 1, edge[1] - 1):  # 2
        road_count += 1  # 4
        if uf.count == 1:  # 5
            break
1. The union-find object keeps track of the number of disjoint graphs in the forest, but not of the edges. This extra variable does.
2. I need to convert the edges from the 1-based to the 0-based convention before using them. If the two nodes get connected by this call to union(), I have some extra work to do.
4. An edge has been used by union(), keep track of it.
5. If union() connected all the nodes into a single graph, there is no use in going on looping, so I break out.

Now it is just a matter of adding the cost for roads and libraries to get the result.
return road_count * road + uf.count * library

Complete python code for problem, union-find, and test case on GitHub.


HackerRank Climbing the Leaderboard

We are given two lists of integers. The first one is monotonically decreasing and represents the scores of the topmost players on a leaderboard. The second one is monotonically increasing and contains the score history of Alice, a player who rapidly climbed the board.
Following the dense ranking convention, we want to get back a list containing the history of rank positions for Alice.
This is a HackerRank Algorithm Implementation problem, and I am going to show you how I solved it, using Python as implementation language.

I noticed that the first list, scores, is already sorted; we just have to get rid of duplicates to have a matching between each position and the score Alice has to get to achieve that ranking.

Bisect solution

I just have to do the matching. The first idea that jumped to my mind was performing a binary search on scores to do it. It helps that Python provides a well known library for the job, bisect. There's just a tiny issue: bisect expects the underlying list to be sorted in natural order, so we need to reverse our scores.

It looks promising, let's implement it.

A pythonic way to get our ranking would be this:
ranking = sorted(list(set(scores)))
I get the original list, convert it to a set to get rid of duplicates, then back to a list, so that I can sort it in natural order. Nice, but in this problem we are kind of interested in performance, since we could have up to 20 thousand items in both lists. So we want to take advantage of the fact that the list is already sorted.

So, I ended up using this rather uncool code:
ranking = [scores[-1]]
for i in range(len(scores)-2, -1, -1):
    if scores[i] > ranking[-1]:
        ranking.append(scores[i])
I initialize the ranking list with the last item in scores, then I range over all the other indices in the list from right to left. If the current item is bigger than the latest one pushed into ranking, I push it too.

Now I can just rely on the bisect() function from the bisect python module, which finds the position where the current item should be inserted in the list. With a caveat: I have reversed the order, so I have to adjust the bisect() result to get the ranking I'm looking for:
from bisect import bisect

results = []
last = len(ranking) + 1
for score in alice:
    results.append(last - bisect(ranking, score))
This code passes all the tests; job done.

However. Do I really need to pay for the bisect() search for each element of alice?

Double scan solution

Well, actually, we don't. Since we know that both lists are sorted, we can also use the ordering in alice to move linearly in ranking.

Since we are not using bisect anymore, we don't need to revert the sorting order in ranking, and the duplicate cleanup gets a bit simpler:
ranking = [scores[0]]
for score in scores[1:]:
    if score < ranking[-1]:
        ranking.append(score)

Now we compare each item in alice against the items in ranking moving linearly from bottom to head:
results = []
for score in alice:
    while ranking and score >= ranking[-1]:
        ranking.pop()
    results.append(len(ranking) + 1)
We don't have to be alarmed by the nested loops: they don't have a multiplicative effect on the time complexity, since we always move forward on both lists. The result is O(M + N) time complexity.

Is this a better solution than the first one? Well, it depends. We should know more about the expected input. However, for large and close values of N and M, it looks so.

I pushed the python script for both solutions and a test case to GitHub.


HackerRank Divisible Sum Pairs

Given a list of integers, we want to know how many couples of them, when summed, are divisible by a given integer k.

So, for instance, given [1, 2, 3, 4, 5, 6], we have five couples of items with sum divisible by 3:
(1, 2), (1, 5), (2, 4), (3, 6), (4, 5)
This is a HackerRank algorithm problem, implementation section.

Naive solution

Test all the couples, avoiding duplicates. If we check (a1, a2), we don't have to check (a2, a1).

The code I have written for this solution should be immediately comprehensible, even if you are not that much into Python:
result = 0
for i in range(len(values) - 1):  # 1
    for j in range(i+1, len(values)):  # 2
        if (values[i] + values[j]) % k == 0:  # 3
            result += 1
1. Loops on all the indices in the list but the last one.
2. Loops from the next index to the current "i" to the last one.
3. Check the current couple, and increase the result if compliant.

Even if this is what HackerRank was expecting from us (the problem is marked as easy), we can do better than this, considering its disappointing O(N^2) time complexity.

Linear solution

The problem could be restated as counting the couples that, added up, equal zero modulo k. Following this insight, let's partition the items according to their own modulo.
remainders = [0] * k
for i in range(len(values)):
    remainders[values[i] % k] += 1
Each element in the "remainders" list represents the number of items in the original list having as modulo the index of the element.

For the example shown above we'll get these remainders:
[2, 2, 2]
Now, if we add an element having remainder x to an element with remainder k - x, we'll get a number equal to zero modulo k. We want all the possible combinations of the x elements with the k - x ones, so we take the Cartesian product of the two sets, whose size is the product of their sizes.

There are a couple of exceptions to this rule. The elements having modulo zero have to be paired among themselves, and the same happens to the elements having half k as modulo, if k is even. The number of pairs in a set of N elements could be expressed as N * (N-1) / 2.

Putting all together we have this code:
result = remainders[0] * (remainders[0] - 1) // 2  # 1

pivot = k // 2  # 2
if k % 2:
    pivot += 1  # 3
else:
    result += remainders[k//2] * (remainders[k//2] - 1) // 2  # 4

for i in range(1, pivot):  # 5
    result += remainders[i] * remainders[k-i]
1. Initialize "result" using the above described formula for the modulo-zero items.
2. Let's calculate the central element in the list, where we have to stop looping to sum up.
3. If k is odd, we won't have a lone central element, and the pivot should be moved a step to the right.
4. When k is even, the elements having half-k modulo are managed like the zero-modulo ones.
5. Normal cases.

After the for-loop, result contains the answer to the original question.

I pushed a python script with both solutions, naive and remainder based, to GitHub, along with a few tests.


HackerRank DP: Coin Change

We are given in input a list of integers, and another integer that represents a total we should reach by adding up elements from the passed list, each of them usable zero or more times, with no upper limit. The name of the problem is slightly misleading, since the list could contain any positive integers, and we can hardly have any expectation about them, besides their being positive.
I'd say that it is a version of the Partitions of Integers problem where a special condition is imposed on the integers that we can use.

You can find and solve this problem on HackerRank, section Cracking the Coding Interview.

First thing, I took a not completely trivial example and studied it on paper.

Given in input [2, 5, 3, 6] and 10, it is easy to see that the solution is 5:
2 + 2 + 2 + 2 + 2
5 + 5
2 + 3 + 5
3 + 3 + 2 + 2
2 + 2 + 6
The fact that it is marked as DP should put me on the way of looking for a Dynamic Programming solution. So I create a table, reasoning about how to fill it up coherently. Each column represents a total I could get, ranging from 0 to the passed value. I have a row for each number passed in the input list, plus the topmost one, which represents the "no value" case.

The cell in position (0, 0) is set to 1, since I could get 0 from no values in just one way. The other values in the first row are set to zero, since I can't get those totals having nothing to add up. We don't care much what is in the other cells, since we are about to get the right values by construction.

We'll move in the usual way for a dynamic programming problem requiring a bidimensional table: row by row, skipping the zeroth one, from top to bottom, moving from left to right. We could have filled the first column before starting the procedure, since it is immediate to see that there is only one way to get a total of zero, whichever numbers I have at hand. Still, in this case it doesn't help to make the code simpler, so I just keep it in the normal table-filling part.

For each cell what I have to do is:
  • copy the value from the cell above
  • if "cur", the current value associated to the row, is not greater than the current column index, add the value in the cell on the same row but "cur" times to the left
The first point should be clear. Maybe having a new number at hand would give us a new way to get the total, surely it won't reduce the alternatives we have already calculated.
The second point refers to the contribution of the new element. I guess the picture will help understand it.

The arrow pointing down from (0, 0) to (1, 0) means that since having no values leads to have one way to get a sum of zero, this implies that having no value and 2 still gives at least one way to get a sum of zero.
The other arrow pointing down, from (2, 8) to (3, 8) means that having one way to get 8 from no value and [2, 5] implies we still have at least one way to get it from no value and [2, 5, 3].
The arrow pointing left from (1, 0) to (1, 2) means that since we have a way to get zero having a 2, if we add a 2, we have a way to get 2 as a total.
The arrow pointing left from (3, 5) to (3, 8) means that having two ways of getting 5 using [2, 5, 3] implies that we still have two way of getting 5 + 3 = 8. Added with the one coming from the cell above, it explains why we put 3 in this cell.

Following the algorithm, I have written this python code here below:
def solution_full(denominations, total):  # 1
    table = [[0] * (total + 1) for _ in range(len(denominations) + 1)]  # 2
    table[0][0] = 1

    for i in range(1, len(denominations) + 1):  # 3
        for j in range(total+1):
            table[i][j] += table[i - 1][j]  # 4
            cur = denominations[i-1]
            if cur <= j:
                table[i][j] += table[i][j-cur]  # 5

    return table[-1][-1]  # 6
1. In the example, denominations is [2, 5, 3, 6] and total is 10.
2. Table has total + 1 columns and a row for each denomination, plus one. Its values are all set to zero, but the left-topmost one, set to 1.
3. Loop on all the "real" cells, meaning that I skip just the first row. I move in the usual way. Left to right, from the upper row downward.
4. The current cell value is initialized copying the value from the immediate upper one.
5. If there are enough cells to the left, go get the value of the one found shifting left by the value of the current denomination, and add it to the one calculated in the previous step.
6. Return the value in the bottom right cell, that represents our solution.

How to save some memory

Writing the code, I noticed that there is no use in keeping all the rows. The only point where I use the values from the rows above the current one is in (4), and there I use just the value in the cell immediately above the current one. So I refactored the solution in this way:
def solution(denominations, total):
    cache = [0] * (total + 1)  # 1
    cache[0] = 1

    for denomination in denominations:  # 2
        for j in range(denomination, total+1):  # 3
            cache[j] += cache[j-denomination]
    return cache[-1]
1. The memoization here is done just in one row. Initialized as in the previous version.
2. Since I don't care anymore of the row index, I can just work directly on the denominations.
3. Instead of checking explicitly for the column index, I can start the internal loop from the first good position.

I pushed my python script with both solutions and a slim test case to GitHub.


Partitioning Souvenirs

Given a list of integers, we want to know if there is a way to split it evenly in three parts, so that the sum of each part is the same as the other ones.

Problem given in week six of the edX MOOC Algs200x Algorithmic Design and Techniques by the UC San Diego.

This 3-partition problem is not too different from the classic 2-partition one, for which I have described the well known dynamic programming solution in the previous post. As before, we build a table where the rows represent the sums we want to get and the columns the elements in the collection we are about to consider.
However, we have to change a bit the meaning of the value that we push in each cell. This time we check two of the three tentative subcollections, and we want to keep track of how many of them could have as sum the row index, given the elements of the input list available in that column.

Consider as example this input:
[3, 1, 1, 2, 2]
We are looking for three subcollections, each having a sum of three. The table has six columns and four rows, including a first dummy one. We initialize all its cells to zero, and we loop on all the "real" cells applying rules close to the ones we have seen for the 2-partition problem, with slight variations.
(a) If the column element matches the row index, I increase the value of the left-hand cell, up to reach 2.
(b) If there is not a match, but the column element added to the previous one matches it, I still increase the value of the left-hand cell, up to reach 2.
(c) Otherwise, I copy the value in the left-hand cell to the current one.
The result should be reflected by this table:
And the answer to the original question is yes only if the bottom-right value in the table is two.

Here is my python code to implement this algorithm.
def solution(values):
    total = sum(values)
    if len(values) < 3 or total % 3:  # 1
        return False
    third = total // 3
    table = [[0] * (len(values) + 1) for _ in range(third + 1)]  # 2

    for i in range(1, third + 1):
        for j in range(1, len(values) + 1):  # 3
            ii = i - values[j - 1]  # 4
            if values[j - 1] == i or (ii > 0 and table[ii][j - 1]):  # 5
                table[i][j] = 1 if table[i][j - 1] == 0 else 2
            else:
                table[i][j] = table[i][j - 1]  # 6

    return True if table[-1][-1] == 2 else False
1. If dividing the sum of values by three I get a remainder, or if there are less than three elements in the list, for sure there is no way to 3-partition my list.
2. Build the table as discussed above. Note the zero as default value, even in the dummy top row - it is not formally correct, but those values are not used anywhere.
3. Loop on all the "real" cells.
4. Precalculate the row for the (b) check described above.
5. The first part of the condition is the (a) check above. If it fails, we pass to the second part, using the row calculated in (4). If one of the two conditions is true, the value of the current cell is increased up to 2.
6. Vanilla case: moving to the right we keep the value already calculated for the previous cell.

It looks easy, once one sees it, doesn't it?

Actually, a tad too easy, as Shuai Zhao pointed out - see below in the comments. The problem is that the (b) check, as described above, is too simple. Before using a value I have to ensure it has not already been used on the same line. Things are getting complicated; better to explain them in another post.

I pushed my python code and a few test cases to GitHub. The latest version is the patched code, working also for the Shuai test. Get back in the history if you want to see the solution described here.


2-partition problem

Having a list of integers, we want to know if we can split it evenly in two parts.

There is a well known, elegant and relatively fast dynamic programming solution to this problem.

Say that this is the list
[3, 1, 1, 2, 2, 1]
Being the sum of its elements ten, we'll have a positive answer to the problem if we can find a subcollection with a sum of five.

To check it, we build a table having rows from zero to the sum of the subcollection we are looking for - five in this case. Actually, the zeroth row is pretty useless here; I keep it just because it makes indices less confusing in the code. The columns represent the partial collections of elements from the input list: column zero is for the empty collection, column one contains just the first element (3 in the example), column two the first two items (3 and 1 here), up to the last one that keeps them all.

The content in each cell is the answer to the question: is there a combination of elements in the subcollection specified by the column that have a sum specified by the row?

So, for instance, table[2][3] means: could I get 2 as a sum from [3, 1, 1]? The answer is yes, because of the latter two elements.

The bottom-right cell in the table is the answer for the original problem.

Let's construct the table. Whatever I put in the topmost row is alright, since I won't use it in any way. It would represent the answer to the question whether I could get a sum of zero from a collection ranging from empty (leftmost cell) up to including all the elements in the original input (rightmost cell). Logically, we should put a True inside each of them but, since we don't care, I leave instead a somewhat misleading False. Forgive me, but this lets me initialize the table with more ease, considering that the first cell in each row (but the zeroth one) should be initialized with False, since it is impossible to have a sum different from zero from an empty collection.

Now let's scan all the cells in the table, from (1, 1) to the bottom-right one, moving from left to right, row by row.
If the currently added element in the list has the same value as the row index (that is, the total we are looking for), we can put a True in it.
If the cell on the immediate left contains a True, we can, again, safely put a True in it. Adding an element to the collection won't change the positive answer we already have.
If the first two checks don't hold, I check whether the rest of the total, that is, the row index minus the current value, was reachable with the previous subcollection, looking in the left adjacent column. If so, bang, True again.

At the end of the loop, we should get a table like the one here below.
(a) The cell (1, 2) is set to True because the column represents the subcollection {3,1}, having as its latest element the row index.
(b) The cell (1, 4) is True because (1, 3) is True.
(c) The cell (4, 2) is True because of cell (3, 1): the left adjacent column, moving up by 1 (the latest element in the current subcollection {3,1}).

Checking the bottom-right cell we have a True, so the answer to our original question is yes.

Here is my python implementation of this algorithm:
def solution(values):
    total = sum(values)  # 1
    if total % 2:
        return False

    half = total // 2
    table = [[False] * (len(values) + 1) for _ in range(half + 1)]  # 2

    for i in range(1, half + 1):
        for j in range(1, len(values) + 1):  # 3
            if values[j-1] == i or table[i][j-1]:  # 4
                table[i][j] = True
            else:  # 5
                ii = i-values[j-1]
                if ii > 0 and table[ii][j-1]:
                    table[i][j] = True

    return table[-1][-1]  # 6
1. If the sum of values is not an even number, we already know that the list can't be split evenly.
2. Build the table as described above. Don't pay attention to the topmost row, it's just a dummy.
3. Loop on all the "real" cell, skipping the leftmost ones, that are left initialized to False.
4. See above, case (a) and (b) as described and visualized in the picture
5. This code implements the case (c). I get the the tentative row index in ii. If the relative cell on the left adjacent column is available and it is set to True, the current cell is set to True too.
6. Get the solution to the problem.

I pushed my python code and a few test cases on GitHub.


Other Dynamic Programming problems

Sixth and last week of the edX MOOC Algs200x Algorithmic Design and Techniques by the UC San Diego, again on Dynamic Programming. Just three problems, fully described in this pdf.

The first one, named "Maximum Amount of Gold", states that you have a bag of given capacity, and you see n gold bars of (possibly) different weights. Push as much gold as you can into the bag.

It is easy to see that it is a variation on the classic 0/1 knapsack problem. Here the bars all have the same unitary value, so we just need to know their weights to build our solution. Not much sweat to solve it; anyway, I pushed to GitHub first a python script to solve the generic problem, then one tailored to the specific requirements of this problem.
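
Just to give an idea of the kind of solution I mean, here is a minimal sketch of the one-dimensional 0/1 knapsack specialization where weight and value coincide; the function name and the tiny check at the bottom are illustrative only, not necessarily what ended up in the GitHub scripts:

def max_gold(capacity, bars):
    best = [0] * (capacity + 1)  # best[w] = heaviest load achievable with capacity w
    for bar in bars:
        for w in range(capacity, bar - 1, -1):  # backwards, so each bar is used at most once
            best[w] = max(best[w], best[w - bar] + bar)
    return best[capacity]

assert max_gold(10, [1, 4, 8]) == 9  # 1 + 8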

Much more challenging, the other two problems. "Partitioning Souvenirs" is a 3-Partition problem; before solving it, I practiced with the more common 2-partition version. "Maximizing the Value of an Arithmetic Expression" is well described, step by step, in the course, and I guess that I solved it easily only because of this intensive training. You can see my python implementation on GitHub.


Longest Common Subsequence of Three Sequences

And finally, the last one of this group of Dynamic Programming problems. Actually, from the algorithmic point of view this is the least interesting one, being just a variation on the previous one. Now we have in input three sequences instead of two; still, we have to get the longest common subsequence among them.

The interest here is all in extending the algorithm to work with a three-dimensional cache. Basically just an implementation issue, that each programming language could solve in its own way.

Here is how I did it in python:
def solution_dp(a, b, c):
    cube = []
    for m in range(len(c) + 1):  # 1
        sheet = [[0] * (len(b) + 1)]
        for n in range(1, len(a) + 1):
            sheet.append([0] * (len(b) + 1))
        cube.append(sheet)

    for i in range(1, len(cube)):
        for j in range(1, len(cube[0])):
            for k in range(1, len(cube[0][0])):
                if a[j - 1] == b[k - 1] == c[i - 1]:  # 2
                    cube[i][j][k] = cube[i - 1][j - 1][k - 1] + 1
                else:
                    cube[i][j][k] = max(cube[i - 1][j][k], cube[i][j - 1][k], cube[i][j][k - 1])

    return cube[-1][-1][-1]
1. If you compare this code with the one for the 2-sequences problem, you will see that the difference is all in this extra for-loop. Now the cache is a three-dimensional matrix (actually, it is not a cube but a parallelepiped; you could guess why I used the wrong name here).
2. The comparison is now three-way. Luckily, python helps us keep it readable.

Once you manage the three-level loop correctly, the job is done.

I have pushed the complete python script and its test case to GitHub.


Longest Common Subsequence of Two Sequences

Close to the previous problem, where we had to compute the minimum edit distance between two strings; here we have to get the maximum number of common elements, in the same order, between two sequences.

The similarity drives us to look again for a Dynamic Programming solution (adding up to the hint that these problems are in the same lot).

Here is my python solution:
def solution_dp(lhs, rhs):
    table = [[0] * (len(rhs) + 1)]  # 1
    for _ in range(1, len(lhs) + 1):
        table.append([0] * (len(rhs) + 1))

    for i in range(1, len(table)):
        for j in range(1, len(table[0])):  # 2
            if lhs[i - 1] == rhs[j - 1]:
                table[i][j] = table[i-1][j-1] + 1  # 3
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])  # 4
    return table[-1][-1]
1. The cache is created as in the previous problem, bidimensional, with an extra dummy row and column to keep the code simple.
2. Again, we loop on all the "real" cells in the cache, from left to right, top to bottom.
3. Here the algorithm changes. If the corresponding elements in the input sequences match, we put as current value the counter stored in the top-left cell, increased by one.
4. If it is a mismatch, we don't increase anything; we just get the bigger value coming from the two possible alternatives.

And that's all. Richard Bellman, who came up with dynamic programming, was a genius.

Python script and testcase pushed to GitHub.


Computing the Edit Distance Between Two Strings

Given two strings, we should compute their edit distance. It is a well know problem, commonly solved by Dynamic Programming.

As we should expect, the idea is very close to the one seen in the previous problem, with the noticeable difference that here we are working on two lists, so our cache is going to be a bidimensional matrix and the complexity of the algorithm moves to the O(n * m) realm, n and m being the sizes of the two strings in input.

def solution_dp(lhs, rhs):
    table = [[x for x in range(len(rhs) + 1)]]  # 1
    for k in range(1, len(lhs) + 1):
        table.append([k] + [0] * len(rhs))

    for i in range(1, len(table)):
        for j in range(1, len(table[0])):  # 2
            if lhs[i - 1] == rhs[j - 1]:
                table[i][j] = table[i-1][j-1]  # 3
            else:
                table[i][j] = min(table[i - 1][j], table[i][j - 1], table[i - 1][j - 1]) + 1  # 4

    return table[-1][-1]  # 5
1. This is our cache. Instead of having a single dummy cell, here we have both a zeroth row and a zeroth column, filled with the distances from the empty prefix (0, 1, 2, ...) and not touched anymore. Again, not strictly a necessity, still the code is much more readable in this way.
2. Let's loop on all "real" elements in the matrix.
3. If the corresponding characters in the strings are the same, we have a match. Meaning the edit distance won't change, so we just copy in the current cell the value of the one on the left top corner.
4. Otherwise we have seen a change. Since we are looking to the minimal distance, we get the lowest value in the top / left cells, and increase it by one.
5. At the end of the loop, the bottom-right cell contains the result.
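A quick check on the classic textbook pair, values of my own choosing rather than from the problem statement:
print(solution_dp('kitten', 'sitting'))  # 3: substitute 'k' with 's', 'e' with 'i', append 'g'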

Incredibly simple, isn't it?

Python code and test case on GitHub.


Primitive Calculator

We always start from 1, and we are given the positive integer we should get to. We can apply just three operations: multiply by 2, multiply by 3, or add 1. What is the minimum number of operations that gives us the expected result? And which sequence of values is generated?

This problem is close to the previous one, about changing money. After all, they are all part of the same batch of Dynamic Programming problems.

The first part of the code follows almost naturally from looking at the money changer:
cache = [0] * (target + 1)  # 1
for i in range(1, len(cache)):  # 2
    cache[i] = cache[i-1] + 1
    if i % 2 == 0:
        cache[i] = min(cache[i], cache[i // 2] + 1)
    if i % 3 == 0:
        cache[i] = min(cache[i], cache[i // 3] + 1)
1. Reserve a cache for all the intermediate results; again, a dummy zeroth element makes our code simpler.
2. Loop on all the elements, checking all the possible alternatives. The minimum local solution is kept and used in the following steps.
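To make the content of the cache concrete, here is what the loop above computes for a target of 5, a value of my own choosing:
# cache == [0, 1, 2, 2, 3, 4]
# e.g. cache[5] is 4: the best sequence, reconstructed by the backtracking below, is 1 -> 3 -> 4 -> 5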

Now the last element of the cache holds the answer to the first question. To get the second answer we need to backtrack through the cache, identifying the choice we made at each step.
result = [1] * cache[-1]  # 1
for i in range(1, cache[-1]):  # 2
    result[-i] = target  # 3
    if cache[target-1] == cache[target] - 1:  # 4
        target -= 1
    elif target % 2 == 0 and (cache[target // 2] == cache[target] - 1):  # 5
        target //= 2
    else:  # 6 # target % 3 == 0 and (cache[target // 3] == cache[target] - 1):
        target //= 3
return result
1. This is the list we are going to return. We know its size, stored in the last element of the cache. Since I have to provide an initialization value, I use 1, which happens to be the right value for its leftmost element.
2. I am going to fill the result list from right to left, every value but the leftmost one, which is already correctly set.
3. The rightmost value is the target passed by the caller; at each following iteration the current, backtracked, target is stored one position to the left.
4. If the previous element in the cache holds the current cache value minus one, I got there by adding one, so here I apply the inverse operation, decreasing the target by one.
5. If the current target is divisible by 2, and the cache value at position target // 2 is the current cache value minus one, I got there by multiplying by two, so I apply the inverse to backtrack.
6. Otherwise I got there by multiplying by three. I could have written the full elif condition, as shown in the comment; the point is that by construction there are only three ways to reach an element in the cache, and this must be the third one.
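For reference, this is how I would glue the two snippets together in a single function; the name primitive_calculator and the sample target are my own choices, not taken from the actual script:
def primitive_calculator(target):
    cache = [0] * (target + 1)
    for i in range(1, len(cache)):
        cache[i] = cache[i - 1] + 1
        if i % 2 == 0:
            cache[i] = min(cache[i], cache[i // 2] + 1)
        if i % 3 == 0:
            cache[i] = min(cache[i], cache[i // 3] + 1)

    result = [1] * cache[-1]
    for i in range(1, cache[-1]):
        result[-i] = target
        if cache[target - 1] == cache[target] - 1:
            target -= 1
        elif target % 2 == 0 and cache[target // 2] == cache[target] - 1:
            target //= 2
        else:
            target //= 3
    return result

print(primitive_calculator(10))  # [1, 3, 9, 10]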

Python code and test case on GitHub.


Money Change Again

Already seen as a greedy problem, now we have to approach it with Dynamic Programming.

We have an integer representing a total amount that we want to reach by adding coins of given denominations. Our job is to find the minimum number of coins required.

If the denominations are such that we can always guarantee a safe greedy choice, the greedy solution is better. However, when, as in this case, the denominations are [1, 3, 4], we have to look elsewhere: to change 6, greedy picks 4 + 1 + 1, three coins, while the optimal answer is 3 + 3, just two.

A plain recursive approach would be correct, but it would lead to exponential time complexity.

Let's solve it instead using a Dynamic Programming approach:

# ...

cache = [0] * (target + 1)  # 1

for i in range(1, target + 1):  # 2
    cache[i] = cache[i-1] + 1  # 3
    for coin in DENOMINATIONS[1:]:  # 4
        if coin <= i:
            other = cache[i-coin] + 1
            cache[i] = min(cache[i], other)

return cache[-1]  # 5
1. Each step needs to look at previous results, so we keep them in a cache. Notice that we also keep a dummy zeroth value; not a strict necessity, but it keeps our code simpler.
2. Let's loop on all the "real" elements; the dummy zeroth one is left untouched with its original value of 0.
3. Assuming that "1" is among the available denominations, we can always select it. This is usually a safe assumption. If we couldn't count on it, our code would have to be more complex, and we should also handle the case where no change can be generated at all.
4. Let's check all the other denominations. If the current one can be used, fetch from the cache the result for the remaining amount, add one to it, and store it in the current position if it is smaller than the value calculated so far.
5. Finally, return the last calculated value.
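To see it run, here is a minimal self-contained sketch; the function name change_money and the sample amounts are my own, while DENOMINATIONS holds the values discussed above:
DENOMINATIONS = [1, 3, 4]

def change_money(target):
    cache = [0] * (target + 1)
    for i in range(1, target + 1):
        cache[i] = cache[i - 1] + 1  # always usable, denomination 1
        for coin in DENOMINATIONS[1:]:
            if coin <= i:
                cache[i] = min(cache[i], cache[i - coin] + 1)
    return cache[-1]

print(change_money(6))   # 2, that is 3 + 3
print(change_money(10))  # 3, for instance 4 + 3 + 3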

I pushed the test case and Python script to GitHub.


Almost five Dynamic Programming problems

I'm following the edX MOOC Algs200x, Algorithmic Design and Techniques, by UC San Diego, and I've just completed week five, which is about Dynamic Programming. To pass it, I had to solve a few problems, presented in this PDF.

Here I present them; my annotated Python solutions are in the following posts.

Money Change Again

Given an integer amount, and an (infinite) supply of coins of given denominations, return the minimum number of coins needed to change the full amount.

In the (sort of disappointing) previous week, they presented a greedy approach to this problem. When it works, it is cool and fast. However, to use it we have to prove that each greedy selection is safe, and that is true only for some sets of denominations. Otherwise we must follow a different approach, and here Dynamic Programming comes in.

It is a simple problem, and a good introduction to this technique.

Primitive Calculator

We should get to a given integer, starting from 1 and applying only operations from a small fixed set. Can we minimize the number of operations?

If you see the similarity with the previous problem, you have already done a large part of the job.

Edit Distance Between Two Strings

Given two strings, return their edit distance. This is a classic problem that can be solved by Dynamic Programming. The two previous ones need a linear cache; here we push our intermediate solutions into a matrix. Things are a bit more complicated, but you can find plenty of documentation about this problem on the net.

Longest Common Subsequence

Given two sequences, find the longest shared subsequence. It is close to the previous problem, but here we are interested only in the commonalities between the two input sequences. Similar structure, some implementation differences. This problem too is well known and well studied.

Longest Common Subsequence of Three Sequences

An extension of the previous one: now we have three sequences to compare. The point is not so much Dynamic Programming anymore, since the structure is essentially the same, but how the programming language you use for the implementation manages a three-dimensional matrix.
