Pages

How to close a Spring ApplicationContext

I've just finished reading chapter 2 Getting Started of Pro Spring 5. Nice and smooth introduction to Spring, and to Dependency Injection motivation.

Going through the provided examples, I had just a minor on the code provided. In a couple of cases, the created Spring contexts is stored in plain ApplicationContexts reference. This is normally considered a good way of writing code. We don't want to bring around a fat interface when a slimmer one would be enough. Anyway, ApplicationContext has no close() method declared, leading to an annoying warning "Resource leak: 'ctx' is never closed".

Here is one place where you could see the issue, in class HelloWorldSpringDI:
ApplicationContext ctx =
    new ClassPathXmlApplicationContext("spring/app-context.xml");

MessageRenderer mr = ctx.getBean("renderer", MessageRenderer.class);
mr.render();
An obvious solution would be explicitly cast ctx to ClassPathXmlApplicationContext, and call close() on it. However, we could do better than that. The Spring application contexts close()'s come from the interface Closeable. This means that we could refactor our code wrapping the ApplicationContext usage in a try-with-resource block. We let Java taking care of calling close() when the context goes out of scope.

The resulting code is something like that:
try (ClassPathXmlApplicationContext ctx =
        new ClassPathXmlApplicationContext("spring/app-context.xml")) {
    MessageRenderer mr = ctx.getBean("renderer", MessageRenderer.class);
    mr.render();
}
I pushed on GitHub my patched code for both HelloWorldSpringDI and HelloWorldSpringAnnotated, that is, the other class suffering for the same lack of application context closing.

Go to the full post

The Gradle nature of Pro Spring 5

I've started reading Pro Spring 5: An In-Depth Guide to the Spring Framework and Its Tools. It looks fun and interesting. The first issue I had was on having its code from GitHub working in my STS IDE. Here are a few notes on what I did to solve it, if someone else stumble in the same place.

First step was easy. After navigating through the File menu in this way ...
File
    Import...
        Git
            Project from Git
                GitHub
                    Clone URI
I entered the repository address: https://github.com/Apress/pro-spring-5.git

Second step should have been even easier. Add Gradle Nature to the project. It's just a matter of right-clicking on the project, and from the menu select
Configure
    Add Gradle Nature
Unfortunately, there are a few version numbers in the main "build.gradle" file for the project that are not currently working, leading to a number of annoying exceptions like:
A problem occurred evaluating project ':chapter02:hello-world'.
Could not resolve all files for configuration ':chapter02:hello-world:compile'.
Could not resolve org.springframework:spring-context:5.0.4.BUILD-SNAPSHOT.
Required by:
    project :chapter02:hello-world
Could not resolve org.springframework:spring-context:5.0.4.BUILD-SNAPSHOT.
Could not get resource ...
Could not HEAD ...
Read timed out
org.gradle.tooling.BuildException: Could not run build action ...
I tried to minimize the impact of my changes on the original configuration file.

Basically, when I had an issue, I moved to a newer version, preferring a stable RELEASE to BUILD-SNAPSHOT's or release candidates.

I also had a problem for
io.projectreactor.ipc:reactor-netty:0.7.0.M1
That I moved to
io.projectreactor.ipc:reactor-netty:0.7.9.RELEASE
More complicated was the problem for chapter five. It looks that I have something wrong for aspectJ in my configuration. I didn't to spend too much time on that now, assuming that it would lead to to some detail in that specific area. So I opened the aspectj-aspects gradle build file and commented the dependency to gradle-aspectj and the plugin application for aspectj itself.

Now my build.gradle for chapter five - aspectj-aspects has these lines in:
}

    dependencies {
//      classpath "nl.eveoh:gradle-aspectj:2.0"
    }
}
// apply plugin: 'aspectj'
Sure enough, it won't compile. Still, I finally added the gradle nature to the project and I can go on reading the book. I'll take care of that issue when I'll reach chapter five.

I've forked the original APress repository, and there I pushed my variants for the main build.gradle file, and the one for chapter 5, aspectj-aspects. See there for details.

Go to the full post

A thread-safe DateFormatter via ThreadLocal

The exercise 2-b at the end of chapter two from the book Java 8 Lambdas by Richard Warburton asks to wrap a DateFormatter in a ThreadLocal so to make it thread safe.

We have spotted in legacy code, designed to work in a single thread context, something like:
DateFormatter formatter = // ...

// ...

Calendar cal = Calendar.getInstance();
cal.set(Calendar.YEAR, 1970);
cal.set(Calendar.MONTH, Calendar.JANUARY);
cal.set(Calendar.DAY_OF_MONTH, 1);
String formatted = formatter.getFormat().format(cal.getTime());
We know that DateFormatter, a Swing class used to format java util Date, is non thread safe. And now we plan to use this code in a multithreaded environment. We need to refactor it.

If we need to stick to the DateFormatter, here is a possible solution:
ThreadLocal<DateFormatter> formatter =  // 1
    ThreadLocal.withInitial(  // 2
        () -> new DateFormatter(  // 3
            new SimpleDateFormat("dd-MMM-yyyy", Locale.ENGLISH)));  // 4
1. Wrapping an instance of it in a ThreadLocal saves our day, since each thread has its own copy of the variable.
2. Since Java 8, the withInitial() static method let us create a ThreadLocal passing a Supplier that is going to be used to initialize the object.
3. That's the Supplier. Nothing is passed in, and it returns a new DateFormatter.
4. The DateFormatter will be constructed from this SimpleDateFormat built on the fly specifying the format as a string. Notice the second parameter, I want the dates to be localized in English whichever is the default locale.

Job done. Still it would be nice to ...

Switch to java.time

The classes in java.time have been designed for working correctly in multithreading. So, getting rid of Calendar and DateFormatter for LocalDate and DateTimeFormatter, would lead to code simpler and more robust, like this:
DateTimeFormatter formatter8 = DateTimeFormatter.ofPattern("dd-MMM-yyyy", Locale.ENGLISH);
// ...

LocalDate aDate = LocalDate.of(1970, 1, 1);
String formatted = formatter8.format(aDate);
I have pushed the above Java code in my GitHub repository forked from the one gently provided by the author.
Follow the links to see the Question2 solution and its test case.

Go to the full post

HackerRank Java Dequeue

Given n integers in input, consider each window in it sized m, where m is less or equal to n, and get the number of unique elements in each window. Return the maximum value among them.

This HackerRank Java Challenges problem is named Java Dequeue, a clear hint that a Deque would helpful to solve it elegantly.

The HackerRank settings implies that data are passed us by System.in. To develop a method that could be easily tested, I made it accept an InputStream as parameter, then I passed it through a Scanner, in its own try-with-resource block.
public static int solution(InputStream is) {
    try (Scanner scanner = new Scanner(is)) {
        // ... see below
    }
}
The first two integers we expect in the input stream are the above described n and m. A problem constrain states that m is not bigger than n, I make it clear in the code with an assertion.
int n = scanner.nextInt();
int m = scanner.nextInt();
assert m <= n;
Now I should be ready to get the next n integers and work on them. The big hint in the problem name suggests to push them in a Deque. Actually, we are going to use it as a queue. Since ArrayDeque is documented as "likely to be ... faster than LinkedList when used as a queue", my choice goes for it. To avoid slowdowns due to possible resizings, I create it specifying its capacity to the maximum "m" specified by the problem.

We also need to count the items in each window, and to do that a second container is useful. We'll push any item entering the window in a hash table, storing as value its current count. When an item exits the window we decrease its count in the hash. If the count is zero, we remove it.

We are going to call our method many times in a row, and I don't want to create each time a new deque and hash. It is cheaper to make them class data member, and simply reset them any time the method is called.
public class Solution {
    private static final int MAX_WINDOW_SIZE = 100_000;

    private static Deque<Integer> window = new ArrayDeque<>(MAX_WINDOW_SIZE);
    private static Map<Integer, Integer> counter = new HashMap<>();

    public static int solution(InputStream is) {
        // ... see above

        window.clear();            
        counter.clear();
        
        // ... see below
    }
}
Let's fill up the window for the first time. The hash map 'counter' is set up as described above.
for (int i = 0; i < m; i++) {
    int in = scanner.nextInt();
    window.add(in);
    counter.merge(in, 1, Integer::sum);
}

int result = counter.size();

// ... see below
To count how many different integers are in the current window, I simply check the size of the hash map.

Now it is just a matter of iterating on all the other integer in the input stream.
for (int i = m; i < n && result < m; i++) {
    Integer out = window.remove();
    Integer in = scanner.nextInt();
    window.add(in);

    // ... see below
}

return result;
I have started looping on the m-th item, willing to go up to the n-th. But, wait, if I find a window with all different numbers (meaning, the current result is already equal to "m") I have already found the problem solution. That could save some CPU time.

In the loop, firstly I adjust the window, removing its first element and adding a new last element. The exiting and entering elements are used to modify the counter map. Then I recalculate the result, and I check it against the current solution.
if (out.intValue() != in.intValue()) { // 1
    counter.merge(in, 1, Integer::sum);  // 2
    counter.merge(out, -1, (a, b) -> a == 1 ? null : a + b); // 3
    result = Math.max(result, counter.size()); // 4
}
1. If what enters in the window is the same of what exists from it, I don't have to do anything.
2. I merge the item 'in' in the map. That means, if the map contains 'in', Integer::sum is used to adjust its value (on its current value and the '1' I passed in). Otherwise a new item is created in the map, key 'in', value '1'.
3. Similarly to the previous line, but now I'm performing a sort of 'merge down'. I'm not satisfied by this line I wrote, even though it is kind of fun. Its point is that we know for sure that 'out' is in the map, so we know that we are going to execute the lambda passed to merge(). It would return null if the value associated to 'out' is 1, removing it. Otherwise it would return a + b, but b is set to -1, so it would decrease it. The weak spot is, what if for 'out' is not in the map? Well, merge() would push it in, with value -1. Damn it. There is no way to avoid this inconsistency, since merge() would throw an exception if null is passed as value. End of the story is, I would not use this line in production code.
4. Maybe this check-and-set looks a bit too pythonic, what do you think about it?

Full Java code and test case available on GitHub.

Go to the full post

HackerRank Java Sort

We have a Student class, having an int, a String (the name) and a double data member. We want read a bunch of Students from a data stream, sort them by the double - decreasing, then string, and then int. In the end we'll output just the student names.

This apparently boring little HackerRank problem shows how much more fun is Java since functional programming has been introduced in it.

I won't even talk about setting up the scanner and reading the 'count' number of students expected, and I focus here on the action.

Stream.generate(() -> new Student(sc.nextInt(), sc.next(), sc.nextDouble())) // 1
    .limit(count) // 2
    .sorted(Comparator // 3
        .comparingDouble(Student::getCgpa).reversed() // 4
        .thenComparing(Student::getFname) // 5
        .thenComparingInt(Student::getId)) // 6
    .map(Student::getFname) // 7
    .forEach(System.out::println); // 8
  1. The stream comes up by data provided on the fly, reading them from the scanner - named sc. I read an int, a string, a double from it, I shovel them in the Student ctor, that is used in a lambda as supplier for the stream generated() method
  2. I don't want generate() to be called forever, the stream size is limited by the number of students, that I have previously read in 'count'
  3. I have to sort my stream, to do that I provide a Comparator ...
  4. Firstly I compare (primitive) doubles as provided by the Student.getCgpa() method. But, wait, I don't want the natural order, so I reverse it
  5. Second level of comparison, the Student first name
  6. Third level, the (primitive) int representing the Student id should be used
  7. Almost done. I'm interested in just the names, so I extract them by mapping
  8. Loop on the sorted student names, and print each one on a new line

You could get the full Java code from GitHub.

Go to the full post

HackerRank Java 1D Array (Part 2)

Don't be misled by its puzzling name, this fun little HackerRank problem has its main interest not in its implementation language nor in the use of the array data structure, but in the algorithm you apply to solve it.

There is a one dimensional board. Each cell in it could be set to zero (meaning free) or one. At startup the pawn is placed on the first cell to the left, and its job is going out to the right. It could be moved only moving forward by 1 or 'leap' positions, or backward just by one, but only to free (set to zero) cells.

Having specified the zeros and ones on the board, and the value of leap (non negative and not bigger than the board size), can we say if the game could be won?

I decided to use the Dynamic Programming approach, since the problem splits naturally in a series of simple steps, each of them contributing to the solution. There is one issue that muddies the water, the backward move. The rest of the algorithm is pretty straightforward.

  • Start from the last cell in the board.
  • Determine if a pawn could be placed there and if this would lead to a win.
  • Move to the next one of its left, up to the leftmost one.
  • Then return the win condition for the first cell.

Firstly, I create a cache to store all the intermediate values:

boolean[] memo = new boolean[game.length];
if (game[game.length - 1] == 0)
    memo[memo.length - 1] = true;

They are all initialized to false, but the last one, and only if a pawn could go there.

Having set this initial condition, I could loop on all the other elements, in inverse order

for (int i = memo.length - 2; i >= 0; i--) {
    // ... see below
}

return memo[0];

In the end, the first cell of my cache would contain the solution.

Hold on while reading the following "if", explanation follows.

if (game[i] == 0 && (i + leap >= game.length || memo[i + 1] || memo[i + leap])) {
    memo[i] = true;
    // ... see below
}

First thing, check the current value in the board. If it is not zero, I can't put the pawn there, the cache value stays set to false.

Otherwise, any of the following three conditions lead to true in cache:

  • adding the current position to the leap value pushes the pawn out of the board
  • the next cell in the board is a good one
  • the next leap cell is a good one

Now the funny part. When I set to true a cell, I should ensure that a few cells to the right are marked as good, too. This is due to the backward move. It is a sort of domino effect, that should stop when we find a cell where our pawn can't go.

for (int j = i + 1; j < memo.length && game[j] == 0; j++)
    memo[j] = true;

Full Java code and test case pushed on GitHub.

Go to the full post

Partitioning Souvenirs (patched)

Given a list of integers, we want to know if there is a way to split it evenly in three parts, so that the sum of each part is the same than the other ones.

Problem given in week six of the edX MOOC Algs200x Algorithmic Design and Techniques by the UC San Diego.

My first solution, as you can see in a previous post (please check it out for more background information), was accepted in the MOOC but didn't work in a few cases, as Shuai Zhao pointed out.

So, I added the Shuai test case to my testing suite, and then a couple of other ones, that I felt would help me in writing a more robust solution:
def test_shuai(self):
    self.assertFalse(solution([7, 2, 2, 2, 2, 2, 2, 2, 3]))

def test_confounding_choice(self):
    self.assertFalse(solution([3, 2, 2, 2, 3]))

def test_duplicates(self):
    self.assertTrue(solution([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]))
Firstly, I rewrote the initial check, that now looks in this way:
def solution(values):
    third, module = divmod(sum(values), 3)  # 1
    if len(values) < 3 or module or max(values) > third:  # 2
        return False
1. Add up the passed values, our target is its third, and surely we can't split it there is a remainder after the division. Using divmod() we get both values in a single shot.
2. If we have less than three values in input, a remainder by dividing their sum by three, or if there is a value bigger than a third of the sum, our job is already done.

Then, since I saw that the problem was due to the lack of check in the ownership for values, I added a shadow table to the one I'm using for the Dynamic Programming procedure. I named it 'taken', and it keeps the id of the owner for the relative value for the current row. Zero means the value is currently not taken.
table = [[0] * (len(values) + 1) for _ in range(third + 1)]
taken = [[0] * (len(values) + 1) for _ in range(third + 1)]
As before, I apply the standard DP procedure, looping on all the "valid" elements.
for i in range(1, third + 1):
    for j in range(1, len(values) + 1):
        # ...
Another not a substantial change, it occurred to me that I keep track of a maximum of two positive matches in the table, so, if I have already reached that value, there's no need in doing other checks, just set the current value to two.
if table[i][j-1] == 2:
    table[i][j] = 2
    continue
The next check was already in, I just refactored it out, because the combined if clause in which it was included was getting too complicated, and I added the first use of the taken table.
if values[j-1] == i:
    table[i][j] = 1 if table[i][j-1] == 0 else 2  # 1
    taken[i][j] = j  # 2
    continue

ii = i - values[j-1]  # 3
1. If the current value is the same of the current row value, I have found a match. So I increase its value in the DP table
2. And I mark it as taken for the j value in the current row.
3. If j is not an exact match for i, I split it, looking for the difference in the previous column.

Now it's getting complicated. If there is a good value in the previous column, before using it we have to ensure it is available. And, since it could have been the result of a splitting, we need to check on the full row to do our job correctly.
available = True
taken_id = taken[ii][j-1]  # 1
if taken_id:
    for jj in range(1, j):  # 2
        if taken[ii][jj] == taken_id:  # 3
            if taken[i][jj]:
                available = False
                break
1. The current value id from the taken table is zero if was not used in the 'ii' row.
2. If the current value has a valid id, we need to ensure it has not be already used in current 'i' row.
3. The current jj is marked as used in the ii row for the id we care about, if it is taken also in the i row, we have a problem. We are trying to using in 'i' a value that has been already used. This was the check I missed.

Good, now I can safely going on with my DP algorithm.
if ii > 0 and table[ii][j - 1] > 0 and not taken[i][j - 1] and available:
    table[i][j] = 1 if table[i][j - 1] == 0 else 2
    
    // ...
The annoying thing is that I have to set the taken table too:
taken[i][j] = j  # 1
taken[i][j-1] = j

matcher = values[j-1] + values[j-2]  # 2
if taken_id:
    for jj in range(1, len(values)):  # 3
        if taken[ii][jj] == taken_id:
            taken[i][jj] = j
            matcher += values[jj-1]
if matcher < i:
    for jj in range(j-2, 0, -1):  # 4
        if taken[i][jj] or matcher + values[jj-1] > i:
            continue
        matcher += values[jj-1]
        taken[i][jj] = j
        if matcher == i:
            break
1. This is nice and simple. The current j and the previous one should be marked as j.
2. We have to mark all the other values concurring to generate the current sum. These are two just marked 'j' on this row, all the ones marked as stored in taken_id on 'ii' and, if they are not enough, also the ones free on 'i', to the get the expected result. To perform this last check I need another variable, that I called 'matcher'.
3. Mark on 'i' as 'j' all the items marked taken_id on 'ii'. And adjust matcher.
4. If needed, loop on the left elements looking for the missing values.

And finally:
        # ...
        else:  # 1
            table[i][j] = table[i][j - 1]

return True if table[-1][-1] == 2 else False  # 2
1. This 'else' follows the 'if ii > 0 and ...' above, ensuring the current cell is adjourned correctly if nothing else happened before.
2. The bottom right cell in our DP table should be set to 2 only if we find a correct 3-way split.

See full python code and test cases on GitHub.

Go to the full post

HackerRank Deque-STL

Given an array of integers, find the max value for each contiguous subarray in it sized k. This HackerRank problem is meant to be solved in C++ and, as its name suggests, using a deque.

For instance, if we are given the array {3, 4, 6, 3, 4} and k is 2, we have to consider four subarrays sized:
{3,4} {4,6} {6,3} {3,4} 
And the expected solution is
{4, 6, 6, 4}
An adapter

The original HackerRank problem asks to write a function than outputs its result to standard output. I didn't like much this requisite. As a TDD developer, I'm used to let tests drive the code development. And having to check standard output to verify a function behavior is not fun. So I slightly changed the function signature, asking to return a vector containing the results, and I used the original function as a simple adapter to the original problem. Something like that:
std::vector<int> maxInSubs(int data[], int n, int k)
{
    // ...
}

// ...
void printKMax(int arr[], int n, int k)
{
    auto data = maxInSubs(arr, n, k);
    std::copy(data.begin(), data.end(), std::ostream_iterator<int>(std::cout, " "));
    std::cout << '\n';
}
First (naive) attempt

Just do what we are asked to to. For each subarray find its maximum value ad push it to the result vector.
std::vector<int> maxInSubs(int data[], int n, int k)
{
    std::vector<int> results;
    for (int i = 0; i < n - k + 1; ++i)
    {
        results.push_back(*std::max_element(data + i, data + i + k));
    }
    return results;
}
Clean and simple and, when k and n are small, not even too slow. However, for k comparable to a large n we can say bye bye to performance.

Patched naive attempt

We could be tempted to save the algorithm explained above, observing that the slowness is due to the k calls to max_element(). We could avoid to call it a substantial number of times checking the value of the elements exiting and entering the current window, for instance in this way:
std::vector<int> results{ *std::max_element(data, data + k) };  // 1

for (size_t beg = 1, end = k + 1; end <= n; ++beg, ++end)  // 2
{
    if (data[end - 1] > results[results.size() - 1])  // 3
    {
        results.push_back(data[end - 1]);
    }
    else if (data[beg - 1] < results[results.size() - 1])  // 4
    {
        results.push_back(results[results.size() - 1]);
    }
    else  // 5
    {
        results.push_back(*std::max_element(data + beg, data + end));
    }
}
1. Initialize the result vector with the max element for the first interval.
2. Keep beg and end as loop variable, describing the current window to check.
3. The new right element of the window is bigger than the max for the previous window. Surely it is the max for this one.
4. The element that has just left the window is smaller than the previous max. Surely the max is still in the window.
5. Otherwise, we'd better check which is the current max.

A smartly designed array in input could beat this simple algorithm. However on HackerRank they didn't spend too much time on this matter, and this solution is accepted with full marks.

Solution with a deque

In a more elegant solution, we should to minimize the multiple check we perform on the data elements. Right, but how? Until this moment, I haven't paid attention to the huge hint HackerRank gave us, "Use a deque!", they shout from the name of the problem itself.

The point is that I want to perform a cheap cleanup on each window, so that I could just pick a given element in it, without scanning the entire interval.

Let's use the deque as a buffer to store only the reasonable candidates as max. Since we want to remove from this buffer the candidates that are not anymore valid when the window is moved, instead of their value we keep in it their indices from the original data array.

Here is how I initialize it:
std::deque<int> candidates{ 0 };  // 1
for (int i = 1; i < k; ++i)
{
    pushBack(candidates, data, i);  // 2
}
1. We could safely say that the first element in data is a good candidate as max for its first subarray.
2. Push back to candidates the "i" index from data, but first ensure the previous candidates are useful.

Since the code in pushBack() is going to be used also afterward, I made function for it:
void pushBack(std::deque<int>& candidates, int data[], int i)
{
    while (!candidates.empty() && data[i] >= data[candidates.back()])  // 1
        candidates.pop_back();
    candidates.push_back(i);
}
1. There is no use in a candidate, if the newcomer is bigger, so remove it.

Now candidates contains the indices of all the elements in the first window on data having the max value. Possibly just one element, but for sure the deque is not empty.

We are ready for the main loop:
for (int i = k; i < n; ++i)
{
    results.push_back(data[candidates.front()]);  // 1

    if (candidates.front() <= i - k)  // 2
        candidates.pop_front();

    pushBack(candidates, data, i);  // 3
}
results.push_back(data[candidates.front()]);  // 4
1. As said above, we know that candidates is not empty and its front is the index of a max value in the current window. Good. Push it to results.
2. Now we prepare for the next window. If the front candidate is out, we remove it.
3. Push back the new element index among the candidates, following the algorithm described above. It would kill the candidates that are not bigger than it, ending up with a deque where the biggest element is surely on front.
4. Remember to push the last candidate in the results, and then the job is done.

Does this solution look more convincing to you? Full C++ code and test case on GitHub.

Go to the full post

HackerRank Equal

We have a list of integers, and we want to know in how many rounds we could make them all equal. In each round we add to all the items in the list but one the same number chosen among 1, 2, and 5. Pay special attention when you solve this problem on HackerRank. Currently (April 2018) its enunciation says each increase should be of one, three, or five. However, this is not what the solution is tested for.
It looks like someone decided to change 2 for 3 a few months ago, edited the description and then forgot to modify the testing part. Who knows what is going to happen next. Be ready to be surprised.
Besides, it is inserted in the Algorithms - Dynamic Programming section, but I don't think that is the right approach to follow.

Simplifying the problem

Instead of applying the weird addition process stated in the problem, we could decrease each time a single item. For instance, given in input [2, 2, 3, 7], a solution to the original problem is:
2, 2, 3, 7
+5 +5 +5  = (1)
 7, 7, 8, 7
+1 +1  = +1 (2)
 8, 8, 8, 8
We could solve it in this way instead:
2, 2, 3, 7
 =  =  = -5 (1)
 2, 2, 3, 2
 =  = -1  = (2)
 2, 2, 2, 2
Since we are asked to calculate the number of steps to get to the equal state, in both ways we get the same result.

Base cases

A sort of rough algorithm is already emerging. We get the lowest value in the list, and decrease all the other elements using the available alternatives until we reach it. We start from the biggest value (5) and only when we are forced to, we fall back to the shorter steps.

To better understand how we should manage the last steps, I put the base cases on paper.
If x is at level 1, 2 or 5, we could get zero at cost 1. But if x is at level 3 or 4, it costs 2 to us. Notice that if, instead of getting to level zero, we move both the lowest element and the x element at level -2, we get to the result in the same number of moves. For instance, if our input list is [10, 13]
10, 13
    -2 (1)
    -1 (2)
10, 10
 
10, 13
-2     (1)
    -5 (2)
 8,  8
If the input list is [0, 3, 3] getting down to the bottom from 3 in just one move gives us an advantage:
10, 13, 13
    -2      (1)
    -1      (2)
        -2  (3)
        -1  (4)
10, 10, 10
 
10, 13, 13
-2          (1)
    -5      (2)
        -5  (3)
 8,  8,  8
The algorithm

I think I've got it. Firstly I find the minimum value, M, in the input list. That is a possible target for all the items to be reach. However I have to check other two alternatives, M-1 and M-2.
I loop on all the items in the list. For all the three possible targets, I calculate the difference between it and the current value, count the number of steps to get there, and add it to the total number of steps required for getting to that target.
And then I choose as a result the cheapest target to reach.

The code

Using Python as implementation language, I started with a couple of test cases, and then added a few ones along the way, when I bumped into troubles, and I ended up with this code.
SHIFT = [0, 1, 2]  # 1


def solution(data):
    lowest = min(data)  # 2

    results = [0] * len(SHIFT)  # 3
    for item in data:
        for i in SHIFT:
            gap = item - lowest + i  # 4
            results[i] += gap // 5 + SHIFT[(gap%5 + 1) // 2]  # 5
    return min(results)  # 6
1. Represents the three possible targets, from the minimal value in the list down to -2.
2. Get the minimal value in the list.
3. Buffer for the results when using the different targets.
4. Distance that has to be split.
5. Add the current number of steps to the current buffer. Firstly I get the number of long moves dividing gap by 5. Now I am in the base case, as showed in the picture above. Notice that the cost of moving from X to target is [0, 1, 1, 2, 2] for gap in [0..5], if we take gap, apply modulo five, increase it and then divide by two, we get the index in SHIFT to the number of steps actually needed. Admittedly an obscure way to get there, if this was production code, I would have probably resisted temptation to use it.
6. Get the minimal result and return it.

All tests passed in hackerrank, python script and test case pushed to GitHub.

Go to the full post

HackerRank Roads and Libraries

We are given n nodes and a (possibly huge) number of edges. We are also given the cost of building a library in a city (i.e. a node) and a road (i.e. an edge). Based on these data we want to minimize the cost of creating a forest of graphs from the given nodes and edges, with the requirement that each graph should have a library on one of its nodes. This is a HackerRank problem on Graph Theory algorithms, and I am about to describe my python solution to it.

If a library is cheaper than a road, the solution is immediate. Build a library on every node.
def solution(n, library, road, edges):
    if road >= library:
        return n * library

    # ...
Otherwise, we want to create a minimum spanning forest, so to minimize the number of roads, keeping track of the number of edges used and how many graphs are actually generated. I found natural using an adapted form of the Kruskal MST (Minimum Spanning Tree) algorithm, that looks very close to our needs.

Kruskal needs a union-find to work, and this ADT is not commonly available out of the box. So, I first implemented a python UnionFind class, see previous post for details.
Then, while working on this problem, I made a slight change to it. My algorithm was simpler and faster if its union() method returned False when nothing was actually done in it, and True only if it led to a join in two existing graph.

Using such refactored UnionFind.union(), I wrote this piece of code based on Kruskal algorithm:
uf = UnionFind(n)
road_count = 0  # 1

for edge in edges:
 if uf.union(edge[0] - 1, edge[1] - 1):  # 2
  road_count += 1  # 4
  if uf.count == 1:  # 5
   break
1. The union-find object keeps track of the numbers of disjointed graphs in the forest, but not of edges. This extra variable does.
2. I need to convert the edges from 1-based to 0-based convention before use them. If the two nodes get connected by this call to union(), I have some extra work to do.
4. An edge has been used by union(), keep track of it.
5. If union() connected all the nodes in a single graph, there is no use in going on looping.

Now it is just a matter of adding the cost for roads and libraries to get the result.
return road_count * road + uf.count * library

Complete python code for problem, union-find, and test case on GitHub.

Go to the full post

Union Find

I have a (possibly huge) bunch of edges describing a forest of graphs, and I'd like to know how many components it actually has. This problem has a simple solution if we shovel the edges in a union-find data structure, and then just ask it for that piece of information.

Besides the number of components, our structure keeps track of the id associated to each node, and the size of each component. Here is how the constructor for my python implementation looks:
class UnionFind:
    def __init__(self, n):  # 1
        self.count = n
        self.id = [i for i in range(n)]  # 2
        self.sz = [1 for i in range(n)]  # 3
1. If the union-find is created for n nodes, initially the number of components, named count, is n itself.
2. All the nodes in a component have the same id, initially the id is simply the index of each node.
3. In the beginning, each node is a component on its own, so the size is initialized to one for each of them.

This simple ADT has two operations, union and find, hence its name. The first gets in input an edge and, if the two nodes are in different components, joins them. The latter returns the id of the passed node.

Besides, the client code would check the count data member to see how many components are in. Pythonically, this is exposure of internal status is not perceived as horrifying. A more conservative implementation would mediate this access with a getter.

Moreover, a utility method is provided to check if two node are connected. This is not a strict necessity, still makes the user code more readable:
def connected(self, p, q):
    return self.find(p) == self.find(q)
The meaning of this method is crystal clear. Two nodes are connected only if they have the same id.

In this implementation, we connect two nodes making them share the same id. So, if we call union() on p and q, we'll change the id of one of them to assume the other one. Given this approach, we implement find() in this way:
def find(self, p):
    while p != self.id[p]:
        p = self.id[p]
    return p
If the passed p has id different from its default value, we check the other node until we find one that has its original value, that is the id of the component.

We could implement union() picking up randomly which id keep among the two passed, but we want keep low the operational costs, so we work it out so to keep low the height of the tree representing nodes in a component, leading to O(log n) find() complexity.
def union(self, p, q):
    i = self.find(p)
    j = self.find(q)
    if i != j:  # 1
        self.count -= 1  # 2
        if self.sz[i] < self.sz[j]:  # 3
            self.id[i] = j
            self.sz[j] += self.sz[i]
        else:
            self.id[j] = i
            self.sz[i] += self.sz[j]
1. If the two nodes are already in the same component, there is no need of doing anything more.
2. We are joining two components, their total number in the union-find decrease.
3. This is the smart trick to keep low the cost of find(). We decide which id to keep as representative for the component accordingly with the sizes of the two merging ones.

As example, consider this:
uf = UnionFind(10)
uf.union(4, 3)
uf.union(3, 8)
uf.union(6, 5)
uf.union(9, 4)
uf.union(2, 1)
uf.union(5, 0)
uf.union(7, 2)
uf.union(6, 1)
I created a union-find for nodes in [0..9], specifying eight edges among them, from (4, 3) to (6, 1).
As a result, I expect two components and, for instance, to see that node 2 and node 6 are connected, whilst 4 and 5 not.

I based my python code on the Java implementation provided by Robert Sedgewick and Kevin Wayne in their excellent Algorithms, 4th Edition, following the weighted quick-union variant. Check it out also for a better algorithm description.

I pushed to GitHub full code for the python class, and a test case for the example described above.

Go to the full post

HackerRank Climbing the Leaderboard

We are given two lists of integers. The first one is monotonically decreasing and represent the scores of the topmost players in a leaderboard. The second one is monotonically increasing and contains the score history of Alice, a player who rapidly climbed the board.
Following the dense ranking convention, we want to get back a list containing the history of rank positions for Alice.
This is a HackerRank Algorithm Implementation problem, and I am going to show you how I solved it, using Python as implementation language.

I noticed that the first list, scores, is already sorted, we just have to get rid of duplicates to have a matching between the position and the score Alice has to get to achieve that ranking.

Bisect solution

I just have to do the matching. First idea jumped to my mind, was performing a binary search on scores to do it. It helps that Python provides for the job a well known library, bisect. There's just a tiny issue, bisect expects the underlying list to be sorted in natural order, so we need to reverse our scores.

It looks promising, let's implement it.

A pythonic way to get our ranking would be this:
ranking = sorted(list(set(scores)))
I get the original list, convert to a set to get rid of duplicates, than back to list, so that I can sort it in natural order. Nice, but in this problem we are kind of interested in performance, since we could have up to 20 thousand items in both lists. So we want to take advantage of the fact that the list is already sorted.

So, I ended up using this rather uncool code:
ranking = [scores[-1]]
for i in range(len(scores)-2, -1, -1):
 if scores[i] > ranking[-1]:
  ranking.append(scores[i])
I initialize the ranking list with the last item in scores, then I range on all the other indices in the list from right to left. If the current item is bigger than the latest one pushed in ranking, I push it too.

Now I can just rely on the bisect() function in the bisect python module, that would find which position the current item should be inserted in the list. With a caveat, I have reverted the order, so I have to adjust the bisect() result to get the result I'm looking for:
results = []
last = len(ranking) + 1
for score in alice:
 results.append(last - bisect(ranking, score))
This code pass all the tests, job done.

However. Do I really need to pay for the bisect() search for each element of alice?

Double scan solution

Well, actually, we don't. Since we know that both list are sorted, we can use also the ordering in alice to move linearly in ranking.

Since we are not using anymore bisect, we don't need to revert the sorting order in ranking, and the duplicate cleanup is getting a bit simpler:
ranking = [scores[0]]
for score in scores[1:]:
 if score < ranking[-1]:
  ranking.append(score)

Now we compare each item in alice against the items in ranking moving linearly from bottom to head:
results = []
for score in alice:
 while ranking and score >= ranking[-1]:
  ranking.pop()
 results.append(len(ranking) + 1)
We don't have to be alarmed by the nested loops, they don't have a multiplicative effect on the time complexity, since we always move forward on both lists, the result is a O(M + N) time complexity.

Is this a better solution than the first one? Well, it depends. We should know more on the expected input. However, for large and close values of N and M, it looks so.

I pushed the python script for both solutions and a test case to GitHub.

Go to the full post