HashMap.computeIfAbsent() behavior in Java 9

If you are using computeIfAbsent() in your code, be aware that its behavior has changed in Java 9: you now get a ConcurrentModificationException if the mapping function modifies the map itself.
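A minimal sketch of the new behavior (my own toy example, not the post's original code); on Java 8 the call succeeds, while from Java 9 on it throws:

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

class ComputeIfAbsentDemo {
    // Returns true if computeIfAbsent() rejects a mapping function that
    // modifies the map, as it does from Java 9 on.
    static boolean triggersCme() {
        Map<String, Integer> map = new HashMap<>();
        try {
            map.computeIfAbsent("a", k -> {
                map.put("b", 1);  // structural change during the computation
                return 2;
            });
            return false;  // Java 8 tolerated this
        } catch (ConcurrentModificationException e) {
            return true;   // Java 9+ detects it on a best-effort basis
        }
    }
}
```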

So, for instance, this Fibonacci calculator works fine in Java 8:
class Fibonacci {
    private static final Map<Integer, Long> cache = new HashMap<>();
    static {
        cache.put(0, 0L);
        cache.put(1, 1L);
    }

    public long calculate(int x) {
        return cache.computeIfAbsent(x, n -> calculate(n - 1) + calculate(n - 2));
    }
}
But not if you try to run it in Java 9.

As the computeIfAbsent() javadoc quite clearly states, "This method will, on a best-effort basis, throw a ConcurrentModificationException if it is detected that the mapping function modifies this map during computation". If you have a look at the code, you will see that line 1139 is:
if (mc != modCount) { throw new ConcurrentModificationException(); }
The mc local variable stores the value of modCount before the mapping function runs; since modCount is increased by any structural change to the map, the comparison detects modifications made by mappingFunction.

My way out of this issue has been refactoring the Fibonacci calculation to get rid of computeIfAbsent(). Something like this:
public long calculate(int x) {
    if (cache.containsKey(x)) {
        return cache.get(x);
    }

    long lhs = cache.containsKey(x - 1) ? cache.get(x - 1) : calculate(x - 1);
    long rhs = cache.containsKey(x - 2) ? cache.get(x - 2) : calculate(x - 2);
    long result = lhs + rhs;

    cache.put(x, result);
    return result;
}
Well, it's not a beauty, but at least it works.
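Another way out, sketched here as my own variation rather than the post's code, is filling the cache bottom-up, so that no mapping function ever modifies the map:

```java
import java.util.HashMap;
import java.util.Map;

class IterativeFibonacci {
    private static final Map<Integer, Long> cache = new HashMap<>();
    static {
        cache.put(0, 0L);
        cache.put(1, 1L);
    }

    public long calculate(int x) {
        // Fill the cache iteratively; the map is never modified from inside
        // a mapping function, so Java 9 has nothing to complain about.
        for (int i = 2; i <= x; i++) {
            if (!cache.containsKey(i)) {
                cache.put(i, cache.get(i - 1) + cache.get(i - 2));
            }
        }
        return cache.get(x);
    }
}
```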


CodeEval Lowest Unique Number revised

Given a list of numbers, find the one that is unique and has the smallest value. I have just provided a Python solution to this CodeEval problem, but there was something in it that bothered me, so here I provide an alternative solution.

The soft spot in my previous solution was the sorting of the Counter dictionary combined with a call to index() on the data list. I could live with the sorting, but re-scanning the data list to get the actual index felt unbearable. The Counter object knew the index during its initialization, so why not use it at that time?

The refactoring ends up with code that is less readable than the first version, and the performance gain in this specific problem is minimal. It would make sense if the problem required accepting long sequences of numbers in input. Anyway, here it is.
NOT_FOUND = 0  # 1
FOUND_MANY = -1

def solution(line):
    data = [int(x) for x in line.split()]  # 2
    result = [0] * 9  # 3
    for i, number in enumerate(data):  # 4
        result[number-1] = i+1 if result[number-1] == NOT_FOUND else FOUND_MANY  # 5

    for index in result:  # 6
        if index > NOT_FOUND:
            return index
    return 0
1. I am going to flag the items in the buffer with zero to mean that the relative number was not found, and with minus one when multiple instances were found.
2. As in the original solution, I convert the input string into a list of integers.
3. Instead of using a Counter collection, here I have a list of integers, reporting for each number in [1..9] its first index in data.
4. First loop. I scan each element in data. I need both its position and value, so I use the enumerate built-in to get them.
5. The need to switch from 0-based to 1-based indices leads to this painful line. I have to decrease number, to make it a 0-based index into the buffer list named result, and I have to increase i so that it becomes a 1-based index, as required by the problem.
6. Second loop. I scan each index stored in the result buffer; as soon as I find a valid value I return it as the solution. If there is no valid index in the list, zero is returned instead.

I pushed the changes on GitHub.


CodeEval Lowest unique number

We have a string in input containing numbers in the range [1 .. 9]. We want to get the index (following the unsettling Pascal habit of counting from one for the first element) of the lowest unique number available, if such a beast exists, otherwise zero. This is CodeEval problem #103.

So, for instance, given these two sequences:
3 3 9 1 6 5 8 1 5 3
9 2 9 9 1 8 8 8 2 1 1
In the first case we should return 5, index of the unique 6 in the list, and in the second case we return 0, since there is no unique number.

Here is the core of my python solution.
data = [int(x) for x in line.split()]  # 1
counter = Counter(data)  # 2
for key, value in sorted(counter.items()):  # 3
    if  value == 1:  # 4
        return data.index(key) + 1
return 0  # 5
1. I convert the input string to a list of integers.
2. I use a collections Counter object to count the numbers in the list. Now counter is a dictionary where the key is any number passed in input and the associated value is the count of its occurrences.
3. I sort the items in the counter, so that I start checking from the lowest number on, and I loop on all of them.
4. As soon as I find an item in the counter dictionary that has value 1 (meaning its key is unique), I return the position of the first element in data with such value. I have to add one to the position returned by the index() method to convert the C-style index to a Pascal one, as required by the problem.
5. I fall off the for loop without finding my egg.

Full python script on GitHub.


CodeEval Pass Triangle

We are given as input a sequence of integers that we should think of as representing a triangle. We should return the maximum value we can get adding all values from the top vertex of the triangle down to the bottom, choosing a path that always moves downwards between adjacent elements. This is CodeEval problem #89.

For example, if this is the triangle we get as input:
5
  9 6
 4 6 8
0 7 1 5
The expected solution is 5 + 9 + 6 + 7 = 27

I started thinking in the Dynamic Programming direction. Seeing that for a minimal triangle the result comes out by comparing the two lower elements and then adding the bigger one to the element above, I simply applied this rule to all the elements, from the bottom line up to the second line from the top.

Here is my python code:
def solution(data):  # 1
    for i in range(len(data) - 1, 0, -1):  # 2
        for j in range(len(data[i])-1):  # 3
            data[i-1][j] += max(data[i][j], data[i][j+1])  # 4
    return data[0][0]  # 5
1. The solution function is called passing as input parameter a list of lists of integers. The first list contains just one value, the triangle top element. As usual, we can assume that the data we receive is as expected. No check is required.
2. The loop variable in this for-loop goes from the last line (3, in the example case) down to 1. This is the line from which I'm reading the values that I'm going to write into the line above.
3. Here the loop variable of the for-loop represents the index of the first of the two elements I'm comparing.
4. Calling max() I see which one is the selected value, and I add it to the element in the line above.
5. Finally I just have to return the top element, which now contains the result.

Notice that my code is destructive, since I use the input data as working area. This is usually considered not a problem, or even a good thing, in this context (less code, cheaper and faster). Not so if the caller of the function would like to do something else with its data!

The complete python script is on GitHub.


CodeEval Query Board

We have a square matrix of fixed size, and we receive a bunch of commands in input to operate on it. Two of them set values on it, the other two perform a query and return an integer result.
This problem is quite simple, more details on CodeEval #87; still, I'm writing a few lines about it because it gave me a chance to exercise a couple of Python features.

Mapping command names with function names

We receive the description of what to do as a string like "SetCol 32 20". The first element is the name of the command we want to perform on our matrix. It looks like it has been written thinking of a Pascal-based language, but I want to use Python, so I convert it into a standard-compliant name using a dictionary:
ID_2_NAME = {'SetRow': 'set_row', 'SetCol': 'set_col', 'QueryRow': 'query_row', 'QueryCol': 'query_col'}
A couple of notes here. Firstly, as usually happens in this kind of problem, we don't have to care about error handling. In real life, we should expect bad data in input.
Secondly, I have decided to wrap the matrix and related functionality in a class. Using free functions instead, I would have mapped the names to the functions, something like this:
SIZE = 256
board = # ...

def set_row(i, value):
    for j in range(SIZE):
        board[i][j] = value

# ...
ID_2_FUN = {'SetRow': set_row}  # ...

# call set_row(15, 7)
command = 'SetRow'
ID_2_FUN[command](15, 7)
The use of a class led me to follow a different path. But this way was interesting, too.

Defining the Board class

The Board initializer just sets the size and the bidimensional list used as matrix:
def __init__(self, size=256):  # 1
    self.size = size
    self.board = [[0 for _ in range(size)] for _ in range(size)]  # 2
1. I decided not to hard-code the size, as I would have done in a hastier implementation, but just to default it to the dimension required by CodeEval.
2. The matrix is built up on the fly with a double list comprehension. The for-loop variables are not used, so I named them both with the idiomatic underscore. The external, right-side for-loop creates all the rows. The internal, left-side for-loop creates a list filled with zeroes that becomes a row.

There is almost nothing to say about set_row() and set_col(); each is just a for-loop on the passed row or column setting every element to the specified value. See the code on GitHub if you want to compare your implementation with mine.

The two query functions are again logically equivalent; we just have to sum all the values of a specified row or column. However, in the case of a column we have to perform an explicit for-loop, whereas for a row we can use the built-in sum() function:
def query_row(self, i):
    return sum(self.board[i])

Calling the Board methods

Let's see again a typical command line, "SetCol 32 20". We have to split it, so that we get the command name as first element, followed by the function parameters. Once we have all these elements, we can get help from getattr() to call the right method with its parameters:
board = Board()
# ...

command, *params = line.split()  # 1
res = getattr(board, ID_2_NAME[command])(*[int(p) for p in params])  # 2
if res is not None:  # 3
    print(res)
1. We extract into the list params all the elements after the first one. Beware, they are all strings here.
2. ID_2_NAME maps the command name, for instance SetCol, to the method name, set_col. Using getattr() we are actually calling the method on the passed object, the same as board.set_col() here. On the right, I convert the parameters in params to integers and unpack the list into single arguments with the star operator.
3. The setters do not return anything. Or better, being Python functions, they return None. We are not interested in printing it, but we do want to print what the query methods return.

The full Python script is on GitHub.


CodeEval Balanced Smileys

This problem belongs to the family of those asking to check whether the parentheses in a sentence are balanced. See for instance the simpler matching brackets one.

In this version, happy ':)' and sad ':(' smileys could be part of the sentence, so we have to decide whether a parenthesis has to be considered in the balancing or skipped, being the representation of the mouth of a smiley.

I found it as CodeEval #84 problem, but it is a repost, originally coming from Facebook Hacker Cup 2013 Hackathon.

I came to a solution inferior to the one published by the Hacker Cup editors, so I just threw mine in the dustbin and spent some time pondering on the other one, coming out with a slightly different variation. [By the way, I would suggest you do the same. When you see a piece of code you don't fully understand, play around with it, adapt it to your style, make it yours.]

The base problem is pretty simple. We loop on each character in the string; any time we see an open bracket, we push it onto a stack (or we emulate a push, using just a counter). When we see a close bracket, we pop the matching open bracket from the stack. At the end, the stack should be empty. If we try to pop when the stack is empty, the sentence is unbalanced.
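That base version, without smileys, could be sketched like this (a Java sketch of mine, while the solution below is in Python):

```java
class BracketChecker {
    static boolean balanced(String sentence) {
        int open = 0;  // emulates the stack of open brackets
        for (char c : sentence.toCharArray()) {
            if (c == '(') {
                open++;                    // push
            } else if (c == ')') {
                if (open == 0) {
                    return false;          // pop on an empty stack: unbalanced
                }
                open--;                    // pop the matching open bracket
            }
        }
        return open == 0;                  // balanced only if the stack is empty
    }
}
```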

Having to care about the smileys, we need a second (virtual) stack, where we also push the dubious open brackets, the ones that could have a double interpretation.

Looking at the code should clarify the sense of this solution:
min_open = 0  # 1
max_open = 0  # 2
for i in range(len(msg)):  # 3
    if msg[i] == '(':  # 4
        max_open += 1
        if i == 0 or msg[i-1] != ':':  # 5
            min_open += 1
    elif msg[i] == ')':  # 6
        if i == 0:
            return False
        min_open = min_open - 1 if min_open else 0
        if msg[i-1] != ':':  # 7
            if max_open == 0:
                return False
            max_open -= 1
return True if min_open == 0 else False  # 8
1. Emulates the first stack. It keeps track of the "clean" open brackets only.
2. The second, "dirty" stack, containing also the brackets that could be seen as the mouth of a sad smiley.
3. Loop on all the characters in the sentence.
4. Open bracket: push it to the "dirty" stack (i.e. increase the max_open counter), and ...
5. ... only if it can't be seen as the mouth of a smiley, push it to the "clean" stack too (i.e. increase the min_open counter).
6. Close bracket: if we are at the beginning, the sentence is unbalanced. Otherwise, pop the matching open bracket from the "clean" stack. If it was already empty, the sentence could be unbalanced. Not for sure, however, because the bracket could belong to a happy smiley. So ...
7. ... if the bracket can't be seen as the mouth of a happy smiley, an empty "dirty" stack means the sentence is unbalanced; otherwise, pop a bracket from it.
8. Completed the loop, the sentence is balanced only if the "clean" stack is empty.

The full python script is on GitHub.


Calling a Stored Function from JDBC

The code we write to call a stored procedure or function via JDBC is slightly different from what we write to perform a SQL SELECT. A CallableStatement is used, instead of a Statement plus a ResultSet, to set the parameters, execute the call, and extract the result.

To check it, I have firstly created a stored function in my Oracle 12 instance, on the HR schema. It takes a SQL CHAR in input, and returns the same string with 'suffix' appended:
create or replace FUNCTION foo (val CHAR)
RETURN CHAR AS
BEGIN
    RETURN val || 'suffix';
END;
I ensured it works as expected by running this query:
SELECT foo('hello') FROM DUAL;
Then I have written a couple of tests to see how to get the same result in Java. Not just one, because we can use either the JDBC escape or the PL/SQL block syntax to achieve the same result, the first being
{? = call foo(?)}
and the latter
begin ? := foo(?); end;
The core of both tests is in these few lines:
cs = conn.prepareCall(call);  // 1
cs.registerOutParameter(1, Types.CHAR);  // 2
cs.setString(2, "aa");
cs.execute();
String result = cs.getString(1);  // 3
1. Assuming conn is a good java.sql connection to the HR user on my Oracle database, and call is either the JDBC escape or the PL/SQL block string shown above, prepareCall() should return a good CallableStatement.
2. The callable statement has to know that the output parameter is a character string. Then we set the input parameter with the string that we want to pass to the function.
3. After executing the callable statement, we get the first (and only) result as a string.

The actual code, which you can find on GitHub (class GettingStartedTest, methods callFoo(), testFunctionCallJdbcEscape(), and testFunctionCallPlSqlBlock()), is a bit more verbose because I have to provide all the standard boilerplate: initializing, testing, cleaning up.

Reference: Oracle Database JDBC Developer's Guide 12c Release 2 (12.2) 2.8 Stored Procedure Calls in JDBC Programs


Selecting on Oracle via JDBC

Once we have decided how to establish a connection to the database, performing a SELECT on it is pretty simple. Here I'm going to use OracleDataSource as connection provider; using pure JDBC wouldn't add any complexity.

The only problem with this Java code is that it is slightly boring. So boring that there are a few wrappers aiming to keep the advantages of JDBC while easing its usage. See for instance the Spring JdbcTemplate.

Anyway, first we open a connection, then a statement, on which we execute a query (a select, in this case) that returns a result set. We normally do something with the result set, and then we can close what we have opened, in reverse order: result set, statement, connection. The code is made even more boring by the need to try-catch the SQLExceptions that could be thrown. The last straw in this boredom parade is that, before closing the JDBC objects, we have to ensure they have been created, checking that they are not null.

So, our typical plain JDBC code could be seen in three steps:

Initialization

// ...
Connection conn = null;
Statement stmt = null;
ResultSet rs = null;
try {
    conn = ods.getConnection();  // 1
    stmt = conn.createStatement();
    rs = stmt.executeQuery("SELECT first_name FROM employees");  // 2

    // ...
1. I'm using the OracleDataSourceConnector I described in the previous post. Nothing fancy, anyway; I simply get a connection to an Oracle database for the well-known hr example user.
2. Having extracted a statement from the connection, I execute a query, a plain SQL SELECT. The resulting table, here consisting of just one column, is returned in a ResultSet.

Operation

Let's do something with the fetched data. Here I copy the first names to a list of strings, then I sort them and ensure the first and last are what I expect. Usually this part of the job makes more sense.
List<String> results = new ArrayList<String>();
while (rs.next()) {
    results.add(rs.getString(1));
}

assertEquals(107, results.size());
Collections.sort(results);
assertEquals("Adam", results.get(0));
assertEquals("Winston", results.get(106));
The employees table on HR should have 107 rows, with first names ranging from Adam to Winston; however, your results may vary.

Cleanup

Finally (and actually this code is normally placed in the finally block following the try block where initialization and operation happened, meaning that we always perform this step, whatever happens there), being done with the data, we can close result set, statement, and connection.
assertNotNull(rs);
rs.close();

assertNotNull(stmt);
stmt.close();

assertNotNull(conn);
conn.close();
Notice that we have to strictly follow the inverse of the order seen in initialization.
Notice also that each close() method could throw an exception so, again, these calls have to be in a try-catch block.
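As an aside, since Java 7 the try-with-resources statement can take care of all this closing for us, in exactly that inverted order. Here is a small stand-in demonstration of mine (not actual JDBC code; with JDBC the resources would be the connection, statement, and result set):

```java
import java.util.ArrayList;
import java.util.List;

// A tiny stand-in resource showing that try-with-resources closes its
// resources automatically, in the reverse order of declaration -- the same
// inverted order we had to follow by hand in the cleanup step above.
class Tracked implements AutoCloseable {
    static final List<String> closed = new ArrayList<>();
    private final String name;

    Tracked(String name) { this.name = name; }

    @Override
    public void close() { closed.add(name); }

    static List<String> demo() {
        try (Tracked conn = new Tracked("conn");
             Tracked stmt = new Tracked("stmt");
             Tracked rs = new Tracked("rs")) {
            // work with the resources here
        }
        return closed;  // closed in reverse order: rs, stmt, conn
    }
}
```

With real JDBC objects the try header would read try (Connection conn = ods.getConnection(); Statement stmt = conn.createStatement(); ResultSet rs = stmt.executeQuery(...)), with no finally block and no null checks.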

See full code on GitHub, class GettingStartedTest, method testSelect().

Reference: Oracle Database JDBC Developer's Guide 12c Release 2 (12.2) 2.4 Sample: Connecting, Querying, and Processing the Results


Connecting to Oracle via JDBC

There are a couple of ways to connect to a recent Oracle Database: going through the standard DriverManager class or the specific OracleDataSource one. Let's see both of them. And, while we are at it, let's add some considerations on connecting to MySql via DriverManager.

It's a bit of an overkill, but I have written a little abstract class that is going to be the root of a hierarchy of classes providing access to databases through JDBC:
public abstract class Connector {
    public abstract Connection getConnection() throws SQLException;
 
    public String getDatabaseVersion(Connection conn) throws SQLException {
        return conn.getMetaData().getDatabaseProductVersion();
    }
}
The point of it is that each concrete class has its own way of getting a database connection, but all of them use it in the same way. So the getConnection() method is abstract, while the actual methods performing JDBC operations, like getDatabaseVersion(), work the same way for all of them.

Using OracleDataSource

The OracleDataSourceConnector extends Connector, and its raison d'être is keeping an instance of an OracleDataSource object as its private data member. Initialized in the constructor, it is used by the getConnection() method to return a database connection.
public class OracleDataSourceConnector extends Connector {
    private OracleDataSource ods;

    public OracleDataSourceConnector(String url, String user, String password) throws SQLException {
        ods = new OracleDataSource();
        ods.setURL(url);
        ods.setUser(user);
        ods.setPassword(password);
    }

    @Override
    public Connection getConnection() throws SQLException {
        return ods.getConnection();
    }
}
The good thing about OracleDataSource is that it has its own pool of connections that is managed implicitly. On the flip side, it is not standard JDBC. That means extra work if we want to adapt our code to use a different database.

I have written a tester to check the functionality, OracleDataSourceConnectorTest. See it on GitHub for full reference. There are only a few things that I want to stress here.
public class OracleDataSourceConnectorTest {
    private static final String URL = "jdbc:oracle:thin:@localhost:1521/orclpdb";  // 1
    private static final String USER = "hr";
    private static final String PASSWORD = "hr";

    private static OracleDataSourceConnector ods;
    private static Connection conn;

    @BeforeClass
    public static void setUp() {  // 2
        try {
            ods = new OracleDataSourceConnector(URL, USER, PASSWORD);
            conn = ods.getConnection();
        } catch (SQLException e) {
            fail(e.getMessage());
        }
    }

    // ...
}
1. I'm using the thin Oracle JDBC driver; the other choice is the OCI one. See the Oracle documentation if you wonder which one to use. The short answer is: usually thin is the one you want to pick. My database is local, on the standard port, and the service is named orclpdb. Your setup may vary.
2. I set up the connector and a connection through this static method, called only once before the tests in the class are run. The sense is that I don't want to repeat expensive operations without a reason. So, when nothing is against it, I reuse connector and connection in multiple tests.

And here is a test that requires its own connector, because I want to perform a disruptive negative test:
@Test
public void testBadUser() {
 OracleDataSourceConnector connector = null;
 try {
  connector = new OracleDataSourceConnector(URL, "Unknown", PASSWORD);  // 1
 } catch (SQLException e) {
  fail(e.getMessage());
 }
 
 try {
  connector.getConnection();  // 2
  fail("No connection expected for unknown user");
 } catch (SQLException e) {
  String expectedState = "72000";
  assertEquals(expectedState, e.getSQLState());
  
  int expectedCode = 1017;
  assertEquals(expectedCode, e.getErrorCode());
 }
}
1. I pass a bad user name to the connector. No failure is expected here, since no connection is actually made.
2. I expect the failure to happen here. Oracle should react with an ORA-01017 error code, which falls under SQL state 72000, SQL execute phase errors.

The other test ensures that I can actually get to the database:
@Test
public void testGetDatabaseNameVersion() {
 try {
  String expected = "Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production";
  String actual = ods.getDatabaseVersion(conn);
  assertEquals(expected, actual);
 } catch (SQLException e) {
  fail(e.getMessage());
 }
}
Using DriverManager

If I am not interested in the connection pooling service offered by OracleDataSource, and I prefer to stay as generic as I can, I can use this other solution. In the old days, it required an explicit registration of the JDBC driver:
Class.forName(klass)
Where klass is the actual class name, like "oracle.jdbc.driver.OracleDriver" or "com.mysql.jdbc.Driver".

Now this is done implicitly by the DriverManager, which leads to a very slim connector:
public class PlainConnector extends Connector {
 private String url;
 private String user;
 private String password;

 public PlainConnector(String url, String user, String password) {
  this.url = url;
  this.user = user;
  this.password = password;
 }

 @Override
 public Connection getConnection() throws SQLException {
  return DriverManager.getConnection(url, user, password);
 }
}
The constructor just stores a copy of the data needed for connecting to the database; getConnection() uses it going through DriverManager.

I have written another test case to check that the plain connector actually works like the OracleDataSource one. And then another one to see what I have to change to access a different database, here MySql, with the same class.

This test case is accessing a MySql database that I have installed locally and on which I have added a user named "hr", to keep it close to the Oracle one. You can see how minimal the required changes are.

I had to change the URL:
private static final String URL = "jdbc:mysql://localhost:3306/hr?useSSL=false";
Notice the parameter useSSL set to false, to remove the MySql warning "Establishing SSL connection without server's identity verification is not recommended." In this test we can live without identity verification.

In case of bad user, I expect now 28000 as SQL state and 1045, access denied, as error code.

And the database version should now be something like "5.7.19-log".

Ah, right. Remember to put in your application build path the jars for Oracle JDBC (currently named ojdbc8.jar) and for MySql (mysql-connector-java-5.1.43-bin.jar), and to have the databases up and running, if you want your tests to succeed.

Reference: Oracle Database JDBC Developer's Guide 12c Release 2

Full Java code on GitHub.


Installing and configuring Oracle 12c

It had been a while since I last installed an instance of the Oracle database for local use. Not much has changed in the process, with the notable exception of the introduction of pluggable databases. Here I jot down a few remarks I made during the setup of version 12 release 2.

1. In the page Typical Installation we have to specify the name for our pluggable database. Remember your choice!
2. The Enterprise Manager Database Express is available on https://localhost:5500/em, access it through the SYS user (with the password specified during setup)
3. Add to the tnsnames.ora (in the network\admin directory under the database home) the network configuration settings to access the pluggable database we created in (1). Something like:
PDBORCL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = localhost)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = pdborcl)
    )
  )
4. Use sqlplus to connect to the pluggable database with SYS as sysdba:
sqlplus sys/Oracle_1@pdborcl as sysdba;
5. As usual, unlock the hr user to access the test tables:
alter user hr identified by hr account unlock;
So far so good. All standard stuff. A problem arises if you plan to stop the Oracle service, which should be named OracleServiceORCL. Since it is always memory hungry, sometimes I just want it to stay out of my RAM. The first time I did it, I had the surprise of getting a nasty error when I tried to connect to the database after restarting. Basically, the issue was that the pluggable database was mounted but not open.

This is the recipe I followed to solve it:

1. From the shell, enter sqlplus without logging.
sqlplus /nolog
2. In sqlplus, connect as sysdba.
connect / as sysdba
3. Brutally shutdown the database.
shutdown abort
4. Restart it.
startup mount
5. Recover the database, then open it.
recover database 
alter database open;
6. Open all the pluggable databases, and then ask them to remember their state, so that we don't have to repeat the maneuver each time.
alter pluggable database all open;
alter pluggable database pdb_name save state;


CodeEval Minimum Path Sum

Given a square matrix of integers, find the minimal path sum from top-left to bottom-right, assuming that we can only move down or right.
This is CodeEval problem #72.

For example, given this matrix
4 6
2 8
We have just two possible paths: 4 -> 6 -> 8 and 4 -> 2 -> 8. It is immediate to see that the second one leads to the minimum sum of 14.

The structure of the problem recalls a Dynamic Programming solution table. I found it quite straightforward to use this thought to write a first solution:
def solution(data):  # 1
    dp = [[float(math.inf)] * (len(data) + 1)]  # 2
    for i in range(len(data)):
        dp.append([math.inf])
        dp[i+1].extend(data[i])

    end = len(dp)
    for i in range(1, end):  # 3
        for j in range(1, end):
            if i == 1 and j == 1:  # 4
                continue
            dp[i][j] += min(dp[i-1][j], dp[i][j-1])  # 5
    return dp[i][j]  # 6
1. The input parameter data contains a list of list of integers, representing the input matrix.
2. I copy the input into a DP-style table, where the first row and column have fake values. Usually, in a DP problem, zero is used for those cells. However, here we need a value that can't be a candidate as minimum value. In a C-family language, I would have chosen a constant representing the maximum available integer. Python 3 has nothing like that, so I use math.inf, the floating-point positive infinity representation.
3. Calculate all the possible minimal path sum, skipping the first row and column.
4. This check looks unsatisfactory to me, but it is the least confusing way I found to say that I don't have to do anything for the top-leftmost element of the original matrix.
5. All the other cells are adjusted adding the smallest element on the immediate left or up.
6. There is nothing else to do, but returning the value in the bottom-right cell.

I found this solution elegant enough and easy to understand. However, I decided to write an alternative one to get rid of the buffer table, and also to navigate the table backwards, from bottom-right to top-left.
def solution_2(data):
    last = len(data) - 1
    for i in range(last, -1, -1):
        for j in range(last, -1, -1):  # 1
            if i == last and j == last:  # 2
                continue
            if i == last:  # 3
                data[i][j] += data[i][j+1]
            elif j == last:
                data[i][j] += data[i+1][j]
            else:
                data[i][j] += min(data[i + 1][j], data[i][j+1])
    return data[0][0]  # 4
1. The loop variables are initially set to the last row/column and are decreased down to zero.
2. Again, nothing has to be done for the first cell.
3. Not having the extra-row/column, I have to add an extra check in the code for both i and j.
4. In this case the solution is stored in the top-left cell.

Not using another table should lead to a performance improvement, at least if large matrices are expected in input, and its cost in terms of readability looks bearable to me. Scanning the table backwards doesn't seem to give any advantage and makes the code a bit more obscure, so I would probably get rid of it.

I pushed the full python code on GitHub.


Using JdbcTemplate in Spitter Web App

In the previous post I developed a CommandLineRunner Spring App to show how to use JdbcTemplate to simplify JDBC access to an embedded instance of the H2 database. Here I port the changes to the Spittr Web App I have built up previously.

Actually, there is not much to say. There is some inconsistency between the two apps that spiced up the process, but nothing really complicated. I'd say the most interesting part is in the POM file. Here I removed, whenever possible, the version for each single dependency, relying instead on the parent element, where they are defined.
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>1.5.4.RELEASE</version>
    <relativePath />
</parent>
However, for some reason, Spring could not find the version for the JSP dependency, so I kept it as before
<dependency>
    <groupId>javax.servlet.jsp</groupId>
    <artifactId>javax.servlet.jsp-api</artifactId>
    <version>${jsp.version}</version>
</dependency>
And left the versioning for it among the properties.
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    <java.version>1.8</java.version>
    <jsp.version>2.3.1</jsp.version>
</properties>
I pushed the new version of Spittr on GitHub.


Spring and JdbcTemplate

Let's say that we want to write a Spring application that requires some simple database activity. We decide to use an embedded database, H2, and, instead of choosing a higher level persistence framework, we go for plain JDBC support, mediated by JdbcTemplate, a Spring class aimed at simplifying its use.

I create a Spring project using the STS wizard (File >> New >> Spring Starter Project).
Since I want to use JDBC and the H2 database support, I select these two options from the dependencies page of the wizard.
From database schema to POJOs

In the src/main/resources folder I put a file named schema.sql. This is enough for Spring, which knows it has to pick it up and run it at startup, so there I put the SQL commands to create my tables. I create two tables, spitter and spittle, connected by the id of the first, which is a foreign key in the second.

The tables defined in the database are reflected in a couple of plain old Java classes, Spitter.java and Spittle.java, whose fields mirror the table columns.

There is one thing that I think is interesting. The SQL spittle table has, among its other fields, a datetime, postedTime. We need to find a counterpart for it in the Java world. Since we are not interested here in time zones and the like, just in a plain timestamp, I decided to use the Java 8 java.time.Instant.

That's it. If the SQL stuff stays simple, and I have just a couple of tables, I could call it done. To show that, I have created a basic version of the SpringBootApplication that implements the CommandLineRunner interface:
@SpringBootApplication
public class UsingJdbcApplication implements CommandLineRunner {
    private static final Logger log = LoggerFactory.getLogger(UsingJdbcApplication.class);

    @Autowired
    private JdbcTemplate template;  // 1

    public static void main(String[] args) {
        SpringApplication.run(UsingJdbcApplication.class, args);
    }

    @Override
    public void run(String... args) throws Exception {
        Long counter = template.queryForObject("SELECT count(id) FROM spitter", Long.class);  // 2
        log.info("Rows on spitter table: " + counter);

        log.info("Pushing a spitter in");
        Object[] admin = new Object[] { "admin", "password", "Admin", "admin@example.dd" };
        template.update("INSERT INTO spitter(username, password, fullname, email) VALUES(?, ?, ?, ?)", admin);  // 3

        template.query("SELECT * FROM spitter WHERE username = ?", new Object[] { "admin" },  // 4
                (rs, rowNum) -> new Spitter(rs.getLong("id"), rs.getString("username"), rs.getString("password"),
                        rs.getString("fullname"), rs.getString("email"), rs.getBoolean("updateByEmail")))
                .forEach(spitter -> log.info("Spitter: " + spitter.toString()));
    }
}
1. Sort of magic. We ask Spring to wire this JdbcTemplate object. Since the H2 jar is on the classpath, it understands it has to use it as the database. We don't provide any settings, so it uses a whole bunch of defaults. Usually not a good idea for production code, but very handy for prototyping.
2. Let's run a simple query that returns a single value. This line shows that Spring has actually initialized the database using the script named schema.sql found in the main resources folder, and how JdbcTemplate makes our life so much simpler. Notice that queryForObject() takes as first parameter the SQL statement we want to run, and as second the type of the expected return value.
3. A parametrized update(). Its first parameter is a parametrized JDBC SQL insert statement, the second one a vector of objects containing the values that we want to use in the query.
4. Something a bit more complicated. The query() method executes a query in a way similar to the update() seen above but, as third parameter, it accepts a lambda that maps each row of the resulting result set to a new Spitter object. We typically have no use for the rowNum parameter; we just access any result set field we need.
This query() call returns a list of Spitter. On that list we apply a forEach() so that each element is logged. Actually, here we are expecting just one Spitter; however, this is a common pattern that we should learn to recognize and use.

Running the Spring Boot App, I can see in the console window the expected log.

Repositories and configuration

Usually we don't want to scatter SQL statements all over our Java code. So we create two interfaces, one for Spittle and one for Spitter, acting as repositories for them, and then we implement them for our JDBC provider.
The two classes are quite similar. Let's see some parts of JdbcSpittleRepository, that is a tad more complicated.
// ...
private static final String SELECT = "select sp.id, s.id as spitterId, s.username, s.password, s.fullname, s.email, s.updateByEmail, sp.message, sp.postedTime from Spittle sp, Spitter s where sp.spitter = s.id";
private static final String SELECT_RECENT = SELECT + " order by sp.postedTime desc limit ?";

public List<Spittle> findRecent(int count) {
    return template.query(SELECT_RECENT, new SpittleRowMapper(), count);  // 1
}

// ...
private long insertSpittle(Spittle spittle) {
    SimpleJdbcInsert insert = new SimpleJdbcInsert(template).withTableName("Spittle");  // 2
    insert.setGeneratedKeyName("id");
    Map<String, Object> args = new HashMap<String, Object>();
    args.put("spitter", spittle.getSpitter().getId());
    args.put("message", spittle.getMessage());
    args.put("postedTime", Timestamp.from(spittle.getPostedTime()));
    long spittleId = insert.executeAndReturnKey(args).longValue();
    return spittleId;
}

// ...
private static final class SpittleRowMapper implements RowMapper<Spittle> {  // 3
    public Spittle mapRow(ResultSet rs, int rowNum) throws SQLException {
        long id = rs.getLong("id");
        String message = rs.getString("message");
        Instant postedTime = rs.getTimestamp("postedTime").toInstant();
        long spitterId = rs.getLong("spitterId");
        String username = rs.getString("username");
        String password = rs.getString("password");
        String fullName = rs.getString("fullname");
        String email = rs.getString("email");
        boolean updateByEmail = rs.getBoolean("updateByEmail");
        Spitter spitter = new Spitter(spitterId, username, password, fullName, email, updateByEmail);
        return new Spittle(id, spitter, message, postedTime);
    }
}
1. Almost an innocent JdbcTemplate wrapper around a JDBC query, if we don't pay much attention to the second parameter, which converts the ResultSet coming from the SQL select into a list of Spittles. See the SpittleRowMapper below for details.
2. Here we see how SimpleJdbcInsert helps us write readable code when we want to insert a row into a table, in the case where the database takes the burden of generating a key for us and we need to return it to the caller. After specifying the table we are working on, we set the name of the field that is a generated key and then, as a map, the field names and associated values. Finally, we just pass them to SimpleJdbcInsert via executeAndReturnKey().
3. The SpittleRowMapper seen in action in (1). Its mapRow() method maps a single row of the underlying table into a Spittle object.

In some way, Spring has to configure these repositories, passing them a viable JdbcTemplate object. In this app it is done by JdbcConfig, annotated as a Spring Configuration class. It generates a few beans (in the Spring sense of the term) representing the DataSource, the JdbcTemplate, the repositories, and also a transaction manager. We'll see them at work in the tester. It is interesting to see how the DataSource is set up when we don't rely on Spring's default initialization.
@Configuration
public class JdbcConfig {
    @Bean
    public DataSource dataSource() {
        return new EmbeddedDatabaseBuilder()
                .setType(EmbeddedDatabaseType.H2)
                .addScripts("schema.sql", "test-data.sql")
                .build();
    }
 
    // ...
This data source represents an embedded database, H2 type, that runs a couple of scripts at startup. We have already seen the first one, schema.sql; here I added a second one, containing a bunch of SQL inserts for spitter and spittle rows.

Testing

There are a couple of JUnit tests, one for the Spitter and one for the Spittle repository. They are quite similar; let's have a quick look at the first one.
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(classes = JdbcConfig.class)  // 1
public class JdbcSpitterRepositoryTest {
    @Autowired
    JdbcSpitterRepository spitterRepository;  // 2

    // ...
    @Test
    @Transactional  // 3
    public void findAll() {
        List<Spitter> spitters = spitterRepository.findAll();  // 4
        assertEquals(4, spitters.size());
        assertSpitter(0, spitters.get(0));
        assertSpitter(1, spitters.get(1));
        assertSpitter(2, spitters.get(2));
        assertSpitter(3, spitters.get(3));
    }

    //...
}
1. Here I'm telling Spring to fetch the JdbcConfig class and use it to configure itself for testing.
2. Thanks to (1), we can ask Spring to autowire the spitter repository.
3. Having defined a PlatformTransactionManager, we can use it to keep transaction management very simple. A transaction is started and completed, possibly rolled back, behind the curtains.
4. The spitters come from the test-data.sql script run at startup by JdbcConfig's dataSource().

Reference: Hitting the database with Spring and JDBC, from Spring in Action, Fourth Edition by Craig Walls, chapter ten.

This complete Spring application is on GitHub.


CodeEval Word Search

Given a board with a letter in each cell, check if a given word is in it. We can start anywhere, but we can move only horizontally and vertically, and each cell can be used only once. More details on the CodeEval page.

The board is fixed, and looks like this:
ABCE
SFCS
ADEE
I have converted the examples provided in the problem into a python test case:
class TestCodeEval(unittest.TestCase):
    def test_provided_1(self):
        self.assertEqual(False, solution('ASADB'))

    def test_provided_2(self):
        self.assertEqual(True, solution('ABCCED'))

    def test_provided_3(self):
        self.assertEqual(False, solution('ABCF'))
For instance, it should be easy to see that we can't follow ASADB on the board. After ASAD there is an 'F' that blocks the way to the final 'B'.

I have converted the board into a couple of dictionaries:
WHERE = {'A': {(0, 0), (2, 0)},
         'B': {(0, 1)},
         'C': {(0, 2), (1, 2)},
         'D': {(2, 1)},
         'E': {(0, 3), (2, 2), (2, 3)},
         'F': {(1, 1)},
         'S': {(1, 0), (1, 3)}}

NEIGHBORS = {
    (0, 0): {(0, 1), (1, 0)},
    (0, 1): {(0, 0), (1, 1), (0, 2)},
    (0, 2): {(0, 1), (1, 2), (0, 3)},
    (0, 3): {(0, 2), (1, 3)},

    (1, 0): {(0, 0), (1, 1), (2, 0)},
    (1, 1): {(1, 0), (0, 1), (2, 1), (1, 2)},
    (1, 2): {(1, 1), (1, 3), (0, 2), (2, 2)},
    (1, 3): {(1, 2), (0, 3), (2, 3)},

    (2, 0): {(1, 0), (2, 1)},
    (2, 1): {(2, 0), (2, 2), (1, 1)},
    (2, 2): {(2, 1), (1, 2), (2, 3)},
    (2, 3): {(2, 2), (1, 3)}
}
The first one, WHERE, is a pure utility that lets me easily find where each letter is on the board.
The second one, NEIGHBORS, is a plain adjacency list, and should hint that I see a graph in this problem description. Given the adjacency rules of this board it is not strictly necessary, but it helped me think through the solving algorithm.

My solution is based on a graph search algorithm, adapted to the peculiarities of the problem. For each letter in the word, I check the possible continuations, backtracking when I don't find a good way, until the word is consumed or I can't find any way to continue the search.

After some startup code, I only need a recursive function that moves from one step to the next one:
def find(node, visited, letters):  # 1
    if not letters:  # 2
        return True

    letter = letters.popleft()  # 3
    nodes = WHERE[letter] & NEIGHBORS[node]  # 4
    for node in nodes:
        if node not in visited:  # 5
            visited.add(node)
            if find(node, visited, letters):  # 6
                return True
            visited.remove(node)
    letters.appendleft(letter)  # 7
    return False
1. I pass to the function the current position on the board, then the set of already visited nodes, and finally a deque containing the letters still to be found.
2. No letter pending means I have already found the word. Job done.
3. Extract the first letter I have to find from the deque.
4. The intersection between the set of cells where the letter is present and the set of nodes adjacent to the current one gives me the nodes where to look for the next step.
5. But only the nodes that I have not already visited are good candidates.
6. Recursive call.
7. If this branch is not satisfactory, I push back the letter in the deque and return False.
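Putting everything together: the startup code mentioned above is not shown, so what follows is a self-contained sketch where I derive WHERE and NEIGHBORS from the board instead of hard-coding them, and where the driver that tries find() from every cell holding the first letter is my guess at the missing part:

```python
from collections import deque

BOARD = ['ABCE',
         'SFCS',
         'ADEE']
ROWS, COLS = len(BOARD), len(BOARD[0])

# where each letter appears on the board
WHERE = {}
for r in range(ROWS):
    for c in range(COLS):
        WHERE.setdefault(BOARD[r][c], set()).add((r, c))

# plain adjacency list: horizontal and vertical neighbors only
NEIGHBORS = {(r, c): {(r + dr, c + dc)
                      for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                      if 0 <= r + dr < ROWS and 0 <= c + dc < COLS}
             for r in range(ROWS) for c in range(COLS)}

def find(node, visited, letters):
    # same logic as above: consume a letter, try all viable neighbors
    if not letters:
        return True
    letter = letters.popleft()
    for nxt in WHERE.get(letter, set()) & NEIGHBORS[node]:
        if nxt not in visited:
            visited.add(nxt)
            if find(nxt, visited, letters):
                return True
            visited.remove(nxt)
    letters.appendleft(letter)
    return False

def solution(word):
    letters = deque(word)
    first = letters.popleft()
    return any(find(start, {start}, deque(letters))
               for start in WHERE.get(first, set()))

print(solution('ABCCED'))  # True
print(solution('ASADB'))   # False
```

Deriving the two tables programmatically also makes the sketch work unchanged for boards of other sizes.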

Full code, both test and actual python script, is on GitHub.


CodeEval Levenshtein Distance

We are given in input a short list of words, a separator, then another huge list of words. For each word in the first list, we have to return the number of elements in its "social network" from the second list. A word is "friend" of another one if their Levenshtein distance is one, meaning they have just one letter of difference, as "aa" with "aaa", or "aaa" with "aba". Note that the example provided by CodeEval, when tested against the file they pushed on GitHub, is slightly wrong.

The problem is conceptually simple; its main issue is performance. Let's first write a straightforward solution that scores poorly even when implemented in C++, and doesn't even get accepted in my python implementation, due to its slowness.

The definition of friendship given in this problem should make us think about graphs and cliques. Fortunately we are not asked to get the maximum clique in the network, but only the clique for the few specified nodes. So we should expect "just" a Big-Oh N squared time complexity on the number of elements in the full social network that, unfortunately, we know to be large.

Still, on top of that N x N, we should consider the cost of calculating the Levenshtein distance for each couple of words. That problem leads almost naturally to a Dynamic Programming solution; however, also in this case we should expect an expensive computation, give or take in the order of m x n, where m and n are the lengths of the two words.

A Levenshtein distance of one

But, wait a minute: we are not really interested in the Levenshtein distance itself, just in determining whether two words have it equal to one. This is a much simpler problem, and we can even break it in two parts. If the two words have the same size, we just check that there is one and only one different character in them, i.e., that we can pass from one string to the other with a single char swap:
def is_swap(lhs, rhs):
    difference = False
    for left, right in zip(lhs, rhs):  # 1
        if left != right:  # 2
            if difference:
                return False
            difference = True
    return difference  # 3
1. In production code I would have asserted the precondition len(lhs) == len(rhs); here I just rely on the caller to ensure it. Assuming the same size for lhs and rhs, I zip them, extracting the characters at the same position and putting them in the loop variables, left and right.
2. If left is not equal to right, I raise the difference flag. But if it was already raised, I know the difference is too big, and I can stop checking.
3. If there is one and only one difference, I return True.
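A few quick checks on the is_swap() behavior (the function is repeated verbatim so the snippet stands alone):

```python
def is_swap(lhs, rhs):
    # precondition (enforced by the caller): len(lhs) == len(rhs)
    difference = False
    for left, right in zip(lhs, rhs):
        if left != right:
            if difference:  # second mismatch: distance is more than one
                return False
            difference = True
    return difference

print(is_swap('aaa', 'aba'))  # True: exactly one character differs
print(is_swap('aaa', 'aaa'))  # False: identical strings, distance zero
print(is_swap('abc', 'axy'))  # False: two characters differ
```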

The other part of my simplified Levenshtein check: if I have two strings whose sizes differ by one, the longer one should be the same as the shorter, but with one extra character.
def is_add(shorter, longer):
    if len(shorter) > len(longer):  # 1
        shorter, longer = longer, shorter

    x = 0  # 2
    i = 0  # 3
    while i < len(shorter):
        if shorter[i] != longer[i+x]:  # 4
            if x == 1:
                return False
            x += 1
        else:
            i += 1

    return True if x < 2 else False  # 5
1. I assume the caller ensures the precondition on the parameter sizes, but not which is which. This simplifies the caller code, making this function a bit less readable. Anyway, the point of this check is ensuring that the string named shorter actually refers to the shorter one in input.
2. Counter for the extra characters found in the longer string.
3. Loop variable on the shorter string. The longer one will be ahead by x positions.
4. I found a difference between the strings. Increase the difference counter but, if I had already seen a difference, I know the two strings are too different to be friends.
5. I return True if x is either 0 or 1 here. Zero means a case like "aa" vs. "aax": each comparison along the length of shorter succeeded, but longer has one character more. One means the difference is in the middle. More than one difference is trapped inside the while loop.

Setting up the graph

My first idea was to generate the full social network graph, and then look on it for the friends of the requested nodes. Say that words is a dictionary; each word I get in input is pushed into it as a key, initializing its value with an empty set:
for test in file:
    words[test.rstrip('\n')] = set()
Then I call a utility function, check_friends(), to push each friend in its place - seeing it as a graph, the node is in the key, the edges are in the value.
def check_friends():
    for lhs in words.keys():  # 1
        for rhs in words.keys():
            if (len(lhs) == len(rhs) and is_swap(lhs, rhs) or  # 2
                    abs(len(lhs) - len(rhs)) == 1 and is_add(lhs, rhs)):  # 3
                words[lhs].add(rhs)
                words[rhs].add(lhs)
1. The infamous N x N loop: I check each word against all the other ones. Actually, I also let is_swap() detect when a word is tested against itself.
2. Same size: let is_swap() decide if the words are friends.
3. One character of difference in size: ask is_add() for friendship.

Getting the clique

Checking if a given word is in the full social network is easy and fast, since the words are stored in a dictionary. Then I apply a Breadth-First Search (BFS) algorithm to navigate the graph, collecting all the friends and friends of friends:
def solution(word):
    results = []  # 1
    visited = set()  # 2

    net = deque()  # 3
    if word not in words.keys():  # 4
        return 0
    net.append(word)  # 5
    visited.add(word)

    while net:  # 6
        friend = net.popleft()
        fofs = words[friend]
        for fof in fofs:  # 7
            if fof not in visited:
                net.append(fof)
                visited.add(fof)
        results.append(friend)  # 8

    return len(results)
1. Here I am going to push all the elements in the word's social network.
2. This set avoids infinite looping; it's the equivalent of marking a node as visited.
3. In this queue I push the nodes in the personal net of the current node I am checking.
4. The problem doesn't officially state that the caller can ask only for a word in the list, so I added this check. In any case, it is cheap.
5. Initialize the algorithm before starting: the passed word is ready to be processed, and it is marked as visited.
6. Loop while there are nodes to be processed.
7. Push each friend of the current friend into the net, marking it as visited.
8. Push the current friend in the results set.

This approach works all right, and I'm quite happy with it. However, for the way the problem is conceived, it is too slow, and we need to improve its performance. It is clear where the bottleneck is: generating the graph for the full social network is very expensive. Do we really need to do it? Usually it would be a good idea but, here, we know that we need to calculate the social network just for a handful of words. So, let's avoid doing it.

A slimmer algorithm

First change: words is not a dictionary anymore, but just a set. This means there is no value associated to each element; when I read a word, I just push it into the set, and that's it.
for test in file:
    words.add(test.rstrip('\n'))
Also, there is no need to call check_friends(); instead I do almost the same job in a get_friends() function, but only for the relevant items:
def get_friends(item):
    result = []
    for word in words:
        if (len(item) == len(word) and is_swap(item, word) or
                abs(len(item) - len(word)) == 1 and is_add(item, word)):
            result.append(word)
    return result
The changes in solution() are minimal:
def solution(word):
    results = set()
    visited = set()

    net = deque()
    if word not in words:
        return 0
    net.append(word)
    visited.add(word)

    while net:
        friend = net.popleft()
        fofs = get_friends(friend)
        for fof in fofs:
            if fof not in visited:
                net.append(fof)
                visited.add(fof)
        results.add(friend)

    return len(results)
And now CodeEval happily accepts the code. I pushed both my python solutions to GitHub: the complete but slow one, and the sleeker one, tailored to the problem requirements.


CodeEval Type Ahead

Given the size of an n-gram in input, and its first n-1 components, we should write a function that returns the predicted possible last words, sorted by their calculated probability (from higher to lower) and then alphabetically. The solution assumes we generate our n-grams against the nursery rhyme Mary had a little lamb, which we can safely hard-code in our program, stripped of all non-words. More details on the CodeEval problem page.

I plan to write a python solution. So, first thing, I write a python test case:
def test_provided_1(self):  # 1
    self.assertEqual('lamb,0.375;teacher,0.250;children,0.125;eager,0.125;rule,0.125', solution(2, 'the'))

def test_lamb(self):
    self.assertEqual('at,0.200;its,0.200;love,0.200;was,0.200;you,0.200', solution(2, 'lamb'))

def test_the_lamb(self):
    self.assertEqual('love,0.333;was,0.333;you,0.333', solution(3, 'the lamb'))

def test_at(self):  # 2
    self.assertEqual('school,1.000', solution(2, 'at'))
1. The example as given. Looking at the Mary & lamb rhyme, the possible resulting two-grams when the first word is 'the' could be 'the lamb', 'the teacher', 'the children', 'the eager', 'the rule'. The most probable one, using 'lamb', has an estimated 0.375 chance. The least probable ones, all having the same chance, should be presented in alphabetical order.
2. Just a single two-gram can be generated from 'at'. Its probability is obviously 1. Notice the format for probability: three decimals are always required.

The most boring part of the problem was converting the rhyme into a list of words, something like
TEXT = ['Mary', 'had', 'a', 'little', 'lamb', 'its', 'fleece', 'was',
# ...
        'teacher', 'did', 'reply']

I found out by running the CodeEval tester that I don't have to worry about case-insensitive comparison; upper and lower case are as found in the rhyme.

Then I need a dictionary, where I will put all the n-grams in the rhyme
n_grams = {}
I decided to be lazy. Actually, I'd say that in this case being lazy is the only sensible thing to do, if I don't want to create n-grams that are not required by the user. So I write a function that checks if a given n-gram family has already been generated. If not, it is created and pushed in the dictionary:
def check_n_grams(n):
    if n in n_grams:  # 1
        return

    item = defaultdict(lambda: defaultdict(int))  # 2
    for i in range(len(TEXT) - n):  # 3
        outer = ' '.join(TEXT[i:i + n])  # 4
        intern = TEXT[i + n]
        item[outer][intern] += 1  # 5

    n_grams[n] = item
1. The n-grams for the current n have already been generated; there is nothing to be done here.
2. Let's generate all the n-grams for the passed n. We need here a bit of a perplexing data structure, a dictionary of dictionaries (itself going to be pushed in a dictionary). Moreover, to simplify the following code a bit, I used defaultdicts. I can't pass defaultdict(int) directly to initialize the outer defaultdict, because that is an instance, not the zero-argument callable it expects as factory. So I used the trick of passing a lambda (which is a callable) that returns a new defaultdict(int).
3. Loop on all the words in TEXT, stopping a tad before the end; see below if you haven't already guessed why. This code spectacularly fails if the user has the insane, but legit, idea of asking for an extremely high n. For production code a check on its size would be mandatory.
4. Generate the current n-gram considering it divided in two parts: the first one, which I am going to store as key in the outer dictionary, is built by joining n words starting from the current one; the second, the key in the internal dictionary, represents the last word in the n-gram. So, actually, I am storing in these maps all possible (n+1)-grams.
5. Push the n-gram in the dictionary of dictionaries, setting the initial value to one, or increasing it if such a combination is already there.
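The dictionary-of-dictionaries construction can be seen in isolation on a toy text; in this sketch the names build and TEXT are mine, not from the original script:

```python
from collections import defaultdict

TEXT = ['the', 'cat', 'saw', 'the', 'cat', 'run']

def build(n):
    # outer key: n joined words; inner key: the following word; value: hit counter
    item = defaultdict(lambda: defaultdict(int))
    for i in range(len(TEXT) - n):
        item[' '.join(TEXT[i:i + n])][TEXT[i + n]] += 1
    return item

print(dict(build(1)['the']))      # {'cat': 2}
print(dict(build(2)['the cat']))  # {'saw': 1, 'run': 1}
```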

A large part of the job is done. The solution function does little more than call the utility function defined above and format the output.
def solution(n, given):
    n -= 1  # 1
    check_n_grams(n)  # 2

    target = n_grams[n][given]  # 3

    values = list(target.items())  # 4
    values.sort(key=lambda x: x[0])  # 5
    values.sort(key=lambda x: x[1], reverse=True)  # 6

    results = []  # 7
    population = sum(target.values())
    for value in values:
        results.append('{},{:.3f}'.format(value[0], value[1] / population))  # 8
    return ';'.join(results)
1. As seen in check_n_grams(), I generate in the n_grams dictionary the (n+1)-grams, splitting them into key-value pairs on the internal dictionary. I felt that this trick made the code more readable; however, here I pay its price with this obscure decrement. Again, production code should perform some parameter checking to avoid runtime disasters.
2. Ensure the current n-grams have been generated.
3. Get the internal dictionary. For instance, if I am looking for the 2-grams for 'at', in target I expect to get a dictionary containing just one element, where the key is the last word in the n-gram and the value is its hit counter. In this case, 'school': 1.
4. I need to do some sorting now, so I convert the items in target to a list, a data type sporting a handy sort() method that is stable; in this way I can call it repeatedly to get the expected combined effect.
5. First I sort values by the zeroth component, representing the key of the internal dictionary.
6. Then I sort it by the first component (index 1), the word counter, in reverse, so that the most seen comes first.
7. Format the data, put them in this temporary list, join its components and return them.
8. As required, the probability is formatted as a floating number with three decimals.
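The double sort of points 5 and 6 relies on list.sort() being stable. A quick standalone sketch, using the counts behind the first test above:

```python
# (candidate last word, hit counter) pairs, as stored in the internal dictionary
values = [('rule', 1), ('lamb', 3), ('eager', 1), ('teacher', 2), ('children', 1)]

values.sort(key=lambda x: x[0])                # alphabetical order first
values.sort(key=lambda x: x[1], reverse=True)  # then by counter; stability keeps ties alphabetical

print(values)
# [('lamb', 3), ('teacher', 2), ('children', 1), ('eager', 1), ('rule', 1)]
```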

Full code pushed on GitHub. Both the tester and the actual python script.


CodeEval Cash Register

Given in input the price of a certain item and the cash that the client has handed over to get it, we have to provide the cashier with a list of what he has to give back. More details in the problem page on CodeEval.

We assume the register has an infinite amount of coins and bills in the given denominations:
'PENNY': .01,
'NICKEL': .05,
'DIME': .10,
'QUARTER': .25,
'HALF DOLLAR': .50,
'ONE': 1.00,
'TWO': 2.00,
'FIVE': 5.00,
'TEN': 10.00,
'TWENTY': 20.00,
'FIFTY': 50.00,
'ONE HUNDRED': 100.00
To better clarify what we should do, the first thing I have done is converting the given examples into a Python test case:
def test_provided_1(self):
    self.assertEqual('NICKEL,PENNY', solution(15.94, 16))

def test_provided_2(self):
    self.assertEqual('ERROR', solution(17, 16))

def test_provided_3(self):
    self.assertEqual('ZERO', solution(35, 35))

def test_provided_4(self):
    self.assertEqual('FIVE', solution(45, 50))
The first number passed to the function is the price, the second one is the amount given in.

It shouldn't be too difficult to get a working solution in Python. The core of the problem is looping on all the available denominations, starting from the highest one down to the smallest, selecting each denomination the maximum possible number of times.

Since working with floating point numbers where a currency is expected is rarely a good idea, I have decided to store in my dictionary of denominations the values as multiples of cents. Something like this:
DENOMINATIONS = {
    'PENNY': 1,
    'NICKEL': 5,

    # ...

    'FIFTY': 5000,
    'ONE HUNDRED': 10000
}
Besides, I paid attention to converting the floating point numbers to integers so as to avoid truncation problems:
change = int(cash * 100) - int(price * 100)
However, the interesting part is the loop to check if a given denomination has to be used and how many times:
for name, value in sorted(DENOMINATIONS.items(), key=lambda x: x[1], reverse=True):
    # ...
I want to loop on all the items() in the DENOMINATIONS dictionary, to operate on its keys and values, where the key represents the name of the denomination.

I can't rely on the order chosen by Python in scanning the dictionary, I must start from ONE HUNDRED, and proceed in sorted fashion down to the PENNY. So I sorted() it.

I don't want to sort them by key, which wouldn't make any sense, but by value. So I specify as sort key the second component of each item, passing to Python a lambda that gets the current item in input and returns its second component (index 1, since we are 0-based).

As said above, I want to start from the highest value; that's why I specify that the sorting order is reversed.
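The loop body is left out above; this sketch is my guess at it, using divmod to take each denomination the maximum number of times (the actual solution on GitHub may differ in the details):

```python
DENOMINATIONS = {
    'PENNY': 1, 'NICKEL': 5, 'DIME': 10, 'QUARTER': 25,
    'HALF DOLLAR': 50, 'ONE': 100, 'TWO': 200, 'FIVE': 500,
    'TEN': 1000, 'TWENTY': 2000, 'FIFTY': 5000, 'ONE HUNDRED': 10000
}

def solution(price, cash):
    # work in integer cents to dodge floating point trouble
    change = int(round(cash * 100)) - int(round(price * 100))
    if change < 0:
        return 'ERROR'
    if change == 0:
        return 'ZERO'

    result = []
    for name, value in sorted(DENOMINATIONS.items(), key=lambda x: x[1], reverse=True):
        count, change = divmod(change, value)  # how many of this denomination fit
        result.extend([name] * count)
    return ','.join(result)

print(solution(15.94, 16))  # NICKEL,PENNY
print(solution(45, 50))     # FIVE
```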

The rest of the code should be quite simple to figure out. In any case, my full solution is on GitHub, both the test case and the actual python script.


A view on Spring error messages

We have seen in the previous post how to let Spring validate the data coming from the view against a bean (in the plain Java sense) defined in the model of our web app, thanks to the Java Bean Validation API in its Hibernate implementation. Now we see how to let the resulting view report the errors, so that the user can correct them.

First thing, let's slightly change the controller, so that an empty Spitter object is passed to the register view when we GET it.
@RequestMapping(value = "/register", method = GET)
public String showRegistrationForm(Model model) {
    model.addAttribute(new Spitter());
    return "registerForm";
}
Spring sees we want to push a new Spitter on the model. By default it names it after the class, first letter lowercase: "spitter".

The registerForm JSP now changes to use it. Our job is made simpler by a Spring tag library, form, which provides nice support for managing HTML forms. I associated the "sf" prefix with it, and I am going to use only a few of the many functionalities it makes available.
I have rewritten the form in this way:
<sf:form method="POST" commandName="spitter">  <!-- 1 -->
  First Name: <sf:input path="firstName" />    <!-- 2 -->
  <sf:errors path="firstName" />               <!-- 3 -->
  <br />
  Last Name: <sf:input path="lastName" />
  <sf:errors path="lastName" />
  <br />
  Email: <sf:input path="email" />
  <sf:errors path="email" />
  <br />
  Username: <sf:input path="username" />
  <sf:errors path="username" />
  <br />
  Password: <sf:password path="password" />    <!-- 4 -->
  <sf:errors path="password" />
  <br />
  <input type="submit" value="Register" />
</sf:form>
1. Instead of using the standard HTML form, I used the Spring Form version. Its commandName attribute (in the XML sense) gives the name of the model attribute that has to be used to populate the fields in the form. If we arrive here by GETting "/register", spitter is empty. If we arrive here by POSTing "/register" after validation errors were detected, spitter contains the fields as entered by the user.
2. The Spring Form input tag uses its attribute path to select the field from the commandName object that is associated to it. In this case, firstName.
3. The Spring Form errors tag is just like a plain text label, but linked to the Spring Validation Errors object generated by the validation process - when available. Again, its path attribute refers to the specific field in it.
4. The Spring Form password tag is a sort of Spring aware HTML input-password tag.

And with this, we could claim that the job is done. However, we would like better control over the messages shown to the user in case of errors. That is easily done, as is providing localized versions of those messages.

To do that I have created a file named ValidationMessages.properties in the src/main/resources folder, and its ValidationMessages_it.properties counterpart for its Italian version. Each line represents an error message. For instance:
firstName.size=First name must be between {min} and {max} characters long.
Notice the words included in curly braces. These are variables that Spring will resolve when creating the actual message.
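Under the hood, Hibernate Validator resolves {min} and {max} from the corresponding annotation attributes. Just to illustrate the idea, here is a naive sketch of that substitution (not the actual interpolation algorithm, which also handles escaping and expression evaluation):

```java
import java.util.Map;

public class MessageInterpolationSketch {
    // Naive stand-in for the interpolation Hibernate Validator performs:
    // replace each {name} token with the matching annotation attribute value.
    static String interpolate(String template, Map<String, Object> attributes) {
        String result = template;
        for (Map.Entry<String, Object> entry : attributes.entrySet()) {
            result = result.replace("{" + entry.getKey() + "}", entry.getValue().toString());
        }
        return result;
    }

    public static void main(String[] args) {
        String template = "First name must be between {min} and {max} characters long.";
        // For @Size(min = 2, max = 30), the resolved attributes would be:
        System.out.println(interpolate(template, Map.of("min", 2, "max", 30)));
    }
}
```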

I am going to use the custom messages in the Spitter class:
@Size(min = 2, max = 30, message="{firstName.size}")
private String firstName;
Now, if I push the Register button without entering anything, I get:
(Notice that there is no error message for the empty email address - no email is considered a valid alternative)

In a browser localized for Italy, I have this alternative result:
Reference: Rendering web views, from Spring in Action, Fourth Edition by Craig Walls. Chapter six.

I have pushed the project changes to GitHub.

Go to the full post

Using the Java Validation API

The user registration shown in the previous post has one very visible issue: there is no validation at all. The user could enter any value in each required field and the web app would happily register it. Using Java Validation we can easily solve this problem.

We need to add an implementation of the javax.validation validation API; the Hibernate Validator is a good candidate. So I add its dependency to my Spring Web App POM configuration file:
<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-validator</artifactId>
    <version>5.4.1.Final</version>
</dependency>
Then I modify the Spitter bean (in the sense of a plain Java Bean), annotating its properties to impose the rules I want the registration to observe.
For instance, I want the first name to be not null and having a size in [2 .. 30]:
@NotNull
@Size(min = 2, max = 30)
private String firstName;
NotNull and Size are standard Java Validation annotations. Since I use the Hibernate implementation of that JSR, I can also use a non-standard, Hibernate-specific annotation to enforce that the email field represents a valid e-mail address:
@NotNull
@Email
private String email;
These checks are enforced in the controller. I change the processRegistration() method in this way:
@RequestMapping(value = "/register", method = POST)
public String processRegistration(@Valid Spitter spitter, Errors errors) {  // 1
    if (errors.hasErrors()) {  // 2
        return "registerForm";
    }

    spitterRepository.save(spitter);  // 3
    return "redirect:/spitter/" + spitter.getUsername();
}
1. The Spitter input parameter is tagged with the Valid Java Validation annotation; in this way I'm telling the Hibernate Validator to check its fields according to the annotations specified on its data members. Any error found during validation is pushed into the Errors object.
2. In a real app I would do much more than simply check whether there is any validation error. For sure we need to say something to the user, to help them correct their mistakes. And it would probably be a good idea to log the errors.
3. Only if no validation error is detected do we go on to save the user and give a confirmation to the caller.

Reference: Validating form, from Spring in Action, Fourth Edition by Craig Walls. Chapter five, section 4.2.

The complete Java Spring Web App project is on GitHub.

Go to the full post

Spitter registration

We need to provide a way to let people register to our spittr web application. This is done here relying on the support offered by the Spring framework, which makes working with forms easy and gives a natural way of processing them.

The Model

A registered user is defined as a Spitter:
public class Spitter {

    private Long id;

    private String username;
    private String password;
    private String firstName;
    private String lastName;
    private String email;
    // ...
}
We'll need to check different Spitters for equality, so we override both equals() and hashCode(). I did it using Objects.equals() and Objects.hash() in their implementations. I feel they help to keep the code compact and readable without using external libraries.
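To illustrate the pattern outside the web app, here is a sketch of a hypothetical Spitter reduced to two fields for brevity (the real class compares more of them):

```java
import java.util.Objects;

public class EqualsHashCodeSketch {
    // Hypothetical reduced Spitter, kept to two fields for brevity
    static class Spitter {
        private final String username;
        private final String email;

        Spitter(String username, String email) {
            this.username = username;
            this.email = email;
        }

        @Override
        public boolean equals(Object that) {
            if (this == that) return true;
            if (!(that instanceof Spitter)) return false;
            Spitter other = (Spitter) that;
            // Objects.equals() is null-safe, no explicit null checks needed
            return Objects.equals(username, other.username)
                    && Objects.equals(email, other.email);
        }

        @Override
        public int hashCode() {
            // computed on the same fields used by equals()
            return Objects.hash(username, email);
        }
    }

    public static void main(String[] args) {
        Spitter a = new Spitter("jdoe", null);
        Spitter b = new Spitter("jdoe", null);
        System.out.println(a.equals(b));                  // true, even with a null field
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```

Note that hashCode() must use the same fields as equals(), so that equal spitters always produce the same hash.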

We'll have a repository for storing our spitters, but we don't need to fix its type here, just the interface that each spitter repository has to implement:
public interface SpitterRepository {
    Spitter save(Spitter spitter);
    Spitter findByUsername(String username);
}
Actually, just to do some smoke testing, it is nice to have a concrete repository to play with:
@Repository
public class JdbcSpitterRepository implements SpitterRepository {
    @Override
    public Spitter save(Spitter spitter) {
        // TODO: actual implementation
        return spitter;
    }

    @Override
    public Spitter findByUsername(String username) {
        // TODO: actual implementation
        return new Spitter(username, "password", "John", "Doe", "jdoe@somesite.dd");
    }
}
I assumed it will be refactored in the future to become a real JDBC repository, hence its name. Notice that the class is annotated as a Spring repository.

Data Configuration

Having introduced a new repository in the web app, we should tell Spring how to get it. To do that I added a bean (in the sense of a Spring Bean) to the already existing DataConfig Java class, annotated as a Spring configuration.
@Configuration
public class DataConfig {
    // ...

    @Bean
    public SpitterRepository spitterRepository() {
        return new JdbcSpitterRepository();
    }
}
The View

The user enters data by filling a simple HTML form, something like this:
<form method="POST">
    First Name: <input type="text" name="firstName" /><br />
    Last Name: <input type="text" name="lastName" /><br />
    Email: <input type="email" name="email" /><br />
    Username: <input type="text" name="username" /><br />
    Password: <input type="password" name="password" /><br />
    <input type="submit" value="Register" />
</form>
Since no action attribute is specified in the form tag, the POST goes to the same URL originally accessed. Not much more to say on this part; you could see the full registerForm.jsp file on GitHub.

We'll also have a way to show a registration, accessing profile.jsp. Our Spring web app should take care of putting into a variable named spitter an instance of the Spitter class discussed above, representing the currently registered user. So, to display it using JSP EL, we'll write something like:
<c:out value="${spitter.username}" /><br />
<c:out value="${spitter.firstName}" />
<c:out value="${spitter.lastName}" /><br />
<c:out value="${spitter.email}" />
The Controller

As usual, the controller keeps model and view together. The Spitter controller reacts to calls based on "/spitter". We want a call to "/spitter/register" either to show the register form view or to redirect to "/spitter/xxx", where xxx is the username, according to the HTTP method used: GET for the first, POST for the second. Any "/spitter/xxx" GET call will lead to the profile view.
@Controller
@RequestMapping("/spitter")
public class SpitterController {  // 1
    private SpitterRepository spitterRepository;  // 2

    public SpitterController(SpitterRepository spitterRepository) {
        this.spitterRepository = spitterRepository;
    }

    @RequestMapping(value = "/register", method = GET)  // 3
    public String showRegistrationForm() {
        return "registerForm";
    }

    @RequestMapping(value = "/register", method = POST)  // 4
    public String processRegistration(Spitter spitter) {
        spitterRepository.save(spitter);
        return "redirect:/spitter/" + spitter.getUsername();
    }

    @RequestMapping(value = "/{username}", method = GET)  // 5
    public String showSpitterProfile(@PathVariable String username, Model model) {  // 6
        Spitter spitter = spitterRepository.findByUsername(username);  // 7
        model.addAttribute(spitter);
        return "profile";
    }
}
1. This class is annotated as Spring controller, and it maps calls rooted in "/spitter".
2. We are going to use a repository that has to implement the SpitterRepository.
3. Maps a GET to "/spitter/register" to the registerForm view.
4. Redirects a POST to "/spitter/register" to the "/spitter/xxx" view. A spitter is expected as parameter, and it is going to be saved to the repository before the redirect is performed.
5. Maps a GET to "/spitter/xxx" to the profile view.
6. The username is extracted from the path by the PathVariable Spring binding annotation.
7. The repository is asked to give the spitter associated with the username, then we push it to the model.

Testing

The controller design has been driven by a couple of mock JUnit tests, using Mockito, defined in the SpitterControllerTest class. Besides, having provided a (fake) concrete implementation for the spitter repository, it is possible to run the web app and see its general behavior.

So, GETting "/spitter/register", we'll see the input form:
As soon as we push the Register button, a POST will be generated for the same URL. As seen above, the controller will redirect to "/spitter/jdoe".

So, our app works, at least in a sort of demo mode. It is easy to spot how weak it is, however. We are going to improve its robustness in the next post.

Reference: Processing forms, from Spring in Action, Fourth Edition by Craig Walls. Chapter five, section four.

Full Java Spring code available on GitHub.

Go to the full post

Showing a single Spittle

In our Spring Spitter Web App we want to display any single Spittle just by passing its id. This could easily be done by accepting a request in the form "spittles/show?id=42", following the style we have just used for getting a list of spittles, but we don't like it much here, since we feel that "spittles/42" is more expressive for accessing a resource.

Good for us, the Spring Framework has a handy approach to this problem.

Stripping down the view to the limit, what we want to get is something like this:
I got it through a JSP page with JSTL tags, spittle.jsp, accessing an attribute named spittle that contains a Spittle object. Besides that minimal JSP, I added a mock test to ensure the new functionality works fine. I don't have much to say about them, and anyway you can see them on GitHub.

Let's see instead the new method added to SpittleController.
@RequestMapping(value = "/{spittleId}", method = RequestMethod.GET)
public String spittle(@PathVariable long spittleId, Model model) {
    model.addAttribute(spittleRepository.findOne(spittleId));
    return "spittle";
}
The spittle() method returns a string, the name of the view that should be generated for the caller, and expects two parameters, the spittle id and the model where we are going to push the Spittle object.
Notice that its first parameter is annotated as PathVariable, and notice also that the method is annotated as RequestMapping, with its value set to a path. The path uses curly brackets to signal to Spring which part is a variable that has to be extracted and used as a parameter.
Since the class itself is annotated with RequestMapping on "/spittles", the Spring framework is smart enough to understand what we want to do here.

Reference: Taking input via path parameters, from Spring in Action, Fourth Edition by Craig Walls. Chapter five, section 3.2.

You can find the code I have written on GitHub.

Go to the full post

Showing a paged list of spittles

As the next step in our little Spring Web App, we are going to add to the view a way to display spittles, specifying the "end" id (using the standard convention that it is excluded from the interval) and how many of them we want to show.

To do that, I first change the SpittleController so that its method mapped to GET accepts the two parameters, giving them default values so that the previous functionality still works.
@RequestMapping(method = RequestMethod.GET)
public List<Spittle> spittles(
        @RequestParam(value = "max", defaultValue = MAX_ID) long max,
        @RequestParam(value = "count", defaultValue = DEFAULT_COUNT) int count) {
    return spittleRepository.findSpittles(max, count);
}
Actually, this version looks quite different from the previous one. The return value here is a List of Spittles, the one that was put in the spittleList model attribute, so we might wonder how Spring could guess which JSP to call to generate a view.

However, internally they are equivalent - if we don't consider how findSpittles() is called. The job of determining the JSP page and the attribute name is done by the mighty Spring framework, which understands that the former should be extracted from the URI passed as a string to the class RequestMapping annotation, "/spittles", stripping the leading slash, and builds the latter from the method return type.

Just a minor nuisance: as the Java compiler states, "The value for annotation attribute RequestParam.defaultValue must be a constant expression". So I can't define MAX_ID by converting Long.MAX_VALUE to a String on the fly:
private static final String MAX_ID = Long.toString(Long.MAX_VALUE); // WON'T WORK!
I have to operate the conversion "by hand", instead.
private static final String MAX_ID = "9223372036854775807"; // Boring!
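A tiny standalone check (not part of the web app) confirms the hand-written literal really matches Long.MAX_VALUE:

```java
public class MaxIdCheck {
    // Annotation defaults must be compile-time constants, so the controller
    // uses this hand-written literal; here we verify it at runtime.
    private static final String MAX_ID = "9223372036854775807";

    public static void main(String[] args) {
        if (!MAX_ID.equals(Long.toString(Long.MAX_VALUE))) {
            throw new AssertionError("MAX_ID out of sync with Long.MAX_VALUE");
        }
        System.out.println("MAX_ID matches Long.MAX_VALUE");
    }
}
```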
Besides the already existing shouldShowRecentSpittles() test, that should still work, I added a new shouldShowPagedSpittles() test that is very close to the other one but passes parameters to the (mock) call to the controller. See the tester on GitHub for more details.

Given the fake implementation I gave to the JdbcSpittleRepository, I can also see a result from the browser.
Reference: Accepting request input, from Spring in Action, Fourth Edition by Craig Walls. Chapter five, section three.

Full code for this Spring project is on GitHub.

Go to the full post

From Model to View through a Spring Controller

Another improvement to our little Spittr Spring Web App. We'll fake a JDBC model, create a controller that gets some data from it and passes it to the view, a JSP page that uses EL to access the attribute.

In the end we want to display to the user a very simple web page like this one:
I won't enter into the details of the JSP page, which you can find in my GitHub repository. I'll just mention the fact that the controller puts a list of Spittles in an attribute named spittleList, and the page reads it via EL, looping through a core forEach JSTL tag.

The data is moved around in a POJO, spittr.Spittle, that contains the data identifying a spittle. There is not much to say about it, besides the fact that it has its own equals() and hashCode() implementations, respectively based on the equals() and hash() methods of the utility class Objects:
// ...

public class Spittle {
    private final Long id;
    private final String message;
    private final LocalDateTime time;
    private Double latitude;
    private Double longitude;

 // ...

    @Override
    final public boolean equals(Object that) {  // 1
        if(this == that) {
            return true;
        }
        if (!(that instanceof Spittle))
            return false;

        Spittle other = (Spittle) that;
        return Objects.equals(this.id, other.id) && Objects.equals(this.time, other.time);  // 2
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, time);  // 3
    }
}
1. I do not expect the class Spittle to be extended. In case a subclass were created, its implementation shouldn't modify the assumption that two spittles are considered equal if they have the same id and creation timestamp, as stated in this equals() method. To enforce this, I mark the method as final.
2. Using Objects.equals() saves me from the nuisance of checking the compared components against null, keeping the code more readable.
3. Since id and time define Spittle equality, it makes sense using them to determine the hash code.
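A side note on comparing ids: since id is a Long, the == operator checks object identity, and only works by accident for small values thanks to the mandatory boxed cache. Objects.equals() is the safe choice for object fields. A quick standalone demonstration:

```java
import java.util.Objects;

public class LongEqualityDemo {
    public static void main(String[] args) {
        Long small1 = 127L, small2 = 127L;  // autoboxed within the mandatory cache (-128..127)
        Long big1 = 128L, big2 = 128L;      // typically boxed to distinct objects

        System.out.println(small1 == small2);           // true, same cached instance
        System.out.println(big1 == big2);               // usually false: reference comparison!
        System.out.println(Objects.equals(big1, big2)); // true, value comparison
    }
}
```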

I have written a test case to ensure the Spittle class works as I expect. There is nothing very interesting in it and, as usual, you can look at it on GitHub.

The model will be based on the interface spittr.data.SpittleRepository:
public interface SpittleRepository {
    List<Spittle> findRecentSpittles();

    List<Spittle> findSpittles(long max, int count);

    Spittle findOne(long id);

    void save(Spittle spittle);
}
The idea is creating a JDBC repository, for the moment I have faked it. Here is its findSpittles() method:
@Repository
public class JdbcSpittleRepository implements SpittleRepository {
    // ...
    @Override
    public List<Spittle> findSpittles(long max, int count) {
        List<Spittle> result = new ArrayList<>();
        for (long i = 0; i < count; i++) {
            result.add(new Spittle(i, "Message " + i, LocalDateTime.now(), 1.0 * i, 2.0 * i));
        }

        return result;
    }
    // ...
}
The repository interface is used in the SpittleController to find spittles and put them in an attribute to be consumed by the view.
@Controller
@RequestMapping("/spittles")
public class SpittleController {
    private SpittleRepository spittleRepository;

    @Autowired
    public SpittleController(SpittleRepository spittleRepository) {
        this.spittleRepository = spittleRepository;
    }

    @RequestMapping(method = RequestMethod.GET)
    public String spittles(Model model) {
        model.addAttribute("spittleList", spittleRepository.findSpittles(Long.MAX_VALUE, 20));
        return "spittles";
    }
}
I use the concrete repository to see what happens when the app actually runs on my Tomcat server; however, the real testing is performed on the interface, using the mocking features included in Spring integrated with the Mockito framework. For this reason I added this dependency to the application POM.
<dependency>
    <groupId>org.mockito</groupId>
    <artifactId>mockito-core</artifactId>
    <version>${mokito.version}</version>
    <scope>test</scope>
</dependency>
I also have to tell Spring where the repository is. For this reason I created a new configuration class, DataConfig, that simply returns the JdbcSpittleRepository as a bean. This configuration class is called by the already existing RootConfig, by importing it.
@Configuration
@Import(DataConfig.class)
public class RootConfig {
}
The web app works fine both running it "by hand" and through the mock test. So I pushed the code on GitHub.

Reference: Passing model data to the view, from Spring in Action, Fourth Edition by Craig Walls. Chapter five, section 2.3.

Go to the full post