CodeEval Balanced Smileys

This problem is part of the family of the ones asking to check if parentheses in a sentence are balanced. See for instance the simpler matching brackets one.

In this version, happy ':)' and sad ':(' smileys could be part of the the sentence, so that we have to think if a parentheses has to be considered in the balancing or skipped, being the representation of the mouth of the smiley.

I found it as CodeEval #84 problem, but it is a repost, originally coming from Facebook Hacker Cup 2013 Hackathon.

I came to a solution inferior to the one published by the Hacker Cup editors, so I just threw mine in the dustbin and I spent some time pondering on the other one, coming out with a slightly different variation. [By the way, I would suggest you to do the same. When you see a piece of code you don't fully understand, play around with it, adapt it to your style, make it yours.]

The base problem is pretty simple. We loop on each character in the string, any time we see an open bracket, we push it in a stack (or we emulate a push in a stack, using just a counter). When we see a close bracket, we pop from the stack the matching open bracket. In the end, the stack should be empty. If we try to pop when the stack is empty, the sentence is unbalanced.

Having to care to the smileys, we need a second (virtual) stack, where we push also the dubious open brackets, the one that could have a double interpretation.

Looking at the code should clarify the sense of this solution:
min_open = 0  # 1
max_open = 0  # 2
for i in range(len(msg)):  # 3
    if msg[i] == '(':  #4
        max_open += 1
        if i == 0 or msg[i-1] != ':':  # 5
            min_open += 1
    elif msg[i] == ')':  # 6
        if i == 0:
            return False
        min_open = min_open - 1 if min_open else 0
        if msg[i-1] != ':':  # 7
            if max_open == 0:
                return False
            max_open -= 1
return True if min_open == 0 else False  # 8
1. Emulate the first stack. It keep track of the "clean" open brackets only.
2. The second "dirty" stack, containing also the brackets that could be seen as mouth for a sad smiley.
3. Loop an all the character in the sentence.
4. Open bracket, push it to the "dirty" stack (i.e. increase the max_open counter), and ...
5. ... only if it couldn't be seen as the mouth of a smiley, push it to the "clean" stack (i.e. increase the min_open counter)
6. Close bracket, if we are at the beginning, the sentence is unbalanced. Otherwise, pop the matching open bracket from the "clean" stack. If it was already empty, the sentence could be unbalanced. Not sure, however, because it could be a happy smiley. So ...
7. If the bracket could be seen as the mouth of a happy smiley, the "dirty" stack empty means the sentence is unbalanced, otherwise, pop a bracket from it.
8. Completed the loop, the sentence is balanced only if the "clean" stack is empty.

The full python script is on GitHub.

Go to the full post

Calling a Stored Function from JDBC

The code we should write to call a stored procedure or function via JDBC is slightly different from the one we write to perform a SQL SELECT. A CallableStatement is used instead of a Statement plus a ResultSet to set the parameter, execute the call, and extract the result.

To check it, I have firstly created a stored function in my Oracle 12 instance, on the HR schema. It takes a SQL CHAR in input, and returns the same string with 'suffix' appended:
create or replace FUNCTION foo (val CHAR)
RETURN CHAR AS
BEGIN
    RETURN val || 'suffix';
END;
I ensured it works as I expected running this query:
SELECT foo('hello') FROM DUAL;
The I have written a couple of tests to see how to get the same result in Java. Not just one, because we can use either the JDBC escape and the PL/SQL block syntax to achieve the same result, being the first
{? = call foo(?)
and the latter
begin ? := foo(?); end;
The core of both tests is in these few lines:
cs = conn.prepareCall(call);  // 1
cs.registerOutParameter(1, Types.CHAR);  // 2
cs.setString(2, "aa");
cs.execute();
String result = cs.getString(1);  // 3
1. Assuming conn is a good java.sql connection to the HR user on my Oracle database, and call is either the JDBC escape or the PL/SQL block string showed above, the prepareCall() should return a good CallableStatement.
2. The callable statement has to know the type of the output parameter is a character string. Then we set the other parameter, with the string that we want to pass as input.
3. After executing the callable statement, we get the first (and only) result as a string.

The actual code, that you could find on GitHub, class GettingStartedTest methods callFoo(), testFunctionCallJdbcEscape(), and testFunctionCallPlSqlBlock(), is a bit more verbose because I have to provide all the standard boilerplate, initializing, testing, cleaning up.

Reference: Oracle Database JDBC Developer's Guide 12c Release 2 (12.2) 2.8 Stored Procedure Calls in JDBC Programs

Go to the full post

Selecting on Oracle via JDBC

Once we decided how to establish a connection to the database, performing a SELECT on it is pretty simple. Here I'm going to use OracleDataSource as connection provider, using pure JDBC doesn't add any complexity.

The only problem in this Java code is that is slightly boring. So boring that there are a few wrapper aiming at keeping the advantages of JDBC easing its usage. See for instance the Spring JdbcTemplate.

Anyway, firstly we open a connection, then a statement, on which we execute a query (a select, in this case) that return a resultset. We normally do something on the resultset, and then we can close what we have opened, in reversed order, resultset, statement, connection. The code is made even more boring by the need of try-catching the SQLExceptions that could be thrown. Last straw in this boredom parade, is that before closing the JDBC objects we have to ensure they have been created, checking them for not being null.

So, our typical plain JDBC code could be seen in three steps:

Initialization

// ...
Connection conn = null;
Statement stmt = null;
ResultSet rs = null;
try {
    conn = ods.getConnection();  // 1
    stmt = conn.createStatement();
    rs = stmt.executeQuery("SELECT first_name FROM employees");  // 2

    // ...
1. I'm using the OracleDataSourceConnector I described in the previous post. Nothing fancy, anyway, I simply get a connection to a Oracle database for the well known hr example user.
2. Having extracted a statement from the connection, I execute a query, that is plain SQL SELECT. The resulting table, here consisting of just one column, is returned in a ResultSet.

Operation

Let's do something with the fetched data. Here I copy the first names to a list of string, then I sort them and I ensure the first and last are what I expect. Usually this part of the job has more sense.
List<String> results = new ArrayList<String>();
while (rs.next()) {
    results.add(rs.getString(1));
}

assertEquals(107, results.size());
Collections.sort(results);
assertEquals("Adam", results.get(0));
assertEquals("Winston", results.get(106));
The employees table on HR should have 107 rows, first names should range from Adam to Winston, however, your result may vary.

Cleanup

Finally (actually, the code is normally performed in the finally block following the try one where the initialization and operation happened - meaning that we would always perform this step, whatever happens there), being done with the data, we can close result set, statement, and connection.
assertNotNull(rs);
rs.close();

assertNotNull(stmt);
stmt.close();

assertNotNull(conn);
conn.close();
Notice that we have to strictly follow the inverted order seen in initialization.
Notice also that each close() method could throw and exception so, again, these calls have to be in a try-catch block.

See full code on GitHub, class GettingStartedTest, method testSelect().

Reference: Oracle Database JDBC Developer's Guide 12c Release 2 (12.2) 2.4 Sample: Connecting, Querying, and Processing the Results

Go to the full post

Connecting to Oracle via JDBC

There are a couple of ways to connect to a recent Oracle Database, going through the standard DriverManager class or the specific OracleDataSource one. Let's see both of them. And, when we are there, let's add also some consideration on connecting to MySql via DriverManager.

It's a bit of an overkill, but I have written a little abstract class that is going to be the root of a hierarchy of classes for providing access to database through JDBC:
public abstract class Connector {
    public abstract Connection getConnection() throws SQLException;
 
    public String getDatabaseVersion(Connection conn) throws SQLException {
        return conn.getMetaData().getDatabaseProductVersion();
    }
}
The point of it is that each different concrete class has its own way to get a database connection but all of them would use it in the same way. So the getConnection() method is abstract where the actual method performing JDBC operation, like getDatabaseVersion(), would act in the same way.

Using OracleDataSource

The OracleDataSourceConnector extends Connector, and its raison d'être is keeping an instance of a OracleDataSource object as its private data member. Initialized in the constructor, is used by the the getConnection() method to return a database connection.
public class OracleDataSourceConnector extends Connector {
    private OracleDataSource ods;

    public OracleDataSourceConnector(String url, String user, String password) throws SQLException {
        ods = new OracleDataSource();
        ods.setURL(url);
        ods.setUser(user);
        ods.setPassword(password);
    }

    @Override
    public Connection getConnection() throws SQLException {
        return ods.getConnection();
    }
}
The good thing about OracleDataSource is that it has its own pool of connections that is managed implicitly. On the flip side, it is not standard JDBC. That means extra work if we want to adapt our code to use a different database.

I have written a tester to check the functionality, OracleDataSourceConnectorTest. See it on GitHub for full reference. There are only a few things that I want to stress here.
public class OracleDataSourceConnectorTest {
    private static final String URL = "jdbc:oracle:thin:@localhost:1521/orclpdb";  // 1
    private static final String USER = "hr";
    private static final String PASSWORD = "hr";

    private static OracleDataSourceConnector ods;
    private static Connection conn;

    @BeforeClass
    public static void setUp() {  // 2
        try {
            ods = new OracleDataSourceConnector(URL, USER, PASSWORD);
            conn = ods.getConnection();
        } catch (SQLException e) {
            fail(e.getMessage());
        }
    }

    // ...
}
1. I'm using the thin Oracle JDBC driver, the other choice is the OCI one. See the Oracle documentation if you wonder which one to use. Short answer is, usually thin is the one you want to peek. My database is local, on the standard port, and the service is named orclpdb. Your setup may vary.
2. I setup the connector and a connection through this static method called only once before the tests in the class are called. The sense is that I don't want to repeat expensive operations without a reason. So, when nothing is against it, I would reuse connector and connection in more tests.

And here is a test that requires its own connector, because I want to perform a disruptive negative test:
@Test
public void testBadUser() {
 OracleDataSourceConnector connector = null;
 try {
  connector = new OracleDataSourceConnector(URL, "Unknown", PASSWORD);  // 1
 } catch (SQLException e) {
  fail(e.getMessage());
 }
 
 try {
  connector.getConnection();  // 2
  fail("No connection expected for unknown user");
 } catch (SQLException e) {
  String expectedState = "72000";
  assertEquals(expectedState, e.getSQLState());
  
  int expectedCode = 1017;
  assertEquals(expectedCode, e.getErrorCode());
 }
}
1. I pass a bas user name to the connector. No failure is expected here, since no connection is actually done.
2. I expect the failure to happen here. Oracle should react with a ORA-01017 error code, that is included in the SQL state 72000, SQL execute phase errors.

The other test ensures that I can actually get to the database:
@Test
public void testGetDatabaseNameVersion() {
 try {
  String expected = "Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production";
  String actual = ods.getDatabaseVersion(conn);
  assertEquals(expected, actual);
 } catch (SQLException e) {
  fail(e.getMessage());
 }
}
Using DriverManager

If I am not interested in the connection pooling service offered by OracleDataSource, and I prefer to keep as generic as I can, I could use these other solution. In the old days, it required to perform an explicit registration of the JDBC driver:
Class.forName(klass)
Where klass is the actual class name, like "oracle.jdbc.driver.OracleDriver" or "com.mysql.jdbc.Driver".

Now this is done implicitly by the DriverManager, that leads to a very slim connector:
public class PlainConnector extends Connector {
 private String url;
 private String user;
 private String password;

 public PlainConnector(String url, String user, String password) {
  this.url = url;
  this.user = user;
  this.password = password;
 }

 @Override
 public Connection getConnection() throws SQLException {
  return DriverManager.getConnection(url, user, password);
 }
}
The constructor just store a copy of the data for connecting to the database, the getConnection() uses them going through DriverManager.

I have written another test case to see that actually the plain connector works as the OracleDataSource one. An then another one to see what I have to change to access with the same class a different database, here MySql.

This test case is accessing a MySql database that I have installed locally and on which I have added a user named "hr", to keep it close to the Oracle one. You can see how minimal are the required changes.

I had to change the URL:
private static final String URL = "jdbc:mysql://localhost:3306/hr?useSSL=false";
Notice the parameter useSLL set to false, to remove the MySql warning "Establishing SSL connection without server's identity verification is not recommended." In this test we can live without id verification.

In case of bad user, I expect now 28000 as SQL state and 1045, access denied, as error code.

And, the database version now should be something like "5.7.19-log"

Ah, right. Remember to put in your application build path the jar for oracle JDBC (currently named ojdbc8.jar) and for MySql (mysql-connector-java-5.1.43-bin.jar) and to have the databases up and running, if you want your tests to succeed.

Reference: Oracle Database JDBC Developer's Guide 12c Release 2

Full Java code on GitHub.

Go to the full post

Installing and configuring Oracle 12c

It was a while since the last time I have installed an instance of the Oracle database for local use. Nothing much has changed in the process, with the noticeable exception of the introduction of pluggable databases. Here I jot down a few remarks I made during the setup of the version 12 release 2.

1. In the page Typical Installation we have to specify the name for our pluggable database. Remember your choice!
2. The Enterprise Manager Database Express is available on https://localhost:5500/em, access it through the SYS user (with the password specified during setup)
3. Add in the tnsnames.ora (in the network\admin directory under the database home) the network configuration settings to access the pluggable database we have created in (1). Something like:
PDBORCL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = localhost)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = pdborcl)
    )
  )
4. Use sqlplus to connect to the pluggable database with SYS as sysdba:
sqlplus sys/Oracle_1@pdborcl as sysdba;
5. As usual, unlock the hr user to access the test tables:
alter user hr identified by hr account unlock;
So far so good. All standard stuff. A problem raises if you plan to stop the Oracle service, that should be named OracleServiceORCL. Since it is always memory hungry, sometimes I just want it stay out of my RAM. The first time I did it, I had the surprise of getting a nasty error when I tried to connect to the database after restarting. Basically, the issue was that the pluggable database was mounted but not started.

This is the recipe I followed to solve it:

1. From the shell, enter sqlplus without logging.
sqlplus /nolog
2. In sqlplus, connect as sysdba.
connect / as sysdba
3. Brutally shutdown the database.
shutdown abort
4. Restart it.
startup mount
5. Recover the database, then open it.
recover database 
alter database open;
6. Open all the pluggable database, and then ask them to remember their status, so that we don't have to repeat the maneuver each time.
alter pluggable database all open;
alter pluggable database pdb_name save state;

Go to the full post

CodeEval Minimum Path Sum

Given a squared matrix of integers, find the minimal path sum from top-left to bottom-right, assuming that we can only move down or right.
This is CodeEval problem #72.

For example, give this matrix
4 6
2 8
We have just two possible paths: 4 -> 6 -> 8 and 4 -> 2 -> 8. It is immediate to see that the second one leads to the minimum sum of 14.

The structure of the problem recalls a Dynamic Programming solution table. I found it quite straightforward to use this thought to write a first solution:
def solution(data):  # 1
    dp = [[float(math.inf)] * (len(data) + 1)]  # 2
    for i in range(len(data)):
        dp.append([math.inf])
        dp[i+1].extend(data[i])

    end = len(dp)
    for i in range(1, end):  # 3
        for j in range(1, end):
            if i == 1 and j == 1:  # 4
                continue
            dp[i][j] += min(dp[i-1][j], dp[i][j-1])  # 5
    return dp[i][j]  # 6
1. The input parameter data contains a list of list of integers, representing the input matrix.
2. I copy the input in a DP-style table, where the first row and column have fake values. Usually, in a DP problem, zero is used for those cells. However here we need a value that should not be a candidate as minimum value. In a C-family language, I would have chosen a constant representing the maximum available integer. Python 3 has nothing like that, so I use math.inf, the floating-point positive infinity representation.
3. Calculate all the possible minimal path sum, skipping the first row and column.
4. This check looks unsatisfactory to me, but it is the less confusing way I found out to say that I don't have to do anything on the left-topmost element of the original matrix.
5. All the other cells are adjusted adding the smallest element on the immediate left or up.
6. There is nothing else to do, but returning the value in the bottom-right cell.

I found this solution to be elegant enough and easy to be understood. However I decided to write an alternative one to get rid of the buffer table, and also to navigate the table backward, from bottom-right to top-left.
def solution_2(data):
    last = len(data) - 1
    for i in range(last, -1, -1):
        for j in range(last, -1, -1):  # 1
            if i == last and j == last:  # 2
                continue
            if i == last:  # 3
                data[i][j] += data[i][j+1]
            elif j == last:
                data[i][j] += data[i+1][j]
            else:
                data[i][j] += min(data[i + 1][j], data[i][j+1])
    return data[0][0]  # 4
1. The loop variables are initially set to the last row/column and are decreased down to zero.
2. Again, nothing has to be done for the first cell.
3. Not having the extra-row/column, I have to add an extra check in the code for both i and j.
4. In this case the solution is stored in the top-left cell.

Not using another table should lead to a performance improvement, at least if large matrices are expected in input, and its cost in terms of less readability looks bearable to me. Scanning the table backward doesn't seem to give any advantage and makes the code a bit more obscure, so I would probably get rid of it.

I pushed the full python code on GitHub.

Go to the full post

Using JdbcTemplate in Spitter Web App

In the previous post I developed a CommandLineRunner Spring App to show how to use JdbcTemplate for simplify the JDBC access to an embedded instance of a H2 database. Here I port the changes to the Spittr Web App I have built up previously.

Actually, there is not much to say. There is some inconsistency between the two apps that spiced up the process, but nothing really complicated. I'd say the most interesting part is in the POM file. Here I removed, whenever it was possible, the versioning for each single dependency, relying instead on the parent element, where they are defined.
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>1.5.4.RELEASE</version>
    <relativePath />
</parent>
However, for some reason, Spring could not find the version for the JSP dependency, so I kept it as before
<dependency>
    <groupId>javax.servlet.jsp</groupId>
    <artifactId>javax.servlet.jsp-api</artifactId>
    <version>${jsp.version}</version>
    </dependency>
And left the versioning for it among the properties.
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    <java.version>1.8</java.version>
    <jsp.version>2.3.1</jsp.version>
</properties>
I pushed the new version of Spittr on GitHub.

Go to the full post

Spring and JdbcTemplate

Let's say that we want to write a Spring application that requires some simple database activity. We decide to use an embedded database, H2 and, instead of choosing a higher level persistency framework, we decide to go for the JDBC support, mediating it with JdbcTemplate, a Spring class aimed at simplifying its use.

I create a Spring project using the STS wizard (File >> New >> Spring Starter Project).
Since I want to use JDBC and the H2 database support, I select these two options from the dependencies page:
From database schema to POJOs

In the source/main/resources folder, I put a file named schema.sql, this is enough for Spring, that knows it has to get it and run it at startup, so there I put the SQL commands to create my tables. I create two tables, spitter and spittle, connected by the id of the first, that is a foreign key in the second.

The tables defined in the database are reflected in a couple of plain old Java classes, Spitter.java and Spittle.java, having as fields what in the tables are the columns.

There is one thing that I think is interesting. The SQL spittle table has, among the other fields, a datetime, postedTime. We need to find a counterpart for it in the Java world. Since we are not interested here in time zones and the such, just in a plain timestamp, I decided to use Java 8 java.time.Instant.

That's it. If the SQL stuff is simple, and I have just a couple of tables, I could give it for done. To see that, I have created a basic version for the SpringBootApplication that implements the CommandLineRunner interface:
public class UsingJdbcApplication implements CommandLineRunner {
    private static final Logger log = LoggerFactory.getLogger(UsingJdbcApplication.class);

    @Autowired
    private JdbcTemplate template;  // 1

    public static void main(String[] args) {
        SpringApplication.run(UsingJdbcApplication.class, args);
    }

    @Override
    public void run(String... args) throws Exception {
        Long counter = template.queryForObject("SELECT count(id) FROM spitter", Long.class);  // 2
        log.info("Rows on spitter table: " + counter);

        log.info("Pushing a spitter in");
        Object[] admin = new Object[] { "admin", "password", "Admin", "admin@example.dd" };
        template.update("INSERT INTO spitter(username, password, fullname, email) VALUES(?, ?, ?, ?)", admin);  // 3

        template.query("SELECT * FROM spitter WHERE username = ?", new Object[] { "admin" },  // 4
                (rs, rowNum) -> new Spitter(rs.getLong("id"), rs.getString("username"), rs.getString("password"),
                        rs.getString("fullname"), rs.getString("email"), rs.getBoolean("updateByEmail")))
                .forEach(spitter -> log.info("Spitter: " + spitter.toString()));
    }
}
1. Sort of magic. We ask to Spring to wire this JdbcTemplate object. Since the H2 jar is in the classpath, it understands it has to use it as database. We don't provide any settings, and it uses a whole bunch of defaults. Usually not a good idea for production code, but very handy for prototyping.
2. Let's run a simple query that returns a single value. This line shows that actually Spring has initialized the database using the script named schema.sql that it found in the main resources folder. And how JdbcTemplate makes our life so much simpler. Notice that the queryForObject() method has a first parameter that is the SQL statement we want to run, and the second one that is the type for the expected return value.
3. A parametrized update(). Its first parameter is a parametrized JDBC SQL insert statement, the second one a vector of objects containing the values that we want to usein the query.
4. Something a bit more complicated. The query() method execute a query in a way similar to the update() seen above but, as third parameter, it accepts a lambda function that maps each row in the resulting resultset to a new Spitter object. We tipically have no use for the rowNum parameter, just access any resultset field we need.
This query() call returns a list of Spitter. On that list we apply a forEach() so that each element in the list is logged. Actually, here we are expecting just one Spitter, however this is a common pattern that we should learn to recognize and use.

Running the Spring Boot App, I can see in the console window the expected log.

Repositories and configuration

Usually we don't want to scatter SQL statements all over our Java code. So we create two interfaces, one for Spittle and one for Spitter, that act as repository for them, and then we implement them for our JDBC provider.
The two classes are quite similar. Let's see some parts of JdbcSpittleRepository, that is a tad more complicated.
// ...
private static final String SELECT = "select sp.id, s.id as spitterId, s.username, s.password, s.fullname, s.email, s.updateByEmail, sp.message, sp.postedTime from Spittle sp, Spitter s where sp.spitter = s.id";
private static final String SELECT_RECENT = SELECT + " order by sp.postedTime desc limit ?";

public List<Spittle> findRecent(int count) {
    return template.query(SELECT_RECENT, new SpittleRowMapper(), count);  // 1
}

// ...
private long insertSpittle(Spittle spittle) {
    SimpleJdbcInsert insert = new SimpleJdbcInsert(template).withTableName("Spittle");  // 2
    insert.setGeneratedKeyName("id");
    Map<String, Object> args = new HashMap<String, Object>();
    args.put("spitter", spittle.getSpitter().getId());
    args.put("message", spittle.getMessage());
    args.put("postedTime", Timestamp.from(spittle.getPostedTime()));
    long spittleId = insert.executeAndReturnKey(args).longValue();
    return spittleId;
}

// ...
private static final class SpittleRowMapper implements RowMapper<Spittle> {  // 3
    public Spittle mapRow(ResultSet rs, int rowNum) throws SQLException {
        long id = rs.getLong("id");
        String message = rs.getString("message");
        Instant postedTime = rs.getTimestamp("postedTime").toInstant();
        long spitterId = rs.getLong("spitterId");
        String username = rs.getString("username");
        String password = rs.getString("password");
        String fullName = rs.getString("fullname");
        String email = rs.getString("email");
        boolean updateByEmail = rs.getBoolean("updateByEmail");
        Spitter spitter = new Spitter(spitterId, username, password, fullName, email, updateByEmail);
        return new Spittle(id, spitter, message, postedTime);
    }
}
1. Almost an innocent JdbcTemplate wrapper to a JDBC query, if we don't pay much attention to the second parameter, that would convert the ResultSet coming from the SQL select in a list of Spittles. See the SpittleRowMapper below for details.
2. Here we see how SimpleJdbcInsert helps us in writing readable code when we want to insert a row in a table, in the case where the database will take the burden of generating a key for us, and we need to return it to the caller. After specifying the table we are working on, we set the name of the field that is a generated key, and then, as a map, the field names and associated values. Finally we have just to pass them to the SimpleJdbcInsert by executeAndReturnKey().
3. The SpittleRowMapper seen in action in (1). Its mapRow() method maps a single row from the underneath table in a Spittle object.

In some way, Spring has to configure these repositories, passing to them a viable JdbcTemplate object. In this app, it is done by JdbcConfig, annotated as a Spring Configuration class. It generates a few beans (in the Spring sense of the term) representing the DataSource, JdbcTemplate, the repositories, and also a transaction manager. We'll see them at work in the tester. It interesting seeing how the DataSource is set, when we don't rely on the default initialization by Spring.
@Configuration
public class JdbcConfig {
    @Bean
    public DataSource dataSource() {
        return new EmbeddedDatabaseBuilder()
                .setType(EmbeddedDatabaseType.H2)
                .addScripts("schema.sql", "test-data.sql")
                .build();
    }
 
    // ...
This data source represents an embedded database, H2 type, that would run a couple of scripts at startup. We already seen the first one, schema.sql, here I added a second one, that contains a bunch of SQL insert for spitters and spittles rows.

Testing

There are a couple of JUnit test, one for Spitter the other one for Spittle repository. They are quite similar. Let's have a fast look at first one.
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(classes = JdbcConfig.class)  // 1
public class JdbcSpitterRepositoryTest {
    @Autowired
    JdbcSpitterRepository spitterRepository;  // 2

    // ...
    @Test
    @Transactional  // 3
    public void findAll() {
        List<Spitter> spitters = spitterRepository.findAll();  // 4
        assertEquals(4, spitters.size());
        assertSpitter(0, spitters.get(0));
        assertSpitter(1, spitters.get(1));
        assertSpitter(2, spitters.get(2));
        assertSpitter(3, spitters.get(3));
    }

    //...
}
1. Here I'm saying to Spring to fetch the JdbcConfig class and use it to configure itself for testing.
2. Thanks to (1) we can ask to Spring to autowire the spitter repository.
3. Having defined a PlatformTransactionManager, we can use it to keep very simple the transaction management. A transaction is started and completed, eventually rolled-back, behind the curtains.
4. The spitters come from the test-data.sql called at startup by JdbcConfig dataSource().

Reference: Hitting the database with Spring and JDBC, from Spring in Action, Fourth Edition by Craig Walls, chapter ten.

This complete Spring application is on GitHub.

Go to the full post

CodeEval Word Search

Given a board where in each cell there is a letter, check if word is in it. We could start anywhere, but we could move only horizontally and vertically. Each cell could be used only once. More details on the CodeEval page.

The board is fixed, and looks like this:
ABCE
SFCS
ADEE
I have converted the examples provided in the problem in a python test case:
class TestCodeEval(unittest.TestCase):
    def test_provided_1(self):
        self.assertEqual(False, solution('ASADB'))

    def test_provided_2(self):
        self.assertEqual(True, solution('ABCCED'))

    def test_provided_3(self):
        self.assertEqual(False, solution('ABCF'))
For instance, it should be easy to see that we can't follow ASADB on the board. After ASAD there is a 'F' that block the way to the final 'B'.

I have converted the board in a couple of dictionaries:
WHERE = {'A': {(0, 0), (2, 0)},
         'B': {(0, 1)},
         'C': {(0, 2), (1, 2)},
         'D': {(2, 1)},
         'E': {(0, 3), (2, 2), (2, 3)},
         'F': {(1, 1)},
         'S': {(1, 0), (1, 3)}}

NEIGHBORS = {
    (0, 0): {(0, 1), (1, 0)},
    (0, 1): {(0, 0), (1, 1), (0, 2)},
    (0, 2): {(0, 1), (1, 2), (0, 3)},
    (0, 3): {(0, 2), (1, 3)},

    (1, 0): {(0, 0), (1, 1), (2, 0)},
    (1, 1): {(1, 0), (0, 1), (2, 1), (1, 2)},
    (1, 2): {(1, 1), (1, 3), (0, 2), (2, 2)},
    (1, 3): {(1, 2), (0, 3), (2, 3)},

    (2, 0): {(1, 0), (2, 1)},
    (2, 1): {(2, 0), (2, 2), (1, 1)},
    (2, 2): {(2, 1), (1, 2), (2, 3)},
    (2, 3): {(2, 2), (1, 3)}
}
The first one, WHERE, is a pure utility, that let me easily find where each letter is on board.
The second one, NEIGHBORS, is a plain adjacency list, and should hint that I have seen in this problem description a graph. Given the rules of adjacency in this board it is not strictly a necessity, but it helped me to better think to the solving algorithm.

My solution is based on a graph search algorithm, adapted to the peculiarity of the problem. For each letter in the word, I check for the possible continuation, backtracking when I don't find a good way, until the word is consumed or I can't find any way to continue the search.

After some startup code, I only need a recursive function that moves from one step to the next one:
def find(node, visited, letters):  # 1
    if not letters:  # 2
        return True

    letter = letters.popleft()  # 3
    nodes = WHERE[letter] & NEIGHBORS[node]  # 4
    for node in nodes:
        if node not in visited:  # 5
            visited.add(node)
            if find(node, visited, letters):  # 6
                return True
            visited.remove(node)
    letters.appendleft(letter)  # 7
    return False
1. I pass to the function the current position on board, then a set of already visited nodes, and finally a deque containing the letters that should be still to be found.
2. No letter pending means I have already found the word. Job done.
3. Extract the first letter I have to find from the deque.
4. The intersection between the sets where the letter is present and the nodes adjacent to the current ones give me the set of nodes where I have to look for the next step.
5. But only the nodes that I have not already visited are good candidates.
6. Recursive call.
7. If this branch is not satisfactory, I push back the letter in the deque and return False.

Full code, both test and actual python script, is on GitHub.

Go to the full post

CodeEval Levenshtein Distance

We are given in input a short list of words, a separator, then another huge list of words. For each word in the first list, we have to return the numbers of elements in its "social network" from the second list. A word is "friend" of another one if their Levenshtein distance is one. Meaning, they have just one letter difference, as "aa" with "aaa", or "aaa" with "aba". Pay attention that the example provided by CodeEval, when tested against the file they pushed on GitHub, is slightly wrong.

The problem is conceptually simple, its main issue is about performance. Let's write firstly a straightforward solution that scores poorly even when implemented in C++, and don't even get accepted in my python implementation, due to its slowness.

The definition of friendship given in this problem should let us think about graphs and cliques. Fortunately we are not asked to get the maximum clique in the network, but only the maximum clique for the few specified nodes. So we should expect "just" an Big-Oh N squared time complexity on the number of elements in the full social network that, unfortunately, we know to be large.

Still, on top on that N x N, we should consider the cost of calculating the Levenshtein distance for any couple of words. This problem leads almost naturally to a Dynamic Programming solution, however, also in this case we should expect an expensive temporary complexity, give or take in the order of m x n, where m and n are the lengths of the two words.

A Levenshtein distance of one

But, wait a minute, really we are not interested in the Levenshtein distance, just on determine if two words have it equals to one. This is a much more simple problem, and we can even break it in two parts. If the two words have the same size, we could just check if there is one and only one different character in them, i.e., we can pass from one string to the other just with one char swap:
def is_swap(lhs, rhs):
    difference = False
    for left, right in zip(lhs, rhs):  # 1
        if left != right:  # 2
            if difference:
                return False
            difference = True
    return difference  # 3
1. In production code, I would have asserted the precondition len(lhs) == len(rhs), here I just rely on the caller to do it. Assuming the same size for lhs and rhs, I zip them, extracting the characters at the same position and putting them in the cycle variables, left and right.
2. If left is not equals to right, I raise the difference flag. But if it was already raised, I know the difference is too strong, and I can stop checking.
3. If there is one and only one difference I would return True.

Other part of my simplified Levenshtein check. If I have two strings with difference in their size of one, the longest one should be the same as the shorter, but having one character more.
def is_add(shorter, longer):
    if len(shorter) > len(longer):  # 1
        shorter, longer = longer, shorter

    x = 0  # 2
    i = 0  # 3
    while i < len(shorter):
        if shorter[i] != longer[i+x]:  # 4
            if x == 1:
                return False
            x += 1
        else:
            i += 1

    return True if x < 2 else False  # 5
1. I assume the caller ensures the precondition on the parameter size, but doesn't verify which is which. This simplify the caller code, making this function a bit less readable. Anyway, the point of this check is ensuring that the string named shorter refers to the actual shorter one in input. 2. Counter for the extra characters found in the longer string. 3. Loop variable on the shorter string. The longer one will be ahead of x positions. 4. I found a difference in the strings. Add the difference counter, but if I have already seen a difference, in that case I know the two strings are too different to be friends. 5. I return True if here x is both 0 or 1. Zero means a case like "aa" vs. "aax". Each comparison in the length of shorter succeeded, however longer has a character more. One means the difference is in the middle. More than one difference is trapped in the while loop. Setting up the graph My first idea was to generate the social network graph, and then looking on it for the friends of the requested nodes. Saying that words is a dictionary, each word I get in input would be pushed in it as key, initializing its value with an empty set:
for test in file:
    words[test.rstrip('\n')] = set()
Then I call a utility function, check_friends(), to push each friend in its place - seeing it as a graph, the node is in the key, the edges are in the value.
def check_friends():
    for lhs in words.keys():  # 1
        for rhs in words.keys():
            if (len(lhs) == len(rhs) and is_swap(lhs, rhs) or  # 2
                    abs(len(lhs) - len(rhs)) == 1 and is_add(lhs, rhs)):  # 3
                words[lhs].add(rhs)
                words[rhs].add(lhs)
1. The infamous N x N loop. I check each word against all the other ones. Actually, I let is_swap also to detect if the word is tested against itself. 2. Same size, let is_swap() decide if the words are friends. 3. One character difference, ask to is_add() for friendship. Getting the clique Checking if a given word is in the full social network is easy and fast, being stored in a dictionary. Then, I apply a Breadth-first search (BFS) algorithm to navigate the graph and collecting all the friends and friends of friends:
def solution(word):
    results = []  # 1
    visited = set()  # 2

    net = deque()  # 3
    if word not in words.keys():  # 4
        return 0
    net.append(word)  # 5
    visited.add(word)

    while net:  # 6
        friend = net.popleft()
        fofs = words[friend]
        for fof in fofs:  # 7
            if fof not in visited:
                net.append(fof)
                visited.add(fof)
        results.append(friend)  # 8

    return len(results)
1. Here I am going to push all the elements in the word social network. 2. This is to avoid infinite looping. It's equivalent of marking a node as visited. 3. In this queue I push the node in the personal net for the current node that I am checking. 4. It is not officially stated in the problem that the caller could ask only for a word in the list, so I added this control. In any case it is cheap. 5. Initialize the algorithm before starting. The passed word is ready to be processed, and it is marked as visited. 6. While there are nodes to be processed, loop. 7. Push each friend of the current friend in the net, mark it as visited. 8. Push the current friend in the results set. This approach works alright, and I'm quite happy with it. However, for the way the problem is conceived, it is too slow. We need to do something to improve performances. It is clear where the bottleneck is. Generating the graph for the full social network is obviously very expensive. Do we really need to do it? Usually it would be a good idea, but here we know that we need to calculate the social network just for a handful of words. So, let's avoid to do it. A slimmer algorithm First change, word is not anymore a dictionary, but just a set. This means there is no value associated to each element, when I read a word I just push it in the set, and that's it.
for test in file:
    words.add(test.rstrip('\n'))
Also, no need to call check_friends(), instead, I do almost the same job in a get_friends() function, but only for the relevant items:
def get_friends(item):
    result = []
    for word in words:
        if (len(item) == len(word) and is_swap(item, word) or
                abs(len(item) - len(word)) == 1 and is_add(item, word)):
            result.append(word)
    return result
The changes in solution() are minimal:
def solution(word):
    results = set()
    visited = set()

    net = deque()
    if word not in words:
        return 0
    net.append(word)
    visited.add(word)

    while net:
        friend = net.popleft()
        fofs = get_friends(friend)
        for fof in fofs:
            if fof not in visited:
                net.append(fof)
                visited.add(fof)
        results.add(friend)

    return len(results)
And now CodeEval accepts happily the code. I pushed both my python solutions to GitHub, the complete but slow one, and the sleeker one, tailored on the problem requisites.

Go to the full post

CodeEval Type Ahead

Given the size of an n-gram in input, and its first n-1 components, we should write a function that returns our predicted possible last words, sorted by their calculated probability (from higher to lower) and then alphabetically. The solution assumes we generate our n-gram against the nursery rhyme Mary had a little lamb that we could safely hard-code in our program, stripping it from all non-words. More details on the CodeEval page problem.

I plan to write a python solution. So, first thing, I write a python test case:
def test_provided_1(self):  # 1
    self.assertEqual('lamb,0.375;teacher,0.250;children,0.125;eager,0.125;rule,0.125', solution(2, 'the'))

def test_lamb(self):
    self.assertEqual('at,0.200;its,0.200;love,0.200;was,0.200;you,0.200', solution(2, 'lamb'))

def test_the_lamb(self):
    self.assertEqual('love,0.333;was,0.333;you,0.333', solution(3, 'the lamb'))

def test_at(self):  # 2
    self.assertEqual('school,1.000', solution(2, 'at'))
1. The given as example. Looking at the Mary & lamb rhyme, the possible resulting two-grams when the first word is 'the' could be 'the lamb', 'the teacher', 'the children', 'the eager', 'the rule'. The most probable one, using 'lamb', has an estimated 0.375 chance. The less probable ones, having the same chance, should be presented in alphabetical order.
2. Just a single two-gram could be generated from 'at'. Its probability should be obviously 1. Notice the format for probability, three decimal are always required.

The most boring part of the problem was converting the rhyme in a list of words, something like
TEXT = ['Mary', 'had', 'a', 'little', 'lamb', 'its', 'fleece', 'was',
# ...
        'teacher', 'did', 'reply']

I found out running the CodeEval tester that I don't have to worry about using some case insensitive comparison. Upper and lower case are as found in the rhyme.

Then I need a dictionary, where I will put all the n-grams in the rhyme
n_grams = {}
I decided to be lazy. Actually, I'd say that in this case being lazy is the only sensible thing to do, if I don't want to create n-grams that are not required by the user. I write instead a function that checks if a give n-gram family has been already generated. If not, it is created and pushed in the dictionary:
def check_n_grams(n):
    if n in n_grams:  # 1
        return

    item = defaultdict(lambda: defaultdict(int))  # 2
    for i in range(len(TEXT) - n):  # 3
        outer = ' '.join(TEXT[i:i + n])  # 4
        intern = TEXT[i + n]
        item[outer][intern] += 1  # 5

    n_grams[n] = item
1. the n_grams for the current n have been already generated, there is nothing to be done here.
2. Let's generate all the n-grams for the passed n. We need here a bit of a perplexing data structure, a dictionary of dictionaries (that is going to be pushed in a dictionary). Moreover, to simplify a bit the following code, I used defaultdicts. I can't pass defaultdict to initialize a defaultdict, because it is not a callable. So I used the clever trick of passing instead a lambda (that is a callable) that returns it.
3. Loop on all the words in TEXT, stopping a tad before the end, see below if you don't have already guessed why. This code spectacularly fails if the user has the insane, but legit, idea of asking for an extremely high n. For production code a check on its size should be mandatory.
4. Generate the current n-gram considering it divided in two parts, the first one, that I am going to store as key in the outer dictionary, is done joining n words starting from the current one on, the second, that the key in the intern dictionary, represents the last word in the n-gram. So, actually, I am storing in this maps all possible (n+1)-grams.
5. Push the n-gram in the dictionary of dictionaries, setting the initial value to one, or increasing it if such combination is already there.

Large part of the job is done. The solution function does not much more than calling the utility function defined above and formatting the output.
def solution(n, given):
    n -= 1  # 1
    check_n_grams(n)  # 2

    target = n_grams[n][given]  # 3

    values = list(target.items())  # 4
    values.sort(key=lambda x: x[0])  # 5
    values.sort(key=lambda x: x[1], reverse=True)  # 6

    results = []  # 7
    population = sum(target.values())
    for value in values:
        results.append('{},{:.3f}'.format(value[0], value[1] / population))  # 8
    return ';'.join(results)
1. As seen in check_n_grams(), I generate in the n_grams dictionary the (n+1)-grams, splitting them in the key-value on the internal dictionary. I felt that this trick made the code more readable, however here I pay the price of it, with this obscure decrement. Again, production code should perform some parameter checking to avoid runtime disasters.
2. Ensure the current n-grams have been generated.
3. Get the internal dictionary. For instance, if I am looking for the 2-grams for 'at', in target I expect to get a dictionary containing just one element, where the key is the last word in the n-gram and value is its hit counters. In this case 'school': 1.
4. I need to do some sorting now, so I convert the items in target to a list, data type sporting a handy function, sort() that is stable, in this way I can call it repeatedly to get the expected effect.
5. First I sort values by the zeroth component, representing the key of the internal dictionary.
6. Then I sort it by the one-th component, the word counter, in reversed mode, so that the most seen would be first.
7. Format the data, put them in this temporary list, join its components and return them.
8. As required, the probability is formatted as a floating number with three decimals.

Full code pushed on GitHub. Both the tester and the actual python script.

Go to the full post

CodeEval Cash Register

Given in input the price of a certain item and the cash that the client has passed to us to get it, we have to provide to the cashier a list of what he has to get back. More details in the problem page on CodeEval.

We assume the register has an infinite amount of coins and bills in the given denominations:
'PENNY': .01,
'NICKEL': .05,
'DIME': .10,
'QUARTER': .25,
'HALF DOLLAR': .50,
'ONE': 1.00,
'TWO': 2.00,
'FIVE': 5.00,
'TEN': 10.00,
'TWENTY': 20.00,
'FIFTY': 50.00,
'ONE HUNDRED': 100.00
To better clarify what we should do, the first thing I have done is converting the given examples in a Python test case:
def test_provided_1(self):
    self.assertEqual('NICKEL,PENNY', solution(15.94, 16))

def test_provided_2(self):
    self.assertEqual('ERROR', solution(17, 16))

def test_provided_3(self):
    self.assertEqual('ZERO', solution(35, 35))

def test_provided_4(self):
    self.assertEqual('FIVE', solution(45, 50))
The first number passed to the function is the price, the second one is the amount given in.

It shouldn't be to difficult to get a working solution in Python. The core of the problem is looping on all the available denominations, starting from the highest one down to the smallest, selecting each denomination the maximum possible times.

Since working with floating point numbers when a currency is expected is rarely a good idea, I have decided to store in my dictionary of denomination the values as multiple of cents. Something like that:
DENOMINATIONS = {
    'PENNY': 1,
    'NICKEL': 5,

    # ...

    'FIFTY': 5000,
    'ONE HUNDRED': 10000
}
Besides, I payed attention in converting the floating point numbers in integers so to avoid truncation problems:
change = int(cash * 100) - int(price * 100)
However, the interesting part is the loop to check if a given denomination has to be used and how many times:
for name, value in sorted(DENOMINATIONS.items(), key=lambda x: x[1], reverse=True):
    # ...
I want to loop on all the items() in the DENOMINATIONS dictionary, to operate on its key and value - where the key represent the name of the denomination.

I can't rely on the order chosen by Python in scanning the dictionary, I must start from ONE HUNDRED, and proceed in sorted fashion down to the PENNY. So I sorted() it.

I don't want to sort them by the key, it won't make any sense, but by value. So I specify as key that I want to use the second component. I say that to Python using a lambda, that gets the current item in input and returns its second component (index 1, since we are 0 based).

As said above, I want to start from the highest value, that's way I specify that the sorting order is reversed.

The rest of the code should be quite simple to figure out. In any case, my full solution is on GitHub, both the test case and the actual python script.

Go to the full post