Showing posts with label test. Show all posts

Test on build with Maven

Let's add some testing to the tiny mavenized Java app seen in the previous couple of posts.

Actually, the app is so tiny that there's nothing to test. This is easily solved by adding a method to the Main class:
public static String getLoggerClassName() {
    return LOG.getClass().getName();
}
I plan to use JUnit 5 (aka Jupiter) as the testing framework, asserting via Hamcrest, so I add both these dependencies to the POM:
<dependency>
    <groupId>org.junit.jupiter</groupId>
    <artifactId>junit-jupiter-engine</artifactId>
    <version>5.6.2</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.hamcrest</groupId>
    <artifactId>hamcrest</artifactId>
    <version>2.2</version>
    <scope>test</scope>
</dependency>
As I said, I plan to use JUnit and Hamcrest for testing only, and I don't want to bother the user with them. That's why I told Maven that their scope is test.

Now I write a test for the newly added method
@Test
void getLoggerClassName() {
    String actual = Main.getLoggerClassName();
    assertThat(actual, is("ch.qos.logback.classic.Logger"));
}
I can run it from Eclipse in the usual way, and everything works fine.

I build the app with Maven, clean package, and I get both my jars, slim and fat, as before, in both cases without the test dependencies, and that's good. Still, something is missing: tests are not executed in this phase. Bad. I'd like to check that everything works fine when I build my jar.

The issue is in the Maven Surefire Plugin. By default, an ancient version is executed that does not work nicely with JUnit 5. So, I force Maven to use a newer one. Its name is a bit scary, having an M in its version number, M as in Milestone. However, it looks like we have to live with it. So I use it.
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <version>3.0.0-M4</version>
</plugin>

Actually, there's no need to specify the group id in this case, since org.apache.maven.plugins is the default. Your choice.

That's it. Now the Maven package build also triggers the execution of the tests. Nice.

I have pushed the full code to GitHub, so please check there for details.


Testing Spring Boot with JUnit5

I decided it is time to move my Spring app tests from JUnit 4 to Jupiter. It is not too complicated, given that Spring already supports JUnit 5, even though the default is still the previous version. However, it requires being careful in a couple of steps.

Firstly, I have changed the POM in this way:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-test</artifactId>
    <scope>test</scope>
    <exclusions>
        <exclusion>
            <!-- get rid of JUnit 4 -->
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
        </exclusion>
    </exclusions>
</dependency>

<!-- using JUnit 5 instead -->
<dependency>
    <groupId>org.junit.jupiter</groupId>
    <artifactId>junit-jupiter-engine</artifactId>
    <!-- using the version set by Spring -->
    <scope>test</scope>
</dependency>
I have excluded JUnit 4 from the spring-boot-starter-test dependency, and then I have inserted a new dependency for Jupiter; all of them are obviously in the test scope. Notice that I didn't set the version for JUnit 5, accepting instead the one proposed by Spring, just to stay on the safe side.

The changes in "normal" testing are quite simple, usually limited to using the right imports for the new library. Something a bit more engaging has to be done for the Spring application test. What I had for JUnit 4 was something like this:
@RunWith(SpringRunner.class)
@SpringBootTest
public class MyApplicationTests {
    // ...
In Jupiter, however, the annotation RunWith has been replaced with the more powerful ExtendWith, so the code has to be changed to
@ExtendWith(SpringExtension.class)
@SpringBootTest
public class MyApplicationTests {
    // ...
Not too bad, is it?
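For the plainer tests, the "right imports" usually mean swapping the JUnit 4 names for their Jupiter counterparts. Here is a sketch of the typical mappings, not tied to any specific test in this project:

```java
// JUnit 4                               ->  JUnit 5 (Jupiter)
// import org.junit.Test;                    import org.junit.jupiter.api.Test;
// import org.junit.Before;                  import org.junit.jupiter.api.BeforeEach;
// import org.junit.After;                   import org.junit.jupiter.api.AfterEach;
// import org.junit.BeforeClass;             import org.junit.jupiter.api.BeforeAll;
// import org.junit.Ignore;                  import org.junit.jupiter.api.Disabled;
// import static org.junit.Assert.*;         import static org.junit.jupiter.api.Assertions.*;
```

Note also that Jupiter test classes and methods no longer need to be public; package visibility is enough.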


A nice Java testing MOOC

I've just finished the first week of the MOOC "Automated Software Testing: Practical Skills for Java Developers" provided by TU Delft on EdX. It looks to me they did a good job. Maybe, if you are a native English speaker, the speakers' accent could sometimes sound a bit weird. However, as a continental European, I kind of enjoy it (and I know that my Italian accent sounds even funnier).

The course is focused on Java, JUnit 5, using IntelliJ as IDE.

It looks like a brand new production, so be ready to run into typos and minor mistakes here and there. The most annoying of them is the one in Question 11 of the quizzes.

We have to write tests to detect where the bug is in a function named mirrorEnds() that sort of detects how palindromic a string is. Unfortunately, a wrong solution has to be selected to get the point on it!

If you are taking the course, I suggest you have a look at the discussion forum to get help.

And, if you want to compare your code with mine, please find my forked repository on GitHub.


From Model to View through a Spring Controller

Another improvement to our little Spittr Spring Web App. We'll fake a JDBC model, create a controller that would get some data from it and pass it to the view, a JSP page that uses EL to access the attribute.

In the end we want to display to the user a very simple web page like this one.
I won't enter into the details of the JSP page (you can find it in my GitHub repository); I'll just mention the fact that the controller puts a list of Spittles in an attribute named spittleList, and the page reads it by EL, looping through a core forEach JSTL tag.

The data is moved around in a POJO, spittr.Spittle, that contains the data identifying a spittle. There is not much to say about it, besides the fact that it has its own equals() and hashCode() implementations, based respectively on the equals() and hash() methods from the utility class Objects:
// ...

public class Spittle {
    private final Long id;
    private final String message;
    private final LocalDateTime time;
    private Double latitude;
    private Double longitude;

 // ...

    @Override
    final public boolean equals(Object that) {  // 1
        if(this == that) {
            return true;
        }
        if (!(that instanceof Spittle))
            return false;

        Spittle other = (Spittle) that;
        return Objects.equals(this.id, other.id) && Objects.equals(this.time, other.time);  // 2
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, time);  // 3
    }
}
1. I do not expect the class Spittle to be extended. In case a subclass were created, its implementation won't modify the assumption that two spittles are considered equal if they have the same id and creation timestamp, as stated in this equals() method. To enforce it, I mark it as final.
2. Using Objects.equals() saves me from the nuisance of checking the time component against null, keeping the code more readable.
3. Since id and time define Spittle equality, it makes sense using them to determine the hash code.

I have written a test case to ensure the Spittle class works as I expect. There is nothing very interesting in it and, as usual, you can look at it on GitHub.
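The gist of such a test is checking the equals()/hashCode() contract on the two significant fields. Here is a self-contained sketch, with Spittle trimmed down to just id and time and sample values of my own:

```java
import java.time.LocalDateTime;
import java.util.Objects;

public class SpittleEqualityDemo {
    // Trimmed-down Spittle: only the fields that define equality.
    static final class Spittle {
        private final Long id;
        private final LocalDateTime time;

        Spittle(Long id, LocalDateTime time) {
            this.id = id;
            this.time = time;
        }

        @Override
        public final boolean equals(Object that) {
            if (this == that) return true;
            if (!(that instanceof Spittle)) return false;
            Spittle other = (Spittle) that;
            // Objects.equals() handles nulls and avoids comparing Long references with ==
            return Objects.equals(this.id, other.id) && Objects.equals(this.time, other.time);
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, time);  // same fields as equals()
        }
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.now();
        Spittle a = new Spittle(42L, now);
        Spittle b = new Spittle(42L, now);
        Spittle c = new Spittle(7L, now);

        // Equal fields imply equal objects and equal hash codes.
        if (!a.equals(b) || a.hashCode() != b.hashCode())
            throw new AssertionError("equal spittles must match in equals() and hashCode()");
        // A different id breaks equality.
        if (a.equals(c))
            throw new AssertionError("different id must mean not equal");
        System.out.println("equality contract holds");
    }
}
```

Using Objects.equals() on id matters: boxed Long values compared with == only work for small cached values.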

The model will be based on the interface spittr.data.SpittleRepository:
public interface SpittleRepository {
    List<Spittle> findRecentSpittles();

    List<Spittle> findSpittles(long max, int count);

    Spittle findOne(long id);

    void save(Spittle spittle);
}
The idea is to create a JDBC repository; for the moment I have faked it. Here is its findSpittles() method:
@Repository
public class JdbcSpittleRepository implements SpittleRepository {
    // ...
    @Override
    public List<Spittle> findSpittles(long max, int count) {
        List<Spittle> result = new ArrayList<>();
        for (long i = 0; i < count; i++) {
            result.add(new Spittle(i, "Message " + i, LocalDateTime.now(), 1.0 * i, 2.0 * i));
        }

        return result;
    }
    // ...
}
The repository interface is used in the SpittleController to find spittles and put them in an attribute to be consumed by the view.
@Controller
@RequestMapping("/spittles")
public class SpittleController {
    private SpittleRepository spittleRepository;

    @Autowired
    public SpittleController(SpittleRepository spittleRepository) {
        this.spittleRepository = spittleRepository;
    }

    @RequestMapping(method = RequestMethod.GET)
    public String spittles(Model model) {
        model.addAttribute("spittleList", spittleRepository.findSpittles(Long.MAX_VALUE, 20));
        return "spittles";
    }
}
I use the concrete repository to see what happens when the app actually runs on my Tomcat server; however, the real testing is performed on the interface, using the mocking features included in Spring, integrated with the Mockito framework. For this reason I added this dependency to the application POM:
<dependency>
    <groupId>org.mockito</groupId>
    <artifactId>mockito-core</artifactId>
    <version>${mockito.version}</version>
    <scope>test</scope>
</dependency>
I also have to tell Spring where the repository is. For this reason I created a new configuration class, DataConfig, that simply returns the JdbcSpittleRepository as a bean. This configuration class is imported by the already existing RootConfig:
@Configuration
@Import(DataConfig.class)
public class RootConfig {
}
The web app works fine both when running it "by hand" and through the mock test, so I pushed the code to GitHub.

Reference: Passing model data to the view, from Spring in Action, Fourth Edition by Craig Walls. Chapter five, section 2.3.


Mock test for a Spring controller

Having built a simple Spring web app, now I want to test its only controller. In the previous post I did it "by hand", running a browser on its URL and verifying that I actually got the expected page; now I use the mock testing support provided by the Spring framework.

Firstly I add to the pom file the Spring test dependency:
<properties>
 <!-- ... -->
 <spring-test.version>4.3.0.RELEASE</spring-test.version>
 <!-- ... -->
</properties>
<dependencies>
 <!-- ... -->
 <dependency>
  <groupId>org.springframework</groupId>
  <artifactId>spring-test</artifactId>
  <version>${spring-test.version}</version>
  <scope>test</scope>
 </dependency>
 <!-- ... -->
</dependencies>
Then I write a JUnit test for my HomeController, following a simple pattern:
@Test
public void testHome() {
    HomeController controller = new HomeController();  // 1
    try {
        MockMvc mockMvc = standaloneSetup(controller).build();  // 2
        mockMvc.perform(get("/")).andExpect(view().name("home"));  // 3
    } catch(Throwable t) {  // 4
        // ...
    }
}
1. I create an object from my controller.
2. I build a mock MVC test object from my controller.
3. I perform a mock HTTP GET command for "/" and verify I get back a view named "home".
4. There is not much control over what a MockMvc would throw in case something goes wrong; we have to be ready to catch any possible throwable.

It looks very clean and simple. However I had a problem in the form of this exception:
java.lang.NoClassDefFoundError: javax/servlet/SessionCookieConfig at
 org.springframework.test.web.servlet.setup.StandaloneMockMvcBuilder.
  initWebAppContext(StandaloneMockMvcBuilder.java:329)
...
The fact is that I specified in the POM a dependency on the ancient javax.servlet servlet-api 2.5, while I should use at least version 3 to get the job done. For my purposes, 3.1.0 is more than enough.
Pay attention to the artifact name change that happened when moving from 2 to 3: now the servlet API is identified by the artifact id javax.servlet-api.
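In concrete terms, the fix amounts to replacing the old servlet-api dependency with something like this (the provided scope is my assumption; adjust it to how your project deploys):

```xml
<dependency>
    <groupId>javax.servlet</groupId>
    <artifactId>javax.servlet-api</artifactId>
    <version>3.1.0</version>
    <scope>provided</scope>
</dependency>
```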

Let's slightly improve my web app. We now want the controller to return the home view when getting both the root and "/homepage".

I add another test case, testHomeExtra(), that is just like testHome() but performs a GET on the other URI.
As expected, it fails, giving as its reason "No ModelAndView found".

I change my controller, specifying at class level which URIs it has to manage:
@Controller
@RequestMapping({"/", "/homepage"})
public class HomeController {
    @RequestMapping(method = GET)
    public String home(Model model) {
        return "home";
    }
}
And now I have a green light while testing.
Reference: Building Spring web applications, from Spring in Action, Fourth Edition by Craig Walls. Chapter five, section two, Testing the controller.

I have committed the project changes to GitHub.


Conditional bean

We have a bean in our application, but we want it instantiated only if some condition applies. To do that, we can use the Spring Conditional annotation and let the framework decide whether it has to be created or not.

Firstly, I create a very simple class in the restfun package:
public class MagicBean {
    @Override
    public String toString() {
        return "Magic Bean!";
    }
}
Then I create a class that implements the Spring Condition interface, defining its matches() method:
public class MagicExistsCondition implements Condition {
    @Override
    public boolean matches(ConditionContext context, AnnotatedTypeMetadata metadata) {
        Environment env = context.getEnvironment();
        return !env.containsProperty("lack_of_magic");
    }
}
I am saying that the condition holds when the environment does not contain the property named lack_of_magic.

And here is a configuration class that puts it all together in a method annotated as both Bean and Conditional; the Conditional annotation specifies the Condition class that verifies whether there are the conditions for acting, and the method returns a new instance of the required class.
@Configuration
public class MagicConfig {
    @Bean
    @Conditional(MagicExistsCondition.class)
    public MagicBean magicBean() {
        return new MagicBean();
    }
}
To check if my magic bean works as expected, I have created a test case where I get it both through the Spring application context and directly by autowiring the specific bean.
@Test
public void testBeanInContext() {
    MagicBean mb = (MagicBean) context.getBean("magicBean");
    assertThat(mb.toString(), is("Magic Bean!"));
}

@Test
public void testBeanWired() {
    assertThat(magic.toString(), is("Magic Bean!"));
}
I played a bit around this code, changing the Condition and the tests, and enjoying the JUnit green light, until I felt satisfied and pushed the code to GitHub. See the source package and the JUnit test class for details.

I have written this post while reading the third chapter, Advanced wiring, of Spring in Action, Fourth Edition by Craig Walls. I ported the original code to a Maven Spring Boot Web project on the STS IDE, using AspectJ annotations instead of using the classic xml configuration. You should expect a few other minor changes in this code, basically because I wanted to have some fun too.


Testing a Spring Profile Bean

We have seen in a previous post how to use the Spring Profile annotation to let the framework select which Bean to load at runtime. Let's see now how to test this behavior with JUnit.

Say that, for some peculiar reason, my web application needs different beans in different environments. To address this delicate matter, we can create a Spring configuration class whose factory methods provide, depending on the active profile, beans implementing classes in the same hierarchy.

For instance, here is how I could let Spring know that I want to create CharSequence Beans in an application:
@Configuration
public class CharSeqConfig {
    @Bean
    @Profile("dev")
    public CharSequence myString() {
        return "String";
    }

    @Bean
    @Profile("prod")
    public CharSequence myStringBuilder() {
        StringBuilder sb = new StringBuilder();
        sb.append("StringBuilder");
        return sb;
    }
}
When the "dev" profile is specified, Spring will use the myString() method when asked to wire a CharSequence bean. When in "prod", myStringBuilder() will be selected instead.

Testing this code with JUnit is not much more complicated than the usual job we have already seen; basically, we have to specify which profile we want to check using the ActiveProfiles annotation. However, usually we want to keep the same tests for different profiles together, and the nice thing is that we can easily do it, using static nested classes:
public class CharSeqConfigTest {
    @RunWith(SpringJUnit4ClassRunner.class)
    @ContextConfiguration(classes=CharSeqConfig.class)
    @ActiveProfiles("dev")
    public static class DevCharSeqTest {
        @Autowired
        private CharSequence charSeq;
        
        @Test
        public void testCharSeq() {
          assertThat(charSeq.toString(), is("String"));
        }
    }

    @RunWith(SpringJUnit4ClassRunner.class)
    @ContextConfiguration(classes=CharSeqConfig.class)
    @ActiveProfiles("prod")
    public static class ProdCharSeqTest {
        @Autowired
        private CharSequence charSeq;
        
        @Test
        public void testCharSeq() {
          assertThat(charSeq.toString(), is("StringBuilder"));
        }
    }
}
A minor nuisance is that we have to tell STS which family of tests we want to run with JUnit. However, it pays off well.
Having seen the green light for both profiles, I have pushed the code to GitHub, both the configuration class and the JUnit test case.

I have written this post while reading the third chapter, Advanced wiring, of Spring in Action, Fourth Edition by Craig Walls. I ported the original code to a Maven Spring Boot Web project on the STS IDE, using AspectJ annotations instead of the classic xml configuration. I have also made a few minor changes to keep the post as slim and readable as possible. Here I have used CharSequence as the root of my bean hierarchy instead of DataSource; in this way my example code makes less sense, but it gives some immediate satisfaction.


Testing Spring autowiring in a more complex scenario

Let's improve the code seen in the previous Spring post. I add a CD player component that implements a media player interface and is autowired with the CD component.

The media player interface has just a couple of methods
public interface MediaPlayer {
    String play();
    boolean hasMedia();
}
The second one checks if a media is inserted in the player and, you have probably guessed, it's there just to let me check if the autowiring of a CD is done as expected.

Then I have written a CD player, implementing this interface
@Component
public class CDPlayer implements MediaPlayer {
    private CompactDisc cd;

    @Autowired
    public CDPlayer(CompactDisc cd) {
        this.cd = cd;
    }

    // ...
}
Notice that this class is annotated as a Spring component, and its constructor is autowired. So we expect the framework to look for a suitable component in the CompactDisc hierarchy and wire it to the ctor parameter.

To get the wiring working, we need to configure the component scan properly, so I modify the configuration class in this way:
@Configuration
@ComponentScan(basePackageClasses = { CompactDisc.class, MediaPlayer.class })
public class CDPlayerConfig {
}
To test it, I have changed the CDPlayerTest to use just a MediaPlayer:
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(classes = CDPlayerConfig.class)
public class CDPlayerTest {
    @Autowired
    private MediaPlayer player;

    @Test
    public void testPlayerNotNull() {
        assertNotNull(player);
    }

    @Test
    public void testCDInserted() {
        assertTrue(player.hasMedia());
    }

    @Test
    public void testPlay() {
        assertEquals("Sgt. Pepper's Lonely Hearts Club Band by The Beatles", player.play());
    }
}
JUnit, using the Spring class runner, lets the framework do all the required wiring, in the app source code and also here in the test. I ran it, and I was happy to get a full green light.

I have written this post while reading the second chapter, Wiring beans, of Spring in Action, Fourth Edition by Craig Walls. While doing it, I have ported the original code to a Maven Spring Boot Web project on the STS IDE, using AspectJ annotations instead of using the classic xml configuration.
I have also done a few minor changes to keep the post as slim and readable as possible. For instance I removed the references to the brilliant System Rule library by Stefan Birkner & Marc Philipp.

Full code on github.
Have a look at the soundsystem package and the test file.


Testing Spring autowiring with JUnit

Given a Spring component, we can let Spring wire it automatically to a reference to an interface it implements, assuming that we tell Spring which components to scan. And using the SpringJUnit4ClassRunner we can test if it works.

In my Maven Boot Spring web application, I created a package, dd.sia.soundsystem, and I put in it an interface:
public interface CompactDisc {
    String play();
}
Then I have written a class that implements it, annotated as Spring component:
@Component
public class SgtPeppers implements CompactDisc {
    // ...
    @Override
    public String play() {
        // ...
    }
}
Now I create a class whose only purpose is letting Spring know how to configure autowiring:
@Configuration
@ComponentScan(basePackageClasses=CompactDisc.class)
public class CDPlayerConfig { 
}
Here I told Spring to scan the package containing CompactDisc. There are other ways to feed the ComponentScan annotation, and we could even rely on an empty default, meaning: check all the components in the current package. I feel more confident passing the base class/interface.

Let's test if the wiring works correctly.
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(classes = CDPlayerConfig.class)
public class CDPlayerTest {
    @Autowired
    private CompactDisc cd;

    @Test
    public void testNotBeNull() {
        assertNotNull(cd);
    }

    @Test
    public void testPlay() {
        assertEquals("Sgt. Pepper's Lonely Hearts Club Band by The Beatles", cd.play());
    }
}
I annotated the class to let JUnit know that it should use the specific Spring runner, then I told Spring that it should configure the context using CDPlayerConfig. Now I just have to autowire a CompactDisc reference and let Spring inject an instance of the only available component in its hierarchy, namely the SgtPeppers class.


I have written this post while reading the second chapter, Wiring beans, of Spring in Action, Fourth Edition by Craig Walls. While doing it, I have ported the original code to a Maven Spring Boot Web project on the STS IDE, using AspectJ annotations instead of using the classic xml configuration.

Full code on github. Most relevant files are the ones in the source soundsystem package and the test file.


Spring and Mockito

A modern web application written with Spring would use Dependency Injection to loosen the coupling among the components in the project. This is good, and helps us write more flexible code, as we have seen in a previous post. However, this also means that we need a way to test code that uses a component that is not actually there. Mock testing is here to help us. We'll see how to use Mockito to achieve this result.

Say that my Spring Boot Application has a RestController named KnightController. It maps a request for the /quest resource to a call to the method doQuest() on the Knight resource owned by the controller.
@RestController
public class KnightController {
    @Resource
    private Knight knight;

    @RequestMapping("/quest")
    public String quest() {
        return knight.doQuest();
    }
}
Knight is just an interface that declares a single method, doQuest(). The beauty of this approach is that we can send any kind of knight on a quest, the only requirement being that he implements this interface. For instance, here is a BraveKnight:
public class BraveKnight implements Knight {
    private Quest quest;

    public BraveKnight(Quest quest) {
        this.quest = quest;
    }

    @Override
    public String doQuest() {
        return quest.embark();
    }
}
This Knight owns a Quest that is defined at runtime by Constructor Injection, i.e., the actual quest this knight is going to embark on is injected into the knight through the ctor. As with Knight, Quest is also an interface with a single method, embark(). Again, in this way a knight is very flexible, having a way to change by injection which quest he will embark on.

Even if there is more to say on this architecture, here we are interested in seeing whether what we have now works as expected. Meaning, I would like to write a test case where I create a Quest, inject it into a BraveKnight, and, when I call his doQuest() method, see if the quest's embark() method is called. We can't do that in the real world, since Quest is just an interface. But we can mock it, for instance using Mockito.
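Before reaching for Mockito, it's worth seeing what this verification costs by hand. Here is a self-contained sketch with a fake Quest that counts its embark() calls; the interface and class names mirror the post, but the counting fake is my own addition:

```java
public class BraveKnightByHandTest {
    interface Quest { String embark(); }

    interface Knight { String doQuest(); }

    static class BraveKnight implements Knight {
        private final Quest quest;
        BraveKnight(Quest quest) { this.quest = quest; }  // constructor injection
        @Override public String doQuest() { return quest.embark(); }
    }

    // A hand-written fake: records how many times embark() was called.
    static class CountingQuest implements Quest {
        int embarkCalls = 0;
        @Override public String embark() { embarkCalls++; return "embarked"; }
    }

    public static void main(String[] args) {
        CountingQuest quest = new CountingQuest();
        Knight knight = new BraveKnight(quest);
        knight.doQuest();
        if (quest.embarkCalls != 1)
            throw new AssertionError("embark() not called exactly once");
        System.out.println("embark() called once, as expected");
    }
}
```

A mocking framework generates this kind of counting stand-in for us, which is exactly what Mockito.mock() and Mockito.verify() do below.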

I need to add the mockito-core dependency to my POM file for the test scope.
And then I can create a JUnit-Mockito test case in the test folder. I named it BraveKnightTest.
Its structure is straightforward; let's spend a few words on its first and last lines anyway:
Quest quest = Mockito.mock(Quest.class); // 1
// ...
Mockito.verify(quest, Mockito.times(1)).embark(); // 2
1. Create an instance from a mock class that implements the Quest interface.
2. Verify that the method embark() defined on our mock Quest object has actually been called once, as expected.

I have written this post while reading the first chapter of Spring in Action, Fourth Edition by Craig Walls. While doing it, I have ported the original code to a Maven Spring Boot Web project on the STS IDE, using AspectJ annotations instead of the classic xml configuration.

Full source code is on github in the springInAction folder.


Writing backwards

We want to write a function that reverses the words in input. Just the order of the words is reversed, not their actual structure: if we get in input "hello" and "world", we want in output "world hello", not "dlrow olleh". Notice that there is a blank between each couple of words, but not at the end of the sentence.

This problem is based on the CodeEval Reverse world challenge.

A few test cases (C++11, with GTest as the xUnit framework) clarify what I expect from the function:
TEST(ReWo, Given) // 1
{
  ASSERT_EQ(solution( { "Hello", "World" } ), "World Hello");
}

TEST(ReWo, OneWord) // 2
{
  ASSERT_EQ(solution( { "Hello" } ), "Hello");
}

TEST(ReWo, Empty) // 3
{
  ASSERT_DEATH(solution( { } ),"Assertion .* failed.");
}
1. Vanilla case, the words in input are combined in a string from the last one up.
2. A single word in input should not break the algorithm.
3. It is the caller's responsibility to ensure that at least one word is passed. To make this requirement explicit in the code, I want my function to assert that the input is not empty. I could have been less strict and simply returned an empty string, but I wanted to show how to use the GoogleTest ASSERT_DEATH macro. More on the same topic in a previous post.

Here is how I implemented the function in plain C++98:
std::string solution(const std::vector<std::string>& input)
{
  assert(!input.empty()); // 1

  std::ostringstream oss; // 2
  for(std::vector<std::string>::const_reverse_iterator it = input.rbegin(); it != input.rend() - 1; ++it)
    oss << *it << ' '; // 3
  oss << input[0]; // 4

  return oss.str(); // 5
}
1. The input vector can't be empty.
2. I cared more about readability than performance here. Otherwise I would have been tempted to use a plain STL string, reserving enough memory for it on creation (summing up all the string sizes plus the number of blanks).
3. Put to the string stream all the elements in the vector followed by a blank, starting from the last one (reverse begin) up to the second (the one before the reverse end). We have asserted that the vector has at least one element, so we know that this loop works fine. In case there is only one string in input, this block is simply skipped.
4. The first element in input (possibly the only one), is put to the string stream as last token. No trailing blank, as required.
5. Extract the resulting string from the stream and return it.
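For comparison with the C++ version above, here is the same algorithm ported to Java; this is my own sketch, not part of the original challenge code:

```java
import java.util.Arrays;
import java.util.List;

public class ReverseWords {
    // Joins the input words in reverse order, separated by single blanks,
    // with no trailing blank; mirrors the C++ solution() above.
    public static String solution(List<String> input) {
        if (input.isEmpty())
            throw new IllegalArgumentException("input must not be empty");
        StringBuilder sb = new StringBuilder();
        // From the last word down to the second one, each followed by a blank.
        for (int i = input.size() - 1; i > 0; --i) {
            sb.append(input.get(i)).append(' ');
        }
        sb.append(input.get(0));  // first word last, no trailing blank
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(solution(Arrays.asList("Hello", "World")));  // World Hello
    }
}
```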


Is it a permutation?

We want to write a function that gets in input a vector and checks if it contains a permutation of the sequence of natural integers starting from 1.

This is an input that we should accept: { 4, 1, 3, 2 }
And this one should be rejected: { 4, 1, 3 }

You can find this problem on the Codility Train page, section Counting Elements, codename PermCheck. You can submit your solution in one of a few different supported programming languages to check it against their acceptance test.

My solution is written in C++98 (alas C++11 is not supported) with test cases for GoogleTest.

Test cases

A couple of test cases are given in the problem presentation, I just added a couple of trivial ones more:
int solution(std::vector<int>& input); // 1

TEST(PeCe, GivenGood) // 2
{
    std::vector<int> input;
    input.push_back(4);
    input.push_back(1);
    input.push_back(3);
    input.push_back(2);

    EXPECT_EQ(1, solution(input));
}

TEST(PeCe, GivenBad) // 3
{
    std::vector<int> input;
    input.push_back(4);
    input.push_back(1);
    input.push_back(3);

    EXPECT_EQ(0, solution(input));
}

TEST(PeCe, OneGood) // 4
{
    std::vector<int> input(1, 1);

    EXPECT_EQ(1, solution(input));
}

TEST(PeCe, OneBad) // 5
{
    std::vector<int> input(1, 42);

    EXPECT_EQ(0, solution(input));
}

TEST(PeCe, OneBigBad) // 6
{
    std::vector<int> input(1, 1000000000);

    EXPECT_EQ(0, solution(input));
}
1. The function prototype we have to implement. For some reason, instead of returning a boolean, it returns an integer that acts as a K&R C boolean emulation: 0 means false, 1 true.
2. First given test, it should detect the permutation.
3. Second given test, no permutation in input.
4. Simplest possible good case.
5. Simplest possible bad case.
6. A curious case. We should consider that we could have huge integers in input, up to one billion actually. This is a bit strange, since the max expected size for the vector is just a modest 100 thousand; we should expect some trickery in the Codility acceptance test based on this point.

A too expensive solution

What we could do is repeatedly scan the input looking for each of the sequence values. If we don't find an expected one, we return failure; otherwise the input is accepted. We should immediately smell something bad: looping on all the N elements of a container, checking for N values, leads to a Big Oh N Squared worst-case time complexity, which we should avoid.

In any case, here it is:
int solution(std::vector<int>& input)
{
    for(unsigned i = 1; i <= input.size(); ++i) // 1
    {
        bool found = false;
        for(unsigned j = 0; j < input.size(); ++j) // 2
        {
            if(static_cast<unsigned>(input[j]) == i) // 3
            {
                found = true;
                break;
            }
        }
        if(!found)
            return 0;
    }

    return 1;
}
1. Our scrambled sequence should contain all the integers from one up to the number of elements in input.
2. Let's look for the current value.
3. The vector in input should have been parametrized on unsigned values; Codility decided otherwise, so here I explicitly tell the compiler not to worry about comparing what it perceives as objects of different types. Trust me, they are actually both unsigned ints.

The code works fine for small input, but it rapidly gets too slow to be acceptable when the input size grows.

Linear solution

Let's divide the job in two consecutive steps, using a buffer to store the temporary result. In this way, instead of one loop nested in another, we'll have one loop after the other, reducing the time complexity to O(N). We pay for this improvement with an increased space complexity, but we are happy to pay this price:
int solution(std::vector<int>& input)
{
    std::vector<bool> buffer(input.size()); // 1

    for(unsigned i = 0; i < input.size(); ++i) // 2
    {
        unsigned value = input[i]-1;
        if(value < input.size()) // 3
            buffer[value] = true;
        else
            return 0;
    }

    return std::count(buffer.begin(), buffer.end(), false) == 0 ? 1 : 0; // 4
}
1. Flags for the expected values in the input vector. The elements of a container are initialized with the default value for the underlying type, so at this point we have a bunch of false elements.
2. Loop on all the input values.
3. The OneBigBad test case we have seen above was written to help me remember to write this check. If input contains (at least) one value bigger than expected, we know we have to reject it. Only if the value is in the correct range do we set its flag to true. Moreover, notice that in the line above I have also silently converted the value from signed to unsigned: if a rogue input included a negative value, it has been converted there to a huge positive number, and here the anomaly is caught. It would have been better to impose that the input vector was of unsigned elements.
4. I have hidden the second loop, which uses the buffer's partial result, in this call to the STL count() function. We expect no flag to be still set to false. If that is the case, we can return success.

Sleeker and (maybe) faster

Actually, as suggested by Karsten (see his comments below), there is no actual need for the final buffer check, if we have already carefully checked each value in input. The advantage of moving the check here is that we can fail fast, as soon as we detect a duplicate element in input:
int solution(std::vector<int>& input)
{
    std::vector<bool> buffer(input.size());

    for(unsigned i = 0; i < input.size(); ++i)
    {
        unsigned value = input[i]-1;
        if(value < input.size() && buffer[value] == false) // 1
            buffer[value] = true;
        else
            return 0;
    }

    return 1; // 2
}
1. If we see that the flag has been already set, it means that we have spotted a duplicate, so we can reject the current input.
2. If we get here, each element in input has been accepted, we just have to return success.

Go to the full post

Equilibrium in a vector

We have an array of integers in input, whose size we know for sure to be in the range [2..100,000], with each element in [−1,000..1,000]. We'd like to split it in two parts, so that the sums of the elements on both sides are as close as possible. For some weird reason, we are not asked to return the index for which such condition is fulfilled, but the minimal difference, as an absolute value, between the left and right sums.

You could find this problem in the Codility Train page, in the Time Complexity section, under the nickname Tape-Equilibrium. You could submit there your solution for evaluation in one of the many languages supported. C++11 is not one of them, so I have written my code for C++98.

[Good news, now Codility supports C++11. I have refreshed the code accordingly. Please follow the link for details.]

Notice that the number of elements in the vector and the value range for each element are such that we can happily work with plain integers, the maximum sum we could get being a mere hundred million or so. The real issue in this problem is time complexity: we should strive for a linear solution if we want to get a 100% score.

Firstly, some test cases (written for GoogleTest):
int equilibrium(std::vector<int>& input); // 1

TEST(TapEq, Given) // 2
{
    std::vector<int> input;
    input.push_back(3);
    input.push_back(1);
    input.push_back(2);
    input.push_back(4);
    input.push_back(3);

    EXPECT_EQ(1, equilibrium(input));
}

TEST(TapEq, TwoBig) // 3
{
    std::vector<int> input;
    input.push_back(-1000);
    input.push_back(1000);

    EXPECT_EQ(2000, equilibrium(input));
}

TEST(TapEq, HundredK) // 4
{
    std::vector<int> input(100000, 1000);

    EXPECT_EQ(0, equilibrium(input));
}

TEST(TapEq, AlmostHundredK) // 5
{
    std::vector<int> input(99999, 1000);

    EXPECT_EQ(1000, equilibrium(input));
}
1. It doesn't feel right to me to pass around a non-const reference to an STL container if there is no compelling reason to do so. Here the reason is just that Codility wants us to do that.
2. This is the test given with the problem description. The equilibrium point is such that the vector is split between (3, 1, 2) and (4, 3) leading to a difference of 1, that is going to be returned to the caller.
3. Minimal case, just two elements.
4. Biggest vector I could get in input: one hundred thousand elements, each with value 1,000. Being an even number of identical elements, the result is zero.
5. Almost like case (4), but here we have an odd number of elements, all with the same value, so a perfect equilibrium is not possible.

Bad solution O(N**2)

We could think of looping on all the elements in the vector, from the first to the last but one. That would be our pivot, dividing the vector in two parts. Then we sum all the left and all the right elements, and compare the results:
int equilibrium(std::vector<int>& input)
{
    int result = std::numeric_limits<int>::max(); // 1
    for(unsigned i = 0; i < input.size() - 1; ++i)
    {
        int left = 0; // 2
        for(unsigned j = 0; j <= i; ++j)
            left += input[j];

        int right = 0; // 3
        for(unsigned j = i + 1; j < input.size(); ++j)
            right += input[j];

        int difference = std::abs(left - right); // 4
        if(difference < result)
            result = difference;
    }

    return result;
}
1. In the beginning we have no result; let's remark this by initializing the variable to the highest available value.
2. Sum up all the elements on the left of the current pivot (included).
3. Sum up all the elements on the right of the current pivot (excluded).
4. Get the difference between left and right, and keep it, if it is the current minimum value.

This algorithm is straightforward, but it has a major issue: its time complexity is in the order of N squared, as we can see immediately, given the twin for-loops nested in the outer for-loop.

And all this summing up is not required. Or better, we repeat many times the same additions we have already done. A better solution would let us reduce them to the bare necessity.

Linear solution

Let's start splitting the vector in two parts. One contains just the leftmost element, the other one all the other elements. Then it would be just a matter of adding the current border element to the left and simultaneously subtracting it to the right. We still have two O(N) for-loops, but they are one after the other, so the resulting time complexity falls to linear:
int equilibrium(std::vector<int>& input)
{
    std::vector<int>::iterator pivot = input.begin();
    int left = *pivot;
    int right = std::accumulate(++pivot, input.end(), 0); // 1
    int result = std::abs(left - right);

    for(; pivot < input.end() - 1; ++pivot) // 2
    {
        left += *pivot;
        right -= *pivot;
        int diff = std::abs(left - right);
        if(diff < result)
            result = diff;
    }

    return result;
}
1. The first for-loop is hidden in this call to STL accumulate() function.
2. Second for-loop. Notice that I loop till the last but one element, since I want the right side of the vector to contain at least one element.

Go to the full post

The missing element

An easy problem, that you can also find in the Codility Time Complexity train section. I have written my solution in C++; you could test and submit yours in your preferred language (when available). Its Codility codename is Perm-Missing-Elem.

In a few words: we are given in input a vector of size N containing all the integers (with no duplicates) but one in the [1..(N+1)] range. Our job is returning the missing one. We know that N is at most 100,000, and we want linear time and constant space complexity.

First step, I have written the function prototype and I have tried to figure out a few test cases to help me in its design and development (I use Google Test, but any xUnit framework would do):
int missing(const std::vector<int>& input);

TEST(TestPME, CaseSample) // 1
{
  ASSERT_EQ(4, missing({ 2, 3, 1, 5 }));
}

TEST(TestPME, CaseEmpty) // 2
{
  ASSERT_EQ(1, missing({ }));
}

TEST(TestPME, CaseThree) // 3
{
  ASSERT_EQ(3, missing({ 2, 4, 1 }));
}
1. This is the test case provided by Codility. The vector is sized four, the biggest value contained could be five; it is easy to see that the missing element is four. A technicality: I have used the handy C++11 notation to create a vector on the fly from an initialization list.
2. It is better to keep an eye on special cases. Here I check what happens when an empty vector is passed. N is zero, so only 1 could be the missing value.
3. There's not much to speculate on this function's behavior. But for reasons that will become evident in a little while, it is better to test it for both even and odd input sizes.

Naive solution

We could think of repeatedly scanning the vector looking for the missing value. But we should immediately see that this is not a viable solution: we would need to loop on all the vector elements for each possible value, which puts us in the domain of O(N²) time complexity.

Buying time with space

We can radically improve the time complexity using some more memory. Instead of performing a loop in a loop, we perform one after the other, reducing the time complexity to a mere O(N). Sure we have to pay for it, since we have to store the intermediate result somewhere, increasing the space complexity.

The idea is to scan the input vector, using each value read as an index into another vector, to keep track that we have found it.
Then we scan the buffer; as soon as we find an element not set, we know that the corresponding value was not present in the original.

Here is my implementation:
int missing(const std::vector<int>& input)
{
  std::vector<bool> check(input.size()+1); // 1

  for(unsigned i = 0; i < input.size(); ++i) // 2
    check[input[i] - 1] = true;

  for(unsigned i = 0; i < check.size(); ++i) // 3
    if(check[i] == false)
      return i + 1;

  return 0;
}
1. I use as buffer a vector of booleans. Remember that we need one element more than the original vector. This STL vector constructor ensures that all the contained elements are set to the default value for their type, which for booleans is false.
2. Loop on the input vector. Quite frighteningly, I haven't done any error handling, trusting the user to provide good input. This is usually a very bad idea. In any case, I read the current value, decrease it by one to get a C-style 0-based index, and set the associated element in the check buffer to true, meaning "yeah, we have got this element".
3. Loop on the check buffer. The first element I find set to false is the one I am interested in (actually, it should be the first and only one). Convert back from zero-based index to actual value, increasing by one, and return it.

This solution is good enough to pass the Codility check, but doesn't look satisfactory to me.

Cheaper and more elegant

We are not so much interested in all the values we have in the input container. The unique value that is missing is what we are really interested in. We can spot it reasoning on the input nature. What we get is basically a scrambled sequence of the first N + 1 natural numbers from which we removed a single value. It is very easy to calculate an arithmetic series, if we don't consider the removal. At least since Gauss explained the trick. His classic formula for arithmetic series is usually written like this:
x = 1/2 * top * (top + 1)
What we do is add the first and last element, top + 1, and multiply it by half the number of elements. Think about it. To get the sum of all the integers in [1 .. 10], you can add (1 + 10), (2 + 9), ..., (5 + 6). That is 5 * 11.

Once we have the expected result for the full sequence, we subtract from it the result we get summing the actual elements; what remains is the missing value:
int missing(const std::vector<int>& input)
{
  int64_t actual = std::accumulate(input.begin(), input.end(), int64_t(0)); // 1

  int64_t top = input.size() + 1;
  int64_t expected = top * (top + 1) / 2; // 2

  return expected - actual; // 3
}
1. The STL accumulate function sums up all the elements in the passed interval, using the third parameter as initial value. I have used an explicit 64-bit integer type to get rid of problems related to the different platforms on which this code could run. Pay attention to the fact that both the size of the container and its biggest element could be 100K, so we could easily reach values in the range of billions.
2. I have rewritten the Gauss formula in this slightly different form to avoid the nuisance of multiplying by a floating-point number (1/2 = 0.5) when I know for sure that I am working with integer numbers only.
3. Implicit conversion from 64-bit integer to a plain (whatever that means in your environment) integer. Since I am sure that the result is smaller than 100K, I am playing it safe.

Go to the full post

Process name by procfs

In a UNIX(-like) environment, the init process is the parent of all the other processes. It is usually created first at startup, with 1 as its pid (process identifier). However, there is no guarantee that this is true in a specific environment. In my case, I found out that Ubuntu 13.10, Saucy Salamander, follows a different convention: there is an init process with pid 1, but our user processes are managed by a "user" init. Here I show how to get, on Linux, the name of the executable associated with a specific process.

We get Linux system information by checking the /proc (pseudo) file system. There we also find a subdirectory for each process currently running on the system, identified by its pid. Here we are interested in checking the "comm" file. Notice that "comm" is available only in recent versions of the Linux kernel; in case it is not available in your environment, you could fall back to "cmdline". Besides, the stored "comm" value is truncated to TASK_COMM_LEN characters, which should currently be set to 16.

It is easy to write a C++ function that does the trick; a bit more work is required for its C little brother. Here are their prototypes:
std::string getProcessName(int pid); // 1
bool getProcessName(int pid, char* name, int size); // 2
1. C++ lets me define a clean interface. I pass in the pid, I get back the name. If the pid has no associated process, I expect an empty string as output.
2. The C version is a bit clumsier. I need to pass as input parameters the buffer where to write the name, and its size. To simplify its usage I also return a boolean (you can change it to int, for older C compilers) reporting success or failure.

As usual, I write a few test cases (for Google Test) to drive me in the development:
TEST(ProcessName, GetInitPlus)
{
  ASSERT_STREQ("init", getProcessName(1).c_str()); // 1
  ASSERT_STREQ("init", getProcessName(1680).c_str()); // 2
}

TEST(ProcessName, GetOnePlus)
{
  ASSERT_STREQ("bash", getProcessName(2292).c_str()); // 3
}

TEST(ProcessName, GetMissingPlus)
{
  ASSERT_TRUE(getProcessName(8799).empty()); // 4
}
1. Pid 1 should refer to the init process.
2. My "user" init had that specific pid when I tested my code. You should change it as required.
3. Same as for (2), the pid of my bash shell was 2292, do not expect this test to succeed without proper editing.
4. I had no process with such id, so I expected an empty string as result.

TEST(ProcessName, GetInit)
{
  const int size = 16; // 1
  char buffer[size];
  ASSERT_TRUE(getProcessName(1, buffer, size)); // 2
  ASSERT_STREQ("init", buffer);
}

TEST(ProcessName, GetUserInit)
{
  const int size = 16;
  char buffer[size];
  ASSERT_TRUE(getProcessName(1680, buffer, size)); // 3
  ASSERT_STREQ("init", buffer);
}

TEST(ProcessName, GetOne)
{
  const int size = 16;
  char buffer[size];
  ASSERT_TRUE(getProcessName(2292, buffer, size));
  ASSERT_STREQ("bash", buffer);
}

TEST(ProcessName, GetMissing)
{
  const int size = 16;
  char buffer[size];
  ASSERT_FALSE(getProcessName(8799, buffer, size));
}
1. According to the official documentation, the constant TASK_COMM_LEN should be defined in "linux/sched.h". For some reason, I couldn't find it there or in any other Linux include. Maybe I have outdated includes on my machine. I didn't investigate much, and used a local constant instead.
2. I assert that a process should be found and, in the next line, that the name should be "init".
3. As for the C++ version, check the actual pid for the "user" init on your current environment before testing.

Here is the C++11 version of my function:
std::string getProcessName(int pid)
{
  std::ifstream ifs { ("/proc/" + std::to_string(pid) + "/comm").c_str() }; // 1
  if(ifs.bad()) // 2
    return {};

  std::string command;
  std::getline(ifs, command);
  return command;
}
1. I try to open an input file stream for the /proc/{pid}/comm file. The C++11 to_string() function converts an integer to a string, so its result can use the operator+ std::string overload to join it with the plain C-strings on its left and right. Strictly speaking, C++11 also added an ifstream constructor taking an std::string directly; I call c_str() on the result just to stay compatible with older library implementations.
2. If the file can't be opened, I return an empty string. Otherwise the file content is extracted and returned.

And this is my C99 version:
bool getProcessName(int pid, char* name, int size)
{
  char filename[80];
  sprintf(filename, "/proc/%d/comm", pid);

  FILE* file = fopen(filename, "r"); // 1
  if(!file)
    return false;

  fgets(name, size, file); // 2
  name[strlen(name) - 1] = '\0';
  return true;
}
1. If I can't open the "comm" file for reading, I return an error, without caring about setting the buffer.
2. Otherwise I use fgets() to read the file content, then get rid of the '\n' at the end of the string.

Go to the full post

Test hello rapidxml

This include-only C++ library is a bit outdated; still, it is commonly used when the job must be done fast and the requirements are not too sophisticated. You can get the source code from SourceForge, where you can also find a slim technical manual.

There are at least a couple of rapidxml characteristics you should be aware of before starting to work with it.

Rapidxml parsing is destructive. The xml_document::parse() method gets in input a non-constant C-string, which it uses as its own internal buffer. If you want to keep your XML as it is, you'd better pass in a copy of it.

Preconditions are usually checked with assertions; exceptions are thrown only by the xml_document::parse() method. Be careful to test what you are passing to an asserting function (for instance, xml_node::last_node() requires the node to have at least one child, asserting that its first_node() is not NULL), and to try/catch the parse call.

I have written a test case (using the Google Test framework) that shows how to parse a simple XML and read the information in it. Notice that I just read a document, without performing any editing on it; this keeps the example simple enough.
#include "rapidxml/rapidxml.hpp"
#include <gtest/gtest.h>

TEST(RapidXml, simple)
{
  char buffer[] = "<root><first>one</first><second>two</second><third>whatever</third></root>"; // 1

  rapidxml::xml_document<char> doc; // 2
  ASSERT_NO_THROW(doc.parse<0>(buffer)); // 3

  rapidxml::xml_node<char>* root = doc.first_node(); // 4
  ASSERT_TRUE(root);
  ASSERT_STREQ("root", root->name()); // 5

  bool fields[4] {}; // 6
  for(rapidxml::xml_node<char>* node = root->first_node(); node != NULL; node = node->next_sibling()) // 7
  {
    if(strcmp(node->name(), "first") == 0) // 8
    {
      ASSERT_STREQ("one", node->value());
      fields[0] = true;
    }
    else if(strcmp(node->name(), "second") == 0)
    {
      ASSERT_STREQ("two", node->value());
      fields[1] = true;
    }
    else if(strcmp(node->name(), "third") == 0) // 9
    {
      fields[2] = true;
    }
    else // 10
    {
      fields[3] = true; // unexpected!
      std::cout << "Unexpected node: " << node->name() << std::endl;
    }
  }

  EXPECT_TRUE(fields[0]); // 11
  EXPECT_TRUE(fields[1]);
  EXPECT_TRUE(fields[2]);
  EXPECT_FALSE(fields[3]);
}
1. Remember that rapidxml is going to change this C-string (NULL-terminated array of characters) for its own purposes.
2. The xml_document template class has a template parameter that defaults to char. If you want to save some typing you can rewrite this line without specifying the parameter, and using the char default:
rapidxml::xml_document<> doc
3. xml_document::parse() expects an int as template parameter, pass zero to get the default behavior. In your code you should try/catch this call for rapidxml::parse_error exception (it extends the std::exception). Here I assert that it should not throw.
4. xml_document IS-A xml_node, so I call on doc the xml_node::first_node() method to get the first document child. If doc has no child, first_node() returns a NULL pointer, otherwise we have a pointer to that node.
5. I expect the root to be there, so I assert that it is not zero (AKA false), then I get its name and I assert it is as expected. xml_node IS-A xml_base, where we can see that the name() method never returns NULL, if the node has no name, an empty C-string is returned instead.
6. Root has three children. I want to ensure I see all of them and nothing more; this bunch of booleans keeps track of that. They are all initialized to false (through the handy C++11 empty list initializer) and then, in the following loop, when I see one of them I set the relative flag to true. There are four booleans, not three, because I want to flag also the case of an unexpected child.
7. The for-loop starts by getting the first root child, then moves to the next sibling, until we reach the end of the family (a NULL is returned). We should pay attention when using xml_node::next_sibling(), since it asserts if the current node has no parent. But here we call next_sibling() on a node that is surely a child of another node.
8. For the first and second node, we want to ensure they have a specific value, hence the assertions.
9. The third node could have any value, I just set the flag when I see it.
10. In case an unexpected node is detected, I keep track of this anomaly setting the relative flag.
11. Check if the expectations are confirmed.

Go to the full post

Commuting Engineer CodeEval problem

We have the coordinates of a bunch of places we want to visit. Being nerds, we are not happy unless we generate an algorithm to determine our path. But the nature of the problem is such that we are ready to deal with a loose approximation of a good solution. Frankly speaking, we are aiming at what could look, at least at first sight, like a not-so-bad solution.

You can get a more detailed description of the problem on the CodeEval blog. There you can even enter your solution that would be used (at least at the time when I am writing this post) as a first screening for some job interview.

You shouldn't use the solution I am proposing here, for a few good reasons. Firstly, it is not a good idea to use a piece of code written by someone else to represent you. Secondly, I have written and tested my code for C++11 on GCC 4.8, which is a bit too modern for the current CodeEval requirements. And thirdly, come on, you don't want to give away a good chance of having some fun writing your own solution.

As I said, I am not aiming to the best possible solution, and this is not caused by sloppiness, there are good reasons for that. Let's see a couple of them.

NP-hard

If you have some knowledge of the theory of computation, you should have recognized this as an instance of the well-known Traveling Salesman Problem, commonly called just TSP. It is an interesting problem because it can be stated in a few words, and it looks very easy indeed, but it turns out to be NP-hard (Non-deterministic Polynomial-time hard). That means: forget about coming up with an elegant solution.

In real life, what I would do if I had to solve a problem like this is check for a library providing some adequate algorithm. For instance, you could have a look at BGL, the Boost Graph Library.

But here we can't use external libraries; we have to rely just on the standard ones. So what I'll do is implement a greedy algorithm, choosing each time the locally best solution. It is easy to show that such a strategy is heavily flawed, but at least it assures us we get a solution that is not the worst one, and it does so in a reasonable time.

Geography

As you should know, Earth is not flat. We usually think of it as a sort of sphere, but this too is nothing more than a weak approximation. This implies that we shouldn't treat the coordinates of a place as if they were on a two-dimensional surface.

Besides, we are going to compute the distance as if we could go in a straight line from one point to the other, which is usually not the case.

Splitting the problem

I have reduced the original problem to something that I can actually solve with a relatively simple piece of code. Now I split it in a few simpler problems; for each of them I can write a function that solves it.

Parsing the input

Our input is a number of strings, each of them like this one:
1 | CodeEval 1355 Market St, SF (37.7768016, -122.4169151)
We are interested in its ordinal number, and in the place longitude and latitude.

I decided to organize my data using STL pairs and a vector, and I gave them names that hopefully help to understand better what is going on in the code:
using Position = std::pair<double, double>;
using Location = std::pair<int, Position>;
using Locations = std::vector<Location>;
Given that, what I want is to extract a Location from each input string, that is going to be pushed in a Locations container. This is the declaration of the function I am thinking of:
Location parse(const std::string& input);
This test case (written for Google Test) shows how I expect it to behave:
TEST(CommEng, Parse1)
{
    std::string input("1 | CodeEval 1355 Market St, SF (37.7768016, -122.4169151)");
    Location loc = parse(input);

    EXPECT_EQ(1, loc.first);
    EXPECT_DOUBLE_EQ(37.7768016, loc.second.first);
    EXPECT_DOUBLE_EQ(-122.4169151, loc.second.second);
}
Notice I use the EXPECT_DOUBLE_EQ() gtest macro to check the actual value extracted from the input string. This is to avoid, or at least reducing, rounding problems. What I basically do using this macro is delegating to GoogleTest the job of choosing an appropriate epsilon that determines when the two compared values are considered about equal.

Here is my function implementation:
Location parse(const std::string& input)
{
    int nr = std::stoi(input); // 1

    std::string::size_type bracket = input.find('('); // 2
    if(bracket == std::string::npos)
        return {}; // 3

    std::string::size_type comma = input.find(',', bracket);
    if(comma == std::string::npos)
        return {};

    double lon = std::stod(input.substr(bracket + 1)); // 4
    double lat = std::stod(input.substr(comma + 1));
    return { nr, {lon, lat}}; // 5
}
1. stoi() is the standard C++11 function similar to the old atoi(), but taking as input parameter an STL string instead of a C-string. Here I am extracting the place's ordinal number, expected right at the beginning of the string. Real code should be more robust, and be ready to (probably) throw an exception.
2. Just minimal error checking: if the open bracket or the following comma is not found, an empty Location is returned.
3. Maybe it is worth remembering that this C++11 notation means "call the default ctor for the expected object". So, what I am doing here is building an "empty" Location.
4. Extract the input substring from the expected position (again, more error handling required in production code), then convert it to double using the C++11 stod() function.
5. Construct a Location object and return it to the caller.

Approximated distance

Just apply the Pythagorean theorem to calculate the distance between two positions. For what I said above, the result should be considered just an approximation:
double distance(const Position& beg, const Position& end)
{
    return std::sqrt(std::pow(beg.first - end.first, 2) + pow(beg.second - end.second, 2));
}
The closest point

My greedy algorithm needs to identify the closest Location to a specific Position:
Locations::iterator findClosest(Position& beg, Locations& others)
{
    double closest = std::numeric_limits<double>::max(); // 1
    Locations::iterator pos = others.end();
    for(Locations::iterator it = others.begin(); it != others.end(); ++it) // 2
    {
        double current = distance(beg, it->second); // 3
        if(current < closest)
        {
            closest = current;
            pos = it;
        }
    }

    return pos; // 4
}
1. Initially I set the solution to "nothing sensible": the closest distance found is set to the biggest double number available, and the found position to an invalid value, the end() iterator.
2. Loop on all the available other points.
3. Calculate the distance to the current candidate; if it beats the best found so far, mark it as the new closest, and keep looking for anything better.
4. Return the iterator to the best solution I found.

Generating a path

The core of my algorithm is a function that gets in input a container of Locations and gives back a vector containing the path, where each step is identified by the Location descriptor.
std::vector<int> getPath(const Locations& input)
{
    if(input.empty()) // 1
        return {};

    Locations locations(input); // 2
    std::vector<int> results;
    Locations::iterator current = locations.begin(); // 3
    do { // 4
        Position curPos = current->second; // 5
        results.push_back(current->first);
        locations.erase(current);

        current = findClosest(curPos, locations); // 6

    } while(!locations.empty());

    return results;
}
1. Trivial case, nothing in input, nothing in output.
2. Create an input local copy, since I am about to modify it.
3. As per the requirements, the path starts from the first element in the provided list.
4. Loop until all the input is consumed.
5. Erase the current position from the list of Locations that I haven't visited yet, but before that, push its descriptor in the output vector.
6. Find the next "current" element.

Full C++11 source code, and some more test cases, on github.

Go to the full post

Assert death and deduction on Osmos

The Code Jam Round 1B 2013 Problem A is nicknamed Osmos. Writing googletest test cases for it, I found an occasion to use the ASSERT_DEATH macro. Implementing the function, I have also used template argument deduction. Hence the fancy name of this post.

We have in input an integer representing the weight of our mote (think of a mote like a greedy little sort-of-living organism), and a bunch of integers representing all the other motes in the neighborhood. We want our mote to survive. To do that, it has to absorb all the smaller motes (growing its weight in the process). We could help it in two ways, adding motes that it could swallow to increase its size, and removing motes that are too big for it. Besides, we are lazy. We want to minimize the number of our adding/removing actions.

Test cases would help to clarify the matter. I am using C++11 by GCC 4.8 on a Linux machine. Test cases are written in the xUnit way, using the gtest framework.
unsigned osmos(unsigned myMote, std::vector<unsigned>& motes); // 1

TEST(TestOsmos, CaseSample1) // 2
{
  const unsigned myMote = 2;
  std::vector<unsigned> motes { 2, 1 };

  ASSERT_EQ(0, osmos(myMote, motes));
}

TEST(TestOsmos, CaseSample2) // 3
{
  const unsigned myMote = 2;
  std::vector<unsigned> motes { 2, 1, 1, 6 };

  ASSERT_EQ(1, osmos(myMote, motes));
}

TEST(TestOsmos, CaseSample3) // 4
{
  const unsigned myMote = 10;
  std::vector<unsigned> motes { 25, 20, 9, 100 };

  ASSERT_EQ(2, osmos(myMote, motes));
}

TEST(TestOsmos, CaseSample4) // 5
{
  const unsigned myMote = 1;
  std::vector<unsigned> motes { 1, 1, 1, 1 };

  ASSERT_EQ(4, osmos(myMote, motes));
}

TEST(TestOsmos, CaseBigger) // 6
{
  const unsigned myMote = 42;
  std::vector<unsigned> motes { 1, 40, 12, 7 };

  ASSERT_EQ(0, osmos(myMote, motes));
}

TEST(TestOsmos, CaseMixed) // 7
{
  const unsigned myMote = 2;
  std::vector<unsigned> motes { 8, 15, 30, 40, 60 };

  ASSERT_EQ(3, osmos(myMote, motes));
}

TEST(TestOsmos, CaseBadMote) // 8
{
  const unsigned myMote = 0;
  std::vector<unsigned> motes { 8, 15, 30, 40, 60 };

  ASSERT_DEATH(osmos(myMote, motes),"Assertion .* failed.");
}
1. My osmos function gets in input my mote's weight and all the other motes' ones. It gives back the number of adjustments I have to perform on the environment to help my mote. It is not clearly stated in the problem, but no mote weight should be less than one, and we can assume it fits in an int.
2. There are four examples provided with the original problem; I have converted them into test cases. In the first one we expect our mote (sized two) to happily eat its smaller brother (one), getting big enough (three) to eat the other one (sized two). No help is required from us, so the function should return zero.
3. My mote has no problem (not even an ethical one) eating the tiny twins weighing one each, getting in this way big enough to eat the mote sized two, but it can't eat its brother sized six. We need to interfere once, either removing six from the list, or adding a smaller mote that ours can absorb, growing big enough to complete the job.
4. Eat nine, become nineteen. That is not enough to eat twenty, so I provide an eighteen mote, which is immediately eaten, growing my mote to a spectacular thirty-seven. Eat twenty, then twenty-five, becoming eighty-two. Again not enough to assimilate one hundred, so I need to help a second time, removing the big fish or adding another mote ours can swallow.
5. My mote is so small that it can't eat any other mote. I can only remove all the competitors.
6. I added a few more tests for cases that came to my mind; here is the first one. If my mote is the biggest in town, there is no contest for the other ones.
7. Similar to CaseSample3.
8. What if the user inserts an ethereal mote with no weight? That should never happen, so I think it is right to assert on the mote value. It means that I am so unhappy with such an input that I terminate the program as soon as I spot it. You know how an assertion works: if it fails, it terminates the execution outputting a string. ASSERT_DEATH ensures the termination happens and tests the output against the passed regular expression. Here I am expecting "Assertion (whatever) failed."; depending on the compiler you are using, you could get a different message.

There is a tiny nuisance here. On many platforms googletest (at least up to version 1.6) issues a warning when calling ASSERT_DEATH, saying that it can't detect the current number of threads. You can easily get rid of this message, at least on Linux, following a suggestion you can find on stackoverflow.
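One commonly suggested remedy is switching googletest to the "threadsafe" death-test style, which re-executes the binary for each death test instead of forking a possibly multi-threaded process. A sketch of what that could look like in a custom main() (assuming you run your own main rather than linking gtest_main; the same effect is available via the --gtest_death_test_style=threadsafe command-line flag):

```cpp
#include <gtest/gtest.h>

int main(int argc, char** argv)
{
    ::testing::InitGoogleTest(&argc, argv);
    // Re-run death tests from a fresh process instead of forking the
    // (possibly multi-threaded) test process.
    ::testing::FLAGS_gtest_death_test_style = "threadsafe";
    return RUN_ALL_TESTS();
}
```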

And here is how I have implemented the function:
unsigned osmos(unsigned myMote, std::vector<unsigned>& motes)
{
  assert(myMote); // 1

  if(myMote == 1) // 2
    return motes.size();

  std::sort(motes.begin(), motes.end()); // 3

  if(myMote > motes.back()) // 4
    return 0;

  unsigned removing = motes.size(); // 5
  unsigned adding = 0; // 6
  for(unsigned i = 0; i < motes.size(); ++i) // 7
  {
    if(myMote <= motes[i]) // 8
    {
      removing = std::min<unsigned>(removing, adding + motes.size() - i); // 9
      while(myMote <= motes[i]) // 10
      {
        myMote += myMote - 1;
        ++adding;
      }
    }
    myMote += motes[i]; // 11
  }

  return std::min(removing, adding); // 12
}
1. It just can't happen that my mote weighs zero. If I detect that, something completely crazy is going on, I don't know what to do anymore, and I don't expect the caller, its caller, or any grand-caller to know either. I have no alternative but to terminate the program execution right here. If you see this as a bit of an overstatement, you would probably throw an exception instead.
2. My mote is so small, it can't ever eat anything. I have only one way of completing the job, removing all the motes from the passed collection. So I return the collection size.
3. Having the motes ordered by size, I can determine more efficiently which ones my mote can eat. It costs O(N log N) time, but it is worth it.
4. Check the right-side mote in the collection (after sorting, the biggest one) against my mote. If my mote is bigger, it would surely eat all of them in a whiff, with no need of any help from my side.
5. Worst case scenario: I would need to remove all the motes to let my mote win.
6. Best case scenario, I don't have to add anything.
7. Let's loop on all the motes.
8. My mote is not big enough to eat the current one. We need to check whether it is cheaper to add smaller motes or to remove it.
9. The number of motes I still have to process is the size of the collection minus the current position. In the worst case I should assume I have to remove all of them. Besides, I have to remember the motes I could have already added to get to this position. I am interested in the cheapest solution, so I compare the previous worst case with the current one and keep the smaller.
To make the choice I use the STL min() function template. Usually I don't need to specify the parameter type, because the compiler is smart enough to deduce it automatically. Not in this case, though, so I have to explicitly pass the type I want to use.
10. Calculate how many motes I have to feed to my mote to make it big enough.
11. In any case my mote eats the current mote, growing up.
12. Compare whether it is cheaper to remove or to add motes, and return the smaller count. Here the call to min() doesn't need any hint to understand that I want the unsigned version.

Full C++ source code on github.

Go to the full post

Bidimensional array annihilator

We have in input a bidimensional array of integers. We should modify it so that any row and column that has originally at least a single element set to zero would have all of them set to zero.

Conceptually it is an easy problem; its main issue is a technical one, since working with multidimensional arrays in C/C++ has never been exactly a piece of cake.

The idea is to store in two boolean vectors the status of each row and column, setting to true any that needs a reset.

Then use those flags to set to zero the elements that need it.

Firstly, let's see how to implement a solution that won't use any STL container.

The C harder way

All is done through C-arrays and raw pointers.

Here is the prototype of the function I want to define:
void cAnnihilator(int* data, const unsigned nRows, const unsigned nCols);
The problem does not specify the array size, so we need to keep it open in our call. For this reason I am forced to pass to the function just the pointer to the first element.

A couple of test cases:
TEST(cAnnihilator, CaseSimple) // 1
{
  const int nRows = 5;
  const int nCols = 3;
  int data[nRows][nCols] = {
    { 1, 2, 3 },
    { 4, 5, 6 },
    { 7, 8, 9 },
    { 10, 11, 0 },
    { 13, 14, 15 }
  };

  cAnnihilator(&data[0][0], nRows, nCols); // 2

  for(int i = 0; i < nRows; ++i)
    ASSERT_EQ(data[i][2], 0);
  for(int i = 0; i < nCols; ++i)
    ASSERT_EQ(data[3][i], 0);
}

TEST(cAnnihilator, CaseNothing) // 3
{
  const int nRows = 2;
  const int nCols = 3;
  int data[2][3] = {
    { 1, 2, 3 },
    { 4, 5, 6 }
  };

  cAnnihilator(&data[0][0], nRows, nCols);

  for(int i = 0; i < nRows; ++i)
    for(int j = 0; j < nCols; ++j)
    ASSERT_NE(data[i][j], 0);
}
1. The simple case creates a 5x3 array with a single zero element; after the call to the annihilator, all the elements in column 2 and row 3 are expected to be set to zero.
2. I pass to the annihilator the pointer to the first element in the array and its dimensions.
3. This "nothing" test case gives in input a 3x2 array with no zero element. No change is expected in output.

Here is how I implemented the function:
void cAnnihilator(int* data, const unsigned nRows, const unsigned nCols)
{
  bool rows[nRows]; // 1
  for(unsigned i = 0; i < nRows; ++i)
    rows[i] = false;
  bool cols[nCols];
  for(unsigned i = 0; i < nCols; ++i)
    cols[i] = false;

  for(unsigned i = 0; i < nRows; ++i)
    for(unsigned j = 0; j < nCols; ++j)
      if(*(data + (i * nCols) + j) == 0) // 2
        rows[i] = cols[j] = true;

  for(unsigned i = 0; i < nRows; ++i)
    for(unsigned j = 0; j < nCols; ++j)
      if(rows[i] || cols[j])
        *(data + (i * nCols) + j) = 0; // 3
}
1. The rows and cols boolean arrays are initialized to all-false; when I find a zero element I'll set the corresponding row and column flag to true. Note that, nRows and nCols being runtime values, these are variable-length arrays, a GCC extension rather than standard C++.
2. Convert the current element position in bidimensional array notation to its plain distance from the first array element.
3. Set to zero all the elements in rows and columns that require it.

The C++ way

There are a couple of improvements I'd like to make to the function parameters. Firstly, I'd like to give more work to the compiler, leaving to it the burden of deducing the array's size instead of requiring the client to pass the number of rows and columns. And secondly, I'd prefer to use an STL container instead of the raw C-array.

As a result of these considerations, I wrote this function declaration:
template<size_t N, size_t M>
void annihilator(std::array<std::array<int, N>, M>& data);
Now I am passing to the annihilator a reference to an STL array of arrays: an M-sized array of N-sized int arrays. N and M are template parameters deduced from the passed bidimensional array, so the user doesn't have to specify them in the call.

The test cases for this version of my function look like this:
TEST(Annihilator, CaseSimple)
{
  std::array<std::array<int, 3>, 5> data {{ // 1
    {{ 1, 2, 3 }},
    {{ 4, 5, 6 }},
    {{ 7, 8, 9 }},
    {{ 10, 11, 12 }},
    {{ 13, 14, 15 }}
  }};

  unsigned row = 3; // 2
  unsigned col = 2;
  data[row][col] = 0;

  annihilator(data); // 3

  for(unsigned i = 0; i < data.size(); ++i) // 4
    ASSERT_EQ(data[i][col], 0);
  for(unsigned i = 0; i < data[row].size(); ++i)
    ASSERT_EQ(data[row][i], 0);
}

TEST(Annihilator, CaseNothing)
{
  std::array<std::array<int, 3>, 2> data {{
    {{ 1, 2, 3 }},
    {{ 4, 5, 6 }}
  }};

  annihilator(data);

  for(unsigned i = 0; i < data.size(); ++i)
    for(unsigned j = 0; j < data[i].size(); ++j)
      ASSERT_NE(data[i][j], 0);
}
1. This bracket surplus is a bit annoying. It is not required by the C++11 standard but by the current GCC implementation. Your compiler could understand a single pair of brackets.
2. Set a value to zero.
3. The annihilator call is so much cleaner.
4. Ensure the annihilator works as expected.

And finally, here is my C++11 annihilator implementation:
template<size_t N, size_t M>
void annihilator(std::array<std::array<int, N>, M>& data)
{
  std::array<bool, N> cols {}; // 1
  std::array<bool, M> rows {};

  for(unsigned i = 0; i < M; ++i)
    for(unsigned j = 0; j < N; ++j)
      if(data[i][j] == 0) // 2
        rows[i] = cols[j] = true;

  for(unsigned i = 0; i < M; ++i)
    for(unsigned j = 0; j < N; ++j)
      if(rows[i] || cols[j])
        data[i][j] = 0;
}
1. The empty-braces initialization is much more expressive than a for-loop: all the elements in cols and rows are initialized to false.
2. Accessing STL-array elements as if they were C-array ones (and they actually are, after all).

The source code is on github.

Go to the full post

Parsing files with RapidJson

If you need to parse a JSON stored in a file from a C++ application, you could use the RapidJson header-only library.

In a file, named minimal.json, I have a JSON like this:
{
  "res":"OK",
  "tot":419,
  "nr":20,
  "ds":
  [
    {"a": "one/a","b": "one/b"},
    {"a": "two/a","b": "two/b"}
  ]
}
I want to do a few things with this JSON: check its status (stored in "res"), get the "nr" and "tot" integer fields, and access the "ds" array.

As a helper, I have created this class:
#include <rapidjson/document.h>

class RJDoc
{
private:
  rapidjson::Document doc_;

public:
  RJDoc(const std::string& filename);

  bool checkStatus();
  int getTotal();
  int getNumber();
  const rapidjson::Value& getDs();
};
As usual, I wrote a number of test cases to drive the code development. Here is one of them:
TEST(TestRJDoc, CaseMinimal)
{
  RJDoc reader("minimal.json"); // 1
  ASSERT_TRUE(reader.checkStatus()); // 2
  ASSERT_EQ(419, reader.getTotal()); // 3
  ASSERT_EQ(20, reader.getNumber());

  const rapidjson::Value& ds = reader.getDs(); // 4
  ASSERT_EQ(2, ds.Size()); // 5

  ASSERT_STREQ("one/a", ds[0U]["a"].GetString()); // 6
  ASSERT_STREQ("two/b", ds[1U]["b"].GetString());
  ASSERT_THROW(ds[3U]["a"].GetString(), std::logic_error); // 7
}
1. I want to be able to instantiate a reader passing the name of the associated file.
2. The status check should pass if the "res" field exists and it is set to "OK".
3. A couple of utility methods to get "tot" and "nr".
4. The access to the "ds" array should be granted by a function that returns it as a RapidJson Value object (a sort of variant).
5. Ensure that the array size is as expected.
6. An element of an object in a JSON array is retrieved using this syntax. Notice that we have to explicitly tell the compiler that the index is an unsigned value. And remember that RapidJson strings are actually plain raw C-strings.
7. I need to manage the case of a missing field. For instance, here I am trying to access a field of a non-existing element in the array. The default RapidJson behavior is to let an assertion fail. I want instead a standard exception to be thrown, so that I can catch it and perform some sort of fallback operation.

For point (7), we can see in rapidjson.h this piece of code:
///////////////////////////////////////////////////////////////////////////////
// RAPIDJSON_ASSERT

//! Assertion.
/*! By default, rapidjson uses C assert() for assertion.
 User can override it by defining RAPIDJSON_ASSERT(x) macro.
*/
#ifndef RAPIDJSON_ASSERT
#include <cassert>
#define RAPIDJSON_ASSERT(x) assert(x)
#endif // RAPIDJSON_ASSERT
So what I did was define the symbol RAPIDJSON_ASSERT before any RapidJson include in my code:
#include <stdexcept>
#define RAPIDJSON_ASSERT(x) if(!(x)) throw std::logic_error("rapidjson exception");
I have defined my class constructor in this way:
RJDoc(const std::string& filename)
{
  std::stringstream ss;
  std::ifstream ifs;
  ifs.open(filename.c_str(), std::ios::binary);
  ss << ifs.rdbuf(); // 1
  ifs.close();

  if(doc_.Parse<0>(ss.str().c_str()).HasParseError()) // 2
    throw std::invalid_argument("json parse error"); // 3
}
1. I read the file stream in a string stream.
2. Then I try to parse the string associated to the stream through the rapidjson::Document defined as private data member.
3. What if the file is missing, or the data is wrong or corrupted? I decided to throw a standard invalid_argument exception (and I wrote a few test cases to document this behavior).

The other methods are even simpler than the constructor:
bool RJDoc::checkStatus()
{
  rapidjson::Value& status = doc_["res"];
  if(!status.IsString())
    return false;

  return std::strcmp(status.GetString(), "OK") == 0; // 1
}

int RJDoc::getTotal()
{
  return doc_["tot"].GetInt(); // 2
}

int RJDoc::getNumber()
{
  return doc_["nr"].GetInt();
}

const rapidjson::Value& RJDoc::getDs()
{
  rapidjson::Value& value = doc_["ds"];
  if(!value.IsArray())
    throw std::logic_error("bad ds"); // 3

  return value;
}
1. Calling a RapidJson GetXXX() method on an element that is not present in the current document would, with my definition of RAPIDJSON_ASSERT, throw a std::logic_error exception. In this case I don't want that. This is the reason for carefully checking that "res" is a(n existing) string element before getting its value and comparing it against "OK".
2. No special check for "tot" or "nr", just throw a std::logic_error exception if they are missing.
3. I really want "ds" to be an array; otherwise there is no use in returning its unexpected value. So I throw just as if no "ds" were available at all.

Go to the full post