Alien language

Stripping out its colorful description, the Code Jam Qualification Round 2009 Problem A, better known as Alien language, is merely a matter of applying a pattern matching function. Given a vocabulary and a pattern list, we should say how many words match to each pattern.

The major issue is that I choose to implement the solution in C++, where the regular expressions had a complicated history, and only now, with C++11, we have a standardized regex library.

But this leads to another complication. The regex implementation by GCC is still incomplete, even on the most recent version (4.8.1, when I am writing this). So I fell back to the alway reliable Boost.

My idea is creating a class, that stores the vocabulary and makes available to the user a method to check a pattern on it:
#include <vector>
#include <string>

class LangCheck
{
private:
    std::vector<std::string> vocabulary_;

public:
    LangCheck(const std::vector<std::string>& vocabulary) : vocabulary_(vocabulary) {}

    int matches(std::string pattern);
};
Yeah, right. But why passing the pattern to the matches() function by value and not by reference?

The reason is that the pattern is produced in a slightly "wrong" way, so we have to adjust it before we can use it.

Here is the test case, as derived from the input provided by the original problem:
#include <gtest/gtest.h>

TEST(AlienLang, Test)
{
    LangCheck lc({ "abc", "bca", "dac", "dbc", "cba" }); // 1

    EXPECT_EQ(2, lc.matches("(ab)(bc)(ca)")); // 2
    EXPECT_EQ(1, lc.matches("abc"));
    EXPECT_EQ(3, lc.matches("(abc)(abc)(abc)"));
    EXPECT_EQ(0, lc.matches("(zyx)bc"));
}
1. I create a temporary vector of string object on the fly, using the new C++11 initializer list, that would be automagically moved to the LangCheck data member.
2. I call the matches() methods for all the passed cases. As you see, instead of squared brackets, round ones are used. We need to modify the input to adjust it to the standard regex format.

If you plan to use boost::regex too, remember that this library is not a "pure header" one, you need to link the opportune archive, or a shareable object, to your executable. Initially I forgot to do it, and I got a number of compiling errors like:
undefined reference to boost::re_detail::perl_matcher
undefined reference to boost::basic_regex
If you have installed boost on your linux box via apt-get, you should find them in the /usr/lib folder. Once I linked boost_regex, I could happily succeed compiling my tester.

Here is how I implemented the matches() function:
#include <algorithm>
#include <boost/regex.hpp>
// ...

int LangCheck::matches(std::string pattern) // 1
{
    std::replace(pattern.begin(), pattern.end(), '(', '['); // 2
    std::replace(pattern.begin(), pattern.end(), ')', ']');

    boost::regex re(pattern); // 3

    int matching = 0;
    for(std::string word : vocabulary_)
        if(boost::regex_match(word, re)) // 4
            ++matching;

    return matching;
}
1. For what I said above, the input string is passed by value.
2. Using the standard algorithm replace() function, that works on any STL container supporting forward iterator, I adjust the brackets in the pattern.
3. Create a boost regex object from the modified pattern.
4. Let's count how many words in the current vocabulary matches the pattern. Any success would increment the counter, that in the end would be returned.

No comments:

Post a Comment