T9 spelling

The third part from the Code Jam Qualification Round Africa 2010 was pretty easy to solve. Basically, we have to write a translator that converts a string that could include only lowercase standard English characters and blanks, to a sequence of numbers (and blanks that act as special characters), following rules similar to the ones that we use when composing text messages on a basic mobile phone.

For instance, for an "f" we get three 3, and for "oo", we get "666 666". As you can see, the pause we have to do to enter a character based on the same coding number is represented by a blank.

As usual, I'm using C++ as implementing language, and here I report only the core of the solution, the definition for a function declared as here below. I let the development be driven by test cases, here I show just one of them:
std::string t9(const std::string& input)

TEST(TestT9, Case4)
{
    std::string input = "hello world";
    std::string expected = "4433555 555666096667775553";

    std::string output = t9(input);

    EXPECT_EQ(expected, output);
}
I couldn't think of any smart solution to this question, so I played safely, using a lookup table to convert each possible input character to an output sequence. The English alphabet has 26 letters, in the table I will put as 27th element the translation for blank, a zero:
const int NR = 26;
char* CONVERSION[NR + 1] = { "2", "22", "222", "3", "33", "333", 
    "4", "44", "444", "5", "55", "555", "6", "66", "666",
    "7", "77", "777", "7777", "8", "88", "888",
    "9", "99", "999", "9999", "0"
};
The array reads: An 'a' becomes "2"; 'b' "22"; ...; 'z' "9999"; and ' ' "0".

Now I should write a piece of code to loop on all the letter in the input string (no error handling required, still would be better to spend at least a few minutes thinking to a more robust solution), convert each letter in an index for the conversion table (blanks require a bit of caution), put the resulting sequence in the output string. And remember to put the blank separator where required:
std::string t9(const std::string& input)
{
    std::string output;
    std::for_each(input.begin(), input.end(), [&output](char letter) // 1
    {
        int index = letter == ' ' ? NR : letter - 'a'; // 2

        if(!output.empty() && *output.rbegin() == CONVERSION[index][0]) // 3
            output += ' ';
        
        output += CONVERSION[index];
    });

    return output;
}
1. To implement the loop, I combined the standard algorithm for_each() with a lambda function (how could I live without them). In the lambda capture specification I put the string output, so that it can be seen inside the function body. The parameter passed to lambda is the letter currently selected by the for_each() algorithm.
2. Convert the input letter to an index for the lookup table. "Normal" characters, in the range a-z, require a subtraction by 'a', so that for 'a' I have the index 0, up to z that has index 25. What for blank? Twenty six. Or, as I put it, the last element in the array. But what happens if an unexpected character is in the input string? A catastrophe. A real piece of code should avoid it (throwing an exception, maybe).
3. Before appending the selected sequence to the output string, we check if we need a blank, to signal that we are entering a new sequence using the same number code as the previous one. To do that, we check the last character we have already appended to the output. This is a two-step condition. Firstly, we check that the string is not empty, otherwise there is nothing to check. Secondly, we get the last character in the output string, and we check if it is the same of a character in our current sequence. It could be worthy to spend a few extra words on the way I used to extract the last character from the output string:
*output.rbegin()
In plain C, this simple job is a pain. The trouble is that good old C-string is just a pointer to the beginning of the memory block occupied, and to get its last character we are forced to get its length first. The STL C++ version of a string is much more smarter that this. It knows where it begins, and ends. To get the first character of a string, assuming it is not empty, we could simply dereferencing the iterator to its begin, something like *output.begin(). But here we want the last character. One way to get it, is dereferencing the iterator to the reverse begin of the string.

As Arthur pointed out (see comment below), output.back() is more compact and expressive than *output.rbegin(), and leads to the same result. Why don't use it instead? No reason at all, if you use a recent enough C++ compiler. The element access function back() has been added to the STL string class interface in C++11, like front(), and a few other useful methods. Be at least prepared to see it in legacy code.

5 comments:

  1. Fun little problem, always interesting to see the different approaches to CS problems. I approached it as a contest, a solution ASAP — e.g. I normally don't use using namespace std — I posted my solution here (verified as correct by the problem page):

    https://gist.github.com/zenmumbler/5579626

    Also, instead of the *output.rbegin() expression you could use output.back() to access the last char. Is likely easier to understand than a dereference of a reverse iterator.

    http://en.cppreference.com/w/cpp/string/basic_string/back

    ReplyDelete
    Replies
    1. You are right, Arthur, it is interesting to see how different people approach to the same problem. Your solution deserve some more than a few words here, I think I'll spend some time delving into it.

      About using back() instead of dereferencing the reverse iterator. It's just that I am so used to this pre-C++11 trick that I find it more natural. But it is just me, better to use back(), I agree.

      Delete
    2. Speaking about reverse iterators, I just made an implementation for problem B, reversing words and I use them there. Problem B's core functionality can be essentially implemented using 2 algorithms, it's a simple one.

      https://gist.github.com/zenmumbler/5584573

      Delete
  2. Am I _evil_ for spotting this? We can do the digit mapping without any lookup table using some bit fiddling:

    '2' + std::min(7, (i - (i >> 4)) / 3)

    (having i == ch - 'a'). So, here's a much "improved" version featuring template wankery, robustness, and genericity: we're doing C++, so why don't we support streaming mode while we're at it?

    http://ideone.com/O5sLIS#li_O5sLIS

    Representative line:

    t9_transform(std::istreambuf_iterator(std::cin), std::istreambuf_iterator(), output);

    :)

    ReplyDelete
    Replies
    1. On the contrary, any suggestion is welcomed! I was not especially fond of using a lookup table, but the good of it is that is easy to understand what it is meant for. To tell you the truth, your bit fiddling looks to me a bit harder to understand ;-)

      I haven't used streams, and all the others goodies you put in your solution, because I tried to keep my code as simple as possible. But I do think that your code is conceptually more interesting. I'll have a closer look at it soon.

      Delete