Pages

Alien Numbers - base conversion

The Google Code Jam Practice Mode Problem A is about number base conversions. To make it a bit more spicy, it asks to perform a double conversion, and the user should also specify the symbols used to identify each number, from both the starting and ending language.

Our standard decimal system would be represented by the string "0123456789", and the standard binary system by "01". Our converter should allow us to use less conventional system, as a ternary system that uses "oF8" as base symbols.

As usual, my first step has been writing a number of test cases that are meant to drive the development. Here is the prototype of the C++ function that should be written and one of the tests that its implementation should pass to be accepted:
std::string convert(const std::string& number, const std::string& source, const std::string& target);

TEST(TestAlnum, Case3)
{
    const std::string aNumber("13");
    const std::string lSource("0123456789abcdef");
    const std::string lTarget("01");
    const std::string expected("10011");

    std::string output = convert(aNumber, lSource, lTarget);
    EXPECT_EQ(expected, output);
}
We want to write a function, named convert, that accept in input a number in a source language, and convert it to another number (the return value) in a target language.

As example, the testcase shows that the hexadecimal 13 should be converted in the binary 10011 (that is, decimal 19).

The most natural implementation that came to my mind, asks the convert() function to perform two different actions, converting the input number to an integer, and then converting that integer to the expected output.

For sake of clarity, I have explicitly split the code among two functions. A cleanest version would put all the code in a class Converter, barring the access to the implementation methods, leaving in the public interface just the original convert() function. But here I keep just this bunch of free functions:
int getInternal(const std::string& number, const std::string& source);
std::string getTarget(int internal, const std::string& target);

std::string convert(const std::string& number, const std::string& source, const std::string& target)
{
  int internal = getInternal(number, source);
  return getTarget(internal, target);
}
Another improvement this code is loudly calling for, is about error handling and input checking. We can't assume the user would pass us correct values, as in the Code Jam is customary doing, and we should manage unexpected behaviors, giving a sensible feedback to the user.

I didn't spend time on that, but I couldn't avoid to modify my test cases. Now I want to check also the two internal functions:
TEST(TestAlnum, Case1)
{
    const std::string aNumber("9");
    const std::string lSource("0123456789");
    const std::string lTarget("oF8");
    const std::string expected("Foo");

    int internal = getInternal(aNumber, lSource);
    EXPECT_EQ(9, internal);

    std::string target = getTarget(internal, lTarget);
    EXPECT_EQ(expected, target);

    std::string output = convert(aNumber, lSource, lTarget);
    EXPECT_EQ(expected, output);
}
After all this, I finally started writing the function to convert an integer to a coded string. The idea is pretty simple. If we want to convert an integer in its string representation, we could repetitively divide it by the base we intend to use. Its module is the number is the cipher that we need to convert to the character that represents it. The only caveat is that we write numbers with the most significative elements on the right, and this algorithm produces elements from the less significative element upward.
std::string getTarget(int internal, const std::string& target)
{
  std::string result;
  do {
    result += target[internal % target.size()]; // 1
  } while(internal /= target.size()); // 2
  return std::string(result.rbegin(), result.rend()); // 3
}
1. Here the string result is built in the wrong order (less significative cipher is on the left), I'd adjust the order at the end of the function. I could have avoided it inserting any new character at the begin of the string, instead of appending it to the end, in this way:
result.insert(0, 1, X);
But here the most interesting part is 'X', the character we actually use to represent the current cipher. Let's say that we want to convert 19 in a decimal base, target should be a string sized 10, so internal modulo target size is 9. The ninth element in the target string is the character representing the cipher 9, so we add it to the result string.
2. We have already managed the least significative figure in the internal number, we can happily discard it, and go on with the next iteration. If internal is reduced to zero (or was already zero from the beginning) we stop looping.
3. I opted to fill the result string with less significative ciphers on the left, so now I have to reverse it.

Converting a string to its integer representation is a specular job. I think the most natural approach is scanning the string from right (less significative) to left. We should remember that our notation for numbers is positional, and each cipher has to be multiplied for the associated power of the used base.
int getInternal(const std::string& number, const std::string& source)
{
  int result = 0;
  int exp = 0;
  for(std::string::const_reverse_iterator it = number.rbegin(); it != number.rend(); ++it)
  {
    size_t value = source.find(*it); // 1
    if(value == std::string::npos) // 2
      return -1;
    result += value * std::pow(source.size(), exp++); // 3
  }

  return result;
}
1. The current figure from the input number is searched in the string with available symbol for the base. If we are looking for 9 (as character) in a canonical ten base, we should get its position (that should be 9, as number).
2. I couldn't help to write at least this minimal error handling check. If the cipher is not in the symbol list, I return an (unexpected) negative number. A more robust approach would have required throwing an exception.
3. As we move from left to right, the exponent is increased accordingly.

Go to the full post

Alternative phone number conversions

The problem was converting lowercase letters (and blanks) to numbers, following the rules used by the phone keypad. In a previous post I gave a possible C++ solution, and I got as feedback a couple of alternative solutions. It is interesting to compare them, to see different approaches taken by different programmers.

Arthur's solution is on github. In a few words, he loops on the input string, calling for each character a support function named encodeChar() that returns a pair where first is the resulting number, and second the number of needed repetition.

The encodeChar() function delegates to the compiler the task of generating a string that maps the valid character to the relative expected position, filled with underscores for invalid positions. Then find() is used to get requested character and its multiplier.

Interesting features of this solution:

- C++11 range-based for loop. Code is more compact and readable.
- Use of STL data structure and algorithms.
- The lookup table is ingeniously mapped to a single string.

Some more chatting could be done on:

- C++11 use of the auto keyword. Using it when it is not a necessity (or a nice shortcut that improve readability) just makes code less clear. For instance, the C++11 std::stol() always return a long int, there is no advantage in assigning its returned value to an auto variable.
- Mapping the lookup table to a single string full implies that we should use find() to get converted character and its multiplier. We should be aware that find() has O(n) time complexity, against O(1) of a lookup table.

I couldn't help to rewrite the function main loop to get rid of the temp variable used to keep track of the last inserted key. Admittedly, it is just a minimal improvement, still, I nerdly fill better to see the code changed in this way:
for(char c : input)
{
  Token token = encodeChar(c);
  if(!output.empty() && token.first == output.back())
    output += ' ';
  output += std::string(token.second, token.first);
}
Anonymous put his solution on ideone, compiled on C++11 (gcc-4.7.2). I guess he is a fan of generic programming, and his job is a cool example of this technique. The main issue here is that there is no reason, beside the fun of writing it, to come out with such a complex piece of code to solve this problem. So it sorts make sense that he named his lambda function that determines the associated key for the current letter "magic", because it takes a while to understand it, and at first sight it looks really as it works for a kind of magic.

It is difficult to summarize the good stuff we could see in that piece of code, nice use of templates, decltype, auto declarations, and a static assertion to check at compile time the generic function actual correctness. I think it could be used as an introductory example to generic programming.

There is only a (minor) issue in the code, it does not manage consecutive blanks. In the t9_transform() main loop, there is a "else" that covers the blank case (assuming nothing else than lower case letters and blanks are in the input string). There we should add a check on the previously checked character, as we do in the "if".

More interestingly, I spent some time understanding how "magic" actually works. To show how it works, I think it could be useful to see the original one-liner lambda converted to an equivalent "normal" longish function. For sake of simplicity, I sacrificed the genericity of magic(), writing a char2rep() that works only for plain chars:
auto magic = [](decltype(*f) i) { return '2' + std::min(7, (i - (i >> 4)) / 3); };

char char2rep(char input)
{
  if(input == 'z')
    return '9';

  int shift = input > 'r' ? 1 : 0;
  int code = input - 'a';
  return '2' + (code - shift) / 3;
}
To understand how magic() works, we have to think about what it wants to achieve. Any number in the keypad, starting from 2, is the result of the conversion of three consecutive letters. If the number of letters was a precise multiple of three, the function would have been trivial. But there are two keys, 7 and 9, that should map four, and not three letters. So we have to plan for a couple of adjustment.
If the input letter is bigger that 'r', we need to shift one position, and the last character, AKA 'z', should follow a special rule. For magic() the trick is done maximizing the shift to 7, alternatively, we could just say that 'z' is converted to '9'.

So, after all it was no magic, just an obscure piece of code ;)

Go to the full post

Prime number checker

Writing a general algorithm to check if a (positive integer) number is a prime is a too complicated matter to be discussed here (and to be asked as a interview question for a programmer position). Have a look at the Manindra Agrawal, Neeraj Kayal, Nitin Saxena document where they "present an unconditional deterministic polynomial-time algorithm that determines whether an input number is prime or composite", friendly known as AKS algorithm, to see what I mean.

Still, if our requirements are not very strict, and we are happy enough with a function that has reasonable performance only for small numbers, it is a different matter.

A couple of thousands years ago, a very smart guy named Eratosthenes of Cyrene had prime numbers among his many interests. He devised the algorithm known as the sieve of Eratosthenes that could still make sense to use.

The algorithm is designed to return all the prime numbers in an interval, so using it to check just a single number is a kind of an overkill. About the same of what happens for similar algorithm to calculate factorial or Fibonacci numbers.

I want to write a function following this declaration and passing the below stated test case:
bool isPrime(unsigned long long value); // 1

TEST(TestPrime, Eratosthenes)
{
    EXPECT_FALSE(isPrime(42));
    EXPECT_TRUE(isPrime(977));
    EXPECT_TRUE(isPrime(4987));
    EXPECT_FALSE(isPrime(999999997));
    EXPECT_TRUE(isPrime(32416190071));
    EXPECT_TRUE(isPrime(99494896918097));
    EXPECT_TRUE(isPrime(997080383502859));
    EXPECT_TRUE(isPrime(2305843009213693951)); // 2
}
1. I'd like my function to be able to manage very big numbers. It should accept a not negative huge integer in input and answer with a boolean.
2. This 19 digits integer is known as Mersenne M61, or Pervushin's number, we know it is a prime number since 1883, and I could consider myself happy if my function could check it in a reasonable time.

Here is how I implemented the solution:
bool isPrime(unsigned long long value)
{
    if(value < 2) // 1
        return false;
    if(value < 4)
        return true;
    if(value % 2 == 0)
        return false;

    long max = static_cast<long>(std::ceil(std::sqrt(static_cast(value)))); // 2

    for(long i = 3; i <= max; i += 2) // 3
        if(value % i == 0) // 4
            return false;

    return true;
}
1. Get rid of trivial cases. Zero and one are not considered prime numbers; two and three are OK; all the even number bigger than two are not prime.
2. Optimization. If we want to check if 101 is a prime number, we don't have to check all the previous number till 100, we can stop checking an awful long before. It is enough to check till the next integer after its square root (and if you think about it, you should understand why).
3. Let's loop. We need to check for all the possible odd divisors starting from three. Actually, we don't really need to check a lot of them, like nine (being a multiple of three), but there is no cheap way of keeping track of which one has to be checked and which not. It is easier doing in this way.
4. Divide our input number for the current candidate. It could be the end of our job, otherwise try the next one.

Go to the full post

T9 spelling

The third part from the Code Jam Qualification Round Africa 2010 was pretty easy to solve. Basically, we have to write a translator that converts a string that could include only lowercase standard English characters and blanks, to a sequence of numbers (and blanks that act as special characters), following rules similar to the ones that we use when composing text messages on a basic mobile phone.

For instance, for an "f" we get three 3, and for "oo", we get "666 666". As you can see, the pause we have to do to enter a character based on the same coding number is represented by a blank.

As usual, I'm using C++ as implementing language, and here I report only the core of the solution, the definition for a function declared as here below. I let the development be driven by test cases, here I show just one of them:
std::string t9(const std::string& input)

TEST(TestT9, Case4)
{
    std::string input = "hello world";
    std::string expected = "4433555 555666096667775553";

    std::string output = t9(input);

    EXPECT_EQ(expected, output);
}
I couldn't think of any smart solution to this question, so I played safely, using a lookup table to convert each possible input character to an output sequence. The English alphabet has 26 letters, in the table I will put as 27th element the translation for blank, a zero:
const int NR = 26;
char* CONVERSION[NR + 1] = { "2", "22", "222", "3", "33", "333", 
    "4", "44", "444", "5", "55", "555", "6", "66", "666",
    "7", "77", "777", "7777", "8", "88", "888",
    "9", "99", "999", "9999", "0"
};
The array reads: An 'a' becomes "2"; 'b' "22"; ...; 'z' "9999"; and ' ' "0".

Now I should write a piece of code to loop on all the letter in the input string (no error handling required, still would be better to spend at least a few minutes thinking to a more robust solution), convert each letter in an index for the conversion table (blanks require a bit of caution), put the resulting sequence in the output string. And remember to put the blank separator where required:
std::string t9(const std::string& input)
{
    std::string output;
    std::for_each(input.begin(), input.end(), [&output](char letter) // 1
    {
        int index = letter == ' ' ? NR : letter - 'a'; // 2

        if(!output.empty() && *output.rbegin() == CONVERSION[index][0]) // 3
            output += ' ';
        
        output += CONVERSION[index];
    });

    return output;
}
1. To implement the loop, I combined the standard algorithm for_each() with a lambda function (how could I live without them). In the lambda capture specification I put the string output, so that it can be seen inside the function body. The parameter passed to lambda is the letter currently selected by the for_each() algorithm.
2. Convert the input letter to an index for the lookup table. "Normal" characters, in the range a-z, require a subtraction by 'a', so that for 'a' I have the index 0, up to z that has index 25. What for blank? Twenty six. Or, as I put it, the last element in the array. But what happens if an unexpected character is in the input string? A catastrophe. A real piece of code should avoid it (throwing an exception, maybe).
3. Before appending the selected sequence to the output string, we check if we need a blank, to signal that we are entering a new sequence using the same number code as the previous one. To do that, we check the last character we have already appended to the output. This is a two-step condition. Firstly, we check that the string is not empty, otherwise there is nothing to check. Secondly, we get the last character in the output string, and we check if it is the same of a character in our current sequence. It could be worthy to spend a few extra words on the way I used to extract the last character from the output string:
*output.rbegin()
In plain C, this simple job is a pain. The trouble is that good old C-string is just a pointer to the beginning of the memory block occupied, and to get its last character we are forced to get its length first. The STL C++ version of a string is much more smarter that this. It knows where it begins, and ends. To get the first character of a string, assuming it is not empty, we could simply dereferencing the iterator to its begin, something like *output.begin(). But here we want the last character. One way to get it, is dereferencing the iterator to the reverse begin of the string.

As Arthur pointed out (see comment below), output.back() is more compact and expressive than *output.rbegin(), and leads to the same result. Why don't use it instead? No reason at all, if you use a recent enough C++ compiler. The element access function back() has been added to the STL string class interface in C++11, like front(), and a few other useful methods. Be at least prepared to see it in legacy code.

Go to the full post

Reverse words

The second problem proposed in the Google Code Jam Qualification Round Africa 2010, asks to revert the words in a number of strings given in input. It is a simpler request than the first one in the round, but still there are a couple of points worth of spending some thinking.

The core of the question is defining a function like this (if you solve it in C++):
std::string revert(const std::string& input)
I have written a few test cases to drive the development, like this one:
TEST(TestRevert, Case1)
{
    std::string input = "this is a test";
    std::string reverted = "test a is this";

    std::string output = revert(input);

    EXPECT_EQ(output, reverted);
}
Then I spent some time asking myself how to actually implement the function. My first idea has been to use an input string stream to extract the words from the string, put them in a container, and then, via an output string stream, generate the output string.

It is not the best solution that one could came out with, in term of efficiency, but it has the advantage of being clean, easy to write, and event kind of fun.

Let's see it:
std::string revert(const std::string& input)
{
    std::vector<std::string> words;
    std::istringstream iss(input);
    std::string item;
    while(iss >> item)
        words.push_back(item); // 1

    if(words.empty())
        return ""; // 2

    std::ostringstream oss; // 3
    std::for_each(words.rbegin(), words.rend() - 1, [&oss](const std::string& cur) // 4
    {
        oss << cur << ' ';
    });
    oss << words[0]; // 5

    return oss.str(); // 6
}
1. I loop on the input string stream until there is something in it. By default, all blank characters are discarded by the put-to operator, so I just have to push back any word I get in the buffer container.
2. What if nothing was in the input string? I simply return an empty one.
3. Otherwise, I want to loop backward on all the collected words, putting in the output stream any single word followed by a blank. With the noticeable exception of the last one, that has no following blank.
In C++ you could do that using the for_each() STL algorithm, delimiting the range with the reverse iterators rbegin() - pointing to the last element in the collection - and the decremented rend() - pointing to the first one (remember that the "end" element in a iteration is the first out of the selection).
4. As functional to be executed on each element in the range, I defined a lambda function (I love them). If your compiler do not support them, I am sorry to say you have to rearrange the code, probably using a functor, or a plain for loop. In the capture clause I specified the above defined output string stream, passed by reference, so that we can use it inside the lambda body.
5. The first word in the collection is the last element sent to the output stream.
6. Finally, I extract the string from the stream, and I return it.

I like this code, still it has a couple of issues. Most importantly, it could be too expensive. If we have to process lots and lots of input lines, creating and using all those string streams could be easily become a pain.

This second version is stream-free, as this means faster. You could even get rid of STL strings, but you will pay this extra performance gain with an extra code readability loss. For our requirements, I assumed this is enough:
std::string revert(const std::string& input)
{
    std::vector<std::string> words;

    size_t beg = 0; // 1
    size_t end = 0;
    do {
        end = input.find(' ', beg); // 2
        std::string word(input.begin() + beg, (end == std::string::npos ? input.end() : input.begin() + end)); // 3
        words.push_back(word);
        beg = end + 1; // 4
    } while(end != std::string::npos); // 5

    std::string output;
    output.reserve(input.length()); // 6
    std::for_each(words.rbegin(), words.rend() - 1, [&output](const std::string& cur) // 7
    {
        output += cur;
        output += ' ';
    });
    output += words[0];

    return output;
}
1. I use a couple of variables to keep track of the beginning and end of a single world in the input.
2. Search the end of the word starting on beg (first iteration, beg is zero). If there is no blank after beg, find() return string::npos to signal it.
3. Let's build a word, starting from begin() + beg, and ending to end(), if no blank is found, or to the actual blank position.
4. The new beg value is the next one after the current end, meaning, the first character after the last used blank. Actually, it doesn't much sense to assign to beg string::npos + 1, how it happens in case no blank has been found, but it is harmless (see next point for the reason).
5. We loop until end is string::npos, that means, we have already scanned the complete input line.
6. Let's generate the output string. I know its size (same as input), so I (possibly) save some time specifying it.
7. The output construction loop is almost the same as in the previous version. What changes is that I work on the actual STL string, and not on the string stream.

Go to the full post

TopCoder Username

Another common source of interview questions is TopCoder, a site that hosts a huge variety of developer competitions. They have a standard format for tests, presented in their How to dissect a TopCoder problem statement page.

As example, they provides a few questions they have used in the past. First of them is the UserName problem. In a handful of words, we have to write a function that ensures the chosen username is available or, if not, propose a viable alternative.

I simplified a bit the required interface, tailoring it for C++. Now, my requirements are that we should write a function respecting this prototype:
std::string check(const std::string& input, const std::vector<std::string>& existing);
Where input is the username submitted by the user and existing is the list of already taken usernames. The function has to return a valid username, preferably the one chosen by the user. If the requested username is already taken, we will return it with a concatenated number - the smallest, the better.

As usual, before start coding, I have written a few testcases. Here is just one of them:
TEST(Username, Case0)
{
  std::string input("MyUsername");
  std::vector<std::string> existing = { "Taken", "Someone", "Hello", "Nickname", "User" }; // 1 

  ASSERT_EQ(input, check(input, existing)); // 2
}
1. I have written this code on a recent GCC compiler, so I took advantage of C++11 initialization for STL vectors.
2. The passed username is not among the list of already existing ones, so I expect to get it back as result of function call.

And here is my solution:
std::string check(const std::string& input, const std::vector<std::string>& existing)
{
  std::set<std::string> orderedNames(existing.begin(), existing.end()); // 1
  if(orderedNames.find(input) == orderedNames.end()) // 2
    return input;

  for(int i = 1; ; ++i) // 3
  {
    std::string candidate = input + boost::lexical_cast<std::string>(i); // 4
    if(orderedNames.find(candidate) == orderedNames.end())
      return candidate;
  }

  // we should never reach here
  throw new std::exception();
}
1. Seeing the requirements, the expected code complexity is not an issue, still I reckoned it was wise to be prepared for a huge vector of existing usernames. In that case, it is worthy to create a set so that we have them ordered, and looking for an element has a cheap O(log n) cost.
2. If the input is not among the existing items, return it.
3. We check until we find a good alternative candidate - so, I used an open ended for loop.
4. Username candidates are build concatenating the user input with a integer, starting from 1. Among the many possible ways of converting an int to a string, I've chosen the boost lexical cast.

Go to the full post