Pages

Alien Numbers - base conversion

The Google Code Jam Practice Mode Problem A is about number base conversions. To make it a bit more spicy, it asks to perform a double conversion, and the user should also specify the symbols used to identify each number, from both the starting and ending language.

Our standard decimal system would be represented by the string "0123456789", and the standard binary system by "01". Our converter should allow us to use less conventional system, as a ternary system that uses "oF8" as base symbols.

As usual, my first step has been writing a number of test cases that are meant to drive the development. Here is the prototype of the C++ function that should be written and one of the tests that its implementation should pass to be accepted:
std::string convert(const std::string& number, const std::string& source, const std::string& target);

TEST(TestAlnum, Case3)
{
    const std::string aNumber("13");
    const std::string lSource("0123456789abcdef");
    const std::string lTarget("01");
    const std::string expected("10011");

    std::string output = convert(aNumber, lSource, lTarget);
    EXPECT_EQ(expected, output);
}
We want to write a function, named convert, that accept in input a number in a source language, and convert it to another number (the return value) in a target language.

As example, the testcase shows that the hexadecimal 13 should be converted in the binary 10011 (that is, decimal 19).

The most natural implementation that came to my mind, asks the convert() function to perform two different actions, converting the input number to an integer, and then converting that integer to the expected output.

For sake of clarity, I have explicitly split the code among two functions. A cleanest version would put all the code in a class Converter, barring the access to the implementation methods, leaving in the public interface just the original convert() function. But here I keep just this bunch of free functions:
int getInternal(const std::string& number, const std::string& source);
std::string getTarget(int internal, const std::string& target);

std::string convert(const std::string& number, const std::string& source, const std::string& target)
{
  int internal = getInternal(number, source);
  return getTarget(internal, target);
}
Another improvement this code is loudly calling for, is about error handling and input checking. We can't assume the user would pass us correct values, as in the Code Jam is customary doing, and we should manage unexpected behaviors, giving a sensible feedback to the user.

I didn't spend time on that, but I couldn't avoid to modify my test cases. Now I want to check also the two internal functions:
TEST(TestAlnum, Case1)
{
    const std::string aNumber("9");
    const std::string lSource("0123456789");
    const std::string lTarget("oF8");
    const std::string expected("Foo");

    int internal = getInternal(aNumber, lSource);
    EXPECT_EQ(9, internal);

    std::string target = getTarget(internal, lTarget);
    EXPECT_EQ(expected, target);

    std::string output = convert(aNumber, lSource, lTarget);
    EXPECT_EQ(expected, output);
}
After all this, I finally started writing the function to convert an integer to a coded string. The idea is pretty simple. If we want to convert an integer in its string representation, we could repetitively divide it by the base we intend to use. Its module is the number is the cipher that we need to convert to the character that represents it. The only caveat is that we write numbers with the most significative elements on the right, and this algorithm produces elements from the less significative element upward.
std::string getTarget(int internal, const std::string& target)
{
  std::string result;
  do {
    result += target[internal % target.size()]; // 1
  } while(internal /= target.size()); // 2
  return std::string(result.rbegin(), result.rend()); // 3
}
1. Here the string result is built in the wrong order (less significative cipher is on the left), I'd adjust the order at the end of the function. I could have avoided it inserting any new character at the begin of the string, instead of appending it to the end, in this way:
result.insert(0, 1, X);
But here the most interesting part is 'X', the character we actually use to represent the current cipher. Let's say that we want to convert 19 in a decimal base, target should be a string sized 10, so internal modulo target size is 9. The ninth element in the target string is the character representing the cipher 9, so we add it to the result string.
2. We have already managed the least significative figure in the internal number, we can happily discard it, and go on with the next iteration. If internal is reduced to zero (or was already zero from the beginning) we stop looping.
3. I opted to fill the result string with less significative ciphers on the left, so now I have to reverse it.

Converting a string to its integer representation is a specular job. I think the most natural approach is scanning the string from right (less significative) to left. We should remember that our notation for numbers is positional, and each cipher has to be multiplied for the associated power of the used base.
int getInternal(const std::string& number, const std::string& source)
{
  int result = 0;
  int exp = 0;
  for(std::string::const_reverse_iterator it = number.rbegin(); it != number.rend(); ++it)
  {
    size_t value = source.find(*it); // 1
    if(value == std::string::npos) // 2
      return -1;
    result += value * std::pow(source.size(), exp++); // 3
  }

  return result;
}
1. The current figure from the input number is searched in the string with available symbol for the base. If we are looking for 9 (as character) in a canonical ten base, we should get its position (that should be 9, as number).
2. I couldn't help to write at least this minimal error handling check. If the cipher is not in the symbol list, I return an (unexpected) negative number. A more robust approach would have required throwing an exception.
3. As we move from left to right, the exponent is increased accordingly.

No comments:

Post a Comment