CodeEval Message Decoding

We are given in input an encoded message, and we have to return it decoded. See the CodeEval page for this problem for more details.

Only one sample is presented, and I didn't bother to convert it to a test case, I just ran my Python script passing the provided input and I had visual check on the result. The point is the problem is not difficult, it just takes some time to get understood properly. At least in my case.

However, if this is the input:
$#**\0100000101101100011100101000
We are expecting this output:
##*\$
Here is the logic behind the transformation.

The first characters represent the dictionary used in this message. We know that they are over when we read a zero or a one. In our case they are five:
$#**\
We convert them in a variable size string of zeros and ones, listing all the numbers from zero on, but skipping the ones represented by 1-only binary numbers:
$ -> 0
# -> 00
* -> 01
* -> 10
\ -> 000
If we had a sixth element, it would have been coded as "001".

Then we have a three-character block, that we have to read as a binary number, and give as the size of the following chunks in the message. Here is "010", meaning 2.
We go on reading elements of this size until when we get a terminator, represented by an all-one sequence:
00 00 10 11
We have three valid elements, and a terminator. We get the first part of our message:
# # *
Back to the size, this time is "011", meaning 3. So we get on reading three by three, until the terminator:
000 111
We already knew the only valid character sized 3 was '\', here we have one on it.

Read again the size, is "001". We read one character at the time.
0 1
There is just a single zero before we get 1, that is the sequence terminator. We add '$' to the decoded message and we read again the size, that is zero:
000
End of transmission. The message was "##*\$". Job done.

Writing a solution in Python, I used a generator to create the keys for the letters in the dictionary:
def key_generator():
    size = 1
    cur = 0
    while True:
        yield format(cur, '0{}b'.format(size))
        if cur == 2 ** size - 2:
            size += 1
            cur = 0
        else:
            cur += 1
I keep the generator status in two variables, size, that knows how long should be the generated key, and cur, that knows the binary value that should be pushed in that space.
The yield statement converts the cur value as expected, using both the built-in format() function and the format() string method. The first gets in the cur value and converts it to a string containing its binary representation. The second provides to the numbers of characters that the binary string should take.
Than we prepare to the next iteration. If increasing the cur we'd get a 1-only binary number, we skip it, resetting cur to zero, and increasing the size. Otherwise it is just a matter of increasing cur.

Now I just have to use the generator to map the provided characters to their coded representation:
mapping = {}
i = 0
for key in key_generator():  # 1
    if line[i] not in ['0', '1']:  # 2
        mapping[key] = line[i]
        i += 1
    else:
        break  # 3
1. I instantiate the generator and loop indefinitely on it.
2. If the current character is valid, I couple the key to it in the dictionary that I named mapping.
3. When I detect a '0' or a '1' I stop filling the dictionary.

I implemented the result generation with a boring double loop:
result = []
while True:
    size = int(line[i:i + 3], 2)  # 1
    if size == 0:
        break
    i += 3
    while True:
        chunk = line[i:i+size]  # 2
        i += size
        if chunk not in mapping:  # 3
            break
        result.append(mapping[chunk])  # 4
1. Convert a three character substring in an integer, considering it as binary. If it is zero, job done. Otherwise move the index in the line and go on reading.
2. Get a chunk of the specified size.
3. There is not such a key in the dictionary, meaning, we got a terminator. Interrupt the inner loop.
4. Otherwise convert the chunk in the actual character and put it in the result list.

We just have to collate the result in a string an return it to the caller.

I pushed the Python3 script to GitHub.

No comments:

Post a Comment