CodeEval Clean up the words

Given a string containing 'dirty' characters, return its 'clean' version. See the CodeEval problem #205 for a full description. Basically, we want to return only lowercase alphabetic characters and a single blank to replace any number of non-alpha characters in between. I am working on this problem using Python 3 as implementation language.

I have converted the given examples in Python test cases, just adding a garbage-only line:
def test_provided_1(self):
    self.assertEqual('hello world', solution('(--9Hello----World...--)'))

def test_provided_2(self):
    self.assertEqual('can you', solution('Can 0$9 ---you~'))

def test_provided_3(self):
    self.assertEqual('what are you doing', solution('13What213are;11you-123+138doing7'))

def test_garbage(self):
    self.assertEqual('', solution('7/8§§?'))
Given my background, I came up firstly with a solution that looks like a code porting from an existing C-language function:
def solution(line):
    result = []
    alpha = False

    for c in line: # 1
        if c.isalpha():
            alpha = True
            result.append(c.lower())
        elif alpha: # 2
            result.append(' ')
            alpha = False

    if result:
        result.pop() # 3

    return ''.join(result) # 4
1. Loop on all the characters in the string. If the current one is an alpha, push its lowered representation to the result list, and raise the flag to signal that we have detected an alpha.
2. If the current character is not an alpha, and we have just pushed an alpha to the result, it is time to push a blank and reset the flag.
3. Actually, CodeEval is not that strict to require this. However, we can avoid to leave a blank at the end of the string popping it. But only if the list is not empty!
4. Finally, convert the list to a string by joining it elements on an empty string.

Still does not come to me naturally, eventually I came up with a more pythonic solution.
import re


def solution(line):
    return re.sub('[^a-zA-Z]+', ' ', line).lower().strip()
Remove the 'dirty' characters by regular expression substitutions, asking to select all sequences composed by one or more non-alpha characters, and pushing a single blank for each of them. Then make it lowercase, and strip all leading and trailing blanks.

Very elegant indeed. Notice that here I'm saying to Python what I want it to do for me, and not how to do it. This way of thinking leads to code that is much more readable and compact.

CodeEval looks happy about both solutions, so I pushed test cases and my python functions to GitHub.

No comments:

Post a Comment