Pages

CodeEval Following Integer

Given a number, we should return its follower, accordingly to a weird rule explained on the CodeEval page of this problem. In a few words, we can only use the digits in the number itself, but we can add a zero if we need to.

The provided samples, that I converted to Python tests, give a better idea of our task:
def test_provided_1(self):
    self.assertEqual(151, solution('115'))

def test_provided_2(self):
    self.assertEqual(2048, solution('842'))

def test_provided_3(self):
    self.assertEqual(80000, solution('8000'))
115 is followed by 151. All the other permutations of the three digits are higher than that.
There is no permutation of 842 that is bigger than it. So we need to add a zero, getting 2048.
Same goes for 8000, we have to add a zero getting 80000.

I felt it was too complicated to check the number and trying to build its follower considering its digits, and I went instead for an almost brute force approach that comes naturally from watching the test case. Check the permutations, and see which one is the next one. If there is no next one, add a zero to the number in second position and return it.

I implemented this algorithm in Python3 adding just a tiny variation. First I generate the number with an added zero and then I check all the permutations. The reason for this inversion should be clear following the code.
digits = []
for c in line:  # 1
    if c != '0':
        insort(digits, c)  # 2
while len(digits) <= len(line):  # 3
    digits.insert(1, '0')
result = int(''.join(digits))
1. Loop on the digits of the passed number, stored in the string named line.
2. I store all the digits but the zeros, sorted in natural order, in the buffer list named digits. The insort function comes from the bisect package.
3. Then I push in the buffer all the required zeros after the first, lowest, significative digit. In this way I get the smaller possible number having one cipher more than the passed one.

Now I am ready to loop on all the permutations for the passed number:
current = int(line)
for item in permutations(line):
    candidate = int(''.join(item))  # 1
    if result > candidate > current:  # 2
        result = candidate
1. I get a permutation, I convert it to an integer. See the itertools library for details.
2. If the current candidate is less than the tentative solution and it is bigger that the number passed in input, I discard the previous calculate result and keep the current one.

At the end of the loop I have in result my solution.

I feared that this intuitive algorithm was too expensive. Fortunately, I was wrong. It has been accepted with good points. Test case and python3 script are on GitHub.

Go to the full post

CodeEval Self Describing Numbers

Tell if the number passed in is self-describing, returning 1 in case of success. That number, also known as self-descriptive number, is such that it has in each digit, starting from the leftmost, the number of that digit in the number itself. If this definition looks confusing to you, you are not alone. Have a look to the CodeEval page for this problem, or to the tests here below.

Based on the provided samples, I wrote these tests:
def test_provided_1(self):
    self.assertEqual(1, solution('2020'))  # 1

def test_provided_2(self):
    self.assertEqual(0, solution('22'))  # 2

def test_provided_3(self):
    self.assertEqual(1, solution('1210'))  # 3
1. '2020' has two 0's, no 1's, two 2's, zero 3's. So it is self describing.
2. '22' has two 2's. To be self-describing it should have two 0's and two 1's. So it is not.
3. One 0's, two 1's, one 2's, zero 3's. Yes, it is.

Got it? we say it is self-describing if all its digits have the value of the respective digits counted in the number.

We could write very compact Python solutions to this problem, almost one-liners, still easy to undestand.

First part, we need to count the digits in the input string. The Counter class in collections is designed right for this task:
from collections import Counter

counter = Counter(map(int, line))
I take the input line (that I name "line"), I map its elements to int, and I use them to initialize an object of the class Counter.

Then I use the built-in all to check all the digit in the range from zero to the string size (right open interval, as python standard). If they are all the same value of the counter for the respective digit, I return 1:
return int(all(counter[i] == int(line[i]) for i in range(len(line))))
Simple and convincing, isn't it?

The complete python3 script and unit test is on GitHub.

Go to the full post

CodeEval Happy Numbers

Check if a number is happy. Meaning that, adding up the squares of its digits we get one, or a number that, following the same recursice procedure, leads to one. A better description is on CodeEval. Now I see that I have already solved this problem a few years ago using C++ as implementation language. This time I used Python, and I have to say that the code looks more readable to me.

The core of the procedure to check if a number is happy requires to add on all its squared up digits. Thinking in a pythonic way, I have seen it as summing all the elements resulting from a custom iteration. I focused on the iterative part of this sub-problem, creating a generator that does the job:
def squared_ciphers(number):
    while number:
        number, cipher = divmod(number, 10)  # 1
        yield cipher ** 2  # 2
1. The handy built-in function divmod() returns the quotient and the remainder of the integer division by its passed parameters. I use it to get the rightmost digit in cipher removing it from number.
2. Each digit is squared and returned.

Now I am ready for the main part of the script. I am about to loop indefinitely until I see the number is happy or sad. Luckly we know that this loop is not infinite. See wikipedia for details.
def solution(line):
    candidate = int(line)
    explored = set()
    while candidate != 1:  # 1
        if candidate in explored:  # 2
            return 0
        explored.add(candidate)
        candidate = sum(squared_ciphers(candidate))  # 3
    return 1
1. If I get a 1, the originating number is happy. I can stop looping.
2. I store all the generate numbers in a set, so that I can easily check if I have already seen the current element. If so, I know I am entering a deadly loop, and so I can safely state the checked number is not happy.
3. I need to generate a new number from the current one. I call the generator to get all its squared ciphers, and I send the results to the sum() built-in function, so to get it.

Happy or not, this script nails it. You can get the full Python3 source from GitHub.

Go to the full post

CodeEval String List

In this CodeEval problem we are given a number n and a string. We have to return, as a comma separated list, the sorted collection of all the possible words sized n from the letters in the string.

I have converted the given samples in python tests:
def test_provided_1(self):
    self.assertEqual('a', solution('1,aa'))

def test_provided_2(self):
    self.assertEqual('aa,ab,ba,bb', solution('2,ab'))

def test_provided_3(self):
    self.assertEqual('ooo,oop,opo,opp,poo,pop,ppo,ppp', solution('3,pop'))
And then I spent some time thinking on them.

The first one remembers us that we need to keep only the unique letters in the string. So I would probably use a set to clean it up from repetitions.
The second one gives away a very important clue. The result is obviously the cartesian product between the first and the second letter. Since I am using Python as implementation language, I thought immediately to itertools product().
The third one tells me not to forget to sort the results. So set is useful as intermediate passage, but then I should put the items in a list, so that I can sort it. Besides, it shows me how I have to proceed, iterating on each partial solution applying each time the cartesian product between the previous result and the initial collection of letters.

Well. It looks quite easy now. It just a matter of writing the code.
def solution(line):
    n, data = line.split(',')
    letters = sorted(list(set(data)))  # 1
    result = letters  # 2
    for i in range(int(n)-1):  # 3
        result = ['{}{}'.format(x[0], x[1]) for x in product(result, letters)]  # 4
    return ','.join(result)
1. I use the set collection to remove duplicates, then I convert it to a list, and I sort it.
2. I need to keep the original letter collection aside, so that I can use it as a factor in the cartesian product, so I copy it in another list.
3. If n is 1, I don't have to do anything more, the plain list of single letters has been already generated. Otherwise have have to apply n-1 times the cartesian product.
4. The itertools product() function returns a tuple, I convert it to a plain string, and I push each result in a new list that overwrite the previous result.

There is not much left out to see, however the full code for the test case and python3 script is on GitHub.

Go to the full post

CodeEval Message Decoding

We are given in input an encoded message, and we have to return it decoded. See the CodeEval page for this problem for more details.

Only one sample is presented, and I didn't bother to convert it to a test case, I just ran my Python script passing the provided input and I had visual check on the result. The point is the problem is not difficult, it just takes some time to get understood properly. At least in my case.

However, if this is the input:
$#**\0100000101101100011100101000
We are expecting this output:
##*\$
Here is the logic behind the transformation.

The first characters represent the dictionary used in this message. We know that they are over when we read a zero or a one. In our case they are five:
$#**\
We convert them in a variable size string of zeros and ones, listing all the numbers from zero on, but skipping the ones represented by 1-only binary numbers:
$ -> 0
# -> 00
* -> 01
* -> 10
\ -> 000
If we had a sixth element, it would have been coded as "001".

Then we have a three-character block, that we have to read as a binary number, and give as the size of the following chunks in the message. Here is "010", meaning 2.
We go on reading elements of this size until when we get a terminator, represented by an all-one sequence:
00 00 10 11
We have three valid elements, and a terminator. We get the first part of our message:
# # *
Back to the size, this time is "011", meaning 3. So we get on reading three by three, until the terminator:
000 111
We already knew the only valid character sized 3 was '\', here we have one on it.

Read again the size, is "001". We read one character at the time.
0 1
There is just a single zero before we get 1, that is the sequence terminator. We add '$' to the decoded message and we read again the size, that is zero:
000
End of transmission. The message was "##*\$". Job done.

Writing a solution in Python, I used a generator to create the keys for the letters in the dictionary:
def key_generator():
    size = 1
    cur = 0
    while True:
        yield format(cur, '0{}b'.format(size))
        if cur == 2 ** size - 2:
            size += 1
            cur = 0
        else:
            cur += 1
I keep the generator status in two variables, size, that knows how long should be the generated key, and cur, that knows the binary value that should be pushed in that space.
The yield statement converts the cur value as expected, using both the built-in format() function and the format() string method. The first gets in the cur value and converts it to a string containing its binary representation. The second provides to the numbers of characters that the binary string should take.
Than we prepare to the next iteration. If increasing the cur we'd get a 1-only binary number, we skip it, resetting cur to zero, and increasing the size. Otherwise it is just a matter of increasing cur.

Now I just have to use the generator to map the provided characters to their coded representation:
mapping = {}
i = 0
for key in key_generator():  # 1
    if line[i] not in ['0', '1']:  # 2
        mapping[key] = line[i]
        i += 1
    else:
        break  # 3
1. I instantiate the generator and loop indefinitely on it.
2. If the current character is valid, I couple the key to it in the dictionary that I named mapping.
3. When I detect a '0' or a '1' I stop filling the dictionary.

I implemented the result generation with a boring double loop:
result = []
while True:
    size = int(line[i:i + 3], 2)  # 1
    if size == 0:
        break
    i += 3
    while True:
        chunk = line[i:i+size]  # 2
        i += size
        if chunk not in mapping:  # 3
            break
        result.append(mapping[chunk])  # 4
1. Convert a three character substring in an integer, considering it as binary. If it is zero, job done. Otherwise move the index in the line and go on reading.
2. Get a chunk of the specified size.
3. There is not such a key in the dictionary, meaning, we got a terminator. Interrupt the inner loop.
4. Otherwise convert the chunk in the actual character and put it in the result list.

We just have to collate the result in a string an return it to the caller.

I pushed the Python3 script to GitHub.

Go to the full post

HackerRank Binary Search: Ice Cream Parlor

Given an integer, the budget for two ice creams, and a list of integer, the price of the available flavors, return the 1-based indeces for the two chosen flavors. As stated in the page problem, we should implement a binary search in our solution.

I converted the provided samples in python tests, and I added to them a couple more when I was thinking on possible algorithms:
def test_provided_1(self):
    self.assertEqual('1 4', solution(4, [1, 4, 5, 3, 2]))

def test_provided_2(self):
    self.assertEqual('1 2', solution(4, [2, 2, 4, 3]))

def test_expensive(self):
    self.assertEqual('3 4', solution(10, [2, 2, 4, 6]))

def test_neighbor(self):
    self.assertEqual('3 4', solution(8, [1, 2, 4, 4, 8]))
Instead of using the suggestion given away in the title, I have been magnetically attracted to use a dictionary. The idea is quite simple, I use as key the flavor price, on which I have to perform the search, and as values a keep a list, so that I can push in it the position of all the flavor having the same price.
def solution(money, prices):
    counter = defaultdict(list)  # 1
    for i in range(len(prices)):  # 2
        counter[prices[i]].append(i+1)

    for price, flavors in counter.items():  # 3
        target = money - price
        if target == price:  # 4
            if len(flavors) > 1:
                return '{0} {1}'.format(flavors[0], flavors[1])
        else:
            others = counter.get(target)  # 5
            if others:
                result = sorted([others[0], flavors[0]])
                return '{} {}'.format(*result)
1. When I try to access a value in this dictionary, if the key was not already in, a new empty list is created. Using defaultdict saves my to call setdefault() on a plain dict.
2. I loop on the input array of prices, appending each flavor position (adjusted to be 1-based) to the associated key reporting its price.
3. Loop on all the price/flavors in the dictionary.
4. If the price I am ready to pay for the second flavor is the same of the amount I would pay for the first, I have only to check if the current bucket is used by more than one flavor. If so, I return the first two of them I found.
5. Otherwise, I check if there is in the dictionary a flavor for the price I am willing to pay. If I see it, I sort the two flavors position (as required by the problem) and return them.

This solution works fine, it is clear and fast, and it is accepted by HackerRank. However it is not what we are expected to produce. So I refactored it to use binary search. The rationale behind it could be thought as if we should run this algorithm in an embedded environment, and we found out that using hash tables eats too much memory, so we need to fall back to a slightly slower algorithm that uses a leaner data structure.

What I do now is putting price/flavors couples in a plain list, sort it and then checking for each element if there is a match to it among the other ones.
def solution(money, prices):
def solution(money, prices):
    counter = sorted([(price, flavor+1) for flavor, price in enumerate(prices)])  # 1
    for i in range(len(counter)):
        other = find_other(counter, i, money - counter[i][0])  # 2
        if other:
            result = sorted([counter[i][1], other])
            return '{} {}'.format(*result)
1. In this list comprehension I enumerate the prices. This generates a number of tuples in the format (index, price), I reverse them, increase the index, since we need it as 1-based, and push them in the list. Then I sort it.
2. For each element in the list, I check if there is a match. I pass to the find_other() function, see it below, the list, the current position, the amount I want spend for the ice cream. I expect it to return the other flavor or nothing, in the sense of python None, if it couldn't find it.

Here I finally perform the binary search:
def find_other(counter, pos, price):
    left = pos+1  # 1
    right = len(counter)-1
    while left <= right:  # 2
        mid = (left + right) // 2
        if counter[mid][0] == price:
            return counter[mid][1]

        if counter[mid][0] > price:
            right = mid-1
        else:
            left = mid+1
    return None
1. I don't care about the left-side elements in the counter list, they have been already checked. This saves some minimal searching time and, more interestingly, avoid the risk of a false positive. Think the case that the total budget is 2x and I am looking for a flavor costing x. I could end up finding the same x I have already found. It won't be difficult to skim this case off, but in this way we have it for free.
2. If there is anything in the current interval, I get its central point, I check it against the expected value, if it is not a match move to the side where I could find it.

Also this solution works fine and it is accepted by HackerRank.

On GitHub: The unit test, the binary search version, and the unofficial hash map version.

Go to the full post

The Spring AOP around advice

I am going to add a plain Spring component to my Spring Boot practice web application. The interesting part of it is that I am developing an AOP aspect that is designed to give a few advices in conjunction with a stated join point.

Here is the component, nothing fancy.
@Component
public class PopConcert implements Performance {
    private static final Log log = LogFactory.getLog(PopConcert.class);

    @Override
    public boolean perform(int level) throws PerformanceException {
        if (level > 8) {
            log.info("A nice gig");
            return true;
        } else if (level > 6) {
            log.info("Not in the mood tonight");
            return false;
        } else {
            throw new PerformanceException("Drunk drummer");
        }
    }
}
Its only method returns a boolean or throws and exception, when something really unusual happens.

I have already talked about Spring AOP with aspects in a previous post, let's see now a sligher more complex aspect, that define more advices:
@Aspect
public class Audience {
    private final static Log log = LogFactory.getLog(Audience.class);

    @Pointcut("execution(boolean Performance.perform(int))")  // 1
    public void performance() {}
    
    @Before("performance()")  // 2
    public void silenceMobile() {
        log.info("Silencing mobile phones");
    }

    @Before("performance()")
    public void takingSeat() {
        log.info("Taking seat");
    }

    @AfterReturning("performance()")  // 3
    public void applaude() {
        log.info("Applauding");
    }

    @AfterThrowing("performance()")  // 4
    public void complain() {
        log.info("Demanding a refund");
    }
}
1. Here I define a pointcut. Not a strict necessity, but it saves some typing, and should improve the code readability. I named it "performance()" and I state it refers to the method perfmorm() defined in the Performance interface. All the advices below refer to it.
2. I have a couple of before advices, as the AspectJ annotation says. Here the point is that I can have more methods associated to the same advice.
3. After the execution of the join point has been completed, Spring should follow this advice.
4. If an exception break the join point run, this advice should be followed by Spring.

As usual I let the magic of putting together everything to be performed by a Spring configuration class:
@Configuration
@EnableAspectJAutoProxy
@ComponentScan(basePackageClasses = Performance.class)  // 1
public class ConcertConfig {
    @Bean
    public Audience audience() {  // 2
        return new Audience();
    }
}
1. In this case the only component in the hierarchy based on Performance is the PopConcert class we have seen above.
2. This method let the aspect spring to life.

I have written a JUnit testcase to see it at work:
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(classes = ConcertConfig.class)
public class PopConcertTest {
    @Autowired
    private Performance performance;

    @Test
    public void testPerform() {
        try {
            assertTrue(performance.perform(11));
        } catch (PerformanceException pe) {
            fail("Unexpected PerformanceException: " + pe.getMessage());
        }
    }

    @Test(expected=PerformanceException.class)
    public void testPerformBadly() throws PerformanceException {
        performance.perform(2);
    }
}
Running it I have the greenlight and I can see in the log that the aspect has done its job

When an exception occurs in the join point, the AfterThrowing advice is executed, and when all works fine, the AfterReturning is called:
That's nice. But let's see how put all this stuff in a single advice.

AspectJ Around

The around advice gets its join point as an annotation parameter, as usual, and it gets in input, as a parameter a reference to a ProceedingJoinPoint. What we have to do is just specifying in its body what we want to be executed before, after, and even in case of exception. In a natural way, but with a few caveats.
@Aspect
public class AudienceAround {
    private final static Log log = LogFactory.getLog(AudienceAround.class);

    @Around("execution(boolean Performance.perform(int))")
    public Object watchPerformance(ProceedingJoinPoint pjp) throws PerformanceException {  // 1
        Object result = null;  // 2
        try {
            log.info("Ensure mobile is off");
            log.info("Take place");
            result = pjp.proceed();  // 3
            log.info("Cheer");
        } catch (Throwable t) {  // 4
            log.info("Demand a refund: " + t.getMessage());
            if (t instanceof PerformanceException) {
                throw ((PerformanceException) t);  // 5
            } else {
                throw new PerformanceException("Unexpected:" + t.getMessage());  // 6
            }
        }
        return result;
    }
}
1. Our join point throws an exception, so also its Around advice has to throw it too.
2. When it does not throw an exception, it returns a value, so we have to be prepared to to the same. Notice that it returns a primitive type, but the proceed() method in ProceedingJoinPoint returns an object. Some boxing and unboxing is expected.
3. We are passing the control to Spring, that is going to give it to the specified join point.
4. If the join point throws an exception, we'll get a Throwable. It is more or less the same issue we have in (2). Since Spring doesn't know beforehand what throws it, it falls back to the mother of all exceptions.
5. It won't be a good idea to swallow the exception, the caller wants to get it. So here I rethrow it.
6. If I have done everything right, this should never happen.

I have swapped the beans in ConcertConfig. Now a AudienceAround is seen by Spring as the aspect it should care about. I run again the tester and nothing changes, just the log, and in the right way:
Reference: Aspect-oriented Spring, from Spring in Action, Fourth Edition by Craig Walls. Chapter four.
Repository on GitHub: See the JUnit test case, and browse the concert package.

Go to the full post

HackerRank Merge Sort: Counting Inversions

I found the description of this problem in its HackerRank page a bit misleading. Better trust its name instead. We have to count the inversions that occur when we run a merge-sort algorithm on list of positive integers.

So, basically, we have to increase a variable to keep track of the inversions and return it to the caller. If you know how to implement merge-sort, large part of the job is done.

The main problem is that they set the timeout very tightly, and it could be tough respecting it, especially if you are using an implementation language that is not among the fastest ones.

I use Python here, so I had to be careful. At least if I don't want to use PyPy to speed everything up.

In the end I came out with this stuff:
def count_inversions(a):
    return merge_sort(a, 0, len(a) - 1)
The required count_inversions() simply calls my adapted merge-sort() function.

Beside sorting, it also stores in the variable result the count of all the inversions, and then will return it.
def merge_sort(a, left, right):
    result = 0
    if left < right:
        center = left + (right - left) // 2
        result += merge_sort(a, left, center)
        result += merge_sort(a, center + 1, right)
        result += merge(a, left, center, right)
    return result
As you see, no surprise, it's a standard merge-sort implementation.

I have spent some sweating on merge, to save to most time I could without getting something still readable:
def merge(data, left, center, right):
def merge(data, left, center, right):
    merged = []  # 1
    result = 0  # 2

    i = left  # 3
    j = center + 1
    while i <= center and j <= right:  # 4
        if data[i] > data[j]:  # 5
            result += (center - i + 1)
            merged.append(data[j])
            j += 1
        else:
            merged.append(data[i])
            i += 1

    for i in range(i, center+1):  # 6
        merged.append(data[i])
    for j in range(j, right+1):
        merged.append(data[j])

    data[left:right+1] = merged  # 7
    return result
1. We need some temporary area where storing the merged data. I use this list.
2. Keep track of the swaps happening here.
3. The main while below runs on two loop variables, i and j, representing the indexes to the left and right elements.
4. Loop until there is at least one element on the left and one on the right to be merged.
5. If an element on the left is bigger than the ones on the right, we have a swap. Or better, as many swaps as the number of elements still on the right.
6. Take care of the leftover.
7. Replace the part of the list on which we worked with the merged elements.

Fast enough to get accepted. Test case and python script on GitHub.

Go to the full post

CodeEval Sum of integers

Given a list of integer numbers, return the largest sum in one of its subsequences of contiguous elements. CodeEval Sum of integers.

I have converted the given samples in python tests, then I added a few more cases to help me thinking about an algorithm.
def test_provided_1(self):
    self.assertEqual(8, solution('-10,2,3,-2,0,5,-15'))  # 1

def test_provided_2(self):
    self.assertEqual(12, solution('2,3,-2,-1,10'))  # 2

def test_plain(self):
    self.assertEqual(5, solution('2,3'))

def test_a_negative_first(self):
    self.assertEqual(3, solution('-2,3'))

def test_a_negative_second(self):
    self.assertEqual(2, solution('2,-3'))

def test_both_negative(self):
    self.assertEqual(-2, solution('-2,-3'))

def test_a_negative_in_between(self):
    self.assertEqual(2, solution('1,-1,2'))
The idea is, I am going to loop on the sequence keeping track of the candidate solution. I initialize it with the first value in the list, then I pass to consider the next element.
It is possible that the sum of the proposed solution with the current value is higher, so that would get the new tentative solution. Otherwise I would keep the original one.
But, consider for instance (1). If I simply add up the two values I get -8, that is better of -10. However I should consider just 2 as a better solution. To do that, I refine the check stated above. If the proposed solution is negative, there is no use in adding it up to the current value, I can discard it and keeping the new value as new candidate solution.
There is another problem, as seen in (2). What I should do when I see a negative number in the series after some positive numbers? The first idea could be consider close a subsequence and go looking for a new one. However this way of thinking is not rewarding here, where the full sequence, leads to a higher sum than its left and right parts.
To overcome this issue I introduce a second variable, acting as a tentative solution that keeps track of all the elements, negative included, until it gets negative.

After this thoughts, I written down this Python function:
def solution(line):
    head, *values = [int(x) for x in line.split(',')]  # 1

    result = head  # 2
    maybe = head
    for value in values:
        maybe = value if maybe < 0 else maybe + value  # 3
        if maybe > result:  # 4
            result = maybe
    return result
1. The input line is split on commas, its values are converted in integers and assigned to two variables, a single number, the head of the list that is managed differently, and a list of numbers, containing all the other elements.
2. I assign the first value to both solution. The currently chosen, result, and the tentative one, maybe.
3. For each following element, I check if it is worthy to keep the previous values, stored in maybe, and add the current value, or throw away all the old stuff and just keep the new value, trying to create a new subsequence.
4. Only if the tentative solution is actually better than the currently chosen one, I change it accordingly.

Well, it works. Python script and unit test are on GitHub.

Go to the full post

Plural Names Iterator

Given a file containing rules to convert a singular word in its plural, write a function that applies the best available rule for any passed name and return it in plural form.

The aim of this problem is the same of the one seen in the previous post, I am going to implement a similar plural() Python function that would get in input a word, as 'boy', and would return its plural form, like 'boys'.

The test case is just the same, plural() would be almost the same, the match_apply() closure I used to return two functions, one for checking the word against the current pattern, the other to apply the relative change to the word, is the same.
The big change is that I won't use a generator but a iterator under the bonnet, with the idea of reading the patterns from file just once, and not any time the user asks for a name pluralization.

So I create a class Rules, implementing the special methods __iter__() and __next__(), that make it both an iterable (for the first) and an iterator (for the latter), and then I would instantiate it in an object that is going to be used by plural(), instead of the generator.
class Rules:
    filename = 'plural_names_rules.txt'  # 1

    def __init__(self, filename=None):  # 2
        if filename:
            self.filename = filename

        self.rules = []
        with open(self.filename) as patterns:  # 3
            for line in patterns:
                pattern, search, replace = line.split()
                self.rules.append(match_apply(pattern, search, replace))  # 4

    def __iter__(self):  # 5
        self.index = -1
        return self

    def __next__(self):  # 6
        self.index += 1
        if len(self.rules) > self.index:
            return self.rules[self.index]
        raise StopIteration
1. By default, the rules are read from this file.
2. When initializing a Rules object, the user could change the rules file.
3. I have moved here the reading from file, so that it is done only once for each Rules object. Notice the with-as statement that delegates to Python the resource cleanup when exiting the block.
4. For each line in the pattern file, I extract the patter, search, and replace token, I pass them to the match_apply() closure, and I get back the relative match() apply() functions, that I push to the rules list.
5. I am stating that Rules is a Python iterable, providing a way to extract from it an iterator, that is the Rules object itself. Here I initialize the index to the internal list (namely rules) to the first item before the actual beginning (that means, minus one).
6. Rules is also a Python3 iterator. Each time it is required a new value from it, this method is called. If the iterator index is still in the range, the relative item is returned, otherwise, as required by the python iterator desing.

The change in the plural() function is minimal, but dense of meaning.
rules = Rules()  # 1


def plural(noun):
    for match, apply in rules:  # 2
        if match(noun):
            return apply(noun)
    return '???'
1. I need a Rules() object to work with. Here I instantiate it.
2. The for-each statement requires an iterable. The object rules has a __iter__() method, so it works fine in this role. Before executing a loop, for-each asks to the iterator provided by rules (that incidentally is rules itself) its next element. Since this is a Python3 script, the __next__() method of the iterator is called, getting the next element in the list of rules, a couple of functions retrieved from the match_apply() closure.

Reference: Dive into Python 3 Chapter 7 Classes & Iterators, sections 6.

Test case and python script on GitHub.

Go to the full post

Plural Names Generator

Given a file containing rules to convert a singular word in its plural, write a function that applies the best available rule for any passed name and return it in plural form.

My target is giving the user a function, named plural(), that gets in input a word and an optional file name for the rules to be applied (defaulted by the filename provided), and returns it as a plural word.

Here is a few test for this functionality:
def test_box(self):
    self.assertEqual('boxes', plural('box'))

def test_bush(self):
    self.assertEqual('bushes', plural('bush'))

def test_soliloquy(self):
    self.assertEqual('soliloquies', plural('soliloquy'))

def test_boy(self):
    self.assertEqual('boys', plural('boy'))

def test_vacancy(self):
    self.assertEqual('vacancies', plural('vacancy'))
The rules are in this format:
[sxz]$ $ es
[^aeioudgkprt]h$ $ es
(qu|[^aeiou])y$ y$ ies
$ $ s
I have four rules, each rule has three tokens.
First token is the tail of the word, that I am going to check to decide how to change it. First rule applies to words ending by 's', or 'x', or 'z'. The second one to words ending by 'h', and having in the previous position a letter that is not 'a', or 'e', or ..., or 't'. The third one to words ending by 'y', preceded by 'qu' or a single letter that is not a vowel. As last resort, the fourth rule is applied to any word.
The second token states what I have to change. A plain dollar sign '$' says that I have to add something at the end of the word, withour removing anything. The couple 'y$' means I have to remove the last 'y' in the word that is going to be replaced with something else.
The third token is what I have to add to the original word to make it plural. It ranges from 's', default case, to 'es', to 'ies'.

The plural() function makes use of a generator, named rules(), that returns a couple of function, one, match() to verify if the current word matches a specific rule in the list, and another, apply() to convert a word in its plural form, following the current rule.
def plural(noun, file='plural_names_rules.txt'):
    for match, apply in rules(file):
        if match(noun):
            return apply(noun)
    return '???'  # 1
1. If we have a list of rules carefully written, we should never get here. We should always get a matching rule for each word.

Let's see the rules() generator:
def rules(file):
    with open(file) as patterns:  # 1
        for line in patterns:
            pattern, search, replace = line.split()
            yield match_apply(pattern, search, replace)  # 2
1. Using the with-as compound statement we delegate to python the nuisance of cleaning up the involved resources as we exit the block - no matter how brutally that could happen. So, here that we are opening a file, we can be sure it will be closed when leaving the scope.

The generator yield the result of calling the match_apply() function, that is going to return a couple of functions. These functions are going to use the three parameters we are passing to match_apply(), and use them in conjunction with a new parameter that they are going to receive from their caller. So we are talking about a closure.
def match_apply(pattern, search, replace):
    def match(word):
        return re.search(pattern, word)  # 1

    def apply(word):
        return re.sub(search, replace, word)  # 2

    return match, apply  # 3
1. The match() function is going to be called on a passed word, and it would apply a regular expression search on the pattern passed to the closure.
2. The apply() function would call the regular expression sub() function using search and replace parameters from the closure, combining it with its word parameter.
3. The two functions are returned to the caller.

If you follow the test run in debugger mode, you will see what actually happens.
The test calls plural(), it loops on the generator rules(), getting from the closure match_apply() the couple of functions that check if the word matches the current rule and in that case apply the change to make the word plural.

Reference: Dive into Python 3, section 6.6.

Unit test and Python script are on GitHub.

Go to the full post