My goal is to give the user a function, plural(), that takes a word and an optional name of the file containing the rules to apply (a default file name is provided), and returns the word in its plural form.
Here are a few tests for this functionality:
def test_box(self):
    self.assertEqual('boxes', plural('box'))

def test_bush(self):
    self.assertEqual('bushes', plural('bush'))

def test_soliloquy(self):
    self.assertEqual('soliloquies', plural('soliloquy'))

def test_boy(self):
    self.assertEqual('boys', plural('boy'))

def test_vacancy(self):
    self.assertEqual('vacancies', plural('vacancy'))

The rules are in this format:
[sxz]$ $ es
[^aeioudgkprt]h$ $ es
(qu|[^aeiou])y$ y$ ies
$ $ s

I have four rules; each rule has three tokens.
The first token is the tail of the word, which I check to decide how to change it. The first rule applies to words ending in 's', 'x', or 'z'. The second applies to words ending in 'h' where the preceding letter is not one of 'a', 'e', 'i', 'o', 'u', 'd', 'g', 'k', 'p', 'r', 't'. The third applies to words ending in 'y' preceded by 'qu' or by a single character that is not a vowel. As a last resort, the fourth rule applies to any word.
The second token states what I have to change. A plain dollar sign '$' matches the end of the word, so I add something there without removing anything. The pair 'y$' means that the final 'y' is removed and replaced with something else.
The third token is what I add to the word to make it plural. It ranges from 's', the default case, to 'es', to 'ies'.
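To see how the three tokens cooperate, here is a minimal sketch that feeds one rule at a time to the standard re module. The helper name apply_rule() is hypothetical and only serves this illustration; it is not part of the script described in this post.

```python
import re

def apply_rule(pattern, search, replace, word):
    """Hypothetical helper: return the changed word if the rule matches, else None."""
    if re.search(pattern, word):               # token 1: check the tail of the word
        return re.sub(search, replace, word)   # tokens 2 and 3: what to replace, and with what
    return None

print(apply_rule('[sxz]$', '$', 'es', 'box'))                 # boxes
print(apply_rule('(qu|[^aeiou])y$', 'y$', 'ies', 'vacancy'))  # vacancies
print(apply_rule('(qu|[^aeiou])y$', 'y$', 'ies', 'boy'))      # None: 'o' is a vowel
print(apply_rule('$', '$', 's', 'boy'))                       # boys
```

The word 'boy' is rejected by the third rule and picked up by the catch-all fourth one, which is why the order of the rules matters.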
The plural() function uses a generator, rules(), that yields a pair of functions: match(), which checks whether the current word matches a specific rule in the list, and apply(), which converts the word to its plural form according to that rule.
def plural(noun, file='plural_names_rules.txt'):
    for match, apply in rules(file):
        if match(noun):
            return apply(noun)
    return '???' # 1

1. If we have a carefully written list of rules, we should never get here. We should always get a matching rule for each word.
Let's see the rules() generator:
def rules(file):
    with open(file) as patterns: # 1
        for line in patterns:
            pattern, search, replace = line.split()
            yield match_apply(pattern, search, replace) # 2

1. Using the with-as compound statement, we delegate to Python the nuisance of cleaning up the resources involved as we exit the block, no matter how abruptly that happens. Since we are opening a file here, we can be sure it will be closed when we leave the block.
2. The generator yields the result of calling match_apply(), which returns a pair of functions. These functions use the three parameters we pass to match_apply(), together with a new parameter they receive from their caller. So we are talking about a closure.
import re

def match_apply(pattern, search, replace):
    def match(word):
        return re.search(pattern, word) # 1
    def apply(word):
        return re.sub(search, replace, word) # 2
    return match, apply # 3

1. The match() function is called on a word and runs a regular expression search for the pattern captured by the closure.
2. The apply() function calls the regular expression sub() function with the search and replace parameters from the closure, combined with its own word parameter.
3. The two functions are returned to the caller.
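To make the closure behavior more tangible, the sketch below builds two independent pairs of functions from two of the rules and shows that each pair remembers its own parameters. The variable names (match_es, apply_ies, and so on) are made up for this example; match_apply() is repeated so the snippet runs on its own.

```python
import re

def match_apply(pattern, search, replace):
    def match(word):
        return re.search(pattern, word)
    def apply(word):
        return re.sub(search, replace, word)
    return match, apply

# Each call creates a fresh pair of functions bound to its own rule.
match_es, apply_es = match_apply('[sxz]$', '$', 'es')
match_ies, apply_ies = match_apply('(qu|[^aeiou])y$', 'y$', 'ies')

print(bool(match_es('box')), apply_es('box'))   # True boxes
print(bool(match_ies('box')))                   # False
print(apply_ies('vacancy'))                     # vacancies

# The captured value is still reachable after match_apply() has returned.
print(match_es.__closure__[0].cell_contents)    # [sxz]$
```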
If you step through the test in a debugger, you will see what actually happens.
The test calls plural(), which loops over the rules() generator. For each rule, match_apply() returns a pair of closures: one checks whether the word matches the current rule and, if it does, the other applies the change that makes the word plural.
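If you prefer to try the whole flow without a debugger, the following sketch puts the pieces together and runs them against the five test words. It assumes the rules are written to a temporary file (the name rules.txt is arbitrary and stands in for plural_names_rules.txt), and plural() takes the path explicitly instead of using a default.

```python
import os
import re
import tempfile

# The four rules from this post, one per line.
RULES_TEXT = """\
[sxz]$ $ es
[^aeioudgkprt]h$ $ es
(qu|[^aeiou])y$ y$ ies
$ $ s
"""

def match_apply(pattern, search, replace):
    def match(word):
        return re.search(pattern, word)
    def apply(word):
        return re.sub(search, replace, word)
    return match, apply

def rules(file):
    with open(file) as patterns:
        for line in patterns:
            pattern, search, replace = line.split()
            yield match_apply(pattern, search, replace)

def plural(noun, file):
    for match, apply in rules(file):
        if match(noun):
            return apply(noun)
    return '???'

words = ['box', 'bush', 'soliloquy', 'boy', 'vacancy']
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'rules.txt')   # stand-in for the real rules file
    with open(path, 'w') as f:
        f.write(RULES_TEXT)
    results = {word: plural(word, path) for word in words}

for word, result in results.items():
    print(word, '->', result)
```

Because rules() is a generator, the file is read one line at a time, and reading stops as soon as a rule matches.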
Reference: Dive into Python 3, section 6.6.
Unit test and Python script are on GitHub.