Spirit Qi provides a good number of built-in parsers that can be combined to create our own, more specific parsers. The Spirit tutorial shows how to start from a built-in parser (double_) and end up with a more complex parser that accepts a list of comma-separated floating-point numbers.
All the examples in this post share the same structure. We call the Spirit Qi function phrase_parse() on a string containing our input, specifying two things: the parser to apply, and a "skip-parser". The skip-parser describes the elements that may appear in the input but should not interfere with the evaluation. Typically, as in this case, that means whitespace: blanks, newlines, and so on. Only the parser changes from one example to the next, so I wrote a function that implements the generic behaviour. Besides the string containing the text to be evaluated, it takes as input an expression representing the parser to use:
#include <boost/spirit/include/qi.hpp>
#include <string>
// ...
template<typename Expr>
inline bool genericParse(const std::string& input, const Expr& expr)
{
    std::string::const_iterator first = input.begin();
    bool r = boost::spirit::qi::phrase_parse( // 1.
        first, // 2.
        input.end(),
        expr, // 3.
        boost::spirit::ascii::space // 4.
    );
    if(first != input.end()) // 5.
        return false;
    return r;
}
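1. phrase_parse() returns true if the parser succeeds on the input sequence. Success does not require the whole input to be consumed.
2. The first two arguments are the iterators delimiting the sequence. The first one is taken by reference and, on return, points to where parsing stopped.
3. The third argument is the parser.
4. The fourth argument is the skip-parser.
5. Here we implement stricter parsing: we check that the whole input has been consumed, with no trailing leftover.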
Parsing a number
Now it is quite easy to implement a function that parses a floating point number:
bool bsParseDouble(const std::string& input)
{
    return genericParse(input, boost::spirit::qi::double_);
}
boost::spirit::qi::double_ is the built-in parser that recognizes a floating-point number that can be stored in a double variable.
I find test cases very useful, not only to verify the correctness of the code we write, but also to better understand what existing code actually does. So I wrote a bunch of test cases to check how the code above behaves. Here is the first one:
TEST(BSParseDouble, Double)
{
    std::string input("1.21");
    EXPECT_TRUE(bsParseDouble(input));
}
Parsing two numbers
For parsing two floating point numbers we have to create a custom parser:
bool bsParseTwoDouble(const std::string& input)
{
    auto expr = boost::spirit::qi::double_ >> boost::spirit::qi::double_;
    return genericParse(input, expr);
}
Spirit overloads the right-shift operator (>>) to convey the meaning of "followed by". So we can read the custom parser we created as: a double followed by another double. Here is one of the tests I wrote for this function, checking that a single number is rejected:
TEST(BSParseTwoDouble, Double)
{
    std::string input("1.21");
    EXPECT_FALSE(bsParseTwoDouble(input));
}
Parsing zero or more numbers
In regular expressions, a postfix star (known as the Kleene star) is the usual way to express zero or more repetitions of an expression. The problem is that C++ has no postfix star operator, so that was not an option for the Spirit designers. That's why a prefix star is used instead:
bool bsParseKSDouble(const std::string& input)
{
    return genericParse(input, *boost::spirit::qi::double_);
}
One of the tests I wrote for this function checks that a sequence of three doubles is accepted. Another checks that a couple of integers surrounded by blanks is accepted too:
TEST(BSParseKSDouble, TrebleDouble)
{
    std::string input("1.21 7.44 8.03");
    EXPECT_TRUE(bsParseKSDouble(input));
}
TEST(BSParseKSDouble, BlankIntIntBlank)
{
    std::string input(" 42 33 ");
    EXPECT_TRUE(bsParseKSDouble(input));
}
Parsing a comma-delimited list of numbers
Finally, the big fish of this post. We expect at least one number, with commas used as delimiters:
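bool bsParseCSDList(const std::string& input)
{
    // The expression is passed straight to genericParse(): an 'auto' variable
    // would keep references to the temporaries char_(',') and *(...), which
    // are destroyed at the end of the statement that creates them.
    return genericParse(input, boost::spirit::qi::double_ >>
        *(boost::spirit::qi::char_(',') >> boost::spirit::qi::double_));
}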
We can read this parser as: a double, followed by zero or more repetitions of a sequence made of a comma followed by a double.
Actually, we didn't have to wrap the comma explicitly in a char_ parser. The right-hand operand of >> is already a parser, so Spirit converts the character literal into a literal-character parser on its own. So we could have written:
return genericParse(input, boost::spirit::qi::double_ >> *(',' >> boost::spirit::qi::double_));
But using char_ explicitly was a good way to show this built-in parser.
Since this parser is a bit more interesting, I'd suggest writing plenty of test cases to check that your expectations match the actual parsing behaviour. Here are a few of them:
TEST(BSParseCSDList, Empty)
{
    std::string input;
    EXPECT_FALSE(bsParseCSDList(input));
}
TEST(BSParseCSDList, Double)
{
    std::string input("1.21");
    EXPECT_TRUE(bsParseCSDList(input));
}
TEST(BSParseCSDList, DoubleDouble)
{
    std::string input("1.21,7.44");
    EXPECT_TRUE(bsParseCSDList(input));
}
TEST(BSParseCSDList, DoubleDouble2)
{
    std::string input("1.21, 7.44");
    EXPECT_TRUE(bsParseCSDList(input));
}
TEST(BSParseCSDList, DoubleDoubleBad)
{
    std::string input("1.21 7.44");
    EXPECT_FALSE(bsParseCSDList(input));
}