
SAX parsing with Xerces

Besides an XML DOM parser, Xerces also provides a SAX parser.

With the DOM parser we have access to the complete XML document and can navigate through it as we wish. For this reason it is usually the best option for small documents, and for cases where we want full control over the document.

The SAX parser, on the other hand, works well when the XML document is so large that using DOM becomes impractical. DOM has to load the whole document into memory before letting us do our job.

With SAX, we instead create a class that specifies what should happen when the SAX parser generates specific events. We pass an instance of this class to the parser and let it call back our methods.
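This inversion of control is not specific to Xerces. Here is a minimal, Xerces-free sketch in plain C++ that shows who calls whom. ToyHandler, toyParse and CountingHandler are hypothetical names made up for this illustration. toyParse only recognizes bare tags such as <a> and </a>, with no attributes, self-closing tags, comments or error checking. The parser function drives the scan and calls back the handler's virtual methods. The handler keeps only a counter, never the whole document.

```cpp
#include <cstddef>
#include <iostream>
#include <string>

// Hypothetical toy handler interface (not a Xerces class): the "parser"
// calls these virtual methods whenever it meets the matching event.
class ToyHandler {
public:
    virtual ~ToyHandler() = default;
    virtual void startDocument() {}
    virtual void endDocument() {}
    virtual void startElement(const std::string&) {}
    virtual void endElement(const std::string&) {}
};

// Hypothetical toy "parser": scans bare tags like <a> and </a> and fires
// callbacks. Text between tags is skipped; attributes, self-closing tags,
// comments and malformed input are NOT handled.
void toyParse(const std::string& xml, ToyHandler& handler)
{
    handler.startDocument();
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string::npos) {
        const std::size_t end = xml.find('>', pos);
        if (end == std::string::npos)
            break;                                  // unterminated tag: stop scanning
        const std::string tag = xml.substr(pos + 1, end - pos - 1);
        if (!tag.empty() && tag[0] == '/')
            handler.endElement(tag.substr(1));      // closing tag, name without '/'
        else
            handler.startElement(tag);              // opening tag
        pos = end + 1;
    }
    handler.endDocument();
}

// A handler that keeps only a counter, so its memory use does not grow
// with the size of the document.
class CountingHandler : public ToyHandler {
public:
    int elements = 0;
    void startDocument() override { std::cout << "Start document" << std::endl; }
    void endDocument() override { std::cout << "End document" << std::endl; }
    void startElement(const std::string&) override { ++elements; }
};
```

Passing "<root><item>1</item></root>" together with a CountingHandler prints the two document messages and leaves elements equal to 2. In the real Xerces code, SAXParser plays the role of toyParse and HandlerBase plays the role of ToyHandler.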


Say that we just want to be notified when SAX finds the start and the end of the XML document. We create a class that extends HandlerBase:

#include <xercesc/sax/HandlerBase.hpp>
#include <iostream>

XERCES_CPP_NAMESPACE_USE

class SimpleHandler : public HandlerBase
{
public:
    /**
     * override HandlerBase::startDocument()
     */
    void startDocument()
    {
        std::cout << "Start document" << std::endl;
    }

    /**
     * override HandlerBase::endDocument()
     */
    void endDocument()
    {
        std::cout << "End document" << std::endl;
    }
};

And we use an instance of this class in a function like this:

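#include <xercesc/parsers/SAXParser.hpp>
#include <xercesc/sax/SAXParseException.hpp>
#include <xercesc/util/XMLException.hpp>
#include <xercesc/util/XMLString.hpp>
#include <iostream>

XERCES_CPP_NAMESPACE_USE

void saxParse(const XMLCh* filename) // 1.
{
    SAXParser parser;
    SimpleHandler handler;
    parser.setDocumentHandler(&handler); // 2.

    try {
        parser.parse(filename); // 3.
    }
    catch (const XMLException& xe) {
        // XMLCh is UTF-16, so transcode the message to the local code page
        // before printing it, then free the transcoded buffer
        char* msg = XMLString::transcode(xe.getMessage());
        std::cerr << "XML Exception: " << msg << std::endl;
        XMLString::release(&msg);
    }
    catch (const SAXParseException& se) {
        char* msg = XMLString::transcode(se.getMessage());
        std::cerr << "SAX Parse Exception: " << msg << std::endl;
        XMLString::release(&msg);
    }
    catch (...) {
        std::cerr << "Unexpected Exception" << std::endl;
    }
}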

1. As usual in Xerces, we work with XMLCh strings. XMLCh is a typedef for a 16-bit UTF-16 character type: it is wchar_t on Windows, but not on most other platforms.
2. Before actually parsing the file, we pass our custom handler object to the parser, so that it can call back our methods at the right moments.
3. Finally, we parse the XML file, ready to catch any possible exception, even the unexpected ones.

The required Xerces initialization and termination are shown in a previous post.

More details on SAX can be found in chapter 12 of Beginning XML by David Hunter et al. (Wrox).
