SAX parser and StartElement

We now modify our Xerces-C SAX parsing example so that the parser reacts whenever a new element starts.

Given the way SAXParser is designed, we only have to change our callback class, the one that implements DocumentHandler (actually, we extend HandlerBase, which derives from it), by overriding its startElement() method.
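To see why overriding a single method is enough, here is a minimal sketch of the callback mechanism. It does not use Xerces at all: MockHandlerBase, RecordingHandler and fakeParse() are hypothetical names made up for this illustration, and the callbacks are simplified to take just a std::string.

```cpp
#include <string>
#include <vector>

// Hypothetical base class playing the role of HandlerBase: every callback has
// a do-nothing default, so a subclass overrides only what it needs.
class MockHandlerBase
{
public:
    virtual ~MockHandlerBase() = default;
    virtual void startElement(const std::string& /*name*/) {}
    virtual void endElement(const std::string& /*name*/) {}
};

// Our handler overrides a single callback and records the names it sees.
class RecordingHandler : public MockHandlerBase
{
public:
    void startElement(const std::string& name) override
    {
        started.push_back(name);
    }

    std::vector<std::string> started;
};

// Hypothetical stand-in for the parser: it knows only the base interface and
// fires the callbacks for each element it "meets".
void fakeParse(const std::vector<std::string>& elements, MockHandlerBase& handler)
{
    for (const std::string& name : elements)
    {
        handler.startElement(name);
        handler.endElement(name);
    }
}
```

Passing a RecordingHandler to fakeParse() fills its started vector through the overridden method, while endElement() silently falls back to the empty default. The real parser is of course far more involved, but the division of labor is the same.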

Our function must have this declaration:
void startElement(const XMLCh* const, AttributeList&)
where the first parameter is the element name, and the second is the list, possibly empty, of the associated attributes.

Here is the change in the code:

class SimpleHandler : public HandlerBase
{
public:
    // ...

    // ... we add this new function:

    /**
     * override HandlerBase::startElement(name, attrs)
     */
    void startElement(const XMLCh* const name, AttributeList& attrs)
    {
        if(wcscmp(name, L"car") == 0 && attrs.getLength() == 1) // 1.
        {
            std::wcout << "Start a car: " << attrs.getName(0) << // 2.
                " [" << attrs.getType(XMLSize_t(0)) << // 3.
                "] = \"" << attrs.getValue(XMLSize_t(0)) << '\"' << std::endl;
        }
        else
        { // 4.
            std::wcout << "Start element: " << name << std::endl;
            for(XMLSize_t i = 0; i < attrs.getLength(); ++i)
            {
                std::wcout << "Attribute " << attrs.getName(i) <<
                    " [" << attrs.getType(i) <<
                    "] = \"" << attrs.getValue(i) << '\"' << std::endl;
            }
        }
    }
};

1. This branch runs only for specific elements: the ones named "car". Notice that Xerces works with wide-character strings (XMLCh), so we use wcscmp() instead of the plain strcmp(). This relies on XMLCh being the same type as wchar_t, which is not the case on every platform; where it is not, XMLString::equals() is the portable way to compare two XMLCh strings. The second part of the condition ensures that the element has one and only one attribute, using the AttributeList::getLength() method.
2. AttributeList::getName() returns the name of the attribute at the specified index.
3. AttributeList::getType() and AttributeList::getValue() are a bit trickier, because both are overloaded and can be called with either the index or the name of the attribute. The nuisance is that we have to specify the type explicitly: we can't pass just the constant 0, otherwise the compiler wouldn't know whether we mean it as a NULL pointer or as an index.
4. Generic case: we output the element name and all its attributes, if any.
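The ambiguity described in note 3 can be reproduced without Xerces. The sketch below uses MockAttributes, a hypothetical class made up for this illustration that only mimics the overload shape (lookup by index or by name); it is not the real AttributeList.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for an attribute list: NOT the Xerces class, only the
// same overload shape (lookup by index or by name).
class MockAttributes
{
public:
    // Lookup by position.
    std::string getValue(std::size_t index) const
    {
        return index == 0 ? "by index" : "out of range";
    }

    // Lookup by name; a null pointer is a legal argument for this overload.
    std::string getValue(const char* name) const
    {
        return name ? "by name" : "null name";
    }
};

std::string firstValue(const MockAttributes& attrs)
{
    // attrs.getValue(0);  // would not compile: the call is ambiguous, since
    //                     // the literal 0 converts equally well to an index
    //                     // and to a null pointer
    return attrs.getValue(std::size_t(0)); // the cast selects the index overload
}
```

Calling firstValue() goes through the index overload, exactly as XMLSize_t(0) does in our handler. A loop variable declared as XMLSize_t, like i in the generic branch, needs no cast because its type already matches one overload exactly.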

More details on SAX (referring to the Xerces-J implementation) can be found in chapter 12 of Beginning XML by David Hunter et al. (Wrox).
