RelaxNGCC Tutorial 6

$Revision: 1.1 $ by Kohsuke Kawaguchi

This tutorial describes how to write code that parses documents using the code generated by the compiler. This is a two-step process. First, you set up an NGCCRuntime object (or an instance of its derived class, if you are using the c:runtime-type customization). Second, you use another component, such as JAXP, to send SAX events to the generated code.

Setting up NGCCRuntime

Assume Foo is one of the classes the compiler has generated. Then the following code sets up a runtime object:

NGCCRuntime runtime = new NGCCRuntime();
Foo foo = new Foo(runtime);
runtime.pushHandler(foo);

If you are using your own runtime class MyRuntime, replace NGCCRuntime with MyRuntime. Similarly, if you specify c:params on the scope block, you need to pass those parameters to the constructor as additional arguments.

Note that you can start parsing with any generated handler class.

Using JAXP

Perhaps the most straightforward way to parse an XML document is to use JAXP. This should be very familiar to those of you who already know JAXP. First, you need to create a new instance of XMLReader.

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware(true);
XMLReader reader = factory.newSAXParser().getXMLReader();

Then you set the NGCCRuntime object, initialized as above, as the content handler.

reader.setContentHandler(runtime);

Finally, you parse a document and retrieve a parsing result from the root handler object:

reader.parse(inputSource);
doSomethingWith(foo);
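
The JAXP steps above can be collected into one small helper. The following is only a sketch: JaxpDriver and CountingHandler are hypothetical names introduced for illustration, and CountingHandler stands in for the NGCCRuntime object so that the example runs without any generated classes.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: sends the SAX events of an XML string to a handler. */
class JaxpDriver {
    static void parse(String xml, ContentHandler handler) throws Exception {
        // Step 1: create a namespace-aware XMLReader.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        // Step 2: register the handler (the NGCCRuntime object in real use).
        reader.setContentHandler(handler);

        // Step 3: parse; the handler receives every SAX event.
        reader.parse(new InputSource(new StringReader(xml)));
    }
}

/** Stand-in for the runtime object (an assumption): counts start tags. */
class CountingHandler extends DefaultHandler {
    int elements = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++;
    }
}
```

In real use you would pass the runtime object set up earlier instead of CountingHandler, and then read the result from foo once parse returns.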

Combining with XMLFilter

Sometimes adding a simple filter between the XML parser and the generated handler makes the processing vastly simpler.

Take RelaxNGCC itself as an example: it uses itself to generate the code that parses RELAX NG. RELAX NG allows foreign elements to appear in arbitrary positions, but we don't care about those elements.

If we were to write a grammar that handles this, we would need a lot of <interleave>s to accept those foreign elements. This complicates the grammar and also bloats the generated code, since it needs a bigger internal state transition table.

The other, simpler approach is to write a filter that strips away all the irrelevant elements before they even reach the generated handler.

Since the generated handlers no longer need to worry about those extras, the schema becomes much simpler.
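
A minimal sketch of such a filter, built on the standard org.xml.sax.helpers.XMLFilterImpl class, might look like the following. This is not the filter RelaxNGCC itself uses. ForeignElementFilter and FilterDemo are hypothetical names, and the sketch assumes that "foreign" means "not in the RELAX NG namespace". It drops foreign elements together with their subtrees and does not touch foreign attributes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: drops every element outside one namespace, with its subtree. */
class ForeignElementFilter extends XMLFilterImpl {
    private final String keptNamespace;
    private int skipDepth = 0; // > 0 while inside a foreign subtree

    ForeignElementFilter(XMLReader parent, String keptNamespace) {
        super(parent);
        this.keptNamespace = keptNamespace;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (skipDepth > 0 || !keptNamespace.equals(uri)) {
            skipDepth++; // entering, or nesting deeper into, a foreign subtree
            return;
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (skipDepth > 0) {
            skipDepth--;
            return;
        }
        super.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (skipDepth == 0) super.characters(ch, start, length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        if (skipDepth == 0) super.ignorableWhitespace(ch, start, length);
    }
}

/** Hypothetical demo: records the element names that get past the filter. */
class FilterDemo {
    static final String RELAXNG_NS = "http://relaxng.org/ns/structure/1.0";

    static List<String> elementsSeen(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        // The filter sits between the parser and the downstream handler.
        ForeignElementFilter filter = new ForeignElementFilter(reader, RELAXNG_NS);
        final List<String> seen = new ArrayList<>();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                seen.add(localName);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return seen;
    }
}
```

With the generated code, the downstream handler would be the runtime object rather than the recording handler used here: call filter.setContentHandler(runtime) and then filter.parse(inputSource).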

