RelaxNGCC Tutorial 5

$Revision: 1.2 $ by Kohsuke Kawaguchi

When you are building a non-trivial parser with RelaxNGCC, you'll often find it useful to make some kind of "context" information available to all the parsing code. Such context information can include an error handler to report problems, a global dictionary where you keep ID values, a stack that tracks xml:base attributes, or a map that keeps track of in-scope namespace bindings.

In this tutorial, I will explain how you can extend NGCCRuntime to meet those needs.
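As a rough sketch of what such context can look like, the class below keeps a list of error messages, an ID dictionary, and an xml:base stack. In a real project these fields and methods would be declared on your NGCCRuntime subclass. Here they live in a plain, hypothetical class (ParseContext, with illustrative method names such as defineId and enterElement) so that the example compiles without any generated code.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical context holder; in practice these members would go on your
// NGCCRuntime subclass (e.g. MyRuntime) so every handler can reach them via $runtime.
public class ParseContext {
    // Problems found while parsing (a real runtime might use an ErrorHandler instead).
    private final List<String> errors = new ArrayList<>();
    // Global dictionary of ID values seen so far.
    private final Set<String> ids = new HashSet<>();
    // Effective base URI for each open element; the top is the current one.
    private final Deque<URI> bases = new ArrayDeque<>();

    public ParseContext(URI documentUri) {
        bases.push(documentUri);
    }

    /** Records an ID; returns false and logs an error if it was already defined. */
    public boolean defineId(String id) {
        if (ids.add(id)) {
            return true;
        }
        errors.add("duplicate ID: " + id);
        return false;
    }

    /** Call at every start tag with the xml:base value, or null if there is none. */
    public void enterElement(String xmlBase) {
        URI current = bases.peek();
        bases.push(xmlBase == null ? current : current.resolve(xmlBase));
    }

    /** Call at every end tag so the stack stays balanced. */
    public void leaveElement() {
        bases.pop();
    }

    /** Resolves a relative reference against the current base URI. */
    public URI resolve(String reference) {
        return bases.peek().resolve(reference);
    }

    public List<String> getErrors() {
        return errors;
    }

    public static void main(String[] args) {
        ParseContext ctx = new ParseContext(URI.create("http://example.org/docs/index.xml"));
        ctx.enterElement("sub/");                 // <section xml:base="sub/">
        System.out.println(ctx.resolve("a.xml")); // resolved inside the section
        ctx.leaveElement();                       // </section>
        System.out.println(ctx.resolve("a.xml")); // resolved against the document URI
        ctx.defineId("intro");
        ctx.defineId("intro");                    // duplicate, recorded as an error
        System.out.println(ctx.getErrors());
    }
}
```

The sketch assumes the caller invokes enterElement and leaveElement on every start and end tag (passing null when an element has no xml:base attribute), which is what keeps the stack in step with the document.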

How does NGCCRuntime work?

The primary role of NGCCRuntime is to receive SAX events and choreograph the NGCCHandler-derived classes generated from the schema. For this purpose, it implements XMLFilter. In turn, each NGCCHandler class has a $runtime field, which points back to the NGCCRuntime instance in charge.
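Because NGCCRuntime implements XMLFilter, it fits into a SAX pipeline like any other filter: it sits between the XMLReader that parses the document and whatever consumes the events. The sketch below shows that standard SAX wiring with org.xml.sax.helpers.XMLFilterImpl as a stand-in (ElementCountingFilter is a hypothetical name). Where this filter merely counts elements, NGCCRuntime would dispatch each event to the current NGCCHandler.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class FilterPipelineDemo {

    // Hypothetical stand-in for NGCCRuntime: it sees every SAX event first,
    // then passes it on to the downstream ContentHandler.
    static class ElementCountingFilter extends XMLFilterImpl {
        int elements = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            elements++;
            super.startElement(uri, localName, qName, atts); // forward downstream
        }
    }

    public static int countElements(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parser = factory.newSAXParser().getXMLReader();

            ElementCountingFilter filter = new ElementCountingFilter();
            filter.setParent(parser);                       // events come from the real parser
            filter.setContentHandler(new DefaultHandler()); // downstream consumer (a no-op here)
            filter.parse(new InputSource(new StringReader(xml))); // parse through the filter
            return filter.elements;
        } catch (Exception e) {
            throw new IllegalStateException(e); // sketch: wrap parser and I/O exceptions
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements("<a><b/><c><d/></c></a>")); // prints 4
    }
}
```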

RelaxNGCC allows applications to use their own class instead of NGCCRuntime, provided that it extends NGCCRuntime. In that class you define your own methods and fields, which your NGCCHandlers can then use.

Let's see a very simple example of this. First, you define your own class by extending NGCCRuntime.

package org.acme.parser;
import org.acme.parser.gen.NGCCRuntime;

public class MyRuntime extends NGCCRuntime {
    public void sayHello() {
        System.out.println("Hello, world!");
    }
}

It's usually a good idea to use a dedicated package for the NGCC-generated files (so that you can easily delete all of them), so I assume that the generated code goes into the org.acme.parser.gen package.

Then, add a cc:runtime-type attribute to your grammar so that the compiler declares the reference to the runtime with the proper type.

<element name="root"
  xmlns="http://relaxng.org/ns/structure/1.0"
  xmlns:cc="http://www.xml.gr.jp/xmlns/relaxngcc"
  cc:runtime-type="org.acme.parser.MyRuntime"
  >
  <text/>
  <cc:java>
    $runtime.sayHello();
  </cc:java>
</element>

Each handler class is generated with the $runtime field declared as the specified type. (If you don't specify cc:runtime-type, the field is still declared, but with the default NGCCRuntime type.) Thus, to access your runtime, you simply use this field, as shown in the example.

Built-in Functionality of NGCCRuntime

The default NGCCRuntime comes with several features that are useful to many applications.

org.xml.sax.Locator getLocator()

This method returns a Locator object, which provides the current source location (such as the line number). The Locator is updated as parsing proceeds, so you need to take a snapshot of it if you want to remember a location, usually with new org.xml.sax.helpers.LocatorImpl(getLocator()).
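The following stand-alone SAX sketch (plain JDK classes, no RelaxNGCC code; the class and method names are hypothetical) illustrates why the snapshot matters: the handler copies the live Locator into a LocatorImpl at each start tag, so every recorded line number stays fixed even after the parser has moved on.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.LocatorImpl;

public class LocatorSnapshotDemo {

    static class SnapshotHandler extends DefaultHandler {
        private Locator live; // owned by the parser and updated as it reads on
        final List<Locator> snapshots = new ArrayList<>();

        @Override
        public void setDocumentLocator(Locator locator) {
            live = locator;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Copy the current position; storing 'live' itself would only show
            // wherever the parser happens to be when you look at it later.
            snapshots.add(new LocatorImpl(live));
        }
    }

    /** Returns the line number recorded for each start tag, in document order. */
    public static List<Integer> startTagLines(String xml) {
        SnapshotHandler handler = new SnapshotHandler();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        List<Integer> lines = new ArrayList<>();
        for (Locator snapshot : handler.snapshots) {
            lines.add(snapshot.getLineNumber());
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(startTagLines("<root>\n  <a/>\n  <b/>\n</root>"));
    }
}
```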

String resolveNamespacePrefix( String prefix )

This method resolves a namespace prefix to the corresponding namespace URI in the current scope. Thus it can be used to process QNames that appear in attribute values or text.
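To show what resolving a QName involves, here is a hypothetical stand-alone sketch that tracks in-scope bindings with org.xml.sax.helpers.NamespaceSupport and resolves a QName-valued attribute. The attribute name type and the {uri}local output form are illustrative assumptions, as is applying the default namespace to unprefixed values (XML Schema does this for QName values, but other vocabularies may not). Inside your own runtime or handlers you would call resolveNamespacePrefix instead of maintaining the bindings yourself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.NamespaceSupport;

public class QNameResolverDemo {

    static class QNameHandler extends DefaultHandler {
        private final NamespaceSupport bindings = new NamespaceSupport();
        private boolean contextPushed = false; // true once the next element's context exists
        final List<String> resolved = new ArrayList<>();

        @Override
        public void startPrefixMapping(String prefix, String uri) {
            // SAX reports prefix mappings before the startElement they belong to.
            if (!contextPushed) {
                bindings.pushContext();
                contextPushed = true;
            }
            bindings.declarePrefix(prefix, uri); // "" declares the default namespace
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (!contextPushed) {
                bindings.pushContext(); // element declares no new prefixes
            }
            contextPushed = false;
            String type = atts.getValue("type"); // hypothetical QName-valued attribute
            if (type != null) {
                resolved.add(resolve(type));
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            bindings.popContext(); // drop this element's bindings
        }

        // Turns "prefix:local" into "{namespaceURI}local".
        String resolve(String value) {
            int colon = value.indexOf(':');
            String prefix = colon < 0 ? "" : value.substring(0, colon);
            String local = value.substring(colon + 1);
            String nsUri = bindings.getURI(prefix);
            if (nsUri == null) {
                if (!prefix.isEmpty()) {
                    throw new IllegalArgumentException("undeclared prefix: " + prefix);
                }
                nsUri = ""; // no default namespace in scope
            }
            return "{" + nsUri + "}" + local;
        }
    }

    public static List<String> resolveTypes(String xml) {
        QNameHandler handler = new QNameHandler();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed for startPrefixMapping callbacks
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.resolved;
    }

    public static void main(String[] args) {
        String xml = "<schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                + "<field type='xs:string'/>"
                + "<field type='plain' xmlns='urn:example'/>"
                + "</schema>";
        System.out.println(resolveTypes(xml));
    }
}
```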

void redirectSubtree( ContentHandler child, String uri, String local, String qname )

This method redirects a sub-tree to another ContentHandler. The specified ContentHandler receives the sub-tree as if it were the whole document (for example, startDocument and endDocument are invoked properly). This mechanism can be used to delegate processing to another library. In the following example, a dom4j tree is built from a sub-tree.

<element name="root"
  xmlns="http://relaxng.org/ns/structure/1.0"
  xmlns:cc="http://www.xml.gr.jp/xmlns/relaxngcc"
  >
  <cc:java-body>
    private org.dom4j.io.SAXContentHandler domBuilder
      = new org.dom4j.io.SAXContentHandler();
  </cc:java-body>
  <zeroOrMore>
    <choice>
      <element name="child">
        <text/>
      </element>
      <group>
        <element>
          <anyName>
            <except>
              <nsName/>
            </except>
          </anyName>
          <cc:java>$runtime.redirectSubtree(domBuilder,uri,localName,qname);</cc:java>
          <empty/>
        </element>
        <cc:java>org.dom4j.Document result = domBuilder.getDocument();</cc:java>
      </group>
    </choice>
  </zeroOrMore>
</element>

The three variables uri, localName, and qname are locally defined variables. For now, just think of them as magic. The key here is to write the RELAX NG pattern as if the content of that element were empty. By the time your code after the </element> tag is executed, the whole sub-tree has been completely fed to the specified ContentHandler.
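The idea behind redirectSubtree can be sketched in plain SAX. The hypothetical handler below forwards every event inside the first element with a given local name to a child ContentHandler, calling startDocument before the sub-tree and endDocument after its matching end tag, and tracks nesting with a depth counter. It is not RelaxNGCC's implementation: it uses the JDK's identity TransformerHandler writing into a DOM Document in place of dom4j's SAXContentHandler, and it does not replay in-scope namespace declarations to the child, which a complete implementation would need to do.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SubtreeRedirectDemo {

    // Hypothetical redirector: sends the first element named 'target', with its content, to 'child'.
    static class SubtreeRedirector extends DefaultHandler {
        private final String target;
        private final ContentHandler child;
        private int depth = 0;        // > 0 while inside the redirected sub-tree
        private boolean done = false; // only the first matching sub-tree is redirected

        SubtreeRedirector(String target, ContentHandler child) {
            this.target = target;
            this.child = child;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if (depth == 0) {
                if (done || !localName.equals(target)) {
                    return; // outside the sub-tree: normal processing would go here
                }
                child.startDocument(); // the child sees the sub-tree as a whole document
            }
            depth++;
            child.startElement(uri, localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (depth == 0) {
                return;
            }
            child.endElement(uri, localName, qName);
            depth--;
            if (depth == 0) {
                child.endDocument(); // matching end tag reached: close the child's document
                done = true;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (depth > 0) {
                child.characters(ch, start, length);
            }
        }
    }

    /** Copies the first element named 'target' (and its content) into a new DOM Document. */
    public static Document extract(String xml, String target) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler domBuilder = tf.newTransformerHandler(); // identity: SAX in, DOM out
            domBuilder.setResult(new DOMResult(doc));

            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                    new SubtreeRedirector(target, domBuilder));
            return doc; // stays empty if no element named 'target' was found
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = extract(
                "<root><child>x</child><extra a='1'>hello <b>world</b></extra></root>", "extra");
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```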

Overriding ContentHandler methods

Unless you know what you are doing, overriding methods like startElement or characters is not recommended, because there is a delay between the time the runtime receives those events and the time a handler receives them.

Conclusion

Writing lengthy Java code inside a schema is painful and makes the schema less legible, which is yet another reason to use this mechanism. For a real-world example, see the source code of RelaxNGCC's own RELAX NG parser. Also, check the javadoc of NGCCRuntime for details.
