RelaxNGCC Tutorial 4

$Revision: 1.2 $ by Kohsuke Kawaguchi

In the previous tutorial, we saw how RelaxNGCC makes it a snap to build an AST from a RELAX NG grammar. The main drawback was the quality of the generated code. Then again, it is machine-generated code. What would you expect?

In this tutorial, you'll learn how to use RelaxNGCC to build an AST whose quality is equivalent to a hand-written one.

Combining hand-written code with machine-generated code

The technique we pursue in this tutorial is to combine hand-written code with code generated by RelaxNGCC: you write the object model yourself, then use RelaxNGCC to build a parser that reads XML documents and composes an AST from those objects.

Obviously this approach requires you to write more code, but it has the following advantages:

  1. The interface you expose will be as good as you can write. No ugly internal methods are exposed, and you can write nice javadoc comments as well.
  2. You can design a fairly complicated inheritance hierarchy among your AST nodes.
  3. If you already have AST code, you can write a parser that builds that AST.
  4. People who use your exposed interface won't even notice that you are using RelaxNGCC.
  5. Since RelaxNGCC uses SAX, parsing is memory-efficient. More importantly, you can access line number information, which is quite useful for producing human-readable messages.
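
The line number information mentioned in item 5 comes from the standard SAX Locator. The sketch below is not RelaxNGCC code; it is a plain JDK SAX handler (the class name LineNumberDemo and its scan helper are made up for illustration) that records the line reported for each start tag, which is the kind of position you could attach to an error message.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records "elementName@line" for every start tag.
class LineNumberDemo extends DefaultHandler {
    private Locator locator;
    final List<String> seen = new ArrayList<>();

    // The SAX parser hands us a Locator before the first event.
    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The Locator reports the position just after the current event's text.
        int line = (locator != null) ? locator.getLineNumber() : -1;
        seen.add(qName + "@" + line);
    }

    // Parses an XML string and returns the recorded positions.
    static List<String> scan(String xml) {
        try {
            LineNumberDemo handler = new LineNumberDemo();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.seen;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }
}
```

An AST node could store this line number at construction time so that later error messages can point back into the source document.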

To show how much cleaner the exposed interface becomes, we reuse the schema from the previous tutorial.

Hand-code your object model

First, we design our AST. We'll have Folder and File. Since both are file system objects, we also introduce FileSystemObject as their base class.

public abstract class FileSystemObject {
    protected FileSystemObject( String _name ) {
        this.name = _name;
    }
    
    private String name;
    
    public String getName() { return name; }
}

The base class defines only the name. Next we define File and Folder.

import java.io.InputStream;

public final class File extends FileSystemObject {
    public File( String name ) {
        super(name);
    }
    
    /**
     * blah blah blah
     *
     * @return something useful
     */
    public InputStream open() {
        ....
    }
    
    // other methods that you define
    .....
}

import java.util.Hashtable;
import java.util.Map;

public final class Folder extends FileSystemObject {
    public Folder( String name ) {
        super(name);
    }
    
    // files and sub-folders inside this folder.
    private final Map items = new Hashtable();
    
    public void add( FileSystemObject fso ) {
        items.put(fso.getName(),fso);
    }
    
    public FileSystemObject get( String name ) {
        return (FileSystemObject)items.get(name);
    }
    
    .....
}
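
Before wiring in a parser, it helps to confirm that the object model can be assembled with plain constructor and add calls, since that is all the parser actions will do. The following sketch uses trimmed copies of the classes above, nested in a hypothetical ObjectModelDemo class so that it compiles on its own. It also uses a generic Map, which assumes Java 7 or later and removes the cast in get.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper class holding trimmed copies of the tutorial's model.
class ObjectModelDemo {
    static abstract class FileSystemObject {
        private final String name;
        FileSystemObject(String name) { this.name = name; }
        String getName() { return name; }
    }

    static final class File extends FileSystemObject {
        File(String name) { super(name); }
    }

    static final class Folder extends FileSystemObject {
        // Files and sub-folders inside this folder, keyed by name.
        private final Map<String, FileSystemObject> items = new HashMap<>();
        Folder(String name) { super(name); }
        void add(FileSystemObject fso) { items.put(fso.getName(), fso); }
        FileSystemObject get(String name) { return items.get(name); }
    }

    // Builds a small tree by hand: a root folder holding "README" and "src/Main.java".
    static Folder sampleTree() {
        Folder root = new Folder("");
        Folder src = new Folder("src");
        src.add(new File("Main.java"));
        root.add(src);
        root.add(new File("README"));
        return root;
    }
}
```

Because items are keyed by name, adding a second object with the same name replaces the first one.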

Use RelaxNGCC to write a parser

Once you have defined your object model and decided how its objects are constructed, you use RelaxNGCC to build a parser.

<?xml version="1.0" ?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
  xmlns:c="http://www.xml.gr.jp/xmlns/relaxngcc"
  c:package="test.sample4.parser">

  <c:java-import>
    import test.sample4.*;
  </c:java-import>

  <start c:class="Sample4">
    <element name="files">
      result=<ref name="FolderContents" />("");[1]
    </element>
  </start>

  <define name="FolderContents" c:params="String name"
    c:return-type="Folder" c:return-value="folder">
    [2]
    <c:java-body>
      private Folder folder;
      private FileSystemObject child;
    </c:java-body>
    
    folder = new Folder(name);
    <zeroOrMore>
      <choice>
        child=<ref name="File" />
        <element name="directory">
          subFolderName=<attribute name="name"/>
          child=<ref name="FolderContents" />(subFolderName);
        </element>
      </choice>
      folder.add(child);[3]
    </zeroOrMore>
  </define>
  
  <define name="File"
    c:return-type="File" c:return-value="makeResult()">[4]
    
    <c:java-body>
        private File makeResult() {
            return new File(name);
        }
    </c:java-body>
    <element name="file">
      name=<attribute name="name" />
    </element>
  </define>
</grammar>

[1] We define this Sample4 class to have the result field. When parsing completes, this field holds a reference to the parsed result.
[2] This is the heart of this parser. We take the folder name as a parameter, then build a Folder object and return it as the parsed result of this RELAX NG pattern. Three attributes specify this behavior: c:params, c:return-type, and c:return-value.
[3] This part of the code is a bit interesting. We declare the child variable to be of type FileSystemObject, so whichever branch this <choice> takes (a File or a Folder), the variable holds a reference to the result. We can therefore add the newly parsed child object to the folder variable here.
[4] It is often useful to make a function call inside the return-value attribute, as shown in this example.
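
To make clear what the annotated grammar saves you from writing, here is a rough hand-written SAX equivalent of the same actions. This is not the code RelaxNGCC generates and it uses none of its runtime; HandWrittenBuilder, its parse helper, and the trimmed nested model classes are all hypothetical stand-ins. It keeps an explicit stack of open folders, which is the bookkeeping that the recursive FolderContents pattern handles for you.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical hand-written builder; NOT RelaxNGCC output.
class HandWrittenBuilder extends DefaultHandler {
    // Trimmed stand-ins for the tutorial's object model.
    static abstract class FileSystemObject {
        private final String name;
        FileSystemObject(String name) { this.name = name; }
        String getName() { return name; }
    }
    static final class File extends FileSystemObject {
        File(String name) { super(name); }
    }
    static final class Folder extends FileSystemObject {
        private final Map<String, FileSystemObject> items = new HashMap<>();
        Folder(String name) { super(name); }
        void add(FileSystemObject fso) { items.put(fso.getName(), fso); }
        FileSystemObject get(String name) { return items.get(name); }
    }

    // Folders whose end tag has not been seen yet; innermost on top.
    private final Deque<Folder> open = new ArrayDeque<>();
    private Folder result;

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        switch (qName) {
            case "files":     open.push(new Folder("")); break;
            case "directory": open.push(new Folder(atts.getValue("name"))); break;
            case "file":      open.peek().add(new File(atts.getValue("name"))); break;
            default:          break;
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (qName.equals("directory")) {
            Folder done = open.pop();   // finished sub-folder
            open.peek().add(done);      // attach it to its parent
        } else if (qName.equals("files")) {
            result = open.pop();        // the root folder is the parse result
        }
    }

    // Parses an XML string and returns the root Folder.
    static Folder parse(String xml) {
        try {
            HandWrittenBuilder handler = new HandWrittenBuilder();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }
}
```

This sketch performs no validation: a document that does not match the schema may fail with a NullPointerException or be silently misread.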

Conclusion

This technique is so powerful that RelaxNGCC itself uses it to parse RELAX NG grammars. To see how it works, have a look at the src/relaxngcc/parser/relaxng.rng file.

