RelaxNGCC Tutorial 3

$Revision: 1.2 $ by Kohsuke Kawaguchi

So far, we've seen how to use RelaxNGCC to develop a simple program that works on XML documents. In many real-world applications, it is quite useful to build a so-called "abstract syntax tree" (AST). An AST is somewhat like DOM, but it is better than DOM in that it gives you typed access to your data.

An AST is what you get from other data-binding tools such as Relaxer, Castor, or JAXB. In this example, we will see how RelaxNGCC can be used to build one.
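To make "typed access" concrete, the sketch below contrasts reading a file name through DOM with reading it through a typed AST node. The FileNode class is a hypothetical hand-written stand-in, not RelaxNGCC output; only the DOM calls (javax.xml.parsers, org.w3c.dom) are real JDK APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

public class DomVsAst {
    // Hypothetical typed AST node, written by hand for illustration only.
    static final class FileNode {
        private final String name;
        FileNode(String name) { this.name = name; }
        String getName() { return name; }
    }

    // DOM access: element and attribute names are plain strings, checked only at run time.
    static String firstFileNameViaDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element file = (Element) doc.getElementsByTagName("file").item(0);
            // A misspelled attribute name here still compiles and simply yields "".
            return file.getAttribute("name");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    // Typed access: the compiler verifies that getName() exists and returns a String.
    static String firstFileNameViaAst(FileNode file) {
        return file.getName();
    }

    public static void main(String[] args) {
        System.out.println(firstFileNameViaDom("<files><file name='a.txt'/></files>")); // a.txt
        System.out.println(firstFileNameViaAst(new FileNode("a.txt")));                 // a.txt
    }
}
```

With DOM, a missing or misspelled attribute only shows up as an empty string at run time; with a typed node, the same mistake is a compile error.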

In this example, we will also see a text-based syntax that lets you omit c:alias attributes and c:java elements. It makes the grammar invalid as RELAX NG (because RELAX NG doesn't allow arbitrary text inside patterns), but it saves you a lot of typing.

Grammar

We will use the same sample schema as in the previous tutorial, but this time we will turn it into an AST that consists of File and Dir objects.

Since RelaxNGCC generates one class per define block, you will usually need to restructure a grammar to get a good AST. In the following example, the grammar is refactored so that the Dir and File classes come out of it.

<?xml version="1.0" ?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
  xmlns:c="http://www.xml.gr.jp/xmlns/relaxngcc"
  c:package="test.sample3">

  <c:java-import>
    import java.util.*;
  </c:java-import>

  <start c:class="Sample3">
    <element name="files">
      result=<ref name="Dir" />("");[1]
    </element>
  </start>

  <define name="Dir" c:params="String name">
    <c:java-body>
      private final Set files = new HashSet();
      private final Set dirs = new HashSet();
      
      // accessor methods
      public Iterator iterateFiles() { return files.iterator(); }
      public Iterator iterateSubDirs() { return dirs.iterator(); }
    </c:java-body>
    <zeroOrMore>
      <choice>
        <group>
          f=<ref name="File" />
          files.add(f);[2]
        </group>
        <element name="directory">
          dn= <attribute name="name"/>
          d = <ref name="Dir" />(dn/*dir name*/);
          dirs.add(d);
        </element>
      </choice>
    </zeroOrMore>
  </define>
  
  <define name="File">
    [3]
    <c:java-body>
      private String name;
      public String getName() { return name; }
    </c:java-body>
    <element name="file">
      name=<attribute name="name" />
    </element>
  </define>
</grammar>
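To see what the refactoring buys you, here is a hand-written Java approximation of the Dir and File objects this grammar is meant to produce, plus client code that walks the tree using only the accessor methods declared in the grammar's c:java-body. It is a sketch, not RelaxNGCC output: the generated classes also contain parser machinery, and the addFile/addSubDir helpers and Dir.getName() are hypothetical stand-ins for the actions files.add(f) and dirs.add(d) and for the c:params value.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class AstWalk {
    // Hand-written stand-in for the class generated from <define name="File">.
    static final class File {
        private final String name;              // as in [3]: private field with a getter
        File(String name) { this.name = name; }
        public String getName() { return name; }
    }

    // Hand-written stand-in for the class generated from <define name="Dir">.
    static final class Dir {
        private final String name;              // hypothetical: keeps the c:params value
        private final Set<File> files = new HashSet<>();
        private final Set<Dir> dirs = new HashSet<>();

        Dir(String name) { this.name = name; }

        // Hypothetical helpers playing the role of the actions files.add(f) and dirs.add(d).
        Dir addFile(File f) { files.add(f); return this; }
        Dir addSubDir(Dir d) { dirs.add(d); return this; }

        // Same accessor methods as in the grammar's <c:java-body>.
        public Iterator<File> iterateFiles() { return files.iterator(); }
        public Iterator<Dir> iterateSubDirs() { return dirs.iterator(); }
        public String getName() { return name; }
    }

    // Client code: count every file in the tree through the typed accessors only.
    static int countFiles(Dir dir) {
        int count = 0;
        Iterator<File> files = dir.iterateFiles();
        while (files.hasNext()) {
            files.next();
            count++;
        }
        Iterator<Dir> subDirs = dir.iterateSubDirs();
        while (subDirs.hasNext()) {
            count += countFiles(subDirs.next());
        }
        return count;
    }

    public static void main(String[] args) {
        // Roughly the tree for:
        // <files><file name="a"/><directory name="src"><file name="b"/></directory></files>
        Dir root = new Dir("")
                .addFile(new File("a"))
                .addSubDir(new Dir("src").addFile(new File("b")));
        System.out.println(countFiles(root)); // 2
    }
}
```

The client never touches element or attribute names; it only sees Dir, File, and their iterators, which is the point of restructuring the grammar around these two defines.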

Text-based syntax

[1] shows how the text-based forms of c:alias and c:with-params work. It is treated as if you had written:

  <ref name="Dir" c:alias="result" c:with-params='""' />

[2] is the text-based form of the c:java element; it stands for <c:java>files.add(f);</c:java>. [2] also highlights a pitfall. When you write <c:java> inside <choice> or <interleave>, you need to wrap it in a <group> together with the pattern it belongs to. Otherwise RelaxNGCC cannot determine whether the action should be executed after <ref name="File"/> is matched or before the <directory> element is found. The extra <group> makes it clear when the action is executed.

The text-based syntax is a new, experimental feature and may change in the future. Also note that you still need to use the <c:java-import> and <c:java-body> elements.

Abstract Syntax Tree

Whenever you define an alias, a corresponding Java field is declared in the Java class. This makes RelaxNGCC very convenient for quickly building an AST. For example, the following fragment gives you a class with three fields: name, email, and zipCode.

<define name="address" c:alias="ASTAddress">
  <element name="address">
    name=<attribute name="name"/>
    <optional>
      email=<attribute name="e-mail"/>
    </optional>
    <optional>
      zipCode=<attribute name="zip-code"/>
    </optional>
  </element>
</define>
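As an illustration only, the fields introduced by the three aliases can be pictured as the hand-written class below. The class name AddressFields and its fill method are hypothetical, and the exact declarations RelaxNGCC emits may differ. The point is that an alias inside <optional> is simply never assigned when the attribute is absent, so its field presumably keeps Java's default value, null.

```java
import java.util.Map;

public class AddressFields {
    // One field per alias (assumed to be declared as String).
    public String name;
    public String email;     // inside <optional>: stays null if e-mail is absent
    public String zipCode;   // inside <optional>: stays null if zip-code is absent

    // Hypothetical stand-in for the parser: each alias is an assignment that
    // only happens when the corresponding attribute is actually present.
    static AddressFields fill(Map<String, String> attributes) {
        AddressFields a = new AddressFields();
        a.name = attributes.get("name");
        if (attributes.containsKey("e-mail")) {
            a.email = attributes.get("e-mail");
        }
        if (attributes.containsKey("zip-code")) {
            a.zipCode = attributes.get("zip-code");
        }
        return a;
    }

    public static void main(String[] args) {
        AddressFields a = fill(Map.of("name", "Alice", "zip-code", "123-4567"));
        System.out.println(a.name + " " + a.email + " " + a.zipCode); // Alice null 123-4567
    }
}
```

Code that consumes such a class should therefore check the optional fields for null before using them.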

Sometimes you don't want to expose values directly to other classes. [3] shows how you can make a field private and still give other classes access to it. If you declare a variable with the same name yourself, RelaxNGCC won't declare it again. This way, you can change the access modifier or the type of a variable.

If you need to store values in a collection, as in [2], you first need to capture each value in a variable (the alias) and then add it to the collection in a Java action.
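Why the explicit add is needed can be sketched in plain Java, assuming (as described above) that the alias f becomes a single field that is reassigned on every match inside <zeroOrMore>. The class AliasThenCollect and its onFileMatched callback are hypothetical, and Strings stand in for File objects; a List is used instead of the grammar's Set only so the printed order is predictable.

```java
import java.util.ArrayList;
import java.util.List;

public class AliasThenCollect {
    // Stand-in for the field declared by the alias f=<ref name="File"/> (assumption).
    String f;
    // Stand-in for the collection declared in <c:java-body>.
    final List<String> files = new ArrayList<>();

    // Hypothetical callback for one match of the <group> in [2]:
    // the alias assignment runs first, then the text-based action files.add(f).
    void onFileMatched(String parsed) {
        f = parsed;     // the alias field is overwritten on every iteration
        files.add(f);   // so the action must copy the value into the collection right away
    }

    public static void main(String[] args) {
        AliasThenCollect dir = new AliasThenCollect();
        for (String name : new String[] {"a", "b", "c"}) {
            dir.onFileMatched(name);
        }
        System.out.println(dir.f);      // c (only the last match survives in the field)
        System.out.println(dir.files);  // [a, b, c]
    }
}
```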

Conclusion

The text-based syntax makes the source grammar easier to read and write. The source grammar is no longer a valid RELAX NG grammar, but you can write a simple XSLT transformation that strips away all that text. If you are not working on an open-source project, you need to do this anyway, since the grammar will be shipped with your binary.
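The stripping step can be sketched in Java with the JDK's built-in XSLT processor (javax.xml.transform). This is a minimal sketch under our own assumptions, not a RelaxNGCC tool: the stylesheet copies everything except text nodes directly inside RELAX NG elements, and keeps text inside <value>, <param>, and <name>, where RELAX NG does allow it. The class name StripText and the sample grammar are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StripText {
    // Identity transform plus one empty template that drops text directly inside
    // RELAX NG elements (except value, param, and name). RelaxNGCC elements such as
    // <c:java-body> are in another namespace, so their Java code is copied unchanged.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:rng='http://relaxng.org/ns/structure/1.0'>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match='rng:*[not(self::rng:value or self::rng:param or self::rng:name)]/text()'/>"
        + "</xsl:stylesheet>";

    static String strip(String grammar) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(grammar)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String grammar =
            "<grammar xmlns='http://relaxng.org/ns/structure/1.0'"
            + " xmlns:c='http://www.xml.gr.jp/xmlns/relaxngcc'>"
            + "<define name='File'>"
            + "<c:java-body>private String name;</c:java-body>"
            + "<element name='file'>name=<attribute name='name'/></element>"
            + "</define></grammar>";
        // The text "name=" disappears; the <c:java-body> content is kept.
        System.out.println(strip(grammar));
    }
}
```

If you also want to remove RelaxNGCC's own annotations before shipping, you could declare the c namespace in the stylesheet and add empty templates matching c:* elements and @c:* attributes.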

RelaxNGCC greatly simplifies building an AST from XML. The resulting AST can be used by other components of your application, which makes reading XML documents really painless. The downside of this approach is that the resulting AST is not as clean as a hand-written one, because the generated classes have many other methods and fields that are meaningless to the code that uses the AST.

Using other data-binding tools will usually give you this kind of AST, but RelaxNGCC gives you much more flexibility.

