Geeks With Blogs
David Douglass .NET on My Mind

In some situations it is helpful to maintain an XML document using a class that mimics the structure of the XML document.  If the document has a schema there shouldn’t be any reason to code the class by hand.  If the schema is well written, then everything necessary to generate the class is in the schema.  You could then write code along the lines of:

MyDoc doc = new MyDoc(documentPath);
doc.Collection["key"].Name = "Joe";
doc.Save();

There are three code generators I know of for generating these classes.  Unfortunately, all of them have serious limitations, and none will let you write the simple code above.

All the code generators fail to generate code for key access to elements with multiple occurrences.  Instead, there is an assumption that the document order position is known; access is strictly by index number.  If you want to find something by key, you have to loop through the data and look for it.  Also, the generated classes usually can't be extended through inheritance because returned references would be to the base class, not the derivation.
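The loop that index-only access forces on you can be sketched as follows.  Item, Key, Name and the helper methods are hypothetical stand-ins, not the output of any of the tools discussed here; the dictionary is just one way to get back to the doc.Collection["key"] style of access.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a generated class; real generator output differs.
public class Item
{
    public string Key;
    public string Name;
}

public static class KeyLookup
{
    // With index-only access, finding a key means scanning the array.
    public static Item FindByKey(Item[] items, string key)
    {
        foreach (Item item in items)
        {
            if (item.Key == key)
            {
                return item;
            }
        }
        return null; // no element carries that key
    }

    // Building a dictionary once gives keyed access afterwards.
    public static Dictionary<string, Item> IndexByKey(Item[] items)
    {
        Dictionary<string, Item> index = new Dictionary<string, Item>();
        foreach (Item item in items)
        {
            index[item.Key] = item; // a later duplicate key overwrites an earlier one
        }
        return index;
    }
}

public static class KeyLookupDemo
{
    public static void Main()
    {
        Item[] items =
        {
            new Item { Key = "a", Name = "Ann" },
            new Item { Key = "b", Name = "Bob" }
        };

        KeyLookup.FindByKey(items, "b").Name = "Joe";
        Console.WriteLine(KeyLookup.IndexByKey(items)["b"].Name); // prints "Joe"
    }
}
```

The dictionary holds references to the same objects as the array, so a change made through either one shows up in the other.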

The best known of the code generators is xsd.  This is a command line tool that ships with the .NET SDK and Visual Studio.  If you've ever created a typed DataSet then Visual Studio ran xsd in the background for you.  Xsd generates classes in two styles, DataSet (i.e., a strongly typed DataSet) and Class.

Xsd in DataSet mode assumes your schema models a relational database.  This creates a naming restriction because xsd assumes that since a relational table can be defined only once in a database, a name will be used only once in a schema.  Thus, a schema for the following document:

<A>
      <B>
            <C>
                  <D />
            </C>
      </B>
      <C>
            <D />
      </C>
</A>

can't be used because C is viewed as being defined twice.  Even if you write your schema to avoid this problem, xsd DataSets have other limitations:

·        The hierarchy is flattened.  This means that all non-leaf elements are connected to the root element.  You can access root.B or root.C, but not root.B.C.

·        The code is bloated.  This has a minor impact on performance and makes using IntelliSense difficult as you wade through many functions and properties that are useless in this case.
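The flattening can be seen even with a plain, untyped DataSet.  The sketch below builds the tables by hand rather than running xsd, and the table, column and relation names are assumptions made for illustration.

```csharp
using System;
using System.Data;

public static class FlatDataSetDemo
{
    // Builds an untyped DataSet shaped like a parent B with nested C rows.
    // All names here are illustrative assumptions, not xsd output.
    public static DataSet Build()
    {
        DataSet ds = new DataSet("A");

        DataTable b = ds.Tables.Add("B");
        b.Columns.Add("B_Id", typeof(int));
        b.PrimaryKey = new DataColumn[] { b.Columns["B_Id"] };

        DataTable c = ds.Tables.Add("C");
        c.Columns.Add("C_Id", typeof(int));
        c.Columns.Add("B_Id", typeof(int));
        c.Columns.Add("Name", typeof(string));
        c.PrimaryKey = new DataColumn[] { c.Columns["C_Id"] };

        // The nesting survives only as a relation between two sibling tables.
        ds.Relations.Add("B_C", b.Columns["B_Id"], c.Columns["B_Id"]);

        b.Rows.Add(1);
        c.Rows.Add(10, 1, "first");
        c.Rows.Add(11, 1, "second");
        return ds;
    }

    public static void Main()
    {
        DataSet ds = Build();

        // Every table hangs directly off the DataSet, whatever its depth in the XML.
        Console.WriteLine(ds.Tables["C"].Rows.Count); // prints 2

        // Walking from a B row to its C rows goes through the relation.
        DataRow parent = ds.Tables["B"].Rows.Find(1);
        DataRow[] children = parent.GetChildRows("B_C");
        Console.WriteLine(children.Length); // prints 2
    }
}
```

Rows.Find works here because each table has a primary key, which is the kind of keyed lookup the other generators lack.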

One real asset of strongly typed DataSets is their understanding of XML Schema data types.  For example, even though an XML Schema duration is almost the same thing as a .NET TimeSpan, the representations are wildly different.  The strongly typed DataSet is the only generated code to handle this correctly.  Also, the strongly typed DataSet has built-in support for accessing the file system.  All the other code generators are based on setting up an XmlSerializer and StreamReader.
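To make the difference in representation concrete, here is a small sketch using System.Xml.XmlConvert, which converts between the two forms.  The helper names are mine, and this is not a claim about what the typed DataSet does internally.

```csharp
using System;
using System.Xml;

public static class DurationDemo
{
    // Parses an xs:duration lexical value such as "P1DT2H" into a TimeSpan.
    public static TimeSpan FromSchemaDuration(string text)
    {
        return XmlConvert.ToTimeSpan(text);
    }

    // Formats a TimeSpan in the xs:duration lexical form.
    public static string ToSchemaDuration(TimeSpan value)
    {
        return XmlConvert.ToString(value);
    }

    public static void Main()
    {
        TimeSpan span = FromSchemaDuration("P1DT2H"); // 1 day, 2 hours
        Console.WriteLine(span);                       // prints "1.02:00:00", the .NET form
        Console.WriteLine(ToSchemaDuration(span));     // the schema form

        // TimeSpan's own parser does not accept the schema form.
        TimeSpan parsed;
        Console.WriteLine(TimeSpan.TryParse("P1DT2H", out parsed)); // prints "False"
    }
}
```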

Xsd can also be used in Class mode.  This creates very lightweight classes: a container class for the leaf nodes and arrays of the container classes, with the appropriate XML serialization attributes applied.  Because the classes are so simple, you can effectively derive from them to add key lookup functionality.  Unfortunately, all data types are represented as strings, so after modifying the document you'll need to validate it and deal with any errors.  The xsd classes alter the document hierarchy in the object model, which can be confusing when using IntelliSense.  Support for adding and removing elements is weak; it is done by direct manipulation of simple arrays.
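A rough sketch of what working with this style of class looks like follows.  People, Person and the helper methods are hand-written assumptions in the spirit of the description, not actual xsd output, and the document is read from a string rather than a file to keep the example self-contained.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical classes in the style described; real xsd output differs in detail.
[XmlRoot("People")]
public class People
{
    [XmlElement("Person")]
    public Person[] Items;
}

public class Person
{
    [XmlAttribute("key")]
    public string Key;

    [XmlElement("Name")]
    public string Name;
}

public static class ClassModeDemo
{
    public static People Load(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(People));
        using (StringReader reader = new StringReader(xml))
        {
            return (People)serializer.Deserialize(reader);
        }
    }

    public static string Save(People doc)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(People));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, doc);
            return writer.ToString();
        }
    }

    // Adding an element means growing the array by hand.
    public static void Append(People doc, Person person)
    {
        Person[] items = doc.Items ?? new Person[0];
        Array.Resize(ref items, items.Length + 1);
        items[items.Length - 1] = person;
        doc.Items = items;
    }

    public static void Main()
    {
        People doc = Load("<People><Person key=\"a\"><Name>Ann</Name></Person></People>");
        Append(doc, new Person { Key = "b", Name = "Joe" });
        Console.WriteLine(doc.Items.Length); // prints 2

        People reloaded = Load(Save(doc));
        Console.WriteLine(reloaded.Items[1].Name); // prints "Joe"
    }
}
```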

Microsoft has released an unsupported tool, XsdObjectGen (http://www.microsoft.com/downloads/details.aspx?familyid=89E6B1E5-F66C-4A4D-933B-46222BB01EB0&displaylang=en), to address the shortcomings of xsd.  XsdObjectGen generates classes that follow the document hierarchy closely.  There is support for adding and removing elements, but adding elements requires more code than when using a strongly typed DataSet.  All data types are represented by strings and post-modification validation is required.  Because of a bug in IntelliSense and the way XsdObjectGen creates code, IntelliSense sometimes doesn't list everything that it should.  You wind up in a situation where you just have to know what is actually allowed.

Dingo (http://dingo.sourceforge.net/) is an open source project to generate .NET code from schemas.  It was the only tool to generate code that didn't compile.  When Dingo encounters a data type it doesn't understand (such as a duration) it simply passes it on to .NET as a C# data type.  I had to add using statements (e.g., using duration = System.TimeSpan;) to several files to get the code to compile.  I also had to modify the namespace declarations.  Dingo passes the target namespace directly from the schema to the C# code.  If your schema has targetNamespace="urn:test", then the C# code gets a line namespace urn:test { (notice the :).  Even after these fixes, Dingo was unable to deserialize the instance document because it tried to load an XML Schema duration into a .NET TimeSpan, which caused the deserialization to fail.  Dingo does not handle non-repeating elements correctly.  If an element can occur only once in a document (as per the schema) it is still stored in an array and must be accessed using array syntax.  Adding and removing nodes isn't directly supported; arrays must be directly manipulated.
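One common way around the duration problem with XmlSerializer is to serialize a string property and convert it with XmlConvert.  The sketch below shows that workaround; it is not what Dingo generates, and the Job class and its member names are made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical class; Job and its members are illustrative names.
public class Job
{
    // The value the program works with; hidden from the serializer.
    [XmlIgnore]
    public TimeSpan Length;

    // The value the serializer sees, kept in xs:duration lexical form.
    [XmlElement("Length")]
    public string LengthText
    {
        get { return XmlConvert.ToString(Length); }
        set { Length = XmlConvert.ToTimeSpan(value); }
    }
}

public static class DurationSurrogateDemo
{
    public static Job Load(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Job));
        using (StringReader reader = new StringReader(xml))
        {
            return (Job)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        Job job = Load("<Job><Length>PT1H30M</Length></Job>");
        Console.WriteLine(job.Length.TotalMinutes); // prints 90
    }
}
```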

My overall conclusion is that you are probably best off with strongly typed DataSets.  They are the most full featured and robust.  You'll lose some flexibility in designing your schemas, and the programming can be a bit clumsy, but this seems better than the alternatives.

Posted on Friday, April 14, 2006 1:55 PM


Comments on this post: Adventures in Code Generation Land


Copyright © David Douglass | Powered by: GeeksWithBlogs.net