RE: generate common xml shema from multiple xml instances
> I further see following issues with the usefulness of XML to
> XSD conversion tools.
>
> 1) Suppose a following element exists in the XML document.
>
> <color>RED</color>
>
> How would the "XML to Schema" conversion tool guess, that the
> element "color" represents a "visual attribute of things" and
> generate a simple type declaration like below:
>
> <xs:simpleType name="Color">
>   <xs:restriction base="xs:string">
>     <xs:enumeration value="RED" />
>     <xs:enumeration value="GREEN" />
>     <xs:enumeration value="YELLOW" />
>   </xs:restriction>
> </xs:simpleType>
>
> Which the Schema author may want to do.
>
> In the absence of this semantic intelligence, the Schema
> generation tool may generate a Schema declaration like following:
>
> <xs:element name="color" type="xs:string" />

Of course the tool can't have any semantic intelligence, but it's very easy to implement a heuristic that will generate an enumeration in most cases where it is appropriate. Saxon's DTDGenerator does it if the number of distinct values of an attribute is less than 20, and the number of instances of the attribute is more than 3 times the number of distinct values and more than 10. No heuristic like this will get the right answer every time, but this isn't an exercise in getting the right answer; it's an exercise in getting a schema that is sufficiently useful as a starting point for hand-tuning.

> 2) It may be difficult for the tool to reuse type
> definitions. In case of structural similarities in a large
> XML document, or a set of XML documents, the tool may
> generate lot of Schema types, which the Schema author may
> like to refactor.

Yes, with a DTD generator I didn't have to tackle that one, but it's true enough that this is another challenge.
However, it's again true that it should be possible to define a simple similarity metric over two sets of values to decide whether they are sufficiently similar to justify using the same type, or indeed two types, one of which is a subtype of the other.

Incidentally, it's quite possible to use attribute and element names as another heuristic. If an attribute name starts or ends with "date" then there's a fairly good chance it holds a date.

> Though I believe, the XML to Schema conversion tools may be
> useful to quickly generate a Schema, which could be further
> enhanced and refactored by the Schema author.

Yes, a schema generated from an instance - even from a large collection of instances - is never going to be perfect. But it can be surprisingly good.

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay
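As a rough illustration of two of the heuristics discussed in this message, here is a minimal Python sketch. It is not Saxon's actual code: the function names are hypothetical, and the thresholds are simply the ones quoted above for DTDGenerator (fewer than 20 distinct values, more than 3 instances per distinct value, more than 10 instances overall). The message does not name a particular similarity metric, so Jaccard similarity over the two value sets is used here purely as an assumed example of a "simple similarity metric".

```python
# Thresholds as described in the message for the enumeration heuristic.
MAX_DISTINCT = 20    # fewer than 20 distinct values
MIN_RATIO = 3        # more than 3 instances per distinct value
MIN_INSTANCES = 10   # more than 10 instances in total


def should_enumerate(values):
    """Return True if the observed values look like an enumeration.

    `values` is the list of all observed values of one attribute
    (hypothetical helper, not part of any real tool).
    """
    distinct = len(set(values))
    total = len(values)
    return (
        0 < distinct < MAX_DISTINCT
        and total > MIN_RATIO * distinct
        and total > MIN_INSTANCES
    )


def jaccard_similarity(values_a, values_b):
    """Assumed metric: |A intersect B| / |A union B| over distinct values."""
    a, b = set(values_a), set(values_b)
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    colors = ["RED", "GREEN", "YELLOW"] * 4      # 12 instances, 3 distinct
    ids = [f"id{i}" for i in range(12)]          # 12 instances, 12 distinct

    print(should_enumerate(colors))              # True
    print(should_enumerate(ids))                 # False
    print(jaccard_similarity(colors, ["RED", "GREEN", "BLUE"]))  # 0.5
```

A generator could compare the similarity score against a cut-off of its own choosing to decide whether two attributes should share one type; any such cut-off would need the same kind of hand-tuning as the generated schema itself.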