
Re: generate common xml schema from multiple xml instances

  • From: Rita Shen <shaledova@gmail.com>
  • To: xml-dev@lists.xml.org
  • Date: Fri, 19 Jun 2009 11:40:56 +1000

Hi, thank you all for the help!

I tried the method Michael suggested. My project only requires that every instance be valid against the generated schema, so this method worked well for me.

Checking the results, I found that I actually got more simple type declarations (like the one Mukul mentioned in his email):

> <xs:simpleType name="Color">
>   <xs:restriction base="xs:string">
>     <xs:enumeration value="RED" />
>     <xs:enumeration value="GREEN" />
>     <xs:enumeration value="YELLOW" />
>   </xs:restriction>
> </xs:simpleType>

than a simple element declaration like:
> <xs:element name="color" type="xs:string" />

But this does not work well in some cases, for example for an element like date or weight :-)

Cheers,
Rita

On Thu, Jun 18, 2009 at 6:33 PM, Michael Kay <mike@saxonica.com> wrote:
> I further see the following issues with the usefulness of XML to
> XSD conversion tools.
>
> 1) Suppose the following element exists in the XML document.
>
> <color>RED</color>
>
> How would the "XML to Schema" conversion tool guess that the
> element "color" represents a "visual attribute of things" and
> generate a simple type declaration like the one below:
>
> <xs:simpleType name="Color">
>   <xs:restriction base="xs:string">
>     <xs:enumeration value="RED" />
>     <xs:enumeration value="GREEN" />
>     <xs:enumeration value="YELLOW" />
>   </xs:restriction>
> </xs:simpleType>
>
> Which the Schema author may want to do.
>
> In the absence of this semantic intelligence, the Schema
> generation tool may generate a Schema declaration like the following:
>
> <xs:element name="color" type="xs:string" />

Of course the tool can't have any semantic intelligence, but it's very easy
to implement a heuristic that will generate an enumeration in most cases
where it is appropriate. Saxon's DTDGenerator does it if the number of
distinct values of an attribute is less than 20, and the number of instances
of the attribute is more than 3 times the number of distinct values and more
than 10. No heuristic like this will get the right answer every time, but
this isn't an exercise in getting the right answer, it's an exercise in
getting a schema that is sufficiently useful as a starting point for
hand-tuning.
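
As a rough illustration of the heuristic described above, here is a minimal Python sketch. It is not Saxon's actual DTDGenerator code; the function name `looks_like_enumeration` and its parameter names are made up for this example, and only the three thresholds (fewer than 20 distinct values, more than 3 times as many instances as distinct values, more than 10 instances) come from the description.

```python
from collections import Counter


def looks_like_enumeration(values, max_distinct=20, min_ratio=3, min_instances=10):
    """Guess whether the observed values of one attribute form an enumeration.

    Hypothetical helper: the thresholds follow the heuristic described in the
    text, everything else is an assumption made for illustration.
    """
    values = list(values)
    distinct = len(Counter(values))  # number of distinct observed values
    total = len(values)              # number of attribute instances seen
    return (
        distinct < max_distinct
        and total > min_ratio * distinct
        and total > min_instances
    )


# 12 instances of 3 distinct values: passes all three conditions.
print(looks_like_enumeration(["RED", "GREEN", "YELLOW"] * 4))  # True
# 9 instances of 3 distinct values: too few instances to be confident.
print(looks_like_enumeration(["RED", "GREEN", "YELLOW"] * 3))  # False
```

With thresholds like these, a free-text value such as a date or a weight is rejected once it shows many distinct values, but a small sample can still fool the test, which is why the result is only a starting point.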

>
> 2) It may be difficult for the tool to reuse type
> definitions. In case of structural similarities in a large
> XML document, or a set of XML documents, the tool may
> generate a lot of Schema types, which the Schema author may
> like to refactor.

Yes, with a DTD generator I didn't have to tackle that one, but it's true
enough that this is another challenge. However, it's again true that it
should be possible to define a simple similarity metric over two sets of
values to decide whether they are sufficiently similar to justify using the
same type, or indeed two types one of which is a subtype of the other.
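
One possible shape for such a similarity metric is sketched below. The choice of Jaccard similarity, the 0.8 threshold, and the names `jaccard` and `relate_types` are all assumptions made for this example; the message above does not say which metric to use.

```python
def jaccard(a, b):
    """Jaccard similarity of two value sets: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


def relate_types(a, b, threshold=0.8):
    """Classify how the value sets of two candidate types relate.

    Hypothetical helper; the threshold is an arbitrary illustrative value.
    Returns "same", "subtype" (a is a proper subset of b),
    "supertype" (b is a proper subset of a), or "distinct".
    """
    a, b = set(a), set(b)
    if jaccard(a, b) >= threshold:
        return "same"       # similar enough to share one type
    if a < b:
        return "subtype"    # a could be a restriction of b
    if b < a:
        return "supertype"  # b could be a restriction of a
    return "distinct"


print(relate_types({"RED", "GREEN"}, {"RED", "GREEN", "YELLOW", "BLUE"}))  # subtype
print(relate_types({"RED", "GREEN"}, {"kg", "lb"}))                        # distinct
```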

Incidentally, it's quite possible to use attribute and element names as
another heuristic. If an attribute name starts or ends in "date" then
there's a fairly good chance it holds a date.
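
A minimal sketch of the name-based heuristic follows. The names `name_suggests_date`, `is_iso_date`, and `guess_type` are hypothetical. Confirming the guess against the observed values is an extra step assumed here, not something the message prescribes, and the check only covers the plain YYYY-MM-DD form of xs:date, ignoring the optional timezone suffix.

```python
import re
from datetime import date

_ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def name_suggests_date(name):
    """True if the element or attribute name starts or ends in "date"."""
    lowered = name.lower()
    return lowered.startswith("date") or lowered.endswith("date")


def is_iso_date(text):
    """True if text is a valid calendar date written as YYYY-MM-DD."""
    match = _ISO_DATE.fullmatch(text.strip())
    if match is None:
        return False
    try:
        date(*(int(part) for part in match.groups()))
        return True
    except ValueError:  # e.g. month 13 or February 30
        return False


def guess_type(name, values):
    """Return "xs:date" only if both the name and every value agree."""
    values = list(values)
    if name_suggests_date(name) and values and all(is_iso_date(v) for v in values):
        return "xs:date"
    return "xs:string"


print(guess_type("birthDate", ["2009-06-19", "2009-06-18"]))  # xs:date
print(guess_type("update", ["yes", "no"]))                    # xs:string
```

The second call shows why the name alone is only a hint: "update" ends in "date" but does not hold one.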

>
> Though I believe the XML to Schema conversion tools may be
> useful to quickly generate a Schema, which could be further
> enhanced and refactored by the Schema author.
>

Yes, a schema generated from an instance - even from a large collection of
instances - is never going to be perfect. But it can be surprisingly good.

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay



