RE: 3 approaches to structure lists, plus an analysis of each
There is a vast literature on this subject; see, for example,
http://www.oasis-open.org/committees/sc_home.php?wg_abbrev=ubl-clsc

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Costello, Roger L. [mailto:costello@m...]
> Sent: 14 February 2009 22:41
> To: 'xml-dev@l...'
> Subject: 3 approaches to structure lists, plus an
> analysis of each approach
>
>
> Hi Folks,
>
> What are the different approaches to structuring lists? What
> are the pros and cons of each approach? Is there a way to
> structure lists to maximize their utility and minimize their overhead?
>
> The purpose of this message is to document and analyze
> several approaches to structuring lists. I use a "country list"
> to illustrate the different approaches.
>
> ASSERTION: LISTS THAT CAN BE USED FOR MULTIPLE PURPOSES ARE GOOD
>
> Lists should be structured in a way that they can be used for
> multiple purposes. For example, a country list may be:
>
> - used as values in an XForms pick list.
>
> - transformed into a document that contains, for each country,
>   sales figures (or death rates, births, political leadership,
>   religions, etc.).
>
> - used to validate an element's content, e.g. the value of the
>   <country-visited> element must be a country.
>
> Those are only a few of the myriad uses of a country list. A
> well-designed country list should support all of them.
>
>
> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> THREE APPROACHES
> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>
> Below I show three approaches to structuring lists. Other
> approaches are possible, such as comma-separated values.
>
> I illustrate the three approaches using the country list
> example and then follow with an analysis of each approach.
>
>
> APPROACH #1: Express lists using the XML Schema vocabulary:
>
> ---------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
>            targetNamespace="http://www.countries.org"
>            xmlns="http://www.countries.org"
>            elementFormDefault="qualified">
>
>     <xs:element name="countries" type="countriesType" />
>
>     <xs:simpleType name="countriesType">
>         <xs:restriction base="xs:string">
>             <xs:enumeration value="Afghanistan"/>
>             <xs:enumeration value="Albania"/>
>             <xs:enumeration value="Algeria"/>
>             ...
>         </xs:restriction>
>     </xs:simpleType>
> </xs:schema>
> ---------------------------------------------
>
>
> APPROACH #2: Express lists using the RELAX NG vocabulary:
>
> ---------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <grammar xmlns="http://relaxng.org/ns/structure/1.0"
>          ns="http://www.countries.org">
>
>     <define name="countriesElement">
>         <element name="countries">
>             <ref name="countriesType" />
>         </element>
>     </define>
>
>     <define name="countriesType">
>         <choice>
>             <value>Afghanistan</value>
>             <value>Albania</value>
>             <value>Algeria</value>
>             ...
>         </choice>
>     </define>
> </grammar>
> ---------------------------------------------
>
>
> APPROACH #3: Express lists using domain-specific
> vocabularies. The markup comes from terminology used by
> Subject Matter Experts (SMEs):
>
> ---------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <countries xmlns="http://www.countries.org">
>
>     <country>Afghanistan</country>
>     <country>Albania</country>
>     <country>Algeria</country>
>     ...
> </countries>
> ---------------------------------------------
>
>
> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> ANALYSIS
> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>
>
> ANALYSIS OF APPROACH #1 AND APPROACH #2
>
> Approach #1 and approach #2 make it easy to use a list for
> validation purposes.
> A schema simply imports the list schema
> and then its values are immediately available for validating
> element content.
>
> Here is an XML Schema that imports the country list XML
> Schema and uses its simpleType as the datatype for the
> <country-visited> element:
>
> ---------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
>            targetNamespace="http://www.example.org"
>            xmlns:c="http://www.countries.org"
>            elementFormDefault="qualified">
>
>     <xs:import namespace="http://www.countries.org"
>                schemaLocation="countries.xsd" />
>
>     <xs:element name="country-visited" type="c:countriesType" />
>
> </xs:schema>
> ---------------------------------------------
>
> Here is a RELAX NG schema that includes the country list
> RELAX NG schema and references its countriesType define as
> the content of the <country-visited> element:
>
> ---------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <grammar xmlns="http://relaxng.org/ns/structure/1.0"
>          ns="http://www.example.org">
>
>     <include href="countries.rng"/>
>
>     <start>
>         <element name="country-visited">
>             <ref name="countriesType" />
>         </element>
>     </start>
>
> </grammar>
> ---------------------------------------------
>
> If the importing schema is an XML Schema, it cannot use the
> list when the list is expressed in RELAX NG, and vice versa.
>
> Although these two approaches make lists efficient to use for
> validation, it is not clear that they are the most
> efficient format for the myriad other ways that a list may be
> used (rendering in a pick list, merging with other lists,
> searching, and so forth). This is discussed further in the
> analysis of approach #3 below.
>
>
> ANALYSIS OF APPROACH #3
>
> Recall that approach #3 uses domain-specific terminology.
> This can be helpful to Subject Matter Experts (SMEs) as they
> maintain the lists.
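The processing contrast drawn in this analysis can also be illustrated outside XSLT. The following is a minimal sketch in Python, assuming only the standard-library `xml.etree.ElementTree` module; the function names are hypothetical, and the three documents are abbreviated copies of the approach #1, #2 and #3 country lists with the `...` elisions replaced by the three values actually shown. Each function extracts the same names, but the first two must know which schema constructs, and which type or define name, hold the values.

```python
import xml.etree.ElementTree as ET

# Abbreviated copies of the three country-list formats (approach #1, #2, #3).
XSD_LIST = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.countries.org"
           xmlns="http://www.countries.org"
           elementFormDefault="qualified">
  <xs:element name="countries" type="countriesType"/>
  <xs:simpleType name="countriesType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Afghanistan"/>
      <xs:enumeration value="Albania"/>
      <xs:enumeration value="Algeria"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""

RNG_LIST = """<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://www.countries.org">
  <define name="countriesType">
    <choice>
      <value>Afghanistan</value>
      <value>Albania</value>
      <value>Algeria</value>
    </choice>
  </define>
</grammar>"""

DOMAIN_LIST = """<countries xmlns="http://www.countries.org">
  <country>Afghanistan</country>
  <country>Albania</country>
  <country>Algeria</country>
</countries>"""

# ElementTree represents qualified names as "{namespace-uri}local-name".
XS = "{http://www.w3.org/2001/XMLSchema}"
RNG = "{http://relaxng.org/ns/structure/1.0}"
C = "{http://www.countries.org}"


def countries_from_xsd(text, type_name="countriesType"):
    """Approach #1: walk schema vocabulary (simpleType/restriction/enumeration)."""
    root = ET.fromstring(text)
    for st in root.iter(XS + "simpleType"):
        if st.get("name") == type_name:
            return [e.get("value") for e in st.iter(XS + "enumeration")]
    return []


def countries_from_rng(text, define_name="countriesType"):
    """Approach #2: walk RELAX NG vocabulary (define/choice/value)."""
    root = ET.fromstring(text)
    for d in root.iter(RNG + "define"):
        if d.get("name") == define_name:
            return [v.text.strip() for v in d.iter(RNG + "value")]
    return []


def countries_from_domain(text):
    """Approach #3: the element names themselves say what the data is."""
    root = ET.fromstring(text)
    return [c.text.strip() for c in root.iter(C + "country")]


if __name__ == "__main__":
    for fn, src in [(countries_from_xsd, XSD_LIST),
                    (countries_from_rng, RNG_LIST),
                    (countries_from_domain, DOMAIN_LIST)]:
        print(fn.__name__, fn(src))
```

All three calls return `['Afghanistan', 'Albania', 'Algeria']`. The difference lies in what the caller must know: the approach #1 and #2 extractors depend on schema-language element names plus a schema-level type or define name, while the approach #3 extractor needs only the domain element name.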
>
> Validation can be accomplished using a Schematron schema.
> Here is a Schematron schema which validates that the content
> of the <country-visited> element matches one of the values in
> the country list:
>
> ---------------------------------------------
> <?xml version="1.0"?>
> <sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
>     <sch:ns uri="http://www.countries.org" prefix="c" />
>     <sch:ns uri="http://www.example.org" prefix="ex" />
>
>     <sch:pattern name="Country List Check">
>
>         <sch:rule context="ex:country-visited">
>
>             <sch:assert test=". = document('countries.xml')//c:country">
>                 The value of country-visited must be one of the
>                 countries in the country list.
>             </sch:assert>
>
>         </sch:rule>
>
>     </sch:pattern>
>
> </sch:schema>
> ---------------------------------------------
>
> With approach #3 the markup used to construct the list has
> semantics specific to the list:
>
>     {http://www.countries.org}countries
>     {http://www.countries.org}country
>
> This makes it possible to write programs that are readily
> understood, because they use terminology consistent with the
> domain. For example, this XSLT program uses the country list
> to generate an HTML list of all countries:
>
> ---------------------------------------------
> <?xml version="1.0"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>                 xmlns:c="http://www.countries.org"
>                 version="2.0">
>
>     <xsl:output method="html"/>
>
>     <xsl:template match="c:countries">
>
>         <html>
>             <head>
>                 <title>Countries of the World</title>
>             </head>
>             <body>
>                 <ol>
>                     <xsl:apply-templates />
>                 </ol>
>             </body>
>         </html>
>
>     </xsl:template>
>
>     <xsl:template match="c:country">
>
>         <li>
>             <xsl:value-of select="." />
>         </li>
>
>     </xsl:template>
>
> </xsl:stylesheet>
> ---------------------------------------------
>
> Note the template match values.
> They match on:
>
>     {http://www.countries.org}countries
>     {http://www.countries.org}country
>
>
> Conversely, with approach #1 and approach #2 the markup used
> to construct the list has semantics that are specific to the
> schema language:
>
>     {http://www.w3.org/2001/XMLSchema}element
>     {http://www.w3.org/2001/XMLSchema}simpleType
>     {http://www.w3.org/2001/XMLSchema}restriction
>     {http://www.w3.org/2001/XMLSchema}enumeration
>
>     {http://relaxng.org/ns/structure/1.0}define
>     {http://relaxng.org/ns/structure/1.0}choice
>     {http://relaxng.org/ns/structure/1.0}value
>
> Consequently, programs must operate using schema terminology
> rather than domain terminology. For example, this XSLT
> program generates an HTML list of all countries from the
> country list specified by the XML Schema document:
>
> ---------------------------------------------
> <?xml version="1.0"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>                 xmlns:xs="http://www.w3.org/2001/XMLSchema"
>                 version="2.0">
>
>     <xsl:output method="html"/>
>
>     <xsl:template match="xs:simpleType">
>
>         <html>
>             <head>
>                 <title>Countries of the World</title>
>             </head>
>             <body>
>                 <ol>
>                     <xsl:apply-templates />
>                 </ol>
>             </body>
>         </html>
>
>     </xsl:template>
>
>     <xsl:template match="xs:enumeration">
>
>         <li>
>             <xsl:value-of select="@value" />
>         </li>
>
>     </xsl:template>
>
> </xsl:stylesheet>
> ---------------------------------------------
>
> Note the template match values. Rather than operating on
> <countries> and <country> elements, the XSLT program operates
> on <schema>, <simpleType>, <restriction>, and <enumeration>
> elements. This makes programming challenging and error-prone.
>
> With approach #3 a list can be used as a building block (data
> component) that can be dropped directly into other
> documents to create compound documents.
> For example, consider
> a list of religions, also formatted using approach #3:
>
> ---------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <religions xmlns="http://www.religions.org">
>
>     <religion>Baha'i</religion>
>     <religion>Buddhism</religion>
>     <religion>Catholicism</religion>
>     ...
>
> </religions>
> ---------------------------------------------
>
> It is easy to construct a compound document composed of the
> country and religion lists:
>
> ---------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <religions-per-country>
>     <countries xmlns="http://www.countries.org">
>         <country>Afghanistan</country>
>         <country>Albania</country>
>         <country>Algeria</country>
>         ...
>     </countries>
>     <religions xmlns="http://www.religions.org">
>         <religion>Baha'i</religion>
>         <religion>Buddhism</religion>
>         <religion>Catholicism</religion>
>         ...
>     </religions>
>     <!-- markup that maps religions to countries -->
> </religions-per-country>
> ---------------------------------------------
>
> Due to the modularity provided by approach #3, it is possible
> to perform list-specific processing on this compound
> document. That is, a country-list-aware application would be
> able to extract the country list from this compound document
> and process it. Ditto for a religion-list-aware application.
>
> With approach #1 and approach #2 the XML vocabulary used to
> construct the list is the same regardless of the list. Here
> is the <religions-per-country> document using lists that are
> defined using the XML Schema vocabulary:
>
> ---------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <religions-per-country>
>     <xs:simpleType xmlns:xs="http://www.w3.org/2001/XMLSchema"
>                    name="countriesType">
>         <xs:restriction base="xs:string">
>             <xs:enumeration value="Afghanistan"/>
>             <xs:enumeration value="Albania"/>
>             <xs:enumeration value="Algeria"/>
>             ...
>         </xs:restriction>
>     </xs:simpleType>
>     <xs:simpleType xmlns:xs="http://www.w3.org/2001/XMLSchema"
>                    name="religionsType">
>         <xs:restriction base="xs:string">
>             <xs:enumeration value="Baha'i"/>
>             <xs:enumeration value="Buddhism"/>
>             <xs:enumeration value="Catholicism"/>
>             ...
>         </xs:restriction>
>     </xs:simpleType>
>     <!-- markup that maps religions to countries -->
> </religions-per-country>
> ---------------------------------------------
>
> The namespace used by the country list cannot be
> distinguished from the namespace used by the religion list.
> Thus, the modularity benefits that namespaces provide
> are negated. It is not easy to create country-list-aware
> applications or religion-list-aware applications.
>
> Approach #3 also has minimal markup overhead.
>
>
> ANALYSIS OF ALL APPROACHES
>
> Regardless of which approach is used, the meaning of the list
> and its values must be clearly documented. It may be
> challenging to achieve consensus on meaning:
>
> - The same terminology may be used by different people to
>   mean different things. For example, one person expects to see
>   Puerto Rico in a country list, whereas another person does
>   not. This is because one person defines "country" as
>   principal sovereignties only, whereas another person defines
>   "country" to include territories and protectorates.
>
> - Further, some people use different terminology to mean the
>   same thing. For example, one person calls it "country";
>   another calls it "principality."
>
> Thus, with all approaches the issue arises of which
> terminology and definitions to adopt.
>
>
> OTHER FACTORS?
>
> Above is my initial stab at analyzing the three approaches.
> Are there other factors of each approach that I have not considered?
>
> /Roger
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by
> OASIS to support XML implementation and development.
To > minimize spam in the archives, you must subscribe before posting. > > [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/ > Or unsubscribe: xml-dev-unsubscribe@l... > subscribe: xml-dev-subscribe@l... List archive: > http://lists.xml.org/archives/xml-dev/ > List Guidelines: http://www.oasis-open.org/maillists/guidelines.php >