Re: Strong Typing in SGML and XML
First, I'd like to concur with the need for a formal specification for data typing. I had hoped that HyTime's lextype feature would be sufficient. I for one would like to hear from the HyTime experts about how they would implement the parallel data typing. There is no use reinventing any standard; it may only need simplifying and explaining.

Having said that, I ask: when is strong data typing necessary? As far as I can tell there is only one place where it is useful: when the document is being created or altered. There will always be data validation that cannot be handled by data typing and as such must be delegated to a validating application or a human. For example, the swapped names here are still perfectly valid character data:

    <NAME><FIRST>Albright</FIRST><LAST>Eric</LAST></NAME>

As for comments about the proposal: I would like to see a simplified version of the data types. It is very important for databases to know the exact size in bytes that a data element will occupy. SGML/XML deals with a character string and therefore does not care. More important to me are the constraints on the data implied by a given type. I think we need to determine the types of constraints that each data type requires and allow for the maximum flexibility without sacrificing precision.

As far as I can tell, there are three basic types: character, numeric, and temporal. Each type requires its own unique constraints:

    CHARACTER - an alphabet, a length constraint, a content constraint (regular expressions)
    NUMERIC   - a maximum value, a minimum value, some type of rounding/precision
    TEMPORAL  - a maximum value, a minimum value (the maximum and minimum values may be constrained in relation to the current value), some type of rounding/precision

I think that the CHARACTER data type should be able to specify the alphabet and length constraint within the content constraint. However, some modification to the standard regular expression notation would be necessary. I for one do not want to have to type

    \([0-9][0-9][0-9]\)[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]

for a phone number.
Perhaps

    \([0-9](+3)\)[0-9](+3)-[0-9](+4)

would be better.

To allow maximum flexibility and precision for numeric values, we should be able to specify the form (roman/arabic) and a base. The rounding allows us to constrain the significant digits to some factor of the base. A rounding type would be needed for the greatest flexibility (round/ceiling/floor).

Temporal values can specify either an instant of time or an extent of time. They should also be able to be rounded. When an instant is rounded, the significant digits are to the left; when an extent is rounded, the significant digits are to the right. To signify that an instant is precise to the nearest five years, it would be rounded to 0005/00/00 00:00:00. To signify that an extent is precise to the nearest tenth of a second, it would be rounded to 0000/00/00 00:00:00.1.

Given this, the "architectural form" for data typing would become:

    <!ATTLIST AnyElement
      XML-TYPE          (character|numeric|temporal)  #IMPLIED
          -- if omitted, default is character with no other constraints applied --
      XML-TYPE-CONTENT  CDATA                         #IMPLIED
          -- For CHARACTER types only; default is no constraint --
      XML-TYPE-MIN      CDATA                         #IMPLIED
          -- For NUMERIC/TEMPORAL; default is no constraint --
      XML-TYPE-MAX      CDATA                         #IMPLIED
          -- For NUMERIC/TEMPORAL; default is no constraint --
      XML-TYPE-ROUNDTO  CDATA                         #IMPLIED
          -- For NUMERIC/TEMPORAL; default is no constraint --
      XML-TYPE-RNDMETH  (round|ceiling|floor)         #IMPLIED
          -- Round method; For NUMERIC/TEMPORAL default is "round" --
      XML-TYPE-FORM     (roman|arabic)                #IMPLIED
          -- For NUMERIC; default is "roman" --
      XML-TYPE-BASE     CDATA                         #IMPLIED
          -- For NUMERIC; default is "10" --
      XML-TYPE-TYPE     (instant|extent)              #IMPLIED
          -- required for TEMPORAL --
    >

This changes the number of attributes from 4 to 9 but gives higher precision in constraining the data.
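As a rough illustration of how a validating application might apply the CHARACTER and NUMERIC constraints, here is a minimal Python sketch. It is not part of the proposal. It assumes that "(+n)" means exactly n repetitions and "(*n)" means zero to n repetitions of the preceding item, and that XML-TYPE-ROUNDTO requires the value to be an exact multiple of the given step. The function names are hypothetical, and the rounding method, form, base, and TEMPORAL values are not handled.

```python
import re
from decimal import Decimal, InvalidOperation

# Assumed reading of the shorthand: "(+n)" = exactly n repetitions,
# "(*n)" = zero to n repetitions. The lookbehind skips escaped "\(".
_EXACT = re.compile(r"(?<!\\)\(\+(\d+)\)")
_UP_TO = re.compile(r"(?<!\\)\(\*(\d+)\)")


def expand_shorthand(pattern):
    """Rewrite the proposed repetition shorthand as standard {n} / {0,n}."""
    pattern = _EXACT.sub(r"{\1}", pattern)
    return _UP_TO.sub(r"{0,\1}", pattern)


def check_character(value, content):
    """True if the whole value matches the XML-TYPE-CONTENT expression."""
    return re.fullmatch(expand_shorthand(content), value) is not None


def check_numeric(value, minimum=None, maximum=None, roundto=None):
    """True if value is a base-10 number within [minimum, maximum] and,
    when roundto is given, an exact multiple of it (an assumption)."""
    try:
        number = Decimal(value)
    except InvalidOperation:
        return False
    if minimum is not None and number < Decimal(minimum):
        return False
    if maximum is not None and number > Decimal(maximum):
        return False
    if roundto is not None and number % Decimal(roundto) != 0:
        return False
    return True


if __name__ == "__main__":
    phone = r"\([0-9](+3)\)[0-9](+3)-[0-9](+4)"
    print(expand_shorthand(phone))
    print(check_character("(555)123-4567", phone))
    print(check_numeric("12.505", minimum="0.00", roundto="0.01"))
```

Decimal is used rather than float so that a step such as "0.01" is compared exactly.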
The examples would become:

For a bank loan; balance, interest rate, and maturity date:

    <!ELEMENT BALANCE (#PCDATA) >
    <!ATTLIST BALANCE
      XML-TYPE          CDATA  #FIXED  "NUMERIC"
      XML-TYPE-ROUNDTO  CDATA  #FIXED  "0.01"
      XML-TYPE-MIN      CDATA  #FIXED  "0.00" >

    <!ELEMENT INTEREST (#PCDATA)>
    <!ATTLIST INTEREST
      XML-TYPE          CDATA  #FIXED  "NUMERIC"
      XML-TYPE-MAX      CDATA  #FIXED  "100"
          -- in practice we may want this to be much lower --
      XML-TYPE-MIN      CDATA  #FIXED  "0" >

    <!ELEMENT MATURITY (#PCDATA)>
    <!ATTLIST MATURITY
      XML-TYPE          CDATA  #FIXED  "TEMPORAL"
      XML-TYPE-TYPE     CDATA  #FIXED  "INSTANT"
      XML-TYPE-ROUNDTO  CDATA  #FIXED  "0000/00/01 00:00:00">

For an airline departure: passenger name, seat number, and departure time:

    <!ELEMENT LAST-NAME (#PCDATA)>
    <!ATTLIST LAST-NAME
      XML-TYPE          CDATA  #FIXED  "CHARACTER"
      XML-TYPE-CONTENT  CDATA  #FIXED  "[A-Z](*20)"
          -- up to 20 repetitions of [A-Z]-->

    <!ELEMENT FIRST-INITIAL (#PCDATA)>
    <!ATTLIST FIRST-INITIAL
      XML-TYPE          CDATA  #FIXED  "CHARACTER"
      XML-TYPE-CONTENT  CDATA  #FIXED  "[A-Z]" >

    <!ELEMENT SEAT-ROW (#PCDATA)>
    <!ATTLIST SEAT-ROW
      XML-TYPE          CDATA  #FIXED  "NUMERIC"
      XML-TYPE-MIN      CDATA  #FIXED  "1"
      XML-TYPE-MAX      CDATA  #FIXED  "36"
      XML-TYPE-ROUNDTO  CDATA  #FIXED  "1" >

    <!ELEMENT SEAT-LETTER (#PCDATA)>
    <!ATTLIST SEAT-LETTER
      XML-TYPE          CDATA  #FIXED  "CHARACTER"
      XML-TYPE-CONTENT  CDATA  #FIXED  "[A-F]" >

    <!ELEMENT DEPARTURE (#PCDATA)>
    <!ATTLIST DEPARTURE
      XML-TYPE          CDATA  #FIXED  "TEMPORAL"
      XML-TYPE-TYPE     CDATA  #FIXED  "INSTANT"
      XML-TYPE-ROUNDTO  CDATA  #FIXED  "0000/00/00 00:01:00"
          -- to the nearest minute -->

    <!ELEMENT FLIGHT-TIME (#PCDATA)>
    <!ATTLIST FLIGHT-TIME
      XML-TYPE          CDATA  #FIXED  "TEMPORAL"
      XML-TYPE-TYPE     CDATA  #FIXED  "EXTENT"
      XML-TYPE-ROUNDTO  CDATA  #FIXED  "0000/00/00 00:15:00"
          -- to the nearest 15 minutes -->

Well, what do you think?

Eric

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@i... the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@i...)