Costello, Roger L. wrote:
>
> The content of <Author> can be characters from any language - English, Chinese, Arabic, Italian, Greek, German, Spanish, Russian, etc. - plus punctuation symbols plus math symbols. If I did my arithmetic correctly [1], the total number of different characters is: 1,112,000.
>

Strictly, languages don't have characters: they have writing systems, writing systems use scripts, and scripts are made from characters. Furthermore, Unicode has Private Use Areas, which allow non-standard characters to be represented.

XML Schema's string datatype is perhaps better thought of as an anti-datatype than as a datatype. What it does is signify the absence of a value space: the value is not asserted to be a number, not asserted to be a date, not asserted to be a boolean. This is of course a little topsy-turvy.

I had a case with an insurance company that received data from its agents. The data had standard fields, but the fields could contain any notation. There was a separate process in which people would check the fields and "re-work" them into the standard notations. So the input might have

    <date>20th May, 2010</date>

and after rework it would contain

    <date>2010-05-20</date>

They were surprised to learn that they could not merely say that the incoming data was a string and then restrict this string to be a date type, since xs:date is not a restriction of xs:string.

The original XML Schema datatype hierarchy was not designed with document refinement in mind (i.e. marking up the document and passing it as text through several different XML stages). The design only makes sense if you assume that the data lives in a DBMS, i.e. that the types are actually primitive storage types for a DBMS.

Cheers
Rick Jelliffe
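
As a rough illustration of the "re-work" step described above, here is a minimal Python sketch that turns a free-form date such as "20th May, 2010" into the xs:date lexical form. The function name `rework_date`, the list of accepted input formats, and the English month names are all assumptions made for the example; nothing here is taken from the insurance company's actual process, and the optional timezone part of xs:date is ignored.

```python
import re
from datetime import date, datetime

# Hypothetical helper pattern: matches English ordinal suffixes after a digit
# ("20th" -> "20"), so that strptime can parse the day number.
_ORDINAL = re.compile(r"(?<=\d)(st|nd|rd|th)\b", re.IGNORECASE)

# Assumed set of free-form notations; a real rework process would need more.
_FORMATS = ("%d %B, %Y", "%d %B %Y", "%B %d, %Y")


def rework_date(raw: str) -> str:
    """Return a YYYY-MM-DD string for a free-form date, or raise ValueError."""
    text = raw.strip()

    # Input that is already in the standard notation passes through unchanged.
    try:
        return date.fromisoformat(text).isoformat()
    except ValueError:
        pass

    cleaned = _ORDINAL.sub("", text)
    for fmt in _FORMATS:
        try:
            # %B relies on the locale's month names (English assumed here).
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"cannot rework {raw!r} into a date")


if __name__ == "__main__":
    print(rework_date("20th May, 2010"))
```

The point of the sketch is that the text before and after rework belongs to two unrelated value spaces: the input is only known to be a string, while the output is meant to be checked as a date at a later stage.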