Contracts & Acceptance Testing. Re: IE5 and UTF-8
Lucio Piccoli wrote:

> I am having problems with a supplier that is sending XML docs that fail to be
> parsed by JAXP due to UTF-8 encoding errors. The supplier claims that the docs
> have been parsed by IE5 before, hence it validates that the XML is good.

If you need to pin down the specific encoding problem, some extra info would be helpful:

- Can you tell us what the particular UTF-8 encoding error is?
- Could you open the same document in IE5?
- When you say "send", do you mean over HTTP? If so, are the senders setting the correct charset parameter in the HTTP header? If not, does the document have no encoding declaration, or one that explicitly says "UTF-8"?

If you can capture the data and send a hex dump of the offending fragment (if you have a GNU/UNIX system you can use "od -tcxC filename"), that would be useful.

SGML was developed in order to clarify where responsibility for correcting errors belongs: receivers can acceptance-test the data. XML inherits this.

For contracts, you should specify the validation tool for acceptance testing. I know of at least one customer who bought OmniMark solely to use for validation before delivery of their SGML data, even though they used their own tools; for XML it will be the same.

For example, your contract could say something like (in Legalese):

"i) Documents must conform to the requirements of ISO International Standard 8879:1986 (SGML) as corrected to 1999.

ii) The particular form of SGML is required to be that profile specified by W3C as XML version 1.0 as corrected, as an "Additional Requirements" for SGML (reference to James Clark's SGML declarations for XML document at W3C).

iii) The data encoding is required to be UTF-8, as specified by the Unicode Consortium, as corrected.

iv) The meaning of characters is required to be that specified by ISO International Standard 10646 (Universal Character Set) and Unicode Consortium in Unicode Character Set 3.0 as corrected.
These requirements will be deemed satisfied by: <insert some reference XML processor that you have confidence in, including version and which platform--the version of Java should be specified too> and <insert some encoding test program, you may have to write it> "

You could also put in whether the documents should be well-formed or valid, against which DTD, and which program will be used to validate them. If you are really serious, you should specify that valid data or WF data should pass 3 out of 3 different parsers.

If your data format has constraints that cannot be represented in DTDs, then also require some schema language such as SOX or Schematron or XML Schemas (which should have mature-enough implementations by 2001).

If you send binary data, or data in other formats, the same thing is required. This is particularly important if you send data in poorly standardized formats such as CGM or RTF, or binary files including GIF. If data is sent in archive, compression or encryption formats, the same approach should be required.

Rick Jelliffe

***************************************************************************
This is xml-dev, the mailing list for XML developers.
To unsubscribe, mailto:majordomo@x...&BODY=unsubscribe%20xml-dev
List archives are available at http://xml.org/archives/xml-dev/
***************************************************************************
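
The "encoding test program" mentioned in the contract wording above could be quite small. The following is only a sketch of one possible checker, not a tool referred to in this thread. It assumes Python 3; the function name `find_utf8_error` and the shape of its return value are hypothetical, chosen for illustration. It reports the byte offset of the first malformed UTF-8 sequence together with a hex dump of the surrounding bytes, similar to what the suggested "od" command would show.

```python
def find_utf8_error(data: bytes, context: int = 8):
    """Check that data is well-formed UTF-8.

    Returns None if the bytes decode cleanly. Otherwise returns a tuple
    (offset, reason, hex_context) describing the first malformed sequence.
    The name and return shape are illustrative assumptions.
    """
    try:
        # Strict decoding raises on the first malformed byte sequence.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        # Show a few bytes on either side of the failure, as hex pairs.
        start = max(0, exc.start - context)
        end = min(len(data), exc.end + context)
        hex_context = " ".join(f"{b:02x}" for b in data[start:end])
        return exc.start, exc.reason, hex_context
    return None


# Example: a lone 0xE9 byte (Latin-1 "e acute") inside a document that is
# supposed to be UTF-8 is reported as an error at its byte offset.
sample = b"<a>caf\xe9</a>"
result = find_utf8_error(sample)
if result is not None:
    offset, reason, dump = result
    print(f"invalid UTF-8 at byte {offset}: {reason}")
    print(dump)
```

A check like this only tests the byte encoding. Well-formedness and validity would still be established by the reference XML processor named in the contract.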