
Re: Proposed requirements on solutions that convert XML-illegal

  • From: Mukul Gandhi <gandhi.mukul@gmail.com>
  • To: Timothy Cook <timothywayne.cook@gmail.com>
  • Date: Thu, 27 Apr 2017 13:22:41 +0530

Before agreeing with Timothy's post, I should have brushed up on what the XSD type xs:string really means. The definition of xs:string at https://www.w3.org/TR/xmlschema-2/#string says:
"The string datatype represents character strings in XML. The ·value space· of string is the set of finite-length sequences of characters (as defined in [XML 1.0 (Second Edition)]) that ·match· the Char production from [XML 1.0 (Second Edition)]."

When I wrote +1, I assumed that XSD's xs:string mirrors Java's String data type. In fact, xs:string takes its value space from XML 1.0 (Second Edition), whose Char production excludes NUL (U+0000).
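
As a rough illustration of the definition quoted above, the sketch below tests whether a string lies in the xs:string value space by checking every character against the XML 1.0 Char production (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]). The function names are made up for this example and are not part of any XSD library.

```python
def is_xml10_char(ch: str) -> bool:
    """Return True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def in_xs_string_value_space(s: str) -> bool:
    """Return True if every character of s is a legal XML 1.0 Char."""
    return all(is_xml10_char(ch) for ch in s)


print(in_xs_string_value_space("\x00"))              # a real NUL character
print(in_xs_string_value_space("Hello \x0c World"))  # contains a form feed
print(in_xs_string_value_space("\\u0000"))           # six ordinary characters
```

Under this check, a real NUL or form feed falls outside the value space, while the six-character text \u0000 (backslash, u, four zeros) falls inside it.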

On 27 April 2017 at 08:50, Mukul Gandhi <gandhi.mukul@gmail.com> wrote:
+1

On 26 April 2017 at 01:33, Timothy Cook <timothywayne.cook@g...> wrote:
I'm not sure I see your problem. If NUL is embedded in a string and your element is defined to contain string content, then it works as expected.

For example nul.xsd is: 

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
   
 <xs:complexType name="test">
     <xs:sequence>
         <xs:element name="key" type="xs:string"></xs:element>
         <xs:element name="message" type="xs:string"></xs:element>
     </xs:sequence>
 </xs:complexType>
   
 <xs:element name="testdoc" type="test"></xs:element>  
</xs:schema>

and nul.xml is:
<?xml version="1.0" encoding="UTF-8"?>
<testdoc   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="nul.xsd">
    <key>\u0000</key>
    <message>Hello \u000C World</message>
</testdoc>

Then nul.xml is perfectly valid. I took a screenshot of this in Oxygen XML Editor using Xerces as the parser (image not included here), and SaxonEE gives the same result.
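
One way to see which characters a parser actually receives from nul.xml is to parse it and list the code points of the <key> text. This is only a sketch, assuming Python's standard xml.etree.ElementTree, which checks well-formedness and does not validate against nul.xsd; the schema-location attributes are dropped from the inline copy for that reason.

```python
import xml.etree.ElementTree as ET

# Inline copy of nul.xml from the message. The xsi attributes are omitted
# because ElementTree does not perform schema validation.
NUL_XML = rb"""<?xml version="1.0" encoding="UTF-8"?>
<testdoc>
    <key>\u0000</key>
    <message>Hello \u000C World</message>
</testdoc>
"""

root = ET.fromstring(NUL_XML)
key_text = root.find("key").text

print(len(key_text))                    # number of characters in <key>
print([hex(ord(c)) for c in key_text])  # their code points
print("\x00" in key_text)               # is a real NUL character present?
```

The <key> element here holds six ordinary characters (a backslash, the letter u and four zeros), so the document contains no NUL character for the parser to object to.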
 




On Tue, Apr 25, 2017 at 3:49 PM, Costello, Roger L. <costello@m...> wrote:

Hi Folks,

XML 1.0 has a limited set of characters. Some other data formats have a superset of characters – the other data formats may have characters that would be illegal in XML.

Suppose the other data format is to be converted to XML. How will the illegal characters be handled?

Other data format -> convert -> XML

Example: the JSON data format has a superset of characters. Suppose you want to convert the following JSON to XML:

{
  "key": "\u0000"
}

 

\u0000 is a JSON encoding of the NUL (hex 0) character. Recall that the NUL character is not allowed in XML.

I am collecting requirements on the process of converting other data formats into XML. Below is my list thus far. Do you agree with the list? Are there requirements that you would add/delete?

1. The conversion must result in legal XML. Thus, conversion of the above JSON must not produce this:

<key>&#x0;</key>

That is not legal (well-formed) XML.
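
A small sketch of requirement 1, assuming Python's standard xml.etree.ElementTree (backed by expat) as the well-formedness checker. The helper name is_well_formed is invented for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_bytes: bytes) -> bool:
    """Return True if the parser accepts the input, False on a parse error."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(b"<key>&#x0;</key>"))  # character reference to NUL
print(is_well_formed(b"<key>\x00</key>"))   # raw NUL byte in content
print(is_well_formed(b"<key>&#x9;</key>"))  # reference to tab, a legal Char
```

Both the character reference and the raw NUL byte are rejected, so a converter cannot get NUL into XML 1.0 by either route.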

2. The conversion must be round-trippable. The operation must be lossless. Thus, it is not acceptable to convert the above JSON to this:

<key/>

Data has been lost. That is a lossy operation and is not round-trippable.

3. The conversion must output standard XML. The XML must not contain syntax/encoding that is specific to the other data format. The XML must be processable using standard XML tools. Thus, it is not acceptable to convert the above JSON to this:

<key>\u0000</key>

That has a JSON-specific encoding embedded within XML. If we wanted, say, to do a string comparison on the value of <key>, the application would need to understand the JSON syntax.
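
The comparison problem in requirement 3 can be sketched as follows, assuming Python's standard json and xml.etree.ElementTree modules. Wrapping the XML text in quotes and handing it to json.loads is a naive decoding step used only for illustration; it assumes the text contains no unescaped double quotes.

```python
import json
import xml.etree.ElementTree as ET

# Value as a JSON consumer sees it: one NUL character.
json_value = json.loads(r'{"key":"\u0000"}')["key"]

# Value as an XML consumer sees it: the six characters \u0000.
xml_value = ET.fromstring(rb"<key>\u0000</key>").text

print(json_value == xml_value)                     # plain string comparison
print(json_value == json.loads(f'"{xml_value}"'))  # after JSON-decoding the XML text
```

The plain comparison fails, and the values only match once the XML consumer applies JSON string decoding, which is exactly the dependency the requirement rules out.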

4. The conversion must output readable text. No hexadecimal text output. Thus, it is not acceptable to convert this:

{
  "message": "Hello \u000C World"
}

 

to this:

 

<message>48656c6c6f200c20576f726c64</message>
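
The hexadecimal text in this example looks like the bytes of the message written out in hex. The sketch below reproduces it under the assumption that the bytes are UTF-8 (for this message, identical to ASCII) and shows that the encoding is lossless even though it is unreadable.

```python
message = "Hello \u000C World"  # the message value after JSON decoding

# Encode to bytes, then write each byte as two hex digits.
hex_text = message.encode("utf-8").hex()
print(hex_text)

# Reverse the steps to recover the original string.
restored = bytes.fromhex(hex_text).decode("utf-8")
print(restored == message)
```

Hex output of this kind would satisfy requirements 1 to 3 (legal, round-trippable, no JSON syntax) and fails only requirement 4.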

 

Well, that’s a start. What are the other requirements for converting illegal characters to XML?

 

Have these requirements boxed me into a situation where no solution is possible?

 

/Roger

 




--
Timothy Cook



--
Regards,
Mukul Gandhi




