Re: Why would MS want to make XML break on UNIX, Perl, Python
From: "Michael Rys" <mrys@m...>
From: Rick Jelliffe [mailto:ricko@a...]

>> Sure, let's make XML unsuitable for use in UNIX pipes by allowing ^D.
>> And for Perl and Python text-processing programs that use standard in and
>> expect EOF (^D or ^Z).

> I was a Unix hack for at least 11 years of my life (before joining the
> evil empire and after leaving the mainframe and early PCs and Macs :-)),
> and I can assure you that unix pipes do not care about control
> characters.

You are right. I should have been talking about file IO using text mode on DOS/Windows systems. My experience with pipes failing due to binary data may come from opening the file in text mode (under Cygwin) at the first stage of the pipeline, not at subsequent ones, and from spooler behaviour on UNIX (printers requiring ^D or ^Z).

Perhaps I am showing my age: perhaps serial comms and text mode are used so rarely, or are so well protected behind layers, that it does not matter if control characters are embedded. (That doesn't seem likely... and even then, the problem of editability is strong enough to discount embedded controls.)

The APIs I mentioned (CGI, Haskell, Perl, Python, C++) all deal with opening files, not reading stdin. E.g. http://sources.redhat.com/cygwin/cygwin-ug-net/using-textbinary.html

> I don't want to know what kind of innuendo you want to imply on my
> motives with your reply (I thank Dare for his mail, although I am not
> feeling as attacked as he interpreted the mail). But supporting the
> standard also means that we need to work on evolving the standard if we
> see issues that are not currently addressed in some of the main
> scenarios of current XML usage.

No innuendo about your motives was intended. But MS employees can hardly be surprised if the general public treats their comments with a certain wariness, following the court case. It is not a matter of "evil" or "motives"; it is a matter of power, groupthink, organizational behaviour and prudence.
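The text-mode problem mentioned above can be illustrated with a small Python sketch. One caveat up front: treating ^Z (0x1A) as end-of-file is behaviour of the DOS/Windows C runtime in text mode, and Python 3's own I/O layer does not reproduce it. So the sketch shows the more general point instead: text mode rewrites the byte stream (here CRLF becomes LF), while binary mode passes every byte through untouched, 0x1A included. The helper name `read_both_modes` is hypothetical.

```python
import os
import tempfile


def read_both_modes(data: bytes) -> tuple[str, bytes]:
    """Write raw bytes to a temporary file, then read them back in text and binary mode."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Text mode: universal newlines translate "\r\n" to "\n".
        # latin-1 maps every byte to one character, so nothing else changes.
        with open(path, "r", encoding="latin-1") as f:
            text = f.read()
        # Binary mode: the bytes come back exactly as written.
        with open(path, "rb") as f:
            raw = f.read()
    finally:
        os.remove(path)
    return text, raw


text, raw = read_both_modes(b"<a>x\r\ny\x1a</a>")
print(repr(text))  # '<a>x\ny\x1a</a>'
print(raw)         # b'<a>x\r\ny\x1a</a>'
```

Handing an XML parser the bytes from binary mode, rather than already-decoded text, also leaves encoding detection to the parser (via the BOM and XML declaration), which is what those mechanisms are for.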
> Embrace and extend would be to simply generate and consume arbitrary
> code points in XML 1.0 without proper warnings and errors.

No. Embrace and extend would be to include anything that would not fit into competitors' architectures or established APIs.

> Over the last three years, the usage of XML has evolved into areas where
> some people claim it is not as well suited. I honor that opinion,
> although I disagree. By evolving from 1.0 to 1.1, we will be adapting
> XML for these areas in an interoperable way. If you do not want to write
> or parse 1.1, stick with 1.0.

I am glad you disagree.

> Yes, there are some technical problems with C based APIs such as libxml.
> Maybe libxml needs to evolve as well (to libxml2?) for 1.1 applications
> and libxml for 1.0 will not be able to support some 1.1 documents and
> raise an error if NULL comes through.

How do we handle NULLs using C strings again? I missed that part.

> Or we find an interoperable way to transport/encode the control
> characters (agree on entities or char references or PIs).

Yes. I believe the control character problem is a subset of a wider issue: how to define "private use" code points for use by contracted groups (I believe the control characters are of a kind with the PUA characters). I have made several proposals for these over the years, but the I18N group at W3C has always taken the line that only standard characters are suited for public interchange. I agree that it would be useful to be able to send arbitrary strings, and using Base64 is clunky and does not go far enough, because it does not establish the semantics of the characters (this has always been the problem with the C0 characters anyway: as anyone in serial comms can attest, they are a grab bag).

> However, sticking the head in the sand and ruling the problems out of
> bounds...
I am happy to have the problems thought about, and indeed I have been discussing them and pushing for a solution for this kind of thing (semi-private characters) for many years. It is the particular solution proposed that is bad IMHO.

> - XML will become as obscure as SGML itself was and ASN.1 takes over

Having a non-text XML will fracture it just as fast as anything else. Indeed, the move to names with characters beyond 16 bits (though good) will cause enough strain.

> - XML will fracture and the fracture lines will not be along an
> interoperable line but IBM will support NEL anyway, database vendors
> will map their textual types into XML text without having an
> interoperable way (or it will be confined to their industry group such
> as the ISO SQL standard) etc..

I am not at all convinced that the tradeoffs that are appropriate for web distribution of publications and reports (i.e. what is in XML) must be appropriate for database serialization. Of course there is scope for change that improves both (e.g. I have suggested building knowledge of the standard ISO entities into parsers so that we can get rid of DTDs, and moving naming rules into a layer above well-formedness to allow faster exchange of data known to be valid.)

But what to do if the needs of serialization conflict with the current publishing/programming characteristics of XML (text allowing almost any encoding and catching many errors, with readable names {not to be confused with comprehensible names})? "Jettisoning SGML" seems to be code for adding incompatibilities that make XML non-text, unsafe (for encodings), or non-readable. If there is a conflict, XML should remain a markup language, and the developers who need a serialization language should work within XML-as-text or develop a better, special-purpose binary format (ASN.1 has been mentioned before.)

Cheers
Rick Jelliffe
(Not writing on behalf of employer.)
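The control-character argument in this thread rests on a few concrete rules. XML 1.0's Char production forbids C0 controls other than TAB, LF and CR, and its Legal Character constraint applies to character references too, so `&#x1;` is not well-formed 1.0. XML 1.1 admits those controls (never NUL) but only as character references (its RestrictedChar production). Separately, a C API that treats text as a NUL-terminated string silently drops everything after an embedded NUL. The Python sketch below encodes those rules; the function names are hypothetical, the ranges are taken from the Char and RestrictedChar productions, and it handles element content only (attribute values would also need quote escaping and whitespace care).

```python
import ctypes


def is_xml10_char(cp: int) -> bool:
    """XML 1.0 Char: most C0 controls are illegal, even as character references."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml11_char(cp: int) -> bool:
    """XML 1.1 Char: everything except NUL, surrogates and U+FFFE/U+FFFF."""
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml11_restricted(cp: int) -> bool:
    """XML 1.1 RestrictedChar: legal only when written as a character reference."""
    return (0x1 <= cp <= 0x8 or 0xB <= cp <= 0xC or 0xE <= cp <= 0x1F
            or 0x7F <= cp <= 0x84 or 0x86 <= cp <= 0x9F)


# Written literally, XML 1.1 end-of-line handling would turn these into LF.
_LINE_ENDS = {0xD, 0x85, 0x2028}


def escape_text_xml11(s: str) -> str:
    """Escape a string for XML 1.1 element content so control characters survive."""
    out = []
    for ch in s:
        cp = ord(ch)
        if not is_xml11_char(cp):
            # NUL has no representation at all, not even as &#x0;.
            raise ValueError(f"U+{cp:04X} cannot be represented in XML 1.1")
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        elif cp in _LINE_ENDS or is_xml11_restricted(cp):
            out.append(f"&#x{cp:X};")
        else:
            out.append(ch)
    return "".join(out)


def as_c_string(data: bytes) -> bytes:
    """What a C API sees if it treats the buffer as a NUL-terminated char*."""
    buf = ctypes.create_string_buffer(data)
    return buf.value  # .value stops at the first NUL byte


print(is_xml10_char(0x1))             # False
print(escape_text_xml11("a\x01b<c"))  # a&#x1;b&lt;c
print(as_c_string(b"ab\x00cd"))       # b'ab'
```

Note the asymmetry the sketch makes visible: character references solve the transport problem for U+0001 to U+001F in XML 1.1, but not for NUL, which is exactly the case that collides with C strings.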