Re: Character Entities: An XML Core WG View
Pardon the possible self-promotion, but I think a brief description of Ents might offer some hope to those who find internal subsets for character references to be unpleasant.

Ents is a Java FilterReader that looks through a document and either replaces entity names with character references or character references with entity names. It does this in the text of the document, so this processing can be either fed into a parser or poured back out as XML for later processing. (There is also a SAXFilter for the skippedEntity() event.)

The rules files, while not the best XML structure I've created, are pretty simple at their core:

    <equal ent="iexcl" ref="#161">inverted exclamation mark, U+00A1 ISOnum</equal>

There's also room for some descriptions, an identification of the source for these references, etc.

It's not a particularly bright tool at the moment, as I never got around to teaching it about hex to decimal conversion, but it does let you round-trip entities to character references and back. Humans can enjoy the (relative) convenience of named entities, while parsers can enjoy the simpler processing of character references.

Features coming soon include the hexadecimal support mentioned above, as well as support for putting the characters directly into or out of text, not just character refs. I'm integrating Ents with my Gorille work on Unicode, and should have something to show in the next couple of weeks.

It seems like pretty much all XML development to date has been at the parser level or above, but there's a lot of useful work to be done on the text. It's unfortunate that the parsing model described in XML 1.0 puts a lot of layers into a single processing context, but maybe we can start breaking out those layers and take advantage of having all this accessible text.

-------------
Simon St.Laurent - SSL is my TLA
http://simonstl.com may be my URI
http://monasticxml.org may be my ascetic URI
urn:oid:1.3.6.1.4.1.6320 is another possibility altogether
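
For readers who want to see the round trip described above in concrete terms, here is a minimal sketch. It is not the Ents source: the class name EntsSketch, its two methods, and the hard-coded rule table are all hypothetical, and it works on whole strings rather than as a FilterReader. Only the iexcl/161 pair comes from the rules-file example; the copy/169 pair is added for illustration. Like the tool as described, it handles decimal references only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of name <-> character reference swapping. */
final class EntsSketch {
    // Rule table (assumption: kept in memory here instead of being
    // loaded from a rules file of <equal ent="..." ref="#..."> elements).
    private static final Map<String, Integer> NAME_TO_CODE = new HashMap<>();
    private static final Map<Integer, String> CODE_TO_NAME = new HashMap<>();

    private static void rule(String ent, int code) {
        NAME_TO_CODE.put(ent, code);
        CODE_TO_NAME.put(code, ent);
    }

    static {
        rule("iexcl", 161); // from the rules-file example
        rule("copy", 169);  // illustrative extra entry
    }

    // &name; and decimal &#NNN; only; hexadecimal &#xNN; is not handled.
    private static final Pattern NAMED = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");
    private static final Pattern DECIMAL = Pattern.compile("&#([0-9]{1,7});");

    /** Replaces known entity names with decimal character references. */
    static String namesToRefs(String text) {
        Matcher m = NAMED.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Integer code = NAME_TO_CODE.get(m.group(1));
            // Names without a rule (e.g. &amp;) pass through untouched.
            String replacement = (code == null) ? m.group() : "&#" + code + ";";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Replaces known decimal character references with entity names. */
    static String refsToNames(String text) {
        Matcher m = DECIMAL.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = CODE_TO_NAME.get(Integer.parseInt(m.group(1)));
            // References without a rule pass through untouched.
            String replacement = (name == null) ? m.group() : "&" + name + ";";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this table, EntsSketch.namesToRefs("&iexcl;Hola!") gives "&#161;Hola!", and refsToNames turns it back. Anything without a rule is left alone, so the predefined XML entities survive both directions.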