
Re: Internal entities removed from XML?


From: "Rich Salz" <rsalz@d...>
 
> > Well, assuming SAX-style parsing that is: just deliver entity expansions
> > as a separate characters() callback ... no copies or writes needed at
> > all.
> 
> The intent was to show in-place expansion can be way efficient.

Here is a version of Rich's C code that is just as fast if there are no
entity references, and no less space-efficient if there are. If we find
a reference that is not built in, we replace the & delimiter with the
Unicode Object Replacement Character (U+FFFC).

Afterwards, "&" in text is just a regular character and U+FFFC means
the delimiter "entity reference open".

Entity expansion would happen lazily, by dereferencing the name
when it is needed: no tree structures are actually built. We defer
merging buffers until later: if "later" is a stream, then we never incur
the space cost of merging buffers or building trees. (If you are not
using wchar_t but, say, UTF-8, then you would substitute 0x1A or some
other appropriate unused control character, such as a flow-control
character.)

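#include <stddef.h>   /* wchar_t */

/* Expands &lt; and &amp; in place and flags every other reference with
   U+FFFC. Returns the new length of the text, in characters. */
int expand_entities_in_text_node(wchar_t *buff, int size)
{
     wchar_t *start, *src;
     for (start = src = buff; --size >= 0; )
     {
         if ((*buff++ = *src++) == L'&')
         {
             /* size now counts the characters remaining after the '&' */
             if (size >= 3
             && src[0] == L'l' && src[1] == L't' && src[2] == L';')
                 buff[-1] = L'<', src += 3, size -= 3;
             else if (size >= 4
                  && src[0] == L'a' && src[1] == L'm' && src[2] == L'p'
                  && src[3] == L';')
                 src += 4, size -= 4;  /* the copied '&' is the expansion */
             else
                 buff[-1] = 0xFFFC;  /* flag this as an entity reference */
         }
     }
     return (int)(buff - start);  /* write pointer minus start of buffer */
}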

(As Tim mentioned, for real code we would also need to cope with the
other built-in references (&gt;, &apos; and &quot;) and with numeric
character references, and there is no error handling either.)
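
To make the lazy expansion concrete, here is a rough sketch of what the
"later" step might look like when the consumer is a stream. It walks a
text node that has already been flagged with U+FFFC, hands each literal
run to a callback (much like a SAX characters() call), and looks a name
up only when it reaches a flagged reference. Everything in it is an
assumption for illustration, not part of Rich's code: the entity table
and its contents, and the names lookup_entity, write_text_node, emit_fn,
struct sink and append_to_sink. It does not rescan replacement text for
nested references, which real code would have to do.

```c
#include <stddef.h>
#include <wchar.h>

#define ENTITY_OPEN ((wchar_t)0xFFFC)  /* flag left by the in-place pass */
#define SINK_MAX 255

/* Hypothetical entity table; a real parser would fill it from the DTD. */
struct entity { const wchar_t *name; const wchar_t *value; };
static const struct entity entities[] = {
    { L"copy", L"(c)" },
    { L"co",   L"Example Corp" },
};

/* Find the entity whose name is exactly name[0..len); NULL if undeclared. */
static const wchar_t *lookup_entity(const wchar_t *name, size_t len)
{
    size_t i;
    for (i = 0; i < sizeof entities / sizeof entities[0]; i++)
        if (wcslen(entities[i].name) == len
            && wcsncmp(entities[i].name, name, len) == 0)
            return entities[i].value;
    return NULL;
}

/* Callback that receives one run of characters at a time. */
typedef void (*emit_fn)(const wchar_t *chars, size_t len, void *ctx);

/* Stream text[0..size) to emit(), expanding flagged references as they
   are met. No merged buffer or tree is built. Returns 0 on success, or
   -1 on an unterminated or undeclared reference. */
int write_text_node(const wchar_t *text, size_t size, emit_fn emit, void *ctx)
{
    size_t run = 0;  /* start of the literal run not yet emitted */
    size_t i = 0;
    while (i < size) {
        size_t name, end;
        const wchar_t *value;
        if (text[i] != ENTITY_OPEN) { i++; continue; }
        if (i > run)
            emit(text + run, i - run, ctx);      /* literal text so far */
        name = end = i + 1;
        while (end < size && text[end] != L';')  /* find the closing ';' */
            end++;
        if (end == size)
            return -1;                           /* no ';' found */
        value = lookup_entity(text + name, end - name);
        if (value == NULL)
            return -1;                           /* undeclared entity */
        emit(value, wcslen(value), ctx);         /* replacement text */
        run = i = end + 1;
    }
    if (size > run)
        emit(text + run, size - run, ctx);       /* trailing literal run */
    return 0;
}

/* Example consumer: appends to a fixed buffer, truncating on overflow. */
struct sink { wchar_t chars[SINK_MAX + 1]; size_t len; };

static void append_to_sink(const wchar_t *chars, size_t len, void *ctx)
{
    struct sink *s = ctx;
    if (len > SINK_MAX - s->len)
        len = SINK_MAX - s->len;
    wmemcpy(s->chars + s->len, chars, len);
    s->len += len;
    s->chars[s->len] = L'\0';
}
```

A caller that writes straight to a file or socket would pass its own
callback in place of append_to_sink; the sink here only exists so that
the result can be inspected in one piece.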


Cheers
Rick Jelliffe
