Re: [off-topic] xtext -- encoding declarations for text
From: "John Cowan" <jcowan@r...>

> Rick Jelliffe scripsit:
>
> > Comments welcome.
>
> The following regex (to be interpreted as referring to bytes, not characters)
> should reliably detect the presence of an xtext declaration.
>
> \A(\0*.){1,4}\0*(x\0*t\0*e\0*x\0*t\0* |\xA7\xA3\x85\xA7\xA3\x40)
>
> \A means the beginning of the string, \0* skips any number of null bytes
> introduced by UTF-16 or UTF-32 encodings, and the \xA7...\x40 is "xtext "
> in the common subset of EBCDIC.

No space at the end of ASCII 'xtext'.

It's not just that null bytes are noise, but that the characters may be
re-ordered along big/little-endian lines.

How is the requirement enforced that the 1-4 leading characters not contain
a-z etc.?

It would be interesting to see a regex that actually solved this problem,
including the optional BOM, but I doubt I would want to use it. ;-}

Bob
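
For anyone who wants to experiment with the points above, the quoted pattern can be tried as a bytes regex in Python. This is only a sketch under assumptions: the sample string "#!xtext sample" is a made-up stand-in for a declaration (the proposal's actual syntax is not reproduced here), the helper name looks_like_xtext is hypothetical, the pattern is transcribed as it appears in the archived message, and re.DOTALL is added so that "." matches every byte value.

```python
import re

# Pattern transcribed from the quoted message as archived, compiled as a
# bytes pattern. re.DOTALL is an assumption so "." matches any byte,
# including 0x0A.
XTEXT_RE = re.compile(
    rb"\A(\0*.){1,4}\0*(x\0*t\0*e\0*x\0*t\0* |\xA7\xA3\x85\xA7\xA3\x40)",
    re.DOTALL,
)


def looks_like_xtext(data: bytes) -> bool:
    """Return True if the byte string matches the quoted pattern."""
    return XTEXT_RE.match(data) is not None


# Hypothetical declaration text, used only to exercise the pattern.
SAMPLE = "#!xtext sample"

# Try the same text in several encodings, covering both byte orders
# and one EBCDIC code page (cp037).
for codec in ("ascii", "utf-16-le", "utf-16-be",
              "utf-32-le", "utf-32-be", "cp037"):
    print(codec, looks_like_xtext(SAMPLE.encode(codec)))

# Nothing restricts the 1-4 leading bytes, so ordinary text can match too.
print(looks_like_xtext(b"my xtext notes"))
```

With these inputs, both byte orders of UTF-16 and UTF-32 match, because the \0* terms allow null bytes on either side of each letter. The last line shows that ordinary text such as "my xtext notes" is accepted as well, since the pattern places no restriction on the leading bytes.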