Re: XML parser using lex & yacc
> I want to develop an XML parser in C or maybe C++ for an
> undergraduate university project. My approach will be to prototype
> the parser using flex and bison. As I understand it, flex won't be
> able to handle all of the character encodings required in the
> 1.0 spec.

Using your own lexer may be the best approach, but all the "syntax characters" of XML are plain ASCII, so it might well be possible to use [f]lex to tokenise it.

For UTF-8 it is straightforward: the lexer doesn't even have to know that the multibyte characters are not just multiple characters - the next level up can translate them. Or you might be able to replace the lexer's input functions and change its character type to integer (if it isn't already); this would work for UTF-16 (the other required encoding) too.

The most obvious problem with using yacc/lex type tools for XML is that keywords aren't always keywords. For example, in some places in the DTD "SYSTEM" is a keyword and in others it would just be a name. You can have the parser switch the lexer between states, but it's not pretty.

-- Richard

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message; (un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message; subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
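The two points in the reply above (letting UTF-8 bytes pass through the lexer undecoded, and having the parser tell the lexer when "SYSTEM" counts as a keyword) can be illustrated with a small hand-written tokeniser in C. This is only a sketch under stated assumptions, not the approach of any particular parser: the names `TokKind`, `Token`, `Lexer`, `is_name_byte`, `next_token` and the `keyword_ok` flag are all hypothetical, the input is assumed to be a NUL-terminated UTF-8 buffer, and the name rules are deliberately looser than the XML 1.0 grammar.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds for this sketch; not taken from any real parser. */
typedef enum { TOK_EOF, TOK_NAME, TOK_SYSTEM, TOK_LITERAL, TOK_PUNCT } TokKind;

typedef struct {
    TokKind kind;
    const char *start;  /* first byte of the token in the input buffer */
    size_t len;         /* length in bytes, not in characters */
} Token;

typedef struct {
    const char *p;      /* current position in a NUL-terminated UTF-8 buffer */
} Lexer;

/* Bytes >= 0x80 belong to UTF-8 multibyte sequences. The lexer accepts them
   in names without decoding; a later stage would decode and validate them.
   Simplification: name-start and name-continue characters are not
   distinguished, unlike in the XML 1.0 grammar. */
static int is_name_byte(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '_' || c == ':' || c == '-' || c == '.' || c >= 0x80;
}

/* keyword_ok is set by the parser only where the grammar allows the SYSTEM
   keyword; everywhere else the same bytes come back as an ordinary name. */
static Token next_token(Lexer *lx, int keyword_ok)
{
    Token t;
    assert(lx != NULL && lx->p != NULL);

    /* Skip XML whitespace. */
    while (*lx->p == ' ' || *lx->p == '\t' || *lx->p == '\r' || *lx->p == '\n')
        lx->p++;

    t.start = lx->p;
    if (*lx->p == '\0') {
        t.kind = TOK_EOF;
    } else if (is_name_byte((unsigned char)*lx->p)) {
        while (is_name_byte((unsigned char)*lx->p))
            lx->p++;
        t.kind = TOK_NAME;
        if (keyword_ok && lx->p - t.start == 6 &&
            memcmp(t.start, "SYSTEM", 6) == 0)
            t.kind = TOK_SYSTEM;
    } else if (*lx->p == '"' || *lx->p == '\'') {
        /* Quoted literal: scan to the matching quote (or end of input). */
        char quote = *lx->p++;
        while (*lx->p != '\0' && *lx->p != quote)
            lx->p++;
        if (*lx->p == quote)
            lx->p++;
        t.kind = TOK_LITERAL;
    } else {
        /* Any other single byte, such as '<', '!' or '>'. */
        lx->p++;
        t.kind = TOK_PUNCT;
    }
    t.len = (size_t)(lx->p - t.start);
    return t;
}
```

For a declaration such as `<!ENTITY SYSTEM SYSTEM "a.xml">`, a parser built on this sketch would call `next_token(&lx, 0)` while reading the entity name, so the first `SYSTEM` comes back as `TOK_NAME`, and `next_token(&lx, 1)` at the position where an external identifier may begin, so the second comes back as `TOK_SYSTEM`. In flex, the same switching would more likely be done with start conditions; the flag here just makes the dependency on parser context explicit.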