Re: citation processing
At 11:32 AM 10/20/2006, Andrew wrote:
> If you think it's not really feasible to parse a plain text citation into a marked up version, then that's good feedback - it could well be that a percentage need to be done by hand.

Scale is a real issue here. Real-world citation formats include variations like "use 'pp.' on page ranges for articles in books, but not for articles in journals." Even if your process gets 85 of every 100 citations right (a very optimistic rate), a large collection will still contain scores of incorrect ones. And if your upconversion can't recognize where it's failing, you have to find the errors before you can fix them.

David is right: it's ultimately an NLP problem (though a very interesting subset of NLP). As he also says, success depends both on handling the rules properly and on the input actually following those rules. (There are dozens of citation formats around, too.)

"Never say never" is good to keep in mind, but when I'm asked to look at citations, I immediately start asking questions about the scope of the input, its validation, and acceptable strategies for exception handling. When I'm told there won't be any exceptions, it's usually pretty easy to find a bunch.

Cheers,
Wendell
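A minimal Python sketch of the exception-handling point, assuming each citation has already been reduced to a hypothetical `Citation` record (a type label plus first and last page). Extracting those fields from plain text is the hard, NLP-like part and is not shown. The sketch applies the example "pp." rule quoted above and routes anything it cannot classify to a review list instead of guessing. The names `Citation`, `format_pages`, and `process`, and the type labels, are illustrative only and do not come from any library or style guide.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Citation:
    # Hypothetical pre-parsed record; kind is None when the parser
    # could not tell what sort of publication this is.
    kind: Optional[str]  # "book_chapter", "journal_article", or None
    first_page: int
    last_page: int


def format_pages(c: Citation) -> str:
    """Apply the example rule: 'pp.' for articles in books, bare range for journals.

    Raises ValueError for an unrecognized type, so the failure is reported
    rather than silently producing a guessed format.
    """
    page_range = f"{c.first_page}-{c.last_page}"
    if c.kind == "book_chapter":
        return f"pp. {page_range}"
    if c.kind == "journal_article":
        return page_range
    raise ValueError(f"unrecognized citation type: {c.kind!r}")


def process(citations: List[Citation]) -> Tuple[List[str], List[Citation]]:
    """Format what the rules cover; collect the rest for hand correction."""
    formatted: List[str] = []
    needs_review: List[Citation] = []
    for c in citations:
        try:
            formatted.append(format_pages(c))
        except ValueError:
            needs_review.append(c)
    return formatted, needs_review


if __name__ == "__main__":
    cites = [
        Citation("book_chapter", 12, 30),
        Citation("journal_article", 101, 115),
        Citation(None, 5, 9),  # type not recognized by the (hypothetical) parser
    ]
    formatted, review = process(cites)
    print(formatted)    # ['pp. 12-30', '101-115']
    print(len(review))  # 1
```

This only catches failures the code can detect: a journal article misread as a book chapter would still be formatted confidently and wrongly. At 85 correct per 100, a batch of 1,000 citations would still hold about 150 errors, and the silent ones are the ones you have to go looking for.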