Re: Structured from/within unstructured documents
Stephen Green wrote:
> What methods are there, these days, for extracting structured data from
> unstructured documents (such as PDF)?

Maybe I'm missing something, but I didn't see anyone suggest saving the PDF as XML straight from Acrobat. If you have a full licence, it does a pretty respectable job, getting you paragraph and character tagging, tables and images. You can also batch process, converting entire directories or what have you.

The results are at least as good as saving the PDF to something like Word first, and you could be forgiven for expecting that they might even be better.

Once you're that far, you can get on your XSLT boots...

Marcus
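As a rough illustration of the post-processing step (shown in Python rather than XSLT, purely so it runs with the standard library alone), here is a minimal sketch that pulls paragraphs and table rows out of an exported file. The element names (`TaggedPDF-doc`, `Sect`, `P`, `Table`, `TR`, `TD`, `TH`) and the sample content are assumptions modelled on common tagged-PDF structure names; check them against what your Acrobat version actually writes before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of exported XML. The element names are assumptions,
# not a documented schema; inspect a real export and adjust as needed.
SAMPLE = """<TaggedPDF-doc>
  <Sect>
    <P>Invoice 1001</P>
    <Table>
      <TR><TD>Widget</TD><TD>2</TD></TR>
      <TR><TD>Gadget</TD><TD>5</TD></TR>
    </Table>
  </Sect>
</TaggedPDF-doc>"""


def extract_paragraphs(xml_text):
    """Return the text of every P element, in document order."""
    root = ET.fromstring(xml_text)
    return ["".join(p.itertext()).strip() for p in root.iter("P")]


def extract_tables(xml_text):
    """Return each Table as a list of rows, each row a list of cell strings.

    Nested tables are not handled specially: rows of an inner table would
    also be reported as rows of the outer one.
    """
    root = ET.fromstring(xml_text)
    tables = []
    for table in root.iter("Table"):
        rows = []
        for tr in table.iter("TR"):
            # Keep only direct cell children (TD or TH) of this row.
            cells = [
                "".join(cell.itertext()).strip()
                for cell in tr
                if cell.tag in ("TD", "TH")
            ]
            rows.append(cells)
        tables.append(rows)
    return tables


if __name__ == "__main__":
    print(extract_paragraphs(SAMPLE))
    print(extract_tables(SAMPLE))
```

For a real export, read the file with `ET.parse(path).getroot()` instead of parsing a string. An XSLT stylesheet matching the same elements would do the equivalent job if you would rather stay in XML tooling.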