Re: parsing post script
Ghostscript includes a ps2ascii utility for extracting text. It does a reasonable but not 100% accurate job, and it runs on the full Ghostscript PostScript interpreter. If you turn off ps2ascii's simple mode (remove the "-dSIMPLE" argument), Ghostscript outputs font and positioning information for each string. You can use that information to eliminate headers and footers, identify elements to tag, and so forth.

Exegenix (http://exegenix.com/) has a commercial solution for converting PostScript or PDF to XML; it looks intriguing.

--
Larry Kollar
k o l l a r @ a l l t e l . n e t
"The hardest part of all this is the part that requires thinking."
  -- Paul Tyson, on xml-doc

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
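The position-based header/footer filtering mentioned above might look something like this minimal Python sketch. It assumes the non-simple ps2ascii output contains string records of the form `S x y (text) width`, with a bare `P` line at each page break. That record syntax and the coordinate units are assumptions, so check them against the ps2ascii.ps shipped with your Ghostscript version. The names `parse_records` and `strip_headers_footers`, along with the page-height and margin thresholds, are hypothetical.

```python
import re
from dataclasses import dataclass

# ASSUMED record format (verify against your Ghostscript's ps2ascii.ps):
#   S x y (text) width   -- one positioned string
#   P                    -- page break
NUM = r"(-?\d+(?:\.\d+)?)"
STRING_RECORD = re.compile(rf"^S\s+{NUM}\s+{NUM}\s+\((.*)\)\s+{NUM}\s*$")


@dataclass
class Fragment:
    page: int
    x: float
    y: float
    text: str


def parse_records(lines):
    """Yield positioned text fragments (hypothetical helper), counting pages."""
    page = 1
    for line in lines:
        line = line.rstrip("\n")
        if line.strip() == "P":
            page += 1
            continue
        m = STRING_RECORD.match(line)
        if m:
            x, y, text, _width = m.groups()
            # Simplified PostScript unescaping: drop the backslash before
            # any character, so \( -> ( and \) -> ).
            text = re.sub(r"\\(.)", r"\1", text)
            yield Fragment(page, float(x), float(y), text)


def strip_headers_footers(fragments, page_height=792.0, margin=54.0):
    """Drop fragments in the top or bottom margin band (hypothetical helper).

    PostScript puts the origin at the bottom-left, so y grows upward.
    The defaults (US Letter, 0.75 inch margins) are assumptions.
    """
    for f in fragments:
        if margin <= f.y <= page_height - margin:
            yield f


sample = """S 72 760 (Chapter 3 \\(draft\\)) 150
S 72 700 (Body text on page one.) 200
S 300 30 (Page 1) 40
P
S 72 700 (More body text.) 120
"""

kept = list(strip_headers_footers(parse_records(sample.splitlines())))
for f in kept:
    print(f.page, f.text)
# prints:
# 1 Body text on page one.
# 2 More body text.
```

Position is the simplest signal to filter on. The font information in the same output could drive a similar pass, for example treating strings in larger fonts as candidate headings to tag.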