Subject:Is a default adapter - not Tidy - being used here? Author:Doug Lundin Date:04 Apr 2007 11:53 AM
I don't understand which adapter, if any, is being used here. I can convert HTML to XML without using an adapter, but when I choose to use Tidy, I receive an error. Does this mean that Stylus Studio has default functionality, similar to Tidy, that is used if Tidy is not selected? Any assistance is appreciated.
No adapter:
Choose File | Document wizards | XML Editor | HTML to XML
Click ...
Respecify http://biz.yahoo.com/e/070404/fbsi8-k.html
Click Convert to XML using adapter
Click OPEN
Choose HTML-to-XHTML HtmlTidy - leave default properties
Click OK
Click OK
Here is the error that is reported:
Tidy (vers Sep 26, 2004) Parsing "InputStream"
line 73 column 15 - Error: discarding unexpected </form>
line 73 column 30 - Error: discarding unexpected </td>
line 73 column 35 - Error: discarding unexpected </tr>
line 288 column 35 - Error: discarding unexpected <td>
line 308 column 10 - Error: discarding unexpected </form>
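For an automated process, a report like the one above can be tallied before deciding how to handle the page. The following is a minimal sketch, assuming the report keeps the "line N column M - Severity: message" shape shown above; the class name TidyReport and its methods are hypothetical and are not part of Stylus Studio or Tidy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns a Tidy text report into structured entries.
class TidyReport {
    // Matches lines such as:
    //   line 73 column 15 - Error: discarding unexpected </form>
    private static final Pattern ENTRY =
        Pattern.compile("^line (\\d+) column (\\d+) - (\\w+): (.*)$");

    static final class Entry {
        final int line;
        final int column;
        final String severity;
        final String message;

        Entry(int line, int column, String severity, String message) {
            this.line = line;
            this.column = column;
            this.severity = severity;
            this.message = message;
        }
    }

    // Parses every line that has the expected shape; other lines
    // (for example the "Tidy (vers ...)" banner) are skipped.
    static List<Entry> parse(String report) {
        List<Entry> entries = new ArrayList<>();
        for (String raw : report.split("\\r?\\n")) {
            Matcher m = ENTRY.matcher(raw.trim());
            if (m.matches()) {
                entries.add(new Entry(
                    Integer.parseInt(m.group(1)),
                    Integer.parseInt(m.group(2)),
                    m.group(3),
                    m.group(4)));
            }
        }
        return entries;
    }

    // Counts the entries whose severity is exactly "Error".
    static int countErrors(List<Entry> entries) {
        int count = 0;
        for (Entry e : entries) {
            if ("Error".equals(e.severity)) {
                count++;
            }
        }
        return count;
    }
}
```

Calling TidyReport.parse on the captured report text and then TidyReport.countErrors on the result gives a number that a deployment script could log or compare against a limit of its own choosing.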
Subject:Is a default adapter - not Tidy - being used here? Author:Tony Lavinio Date:04 Apr 2007 08:24 PM
Doing this:
Choose File | Document wizards | XML Editor | HTML to XML
does use HTML Tidy, but with some non-default settings. It
also uses a more sophisticated version of Tidy.
Using the adapter, try changing the errors= property to see
if it will let you open the HTML file. The adapter uses a
slightly different version of Tidy because of the limitations
imposed by inserting it as a layer between the filesystem
and the editor.
java.io.IOException: Premature end of file. {AdapterFile.copyToFile}
at com.stylusstudio.adapter.fs.AdapterFile.checkException(AdapterFile.java:402)
at com.stylusstudio.adapter.fs.AdapterFile.copyToFile(AdapterFile.java:485)
Caused by: org.xml.sax.SAXParseException: Premature end of file.
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at com.stylusstudio.adapter.simple.HTML.toXML(HTML.java:74)
at com.stylusstudio.adapter.AdapterBase.toXML(AdapterBase.java:248)
at com.stylusstudio.adapter.fs.AdapterRunner.run(AdapterRunner.java:56)
at java.lang.Thread.run(Unknown Source)
Subject:Is a default adapter - not Tidy - being used here? Author:Tony Lavinio Date:05 Apr 2007 09:35 AM
After digging through the HTML on Yahoo!, I don't think the Tidy
version we use in the XML Converter will work for you.
Might I suggest looking into TagSoup? It can be used as a
replacement for the parser, and has a more aggressive error-recovery
policy. See http://home.ccil.org/~cowan/XML/tagsoup/
You could do this to use it with Stylus Studio:
1. Convert the Yahoo! page manually, and save it as XML.
2. Build your maps.
3. Deploy, but using TagSoup as the parser.
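Step 3 could be sketched roughly as follows. This is a minimal illustration, not the Stylus Studio deployment code: it assumes the deployed process can accept any SAX XMLReader, and the class name HtmlToXml and the method convert are hypothetical. The sketch uses only the standard JAXP classes, so the parser is passed in as a parameter.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper: serializes whatever the given SAX parser reports as XML text.
class HtmlToXml {
    static String convert(XMLReader reader, InputSource html)
            throws TransformerException {
        // A Transformer created without a stylesheet copies its input unchanged.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        // Force XML output; a root element named "html" could otherwise
        // select the HTML output method.
        identity.setOutputProperty(OutputKeys.METHOD, "xml");
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        // SAXSource pairs the chosen parser with the document to read.
        identity.transform(new SAXSource(reader, html), new StreamResult(out));
        return out.toString();
    }
}
```

With the TagSoup jar on the classpath, the reader argument would be new org.ccil.cowan.tagsoup.Parser(), which implements XMLReader, and the input could be new InputSource("http://biz.yahoo.com/e/070404/fbsi8-k.html"). The resulting XML can then be fed to the maps built in step 2.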
Subject:Is a default adapter - not Tidy - being used here? Author:Doug Lundin Date:05 Apr 2007 09:42 AM
I appreciate the option, but I am wondering whether I can leverage the Java API and your more sophisticated version of Tidy. This will be deployed as an automated process, so converting HTML into XML manually is not an option.
Subject:Is a default adapter - not Tidy - being used here? Author:Tony Lavinio Date:06 Apr 2007 12:40 AM
The problem is that the version of Tidy used in the wizard is the
'C' version, and the version used in the Java adapters is JTidy,
which is a subset of the 'C' version.