XML Editor
Subject: Strange 'invisible' characters
Author: Richard Potts
Date: 13 Jun 2008 06:18 AM
Hi guys, I'm receiving an XML extract from a database. I use the extract to do decoding and importing into MS-Excel for downstream users.

In Excel the formatting is getting screwed up, i.e. unexpected newlines are appearing. I've traced this to some entries in the XML. See the attached example.

I notice that the SS text editor viewer puts the CDATA closing brackets on a new line for my 'strange' entries.

I believe there are invisible chars being exported by the database in the XML, and because the XML is very large I want to automatically identify all 'strange' entries so I can ask the database team to correct them. Can I do this in SS?

If it's not possible, perhaps it could be a new feature, as I'm sure other XML guys are fed 'rubbish' by their upstream suppliers and need to identify/eliminate such issues.

Thanks in advance



Using Stylus Studio 2008 Enterprise R 2


Attachment: UnknownStrangeChars.xml (Example with strange/invisible chars)

Subject: Strange 'invisible' characters
Author: Tony Lavinio
Date: 13 Jun 2008 08:20 AM
There are no strange or invisible characters in the file you sent.

There are 22 tabs, 18 linefeeds, 18 carriage returns, 29 spaces, and
everything else is a printable character. There are no ampersands,
and therefore no other characters expressed as &#nnn; or &#xnnn; or
as character entities.

So what does this mean? It's possible the receiving side is just
expecting linefeeds and doesn't like the carriage returns.

But it's more likely that since the CR+LF pairs are part of the
content of the DESCRIPTION element in the CDATA wrapper, they are
getting imported, and they are the source of your extra lines.
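
A tally like the one above can be reproduced outside the editor. This is a minimal Python sketch, not the method used for the counts in this post; the function name and the sample bytes are made up for illustration. It works on raw bytes so that CR and LF are counted separately, and it assumes a single-byte (ASCII-compatible) encoding.

```python
from collections import Counter

def tally_characters(data: bytes) -> dict:
    """Count tabs, linefeeds, carriage returns, spaces, and everything else."""
    names = {0x09: "tab", 0x0A: "linefeed", 0x0D: "carriage return", 0x20: "space"}
    counts = Counter()
    for byte in data:
        if byte in names:
            counts[names[byte]] += 1
        elif 0x21 <= byte <= 0x7E:
            counts["printable"] += 1   # visible ASCII
        else:
            counts["other"] += 1       # control bytes or non-ASCII bytes
    return dict(counts)

# Hypothetical sample; a real run would read the file with open(path, "rb").read()
sample = b"<DESCRIPTION><![CDATA[Some text\r\n]]>\r\n\t</DESCRIPTION>"
print(tally_characters(sample))
```

If "other" never shows up in the result, the file contains nothing beyond ordinary whitespace and printable ASCII.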

Subject: Strange 'invisible' characters
Author: (Deleted User)
Date: 13 Jun 2008 08:23 AM
Hi Richard,
the XML you posted doesn't have invalid chars. You can look for them by pressing Ctrl-F, checking the 'use regular expression' check box, and entering the search pattern "[^\x09-\x7E]" (without quotes). The end of the collapsible region is on the line following the end of the CDATA expression because the region is for the DESCRIPTION element; the CDATA doesn't have a region of its own because it doesn't span at least 3 lines.
Given this, it could be that the extra new line you see in Excel is an artifact of the transformation you perform, maybe caused by the extra new line located between the end of the CDATA and the end of the DESCRIPTION element.
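
For a file too large to step through in the editor, the same character class can be applied in a script. This is a rough Python sketch using the pattern quoted above; the function name is hypothetical. Note that the class is only a quick screen: it also flags legitimate non-ASCII text such as accented letters, and it lets through a few control characters (for example 0x0B and 0x0C) that XML 1.0 does not allow.

```python
import re

# Same character class as the search pattern quoted above.
OUTSIDE_RANGE = re.compile(r"[^\x09-\x7E]")

def find_out_of_range(text: str):
    """Return (line, column, code point) for each character outside 0x09..0x7E."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in OUTSIDE_RANGE.finditer(line):
            code_point = f"U+{ord(match.group()):04X}"
            hits.append((line_no, match.start() + 1, code_point))
    return hits

# Hypothetical sample with one accented letter on the second line.
for hit in find_out_of_range("plain text\r\ncaf\u00e9\r\n"):
    print(hit)
```

An empty result means every character falls inside the range, which matches what the Ctrl-F search reports for the posted file.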

Hope this helps,
Alberto

Subject: Strange 'invisible' characters
Author: Richard Potts
Date: 16 Jun 2008 04:56 AM
Thanks guys. Yes, I'm not expecting newlines in the CDATA sections, and that was what caused the issue.

So is there a regular expression or other mechanism I can use to look for the CR LF pairs that are part of the CDATA section? (e.g. to find the 2nd and 3rd entries in the example file)

I clicked on the link in the SS help, http://www.boost.org/libs/regex/doc/syntax.html (in the section "Moving Around in XML Documents"), to learn more about regular expressions, and it's a broken link.

** update *** I figured it out from looking at other web pages namely:
http://www.codeproject.com/KB/string/re.aspx
(Posted here to help others with a 'solution')

the regular expression = "\n]" (without the quotes)
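
As far as I can tell, that expression matches a linefeed immediately followed by "]", i.e. a line break sitting right before the closing "]]>" of a CDATA section. It would not catch a line break in the middle of the CDATA text. A minimal Python sketch that reports any CDATA section containing a CR or LF anywhere is shown here; the function name and the sample data are made up for illustration.

```python
import re

# One CDATA section; DOTALL lets "." run across line breaks inside it.
CDATA = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)

def cdata_with_line_breaks(xml_text: str):
    """Return (line number, content) for each CDATA section holding CR or LF."""
    hits = []
    for match in CDATA.finditer(xml_text):
        content = match.group(1)
        if "\r" in content or "\n" in content:
            # Line number of the opening "<![CDATA[" in the source text.
            line_no = xml_text.count("\n", 0, match.start()) + 1
            hits.append((line_no, content))
    return hits

# Hypothetical sample: the second DESCRIPTION has a CR LF inside its CDATA.
sample = (
    "<ROOT>\r\n"
    "<DESCRIPTION><![CDATA[fine]]></DESCRIPTION>\r\n"
    "<DESCRIPTION><![CDATA[bad\r\n]]></DESCRIPTION>\r\n"
    "</ROOT>"
)
for line_no, content in cdata_with_line_breaks(sample):
    print(line_no, repr(content))
```

When reading a real file for this, open it with newline="" (for example open(path, newline="")) so that Python leaves the CR LF pairs as they are instead of translating them to plain linefeeds.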

Using this expression I found that there are hundreds of such entries in my source data, and it will probably take a long time for this to get fixed. So I'll have to get 'defensive' in my XSL: my next task is to figure out whether there is a newline in the resulting string from my <xsl:select...> and, if so, strip it off.

- looks like "normalize-space()" is the way to go.
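
For checking what the cleaned-up values should look like outside the stylesheet, here is a small Python sketch of what XPath's normalize-space() does to a string: it strips leading and trailing whitespace and collapses each run of space, tab, CR and LF into a single space. This is an emulation for illustration, not the XSLT itself, and the sample value is made up. Be aware that it also flattens line breaks in the middle of the text, not just the trailing one.

```python
import re

# XML whitespace: space, tab, carriage return, linefeed.
XML_WHITESPACE = re.compile(r"[ \t\r\n]+")

def normalize_space(value: str) -> str:
    """Emulate XPath normalize-space(): collapse whitespace runs, trim both ends."""
    return XML_WHITESPACE.sub(" ", value).strip(" ")

# Hypothetical DESCRIPTION value with a stray CR LF before the end of the CDATA.
print(repr(normalize_space("  Pump \r\n housing\t\r\n")))
```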

Using Stylus Studio 2008 Enterprise R 2

 