SSDN - Strange 'invisible' characters

Richard Potts

Subject: Strange 'invisible' characters
Author: Richard Potts
Date: 13 Jun 2008 06:18 AM

Hi guys, I'm receiving an XML extract from a database. I use the extract to do decoding and importing into MS-Excel for downstream users.

In Excel the formatting is getting screwed up, i.e. unexpected newlines are appearing. I've traced this to some entries in the XML. See attached example.

I notice that the SS text editor viewer puts the CDATA closing brackets on a new line for my 'strange' entries.

I believe there are invisible chars being exported by the database in the XML, and I want to automatically identify all 'strange' entries (as the XML is very large) so I can ask the database team to correct them. Can I do this in SS?

If not possible, perhaps it could be a new feature as I'm sure other xml guys are fed 'rubbish' from their upstream suppliers and need to identify/eliminate such issues.

Thanks in advance

Using Stylus Studio 2008 Enterprise R 2

StrangeChars.xml
Example with strange/invisible chars

Tony Lavinio

Subject: Strange 'invisible' characters
Author: Tony Lavinio
Date: 13 Jun 2008 08:20 AM

There are no strange or invisible characters in the file you sent.

There are 22 tabs, 18 linefeeds, 18 carriage returns, 29 spaces, and
everything else is a printable character. There are no ampersands,
and therefore no other characters expressed as &#nnn; or &#xnnn; or
as character entities.

So what does this mean? It's possible the receiving side is just
expecting linefeeds and doesn't like the carriage returns.

But it's more likely that since the CR+LF pairs are part of the
content of the DESCRIPTION element in the CDATA wrapper, they are
getting imported, and they are the source of your extra lines.
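
A tally like the one above can be reproduced outside the editor. The following is only a minimal Python sketch, not the method used in this reply: the function name `tally_characters` and the sample string are hypothetical, and "printable" here means whatever Python's `str.isprintable()` accepts.

```python
from collections import Counter


def tally_characters(text: str) -> dict:
    """Count tabs, linefeeds, carriage returns, spaces, printable and other characters."""
    counts = Counter()
    for ch in text:
        if ch == "\t":
            counts["tab"] += 1
        elif ch == "\n":
            counts["linefeed"] += 1
        elif ch == "\r":
            counts["carriage return"] += 1
        elif ch == " ":
            counts["space"] += 1
        elif ch.isprintable():
            counts["printable"] += 1
        else:
            # Control characters, zero-width characters, non-breaking spaces, etc.
            counts["other"] += 1
    return dict(counts)


# Hypothetical sample: a CR+LF pair sits inside the CDATA content.
sample = "<DESCRIPTION><![CDATA[Widget\r\n]]></DESCRIPTION>\r\n"
print(tally_characters(sample))
```

When reading a real file, opening it with `open(path, newline="")` keeps the carriage returns intact; Python's default text mode would translate CR+LF to a single linefeed and hide them from the count.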

(Deleted User)

Subject: Strange 'invisible' characters
Author: (Deleted User)
Date: 13 Jun 2008 08:23 AM

Hi Richard,
the XML you posted doesn't have invalid chars (you can look for them by pressing Ctrl-F, checking the 'use regular expression' check box and entering the search pattern "[^\x09-\x7E]" - without quotes). The end of the collapsible region is on the line following the end of the CDATA expression because the region is for the DESCRIPTION element (the CDATA doesn't have a region of its own because it doesn't span at least 3 lines).
Given this, it could be that the extra new line you see in Excel is an artifact of the transformation you perform, maybe caused by the line break located between the end of the CDATA and the end of the DESCRIPTION element.

Hope this helps,
Alberto
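
For files too large to search by hand, the same character class can be applied in a script. This is a rough Python sketch that reuses the pattern quoted above; the function name `find_outside_range` and the sample text are made up for illustration.

```python
import re

# Same character class as the search pattern above: anything outside
# the range TAB (0x09) through tilde (0x7E).
OUTSIDE_RANGE = re.compile(r"[^\x09-\x7E]")


def find_outside_range(text: str) -> list:
    """Return (line, column, code point) for each character outside the range."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in OUTSIDE_RANGE.finditer(line):
            code_point = f"U+{ord(match.group()):04X}"
            hits.append((line_no, match.start() + 1, code_point))
    return hits


# Hypothetical sample containing a non-breaking space and a zero-width space.
sample = "ok\nbad\u00a0here\n\u200b"
for hit in find_outside_range(sample):
    print(hit)
```

Note that the range 0x09 to 0x7E also covers linefeed, carriage return and the control characters 0x0B to 0x1F, so this pattern does not report any of those.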

Richard Potts

Subject: Strange 'invisible' characters
Author: Richard Potts
Date: 16 Jun 2008 04:56 AM

Thanks guys. Yes, I'm not expecting newlines in the CDATA sections, and that was what caused the issue.

So is there a regular expression or other mechanism I can use to look for the CR LF that are part of the CDATA section? (e.g. find the 2nd and 3rd entries in the example file)

I clicked on the link in the SS help http://www.boost.org/libs/regex/doc/syntax.html (in the section "Moving Around in XML Documents") to learn more about regular expressions, and it's a broken link.

*** update *** I figured it out from looking at other web pages, namely:
http://www.codeproject.com/KB/string/re.aspx
(Posted here to help others with a 'solution')

the regular expression = "\n]" (without the quotes)
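
The expression matches a line break immediately followed by a closing bracket, which is what a CDATA section looks like when its content ends in a newline. Below is a small Python sketch of the same search for use outside the editor; it assumes the bracket is escaped as `\]` for clarity, and the function name, the ROW element and the sample data are hypothetical.

```python
import re

# A linefeed immediately followed by "]": the start of a CDATA closing
# "]]>" that has been pushed onto a new line.
NEWLINE_BEFORE_BRACKET = re.compile(r"\n\]")


def lines_ending_before_bracket(text: str) -> list:
    """Return the 1-based numbers of lines whose line break is followed by ']'."""
    return [
        text.count("\n", 0, match.start()) + 1
        for match in NEWLINE_BEFORE_BRACKET.finditer(text)
    ]


# Hypothetical sample: the second entry has CR+LF inside its CDATA section.
sample = (
    "<ROW><DESCRIPTION><![CDATA[Clean entry]]></DESCRIPTION></ROW>\r\n"
    "<ROW><DESCRIPTION><![CDATA[Broken entry\r\n"
    "]]></DESCRIPTION></ROW>\r\n"
)
print(lines_ending_before_bracket(sample))
```

A line break in the middle of the CDATA text, rather than right before the closing brackets, would not be caught by this pattern.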

Using this expression I found that there are hundreds of such entries in my source data, and it will probably take a long time for this to get fixed. So I'll have to get 'defensive' in my XSL: my next task is to figure out whether there is a newline in the resulting string from my <xsl:select...> and, if so, strip it off.

- looks like "normalize-space()" is the way to go.
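
To see what normalize-space() would do to these values before changing the stylesheet, its behaviour can be approximated in a few lines. The following Python sketch is an approximation written for illustration, not part of any XSLT processor: it strips leading and trailing whitespace and collapses each run of space, tab, CR and LF into one space. The helper name, the ROWS wrapper and the sample document are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# The four XML whitespace characters: space, tab, carriage return, linefeed.
_XML_WHITESPACE = re.compile(r"[ \t\r\n]+")


def normalize_space(value: str) -> str:
    """Approximate XPath normalize-space(): collapse whitespace runs, trim both ends."""
    return _XML_WHITESPACE.sub(" ", value).strip(" ")


# Hypothetical document with a line break inside the CDATA section.
doc = "<ROWS><DESCRIPTION><![CDATA[Blue widget\r\n]]></DESCRIPTION></ROWS>"
root = ET.fromstring(doc)
for element in root.iter("DESCRIPTION"):
    raw = element.text or ""
    print(repr(raw), "->", repr(normalize_space(raw)))
```

One thing to keep in mind: normalize-space() also collapses line breaks and repeated spaces in the middle of a value, so a description that is meant to contain several lines would come out as a single line.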

Using Stylus Studio 2008 Enterprise R 2
