
Re: Binary versus Text

  • From: Rick Jelliffe <rjelliffe@allette.com.au>
  • To: "Costello, Roger L." <costello@mitre.org>
  • Date: Wed, 27 Nov 2013 01:37:17 +1100

I would say that a text file is one whose bytes, when read sequentially, undergo a simple transformation into a sequence of characters in one or more character repertoires (lists), fully consuming all bytes with none remaining, except any file-termination codes. This transformation may be a direct mapping using the values of the bytes, may involve mapping sequences of bytes to some other number (e.g. UTF-8), or may involve a simple state machine (e.g. ISO 2022), but surely nothing requiring a stack or random access. The result, and the initial objective, of parsing the file is a single sequence of characters.
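
As a rough sketch of that definition (not anything from the thread itself), the check below asks whether a strict decoder consumes every byte of a file in one sequential pass. The helper name decode_as_text and the choice to treat trailing DOS Ctrl-Z (0x1A) bytes as the only file-termination code are assumptions made for illustration. Python's utf_8 and iso2022_jp codecs stand in for the "byte sequences to numbers" and "simple state machine" cases.

```python
from typing import Optional


def decode_as_text(data: bytes, encoding: str,
                   terminators: bytes = b"\x1a") -> Optional[str]:
    """Hypothetical helper: return the character sequence if every byte
    maps to characters under `encoding`, otherwise None."""
    # Assumption: trailing file-termination codes (here only DOS Ctrl-Z)
    # are allowed and are not part of the text.
    body = data.rstrip(terminators)
    try:
        # errors="strict" makes any unmappable byte raise, so success
        # means all remaining bytes were consumed into characters.
        return body.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        return None


print(decode_as_text("héllo".encode("utf-8"), "utf-8"))
print(decode_as_text(b"\xff\xfe\xfa", "utf-8"))       # 0xFF never occurs in UTF-8
print(decode_as_text("日本語".encode("iso2022_jp"), "iso2022_jp"))
```

Note that the decoders involved only ever move forward through the bytes, keeping at most a small amount of state (the current ISO 2022 shift state), which is the "no stack, no random access" property described above.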

I would say that a binary file, when the term is used in distinction to "text file", is one which uses potentially more complex transformations, where the result and initial objective of parsing the file will be a data structure or event stream.

So a ZIP file containing an uncompressed XML file is not a text file, because there are some bytes that are not intended to map to characters. But a file with a single DNA sequence as a packed string probably counts as a text file.
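
A packed DNA sequence can still meet the text criterion because its mapping is a fixed, sequential byte-to-characters rule. The sketch below assumes a hypothetical 2-bit code (A=00, C=01, G=10, T=11, four bases per byte) and a sequence length that is a multiple of four; real packed formats differ in their details.

```python
BASES = "ACGT"  # hypothetical 2-bit code: A=00, C=01, G=10, T=11


def pack(seq: str) -> bytes:
    """Pack four bases into each byte, most significant bits first."""
    if len(seq) % 4:
        raise ValueError("this sketch assumes a length that is a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for ch in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(ch)  # raises ValueError on non-ACGT
        out.append(byte)
    return bytes(out)


def unpack(data: bytes) -> str:
    """One O(n) pass: every byte value 0..255 yields exactly four bases."""
    chars = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            chars.append(BASES[(byte >> shift) & 0b11])
    return "".join(chars)


print(unpack(pack("GATTACAA")))
```

Every possible byte value decodes, no byte is left over, and the output is a single sequence of characters from the repertoire {A, C, G, T}.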

{You might say that therefore a file containing artificial languages like markup is a text file that is also like a binary file (in that you end up with a data structure or event stream.)}

Text and binary are also names used for different modes in some applications: e.g. in FTP a text file may have its newlines replaced with platform-specific newlines (à la text/* MIME types) and perhaps even be transcoded, while a binary file will be kept byte-for-byte intact (à la application/* MIME types). This usage for modes should not cloud the usage relating to files.
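
A simplified sketch of the two transfer modes follows, assuming CRLF as the on-the-wire line ending for text (ASCII) mode, as in FTP, and leaving out transcoding. The function names are hypothetical. Running binary data through text mode shows why the distinction matters: the PNG file signature deliberately contains CR LF and LF bytes so that such newline rewriting can be detected.

```python
def text_mode_send(local: bytes) -> bytes:
    """Normalise local newlines to CRLF on the wire (simplified ASCII mode)."""
    return local.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def text_mode_receive(wire: bytes, local_newline: bytes = b"\n") -> bytes:
    """Replace wire CRLFs with the receiving platform's newline."""
    return wire.replace(b"\r\n", local_newline)


def binary_mode(data: bytes) -> bytes:
    """Binary (image) mode: bytes pass through untouched."""
    return data


png_signature = b"\x89PNG\r\n\x1a\n"
print(text_mode_send(png_signature) == png_signature)  # newline rewriting alters it
print(binary_mode(png_signature) == png_signature)
```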

So the test of a text file is not "can I read it?" but "is it intended to be a sequence of characters from some repertoire, with a 'simple' O(n) sequential mapping from the bytes?"
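
A quick way to see why mere readability is the wrong test: ISO-8859-1 (Latin-1) assigns a character to each of the 256 byte values, so any file at all "decodes" under it, including the ZIP archive mentioned above. The sample_zip helper is hypothetical and only builds a small uncompressed archive for the demonstration.

```python
import io
import zipfile


def sample_zip() -> bytes:
    # An uncompressed (ZIP_STORED) archive holding one small XML document.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("doc.xml", "<doc>hello</doc>")
    return buf.getvalue()


# Latin-1 maps each byte value to exactly one character, so decoding never fails.
every_byte = bytes(range(256))
all_chars = every_byte.decode("latin-1")

data = sample_zip()
chars = data.decode("latin-1")  # succeeds, yet the archive is not a text file
print(data[:4], len(chars) == len(data), chars.encode("latin-1") == data)
```

Decoding succeeds, but the header bytes were never intended to be characters, which is the distinction the question of intent captures.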

Something like that.

Cheers
Rick

On Mon, Nov 25, 2013 at 1:25 AM, Costello, Roger L. <costello@mitre.org> wrote:

Hi Folks,

Distinguishing "text" versus "binary" is important.

On October 30 we had a discussion titled, "Is the binary file format dead?"

During that discussion John Cowan made an excellent distinction between binary and text files. I thought it would be useful to summarize the distinction.

The universe of computer files falls into two categories:

1. Binary files

2. Text files

By convention we normally restrict "binary" to files which are not interpretable as streams of characters. [John Cowan]

The word "text" is applied to files which are interpretable as streams of characters.

Of course any text file is also a binary file, since the class of text files is obtained from the class of binary files by applying restrictions. But it would be confusing to call a text file a binary file; it would be like calling a cat a mammal: correct but imprecise.

/Roger
