
Re: Why would MS want to make XML break on UNIX, Perl, Python

Rick Jelliffe wrote:

> As I understand it, a file opened in text mode through stdio may have embedded ^D (UNIX) 
> or ^Z (PC) converted to EOF by the standard library routines that read/write from/to
> stdio and present them to the application. This is independent of terminal signals,
> such as sending ^D to a shell.

On UNIX systems, no.  The UNIX stdio library makes no distinction
between text mode and binary mode.  What handles ^D as an EOF indication
is the tty line discipline module in the kernel, not the stdio library.
If the tty is in canonical mode (line-at-a-time reading mode) and the
user types the EOF character at the beginning of a line, the read()
system call returns 0 without error, and the stdio library sets the EOF
flag on the FILE structure attached to the tty.  After that, stdio
functions such as fgetc() return EOF until the EOF flag is cleared.  You
can change the EOF character with the stty utility, for example
"stty eof '^X'".  Please note that the EOF character is a property of
the tty only; it does not apply to files or pipes.  If the tty is in raw
mode (character-at-a-time reading mode), any character, including ^D and
NUL, can be read.  See stty(1), termio(7) and termios(7) for more
detail.
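
As a rough illustration of the point that the EOF character belongs to
the tty, here is a minimal sketch of what "stty eof '^X'" amounts to at
the C level, using the POSIX termios interface.  The function name
set_tty_eof_char is my own invention for this example, not part of any
library.

```c
#include <assert.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: set the EOF character of the tty referred to by fd.
   Returns 0 on success, or -1 if fd is not a tty or the call fails. */
int set_tty_eof_char(int fd, cc_t ch)
{
    struct termios t;

    assert(fd >= 0);

    /* tcgetattr() fails (ENOTTY) for regular files and pipes:
       they have no line discipline and therefore no EOF character. */
    if (tcgetattr(fd, &t) == -1)
        return -1;

    t.c_cc[VEOF] = ch;   /* consulted only in canonical mode (ICANON) */
    return tcsetattr(fd, TCSANOW, &t);
}
```

Calling set_tty_eof_char(STDIN_FILENO, 0x18) on an interactive terminal
should have the same effect as the stty command above; on a pipe or a
regular file it returns -1, because there is no tty to configure.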

On DOS systems, however, the stdio library is responsible for ^Z
handling.  If a file is opened in text mode and a ^Z is found in the
file, stdio library functions such as fgetc() return EOF.  Please note
that this ^Z handling is a backward compatibility behavior, and text
files are not required to contain ^Z at the end.  A text file without
^Z at the end is perfectly legal on DOS.

Historically, ^Z as the end-of-file indication was required on CP/M
systems, where file size is recorded in multiples of the sector size
(usually 128 bytes).  DOS systems, where file size is recorded as a
number of bytes, do not require such an EOF byte, but treat ^Z as EOF
for CP/M compatibility.  UNIX systems have no such thing as an "EOF
byte" in text files.
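
The claim that UNIX has no "EOF byte" is easy to check.  The following
is a small sketch (the helper name count_bytes_read and the file name
used in the note below are only for illustration) that writes some bytes
to a file and counts how many fgetc() delivers before it reports EOF.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write `len` bytes to `path` in binary mode, then
   reopen the file with the given fopen() mode and count the bytes that
   fgetc() returns before EOF.  Returns the count, or -1 on error. */
long count_bytes_read(const char *path, const char *mode,
                      const unsigned char *data, size_t len)
{
    FILE *fp;
    long n = 0;

    assert(path != NULL && mode != NULL && data != NULL);

    fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    if (fwrite(data, 1, len, fp) != len) {
        fclose(fp);
        return -1;
    }
    if (fclose(fp) != 0)
        return -1;

    fp = fopen(path, mode);      /* "r" = text mode, "rb" = binary mode */
    if (fp == NULL)
        return -1;
    while (fgetc(fp) != EOF)     /* an embedded ^D is just another byte */
        n++;
    fclose(fp);
    return n;
}
```

With the five bytes 'a', 'b', 0x04, 'c', 'd' written to a scratch file
such as "eof_demo.tmp", both "r" and "rb" return 5 on a UNIX system: the
embedded ^D is read like any other byte.  On DOS I would expect a file
containing 0x1A instead to stop early in "r" mode, as described above.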

The DOS system call for terminal input also interprets ^Z as the EOF
character.  There is no equivalent of stty on DOS, so you can't change
the EOF character.  You can read a ^Z character if you use a BIOS call
for keyboard input.

Please note that "text mode" was introduced into the C language to
accommodate systems whose line ending convention differs from that of
UNIX.  On DOS, CRLF is converted to LF (\n) in text mode.  You can also
read text files in "binary mode", in which case the CRLF-to-LF mapping
and the ^Z-as-EOF handling are both disabled.
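
To make the two text-mode translations concrete, here is a portable
sketch that emulates them on a byte buffer.  It is only an emulation
written for this message, not the code of any real DOS C runtime; the
name dos_text_translate is made up, and passing a lone CR through
unchanged is my assumption.

```c
#include <assert.h>
#include <stddef.h>

#define CTRL_Z 0x1A

/* Hypothetical emulation of DOS text-mode input translation:
   CRLF becomes LF, and the first ^Z ends the input.
   `out` must have room for at least n bytes.  Returns the output length. */
size_t dos_text_translate(const unsigned char *in, size_t n,
                          unsigned char *out)
{
    size_t i, o = 0;

    assert(in != NULL && out != NULL);

    for (i = 0; i < n; i++) {
        if (in[i] == CTRL_Z)
            break;               /* treated as end of file */
        if (in[i] == '\r' && i + 1 < n && in[i + 1] == '\n')
            continue;            /* drop the CR; the LF is copied next */
        out[o++] = in[i];        /* a lone CR is passed through (assumption) */
    }
    return o;
}
```

Given the six input bytes 'a', CR, LF, 'b', ^Z, 'c', the function yields
the three bytes 'a', LF, 'b'.  Reading the same data in "binary mode"
corresponds to skipping this step entirely and seeing all six bytes.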

NUMATA Toshinori
XML Application Technology Development Dept., PROJECT-A XML,
Phone: +81-45-476-4637 (x4673)	Fax: +81-45-476-4734

