Re: Why would MS want to make XML break on UNIX, Perl, Python
Rick Jelliffe wrote:
> As I understand it, a file opened in text mode through stdio may have embedded ^D (UNIX)
> or ^Z (PC) converted to EOF by the standard library routines that read/write from/to
> stdio and present them to the application. This is independent of terminal signals,
> such as sending ^D to a shell.

On UNIX systems, no. The stdio library on UNIX makes no distinction between text mode and binary mode. ^D is handled as an EOF indication by the tty line discipline module in the kernel, not by the stdio library.

If the tty is in canonical mode (line-at-a-time reading mode) and the user types the EOF character at the beginning of a line, the read() system call returns 0 without error, and the stdio library sets the EOF flag on the FILE structure attached to the tty. After that, stdio functions such as fgetc() return EOF until the EOF flag is cleared.

You can change the EOF character with the stty utility, for example "stty eof '^X'". Note that the EOF character is a property of the tty only; files and other devices have no such setting. If the tty is in raw mode (character-at-a-time reading mode), any character, including ^D and NUL, can be read. See stty(1), termio(7) and termios(7) for more detail.

On DOS systems, however, the stdio library is responsible for ^Z handling. If a file is opened in text mode and ^Z is found in the file, stdio library functions such as fgetc() return EOF at that point.

Note that ^Z handling at the end of a file is a backward-compatibility behavior, and text files are not required to contain ^Z at the end. A text file without ^Z at the end is perfectly legal on DOS.

Historically, ^Z as an end-of-file indication was required on CP/M systems, where file size is managed in multiples of the sector size (usually 128 bytes). DOS systems, where file size is managed as a number of bytes, do not require such an EOF byte, but treat ^Z as EOF for CP/M compatibility. UNIX systems have no such thing as an "EOF byte" in text files.
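The file-side behavior described above can be checked with a small C sketch. It is only an illustration under stated assumptions: the helper names (write_bytes, count_until_eof, demo) and the scratch file name "eof_demo.tmp" are made up for this example, and the scratch file is assumed to be writable in the current directory. The sample data is 9 bytes long and contains both ^D (0x04) and ^Z (0x1A).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write len bytes to path in binary mode.
   Returns 0 on success, -1 on failure. */
int write_bytes(const char *path, const unsigned char *data, size_t len)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    size_t written = fwrite(data, 1, len, fp);
    if (fclose(fp) != 0)
        return -1;
    return written == len ? 0 : -1;
}

/* Hypothetical helper: count how many bytes fgetc() delivers before it
   reports EOF when path is opened with the given mode ("r" or "rb"). */
long count_until_eof(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    long n = 0;
    while (fgetc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}

/* Write a 9-byte sample containing ^D and ^Z, then read it back
   with the given mode and return the number of bytes seen. */
long demo(const char *mode)
{
    static const unsigned char sample[] = {
        'a', 'b', 0x04, 'c', 'd', 0x1A, 'e', 'f', '\n'
    };
    const char *path = "eof_demo.tmp";   /* assumed scratch file name */

    assert(mode != NULL);
    if (write_bytes(path, sample, sizeof sample) != 0)
        return -1;
    long n = count_until_eof(path, mode);
    remove(path);
    return n;
}
```

On a UNIX system, demo("r") and demo("rb") should both return 9, since neither ^D nor ^Z is special inside a file. On a DOS-style C library, demo("r") would be expected to stop at the ^Z and return 5, while demo("rb") would still return 9.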
The DOS system call for terminal input also interprets ^Z as the EOF character. There is no equivalent of stty on DOS, so you cannot change the EOF character. You can read a ^Z character if you use a BIOS call for keyboard input.

Note that "text mode" was introduced into the C language to accommodate systems whose line-ending convention differs from that of UNIX. On DOS, CRLF is converted to LF (\n) in text mode. You can also read text files in "binary mode", in which case both the CRLF-to-LF mapping and the ^Z-as-EOF handling are disabled.

-- 
NUMATA Toshinori
XML Application Technology Development Dept., PROJECT-A XML,
Software Group, FUJITSU LIMITED
Phone: +81-45-476-4637 (x4673)
Fax: +81-45-476-4734