TONT 40193 Why is the line terminator CR+LF?


It follows an old tradition.

Original article: https://blogs.msdn.microsoft.com/oldnewthing/20040318-00/?p=40193

This protocol dates back to the days of teletypewriters. CR stands for “carriage return” – the CR control character returned the print head (“carriage”) to column 0 without advancing the paper. LF stands for “linefeed” – the LF control character advanced the paper one line without moving the print head. So if you wanted to return the print head to column zero (ready to print the next line) and advance the paper (so it prints on fresh paper), you needed both CR and LF.

(Translator's note: column 0 is the left edge of the paper.)
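The two control characters can be modeled as independent movements of the print head. The following is a minimal sketch in Python, not a description of any real teletype firmware; the names `move_head` and `final_position` are made up for illustration, and the head position is simply tracked as a (column, line) pair.

```python
CR = 0x0D  # carriage return
LF = 0x0A  # line feed


def move_head(col: int, line: int, byte: int) -> tuple[int, int]:
    """Return the print head position (col, line) after one byte arrives."""
    if byte == CR:
        return 0, line          # back to column 0, paper stays put
    if byte == LF:
        return col, line + 1    # paper advances, head stays in its column
    return col + 1, line        # a printable character moves the head right


def final_position(data: bytes) -> tuple[int, int]:
    """Feed a whole byte string to the simulated head, starting at (0, 0)."""
    col, line = 0, 0
    for byte in data:
        col, line = move_head(col, line, byte)
    return col, line


print(final_position(b"AB\r\n"))  # (0, 1): ready to print on a fresh line
print(final_position(b"AB\n"))    # (2, 1): fresh paper, but still at column 2
print(final_position(b"AB\r"))    # (0, 0): would print over the same line
```

Only the CR+LF pair leaves the head at the start of a fresh line, which is the point of the paragraph above.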

If you go to the various internet protocol documents, such as RFC 0821 (SMTP), RFC 1939 (POP), RFC 2060 (IMAP), or RFC 2616 (HTTP), you’ll see that they all specify CR+LF as the line termination sequence. So the real question is not “Why do CP/M, MS-DOS, and Win32 use CR+LF as the line terminator?” but rather “Why did other people choose to differ from these standards documents and use some other line terminator?”

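As an illustration of what these protocols expect on the wire, here is a small Python sketch that assembles an HTTP/1.1 request with every line terminated by CR+LF. The helper name `build_request`, the host `example.com`, and the choice of headers are assumptions made for this example only.

```python
CRLF = b"\r\n"


def build_request(host: str, path: str = "/") -> bytes:
    """Assemble a minimal HTTP/1.1 GET request; each line ends in CR+LF."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
    ]
    # Each header line gets CR+LF; one extra CR+LF (an empty line) ends the headers.
    return CRLF.join(line.encode("ascii") for line in lines) + CRLF + CRLF


request = build_request("example.com")
print(request)
```

A sender that writes bare LF here is relying on the receiver being lenient rather than following the line terminator the specification names.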

Unix adopted plain LF as the line termination sequence. If you look at the stty options, you’ll see that the onlcr option specifies whether a LF should be changed into CR+LF. If you get this setting wrong, you get stairstep text, where


each
  line
    begins

where the previous line left off. So even unix, when left in raw mode, requires CR+LF to terminate lines. The implicit CR before LF is a unix invention, probably as an economy, since it saves one byte per line.

(Translator's note: storage was expensive in those days.)
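The stairstep effect can be reproduced with a short simulation. The Python sketch below assumes a display that treats CR and LF literally. `onlcr` and `render` are hypothetical helper names: `onlcr` only imitates the LF to CR+LF mapping that the stty option of the same name controls, and does not talk to a real terminal driver.

```python
def onlcr(data: bytes) -> bytes:
    """Imitate the onlcr output setting: send every LF as CR+LF."""
    return data.replace(b"\n", b"\r\n")


def render(data: bytes) -> list[str]:
    """Draw bytes on a display where CR and LF are separate movements."""
    rows, col = [""], 0
    for byte in data:
        ch = chr(byte)
        if ch == "\r":
            col = 0                      # back to the left edge, same row
        elif ch == "\n":
            rows.append("")              # next row, column unchanged
        else:
            row = rows[-1]
            # Pad with spaces up to the current column, then place the character.
            rows[-1] = row[:col].ljust(col) + ch + row[col + 1:]
            col += 1
    return rows


text = b"each\nline\nbegins"
print("\n".join(render(text)))           # stairstep: each row starts where the last ended
print("\n".join(render(onlcr(text))))    # every row starts at column 0
```

With the mapping turned off, each row begins at the column where the previous one stopped; with it on, the inserted CR brings every row back to the left edge.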

The unix ancestry of the C language carried this convention into the C language standard, which requires only “\n” (which encodes LF) to terminate lines, putting the burden on the runtime libraries to convert raw file data into logical lines.

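The same division of labor, where the program sees only "\n" and a library layer converts to and from the bytes stored in the file, can be observed in Python's text I/O layer. This is an analogy sketched with `io.TextIOWrapper`, not the C runtime itself; the helper names `write_text` and `read_text` are made up for the example.

```python
import io


def write_text(text: str, newline: str) -> bytes:
    """Write text through a text layer and return the raw bytes it produced."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    wrapper.write(text)
    wrapper.flush()
    return raw.getvalue()


def read_text(data: bytes) -> str:
    """Read raw bytes through a text layer with universal newline translation."""
    wrapper = io.TextIOWrapper(io.BytesIO(data), encoding="ascii", newline=None)
    return wrapper.read()


logical = "one\ntwo\n"
print(write_text(logical, newline="\r\n"))   # the program's "\n" is stored as CR+LF
print(write_text(logical, newline="\n"))     # stored as plain LF, no translation
print(repr(read_text(b"one\r\ntwo\r\n")))    # CR+LF in the file comes back as "\n"
```

The program text is identical in both cases; only the layer underneath decides which bytes end a line.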

The C language also introduced the term “newline” to express the concept of “generic line terminator”. I’m told that the ASCII committee changed the name of character 0x0A to “newline” around 1996, so the confusion level has been raised even higher.


Here’s another discussion of the subject, from a unix perspective.

(Translator's note: the original link is no longer available.)
