TONT 36683 How did MS-DOS report error codes?

Original article: https://devblogs.microsoft.com/oldnewthing/20050117-00/?p=36683

The old MS-DOS function calls (ah, int 21h) typically indicated an error by returning with carry set and putting the error code in the AX register. These error codes will look awfully familiar today: they are the same error codes that Windows uses. All the small-valued error codes like ERROR_FILE_NOT_FOUND go back to MS-DOS (and possibly even further back).
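
As a rough illustration of that calling convention, here is a minimal Python sketch that models a call's outcome as a carry flag plus an AX value. The names `CallResult` and `error_of` are hypothetical, invented for this example; only the carry-plus-AX convention and the value 2 for ERROR_FILE_NOT_FOUND reflect the real interface.

```python
from typing import NamedTuple, Optional

ERROR_FILE_NOT_FOUND = 2  # same value Windows still uses


class CallResult(NamedTuple):
    """Hypothetical model of the state a caller inspects after int 21h."""
    carry: bool  # set when the call failed
    ax: int      # error code on failure; otherwise a return value


def error_of(result: CallResult) -> Optional[int]:
    """Return the error code if carry is set, or None on success."""
    return result.ax if result.carry else None
```

A caller would test the carry flag first and only then interpret AX as an error code, since on success AX may hold an ordinary return value instead.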

Error code numbers are a major compatibility problem, because you cannot easily add new error code numbers without breaking existing programs. For example, it became well-known that “The only errors that can be returned from a failed call to OpenFile are 3 (path not found), 4 (too many open files), and 5 (access denied).” If MS-DOS ever returned an error code not on that list, programs would crash because they used the error number as an index into a function table without doing a range check first. Returning a new error like 32 (sharing violation) meant that the programs would jump to a random address and die.
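
The failure mode is easier to see in code. The sketch below is a Python approximation of a handler table indexed by error code, with the range check that the old programs omitted. All handler names are hypothetical; only the codes 3, 4, 5 and 32 come from the text above. In Python a missing check would raise an exception (or silently wrap around for a negative index) rather than jump to a random address, so this shows the shape of the bug, not its exact consequences.

```python
# Hypothetical handlers; the names are illustrative only.
def on_path_not_found():
    return "path not found"


def on_too_many_open_files():
    return "too many open files"


def on_access_denied():
    return "access denied"


def on_unexpected_error():
    return "unexpected error"


FIRST_KNOWN_ERROR = 3  # the table covers codes 3, 4 and 5
HANDLERS = [on_path_not_found, on_too_many_open_files, on_access_denied]


def dispatch(error_code):
    """Look up the handler for error_code, checking the range first."""
    index = error_code - FIRST_KNOWN_ERROR
    # This is the check the old programs skipped. Without it, a new
    # code such as 32 would index far past the end of the table.
    if 0 <= index < len(HANDLERS):
        return HANDLERS[index]()
    return on_unexpected_error()
```

With the check in place, `dispatch(32)` falls through to the generic handler instead of reading outside the table.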

More about error number compatibility next time.

When it became necessary to add new error codes, compatibility demanded that the error codes returned by the functions not change. Therefore, if a new type of error occurred (for example, a sharing violation), the previous “well-known” error code with the most similar meaning was selected and returned in its place. (For “sharing violation”, the best match is probably “access denied”.) Programs which were “in the know” could call a new function called “get extended error” which returned one of the newfangled error codes (in this case, 32 for sharing violation).
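
The two-level scheme described above might be sketched as follows. This is a Python model, not the actual MS-DOS implementation: the `ErrorReporter` class and its method names are hypothetical, and the fallback to "access denied" for codes with no listed match is an assumption of this sketch. Only the pairing of sharing violation (32) with access denied (5) comes from the text.

```python
ERROR_PATH_NOT_FOUND = 3
ERROR_TOO_MANY_OPEN_FILES = 4
ERROR_ACCESS_DENIED = 5
ERROR_SHARING_VIOLATION = 32

# Codes that old programs expected from a failed open.
LEGACY_OPEN_ERRORS = {
    ERROR_PATH_NOT_FOUND,
    ERROR_TOO_MANY_OPEN_FILES,
    ERROR_ACCESS_DENIED,
}

# Newer code -> closest legacy code.
CLOSEST_LEGACY = {ERROR_SHARING_VIOLATION: ERROR_ACCESS_DENIED}


class ErrorReporter:
    """Hypothetical stand-in for the system side of a failed open."""

    def __init__(self):
        self._last_extended = 0  # 0 = nothing recorded (an assumption)

    def report_open_failure(self, actual_error):
        """Remember the real error, return a code old programs expect."""
        self._last_extended = actual_error
        if actual_error in LEGACY_OPEN_ERRORS:
            return actual_error
        # Falling back to access denied is an assumption of this sketch.
        return CLOSEST_LEGACY.get(actual_error, ERROR_ACCESS_DENIED)

    def get_extended_error(self):
        """What a program 'in the know' would ask for afterwards."""
        return self._last_extended
```

An old program sees only 5 and takes its existing "access denied" path, while a newer program can follow up with `get_extended_error()` and learn that the real cause was 32.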

The “get extended error” function returned other pieces of information. It gave you an “error class” which gave you a vague idea of what type of problem it is (out of resources? physical media failure? system configuration error?), an “error locus” which told you what type of device caused the problem (floppy? serial? memory?), and what I found to be the most interesting meta-information, the “suggested action”. Suggested actions were things like “pause, then retry” (for temporary conditions), “ask user to re-enter input” (for example, file not found), or even “ask user for remedial action” (for example, check that the disk is properly inserted).

The purpose of these meta-error values is to allow a program to recover when faced with an error code it doesn’t understand. You could at least follow the meta-data to have an idea of what type of error it was (error class), where the error occurred (error locus), and what you probably should do in response to it (suggested action).
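
To make the idea concrete, here is a small Python sketch of recovering from an unfamiliar error code using only its metadata. The type names, field names, enum members and returned strings are all hypothetical, and they do not reproduce the numeric values MS-DOS actually used. The three suggested actions are the ones quoted above.

```python
from dataclasses import dataclass
from enum import Enum


class SuggestedAction(Enum):
    RETRY_AFTER_PAUSE = "pause, then retry"
    REENTER_INPUT = "ask user to re-enter input"
    ASK_USER_REMEDY = "ask user for remedial action"


@dataclass(frozen=True)
class ExtendedError:
    code: int          # may be a value the program has never seen
    error_class: str   # e.g. "out of resources"
    locus: str         # e.g. "floppy"
    action: SuggestedAction


def plan_recovery(err: ExtendedError) -> str:
    """Choose a response from the suggested action, ignoring the code."""
    if err.action is SuggestedAction.RETRY_AFTER_PAUSE:
        return "wait briefly, then retry"
    if err.action is SuggestedAction.REENTER_INPUT:
        return "prompt for new input"
    return f"ask the user to check the {err.locus}"
```

The point of the design is that `plan_recovery` never looks at `err.code`, so it keeps working when a code is added that the program does not recognize.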

Sadly, this type of rich error information was lost when 16-bit programming was abandoned. Now you get an error code or an exception and you’d better know what to do with it. For example, if you call some function and an error comes back, how do you know whether the error was a logic error in your program (using a handle after closing it, say) or was something that is externally-induced (for example, remote server timed out)? You don’t.

This is particularly gruesome for exception-based programming. When you catch an exception, you can’t tell by looking at it whether it’s something that genuinely should crash the program (due to an internal logic error – a null reference exception, for example) or something that does not betray any error in your program but was caused externally (connection failed, file not found, sharing violation).
