RFC1468 - Japanese Character Encoding for Internet Messages

Network Working Group                                           J. Murai
Request for Comments: 1468                               Keio University
                                                              M. Crispin
                                                       Panda Programming
                                                         E. van der Poel
                                                               June 1993
Japanese Character Encoding for Internet Messages
Status of this Memo
This memo provides information for the Internet community. It does
not specify an Internet standard. Distribution of this memo is
unlimited.
Introduction
This document describes the encoding used in electronic mail [RFC822]
and network news [RFC1036] messages in several Japanese networks. It
was first specified by and used in JUNET [JUNET]. The encoding is now
also widely used in Japanese IP communities.
The name given to this encoding is "ISO-2022-JP", which is intended
to be used in the "charset" parameter field of MIME headers (see
[MIME1] and [MIME2]).
Description
The text starts in ASCII [ASCII], and switches to Japanese characters
through an escape sequence. For example, the escape sequence ESC $ B
(three bytes, hexadecimal values: 1B 24 42) indicates that the bytes
following this escape sequence are Japanese characters, which are
encoded in two bytes each. To switch back to ASCII, the escape
sequence ESC ( B is used.
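As a rough illustration of the switching described above, the
following sketch uses Python's standard "iso2022_jp" codec. The codec
and the sample string are assumptions chosen for the example; they are
not part of this memo.

```python
# Illustrative sketch: Python's standard "iso2022_jp" codec is assumed.
text = "abc\u65e5\u672c\u8a9exyz"  # "abc", three Kanji, "xyz"

encoded = text.encode("iso2022_jp")

ESC_TO_JIS = b"\x1b$B"    # ESC $ B: bytes that follow are two-byte characters
ESC_TO_ASCII = b"\x1b(B"  # ESC ( B: switch back to ASCII

start = encoded.index(ESC_TO_JIS)
end = encoded.index(ESC_TO_ASCII)
kanji_bytes = encoded[start + 3:end]  # bytes between the two escape sequences

print(encoded)
print(len(kanji_bytes))  # two bytes for each of the three Kanji
```

The Japanese run is bracketed by the two escape sequences, and every
byte of the result stays within the 7-bit range.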
The following table gives the escape sequences and the character sets
used in ISO-2022-JP messages. The ISOREG number is the registration
number in ISO's registry [ISOREG].
Esc Seq    Character Set                  ISOREG
ESC ( B    ASCII                             6
ESC ( J    JIS X 0201-1976 ("Roman" set)    14
ESC $ @    JIS X 0208-1978                  42
ESC $ B    JIS X 0208-1983                  87
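The table above can be expressed as a lookup structure. The sketch
below is one possible way to do so; the names ESCAPE_SEQUENCES and
list_designations are hypothetical and exist only for this example.

```python
# Escape sequence -> (character set, ISOREG number, bytes per character).
# The values are copied from the table above.
ESCAPE_SEQUENCES = {
    b"\x1b(B": ("ASCII", 6, 1),
    b"\x1b(J": ('JIS X 0201-1976 ("Roman" set)', 14, 1),
    b"\x1b$@": ("JIS X 0208-1978", 42, 2),
    b"\x1b$B": ("JIS X 0208-1983", 87, 2),
}


def list_designations(data):
    """Return (offset, character set name) for every ESC found in data.

    The name is None when the three bytes starting at ESC are not one of
    the four sequences in the table.
    """
    found = []
    i = data.find(b"\x1b")
    while i != -1:
        entry = ESCAPE_SEQUENCES.get(data[i:i + 3])
        found.append((i, entry[0] if entry else None))
        i = data.find(b"\x1b", i + 1)
    return found


sample = b"Hi \x1b$BF|\x1b(B!"  # ASCII, one two-byte character, ASCII
print(list_designations(sample))
```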
Note that JIS X 0208 was called JIS C 6226 until the name was changed
on March 1st, 1987. Likewise, JIS C 6220 was renamed JIS X 0201.
The "Roman" character set of JIS X 0201 [JISX0201] is identical to
ASCII except for backslash (\) and tilde (~). The backslash is
replaced by the Yen sign, and the tilde is replaced by overline. This
set is Japan's national variant of ISO 646 [ISO646].
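The two differences can be sketched as a small translation step. The
function name decode_jis_roman is hypothetical, and the choice of
U+00A5 and U+203E as the Unicode equivalents of the Yen sign and the
overline is an assumption of this example.

```python
# Byte positions of backslash (0x5C) and tilde (0x7E) in ASCII, mapped to
# the characters that the JIS X 0201 "Roman" set places there.
# The Unicode code points are an assumption made for this sketch.
JIS_ROMAN_DIFFERENCES = {
    0x5C: "\u00a5",  # Yen sign instead of backslash
    0x7E: "\u203e",  # overline instead of tilde
}


def decode_jis_roman(data):
    """Decode bytes that were designated with ESC ( J (escape removed)."""
    return "".join(JIS_ROMAN_DIFFERENCES.get(b, chr(b)) for b in data)


print(decode_jis_roman(b"price: \\100"))
```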
The JIS X 0208 [JISX0208] character sets consist of Kanji, Hiragana,
Katakana and some other symbols and characters. Each character takes
up two bytes.
For further details about the JIS Japanese national character set
standards, refer to [JISX0201] and [JISX0208]. For further
information about the escape sequences, see [ISO2022] and [ISOREG].
If there are JIS X 0208 characters on a line, there must be a switch
to ASCII or to the "Roman" set of JIS X 0201 before the end of the
line (i.e., before the CRLF). This means that the next line starts in
the character set that was switched to before the end of the previous
line.
Also, the text must end in ASCII.
Other restrictions are given in the Formal Syntax below.
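The two rules above (switch back to a single-byte set before each
CRLF, and end the text in ASCII) could be checked along the following
lines. This is a minimal sketch; check_line_rules and the wording of
its messages are hypothetical.

```python
SINGLE_BYTE = (b"\x1b(B", b"\x1b(J")  # ASCII, JIS X 0201 "Roman"
DOUBLE_BYTE = (b"\x1b$@", b"\x1b$B")  # JIS X 0208-1978, JIS X 0208-1983
ASCII = b"\x1b(B"


def check_line_rules(body):
    """Return a list of problems found in body (bytes with CRLF line ends)."""
    problems = []
    current = ASCII  # the text starts in ASCII
    for number, line in enumerate(body.split(b"\r\n"), start=1):
        i = 0
        while i < len(line):
            seq = line[i:i + 3]
            if seq in SINGLE_BYTE or seq in DOUBLE_BYTE:
                current = seq
                i += 3
            elif current in DOUBLE_BYTE:
                i += 2  # one two-byte character
            else:
                i += 1
        if current in DOUBLE_BYTE:
            problems.append("line %d ends in a double-byte set" % number)
    if current != ASCII:
        problems.append("text does not end in ASCII")
    return problems


good = b"Hello \x1b$BF|K\\\x1b(B\r\nbye"
bad = b"\x1b$BF|K\\\r\nmore"
print(check_line_rules(good))
print(check_line_rules(bad))
```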
Formal Syntax
The notational conventions used here are identical to those used in
RFC822 [RFC822].
The * (asterisk) convention is as follows:
l*m something
meaning at least l and at most m somethings, with l and m taking
default values of 0 and infinity, respectively.
message             = headers 1*( CRLF *single-byte-char *segment
                      single-byte-seq *single-byte-char )
                                        ; see also [MIME1] "body-part"
                                        ; note: must end in ASCII
headers             = <see [RFC822] "fields" and [MIME1] "body-part">
segment             = single-byte-segment / double-byte-segment
single-byte-segment = single-byte-seq 1*single-byte-char
double-byte-segment = double-byte-seq 1*( one-of-94 one-of-94 )
single-byte-seq     = ESC "(" ( "B" / "J" )
double-byte-seq     = ESC "$" ( "@" / "B" )
CRLF                = CR LF
                                                 ; ( Octal, Decimal.)
ESC                 = <ISO 2022 ESC, escape>     ; (    33,      27.)
SI                  = <ISO 2022 SI, shift-in>    ; (    17,      15.)
SO                  = <ISO 2022 SO, shift-out>   ; (    16,      14.)
CR                  = <ASCII CR, carriage return>; (    15,      13.)
LF                  = <ASCII LF, linefeed>       ; (    12,      10.)
one-of-94           = <any one of 94 values>     ; (41-176, 33.-126.)
7BIT                = <any 7-bit value>          ; ( 0-177,  0.-127.)
single-byte-char    = <any 7BIT, including bare CR & bare LF, but NOT
                       including CRLF, and not including ESC, SI, SO>
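The productions above translate fairly directly into regular
expressions. The sketch below checks a single line (the bytes between
two CRLFs). It treats the trailing single-byte-seq as optional when
the line contains no escape sequences, so that plain ASCII lines are
accepted; that relaxation and the name is_valid_line are assumptions
of this sketch, not part of the syntax above.

```python
import re

SINGLE_BYTE_SEQ = rb"\x1b\((?:B|J)"   # ESC "(" ( "B" / "J" )
DOUBLE_BYTE_SEQ = rb"\x1b\$(?:@|B)"   # ESC "$" ( "@" / "B" )
ONE_OF_94 = rb"[\x21-\x7e]"           # 33.-126.
# Any 7-bit value except SO (0x0E), SI (0x0F) and ESC (0x1B).
SINGLE_BYTE_CHAR = rb"[\x00-\x0d\x10-\x1a\x1c-\x7f]"

SINGLE_BYTE_SEGMENT = SINGLE_BYTE_SEQ + SINGLE_BYTE_CHAR + rb"+"
DOUBLE_BYTE_SEGMENT = DOUBLE_BYTE_SEQ + rb"(?:" + ONE_OF_94 + ONE_OF_94 + rb")+"
SEGMENT = rb"(?:" + SINGLE_BYTE_SEGMENT + rb"|" + DOUBLE_BYTE_SEGMENT + rb")"

# *single-byte-char, then optionally: *segment single-byte-seq *single-byte-char
LINE = re.compile(
    SINGLE_BYTE_CHAR + rb"*"
    + rb"(?:" + SEGMENT + rb"*" + SINGLE_BYTE_SEQ + SINGLE_BYTE_CHAR + rb"*)?"
)


def is_valid_line(line):
    """Return True if line (bytes, without its CRLF) matches the sketch."""
    return LINE.fullmatch(line) is not None


print(is_valid_line(b"Hi \x1b$BF|K\\\x1b(B!"))  # two-byte run, then ASCII
print(is_valid_line(b"\x1b$BF|K\\"))            # never switches back
```

Because the line is split on CRLF before matching, the exclusion of
CRLF from single-byte-char is handled by the caller rather than by the
pattern.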
MIME Considerations
The name given to the JUNET character encoding is "ISO-2022-JP". This
name is intended to be used in MIME messages as follows:
Content-Type: text/plain; charset=iso-2022-jp
The ISO-2022-JP encoding is already in 7-bit form, so it is not
necessary to use a Content-Transfer-Encoding header. It should be
noted that applying the Base64 or Quoted-Printable encoding will
render the message unreadable in current JUNET software.
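For illustration, the sketch below builds such a message with Python's
standard email package, which is an assumption of this example and not
something this memo refers to. The package adds an explicit
"Content-Transfer-Encoding: 7bit" header, which states the default and
leaves the body bytes unchanged.

```python
from email.mime.text import MIMEText

# Sample body (a hiragana greeting); the text itself is arbitrary.
body = "\u3053\u3093\u306b\u3061\u306f"

msg = MIMEText(body, "plain", "iso-2022-jp")
raw_body = msg.get_payload(decode=True)  # the bytes that go on the wire

print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])
print(all(b < 0x80 for b in raw_body))  # every byte fits in 7 bits
```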
ISO-2022-JP may also be used in MIME Part 2 headers. The "B"
encoding should be used with ISO-2022-JP text.
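A header encoded this way might be produced as in the following
sketch, which again assumes Python's standard email package. The
subject text is arbitrary.

```python
from email.header import Header, decode_header, make_header

subject = "\u65e5\u672c\u8a9e"  # three Kanji

# Encode the text as a MIME encoded-word; for iso-2022-jp the package
# selects the "B" (Base64) encoding.
encoded_subject = Header(subject, charset="iso-2022-jp").encode()
print(encoded_subject)

# Decode it again to confirm that the round trip preserves the text.
decoded_subject = str(make_header(decode_header(encoded_subject)))
print(decoded_subject == subject)
```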
Background Information
The JUNET encoding was described in the JUNET User's Guide [JUNET]
(JUNET Riyou No Tebiki Dai Ippan).
The encoding is based on the particular usage of ISO 2022 announced
by 4/1 (see [ISO2022] for details). However, the escape sequence
normally used for this announcement is not included in ISO-2022-JP
messages.
The Kana set of JIS X 0201 is not used in ISO-2022-JP messages.
In the past, some systems erroneously used the escape sequence
ESC ( H in JUNET messages. This escape sequence is officially
registered for a Swedish character set [ISOREG], and should not be
used in ISO-2022-JP messages.
Some systems do not distinguish between ESC ( B and ESC ( J or
between ESC $ @ and ESC $ B for display. However, when relaying a
message to another system, the escape sequences must not be altered
in any way.
The human user (not implementor) should try to keep lines within 80
display columns, or, preferably, within 75 (or so) columns, to allow
insertion of ">" at the beginning of each line in excerpts. Each JIS
X 0208 character takes up two columns, and the escape sequences do
not take up any columns. The implementor is reminded that JIS X 0208
characters take up two bytes and should not be split in the middle to
break lines for displaying, etc.
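The column arithmetic described above can be sketched as follows. The
function name display_columns is hypothetical, and counting every
single-byte value (including control characters) as one column is a
simplification made for this example.

```python
SINGLE_BYTE = (b"\x1b(B", b"\x1b(J")
DOUBLE_BYTE = (b"\x1b$@", b"\x1b$B")


def display_columns(line):
    """Count display columns for one ISO-2022-JP line (bytes, no CRLF)."""
    columns = 0
    double = False  # each line is assumed to start in a single-byte set
    i = 0
    while i < len(line):
        seq = line[i:i + 3]
        if seq in DOUBLE_BYTE:
            double = True
            i += 3      # escape sequences take up no columns
        elif seq in SINGLE_BYTE:
            double = False
            i += 3
        elif double:
            columns += 2  # one JIS X 0208 character: two bytes, two columns
            i += 2        # step over both bytes so the pair is never split
        else:
            columns += 1
            i += 1
    return columns


line = b"Hi \x1b$BF|K\\\x1b(B!"  # 3 ASCII + 2 two-byte characters + 1 ASCII
print(display_columns(line))
print(display_columns(line) <= 75)
```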
The JIS X 0208 standard was revised in 1990, to add two characters at
the end of the table. Although ISO 2022 specifies special additional
escape sequences to indicate the use of revised character sets, it is
suggested here not to make use of this special escape sequence in
ISO-2022-JP text, even if the two characters added to JIS X 0208 in
1990 are used.
For further information about Japanese character encodings such as PC
codes, FTP locations of implementations, etc., see "Electronic
Handling of Japanese Text" [JPN.INF].
References
[ASCII] American National Standards Institute, "Coded character set
-- 7-bit American national standard code for information
interchange", ANSI X3.4-1986.
[ISO646] International Organization for Standardization (ISO),
"Information technology -- ISO 7-bit coded character set for
information interchange", International Standard, Ref. No. ISO/IEC
646:1991.
[ISO2022] International Organization for Standardization (ISO),
"Information processing -- ISO 7-bit and 8-bit coded character sets
-- Code extension techniques", International Standard, Ref. No. ISO
2022-1986 (E).
[ISOREG] International Organization for Standardization (ISO),
"International Register of Coded Character Sets To Be Used With
Escape Sequences".
[JISX0201] Japanese Standards Association, "Code for Information
Interchange", JIS X 0201-1976.
[JISX0208] Japanese Standards Association, "Code of the Japanese
graphic character set for information interchange", JIS X 0208-1978,
-1983 and -1990.
[JPN.INF] Ken R. Lunde <lunde@adobe.com>, "Electronic Handling of
Japanese Text", March 1992,
msi.umn.edu(128.101.24.1):pub/lunde/japan[123].inf
[JUNET] JUNET Riyou No Tebiki Sakusei Iin Kai (JUNET User's Guide
Drafting Committee), "JUNET Riyou No Tebiki (Dai Ippan)" ("JUNET
User's Guide (First Edition)"), February 1988.
[MIME1] Borenstein N., and N. Freed, "MIME (Multipurpose
Internet Mail Extensions): Mechanisms for Specifying and
Describing the Format of Internet Message Bodies", RFC1341,
Bellcore, Innosoft, June 1992.
[MIME2] Moore, K., "Representation of Non-ASCII Text in Internet
Message Headers", RFC1342, University of Tennessee, June 1992.
[RFC822] Crocker, D., "Standard for the Format of ARPA Internet
Text Messages", STD 11, RFC822, UDEL, August 1982.
[RFC1036] Horton M., and R. Adams, "Standard for Interchange of USENET
Messages", RFC1036, AT&T Bell Laboratories, Center for Seismic
Studies, December 1987.
Acknowledgements
Many people assisted in drafting this document. The authors wish to
thank in particular Akira Kato, Masahiro Sekiguchi and Ken'ichi
Handa.
Security Considerations
Security issues are not discussed in this memo.
Authors' Addresses
Jun Murai
Keio University
5322 Endo, Fujisawa
Kanagawa 252 Japan
Fax: +81 466 49 1101
EMail: jun@wide.ad.jp
Mark Crispin
Panda Programming
6158 Lariat Loop NE
Bainbridge Island, WA 98110-2098
USA
Phone: +1 206 842 2385
EMail: MRC@PANDA.COM
Erik M. van der Poel
A-105 Park Avenue
4-4-10 Ohta, Kisarazu
Chiba 292 Japan
Phone: +81 438 22 5836
Fax: +81 438 22 5837
EMail: erik@poel.juice.or.jp
