Mixing extended character sets in ANSI C
uunet!osf.org!meissner
Wed Mar 2 11:59:40 PST 1994
| The C Standard specifies the support of extended character sets and locale
| specific behaviour. However, the Standard does not mention anything about
| codesets.
|
| Consider the following program :
|
| > main()
| > {
| > setlocale (LC_ALL, "japanese");
| > printf("taC\n"); /* this is a SJIS string */
| > printf("%U%!%$%k%n"); /* this is a EUC string */
| > }
|
| The program sets the locale to be "japanese" and defines two strings. One
| of the strings uses Shift JIS characters and the other uses EUC which is
| a totally different encoding. I cannot find anything in the Standard to
| allow this. Is this a violation of an ANSI rule (if so, which one)? Or does
| this fall under "locale specific behaviour"? Comments?
Look in the Multibyte Characters section (2.2.1.2 in the obsolete ANSI
standard -- I don't have the ISO standard in front of me):
    For the source character set, the following shall hold:

      * A comment, string literal, character constant, or header
        name shall begin and end in the initial shift state.

      * A comment, string literal, character constant, or header
        name shall consist of a sequence of valid multibyte
        characters.
If either string is not a sequence of valid multibyte characters in the
source character set, then it violates the second requirement.
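
The rule quoted above applies to the source character set at translation
time, so the compiler is what enforces it. As a rough run-time analogy only,
here is a minimal sketch that checks whether a byte string is a sequence of
valid multibyte characters in the current LC_CTYPE locale. The helper name
is_valid_multibyte is hypothetical, not something from the Standard, and the
sketch does not verify that a stateful encoding ends in the initial shift
state.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the Standard): returns 1 if s is a
 * sequence of valid multibyte characters in the current LC_CTYPE
 * locale, 0 otherwise.  Uses only mblen(), which ANSI C provides.
 */
static int is_valid_multibyte(const char *s)
{
    size_t left = strlen(s);

    /* A null pointer argument resets mblen() to the initial shift state. */
    (void)mblen(NULL, 0);

    while (left > 0) {
        int n = mblen(s, left);

        if (n < 0)
            return 0;   /* invalid or incomplete multibyte character */
        if (n == 0)
            break;      /* null character: end of string */

        s += n;
        left -= (size_t)n;
    }
    return 1;
}
```

A caller would normally select the locale first, for example with
setlocale(LC_CTYPE, ""), and then pass each string to the helper. Under a
locale whose codeset is Shift JIS, a string encoded in EUC may be rejected,
and vice versa, though some byte sequences happen to be valid in both
encodings.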