Mixing extended character sets in ANSI C

uunet!osf.org!meissner
Wed Mar 2 11:59:40 PST 1994


| The C Standard specifies the support of extended character sets and locale
| specific behaviour. However, the Standard does not mention anything about
| codesets.
| 
| Consider the following program :
| 
| > #include <locale.h>
| > #include <stdio.h>
| >
| > int main(void)
| > {
| >         setlocale (LC_ALL, "japanese");
| >         printf("taC\n");     /* this is an SJIS string */
| >         printf("%U%!%$%k%n");     /* this is an EUC string */
| >         return 0;
| > }
| 
| The program sets the locale to be "japanese" and defines two strings. One
| of the strings uses Shift JIS characters and the other uses EUC which is
| a totally different encoding. I cannot find anything in the Standard to
| allow this. Is this a violation of an ANSI rule (if so, which one)? Or does
| this fall under "locale specific behaviour"? Comments?

Look in the Multibyte Characters section (2.2.1.2 in the obsolete ANSI
standard -- I don't have the ISO standard in front of me):

	For the source character set, the following shall hold:

	* A comment, string literal, character constant, or header
	name shall begin and end in the initial shift state.

	* A comment, string literal, character constant, or header
	name shall consist of a sequence of valid multibyte
	characters.

If one of the strings is not a sequence of valid multibyte characters,
then it violates this.
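
The rule quoted above applies to the source character set at translation
time, so the compiler is the one that would have to diagnose it. As a rough
run-time analogue only, here is a minimal sketch that asks the current locale
whether a byte string is a sequence of valid multibyte characters, using the
standard mblen() function. The helper name is_valid_multibyte is hypothetical
and not from the Standard, and the sketch does not check that the string ends
in the initial shift state.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of any standard library):
 * returns 1 if s consists only of valid multibyte characters
 * in the current LC_CTYPE locale, 0 otherwise.
 */
static int is_valid_multibyte(const char *s)
{
    size_t left;

    assert(s != NULL);
    left = strlen(s);

    /* Reset mblen's internal conversion state to the initial shift state. */
    (void)mblen(NULL, 0);

    while (left > 0) {
        /* Length in bytes of the next multibyte character, or -1. */
        int n = mblen(s, left);

        if (n <= 0)
            return 0;   /* invalid or incomplete multibyte character */

        s += n;
        left -= (size_t)n;
    }
    return 1;
}
```

A caller would first select a locale with setlocale() and then pass each
string to the helper. Under a locale whose codeset is EUC, a Shift JIS string
would generally be expected to fail the check, although some byte sequences
happen to be valid in both encodings, so a passing result does not prove the
string was written in the locale's codeset.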


