C++ Visual Studio character encoding issues
Not being able to wrap my head around this one is a real source of shame...
I am working with a French version of Visual Studio (2008), on a French Windows (XP). French accents put in strings sent to the output window get corrupted. The same goes for input. Typical character encoding problem: I enter ANSI and get UTF-8 in return, or something to that effect. Which setting ensures that the characters remain in ANSI when showing a "hardcoded" string in the output window?
Edit:
Example:
#include <iostream>

int main()
{
    std::cout << "àéêù" << std::endl;
    return 0;
}
Will show in the output:
&oacute;&Uacute;&Ucirc;&uml;
(encoded here as HTML entities) instead of the expected:
&agrave;&eacute;&ecirc;&ugrave;
Before I go any further, I should mention that what you are doing is not C/C++ compliant. The specification states in section 2.2 which character sets are valid in source code. There is not much in there, and all the characters used are in ASCII. So... everything below is about a specific implementation (as it happens, VC2008 on a US locale machine).
To start with, you have 4 chars on your cout line, and 4 glyphs in the output. So the issue is not one of UTF-8 encoding, because that would combine multiple source chars into fewer glyphs.
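The byte-versus-glyph argument can be checked with a small sketch. The names byte_count, utf8_code_points, kCp1252 and kUtf8 are made up for illustration, and the test string is assumed to be "àéêù", spelled with explicit byte escapes so the result does not depend on how the source file itself is saved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Number of bytes in a NUL-terminated narrow string.
std::size_t byte_count(const char* s) {
    assert(s != 0);
    return std::strlen(s);
}

// Number of code points if the bytes are read as UTF-8: every byte that is
// not a continuation byte (bit pattern 10xxxxxx) starts a new code point.
std::size_t utf8_code_points(const char* s) {
    assert(s != 0);
    std::size_t count = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// The same four accented letters in two encodings (assumed text: a-grave,
// e-acute, e-circumflex, u-grave).
const char kCp1252[] = "\xE0\xE9\xEA\xF9";                  // one byte each
const char kUtf8[]   = "\xC3\xA0\xC3\xA9\xC3\xAA\xC3\xB9";  // two bytes each
```

If UTF-8 bytes were being shown by a single-byte console, the 8 bytes of kUtf8 would come out as 8 glyphs rather than 4, which is not what the question describes.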
From your source string to the display on the console, all of these things play a part:
1. What encoding your source file is in (i.e. how your C++ file will be seen by the compiler)
2. What your compiler does with a string literal, and what source encoding it understands
3. How your << interprets the encoded string you are passing in
4. What encoding the console expects
5. How the console translates that output to a font glyph
1 and 2 are fairly easy. It looks like the compiler guesses what format the source file is in and decodes it to its internal representation. It then generates the data chunk for the string literal in the current codepage, no matter what the source encoding was. I have failed to find explicit details or controls for this.
3 is even easier. Except for control codes, << just passes the data down for char *.
4 is controlled by SetConsoleOutputCP. It should default to your default system codepage. You can also find out which one you have with GetConsoleOutputCP (input is controlled differently, via SetConsoleCP).
5 is a funny one. I banged my head trying to figure out why I could not get the é to show up correctly using CP1252 (Western European, Windows). It turns out that my system font does not have a glyph for that character, and helpfully uses the glyph of my standard codepage (capital Theta, the same one I would get if I did not call SetConsoleOutputCP). To fix it, I had to change the font I use on consoles to Lucida Console (a TrueType font).
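As a rough sketch of step 4, the console output codepage can be read and changed from code. GetConsoleOutputCP and SetConsoleOutputCP are the Win32 calls named above; the wrapper set_console_output_code_page and the helper code_page_label are hypothetical names used only for this illustration, and on non-Windows builds the wrapper is assumed to do nothing.

```cpp
#include <cassert>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Label for a few codepage identifiers relevant to this discussion.
std::string code_page_label(unsigned cp) {
    switch (cp) {
        case 850:   return "OEM 850 (Western European DOS)";
        case 1252:  return "Windows-1252 (Western European)";
        case 65001: return "UTF-8";
        default:    return "other";
    }
}

// Switches the console output codepage and returns the previous one.
// Returns 0 if the call fails or if this is not a Windows build, so 0 is
// not accepted as an input in this sketch.
unsigned set_console_output_code_page(unsigned cp) {
    assert(cp != 0);
#ifdef _WIN32
    const UINT previous = GetConsoleOutputCP();
    if (!SetConsoleOutputCP(cp)) {
        return 0;
    }
    return previous;
#else
    (void)cp;  // nothing to configure outside the Windows console
    return 0;
#endif
}
```

A caller would typically invoke set_console_output_code_page(1252) before writing CP1252 bytes and restore the returned value afterwards. The console font still needs a glyph for the character, so changing the codepage alone may not be enough.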
Some interesting things I learned looking at this:
- The encoding of the source does not matter, as long as the compiler can figure it out (notably, changing it to UTF-8 did not change the generated code; my "é" string was still encoded in CP1252, as 233 0).
- VC is picking a codepage for string literals that I do not seem to control.
- Controlling what the console shows is more painful than I was expecting.
So... what does this mean for you? Here are some bits of advice:
- Do not use non-ASCII characters in string literals. Use resources, where you control the encoding.
- Make sure you know what encoding your console expects, and that your font has the glyphs to represent the characters you send.
- If you want to find out which encoding is being used in your case, I recommend printing the actual value of the character as an integer.
const char *a = "é"; std::cout << (unsigned int)(unsigned char)a[0];
shows 233 for me, which happens to be the encoding in CP1252.
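The same check can be extended to a whole string. This is a minimal sketch with a hypothetical helper name (byte_values); it only formats the bytes it is given and makes no claim about what the compiler stores for a particular literal.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns the bytes of a narrow string as space-separated decimal values.
std::string byte_values(const char* s) {
    assert(s != 0);
    std::ostringstream out;
    for (const char* p = s; *p != '\0'; ++p) {
        if (p != s) {
            out << ' ';
        }
        // Go through unsigned char first: plain char may be signed, in which
        // case 0xE9 would otherwise be converted from a negative value.
        out << static_cast<unsigned int>(static_cast<unsigned char>(*p));
    }
    return out.str();
}
```

For example, byte_values("\xE9") gives "233" (é in CP1252), while byte_values("\xC3\xA9") gives "195 169" (é in UTF-8). Passing the literal from your own source file instead of escaped bytes shows what the compiler actually generated.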
BTW, if what you got was "ÓÚÛ¨" rather than what you pasted, then it looks like your 4 bytes are being interpreted somewhere as CP850.
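To illustrate how the same four bytes can end up as different glyphs, here is a sketch that assumes the string was compiled as CP1252 and the console reads it as OEM codepage 850. That pairing is an assumption for illustration. Both function names are hypothetical, and the CP850 table is deliberately partial, covering only the four bytes in question.

```cpp
#include <cassert>

// Unicode code point of a byte read as Windows-1252. In the range 0xA0..0xFF
// Windows-1252 matches Latin-1, so the code point equals the byte value.
// Only that range is handled in this sketch.
unsigned cp1252_to_unicode(unsigned char b) {
    assert(b >= 0xA0);
    return b;
}

// Unicode code point of a byte read as codepage 850.
// Partial table: returns 0 for bytes not listed here.
unsigned cp850_to_unicode(unsigned char b) {
    switch (b) {
        case 0xE0: return 0x00D3;  // capital O with acute
        case 0xE9: return 0x00DA;  // capital U with acute
        case 0xEA: return 0x00DB;  // capital U with circumflex
        case 0xF9: return 0x00A8;  // diaeresis
        default:   return 0;       // not covered by this sketch
    }
}
```

Under these assumptions the byte 0xE9 is é (U+00E9) when read as CP1252 but a capital U with acute (U+00DA) when read as CP850, which is the kind of substitution seen in the question.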