C++ - Problem using getline with a Unicode file
Update: Thanks to @potatoswatter and @jonathan leffler for their comments. Rather embarrassingly, I was caught out by the debugger tool tip not showing the value of the wstring correctly. However, it still isn't quite working for me, and I have updated the question below:
If I have a small multibyte file that I want to read into a string, I use the following trick: I call getline with a delimiter of '\0', e.g.
std::string contents_utf8;
std::ifstream inf1("utf8.txt");
getline(inf1, contents_utf8, '\0');

This reads in the entire file, including newlines.
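The trick above can be sketched as a small helper, shown here on any std::istream so it can be tried without a file. The names read_until_nul and read_all are hypothetical and not from the original post. Note that the getline version stops at the first NUL byte, so it only returns the whole input when the data contains no '\0'; the second helper is a common alternative that does not depend on that.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read everything up to the first NUL byte, or to
// end of input if no NUL byte is present. Newlines are kept.
std::string read_until_nul(std::istream& in) {
    std::string contents;
    std::getline(in, contents, '\0');
    return contents;
}

// Hypothetical alternative: copy the whole stream buffer, so embedded
// NUL bytes do not cut the result short.
std::string read_all(std::istream& in) {
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}
```

With a std::ifstream opened on a text file, read_until_nul(inf1) behaves like the three lines in the question.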
However, if I try to do the same thing with a wide character file, it doesn't work: my wstring only reads the first line.
std::wstring contents_wide;
std::wifstream inf2(L"ucs2-be.txt");
getline( inf2, contents_wide, wchar_t(0) ); // doesn't work

For example, if my Unicode file contains the chars A and B separated by CRLF, the hex looks like this:
fe ff 00 41 00 0d 00 0a 00 42

Based on the fact that, with a multibyte file, getline with '\0' reads the entire file, I believed that getline( inf2, contents_wide, wchar_t(0) ) should read in the entire Unicode file. However, it doesn't: with the example above, my wide string would contain the following 2 wchar_ts: ff ff
(If I remove the wchar_t(0), it reads in the first line as expected, i.e. fe ff 00 41 00 0d 00.)
Why doesn't wchar_t(0) work as a delimiting wchar_t, so that getline stops on 00 00 (or reads to the end of the file, which is what I want)?
Thank you
Your UCS-2 decoder is misbehaving. The result of getline( inf2, contents_wide ) on fe ff 00 41 00 0d 00 0a 00 42 should be 0041 0000 = L"A". Assuming you're on Windows, the line ending should be converted, and the byte-order mark shouldn't appear in the output.
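To make the expected result concrete, here is a minimal sketch that decodes those ten bytes by hand, without any stream or locale machinery. The helpers decode_ucs2_be and first_line are hypothetical names introduced for illustration; they only show what a correct big-endian UCS-2 decoder would hand to getline, not how any particular library does it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: turn big-endian UCS-2 bytes into 16-bit code units.
// A leading FE FF byte-order mark is dropped; a trailing odd byte is ignored.
std::u16string decode_ucs2_be(const std::string& bytes) {
    std::size_t pos = 0;
    if (bytes.size() >= 2 &&
        static_cast<unsigned char>(bytes[0]) == 0xFE &&
        static_cast<unsigned char>(bytes[1]) == 0xFF) {
        pos = 2;  // skip the byte-order mark
    }
    std::u16string out;
    for (; pos + 1 < bytes.size(); pos += 2) {
        const auto hi = static_cast<unsigned char>(bytes[pos]);
        const auto lo = static_cast<unsigned char>(bytes[pos + 1]);
        out.push_back(static_cast<char16_t>((hi << 8) | lo));
    }
    return out;
}

// Hypothetical helper: the text before the first LF, with a CR directly
// before that LF removed, mimicking line-ending conversion.
std::u16string first_line(const std::u16string& text) {
    std::u16string line = text.substr(0, text.find(u'\n'));
    if (!line.empty() && line.back() == u'\r') {
        line.pop_back();
    }
    return line;
}
```

For the bytes in the question, decode_ucs2_be yields the code units 0041 000d 000a 0042, and first_line of that is the single character A, which matches the result described above.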
I suggest double-checking your OS documentation with respect to how you set the locale.
Edit: Did you set the locale?
locale::global( locale( "something if your system supports UCS-2" ) );

or

locale::global( encoding_support::ucs2_bigendian_encoding );

where encoding_support is some library.
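As one possible stand-in for the unnamed encoding_support library, the standard header <codecvt> provides std::codecvt_utf16, which can be imbued on the stream itself instead of changing the global locale. This is only a sketch under assumptions: the facet is deprecated since C++17, the helper name read_utf16_file is hypothetical, and the file is assumed to be big-endian (the facet's default) with an optional byte-order mark. The stream is opened in binary mode so that no byte-level newline translation interferes with the two-byte code units.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: read a big-endian UTF-16/UCS-2 file into a wstring.
std::wstring read_utf16_file(const char* path) {
    std::wifstream in;
    // Imbue before opening so the conversion facet is in place from the
    // first byte. std::consume_header makes the facet swallow a leading BOM.
    // The locale takes ownership of the facet allocated with new.
    in.imbue(std::locale(
        in.getloc(),
        new std::codecvt_utf16<wchar_t, 0x10ffff, std::consume_header>));
    in.open(path, std::ios::binary);

    std::wstring contents;
    // Same trick as in the question: a NUL delimiter reads to end of file
    // as long as the text contains no U+0000.
    std::getline(in, contents, wchar_t(0));
    return contents;
}
```

Because of the binary mode, CRLF pairs are kept as they are in the returned string, so the example file from the question would come back as A, CR, LF, B.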