C++ - Problem using getline with a Unicode file
Update: Thank you to @Potatoswatter and @Jonathan Leffler for their comments. Rather embarrassingly, I was caught out by the debugger tool tip not showing the value of the wstring correctly. However, it still isn't quite working for me, so I have updated the question below:
If I have a small multibyte file that I want to read into a string, I use the following trick: I call getline with a delimiter of '\0', e.g.
std::string contents_utf8;
std::ifstream inf1("utf8.txt");
getline(inf1, contents_utf8, '\0');
This reads in the entire file, including newlines.
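The trick only works while the file contains no '\0' byte, which matters for the wide-character case that follows. As a minimal sketch (the helper names `read_all` and `read_until_nul` are hypothetical and not part of the original post), the two behaviours can be compared like this:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read everything left in a stream, including
// newlines and embedded '\0' bytes, by copying its stream buffer.
std::string read_all(std::istream& in) {
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// The trick from the question, wrapped for comparison:
// reading stops at the first '\0' byte (the delimiter is discarded).
std::string read_until_nul(std::istream& in) {
    std::string s;
    std::getline(in, s, '\0');
    return s;
}
```

On a stream holding the bytes fe ff 00 41, `read_until_nul` returns only the first two bytes, while `read_all` returns all four.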
If I try the same thing with a wide character file, it doesn't work: my wstring only reads the first line.
std::wstring contents_wide;
std::wifstream inf2(L"ucs2-be.txt");
getline( inf2, contents_wide, wchar_t(0) ); // doesn't work
For example, if my Unicode file contains the chars A and B separated by CRLF, the hex looks like this:
fe ff 00 41 00 0d 00 0a 00 42
Based on the fact that, with a multibyte file, getline with '\0' reads the entire file, I believed that getline( inf2, contents_wide, wchar_t(0) ) should read in the entire Unicode file. However, it doesn't: with the example above, my wide string would contain the following 2 wchar_ts: ff ff
(If I remove the wchar_t(0), it reads in the first line as expected, i.e. fe ff 00 41 00 0d 00.)
Why doesn't wchar_t(0) work as a delimiting wchar_t, so that getline stops on 00 00 (or reads to the end of the file, which is what I want)?
Thank you
Your UCS-2 decoder is misbehaving. The result of getline( inf2, contents_wide )
on fe ff 00 41 00 0d 00 0a 00 42
should be 0041 0000
= L"A"
. Assuming you're on Windows, the line ending should be converted, and the byte-order mark shouldn't appear in the output.
I suggest double-checking your OS documentation with respect to how you set the locale.
Edit: Did you set the locale?
locale::global( locale( "something if your system supports UCS-2" ) );
or
locale::global( encoding_support::ucs2_bigendian_encoding );
where encoding_support is some library.
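If no suitable locale or library is available, one alternative (not what is proposed above, just a sketch) is to read the file as raw bytes and decode the UCS-2 big-endian pairs by hand. The helper names `decode_ucs2_be` and `read_ucs2_be_file` are hypothetical, and the code assumes the input really is UCS-2 big-endian with an optional fe ff byte-order mark:

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: decode UCS-2 big-endian bytes into a wide string.
// A leading byte-order mark (fe ff) is skipped; a trailing odd byte is ignored.
std::wstring decode_ucs2_be(const std::string& bytes) {
    std::size_t i = 0;
    if (bytes.size() >= 2 &&
        static_cast<unsigned char>(bytes[0]) == 0xFE &&
        static_cast<unsigned char>(bytes[1]) == 0xFF) {
        i = 2;  // skip the BOM so it does not appear in the output
    }
    std::wstring out;
    out.reserve((bytes.size() - i) / 2);
    for (; i + 1 < bytes.size(); i += 2) {
        const unsigned hi = static_cast<unsigned char>(bytes[i]);
        const unsigned lo = static_cast<unsigned char>(bytes[i + 1]);
        out.push_back(static_cast<wchar_t>((hi << 8) | lo));
    }
    return out;
}

// Hypothetical helper: read a whole file in binary mode (so no byte is
// translated or treated as a terminator) and decode it.
std::wstring read_ucs2_be_file(const char* path) {
    std::ifstream in(path, std::ios::binary);
    const std::string bytes((std::istreambuf_iterator<char>(in)),
                            std::istreambuf_iterator<char>());
    return decode_ucs2_be(bytes);
}
```

For the example file, `read_ucs2_be_file("ucs2-be.txt")` would give the four wide characters A, CR, LF, B. Unlike a text-mode stream on Windows, this sketch leaves the CRLF pair unconverted.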