C++ - Problem using getline with a Unicode file

- July 15, 2014

Update: Thanks to @Potatoswatter and @Jonathan Leffler for their comments. Rather embarrassingly, I was caught out by the debugger tooltip not showing the value of the wstring correctly. However, it still isn't quite working for me, and I have updated the question below:

If I have a small multibyte file that I want to read into a string, I use the following trick: I call getline with a delimiter of '\0', e.g.

std::string contents_utf8;
std::ifstream inf1("utf8.txt");
std::getline(inf1, contents_utf8, '\0');

This reads in the entire file, including newlines.
If I try the same thing with a wide character file, it doesn't work: my wstring only reads the first line.

std::wstring contents_wide;
std::wifstream inf2(L"ucs2-be.txt");
std::getline(inf2, contents_wide, wchar_t(0)); // doesn't work

For example, if my Unicode file contains the chars A and B separated by CRLF, the hex looks like this:

fe ff 00 41 00 0d 00 0a 00 42

Based on the fact that with a multibyte file getline with '\0' reads the entire file, I believed that getline( inf2, contents_wide, wchar_t(0) ) should read in the entire Unicode file. However, it doesn't: with the example above, my wide string contains the following 2 wchar_ts: ff ff

(If I remove the wchar_t(0), it reads in the first line as expected, i.e. fe ff 00 41 00 0d 00.)

Why doesn't wchar_t(0) work as a delimiting wchar_t, so that getline stops on 00 00 (or reads to the end of the file, which is what I want)?
Thank you

Your UCS-2 decoder is misbehaving. The result of getline( inf2, contents_wide ) on fe ff 00 41 00 0d 00 0a 00 42 should be 0041 0000 = L"A". Assuming you're on Windows, the line ending should be converted, and the byte-order mark shouldn't appear in the output.
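To make those expected values concrete, here is a minimal sketch that decodes the example bytes by hand, with no locale machinery involved. The helper name decode_ucs2_be is hypothetical and not part of any library. It assumes big-endian input and does no newline translation, so the CR and LF both survive as separate code units.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode big-endian UCS-2 bytes into 16-bit code units,
// skipping a leading byte-order mark (FE FF) if one is present.
std::u16string decode_ucs2_be(const std::vector<unsigned char>& bytes) {
    std::size_t i = 0;
    if (bytes.size() >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF) {
        i = 2;  // the BOM is metadata, not text
    }
    std::u16string out;
    for (; i + 1 < bytes.size(); i += 2) {
        // High byte first, because the data is big-endian.
        out.push_back(static_cast<char16_t>((bytes[i] << 8) | bytes[i + 1]));
    }
    return out;  // a trailing odd byte, if any, is ignored
}

// Self-check using the exact bytes from the question.
void check_example() {
    const std::vector<unsigned char> bytes = {
        0xFE, 0xFF, 0x00, 0x41, 0x00, 0x0D, 0x00, 0x0A, 0x00, 0x42};
    assert(decode_ucs2_be(bytes) == u"A\r\nB");
}
```

A correctly configured wide stream should hand back these same code units (with CRLF possibly collapsed to a single newline in text mode), never the BOM or ff ff.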

I suggest double-checking your OS documentation with respect to how you set the locale.

Edit: Did you set the locale?

locale::global( locale( "something if your system supports UCS-2" ) );

or

locale::global( encoding_support::ucs2_bigendian_encoding );

where encoding_support is some library.
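As one possible concrete version of "setting the locale", the sketch below imbues the stream itself with the standard std::codecvt_utf16 facet from <codecvt> rather than changing the global locale. This is an assumption about what would suit the asker, not something the answer prescribes. The helper name read_utf16be_file is hypothetical, and <codecvt> has been deprecated since C++17, so treat this as an illustration only.

```cpp
#include <cassert>
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: read a whole big-endian UTF-16/UCS-2 file into a wstring.
std::wstring read_utf16be_file(const char* path) {
    assert(path != nullptr);
    std::wifstream in;
    // Install the conversion facet before opening the file.
    // Big-endian is the facet's default byte order; consume_header makes it
    // skip a leading byte-order mark. The locale takes ownership of the facet.
    in.imbue(std::locale(
        in.getloc(),
        new std::codecvt_utf16<wchar_t, 0x10ffff, std::consume_header>));
    // Binary mode keeps the stream from translating raw 0x0D/0x0A bytes
    // before the facet has decoded them.
    in.open(path, std::ios::binary);
    std::wstring contents;
    // Same trick as in the question: no L'\0' occurs in the decoded text,
    // so getline runs to the end of the file.
    std::getline(in, contents, wchar_t(0));
    return contents;
}
```

Called on a file holding the bytes from the question, this should return the four characters A, CR, LF, B. Because the file is opened in binary mode, the CRLF pair is not collapsed into a single newline.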
