perl - How do I split Chinese characters one by one?


If there is no special character (such as white space, ":", etc.) between the first name and the last name, how do I split a Chinese name like the one below?

    use strict;
    use warnings;
    use Data::Dumper;

    my $fh = \*DATA;
    my $fname; # 小三
    my $lname; # 张
    while (my $name = <$fh>) {
        $name =~ ???;   # what goes here?
        print "$fname\n";
        print $lname;
    }

    __DATA__
    张小三

Desired output:

    小三
    张

[Update]

I am using ActivePerl 5.10.1 on Windows XP.

You have problems because you neglect to decode binary data into Perl strings during input and to encode Perl strings into binary data during output. The reason this matters is that regular expressions and their friend split work properly only on Perl strings.
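To make the difference concrete, here is a minimal sketch that compares the same name before and after decoding. It assumes the input arrives as UTF-8; the byte string is written out with hex escapes so the example does not depend on how the script file itself is saved.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 octets for 张小三, as they would arrive from a file or STDIN.
my $bytes = "\xE5\xBC\xA0\xE5\xB0\x8F\xE4\xB8\x89";

# Undecoded, Perl treats every octet as one character.
my $byte_count = length $bytes;

# Decoded, Perl sees the three Chinese characters.
my $chars      = decode('UTF-8', $bytes);
my $char_count = length $chars;

# Splitting the decoded string yields one element per character.
my @pieces = split //, $chars;

printf "%d bytes, %d characters, %d pieces\n",
    $byte_count, $char_count, scalar @pieces;
# prints "9 bytes, 3 characters, 3 pieces"
```

Running the same split on the undecoded $bytes would cut the name into nine single octets, which is why the pattern appears not to work on raw input.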

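(?<=.) is a lookbehind that matches any position preceded by a character; because split is called with a limit of 2, the string is cut only at the first such position, that is, "after the first character". As such, the program will not work correctly on 复姓/compound family names; keep in mind that they are rare, but they do exist. In order to always split a name correctly into its family name and given name parts, you need to use a dictionary of family names.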
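The dictionary idea could be sketched roughly as follows. The helper split_name and the four-entry %compound_family_name table are hypothetical and only illustrate the approach; a real program would need a far more complete list. The sketch assumes the script file is saved as UTF-8 (hence use utf8) and that names passed in are already decoded Perl strings.

```perl
use strict;
use warnings;
use utf8;                 # this source file contains literal Chinese characters
use Encode qw(encode);

# Hypothetical, deliberately tiny dictionary of compound family names.
my %compound_family_name = map { $_ => 1 } qw(欧阳 司马 诸葛 上官);

# Illustrative helper, not part of any library.
# Expects an already decoded Perl string.
sub split_name {
    my ($full_name) = @_;

    # Prefer a two-character family name when the dictionary knows it
    # and at least one character remains for the given name.
    if (length($full_name) > 2) {
        my $prefix = substr($full_name, 0, 2);
        return ($prefix, substr($full_name, 2))
            if $compound_family_name{$prefix};
    }

    # Otherwise fall back to splitting after the first character.
    return split /(?<=.)/, $full_name, 2;
}

for my $name (qw(张小三 欧阳修 司马相如)) {
    my ($family_name, $given_name) = split_name($name);
    print encode('UTF-8', "$family_name / $given_name\n");
}
```

On a console that expects another encoding, such as GB18030 on Chinese Windows, the encode call would have to name that encoding instead. A lookup like this is still a heuristic: a single-character family name followed by a given name that happens to begin with the second character of a compound family name would be split incorrectly.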

Linux version:

    use strict;
    use warnings;
    use Encode qw(decode encode);

    while (my $full_name = <DATA>) {
        # Turn the raw UTF-8 bytes into a Perl character string.
        $full_name = decode('UTF-8', $full_name);
        chomp $full_name;
        # Split once, right after the first character.
        my ($family_name, $given_name) = split(/(?<=.)/, $full_name, 2);
        # Encode the result back into bytes for output.
        print encode('UTF-8',
            sprintf('The full name is %s, the family name is %s, the given name is %s.', $full_name, $family_name, $given_name)
        );
    }

    __DATA__
    张小三

Output:

    The full name is 张小三, the family name is 张, the given name is 小三.

Windows version:

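    use strict;
    use warnings;
    use Encode qw(decode encode);
    use Encode::HanExtra qw();   # provides the GB18030 encoding

    while (my $full_name = <DATA>) {
        # Turn the raw GB18030 bytes into a Perl character string.
        $full_name = decode('gb18030', $full_name);
        chomp $full_name;
        # Split once, right after the first character.
        my ($family_name, $given_name) = split(/(?<=.)/, $full_name, 2);
        # Encode the result back into bytes for output.
        print encode('gb18030',
            sprintf('The full name is %s, the family name is %s, the given name is %s.', $full_name, $family_name, $given_name)
        );
    }

    __DATA__
    张小三

Output:

    The full name is 张小三, the family name is 张, the given name is 小三.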
