Perl - How do I split Chinese characters one by one?
If there is no special character (such as white space, ":", etc.) between the first name and the last name,
how do I split Chinese characters like the ones below?
    use strict;
    use warnings;
    use Data::Dumper;

    my $fh = \*DATA;
    my $fname; # 小三
    my $lname; # 张
    while (my $name = <$fh>) {
        $name =~ ??? ;
        print "$fname\n";
        print $lname;
    }

    __DATA__
    张小三
Output:

    小三
    张
[Update]
I am using ActivePerl 5.10.1 on Windows XP.
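You have problems because you neglect to decode binary data into Perl strings during input and to encode Perl strings into binary data during output. This matters because regular expressions and their friend split work properly on Perl (character) strings; on undecoded input they operate on individual bytes.

(?<=.) matches the position after a character; combined with the limit of 2, split cuts the name once, after the first character. As such, this program will not work correctly on 复姓 (compound family names); keep in mind that they are rare, but they do exist. To always split a name correctly into its family name and given name parts, you need to use a dictionary of family names.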
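The dictionary idea can be sketched roughly as follows. This is only an illustration, not part of the original answer: the `%compound_family_name` hash and the `split_name` subroutine are hypothetical names, the four entries are a tiny sample rather than a complete list, and the input is assumed to be already decoded into Perl strings (here via `use utf8` on source literals).

```perl
use strict;
use warnings;
use utf8;                               # this source file contains UTF-8 literals
binmode(STDOUT, ':encoding(UTF-8)');    # encode Perl strings on output

# Hypothetical, deliberately tiny dictionary of compound family names.
# A real dictionary would need many more entries.
my %compound_family_name = map { $_ => 1 } qw(欧阳 司马 诸葛 上官);

# Takes a decoded Perl string and returns (family name, given name).
sub split_name {
    my ($full_name) = @_;

    # Try the first two characters as a compound family name,
    # but only if at least one character is left for the given name.
    if (length($full_name) > 2) {
        my $prefix = substr($full_name, 0, 2);
        return ($prefix, substr($full_name, 2))
            if $compound_family_name{$prefix};
    }

    # Otherwise assume a single-character family name.
    return (substr($full_name, 0, 1), substr($full_name, 1));
}

for my $full_name (qw(张小三 欧阳修 司马相如)) {
    my ($family_name, $given_name) = split_name($full_name);
    print "$family_name / $given_name\n";
}
```

A lookup like this is still a heuristic: a person whose single-character family name is 欧 and whose given name starts with 阳 would be split incorrectly, so ambiguous cases may need additional information.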
Linux version:

    use strict;
    use warnings;
    use Encode qw(decode encode);

    while (my $full_name = <DATA>) {
        # Turn the raw UTF-8 bytes into a Perl character string.
        $full_name = decode('UTF-8', $full_name);
        chomp $full_name;
        # Split once, after the first character.
        my ($family_name, $given_name) = split(/(?<=.)/, $full_name, 2);
        # Turn the Perl string back into UTF-8 bytes for output.
        print encode('UTF-8',
            sprintf("The full name is %s, the family name is %s, the given name is %s.\n",
                $full_name, $family_name, $given_name)
        );
    }

    __DATA__
    张小三

Output:

    The full name is 张小三, the family name is 张, the given name is 小三.
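Windows version:

    use strict;
    use warnings;
    use Encode qw(decode encode);
    use Encode::HanExtra qw();   # provides the GB18030 encoding

    while (my $full_name = <DATA>) {
        # Turn the raw GB18030 bytes into a Perl character string.
        $full_name = decode('GB18030', $full_name);
        chomp $full_name;
        # Split once, after the first character.
        my ($family_name, $given_name) = split(/(?<=.)/, $full_name, 2);
        # Turn the Perl string back into GB18030 bytes for output.
        print encode('GB18030',
            sprintf("The full name is %s, the family name is %s, the given name is %s.\n",
                $full_name, $family_name, $given_name)
        );
    }

    __DATA__
    张小三

Output:

    The full name is 张小三, the family name is 张, the given name is 小三.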
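To see why the decode step in both versions matters, the following small sketch compares the same split on undecoded bytes and on a decoded string. It is an illustration only: the variable names are made up for this example, and the name is written as UTF-8 byte escapes so the result does not depend on how the script file itself is saved.

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 byte sequence for 张小三 (U+5F20 U+5C0F U+4E09), written as escapes.
my $bytes = "\xE5\xBC\xA0\xE5\xB0\x8F\xE4\xB8\x89";

# The same name as a Perl character string.
my $chars = decode('UTF-8', $bytes);

printf "length without decode: %d\n", length($bytes);
printf "length with decode:    %d\n", length($chars);

# The same split as in the answer, applied to both forms.
my @from_bytes = split(/(?<=.)/, $bytes, 2);
my @from_chars = split(/(?<=.)/, $chars, 2);

printf "first piece without decode: byte 0x%02X\n", ord($from_bytes[0]);
printf "first piece with decode:    U+%04X\n",      ord($from_chars[0]);
```

Without decoding, the string is nine units long and the split cuts off only the first byte of 张 (0xE5), which is not a valid character on its own. With decoding, the string is three characters long and the first piece is the whole character U+5F20.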