When I use Encode::MIME::Header to encode a From: header, it encodes the entire string, including the header name:
$ perl -Mutf8 -MEncode -E 'say encode(MIME_Header => q{From: "児島 新" <shin@kojima.org>});'
=?UTF-8?B?RnJvbTogIuWFkOWztiDmlrAiIDxzaGluQGtvamltYS5vcmc+?=
What I expect is:
From: "=?UTF-8?B?5YWQ5bO2IOaWsA==?=" <shin@kojima.org>
Could someone point out what's wrong with my code and what I should do? Is Encode::MIME::Header broken?
Here is my $Encode::VERSION:
$ perl -MEncode -E 'say $Encode::VERSION;'
2.93
I'm sure that Encode::MIME::Header did not behave like this before.
The system perl on OS X seems pretty sane to me:
$ /usr/bin/perl -Mutf8 -MEncode -E 'say $Encode::VERSION; say encode(MIME_Header => q{From: "児島 新" <shin@kojima.org>});'
2.49
From: "=?UTF-8?B?5YWQ5bO2IOaWsA==?=" <shin@kojima.org>
On Ubuntu with Perl 5.26.1 I have encountered the following problem when working on Dancer::Logger::Console. I've lifted this code out of Dancer2::Core::Role::Logger.
In order to run this, you need to generate the following locales:
sudo locale-gen de_DE.UTF-8
sudo locale-gen ko_KR.UTF-8
This example code uses the Korean locale, and fails without an error message. $@ is empty.
$ LC_ALL=ko_KR.UTF-8 perl -MPOSIX -MEncode -E 'eval {
say Encode::decode("UTF-8", strftime("%b", localtime))
};
say $@;
'
Wide character at -e line 1.
When run with a German locale, it succeeds (but throws a wide character warning, which we can ignore for this test).
$ LC_ALL=de_DE.UTF-8 perl -MPOSIX -MEncode -E 'eval {
say Encode::decode("UTF-8", strftime("%b", localtime))
};
say $@;
'
Wide character in say at -e line 2.
M�r
The %b formatting is the abbreviated month as localised word (see http://strftime.net/).
If we don't Encode::decode("UTF-8", ...), it works, and the version above with Korean produces 3월.
What's going on here?
Under ko_KR.UTF-8, strftime("%b", localtime(1552997524)) returns 20.33.C6D4. When interpreted as Unicode Code Points, this is "␠3월" ("March", with a leading space).
Under de_DE.UTF-8, strftime("%b", localtime(1552997524)) returns 4D.E4.72. When interpreted as Unicode Code Points, this is "Mär" (short form of "März", "March").
So it seems decoded text (Unicode Code Points) is being returned, which is perfect. There is nothing left to decode; all that's left to do is to encode the output.
$ LC_ALL=ko_KR.UTF-8 perl -CSD -MPOSIX -e'CORE::say strftime("%b", localtime)'
3월
$ LC_ALL=de_DE.UTF-8 perl -CSD -MPOSIX -e'CORE::say strftime("%b", localtime)'
Mär
In a program (as opposed to a one-liner), you could use something like the following instead of -CSD:
use open ':std', ':encoding(UTF-8)';
To filter out duplicate (non-consecutive) lines of a file, the one-liner below works fine:
cat filename | perl -ane 'print unless $a{$_}++'
However, when I tried to turn it into an alias, it did not work as expected:
alias uniqlines " cat \!* | perl -ane 'print unless \$a{\$_}++' "
It errors out as below:
a: Undefined variable.
I am using the tcsh shell on SunOS.
In bash this syntax works:
alias uniqlines="perl -ane 'print unless \$a{\$_}++' "
Here is a way that seems to work even in tcsh:
alias uniqlines 'perl -ane '"'"'print unless $a{$_}++'"'"' '
You want to use a function instead:
uniqlines(){ cat "$@" | perl -ane 'print unless $a{$_}++'; }
Note that this is not the same as the following, because uniq only removes adjacent duplicate lines:
alias uniqlines=uniq
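A quick check of the function on sample input, next to plain uniq for comparison. This is a minimal sketch for a POSIX-style shell (not tcsh); the hash name %seen and the sample data are illustrative, and the -a (autosplit) switch is left out because the one-liner never uses the split fields.

```shell
# Hypothetical helper: keep the first occurrence of every line, in input order.
uniqlines() { perl -ne 'print unless $seen{$_}++' "$@"; }

sample='a
b
a
c
b'
all_dups=$(printf '%s\n' "$sample" | uniqlines)   # drops every repeated line
adjacent_only=$(printf '%s\n' "$sample" | uniq)   # only drops adjacent repeats
printf '%s\n' "$all_dups"
```

With this input no two equal lines are adjacent, so uniq leaves it unchanged while the Perl filter removes the later repeats.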
I have a utility that generates code documentation every night. I would like to add a timestamp to show how old the generated documentation is. I would like to use perl.
I've seen that with the following command I can replace a placeholder (%1) with any value I want:
perl -pi.bak -e 's/%1/date/g' footer.html
And with this other one I can get the system timestamp:
perl -MPOSIX -we "print POSIX::strftime('%d/%m/%Y %H:%M:%S', localtime)"
My question is whether there is any way to merge both commands into a single one.
Thank you very much
Try doing this:
perl -MPOSIX -pi.bak -e 'BEGIN{$date = strftime("%d/%m/%Y %H:%M:%S", localtime);} s/%1/$date/g' file.html
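sh command:
perl -i.bak -MPOSIX -pe's{%1}{strftime("%d/%m/%Y %H:%M:%S", localtime)}eg' footer.html
cmd command:
perl -i.bak -MPOSIX -pe"s{%1}{strftime('%d/%m/%Y %H:%M:%S', localtime)}eg" footer.html
/e causes the replacement expression to be treated as Perl code to execute, the result of which is the replacement text. Braces are used as delimiters because the format string itself contains slashes, which would otherwise end the replacement early.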
$ perl -MMIME::Base64 -e 'print encode_base64("syn_ack@163.com");'
c3luX2Fjay5jb20=
$ perl -MMIME::Base64 -e 'print decode_base64("c3luX2Fjay5jb20=");'
syn_ack.com
The encoded result does not decode back to the original string. Why?
You have to escape @ as \@ or use different quotes.
This is because double-quoted strings are interpolated, and @163 is treated as the array @163 (even though this name is not an ordinary identifier).
This works as expected:
perl -MMIME::Base64 -e "print encode_base64('syn_ack@163.com');"
c3luX2Fja0AxNjMuY29t
perl -MMIME::Base64 -e 'print encode_base64("syn_ack\@163.com");'
c3luX2Fja0AxNjMuY29t
perl -MMIME::Base64 -e "print decode_base64('c3luX2Fja0AxNjMuY29t');"
syn_ack@163.com
Switch the quotes. Perl will interpolate variables when using double-quotes.
$ perl -MMIME::Base64 -e "print encode_base64('syn_ack@163.com');"
c3luX2Fja0AxNjMuY29t
$ perl -MMIME::Base64 -e "print decode_base64('c3luX2Fja0AxNjMuY29t');"
syn_ack@163.com
http://perlmeme.org/howtos/using_perl/interpolation.html
When you see unexpected results with Perl, make sure warnings are enabled.
$ perl -w -MMIME::Base64 -e 'print encode_base64("syn_ack@163.com");'
Possible unintended interpolation of @163 in string at -e line 1.
c3luX2Fjay5jb20=
No interpolation occurs inside single-quoted ('') strings, so you could run (with double quotes for the shell, since single quotes cannot be nested)
perl -w -MMIME::Base64 -e "print encode_base64('syn_ack@163.com');"
or leave the double-quotes ("") and escape the @
perl -w -MMIME::Base64 -e 'print encode_base64("syn_ack\@163.com");'
Either outputs
c3luX2Fja0AxNjMuY29t
Decoding gives
$ perl -w -MMIME::Base64 -e 'print decode_base64("c3luX2Fja0AxNjMuY29t");'
syn_ack@163.com
How can I get the default encoding used by the current platform?
Is there a module for this on CPAN or in the Perl distribution itself?
I can't find the solution on perl.org.
See I18N::Langinfo.
$ LANG=en_US.UTF-8 perl -MI18N::Langinfo=langinfo,CODESET -E 'say langinfo(CODESET())'
UTF-8
$ LANG=C perl -MI18N::Langinfo=langinfo,CODESET -E 'say langinfo(CODESET())'
ANSI_X3.4-1968
$ LANG=ja_JP.eucjp perl -MI18N::Langinfo=langinfo,CODESET -E 'say langinfo(CODESET())'
EUC-JP
This is probably what you're looking for. If you follow the code in I18N::Langinfo, you can see how it discovers what locale to use for returning this.