How to get the default encoding of the current OS in a Perl script?

How can I get the default encoding used by the current platform?
Is there a module on CPAN, or one shipped with Perl itself, that provides this?
I can't find the solution on perl.org.

See I18N::Langinfo.
$ LANG=en_US.UTF-8 perl -MI18N::Langinfo=langinfo,CODESET -E 'say langinfo(CODESET())'
UTF-8
$ LANG=C perl -MI18N::Langinfo=langinfo,CODESET -E 'say langinfo(CODESET())'
ANSI_X3.4-1968
$ LANG=ja_JP.eucjp perl -MI18N::Langinfo=langinfo,CODESET -E 'say langinfo(CODESET())'
EUC-JP
This is probably what you're looking for. If you follow the code in I18N::Langinfo, you can see how it determines which locale to use when returning this value.
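The same call works inside a script. Here is a minimal sketch that also tries to turn the codeset name into an Encode object. Whether Encode recognises a particular name (for example ANSI_X3.4-1968) depends on its alias table, which is why the code checks for failure instead of assuming success.

```perl
use strict;
use warnings;
use I18N::Langinfo qw(langinfo CODESET);
use Encode qw(find_encoding);

# Ask the C library for the character set of the current locale,
# e.g. "UTF-8", "ANSI_X3.4-1968" or "EUC-JP" as in the one-liners above.
my $codeset = langinfo(CODESET);

# Try to map that name to an Encode object usable with decode/encode;
# find_encoding returns undef if Encode doesn't know the name.
my $enc = find_encoding($codeset);

printf "Locale codeset: %s (Encode name: %s)\n",
    $codeset, $enc ? $enc->name : 'not recognised by Encode';
```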

Related

Why does Encode::decode with non-latin letter locales blow up on localised strftime output?

On Ubuntu with Perl 5.26.1, I ran into the following problem while working on Dancer::Logger::Console. I've lifted this code out of Dancer2::Core::Role::Logger.
In order to run this, you need to generate the following locales:
sudo locale-gen de_DE.UTF-8
sudo locale-gen ko_KR.UTF-8
This example code uses the Korean locale and fails without an error message. $@ is empty.
$ LC_ALL=ko_KR.UTF-8 perl -MPOSIX -MEncode -E 'eval {
say Encode::decode("UTF-8", strftime("%b", localtime))
};
say $@;
'
Wide character at -e line 1.
When run with a German locale, it succeeds (but emits a wide character warning, which we can ignore for this test).
$ LC_ALL=de_DE.UTF-8 perl -MPOSIX -MEncode -E 'eval {
say Encode::decode("UTF-8", strftime("%b", localtime))
};
say $@;
'
Wide character in say at -e line 2.
M�r
The %b format produces the locale's abbreviated month name (see http://strftime.net/).
If we don't call Encode::decode("UTF-8", ...), it works, and the Korean version above prints 3월.
What's going on here?
Under ko_KR.UTF-8, strftime("%b", localtime(1552997524)) returns 20.33.C6D4. When interpreted as Unicode Code Points, this is "␠3월" ("March", with a leading space).
Under de_DE.UTF-8, strftime("%b", localtime(1552997524)) returns 4D.E4.72. When interpreted as Unicode Code Points, this is "Mär" (short form of "März", "March").
So it appears that strftime is already returning decoded text (Unicode code points), which is exactly what you want. All that's left to do is to encode the output.
$ LC_ALL=ko_KR.UTF-8 perl -CSD -MPOSIX -e'CORE::say strftime("%b", localtime)'
3월
$ LC_ALL=de_DE.UTF-8 perl -CSD -MPOSIX -e'CORE::say strftime("%b", localtime)'
Mär
In a program (as opposed to a one-liner), you could use something like the following instead of -CSD:
use open ':std', ':encoding(UTF-8)';
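A minimal script version of the one-liners above might look like this. It assumes a UTF-8 locale and the Perl 5.26 behaviour observed here, where strftime already returns decoded text.

```perl
use strict;
use warnings;
use feature 'say';
use POSIX qw(strftime);
# UTF-8 layers on STDIN/STDOUT/STDERR and as the default for new handles,
# the in-program counterpart of -CSD.
use open ':std', ':encoding(UTF-8)';

# No Encode::decode here: strftime's result is already a character string,
# and decoding it a second time is what broke the original code.
my $month = strftime('%b', localtime);
say $month;
```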

How to use Encode::MIME::Header for From header properly?

When I use Encode::MIME::Header to encode a From: header, it encodes the entire string, including the header name:
$ perl -Mutf8 -MEncode -E 'say encode(MIME_Header => q{From: "児島 新" <shin@kojima.org>});'
=?UTF-8?B?RnJvbTogIuWFkOWztiDmlrAiIDxzaGluQGtvamltYS5vcmc+?=
What I expect is:
From: "=?UTF-8?B?5YWQ5bO2IOaWsA==?=" <shin@kojima.org>
Could someone point out what's wrong with my code and what I should do? Is Encode::MIME::Header broken?
Here is my $Encode::VERSION:
$ perl -MEncode -E 'say $Encode::VERSION;'
2.93
I'm sure Encode::MIME::Header didn't behave like this before.
The system perl on OS X behaves as I'd expect:
$ /usr/bin/perl -Mutf8 -MEncode -E 'say $Encode::VERSION; say encode(MIME_Header => q{From: "児島 新" <shin@kojima.org>});'
2.49
From: "=?UTF-8?B?5YWQ5bO2IOaWsA==?=" <shin@kojima.org>
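The newer Encode::MIME::Header in the question encodes its whole input, header name included. One possible workaround is to encode only the display name and assemble the header yourself. This is a sketch, not something taken from the module's documentation. RFC 2047 does not allow an encoded-word inside a quoted string, so the sketch leaves out the double quotes around the name.

```perl
use strict;
use warnings;
use utf8;                  # the Japanese literal below is UTF-8 source text
use Encode qw(encode);

my $name    = '児島 新';
my $address = 'shin@kojima.org';

# Encode only the display name as an RFC 2047 encoded-word; the header
# name and the address stay plain ASCII.
my $encoded_name = encode('MIME-Header', $name);

my $header = "From: $encoded_name <$address>";
print "$header\n";
```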

Perl verbose output?

Is there a way to get Perl debug output similar to bash -x, but in Perl?
I don't need strict or diagnostics messages (they report problems in the code but don't print each line as the Perl interpreter executes it).
Assuming you are using some kind of Unix, you can use the Devel::Trace Perl module.
If it is not installed, you can install it from CPAN like this:
sudo perl -MCPAN -e 'install Devel::Trace'
Once you have it you can run your script like this:
perl -d:Trace myscript.pl
And it will do exactly what bash -x does (note that the name of the Trace package is case sensitive).
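Devel::Trace works by defining the debugger hook DB::DB, which perl calls before every statement when a program runs under -d. If installing from CPAN isn't possible, a stripped-down tracer along the same lines might look like the sketch below. Devel::MiniTrace is a hypothetical name: save the code as Devel/MiniTrace.pm in a directory on @INC and run it with perl -I. -d:MiniTrace myscript.pl.

```perl
# Devel/MiniTrace.pm -- hypothetical module name, modelled on Devel::Trace.
package Devel::MiniTrace;
use strict;
use warnings;

package DB;

# Under -d, perl calls DB::DB before executing each statement.
sub DB {
    my ($package, $file, $line) = caller;

    # Under -d, perl keeps each compiled file's source lines in @{"main::_<$file"}.
    no strict 'refs';
    my $src = ${"main::_<$file"}[$line];
    $src = "\n" unless defined $src;

    print STDERR ">> $file:$line: $src";
}

1;
```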

Perl: Replace text parameter by current timestamp

I have a utility that generates code documentation every night. I would like to add a timestamp so I can tell how old the generated documentation is, and I would like to use Perl for this.
I've seen that with the following command I can replace a placeholder (%1) with any value I want:
perl -pi.bak -e 's/%1/date/g' footer.html
And with this other one I can get the system timestamp:
perl -MPOSIX -we "print POSIX::strftime('%d/%m/%Y %H:%M:%S', localtime)"
My question is whether there is any way to combine both into a single command.
Thank you very much
Try this:
perl -MPOSIX -pi.bak -e 'BEGIN{$date = strftime("%d/%m/%Y %H:%M:%S", localtime);} s/%1/$date/g' file.html
sh command:
perl -i.bak -MPOSIX -pe's{%1}{strftime("%d/%m/%Y %H:%M:%S", localtime)}eg' footer.html
cmd command:
perl -i.bak -MPOSIX -pe"s{%1}{strftime('%d/%m/%Y %H:%M:%S', localtime)}eg" footer.html
The /e modifier causes the replacement to be evaluated as Perl code, and the result is used as the replacement text. Braces are used as the s/// delimiters because the date format itself contains slashes.
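In a script rather than a one-liner, the same /e substitution can be applied to a string. Here is a minimal sketch. The sample HTML is made up, and in practice it would be read from footer.html.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical footer content with the %1 placeholder.
my $html = "<p>Generated on %1</p>\n";

# Braces as delimiters so the slashes in the date format don't end the s///;
# /e evaluates the replacement as Perl code, /g replaces every %1.
$html =~ s{%1}{strftime('%d/%m/%Y %H:%M:%S', localtime)}eg;
print $html;
```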

grep regex to perl or awk

I have been using a Linux environment and recently migrated to Solaris. Unfortunately, one of my bash scripts relies on grep's -P switch (PCRE support). Since Solaris grep doesn't support the PCRE option, I have to find another solution. pcregrep seems to have an obvious looping bug, and sed's -r option is unsupported!
I hope that using Perl or nawk will solve the problem on Solaris.
I haven't used Perl in my scripts yet and know neither its syntax nor its flags.
Since it's PCRE, I believe a Perl scripter can help me out in a matter of minutes. The patterns need to match across multiple lines.
Which would be the better solution in terms of efficiency, awk or Perl?
Thanks for the replies.
These are some grep to perl conversions you might need:
grep -P PATTERN FILE(s) ---> perl -nle 'print if m/PATTERN/' FILE(s)
grep -Po PATTERN FILE(s) ---> perl -nle 'print $1 while m/(PATTERN)/g' FILE(s)
That's my guess as to what you're looking for, if grep -P is out of the question.
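The question also asks for matches that span multiple lines, which a line-by-line -n loop cannot produce. One way to handle this is to read the whole input as a single string and add the /s modifier so that . also matches newlines. On the command line this is roughly perl -0777 -ne 'print "$1\n" while /(PATTERN)/gs' FILE, where -0777 makes Perl read each file in one piece. Here is a sketch with made-up BEGIN/END sample text:

```perl
use strict;
use warnings;

# Made-up input in which each match spans several lines.
my $text = "BEGIN\nfoo\nbar\nEND\nnoise\nBEGIN\nbaz\nEND\n";

my @found;
# /s lets . match newlines; the non-greedy .*? stops at the first END.
while ($text =~ /(BEGIN.*?END)/gs) {
    push @found, $1;
}
print "$_\n---\n" for @found;
```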
Here's a short one:
grep -P 'regex' ====> perl -ne 'print if /regex/;'
The -n switch makes Perl loop over every line of the input file(s). Each line is placed in the special variable $_ in turn.
The -e switch means the Perl program is given on the command line instead of in a file.
Perl's print function prints whatever is in $_ if you don't give it anything else to print.
The if /regex/ modifier matches the regular expression against the current line, which is in $_.
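Together, these switches amount to a loop like the one below (perlrun documents -n as wrapping the program in LINE: while (<>) { ... }). So that the sketch runs on its own, it reads from an in-memory string instead of files. The ERROR pattern and the sample lines are made up.

```perl
use strict;
use warnings;

# Made-up input standing in for the files given on the command line.
my $input = "ok 1\nERROR: disk full\nok 2\nERROR: timeout\n";
open my $in, '<', \$input or die "cannot open in-memory file: $!";

my @matched;
LINE: while (<$in>) {          # each line is assigned to $_
    next LINE unless /ERROR/;  # same test as `print if /ERROR/`
    print;                     # print with no arguments prints $_
    push @matched, $_;
}
close $in;
```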