Perl: How to solve encoding trouble in the Windows console?

I'm trying to use Russian letters, and the console acts like a donkey: it does not react to the use utf8/utf-8 or cp1251 directives.
I don't know what encoding the text marked in red is.
Does anybody know how to solve that? Code listing below:
#!/usr/bin/perl -w
use strict;
use warnings;
use Tie::IxHash;
tie my %hash, "Tie::IxHash";
%hash = (
'шляпа' => 'серая',
'водка' => 'горькая',
'вобла' => 'вкусная');
print "В упорядоченной вставке список хеша такой:\n";
foreach my $qwerty (keys %hash){
print " $qwerty\n";
}
print "Кроме того, предметы обладают некоторыми свойствами:\n";
while((my($predmet, $opredelenie)) = each %hash){
print "$predmet $opredelenie","\n";
}

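You need to specify the encoding of STDOUT. A Russian-locale Windows console uses code page 866 by default (run chcp to check), so the output has to be encoded as cp866. This script itself is saved as UTF-8: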
use strict;
use warnings;
use utf8;                             # the source (and its string literals) is UTF-8
use Tie::IxHash;                      # keep the keys in insertion order
binmode STDOUT, ":encoding(cp866)";   # encode all output for the cp866 console
tie my %hash, 'Tie::IxHash';
%hash = (
    'шляпа' => 'серая',
    'водка' => 'горькая',
    'вобла' => 'вкусная'
);
print "В упорядоченной вставке список хеша такой:\n";
foreach my $qwerty ( keys %hash ) {
    print " $qwerty\n";
}
print "Кроме того, предметы обладают некоторыми свойствами:\n";
while ( my ( $predmet, $opredelenie ) = each %hash ) {
    print "$predmet $opredelenie\n";
}
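The console shows garbage because it interprets the bytes Perl writes according to its active code page, and the same word turns into different bytes in each encoding. The minimal sketch below uses only the core Encode module to print those bytes for one word; encoded_hex is a hypothetical helper name, not part of Encode.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                  # the Cyrillic literal below is UTF-8 in the source
use Encode qw(encode);
# Return the bytes that $text becomes in $encoding, as space-separated hex
sub encoded_hex {
    my ( $text, $encoding ) = @_;
    my $bytes = encode( $encoding, $text );
    return join ' ', map { sprintf( '%02X', ord $_ ) } split //, $bytes;
}
my $word = 'шляпа';
for my $enc (qw(UTF-8 cp1251 cp866)) {
    printf "%-6s %s\n", $enc, encoded_hex( $word, $enc );
}
# prints:
# UTF-8  D1 88 D0 BB D1 8F D0 BF D0 B0
# cp1251 F8 EB FF EF E0
# cp866  E8 AB EF AF A0
```

Sending the UTF-8 bytes to a cp866 console shows two wrong characters per Cyrillic letter; encoding to cp866 first produces exactly the one byte per letter that the console expects.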

Related

Transform a string to a hash

Is there any way or library to transform a string into a hash?
I have a string like this:
{"hello"=>["world","perl"],"foo"=>"bar"}
and I would like to access the different values as if it were a hash.
Thank you
Convert the string from Perl data-structure syntax to JSON by substituting => with :, then decode it with the JSON module. (The substitution assumes that the sequence "=> never occurs inside a quoted value.)
#!/usr/bin/env perl
use warnings FATAL => 'all';
use strict;
use Data::Dumper;
use JSON qw(decode_json); # use JSON::XS for more performance
my $string = '{"hello"=>["world","perl"],"foo"=>"bar"}';
$string =~ s/"=>/":/g;
print Dumper(decode_json($string));
Output
$VAR1 = {
'hello' => [
'world',
'perl'
],
'foo' => 'bar'
};
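If you would rather not depend on the non-core JSON module, JSON::PP has shipped with Perl since 5.14 and also provides decode_json. A minimal sketch, under the same assumption that "=> never occurs inside a quoted value; perlish_to_data is a hypothetical helper name:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use JSON::PP qw(decode_json);    # core module since Perl 5.14
# Assumption: "=> only ever appears between a quoted key and its value,
# never inside a quoted string, so a plain substitution is safe.
sub perlish_to_data {
    my ($string) = @_;
    ( my $json = $string ) =~ s/"\s*=>/":/g;
    return decode_json($json);
}
my $data = perlish_to_data('{"hello"=>["world","perl"],"foo"=>"bar"}');
# The result is an ordinary hash reference
print "foo   => $data->{foo}\n";
print "hello => @{ $data->{hello} }\n";
print "second hello item: $data->{hello}[1]\n";
```

After decoding, the values are reached with the usual arrow syntax, exactly as for a hash reference built in Perl.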
Using eval():
#!/usr/bin/env perl
use strict;
use Data::Dumper;
my $string = q({"hello"=>["world","perl"],"foo"=>"bar"});
print "String: $string\n";
my $hash = eval($string);
print "Hash: ", Dumper($hash), "\n";
Output
String: {"hello"=>["world","perl"],"foo"=>"bar"}
Hash: $VAR1 = {
'foo' => 'bar',
'hello' => [
'world',
'perl'
]
};
Using reval() from the Safe module, if you are at all concerned about the input (a plain eval will execute whatever code the string contains):
#!/usr/bin/env perl
use strict;
use Safe;
use Data::Dumper;
my $string = q({"hello"=>["world","perl"],"foo"=>"bar"});
print "String: $string\n";
my $compartment = Safe->new;
my $hash = $compartment->reval($string);
print $@ ? "reval error: $@" : ("Hash: ", Dumper($hash)), "\n";
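To see what the Safe compartment buys you over plain eval, here is a small sketch using only the core Safe module: the same reval call that accepts the data structure refuses a string that tries to run a command, because the system operator is not in the compartment's default operator mask.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Safe;
my $compartment = Safe->new;
# A plain data structure is accepted ...
my $ok = $compartment->reval('{"hello"=>["world","perl"],"foo"=>"bar"}');
if ($@) { print "rejected: $@" }
else    { print "accepted: foo => $ok->{foo}\n" }
# ... but code that tries to run a command is refused when it is compiled,
# because the 'system' op is not in Safe's default operator mask
my $bad = $compartment->reval('system("echo pwned")');
if ( defined $bad ) { print "ran the command!\n" }
else                { print "rejected: $@" }
```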
If you don't mind me plugging one of my own modules: Config::Perl uses PPI to parse strings like that, with no need for eval:
use warnings;
use strict;
use Data::Dumper; # Debug
use Config::Perl;
my $str = q( {"hello"=>["world","perl"],"foo"=>"bar"} );
my $data = Config::Perl->new->parse_or_die(\$str)->{_}[0];
print Dumper($data); # Debug
Output:
$VAR1 = {
'hello' => [
'world',
'perl'
],
'foo' => 'bar'
};
(The above code assumes that your data contains only a single hash ref; if it can vary, you'll have to look at the whole data structure returned by parse_or_die.)

array of hashes example fails

I have a problem with multidimensional arrays. I then tried an example from the book Perl 4th edition, page 379, and that failed as well! Why?
#!/usr/bin/perl
use strict;
use warnings;
# example in manual page 379
# input from file containing: husband=fred pal=barney wife=wilma pet=dino
while ( <> ) {
next unless s/^(.*?):\s*//;
my $who = $1;
for my $field ( split ) {
(my $key, my $value) = split /=/, $field;
my $HoH{$who}{$key} = $value;
}
}
Useful trick for illustrative examples: you can put your input inline in a __DATA__ section at the end of your file and read from that.
Anyway, when I run your code, I get:
Global symbol "$key" requires explicit package name (did you forget to declare "my $key"?)
Global symbol "$value" requires explicit package name (did you forget to declare "my $value"?)
You are also declaring %HoH incorrectly: my $HoH{$who}{$key} is not valid syntax. Declare the hash once, before the loop, instead:
my %HoH;
Also, that regex will skip your input text, because it looks for a : and your input doesn't contain one. I will assume that line should be prefixed with flintstone:.
So to simplify and give you something that works:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my %HoH;
while (<DATA>) {
next unless s/^(.*?):\s*//;
my $who = $1;
for my $field (split) {
my ( $key, $value ) = split /=/, $field;
$HoH{$who}{$key} = $value;
}
}
print Dumper \%HoH;
__DATA__
flintstone: husband=fred pal=barney wife=wilma pet=dino
This outputs the dumped HoH:
$VAR1 = {
'flintstone' => {
'husband' => 'fred',
'wife' => 'wilma',
'pal' => 'barney',
'pet' => 'dino'
}
};
Your code contains some syntax errors, and your input text is wrong (there is no : in it).
Try this:
use strict;
use warnings;
use Data::Dumper;
my %HoH;
while ( <DATA> ) {
next unless s/^(.*?):\s*//;
my $who = $1;
for my $field ( split ) {
my ($key, $value) = split /=/, $field;
$HoH{$who}{$key} = $value;
}
}
print Dumper \%HoH;
__DATA__
flintstones: husband=fred pal=barney wife=wilma pet=dino
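The same parsing logic as a reusable sketch that reads from an in-memory string instead of __DATA__ and then walks both levels of the result. parse_hoh and the second sample line are illustrative only; passing a limit of 2 to split, so that a value containing = is kept whole, is an addition rather than part of the book's code.

```perl
#!/usr/bin/perl
use strict;
use warnings;
# Parse "who: key=value key=value ..." lines into a hash of hashes.
# Lines without a "who:" prefix are skipped, exactly as in the book's loop.
sub parse_hoh {
    my ($fh) = @_;
    my %HoH;
    while (<$fh>) {
        next unless s/^(.*?):\s*//;
        my $who = $1;
        for my $field (split) {
            # limit of 2: a value that itself contains "=" is kept whole
            my ( $key, $value ) = split /=/, $field, 2;
            $HoH{$who}{$key} = $value;
        }
    }
    return \%HoH;
}
# Illustrative input; the third line has no "who:" prefix and is ignored
my $input = <<'END';
flintstones: husband=fred pal=barney wife=wilma pet=dino
rubbles: husband=barney wife=betty
no prefix on this line
END
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
my $HoH = parse_hoh($fh);
# Walk both levels in sorted order so the output is stable
for my $who ( sort keys %$HoH ) {
    print "$who:";
    print " $_=$HoH->{$who}{$_}" for sort keys %{ $HoH->{$who} };
    print "\n";
}
```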

Read ini files without section names

I want to make a configuration file that holds some objects, like this (where, of course, none of the parameters can be considered a primary key):
param1=abc
param2=ghj

param1=bcd
param2=hjk
; always the same parameters
This file could be read with, say, Config::IniFiles, because it translates directly into an INI file like this:
[0]
param1=abc
param2=ghj
[1]
param1=bcd
param2=hjk
with, for example, something like
perl -pe 'if (m/^\s*$/ || !$section) { print "[", ($section++ || 0), "]\n" }'
And finish with
open my $fh, '<', "/path/to/config_file.ini" or die $!;
my $cfg = Config::IniFiles->new( -file => $fh );
(... then parse the sections, starting with 0.)
But this is becoming quite complex, so I have some questions:
(A) Is there a way to transform $fh so that I don't have to run the Perl one-liner BEFORE reading the file? That is, to transform the file while Perl is actually reading it.
or
(B) Is there a module to read my wonderful flat database, or something close to it? I have been told that GNU coreutils does this kind of flat-file reading, but I cannot remember how.
You can create a simple subclass of Config::INI::Reader:
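package MyReader;
use strict;
use warnings;
use base 'Config::INI::Reader';
sub new {
    my $class = shift;
    my $self  = $class->SUPER::new(@_);
    $self->{section} = 0;    # objects are numbered from 0
    return $self;
}
# lines before the first blank line belong to section 0
sub starting_section { 0 }
# keep blank lines (they separate objects) but still skip ; comments
sub can_ignore {
    my ( $self, $line ) = @_;
    return $line =~ /^\s*;/ ? 1 : 0;
}
# a blank line starts the next numbered section
sub parse_section_header {
    my ( $self, $line ) = @_;
    return $line =~ /^\s*$/ ? ++$self->{section} : undef;
}
1;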
With your input this gives:
% perl -MMyReader -MData::Dumper -e 'print Dumper( MyReader->read_file("cfg") )'
$VAR1 = {
'1' => {
'param2' => 'hjk',
'param1' => 'bcd'
},
'0' => {
'param2' => 'ghj',
'param1' => 'abc'
}
};
Instead of a file name, you can open a filehandle on a reference to a string variable; reading from that filehandle then reads the string's contents:
use strict;
use warnings;
use autodie;
use Config::IniFiles;
my $config = "/path/to/config_file.ini";
my $content = do {
local $/;
open my $fh, "<", $config;
"\n". <$fh>;
};
# one liner replacement
my $section = 0;
$content =~ s/^\s*$/ "\n[". $section++ ."]" /mge;
open my $fh, '<', \$content;
my $cfg = Config::IniFiles->new( -file => $fh );
# ...
You can store the modified data in a real file or in a string variable, but I suggest that you use paragraph mode, by setting the input record separator $/ to the empty string, like this:
use strict;
use warnings;
{
local $/ = ''; # Read file in "paragraphs"
my $section = 0;
while (<DATA>) {
printf "[%d]\n", $section++;
print;
}
}
__DATA__
param1=abc
param2=ghj

param1=bcd
param2=hjk
output
[0]
param1=abc
param2=ghj

[1]
param1=bcd
param2=hjk
Update
If you read the file into a string, adding section identifiers as above, then you can read the result directly into a Config::IniFiles object using a string reference, for instance
my $config = Config::IniFiles->new(-file => \$modified_contents)
This example shows the tie interface, which results in a Perl hash that contains the configuration information. I have used Data::Dump only to show the structure of the resultant hash.
use strict;
use warnings;
use Config::IniFiles;
my $config;
{
open my $fh, '<', 'config_file.ini' or die "Couldn't open config file: $!";
my $section = 0;
local $/ = '';
while (<$fh>) {
$config .= sprintf "[%d]\n", $section++;
$config .= $_;
}
};
tie my %config, 'Config::IniFiles', -file => \$config;
use Data::Dump;
dd \%config;
output
{
# tied Config::IniFiles
"0" => {
# tied Config::IniFiles::_section
param1 => "abc",
param2 => "ghj",
},
"1" => {
# tied Config::IniFiles::_section
param1 => "bcd",
param2 => "hjk",
},
}
You may want to operate on a stream of objects (as PowerShell does) instead of a stream of text, like this:
use strict;
use warnings;
use English;
sub operation {
# do something with objects
...
}
{
local $INPUT_RECORD_SEPARATOR = '';
# object are separated with empty lines
while (<STDIN>) {
# key value
my %object = ( m/^ ([^=]+) = ([[:print:]]*) $ /xmsg );
# key cannot have = included, which is the delimiter
# value are printable characters (one line only)
operation ( \%object )
}
}
I also like the other answers.
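For option (B), no module is strictly needed: paragraph mode already splits the file into objects. Below is a complete, runnable variant of the stream-of-objects idea that collects the objects into an array of hashes. read_records is a hypothetical name, and the sketch assumes keys never contain = and each value fits on one line; lines without =, such as ; comments, are simply skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;
# Read blank-line-separated "key=value" records into a list of hash refs.
# Assumptions: keys never contain "=", and each value fits on one line.
sub read_records {
    my ($fh) = @_;
    my @records;
    local $/ = '';    # paragraph mode: one or more blank lines end a record
    while ( my $paragraph = <$fh> ) {
        # every "key=value" line of the paragraph becomes one hash entry;
        # lines without "=" (e.g. "; comments") do not match and are skipped
        my %object = $paragraph =~ m/^([^=\n]+)=(.*)$/mg;
        push @records, \%object if %object;
    }
    return @records;
}
my $config = <<'END';
param1=abc
param2=ghj

param1=bcd
param2=hjk
END
open my $fh, '<', \$config or die "Cannot open in-memory file: $!";
my @objects = read_records($fh);
for my $i ( 0 .. $#objects ) {
    my $obj = $objects[$i];
    print "object $i: ",
        join( ', ', map { $_ . '=' . $obj->{$_} } sort keys %$obj ), "\n";
}
```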

Perl: stemming words in English text

I am trying to stem English text. I have read a lot of forums, but I couldn't find a clear example. I am using the Porter stemmer from Text::English.
This is how far I got:
use Lingua::StopWords qw(getStopWords);
my $stopwords = getStopWords('en');
use Text::English;
@stopwords = grep { $stopwords->{$_} } (keys %$stopwords);
chdir("c:/Test Facility/input");
@files = <*>;
foreach $file (@files)
{
open (input, $file);
while (<input>)
{
open (output,">>c:/Test Facility/normalized/".$file);
chomp;
for my $w (@stopwords)
{
s/\b\Q$w\E\b//ig;
}
$_ =~s/<[^>]*>//g;
$_ =~ s/[[:punct:]]//g;
##What should I write here to apply porter stemming using Text::English##
print output "$_\n";
}
}
close (input);
close (output);
Run the following code like this:
perl stemmer.pl /usr/lib/jvm/java-6-sun-1.6.0.26/jre/LICENSE
It produces output similar to:
operat system distributor licens java version sun microsystems inc sun willing to license java platform standard edition developer kit jdk
Note that, besides stopwords, strings of length 1 and numeric values are also removed.
#!/usr/bin/env perl
use common::sense;
use Encode;
use Lingua::Stem::Snowball;
use Lingua::StopWords qw(getStopWords);
use Scalar::Util qw(looks_like_number);
my $stemmer = Lingua::Stem::Snowball->new(
encoding => 'UTF-8',
lang => 'en',
);
my %stopwords = map {
lc
} keys %{getStopWords(en => 'UTF-8')};
local $, = ' ';
say map {
sub {
my @w =
map {
encode_utf8 $_
} grep {
length >= 2
and not looks_like_number($_)
and not exists $stopwords{lc($_)}
} split
/[\W_]+/x,
shift;
$stemmer->stem_in_place(\@w);
map {
lc decode_utf8 $_
} @w
}->($_);
} <>;
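The answer above switches to Lingua::Stem::Snowball. If you want to stay with Text::English as the question asks, here is a minimal sketch, assuming the module is installed from CPAN and that its stem function takes a list of words and returns their stems, called by its fully qualified name Text::English::stem as in the module's synopsis. stem_line is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::English;    # CPAN module implementing the Porter stemmer
# Split a line into words and stem each one.
# Assumption: words are runs of ASCII letters (punctuation already removed).
sub stem_line {
    my ($line) = @_;
    my @words = grep { length } split /[^A-Za-z]+/, $line;
    return Text::English::stem(@words);
}
my @stems = stem_line('The runners were running quickly');
print "@stems\n";
```

In the question's loop, the placeholder line could then become something like $_ = join ' ', stem_line($_);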

How to search for a string in a web page and print the full line in which it appears?

I'm new to programming and am learning Perl as well.
Here's my question: how do I search for a string in a web page and print the full line in which the string appears?
Is it possible to find that string directly and then print the full line containing it? Do we have to use XPath for this?
If it is just a very basic string you are looking for, you can use LWP::Simple and a small regular expression like this:
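use strict;
use warnings;
use LWP::Simple;
my $doc = get('http://stackoverflow.com/q/11771655/479133') || die "GET failed";
foreach my $line (split /\n/, $doc) {
    # print the first matching line (split removed its newline), then stop
    print "$line\n" and last if $line =~ m/Here's my query/;
}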
There are countless modules available on CPAN to do such things. Have a look at Task::Kensho::WebCrawling if you need something "bigger".
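The LWP::Simple snippet prints only the first matching line and then stops. If you want every line that contains the string, here is a minimal sketch that works on any string; it uses a hard-coded stand-in for what get() would return so it runs without network access, and matching_lines is a hypothetical name. \Q...\E makes characters such as . or ? in the search string match literally. Keep in mind that these are lines of the HTML source, not of the rendered page.

```perl
#!/usr/bin/perl
use strict;
use warnings;
# Return every line of $doc containing $needle; \Q...\E makes
# characters such as "." or "?" in $needle match literally.
sub matching_lines {
    my ( $doc, $needle ) = @_;
    return grep { /\Q$needle\E/ } split /\r?\n/, $doc;
}
# Stand-in for the page source that LWP::Simple's get() would return
my $doc = <<'END';
<html><body>
<p>I'm new to programming, learning perl as well.</p>
<p>Here's my question: How do I search a string in web page?</p>
<p>Do we need to use xpaths? Here's my question again.</p>
</body></html>
END
print "$_\n" for matching_lines( $doc, "Here's my question" );
```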
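You can also use LWP::UserAgent together with HTML::Parser. The text handler is called with the text between tags ('dtext' means entities are already decoded), so this searches the text content and ignores the tags themselves: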
#!/usr/bin/env perl
use strict;
use warnings;
use HTML::Parser;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
my $response = $ua->get('http://search.cpan.org/');
if ( !$response->is_success ) {
die "Request failed: ", $response->status_line, "\n";
}
my $parser = HTML::Parser->new( 'text_h' => [ \&text_handler, 'dtext' ] );
$parser->parse( $response->decoded_content );
$parser->eof;    # flush any text still buffered by the parser
sub text_handler {
chomp( my $text = shift );
if ( $text =~ /language/i ) {
print "Matched: $text\n";
}
}