Get all values in a hash from string after equals sign Perl - perl

I have a string like this "Test string has tes value like abc="123",bcd="345",or it it can be xyz="4567" and ytr="434"".
Now I want to get the values after the equals signs. The hash structure will be like this:
$hash->{abc} =123,
$hash->{bcd} =345,
$hash->{xyz} =4567,
I have tried this: $str =~ / (\S+) \s* = \s* (\S+) /xg

In list context, a match with the /g modifier returns all the captured pairs, which can be assigned straight to a hash (here an anonymous one, by wrapping the match in { }).
use warnings 'all';
use strict;
use feature 'say';
my $str = 'Test string has tes value like abc="123",bcd="345",or it '
. 'it can be xyz="4567" and ytr="434"';
my $rh = { $str =~ /(\w+)="(\d+)"/g };
say "$_ => $rh->{$_}" for keys %$rh ;
Prints
bcd => 345
abc => 123
ytr => 434
xyz => 4567
Following a comment – for possible spaces around the = sign, change it to \s*=\s*.
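A minimal sketch of that variation, assuming keys are word characters and values are double-quoted digits as in the question. The sub name parse_pairs and the sample string are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: collect key="digits" pairs, tolerating
# optional whitespace on either side of the equals sign.
sub parse_pairs {
    my ($str) = @_;
    my %pairs = $str =~ /(\w+)\s*=\s*"(\d+)"/g;
    return \%pairs;
}

my $rh = parse_pairs('abc = "123",bcd="345" and xyz= "4567"');
print "$_ => $rh->{$_}\n" for sort keys %$rh;
```

Sorting the keys only makes the printed order repeatable; the hash itself stays unordered.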

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $string = q{Test string has tes value like abc="123",bcd="345" and xyz="523"};
my %hash = $string =~ /(\w+)="(\d*)"/g;
print Dumper \%hash;
Output
$VAR1 = {
'xyz' => '523',
'abc' => '123',
'bcd' => '345'
};

Your test string looks like this (editing slightly to fix the quoting problems).
'Test string has tes value like abc="123",bcd="345",or it it can be xyz="4567" and ytr="434"'
I used this code to test your regex:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use Data::Dumper;
my $text = 'Test string has tes value like abc="123",bcd="345",or it it can be xyz="4567" and ytr="434"';
my %hash = $text =~ /(\S+)\s*=\s*(\S+)/g;
say Dumper \%hash;
Which gives this output:
$VAR1 = {
'abc="123",bcd' => '"345",or',
'ytr' => '"434"',
'xyz' => '"4567"'
};
The problem is that \S+ matches any non-whitespace character. And that's too much. You need to be more descriptive about the valid characters.
Your keys appear to all be letters. And your values are all digits, but they are surrounded by quote characters that you don't want. So try this regex instead: /([a-z]+)\s*=\s*"(\d+)"/g.
That gives:
$VAR1 = {
'bcd' => '345',
'abc' => '123',
'ytr' => '434',
'xyz' => '4567'
};
Which looks correct to me.

Related

Variable contains period in perl, but I don't want to concat

I am trying to create a hash from tabular data. One of the columns contains team names that are separated by .'s - for instance USA.APP.L2PES.
A sample line that will be split (| are delimiters):
http://10.x.x.x:8085/BINReport/servlet/Processing|2012/10/02 08:40:30|2015/03/10 16:00:42|nxcvapxxx_bin|Chandler|Linkpoint Connect|USA.APP.L2PES
And here is the code I'm using to split and arrange into a hash (found on another stack exchange comment).
my @records;
while (<$fh_OMDB>) {
my @fields = split /\s*\|\s*/, $_;
my %hash;
@hash{@headers} = @fields;
push @records, \%hash;
}
for my $record (@records) {
my $var = $record->{Arg04};
print $var;
}
However, whenever I try to print $var I get the error message
Use of uninitialized value $var in print at find_teams_v2.pl line 53,
line 2667.
I believe this is because the string contains a . and perl is trying to concat the values. How do I print the string as a literal?
Here is the output of Data::Dumper (only one record, as there are a couple thousand):
$VAR1 = {
'Arg01' => 'zionscatfdecs',
'Date Added' => '2013/08/06 10:30:04',
'URL' => 'https://zionscat.fdecs.com',
'Arg04
' => 'USA.FDFI.APP.TMECS2
',
'Date Updated' => '2013/08/06 10:30:04',
'Arg02' => 'Omaha',
'Arg03' => 'First Data eCustomer Service ()'
};
And the headers dump:
$VAR1 = 'URL';
$VAR2 = 'Date Added';
$VAR3 = 'Date Updated';
$VAR4 = 'Arg01';
$VAR5 = 'Arg02';
$VAR6 = 'Arg03';
$VAR7 = 'Arg04
';
Mocking up my own @headers it seems to work ok.
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my @headers = qw ( mee mah mo mum
fee fi fo fum );
my $string =
q{http://10.x.x.x:8085/BINReport/servlet/Processing|2012/10/02 08:40:30|2015/03/10 16:00:42|nxcvapxxx_bin|Chandler|Linkpoint Connect|USA.APP.L2PES};
my @records;
my @fields = split /\s*\|\s*/, $string;
print join( "\n", @fields );
my %hash;
@hash{@headers} = @fields;
print Dumper \%hash;
push @records, \%hash;
foreach my $record (@records) {
my $var = $record ->{fo};
print $var,"\n";
}
This prints the string, dots included, as expected. That's because on each pass through the loop the hash looks like this (with your sample data):
$VAR1 = {
'fi' => 'Linkpoint Connect',
'mo' => '2015/03/10 16:00:42',
'mum' => 'nxcvapxxx_bin',
'mee' => 'http://10.x.x.x:8085/BINReport/servlet/Processing',
'mah' => '2012/10/02 08:40:30',
'fee' => 'Chandler',
'fum' => undef,
'fo' => 'USA.APP.L2PES'
};
However, if my 'headers' row is shorter:
my @headers = qw ( mee mah mo mum );
I can reproduce your error:
Use of uninitialized value $var in print
That's because there's no {fo} element:
$VAR1 = {
'mo' => '2015/03/10 16:00:42',
'mee' => 'http://10.x.x.x:8085/BINReport/servlet/Processing',
'mum' => 'nxcvapxxx_bin',
'mah' => '2012/10/02 08:40:30'
};
So:
Make sure you have strict and warnings switched on.
Check @headers is declared.
Check @headers is actually long enough to match every field.
Running:
use Data::Dumper;
print Dumper \@headers;
Will tell you this.
And as noted in the comments above:
'Arg04
' => 'USA.FDFI.APP.TMECS2
{Arg04} doesn't exist, because the key actually ends in a newline. chomp your line before converting it into headers (or just chomp @headers) and your code will work.
Note: you may also wish to chomp inside your while loop, because otherwise the last field will also include a newline, which may be undesired.
Using the Data::Dumper printouts showed that I had fewer header fields than existed, so that when I made the call to Arg04 it didn’t exist or made a reference to an undef field. Doing a chomp on the header array fixed the error message.
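The fix can be sketched as below, assuming the first line holds the pipe-delimited headers. The sub name parse_table and the three-column sample data are invented for illustration and are much shorter than the real file.

```perl
use strict;
use warnings;

# Hypothetical helper: turn pipe-delimited lines (header line first)
# into a list of hash references keyed by the header names.
sub parse_table {
    my @lines = @_;
    chomp @lines;    # strip the trailing newline from every line, headers included
    my @headers = split /\s*\|\s*/, shift @lines;
    my @records;
    for my $line (@lines) {
        my %row;
        @row{@headers} = split /\s*\|\s*/, $line;    # hash slice assignment
        push @records, \%row;
    }
    return @records;
}

# Made-up sample: one header line and one data line.
my @records = parse_table(
    "URL|Arg02|Arg04\n",
    "https://zionscat.fdecs.com|Omaha|USA.FDFI.APP.TMECS2\n",
);
print "$_->{Arg04}\n" for @records;
```

Because the newline is removed before the split, the last header is the plain key Arg04 and the dotted team name prints without a warning.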

how would I trim and split into perl hash

I have this string
$st="station_1:50, station_2:40, station_3:60";
How would I split this into a Perl hash?
I tried
%hash = map{split /\:/, $_}(split /, /, $st);
It works correctly, but what if there are n spaces between the , and station?
How would I make it strip out all the leading spaces?
If there may or may not be a space, split on /, ?/ instead of /, /. If there may be any number of spaces, use /, */.
The solution with your code (added \s* to the second split):
perl -we '
$_ = "station_1:50, station_2:40, station_3:60"; # plain assignment: lexical "my $_" is rejected by newer perls
my %hash = map {split /:/} split /,\s*/;
use Data::Dumper;
print Dumper \%hash
'
OUTPUT:
$VAR1 = {
'station_1' => '50',
'station_3' => '60',
'station_2' => '40'
};
Another working way using regex:
CODE
$ echo "station_1:50, station_2:40, station_3:6" |
perl -MData::Dumper -lne '
my %h;
$h{$1} = $2 while /\b(station_\d+):(\d+)/ig;
print Dumper \%h
'
SAMPLE OUTPUT
$VAR1 = {
'station_3' => '6',
'station_1' => '50',
'station_2' => '40'
};
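If stray whitespace can also appear around the colons or at the ends of the string, a slightly more tolerant sketch is below. The sub name stations_to_hash and the messy sample string are assumptions for illustration, not part of the question.

```perl
use strict;
use warnings;

# Hypothetical helper: split "name:value" items separated by commas,
# ignoring whitespace around both the commas and the colons.
sub stations_to_hash {
    my ($st) = @_;
    $st =~ s/^\s+|\s+$//g;    # trim both ends of the whole string
    my %hash = map { split(/\s*:\s*/, $_, 2) } split /\s*,\s*/, $st;
    return %hash;
}

my %hash = stations_to_hash("station_1:50,   station_2 : 40 ,station_3:60 ");
print "$_ => $hash{$_}\n" for sort keys %hash;
```

The limit of 2 on the inner split keeps any later colons inside the value instead of producing extra list elements.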

Parsing HTML-attributes like strings

I have strings that look like HTML attributes:
key1="value1 value2" key2="va3" key4
I need to parse such a string to get a HoA (hash of arrays):
$parsed = {
key1 => [
'value1',
'value2'
],
key2 => [ 'val3' ], #or key2 => 'val3' doesn't matter..
key4 => undef,
};
Creating the function myself, like:
#!/usr/bin/env perl
use 5.014;
use strict;
use warnings;
use Data::Dumper;
while(<DATA>) {
my $parsed;
chomp;
next if m/\A\s*#/;
while( m/(\w+)(\s*=\s*(["'])(.*?)(\3))?/g ) {
my $k = $1;
if( $4 ) {
my @v = split(/\s+/, $4);
$parsed->{$k} = \@v;
}
else {
$parsed->{$k} = undef;
}
}
say Dumper $parsed;
}
__DATA__
key1="value1 value2" key2 key3="val3"
key1='value1 "value2"' key8 key3='val3'
key1='value1 i\'m' key2 key3="val3"
key1='value1 value2' key8 key3=val3
This works and prints correct results for the first 2 lines:
$VAR1 = {
'key1' => [
'value1',
'value2'
],
'key3' => [
'val3'
],
'key2' => undef
};
$VAR1 = {
'key1' => [
'value1',
'"value2"'
],
'key3' => [
'val3'
],
'key8' => undef
};
Unfortunately, it fails on the 3rd line; I don't know how to handle the escaped quotes. (And I just figured out that key=val (without quotes) is valid too.)
Additionally, because I don't want to reinvent the wheel, there is probably some module on CPAN for this, but I have no idea what to search for. ;(
EDIT
@mpapec suggested a module that could greatly help with parsing the RHS part of the "assignment". My problem is that the string contains multiple space-delimited LHS=RHS pairs, where the RHS may be quoted (single or double) or unquoted (in the case of a single value), and the values inside the quotes are space-delimited too.
key1="value1 value2" key2="va3" key4 key5=val5 key6='val6' key7='val x\'y zzz'
So I don't know how to break the string into multiple LHS=RHS parts, because I can't split on spaces and can't use my regex, since it fails on escaped quotes (maybe a more complicated regex that handles escapes could work).
Any suggestions, please?
You can use Text::ParseWords as mpapec suggested:
use strict;
use warnings;
use 5.010;
use Data::Dumper;
use Text::ParseWords;
$Data::Dumper::Sortkeys = 1;
my $string = q{key1="value1 value2" key2="va3" key4 key5=val5 key6='val6' key7='val x\'y zzz'};
my @words = shellwords $string;
my %parsed;
foreach my $word (@words) {
my ($key, $values) = split /=/, $word, 2;
$parsed{$key} //= [];
push @{ $parsed{$key} }, $_ for shellwords $values;
}
print Dumper \%parsed;
Output:
$VAR1 = {
'key1' => [
'value1',
'value2'
],
'key2' => [
'va3'
],
'key4' => [],
'key5' => [
'val5'
],
'key6' => [
'val6'
],
'key7' => [
'val',
'x\'y',
'zzz'
]
};
Note that for consistency, I assigned keys without values an empty array instead of undef. I think this will make the data structure easier to use.
Also note that I called shellwords twice. I did this to remove the backslashes from escaped quotes, so that
key7='val x\'y zzz'
gets split into
val x'y zzz
instead of
val x\'y zzz
(The backslash in x\'y in the output above is added by Data::Dumper; there's no backslash in the variable itself.)
To fix your current issue, you can set up an alternation to handle backslashes in a special way.
#!/usr/bin/env perl
use 5.014;
use strict;
use warnings;
use Data::Dump;
my $parsed;
while (<DATA>) {
chomp;
next if m/\A\s*#/;
while (
m{
(\w+)
(?:
\s* = \s*
(["'])
( (?: (?!\2)[^\\] | \\. )* )
\2
)?
}gx
)
{
my $k = $1;
if ($2) {
( my $val = $3 ) =~ s/\\(.)/$1/g; # Unescape backslashes
$parsed->{$k} = [ split /\s+/, $val ]; # Split words
} else {
$parsed->{$k} = undef;
}
}
dd $parsed;
print "\n";
}
__DATA__
key1="value1 value2" key2 key3="val3"
key1='value1 "value2"' key2 key3='val3'
key1='value1 i\'m' key2 key3="val3"
Outputs:
{ key1 => ["value1", "value2"], key2 => undef, key3 => ["val3"] }
{ key1 => ["value1", "\"value2\""], key2 => undef, key3 => ["val3"] }
{ key1 => ["value1", "i'm"], key2 => undef, key3 => ["val3"] }
There are still other issues to take into account, but perhaps this will help you get further along.
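One of those remaining issues is the unquoted key=val form mentioned in the question. A possible extension of the same regex is sketched below. The sub name parse_attrs is invented, and the rule that an unquoted value runs up to the next whitespace or quote character is an assumption rather than something the question specifies.

```perl
use strict;
use warnings;

# Hypothetical helper: parse key="a b", key='a b', key=bare and bare keys
# into a hash of array references (undef for keys with no value).
sub parse_attrs {
    my ($line) = @_;
    my %parsed;
    while (
        $line =~ m{
            (\w+)                                   # key
            (?:
                \s* = \s*
                (?:
                    (["'])                          # opening quote
                    ( (?: (?!\2)[^\\] | \\. )* )    # quoted body
                    \2                              # matching closing quote
                  |
                    ([^\s"']+)                      # unquoted value
                )
            )?
        }gx
      )
    {
        my ( $key, $quoted, $bare ) = ( $1, $3, $4 );
        my $val = defined $quoted ? $quoted : $bare;
        if ( defined $val ) {
            $val =~ s/\\(.)/$1/g;                   # drop the escaping backslashes
            $parsed{$key} = [ split ' ', $val ];    # split the value into words
        }
        else {
            $parsed{$key} = undef;
        }
    }
    return \%parsed;
}

my $attrs = parse_attrs(q{key1='value1 i\'m' key2 key3=val3});
for my $key ( sort keys %$attrs ) {
    my $vals = $attrs->{$key};
    print "$key: ", ( defined $vals ? join( '|', @$vals ) : '(none)' ), "\n";
}
```

Testing with defined rather than truth means an empty quoted value such as key="" yields an empty array instead of undef. Malformed input, for example an unclosed quote, is not handled.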
You might consider something based on Parser::MGC. Your examples look like a nice simple loop on
my $key = $self->token_ident;
$self->expect( '=' );
my $value = $self->token_string;

Bug with parsing by Text::CSV_XS?

Tried to use Text::CSV_XS to parse some logs. However, the following code doesn't do what I expected -- split the line into pieces according to separator " ".
The funny thing is, if I remove the double quote in the string $a, then it will do splitting.
Wonder if it's a bug or I missed something. Thanks!
use Text::CSV_XS;
$a = 'id=firewall time="2010-05-09 16:07:21 UTC"';
$userDefinedSeparator = Text::CSV_XS->new({sep_char => " "});
print "$userDefinedSeparator\n";
$userDefinedSeparator->parse($a);
my $e;
foreach $e ($userDefinedSeparator->fields) {
print $e, "\n";
}
EDIT:
In the above code snippet, if I change the = (after time) to a space, then it works fine. I started to wonder whether this is a bug after all.
$a = 'id=firewall time "2010-05-09 16:07:21 UTC"';
You have confused the module by leaving both the quote character and the escape character set to double quote ", and then left them embedded in the fields you want to split.
Disable both quote_char and escape_char, like this
use strict;
use warnings;
use Text::CSV_XS;
my $string = 'id=firewall time="2010-05-09 16:07:21 UTC"';
my $space_sep = Text::CSV_XS->new({
sep_char => ' ',
quote_char => undef,
escape_char => undef,
});
$space_sep->parse($string);
for my $field ($space_sep->fields) {
print "$field\n";
}
output
id=firewall
time="2010-05-09
16:07:21
UTC"
But note that you have achieved exactly the same thing as print "$_\n" for split ' ', $string, which is to be preferred as it is both more efficient and more concise.
In addition, you must always use strict and use warnings, and never use $a or $b as variable names, both because they are used by sort and because they are meaningless and undescriptive.
Update
As @ThisSuitIsBlackNot points out, your intention is probably not to split on spaces but to extract a series of key=value pairs. If so then this method puts the values straight into a hash.
use strict;
use warnings;
my $string = 'id=firewall time="2010-05-09 16:07:21 UTC"';
my %data = $string =~ / ([^=\s]+) \s* = \s* ( "[^"]*" | [^"\s]+ ) /xg;
use Data::Dump;
dd \%data;
output
{ id => "firewall", time => "\"2010-05-09 16:07:21 UTC\"" }
Update
This program will extract the two name=value strings and print them on separate lines.
use strict;
use warnings;
my $string = 'id=firewall time="2010-05-09 16:07:21 UTC"';
my @fields = $string =~ / (?: "[^"]*" | \S )+ /xg;
print "$_\n" for @fields;
output
id=firewall
time="2010-05-09 16:07:21 UTC"
If you are not actually trying to parse csv data, you can get the time field by using Text::ParseWords, which is a core module in Perl 5. The benefit to using this module is that it handles quotes very well.
use strict;
use warnings;
use Data::Dumper;
use Text::ParseWords;
my $str = 'id=firewall time="2010-05-09 16:07:21 UTC"';
my @fields = quotewords(' ', 0, $str);
print Dumper \@fields;
my %hash = map split(/=/, $_, 2), @fields;
print Dumper \%hash;
Output:
$VAR1 = [
'id=firewall',
'time=2010-05-09 16:07:21 UTC'
];
$VAR1 = {
'time' => '2010-05-09 16:07:21 UTC',
'id' => 'firewall'
};
I also included how you can make the data more accessible by adding it to a hash. Note that hashes cannot contain duplicate keys, so you need a new hash for each new time key.
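One way to keep a separate hash per log line is sketched below. It assumes every field has the form key=value and that lines have no leading whitespace. The sub name parse_log_line and the second sample line are made up for illustration.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Hypothetical helper: parse one log line into its own hash reference.
# A field without an "=" would unbalance the list, so this sketch
# assumes every field has one.
sub parse_log_line {
    my ($line) = @_;
    my %record = map { split(/=/, $_, 2) } quotewords( '\s+', 0, $line );
    return \%record;
}

my @lines = (
    'id=firewall time="2010-05-09 16:07:21 UTC"',
    'id=firewall time="2010-05-09 16:08:02 UTC"',    # invented second entry
);
my @records = map { parse_log_line($_) } @lines;
print "$_->{id} at $_->{time}\n" for @records;
```

Each element of @records is an independent hash, so repeated id and time keys no longer overwrite each other.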

Printing Hash in Perl

I want to know why the following code prints "2/8".
#!/usr/bin/perl
#use strict;
#use warnings;
%a = ('a'=>'dfsd','b'=>'fdsfds');
print %a."\n";
You are printing a hash in scalar context by concatenating it with the string "\n".
If you evaluate a hash in scalar context, it returns false if the hash
is empty. If there are any key/value pairs, it returns true; more
precisely, the value returned is a string consisting of the number of
used buckets and the number of allocated buckets, separated by a
slash.
2/8 means that of the 8 buckets allocated, 2 have been touched. Considering that you've inserted only 2 values, it is doing well so far :)
The value is obviously of no use, except to evaluate how well the hash-function is doing. Use print %a; to print its contents.
As mentioned by @Dark.., you are printing a hash in scalar context.
If you really want to print a hash, then use Data::Dumper:
use Data::Dumper;
...
...
print Dumper(%a);
for eg:
use Data::Dumper;
my %hash = ( key1 => 'value1', key2 => 'value2' );
print Dumper(%hash); # okay, but not great
print "or\n";
print Dumper(\%hash); # much better
And the output:
$VAR1 = 'key2';
$VAR2 = 'value2';
$VAR3 = 'key1';
$VAR4 = 'value1';
or
$VAR1 = {
'key2' => 'value2',
'key1' => 'value1'
};
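If the goal is a pair count or a plain listing rather than a dump, a small sketch without any modules is below. It reuses the data from the question. As far as I know, from Perl 5.26 onward a hash in scalar context returns the key count instead of the used/allocated bucket string, so the "2/8" output depends on the perl version.

```perl
use strict;
use warnings;

my %hash = ( a => 'dfsd', b => 'fdsfds' );

# Number of key/value pairs: keys() evaluated in scalar context.
my $count = keys %hash;
print "pairs: $count\n";

# Contents, one pair per line, in a repeatable (sorted) order.
print "$_ => $hash{$_}\n" for sort keys %hash;
```

Asking for keys in scalar context gives the count on any perl version, which avoids relying on what the hash itself returns there.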