how would I trim and split into perl hash

I have this string
$st="station_1:50, station_2:40, station_3:60";
How would I split this into a Perl hash?
I tried
%hash = map{split /\:/, $_}(split /, /, $st);
It works correctly, but what if there are n spaces between the , and station?
How would I make it strip out all the leading spaces?

If there may or may not be a space, split on /, ?/ instead of /, /. If there may be any number of spaces, use /, */.
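
A minimal sketch of that suggestion, assuming the input format from the question. The helper name parse_stations is made up for illustration, and the sketch goes slightly beyond what was asked by also trimming the ends of the string and tolerating spaces around the colon.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "name:value, name:value" into a key/value list.
sub parse_stations {
    my ($st) = @_;
    $st =~ s/^\s+|\s+$//g;                  # trim both ends of the string
    return map { split /\s*:\s*/, $_, 2 }   # key and value, spaces around ':' ignored
           split /\s*,\s*/, $st;            # any amount of space around ','
}

my %hash = parse_stations("station_1:50,   station_2:40,station_3:60 ");
print "$_ => $hash{$_}\n" for sort keys %hash;
```

Splitting on /\s*,\s*/ rather than /, */ also copes with a space before the comma, which may or may not matter for your data.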

The solution with your code (added \s* to the second split):
perl -we '
$_ = "station_1:50, station_2:40, station_3:60";
my %hash = map {split /:/} split /,\s*/;
use Data::Dumper;
print Dumper \%hash
'
OUTPUT:
$VAR1 = {
'station_1' => '50',
'station_3' => '60',
'station_2' => '40'
};
Another working way using regex:
CODE
$ echo "station_1:50, station_2:40, station_3:6" |
perl -MData::Dumper -lne '
my %h;
$h{$1} = $2 while /\b(station_\d+):(\d+)/ig;
print Dumper \%h
'
SAMPLE OUTPUT
$VAR1 = {
'station_3' => '6',
'station_1' => '50',
'station_2' => '40'
};

Related

Get all values in a hash from string after equals sign Perl

I have a string like this "Test string has tes value like abc="123",bcd="345",or it it can be xyz="4567" and ytr="434"".
Now I want to get the values after the equals signs. The hash structure will be like this:
$hash->{abc} =123,
$hash->{bcd} =345,
$hash->{xyz} =4567,
I have tried this: $str =~ / (\S+) \s* = \s* (\S+) /xg

In list context the regex returns the captured pairs, which can be assigned to a hash (here an anonymous one).
use warnings 'all';
use strict;
use feature 'say';
my $str = 'Test string has tes value like abc="123",bcd="345",or it '
. 'it can be xyz="4567" and ytr="434"';
my $rh = { $str =~ /(\w+)="(\d+)"/g };
say "$_ => $rh->{$_}" for keys %$rh ;
Prints
bcd => 345
abc => 123
ytr => 434
xyz => 4567
Following a comment – for possible spaces around the = sign, change it to \s*=\s*.
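
As a sketch of that change, here is the same match with optional whitespace allowed around the equals sign. The sample string is made up for illustration and is not from the question.

```perl
use strict;
use warnings;

# Made-up input with uneven spacing around '='.
my $str = 'abc = "123",bcd="345", xyz= "4567"';

# \s* on each side of '=' allows zero or more whitespace characters there.
my %pairs = $str =~ /(\w+)\s*=\s*"(\d+)"/g;

print "$_ => $pairs{$_}\n" for sort keys %pairs;
```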

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $string = q{Test string has tes value like abc="123",bcd="345" and xyz="523"};
my %hash = $string =~ /(\w+)="(\d*)"/g;
print Dumper \%hash;
Output
$VAR1 = {
'xyz' => '523',
'abc' => '123',
'bcd' => '345'
};

Your test string looks like this (editing slightly to fix the quoting problems).
'Test string has tes value like abc="123",bcd="345",or it it can be xyz="4567" and ytr="434"'
I used this code to test your regex:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use Data::Dumper;
my $text = 'Test string has tes value like abc="123",bcd="345",or it it can be xyz="4567" and ytr="434"';
my %hash = $text =~ /(\S+)\s*=\s*(\S+)/g;
say Dumper \%hash;
Which gives this output:
$VAR1 = {
'abc="123",bcd' => '"345",or',
'ytr' => '"434"',
'xyz' => '"4567"'
};
The problem is that \S+ matches any non-whitespace character. And that's too much. You need to be more descriptive about the valid characters.
Your keys appear to all be letters. And your values are all digits, but they are surrounded by quote characters that you don't want. So try this regex instead: /([a-z]+)\s*=\s*"(\d+)"/g.
That gives:
$VAR1 = {
'bcd' => '345',
'abc' => '123',
'ytr' => '434',
'xyz' => '4567'
};
Which looks correct to me.

How can I use Text::ParseWords::parse_line when the line contains an extra unescaped double quote?

I'm using parse_line from Text::ParseWords to parse a line of text. However, when there is an unescaped double quote (") inside a pair of double quotes, parse_line fails.
For example:
use Text::ParseWords;
...
my $line = q(1000,"test","Hello"StackOverFlow");
...
@arr = &parse_line(",",1,$line);
I don't want to escape the inner double quote (e.g. "Hello \"StackOverFlow").
Is there any other way to parse the line?

Using the notes from @TLP and @ThisSuitIsBlackNot:
use 5.022;
use Text::CSV;
use Data::Dumper;
my $line = q(1000,"test","Hello"StackOverFlow");
my $csv = Text::CSV->new( {allow_loose_quotes => 1, escape_char => '%'});
$csv->parse($line);
my @fields = $csv->fields();
print Dumper \@fields;
__DATA__
$VAR1 = [
'1000',
'test',
'Hello"StackOverFlow'
];

Variable contains period in perl, but I don't want to concat

I am trying to create a hash from tabular data. One of the columns contains team names that are separated by .'s, for instance USA.APP.L2PES.
A sample line that will be split (| are delimiters):
http://10.x.x.x:8085/BINReport/servlet/Processing|2012/10/02 08:40:30|2015/03/10 16:00:42|nxcvapxxx_bin|Chandler|Linkpoint Connect|USA.APP.L2PES
And here is the code I'm using to split and arrange into a hash (found on another stack exchange comment).
my @records;
while (<$fh_OMDB>) {
    my @fields = split /\s*\|\s*/, $_;
    my %hash;
    @hash{@headers} = @fields;
    push @records, \%hash;
}
for my $record (@records) {
    my $var = $record->{Arg04};
    print $var;
}
However, whenever I try to print $var I get the error message
Use of uninitialized value $var in print at find_teams_v2.pl line 53, <$fh_OMDB> line 2667.
I believe this is because the string contains a . and Perl is trying to concatenate the values. How do I print the string as a literal?
Here is the output of Data::Dumper (only one record, as there are a couple of thousand):
$VAR1 = {
'Arg01' => 'zionscatfdecs',
'Date Added' => '2013/08/06 10:30:04',
'URL' => 'https://zionscat.fdecs.com',
'Arg04
' => 'USA.FDFI.APP.TMECS2
',
'Date Updated' => '2013/08/06 10:30:04',
'Arg02' => 'Omaha',
'Arg03' => 'First Data eCustomer Service ()'
};
And the headers dump:
$VAR1 = 'URL';
$VAR2 = 'Date Added';
$VAR3 = 'Date Updated';
$VAR4 = 'Arg01';
$VAR5 = 'Arg02';
$VAR6 = 'Arg03';
$VAR7 = 'Arg04
';

Mocking up my own @headers, it seems to work OK.
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my @headers = qw ( mee mah mo mum
fee fi fo fum );
my $string =
q{http://10.x.x.x:8085/BINReport/servlet/Processing|2012/10/02 08:40:30|2015/03/10 16:00:42|nxcvapxxx_bin|Chandler|Linkpoint Connect|USA.APP.L2PES};
my @records;
my @fields = split /\s*\|\s*/, $string;
print join( "\n", @fields );
my %hash;
@hash{@headers} = @fields;
print Dumper \%hash;
push @records, \%hash;
foreach my $record (@records) {
    my $var = $record->{fo};
    print $var, "\n";
}
This prints the string, with the dots, as expected. That's because, with your sample data, the hash on each iteration looks like this:
$VAR1 = {
'fi' => 'Linkpoint Connect',
'mo' => '2015/03/10 16:00:42',
'mum' => 'nxcvapxxx_bin',
'mee' => 'http://10.x.x.x:8085/BINReport/servlet/Processing',
'mah' => '2012/10/02 08:40:30',
'fee' => 'Chandler',
'fum' => undef,
'fo' => 'USA.APP.L2PES'
};
However, if my 'headers' row is shorter:
my @headers = qw ( mee mah mo mum );
I can reproduce your error:
Use of uninitialized value $var in print
That's because there's no {fo} element:
$VAR1 = {
'mo' => '2015/03/10 16:00:42',
'mee' => 'http://10.x.x.x:8085/BINReport/servlet/Processing',
'mum' => 'nxcvapxxx_bin',
'mah' => '2012/10/02 08:40:30'
};
So:
Make sure you have strict and warnings switched on.
Check that @headers is declared.
Check that @headers is actually long enough to match every field.
Running:
use Data::Dumper;
print Dumper \@headers;
will tell you this.
And as noted in the comments above:
'Arg04
' => 'USA.FDFI.APP.TMECS2
{Arg04} doesn't exist, because the actual key has a trailing newline. chomp your line before converting it into headers (or just chomp @headers) and your code will work.
Note: you may also wish to chomp inside your while loop, because otherwise the last field of every record will also include a newline (which may be undesired).
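
A minimal sketch of both chomp calls in context. The two-line input and its column names are made up to stand in for the real file, and an in-memory filehandle is used so that the example is self-contained.

```perl
use strict;
use warnings;

# Made-up input standing in for the real file: a header row, then one data row.
my $data = "URL|Date Added|Arg04\nhttps://example.com|2013/08/06 10:30:04|USA.APP.L2PES\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

chomp( my $header_line = <$fh> );                  # drop the newline before splitting
my @headers = split /\s*\|\s*/, $header_line;

my @records;
while ( my $line = <$fh> ) {
    chomp $line;                                   # otherwise the last field keeps "\n"
    my %record;
    @record{@headers} = split /\s*\|\s*/, $line;   # hash slice: headers => fields
    push @records, \%record;
}

print "$_->{Arg04}\n" for @records;
```

With the newline removed from the header row, the key is exactly Arg04, so the lookup no longer returns undef.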

Using the Data::Dumper printouts showed that I had fewer header fields than existed, so that when I made the call to Arg04 it didn’t exist or made a reference to an undef field. Doing a chomp on the header array fixed the error message.

Bug with parsing by Text::CSV_XS?

Tried to use Text::CSV_XS to parse some logs. However, the following code doesn't do what I expected -- split the line into pieces according to separator " ".
The funny thing is, if I remove the double quote in the string $a, then it will do splitting.
Wonder if it's a bug or I missed something. Thanks!
use Text::CSV_XS;
$a = 'id=firewall time="2010-05-09 16:07:21 UTC"';
$userDefinedSeparator = Text::CSV_XS->new({sep_char => " "});
print "$userDefinedSeparator\n";
$userDefinedSeparator->parse($a);
my $e;
foreach $e ($userDefinedSeparator->fields) {
print $e, "\n";
}
EDIT:
In the above code snippet, if I change the = (after time) to a space, then it works fine. I started to wonder whether this is a bug after all.
$a = 'id=firewall time "2010-05-09 16:07:21 UTC"';

You have confused the module by leaving both the quote character and the escape character set to double quote ", and then left them embedded in the fields you want to split.
Disable both quote_char and escape_char, like this
use strict;
use warnings;
use Text::CSV_XS;
my $string = 'id=firewall time="2010-05-09 16:07:21 UTC"';
my $space_sep = Text::CSV_XS->new({
sep_char => ' ',
quote_char => undef,
escape_char => undef,
});
$space_sep->parse($string);
for my $field ($space_sep->fields) {
print "$field\n";
}
output
id=firewall
time="2010-05-09
16:07:21
UTC"
But note that you have achieved exactly the same thing as print "$_\n" for split ' ', $string, which is to be preferred as it is both more efficient and more concise.
In addition, you must always use strict and use warnings; and never use $a or $b as variable names, both because they are used by sort and because they are meaningless and undescriptive.
Update
As @ThisSuitIsBlackNot points out, your intention is probably not to split on spaces but to extract a series of key=value pairs. If so then this method puts the values straight into a hash.
use strict;
use warnings;
my $string = 'id=firewall time="2010-05-09 16:07:21 UTC"';
my %data = $string =~ / ([^=\s]+) \s* = \s* ( "[^"]*" | [^"\s]+ ) /xg;
use Data::Dump;
dd \%data;
output
{ id => "firewall", time => "\"2010-05-09 16:07:21 UTC\"" }
Update
This program will extract the two name=value strings and print them on separate lines.
use strict;
use warnings;
my $string = 'id=firewall time="2010-05-09 16:07:21 UTC"';
my @fields = $string =~ / (?: "[^"]*" | \S )+ /xg;
print "$_\n" for @fields;
output
id=firewall
time="2010-05-09 16:07:21 UTC"

If you are not actually trying to parse csv data, you can get the time field by using Text::ParseWords, which is a core module in Perl 5. The benefit to using this module is that it handles quotes very well.
use strict;
use warnings;
use Data::Dumper;
use Text::ParseWords;
my $str = 'id=firewall time="2010-05-09 16:07:21 UTC"';
my @fields = quotewords(' ', 0, $str);
print Dumper \@fields;
my %hash = map split(/=/, $_, 2), @fields;
print Dumper \%hash;
Output:
$VAR1 = [
'id=firewall',
'time=2010-05-09 16:07:21 UTC'
];
$VAR1 = {
'time' => '2010-05-09 16:07:21 UTC',
'id' => 'firewall'
};
I also included how you can make the data more accessible by adding it to a hash. Note that hashes cannot contain duplicate keys, so you need a new hash for each new time key.
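
One way to follow that advice, sketched under the assumption that each log line has the same key=value layout as the question's example: build one hash per line and collect the references in an array. The second log line is made up for illustration.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Log lines in the question's format; the second one is invented.
my @lines = (
    'id=firewall time="2010-05-09 16:07:21 UTC"',
    'id=firewall time="2010-05-09 16:07:22 UTC"',
);

my @records;
for my $line (@lines) {
    # With keep set to 0, quotewords splits on spaces outside quotes and drops the quotes.
    my %record = map { split /=/, $_, 2 } quotewords(' ', 0, $line);
    push @records, \%record;    # one hash per line, so 'time' is never overwritten
}

print "$_->{id} at $_->{time}\n" for @records;
```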

Why wrong output from the RegEx?

When I run the script below, I get
$VAR1 = [
'ok0.ok]][[file:ok1.ok',
undef,
undef,
'ok2.ok|dgdfg]][[file:ok3.ok',
undef,
undef,
undef,
undef,
undef,
undef,
undef,
undef,
undef,
undef,
undef,
undef,
undef
];
where I was hoping for ok0.ok ok1.ok ok2.ok ok3.ok and ideally also ok4.ok ok5.ok ok6.ok ok7.ok
Question
Can anyone see what I am doing wrong?
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $html = "sdfgdfg[[file:ok0.ok]][[file:ok1.ok ]] [[file:ok2.ok|dgdfg]][[file:ok3.ok |dfgdfgg]] [[media:ok4.ok]] [[media:ok5.ok ]] [[media:ok6.ok|dgdfg]] [[media:ok7.ok |dfgdfgg]]ggg";
my @seen = ($html =~ /file:(.*?) |\||\]/g);
print Dumper \@seen;

A negated character class can simplify things a bit, I think. Be explicit as to your anchors (file:, or media:), and explicit as to what terminates the sequence (a space, pipe, or closing bracket). Then capture.
my @seen = $html =~ m{(?:file|media):([^\|\s\]]+)}g;
Explained:
my @seen = $html =~ m{
(?:file|media): # Match either 'file' or 'media', don't capture, ':'
( [^\|\s\]]+ ) # Match and capture one or more, anything except |\s]
}gx;
Capturing stops as soon as ], |, or \s is encountered.
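
To see the negated character class at work, here is a small sketch run against a shortened version of the question's input (the shortening is mine, so that the result is easy to check by eye).

```perl
use strict;
use warnings;

# Shortened version of the input from the question.
my $html = 'x[[file:ok0.ok]][[file:ok1.ok ]] [[file:ok2.ok|dgdfg]] [[media:ok7.ok |dfgdfgg]]y';

# Capture runs of characters that are not '|', whitespace or ']'.
my @seen = $html =~ m{(?:file|media):([^|\s\]]+)}g;

print "@seen\n";    # prints: ok0.ok ok1.ok ok2.ok ok7.ok
```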

It looks like you are trying to match everything starting with file: and ending with either a space, a pipe or a closing square bracket.
Your OR-statement at the end of the regexp needs to be between (square) brackets itself though:
my @seen = ($html =~ /file:(.*?)[] |]/g);
If you want the media: blocks as well, OR the file part. You might want a non-capturing group here:
my @seen = ($html =~ /(?:file|media):(.*?)[] |]/g);
How it works
The first statement will capture everything between 'file:' and a ], a | or a space.
The second statement does the same, but with both file and media. We use a non-capturing group (?:group) instead of (group) so the word is not put into your @seen.

Try with
my @seen = ($html =~ /\[\[\w+:(\w+\.\w+)\]\]/g);

this is what your regex does:
...
my $ss = qr {
file: # start with file + colon as anchor
( # start capture group
.*? # use any character in a non-greedy sweep
) # end capture group
\s # end non-greedy search on a **white space**
| # OR expression encountered up to here with:
\| # => | character
| # OR expression encountered up to here with:
\] # => ] character
}x;
my @seen = $html =~ /$ss/g;
...
and this is what your regex is supposed to do:
...
my $rb = qr {
\w : # alphanumeric + colon as front anchor
( # start capture group
[^]| ]+ # one or more characters other than ], | or space
) # end capture group
}x;
my @seen = $html =~ /$rb/g;
...
If you want a short, concise regex and know what you are doing, you could drop the capturing group altogether and use the full match in list context together with a positive lookbehind:
...
my @seen = $html =~ /(?<=(?:.file|media):)[^] |]+/g; # no capture group ()
...
or, if no other structure in your data as shown is to be dealt with, use the : as the only anchor:
...
my @seen = $html =~ /(?<=:)[^] |]+/g; # no capture group and short
...
Regards
rbo

Depending on the possible characters in the file name, I think you probably want
my #seen = $html =~ /(?:file|media):([\w.]+)/g;
which captures all of ok0.ok through to ok7.ok.
It relies on the file names containing alphanumeric characters plus underscore and dot.

I hope this is what you required.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $string = "sdfgdfg[[file:ok0.ok]][[file:ok1.ok ]] [[file:ok2.ok|dgdfg]][[file:ok3.ok |dfgdfgg]] [[media:ok4.ok]] [[media:ok5.ok ]] [[media:ok6.ok|dgdfg]] [[media:ok7.ok |dfgdfgg]]ggg";
my @matches;
@matches = $string =~ m/ok\d\.ok/g;
print Dumper @matches;
Output:
$VAR1 = 'ok0.ok';
$VAR2 = 'ok1.ok';
$VAR3 = 'ok2.ok';
$VAR4 = 'ok3.ok';
$VAR5 = 'ok4.ok';
$VAR6 = 'ok5.ok';
$VAR7 = 'ok6.ok';
$VAR8 = 'ok7.ok';
Regards,
Kiran.
