Bug with parsing by Text::CSV_XS? - perl

I tried to use Text::CSV_XS to parse some logs. However, the following code doesn't do what I expected, which is to split the line into pieces on the separator " ".
The funny thing is, if I remove the double quotes from the string $a, then it does split.
I wonder if it's a bug or if I missed something. Thanks!
use Text::CSV_XS;
$a = 'id=firewall time="2010-05-09 16:07:21 UTC"';
$userDefinedSeparator = Text::CSV_XS->new({sep_char => " "});
print "$userDefinedSeparator\n";
$userDefinedSeparator->parse($a);
my $e;
foreach $e ($userDefinedSeparator->fields) {
print $e, "\n";
}
EDIT:
In the above code snippet, if I change the = (after time) to a space, then it works fine. Now I'm starting to wonder whether this is a bug after all?
$a = 'id=firewall time "2010-05-09 16:07:21 UTC"';

You have confused the module by leaving both the quote character and the escape character set to the double quote ", while leaving double quotes embedded in the fields you want to split.
Disable both quote_char and escape_char, like this:
use strict;
use warnings;
use Text::CSV_XS;
my $string = 'id=firewall time="2010-05-09 16:07:21 UTC"';
my $space_sep = Text::CSV_XS->new({
sep_char => ' ',
quote_char => undef,
escape_char => undef,
});
$space_sep->parse($string);
for my $field ($space_sep->fields) {
print "$field\n";
}
output
id=firewall
time="2010-05-09
16:07:21
UTC"
But note that you have achieved exactly the same thing as print "$_\n" for split ' ', $string, which is to be preferred as it is both more efficient and more concise.
In addition, you must always use strict and use warnings; and never use $a or $b as variable names, both because they are used by sort and because they are meaningless and undescriptive.
Update
As @ThisSuitIsBlackNot points out, your intention is probably not to split on spaces but to extract a series of key=value pairs. If so, then this method puts the values straight into a hash.
use strict;
use warnings;
my $string = 'id=firewall time="2010-05-09 16:07:21 UTC"';
my %data = $string =~ / ([^=\s]+) \s* = \s* ( "[^"]*" | [^"\s]+ ) /xg;
use Data::Dump;
dd \%data;
output
{ id => "firewall", time => "\"2010-05-09 16:07:21 UTC\"" }
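If the surrounding double quotes are not wanted in the value, a variation on the same regex can capture the quoted and the bare form separately. This is only a sketch: parse_pairs is a hypothetical helper name, and it assumes values never contain an escaped double quote.

```perl
use strict;
use warnings;

# parse_pairs is a hypothetical helper, not part of any module.
# Assumption: values never contain an escaped double quote.
sub parse_pairs {
    my ($line) = @_;
    my %data;
    while ( $line =~ / ([^=\s]+) \s* = \s* (?: "([^"]*)" | ([^"\s]+) ) /xg ) {
        # $2 is set for a quoted value, $3 for a bare one.
        $data{$1} = defined $2 ? $2 : $3;
    }
    return %data;
}

my %data = parse_pairs('id=firewall time="2010-05-09 16:07:21 UTC"');
print "$_ => $data{$_}\n" for sort keys %data;
```

With the sample string this should print id => firewall and time => 2010-05-09 16:07:21 UTC, without the quotes.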
Update
This program will extract the two name=value strings and print them on separate lines.
use strict;
use warnings;
my $string = 'id=firewall time="2010-05-09 16:07:21 UTC"';
my @fields = $string =~ / (?: "[^"]*" | \S )+ /xg;
print "$_\n" for @fields;
output
id=firewall
time="2010-05-09 16:07:21 UTC"

If you are not actually trying to parse csv data, you can get the time field by using Text::ParseWords, which is a core module in Perl 5. The benefit to using this module is that it handles quotes very well.
use strict;
use warnings;
use Data::Dumper;
use Text::ParseWords;
my $str = 'id=firewall time="2010-05-09 16:07:21 UTC"';
my @fields = quotewords(' ', 0, $str);
print Dumper \@fields;
my %hash = map split(/=/, $_, 2), @fields;
print Dumper \%hash;
Output:
$VAR1 = [
'id=firewall',
'time=2010-05-09 16:07:21 UTC'
];
$VAR1 = {
'time' => '2010-05-09 16:07:21 UTC',
'id' => 'firewall'
};
I also included how you can make the data more accessible by adding it to a hash. Note that hashes cannot contain duplicate keys, so you need a new hash for each new time key.
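To illustrate the point about one hash per record, here is a rough sketch that builds a separate hash for every log line and collects them in an array. parse_log_lines is a hypothetical name, and the second log line is made up for the example.

```perl
use strict;
use warnings;
use Text::ParseWords;

# parse_log_lines is a hypothetical helper: one hash per input line.
sub parse_log_lines {
    my @lines = @_;
    my @records;
    for my $line (@lines) {
        my %record;
        for my $pair ( quotewords( ' ', 0, $line ) ) {
            # Split on the first "=" only, so values may contain "=".
            my ( $key, $value ) = split /=/, $pair, 2;
            $record{$key} = $value;
        }
        push @records, \%record;
    }
    return @records;
}

my @records = parse_log_lines(
    'id=firewall time="2010-05-09 16:07:21 UTC"',
    'id=firewall time="2010-05-09 16:07:22 UTC"',    # invented second line
);
print "$_->{id} at $_->{time}\n" for @records;
```

Each line keeps its own time value because the keys live in different hashes.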

Related

Perl, Split string into Key:Value pairs for hash with lowercase keys without temporary array

Given a string of Key:Value pairs, I want to create a lookup hash but with lowercase values for the keys. I can do so with this code
my $a="KEY1|Value1|kEy2|Value2|KeY3|Value3";
my @a = split '\|', $a;
my %b = map { $a[$_] = ( !($_ % 2) ? lc($a[$_]) : $a[$_]) } 0 .. $#a ;
The resulting Hash would look like this Dumper output:
$VAR1 = {
'key3' => 'Value3',
'key2' => 'Value2',
'key1' => 'Value1'
};
Would it be possible to create hash %b directly without using the temporary array @a, or is there a more efficient way to achieve the same result?
Edit: I forgot to mention that I cannot use external modules for this. It needs to be basic Perl.
You can use pairmap from List::Util to do this without an intermediate array at all.
use strict;
use warnings;
use List::Util 1.29 'pairmap';
my $str="KEY1|Value1|kEy2|Value2|KeY3|Value3";
my %hash = pairmap { lc($a) => $b } split /\|/, $str;
Note: you should never use $a or $b outside of sort (or List::Util pair function) blocks. They are special global variables for sort, and just declaring my $a in a scope can break all sorts (and List::Util pair functions) in that scope. An easy solution is to immediately replace them with $x and $y whenever you find yourself starting to use them as example variables.
Since the key-value pair has to be around the | you can use a regex
my $v = "KEY1|Value1|kEy2|Value2|KeY3|Value3";
my %h = split /\|/, $v =~ s/([^|]+) \| ([^|]+)/lc($1).q(|).$2/xger;
use strict;
use warnings;
use Data::Dumper;
my $i;
my %hash = map { $i++ % 2 ? $_ : lc } split(/\|/, 'KEY1|Value1|kEy2|Value2|KeY3|Value3');
print Dumper(\%hash);
Output:
$VAR1 = {
'key1' => 'Value1',
'key2' => 'Value2',
'key3' => 'Value3'
};
For fun, here are two additional approaches.
A cheaper one than the original (since the elements are aliased rather than copied into @_):
my %hash = sub { map { $_ % 2 ? $_[$_] : lc($_[$_]) } 0..$#_ }->( ... );
A more expensive one than the original:
my %hash = ...;
@hash{ map lc, keys(%hash) } = delete( @hash{ keys(%hash) } );
More possible solutions using regexes to do all the work, but not very pretty unless you really like regex:
use strict;
use warnings;
my $str="KEY1|Value1|kEy2|Value2|KeY3|Value3";
my %hash;
my $copy = $str;
$hash{lc $1} = $2 while $copy =~ s/^([^|]*)\|([^|]*)\|?//;
use strict;
use warnings;
my $str="KEY1|Value1|kEy2|Value2|KeY3|Value3";
my %hash;
$hash{lc $1} = $2 while $str =~ m/\G([^|]*)\|([^|]*)\|?/g;
use strict;
use warnings;
my $str="KEY1|Value1|kEy2|Value2|KeY3|Value3";
my %hash = map { my ($k, $v) = split /\|/, $_, 2; (lc($k) => $v) }
$str =~ m/([^|]*\|[^|]*)\|?/g;
Here's a solution that avoids mutating the input string, constructing a new string of the same length as the input string, or creating an intermediate array in memory.
The solution here changes the split into looping over a match statement.
#! /usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my $a="KEY1|Value1|kEy2|Value2|KeY3|Value3";
sub normalize_alist_opt {
my ($input) = @_;
my %c;
my $last_key;
while ($input =~ m/([^|]*(\||\z)?)/g) {
my $s = $1;
next unless $s ne '';
$s =~ s/\|\z//g;
if (defined $last_key) {
$c{ lc($last_key) } = $s;
$last_key = undef;
} else {
$last_key = $s;
}
}
return \%c;
}
print Dumper(normalize_alist_opt($a));
A potential solution that operates over the split directly. Perl might recognize and optimize the special case. Although based on discussions here and here, I'm not sure.
sub normalize_alist {
my ($input) = @_;
my %c;
my $last_key;
foreach my $s (split /\|/, $input) {
if (defined $last_key) {
$c{ lc($last_key) } = $s;
$last_key = undef;
} else {
$last_key = $s;
}
}
return \%c;
}

Variable contains period in perl, but I don't want to concat

I am trying to create a hash from tabular data. One of the columns contains team names whose parts are separated by periods, for instance USA.APP.L2PES.
A sample line that will be split (| are delimiters):
http://10.x.x.x:8085/BINReport/servlet/Processing|2012/10/02 08:40:30|2015/03/10 16:00:42|nxcvapxxx_bin|Chandler|Linkpoint Connect|USA.APP.L2PES
And here is the code I'm using to split and arrange into a hash (found on another stack exchange comment).
my @records;
while (<$fh_OMDB>) {
my @fields = split /\s*\|\s*/, $_;
my %hash;
@hash{@headers} = @fields;
push @records, \%hash;
}
for my $record (@records) {
my $var = $record->{Arg04};
print $var;
}
However, whenever I try to print $var I get the error message
Use of uninitialized value $var in print at find_teams_v2.pl line 53, <$fh_OMDB> line 2667.
I believe this is because the string contains a . and perl is trying to concat the values. How do I print the string as a literal?
Here is the Data::Dumper output for one record (there are a couple of thousand in total):
$VAR1 = {
'Arg01' => 'zionscatfdecs',
'Date Added' => '2013/08/06 10:30:04',
'URL' => 'https://zionscat.fdecs.com',
'Arg04
' => 'USA.FDFI.APP.TMECS2
',
'Date Updated' => '2013/08/06 10:30:04',
'Arg02' => 'Omaha',
'Arg03' => 'First Data eCustomer Service ()'
};
And the headers dump:
$VAR1 = 'URL';
$VAR2 = 'Date Added';
$VAR3 = 'Date Updated';
$VAR4 = 'Arg01';
$VAR5 = 'Arg02';
$VAR6 = 'Arg03';
$VAR7 = 'Arg04
';
Mocking up my own @headers, it seems to work OK.
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my @headers = qw ( mee mah mo mum
fee fi fo fum );
my $string =
q{http://10.x.x.x:8085/BINReport/servlet/Processing|2012/10/02 08:40:30|2015/03/10 16:00:42|nxcvapxxx_bin|Chandler|Linkpoint Connect|USA.APP.L2PES};
my @records;
my @fields = split /\s*\|\s*/, $string;
print join( "\n", @fields );
my %hash;
@hash{@headers} = @fields;
print Dumper \%hash;
push @records, \%hash;
foreach my $record (@records) {
my $var = $record ->{fo};
print $var,"\n";
}
This prints the string, with the dots, as expected. That's because on each iteration the hash looks like this (with your sample data):
$VAR1 = {
'fi' => 'Linkpoint Connect',
'mo' => '2015/03/10 16:00:42',
'mum' => 'nxcvapxxx_bin',
'mee' => 'http://10.x.x.x:8085/BINReport/servlet/Processing',
'mah' => '2012/10/02 08:40:30',
'fee' => 'Chandler',
'fum' => undef,
'fo' => 'USA.APP.L2PES'
};
However, if my 'headers' row is shorter:
my @headers = qw ( mee mah mo mum );
I can reproduce your error:
Use of uninitialized value $var in print
That's because there's no {fo} element:
$VAR1 = {
'mo' => '2015/03/10 16:00:42',
'mee' => 'http://10.x.x.x:8085/BINReport/servlet/Processing',
'mum' => 'nxcvapxxx_bin',
'mah' => '2012/10/02 08:40:30'
};
So:
Make sure you have strict and warnings switched on.
Check @headers is declared.
Check @headers is actually long enough to match every field.
Running:
use Data::Dumper;
print Dumper \@headers;
Will tell you this.
And as noted in the comments above:
'Arg04
' => 'USA.FDFI.APP.TMECS2
{Arg04} doesn't exist, because the key actually ends in a newline. chomp your line before converting it into headers (or just chomp @headers) and your code will work.
Note - you may also wish to chomp inside your while loop, because otherwise the last field will also include a newline. (Which may be undesired)
Using the Data::Dumper printouts showed that I had fewer header fields than existed, so that when I made the call to Arg04 it didn’t exist or made a reference to an undef field. Doing a chomp on the header array fixed the error message.
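Putting the chomp advice together, here is a minimal sketch. It assumes the header names are on the first line of the file, which the question does not actually state; read_records is a hypothetical name and the data is a shortened, invented sample read from an in-memory filehandle.

```perl
use strict;
use warnings;

# read_records is a hypothetical helper.
# Assumption: the first line of the file holds the header names.
sub read_records {
    my ($fh) = @_;
    my $header_line = <$fh>;
    return () unless defined $header_line;
    chomp $header_line;    # otherwise the last header becomes "Arg04\n"
    my @headers = split /\s*\|\s*/, $header_line;

    my @records;
    while ( my $line = <$fh> ) {
        chomp $line;       # otherwise the last field keeps its newline
        my %row;
        @row{@headers} = split /\s*\|\s*/, $line, -1;
        push @records, \%row;
    }
    return @records;
}

# Shortened sample data, invented for this sketch.
my $data = "URL|Arg03|Arg04\nhttp://10.x.x.x:8085|Linkpoint Connect|USA.APP.L2PES\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @records = read_records($fh);
print "$records[0]{Arg04}\n";
```

The periods in the value need no special treatment; once the key is really Arg04, the lookup finds it.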

Apply my hash to a string to get numbers from letters

I am trying to convert letters to their respective positions in the alphabet. I have a hash that I think should work; I just don't know how to apply it to my string.
string:
my $string = "abc";
and my hash:
@hash{("a".."z")} = (1..26);
How do I get my string to be 123 in this case?
substitution
use warnings;
use strict;
my $string = "abc";
my %hash;
@hash{("a".."z")} = (1..26);
$string =~ s/(.)/$hash{$1}/g;
print "$string\n";
__END__
123
UPDATE: Another way, without a hash, is to use ord
my $string = "abc";
$string =~ s/(.)/ord($1) - 96/ge;
print "$string\n";
General solution:
my %lookup; @lookup{"a".."z"} = 1..26;
my $pat = '(?:'.( join '|', map quotemeta, keys %lookup ).')';
s/($pat)/$lookup{$1}/g;
Assumes keys consist of at most one character:
my %lookup; @lookup{"a".."z"} = 1..26;
my $class = '['.( join '', map quotemeta, keys %lookup ).']';
s/($class)/$lookup{$1}/g;
"Hardcoded":
$string =~ s/([a-z])/ ord($1) - ord('a') + 1 /ge;
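One caveat with all of the substitutions above: the concatenated result is ambiguous once two-digit positions are involved, since "abc" and "lc" both become 123. If that matters, a possible sketch is to return the positions as a list and join them with a separator. letter_positions is a hypothetical name, and ignoring non-letters is a choice made for this example.

```perl
use strict;
use warnings;

# letter_positions is a hypothetical helper.
# Non-letters are skipped; input is lowercased first (both are assumptions).
sub letter_positions {
    my ($text) = @_;
    return map { ord($_) - ord('a') + 1 }
           grep { /[a-z]/ }
           split //, lc $text;
}

print join( '-', letter_positions('abc') ), "\n";    # 1-2-3
print join( '-', letter_positions('lc') ),  "\n";    # 12-3
```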

perl: how to make compact name from a numbered sequence

[perl 5.8.8]
I have a sequence of names of things like:
names='foobar1304,foobar1305,foobar1306,foobar1307'
where the names differ only by a contiguous string of digits somewhere in the name. The strings of digits in any sequence are all of the same length, and the digit strings form a continuous numeric sequence with no skips, e.g. 003,004,005.
I want a compact representation like:
compact_name='foobar1304-7'
(The compact form is just a name, so its exact form is negotiable.)
There will usually only be <10 things, though some sets might span a decade, e.g.
'foobaz2205-11'
Is there some concise way to do this in perl? I'm not a big perl hacker, so be a little gentle...
Bonus points for handling embedded sequences like:
names='foobar33-pqq,foobar34-pqq,foobar35-pqq'
The ideal script would neatly fall back to 'firstname2301-lastname9922' in case it can't identify a sequence in the names.
I am not sure I got your specification, but it works somehow:
#!/usr/bin/perl
use warnings;
use strict;
use Test::More;
sub compact {
my $string = shift;
my ($name, $value) = split /=/, $string;
$name =~ s/s$// or die "Cannot create compact name for $name.\n"; #/ SO hilite bug
$name = 'compact_' . $name;
$value =~ s/^'|'$//g; #/ SO hilite bug
my @values = split /,/, $value; #/ SO hilite bug
my ($prefix, $first, $suffix) = $values[0] =~ /^(.+?)([0-9]+)(.*)$/;
my $last = $first + $#values;
my $same = 0;
$same++ while substr($first, 0, $same) eq substr($last, 0, $same);
$last = substr $last, $same - 1;
for my $i ($first .. $first + $#values) {
$values[$i - $first] eq ($prefix . $i . $suffix)
or die "Invalid sequence at $values[$i-$first].\n";
}
return "$name='$prefix$first-$last$suffix'";
}
is( compact("names='foobar1304,foobar1305,foobar1306,foobar1307'"),
"compact_name='foobar1304-7'");
is( compact("names='foobaz2205,foobaz2206,foobaz2207,foobaz2208,foobaz2209,foobaz2210,foobaz2211'"),
"compact_name='foobaz2205-11'");
is( compact("names='foobar33-pqq,foobar34-pqq,foobar35-pqq'"),
"compact_name='foobar33-5-pqq'");
done_testing();
Someone will surely post a more elegant solution, but the following
use strict;
use warnings;
my $names='foobar1308-xy,foobar1309-xy,foobar1310-xy,foobar1311-xy';
my @names = split /,/,$names;
my $pfx = lcp(@names);
my @nums = map { m/$pfx(\d*)/; $1 } @names;
my $first=shift @nums;
my $last = pop @nums;
my $suf=$names[0];
$suf =~ s/$pfx\d*//;
print "$pfx\{$first-$last}$suf\n";
#https://gist.github.com/3309172
sub lcp {
my $match = shift;
substr($match, (($match ^ $_) =~ /^\0*/, $+[0])) = '' for @_;
$match;
}
prints:
foobar13{08-11}-xy
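Neither answer covers the fallback the question asks for, so here is one possible sketch. It tries every run of digits in the first name as the counter, checks that all names follow the sequence, and otherwise returns "first-last". compact_names is a hypothetical name, and the handling of zero-padded counters is my own assumption about the spec. It sticks to syntax that predates Perl 5.10.

```perl
use strict;
use warnings;

# compact_names is a hypothetical helper, not taken from either answer.
sub compact_names {
    my @names = @_;
    return '' unless @names;
    return $names[0] if @names == 1;

    my $first_name = $names[0];
    # Try each digit run in the first name as the candidate counter.
    while ( $first_name =~ /(\d+)/g ) {
        my $digits = $1;
        my $prefix = substr $first_name, 0, $-[1];
        my $suffix = substr $first_name, $+[1];
        my $width  = length $digits;

        # Every name must be prefix + zero-padded counter + suffix.
        my $ok = 1;
        for my $i ( 0 .. $#names ) {
            my $expected = $prefix . sprintf( "%0${width}d", $digits + $i ) . $suffix;
            if ( $names[$i] ne $expected ) { $ok = 0; last }
        }
        next unless $ok;

        # Drop the leading digits the last counter shares with the first,
        # always keeping at least one digit.
        my $last = sprintf "%0${width}d", $digits + $#names;
        my $same = 0;
        $same++ while $same < $width - 1
            && substr( $digits, $same, 1 ) eq substr( $last, $same, 1 );
        return $prefix . $digits . '-' . substr( $last, $same ) . $suffix;
    }

    # No digit run describes a sequence: fall back to "first-last".
    return "$names[0]-$names[-1]";
}

print compact_names(qw(foobar1304 foobar1305 foobar1306 foobar1307)), "\n";
print compact_names(qw(foobar33-pqq foobar34-pqq foobar35-pqq)), "\n";
print compact_names(qw(firstname2301 lastname9922)), "\n";
```

For these three calls it should print foobar1304-7, foobar33-5-pqq and firstname2301-lastname9922.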

Using Perl to create another Perl file

I have an input file that looks like
*firsttitle
nameA
nameB
nameC
*secondtitle
xnameA
xnameB
xnameC
I want to create a Perl script that takes this file and basically will create another perl script that looks like
#!/usr/bin/perl
use strict;
use warnings;
my %tags = (
"firsttitle" => [ qw (nameA nameB nameC) ],
"secondtitle" => [ qw (xnameA xnameB xnameC) ]);
my $rx = join '|', keys %tags;
while (<>) {
s/^\s*($rx):\s*(\d+)/$1: $tags{$1}[$2]/;
print;
}
My thought process is that I first have to print out the regular Perl code (#!, use, etc.), then add "my %tags = (". Then I take the input file and look for a *, which gives the lookup key for the hash, and parse everything after it until the next * or the end of the file. If it's another *, do it again. If it's EOF, add ");" and finish by printing the last bit of Perl code. Help and ideas would be appreciated. If you're going to post code snippets, could you go through and explain what each part is doing? Thanks!
Very simple script. First just parse through the input file. Lines that start with * will be titles, and all the following lines up until the next *-line will be values. We put this into a hash of arrays.
The map statement gives us a list of each hash key (the title) followed by its values joined together with spaces. We put this in an array for printing. The printing itself is done with printf, which can be a bit difficult to use, since metacharacters will mess us up: any % that is meant literally must be written as %%. I also changed single quotes from the original to double quotes. I use single quotes on the printf pattern to avoid accidental interpolation of variables.
An alternative, possibly a better one, is to not use printf at all, and simply concatenate the string in the normal fashion.
use strict;
use warnings;
my ($title, %hash);
while (<DATA>) {
chomp;
if (/^\*(.+)$/) {
$title = $1;
} else {
push @{$hash{$title}}, $_;
}
}
my @args = ( map { $_, join(' ', @{$hash{$_}}) } keys %hash );
printf '#!/usr/bin/perl
use strict;
use warnings;
my %%tags = (
"%s" => [ qw ( %s ) ],
"%s" => [ qw ( %s ) ]);
my $rx = join "|", keys %%tags;
while (<>) {
s/^\s*($rx):\s*(\d+)/$1: $tags{$1}[$2]/;
print;
}', @args;
__DATA__
*firsttitle
nameA
nameB
nameC
*secondtitle
xnameA
xnameB
xnameC
Update:
This will use a different method of printing, which will be more stable.
my @args = ( map { " '$_' => [ qw ( @{$hash{$_}} ) ],\n" } keys %hash );
print '#!/usr/bin/perl
use strict;
use warnings;
my %tags = (
', @args, '
);
my $rx = join "|", keys %tags;
while (<>) {
s/^\s*($rx):\s*(\d+)/$1: $tags{$1}[$2]/;
print;
}';
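Another option, sketched here rather than taken from the answer above, is to let the core module Data::Dumper write the hash literal, so that names containing quotes or other special characters are escaped for you. parse_tags and generate_script are hypothetical helper names, and the input is passed as a string instead of being read from DATA.

```perl
use strict;
use warnings;
use Data::Dumper;

# Both subs are hypothetical helpers written for this sketch.

# Turn the "*title / name / name" text into a hash of arrays.
sub parse_tags {
    my ($text) = @_;
    my ( %tags, $title );
    for my $line ( split /\n/, $text ) {
        if ( $line =~ /^\*(.+)$/ ) {
            $title = $1;
            $tags{$title} = [];
        }
        elsif ( defined $title && length $line ) {
            push @{ $tags{$title} }, $line;
        }
    }
    return \%tags;
}

# Build the text of the generated script around a dumped hash literal.
sub generate_script {
    my ($tags) = @_;
    local $Data::Dumper::Terse    = 1;    # no "$VAR1 =" prefix
    local $Data::Dumper::Sortkeys = 1;    # stable key order
    local $Data::Dumper::Indent   = 1;
    my $literal = Dumper($tags);
    chomp $literal;

    # Single-quoted heredoc: nothing in here is interpolated.
    my $tail = <<'END_TAIL';
my $rx = join '|', keys %tags;
while (<>) {
    s/^\s*($rx):\s*(\d+)/$1: $tags{$1}[$2]/;
    print;
}
END_TAIL

    # "+{" makes sure the dumped text is read as a hash reference.
    return "#!/usr/bin/perl\nuse strict;\nuse warnings;\n"
        . 'my %tags = %{ +' . $literal . " };\n"
        . $tail;
}

my $input = "*firsttitle\nnameA\nnameB\nnameC\n*secondtitle\nxnameA\nxnameB\nxnameC\n";
print generate_script( parse_tags($input) );
```

The generated script stores the names as a quoted list instead of qw(), but %tags ends up with the same contents.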