I am trying to create a hash from tabular data. One of the columns contain team names that are separated by .'s - for instance USA.APP.L2PES.
A sample line that will be split (| are delimiters):
http://10.x.x.x:8085/BINReport/servlet/Processing|2012/10/02 08:40:30|2015/03/10 16:00:42|nxcvapxxx_bin|Chandler|Linkpoint Connect|USA.APP.L2PES
And here is the code I'm using to split and arrange into a hash (found on another stack exchange comment).
my #records;
while (<$fh_OMDB>) {
my #fields = split /\s*\|\s*/, $_;
my %hash;
#hash{#headers} = #fields;
push #records, \%hash;
}
for my $record (#records) {
my $var = $record->{Arg04};
print $var;
}
However, whenever I try to print $var I get the error message
Use of uninitialized value $var in print at find_teams_v2.pl line 53,
line 2667.
I believe this is because the string contains a . and perl is trying to concat the values. How do I print the string as a literal?
Here is an output of Data dumper (only one of them, as there are a couple thousand output):
$VAR1 = {
'Arg01' => 'zionscatfdecs',
'Date Added' => '2013/08/06 10:30:04',
'URL' => 'https://zionscat.fdecs.com',
'Arg04
' => 'USA.FDFI.APP.TMECS2
',
'Date Updated' => '2013/08/06 10:30:04',
'Arg02' => 'Omaha',
'Arg03' => 'First Data eCustomer Service ()'
};
And the headers dump:
$VAR1 = 'URL';
$VAR2 = 'Date Added';
$VAR3 = 'Date Updated';
$VAR4 = 'Arg01';
$VAR5 = 'Arg02';
$VAR6 = 'Arg03';
$VAR7 = 'Arg04
';
Mocking up my own #headers it seems to work ok.
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my #headers = qw ( mee mah mo mum
fee fi fo fum );
my $string =
q{http://10.x.x.x:8085/BINReport/servlet/Processing|2012/10/02 08:40:30|2015/03/10 16:00:42|nxcvapxxx_bin|Chandler|Linkpoint Connect|USA.APP.L2PES};
my #records;
my #fields = split /\s*\|\s*/, $string;
print join( "\n", #fields );
my %hash;
#hash{#headers} = #fields;
print Dumper \%hash;
push #records, \%hash;
foreach my $record (#records) {
my $var = $record ->{fo};
print $var,"\n";
}
This prints the string - with the dots - as expected. That's because each loop the hash looks like (with your sample data):
$VAR1 = {
'fi' => 'Linkpoint Connect',
'mo' => '2015/03/10 16:00:42',
'mum' => 'nxcvapxxx_bin',
'mee' => 'http://10.x.x.x:8085/BINReport/servlet/Processing',
'mah' => '2012/10/02 08:40:30',
'fee' => 'Chandler',
'fum' => undef,
'fo' => 'USA.APP.L2PES'
};
However, if my 'headers' row is shorter:
my #headers = qw ( mee mah mo mum );
I can reproduce your error:
Use of uninitialized value $var in print
That's because there's no {fo} element:
$VAR1 = {
'mo' => '2015/03/10 16:00:42',
'mee' => 'http://10.x.x.x:8085/BINReport/servlet/Processing',
'mum' => 'nxcvapxxx_bin',
'mah' => '2012/10/02 08:40:30'
};
So:
Make sure you have strict and warnings switched on.
check #headers is declared.
check #headers is actually long enough to match every field.
Running:
use Data::Dumper;
print Dumper \#headers;
Will tell you this.
And as noted in the comments above:
'Arg04
' => 'USA.FDFI.APP.TMECS2
{Arg04} doesn't exist, we have a newline. chomp your line before converting it into headers, (or just chomp #headers) and your code will work.
Note - you may also wish to chomp inside your while loop, because otherwise the last field will also include a newline. (Which may be undesired)
Using the Data::Dumper printouts showed that I had fewer header fields than existed, so that when I made the call to Arg04 it didn’t exist or made a reference to an undef field. Doing a chomp on the header array fixed the error message.
Related
I am reading user IDs from a csv file and trying to check if that user ID exists in my hash, however I have found that my checking through if(exists($myUserIDs{"$userID"})) is never returning true, despite testing with multiple keys that are in my hash.
open $file, "<", "test.csv";
while(<$file>)
{
chomp;
#fields = split /,/, $_;
$userID = #fields[1];
if(exists($myUserIDs{"$userID"}))
{
#do something
}
}
Turns out I wrote my test csv file with spaces after each comma.
Like 1, 2, 3 instead of 1,2,3 so my keys weren't actually matching. There goes an hour of my time haha.
See if this works for you.
use strict; use warnings;
use Data::Dumper;
my (%myUserIDs, #fields);
%myUserIDs = (
'1' => 'A',
'4' => 'D'
);
print Dumper(\%myUserIDs);
while(<DATA>)
{
chomp;
#fields = split /,/, $_;
my $userID = $fields[1];
if(exists($myUserIDs{$userID})){
print "ID:$userID exists in Hash\n";
} else {
print "ID:$userID not exists in Hash\n";
}
}
__DATA__
A,1
B,2
C,3
Output:
$VAR1 = {
'4' => 'D',
'1' => 'A'
};
ID:1 exists in Hash
ID:2 not exists in Hash
ID:3 not exists in Hash
Here's my code:
my %hash = (
'2564' => {
'st_responsible' => 'mname1',
'critical' => '',
'last_modified_by' => 'teamname1',
'transstatus' => '',
'rt_res' => 'pname1'
},
'2487' => {
'st_responsible' => 'mname2',
'critical' => '',
'last_modified_by' => 'teamname2',
'transstatus' => '',
'rt_res' => ''
}
);
print "xnum,st_responsible,critical,last_modified_by,transstatus,rt_res\n";
foreach my $x_number (sort keys %hash)
{
print "$x_number";
foreach my $element (keys %{$hash{$x_number}})
{
print ",$hash{$x_number}{$element}";
}
print "\n";
}
Expected output
xnum,st_responsible,critical,last_modified_by,transstatus,rt_res
2487,mname2,,teamname2,,
2564,mname1,,teamname1,,pname1
Actual output
xnum,st_responsible,critical,last_modified_by,transstatus,rt_res
2487,mname2,,,teamname2,
2564,mname1,,,teamname1,pname1
Please help in letting me know as to how exactly do I preserve the order of this data structure, and then write this to a CSV file.
I would suggest that for this, you'd be better off doing this with a slice, which is a way of extracting a list of values from a hash in a particular order?
#configure field order
my #order = qw ( st_responsible critical last_modified_by transstatus rt_res );
#print header row
print join (",", "xnum", #order ),"\n";
#iterate the rows
foreach my $key ( sort keys %hash ) {
#extract hash slice and join it with commas
print join ( ",", $key, #{$hash{$key}}{#order} ),"\n";
}
This gives:
xnum,st_responsible,critical,last_modified_by,transstatus,rt_res
2487,mname2,,teamname2,,
2564,mname1,,teamname1,,pname1
You can consider Text::CSV - but I'd suggest in this scenario it's overkill, best used when you've got quotes and quoted field separators to worry about. (And you don't).
If you have to deal with not just empty keys, but missing ones, you can make use of map:
my #order = qw ( st_responsible critical last_modified_by
transstatus missing rt_res extra_field_here );
print join (",", "xnum", #order ),"\n";
foreach my $key ( sort keys %hash ) {
print join ( ",", $key, map { $_ // '' } #{$hash{$key}}{#order} ),"\n";
}
(Otherwise you'll get a warning about an undef value).
Perl doesn't guarantee the order of items in the hash, this is the root cause of the issue. Even two different hashes with the same keys can have different order of keys. It may also differ from platform to platform and architecture and perl version.
You need to define another array with the list of keys which you want to print in correct order.
my #keys = qw(st_responsible critical last_modified_by transstatus rt_res);
foreach my $element (#keys) {
... print the value
}
EDIT: As you're trying to write CSV file, consider using Text::CSV which takes care about special characters, correct formatting and things like that.
There's probably a slicker way of achieving this, but give this a go:
use warnings;
use strict;
open my $csv_out, '>', 'out.csv' or die $!;
my #keys = qw(2487 2564);
my #vals = qw(st_responsible critical last_modified_by transstatus rt_res);
print $csv_out "xnum,st_responsible,critical,last_modified_by,transstatus,rt_res\n";
for my $key (#keys){
print $csv_out "$key,";
for my $vals (#vals){
$vals eq $vals[-1] ? print $csv_out "$hash{$key}{$vals}\n" : print $csv_out "$hash{$key}{$vals},";
}
}
This will print out comma-separated values to a csv file out.csv maintaining your original order (by iterating over arrays). If it's the last value it will print a newline.
--- OUTPUT ---
xnum,st_responsible,critical,last_modified_by,transstatus,rt_res
2487,mname2,,teamname2,,
2564,mname1,,teamname1,,pname1
Tried to use Text::CSV_XS to parse some logs. However, the following code doesn't do what I expected -- split the line into pieces according to separator " ".
The funny thing is, if I remove the double quote in the string $a, then it will do splitting.
Wonder if it's a bug or I missed something. Thanks!
use Text::CSV_XS;
$a = 'id=firewall time="2010-05-09 16:07:21 UTC"';
$userDefinedSeparator = Text::CSV_XS->new({sep_char => " "});
print "$userDefinedSeparator\n";
$userDefinedSeparator->parse($a);
my $e;
foreach $e ($userDefinedSeparator->fields) {
print $e, "\n";
}
EDIT:
In the above code snippet, it I change the = (after time) to be a space, then it works fine. Started to wonder whether this is a bug after all?
$a = 'id=firewall time "2010-05-09 16:07:21 UTC"';
You have confused the module by leaving both the quote character and the escape character set to double quote ", and then left them embedded in the fields you want to split.
Disable both quote_char and escape_char, like this
use strict;
use warnings;
use Text::CSV_XS;
my $string = 'id=firewall time="2010-05-09 16:07:21 UTC"';
my $space_sep = Text::CSV_XS->new({
sep_char => ' ',
quote_char => undef,
escape_char => undef,
});
$space_sep->parse($string);
for my $field ($space_sep->fields) {
print "$field\n";
}
output
id=firewall
time="2010-05-09
16:07:21
UTC"
But note that you have achieved exactly the same things as print "$_\n" for split ' ', $string, which is to be preferred as it is both more efficient and more concise.
In addition, you must always use strict and use warnings; and never use $a or $b as variable names, both because they are used by sort and because they are meaningless and undescriptive.
Update
As #ThisSuitIsBlackNot points out, your intention is probably not to split on spaces but to extract a series of key=value pairs. If so then this method puts the values straight into a hash.
use strict;
use warnings;
my $string = 'id=firewall time="2010-05-09 16:07:21 UTC"';
my %data = $string =~ / ([^=\s]+) \s* = \s* ( "[^"]*" | [^"\s]+ ) /xg;
use Data::Dump;
dd \%data;
output
{ id => "firewall", time => "\"2010-05-09 16:07:21 UTC\"" }
Update
This program will extract the two name=value strings and print them on separate lines.
use strict;
use warnings;
my $string = 'id=firewall time="2010-05-09 16:07:21 UTC"';
my #fields = $string =~ / (?: "[^"]*" | \S )+ /xg;
print "$_\n" for #fields;
output
id=firewall
time="2010-05-09 16:07:21 UTC"
If you are not actually trying to parse csv data, you can get the time field by using Text::ParseWords, which is a core module in Perl 5. The benefit to using this module is that it handles quotes very well.
use strict;
use warnings;
use Data::Dumper;
use Text::ParseWords;
my $str = 'id=firewall time="2010-05-09 16:07:21 UTC"';
my #fields = quotewords(' ', 0, $str);
print Dumper \#fields;
my %hash = map split(/=/, $_, 2), #fields;
print Dumper \%hash;
Output:
$VAR1 = [
'id=firewall',
'time=2010-05-09 16:07:21 UTC'
];
$VAR1 = {
'time' => '2010-05-09 16:07:21 UTC',
'id' => 'firewall'
};
I also included how you can make the data more accessible by adding it to a hash. Note that hashes cannot contain duplicate keys, so you need a new hash for each new time key.
I'm new to using perl and I'm trying to build a hash of a hash from a tsv. My current process is to read in a file and construct a hash and then insert it into another hash.
my %hoh = ();
while (my $line = <$tsv>)
{
chomp $line;
my %hash;
my #data = split "\t", $line;
my $id;
my $iter = each_array(#columns, #data);
while(my($k, $v) = $iter->())
{
$hash{$k} = $v;
if($k eq 'Id')
{
$id = $v;
}
}
$hoh{$id} = %hash;
}
print "dump: ", Dumper(%hoh);
This outputs:
dump
$VAR1 = '1234567890';
$VAR2 = '17/32';
$VAR3 = '1234567891';
$VAR4 = '17/32';
.....
Instead of what I would expect:
dump
{
'1234567890' => {
'k1' => 'v1',
'k2' => 'v2',
'k3' => 'v3',
'k4' => 'v4',
'id' => '1234567890'
},
'1234567891' => {
'k1' => 'v1',
'k2' => 'v2',
'k3' => 'v3',
'k4' => 'v4',
'id' => '1234567891'
},
........
};
My limited understanding is that when I do $hoh{$id} = %hash; its inserting in a reference to %hash? What am I doing wrong? Also is there a more succint way to use my columns and data array's as key,value pairs into my %hash object?
-Thanks in advance,
Niru
To get a reference, you have to use \:
$hoh{$id} = \%hash;
%hash is the hash, not the reference to it. In scalar context, it returns the string X/Y wre X is the number of used buckets and Y the number of all the buckets in the hash (i.e. nothing useful).
To get a reference to a hash variable, you need to use \%hash (as choroba said).
A more succinct way to assign values to columns is to assign to a hash slice, like this:
my %hoh = ();
while (my $line = <$tsv>)
{
chomp $line;
my %hash;
#hash{#columns} = split "\t", $line;
$hoh{$hash{Id}} = \%hash;
}
print "dump: ", Dumper(\%hoh);
A hash slice (#hash{#columns}) means essentially the same thing as ($hash{$columns[0]}, $hash{$columns[1]}, $hash{$columns[2]}, ...) up to however many columns you have. By assigning to it, I'm assigning the first value from split to $hash{$columns[0]}, the second value to $hash{$columns[1]}, and so on. It does exactly the same thing as your while ... $iter loop, just without the explicit loop (and it doesn't extract the $id).
There's no need to compare each $k to 'Id' inside a loop; just store it in the hash as a normal field and extract it afterwards with $hash{Id}. (Aside: Is your column header Id or id? You use Id in your loop, but id in your expected output.)
If you don't want to keep the Id field in the individual entries, you could use delete (which removes the key from the hash and returns the value):
$hoh{delete $hash{Id}} = \%hash;
Take a look at the documentation included in Perl. The command perldoc is very helpful. You can also look at the Perldoc webpage too.
One of the tutorials is a tutorial on Perl references. It all help clarify a lot of your questions and explain about referencing and dereferencing.
I also recommend that you look at CPAN. This is an archive of various Perl modules that can do many various tasks. Look at Text::CSV. This module will do exactly what you want, and even though it says "CSV", it works with tab separated files too.
You missed putting a slash in front of your hash you're trying to make a reference. You have:
$hoh{$id} = %hash;
Probably want:
$hoh{$id} = \%hash;
also, when you do a Data::Dumper of a hash, you should do it on a reference to a hash. Internally, hashes and arrays have similar structures when a Data::Dumper dump is done.
You have:
print "dump: ", Dumper(%hoh);
You should have:
print "dump: ", Dumper( \%hoh );
My attempt at the program:
#! /usr/bin/env perl
#
use warnings;
use strict;
use autodie;
use feature qw(say);
use Data::Dumper;
use constant {
FILE => "test.txt",
};
open my $fh, "<", FILE;
#
# First line with headers
#
my $line = <$fh>;
chomp $line;
my #headers = split /\t/, $line;
my %hash_of_hashes;
#
# Rest of file
#
while ( my $line = <$fh> ) {
chomp $line;
my %line_hash;
my #values = split /\t/, $line;
for my $index ( ( 0..$#values ) ) {
$line_hash{ $headers[$index] } = $values[ $index ];
}
$hash_of_hashes{ $line_hash{id} } = \%line_hash;
}
say Dumper \%hash_of_hashes;
You should only store a reference to a variable if you do so in the last line before the variable goes go of scope. In your script, you declare %hash inside the while loop, so placing this statement as the last in the loop is safe:
$hoh{$id} = \%hash;
If it's not the last statement (or you're not sure it's safe), create an anonymous structure to hold the contents of the variable:
$hoh{$id} = { %hash };
This makes a copy of %hash, which is slower, but any subsequent changes to it will not effect what you stored.
I wish to convert a single string with multiple delimiters into a key=>value hash structure. Is there a simple way to accomplish this? My current implementation is:
sub readConfigFile() {
my %CONFIG;
my $index = 0;
open(CON_FILE, "config");
my #lines = <CON_FILE>;
close(CON_FILE);
my #array = split(/>/, $lines[0]);
my $total = #array;
while($index < $total) {
my #arr = split(/=/, $array[$index]);
chomp($arr[1]);
$CONFIG{$arr[0]} = $arr[1];
$index = $index + 1;
}
while ( ($k,$v) = each %CONFIG ) {
print "$k => $v\n";
}
return;
}
where 'config' contains:
pub=3>rec=0>size=3>adv=1234 123 4.5 6.00
pub=1>rec=1>size=2>adv=111 22 3456 .76
The last digits need to be also removed, and kept in a separate key=>value pair whose name can be 'ip'. (I have not been able to accomplish this without making the code too lengthy and complicated).
What is your configuration data structure supposed to look like? So far the solutions only record the last line because they are stomping on the same hash keys every time they add a record.
Here's something that might get you closer, but you still need to figure out what the data structure should be.
I pass in the file handle as an argument so my subroutine isn't tied to a particular way of getting the data. It can be from a file, a string, a socket, or even the stuff below DATA in this case.
Instead of fixing things up after I parse the string, I fix the string to have the "ip" element before I parse it. Once I do that, the "ip" element isn't a special case and it's just a matter of a double split. This is a very important technique to save a lot of work and code.
I create a hash reference inside the subroutine and return that hash reference when I'm done. I don't need a global variable. :)
use warnings;
use strict;
use Data::Dumper;
readConfigFile( \*DATA );
sub readConfigFile
{
my( $fh ) = shift;
my $hash = {};
while( <$fh> )
{
chomp;
s/\s+(\d*\.\d+)$/>ip=$1/;
$hash->{ $. } = { map { split /=/ } split />/ };
}
return $hash;
}
my $hash = readConfigFile( \*DATA );
print Dumper( $hash );
__DATA__
pub=3>rec=0>size=3>adv=1234 123 4.5 6.00
pub=1>rec=1>size=2>adv=111 22 3456 .76
This gives you a data structure where each line is a separate record. I choose the line number of the record ($.) as the top-level key, but you can use anything that you like.
$VAR1 = {
'1' => {
'ip' => '6.00',
'rec' => '0',
'adv' => '1234 123 4.5',
'pub' => '3',
'size' => '3'
},
'2' => {
'ip' => '.76',
'rec' => '1',
'adv' => '111 22 3456',
'pub' => '1',
'size' => '2'
}
};
If that's not the structure you want, show us what you'd like to end up with and we can adjust our answers.
I am assuming that you want to read and parse more than 1 line. So, I chose to store the values in an AoH.
#!/usr/bin/perl
use strict;
use warnings;
my #config;
while (<DATA>) {
chomp;
push #config, { split /[=>]/ };
}
for my $href (#config) {
while (my ($k, $v) = each %$href) {
print "$k => $v\n";
}
}
__DATA__
pub=3>rec=0>size=3>adv=1234 123 4.5 6.00
pub=1>rec=1>size=2>adv=111 22 3456 .76
This results in the printout below. (The while loop above reads from DATA.)
rec => 0
adv => 1234 123 4.5 6.00
pub => 3
size => 3
rec => 1
adv => 111 22 3456 .76
pub => 1
size => 2
Chris
The below assumes the delimiter is guaranteed to be a >, and there is no chance of that appearing in the data.
I simply split each line based on '>'. The last value will contain a key=value pair, then a space, then the IP, so split this on / / exactly once (limit 2) and you get the k=v and the IP. Save the IP to the hash and keep the k=v pair in the array, then go through the array and split k=v on '='.
Fill in the hashref and push it to your higher-scoped array. This will then contain your hashrefs when finished.
(Having loaded the config into an array)
my #hashes;
for my $line (#config) {
my $hash; # config line will end up here
my #pairs = split />/, $line;
# Do the ip first. Split the last element of #pairs and put the second half into the
# hash, overwriting the element with the first half at the same time.
# This means we don't have to do anything special with the for loop below.
($pairs[-1], $hash->{ip}) = (split / /, $pairs[-1], 2);
for (#pairs) {
my ($k, $v) = split /=/;
$hash->{$k} = $v;
}
push #hashes, $hash;
}
The config file format is sub-optimal, shall we say. That is, there are easier formats to parse and understand. [Added: but the format is already defined by another program. Perl is flexible enough to deal with that.]
Your code slurps the file when there is no real need.
Your code only pays attention to the last line of data in the file (as Chris Charley noted while I was typing this up).
You also have not allowed for comment lines or blank lines - both are a good idea in any config file and they are easy to support. [Added: again, with the pre-defined format, this is barely relevant, but when you design your own files, do remember it.]
Here's an adaptation of your function into somewhat more idiomatic Perl.
#!/bin/perl -w
use strict;
use constant debug => 0;
sub readConfigFile()
{
my %CONFIG;
open(CON_FILE, "config") or die "failed to open file ($!)\n";
while (my $line = <CON_FILE>)
{
chomp $line;
$line =~ s/#.*//; # Remove comments
next if $line =~ /^\s*$/; # Ignore blank lines
foreach my $field (split(/>/, $line))
{
my #arr = split(/=/, $field);
$CONFIG{$arr[0]} = $arr[1];
print ":: $arr[0] => $arr[1]\n" if debug;
}
}
close(CON_FILE);
while (my($k,$v) = each %CONFIG)
{
print "$k => $v\n";
}
return %CONFIG;
}
readConfigFile; # Ignores returned hash
Now, you need to explain more clearly what the structure of the last field is, and why you have an 'ip' field without the key=value notation. Consistency makes life easier for everybody. You also need to think about how multiple lines are supposed to be handled. And I'd explore using a more orthodox notation, such as:
pub=3;rec=0;size=3;adv=(1234,123,4.5);ip=6.00
Colon or semi-colon as delimiters are fairly conventional; parentheses around comma separated items in a list are not an outrageous convention. Consistency is paramount. Emerson said "A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines", but consistency in Computer Science is a great benefit to everyone.
Here's one way.
foreach ( #lines ) {
chomp;
my %CONFIG;
# Extract the last digit first and replace it with an end of
# pair delimiter.
s/\s*([\d\.]+)\s*$/>/;
$CONFIG{ip} = $1;
while ( /([^=]*)=([^>]*)>/g ) {
$CONFIG{$1} = $2;
}
print Dumper ( \%CONFIG );
}