hash arrays for basic Perl script - perl

I'm writing my first Perl script and am reading a small text file line by line. The fields are delimited by ':', so I want to split each line into fields and store them in a hash, using the first field (name) as the key. I also think I want one big hash that holds all the information, or maybe just an array that holds each field, so that I can print all the info for a record on one line based on a pattern. I haven't gotten far, because the assignment to %info produces an "Odd number of elements in hash assignment" warning. Should I make it a regular array, and am I even going about this the right way? The fields on each line are in this order:
name:phone:address:date:salary
#!/usr/bin/perl -w
use strict;
print $#ARGV;
if($#ARGV == -1)
{
print "Script needs 1 argument please.\n";
exit 1;
}
my $inFILE = $ARGV[0];
#open the file passed
open(IN, "$inFILE") || die "Cannot open: $!"; #open databook.txt
my %info = (my %name, my %phone, my %address, my %date, my %salary);
while(<IN>)
{
%info = (split /:/)[1];
}
close($inFILE);

First of all, you should define your data structure depending on how you will use the parsed information. If you look records up by name, I suggest using a nested hash keyed by the name:
{name => {phone => ..., address => ..., date => ..., salary => ...}, ...}
If you're not going to look records up by name, just store them in an array of hashes:
[ {name => ..., phone => ..., address => ..., date => ..., salary => ...},
{name => ..., phone => ..., address => ..., date => ..., salary => ...}, ...]
In most cases I would use the first one.
Secondly, arrays and hashes in Perl are flat. So this:
my %info = (my %name, my %phone, my %address, my %date, my %salary);
doesn't make sense: the five empty hashes flatten into a single empty list. Use references to nest data.
Last but not least, Perl has syntactic sugar for reading input files. Use <> to read the files named on the command line instead of opening them explicitly. This makes the program more "Perlish".
use strict;
use warnings;
use Data::Dumper;
my $info = {};
while (<>) {
chomp;
my @items = split /:/, $_;
$info->{$items[0]} = { phone => $items[1],
address => $items[2],
date => $items[3],
salary => $items[4] };
}
print Dumper $info;
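Once the nested hash exists, printing all fields of a record on one line is mostly a hash slice. The following is only a sketch under assumptions: the helper names parse_line and format_record are hypothetical (they are not part of the answer above), and the two sample lines are made up to match the name:phone:address:date:salary layout.

```perl
use strict;
use warnings;

# Field names in file order; the first field (name) becomes the key.
my @fields = qw(phone address date salary);

# Hypothetical helper: parse one "name:phone:address:date:salary" line
# into a (name, hashref) pair.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my ( $name, @rest ) = split /:/, $line, -1;
    my %record;
    @record{@fields} = @rest;    # hash slice: assign all fields at once
    return ( $name, \%record );
}

# Hypothetical helper: render one record on a single line.
sub format_record {
    my ( $name, $record ) = @_;
    return join ' | ', $name, map { $_ // '' } @{$record}{@fields};
}

# Made-up sample lines, standing in for the lines read from the file.
my %info;
for my $line ( "Alice:555-1234:1 Main St:2001-02-03:50000\n",
               "Bob:555-9876:2 Oak Ave:1999-12-31:42000\n" ) {
    my ( $name, $record ) = parse_line($line);
    $info{$name} = $record;
}

# Print every record whose name matches a pattern.
print format_record( $_, $info{$_} ), "\n" for grep { /^A/ } sort keys %info;
```

Keeping the field list in one array means the parsing and the printing cannot drift apart if a field is added later.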

Related

Parse report in blocks to CSV

I have a large number of data dumps, each containing a pretty huge amount of data, structured as follows:
Key1:.............. Value
Key2:.............. Other value
Key3:.............. Maybe another value yet
Key1:.............. Different value
Key3:.............. Invaluable
Key5:.............. Has no value at all
Which I would like to transform to something like:
Key1,Key2,Key3,Key5
Value,Other value,Maybe another value yet,
Different value,,Invaluable,Has no value at all
I mean:
Generate a collection of all the keys
Generate a header line with all the Keys
Map all the values to their correct "columns" (notice that in this example I have no "Key4", and Key3/Key5 interchanged)
Possibly in Perl, since it would be easier to use in various environments.
But I am not sure if this format is unusual, or if there is a tool that already does this.
This is fairly easy using hashes and the Text::CSV_XS module:
use strict;
use warnings;
use Text::CSV_XS;
my @rows;
my %headers;
{
local $/ = "";
while (<DATA>) {
chomp;
my %record;
for my $line (split(/\n/)) {
next unless $line =~ /^([^:]+):\.+\s(.+)/;
$record{$1} = $2;
$headers{$1} = $1;
}
push(@rows, \%record);
}
}
unshift(@rows, \%headers);
my $csv = Text::CSV_XS->new({binary => 1, auto_diag => 1, eol => $/});
$csv->column_names(sort(keys(%headers)));
for my $row_ref (@rows) {
$csv->print_hr(*STDOUT, $row_ref);
}
__DATA__
Key1:.............. Value
Key2:.............. Other value
Key3:.............. Maybe another value yet
Key1:.............. Different value
Key3:.............. Invaluable
Key5:.............. Has no value at all
Output:
Key1,Key2,Key3,Key5
Value,"Other value","Maybe another value yet",
"Different value",,Invaluable,"Has no value at all"
If your CSV format is 'complicated' - e.g. it contains commas, etc. - then use one of the Text::CSV modules. But if it isn't - and this is often the case - I tend to just work with split and join.
What's useful in your scenario, is that you can map key-values within a record quite easily using a regex. Then use a hash slice to output:
#!/usr/bin/env perl
use strict;
use warnings;
#set paragraph mode - records are blank line separated.
local $/ = "";
my @rows;
my %seen_header;
#read STDIN or files on command line, just like sed/grep
while ( <> ) {
#multi - line pattern, that matches all the key-value pairs,
#and then inserts them into a hash.
my %this_row = m/^(\w+):\.+ (.*)$/gm;
push ( @rows, \%this_row );
#add the keys we've seen to a hash, so we 'know' what we've seen.
$seen_header{$_}++ for keys %this_row;
}
#extract the keys, make them unique and ordered.
#could set this by hand if you prefer.
my @header = sort keys %seen_header;
#print the header row
#(the parentheses keep "\n" out of the join, so no trailing comma is printed)
print join( ",", @header ), "\n";
#iterate the rows
foreach my $row ( @rows ) {
#use a hash slice to select the values matching @header.
#the map is so any undefined values (missing keys) don't report errors, they
#just return blank fields.
print join( ",", map { $_ // '' } @{$row}{@header} ), "\n";
}
For your sample input, this produces:
Key1,Key2,Key3,Key5
Value,Other value,Maybe another value yet,
Different value,,Invaluable,Has no value at all
If you want to be really clever, then most of that initial building of the loop can be done with:
my @rows = map { +{ m/^(\w+):\.+ (.*)$/gm } } <>;
The problem then is that you still need to build the list of headers, and that gets a bit more complicated:
$seen_header{$_}++ for map { keys %$_ } @rows;
It works, but I don't think it's as clear about what's happening.
However the core of your problem may be the file size - that's where you have a bit of a problem, because you need to read the file twice - first time to figure out which headings exist throughout the file, and then second time to iterate and print:
#!/usr/bin/env perl
use strict;
use warnings;
open ( my $input, '<', 'your_file.txt') or die $!;
local $/ = "";
my %seen_header;
while ( <$input> ) {
$seen_header{$_}++ for m/^(\w+):/gm;
}
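my @header = sort keys %seen_header;
#print the header row
print join( ",", @header ), "\n";
#return to the start of file:
seek ( $input, 0, 0 );
while ( <$input> ) {
my %this_row = m/^(\w+):\.+ (.*)$/gm;
#%this_row is a plain hash, so the slice is @this_row{...}
print join( ",", map { $_ // '' } @this_row{@header} ), "\n";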
}
This will be slightly slower, as it has to read the file twice. But it will use far less memory, because it isn't holding the whole file in memory.
Unless you know all your keys in advance, and you can just define them, you'll have to read the file twice.
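If the keys really are known in advance, a single pass is enough. The sketch below assumes the header list Key1, Key2, Key3, Key5 from the sample data and uses a hypothetical helper name, record_to_csv; the records are passed in as strings here, where a real script would read them in paragraph mode as above.

```perl
use strict;
use warnings;

# Assumption: the full set of keys is known up front.
my @header = qw(Key1 Key2 Key3 Key5);

# Hypothetical helper: turn one blank-line-separated record into a CSV line.
sub record_to_csv {
    my ($record) = @_;
    # In list context with /g, the match returns every (key, value) pair.
    my %row = $record =~ m/^(\w+):\.+ (.*)$/gm;
    # Missing keys become empty fields.
    return join ',', map { $_ // '' } @row{@header};
}

# The two records from the sample input.
my @records = (
    "Key1:.............. Value\n"
  . "Key2:.............. Other value\n"
  . "Key3:.............. Maybe another value yet\n",
    "Key1:.............. Different value\n"
  . "Key3:.............. Invaluable\n"
  . "Key5:.............. Has no value at all\n",
);

print join( ',', @header ), "\n";
print record_to_csv($_), "\n" for @records;
```

Because each record is printed as soon as it is parsed, memory use stays flat regardless of file size.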
This seems to work with the data you've given
use strict;
use warnings 'all';
my %data;
while ( <> ) {
next unless /^(\w+):\W*(.*\S)/;
push @{ $data{$1} }, $2;
}
use Data::Dump;
dd \%data;
output
{
Key1 => ["Value", "Different value"],
Key2 => ["Other value"],
Key3 => ["Maybe another value yet", "Invaluable"],
Key5 => ["Has no value at all"],
}

how to read a line and save multiple parameters into variables separated by ;?

So let's say I have a file.txt whose syntax is like this:
"1;22;333;'4444';55555",
I now want my code to do the following:
open the file = already done
read a line and save each parameter, separated by ;, into a variable like ( $one = 1, $two = 22, $three = 333, $four = '4444', $five = 55555; )
write the variables into a DB (this step is already done)
loop until all lines of the file are done
So I actually need help with step 2; I think I can handle the loop and the DB code. Do you have any ideas or tips on how I could do this? Something beginner-friendly would be nice so I can learn from it.
foreach $file (@file){
$currentfile = "$currentdir\\$file";
open(my $reader, "<", $currentfile) or die "Failed to open file: $!\n";
?????
close $reader;
}
If you're just doing 'numbered fields' then you should be thinking 'array':
use Data::Dumper;
while ( <$reader> ) {
chomp;
my @row = split /;/;
print Dumper \@row;
}
This will give you an array that you can access - e.g. $row[0] for the first element.
$VAR1 = [
'1',
'22',
'333',
'\'4444\'',
'55555'
];
If you know what the headers are 'named' and prefer to work on names you can do something similar with a hash:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my @cols = qw ( id value fish name sprout );
while ( <DATA> ) {
my %row;
chomp;
@row{@cols} = split /;/;
print Dumper \%row;
}
__DATA__
1;22;333;'4444';55555
This gives instead:
$VAR1 = {
'fish' => '333',
'name' => '\'4444\'',
'id' => '1',
'value' => '22',
'sprout' => '55555'
};
Note - hashes are unordered, but their whole point is that you don't need to care about the 'order' - just print $row{name},"\n";
You need to read from the filehandle $reader, line by line. See the tutorial perlopentut and the full reference open.
Then you split each line on the separator ;, which returns a list that you assign to an array.
open my $reader, "<", $currentfile or die "Failed to open file: $!\n";
while (my $line = <$reader>) {
chomp($line);
my @params = split ';', $line;
# do something with @params, it will be overwritten on next iteration
}
close $reader;
The diamond operator <> reads from a filehandle, <$fh>, returning a line at a time. See about it in perlop. When there are no more lines it returns undef and looping stops. You may assign the string that it returns to a variable which you declare (my $line), which then exists only within the body of the while loop. If you don't, but do while (<$fh>) instead, the line is assigned to the special variable $_, which is default for many things in Perl.
The chomp removes the linefeed (new line) from the end of the line.
Note that '4444' from your example is not a number and cannot be used as such.
Alternatively, you can take a reference to the array with parameters on each line, and put it in another array which thus will in the end contain all lines.
my @all_params;
while (my $line = <$reader>) {
chomp($line);
my @params = split ';', $line;
push @all_params, \@params;
}
Now @all_params has elements which are references, each to an array with the parameters for one line. For how to work with references, see the tutorial perlreftut and the cookbook on complex data structures, perldsc.
The following is more complex but let me mention it since it's a bit of an idiom. You can do the above in one statement
my @all_params = map { chomp; [ split ';', $_ ] } <$reader>;
This uses map, which applies the code in { ... } to each element of the list that is submitted to it, returning a list. So it takes a list and returns the processed list. The chomp strips the newline from each line first. The [...] inside makes an anonymous array, equivalent to the reference we took of an array previously. The filehandle read <$reader> returns all lines of the file in one list when invoked in list context, which is in this case imposed by map (since it must receive a list).
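Whichever way @all_params is built, its elements are array references, so individual values are reached with two subscripts. A small sketch, assuming a hypothetical helper read_params that takes the lines as a list (a real script would pass it <$reader>); the second sample line is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: split each line on ';' and collect array references.
sub read_params {
    my (@lines) = @_;
    chomp @lines;
    return map { [ split /;/, $_ ] } @lines;
}

my @all_params = read_params(
    "1;22;333;'4444';55555\n",
    "6;77;888;'9999';10101\n",
);

# First field of the first line.
print $all_params[0][0], "\n";

# Give the fields of each line names with a list assignment.
for my $params (@all_params) {
    my ( $one, $two, $three, $four, $five ) = @$params;
    print "one=$one two=$two five=$five\n";
}
```

The list assignment inside the loop is the closest match to the $one, $two, ... variables asked for in the question.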
An important one: always start your programs with
use warnings 'all';
use strict;
The order of these doesn't really matter. Mostly you'll see use strict; first.
Then your loop over filenames needs to be foreach my $file (@file) { ... } and you must declare all variables, so my $currentfile = ....

Ordered hash of hashes - setting and accessing key/value pairs

I want to implement an ordered hash where the value of each key/value pair is another nested hash. I am unable to do so. I am not getting any errors, but the values are not being printed.
use Hash::Ordered;
use constant { lead_ID => 44671 , lag_ID => 11536 , start_time => time };
my $dict_lead=Hash::Ordered->new;
my $dict_lag=Hash::Ordered->new;
open(my $f1,"<","tcs_07may_nse_fo") or die "cant open input file";
open(my $f2,">","bid_ask_".&lead_ID) or die "cant open output file";
open(my $f3,">","ema_data/bid_ask_".&lag_ID) or die "cant open output file";
while(my $line =<$f1>){
my @data=split(/,/,$line);
chomp(@data);
my ($tstamp, $instr) = (int($data[0]), $data[1]);
if($instr==&lead_ID){
$dict_lead->set($tstamp=>{"bid"=>$data[5],"ask"=>$data[6]});
}
if($instr==&lag_ID){
$dict_lag->set($tstamp=>{"bid"=>$data[5],"ask"=>$data[6]});
}
}
close $f1;
foreach my $key ($dict_lead->keys){
my $spread=$dict_lead{$key}{"ask"}-$dict_lead{$key}{"bid"};
%hash=$dict_lead->get($key);
print $key.",".$hash{"ask"}."\n";
print $f2 $key.",".$dict_lead{$key}{"bid"}.","
.$dict_lead{$key}{"ask"}.",".$spread."\n";
}
foreach my $key ($dict_lag->keys){
my $spread=$dict_lag{$key}{"ask"}-$dict_lag{$key}{"bid"};
print $f3 $key.",".$dict_lag{$key}{"bid"}.","
.$dict_lag{$key}{"ask"}.",".$spread."\n";
}
close $f2;
close $f3;
print "Ring destroyed in " , time() - &start_time , " seconds\n";
The output printed on my terminal is :
1430992791,
1430992792,
1430992793,
1430992794,
1430992795,
1430992796,
1430992797,
1430992798,
1430992799,
Ring destroyed in 24 seconds
I realize from the first column of output that I am able to insert the key in ordered hash. But I don't understand how to insert another hash as value for those keys. Also how would I access those values while iterating through the keys of the hash?
The output in the file corresponding to file handle $f2 is:
1430970394,,,0
1430970395,,,0
1430970396,,,0
1430970397,,,0
1430970398,,,0
1430970399,,,0
1430970400,,,0
First of all, I don't see why you want to use a module that keeps your hash in order. I presume you want your output ordered by the timestamp fields, and the data that you are reading from the input file is already ordered like that, but it would be simple to sort the keys of an ordinary hash and print the contents in order without relying on the incoming data being presorted.
You have read an explanation of why your code isn't behaving as it should. This is how I would write a solution that hopefully behaves properly (although I haven't been able to test it beyond checking that it compiles).
Instead of a hash, I have chosen to use a two-element array to contain the bid and ask prices for each timestamp. That should make the code run fractionally faster as well as making it simpler and easier to read.
It's also noteworthy that I have added use autodie, which makes perl check the status of IO operations such as open and chdir automatically and removes the clutter caused by coding those checks manually. I have also defined a constant for the path to the root directory of the files, and used chdir to set the working directory there. That removes the need to repeat that part of the path and reduces the length of the remaining file path strings.
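The plain-hash alternative mentioned above might look like the following sketch. The quote values are made up for illustration, and lines_by_time is a hypothetical helper name; the only real point is the numeric sort on the keys.

```perl
use strict;
use warnings;

# Plain hash keyed by timestamp (made-up sample values).
my %quotes = (
    1430970396 => { bid => 100, ask => 103 },
    1430970394 => { bid => 98,  ask => 101 },
    1430970395 => { bid => 99,  ask => 100 },
);

# Hypothetical helper: return "timestamp,bid,ask,spread" lines
# ordered by numeric timestamp, whatever order the hash iterates in.
sub lines_by_time {
    my ($quotes) = @_;
    return map {
        my $q = $quotes->{$_};
        join ',', $_, $q->{bid}, $q->{ask}, $q->{ask} - $q->{bid};
    } sort { $a <=> $b } keys %$quotes;
}

print "$_\n" for lines_by_time( \%quotes );
```

Sorting at output time costs one sort per report, but it no longer matters whether the input file happened to be presorted.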
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use autodie;
use Hash::Ordered;
use constant DIR => '../tcs_nse_fo_merged';
use constant LEAD_ID => 44671;
use constant LAG_ID => 11536;
chdir DIR;
my $dict_lead = Hash::Ordered->new;
my $dict_lag = Hash::Ordered->new;
{
open my $fh, '<', 'tcs_07may_nse_fo';
while ( <$fh> ) {
chomp;
my @data = split /,/;
my $tstamp = int $data[0];
my $instr = $data[1];
if ( $instr == LEAD_ID ) {
$dict_lead->set( $tstamp => [ @data[5,6] ] );
}
elsif ( $instr == LAG_ID ) {
$dict_lag->set( $tstamp => [ @data[5,6] ] );
}
}
}
{
my $file = 'ema_data/bid_ask_' . LEAD_ID;
open my $out_fh, '>', $file;
for my $key ( $dict_lead->keys ) {
my $val = $dict_lead->get($key);
my ($bid, $ask) = @$val; # stored as [ bid, ask ] from @data[5,6]
my $spread = $ask - $bid;
print join(',', $key, $ask), "\n";
print $out_fh join(',', $key, $bid, $ask, $spread), "\n";
}
}
{
my $file = 'ema_data/bid_ask_' . LAG_ID;
open my $out_fh, '>', $file;
for my $key ( $dict_lag->keys ) {
my $val = $dict_lag->get($key);
my ($bid, $ask) = @$val; # stored as [ bid, ask ] from @data[5,6]
my $spread = $ask - $bid;
print $out_fh join(',', $key, $bid, $ask, $spread), "\n";
}
}
printf "Ring destroyed in %d seconds\n", time - $^T;
With ordered hashes constructed using Hash::Ordered, the hash is an object. Those objects have properties (e.g. an index; if you examine a Hash::Ordered object, it will have more than just hash elements inside it) and they provide methods for you to manipulate and access their data. So you need to use the supplied methods, such as set, to access the hash, as you do in this line:
$dict_lead->set($tstamp=>{"bid"=>$data[5],"ask"=>$data[6]});
where you create a key from the scalar $tstamp and then associate it with an anonymous hash as its value.
But while you are using Hash::Ordered objects, your script also makes use of a plain data structure (%hash) that you populate using $dict_lead->get($key) in your first foreach loop. All the normal techniques, idioms and rules for adding keys to a hash still apply in this case. You don't want to repeatedly copy the nested hash out of the $dict_lead Hash::Ordered object into %hash here; you want to add the nested hash to %hash and associate it with a unique key.
Without sample data to test or a description of the expected output to compare against it is difficult to know for sure, but you probably just need to change:
%hash=$dict_lead->get($key);
to something like:
$hash{$key} = $dict_lead->get($key);
to populate your temporary %hash correctly. Or, since each key's value is an anonymous hash that is nested, you might instead want to try changing print $key.",".$hash{"ask"}."\n"; to:
print $key.",".$hash{$key}{"ask"}."\n"
There are other ways to "deeply" copy part of one nested data structure to another (see the Stackoverflow reference below), and you may be able to avoid using the temporary variable altogether, but these small changes might be all that is necessary in your case.
In general, in order to "insert another hash as a value for ... keys" you need to use a reference or an anonymous hash constructor ({ k => "v" , ... }). So e.g. to add one key:
my %sample_hash ;
$sample_hash{"key_0"} = { bid => "1000000" , timestamp => 1435242285 };
dd %sample_hash ;
Output:
("key_0", { bid => 1000000, timestamp => 1435242285 })
To add multiple keys from one hash to another:
my %to_add = ( key_1 => { bid => "1500000" , timestamp => 1435242395 },
key_2 => { bid => "2000000" , timestamp => 1435244898 } );
for my $key ( keys %to_add ) {
$sample_hash{$key} = $to_add{$key}
}
dd %sample_hash ;
Output:
(
"key_1",
{ bid => 1500000, timestamp => 1435242395 },
"key_0",
{ bid => 1000000, timestamp => 1435242285 },
"key_2",
{ bid => 2000000, timestamp => 1435244898 },
)
References
How can I combine hashes in Perl? ++
perldoc perlfaq4
perldoc perldsc

How do I preserve the order of a hash in Perl?

I have a .sql file from which I am reading my input. Suppose the file contains the following input....
Message Fruits Fruit="Apple",Color="Red",Taste="Sweet";
Message Flowers Flower="Rose",Color="Red";
Now I have written a perl script to generate hash from this file..
use strict;
use Data::Dumper;
if(open(MYFILE,"file.sql")){
my @stack;
my %hash;
push @stack,\%hash;
my @file = <MYFILE>;
foreach my $row(@file){
if($row =~ /Message /){
my %my_hash;
my @words = split(" ",$row);
my @sep_words = split(",",$words[2]);
foreach my $x(@sep_words){
my($key,$value) = split("=",$x);
$my_hash{$key} = $value;
}
push @stack,$stack[$#stack]->{$words[1]} = {%my_hash};
pop @stack;
}
}
print Dumper(\%hash);
}
I am getting the following output..
$VAR1 = {
'Flowers' => {
'Flower' => '"Rose"',
'Color' => '"Red";'
},
'Fruits' => {
'Taste' => '"Sweet";',
'Fruit' => '"Apple"',
'Color' => '"Red"'
}
};
Now here the hash is not preserving the order in which the input is read. I want my hash to be in the same order as the input file.
I have found some libraries like Tie::IxHash, but I want to avoid using any libraries. Can anybody help me out?
For a low key approach, you could always maintain the keys in an array, which does have an order.
foreach my $x(@sep_words){
my($key,$value) = split("=",$x);
$my_hash{$key} = $value;
push(@list_keys,$key);
}
And then to extract, iterate over the keys:
foreach my $this_key (@list_keys) {
# do something with $my_hash{$this_key}
}
But that does have the issue of, you're relying on the array of keys and the hash staying in sync. You could also accidentally add the same key multiple times, if you're not careful.
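One way to reduce both risks is to route every insertion through a single routine that updates the hash and the key list together. This is only a sketch; set_ordered is a hypothetical helper name, and the sample pairs are taken from the question's input with one made-up update.

```perl
use strict;
use warnings;

my %my_hash;
my @list_keys;

# Hypothetical helper: store a pair, recording the key only the first
# time it is seen so @list_keys never contains duplicates.
sub set_ordered {
    my ( $key, $value ) = @_;
    push @list_keys, $key unless exists $my_hash{$key};
    $my_hash{$key} = $value;
    return;
}

set_ordered( Fruit => 'Apple' );
set_ordered( Color => 'Red' );
set_ordered( Taste => 'Sweet' );
set_ordered( Color => 'Green' );    # updates the value, keeps the position

# Iterate in insertion order.
print "$_=$my_hash{$_}\n" for @list_keys;
```

Deleting a key would need a matching routine that removes it from both structures, which is essentially what modules such as Tie::IxHash do internally.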
Joel has it correct - you cannot reliably trust the order of a hash in Perl. If you need a certain order, you'll have to store your information in an array.
A hash is a set of key-value pairs with unique keys. A set is never ordered per se.
An array is a sequence of any number of scalars. An array is ordered per se, but uniqueness would have to be enforced externally.
Here is my take on your problem:
#!/usr/bin/perl
use strict; use warnings;
use Data::Dumper;
local $/ = ";\n";
my @messages;
while (<DATA>) {
chomp;
my ($msg, $to, $what) = split ' ', $_, 3; # limit number of fragments.
my %options;
while($what =~ /(\w+) = "((?:[^"]++|\\.)*)" (?:,|$)/xg) {
$options{$1} = $2;
}
push @messages, [$to => \%options];
}
print Dumper \@messages;
__DATA__
Message Fruits Fruit="Apple",Color="Red",Taste="Sweet";
Message Flowers Flower="Rose",Color="Red";
I put the messages into an array because their order has to be preserved. Also, I don't do weird gymnastics with a stack I don't need.
I don't split on all newlines, because you could have quoted value that contain newlines. For the same reason, I don't blindly split on , or = and use a sensible regex. It may be worth adding error detection, like die if not defined pos $what or pos($what) != length($what); at the end (requires /c flag on regex), to see if we actually processed everything or were thrown out of the loop prematurely.
This produces:
$VAR1 = [
[ 'Fruits',
{
'Taste' => 'Sweet',
'Fruit' => 'Apple',
'Color' => 'Red'
}
],
[ 'Flowers',
{
'Flower' => 'Rose',
'Color' => 'Red'
}
]
];
(with other indenting, but that's irrelevant).
One gotcha exists: The file has to be terminated by a newline, or the last semicolon isn't caught.
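The error detection suggested above can be sketched as follows. This is an illustration under assumptions, not the answer's own code: parse_options is a hypothetical helper, the input is assumed to have had its trailing ";\n" chomped already, the regex is anchored with \G, and the character class excludes backslashes so that the \\. alternative handles escapes.

```perl
use strict;
use warnings;

# Hypothetical helper: parse key="value" pairs, dying if anything is left over.
sub parse_options {
    my ($what) = @_;
    my %options;
    # \G anchors each match where the previous one ended; /c keeps pos()
    # intact when the match finally fails, so it can be inspected afterwards.
    while ( $what =~ /\G (\w+) = "((?:[^"\\]++|\\.)*)" (?:,|$)/xgc ) {
        $options{$1} = $2;
    }
    my $pos = pos($what) // 0;
    die "Unparsed input at offset $pos\n" if $pos != length $what;
    return \%options;
}

my $ok = parse_options('Fruit="Apple",Color="Red",Taste="Sweet"');
print join( ',', map { "$_=$ok->{$_}" } sort keys %$ok ), "\n";

# A missing quote is reported instead of being silently skipped.
eval { parse_options('Flower="Rose",Color=Red'); 1 }
    or print "error: $@";
```

Without the final check, the malformed second input would simply yield a hash with one key and no indication that Color was dropped.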

Perl parsing the csv file

I am trying to read a .csv file for the first time. I have gone through the link below:
http://metacpan.org/pod/Text::CSV_XS#Reading-a-CSV-file-line-by-line:
I have a few doubts. You may tell me these are silly questions, but I can't figure out how exactly Perl reads a CSV file.
So, my doubts are:
First Question
What is the difference between reading the CSV file line by line and parsing the file?
I have a simple program where I read the CSV file line by line.
Below is my program:
#!/usr/bin/perl -w
use strict;
use Text::CSV;
use Data::Dumper;
my $csv=Text::CSV->new( );
my $my_file="test.csv";
open(my $fl,"<",$my_file) or die"can not open the file $!";
#print "$ref_list\n";
while(my $ref_list=$csv->getline($fl))
{
print "$ref_list->[0]\n";
}
Below is the data in csv file :
"Emp_id","Emp_name","Location","Company"
102713,"raj","Banglore","abc"
403891,"Rakesh","Pune","Infy"
530201,"Kiran","Hyd","TCS"
503110,"raj","Noida","HCL"
Second Question:
If I want to get a specific Emp_id along with its Location, how should I proceed?
Third Question:
If I want only the records for Emp_id 102713, 530201 and 503110 (i.e. name, location, company name), what should I do?
Thanks
A CSV file is a good representation of tabular data in a text format, but it is unsuitable as an in-memory representation. Because of that, we have to create an adequate representation. One such representation would be a hash:
my $hashref = {
Emp_id => ...,
Emp_name => ...,
Location => ...,
Company => ...,
};
If the header row is in the array @header, we can create this hash with:
my @header = ...;
my @row = @{$csv->getline($fl)}; # turn the arrayref into an array
my $hashref = {};
for my $i (0..$#header) {
$hashref->{$header[$i]} = $row[$i];
}
# The $hashref now looks as described above
We can then create lookup hashes that use the id values as keys. So %lookup looks like this:
my %lookup = (
102713 => $hashref_to_first_line,
...,
);
We populate it by doing
$lookup{$row[0]} = $hashref;
after the above loop. We can then access a certain hashref with
my $a_certain_id_hashref = $lookup{102713};
or access certain elements directly with
my $a_certain_id_location = $lookup{102713}{Location};
If the key does not exist, these lookups should return undef.
If the CSV file is too big, this might cause perl to run out of memory. In that case, the hashes should be tied to files, but that is a different topic completely.
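Putting those fragments together, the lookup can be built in one loop. The sketch below is an assumption-laden illustration: build_lookup is a hypothetical helper, and the rows are written out as array references in the shape that getline returns, so the example runs without a CSV file.

```perl
use strict;
use warnings;

# Hypothetical helper: index rows by their first column.
# $header and each element of $rows are array references, in the shape
# that Text::CSV's getline returns.
sub build_lookup {
    my ( $header, $rows ) = @_;
    my %lookup;
    for my $row (@$rows) {
        my %record;
        @record{@$header} = @$row;    # hash slice pairs names with values
        $lookup{ $row->[0] } = \%record;
    }
    return \%lookup;
}

my $header = [qw(Emp_id Emp_name Location Company)];
my $rows   = [
    [ 102713, 'raj',    'Banglore', 'abc' ],
    [ 403891, 'Rakesh', 'Pune',     'Infy' ],
];

my $lookup = build_lookup( $header, $rows );
print $lookup->{102713}{Location}, "\n";
print defined $lookup->{999999} ? "found\n" : "no such id\n";
```

The hash slice replaces the index-based for loop shown earlier; both produce the same per-row hash.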
Here's another option that addresses your second question and part of your third question:
use Modern::Perl;
use Text::CSV;
my @empID = qw/ 102713 530201 503110 /;
my $csv = Text::CSV->new( { binary => 1 } )
or die 'Cannot use CSV: ' . Text::CSV->error_diag();
my $my_file = "test.csv";
open my $fl, '<', $my_file or die "can not open the file $!";
while ( my $ref_list = $csv->getline($fl) ) {
if ( $ref_list->[0] ~~ @empID ) {
say "Emp_id: $ref_list->[0] is Location: $ref_list->[2]";
}
}
$csv->eof or $csv->error_diag();
close $fl;
Output:
Emp_id: 102713 is Location: Banglore
Emp_id: 530201 is Location: Hyd
Emp_id: 503110 is Location: Noida
The array @empID contains the ID(s) you're interested in. In the while loop, each Emp_id is checked using the smart match operator (Perl v5.10+) to see if it's in the list of IDs. If so, the Emp_id and its corresponding Location are printed.
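Smart match was later marked experimental (Perl 5.18) and then deprecated (Perl 5.38), so a hash used as a set may be a more durable way to do the same membership test. The following is a sketch under that assumption; select_rows is a hypothetical helper, and the rows are given as array references in the shape getline returns so the example runs on its own.

```perl
use strict;
use warnings;

my @empID = qw/ 102713 530201 503110 /;

# Build a set once; exists() is then a constant-time membership test.
my %wanted = map { $_ => 1 } @empID;

# Hypothetical helper: keep only the rows whose first column is a wanted ID.
sub select_rows {
    my (@rows) = @_;
    return grep { exists $wanted{ $_->[0] } } @rows;
}

my @rows = (
    [ 102713, 'raj',    'Banglore', 'abc' ],
    [ 403891, 'Rakesh', 'Pune',     'Infy' ],
    [ 530201, 'Kiran',  'Hyd',      'TCS' ],
);

# Print name, location and company for the selected employees.
for my $row ( select_rows(@rows) ) {
    print "Emp_id: $row->[0] Name: $row->[1] Location: $row->[2] Company: $row->[3]\n";
}
```

This also covers the rest of the third question, since each selected row carries the name, location and company fields.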