CSV into hash - perl

I have a csv with the first column a label followed by comma separated values:
LabelA,45,56,78,90
LabelB,56,65,43,32
LabelC,56,87,98,45
I'd like the first column (LabelA etc) to be the Key in a hash with the numeric values in an array.
I can read the file into an array or scalar but I'm not sure what to do after that. Suggestions??
Edit:
Ok, so it looks like this assigns the value to a key ..but what about the comma delimited numbers in my example? Where are they going? Are they in %hash ? If so could you maybe dumb down your explanation even further? Thanks.

Personally, I like the Text::CSV_XS and IO::File module:
use Text::CSV_XS;
use IO::File;
# Usage example:
my $hash_ref = csv_file_hashref('some_file.csv');
foreach my $key (sort keys %{$hash_ref}){
print qq{$key: };
print join q{,}, #{$hash_ref->{$key}};
print qq{\n};
}
# Implementation:
sub csv_file_hashref {
my ($filename) = #_;
my $csv_fh = IO::File->new($filename, 'r');
my $csv = Text::CSV_XS->new ();
my %output_hash;
while(my $colref = $csv->getline ($csv_fh))
{
$output_hash{shift #{$colref}} = $colref;
}
return \%output_hash;
}

Well, let's assume that there are no special characters and so on.
First you open the file:
open my $fh, '<', 'some.file.csv' or die "Cannot open: $!";
Then you read from it in loop:
while (my $line = <$fh>) {
Afterwards, you remove trailing white characters (\n and others):
$line =~ s/\s*\z//;
And split it into array:
my #array = split /,/, $line;
When it's in array, you get first element off the array:
my $key = shift #array;
And store it in hash:
$hash{$key} = \#array;
(\#array means reference to array).
Whole code:
my %hash;
open my $fh, '<', 'some.file.csv' or die "Cannot open: $!";
while (my $line = <$fh>) {
$line =~ s/\s*\z//;
my #array = split /,/, $line;
my $key = shift #array;
$hash{$key} = \#array;
}
close $fh;

See perlfunc split and perldsc.
Read each line.
Chomp it.
Split it on commas.
Use the first value in the result as the key to your HoA.
The other values become the array.
Store a ref to the array in the hash under the key.
...
Profit!!!
Make a hash of array references:
Your data structure should look like this:
my %foo = (
LabelA => [ 2, 3, 56, 78, 90 ],
LabelB => [ 65, 45, 23, 34, 87 ],
LabelC => [ 67, 34, 56, 67, 98 ],
);

Text::CSV::Hashify
Turn a CSV file into a Perl hash:
# Simple functional interface
use Text::CSV::Hashify;
$hash_ref = hashify('/path/to/file.csv', 'primary_key');
# Object-oriented interface
use Text::CSV::Hashify;
$obj = Text::CSV::Hashify->new( {
file => '/path/to/file.csv',
format => 'hoh', # hash of hashes, which is default
key => 'id', # needed except when format is 'aoh'
max_rows => 20, # number of records to read; defaults to all
... # other key-value pairs possible for Text::CSV
} );
# all records requested
$hash_ref = $obj->all;

I think this can also work easier.
The $refhashvariable will be a reference to an array of hashes.
Each hash contain the headers (as hashkey) and the values of one CSV line.
The array contains the hashes for all lines in the CSV.
use Text::CSV_XS qw( csv );
$refhashvariable = csv(
in => "$input_csv_filename",
sep => ';',
headers => "auto"
); # as array of hash
This works for me. I have not tried en case the CSV does not have headers.

Related

Count the number of items derived from split without putting into an array

I am looking to spare the use of an array for memory's sake, but still get the number of items derived from the split function for each pass of a while loop.
The ultimate goal is to filter the output files according to the number of their sequences, which could either be deduced by the number of rows the file has, or the number of carrots that appear, or the number of line breaks, etc.
Below is my code:
#!/usr/bin/perl
use warnings;
use strict;
use diagnostics;
open(INFILE, "<", "Clustered_Barcodes.txt") or die $!;
my %hash = (
"TTTATGC" => "TATAGCGCTTTATGCTAGCTAGC",
"TTTATGG" => "TAGCTAGCTTTATGGGCTAGCTA",
"TTTATCC" => "GCTAGCTATTTATCCGCTAGCTA",
"TTTATCG" => "AGTCATGCTTTATCGCGATCGAT",
"TTTATAA" => "TAGCTAGCTTTATAATAGCTAGC",
"TTTATAA" => "ATCGATCGTTTATAACGATCGAT",
"TTTATAT" => "TCGATCGATTTATATTAGCTAGC",
"TTTATAT" => "TAGCTAGCTTTATATGCTAGCTA",
"TTTATTA" => "GCTAGCTATTTATTATAGCTAGC",
"CTTGTAA" => "ATCGATCGCTTGTAACGATTAGC",
);
while(my $line = <INFILE>){
chomp $line;
open my $out, '>', "Clustered_Barcode_$..txt" or die $!;
foreach my $sequence (split /\t/, $line){
if (exists $hash{$sequence}){
print $out ">$sequence\n$hash{$sequence}\n";
}
}
}
The input file, "Clustered_Barcodes.txt" when opened, looks like the following:
TTTATGC TTTATGG TTTATCC TTTATCG
TTTATAA TTTATAA TTTATAT TTTATAT TTTATTA
CTTGTAA
There will be three output files from the code, "Clustered_Barcode_1.txt", "Clustered_Barcode_2.txt", and "Clustered_Barcode_3.txt". An example of what the output files would look like could be the 3rd and final file, which would look like the following:
>CTTGTAA
ATCGATCGCTTGTAACGATTAGC
I need some way to modify my code to identify the number of rows, carrots, or sequences that appear in the file and work that into the title of the file. The new title for the above sequence could be something like "Clustered_Barcode_Number_3_1_Sequence.txt"
PS- I made the hash in the above code manually in attempt to make things simpler. If you want to see the original code, here it is. The input file format is something like:
>TAGCTAGC
GCTAAGCGATGCTACGGCTATTAGCTAGCCGGTA
Here is the code for setting up the hash:
my $dir = ("~/Documents/Sequences");
open(INFILE, "<", "~/Documents/Clustered_Barcodes.txt") or die $!;
my %hash = ();
my #ArrayofFiles = glob "$dir/*"; #put all files from the specified directory into an array
#print join("\n", #ArrayofFiles), "\n"; #this is a diagnostic test print statement
foreach my $file (#ArrayofFiles){ #make hash of barcodes and sequences
open (my $sequence, $file) or die "can't open file: $!";
while (my $line = <$sequence>) {
if ($line !~/^>/){
my $seq = $line;
$seq =~ s/\R//g;
#print $seq;
$seq =~ m/(CATCAT|TACTAC)([TAGC]{16})([TAGC]+)([TAGC]{16})(CATCAT|TACTAC)/;
$hash{$2} = $3;
}
}
}
while(<INFILE>){
etc
You can use regex to get the count:
my $delimiter = "\t";
my $line = "zyz pqr abc xyz";
my $count = () = $line =~ /$delimiter/g; # $count is now 3
print $count;
Your hash structure is not right for your problem as you have multiple entries for same ids. for example TTTATAA hash id has 2 entries in your %hash.
To solve this, use hash of array to create the hash.
Change your hash creation code in
$hash{$2} = $3;
to
push(#{$hash{$2}}, $3);
Now change your code in the while loop
while(my $line = <INFILE>){
chomp $line;
open my $out, '>', "Clustered_Barcode_$..txt" or die $!;
my %id_list;
foreach my $sequence (split /\t/, $line){
$id_list{$sequence}=1;
}
foreach my $sequence(keys %id_list)
{
foreach my $val (#{$hash{$sequence}})
{
print $out ">$sequence\n$val\n";
}
}
}
I have assummed that;
The first digit in the output file name is the input file line number
The second digit in the output file name is the input file column number
That the input hash is a hash of arrays to cover the case of several sequences "matching" the one barcode as mentioned in the comments
When a barcode has a match in the hash, that the output file will lists all the sequences in the array, one per line.
The simplest way to do this that I can see is to build the output file using a temporary filename and the rename it when you have all the data. According to the perl cookbook, the easiest way to create temporary files is with the module File::Temp.
The key to this solution is to move through the list of barcodes that appear on a line by column index rather than the usual perl way of simply iterating over the list itself. To get the actual barcodes, the column number $col is used to index back into #barcodes which is created by splitting the line on whitespace. (Note that splitting on a single space is special cased by perl to emulate the behaviour of one of its predecessors, awk (leading whitespace is removed and the split is on whitespace, not a single space)).
This way we have the column number (indexed from 1) and the line number we can get from the perl special variable, $. We can then use these to rename the file using the builtin, rename().
use warnings;
use strict;
use diagnostics;
use File::Temp qw(tempfile);
open(INFILE, "<", "Clustered_Barcodes.txt") or die $!;
my %hash = (
"TTTATGC" => [ "TATAGCGCTTTATGCTAGCTAGC" ],
"TTTATGG" => [ "TAGCTAGCTTTATGGGCTAGCTA" ],
"TTTATCC" => [ "GCTAGCTATTTATCCGCTAGCTA" ],
"TTTATCG" => [ "AGTCATGCTTTATCGCGATCGAT" ],
"TTTATAA" => [ "TAGCTAGCTTTATAATAGCTAGC", "ATCGATCGTTTATAACGATCGAT" ],
"TTTATAT" => [ "TCGATCGATTTATATTAGCTAGC", "TAGCTAGCTTTATATGCTAGCTA" ],
"TTTATTA" => [ "GCTAGCTATTTATTATAGCTAGC" ],
"CTTGTAA" => [ "ATCGATCGCTTGTAACGATTAGC" ]
);
my $cbn = "Clustered_Barcode_Number";
my $trailer = "Sequence.txt";
while (my $line = <INFILE>) {
chomp $line ;
my $line_num = $. ;
my #barcodes = split " ", $line ;
for my $col ( 1 .. #barcodes ) {
my $barcode = $barcodes[ $col - 1 ]; # arrays indexed from 0
# skip this one if its not in the hash
next unless exists $hash{$barcode} ;
my #sequences = #{ $hash{$barcode} } ;
# Have a hit - create temp file and output sequences
my ($out, $temp_filename) = tempfile();
say $out ">$barcode" ;
say $out $_ for (#sequences) ;
close $out ;
# Rename based on input line and column
my $new_name = join "_", $cbn, $line_num, $col, $trailer ;
rename ($temp_filename, $new_name) or
warn "Couldn't rename $temp_filename to $new_name: $!\n" ;
}
}
close INFILE
All of the barcodes in your sample input data have a match in the hash, so when I run this, I get 4 files for line 1, 5 for line 2 and 1 for line 3.
Clustered_Barcode_Number_1_1_Sequence.txt
Clustered_Barcode_Number_1_2_Sequence.txt
Clustered_Barcode_Number_1_3_Sequence.txt
Clustered_Barcode_Number_1_4_Sequence.txt
Clustered_Barcode_Number_2_1_Sequence.txt
Clustered_Barcode_Number_2_2_Sequence.txt
Clustered_Barcode_Number_2_3_Sequence.txt
Clustered_Barcode_Number_2_4_Sequence.txt
Clustered_Barcode_Number_2_5_Sequence.txt
Clustered_Barcode_Number_3_1_Sequence.txt
Clustered_Barcode_Number_1_2_Sequence.txt for example has:
>TTTATGG
TAGCTAGCTTTATGGGCTAGCTA
and Clustered_Barcode_Number_2_5_Sequence.txt has:
>TTTATTA
GCTAGCTATTTATTATAGCTAGC
Clustered_Barcode_Number_2_3_Sequence.txt - which matched a hash key with two sequences - had the following;
>TTTATAT
TCGATCGATTTATATTAGCTAGC
TAGCTAGCTTTATATGCTAGCTA
I was speculating here about what you wanted when a supplied barcode had two matches. Hope that helps.

Use CSV_XS to read in a .CSV file and select specified columns

I would like to read in a .csv file using CSV_XS then select columns from with by header to match what is stored in an array outputting a new .csv
use strict;
use warnings;
use Text::CSV_XS;
my $csvparser = Text::CSV_XS->new () or die "".Text::CSV_XS->error_diag();
my $file;
my #headers;
foreach $file (#args){
my #CSVFILE;
my $csvparser = Text::CSV_XS->new () or die "".Text::CSV_XS->error_diag();
for my $line (#csvfileIN) {
$csvparser->parse($line);
my #fields = $csvparser->fields;
$line = $csvparser->combine(#fields);
}
}
The following example, just parse a CSV file to a variable, then you can match, remove, add lines to that variable, and write back the variable to the same CSV file.
In this example i just remove one entry line from the CSV.
First, i would just parse the CSV file.
use Text::CSV_XS qw( csv );
$parsed_file_array_of_hashesv = csv(
in => "$input_csv_filename",
sep => ';',
headers => "auto"
); # as array of hash
Second, once you have the $parsed_file_array_of_hashesv, now you can loop that array in perl and detect the line you want to remove from the array.
and then remove it using
splice ARRAY, OFFSET, LENGTH
removes anything from the OFFSET index through the index OFFSET+LENGT
lets assume index 0
my #extracted_array = #$parsed_file_array_of_hashesv; #dereference hashes reference
splice #extracted_array, 0, 1;#remove entry 0
$ref_removed_line_parsed = \#extracted_array; #referece to array
Third, write back the array to the CSV file
$current_metric_file = csv(
in => $ref_removed_line_parsed, #only accepts referece
out => "$output_csv_filename",
sep => ';',
eol => "\n", # \r, \n, or \r\n or undef
#headers => \#sorted_column_names, #only accepts referece
headers => "auto"
);
Notice, that if you use the \#sorted_column_names you will be able to control the order of the columns
my #sorted_column_names;
foreach my $name (sort {lc $a cmp lc $b} keys %{ $parsed_file_array_of_hashesv->[0] }) { #all hashes have the same column names so we choose the first one
push(#sorted_column_names,$name);
}
That should write the CSV file without your line.
use open ":std", ":encoding(UTF-8)";
use Text::CSV_XS qw( );
# Name of columns to copy to new file.
my #col_names_out = qw( ... );
my $csv = Text::CSV_XS->new({ auto_diag => 2, binary => 1 });
for (...) {
my $qfn_in = ...;
my $qfn_out = ...;
open(my $fh_in, "<", $qfn_in)
or die("Can't open \"$qfn_in\": $!\n");
open(my $fh_out, "<", $qfn_out)
or die("Can't create \"$qfn_out\": $!\n");
$csv->column_names(#{ $csv->getline($fh_in) });
$csv->say($fh_out, \#col_names_out);
while (my $row = $csv->getline_hr($fh_in)) {
$csv->say($fh_out, [ #$row{#col_names_out} ]);
}
}

Perl list all keys in hash with identical values

If I have a colon-delimited file name FILE and I do:
cat FILE|perl -F: -lane 'my %hash = (); $hash{#F[0]} = #F[2]'
to assign the first and 3rd tokens as the key => value pairs for the hash..
1) Is that a sane way to assign key value pairs to a hash?
2) What is the simplest way to now find all keys with shared values and list them?
Assume FILE looks like:
Mike:34:Apple:Male
Don:23:Corn:Male
Jared:12:Apple:Male
Beth:56:Maize:Female
Sam:34:Apple:Male
David:34:Apple:Male
Desired Output: Keys with value "Apple": Mike,Jared,David,Sam
Your example won't work as you want because the -n option puts a while loop around your one-line program, so the hash you declare is created and destoyed for every record in the file. You could get around that by not declaring the hash, and so making it a persistent package variable which will retain all values stored in it.
You can then write push #{ $hash{$F[2]} }, $F[0] but notice that it should be $F[0] etc. and not #F[0], and I have used push to create a list of column 1 values for each column 3 value instead of just a list of one-to-one values relating each column 1 value with its column 3 value.
To clarify, your method produces a hash looking like this, which has to be searched to produce the display that you want.
(
Beth => "Maize",
David => "Apple",
Don => "Corn",
Jared => "Apple",
Mike => "Apple",
Sam => "Apple",
)
while mine creates this, which as you can see is pretty much already in the form you want.
(
Apple => ["Mike", "Jared", "Sam", "David"],
Corn => ["Don"],
Maize => ["Beth"],
)
But I think this problem is a bit too big to be solved with a one-line Perl program. The solution below expects the path to the input file as a command-line parameter, like this
> perl prog.pl colons.csv
but it will default to myfile.csv if no file is specified.
use strict;
use warnings;
our #ARGV = 'myfile.csv' unless #ARGV;
my %data;
while (<>) {
my #fields = split /:/;
push #{ $data{$fields[2]} }, $fields[0];
}
while (my ($k, $v) = each %data) {
next unless #$v > 1;
printf qq{Keys with value "%s": %s\n}, $k, join ', ', #$v;
}
output
Keys with value "Apple": Mike, Jared, Sam, David
use strict;
use warnings;
open my $in, '<', 'in.txt';
my %data;
while(<$in>){
chomp;
my #split = split/:/;
$data{$split[0]} = $split[2];
}
my $query = 'Apple';
print "Keys with value $query = ";
foreach my $name (keys %data){
print "$name " if $data{$name} eq $query;
}
print "\n";
Arrays are used to hold list of values, so use an array.
perl -F: -lane'
push #{ $h{$F[2]} }, $F[0];
END {
for my $fruit (keys %h) {
next if #{ $h{$fruit} } < 2;
print "$fruit: ", join(",", #{ $h{$fruit} });
}
}
' FILE
The END block is executed on exit. In it, we iterate over the keys of the hash. If the value of the current hash element is an array with only one element, it's skipped. Otherwise, we prints the key followed by contents of the array referenced by the hash element.
Here is another way:
perl -F: -lane'
push #{ $h{$F[2]} }, $F[0];
}{
print "$_: ", join(",", #{ $h{$_} }) for grep { #{$h{$_}} > 1 } keys %h;
' file
We read each line and create hash of arrays using third column as key and first column as list of values for matching key. In the END block we iterate over our hash using grep and filter keys whose array count greater than 1 and print the key followed by array elements.
It doesn't have to be a one liner,
Good. It's not going to be...
Is that a sane way to assign key value pairs to a hash?
You're simply assigning the key value pairs as:
$hash{"key"} = "value";
Which is about as simple as it gets. There might be a way of doing it via map. However, the main issue I see is what should happen if you have duplicate keys.
Let's say your file looks like this:
Mike:34:Apple:Male
Don:23:Corn:Male
Jared:12:Apple:Male
Beth:56:Maize:Female
Sam:34:Apple:Male
David:34:Apple:Male # Note this entry is here twice!
David:35:Wheat:Male # Note this entry is here twice!
Let's do a simple assignment loop:
my %hash;
while my $line ( <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name} = $category;
}
When you get to $hash{David}, it will first be set to Apple, but then you change the value to Wheat. There are four ways you can handle this:
Use whatever the last value is. No change in the loop.
Use the first value and ignore subsequent values. Simple enough to do.
If that happens, it's an error. Abort the program and report the error.
Keep all values.
This last one is the most interesting because it involves a reference to an array as the values for your hash:
my %hash;
while my $line ( <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name} = [] if not exists $hash{$name}; # I'm making this an array reference
push #{ $hash{$name} }, $category;
}
Now, each value in my hash is a reference to an array:
my #values = #{ $hash{David} ); # The values of David...
print "David is in categories " . join ( ", ", #values ) . "\n";
This will print out David is in categories Wheat, Apple
What is the simplest way to now find all keys with shared values and list them?
The easiest way is to create a second hash that's keyed by your value. In this hash, you will need to use an array reference. Let's assume no duplicate names for now:
my %hash;
my %indexed_hash;
while my $line ( <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name} = $category;
my $indexed_hash{$category} = [] if not exist $indexed_hash{$category};
push #{ $indexed_hash{$category} }, $name;
}
Now, if I want to find all the duplicates of Apple:
my #names = #{ $indexed_hash{Apple} };
print "The following are in 'Apple': " . join ( ", " #names ) . "\n";
Since we're getting into references, we could take things a step further and store all of your values of your file in your hash. Again, for simplicity, I am assuming that you will have one and only one entry per name:
my %hash;
while my $line ( <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name}->{AGE} = $age;
$hash{$name}->{CATEGORY} = $category;
$hash{$name}->{SEX} = $sex;
}
for my $name ( sort keys %hash ) {
print "$name Information:\n";
print " Age: " . $hash{$name}->{AGE} . "\n";
printf "Category: %s\n", $hash{$name}->{CATEGORY};
print " Sex: #{[$hash{$name}->{SEX}]}\n\n";
}
That last two statements are easier ways of interpolating complex data structures into a string. The printf is fairly clear. The second #{[...]} is a neat little trick.
What have you tried?
If you reverse the hash into a list of value => key pairs then use List::Util's pairs() against the list, you can transform the hash into a hash of values => key arrayrefs. i.e. ( foo => [ 'bar', 'baz' ] ), grep {#{$hash{$_}} > 1} keys %hash, and print the results.

Build hash of hash in perl

I'm new to using perl and I'm trying to build a hash of a hash from a tsv. My current process is to read in a file and construct a hash and then insert it into another hash.
my %hoh = ();
while (my $line = <$tsv>)
{
chomp $line;
my %hash;
my #data = split "\t", $line;
my $id;
my $iter = each_array(#columns, #data);
while(my($k, $v) = $iter->())
{
$hash{$k} = $v;
if($k eq 'Id')
{
$id = $v;
}
}
$hoh{$id} = %hash;
}
print "dump: ", Dumper(%hoh);
This outputs:
dump
$VAR1 = '1234567890';
$VAR2 = '17/32';
$VAR3 = '1234567891';
$VAR4 = '17/32';
.....
Instead of what I would expect:
dump
{
'1234567890' => {
'k1' => 'v1',
'k2' => 'v2',
'k3' => 'v3',
'k4' => 'v4',
'id' => '1234567890'
},
'1234567891' => {
'k1' => 'v1',
'k2' => 'v2',
'k3' => 'v3',
'k4' => 'v4',
'id' => '1234567891'
},
........
};
My limited understanding is that when I do $hoh{$id} = %hash; its inserting in a reference to %hash? What am I doing wrong? Also is there a more succint way to use my columns and data array's as key,value pairs into my %hash object?
-Thanks in advance,
Niru
To get a reference, you have to use \:
$hoh{$id} = \%hash;
%hash is the hash, not the reference to it. In scalar context, it returns the string X/Y wre X is the number of used buckets and Y the number of all the buckets in the hash (i.e. nothing useful).
To get a reference to a hash variable, you need to use \%hash (as choroba said).
A more succinct way to assign values to columns is to assign to a hash slice, like this:
my %hoh = ();
while (my $line = <$tsv>)
{
chomp $line;
my %hash;
#hash{#columns} = split "\t", $line;
$hoh{$hash{Id}} = \%hash;
}
print "dump: ", Dumper(\%hoh);
A hash slice (#hash{#columns}) means essentially the same thing as ($hash{$columns[0]}, $hash{$columns[1]}, $hash{$columns[2]}, ...) up to however many columns you have. By assigning to it, I'm assigning the first value from split to $hash{$columns[0]}, the second value to $hash{$columns[1]}, and so on. It does exactly the same thing as your while ... $iter loop, just without the explicit loop (and it doesn't extract the $id).
There's no need to compare each $k to 'Id' inside a loop; just store it in the hash as a normal field and extract it afterwards with $hash{Id}. (Aside: Is your column header Id or id? You use Id in your loop, but id in your expected output.)
If you don't want to keep the Id field in the individual entries, you could use delete (which removes the key from the hash and returns the value):
$hoh{delete $hash{Id}} = \%hash;
Take a look at the documentation included in Perl. The command perldoc is very helpful. You can also look at the Perldoc webpage too.
One of the tutorials is a tutorial on Perl references. It all help clarify a lot of your questions and explain about referencing and dereferencing.
I also recommend that you look at CPAN. This is an archive of various Perl modules that can do many various tasks. Look at Text::CSV. This module will do exactly what you want, and even though it says "CSV", it works with tab separated files too.
You missed putting a slash in front of your hash you're trying to make a reference. You have:
$hoh{$id} = %hash;
Probably want:
$hoh{$id} = \%hash;
also, when you do a Data::Dumper of a hash, you should do it on a reference to a hash. Internally, hashes and arrays have similar structures when a Data::Dumper dump is done.
You have:
print "dump: ", Dumper(%hoh);
You should have:
print "dump: ", Dumper( \%hoh );
My attempt at the program:
#! /usr/bin/env perl
#
use warnings;
use strict;
use autodie;
use feature qw(say);
use Data::Dumper;
use constant {
FILE => "test.txt",
};
open my $fh, "<", FILE;
#
# First line with headers
#
my $line = <$fh>;
chomp $line;
my #headers = split /\t/, $line;
my %hash_of_hashes;
#
# Rest of file
#
while ( my $line = <$fh> ) {
chomp $line;
my %line_hash;
my #values = split /\t/, $line;
for my $index ( ( 0..$#values ) ) {
$line_hash{ $headers[$index] } = $values[ $index ];
}
$hash_of_hashes{ $line_hash{id} } = \%line_hash;
}
say Dumper \%hash_of_hashes;
You should only store a reference to a variable if you do so in the last line before the variable goes go of scope. In your script, you declare %hash inside the while loop, so placing this statement as the last in the loop is safe:
$hoh{$id} = \%hash;
If it's not the last statement (or you're not sure it's safe), create an anonymous structure to hold the contents of the variable:
$hoh{$id} = { %hash };
This makes a copy of %hash, which is slower, but any subsequent changes to it will not effect what you stored.

Extraction and printing of key-value pair from a text file using Perl

I have a text file temp.txt which contains entries like,
cinterim=3534
cstart=517
cstop=622
ointerim=47
ostart=19
ostop=20
Note: key-value pairs may be arranged in new line or all at once in one line separated by space.
I am trying to print and store these values in DB for corresponding keys using Perl. But I am getting many errors and warnings. Right now I am just trying to print those values.
use strict;
use warnings;
open(FILE,"/root/temp.txt") or die "Unable to open file:$!\n";
while (my $line = <FILE>) {
# optional whitespace, KEY, optional whitespace, required ':',
# optional whitespace, VALUE, required whitespace, required '.'
$line =~ m/^\s*(\S+)\s*:\s*(.*)\s+\./;
my #pairs = split(/\s+/,$line);
my %hash = map { split(/=/, $_, 2) } #pairs;
printf "%s,%s,%s\n", $hash{cinterim}, $hash{cstart}, $hash{cstop};
}
close(FILE);
Could somebody provide help to refine my program.
use strict;
use warnings;
open my $fh, '<', '/root/temp.txt' or die "Unable to open file:$!\n";
my %hash = map { split /=|\s+/; } <$fh>;
close $fh;
print "$_ => $hash{$_}\n" for keys %hash;
What this code does:
<$fh> reads a line from our file, or in list context, all lines and returns them as an array.
Inside map we split our line into an array using the regexp /= | \s+/x. This means: split when you see a = or a sequence of whitespace characters. This is just a condensed and beautified form of your original code.
Then, we cast the list resulting from map to the hash type. We can do that because the item count of the list is even. (Input like key key=value or key=value=valuewill throw an error at this point).
After that, we print the hash out. In Perl, we can interpolate hash values inside strings directly and don't have to use printf and friends except for special formatting.
The for loop iterates over all keys (returned in the $_ special variable), and $hash{$_} is the corresponding value. This could also have been written as
while (my ($key, $val) = each %hash) {
print "$key => $val\n";
}
where each iterates over all key-value pairs.
Try this
use warnings;
my %data = ();
open FILE, '<', 'file1.txt' or die $!;
while(<FILE>)
{
chomp;
$data{$1} = $2 while /\s*(\S+)=(\S+)/g;
}
close FILE;
print $_, '-', $data{$_}, $/ for keys %data;
The simplest way is to slurp the entire file into memory and assign key/value pairs to the hash using a regular expression.
This program shows the technique
use strict;
use warnings;
my %data = do {
open my $fh, '<', '/root/temp.txt' or die $!;
local $/;
<$fh> =~ /(\w+)\s*=\s*(\w+)/g;
};
use Data::Dump;
dd \%data;
output
{
cinterim => 3534,
cstart => 517,
cstop => 622,
ointerim => 47,
ostart => 19,
ostop => 20,
}