If I have a colon-delimited file named FILE and I do:
cat FILE|perl -F: -lane 'my %hash = (); $hash{@F[0]} = @F[2]'
to assign the first and third tokens as the key => value pairs for the hash...
1) Is that a sane way to assign key value pairs to a hash?
2) What is the simplest way to now find all keys with shared values and list them?
Assume FILE looks like:
Mike:34:Apple:Male
Don:23:Corn:Male
Jared:12:Apple:Male
Beth:56:Maize:Female
Sam:34:Apple:Male
David:34:Apple:Male
Desired Output: Keys with value "Apple": Mike,Jared,David,Sam
Your example won't work as you want because the -n option puts a while loop around your one-line program, so the hash you declare is created and destroyed for every record in the file. You can get around that by not declaring the hash with my, which makes it a persistent package variable that retains all the values stored in it.
You can then write push @{ $hash{$F[2]} }, $F[0]. Notice that it should be $F[0] etc. and not @F[0], and that I have used push to build a list of column 1 values for each column 3 value, instead of a one-to-one mapping from each column 1 value to its column 3 value.
To clarify, your method produces a hash looking like this, which has to be searched to produce the display that you want.
(
Beth => "Maize",
David => "Apple",
Don => "Corn",
Jared => "Apple",
Mike => "Apple",
Sam => "Apple",
)
while mine creates this, which as you can see is pretty much already in the form you want.
(
Apple => ["Mike", "Jared", "Sam", "David"],
Corn => ["Don"],
Maize => ["Beth"],
)
But I think this problem is a bit too big to be solved with a one-line Perl program. The solution below expects the path to the input file as a command-line parameter, like this
> perl prog.pl colons.csv
but it will default to myfile.csv if no file is specified.
use strict;
use warnings;
our @ARGV = 'myfile.csv' unless @ARGV;
my %data;
while (<>) {
my @fields = split /:/;
push @{ $data{$fields[2]} }, $fields[0];
}
while (my ($k, $v) = each %data) {
next unless @$v > 1;
printf qq{Keys with value "%s": %s\n}, $k, join ', ', @$v;
}
output
Keys with value "Apple": Mike, Jared, Sam, David
use strict;
use warnings;
open my $in, '<', 'in.txt' or die "Cannot open in.txt: $!";
my %data;
while(<$in>){
chomp;
my @split = split /:/;
$data{$split[0]} = $split[2];
}
my $query = 'Apple';
print "Keys with value $query = ";
foreach my $name (keys %data){
print "$name " if $data{$name} eq $query;
}
print "\n";
Arrays are used to hold lists of values, so use an array.
perl -F: -lane'
push @{ $h{$F[2]} }, $F[0];
END {
for my $fruit (keys %h) {
next if @{ $h{$fruit} } < 2;
print "$fruit: ", join(",", @{ $h{$fruit} });
}
}
' FILE
The END block is executed on exit. In it, we iterate over the keys of the hash. If the value of the current hash element is an array with only one element, it's skipped. Otherwise, we print the key followed by the contents of the array referenced by the hash element.
Here is another way:
perl -F: -lane'
push @{ $h{$F[2]} }, $F[0];
}{
print "$_: ", join(",", @{ $h{$_} }) for grep { @{$h{$_}} > 1 } keys %h;
' file
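We read each line and build a hash of arrays, using the third column as the key and collecting the first-column values for that key in a list. The }{ closes the implicit while loop that -n adds, so the final statement runs once after all input has been read, much like an END block. There we use grep to keep only the keys whose array holds more than one element, and print each key followed by its array elements.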
It doesn't have to be a one liner,
Good. It's not going to be...
Is that a sane way to assign key value pairs to a hash?
You're simply assigning the key value pairs as:
$hash{"key"} = "value";
Which is about as simple as it gets. There might be a way of doing it via map. However, the main issue I see is what should happen if you have duplicate keys.
Let's say your file looks like this:
Mike:34:Apple:Male
Don:23:Corn:Male
Jared:12:Apple:Male
Beth:56:Maize:Female
Sam:34:Apple:Male
David:34:Apple:Male # Note this entry is here twice!
David:35:Wheat:Male # Note this entry is here twice!
Let's do a simple assignment loop:
my %hash;
while ( my $line = <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name} = $category;
}
When you get to $hash{David}, it will first be set to Apple, but then you change the value to Wheat. There are four ways you can handle this:
Use whatever the last value is. No change in the loop.
Use the first value and ignore subsequent values. Simple enough to do.
If that happens, it's an error. Abort the program and report the error.
Keep all values.
This last one is the most interesting because it involves a reference to an array as the values for your hash:
my %hash;
while ( my $line = <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name} = [] if not exists $hash{$name}; # I'm making this an array reference
push @{ $hash{$name} }, $category;
}
Now, each value in my hash is a reference to an array:
my @values = @{ $hash{David} }; # The values of David...
print "David is in categories " . join ( ", ", @values ) . "\n";
This will print out David is in categories Apple, Wheat
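The "keep the first value" and "abort on duplicate" strategies from the list above could be sketched as follows. This is only an illustration: the sample records are made up, and load_strict is a hypothetical helper name, not something from the question.

```perl
use strict;
use warnings;

# Hypothetical sample records; David appears twice on purpose.
my @lines = (
    'Mike:34:Apple:Male',
    'David:34:Apple:Male',
    'David:35:Wheat:Male',
);

# Strategy 2: keep the first value seen for each name.
my %first;
for my $line (@lines) {
    my ($name, $age, $category, $sex) = split /:/, $line;
    $first{$name} = $category unless exists $first{$name};
}

# Strategy 3: treat a repeated name as an error.
# load_strict is a hypothetical helper, named here for illustration only.
sub load_strict {
    my %seen;
    for my $line (@_) {
        my ($name, $age, $category, $sex) = split /:/, $line;
        die "Duplicate entry for $name\n" if exists $seen{$name};
        $seen{$name} = $category;
    }
    return %seen;
}

print "First value for David: $first{David}\n";
my %strict = eval { load_strict(@lines) };
print "Strict load failed: $@" if $@;
```

The eval is only there so the demonstration can report the error instead of stopping; a real program would probably let the die propagate.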
What is the simplest way to now find all keys with shared values and list them?
The easiest way is to create a second hash that's keyed by your value. In this hash, you will need to use an array reference. Let's assume no duplicate names for now:
my %hash;
my %indexed_hash;
while ( my $line = <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name} = $category;
$indexed_hash{$category} = [] if not exists $indexed_hash{$category};
push @{ $indexed_hash{$category} }, $name;
}
Now, if I want to find all the duplicates of Apple:
my @names = @{ $indexed_hash{Apple} };
print "The following are in 'Apple': " . join ( ", ", @names ) . "\n";
Since we're getting into references, we could take things a step further and store all of your values of your file in your hash. Again, for simplicity, I am assuming that you will have one and only one entry per name:
my %hash;
while ( my $line = <$fh> ) {
chomp $line;
my ($name, $age, $category, $sex) = split /:/, $line;
$hash{$name}->{AGE} = $age;
$hash{$name}->{CATEGORY} = $category;
$hash{$name}->{SEX} = $sex;
}
for my $name ( sort keys %hash ) {
print "$name Information:\n";
print " Age: " . $hash{$name}->{AGE} . "\n";
printf "Category: %s\n", $hash{$name}->{CATEGORY};
print " Sex: @{[$hash{$name}->{SEX}]}\n\n";
}
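The last two statements are easier ways of interpolating complex data structures into a string. The printf is fairly clear. The second, @{[...]}, is a neat little trick: it builds an anonymous array from the expression and immediately dereferences it inside the string.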
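As a small sketch of where the @{[ ... ]} wrapper earns its keep (the data here is made up for the example): a nested lookup interpolates on its own, while an arbitrary expression such as arithmetic does not.

```perl
use strict;
use warnings;

# Hypothetical record, shaped like the hash built above.
my %hash = ( Mike => { AGE => 34, SEX => 'Male' } );

# A nested lookup interpolates directly inside double quotes.
my $plain = "Age: $hash{Mike}->{AGE}";

# An arbitrary expression (here, arithmetic) needs the @{[ ... ]} wrapper.
my $tricky = "Age next year: @{[ $hash{Mike}->{AGE} + 1 ]}";

print "$plain\n$tricky\n";
```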
What have you tried?
If you reverse the hash into a list of value => key pairs and then run List::Util's pairs() over that list, you can transform the hash into a hash of value => key arrayrefs, i.e. ( foo => [ 'bar', 'baz' ] ). Then grep { @{$hash{$_}} > 1 } keys %hash and print the results.
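A minimal sketch of that inversion, assuming the one-to-one hash from the question has already been built. It uses a plain loop rather than List::Util's pairs(), and %by_value is a name introduced here for illustration.

```perl
use strict;
use warnings;

# The name => category hash from the question's sample data.
my %hash = (
    Mike => 'Apple', Don  => 'Corn',  Jared => 'Apple',
    Beth => 'Maize', Sam  => 'Apple', David => 'Apple',
);

# Invert name => category into category => [names].
my %by_value;
for my $name (sort keys %hash) {
    push @{ $by_value{ $hash{$name} } }, $name;
}

# Report only the values shared by more than one key.
for my $value (grep { @{ $by_value{$_} } > 1 } sort keys %by_value) {
    print qq{Keys with value "$value": }, join(',', @{ $by_value{$value} }), "\n";
}
```

Sorting the keys is only there to make the output order stable; the names come out alphabetically, not in file order.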
I have data in an array as below. I want to copy all of the content into a single variable. How can I do this?
IFLADK
FJ
FAILED
FNKS
FKJ
FAILED
You could assign a reference to the array
my $scalar = \@array;
… or join all the strings in the array together
my $scalar = join "\n", @array;
With reference to previous question How to read n lines above the matched string in perl? Storing multiple hits in an array:
while (<$fh>) {
push @array, $_;
shift @array if @array > 4;
if (/script/) {
print @array;
push @found, join "", @array; # <----- this line
}
}
You could just use a scalar, e.g. $found = join "", @array, but then you would only store the last match in the loop.
Suppose the loop is finished, and now you have all the matches in the array @found. If you want them in a scalar, just join again:
my $found = join "", @found;
Or you can just add them all at once in the loop:
$found .= join "", @array;
It all depends on what you intend to do with the data. Having the data in a scalar is rarely more beneficial than having it in an array. For example, if you are going to print it, there is no difference, as print $found is equivalent to print @found, because print takes a list of arguments.
If your intent is to interpolate the matches into a string:
print "Found matches: $found";
print "Found matches: ", @found;
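One difference worth seeing in a sketch (the sample strings are made up): passing the array to print as a list emits the elements back to back, while interpolating the array inside double quotes joins the elements with the list separator $", which is a single space by default.

```perl
use strict;
use warnings;

# Hypothetical matches, each already ending in a newline.
my @found = ("first\n", "second\n");

# What print @found emits: the elements back to back.
my $as_list = join '', @found;

# Interpolating an array joins its elements with $" (a space by default).
my $interpolated = "@found";

print "Found matches: ", @found;
print "Found matches: @found";
```

With newline-terminated elements, the interpolated form therefore puts a space at the start of every line after the first.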
$whole = join(' ', @lines);
But if you're reading the text from a file, it's easier to just read it all in one chunk, by (locally) undefining the record delimiter:
local $/ = undef;
$whole = <FILE>;
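A minimal sketch of that slurp idiom with the change to $/ confined to a block, so the rest of the program still reads line by line. The in-memory filehandle is an assumption made to keep the example self-contained; a real program would open an actual file.

```perl
use strict;
use warnings;

# Stand-in for a real file: an in-memory filehandle (assumption for the demo).
my $text = "IFLADK\nFJ\nFAILED\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my $whole = do {
    local $/;    # record separator is undef only inside this block
    <$fh>;       # a single read now returns the entire content
};
close $fh;

print length($whole), " characters read\n";
```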
Depends on what you are trying to do, but if you are wanting to package up an array into a scalar so that it can be retrieved later, then you might want Storable.
use Storable qw(freeze thaw); # freeze and thaw are not exported by default
my @array = qw{foo bar baz};
my $stored_array = freeze \@array;
...
my @retrieved_array = @{ thaw($stored_array) };
Then again it could be that your needs may be served by just storing a reference to the array.
my @array = qw{foo bar baz};
my $stored_array = \@array;
...
my @retrieved_array = @$stored_array;
# my code as follows
use strict;
use FileHandle;
my @LISTS = ('incoming');
my $WORK ="c:\";
my $OUT ="c:\";
foreach my $list (@LISTS) {
my $INFILE = $WORK."test.dat";
my $OUTFILE = $OUT."TEST.dat";
while (<$input>) {
chomp;
my($f1,$f2,$f3,$f4,$f5,$f6,$f7) = split(/\|/);
push @sum, $f4,$f7;
}
}
while (@sum) {
my ($key,$value)= {shift @sum, shift @sum};
$hash{$key}=0;
$hash{$key} += $value;
}
while my $key (@sum) {
print $output2 sprintf("$key1\n");
# print $output2 sprintf("$key ===> $hash{$key}\n");
}
close($input);
close($output);
I get an "uninitialized value in addition (+)" error if I use the second print.
I get values like HASH(0x19a69451) if I use the first print.
Please correct me.
My output should be
unique Id ===> Total Revenue ($f4==>$f7)
This is wrong:
"c:\";
Perl reads that as a string starting with c:";\n.... In other words, it is a runaway string. You need to write the final backslash as \\ so that it is a literal \ and no longer escapes the following " character.
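A short sketch of the corrected assignment. The variable names are made up for the example, and the forward-slash variant assumes the file functions being used accept that separator.

```perl
use strict;
use warnings;

my $double = "c:\\";    # escaped backslash inside double quotes
my $single = 'c:\\';    # single quotes need the same escape before the closing quote
my $slash  = 'c:/';     # forward slash; assumes the file APIs in use accept it

print "$double $single $slash\n";
```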
You probably want to use parens instead of braces:
my ($key, $value) = (shift @sum, shift @sum);
You would get that "uninitialized value in addition (+)" warning if the @sum array has an odd number of elements.
See also perltidy.
The second while loop is never entered:
while my $key (@sum) {
because the previous one left the array @sum empty. (It is also a syntax error as written; looping over a list needs for my $key (@sum).)
You could change to:
use Data::Dumper; # needed for Dumper below
while (<$input>) {
chomp;
my @tmp = split(/\|/);
$hash{$tmp[3]} += $tmp[6];
}
print Dumper \%hash;
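A self-contained sketch of that per-key summation that prints in the "id ===> total" shape the question asks for. The records are hypothetical stand-ins for the pipe-delimited input file.

```perl
use strict;
use warnings;

# Hypothetical pipe-delimited records: field 4 is the id, field 7 the revenue.
my @records = (
    'a|b|c|ID1|e|f|10',
    'a|b|c|ID2|e|f|5',
    'a|b|c|ID1|e|f|2.5',
);

my %total;
for my $record (@records) {
    my @f = split /\|/, $record;
    $total{ $f[3] } += $f[6];    # += on a missing key starts from 0 without a warning
}

for my $id (sort keys %total) {
    print "$id ===> $total{$id}\n";
}
```

Because += accumulates directly, there is no need to reset the entry to 0 first; doing so on every record, as in the question, would discard the running total.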
I have a problem where I need to have an array as a value in an associative array.
Please look at the code below. I am trying to loop over the files in a directory, and it is quite likely that more than one file has the same ctrno, so I would like to see which files share a ctrno. The code below gives an error at "$ctrno_hash[$ctrno] = @arr;" in the else branch. The same would apply to the if branch.
Am I following the right approach or could it be done differently?
sub loop_through_files
{
$file = "@_";
open(INPFILE, "$file") or die $!;
#print "$file:$ctrno\n";
while (<INPFILE>)
{
$line .= $_;
}
if ($line =~ /$ctrno/ )
{
print "found\n";
if ( exists $ctrno_hash[$ctrno])
{
local @arr = $ctrno_hash[$ctrno];
push (@arr, $file);
$ctrno_hash[$ctrno] = @arr;
}
else
{
local @arr;
push(@arr, $file);
$ctrno_hash[$ctrno] = @arr;
}
}
}
I believe you want something like
$ctrno_hash[$ctrno] = \@arr;
This stores a reference to the array @arr.
You then refer to the previously stored array reference with
@{$ctrno_hash[$ctrno]}
That is, if $array_ref is an array reference, the construct @{ $array_ref } returns the array to which the array reference points.
Now, the construct $ctrno_hash[$ctrno] is not really a hash, but an ordinary array. In order to truly make it a hash, you need the curly brackets instead of the square brackets:
$ctrno_hash{$ctrno} = \@arr;
And similarly, you later refer to the array with
@{$ctrno_hash{$ctrno} }
Now, having said that, you can entirely forgo the if ... exists construct:
if ($line =~ /$ctrno/ )
{
print "found\n";
push @{$ctrno_hash{$ctrno}}, $file;
}
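A minimal sketch of the resulting hash of arrays, and of listing only the ctrno values shared by more than one file. The (file, ctrno) pairs are hypothetical stand-ins for the directory scan in the question.

```perl
use strict;
use warnings;

# Hypothetical (file, ctrno) pairs standing in for the directory scan.
my @matches = ( [ 'a.txt', 1001 ], [ 'b.txt', 1002 ], [ 'c.txt', 1001 ] );

my %ctrno_hash;
for my $match (@matches) {
    my ($file, $ctrno) = @$match;
    push @{ $ctrno_hash{$ctrno} }, $file;    # the array ref is created on first use
}

# List only the ctrno values that occur in more than one file.
for my $ctrno (sort keys %ctrno_hash) {
    my @files = @{ $ctrno_hash{$ctrno} };
    print "$ctrno: @files\n" if @files > 1;
}
```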
I have a csv file with following sample data.
o-option(alphabetical)
v-value(numerical)
number1,o1,v1,o2,v2,o3,v3,o4,v4,o5,v5,o6,v6
number2,o1,v11,o2,v22,o3,v33,o44,v44,o5,v55,o6,v66
and so on....
Required output.
NUM,o1,o2,o3,o4,o44,o5,o6
number1,v1,v2,v3,v4,,v5,v6
number2,v11,v22,v33,,v44,v55,v66
and so on...
In this data, all the options are same i.e. o1,o2,etc through out the file but option 4 value is changing, i.e. o4,o44, etc. In total there are about 9 different option values at o4 field. Can anyone please help me with the perl code to get the required output.
I have written the below code but still not getting the required output.
my @values;
my @options;
my %hash;
while (<STDIN>) {
chomp;
my ($srn,$o1,$v1,$o2,$v2,$o3,$v3,$o4,$v4,$o5,$v5,$o6,$v6) = split /[,\n]/, $_;
push @values, [$srn,$v1,$v2,$v3,$v4,$v5,$v6];
push @options, $o1,$o2,$o3,$o4,$o5,$o6;
}
#printing the header values
my @out = grep(!$hash{$_}++,@options);
print 'ID,', join(',', sort @out), "\n";
#printing the values.
for my $i ( 0 .. $#values) {
print @{$values[$i]}, "\n";
}
Output:
ID,o1,o2,o3,o4,o44,o5,o6
number1,v1,v2,v3,v4,v5,v6
number2,v1,v2,v3,v44,v5,v6
As from the above output, when the value 44 comes, it comes under option4 and hence the other values are shifting to left. The values are not mapping with the options. Please suggest.
If you want to line the numeric values up in columns based on the value of the preceding options values, store your data rows as hashes, using the options as the keys to the hash.
use strict;
use warnings;
my (@data, %all_opts);
while (<DATA>) {
chomp;
my %h = ('NUM', split /,/, $_);
push @data, \%h;
@all_opts{keys %h} = 1;
}
my @header = sort keys %all_opts;
print join(",", @header), "\n";
for my $d (@data){
my @vals = map { defined $d->{$_} ? $d->{$_} : '' } @header;
print join(",", @vals), "\n";
}
__DATA__
number1,o1,v1,o2,v2,o3,v3,o4,v4,o5,v5,o6,v6
number2,o1,v11,o2,v22,o3,v33,o44,v44,o5,v55,o6,v66
Is this what you're after?
use strict;
use warnings;
use 5.010;
my %header;
my @store;
while (<DATA>) {
chomp;
my ($srn, %f) = split /,/;
@header{ keys %f } = 1;
push @store, [ $srn, { %f } ];
}
# header
my @cols = sort keys %header;
say join q{,} => 'NUM', @cols;
# rows
for my $row (@store) {
say join q{,} => $row->[0],
map { $row->[1]->{ $_ } // q{} } @cols; # // keeps a literal 0 value
}
__DATA__
number1,o1,v1,o2,v2,o3,v3,o4,v4,o5,v5,o6,v6
number2,o1,v11,o2,v22,o3,v33,o44,v44,o5,v55,o6,v66
Which outputs:
NUM,o1,o2,o3,o4,o44,o5,o6
number1,v1,v2,v3,v4,,v5,v6
number2,v11,v22,v33,,v44,v55,v66
Make one pass through the file identifying all the different option values, build an array of those values.
Make second pass through the file:
for each record
initialise an associative array from your list of option value
parse the assigning values for the options you have
use your list of option values to iterate the associative array printing the values
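The two-pass outline above could be sketched like this. To stay self-contained it keeps the sample lines in an array and loops over it twice; with a real file you would read it twice or rewind the handle instead. The variable names are made up for the example.

```perl
use strict;
use warnings;

# Sample lines from the question.
my @lines = (
    'number1,o1,v1,o2,v2,o3,v3,o4,v4,o5,v5,o6,v6',
    'number2,o1,v11,o2,v22,o3,v33,o44,v44,o5,v55,o6,v66',
);

# Pass 1: collect every option name that appears anywhere.
my %seen;
for my $line (@lines) {
    my ($num, %pairs) = split /,/, $line;
    $seen{$_} = 1 for keys %pairs;
}
my @options = sort keys %seen;

# Pass 2: start each record with empty values, then fill in what it has.
my @out = ( join ',', 'NUM', @options );
for my $line (@lines) {
    my ($num, %pairs) = split /,/, $line;
    my %row = map { $_ => '' } @options;
    @row{ keys %pairs } = values %pairs;
    push @out, join ',', $num, @row{@options};
}

print "$_\n" for @out;
```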
You might look at the CPAN module DBD::AnyData. One of the neater modules out there. It allows you to manipulate a CSV file like it was a database. And much more.