Create an array for key value in hash (Perl) - perl

I am having a very small difficulty in Perl.
I am reading a text file to extract some information. As I read it, I pick out certain keys from the text; as I read further, I want to save an array of values for each key.
For example:
Alpha 1
Something: 2132
Something: 2134
Alpha 2
Something: 2132
Something: 2134
I read the file into an array called lines:
my $h;
my $alpha;
for my $line (@lines){
if ($line =~ m/Alpha (\d+)/){
$alpha = $1;
$h->{$alpha} = (); # create empty array for key?
}
elsif ($line =~ m/Something: (\d+)/){
push($h->{$alpha}, $1);
}
}
Apparently, it gives me an error:
Type of arg 1 to push must be array (not hash element) at test.pl line 28, near "$1)"
Execution of test.pl aborted due to compilation errors.
Unable to figure this out.

A hash value can hold only a scalar. If you want to store an array under a key, you need to store an array reference.
You can do something like this:
for my $line (@lines){
    if ($line =~ m/Alpha (\d+)/){
        $alpha = $1;
        $h->{$alpha} = []; # create empty array reference for key
    }
    elsif ($line =~ m/Something: (\d+)/){
        push( @{$h->{$alpha}}, $1);
    }
}

You need to make two changes. First:
$h->{$alpha} = [];
This creates an anonymous array and stores a reference to it in the hash. Second:
push(@{$h->{$alpha}}, $1);
because push requires an actual array, and you have an array reference. The @{...} wrapper dereferences the array reference to an actual array.

The earlier answers are correct, but not complete. $alpha may still be undef if a "Something:" line appears before the first "Alpha" line. To guard against that, add a check:
my $alpha;
for my $line (@lines){
    if ($line =~ m/Alpha (\d+)/){
        $alpha = $1;
        $h->{$alpha} = []; # create empty array reference for key
    }
    elsif ($line =~ m/Something: (\d+)/){
        if(defined $alpha) { ## Check
            push( @{$h->{$alpha}}, $1);
        }
    }
}
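Putting the answers above together, here is a minimal, self-contained sketch of the hash-of-array-references approach. The helper name group_by_alpha and the inline sample lines are my own assumptions for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# group_by_alpha is a hypothetical helper name (an assumption, not from the
# original post). It builds a hash of array references from the lines.
sub group_by_alpha {
    my @lines = @_;
    my %h;
    my $alpha;
    for my $line (@lines) {
        if ($line =~ /Alpha (\d+)/) {
            $alpha = $1;
            $h{$alpha} = [];              # empty array reference for this key
        }
        elsif (defined($alpha) && $line =~ /Something: (\d+)/) {
            push @{ $h{$alpha} }, $1;     # dereference, then push
        }
    }
    return \%h;
}

my $h = group_by_alpha(
    'Alpha 1', 'Something: 2132', 'Something: 2134',
    'Alpha 2', 'Something: 2132', 'Something: 2134',
);
print "$_: @{ $h->{$_} }\n" for sort keys %$h;
```

A "Something:" line that arrives before any "Alpha" line is silently skipped here; whether that is the right policy depends on your data.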

Related

Counting through a hash - PERL

I have a db of places people have ordered items from. I parsed the list to get the city and state so it prints like this - city, state (New York, NY) etc....
I use the variables $city and $state but I want to count how many times each city and state occur so it looks like this - city, state, count (Seattle, WA 8)
I have all of it working except the count .. I am using a hash but I can't figure out what is wrong with this hash:
if ($varc==3) {
$line =~ /(?:\>)(\w+.*)(?:\<)/;
$city = $1;
}
if ($vars==5) {
$line =~ /(?:\>)((\w+.*))(?:\<)/;
$state = $1;
# foreach $count (keys %counts){
# $counts = {$city, $state} {$count}++;
# print $counts;
# }
print "$city, $state\n";
}
foreach $count (keys %counts){
$counts = {$city, $state} {$count}++;
print $counts;
}
Instead of printing city and state you can build a "location" string with both items and use the following counting code:
# Declare this variable before starting to parse the locations.
my %counts = ();
# Inside of the loop that parses the city and state, let's assume
# that you've got $city and $state already...
my $location = "$city, $state";
$counts{$location} += 1;
} # end of the loop that parses the locations
# When you've processed all locations then the counts will be correct.
foreach my $location (keys %counts) {
print "OK: $location => $counts{$location}\n";
}
# OK: New York, NY => 5
# OK: Albuquerque, NM => 1
# OK: Los Angeles, CA => 2
This is going to be a mix of an answer and a code review. I will start with a warning though.
You are trying to parse what looks like XML with Regular Expressions. While this can be done, it should probably not be done. Use an existing parser instead.
How do I know? Stuff that is between angle brackets looks like the format is XML, unless you have a very weird CSV file.
#             V            V
$line =~ /(?:\>)(\w+.*)(?:\<)/;
Also note that you don't need to escape < and >, they have no special meaning in regex.
Now to your code.
First, make sure you always use strict and use warnings, so you are aware of stuff that goes wrong. I can tell you're not because the $count in your loop has no my.
What's $vars (with an s), and what's $varc (with a c). I am guessing that has to do with the state and the city. Is it the column number? In an XML file? Huh.
$line =~ /(?:\>)((\w+.*))(?:\<)/;
Why are there two capture groups, both capturing the same thing?
Anyway, you want to count how often each combination of state and city occurs.
foreach $count (keys %counts){
$counts = {$city, $state} {$count}++;
print $counts;
}
Have you run this code? Even without strict, it gives a syntax error. I'm not even sure what it's supposed to do, so I can't tell you how to fix it.
To implement counting, you need a hash. You got that part right. But you need to declare that hash variable outside of your file reading loop. Then you need to create a key for your city and state combination in the hash, and increment it every time that combination is seen.
my %counts; # declare outside the loop
while ( my $line = <$fh> ) {
chomp $line;
if ( $varc == 3 ) {
$line =~ /(?:\>)(\w+.*)(?:\<)/;
$city = $1;
}
if ( $vars == 5 ) {
$line =~ /(?:\>)((\w+.*))(?:\<)/;
$state = $1;
print "$city, $state\n";
$counts{"$city, $state"}++; # increment when seen
}
}
You have to parse the whole file before you can know how often each combination occurs in it. So if you want to print the location and its count together, you will have to move the printing outside of the loop that reads the file, and iterate over the keys of the %counts hash afterwards.
my %counts; # declare outside the loop
while ( my $line = <$fh> ) {
chomp $line;
if ( $varc == 3 ) {
$line =~ /(?:\>)(\w+.*)(?:\<)/;
$city = $1;
}
if ( $vars == 5 ) {
$line =~ /(?:\>)((\w+.*))(?:\<)/;
$state = $1;
$counts{"$city, $state"}++; # increment when seen
}
}
# iterate again to print final counts
foreach my $item ( sort keys %counts ) {
print "$item $counts{$item}\n";
}
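Stripped of the parsing, the counting idea above can be sketched on its own. This is only an illustration: count_locations is a hypothetical helper, and the city/state pairs are made-up sample data standing in for whatever your parser produces.

```perl
use strict;
use warnings;

# count_locations is a hypothetical helper (an assumption for illustration).
# It takes already-parsed [city, state] pairs and returns a hash ref of counts.
sub count_locations {
    my @pairs = @_;
    my %counts;
    $counts{"$_->[0], $_->[1]"}++ for @pairs;   # one key per "city, state"
    return \%counts;
}

my $counts = count_locations(
    [ 'Seattle', 'WA' ], [ 'New York', 'NY' ], [ 'Seattle', 'WA' ],
);
# Print only after everything has been counted.
print "$_ $counts->{$_}\n" for sort keys %$counts;
```

Using the combined "city, state" string as the key keeps cities with the same name in different states apart.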

How to count the number of occurrences in hash values

I have the FIX file below and I want to find out how many orders were sent at the same time. I am using tag 52 as the sending time.
The file:
8=FIX.4.2|9=115|35=A|52=20080624-12:43:38.021|10=186|
8=FIX.4.2|52=20080624-12:43:38.066|10=111|
8=FIX.4.2|9=105|35=1|22=BOO|52=20080624-12:43:39.066|10=028|
How can I count how many times the same tag 52 value occurs?
So far I have written the code below, but it does not give me the frequency.
#!/usr/bin/perl
$f = '2.txt';
open (F,"<$f") or die "Can not open\n";
while (<F>)
{
chomp $_;
@data = split (/\|/,$_);
foreach $data (@data)
{
if ( $data == 52){
@data1 = split ( /=/,$data);
for my $j (@data1)
{
$hash{$j}++;
} for my $j (keys %hash)
{
print "$j: ", $hash{j}, "\n";
}
}
}
}
Here is your code corrected:
#!/usr/bin/perl
$f = '2.txt';
open (F,"<$f") or die "Can not open\n";
my %hash;
while (<F>) {
    chomp $_;
    @data = split (/\|/,$_);
    foreach $data (@data) {
        if ($data =~ /^52=(.*)/) {
            $hash{$1}++;    # count each tag 52 value
        }
    }
}
for my $j (keys %hash) {
    print "$j: ", $hash{$j}, "\n";
}
Explanation:
if ( $data == 52) is a numeric comparison of the whole field. Perl evaluates it using only the leading digits of the string, so it is not a reliable test for "this field is tag 52" (and it raises a warning under use warnings). I replaced it with a regexp match on ^52=.
The same regexp also captures the timestamp directly, with no need to split the field once more. This is done by (.*) in the regexp and $1 in the following assignment.
It hardly makes sense to print the hash for every line of input data (your code prints it inside the foreach loop), so I moved the printing to the end. But maybe printing the current hash for every line is what you wanted; I do not know.
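For comparison, here is a strict-clean sketch of the same counting that works on a list of messages instead of a file. The helper name count_tag is hypothetical, and note one deliberate assumption in the sample data: I changed the third message's timestamp so that one value repeats and the count is visible.

```perl
use strict;
use warnings;

# count_tag is a hypothetical helper (an assumption for illustration): it
# counts the values of one FIX tag across pipe-delimited messages.
sub count_tag {
    my ($tag, @messages) = @_;
    my %seen;
    for my $msg (@messages) {
        for my $field (split /\|/, $msg) {
            my ($key, $value) = split /=/, $field, 2;   # "52=..." -> (52, ...)
            next unless defined($value) && $key eq $tag;
            $seen{$value}++;
        }
    }
    return \%seen;
}

# Sample data: the third timestamp is altered (assumption) to create a repeat.
my $seen = count_tag(52,
    '8=FIX.4.2|9=115|35=A|52=20080624-12:43:38.021|10=186|',
    '8=FIX.4.2|52=20080624-12:43:38.066|10=111|',
    '8=FIX.4.2|9=105|35=1|22=BOO|52=20080624-12:43:38.066|10=028|',
);
print "$_: $seen->{$_}\n" for sort keys %$seen;
```

Comparing the tag with eq after splitting on the first = avoids the numeric-comparison pitfall entirely.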

How to get a subset of keys-value from a file with universe of key value pairs

I have a file with key value pairs separated by whitespace. The first column in the file is the key and the rest of the columns are the value. In other words, each key may have an array for a value.
I'm only interested in the values of certain keys in the file. I have an array with the keys I'm interested in. What's the best way in perl to create a hash with only the subset of key/value pairs that i'm interested in?
Here's what I have thus far:
foreach my $line (@{$file_arr_ref}) {
my $sub = substr( $line, 0, 1);
if(($sub ne "#") and ($sub ne "")){ #omit comments and blank lines
my @key_vals = split(/\s/, $line);
if $key_vals[0] eq "key_i'm_interested_in_1" or $key_vals[0] eq "key_i'm_interested_in_2" {
insert_into_hash();
}
}
}
Is there a more optimal way of doing this?
Create a hash from the array with keys you need.
my @keys_i_need = ('key_1', 'key_2', 'key_3');
my %keys_i_need = map {$_ => 1} @keys_i_need;
foreach my $line (@{$file_arr_ref}) {
    my $sub = substr( $line, 0, 1);
    if(($sub ne "#") and ($sub ne "")){ #omit comments and blank lines
        my @key_vals = split(/\s/, $line);
        insert_into_hash() if(exists $keys_i_need{$key_vals[0]});
    }
}
Typically, when one is checking for the existence of something, the first data structure one should think of is a hash.
However, if the list of items is short, an array might be sufficient as well by using grep.
foreach my $line (@{$file_arr_ref}) {
next if $line =~ /^$/ || $line =~ /^#/; # Omit blank lines and comments
my @key_vals = split /\s/, $line;
next if ! grep {$key_vals[0] eq $_} qw(key_one key_two key_three);
insert_into_hash();
}
Also note, if you're going to be iterating on all the lines of your file, that it might be better to do it in the form while (<$fh>) instead of loading them all into an array first.
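As a rough end-to-end sketch of the lookup-hash idea, assuming each wanted key should map to an array reference of its remaining columns. pick_keys is a hypothetical helper and the sample lines are invented; insert_into_hash from the question is replaced by a plain assignment.

```perl
use strict;
use warnings;

# pick_keys is a hypothetical helper (an assumption for illustration): given
# an array ref of wanted keys and a list of lines, it returns a hash ref of
# key => [values] for the wanted keys only.
sub pick_keys {
    my ($wanted, @lines) = @_;
    my %is_wanted = map { $_ => 1 } @$wanted;   # lookup hash for O(1) checks
    my %picked;
    for my $line (@lines) {
        next if $line =~ /^\s*$/ || $line =~ /^\s*#/;   # blank lines, comments
        my ($key, @values) = split ' ', $line;          # first column is the key
        $picked{$key} = \@values if $is_wanted{$key};
    }
    return \%picked;
}

my $picked = pick_keys(
    [ 'key_1', 'key_3' ],
    '# a comment', 'key_1 a b', 'key_2 c', '', 'key_3 d e f',
);
print "$_ => @{ $picked->{$_} }\n" for sort keys %$picked;
```

If a key can appear on several lines, the last line wins in this sketch; push onto the array reference instead if you need to accumulate.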

Reading the next line in the file and keeping counts separate

Another question for everyone. To reiterate I am very new to the Perl process and I apologize in advance for making silly mistakes
I am trying to calculate the GC content of different lengths of DNA sequence. The file is in this format:
>gene 1
DNA sequence of specific gene
>gene 2
DNA sequence of specific gene
...etc...
This is a small piece of the file
>env
ATGCTTCTCATCTCAAACCCGCGCCACCTGGGGCACCCGATGAGTCCTGGGAA
I have set up the counters and a loop that reads each line of DNA sequence, but at the moment it keeps a running total across all lines. I want it to read each sequence, print the counts after that sequence, then move on to the next one, giving individual base counts for each line.
This is what I have so far.
#!/usr/bin/perl
#necessary code to open and read a new file and create a new one.
use strict;
my $infile = "Lab1_seq.fasta";
open INFILE, $infile or die "$infile: $!";
my $outfile = "Lab1_seq_output.txt";
open OUTFILE, ">$outfile" or die "Cannot open $outfile: $!";
#establishing the intial counts for each base
my $G = 0;
my $C = 0;
my $A = 0;
my $T = 0;
#initial loop created to read through each line
while ( my $line = <INFILE> ) {
chomp $line;
# reads file until the ">" character is encounterd and prints the line
if ($line =~ /^>/){
print OUTFILE "Gene: $line\n";
}
# otherwise count the content of the next line.
# my percent counts seem to be incorrect due to my Total length counts skewing the following line. I am currently unsure how to fix that
elsif ($line =~ /^[A-Z]/){
my @array = split //, $line;
my $array= (@array);
# reset the counts of each variable
$G = ();
$C = ();
$A = ();
$T = ();
foreach $array (@array){
#if statements asses which base is present and makes a running total of the bases.
if ($array eq 'G'){
++$G;
}
elsif ( $array eq 'C' ) {
++$C; }
elsif ( $array eq 'A' ) {
++$A; }
elsif ( $array eq 'T' ) {
++$T; }
}
# all is printed to the outfile
print OUTFILE "G:$G\n";
print OUTFILE "C:$C\n";
print OUTFILE "A:$A\n";
print OUTFILE "T:$T\n";
print OUTFILE "Total length:_", ($A+=$C+=$G+=$T), "_base pairs\n";
print OUTFILE "GC content is(percent):_", (($G+=$C)/($A+=$C+=$G+=$T)*100),"_%\n";
}
}
#close the outfile and the infile
close OUTFILE;
close INFILE;
Again I feel like I am on the right path, I am just missing some basic foundations. Any help would be greatly appreciated.
The final problem is in the final counts printed out. My percent values are wrong and give me the wrong value. I feel like the total is being calculated then that new value is incorporated into the total.
Several things:
1. use hash instead of declaring each element.
2. An assignment such as $G = (0); works, but it is not the idiomatic way to assign a scalar. The parentheses do not declare an array; in a scalar assignment they only group, so $G simply receives 0. Note that $G = (); as in your code leaves $G undefined rather than 0. The correct way is $G = 0.
my %seen;
$seen{/^([A-Z])/}++ for (grep {/^\>/} <INFILE>);
foreach $gene (keys %seen) {
print "$gene: $seen{$gene}\n";
}
Just reset the counters when a new gene is found. Also, I'd use hashes for the counting:
use strict; use warnings;
my %counts;
while (<>) {
if (/^>/) {
# print counts for the prev gene if there are counts:
print_counts(\%counts) if keys %counts;
%counts = (); # reset the counts
print $_; # print the Fasta header
} else {
chomp;
$counts{$_}++ for split //;
}
}
print_counts(\%counts) if keys %counts; # print counts for last gene
sub print_counts {
my ($counts) = @_;
print "$_:=", ($counts->{$_} || 0), "\n" for qw/A C G T/;
}
Usage: $ perl count-bases.pl input.fasta.
Example output:
> gene 1
A:=3
C:=1
G:=5
T:=5
> gene 2
A:=1
C:=5
G:=0
T:=13
Style comments:
When opening a file, always use lexical filehandles (normal variables). Also, you should do a three-arg open. I'd also recommend the autodie pragma for automatic error handling (since perl v5.10.1).
use autodie;
open my $in, "<", $infile;
open my $out, ">", $outfile;
Note that I don't open files in my above script because I use the special ARGV filehandle for input, and print to STDOUT. The output can be redirected on the shell, like
$ perl count-bases.pl input.fasta >counts.txt
Declaring scalar variables with their values in parens like my $G = (0) is weird, but works fine. I think this is more confusing than helpful. → my $G = 0.
Your indentation is a bit weird. It is very unusual and visually confusing to put closing braces on the same line with another statement like
...
elsif ( $array eq 'C' ) {
++$C; }
I prefer cuddling elsif:
...
} elsif ($base eq 'C') {
$C++;
}
This statement my $array= (@array); puts the length of the array into $array. What for? Tip: You can declare variables right inside foreach-loops, like for my $base (@array) { ... }.
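On the wrong percentages from the question: expressions such as ($A+=$C+=$G+=$T) modify the counters while adding them, so every later calculation starts from altered values. A small sketch of per-sequence counting that uses plain + instead; gc_percent is a hypothetical helper name and 'ATGCGC' is a made-up test sequence.

```perl
use strict;
use warnings;

# gc_percent is a hypothetical helper (an assumption for illustration): it
# counts the bases of one sequence and returns the counts (hash ref), the
# total length, and the GC percentage.
sub gc_percent {
    my ($seq) = @_;
    my %n = (A => 0, C => 0, G => 0, T => 0);        # fresh counts per sequence
    $n{$_}++ for grep { exists $n{$_} } split //, uc $seq;
    my $total = $n{A} + $n{C} + $n{G} + $n{T};       # plain +, nothing modified
    my $gc    = $total ? 100 * ($n{G} + $n{C}) / $total : 0;
    return (\%n, $total, $gc);
}

my ($n, $total, $gc) = gc_percent('ATGCGC');
printf "A:%d C:%d G:%d T:%d total:%d GC:%.1f%%\n",
    @{$n}{qw(A C G T)}, $total, $gc;
```

Because the counts live in a hash created inside the sub, each call starts from zero and no explicit reset is needed.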

Storing array as value in associative array

I have a problem where I need to store an array as a value in an associative array.
See the code below. I loop over the files in a directory, and more than one file may contain the same ctrno, so I would like to see which files share a ctrno. The code gives an error at "$ctrno_hash[$ctrno] = @arr;" in the else branch, and the same happens in the if branch.
Am I following the right approach or could it be done differently?
sub loop_through_files
{
$file = "@_";
open(INPFILE, "$file") or die $!;
#print "$file:$ctrno\n";
while (<INPFILE>)
{
$line .= $_;
}
if ($line =~ /$ctrno/ )
{
print "found\n";
if ( exists $ctrno_hash[$ctrno])
{
local @arr = $ctrno_hash[$ctrno];
push (@arr, $file);
$ctrno_hash[$ctrno] = @arr;
}
else
{
local @arr;
push(@arr, $file);
$ctrno_hash[$ctrno] = @arr;
}
}
}
I believe you want something like
$ctrno_hash[$ctrno] = \@arr;
This stores a reference to the array @arr instead of the array itself.
You then refer to the previously stored array with
@{$ctrno_hash[$ctrno]}
That is, if $array_ref is an array reference, the construct @{ $array_ref } returns the array to which the reference points.
Now, $ctrno_hash[$ctrno] does not index a hash but an ordinary array. To make it a hash lookup, you need curly brackets instead of square brackets:
$ctrno_hash{$ctrno} = \@arr;
And similarly, you later refer to the array with
@{$ctrno_hash{$ctrno} }
Now, having said that, you can entirely forgo the if ... exists construct:
if ($line =~ /$ctrno/ )
{
print "found\n";
push @{$ctrno_hash{$ctrno}}, $file;
}
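The reason the exists check can go is autovivification: push creates the array reference the first time a key is used. A minimal sketch under stated assumptions: index_by_ctrno is a hypothetical helper, and file contents are passed in as strings (made-up sample data) rather than read from disk.

```perl
use strict;
use warnings;

# index_by_ctrno is a hypothetical helper (an assumption for illustration):
# given a control number and a hash of file name => contents, it records
# which files mention that number.
sub index_by_ctrno {
    my ($ctrno, %contents) = @_;
    my %ctrno_hash;
    for my $file (sort keys %contents) {
        next unless $contents{$file} =~ /\Q$ctrno\E/;   # literal match
        # No exists check needed: the array reference is created on first push.
        push @{ $ctrno_hash{$ctrno} }, $file;
    }
    return \%ctrno_hash;
}

my $index = index_by_ctrno('1234',
    'a.txt' => "ctr 1234\n", 'b.txt' => "nothing\n", 'c.txt' => "x1234y\n",
);
print "1234: @{ $index->{1234} }\n";
```

Keep in mind that the key is simply absent when no file matches, so check exists $index->{$ctrno} before dereferencing in real code.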