Regarding getting the size of array in a hash of array structure - perl

I wrote the following Perl function
sub Outputing
{
my $featureMatrix = shift;
my $indexRow = shift;
my $fileName = "/projectworkspace/input.dat";
open(DATA, "> $fileName");
printf DATA "%d", $#$indexRow;
print DATA "\n";
my $numDataPoints = $#{$featureMatrix{$indexRow->[1]}};
printf DATA "%d", $numDataPoints;
print DATA "\n";
close DATA;
}
I calling Outputing as follows:
Outputing($matrix, $Rows);e
$matrix is a hash of array, whose structure is like this
my $matrix
= { 200 => [ 0.023, 0.035, 0.026 ],
110 => [ 0.012, 0.020, 0,033],
};
Rows is an array storing the sorted key of matrix, it is obtained as follows
my #Rows = sort keys %matrix;
both matrix and Rows are used as parameters passed to Outputing.
The printed out $numDataPoints is -1, which is not correct? I do not know which might be the reason that causes this problem? If we use the above example, and assume $indexRow->[1]=110, then $numDataPoints should be 2. I am not sure whether the $#{$featureMatrix{$indexRow->[1]}}; is the correct way to get the size of this array.

Assuming that you've included all the relevant code, this:
my #indexRow = sort keys %featureMatrix;
should be this:
my #indexRow = sort keys %$featureMatrix;
and this:
my $numDataPoints = $#{$featureMatrix{$indexRow->[1]}};
should be this:
my $numDataPoints = $#{$featureMatrix->{$indexRow->[1]}};
That is, the problem is that in some places, you're using a hash named %featureMatrix, and in others, you're using a hashref named $featureMatrix that refers to an anonymous hash.
You should be using use warnings and use strict to prevent such mistakes: those would have prevented you from using %featureMatrix when you've only declared $featureMatrix. (Actually, use warnings might not help in this case — it could detect if you used %featureMatrix exactly once, but in your case, you use it a few times — but use strict would almost certainly have helped.)

Related

Perl find out if X is an element in an array

I don't know why small things are too not working for me in Perl. I am sorry for that.
I have been trying it around 2 hrs but i couldn't get the results.
my $technologies = 'json.jquery..,php.linux.';
my #techarray = split(',',$technologies);
#my #techarray = [
# 'json.jquery..',
# 'php.linux.'
# ];
my $search_id = 'json.jquery..';
check_val(#techarray, $search_id);
And i am doing a "if" to search the above item in array. but it is not working for me.
sub check_val{
my #techarray = shift;
my $search_id = shift;
if (grep {$_ eq $search_id} #techarray) {
print "It is there \n";
}else{
print "It is not there \n";
}
}
Output: It always going to else condition and returns "It is not there!" :(
Any idea. Am i done with any stupid mistakes?
You are using an anonymous array [ ... ] there, which as a scalar (reference) is then assigned to #techarray, as its only element. It is like #arr = 'a';. An array is defined by ( ... ).
A remedy is to either define an array, my #techarray = ( ... ), or to properly define an arrayref and then dereference when you search
my $rtecharray = [ .... ];
if (grep {$_ eq $search_id} #$rtecharray) {
# ....
}
For all kinds of list manipulations have a look at List::Util and List::MoreUtils.
Updated to changes in the question, as the sub was added
This has something else, which is more instructive.
As you pass an array to a function it is passed as a flat list of its elements. Then in the function the first shift picks up the first element,
and then the second shift picks up the second one.
Then the search is over the array with only 'json.jquery..' element, for 'php.linux.' string.
Instead, you can pass a reference,
check_val(\#techarray, $search_id);
and use it as such in the function.
Note that if you pass the array and get arguments in the function as
my (#array, $search_id) = #_; # WRONG
you are in fact getting all of #_ into #array.
See, for example, this post (passing to function) and this post (returning from function).
In general I'd recommend passing lists by reference.

Why I can use #list to call an array, but can't use %dict to call a hash in perl? [duplicate]

This question already has answers here:
Why do you need $ when accessing array and hash elements in Perl?
(9 answers)
Closed 8 years ago.
Today I start my perl journey, and now I'm exploring the data type.
My code looks like:
#list=(1,2,3,4,5);
%dict=(1,2,3,4,5);
print "$list[0]\n"; # using [ ] to wrap index
print "$dict{1}\n"; # using { } to wrap key
print "#list[2]\n";
print "%dict{2}\n";
it seems $ + var_name works for both array and hash, but # + var_name can be used to call an array, meanwhile % + var_name can't be used to call a hash.
Why?
#list[2] works because it is a slice of a list.
In Perl 5, a sigil indicates--in a non-technical sense--the context of your expression. Except from some of the non-standard behavior that slices have in a scalar context, the basic thought is that the sigil represents what you want to get out of the expression.
If you want a scalar out of a hash, it's $hash{key}.
If you want a scalar out of an array, it's $array[0]. However, Perl allows you to get slices of the aggregates. And that allows you to retrieve more than one value in a compact expression. Slices take a list of indexes. So,
#list = #hash{ qw<key1 key2> };
gives you a list of items from the hash. And,
#list2 = #list[0..3];
gives you the first four items from the array. --> For your case, #list[2] still has a "list" of indexes, it's just that list is the special case of a "list of one".
As scalar and list contexts were rather well defined, and there was no "hash context", it stayed pretty stable at $ for scalar and # for "lists" and until recently, Perl did not support addressing any variable with %. So neither %hash{#keys} nor %hash{key} had meaning. Now, however, you can dump out pairs of indexes with values by putting the % sigil on the front.
my %hash = qw<a 1 b 2>;
my #list = %hash{ qw<a b> }; # yields ( 'a', 1, 'b', 2 )
my #l2 = %list[0..2]; # yields ( 0, 'a', 1, '1', 2, 'b' )
So, I guess, if you have an older version of Perl, you can't, but if you have 5.20, you can.
But for a completist's sake, slices have a non-intuitive way that they work in a scalar context. Because the standard behavior of putting a list into a scalar context is to count the list, if a slice worked with that behavior:
( $item = #hash{ #keys } ) == scalar #keys;
Which would make the expression:
$item = #hash{ #keys };
no more valuable than:
scalar #keys;
So, Perl seems to treat it like the expression:
$s = ( $hash{$keys[0]}, $hash{$keys[1]}, ... , $hash{$keys[$#keys]} );
And when a comma-delimited list is evaluated in a scalar context, it assigns the last expression. So it really ends up that
$item = #hash{ #keys };
is no more valuable than:
$item = $hash{ $keys[-1] };
But it makes writing something like this:
$item = $hash{ source1(), source2(), #array3, $banana, ( map { "$_" } source4()};
slightly easier than writing:
$item = $hash{ [source1(), source2(), #array3, $banana, ( map { "$_" } source4()]->[-1] }
But only slightly.
Arrays are interpolated within double quotes, so you see the actual contents of the array printed.
On the other hand, %dict{1} works, but is not interpolated within double quotes. So, something like my %partial_dict = %dict{1,3} is valid and does what you expect i.e. %partial_dict will now have the value (1,2,3,4). But "%dict{1,3}" (in quotes) will still be printed as %dict{1,3}.
Perl Cookbook has some tips on printing hashes.

How to Hash in Perl

I am finding uniques URL in a log file along with the response stamp which can be available using $line[7]. I am using Hash to get the unique URLs.
How can I get the count of Unique URL?
How can I get the average of response time along with the count of Unique URL?
With below code I am getting
url1
url2
url3
but I want it along with the average response time and count of each URL
URL Av.RT Count
url1 10.5 125
url2 9.3 356
url3 7.8 98
Code:
#!/usr/bin/perl
open(IN, "web1.txt") or die "can not open file";
# Hash to store final list of unique IPs
my %uniqueURLs = ();
my $z;
# Read log file line by line
while (<IN>) {
#line = split(" ",$_);
$uniqueURLs{$line[9]}=1;
}
# Go through the hash table and print the keys
# which are the unique IPs
for $url (keys %uniqueURLs) {
print $url . "\n";
}
store a listref in your hashing directory:
$uniqueURLs{$line[9]} = [ <avg response time>, <count> ];
adjust the elements accordingly, eg. the count:
if (defined($uniqueURLs{$line[9]})) {
# url known, increment count,
# update average response time with data from current log entry
$uniqueURLs{$line[9]}->[0] =
(($uniqueURLs{$line[9]}->[0] * $uniqueURLs{$line[9]}->[1]) + ($line[7] + 0.0))
/ ($uniqueURLs{$line[9]}->[1] + 1)
;
$uniqueURLs{$line[9]}->[1] += 1;
}
else {
# url not yet known,
# init count with 1 and average response time with actual response time from log entry
$uniqueURLs{$line[9]} = [ $line[7] + 0.0, 1 ];
}
to print results:
# Go through the hash table and print the keys
# which are the unique IPs
for $url (keys %uniqueURLs) {
printf ( "%s %f %d\n", $url, $uniqueURLs{$url}->[0], $uniqueURLs{$url}->[1]);
}
adding 0.0 will guarantee type coercion from string to float as a safeguard.
Read up on References. Also, read up on modern Perl practices which will help improve your programming skills.
Instead of just using the keys of your hash of unique URLs, you could store information in those hashes. Let's start with just a count of the unique URLs:
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use feature qw(say);
use constant {
WEB_FILE => "web1.txt",
};
open my $web_fh, "<", WEBFILE; #Autodie will catch this for you
my %unique_urls;
while ( my $line = <$web_fh> ) {
my $url = (split /\s+/, $line)[9];
if ( not exists $unique_urls{$url} ) { #Not really needed
$unique_urls{$url} = 0;
}
$unique_urls{$url} += 1;
}
close $web_fh;
Now, each key in your %unique_urls hash will contain the number of unique URLs you have.
This, by the way, is your code written in a bit more modern style. The use strict; and use warnings; pragmas will catch about 90% of the standard programming errors. The use autodie; will catch exceptions to things that you forget to check. In this case, the program will automatically die if the file doesn't exist.
The three parameter version of the open command is preferred, and so is using scalar variables for file handles. Using scalar variables for the file handle makes them easier to pass in subroutines, and the file will automatically close if the file handle falls out of scope.
However, we want to store in two items per hash. We want to store the unique count, and we want to store something that will help us find the average response time. This is where references come in.
In Perl, variables deal with single data items. A scalar variable (like $foo) deals with an individual data item. Arrays and Hashes (like #foo and %foo) deal with lists of individual data items. References help you get around this limitation.
Let's look at an array of people:
$person[0] = "Bob";
$person[1] = "Ted";
$person[2] = "Carol";
$person[3] = "Alice";
However, people are more than just first names. They have last names, phone numbers, addresses, etc. Let's take a look at a hash for Bob:
my %bob_hash;
$bob_hash{FIRST_NAME} = "Bob";
$bob_hash{LAST_NAME} = "Jones";
$bob_hash{PHONE} = "555-1234";
We can take a reference to this hash by putting a backslash in front of it. A reference is merely the memory address where this hash is stored:
$bob_reference = \%bob_hash;
print "$bob_reference\n": # Prints out something like HASH(0x7fbf79004140)
However, that memory address is a single item, and could be stored in our array of people!
$person[0] = $bob_reference;
If we want to get to the items in our reference, we dereference it by putting the right data type symbol in front. Since this is a hash, we will use %:
$bob_hash = %{ $person[0] };
Perl provides an easy way to dereference hashes with the -> syntax:
$person[0]->{FIRST_NAME} = "Bob";
$person[0]->{LAST_NAME} = "Jones";
$person[0]->{PHONE} = "555-1212";
We'll use the same technique in %unique_urls to store the number of times, and the total amount of response time. (Average will be total time / number of times).
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use feature qw(say);
use constant {
WEB_FILE => "web1.txt",
};
open my $web_fh, "<", WEB_FILE; #Autodie will catch this for you
my %unique_urls;
while ( my $line ( <$web_fh> ) {
my $url = (split /\s+/, $line)[9];
my $response_time = (split /\s+/, $line)[10]; #Taking a guess
if ( not exists $unique_urls{$url} ) { #Not really needed
$unique_urls{$url}->{INSTANCES} = 0;
$unique_urls{$url}->{TOTAL_RESP_TIME} = 0;
}
$unique_urls{$url}->{INSTANCES} += 1;
$unique_urls{$url}->{TOTAL_RESP_TIME} += $response_time;
}
$close $web_fh;
Now we can print them out:
print "%20.20s %6s %8s\n", "URL", "INST", "AVE";
for my $url ( sort keys %unique_urls ) {
my $total_resp_time = $unique_urls{$url}->{TOTAL_RESP_TIME};
my $instances = $unique_urls{$url}->{INSTANCES};
my $average = $total_resp_time / $instances
printf "%-20.20s %-6d %-8.5f\n", $url, $instances, $average";
}
I like using printf for tables.
Instead of setting the value to 1 here:
$uniqueURLs{$line[9]}=1;
Store a data structure indicating the response time and the number of times this URL has been seen (so you can properly calculate the average). You can use an array ref, or hashref if you want. If the key doesn't exist yet, that means it hasn't been seen yet, and you can set some initial values.
# Initialize 3-element arrayref: [count, total, average]
$uniqueURLS{$line[9]} = [0, 0, 0] if not exists $uniqueURLS{$line[9]};
$uniqueURLs{$line[9]}->[0]++; # Count
$uniqueURLs{$line[9]}->[1] += $line[7]; # Total time
# Calculate average
$uniqueURLs{$line[9]}->[2] = $uniqueURLs{$line[9]}->[1] / $uniqueURLs{$line[9]}->[0];
One way you can get count of uniqueURLS is by counting the keys:
print scalar(keys %uniqueURLS); # Print number of unique url's
In your loop, you can print out the url and average time like this:
for $url (keys %uniqueURLs) {
print $url, ' - ', $uniqueURLs[$url]->[2], "seconds \n";
}

Count hash values while using Data::Dumper

I need to find the count of values (ie abc1) in a Perl hash and if > 4 run run an internal command within a IF block. I just need to figure out the concept of how to count # of values.
(I could leave a code sample of what I've attempted but that would just result in uncontrolled laughter and confusion)
I am using Data::Dumper, and utilizing the following format to store key/value in hash.
push #{$hash{$key}}, $val;
A print of hash gives :
$ print Dumper \%hash;
$VAR1 = {
'5555' => [
'abc1',
'abc1',
'abc1'
]
};
Please let me know how to get the count.
Thanks in advance.
Well, do you want to count that particular string, or the number of elements?
my $count = #{$hash{$key}}; # get the size of the array (all elements)
my %num;
for my $val (#{$hash{$key}}) {
$num{$val}++; # count the individual keys
}
print "Number of 'abc1': $num{'abc1'}\n";
The number of values in a hash is the same as the number of keys. What you are after, though, is the number of elements in an array (referenced from a hash value). To get the size of an array, just use it in scalar context. For an array reference, you have to dereference it first:
my $count = #{ $hash{$key} };

How can I use hashes as arguments to subroutines in Perl?

I was asked to modify some existing code to add some additional functionality. I have searched on Google and cannot seem to find the answer. I have something to this effect...
%first_hash = gen_first_hash();
%second_hash = gen_second_hash();
do_stuff_with_hashes(%first_hash, %second_hash);
sub do_stuff_with_hashes
{
my %first_hash = shift;
my %second_hash = shift;
# do stuff with the hashes
}
I am getting the following errors:
Odd number of elements in hash assignment at ./gen.pl line 85.
Odd number of elements in hash assignment at ./gen.pl line 86.
Use of uninitialized value in concatenation (.) or string at ./gen.pl line 124.
Use of uninitialized value in concatenation (.) or string at ./gen.pl line 143.
Line 85 and 86 are the first two lines in the sub routine and 124 and 143 are where I am accessing the hashes. When I look up those errors it seems to suggest that my hashes are uninitialized. However, I can verify that the hashes have values. Why am I getting these errors?
The hashes are being collapsed into flat lists when you pass them into the function. So, when you shift off a value from the function's arguments, you're only getting one value. What you want to do is pass the hashes by reference.
do_stuff_with_hashes(\%first_hash, \%second_hash);
But then you have to work with the hashes as references.
my $first_hash = shift;
my $second_hash = shift;
A bit late but,
As have been stated, you must pass references, not hashes.
do_stuff_with_hashes(\%first_hash, \%second_hash);
But if you need/want to use your hashes as so, you may dereference them imediatly.
sub do_stuff_with_hashes {
my %first_hash = %{shift()};
my %second_hash = %{shift()};
};
Hash references are the way to go, as the others have pointed out.
Providing another way to do this just for kicks...because who needs temp variables?
do_stuff_with_hashes( { gen_first_hash() }, { gen_second_hash() } );
Here you are just creating hash references on the fly (via the curly brackets around the function calls) to use in your do_stuff_with_hashes function. This is nothing special, the other methods are just as valid and probably more clear. This might help down the road if you see this activity in your travels as someone new to Perl.
First off,
do_stuff_with_hashes(%first_hash, %second_hash);
"streams" the hashes into a list, equivalent to:
( 'key1_1', 'value1_1', ... , 'key1_n', 'value1_n', 'key2_1', 'value2_1', ... )
and then you select one and only one of those items. So,
my %first_hash = shift;
is like saying:
my %first_hash = 'key1_1';
# leaving ( 'value1', ... , 'key1_n', 'value1_n', 'key2_1', 'value2_1', ... )
You cannot have a hash like { 'key1' }, since 'key1' is mapping to nothing.
A solution without shift() is faster, because it do not copy the data in memory, and you can modify the hash in the subroutine. See my example:
sub do_stuff_with_hashes($$$) {
my ($str,$refHash1,$refHash2)=#_;
foreach (keys %{$refHash1}) { print $_.' ' }
$$refHash1{'new'}++;
}
my (%first_hash, %second_hash);
$first_hash{'first'}++;
do_stuff_with_hashes('any_parameter', \%first_hash, \%second_hash);
print "\n---\n", $first_hash{'new'};