Perl: Beginner. Which data structure should I use? - perl

Okay, not sure where to ask this, but I'm a beginner programmer, using Perl. I need to create an array of arrays, but I'm not sure if it would be better to use array/hash references, or an array of hashes, or a hash of arrays, etc.
I need an array of matches: @totalmatches
Each match contains 6 elements (strings):
@matches = ($chapternumber, $sentencenumber, $sentence, $grammar_relation, $argument1, $argument2)
I need to push each of these elements into the @matches array/hash/reference, and then push that array/hash/reference into the @totalmatches array.
The matches are found based on searching a file and selecting the strings based on meeting the criteria.
QUESTIONS
Which data structure would you use?
Can you push an array into another array, as you would push an element into an array? Is this an efficient method?
Can you push all 6 elements simultaneously, or have to do 6 separate pushes?
When working with 2-D, to loop through would you use:
foreach (@totalmatches) {
foreach (@matches) {
...
}
}
Thanks for any advice.

Which data structure would you use?
An array for an ordered set of things. A hash for a set of named things.
Can you push an array into another array, as you would push an element into an array? Is this an efficient method?
If you try to push an array (1) into an array (2), you'll end up pushing all the elements of 1 into 2. That is why you would push an array ref in instead.
Can you push all 6 elements simultaneously, or have to do 6 separate pushes?
Look at perldoc -f push
push ARRAY,LIST
You can push a list of things in.
When working with 2-D, to loop through would you use:
Nested foreach is fine, but that syntax wouldn't work. You have to access the values you are dealing with.
for my $arrayref (@outer) {
for my $item (@$arrayref) {
$item ...
}
}
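A minimal sketch of the points above, using made-up match data (the strings are hypothetical, not from the asker's file): each match is an anonymous array holding all six fields, pushed in a single call, and the nested loop dereferences each row.

```perl
use strict;
use warnings;

my @totalmatches;

# Hypothetical sample data: chapter, sentence number, sentence,
# grammar relation, argument 1, argument 2.
push @totalmatches, [ 1, 4, 'The dog barks.',   'nsubj', 'barks', 'dog' ];
push @totalmatches, [ 2, 9, 'She reads books.', 'dobj',  'reads', 'books' ];

my @lines;
for my $match (@totalmatches) {    # $match is an array reference
    my @fields;
    for my $item (@$match) {       # dereference to walk the six fields
        push @fields, $item;
    }
    push @lines, join ' | ', @fields;
}
print "$_\n" for @lines;
```

The square brackets build an anonymous array and return a reference to it, so one push adds one whole match.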

Do not push one array into another array.
Lists just join with each other into a new list.
Use a list of references instead.
#create an anonymous hash ref for each match
$one_match_ref = {
chapternumber => $chapternumber_value,
sentencenumber => $sentencenumber_value,
sentence => $sentence_value,
grammar_relation => $grammar_relation_value,
arg1 => $argument1,
arg2 => $argument2
};
# add the reference of match into array.
push @all_matches, $one_match_ref;
# list of keys of interest
@keys = qw(chapternumber sentencenumber sentence grammar_relation arg1 arg2);
# walk through all the matches.
foreach $ref (@all_matches) {
foreach $key (@keys) {
$val = $$ref{$key};
}
# or pick up some specific keys
my $arg1 = $$ref{arg1};
}

Which data structure would you use?
An array... I can't really justify that choice, but I can't imagine what you would use as keys if you used a hash.
Can you push an array into another array, as you would push an element into an array? Is this an efficient method?
Here's the thing: in Perl, arrays can only contain scalar values, the kind held by variables which start with $. Something like...
@matrix = ();
@row = ();
$matrix[0] = @row; # FAIL! (stores the number of elements in @row)
... won't work. You will have to instead use a reference to the array:
@matrix = ();
@row = ();
$matrix[0] = \@row;
Or equally:
push(@matrix, \@row);
Can you push all 6 elements simultaneously, or have to do 6 separate pushes?
If you use references, you need only push once... and since you don't want to concatenate arrays (you need an array of arrays) you're stuck with no alternatives ;)
When working with 2-D, to loop through would you use:
I'd use something like:
for($i=0; $i<@matrix; $i++) {
@row = @{$matrix[$i]}; # de-reference
for($j=0; $j<@row; $j++) {
print "| $row[$j] ";
}
print "|\n";
}

Which data structure would you use?
Some fundamental container properties:
An array is a container for ordered scalars.
A hash is a container for scalars obtained by a unique key (there can be no duplicate keys in the hash). The order in which values were added is not preserved.
I would use the same structure that ZhangChn proposed.
Use a hash for each match.
The details of the match then can be accessed by descriptive names instead of plain numerical indices. i.e. $ref->{'chapternumber'} instead of $matches[0].
Take references of these anonymous hashes (which are scalars) and push them into an array in order to preserve the order of the matches.
To dereference items from the data structure
get an item from the array which is a hash reference
retrieve any matching detail you need from the hash reference
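The two dereferencing steps above might look like this; the field values are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical matches, each stored as an anonymous hash.
my @all_matches = (
    { chapternumber => 1, sentencenumber => 4, sentence => 'The dog barks.' },
    { chapternumber => 2, sentencenumber => 9, sentence => 'She reads books.' },
);

# Step 1: get an item from the array; it is a hash reference.
my $ref = $all_matches[1];

# Step 2: retrieve a detail by its descriptive name.
my $chapter = $ref->{chapternumber};
print "chapter: $chapter\n";

# Both steps can be combined into one expression.
my $sentence = $all_matches[0]{sentence};
print "sentence: $sentence\n";
```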

Related

Pushing a list of key/value pairs in Perl Hash

I'm trying to 'push' a list of key/value pairs into a Perl hash as follows. I thought it would work the way key-value pairs are assigned when an array is assigned to a hash. @targets contains a tab-separated string for each element. However, for every iteration of the map loop the hash is overwritten, and at the end I only get one key-value pair in the hash, which corresponds to the last element of @targets. I'm trying to avoid the usual $ID_Gene{$key}=$value type assignments.
How to push a list as a key-value pair to the hash?
Alternatively, is there any way I can build an anonymous hash and then push that hash to the original hash? Like: %ID_Gene = (%ID_Gene, %AnonyHash);
my %ID_Gene;
map{ %ID_Gene= (split /\t/,$_) ;
}@targets;
You're almost there, you just have the assignment in the wrong place, so you end up blowing away the entire hash each time you add an item. As a general rule, it usually doesn't make sense to do things with side-effects in map. You can simply do:
my %ID_Gene = map { split /\t/, $_ } @targets;
if you already have some stuff in %ID_Gene and you want to add to it, you can do
%ID_Gene = (%ID_Gene, map { split /\t/, $_ } @targets);
or if you think that's too much going on in one line:
my %to_add = map { split /\t/, $_ } @targets;
%ID_Gene = (%ID_Gene, %to_add);
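One thing worth knowing about the list-merge form: when a key appears on both sides, the rightmost value wins. A small sketch with hypothetical ID/gene strings:

```perl
use strict;
use warnings;

# Hypothetical tab-separated "ID<TAB>gene" strings.
my @targets = ("id1\tBRCA1", "id2\tTP53");

# Pre-existing content; id1 will be overwritten by the merge.
my %ID_Gene = (id1 => 'OLD', id3 => 'MYC');

# Flatten the old hash and the new pairs into one list; later pairs
# overwrite earlier ones that share a key.
%ID_Gene = (%ID_Gene, map { split /\t/, $_ } @targets);

print "$_ => $ID_Gene{$_}\n" for sort keys %ID_Gene;
```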

Multidimension array in perl

I am working on a short script in which two to three variables are linked with each other.
Example:
my @batch;
my @case;
my @type = {
back => "sticker",
front => "no sticker",
};
for (my $i=0; $i<$#batch; $i++{
for (my $j=0; $j<$#batch; $j++{
if ($batch[$i]=="health" && $case[$i]$j]=="pain"){
$type[$i][$j]->back = "checked";
}
}
}
In this short code I want to use @type as $type[$i][$j]->back and $type[$i][$j]->front, but I am getting an error saying the array reference is not defined. Can anyone help me fix this?
Perl two-dimensional arrays are just arrays of arrays: each element of the top level array contains a (reference to) another array. The best reference for this is perldoc perlreftut
From what I can understand, you want an array of arrays of hashes. $type[$i][$j]->back and $type[$i][$j]->front are method calls in Perl, and what you want is $type[$i][$j]{back} and $type[$i][$j]{front}.
use strict;
use warnings;
my @batch;
my @case;
# Populate @batch and @case
my @type;
for my $i (0 .. $#batch) {
for my $j (0 .. $#{ $case[$i] } ) {
if ($batch[$i] eq 'health' and $case[$i][$j] eq 'pain') {
$type[$i][$j]{back} = 'checked';
}
}
}
But I am very worried about your design. @type will be full of undefined elements, with only occasional ones set to checked. A proper fix depends entirely on what you need to do with @type once you have built it.
I hope this helps
Perl doesn't have multidimensional variables. To emulate multidimensional arrays, you can use what are called references. A reference is a way of referring to a memory location of another Perl structure such as an array or hash.
References allows you to build up more complex structures. For example, you could have an array and instead of each element in the array having a distinct value, it could point to another array. Using this, I can treat my array of arrays as a two dimensional array. But it's not a two dimensional array.
In a two dimensional array, each column ($j) has the same length. That's guaranteed. In Perl, what you have is each row ($i), pointing to a different array of columns ($j), and each of those column arrays could have a different number of elements (or even none at all! That inner array $j may not even be defined!).
Therefore, I have to check each column and see exactly how many values it might have:
for my $i ( 0..$#array ) {
if ( ref $array[$i] ne "ARRAY" ) {
die qq(There is no sub array! for \$array[$i]!\n);
}
my @temp_j_array = @{ $array[$i] }; # This is how you dereference a reference
for my $j ( 0..$#temp_j_array ) {
# Here be dragons...
}
}
Note that I have to see exactly how many columns are in my inner ($j) array before I can go through it.
By the way, notice how I use .. to index my arrays. It's a lot cleaner than using that three-part for loop, which is very error prone. For example, should you check $i < $#array or $i <= $#array? See the difference?
Since you're already dealing with a very complex structure (an array of arrays), I'm going to make it even more complex: an array of arrays of hashes. This added complexity allows me to get rid of three separate variables. Instead of trying to keep @batch, @case and @type in sync with each other, I can make these keys to my innermost hash:
my @structure = ... # Some sort of structure...
for my $i ( 0..$#structure ) {
my @temp = @{ $structure[$i] }; # This is a reference to an array. Dereference it.
for my $j ( 0..$#temp ) {
if ( $structure[$i]->[$j]->{batch} eq "health"
and $structure[$i]->[$j]->{case} eq "pain" ) {
$structure[$i]->[$j]->{back} = "checked";
}
}
}
This is a very common way to use Perl references to build more complex data structures:
my %employees; # Keyed by employee number:
$employees{1001}->{NAME} = "Bob";
$employees{1001}->{JOB} = "Yes man";
$employees{1002}->{NAME} = "Susan";
$employees{1002}->{JOB} = "sycophant";
You had some syntax errors, and were using the numeric comparison operator (==) instead of the string comparison operator (eq).

perl push many array in one array

I have many short arrays
@seq1 /773..1447/ @seq2 /1 2 1843..1881 1923..2001/
but when I use push
push(@add, @seq1);
push(@add, @seq2);
it combines all the arrays into one, and I can't get each sub-array any more
/773..1447 1 2 1843..1881 1923..2001/
When I use
$number=@add;
it shows 6, but it should be 2. Can anyone explain the reason and how to change it?
When I use a for loop to add each array
for(..){
@temp= split(/,/,$_);
push(@add, \@temp);
}
then when I print @add; it only shows memory addresses. How can I show all the data in @add?
This is normal behavior. Use a reference to @seq1 if you want @add to be a two-dimensional array,
push(@add, \@seq1);
To print all values in @add you should use Data::Dumper; print Dumper \@add;
The reason is that all parameters get flattened into a list when they are pushed into an array, so
@a = @b = (1,2);
push(@add, @a, @b);
is the same as writing
push(@add, $a[0],$a[1], $b[0],$b[1]);
Check perlref and perllol for reference.
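To make the difference concrete, here is a small sketch with hypothetical numbers (not the asker's sequences) that counts the elements after a flattening push and after a push of references, then dumps the nested result with Data::Dumper.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample arrays.
my @seq1 = (1, 2);
my @seq2 = (3, 4, 5, 6);

my @flat;
push @flat, @seq1, @seq2;        # both arrays are flattened into one list
my $flat_count = @flat;          # number of individual elements

my @nested;
push @nested, \@seq1, \@seq2;    # two array references
my $nested_count = @nested;      # number of sub-arrays

print "flat: $flat_count, nested: $nested_count\n";
print Dumper(\@nested);          # shows the contents, not memory addresses
```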
The push command takes an ARRAY and a LIST. It is important to understand what happens here.
The first argument must be an ARRAY, the one you want to push things on. After that it expects a LIST. What this means is that this push statement provides list context to any @seq_n array, and sort of expands the array into separate elements. So all the elements of @seq_n are being pushed onto your @add.
Since you did not want that to happen, and wanted an array that holds the separate lists (what we call in Perl a List-of-Lists), you actually wanted to push a reference to your @seq_n arrays, using the \ character.
push @add, \@seq_1, \@seq_2, . . . \@seq_n;
Now you have an array that indeed holds references to each @seq_n.
To print them neatly, each sequence on its own line, you could iterate over each
foreach my $seq (@add) {
# $seq holds a reference to a list!
my $string = join " ", @$seq; # the @ dereferences $seq
print $string, "\n";
}
but TIMTOWTDI
print map {(join " ", @$_), "\n"} @add;
Always consider the context in Perl, and try to embrace the charms of join, grep and map.

Comparison to an array of a value

I'm still feeling my way through Perl, so there's probably a simple way of doing this, but I can't find it. I want to compare a single value, say A or E, to an array that may or may not contain that value, e.g. A B C D, and then perform an action if they match. How should I set this up?
Thanks.
You filter each element of the array to see if it is the element you are looking for and then use the resulting array as a boolean value (not empty = true, empty = false):
@filtered_array = grep { $_ eq 'A' } @array;
if (@filtered_array) {
print "found it!\n";
}
If you store the list in an array then the only way is to examine each element individually in a loop, using grep, or for, or any from List::MoreUtils. (grep is the worst of these, as it searches the entire array, even if a match has been found early on.) This is fine if the array is small, but you will hit performance problems if the array has a significant size and you have to check it frequently.
You can speed things up by representing the same list in a hash, when a check for membership is just a single key lookup.
Alternatively, if the list is enormous, then it is best kept in a database, using SQLite.
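As a sketch of the short-circuiting search mentioned above: this example uses any from the core List::Util module (an assumption: it needs List::Util 1.33 or newer, which ships with Perl 5.20 and later) rather than List::MoreUtils. The sample values are the ones from the question.

```perl
use strict;
use warnings;
use List::Util qw(any);

my @array = qw(A B C D);

# any() stops at the first element for which the block is true,
# unlike grep, which always scans the whole array.
my $has_a = any { $_ eq 'A' } @array;
my $has_e = any { $_ eq 'E' } @array;

print "A found\n"     if $has_a;
print "E not found\n" unless $has_e;
```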
Are you stuck on arrays?
Whenever in Perl you're talking about quickly looking up data, you should think in terms of hashes. A hash is a collection of data like an array, but it is keyed, and looking up the key is a very fast operation in Perl.
There's nothing that says the keys to your hash can't be your data, and it is very common in Perl to index an array with a hash in order to quickly search for values.
This turns your array @array into a hash called %array_index.
use strict;
use warnings;
use feature qw(say);
use autodie;
my @array = qw(Alpha Beta Delta Gamma Ohm);
my %array_index;
for my $entry ( @array ) {
$array_index{$entry} = 1; # Doesn't matter. As long as it isn't blank or zero
}
Now, looking up whether or not your data is in your array is very quick. Just simply see if it's a key in your %array_index:
my $item = "Delta"; # Is this in my initial array?
if ( $array_index{$item} ) {
say "Yes! Item '$item' is in my array.";
}
else {
say "No. Item '$item' isn't in my array. David sad.";
}
This is so common, that you'll see a lot of programs that use the map command to index the array. Instead of that for loop, I could have done this:
my %array_index = ( map { $_ => 1 } @array );
or
my %array_index;
map { $array_index{$_} = 1 } @array;
You'll see both. The first one is a one-liner. The map command takes each entry in the array and puts it in $_. Then it returns the results as a list. Thus, the map will return a list with your data in the even positions (0, 2, 4, 6...) and a 1 in the odd positions (1, 3, 5...).
The second one is more literal and easier to understand (or about as easy to understand as a map command gets). Again, each item in your @array is being assigned to $_, and that is being used as the key in my %array_index hash.
Whether or not you want to use hashes depends upon the length of your array and how many items of input you'll be searching for. If you're simply searching whether a single item is in your array, I'd probably use List::Util or List::MoreUtils, or use a for loop to search each value of my array. If I am doing this for multiple values, I am better off with a hash.

How can I create multidimensional arrays in Perl?

I am a bit new to Perl, but here is what I want to do:
my @array2d;
while(<FILE>){
push(@array2d[$i], $_);
}
It doesn't compile since @array2d[$i] is not an array but a scalar value.
How should I declare @array2d as an array of array?
Of course, I have no idea of how many rows I have.
To make an array of arrays, or more accurately an array of arrayrefs, try something like this:
my @array = ();
foreach my $i ( 0 .. 10 ) {
foreach my $j ( 0 .. 10 ) {
push @{ $array[$i] }, $j;
}
}
It pushes the value onto a dereferenced arrayref for you. You should be able to access an entry like this:
print $array[3][2];
Change your "push" line to this:
push(@{$array2d[$i]}, $_);
You are basically making $array2d[$i] an array by surrounding it by the @{}... You are then able to push elements onto this array of array references.
Have a look at perlref and perldsc to see how to make nested data structures, like arrays of arrays and hashes of hashes. Very useful stuff when you're doing Perl.
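As a rough sketch of building an array of arrays while reading input line by line: the example below assumes comma-separated fields and uses an in-memory filehandle with made-up data in place of a real file.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for a real file here (hypothetical data).
my $text = "a,b,c\nd,e\nf\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @array2d;
while (my $line = <$fh>) {
    chomp $line;
    # Each row is an anonymous array of that line's fields.
    push @array2d, [ split /,/, $line ];
}
close $fh;

print "rows: ", scalar(@array2d), "\n";
print "row 1, column 1: $array2d[1][1]\n";
```

No row count is needed up front; the outer array grows with each push, and rows may have different lengths.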
There's really no difference between what you wrote and this:
@{$array2d[$i]} = <FILE>;
I can only assume you're iterating through files.
To avoid keeping track of a counter, you could do this:
...
push @array2d, [ <FILE> ];
...
That says: 1) create a reference to an anonymous array, 2) holding all the lines in FILE, and 3) push it onto @array2d.
Another simple way is to use a hash table and use the two array indices to make a hash key:
$two_dimensional_array{"$i $j"} = $val;
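A small sketch of this composite-key approach, with a hypothetical %grid hash and invented values. The indices can be recovered later by splitting the key.

```perl
use strict;
use warnings;

my %grid;

# Store each value under a composite "row col" key.
for my $i (0 .. 2) {
    for my $j (0 .. 2) {
        $grid{"$i $j"} = $i * 10 + $j;
    }
}

# Look up one cell directly.
my $val = $grid{"2 1"};
print "grid(2,1) = $val\n";

# Recover the indices from a key by splitting on whitespace.
my ($row, $col) = split ' ', "2 1";
print "row $row, column $col\n";
```

Only cells that were actually assigned take up space, which can suit sparse data, at the cost of losing the natural row ordering an array of arrays gives.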
If you're just trying to store a file in an array you can also do this:
open(FILE, "<", "somefile.txt") or die "Cannot open somefile.txt: $!";
@array = <FILE>;
close (FILE);