How to splice an array that is in a hash of arrays? - perl

I am populating a data structure like so:-
push @{$AvailTrackLocsTop{$VLayerName}}, $CurrentTrackLoc;
Where $VLayerName is a string like m1, m2, m3, etc., and $CurrentTrackLoc is simply a decimal number. If I use Data::Dumper to print the contents of the hash after it is fully populated, it reveals what I expect, e.g.:-
$VAR1 = {
'm11' => [
'0.228',
'0.316',
'0.402',
'0.576',
'0.750',
'569.458',
'569.544',
'569.718',
'569.892'
]
};
Now I need to effectively splice the stored list of decimal numbers. I can delete entries like so:-
for (my $i = $c; $i <= $endc; $i++) {
delete $AvailTrackLocsTop{$VLayerName}->[$i];
}
The result is, as expected, a bunch of "undef" entries where numbers used to exist, e.g.:-
$VAR1 = {
'm11' => [
undef,
undef,
undef,
undef,
'0.750',
'569.458',
'569.544',
'569.718',
'569.892'
]
};
But how can I purge the undef entries so that I see something like this instead?
$VAR1 = {
'm11' => [
'0.750',
'569.458',
'569.544',
'569.718',
'569.892'
]
};
It is important to note that the deletions can be anywhere in the array, e.g. like index 33 and 99 of 100. It is easy to splice arrays outside the context of a hash structure, but I am struggling to manipulate the array when it is embedded inside a large hash.

First, I want to note from the delete documentation:
WARNING: Calling delete on array values is strongly discouraged. The notion of deleting or checking the existence of Perl array elements is not conceptually coherent, and can lead to surprising behavior.
The correct way to set an array element to undef is with the undef function (or just assigning undef to it).
To instead remove the elements, you can use the splice function. It works the same way on nested arrayrefs as on a normal array; you just need to dereference the arrayref, as you did for push:
splice @{$AvailTrackLocsTop{$VLayerName}}, $c, $endc - $c + 1;
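The question also mentions deletions at scattered positions (e.g. indices 33 and 99). One way to handle that with splice, sketched below under the assumption that you have collected the indices to drop in a list (@to_remove and its contents are hypothetical), is to splice from the highest index down. That way an earlier removal never shifts the position of an index still waiting to be removed. The sample data is the question's Dumper output.

```perl
use strict;
use warnings;

# Sample data copied from the question's Dumper output
my %AvailTrackLocsTop = (
    m11 => [qw(0.228 0.316 0.402 0.576 0.750 569.458 569.544 569.718 569.892)],
);
my $VLayerName = 'm11';

# Hypothetical indices to remove; they do not have to be contiguous
my @to_remove = (1, 3, 7);

# Splice from the highest index down, so that each removal cannot shift
# the position of an index that is still waiting to be removed.
for my $i (sort { $b <=> $a } @to_remove) {
    splice @{ $AvailTrackLocsTop{$VLayerName} }, $i, 1;
}

print "@{ $AvailTrackLocsTop{$VLayerName} }\n";
# prints: 0.228 0.402 0.750 569.458 569.544 569.892
```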

Probably the easiest given where you're at is to rebuild the arrays without the undefs:
$_ = [ grep defined, @$_ ] for values %AvailTrackLocsTop;
Alternatively, instead of a hash of arrays, you could have a hash of hashes, and then deleting will cause them to disappear without simply turning to undef. You'll just lose the order, if that matters.
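A minimal sketch of the hash-of-hashes alternative, assuming each track location is stored as a key with a dummy value of 1 (that layout is an illustration, not taken from the question). Because the locations here are numbers, a numeric sort of the keys recovers the original order. One caveat: a hash cannot hold the same key twice, so duplicate locations in one layer would collapse into a single entry.

```perl
use strict;
use warnings;

# Hypothetical layout: layer name => { track location => 1 }
my %AvailTrackLocsTop;
for my $loc (qw(0.228 0.316 0.402 0.576 0.750 569.458)) {
    $AvailTrackLocsTop{m11}{$loc} = 1;
}

# delete on a hash entry really removes it; no undef placeholder is left
delete $AvailTrackLocsTop{m11}{$_} for qw(0.316 0.576);

# Hash keys come back in no particular order; sort numerically to restore it
my @remaining = sort { $a <=> $b } keys %{ $AvailTrackLocsTop{m11} };
print "@remaining\n";   # prints: 0.228 0.402 0.750 569.458
```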

Related

Why does perl insert an undef value into my hash?

Let me start off with a simple minimal example:
use strict;
use warnings;
use Data::Dumper;
my %hash;
$hash{count} = 4;
$hash{elems}[$_] = {} for (1..$hash{count});
print Dumper \%hash;
Here is the result (reformatted):
$VAR1 = {
'count' => 4,
'elems' => [undef, {}, {}, {}, {}]
};
I do not understand, why did the first element of $hash{elems} become an undef?
I know there are probably easier ways to do what I am doing, but I am creating these empty hashes so that I can later do my $e = $hash{elems}[$i] and continue to use $e to interact with the element, e.g. continue the horror of nested structures with $e->{subelems}[0] = 100.
Array indices start at 0 in Perl (and in most programming languages for that matter).
In the 1st iteration of $hash{elems}[$_] = {} for (1..$hash{count});, $_ is 1, and you thus put {} at index 1 of $hash{elems}.
Since you didn't put anything at index 0 of $hash{elems}, it contains undef.
To remedy this, you could use push instead of assigning to specific indices:
push @{$hash{elems}}, {} for 1 .. $hash{count};
push adds items at the end of its first argument. Initially, $hash{elems} is empty, so the end is the 1st index (0).
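A short sketch of both fixes side by side: push (which starts at index 0 on an empty array) and, for comparison, index assignment with the loop shifted to run from 0 to count - 1. The @by_index name is only for illustration.

```perl
use strict;
use warnings;

my %hash = (count => 4);

# push appends to the array behind $hash{elems}; the arrayref is
# autovivified by the first push, so the first hashref lands at index 0
push @{ $hash{elems} }, {} for 1 .. $hash{count};
print scalar @{ $hash{elems} }, "\n";   # prints 4 (indices 0..3)

# Alternative that keeps assigning by index: loop over 0 .. count-1
my @by_index;
$by_index[$_] = {} for 0 .. $hash{count} - 1;
print defined $by_index[0] ? "index 0 is set\n" : "index 0 is undef\n";
```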
Some tips:
The parentheses are not needed in for (1..$hash{count}): for 1 .. $hash{count} works just as well and looks a bit lighter.
You could initialize your hash when you declare it:
my %hash = (
count => 4,
elems => [ map { {} } 1 .. 4 ]
);
Initializing elems with an arrayref of hashrefs is often useless, thanks to autovivification. Simply doing $hash{elems}[0]{some_key} = 42 will create an arrayref in $hash{elems}, a hashref at index 0 in this array, containing the key some_key with value 42.
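To make the autovivification point concrete, here is a small sketch starting from an empty %hash. It also follows the question's pattern of keeping a reference $e to an element and writing through it.

```perl
use strict;
use warnings;

my %hash;

# Nothing exists under 'elems' yet; this single assignment autovivifies
# an arrayref in $hash{elems} and a hashref at index 0 of that array.
$hash{elems}[0]{some_key} = 42;

print ref $hash{elems}, "\n";            # ARRAY
print ref $hash{elems}[0], "\n";         # HASH
print $hash{elems}[0]{some_key}, "\n";   # 42

# Working through a reference to the element, as the question does with $e;
# $e->{subelems} is autovivified inside the same nested structure.
my $e = $hash{elems}[0];
$e->{subelems}[0] = 100;
print $hash{elems}[0]{subelems}[0], "\n";   # 100
```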
In some cases, though, your initialization could make sense: for instance, if you want to pass $hash{elems} (but not %hash) to a function (the same goes for passing $hash{elems}[..] to a function without passing $hash{elems}).

Is it possible to push a key-value pair directly to hash in perl?

I know pushing is only possible for arrays, not hashes. But it would be much more convenient to be able to push a key-value pair directly onto a hash (and I am still surprised it is not possible in Perl). Here is an example:
#!/usr/bin/perl -w
#superior words begin first, example of that word follow
my @ar = qw[Animals,dog Money,pound Jobs,doctor Food];
my %hash;
my $bool = 1;
sub marine{
my $ar = shift if $bool;
for(@$ar){
my @ar2 = split /,/, $_;
push %hash, ($ar2[0] => $ar2[1]);
}
}
marine(\@ar);
print "$_\n" for keys %hash;
Here I have an array whose elements are two words separated by a comma. I would like to make a hash from it, making the first word a key and the second a value (and if the value is missing, as with the last word, Food, then no value at all, simply undef). How can I do this in Perl?
Output:
Possible attempt to separate words with commas at ./a line 4.
Experimental push on scalar is now forbidden at ./a line 12, near ");"
Execution of ./a aborted due to compilation errors.
I might be oversimplifying things here, but why not simply assign to the hash rather than trying to push into it?
That is, replace this unsupported expression:
push %hash, ($ar2[0] => $ar2[1]);
With:
$hash{$ar2[0]} = $ar2[1];
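Put together, a minimal version of the question's program using plain assignment might look like the sketch below. It drops the marine sub and $bool for brevity. It uses quoted strings instead of qw[] to avoid the commas warning, and it adds a split limit of 2 (borrowed from the map answer) so extra commas stay in the value.

```perl
use strict;
use warnings;
use Data::Dumper;

# Quoted strings instead of qw[] avoid the "Possible attempt to separate
# words with commas" warning
my @ar = ('Animals,dog', 'Money,pound', 'Jobs,doctor', 'Food');

my %hash;
for my $pair (@ar) {
    # A limit of 2 keeps any extra commas in the value; for 'Food'
    # split returns one element, so $value is undef
    my ($key, $value) = split /,/, $pair, 2;
    $hash{$key} = $value;    # plain assignment instead of push
}

local $Data::Dumper::Sortkeys = 1;   # stable key order in the dump
print Dumper(\%hash);
```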
If I incorporate this in your code, and then dump the resulting hash at the end, I get:
$VAR1 = {
'Food' => undef,
'Money' => 'pound',
'Animals' => 'dog',
'Jobs' => 'doctor'
};
Split inside map and assign directly to a hash like so:
my @ar = qw[Animals,dog Money,pound Jobs,doctor Food];
my %hash_new = map {
my @a = split /,/, $_, 2;
@a == 2 ? @a : (@a, undef)
} @ar;
Note that this can also handle the case with more than one comma delimiter (hence splitting into a max of 2 elements). This can also handle the case with no commas, such as Food - in this case, the list with the single element plus the undef is returned.
If you need to push multiple key/value pairs to (another) hash, or merge hashes, you can assign a list of hashes like so:
%hash = (%hash_old, %hash_new);
Note that keys present in both hashes take their values from %hash_new, since later pairs in the list overwrite earlier ones.
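A small sketch of that overwrite rule, with made-up hash contents. It also shows an alternative for adding several pairs to an existing hash in place: a hash slice assignment, which relies on keys and values returning the same order for an unmodified hash.

```perl
use strict;
use warnings;

my %hash_old = (Animals => 'cat', Colour => 'red');
my %hash_new = (Animals => 'dog', Money => 'pound');

# Later pairs win: 'Animals' takes its value from %hash_new
my %hash = (%hash_old, %hash_new);

# Adding pairs to an existing hash in place with a hash slice
my %target = (Jobs => 'doctor');
@target{ keys %hash_new } = values %hash_new;
```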
We can assign this array to a hash, and Perl will automatically treat the values in the array as key-value pairs: the odd elements (first, third, fifth) become the keys and the even elements (second, fourth, sixth) become the corresponding values. See https://perlmaven.com/creating-hash-from-an-array
use strict;
use warnings;
use Data::Dumper qw(Dumper);
my @ar;
my %hash;
#The code in the enclosing block has warnings enabled,
#but the inner block has disabled (misc and qw) related warnings.
{
#You specified an odd number of elements to initialize a hash, which is odd,
#because hashes come in key/value pairs.
no warnings 'misc';
#If your code has use warnings turned on, as it should, then you'll get a warning about
#Possible attempt to separate words with commas
no warnings 'qw';
@ar = qw[Animals,dog Money,pound Jobs,doctor Food];
# join the content of array with comma => Animals,dog,Money,pound,Jobs,doctor,Food
# split the content using comma and assign to hash
# split function returns the list in list context, or the size of the list in scalar context.
%hash = split(",", (join(",", @ar)));
}
print Dumper(\%hash);
Output
$VAR1 = {
'Animals' => 'dog',
'Money' => 'pound',
'Jobs' => 'doctor',
'Food' => undef
};

Does iterating over a hash reference require implicitly copying it in perl?

Let's say I have a large hash and I want to iterate over its contents. The standard idiom would be something like this:
while(($key, $value) = each(%{$hash_ref})){
# do something
}
However, if I understand my Perl correctly, this is actually doing two things. First, the
%{$hash_ref}
is translating the ref into list context, thus returning something like
(key1, value1, key2, value2, key3, value3, etc.)
which will be stored in my stack memory. Then the each function will run, eating the first two values in memory (key1 & value1) and returning them to my while loop to process.
If my understanding is right, that means I have effectively copied my entire hash into stack memory only to iterate over the new copy, which could be expensive for a large hash: partly because the data is walked twice, and partly because of potential cache misses if both copies can't be held in memory at once. It seems pretty inefficient. Is this what really happens, or am I misunderstanding the actual behavior, or does the compiler optimize away the inefficiency for me?
Follow-up questions, assuming I am correct about the standard behavior:
Is there a syntax that avoids copying the hash by iterating over its values in the original hash? If not for a hash, is there one for the simpler array?
Does this mean that in the above example I could get inconsistent values between the copy of my hash and my actual hash if I modify the hash_ref content within my loop, resulting in $value having a different value than $hash_ref->{$key}?
No, the syntax you quote does not create a copy.
This expression:
%{$hash_ref}
is exactly equivalent to:
%$hash_ref
and assuming the $hash_ref scalar variable does indeed contain a reference to a hash, then adding the % on the front is simply 'dereferencing' the reference - i.e. it resolves to a value that represents the underlying hash (the thing that $hash_ref was pointing to).
If you look at the documentation for the each function, you'll see that it expects a hash as an argument. Putting the % on the front is how you provide a hash when what you have is a hashref.
If you wrote your own subroutine and passed a hash to it like this:
my_sub(%$hash_ref);
then on some level you could say that the hash had been 'copied', since inside the subroutine the special @_ array would contain a list of all the key/value pairs from the hash. However, even in that case, the elements of @_ that hold the values are aliases for the hash's values (the keys are passed as copies). You'd only actually get a copy of the values if you did something like: my @args = @_.
Perl's builtin each function is declared with the prototype '+' which effectively coerces a hash (or array) argument into a reference to the underlying data structure.
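As an aside, Perl 5.14 added experimental support for calling each directly on a hash reference, so that instead of:
($key, $value) = each(%{$hash_ref})
you could write:
($key, $value) = each($hash_ref)
This feature was removed in Perl 5.24, so on current versions you need the dereferencing form, each(%$hash_ref).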
No copy is created by each (though you do copy the returned values into $key and $value through assignment). The hash itself is passed to each.
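To illustrate the follow-up questions, here is a small sketch with made-up data. Writes through $hash_ref->{$key} inside an each loop change the original hash, while $value is just a copy. values, and foreach over an array, hand you aliases, so modifying $_ changes the underlying data. Per the each documentation, changing the values of existing keys during iteration is fine, but adding or deleting keys (other than the one just returned) is not.

```perl
use strict;
use warnings;

my $hash_ref = { a => 1, b => 2, c => 3 };

# each works on the hash behind the reference; nothing is flattened first.
# Writing through $hash_ref->{$key} updates the original hash.
while (my ($key, $value) = each %$hash_ref) {
    $hash_ref->{$key} = $value * 10;   # $value itself is only a copy
}

# values returns aliases, so this doubles every value in place
$_ *= 2 for values %$hash_ref;

# Arrays behave the same way: foreach aliases $_ to each element
my $array_ref = [1, 2, 3];
$_ += 1 for @$array_ref;
```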
each is a little special. It supports the following syntaxes:
each HASH
each ARRAY
As you can see, it doesn't accept an arbitrary expression. (That would be each EXPR or each LIST). The reason for that is to allow each(%foo) to pass the hash %foo itself to each rather than evaluating it in list context. each can do that because it's an operator, and operators can have their own parsing rules. However, you can do something similar with the \% prototype.
use Data::Dumper;
sub f { print(Dumper(@_)); }
sub g(\%) { print(Dumper(@_)); } # Similar to each
my %h = (a=>1, b=>2);
f(%h); # Evaluates %h in list context.
print("\n");
g(%h); # Passes a reference to %h.
Output:
$VAR1 = 'a'; # 4 args, the keys and values of the hash
$VAR2 = 1;
$VAR3 = 'b';
$VAR4 = 2;
$VAR1 = { # 1 arg, a reference to the hash
'a' => 1,
'b' => 2
};
%{$h_ref} is the same as %h, so all of the above applies to %{$h_ref} too.
Note that the hash isn't copied even if it is flattened. The keys are "copied", but the values are returned directly.
use Data::Dumper;
my %h = (abc=>"def", ghi=>"jkl");
print(Dumper(\%h));
$_ = uc($_) for %h;
print(Dumper(\%h));
Output:
$VAR1 = {
'abc' => 'def',
'ghi' => 'jkl'
};
$VAR1 = {
'abc' => 'DEF',
'ghi' => 'JKL'
};
You can read more about this here.

Why can I use @list to call an array, but can't use %dict to call a hash in Perl? [duplicate]

This question already has answers here:
Why do you need $ when accessing array and hash elements in Perl?
Today I started my Perl journey, and now I'm exploring the data types.
My code looks like:
@list=(1,2,3,4,5);
%dict=(1,2,3,4,5);
print "$list[0]\n"; # using [ ] to wrap index
print "$dict{1}\n"; # using { } to wrap key
print "@list[2]\n";
print "%dict{2}\n";
It seems $ + var_name works for both arrays and hashes, but while @ + var_name can be used to call an array, % + var_name can't be used to call a hash.
Why?
@list[2] works because it is a slice of an array.
In Perl 5, a sigil indicates (in a non-technical sense) the context of your expression. Except for some of the non-standard behavior that slices have in scalar context, the basic idea is that the sigil represents what you want to get out of the expression.
If you want a scalar out of a hash, it's $hash{key}.
If you want a scalar out of an array, it's $array[0]. However, Perl allows you to get slices of the aggregates, which lets you retrieve more than one value in a compact expression. Slices take a list of indexes. So,
@list = @hash{ qw<key1 key2> };
gives you a list of items from the hash. And,
@list2 = @list[0..3];
gives you the first four items from the array. For your case, @list[2] still has a "list" of indexes; it's just that the list is the special case of a "list of one".
As scalar and list contexts were rather well defined, and there was no "hash context", the convention stayed pretty stable at $ for scalars and @ for "lists", and until recently Perl did not support addressing any variable with %. So neither %hash{@keys} nor %hash{key} had meaning. Now, however, you can get pairs of indexes and values by putting the % sigil on the front.
my %hash = qw<a 1 b 2>;
my @list = %hash{ qw<a b> }; # yields ( 'a', 1, 'b', 2 )
my @l2 = %list[0..2]; # yields ( 0, 'a', 1, '1', 2, 'b' )
So if you have a version of Perl older than 5.20, you can't, but with 5.20 or later, you can.
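A runnable version of those key/value slices, assuming Perl 5.20 or later. The %subset line is an added illustration of a common practical use: copying just some keys into a new hash.

```perl
use strict;
use warnings;
use v5.20;   # key/value slices need Perl 5.20 or later

my %hash = qw<a 1 b 2>;
my @list = %hash{ qw<a b> };   # ('a', 1, 'b', 2), in the order of the slice
my @l2   = %list[0..2];        # (0, 'a', 1, '1', 2, 'b')

# A practical use: copy just some keys into a new hash
my %subset = %hash{'a'};       # (a => 1)
```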
But for completeness' sake, slices work in a non-intuitive way in scalar context. Evaluating an array in scalar context gives its element count, so if a slice behaved the same way, this would always hold:
( $item = @hash{ @keys } ) == scalar @keys;
which would make the expression:
$item = @hash{ @keys };
no more valuable than:
scalar @keys;
So Perl treats it like the expression:
$s = ( $hash{$keys[0]}, $hash{$keys[1]}, ... , $hash{$keys[$#keys]} );
And when a comma-delimited list is evaluated in scalar context, it yields the last element. So it really ends up that
$item = @hash{ @keys };
is equivalent to:
$item = $hash{ $keys[-1] };
But it makes writing something like this:
$item = @hash{ source1(), source2(), @array3, $banana, ( map { "$_" } source4() ) };
slightly easier than writing:
$item = $hash{ [ source1(), source2(), @array3, $banana, ( map { "$_" } source4() ) ]->[-1] };
But only slightly.
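A quick sketch of that scalar-context behavior with made-up data. It contrasts the slice's last-element result with the common "count of" idiom, which assigns to an empty list first.

```perl
use strict;
use warnings;

my %hash = (x => 'first', y => 'second', z => 'last');
my @keys = qw(x y z);

# In scalar context a slice gives its last element, not a count
my $item = @hash{@keys};
print "$item\n";    # prints: last

# To count the elements instead, assign to an empty list first
my $count = () = @hash{@keys};
print "$count\n";   # prints: 3
```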
Arrays are interpolated within double quotes, so you see the actual contents of the array printed.
On the other hand, %dict{1} works (in Perl 5.20 or later) but is not interpolated within double quotes. So something like my %partial_dict = %dict{1,3} is valid and does what you expect, i.e. %partial_dict will now contain (1,2,3,4). But "%dict{1,3}" (in quotes) will still be printed literally as %dict{1,3}.
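A small sketch of the interpolation difference, assuming Perl 5.20 or later for the %dict{...} slice. Here %dict is given an even number of elements (unlike the question's list) to avoid the odd-number warning. The join/map line is just one illustrative way to put a hash's contents into a string.

```perl
use strict;
use warnings;
use v5.20;   # the %dict{...} key/value slice needs Perl 5.20 or later

my @list = (1, 2, 3, 4, 5);
my %dict = (1, 2, 3, 4, 5, 6);   # even-sized, unlike the question's list

my $array_text = "@list[2,3]";   # array slices interpolate: "3 4"
my $hash_text  = "%dict{1,3}";   # % never interpolates: stays "%dict{1,3}"

my %partial_dict = %dict{1,3};   # outside quotes: (1 => 2, 3 => 4)

# One way to show a hash inside a string: build the text explicitly
my $shown = join ', ', map { "$_ => $partial_dict{$_}" } sort keys %partial_dict;
print "$shown\n";   # prints: 1 => 2, 3 => 4
```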
Perl Cookbook has some tips on printing hashes.

Perl: Beginner. Which data structure should I use?

Okay, not sure where to ask this, but I'm a beginner programmer using Perl. I need to create an array of arrays, but I'm not sure whether it would be better to use array/hash references, an array of hashes, a hash of arrays, etc.
I need an array of matches: @totalmatches
Each match contains 6 elements (strings):
@matches = ($chapternumber, $sentencenumber, $sentence, $grammar_relation, $argument1, $argument2)
I need to push each of these elements into the @matches array/hash/reference, and then push that array/hash/reference into the @totalmatches array.
The matches are found based on searching a file and selecting the strings based on meeting the criteria.
QUESTIONS
Which data structure would you use?
Can you push an array into another array, as you would push an element into an array? Is this an efficient method?
Can you push all 6 elements simultaneously, or have to do 6 separate pushes?
When working with 2-D, to loop through would you use:
foreach (@totalmatches) {
foreach (@matches) {
...
}
}
Thanks for any advice.
Which data structure would you use?
An array for an ordered set of things. A hash for a set of named things.
Can you push an array into another array, as you would push an element into an array? Is this an efficient method?
If you try to push an array (1) into an array (2), you'll end up pushing all the elements of 1 into 2. That is why you would push an array ref in instead.
Can you push all 6 elements simultaneously, or have to do 6 separate pushes?
Look at perldoc -f push
push ARRAY,LIST
You can push a list of things in.
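For the six fields, one option is to build an anonymous array with [ ... ] and push that single reference. Everything goes in with one push, and the match stays together as one row. The field values below are hypothetical placeholders; the @flat part only shows the flattening that happens when a plain list is pushed.

```perl
use strict;
use warnings;

my @totalmatches;

# Hypothetical values standing in for one match found in the file
my ($chapternumber, $sentencenumber, $sentence, $grammar_relation, $argument1, $argument2)
    = (3, 12, 'The cat sat.', 'nsubj', 'sat', 'cat');

# [ ... ] builds an anonymous array from all six values at once, and a
# single push adds that one reference to @totalmatches
push @totalmatches, [ $chapternumber, $sentencenumber, $sentence,
                      $grammar_relation, $argument1, $argument2 ];

# For contrast: pushing a plain list appends all six scalars individually
my @flat;
push @flat, $chapternumber, $sentencenumber, $sentence,
            $grammar_relation, $argument1, $argument2;
```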
When working with 2-D, to loop through would you use:
Nested foreach is fine, but that syntax wouldn't work. You have to access the values you are dealing with.
for my $arrayref (@outer) {
for my $item (@$arrayref) {
$item ...
}
}
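A filled-in version of that nested loop, with hypothetical rows of match data. It also shows direct access to one element with $outer[$row][$col].

```perl
use strict;
use warnings;

# Hypothetical 2-D data: each inner arrayref is one match
my @outer = (
    [1, 4, 'First sentence.',  'nsubj', 'sat', 'cat'],
    [2, 7, 'Second sentence.', 'dobj',  'ate', 'fish'],
);

my @lines;
for my $arrayref (@outer) {
    # @$arrayref dereferences one row; $item is aliased to each element
    my @fields;
    for my $item (@$arrayref) {
        push @fields, $item;
    }
    push @lines, join ' | ', @fields;
}
print "$_\n" for @lines;

# A single element can also be reached directly: row 1, column 2
print $outer[1][2], "\n";   # Second sentence.
```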
Do not push one array into another array.
Lists are flattened, so the elements simply join into one longer list.
Use a list of references instead.
#create an anonymous hash ref for each match
$one_match_ref = {
chapternumber => $chapternumber_value,
sentencenumber => $sentencenumber_value,
sentence => $sentence_value,
grammar_relation => $grammar_relation_value,
arg1 => $argument1,
arg2 => $argument2
};
# add the reference of match into array.
push @all_matches, $one_match_ref;
# list of keys of interest
@keys = qw(chapternumber sentencenumber sentence grammar_relation arg1 arg2);
# walk through all the matches.
foreach $ref (@all_matches) {
foreach $key (@keys) {
$val = $$ref{$key};
}
# or pick up some specific keys
my $arg1 = $$ref{arg1};
}
Which data structure would you use?
An array... I can't really justify that choice, but I can't imagine what you would use as keys if you used a hash.
Can you push an array into another array, as you would push an element into an array? Is this an efficient method?
Here's the thing: in Perl, arrays can only contain scalars (the kind of value held in variables that start with $). Something like...
@matrix = ();
@row = ();
$matrix[0] = @row; # FAIL! (stores the number of elements in @row)
... won't work. You will have to use a reference to the array instead:
@matrix = ();
@row = ();
$matrix[0] = \@row;
Or equally:
push(@matrix, \@row);
Can you push all 6 elements simultaneously, or have to do 6 separate pushes?
If you use references, you need only push once... and since you don't want to concatenate arrays (you need an array of arrays) you're stuck with no alternatives ;)
When working with 2-D, to loop through would you use:
I'd use something like:
for($i=0; $i<@matrix; $i++) {
@row = @{$matrix[$i]}; # de-reference
for($j=0; $j<@row; $j++) {
print "| $row[$j] ";
}
print "|\n";
}
Which data structure would you use?
Some fundamental container properties:
An array is a container for ordered scalars.
A hash is a container for scalars looked up by a unique key (there can be no duplicate keys in a hash). Hashes do not preserve the order in which values were added.
I would use the same structure as ZhangChn proposed.
Use a hash for each match.
The details of the match then can be accessed by descriptive names instead of plain numerical indices. i.e. $ref->{'chapternumber'} instead of $matches[0].
Take references of these anonymous hashes (which are scalars) and push them into an array in order to preserve the order of the matches.
To dereference items from the data structure
get an item from the array which is a hash reference
retrieve any matching detail you need from the hash reference
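The two steps above can be sketched as follows, with hypothetical match data. Each match is an anonymous hash pushed onto @all_matches in order. The loop then takes a hash reference from the array and reads fields by name.

```perl
use strict;
use warnings;

my @all_matches;

# One hashref per match; the field names follow the answers above
push @all_matches, {
    chapternumber    => 1,
    sentencenumber   => 4,
    sentence         => 'First sentence.',
    grammar_relation => 'nsubj',
    arg1             => 'sat',
    arg2             => 'cat',
};
push @all_matches, {
    chapternumber    => 2,
    sentencenumber   => 7,
    sentence         => 'Second sentence.',
    grammar_relation => 'dobj',
    arg1             => 'ate',
    arg2             => 'fish',
};

for my $ref (@all_matches) {              # get a hash reference from the array
    my $chapter = $ref->{chapternumber};  # retrieve the details by name
    my $arg1    = $ref->{arg1};
    print "chapter $chapter: $arg1\n";
}
```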