Using Delete in Perl - perl

I'm creating a graph data structure that essentially contains an array of nodes (and an edgeList with extra information). I also have a hash that lets me quickly get a reference to a particular node by its name. Suppose I now want to implement a removeNode() function in the graph class: how can I delete a node quickly? Let's say the function takes the name of a node, and I hash directly to it (and so have a reference to that node). delete takes an array or hash element as its argument, but within the array I want to delete the object that I have a reference to.
Any ideas?

I'm not clear on exactly what you're trying to do. If you just want to remove an item from a hash, delete $hash{$key}; is all you need.
If you want to remove an item from an array, and not leave that index undefined, then you can use splice @array, $index, 1; which will remove the item and shift everything after it down one spot.
If you want to just remove an element from an array but leave the rest of the list alone, then you can just undefine it: $array[$index] = undef;
That's roughly what delete $array[$index] does, but calling delete on array elements is strongly discouraged.
Edit:
If you need to find an object in an array and then delete it, the best way is to use firstidx from List::MoreUtils, e.g.
use List::MoreUtils 'firstidx';
my $obj = get_object_to_delete();
my $index = firstidx { $_ eq $obj } @array;
splice @array, $index, 1 if $index >= 0; # firstidx returns -1 when nothing matches
This assumes the objects stringify to something suitable for comparing for equality. If they have stringification overloaded, use something like refaddr from Scalar::Util to get the numeric reference address directly.
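For the graph case in the question, a minimal sketch might look like the following. The layout (a nodes array plus a by_name hash) and the names remove_node and $graph are assumptions made for illustration, not the asker's actual class. It compares nodes with refaddr and uses only core modules, so a plain loop stands in for firstidx.

```perl
use strict;
use warnings;
use Scalar::Util 'refaddr';

# Hypothetical graph layout: an array of node hashrefs plus a name => node index.
sub remove_node {
    my ($graph, $name) = @_;

    # Drop the name from the lookup hash; delete returns the removed value.
    my $node = delete $graph->{by_name}{$name};
    return unless defined $node;

    # Find the position of that exact reference in the node array.
    my $nodes = $graph->{nodes};
    for my $i (0 .. $#$nodes) {
        next unless refaddr($nodes->[$i]) == refaddr($node);
        splice @$nodes, $i, 1;    # remove it and close the gap
        last;
    }
    return $node;
}

my @nodes = map { +{ name => $_ } } qw(a b c);
my $graph = {
    nodes   => \@nodes,
    by_name => { map { $_->{name} => $_ } @nodes },
};

remove_node($graph, 'b');
print join(',', map { $_->{name} } @{ $graph->{nodes} }), "\n";    # a,c
```

The hash lookup is constant time, but finding the node in the array is still a linear scan. If removal has to be fast on large graphs, storing the nodes only in the hash avoids the scan entirely.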

Related

(Perl) How to add new elements in hash, and print the new elements on top?

I know how to add new elements to the hash, but I need to prioritise the new elements and print them first.
I found that the hash is printed in a consistent sequence rather than randomly, and when I add a new element it appears at the bottom. Is there any way to add new elements directly at the top of the hash,
or, after adding new elements, to have them shown first? (All the elements in the hash should be displayed, but the new ones should be on top.)
No. Hashes are unordered in Perl. If you have observed that keys seem to come out in a particular order that is a coincidence. You cannot rely on them always coming out in that order.
There are ways you can prevent this native behaviour of Perl, but it's not a good idea to do so. If you want order, you need an array. You can store the keys of the hash in an array to preserve the order that you like. For example:
$hash{$key} = $foo;
push @keys, $key; # store key in array
In this array, the new keys will be at the end, so to get the newest key you would do:
my $newest = pop @keys; # get newest key
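Putting the two pieces together, a small sketch of "print the newest first" could look like this. The helper names add_item and newest_first are made up for the example, and it assumes a key should keep its original position when its value is overwritten.

```perl
use strict;
use warnings;

my (%hash, @keys);

# Hypothetical helper: store a value and remember when its key was first seen.
sub add_item {
    my ($key, $value) = @_;
    push @keys, $key unless exists $hash{$key};
    $hash{$key} = $value;
    return;
}

# Keys from newest to oldest.
sub newest_first { return reverse @keys }

add_item(a => 1);
add_item(b => 2);
add_item(c => 3);

print "$_: $hash{$_}\n" for newest_first();
```

Note that pop removes the key from the array; to look at the newest key without removing it, $keys[-1] does the job.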
There's at least one module on CPAN, Tie::IxHash, that provides ordered array-like access and hash-like random lookup by key:
This Perl module implements Perl hashes that preserve the order in which the hash elements were added. The order is not affected when values corresponding to existing keys in the IxHash are changed. The elements can also be set to any arbitrary supplied order. The familiar perl array operations can also be performed on the IxHash.
Example script:
#!/usr/bin/env perl
use strict;
use warnings;
use feature qw/say/;
use Tie::IxHash;
my $t = tie my %hash, 'Tie::IxHash', a => 1, b => 2, c => 3;
$hash{'last'} = 4; # Will appear at the end of the elements
$t->Unshift(first => 0); # Will appear at the beginning of the elements
while (my ($k, $v) = each %hash) {
say "$k: $v";
}
and output:
first: 0
a: 1
b: 2
c: 3
last: 4

Perl: reference to a hash to pass to another routine

In code that uploads excel spreadsheets it gives me the data in array ref:
for( @{$listref} ){...
I access it with $_->{'whateverthehashkeyis'} and have no problem.
What I need to do is pass the hash I am accessing in the current iteration of the loop to another subroutine.
This is where I am having problems. I have tried different things with no luck.
This DOES NOT work, but it should be an example of what I need to do
%args = @{$_};
$results = &format_trading_card_preview_item(\%args);
....
sub format_trading_card_preview_item
{
my %args = shift;
I think what I need to do is dereference the hash to send it over. Is that right?
Thanks in advance for any help
It looks like $listref is a reference to an array of hash references.
If you need to use the variable holding the hash references then it is better if you name that variable instead of using the default scalar $_
There is also no point in dereferencing the hash and copying it to %args, only to take a reference to that hash and pass it as a parameter to your subroutine
And it is wrong to call a subroutine with an ampersand & character, and has been so ever since Perl v5.5 landed over seventeen years ago
Your loop should look like this
for my $item ( @$listref ) {
format_trading_card_preview_item($item);
}
Within the subroutine, it depends a lot on what you want to do with the hash passed in, but you don't say anything about that, so it's probably best to leave it as a reference and write
sub format_trading_card_preview_item {
my ($item) = @_;
...
}
or you could use the statement modifier form of for, like this
format_trading_card_preview_item($_) for @$listref;
To answer your question, you don't need to dereference the hash reference in order to pass it to another subroutine. Creating a shallow copy and then taking a reference to that new hash is inefficient, but it would technically work just fine.
However, your problem is that you're confusing hashes and arrays by using the syntax to dereference an array reference on something that is actually a hash reference. In fact, you should have gotten an error message basically saying the same thing:
Not an ARRAY reference at foo.pl line ...
What you actually want to do is something like this:
for my $href (@$listref) { # variable names could be better
# do something
my $results = format_trading_card_preview_item($href);
# do something else
}
sub format_trading_card_preview_item {
my $args = shift;
print $args->{foo};
return 42;
}
Check out perlreftut and perlref for more information on Perl references and nested data structures.
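If the subroutine really does want its own hash rather than a reference, one way to sketch it is to dereference inside the sub. The name format_item and the sample rows below are invented for illustration; only the shape (an array reference of hash references) comes from the question.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the asker's subroutine.
sub format_item {
    my ($item) = @_;      # $item is a hash reference
    my %args = %$item;    # optional shallow copy; edits to %args stay local
    $args{name} = uc $args{name};
    return "$args{name} ($args{year})";
}

# Made-up rows in the same shape as the spreadsheet data.
my $listref = [
    { name => 'alpha', year => 1990 },
    { name => 'beta',  year => 1991 },
];

print format_item($_), "\n" for @$listref;
```

Working through $item->{name} directly avoids the copy, but then any assignment inside the sub changes the caller's data.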

How to change an array into a hashtable?

I'm trying to make a program where I read in a file with a bunch of text in it. I then take punctuation out and then I read in a file that has stop words in it. Both get read in and put into arrays. I'm trying to put the array of the general text file and put it in a hash. I'm not really sure what I'm doing wrong, but I'm trying. I want to do this so I can generate stats on how many words are repeated and what not, but I have to take out stop words and such.
Anyway here is what I have so far I put a comment #WORKING ON MERGING ARRAY INTO HASH that is where I'm working at. I don't think the way I'm trying to put the array into the hash is right, but I looked online and the %hash{array} = "value"; doesn't compile. so not sure how else to do it.
Thanks, if you have any questions for me I will respond back quickly.
#!/usr/bin/perl
use strict;
use warnings;
#Reading in the text file
my $file0="data.txt";
open(my $filehandle0,'<', $file0) || die "Could not open $file0\n";
my @words;
while (my $line = <$filehandle0>){
chomp $line;
my @word = split(/\s+/, $line);
push(@words, @word);
}
for (@words) {
s/[\,|\.|\!|\?|\:|\;]//g;
}
my %words_count; #The code I was told to add in this post.
$words_count{$_}++ for @words;
Next I read in the stop words I have in another array.
#Reading in the stopwords file
my $file1 = "stoplist.txt";
open(my $filehandle1, '<',$file1) or die "Could not open $file1\n";
my @stopwords;
while(my $line = <$filehandle1>){
chomp $line;
my @linearray = split(" ", $line);
push(@stopwords, @linearray);
}
for my $w (my @stopwords) {
s/\b\Q$w\E\B//ig;
}
Some notes about hashes in Perl... Problem description:
Anyway here is what I have so far I put a comment #WORKING ON MERGING ARRAY INTO HASH that is where I'm working at. I don't think the way I'm trying to put the array into the hash is right, but I looked online and the %hash{array} = "value"; doesn't compile. so not sure how else to do it.
At first, ask yourself why you want to "put the array into the hash". An array represents a list of values while a hash represents a set of key-value pairs. So you have to define what keys and values should be. Not only for us, but for you. It often helps to explain even simple things to get a better understanding.
In this case, you may want to count how often a given word $word occurred in your @words array. This could be done by iterating over all words and increasing $count{$word} by one each time. This is what @raina77ow did in his answer. The important point here is that you're accessing single hash values, which are represented with the scalar sigil $ in Perl. So if you have a hash named %count, you can increase the value for the key 'foo' by
$count{foo}++;
Your result of "online looking" above (%hash{array} = "value") doesn't make sense. There are three valid ways to store values in a hash:
set all key-value pairs by assigning an even-sized list to the whole hash:
%count = (hello => 42, world => 17);
set a single value for a given key by assigning a single value for a defined key (this is what we did before):
$count{hello} = 42;
set a list of values for a given list of keys using a so-called hash slice:
@count{qw(hello world)} = (42, 17);
Note the use of sigils here: % for a hashy even-sized list of keys and values mixed, $ for single (scalar) values and @ for lists of values. In your example you're using %, but define an array in the key braces {...} and assign a single scalar value.
Well, if you have a list of words in the @words array, and want to get a hash where each key is a specific word and each value is the number of times that word appears in the source array, it's as simple as...
my %words_count;
$words_count{$_}++ for @words;
In other words (no pun intended), you iterate over the @words array, for each member increasing by 1 the corresponding element of the %words_count hash OR, when that element is not yet defined, essentially creating it with value 1 (so-called auto-vivification).
As a sidenote, calling keys function on arrays is close to meaningless: in 5.12+ it'll give you the list of indexes used instead, and before that, throw a syntax error at you.
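To connect the counting idiom with the stop-word list from the question, here is one possible sketch. The sub name count_words and the sample words are assumptions for the example; it uses a hash slice to build a lookup of stop words and skips those while counting.

```perl
use strict;
use warnings;

# Count words, skipping any that appear in the stop list.
sub count_words {
    my ($words, $stopwords) = @_;

    my %is_stop;
    @is_stop{ map { lc } @$stopwords } = ();    # hash slice: keys only

    my %count;
    for my $word (@$words) {
        my $w = lc $word;
        $w =~ s/[,.!?:;]//g;                    # strip punctuation
        next if $w eq '' or exists $is_stop{$w};
        $count{$w}++;
    }
    return \%count;
}

my $count = count_words([qw(The cat saw the other cat.)], [qw(the a an)]);
print "$_: $count->{$_}\n" for sort keys %$count;
```

Checking exists on a hash of stop words is a single lookup per word, which scales better than running one regex substitution per stop word over the whole text.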

Is there any advantage to using keys @array instead of 0 .. $#array?

I was quite surprised to find that the keys function happily works with arrays:
keys HASH
keys ARRAY
keys EXPR
Returns a list consisting of all the keys of the named hash, or the
indices of an array. (In scalar context, returns the number of keys or
indices.)
Is there any benefit in using keys @array instead of 0 .. $#array with respect to memory usage, speed, etc., or are the reasons for this functionality more of a historic origin?
Seeing that keys @array holds up to $[ modification, I'm guessing it's historic :
$ perl -Mstrict -wE 'local $[=4; my @array="a".."z"; say join ",", keys @array;'
Use of assignment to $[ is deprecated at -e line 1.
4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29
Mark has it partly right, I think. What he's missing is that each now works on an array, and, like the each with hashes, each with arrays returns two items on each call. Where each %hash returns key and value, each @array also returns key (index) and value.
while (my ($idx, $val) = each @array)
{
if ($idx > 0 && $array[$idx-1] eq $val)
{
print "Duplicate indexes: ", $idx-1, "/", $idx, "\n";
}
}
Thanks to Zaid for asking, and jmcnamara for bringing it up on perlmonks' CB. I didn't see this before - I've often looped through an array and wanted to know what index I'm at. This is waaaay better than manually manipulating some $i variable created outside of a loop and incremented inside, as I expect that continue, redo, etc., will survive this better.
So, because we can now use each on arrays, we need to be able to reset that iterator, and thus we have keys.
The link you provided actually has one important reason you might use/not use keys:
As a side effect, calling keys() resets the internal iterator of the HASH or ARRAY (see each). In particular, calling keys() in void context resets the iterator with no other overhead.
That would cause each to reset to the beginning of the array. Using keys and each with arrays might be important if they ever natively support sparse arrays as a real data-type.
All that said, with so many array-aware language constructs like foreach and join in perl, I can't remember the last time I used 0..$#array.
I actually think you've answered your own question: it returns the valid indices of the array, no matter what value you've set for $[. So from a generality point of view (especially for library usage), it's more preferred.
The version of Perl I have (5.10.1) doesn't support using keys with arrays, so it can't be for historic reasons.
Well in your example, you are putting them in a list; So, in a list context
keys @array will be replaced with all indices of the array
whereas 0 .. $#array produces the same list as a range, which can also be used for array slicing; so instead of @array[0 .. $#array] you can also write @array[0 .. (some specific index)]
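A short sketch comparing the forms discussed in this thread, assuming Perl 5.12 or later (where keys and each accept arrays). The helper adjacent_duplicates is a made-up name that wraps the each loop shown earlier.

```perl
use strict;
use warnings;

my @array = qw(a b b c);

my @from_keys  = keys @array;     # indices, Perl 5.12+
my @from_range = 0 .. $#array;    # same list on any Perl 5

# Hypothetical helper: indices whose value equals the previous element.
sub adjacent_duplicates {
    my @list = @_;
    my @hits;
    keys @list;                   # void context: just resets the each iterator
    while (my ($idx, $val) = each @list) {
        push @hits, $idx if $idx > 0 && $list[$idx - 1] eq $val;
    }
    return @hits;
}

print "keys:  @from_keys\n";
print "range: @from_range\n";
print "dups:  ", join(',', adjacent_duplicates(@array)), "\n";
```

With the default index base both forms give the same list, so the practical difference is mostly the iterator reset that keys performs.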

Are Perl subroutines call-by-reference or call-by-value?

I'm trying to figure out Perl subroutines and how they work.
From perlsub I understand that subroutines are call-by-reference and that an assignment (like my(@copy) = @_;) is needed to turn them into call-by-value.
In the following, I see that change is called-by-reference because "a" and "b" are changed into "x" and "y". But I'm confused about why the array isn't extended with an extra element "z"?
use strict;
use Data::Dumper;
my @a = ( "a" ,"b" );
change(@a);
print Dumper(\@a);
sub change
{
@_[0] = "x";
@_[1] = "y";
@_[2] = "z";
}
Output:
$VAR1 = [
'x',
'y'
];
In the following, I pass a hash instead of an array. Why isn't the key changed from "a" to "x"?
use strict;
use Data::Dumper;
my %a = ( "a" => "b" );
change(%a);
print Dumper(\%a);
sub change
{
@_[0] = "x";
@_[1] = "y";
}
Output:
$VAR1 = {
'a' => 'y'
};
I know the real solution is to pass the array or hash by reference using \@, but I'd like to understand the behaviour of these programs exactly.
Perl always passes by reference. It's just that sometimes the caller passes temporary scalars.
The first thing you have to realise is that the arguments of subs can be one and only one thing: a list of scalars.* One cannot pass arrays or hashes to them. Arrays and hashes are evaluated, returning a list of their content. That means that
f(@a)
is the same** as
f($a[0], $a[1], $a[2])
Perl passes by reference. Specifically, Perl aliases each of the arguments to the elements of @_. Modifying the elements of @_ will change the scalars returned by $a[0], etc. and thus will modify the elements of @a.
The second thing of importance is that the key of an array or hash element determines where the element is stored in the structure. Otherwise, $a[4] and $h{k} would require looking at each element of the array or hash to find the desired value. This means that the keys aren't modifiable. Moving a value requires creating a new element with the new key and deleting the element at the old key.
As such, whenever you get the keys of an array or hash, you get a copy of the keys. Fresh scalars, so to speak.
Back to the question,
f(%h)
is the same** as
f(
my $k1 = "a", $h{a},
my $k2 = "b", $h{b},
my $k3 = "c", $h{c},
)
@_ is still aliased to the values returned by %h, but some of those are just temporary scalars used to hold a key. Changing those will have no lasting effect.
* — Some built-ins (e.g. grep) are more like flow control statements (e.g. while). They have their own parsing rules, and thus aren't limited to the conventional model of a sub.
** — Prototypes can affect how the argument list is evaluated, but it will still result in a list of scalars.
Perl's subroutines accept parameters as flat lists of scalars. An array passed as a parameter is for all practical purposes a flat list too. Even a hash is treated as a flat list of one key followed by one value, followed by one key, etc.
A flat list is not passed as a reference unless you do so explicitly. The fact that modifying $_[0] modifies $a[0] is because the elements of @_ become aliases for the elements passed as parameters. Modifying $_[0] is the same as modifying $a[0] in your example. But while this is approximately similar to the common notion of "pass by reference" as it applies to any programming language, this isn't specifically passing a Perl reference; Perl's references are different (and indeed "reference" is an overloaded term). An alias (in Perl) is a synonym for something, whereas a reference is similar to a pointer to something.
As perlsub states, if you assign to @_ as a whole, you break its alias status. Also note, if you try to modify $_[0], and $_[0] happens to be a literal instead of a variable, you'll get an error. On the other hand, modifying $_[0] does modify the caller's value if it is modifiable. So in example one, changing $_[0] and $_[1] propagates back to @a because each element of @_ is an alias for each element in @a.
Your second example is a little tricky. Hash keys are immutable. Perl doesn't provide a way to modify a hash key, aside from deleting it. That means $_[0] cannot be an alias for the key; it is a temporary copy. When you modify $_[0], only that copy changes and the hash is untouched. It probably ought to throw a warning, but doesn't. You see, the flat list passed to it consists of unmodifiable-key followed by modifiable-value, etc. This is mostly a non-issue. I cannot think of any reason to modify individual elements of a hash in the way you're demonstrating; since hashes have no particular order you wouldn't have simple control over which elements in @_ propagate back to which values in %a.
As you pointed out, the proper protocol is to pass \@a or \%a, so that they can be referred to as $_[0]->{element} or $_[0]->[0]. Even though the notation is a little more complicated, it becomes second nature after a while, and is much clearer (in my opinion) as to what is going on.
Be sure to have a look at the perlsub documentation. In particular:
Any arguments passed in show up in the array @_. Therefore, if you called a function with two arguments, those would be stored in $_[0] and $_[1]. The array @_ is a local array, but its elements are aliases for the actual scalar parameters. In particular, if an element $_[0] is updated, the corresponding argument is updated (or an error occurs if it is not updatable). If an argument is an array or hash element which did not exist when the function was called, that element is created only when (and if) it is modified or a reference to it is taken. (Some earlier versions of Perl created the element whether or not the element was assigned to.) Assigning to the whole array @_ removes that aliasing, and does not update any arguments.
(Note that use warnings is even more important than use strict.)
@_ itself isn't a reference to anything, it is an array (really, just a view of the stack, though if you do something like take a reference to it, it morphs into a real array) whose elements each are an alias to a passed parameter. And those passed parameters are the individual scalars passed; there is no concept of passing an array or hash (though you can pass a reference to one).
So shifts, splices, additional elements added, etc. to @_ don't affect anything passed, though they may change the index of or remove from the array one of the original aliases.
So where you call change(@a), this puts two aliases on the stack, one to $a[0] and one to $a[1]. change(%a) is more complicated; %a flattens out into an alternating list of keys and values, where the values are the actual hash values and modifying them modifies what's stored in the hash, but where the keys are merely copies, no longer associated with the hash.
Perl does not pass the array or hash itself by reference, it unfurls the entries (the array elements, or the hash keys and values) into a list and passes this list to the function. @_ then allows you to access the scalars as references.
This is roughly the same as writing:
@a = (1, 2, 3);
$b = \$a[2];
${$b} = 4;
# @a is now (1, 2, 4)
You'll note that in the first case you were not able to add an extra item to @a, all that happened was that you modified the members of @a that already existed. In the second case, the hash keys don't really exist in the hash as scalars, so these need to be created as copies in temporary scalars when the expanded list of the hash is created to be passed into the function. Modifying this temporary scalar will not modify the hash key, as it is not the hash key.
If you want to modify an array or hash in a function, you will need to pass a reference to the container:
change(\%foo);
sub change {
$_[0]->{a} = 1;
}
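The behaviour described in these answers can be checked with a small sketch based on the question's own change sub. It writes $_[0] instead of the question's one-element slices, and uses push to stand in for assigning to a third element; both choices are simplifications for the example.

```perl
use strict;
use warnings;

# Overwrite the first two arguments in place and try to add a third.
sub change {
    $_[0] = "x";
    $_[1] = "y";
    push @_, "z";    # grows @_ only; the caller's container is not involved
    return;
}

my @a = ("a", "b");
change(@a);          # $_[0] and $_[1] are aliases for $a[0] and $a[1]

my %h = (a => "b");
change(%h);          # $_[0] is a copy of the key, $_[1] aliases $h{a}

print "@a\n";                          # x y
print "$_ => $h{$_}\n" for keys %h;    # a => y
```

The array keeps its two elements and the hash keeps its key, matching the Dumper output in the question: only existing values are reachable through the aliases.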
Firstly, you are confusing the @ sigil as indicating an array. This is actually a list. When you call change(@a) you are passing the list to the function, not an array object.
The case with the hash is slightly different. Perl evaluates your call into a list and passes the values as a list instead.