I am trying to read a text file into a Perl program and reverse the order of its lines, i.e. the last line becomes the first, the second-to-last becomes the second, and so on. I am using the following code:
#!C:\Perl64\bin
$k = 0;
while (<>){
print "the value of i is $i";
@array[k] = $_;
++$k;
}
print "the array is @array";
But for some reason, my array is only printing the last line of the text file.
Any suggestions?
Typically, rather than keeping a separate array index, Perl programs use push to append a string to an array. One way to do this in your program:
push @array, $_;
If you really want to do it by array index, then you need to use the following syntax:
$array[$k] = $_;
Notice the $ rather than @ in front. This tells perl that you're dealing with a single element from the array, not multiple elements. @array gives you the entire array, while $array[$k] gives you a single element. (There is a more advanced topic called "slices," but let's not get into that here. I will say that @array[$k] gives you a slice, and that isn't what you want here.)
If you really just want to slurp the entire file into an array, you can do that in one step:
@array = ( <> );
This reads every line of the input into @array, one line per element.
You might have noticed I omitted/ignored your print statement. I'm not sure what it's doing printing out a variable named $i, since it didn't seem connected at all to the rest of the code. I reasoned it was debug code you had added, and not really relevant to the task at hand.
Anyway, that should get your input into @array. Now reversing the array... There are many ways you could do this in perl, but I'll let you discover those yourself.
Instead of:
@array[k] = $_;
you want:
$array[$k] = $_;
To reference the scalar variable $k, you need the $ on the front. Without that it is interpreted as the literal string 'k', which when used as an array index would be interpreted as 0 (since a non-numeric string will be interpreted as 0 in a numeric context).
So, each time around the loop you are setting the first element to the line read in (overwriting the value set in the previous iteration).
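To see that overwrite in isolation, here is a minimal sketch. It models the bareword k as the plain string 'k' (an assumption made for illustration, so the snippet also runs under use strict) and uses three hard-coded lines instead of reading a file.

```perl
use strict;
use warnings;

my @lines = ("first\n", "second\n", "third\n");   # stand-in for file input

my @array;
my $index = 'k';            # what the bareword k amounts to without strict
{
    no warnings 'numeric';  # silence the "isn't numeric" warning for 'k'
    $array[$index] = $_ for @lines;   # 'k' numifies to 0 every time
}

my $count = scalar @array;  # only element 0 was ever written
print "elements: $count, contents: @array";
```

Only the last line survives, which matches the symptom in the question.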
A few other tips:
@array[ ] is actually the syntax for an array slice rather than a single element. It works in this case because you are assigning to a slice of one element. The usual syntax for accessing a single element is $array[ ].
I recommend placing 'use strict;' at the top of your script; you would have gotten an error pointing out the bareword k used in place of $k.
Instead of using an index variable, you could push the values onto the end of the array, e.g.:
while (<>) {
    push @array, $_;
}
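Accept input until a line containing the word "end" is found, then print the lines in reverse order.
Solution 1
#!/usr/bin/perl
while (<>) {
    last if $_ =~ /end/i;
    push @array, $_;
}
# Loop once per element; the original test $i>=0 ran one time too many
for (my $i = scalar(@array); $i > 0; $i--) {
    print pop @array;
}
Solution 2
while (<>) {
    last if $_ =~ /end/i;
    push @array, $_;
}
print reverse(@array);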
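For anyone who wants to try the second solution without typing input by hand, here is a self-contained sketch. The sample text and the in-memory filehandle are assumptions for the demo, and the pattern is anchored (/^end$/i) so that only a line consisting of "end" stops the loop, which is slightly stricter than the /end/i used above.

```perl
use strict;
use warnings;

# Hypothetical input; an in-memory filehandle stands in for the real file.
my $text = "one\ntwo\nthree\nend\nignored\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @array;
while (my $line = <$fh>) {
    last if $line =~ /^end$/i;   # stop at a line consisting only of "end"
    push @array, $line;
}
close $fh;

my @reversed = reverse @array;   # list-context reverse flips the element order
print @reversed;
```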
NOTE: See the end of this post for final explanation.
This is probably a very basic question, but I'm still trying to master a few of the fundamentals regarding references in Perl, and came across something in the perldsc page that I'd like to confirm. The following code is in the Generating Array of Arrays section:
while ( <> ) {
    push @AoA, [ split ];
}
Obviously, the <> operation in the while loop reads one line of input at a time. I am assuming at this point that the line is then put into an anonymous array via the [ ] brackets; we'll call this @zero. Then the split command places everything in a given line separated by whitespace within the array (e.g., the first word is assigned to $zero[0], the second to $zero[1] and so on). The scalar reference of @zero is then pushed onto @AoA.
The next line of input is passed via the <> operator and gets assigned to a completely new anonymous array (e.g. @one), and its scalar reference is pushed onto @AoA.
Once @AoA is populated, I could then access its contents with a nested foreach loop: the first iterating through the "rows" (e.g. for $row (@AoA)), and a second, inner foreach loop to access the columns of that particular row.
The latter (accessing said "columns") would be done by dereferencing (e.g., for $column (@$row)) the particular $row being read by the previous, "outer" foreach loop.
Is my understanding correct? I'm assuming you could still access any element of @AoA just as you would if it were assigned vs. being anonymous? That is, $element = $AoA[8][1];.
I want to verify my thought process here. Is the automatic declaration of a unique, anonymous array each time through the loop part of the autovivification in Perl? I guess that is what is throwing me off a bit. Thanks.
EDIT: Based on the comments below my understanding regarding the anonymous array is still unclear, so I want to take a shot at one more description to see if it meets everyone's understanding.
Starting with the push @AoA, [split]; statement, split takes in the line from $_ and returns a list parsed by whitespace. That list is captured by [ ], which then returns an array reference. That array reference (created by [ ]) is then pushed onto @AoA. Is this accurate re: [ ]? The next step (dereferencing / use of @AoA) was covered very well by @krico below.
FINAL ANSWER/EXPLANATION: Based on all of the comments / feedback here, some further research on my part, and testing, it seems my understanding was correct. I'll break it down here, so others can easily reference it later. See @krico's response below for a more explicit code representation that follows the steps outlined here.
while ( <> ) {
    push @AoA, [ split ];
}
One line of input is read at a time by the <> operator.
The split function takes that line in via $_ and parses it based on whitespace (the default).
split then returns a LIST.
The [ ] creates an anonymous array that holds the list returned by split, and returns a reference to it.
The push @AoA pushes the reference to the anonymous array onto the end of @AoA as element $AoA[0] (the second anonymous array reference will be put into $AoA[1], etc.).
This continues through the entire input file. Once completed, @AoA is a 2D array, holding reference values (scalar values) to each of the previously generated anonymous arrays.
From this point @AoA can be dereferenced appropriately to work with the underlying/referenced elements taken in from the input file. The default dereferencing technique is circumfix (see perlref below); however, as of 5.19 a new method of dereferencing, postfix, is available and will be released in 5.20. Articles are linked below.
References: Perl References Documentation, Perl References Tutorial, Perl References Question noted by @Eli Hubert, Mike Friedman's blog post about differences between arrays and lists, Upcoming Postfix dereferencing in Perl, and Postfix dereferencing Article
This is what is going on:
The <> will put the line into the default variable $_
The split function will read $_ and return a list
The [ ] brackets will copy that list into a new anonymous array and return a scalar holding a reference to it
That reference is then pushed onto the @AoA array
When you do $AoA[8][2] you are implicitly dereferencing the scalar. It's the same as $AoA[8]->[2].
The same code, written out a little more readably, should make this clear:
my @AoA;
while ( my $line = <STDIN> ) {
    my @parts = split ' ', $line;   # split $line (not $_) on whitespace
    my $partsRef = \@parts;         # reference to this iteration's array
    push @AoA, $partsRef;
}
Now, if you wanted to print the 2nd part of the 5th line, you could say:
my $ref = $AoA[4];      # a single element, so the $ sigil
my @parts = @$ref;      # dereference into a copy
print $parts[1];
Get it?
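Putting the pieces together, a small sketch with hard-coded input (the two sample lines are made up for illustration) shows both the direct $AoA[row][column] access and the nested-loop dereferencing described in the question:

```perl
use strict;
use warnings;

my @input = ("alpha beta gamma\n", "one two\n");   # hypothetical lines

my @AoA;
for my $line (@input) {
    my @parts = split ' ', $line;   # list of whitespace-separated words
    push @AoA, [ @parts ];          # [ ] copies the list into a new anonymous array
}

my $second_of_first = $AoA[0][1];   # same as $AoA[0]->[1]

for my $row (@AoA) {                # $row is an array reference
    for my $column (@$row) {        # circumfix dereference
        print "$column ";
    }
    print "\n";
}
```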
I'm trying to make a program where I read in a file with a bunch of text in it. I then take punctuation out and then I read in a file that has stop words in it. Both get read in and put into arrays. I'm trying to put the array of the general text file and put it in a hash. I'm not really sure what I'm doing wrong, but I'm trying. I want to do this so I can generate stats on how many words are repeated and what not, but I have to take out stop words and such.
Anyway here is what I have so far I put a comment #WORKING ON MERGING ARRAY INTO HASH that is where I'm working at. I don't think the way I'm trying to put the array into the hash is right, but I looked online and the %hash{array} = "value"; doesn't compile. so not sure how else to do it.
Thanks, if you have any questions for me I will respond back quickly.
#!/usr/bin/perl
use strict;
use warnings;
#Reading in the text file
my $file0="data.txt";
open(my $filehandle0,'<', $file0) || die "Could not open $file0\n";
my @words;
while (my $line = <$filehandle0>){
    chomp $line;
    my @word = split(/\s+/, $line);
    push(@words, @word);
}
for (@words) {
    s/[\,|\.|\!|\?|\:|\;]//g;
}
my %words_count; #The code I was told to add in this post.
$words_count{$_}++ for @words;
Next I read in the stop words I have in another array.
#Reading in the stopwords file
my $file1 = "stoplist.txt";
open(my $filehandle1, '<',$file1) or die "Could not open $file1\n";
my @stopwords;
while(my $line = <$filehandle1>){
    chomp $line;
    my @linearray = split(" ", $line);
    push(@stopwords, @linearray);
}
for my $w (my @stopwords) {
    s/\b\Q$w\E\B//ig;
}
Some notes about hashes in Perl... Problem description:
Anyway here is what I have so far I put a comment #WORKING ON MERGING ARRAY INTO HASH that is where I'm working at. I don't think the way I'm trying to put the array into the hash is right, but I looked online and the %hash{array} = "value"; doesn't compile. so not sure how else to do it.
First, ask yourself why you want to "put the array into the hash". An array represents a list of values while a hash represents a set of key-value pairs. So you have to define what the keys and values should be. Not only for us, but for you. It often helps to explain even simple things to get a better understanding.
In this case, you may want to count how often a given word $word occurred in your @words array. This could be done by iterating over all words and increasing $count{$word} by one each time. This is what @raina77ow did in his answer. The important point here is that you're accessing single hash values, which are represented with the scalar sigil $ in Perl. So if you have a hash named %count, you can increase the value for the key 'foo' by
$count{foo}++;
Your result of "online looking" above (%hash{array} = "value") doesn't make sense. There are three valid ways to store values in a hash:
set all key-value pairs by assigning an even-sized list to the whole hash:
%count = (hello => 42, world => 17);
set a single value for a given key by assigning a single value for a defined key (this is what we did before):
$count{hello} = 42;
set a list of values for a given list of keys using a so-called hash slice:
@count{qw(hello world)} = (42, 17);
Note the use of sigils here: % for a hashy even-sized list of keys and values mixed, $ for single (scalar) values and @ for lists of values. In your example you're using %, but you put an array in the key braces {...} and assign a single scalar value.
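The three forms can be tried side by side. This is just a small sketch; the keys foo and bar and the changed value 43 are invented for the demo.

```perl
use strict;
use warnings;

my %count = (hello => 42, world => 17);   # whole hash from an even-sized list

$count{hello} = 43;                        # single value: scalar sigil $

@count{qw(foo bar)} = (1, 2);              # hash slice: list sigil @

my $summary = join ', ', map { "$_=$count{$_}" } sort keys %count;
print "$summary\n";
```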
Well, if you have a list of words in the @words array and want a hash where each key is a specific word and each value is the number of times that word appears in the source array, it's as simple as...
my %words_count;
$words_count{$_}++ for @words;
In other words (no pun intended), you iterate over the @words array, for each member increasing by 1 the corresponding element of the %words_count hash or, when that element does not exist yet, essentially creating it with value 1 (so-called auto-vivification).
As a sidenote, calling keys function on arrays is close to meaningless: in 5.12+ it'll give you the list of indexes used instead, and before that, throw a syntax error at you.
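Since the original goal was to count words while leaving out stop words, one possible way to combine the two arrays is sketched below. The word lists are made up, and the lookup hash %is_stopword is a name introduced here for illustration; it is not from the question's code.

```perl
use strict;
use warnings;

# Hypothetical data; punctuation is assumed to be stripped already.
my @words     = qw(the cat and the dog and the bird);
my @stopwords = qw(the and);

# Turn the stop list into a hash so each lookup is a single key test.
my %is_stopword;
$is_stopword{ lc $_ } = 1 for @stopwords;

my %words_count;
for my $word (@words) {
    next if $is_stopword{ lc $word };   # skip stop words
    $words_count{$word}++;
}

print "$_: $words_count{$_}\n" for sort keys %words_count;
```

Using a hash for the stop list avoids looping over @stopwords for every word.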
How do I get rid of use of an uninitialized value within an if construct using a Perl regex?
When using the code below, I get use of uninitialized value messages.
if($arrayOld[$i] =~ /-(.*)/ || $arrayOld[$i] =~ /\#(.*)/)
When using the code below, I get no output.
if(defined($arrayOld[$i]) =~ /-(.*)/ || defined($arrayOld[$i]) =~ /\#(.*)/)
What is the proper way to check if a variable has a value given the code above?
Try:
if (defined $arrayOld[$i] && $arrayOld[$i] =~ /[-#](.*)/)
This first checks that $arrayOld[$i] is defined before running a regex against it.
(I have also combined the two patterns into one regex, using a character class so that the capture group applies in both cases.)
From the error message in your comment, you're accessing an element of @arrayOld that isn't defined. Without seeing the rest of the code, this could indicate a bug in your program, or it could just be expected behavior.
If you understand why $arrayOld[$i] is undef, and you want to allow that without getting a warning, there's a couple of things you can do. Perl 5.10.0 introduced the defined-or operator //, which you can use to substitute the empty string for undef:
use 5.010;
...
if(($arrayOld[$i] // '') =~ /-(.*)/ || ($arrayOld[$i] // '') =~ /\#(.*)/)
Or, you can just turn off the warning:
if (do { no warnings 'uninitialized';
         $arrayOld[$i] =~ /-(.*)/ || $arrayOld[$i] =~ /\#(.*)/ })
Here, I'm using do to limit the time the warning is disabled. However, turning off the warning also suppresses the warning you'd get if $i were undef. Using // allows you to specify exactly what is allowed to be undef, and exactly what value should be used instead of undef.
Note: defined($arrayOld[$i]) =~ /-(.*)/ is running a pattern match on the result of the defined function, which is just going to be a true/false value; not the string you want to test.
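A short, self-contained sketch of both points, using made-up data with one undef element: the // form matches the real strings without warnings, while matching against the result of defined() never sees the string at all.

```perl
use strict;
use warnings;
use 5.010;   # for the defined-or operator //

my @arrayOld = ('- dash line', undef, '#hash line', 'plain');  # hypothetical data

my @captured;
for my $i (0 .. $#arrayOld) {
    my $value = $arrayOld[$i] // '';       # undef becomes the empty string
    if ($value =~ /-(.*)/ || $value =~ /\#(.*)/) {
        push @captured, $1;                # capture from whichever pattern matched
    }
}

# defined() yields only a true/false value, so the pattern is tested against that
my $wrong = defined($arrayOld[0]) =~ /-(.*)/ ? 'matched' : 'no match';

print "captured: @captured\n";
print "defined() =~ pattern: $wrong\n";
```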
To answer your question narrowly, you can prevent undefined-value warnings in that line of code with
if (defined $i && defined $arrayOld[$i]
&& ($arrayOld[$i] =~ /-(.*)/ || $arrayOld[$i] =~ /\#(.*)/))
{
...;
}
That is, evaluating either $i or the expression $arrayOld[$i] may result in an undefined value. Note the additional layer of parentheses that are necessary as written above because of the difference in precedence between && and ||, with the former binding more tightly. For the particular patterns in your question, you could sidestep this precedence issue by combining your patterns into one regex, but this can be tricky to do in the general case.
I recommend against using the unpleasing code above. Read on to see an elegant solution to your problem that has Perl do the work for you and is much easier to read.
Looking back
From the slightly broader context of your earlier question, $i is a loop variable and by construction will certainly be defined, so testing $i is overkill. Your code blindly pulls elements from @arrayOld, and Perl happily obliges. In cases where nothing is there, you get the undefined value.
This sort of one-by-one peeking and poking is common in C programs, but in Perl, it is almost always a red flag that you could express your algorithm more elegantly. Consider the complete, working example below.
Working demonstration
#! /usr/bin/env perl
use strict;
use warnings;
use 5.10.0; # given/when
*FILEREAD = *DATA; # for demo only
my @interesting_line = (qr/-(.*)/, qr/\#(.*)/);
$/ = ""; # paragraph mode
while(<FILEREAD>) {
    chomp;
    my @arrayOld = split /\n/;
    my @arrayNewLines;
    for (1 .. @arrayOld) {
        given (shift @arrayOld) {
            push @arrayNewLines, $_ when @interesting_line;
            push @arrayOld, $_;
        }
    }
    print "\@arrayOld:\n", map("$_\n", @arrayOld), "\n",
          "\@arrayNewLines:\n", map("$_\n", @arrayNewLines);
}
__DATA__
#SCSI_test # put this line into @arrayNewLines
kdkdkdkdkdkdkdkd
dkdkdkdkdkdkdkdkd
- ccccccccccccccc # put this line into @arrayNewLines
Front matter
The line
use 5.10.0;
enables Perl’s given/when switch statement, and this makes for a nice way to decide which array gets a given line of input.
As the comment indicates
*FILEREAD = *DATA; # for demo only
is for the purpose of this Stack Overflow demonstration. In your real code, you have open FILEREAD, .... Placing the input from your question into Perl’s DATA filehandle allows presenting code and input in one self-contained unit, and then we alias FILEREAD to DATA so the rest of the code will drop into yours with no fuss.
The main event
The core of the processing is
for (1 .. @arrayOld) {
    given (shift @arrayOld) {
        push @arrayNewLines, $_ when @interesting_line;
        push @arrayOld, $_;
    }
}
Notice that there are no defined checks or even explicit regex matches! There’s no $i or $arrayOld[$i]! What’s going on?
You start with @arrayOld containing all the lines from the current paragraph and want to end with the interesting lines in @arrayNewLines and everything else staying in @arrayOld. The code above takes the next line out of @arrayOld with shift. If the line is interesting, we push it onto the end of @arrayNewLines. Otherwise, we put it back on the end of @arrayOld.
The statement modifier when @interesting_line performs an implicit smart-match with the topic from given. As explained in “Smart matching in detail,” when smart matching against an array, Perl implicitly loops over it and stops on the first match. In this case, the array @interesting_line contains compiled regexes that match lines you want to move to @arrayNewLines. If the current line (in $_ thanks to given) does not match any of those patterns, it goes back in @arrayOld.
We do the preceding process exactly scalar @arrayOld times, that is, once for each line in the current paragraph. This way, we process everything exactly once and do not have to worry about fussy bookkeeping over where the current array index is. Whatever is left in @arrayOld after that many shifts must be the lines we pushed back onto it, which are the uninteresting lines in the order that they occurred in the input.
Sample output
For the input in your question, the output is
@arrayOld:
kdkdkdkdkdkdkdkd
dkdkdkdkdkdkdkdkd
@arrayNewLines:
#SCSI_test # put this line into @arrayNewLines
- ccccccccccccccc # put this line into @arrayNewLines
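A side note for readers on newer Perls: given/when and smart matching have been flagged as experimental since Perl 5.18, so the demonstration above may emit warnings there. Below is a rough alternative that keeps the same @interesting_line idea but uses plain grep and =~. The array @rest is a helper name introduced here, and the input is hard-coded without the trailing comments.

```perl
use strict;
use warnings;

my @interesting_line = (qr/-(.*)/, qr/\#(.*)/);

# Hypothetical paragraph, already split into lines
my @arrayOld = ('#SCSI_test', 'kdkdkdkdkdkdkdkd', 'dkdkdkdkdkdkdkdkd', '- ccccccccccccccc');

my (@arrayNewLines, @rest);
for my $line (@arrayOld) {
    # grep in boolean context: true if any compiled pattern matches this line
    if (grep { $line =~ $_ } @interesting_line) {
        push @arrayNewLines, $line;
    }
    else {
        push @rest, $line;
    }
}
@arrayOld = @rest;   # the uninteresting lines, in input order

print "kept:  @arrayOld\n";
print "moved: @arrayNewLines\n";
```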
I am trying to read all the data from a file into a Perl array and then use a foreach loop to process every string in the array. The problem is that the foreach loop, instead of printing each individual line, prints the entire array. I am using the following script.
while (<FILE>) {
$_ =~ s/(\)|\()//g;
    push @array, $_;
}
foreach $n (@array) {
    print "$n\n";
}
Say, for example, the data in the array is @array=qw(He goes to the school everyday)
the array is getting printed properly but the foreach loop instead of printing every element on different line is printing the entire array.
After reading your comments, I am guessing that your problem is that your source file does not contain any newlines: I.e. the entire file is just one line. Some text editors just wrap the text without actually adding any line break characters.
There is no "solution" to that problem; you have to add line breaks where you want them. You could write a script to do it, but I doubt it would make much sense. It all depends on what you want to do with this text.
Here are my code suggestions for your snippet.
chomp(@array = <FILE>);
s/[()]//g for @array;
print "$_\n" for @array;
or
@array = <FILE>;
s/[()]//g for @array;
print @array;
Note that if you have a file from another filesystem, you may get \r characters left over at the end of your strings after chomp, causing the output to look corrupted, overwriting itself.
Additional notes:
(\)|\() is better written as a character class: [()].
@array = <FILE> will read the entire file into the array. No need to loop.
As shown in my examples, print can be given a list of items (e.g. an array) as arguments. And you can use a postfix loop to print them one at a time.
With a (postfix) loop, all the loop elements are aliased to $_, which is a handy way to do substitutions on the array.
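For the leftover \r case mentioned above, one option is to strip the line ending with a substitution instead of chomp. A minimal sketch, assuming Windows-style line endings; the sample text and the in-memory filehandle are stand-ins for the real FILE:

```perl
use strict;
use warnings;

# Hypothetical data with CRLF line endings; an in-memory handle stands in for FILE.
my $text = "He (goes)\r\nto school\r\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @array = <$fh>;
close $fh;

s/\r?\n\z// for @array;   # strip LF and an optional preceding CR
s/[()]//g   for @array;   # remove parentheses

print "$_\n" for @array;
```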
Since the entire file is just one line, you can split the string on whitespace and print every element of the array on a new line:
use strict;
use warnings;
open(FILE,'YOURFILE' ) || die ("could not open");
my $line= <FILE>;
my @array = split ' ',$line;
foreach my $n (@array)
{
print "$n\n";
}
close(FILE);
Input File
In recent years many risk factors for the development of breast cancer that .....
Output
In
recent
years
many
risk
factors
for
the
development
of
breast
cancer
that
.....
I've found in a module a for loop written like this:
for( @array ) {
    my $scalar = $_;
    ...
    ...
}
Is there a difference between this and the following way of writing a for loop?
for my $scalar ( @array ) {
...
...
}
Yes, in the first example, the for loop is acting as a topicalizer (setting $_ which is the default argument to many Perl functions) over the elements in the array. This has the side effect of masking the value $_ had outside the for loop. $_ has dynamic scope, and will be visible in any functions called from within the for loop. You should primarily use this version of the for loop when you plan on using $_ for its special features.
Also, in the first example, $scalar is a copy of the value in the array, whereas in the second example, $scalar is an alias to the value in the array. This matters if you plan on setting the array's value inside the loop. Or, as daotoad helpfully points out, the first form is useful when you need a copy of the array element to work on, such as with destructive function calls (chomp, s///, tr/// ...).
And finally, the first example will be marginally slower.
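The copy-versus-alias difference is easy to check with a destructive function such as chomp. A small sketch with two made-up arrays:

```perl
use strict;
use warnings;

my @copied  = ("a\n", "b\n");
my @aliased = ("a\n", "b\n");

for (@copied) {
    my $scalar = $_;     # copy: changes stay local to $scalar
    chomp $scalar;
}

for my $scalar (@aliased) {
    chomp $scalar;       # alias: modifies the array element itself
}

print "copied still has newlines\n" if $copied[0]  =~ /\n\z/;
print "aliased lost its newlines\n" if $aliased[0] !~ /\n\z/;
```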
$_ is the "default input and pattern matching space". In other words, if you read in from a file handle at the top of a while loop, or run a foreach loop and don't name a loop variable, $_ is set up for you.
However, if you write a foreach loop and name a loop variable, $_ is not set up. This can be demonstrated by the following code:
1. #!/usr/bin/perl -w
2. @array = (1,2,3);
3. foreach my $elmnt (@array)
4. {
5. print "$_ ";
6. }
The output is the warning "Use of uninitialized value in concatenation (.)".
However, if you replace line 3 with:
foreach (@array)
The output is "1 2 3" as expected.
Now in your case, it is generally better to name a loop variable in a foreach loop to make the code more readable (Perl already has a reputation for being hard to read). That way there is also no need for an explicit assignment from $_, and no resulting scoping issues.
I can't explain better than the doc can