Not entirely sure why, but for some reason I can't print the hash value outside the while loop.
#!/usr/bin/perl -w
opendir(D, "cwd" );
my @files = readdir(D);
closedir(D);
foreach $file (@files)
{
    open F, $file or die "$0: Can't open $file : $!\n";
    while ($line = <F>) {
        chomp($line);
        $line =~ s/[-':!?,;".()]//g;
        $line =~ s/^[a-z]/\U/g;
        @words = split(/\s/, $line);
        foreach $word (@words) {
            $frequency{$word}++;
            $counter++;
        }
    }
    close(F);
    print "$file\n";
    print "$ARGV[0]\n";
    print "$frequency{$ARGV[0]}\n";
    print "$counter\n";
}
Any help would be much appreciated!
cheers.
This line
print "$frequency{$ARGV[0]}\n";
Expects you to have an argument to your script, e.g. perl script.pl argument. If you have no argument, $ARGV[0] is undefined, but it will stringify to the empty string. This empty string is a valid key in the hash, but the value is undefined, hence your warning
Use of uninitialized value within %frequency in concatenation (.) or string
But you should also see the warning
Use of uninitialized value $ARGV[0] in hash element
And it is a very big mistake not to include that error in this question.
Also, when using readdir, you get all the files in the directory, including directories. You might consider filtering the files somewhat.
Using
use strict;
use warnings;
Is something that will benefit you very much, so add that to your script.
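A minimal sketch of those two points together (strict/warnings plus guarding against a missing argument) might look like this:
#!/usr/bin/perl
use strict;
use warnings;

# guard against running the script without a word to look up
die "Usage: $0 word\n" unless defined $ARGV[0];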
I had originally written this,
There is no %frequency defined at the top level of your program.
When perl sees you reference %frequency inside the inner-most
loop, it will auto-vivify it, in that scratchpad (lexical scope).
This means that when you exit the inner-most loop (foreach $word
(@words)), the auto-vivified %frequency is out of scope and
garbage-collected. Each time you enter that loop, a new, different
variable will be auto-vivified, and then discarded.
When you later refer to %frequency in your print, yet another new,
different %frequency will be created.
… but then realized that you had forgotten to use strict, and Perl was being generous and giving you a global %frequency, which ironically is probably what you meant. So, this answer is wrong in your case … but declaring the scope of %frequency would probably be good form, regardless.
These other, “unrelated” notes are still useful perhaps, or else I'd delete the answer altogether:
As @TLP mentioned, you should probably also skip directories (at least) in your file loop. A quick way to do this would be my @files = grep { -f "cwd/$_" } (readdir D); this will filter the list so that it contains only plain files.
I'm further suspicious that you named a directory "cwd" … do you perhaps mean the current working directory? On all the major OSes in use today, that directory is referred to as “.” — are you really looking for a directory literally named "cwd"?
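For what it's worth, here is an untested sketch of the whole loop with %frequency declared up front, the current working directory opened as ".", and non-files filtered out (the punctuation stripping is kept as in your question; the // operator needs Perl 5.10+):
use strict;
use warnings;

opendir my $dh, '.' or die "Can't open current directory: $!";
my @files = grep { -f } readdir $dh;    # keep plain files only, skips . and ..
closedir $dh;

my %frequency;                          # declared once, outside all loops
my $counter = 0;

for my $file (@files) {
    open my $fh, '<', $file or die "$0: Can't open $file: $!\n";
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/[-':!?,;".()]//g;
        for my $word (split ' ', $line) {
            $frequency{$word}++;
            $counter++;
        }
    }
    close $fh;
}

print "$ARGV[0]: ", $frequency{$ARGV[0]} // 0, "\n" if defined $ARGV[0];
print "total words: $counter\n";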
Related
I would like to use
myscript.pl targetfolder/*
to read some number from ASCII files.
myscript.pl
@list = <@ARGV>;
# Is the whole file or only 1st line is loaded?
foreach $file ( @list ) {
    open (F, $file);
}
# is this correct to judge if there is still file to load?
while ( <F> ) {
    match_replace()
}
sub match_replace {
    # if I want to read the 5th line in downward, how to do that?
    # if I would like to read multi lines in multi array[row],
    # how to do that?
    if ( /^\sName\s+/ ) {
        $name = $1;
    }
}
I would recommend a thorough read of perlintro - it will give you a lot of the information you need. Additional comments:
Always use strict and warnings. The first will enforce some good coding practices (like for example declaring variables), the second will inform you about potential mistakes. For example, one warning produced by the code you showed would be readline() on unopened filehandle F, giving you the hint that F is not open at that point (more on that below).
@list = <@ARGV>;: This is a bit tricky, I wouldn't recommend it - you're essentially using glob, and expanding targetfolder/* is something your shell should be doing, and if you're on Windows, I'd recommend Win32::Autoglob instead of doing it manually.
foreach ... { open ... }: You're not doing anything with the files once you've opened them - the loop to read from the files needs to be inside the foreach.
"Is the whole file or only 1st line is loaded?" open doesn't read anything from the file, it just opens it and provides a filehandle (which you've named F) that you then need to read from.
I'd strongly recommend you use the more modern three-argument form of open and check it for errors, as well as use lexical filehandles since their scope is not global, as in open my $fh, '<', $file or die "$file: $!";.
"is this correct to judge if there is still file to load?" Yes, while (<$filehandle>) is a good way to read a file line-by-line, and the loop will end when everything has been read from the file. You may want to use the more explicit form while (my $line = <$filehandle>), so that your variable has a name, instead of the default $_ variable - it does make the code a bit more verbose, but if you're just starting out that may be a good thing.
match_replace(): You're not passing any parameters to the sub. Even though this code might still "work", it's passing the current line to the sub through the global $_ variable, which is not a good practice because it will be confusing and error-prone once the script starts getting longer.
if (/^\sName\s+/){$name = $1;}: Since you've named the sub match_replace, I'm guessing you want to do a search-and-replace operation. In Perl, that's called s/search/replacement/, and you can read about it in perlrequick and perlretut. As for the code you've shown, you're using $1, but you don't have any "capture groups" ((...)) in your regular expression - you can read about that in those two links as well.
"if I want to read the 5th line in downward , how to do that ?" As always in Perl, There Is More Than One Way To Do It (TIMTOWTDI). One way is with the range operator .. - you can skip the first through fourth lines by saying next if 1..4; at the beginning of the while loop, this will test those line numbers against the special $. variable that keeps track of the most recently read line number.
"and if I would like to read multi lines in multi array[row], how to do that ?" One way is to use push to add the current line to the end of an array. Since keeping the lines of a file in an array can use up more memory, especially with large files, I'd strongly recommend making sure you think through the algorithm you want to use here. You haven't explained why you would want to keep things in an array, so I can't be more specific here.
So, having said all that, here's how I might have written that code. I've added some debugging code using Data::Dumper - it's always helpful to see the data that your script is working with.
#!/usr/bin/env perl
use warnings;
use strict;
use Data::Dumper; # for debugging
$Data::Dumper::Useqq = 1;
for my $file (@ARGV) {
    print Dumper($file); # debug
    open my $fh, '<', $file or die "$file: $!";
    while (my $line = <$fh>) {
        next if 1..4;
        chomp($line); # remove line ending
        match_replace($line);
    }
    close $fh;
}
sub match_replace {
    my ($line) = @_; # get argument(s) to sub
    my $name;
    if ( $line =~ /^\sName\s+(.*)$/ ) {
        $name = $1;
    }
    print Data::Dumper->Dump([$line,$name],['line','name']); # debug
    # ... do more here ...
}
The above code explicitly loops over @ARGV and opens each file, and I did say above that more verbose code can be helpful in understanding what's going on. I just wanted to point out a nice feature of Perl, the "magic" <> operator (discussed in perlop under "I/O Operators"), which will automatically open the files in @ARGV and read lines from them. (There's just one small thing: if I want to use the $. variable and have it count the lines per file, I need to use the continue block shown below; this is explained in eof.) This would be a more "idiomatic" way of writing that first loop:
while (<>) { # reads line into $_
    next if 1..4;
    chomp; # automatically uses $_ variable
    match_replace($_);
} continue { close ARGV if eof } # needed for $. (and range operator)
I am very new to Perl programming.
While reading about loops, I came across two examples of the foreach loop.
The first example is:
foreach ('hickory','dickory','doc') {
    print $_;
    print "\n";
}
Output:
hickory
dickory
doc
The $_ variable contains each item, so it prints it.
In the other example, they did not specify the $_ variable in the print statement; there is only a bare print statement. How does it print the foreach arguments?
foreach ('hickory','dickory','doc') {
    print;
    print "\n";
}
Output:
hickory
dickory
doc
This gives the same output. How does it print the values? The book did not give any explanation for this, and I searched the internet but could not find anything.
Your question about print in foreach being answered, here is a little more on $_.
From General Variables in perlvar
Here are the places where Perl will assume $_ even if you don't use it:
The following functions use $_ as a default argument:
abs, alarm, chomp, chop, chr, chroot, cos, defined, eval, evalbytes, exp, fc, glob, hex, int, lc, lcfirst, length, log, lstat, mkdir, oct, ord, pos, print, printf, quotemeta, readlink, readpipe, ref, require, reverse (in scalar context only), rmdir, say, sin, split (for its second argument), sqrt, stat, study, uc, ucfirst, unlink, unpack.
All file tests (-f, -d) except for -t, which defaults to STDIN. See -X
The pattern matching operations m//, s/// and tr/// (aka y///) when used without an =~ operator.
The default iterator variable in a foreach loop if no other variable is supplied.
The implicit iterator variable in the grep() and map() functions.
The implicit variable of given().
The default place to put the next value or input record when a <FH>, readline, readdir or each operation's result is tested by itself as the sole criterion of a while test. Outside a while test, this will not happen.
$_ is by default a global variable.
As you can see, it is available nearly everywhere and it is indeed used a lot. Note that the perlvar page describes a whole lot more of similar variables, many of them good to know about.
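As a quick illustration of the file-test and grep defaults from that list (a sketch, not from perlvar): inside grep each directory entry lands in $_, which is exactly what -f and the regex match then test.
opendir my $dh, '.' or die "Can't open directory: $!";
my @perl_files = grep { -f && /\.pl$/ } readdir $dh;   # -f and // both default to $_
closedir $dh;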
Here is an example. Consider that we read lines from a file, want to discard the ones which have only spaces or start with # (comments), and for others want to split them by spaces into words.
open my $fh, '<', $file or die "Can't open $file: $!";

while (<$fh>)
{
    next if not /\S/;
    next if /^\s*#/;

    my @words = split;

    # do something with @words ...
}
Let's see how many uses of $_ are in the above example. Here is an equivalent program
while (my $line = <$fh>)
{
    next if not $line =~ m/\S/;    # if not matching any non-space character
    next if $line =~ m/^\s*#/;     # if matching # after only (possible) spaces

    my @words = split ' ', $line;  # split $line by ' ' (any white space)

    # do something with @words ...
}
Compare these two
the filehandle read <$fh> in the while condition assigns to $_, then available in the loop.
regular expression's match operator by default works on $_. The m itself can be dropped.
split by default splits $_. We also use the other default, for the pattern to split the string by, which is ' ' (any amount of any white space).
once we do $line = <$fh>, the deal with $_ is off (it is not set inside the loop) and we have to use $line everywhere. So either do this or do while (<$fh>) and use $_.
To illustrate all this a bit further, let us find the longest capitalized word on each line
use List::Util 'max';
my $longest_cap = max map { length } grep { /^[A-Z]/ } @words;
The grep takes the list in #words and applies the block to each element. Each element is assigned to $_ and is thus available to the code inside the block as $_. This is what the regex uses by default. The ones that satisfy the condition are passed to map, which also iterates assigning them to $_, what is of course the default for length. Finally max from List::Util picks the largest one.
Note that $_ is never actually written and no temporary variable is needed.
Here is some of the relevant documentation. The I/O Operators in perlop discusses while (<$fh>) and all manner of related things. The regex part is in Regexp Quote-Like Operators in perlop and in perlretut. Also have a look at split.
Defaults are used regularly and to read code written by others you must understand them. When you write your own code though you can choose whether to use $_ or not, as one can always introduce a lexical variable instead of it.
So, when to use $_ as default (which need not be written) and when not to?
Correct use of defaults, $_ in particular, can lead to clearer and more readable code. What generally means better code. But it is quite possible to push this too far and end up with obscure, tricky, and brittle code. So good taste is required.
Another case is when some parts of the code benefit from having $_ for their defaults while at other places you then have to use $_ explicitly. I'd say that if $_ is seen more than once or twice in a section of code it means that there should be a properly named variable instead.
Overall, if in doubt simply name everything.
If you do not declare a variable in your foreach loop, it defaults to $_.
From perldoc about foreach:
The foreach keyword is actually a synonym for the for keyword, so you
can use either. If VAR is omitted, $_ is set to each value.
So that explains the first loop.
The second loop works because you are omitting the variable and, as you now know, $_ is set to each element of your array.
You could use foreach loop with explicit variable like this:
foreach my $item ( @list )
{
    print "My item is: $item\n";
}
Or you can omit it as you did, and print will still work, as @Dada said, because:
If FILEHANDLE is omitted, prints to the last selected (see select)
output handle. If LIST is omitted, prints $_ to the currently selected
output handle.
I will explain why you are getting the same results with the different syntax:
If you omit the control variable from the beginning of the foreach
loop, Perl uses its favorite default variable, $_ . This is (mostly) just like any other scalar variable, except for its unusual name. For example:
foreach ('hickory','dickory','doc') {
    print $_;
    print "\n";
}
Output:
hickory
dickory
doc
Although this isn’t Perl’s only default by a long shot, it’s Perl’s most common default. You’ll see many other cases in which Perl will automatically use $_ when you don’t tell it to use some other variable or value, thereby saving the programmer from the heavy labor of having to think up and type a new variable name. So as not to keep you in suspense, one of those cases is
print, which will print $_ if given no other argument:
foreach ('hickory','dickory','doc') {
    print; # prints $_ by default
    print "\n";
}
Output:
hickory
dickory
doc
I want to fill a folder with copies of the same file, each with a different name. I created a filelist.txt with the filenames using Windows cmd and then wrote the following code:
use strict; # safety net
use warnings; # safety net
use File::NCopy qw(copy);
open FILE, 'C:\blabla\filelist.txt';
my @filelist = <FILE>;
my $filelistnumber = @filelist + 1;
my $file = 0;
## my $filename = 'null.txt';
my $filename = $filelist[$file];
while( $file < $filelistnumber ){
    copy('base.smp','temp.smp');
    rename 'temp.smp', $filename;
    $file = $file + 1;
};
If I try renaming it to 'test.smp' or whatever, it works. If I try the code above, I get this:
Use of uninitialized value $filename in print at blablabla/bla/bla.pl line 25, <FILE> line 90.
What am I doing wrong? I feel there's some kind of little mistake, a syntax mistake probably, that keeps evading me.
First, here's some improved code:
use strict;
use warnings;
use File::Copy;
while (<>) {
    chomp;
    copy('base.smp', $_) or die $!;
}
You'll save it as script.pl and invoke it like this:
$ perl script.pl C:\blabla\filelist.txt
In what ways is this code an improvement?
It uses the core module File::Copy instead of the deprecated File::NCopy.
It uses the null filehandle or "diamond operator" (<>) to implicitly iterate over a file given as a command line parameter, which is simple and elegant.
It handles errors in the event that copy() fails for some reason.
It doesn't use a while loop or a C-style for loop to iterate over an array, which are both prone to off-by-one errors and forgetting to re-assign the iterator, as you've discovered.
It doesn't use the old 2-argument syntax for open(). (Well, not explicitly, but that's kind of beyond the scope of this answer.)
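If you prefer a more explicit spelling over the diamond operator, a sketch of the same loop with a three-argument open and a lexical filehandle might look like this (the file list path is whatever you pass on the command line):
use strict;
use warnings;
use File::Copy;

my ($filelist) = @ARGV or die "Usage: $0 filelist\n";
open my $fh, '<', $filelist or die "Can't open $filelist: $!";
while (my $filename = <$fh>) {
    chomp $filename;                                     # strip the trailing newline
    copy('base.smp', $filename) or die "copy failed for $filename: $!";
}
close $fh;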
What am I doing wrong? I feel there's some kind of little mistake, a
syntax mistake probably, that keeps evading me.
A syntax error would have resulted in an error message saying that there was a syntax error. But since you asked what you're doing wrong, let's walk through it:
use File::NCopy qw(copy);
This module was last updated in 2007 and is marked as deprecated. Don't use it.
open FILE, 'C:\blabla\filelist.txt';
You should use the three-argument form of open, use a lexical filehandle, and always check the return values of system calls.
my #filelist = <FILE>;
Rarely do you need to slurp an entire file into memory. In this case, you don't.
my $filelistnumber = @filelist + 1;
There's nothing inherently wrong with this line, but there is when you consider how you're using it later on. Remember that arrays are 0-indexed, so you've just set yourself up for an out of bounds array index. But we'll get to that in a second.
my $filename = $filelist[$file];
You would typically want to do this assignment inside your loop, lest you forget to update it after incrementing your counter (which is exactly what happened here).
while( $file < $filelistnumber ){
This is an odd way to iterate over an array in Perl. You could use a typical C-style for loop, but the most Perlish thing to do would be to use a foreach-style loop:
for my $element (@array) {
    ...
}
Each element of the list is localized to the loop, and you don't have to worry about counters, conditions, or array bounds.
copy('base.smp','temp.smp');
Again, always check the return values of system calls.
rename 'temp.smp', $filename;
No need to do a copy and a rename. You can copy to your final destination filename the first time. But if you are going to rename, always check the return values of system calls.
};
Blocks don't need to be terminated with a semicolon like simple statements do.
You should avoid using bareword filehandles. When opening a file, use a lexical filehandle like the one below, and make sure you catch the error if it fails:
open(my $fh, '<', 'C:\blabla\filelist.txt') or die "Cannot open filelist.txt: $!";
The $fh variable will contain your filehandle.
For your problem, it looks as though your filelist.txt must be empty. Try using Data::Dumper to print out your @filelist to determine its contents.
use Data::Dumper;
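For example, right after reading the file you could dump the array (a quick sketch):
print Dumper(\@filelist);    # shows every element, including any trailing newlines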
EDIT:
Looks like you also want to set the $filename variable to the next name in the list on each iteration, so put $filename = $filelist[$file]; at the beginning of your loop.
Your problem could also be that you are looping too far. Try getting rid of the + 1 in my $filelistnumber = @filelist + 1;
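Put together, a sketch of your loop with those two changes (plus a chomp, since names read from the file keep their trailing newline) might be:
my $filelistnumber = @filelist;              # no + 1: valid indexes are 0 .. $#filelist
my $file = 0;
while ( $file < $filelistnumber ) {
    my $filename = $filelist[$file];         # pick up the next name on every pass
    chomp $filename;                         # remove the newline read from filelist.txt
    copy('base.smp', 'temp.smp') or die "copy failed: $!";
    rename 'temp.smp', $filename or warn "rename to $filename failed: $!";
    $file = $file + 1;
}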
What seemed like a straightforward piece of code most certainly didn't do what I wanted it to do.
Can somebody explain to me what it does do and why?
my $dir = './some/directory';
if ( -d $dir && <$dir/*> ) {
    print "Dir exists and has non-hidden files in it\n";
}
else {
    print "Dir either does not exist or has no non-hidden files in it\n";
}
In my test case, the directory did exist and it was empty. However, the then (first) branch of the if ran instead of the else branch I expected.
I don't need anybody to suggest how to accomplish what I want to accomplish. I just want to understand Perl's interpretation of this code, which definitely does not match mine.
Using glob (aka <filepattern>) in a scalar context makes it an iterator; it will return one file at a time each time it is called, and will not respond to changes in the pattern (e.g. a different $dir) until it has finished iterating over the initial results; I suspect this is causing the trouble you see.
The easy answer is to always use it in list context, like so:
if( -d $dir && ( () = <$dir/*> ) ) {
glob may only really be used safely in scalar context in code you will execute more than once if you are absolutely sure you will exhaust the iterator before you try to start a new iteration. Most of the time it's just easier to avoid glob in scalar context altogether.
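If the () = idiom looks too cryptic, another sketch that forces list context is to grab the matches into a named array first:
my @entries = <$dir/*>;    # list context: all matches at once, no iterator state kept
if ( -d $dir && @entries ) {
    print "Dir exists and has non-hidden files in it\n";
}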
I believe that @ysth is on the right track, but repeated calls to glob in scalar context don't generate false positives.
For example
use strict;
use warnings;
use 5.010;
say scalar glob('/usr/*'), "\n";
say scalar glob('/usr/*'), "\n";
output
/usr/bin
/usr/bin
But what is true is that any single call to glob maintains a state, so if I have
use strict;
use warnings;
use 5.010;
for my $dir ( '/sys', '/usr', '/sys', '/usr' ) {
    say scalar glob("$dir/*"), "\n";
}
output
/sys/block
/sys/bus
/sys/class
/sys/dev
So clearly that glob statement inside the loop is maintaining a state, and ignoring the changes to $dir.
This is similar to the way that pos (and the corresponding \G regex anchor) keeps a state per scalar variable, and how print without a specific file handle prints to the last selected handle. In the end it is how all of Perl works, with the "it" variable $_ being the ultimate example.
I intend to recursively traverse the directory containing this Perl script.
The idea is to traverse all directories under the directory that contains the Perl script, collect all file paths in a single array variable, and then return the list.
Here is the error message:
readdir() attempted on invalid dirhandle $DIR at xxx
closedir() attempted on invalid dirhandle $DIR at xxx
Code is attached for reference. Thank you in advance.
use strict;
use warnings;
use Cwd;
our @childfile = ();
sub recursive_dir{
    my $current_dir = $_[0]; # a parameter
    opendir(my $DIR,$current_dir) or die "Fail to open current directory,error: $!";
    while(my $contents = readdir($DIR)){
        next if ($contents =~ m/^\./); # filter out "." and ".."
        # if-else clause separates dirs from files
        if(-d "$contents"){
            #print getcwd;
            #print $contents;
            #closedir($DIR);
            recursive_dir(getcwd."/$contents");
            print getcwd."/$contents";
        }
        else{
            if($contents =~ /(?<!\.pl)$/){
                push(@childfile,$contents);
            }
        }
    }
    closedir($DIR);
    #print @childfile;
    return @childfile;
}
recursive_dir(getcwd);
Please tell us if this is homework? You are welcome to ask for help with assignments, but it changes the sort of answer you should be given.
You are relying on getcwd to give you the current directory that you are processing, yet you never change the current working directory so your program will loop endlessly and eventually run out of memory. You should simply use $current_dir instead.
I don't believe that those error messages can be produced by the program you show. Your code checks whether opendir has succeeded and the program dies unless $DIR is valid, so the subsequent readdir and closedir must be using a valid handle.
Some other points:
Comments like # a parameter are ridiculous and only serve to clutter your code
Upper-case letters are generally reserved for global identifiers like package names. And $dir is a poor name for a directory handle, as it could also mean the directory name or the directory path. Use $dir_handle or $dh
It is crazy to use a negative look-behind just to check that a file name doesn't end with .pl. Just use push @childfile, $contents unless $contents =~ /\.pl$/
You never use the return value from your subroutine, so it is wasteful of memory to return what could be an enormous array from every call. @childfile is accessible throughout the program so you can just access it directly from anywhere
Don't put scalar variables inside double quotes. It simply forces the value to a string, which is probably unnecessary and may cause arcane bugs. Use just -d $contents
You probably want to ignore symbolic links, as otherwise you could be looping endlessly. You should change else { ... } to elsif (-f $contents) { ... }
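Putting those suggestions together, a sketch of the subroutine might look like this (still collecting into the package-wide @childfile, but building paths from $current_dir rather than getcwd):
sub recursive_dir {
    my ($current_dir) = @_;
    opendir my $dh, $current_dir or die "Failed to open $current_dir: $!";
    while (my $entry = readdir $dh) {
        next if $entry =~ /^\./;                   # skip ".", ".." and hidden entries
        my $path = "$current_dir/$entry";
        if (-d $path) {
            recursive_dir($path);                  # recurse with the full path
        }
        elsif (-f $path) {                         # only record plain files
            push @childfile, $path unless $path =~ /\.pl$/;
        }
    }
    closedir $dh;
}
recursive_dir(getcwd);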