Perl - Code review [closed]

I am working on a program that takes information from a CSV file and uses it as a source to search through a text file that has "customer packages". I am getting odd counts on only some of the entries, and I can't seem to figure out what is causing the duplicate counts. Can anyone look through my code and tell me if my logic/syntax is off? (It probably is.) All I am trying to accomplish is to count the total occurrences in the text file of each entry in the CSV file (packageid,package_description).
Thanks for the help! I'm going nuts over here.
#!/usr/bin/perl
use strict;
use Text::CSV;

# Variables already declared in the other PL file ** Remove if consolidating **
my $file2 = 'master_plist.csv';
my $csv2  = Text::CSV->new();   # Create a Text::CSV object

open (CSV2, "<", $file2) or die $!;   # open CSV file for parsing
while (<CSV2>) {
    if ($csv2->parse($_)) {
        my @columns2 = $csv2->fields();   # Parse CSV and load into an array for each row.
        my $packID   = $columns2[0];
        my $packDESC = $columns2[1];

        my $val = 'customer_packages_report.txt';
        chomp ($val);

        my $cnt = 0;
        open (HNDL, "$val") || die "wrong filename";
        while ($val = <HNDL>)
        {
            while ($val =~ /$packID - $packDESC/ig)
            {
                $cnt++;
            }
        }
        #if ($packDESC =~ /\(/g) {
        #    $packDESC =~ s/\(/\(/g;
        #}
        print "Total iterations of $packDESC: $cnt\n";
        close (HNDL);
        # End original code
    } # Close IF
} # Close WHILE
close CSV2;

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# Variables already declared in the other PL file ** Remove if consolidating **
my $file2 = 'master_plist.csv';
my $csv2  = Text::CSV->new();   # Create a Text::CSV object

open (CSV2, "<", $file2) or die "I die while opening $file2! $!";   # open CSV file for parsing
while (my $each_csv2_line = <CSV2>) {
    if ($csv2->parse($each_csv2_line)) {
        my @columns2 = $csv2->fields();   # Parse CSV and load into an array for each row.
        my $packID   = $columns2[0];
        my $packDESC = $columns2[1];

        my $val = 'customer_packages_report.txt';
        chomp ($val);

        my $cnt = 0;
        open (HNDL, "<", "$val") or die "wrong filename: $val! $!";
        while (<HNDL>) {
            $cnt++ while (/$packID - $packDESC/ig);
        }
        #if ($packDESC =~ /\(/g) {
        #    $packDESC =~ s/\(/\(/g;
        #}
        print "Total iterations of $packDESC: $cnt\n";
        close (HNDL);
        # End original code
    } # Close IF
} # Close WHILE
# end of script
close CSV2;
My recommendations:
Use a lexical filehandle such as $HNDL instead of the bareword HNDL - lexical variables for filehandles are better.
Try to catch all failure cases (check with defined, == 0, and eq "").
I have tried to format your code and add some features that I sometimes use. Be better than me and first read a style guide such as "Style Coding for Little Perl Monk". Then you can be more effective with this language and write more than write-only code.
Example (and also a quote):
"The situation is exactly the same for the line-input operator, <>, although Perl does this for you automatically.
It looks like you’re testing the line from STDIN in this while:
while (<STDIN>) {
    do_something($_);
}
However, this is a special case in which Perl automatically converts to check $_ for definedness:
while ( defined( $_ = <STDIN> ) ) { # implicitly done
    do_something($_);
}
"
Effective Perl Programming, page 24.

You could do a number of things to improve your code:
use warnings;.
Use proper indentation.
Use descriptive variable names. Instead of $file2 (has no meaning, and why is there no file 1?), use $package_file or whatever makes sense.
if you are already using Text::CSV, you can use $csv->getline() to go through the file line by line. This will simplify your code. See the documentation for an example.
chomp($val) removes a newline from the end of a string. You are using it on a string literal you just declared, which has no newline. That doesn't make sense.
Never use the same variable ($val) to do two completely different things. This is extremely confusing.
Might the variables that you are interpolating in the regex contain special characters? If so, you need to escape them. For example, if $packDESC contained a period, it would match any character in the regex. To treat the contents of the variable literally, use \Q..\E, as in this example: /\Q$packID - $packDESC\E/ig.
You are opening customer_packages_report.txt and going through it line-by-line on every line of the csv file. You could simplify this by reading it in once and storing the results in an array.
You don't need a while loop to count matches: $cnt = () = /$packID - $packDESC/ig;. This puts the match in list context, returning a list of matches, then puts it back in scalar context to count them. A little bit tricky, but simpler. (Several of these points are pulled together in the sketch at the end of this answer.)
It's hard to say exactly what is causing your problem without seeing the data. Might you have some unnecessary repetition that stems from your nested looping over both files? I would start by rewriting to improve your code, then see if the problem still exists.
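Putting several of these suggestions together (getline, \Q...\E, reading the report file only once, and counting matches in list context), a rough sketch might look like the following. The file names are taken from the question; treat it as an illustration rather than a drop-in replacement.
use strict;
use warnings;
use Text::CSV;

my $package_file = 'master_plist.csv';
my $report_file  = 'customer_packages_report.txt';

# Read the report once, up front, instead of re-opening it for every CSV row.
open my $report_fh, '<', $report_file or die "Cannot open $report_file: $!";
my @report_lines = <$report_fh>;
close $report_fh;

my $csv = Text::CSV->new();
open my $csv_fh, '<', $package_file or die "Cannot open $package_file: $!";
while ( my $row = $csv->getline($csv_fh) ) {
    my ($packID, $packDESC) = @$row;
    my $cnt = 0;
    for my $line (@report_lines) {
        # \Q...\E makes metacharacters in the fields literal; the "() =" idiom
        # counts the matches without an explicit inner loop.
        $cnt += () = $line =~ /\Q$packID - $packDESC\E/ig;
    }
    print "Total occurrences of $packDESC: $cnt\n";
}
close $csv_fh;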

Your code seems to compile with perl -c without errors, so that's good. If I were to guess, I would assume your problem lies in having meta characters in some of your fields. The regex /$packID - $packDESC/ is vulnerable to meta characters. For example
my $str = "foo? bar";
$str =~ /$str/; # returns false, because ? is a meta character
In the above example, the question mark ? is a quantifier which affects whatever comes before it, so that o? means "0 or 1 o". To solve the meta character problem, use the \Q ... \E escape:
$str =~ /\Q$str/; # will now match
Terminating the escape sequence with \E is optional.
Some other things to note:
It is very good that you use use strict. You should also always use warnings. Not doing so does not remove the issues with your code; it only hides them.
You create a Text::CSV object with default settings. Depending on your input, that may or may not be appropriate. Setting binary => 1 is recommended in the documentation.
Using the parse() function may not be the best option, the documentation has good things to say about getline.
As loldop points out in the comments, you are reusing $val to read from your file. While technically that should work, it is asking for trouble.
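A minimal sketch of the binary => 1 and getline() suggestions together (the file name comes from the question, the settings from the Text::CSV documentation):
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag();
open my $fh, '<', 'master_plist.csv' or die $!;
while ( my $row = $csv->getline($fh) ) {   # getline() parses each line for you
    my ($packID, $packDESC) = @$row;
    # ... use the fields here ...
}
close $fh;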
Style and practice notes and practical tips:
Using three-argument open and lexical file handles is a good thing to do. Three-argument in essence means to use an explicit open mode, which makes your script safer to use. Using lexical file handles means that you will not have global scope on your file handle, which is a good thing.
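For instance, a two-line sketch using the report file name from the question:
open my $report_fh, '<', 'customer_packages_report.txt'
    or die "Cannot open report: $!";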
This code
my @columns2 = $csv2->fields();
my $packID = $columns2[0];
my $packDESC = $columns2[1];
Can be written like this
my ($packID, $packDESC) = $csv2->fields();
You are chomping $val right after you assign it. That is redundant, because chomp only removes newlines from the end of a string, and you did not add any. It doesn't break anything, but it isn't required here. If you read something from STDIN or a file, you would probably want to use chomp, though.
Using die without referring to the error $! is a sure way to make yourself annoyed.
Do not underestimate how much easier it becomes to write code when you use proper indentation. Use a text editor with automatic indentation and colouring. I can warmly recommend vim (gvim if you are using Windows). Though it has a learning curve, it is a powerful editor that also often comes already installed on many systems.

Since so many people have already commented on your program itself, I'm going to talk about how you can become a better Perl programmer, and help write in such a way that will help eliminate many of your issues.
Take a look at Perl::Tidy and run your program through it. That will help improve your syntax and your Perl, and will help you catch a lot of the various issues you're having.
Also, you should get a copy of Perl Best Practices, which is where most of Perl::Tidy is taken from. And, as someone already mentioned, Effective Perl Programming is another excellent book.
The big issue with Perl is that few people learn it formally. Most of us are tossed into situations where we have to pick it up ourselves. Plus, Perl is a fairly old and rather crufty language. Many Perl books still lean heavily on Perl 3.x ways of programming and fail to mention such basics as use strict; and use warnings;.
Combine old programming practices with most people learning Perl by hacking their way through old programs with old syntax (and probably written by people who learned Perl by hacking their way through even older programs), and you can see why Perl has a reputation of being a write-only language.

You may want to use the getline method from Text::CSV, which saves a few lines of code.
The problem is most likely that you have regex metacharacters in the strings you are searching for. Escape them with \Q...\E in the regex so that they are taken literally. In the rewrite below I have also used \s* instead of a literal space, just in case there isn't exactly one space on either side of the hyphen.
I have also changed the filehandles to lexical ones, which have the advantage that they will be closed automatically when the handle goes out of scope.
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

my $file2 = 'master_plist.csv';
my $csv2  = Text::CSV->new();

open(my $csv_fh, '<', $file2) or die $!;

while (my $row = $csv2->getline($csv_fh)) {
    my ($packID, $packDESC) = @$row;

    my $val = 'customer_packages_report.txt';
    chomp($val);

    open(my $fh, '<', $val) or die "wrong filename";
    my $cnt = 0;
    while ($val = <$fh>) {
        while ($val =~ /\Q$packID\E\s*-\s*\Q$packDESC\E/ig) {
            $cnt++;
        }
    }
    print "Total iterations of $packDESC: $cnt\n";
}


Perl, find a match and read next line in perl

I would like to use
myscript.pl targetfolder/*
to read some numbers from ASCII files.
myscript.pl
@list = <@ARGV>;
# Is the whole file or only 1st line is loaded?
foreach $file ( @list ) {
    open (F, $file);
}
# is this correct to judge if there is still file to load?
while ( <F> ) {
    match_replace()
}
sub match_replace {
    # if I want to read the 5th line in downward, how to do that?
    # if I would like to read multi lines in multi array[row],
    # how to do that?
    if ( /^\sName\s+/ ) {
        $name = $1;
    }
}
I would recommend a thorough read of perlintro - it will give you a lot of the information you need. Additional comments:
Always use strict and warnings. The first will enforce some good coding practices (like for example declaring variables), the second will inform you about potential mistakes. For example, one warning produced by the code you showed would be readline() on unopened filehandle F, giving you the hint that F is not open at that point (more on that below).
@list = <@ARGV>;: This is a bit tricky, I wouldn't recommend it - you're essentially using glob, and expanding targetfolder/* is something your shell should be doing, and if you're on Windows, I'd recommend Win32::Autoglob instead of doing it manually.
foreach ... { open ... }: You're not doing anything with the files once you've opened them - the loop to read from the files needs to be inside the foreach.
"Is the whole file or only 1st line is loaded?" open doesn't read anything from the file, it just opens it and provides a filehandle (which you've named F) that you then need to read from.
I'd strongly recommend you use the more modern three-argument form of open and check it for errors, as well as use lexical filehandles since their scope is not global, as in open my $fh, '<', $file or die "$file: $!";.
"is this correct to judge if there is still file to load?" Yes, while (<$filehandle>) is a good way to read a file line-by-line, and the loop will end when everything has been read from the file. You may want to use the more explicit form while (my $line = <$filehandle>), so that your variable has a name, instead of the default $_ variable - it does make the code a bit more verbose, but if you're just starting out that may be a good thing.
match_replace(): You're not passing any parameters to the sub. Even though this code might still "work", it's passing the current line to the sub through the global $_ variable, which is not a good practice because it will be confusing and error-prone once the script starts getting longer.
if (/^\sName\s+/){$name = $1;}: Since you've named the sub match_replace, I'm guessing you want to do a search-and-replace operation. In Perl, that's called s/search/replacement/, and you can read about it in perlrequick and perlretut. As for the code you've shown, you're using $1, but you don't have any "capture groups" ((...)) in your regular expression - you can read about that in those two links as well.
"if I want to read the 5th line in downward , how to do that ?" As always in Perl, There Is More Than One Way To Do It (TIMTOWTDI). One way is with the range operator .. - you can skip the first through fourth lines by saying next if 1..4; at the beginning of the while loop, this will test those line numbers against the special $. variable that keeps track of the most recently read line number.
"and if I would like to read multi lines in multi array[row], how to do that ?" One way is to use push to add the current line to the end of an array. Since keeping the lines of a file in an array can use up more memory, especially with large files, I'd strongly recommend making sure you think through the algorithm you want to use here. You haven't explained why you would want to keep things in an array, so I can't be more specific here.
So, having said all that, here's how I might have written that code. I've added some debugging code using Data::Dumper - it's always helpful to see the data that your script is working with.
#!/usr/bin/env perl
use warnings;
use strict;
use Data::Dumper;   # for debugging
$Data::Dumper::Useqq = 1;

for my $file (@ARGV) {
    print Dumper($file);   # debug
    open my $fh, '<', $file or die "$file: $!";
    while (my $line = <$fh>) {
        next if 1..4;
        chomp($line);      # remove line ending
        match_replace($line);
    }
    close $fh;
}

sub match_replace {
    my ($line) = @_;       # get argument(s) to sub
    my $name;
    if ( $line =~ /^\sName\s+(.*)$/ ) {
        $name = $1;
    }
    print Data::Dumper->Dump([$line, $name], ['line', 'name']);   # debug
    # ... do more here ...
}
The above code is explicitly looping over @ARGV and opening each file, and I did say above that more verbose code can be helpful in understanding what's going on. I just wanted to point out a nice feature of Perl, the "magic" <> operator (discussed in perlop under "I/O Operators"), which will automatically open the files in @ARGV and read lines from them. (There's just one small thing: if I want to use the $. variable and have it count the lines per file, I need to use the continue block I've shown below; this is explained in eof.) This would be a more "idiomatic" way of writing that first loop:
while (<>) {                        # reads line into $_
    next if 1..4;
    chomp;                          # automatically uses $_ variable
    match_replace($_);
} continue { close ARGV if eof }    # needed for $. (and range operator)

Perl: renaming doesn't work for $value filename [closed]

I want to fill a folder with copies of the same file under different names. I created a filelist.txt of the filenames using Windows cmd and then wrote the following code:
use strict;   # safety net
use warnings; # safety net
use File::NCopy qw(copy);

open FILE, 'C:\blabla\filelist.txt';
my @filelist = <FILE>;
my $filelistnumber = @filelist + 1;

my $file = 0;
## my $filename = 'null.txt';
my $filename = $filelist[$file];

while( $file < $filelistnumber ){
    copy('base.smp','temp.smp');
    rename 'temp.smp', $filename;
    $file = $file + 1;
};
If I try renaming it into 'test.smp' or whatever, it works. If I try the code above, I get this:
Use of uninitialized value $filename in print at blablabla/bla/bla.pl line 25, <FILE> line 90.
What am I doing wrong? I feel there's some kind of little mistake, a syntax mistake probably, that keeps evading me.
First, here's some improved code:
use strict;
use warnings;
use File::Copy;
while (<>) {
    chomp;
    copy('base.smp', $_) or die $!;
}
You'll save it as script.pl and invoke it like this:
$ perl script.pl C:\blabla\filelist.txt
In what ways is this code an improvement?
It uses the core module File::Copy instead of the deprecated File::NCopy.
It uses the null filehandle or "diamond operator" (<>) to implicitly iterate over a file given as a command line parameter, which is simple and elegant.
It handles errors in the event that copy() fails for some reason.
It doesn't use a while loop or a C-style for loop to iterate over an array, which are both prone to off-by-one errors and forgetting to re-assign the iterator, as you've discovered.
It doesn't use the old 2-argument syntax for open(). (Well, not explicitly, but that's kind of beyond the scope of this answer.)
What am I doing wrong? I feel there's some kind of little mistake, a
syntax mistake probably, that keeps evading me.
A syntax error would have resulted in an error message saying that there was a syntax error. But since you asked what you're doing wrong, let's walk through it:
use File::NCopy qw(copy);
This module was last updated in 2007 and is marked as deprecated. Don't use it.
open FILE, 'C:\blabla\filelist.txt';
You should use the three-argument form of open, use a lexical filehandle, and always check the return values of system calls.
my #filelist = <FILE>;
Rarely do you need to slurp an entire file into memory. In this case, you don't.
my $filelistnumber = #filelist + 1;
There's nothing inherently wrong with this line, but there is when you consider how you're using it later on. Remember that arrays are 0-indexed, so you've just set yourself up for an out of bounds array index. But we'll get to that in a second.
my $filename = $filelist[$file];
You would typically want to do this assignment inside your loop, lest you forget to update it after incrementing your counter (which is exactly what happened here).
while( $file < $filelistnumber ){
This is an odd way to iterate over an array in Perl. You could use a typical C-style for loop, but the most Perlish thing to do would be to use a foreach-style loop:
for my $element (@array) {
    ...
}
Each element of the list is localized to the loop, and you don't have to worry about counters, conditions, or array bounds.
copy('base.smp','temp.smp');
Again, always check the return values of system calls.
rename 'temp.smp', $filename;
No need to do a copy and a rename. You can copy to your final destination filename the first time. But if you are going to rename, always check the return values of system calls.
};
Blocks don't need to be terminated with a semicolon like simple statements do.
You should avoid using bareword file handles. When opening a file, you should use a lexical filehandle (a handle stored in a variable) and make sure you catch the failure if the open does not succeed:
open(my $fh, '<', 'C:\blabla\filelist.txt') or die "Cannot open filelist.txt: $!";
The $fh variable will contain your file reference.
For your problem, it looks as though your filelist.txt must be empty. Try using Data::Dumper to print out your @filelist to determine its contents.
use Data::Dumper;
print Dumper(\@filelist);
EDIT:
Looks like you also want to set the $filename variable to the next one in the list on each iteration, so put $filename = $filelist[$file]; at the beginning of your loop.
Your problem could also be that you are looping too far. Try getting rid of the + 1 in my $filelistnumber = @filelist + 1;
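Putting those two fixes together, a rough sketch of the corrected loop might look like this. It keeps the copy-then-rename structure and the variable names from the question; the chomp is an extra precaution, since names read from filelist.txt still end in a newline.
my $filelistnumber = @filelist;               # no "+ 1", so we stay inside the array
my $file = 0;
while ( $file < $filelistnumber ) {
    my $filename = $filelist[$file];          # pick up the next name on every pass
    chomp $filename;                          # strip the trailing newline from the list file
    copy('base.smp', 'temp.smp') or die "copy failed: $!";
    rename 'temp.smp', $filename or die "rename to $filename failed: $!";
    $file = $file + 1;
}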

PERL: String Replacement on file

I am working on a script to do a string replacement in a file. I will read the variables, values, and file names from a configuration file and do the string replacement.
Here is my logic to do the string replacement.
sub expansion($$$) {
    my $f     = shift(@_);   # file name
    my $vname = shift(@_);   # variable name for pattern match
    my $value = shift(@_);   # value to replace
    my $n = "$f" . ".new";

    open ( O, "<$f" ) or print( "Can't open $f file: $!" );
    open ( N, ">$n" ) or print( "Can't open $n file: $!" );

    while (<O>)
    {
        $_ =~ s/$vname/$value/g;   # check for pattern
        print N "$_";
    }
    close (O);
    close (N);
}
In my logic I am reading the input file ($f) line by line, looking for the pattern, and writing to a new file ($n).
Instead of writing to a new file, is there any way to do the string replacement in the original file? When I tried that, I ended up with an empty file with no contents.
Do not. Never, ever.[1] Don't you dare, don't even think of, do not use subroutine prototyping. It is horribly broken (that is, it doesn't do what you think it does) and is dangerous.
Now that we've got that out of the way:
Yes, you can do what you want. You can open a file as both readable and writable by using the +< mode. So far, so good.
However, due to buffering, you cannot use the standard read and write methods to read and write to the file. Instead, you need to use sysread and syswrite.
Then, what you need to do is read the line, use sysseek to go back to the start of where you read, and then write to that spot.
Not only is it very complex to do, but it is full of peril. Let's take a simple example. I have a document, and I want to replace my curly quotes with straight quotes.
$line =~ s/“|”/"/g;
That should work. I'm replacing one character with another. What could go wrong?
If this is a UTF-8 file (which is what Macs and Linux systems use by default), those curly quotes are multi-byte characters and that straight quote is a single-byte character. I would be writing back a line that was shorter than the line I read in. My buffer is going to be off.
Back in the days when computer memory and storage were measured in kilobytes, and you had serial devices like reel-to-reel tapes, this type of operation was quite common. However, in this age where storage is vast, it's simply not worth the complexity and the error-prone process that this entails. Stick with reading from one file and writing to another. Then use unlink and rename to delete the original and to rename the copy to the original's name.
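A rough sketch of that recommended approach, reusing the expansion() arguments from the question (treat it as an outline under those assumptions, not a drop-in replacement):
use strict;
use warnings;
use autodie;   # open, close, unlink, and rename now die on failure automatically

sub expansion {
    my ( $file, $vname, $value ) = @_;
    my $new = "$file.new";

    open my $in,  '<', $file;
    open my $out, '>', $new;
    while ( my $line = <$in> ) {
        $line =~ s/\Q$vname\E/$value/g;
        print {$out} $line;
    }
    close $in;
    close $out;

    unlink $file;        # remove the original ...
    rename $new, $file;  # ... and give the new copy its name
}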
A few more pointers:
Don't print if the file can't be opened. Use die. Otherwise, your program will simply continue on blithely unaware that it is not working. Even better, use the pragma use autodie;, and you won't have to worry about testing whether or not a read/write failed.
Use scalars for file handles.
That is instead of
open OUT, ">my_file.txt";
use
open my $out_fh, ">my_file.txt";
And, it is highly recommended to use the three parameter open:
Use
open my $out_fh, ">", "my_file.txt";
If you aren't already, always add use strict; and use warnings;.
In fact, your Perl syntax is a bit dated. You need to get a book on Modern Perl. Perl originally was written as a hack language to replace shell and awk programming. However, Perl has morphed into a full-fledged language that can handle complex data types, object orientation, and large projects. Learning the modern syntax of Perl will help you find errors and become a better developer.
[1] Like all rules, this can be broken, but only if you have a clear and careful understanding of what is going on. It's like those shows that say "Don't do this at home. We're professionals."
sub inplace_expansion($$$) {
    my $f     = shift(@_);   # file name
    my $vname = shift(@_);   # variable name for pattern match
    my $value = shift(@_);   # value to replace

    local @ARGV = ( $f );
    local $^I   = '';

    while (<>)
    {
        s/\Q$vname/$value/g;   # check for pattern
        print;
    }
}
or, my preference would run closer to this (basically equivalent, changes mostly in formatting, variable names, etc.):
use English;

sub inplace_expansion {
    my ( $filename, $pattern, $replacement ) = @_;

    local @ARGV = ( $filename ),
          $INPLACE_EDIT = '';

    while ( <> ) {
        s/\Q$pattern/$replacement/g;
        print;
    }
}
The trick with local basically simulates a command-line script (as one would run with perl -e); for more details, see perldoc perlrun. For more on $^I (aka $INPLACE_EDIT), see perldoc perlvar.
(For the business with \Q (in the s// expression), see perldoc -f quotemeta. This is unrelated to your question, but good to know. Also be aware that passing regex patterns around in variables—as opposed to, e.g., using literal regexes exclusively— can be vulnerable to injection attacks; Perl's built-in taint mode is useful here.)
EDIT: David W. is right about prototypes.

Removing whitespace and line breaks between delimiters in Perl

I am new to Perl and am trying to sort out an issue, but have not had success. I am trying to read data from a text file. The code is:
open FH, 'D:\Learning\Test.txt' or die $!;
my @data_line;
while (<FH>)
{
    @data_line = split (/\|\~/);
    print @data_line;
}
The file content is like this:
101|~John|~This line is
broken and showing
space in print|~version123|~data|~|~|~
102|~Abrahim|~This is a line to be print|~version1.3|~|~|~|~
And the output is:
101JohnThis line is
broken and showing
space in printversion123data
102AbrahimThis is a line to be printversion1.3
I just want to show the data in one line between the delimiters like:
101JohnThis line is broken and showing space in printversion123data
102AbrahimThis is a line to be printversion1.3
Please suggest what I should do. I have also tried chomp(@data_line), but it did not work.
I am using the Windows operating system.
I want to insert these "|~"-separated values into different fields of a table. I added:
$_ =~ s/\n//g;
before @data_line = split (/\|\~/);
It printed the details as per my requirement, but the data is not being inserted properly into my database table.
Please suggest what I should do. Thanks in advance.
A slight rewrite:
use strict;
use warnings;
use feature qw(say);   # See note #1
use autodie;           # See note #2

use constant FILE => 'D:/Learning/Test.txt';   # See note #3

open my $fh, "<", FILE;   # See note #4

my $desired_output;
while ( my $line = <$fh> ) {   # See note #5
    chomp $line;               # See note #6
    $line =~ s/\|~//g;
    if ( $desired_output ) {
        if ( $line =~ /^\d+/ ) {
            $desired_output .= "\n$line";
        }
        else {
            $desired_output .= " $line";
        }
    }
    else {                     # See note #7
        $desired_output = $line;
    }
}
close $fh;                     # See note #8
say "$desired_output";
Instead of using split, why not simply remove the field separators entirely with the substitute command? Also note that I save the output as one continuous line. The interior if structure is a bit more complex than I like, but it's pretty easy to follow. If there is no data in $desired_output, I simply set $desired_output equal to my line. Otherwise, I'll check to see if $line begins with a number. If it does, I'll append a \n to $desired_output and then append $line. Otherwise, I append a space and then $line.
Now for my notes. This is more or less written in what is now called the standard Perl style. This includes some good advice (use strict, warnings, etc) and the way modern programs are now laid out. For example, use underscores to separate out words in variable names instead of camel casing them ($desired_output vs. $desiredOutput). A lot of this is covered in Damian Conway's Perl Best Practices. These might not be the way I'd like to do things, but I do them because it's what everyone else is doing. And, it's usually more important to follow a standard than to complain about it. It's about maintenance and readability. You follow the crowd.
Always put these three lines at the top of all of your programs. The first two will catch 90% of your programming errors, and use feature qw(say); allows you to use say instead of print. It saves you from having to add a \n at the end, and this can be a bit more important than it sounds right now. Trust me, you would rather use say instead of print when possible.
use autodie handles many situations in Perl when your program should not continue to run. For example, if you can't read in your file, you might as well not continue your program. The nice thing about autodie is that it will stop your program short when you forget to test the return value of your commands.
When something doesn't change, you should make it a constant. This puts all of your unchanging data in one place, and it allows you to define mystery numbers like PI = 3.1416. Unfortunately, constants cannot be easily interpolated into output unless you know the Perl deep dark secret.
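The note above does not spell that secret out; one common idiom (my assumption about what is meant here) is to wrap the constant in @{[ ... ]}, which forces it to be evaluated inside a double-quoted string:
use strict;
use warnings;
use feature qw(say);

use constant PI => 3.1416;

say "pi is PI";          # constants are not interpolated: prints the literal text "pi is PI"
say "pi is @{[ PI ]}";   # the @{[ ... ]} trick evaluates PI: prints "pi is 3.1416"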
When you open a file, use the three parameter form of the open command, and use scalar file handles. You can more easily pass a scalar file handle to a subroutine than you can with the older global handle.
Don't use $_, the automatic variable unless you have to (like in grep or map). It doesn't improve readability or speed up execution. And, it has the propensity of getting you into trouble. It's a global variable in all packages and can be affected without you even knowing it.
I always chomp every time I read in data that could possibly have a new line on the end, even if it might prove handy later. New lines on the end of lines can cause all sorts of consternations with regular expressions. This could have been done inside the while itself: while ( chomp ( my $line = <$fh> ) ), but that doesn't add to readability or speed.
Note my indentation and the way I use parentheses. This is now the preferred standard. It took me several years of unlearning the way it's done in Pascal and K&R-style C to do it this way. Might as well learn it the right way early on.
Always close file handles when you're done with them. It's just good form.
You need to chomp the "it" variable ($_) just before you split.
while (<FH>)
{
    chomp ($_);
    @data_line = split (/\|\~/);
    print @data_line;
}
I usually use an explicit variable to make it more readable.
while ( my $line = <FH> )
{
    chomp ($line);
    ...
open FH, 'D:\Learning\Test.txt' or die $!;
my @data_line;
while (<FH>)
{
    chomp;
    @data_line = split (/\|\~/);
    print @data_line;
}
You can use chomp to erase the '\n' from each line of the file.
This one-liner will also help you, but note that it will change your input file:
perl -pi -e 's/\|\~//g;s/\n/ /g' test.txt

Should I manually set Perl's @ARGV so I can use <> to open, scan, and close files?

I have recently started learning Perl and one of my latest assignments involves searching a bunch of files for a particular string. The user provides the directory name as an argument and the program searches all the files in that directory for the pattern. Using readdir() I have managed to build an array with all the searchable file names, and now I need to search each and every file for the pattern. My implementation looks something like this:
sub searchDir($) {
    my $dirN = shift;
    my @dirList = glob("$dirN/*");
    for (@dirList) {
        push @fileList, $_ if -f $_;
    }
    @ARGV = @fileList;
    while (<>) {
        ## Search for pattern
    }
}
My question is: is it all right to manually load the @ARGV array as has been done above and use the <> operator to scan in individual lines, or should I open / scan / close each file individually? Will it make any difference if this processing exists in a subroutine and not in the main function?
On the topic of manipulating @ARGV - that's definitely working code, Perl certainly allows you to do that. I don't think it's a good coding habit though. Most of the code I've seen that uses the "while (<>)" idiom is using it to read from standard input, and that's what I initially expect your code to do. A more readable pattern might be to open/close each input file individually:
foreach my $file (@files) {
    open FILE, "<$file" or die "Error opening file $file ($!)";
    my @lines = <FILE>;
    close FILE or die $!;
    foreach my $line (@lines) {
        if ( $line =~ /$pattern/ ) {
            # do something here!
        }
    }
}
That would read more easily to me, although it is a few more lines of code. Perl allows you a lot of flexibility, but I think that makes it that much more important to develop your own style in Perl that's readable and understandable to you (and your co-workers, if that's important for your code/career).
Putting this processing in the main function or in a subroutine is also mostly a stylistic decision that you should play around with and think about. Modern computers are so fast at this stuff that style and readability are much more important for scripts like this, as you're not likely to encounter situations in which such a script over-taxes your hardware.
Good luck! Perl is fun. :)
Edit: It's of course true that if he had a very large file, he should do something smarter than slurping the entire file into an array. In that case, something like this would definitely be better:
while ( my $line = <FILE> ) {
    if ( $line =~ /$pattern/ ) {
        # do something here!
    }
}
The point when I wrote "you're not likely to encounter situations in which such a script over-taxes your hardware" was meant to cover that, sorry for not being more specific. Besides, who even has 4GB hard drives, let alone 4GB files? :P
Another Edit: After perusing the Internet on the advice of commenters, I've realized that there are hard drives that are much larger than 4GB available for purchase. I thank the commenters for pointing this out, and promise in the future to never-ever-ever try to write a sarcastic comment on the internet.
I would prefer this more explicit and readable version:
#!/usr/bin/perl -w
foreach my $file (<$ARGV[0]/*>) {
    open(F, $file) or die "$!: $file";
    while (<F>) {
        # search for pattern
    }
    close F;
}
But it is also okay to manipulate @ARGV:
#!/usr/bin/perl -w
@ARGV = <$ARGV[0]/*>;
while (<>) {
    # search for pattern
}
Yes, it is OK to adjust the argument list before you start the 'while (<>)' loop; it would be more nearly foolhardy to adjust it while inside the loop. If you process option arguments, for instance, you typically remove items from @ARGV (a small sketch of that follows below); here, you are adding items, but it still changes the original value of @ARGV.
It makes no odds whether the code is in a subroutine or in the 'main function'.
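A hypothetical illustration of removing an option from @ARGV before the loop (the -v flag is made up for the example):
my $verbose = 0;
if ( @ARGV and $ARGV[0] eq '-v' ) {
    $verbose = 1;
    shift @ARGV;          # take the option off the argument list
}
while (<>) {              # <> now sees only the remaining (file) arguments
    print if $verbose;
}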
The previous answers cover your main Perl-programming question rather well.
So let me comment on the underlying question: How to find a pattern in a bunch of files.
Depending on the OS it might make sense to call a specialised external program, say
grep -l <pattern> <path>
on unix.
Depending on what you need to do with the files containing the pattern, and how big the hit/miss ratio is, this might save quite a bit of time (and re-uses proven code). A small example of calling it from Perl follows.
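For instance, a sketch of shelling out to grep on a Unix system (the pattern and directory are placeholders, and the quoting is deliberately naive):
my $pattern = 'customer';                                   # placeholder pattern
my @matching_files = `grep -l '$pattern' targetfolder/*`;   # -l prints only the matching file names
chomp @matching_files;                                      # strip grep's trailing newlines
print "$_\n" for @matching_files;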
The big issue with tweaking @ARGV is that it is a global variable. Also, you should be aware that while (<>) has special magic attributes (reading each file in @ARGV, or processing STDIN if @ARGV is empty, and testing for definedness rather than truth). To reduce the magic that needs to be understood, I would avoid it, except for quickie hack jobs.
You can get the filename of the current file by checking $ARGV.
You may not realize it, but you are actually affecting two global variables, not just @ARGV. You are also hitting $_. It is a very, very good idea to localize $_ as well.
You can reduce the impact of munging globals by using local to localize the changes.
BTW, there is another important, subtle bit of magic with <>. Say you want to return the line number of the match in the file. You might think, OK, check perlvar and find that $. gives the line number in the last handle accessed--great. But there is an issue lurking here: $. is not reset between @ARGV files. This is great if you want to know how many lines total you have processed, but not if you want a line number for the current file. Fortunately there is a simple trick with eof that will solve this problem.
use strict;
use warnings;

...

searchDir( 'foo' );

sub searchDir {
    my $dirN    = shift;
    my $pattern = shift;

    local $_;

    my @fileList = grep { -f $_ } glob("$dirN/*");
    return unless @fileList;   # Don't want to process STDIN.

    local @ARGV;
    @ARGV = @fileList;

    while (<>) {
        my $found = 0;
        ## Search for pattern
        if ( $found ) {
            print "Match at $. in $ARGV\n";
        }
    }
    continue {
        # reset line numbering after each file.
        close ARGV if eof;     # don't use eof().
    }
}
WARNING: I just modified your code in my browser. I have not run it, so it may have typos and probably won't work without a bit of tweaking.
Update: The reason to use local instead of my is that they do very different things. my creates a new lexical variable that is only visible in the contained block and cannot be accessed through the symbol table. local saves the existing package variable and aliases it to a new variable. The new localized version is visible in any subsequent code, until we leave the enclosing block. See perlsub: Temporary Values Via local().
In the general case of making new variables and using them, my is the correct choice. local is appropriate when you are working with globals, but you want to make sure you don't propagate your changes to the rest of the program.
This short script demonstrates local:
$foo = 'foo';

print_foo();
print_bar();
print_foo();

sub print_bar {
    local $foo;
    $foo = 'bar';
    print_foo();
}

sub print_foo {
    print "Foo: $foo\n";
}