Below I've provided just a chunk of a huge Perl script I am trying to write. I am getting syntax errors in the else statement, but the console window only reports a syntax error in the script and does not clearly identify it. I am trying to create a file named file_no_$i.txt, copy the contents of t_code.txt into it, and then find and replace a string in that file with some selected keys of the hash %defines_2.
open ( my $pointer, "<", "t_code.txt" ) or die $!;
my $out_pointer;
for (my $i=0 ; $i <=$#match ; $i++) {
for (my $j=0; $j <= $#match ; $j++) {
if ($match[$i]=~$match[$j]) {
next;
}
else {
my $file_name = "file_no_$i.txt";
open $out_pointer, ">" , $file_name or die "Can't open the output file!";
copy("$file_name","t_code.txt") or die "Copy failed: $!";
my @lin = <$out_pointer>;
foreach $_ (@lin) {
$_ =~ s/UART90_BASE_ADDRESS/$defines_2{ $_ = grep{/$match[$i]/} (keys %defines_2)};
}
}
}
}
You cannot use / unquoted inside a s/// construct. Instead of backslashes, you can use different delimiters:
s#UART90_BASE_ADDRESS#$defines_2{ $_ = grep{/$match[$i]/} (keys %defines_2)}#;
It fixes the syntax error, but I fear it still won't do what you want. Without data, it's hard to test, though.
What I think you're doing is editing a number of text files whose names look like file_no_1.txt etc. You're doing that by copying the current file to t_code.txt and then reading that file line by line, editing as required, and writing the lines back to the original text file.
The problem with that approach is that the file will be copied and rewritten many times, and it would be better to read the whole file into an array, make all the edits, and then write them back in one operation. That would be fine unless the file is enormous — say, several GB.
Here's some code that implements that approach. You see that $file_name is defined and @lines is filled outside the inner loop. The innermost loop modifies the elements of @lines and, outside that loop again, @lines is written back to the original text file.
I couldn't fathom a couple of things about your code.
I'm not sure if you should be using =~ or if you intended a simple eq. The former does a contains test, and you had a problem in the past where you meant to check that the first string had the second at the end.
The grep call
grep{/$match[$i]/} (keys %defines_2)
worries me, as it can potentially return more than one key of the %defines_2 hash, in which case your own code will insert what is pretty much a random selection from the hash elements.
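To make that concern concrete, here is a minimal sketch using a made-up %defines_2 (the keys, values and $pattern are assumptions, not your data). It shows that grep yields a count in scalar context, which is what an assignment like $_ = grep {...} produces, and the matching keys in list context. It also shows one possible way to refuse the substitution when the match is ambiguous.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real %defines_2 hash.
my %defines_2 = (
    UART0_BASE => '0x4000',
    UART1_BASE => '0x5000',
    SPI0_BASE  => '0x6000',
);
my $pattern = 'UART';

# Scalar context: grep returns the NUMBER of matching keys, not a key.
my $count = grep { /$pattern/ } keys %defines_2;

# List context: grep returns the matching keys themselves.
my @matches = sort grep { /$pattern/ } keys %defines_2;

# Substitute only when exactly one key matched.
my $line = "base = UART90_BASE_ADDRESS;\n";
if ( @matches == 1 ) {
    $line =~ s/UART90_BASE_ADDRESS/$defines_2{ $matches[0] }/;
}
else {
    warn "Pattern '$pattern' matched ", scalar @matches, " keys: @matches\n";
}
```

Here two keys match, so the line is left untouched and a warning is printed instead of a silently arbitrary replacement.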
If your code is working then that's fine, but if not then I hope this helps you fix it. If you need more help on this chunk of code then you should include a small sample of the data so that we can better understand what is going on.
for my $i (0 .. $#match) {
my $file_name = "file_no_$i.txt";
my @lines = do {
open my $in_fh, '<', 't_code.txt' or die $!;
<$in_fh>;
};
for my $j (0 .. $#match) {
next if $match[$i] =~ $match[$j];
for ( @lines ) {
my ($match) = grep { /$match[$i]/ } keys %defines_2;
s/UART90_BASE_ADDRESS/$defines_2{$match}/;
}
}
open my $out_fh, '>', $file_name or die qq{Can't open "$file_name" for output: $!};
print $out_fh $_ for @lines;
close $out_fh or die qq{Failed to close output file "$file_name": $!};
}
I am trying to copy the content of three separate .vect files into one. I want to do this for all 5,000 files in the $fromdir directory.
When I run this program it generates just a single modified .vect file in the output directory. If I include the close(DATA) calls after individual while loops inside the foreach loop, I get the same behavior: a single output file in the output directory instead of the wanted 5,000 files.
I have done some reading, and at first thought I may not be opening the files. But if I print($vectfile) in the foreach loop every file name in the directory is printed.
My second thought was that it was how I was closing the files, but I get the same behavior whether I close the file handles inside or outside the foreach loop.
My final thought was maybe I don't have write permission to the file or directory, but I don't know how to change this.
How can I get this loop to run all 5,000 times and not just once?
use strict;
use warnings;
use feature qw(say);
my $dir = "D:\\Downloads";
# And M3.1 and P3.1
my $subfolder = "A0.1";
my $fromdir = $dir . "\\" . $subfolder;
my @files = <$fromdir/*vect>;
# Top of file
my $readfiletop = "C:\\Users\\Owner\\Documents\\MoreKnotVis\\ScriptsForAdditionalDataSets\\VectFileHeader.vect";
# Bottom of file
my $readfilebottom = "C:\\Users\\Owner\\Documents\\MoreKnotVis\\ScriptsForAdditionalDataSets\\VectFileCloser.vect";
foreach my $vectfile ( @files ) {
say("$vectfile");
my $count = 0;
my $readfilebody = $vectfile;
my $out_file = "D:\\Downloads\\ColorsA0.1\\" . "$count" . ".vect";
$count++;
# open top part of each file
open(DATA1, "<", $readfiletop) or die "Can't open '$readfiletop': $!";
# open bottom part of each file
open(DATA3, "<", $readfilebottom) or die "Can't open '$readfilebottom': $!";
# open a file to read
open(DATA2, "<", $vectfile) or die "Can't open '$vectfile': $!";
# open a file to write to
open(DATA4, ">" ,$out_file) or die "Can't open '$out_file': $!";
# Copy data from VectFileTop file to another.
while ( <DATA1> ) {
print DATA4 $_;
}
# Copy the data from VectFileBody to another.
while ( <DATA2> ) {
print DATA4 $_, $_ if 8..12;
}
# Copy the data from VectFileBottom to another.
while ( <DATA3> ) {
print DATA4 $_;
}
}
close( DATA1 );
close( DATA2 );
close( DATA3 );
close( DATA4 );
print("quit\n");
You construct the output file name with $count in it.
But note what happens to this variable:
inside the loop you set it to 0,
the output file name is constructed with 0 in it,
then you increment it, but this has no effect, because the variable
is set to 0 again on the next iteration of the loop.
The effect is that:
the loop executes the required number of times,
but the output file name contains 0 as the "number" every time,
so you keep overwriting the same file with new content.
Move the my $count = 0; statement before the loop and everything
should be OK.
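As a minimal sketch of that fix, the snippet below declares the counter once, before the loop. The input names in @inputs are made up for illustration, and the output names are only collected, not opened.

```perl
use strict;
use warnings;

# Hypothetical input names standing in for the real .vect files.
my @inputs = qw( a.vect b.vect c.vect );

my $count = 0;    # declared once, before the loop
my @out_names;
for my $input (@inputs) {
    # Each pass sees the value left behind by the previous pass.
    push @out_names, $count . '.vect';
    $count++;
}

print "$_\n" for @out_names;
```

Because $count survives from one iteration to the next, every input gets its own output name.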
You seem to be clinging to a specific form of code in fear of everything falling apart if you change a single thing. I recommend that you dare to stray a little more from the formula so that the code is more concise and readable.
The problem is that you reset your $count to zero before processing each input file, so all the output files have the same name and overwrite one another. The remaining output file contains only the data from the last input file.
Here's a refactoring of your code. I can't guarantee that it will run correctly but it looks right and does compile.
I've added use autodie to avoid having to check the status of every IO operation.
I've used the same lexical file handle $fh for all the input files. Opening another file on a file handle that is already open will close it first, and a lexical file handle will be closed by perl when it goes out of scope at the end of the block. Note that reopening a handle without an explicit close does not reset the line counter $., which matters wherever line numbers are tested.
I've used a while loop to iterate over the input file names instead of reading the whole list into an array, which unnecessarily uses an additional variable @files and wastes space.
I've used forward slashes instead of backslashes in all the file paths. This is fine in library calls on Windows: it is only a problem if they appear in command line input.
I hope you'll agree that this form is more readable. I think you would have stood a much better chance of finding the problem if your code were in this form.
use strict;
use warnings;
use autodie;
use feature qw/ say /;
my $indir = 'D:/Downloads';
my $subdir = 'A0.1'; # And M3.1 and P3.1
my $extrasdir = 'C:/Users/Owner/Documents/MoreKnotVis/ScriptsForAdditionalDataSets';
my $outdir = "$indir/Colors$subdir";
my $topfile = "$extrasdir/VectFileHeader.vect";
my $bottomfile = "$extrasdir/VectFileCloser.vect";
my $filenum;
while ( my $vectfile = glob "$indir/$subdir/*.vect" ) {
say qq/Processing "$vectfile"/;
$filenum++;
open my $outfh, '>', "$outdir/$filenum.vect";
my $fh;
open $fh, '<', $topfile;
print { $outfh } $_ while <$fh>;
close $fh; # an explicit close resets $. so that 8..12 below counts lines of $vectfile only
open $fh, '<', $vectfile;
while ( <$fh> ) {
print { $outfh } $_, $_ if 8..12;
}
open $fh, '<', $bottomfile;
print { $outfh } $_ while <$fh>;
}
say 'DONE';
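The 8..12 test may look cryptic, so here is a small sketch of how it behaves. It reads fifteen made-up lines from an in-memory file (an assumption made to keep the example self-contained) rather than a real .vect file.

```perl
use strict;
use warnings;

# Fifteen numbered lines held in memory instead of a real .vect file.
my $text = join '', map { "line $_\n" } 1 .. 15;

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @picked;
while ( my $line = <$fh> ) {
    # With constant operands the range operator compares against $.,
    # so this is true from input line 8 through input line 12.
    push @picked, $line if 8 .. 12;
}
close $fh;    # an explicit close also resets $.

print @picked;
```

Since the comparison is against $., the result depends on the line counter of the handle that was read last, which is why closing a handle before reusing it matters.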
I'm incredibly new to Perl, and never have been a phenomenal programmer. I have some successful BVA routines for controlling microprocessor functions, but never anything embedded, or multi-faceted. Anyway, my question today is about a boggle I cannot get over when trying to figure out how to remove duplicate lines of text from a text file I created.
The file could have several of the same lines of text in it, not sequentially placed, which is problematic as I'm practically comparing the file to itself, line by line. So, if the first and third lines are the same, I'll write the first line to a new file, not the third. But when I compare the third line, I'll write it again since the first line is "forgotten" by my current code. I'm sure there's a simple way to do this, but I have trouble making things simple in code. Here's the code:
my $searchString = pseudo variable "ideally an iterative search through the source file";
my $file2 = "/tmp/cutdown.txt";
my $file3 = "/tmp/output.txt";
my $count = "0";
open (FILE, $file2) || die "Can't open cutdown.txt \n";
open (FILE2, ">$file3") || die "Can't open output.txt \n";
while (<FILE>) {
print "$_";
print "$searchString\n";
if (($_ =~ /$searchString/) and ($count == "0")) {
++ $count;
print FILE2 $_;
} else {
print "This isn't working\n";
}
}
close (FILE);
close (FILE2);
Excuse the way filehandles and scalars do not match. It is a work in progress... :)
The secret of checking for uniqueness is to store the lines you have seen in a hash and only print lines that don't exist in the hash.
Updating your code slightly to use more modern practices (three-arg open(), lexical filehandles) we get this:
my $file2 = "/tmp/cutdown.txt";
my $file3 = "/tmp/output.txt";
open my $in_fh, '<', $file2 or die "Can't open cutdown.txt: $!\n";
open my $out_fh, '>', $file3 or die "Can't open output.txt: $!\n";
my %seen;
while (<$in_fh>) {
print $out_fh $_ unless $seen{$_}++;
}
But I would write this as a Unix filter. Read from STDIN and write to STDOUT. That way, your program is more flexible. The whole code becomes:
#!/usr/bin/perl
use strict;
use warnings;
my %seen;
while (<>) {
print unless $seen{$_}++;
}
Assuming this is in a file called my_filter, you would call it as:
$ ./my_filter < /tmp/cutdown.txt > /tmp/output.txt
Update: But this doesn't use your $searchString variable. It's not clear to me what that's for.
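If the lines are already in an array, the same %seen idea fits in a single grep. This is only a sketch with made-up data in @lines, not a replacement for the filter above.

```perl
use strict;
use warnings;

my @lines = ( "foo\n", "bar\n", "foo\n", "baz\n", "bar\n" );

my %seen;
# $seen{$_}++ returns the old count: 0 (false) on the first sighting,
# so !$seen{$_}++ keeps only the first occurrence of each line.
my @unique = grep { !$seen{$_}++ } @lines;

print @unique;    # foo, bar, baz, in first-seen order
```

The post-increment is what makes this work: the test sees the count before the current line is recorded.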
If your file is not very large, you can store each line read from the input file as a key in a hash variable, and then print the hash keys (in order). Something like this:
my %lines = ();
my $order = 1;
open my $fhi, "<", $file2 or die "Cannot open file: $!";
while( my $line = <$fhi> ) {
$lines {$line} = $order++;
}
close $fhi;
open my $fho, ">", $file3 or die "Cannot open file: $!";
#Sort the keys, only if needed
my @ordered_lines = sort { $lines{$a} <=> $lines{$b} } keys(%lines);
for my $key( @ordered_lines ) {
print $fho $key;
}
close $fho;
You need two things to do that:
a hash to keep track of all the lines you have seen
a loop reading the input file
This is a simple implementation, called with an input filename and an output filename.
use strict;
use warnings;
open my $fh_in, '<', $ARGV[0] or die "Could not open file '$ARGV[0]': $!";
open my $fh_out, '>', $ARGV[1] or die "Could not open file '$ARGV[1]': $!";
my %seen;
while (my $line = <$fh_in>) {
# check if we have already seen this line
if (not $seen{$line}) {
print $fh_out $line;
}
# remember this line
$seen{$line}++;
}
To test it, I've included it with the DATA handle as well.
use strict;
use warnings;
my %seen;
while (my $line = <DATA>) {
# check if we have already seen this line
if (not $seen{$line}) {
print $line;
}
# remember this line
$seen{$line}++;
}
__DATA__
foo
bar
asdf
foo
foo
asdfg
hello world
This will print
foo
bar
asdf
asdfg
hello world
Keep in mind that the memory consumption will grow with the file size. It should be fine as long as the text file is smaller than your RAM. Perl's hash memory consumption grows faster than linearly, but your data structure is very flat.
I have files with names such as lin.txt and lin1.txt along with other .txt files. I need to find only these files and print their contents one by one. I have the code below, but it's somehow not matching the files starting with lin*. What is the issue?
$te_dir= "/projects/xxx/";
opendir (DIR, $te_dir) or die $!;
while (my $file = readdir(DIR))
{
if ($file=~/\.txt/)
{
#// Doing some tasks.
if($file ~= 'lin*.txt')
{
$linfile=$te_dir/$file;
open(LINFILE, $linfile) or die "Couldn't open file $file:$!";
while(my $line = <LINFILE>)
{
print $line;
}
close LINFILE;
}
}
}
You are mixing globs (shell wildcards) with regular expressions. These are two different formalisms with different syntax and semantics. In regular expressions (which is what Perl matching uses), n* matches zero or more occurrences of the character n. You probably mean
if ($file =~ /lin.*\.txt/)
Notice also the syntax error in the operator. You correctly have =~ in the first conditional, but you misspelled it as ~= where you do this comparison. (Maybe it's just a transcription error; for me, this creates a clear syntax error, so the script would not run in the first place.)
As noted in @brianadams' answer, the proper regular expression for this is
if ($file =~ /^lin.*\.txt$/)
with beginning of line ^ and end of line $ anchors to prevent e.g. feline.txt.html from matching. The default behavior of Perl's regular expressions is to find a match anywhere in the input string.
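A quick way to see the difference the anchors make is to run both patterns over a few file names. The names in @names are made up for this sketch.

```perl
use strict;
use warnings;

# Made-up file names to exercise the two patterns.
my @names = qw( lin.txt lin1.txt feline.txt lin.txt.html other.txt );

my @unanchored = grep { /lin.*\.txt/ } @names;
my @anchored   = grep { /^lin.*\.txt$/ } @names;

print "unanchored: @unanchored\n";
print "anchored:   @anchored\n";
```

The unanchored pattern also accepts feline.txt and lin.txt.html, while the anchored one keeps only lin.txt and lin1.txt.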
Here's a quick (and minimal) rewrite of your code that might help:
use strict;
use warnings;
my $te_dir = "/projects/xxx/";
opendir( my $dirh, $te_dir ) or die "Could not open '$te_dir': $!";
while ( my $file = readdir($dirh) ) {
next unless $file =~ /\.txt$/;
#// Doing some tasks.
if ( $file =~ /^ lin \d* \.txt $/x ) {
my $linfile = "$te_dir/$file";
open( my $fh, '<', $linfile ) or die "Couldn't open file $linfile: $!";
while ( my $line = <$fh> ) {
print $line;
}
close $fh or die "Could not close $linfile: $!";
}
}
First, note that we've put strict and warnings at the top of the code. That will tell you about all sorts of interesting issues, including misspelled variable names.
Next, we've switched to lexical handles (e.g., my $dirh instead of DIR). The "bareword" versions of the handles you're using (DIR and LINFILE) have been discouraged for a long time because they are effectively global constructs, and global data is generally bad: when it gets broken, it's awfully hard to tell what broke it. So we much, much prefer the lexical versions (the handles declared with the my builtin).
Also, this line you had probably doesn't do what you're thinking:
$linfile=$te_dir/$file;
You're trying to smash together a directory and filename with a forward slash, but since you didn't use string interpolation, you're actually using division. Both your directory and filename will, in this numeric context, probably evaluate to zero, giving you a divide by zero error when you're trying to open a file!
However, if you're willing to use a CPAN module, you can make this even easier:
use strict;
use warnings;
use File::Find::Rule;
my $te_dir = "/projects/xxx/";
my @files = File::Find::Rule->file->name('lin*.txt')->in($te_dir);
foreach my $linfile (@files) {
#// Doing some tasks.
open my $fh, '<', $linfile or die "Couldn't open file $linfile: $!";
while ( my $line = <$fh> ) {
print $line;
}
}
No muss, no fuss. Get only the files you want in the first pass and already have the correct file names (note that I didn't close the filehandle because it will close automatically when $fh goes out of scope at the end of the foreach loop.)
To match files starting with lin
if ( $file =~ /^lin.*\.txt$/ )
Try changing your 2nd if condition from this,
if($file ~= 'lin*.txt')
to this,
if($file =~ /lin.*\.txt/)
You could also try: if($file =~ /^lin.*\.txt/), as already pointed out in other answers, but you'll need to make sure that the file names stored in the $file variable contain only the file name and not the entire path as well.
I'm writing a Perl script that gets two command line arguments: a directory and a year. In this directory is a ton of text files or HTML files (depending on the year). Let's say, for instance, it's the year 2010, which contains files that look like this: <number>rank.html, with the number ranging from 2001 to 2212. I want it to open each file individually, take a part of the title in the HTML file and print it to a text file. However, when I run my code it just prints the first file's title to the text file. It seems that it only ever opens the first file, 2001rank.html, and no others. I'll post the code below, and thanks to anyone who helps.
my $directory = shift or "Must supply directory\n";
my $year = shift or "Must supply year\n";
unless (-d $directory) {
die "Error: Directory must be a directory\n";
}
unless ($directory =~ m/\/$/) {
$directory = "$directory/";
}
open COLUMNS, "> columns$year.txt" or die "Can't open columns file";
my $column_name;
for (my $i = 2001; $i <= 2212; $i++) {
if ($year >= 2009) {
my $html_file = $directory.$i."rank.html";
open FILE, $html_file;
#check if opened correctly, if not, skip it
unless (defined fileno(FILE)) {
print "skipping $html_file\n";
next;
}
$/ = "\n";
my $line = <FILE>;
if (defined $line) {
$column_name = "";
$_ = <FILE> until m{</title>};
$_ =~ m{<title>CIA - The World Factbook -- Country Comparison :: (.+)</title>}i;
$column_name = $1;
}
else {
close FILE;
next;
}
close FILE;
}
else {
my $text_file = $directory.$i."rank.txt";
open FILE, $text_file;
unless (defined fileno(FILE)) {
print "skipping $text_file\n";
next;
}
$/ = "\r";
my $line = <FILE>;
if (defined $line) {
$column_name = "";
$_ = <FILE> until /Rank/i;
$_ =~ /Rank(\s+)Country(\s+)(.+)(\s+)Date/i;
$column_name = $3;
}
else {
close FILE;
next;
}
close FILE;
}
print "Adding $column_name to text file\n";
print COLUMNS "$column_name\n";
}
close COLUMNS;
In other words $column_name gets set equal to the same thing every pass in the loop, even though I know the html files are different.
You'll probably be able to debug this a lot faster if you switch to lexical filehandles instead of globals, and turn on strict checking:
use strict;
use warnings;
while (...)
{
# ...
open my $filehandle, $html_file;
# ...
my $line = <$filehandle>;
}
This way, the filehandle(s) will go out of scope during each loop iteration, so you can more clearly see what exactly is being referenced and where. (Hint: you may have missed a condition where the filehandle gets closed, so it is improperly reused the next time around.)
For more on best practices with open and filehandles, see:
Why is three-argument open calls with autovivified filehandles a Perl best practice?
What's the best way to open and read a file in Perl?
Some other points:
Don't ever explicitly assign to $_, that's asking for trouble. Declare your own variable to hold your data: my $line = <$filehandle> (as in the example above).
Pull out your matches directly into variables, rather than using $1, $2 etc, and only use parentheses for the portions you actually need: my ($column_name) = ($line =~ m/Rank\s+Country\s+(.+)\s+Date/i);
put the error conditions first, so the bulk of your code can be outdented one (or more) level(s). This will improve readability, as when the bulk of your algorithm is visible on the screen at once, you can better visualize what it is doing and catch errors.
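Here is a small sketch of the list-context capture from the second point. The header text in $line is made up to fit the pattern, and the non-greedy (.+?) is a choice made for this example so the capture stops before the trailing whitespace.

```perl
use strict;
use warnings;

# A made-up header line in the shape the pattern expects.
my $line = "Rank   Country   GDP (purchasing power parity)   Date of Information";

# In list context a match returns its captures, so $1 is not needed.
my ($column_name) = $line =~ /Rank\s+Country\s+(.+?)\s+Date/i;

if ( defined $column_name ) {
    print "Column: $column_name\n";
}
else {
    warn "No column name found\n";
}
```

When the pattern fails, $column_name is simply undef, so a stale $1 from an earlier match cannot leak into the result.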
If you apply the points above I'm pretty sure that you'll spot your error. I spotted it while making this last edit, but I think you'll learn more if you discover it yourself. (I'm not trying to be snooty; trust me on this!)
Your processing is similar for HTML and text files, so make your life easy and factor out the common part:
sub scrape {
my($path,$pattern,$sep) = @_;
unless (open FILE, $path) {
warn "$0: skipping $path: $!\n";
return;
}
local $/ = $sep;
my $column_name;
while (<FILE>) {
next unless /$pattern/;
$column_name = $1;
last;
}
close FILE;
($path,$column_name);
}
Then make it specific for the two types of input:
sub scrape_html {
my($directory,$i) = @_;
scrape $directory.$i."rank.html",
qr{<title>CIA - The World Factbook -- Country Comparison :: (.+)</title>}i,
"\n";
}
sub scrape_txt {
my($directory,$i) = @_;
scrape $directory.$i."rank.txt",
qr/Rank\s+Country\s+(.+)\s+Date/i,
"\r";
}
Then your main program is straightforward:
my $directory = shift or die "$0: must supply directory\n";
my $year = shift or die "$0: must supply year\n";
die "$0: $directory is not a directory\n"
unless -d $directory;
# add trailing slash if necessary
$directory =~ s{([^/])$}{$1/};
my $columns_file = "columns$year.txt";
open COLUMNS, ">", $columns_file
or die "$0: open $columns_file: $!";
for (my $i = 2001; $i <= 2212; $i++) {
my $process = $year >= 2009 ? \&scrape_html : \&scrape_txt;
my($path,$column_name) = $process->($directory,$i);
next unless defined $path;
if (defined $column_name) {
print "$0: Adding $column_name to text file\n";
print COLUMNS "$column_name\n";
}
else {
warn "$0: no column name in $path\n";
}
}
close COLUMNS or warn "$0: close $columns_file: $!\n";
Note how careful you have to be to close global filehandles. Please use lexical filehandles as in
open my $fh, $path or die "$0: open $path: $!";
Passing $fh as a parameter or stuffing it in hashes is much nicer. Also, lexical filehandles close automatically when they go out of scope. There's no chance of stomping on a handle someone else is already using.
Have you considered grep?
grep out just the line from the HTML containing the title, and then process the output of grep.
Simpler, as you would not have to write any file-handling code. You didn't say what you want with that title - if you only need a list, you might not need to write any code at all.
Try something like:
grep -ri title <directoryname>
This is a beginner-best-practice question in perl. I'm new to this language. The question is:
If I want to process the output lines from a program, how can I format THE FIRST LINE in a special way?
I think of two possibilities:
1) A flag variable, once the loop is executed first time is set. But it will be evaluated for each cycle. BAD solution
2) An index-based loop (like a "for"). Then I would start the loop in i=1. This solution is far better. The problem is HOW CAN I DO IT?
I just found the code for looping over with the while ( <> ) construct.
Here you can see better:
$command_string = "par-format 70j p0 s0 < " . $ARGV[0] . "|\n";
open DATA, $command_string or die "Couldn't execute program: $!";
print "\t <div>          |-- <strong>Description</strong></div>\n";
while ( defined( my $line = <DATA> ) ) {
chomp($line);
# print "$line\n";
print "\t <div>          |   -- " . $line . "</div>\n";
}
close DATA;
Please also don't hesitate in correcting any code in here, this is my first perl poem.
Thanks!
You can always use $. or the English name $INPUT_LINE_NUMBER to control the logic in your loop with:
while (my $line = <>) {
if ($. == 1) {
# do cool stuff here
}
# do normal stuff here
}
To handle the first line differently, you could just put
$line = <DATA>;
above your loop.
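With proper checking for read problems (empty file, etc.) this should be
if (defined(my $line = <DATA>)) {
# ... do special things ...
}
while (my $line = <DATA>) {
# ... do regular things ...
}
You don't need the defined() call in the while condition, because Perl implicitly tests a readline assignment in a while condition for definedness. A plain if gets no such treatment, so the explicit defined() there keeps a line consisting of just 0 with no trailing newline from being treated as false.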
From a 'best practices' perspective there is much wrong with that code sample:
open DATA, $command_string or die "Couldn't execute program: $!";
Security hole, please exploit me.
DATA is a magical value that points to a __DATA__ section at the end of the current file.
You should use
open my $fh
Which uses a lexical variable for a file handle instead of a global.
You should use 3 arg open, ie:
open my $fh, '<' , $filename
open my $fh, '-|' , $command
open my $fh, '-|' , $command, @args
Sadly, I have yet to work out how 3-arg open works with dual pipes. There's this IPC::Open2 thing, but I haven't worked out how to use that effectively yet. Suggestions welcome.
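For what it's worth, here is a minimal sketch of a two-way pipe with IPC::Open2. To stay self-contained it runs the current perl ($^X) as the child, as a stand-in for a real filter such as par-format. The handle names are my own, and I have only reasoned about this on Unix-like systems.

```perl
use strict;
use warnings;
use IPC::Open2;

# $^X is the path of the running perl; it serves here as a portable
# stand-in for an external filter program.
my $pid = open2( my $from_child, my $to_child,
    $^X, '-pe', '$_ = uc' );

print {$to_child} "hello\n";
close $to_child;              # send EOF so the child can finish

my @result = <$from_child>;   # read everything the child wrote
close $from_child;
waitpid $pid, 0;              # reap the child

print @result;
```

Writing all the input before reading any output is fine for small amounts of data, but with large inputs both sides can block on full pipe buffers, so a bigger job may need a different approach.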