How to check for a string using Perl given - perl

I'm trying to replace a particular line in a file. I can get my program to run, but it doesn't actually do the replacing that I want it to.
Here is my sample file:
test line 1
test line 2
line to be overwritten
test line 3
Here is the code that I have:
my $origFile = $file_path . "junk\.file";
my $newFile = $file_path . "junk\.file\.backup";
# system command to make a backup of the file
system "mv $origFile $newFile";
#opens the files
open( my $INFILE, $newFile ) || die "Unable to read $newFile\n";
open( my $OUTFILE, '>' . $origFile ) || die "Unable to create $origFile\n";
# While loop to read in the file line by line
while ( <$INFILE> ) {
given ($_) {
when ("line to be overwritten") {
print $OUTFILE "line has been overwritten\n";
}
default {
print $OUTFILE $_;
}
}
}
close($INFILE);
close($OUTFILE);
I've tried to change the when statements several different ways to no avail:
when ($_ eq "line to be overwritten")
when ($_ == "line to be overwritten")
when ($_ cmp "line to be overwritten")
But those only generate errors. Anyone know what I'm doing wrong here?

As highlighted in a comment on the original question, given/when is an experimental feature of Perl. I would personally recommend using if/else in a loop, and then using either string equality or a regex to match the line(s) you want to replace. A quick example:
use strict;
use warnings;
while(my $line = <DATA>) {
if ( $line =~ /line to be overwritten/ ) {
print "Overwritten\n";
} else {
print $line;
}
}
__DATA__
test line 1
test line 2
line to be overwritten
test line 3
This will give the output:
test line 1
test line 2
Overwritten
test line 3
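You could also use string equality if you aren't confident in your regex, or if the string is guaranteed to be the same. Note that each line read from the file still ends with a newline, so chomp it (or include "\n" in the comparison) first: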
...
if ($line eq 'line to be overwritten') {
...
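The trailing newline is most likely why the original when ("line to be overwritten") never matched: the line read from the file is "line to be overwritten\n", which is not equal to the bare string. A minimal sketch of the string-equality variant, assuming a hypothetical helper named rewrite_line (not part of the original code):

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text to write for one input line.
sub rewrite_line {
    my ($line) = @_;
    chomp( my $text = $line );    # compare without the trailing newline
    return $text eq 'line to be overwritten'
        ? "line has been overwritten\n"
        : $line;                  # pass every other line through unchanged
}

print rewrite_line($_)
    for "test line 1\n", "line to be overwritten\n", "test line 3\n";
```

In the real program the print would go to $OUTFILE inside the while loop.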
Sidenotes
open
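On your initial open, it is recommended to use the 3-argument version of open, which keeps the mode separate from the file name and so avoids surprises when a name starts or ends with characters such as > or |: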
open(my $INFILE, '<', $newFile) || die "Unable to read $newFile\n";
(for more info on this, see here: http://modernperlbooks.com/mt/2010/04/three-arg-open-migrating-to-modern-perl.html)
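As a rough, self-contained illustration of the 3-argument form, the sketch below writes a scratch file with File::Temp (a core module) and reads it back. The variable names are made up for the example, and including $! in the message is a suggestion rather than something the original code did:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file so the example does not depend on existing files.
my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "test line 1\n";
close $tmp_fh or die "Unable to close $tmp_name: $!\n";

# Three-argument open: the mode ('<') is separate from the file name,
# and $! says why the open failed if it does.
open( my $in, '<', $tmp_name ) or die "Unable to read $tmp_name: $!\n";
my $first = <$in>;
close $in;

print $first;
```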
strict & warnings
Also, it is recommended to use strict and warnings in your code file, as seen in my example above - this will save you from accidental mistakes like trying to use a variable which has not been declared, and syntax errors which may give you head-scratching results!
Experimental Features
Experimental features in Perl come with no guarantee of backwards compatibility when a new release of Perl comes out. Code that uses them will keep working as long as you run the same version of Perl everywhere, but it may break when you upgrade to another major version.
Answered here as I don't have the reputation to answer in the comments...

You seem to be making it way more complicated than it needs to be - a simple regex to check each line and act accordingly should do the job.
while(<$INFILE>)
{
chomp($_);
if ( /^line to be overwritten$/ )
{
print $OUTFILE "line has been overwritten\n";
}
else
{
print $OUTFILE "$_\n";
}
}

One way to do it is to use the Tie::File module. It lets you replace data right in the file. You can make the backup the same way you are currently doing, before changing the original file.
use strict;
use warnings;
use Tie::File;
my $file = 'test.txt';
tie my @textFile, 'Tie::File', $file, recsep => "\n" or die $!;
s/line to be overwritten/line has been overwritten/ for @textFile;
untie @textFile;

Related

problem to read line after the first string match

Not sure why this script isn't working. Can anyone suggest a fix in order to get the expected output?
Perl script:
open(INFILE,"test_input.txt")||die "can't open the file";
while(<INFILE>){
$_=~s/^\s+//;
$_=~s/\s+$//;
if ($_ =~ /work/){
<INFILE> or die "Bad file format";
my $model = INFILE;
print "line below search: $model\n";
if ($model =~/^good/){
print "found good word\n";
}
else{
print "no good word\n";
}
}
}
input file:
employment
work hard
good people
Expected Output:
line below search: good people
found good word
Actual Output:
line below search:
no good word
You were very close. To get your program running as you expected, I only had to make a couple of changes.
You have this code:
<INFILE> or die "Bad file format";
my $model = INFILE;
I think you're trying to ensure that there is actually another line after the one you've matched. But in the first of those lines, you just read the next line and throw the data away. And in the second of those lines, you set $model to the string "INFILE". Perhaps you meant my $model = <INFILE> - but even that doesn't give you what you want, as you've already read (and discarded) the record you want in the previous line of code.
So I think you want to replace those two lines with a single line.
my $model = <INFILE> or die "Bad file format";
If you're interested, here's how I'd write this. Notice, I've got rid of the hard-coded file name. Instead, I read from the generic filehandle, <>. This is automatically connected to the file whose name is passed to the program on the command line. This has the advantage that the code is a) easier to write (as you don't need to explicitly open a file) and b) more flexible (as you can process a file with any name). I've also removed a few unnecessary uses of $_ =~.
#!/usr/bin/perl
# Always use these
use strict;
use warnings;
use feature 'say'; # for 'say()'
while (<>) {
s/^\s+//; # No need for $_ =~
s/\s+$//;
if (/work/) {
my $model = <> or die "Bad file format";
if ($model =~ /^good/) {
say 'found good word';
} else {
say 'no good word';
}
}
}
Try this.
Read the file into an array and process it line by line. You can then check the current and next line.
my $file = 'test_input.txt';
open my $fh, "<", $file or die "can't open the file";
chomp(my @data = <$fh>);
close $fh;
for my $index (0..$#data) {
if($data[$index] =~ /work/){
my $nextIndex = $index + 1;
die "Bad file format" unless(defined $data[$nextIndex]);
print "line below search: $data[$nextIndex]\n";
if ($data[$nextIndex] =~/^good/){
print "found good word\n";
}
else{
print "no good word\n";
}
}
}

Match file names in if condition of perl

I have files with names such as lin.txt and lin1.txt along with other .txt files. I need to find only these files and print their contents one by one. I have the code below, but it's somehow not matching the files starting with lin*. What is the issue?
$te_dir= "/projects/xxx/";
opendir (DIR, $te_dir) or die $!;
while (my $file = readdir(DIR))
{
if ($file=~/\.txt/)
{
#// Doing some tasks.
if($file ~= 'lin*.txt')
{
$linfile=$te_dir/$file;
open(LINFILE, $linfile) or die "Couldn't open file $file:$!";
while(my $line = <LINFILE>)
{
print $line;
}
close LINFILE;
}
}
}
You are mixing globs (shell wildcards) with regular expressions. These are two different formalisms with different syntax and semantics. In regular expressions (which is what Perl matching uses), n* matches zero or more occurrences of the character n. You probably mean
if ($file =~ /lin.*\.txt/)
Notice also the syntax error in the operator. You correctly have =~ in the first conditional, but you misspelled it as ~= where you do this comparison. (Maybe it's just a transcription error; for me, this creates a clear syntax error, so the script would not run in the first place.)
As noted in @brianadams' answer, the proper regular expression for this is
if ($file =~ /^lin.*\.txt$/)
with beginning of line ^ and end of line $ anchors to prevent e.g. feline.txt.html from matching. The default behavior of Perl's regular expressions is to find a match anywhere in the input string.
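To make the effect of the anchors concrete, here is a small sketch that runs the anchored pattern against a few sample names. The helper name is_lin_file and the sample file names are assumptions made up for the illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: true only for names that start with "lin"
# and end with ".txt".
sub is_lin_file {
    my ($name) = @_;
    return $name =~ /^lin.*\.txt$/ ? 1 : 0;
}

for my $name (qw(lin.txt lin1.txt feline.txt.html other.txt)) {
    printf "%-16s %s\n", $name, is_lin_file($name) ? 'match' : 'no match';
}
```

Without the anchors, feline.txt.html would match as well, because the unanchored pattern is free to match in the middle of the name.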
Here's a quick (and minimal) rewrite of your code that might help:
use strict;
use warnings;
my $te_dir = "/projects/xxx/";
opendir( my $dirh, $te_dir ) or die "Could not open '$te_dir': $!";
while ( my $file = readdir($dirh) ) {
next unless $file =~ /\.txt$/;
#// Doing some tasks.
if ( $file =~ /^ lin \d* \.txt $/x ) {
my $linfile = "$te_dir/$file";
open( my $fh, '<', $linfile ) or die "Couldn't open file $linfile: $!";
while ( my $line = <$fh> ) {
print $line;
}
close $fh or die "Could not close $linfile: $!";
}
}
First, note that we've put strict and warnings at the top of the code. That will tell you about all sorts of interesting issues, including misspelled variable names.
Next, we've switched to lexical handles (e.g., my $dirh instead of DIR). The "bareword" handles you're using (DIR and LINFILE) have been discouraged for a long time because they are effectively global constructs. Global data is generally bad because when it gets broken, it's awfully hard to tell what broke it, so we much, much prefer the lexical versions (the handles declared with the my builtin).
Also, this line you had probably doesn't do what you're thinking:
$linfile=$te_dir/$file;
You're trying to smash together a directory and filename with a forward slash, but since you didn't use string interpolation, you're actually using division. Both your directory and filename will, in this numeric context, probably evaluate to zero, giving you a divide by zero error when you're trying to open a file!
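A short sketch of two ways to build the path correctly: plain string interpolation, or File::Spec->catfile from the core File::Spec module, which inserts the separator for the current platform. The directory and file name below are placeholder values:

```perl
use strict;
use warnings;
use File::Spec;

my $te_dir = '/projects/xxx';    # placeholder directory
my $file   = 'lin1.txt';         # placeholder file name

# String interpolation: the slash is part of the string, not an operator.
my $interpolated = "$te_dir/$file";

# File::Spec picks the right separator for the platform it runs on.
my $joined = File::Spec->catfile( $te_dir, $file );

print "$interpolated\n$joined\n";
```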
However, if you're willing to use a CPAN module, you can make this even easier:
use strict;
use warnings;
use File::Find::Rule;
my $te_dir = "/projects/xxx/";
my @files = File::Find::Rule->file->name('lin*.txt')->in($te_dir);
foreach my $linfile (@files) {
#// Doing some tasks.
open my $fh, '<', $linfile or die "Couldn't open file $linfile: $!";
while ( my $line = <$fh> ) {
print $line;
}
}
No muss, no fuss. Get only the files you want in the first pass and already have the correct file names (note that I didn't close the filehandle because it will close automatically when $fh goes out of scope at the end of the foreach loop.)
To match files starting with lin
if ( $file =~ /^lin.*\.txt$/ )
Try changing your 2nd if condition from this,
if($file ~= 'lin*.txt')
to this,
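if($file =~ /lin.*\.txt/)
Note the .* rather than a bare *: in a regex, n* means "zero or more n", so /lin*\.txt/ would match lin.txt but not lin1.txt. You could also try: if($file =~ /^lin.*\.txt/) , as already pointed out in other answers, but you'll need to make sure that the file names stored in the $file variable contain only the file name and not the entire path as well.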

Reading file line by line iteration issue

I have the following simple piece of code (identified as the problem piece of code and extracted from a much larger program).
Is it me or can you see an obvious error in this code that is stopping it from matching against $variable and printing FOUND when it definitely should be doing so?
Nothing is printed when I try to print $variable, and there are definitely matching lines in the file I am using.
The code:
if (defined $var) {
open (MESSAGES, "<$messages") or die $!;
my $theText = $mech->content( format => 'text' );
print "$theText\n";
foreach my $variable (<MESSAGES>) {
chomp ($variable);
print "$variable\n";
if ($theText =~ m/$variable/) {
print "FOUND\n";
}
}
}
I have located this as the point at which the error is occurring but cannot understand why?
There may be something I am totally overlooking as it's very late?
Update: I have since realised that I misread your question and this probably doesn't solve the problem. However, the points are valid so I am leaving them here.
You probably have regular expression metacharacters in $variable. The line
if ($theText =~ m/$variable/) { ... }
should be
if ($theText =~ m/\Q$variable/) { ... }
to escape any that there are.
But are you sure you don't just want eq?
In addition, you should read from the file using
while (my $variable = <MESSAGES>) { ... }
as a for loop will unnecessarily read the entire file into memory. And please use a better name than $variable.
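To illustrate why \Q matters, the sketch below searches for a string containing a + sign, which is a regex quantifier unless it is escaped. The sample text and search string are invented for the example. When no regex features are needed at all, the core index function is a simpler way to test for a literal substring:

```perl
use strict;
use warnings;

my $text   = 'x = a+b';
my $needle = 'a+b';    # "+" is a regex metacharacter

# Unquoted: the pattern means "one or more a, then b", which is not in $text.
my $raw = $text =~ /$needle/ ? 1 : 0;

# Quoted: \Q...\E escapes the "+", so the literal string is searched for.
my $quoted = $text =~ /\Q$needle\E/ ? 1 : 0;

# index() does a plain substring search with no regex involved.
my $literal = index( $text, $needle ) >= 0 ? 1 : 0;

print "raw=$raw quoted=$quoted literal=$literal\n";
```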
This works for me.. Am I missing the question at hand? You're just trying to match "$theText" to anything on each line in the file right?
#!/usr/bin/perl
use warnings;
use strict;
my $filename = $ARGV[0] or die "$0 filename\n";
open my $fh, "<", $filename or die "Can't open $filename: $!\n";
my $match_text = "whatever";
my $matched = '';
# I would use a while loop, out of habit here
#while(my $line = <$fh>) {
foreach my $line (<$fh>) {
$matched =
$line =~ m/$match_text/ ? "Matched" : "Not matched";
print $matched . ": " . $line;
}
close $fh;
./test.pl testfile
Not matched: this is some textfile
Matched: with a bunch of lines or whatever and
Not matched: whatnot....
Edit: Ah, I see.. Why don't you try printing before and after the "chomp()" and see what you get? That shouldn't be the issue, but it doesn't hurt to test each case..

Parsing the large files in Perl

I need to compare a big file (2 GB, 22 million lines) with another file. It takes too long to process using Tie::File, so I switched to a plain 'while' loop, but the problem remains. See my code below...
use strict;
use Tie::File;
# use warnings;
my @arr;
# tie @arr, 'Tie::File', 'title_Nov19.txt';
# open(IT,"<title_Nov19.txt");
# my @arr=<IT>;
# close(IT);
open(RE,">>res.txt");
open(IN,"<input.txt");
while(my $data=<IN>){
chomp($data);
print"$data\n";
my $occ=0;
open(IT,"<title_Nov19.txt");
while(my $line2=<IT>){
my $line=$line2;
chomp($line);
if($line=~m/\b$data\b/is){
$occ++;
}
}
print RE"$data\t$occ\n";
}
close(IT);
close(IN);
close(RE);
so help me to reduce it...
Lots of things wrong with this.
Aside from the usual (use warnings commented out, use of 2-argument open(), not checking the open() result, use of global filehandles), the specific problem in your case is that you are opening/reading/closing the second file once for every single line of the first. This is going to be very slow.
I suggest you open the file title_Nov19.txt once, read all the lines into an array or hash or something, then close it; and then you can open the first file, input.txt and walk along that once, comparing to things in the array so you don't have to reopen that second file all the time.
Further, I suggest you read some basic articles on style etc., as your question is likely to gain more attention if it's actually written to vaguely modern standards.
I tried to build a small example script with a better structure but I have to say, man, your problem description is really very unclear. It's important to not read the whole comparison file each time, as @LeoNerd explained in his answer. Then I use a hash to keep track of the match count:
#!/usr/bin/env perl
use strict;
use warnings;
# cache all lines of the comparison file
open my $comp_file, '<', 'input.txt' or die "input.txt: $!\n";
chomp (my @comparison = <$comp_file>);
close $comp_file;
# prepare comparison
open my $input, '<', 'title_Nov19.txt' or die "title_Nov19.txt: $!\n";
my %count = ();
# compare each line
while (my $title = <$input>) {
chomp $title;
# iterate comparison strings
foreach my $comp (@comparison) {
$count{$comp}++ if $title =~ /\b$comp\b/i;
}
}
# done
close $input;
# output (in the order of input.txt; words that never matched get 0)
open my $output, '>>', 'res.txt' or die "res.txt: $!\n";
foreach my $comp (@comparison) {
print $output "$comp\t", $count{$comp} // 0, "\n";
}
close $output;
Just to get you started... If someone wants to further work on this: these were my test files:
title_Nov19.txt
This is the foo title
Wow, we have bar too
Nothing special here but foo
OMG, the last title! And Foo again!
input.txt
foo
bar
And the result of the program was written to res.txt:
foo 3
bar 1
Here's another option using memowe's (thank you) data:
use strict;
use warnings;
use File::Slurp qw/read_file write_file/;
my %count;
my $regex = join '|', map { chomp; $_ = "\Q$_\E" } read_file 'input.txt';
for ( read_file 'title_Nov19.txt' ) {
my %seen;
!$seen{ lc $1 }++ and $count{ lc $1 }++ while /\b($regex)\b/ig;
}
write_file 'res.txt', map "$_\t$count{$_}\n",
sort { $count{$b} <=> $count{$a} } keys %count;
Numerically-sorted output to res.txt:
foo 3
bar 1
An alternation regex which quotes metacharacters (\Q$_\E) is built and used, so only one pass against the large file's lines is needed. The hash %seen is used to ensure that the input words are only counted once per line.
Hope this helps!
Try this (note that -c prints a single total of the lines matching any of the words, not a count per word):
grep -i -c -w -f input.txt title_Nov19.txt > res.txt

How can I find the strings from one file in another file in Perl?

The script below takes function names in a text file and scans on a
folder that contains multiple c,h files. It opens those files one-by-one and
reads each line. If the match is found in any part of the files, it prints the
line number and the line that contains the match.
Everything is working fine except that the comparison is not working properly. I would be very grateful to whoever solves my problem.
#program starts:
use FileHandle;
print "ENTER THE PATH OF THE FILE THAT CONTAINS THE FUNCTIONS THAT YOU WANT TO
SEARCH: ";#getting the input file
our $input_path = <STDIN>;
$input_path =~ s/\s+$//;
open(FILE_R1,'<',"$input_path") || die "File open failed!";
print "ENTER THE PATH OF THE FUNCTION MODEL: ";#getting the folder path that
#contains multiple .c,.h files
our $model_path = <STDIN>;
$model_path =~ s/\s+$//;
our $last_dir = uc(substr ( $model_path,rindex( $model_path, "\\" ) +1 ));
our $output = $last_dir."_FUNC_file_names";
while(our $func_name_input = <FILE_R1> )#$func_name_input is the function name
#that is taken as the input
{
$func_name_input=reverse($func_name_input);
$func_name_input=substr($func_name_input,rindex($func_name_input,"\("+1);
$func_name_input=reverse($func_name_input);
$func_name_input=substr($func_name_input,index($func_name_input," ")+1);
#above 4 lines are func_name_input is choped and only part of the function
#name is taken.
opendir FUNC_MODEL,$model_path;
while (our $file = readdir(FUNC_MODEL))
{
next if($file !~ m/\.(c|h)/i);
find_func($file);
}
close(FUNC_MODEL);
}
sub find_func()
{
my $fh1 = FileHandle->new("$model_path//$file") or die "ERROR: $!";
while (!$fh1->eof())
{
my $func_name = $fh1->getline(); #getting the line
if($func_name =~$func_name_input) #problem here: it does not take the match
{
next if($func_name=~m/^\s+/);
print "$.,$func_name\n";
}
}
}
$func_name_input=substr($func_name_input,rindex($func_name_input,"\("+1);
You're missing an ending parenthesis. Should be:
$func_name_input=substr($func_name_input,rindex($func_name_input,"\(")+1);
There's probably an easier way than those four statements, too. But it's a little early to wrap my head around it all. Do you want to match "foo" in "function foo() {"? If so, you could use a regex like /\s+([^( ]+)\(/, which captures the word immediately before the opening parenthesis.
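As a rough sketch of the regex approach, the hypothetical helper below (function_name is a made-up name, and the pattern is one possible choice rather than the only one) returns the identifier that sits directly before the first opening parenthesis:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the identifier directly before the
# first "(" in a declaration, or undef if there is none.
sub function_name {
    my ($declaration) = @_;
    return $declaration =~ /(\w+)\s*\(/ ? $1 : undef;
}

print function_name('void foo(int a);'), "\n";    # prints "foo"
```

This replaces the reverse/rindex/substr sequence with a single match, at the cost of assuming the name consists only of word characters.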
When you say $func_name =~$func_name_input, you're treating all characters in $func_name_input as special regex characters. If this is not what you mean to do, you can use quotemeta (perldoc -f quotemeta): $func_name =~quotemeta($func_name_input) or $func_name =~ qr/\Q$func_name_input\E/.
Debugging will be easier with strictures (and a syntax-highlighting editor). Also note that, if you're not using those variables in other files, "our" doesn't do anything "my" wouldn't do for file-scoped variables.
find + xargs + grep does 90% of what you want.
find . -name '*.[ch]' | xargs grep -n your_pattern
ack does it even easier.
ack --type=cc your_pattern
Simply take your list of patterns from your file and "or" them together.
ack --type=cc 'foo|bar|baz'
This has the benefit of searching the files only once, and not once for each pattern being searched for as you're doing.
I still think you should just use ack, but your code needed some serious love.
Here is an improved version of your program. It now takes the directory to search and patterns on the command line rather than having to ask for (and the user write) files. It searches all the files under the directory, not just the ones in the directory, using File::Find. It does this in one pass by concatenating all the patterns into regular expressions. It uses regexes instead of index() and substr() and reverse() and oh god. It simply uses built in filehandles rather than the FileHandle module and checking for eof(). Everything is declared lexical (my) instead of global (our). Strict and warnings are on for easier debugging.
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
die "Usage: search_directory function ...\n" unless @ARGV >= 2;
my $Search_Dir = shift;
my $Pattern = build_pattern(@ARGV);
find(
{
wanted => sub {
return unless $File::Find::name =~ m/\.(c|h)$/i;
find_func($File::Find::name, $Pattern);
},
no_chdir => 1,
},
$Search_Dir
);
# Join all the function names into one pattern
sub build_pattern {
my @patterns;
for my $name (@_) {
# Turn foo() into foo. This replaces all that reverse() and rindex()
# and substr() stuff.
$name =~ s{\(.*}{};
# Use \Q to protect against regex metacharacters in the input
push @patterns, qr{\Q$name\E};
}
# Join them up into one pattern.
return join "|", @patterns;
}
sub find_func {
my( $file, $pattern ) = @_;
open(my $fh, "<", $file) or die "Can't open $file: $!";
while (my $line = <$fh>) {
# XXX not all functions are unindented, but your choice
next if $line =~ m/^\s+/;
print "$file:$.: $line" if $line =~ $pattern;
}
}