Scope of $_ : why does this change it? - perl

I have a code snippet like the following:
use strict;
use warnings;
# file names to search for
open(my $files, "<", "fileList.txt") or die "Can't open fileList.txt: $!";
my $flag = 0;
while (<$files>) {
print "File loop: $_\n";
open(my $search, "<", "searchMe.txt") or die "Can't open searchMe.txt: $!";
$flag = 0;
while (<$search>){
print "Search loop: $_\n";
}
}
fileList.txt contains one line: "CheckFilesFunctions.pm"
searchMe.txt contains one line: abc
The output here is
File loop: CheckFilesFunctions.pm
Search loop: abc
However, when I change the search loop to the following:
while (<$search> && !$flag){
Suddenly the search loop starts printing
Search loop: CheckFilesFunctions.pm
Why does the scope of $_ change here?

while (<filehandle>) is convenient shorthand for while (defined( $_ = <filehandle> )); if you have a more complicated expression to test, you need to explicitly include the full thing:
while ( defined( $_ = <$search> ) && ! $flag ) {
though I would suggest explicitly using readline (<> can mean either readline or glob, depending on the argument; I prefer to use those directly) and using a lexical variable:
while ( defined( my $line = readline $search ) && ! $flag ) {
Alternatively, you could break out of the loop instead of modifying the condition:
while (<$search>) {
...
if (...) {
last;
}
}
Though looking at your code, you probably want to be reading the search file just once into an array before the file loop, and just looping over that array.
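The last suggestion, reading the search file once and looping over an array, could be sketched roughly as follows. The helper names read_lines and first_match are made up for this illustration, the substring test is only an assumed stand-in for whatever the real search condition is, and in-memory strings stand in for fileList.txt and searchMe.txt so the snippet runs on its own:

```perl
use strict;
use warnings;

# Hypothetical helper: read every line of a file into a list, newlines removed.
# $source may be a file name or a reference to a string (an in-memory file).
sub read_lines {
    my ($source) = @_;
    open( my $fh, '<', $source ) or die "Can't open $source: $!";
    chomp( my @lines = <$fh> );
    close $fh;
    return @lines;
}

# Hypothetical helper: return the first search line that contains $name,
# or undef if no line does. Returning early replaces the $flag variable.
sub first_match {
    my ( $name, @search_lines ) = @_;
    for my $line (@search_lines) {
        return $line if index( $line, $name ) >= 0;
    }
    return;
}

my $search_text  = "abc\nsee CheckFilesFunctions.pm\n";    # stands in for searchMe.txt
my @search_lines = read_lines( \$search_text );            # read once, before the loop

for my $name ( 'CheckFilesFunctions.pm', 'Missing.pm' ) {  # stands in for fileList.txt
    my $hit = first_match( $name, @search_lines );
    print "File loop: $name => ", ( defined $hit ? $hit : 'no match' ), "\n";
}
```

Because every loop here uses a named lexical variable, nothing depends on what $_ currently holds.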

Related

Perl - substring keywords

I have a text file with a lot of lines. I need to search this file for keywords and, when one is found, write to a log file the line containing the keyword together with the line above it and the line below it. At the moment the keyword search does not work properly: once a keyword is found, everything is written, and I don't know how to write the lines above and below. Thanks for any advice.
my $vstup = "C:/Users/Omega/Documents/Kontroly/testkontroly/kontroly20220513_154743.txt";
my $log = "C:/Users/Omega/Documents/Kontroly/testkontroly/kontroly.log";
open( my $default_fh, "<", $vstup ) or die $!;
open( my $main_fh, ">", $log ) or die $!;
my $var = 0;
while ( <$default_fh> ) {
if (/\Volat\b/) {
$var = 1;
}
if ( $var ) {
print $main_fh $_;
}
}
close $default_fh;
close $main_fh;
The approach below uses one semaphore variable and one buffer variable to get the desired behavior.
Notice that the pattern was replaced by 'A' to keep testing simple.
#!/usr/bin/perl
use strict;
use warnings;
my ($in_fh, $out_fh);
my ($in, $out);
$in = 'input.txt';
$out = 'output.txt';
open($in_fh, "< ", $in) || die $!."\n";
open($out_fh, "> ", $out) || die $!;
my $p_next = 0;
my $p_line;
while (my $line = <$in_fh>) {
# print line after occurrence
print $out_fh $line if ($p_next);
if ($line =~ /A/) {
if (defined($p_line)) {
# print previous line
print $out_fh $p_line;
# once printed undefine variable to avoid printing it again in the next loop
undef($p_line);
}
# Print current line if not already printed as the line follows a pattern
print $out_fh $line if (!$p_next);
# toggle semaphore to print the next line
$p_next = 1;
} else {
# pattern not found.
# if pattern was not detected in both current and previous line.
$p_line = $line if (!$p_next);
$p_next = 0;
}
}
close($in_fh);
close($out_fh);

Open two text files, process them and write to separate files

I'm using Perl to open two text files, process them and then write the output to another file.
I have a file INPUT where every line is a customer. I will process each line into variables that will be used to substitute text in another file, TEMP. The result should be written into individual files for each customer, OUTPUT.
My program seems to be working on only the first file. The rest of the files remain empty with no output.
#!/usr/bin/perl -w
if ( $#ARGV < 0) {
print "Usage: proj5.pl <mm/dd/yyyy>\n";
exit;
}
my $date = $ARGV[0];
open(INFO, "p5Customer.txt") or die("Could not open p5Customer.txt file\n");
open(TEMP, "template.txt") or die("Could not open template.txt file\n");
my $directory = "Emails";
mkdir $directory unless(-e $directory);
foreach $info (<INFO>){
($email, $fullname, $title, $payed, $owed) = split /,/, $info;
next if($owed < $payed);
chomp($owed);
$filepath = "$directory/$email";
unless(open OUTPUT, '>>'.$filepath){
die "Unable to create '$filepath'\n";
}
foreach $detail (<TEMP>){
$detail =~ s/EMAIL/$email/g;
$detail =~ s/(NAME|FULLNAME)/$fullname/g;
$detail =~ s/TITLE/$title/g;
$detail =~ s/AMOUNT/$owed/g;
$detail =~ s{DATE}{$date}g;
print OUTPUT $detail;
}
close(OUTPUT);
}
close(INFO);
close(TEMP);
As has been said, you need to open your template file again each time you read from it. There are a number of other issues with your code too:
Always use strict and use warnings 'all' and declare every variable with my as close as possible to where it is first used
$#ARGV is the index of the last element of @ARGV, so $#ARGV < 0 is much better written as @ARGV < 1
You should use lexical file handles, and the three-parameter form of open, so open(INFO, "p5Customer.txt") should be open my $info_fh, '<', "p5Customer.txt"
You should use while instead of for to read from a file
It is easier to use the default variable $_ for short loops
It is pointless to capture a substring in a regular expression if you're not going to use it, so (NAME|FULLNAME) should be NAME|FULLNAME
There is no point in closing input files before the end of your program
It is also much better to use an existing template system, such as Template::Toolkit
This should work for you
#!/usr/bin/perl
use strict;
use warnings 'all';
if ( @ARGV < 1 ) {
print "Usage: proj5.pl <mm/dd/yyyy>\n";
exit;
}
my $date = $ARGV[0];
open my $info_fh, '<', 'p5Customer.txt' or die qq{Could not open "p5Customer.txt" file: $!};
my $directory = "Emails";
mkdir $directory unless -e $directory;
while ( <$info_fh> ) {
chomp;
my ($email, $fullname, $title, $payed, $owed) = split /,/;
next if $owed < $payed;
open my $template_fh, '<', 'template.txt' or die qq{Could not open "template.txt" file: $!};
my $filepath = "$directory/$email";
open my $out_fh, '>', $filepath or die qq{Unable to create "$filepath": $!};
while ( <$template_fh> ) {
s/EMAIL/$email/g;
s/FULLNAME|NAME/$fullname/g;
s/TITLE/$title/g;
s/AMOUNT/$owed/g;
s/DATE/$date/g;
print $out_fh $_;
}
close($out_fh);
}
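Your problem is that the TEMP loop is inside the INPUT loop, so the TEMP loop reads the template to end-of-file while the INPUT loop is still on the first line of the INPUT file. For every later customer the TEMP handle is already at end-of-file and yields nothing.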
Best to store TEMP file data into a hash table and work on the TEMP hash table inside the INPUT loop.
Good luck.
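A minimal sketch of that idea, using a plain array rather than a hash to hold the template lines. The helper name fill_template and the hash keys (email, fullname, and so on) are made up for this illustration, and the template and customer data are inline samples rather than the real template.txt and p5Customer.txt:

```perl
use strict;
use warnings;

# Hypothetical helper: return a filled-in copy of the template for one customer.
# $template_lines is an array ref holding the template, read once up front.
sub fill_template {
    my ( $template_lines, $c ) = @_;
    my @filled;
    for my $line (@$template_lines) {
        # $line aliases the array element, so substitute on a copy;
        # otherwise the stored template would be changed for the next customer.
        my $copy = $line;
        $copy =~ s/EMAIL/$c->{email}/g;
        $copy =~ s/FULLNAME|NAME/$c->{fullname}/g;
        $copy =~ s/TITLE/$c->{title}/g;
        $copy =~ s/AMOUNT/$c->{owed}/g;
        $copy =~ s/DATE/$c->{date}/g;
        push @filled, $copy;
    }
    return @filled;
}

# Sample data standing in for template.txt and two parsed customer lines.
my @template  = ( "Dear TITLE FULLNAME,\n", "You owe AMOUNT as of DATE.\n" );
my @customers = (
    { email => 'ann@example.com', fullname => 'Ann Lee', title => 'Dr', owed => '10', date => '01/02/2020' },
    { email => 'bob@example.com', fullname => 'Bob Ray', title => 'Mr', owed => '25', date => '01/02/2020' },
);

# Every customer gets a complete copy, because the array can be reused.
print fill_template( \@template, $_ ) for @customers;
```

In the real program the print would go to the per-customer output filehandle instead of standard output.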

How to get a comment printed for each line of text that matches within a file?

I am trying to match a keyword/text/line given in a file called expressions.txt from all files matching *main_log. When a match is found I want to print the comment for each line that matches.
Is there any better way to get this printed?
expression.txt
Hello World ! # I want to print this comments#
Bye* #I want this to print when Bye Is match with main_log#
:::
:::
Below Is the code I used :
{
open( my $kw, '<', 'expressions.txt' ) or die $!;
my @keywords = <$kw>;
chomp( @keywords ); # remove newlines at the end of keywords
# get list of files in current directory
my @files = grep { -f } ( <*main_log>, <*Project>, <*properties> );
# loop over each file to search keywords in
foreach my $file ( @files ) {
open( my $fh, '<', $file ) or die $!;
my @content = <$fh>;
close( $fh );
my $l = 0;
foreach my $kw ( @keywords ) {
my $search = quotemeta( $kw ); # otherwise keyword is used as regex, not literally
#$kw =~ m/\[(.*)\]/;
$kw =~ m/\((.*)\)/;
my $temp = $1;
print "$temp\n";
foreach ( @content ) { # go through every line for this keyword
$l++;
printf 'Found keyword %s in file %s, line %d:%s'.$/, $kw, $file, $l, $_ if /$search/;
}
}
}
I tried this code to print the comments mentioned within parentheses (...) but it is not printing in the fashion which I want like below:
If the expression.txt contains
Hello World ! # I want to print this comments#
If Hello World ! string is matched in my file called main_log then it should match only Hello World! from the main_log but print # I want to print this comments# as a comment for user to understand the keyword.
These keywords can be from any length or contains any character.
It worked fine, but I have one doubt about writing the required output to a file: I have used the command perl -w Test.pl > my_output.txt at the command prompt, but I am not sure how to do this inside the Perl script itself.
open( my $kw, '<', 'expressions.txt') or die $!;
my @keywords = <$kw>;
chomp(@keywords); # remove newlines at the end of keywords
# post-processing your keywords file
my $kwhashref = {
map {
/^(.*?)(#.*?#)*$/;
defined($2) ? ($1 => $2) : ( $1 => undef )
} @keywords
};
# get list of files in current directory
my @files = grep { -f } (<*main_log>,<*Project>,<*properties>);
# loop over each file to search keywords in
foreach my $file (@files) {
open(my $fh, '<', $file) or die $!;
my @content = <$fh>;
close($fh);
my $l = 0;
#foreach my $kw (@keywords) {
foreach my $kw (keys %$kwhashref) {
my $search = quotemeta($kw); # otherwise keyword is used as regex, not literally
#$kw =~ m/\[(.*)\]/;
#$kw =~ m/\#(.*)\#/;
#my $temp = $1;
#print "$temp\n";
foreach (@content) { # go through every line for this keyword
$l++;
if (/$search/)
{
# only print if comment defined
print $kwhashref->{$kw}."\n" if defined($kwhashref->{$kw}) ;
printf 'Found keyword %s in file %s, line %d:%s'.$/, $kw, $file, $l, $_
#printf '$output';
}
}
}
}
Your example code has mismatched braces { ... } and won't compile.
If you were to add another closing brace to the end of your code then it would compile, but the line
$kw =~ m/\((.*)\)/;
will never succeed since there are no parentheses anywhere in expressions.txt. If a match has not succeeded then the value of $1 will be retained from the most recently successful regex match operation.
You are also trying to search the lines from the files against the whole of the lines retrieved from expressions.txt, when you should be splitting those lines into keywords and their corresponding comments.
This seems to be a follow-up to this answer to another question of yours. What I tried to suggest in the last paragraph would start after the first three lines of your code:
# post-processing your keywords file
my $kwhashref = {
map {
/^(.*?)(#.*?#)*$/;
defined($2) ? ($1 => $2) : ( $1 => undef )
} @keywords
};
Now you have the keywords in a hashref containing the actual keywords to search for as keys, and comments as values, if they exist (using your #comment# at the end of line syntax here).
Your keyword loop would now have to use keys %$kwhashref and you now can additionally print the comment in the inner loop, converted like shown in the answer I linked. The additional print:
print $kwhashref->{$kw}."\n" if defined($kwhashref->{$kw}); # only print if comment defined
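As a rough, self-contained illustration of the keyword/comment split, here is a sketch that uses a hypothetical helper, parse_expression. Unlike the map above, it also trims the whitespace between the keyword and the comment, which is an assumption about what is wanted (otherwise the key would be "Hello World ! " with a trailing space). The sample log line is invented:

```perl
use strict;
use warnings;

# Hypothetical helper: split one (chomped) line of expressions.txt into the
# keyword and its optional trailing #comment#. The comment is undef if absent.
sub parse_expression {
    my ($line) = @_;
    my ( $keyword, $comment ) = $line =~ /^(.*?)\s*(#.*#)?\s*$/;
    return ( $keyword, $comment );
}

# Build keyword => comment from sample lines taken from the question.
my %comment_for;
for my $line ( 'Hello World ! # I want to print this comments#', ':::' ) {
    my ( $keyword, $comment ) = parse_expression($line);
    $comment_for{$keyword} = $comment;
}

my $log_line = '12:00 Hello World ! started';    # invented sample log line
for my $keyword ( sort keys %comment_for ) {
    next unless $log_line =~ /\Q$keyword\E/;      # \Q...\E matches the keyword literally
    print "Found keyword '$keyword'";
    print " $comment_for{$keyword}" if defined $comment_for{$keyword};
    print "\n";
}
```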

Perl Programming

I have these questions. But I don't know how to prove it or if I'm right. Are my answers right?
Find all complete lines of a file which contain only a row of any number of the letter x
x*
^x+$
^x*$ <-This one
^xxxxx$
Find all complete lines of a file which contain a row consisting only the letter x but ignoring any leading or trailing space on the line.
^\s* x+\s*$ <--This one
^\s(x*)\s$
\s* x+\s*
^\s+x+\s+$
I tried to use this
use strict;
use warnings;
my $filename = 'data.txt';
open( my $fh, '<:encoding(UTF-8)', $filename ) or die "Could not open file '$filename' $!";
while ( my $row = <$fh> ) {
chomp $row;
print "$row\n";
}
I tried this code but I got error at (^
use strict;
use warnings;
my $filename = 'data.txt';
open( my $fh, '<:encoding(UTF-8)', $filename ) or die "Could not open file '$filename' $!";
while ( my $row = <$fh> ) {
if ( ^x*$ ) {
print "This is";
}
}
You're talking about regular expressions and how to use them in Perl. Your question seems to be whether the answers you picked to homework are correct.
The code you've added should do what you want, but it has syntax errors.
if ( ^x*$ ) {
print "This is";
}
Your pattern is correct, but you don't know how to use a regular expression in Perl. You're missing the actual operator to tell Perl that you want a regular expression.
The short form is this, where I've highlighted the important part with #
if ( /^x*$/ ) {
     #    #
The slashes // tell Perl that it should match a pattern. The long form of it is:
if ( $_ =~ m/^x*$/ ) {
     ## ## ##    #
$_ is the variable that you are matching against a pattern. The =~ is the matching operator. The m// constructs a pattern to match with. If you use // you can leave out the m, but it's clearer to put it in.
The $_ is called topic. It's like a default variable that stuff goes into in Perl if you don't specify another variable.
while ( <$fh> ) {
print $_ if $_ =~ m/foo/; # print all lines that contain foo
}
This code can be written without $_, because a lot of commands in Perl assume that you mean $_ when you don't explicitly name a variable.
while ( <$fh> ) { # puts each line in $_
print if m/foo/; # prints $_ if $_ contains foo
}
Your code looks like you wanted to do that, but in fact you have a $row in your loop. That's good, because it is more explicit. That means it's easier to read. So what you need to do for your match is:
while ( my $row = <$fh> ) {
if ( $row =~ m/^x*$/ ) {
print "This is";
}
}
Now you will iterate over each line of the file behind the $fh filehandle, and check if it matches the pattern ^x*$. If it does, you print "This is". That doesn't sound very useful.
Consider this example, where I am using the __DATA__ section instead of a file.
use strict;
use warnings;
while ( my $row = <DATA> ) {
if ( $row =~ m/^x*$/ ) {
print "This is";
}
}
__DATA__
foo
xxx

x
xxxxx
bar
This will print:
This isThis isThis isThis is
It really does not seem to be very useful. It would make more sense to include the line that matched.
if ( $row =~ m/^x*$/ ) {
print "match: $row";
}
Now we get this:
match: xxx
match:
match: x
match: xxxxx
That's almost what we expected. It matches a single x, and a bunch of xs. It did not match foo or bar. But it does match an empty line.
That's because you picked the wrong pattern.
The * multiplier means match as many as possible, at least none.
The + multiplier means match as many as possible, at least one.
So your pattern should be the one with +, or it will match if there is nothing, because start of the line, no x, end of the line matches an empty line.
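The difference between the two quantifiers can be checked directly with a small sketch like this one. The helper name matches is made up for the illustration, and the lines are given without trailing newlines:

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $line matches $pattern, otherwise 0.
sub matches {
    my ( $pattern, $line ) = @_;
    return $line =~ $pattern ? 1 : 0;
}

# Compare both patterns on a run of x, an empty line, a single x and a non-match.
for my $line ( 'xxx', '', 'x', 'foo' ) {
    printf "line '%s': x* -> %d, x+ -> %d\n",
        $line, matches( qr/^x*$/, $line ), matches( qr/^x+$/, $line );
}
```

Only the empty line is treated differently: ^x*$ accepts it and ^x+$ rejects it.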
While you're at it, you could also rename your variable. Unless you're dealing with CSV, which has rows of data, you have lines, not rows. So $line would be a better name for your variable. Giving variables good, descriptive names is very important because it makes it easier to understand your program.
use strict;
use warnings;
my $filename = 'data.txt';
open( my $fh, '<:encoding(UTF-8)', $filename )
or die "Could not open file '$filename' $!";
while ( my $line = <$fh> ) {
if ( $line =~ m/^x+$/ ) {
print "match: $line";
}
}

How to search and replace using hash with Perl

I'm new to Perl and I'm afraid I am stuck and wanted to ask if someone might be able to help me.
I have a file with two columns (tab separated) of oldname and newname.
I would like to use the oldname as key and newname as value and store it as a hash.
Then I would like to open a different file (gff file) and replace all the oldnames in there with the newnames and write it to another file.
I have given it my best try but am getting a lot of errors.
If you could let me know what I am doing wrong, I would greatly appreciate it.
Here are how the two files look:
oldname newname(SFXXXX) file:
genemark-scaffold00013-abinit-gene-0.18 SF130001
augustus-scaffold00013-abinit-gene-1.24 SF130002
genemark-scaffold00013-abinit-gene-1.65 SF130003
file to search and replace in (an example of one of the lines):
scaffold00013 maker gene 258253 258759 . - . ID=maker-scaffold00013-augustus-gene-2.187;Name=maker-scaffold00013-augustus-gene-2.187;
Here is my attempt:
#!/usr/local/bin/perl
use warnings;
use strict;
my $hashfile = $ARGV[0];
my $gfffile = $ARGV[1];
my %names;
my $oldname;
my $newname;
if (!defined $hashfile) {
die "Usage: $0 hash_file gff_file\n";
}
if (!defined $gfffile) {
die "Usage: $0 hash_file gff_file\n";
}
###save hashfile with two columns, oldname and newname, into a hash with oldname as key and newname as value.
open(HFILE, $hashfile) or die "Cannot open $hashfile\n";
while (my $line = <HFILE>) {
chomp($line);
my ($oldname, $newname) = split /\t/;
$names{$oldname} = $newname;
}
close HFILE;
###open gff file and replace all oldnames with newnames from %names.
open(GFILE, $gfffile) or die "Cannot open $gfffile\n";
while (my $line2 = <GFILE>) {
chomp($line2);
eval "$line2 =~ s/$oldname/$names{oldname}/g";
open(OUT, ">SFrenamed.gff") or die "Cannot open SFrenamed.gff: $!";
print OUT "$line2\n";
close OUT;
}
close GFILE;
Thank you!
Your main problem is that you aren't splitting the $line variable. split /\t/ splits $_ by default, and you haven't put anything in there.
This program builds the hash, and then constructs a regex from all the keys by sorting them in descending order of length and joining them with the | regex alternation operator. The sorting is necessary so that the longest of all possible choices is selected if there are any alternatives.
Every occurrence of the regex is replaced by the corresponding new name in each line of the input file, and the output written to the new file.
use strict;
use warnings;
die "Usage: $0 hash_file gff_file\n" if @ARGV < 2;
my ($hashfile, $gfffile) = @ARGV;
open(my $hfile, '<', $hashfile) or die "Cannot open $hashfile: $!";
my %names;
while (my $line = <$hfile>) {
chomp($line);
my ($oldname, $newname) = split /\t/, $line;
$names{$oldname} = $newname;
}
close $hfile;
my $regex = join '|', map { quotemeta($_) } sort { length $b <=> length $a } keys %names;
$regex = qr/$regex/;
open(my $gfile, '<', $gfffile) or die "Cannot open $gfffile: $!";
open(my $out, '>', 'SFrenamed.gff') or die "Cannot open SFrenamed.gff: $!";
while (my $line = <$gfile>) {
chomp($line);
$line =~ s/($regex)/$names{$1}/g;
print $out $line, "\n";
}
close $out;
close $gfile;
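Why the longest-first ordering matters can be seen in a small sketch with two made-up names, one of which is a prefix of the other. The helper name build_name_regex is hypothetical, and quotemeta is applied here on the assumption that characters such as . and - in the names should be matched literally:

```perl
use strict;
use warnings;

# Hypothetical helper: build one regex that matches any key of %$names,
# longest key first, with regex metacharacters escaped.
sub build_name_regex {
    my ($names) = @_;
    my $alternation = join '|',
        map  { quotemeta($_) }
        sort { length $b <=> length $a } keys %$names;
    return qr/($alternation)/;
}

# Made-up sample names: 'gene-1' is a prefix of 'gene-1.24'.
my %names = ( 'gene-1' => 'SF1', 'gene-1.24' => 'SF2' );
my $regex = build_name_regex( \%names );

( my $renamed = 'ID=gene-1.24;' ) =~ s/$regex/$names{$1}/g;
print "$renamed\n";    # prints "ID=SF2;"
```

If the shorter name came first in the alternation, it would match the start of gene-1.24 and the result would be ID=SF1.24; instead.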
Why are you using an eval? And $oldname is going to be undefined in the second while loop, because you redeclare both variables inside the first while loop, in that loop's scope (even if you used the outer-scope variables, they would hold only the very last values you processed, which wouldn't be helpful).
Take out the my $oldname and my $newname at the top of your script; they are useless.
Take out the entire eval line. You need to repeat the regex for each thing you want to replace. Try something like:
$line2 =~ s/$_/$names{$_}/g for keys %names;
Also see Borodin's answer. He made one big regex instead of a loop, and caught your lack of the second argument to split.