Perl: Getting lines from file based on user input [duplicate] - perl

Closed 12 years ago as a possible duplicate of: How can I implement Unix grep in Perl?
Is there any way I can write a Perl script to print all lines in a file that contain a string entered by the user?

There are two separate concepts in this answer:
File I/O, and iterating through all lines in a while loop.
Regular expressions, and especially passing a variable to a regular expression.
Note the use of quotemeta. It matters when the user input contains regex-specific characters, which could otherwise change the meaning of the pattern or even produce an illegal regex.
Here is the code:
print "Looking for: ";
my $input = <>;
chomp $input;
my $re = quotemeta $input;
open my $fh, "<", "myfile.txt" or die $!;
while( <$fh> ) {
    print if /$re/;
}
close $fh;
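
To see why quotemeta matters, here is a minimal sketch. The helper name matching_lines and the sample data are made up for illustration, and an in-memory filehandle stands in for myfile.txt. The search string "(USD)" would be treated as a capture group without escaping, so both "price" lines would match.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines from $fh that contain $needle literally.
sub matching_lines {
    my ($fh, $needle) = @_;
    my $re = quotemeta $needle;    # escape (, ), ., * and other metacharacters
    my @hits;
    while (my $line = <$fh>) {
        push @hits, $line if $line =~ /$re/;
    }
    return @hits;
}

# An in-memory filehandle stands in for myfile.txt.
my $text = "price (USD)\nprice USD\na.b\naxb\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @hits = matching_lines($fh, '(USD)');
close $fh;

print @hits;    # only the line containing the literal "(USD)"
```

The same effect can be had inline with \Q...\E inside the pattern, which applies quotemeta to the interpolated variable.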

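You add the regular expression modifier "i":
print "Name: ";
my $string = <>;
chomp $string;    # remove the trailing newline from the user input
open FILE, "test.txt" or die $!;
while(<FILE>) {
    chomp;
    print "$_\n" if(/\Q$string\E/i);    # \Q...\E matches the input literally
}
close FILE;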

Oh, figured out how to do it. Turned out to be quite simple
But how do I do it if I don't want the search to be case sensitive?
print "Name: ";
my $string = <>;
open FILE, "test.txt" or die $!;
while(<FILE>) {
chomp;
print "$_\n" if(/$string*/);
}
close FILE;

Your while loop body is a bit complex. This also works:
while (<FILE>) { print if /$string/i }
There's no need to chomp the newline and then add it back in.

One liner:
perl -nlE 'BEGIN{$pattern=shift};say if /$pattern/i' pattern filename(s)
or
perl -nlE 'BEGIN{$pattern=shift};say "$ARGV $.: $_" if /$pattern/i' pattern filename(s)
to get the file name and line number too. The -l switch chomps each input line, so say does not print a second newline.

Related

Need something like open OR DIE except with chomp

I'm fairly new to coding and I need a fail statement to print out as if it were an or die.
Part of my code for an example:
print "Please enter the name of the file to search:";
chomp (my $filename=<STDIN>) or die "No such file exists. Exiting program. Please try again."\n;
print "Enter a word to search for:";
chomp (my $word=<STDIN>);
I need this for both of these print/chomp statements. Is there any way to just add on to this?
Whole program:
#!/usr/bin/perl -w
use strict;
print "Welcome to the word frequency calculator.\n";
print "This program prompts the user for a file to open, \n";
print "then it prompts for a word to search for in that file,\n";
print "finally the frequency of the word is displayed.\n";
print " \n";
print "Please enter the name of the file to search:";
while (<>){
print;
}
print "Enter a word to search for:";
chomp( my $input = <STDIN> );
my $filename = <STDIN>;
my$ctr=0;
foreach( $filename ) {
if ( /\b$input\b/ ) {
$ctr++;
}
}
print "Freq: $ctr\n";
exit;
You don't need to test the filehandle read <> for success. See I/O Operators in perlop. When it has nothing to read it returns an undef, which is precisely what you want so your code knows when to stop reading.
As for removing the newline, you want to chomp separately anyway. Otherwise, once the read does return an undef you'd chomp on an undefined variable, triggering a warning.
Normally, with a filehandle $fh opened on some resource, you'd do
while (my $line = <$fh>) {
chomp $line;
# process/store input as it comes ...
}
This can be STDIN as well. If there is certainly just one line to read:
my $filename = <STDIN>;
chomp $filename;
You don't need to test chomp against failure either. Note that it returns the number of characters that it removed, so if there was no $/ (newline typically) it legitimately returns 0.
To add, it is a very good practice to always test! As a part of that mindset, please make sure to always use warnings;, and I also strongly recommend coding with use strict;.
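
A small sketch of the two points above, using made-up sample strings and an in-memory filehandle instead of STDIN: chomp returning 0 is not a failure, and a read at end of input returns undef, which is why the defined check comes before the chomp.

```perl
use strict;
use warnings;

my $with_newline    = "report.txt\n";
my $without_newline = "report.txt";

my $removed_a = chomp $with_newline;       # 1: one trailing newline removed
my $removed_b = chomp $without_newline;    # 0: nothing to remove, not an error

print "removed $removed_a and $removed_b characters\n";

# Reading past the end of input gives undef, so check before chomping.
my $data = "only line\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $first  = <$fh>;    # "only line\n"
my $second = <$fh>;    # undef: nothing left to read
close $fh;

chomp $first if defined $first;
print "second read returned undef\n" unless defined $second;
```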
Update to a significant question edit
In the first while loop you do not store the filename anywhere. Given the greeting that is printed, instead of that loop you should just read the filename. Then you read the word to search for.
# print greeting
my $filename = <STDIN>;
chomp $filename;
my $input = <STDIN>;
chomp $input;
However, then we get to the bigger problem: you need to open the file, and only then can you go through it line by line and search for the word. This is where you need the test. See the linked doc page and the tutorial perlopentut. First check whether a file with that name exists.
if (not -e $filename) {
    print "No file $filename. Please try again.\n";
    exit;
}
open my $fh, '<', $filename or die "Can't open $filename: $!";
my $word_count = 0;
while (my $line = <$fh>)
{
    # Now search for the word on a line
    while ($line =~ /\b$input\b/g) {
        $word_count++;
    }
}
close $fh or die "Can't close filehandle: $!";
The -e above is one of the file-tests, this one checking whether the given file exists. See the doc page for file-tests (-X). In the code above we just exit with a message, but you may want to print the message prompting the user to enter another name, in a loop.
We use while and the /g modifier in the regex to find all occurrences of the word on a line.
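
As an illustration of the while plus /g counting idiom, here is a minimal sketch on a single string. The helper name count_word and the sample sentence are invented for the example, and \Q...\E is added on the assumption that the search word should be matched literally.

```perl
use strict;
use warnings;

# Hypothetical helper: count whole-word occurrences of $word in $text.
sub count_word {
    my ($text, $word) = @_;
    my $count = 0;
    # In scalar context, /g resumes after the previous match on each pass,
    # so the increment runs once per occurrence.
    $count++ while $text =~ /\b\Q$word\E\b/g;
    return $count;
}

my $line = "the cat and the other cat saw the caterpillar";
print count_word($line, 'cat'), "\n";    # 2 ("caterpillar" is not a whole word)
print count_word($line, 'the'), "\n";    # 3 ("other" is not a whole word)
```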
I'd like to also strongly suggest to always start your programs with
use warnings 'all';
use strict;

My perl script isn't working, I have a feeling it's the grep command

I'm trying to search one file for instances of a number and print the line if the other file contains that number.
#!/usr/bin/perl
open(file, "textIds.txt"); #
@file = <file>; #file looking into
# close file; #
while(<>){
    $temp = $_;
    $temp =~ tr/|/\t/; #puts tab between name and id
    @arrayTemp = split("\t", $temp);
    @found=grep{/$arrayTemp[1]/} <file>;
    if (defined $found[0]){
    #if (grep{/$arrayTemp[1]/} <file>){
        print $_;
    }
    @found=();
}
print "\n";
close file;
#the input file lines have the format of
#John|7791 154
#Smith|5432 290
#Conor|6590 897
#And in the file the format is
#5432
#7791
#6590
#23140
There are some issues in your script.
Always include use strict; and use warnings;. This would have told you about odd things in your script in advance.
Never use barewords as filehandles, as they are global identifiers. Use the three-parameter open instead: open( my $fh, '<', 'textIds.txt');
Add use autodie; or check whether the open succeeded.
You read and store textIds.txt into the array @file, but later on (in your grep) you try to read from that filehandle again (with <file>). As @PaulL said, this will always give nothing (false) because the file has already been read to the end.
Replacing | with tabs and then splitting at tabs is not necessary. You can split at tabs and pipes at the same time (assuming "John|7791 154" is really "John|7791\t154").
You're talking about an "input file" and "in file" without saying exactly which is which. I assume your "textIds.txt" is the one with only the numbers and the other input file is the one read from STDIN (with the |'s in it).
With this in mind your script could be written as:
#!/usr/bin/perl
use strict;
use warnings;
# Open 'textIds.txt' and slurp it into the array @file:
open( my $fh, '<', 'textIds.txt') or die "cannot open file: $!\n";
my @file = <$fh>;
close($fh);
# iterate over STDIN and compare with lines from 'textIds.txt':
while( my $line = <>) {
    # split "John|7791\t154" into ("John", "7791", "154"):
    my ($name, $number1, $number2) = split(/\||\t/, $line);
    # compare $number1 to each member of @file and print if found:
    if ( grep( /$number1/, @file) ) {
        print $line;
    }
}
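
A possible variation, not part of the answer above: the grep with /$number1/ is a substring match, so an ID such as 779 would also match the line 7791. A hash lookup gives exact matches and avoids rescanning the array for every input line. This is a sketch with inline sample data standing in for textIds.txt and STDIN; the variable names and the "Alice" record are made up.

```perl
use strict;
use warnings;

# Sample data standing in for textIds.txt and the piped input (assumed formats).
my @id_lines    = ("5432\n", "7791\n", "6590\n", "23140\n");
my @input_lines = ("John|7791\t154\n", "Smith|5432\t290\n", "Alice|779\t11\n");

# Build the lookup table once; chomp so the keys carry no newline.
chomp @id_lines;
my %known_id = map { $_ => 1 } @id_lines;

my @matched;
for my $line (@input_lines) {
    # The second field (after the pipe) is the ID we compare.
    my (undef, $id) = split /[|\t]/, $line;
    push @matched, $line if defined $id && $known_id{$id};
}

print @matched;    # John and Smith; 779 is only a substring of 7791
```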

How do I find the line a word is on when the user enters text in Perl?

I have a simple text file that includes all 50 states. I want the user to enter a word and have the program return the line the specific state is on in the file or otherwise display a "word not found" message. I do not know how to use find. Can someone assist with this? This is what I have so far.
#!/bin/perl -w
open(FILENAME,"<WordList.txt"); #opens WordList.txt
my(@list) = <FILENAME>; #read file into list
my($state); #create private "state" variable
print "Enter a US state to search for: \n"; #Print statement
$line = <STDIN>; #use of STDIN to read input from user
close (FILENAME);
An alternative solution that reads only the parts of the file until a result is found, or the file is exhausted:
use strict;
use warnings;
print "Enter a US state to search for: \n";
my $line = <STDIN>;
chomp($line);
# open file with 3 argument open (safer)
open my $fh, '<', 'WordList.txt'
    or die "Unable to open 'WordList.txt' for reading: $!";
# read the file until result is found or the file is exhausted
my $found = 0;
while ( my $row = <$fh> ) {
    chomp($row);
    next unless $row eq $line;
    # $. is a special variable representing the line number
    # of the currently (most recently) accessed filehandle
    print "Found '$line' on line# $.\n";
    $found = 1; # indicate that you found a result
    last; # stop searching
}
close($fh);
unless ( $found ) {
    print "'$line' was not found\n";
}
General notes:
always use strict; and use warnings; they will save you from a wide range of bugs
3 argument open is generally preferred, as well as the or die ... statement. If you are unable to open the file, reading from the filehandle will fail
$. documentation can be found in perldoc perlvar
The tool for the job is grep.
chomp ( $line ); #remove linefeeds
print "$line is in list\n" if grep { m/^\Q$line\E$/ } @list;
You could also transform your @list into a hash using map, and test that:
chomp ( @list ); #hash keys must not keep their trailing newlines
my %cities = map { $_ => 1 } @list;
if ( $cities{$line} ) { print "$line is in list\n";}
Note - the above, because of the presence of ^ and $, is an exact match (and case sensitive). You can easily adjust it to support fuzzier scenarios.
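
One way such a fuzzier lookup might look, as a sketch only: keys are lowercased so the match ignores case, and the hash stores the 1-based line number so the program can report where the state was found. The three-entry list stands in for WordList.txt, and find_state is a hypothetical helper name.

```perl
use strict;
use warnings;

# Stand-in for the contents of WordList.txt (assumed: one state per line).
my @list = ("Alabama\n", "Alaska\n", "New York\n");

# Map each lowercased state name to its 1-based line number.
my %line_of;
for my $i (0 .. $#list) {
    my $name = $list[$i];
    chomp $name;
    $line_of{ lc $name } = $i + 1;
}

# Hypothetical helper: look up user input regardless of case.
sub find_state {
    my ($input) = @_;
    $input =~ s/^\s+|\s+$//g;    # trim whitespace, including the newline
    return $line_of{ lc $input };
}

my $line_no = find_state("new york\n");
print defined $line_no ? "Found on line $line_no\n" : "word not found\n";
```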

How to search and replace using hash with Perl

I'm new to Perl and I'm afraid I am stuck and wanted to ask if someone might be able to help me.
I have a file with two columns (tab separated) of oldname and newname.
I would like to use the oldname as key and newname as value and store it as a hash.
Then I would like to open a different file (gff file) and replace all the oldnames in there with the newnames and write it to another file.
I have given it my best try but am getting a lot of errors.
If you could let me know what I am doing wrong, I would greatly appreciate it.
Here is how the two files look:
oldname newname(SFXXXX) file:
genemark-scaffold00013-abinit-gene-0.18 SF130001
augustus-scaffold00013-abinit-gene-1.24 SF130002
genemark-scaffold00013-abinit-gene-1.65 SF130003
file to search and replace in (an example of one of the lines):
scaffold00013 maker gene 258253 258759 . - . ID=maker-scaffold00013-augustus-gene-2.187;Name=maker-scaffold00013-augustus-gene-2.187;
Here is my attempt:
#!/usr/local/bin/perl
use warnings;
use strict;
my $hashfile = $ARGV[0];
my $gfffile = $ARGV[1];
my %names;
my $oldname;
my $newname;
if (!defined $hashfile) {
die "Usage: $0 hash_file gff_file\n";
}
if (!defined $gfffile) {
die "Usage: $0 hash_file gff_file\n";
}
###save hashfile with two columns, oldname and newname, into a hash with oldname as key and newname as value.
open(HFILE, $hashfile) or die "Cannot open $hashfile\n";
while (my $line = <HFILE>) {
chomp($line);
my ($oldname, $newname) = split /\t/;
$names{$oldname} = $newname;
}
close HFILE;
###open gff file and replace all oldnames with newnames from %names.
open(GFILE, $gfffile) or die "Cannot open $gfffile\n";
while (my $line2 = <GFILE>) {
chomp($line2);
eval "$line2 =~ s/$oldname/$names{oldname}/g";
open(OUT, ">SFrenamed.gff") or die "Cannot open SFrenamed.gff: $!";
print OUT "$line2\n";
close OUT;
}
close GFILE;
Thank you!
Your main problem is that you aren't splitting the $line variable. split /\t/ splits $_ by default, and you haven't put anything in there.
This program builds the hash, and then constructs a regex from all the keys by sorting them in descending order of length and joining them with the | regex alternation operator. The sorting is necessary so that the longest of all possible choices is selected if there are any alternatives.
Every occurrence of the regex is replaced by the corresponding new name in each line of the input file, and the output written to the new file.
use strict;
use warnings;
die "Usage: $0 hash_file gff_file\n" if @ARGV < 2;
my ($hashfile, $gfffile) = @ARGV;
open(my $hfile, '<', $hashfile) or die "Cannot open $hashfile: $!";
my %names;
while (my $line = <$hfile>) {
chomp($line);
my ($oldname, $newname) = split /\t/, $line;
$names{$oldname} = $newname;
}
close $hfile;
my $regex = join '|', map { quotemeta($_) } sort { length $b <=> length $a } keys %names;
$regex = qr/$regex/;
open(my $gfile, '<', $gfffile) or die "Cannot open $gfffile: $!";
open(my $out, '>', 'SFrenamed.gff') or die "Cannot open SFrenamed.gff: $!";
while (my $line = <$gfile>) {
chomp($line);
$line =~ s/($regex)/$names{$1}/g;
print $out $line, "\n";
}
close $out;
close $gfile;
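
To illustrate why the keys are sorted longest first, here is a small sketch with two made-up names where one is a prefix of the other. The helper rename_all is hypothetical and takes the key order as an argument so both orderings can be compared. Regex alternation picks the first alternative that matches at a position, not the longest one.

```perl
use strict;
use warnings;

# Hypothetical names where one key is a prefix of another.
my %names = (
    'gene-1'  => 'SF0001',
    'gene-10' => 'SF0010',
);

# Hypothetical helper: replace every key of %names found in $text,
# trying the keys in the order given.
sub rename_all {
    my ($text, @keys) = @_;
    my $alternation = join '|', map { quotemeta($_) } @keys;
    $text =~ s/($alternation)/$names{$1}/g;
    return $text;
}

my @longest_first  = sort { length $b <=> length $a } keys %names;
my @shortest_first = reverse @longest_first;

print rename_all('ID=gene-10;', @longest_first),  "\n";    # ID=SF0010;
print rename_all('ID=gene-10;', @shortest_first), "\n";    # ID=SF00010; (wrong)
```

With the shorter key first, "gene-1" matches inside "gene-10" and the trailing "0" is left behind.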
Why are you using an eval? Also, $oldname is going to be undefined in the second while loop, because the first while loop redeclares it with my in its own scope (and even if you used the outer variable, it would only hold the last value you processed, which wouldn't be helpful).
Take out the my $oldname and my $newname at the top of your script; they are useless.
Take out the entire eval line. You need to repeat the substitution for each thing you want to replace. Try something like:
$line2 =~ s/\Q$_\E/$names{$_}/g for keys %names;
Also see Borodin's answer. He made one big regex instead of a loop, and caught your lack of the second argument to split.

Print email addresses to a file in Perl

I have been scouring this site and others to find the best way to do what I need to do but to no avail. Basically I have a text file with some names and email addresses. Each name and email address is on its own line. I need to get the email addresses and print them to another text file. So far all I have been able to print is the "no email addresses found" message. Any thoughts? Thanks!!
#!/usr/bin/perl
open(IN, "<contacts.txt") || die("file not found");
#chooses the file to read
open(OUT, ">emailaddresses.txt");
#prints file
$none = "No emails found!";
$line = <IN>;
for ($line)
{
if ($line =~ /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/g)
{
print (OUT $line);
}
else
{
print (OUT $none);
}
}
close(IN);
close(OUT);
First, always use strict; use warnings. This helps writing correct scripts, and is an invaluable aid when debugging.
Also, use a three-arg-open:
open my $fh, "<", $filename or die qq(Can't open "$filename": $!);
I included the reason for failure ($!), which is a good practice too.
The idiom to read files (on an open filehandle) is:
while (<$fh>) {
chomp;
# The line is in $_;
}
or
while (defined(my $line = <$fh>)) { chomp $line; ... }
What you did was to read one line into $line, and loop over that one item in the for loop.
(Perl has a notion of context. Operators like <$fh> behave differently depending on context. Generally, using a scalar variable ($ sigil) forces scalar context, and @, the sigil for arrays, causes list context. This is quite unlike PHP.)
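
A minimal sketch of that context difference, using an in-memory filehandle with made-up contents: assigning the read to a scalar fetches one line, while assigning it to an array fetches every remaining line.

```perl
use strict;
use warnings;

my $data = "alice\nbob\ncarol\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# Scalar context: one line per read.
my $first = <$fh>;    # "alice\n"

# List context: all remaining lines at once.
my @rest = <$fh>;     # ("bob\n", "carol\n")
close $fh;

print "first: $first";
print "rest has ", scalar(@rest), " lines\n";
```

This is why the original loop ran only once: the scalar assignment read a single line, and for then iterated over that one value.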
I'd rewrite your code like:
use strict; use warnings;
use feature 'say';
my $regex = qr/[A-Z0-9._%+-]+\@[A-Z0-9.-]+\.[A-Z]{2,4}/i; # emails are case insensitive
my $found = 0;
while (<>) { # use special ARGV filehandle, which usually is STDIN
while (/($regex)/g) {
$found++;
say $1;
}
}
die "No emails found\n" unless $found;
Invoked like perl script.pl <contacts.txt >emailaddresses.txt. The shell is your friend, and creating programs that can be piped from and to is good design.
Update
If you want to hardcode the filenames, we would combine the above script with the three-arg open I have shown:
use strict; use warnings; use feature 'say';
use autodie; # does `... or die "Can't open $file: $!"` for me
my $regex = qr/[A-Z0-9._%+-]+\@[A-Z0-9.-]+\.[A-Z]{2,4}/i;
my $found = 0;
my $contact_file = "contacts.txt";
my $email_file = "emailaddresses.txt";
open my $contact, "<", $contact_file;
open my $email, ">", $email_file;
while (<$contact>) { # read from the $contact filehandle
while (/($regex)/g) { # the /g is optional if there is max one address per line
$found++;
say {$email} $1; # print to the $email file handle. {curlies} are optional.
}
}
die "No emails found\n" unless $found; # error message goes to STDERR, not to the file