How do I find the line a word is on when the user enters text in Perl?

I have a simple text file that includes all 50 states. I want the user to enter a word and have the program return the line the specific state is on in the file or otherwise display a "word not found" message. I do not know how to use find. Can someone assist with this? This is what I have so far.
#!/bin/perl -w
open(FILENAME,"<WordList.txt"); #opens WordList.txt
my(@list) = <FILENAME>; #read file into list
my($state); #create private "state" variable
print "Enter a US state to search for: \n"; #Print statement
$line = <STDIN>; #use of STDIN to read input from user
close (FILENAME);

An alternative solution that reads only the parts of the file until a result is found, or the file is exhausted:
use strict;
use warnings;
print "Enter a US state to search for: \n";
my $line = <STDIN>;
chomp($line);
# open file with 3 argument open (safer)
open my $fh, '<', 'WordList.txt'
or die "Unable to open 'WordList.txt' for reading: $!";
# read the file until result is found or the file is exhausted
my $found = 0;
while ( my $row = <$fh> ) {
chomp($row);
next unless $row eq $line;
# $. is a special variable representing the line number
# of the currently(most recently) accessed filehandle
print "Found '$line' on line# $.\n";
$found = 1; # indicate that you found a result
last; # stop searching
}
close($fh);
unless ( $found ) {
print "'$line' was not found\n";
}
General notes:
always use strict; and use warnings; they will save you from a wide range of bugs
3 argument open is generally preferred, as well as the or die ... statement. If you are unable to open the file, reading from the filehandle will fail
$. documentation can be found in perldoc perlvar

The tool for the job is grep.
chomp ( $line ); #remove linefeeds
print "$line is in list\n" if grep { m/^\Q$line\E$/ } @list;
You could also transform your @list into a hash, and test that, using map:
chomp ( @list ); #hash keys must match exactly, so drop the newlines first
my %cities = map { $_ => 1 } @list;
if ( $cities{$line} ) { print "$line is in list\n";}
Note - the above, because of the presence of ^ and $ is an exact match (and case sensitive). You can easily adjust it to support fuzzier scenarios.
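One possible way to make the match fuzzier, sketched under the assumption that "fuzzy" means ignoring case and surrounding whitespace. It also returns the line number, which is what the question asks for. The helper name find_line and the sample list are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: return the 1-based line number of the first entry
# that equals $query, ignoring case and surrounding whitespace.
# Returns undef when nothing matches.
sub find_line {
    my ($query, @lines) = @_;
    $query =~ s/^\s+|\s+$//g;          # trim the user's input
    for my $i (0 .. $#lines) {
        my $entry = $lines[$i];
        $entry =~ s/^\s+|\s+$//g;      # trim the newline and stray spaces
        return $i + 1 if lc($entry) eq lc($query);
    }
    return;                            # not found
}

# Sample data standing in for the lines read from WordList.txt.
my @list = ("Alabama\n", "Alaska\n", "New York\n");
my $n = find_line("  new york ", @list);
print defined $n ? "Found on line $n\n" : "Not found\n";
```

Working on copies of the entries leaves the original list untouched, so it can still be used for other lookups afterwards.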

Related

can't loop through the whole thing to start at the beginning after it shows your results

I am really new to Perl and I am writing a program that reports the unique words in a text file. However, I don't know how to make it loop to ask the user for another file or to quit the program altogether.
I tried to put my whole code inside a do-until loop and it did not work.
use 5.18.0;
use warnings;
use strict;
print "Enter the name of the file: ";
my %count;
my $userinput = <>; #the name of the text file the user wants to read
chomp($userinput); #take out the newline character
my $linenumb = $ARGV[1];
my $uniqcount = 0;
#opens the file if it is readable
open(FH, '<:encoding(UTF-8)', $userinput) or die "Could not open file '$userinput' $!";
print "Summary of file '$userinput': \n";
my ($lines, $wordnumber, $total) = (0, 0, 0);
my @words = ();
my $count =1;
while (my $line = <FH>) {
$lines++;
my @words = split (" ", $line);
$wordnumber = @words;
print "\n Line $lines : $wordnumber ";
$total = $total+$wordnumber;
$wordnumber++;
}
print "\nTotal no. of words in file are $total \n";
#my @uniq = uniq @words;
#print "Unique Names: " .scalar @uniq . "\n";
close(FH);
It's often a good idea to put complicated pieces of your code into subroutines so that you can forget (temporarily) how the details work and concentrate on the bigger picture.
I'd suggest that you have two obvious subroutines here that might be called get_user_input() and process_file(). Putting the code into subroutines might look like this:
sub get_user_input {
print "Enter the name of the file: ";
my $userinput = <>; #the name of the text file the user wants to read
chomp($userinput); #take out the newline character
return $userinput;
}
sub process_file {
my ($file) = @_;
#opens the file if it is readable
# Note: Changed to using a lexical filehandle.
# This will automatically be closed when the
# lexical variable goes out of scope (i.e. at
# the end of this subroutine).
open(my $fh, '<:encoding(UTF-8)', $file)
or die "Could not open file '$file' $!";
print "Summary of file '$file': \n";
# Removed $lines variable. We'll use the built-in
# variable $. instead.
# Moved declaration of $wordnumber inside the loop.
# Removed @words and $count variables that aren't used.
my $total = 0;
# Removed $line variable. We'll use $_ instead.
while (<$fh>) {
# With no arguments, split() defaults to
# behaving as split ' ', $_.
# When assigned to a scalar, split() returns
# the number of elements in the split list
# (which is what we want here - we never actually
# use the list of words).
my $wordnumber = split;
print "\n Line $. : $wordnumber ";
# $x += $y is a shortcut for $x = $x + $y.
$total += $wordnumber;
$wordnumber++;
}
print "\nTotal no. of words in file are $total \n";
}
And then you can plug them together with code something like this:
# Get the first filename from the user
my $filename = get_user_input();
# While the user hasn't typed 'q' to quit
while ($filename ne 'q') {
# Process the file
process_file($filename);
# Get another filename from the user
$filename = get_user_input();
}
Update: I've cleaned up the process_file() subroutine a bit and added comments about the changes I've made.
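The question also mentions unique words, which the subroutines above do not cover. Here is a minimal sketch of one way to count them with a hash, assuming words are whitespace-separated and that case should be ignored. The name count_unique_words and the sample text are invented for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: read every line from an open filehandle and
# return a hash reference mapping each lower-cased word to its count.
sub count_unique_words {
    my ($fh) = @_;
    my %seen;
    while (my $line = <$fh>) {
        # split ' ' skips leading whitespace and splits on runs of it
        $seen{ lc($_) }++ for split ' ', $line;
    }
    return \%seen;
}

# Demonstrate with an in-memory filehandle instead of a real file.
my $text = "the cat\nThe dog and the cat\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $seen = count_unique_words($fh);
close $fh;

print "Unique words: ", scalar(keys %$seen), "\n";
```

The same subroutine could be called from process_file() with the filehandle opened there, since it only needs something it can read lines from.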
Wrap everything in a neverending loop and conditionally jump out of it.
while () {
my $prompt = …
last if $prompt eq 'quit';
… # file handling goes here
}
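A fuller sketch of such a loop, assuming the answer is read line by line and that the word quit ends the program. The prompt_loop name and the callback are hypothetical stand-ins for the elided parts, and the demo reads from an in-memory filehandle so it can run without a terminal:

```perl
use strict;
use warnings;

# Keep asking for file names on $in until the user types 'quit' or the
# input ends; call $handler for every other non-empty answer.
sub prompt_loop {
    my ($in, $handler) = @_;
    while (1) {
        print "Enter the name of the file (or 'quit'): ";
        my $answer = <$in>;
        last unless defined $answer;   # end of input (e.g. Ctrl-D)
        chomp $answer;
        last if $answer eq 'quit';
        next if $answer eq '';         # ignore empty lines
        $handler->($answer);
    }
    return;
}

# Simulated user input; the file names are made up.
my $typed = "a.txt\n\nb.txt\nquit\nignored.txt\n";
open my $in, '<', \$typed or die "Cannot open in-memory file: $!";
my @processed;
prompt_loop($in, sub { push @processed, $_[0] });
close $in;
print "\nProcessed: @processed\n";
```

For interactive use you would pass \*STDIN as the first argument and a reference to the real file-handling subroutine as the second.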

Need something like open OR DIE except with chomp

I'm fairly new to coding and I need a fail statement to print out as if it were an or die.
Part of my code for an example:
print "Please enter the name of the file to search:";
chomp (my $filename=<STDIN>) or die "No such file exists. Exiting program. Please try again."\n;
print "Enter a word to search for:";
chomp (my $word=<STDIN>);
I need it to do this for both of these print/chomp statements. Is there any way to just add on to this?
Whole program:
#!/usr/bin/perl -w
use strict;
print "Welcome to the word frequency calculator.\n";
print "This program prompts the user for a file to open, \n";
print "then it prompts for a word to search for in that file,\n";
print "finally the frequency of the word is displayed.\n";
print " \n";
print "Please enter the name of the file to search:";
while (<>){
print;
}
print "Enter a word to search for:";
chomp( my $input = <STDIN> );
my $filename = <STDIN>;
my$ctr=0;
foreach( $filename ) {
if ( /\b$input\b/ ) {
$ctr++;
}
}
print "Freq: $ctr\n";
exit;
You don't need to test the filehandle read <> for success. See I/O Operators in perlop. When it has nothing to read it returns an undef, which is precisely what you want so your code knows when to stop reading.
As for removing the newline, you want to chomp separately anyway. Otherwise, once the read does return an undef you'd chomp on an undefined variable, triggering a warning.
Normally, with a filehandle $fh opened on some resource, you'd do
while (my $line = <$fh>) {
chomp $line;
# process/store input as it comes ...
}
This can be STDIN as well. If it is certainly just one line
my $filename = <STDIN>;
chomp $filename;
You don't need to test chomp against failure either. Note that it returns the number of characters that it removed, so if there was no $/ (newline typically) it legitimately returns 0.
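A small illustration of that return value, which is why `chomp(...) or die` is not a useful test. The variable names and the sample string are only for this example:

```perl
use strict;
use warnings;

my $with_newline    = "WordList.txt\n";
my $without_newline = "WordList.txt";

# chomp returns how many characters it removed, not a success flag.
my $removed_a = chomp($with_newline);      # removes the trailing "\n"
my $removed_b = chomp($without_newline);   # nothing to remove

print "removed: $removed_a and $removed_b\n";
```

The second call returns 0 even though nothing went wrong, so an `or die` attached to it would abort on perfectly valid input.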
To add, it is a very good practice to always test! As a part of that mindset, please make sure to always use warnings;, and I also strongly recommend coding with use strict;.
Update to a significant question edit
In the first while loop you do not store the filename anywhere. Given the greeting that is printed, instead of that loop you should just read the filename. Then you read the word to search for.
# print greeting
my $filename = <STDIN>;
chomp $filename;
my $input = <STDIN>;
chomp $input;
However, then we get to the bigger problem: you need to open the file, and only then can you go through it line by line and search for the word. This is where you need the test. See the linked doc page and the tutorial perlopentut. First check whether a file with that name exists.
if (not -e $filename) {
print "No file $filename. Please try again.\n";
exit;
}
open my $fh, '<', $filename or die "Can't open $filename: $!";
my $word_count = 0;
while (my $line = <$fh>)
{
# Now search for the word on a line
while ($line =~ /\b$input\b/g) {
$word_count++;
}
}
close $fh or die "Can't close filehandle: $!";
The -e above is one of the file-tests, this one checking whether the given file exists. See the doc page for file-tests (-X). In the code above we just exit with a message, but you may want to print the message prompting the user to enter another name, in a loop.
We use while and the /g modifier in regex to find all occurrences of the word on a line.
I'd like to also strongly suggest to always start your programs with
use warnings 'all';
use strict;
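The counting loop above can also be sketched as a small subroutine. This version assumes the search word should be treated as literal text, so it wraps it in \Q...\E to stop characters such as . or * from acting as regex operators. The name count_word and the sample text are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: count whole-word occurrences of $word in the
# lines read from $fh. \Q...\E quotes regex metacharacters in $word.
sub count_word {
    my ($fh, $word) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        # A /g match in list context returns every match; assigning
        # that list to () in scalar context yields the number of matches.
        $count += () = $line =~ /\b\Q$word\E\b/g;
    }
    return $count;
}

# Demonstrate with an in-memory filehandle instead of a real file.
my $text = "the cat sat\nthe other cat saw the cat\ncatalog\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
print "Freq: ", count_word($fh, 'cat'), "\n";
close $fh;
```

Because of the \b anchors, "catalog" is not counted as an occurrence of "cat".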

My perl script isn't working, I have a feeling it's the grep command

I'm trying to search one file for instances of a
number and print the line if the other file contains that number.
#!/usr/bin/perl
open(file, "textIds.txt"); #
@file = <file>; #file looking into
# close file; #
while(<>){
$temp = $_;
$temp =~ tr/|/\t/; #puts tab between name and id
@arrayTemp = split("\t", $temp);
@found=grep{/$arrayTemp[1]/} <file>;
if (defined $found[0]){
#if (grep{/$arrayTemp[1]/} <file>){
print $_;
}
@found=();
}
print "\n";
close file;
#the input file lines have the format of
#John|7791 154
#Smith|5432 290
#Conor|6590 897
#And in the file the format is
#5432
#7791
#6590
#23140
There are some issues in your script.
Always include use strict; and use warnings;.
This would have told you about odd things in your script in advance.
Never use barewords as filehandles as they are global identifiers. Use three-parameter-open
instead: open( my $fh, '<', 'textIds.txt');
use autodie; or check whether the opening worked.
You read and store textIds.txt into the array @file but later on (in your grep) you are
again trying to read from that file(handle) (with <file>). As @PaulL said, this will always
give undef (false) because the file was already read.
Replacing | with tabs and then splitting at tabs is not necessary. You can split at the
tabs and pipes at the same time as well (assuming "John|7791 154" is really "John|7791\t154").
You're talking about "input file" and "in file" without exactly telling which is which.
I assume your "textIds.txt" is the one with only the numbers and the other input file is the
one read from STDIN (with the |'s in it).
With this in mind your script could be written as:
#!/usr/bin/perl
use strict;
use warnings;
# Open 'textIds.txt' and slurp it into the array @file:
open( my $fh, '<', 'textIds.txt') or die "cannot open file: $!\n";
my @file = <$fh>;
close($fh);
# iterate over STDIN and compare with lines from 'textIds.txt':
while( my $line = <>) {
# split "John|7791\t154" into ("John", "7791", "154"):
my ($name, $number1, $number2) = split(/\||\t/, $line);
# compare $number1 to each member of @file and print if found:
if ( grep( /$number1/, @file) ) {
print $line;
}
}
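Note that /$number1/ matches substrings, so an ID of 543 would also match the line 5432. If exact matches are what you want, a hash lookup is one option. This is a sketch under that assumption: the name filter_by_id is hypothetical, and the sample lines follow the format from the question with one non-matching line added:

```perl
use strict;
use warnings;

# Hypothetical helper: keep the input lines whose ID (the field after
# the pipe) appears in the list of known IDs. Exact matches only.
sub filter_by_id {
    my ($ids, $lines) = @_;
    my %known = map { $_ => 1 } @$ids;     # hash lookup instead of grep
    my @kept;
    for my $line (@$lines) {
        # split "John|7791\t154" into name, ID and the trailing number
        my (undef, $id) = split /[|\s]+/, $line;
        push @kept, $line if defined $id && $known{$id};
    }
    return @kept;
}

# IDs as they would look after chomp-ing the lines of textIds.txt.
my @ids   = qw(5432 7791 6590 23140);
my @lines = ("John|7791\t154\n", "Smith|5432\t290\n", "Jones|543\t111\n");
print filter_by_id(\@ids, \@lines);
```

Building the hash once also avoids scanning the whole ID list for every input line.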

Print email addresses to a file in Perl

I have been scouring this site and others to find the best way to do what I need to do but to no avail. Basically I have a text file with some names and email addresses. Each name and email address is on its own line. I need to get the email addresses and print them to another text file. So far all I have been able to print is the "no email addresses found" message. Any thoughts? Thanks!!
#!/usr/bin/perl
open(IN, "<contacts.txt") || die("file not found");
#chooses the file to read
open(OUT, ">emailaddresses.txt");
#prints file
$none = "No emails found!";
$line = <IN>;
for ($line)
{
if ($line =~ /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/g)
{
print (OUT $line);
}
else
{
print (OUT $none);
}
}
close(IN);
close(OUT);
First, always use strict; use warnings. This helps writing correct scripts, and is an invaluable aid when debugging.
Also, use a three-arg-open:
open my $fh, "<", $filename or die qq(Can't open "$filename": $!);
I included the reason for failure ($!), which is a good practice too.
The idiom to read files (on an open filehandle) is:
while (<$fh>) {
chomp;
# The line is in $_;
}
or
while (defined(my $line = <$fh>)) { chomp $line; ... }
What you did was to read one line into $line, and loop over that one item in the for loop.
(Perl has a notion of context. Operators like <$fh> behave differently depending on context. Generally, using a scalar variable ($ sigil) forces scalar context, and @, the sigil for arrays, causes list context. This is quite unlike PHP.)
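A short demonstration of that difference, reading from an in-memory string so it runs without any files. The variable names and sample text are made up for the example:

```perl
use strict;
use warnings;

my $text = "first\nsecond\nthird\n";

# Scalar context: <$fh> returns only the next line.
open my $fh_scalar, '<', \$text or die "Cannot open in-memory file: $!";
my $one_line = <$fh_scalar>;
close $fh_scalar;

# List context: <$fh> returns all remaining lines at once.
open my $fh_list, '<', \$text or die "Cannot open in-memory file: $!";
my @all_lines = <$fh_list>;
close $fh_list;

print "scalar context read: $one_line";
print "list context read ", scalar(@all_lines), " lines\n";
```

This is why assigning <IN> to $line in the question only ever sees the first line of the file.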
I'd rewrite your code like:
use strict; use warnings;
use feature 'say';
my $regex = qr/[A-Z0-9._%+-]+\@[A-Z0-9.-]+\.[A-Z]{2,4}/i; # emails are case insensitive
my $found = 0;
while (<>) { # use special ARGV filehandle, which usually is STDIN
while (/($regex)/g) {
$found++;
say $1;
}
}
die "No emails found\n" unless $found;
Invoked like perl script.pl <contacts.txt >emailaddresses.txt. The shell is your friend, and creating programs that can be piped from and to is good design.
Update
If you want to hardcode the filenames, we would combine the above script with the three-arg open I have shown:
use strict; use warnings; use feature 'say';
use autodie; # does `... or die "Can't open $file: $!"` for me
my $regex = qr/[A-Z0-9._%+-]+\@[A-Z0-9.-]+\.[A-Z]{2,4}/i;
my $found = 0;
my $contact_file = "contacts.txt";
my $email_file = "emailaddresses.txt";
open my $contact, "<", $contact_file;
open my $email, ">", $email_file;
while (<$contact>) { # read from the $contact filehandle
while (/($regex)/g) { # the /g is optional if there is max one address per line
$found++;
say {$email} $1; # print to the $email file handle. {curlies} are optional.
}
}
die "No emails found\n" unless $found; # error message goes to STDERR, not to the file

help merging perl code routines together for file processing

I need some perl help in putting these (2) processes/code to work together. I was able to get them working individually to test, but I need help bringing them together especially with using the loop constructs. I'm not sure if I should go with foreach..anyways the code is below.
Also, any best practices would be great too as I'm learning this language. Thanks for your help.
Here's the process flow I am looking for:
read a directory
look for a particular file
use the file name to strip out some key information to create a newly processed file
process the input file
create the newly processed file for each input file read (if i read in 10, I create 10 new files)
Part 1:
my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
next if ($file =~ /^\.+$/);
#Get filename attributes
if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
print "$1\n";
print "$2\n";
print "$3\n";
}
print "$file\n";
}
Part 2:
use strict;
use Digest::MD5 qw(md5_hex);
#Create new file
open (NEWFILE, ">/backups/processed/foo$1.name.$2-foo_p$3.out") || die "cannot create file";
my $data = '';
my $line1 = <>;
chomp $line1;
my @heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ( "^A", "^E", "^D");
while (<>)
{
my $digest = md5_hex($data);
chomp;
my (@values) = split /,/;
my $extra = "__mykey__$sep1$digest$sep2" ;
$extra .= "$heading[$_]$sep1$values[$_]$sep2" for (0..scalar(@values));
$data .= "$extra$eorec";
print NEWFILE "$data";
}
#print $data;
close (NEWFILE);
You are using an old style of Perl programming. I recommend using functions and CPAN modules (http://search.cpan.org). Perl pseudocode:
use Modern::Perl;
# use...
sub get_input_files {
# return an array of files (@)
}
sub extract_file_info {
# takes the file name and returns an array of values (filename attrs)
}
sub process_file {
# reads the input file, takes the previous attribs and build the output file
}
my @ifiles = get_input_files;
foreach my $ifile(@ifiles) {
my @attrs = extract_file_info($ifile);
process_file($ifile, @attrs);
}
Hope it helps
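As an illustration, one of those stubs could be filled in like this. It is only a sketch: it reuses the pattern from the question (with the dot before csv escaped, which is an assumption about the intent), and the sample file name is made up:

```perl
use strict;
use warnings;

# Hypothetical implementation of extract_file_info(): pull the three
# captured attributes out of a file name, using the pattern from the
# question. Returns an empty list when the name does not match.
sub extract_file_info {
    my ($name) = @_;
    if ($name =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+\.csv$/) {
        return ($1, $2, $3);
    }
    return;
}

# The sample file name below is made up for illustration.
my @attrs = extract_file_info('foo123.name.abc-foo_p42.20110101.csv');
print "attrs: @attrs\n" if @attrs;
```

Returning the captures right away avoids relying on $1, $2 and $3 later, when another regex match may have overwritten them.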
I've bashed your two code fragments together (making the second a sub that the first calls for each matching file) and, if I understood your description of the objective correctly, this should do what you want. Comments on style and syntax are inline:
#!/usr/bin/env perl
# - Never forget these!
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
# Parens on postfix "if" are optional; I prefer to omit them
next if $file =~ /^\.+$/;
if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
process_file($file, $1, $2, $3);
}
print "$file\n";
}
sub process_file {
my ($orig_name, $foo_x, $name_x, $p_x) = @_;
my $new_name = "/backups/processed/foo$foo_x.name.$name_x-foo_p$p_x.out";
# - From your description of the task, it sounds like we actually want to
# read from the found file, not from <>, so opening it here to read
# - Better to use lexical ("my") filehandle and three-arg form of open
# - "or" has lower operator precedence than "||", so less chance of
# things being grouped in the wrong order (though either works here)
# - Including $! in the error will tell why the file open failed
# - readdir returns bare file names, so prepend $target_dir when opening
open my $in_fh, '<', "$target_dir$orig_name" or die "cannot read $orig_name: $!";
open(my $out_fh, '>', $new_name) or die "cannot create $new_name: $!";
my $data = '';
my $line1 = <$in_fh>;
chomp $line1;
my @heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");
while (<$in_fh>) {
chomp;
my $digest = md5_hex($data);
my (@values) = split /,/;
my $extra = "__mykey__$sep1$digest$sep2";
$extra .= "$heading[$_]$sep1$values[$_]$sep2"
for (0 .. $#values); # $#values is the last index; scalar(@values) would run one past it
# - Useless use of double quotes removed on next two lines
$data .= $extra . $eorec;
#print $out_fh $data;
}
# - Moved print to output file to here (where it will print the complete
# output all at once) rather than within the loop (where it will print
# all previous lines each time a new line is read in) to prevent
# duplicate output records. This could also be achieved by printing
# $extra inside the loop. Printing $data at the end will be slightly
# faster, but requires more memory; printing $extra within the loop and
# getting rid of $data entirely would require less memory, so that may
# be the better option if you find yourself needing to read huge input
# files.
print $out_fh $data;
# - $in_fh and $out_fh will be closed automatically when it goes out of
# scope at the end of the block/sub, so there's no real point to
# explicitly closing it unless you're going to check whether the close
# succeeded or failed (which can happen in odd cases usually involving
# full or failing disks when writing; I'm not aware of any way that
# closing a file open for reading can fail, so that's just being left
# implicit)
close $out_fh or die "Failed to close file: $!";
}
Disclaimer: perl -c reports that this code is syntactically valid, but it is otherwise untested.