How to pass file names to a subroutine in Perl?

I'm writing a perl script and I would like to pass a file name for the output file to a subroutine.
I tried something like this:
use strict;
use warnings;
test("Output.dat");
sub test {
my $name = @_;
open(B, ">$name") or die "Failure \n";
print B "This is a test! \n";
close(B);
}
I'm going to use the subroutine multiple times, so I have to pass the file name and cannot declare it within the subroutine.
I hope you can help me :)

Your problem is this line:
my $name = @_;
You are assigning an array to a scalar variable. In Perl this will give you the number of elements in the array - so I expect you're ending up with "1" in $name.
There are a number of ways to get the first element from an array:
my $name = $_[0]; # Explicitly get the first element
my $name = shift @_; # Get the first element (and remove it from the array)
my $name = shift; # Same as the previous example - shift works on @_ by default in a subroutine
my ($name) = @_; # The parentheses make it into a list assignment
The last two are the ones that you will see most commonly.
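To see the difference between these forms side by side, here is a minimal sketch. The helper name show_context is made up for illustration; it just returns what each assignment style yields for the same call.

```perl
use strict;
use warnings;

# Hypothetical helper: collects what each assignment style produces.
sub show_context {
    my $count   = @_;      # scalar context: number of arguments
    my ($first) = @_;      # list assignment: first argument
    my $indexed = $_[0];   # explicit index: first argument
    my $shifted = shift;   # removes and returns the first argument
    return ($count, $first, $indexed, $shifted);
}

my @result = show_context("Output.dat");
print join(", ", @result), "\n";   # prints: 1, Output.dat, Output.dat, Output.dat
```

Only the first form loses the file name, which is why the original script ended up writing to a file called "1".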
A few more points:
1/ You would get a better clue to the problem if you included $name in your error message.
open(A, ">$name") or die "Failure: $name \n";
Or, even better, the error message that Perl gets from your operating system.
open(A, ">$name") or die "Could not open $name: $!\n";
(I've added back the missing comma - I assume that was a typo.)
2/ These days, it is generally accepted as good practice to use the three-arg version of open and lexical filehandles.
open(my $output_fh, '>', $name) or die "Failure: $name \n";
3/ In your example you open a filehandle called "A", but then try to write to a filehandle called "B". Is this a typo?

my $name = @_;
This assigns the value of @_ in scalar context to $name, which is the number of elements in the array @_, i.e. the number of arguments. That is most probably not what you want. You have to assign an array to a list or a scalar to a scalar. You have two options:
my $name = $_[0];
or
my ($name) = @_; # or even (my $name) = @_;
I would prefer the latter because it can easily be extended to my ($a, $b, $c) = @_; and it is the usual Perl idiom.
But your code has more flaws. For example, you should use this form of open:
open my $fd, '>', $name or die "cannot open > $name: $!";
This has a few advantages. First, you use a lexically scoped I/O handle, which does not leak outside its lexical scope and is closed automatically when that scope exits. Second, the three-argument form prevents the content of $name from being interpreted as anything other than a file name.
So resulting code should look like:
sub test {
my ($name) = @_;
open my $fd, '>', $name
or die "cannot open > $name: $!";
print $fd "This is a test!\n";
}
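Since the asker wants to call the subroutine several times, a small usage sketch may help. It keeps the same shape as the sub above but also checks close() and returns the name. The name write_test_file, the scratch directory, and the two output names are assumptions for illustration only.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Same idea as the sub above: the file name arrives as the first argument.
sub write_test_file {
    my ($name) = @_;
    open my $fh, '>', $name
        or die "cannot open > $name: $!";
    print {$fh} "This is a test!\n";
    close $fh
        or die "cannot close $name: $!";
    return $name;
}

# Assumption: a scratch directory stands in for the real output location.
my $dir = tempdir(CLEANUP => 1);
my @written = map { write_test_file(File::Spec->catfile($dir, $_)) }
              qw(Output1.dat Output2.dat);
print "wrote $_\n" for @written;
```

Each call gets its own lexical filehandle, so the calls cannot interfere with one another.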

Before answering your question, I would like to suggest one thing:
Always use the three-parameter version of open(), like this:
open (my $FH, '>', 'file.txt') or die "Cannot open the file:$!";
If you are passing a single parameter to the subroutine, you can use the 'shift' operator.
test("Output.dat");
sub test {
my $name = shift;
open (my $B, '>', $name) or die "Cannot open the file:$!";
print $B "This is a test! \n";
close($B);
}

Related

redirect output files in different directory

I'm new to Perl and trying to put output files in a different directory. The relevant piece of code is below:
use File::Basename;
use File::Copy;
use File::Path;
foreach my $file (sort({uc($a) cmp uc($b)} keys(%$ch_ref))) {
my $num = keys(%{$$ch_ref{$file}});
print "\n -> $string $file ($num):\n";
foreach my $sid (keys(%{$$ch_ref{$file}})) {
if ($type == $PRINT_OLD) {
open ( my $output, '>>',$file );
print {$output} " something";
close ( $output ) or die $!;
}
The third argument to open() is the full path to the file that you want to open. Currently, you're just giving it the filename, but you can expand that to include the directory as well.
Something like this:
my $dir = '/path/to/some/directory/';
open my $output, '>>', $dir . $string . '_' . $file;
You should really be checking the success of the open() call, and it's a bit easier to give a sensible error message if you build the filename into a variable first.
my $dir = '/path/to/some/directory/';
my $filename = "$dir${string}_$file";
open my $output, '>>', $filename
or die "Can't open $filename: $!";
Note that using ${string} instead of $string means that you can use it directly in the string without the name getting tangled up with the following _ character.
I'd also strongly recommend dropping your use of prototypes on your subroutine. Perl prototypes are often far more trouble than they are worth.
Also, there's no need to open() and close() your file so many times. Just open it at the top of the loop (it will be automatically closed at the end as $output goes out of scope).
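Putting these suggestions together, a rough sketch might look like the following. The scratch directory, the value of $string, the file names, and the inner loop range are placeholders rather than the asker's real data, and File::Spec->catfile is used here only as one portable way to join the directory and the file name.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Assumptions: placeholder directory, prefix and file names.
my $dir    = tempdir(CLEANUP => 1);
my $string = 'report';

for my $file (qw(a.txt b.txt)) {
    # ${string} stops the variable name from swallowing the following "_"
    my $filename = File::Spec->catfile($dir, "${string}_$file");

    # Open once per file, before the inner loop starts
    open my $output, '>>', $filename
        or die "Can't open $filename: $!";

    for my $sid (1 .. 3) {
        print {$output} "something $sid\n";
    }

    # Explicit close so that a failed write is reported
    close $output
        or die "Can't close $filename: $!";
}
```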

Perl: comparing words in two files

This is my current script to try and compare the words in file_all.txt to the ones in file2.txt. It should print out any of the words in file_all that are not in file2.
I need to format these as one word per line, but that's not the more pressing issue.
I am new to Perl ... I get C and Python more but this is being a bit tricky, I know my variable assignment is off.
use strict;
use warnings;
my $file2 = "file_all.txt"; %I know my assignment here is wrong
my $file1 = "file2.txt";
open my $file2, '<', 'file2' or die "Couldn't open file2: $!";
while ( my $line = <$file2> ) {
++$file2{$line};
}
open my $file1, '<', 'file1' or die "Couldn't open file1: $!";
while ( my $line = <$file1> ) {
print $line unless $file2{$line};
}
EDIT: OH, it should ignore case... like Pie is the same as PIE when comparing. and remove apostrophes
These are the errors I am getting:
"my" variable $file2 masks earlier declaration in same scope at absent.pl line 9.
"my" variable $file1 masks earlier declaration in same scope at absent.pl line 14.
Global symbol "%file2" requires explicit package name at absent.pl line 11.
Global symbol "%file2" requires explicit package name at absent.pl line 16.
Execution of absent.pl aborted due to compilation errors.
Your error messages:
"my" variable $file2 masks earlier declaration in same scope at absent.pl line 9.
"my" variable $file1 masks earlier declaration in same scope at absent.pl line 14.
Global symbol "%file2" requires explicit package name at absent.pl line 11.
Global symbol "%file2" requires explicit package name at absent.pl line 16.
Execution of absent.pl aborted due to compilation errors.
You are assigning a file name to $file2, and then later you are using open my $file2 ... The use of my $file2 in the second case masks the use in the first case. Then, in the body of the while loop, you pretend there is a hash table %file2, but you haven't declared it at all.
You should use more descriptive variable names to avoid conceptual confusion.
For example:
my @filenames = qw(file_all.txt file2.txt);
Using variables with integer suffixes is a code smell.
Then, factor common tasks to subroutines. In this case, what you need are: 1) A function that takes a filename and returns a table of words in that file, and 2) A function that takes a filename, and a lookup table, and prints words that are in the file, but do not appear in the lookup table.
#!/usr/bin/env perl
use strict;
use warnings;
use Carp qw( croak );
my @filenames = qw(file_all.txt file2.txt);
print "$_\n" for @{ words_notseen(
$filenames[0],
words_from_file($filenames[1])
)};
sub words_from_file {
my $filename = shift;
my %words;
open my $fh, '<', $filename
or croak "Cannot open '$filename': $!";
while (my $line = <$fh>) {
$words{ lc $_ } = 1 for split ' ', $line;
}
close $fh
or croak "Failed to close '$filename': $!";
return \%words;
}
sub words_notseen {
my $filename = shift;
my $lookup = shift;
my %words;
open my $fh, '<', $filename
or croak "Cannot open '$filename': $!";
while (my $line = <$fh>) {
for my $word (split ' ', $line) {
unless (exists $lookup->{lc $word}) { # lookup keys were lowercased when stored
$words{ $word } = 1;
}
}
}
return [ keys %words ];
}
You are almost there.
The % sigil denotes a hash. You can't store a file name in a hash, you need a scalar for that.
my $file2 = 'file_all.txt';
my $file1 = 'file2.txt';
You need a hash to count the occurrences.
my %count;
To open a file, specify its name - it's stored in the scalar, do you remember?
open my $FH, '<', $file2 or die "Can't open $file2: $!";
Then, process the file line by line:
while (my $line = <$FH> ) {
chomp $line; # Remove newline if present.
++$count{lc $line}; # Store the lowercased string.
}
Then, open the second file, process it line by line, use lc again to get the lowercased string.
To remove apostrophes, use a substitution:
$line =~ s/'//g; # Replace ' by nothing globally (i.e. everywhere).
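The steps above can be assembled into one sketch. The sub names normalize and missing_words are made up for this example, and in-memory filehandles with invented data stand in for file_all.txt and file2.txt so that the snippet runs on its own; it also assumes one word per line.

```perl
use strict;
use warnings;

# Lowercase a word and strip apostrophes so "Pie" and "PIE" compare equal.
sub normalize {
    my ($word) = @_;
    $word = lc $word;
    $word =~ s/'//g;
    return $word;
}

# Return the lines of $all_fh whose normalized form never occurs in $known_fh.
sub missing_words {
    my ($all_fh, $known_fh) = @_;
    my %count;
    while (my $line = <$known_fh>) {
        chomp $line;
        ++$count{ normalize($line) };
    }
    my @missing;
    while (my $line = <$all_fh>) {
        chomp $line;
        push @missing, $line unless $count{ normalize($line) };
    }
    return @missing;
}

# Made-up contents standing in for the two real files.
my $all_text   = "Pie\ncake\nDon't\ntart\n";
my $known_text = "PIE\ndont\n";
open my $all_fh,   '<', \$all_text   or die "Can't open in-memory file: $!";
open my $known_fh, '<', \$known_text or die "Can't open in-memory file: $!";

my @missing = missing_words($all_fh, $known_fh);
print "$_\n" for @missing;   # prints cake, then tart
```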
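As you mentioned in your question: It should print out any of the words in file_all that are not in file2
The small script below does this by comparing the two files line by line, so it only reports a word when the line at the same position in file2 is different: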
#!/usr/bin/perl
use strict;
use warnings;
my ($file1, $file2) = qw(file_all.txt file2.txt);
open my $fh1, '<', $file1 or die "Can't open $file1: $!";
open my $fh2, '<', $file2 or die "Can't open $file2: $!";
while (<$fh1>)
{
last if eof($fh2);
my $compline = <$fh2>;
chomp($_, $compline);
if ($_ ne $compline)
{
print "$_\n";
}
}
file_all.txt:
ab
cd
ee
ef
gh
df
file2.txt:
zz
yy
ee
ef
pp
df
Output:
ab
cd
gh
The issue is the following two lines:
my %file2 = "file_all.txt";
my %file1 = "file2.txt";
Here you are assigning a single value, called a scalar in Perl, to a hash (denoted by the % sigil). Hashes consist of key/value pairs, usually written with the fat comma operator (=>), e.g.
my %hash = ( key => 'value' );
Hashes expect an even number of elements because each key needs a value. You currently give each hash only a single value, which is why Perl complains.
To assign a value to a SCALAR, you use the $ sigil:
my $file2 = "file_all.txt";
my $file1 = "file2.txt";

How to search and replace using hash with Perl

I'm new to Perl and I'm afraid I am stuck and wanted to ask if someone might be able to help me.
I have a file with two columns (tab separated) of oldname and newname.
I would like to use the oldname as key and newname as value and store it as a hash.
Then I would like to open a different file (gff file) and replace all the oldnames in there with the newnames and write it to another file.
I have given it my best try but am getting a lot of errors.
If you could let me know what I am doing wrong, I would greatly appreciate it.
Here are how the two files look:
oldname newname(SFXXXX) file:
genemark-scaffold00013-abinit-gene-0.18 SF130001
augustus-scaffold00013-abinit-gene-1.24 SF130002
genemark-scaffold00013-abinit-gene-1.65 SF130003
file to search and replace in (an example of one of the lines):
scaffold00013 maker gene 258253 258759 . - . ID=maker-scaffold00013-augustus-gene-2.187;Name=maker-scaffold00013-augustus-gene-2.187;
Here is my attempt:
#!/usr/local/bin/perl
use warnings;
use strict;
my $hashfile = $ARGV[0];
my $gfffile = $ARGV[1];
my %names;
my $oldname;
my $newname;
if (!defined $hashfile) {
die "Usage: $0 hash_file gff_file\n";
}
if (!defined $gfffile) {
die "Usage: $0 hash_file gff_file\n";
}
###save hashfile with two columns, oldname and newname, into a hash with oldname as key and newname as value.
open(HFILE, $hashfile) or die "Cannot open $hashfile\n";
while (my $line = <HFILE>) {
chomp($line);
my ($oldname, $newname) = split /\t/;
$names{$oldname} = $newname;
}
close HFILE;
###open gff file and replace all oldnames with newnames from %names.
open(GFILE, $gfffile) or die "Cannot open $gfffile\n";
while (my $line2 = <GFILE>) {
chomp($line2);
eval "$line2 =~ s/$oldname/$names{oldname}/g";
open(OUT, ">SFrenamed.gff") or die "Cannot open SFrenamed.gff: $!";
print OUT "$line2\n";
close OUT;
}
close GFILE;
Thank you!
Your main problem is that you aren't splitting the $line variable. split /\t/ splits $_ by default, and you haven't put anything in there.
This program builds the hash, and then constructs a regex from all the keys by sorting them in descending order of length and joining them with the | regex alternation operator. The sorting is necessary so that the longest of all possible choices is selected if there are any alternatives.
Every occurrence of the regex is replaced by the corresponding new name in each line of the input file, and the output written to the new file.
use strict;
use warnings;
die "Usage: $0 hash_file gff_file\n" if @ARGV < 2;
my ($hashfile, $gfffile) = @ARGV;
open(my $hfile, '<', $hashfile) or die "Cannot open $hashfile: $!";
my %names;
while (my $line = <$hfile>) {
chomp($line);
my ($oldname, $newname) = split /\t/, $line;
$names{$oldname} = $newname;
}
close $hfile;
my $regex = join '|', map { quotemeta($_) } sort { length $b <=> length $a } keys %names; # quotemeta keeps "." and "-" literal
$regex = qr/$regex/;
open(my $gfile, '<', $gfffile) or die "Cannot open $gfffile: $!";
open(my $out, '>', 'SFrenamed.gff') or die "Cannot open SFrenamed.gff: $!";
while (my $line = <$gfile>) {
chomp($line);
$line =~ s/($regex)/$names{$1}/g;
print $out $line, "\n";
}
close $out;
close $gfile;
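To illustrate why the keys are sorted longest first, here is a small sketch with made-up names in which one key is a prefix of another. The helper build_regex is hypothetical and exists only so the two orderings can be compared.

```perl
use strict;
use warnings;

# Made-up names where one key is a prefix of the other.
my %names = (
    'gene-1'  => 'SF0001',
    'gene-10' => 'SF0010',
);

# Build an alternation from the keys in the order given.
sub build_regex {
    my $alternation = join '|', map { quotemeta($_) } @_;
    return qr/$alternation/;
}

my $unsorted = build_regex('gene-1', 'gene-10');
my $sorted   = build_regex(sort { length $b <=> length $a } keys %names);

(my $wrong = 'ID=gene-10;') =~ s/($unsorted)/$names{$1}/g;
(my $right = 'ID=gene-10;') =~ s/($sorted)/$names{$1}/g;

print "$wrong\n";   # ID=SF00010;  (short key matched first, stray "0" left behind)
print "$right\n";   # ID=SF0010;
```

Perl tries the alternatives from left to right at each position, so the shorter key wins unless the longer one is listed before it.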
Why are you using an eval? Also, $oldname is going to be undefined in the second while loop, because the first while loop redeclares both variables in its own scope (and even if it used the outer variables, they would only hold the last values processed, which wouldn't be helpful).
Take out the my $oldname and my $newname at the top of your script; they are useless.
Take out the entire eval line. You need to repeat the regex for each thing you want to replace. Try something like:
$line2 =~ s/\Q$_\E/$names{$_}/g for keys %names; # \Q...\E keeps "." in the names literal
Also see Borodin's answer. He made one big regex instead of a loop, and caught your lack of the second argument to split.

Print email addresses to a file in Perl

I have been scouring this site and others to find the best way to do what I need to do but to no avail. Basically I have a text file with some names and email addresses. Each name and email address is on its own line. I need to get the email addresses and print them to another text file. So far all I have been able to print is the "no email addresses found" message. Any thoughts? Thanks!!
#!/usr/bin/perl
open(IN, "<contacts.txt") || die("file not found");
#chooses the file to read
open(OUT, ">emailaddresses.txt");
#prints file
$none = "No emails found!";
$line = <IN>;
for ($line)
{
if ($line =~ /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/g)
{
print (OUT $line);
}
else
{
print (OUT $none);
}
}
close(IN);
close(OUT);
First, always use strict; use warnings. This helps writing correct scripts, and is an invaluable aid when debugging.
Also, use a three-arg-open:
open my $fh, "<", $filename or die qq(Can't open "$filename": $!);
I included the reason for failure ($!), which is a good practice too.
The idiom to read files (on an open filehandle) is:
while (<$fh>) {
chomp;
# The line is in $_;
}
or
while (defined(my $line = <$fh>)) { chomp $line; ... }
What you did was to read one line into $line, and loop over that one item in the for loop.
(Perl has a notion of context. Operators like <$fh> behave differently depending on context. Generally, using a scalar variable ($ sigil) forces scalar context, and @, the sigil for arrays, causes list context. This is quite unlike PHP.)
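The effect of context on <$fh> can be shown with a short sketch. An in-memory filehandle with made-up contents is used here only to keep the example self-contained; a real file behaves the same way.

```perl
use strict;
use warnings;

# Made-up contents; the reference to $text makes open() read from memory.
my $text = "alice\nbob\ncarol\n";

open my $fh, '<', \$text or die "Can't open in-memory file: $!";
my $first_line = <$fh>;    # scalar context: reads one line
my @rest       = <$fh>;    # list context: reads all remaining lines
close $fh;

chomp $first_line;
chomp @rest;
print "first: $first_line\n";            # first: alice
print "rest:  @rest\n";                  # rest:  bob carol
print "count: ", scalar(@rest), "\n";    # count: 2
```

This is why $line = <IN> followed by for ($line) only ever sees the first line of the file.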
I'd rewrite your code like:
use strict; use warnings;
use feature 'say';
my $regex = qr/[A-Z0-9._%+-]+\@[A-Z0-9.-]+\.[A-Z]{2,4}/i; # emails are case insensitive
my $found = 0;
while (<>) { # use special ARGV filehandle, which usually is STDIN
while (/($regex)/g) {
$found++;
say $1;
}
}
die "No emails found\n" unless $found;
Invoked like perl script.pl <contacts.txt >emailaddresses.txt. The shell is your friend, and creating programs that can be piped from and to is good design.
Update
If you want to hardcode the filenames, we would combine the above script with the three-arg open I have shown:
use strict; use warnings; use feature 'say';
use autodie; # does `... or die "Can't open $file: $!"` for me
my $regex = qr/[A-Z0-9._%+-]+\@[A-Z0-9.-]+\.[A-Z]{2,4}/i;
my $found = 0;
my $contact_file = "contacts.txt";
my $email_file = "emailaddresses.txt";
open my $contact, "<", $contact_file;
open my $email, ">", $email_file;
while (<$contact>) { # read from the $contact filehandle
while (/($regex)/g) { # the /g is optional if there is max one address per line
$found++;
say {$email} $1; # print to the $email file handle. {curlies} are optional.
}
}
die "No emails found\n" unless $found; # error message goes to STDERR, not to the file

help merging perl code routines together for file processing

I need some perl help in putting these (2) processes/code to work together. I was able to get them working individually to test, but I need help bringing them together especially with using the loop constructs. I'm not sure if I should go with foreach..anyways the code is below.
Also, any best practices would be great too as I'm learning this language. Thanks for your help.
Here's the process flow I am looking for:
read a directory
look for a particular file
use the file name to strip out some key information to create a newly processed file
process the input file
create the newly processed file for each input file read (if i read in 10, I create 10 new files)
Part 1:
my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
next if ($file =~ /^\.+$/);
#Get filename attributes
if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
print "$1\n";
print "$2\n";
print "$3\n";
}
print "$file\n";
}
Part 2:
use strict;
use Digest::MD5 qw(md5_hex);
#Create new file
open (NEWFILE, ">/backups/processed/foo$1.name.$2-foo_p$3.out") || die "cannot create file";
my $data = '';
my $line1 = <>;
chomp $line1;
my @heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ( "^A", "^E", "^D");
while (<>)
{
my $digest = md5_hex($data);
chomp;
my (@values) = split /,/;
my $extra = "__mykey__$sep1$digest$sep2" ;
$extra .= "$heading[$_]$sep1$values[$_]$sep2" for (0..scalar(@values));
$data .= "$extra$eorec";
print NEWFILE "$data";
}
#print $data;
close (NEWFILE);
You are using an old style of Perl programming. I recommend that you use functions and CPAN modules (http://search.cpan.org). Perl pseudocode:
use Modern::Perl;
# use...
sub get_input_files {
# return an array of files (@)
}
sub extract_file_info {
# takes the file name and returns an array of values (filename attrs)
}
sub process_file {
# reads the input file, takes the previous attribs and build the output file
}
my @ifiles = get_input_files;
foreach my $ifile (@ifiles) {
my @attrs = extract_file_info($ifile);
process_file($ifile, @attrs);
}
Hope it helps
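As one possible way to fill in the extract_file_info stub from the pseudocode above, the sketch below reuses the pattern from the question (with the dot before csv escaped). The file names are invented, and returning an empty list for non-matching names is an assumption about how the caller would want to skip them.

```perl
use strict;
use warnings;

# Hypothetical implementation: returns the three captured attributes,
# or an empty list when the name does not match.
sub extract_file_info {
    my ($file) = @_;
    my @attrs = $file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+\.csv$/;
    return @attrs;
}

# Made-up file names for illustration.
for my $file ('foo123.name.abc-foo_p42.20240101.csv', 'notes.txt') {
    if (my @attrs = extract_file_info($file)) {
        print $file, " -> @attrs\n";
    }
    else {
        print $file, " -> skipped\n";
    }
}
```

Returning the captures as a list avoids relying on $1, $2 and $3 still being set by the time the output file name is built.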
I've bashed your two code fragments together (making the second a sub that the first calls for each matching file) and, if I understood your description of the objective correctly, this should do what you want. Comments on style and syntax are inline:
#!/usr/bin/env perl
# - Never forget these!
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
# Parens on postfix "if" are optional; I prefer to omit them
next if $file =~ /^\.+$/;
if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
# - readdir returns bare names, so prepend the directory before opening
process_file("$target_dir$file", $1, $2, $3);
}
print "$file\n";
}
sub process_file {
my ($orig_name, $foo_x, $name_x, $p_x) = @_;
my $new_name = "/backups/processed/foo$foo_x.name.$name_x-foo_p$p_x.out";
# - From your description of the task, it sounds like we actually want to
# read from the found file, not from <>, so opening it here to read
# - Better to use lexical ("my") filehandle and three-arg form of open
# - "or" has lower operator precedence than "||", so less chance of
# things being grouped in the wrong order (though either works here)
# - Including $! in the error will tell why the file open failed
open my $in_fh, '<', $orig_name or die "cannot read $orig_name: $!";
open(my $out_fh, '>', $new_name) or die "cannot create $new_name: $!";
my $data = '';
my $line1 = <$in_fh>;
chomp $line1;
my @heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");
while (<$in_fh>) {
chomp;
my $digest = md5_hex($data);
my (@values) = split /,/;
my $extra = "__mykey__$sep1$digest$sep2";
# - $#values is the last index; scalar(@values) would run one past the end
$extra .= "$heading[$_]$sep1$values[$_]$sep2"
for (0 .. $#values);
# - Useless use of double quotes removed on next two lines
$data .= $extra . $eorec;
#print $out_fh $data;
}
# - Moved print to output file to here (where it will print the complete
# output all at once) rather than within the loop (where it will print
# all previous lines each time a new line is read in) to prevent
# duplicate output records. This could also be achieved by printing
# $extra inside the loop. Printing $data at the end will be slightly
# faster, but requires more memory; printing $extra within the loop and
# getting rid of $data entirely would require less memory, so that may
# be the better option if you find yourself needing to read huge input
# files.
print $out_fh $data;
# - $in_fh and $out_fh will be closed automatically when they go out of
# scope at the end of the block/sub, so there's no real point in
# explicitly closing them unless you're going to check whether the close
# succeeded or failed (which can happen in odd cases usually involving
# full or failing disks when writing; I'm not aware of any way that
# closing a file open for reading can fail, so that's just being left
# implicit)
close $out_fh or die "Failed to close file: $!";
}
Disclaimer: perl -c reports that this code is syntactically valid, but it is otherwise untested.