I'm using Perl for the first time [closed] - perl

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 8 years ago.
I'm taking a Bioinformatics class and I keep getting this error: "Undefined subroutine &main::Print called at ReverseComp.txt line 4."
# ReverseComp.txt => takes DNA sequence from user
# and returns the reverse complement
print ("please input DNA sequence:\n");
$DNA =<STDIN>;
$DNA =~tr/ATGC/TACG/; # Find complement of DNA sequence
$DNA =~reverse ($DNA); # Reverse DNA sequence
print ("Reverse complement of sequence is:\n");
print $DNA."\n";
This is my code and I have tried a few different things with line 4 but with no results. Any suggestions? (I am writing this from a prompt, everything looks right....)

I have some notes related to your code:
Name your scripts with the .pl extension instead of .txt. That is the commonly accepted extension for Perl scripts (and .pm for Perl modules, libraries with reusable code).
Always start your scripts with use strict; use warnings;. These pragmas help you avoid common mistakes.
Declare your variables before you use them (see the my function).
chomp your input from STDIN to remove the newline at the end.
The call to the reverse function is odd. I think that $DNA =~ reverse ($DNA); should be $DNA = reverse ($DNA);
The reverse function is more commonly used with Perl arrays; with a string in scalar context you get the reversed version of that string, as I guess you expected.
The print function takes a list of parameters, so you can print several things in one statement.
You can omit parentheses in many places, e.g. reverse($a) is the same as reverse $a. Both are valid, but the latter is closer to the usual Perl style of writing code. The Perl style guide (perlstyle) is a recommended read.
Related to your question, I think your script is right, because the print function exists in Perl, and the error you got refers to Print (with an uppercase P, which matters because Perl is case-sensitive). You may have run a different script from the one you posted here.
This is your script with the previous considerations applied (ReverseComp.pl):
use strict;
use warnings;
print "please input DNA sequence:\n";
chomp( my $DNA = <STDIN> );
$DNA =~ tr/ATGC/TACG/; # Find complement of DNA sequence
$DNA = reverse $DNA; # Reverse DNA sequence
print "Reverse complement of sequence is:\n", $DNA, "\n";
In any case, welcome to the fantastic Perl world, and be prepared to enjoy your trip.
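One detail in the notes above is easy to trip over: reverse only reverses the characters of a string in scalar context. The following is a minimal sketch, not part of the original answer; the reverse_complement helper and its handling of lowercase bases are assumptions added for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): returns the
# reverse complement of a DNA string. Lowercase handling is an assumption.
sub reverse_complement {
    my ($dna) = @_;
    # Copy the input, then complement each base in the copy.
    (my $complement = $dna) =~ tr/ATGCatgc/TACGtacg/;
    # "scalar" forces string reversal; in list context reverse would
    # reverse a one-element list and hand back the string unchanged.
    return scalar reverse $complement;
}

print reverse_complement("ATGC"), "\n";   # prints GCAT
```

Without the explicit scalar, calling the helper inside a print list would return the complement unreversed.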

Related

Why cannot I read a file line by line in a Perl script? [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 7 years ago.
I have a Perl script, which has to read a file line by line.
Line in the file:
0060|9592014|A001-9592014-0060|82769|NOVARTIS PHARMA SERVICES AG BASEL|51671|NOVARTIS AG|A+|SWITZERLAND|Guarantees Issued|12/31/2016|12/31/2016|0|0|0|0|0|0|0|0|0|29014.0967835279993469339764885601502052|0||||0|1|550.3648|32541||.32|SUIG|OLEG|AAA||||||END|
I need to get only 32 fields, the first 32.
open (PRISM, "$infile") or die "Can't open $infile\n";
while (my $file_line = <PRISM>)
{
last if ($file_line=~/^PRISMEXP/);
next if ($file_line=~/^(\s)*$/); # Skip blank lines
print "LINE: $file_line\n"; # This line doesn't print anything
my @field = (split /\|/, $file_line[0-32]);
print "$field[0]\n"; #This line doesn't print anything
}
And as you can see, this part of code doesn't read the file and doesn't print anything. Why? Where is my mistake?
Where you have WHILE you should have while.
Also, your blank line check should have =~, not =.
Your split uses $file_line[0-32], which is the same thing as $file_line[-32], the 32nd element from the end of @file_line, but you haven't set that array anywhere; I'm guessing that should be substr($file_line,0,32).
Or, if you only want the first 32 fields, it should be:
my @field;
@field[0..31] = split /\|/, $file_line;
Always use use strict; use warnings;. It would have caught the last error, and likely the second error too.
Here are some notes on your program that should help you improve your success rate
Always use strict and use warnings at the top of every Perl program, if you haven't done that already
Use lexical file handles, like my $prism_fh instead of global bareword file handles like PRISM
Don't put scalar variables inside double quotes. At best it will make no difference, and at worst you will get a completely different string
Always put the $! variable in your die string when checking the status of open calls. It will tell you why the open failed. Also, perl will add the source file name and line number to the output of die unless you put a newline on the end of your string, so don't do that if you want to know where in your code the error occurred
It is often better to use the default variable $_ when reading from a file. Many operators use it as their default parameter, making for more concise and tidy code
Don't forget unless. You can more cleanly check whether a line contains non-blanks by using next unless $file_line =~ /\S/
If you don't chomp the input lines then there is no need to put a newline on the end when you print the output
You need to split lines before you can select fields from the input. $file_line[0-32] doesn't do that: it refers to element -32 of an array @file_line, which doesn't exist
Here's your Perl code refactored so that it prints the first 32 pipe-separated fields. I hope it is obvious that it needs a preamble that does use strict and use warnings and defines $infile.
open my $prism_fh, '<', $infile or die qq{Can't open "$infile": $!\n};
while (<$prism_fh>) {
next unless /\S/;
last if /^PRISMEXP/;
chomp;
my @fields = (split /\|/);
print join('|', @fields[0 .. 31]), "\n";
}
output
0060|9592014|A001-9592014-0060|82769|NOVARTIS PHARMA SERVICES AG BASEL|51671|NOVARTIS AG|A+|SWITZERLAND|Guarantees Issued|12/31/2016|12/31/2016|0|0|0|0|0|0|0|0|0|29014.0967835279993469339764885601502052|0||||0|1|550.3648|32541||.32
Update
Instead of splitting and recombining, you could use a regular expression to grab the first 32 pipe-separated fields, like this
while (<$prism_fh>) {
next unless /\S/;
last if /^PRISMEXP/;
chomp;
print $1, "\n" if /^((?:[^|]*\|){31}[^|]*)/;
}
The output is identical to that of the program above.
Because of the line:
last if ($file_line=~/^PRISMEXP/);
If the first line of $infile begins with PRISMEXP you will never print anything.
You also have to change the line:
my @field = (split /\|/, $file_line[0-32]);
to the following, which keeps the first 32 fields (indexes 0 to 31):
my @field = (split /\|/, $file_line)[0..31];
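To make the "split first, then select fields" idea concrete, here is a small sketch. The first_fields helper and the sample line are hypothetical, not taken from the answers above. It passes -1 as the limit so that split keeps trailing empty fields, then truncates the array.

```perl
use strict;
use warnings;

# Hypothetical helper: return at most the first $n pipe-separated
# fields of a line.
sub first_fields {
    my ($line, $n) = @_;
    chomp $line;
    # A LIMIT of -1 stops split from discarding trailing empty fields.
    my @fields = split /\|/, $line, -1;
    # Shrink the array to $n elements if it is longer than that.
    $#fields = $n - 1 if @fields > $n;
    return @fields;
}

print join('|', first_fields("x|y|z|w\n", 2)), "\n";   # prints x|y
```

Truncating the array, rather than slicing with [0 .. $n-1], avoids undef entries when a line has fewer than $n fields.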

Filling a hash with parsed input elements [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I would like to take a given input, say , and run specific parsings over it and fill a hash with the outputs of those parsings. For example, I'd like this input:
"barcodedSamples": "{\"I-735\":{\"barcodes\":[\"IonXpress_001\"]},\"13055\":{\"barcodes\":[\"IonXpress_002\"]}}",
to be parsed (using a combination of grep and some more specific fiddling that I don't have a strong grasp on) into a table that lists the barcodes and sample names as follows:
barcode sample
IonXpress_001 I-735
IonXpress_002 13055
where "barcode" and "sample" are treated as keys. Another example is that I would like to grep to a line that starts:
"library": "hg19",
and map the value "hg19" (so, the string inside the second set of quotation marks, programmatically speaking) to an arbitrary key like "lib":
Library
hg19
The string closely resembles JSON, however requires some cleaning up to become valid JSON.
#!/usr/bin/perl
use strict;
use warnings FATAL => qw/all/;
use JSON;
use Data::Dumper;
my $json_string = '"barcodedSamples": "{\"I-735\":{\"barcodes\":[\"IonXpress_001\"]},\"13055\":{\"barcodes\":[\"IonXpress_002\"]}}"';
$json_string =~ s/\\//g; # remove escape backslashes.
$json_string =~ s/"\{/{/; # remove an invalid opening quote.
chop $json_string; # remove an invalid closing quote.
$json_string = '{' . $json_string . '}'; # wrap in curly braces.
my $json_object = JSON->new( );
my $perl_ref = $json_object->decode( $json_string );
print Dumper( $perl_ref );
That string you're parsing looks suspiciously like JSON. Why not just use a JSON module (JSON::PP has shipped with Perl since 5.14, and the JSON module can be installed from CPAN) instead of writing your own parser?
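Once the input is valid JSON, building the barcode/sample table is a short walk over the decoded structure. The sketch below is only an illustration: it assumes the input has already been cleaned up into one JSON object, that the "library" entry lives in that same object, and barcode_table is a hypothetical helper name. It uses JSON::PP because that module ships with Perl.

```perl
use strict;
use warnings;
use JSON::PP;   # exports decode_json by default

# Assumed input: the question's data after clean-up into valid JSON.
my $json = '{"barcodedSamples":{"I-735":{"barcodes":["IonXpress_001"]},'
         . '"13055":{"barcodes":["IonXpress_002"]}},"library":"hg19"}';
my $data = decode_json($json);

# Hypothetical helper: map every barcode to the sample it belongs to.
sub barcode_table {
    my ($samples) = @_;
    my %sample_for;
    for my $sample (keys %$samples) {
        $sample_for{$_} = $sample for @{ $samples->{$sample}{barcodes} };
    }
    return %sample_for;
}

my %sample_for = barcode_table($data->{barcodedSamples});
print "barcode\tsample\n";
print "$_\t$sample_for{$_}\n" for sort keys %sample_for;
print "Library\n$data->{library}\n";
```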

Randomizing 3 lines to display in CGI with Perl [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I'm trying to write a CGI script that will take three lines of text and randomize them. Each time you view the webpage, the three lines will appear one after the other in a different order each time. How do I do this and what is the code?
perldoc -q "random line"
Found in D:\sb\perl\lib\perlfaq5.pod
How do I select a random line from a file?
Short of loading the file into a database or pre-indexing the lines in
the file, there are a couple of things that you can do.
Here's a reservoir-sampling algorithm from the Camel Book:
srand;
rand($.) < 1 && ($line = $_) while <>;
This has a significant advantage in space over reading the whole file
in. You can find a proof of this method in *The Art of Computer
Programming*, Volume 2, Section 3.4.2, by Donald E. Knuth.
You can use the File::Random module which provides a function for that
algorithm:
use File::Random qw/random_line/;
my $line = random_line($filename);
Another way is to use the Tie::File module, which treats the entire file
as an array. Simply access a random array element.
or
perldoc -q shuffle
Found in D:\sb\perl\lib\perlfaq4.pod
How do I shuffle an array randomly?
If you either have Perl 5.8.0 or later installed, or if you have
Scalar-List-Utils 1.03 or later installed, you can say:
use List::Util 'shuffle';
@shuffled = shuffle(@list);
If not, you can use a Fisher-Yates shuffle.
sub fisher_yates_shuffle {
my $deck = shift; # $deck is a reference to an array
return unless @$deck; # must not be empty!
my $i = @$deck;
while (--$i) {
my $j = int rand ($i+1);
@$deck[$i,$j] = @$deck[$j,$i];
}
}
use List::Util qw( shuffle );
@lines = shuffle(@lines);
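Applied to the question, a minimal CGI sketch could look like this. The three lines of text and the random_page helper are placeholders, and the response is plain text rather than HTML to keep the example short.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Placeholder content; replace with the three real lines.
my @lines = ('First line', 'Second line', 'Third line');

# Hypothetical helper: build the whole CGI response (header, blank
# line, then the lines in a random order).
sub random_page {
    my @shuffled = shuffle @_;
    return "Content-Type: text/plain\n\n" . join("\n", @shuffled) . "\n";
}

print random_page(@lines);
```

Each request runs the script again, so the order is reshuffled on every page view.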

Why is the perl print filehandle syntax the way it is?

I am wondering why the perl creators chose an unusual syntax for printing to a filehandle:
print filehandle list
with no comma after filehandle. I see that it's to distinguish between "print list" and "print filehandle list", but why was the ad-hoc syntax preferred over creating two functions - one to print to stdout and one to print to given filehandle?
In my searches, I came across the explanation that this is an indirect object syntax, but didn't the print function exist in perl 4 and before, whereas the object-oriented features came into perl relatively late? Is anyone familiar with the history of print in perl?
Since the comma is already used as the list constructor, you can't use it to separate semantically different arguments to print.
open my $fh, ...;
print $fh, $foo, $bar
would just look like you were trying to print the values of 3 variables. There's no way for the parser, which operates at compile time, to tell that $fh is going to refer to a file handle at run time. So you need a different character to syntactically (not semantically) distinguish between the optional file handle and the values to actually print to that file handle.
At this point, it's no more work for the parser to recognize that the first argument is separated from the second argument by blank space than it would be if it were separated by any other character.
If Perl had used the comma to make print look more like a function, the filehandle would always have to be included if you are including anything to print besides $_. That is the way functions work: If you pass in a second parameter, the first parameter must also be included. There isn't one function I can think of in Perl where the first parameter is optional when the second parameter exists. Take a look at split. It can be written using zero to three parameters. However, if you want to specify a <limit>, you have to specify the first two parameters too.
If you look at other languages, they all include two different ways to print: One if you want STDOUT, and another if you're printing to something besides STDOUT. Thus, Python has both print and write. C has both printf and fprintf. However, Perl can do this with just a single statement.
Let's look at the print statement a bit more closely -- thinking back to 1987 when Perl was first written.
You can think of the print syntax as really being:
print <filehandle> <list_to_print>
To print to OUTFILE, you would say:
print OUTFILE "This is being printed to myfile.txt\n";
The syntax is almost English-like (PRINT to OUTFILE the string "This is being printed to myfile.txt\n").
You can also do the same thing with STDOUT:
print STDOUT "This is being printed to your console";
print STDOUT " unless you redirected the output.\n";
As a shortcut, if the filehandle was not given, it would print to STDOUT or whatever filehandle the select was set to.
print "This is being printed to your console";
print " unless you redirected the output.\n";
select OUTFILE;
print "This is being printed to whatever the filehandle OUTFILE is pointing to\n";
Now, we see the thinking behind this syntax.
Imagine I have a program that normally prints to the console. However, my boss now wants some of that output printed to various files when required instead of STDOUT. In Perl, I could easily add a few select statements, and my problems would be solved. In Python, Java, or C, I would have to modify each of my print statements, or else have some logic that redirects STDOUT to a file (which may involve some conniptions in file opening and dupping to STDOUT).
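A small sketch of that select trick follows. The report routine and the in-memory filehandle are stand-ins chosen for illustration; a real program would open an actual file instead.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for a real output file here.
my $captured = '';
open my $out_fh, '>', \$captured or die "Can't open in-memory file: $!";

# Hypothetical routine with a plain print and no explicit filehandle.
sub report { print "status: ok\n" }

report();                        # goes to STDOUT
my $previous = select $out_fh;   # select returns the previously selected handle
report();                        # same code, now written to $out_fh
select $previous;                # restore the original default handle
close $out_fh;

print "captured: $captured";
```

Saving the return value of select and restoring it afterwards keeps the redirection local to the code that needs it.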
Remember that Perl wasn't written to be a full-fledged language. It was written to do the quick and dirty job of parsing text files more easily and flexibly than awk did. Over the years, people used it because of its flexibility, and new concepts were added on top of the old ones. For example, before Perl 5, there was no such thing as references, which meant there was no such thing as object-oriented programming. If we, back in the days of Perl 3 or Perl 4, needed something more complex than the simple list, hash, or scalar variable, we had to munge it ourselves. It's not like complex data structures were unheard of. C had struct since its initial beginnings. Heck, even Pascal had the concept with records back in 1969 when people thought bellbottoms were cool. (We plead insanity. We were all on drugs.) However, neither Bourne shell nor awk had complex data structures, so why would Perl need them?
Answer to "why" is probably subjective and something close to "Larry liked it".
Do note however, that indirect object notation is not a feature of print, but a general notation that can be used with any object or class and method. For example with LWP::UserAgent.
use strict;
use warnings;
use LWP::UserAgent;
my $ua = new LWP::UserAgent;
my $response = get $ua "http://www.google.com";
my $response_content = decoded_content $response;
print $response_content;
Any time you write method object, it means exactly the same as object->method. Note also that the parser seems to work reliably only as long as you don't nest such notations or use complex expressions to get the object, so unless you want to have lots of fun with brackets and quoting, I'd recommend against using it anywhere except the common cases of print, close and the rest of the IO methods.
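For print itself, the three spellings below write to the same handle. This is a sketch using an in-memory filehandle as a stand-in for a real file; IO::Handle is loaded explicitly so the method-call form also works on older Perls.

```perl
use strict;
use warnings;
use IO::Handle;   # makes sure $fh->print is available

# In-memory filehandle used only for illustration.
my $buffer = '';
open my $fh, '>', \$buffer or die "Can't open in-memory file: $!";

print $fh "indirect-object form\n";   # no comma after the handle
print {$fh} "block form\n";           # braces make the handle explicit
$fh->print("method-call form\n");     # ordinary arrow syntax

close $fh;
print $buffer;
```

The block form is handy when the handle comes from a more complex expression, such as a hash element.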
Why not? it's concise and it works, in perl's DWIM spirit.
Most likely it's that way because Larry Wall liked it that way.

Perl - Code review [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 9 years ago.
I am working on a program that takes information from a CSV file as a source to search through a text file that has "customer packages". I am getting odd counts on only some of the entries, and I can't seem to figure out what is causing the duplicate counts. Can anyone look through my code and tell me if my logic/syntax is off? (It probably is.) All I am trying to accomplish is to count the total occurrences in the text file of an entry in the CSV file (packageid,package_description).
Thanks for the help! I'm going nuts over here.
#!/usr/bin/perl
use strict;
use Text::CSV;
# Variables already declared in the other PL file ** Remove if consolidating **
my $file2 = 'master_plist.csv';
my $csv2 = Text::CSV->new(); # Create a Text::CSV object
open (CSV2, "<", $file2) or die $!; #open CSV file for parsing
while (<CSV2>) {
if ($csv2->parse($_)) {
my @columns2 = $csv2->fields(); # Parse CSV and load into an array for each row.
my $packID = $columns2[0];
my $packDESC = $columns2[1];
my $val = 'customer_packages_report.txt';
chomp ($val);
my $cnt=0;
open (HNDL, "$val") || die "wrong filename";
while ($val = <HNDL>)
{
while ($val =~ /$packID - $packDESC/ig)
{
$cnt++;
}
}
#if ($packDESC =~ /\(/g) {
# $packDESC =~ s/\(/\(/g;
#}
print "Total iterations of $packDESC: $cnt\n";
close (HNDL);
# End original code
} # Close IF
} # Close WHILE
close CSV;
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
# Variables already declared in the other PL file ** Remove if consolidating **
my $file2 = 'master_plist.csv';
my $csv2 = Text::CSV->new(); # Create a Text::CSV object
open (CSV2, "<", $file2) or die "I die while opening $file2! $!"; #open CSV file for parsing
while (my $each_csv2_line = <CSV2>) {
if ($csv2->parse($each_csv2_line)) {
my @columns2 = $csv2->fields(); # Parse CSV and load into an array for each row.
my $packID = $columns2[0];
my $packDESC = $columns2[1];
my $val = 'customer_packages_report.txt';
chomp ($val);
my $cnt=0;
open (HNDL,"<","$val") or die "wrong filename: $val! $!";
while (<HNDL>){
$cnt++ while (/$packID - $packDESC/ig);
}
#if ($packDESC =~ /\(/g) {
# $packDESC =~ s/\(/\(/g;
#}
print "Total iterations of $packDESC: $cnt\n";
close (HNDL);
# End original code
} # Close IF
} # Close WHILE
# end of script
close CSV2;
My recommendations:
Use $HNDL instead of HNDL: lexical variables are better for filehandles.
Try to catch all mistakes (by checking with defined, == 0 and eq "").
I tried to format your code and added some features that I sometimes use. Be better than me and first read Style Coding for Little Perl Monk. Then you can get more out of this language and write code that is not write-only.
Example (and also a quote):
"The situation is exactly the same for the line-input operator, <>, although Perl does this for you automatically.
It looks like you’re testing the line from STDIN in this while:
while (<STDIN>) {
do_something($_);
}
However, this is a special case in which Perl automatically converts to check $_ for definedness:
while ( defined( $_ = <STDIN> ) ) { # implicitly done
do_something($_);
}
"
Effective Perl Programming, page 24.
You could do a number of things to improve your code:
use warnings;.
Use proper indentation.
Use descriptive variable names. Instead of $file2 (has no meaning, and why is there no file 1?), use $package_file or whatever makes sense.
If you are already using Text::CSV, you can use $csv->getline() to go through the file line by line. This will simplify your code. See the documentation for an example.
chomp($val) removes a newline from the end of a string. You are using it on a string literal you just declared, which has no newline. That doesn't make sense.
Never use the same variable ($val) to do two completely different things. This is extremely confusing.
Might the variables that you are interpolating in the regex contain special characters? If so, you need to escape them. For example, if $packDESC contained a period, it would match any character in the regex. To treat the contents of the variable literally, use \Q..\E, as in this example: /\Q$packID - $packDESC\E/ig.
You are opening customer_packages_report.txt and going through it line-by-line on every line of the csv file. You could simplify this by reading it in once and storing the results in an array.
You don't need a while loop to count matches: $cnt = () = /$packID - $packDESC/ig;. This puts the match in list context, returning a list of matches, then evaluates that list assignment in scalar context, which yields the number of matches. A little bit tricky, but simpler.
It's hard to say exactly what is causing your problem without seeing the data. Might you have some unnecessary repetition that stems from your nested looping over both files? I would start by rewriting to improve your code, then see if the problem still exists.
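Two of the suggestions above, reading the report once and counting matches with the list-assignment idiom, can be combined as in this sketch. The count_occurrences helper and the sample report lines are hypothetical; in the real program the lines would be read from customer_packages_report.txt once, before looping over the CSV rows.

```perl
use strict;
use warnings;

# Hypothetical helper: count case-insensitive occurrences of a literal
# string across lines that have already been read into memory.
sub count_occurrences {
    my ($needle, @lines) = @_;
    my $total = 0;
    for my $line (@lines) {
        # List assignment evaluated in scalar context gives the number
        # of matches; \Q...\E makes the needle match literally.
        my $count = () = $line =~ /\Q$needle\E/ig;
        $total += $count;
    }
    return $total;
}

# Made-up report lines standing in for the file contents.
my @report = (
    "101 - Basic (HD)\n",
    "102 - Sports\n",
    "101 - basic (hd), 101 - Basic (HD)\n",
);
print count_occurrences('101 - Basic (HD)', @report), "\n";   # prints 3
```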
Your code seems to compile with perl -c without errors, so that's good. If I were to guess, I would assume your problem lies in having meta characters in some of your fields. The regex /$packID - $packDESC/ is vulnerable to meta characters. For example
my $str = "foo? bar";
$str =~ /$str/; # returns false, because ? is a meta character
In the above example, the question mark ? is a quantifier which affects whatever comes before it, so that o? means "0 or 1 o". To solve the meta character problem, use the \Q ... \E escape:
$str =~ /\Q$str/; # will now match
Terminating the escape sequence with \E is optional.
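Parentheses cause the same kind of trouble, which may be relevant given the commented-out code in the question. A short sketch with made-up values:

```perl
use strict;
use warnings;

# Made-up values; a description containing parentheses is an assumption.
my $desc = 'Basic (HD)';
my $text = '101 - Basic (HD)';

# Unescaped, "(HD)" is a capture group that matches the letters "HD",
# so the pattern looks for "Basic HD" and fails on this text.
my $unescaped = ($text =~ /$desc/)     ? 1 : 0;

# With \Q...\E the parentheses are matched literally.
my $escaped   = ($text =~ /\Q$desc\E/) ? 1 : 0;

print "unescaped: $unescaped, escaped: $escaped\n";   # unescaped: 0, escaped: 1
```

The quotemeta function does the same escaping as \Q when the pattern is built as a string first.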
Some other things to note:
It is very good that you use use strict. You should also always use warnings. Not doing so is not removing the issues with your code, only hiding them.
You create a Text::CSV object with default settings. Depending on your input, that may or may not be appropriate. Setting binary => 1 is recommended in the documentation.
Using the parse() function may not be the best option, the documentation has good things to say about getline.
As loldop points out in the comments, you are reusing $val to read from your file. While technically that should work, it is asking for trouble.
Style and practice notes and practical tips:
Using three-argument open and lexical file handles is a good thing to do. Three-argument in essence means to use an explicit open mode, which makes your script safer to use. Using lexical file handles means that you will not have global scope on your file handle, which is a good thing.
This code
my #columns2 = $csv2->fields();
my $packID = $columns2[0];
my $packDESC = $columns2[1];
Can be written like this
my ($packID, $packDESC) = $csv2->fields();
You are chomping $val right after you assign it. That is redundant, because chomp by default only removes a newline from the end of a string, and you did not add one. It doesn't change anything, but it is not required here. If you read something from stdin or a file, you would probably want to use chomp, though.
Using die without referring to the error $! is a sure way to make yourself annoyed.
Do not underestimate how much easier it becomes to write code when you use proper indentation. Use a text editor with automatic indentation and colouring. I can warmly recommend vim (gvim if you are using Windows). Though it has a learning curve, it is a powerful editor that also often comes already installed on many systems.
Since so many people have already commented on your program itself, I'm going to talk about how you can become a better Perl programmer, and help write in such a way that will help eliminate many of your issues.
Take a look at Perl::Tidy and run your program through that. That will help improve your syntax and Perl and will help you catch a lot of the various issues you're having.
Also, you should get a copy of Perl Best Practices which is where most of Perl Tidy is taken from. And, as someone already referenced Effective Perl Programming is another excellent book.
The big issue with Perl is that few people learn it. Most are tossed into a situation where we had to pick it up ourselves. Plus, Perl is a fairly old and rather crufty language. Most Perl books still lean heavily on Perl 3.x ways of programming and fail to mention such basics as using use strict; and use warnings;.
You combine old programming practices, with most people learning Perl by hacking their way through old programs with old syntax (and probably written by people who learned Perl by hacking their way through even older programs), and you can see why Perl has a reputation of being a write-only language.
You may want to use the getline method from Text::CSV, which saves a few lines of code.
The problem is likely to be because you have regex metacharacters in the strings you are searching for. Escape them with \Q...\E in the regex so that they are taken literally. In the rewrite below I have also added \s* instead of a literal space, just in case there isn't exactly one space on either side of the hyphen.
I have also changed the filehandles to lexical ones, which have the advantage that they will be closed automatically when the handle goes out of scope.
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
my $file2 = 'master_plist.csv';
my $csv2 = Text::CSV->new();
open(my $csv_fh, '<', $file2) or die $!;
while (my $row = $csv2->getline($csv_fh)) {
my ($packID, $packDESC) = @$row;
my $val = 'customer_packages_report.txt';
chomp($val);
open(my $fh, '<', $val) or die "wrong filename";
my $cnt = 0;
while ($val = <$fh>) {
while ($val =~ /\Q$packID\E\s*-\s*\Q$packDESC\E/ig) {
$cnt++;
}
}
print "Total iterations of $packDESC: $cnt\n";
}