I am trying to extract the DNA sequence from this FASTA file and reprint it with a specified number of bases per line, say 40.
> sample dna (This is a typical fasta header.)
agatggcggcgctgaggggtcttgggggctctaggccggccacctactgg
tttgcagcggagacgacgcatggggcctgcgcaataggagtacgctgcct
gggaggcgtgactagaagcggaagtagttgtgggcgcctttgcaaccgcc
tgggacgccgccgagtggtctgtgcaggttcgcgggtcgctggcgggggt
Using this Perl module (fasta.pm):
package fasta;
use strict;
sub read_fasta ($filename) {
    my $filename = @_;
    open (my $FH_IN, "<", $filename) or die "Can't open file: $filename $!";
    my @lines = <$FH_IN>;
    chomp @lines;
    return @lines;
}
sub read_seq (\@lines) {
    my $linesRef = @_;
    my @lines = @{$linesRef};
    my @seq;
    foreach my $line (@lines) {
        if ($line !~ /^>/) {
            print "$line\n";
            push (@seq, $line);
        }
    }
    return @seq;
}
sub print_seq_40 (\@seq) {
    my $linesRef = @_;
    my @lines = @{$linesRef};
    my $seq;
    foreach my $line (@lines) {
        $seq = $seq.$line;
    }
    my $i = 0;
    my $seq_line;
    while (($i+1)*40 < length ($seq)) {
        my $seq_line = substr ($seq, $i*40, 40);
        print "$seq_line\n";
        $i++;
    }
    $seq_line = substr ($seq, $i*40);
    print "$seq_line\n";
}
1;
And the main script is
use strict;
use warnings;
use fasta;
print "What is your filename: ";
my $filename = <STDIN>;
chomp $filename;
my @lines = read_fasta ($filename);
my @seq = read_seq (\@lines);
print_seq_40 (\@seq);
exit;
This is the error I get
Undefined subroutine &main::read_fasta called at q2.pl line 13, <STDIN> line 1.
Can anyone please enlighten me as to where I went wrong?
It looks like you're getting nowhere with this.
I think your choice to use a module and subroutines is a little strange, given that you call each subroutine only once and they correspond to very little code indeed.
Both your program and your module need to start with use strict and use warnings, and you cannot use prototypes like that in Perl subroutines. After fixing a number of other bugs as well, this is a lot closer to the code that you need.
package Fasta;
use strict;
use warnings;
use 5.010;
use autodie;
use base 'Exporter';
our @EXPORT = qw/ read_fasta read_seq print_seq_40 /;

sub read_fasta {
    my ($filename) = @_;
    open my $fh_in, '<', $filename;
    chomp(my @lines = <$fh_in>);
    @lines;
}

sub read_seq {
    my ($lines_ref) = $_[0];
    grep { not /^>/ } @$lines_ref;
}

sub print_seq_40 {
    my ($lines_ref) = @_;
    print "$_\n" for unpack '(A40)*', join '', @$lines_ref;
}
1;
q2.pl
use strict;
use warnings;
use Fasta qw/ read_fasta read_seq print_seq_40 /;
print "What is your filename: ";
my $filename = <STDIN>;
chomp $filename;
my @lines = read_fasta($filename);
my @seq = read_seq(\@lines);
print_seq_40(\@seq);
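The line wrapping here is done by unpack '(A40)*', which repeatedly pulls up to 40 characters out of the joined sequence. A minimal standalone sketch of just that part, using a made-up 100-character string rather than your FASTA data:
use strict;
use warnings;

my $seq = 'ACGT' x 25;    # made-up 100-character sequence

# '(A40)*' means: keep taking fields of up to 40 ASCII characters
print "$_\n" for unpack '(A40)*', $seq;

# Output: two 40-character lines followed by one 20-character line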
You need to either:
add to your module:
use Exporter;
our @EXPORT = qw( read_fasta
                  read_seq );    # etc.
call the code in the remote module explicitly:
fasta::read_fasta();
explicitly import the module sub:
use fasta qw ( read_fasta );
Also: the general convention for modules is to uppercase the first letter of the module name.
In Perl, if you use fasta;, this does not automatically export all its methods into the namespace of your program. Call fasta::read_fasta instead.
Or: use Exporter to automatically export methods or enable something like use Fasta qw/read_fasta/.
For example:
package Fasta;
require Exporter;
our @ISA = qw(Exporter);
our @EXPORT_OK = qw/read_fasta read_seq read_seq40/;
To use:
use Fasta qw/read_fasta read_seq read_seq40/;
You can also make Fasta export all methods automatically or define keywords to group methods, though the latter has caused me some problems in the past, and I would recommend it only if you are certain it is worth the possible trouble.
If you want to make all methods available:
package Fasta;
use Exporter;
our @ISA = qw(Exporter);
our @EXPORT = qw/read_fasta read_seq read_seq40/;
Note @EXPORT is not @EXPORT_OK. The latter allows importing them later (as I did); the former exports all of them automatically. The documentation I linked to makes this clear.
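A hedged illustration of that difference (the module name Demo and the sub names here are made up):
package Demo;                  # made-up module name
require Exporter;
our @ISA       = qw(Exporter);
our @EXPORT    = qw/ always_exported /;   # exported by a plain "use Demo;"
our @EXPORT_OK = qw/ on_request_only /;   # exported only when asked for

sub always_exported { 'default' }
sub on_request_only { 'optional' }

1;

# In the calling script:
#   use Demo;                         # imports always_exported() only
#   use Demo qw/ on_request_only /;   # imports on_request_only() because it was named explicitly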
I just noticed something else. You are flattening @_ into $filename in read_fasta. I am not sure this works. Try this:
sub read_fasta {
    my $filename = $_[0];   # or: my ($filename) = @_;   @_ is an array; $filename is not.
}
To explain the problem: $filename = @_; means: store @_ (an ARRAY) into $filename (a SCALAR). Perl handles this by storing the length of the ARRAY in $filename. That is not what you want. You want the first element of the array, which is $_[0].
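A tiny demonstration of that difference (the sub name and file name here are made up):
sub demo {
    my $count   = @_;    # scalar context: the number of elements in @_, e.g. 1
    my ($first) = @_;    # list context: the first element of @_, e.g. 'input.fa'
    return ($count, $first);
}

print join(' ', demo('input.fa')), "\n";   # prints "1 input.fa"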
Added @ISA, which is probably needed, or see the comment by Borodin instead.
I have an input file:
id_1 10 15 20:a:4:c
id_2 1 5 2:2:5:c
id_3 0.4 3 12:1:4:1
id_4 18 2 9:1:0/0:1
id_5 a b c:a:foo:2
I have many files of this type that I want to parse in different programs, so I want to make a function that returns a hash whose values are easily accessible.
I've not written a function like this before, and I'm not sure how to properly access the returned hashes. Here's what I've got so far:
Library_SO.pm
#!/usr/bin/perl
package Library_SO;
use strict;
use warnings;
sub tum_norm_gt {
    my $file = shift;
    open my $in, '<', $file or die $!;
    my %SVs;
    my %info;
    while (<$in>) {
        chomp;
        my ($id, $start, $stop, $score) = split;
        my @vals = (split)[1..2];
        my @score_fields = split(/:/, $score);
        $SVs{$id} = [ $start, $stop, $score ];
        push @{$info{$id}}, @score_fields;
    }
    return (\%SVs, \%info);
}
1;
And my main script:
get_vals.pl
#!/usr/bin/perl
use Library_SO;
use strict;
use warnings;
use feature qw/ say /;
use Data::Dumper;
my $file = shift or die $!;
my ($SVs, $info) = Library_SO::tum_norm_gt($file);
print Dumper \%$SVs;
print Dumper \%$info;
# for (keys %$SVs){
#     say;
#     my @vals = @{$SVs{$_}};    <- line 20
# }
I call this with:
perl get_vals.pl test_in.txt
The Dumper output is what I was hoping for, but when I try to iterate over the returned hash(?) and access the values (e.g. as in the commented out section) I get:
Global symbol "%SVs" requires explicit package name at get_vals.pl line 20.
Execution of get_vals.pl aborted due to compilation errors.
Have I got this totally upside down?
Your library function returns two hashrefs. If you now want to access the values you'll have to dereference the hashref:
my ($SVs, $info) = Library_SO::tum_norm_gt($file);
#print Dumper \%$SVs;
# Easier and better readable:
print Dumper $SVs ;
#print Dumper \%$info;
# Easier and better readable:
print Dumper $info ;
for (keys %{ $SVs } ){    # better visual dereferencing
    say;
    my @vals = @{$SVs->{$_}};    # it is not $SVs{..} but $SVs->{...}
}
I would like to count the total number of files whose modify_time is between $atime and $btime. Here is part of my code, but it doesn't return anything. What is wrong?
sub mtime_between {
    my $mtime = 0;
    my $counts = 0;
    $mtime = (stat $File::Find::name)[9] if -f $File::Find::name;
    if ($mtime > $atime and $mtime < $btime) {
        return sub { print ++$counts, "$File::Find::name\n" };
    }
}
When I call the subroutine, I get nothing.
find(\&mtime_between,"/usr");
You should not be returning a function.
Check File::Find documentation.
find() does a depth-first search over the given @directories in the order they are given. For each file or directory found, it calls the &wanted subroutine.
In the wanted function you should do the things you want to do directly. Returning a function reference will not work, and this is why you are having problems.
So you actually want something more like:
#!/usr/bin/env perl
use strict;
use warnings;
use feature qw{say};
use File::Find;
use Data::Dumper;
my ($atime, $btime) = (1461220840, 1561220844);
sub findFilesEditedBetweenTimestamps {
    my ($atime, $btime, $path) = @_;
    my $count = 0;
    my @files = ();
    my $mtime_between = sub {
        my $mtime = 0;
        $mtime = (stat $File::Find::name)[9] if -f $File::Find::name;
        if ($mtime > $atime and $mtime < $btime) {
            push @files, $File::Find::name;
            $count++;
        }
        return;
    };
    find ($mtime_between, $path);
    say "Found a total of $count files";
    say "Files:";
    print Dumper(@files);
}
findFilesEditedBetweenTimestamps($atime, $btime, "./");
I get:
Found a total of 2 files
Files:
$VAR1 = './test.txt';
$VAR2 = './test.pl';
As has been said, the value returned by the wanted subroutine is ignored. Returning a callback from a callback may be a step too far for some!
This may be of interest. I've used the File::stat module to make extraction of the modification time more readable, and Time::Piece, so that $atime and $btime can be expressed in readable strings instead of epoch values
There's no need to write a separate subroutine for the wanted function unless you prefer to -- you can just use an anonymous subroutine in the find call. And it's easiest to simply return from the wanted subroutine if the node isn't a file
#!/usr/bin/env perl
use strict;
use warnings 'all';
use File::Find;
use File::stat;
use Time::Piece;
sub count_files_between_times {
    my ($from, $to, $path) = @_;
    my $count = 0;
    find(sub {
        my $st = stat($_) or die $!;
        return unless -f $st;
        my $mtime = $st->mtime;
        ++$count if $mtime >= $from and $mtime <= $to;
    }, $path);
    print "Found a total of $count files\n";
}

my ($from, $to) = map {
    Time::Piece->strptime($_, '%Y-%m-%dT%H:%M:%S')->epoch;
} '2016-04-19T00:00:00', '2019-04-22T00:00:00';

count_files_between_times($from, $to, '/usr');
Update
Some people prefer the File::Find::Rule module. Personally I dislike it intensely, and having looked at the source code I am very wary of it, but it certainly makes this process more concise
Note that File::Find::Rule is layered on top of File::Find, which does the heavy-lifting for it. So it is essentially a different way of writing the wanted subroutine
use File::Find::Rule ();
sub count_files_between_times {
    my ($from, $to, $path) = @_;
    my @files = File::Find::Rule->file->mtime(">= $from")->mtime("<= $to")->in($path);
    printf "Found a total of %d files\n", scalar @files;
}
or if you prefer you can add the restrictions one statement at a time
use File::Find::Rule ();
sub count_files_between_times {
    my ($from, $to, $path) = @_;
    my $rule = File::Find::Rule->new;
    $rule->file;
    $rule->mtime(">= $from");
    $rule->mtime("<= $to");
    my @files = $rule->in($path);
    printf "Found a total of %d files\n", scalar @files;
}
Both of these alternative subroutines produce identical results to that of the original above
I am using the File::Grep module. I have the following example:
#!/usr/bin/perl
use strict;
use warnings;
use File::Grep qw( fgrep fmap fdo );
my @matches = fgrep { 1.1.1 } glob "file.csv";
foreach my $str (@matches) {
    print "$str\n";
}
But when I try to print the $str value, it gives me a hex value: GLOB(0xac2e78)
What's wrong with this code?
The documentation doesn't seem to be accurate, but judging from the source code (http://cpansearch.perl.org/src/MNEYLON/File-Grep-0.02/Grep.pm), the list you get back from fgrep contains one element per file. Each element is a hash of the form
{
    filename => $filename,
    count    => $num_matches_in_that_file,
    matches  => {
        $line_number => $line,
        ...
    }
}
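Assuming that structure is accurate, a minimal, untested sketch of pulling the matching lines back out of it might look like this:
use File::Grep qw( fgrep );

my @matches = fgrep { /1\.1\.1/ } 'file.csv';

for my $file_result (@matches) {
    my $m = $file_result->{matches};
    # print the matching lines in line-number order
    print $m->{$_} for sort { $a <=> $b } keys %$m;
}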
I think it would be simpler to skip fgrep and its complicated return value that has way more information than you want, in favor of fdo, which lets you just iterate over all lines of a file and do what you want:
fdo { my ( $file, $pos, $line ) = @_;
    print $line if $line =~ m/1\.1\.1/;
} 'file.csv';
(Note that I removed the glob, by the way. There's not much point in writing glob "file.csv", since only one file can match that globstring.)
or even just dispense with this module and write:
{
    open my $fh, '<', 'file.csv';
    while (<$fh>) {
        print if m/1\.1\.1/;
    }
}
I assume you want to see all the lines in file.csv that contain 1.1.1?
The documentation for File::Grep isn't up to date, but this program will put into @lines all the matching lines from all the files (if there were more than one).
use strict;
use warnings;
use File::Grep qw/ fgrep /;
$File::Grep::SILENT = 0;
my @matches = fgrep { /1\.1\.1/ } 'file.csv';

my @lines = map {
    my $matches = $_->{matches};
    @{$matches}{ sort { $a <=> $b } keys %$matches };    # hash slice: values in line-number order
} @matches;

print for @lines;
Update
The most Perlish way to do this is like so
use strict;
use warnings;
open my $fh, '<', 'file.csv' or die $!;
while (<$fh>) {
    print if /1\.1\.1/;
}
So I have a text file with the following line:
123456789
But then I have a second file:
987654321
So how can I make the first file's contents the keys in a hash, and the second file's values the values? (Each character is a key/value)
Should I store each file into different arrays and then somehow merge them? How would I do that? Anything else?
Honestly, I would give you my code I have tried, but I haven't the slightest idea where to start.
You could use a hash slice.
If each line is a key/value: (s///r requires 5.14, but it can easily be rewritten for earlier versions; a sketch of that follows these snippets)
my %h;
@h{ map s/\s+\z//r, <$fh1> } = map s/\s+\z//r, <$fh2>;
If each character is a key/value:
my %h;
{
    local $/ = \1;    # a reference to an integer makes readline return fixed-length (1-character) records
    @h{ grep !/\n/, <$fh1> } = grep !/\n/, <$fh2>;
}
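For the first, line-based case on Perls older than 5.14 (which lack the /r flag), a roughly equivalent sketch, assuming the same $fh1/$fh2 filehandles, strips the trailing whitespace in place first:
my %h;
my @keys   = <$fh1>;
my @values = <$fh2>;
s/\s+\z// for @keys, @values;   # same cleanup as s/\s+\z//r above, but in place
@h{ @keys } = @values;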
Just open both files and read them line by line simultaneously:
use strict; use warnings;
use autodie;
my %hash;
open my $keyFile, '<', 'keyfileName';
open my $valueFile, '<', 'valuefileName';
while (my $key = <$keyFile>) {
    my $value = <$valueFile>;
    chomp for $key, $value;
    $hash{$key} = $value;
}
Of course this is just a quick sketch on how it could work.
The OP mentions that each character is a key or value; I take this to mean that the output should be a hash like ( 1 => 9, 2 => 8, ... ). The OP also asks:
Should I store each file into different arrays and then somehow merge them? How would I do that?
This is exactly how this answer works. Here get_chars is a function that reads in each file, splits every line into individual characters, and returns them as an array. Then zip from List::MoreUtils creates the hash.
#!/usr/bin/env perl
use strict;
use warnings;
use List::MoreUtils 'zip';
my ($file1, $file2) = @ARGV;

my @file1chars = get_chars($file1);
my @file2chars = get_chars($file2);

my %hash = zip @file1chars, @file2chars;

use Data::Dumper;
print Dumper \%hash;

sub get_chars {
    my $filename = shift;
    open my $fh, '<', $filename
        or die "Could not open $filename: $!";
    my @chars;
    while (<$fh>) {
        chomp;
        push @chars, split //;
    }
    return @chars;
}
Iterator madness:
#!/usr/bin/env perl
use autodie;
use strict; use warnings;
my $keyfile_contents = join("\n", 'A' .. 'J');
my $valuefile_contents = join("\n", map ord, 'A' .. 'E');
# Use get_iterator($keyfile, $valuefile) to read from physical files
my $each = get_iterator(\ ($keyfile_contents, $valuefile_contents) );
my %hash;
while (my ($k, $v) = $each->()) {
    $hash{ $k } = $v;
}

use YAML;
print Dump \%hash;

sub get_iterator {
    my ($keyfile, $valuefile) = @_;
    open my $keyf, '<', $keyfile;
    open my $valf, '<', $valuefile;
    return sub {
        my $key = <$keyf>;
        return unless defined $key;
        my $value = <$valf>;
        chomp for grep defined, $key, $value;
        return $key => $value;
    };
}
Output:
C:\temp> yy
---
A: 65
B: 66
C: 67
D: 68
E: 69
F: ~
G: ~
H: ~
I: ~
J: ~
I would write
my %hash = ('123456789' => '987654321');
I am unit testing a component that requires user input. How do I tell Test::More to use some input that I predefined so that I don't need to enter it manually?
This is what I have now:
use strict;
use warnings;
use Test::More;
use TestClass;
*STDIN = "1\n";
foreach my $file (@files)
{
    #this constructor asks for user input if it cannot find the file (1 is ignore);
    my $test = TestClass->new( file => @files );
    isa_ok( $test, 'TestClass');
}
done_testing;
This code does press Enter, but the function retrieves 0, not 1.
If the program reads from STDIN, then just set STDIN to be the open filehandle you want it to be:
#!perl
use strict;
use warnings;
use Test::More;
*STDIN = *DATA;
my @a = <STDIN>;
is_deeply \@a, ["foo\n", "bar\n", "baz\n"], "can read from the DATA section";
my $fakefile = "1\n2\n3\n";
open my $fh, "<", \$fakefile
or die "could not open fake file: $!";
*STDIN = $fh;
my @b = <STDIN>;
is_deeply \@b, ["1\n", "2\n", "3\n"], "can read from a fake file";
done_testing;
__DATA__
foo
bar
baz
You may want to read more about typeglobs in perldoc perldata and more about turning strings into fake files in the documentation for open (look for "Since v5.8.0, perl has built using PerlIO by default.") in perldoc perlfunc.
The following minimal script seems to work:
#!/usr/bin/perl
package TestClass;
use strict;
use warnings;
sub new {
    my $class = shift;
    return unless <STDIN> eq "1\n";
    bless {} => $class;
}
package main;
use strict;
use warnings;
use Test::More tests => 1;
{
    open my $stdin, '<', \ "1\n"
        or die "Cannot open STDIN to read from string: $!";
    local *STDIN = $stdin;

    my $test = TestClass->new;
    isa_ok( $test, 'TestClass');
}
Output:
C:\Temp> t
1..1
ok 1 - The object isa TestClass