I am using the File::Grep module. I have the following example:
#!/usr/bin/perl
use strict;
use warnings;
use File::Grep qw( fgrep fmap fdo );
my @matches = fgrep { 1.1.1 } glob "file.csv";
foreach my $str (@matches) {
print "$str\n";
}
But when I try to print $str, it gives me a hex value: GLOB(0xac2e78)
What's wrong with this code?
The documentation doesn't seem to be accurate, but judging from the source code (http://cpansearch.perl.org/src/MNEYLON/File-Grep-0.02/Grep.pm), the list you get back from fgrep contains one element per file. Each element is a reference to a hash of the form
{
filename => $filename,
count => $num_matches_in_that_file,
matches => {
$line_number => $line,
...
}
}
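Here is a minimal sketch of walking that structure. It assumes the shape shown above, which is read off the 0.02 source rather than the documentation. The list is built by hand with made-up line numbers and text, so the example runs without File::Grep installed:

```perl
use strict;
use warnings;

# One element of the list fgrep appears to return (assumed shape; sample data is made up).
my @matches = (
    {
        filename => 'file.csv',
        count    => 2,
        matches  => { 3 => "a,1.1.1\n", 1 => "b,1.1.1\n" },
    },
);

# Walk every file's matches in line-number order.
my @lines;
for my $file (@matches) {
    my $m = $file->{matches};
    for my $n (sort { $a <=> $b } keys %$m) {
        print "$file->{filename}:$n: $m->{$n}";
        push @lines, $m->{$n};
    }
}
```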
I think it would be simpler to skip fgrep and its complicated return value, which has far more information than you want, in favor of fdo, which just lets you iterate over all the lines of a file and do what you want:
fdo { my ( $file, $pos, $line ) = @_;
print $line if $line =~ m/1\.1\.1/;
} 'file.csv';
(Note that I removed the glob, by the way. There's not much point in writing glob "file.csv", since only one file can match that pattern.)
Or you could dispense with this module entirely and write:
{
open my $fh, '<', 'file.csv' or die "Can't open file.csv: $!";
while (<$fh>) {
print if m/1\.1\.1/;
}
}
I assume you want to see all the lines in file.csv that contain 1.1.1?
The documentation for File::Grep isn't up to date, but this program will put all the matching lines from all the files (if there were more than one) into @lines.
use strict;
use warnings;
use File::Grep qw/ fgrep /;
$File::Grep::SILENT = 0;
my @matches = fgrep { /1\.1\.1/ } 'file.csv';
my @lines = map {
my $matches = $_->{matches};
@{$matches}{ sort { $a <=> $b } keys %$matches};
} @matches;
print for @lines;
Update
The most Perlish way to do this is like so
use strict;
use warnings;
open my $fh, '<', 'file.csv' or die $!;
while (<$fh>) {
print if /1\.1\.1/;
}
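If you also want the line numbers that fgrep and fdo give you, Perl's built-in $. variable holds the line number of the last line read from a handle. This is a small sketch: the sub name grep_lines and the sample CSV text are made up for illustration, and an in-memory string stands in for file.csv:

```perl
use strict;
use warnings;

# Return "line_number: text" for every line of $fh that matches $re.
sub grep_lines {
    my ($fh, $re) = @_;
    my @hits;
    while (my $line = <$fh>) {
        chomp $line;
        # $. holds the line number of the most recently read line
        push @hits, sprintf('%d: %s', $., $line) if $line =~ $re;
    }
    return @hits;
}

# An in-memory string stands in for file.csv.
my $csv = "a,1.1.1\nb,2.2.2\nc,1.1.1.5\n";
open my $fh, '<', \$csv or die "Can't open in-memory file: $!";
my @hits = grep_lines($fh, qr/1\.1\.1/);
print "$_\n" for @hits;
```

Note that the pattern also matches 1.1.1.5. Anchor it (for example /^1\.1\.1$/ on a single field) if you need an exact value.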
Related
I have a list of accession numbers that I want to pair randomly using the Perl script below:
#!/usr/bin/perl -w
use List::Util qw(shuffle);
my $file = 'randomseq_acc.txt';
my @identifiers = map { (split /\n/)[1] } <$file>;
chomp @identifiers;
#Shuffle them and put in a hash
@identifiers = shuffle @identifiers;
my %pairs = (@identifiers);
#print the pairs
for (keys %pairs) {
print "$_ and $pairs{$_} are partners\n";
but I keep getting errors.
The accession numbers in the file randomseq_acc.txt are:
1094711
1586007
2XFX_C
Q27031.2
P22497.2
Q9TVU5.1
Q4N4N8.1
P28547.2
P15711.1
AAC46910.1
AAA98602.1
AAA98601.1
AAA98600.1
EAN33235.2
EAN34465.1
EAN34464.1
EAN34463.1
EAN34462.1
EAN34461.1
EAN34460.1
I needed to add the closing right curly brace to be able to compile the script.
As arrays are indexed from 0, (split /\n/)[1] returns the second field, i.e. whatever follows the newline on each line (which is nothing). Change it to [0] to fix that part:
my @identifiers = map { (split /\n/)[0] } <$file>; # Still wrong.
The diamond operator needs a file handle, not a file name. Use open to associate the two:
open my $FH, '<', $file or die $!;
my @identifiers = map { (split /\n/)[0] } <$FH>;
Using split to remove a newline is not common. I'd probably use something else:
map { /(.*)/ } <$FH>
# or
map { chomp; $_ } <$FH>
# or, thanks to ikegami
chomp(my @identifiers = <$FH>);
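As a quick self-check, the three newline-stripping approaches can be compared on the same data. In this sketch an in-memory string stands in for randomseq_acc.txt, and the three IDs are taken from the question:

```perl
use strict;
use warnings;

my $data = "1094711\n1586007\n2XFX_C\n";

# Each approach reads the same in-memory "file" and should yield the same list.
open my $fh1, '<', \$data or die $!;
my @split = map { (split /\n/)[0] } <$fh1>;   # index 0, not 1

open my $fh2, '<', \$data or die $!;
my @capture = map { /(.*)/ } <$fh2>;          # . does not match "\n"

open my $fh3, '<', \$data or die $!;
chomp(my @chomped = <$fh3>);                  # chomp the whole list at once

print join(',', @chomped), "\n";
```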
So, the final result would be something like the following:
#!/usr/bin/perl
use warnings;
use strict;
use List::Util qw(shuffle);
my $filename = '...';
open my $FH, '<', $filename or die $!;
chomp(my @identifiers = <$FH>);
my %pairs = shuffle(@identifiers);
print "$_ and $pairs{$_} are partners\n" for keys %pairs;
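Why does my %pairs = shuffle(@identifiers) produce pairs? When a list is assigned to a hash, it is consumed two elements at a time, as key then value. The sketch below uses a few IDs from the question. With an odd number of IDs, the last key would get undef and warnings would report an odd number of elements in the hash assignment:

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# A list assigned to a hash is consumed two items at a time: key, value, key, value...
my %fixed = ('P22497.2', 'Q9TVU5.1', 'Q4N4N8.1', 'P28547.2');

# With shuffle, the pairing changes from run to run, but every ID is used exactly once
# (as long as the list has an even number of elements).
my @ids   = qw(1094711 1586007 2XFX_C Q27031.2 P22497.2 Q9TVU5.1);
my %pairs = shuffle(@ids);
print "$_ and $pairs{$_} are partners\n" for sort keys %pairs;
```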
I am trying to extract a DNA sequence from this FASTA file and print it with a specified number of bases per line, say 40.
> sample dna (This is a typical fasta header.)
agatggcggcgctgaggggtcttgggggctctaggccggccacctactgg
tttgcagcggagacgacgcatggggcctgcgcaataggagtacgctgcct
gggaggcgtgactagaagcggaagtagttgtgggcgcctttgcaaccgcc
tgggacgccgccgagtggtctgtgcaggttcgcgggtcgctggcgggggt
Using this Perl module (fasta.pm):
package fasta;
use strict;
sub read_fasta ($filename) {
my $filename = @_;
open (my $FH_IN, "<", $filename) or die "Can't open file: $filename $!";
my @lines = <$FH_IN>;
chomp @lines;
return @lines;
}
sub read_seq (\@lines) {
my $linesRef = @_;
my @lines = @{$linesRef};
my @seq;
foreach my $line (@lines) {
if ($line!~ /^>/) {
print "$line\n";
push (@seq, $line);
}
}
return @seq;
}
sub print_seq_40 (\@seq) {
my $linesRef = @_;
my @lines = @{$linesRef};
my $seq;
foreach my $line (@lines) {
$seq = $seq.$line;
}
my $i= 0;
my $seq_line;
while (($i+1)*40 < length ($seq)) {
my $seq_line = substr ($seq, $i*40, 40);
print "$seq_line\n";
$i++;
}
$seq_line = substr ($seq, $i*40);
print "$seq_line\n";
}
1;
And the main script is
use strict;
use warnings;
use fasta;
print "What is your filename: ";
my $filename = <STDIN>;
chomp $filename;
my @lines = read_fasta ($filename);
my @seq = read_seq (\@lines);
print_seq_40 (\@seq);
exit;
This is the error I get
Undefined subroutine &main::read_fasta called at q2.pl line 13, <STDIN> line 1.
Can anyone please tell me which part I got wrong?
It looks like you're getting nowhere with this.
I think your choice to use a module and subroutines is a little strange, given that you call each subroutine only once and they correspond to very little code indeed.
Both your program and your module need to start with use strict and use warnings, and you cannot use prototypes like that in Perl subroutines. With those fixed, along with a number of other bugs, this is a lot closer to the code that you need.
package Fasta;
use strict;
use warnings;
use 5.010;
use autodie;
use base 'Exporter';
our @EXPORT = qw/ read_fasta read_seq print_seq_40 /;
sub read_fasta {
my ($filename) = @_;
open my $fh_in, '<', $filename;
chomp(my @lines = <$fh_in>);
@lines;
}
sub read_seq {
my ($lines_ref) = $_[0];
grep { not /^>/ } @$lines_ref;
}
sub print_seq_40 {
my ($lines_ref) = @_;
print "$_\n" for unpack '(A40)*', join '', @$lines_ref;
}
1;
q2.pl
use strict;
use warnings;
use Fasta qw/ read_fasta read_seq print_seq_40 /;
print "What is your filename: ";
my $filename = <STDIN>;
chomp $filename;
my @lines = read_fasta($filename);
my @seq = read_seq(\@lines);
print_seq_40(\@seq);
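If the unpack template is unfamiliar: '(A40)*' repeats the group A40 until the string is used up, so the last chunk may be shorter than 40. The sketch below takes its two sample lines from the question's FASTA data and checks the result against a substr loop like the one in the original print_seq_40. Note that A also strips trailing spaces, which is harmless for sequence data:

```perl
use strict;
use warnings;

# Join the sequence lines, then cut the result into 40-character chunks.
my @seq_lines = ('agatggcggcgctgaggggtcttgggggctctaggccggccacctactgg',
                 'tttgcagcggagacgacgcatggggcctgcgcaataggagtacgctgcct');
my $seq = join '', @seq_lines;

# unpack: "(A40)*" repeats the A40 template until the string runs out.
my @by_unpack = unpack '(A40)*', $seq;

# Equivalent loop with substr, closer to the original print_seq_40.
my @by_substr;
for (my $i = 0; $i < length $seq; $i += 40) {
    push @by_substr, substr($seq, $i, 40);
}
print "$_\n" for @by_unpack;
```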
You need to either:
add to your module:
use Exporter 'import';
our @EXPORT = qw ( read_fasta
read_seq ); #etc.
call the subroutine in the other module by its fully qualified name:
fasta::read_fasta();
explicitly import the module sub:
use fasta qw ( read_fasta );
Also, by convention, module names start with an uppercase letter.
In Perl, use fasta; does not automatically export the module's subroutines into the namespace of your program. Call fasta::read_fasta instead.
Or use Exporter to export subroutines automatically, or to allow something like use Fasta qw/read_fasta/.
For example:
package Fasta;
require Exporter;
our @ISA = qw(Exporter);
our @EXPORT_OK = qw/read_fasta read_seq print_seq_40/;
To use:
use Fasta qw/read_fasta read_seq print_seq_40/;
You can also make Fasta export all subroutines automatically, or define export tags to group them, though the latter has caused me some problems in the past, and I would recommend it only if you are certain it is worth the possible trouble.
If you want to make all subroutines available:
package Fasta;
use Exporter;
our @ISA = qw(Exporter);
our @EXPORT = qw/read_fasta read_seq print_seq_40/;
Note that @EXPORT is not @EXPORT_OK. The latter only allows the caller to import the listed names on request (as I did above); the former exports all of them automatically. The Exporter documentation makes this clear.
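To see the @EXPORT_OK mechanism in one runnable file, the sketch below defines a cut-down Fasta package inline instead of in Fasta.pm. This is an assumption made only so the example is self-contained. The sketch then calls the package's import method the way use does after loading a module. It uses use Exporter 'import', which works in any reasonably modern Perl, as an alternative to setting @ISA:

```perl
use strict;
use warnings;

# A stand-in for Fasta.pm, defined inline so the example is a single file.
package Fasta;
use Exporter 'import';                 # gives Fasta an import() method
our @EXPORT_OK;
BEGIN { @EXPORT_OK = qw(read_seq) }    # must be set before main's BEGIN below runs

sub read_seq {
    my ($lines_ref) = @_;
    return grep { !/^>/ } @$lines_ref; # drop FASTA header lines
}

package main;

# This is what "use Fasta qw(read_seq);" does once Fasta.pm has been loaded.
BEGIN { Fasta->import('read_seq') }

my @seq = read_seq(['> sample dna', 'agatgg', 'tttgca']);
print "$_\n" for @seq;

# Asking for a name that is not in @EXPORT_OK is a fatal error.
my $ok = eval { Fasta->import('print_seq_40'); 1 };
```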
I just noticed something else. In read_fasta you assign @_ directly to the scalar $filename, which does not do what you want. Try this:
sub read_fasta {
my $filename = $_[0]; # or: my ($filename) = @_;  (@_ is an array; $filename is not)
}
To explain the problem: $filename = @_; means: store @_ (an ARRAY) into $filename (a SCALAR). Perl does this by evaluating the array in scalar context, so the length of the array is stored in $filename. That is not what you want. You want the first element of the array, which is $_[0].
Edit: added @ISA, which is needed so that Fasta inherits Exporter's import method (alternatively, see the comment by Borodir).
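The difference between scalar and list assignment from @_ is easy to check directly. The three helper subs below are hypothetical and exist only for this demonstration:

```perl
use strict;
use warnings;

sub count_args { my $x   = @_;   return $x }  # scalar assignment: number of elements
sub first_arg  { my ($x) = @_;   return $x }  # list assignment: first element
sub index_zero { my $x = $_[0];  return $x }  # explicit first element

my @results = (count_args('file.fa'), first_arg('file.fa'), index_zero('file.fa'));
print "@results\n";   # prints: 1 file.fa file.fa
```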
I have a very simple Perl question about a pattern-matching problem.
I am reading a file with a list of names (fileA).
I would like to check whether any of these names exist in another file (fileB).
if ($name -e $fileB){
do something
}else{
do something else
}
In other words, I want to check whether a pattern exists in a file.
I have tried:
open(IN, $controls) or die "Can't open the control file\n";
while(my $line = <IN>){
if ($name =~ $line ){
print "$name\tfound\n";
}else{
print "$name\tnotFound\n";
}
}
This repeats itself: it checks and prints a result for every line of the control file, rather than checking once whether the name exists or not.
When you are comparing one list to another, you want a hash. A hash is like an array indexed by keys instead of positions, and its entries have no particular order. A hash can have only a single instance of a particular key (but different keys can have the same value).
What you can do is go through the first file and create a hash keyed by each line. Then you go through the second file and check whether any of its lines match a key in your hash:
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use autodie; #You don't have to check if "open" fails.
use constant {
FIRST_FILE => 'file1.txt',
SECOND_FILE => 'file2.txt',
};
open my $first_fh, "<", FIRST_FILE;
# Get each line as a hash key
my %line_hash;
while ( my $line = <$first_fh> ) {
chomp $line;
$line_hash{$line} = 1;
}
close $first_fh;
Now each line is a key in your hash %line_hash. The value really doesn't matter; the important part is the key itself.
Now that I have my hash of the lines in the first file, I can read in the second file and see if that line exists in my hash:
open my $second_fh, "<", SECOND_FILE;
while ( my $line = <$second_fh> ) {
chomp $line;
if ( exists $line_hash{$line} ) {
say qq(I found "$line" in both files);
}
}
close $second_fh;
You can also build the hash with the map function:
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use autodie; #You don't have to check if "open" fails.
use constant {
FIRST_FILE => 'file1.txt',
SECOND_FILE => 'file2.txt',
};
open my $first_fh, "<", FIRST_FILE;
chomp ( my @lines = <$first_fh> );
# Get each line as a hash key
my %line_hash = map { $_ => 1 } @lines;
close $first_fh;
open my $second_fh, "<", SECOND_FILE;
while ( my $line = <$second_fh> ) {
chomp $line;
if ( exists $line_hash{$line} ) {
say qq(I found "$line" in both files);
}
}
close $second_fh;
I am not a great fan of map because I don't find it that much more efficient and it is harder to understand what is going on.
To check whether a pattern exists in a file, you have to open the file and read its content. The fastest way to check which items of one list appear in another is to store one of them in a hash:
#!/usr/bin/perl
use strict;
use warnings;
open my $LST, '<', 'fileA' or die "fileA: $!\n";
open my $FB, '<', 'fileB' or die "fileB: $!\n";
my %hash;
while (<$FB>) {
chomp;
undef $hash{$_};
}
while (<$LST>) {
chomp;
if (exists $hash{$_}) {
print "$_ exists in fileB.\n";
}
}
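The asker's original loop wanted a found/notFound verdict for every name in fileA. Here is a sketch of that using the same hash approach. The names are made up, and in-memory strings stand in for fileA and fileB:

```perl
use strict;
use warnings;

# In-memory stand-ins for fileA (names to look up) and fileB (controls).
my $file_a = "alice\nbob\ncarol\n";
my $file_b = "carol\ndave\nalice\n";

open my $LST, '<', \$file_a or die "fileA: $!";
open my $FB,  '<', \$file_b or die "fileB: $!";

# Read fileB once into a hash; only the keys matter.
my %seen;
while (my $line = <$FB>) {
    chomp $line;
    $seen{$line} = 1;
}

# One verdict per name, instead of one per line of fileB.
my @report;
while (my $name = <$LST>) {
    chomp $name;
    push @report, exists $seen{$name} ? "$name\tfound" : "$name\tnotFound";
}
print "$_\n" for @report;
```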
This is untested, algorithm-style code, but I think it does the job for you.
use strict;
use warnings;
my @a;
my $matched;
my $line;
open(A, '<', "fileA") or die "fileA: $!";
open(B, '<', "fileB") or die "fileB: $!";
while(<A>)
{
chomp;
push @a,$_;
}
while(<B>)
{
chomp;
$line=$_;
$matched=0;
# set the flag before leaving the loop; \Q...\E matches each name literally
for(@a){if($line=~/\Q$_\E/){$matched=1;last}}
if($matched)
{
# do something
}
else
{
# do something else
}
}
So I have a text file with the following line:
123456789
But then I have a second file:
987654321
So how can I make the first file's contents the keys in a hash, and the second file's contents the values? (Each character is a key/value.)
Should I store each file into different arrays and then somehow merge them? How would I do that? Anything else?
Honestly, I would give you my code I have tried, but I haven't the slightest idea where to start.
You could use a hash slice.
If each line is a key/value: (s///r requires 5.14, but it can easily be rewritten for earlier versions)
my %h;
@h{ map s/\s+\z//r, <$fh1> } = map s/\s+\z//r, <$fh2>;
If each character is a key/value:
my %h;
{
local $/ = \1;
@h{ grep !/\n/, <$fh1> } = grep !/\n/, <$fh2>;
}
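If the characters are already in two strings (for example, after reading and chomping one line from each file), split // combined with a hash slice does the same job without touching $/. This is a sketch using the two lines from the question:

```perl
use strict;
use warnings;

my $keys_line   = '123456789';   # contents of the first file (from the question)
my $values_line = '987654321';   # contents of the second file

# A hash slice assigns the whole value list to the whole key list in one statement.
my %h;
@h{ split //, $keys_line } = split //, $values_line;

print "$_ => $h{$_}\n" for sort keys %h;
```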
Just open both files and read them line by line simultaneously:
use strict; use warnings;
use autodie;
my %hash;
open my $keyFile, '<', 'keyfileName';
open my $valueFile, '<', 'valuefileName';
while(my $key = <$keyFile>) {
my $value = <$valueFile>;
chomp for $key, $value;
$hash{$key} = $value;
}
Of course this is just a quick sketch on how it could work.
The OP mentions that each character is a key or value. By this I take it that the output should be a hash like ( 1 => 9, 2 => 8, ... ). The OP also asks:
Should I store each file into different arrays and then somehow merge them? How would I do that?
This is exactly how this answer works. Here get_chars is a function that reads in a file, splits each line into characters and returns them as a list. Then zip from List::MoreUtils interleaves the two lists to create the hash.
#!/usr/bin/env perl
use strict;
use warnings;
use List::MoreUtils 'zip';
my ($file1, $file2) = @ARGV;
my @file1chars = get_chars($file1);
my @file2chars = get_chars($file2);
my %hash = zip @file1chars, @file2chars;
use Data::Dumper;
print Dumper \%hash;
sub get_chars {
my $filename = shift;
open my $fh, '<', $filename
or die "Could not open $filename: $!";
my @chars;
while (<$fh>) {
chomp;
push @chars, split //;
}
return @chars;
}
Iterator madness:
#!/usr/bin/env perl
use autodie;
use strict; use warnings;
my $keyfile_contents = join("\n", 'A' .. 'J');
my $valuefile_contents = join("\n", map ord, 'A' .. 'E');
# Use get_iterator($keyfile, $valuefile) to read from physical files
my $each = get_iterator(\ ($keyfile_contents, $valuefile_contents) );
my %hash;
while (my ($k, $v) = $each->()) {
$hash{ $k } = $v;
}
use YAML;
print Dump \%hash;
sub get_iterator {
my ($keyfile, $valuefile) = @_;
open my $keyf, '<', $keyfile;
open my $valf, '<', $valuefile;
return sub {
my $key = <$keyf>;
return unless defined $key;
my $value = <$valf>;
chomp for grep defined, $key, $value;
return $key => $value;
};
}
Output:
C:\temp> yy
---
A: 65
B: 66
C: 67
D: 68
E: 69
F: ~
G: ~
H: ~
I: ~
J: ~
I would write
my %hash = ('123456789' => '987654321');
I quickly jotted off a Perl script that averages a few files containing only columns of numbers. It involves reading from an array of filehandles. Here is the script:
#!/usr/local/bin/perl
use strict;
use warnings;
use Symbol;
die "Usage: $0 file1 [file2 ...]\n" unless scalar(@ARGV);
my @fhs;
foreach(@ARGV){
my $fh = gensym;
open $fh, $_ or die "Unable to open \"$_\"";
push(#fhs, $fh);
}
while (scalar(@fhs)){
my ($result, $n, $a, $i) = (0,0,0,0);
while ($i <= $#fhs){
if ($a = <$fhs[$i]>){
$result += $a;
$n++;
$i++;
}
else{
$fhs[$i]->close;
splice(@fhs,$i,1);
}
}
if ($n){ print $result/$n . "\n"; }
}
This doesn't work. If I debug the script, after I initialize @fhs it looks like this:
DB<1> x @fhs
0 GLOB(0x10443d80)
-> *Symbol::GEN0
FileHandle({*Symbol::GEN0}) => fileno(6)
1 GLOB(0x10443e60)
-> *Symbol::GEN1
FileHandle({*Symbol::GEN1}) => fileno(7)
So far, so good. But it fails at the part where I try to read from the file:
DB<3> x $fhs[$i]
0 GLOB(0x10443d80)
-> *Symbol::GEN0
FileHandle({*Symbol::GEN0}) => fileno(6)
DB<4> x $a
0 'GLOB(0x10443d80)'
$a is filled with this string rather than something read from the glob. What have I done wrong?
You can only use a simple scalar variable inside <> to read from a filehandle. <$foo> works. <$foo[0]> does not read from a filehandle; it's actually equivalent to glob($foo[0]). You'll have to use the readline builtin, a temporary variable, or use IO::File and OO notation.
$text = readline($foo[0]);
# or
my $fh = $foo[0]; $text = <$fh>;
# or
$text = $foo[0]->getline; # If using IO::File
If you weren't deleting elements from the array inside the loop, you could easily use a temporary variable by changing your while loop to a foreach loop.
Personally, I think using gensym to create filehandles is an ugly hack. You should either use IO::File, or pass an undefined variable to open (which requires at least Perl 5.6.0, but that's almost 10 years old now). (Just say my $fh; instead of my $fh = gensym;, and Perl will automatically create a new filehandle and store it in $fh when you call open.)
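Here is a sketch of the question's averaging loop with only the read changed to readline. It is wrapped in a hypothetical average_rows sub and fed two in-memory handles so it can run standalone. It also tests the result with defined, so that a final line consisting of just 0 (with no trailing newline) is not mistaken for end of file:

```perl
use strict;
use warnings;

# Average the numbers found on the same line across several files,
# assuming (as in the question) one number per line in each file.
sub average_rows {
    my @fhs = @_;
    my @averages;
    while (@fhs) {
        my ($sum, $n, $i) = (0, 0, 0);
        while ($i <= $#fhs) {
            # readline() works on an array element; <$fhs[$i]> would be glob()
            my $value = readline($fhs[$i]);
            if (defined $value) {
                $sum += $value;
                $n++;
                $i++;
            }
            else {
                close $fhs[$i];
                splice(@fhs, $i, 1);   # drop the exhausted handle; don't advance $i
            }
        }
        push @averages, $sum / $n if $n;
    }
    return @averages;
}

# In-memory files stand in for real ones here.
my ($f1, $f2) = ("1\n2\n3\n", "3\n4\n");
open my $fh1, '<', \$f1 or die $!;
open my $fh2, '<', \$f2 or die $!;
my @avg = average_rows($fh1, $fh2);
print "$_\n" for @avg;
```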
If you are willing to use a bit of magic, you can do this very simply:
use strict;
use warnings;
die "Usage: $0 file1 [file2 ...]\n" unless @ARGV;
my $sum = 0;
# The current filehandle is aliased to ARGV
while (<>) {
$sum += $_;
}
continue {
# We have finished a file:
if( eof ARGV ) {
# $. is the current line number.
print $sum/$. , "\n" if $.;
$sum = 0;
# Closing ARGV resets $. because ARGV is
# implicitly reopened for the next file.
close ARGV;
}
}
Unless you are using a very old perl, the messing about with gensym is not necessary. IIRC, perl 5.6 and newer are happy with normal lexical handles: open my $fh, '<', 'foo';
I have trouble understanding your logic. Do you want to read several files, each containing only numbers (one number per line), and print the average of each?
use strict;
use warnings;
my @fh;
foreach my $f (@ARGV) {
open(my $fh, '<', $f) or die "Cannot open $f: $!";
push @fh, $fh;
}
foreach my $fh (@fh) {
my ($sum, $n) = (0, 0);
while (<$fh>) {
$sum += $_;
$n++;
}
print "$sum / $n: ", $sum / $n, "\n" if $n;
}
It seems like a for loop would work better for you, since you could then use the standard read (iteration) operator.
for my $fh ( @fhs ) {
while ( defined( my $line = <$fh> )) {
# since we're reading integers we test for *defined*
# so we don't close the file on '0'
#...
}
close $fh;
}
It doesn't look like you want to shortcut the loop at all. Therefore, while seems to be the wrong loop idiom.