subroutine in perl to perform over #ARGV entries - perl

I've got a small programme that basically processes lists of blast hits, and checks to see if there's overlap between the blast results by iterating blast results (as hash key) through hashes containing each blast list.
This involves processing each blast input file as $ARGV in the same way. Depending on what I'm trying to achieve, I might want to compare 2, 3 or 4 blast lists for gene overlap. I want to know how I can write the basic processing block as a subroutine that I can iterate over for however many $ARGV arguments exist.
For example, the below works fine if I input 2 blast lists:
#!/usr/bin/perl -w
use strict;
use File::Slurp;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;
if ($#ARGV != 1){
die "Usage: intersect.pl <de gene list 1><de gene list 2>\n"
}
my $input1 = $ARGV[0];
open my $blast1, '<', $input1 or die $!;
my $results1 = 0;
my (#blast1ID, #blast1_info, #split);
while (<$blast1>) {
chomp;
#split = split('\t');
push #blast1_info, $split[0];
push #blast1ID, $split[2];
$results1++;
}
print "$results1 blast hits in $input1\n";
my %blast1;
push #{$blast1{$blast1ID[$_]} }, [ $blast1_info[$_] ] for 0 .. $#blast1ID;
#print Dumper (\%blast1);
my $input2 = $ARGV[1];
open my $blast2, '<', $input2 or die $!;
my $results2 = 0;
my (#blast2ID, #blast2_info);
while (<$blast2>) {
chomp;
#split = split('\t');
push #blast2_info, $split[0];
push #blast2ID, $split[2];
$results2++;
}
my %blast2;
push #{$blast2{$blast2ID[$_]} }, [ $blast2_info[$_] ] for 0 .. $#blast2ID;
#print Dumper (\%blast2);
print "$results2 blast hits in $input2\n";
But I would like to be able to adjust it to cater for 3 or 4 blast lists inputs. I imagine a sub routine would work best for this, that is invoked for each input, and might look something like this:
sub process {
my $input$i = $ARGV[$i-1];
open my $blast$i, '<', $input[$i] or die $!;
my $results$i = 0;
my (#blast$iID, #blast$i_info, #split);
while (<$blast$i>) {
chomp;
#split = split('\t');
push #blast$i_info, $split[0];
push #blast$iID, $split[2];
$results$i++;
}
print "$results$i blast hits in $input$i\n";
print Dumper (\#blast$i_info);
print Dumper (\#blast$iID);
# Call sub 'process for every ARGV...
&process for 0 .. $#ARGV;
UPDATE:
I've removed the hash part for the last snippet.
The resultant data structure should be:
4 blast hits in <$input$i>
$VAR1 = [
'TCONS_00001332(XLOC_000827),_4.60257:9.53943,_Change:1.05146,_p:0.03605,_q:0.998852',
'TCONS_00001348(XLOC_000833),_0.569771:6.50403,_Change:3.51288,_p:0.0331,_q:0.998852',
'TCONS_00001355(XLOC_000837),_10.8634:24.3785,_Change:1.16613,_p:0.001,_q:0.998852',
'TCONS_00002204(XLOC_001374),_0.316322:5.32111,_Change:4.07226,_p:0.00485,_q:0.998852',
];
$VAR1 = [
'gi|50418055|gb|BC078036.1|_Xenopus_laevis_cDNA_clone_MGC:82763_IMAGE:5156829,_complete_cds',
'gi|283799550|emb|FN550108.1|_Xenopus_(Silurana)_tropicalis_mRNA_for_alpha-2,3-sialyltransferase_ST3Gal_V_(st3gal5_gene)',
'gi|147903202|ref|NM_001097651.1|_Xenopus_laevis_forkhead_box_I4,_gene_1_(foxi4.1),_mRNA',
'gi|2598062|emb|AJ001730.1|_Xenopus_laevis_mRNA_for_Xsox17-alpha_protein',
];
And the input:
TCONS_00001332(XLOC_000827),_4.60257:9.53943,_Change:1.05146,_p:0.03605,_q:0.998852 0.0 gi|50418055|gb|BC078036.1|_Xenopus_laevis_cDNA_clone_MGC:82763_IMAGE:5156829,_complete_cds
TCONS_00001348(XLOC_000833),_0.569771:6.50403,_Change:3.51288,_p:0.0331,_q:0.998852 0.0 gi|283799550|emb|FN550108.1|_Xenopus_(Silurana)_tropicalis_mRNA_for_alpha-2,3-sialyltransferase_ST3Gal_V_(st3gal5_gene)
TCONS_00001355(XLOC_000837),_10.8634:24.3785,_Change:1.16613,_p:0.001,_q:0.998852 0.0 gi|147903202|ref|NM_001097651.1|_Xenopus_laevis_forkhead_box_I4,_gene_1_(foxi4.1),_mRNA
TCONS_00002204(XLOC_001374),_0.316322:5.32111,_Change:4.07226,_p:0.00485,_q:0.998852 0.0 gi|2598062|emb|AJ001730.1|_Xenopus_laevis_mRNA_for_Xsox17-alpha_protein

You can't inject a variable value in the middle of a variable name. (Well, you can but you shouldn't. Even then you and can't use array indexing in the middle of the name.)
These names aren't valid:
#blast[$i]_info
#blast[$i]_ID
You need to move the index to the end:
#blast_info[$i]
#blast_ID[$i]
That said, I'd get rid of the arrays completely and use a hash instead.
Your second code snippet doesn't show a call to your subroutine. Unless it's explicitly called it will never run and your program will do nothing. I'd modify the process sub to take a single argument and call it for each element of #ARGV. e.g.
process($_) foreach #ARGV;
Here's how I'd write your program:
use strict;
use warnings;
use Data::Dumper;
my #blast;
push #blast, process($_) foreach #ARGV;
print Dumper(\#blast);
sub process {
my $file = shift;
open my $fh, '<', $file or die "Can't read file '$file' [$!]\n";
my %data;
while (<$fh>) {
chomp;
my ($id, undef, $info) = split '\t';
$data{$id} = $info;
}
return \%data;
}
It isn't quite clear what your resulting data structure should look like. (I took my best guess.) I recommend reading perlreftut to gain a better basic understanding of references and using them to build data structures in Perl.

Related

Perl: Read columns and convert to array

I am new to perl, trying to read a file with columns and creating an array.
I am having a file with following columns.
file.txt
A 15
A 20
A 33
B 20
B 45
C 32
C 78
I wanted to create an array for each unique item present in A with its values assigned from second column.
eg:
#A = (15,20,33)
#B = (20,45)
#C = (32,78)
Tried following code, only for printing 2 columns
use strict;
use warnings;
my $filename = $ARGV[0];
open(FILE, $filename) or die "Could not open file '$filename' $!";
my %seen;
while (<FILE>)
{
chomp;
my $line = $_;
my #elements = split (" ", $line);
my $row_name = join "\t", #elements[0,1];
print $row_name . "\n" if ! $seen{$row_name}++;
}
close FILE;
Thanks
Firstly some general Perl advice. These days, we like to use lexical variables as filehandles and pass three arguments to open().
open(my $fh, '<', $filename) or die "Could not open file '$filename' $!";
And then...
while (<$fh>) { ... }
But, given that you have your filename in $ARGV[0], another tip is to use an empty file input operator (<>) which will return data from the files named in #ARGV without you having to open them. So you can remove your open() line completely and replace the while with:
while (<>) { ... }
Second piece of advice - don't store this data in individual arrays. Far better to store it in a more complex data structure. I'd suggest a hash where the key is the letter and the value is an array containing all of the numbers matching that letter. This is surprisingly easy to build:
use strict;
use warnings;
use feature 'say';
my %data; # I'd give this a better name if I knew what your data was
while (<>) {
chomp;
my ($letter, $number) = split; # splits $_ on whitespace by default
push #{ $data{$letter} }, $number;
}
# Walk the hash to see what we've got
for (sort keys %data) {
say "$_ : #{ $data{$_ } }";
}
Change the loop to be something like:
while (my $line = <FILE>)
{
chomp($line);
my #elements = split (" ", $line);
push(#{$seen{$elements[0]}}, $elements[1]);
}
This will create/append a list of each item as it is found, and result in a hash where the keys are the left items, and the values are lists of the right items. You can then process or reassign the values as you wish.

Perl Merge file

I have 3 or multiple files I need to merge, the data looks like this..
file 1
0334.45656
0334.45678
0335.67899
file 2
0334.89765
0335.12346
0335.56789
file 3
0334.12345
0335.45678
0335.98764
Expected output in file 4,
0334.89765
0334.89765
0334.89765
0334.12345
0335.67899
0335.12346
0335.56789
0335.45678
0335.98764
So far I have tried but data in 4rth file does not come in sorted order,
#!/usr/bin/perl
my %hash;
my $outFile = "outFile.txt";
foreach $file(#ARGV)
{
print "$file\n";
open (IN, "$file") || die "cannot open file $!";
open (OUT,">>$outFile") || die "cannot open file $!";
while ( <IN> )
{
chomp $_;
($timestamp,$data) = split (/\./,$_);
$hash{$timeStamp}{'data'}=$data;
if (defined $hash{$timeStamp})
{
print "$_\n";
print OUT"$_\n";
}
}
}
close (IN);
close (OUT);
I wouldn't normally suggest this, but unix utilties should be able to handle this just fine.
cat the 3 files together.
use sort to sort the merged file.
However, using perl, could just do the following:
#!/usr/bin/perl
use strict;
use warnings;
my #data;
push #data, $_ while (<>);
# Because the numbers are all equal length, alpha sort will work here
print for sort #data;
However, as we've discussed, it's possible that the files will be extremely large. Therefore it will be more efficient both in memory and speed if you're able to take advantage of the fact that all the files are already sorted.
The following solution therefore streams the files, pulling out the next one in order each loop of the while:
#!/usr/bin/perl
# Could name this catsort.pl
use strict;
use warnings;
use autodie;
# Initialize File handles
my #fhs = map {open my $fh, '<', $_; $fh} #ARGV;
# First Line of each file
my #data = map {scalar <$_>} #fhs;
# Loop while a next line exists
while (#data) {
# Pull out the next entry.
my $index = (sort {$data[$a] cmp $data[$b]} (0..$#data))[0];
print $data[$index];
# Fill In next Data at index.
if (! defined($data[$index] = readline $fhs[$index])) {
# End of that File
splice #fhs, $index, 1;
splice #data, $index, 1;
}
}
Using Miller's idea in a more reusable way,
use strict;
use warnings;
sub get_sort_iterator {
my #fhs = map {open my $fh, '<', $_ or die $!; $fh} #_;
my #d;
return sub {
for my $i (0 .. $#fhs) {
# skip to next file handle if it doesn't exists or we have value in $d[$i]
next if !$fhs[$i] or defined $d[$i];
# reading from $fhs[$i] file handle was success?
if ( defined($d[$i] = readline($fhs[$i])) ) { chomp($d[$i]) }
# file handle at EOF, not needed any more
else { undef $fhs[$i] }
}
# compare as numbers, return undef if no more data
my ($index) = sort {$d[$a] <=> $d[$b]} grep { defined $d[$_] } 0..$#d
or return;
# return value from $d[$index], and set it to undef
return delete $d[$index];
};
}
my $iter = get_sort_iterator(#ARGV);
while (defined(my $x = $iter->())) {
print "$x\n";
}
output
0334.12345
0334.45656
0334.45678
0334.89765
0335.12346
0335.45678
0335.56789
0335.67899
0335.98764
Suppose every input files are already in ascending order and have at least one line in them, this script could merge them in ascending order:
#!/usr/bin/perl
use warnings;
use strict;
use List::Util 'reduce';
sub min_index {
reduce { $_[$a] < $_[$b] ? $a : $b } 0 .. $#_;
}
my #fhs = map { open my $fh, '<', $_; $fh } #ARGV;
my #data = map { scalar <$_> } #fhs;
while (#data) {
my $idx = min_index(#data);
print "$data[$idx]";
if (! defined($data[$idx] = readline $fhs[$idx])) {
splice #data, $idx, 1;
splice #fhs, $idx, 1;
}
}
Note: this is basic the same as the second script offered by #Miller, but a bit clearer and more concise.
I suggest this solution, which uses a sorted array of hashes - each hash corresponding to an input file, and containing a file handle fh, the last line read line and the timestamp extracted from the line timestamp.
The hash at the end of the array always corresponds to the input that has the smallest value for the timestamp, so all that is necessary is to repeateedly pop the next value from the array, print its data, read the next line and (if it hasn't reached eof) insert it back into the array in sorted order.
This could produce an appreciable increase in speed over the repeated sorting of all the data for each output line that other answers use.
Note that the program expects the list of input files as parameters on the command line, and sends its merged output to STDOUT. It also assumes that the input files are already sorted.
use strict;
use warnings;
use autodie;
my #data;
for my $file (#ARGV) {
my $item;
open $item->{fh}, '<', $file;
insert_item($item, \#data);
}
while (#data) {
my $item = pop #data;
print $item->{line};
insert_item($item, \#data);
}
sub insert_item {
my ($item, $array) = #_;
return if eof $item->{fh};
$item->{line} = readline $item->{fh};
($item->{timestamp}) = $item->{line} =~ /^(\d+)/;
my $i = 0;
++$i while $i < #$array and $item->{timestamp} < $array->[$i]{timestamp};
splice #$array, $i, 0, $item;
}
output
0334.45656
0334.89765
0334.12345
0334.45678
0335.12346
0335.45678
0335.67899
0335.56789
0335.98764

how to extract substrings by knowing the coordinates

I am terribly sorry for bothering you with my problem in several questions, but I need to solve it...
I want to extract several substrings from a file whick contains string by using another file with the begin and the end of each substring that I want to extract.
The first file is like:
>scaffold30 24194
CTTAGCAGCAGCAGCAGCAGTGACTGAAGGAACTGAGAAAAAGAGCGAGCTGAAAGGAAGCATAGCCATTTGGGAGTGCCAGAGAGTTGGGAGG GAGGGAGGGCAGAGATGGAAGAAGAAAGGCAGAAATACAGGGAGATTGAGGATCACCAGGGAG.........
.................
(the string must be everything in the file except the first line), and the coordinates file is like:
44801988 44802104
44846151 44846312
45620133 45620274
45640443 45640543
45688249 45688358
45729531 45729658
45843362 45843490
46066894 46066996
46176337 46176464
.....................
my script is this:
my $chrom = $ARGV[0];
my $coords_file = $ARGV[1];
#finds subsequences: fasta files
open INFILE1, $chrom or die "Could not open $chrom: $!";
my $count = 0;
while(<INFILE1>) {
if ($_ !~ m/^>/) {
local $/ = undef;
my $var = <INFILE1>;
open INFILE, $coords_file or die "Could not open $coords_file: $!";
my #cline = <INFILE>;
foreach my $cline (#cline) {
print "$cline\n";
my#data = split('\t', $cline);
my $start = $data[0];
my $end = $data[1];
my $offset = $end - $start;
$count++;
my $sub = substr ($var, $start, $offset);
print ">conserved $count\n";
print "$sub\n";
}
close INFILE;
}
}
when I run it, it looks like it does only one iteration and it prints me the start of the first file.
It seems like the foreach loop doesn't work.
also substr seems that doesn't work.
when I put an exit to print the cline to check the loop, it prints all the lines of the file with the coordinates.
I am sorry if I become annoying, but I must finish it and I am a little bit desperate...
Thank you again.
This line
local $/ = undef;
changes $/ for the entire enclosing block, which includes the section where you read in your second file. $/ is the input record separator, which essentially defines what a "line" is (it is a newline by default, see perldoc perlvar for details). When you read from a filehandle using <>, $/ is used to determine where to stop reading. For example, the following program relies on the default line-splitting behavior, and so only reads until the first newline:
my $foo = <DATA>;
say $foo;
# Output:
# 1
__DATA__
1
2
3
Whereas this program reads all the way to EOF:
local $/;
my $foo = <DATA>;
say $foo;
# Output:
# 1
# 2
# 3
__DATA__
1
2
3
This means your #cline array gets only one element, which is a string containing the text of your entire coordinates file. You can see this using Data::Dumper:
use Data::Dumper;
print Dumper(\#cline);
Which in your case will output something like:
$VAR1 = [
'44801988 44802104
44846151 44846312
45620133 45620274
45640443 45640543
45688249 45688358
45729531 45729658
45843362 45843490
46066894 46066996
46176337 46176464
'
];
Notice how your array (technically an arrayref in this case), delineated by [ and ], contains only a single element, which is a string (delineated by single quotes) that contains newlines.
Let's walk through the relevant sections of your code:
while(<INFILE1>) {
if ($_ !~ m/^>/) {
# Enable localized slurp mode. Stays in effect until we leave the 'if'
local $/ = undef;
# Read the rest of INFILE1 into $var (from current line to EOF)
my $var = <INFILE1>;
open INFILE, $coords_file or die "Could not open $coords_file: $!";
# In list context, return each block until the $/ character as a
# separate list element. Since $/ is still undef, this will read
# everything until EOF into our first list element, resulting in
# a one-element array
my #cline = <INFILE>;
# Since #cline only has one element, the loop only has one iteration
foreach my $cline (#cline) {
As a side note, your code could be cleaned up a bit. The names you chose for your filehandles leave something to be desired, and you should probably use lexical filehandles anyway (and the three-argument form of open):
open my $chromosome_fh, "<", $ARGV[0] or die $!;
open my $coordinates_fh, "<", $ARGV[1] or die $!;
Also, you do not need to nest your loops in this case, it just makes your code more convoluted. First read the relevant parts of your chromosome file into a variable (named something more meaningful than var):
# Get rid of the `local $/` statement, we don't need it
my $chromosome;
while (<$chromosome_fh>) {
next if /^>/;
$chromosome .= $_;
}
Then read in your coordinates file:
my #cline = <$coordinates_fh>;
Or if you only need to use the contents of the coordinates file once, process each line as you go using a while loop:
while (<$coordinates_fh>) {
# Do something for each line here
}
As 'ThisSuitIsBlackNot' suggested, your code could be cleaned up a little. Here is a possible solution that may be what you want.
#!/usr/bin/perl
use strict;
use warnings;
my $chrom = $ARGV[0];
my $coords_file = $ARGV[1];
#finds subsequences: fasta files
open INFILE1, $chrom or die "Could not open $chrom: $!";
my $fasta;
<INFILE1>; # get rid of the first line - '>scaffold30 24194'
while(<INFILE1>) {
chomp;
$fasta .= $_;
}
close INFILE1 or die "Could not close '$chrom'. $!";
open INFILE, $coords_file or die "Could not open $coords_file: $!";
my $count = 0;
while(<INFILE>) {
my ($start, $end) = split;
# Or, should this be: my $offset = $end - ($start - 1);
# That would include the start fasta
my $offset = $end - $start;
$count++;
my $sub = substr ($fasta, $start, $offset);
print ">conserved $count\n";
print "$sub\n";
}
close INFILE or die "Could not close '$coords_file'. $!";

perl to loop and check across files

still having trouble with perl programming and I need to be pushed to make a script work out.
I have two files and I want to use the list file to "extract" rows from the data one. The problem is that the list file is formatted as follow:
X1 A B
X2 C D
X3 E F
And my data looks like this:
A X1 2 5
B X1 3 7
C X2 1 4
D X2 1 5
I need to obtain the element pairs from the list file by which select the row in the data file. At the same time I would like to write an output like this:
X1 A B 2 5 3 7
X2 C D 1 4 1 5
I'm trying writing a perl code, but I'm not able to produce something useful. I'm at this point:
open (LIST, "< $fils_list") || die "impossibile open the list";
#list = <LIST>;
close (LIST);
open (HAN, "< $data") || die "Impossible open data";
#r = <HAN>;
close (HAN);
for ($p=0; $p<=$#list; $p++){
chomp ($list[$p]);
($x, $id1, $id2) = split (/\t/, $list[$p]);
$pair_one = $id1."\t".$x;
$pair_two = $id2."\t".$x;
for ($i=0; $i<=$#r; $i++){
chomp ($r[$i]);
($a, $b, $value1, $value2) = split (/\t/, $r[$i]);
$bench = $a."\t".$b;
if (($pair_one eq $bench) || ($pair_two eq $bench)){
print "I don't know what does this script must print!\n";
}
}
}
I'm not able to rationalize about what to print.
Any kind of suggestion is very welcome!
A few general recommendations:
Indent your code to show the structure of your program.
Use meaningful variable names, not $a or $value1 (if I do so below, this is due to my lack of domain knowledge).
Use data structures that suit your program.
Don't do operations like parsing a line more that once.
In Perl, every program should use strict; use warnings;.
use autodie for automatic error handling.
Also, use the open function like open my $fh, "<", $filename as this is safer.
Remember what I said about data structures? In the second file, you have entries like
A X1 2 5
This looks like a secondary key, a primary key, and some data columns. Key-value relationships are best expressed through a hash table.
use strict; use warnings; use autodie;
use feature 'say'; # available since 5.010
open my $data_fh, "<", $data;
my %data;
while (<$data_fh>) {
chomp; # remove newlines
my ($id2, $id1, #data) = split /\t/;
$data{$id1}{$id2} = \#data;
}
Now %data is a nested hash which we can use for easy lookups:
open my $list_fh, "<", $fils_list;
LINE: while(<$list_fh>) {
chomp;
my ($id1, #id2s) = split /\t/;
my $data_id1 = $data{$id1};
defined $data_id1 or next LINE; # maybe there isn't anything here. Then skip
my #values = map #{ $data_id1->{$_} }, #id2s; # map the 2nd level ids to their values and flatten the list
# now print everything out:
say join "\t", $id1, #id2s, #values;
}
The map function is a bit like a foreach loop, and builds a list of values. We need the #{ ... } here because the data structure doesn't hold arrays, but references to arrays. The #{ ... } is a dereference operator.
This is how i would do it, mostly using Hashes resp. Hash- and Array-References (test1.txt and test2.txt contain the data you provided in your example):
use strict;
use warnings;
open(my $f1, '<','test1.txt') or die "Cannot open file1: $!\n";
open(my $f2, '<','test2.txt') or die "Cannot open file2: $!\n";
my #data1 = <$f1>;
my #data2 = <$f2>;
close($f1);
close($f2);
chomp #data1;
chomp #data2;
my %result;
foreach my $line1 (#data1) {
my #fields1 = split(' ',$line1);
$result{$fields1[0]}->{$fields1[1]} = [];
$result{$fields1[0]}->{$fields1[2]} = [];
}
foreach my $line2 (#data2){
my #fields2 = split(' ',$line2);
push #{$result{$fields2[1]}->{$fields2[0]}}, $fields2[2];
push #{$result{$fields2[1]}->{$fields2[0]}}, $fields2[3];
}
foreach my $res (sort keys %result){
foreach (sort keys %{$result{$res}}){
print $res . " " . $_ . " " . join (" ", sort #{$result{$res}->{$_}}) . "\n";
}
}

Count the occurrences of a string in a file

I wrote a perl script to count the occurrences of a character in a file.
So far this is what I have got,
#!/usr/bin/perl -w
use warnings;
no warnings ('uninitialized', 'substr');
my $lines_ref;
my #lines;
my $count;
sub countModule()
{
my $file = "/test";
open my $fh, "<",$file or die "could not open $file: $!";
my #contents = $fh;
my #filtered = grep (/\// ,#contents);
return \#filtered;
}
#lines = countModule();
##lines = $lines_ref;
$count = #lines;
print "###########\n $count \n###########\n";
My test file looks like this:
10.0.0.1/24
192.168.10.0/24
172.16.30.1/24
I am basically trying to count the number of instances of "/"
This is the output that I get:
###########
1
###########
I am getting 1 instead of 3, which is the number of occurrences.
Still learning perl, so any help will be appreciated..Thank you!!
Here are a few points about your code
You should always use strict at the top of your program, and only use no warnings for special reasons in a limited scope. There is no general reason why a working Perl program should need to disable warnings globally
Declare your variables close to their first point of use. The style of declaring everything at the top of the file is unnecessary and is a legacy of C
Never use prototypes in your code. They are available for very special purposes and shouldn't be used for the vast majority of Perl code. sub countModule() { ... } insists that countModule may never be called with any parameters and isn't necessary or useful. The definition should be just sub countModule { ... }
A big well done! for using a lexical file handle, the three-parameter form of open, and putting $! in your die string
my #contents = $fh will just set #contents to a single-element list containing just the filehandle. To read the whole file into the array you need my #contents = <$fh>
You can avoid escaping slashes in a regular expression if you use a different delimiter. To do that you need to use the m operator explicitly, like my #filtered = grep m|/|, #contents)
You return an array reference but assign the returned value to an array, so #lines = countModule() sets #lines to a single-element list containing just the array reference. You should either return a list with return #filtered or dereference the return value on assignment with #lines = #{ countModule }
If all you need to do is to print the number of lines in the file that contain a slash character then you could write something like this
use strict;
use warnings;
my $count;
sub countModule {
open my $fh, '<', '/test' or die "Could not open $file: $!";
return [ grep m|/|, <$fh> ];
}
my $lines = countModule;
$count = #$lines;
print "###########\n $count \n###########\n";
Close, but a few issues:
use strict;
use warnings;
sub countModule
{
my $file = "/test";
open my $fh, "<",$file or die "could not open $file: $!";
my #contents = <$fh>; # The <> brackets are used to read from $fh.
my #filtered = grep (/\// ,#contents);
return #filtered; # Remove the reference.
}
my #lines = countModule();
my $count = scalar #lines; # 'scalar' is not required, but lends clarity.
print "###########\n $count \n###########\n";
Each of the changes I made to your code are annotated with a #comment explaining what was done.
Now in list context your subroutine will return the filtered lines. In scalar context it will return a count of how many lines were filtered.
You did also mention find the occurrences of a character (despite everything in your script being line-oriented). Perhaps your counter sub would look like this:
sub file_tallies{
my $file = '/test';
open my $fh, '<', $file or die $!;
my $count;
my $lines;
while( <$fh> ) {
$lines++;
$count += $_ =~ tr[\/][\/];
}
return ( $lines, $count );
}
my( $line_count, $slash_count ) = file_tallies();
In list context,
return \#filtered;
returns a list with one element -- a reference to the named array #filtered. Maybe you wanted to return the list itself
return #filtered;
Here's some simpler code:
sub countMatches {
my ($file, $c) = #_; # Pass parameters
local $/;
undef $/; # Slurp input
open my $fh, "<",$file or die "could not open $file: $!";
my $s = <$fh>; # The <> brackets are used to read from $fh.
close $fh;
my $ptn = quotemeta($c); # So we can match strings like ".*" verbatim
my #hits = $s =~ m/($ptn)/g;
0 + #hits
}
print countMatches ("/test", '/') . "\n";
The code pushes Perl beyond the very basics, but not too much. Salient points:
By undeffing $/ you can read the input into one string. If you're counting
occurrences of a string in a file, and not occurrences of lines that contain
the string, this is usually easier to do.
m/(...)/g will find all the hits, but if you want to count strings like
"." you need to quote the meta characters in them.
Store the results in an array to evaluate m// in list context
Adding 0 to a list gives the number of items in it.