Add counter to if statement - perl

How can I add a counter to this statement?
# go through each reference file
for my $file (@reference_files)
{
open my $ref, "<", $file or die "Can't open reference file '$file': $!";
while (my $line = <$ref>)
{
chomp $line;
my ($scaffold, undef, $type, $org_snp, $new_snp, undef, undef, undef, $info) = split /\t/, $line;
next if not $scaffold =~ /^KB/;
next if not $type =~ /^GENE/i;
my ($transcript_id, $gene_name, $auto) = split /[;][ ]/, $info;
$gene_name = $1 if $gene_name =~ /["]([^"]*)["]/;
if (my $matching_genes = $genes{$scaffold})
{
say join "\t", $gene_name, $_ for values %$matching_genes;
}
}
say "###";
}
I would like the script to additionally count all $matching_genes. Is there a way to incorporate this? I've been unsuccessful with standard counters (i.e. $i++) as it's pulling all values in the hash.

You can declare a counter variable, initialized to 0, at the top of the script before your for loop:
my $counter = 0;
# go through each reference file
for my $file (@reference_files)
# ... Rest of your code ...
Then, you can increment $counter inside of the if statement where $matching_genes is assigned:
if (my $matching_genes = $genes{$scaffold})
{
$counter++;
say join "\t", $gene_name, $_ for values %$matching_genes;
}

my $count=0;
# go through each reference file
for my $file (@reference_files)
{
open my $ref, "<", $file or die "Can't open reference file '$file': $!";
while (my $line = <$ref>)
{
chomp $line;
my ($scaffold, undef, $type, $org_snp, $new_snp, undef, undef, undef, $info) = split /\t/, $line;
next if not $scaffold =~ /^KB/;
next if not $type =~ /^GENE/i;
my ($transcript_id, $gene_name, $auto) = split /[;][ ]/, $info;
$gene_name = $1 if $gene_name =~ /["]([^"]*)["]/;
if (my $matching_genes = $genes{$scaffold})
{
say join "\t", $gene_name, $_ for values %$matching_genes;
$count += scalar(keys %$matching_genes);
}
}
say "###";
}
print "total: $count\n";
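The two answers above count different things: the first counts how many lines had a match in %genes, while the second adds up the number of genes behind every match. A minimal sketch of both counters side by side, assuming a made-up %genes hash and a made-up list of scaffold names (neither comes from the question):

```perl
use strict;
use warnings;

# Hypothetical data, for illustration only: scaffold => { id => gene name }.
my %genes = (
    KB001 => { g1 => 'alpha', g2 => 'beta' },
    KB002 => { g3 => 'gamma' },
);

# Hypothetical scaffold names, one per input line that passed the filters.
my @scaffolds = qw(KB001 KB002 KB001 KB999);

my $line_count = 0;    # lines whose scaffold has an entry in %genes
my $gene_count = 0;    # genes summed over every matching line

for my $scaffold (@scaffolds) {
    if (my $matching_genes = $genes{$scaffold}) {
        $line_count++;
        # keys in scalar context returns the number of keys
        $gene_count += keys %$matching_genes;
    }
}

print "matching lines: $line_count\n";
print "matching genes: $gene_count\n";
```

Which counter you want depends on whether "count all $matching_genes" means the number of matches or the number of values printed.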

Related

how to display the hash value from my sample data

I'm learning Perl at the moment, and I'd like some help with this exercise.
My objective is to display the hash values for PartID 1, 2 and 3.
The current output displays only the lot, wafer, program, version, testnames, testnumbers, hilimit, lolimit and partid values.
Sample data:
lot=lot123
wafer=1
program=prgtest
version=1
Testnames,T1,T2,T3
Testnumbers,1,2,3
Hilimit,5,6,7
Lolimit,1,2,3
PartID,,,,
1,3,0,5
2,4,3,2
3,5,6,3
This is my code:
#!/usr/bin/perl
use strict;
use Getopt::Long;
my $file = "";
GetOptions ("infile=s" => \$file ) or die("Error in command line arguments\n");
my $lotid = "";
open(DATA, $file) or die "Couldn't open file $file";
while(my $line = <DATA>) {
#print "$line";
if ( $line =~ /^lot=/ ) {
#print "$line \n";
my ($dump, $lotid) = split /=/, $line;
print "$lotid\n";
}
elsif ($line =~ /^program=/ ) {
my ($dump, $progid) = split /=/, $line;
print "$progid \n";
}
elsif ($line =~ /^wafer=/ ) {
my ($dump, $waferid) = split /=/, $line;
print "$waferid \n";
}
elsif ($line =~ /^version=/ ) {
my ($dump, $verid) = split /=/, $line;
print "$verid \n";
}
elsif ($line =~ /^testnames/i) {
my ($dump, @arr) = split /\,/, $line;
foreach my $e (@arr) {
print $e, "\n";
}
}
elsif ($line =~ /^testnumbers/i) {
my ($dump, @arr1) = split /\,/, $line;
foreach my $e1 (@arr1) {
print $e1, "\n";
}
}
elsif ($line =~ /^hilimit/i) {
my ($dump, @arr2) = split /\,/, $line;
foreach my $e2 (@arr2) {
print $e2, "\n";
}
}
elsif ($line =~ /^lolimit/i) {
my ($dump, @arr3) = split /\,/, $line;
foreach my $e3 (@arr3) {
print $e3, "\n";
}
}
}
Kindly help me add code to display the PartID 1, 2, 3 hash.
So I've rewritten your code a little to use a few more modern Perl idioms (along with some comments to explain what I've done). The bit I've added is near the bottom.
#!/usr/bin/perl
use strict;
# Added 'warnings' which you should always use
use warnings;
# Use say() instead of print()
use feature 'say';
use Getopt::Long;
my $file = "";
GetOptions ("infile=s" => \$file)
or die ("Error in command line arguments\n");
# Use a lexical variable for a filehandle.
# Use the (safer) 3-argument version of open().
# Add $! to the error message.
open(my $fh, '<', $file) or die "Couldn't open file $file: $!";
# Read each record into $_ - which makes the following code simpler
while (<$fh>) {
# Match on $_
if ( /^lot=/ ) {
# Use "undef" instead of a $dump variable.
# split() works on $_ by default.
my (undef, $lotid) = split /=/;
# Use say() instead of print() - less punctuation :-)
say $lotid;
}
elsif ( /^program=/ ) {
my (undef, $progid) = split /=/;
say $progid;
}
elsif ( /^wafer=/ ) {
my (undef, $waferid) = split /=/;
say $waferid;
}
elsif ( /^version=/ ) {
my (undef, $verid) = split /=/;
say $verid;
}
elsif ( /^testnames/i) {
my (undef, @arr) = split /\,/;
# Changed all of these similar pieces of code
# to use the same variable names. As they are
# defined in different code blocks, they are
# completely separate variables.
foreach my $e (@arr) {
say $e;
}
}
elsif ( /^testnumbers/i) {
my (undef, @arr) = split /\,/;
foreach my $e (@arr) {
say $e;
}
}
elsif ( /^hilimit/i) {
my (undef, @arr) = split /\,/;
foreach my $e (@arr) {
say $e;
}
}
elsif ( /^lolimit/i) {
my (undef, @arr) = split /\,/;
foreach my $e (@arr) {
say $e;
}
}
# And here's the new bit.
# If we're on the "partid" line, then read the next
# three lines, split each one and print the first
# element from the list returned by split().
elsif ( /^partid/i) {
say +(split /,/, <$fh>)[0] for 1 .. 3;
}
}
Update: By the way, there are no hashes anywhere in this code :-)
Update 2: I've just realised that you only have three different ways to process the data. So you can simplify your code drastically by using slightly more complex regexes.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Getopt::Long;
my $file = "";
GetOptions ("infile=s" => \$file)
or die ("Error in command line arguments\n");
open(my $fh, '<', $file) or die "Couldn't open file $file: $!";
while (<$fh>) {
# Single value - just print it.
if ( /^(?:lot|program|wafer|version)=/ ) {
my (undef, $value) = split /=/;
say $value;
}
# List of values - split and print.
elsif ( /^(?:testnames|testnumbers|hilimit|lolimit)/i) {
my (undef, @arr) = split /\,/;
foreach my $e (@arr) {
say $e;
}
}
# Extract values from following lines.
elsif ( /^partid/i) {
say +(split /,/, <$fh>)[0] for 1 .. 3;
}
}
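Since the question asks for a hash and the code above only prints values, here is a minimal sketch of one way to store the part records in a hash keyed by PartID. It assumes that every line after the PartID line is a part record and that the values line up with the Testnames columns; the names %parts, @testnames and $in_parts are illustrative and not from the original code.

```perl
use strict;
use warnings;

# Sample data from the question, held in a string so the sketch is self-contained.
my $data = <<'END';
Testnames,T1,T2,T3
PartID,,,,
1,3,0,5
2,4,3,2
3,5,6,3
END

# Open a filehandle on the in-memory string.
open my $fh, '<', \$data or die "Can't open in-memory data: $!";

my (@testnames, %parts, $in_parts);
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^testnames,/i) {
        (undef, @testnames) = split /,/, $line;
    }
    elsif ($line =~ /^partid/i) {
        $in_parts = 1;    # assumption: every later line is a part record
    }
    elsif ($in_parts) {
        my ($id, @values) = split /,/, $line;
        # Hash slice: pair each test name with the value in the same column.
        @{ $parts{$id} }{@testnames} = @values;
    }
}

for my $id (sort { $a <=> $b } keys %parts) {
    print "PartID $id: ",
        join(', ', map { "$_=$parts{$id}{$_}" } @testnames), "\n";
}
```

With the data in a hash of hashes, a single value can be looked up directly, for example $parts{2}{T3}.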

I need the output in following way in perl

# File-
# a,b,c,d,e,f
# 1,2,3,4,3,2
# 9,8,7,6,5,0
# 2,3,4,6,7,8
# i need output like this:-
# a=1,d=4,c=3,a=9,d=6,c=7,a=2,d=6,c=4
# but my program gives this:-
# a=1,d=4,c=3a=9,d=6,c=7a=2,d=6,c=4 (there is no , after c and a)
my script is :-
open ($fh, 'parse.txt');
my @arr;
my $dummy=<$fh>;
while (<$fh>) {
chomp;
$a = substr $_, 0,1;
$b = substr $_, 6,1;
$c = substr $_, 4,1;
print "a=$a,d=$b,c=$c";
}
close (IN);
my $prefix = '';
while (<$fh>) {
chomp;
my @fields = split /,/;
print $prefix."a=$fields[0],d=$fields[3],c=$fields[2]";
$prefix = ',';
}
print("\n");
or
my @recs;
while (<$fh>) {
chomp;
my @fields = split /,/;
push @recs, "a=$fields[0],d=$fields[3],c=$fields[2]";
}
print(join(',', @recs), "\n");
Instead of printing out the values you could append them to a string and include a comma after the "c" value. Then at the end of the loop, erase the final comma from the string and print it out. There are some scalability problems if your input file is too large. But if it's a reasonable size there shouldn't be any substantial issue.
my $output;
my $dummy=<$fh>;
while (<$fh>) {
chomp;
$a = substr $_, 0,1;
$b = substr $_, 6,1;
$c = substr $_, 4,1;
$output .= "a=$a,d=$b,c=$c,";
}
chop $output;
print $output;
If your fields are delimited by a separator, split each line and collect the elements you need:
use warnings;
use strict;
use feature 'say';
my $file = 'parse.txt';
open my $fh, '<', $file or die "Can't open $file: $!";
my $dummy = <$fh>;
my @res;
while (<$fh>)
{
my ($a, $d, $c) = (split /,/)[0,3,2];
push @res, "a=$a,d=$d,c=$c";
}
say join ',', @res;
or pick the order in the assignment
my ($a, $c, $d) = (split /,/)[0,2,3];

Correct use of Perl "exists"

I have two files. The first two columns in both are chromosome loci and genotypes, for instance chr1:1736464585 and T/G.
I have put the first two columns into a hash. I want to check whether the hash key (the chromosome locus) exists in the second file.
I have written this Perl program and have tried many variations, but I'm not sure whether I'm using exists correctly: it gives the error "exists is not an HASH or ARRAY element or a subroutine".
#!/usr/bin/perl
use strict;
use warnings;
my $output = "annotated.txt";
open( O, ">>$output" );
my $filename = "datatest.txt";
my $filename2 = "MP2.txt";
chomp $filename;
chomp $filename2;
my %hash1 = ();
open( FN1, $filename ) or die "Can't open $filename: $!";
my @lines = <FN1>;
foreach my $line (@lines) {
my @split = split /\t/, $line;
if ( $line =~ /^chr/ ) {
my ( $key, $value ) = ( $split[0], $split[1] );
$hash1{$key} = $value;
}
}
my $DATA;
open( $DATA, $filename2 ) or die $!;
my @lines2 = <$DATA>;
foreach my $line2 (@lines2) {
my @split2 = split /\t/, $line2;
if ( $line2 =~ /^chr/ ) {
if ( exists %hash1{$key} ) {
print "$line2\n";
}
}
}
The syntax of the following line is incorrect:
if (exists %hash1{$key}) { ... }
This should be:
if (exists $hash1{$key}) { ... }
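Note also that $key is declared inside the first loop, so under use strict the second loop needs to take its own key from the line it is checking. A minimal sketch of the corrected lookup, using made-up loci in place of the two files (the data and the @matches array are illustrative, not from the question):

```perl
use strict;
use warnings;

# Hypothetical loci and genotypes standing in for the first file.
my %hash1 = (
    'chr1:1736464585' => 'T/G',
    'chr2:20045'      => 'A/A',
);

# Hypothetical lines standing in for the second file.
my @lines2 = (
    "chr1:1736464585\tT/G\n",
    "chr3:99\tC/T\n",
);

my @matches;
for my $line2 (@lines2) {
    next unless $line2 =~ /^chr/;
    chomp $line2;
    my ($key) = split /\t/, $line2;    # the locus is the first column
    # exists takes a single element, hence the $ sigil on $hash1{$key}
    push @matches, $line2 if exists $hash1{$key};
}

print "$_\n" for @matches;
```

exists checks only whether the key is present, so it is true even when the stored value is empty or undef.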

Extracting multiple lines of record/data using a subroutine or functions

Can you show me how to create a subroutine or function using this code?
Basically I want to make my code into a subroutine so I'll be able to re-use it without making my script too long.
Here is my script:
#!/usr/local/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Carp qw(croak);
my @fields;
my ($tmp_var, $rec_type, $country, $header, $Combline, $records, $line);
my $filename = 'data5.txt';
open (my $input_fh, '<', $filename ) or croak "Can't open $filename: $!";
open my $OUTPUTA, ">", 'drp1.txt' or die $!;
open my $OUTPUTB, ">", 'drp2.txt' or die $!;
while (<$input_fh>) {
$line = _trim($_);
@fields = split (/\|/, $line);
$rec_type = $fields[0];
$country = $fields[1];
my $string = substr $fields[1], 0, 1;
$header = $line if(/^INVHDR/);
if ($rec_type eq 'INVDET') {
if ($string eq 'I') {
$records = $header . $line;
print $OUTPUTA $records, scalar <$input_fh>;
}
else {
$records = $header . $line;
print $OUTPUTB $records, scalar <$input_fh>;
}
}
}
close $OUTPUTA or die $!;
close $OUTPUTB or die $!;
sub _trim {
my $word = shift;
if ( $word ) {
$word =~ s/\s*\|/\|/g; #remove trailing spaces
$word =~ s/"//g; #remove double quotes
}
return $word;
}
This is the part of the script that I wanted to put in a subroutine or function:
$line = _trim($_);
@fields = split (/\|/, $line);
$rec_type = $fields[0];
$country = $fields[1];
my $string = substr $fields[1], 0, 1;
$header = $line if (/^INVHDR/);
if ($rec_type eq 'INVDET') {
if ($string eq 'I') {
$records = $header . $line;
print $OUTPUTA $records, scalar <$input_fh>;
}
else {
$records = $header . $line;
print $OUTPUTB $records, scalar <$input_fh>;
}
}
I would suggest breaking it out a little differently and expanding your _trim function into a parse function:
use strict;
use warnings;
my $filename = 'data5.txt';
open( my $input_fh, '<', $filename ) or die "Can't open $filename: $!";
open( my $OUTPUTA, '>', 'drp1.txt' ) or die $!;
open( my $OUTPUTB, '>', 'drp2.txt' ) or die $!;
my $header = '';
while (<$input_fh>) {
if ($_ =~ /^INVHDR/) {
$header = $_;
}
if ($_ =~ /^INVDET/) {
my @data = parse($_);
my $line = $header . join('|', @data);
# scalar <$input_fh> is almost certainly not doing what you expect,
# though I'm not sure what you're trying to accomplish with it
if ( $data[1] =~ /^I/ ) {
print $OUTPUTA $line;
} else {
print $OUTPUTB $line;
}
}
}
sub parse {
my $input = shift || return;
$input =~ s/"//g; # remove double quotes (no second "my", which would reset $input)
# Here I've combined the removal of trailing spaces with the split.
my @fields = split( m{\s*\|}, $input );
return @fields;
}
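To see what parse returns, here is a minimal sketch that calls a version of it on a single made-up record. The record layout (a pipe-delimited line with padded, quoted fields) is an assumption based on the regexes above, not something shown in the question, and the chomp is an addition so that the last field carries no newline.

```perl
use strict;
use warnings;

# Strip double quotes, then split on '|' while dropping the spaces before it.
sub parse {
    my $input = shift;
    return unless defined $input;
    chomp $input;          # addition: drop the trailing newline
    $input =~ s/"//g;      # remove double quotes
    return split m{\s*\|}, $input;
}

# Hypothetical record, for illustration only.
my @fields = parse(qq{INVDET  |"IN"  |42\n});

print join('|', @fields), "\n";
```

Because the sub takes its input as an argument and returns a list, it can be reused for any record type without touching the variables in the main loop.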

How can I find the elements appearing in two columns of a tab-delimited file?

I have a file which is tab delimited, and has two columns, A and B.
I want to count the number of times an element in B is repeated in A. I could have done it in Excel, but since the two columns contain more than 200k elements, it hangs.
I tried with this code but it counts elements in itself:
my %counts = ();
for (@A) {
$counts{$_}++;
}
foreach my $k(keys %counts) {
print "$k\t$counts{$k}\n";
}
Try this solution:
use strict;
use warnings;
my %countx;
my @y;
my $file = 'ab.txt';
open my $fh, '<', $file or die "Couldn't open $file: $!";
while (my $line = <$fh>) {
chomp $line; # remove newline
# I've avoided using $a and $b because they are special variables in perl
my ( $x, $y ) = split /\t/, $line;
$countx{ $x }++;
push @y, $y;
}
close $fh;
foreach my $y (@y) {
my $count = $countx{ $y } || 0;
print "$y\t$count\n";
}