Perl compilation error

Sorry if this seems obvious, but I'm pretty new to Perl and programming, and I've been working on this for over a week and can't get it done.
My idea is simple. I have a .csv file with names in the first column, a number from -1 to 1 in the second, and a position in the third. I also have another file with the names (on lines that start with >) followed by the sequence data, 80 characters per line.
What I want to do is keep the name lines from the first file and grab the region around the given 'position', from -20 to +60. But I cannot get it to work, and I've reached the point where I don't know how to proceed.
use strict; #read file line by line
use warnings;
my $outputfile = "Output1.txt";
my $filename = "InputP.txt";
my $inputfasta = "Inputfasta.txt";
open my $fh, '<', $filename or die "Couldn't open '$filename'";
open my $fh2, '>', $outputfile or die "Couldn't create '$outputfile'";
open my $fh3, '<', $inputfasta or die "Couldn't open '$inputfasta'";
my $Psequence = 0;
my $seqname = 0;
while (my $line = <$fh>) {
chomp $line;
my $length = index ($line, ",");
$seqname = substr ($line, 0, $length);
my $length2 = index ($line, ",", $length);
my $score = substr ($line, $length +1, $length2);
my $length3 = index ($line, ",", $length2);
my $position = substr ($line, $length2 +1, $length3);
#print $fh2 "$seqname"."\t"."$score"."\t"."$position"."\n"; }
my $Rlength2 = index ($score, ",");
my $Rscore = substr ($score, 0, $Rlength2);
#print "$Rscore"."\n";}
while (my $linea = <$fh3>){ #same order.
chomp $linea;
if ($linea=~/^>(.+)/) {
print $fh3 "\n"."$linea"."\n"; }
else { $linea =~ /^\s*(.*)\s*$/;
chomp $linea;
print $fh3 "$linea". "\n"; }
}
if ($Rscore >= 0.5){
$Psequence = substr ($linea, -20, 81);
print "$seqname"."\n"."$Psequence";}
}

Please learn to indent your code correctly. Then the error becomes more obvious:
while (my $linea = <$fh3>){ #same order.
    chomp $linea;
    if ($linea =~ /^>(.+)/) {
        print $fh3 "\n$linea\n";
    } else {
        # Commented out as it does nothing.
        # $linea =~ /^\s*(.*)\s*$/;
        # chomp $linea;
        print $fh3 "$linea\n";
    }
}
if ($Rscore >= 0.5){
    $Psequence = substr $linea, -20, 81;
    print "$seqname\n$Psequence";
}
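$linea is declared with my in the while condition, so it exists only inside the while loop, but you also try to use it in the block that follows. The variable goes out of scope when the loop ends, which is why use strict reports a compilation error (Global symbol "$linea" requires explicit package name).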
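A minimal sketch of the scoping rule, using only core Perl (the sub name last_line is hypothetical): a variable declared with my before the loop is still visible after it, while one declared in the while condition is not. In the script above, the simpler fix is usually to do the substr work inside the loop, while $linea is still in scope.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return the last (chomped) line read from $fh.
sub last_line {
    my ($fh) = @_;
    my $last;                      # declared before the loop: still visible after it
    while (my $line = <$fh>) {     # $line exists only inside this loop
        chomp $line;
        $last = $line;
    }
    # Writing `return $line;` here would not compile under `use strict`,
    # because $line went out of scope when the loop ended.
    return $last;
}
```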

Create a hash from the CSV where the key is the name and the value is the position.
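use strict;
use warnings;
use Text::CSV_XS qw( );

my $input_qfn = 'InputP.txt';  # the CSV file from the question
my %pos_by_name;
{
   open(my $fh, '<', $input_qfn)
      or die("Can't open $input_qfn: $!\n");

   my $csv = Text::CSV_XS->new({ auto_diag => 1, binary => 1 });
   while (my $row = $csv->getline($fh)) {
      # column 0 is the name, column 2 is the position
      $pos_by_name{ $row->[0] } = $row->[2];
   }
}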
Then, it's just a question of extracting the names from the other file, and using the hash to find the associated position.
my $fasta_qfn = 'Inputfasta.txt';  # the FASTA file from the question
open(my $fh, '<', $fasta_qfn)
   or die("Can't open $fasta_qfn: $!\n");

while (<$fh>) {
   chomp;
   # skip sequence lines; only header lines carry a name
   my ($name) = /^>(.*)/
      or next;

   my $pos = $pos_by_name{$name};
   if (!defined($pos)) {
      die("Can't find position for $name\n");
   }

   # ... Do something with $name and $pos ...
}
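Putting the pieces together, here is a minimal core-Perl sketch of the whole task. It assumes that the CSV has no header row and no quoted commas (so a plain split /,/ stands in for Text::CSV_XS), that positions are 1-based, that "from -20 to +60" means an 81-character window including the position itself, and that only rows with a score of at least 0.5 are wanted (the threshold from the question's code). The sub names read_fasta and window are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Read a FASTA stream into a hash of name => sequence, joining the
# 80-character lines of each record into one string.
sub read_fasta {
    my ($fh) = @_;
    my (%seq_by_name, $name);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(.*)/) {
            $name = $1;
            $seq_by_name{$name} = '';
        }
        elsif (defined $name) {
            $line =~ s/\s+//g;                 # drop stray whitespace
            $seq_by_name{$name} .= $line;
        }
    }
    return \%seq_by_name;
}

# Return the window from 20 before to 60 after a 1-based position
# (81 characters), clipped at the start and end of the sequence.
sub window {
    my ($seq, $pos) = @_;
    my $start = $pos - 1 - 20;                 # 0-based offset of the window start
    my $len   = 81;
    if ($start < 0) { $len += $start; $start = 0; }
    return '' if $len <= 0 || $start >= length $seq;
    return substr($seq, $start, $len);         # substr stops at the end of $seq
}

# Usage: perl extract.pl InputP.txt Inputfasta.txt > Output1.txt
if (@ARGV == 2) {
    my ($csv_file, $fasta_file) = @ARGV;
    open my $fasta_fh, '<', $fasta_file or die "Can't open $fasta_file: $!\n";
    my $seqs = read_fasta($fasta_fh);

    open my $csv_fh, '<', $csv_file or die "Can't open $csv_file: $!\n";
    while (my $line = <$csv_fh>) {
        chomp $line;
        my ($name, $score, $pos) = split /,/, $line;
        next unless defined $pos && $score >= 0.5;
        my $seq = $seqs->{$name};
        if (!defined $seq) { warn "No sequence for $name\n"; next; }
        print ">$name\n", window($seq, $pos), "\n";
    }
}
```

Reading the whole FASTA file into a hash first means the two files do not need to be in the same order; for very large FASTA files, streaming would need a different design.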

Related

Duplicate values in column

I have an original file with the following columns:
02-May-2018,AAPL,Sell,0.25,1000
02-May-2018,C,Sell,0.25,2000
02-May-2018,JPM,Sell,0.25,3000
02-May-2018,WFC,Sell,0.25,5000
02-May-2018,AAPL,Sell,0.25,7000
02-May-2018,GOOG,Sell,0.25,8000
02-May-2018,GOOG,Sell,0.25,9000
02-May-2018,C,Sell,0.25,2000
02-May-2018,AAPL,Sell,0.25,3000
I am trying to print the original line if the value in its second column appears more than 2 times. For example, since AAPL appears more than 2 times, the desired result is:
02-May-2018,AAPL,Sell,0.25,1000
02-May-2018,AAPL,Sell,0.25,7000
02-May-2018,AAPL,Sell,0.25,3000
So far, I have written the following, but it prints results multiple times, which is wrong. Can you please help me see what I am doing wrong?
open (FILE, "<$TMPFILE") or die "Could not open $TMPFILE";
open (OUT, ">$TMPFILE1") or die "Could not open $TMPFILE1";
%count = ();
@symbol = ();
while ($line = <FILE>)
{
chomp $line;
(@data) = split(/,/,$line);
$count{$data[1]}++;
@keys = sort {$count{$a} cmp $count{$b}} keys %count;
for my $key (@keys)
{
if ( $count{$key} > 2 )
{
print "$line\n";
}
}
}
I'd do something like this: store the lines you've seen in a 'buffer', print the whole buffer once the count limit is hit, and then keep printing further lines for that ticker as you go:
#!/usr/bin/env perl
use strict;
use warnings;
my %buffer;
my %count_of;
while ( my $line = <> ) {
my ( $date, $ticker, @values ) = split /,/, $line;
#increment the count
$count_of{$ticker}++;
if ( $count_of{$ticker} < 3 ) {
#count limit not hit, so stash the current line in the buffer.
$buffer{$ticker} .= $line;
next;
}
#print the buffer if the count has been hit
if ( $count_of{$ticker} == 3 ) {
print $buffer{$ticker};
}
#only gets to here once the limit is hit, so just print normally.
print $line;
}
With your input data, this outputs:
02-May-2018,AAPL,Sell,0.25,1000
02-May-2018,AAPL,Sell,0.25,7000
02-May-2018,AAPL,Sell,0.25,3000
Simple answer:
push @{ $lines{(split",")[1]} }, $_ while <>;
print @{ $lines{$_} } for grep @{ $lines{$_} } > 2, sort keys %lines;
perl program.pl inputfile > outputfile
One way is to read the input file twice, because you don't know the final counts until you reach the end of the file.
use strict;
use warnings 'all';
my ($TMPFILE, $TMPFILE1) = qw/ infile outfile /;
my %counts;
{
open my $fh, '<', $TMPFILE or die "Could not open $TMPFILE: $!";
while ( <$fh> ) {
my @fields = split /,/;
++$counts{$fields[1]};
}
}
open my $fh, '<', $TMPFILE or die "Could not open $TMPFILE: $!";
open my $out_fh, '>', $TMPFILE1 or die "Could not open $TMPFILE1: $!";
while ( <$fh> ) {
my @fields = split /,/;
print $out_fh $_ if $counts{$fields[1]} > 2;
}
output
02-May-2018,AAPL,Sell,0.25,1000
02-May-2018,AAPL,Sell,0.25,7000
02-May-2018,AAPL,Sell,0.25,3000
This should work:
use strict;
use warnings;
my ($TMPFILE, $TMPFILE1) = qw/ infile outfile /;   # use your own file names
open (FILE, "<$TMPFILE") or die "Could not open $TMPFILE: $!";
open (OUT, ">$TMPFILE1") or die "Could not open $TMPFILE1: $!";
my %data;
while ( my $line = <FILE> ) {
    chomp $line;
    my @line = split /,/, $line;
    # group the full lines by ticker (second column)
    push(@{$data{$line[1]}}, $line);
}
foreach my $key (sort keys %data) {
    if (@{$data{$key}} > 2) {
        print OUT "$_\n" foreach @{$data{$key}};
    }
}

Perl - Comparison of Files using specific substrings

I've written this script to compare the lines of two files and output the common and non-common lines into two different files. The script is:
use strict;
use warnings;
use autodie;
my $f1 = shift || "CSP8216.TXT";
my $f2 = shift || "CSP8217.TXT";
open my $fh1, '>', 'file1';
open FH2, '>', 'file2';
my %results;
open my $file1, '<', $f1;
while (my $line = <$file1>) {
$results{$line} = 1
}
open my $file2, '<', $f2;
while (my $line = <$file2>) {
$results{$line}++
}
foreach my $line (sort { $results{$b} <=> $results{$a} } keys %results)
{
if ($results{$line} >= 1)
{
print {$fh1} "$line";
}
else
{
print FH2 "$line";
}
}
My problem is when I try to modify this script to run the comparisons based on specific substrings of each line, i.e.:
If a specific substring of a line in file A matches another specific substring of a line in file B, then output that entire line of file B into fh1; otherwise output it into fh2.
I tried this, but it doesn't work. I'm still really new to Perl, so any help will be really appreciated:
use strict;
use warnings;
use autodie;
my $f1 = shift || "CSP8216.TXT";
my $f2 = shift || "CSP8216.TXT";
open my $fh1, '>', 'file1';
open FH2, '>', 'file2';
my %results;
open my $file1, '<', $f1;
while (my $line = <$file1>)
{
my $sbs1 = substr($line, 0, 10);
$results{$sbs1} = 1
}
open my $file2, '<', $f2;
while (my $line = <$file2>)
{
my $sbs2 = substr($line, 0, 10);
$results{$sbs2}++
}
foreach my $line (sort { $results{$b} <=> $results{$a} } keys %results)
{
if ($results{$line} >= 1)
{
print {$fh1} "$line";
}
else
{
print FH2 "$line";
}
}
This does not work, and I have a feeling it's a problem in the logic: it outputs just the substrings, all on a single line.
As per my comment, we need to keep the lines from file A and file B separate if we want to support a single line appearing twice in one file.
One option is to solve the basic problem like this:
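use strict;
use warnings;
my ($filename1, $filename2)       = ('CSP8216.TXT', 'CSP8217.TXT');
my ($outfilename1, $outfilename2) = ('file1', 'file2');
my (%combined, %file1, %file2);
open my $fh1, '<', $filename1 or die "Can't open $filename1: $!";
while (my $line = <$fh1>) {
    $combined{$line} = $file1{$line} = 1;
}
open my $fh2, '<', $filename2 or die "Can't open $filename2: $!";
while (my $line = <$fh2>) {
    $combined{$line} = $file2{$line} = 1;
}
open my $out1, '>', $outfilename1 or die "...";
open my $out2, '>', $outfilename2 or die "...";
for my $line (keys %combined) {
    # lines present in both files go to $out1, all others to $out2
    if ($file1{$line} && $file2{$line}) {
        print $out1 $line;
    } else {
        print $out2 $line;
    }
}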
To solve the substring issue, I would keep the substrings from each file as keys in the hashes. But instead of just storing a true value, I would store the full line as the value in %file2:
use strict;
use warnings;
my ($filename1, $filename2)       = ('CSP8216.TXT', 'CSP8217.TXT');
my ($outfilename1, $outfilename2) = ('file1', 'file2');
my (%combined, %file1, %file2);
open my $fh1, '<', $filename1 or die "Can't open $filename1: $!";
while (my $line = <$fh1>) {
    my $substr = substr($line, 0, 10);
    $combined{$line} = $file1{$substr} = 1;
}
open my $fh2, '<', $filename2 or die "Can't open $filename2: $!";
while (my $line = <$fh2>) {
    my $substr = substr($line, 20, 30);
    $combined{$line} = 1;
    $file2{$substr} = $line;    # remember the full line of file 2
}
open my $out1, '>', $outfilename1 or die "...";
open my $out2, '>', $outfilename2 or die "...";
for my $line (keys %combined) {
    my $substr1 = substr($line, 0, 10);
    my $substr2 = substr($line, 20, 30);
    if ($file1{$substr1} && $file2{$substr2}) {
        print $out1 $file2{$substr2};
    } else {
        print $out2 $line;
    }
}
This works for me:
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
my %results;
my $f1 = shift || "CSP8216.TXT";
my $f2 = shift || "CSP8217.TXT";
open my $fh1, '>', 'file1';
open my $fh2, '>', 'file2';
open my $file1, '<', $f1;
while (my $line = <$file1>) {
    chomp $line;
    my $sbs1 = substr($line, 0, 10);
    $results{$sbs1} |= 1;    # bit 1: substring seen in the first file
}
open my $file2, '<', $f2;
while (my $line = <$file2>) {
    chomp $line;
    my $sbs2 = substr($line, 0, 10);
    $results{$sbs2} |= 2;    # bit 2: substring seen in the second file
}
foreach my $line (sort { $results{$b} <=> $results{$a} } keys %results) {
    if ($results{$line} == 3) {    # both bits set: seen in both files
        print {$fh1} "$line\n";
    }
    else {
        print {$fh2} "$line\n";
    }
}
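For the exact requirement in the question (send the entire line of file B to one file when its key substring also occurs somewhere in file A, otherwise to another), a minimal sketch can read file A once into a hash of keys and then stream file B. It assumes, as in the question's code, that the key is the first 10 characters of each line in both files; key_a, key_b and split_by_key are hypothetical names, so adjust the substr offsets if the keys sit in different columns.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Key extractors: both assume the key is the first 10 characters of a line,
# as in the question's substr($line, 0, 10). Change them if file A and
# file B keep their keys at different offsets.
sub key_a { return substr($_[0], 0, 10) }
sub key_b { return substr($_[0], 0, 10) }

# Copy every line of B to $match_fh if its key occurs in A, else to $nomatch_fh.
sub split_by_key {
    my ($a_fh, $b_fh, $match_fh, $nomatch_fh) = @_;
    my %in_a;
    while (my $line = <$a_fh>) {
        chomp $line;
        $in_a{ key_a($line) } = 1;
    }
    while (my $line = <$b_fh>) {
        chomp $line;
        my $out = exists $in_a{ key_b($line) } ? $match_fh : $nomatch_fh;
        print {$out} "$line\n";    # the entire line of B, not just its key
    }
}

# Usage: perl compare.pl CSP8216.TXT CSP8217.TXT
if (@ARGV == 2) {
    my ($file_a, $file_b) = @ARGV;
    open my $a_fh, '<', $file_a or die "Can't open $file_a: $!";
    open my $b_fh, '<', $file_b or die "Can't open $file_b: $!";
    open my $fh1, '>', 'file1' or die "Can't open file1: $!";
    open my $fh2, '>', 'file2' or die "Can't open file2: $!";
    split_by_key($a_fh, $b_fh, $fh1, $fh2);
}
```

Because only file A's keys are held in memory, file B can be arbitrarily large, and duplicate lines in B are each written out once per occurrence.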

Compare two files in Perl and find mismatches

I tried to read two files and compare them:
file1 : AAAAAAAAAA
file2: AAAABAAAAA
output: MMMMNMMMMM
open(my $fh1, '<', 'file1');
open(my $fh2, '<', 'file2');
while(
defined(my $line1 = <$fh1>)
and
defined(my $line2 = <$fh2>)
){
chomp $line1;
chomp $line2;
my @line1 = split(//, $line1);
my @line2 = split(//, $line2);
for my $i (0 .. @line1-1){
for my $j (0 .. @line2-1){
if ($line1[$i] eq $line2[$j]){
print "M";}
else {
print "N";}
$j++;}
$i++;}}
close $fh1;
close $fh2;
It prints the output repeatedly! If somebody could help me, it would be a great help.
You need only one for loop per line, comparing the characters at the same index in both lines:
my $max = $#line1 > $#line2 ? $#line1 : $#line2;
for my $i (0 .. $max) {
    # a missing character (lines of unequal length) counts as a mismatch
    if (($line1[$i] // '') eq ($line2[$i] // '')) { print "M"; }
    else { print "N"; }
}
print "\n";
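As an alternative to the index loop, Perl's string XOR can build the whole mask at once: identical characters XOR to a zero byte, which tr/// then maps to M, while every other byte becomes N. This sketch assumes plain ASCII/byte strings (the string bitwise operators do not support wide characters), and mismatch_mask is a hypothetical name. When the lines differ in length, the extra characters are XORed against zero bytes, so they come out as N.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Build an M/N mask: M where the two strings agree, N where they differ.
sub mismatch_mask {
    my ($s1, $s2) = @_;
    # Interpolating gives fresh strings, so ^ does string (not numeric) XOR
    # even for digit-only input.
    my $mask = "$s1" ^ "$s2";
    # "\x00" (equal characters) becomes M; every other byte becomes N,
    # because tr/// repeats the last replacement character.
    $mask =~ tr/\x00\x01-\xFF/MN/;
    return $mask;
}

# Usage: perl mismatch.pl file1 file2
if (@ARGV == 2) {
    open my $fh1, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!";
    open my $fh2, '<', $ARGV[1] or die "Can't open $ARGV[1]: $!";
    while (defined(my $line1 = <$fh1>) and defined(my $line2 = <$fh2>)) {
        chomp($line1, $line2);
        print mismatch_mask($line1, $line2), "\n";
    }
}
```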

How to count the number of specific characters in each line of a file?

I'm trying to count the number of 'N's in a FASTA file like this:
>Header
AGGTTGGNNNTNNGNNTNGN
>Header2
AGNNNNNNNGNNGNNGNNGN
In the end I want the number of 'N's in each read (each header is one read), so that I can make a histogram. The output would look something like this:
# of N's # of Reads
0 300
1 240
etc...
So there are 300 sequences (reads) that have 0 'N's.
use strict;
use warnings;
my $file = shift;
my $output_file = shift;
my $line;
my $sequence;
my $length;
my $char_N_count = 0;
my @array;
my $count = 0;
if (!defined ($output_file)) {
die "USAGE: Input FASTA file\n";
}
open (IFH, "$file") or die "Cannot open input file$!\n";
open (OFH, ">$output_file") or die "Cannot open output file $!\n";
while($line = <IFH>) {
chomp $line;
next if $line =~ /^>/;
$sequence = $line;
@array = split ('', $sequence);
foreach my $element (@array) {
if ($element eq 'N') {
$char_N_count++;
}
}
print "$char_N_count\n";
}
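Try this. I changed a few things, like using lexical (scalar) filehandles and counting with tr/N//, which returns the number of 'N's in the line without modifying it. The main bug in your version is that $char_N_count is never reset, so it keeps growing from one read to the next. There are many ways to do this in Perl, so some people will have other ideas. In this case I used an array, which may have gaps in it; another option is to store the results in a hash keyed by the count. (This assumes each read's sequence is on a single line, as in your example.)
Edit: I just realised I'm not using $output_file, because I have no idea what you want to do with it :) Just change the print statements at the end to print $out_fh if you intend to write to it.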
use strict;
use warnings;
my $file = shift;
my $output_file = shift;
if (!defined ($output_file)) {
die "USAGE: $0 <input_file> <output_file>\n";
}
open (my $in_fh, '<', $file) or die "Cannot open input file '$file': $!\n";
open (my $out_fh, '>', $output_file) or die "Cannot open output file '$output_file': $!\n";
my @results = ();
while (my $line = <$in_fh>) {
next if $line =~ /^>/;
my $num_n = ($line =~ tr/N//);
$results[$num_n]++;
}
print "# of N's\t# of Reads\n";
for (my $i = 0; $i < scalar(@results) ; $i++) {
unless (defined($results[$i])) {
$results[$i] = 0;
# another option is to 'next' if you don't want to show the zero totals
}
print "$i\t\t$results[$i]\n";
}
close($in_fh);
close($out_fh);
exit;
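A minimal sketch of the hash-keyed variant mentioned above. It also treats everything between one > header and the next as a single read, so sequences wrapped over several lines are counted once per read rather than once per line. n_histogram is a hypothetical name, and unlike the array version it prints only the counts that actually occur (no zero rows).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Histogram of N's per read: returns a hash ref of (N count => number of reads).
sub n_histogram {
    my ($fh) = @_;
    my %reads_with;
    my $n_in_read;                         # undef until the first header is seen
    while (my $line = <$fh>) {
        if ($line =~ /^>/) {
            # a new header closes the previous read
            $reads_with{$n_in_read}++ if defined $n_in_read;
            $n_in_read = 0;
        }
        elsif (defined $n_in_read) {
            $n_in_read += ($line =~ tr/N//);   # tr returns the number of N's
        }
    }
    $reads_with{$n_in_read}++ if defined $n_in_read;   # close the last read
    return \%reads_with;
}

# Usage: perl count_n.pl reads.fasta
if (@ARGV) {
    open my $in_fh, '<', $ARGV[0] or die "Cannot open input file '$ARGV[0]': $!\n";
    my $hist = n_histogram($in_fh);
    print "# of N's\t# of Reads\n";
    print "$_\t$hist->{$_}\n" for sort { $a <=> $b } keys %$hist;
}
```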

How can I print lines from a file to separate files

I have a file which has lines like this:
1 107275 447049 scaffold1443 465 341154 -
There are several lines that start with 1; then a blank line separates them from the lines that start with 2, and so on.
I want to split these lines into different files based on their number.
I wrote this script, but it prints only the first line into each file.
#!/usr/bin/perl
#script for choosing chromosome
use strict;
my $filename= $ARGV[0];
open(FILE, $filename);
while (my $line = <FILE>) {
my @data = split('\t', $line);
my $length = @data;
#print $length;
my $num = $data[0];
if ($length == 6) {
open(my $fh, '>', $num);
print $fh $line;
}
$num = $num + 1;
}
Please, I need your help!
Use >> to open the file for appending, because > always truncates the file to zero bytes each time it is opened:
use strict;
my $filename = $ARGV[0];
open(my $FILE, "<", $filename) or die $!;
while (my $line = <$FILE>) {
my @data = split('\t', $line);
my $length = @data;
#print $length;
my $num = $data[0];
if ($length == 6) {
open(my $fh, '>>', $num) or die "Can't open '$num' for append: $!";
print $fh $line;
}
$num = $num + 1;
}
If I understand your question correctly, then paragraph mode might be useful. It splits records on two or more newlines instead of just one:
@ARGV or die "Supply a filename\n";
my $filename= $ARGV[0];
local $/ = ""; # Set paragraph mode
open(my $file, '<', $filename) or die "Unable to open '$filename' for read: $!";
while (my $lines = <$file>) {
my $num = (split("\t", $lines))[0];
open(my $fh, '>', $num) or die "Unable to open '$num' for write: $!";
print $fh $lines;
close $fh;
}
close $file;
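Another option, which does not depend on the blank separator lines, is to keep one output handle per number open in a hash, so each file is opened (and truncated) only once per run and reruns do not pile up duplicates the way >> would. This is a minimal sketch: split_by_first_field and its $open_out callback are hypothetical names (the callback exists only so the routine can also be exercised with in-memory handles), and the question's $length == 6 filter is replaced by simply skipping blank lines.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Write each line to the handle for its first tab-separated field.
# Handles are cached in %out_for, so each output is opened only once.
# $open_out is a code ref that returns a write handle for a given name.
sub split_by_first_field {
    my ($in_fh, $open_out) = @_;
    my %out_for;
    while (my $line = <$in_fh>) {
        next unless $line =~ /\S/;            # skip the blank separator lines
        my ($num) = split /\t/, $line;
        $out_for{$num} //= $open_out->($num);
        print { $out_for{$num} } $line;
    }
    close $_ for values %out_for;
}

# Usage: perl split_chr.pl input.txt   (writes files named 1, 2, ...)
if (@ARGV) {
    open my $in_fh, '<', $ARGV[0] or die "Unable to open '$ARGV[0]': $!";
    split_by_first_field($in_fh, sub {
        my ($name) = @_;
        open my $fh, '>', $name or die "Unable to open '$name' for write: $!";
        return $fh;
    });
}
```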