compare two file in perl and find mismatches - perl

I tried to read two files and compare them:
file1 : AAAAAAAAAA
file2: AAAABAAAAA
output: MMMMNMMMMM
open(my $fh1, '<', 'file1');
open(my $fh2, '<', 'file2');
while(
defined(my $line1 = <$fh1>)
and
defined(my $line2 = <$fh2>)
){
chomp $line1;
chomp $line2;
my #line1 = split(//, $line1);
my #line2 = split(//, $line2);
for my $i (0 ..#line1-1){
for my $j (0 .. #line2-1){
if ($line1[$i] eq $line2[$j]){
print "M\";}
else {
print "N";}
$j++;}
$i++;}}
close $fh1;
close $fh2;
It prints output repeatedly!! If somebody help me it would be a great help.

You need only one for loop per line,
my $max = $#line1 > $#line2 ? $#line1 : $#line2;
for my $i (0 .. $max) {
if ($line1[$i] eq $line2[$i]) { print "M";}
else { print "N";}
}

Related

Duplicate values in column

I have a original file which has following columns,
02-May-2018,AAPL,Sell,0.25,1000
02-May-2018,C,Sell,0.25,2000
02-May-2018,JPM,Sell,0.25,3000
02-May-2018,WFC,Sell,0.25,5000
02-May-2018,AAPL,Sell,0.25,7000
02-May-2018,GOOG,Sell,0.25,8000
02-May-2018,GOOG,Sell,0.25,9000
02-May-2018,C,Sell,0.25,2000
02-May-2018,AAPL,Sell,0.25,3000
I am trying to print this original line if I see value in the second column more then 2 times.. for example, if I see AAPL more then 2 times desired result should print
02-May-2018,AAPL,Sell,0.25,1000
02-May-2018,AAPL,Sell,0.25,7000
02-May-2018,AAPL,Sell,0.25,3000
So Far, I have written the following which prints results multiple times which is wrong.. can you please help on what I am doing wrong?
open (FILE, "<$TMPFILE") or die "Could not open $TMPFILE";
open (OUT, ">$TMPFILE1") or die "Could not open $TMPFILE1";
%count = ();
#symbol = ();
while ($line = <FILE>)
{
chomp $line;
(#data) = split(/,/,$line);
$count{$data[1]}++;
#keys = sort {$count{$a} cmp $count{$b}} keys %count;
for my $key (#keys)
{
if ( $count{$key} > 2 )
{
print "$line\n";
}
}
}
I'd do it something like this - store lines you've seen in a 'buffer' and print them out again if the condition is hit (before continuing to print as you go):
#!/usr/bin/env perl
use strict;
use warnings;
my %buffer;
my %count_of;
while ( my $line = <> ) {
my ( $date, $ticker, #values ) = split /,/, $line;
#increment the count
$count_of{$ticker}++;
if ( $count_of{$ticker} < 3 ) {
#count limit not hit, so stash the current line in the buffer.
$buffer{$ticker} .= $line;
next;
}
#print the buffer if the count has been hit
if ( $count_of{$ticker} == 3 ) {
print $buffer{$ticker};
}
#only gets to here once the limit is hit, so just print normally.
print $line;
}
With your input data, this outputs:
02-May-2018,AAPL,Sell,0.25,1000
02-May-2018,AAPL,Sell,0.25,7000
02-May-2018,AAPL,Sell,0.25,3000
Simple answer:
push #{ $lines{(split",")[1]} }, $_ while <>;
print #{ $lines{$_} } for grep #{ $lines{$_} } > 2, sort keys %lines;
perl program.pl inputfile > outputfile
You need to read the input file twice, because you don't know the final counts until you get to the end of the file
use strict;
use warnings 'all';
my ($TMPFILE, $TMPFILE1) = qw/ infile outfile /;
my %counts;
{
open my $fh, '<', $TMPFILE or die "Could not open $TMPFILE: $!";
while ( <$fh> ) {
my #fields = split /,/;
++$counts{$fields[1]};
}
}
open my $fh, '<', $TMPFILE or die "Could not open $TMPFILE: $!";
open my $out_fh, '>', $TMPFILE1 or die "Could not open $TMPFILE1: $!";
while ( <$fh> ) {
my #fields = split /,/;
print $out_fh $_ if $counts{$fields[1]} > 2;
}
output
02-May-2018,AAPL,Sell,0.25,1000
02-May-2018,AAPL,Sell,0.25,7000
02-May-2018,AAPL,Sell,0.25,3000
This should work:
use strict;
use warnings;
open (FILE, "<$TMPFILE") or die "Could not open $TMPFILE";
open (OUT, ">$TMPFILE1") or die "Could not open $TMPFILE1";
my %data;
while ( my $line = <FILE> ) {
chomp $line;
my #line = split /,/, $line;
push(#{$data{$line[1]}}, $line);
}
foreach my $key (keys %data) {
if(#{$data{$key}} > 2) {
print "$_\n" foreach #{$data{$key}};
}
}

Perl compilation error

sorry if it seems obvious but Im pretty new at Perl and programming and I've been working over a week and can't get it done.
My idea is simple. I've got a .csv where I've got the names in the first column, a number from -1 to 1 in the second and a position on the third. Then another file where I have got the names (line starts with >) and the info with 80 characters per line.
What I want to do is keep the name lines of the first file and grab the 'position' given from -20 to +60. But I cannot get it to work and I've got to the point where don't know where to follow.
use strict; #read file line by line
use warnings;
my $outputfile = "Output1.txt";
my $filename = "InputP.txt";
my $inputfasta = "Inputfasta.txt";
open my $fh, '<', $filename or die "Couldn't open '$filename'";
open my $fh2, '>', $outputfile or die "Couldn't create '$outputfile'";
open my $fh3, '<', $inputfasta or die "Couldn't open '$inputfasta'";
my $Psequence = 0;
my $seqname = 0;
while (my $line = <$fh>) {
chomp $line;
my $length = index ($line, ",");
$seqname = substr ($line, 0, $length);
my $length2 = index ($line, ",", $length);
my $score = substr ($line, $length +1, $length2);
my $length3 = index ($line, ",", $length2);
my $position = substr ($line, $length2 +1, $length3);
#print $fh2 "$seqname"."\t"."$score"."\t"."$position"."\n"; }
my $Rlength2 = index ($score, ",");
my $Rscore = substr ($score, 0, $Rlength2);
#print "$Rscore"."\n";}
while (my $linea = <$fh3>){ #same order.
chomp $linea;
if ($linea=~/^>(.+)/) {
print $fh3 "\n"."$linea"."\n"; }
else { $linea =~ /^\s*(.*)\s*$/;
chomp $linea;
print $fh3 "$linea". "\n"; }
}
if ($Rscore >= 0.5){
$Psequence = substr ($linea, -20, 81);
print "$seqname"."\n"."$Psequence";}
}
Please, learn to indent the code correctly. Then the error will be more obvious:
while (my $linea = <$fh3>){ #same order.
chomp $linea;
if ($linea =~ /^>(.+)/) {
print $fh3 "\n$linea\n";
} else {
# Commented out as it does nothing.
# $linea =~ /^\s*(.*)\s*$/;
# chomp $linea;
print $fh3 "$linea\n";
}
}
if ($Rscore >= 0.5){
$Psequence = substr $linea, -20, 81;
print "$seqname\n$Psequence";
}
$linea exists only in the while loop, but you try to use it in the following paragraph, too. The variable disappears when the loop ends.
Create a hash from the CSV where the key is the name and the value is the position.
use Text::CSV_XS qw( );
my %pos_by_name;
{
open(my $fh, '<', $input_qfn)
or die("Can't open $input_qfn: $!\n");
my $csv = Text::CSV_XS->new({ auto_diag => 1, binary => 1 });
while (my $row = $csv_in->getline($fh)) {
$pos_by_name{ $row->[0] } = $row->[2];
}
}
Then, it's just a question of extracting the names from the other file, and using the hash to find the associated position.
open(my $fh, '<', $fasta_qfn)
or die("Can't open $fasta_qfn: $!\n");
while (<$fh>) {
chomp;
my ($name) = /^>(.*)/
or next;
my $pos = $pos_by_name{$name};
if (!defined($pos)) {
die("Can't find position for $name\n");
}
... Do something with $name and $pos ...
}

comparing two files and display matched results

I've got the following problem to solve...I have two files containing the following information:
a.txt
alan, 23, alan#yahoo.com
albert, 27, albert#yahoo.com
b.txt
alan:173:analyst
victor:149:director
albert:171:clerk
coste:27:driver
I need to extract name(zero field) from every line of both files, compare them and if they match, print age and occupation information.
Thus, my output should be:
alan, 23, analyst
albert, 27, clerk
What I have got so far, and it's not working:
open F2, 'a.txt' or die $!;
#interesting_lines = <F2>;
foreach $line (#interesting_lines ) {
#string = split(', ', $line);
print "$string[0]\n";
}
close F2;
open F1, 'b.txt' or die $!;
while (defined(my $line = <F1>)) {
#string2 = split(':', $line);
print $string2[0];
print "$.:\t$string2[0]" if grep {$string2[0] eq $_} $string[0] ;
}
Does anyone have any ideas how can I implement my requirements? Thanks...
Ps, bith files might have more lines than I posted, but file b.txt will always have every name that file a.tx has, plus extra lines.
open my $F2, '<', 'a.txt' or die $!;
chomp(my #interesting_lines = <$F2>);
close $F2;
my %dic;
foreach my $line (#interesting_lines) {
my ($name, #arr) = split(/, /, $line);
$dic{$name} = \#arr;
}
open my $F1, '<', 'b.txt' or die $!;
while (my $line = <$F1>) {
chomp($line);
my ($name, $pos) = ( split(/:/, $line) )[0, 2];
my $arr = $dic{$name} or next;
printf("%s, %s, %s, %s\n", $name, $arr->[0], $pos, $arr->[1]);
}
close $F1;

how to count the number of specific characters through each line from file?

I'm trying to count the number of 'N's in a FASTA file which is:
>Header
AGGTTGGNNNTNNGNNTNGN
>Header2
AGNNNNNNNGNNGNNGNNGN
so in the end I want to get the count of number of 'N's and each header is a read so I want to make a histogram so I would at the end output something like this:
# of N's # of Reads
0 300
1 240
etc...
so there are 300 sequences or reads that have 0 number of 'N's
use strict;
use warnings;
my $file = shift;
my $output_file = shift;
my $line;
my $sequence;
my $length;
my $char_N_count = 0;
my #array;
my $count = 0;
if (!defined ($output_file)) {
die "USAGE: Input FASTA file\n";
}
open (IFH, "$file") or die "Cannot open input file$!\n";
open (OFH, ">$output_file") or die "Cannot open output file $!\n";
while($line = <IFH>) {
chomp $line;
next if $line =~ /^>/;
$sequence = $line;
#array = split ('', $sequence);
foreach my $element (#array) {
if ($element eq 'N') {
$char_N_count++;
}
}
print "$char_N_count\n";
}
Try this. I changed a few things like using scalar file handles. There are many ways to do this in Perl, so some people will have other ideas. In this case I used an array which may have gaps in it - another option is to store results in a hash and key by the count.
Edit: Just realised I'm not using $output_file, because I have no idea what you want to do with it :) Just change the 'print' at the end to 'print $out_fh' if your intent is to write to it.
use strict;
use warnings;
my $file = shift;
my $output_file = shift;
if (!defined ($output_file)) {
die "USAGE: $0 <input_file> <output_file>\n";
}
open (my $in_fh, '<', $file) or die "Cannot open input file '$file': $!\n";
open (my $out_fh, '>', $output_file) or die "Cannot open output file '$output_file': $!\n";
my #results = ();
while (my $line = <$in_fh>) {
next if $line =~ /^>/;
my $num_n = ($line =~ tr/N//);
$results[$num_n]++;
}
print "# of N's\t# of Reads\n";
for (my $i = 0; $i < scalar(#results) ; $i++) {
unless (defined($results[$i])) {
$results[$i] = 0;
# another option is to 'next' if you don't want to show the zero totals
}
print "$i\t\t$results[$i]\n";
}
close($in_fh);
close($out_fh);
exit;

How can I print lines from a file to separate files

I have a file which has lines like this:
1 107275 447049 scaffold1443 465 341154 -
There are several lines which starts with one, after that a blank line separates and start lines with 2 and so on.
I want to separate these lines to different files based on their number.
I wrote this script but it prints in every file only the first line.
#!/usr/bin/perl
#script for choosing chromosome
use strict;
my $filename= $ARGV[0];
open(FILE, $filename);
while (my $line = <FILE>) {
my #data = split('\t', $line);
my $length = #data;
#print $length;
my $num = $data[0];
if ($length == 6) {
open(my $fh, '>', $num);
print $fh $line;
}
$num = $num + 1;
}
please, i need your help!
use >> to open file for appending to end of it as > always truncates desired file to zero bytes,
use strict;
my $filename = $ARGV[0];
open(my $FILE, "<", $filename) or die $!;
while (my $line = <$FILE>) {
my #data = split('\t', $line);
my $length = #data;
#print $length;
my $num = $data[0];
if ($length == 6) {
open(my $fh, '>>', $num);
print $fh $line;
}
$num = $num + 1;
}
If I understand your question correctly, then paragraph mode might be useful. This breaks a record on two or more new-lines, instead of just one:
#ARGV or die "Supply a filename\n";
my $filename= $ARGV[0];
local $/ = ""; # Set paragraph mode
open(my $file, $filename) or die "Unable to open '$filename' for read: $!";
while (my $lines = <$file>) {
my $num = (split("\t", $lines))[0];
open(my $fh, '>', $num) or die "Unable to open '$num' for write: $!";
print $fh $lines;
close $fh;
}
close $file;