Comparing two files and displaying matched results in Perl

I have two files containing the following information:
a.txt
alan, 23, alan@yahoo.com
albert, 27, albert@yahoo.com
b.txt
alan:173:analyst
victor:149:director
albert:171:clerk
coste:27:driver
I need to extract the name (field zero) from every line of both files, compare them, and, where they match, print the age and occupation.
Thus, my output should be:
alan, 23, analyst
albert, 27, clerk
What I have so far, which is not working:
open F2, 'a.txt' or die $!;
@interesting_lines = <F2>;
foreach $line (@interesting_lines ) {
    @string = split(', ', $line);
    print "$string[0]\n";
}
close F2;
open F1, 'b.txt' or die $!;
while (defined(my $line = <F1>)) {
    @string2 = split(':', $line);
    print $string2[0];
    print "$.:\t$string2[0]" if grep {$string2[0] eq $_} $string[0] ;
}
Does anyone have any ideas how I can implement this? Thanks.
P.S. Both files might have more lines than I posted, but b.txt will always contain every name that a.txt has, plus extra lines.

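# Index a.txt by name: each name maps to its remaining fields (age, email).
open my $F2, '<', 'a.txt' or die $!;
chomp(my @interesting_lines = <$F2>);
close $F2;
my %dic;
foreach my $line (@interesting_lines) {
    my ($name, @arr) = split(/, /, $line);
    $dic{$name} = \@arr;
}
# Scan b.txt and print a row for every name that also appears in a.txt.
open my $F1, '<', 'b.txt' or die $!;
while (my $line = <$F1>) {
    chomp($line);
    my ($name, $pos) = ( split(/:/, $line) )[0, 2];
    my $arr = $dic{$name} or next;
    # Prints name, age, occupation and email; drop the last field to get
    # exactly the output requested in the question.
    printf("%s, %s, %s, %s\n", $name, $arr->[0], $pos, $arr->[1]);
}
close $F1;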
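
If only the three fields from the question are wanted, the same hash lookup can be wrapped in a small sub. This is a minimal sketch rather than the answerer's code: `match_records` and its array-ref arguments are hypothetical names chosen for illustration, and the sample data is copied from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: join a.txt-style lines with b.txt-style lines by name.
sub match_records {
    my ($a_lines, $b_lines) = @_;

    # Map each name from the a.txt data to its age.
    my %age_of;
    for my $line (@$a_lines) {
        chomp(my $copy = $line);
        my ($name, $age) = split /,\s*/, $copy;
        next unless defined $name;
        $age_of{$name} = $age;
    }

    # Walk the b.txt data and keep the rows whose name was seen above.
    my @matched;
    for my $line (@$b_lines) {
        chomp(my $copy = $line);
        my ($name, undef, $occupation) = split /:/, $copy;
        next unless defined $name && exists $age_of{$name};
        push @matched, "$name, $age_of{$name}, $occupation";
    }
    return @matched;
}

my @a = ('alan, 23, alan@yahoo.com', 'albert, 27, albert@yahoo.com');
my @b = ('alan:173:analyst', 'victor:149:director',
         'albert:171:clerk', 'coste:27:driver');
print "$_\n" for match_records(\@a, \@b);
```

Because b.txt drives the output loop, the rows come out in b.txt order.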

Related

Duplicate values in column

I have an original file with the following columns:
02-May-2018,AAPL,Sell,0.25,1000
02-May-2018,C,Sell,0.25,2000
02-May-2018,JPM,Sell,0.25,3000
02-May-2018,WFC,Sell,0.25,5000
02-May-2018,AAPL,Sell,0.25,7000
02-May-2018,GOOG,Sell,0.25,8000
02-May-2018,GOOG,Sell,0.25,9000
02-May-2018,C,Sell,0.25,2000
02-May-2018,AAPL,Sell,0.25,3000
I am trying to print the original lines whenever a value appears in the second column more than 2 times. For example, since AAPL appears more than 2 times, the desired result is:
02-May-2018,AAPL,Sell,0.25,1000
02-May-2018,AAPL,Sell,0.25,7000
02-May-2018,AAPL,Sell,0.25,3000
So far I have written the following, which wrongly prints results multiple times. Can you please help me see what I am doing wrong?
open (FILE, "<$TMPFILE") or die "Could not open $TMPFILE";
open (OUT, ">$TMPFILE1") or die "Could not open $TMPFILE1";
%count = ();
@symbol = ();
while ($line = <FILE>)
{
    chomp $line;
    (@data) = split(/,/,$line);
    $count{$data[1]}++;
    @keys = sort {$count{$a} cmp $count{$b}} keys %count;
    for my $key (@keys)
    {
        if ( $count{$key} > 2 )
        {
            print "$line\n";
        }
    }
}

I'd do it something like this - store lines you've seen in a 'buffer' and print them out again if the condition is hit (before continuing to print as you go):
#!/usr/bin/env perl
use strict;
use warnings;
my %buffer;
my %count_of;
while ( my $line = <> ) {
    my ( $date, $ticker, @values ) = split /,/, $line;
    #increment the count
    $count_of{$ticker}++;
    if ( $count_of{$ticker} < 3 ) {
        #count limit not hit, so stash the current line in the buffer.
        $buffer{$ticker} .= $line;
        next;
    }
    #print the buffer if the count has been hit
    if ( $count_of{$ticker} == 3 ) {
        print $buffer{$ticker};
    }
    #only gets to here once the limit is hit, so just print normally.
    print $line;
}
With your input data, this outputs:
02-May-2018,AAPL,Sell,0.25,1000
02-May-2018,AAPL,Sell,0.25,7000
02-May-2018,AAPL,Sell,0.25,3000

Simple answer:
push @{ $lines{(split /,/)[1]} }, $_ while <>;
print @{ $lines{$_} } for grep @{ $lines{$_} } > 2, sort keys %lines;
Run it as:
perl program.pl inputfile > outputfile
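
For readers who find the two-liner dense, here is a longer sketch of the same grouping idea under `use strict`. The sub name `lines_with_repeated_key` and its `$min` parameter are hypothetical, introduced only for illustration; the sample rows are a subset of the data in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines whose second comma-separated field
# occurs more than $min times, grouped by that field.
sub lines_with_repeated_key {
    my ($lines, $min) = @_;

    my %lines_for;    # key => array ref of the lines carrying that key
    for my $line (@$lines) {
        my $key = (split /,/, $line)[1];
        next unless defined $key;
        push @{ $lines_for{$key} }, $line;
    }

    # Keep only the groups that are large enough, in sorted key order.
    return map  { @{ $lines_for{$_} } }
           grep { @{ $lines_for{$_} } > $min }
           sort keys %lines_for;
}

my @input = map { "$_\n" } (
    '02-May-2018,AAPL,Sell,0.25,1000',
    '02-May-2018,C,Sell,0.25,2000',
    '02-May-2018,AAPL,Sell,0.25,7000',
    '02-May-2018,GOOG,Sell,0.25,8000',
    '02-May-2018,AAPL,Sell,0.25,3000',
);
print lines_with_repeated_key(\@input, 2);
```

With this sample only AAPL occurs more than twice, so only its three lines are printed. Holding every line in memory is fine for small files; for very large inputs the two-pass approach may be the better trade-off.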

You need to read the input file twice, because you don't know the final counts until you get to the end of the file:
use strict;
use warnings 'all';
my ($TMPFILE, $TMPFILE1) = qw/ infile outfile /;
my %counts;
{
    open my $fh, '<', $TMPFILE or die "Could not open $TMPFILE: $!";
    while ( <$fh> ) {
        my @fields = split /,/;
        ++$counts{$fields[1]};
    }
}
open my $fh, '<', $TMPFILE or die "Could not open $TMPFILE: $!";
open my $out_fh, '>', $TMPFILE1 or die "Could not open $TMPFILE1: $!";
while ( <$fh> ) {
    my @fields = split /,/;
    print $out_fh $_ if $counts{$fields[1]} > 2;
}
Output:
02-May-2018,AAPL,Sell,0.25,1000
02-May-2018,AAPL,Sell,0.25,7000
02-May-2018,AAPL,Sell,0.25,3000

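This should work:
use strict;
use warnings;
# Placeholder file names; replace them with your real paths.
my ($TMPFILE, $TMPFILE1) = qw/ infile outfile /;
open (FILE, "<$TMPFILE") or die "Could not open $TMPFILE: $!";
open (OUT, ">$TMPFILE1") or die "Could not open $TMPFILE1: $!";
my %data;
# Group the lines by the value in the second column.
while ( my $line = <FILE> ) {
    chomp $line;
    my @line = split /,/, $line;
    push(@{$data{$line[1]}}, $line);
}
# Write out every group that has more than two lines.
foreach my $key (keys %data) {
    if(@{$data{$key}} > 2) {
        print OUT "$_\n" foreach @{$data{$key}};
    }
}
close OUT;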

Nested if statements: Swapping headers and sequences in fasta files

I am opening a directory and processing each file. A sample file looks like this when opened:
>AAAAA
TTTTTTTTTTTAAAAATTTTTTTTTT
>BBBBB
TTTTTTTTTTTTTTTTTTBBBBBTTT
>CCCCC
TTTTTTTTTTTTTTTTCCCCCTTTTT
For the above sample file, I am trying to make them look like this:
>TAAAAAT
AAAAA
>TBBBBBT
BBBBB
>TCCCCCT
CCCCC
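I need to find the "header" in the sequence on the next line, take one flanking character on either side of the match, and then swap the two: the flanked match becomes the header and the old header becomes the sequence. I want to print each file's worth of contents to a separate output file.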
Here is my code so far. It runs without errors, but doesn't generate any output. My guess is this is probably related to the nested if statements. I have never worked with those before.
#!/usr/bin/perl
use strict;
use warnings;
my ($directory) = @ARGV;
my $dir = "$directory";
my @ArrayofFiles = glob "$dir/*";
my $count = 0;
open(OUT, ">", "/path/to/output_$count.txt") or die $!;
foreach my $file(@ArrayofFiles){
    open(my $fastas, $file) or die $!;
    while (my $line = <$fastas>){
        $count++;
        if ($line =~ m/(^>)([a-z]{5})/i){
            my $header = $2;
            if ($line !~ /^>/){
                my $sequence .= $line;
                if ($sequence =~ m/(([a-z]{1})($header)([a-z]{1}))/i){
                    my $matchplusflanks = $1;
                    print OUT ">", $matchplusflanks, "\n", $header, "\n";
                }
            }
        }
    }
}
How can I fix this code? Thanks.

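Try this (it assumes each sequence sits on the single line after its header):
foreach my $file (@ArrayofFiles)
{
    open my $fh, '<', $file or die "error opening $file: $!\n";
    while (my $head = <$fh>)
    {
        chomp $head;
        $head =~ s/>//;
        my $next_line = <$fh>;
        # Capture the header plus one flanking character on each side.
        my ($extract) = $next_line =~ m/(.\Q$head\E.)/;
        print ">$extract\n$head\n";
    }
}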

There are several mistakes in your code but the main problem is:
if ($line =~ m/(^>)([a-z]{5})/i) {
my $header = $2;
if ($line !~ /^>/) {
# here you write to the output file
Because the same line can't start and not start with > at the same time, your output files are never written. The second if statement always fails and its block is never executed.
open(OUT, ">", "/path/to/output_$count.txt") or die $!; and $count++ are misplaced. Since you want to produce an output file (with a new name) for each input file, you need to put them in the foreach block, not outside or in the while loop.
Example:
#!/usr/bin/perl
use strict;
use warnings;
my ($dir) = @ARGV;
my @files = glob "$dir/*";
my $count;
my $format = ">%s\n%s\n";
foreach my $file (@files) {
    open my $fhi, '<', $file
        or die "Can't open file '$file': $!";
    $count++;
    my $output_path = "/path/to/output_$count.txt";
    open my $fho, '>', $output_path
        or die "Can't open file '$output_path': $!";
    my ($header, $seq);
    while(<$fhi>) {
        chomp;
        if (/^>([a-z]{5})/i) {
            # A new header starts: flush the previous record first.
            if ($seq) { printf $fho $format, $seq =~ /([a-z]\Q$header\E[a-z])/i, $header; }
            ($header, $seq) = ($1, '');
        } else { $seq .= $_; }
    }
    # Flush the last record of the file.
    if ($seq) { printf $fho $format, $seq =~ /([a-z]\Q$header\E[a-z])/i, $header; }
    # Close inside the loop, where the lexical handles are still in scope.
    close $fhi;
    close $fho;
}

Perl compilation error

Sorry if this seems obvious, but I'm pretty new to Perl and programming; I've been working on this for over a week and can't get it done.
My idea is simple. I have a .csv with names in the first column, a number from -1 to 1 in the second, and a position in the third. Then I have another file with the names (those lines start with >) and the data at 80 characters per line.
What I want to do is keep the name lines of the first file and grab the region from -20 to +60 around the given position. But I cannot get it to work, and I've reached the point where I don't know how to continue.
use strict; #read file line by line
use warnings;
my $outputfile = "Output1.txt";
my $filename = "InputP.txt";
my $inputfasta = "Inputfasta.txt";
open my $fh, '<', $filename or die "Couldn't open '$filename'";
open my $fh2, '>', $outputfile or die "Couldn't create '$outputfile'";
open my $fh3, '<', $inputfasta or die "Couldn't open '$inputfasta'";
my $Psequence = 0;
my $seqname = 0;
while (my $line = <$fh>) {
chomp $line;
my $length = index ($line, ",");
$seqname = substr ($line, 0, $length);
my $length2 = index ($line, ",", $length);
my $score = substr ($line, $length +1, $length2);
my $length3 = index ($line, ",", $length2);
my $position = substr ($line, $length2 +1, $length3);
#print $fh2 "$seqname"."\t"."$score"."\t"."$position"."\n"; }
my $Rlength2 = index ($score, ",");
my $Rscore = substr ($score, 0, $Rlength2);
#print "$Rscore"."\n";}
while (my $linea = <$fh3>){ #same order.
chomp $linea;
if ($linea=~/^>(.+)/) {
print $fh3 "\n"."$linea"."\n"; }
else { $linea =~ /^\s*(.*)\s*$/;
chomp $linea;
print $fh3 "$linea". "\n"; }
}
if ($Rscore >= 0.5){
$Psequence = substr ($linea, -20, 81);
print "$seqname"."\n"."$Psequence";}
}

Please, learn to indent the code correctly. Then the error will be more obvious:
    while (my $linea = <$fh3>){ #same order.
        chomp $linea;
        if ($linea =~ /^>(.+)/) {
            print $fh3 "\n$linea\n";
        } else {
            # Commented out as it does nothing.
            # $linea =~ /^\s*(.*)\s*$/;
            # chomp $linea;
            print $fh3 "$linea\n";
        }
    }
    if ($Rscore >= 0.5){
        $Psequence = substr $linea, -20, 81;
        print "$seqname\n$Psequence";
    }
$linea exists only inside the while loop, but you also try to use it in the if block that follows. The variable goes out of scope when the loop ends.
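
One common way around this kind of scoping problem is to declare a second variable before the loop and copy into it whatever needs to survive. The following is a minimal sketch of that idea, not a fix for the whole script; `last_sequence_line` is a hypothetical helper and the in-memory FASTA text is made-up sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: return the last non-header line read from a filehandle.
sub last_sequence_line {
    my ($fh) = @_;
    my $last;                       # declared outside the loop, so it survives it
    while (my $line = <$fh>) {      # $line is visible only inside this loop
        chomp $line;
        next if $line =~ /^>/;      # skip header lines
        $last = $line;
    }
    return $last;                   # still in scope here; $line is not
}

# Read from an in-memory string instead of a real file.
my $fasta = ">seq1\nACGT\nTTGA\n";
open my $fh, '<', \$fasta or die "Can't open in-memory file: $!";
print last_sequence_line($fh), "\n";
close $fh;
```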

Create a hash from the CSV where the key is the name and the value is the position.
use Text::CSV_XS qw( );
my %pos_by_name;
{
    open(my $fh, '<', $input_qfn)
        or die("Can't open $input_qfn: $!\n");
    my $csv = Text::CSV_XS->new({ auto_diag => 1, binary => 1 });
    while (my $row = $csv->getline($fh)) {
        $pos_by_name{ $row->[0] } = $row->[2];
    }
}
Then, it's just a question of extracting the names from the other file, and using the hash to find the associated position.
open(my $fh, '<', $fasta_qfn)
    or die("Can't open $fasta_qfn: $!\n");
while (<$fh>) {
    chomp;
    my ($name) = /^>(.*)/
        or next;
    my $pos = $pos_by_name{$name};
    if (!defined($pos)) {
        die("Can't find position for $name\n");
    }
    # ... Do something with $name and $pos ...
}
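
As for what to do with `$pos`, the question asks for the region from -20 to +60 around it. A rough sketch of that step follows, under assumptions the question does not confirm: the position is treated as a 0-based offset, the 80-character lines of a record have already been joined into one string, and `window_around` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: return the part of $seq running from $before characters
# before $pos to $after characters after it, clamped to the ends of $seq.
# Assumes $pos is a 0-based offset into $seq.
sub window_around {
    my ($seq, $pos, $before, $after) = @_;
    my $start = $pos - $before;
    $start = 0 if $start < 0;
    my $end = $pos + $after;
    $end = length($seq) - 1 if $end > length($seq) - 1;
    return '' if $end < $start;
    return substr($seq, $start, $end - $start + 1);
}

# Small demonstration with a short made-up sequence.
print window_around('ABCDEFGHIJ', 5, 2, 3), "\n";   # prints DEFGHI
```

For the case in the question the call would look like `window_around($sequence, $pos, 20, 60)`. If the positions in the CSV turn out to be 1-based, subtract one before calling.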

compare two file in perl and find mismatches

I tried to read two files and compare them:
file1 : AAAAAAAAAA
file2: AAAABAAAAA
output: MMMMNMMMMM
open(my $fh1, '<', 'file1');
open(my $fh2, '<', 'file2');
while(
    defined(my $line1 = <$fh1>)
    and
    defined(my $line2 = <$fh2>)
){
    chomp $line1;
    chomp $line2;
    my @line1 = split(//, $line1);
    my @line2 = split(//, $line2);
    for my $i (0 .. @line1-1){
        for my $j (0 .. @line2-1){
            if ($line1[$i] eq $line2[$j]){
                print "M";}
            else {
                print "N";}
            $j++;}
        $i++;}}
close $fh1;
close $fh2;
It prints the output repeatedly. Any help would be greatly appreciated.

You need only one for loop per line:
my $max = $#line1 > $#line2 ? $#line1 : $#line2;
for my $i (0 .. $max) {
    if ($line1[$i] eq $line2[$i]) { print "M";}
    else { print "N";}
}
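
The single-loop comparison can also be packaged as a sub that returns the marker string. This is a sketch rather than the answerer's code: `compare_lines` is a hypothetical name, and treating a position that is missing from the shorter line as a mismatch is an assumption, since the question only shows lines of equal length.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two strings position by position and return
# a string of M (match) and N (mismatch) markers.
sub compare_lines {
    my ($line1, $line2) = @_;
    my @chars1 = split //, $line1;
    my @chars2 = split //, $line2;
    my $max = $#chars1 > $#chars2 ? $#chars1 : $#chars2;

    my $result = '';
    for my $i (0 .. $max) {
        # A position missing from the shorter line counts as a mismatch.
        my $same = defined $chars1[$i]
                && defined $chars2[$i]
                && $chars1[$i] eq $chars2[$i];
        $result .= $same ? 'M' : 'N';
    }
    return $result;
}

print compare_lines('AAAAAAAAAA', 'AAAABAAAAA'), "\n";   # prints MMMMNMMMMM
```

The `defined` checks also avoid the "uninitialized value" warnings that `eq` would raise when the lines differ in length.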

How can I print lines from a file to separate files

I have a file which has lines like this:
1 107275 447049 scaffold1443 465 341154 -
Several lines start with 1; then a blank line separates them from the lines that start with 2, and so on.
I want to write these lines to different files based on their number.
I wrote this script, but it writes only the first line to each file.
#!/usr/bin/perl
#script for choosing chromosome
use strict;
my $filename= $ARGV[0];
open(FILE, $filename);
while (my $line = <FILE>) {
    my @data = split('\t', $line);
    my $length = @data;
    #print $length;
    my $num = $data[0];
    if ($length == 6) {
        open(my $fh, '>', $num);
        print $fh $line;
    }
    $num = $num + 1;
}
Please, I need your help!

Use >> to open the file for appending, because > always truncates the file to zero bytes:
use strict;
my $filename = $ARGV[0];
open(my $FILE, "<", $filename) or die $!;
while (my $line = <$FILE>) {
    my @data = split('\t', $line);
    my $length = @data;
    #print $length;
    my $num = $data[0];
    if ($length == 6) {
        # '>>' appends, so earlier lines written to the same file are kept.
        open(my $fh, '>>', $num) or die $!;
        print $fh $line;
    }
    $num = $num + 1;
}

If I understand your question correctly, then paragraph mode might be useful. This breaks a record on two or more new-lines, instead of just one:
@ARGV or die "Supply a filename\n";
my $filename= $ARGV[0];
local $/ = ""; # Set paragraph mode
open(my $file, $filename) or die "Unable to open '$filename' for read: $!";
while (my $lines = <$file>) {
    my $num = (split("\t", $lines))[0];
    open(my $fh, '>', $num) or die "Unable to open '$num' for write: $!";
    print $fh $lines;
    close $fh;
}
close $file;
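
To see what paragraph mode does without touching any files, the following sketch reads blank-line-separated blocks from an in-memory string. `read_paragraphs` is a hypothetical helper, and the tab-separated sample text is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read blank-line-separated blocks from a filehandle.
sub read_paragraphs {
    my ($fh) = @_;
    local $/ = "";              # paragraph mode, restored when the sub returns
    my @paragraphs = <$fh>;     # each element holds one whole block of lines
    return @paragraphs;
}

# Two blocks separated by blank lines, read from an in-memory string.
my $text = "1\ta\n1\tb\n\n\n2\tc\n";
open my $in, '<', \$text or die "Can't open in-memory file: $!";
my @blocks = read_paragraphs($in);
close $in;

for my $block (@blocks) {
    my $num = (split /\t/, $block)[0];   # first field names the block
    print "block for key $num\n";
}
```

Using `local $/` inside a sub keeps the change from leaking into other reads in the program, which is the main reason to prefer it over a plain assignment.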
