Print unique values in FOR loop - perl

I have two files, myresult and annotation. The data in both files look like ranges, but they are plain text, so I can't store them in an array directly; I need to use the split operator so that I can loop over the values in a for loop and compare them. Now I need to print all values common to $i (myresult) and $j (annotation) without repetition (unique). I can't work out which condition to use or how to implement it to get the desired output. I tried using a %hash but was not able to make it work.
myresult
0..351
12..363
24..375
36..387
48..399
60..411
.
.
annotation
272..1042
1649..2629
3436..4752
4793..4975
5408..6022
6025..6252
.
.
CODE:
#!/usr/bin/perl
open( $inp0, "<myresult" ) or die "not found";
open( $inp2, "<annotation" ) or die "not found";
open( $out, ">output" );
my @arr2 = <$inp0>;
my @arr4 = <$inp2>;
my $sum1 = 0;
foreach my $line1 (@arr2) {
    my ( $from1, $to1 ) = split( /\.\./, $line1 );
    foreach my $line2 (@arr4) {
        my ( $from2, $to2 ) = split( /\.\./, $line2 );
        for ( my $i = $from1; $i <= $to1; $i++ ) {
            for ( my $j = $from2; $j <= $to2; $j++ ) {
                if ( $i == $j ) {
                    print $out "$i \n";
                    $sum1++;
                }
            }
        }
    }
}
print "Unique values = $sum1";

You don't need to iterate over both the arrays. If the starting points of the ranges are ascending, you can use the following code:
#!/usr/bin/perl
use warnings;
use strict;
my @result = qw( 4..20 8..12 14..22 22..29 27..29 28..35 40..50 );
my @annot = qw( 1..5 11..13 25..37 45..55 );
my $from = (split /\.\./, $result[0])[0];
my $to = (split /\.\./, $result[-1])[1];
for my $i ($from .. $to) {
    print "$i\n" if grep inside($i, $_), @result
        and grep inside($i, $_), @annot;
}
sub inside {
    my ($i, $range) = @_;
    my ($from, $to) = split /\.\./, $range;
    return ($from <= $i and $i <= $to)
}

Translate each data set into an array of values.
Then use a hash to count the matched unique values from both lists:
#!/usr/bin/perl -w
use strict;
use warnings;
use autodie;
use List::MoreUtils qw(uniq);
my @result = do {
    #open my $fh, '<', "myresult";
    open my $fh, '<', \ "0..351\n12..363\n24..375\n36..387\n48..399\n60..411\n";
    map { my ( $min, $max ) = /\d+/g; ( $min .. $max ) } <$fh>;
};
my @annot = do {
    #open my $fh, '<', "annotation";
    open my $fh, '<', \ "272..1042\n1649..2629\n3436..4752\n4793..4975\n5408..6022\n6025..6252\n";
    map { my ( $min, $max ) = /\d+/g; ( $min .. $max ) } <$fh>;
};
my %count;
$count{$_}++ for uniq(@result), uniq(@annot);
print join( ' ', sort { $a <=> $b } grep { $count{$_} == 2 } keys %count ), "\n";
Outputs:
272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411
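Since the question mentions trying a %hash, here is a minimal sketch of the same idea using only core Perl, with no List::MoreUtils. The helper name expand_ranges and the shortened annotation list are assumptions made for illustration; real code would read the two files instead of using inline lists.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample ranges copied from the question (annotation list shortened).
my @result = qw( 0..351 12..363 24..375 36..387 48..399 60..411 );
my @annot  = qw( 272..1042 1649..2629 );

# Hypothetical helper: expand "from..to" strings into a hash whose keys
# are the covered values. Hash keys are unique, so duplicates vanish.
sub expand_ranges {
    my %seen;
    for my $range (@_) {
        my ( $from, $to ) = split /\.\./, $range;
        $seen{$_} = 1 for $from .. $to;
    }
    return \%seen;
}

my $in_result = expand_ranges(@result);
my $in_annot  = expand_ranges(@annot);

# A value is common if it is a key in both hashes.
my @common = sort { $a <=> $b } grep { $in_annot->{$_} } keys %$in_result;

print "Unique values = ", scalar(@common), "\n";
print "First = $common[0], last = $common[-1]\n";
```

With these sample lists it reports 140 unique values, from 272 to 411, which matches the output above. Expanding every range into memory is fine for ranges of this size but may become costly for very large ones.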

Related

How to expand a file based on the numbers given in the input file?

I have one input file:
MOD_GlcNHglycan 264-268 DTSGT
I would like to have output as follows:
MOD_GlcNHglycan 264 D
MOD_GlcNHglycan 265 T
MOD_GlcNHglycan 266 S
MOD_GlcNHglycan 267 G
MOD_GlcNHglycan 268 T
my $str = 'MOD_GlcNHglycan 264-268 DTSGT';
my ($name, $range, $chars) = split ' ', $str;
my ($start, $end) = split '-', $range;
my @chars_arr = split '', $chars;
my @results = ();
foreach my $char ( @chars_arr ) {
    # Build one "name position character" line per character
    push @results, join ' ', $name, $start++, $char;
}
Results in an array containing:
MOD_GlcNHglycan 264 D
MOD_GlcNHglycan 265 T
MOD_GlcNHglycan 266 S
MOD_GlcNHglycan 267 G
MOD_GlcNHglycan 268 T

How to group the values in foreach by if condition?

My script looks like this:
use warnings;
use strict;
my @ar = <DATA>;
for(my $i = 0; $i<=$#ar; $i++){
    $ar[$i] =~m/(\d+)$/g;
    print "$ar[$i]\n" if ($& <= 15);
    print "$ar[$i]\n" if ($& >100);
    print "$ar[$i]\n" if ($& <40 && $& > 15);
}
__DATA__
hinsa 121
mkzin 12
mkva 34
mvakine 2
mzkev 9
mkvvz 5
mkhvzz 35
It prints the lines, but it does not group them by the if conditions. I also tried this:
@ar = <DATA>;
for(my $i = 0; $i<=$#ar; $i++){
    $ar[$i] =~m/(\d+)$/g;
    print "$ar[$i]\n" if ($& <= 15);
}
for(my $v = 0; $v<=$#ar; $v++){
    $ar[$v] =~m/(\d+)$/g;
    print "$ar[$v]\n" if ($& >100);
}
for(my $z = 0; $z<=$#ar; $z++){
    $ar[$z] =~m/(\d+)$/g;
    print "$ar[$z]\n" if ($& <40 && $& > 15);
}
In this code the condition in the second for loop is not working.
It gives this output:
mkzin 12
mvakine 2
mzkev 9
mkvvz 5
mkva 34
mkhvzz 35
The output I expect is:
mkzin 12
mvakine 2
mzkev 9
mkvvz 5
hinsa 121
mkva 34
mkhvzz 35
How can I do it?
Also, please explain why the condition in the second for loop of my second script is not working.
@Hussain: When you write Perl code, make sure that you use use strict; and use warnings;. The problem with your code is that you are comparing an uninitialized $& value with a number, so Perl throws a warning such as "Use of uninitialized value $& in numeric gt (>)". I have modified your code to capture the number into a scalar variable instead, as shown below:
Input File(test.txt):
hinsa 121
mkzin 12
mkva 34
mvakine 2
mzkev 9
mkvvz 5
mkhvzz 35
Code:
use strict;
use warnings;
# Pass test.txt as an argument to the program
my $file = $ARGV[0];
open (my $fh, "<", $file) or die "can't open $file: $!";
chomp(my @ar = <$fh>);    # strip newlines so each print adds exactly one
for(my $i = 0; $i<=$#ar; $i++){
    my $temp = 0;
    ($temp) = $ar[$i] =~ m/(\d+)/g;
    print "$ar[$i]\n" if ($temp <= 15);
}
for(my $v = 0; $v<=$#ar; $v++){
    my $temp = 0;
    ($temp) = $ar[$v] =~ m/(\d+)/g;
    print "$ar[$v]\n" if ($temp > 100);
}
for(my $z = 0; $z<=$#ar; $z++){
    my $temp = 0;
    ($temp) = $ar[$z] =~ m/(\d+)/g;
    print "$ar[$z]\n" if ($temp <40 && $temp > 15);
}
close($fh);
Output:
mkzin 12
mvakine 2
mzkev 9
mkvvz 5
hinsa 121
mkva 34
mkhvzz 35
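As for why the second loop in the question prints nothing, one likely contributor is the /g flag. In scalar (or void) context, m//g resumes searching from pos() of the string, which the first loop left at the end of each line, so the next attempt on the same string fails and resets pos(). The short sketch below, using one sample line from the question, illustrates that behaviour; the list-context capture used above does not have this problem.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "hinsa 121";
my @matched;
for my $attempt ( 1 .. 3 ) {
    # Scalar-context m//g starts at pos($line), i.e. where the previous
    # successful match ended; a failed match resets pos($line).
    push @matched, ( $line =~ m/(\d+)$/g ) ? 1 : 0;
}
print "@matched\n";    # prints "1 0 1"
```

The second attempt fails because the search starts after the digits, and the third succeeds again because the failure reset the position. Dropping /g, or capturing in list context, avoids the surprise.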
There is no need for such convoluted code.
This program works by saving each line of the file into the appropriate element of array @groups, and printing the contents once the file has been read.
I hope you realise that lines with a value between 40 and 100 won't be printed at all?
use strict;
use warnings;
my @groups;
while (<DATA>) {
    next unless /(\d+)/;
    my $i;
    $i = 0 if $1 <= 15;
    $i = 1 if $1 > 100;
    $i = 2 if $1 < 40 and $1 > 15;
    push @{ $groups[$i] }, $_ if defined $i;
}
for (@groups) {
    print for @$_;
    print "\n";
}
__DATA__
hinsa 121
mkzin 12
mkva 34
mvakine 2
mzkev 9
mkvvz 5
mkhvzz 35
output
mkzin 12
mvakine 2
mzkev 9
mkvvz 5
hinsa 121
mkva 34
mkhvzz 35

How do I print the second-column elements in one row, separated by commas, when the first-column element is the same?

The input I am handling is as follows.
Q9NRG9 15
Q9NRG9 160
Q9NRG9 56
Q9NRG9 89
Q16613 26
Q16613 63
Q16613 102
O95477 19
O95477 91
O95477 78
O95477 86
O95477 16
O95477 203
O95477 66
P78363 18
P78363 159
P78363 88
I want output as
Q9NRG9 15,160,56,89
Q16613 26,63,102
O95477 78,86,16,203,66
I tried with a Perl program, but I couldn't get the output I want.
Using perl from the command line:
perl -lane '
    push @{ $h{$F[0]} }, $F[1]
}{
    $" = ",";
    print "$_ @{ $h{$_} }" for keys %h
' file
O95477 19,91,78,86,16,203,66
Q9NRG9 15,160,56,89
P78363 18,159,88
Q16613 26,63,102
To maintain the order, you can do:
perl -lane '
    $k{$F[0]}++ or push @r, $F[0];
    push @{ $h{$F[0]} }, $F[1]
}{
    $" = ",";
    print "$_ @{ $h{$_} }" for @r
' file
Try this:
open (FILE, "text.txt") or die "cannot open file".$!;
my %data;
while(<FILE>){
    chomp($_);
    my ($key, $value) = split(/\s+/,$_);
    push(@{$data{$key}}, $value);
}
foreach (keys %data){
    print $_." ".join(",",@{$data{$_}})."\n";
}

Merging multiple text files based on one column in perl

I am new to Perl and this is my first question here.
I have some (10-18) text files in a folder. I want to read all the files and merge the rows whose Name value is common to all the files, together with their Area column from each file.
For example:
file 1.txt
Name sim Area Cas
aa 12 54 222
ab 23 2 343
aaa 32 34 34
bba 54 76 65
file 2.txt
Name Sim Area Cas
ab 45 45 56
abc 76 87 98
bba 54 87 87
aaa 33 43 54
file 3.txt
Name Sim Area Cas
aaa 43 54 65
ab 544 76 87
ac 54 65 76
Output should be
Name Area1 Area2 Area3
aaa 32 43 54
ab 23 45 76
Can anyone help with this? I am very new to Perl and struggling to use hashes.
I have tried this so far:
use strict;
use warnings;
my $input_dir = 'C:/Users/Desktop/mr/';
my $output_dir = 'C:/Users/Desktop/test_output/';
opendir SD, $input_dir || die 'cannot open the input directory $!';
my @files_list = readdir(SD);
closedir(SD);
foreach my $each_file(@files_list)
{
    if ($each_file!~/^\./)
    {
        #print "$each_file\n"; exit;
        open (IN, $input_dir.$each_file) || die 'cannot open the inputfile $!';
        open (OUT, ">$output_dir$each_file") || die 'cannot open the outputfile $!';
        print OUT "Name\tArea\n";
        my %hash; my %area; my %remaning_data;
        while(my $line=<IN>){
            chomp $line;
            my @line_split=split(/\t/,$line);
            # print $_,"\n" foreach(@line_split);
            my $name=$line_split[0];
            my $area=$line_split[1];
        }
    }
}
Can anyone provide guidance on how to complete this?
Thanks in advance.
perl -lane '$X{$F[0]}.=" $F[2]";END{foreach(keys %X){if(scalar(split / /,$X{$_})==4){print $_,$X{$_}}}}' file1 file2 file3
tested:
> perl -lane '$X{$F[0]}.=" $F[2]";END{foreach(keys %X){if(scalar(split / /,$X{$_})==4){print $_,$X{$_}}}}' file1 file2 file3
ab 2 45 76
aaa 34 43 54
#!/usr/bin/perl
use strict;
use warnings;
my $inputDir = '/tmp/input';
my $outputDir = '/tmp/out';
# Use "or" and double quotes so a failure is reported and $! is interpolated
opendir(my $readdir, $inputDir) or die "cannot open the input directory: $!";
my @files_list = readdir($readdir);
closedir($readdir);
my %areas;
foreach my $file (@files_list) {
    next if $file =~ /^\.+$/; # skip . and ..
    open (my $fh, '<', "$inputDir/$file") or die "cannot open $inputDir/$file: $!";
    while (my $s = <$fh>) {
        if ($s =~ /(\w+)\s+[\d\.]+\s+([\d\.]+)/) {
            my ($name,$area) = ($1, $2); # parse name and area
            push(@{$areas{$name}}, $area); # add area to the hash of arrays
        }
    }
    close ($fh);
}
open (my $out, '>', "$outputDir/outfile") or die "cannot open $outputDir/outfile: $!";
foreach my $key (keys %areas) {
    print $out "$key ";
    print $out join " ", @{$areas{$key}};
    print $out "\n";
}
close ($out);
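The script above writes every name it sees, while the question asks only for names present in all files. A minimal sketch of that extra filter follows. It uses in-memory strings as hypothetical stand-ins for the three sample files and assumes each name appears at most once per file; real code would read the directory as above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for the three files in the question.
my @files = (
    "Name sim Area Cas\naa 12 54 222\nab 23 2 343\naaa 32 34 34\nbba 54 76 65\n",
    "Name Sim Area Cas\nab 45 45 56\nabc 76 87 98\nbba 54 87 87\naaa 33 43 54\n",
    "Name Sim Area Cas\naaa 43 54 65\nab 544 76 87\nac 54 65 76\n",
);

my %areas;    # name => [ Area value from each file, in file order ]
for my $content (@files) {
    for my $line ( split /\n/, $content ) {
        my ( $name, $sim, $area ) = split ' ', $line;
        next if $name eq 'Name';    # skip the header row
        push @{ $areas{$name} }, $area;
    }
}

# Keep only names that contributed one Area value per file.
my @rows;
for my $name ( sort keys %areas ) {
    next unless @{ $areas{$name} } == @files;
    push @rows, join ' ', $name, @{ $areas{$name} };
}
print "$_\n" for @rows;
```

For the sample data this prints "aaa 34 43 54" and "ab 2 45 76", the same rows as the one-liner above.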