Group a list of numbers - Perl

I have a list of numbers in a file in one column, such as:
144
542
123
54
234
233
I want to group the numbers in sets of n.
For example: if n=2 then 144,542 is in one group, 123,54 in the second, 234,233 in the third, and so on till the end of the file.
The loop I wrote gives me only the first group of numbers, not the entire list.
What changes should I make?
use strict;
open ( IN ,"$inputfile") || die ("cannot open ! ");
my @list;
my $N=2;
while (@list = <IN>) {
chomp;
for ( $i=1;$i<=$N;$i++){
print "@list[$i]";
}
}

Use natatime from List::MoreUtils
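(As to why your loop prints only one group: while (@list = <IN>) slurps the entire file into @list on its first pass, so the loop body runs exactly once, and the inner for then prints just elements 1..$N of that one array rather than successive groups.)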
use warnings;
use strict;
use List::MoreUtils qw(natatime);
my $n = 2;
my @list;
while (<DATA>) {
chomp;
push @list, $_;
}
my $it = natatime($n, @list);
while (my @vals = $it->()) {
print "@vals\n";
}
__DATA__
144
542
123
54
234
233
Prints:
144 542
123 54
234 233
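If the number of lines is not a multiple of $n, the iterator's final call simply returns the shorter leftover group.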

You can use the by function from List::Gen to partition a list into equal size segments:
use List::Gen qw(by);
my $pairs = by 2 => # partition by 2
grep {s/^\s+|\s+$//g; length} # remove whitespace and empty lines
<DATA>; # read all lines
print "@$_\n" for @$pairs;
__DATA__
144
542
123
54
234
233
which prints:
144 542
123 54
234 233

I have to applaud your use of strict and would like to encourage you to also add warnings. :)
And a solution that makes the semantics a bit more clear:
use strict;
use warnings;
use File::Slurp 'read_file';
use Array::Split qw( split_by );
my $inputfile = 'file';
my @lines = read_file( $inputfile );
$_ =~ s/[\r\n]//g for @lines; # remove newlines
my @grouped_lines = split_by( 2, @lines );
for my $group ( @grouped_lines ) {
print join ',', @{$group};
print "\n";
}
__END__
144
542
123
54
234
233
becomes:
144,542
123,54
234,233
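For completeness, the same grouping can be done with core Perl alone; a minimal sketch using splice, with no module dependencies:
use strict;
use warnings;

my $n = 2;
chomp( my @list = <DATA> );    # read all lines, strip newlines

# splice removes (and returns) the first $n elements on each pass,
# so the loop ends once @list is empty; a trailing short group is kept
while ( my @group = splice @list, 0, $n ) {
    print join( ',', @group ), "\n";
}

__DATA__
144
542
123
54
234
233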


How to split left data as keys and right data as values in Perl

my input file:
Lin :202020 123 455
Star :21334 1233 555
I'm trying to split Lin & Star as keys and the rest of the numbers as values.
However, I'm getting output like this:
'202020 123 455 Star' => undef,
my expected output :
$VAR1 = {
'Lin' => '202020 123 455',
'Star' => '21334 1233 555'
}
My code:
use strict;
use warnings;
use Data::Dumper;
.
.
.
.
while($lines)
{
my %hash = split /[:]/,$lines;
print Dumper (%hash);
last;
}
Problems with your code:
We don't know what $lines is
while ($lines) doesn't make any sense
You're re-initializing %hash on every iteration of the loop
last doesn't make any sense
Solution:
use strict;
use warnings;
use Data::Dumper;
my %hash;
while (<DATA>) {
chomp;
my @parts = split(/\s*:\s*/);
$hash{$parts[0]} = $parts[1];
}
print Dumper(\%hash);
__DATA__
Lin :202020 123 455
Star :21334 1233 555
use 5.18.2;
use strict;
use warnings;
use Data::Dumper;
my @Data = ( "Lin :202020 123 455", "Star :21334 1233 555" );
my %hash = map { (split /\s*:\s*/)[0] => (split /\s*:\s*/)[1] } @Data; # you can also use <STDIN> instead of @Data
print Dumper(\%hash);
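Since split in list context already returns both fields, splitting each element twice is redundant; a sketch of the same idea with a single split per element (limited to two fields so the value keeps its internal spaces):
use strict;
use warnings;
use Data::Dumper;

my @Data = ( "Lin :202020 123 455", "Star :21334 1233 555" );

# each element expands to one key/value pair: the text before the
# colon and everything after it; /\s*:\s*/ also trims the spaces
my %hash = map { split /\s*:\s*/, $_, 2 } @Data;
print Dumper(\%hash);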

Merging N files based on their first column in Perl

My question is similar to this question posted earlier.
I have many files that I need to merge based on the presence or absence of the first-column ID, but while merging I am getting lots of empty values in my output file. I want those empty values to be zero when the ID is not present in the other file. The example below is based on only two files' content, but I have many sample files in this (tabular) format.
For example:
File1
ID Value
123 1
231 2
323 3
541 7
File2
ID Value
541 6
123 1
312 3
211 4
Expected Output:
ID File1 File2
123 1 1
231 2 0
323 3 0
541 7 6
312 0 3
211 0 4
Output I am obtaining:
ID File1 File2
123 1 1
231 2
323 3
541 7 6
312 undef 3
211 undef 4
As you can see above, I am getting output, but the file2 column is not getting a zero, it is just left empty, and the file1 column ends up with undef values. I do check for undef values so that my final output prints zeros in their place, but I still have those empty spaces. Please find my code below (hardcoded for only two files).
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
use Data::Dumper;
my $path = "/home/pranjay/Projects/test";
my @files = ("s1.txt","s2.txt");
my %classic_com;
my $cnt;
my $classic_txt;
my $sample_cnt = 0;
my $classic_txtcomb = "test_classic.txt";
open($classic_txt,">$path/$classic_txtcomb") or die "Couldn't open file $classic_txtcomb for writing,$!";
print $classic_txt "#ID\t"."file1\tfile2\n";
foreach my $file(@files){
$sample_cnt++;
print "$sample_cnt\n";
open($cnt,"<$path/$file")or die "Couldn't open file $file for reading,$!";
while(<$cnt>){
chomp($_);
my @count = ();
next if($_=~/^ID/);
my @record=();
@record=split(/\t/,$_);
my $scnt = $sample_cnt -1;
if((exists($classic_com{$record[0]})) and ($sample_cnt > 0)){
${$classic_com{$record[0]}}[$scnt]=$record[1];
}else{
$count[$scnt] = "$record[1]";
$classic_com{$record[0]}= [@count];
}
}
}
my %final_txt=();
foreach my $key ( keys %classic_com ) {
#print "$key: ";
my @val = @{ $classic_com{$key} };
my @v;
foreach my $i ( @val ) {
if(not defined($i)){
$i = 0;
push(@v, $i);
}else{
push(@v, $i);
next;
}
}
$final_txt{$key} = [@v];
}
#print Dumper %classic_com;
while(my($key,$value)=each(%final_txt)){
my $val=join("\t", @{$value});
print $classic_txt "$key\t"."@{$value}"."\n";
}
Just read the input files into a hash of arrays: the topmost key is the ID, and each inner array holds the value for file i at the i-th position. When printing, use the // defined-or operator to replace undefs with zeroes:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my %merged;
my $file_tally = 0;
while (my $file = shift) {
open my $in, '<', $file or die "$file: $!";
<$in>; # skip the header
while (<$in>) {
my ($id, $value) = split;
$merged{$id}[$file_tally] = $value;
}
++$file_tally;
}
for my $value (keys %merged) {
my @values = @{ $merged{$value} };
say join "\t", $value, map $_ // 0, @values[0 .. $file_tally - 1];
}
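The script takes its input files from the command line (shift with no argument pulls them out of @ARGV), so assuming it is saved as merge.pl, run it as:
perl merge.pl s1.txt s2.txt
Each output row is the ID followed by one column per input file, with // 0 turning missing entries into zeroes.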
program.pl
my %val;
/ (\d+) \s+ (\d+) /x and $val{$1}{$ARGV} = $2 while <>;
pr( 'ID', my @f = sort keys %{{map%$_,values%val}} );
pr( $_, map$_//0, @{$val{$_}}{@f} ) for sort keys %val;
sub pr{ print join("\t",@_)."\n" }
Run:
perl program.pl s1.txt s2.txt
ID s1.txt s2.txt
123 1 1
211 0 4
231 2 0
312 0 3
323 3 0
541 7 6
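To unpack the dense parts of program.pl: the regex captures the ID and value from each line and stores them under the current input file's name ($ARGV); %{{map %$_, values %val}} flattens all the inner hashes into one throwaway hash just to collect the distinct file names, whose sorted keys become the column order @f; and @{$val{$_}}{@f} is a hash slice pulling each ID's values out in that same column order, with // 0 filling the gaps.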

Calculating distances in PDB file

With reference to the question Calculating the distance between atomic coordinates, where the input is
ATOM 920 CA GLN A 203 39.292 -13.354 17.416 1.00 55.76 C
ATOM 929 CA HIS A 204 38.546 -15.963 14.792 1.00 29.53 C
ATOM 939 CA ASN A 205 39.443 -17.018 11.206 1.00 54.49 C
ATOM 947 CA GLU A 206 41.454 -13.901 10.155 1.00 26.32 C
ATOM 956 CA VAL A 207 43.664 -14.041 13.279 1.00 40.65 C
.
.
.
ATOM 963 CA GLU A 208 45.403 -17.443 13.188 1.00 40.25 C
there is an answer reported as
use strict;
use warnings;
my @line;
while (<>) {
push @line, $_; # add line to buffer
next if @line < 2; # skip unless buffer is full
print proc(@line), "\n"; # process and print
shift @line; # remove used line
}
sub proc {
my @a = split ' ', shift; # line 1
my @b = split ' ', shift; # line 2
my $x = ($a[6]-$b[6]); # calculate the diffs
my $y = ($a[7]-$b[7]);
my $z = ($a[8]-$b[8]);
my $dist = sprintf "%.1f", # format the number
sqrt($x**2+$y**2+$z**2); # do the calculation
return "$a[3]-$b[3]\t$dist"; # return the string for printing
}
The output of the above code is the distance from the first CA to the second, the second to the third, and so on.
How can I modify this code to find the distance from the first CA to all the remaining CAs (2, 3, ...), from the second CA to the remaining CAs (3, 4, ...), and so on, printing only those distances that are less than 5 Angstroms?
I gather that the push @line, $_; statement should be altered to increase the array size, but it is not clear to me how to do that.
To get the pairs, read the file into an array, @data_array. Then loop over the entries.
Update: Added the file opening and the loading of @data_array.
open my $fh, '<', 'atom_file.pdb' or die $!;
my #data_array = <$fh>;
close $fh or die $!;
for my $i (0 .. $#data_array) {
for my $j ($i+1 .. $#data_array) {
process(@data_array[$i,$j]);
}
}
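Combining that pair loop with the proc subroutine from the question gives the complete answer; a sketch where the placeholder process above is replaced by proc plus the 5 Angstrom cut-off:
for my $i (0 .. $#data_array) {
    for my $j ($i + 1 .. $#data_array) {
        # proc() returns "RES1-RES2\tDIST" for a pair of ATOM lines
        my ($labels, $dist) = split /\t/, proc(@data_array[$i, $j]);
        print "$labels\t$dist\n" if $dist < 5;    # keep only pairs under 5 Angstroms
    }
}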
Maybe try this:
use strict;
use warnings;
my @alllines = ();
while(<DATA>) { push(@alllines, $_); }
#Each Current line
for(my $i=0; $i<=$#alllines; $i++)
{
#Each Next line
for(my $j=$i+1; $j<=$#alllines; $j++)
{
if($alllines[$i])
{
#Split the line into tab delimits
my ($line1_tb_1,$line1_tb_2,$line1_tb_3) = split /\t/, $alllines[$i];
print "Main_Line: $line1_tb_1\t$line1_tb_2\t$line1_tb_3";
if($alllines[$j])
{
#Split the line into tab delimits
my ($line_nxt_tb1,$line_nxt_tb2,$line_nxt_tb3) = split /\t/, $alllines[$j];
print "Next_Line: $line_nxt_tb1\t$line_nxt_tb2\t$line_nxt_tb3";
#Do it your coding/regex here
}
}
#system 'pause'; # for testing purposes only
}
}
__DATA__
tab1 123 456
tab2 789 012
tab3 345 678
tab4 901 234
tab5 567 890
I hope this will help you.

Awk - Get rows from a file that contain values within ranges described in another file

I have 2 tab-delimited files formatted similar to this:
file 1
A 100 90 PASS
B 89 80 PASS
C 79 70 PASS
D 69 60 FAIL
F 59 0 FAIL
file 2
Randy 80
Denis 44
Earl 97
I want to take the values from column 2 in file 2 and compare them with the ranges given between columns 2 and 3 of file 1. Then I want to create a new file that combines this data, printing columns 1 and 2 from file 2 and columns 1 and 4 from file 1:
file 3
Randy 80 B PASS
Denis 44 F FAIL
Earl 97 A PASS
I want to implement this using awk or Perl.
You can use this awk:
awk 'BEGIN{FS=OFS="\t"}
FNR==NR {
a[$0] = $2
next
}
{
for (i in a)
if ($2>=a[i] && $3<=a[i])
print i, $1, $4
}' file2 file1
Earl 97 A PASS
Randy 80 B PASS
Denis 44 F FAIL
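(The FNR==NR block runs only while awk is reading the first file argument, file2; it saves each whole line of file2 in a, keyed by itself, with the score as the value. The second block then runs once per line of file1 and prints every saved file2 line whose score falls inside that grade's range.)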
In Perl, I'd probably do something like this:
#!/usr/bin/env perl
use strict;
use warnings 'all';
use Data::Dumper;
open ( my $grades_in, '<', "file1.txt" ) or die $!;
my @grade_lookup = map { [split] } <$grades_in>;
print Dumper \@grade_lookup;
close ( $grades_in );
open ( my $people, '<', "file2.txt" ) or die $!;
while (<$people>) {
chomp;
my ( $person, $score ) = split;
my ( $grade ) = grep { $_ -> [1] >= $score
and $_ -> [2] <= $score } @grade_lookup;
print join " ", $person, $score, $grade -> [0], $grade -> [3], "\n";
}
close ( $people );
output:
Randy 80 B PASS
Denis 44 F FAIL
Earl 97 A PASS
In Perl
use strict;
use warnings 'all';
use autodie;
use List::Util 'first';
my @grades = do {
open my $fh, '<', 'file1.txt';
map [ split ], <$fh>;
};
open my $fh, '<', 'file2.txt';
while ( <$fh>) {
my ($name, $score) = split;
my $grade = first { $_->[2] <= $score } @grades;
print "$name $score @$grade[0,3]\n";
}
output
Randy 80 B PASS
Denis 44 F FAIL
Earl 97 A PASS
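One difference worth noting between the two Perl versions: the grep solution tests both ends of each range, so the row order of file1.txt does not matter, while the first solution checks only the lower bound and therefore relies on file1.txt being sorted by descending score.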

Print header row and each row in the file side by side, in columns

Hi, I have a TSV file for which I am trying to print the header row and each line of the file side by side, i.e. in columns.
Unfortunately I am a bit confused about how to join the lines in a print statement.
#!/usr/bin/perl
use strict;
use warnings;
local $" = "'\n'";
my @temp;
while (<DATA>) {
chomp;
my @columns = join "\t", $_;
push @temp, @columns;
}
my $Header_row = shift (@temp);
my @head;
my $abc;
my @abc = split(/\t/,$Header_row);
for my $abc(@abc) {
push @head, $abc ."\n";
}
my @roows;
my $elements;
foreach (@temp){
chomp;
my $line = $_;
my @elements = split ("\t", $line);
for $elements(@elements){
push @roows, $elements ."\n";
}
}
#print @head, $abc ."\n";
#print @roows, $elements ."\n";
__DATA__
Year Tonn Class Cargo Type
88 61 T Rice Truck
89 55 G Corn Train
92 93 S Peas Ship
Required output:
Year 88
Tonn 61
Class T
Cargo Rice
Type Truck
Year 89
Tonn 55
Class G
Cargo Corn
Type Train
Year 92
Tonn 93
Class S
Cargo Peas
Type Ship
Based on your source, this should do the trick:
#!/usr/bin/env perl
use strict;
use warnings;
#read the header line into @header
my $header_line = <DATA>;
chomp $header_line;
my @header = split ' ', $header_line;
#iterate over the data filehandle
while ( <DATA> ) {
#strip linefeed
chomp;
#read this row into a hash, keyed by the header fields
my %row; @row{@header} = split;
#print this hash in the same order as the header.
#note - $_ is set to each element of @header in turn when doing this.
print "$_\t$row{$_}\n" for @header;
#insert extra linefeed between records
print "\n";
}
__DATA__
Year Tonn Class Cargo Type
88 61 T Rice Truck
89 55 G Corn Train
92 93 S Peas Ship
Note - you can condense that 'read headers' part further to:
chomp ( my @header = split ( ' ', <DATA> ) );
which does the same thing, but might be a bit harder to follow.
There's really no need to read all the lines into @temp before looping through to print them out. It would be more efficient to read just the first line to get the headings and then loop through the remaining lines, printing them immediately:
#!/usr/bin/perl
use strict;
use warnings;
my $line = <DATA>;
chomp($line);
my @head = split "\t", $line;
while ($line = <DATA>) {
chomp($line);
my @elements = split ("\t", $line);
foreach my $i (0..$#head) {
print $head[$i], "\t", $elements[$i], "\n";
}
print "\n";
}
__DATA__
Year Tonn Class Cargo Type
88 61 T Rice Truck
89 55 G Corn Train
92 93 S Peas Ship
The print line could also be written as:
print "$head[$i]\t$elements[$i]\n";
I just thought it was a little clearer to separate out all the parts.