File parsing using perl - perl

I am stuck in the middle of this and need help.
I have two files:
file1:
Total X :
Total y :
Total z :
Total t :
file 2:
4790351 4786929 3422 0
84860 84860 0 0
206626 206626 0 0
93902 93823 79 0
Now I want output like this in a third file:
Total X : 4790351 4786929 3422 0
Total y : 84860 84860 0 0
Total z : 206626 206626 0 0
Total t : 93902 93823 79 0
This is the code I tried for the parsing. Please help me get the required output:
while ( not eof $tata and not eof $outfh )
{
my @vals1 = split /":"/,<$tata>;
my @vals2 = split /\s+/, <$outfh>;
my @sum = join "\t", map { $vals1,$vals2[$_]} 0 .. $#vals2;
printf $_ for @sum,"\n";
}

use strict;
use warnings;
use 5.020;
use autodie;
use Data::Dumper;
open my $FILE1, "<", "file1.txt";
open my $FILE2, "<", "file2.txt";
open my $OUTFILE, ">", "results.txt";
my $first_line = <$FILE1>;
close $FILE1;
my @line_prefixes = split /\s*:\s*/, $first_line;
while (my $line = <$FILE2>) {
    print {$OUTFILE} "$line_prefixes[$. - 1]: $line";
}
close $FILE2;
close $OUTFILE;
$. is the current line number in the file ($. equals 1 for the first line).
A sample run:
~/pperl_programs$ cat file1.txt
Total X : Total y : Total z : Total t :
~/pperl_programs$ cat file2.txt
4790351 4786929 3422 0
84860 84860 0 0
206626 206626 0 0
93902 93823 79 0
~/pperl_programs$ cat results.txt
~/pperl_programs$ perl myprog.pl
~/pperl_programs$ cat results.txt
Total X: 4790351 4786929 3422 0
Total y: 84860 84860 0 0
Total z: 206626 206626 0 0
Total t: 93902 93823 79 0
~/pperl_programs$
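A note on `$.` as used above: it tracks the line number of the filehandle that was read from most recently, so the lookup only lines up because the data file is the handle being read inside the loop. Here is a minimal sketch of the same pairing, assuming the labels and the data arrive as plain strings; the name label_lines and the 'No label' fallback are made up for illustration:

```perl
use strict;
use warnings;

# Pair each data line with the label at the same position.
# Both arguments are plain strings (hypothetical inputs, not real files).
sub label_lines {
    my ($labels_line, $data_text) = @_;
    chomp $labels_line;
    my @labels = grep { length } split /\s*:\s*/, $labels_line;

    # Read the data through an in-memory filehandle so that $. is available.
    open my $in, '<', \$data_text or die "Cannot open in-memory file: $!";
    my @out;
    while (my $line = <$in>) {
        chomp $line;
        # $. counts lines read from $in, the handle we last read from
        my $label = $labels[$. - 1] // 'No label';
        push @out, "$label: $line";
    }
    close $in;
    return @out;
}

print "$_\n" for label_lines("Total X : Total y :\n", "1 2\n3 4\n5 6\n");
```

The fallback is only there so that a data file longer than the label list does not trigger "uninitialized value" warnings.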
For your altered files:
use strict;
use warnings;
use 5.020;
use autodie;
use Data::Dumper;
open my $FILE1, "<", "file1.txt";
open my $FILE2, "<", "file2.txt";
open my $OUTFILE, ">", "results.txt";
chomp(my @line_prefixes = <$FILE1>);
close $FILE1;
while (my $line = <$FILE2>) {
    print {$OUTFILE} "$line_prefixes[$.-1] $line";
}
close $FILE2;
close $OUTFILE;
Sample output:
~/pperl_programs$ cat file1.txt
Total X :
Total y :
Total z :
Total t :
~/pperl_programs$ cat file2.txt
4790351 4786929 3422 0
84860 84860 0 0
206626 206626 0 0
93902 93823 79 0
~/pperl_programs$ cat results.txt
~/pperl_programs$ perl 1.pl
~/pperl_programs$ cat results.txt
Total X : 4790351 4786929 3422 0
Total y : 84860 84860 0 0
Total z : 206626 206626 0 0
Total t : 93902 93823 79 0
If your files are big, you probably don't want to read the whole first file into memory. If that's the case, you can read each file line by line:
use strict;
use warnings;
use 5.020;
use autodie;
use Data::Dumper;
open my $FILE1, "<", "file1.txt";
open my $FILE2, "<", "file2.txt";
open my $OUTFILE, ">", "results.txt";
while (!eof($FILE1) and !eof($FILE2) ) {
    my $line_prefix = <$FILE1>;
    chomp $line_prefix;
    my $numbers_line = <$FILE2>;
    chomp $numbers_line;
    my @numbers = split /\s+/, $numbers_line;
    my $fifth_column = $numbers[1] / $numbers[0];
    say {$OUTFILE} "$line_prefix $numbers_line $fifth_column";
}
close $FILE1;
close $FILE2;
close $OUTFILE;

Your specification has a few loose ends. For instance: what if there are more lines in file2 than there are total labels in file1? Do you want the spaces in the first input file ignored? Do you want the spaces in the output file exactly as shown? And do you really not want any totals calculated?
I've presumed "yes" to most of these questions. My solution is driven by the second data file, which means that if there are more total labels than there are lines of data, the extra labels are ignored. It also means that if there are more data lines in file2 than there are labels in file1, the program simply makes up the label: No Label?.
Finally, just in case you want to add these numbers up at some point, I've included, but commented out, the sum function from List::Util.
use v5.12;
use File::Slurp;
# use List::Util qw( sum );
my $file1 = "file1.txt";
my $file2 = "file2.txt";
my $file3 = "file3.txt";
open(my $outfh, '>', $file3) or die "$file3: $!";
my @vals1 = split /\s*:\s*\n/ , read_file($file1);
my @vals2 = read_file($file2);
while (my $line = shift @vals2) {
    chomp $line;
    # my $total = sum split(" ", $line);
    printf $outfh "%s : %s\n" , shift @vals1 // "No Label?" , $line ;
}
#
# $ cat file3.txt
Total X : 4790351 4786929 3422 0
Total y : 84860 84860 0 0
Total z : 206626 206626 0 0
Total t : 93902 93823 79 0
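If the totals are wanted after all, the commented-out sum can be switched on along these lines. This is only a sketch: the sub name labelled_line_with_sum and the "(sum ...)" output format are my own inventions, not part of the answer above.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Build "label : numbers (sum N)" for one label and one line of numbers.
sub labelled_line_with_sum {
    my ($label, $line) = @_;
    chomp $line;
    my @numbers = split ' ', $line;
    my $total = sum(@numbers) // 0;   # sum returns undef for an empty list
    return sprintf "%s : %s (sum %d)", $label, $line, $total;
}

print labelled_line_with_sum('Total X', "4790351 4786929 3422 0\n"), "\n";
```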

Related

Perl: read an array and calculate corresponding percentile

I am trying to write Perl code that reads a text file containing a series of numbers, calculates percentiles, and prints the numbers that correspond to them. I do not have access to other statistical modules, so I'd like to stick with pure Perl. Thanks in advance!
The input text file looks like:
197
98
251
82
51
272
154
167
38
280
157
212
188
88
40
229
228
125
292
235
67
70
127
26
279
.... (and so on)
The code I have is:
#!/usr/bin/perl
use strict;
use warnings;
my @data;
open (my $fh, "<", "testing2.txt")
or die "Cannot open: $!\n";
while (<$fh>){
push @data, $_;
}
close $fh;
my %count;
foreach my $datum (@data) {
++$count{$datum};
}
my %percentile;
my $total = 0;
foreach my $datum (sort { $a <=> $b } keys %count) {
$total += $count{$datum};
$percentile{$datum} = $total / @data;
# percentile subject to change
if ($percentile{$datum} <= 0.10) {
print "$datum : $percentile{$datum}\n\n";
}
}
My desired output:
2 : 0.01
3 : 0.01333
4 : 0.01666
6 : 0.02
8 : 0.03
10 : 0.037
12 : 0.04
14 : 0.05
15 : 0.05333
16 : 0.06
18 : 0.06333
21 : 0.07333
22 : 0.08
25 : 0.09
26 : 0.09666
Where the format is #number from the list : #corresponding percentile
To store the number without a newline in @data, just add chomp; before pushing it, or chomp @data; after you've read them all.
If your input file has MSWin-style newlines, convert it to *nix style using dos2unix or fromdos.
Also, try to learn how to indent your code; it boosts readability. And consider renaming $total to $running_total, as you use the value as it changes.
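Putting those suggestions together, a rough sketch could look like the following. The sub name cumulative_fractions and the sample numbers are invented for illustration, and trailing whitespace is stripped with a regex instead of chomp so that Windows-style line endings are handled too (my assumption about what is convenient, not something the question asked for).

```perl
use strict;
use warnings;

# Return [value, cumulative fraction] pairs for every value whose
# cumulative fraction is at or below $cutoff.
sub cumulative_fractions {
    my ($cutoff, @data) = @_;
    s/\s+\z// for @data;               # strip "\n" or "\r\n" from each item
    @data = grep { length } @data;     # ignore blank lines

    my %count;
    $count{$_}++ for @data;

    my @rows;
    my $running_total = 0;
    for my $value (sort { $a <=> $b } keys %count) {
        $running_total += $count{$value};
        my $fraction = $running_total / @data;
        last if $fraction > $cutoff;   # fractions only grow, so stop here
        push @rows, [$value, $fraction];
    }
    return @rows;
}

my @sample = map { "$_\n" } (5, 1, 3, 1, 9, 7, 3, 8, 2, 6);
printf "%s : %.5f\n", @$_ for cumulative_fractions(0.30, @sample);
```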

I want output with 85 characters on each line. How do I have to use print for this?

I used the following command to get a specific format, and its output is all on one line:
MASH P 0 3.64 NAMD P 0 3.79 AGHA P 0 4.50 SARG P 0 4.71 BENG P 0 5.47 BANR P 0 6.75 ABZA P 0 6.25 KALI P 0 6.91
I want output with 85 characters on each line. Could someone explain how I have to use print for this?
You can use a regular expression with a quantifier:
$_ = 'MASH P 0 3.64 NAMD P 0 3.79 AGHA P 0 4.50 SARG P 0 4.71 BENG P 0 5.47 BANR P 0 6.75 ABZA P 0 6.25 KALI P 0 6.91';
print $&, "\n" while /.{1,85}/g;
or, if it's a part of a larger program and you don't want to suffer the performance penalty, use ${^MATCH} instead of $&:
use Syntax::Construct qw{ /p };
print ${^MATCH}, "\n" while /.{1,85}/gp;
You can also use the four argument substr:
print substr($_, 0, 85, q()), "\n" while $_;
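One thing to watch with the substr loop: it empties $_ as it goes, and `while $_` stops early if whatever remains is the string "0". As a possible alternative that leaves the original string alone, here is a sketch using unpack; the helper name chunks is made up, and the width of 14 in the demo is only there to keep the example short.

```perl
use strict;
use warnings;

# Split $text into pieces of at most $width characters without modifying it.
# The "a" template keeps trailing spaces; the last piece may be shorter.
sub chunks {
    my ($text, $width) = @_;
    return unpack "(a$width)*", $text;
}

print "$_\n" for chunks('MASH P 0 3.64 NAMD P 0 3.79 AGHA P 0 4.50', 14);
```

For the question as asked, the call would be chunks($_, 85).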

merge multiple files with similar column

I have 30 files where column 1 is similar in each file. I would like to join the files based on column 1 so that the output file contains column 2 from each of the input files. I know how to join two files, but struggle with multiple files.
join -1 1 -2 1 File1 File2
The files are tab-separated with no header like this
File1
5S_rRNA 1324
5_8S_rRNA 32
7SK 15
ACA59 0
ACA64 0
BC040587 0
CDKN2B-AS 0
CDKN2B-AS_2 0
CDKN2B-AS_3 0
CLRN1-AS1 0
File2
5S_rRNA 571
5_8S_rRNA 11
7SK 5
ACA59 0
ACA64 0
BC040587 0
CDKN2B-AS 0
CDKN2B-AS_2 0
CDKN2B-AS_3 0
CLRN1-AS1 0
Output
5S_rRNA 1324 571
5_8S_rRNA 32 11
7SK 15 5
ACA59 0 0
ACA64 0 0
BC040587 0 0
CDKN2B-AS 0 0
CDKN2B-AS_2 0 0
CDKN2B-AS_3 0 0
CLRN1-AS1 0 0
First, memory becomes a problem as the file sizes increase. Second, this works well only if the order of the output does not matter.
#!/usr/bin/perl
use strict;
use warnings;
my %hash;
my ($key,$value);
my @files=<files/*>;
foreach(@files){
    open my $fh, '<', $_ or die "unable to open file: $! \n";
    while(<$fh>){
        chomp;
        ($key,$value)=split;
        push(@{$hash{$key}},$value);
    }
    close($fh);
}
for(keys %hash){
    print "$_ @{$hash{$_}} \n";
}
The code below gives your desired output, but it takes more memory as the number of files increases (you said there are 30 files). Using sort orders the hash keys alphabetically, which gives the output in the same order as shown in your question.
#!/usr/bin/perl
use strict;
use warnings;
my @files = qw| input.log input1.log |; # you can give the paths of your files here, or use @ARGV if you wish to pass files from the command line
my %data;
foreach my $filename (@files)
{
    open my $fh, '<', $filename or die "Cannot open $filename for reading: $!";
    while (my $line = <$fh>)
    {
        chomp $line;
        my ($col1, $col2) = split /\s+/, $line;
        push @{ $data{$col1} }, $col2; # create a hash of arrays
    }
}
foreach my $col1 (sort keys %data)
{
    print join("\t", $col1, @{ $data{$col1} }), "\n";
}
Output:
5S_rRNA 1324 571
5_8S_rRNA 32 11
7SK 15 5
ACA59 0 0
ACA64 0 0
BC040587 0 0
CDKN2B-AS 0 0
CDKN2B-AS_2 0 0
CDKN2B-AS_3 0 0
CLRN1-AS1 0 0
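If the rows should come out in the order they appear in the first file rather than sorted, one option is to remember the order in which keys are first seen. A sketch under some assumptions: each argument is the full text of one input file (in a real script you would read these from the files), and the name merge_columns is hypothetical. Note that a key missing from one of the files would leave that row a column short.

```perl
use strict;
use warnings;

# Merge column 2 of several "key value" tables, keeping the keys in the
# order they are first seen. Each argument is the text of one file.
sub merge_columns {
    my @tables = @_;
    my (%values_for, @key_order);
    for my $table (@tables) {
        for my $line (split /\n/, $table) {
            my ($key, $value) = split ' ', $line;
            next unless defined $value;          # skip blank or short lines
            push @key_order, $key unless exists $values_for{$key};
            push @{ $values_for{$key} }, $value;
        }
    }
    return map { join "\t", $_, @{ $values_for{$_} } } @key_order;
}

print "$_\n" for merge_columns("7SK\t15\nACA59\t0\n", "7SK\t5\nACA59\t0\n");
```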

I have a CSV file; how do I create a new file for each distinct first five digits of the phone number, with the file name also being those first 5 digits?

I have a CSV file like the one below.
Service Area Code Phone Numbers Preferences
17 9861511646 0 D 2
17 9861310397 0 D 2
13 9827035035 0 A 2
13 9827304969 0 D 2
13 9827355786 0 A 2
13 9827702373 0 A 2
17 9861424414 0 D 2
13 9827702806 0 A 2
23 9832380279 0 D 2
13 9827231370 0 D 2
13 9827163453 0 D 2
I want to create a new file for each distinct first 4 digits, like 9861.csv, 9827.csv, etc.
The data in 9861.csv should look like this:
Service Area Code Phone Numbers Preferences
17 9861511646 0 D 2
17 9861310397 0 D 2
17 9861424414 0 D 2
and in 9827.csv:
Service Area Code Phone Numbers Preferences
13 9827035035 0 A 2
13 9827304969 0 D 2
13 9827355786 0 A 2
13 9827702373 0 A 2
13 9827702806 0 A 2
13 9827231370 0 D 2
13 9827163453 0 D 2
Here is my code:
my $file = "mycsvfile.csv";
open(my $data, '<', $file) or die "Could not open '$file' $!\n";
while (my $line = <$data>) {
my @fields = split "," , $line;
my $first_four = substr ($fields[1], -10, 4,);
open $line{$first_four}, '>', "$first_four.csv";
print { $line{$first_four} } $line;
close OUT;
}
Use Text::CSV; it will take some of the hassle (like the header line) off your hands.
I don't understand why you use $first_five instead of $first_four...
my $file = "mycsvfile.csv";
open(my $data, '<', $file) or die "Could not open '$file' $!\n";
while (my $line = <$data>) {
    my @fields = split "," , $line;
    my $first_four = substr($fields[1], 0, 4);
    open my $fh, '>>', "$first_four.csv" or die $!;
    print {$fh} $line;
    close $fh;
}
close $data;
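The loop above reopens an output file for every input line and treats the header like any other row. Without Text::CSV, a possible variation is to keep one open handle per prefix and write the header once into each new file. This is a sketch only, assuming plain comma-separated fields with no quoting, a header on the first line, and the phone number in the second column; the sub name split_by_prefix and the $out_dir parameter are made up.

```perl
use strict;
use warnings;

# Split a comma-separated file into one file per 4-digit phone prefix.
# Assumes the first line is a header and the phone number is column 2.
# Returns the sorted list of prefixes that were written.
sub split_by_prefix {
    my ($input_path, $out_dir) = @_;
    open my $in, '<', $input_path or die "Could not open '$input_path': $!";
    my $header = <$in>;
    my %fh_for;                                   # prefix => open output handle
    while (my $line = <$in>) {
        my @fields = split /,/, $line;
        next unless defined $fields[1] && $fields[1] =~ /^\s*(\d{4})/;
        my $prefix = $1;
        unless ($fh_for{$prefix}) {
            open $fh_for{$prefix}, '>', "$out_dir/$prefix.csv"
                or die "Could not open '$out_dir/$prefix.csv': $!";
            print { $fh_for{$prefix} } $header;   # header once per file
        }
        print { $fh_for{$prefix} } $line;
    }
    close $_ for $in, values %fh_for;
    return sort keys %fh_for;
}

# Example call (not run here): my @prefixes = split_by_prefix('mycsvfile.csv', '.');
```

Opening with '>' means an existing 9861.csv is overwritten on each run, unlike the '>>' version, which keeps appending.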

Truth Table Generation for the given input

I want to generate a truth table for a given input. For example, if I give the input 2, the output will be
10 01 11 00
If the input is 3, the output will be
111 000 110 101 011 100 001 010
I have this code snippet:
#!/usr/bin/perl
#print "a|b|c\n";
for $a (1, 0){
for $b (1, 0){
for $c (1,0) {
for $d ( 1,0)
{
print "$a $b $c $d";
#print $x = ($a & $b & $c);
print "\n";
}
}
}
}
print "\n";
The code above is for 4.
I don't know how to do this without writing multiple for loops. Here, for the value 2 I need to write two for loops, and so on.
Can anybody tell me how to tweak this code to handle arbitrary input values?
Any help would be greatly appreciated.
Recursion
Here is a simple solution using recursion:
#!/usr/bin/perl -w
my $variables=$ARGV[0]||0;
show_combinations($variables);
sub show_combinations { my($n,@prefix)=@_;
    if($n > 0) {
        show_combinations( $n-1, @prefix, 0);
        show_combinations( $n-1, @prefix, 1);
    } else {
        print "@prefix\n";
    }
}
Here are some sample cases:
> script.pl 1
0
1
> script.pl 2
0 0
0 1
1 0
1 1
> script.pl 3
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
I'm no Perl expert, so you might need to clean this up, but if forced to use Perl I'd probably do something like this:
#!/usr/bin/perl
my ($n) = @ARGV;
printf("%0*b\n", $n, $_) for 0 .. (1 << $n) - 1;
This is simple one line Perl code using module Math::Cartesian::Product.
use Math::Cartesian::Product;
cartesian {print "@_\n"} ([0..1]) x $ARGV[0];
Output
./sample.pl 2
0 0
0 1
1 0
1 1
./sample.pl 3
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
I don't know Perl so this may not work:
-- loop from 0 to (2^n)-1, where n is the number of digits in your cases;
-- convert each number to its n-digit binary representation;
Here is a generalization of my previous solution using Math::BigInt. This is an iterative solution:
#!/usr/bin/perl
use strict;
use warnings;
use Math::BigInt try => 'GMP';
my $n_bits = $ARGV[0] || 0;
my $it = make_it($n_bits);
while ( defined(my $bits = $it->()) ) {
    print "$bits\n";
}
sub make_it {
    my ($n_bits) = @_;
    my $limit = Math::BigInt->new('2');
    $limit->blsft($n_bits - 1);
    my $next = Math::BigInt->new('-1');
    return sub {
        $next->binc;
        return unless $next->bcmp($limit) < 0;
        my $bits = $next->as_bin;
        $bits =~ s/^0b//;
        if ( (my $x = length $bits) < $n_bits ) {
            $bits = '0' x ($n_bits - $x) . $bits;
        }
        return $bits;
    }
}
You can use the %b format specifier for printf:
use strict;
use warnings;
my ($count) = @ARGV;
my $fmt = "%0${count}b";
my $n = 2**$count - 1;
for my $c (0 .. $n) {
    my @bits = split //, sprintf $fmt, $c;
    print "@bits\n";
}
This will only work for $count values less than 32.
Output:
C:\Temp> y 3
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
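If the size of the counter is what limits the printf approach, one way around it that needs no modules is to count on an array of bits, like an odometer, so that no single integer ever has to hold 2**n. This is a different technique from the %b one, shown only as a sketch; the name truth_rows is made up.

```perl
use strict;
use warnings;

# Return every row of an n-variable truth table as a string of 0s and 1s,
# incrementing an array of bits instead of an integer counter.
sub truth_rows {
    my ($n) = @_;
    my @bits = (0) x $n;
    my @rows;
    while (1) {
        push @rows, join '', @bits;
        # Add one: turn trailing 1s into 0s, then set the next 0 to 1.
        my $i = $n - 1;
        while ($i >= 0 && $bits[$i] == 1) {
            $bits[$i] = 0;
            $i--;
        }
        last if $i < 0;      # every bit was 1, so we have wrapped around
        $bits[$i] = 1;
    }
    return @rows;
}

print "$_\n" for truth_rows(2);
```

It still collects all 2**n rows in memory, so for large n you would print each row inside the loop instead of pushing it.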
I am surprised no one has mentioned glob as a solution here:
perl -e 'print join "\n", glob("{0,1}" x shift || 1 )' -- 3
This prints:
000
001
010
011
100
101
110
111
glob is very handy for computing string permutations.
Here is the above, in a cleaner, non-one-liner form:
use strict;
use warnings;
my $symbol_count = shift || 1;
my @permutations = glob( '{0,1}' x $symbol_count );
print join "\n", @permutations;