An arithmetic calculation on each number in a text file with Perl

I am very new to programming. I need to read a file line by line in Perl. The text file has two columns and 100,000 rows, all containing numbers. I need to apply the formula (x/16)*100 to each number x, and the result should go to a separate file, again with two columns and 100,000 rows.
use strict;
use warnings;
my $filename = 'results_AH.txt';
open(my $fh, '<:encoding(UTF-8)', $filename)
or die "Could not open file '$filename' $!";
while (my $row = <$fh>) {
chomp $row;
print "$row\n";
}
print "done\n";
This is what I have so far. The file looks like this (just a part of it). The calculation is to be done on every number in both columns. Please help :)
AH LHH
5 0
4 0
3 0
5 0
5 0
4 0
3 0
4 0
4 0
4 0
5 0
5 0
3 0
4 0

Hard-coding a filename is almost always a bad idea. If you read from <> then you can pass any filename on the command line. Also, it's more Perlish to read data into $_.
while (<>) {
# do stuff with $_
}
So what do we want to do? Well first let's split the data into individual columns and store them in an array.
my @numbers = split;
Notice that split() works on $_ and splits on whitespace by default.
Now we need to do your calculation. We can do it on all elements of @numbers using map().
my @new_numbers = map { $_ * 100 / 16 } @numbers;
And finally we want to print our results. That's as simple as:
print "@new_numbers\n";
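Putting those pieces together, a minimal sketch might look like the following. It assumes that a row without any digit is the "AH LHH" header from the question and copies it through unchanged; the helper name scale_line and the inlined sample rows are only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: apply (x / 16) * 100 to every
# whitespace-separated number on one line.
sub scale_line {
    my ($line) = @_;
    return join ' ', map { $_ * 100 / 16 } split ' ', $line;
}

# A few sample rows from the question, including the header.
my @rows = ('AH LHH', '5 0', '4 0', '3 0');

for my $row (@rows) {
    # Assumption: a row without any digit is the header; copy it as-is.
    if ($row =~ /\d/) {
        print scale_line($row), "\n";
    }
    else {
        print "$row\n";
    }
}
```

To process the real file, the @rows loop could be replaced by while (my $row = <>) { chomp $row; ... }, with the output redirected to a new file on the command line.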

Related

How can I read many columns of data from a file and store them in an array correctly in Perl

Right now I am trying to compare data that belongs to two different input files.
The first input file looks like this. It has two rows and a lot of columns.
id date1 time1 date2 time2 ne CC0 CC1 CC2 CC3 CC4 ... (continuing up to CC127)
1 2016-09-26 14:13:56 2016-09-26 14:08:56 S1 7 1226 0 86 0
2 2016-09-26 14:13:56 2016-09-26 14:08:56 S2 8 1346 2 97 12
Second input file looks like this:
ne type time threshold
S1 CC000 09 50
S1 CC000 10 50
S1 CC000 11 50
S1 CC000 12 50
S1 CC000 13 50
S1 CC000 14 50
My main aim is to read those two files and store the necessary information and data in arrays. If the time (as an hour) and the ne match, then I want to compare the data with its threshold value. If the data is bigger than the threshold, I want to keep it and write it to another file as a result. For example, for ne S1 and hour 14, the CC0 data is equal to 7 and the threshold is equal to 50.
So far I have written this code (the latest edit, with help from Chris):
#! /usr/bin/perl -w
# compiler pragmas
use strict;
use warnings;
# file locations
my $input_file="C:/Perl64/output/innput.txt";
my $s1_threshold="C:/Perl64/output/s1_threshold.txt";
# commands
my $date; my $time; my $ne; my @hour; my @cc;
my $i=0; my $j=0;
open INPUT, "< $input_file" or die "$0: open of $input_file failed, error: $! \n";
while ( defined ($_=<INPUT>) )
{
my $line1 = $_;
my ( undef, $date, $time, undef, undef, $ne, @cc) = split (' ',$line1);
#print("$cc[16]\n");
my @time1= split(':',$time);
@hour=split(',',$time1[0]);
#print("@hour\n");
open THR, "< $s1_threshold" or die "$0: open of $s1_threshold failed, error: $! \n";
while (defined($_=<THR>) )
{
my $line2=$_;
my ($ne1, $cc_type, $time1, $threshold ) =split(' ',$line2);
if( $hour[0] == $time1 && $ne eq $ne1 )
{
for ( $i=0;$i<128;$i++)
{
if ( $cc[$i] > $threshold )
{
# print("$cc[$i]\n");
}
}
}
}
}
Now I can read all the data correctly in a simple way, but when it comes to the final if statement, I mean this one,
if ( $cc[$i] > $threshold )
the cc array values are compared with all of the threshold values, not just the value for the related cc_type and hour.
The second input file contains threshold values per cc_type. For each cc_type there are 23 different values, one per hour, so I want to compare only for that specific hour and cc_type. How can I fix that?
(Once I figure out this first part, I will repeat the same procedure with another threshold file for S2.)
I am a newbie in Perl, so any kind answer is appreciated.
Thanks in advance.
Regards.
UPDATE Changed the compare line to if ( $hour1 == $hour2 && $ne1 eq $ne2 ) and moved my $i = ... inside the if statement.
If I understand the 'type' variable correctly, (CC000 => 000), then the changed code here might do what you need.
Instead of using substr to get the data, I split the fields into the variables.
In the first file, the last receiver, @cc, gets all the remaining columns in the input line (you stated there is only one line of data in the first file).
If there is only 1 line, there is no need for a while loop to read the data. Simply, note how I read the 1 line into the variables, (split ' ', <$fh>).
Since you don't seem to need date1 and time1, I assigned them to undef. (undef here is just a placeholder for values you don't want to capture. I could have also used undef for the first field, but I assigned it to $id which you aren't using anyway).
Also, I used lexical filehandles ($fh, $fh2) instead of INPUT and THR, because it is the preferred practice: a lexical handle is scoped like any other my variable and is closed automatically when it goes out of scope, whereas a bareword handle such as INPUT is a package global. I think this form was adopted in perl v5.6.
I also used the 3-argument form of open (filehandle, mode, file). You used the 2-argument form; the 3-argument form was introduced in perl v5.6 and is safer, because the mode can never be taken from the filename.
#!/usr/bin/perl
use strict;
use warnings;
my $input_file = 'file1';
my $s1_threshold="file2";
open my $fh, '<', $input_file
or die "$0: open of $input_file failed, error: $! \n";
my ($id, $date, $time1, undef, undef, $ne1, @cc) = split ' ', <$fh>;
close $fh or die "$0: close of $input_file failed, error: $! \n";
# get hour from time1
my $hour1 = substr $time1, 0, 2;
open my $fh2, '<', $s1_threshold
or die "$0: open of $s1_threshold failed, error: $! \n";
while (<$fh2>) {
my ($ne2, $cc, $hour2, $threshold) = split;
if ( $hour1 == $hour2 && $ne1 eq $ne2 ) {
my $i = 0 + substr $cc, 2;
if ( $cc[$i] > $threshold )
{
print("$cc[$i]\n");
print ("match\n");
}
else
{
print("not match\n");
}
}
}
close $fh2 or die "$0: close of $s1_threshold failed, error: $! \n";
You are trying to use all of the values as one single integer, which will never work. You need to get those values one by one. Also, the way you are parsing the lines is asking for trouble. You are better off with something like this:
my ($id, $date1, $time1, $date2, $time2, $ne, $cc0, $cc1, $cc2, $cc3, $cc4) = split /\s+/, $sonuc;
Now you can use $cc0, $cc1, ... individually as integers.
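One way to compare each CC value only against the threshold for its own type and hour is to load the threshold file into a hash first. The following is a minimal sketch, not a tested solution for the real files: it assumes that type CC000 refers to column CC0 (CC001 to CC1, and so on), the data is inlined instead of read from disk, the CC001 threshold row is made up so that the example produces a hit, and the names %threshold_for and over_threshold are hypothetical.

```perl
use strict;
use warnings;

# Threshold rows (ne, type, hour, threshold). The first two come from
# the question; the CC001 row is made up for illustration.
my @threshold_rows = (
    'S1 CC000 13 50',
    'S1 CC000 14 50',
    'S1 CC001 14 50',
);

# Build a lookup: $threshold_for{ne}{hour}{column index} = threshold.
my %threshold_for;
for my $row (@threshold_rows) {
    my ($ne, $type, $hour, $threshold) = split ' ', $row;
    # Assumption: "CC000" refers to column CC0, "CC001" to CC1, and so on.
    my $index = 0 + substr $type, 2;
    $threshold_for{$ne}{ 0 + $hour }{$index} = $threshold;
}

# Hypothetical helper: return [index, value] pairs that exceed the
# threshold registered for the line's ne and hour.
sub over_threshold {
    my ($line) = @_;
    my (undef, undef, $time, undef, undef, $ne, @cc) = split ' ', $line;
    my $hour  = 0 + substr $time, 0, 2;
    my $limit = $threshold_for{$ne}{$hour} or return;
    return map  { [ $_, $cc[$_] ] }
           grep { exists $limit->{$_} && $cc[$_] > $limit->{$_} }
           0 .. $#cc;
}

my $line = '1 2016-09-26 14:13:56 2016-09-26 14:08:56 S1 7 1226 0 86 0';
print "CC$_->[0] = $_->[1]\n" for over_threshold($line);
```

With this layout each data line costs one hash lookup per column instead of a full pass over the threshold file, and a second threshold file for S2 could be loaded into the same hash.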

Issue with nested loop

I have a file called numbers.txt which is basically a line with 5 numbers.
They look like this:
1 2 3 4 5
What I'm trying to achieve is this: read those numbers from the line (which already works), then in each iteration add 1 to every number that was read from the file and print them on screen, so the final result should look like:
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5 6 7 8 9
.
#!/usr/bin/perl
use strict;
use warnings;
open("handle", 'numbers.txt') or die('unable to open numbers file\n');
$/ = ' ';
OUT: for my $line (<handle>) {
for (my $a = 0; $a < 5; $a++) {
chomp $line;
$line += 1;
print "$line ";
next OUT;
}
}
close("handle");
Haven't done looping in perl for a while now and would be great if someone could provide working example.
Also, it would be great if you could provide more than one working example, just to be future proof ;)
Thanks
You can try this on for size.
#!/usr/bin/perl
use strict;
use warnings;
open("handle", 'numbers.txt') or die('unable to open numbers file\n');
for my $line (<handle>) {
chomp $line;
for my $number (split /\s+/, $line) {
for (my $a = $number; $a < $number+5; $a++) {
print "$a ";
}
print "\n";
}
}
close("handle");
You can dispense with $/=' ' and instead let the outer loop iterate on lines of the file.
For each line you want to iterate for each number which is separated by white space, thus the split /\s+/, $line which gives you a list of numbers for the inner loop.
For your output $a starts at the number read from the file.
This will do what you're after:
use strict;
use warnings;
while(<DATA>) {
chomp;
print "$_\n";
my @split = split;
my $count = 0;
for (1..4){
$count++;
foreach (@split){
my $num = $_ + $count;
print "$num ";
}
print "\n";
}
}
__DATA__
1 2 3 4 5
There is no need for an explicit nested loop here; map can apply the increment to every element of the array.
#!/usr/bin/perl
use strict;
use warnings;
my @num = split(" ",(<DATA>)[0]);
foreach my $inc (0..$#num)
{
print map{$inc+$_," "}@num; # Add one by one in array element
print "\n";
}
__DATA__
1 2 3 4 5
Update Added another method, this one in line with the posted approach.
Increment each number in the string, changing the string in place. Repeat that. Below are two ways to do that. Yet another method reads individual numbers and prints following integer sequences.
(1) With regular expressions. It also fits in a one-liner
echo "1 2 3 4 5" | perl -e '$v = <>; for (1..5) { print $v; $v =~ s/(\d+)/$1+1/eg; }'
This prints the desired output, but it is better to put it in a script
use warnings;
use strict;
my $file = 'numbers.txt';
open my $fh, '<', $file or die "can't open $file: $!";
while (my $line = <$fh>) {
# Add chomp($line) if needed for some other processing.
for (1..5) {
print $line;
$line =~ s/(\d+)/$1+1/eg;
}
}
The /e modifier is crucial for this. It makes the replacement side of the regex be evaluated as code instead of as a double-quoted string. So you can actually execute code there and here we add to the captured number, $1+1, for each matched number as /g moves down the string. This changes the string so the next iteration of the for (1..5) increments those, etc. I match multiple digits, \d+, which isn't necessary in your example but makes far more sense in general.
(2) Via split + map + join, also repeatedly changing the line in place
while (my $line = <$fh>) {
for (1..5) {
print $line;
$line = join(' ', map { $_+1 } split '\s+', $line) . "\n"; # split drops the newline, so put it back
}
}
The split gets the list of numbers from $line and feeds it to map, which increments each, feeding its output list to join. The joined string is assigned back to $line, and this is repeated. I split by \s+ to allow multiple white space but this makes it very 'relaxed' in what input format it accepts, see perlrecharclass. If you know it's one space please change that to / / (a literal ' ' pattern is a special case in split that matches any run of whitespace).
(3) Take a number at a time and print the integer sequence starting from it.
open my $fh, '<', $file or die "can't open $file: $!";
local $/ = ' ';
while (my $num = <$fh>) {
print "$_ " for $num..$num+4;
print "\n";
}
The magic number 4 can be computed by pre-processing the whole line to find the sequence length, say by
my $len = () = $line =~ /(\d+)/g;
or by split-ing into an array and taking its scalar, then using $len-1.
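Both ways of getting the length can be tried on their own. This is a small sketch with the input line inlined; the variable names are only for illustration.

```perl
use strict;
use warnings;

my $line = "1 2 3 4 5\n";

# Assigning the match list to an empty list, and that assignment to a
# scalar, yields the number of elements the match returned.
my $len = () = $line =~ /(\d+)/g;

# The same count via split into an array, read in scalar context.
my @numbers = split ' ', $line;
my $len_from_split = scalar @numbers;

print "$len $len_from_split\n";   # prints "5 5"

# Use the count instead of the hard-coded 4 when printing the sequences.
for my $num (@numbers) {
    print join(' ', $num .. $num + $len - 1), "\n";
}
```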
Additional comments.
I recommend the three-argument open, open my $fh, '<', $file
When you check a call, print the error, die "Your message: $!", to see the reason for failure. If you decide to quit yourself, if ($bad) { die "Got $bad" }, then you may not need $!. But when an external call fails you don't know the reason, so you need the suitable error variable, most often $!.
Your program has a number of problems. Here is what's stopping it from working
You are setting the record separator to a single space. Your input file contains "1 2 3 4 5\n", so the outer loop will iterate five times, setting $line to "1 ", "2 ", "3 ", "4 ", "5\n"
Your inner for loop is set up to iterate five times. It does chomp $line, which removes the space after the number, then increments $line and prints it. Then you jump out of the for loop, having executed it only once, with next OUT. This results in each value in the file being incremented by one and printed, so you get 2 3 4 5 6
Removing the unnecessary next OUT produces something closer
2 3 4 5 6 3 4 5 6 7 4 5 6 7 8 5 6 7 8 9 6 7 8 9 10
There are now five numbers being printed for each number in the input file
Adding print "\n" after the for loop helps separate the lines
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5 6 7 8 9
6 7 8 9 10
Now we need to print the number before it is incremented instead of afterwards. If we swap $line += 1 and print "$line " we get this
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5
6 7 8 9
What is happening here is that the 5 is still followed by a newline, which now appears in the output. The chomp won't remove this because it removes the value of $/ from the end of a string. You've set that to a space, so it will remove only a trailing space. The fix is to replace chomp with a substitution s/\s+//g which removes all whitespace from the string. You also need to do that only once so I've put it outside the for loop at the top
Now we get this
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
5 6 7 8 9
And this is your code as it ended up
use strict;
use warnings;
open( "handle", 'numbers.txt' ) or die('unable to open numbers file\n');
$/ = ' ';
for my $line (<handle>) {
$line =~ s/\s+//g;
for ( my $a = 0; $a < 5; $a++ ) {
print "$line ";
$line += 1;
}
print "\n";
}
close("handle");
There are a few other best practices that could improve your program
Use use warnings 'all'
Use lexical file handles, and the three-parameter form of open
Use local if you are changing Perl's built-in variables
Put $! into your die string so that you know why the open failed
Avoid the C-style for loop, and iterate over a list instead
Making these fixes as well looks like this. The output is identical to the above
use strict;
use warnings 'all';
open my $fh, '<', 'numbers.txt'
or die qq{Unable to open "numbers.txt" for input: $!};
local $/ = ' ';
for my $line ( <$fh> ) {
$line =~ s/\s+//g;
for my $a ( 0 .. 4 ) {
print "$line ";
++$line;
}
print "\n";
}
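The chomp behaviour described in this answer can be checked in isolation. This is a small sketch with made-up variable names; it only demonstrates how chomp depends on $/ and is not part of the fixed program.

```perl
use strict;
use warnings;

my ($with_space, $with_newline) = ("4 ", "5\n");

{
    # With the record separator set to a space, chomp strips only a
    # trailing space and leaves a trailing newline alone.
    local $/ = ' ';
    chomp $with_space;
    chomp $with_newline;
}

print length($with_space), " ", length($with_newline), "\n";   # prints "1 2"

# A substitution removes whitespace regardless of the value of $/.
(my $clean = "5\n") =~ s/\s+//g;
print "[$clean]\n";   # prints "[5]"
```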

Push, big file: correction and improvement

dummy.pepmasses
YCL049C 1 511.2465 0 0 MFSK
YCL049C 2 4422.3098 0 0 YLVTASSLFVA
YCL049C 3 1131.5600 0 0 DFYQVSFVK
YCL049C 4 1911.0213 0 0 SIAPAIVNSSVIFHDVSR
YCL049C 5 774.4059 0 0 GVAMGNVK
YCL049C 6 261.1437 0 0 SR
my $dummyfile = "dummy.pepmasses"; #filename defined here
my @mzco = ();
open (IFILE, $dummyfile) or die "unable to open file $dummyfile\n ";
while (my $line = $dummyfile){
#read each line in file
chomp $line;
my $mz_value = (split/\s+/,$line)[3]; #pick column 3rd at every line
$mz_value = join "\n"; # add "\n" for data
push (@mzco,$mz_value); #add them all in one array @mzco
}
print "@mzco";
close IFILE;
There should be a better way to write this. What would it look like?
I want to pick out the third column and push it into an array. Are there better methods?
I'll just go through your code and comment
open (IFILE, $dummyfile) or die "unable to open file $dummyfile\n ";
You should use 3-argument open with explicit mode, and a lexical file handle. Also, you should not include newline in the die message unless you want to suppress line number. You should also include the error, $!.
open my $fh, "<", $dummyfile or die "Unable to open $dummyfile: $!";
while (my $line = $dummyfile){
#read each line in file
No, this just copies the file name. To read from the file handle, do this:
while (my $line = <IFILE>) {
Or <$fh> if you use a lexical file handle.
chomp $line;
my $mz_value = (split/\s+/,$line)[3]; #pick column 3rd at every line
This is actually the 4th column, since indexes start at zero.
$mz_value = join "\n"; # add "\n" for data
join does not work that way. It is join EXPR, LIST to join a list of values into a string. You want the concatenation operator .:
$mz_value = $mz_value . "\n";
Or more appropriately:
$mz_value .= "\n";
But why do it that way? It is simpler to just add the newline when you print.
print "@mzco";
You can do this:
print "$_\n" for @mzco;
Or if you are feeling daring:
use feature 'say';
say for @mzco;
And just to show you the power of Perl, this program can be reduced to a one-liner, using a lot of built-in features:
perl -lane ' print $F[3] ' dummy.pepmasses
-l chomp lines, add newline (by default) to print
-n put while (<>) loop around code: read input file or stdin
-a autosplit each line into @F.
The program as a file would look like this:
$\ = $/; # set output record separator to input record separator
while (<>) {
chomp;
my @F = split;
print $F[3];
}
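If the values are needed in an array rather than printed straight away, the same split can feed a map. This is a minimal sketch with two sample rows inlined; the helper name column is hypothetical, and index 2 is used on the assumption that the masses (the third column) are what is wanted. Use 3 for the fourth column, as in the original code.

```perl
use strict;
use warnings;

# Two sample rows from dummy.pepmasses, inlined for the example.
my @lines = (
    "YCL049C 1 511.2465 0 0 MFSK\n",
    "YCL049C 2 4422.3098 0 0 YLVTASSLFVA\n",
);

# Hypothetical helper: return one zero-based column from a list of lines.
sub column {
    my ($index, @rows) = @_;
    return map { (split ' ', $_)[$index] } @rows;
}

my @mzco = column(2, @lines);   # index 2 is the third column
print "$_\n" for @mzco;
```

With a real file, @lines could be filled from a lexical filehandle, for example my @lines = <$fh>;.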

Finding The number of Divisors in a given number?

I have created a Perl program to count the divisors of each number from 3 to 10.
Example: the number 6 has 4 divisors 1, 2, 3 and 6.
This is how the program is supposed to work:
The program will calculate the number of divisors of 3 and print it to the report.txt file. Next, it will calculate the number of divisors of 4 and print it to report.txt. The program will do this until it has reached the number 10, then it will exit.
#!/usr/bin/perl
use warnings;
use strict;
my $num = 2; # The number that will be calculated
my $count = 1; # Counts the number of divisors
my $divisors; # The number of divisors
my $filename = 'report.txt';
open(my $fh, '>', $filename) or die "Could not open file '$filename' $!"; # open file "report.txt"
for (my $i=2; $i <= 10; $i++) {
while( $num % $i == 0) { # Checks if the number has a remainder.
$num++; # Adds 1 to $num so it will calculate the next number.
$count++; # counts the number of divisible numbers.
$num /= $i; # $num = $num / $i.
}
$divisors = $count; # The number of divisors are equal to $count.
print $fh "$divisors\n"; # The output will be repeated..
}
close $fh # Closes the file "report.txt"
I think the problem is that the for-loop keeps repeating this code:
print $fh "$divisors\n";
The output is:
2
2
2
2
2
2
2
2
2
but, I'm not sure exactly what I am missing.
Give your variables meaningful names. This helps in both making your code self-documenting, but also in that it helps you recognize when you're using a variable incorrectly. The variable name $i doesn't communicate anything, but $divisor says that you are testing if that number is a divisor.
As for why your code is looping, I can't say. Here is a rewritten version of your code that does work, though:
#!/usr/bin/perl
use warnings;
use strict;
use autodie;
for my $num (2..10) {
my $divisor_count = 0;
for my $divisor (1..$num) {
$divisor_count++ if $num % $divisor == 0;
}
print "$num - $divisor_count\n"
}
Output:
2 - 2
3 - 2
4 - 3
5 - 2
6 - 4
7 - 2
8 - 4
9 - 3
10 - 4
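For larger numbers the inner loop does not have to run all the way up to $num. The sketch below swaps in a different technique, trial division up to the square root, and is not the code from this answer; the function name count_divisors is made up for the example.

```perl
use strict;
use warnings;

# Count divisors by trial division up to the square root: every
# divisor $d below the root pairs with $n / $d above it.
sub count_divisors {
    my ($n) = @_;
    my $count = 0;
    for (my $d = 1; $d * $d <= $n; $d++) {
        next if $n % $d;                      # $d does not divide $n
        $count += ($d * $d == $n) ? 1 : 2;    # a square root counts once
    }
    return $count;
}

print "$_ - ", count_divisors($_), "\n" for 3 .. 10;
```

To get the report.txt file from the question, the print could go to a filehandle opened with open(my $fh, '>', $filename), as in the original program.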

Compare and insert count number of elements in columns into table in perl

I'm working with two large data sets (300 x 500,000). Both contain a matrix of 0, 1, 2 and NA values. I would like to compare these files, count the values that match in both files for every row, and write the results to an output table.
File 1
2 1 0
0 1 1
1 0 NA
File 2
2 1 0
Na 1 1
1 NA 0
How can I compare count of match values in every row and the total sum?
I've interpreted what you mean by "total", and the count of matching lines is just dumped, but this does what you asked for and you should be able to adapt it to your exact spec
#!/usr/bin/perl
#
use Data::Dumper;
use strict;
use warnings;
# open files with error checking
open(my $f1,"file1") || die "$! file1";
open(my $f2,"file2") || die "$! file2";
#hash to store count of similar rows in
my %match_count=();
#total sum
my $total=0;
#read line from each file, lower case it to ignore Na NA difference and
#chomp to remove \n so this isn't stored
while(defined(my $l1=<$f1>)) {
my $l2 = <$f2>;
last unless defined $l2; # stop if file2 is shorter than file1
chomp($l1);
chomp($l2);
($l1, $l2) = (lc $l1, lc $l2);
#see if lines are the same
if ($l1 eq $l2) {
#increment counter for this line
$match_count{$l1}++;
#find sum of row and add to total
my ($first,$second,$third) = split(/\s/,$l1);
$total += $first+$second+$third;
}
}
print "sum total of matches = $total\n";
print Dumper(\%match_count);
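If "matching values in every row" means comparing the rows position by position rather than as whole lines, something like the following could be a starting point. This is a sketch under assumptions: "NA" and "Na" are treated as the same value, two NA entries in the same position count as a match, the sample rows are inlined, and the helper name count_matches is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: count positions where two rows hold the same
# value, ignoring case so that "NA" and "Na" compare equal.
sub count_matches {
    my ($row1, $row2) = @_;
    my @left  = split ' ', lc $row1;
    my @right = split ' ', lc $row2;
    my $last  = $#left < $#right ? $#left : $#right;
    return scalar grep { $left[$_] eq $right[$_] } 0 .. $last;
}

# The sample rows from the question, inlined for the example.
my @file1 = ('2 1 0', '0 1 1', '1 0 NA');
my @file2 = ('2 1 0', 'Na 1 1', '1 NA 0');

my $total = 0;
for my $i (0 .. $#file1) {
    my $matches = count_matches($file1[$i], $file2[$i]);
    $total += $matches;
    print "row ", $i + 1, ": $matches\n";
}
print "total: $total\n";
```

For files with 500,000 rows it would be better to read one line from each filehandle per iteration, as the answer's while loop does, than to load both files into arrays.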