Trying to write a specific result to a new outfile - perl

I am extremely new to Perl. I am very much enjoying the learning curve and Perl, but I am frustrated beyond belief and have spent many, many hours on one task achieving little to no results.
#!/usr/bin/perl
use strict;
print "Average value of retroviruses for the length of each genome and each of the genes:\n"; #create a title for the script
my $infile = "Lab1_table.txt"; # This is the file path.
open INFILE, $infile or die "Can't open $infile: $!"; # Provides an error message if the file can't be found.
# set my initial values.
my $tally = 0;
my @header = ();
my @averages = ();
# create my first loop to run through the file by line.
while (my $line = <INFILE>){
chomp $line;
print "$line\n";
# add one to the loop and essentially remove the header line of value.
# the first line is what was preventing me from calculating averages as Perl can't calculate words.
my @row = split /\t/, $line; # split the file by tab characters.
$tally++; #adds one to the tally.
if ( $tally == 1 ) { #if the tally = 1 the row is determined as the header.
@header = @row;
}
# if the tally is anything else besides 1 then it will read those rows.
else {
for( my $i = 1; $i < scalar @row; $i++ ) {
$averages[$i] += $row[$i];
}
foreach my $element (@row){
}
foreach my $i (0..4){
$averages[$i] = $averages[$i] + $row[1..4];
}
}
}
print "Average values of genome, gag, pol and env:\n";
for( my $i = 1; $i < scalar @averages; $i++ ) { # this line is used to determine the averages of the columns and print the values
print $averages[$i]/($tally-1), "\n";
}
So, I got the results to come up with what I wanted (not in the exact format I wanted, but as close as I can seem to get at the moment), and they do average the columns.
The issue now is writing to an outfile. I am trying to get my table and results from the previous code to appear in my outfile. I get a good file name but no results.
foreach my $i (1){
my $outfile= "Average_values".".txt";
open OUTFILE, ">$outfile" or die "$outfile: $!";
print "Average values of genome, gag, pol and env:\n";
}
close OUTFILE;
close INFILE;
I feel like there is an easy way to do this and a hard way and I have taken the very hard way. Any help would be much appreciated.

You did not tell Perl where to print:
print OUTFILE "Average values of genome, gag, pol and env:\n";
BTW, together with use strict, also use warnings. And for working with files, use lexical filehandles and the three argument form of open:
open my $FH, '>', $filename or die $!;
print $FH 'Something';
close $FH or die $!;
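Applied to your script, the output section might look something like this (a minimal sketch, assuming @averages and $tally have already been filled in by your loop above):
my $outfile = "Average_values.txt";
open my $out, '>', $outfile or die "Can't open $outfile: $!";
print $out "Average values of genome, gag, pol and env:\n";
# index 0 held the row label, so start at 1
for my $i ( 1 .. $#averages ) {
    print $out $averages[$i] / ( $tally - 1 ), "\n";
}
close $out or die "Can't close $outfile: $!";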


Check how many "," in each line in Perl [duplicate]

I have to check how many times "," appears in each line of a file. Does anybody have an idea how I can do it in Perl?
At the moment my code looks like this:
open($list, "<", $student_list)
while ($linelist = <$list>)
{
printf("$linelist");
}
close($list)
But I have no idea how to check how many times "," appears in each $linelist :/
Use the transliteration operator in counting mode:
my $commas = $linelist =~ y/,//;
Edited into your code:
use warnings;
use strict;
open my $list, "<", "file.csv" or die $!;
while (my $linelist = <$list>)
{
my $commas = $linelist =~ y/,//;
print "$commas\n";
}
close($list);
If you just want to count the number of somethings in a file, you don't need to read it into memory. Since you aren't changing the file, mmap would be just fine:
use File::Map qw(map_file);
map_file my $map, $filename, '<';
my $count = $map =~ tr/,//;
#! perl
# perl script.pl [file path]
use strict;
use warnings;
my $file = shift or die "No file name provided";
open(my $IN, "<", $file) or die "Couldn't open file $file: $!";
my @matches = ();
my $index = 0;
# while <$IN> will get the file one line at a time rather than loading it all into memory
while(<$IN>){
my $line = $_;
my $current_count = 0;
# match globally, meaning keep track of where the last match was
$current_count++ while($line =~ m/,/g);
$matches[$index] = $current_count;
$index++;
}
$index = 0;
for(@matches){
$index++;
print "line $index had $_ matches\n"
}
You can use the mmap PerlIO layer instead of File::Map. It is almost as efficient as the former, and it is most probably present in your Perl installation without needing to install a module. Also, using y/// is more efficient than m//g in list context.
use strict;
use warnings;
use autodie;
use constant STUDENT_LIST => 'text.txt';
open my $list, '<:mmap', STUDENT_LIST;
while ( my $line = <$list> ) {
my $count = $line =~ y/,//;
print "There is $count commas at $.. line.\n";
}
If you would like grammatically correct output, you can use Lingua::EN::Inflect in the right place:
use Lingua::EN::Inflect qw(inflect);
print inflect "There PL_V(is,$count) $count PL_N(comma,$count) at ORD($.) line.\n";
Example output:
There are 7 commas at 1st line.
There are 0 commas at 2nd line.
There is 1 comma at 3rd line.
There are 2 commas at 4th line.
There are 7 commas at 5th line.
Do you want the number of commas for each line in the file, or for the entire file?
On a per-line basis, replace your while loop with:
my @data = <$list>;
foreach my $line (@data) {
my @chars = split //, $line;
my $count = 0;
foreach my $c (@chars) { $count++ if $c eq "," }
print "There were $count commas\n";
}

How to multiply the numbers in csv file per line and add together

I need a way to take the numbers in one line in my .csv file and multiply them together, and then add the products from each line together to get just one number. My .csv file looks something like:
1,1
2,3
3,4
I know the answer should be 19, but I'm not sure how exactly to program it in Perl. I have both numbers split into different variables by:
($x,$y) = split (/,/, $line)
I've already read the file in and all that, I just need help with this one part of my code.
If anyone could point me in the right direction I would really appreciate it.
A naive solution could look like this:
use strict;
use warnings FATAL => 'all';
my $total;
open(my $fh, '<', "temp.csv");
while( my $line = <$fh> ) {
my ($x, $y) = split(',', $line);
$total += ($x * $y);
}
print "Total is: $total\n";
In short form
perl -F, -anE'$s+=$F[0]*$F[1]}{say$s'
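For anyone new to one-liners, that is roughly equivalent to this small script (a sketch: -a autosplits each line on the -F separator into @F, -n wraps the body in an implicit while (<>) loop, and the unbalanced }{ closes that loop so the final say runs once after the last line):
use feature 'say';
my $s = 0;
while (<>) {
    my @F = split /,/, $_;
    $s += $F[0] * $F[1];
}
say $s;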
my $sum = 0;
open my $csv, '<', $filename or die $!;
while(my $line = <$csv>) {
my $prod = 1;
$prod *= $_ for split ',', $line;
$sum += $prod;
}

Nested foreach loop not working

It should be a simple nested foreach loop, but it's not working, and it's really starting to annoy me that I can't figure this out! Still a Perl beginner, but I thought I understood this by now. Can someone explain to me where I'm going wrong? The idea is simple: 2 files, 1 small, 1 large with info I want in the small one. Both have unique IDs in them. Compare and match the IDs and output a new small file with the added info.
I have 2 pieces of code: 1 without stricts and 1 with, and both are not working. I know to use stricts, but I'm still curious as to why the one without stricts isn't working either.
WITHOUT STRICTS:
if ($#ARGV != 2){
print "input_file1 input_file2 output_file\n";
exit;
}
$inputfile1=$ARGV[0];
$inputfile2=$ARGV[1];
$outputfile1=$ARGV[2];
open(INFILE1,$inputfile1) || die "No inputfile :$!\n";
open(INFILE2,$inputfile2) || die "No inputfile :$!\n";
open(OUTFILE_1,">$outputfile1") || die "No outputfile :$!\n";
$i = 0;
$j = 0;
@infile1=<INFILE1>;
@infile2=<INFILE2>;
foreach ( @infile1 ){
@elements = split(";",$infile1[$i]);
$id1 = $elements[3];
print "1. $id1\n";
$lat = $elements[5];
$lon = $elements[6];
$lat =~ s/,/./;
$lon =~ s/,/./;
print "2. $lat\n";
print "3. $lon\n";
foreach ( @infile2 ){
@loopelements = split(";",$infile2[$j]);
$id2 = $loopelements[4];
print "4. $id2\n";
if ($id1 == $id2){
print OUTFILE_1 "$loopelements[0];$loopelements[1];$loopelements[2];$loopelements[3];$loopelements[4];$lat,$lon\n";
};
$j = $j+1;
};
@elements = join(";",@elements); # add ';' to all elements
#print "$i\r";
$i = $i+1;
}
close(INFILE1);
close(INFILE2);
close(OUTFILE_1);
The error without strict is that the second loop will not start, if I'm not mistaken.
WITH STRICTS:
use strict;
use warnings;
my $inputfile1 = shift || die "Give input!\n";
my $inputfile2 = shift || die "Give more input!\n";
my $outputfile = shift || die "Give output!\n";
open my $INFILE1, '<', $inputfile1 or die "In use/Not found :$!\n";
open my $INFILE2, '<', $inputfile2 or die "In use/Not found :$!\n";
open my $OUTFILE, '>', $outputfile or die "In use/Not found :$!\n";
my $i = 0;
my $j = 0;
foreach ( my $infile1 = <$INFILE1> ){
my @elements = split(";",$infile1[$i]);
my $id1 = $elements[3];
print "1: $id1\n";
my $lat = $elements[5];
my $lon = $elements[6];
$lat =~ s/,/./;
$lon =~ s/,/./;
print "2: $lat\n";
print "3: $lon\n";
foreach ( my $infile2 = <$INFILE2> ){
my @loopelements = split(";",$infile2[$j]);
my $id2 = $loopelements[4];
print "4: $id2\n";
if ($id1 == $id2){
print $OUTFILE "$loopelements[0];$loopelements[1];$loopelements[2];$loopelements[3];$loopelements[4];$lat,$lon\n";
};
$j = $j+1;
};
#@elements = join(";",@elements); # add ';' to all elements
#print "$i\r";
$i = $i+1;
}
close($INFILE1);
close($INFILE2);
close($OUTFILE);
The error with stricts:
Global symbol "#infile1" requires explicit package name at Z:\Data-Content\Data\test\jan\bestemming_zonder_acco\add_latlon_dest_test.pl line 16.
Global symbol "#infile2" requires explicit package name at Z:\Data-Content\Data\test\jan\bestemming_zonder_acco\add_latlon_dest_test.pl line 31.
Your 'strict' implementation gives you errors due to confusion about the sigils (the $ and @ characters) indicating whether a variable is a scalar or an array. In the loop statement you are reading each line of the file into a scalar called $infile1, but in the following line you are trying to access an element of the array @infile1. These two variables are not related, and, as perl tells you, the latter is not declared.
Another problem with your 'strict' implementation is that you are reading the files inside the loops. This means that with nested loops you will read file 2 during the first iteration of the outer loop, and in all succeeding iterations the inner loop will not be able to read any lines.
I missed the foreach/while issue pointed out by stevenl: even after fixing the stricture issues, you will be left with foreach loops that only ever run one iteration.
I'm not sure what your problem with the non-strict script is.
But I wouldn't use a nested loop at all for processing two files. I would un-nest the loops, so that it looks roughly like this:
my %cord;
while ( my $line = <$INFILE1> ) {
my @elements = split /;/, $line;
$cord{ $elements[3] } = "$elements[5],$elements[6]";
}
while ( my $line = <$INFILE2> ) {
my @elements = split /;/, $line;
if ( exists $cord{ $elements[4] } ) {
print $OUTFILE "....;$cord{ $elements[4] }\n";
}
}
I can't see exactly where the problem with the non-strict version is. What is the problem that you are encountering?
The problem with the strict version is particularly in these 2 lines:
foreach ( my $infile1 = <$INFILE1> ){
my @elements = split(";",$infile1[$i]);
You have a scalar $infile1 in the first line, but you are treating it as an array in the next line. Also, change the foreach to a while (see below).
A few comments.
For the non-strict version, you could have collapsed the loop to a C-style for loop as:
for (my $i = 0; $i < @infile1; $i++) {
...
}
That can be made simpler to read if you go without the array indexes altogether:
foreach my $infile1 (@infile1) {
my #elements = split ';', $infile1;
...
}
But with the larger file, it might take time to slurp the entire file into the array at the beginning. So it might be better to iterate through the file as you go:
while (my $infile = <$INFILE1>) {
...
}
Note the last point should be how the strict version looks. You need a while loop rather than a foreach loop, because assigning <$INFILE1> to a scalar means it will return the next line only, which evaluates to true as long as there is another line in the file. (Thus, the foreach would only ever get the first line to loop over.)
You don't reset $j before the inner foreach loop runs. Therefore, the second time your inner loop runs, you are trying to access elements that are past the end of the array. This mistake exists in both the strict and non-strict version.
You should not be using $i and $j at all; the point of foreach is that it automatically gets each element for you. Here is an example of correctly using foreach in the inner loop:
foreach my $line ( @infile2 ){
@loopelements = split(";",$line);
#...now do stuff as before
}
This puts each element of @infile2 into the variable $line in succession, until you have gone through all of the array.

perl increasing the counter number every time the script running

I have a script to compare 2 files and print out the matching lines. What I want is to add logic that helps me identify for how long these devices have been matched. Currently I have added the starting point 1, and I want to increase that number every time the script runs and matches.
Example.
inputfile:-########################
retiredDevice.txt
Alpha
Beta
Gamma
Delta
prodDevice.txt
first
second
third
forth
Gamma
Delta
output file :-#######################
final_result.txt
1 Delta
1 Gamma
My objective is to add a counter stamp on each matching line to identify for how long "Delta" and "Gamma" have matched. The script runs every week, so every time it runs it should add 1, so that when I audit final_result.txt the result looks like:
Delta 4
Gamma 3
The result indicates that Delta has matched for the last 4 weeks and Gamma for the last 3 weeks.
#! /usr/local/bin/perl
my $ndays = 1;
my $f1 = "/opt/retiredDevice.txt ";
my $f2 = "prodDevice.txt";
my $outfile = "/opt/final_result.txt";
my %results = ();
open FILE1, "$f1" or die "Could not open file: $! \n";
while(my $line = <FILE1>){ $results{$line}=1;
}
close(FILE1);
open FILE2, "$f2" or die "Could not open file: $! \n";
while(my $line =<FILE2>) {
$results{$line}++;
}
close(FILE2);
open (OUTFILE, ">$outfile") or die "Cannot open $outfile for writing \n";
foreach my $line (keys %results) {
my $x = $ndays;
$x++;
print OUTFILE "$x : ", $line if $results{$line} != 1;
}
close OUTFILE;
Thanks in advance for any help!
Based on your earlier question and comments, perhaps this might work.
use strict;
use warnings;
use autodie;
my $logfile = 'int.txt';
my $f1 = shift || "/opt/test.txt";
my $f2 = shift || "/opt/test1.txt";
my %results;
open my $file1, '<', $f1;
while (my $line = <$file1>) {
chomp $line;
$results{$line} = 1;
}
open my $file2, '<', $f2;
while (my $line = <$file2>) {
chomp $line;
$results{$line}++;
}
{ ############ added part
my %c;
for (keys %results) {
$c{$_} = $results{$_} if $results{$_} > 1;
}
%results = %c;
} ############ end added part
my (%log, $log);
if ( -e $logfile ) {
open $log, '<', $logfile;
while (<$log>) {
my ($num, $key) = split;
$log{$key} = $num;
}
}
open $log, '>', $logfile or die $!;
for my $key (keys %results) {
my $old = ( $log{$key} || 0 ); # keep old count, or 0 otherwise
my $new = ( $results{$key} ? 1 : 0 ); # 1 if it exists, 0 otherwise
print $log $old + $new, " $key\n";
}
Perform this computation in two steps.
Each time you run the comparison between retired and prod, produce an output file that you save with a unique file name, e.g. result-XXX where XXX denotes when you ran the comparison.
Then write a script which iterates over all of the result-XXX files and produces a summary.
I would name the files result-YYYY-MM-DD where YYYY-MM-DD is the date that the comparison was created. Then it will be relatively easy to iterate over a subset of the files (e.g. ones for a certain month).
Or store the data in a relational database.
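A minimal sketch of the summarising script described above, assuming each result-YYYY-MM-DD file simply lists one matched device name per line (the file names and glob pattern here are illustrative):
use strict;
use warnings;
my %weeks_matched;
# every result-YYYY-MM-DD file holds the devices that matched in that week's run
for my $result_file ( glob 'result-*' ) {
    open my $fh, '<', $result_file or die "Can't open $result_file: $!";
    while ( my $device = <$fh> ) {
        chomp $device;
        $weeks_matched{$device}++;
    }
    close $fh;
}
# e.g. "Delta 4" means Delta appeared in 4 weekly result files
for my $device ( sort keys %weeks_matched ) {
    print "$device $weeks_matched{$device}\n";
}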

help merging perl code routines together for file processing

I need some Perl help in putting these two processes/pieces of code to work together. I was able to get them working individually to test, but I need help bringing them together, especially with the loop constructs. I'm not sure if I should go with foreach... anyway, the code is below.
Also, any best practices would be great too as I'm learning this language. Thanks for your help.
Here's the process flow I am looking for:
read a directory
look for a particular file
use the file name to strip out some key information to create a newly processed file
process the input file
create the newly processed file for each input file read (if I read in 10, I create 10 new files)
Part 1:
my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
next if ($file =~ /^\.+$/);
#Get filename attributes
if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
print "$1\n";
print "$2\n";
print "$3\n";
}
print "$file\n";
}
Part 2:
use strict;
use Digest::MD5 qw(md5_hex);
#Create new file
open (NEWFILE, ">/backups/processed/foo$1.name.$2-foo_p$3.out") || die "cannot create file";
my $data = '';
my $line1 = <>;
chomp $line1;
my @heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ( "^A", "^E", "^D");
while (<>)
{
my $digest = md5_hex($data);
chomp;
my (@values) = split /,/;
my $extra = "__mykey__$sep1$digest$sep2" ;
$extra .= "$heading[$_]$sep1$values[$_]$sep2" for (0..scalar(@values));
$data .= "$extra$eorec";
print NEWFILE "$data";
}
#print $data;
close (NEWFILE);
You are using an old style of Perl programming. I recommend you use functions and CPAN modules (http://search.cpan.org). Perl pseudocode:
use Modern::Perl;
# use...
sub get_input_files {
# return an array of files (@)
}
sub extract_file_info {
# takes the file name and returns an array of values (filename attrs)
}
sub process_file {
# reads the input file, takes the previous attribs and build the output file
}
my @ifiles = get_input_files;
foreach my $ifile (@ifiles) {
my @attrs = extract_file_info($ifile);
process_file($ifile, @attrs);
}
Hope it helps
I've bashed your two code fragments together (making the second a sub that the first calls for each matching file) and, if I understood your description of the objective correctly, this should do what you want. Comments on style and syntax are inline:
#!/usr/bin/env perl
# - Never forget these!
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
# Parens on postfix "if" are optional; I prefer to omit them
next if $file =~ /^\.+$/;
if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
process_file($file, $1, $2, $3);
}
print "$file\n";
}
sub process_file {
my ($orig_name, $foo_x, $name_x, $p_x) = @_;
my $new_name = "/backups/processed/foo$foo_x.name.$name_x-foo_p$p_x.out";
# - From your description of the task, it sounds like we actually want to
# read from the found file, not from <>, so opening it here to read
# - Better to use lexical ("my") filehandle and three-arg form of open
# - "or" has lower operator precedence than "||", so less chance of
# things being grouped in the wrong order (though either works here)
# - Including $! in the error will tell why the file open failed
open my $in_fh, '<', $orig_name or die "cannot read $orig_name: $!";
open(my $out_fh, '>', $new_name) or die "cannot create $new_name: $!";
my $data = '';
my $line1 = <$in_fh>;
chomp $line1;
my @heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");
while (<$in_fh>) {
chomp;
my $digest = md5_hex($data);
my (@values) = split /,/;
my $extra = "__mykey__$sep1$digest$sep2";
$extra .= "$heading[$_]$sep1$values[$_]$sep2"
for (0 .. scalar(@values));
# - Useless use of double quotes removed on next two lines
$data .= $extra . $eorec;
#print $out_fh $data;
}
# - Moved print to output file to here (where it will print the complete
# output all at once) rather than within the loop (where it will print
# all previous lines each time a new line is read in) to prevent
# duplicate output records. This could also be achieved by printing
# $extra inside the loop. Printing $data at the end will be slightly
# faster, but requires more memory; printing $extra within the loop and
# getting rid of $data entirely would require less memory, so that may
# be the better option if you find yourself needing to read huge input
# files.
print $out_fh $data;
# - $in_fh and $out_fh will be closed automatically when it goes out of
# scope at the end of the block/sub, so there's no real point to
# explicitly closing it unless you're going to check whether the close
# succeeded or failed (which can happen in odd cases usually involving
# full or failing disks when writing; I'm not aware of any way that
# closing a file open for reading can fail, so that's just being left
# implicit)
close $out_fh or die "Failed to close file: $!";
}
Disclaimer: perl -c reports that this code is syntactically valid, but it is otherwise untested.