Related
FILE:
1,2015-08-20,00:00:00,89,1007.48,295.551,296.66,
2,2015-08-20,03:00:00,85,1006.49,295.947,296.99,
3,2015-08-20,06:00:00,86,1006.05,295.05,296.02,
4,2015-08-20,09:00:00,85,1005.87,296.026,296.93,
5,2015-08-20,12:00:00,77,1004.96,298.034,298.87
code:
use IPC::System::Simple qw( capture capturex );
use POSIX;
my $tb1_file = '/var/egridmanage_pl/daily_pl/egrid-csv/test.csv';
open my $fh1, '<', $tb1_file or die qq{Unable to open "$tb1_file" for input: $!};
my #t1_temp_12 = map {
chomp;
my #t1_ft_12 = split /,/;
sprintf "%.0f", $t1_ft_12[6] if $t1_ft_12[2] eq '12:00:00';
} <$fh1>;
print "TEMP #t1_temp_12\n";
my $result = #t1_temp_12 - 273.14;
print "$result should equal something closer to 24 ";
$result value prints out -265.14 making me think the #t1_temp_12 is hashed
So I tried to do awk
my $12temp = capture("awk -F"," '$3 == "12:00:00" {print $7 - 273-.15}' test.csv");
I've tried using ``, qx, open, system all having the same error result using the awk command
But this errors out. When executing awk at command line i get the favoured results.
This looks like there's some cargo cult programming going on here. It looks like all you're trying to do is find the line for 12:00:00 and print the temperature in degrees C rather than K.
Which can be done like this:
#!/usr/bin/perl
use strict;
use warnings;
while (<DATA>) {
my #fields = split /,/;
print $fields[6] - 273.15 if $fields[2] eq "12:00:00";
}
__DATA__
1,2015-08-20,00:00:00,89,1007.48,295.551,296.66,
2,2015-08-20,03:00:00,85,1006.49,295.947,296.99,
3,2015-08-20,06:00:00,86,1006.05,295.05,296.02,
4,2015-08-20,09:00:00,85,1005.87,296.026,296.93,
5,2015-08-20,12:00:00,77,1004.96,298.034,298.87
Prints:
25.72
You don't really need to do map sprintf etc. (Although you could do a printf on that output if you do want to format it).
Edit: From the comments, it seems one of the sources of confusion is extracting an element from an array. An array is zero or more scalar elements - you can't just assign one to the other, because .... well, what should happen if there isn't just one element (which is the usual case).
Given an array, we can:
pop #array will return the last element (and remove it from the array) so you could my $result = pop #array;
[0] is the first element of the array, so we can my $result = $array[0];
Or we can assign one array to another: my ( $result ) = #array; - because on the left hand side we have an array now, and it's a single element - the first element of #array goes into $result. (The rest isn't used in this scenario - but you could do my ( $result, #anything_else ) = #array;
So in your example - if what you're trying to do is retrieve a value matching a criteria - the normal tool for the job would be grep - which filters an array by applying a conditional test to each element.
So:
my #lines = grep { (split /,/)[2] eq "12:00:00" } <DATA>;
print "#lines";
print $lines[0];
Which we can reduce to:
my ( $firstresult ) = grep { (split /,/)[2] eq "12:00:00" } <DATA>;
print $firstresult;
But as we want to want to transform our array - map is the tool for the job.
my ( $result ) = map { (split /,/)[6] - 273.15 } grep { (split /,/)[2] eq "12:00:00" } <DATA>;
print $result;
First we:
use grep to extract the matching elements. (one in this case, but doesn't necessarily have to be!)
use map to transform the list, so that that we turn each element into just it's 6th field, and subtract 273.15
assign the whole lot to a list containing a single element - in effect just taking the first result, and throwing the rest away.
But personally, I think that's getting a bit complicated and may be hard to understand - and would suggest instead:
my $result;
while (<DATA>) {
my #fields = split /,/;
if ( $fields[2] eq "12:00:00" ) {
$result = $fields[6] - 273.15;
last;
}
}
print $result;
Iterate your data, split - and test - each line, and when you find one that matches the criteria - set $result and bail out of the loop.
#t1_temp_12 is an array. Why are you trying to subtract an single value from it?
my $result = "#t1_temp_12 - 273.14";
Did you want to do this instead?
#t1_temp_12 = map {$_ - 273.14} #t1_temp_12;
As a shell one-liner, you could write your entire script as:
perl -F, -lanE 'say $F[6]-273.14 if $F[2] eq "12:00:00"' <<DATA
1,2015-08-20,00:00:00,89,1007.48,295.551,296.66,
2,2015-08-20,03:00:00,85,1006.49,295.947,296.99,
3,2015-08-20,06:00:00,86,1006.05,295.05,296.02,
4,2015-08-20,09:00:00,85,1005.87,296.026,296.93,
5,2015-08-20,12:00:00,77,1004.96,298.034,298.87
DATA
25.73
i am new in Perl and i need to do some regexp.
I read, when array is used like integer value, it gives count of elements inside.
So i am doing for example
if (#result = $pattern =~ /(\d)\.(\d)/) {....}
and i was thinking it should return empty array, when pattern matching fails, but it gives me still array with 2 elements, but with uninitialized values.
So how i can put pattern matching inside if condition, is it possible?
EDIT:
foreach (keys #ARGV) {
if (my #result = $ARGV[$_] =~ /^--(?:(help|br)|(?:(input|output|format)=(.+)))$/) {
if (defined $params{$result[0]}) {
print STDERR "Cmd option error\n";
}
$params{$result[0]} = (defined $result[1] ? $result[1] : 1);
}
else {
print STDERR "Cmd option error\n";
exit ERROR_CMD;
}
}
It is regexp pattern for command line options, cmd options are in long format with two hyphens preceding and possible with argument, so
--CMD[=ARG]. I want elegant solution, so this is why i want put it to if condition without some prolog etc.
EDIT2:
oh sry, i was thinking groups in #result array are always counted from 0, but accesible are only groups from branch, where the pattern is success. So if in my code command is "input", it should be in $result[0], but actually it is in $result[1]. I thought if $result[0] is uninitialized, than pattern fails and it goes to the if statement.
Consider the following:
use strict;
use warnings;
my $pattern = 42.42;
my #result = $pattern =~ /(\d)\.(\d)/;
print #result, ' elements';
Output:
24 elements
Context tells Perl how to treat #result. There certainly aren't 24 elements! Perl has printed the array's elements which resulted from your regex's captures. However, if we do the following:
print 0 + #result, ' elements';
we get:
2 elements
In this latter case, Perl interprets a scalar context for #result, so adds the number of elements to 0. This can also be achieved through scalar #results.
Edit to accommodate revised posting: Thus, the conditional in your code:
if(my #result = $ARGV[$_] =~ /^--(?:(help|br)|(?:(input|output|format)=(.+)))$/) { ...
evaluates to true if and only if the match was successful.
#results = $pattern =~ /(\d)\.(\d)/ ? ($1,$2) : ();
Try this:
#result = ();
if ($pattern =~ /(\d)\.(\d)/)
{
push #result, $1;
push #result, $2;
}
=~ is not an equal sign. It's doing a regexp comparison.
So my code above is initializing the array to empty, then assigning values only if the regexp matches.
I am trying to use the following script to shuffle the order of sequences (lines) within a file. I'm not sure how to "initialize" values -- please help!
print "Please enter filename (without extension): ";
my $input = <>;
chomp $input;
use strict;
use warnings;
print "Please enter total no. of sequence in fasta file: ";
my $orig_size = <>*2-1;
chomp $orig_size;
open INFILE, "$input.fasta"
or die "Error opening input file for shuffling!";
open SHUFFLED, ">"."$input"."_shuffled.fasta"
or die "Error creating shuffled output file!";
my #array = (0); # Need to initialise 1st element in array1&2 for the shift function
my #array2 = (0);
my $i = 1;
my $index = 0;
my $index2 = 0;
while (my #line = <INFILE>){
while ($i <= $orig_size) {
$array[$i] = $line[$index];
$array[$i] =~ s/(.)\s/$1/seg;
$index++;
$array2[$i] = $line[$index];
$array2[$i] =~ s/(.)\s/$1/seg;
$i++;
$index++;
}
}
my $array = shift (#array);
my $array2 = shift (#array2);
for ($i = my $header_size; $i >= 0; $i--) {
my $j = int rand ($i+1);
next if $i == $j;
#array[$i,$j] = #array[$j,$i];
#array2[$i,$j] = #array2[$j,$i];
}
while ($index2 <= my $header_size) {
print SHUFFLED "$array[$index2]\n";
print SHUFFLED "$array2[$index2]\n";
$index2++;
}
close INFILE;
close SHUFFLED;
I'm getting these warnings:
Use of uninitialized value in substitution (s///) at fasta_corrector6.pl line 27, <INFILE> line 578914.
Use of uninitialized value in substitution (s///) at fasta_corrector6.pl line 31, <INFILE> line 578914.
Use of uninitialized value in numeric ge (>=) at fasta_corrector6.pl line 40, <INFILE> line 578914.
Use of uninitialized value in addition (+) at fasta_corrector6.pl line 41, <INFILE> line 578914.
Use of uninitialized value in numeric eq (==) at fasta_corrector6.pl line 42, <INFILE> line 578914.
Use of uninitialized value in numeric le (<=) at fasta_corrector6.pl line 47, <INFILE> line 578914.
Use of uninitialized value in numeric le (<=) at fasta_corrector6.pl line 50, <INFILE> line 578914.
First, you read the whole input file in:
use IO::File;
my #lines = IO::File->new($file_name)->getlines;
then you shuffle it:
use List::Util 'shuffle';
my #shuffled_lines = shuffle(#lines);
then you write them out:
IO::File->new($new_file_name, "w")->print(#shuffled_lines);
There's an entry in the Perl FAQ about how to shuffle an array. Another entry tells of the many ways to read a file in one go. Perl FAQs contain a lot of samples and trivia on how to do many common things -- it's a good place to continue learning more about Perl.
On your previous question I gave this answer, and noted that your code failed because you had not initialized a variable named $header_size used in a loop condition. Not only have you repeated that mistake, you have elaborated on it by starting to declare the variable with my each time you try to access it.
for ($i = my $header_size; $i >= 0; $i--) {
# ^^--- wrong!
while ($index2 <= my $header_size) {
# ^^--- wrong!
A variable that is declared with my is empty (undef) by default. $index2 can never contain anything but undef here, and your loop will run only once, because 0 <= undef will evaluate true (albeit with an uninitialized warning).
Please take my advice and set a value for $header_size. And only use my when declaring a variable, not every time you use it.
A better solution
Seeing your errors above, it seems that your input files are rather large. If you have over 500,000 lines in your files, it means your script will consume large amounts of memory to run. It may be worthwhile for you to use a module such as Tie::File and work only with array indexes. For example:
use strict;
use warnings;
use Tie::File;
use List::Util qw(shuffle);
tie my #file, 'Tie::File', $filename or die $!;
for my $lineno (shuffle 0 .. $#file) {
print $line[$lineno];
}
untie #file; # all done
I cannot pinpoint what exactly went wrong, but there are a few oddities with your code:
The Diamond Operator
Perl's Diamond operator <FILEHANDLE> reads a line from the filehandle. If no filehandle is provided, each command line Argument (#ARGV) is treated as a file and read. If there are no arguments, STDIN is used. better specify this yourself. You also should chomp before you do arithemtics with the line, not afterwards. Note that strings that do not start with a number are treated as numeric 0. You should check for numericness (with a regex?) and include error handling.
The Diamond/Readline operator is context sensitive. If given in scalar context (e.g, a conditional, a scalar assignment) it returns one line. If given in list context, e.g. as a function parameter or an array assignment, it returns all lines as an array. So
while (my #line = <INFILE>) { ...
will not give you one line but all lines and is thus equivalent to
my #line;
if (#line = <INFILE>) { ...
Array gymnastics
After you read in the lines, you try to do some manual chomping. Here I remove all trailing whitspaces in #line, in a single line:
s/\s+$// foreach #line;
And here, I remove all non-leading whitespaces (what your regex is doing in fact):
s/(?<!^)\s//g foreach #line;
To stuff an element alternatingly into two arrays, this might work as well:
for my $i (0 .. $##line) {
if ($i % 2) {
push #array1, shift #line;
} else {
push #array2, shift #line;
}
}
or
my $i = 0;
while (#line) {
push ($i++ % 2 ? #array1 : #array2), shift #line
}
Manual bookkeeping of array indices is messy and error-prone.
Your for-loop could be written mor idiomatic as
for my $i (reverse 0 .. $header_size)
Do note that declaring $header_size inside the loop initialisation is possible if it was not declared before, but it will yield the undef value, therefore you assigned undef to $i which leads to some of the error messages, as undef should not be used in arithemtic operations. Assignments always assigns the right side to the left side.
I am writing a Perl Script to find out the frequency of occurrence of characters in a message. Here is the logic I am following:
Read one char at a time from the message using getc() and store it into an array.
Run a for loop starting from index 0 to the length of this array.
This loop will read each char of the array and assign it to a temp variable.
Run another for loop nested in the above, which will run from the index of the character being tested till the length of the array.
Using a string comparison between this character and the current array indexed char, a counter is incremented if they are equal.
After completion of inner For Loop, I am printing the frequency of the char for debug purposes.
Question: I don't want the program to recompute the frequency of a character if it's already been calculated. For instance, if character "a" occurs 3 times, for the first run, it calculates the correct frequency. However, at the next occurrence of "a", since loop runs from that index till the end, the frequency is (actual freq -1). Similary for the third occurrence, frequency is (actual freq -2).
To solve this. I used another temp array to which I would push the char whose frequency is already evaluated.
And then at the next run of for loop, before entering the inner for loop, I compare the current char with the array of evaluated chars and set a flag. Based on that flag, the inner for loop runs.
This is not working for me. Still the same results.
Here's the code I have written to accomplish the above:
#!/usr/bin/perl
use strict;
use warnings;
my $input=$ARGV[0];
my ($c,$ch,$flag,$s,#arr,#temp);
open(INPUT,"<$input");
while(defined($c = getc(INPUT)))
{
push(#arr,$c);
}
close(INPUT);
my $length=$#arr+1;
for(my $i=0;$i<$length;$i++)
{
$count=0;
$flag=0;
$ch=$arr[$i];
foreach $s (#temp)
{
if($ch eq $s)
{
$flag = 1;
}
}
if($flag == 0)
{
for(my $k=$i;$k<$length;$k++)
{
if($ch eq $arr[$k])
{
$count = $count+1;
}
}
push(#temp,$ch);
print "The character \"".$ch."\" appears ".$count." number of times in the message"."\n";
}
}
You're making your life much harder than it needs to be. Use a hash:
my %freq;
while(defined($c = getc(INPUT)))
{
$freq{$c}++;
}
print $_, " ", $freq{$_}, "\n" for sort keys %freq;
$freq{$c}++ increments the value stored in $freq{$c}. (If it was unset or zero, it becomes one.)
The print line is equivalent to:
foreach my $key (sort keys %freq) {
print $key, " ", $freq{$key}, "\n";
}
If you want to do a single character count for the whole file then use any of the suggested methods posted by the others. If you want a count of all the occurances
of each character in a file then I propose:
#!/usr/bin/perl
use strict;
use warnings;
# read in the contents of the file
my $contents;
open(TMP, "<$ARGV[0]") or die ("Failed to open $ARGV[0]: $!");
{
local($/) = undef;
$contents = <TMP>;
}
close(TMP);
# split the contents around each character
my #bits = split(//, $contents);
# build the hash of each character with it's respective count
my %counts = map {
# use lc($_) to make the search case-insensitive
my $foo = $_;
# filter out newlines
$_ ne "\n" ?
($foo => scalar grep {$_ eq $foo} #bits) :
() } #bits;
# reverse sort (highest first) the hash values and print
foreach(reverse sort {$counts{$a} <=> $counts{$b}} keys %counts) {
print "$_: $counts{$_}\n";
}
I donĀ“t understand the problem you are trying to solve, so I propose a more simple way to count the characters in a string:
$string = "fooooooobar";
$char = 'o';
$count = grep {$_ eq $char} split //, $string;
print $count, "\n";
This prints the number of $char occurrences in $string (7).
Hope this helps to write a more compact code
Faster solution :
#result = $subject =~ m/a/g; #subject is your file
print "Found : ", scalar #result, " a characters in file!\n";
Of course you can put a variable in the place of 'a' or even better execute this line for whatever characters you want to count the occurrences.
As a one-liner:
perl -F"" -anE '$h{$_}++ for #F; END { say "$_ : $h{$_}" for keys %h }' foo.txt
I need to detect if the first character in a file is an equals sign (=) and display the line number. How should I write the if statement?
$i=0;
while (<INPUT>) {
my($line) = $_;
chomp($line);
$findChar = substr $_, 0, 1;
if($findChar == "=")
$output = "$i\n";
print OUTPUT $output;
$i++;
}
Idiomatic perl would use a regular expression (^ meaning beginning of line) plus one of the dreaded builtin variables which happens to mean "line in file":
while (<INPUT>) {
print "$.\n" if /^=/;
}
See also perldoc -v '$.'
Use $findChar eq "=". In Perl:
== and != are numeric comparisons. They will convert both operands to a number.
eq and ne are string comparisons. They will convert both operands to a string.
Yes, this is confusing. Yes, I still write == when I mean eq ALL THE TIME. Yes, it takes me forever to spot my mistake too.
It looks like you are not using strict and warnings. Use them, especially since you do not know Perl, you might also want to add diagnostics to the list of must-use pragmas.
You are keeping track of the input line number in a separate variable $i. Perl has various builtin variables documented in perlvar. Some of these, such as $. are very useful use them.
You are using my($line) = $_; in the body of the while loop. Instead, avoid $_ and assign to $line directly as in while ( my $line = <$input> ).
Note that bareword filehandles such as INPUT are package global. With the exception of the DATA filehandle, you are better off using lexical filehandles to properly limit the scope of your filehandles.
In your posts, include sample data in the __DATA_ section so others can copy, paste and run your code without further work.
With these comments in mind, you can print all lines that do not start with = using:
#!/usr/bin/perl
use strict; use warnings;
while (my $line = <DATA> ) {
my $first_char = substr $line, 0, 1;
if ( $first_char ne '=' ) {
print "$.:$first_char\n";
}
}
__DATA__
=
=
a
=
+
However, I would be inclined to write:
while (my $line = <DATA> ) {
# this will skip blank lines
if ( my ($first_char) = $line =~ /^(.)/ ) {
print "$.:$first_char\n" unless $first_char eq '=';
}
}