Not able to extract a number from a text file

I have a text file which has the information of the number of tests that were run, passed and failed.
Also it contains information on which of the tests failed.
I want to extract the total number of the tests that were run and failed.
Below is a sample of the log file:
file_1 has difference
file_2 has difference
file_3 has difference
file_4 has difference
file_5 has difference
file_6 has difference
file_7 has difference
file_8 has difference
events has difference
QShrink has difference
Total tests run = 10
Total tests passed = 0
Total tests failed = 10
I tried to capture it like this, but it didn't work:
if ( $_=~/^# run =/ || $_=~/^# failed =/ ) {
print $_;
my $entry = <FILE>;
print $entry;
}
My objective is that I should be able to fetch only the corresponding numbers and not the entire string.

You should put the entire line into the pattern and decide what to store based on the last word before the =. That makes it flexible, because you don't need to care whether all of the lines are present.
use strict;
use warnings 'all';
use Data::Dumper;
my %stats;
while (<DATA>) {
if ( m/^Total tests ([a-z]+) = (\d+)/ ) {
$stats{$1} = $2;
}
}
print Dumper \%stats;
__DATA__
file_1 has difference
file_2 has difference
file_3 has difference
file_4 has difference
file_5 has difference
file_6 has difference
file_7 has difference
file_8 has difference
events has difference
QShrink has difference
Total tests run = 10
Total tests passed = 0
Total tests failed = 10
This solution uses a hash to store the matches, keyed by the word before the =. Its output (the key order may differ between runs):
$VAR1 = {
'failed' => '10',
'run' => '10',
'passed' => '0'
};
Let's take a look at what you did.
if($_=~/^# run =/ || $_=~/^# failed =/)
{
print $_;
my$entry=<FILE>;
print $entry;
}
This code assumes there is something in $_. Maybe you already opened the file and are reading it.
while (<DATA>) {
if ($_ =~ /.../) {
So you are saying: if the current line matches, at the beginning of the string, a #, a space, the word run, a space and an = (or the same with failed), print the full line, then read the next line into a lexical variable that only exists within that block, and print it.
This pattern does not match your input, so the block will never be executed. If it were, you would consume an extra line of input for every line that matches.
None of that is what you want, and it does not get you anywhere near the numbers.

if($_=~/Total tests run = ([0-9]+)/)
{
print "Total tests run :$1\n";
}
In the above code, the number you want is captured in Perl's capture variable $1 because that part of the pattern is placed in parentheses. You can do the same for the number of failed tests.
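Following the same approach for the failed count, here is a minimal sketch that captures both numbers in one pass over the lines. The sub name parse_totals and the sample lines are illustrative assumptions; it returns undef for a total that never appears in the input.

```perl
use strict;
use warnings;

# Hypothetical helper: scan lines and capture the run/failed totals.
# Each number is captured by the parentheses in its pattern and read from $1.
sub parse_totals {
    my @lines = @_;
    my ($run, $failed);
    for (@lines) {
        $run    = $1 if /^Total tests run = ([0-9]+)/;
        $failed = $1 if /^Total tests failed = ([0-9]+)/;
    }
    return ($run, $failed);
}

my ($run, $failed) = parse_totals(
    "file_1 has difference\n",
    "Total tests run = 10\n",
    "Total tests passed = 0\n",
    "Total tests failed = 10\n",
);
print "Total tests run: $run\nTotal tests failed: $failed\n";
```

Assigning only when the match succeeds matters here: $1 keeps its value from the last successful match, so reading it after a failed match would pick up a stale number.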

Related

Perl claims uninitialized value on a variable that was set from an array

I'm not sure why it's doing this, but it is. I have a project I'm working on for keeping score and saving it to text/data files. It will also show information saved in a file, and a few other goodies. Right now, I have a set of code that I'm trying to get working. I have the file set up so that it automatically runs a certain sub, but I'm trying to trigger the sub that shows data. I run the script using perl scorecard.pl --display-file scores.dat, and I get the following:
Use of uninitialized value $gn in print at scorecard.pl line 30.
Use of uninitialized value $gt in print at scorecard.pl line 30.
Use of uninitialized value $gp in print at scorecard.pl line 30.
Game '' was started with players.
these were the scores:
Use of uninitialized value $gp in numeric lt (<) at scorecard.pl line 31.
Died at scorecard.pl line 35.
Welcome to scorecard
A simple scorecard script
What game would you like to score?
^C
Here's the code for the display sub:
sub dispfile()
{
my ($opt_name, $dfile) = @_;
open (my $fhd,'<',$dfile)
or die "Could not open file '",$dfile,"'.\n";
chomp(my @ls = <$fhd>);
my $gt = $ls[0];
my $gn = $ls[1];
my $gp = $ls[2];
print "Game '",$gn,"' was started ",$gt," with ",$gp," players.\nthese were the scores:\n";
for(my $i=3;$i<$gp;$i++){
print $ls[$i];
}
close $fhd;
die;
}
The whole project is currently on my GitHub, and I have pushed the latest version to my dev-0.1-r2 branch: GitHub - scorecard.pl

A few suggestions on your code.
You should check whether you've opened an empty file and take appropriate action:
chomp(my @ls = <$fhd>);
die "No data in file $dfile\n" unless @ls;
You can use list assignment to simplify the next three lines:
my ($gt, $gn, $gp) = @ls;
You can use string interpolation to simplify your print() line:
print "Game '$gn' was started $gt with $gp players.\nthese were the scores:\n";
A foreach loop is usually far easier to understand than a C-style for loop.
foreach (3 .. $#ls) {
print $ls[$_];
}
You can simplify that even more by iterating over array elements rather than array indexes.
foreach (@ls[3 .. $#ls]) {
print $_;
}
Or perhaps rewrite it using the postfix version of the loop.
print foreach @ls[3 .. $#ls];
And, finally, if you want to exit your program in completely normal circumstances, then use exit instead of die.
The first of these suggestions will solve the problem that you're having.
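Putting these suggestions together, a sketch of the revised sub might look like this. It assumes the file layout implied by the question (start time, game name, number of players, then one score per line) and, like the original, takes an option name and a file name as arguments, the way a Getopt::Long handler would be called; how the sub is actually wired up is an assumption. Because chomp strips the newlines, the sketch adds one back after each score, and it returns normally instead of calling die, so the caller can exit once the display is done.

```perl
use strict;
use warnings;

# Sketch: dispfile() with the suggestions applied. Assumed file layout:
# line 1 = start time, line 2 = game name, line 3 = number of players,
# remaining lines = scores.
sub dispfile {
    my ($opt_name, $dfile) = @_;
    open(my $fhd, '<', $dfile)
        or die "Could not open file '$dfile': $!\n";
    chomp(my @ls = <$fhd>);
    close $fhd;

    # An empty file leaves $gt, $gn and $gp undefined, which is what
    # triggers the "uninitialized value" warnings.
    die "No data in file $dfile\n" unless @ls;

    my ($gt, $gn, $gp) = @ls;
    print "Game '$gn' was started $gt with $gp players.\nthese were the scores:\n";

    # chomp removed the newlines, so add one back after each score.
    print "$_\n" foreach @ls[3 .. $#ls];
    return;
}
```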

Regular expression to print a string from a command output

I have written a function that uses regex and prints the required string from a command output.
The script works as expected, but it does not support dynamic output. Currently, I use regexes for "icmp" and "OK" and print the values. However, the type, destination and return code could change, and there is a high chance that the command doesn't return any output at all. How do I handle such scenarios?
sub check_summary{
my ($self) = #_;
my $type = 0;
my $return_type = 0;
my $ipsla = $self->{'ssh_obj'}->exec('show ip sla');
foreach my $line( $ipsla) {
if ( $line =~ m/(icmp)/ ) {
$type = $1;
}
if ( $line =~ m/(OK)/ ) {
$return_type = $1;
}
}
INFO ($type,$return_type);
}
Command output:
PSLAs Latest Operation Summary
Codes: * active, ^ inactive, ~ pending
ID Type Destination Stats Return Last
(ms) Code Run
-----------------------------------------------------------------------
*1 icmp 192.168.25.14 RTT=1 OK 1 second ago

Updated after clarification: we need only the last line.
As is often the case, you don't need a regex to parse the output as shown. You have space-separated fields, so you can just split the line and pick the elements you need.
We are told that the line of interest is the last line of the command output. Then we don't need the loop and can take the last element of the array of lines. It is still unclear how $ipsla holds the output: as a multi-line string or perhaps as an arrayref. Since it is the output of a command, I'll treat it as a multi-line string, akin to what qx returns. Then, instead of the foreach loop:
my @lines = split '\n', $ipsla;   # if $ipsla is a multi-line string
# my @lines = @$ipsla;            # if $ipsla is an arrayref
pop @lines while @lines and $lines[-1] !~ /\S/;   # remove possible empty lines at end
return unless @lines;             # the command returned no output at all
my ($type, $return_type) = (split ' ', $lines[-1])[1,4];
Here are some comments on the code. Let me know if more is needed.
We can see in the shown output that the fields up to what we need have no spaces. So we can split the last line on whitespace, by split ' ', $lines[-1], and take the 2nd and 5th elements (indices 1 and 4), by ( ... )[1,4]. These are our two needed values and we assign them.
Just in case the output ends with empty lines, we first remove them by doing pop @lines for as long as there are lines left and the last one has no non-space characters, while @lines and $lines[-1] !~ /\S/. That is the same as
while ( @lines and $lines[-1] !~ /\S/ ) { pop @lines }
If nothing is left after that, the command produced no output and we return early.
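To cover the case raised in the question where the command returns no output, or no data line, the same split approach can be wrapped in a sub that returns an empty list when there is nothing to parse. This is a sketch assuming the output arrives as one multi-line string; the sub name parse_last_line and the rule that a data line starts with an optional *, ^ or ~ followed by the numeric ID are assumptions based on the sample output.

```perl
use strict;
use warnings;

# Hypothetical helper: return (type, return code) from the last non-blank
# line of the command output, or an empty list if there is nothing to parse.
sub parse_last_line {
    my ($output) = @_;
    return unless defined $output;

    my @lines = grep { /\S/ } split /\n/, $output;   # drop blank lines
    return unless @lines;

    # Assumed: a data line starts with an optional status code and the ID.
    return unless $lines[-1] =~ /^[*^~]?\d/;

    my @fields = split ' ', $lines[-1];
    return unless @fields >= 5;
    return @fields[1, 4];
}

my $sample = <<'END';
Codes: * active, ^ inactive, ~ pending
ID Type Destination Stats Return Last
-----------------------------------------------------------------------
*1 icmp 192.168.25.14 RTT=1 OK 1 second ago
END

if (my ($type, $return_type) = parse_last_line($sample)) {
    print "type: $type, return code: $return_type\n";
}
else {
    print "no IP SLA data\n";
}
```

A list assignment in boolean context is false when the right-hand side is empty, which is what lets the if/else distinguish "no data" from a parsed line.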
Original version, edited for clarifications. It is also a valid way to do what is needed.
I assume that data starts after the line with only dashes. Set a flag once that line is reached, process the line(s) if the flag is set. Given the rest of your code, the loop
my $data_start;
foreach (@lines)
{
if (not $data_start) {
$data_start = 1 if /^\s* -+ \s*$/x; # only dashes and optional spaces
}
else {
my ($type, $return_type) = (split)[1,4];
print "type: $type, return code: $return_type\n";
}
}
This is a sketch until clarifications come. It also assumes that there are more lines than one.

I'm not sure of all possibilities of output from that command so my regular expression may need tweaking.
I assume the goal is to get the values of all columns in variables. I opted to store values in a hash using the column names as the hash keys. I printed the results for debugging / demonstration purposes.
use strict;
use warnings;
sub check_summary {
my ($self) = @_;
my %results = map { ($_,undef) } qw(Code ID Type Destination Stats Return_Code Last_Run); # Put results in hash, use column names for keys, set values to undef.
my $ipsla = $self->{ssh_obj}->exec('show ip sla');
foreach my $line (@$ipsla) {
chomp $line; # Remove newlines from last field
if($line =~ /^([*^~])([0-9]+)\s+([a-z]+)\s+([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)\s+([[:alnum:]=]+)\s+([A-Z]+)\s+([^\s].*)$/) {
$results{Code} = $1; # Code prefixing ID
$results{ID} = $2;
$results{Type} = $3;
$results{Destination} = $4;
$results{Stats} = $5;
$results{Return_Code} = $6;
$results{Last_Run} = $7;
}
}
# Testing
use Data::Dumper;
print Dumper(\%results);
}
# Demonstrate
check_summary();
# Commented for testing
#INFO ($type,$return_type);
Worked on the submitted test line.
EDIT:
Regular expressions allow you to specify patterns instead of the exact text you are attempting to match. This is powerful but complicated at times. You need to read the Perl Regular Expression documentation to really learn them.
Perl regular expressions also allow you to capture the matched text. This can be done multiple times in a single pattern which is how we were able to capture all the columns with one expression. The matches go into numbered variables...
$1
$2
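As a small illustration of numbered captures, here is a sketch using a simplified pattern (not the full expression above) on the sample data line. It also shows that a successful match in list context returns all captures at once.

```perl
use strict;
use warnings;

my $line = '*1 icmp 192.168.25.14 RTT=1 OK 1 second ago';

# Each pair of parentheses is numbered from the left: $1, $2, $3, ...
if ($line =~ /^([*^~])(\d+)\s+(\S+)/) {
    print "code=$1 id=$2 type=$3\n";
}

# In list context a successful match returns the captures as a list,
# so they can be assigned directly without referring to $1, $2, ...
my ($code, $id, $type) = $line =~ /^([*^~])(\d+)\s+(\S+)/;
print "code=$code id=$id type=$type\n";
```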

delete previous and next lines in perl

I have the following file:
#TWEETY:150:000000000-ACFKE:1:2104:27858:17965
AAATTAGCAAAAAACAATAACAAAACTGGGAAAATGCAATTTAACAACGAAAATTTTCCGAGAACTTGAAAGCGTACGAAAACGATACGCTCC
+
D1FFFB11FDG00EE0FFFA1110FAA1F/ABA0FGHEGDFEEFGDBGGGGFEHBFDDG/FE/EGH1#GF#F0AEEEEFHGGFEFFCEC/>EE
#TWEETY:150:000000000-ACFKE:1:1105:22044:20029
AAAAAATATTAAAACTACGAATGCATAAATTATTTCGTTCGAAATAAACTCACACTCGTAACATTGAACTACGCGCTCC
+
CCFDDDFGGGGGGGGGGHGGHHHHGHHHHHHHHHHHHHHHGHHGHHHHHHHHHHHHHGHGHGGHHHHHHGHHEGGGGGG
#TWEETY:150:000000000-ACFKE:1:2113:14793:7182
TATATAAAGCGAGAGTAGAAACTTTTTAATTGACGCGGCGAGAAAGTATATAGCAACAAGCGAGCACCCGCTCC
+
BBFFFFFGGGGFFGGFGHHHHHHHHHHHHHHHHHGGAEEEAFGGGHHFEGHHGHHHHHGHHGGGGFHHGG?EEG
#TWEETY:150:000000000-ACFKE:1:2109:5013:22093
AAAAAAATAATTCATATCGCCATATCGACTGACAGATAATCTATCTATAATCATAACTTTTCCCTCGCTCC
+
DAFAADDGF1EAGG3EG3A00ECGDFFAEGFCHHCAGHBGEAGBFDEDGGHBGHGFGHHFHHHBDG?/FA/
#TWEETY:150:000000000-ACFKE:1:2106:25318:19875

+
CCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
The lines are in groups of four (each time there is a name, starting with #TWEETY, a string of letters, a + character, and another string of letters).
The second and fourth lines should have the same number of characters.
But there are cases where the second line is empty, as in the last four lines.
In these cases, I would like to get rid of the whole block (the previous line before the empty line and the next two lines).
I have just started perl and have been trying to write a script for my problem, but am having a hard time. Does anyone have some feedback?
Thanks!

Keep an array buffer of the last four lines. When it's full, check the second line, print the lines or not, empty the buffer, repeat.
#!/usr/bin/perl
use warnings;
use strict;
my @buffer;
sub output {
print @buffer unless 1 == length $buffer[1];
@buffer = ();
}
while (<>) {
if (4 == @buffer) {
output();
}
push @buffer, $_;
}
output(); # Don't forget to process the last four lines.

Yes. Start by looking at $/ and set it so you can work on a chunk at a time. I would suggest you can treat # as a record separator in your example.
Then iterate your records using a while loop. E.g. while ( <> ) {
Use split on \n to turn the current chunk into an array of lines.
Perform your test on the appropriate lines, and either print - or not - depending on whether it passed.
If you get stuck with that, then I'm sure a specific question including your code and where you're having problems will be well received here.
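The steps above could look like the following sketch. It assumes each record begins with #TWEETY (as in the sample) and keeps a record only when its sequence line is non-empty and as long as its quality line. The sub name filter_records and reading from a string instead of <> are assumptions made so the sketch is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the records whose second line (the
# sequence) is non-empty and as long as the fourth line (the quality).
sub filter_records {
    my ($text) = @_;
    open(my $in, '<', \$text) or die "Cannot read string: $!";
    local $/ = '#TWEETY';            # read one record at a time
    my $kept = '';
    while (my $chunk = <$in>) {
        chomp $chunk;                # removes the trailing $/ ('#TWEETY')
        next unless $chunk =~ /\S/;  # the first "record" is empty
        my @lines = split /\n/, $chunk;
        # $lines[0] is the rest of the header, then sequence, '+', quality.
        my ($seq, $qual) = @lines[1, 3];
        next unless defined $qual
            && length($seq) && length($seq) == length($qual);
        $kept .= "#TWEETY$chunk";    # put the separator back in front
    }
    return $kept;
}

my $sample = "#TWEETY:1\nACGT\n+\nIIII\n#TWEETY:2\n\n+\nIIII\n";
print filter_records($sample);   # only the #TWEETY:1 record is printed
```

Because split keeps empty fields in the middle of a list, an empty sequence line still shows up as an empty $lines[1], which is what the length check catches.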

If you chunk the data correctly, this becomes almost trivial.
#!/usr/bin/perl
use strict;
use warnings;
# Use '#TWEETY' as the record separator to make it
# easy to chunk the data.
local $/ = '#TWEETY';
while (<DATA>) {
# The first entry will be empty (as the separator
# is the first thing in the file). Skip that record.
next unless /\S/;
# Skip any records with two consecutive newlines
# (as they will be the ones with the empty line 2)
next if /\n\n/;
# Print the remaining records
# (with $/ stuck back on the front)
print "$/$_";
}
__DATA__
#TWEETY:150:000000000-ACFKE:1:2104:27858:17965
AAATTAGCAAAAAACAATAACAAAACTGGGAAAATGCAATTTAACAACGAAAATTTTCCGAGAACTTGAAAGCGTACGAAAACGATACGCTCC
+
D1FFFB11FDG00EE0FFFA1110FAA1F/ABA0FGHEGDFEEFGDBGGGGFEHBFDDG/FE/EGH1#GF#F0AEEEEFHGGFEFFCEC/>EE
#TWEETY:150:000000000-ACFKE:1:1105:22044:20029
AAAAAATATTAAAACTACGAATGCATAAATTATTTCGTTCGAAATAAACTCACACTCGTAACATTGAACTACGCGCTCC
+
CCFDDDFGGGGGGGGGGHGGHHHHGHHHHHHHHHHHHHHHGHHGHHHHHHHHHHHHHGHGHGGHHHHHHGHHEGGGGGG
#TWEETY:150:000000000-ACFKE:1:2113:14793:7182
TATATAAAGCGAGAGTAGAAACTTTTTAATTGACGCGGCGAGAAAGTATATAGCAACAAGCGAGCACCCGCTCC
+
BBFFFFFGGGGFFGGFGHHHHHHHHHHHHHHHHHGGAEEEAFGGGHHFEGHHGHHHHHGHHGGGGFHHGG?EEG
#TWEETY:150:000000000-ACFKE:1:2109:5013:22093
AAAAAAATAATTCATATCGCCATATCGACTGACAGATAATCTATCTATAATCATAACTTTTCCCTCGCTCC
+
DAFAADDGF1EAGG3EG3A00ECGDFFAEGFCHHCAGHBGEAGBFDEDGGHBGHGFGHHFHHHBDG?/FA/
#TWEETY:150:000000000-ACFKE:1:2106:25318:19875

+
CCCCCCCCCCCCGGGGGGGGGGGGGGGGGGGGGGGGFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

Thanks everyone for the feedback!
It was all really useful. Thanks to your suggestions, I explored all the options and learned the unless statement.
The easiest solution given my existing code, was just to add an unless statement at the end.
### Write to output, but remove non-desired Gs
open(OUT, '>', $outfile) or die "Cannot open $outfile: $!";
my @accorder = @{$store0{"accorder"}};
foreach my $acc (@accorder){
# retrieve seq(2nd line) and qual(4th line)
my $seq = $store0{$acc}{"seq"};
my $qual = $store0{$acc}{"qual"};
# clean out polyG at end
$seq =~ s/G{3,}.{0,1}$//;
my $lenseq = length($seq);
my $lenqual = length($qual);
my $startqual = $lenqual - $lenseq;
$qual = substr($qual, 0, $lenseq);
#the above was in order to remove multiple G characters at the end of the
#second line, which is what led to empty lines (lines that were made up of
#only Gs got cut out)
# print to output, unless sequence has become empty
unless($lenseq == 0){ #this is the unless statement I added
print OUT "\#$acc\n$seq\n+\n$qual\n";
}
}
close(OUT);
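The trimming step can be tried in isolation. This is a sketch that mirrors the substitution and substr calls above; the sub name trim_polyg and the sample reads are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the trimming step above: strip a run of
# three or more Gs (plus at most one trailing character) from the end of
# the sequence, then cut the quality string to the same length.
sub trim_polyg {
    my ($seq, $qual) = @_;
    $seq =~ s/G{3,}.{0,1}$//;
    $qual = substr($qual, 0, length $seq);
    return ($seq, $qual);
}

for my $pair (['ACGTGGGGA', 'ABCDEFGHI'], ['GGGGGG', 'IIIIII']) {
    my ($seq, $qual) = trim_polyg(@$pair);
    # Same rule as the unless statement above: skip reads that became empty.
    if (length $seq) { print "kept: $seq / $qual\n" }
    else             { print "dropped: $pair->[0]\n" }
}
```

A read made up only of Gs is removed entirely by the substitution, which is how the empty second lines arose and why the length check is enough to drop those records.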

Using Perl to find and fix errors in CSV files

I am dealing with very large amounts of data. Every now and then there is a slip-up. I want to identify each row with an error, under a condition of my choice, and report the line number along with each erroneous row. I will be running this script on a handful of files and want to output the report to one file.
So here is my example data:
File_source,ID,Name,Number,Date,Last_name
1.csv,1,Jim,9876,2014-08-14,Johnson
1.csv,2,Jim,9876,2014-08-14,smith
1.csv,3,Jim,9876,2014-08-14,williams
1.csv,4,Jim,9876,not_a_date,jones
1.csv,5,Jim,9876,2014-08-14,dean
1.csv,6,Jim,9876,2014-08-14,Ruzyck
Desired output:
Row#5,4.csv,4,Jim,9876,not_a_date,jones (this is an erroneous row)
The condition I have chosen is print to output if anything in the date field is not a date.
As you can see, my desired output contains the line number where the error occurred, along with the data itself.
After I have my output that shows the lines within each file that are in error, I want to grab that line from the untouched original CSV file to redo (both modified and original files contain the same amount of rows). After I have a file of these redone rows, I can omit and clean up where needed to prevent interruption of an import.
Folder structure will contain:
Modified: 4.txt
Original: 4.csv
I have something started here, written in Perl, which by the logic will at least return the rows I need. However I believe my syntax is a little off and I do not know how to plug in the other subroutines.
Code:
$count = 1;
while (<>) {
unless ($F[4] =~ /\d+[-]\d+[-]\d+/)
print "Row#" . $count++ . "," . "$_";
}
The code above is supposed to give me my erroneous rows, but to be able to extract them from the originals is beyond me. The above code also contains some syntax errors.

This will do as you ask.
Please make certain that none of the fields in the data can ever contain a comma; otherwise you will need to use Text::CSV to process it instead of a simple split.
use strict;
use warnings;
use 5.010;
use autodie;
open my $fh, '<', 'example.csv';
<$fh>; # Skip header
while (<$fh>) {
my @fields = split /,/;
if( $fields[4] !~ /^\d{4}-\d{2}-\d{2}$/ ) {
print "Row#$.,$_";
}
}
output
Row#5,4.csv,4,Jim,9876,not_a_date,jones
Update
If you want to process a number of files then you need this instead.
The close ARGV at the end of the loop is there so that the line counter $. is reset to 1 at the start of each file. Without it, $. just keeps counting upward across all the files.
You would run this like
rob@Samurai-U:~$ perl findbad.pl *.csv
or you could list the files individually, separated by spaces.
For the test I have created files 1.csv and 2.csv which are identical to your example data except that the first field of each line is the name of the file containing the data.
You may not want the line in the output that announces each file name, in which case you should replace the entire first if block with just next if $. == 1.
use strict;
use warnings;
@ARGV = map { glob qq{"$_"} } @ARGV; # For Windows
while (<>) {
if ($. == 1) {
print "\n\nFile: $ARGV\n\n";
next;
}
my @fields = split /,/;
unless ( $fields[4] =~ /^\d{4}-\d{2}-\d{2}$/ ) {
printf "Row#%d,%s", $., $_;
}
close ARGV if eof ARGV;
}
output
File: 1.csv
Row#5,1.csv,4,Jim,9876,not_a_date,jones
File: 2.csv
Row#5,2.csv,4,Jim,9876,not_a_date,jones
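The question also asks how to pull the same rows out of the untouched original file (4.csv next to the modified 4.txt, both with the same number of rows). The answer does not cover that step; one way is a second pass over the original that keeps only the line numbers flagged in the first pass. This is a sketch only: the sub names bad_row_numbers and rows_by_number are hypothetical, and the in-memory sample strings stand in for the real files.

```perl
use strict;
use warnings;

# Hypothetical helper: line numbers ($.) of rows whose 5th field is not a
# YYYY-MM-DD date. Line 1 is assumed to be the header.
sub bad_row_numbers {
    my ($fh) = @_;
    my @bad;
    while (my $line = <$fh>) {
        next if $. == 1;
        chomp $line;
        my @fields = split /,/, $line;
        push @bad, $.
            unless defined $fields[4] && $fields[4] =~ /^\d{4}-\d{2}-\d{2}$/;
    }
    return @bad;
}

# Hypothetical helper: return "Row#N,<line>" for each wanted line number.
sub rows_by_number {
    my ($fh, @wanted) = @_;
    my %want = map { $_ => 1 } @wanted;
    my @rows;
    while (my $line = <$fh>) {
        push @rows, "Row#$.,$line" if $want{$.};
    }
    return @rows;
}

# With real files, the handles would come from something like
#   open(my $mod, '<', '4.txt') and open(my $orig, '<', '4.csv').
# The strings below are made-up stand-ins.
my $modified = "File_source,ID,Name,Number,Date,Last_name\n"
             . "4.csv,1,Jim,9876,2014-08-14,Johnson\n"
             . "4.csv,2,Jim,9876,not_a_date,jones\n";
my $original = "File_source,ID,Name,Number,Date,Last_name\n"
             . "4.csv,1,Jim,9876,2014-08-14,Johnson\n"
             . "4.csv,2,Jim,9876,2014/08/14,jones\n";

open(my $mod,  '<', \$modified) or die $!;
open(my $orig, '<', \$original) or die $!;
my @bad = bad_row_numbers($mod);
print rows_by_number($orig, @bad);
```

Each filehandle keeps its own line counter, and $. refers to whichever handle was read last, so the two passes do not interfere with each other.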

Searching equal operator in a text file in perl

I have a text file in which many lines start with a run of equals signs, as shown below. How can I extract such a line? I've tried the code below, but it's not working. Any clues as to why it's not matching?
Text file line:
[==========] 10 tests from 4 test cases ran. (43950 ms total)
Code:
if (/^\Q[==========]\E/ .. /^\Qran\)\E/) {
print "$i.Match Found:".$_."\n";
$i++;
}

Try this, have not tested, but should work. I HAVE tested the regex and it works.
#!/usr/bin/perl
use strict;
use warnings;
open(my $fh, '<', 'data.txt') or die "Cannot open data.txt: $!";
while (<$fh>) {
chomp;
if ( $_ =~ m/^\[==========\]/ ) {
print "Match found: $_\n";
}
}
close($fh);
For clarification: chomp removes the newline from the end of the line; it is not essential for the match itself.

#!/usr/bin/perl
use strict;
use warnings;
while (my $line = <DATA>) {
chomp $line;
if ( $line =~ m/^\[=+\]/ ) {
print "Line which starts with [==] is $line\n";
}
}
__DATA__
[==========] 10 tests from 4 test cases ran. (43950 ms total)
A line without the equal signs at the beginning
[==========] 4 tests from 2 test cases ran. (30950 ms total)
[===]A line with equal signs at beginning.

You're using the flip-flop operator which will match lines starting from the first regex and ending at the second one (or the end of the data). From the regex you're using, I don't think this is your intention.
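To see what the flip-flop (range) operator does, here is a small sketch with made-up lines. Once the left pattern matches, every line is selected until the right pattern matches, inclusive. With the pattern from the question, the right-hand side never matches a line in this data, so the range stays open to the end.

```perl
use strict;
use warnings;

my @lines = ("before\n", "START\n", "middle\n", "END\n", "after\n");

# In scalar context, /START/ .. /END/ is false until /START/ matches, then
# stays true up to and including the line where /END/ matches.
my @selected = grep { /^START/ .. /^END/ } @lines;
print @selected;    # START, middle, END

# The question's patterns: no line starts with "ran", so once the
# [==========] line is seen, every following line is selected too.
my @log  = ("noise\n",
            "[==========] 10 tests from 4 test cases ran. (43950 ms total)\n",
            "more noise\n");
my @hits = grep { /^\Q[==========]\E/ .. /^\Qran\)\E/ } @log;
print scalar(@hits), "\n";   # 2
```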
To match a line starting with [==========] and extract everything up to the word ran you need to use a capture group:
if (/^\Q[==========]\E(.*?ran)/) {
print "$. Match Found: $1\n";
}
The parentheses capture everything up to and including ran and place it in the special variable $1. Note also the use of $., the current line number, which saves you keeping count with $i.
If you wanted to extract just the numbers you could use:
if (/^\Q[==========]\E (\d+) tests from (\d+) test cases ran/) {
print "$. Match Found: $1 $2\n";
}
