Extracting specific data from text file in Perl - perl

I am new to Perl and am trying to extract specific data from a file, which looks like this:
Print of 9 heaviest strained elements:
Element no Max strain
20004 9.6 %
20013 0.5 %
11189 0.1 %
20207 0.1 %
11157 0.1 %
11183 0.0 %
10665 0.0 %
20182 0.0 %
11160 0.0 %
==================================================
I would like to extract the element numbers only (20004, 20013 etc.) and write these to a new file. The reading of the file should end as soon as the line (=========) is reached, as there are more element numbers with the same heading later on in the file.
Hope that makes sense.
Any advice much appreciated!
I now have this code, which gives me a list of the numbers, maximum 10 in a row:
my $StrainOut = "PFP_elem"."_$loadComb"."_"."$i";
open DATAOUT, ">$StrainOut" or die "can't open $StrainOut"; # Open the file for writing.
open my $in, '<', "$POSTout" or die "Unable to open file: $!\n";
my $count = 0;
while(my $line = <$in>) {
last if $line =~ / ={10}\s*/;
if ($line =~ /% *$/) {
my #columns = split " ", $line;
$count++;
if($count % 10 == 0) {
print DATAOUT "$columns[1]\n";
}
else {
print DATAOUT "$columns[1] ";
}
}
}
close (DATAOUT);
close $in;
What needs changing is the "my #columns = split..." line. At the moment it splits up the $line scalar whenever it has '9 spaces'. As the number of digits of the element numbers can vary, this is a poor way of extracting the data. Is it possible to just read from left to right, omitting all spaces and recording numbers only until the numbers are followed by more spaces (that way the percentage value is ignored)?

#!/usr/bin/perl
use strict;
use warnings;
while (<>) { # read the file line by line
if (/% *$/) { # if the line ends in a percent sign
my #columns = split; # create columns
print $columns[0], "\n"; # print the first one
}
last if /={10}/; # end of processing
}

A one-liner using flip-flop:
perl -ne '
if ( m/\A\s*(?i)element\s+no/ .. ($end = /\A\s*=+\s*\Z/) ) {
printf qq[$1\n] if m/\A\s*(\d+)/;
exit 0 if $end
}
' infile
Result:
20004
20013
11189
20207
11157
11183
10665
20182
11160

#!/usr/bin/perl
use strict;
use warnings;
while (my $f= shift) {
open(F, $f) or (warn("While opening $f: $!", next);
my foundstart=0;
while(<F>) {
($foundstart++, next) if /^\s#Element/;
last if /\s*=+/;
print $_ if $foundstart;
}
$foundstart=0;
close(F);
}

#!/usr/bin/perl
use strict;
use warnings;
open my $rh, '<', 'input.txt' or die "Unable to open file: $!\n";
open my $wh, '>', 'output.txt' or die "Unable to open file: $!\n";
while (my $line = <$rh>) {
last if $line =~ /^ ={50}/;
next unless $line =~ /^ {6}(\d+)/;
print $wh "$1\n";
}
close $wh;

You could do it by running this one-liner in a command shell.
On *nix:
cat in_file.txt | perl -ne 'print "$1\n" if ( m/\s*(\d+)\s*\d+\.\d+/ )' > out_file.txt
On Windows:
type in_file.txt | perl -ne "print qq{$1\n} if ( m/\s*(\d+)\s*\d+\.\d+/ )" > out_file.txt

Related

how to combine the code to make the output is on the same line?

Can you help me to combine both of these progeam to display the output in a row with two columns? The first column is for $1 and the second column is $2.
Kindly help me to solve this. Thank you :)
This is my code 1.
#!/usr/local/bin/perl
#!/usr/bin/perl
use strict ;
use warnings ;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
my $input = "par_disp_fabric.all_max_lowvcc_qor.rpt.gz";
my $output = "par_disp_fabric.all_max_lowvcc_qor.txt";
gunzip $input => $output
or die "gunzip failed: $GunzipError\n";
open (FILE, '<',"$output") or die "Cannot open $output\n";
while (<FILE>) {
my $line = $_;
chomp ($line);
if ($line=~ m/^\s+Timing Path Group \'(\S+)\'/) {
$line = $1;
print ("$1\n");
}
}
close (FILE);
This is my code 2.
my $input = "par_disp_fabric.all_max_lowvcc_qor.rpt.gz";
my $output = "par_disp_fabric.all_max_lowvcc_qor.txt";
gunzip $input => $output
or die "gunzip failed: $GunzipError\n";
open (FILE, '<',"$output") or die "Cannot open $output\n";
while (<FILE>) {
my $line = $_;
chomp ($line);
if ($line=~ m/^\s+Levels of Logic:\s+(\S+)/) {
$line = $1;
print ("$1\n");
}
}
close (FILE);
this is my output for code 1 which contain 26 line of data:
**async_default**
**clock_gating_default**
Ddia_link_clk
Ddib_link_clk
Ddic_link_clk
Ddid_link_clk
FEEDTHROUGH
INPUTS
Lclk
OUTPUTS
VISA_HIP_visa_tcss_2000
ckpll_npk_npkclk
clstr_fscan_scanclk_pulsegen
clstr_fscan_scanclk_pulsegen_notdiv
clstr_fscan_scanclk_wavegen
idvfreqA
idvfreqB
psf5_primclk
sb_nondet4tclk
sb_nondetl2tclk
sb_nondett2lclk
sbclk_nondet
sbclk_sa_det
stfclk_scan
tap4tclk
tapclk
The output code 1 also has same number of line.
paste is useful for this: assuming your shell is bash, then using process substitutions
paste <(perl script1.pl) <(perl script2.pl)
That emits columns separated by a tab character. For prettier output, you can pipe the output of paste to column
paste <(perl script1.pl) <(perl script2.pl) | column -t -s $'\t'
And with this, you con't need to try and "merge" your perl programs.
To combine the two scripts and to output two items of data on the same line, you need to hold on until the end of the file (or until you have both data items) and then output them at once. So you need to combine both loops into one:
#!/usr/bin/perl
use strict ;
use warnings ;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
my $input = "par_disp_fabric.all_max_lowvcc_qor.rpt.gz";
my $output = "par_disp_fabric.all_max_lowvcc_qor.txt";
gunzip $input => $output
or die "gunzip failed: $GunzipError\n";
open (FILE, '<',"$output") or die "Cannot open $output\n";
my( $levels, $timing );
while (<FILE>) {
my $line = $_;
chomp ($line);
if ($line=~ m/^\s+Levels of Logic:\s+(\S+)/) {
$levels = $1;
}
if ($line=~ m/^\s+Timing Path Group \'(\S+)\'/) {
$timing = $1;
}
}
print "$levels, $timing\n";
close (FILE);
You still haven't given us one vital piece of information - what does the input data looks like. Most importantly, are the two pieces of information you're looking for on the same line?
[Update: Looking more closely at your regexes, I see it's possible for both pieces of information to be on the same line - as they are both supposed to be the first item on the line. It would be helpful if you were clearer about that in your question.]
I think this will do the right thing, no matter what the answer to your question is. I've also added the improvements I suggested in my answer to your previous question:
#!/usr/bin/perl
use strict ;
use warnings ;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
my $zipped = "par_disp_fabric.all_max_lowvcc_qor.rpt.gz";
my $unzipped = "par_disp_fabric.all_max_lowvcc_qor.txt";
gunzip $zipped => $unzipped
or die "gunzip failed: $GunzipError\n";
open (my $fh, '<', $unzipped) or die "Cannot open '$unzipped': $!\n";
my ($levels, $timing);
while (<$fh>) {
chomp;
if (m/^\s+Levels of Logic:\s+(\S+)/) {
$levels = $1;
}
if (m/^\s+Timing Path Group \'(\S+)\'/) {
$timing = $1;
}
# If we have both values, then print them out and
# set the variables to 'undef' for the next iteration
if ($levels and $timing) {
print "$levels, $timing\n";
undef $levels;
undef $timing;
}
}
close ($fh);

Populate an array by splitting a string

I am trying to convert a string into an array based on space delimiter.
My input file looks like this:
>Reference
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnctcACCATGGTGTCGACTC
TTCTATGGAAACAGCGTGGATGGCGTCTCCAGGCGATCTGACGGTTCACTAAACGAGCTC
Ignoring the line starting with >, the length of rest of the string is 360.
I am trying to convert this into an array.
Here's my code so far:
#!/usr/bin/perl
use strict;
use warnings;
#### To to change bases with less than 10X coverage to N #####
#### Take depth file and consensus fasta file as input arguments ####
my ($in2) = #ARGV;
my $args = $#ARGV + 1;
if ( $args != 1 ) {
print "Error!!! Insufficient Number of Argumrnts\n";
print "Usage: $0 <consensus fasta file> \n";
}
#### Open a filehandle to read in consensus fasta file ####
my $FH2;
my $line;
my #consensus;
my $char;
open($FH2, '<', $in2) || die "Could not open file $in2\n";
while ( <$FH2> ) {
$line = $_;
chomp $line;
next if $line =~ />/; # skip header line
$line =~ s/\s+//g;
my $len = length($line);
print "$len\n";
#print "$line";
#consensus = split(// , $line);
print "$#consensus\n";
#print "#consensus\n";
#for $char (0 .. $#consensus){
# print "$char: $consensus[$char]\n";
# }
}
The problem is the $len variable returns a value of 60 instead of 360 and $#consensus returns a value of 59 instead of 360 which is the length of the string.
I have removed the whitespace after each line with code $line =~ s/\s+//g;but it still is not working.
It looks like your code is essentially working. It's just your checking logic that makes no sense. I'd do the following:
use strict;
use warnings;
if (#ARGV != 1) {
print STDERR "Usage: $0 <consensus fasta file>\n";
exit 1;
}
open my $fh, '<', $ARGV[0] or die "$0: cannot open $ARGV[0]: $!\n";
my #consensus;
while (my $line = readline $fh) {
next if $line =~ /^>/;
$line =~ s/\s+//g;
push #consensus, split //, $line;
}
print "N = ", scalar #consensus, "\n";
Main things to note:
Error messages should go to STDERR, not STDOUT.
If an error occurs, the program should exit with an error code, not keep running.
Error messages should include the name of the program and the reason for the error.
chomp is redundant if you're going to remove all whitespace anyway.
As you're processing the input line by line, you can just keep pushing elements to the end of #consensus. At the end of the loop it'll have accumulated all characters across all lines.
Examining #consensus within the loop makes little sense as it hasn't finished building yet. Only after the loop do we have all characters we're interested in.

Perl simple filehandling of text

What this program is meant to do is that it reads a text file which looks like:
Item \t\t Price
apple \t\t 20
orange \t\t 50
lime \t\t 30
I'm using split function to split these 2 columns and then i should apply a -25% discount on all items and print it out to a new file. My code so far does what i want but the new text file has a '0' value under my last number in price column. I also get 2 errors if i run it with "use warnings" which are:
Use of uninitialized value $item in multiplication * ...
Use of uninitialized value $item[0] in concatenation (.) ...
I should also tell total number of items calculated but i get like 5 1's instead of 5. (11111 instead of 5)
use strict;
use warnings;
my $filename = 'shop.txt';
if (-e $filename){
open (IN, $filename);
}
else{
die "Can't open input file for reading: $!";
}
open (OUT,">","discount.txt") or die "Can't open output file for writing: $!";
my $header = <IN>;
print OUT $header;
while (<IN>) {
chomp;
my #items = split(/\t\t/);
foreach my $item ($items[1]){
my $discount = $item * (0.75);
print OUT "$items[0]\t\t$discount\n";
}
}
This is too complicated and not clear what are you doing in foreach loop and you are not skipping empty lines. Keep it simple:
use warnings;
use strict;
use v5.10;
<>; # skip header
while(my $line = <>)
{
chomp $line;
next unless ($line);
my ($title, $price ) = split /\s+/, $line;
if( $title && defined $price )
{
$price *= 0.75;
say "$title\t\t$price";
}
}
and run like
perl script.pl <input.txt >output.txt
use strict;
use warnings;
my $filename = 'shop.txt';
if (-e $filename){
open (IN, $filename);
}
else{
die "Can't open input file for reading: $!";
}
open (OUT,">","discount.txt") or die "Can't open output file for writing: $!";
my $header = <IN>;
my $item;
my $price;
print OUT $header;
while (<IN>) {
chomp;
($item, $price) = split(/\t\t/);
my $discount = $price*0.75;
print OUT "$item $discount\n";
}
This should help! :)
If the total item count isn't very important to you:
$ perl -wane '$F[1] *= 0.75 if $. > 1; print join("\t", #F), "\n";' input.txt
Output:
Item Price
apple 15
orange 37.5
lime 22.5
If you really need the total item count:
$ perl -we 'while (<>) { #F = split; if ($. > 1) { $F[1] *= 0.75; $i++ } print join("\t", #F), "\n"; } print "$i items\n";' input.txt
Output:
Item Price
apple 15
orange 37.5
lime 22.5
3 items
I'd use this approach
#!/usr/bin/perl
use strict;
use warnings;
my %items;
my $filename = 'shop.txt';
my $discount = 'discount.txt';
open my $in, '<', $filename or die "Failed to open file! : $!\n";
open my $out, ">", $discount or die "Can't open output file for writing: $!";
print $out "Item\t\tPrice\n";
my $cnt = 0;
while (my $line = <$in>) {
chomp $line;
if (my ($item,$price) = $line =~ /(\w.+)\s+([0-9.]+)/){
$price = $price * (0.75);
print $out "$item\t\t$price\n";
$items{$item} = $price;
$cnt++;
}
}
close($in);
close($out);
my $total = keys %items;
print "Total items - $total \n";
print "Total items - $cnt\n";
Using regex capture groups to capture the item and price (using \w.+ in case the item is 2 words like apple sauce), this will also prevent empty lines from printing to file.
I also hard coded the Item and Price header, probably a good idea if you are going to be using a consistent header.
Hope it helps
---Update ----
I added 2 examples of a total count in my script. The first one is using a hash and printing out the hash size, the second method is using a counter. The hash option is good except if your list has 2 items that are the same in which case the key of the hash will be overridden with the last item found which shares the same name. The counter is a simple solution.

perl increasing the counter number every time the script running

I have a script to compare 2 files and print out the matching lines on the file. what I want to add a logic to help me to identify for how long these devices are matched. currently I have add the starting point 1 so I want to increase that number every time the script run and matched.
Example.
inputfile:-########################
retiredDevice.txt
Alpha
Beta
Gamma
Delta
prodDevice.txt
first
second
third
forth
Gamma
Delta
output file :-#######################
final_result.txt
1 Delta
1 Gamma
my objective is to add a counter stamp on each matching line to identify for how long "Delta" and "Gamma" matched. the script running every week. so every time the script running adding 1 so when I audit the 'finalResult.txt. the result should looks like
Delta 4
Gamma 3
the result indicate me Delta matched for last 4 weeks and Gamma for last 3 weeks.
#! /usr/local/bin/perl
my $ndays = 1;
my $f1 = "/opt/retiredDevice.txt ";
my $f2 = "prodDevice.txt";
my $outfile = "/opt/final_result.txt";
my %results = ();
open FILE1, "$f1" or die "Could not open file: $! \n";
while(my $line = <FILE1>){ $results{$line}=1;
}
close(FILE1);
open FILE2, "$f2" or die "Could not open file: $! \n";
while(my $line =<FILE2>) {
$results{$line}++;
}
close(FILE2);
open (OUTFILE, ">$outfile") or die "Cannot open $outfile for writing \n";
foreach my $line (keys %results) {
my $x = $ndays;
$x++;
print OUTFILE "$x : ", $line if $results{$line} != 1;
}
close OUTFILE;
Thanks in advance for any help!
Based on your earlier question and comments, perhaps this might work.
use strict;
use warnings;
use autodie;
my $logfile = 'int.txt';
my $f1 = shift || "/opt/test.txt";
my $f2 = shift || "/opt/test1.txt";
my %results;
open my $file1, '<', $f1;
while (my $line = <$file1>) {
chomp $line;
$results{$line} = 1;
}
open my $file2, '<', $f2;
while (my $line = <$file2>) {
chomp $line;
$results{$line}++;
}
{ ############ added part
my %c;
for (keys %results) {
$c{$_} = $results{$_} if $results{$_} > 1;
}
%results = %c;
} ############ end added part
my (%log, $log);
if ( -e $logfile ) {
open $log, '<', $logfile;
while (<$log>) {
my ($num, $key) = split;
$log{$key} = $num;
}
}
open $log, '>', $logfile or die $!;
for my $key (keys %results) {
my $old = ( $log{$key} || 0 ); # keep old count, or 0 otherwise
my $new = ( $results{$key} ? 1 : 0 ); # 1 if it exists, 0 otherwise
print $log $old + $new, " $key\n";
}
Perform this computation in two steps.
Each time you run the comparison between retired and prod, produce an output file that you save with a unique file name, e.g. result-XXX where XXX denotes when you ran the comparison.
Then write a script which iterates over all of the result-XXX files and produces a summary.
I would name the files result-YYYY-MM-DD where YYYY-MM-DD is the date that the comparison was created. Then it will be relatively easy to iterate over a subset of the files (e.g. ones for a certain month).
Or store the data in a relational database.

File manipulation in Perl

I have a simple .csv file that has that I want to extract data out of a write to a new file.
I to write a script that reads in a file, reads each line, then splits and structures the columns in a different order, and if the line in the .csv contains 'xxx' - dont output the line to output file.
I have already managed to read in a file, and create a secondary file, however am new to Perl and still trying to work out the commands, the following is a test script I wrote to get to grips with Perl and was wondering if I could aulter this to to what I need?-
open (FILE, "c1.csv") || die "couldn't open the file!";
open (F1, ">c2.csv") || die "couldn't open the file!";
#print "start\n";
sub trim($);
sub trim($)
{
my $string = shift;
$string =~ s/^\s+//;
$string =~ s/\s+$//;
return $string;
}
$a = 0;
$b = 0;
while ($line=<FILE>)
{
chop($line);
if ($line =~ /xxx/)
{
$addr = $line;
$post = substr($line, length($line)-18,8);
}
$a = $a + 1;
}
print $b;
print " end\n";
Any help is much appreciated.
To manipulate CSV files it is better to use one of the available modules at CPAN. I like Text::CSV:
use Text::CSV;
my $csv = Text::CSV->new ({ binary => 1, empty_is_undef => 1 }) or die "Cannot use CSV: ".Text::CSV->error_diag ();
open my $fh, "<", 'c1.csv' or die "ERROR: $!";
$csv->column_names('field1', 'field2');
while ( my $l = $csv->getline_hr($fh)) {
next if ($l->{'field1'} =~ /xxx/);
printf "Field1: %s Field2: %s\n", $l->{'field1'}, $l->{'field2'}
}
close $fh;
If you need do this only once, so don't need the program later you can do it with oneliner:
perl -F, -lane 'next if /xxx/; #n=map { s/(^\s*|\s*$)//g;$_ } #F; print join(",", (map{$n[$_]} qw(2 0 1)));'
Breakdown:
perl -F, -lane
^^^ ^ <- split lines at ',' and store fields into array #F
next if /xxx/; #skip lines what contain xxx
#n=map { s/(^\s*|\s*$)//g;$_ } #F;
#trim spaces from the beginning and end of each field
#and store the result into new array #n
print join(",", (map{$n[$_]} qw(2 0 1)));
#recombine array #n into new order - here 2 0 1
#join them with comma
#print
Of course, for the repeated use, or in a bigger project you should use some CPAN module. And the above oneliner has much cavetas too.