How to extract some specific information from a file in perl? - perl

## some lines
## cell (a) { area : 0.898; power: 0.867;
....(some parameters values)
}
pin(a1) { power: 0.767; (some more parameters specific to pins)
timing() {
## again some parameters value....
}
My file contains approximately 300 such cells, and they are scattered throughout the file. I want to parse the file and find all of the parameters for each cell. I tried the following code, but it does not do what I need:
while (defined($line=<$fh>)) {
if ($line =~ /cell \(\w+\) \{/../cell \(\w+\) \{/) {
print $result "$line \n";
}
}
I also want to get the values inside { }, but I don't know how, because the braces are nested inside other braces. Please help.
Thank you all for the help. I wrote code that handles the scalar attributes (ignoring all the attributes inside nested groups), but I am facing a very weird problem with if ($line =~ /cell \(\w/../cell \(\w/) in my code. For the first file, it detects the line that contains the cell ( field and starts from there, but for the second file it starts from the very first line.
open $result_file1, ">", "file1.txt";
open $result_file2, ">", "file2.txt";
open $fl1, $file1; open $fl2, $file2;
sub file_reader {
($fh, $indx) = @_;
$count = 0;
undef @temp; undef @pram;
while (defined($line=<$fh>)) {
if ($line =~ /cell \(\w/../cell \(\w/) {
if ($indx == "1") {print $result_file1 "$line\n";}
if ($indx == "2") {print $result_file2 "$line\n";}
if ($line =~ /cell \(\w/) {
@temp = split (' ', $line);}
if ($line =~ /\{/) {
$count += 1;}
if ($line =~ /\}/) {
$count = $count - 1; }
if (($line =~ /:/) and ($count == 1)) {
@pram = split (':', $line);
if ($indx == "1") {$file1{$temp[1]}{@pram[1]} = @pram[2];}
elsif ($indx == "2") { $file2{$temp[1]}{@pram[1]} = @pram[2];}
} }}
close $fh;}
file_reader($fl1, "1");
file_reader($fl2, "2");
A piece of output of file1 :
cell (AND2X1) {
cell_footprint : "AND2X1 ";
area : 7.3728 ;
cell_leakage_power : 3.837209e+04;
driver_waveform_rise : "preDrv";
driver_waveform_fall : "preDrv";
pg_pin (VDD) {
voltage_name : "VDD";
pg_type : "primary_power";
}
pg_pin (VSS) {
voltage_name : "VSS";
pg_type : "primary_ground";
}
.......
A piece of output of file2:
/**********************************************************************
**** ****
**** The data contained in the file is created for educational ****
**** and training purposes only and are not recommended ****
**** for fabrication ****
**** ****
***********************************************************************
**** ****
Why is the range condition in the if not applied to my second file?

I have to guess at your input data, and therefore I am not very sure about the problem and the goal.
But anyway, try changing your line
if ($line =~ /cell \(\w/../cell \(\w/) {
to
if ($line =~ /cell \(\w/.. $line =~/cell \(\w/) {
Otherwise the second regex will be matched against the uninitialised "$_".
I found this out by
use strict;
use warnings;
which is one of my favorite tools.
By the way, you made me aware of this use of the range operator, which I find fascinating. Thanks.
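For the underlying goal of collecting the cell-level attributes, a plain flag plus a brace-depth counter may be easier to reason about than the range operator, whose state (according to perlop) persists across calls to the subroutine that contains it. The following is only a minimal sketch, not the asker's code: the sample text, the read_cells helper name, and the assumption that every opening brace sits on the same line as its group header are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample shaped like the library excerpt in the question.
my $sample = <<'END';
/* header comment */
library (demo) {
cell (AND2X1) {
area : 7.3728 ;
cell_leakage_power : 3.837209e+04;
pg_pin (VDD) {
voltage_name : "VDD";
}
}
cell (OR2X1) {
area : 9.2160 ;
}
}
END

# read_cells is a hypothetical helper. It returns { cell => { attribute => value } }
# for attributes directly inside a cell and ignores nested groups such as pg_pin.
sub read_cells {
    my ($fh) = @_;
    my (%cells, $name);
    my $depth = 0;                      # brace depth relative to the current cell
    while (my $line = <$fh>) {
        if (!defined $name) {
            next unless $line =~ /\bcell\s*\(\s*(\w+)\s*\)/;
            $name  = $1;                # entering a new cell
            $depth = 0;
        }
        elsif ($depth == 1 && $line =~ /^\s*(\w+)\s*:\s*(.*?)\s*;/) {
            $cells{$name}{$1} = $2;     # scalar attribute at cell level
        }
        my $opens  = () = $line =~ /\{/g;
        my $closes = () = $line =~ /\}/g;
        $depth += $opens - $closes;
        undef $name if $depth <= 0;     # the cell's closing brace was seen
    }
    return \%cells;
}

open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my $cells = read_cells($fh);
close $fh;

for my $cell (sort keys %$cells) {
    for my $attr (sort keys %{ $cells->{$cell} }) {
        print "$cell $attr = $cells->{$cell}{$attr}\n";
    }
}
```

Because the state lives in lexical variables inside the sub, each file starts from a clean slate, which a range operator inside a repeatedly called sub does not guarantee.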

Related

compare the value of same items with one exception case

A,food,75
B,car,136
A,car,69
B,shop,80
A,house,179
B,food,75
C,car,136
ECX5,flight,50
QC4,train,95
C,food,85
B,house,150
D,shop,80
EAX5,flight,50
QA4,train,75
F,movie,
It should compare the values of entries of the same type (wherever the 2nd column matches) and print the differences. I want the output to look like this:
A,food,75 is not matching with B,food,75 C,food,85
A,car,69 is not matching with C,car,136 B,Car,136
A,house,179 is not matching with B,house,150
QC4,train,95 is not matching with QA4,train,75
F,movie missing value
The code I've written is below, but it does not print in the format I want.
while (<FILE>) {
my $line = $_ ;
my @lines = split /,/, $line ;
$val{$lines[1]}{$lines[0]} = $lines[2] ;
}
foreach my $item (keys %val) {
foreach my $letter1 (keys %{$val{$item}}) {
foreach my $letter2 (keys %{$val{$item}}) {
if ( ($val{$item}{$letter1} != $val{$item}{$letter2}) && ($letter1 ne $letter2) &&
( (!defined $done{$item}{$letter1}{$letter2}) || (!defined $done{$item}{$letter2}{$letter1}) ) ) {
print "$item : $letter1, $val{$item}{$letter1}, $letter2 , $val{$item}{$letter2}\n" ;
}
}
}
}
It is really hard to follow the logic of your code,
but I seem to get the desired result with this:
[Edit] The code was edited as per the comments
use strict;
use warnings;
my %hash;
while(my $line=<DATA>) {
chomp $line;
my ($letter, $object, $number)=split /,/, $line, 3; # limit 3 keeps a trailing empty field, so $number is '' rather than undef
### here we are dealing with missing values
if ($number =~ m{^\s*$}) {
print $line, "missing value\n";
next;
}
### here we separate the exceptional items from the others
if ($letter=~m{^E[AC]X\d$}) {
$object = "exceptional_$object";
}
$number+=0; # in case there is whitespace at the end
push @{$hash{$object}{$number}}, [$letter,$number,$line];
}
for my $object(sort keys %hash) {
my $oref = $hash{$object};
if (1==keys %$oref) {
next;
}
my $str;
for my $item (values %$oref) {
$str .= $str ? " $item->[0][2]" : "$item->[0][2] is not matching with";
}
print ($str,"\n");
}
__DATA__
A,food,75
B,car,136
A,car,69
B,shop,80
A,house,179
B,food,75
C,car,136
ECX5,flight,50
ECX4,train,95
C,food,85
B,house,150
D,shop,80
EAX5,flight,50
EAX4,train,75
F,movie,
output
F,movie,missing value
A,car,69 is not matching with B,car,136
EAX4,train,75 is not matching with ECX4,train,95
C,food,85 is not matching with A,food,75
A,house,179 is not matching with B,house,150
What the algorithm does:
While looping through the input, we remember all the lines for each unique pair of object and number.
After the input loop, we do the following:
We skip an object if all of its numbers are the same:
if (1==keys %$oref) {
next
}
Otherwise, we build an output string from the first remembered line for each number of that object (that is, we omit duplicates of the same object and number);
the first item in the list is followed by "is not matching with".
Also, I am reading from the special filehandle DATA, which accesses data embedded in the script. This is for convenience in demoing the code.
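Since values on a hash returns entries in no particular order, which line ends up first in each message can vary between runs. If input order matters, one option is to keep arrays alongside the hash. This is only a sketch under that assumption; the sample rows and the variable names are made up for illustration, and the exceptional-item handling is left out.

```perl
use strict;
use warnings;

# Hypothetical input rows in the same "letter,object,number" format.
my @rows = ('A,food,75', 'B,food,75', 'C,food,85',
            'B,shop,80', 'D,shop,80', 'F,movie,');

my (%seen, %lines_for, @objects, @report);
for my $row (@rows) {
    my ($letter, $object, $number) = split /,/, $row, 3;
    if (!defined $number || $number !~ /\S/) {
        push @report, "$letter,$object missing value";
        next;
    }
    $number += 0;                                   # numify the value
    # Record each object once, in the order it first appears.
    push @objects, $object unless exists $lines_for{$object};
    # Remember only the first line seen for each (object, number) pair.
    push @{ $lines_for{$object} }, $row unless $seen{$object}{$number}++;
}

for my $object (@objects) {
    my ($first, @others) = @{ $lines_for{$object} };
    next unless @others;                            # all numbers were equal
    push @report, "$first is not matching with @others";
}
print "$_\n" for @report;
```

Here the first line read for an object always leads its message, and objects are reported in the order they first appear in the input.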

Extra HASH() reference added to Perl hash output

I'm trying to read a FORTRAN program using Perl, and remove an INCLUDE command and replace it with a USE.
This is working great, except when printing out the contents of the hash storing the existing USE statements, I get an extra entry.
my @lines = ( );
my %uses = { };
foreach $f1line (<>) {
$f1line =~ s/\r[\n]*//g;
if ($f1line =~ /^\s*INCLUDE 'FILE1.CMN'/ ||
$f1line =~ /^\s*INCLUDE 'FILE2.CMN'/ ||
$f1line =~ /^\s*INCLUDE 'FILE3.CMN'/) {
$f1line = " USE My_Mod";
}
if ($f1line =~ /^\s*USE /) {
$uses{$f1line} = 1;
}
push @lines, $f1line . $/;
}
$found = 0;
foreach $line (@lines) {
if ($found == 0 && $line =~ /^\s*USE /) {
foreach my $x (sort(keys(%uses))) {
print $x . $/; # (1)
}
$found = 1;
} elsif ($found == 1 && $line =~ /^\s*USE /) {
next;
} else {
print $line;
}
}
The output is this:
C Include parameters here
USE My_Mod
USE MyOther_Mod
USE EvenAnother_Mod
HASH(0x7f9dc3805ce8)
Where is the HASH(0x...) reference coming from? The only place I'm printing the contents of the hash is on line (1). It almost looks like iterating over the keys of the hash is somehow including the hash itself.
How can I get rid of this?
You don't really have a big problem; the issue is that you can't see the mistakes you are making.
That's why you should always use strict and warnings.
In your code you start with:
my %uses = { };
When it should be:
my %uses = ();
or
my %uses; #it's fine also
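Then it will work.
Curly brackets create a reference to an anonymous hash, so my %uses = { }; assigns a list containing a single hash reference to %uses. That reference is stringified and becomes a key (with an undefined value), which is where the HASH(0x...) entry comes from.
A reference to an anonymous hash can be created using curly brackets: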
$hashref = {
'Adam' => 'Eve',
'Clyde' => 'Bonnie',
};
It is also good practice to declare your loop variables with my in foreach loops, like this:
foreach my $line (@lines) {
and likewise in the rest of your code.
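A short sketch of why my %uses = { }; leaves a stray key: the curly brackets build a hash reference, and assigning that single reference to a hash turns its string form into a key. The variable names below are made up for the demonstration.

```perl
use strict;
use warnings;

my %wrong;
{
    # Silence the "Reference found where even-sized list expected" warning
    # (listed under category misc in perldiag) so the demo output stays clean.
    no warnings 'misc';
    %wrong = { };
}
my ($stray_key) = keys %wrong;
print "wrong: $stray_key\n";            # prints something like HASH(0x...)

my %right = ();
printf "right has %d keys\n", scalar keys %right;
```

With warnings enabled and not silenced, Perl reports the faulty assignment itself, which is how this kind of mistake is usually caught.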

Parsing through a file using perl and string concatenation

I'm trying to parse a file, collect the types of mismatches in the different modules, and generate an xls. Below is the report pattern I need to parse (the actual report pattern is not as simple as this):
outsocket ports in design1 not in design2
a
b
Insocket ports in design1 not in design2
g
h
There can be many design names (pushed to the module list using $module) in the log, but this pattern will not change (Insocket/outsocket/othertype ports in <> not in <>).
Below is my core code. It is not working as expected (especially the string concatenation). Please help me fix it.
while ($line = <FH>) {
if ( $line =~ /Insocket(\s*)ports(\s*)in(\s*)${design1}(\s*)not(\s*)in(\s*)${design2}/) {
$mismatch_type = "type_i_n";
}
elsif ($line =~ /Insocket(\s*)ports(\s*)in(\s*)${design2}(\s*)not(\s*)in(\s*)${design1}/) {
$mismatch_type = "type_i_r";
}
elsif ( $line =~ /outsocket(\s*)ports(\s*)in(\s*)${design2}(\s*)not(\s*)in(\s*)${design1}/ ) {
$mismatch_type = "type_o_n";
}
elsif ( $line =~ /outsocket(\s*)ports(\s*)in(\s*)${design1}(\s*)not(\s*)in(\s*)${design2}/ ) {
$mismatch_type = "type_o_r";
}
$result = $mismatch_type . "_code_ " . $module;
$$result = $$result .$line;
if(!present_in_list ($module,@module_list)) {
push @module_list,$module;
}
push @mismatch_type_list,$mismatch_type;
}#PARSING OF FILE ENDS
#NOW PROCESSING THE RESULTS BASED ON PARSING
foreach $module (@module_list) {
foreach $each_mismatch (@mismatch_type_list) {
$result = $mismatch_type . "_code_ " . $module;
print FH2" $$result,";
}
}
Here are a couple of guesses and general suggestions. I'm not sure this will fix your problem, but it will probably help:
The first two regexes have an extra } in ${design1}} and ${design2}}. Not sure if this is present in your original program or if it's an error that was introduced in posting here.
It looks like you're using string concatenation to build up the name of a $result variable, so that you can then use that string as the name of another variable such as $type_i_n_code_module_a. To reduce confusion, I strongly recommend that you use a hash instead:
%result = ();
while ($line = <FH>) {
...
$result_key = $mismatch_type . "_code_ " . $module;
$result{$result_key} = $result{$result_key} . $line;
}
...
foreach $module (@module_list) {
foreach $each_mismatch (@mismatch_type_list) {
$result_key = $each_mismatch . "_code_ " . $module;
print FH2 " $result{$result_key},";
}
}
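A nested hash avoids building key strings altogether. The following is a minimal, self-contained sketch of that idea, not the asker's program: the log lines, the module name, and the mismatch labels are hypothetical stand-ins, and only the "in design1 not in design2" direction is handled.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for values the original script gets elsewhere.
my ($design1, $design2) = ('design1', 'design2');
my $module = 'module_a';
my @log = (
    "outsocket ports in design1 not in design2\n", "a\n", "b\n",
    "Insocket ports in design1 not in design2\n",  "g\n", "h\n",
);

my (%result, $mismatch_type);
for my $line (@log) {
    if ($line =~ /^(Insocket|outsocket)\s+ports\s+in\s+\Q$design1\E\s+not\s+in\s+\Q$design2\E/) {
        $mismatch_type = lc($1) . "_only_in_$design1";   # hypothetical label
        next;
    }
    next unless defined $mismatch_type;    # skip text before the first header
    chomp(my $port = $line);
    # Ports are collected per module and per mismatch type.
    push @{ $result{$module}{$mismatch_type} }, $port;
}

for my $mod (sort keys %result) {
    for my $type (sort keys %{ $result{$mod} }) {
        print "$mod,$type,", join(' ', @{ $result{$mod}{$type} }), "\n";
    }
}
```

Keeping the ports in arrays also postpones the choice of separator until output time, which may help when writing the spreadsheet rows.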

How can I replace a specific word when it occurs only once in a given subset of data?

Consider the dataset below. Each chunk beginning with a number is a 'case'. In the real dataset I have hundreds of thousands of cases. I'd like to replace the word "Exclusion" with "0" when the word Exclusion appears only once in a case (e.g. case 10001).
If I loop through lines, I can count how many "Exclusions" I have in each case. But, if there's only one line with the word "Exclusion", I don't know how to get back to that line and replace the word.
How can I do that?
10001
M1|F1|SP1;12;12;12;11;13;10;Exclusion;D16S539
M1|F1|SP1;12;10;12;9;11;9;3.60;D16S
M1|F1|SP1;12;10;10;7;11;7;20.00;D7S
M1|F1|SP1;13;12;12;12;12;12;3.91;D13S
M1|F1|SP1;11;11;13;11;13;11;3.27;D5S
M1|F1|SP1;14;12;14;10;12;10;1.99;CSF
10002
M1|F1|SP1;8;13;13;8;8;12;2.91;D16S
M1|F1|SP1;13;11;13;10;10;10;4.13;D7S
M1|F1|SP1;12;9;12;10;11;16;Exclusion;D13S
M1|F1|SP1;12;10;12;10;14;15;Exclusion;D5S
M1|F1|SP1;13;10;10;10;17;18;Exclusion;CSF
sub process_block {
my ($block) = @_;
$block =~ s/\bExclusion\b/0/
if $block !~ /\bExclusion\b.*\bExclusion\b/s;
print($block);
}
my $buf;
while (<>) {
if (/^\d/) {
process_block($buf) if $buf;
$buf = '';
}
$buf .= $_;
}
process_block($buf) if $buf;
As you read the file, buffer up all lines in a case, and count exclusions,
my ($case,$buf,$count) = (undef,"",0);
while(my $ln = <>) {
Use a regex to detect a case,
if( $ln =~ /^\d+$/ ) {
#new case, process/print old case
$buf =~ s/;Exclusion;/;0;/ if($count==1);
print $buf;
($case,$buf,$count) = ($ln,"",0);
}
Then use a regex to detect 'Exclusion',
elsif( $ln =~ /;Exclusion;/ ) { $count++; }
$buf .= $ln;
}
And when you are done, you may have a case left to process,
if( length($buf)>0 ) {
$buf =~ s/;Exclusion;/;0;/ if($count==1);
print $buf;
}
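Put together, the buffer-and-count approach might look like the sketch below. It reads from an in-memory copy of part of the sample data so that it runs on its own; the flush_case helper and the $output accumulator are names invented for this illustration.

```perl
use strict;
use warnings;

# A shortened copy of the sample data from the question.
my $input = <<'END';
10001
M1|F1|SP1;12;12;12;11;13;10;Exclusion;D16S539
M1|F1|SP1;12;10;12;9;11;9;3.60;D16S
10002
M1|F1|SP1;12;9;12;10;11;16;Exclusion;D13S
M1|F1|SP1;12;10;12;10;14;15;Exclusion;D5S
END

# Hypothetical helper: return the buffered case, with the single
# Exclusion replaced by 0 when exactly one was counted.
sub flush_case {
    my ($buf, $count) = @_;
    $buf =~ s/;Exclusion;/;0;/ if $count == 1;
    return $buf;
}

open my $in, '<', \$input or die "Cannot open in-memory input: $!";
my ($buf, $count, $output) = ('', 0, '');
while (my $ln = <$in>) {
    if ($ln =~ /^\d+$/) {
        # A new case starts: emit the previous one and reset the state.
        $output .= flush_case($buf, $count);
        ($buf, $count) = ('', 0);
    }
    elsif ($ln =~ /;Exclusion;/) {
        $count++;
    }
    $buf .= $ln;
}
$output .= flush_case($buf, $count);    # the last case has no following header
close $in;
print $output;
```

Only one case is held in memory at a time, so this should scale to a file with many cases as long as each individual case is small.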
This is the best I could think of. Assume you have read your file into @lines:
# separate into blocks
foreach my $line (@lines) {
chomp($line);
if ($line =~ m/^(\d+)/) {
$key = $1;
}
else {
push (@{$block{$key}}, $line);
}
}
# go through each block
# sort numerically so the cases come out in a stable order
foreach my $key (sort { $a <=> $b } keys %block) {
print "$key\n";
my @matched = grep ($_ =~ m/exclusion/i, @{$block{$key}});
if (@matched == 1) {
foreach my $line (@{$block{$key}}) {
$line =~ s/Exclusion/0/i;
print "$line\n";
}
}
else {
foreach my $line (@{$block{$key}}) {
print "$line\n";
}
}
}
There are already many correct answers here that use a buffer to store the content of a "case".
Here's another solution that uses tell and seek to rewind the file, so no buffer is needed. This could be useful when a "case" is very large and you are sensitive to performance or memory usage.
use strict;
use warnings;
open FILE, "text.txt";
open REPLACE, ">replace.txt";
my $count = 0; # count of 'Exclusion' in the current case
my $position = 0;
my $prev_position = 0;
my $first_occur_position = 0; # first occurrence of 'Exclusion' in the current case
my $visited = 0; # whether the current line is visited before
while (<FILE>) {
# keep track of the position before reading
# the current line
$prev_position = $position;
$position = tell FILE;
if ($visited == 0) {
if (/^\d+/) {
# new case
if ($count == 1) {
# rewind to the first occurrence
# of 'Exclusion' in the previous case
seek FILE, $first_occur_position, 0;
$visited = 1;
}
else {
print REPLACE $_;
}
}
elsif (/Exclusion/) {
$count++;
if ($count > 1) {
seek FILE, $first_occur_position, 0;
$visited = 1;
}
elsif ($count == 1) {
$first_occur_position = $prev_position;
}
}
else {
print REPLACE $_ if ($count == 0);
}
if (eof FILE && $count == 1) {
seek FILE, $first_occur_position, 0;
$visited = 1;
}
}
else {
if ($count == 1) {
s/Exclusion/0/;
}
if (/^\d+/) {
$position = tell FILE;
$visited = 0;
$count = 0;
}
print REPLACE $_;
}
}
close REPLACE;
close FILE;

Perl Help Needed: Replacing values

I have an input file like this:
Input file
I need to replace the value #pSBSB_ID="*" in the row where #pRECTYPE="SBSB" with the value of #pMEME_SSN (for example "034184233") from the row where #pRECTYPE="SMSR", and I have to delete the row where #pRECTYPE="SMSR".
Example:
So, after changes have been made, the file should be like this:
....#pRECTYPE="SBSB", #pGWID="17199269", #pINPUT_METHOD="E", #pGS08="005010X220A1", #pSBSB_FAM_UPDATE_CD="UP", #pSBSB_ID="034184233".....
....#pRECTYPE="SBEL", #pSBEL_EFF_DT="01/01/2013", #pSBEL_UPDATE_CD="TM", #pCSPD_CAT="M", #pCSPI_ID="MHMO1003"
.
.
.
Update
I tried the code below.
The input files have the extension .mms, and there are multiple files to process.
my $save_for_later;
my $record;
my #KwdFiles;
my $r;
my $FilePath = $ARGV[0];
chdir($FilePath);
@KwdFiles = <*>;
foreach $File(@KwdFiles)
{
unless(substr($File,length($File)-4,length($File)) eq '.mms')
{
next;
}
unless(open(INFILE, "$File"))
{
print "Unable to open file: $File";
exit(0);
}
print "Successfully opened the file: \"$File\" for processing\n\n";
while ( my $record = <INFILE> ) {
my %r = $record =~ /\#(\w+) = '(.*?)'/xg;
if ($r{rectype} eq "SMSR") {
$save_for_later = $r{pMEME_SSN};
next;
}
elsif ($r{rectype} eq "SBSB" and $r{pSBSB_ID} eq "*") {
$record =~ s|(\#pSBSB_ID = )'.*?'|$1'$save_for_later'|x;
}
close(INFILE);
}
}
But, I am still not getting the updated values in the file.
#!/usr/bin/perl
open IN, "< in.txt";
open OUT, "> out.txt";
my $CUR_RECID = 1^1;
while (<IN>) {
if ($CUR_RECID) {
s/recname='.+?'/recname='$CUR_RECID'/ if /rectype='DEF'/;
$CUR_RECID = 1^1;
print OUT;
}
$CUR_RECID = $1 if /rectype='ABC'.+?rec_id='(.+?)'/;
}
close OUT;
close IN;
Try the whole script above. There is no need for a separate function; this code does everything.
Run this script from your terminal with the files to be modified as arguments:
use strict;
use warnings;
$^I = '.bak'; #modify original file and create a backup of the old ones with .bak appended to the name
my $replacement;
while (<>) {
$replacement = $1 if m/(?<=\#pMEME_SSN=)("\d+")/; #assume replacement will be on the first line of every file.
next if m/^\s*\#pRECTYPE="SMSR"/;
s/(?<=\#pSBSB_ID=)("\*")/$replacement/g;
print;
}