awk two files based on 1st & 2nd column - perl

I'm trying to merge a new file with an old one.
Each line has a unique key in the first column, then a separator '=' and a value.
If a key exists in both files, I must keep the old value and, if the new value is different, add a comment next to the line.
If a key exists only in the old file, keep it.
If a key exists only in the new file, insert it.
For example:
In the old file:
$ cat oldfile.txt
VAR NAME ONE = FOO
TWO BAR = VALUE
; this is a comment
In the new one:
$ cat newfile.txt
TWO BAR = VALUE
; this is a comment
VAR NAME ONE = BAR
NEW = DATA
Desired output :
$ cat output.txt
VAR NAME ONE = FOO
;new value:
; VAR NAME ONE = BAR
TWO BAR = VALUE
; this is a comment
NEW = DATA
I've tried diff, but it only works line by line. I'm pretty sure awk can do it, but I'm not an expert with awk. I could write something in ksh to do the job, but I'm pretty sure there is a quicker and simpler way.
Please note that the order of lines in the old and new files can change, and I'm on AIX (Unix), not Linux.
Thanks for your help :)
EDIT:
I did not specify this in the first post: new comments must be kept if they are not already present in the previous file.

Perl solution. First, it reads the new file into a hash. Then it goes over the old one and consults the hash for changes. You did not specify what to do with comments in the new file, so you will have to tweak the code at the corresponding comment.
#!/usr/bin/perl
use warnings;
use strict;
my ($oldfile, $newfile) = @ARGV;
my %new;
my $last;
open my $NEW, '<', $newfile or die $!;
while (<$NEW>) {
chomp;
if (/^;/) {
$new{$last}{comment} = $_; # What should we do with comments?
} elsif (! /=/) {
warn "Invalid new line $.\n";
} else {
my ($key, $value) = split /\s* = \s*/x, $_, 2;
$new{$key}{value} = $value;
$last = $key;
}
}
open my $OLD, '<', $oldfile or die $!;
while (<$OLD>) {
chomp;
if (/^;/) {
print "$_\n";
} elsif (my ($key, $value) = split /\s* = \s*/x, $_, 2) {
if (exists $new{$key}) {
print "$key = $value\n";
if ($new{$key}{value} ne $value) {
print ";new value:\n";
print "; $key = $new{$key}{value}\n";
}
} else {
print "$key = $value\n";
}
delete $new{$key};
} else {
warn "Invalid old line $.\n";
}
}
for my $key (keys %new) {
print "$key = $new{$key}{value}\n";
}
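The question's EDIT asks that new comments be kept when the old file does not already contain them. A minimal sketch of one way to do that, assuming a hypothetical helper `merge_lines` that works on arrays of lines rather than files, and assuming (the question does not say) that new-only keys and new-only comments are simply appended at the end in the order they appear in the new file:

```perl
use strict;
use warnings;

# Hypothetical helper: merge two lists of "KEY = VALUE" / ";comment" lines.
# Assumption: entries found only in the new file are appended at the end,
# in the order they appear there.
sub merge_lines {
    my ($old, $new) = @_;
    my (%new_value, @new_order, @new_comments);

    for my $line (@$new) {
        if ($line =~ /^;/) {
            push @new_comments, $line;
        }
        elsif ($line =~ /=/) {
            my ($key, $value) = split /\s*=\s*/, $line, 2;
            push @new_order, $key unless exists $new_value{$key};
            $new_value{$key} = $value;
        }
    }

    my (@out, %seen_key, %seen_comment);
    for my $line (@$old) {
        if ($line =~ /^;/) {
            $seen_comment{$line} = 1;
            push @out, $line;
        }
        elsif ($line =~ /=/) {
            my ($key, $value) = split /\s*=\s*/, $line, 2;
            $seen_key{$key} = 1;
            push @out, "$key = $value";    # the old value always wins
            if (exists $new_value{$key} && $new_value{$key} ne $value) {
                push @out, ';new value:', "; $key = $new_value{$key}";
            }
        }
    }

    # New-only keys, then comments that the old file did not contain.
    push @out, map { "$_ = $new_value{$_}" } grep { !$seen_key{$_} } @new_order;
    push @out, grep { !$seen_comment{$_}++ } @new_comments;
    return @out;
}

my @old = ('VAR NAME ONE = FOO', 'TWO BAR = VALUE', '; this is a comment');
my @new = ('TWO BAR = VALUE', '; this is a comment', 'VAR NAME ONE = BAR',
           'NEW = DATA', '; added in new');
print "$_\n" for merge_lines(\@old, \@new);
```

The sample `@new` adds one comment line (`; added in new`) that is not in the question's data, purely to exercise the EDIT rule. Keeping a separate `@new_order` array avoids the random ordering that `keys %new` gives.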

Using awk:
awk '
BEGIN {FS=OFS="="}
NR==FNR {
line[$1] = $2;
next
}
($1 in line) && ($2!=line[$1]) {
print $0;
print ";new value:";
print "; "$1, line[$1];
delete line[$1];
next
}
($1 in line) && ($2==line[$1]) {
print $0;
delete line[$1];
next
}1
END {
for (k in line)
print k, line[k]
}' newfile oldfile
Output:
VAR NAME ONE = FOO
;new value:
; VAR NAME ONE = BAR
TWO BAR = VALUE
; this is a comment
NEW = DATA

Related

How to extract some specific information from a file in perl?

## some lines
## cell (a) { area : 0.898; power: 0.867;
....(some parameters values)
}
pin(a1) { power: 0.767; (some more parameters specific to pins)
timing() {
## again some parameters value....
}
My file contains approximately 300 such cells, scattered among other content in the file. I want to parse the file and find all the variable parameters. I tried the following code, but it does not work:
while (defined($line=<$fh>)) {
if ($line =~ /cell \(\w+\) \{/../cell \(\w+\) \{/) {
print $result "$line \n";
}
}
I also want to get the values inside { }, but I don't know how, since I have braces nested inside braces in my data. Please help.
Thank you all for the help. I wrote some code that takes the scalar attributes into account (ignoring all the attributes inside nested blocks). But I am facing a very weird problem with if ($line =~ /cell \(\w/../cell \(\w/) in my code. For the first file, it detects the line that contains cell ( and starts from there, but for the second file it starts from the very first line.
open $result_file1, ">", "file1.txt";
open $result_file2, ">", "file2.txt";
open $fl1, $file1; open $fl2, $file2;
sub file_reader {
($fh, $indx) = @_;
$count = 0;
undef @temp; undef @pram;
while (defined($line=<$fh>)) {
if ($line =~ /cell \(\w/../cell \(\w/) {
if ($indx == "1") {print $result_file1 "$line\n";}
if ($indx == "2") {print $result_file2 "$line\n";}
if ($line =~ /cell \(\w/) {
@temp = split (' ', $line);}
if ($line =~ /\{/) {
$count += 1;}
if ($line =~ /\}/) {
$count = $count - 1; }
if (($line =~ /:/) and ($count == 1)) {
@pram = split (':', $line);
if ($indx == "1") {$file1{$temp[1]}{@pram[1]} = @pram[2];}
elsif ($indx == "2") { $file2{$temp[1]}{@pram[1]} = @pram[2];}
} }}
close $fh;}
file_reader($fl1, "1");
file_reader($fl2, "2");
A piece of output of file1 :
cell (AND2X1) {
cell_footprint : "AND2X1 ";
area : 7.3728 ;
cell_leakage_power : 3.837209e+04;
driver_waveform_rise : "preDrv";
driver_waveform_fall : "preDrv";
pg_pin (VDD) {
voltage_name : "VDD";
pg_type : "primary_power";
}
pg_pin (VSS) {
voltage_name : "VSS";
pg_type : "primary_ground";
}
.......
A piece of output of file2:
/**********************************************************************
**** ****
**** The data contained in the file is created for educational ****
**** and training purposes only and are not recommended ****
**** for fabrication ****
**** ****
***********************************************************************
**** ****
Why it is not able to apply that range if condition for my second file?
I have to guess at your input data and am therefore not very sure about the problem and the goal.
But anyway, try changing your line
if ($line =~ /cell \(\w/../cell \(\w/) {
to
if ($line =~ /cell \(\w/.. $line =~/cell \(\w/) {
Otherwise the second regex will be matched against the uninitialised "$_".
I found this out by
use strict;
use warnings;
which is one of my favorite tools.
By the way, you made me aware of this use of the range operator, which I find fascinating. Thanks.
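One more detail that may be relevant to the "second file starts from the first line" symptom: perlop documents that each `..` operator keeps its own boolean state, even across calls to the subroutine that contains it. A minimal sketch, assuming made-up input lines and a hypothetical helper `lines_in_range` whose right-hand pattern never matches (which is roughly what happens when the bare regex is tested against an unset `$_`):

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines that fall inside the range.
# The right-hand pattern never matches, so the range, once opened,
# is never closed again.
sub lines_in_range {
    my @lines = @_;
    my @kept;
    for my $line (@lines) {
        if ($line =~ /cell \(\w/ .. $line =~ /NEVER_MATCHES/) {
            push @kept, $line;
        }
    }
    return @kept;
}

my @first  = lines_in_range('header', 'cell (A) {', 'area : 1;');
my @second = lines_in_range('banner', 'cell (B) {', 'area : 2;');

print "first:  @first\n";     # begins at the "cell (" line
print "second: @second\n";    # begins at 'banner': the range is still open
```

If that is what is happening, the range needs an end condition that can actually match on `$line` before the first file is exhausted. Note also that with `..` the end condition is checked on the same line that opened the range, so using the identical pattern on both sides closes the range on that very line.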

Ignore empty lines in a CSV file using Perl

I have to read a CSV file, abc.csv, select a few fields from them and form a new CSV file, def.csv.
Below is my code. I am trying to ignore empty lines from abc.csv.
genNewCsv();
sub genNewCsv {
my $csv = Text::CSV->new();
my $aCsv = "abc.csv";
my $mCsv = "def.csv";
my $fh = FileHandle->new( $aCsv, "r" );
my $nf = FileHandle->new( $mCsv, "w" );
$csv->print( $nf, [ "id", "flops", "ub" ] );
while ( my $row = $csv->getline($fh) ) {
my $id = $row->[0];
my $flops = $row->[2];
next if ( $id =~ /^\s+$/ ); #Ignore empty lines
my $ub = "TRUE";
$csv->print( $nf, [ $id, $flops, $ub ] );
}
$nf->close();
$fh->close();
}
But I get the following error:
Use of uninitialized value $flops in pattern match (m//)
How do I ignore the empty lines in the CSV file?
I have used Stack Overflow question Remove empty lines and space with Perl, but it didn't help.
You can skip the entire row if any fields are empty:
unless(defined($id) && defined($flops) && defined($ub)){
next;
}
You tested if $id was empty, but not $flops, which is why you got the error.
You should also be able to do
unless($id && $flops && $ub){
next;
}
There, an empty string would evaluate to false.
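One caveat with the truthiness test: Perl also treats the string "0" as false, so a row whose field is legitimately zero would be skipped too. Whether that matters depends on the data; I am only assuming that a numeric column such as flops could contain 0. A small sketch with a hypothetical helper `check_field` that classifies a value both ways:

```perl
use strict;
use warnings;

# Hypothetical helper: report how one field value fares under
# defined() and under a plain truthiness test.
sub check_field {
    my ($value) = @_;
    return {
        defined => defined($value) ? 1 : 0,
        true    => $value ? 1 : 0,
    };
}

for my $value (undef, '', '0', '0.0', 'abc') {
    my $r = check_field($value);
    printf "%-7s defined=%d true=%d\n",
        defined $value ? "'$value'" : 'undef', $r->{defined}, $r->{true};
}
```

Only undef fails the defined() test, while undef, the empty string and "0" all fail the truthiness test. The string "0.0" is true.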
Edit: Now that I think about it, what do you mean by ignore lines that aren't there?
You can also do this
my $id = $row->[0] || 'No Val'; # where 'No Val' is anything you want to signify it was empty
This assigns a default value to the variable if the first value evaluates to false.
Run this on your file first to get non-empty lines
sub removeblanks {
my @lines = ();
for (@_) {
push @lines, $_ unless( $_ =~ /^\s*$/ ); #add any other conditions here
}
return \@lines;
}
You can do the following to skip empty lines:
while (my $row = $csv->getline($fh)){
next unless grep $_, @$row;
...
You could use List::MoreUtils to check if any of the fields are defined:
use List::MoreUtils qw(any);
while(my $row = ...) {
next unless any { defined } @{ $row };
...
}

Perl Help Needed: Replacing values

I am having an input file like this:
Input file
I need to replace the value #pSBSB_ID="*" in the record where #pRECTYPE="SBSB" with the #pMEME_SSN value ("034184233") from the record where #pRECTYPE="SMSR", and then delete the row where #pRECTYPE="SMSR".
Example:
So, after changes have been made, the file should be like this:
....#pRECTYPE="SBSB", #pGWID="17199269", #pINPUT_METHOD="E", #pGS08="005010X220A1", #pSBSB_FAM_UPDATE_CD="UP", #pSBSB_ID="034184233".....
....#pRECTYPE="SBEL", #pSBEL_EFF_DT="01/01/2013", #pSBEL_UPDATE_CD="TM", #pCSPD_CAT="M", #pCSPI_ID="MHMO1003"
.
.
.
Update
I tried below mentioned code:
Input file extension: mms and there are multiple files to process.
my $save_for_later;
my $record;
my @KwdFiles;
my $r;
my $FilePath = $ARGV[0];
chdir($FilePath);
@KwdFiles = <*>;
foreach $File(@KwdFiles)
{
unless(substr($File,length($File)-4,length($File)) eq '.mms')
{
next;
}
unless(open(INFILE, "$File"))
{
print "Unable to open file: $File";
exit(0);
}
print "Successfully opened the file: \"$File\" for processing\n\n";
while ( my $record = <INFILE> ) {
my %r = $record =~ /\#(\w+) = '(.*?)'/xg;
if ($r{rectype} eq "SMSR") {
$save_for_later = $r{pMEME_SSN};
next;
}
elsif ($r{rectype} eq "SBSB" and $r{pSBSB_ID} eq "*") {
$record =~ s|(\#pSBSB_ID = )'.*?'|$1'$save_for_later'|x;
}
close(INFILE);
}
}
But, I am still not getting the updated values in the file.
#!/usr/bin/perl
open IN, "< in.txt";
open OUT, "> out.txt";
my $CUR_RECID = 1^1;
while (<IN>) {
if ($CUR_RECID) {
s/recname='.+?'/recname='$CUR_RECID'/ if /rectype='DEF'/;
$CUR_RECID = 1^1;
print OUT;
}
$CUR_RECID = $1 if /rectype='ABC'.+?rec_id='(.+?)'/;
}
close OUT;
close IN;
Try the whole script. There is no need for a separate function; this code does everything.
Run this script from your terminal with the files to be modified as arguments:
use strict;
use warnings;
$^I = '.bak'; #modify original file and create a backup of the old ones with .bak appended to the name
my $replacement;
while (<>) {
$replacement = $1 if m/(?<=\#pMEME_SSN=)("\d+")/; #assume replacement will be on the first line of every file.
next if m/^\s*\#pRECTYPE="SMSR"/;
s/(?<=\#pSBSB_ID=)("\*")/$replacement/g;
print;
}

Relative Record Separator in Perl

I have a data that looks like this:
id:40108689 --
chr22_scrambled_bysegments:10762459:F : chr22:17852459:F (1.0),
id:40108116 --
chr22_scrambled_bysegments:25375481:F : chr22_scrambled_bysegments:25375481:F (1.0),
chr22_scrambled_bysegments:25375481:F : chr22:19380919:F (1.0),
id:1 --
chr22:21133765:F : chr22:21133765:F (0.0),
So each record is separated by id:[somenumber] --
What's the way to access the data so that we can have a hash of array:
$VAR = { 'id:40108689' => [' chr22_scrambled_bysegments:10762459:F : chr22:17852459:F (1.0),'],
'id:40108116' => ['chr22_scrambled_bysegments:25375481:F :chr22_scrambled_bysegments:25375481:F (1.0)',
'chr22_scrambled_bysegments:25375481:F : chr22:19380919:F (1.0),'
#...etc
}
I tried to approach this using the record separator, but I'm not sure how to generalize it:
{
local $/ = " --\n"; # How to include variable content id:[number] ?
while ($content = <INFILE>) {
chomp $content;
print "$content\n" if $content; # Skip empty records
}
}
my $result = {};
my $last_id;
while (my $line = <INFILE>) {
if ($line =~ /(id:\d+) --/) {
$last_id = $1;
next;
}
next unless $last_id; # Just in case the file doesn't start with an id line
push @{ $result->{$last_id} }, $line;
}
use Data::Dumper;
print Dumper $result;
Uses the normal record separator.
Uses $last_id to keep track of the last id row encountered and is set to the next id when another one is encountered. Pushes non-id rows on to an array for the hash key of the last matched id line.

Dealing with multiple capture groups in multiple records

Data Format:
attribname: data
Data Example:
cheese: good
pizza: good
bagel: good
fire: bad
Code:
my $subFilter='(.+?): (.+)';
my @attrib = ($dataSet=~/$subFilter/g);
for (@attrib)
{
print "$_\n";
}
The code spits out:
cheese
good
pizza
good
[etc...]
I was wondering what an easy Perly way to do this is? I am parsing the data from a log; the data above is trash for simplicity. I am newer to Perl. I suspect I could do this by finagling indexes, but I was wondering if there is a shorter method. Is there any way to have the capture groups put into two different variables instead of serially appended to the list along with all matches?
Edit: I want the attribute and its associated value together so I can then do what I need to with them. For example, within my for loop I would like to access both the attribute name and the attribute value.
Edit:
I tried
my %attribs;
while (my $line = <$data>)
{
my ($attrib, $value) = ($line=~m/$subFilter/);
print $attribs{$attrib}," : ", $value,"\n";
}
and no luck :( I don't get any output with this. My data is in a variable, not a file, because it is parsed out of a set of parent data which is in a file. It would be convenient if my worked such that my (@attrib, @value) = ($line=~/$subFilter/g); filled the lists appropriately with the multiple matches.
Solution:
my @line = ($7 =~/(.+?)\n/g);
for (@line)
{
my ($attrib, $value) = ($_=~m/$subFilter/);
if ($attrib ne "")
{
print $attrib," : ", $value,"\n";
}
}
I'm not really clear on what you actually want to store, but here's how you could store the data in a hash table, with '1' indicating good and '0' indicating 'bad':
use strict;
use warnings;
use Data::Dumper;
my %foods;
while (my $line = <DATA>)
{
chomp $line;
my ($food, $good) = ($line =~ m/^(.+?): (.+)$/);
$foods{$food} = ($good eq 'good' ? 1 : 0);
}
print Dumper(\%foods);
__DATA__
cheese: good
pizza: good
bagel: good
fire: bad
This prints:
$VAR1 = {
'bagel' => 1,
'cheese' => 1,
'fire' => 0,
'pizza' => 1
};
A sensible approach would be to make use of the split function:
my %attrib;
open my $data, '<', 'fileName' or die "Unable to open file: $!";
while ( my $line = <$data> ) {
chomp $line; # otherwise $value keeps its trailing newline
my ( $attrib, $value ) = split /:\s*/, $line, 2;
$attrib{$attrib} = $value;
}
close $data;
foreach my $attrib ( keys %attrib ) {
print "$attrib: $attrib{$attrib}\n";
}
If you're into one-liners, the following would achieve the same:
$ perl -F'/:\s*/' -lane '$attrib{$F[0]} = $F[1]; } END { print "$_\t$attrib{$_}" foreach keys %attrib;' fileName
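Since the question says the data is in a variable rather than a file, here is a minimal sketch that walks the string directly and yields one attribute/value pair per line. The helper name `parse_pairs` is made up for illustration; `$dataSet` and the pattern come from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "name: value" lines held in a string into pairs.
sub parse_pairs {
    my ($text) = @_;
    my @pairs;
    # /m lets ^ and $ match at each line; /g in scalar context resumes
    # after the previous match, so $1 and $2 belong to one line at a time.
    while ($text =~ /^(.+?): (.+)$/mg) {
        push @pairs, [$1, $2];
    }
    return @pairs;
}

my $dataSet = "cheese: good\npizza: good\nbagel: good\nfire: bad\n";
for my $pair (parse_pairs($dataSet)) {
    my ($attrib, $value) = @$pair;
    print "$attrib : $value\n";
}
```

Unlike a hash, the list of pairs keeps the original order and tolerates repeated attribute names, which may or may not matter for log data.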