How can I use awk or Perl to increment a number in a large XML file? - perl

I have an XML file with the following line:
<VALUE DECIMAL_VALUE="0.2725" UNIT_TYPE="percent"/>
I would like to increment this value by .04 and keep the format of the XML in place. I know this is possible with a Perl or awk script, but I am having difficulty with the expressions to isolate the number.

If you're on a box with the xsltproc command in place I would suggest you use XSLT for this.
For a Perl solution I'd go for using the DOM. Check this DOM Processing with Perl article out.
That said. If your XML file is produced in a predictable way something naïve like the following could work:
perl -pe 's#(<VALUE DECIMAL_VALUE=")([0-9.]+)(" UNIT_TYPE="percent"/>)#"$1" . ($2 + 0.4) . "$3"#e;'

If you are absolutely sure that the format of your XML will never change, that the order of the attributes is fixed, that you can indeed get the regexp for the number right... then go for the non-parser based solution.
Personally I would use XML::Twig (maybe because I wrote it ;--). It will process the XML as XML, while still respecting the original format of the file, and won't load it all in memory before starting to work.
Untested code below:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
XML::Twig->new( # call the sub for each VALUE element with a DECIMAL_VALUE attribute
twig_roots => { 'VALUE[#DECIMAL_VALUE]' => \&upd_decimal },
# print anything else as is
twig_print_outside_roots => 1,
)
->parsefile_inplace( 'foo.xml');
sub upd_decimal
{ my( $twig, $value)= #_; # twig is the XML::Twig object, $value the element
my $decimal_value= $value->att( 'DECIMAL_VALUE');
$decimal_value += 0.4;
$value->set_att( DECIMAL_VALUE => $decimal_value);
$value->print;
}

This takes input on stdin, outputs to stdout:
while(<>){
if( $_ =~ /^(.*DECIMAL_VALUE=\")(.*)(\".*)$/ ){
$newVal = $2 + 0.04;
print "$1$newVal$3\n";
}else{
print $_;
}
}

Something akin to the following will work. It may need tweaking if there is extra spacing, but that is left as an exercise for the reader.
function update_after(in_string, locate_string, delta) {
local_pos = index(in_string,locate_string);
leadin = substr(in_string,0,local_pos-1);
leadout = substr(in_string,local_pos+length(locate_string));
new_value = leadout+delta;
quote_pos = index(leadout,"\"");
leadout = substr(leadout, quote_pos + 1);
return leadin locate_string new_value"\"" leadout;
}
/^ *\<VALUE/{
print update_after($0, "DECIMAL_VALUE=\"",0.4);
}

here's gawk
awk '/DECIMAL_VALUE/{
for(i=1;i<=NF;i++){
if( $i~/DECIMAL_VALUE/){
gsub(/DECIMAL_VALUE=|\042/,"",$i)
$i="DECIMAL_VALUE=\042"$i+0.4"\042"
}
}
}1' file

Related

Search and replace in Perl for particular word

I have a huge file which consists of similar lines below , with different clocks:
cmd -quiet [get_ports p1] ref_clocks "cudtclk_sp cudtclk"
cmd -quiet [get_ports p2] clock "cu2xdtclk_sp cu2xdtclk"
And I need to replace cudtclk with some other name like cdtclk whenever I have ref_clocks in my file, globally.
I have written following code but it doesn't seem to be working.
#!/usr/bin/perl
use strict;
use warnings;
sub clock_change
{       # Get the subroutine's argument.
my $arg = shift;
# Hash of stuff we want to replace.
my %replace = (
"cudtclk" => "cdtclk",
);
# See if there's a replacement for the given text.
my $text = $replace{$arg};
if(defined($text)) {
return $text;
}
return $arg;
}
open PAR, "<file name>";
while(<PAR>) {
$_ =~ s/\S+\s\S+\s\S+\s\S+\sref_clocks\s+(\S+\s+\S+)/clock_change($1)/eig;
print $_;   ##print it to some file later.
}
"And I need to replace cudtclk with some other name like cdtclk"
perl -pe 's/\bcudtclk\b/cdtclk/' thefile > newfile
"whenever I have ref_clocks"
perl -pe 's/\bcudtclk\b/cdtclk/ if /\bref_clocks\b/' thefile > newfile
Alternatively:
# saves original file as file.bak
perl -i.bak -pe 's/\bcudtclk\b/cdtclk/ if /\bref_clocks\b/' file
Tighten to suit your data, as necessary.
Although the substitution seems like unnecessarily complex, you can fix it with something similar to:
$_ =~ s/(ref_clocks\s+")([^_]+)_sp(\s+)\2/
$1.clock_change($2)."_sp$3".clock_change($2)/eig;

how can i fetch the whole word on the basis of index no of that string in perl

I have one string of line like
comments:[I#1278327] is related to office communicator.i fixed the bug to declare it null at first time.
Here I am searching index of I#then I want the whole word means [I#1278327]. I'm doing it like this:
open(READ1,"<letter.txt");
while(<READ1>)
{
if(index($_,"I#")!=-1)
{
$indexof=index($_,"I#");
print $indexof,"\n";
$string=substr($_,$indexof);##i m cutting that string first from index of I# to end then...
$string=substr($string,0,index($string," "));
$lengthof=length($string);
print $lengthof,"\n";
print $string,"\n";
print $_,"\n";
}
}
Is any API is there in perl to find the word length directly after finding the index of I# in that line.
You could do something like:
$indexof=index($_,"I#");
$index2 = index($_,' ',$indexof);
$lengthof = $index2 - $indexof;
However, the bigger issue is you are using Perl as if it were BASIC. A more perlish approach to the task of printing selected lines:
use strict;
use warnings;
open my $read, '<', 'letter.txt'; # safer version of open
LINE:
while (<$read>) {
print "$1 - $_" if (/(I#.*?) /);
}
I would use a regex instead, a regex will allow you to match a pattern ("I#") and also capture other data from the string:
$_ =~ m/I#(\d+)/;
The line above will match and set $1 to the number.
See perldoc perlre

how to put a file into an array and save it in perl

Hello everyone I'm a beginner in perl and I'm facing some problems as I want to put my strings starting from AA to \ in to an array and want to save it. There are about 2000-3000 strings in a txt file starting from same initials i.e., AA to / I'm doing it by this way plz correct me if I'm wrong.
Input File
AA c0001
BB afsfjgfjgjgjflffbg
CC table
DD hhhfsegsksgk
EB jksgksjs
\
AA e0002
BB rejwkghewhgsejkhrj
CC chair
DD egrhjrhojohkhkhrkfs
VB rkgjehkrkhkh;r
\
Source code
$flag = 0
while ($line = <ifh>)
{
if ( $line = m//\/g)
{
$flag = 1;
}
while ( $flag != 0)
{
for ($i = 0; $i <= 10000; $i++)
{ # Missing brace added by editor
$array[$i] = $line;
} # Missing brace added by editor
}
} # Missing close brace added by editor; position guessed!
print $ofh, $line;
close $ofh;
Welcome to StackOverflow.
There are multiple issues with your code. First, please post compilable Perl; I had to add three braces to give it the remotest chance of compiling, and I had to guess where one of them went (and there's a moderate chance it should be on the other side of the print statement from where I put it).
Next, experts have:
use warnings;
use strict;
at the top of their scripts because they know they will miss things if they don't. As a learner, it is crucial for you to do the same; it will prevent you making errors.
With those in place, you have to declare your variables as you use them.
Next, remember to indent your code. Doing so makes it easier to comprehend. Perl can be incomprehensible enough at the best of times; don't make it any harder than it has to be. (You can decide where you like braces - that is open to discussion, though it is simpler to choose a style you like and stick with it, ignoring any discussion because the discussion will probably be fruitless.)
Is the EB vs VB in the data significant? It is hard to guess.
It is also not clear exactly what you are after. It might be that you're after an array of entries, one for each block in the file (where the blocks end at the line containing just a backslash), and where each entry in the array is a hash keyed by the first two letters (or first word) on the line, with the remainder of the line being the value. This is a modestly complex structure, and probably beyond what you're expected to use at this stage in your learning of Perl.
You have the line while ($line = <ifh>). This is not invalid in Perl if you opened the file the old fashioned way, but it is not the way you should be learning. You don't show how the output file handle is opened, but you do use the modern notation when trying to print to it. However, there's a bug there, too:
print $ofh, $line; # Print two values to standard output
print $ofh $line; # Print one value to $ofh
You need to look hard at your code, and think about the looping logic. I'm sure what you have is not what you need. However, I'm not sure what it is that you do need.
Simpler solution
From the comments:
I want to flag each record starting from AA to \ as record 0 till record n and want to save it in a new file with all the record numbers.
Then you probably just need:
#!/usr/bin/env perl
use strict;
use warnings;
my $recnum = 0;
while (<>)
{
chomp;
if (m/^\\$/)
{
print "$_\n";
$recnum++;
}
else
{
print "$recnum $_\n";
}
}
This reads from the files specified on the command line (or standard input if there are none), and writes the tagged output to standard output. It prefixes each line except the 'end of record' marker lines with the record number and a space. Choose your output format and file handling to suit your needs. You might argue that the chomp is counter-productive; you can certainly code the program without it.
Overly complex solution
Developed in the absence of clear direction from the questioner.
Here is one possible way to read the data, but it uses moderately advanced Perl (hash references, etc). The Data::Dumper module is also useful for printing out Perl data structures (see: perldoc Data::Dumper).
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my #data;
my $hashref = { };
my $nrecs = 0;
while (<>)
{
chomp;
if (m/^\\$/)
{
# End of group - save to data array and start new hash
$data[$nrecs++] = $hashref;
$hashref = { };
}
else
{
m/^([A-Z]+)\s+(.*)$/;
$hashref->{$1} = $2;
}
}
foreach my $i (0..$nrecs-1)
{
print "Record $i:\n";
foreach my $key (sort keys $data[$i])
{
print " $key = $data[$i]->{$key}\n";
}
}
print Data::Dumper->Dump([ \#data ], [ '#data' ]);
Sample output for example input:
Record 0:
AA = c0001
BB = afsfjgfjgjgjflffbg
CC = table
DD = hhhfsegsksgk
EB = jksgksjs
Record 1:
AA = e0002
BB = rejwkghewhgsejkhrj
CC = chair
DD = egrhjrhojohkhkhrkfs
VB = rkgjehkrkhkh;r
$#data = [
{
'EB' => 'jksgksjs',
'CC' => 'table',
'AA' => 'c0001',
'BB' => 'afsfjgfjgjgjflffbg',
'DD' => 'hhhfsegsksgk'
},
{
'CC' => 'chair',
'AA' => 'e0002',
'VB' => 'rkgjehkrkhkh;r',
'BB' => 'rejwkghewhgsejkhrj',
'DD' => 'egrhjrhojohkhkhrkfs'
}
];
Note that this data structure is not optimized for searching except by record number. If you need to search the data in some other way, then you need to organize it differently. (And don't hand this code in as your answer without understanding it all - it is subtle. It also does no error checking; beware faulty data.)
It can't be right. I can see two main issues with your while-loop.
Once you enter the following loop
while ( $flag != 0)
{
...
}
you'll never break out because you do not reset the flag whenever you find an break-line. You'll have to parse you input and exit the loop if necessary.
And second you never read any input within this loop and thus process the same $line over and over again.
You should not put the loop inside your code but instead you can use the following pattern (pseudo-code)
if flag != 0
append item to array
else
save array to file
start with new array
end
I believe what you want is to split the files content at \ though it's not too clear.
To achieve this you can slurp the file into a variable by setting the input record separator, then split the content.
To find out about Perl's special variables related to filehandlers read perlvar
#!perl
use strict;
use warnings;
my $content;
{
open my $fh, '<', 'test.txt';
local $/; # slurp mode
$content = <$fh>;
close $fh;
}
my #blocks = split /\\/, $content;
Make sure to localize modifications of Perl's special variables to not interfere with different parts of your program.
If you want to keep the separator you could set $/ to \ directly and skip split.
#!perl
use strict;
use warnings;
my #blocks;
{
open my $fh, '<', 'test.txt';
local $/ = '\\'; # seperate at \
#blocks = <$fh>;
close $fh;
}
Here's a way to read your data into an array. As I said in a comment, "saving" this data to a file is pointless, unless you change it. Because if I were to print the #data array below to a file, it would look exactly like the input file.
So, you need to tell us what it is you want to accomplish before we can give you an answer about how to do it.
This script follows these rules (exactly):
Find a line that begins with "AA",
and save that into $line
Concatenate every new line from the
file into $line
When you find a line that begins with
a backslash \, stop concatenating
lines and save $line into #data.
Then, find the next line that begins
with "AA" and start the loop over.
These matching regexes are pretty loose, as they will match AAARGH and \bonkers as well. If you need them stricter, you can try /^\\$/ and /^AA$/, but then you need to watch out for whitespace at the beginning and end of line. So perhaps /^\s*\\\s*$/ and /^\s*AA\s*$/ instead.
The code:
use warnings;
use strict;
my $line="";
my #data;
while (<DATA>) {
if (/^AA/) {
$line = $_;
while (<DATA>) {
$line .= $_;
last if /^\\/;
}
}
push #data, $line;
}
use Data::Dumper;
print Dumper \#data;
__DATA__
AA c0001
BB afsfjgfjgjgjflffbg
CC table
DD hhhfsegsksgk
EB jksgksjs
\
AA e0002
BB rejwkghewhgsejkhrj
CC chair
DD egrhjrhojohkhkhrkfs
VB rkgjehkrkhkh;r
\

perl + numeration word or parameter in file

I need help about how to numeration text in file.
I have also linux machine and I need to write the script with perl
I have file name: file_db.txt
In this file have parameters like name,ParameterFromBook,NumberPage,BOOK_From_library,price etc
Each parameter equal to something as name=elephant
My question How to do this by perl
I want to give number for each parameter (before the "=") that repeated (unique parameter) in the file , and increase by (+1) the new number of the next repeated parameter until EOF
lidia
For example
file_db.txt before numbering
parameter=1
name=one
parameter=2
name=two
file_db.txt after parameters numbering
parameter1=1
name1=one
parameter2=2
name2=two
other examples
Example1 before
name=elephant
ParameterFromBook=234
name=star.world
ParameterFromBook=200
name=home_room1
ParameterFromBook=264
Example1 after parameters numbering
name1=elephant
ParameterFromBook1=234
name2=star.world
ParameterFromBook2=200
name3=home_room1
ParameterFromBook3=264
Example2 before
file_db.txt before numbering
lines_and_words=1
list_of_books=3442
lines_and_words=13
list_of_books=344224
lines_and_words=120
list_of_books=341
Example2 after
file_db.txt after parameters numbering
lines_and_words1=1
list_of_books1=3442
lines_and_words2=13
list_of_books2=344224
lines_and_words3=120
list_of_books3=341
It can be condensed to a one line perl script pretty easily, though I don't particularly recommend it if you want readability:
#!/usr/bin/perl
s/(.*)=/$k{$1}++;"$1$k{$1}="/e and print while <>;
This version reads from a specified file, rather than using the command line:
#!/usr/bin/perl
open IN, "/tmp/file";
s/(.*)=/$k{$1}++;"$1$k{$1}="/e and print while <IN>;
The way I look at it, you probably want to number blocks and not just occurrences. So you probably want the number on each of the keys to be at least as great as the earliest repeating key.
my $in = \*::DATA;
my $out = \*::STDOUT;
my %occur;
my $num = 0;
while ( <$in> ) {
if ( my ( $pre, $key, $data ) = m/^(\s*)(\w+)=(.*)/ ) {
$num++ if $num < ++$occur{$key};
print { $out } "$pre$key$num=$data\n";
}
else {
$num++;
print;
}
}
__DATA__
name=elephant
ParameterFromBook=234
name=star.world
ParameterFromBook=200
name=home_room1
ParameterFromBook=264
However, if you just wanted to give the key it's particular count. This is enough:
my %occur;
while ( <$in> ) {
my ( $pre, $key, $data ) = m/^(\s*)(\w+)=(.*)/;
$occur{$key}++;
print { $out } "$pre$key$occur{$key}=$data\n";
}
in pretty much pseudo code:
open(DATA, "file");
my #lines = <DATA>;
my %tags;
foreach line (#lines)
{
my %parts=split(/=/, $line);
my $name=$parts[0];
my $value=$parts[1];
$name = ${name}$tags{ $name };
$tags{ $name } = $tags{ $name } + 1;
printf "${name}=$value\n";
}
close( DATA );
This looks like a CS101 assignment. Is it really good to ask for complete solutions instead of asking specific technical questions if you have difficulty?
If Perl is not a must, here's an awk version
$ cat file
name=elephant
ParameterFromBook=234
name=star.world
ParameterFromBook=200
name=home_room1
ParameterFromBook=264
$ awk -F"=" '{s[$1]++}{print $1s[$1],$2}' OFS="=" file
name1=elephant
ParameterFromBook1=234
name2=star.world
ParameterFromBook2=200
name3=home_room1
ParameterFromBook3=264

perl split on empty file

I have basically the following perl I'm working with:
open I,$coupon_file or die "Error: File $coupon_file will not Open: $! \n";
while (<I>) {
$lctr++;
chomp;
my #line = split/,/;
if (!#line) {
print E "Error: $coupon_file is empty!\n\n";
$processFile = 0; last;
}
}
I'm having trouble determining what the split/,/ function is returning if an empty file is given to it. The code block if (!#line) is never being executed. If I change that to be
if (#line)
than the code block is executed. I've read information on the perl split function over at
http://perldoc.perl.org/functions/split.html and the discussion here about testing for an empty array but not sure what is going on here.
I am new to Perl so am probably missing something straightforward here.
If the file is empty, the while loop body will not run at all.
Evaluating an array in scalar context returns the number of elements in the array.
split /,/ always returns a 1+ elements list if $_ is defined.
You might try some debugging:
...
chomp;
use Data::Dumper;
$Data::Dumper::Useqq = 1;
print Dumper( { "line is" => $_ } );
my #line = split/,/;
print Dumper( { "split into" => \#line } );
if (!#line) {
...
Below are a few tips to make your code more idiomatic:
The special variable $. already holds the current line number, so you can likely get rid of $lctr.
Are empty lines really errors, or can you ignore them?
Pull apart the list returned from split and give the pieces names.
Let Perl do the opening with the "diamond operator":
The null filehandle <> is special: it can be used to emulate the behavior of sed and awk. Input from <> comes either from standard input, or from each file listed on the command line. Here's how it works: the first time <> is evaluated, the #ARGV array is checked, and if it is empty, $ARGV[0] is set to "-", which when opened gives you standard input. The #ARGV array is then processed as a list of filenames. The loop
while (<>) {
... # code for each line
}
is equivalent to the following Perl-like pseudo code:
unshift(#ARGV, '-') unless #ARGV;
while ($ARGV = shift) {
open(ARGV, $ARGV);
while (<ARGV>) {
... # code for each line
}
}
except that it isn't so cumbersome to say, and will actually work.
Say your input is in a file named input and contains
Campbell's soup,0.50
Mac & Cheese,0.25
Then with
#! /usr/bin/perl
use warnings;
use strict;
die "Usage: $0 coupon-file\n" unless #ARGV == 1;
while (<>) {
chomp;
my($product,$discount) = split /,/;
next unless defined $product && defined $discount;
print "$product => $discount\n";
}
that we run as below on Unix:
$ ./coupons input
Campbell's soup => 0.50
Mac & Cheese => 0.25
Empty file or empty line? Regardless, try this test instead of !#line.
if (scalar(#line) == 0) {
...
}
The scalar method returns the array's length in perl.
Some clarification:
if (#line) {
}
Is the same as:
if (scalar(#line)) {
}
In a scalar context, arrays (#line) return the length of the array. So scalar(#line) forces #line to evaluate in a scalar context and returns the length of the array.
I'm not sure whether you're trying to detect if the line is empty (which your code is trying to) or whether the whole file is empty (which is what the error says).
If the line, please fix your error text and the logic should be like the other posters said (or you can put if ($line =~ /^\s*$/) as your if).
If the file, you simply need to test if (!$lctr) {} after the end of your loop - as noted in another answer, the loop will not be entered if there's no lines in the file.