Multiple line search and replace - perl

my $string = "<name>
POWERDOWN_SUPPORT
</name>
<bool>
<value> true </value>
</bool>";
if ($string=~ s/POWERDOWN_SUPPORT<\/name><bool><value>.*?<\/value>/<false>/ims) {
print "$string\n";
}
How do I get the replacement to work?
Expected output:
<name>
POWERDOWN_SUPPORT
</name>
<bool>
<value> false </value>
</bool>

This may work:
s/(POWERDOWN_SUPPORT\s*?<\/name>\s*?<bool>\s*?<value>).*?(<\/value>)/$1 false $2/s
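As a minimal sketch, the substitution above can be applied to the string from the question like this. The variable name $updated is only illustrative, and the pattern assumes nothing but whitespace sits between the tags.

```perl
use strict;
use warnings;

my $string = "<name>\nPOWERDOWN_SUPPORT\n</name>\n<bool>\n<value> true </value>\n</bool>";

# \s* lets the pattern cross the line breaks between the tags;
# /s lets .*? cross a line break inside <value>...</value> as well.
(my $updated = $string)
    =~ s{(POWERDOWN_SUPPORT\s*</name>\s*<bool>\s*<value>).*?(</value>)}{$1 false $2}s;

print "$updated\n";
```

The original attempt failed because its pattern had no \s* between the tags, so it could never match across the newlines.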

Use an XML-aware tool; parsing XML with regular expressions is hard and error-prone.
For example, xsh, a wrapper around XML::LibXML:
open file.xml ;
my $v = //name[normalize-space(.)='POWERDOWN_SUPPORT']/following-sibling::bool/value[normalize-space(.)='true'];
set $v/text() 'false' ;
save :b ;
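If xsh is not available, roughly the same edit can be sketched directly with XML::LibXML. This assumes the module is installed and that the fragment sits inside some root element; the <config> root here is made up for the example.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical document: the <config> root is an assumption for this sketch.
my $xml = <<'XML';
<config>
<name>
POWERDOWN_SUPPORT
</name>
<bool>
<value> true </value>
</bool>
</config>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# Same XPath expression as in the xsh script above.
my $xpath = q{//name[normalize-space(.)='POWERDOWN_SUPPORT']}
          . q{/following-sibling::bool/value[normalize-space(.)='true']};

for my $value ($doc->findnodes($xpath)) {
    $value->removeChildNodes;        # drop the old text node
    $value->appendText(' false ');   # add the new text
}

print $doc->toString;
```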

Related

Remove line break characters in multi-line XML element

On a Unix system I have an input text file containing long multi-line strings.
I now want to remove line breaks only between two patterns (<remarks> and </remarks>), which can be on different lines.
Example input file:
text1 text2 <remarks> text3
text4 text5 </remarks> text6 text7 text8
Result output for the above input file should be:
text1 text2 <remarks> text3 text4 text5 </remarks> text6 text7 text8
I would prefer to use sed or Perl or maybe awk to do the job.
I do not see a solution as the newlines can happen "randomly" and text is just some log messages.
Here is a more detailed look of the input file I need to process. It does not contain a root XML section, but for testing I might just add one manually. Also there may be many "remarks" sections.
Inputfile Snippet (as it is very long), Filename is test:
<paymentTerm keyValue1="8" objectType="PAYMENTTERM" />
<paymentType keyValue1="20" objectType="PAYMENTTYPE" />
<priceList keyValue1="1" objectType="PRICELIST" />
<remarks>Zollanmeldung ab 250 €
Lager Adresse:
Hessen-Ring 456
D-64546 Mörfelden-Walldorff
eine Stunde vor Ankunft melden unter Mobile
Neu Spedition
A&R Logistics Group
Storkenburgstrasse 99
D-62546 Mörfelden-Walldorf
www.asp.de</remarks>
<salesPersons>
<PERSON keyValue1="2" keyValue2="SALESEMPLOYEE" objectType="PERSON" />
</salesPersons>
<shippingType keyValue1="5" objectType="SHIPPINGTYPE" />
As stated above I want to remove the linebreaks ONLY between the patterns "remarks" and "/remarks".
I tried the Perl XML Parsing suggested by borodin like this:
use strict;
use warnings 'all';
use XML::Twig;
use constant XML_FILE => 'test';
my $twig = XML::Twig->new(
twig_handlers => {
remarks => sub { $_->set_text($_->trimmed_text) }
}
);
$twig->parsefile(XML_FILE);
$twig->print;
It works, but prints everything on one line.
With GNU awk for multi-char RS:
$ awk -v RS='</?remarks>' -v ORS= '!(NR%2){gsub(/\n/,OFS)} {print $0 RT}' file
text1 text2 <remarks> text3 text4 text5 </remarks> text6 text7 text8
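Since the question also mentions Perl, here is a comparable sketch in plain Perl. It works on the whole text held in one string, and like the awk version it treats the input as text rather than parsing it as XML. The sample string is the two-line example from the question.

```perl
use strict;
use warnings;

my $text = "text1 text2 <remarks> text3\ntext4 text5 </remarks> text6 text7 text8\n";

# For every <remarks>...</remarks> span, replace each line break with a space.
# /s lets .*? cross line breaks; /e evaluates the replacement as Perl code.
$text =~ s{(<remarks>.*?</remarks>)}{ (my $span = $1) =~ tr/\n/ /; $span }gse;

print $text;
```

For a real file the text would have to be read in one piece first, for example by setting $/ to undef before reading.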
XML can represent the same information in many different ways, and it is always a risk to try processing it using regular expressions. It is far better to use a proper XML module to process XML data. This solution uses XML::Twig.
In the constructor for the $twig object you can specify a callback that is called automatically every time a given XML element is encountered in the input.
The trimmed_text method removes leading and trailing whitespace from the text of the element, and turns any internal whitespace sequences, including line breaks, into a single space. That is exactly what you are asking for here, so a call to set_text is all that is necessary to update the string.
The keep_spaces option keeps the whitespace between the other elements, so the rest of the document is not collapsed onto one line.
The file to be processed is specified by the XML_FILE constant, and you should modify that to specify the path to your own data file. The modified XML is printed to STDOUT.
use strict;
use warnings 'all';
use open qw/ :std :encoding(UTF-8) /;
use XML::Twig;
use constant XML_FILE => 'remarks.xml';
my $twig = XML::Twig->new(
keep_spaces => 1,
twig_handlers => {
remarks => sub { $_->set_text($_->trimmed_text) }
}
);
$twig->parsefile(XML_FILE);
$twig->print;
input
Your sample data is invalid XML, so I have edited it to look like this. I have added the XML declaration that you said in a comment that you had, and I have added a root element <data>
<?xml version="1.0" encoding="UTF-8"?>
<data>
<paymentTerm keyValue1="8" objectType="PAYMENTTERM" />
<paymentType keyValue1="20" objectType="PAYMENTTYPE" />
<priceList keyValue1="1" objectType="PRICELIST" />
<remarks>Zollanmeldung ab 250 €
Lager Adresse:
Hessen-Ring 456
D-64546 Mörfelden-Walldorff
eine Stunde vor Ankunft melden unter Mobile
Neu Spedition
A&amp;R Logistics Group
Storkenburgstrasse 99
D-62546 Mörfelden-Walldorf
www.asp.de</remarks>
<salesPersons>
<PERSON keyValue1="2" keyValue2="SALESEMPLOYEE" objectType="PERSON" />
</salesPersons>
<shippingType keyValue1="5" objectType="SHIPPINGTYPE" />
</data>
output
<?xml version="1.0" encoding="UTF-8"?>
<data>
<paymentTerm keyValue1="8" objectType="PAYMENTTERM"/>
<paymentType keyValue1="20" objectType="PAYMENTTYPE"/>
<priceList keyValue1="1" objectType="PRICELIST"/>
<remarks>Zollanmeldung ab 250 € Lager Adresse: Hessen-Ring 456 D-64546 Mörfelden-Walldorff eine Stunde vor Ankunft melden unter Mobile Neu Spedition A&amp;R Logistics Group Storkenburgstrasse 99 D-62546 Mörfelden-Walldorf www.asp.de</remarks>
<salesPersons>
<PERSON keyValue1="2" keyValue2="SALESEMPLOYEE" objectType="PERSON"/>
</salesPersons>
<shippingType keyValue1="5" objectType="SHIPPINGTYPE"/>
</data>

Perl collecting xml snippets from log with specific contents

I have a script copied from another Stack Overflow question, but it seems to replace the content of the variable. Could someone point me to the error? If I remove the if check for ">OK<" it prints the whole XML to a file; if I put the check back it only prints the line containing ">OK<". Why is the $xml variable modified by the =~?
# Example usage:
# perl script.pl data.xml RootTag > RootTag.xml
use strict;
use warnings;
my $tag = pop;
while (<>){
if ( s/.*(<$tag>)/$1/ .. s/(<(\/)$tag>).*/$1/ ) {
my $xml = $_;
if ($xml =~ m/>OK</) {
print "$xml";
}
}
}
An example of an input file could be
reioirioree
brebreberbre
rbebrbebre
<test>
<id>1</id>
<status>OK</status>
</test>
bbrtbtrbt
rtbtrb
<test>
<id>2</id>
<status>KO</status>
</test>
brtoibjtrbi
bebbetreb
<test>
<id>3</id>
<status>OK</status>
</test>
dfbreberbreb
berbrebre
In this case, if we use "test" as the parameter, I would like the following output
<test>
<id>1</id>
<status>OK</status>
</test>
<test>
<id>3</id>
<status>OK</status>
</test>
The objective is to capture the whole tag when it contains a specific pattern (>OK<).
Here is a step-by-step way which spells out details. I keep your program interface.
use strict;
use warnings;
my $tag = pop;
my ($inside_tag, $found, @buff);
while (<>)
{
if (s/.*(<$tag>)/$1/) {
$inside_tag = 1;
}
elsif (s|(</$tag>).*|$1|) { #/
$inside_tag = 0;
if ($found) {
print @buff, $_;
$found = 0;
}
@buff = ();
}
next unless $inside_tag;
push @buff, $_;
$found = 1 if />OK</;
}
On the opening tag we set the flag that we are inside the tag. On the closing tag we unset it, and if the marker has been $found we print the buffer (and unset marker's flag). We clear the buffer here.
Then we skip the iteration if outside of the tag. Otherwise, add the line to the buffer and test for the marker on that line.
A glitch with using the range in this problem is that we must know when we are on the closing-tag line, and would like to know the opening line as well. Then we need further tests and the flip-flop isn't so clean any more. We can use the sequence number that the .. operator returns; the documentation says:
The value returned is either the empty string for false, or a sequence number (beginning with 1) for true. The sequence number is reset for each range encountered. The final sequence number in a range has the string "E0" appended to it, which doesn't affect its numeric value, but gives you something to search for if you want to exclude the endpoint. You can exclude the beginning point by waiting for the sequence number to be greater than 1.
It would go something like
if (my $seq = /BEG/ .. /END/)
{
if ($seq == 1) { # first line of range
# ...
}
elsif ($seq =~ /E0$/) { # last line of range
# ...
}
else { ... } # inside
}
and I don't see that this is clearer or better than keeping the state manually.
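For comparison, here is a small runnable sketch of that flip-flop variant. The tag name test and the marker >OK< come from the question; the in-memory @lines list and the @out array are stand-ins for the real input and output.

```perl
use strict;
use warnings;

# Stand-in for the log file: two <test> blocks, only the first contains >OK<.
my @lines = map { "$_\n" } qw(noise <test> <status>OK</status> </test> noise
                              <test> <status>KO</status> </test>);

my (@buff, $found, @out);
for (@lines) {
    if (my $seq = /<test>/ .. m{</test>}) {
        if ($seq == 1) {            # first line of a range: start afresh
            @buff  = ();
            $found = 0;
        }
        push @buff, $_;
        $found = 1 if />OK</;
        if ($seq =~ /E0$/) {        # last line of a range
            push @out, @buff if $found;
        }
    }
}
print @out;
```

The bookkeeping is the same as in the manual version; only the tracking of whether we are inside the tag has moved into the .. operator.\n";
</test>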

Replace string in shell using sed or Perl?

How can I replace the following string:
<value>-myValue</value>
<value>1234</value>
And make it to be:
<value>-myValue</value>
<value>0</value>
Please take into account that there is a line break.
Script
sed -e '/<value>-myValue</,/<value>/{ /<value>[0-9][0-9]*</ s/[0-9][0-9]*/0/; }' data
From a line containing <value>-myValue< to the next line containing <value>, if the line matches <value>XX< where XX is a string of one or more digits, replace the string of digits with 0.
Input
This is not something to change
<value>-myValue</value>
<value>1234</value>
<value>myValue</value>
<value>1234</value>
nonsense
<value>-myValue</value>
<value>abcd</value>
<value>-myValue</value>
<value>4321</value>
stuffing
Output
This is not something to change
<value>-myValue</value>
<value>0</value>
<value>myValue</value>
<value>1234</value>
nonsense
<value>-myValue</value>
<value>abcd</value>
<value>-myValue</value>
<value>0</value>
stuffing
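The question also allows Perl, so here is a rough Perl counterpart working on the text as one string. It is a sketch rather than an exact translation: it assumes the numeric <value> line follows the -myValue line directly, with only whitespace in between.

```perl
use strict;
use warnings;

my $text = <<'END';
<value>-myValue</value>
<value>1234</value>
<value>myValue</value>
<value>1234</value>
<value>-myValue</value>
<value>abcd</value>
END

# ${1}0 keeps the literal 0 from being read as part of the group number.
$text =~ s{(<value>-myValue</value>\s*<value>)\d+(</value>)}{${1}0$2}g;

print $text;
```

Only the pair whose first line is -myValue and whose second line is purely numeric is changed.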
If this is XML, TLP is right that an XML parser would be superior. Continuing on with your sed approach, however, consider:
$ sed '/<value>-myValue/ {N; s/<value>[[:digit:]]\+/<value>0/}' file
<value>-myValue</value>
<value>0</value>
You can possibly simplify this a bit, depending on what criteria you specifically want to use:
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
my $twig = XML::Twig->new( 'pretty_print' => 'indented_a' )->parse( \*DATA );
foreach my $value ( $twig->findnodes('//value') ) {
if ( $value->trimmed_text eq '-myValue'
and $value->next_sibling('value')
and $value->next_sibling('value')->text =~ m/^\d+$/ )
{
$value->next_sibling('value')->set_text('1234');
}
}
$twig->print;
__DATA__
<root>
<value>-myValue</value>
<value>0</value>
</root>
This outputs:
<root>
<value>-myValue</value>
<value>1234</value>
</root>
It parses your XML.
Looks for all nodes with a tag of value.
Checks that it has a sibling.
Checks that the sibling is purely numeric, i.e. matches the regex ^\d+$.
Replaces the content of that sibling with 1234.
And will work on XML regardless of formatting, which is the problem with XML - pretty fundamentally there's a bunch of entirely valid things you can do that are semantically identical in XML.

Perl LIBXML : Using findnodes with attributes to reach a node in the xml file

The XML file looks like below:
<?xml version="1.0"?>
<application name="pos">
<artifact id="123" type="war" cycle="Release7-Sprint1">
<jira/>
<jenkins/>
<deployment/>
<scm>
<transaction id="1234" user="">
<file name=""/>
<file name=""/>
</transaction>
</scm>
</artifact>
</application>
My piece of code looks like this and works fine when I use a hard-coded value for the attribute (name) instead of a variable. I am referring to the line ( my $query = '//application[@name="pos"]'; )
my $manifestDoc = $manifestFileParser->parse_file($manifestFile);
my $changeLogDoc = $changeLogParser->parse_file($changeLogXml );
my $changeLogRoot = $changeLogDoc->getDocumentElement;
#my $applicationName = pos;
my $query = '//application[@name="pos"]';
my $applicationNode = $manifestDoc->findnodes($query);
my $artifactNode = $manifestDoc->createElement('artifact');
$artifactNode->setAttribute("id",$artifactID);
$artifactNode->setAttribute("type",$artifactType);
$artifactNode->setAttribute("cycle",$releaseCycle);
$applicationNode->[0]->appendChild($artifactNode);
But if I modify the $query variable to use a variable ($applicationName) instead of a hard coded value of attribute, it gives me a compilation error saying below:
Can't call method "appendChild" on an undefined value at updateManifest.pl line
Modified code:
my $applicationName = "pos" ;
my $query = '//application[@name="$applicationName"]';
Not sure what is wrong. Anything to do with quotes?
Any help is much appreciated.
The expression '//application[@name="$applicationName"]' means the literal string with those contents – no variables are interpolated with single quotes. If you used double quotes, then both @name and $applicationName would be interpolated.
You have three options:
Use double quotes, but escape the @:
qq(//application[\@name="$applicationName"])
The qq operator is equivalent to double quotes "…" but can have arbitrary delimiters, which avoids the need to escape the " inside the string.
Concatenate the string:
'//application[@name="' . $applicationName . '"]'
This often has a tendency to be hard to read. I'd avoid this solution.
Use a sprintf pattern to build the string from a template:
sprintf '//application[@name="%s"]', $applicationName
If you don't already know printf patterns, you can find them documented in perldoc -f sprintf.
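A short sketch that puts the three options side by side with the original single-quoted string; the variable names $single, $escaped, $concat and $format are only for illustration.

```perl
use strict;
use warnings;

my $applicationName = 'pos';

# Single quotes: nothing is interpolated, so the XPath contains the
# literal text $applicationName and matches no node.
my $single  = '//application[@name="$applicationName"]';

# The three working alternatives all build the same string.
my $escaped = qq(//application[\@name="$applicationName"]);
my $concat  = '//application[@name="' . $applicationName . '"]';
my $format  = sprintf '//application[@name="%s"]', $applicationName;

print "$_\n" for $single, $escaped, $concat, $format;
```

Because the single-quoted query matches nothing, findnodes returns an empty list, which is why $applicationNode->[0] is undefined in the question.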

perl, libxml, xpath : how to get an element through an attribute in this example .xml file

I would like your help in the following :
given the .xml file :
<network>
<netelement>
<node pwd="KOR-ASBG" func="describe_SBG_TGC">
<collection category="IMT" dir="Stream_statistics"></collection>
</node>
</netelement>
<netelement>
<node pwd="ADR-ASBG" func="describe_SBG_TGC">
<collection category="IMT" dir="Stream_statistics"></collection>
<collection category="IMT" dir="Proxy_registrar_statistics_ACCESS"></collection>
</node>
</netelement></network>
What I would like to do is to get the element with the attribute "KOR-ASBG", for example,
but using only XPath.
I have written the following Perl code :
#!/usr/bin/perl -w
use strict ;
use warnings ;
use XML::LibXML ;
use Data::Dump qw(dump) ;
my $dump = "/some_path/_NETELEMENT_.xml" ;
my $parser = new XML::LibXML ; my $doc ;
eval{ $doc = $parser->parse_file($dump) ; } ;
if( !$doc ) { print "failed to parse $dump" ; next ; }
my $root = $doc->getDocumentElement ;
my $_demo = $root->find('/network/netelement/node[@pwd="KOR-ASBG"]') ;
print dump($_demo)."\n" ;
But what gets displayed is:
bless([bless(do{\(my $o = 155172440)}, "XML::LibXML::Element")], "XML::LibXML::NodeList")
So the question would be, how can I get the XML Element that contains the "pwd" attribute (that equals "KOR-ASBG"), using XPath ?
Thank you :)
PS. I have also tried :
my @_demo = $root->findnodes('/network/netelement/node[@pwd="KOR-ASBG"]') ;
print dump(@_demo)."\n" ;
and what it gets displayed is :
bless(do{\(my $o = 179552448)}, "XML::LibXML::Element")
There could technically be more than one element that matches, which is why a result set is being returned instead of single node. You could use
my ($ele) = $root->findnodes('/network/netelement/node[@pwd="KOR-ASBG"]');
That will place the first match into $ele.
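A self-contained sketch of that approach, assuming XML::LibXML is installed. The inline XML is a shortened copy of the file from the question, and the element is printed with toString instead of being dumped.

```perl
use strict;
use warnings;
use XML::LibXML;

# Shortened version of the sample file, inlined so the sketch runs on its own.
my $xml = <<'XML';
<network>
  <netelement>
    <node pwd="KOR-ASBG" func="describe_SBG_TGC"/>
  </netelement>
  <netelement>
    <node pwd="ADR-ASBG" func="describe_SBG_TGC"/>
  </netelement>
</network>
XML

my $doc  = XML::LibXML->load_xml(string => $xml);
my $root = $doc->documentElement;

# List assignment keeps only the first matching element.
my ($ele) = $root->findnodes('/network/netelement/node[@pwd="KOR-ASBG"]');
die "no matching node\n" unless $ele;

print $ele->toString, "\n";              # the element serialized as XML
print $ele->getAttribute('func'), "\n";  # one of its attributes
```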
Your dumper object is not lying to you; you are getting a node list. To access it you may either iterate through it or just access the first node:
print $_demo->get_node(0)->toString()
Of course, all DOM methods are available to you once you get the actual node:
print $_demo->get_node(0)->getAttribute('func');
What you are seeing is what they call in Perl an "opaque object". It's not a hash, but a key to a set of lexical hashes in the package which hold the fields for all the instances. It's Perl's way of implementing objects with security. The only way to get at their info is to call their get accessors.
The way to figure out how to deal with these is note the second argument to the bless and look up this:
http://search.cpan.org/perldoc?<name-of-package>
Or in your case: http://search.cpan.org/perldoc?XML::LibXML::NodeList and
http://search.cpan.org/perldoc?XML::LibXML::Element
Now, I don't recommend this in all cases, but if you notice, the NodeList object is a blessed array reference. So you could just access the first or last node directly, like so:
my $nodes = $root->find('/network/netelement/node[@pwd="KOR-ASBG"]');
my $first_node = $nodes->[0];
my $last_node = $nodes->[-1];
Of course it often makes sense to make a list implementation behave like an array, either through blessed array or overloaded operators or ties. So, in this case, I don't think it's too big a violation of encapsulation.