How to grep for specific string in a file - perl

This is my input file:
<MessageOut>
<Attribute name="Session-Id" value="22250"/><Attribute name="CC-Request-Type" value="2"/><Attribute name="CC-Request-Number" value="1"/><Attribute name="Origin-Host" value="indlnqw291"/><Attribute name="Origin-Realm" value="amdocs.com"/><Attribute name="Auth-Application-Id" value="4"/><Attribute name="Result-Code" value="5031"/><Attribute name="CC-Session-Failover" value="1"/><Attribute name="Low-Balance-Indication" value="0"/><Attribute name="Multiple-Services-Credit-Control"><Group><Attribute name="Result-Code" value="5031"/><Attribute name="Service-Identifier" value="0"/><Attribute name="Rating-Group" value="2"/></Group></Attribute></MessageOut>
<MessageOut>
<Attribute name="Session-Id" value="22250"/><Attribute name="CC-Request-Type" value="3"/><Attribute name="CC-Request-Number" value="2"/><Attribute name="Origin-Host" value="indlnqw291"/><Attribute name="Origin-Realm" value="amdocs.com"/><Attribute name="Auth-Application-Id" value="4"/><Attribute name="Result-Code" value="5031"/></MessageOut>
<MessageOut>
<Attribute name="Session-Id" value="22250"/><Attribute name="CC-Request-Type" value="1"/><Attribute name="CC-Request-Number" value="0"/><Attribute name="Origin-Host" value="indlnqw291"/><Attribute name="Origin-Realm" value="amdocs.com"/><Attribute name="Auth-Application-Id" value="4"/><Attribute name="Result-Code" value="5031"/><Attribute name="CC-Session-Failover" value="1"/><Attribute name="Low-Balance-Indication" value="0"/><Attribute name="Multiple-Services-Credit-Control"><Group><Attribute name="Result-Code" value="5031"/><Attribute name="Service-Identifier" value="0"/><Attribute name="Rating-Group" value="2"/></Group></Attribute></MessageOut>
I want to grep the result code after "Multiple-Services-Credit-Control".
Expected result:
"CC-Request-Type" value="1"
"CC-Request-Number" value="0"
"Result-Code" value="5031"
"CC-Request-Type" value="2"
"CC-Request-Number" value="1"
"Result-Code" value="5031"
"CC-Request-Type" value="3"
"CC-Request-Number" value="2"
"Result-Code" value="5031"
Thanks in advance.

This is XML. It's a bad idea to try to use regular expressions on XML, because XML is contextual and regular expressions aren't.
Use an XML parser. Most will let you use XPath, which is comparable to regular expressions but specifically designed to handle the contextual nature of XML.
Perl has multiple options; I particularly like XML::Twig:
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
my $twig = XML::Twig->new->parsefile('your_file.xml');
my @attributes = qw( CC-Request-Type CC-Request-Number Result-Code );
foreach my $msg ( $twig->get_xpath('//MessageOut') ) {
    foreach my $attribute (@attributes) {
        # "./" keeps the search inside the current <MessageOut>; a leading
        # "//" searches the whole document and always hits the first message.
        print "$attribute value=",
            $msg->get_xpath( "./Attribute[\@name='$attribute']", 0 )->att('value'),
            "\n";
    }
    print "\n";
}
With your sample data (slightly amended to add a root element, since several top-level <MessageOut> elements are not well-formed XML), this gives:
CC-Request-Type value=2
CC-Request-Number value=1
Result-Code value=5031
CC-Request-Type value=3
CC-Request-Number value=2
Result-Code value=5031
CC-Request-Type value=1
CC-Request-Number value=0
Result-Code value=5031
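The question literally asks for the Result-Code inside Multiple-Services-Credit-Control, and that is exactly the kind of context a regex cannot see but a relative XPath can. A minimal sketch using XML::LibXML (a different parser from the XML::Twig answer above), with a trimmed copy of two messages inlined and wrapped in a hypothetical <Messages> root element to make it well-formed:

```perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed sample; <Messages> is a hypothetical root added for well-formedness.
my $xml = <<'XML';
<Messages>
<MessageOut><Attribute name="CC-Request-Type" value="2"/><Attribute name="CC-Request-Number" value="1"/><Attribute name="Result-Code" value="5031"/><Attribute name="Multiple-Services-Credit-Control"><Group><Attribute name="Result-Code" value="5031"/></Group></Attribute></MessageOut>
<MessageOut><Attribute name="CC-Request-Type" value="3"/><Attribute name="CC-Request-Number" value="2"/><Attribute name="Result-Code" value="5031"/></MessageOut>
</Messages>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

my @rows;
for my $msg ( $doc->findnodes('/Messages/MessageOut') ) {
    # Paths starting with "./" are evaluated relative to this <MessageOut>.
    my $type   = $msg->findvalue('./Attribute[@name="CC-Request-Type"]/@value');
    my $number = $msg->findvalue('./Attribute[@name="CC-Request-Number"]/@value');

    # The Result-Code nested inside Multiple-Services-Credit-Control;
    # findvalue gives an empty string when the message has no such group.
    my $mscc = $msg->findvalue(
        './Attribute[@name="Multiple-Services-Credit-Control"]'
          . '/Group/Attribute[@name="Result-Code"]/@value'
    );

    push @rows, [ "$type", "$number", "$mscc" ];
    print qq{"CC-Request-Type" value="$type"\n};
    print qq{"CC-Request-Number" value="$number"\n};
    print qq{"Result-Code" value="$mscc"\n} if length $mscc;
}
```

The second message has no Multiple-Services-Credit-Control group, so only its type and number are printed.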

use strict;
use warnings;
my $filename = 'path_to_input_file\data.txt';
open( my $fh, '<:encoding(UTF-8)', $filename )
    or die "Could not open file '$filename' $!";
while ( my $row = <$fh> ) {
    chomp $row;
    # Print only when all three name/value pairs were found on this line
    if ( $row =~ /.*?("CC-Request-Type"\svalue="\d*").*?("CC-Request-Number"\svalue="\d*").*?("Result-Code"\svalue="\d*")/ ) {
        print "\n$1\n$2\n$3\n";
    }
}
close $fh;
This is a regex-based solution in Perl. If you need the regex explained, I will gladly walk through it.
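Since the answer offers to explain the regex: the same pattern can be written with Perl's /x modifier, which ignores whitespace inside the pattern and allows # comments. A sketch with one shortened line from the question inlined:

```perl
use strict;
use warnings;

# One input line from the question, shortened.
my $row = '<Attribute name="Session-Id" value="22250"/><Attribute name="CC-Request-Type" value="2"/>'
        . '<Attribute name="CC-Request-Number" value="1"/><Attribute name="Result-Code" value="5031"/>';

my $re = qr{
    .*?                                      # skip as little as possible
    ( "CC-Request-Type" \s value="\d*" )     # group 1: the request-type pair
    .*?                                      # skip up to the next pair
    ( "CC-Request-Number" \s value="\d*" )   # group 2: the request-number pair
    .*?                                      # skip again
    ( "Result-Code" \s value="\d*" )         # group 3: the first Result-Code on the line
}x;

# In list context a successful match returns the three captured groups.
my @found = $row =~ $re;
print "$_\n" for @found;
```

Note that group 3 is the first Result-Code on the line (the top-level one), not necessarily the one inside Multiple-Services-Credit-Control; in the sample data both happen to be 5031.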

Related

Which syntax is better XML::Simple or XML::Twig

I was running a Perl script and I encountered the following result, instead of the answer I expected.
input HASH(0x17268bb0)
input HASH(0x172b3300)
input HASH(0x172b32a0)
Can anyone say what this is and how to rectify it?
This is my XML file:
<Root>
<Top name="ri_32">
<Module name="ALU">
<input name="power_control_bus"/>
<bidirection name="address_bus"/>
</Module>
<Module name="Power_control">
<input name="cpu_control_bus"/>
<output name="power_control_bus"/>
<bidirection name="address_bus"/>
</Module>
<input name="address"/>
<input name="clock"/>
<input name="data_in"/>
<output name="data_out"/>
<bidirection name="control"/>
</Top>
</Root>
I'm writing a Perl script that converts this XML into a specific format (.v and .sv files):
use strict;
use XML::Simple;
use Data::Dumper;
my $xml_root = XMLin( './simodule.xml' );
my $root_top = $xml_root->{Top};
my $mod = $root_top->{Module};
print "Top $root_top->{name}\n";
my $top_in = $root_top->{input};
foreach my $namein ( keys %$top_in ) {
print " input $top_in->{$namein}\n";
}
my $top_ou = $root_top->{output};
foreach my $nameou ( keys %$top_ou ) {
print " output $top_ou->{$nameou}\n";
}
my $top_bi = $root_top->{bidirection};
foreach my $namebi ( keys %$top_bi ) {
print " bidirection $top_bi->{$namebi}\n";
}
Output:
Top risc_32
input HASH(0x172b3300)
input HASH(0x172b32a0)
input HASH(0x17268bb0)
output data_out
bidirection control
Expected output:
input address
input clock
input data_in
output data_out
bidirection control
You've made your task more difficult for yourself by using one of the most deceitful modules on CPAN. XML::Simple isn't simple.
Even its own docs suggest not using it:
Why is XML::Simple "Discouraged"?
So, how about XML::Twig instead:
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
#$twig now contains our XML data structure.
my $twig = XML::Twig->new->parsefile('simodule.xml');
#fetch a value with an xpath expression - ./Top
#then extract the attribute 'name' from this node.
print "Top ", $twig->get_xpath( './Top', 0 )->att('name'), "\n";
#iterate all 'input' elements beneath "Top":
#note - single argument to "get_xpath" means all of them in a list.
foreach my $input ( $twig->get_xpath('./Top/input') ) {
#retrieve from each their name attribute (and print)
print "input ", $input->att('name'), "\n";
}
#locate the 'output' and 'bidirection' nodes within the tree, and fetch
#their name attribute.
print "output ", $twig -> get_xpath( './Top/output',0) -> att('name'),"\n";
print "bidirection ", $twig -> get_xpath( './Top/bidirection',0) -> att('name'),"\n";
We use XML::Twig's get_xpath to specify an XML path, and att to retrieve a named attribute. If you prefer, you could use navigation methods such as first_child and children instead:
#Top element is below the root - we create a reference to it $top
my $top = $twig->root->first_child('Top');
#From this reference, fetch the name attribute.
print "Top ", $top->att('name'), "\n";
#get children of Top matching 'input' and iterate
foreach my $input ( $top -> children('input') ) {
#print attribute called 'name'.
print "input ", $input->att('name'), "\n";
}
#Find a child below Top called 'output' and retrieve 'name' attribute.
print "output ", $top -> first_child('output') -> att('name'),"\n";
#as above.
print "bidirection ", $top -> first_child('bidirection') -> att('name'),"\n";
These do the same thing. Personally I like XPath as a way of navigating XML, but that's a matter of taste. (It lets you do things like specify a path with embedded attribute conditions, though that's a moot point in this example.)
Given your input XML, both produce:
Top ri_32
input address
input clock
input data_in
output data_out
bidirection control
It skips the nested elements under the ALU and Power_control modules, because your original code appears to as well.
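If you do want the ports inside one of those modules, this is where an attribute condition in the path helps. A sketch using XML::Twig's get_xpath with a [@name="ALU"] predicate on a trimmed inline copy of simodule.xml; selecting the ALU inputs is only an illustration, not something the question asked for:

```perl
use strict;
use warnings;
use XML::Twig;

# Trimmed copy of simodule.xml, inlined for the example.
my $xml = <<'XML';
<Root>
  <Top name="ri_32">
    <Module name="ALU">
      <input name="power_control_bus"/>
      <bidirection name="address_bus"/>
    </Module>
    <input name="address"/>
  </Top>
</Root>
XML

my $twig = XML::Twig->new->parse($xml);

# The [@name="ALU"] predicate selects the Module by attribute value,
# so only that module's <input> children are returned.
my @alu_inputs = map { $_->att('name') }
    $twig->root->get_xpath('./Top/Module[@name="ALU"]/input');

print "ALU input $_\n" for @alu_inputs;
```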
You still haven't been at all clear about exactly what output you want. But I use XML::LibXML for most of my XML processing requirements and I'd write something like this:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use XML::LibXML;
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file('simodule.xml');
foreach my $type (qw[input output bidirection]) {
foreach ($doc->findnodes("/Root/Top/$type")) {
say $_->nodeName, ' ', $_->getAttribute('name');
}
}
The output is correct, given the data structure XML::Simple builds.
As we don't know exactly what you need,
I've modified your code as follows so you can see why you get HASH(...) and work out how to dereference it yourself; it's pretty simple:
use strict;
use XML::Simple;
use Data::Dumper;
local $/;
my $xml_root = XMLin(<DATA>);
print Dumper $xml_root;
my $root_top=$xml_root->{Top};
my $mod=$root_top->{Module};
print "Top $root_top->{name}\n";
my $top_in=$root_top->{input};
foreach my $namein (keys %$top_in)
{
print " input" , Dumper $top_in->{$namein};
}
my $top_ou=$root_top->{output};
foreach my $nameou (keys %$top_ou)
{
print " output $top_ou->{$nameou}\n";
}
my $top_bi=$root_top->{bidirection};
foreach my $namebi (keys %$top_bi)
{
print " bidirection $top_bi->{$namebi}\n";
}
__DATA__
<Root>
<Top name="ri_32">
<Module name="ALU">
<input name="power_control_bus"/>
<bidirection name="address_bus"/>
</Module>
<Module name="Power_control">
<input name="cpu_control_bus"/>
<output name="power_control_bus"/>
<bidirection name="address_bus"/>
</Module>
<input name="address">X</input>
<input name="clock"/>
<input name="data_in"/>
<output name="data_out"/>
<bidirection name="control"/>
</Top>
</Root>
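Running the Dumper version shows where HASH(...) comes from: by default XML::Simple folds repeated elements that carry a name attribute (its default KeyAttr list is name, key, id) into a hash keyed by that attribute. For <input> the names therefore become the keys and the values are empty hashes, while the single <output> and <bidirection> elements stay plain hashes with a name key, which is why those two printed correctly. A minimal sketch, assuming XML::Simple is installed and using a trimmed copy of simodule.xml, that shows the folded keys and then turns folding off with the KeyAttr and ForceArray options:

```perl
use strict;
use warnings;
use XML::Simple;

# Trimmed copy of simodule.xml (Module elements left out).
my $xml = <<'XML';
<Root>
  <Top name="ri_32">
    <input name="address"/>
    <input name="clock"/>
    <input name="data_in"/>
    <output name="data_out"/>
    <bidirection name="control"/>
  </Top>
</Root>
XML

# Default options: the three <input> elements are folded into a hash keyed by
# their name attribute, so the names are the KEYS and the values are {}.
my $folded      = XMLin($xml);
my @folded_keys = sort keys %{ $folded->{Top}{input} };

# KeyAttr => [] disables folding; ForceArray makes even a single element an
# array, so every element type can be handled the same way.
my $plain = XMLin( $xml, KeyAttr => [], ForceArray => [qw(input output bidirection)] );

my @lines;
for my $type (qw(input output bidirection)) {
    push @lines, "$type $_->{name}" for @{ $plain->{Top}{$type} || [] };
}
print "$_\n" for @lines;
```

With the original code, the smallest change is to print the key ($namein) instead of the value ($top_in->{$namein}).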

Print attributes from two tags together

I am using XML::Twig to extract some attributes from an XML file using Perl.
Here is my code:
use XML::Twig;
my $file = $ARGV[0];
$file =~ /(.+)\.xml/;
my $outfile = $1 . ".snp" ;
open my $out, '>', $outfile or die "Could not open file '$outfile' $!";
my $twig = XML::Twig->new(
twig_handlers => {
'Rs/MergeHistory' => \&MergeHistory,
}
);
$twig -> parsefile( "$file");
sub MergeHistory {
my ($twig, $elt) = @_;
print $out "\t";
print $out "rs";
print $out $elt->att('rsId'), ",";
print $out "b";
print $out $elt->att('buildId'), ",";
}
This prints the following results:
rs56546490,b130, rs386588736,b142
rs56546490,b130, rs386588736,b142
What I want is to print the rsId values of all the MergeHistory elements together, followed by their buildId values, like this:
rs56546490,rs386588736, b130,b142
rs56546490,rs386588736, b130,b142
Here is part of the XML file, which contains two MergeHistory tags:
<Rs rsId="98324" snpClass="snp" snpType="notwithdrawn"
molType="genomic" genotype="true" bitField="050028000005130500030100"
taxId="9606">
<Het type="est" value="0.05" stdError="0.1547"/>
<Validation byCluster="true" byOtherPop="true" byHapMap="true"
by1000G="true">
<otherPopBatchId>7179</otherPopBatchId>
</Validation>
<Create build="36" date="2000-09-19 17:02"/>
<Update build="144" date="2015-05-07 10:52"/>
<Sequence exemplarSs="491581208" ancestralAllele="C,C">
<Seq5>
ATAAGCAAATAACTGAAGTTTAATCAGTCTCCTCCCAGCAAGTGATATGCAACTGAGATTCC
TTATGACACATCTGAACACTAGTGGATTTGCTTTGTAGTAGGAACAA
GGTACATTCGCGGGATAAATGTGGCCAAGTTTTATCTGCTGCCAGGGCTTTCAAAT
AGGTTGACCTGACAATGGGTCACCTCTGGGACTGA</Seq5>
<Observed>C/T</Observed>
<Seq3>AATTAGGAAGAGCTGGTACCTAAAATGAAAGATGCCCTTAAATTTCAGATTCACAATTTTTT
TTTCTTAGTATAAGCATGTCCCATGTAATATCTGGGATATACTCATACCTT
TAAAAATGTGCTCATTGTTTATCTGAAATTCACATTTTAACAGGGAACCATTGT
TTTGTTATTGTTTATTGTTTTGTTTCTAAATAA</Seq3>
</Sequence>
<Ss ssId="1556770886" handle="1000GENOMES" batchId="1061891"
locSnpId="PHASE3_chrY_229259" subSnpClass="snp" orient="reverse"
strand="top" molType="genomic" buildId="144"
methodClass="sequence" validated="by-submitter">
<Sequence>
<Seq5>TTTTAGGTACCAGCTCTTCCTAATT</Seq5>
<Observed>A/G</Observed>
<Seq3>TCAGTCCCAGAGGTGACCCATTGTC</Seq3>
</Sequence>
</Ss>
<Assembly dbSnpBuild="144" genomeBuild="38.2"
groupLabel="GRCh38.p2" current="true" reference="true">
<Component componentType="contig" accession="NT_011875.13"
chromosome="Y" start="11642902" end="21789280"
orientation="fwd" gi="568801947" groupTerm="NC_000024.10"
contigLabel="GCF_000001405.28">
<MapLoc asnFrom="5341580" asnTo="5341580" locType="exact"
alnQuality="1" orient="reverse" physMapInt="16984482"
leftContigNeighborPos="5341579" rightContigNeighborPos="5341581"
refAllele="G"/>
</Component>
<SnpStat mapWeight="unique-in-contig" chromCount="1"
placedContigCount="1" unplacedContigCount="0" seqlocCount="1"
hapCount="0"/>
</Assembly>
<RsLinkout resourceId="1" linkValue="3894"/>
<RsLinkout resourceId="4" linkValue="60936"/>
<RsLinkout resourceId="5" linkValue="23388839"/>
<MergeHistory rsId="56546490" buildId="130"/>
<MergeHistory rsId="386588736" buildId="142"/>
<hgvs>NC_000024.9:g.19096363G>A</hgvs>
<hgvs>NC_000024.10:g.16984483G>A</hgvs>
<Frequency freq="0.0276" allele="A" sampleSize="1233"/>
</Rs>
twig_handlers is good for pre-processing XML, and most especially for discarding it as you go.
It's probably not what you want here though - it looks like what you're trying to do is:
extract each 'MergeHistory' element from each 'Rs' element.
Print the content reformatted.
So with that in mind - I think what you probably want is findnodes and children.
my $twig = XML::Twig->new->parsefile( $file );
foreach my $rs ( $twig->findnodes('//Rs') ) {
print join( ",",
map { "rs" . $_->att('rsId') } $rs->children('MergeHistory') ),
"\t";
print join( ",",
map { "b" . $_->att('buildId') } $rs->children('MergeHistory') ),
"\n";
}
Given your sample, this prints:
rs56546490,rs386588736 b130,b142
Which looks roughly like what you wanted?
We use findnodes to iterate Rs elements.
Within each, we use children to fetch the MergeHistory elements.
We use map to extract the attribute and prepend the rs or b string.
And join to merge the results into a comma-separated list.
(you could still do the above with twig_handlers if you prefer, by firing the handler on "Rs" instead)
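A sketch of that twig_handlers variant, with the handler fired on Rs. It uses a trimmed inline sample with a hypothetical <Root> wrapper element, and calls purge after each Rs to free the memory already used, which is where handlers pay off on large files:

```perl
use strict;
use warnings;
use XML::Twig;

# Trimmed sample; <Root> is a hypothetical wrapper element.
my $xml = <<'XML';
<Root>
  <Rs rsId="98324">
    <MergeHistory rsId="56546490" buildId="130"/>
    <MergeHistory rsId="386588736" buildId="142"/>
  </Rs>
</Root>
XML

my @lines;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Fires once each <Rs> element has been completely parsed.
        Rs => sub {
            my ( $t, $rs ) = @_;
            my @merges = $rs->children('MergeHistory');
            push @lines,
                join( ',', map { 'rs' . $_->att('rsId') } @merges ) . "\t"
              . join( ',', map { 'b' . $_->att('buildId') } @merges );
            $t->purge;    # discard what has been parsed so far
        },
    },
);
$twig->parse($xml);

print "$_\n" for @lines;
```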

perl script to iterate over xml nodes using XML::LibXML

I am trying to write a Perl script to iterate over some nodes and get values from an XML file.
My XML file looks like the following and is saved as spec.xml:
<?xml version="1.0" encoding="UTF-8"?>
<WO xmlns="http://www.example.com/yyyy" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >
<WOSet>
<SR>
<FINISHTIME>2013-07-29T18:21:38-05:00</FINISHTIME>
<STARTTIME xsi:nil="true" />
<TYPE>SR</TYPE>
<DESCRIPTION>Create CUST</DESCRIPTION>
<EXTERNALSYSTEMID />
<REPORTEDBY>PCAUSR</REPORTEDBY>
<REPORTEDEMAIL />
<STATUS>RESOLVED</STATUS>
<SRID>1001</SRID>
<UID>1</UID>
<SPEC>
<AVALUE>IT</AVALUE>
<ATTRID>CUST_DEPT</ATTRID>
<NALUE xsi:nil="true" />
<TVALUE />
</SPEC>
<SPEC>
<AVALUE>001</AVALUE>
<ATTRID>DEPT_CODE</ATTRID>
<NVALUE xsi:nil="true" />
<TVALUE />
</SPEC>
</SR>
</WOSet>
</WO>
When I run the script below, I get neither output nor any error that would give me a clue about what to fix.
I am not a Perl expert; I would love the experts here to throw some light on this.
#!/usr/bin/perl
use XML::LibXML;
use strict;
use warnings;
my $file = 'spec.xml';
my $parser = XML::LibXML->new();
my $tree = $parser->parse_file($file);
my $root = $tree->getDocumentElement;
foreach my $atrid ( $tree->findnodes('WO/WOSet/SR/SPEC') ) {
my $name = $atrid->findvalue('ATTRID');
my $value = $atrid->findvalue('AVALUE');
print $name;
print " = ";
print $value;
print ";\n";
}
My expected output is:
CUST_DEPT = IT
DEPT_CODE = 001
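The XML doesn't contain any element named WO in the null namespace. You want to match the elements named WO in the http://www.example.com/yyyy namespace (the document's default namespace), so register a prefix for that URI and use it on every step of the XPath: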
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML qw( );
use XML::LibXML::XPathContext qw( );
my $file = 'spec.xml';
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($file);
my $root = $doc->getDocumentElement;
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(y => 'http://www.example.com/yyyy');
for my $atrid ( $xpc->findnodes('y:WO/y:WOSet/y:SR/y:SPEC') ) {
my $name = $xpc->findvalue('y:ATTRID', $atrid);
my $value = $xpc->findvalue('y:AVALUE', $atrid);
print "$name = $value\n";
}
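If you'd rather not register a prefix, XPath 1.0's local-name() function matches elements by name regardless of their namespace. A sketch with a trimmed copy of spec.xml inlined; note that it trades precision for convenience, since it would also match a SPEC element from any other namespace:

```perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed copy of spec.xml, inlined for the example.
my $xml = <<'XML';
<WO xmlns="http://www.example.com/yyyy">
  <WOSet><SR>
    <SPEC><AVALUE>IT</AVALUE><ATTRID>CUST_DEPT</ATTRID></SPEC>
    <SPEC><AVALUE>001</AVALUE><ATTRID>DEPT_CODE</ATTRID></SPEC>
  </SR></WOSet>
</WO>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

my @pairs;
# local-name() compares only the element name and ignores the namespace URI.
for my $spec ( $doc->findnodes('//*[local-name()="SPEC"]') ) {
    my $name  = $spec->findvalue('*[local-name()="ATTRID"]');
    my $value = $spec->findvalue('*[local-name()="AVALUE"]');
    push @pairs, "$name = $value";
}
print "$_\n" for @pairs;
```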

perl script using XML parser to read values in text file and replace it xml file

How can I read the XML tags and replace their values with values from a text file? If an entry's value is empty in install.properties, the same entry has to be emptied in property.xml; and if an entry is empty in the XML, it should be filled in with the value from the text file.
install.properties text file
TYPE = Patch
LOCATION =
HOST = 127.1.1.1
PORT = 8080
property.xml file before values are replaced
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<entry key="TYPE">Release</entry>
<!-- tst -->
<entry key="LOCATION">c:/release</entry>
<entry key="HOST">localhost</entry>
<entry key="PORT"></entry>
</properties>
property.xml file after values have been replaced
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<entry key="TYPE">Patch</entry>
<!-- tst -->
<entry key="LOCATION"></entry>
<entry key="HOST">127.1.1.1</entry>
<entry key="PORT">8080</entry>
</properties>
A solution using XML::XSH2, a wrapper around XML::LibXML.
#!/usr/bin/perl
use warnings;
use strict;
use XML::XSH2;
open my $INS, '<', 'install.properties' or die $!;
while (<$INS>) {
chomp;
my ($var, $val) = split /\s*=\s*/, $_, 2;    # limit 2 keeps the empty value of "LOCATION ="
$XML::XSH2::Map::ins->{$var} = $val;
}
xsh << '__XSH__';
open property.xml ;
for /properties/entry {
set ./text() xsh:lookup('ins', @key) ;
}
save :b ;
__XSH__
The same program implemented using only XML::LibXML:
#!/usr/bin/perl
use warnings;
use strict;
use XML::LibXML;
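open my $INS, '<', 'install.properties' or die $!;
my %ins;
while (<$INS>) {
    chomp;
    next unless /\S/;                            # skip blank lines
    my ($var, $val) = split /\s*=\s*/, $_, 2;    # limit 2 keeps the empty value of "LOCATION ="
    $ins{$var} = $val;
}
close $INS;
my $xml = XML::LibXML->load_xml( location => 'property.xml' );
for my $entry ( $xml->findnodes('/properties/entry') ) {
    my $key = $entry->getAttribute('key');
    next unless exists $ins{$key};
    # <entry key="PORT"></entry> has no text() node to call setData on,
    # so replace the element's content instead.
    $entry->removeChildNodes;
    $entry->appendText( $ins{$key} ) if length $ins{$key};
}
rename 'property.xml', 'property.xml~';
$xml->toFile('property.xml');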
Again, with XML::Twig:
#!/usr/bin/perl
use strict;
use warnings;
use autodie qw( open);
use XML::Twig;
my $IN  = "install.properties";
my $XML = "property.xml";
# load the input file into a hash key => value
open( my $in, '<', $IN);
# split limit 2 keeps the empty value of "LOCATION =", so keys and values stay paired
my %entry= map { chomp; split /\s*=\s*/, $_, 2; } <$in>;
XML::Twig->new( twig_handlers => { entry => \&entry, },
                keep_spaces   => 1,
              )
         ->parsefile_inplace( $XML);
sub entry
  { my( $t, $entry)= @_;
    my $key= $entry->att( 'key');
    # exists rather than truth, so an empty value also empties the entry
    if( exists $entry{$key} )
      { $entry->set_text( $entry{$key}); }
    $t->flush;
  }
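All three answers start by loading install.properties into a hash, and the LOCATION = line is the tricky case. A small core-Perl sketch of that step, with the file contents inlined through an in-memory filehandle for illustration:

```perl
use strict;
use warnings;

# Contents of install.properties from the question, inlined for the example.
my $props = "TYPE = Patch\nLOCATION =\nHOST = 127.1.1.1\nPORT = 8080\n";

# Without a limit, split drops trailing empty fields, so "LOCATION =" yields
# a single element; assigned straight into a hash, that shifts every later
# key/value pair by one.
my @without_limit = split /\s*=\s*/, 'LOCATION =';

my %ins;
open my $fh, '<', \$props or die "Cannot open in-memory file: $!";
while ( my $line = <$fh> ) {
    chomp $line;
    next unless $line =~ /\S/;                        # skip blank lines
    my ( $key, $val ) = split /\s*=\s*/, $line, 2;    # limit 2 keeps the empty value
    $ins{$key} = $val;
}
close $fh;

printf "%-8s => '%s'\n", $_, $ins{$_} for sort keys %ins;
```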

how to remove comments syntax only?

I want to collect all tags from an XML file. How can I remove only the comment syntax?
XML File:
<xml>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Holt</surname>
<given-names> Maurice<!--<xref ref-type="fn" rid="fnI_1"><sup>1</sup></xref>--></given-names>
</name>
</contrib>
</contrib-group>
</xml>
I need this output:
<xml>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Holt</surname>
<given-names> Maurice<xref ref-type="fn" rid="fnI_1"><sup>1</sup></xref></given-names>
</name>
</contrib>
</contrib-group>
</xml>
How can I remove the comment markers without removing their contents?
Script:
#!/usr/bin/perl
use warnings;
use strict;
use XML::Twig;
open(my $output , '>', "split.xml") || die "can't open the Output $!\n";
my $xml = XML::Twig->new( twig_handlers => { xref => sub{comments => 'drop'} } );
$xml->parsefile("sample.xml");
$xml->print($output);
I can't get it to work. How can I remove only the <!-- --> markers without removing their contents?
#!/usr/bin/perl
use warnings;
use strict;
use XML::Twig;
open my $output , '>', 'split.xml' or die "Can't open: $!\n";
my $xml = XML::Twig->new( comments => 'process', # Turn on comment processing
twig_handlers =>
{ '#COMMENT' => \&uncomment }
);
$xml->parsefile('sample.xml');
$xml->print($output);
sub uncomment {
my ($xml, $comment) = @_;
$comment->set_outer_xml($comment->text); # Replace the comment with its contents.
}