help in parsing - perl

I am having a XML file as shown below,
<message1>
<val1>100</val1>
<val2>200</val2>
<val3>300</val3>
<val4>400</val4>
</message1>
<message2>
<val1>100</val1>
<val2>200</val2>
<val3>300</val3>
<val4>400</val4>
</message2>
I have to parse the values (val) and i could not use XML::Simple module. The parsing should be started from <message1> and i have to put the values in an array till </message1> and then i have to repeat this for <message2> till </message2>.
Pictorially it is like
<message1>
----100
----200
----300
----400
</message1>
<message2>
----100
----200
----300
----400
</message2>
Can any one help me .. I am struggling a lot
Thanks
Senthil kumar

Since we're back in 1999, I think I would forget about strict and warnings, use symbolic references and string eval, and be done with it:
#!/usr/bin/perl
while( <DATA>)
{ s{<(message\d)>}{\#$1=(}; # #message1=(
s{<val\d>}{}; #
s{<\/val\d>}{,}; # ,
s{</message\d>}{);}; # );
$s.=$_;
};
eval $s;
$,= ", "; $\= "\n";
foreach (1..2) { print "\#message$_: ", #{"message$_"}; }
__DATA__
<message1>
<val1>100</val1>
<val2>200</val2>
<val3>300</val3>
<val4>400</val4>
</message1>
<message2>
<val1>100</val1>
<val2>200</val2>
<val3>300</val3>
<val4>400</val4>
</message2>
(in case that's not clear: that's a joke! As they say "Have you tried using an XML parser instead?")

Assuming your input is completely regular as you show, the following should work.
But you are far better off getting a real XML parser to work, by wrapping a root element around all your content or by parsing each message separately.
use strict;
use warnings;
my %data;
while (<>) {
# skip blank lines
next unless /\S/;
my ($tag) = /^<(.*)>$/
or warn("expected tag, got $_ "), next;
$data{$tag} ||= [];
while (<>) {
last if /^<\/\Q$tag\E>$/;
my (undef, $value) = /^<val(\d+)>(.*)<\/val\1>$/
or warn("expected val, got $_ "), next;
push #{ $data{$tag} }, $value;
}
}
use Data::Dumper;
print Dumper \%data;

Related

How to push data to an array-containing hash with `eval` in perl?

I'm trying to mirror the website which having the files and folder to hash.
This one having example So I tried, the following
my $url = "http://localhost/mainfolder/";
my ($parent) = $url=~m/\/(\w+)\/?$/;
my %tree=(mainfolder=>[]);
folder_create($url);
sub folder_create
{
my $url = shift;
my $cont = get($url);
my ($child) = $url=~m/($parent.*)/;
$child=~s/\/?(\w+)\/?/{$1}/g;
while($cont=~m/(<tr.+?<\/tr>)/g)
{
my $line = $1;
if($line=~m/\[DIR\].*?href="([^"]*)"[^>]*>(.+?)<\/a>/)
{
my $sub =$1;
$sub=~s/\///;
print "$child\n\n";
push ( eval'#{$tree $child}',$sub);
}
}
}
use Data::Dumper;
print Dumper \%tree,"\n\n\n";
Update
Instead of messing with eval you should use the Data::Diver module
Because of the single quotes, you're trying to execute #{$hash$var} which isn't valid Perl.
If you wrote it as
push eval "\#{\$hash$var}", "somedata"
Then the eval would work, but it would evaluate to the contents of the array in hash element main, which is an empty list of values. That means your call would become
push( ( ), "somedata")
or just
push "somedata"
which is meaningless
This is a particularly unpleasant thing to want to do. Why do you think you need it?

How to use refernce concept and access element of subroutine argument using Perl?

I am writing a code for calling a subroutine which has 4 argument(3 hashes and one file handler).i want to know how to access them in subroutine.My code is as below.
#print OUTFILE "Content of TPC file:.\n";
my $DATA_INFO = $ARGV[0];
my $OUT_DIR = $ARGV[1];
my $log= "$OUT_DIR/log1";
open(LOG1,">$log");
require "$DATA_INFO";
my $SCRIPT_DIR = $ENV{"SCRIPT_DIR"} ;
require "$SCRIPT_DIR/cmp_fault.pl";
require "$SCRIPT_DIR/pattern_mismatch.pl";
require "$SCRIPT_DIR/scan_count.pl";
print "\nComparing data:\n\n" ;
pattern_mismatch("\%data","\%VAR1","\%status",*LOG1);
cmp_fault("\%data","\%VAR1","\%status",*LOG1);
scan_count("\%data","\%status",*LOG1);
print "\n Comparison done:\n";
foreach $pattern (keys %status) {
print "pattern";
foreach $attr (keys %{$status{$pattern}}) {
print ",$attr";
}
print "\n";
last;
}
#Print Data
foreach $pattern (keys %status) {
print "$pattern";
foreach $attr (keys %{$status{$pattern}}) {
print ",$status{$pattern}{$attr}";
}
print "\n";
Sub routine cmp_fault is here:
sub cmp_fault {
use strict;
use warning;
$data_ref= $_[0];;
$VAR1_ref= $_[1];
$status_ref = $_[2];
$log1_ref=$_[3];
# print LOG1"For TPC : First find the pattern and then its fault type\n";
for $pat ( keys %$data_ref ) {
print "fgh:\n$pat,";
for $key (keys %{$data_ref{$pat}}) {
if($key=~/fault/){
print LOG1 "$key:$data_ref{$pat}{$key},\n";
}
}
}
# print LOG1 "\nFor XLS : First find the pattern and then its pattern type\n";
for $sheet (keys %$VAR1_ref){
if ("$sheet" eq "ATPG") {
for $row (1 .. $#{$VAR1_ref->{$sheet}}) {
$patname = $VAR1_ref->{'ATPG'}[$row]{'Pattern'} ;
next if ("$patname" eq "") ;
$faultXls = $VAR1_ref->{'ATPG'}[$row]{'FaultType'} ;
# print LOG1 " $patname==>$faultXls \n";
if (defined $data{$patname}{'fault'}) {
$faultTpc = $data{$patname}{'fault'} ;
# print LOG1 "\n $patname :XLS: $faultXls :TPC: $faultTpc\n";
if("$faultXls" eq "$faultTpc") {
print LOG1 "PASS: FaultType Matched $patname :XLS: $faultXls :TPC: $faultTpc\n\n\n";
print "PASS: FaultType Matched $patname :XLS: $faultXls :TPC: $faultTpc\n\n";
$status_ref->{$patname}{'FaultType'} = PASS;
}
else {
print LOG1 "FAIL: FaultType Doesn't Match\n\n";
$status_ref->{$patname}{'FaultType'} = Fail;
}
}
}
}
}
}
return 1;
When passing parameters into an array, you can only ever pass a single list of parameters.
For scalars, this isn't a problem. If all you're acting on is a single array, this also isn't a problem.
If you need to send scalars and an array or hash, then the easy way is to 'extract' the scalar parameters first, and then treat 'everything else' as the list.
use strict;
use warnings;
sub scalars_and_array {
my ( $first, $second, #rest ) = #_;
print "$first, $second, ", join( ":", #rest ), "\n";
}
scalars_and_array( "1", "2", "3", 4, 5, 6 );
But it should be noted that by doing so - you're passing values. You can do this with hashes too.
To pass data structure references, it's as you note - pass by reference, then dereference. It's useful to be aware though, that -> becomes useful, because it's accessing a hash and dereferencing it.
use strict;
use warnings;
use Data::Dumper;
sub pass_hash {
my ( $hashref ) = #_;
print $hashref,"\n";
print $hashref -> {"one"},"\n";
print $hashref -> {"fish"} -> {"haddock"};
}
my %test_hash = ( "one" => 2,
"three" => 4,
"fish" => { "haddock" => "plaice" }, );
pass_hash ( \%test_hash );
print "\n";
print Dumper \%test_hash;
The core of your problem here though, is that you haven't turned on strict and warnings which would tell you that:
for $pat ( keys %data_ref ) {
is wrong - there is no hash called data_ref there's only a scalar (which holds a hash reference) called $data_ref.
You need %$data_ref here.
And here:
for $key ( keys %{ $data{$pat} } ) {
You also have no $data - your code says $data_ref. (You might have %data in scope, but that's a really bad idea to mess around with within a sub).
There's a bunch of other errors - which would also be revealed by strict and warnings. That's a very basic debugging step, and you will generally get a much better response from Stack Overflow if you do this before asking for assistance. So please - do that, tidy up your code and remove errors/warnings. If you are still having problems after that, then by all means make a post outlining where and what problem you're having.

Push into end of hash in Perl

So what I am trying to do with the following code is push a string, let's say "this string" onto the end of each key in a hash. I'm completely stumped on how to do this. Here's my code:
use warnings;
use strict;
use File::Find;
my #name;
my $filename;
my $line;
my #severity = ();
my #files;
my #info = ();
my $key;
my %hoa;
my $xmlfile;
my $comment;
my #comments;
open( OUTPUT, "> $ARGV[0]" );
my $dir = 'c:/programs/TEST/Test';
while ( defined( $input = glob( $dir . "\\*.txt" ) ) ) {
open( INPUT, "< $input" );
while (<INPUT>) {
chomp;
if (/File/) {
my #line = split /:/;
$key = $line[1];
push #{ $hoa{$key} }, "Filename\n";
}
if ( /XML/ ... /File/ ) {
$xmlfile = $1;
push #{ $hoa{$key} }, "XML file is $xmlfile\n";
}
if (/Important/) {
push #{ $hoa{$key} }, "Severity is $_\n";
}
if (/^\D/) {
next if /Important/;
push #{ $hoa{$key} }, "Given comment is $_\n";
}
push #{ $hoa{$key} }, "this string\n";
}
}
foreach my $k ( keys %hoa ) {
my #list = #{ $hoa{$k} };
foreach my $l (#list) {
print OUTPUT $l, "\n";
}
}
}
close INPUT;
close OUTPUT;
Where I have "this string" is where I was trying to push that string onto the end of the array. However, what ended up happening was that it ended up printing "this string" three times, and not at the end of every key like I wanted. When I tried to put it outside the while() loop, it said that the value of $key was not initialized. So please, any help? And if you need any clarification on what I'm asking, just let me know. Thank you!
No offence, but there are so many issues in this code I don't even know where to start...
First, the 'initialization block' (all these my $something; my #somethings lines at the beginning of this script) is not required in Perl. In fact, it's not just 'redundant' - it's actually confusing: I had to move my focus back and forth every time I encountered a new variable just to check its type. Besides, even with all this $input var is still not declared as local; it's either missing in comments, or the code given has omissions.
Second, why do you declare your intention to use File::Find (good) - but then do not use it at all? It could greatly simplify all this while(glob) { while(<FH>) { ... } } routine.
Third, I'm not sure why you assign something to $key only when the line read is matched by /File/ - but then use its value as a key in all the other cases. Is this an attempt to read the file organized in sections? Then it can be done a bit more simple, either by slurp/splitting or localizing $/ variable...
Anyway, the point is that if the first line of the file scanned is not matched by /File/, the previous (i.e., from the previous file!) value is used - and I'm not quite sure that it's intended. And if the very first line of the first file is not /File/-matched, then an empty string is used as a key - again, it smells like a bug...
Could you please describe your task in more details? Give some test input/output results, perhaps... It'd be great to proceed in short tasks, organizing your code in process.
Your program is ill-conceived and breaks a lot of good practice rules. Rather than enumerate them all, here is an equivalent program with a better structure.
I wonder if you are aware that all of the if statements will be tested and possibly executed? Perhaps you need to make use of elsif?
Aside from the possibility that $key is undefined when it is used, you are also setting $xmlfile to $1 which will never be defined as there are no captures in any of your regular expressions.
It is impossible to tell from your code what you are trying to do, so we can help you only if you show us your output, input and say how to derive one from the other.
use strict;
use warnings;
use File::Find;
my ($outfile) = #ARGV;
my $dir = 'c:/programs/TEST/Test';
my %hoa;
my $key;
while (my $input = glob "$dir/*.txt") {
open my $in, '<', $input or die $!;
while (<$in>) {
chomp;
if (/File/) {
my $key = (split /:/)[1];
push #{ $hoa{$key} }, "Filename\n";
}
if (/XML/ ... /File/) {
my $xmlfile = $1;
push #{ $hoa{$key} }, "XML file is $xmlfile\n";
}
if (/Important/) {
push #{ $hoa{$key} }, "Severity is $_\n";
}
if (/^\D/) {
next if /Important/;
push #{ $hoa{$key} }, "Given comment is $_\n";
}
push #{ $hoa{$key} }, "this string\n";
}
close $in;
}
open my $out, '>', $outfile or die $!;
foreach my $k (keys %hoa) {
foreach my $l (#{ $hoa{$k} }) {
print $out $l, "\n";
}
}
close $out;
I suspect based on your code, that the line where $key is set is not called each time through the loop, and that you do not trigger any of the other if statements.
This would append "this string" to the end of the array. Based on that you are getting 3 of the "this strings" at the end of the array, I would suspect that two lines do not go through the if (/FILE/) or any of the other if statements. This would leave the $key value the same and at the end, you would append "this string" to the array, using whatever the last value of $key was when it was set.
This will append the string "this string" to every element of the hash %hoa, which elements are array refs:
for (values(%hoa)) { push #{$_}, "this string"; }
Put that outside your while loop, and you'll print "this string" at the end of each element of %hoa.
It will autovivify array refs where it finds undefined elements. It will also choke if it cannot dereference an element as an array, and will manipulate arrays by symbolic reference if it finds a simple scalar and is not running under strict:
my %autoviv = ( a => ['foo'], b => undef );
push #$_, "PUSH" for values %autoviv; # ( a => ['foo', 'PUSH'], b => ['PUSH'] )
my %fatal = ( a => {} );
push #$_, "PUSH" for values %fatal; # FATAL: "Not an ARRAY reference at..."
my %dangerous = (a => "foo");
push #$_, "PUSH" for values %dangerous; # Yikes! #foo is now ("PUSH")
use strict;
my %kablam = (a => "foo");
push #$_, "PUSH" for values %kablam; # "Can't use string ("foo") as an ARRAY ref ..."
As I understand it, traverse the hash with a map command to modify its keys. An example:
EDIT: I've edited because I realised that the map command can be assigned to the same hash. No need to create a new one.
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
my %hash = qw|
key1 value1
key2 value2
key3 value3
|;
my %hash = map { $_ . "this string" => $hash{ $_ } } keys %hash;
print Dump \%hash;
Run it like:
perl script.pl
With following output:
$VAR1 = {
'key3this string' => 'value3',
'key2this string' => 'value2',
'key1this string' => 'value1'
};

Perl reading configuration file without using Modules

Let's say I have a configuration file.
Config.csv
Server,Properties
"so-al-1","48989"
"so-al-3","43278"
"so-al-5","12345"
I need to use a perl script to retrieve the server and properties from the file in order to use the varible's values in my script. Also our client server doesn't want us to install any modules.
So how do I read this document in variables without using a module?
open(FILE,"Config.csv");
undef($/); #sucks the entire file in at once
while(<FILE>){
(#words)=split(/\s+/);
}
close FILE;
for (#words){
s/[\,|\.|\!|\?|\:|\;]//g; #removed punctuation
$word{$_}++;
}
for (sort keys %word){
print "$_ occurred $word{$_} times\n";
}
I did try the above but it doesn't put it to a hash that I wanted.
Edited: I copied the code too fast and missed a line.
Edited: I just found out that there's a question like this in StackOverflow already. How can I parse quoted CSV in Perl with a regex?
Following the usual warning of "you should use a CSV module", this works:
#!/usr/bin/env perl
use warnings;
use strict;
my $header_str=<DATA>;
chomp $header_str;
my #header=$header_str =~ /(?:^|,)("(?:[^"]+|"")*"|[^,]*)/g;
my %fields;
my #temp;
my $line;
while($line=<DATA>){
chomp $line;
#temp = $line =~ /(?:^|,)("(?:[^"]+|"")*"|[^,]*)/g;
for (#temp) {
if (s/^"//) {
s/"$//; s/""/"/g;
}
}
$fields{$temp[0]}=$temp[1];
}
print "$_\t\t" for (#header);
print "\n";
print map { "$_\t\t$fields{$_}\n" } sort keys %fields;
__DATA__
Server,Properties
"so-al-1","48989"
"so-al-3","43278"
"so-al-5","12345"
Output:
Server Properties
so-al-1 48989
so-al-3 43278
so-al-5 12345
#!/usr/bin/perl
use warnings;
use strict;
while (<DATA>) {
chomp;
next unless my($key,$value) = split /,/;
s/^"//, s/"$// for $key, $value;
print "key=$key value=$value\n";
}
__DATA__
Server,Properties
"so-al-1","48989"
"so-al-3","43278"
"so-al-5","12345"

Dealing with multiple capture groups in multiple records

Data Format:
attribname: data
Data Example:
cheese: good
pizza: good
bagel: good
fire: bad
Code:
my $subFilter='(.+?): (.+)';
my #attrib = ($dataSet=~/$subFilter/g);
for (#attrib)
{
print "$_\n";
}
The code spits out:
cheese
good
pizza
good
[etc...]
I was wondering what an easy Perly way to do this is? I am parsing the data from a log the data above is trash for simplicity. I am newer to Perl, I suspect I could do this via fanangling indexes, but I was wondering if there is a short method of implementing this? Is there any way to have the capture groups put into two different variables instead of serially appended to the list along with all matches?
Edit: I want the attribute and it's associated value together so I can the do what I need to to them. For example if within my for loop I could access both the attribute name and attribute value.
Edit:
I tried
my %attribs;
while (my $line = <$data>)
{
my ($attrib, $value) = ($line=~m/$subFilter/);
print $attribs{$attrib}," : ", $value,"\n";
}
and no luck :( I don't get any output with this. My data is in a variable not a file, because it parsed out of a set of parent data which is in a file. It would be convenient if the my variable worked so that my (#attrib, #value) = ($line=~/$subFilter/g); filled the lists appropriately with the multiple matches.
Solution:
my #line = ($7 =~/(.+?)\n/g);
for (#line)
{
my ($attrib, $value) = ($_=~m/$subFilter/);
if ($attrib ne "")
{
print $attrib," : ", $value,"\n";
}
}
I'm not really clear on what you actually want to store, but here's how you could store the data in a hash table, with '1' indicating good and '0' indicating 'bad':
use strict;
use warnings;
use Data::Dumper;
my %foods;
while (my $line = <DATA>)
{
chomp $line;
my ($food, $good) = ($line =~ m/^(.+?): (.+)$/);
$foods{$food} = ($good eq 'good' ? 1 : 0);
}
print Dumper(\%foods);
__DATA__
cheese: good
pizza: good
bagel: good
fire: bad
This prints:
$VAR1 = {
'bagel' => 1,
'cheese' => 1,
'fire' => 0,
'pizza' => 1
};
A sensible approach would be to make use of the split function:
my %attrib;
open my $data, '<', 'fileName' or die "Unable to open file: $!";
while ( my $line = <$data> ) {
my ( $attrib, $value ) = split /:\s*/, $line, 2;
$attrib{$attrib} = $value;
}
close $data;
foreach my $attrib ( keys %attrib ) {
print "$attrib: $attrib{$attrib}\n";
}
If you're into one-liners, the following would achieve the same:
$ perl -F/:\s*/ -ane '$attrib{$F[0]} = $F[1]; } END { print $_,"\t",$attrib{$_},"\n" foreach keys %attrib;" fileName