What is this perl object and how do I iterate through it? - perl

I have a perl object that was returned to me whose data I can't seem to extract. If I run Data::Dumper->Dump on it as:
Data::Dumper->Dump($message_body)
I get:
$VAR1 = 'SBM Message
';
$VAR2 = '--SBD.Boundary.605592468
';
$VAR3 = 'Content-Type: text/plain;charset=US-ASCII
';
$VAR4 = 'Content-Disposition: inline
If I execute the line:
print $message_body;
I get:
ARRAY(0x9145668)
I would think this is an array. However, when I try to iterate through it, there only seems to be a single element. How do I extract each of the elements from this? By the way, this is basically the body of a mail message extracted using the MIME::Parser package. It was created using the following:
my $parser = new MIME::Parser;
my $entity = $parser->parse($in_fh); # Where $in_fh points to a mail message
$message_body = $entity->body;

Try the foreach loop below.
foreach my $item (@{$message_body})
{
    print $item."\n";
}
$message_body is an ARRAY reference. Hence you need to dereference it and then iterate through each element using the foreach loop.
Read:
http://perlmeme.org/howtos/using_perl/dereferencing.html and http://www.thegeekstuff.com/2010/06/perl-array-reference-examples/
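As a minimal sketch of the dereferencing described above, the snippet below uses a hand-written array reference in place of a real MIME::Parser result (the sample lines and the variable names $kind, $count and @lines are illustrative assumptions). It checks the reference type with ref, counts the elements, and strips the trailing newline that each line in the question's dump carries:

```perl
use strict;
use warnings;

# Stand-in for the array reference shown in the question's dump.
my $message_body = [
    "SBM Message\n",
    "--SBD.Boundary.605592468\n",
    "Content-Type: text/plain;charset=US-ASCII\n",
];

# ref() reports what kind of reference this is.
my $kind = ref $message_body;

# An array in scalar context yields its element count.
my $count = scalar @{$message_body};

my @lines;
foreach my $item (@{$message_body}) {
    (my $line = $item) =~ s/\r?\n\z//;   # copy, then drop the line ending
    push @lines, $line;
}

print "$kind reference with $count elements\n";
print "$_\n" for @lines;
```

Printing $message_body directly shows ARRAY(0x...) because that is how Perl stringifies the reference itself; only the dereferenced form @{$message_body} gives access to the elements.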

Data::Dumper is only a poor man's choice to see the content.
To see all the gory internal details use Devel::Peek instead.
use Devel::Peek;
Dump $message_body;
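It may also help to see why the dump in the question shows $VAR1 through $VAR4 rather than one nested structure. Data::Dumper->Dump expects a reference to a list of values to dump, so handing it the array reference itself dumps each element as a separate variable, whereas the exported Dumper function shows the reference as a single structure. A small sketch with made-up data (the counting variables are only there for illustration):

```perl
use strict;
use warnings;
use Data::Dumper;

my $message_body = [ "line one\n", "line two\n" ];   # illustrative data

# Dump() treats its first argument as the list of values to dump,
# so each element becomes its own $VARn.
my $flat = Data::Dumper->Dump($message_body);

# Dumper() dumps the reference itself as one $VAR1 = [ ... ] structure.
my $nested = Dumper($message_body);

# Count how many $VARn assignments appear in each dump.
my $flat_vars   = () = $flat   =~ /\$VAR\d+ =/g;
my $nested_vars = () = $nested =~ /\$VAR\d+ =/g;

print $flat, $nested;
```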

Related

Grep http response in a perl object

I have an http response in a perl object (HTTP::Tiny) that I'm trying to extract specific values from but a positive match returns the entire object instead of just the values I want. How can I just extract those lines or values I want from the object?
for my $url (@urls) {
print $url."\n";
$response = $http->get("$url");
my ($res) = grep {/href/} $response->{content};
print $res;
}
grep in Perl isn't quite like the command-line program of the same name. It works over arrays rather than text lines. Try something like this instead:
my ($res) = grep {/href/} split(/\n/, $response->{content});
The split will turn the document you GET into an array of lines so that the grep will do what you expect.
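A minimal sketch of the difference, using a hard-coded string in place of a real $response->{content} (the HTML and the variable names are illustrative assumptions). Note that assigning to my ($res) keeps only the first match, so an array is used here to collect all of them:

```perl
use strict;
use warnings;

# Illustrative stand-in for $response->{content}.
my $content = join "\n",
    '<html><body>',
    '<a href="/one">One</a>',
    '<p>No link here</p>',
    '<a href="/two">Two</a>',
    '</body></html>';

# Given one string, grep can only return that whole string (or nothing).
my @whole = grep { /href/ } $content;

# Split into lines first, and grep filters line by line.
my @matches = grep { /href/ } split /\n/, $content;

# A global match in list context captures just the attribute values.
my @links = $content =~ /href="([^"]*)"/g;

print "$_\n" for @matches;
print "$_\n" for @links;
```

The regular expression is only suitable for simple, predictable markup like this sample; for real pages an HTML parser is likely to be more robust.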

Perl - Need to append duplicates in a file and write unique value only

I have searched a fair bit and hope I'm not duplicating something someone has already asked. I have what amounts to a CSV that is specifically formatted (as required by a vendor). There are four values that are being delimited as follows:
"Name","Description","Tag","IPAddresses"
The list is quite long (and there are ~150 unique names--only 2 in the sample below) but it basically looks like this:
"2B_AppName-Environment","desc","tag","192.168.1.1"
"2B_AppName-Environment","desc","tag","192.168.22.155"
"2B_AppName-Environment","desc","tag","10.20.30.40"
"6G_ServerName-AltEnv","desc","tag","1.2.3.4"
"6G_ServerName-AltEnv","desc","tag","192.192.192.40"
"6G_ServerName-AltEnv","desc","tag","192.168.50.5"
I am hoping for a way in Perl (or sed/awk, etc.) to come up with the following:
"2B_AppName-Environment","desc","tag","192.168.1.1,192.168.22.155,10.20.30.40"
"6G_ServerName-AltEnv","desc","tag","1.2.3.4,192.192.192.40,192.168.50.5"
So basically, the resulting file will APPEND the duplicates to the first match -- there should only be one line per each app/server name with a list of comma-separated IP addresses just like what is shown above.
Note that the "Description" and "Tag" fields don't need to be considered in the duplication removal/append logic -- let's assume these are blank for the example to make things easier. Also, in the vendor-supplied list, the "Name" entries are all already sorted to be together.
This short Perl program should suit you. It expects the path to the input CSV file as a parameter on the command line and prints the result to STDOUT. It keeps track of the appearance of new name fields in the @names array so that it can print the output in the order that each name first appears, and it takes the values for desc and tag from the first occurrence of each unique name.
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new({always_quote => 1, eol => "\n"});
my (@names, %data);
while (my $row = $csv->getline(*ARGV)) {
    my $name = $row->[0];
    if ($data{$name}) {
        # Name seen before: append this address to the stored record
        $data{$name}[3] .= ','.$row->[3];
    }
    else {
        # First occurrence: remember the order and keep the whole row
        push @names, $name;
        $data{$name} = $row;
    }
}
for my $name (@names) {
    $csv->print(*STDOUT, $data{$name});
}
output
"2B_AppName-Environment","desc","tag","192.168.1.1,192.168.22.155,10.20.30.40"
"6G_ServerName-AltEnv","desc","tag","1.2.3.4,192.192.192.40,192.168.50.5"
Update
Here's a version that ignores any record that doesn't have a valid IPv4 address in the fourth field. I've used Regexp::Common as it's the simplest way to get complex regex patterns right. It may need installing on your system.
use strict;
use warnings;
use Text::CSV;
use Regexp::Common;
my $csv = Text::CSV->new({always_quote => 1, eol => "\n"});
my (@names, %data);
while (my $row = $csv->getline(*ARGV)) {
    my ($name, $address) = @{$row}[0,3];
    # Skip records whose fourth field contains no IPv4 address
    next unless $address =~ $RE{net}{IPv4};
    if ($data{$name}) {
        $data{$name}[3] .= ','.$address;
    }
    else {
        push @names, $name;
        $data{$name} = $row;
    }
}
for my $name (@names) {
    $csv->print(*STDOUT, $data{$name});
}
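The order-preserving grouping used in the programs above can be sketched without Text::CSV by starting from rows that are already parsed. The data below is illustrative, and the hand-rolled quoting in the output step is an assumption that only holds for fields with no embedded quotes (Text::CSV handles that case properly). The second name is deliberately placed between records of the first to show that the technique does not depend on the input being sorted:

```perl
use strict;
use warnings;

# Rows as they might come back from a CSV parser (illustrative data).
my @rows = (
    [ '2B_AppName-Environment', 'desc', 'tag', '192.168.1.1' ],
    [ '2B_AppName-Environment', 'desc', 'tag', '192.168.22.155' ],
    [ '6G_ServerName-AltEnv',   'desc', 'tag', '1.2.3.4' ],
    [ '2B_AppName-Environment', 'desc', 'tag', '10.20.30.40' ],
);

my (@names, %data);
for my $row (@rows) {
    my $name = $row->[0];
    if ($data{$name}) {
        # Seen before: append this address to the stored record.
        $data{$name}[3] .= ',' . $row->[3];
    }
    else {
        # First sighting: remember the order and keep a copy of the row.
        push @names, $name;
        $data{$name} = [ @{$row} ];
    }
}

# Emit one quoted line per name, in first-seen order.
my @merged;
for my $name (@names) {
    push @merged, join ',', map { qq("$_") } @{ $data{$name} };
}
print "$_\n" for @merged;
```

The hash gives constant-time lookup by name, while the separate @names array records first-seen order, which a Perl hash on its own does not preserve.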
I would advise you to use a CSV parser like Text::CSV for this type of problem.
Borodin has already pasted a good example of how to do this.
One approach that I'd advise you NOT to use is regular expressions.
The following one-liner demonstrates how one could do this, but this is a very fragile approach compared to an actual csv parser:
perl -0777 -ne '
while (m{^((.*)"[^"\n]*"\n(?:(?=\2).*\n)*)}mg) {
$s = $1;
$s =~ s/"\n.*"([^"\n]+)(?=")/,$1/g;
print $s
}' test.csv
Outputs:
"2B_AppName-Environment","desc","tag","192.168.1.1,192.168.22.155,10.20.30.40"
"6G_ServerName-AltEnv","desc","tag","1.2.3.4,192.192.192.40,192.168.50.5"
Explanation:
Switches:
-0777: Slurp the entire file
-n: Creates a while(<>){...} loop for each “line” in your input file.
-e: Tells perl to execute the code on command line.
Code:
while (m{^((.*)"[^"\n]*"\n(?:(?=\2).*\n)*)}mg): Separate text into matching sections.
$s =~ s/"\n.*"([^"\n]+)(?=")/,$1/g;: Join all ip addresses by a comma in matching sections.
print $s: Print the results.

XML parsing using perl

I tried to research a simple question I have but couldn't find an answer. I am trying to get data from the web in XML and parse it using Perl. I know how to loop over repeating elements, but I am stuck when an element is not repeating (I know this might be silly). If the elements repeat, I put them in an array and get the data. But when there is only a single element, it throws an error saying 'Not an array reference'. I want my code to handle both cases (single and multiple elements). The code I am using is as follows:
use LWP::Simple;
use XML::Simple;
use Data::Dumper;
open (FH, ">:utf8","xmlparsed1.txt");
my $db1 = "pubmed";
my $query = "13054692";
my $q = 16354118; #for multiple MeSH terms
my $xml = new XML::Simple;
$urlxml = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=$db1&id=$query&retmode=xml&rettype=abstract";
$dataxml = get($urlxml);
$data = $xml->XMLin("$dataxml");
#print FH Dumper($data);
foreach $e (@{$data->{PubmedArticle}->{MedlineCitation}->{MeshHeadingList}->{MeshHeading}})
{
    print FH $e->{DescriptorName}{content}, ' $$ ';
}
Also, can I do something such that the separator $$ will not get printed after the last element?
I also tried the following code:
$mesh = $data->{PubmedArticle}->{MedlineCitation}->{MeshHeadingList}->{MeshHeading};
while (my ($key, $value) = each(%$mesh)){
print FH "$value";
}
But, this prints all the childnodes and I just want the content node.
Perl's XML::Simple will take a single item and return it as a scalar, and if the value repeats it sends it back as an array reference. So, to make your code work, you just have to force MeshHeading to always return an array reference:
$data = $xml->XMLin("$dataxml", ForceArray => [qw( MeshHeading )]);
I think you missed the part of "perldoc XML::Simple" that talks about the ForceArray option:
check out ForceArray because you'll almost certainly want to turn it on
Then you will always get an array, even if the array contains only one element.
As others have pointed out, the ForceArray option will solve this particular problem. However you'll undoubtedly strike another problem soon after due to XML::Simple's assumptions not matching yours. As the author of XML::Simple, I strongly recommend you read Stepping up from XML::Simple to XML::LibXML - if nothing else it will teach you more about XML::Simple.
Since $data->{PubmedArticle}-> ... ->{MeshHeading} can be either a single value (a string or a hash reference) or an array reference depending on how many <MeshHeading> tags are present in the document, you need to examine the value's type with ref and conditionally dereference it. Since I am unaware of any terse Perl idioms for doing this, your best bet is to write a function:
sub toArray {
    my $meshes = shift;
    if (!defined $meshes) { return () }
    elsif (ref $meshes eq 'ARRAY') { return @$meshes }
    else { return ($meshes) }
}
and then use it like so:
foreach my $e (toArray($data->{PubmedArticle}->{MedlineCitation}->{MeshHeadingList}->{MeshHeading})) { ... }
To prevent ' $$ ' from being printed after the last element, instead of looping over the list, concatenate all the elements together with join:
print FH join ' $$ ', map { $_->{DescriptorName}{content} }
    toArray($data->{PubmedArticle}->{MedlineCitation}->{MeshHeadingList}->{MeshHeading});
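The normalization idea can be tried out without fetching any XML. The sketch below uses a hypothetical helper named as_list (equivalent in intent to the function above) and hand-built data that imitates the shapes a parser might return for one element versus several; the descriptor values are made up:

```perl
use strict;
use warnings;

# Hypothetical helper: return a list whether given undef,
# a single value, or an array reference.
sub as_list {
    my ($value) = @_;
    return ()        unless defined $value;
    return @{$value} if ref $value eq 'ARRAY';
    return ($value);
}

# One element comes back as a lone hash reference...
my $single = { DescriptorName => { content => 'Humans' } };

# ...while repeated elements come back as an array reference.
my $multi = [
    { DescriptorName => { content => 'Animals' } },
    { DescriptorName => { content => 'Mice' } },
];

# join puts the separator only between elements, never after the last.
my $one  = join ' $$ ', map { $_->{DescriptorName}{content} } as_list($single);
my $many = join ' $$ ', map { $_->{DescriptorName}{content} } as_list($multi);
my $none = join ' $$ ', map { $_->{DescriptorName}{content} } as_list(undef);

print "$one\n$many\n";
```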
This is a place where XML::Simple is being...simple. It deduces whether there's an array or not by whether something occurs more than once. Read the doc and look for the ForceArray option to address this.
To only include the ' $$ ' between elements, replace your loop with
print FH join ' $$ ', map $_->{DescriptorName}{content}, @{$data->{PubmedArticle}->{MedlineCitation}->{MeshHeadingList}->{MeshHeading}};

Perl Data::Dumper output processing

I am using the Data::Dumper API to parse an HTML table.
Here is the perl code:
print Dumper $row;
Here is the output:
$VAR1 = [
'Info1',
'Info2',
'Info3',
];
Question:
1. I want to modify Info1, Info2, etc. before writing them into a SQL table. How do I access them from the output above?
Something like $row->{var1}->? I've tried a couple of options and nothing worked.
This is an old question, with an answer that was never selected.
Ways to update an arrayref
Element by array reference:
$row->[0] = 'foo';
$row->[1] = 'bar';
$row->[2] = 'baz';
List assignment:
($row->[0], $row->[1], $row->[2]) = ('foo','bar','baz');
Array list assignment:
@{$row} = ('foo','bar','baz');
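Since the goal in the question is to modify each value before writing it out, a loop over the dereferenced array may be more convenient than assigning element by element. In the minimal sketch below the data and the transformations (trimming and upper-casing) are illustrative assumptions, not anything taken from the question:

```perl
use strict;
use warnings;

my $row = [ ' Info1 ', 'Info2', ' Info3' ];   # illustrative data

# Read one element through the reference (copied before any changes).
my $first = $row->[0];

# The loop variable aliases each element, so changes are made in place.
for my $cell (@{$row}) {
    $cell =~ s/^\s+|\s+$//g;   # trim surrounding whitespace
    $cell = uc $cell;          # example transformation
}

# Or build a new list with map and leave $row as it is.
my @quoted = map { "'$_'" } @{$row};

print join(', ', @{$row}), "\n";
print join(', ', @quoted), "\n";
```

The $VAR1 in the Dumper output is only a label that Data::Dumper invents for display; the data is reached through the original variable, here $row.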

Why can't I access elements inside an XML file with XPath in XML::LibXML?

I have an XML file, part of which looks like this:
<wave waveID="1">
<well wellID="1" wellName="A1">
<oneDataSet>
<rawData>0.1123975676</rawData>
</oneDataSet>
</well>
... more wellID's and rawData continues here...
I am trying to parse the file with Perl's libXML and output the wellName and the rawData using the following:
use XML::LibXML;
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file('/Users/johncumbers/Temp/1_12-18-09-111823.orig.xml');
my $xc = XML::LibXML::XPathContext->new( $doc->documentElement() );
$xc->registerNs('ns', 'http://moleculardevices.com/microplateML');
my @n = $xc->findnodes('//ns:wave[@waveID="1"]'); # $xc is the XPathContext
# should find a tree from the node representing everything beneath the waveID 1
foreach $nod (@n) {
    my @c = $nod->findnodes('//rawData'); # element inside the tree
    print @c;
}
It is not printing out anything right now and I think I have a problem with my Xpath statements. Please can you help me fix it, or can you show me how to trouble shoot the xpath statements? Thanks.
Instead of using findnodes in the loop, use getElementsByTagName():
my @c = $nod->getElementsByTagName('rawData');
Here are some other handy methods to use when processing the @c array:
$c[0]->toString; # <rawData>0.1123975676</rawData>
$c[0]->nodeName; # rawData
$c[0]->textContent; # 0.1123975676
If the 'wave' element is in a namespace then the 'rawData' element is as well so you probably need to use
foreach $nod (@n) {
    my @c = $xc->findnodes('descendant::ns:rawData', $nod); # element inside the tree
    print @c;
}