Lingua::TreeTagger tagging only first word in Parts Of Speech tagging - perl

I'm using Lingua::TreeTagger for POS tagging, but its tagging just first word of string.
my $tagger = Lingua::TreeTagger->new(
'language' => 'english',
'options' => [ qw( -token -lemma -no-unknown ) ],
);
$text_to_tag = 'I another yet sample text I.';
my $tagged_text = $tagger->tag_text( \$text_to_tag );
print Dumper $tagged_text;
Output of the above dumper is as below:
'sequence' => [
bless( {
'is_SGML_tag' => 0,
'original' => 'I',
'tag' => 'PP',
'lemma' => 'I'
}, 'Lingua::TreeTagger::Token' )
],
Please note that only I is tagged here, but I want to tag whole sentence. In my actual code, I want to tag contents of a file. How can I tag all words of the sentence? Any help is appreciated.

Related

Print data after dumper

I have this structure with data-dumper:
$VAR1 = {
'field' => [
{
'content' => {
'en' => [
'Footware haberdashery leather goods'
],
'de' => [
'Schuhe Kurzwaren und Lederartikel'
],
'it' => [
'Calzature mercerie e pelletterie'
]
},
'type' => 'tag',
'valore' => 'TAG3'
},
{
'content' => {
'en' => [
'Cobbler'
],
'de' => [
'Schuster'
],
'it' => [
'Calzolai'
]
},
'type' => 'tag',
'valore' => 'TAG24'
}
]
};
My question is: how to take data and print one for one ?
I want print the name, the tag and valore.
For my software is necessary take the name of shop and more data for example the type
It looks like the structure is a hashref containing an arrayref of hashes, and so on. And apparently where you mention 'name' you mean 'content' by language. Likewise, it seems that where you mention 'tag' you mean 'type'. My answer will be based on those assumptions.
foreach my $rec (#{$href->{field}}) {
print "$rec->{content}->{en}->[0]: $rec->{type}, $rec->{valore}\n";
}
The -> between {content} and {en}, and again between {en} and [0] are optional, and a matter of style.
If you simply want to access elements directly (foregoing the loop), you might do it like this:
print $href->{field}->[0]->{content}->{en}->[0], "\n";
print $href->{field}->[0]->{type}, "\n";
print $href->{field}->[0]->{valore}, "\n";
If you want to print all the languages, you could do this:
foreach my $rec (#{$href->{field}}) {
print $rec->{content}->{$_}->[0], "\n" foreach sort keys %{$rec->{content}};
print $rec->{type}, "\n";
print $rec->{valor}, "\n\n";
}
There are several Perl documentation pages that could be of use to you in the future as you learn to manipulate references and datastructures with Perl: perlreftut, perlref, and perldsc. Access them from your own system as perldoc perlreftut, for example.

Hash not passing properly between functions

I set up a user hash in a function with the following,
push #{$profile{$index}{$infoName}}, $information
and print it using print Dumper(\%profile); index++; in the function it was set up it prints each of indexes
`$VAR1 = { '374' => { 'degree' => [ 'CS' ], 'birthdate' => [ '1973/12/13' ], 'gender' => [ 'M' ],...}
$VAR1 = { '375' => { 'degree' => [ 'CS' ], 'birthdate' => [ '1933/02/03' ], 'gender' => [ 'F' ],...}`
when i try to access this within another function's foreach loop using print "${$profile{$currIndex}{'gender'}}"; i get odd behaviour where the print returns an empty string and get some random numbers appear in the hash: '$VAR1 = { '4' => {}, '1' => {}, '3' => {}, '2' => {}, '378' => { 'birthdate' => [ '1961/03/29' ], 'gender' => ['F'],..}
How can i properly access the gender feild from within a loop?
push #{$profile{$index}{$infoName}}, $information;
print "${$profile{$currIndex}{'gender'}}";
I'm not even sure what the second line actually does. On my Ubuntu, perl produces an error: not a scalar reference.
What you want is, to print all array elements:
print "#{$profile{$currIndex}{'gender'}}\n";
or, to print the first:
print $profile{$currIndex}->{'gender'}->[0], "\n";
The leaf element is an array references and has to be dereferenced as such.
I'm not sure why you use there array references. In your sample data there are no multiple elements in the arrays. Probably you wanted to write simply this? -
$profile{$index}{$infoName} = $information;
...
print "$profile{$currIndex}{'gender'}\n";

How to parse XML and create a tree structure in Perl

I am parsing a XML file with XML::Simple. Is there any way to get a tree form from the XML? If so please explain with example or suggest a CPAN package.
I would like to know which tag I have to process after column and so on.
There is no sequence for the tags. The column tag can appear after Table or display_name many times.
Tab
column
Table
column
display_name
column
display_name
XML:
<Tab>
<column>
<display_name>xyz</display_name>
<display_name>pqr</display_name>
</column>
<Table>
<column><display_name>Department</display_name></column>
</Table>
<display_name>abc</display_name>
<column>pwd</column>
<display_name>jack</display_name>
</Tab>
output with XML::Simple:
$VAR1 = {
'Table' => {
'column' => {
'display_name' => 'Department'
}
},
'display_name' => [
'abc',
'jack'
],
'column' => [
{
'display_name' => [
'xyz',
'pqr'
]
},
'pwd'
]
};
Expected o/p:
$VAR1 = {
'column' => {
'display_name' => [
'xyz',
'pqr'
]
}
'Table' => {
'column' => {
'display_name' => 'Department'
}
},
'display_name' => 'abc',
'column' => 'pwd',
'display_name' =>'jack'
};
I know a hash with same keys isn't possible. Please suggest a way that I can maintain the sequence of tags and will be able to print them.
XML::LibXML creates a tree with no loss of information.
use XML::LibXML qw( );
my $parser = XML::LibXML->new();
my $tree = $parser->parse_file($qfn);
You can generate the output you specified from there. (I don't know why you'd want to, since the Perl code you want for output would lose data if run.)
I used XML::Parser for same file
#!/usr/sbin/perl
use XML::Parser;
use Data::Dumper;
use strict;
my $Filename = "abc.xml";
my $Parser = new XML::Parser( Style => 'tree' );
my $Tree = $Parser->parsefile( $Filename );
print Dumper( $Tree );
If there is another way to get desired output please suggest.

strange bug accessing a variable in template toolkit

I'm struggling with a bug, that I can't nail down.
I have a function that takes a postcode, does a lookup, and returns a latitude, longitude and area name.
for example, pass it AD300 it returns (something like) 42.6, 1.55, ordino - it works very well.
The function is called like this:
my ($lat, $lng, $area) = $object->release();
The return values are fine, and I can print them in perl with a warn
warn "Area $area, $rellat, $rellng";
This works fine. "Area Ordino, 42.6, 1.55"
I then take one of these values, say $area, add it to a hash of data, and pass it to a web page where it is preprocessed via TT (as I do successfully with a load of other variables).
I'm assigning the value to the hash in the normal way. e.g.
$hash->{'area'} = $area;
Here is where the fun begins. When I try to reference the value in TT e.g. [% hash.area %]
I don't get "Ordino" printed on the web page, I'm told I've passed an Array reference to TT.
After a little debugging, I've found that my hash variable hash.area, is somehow referencing an array (according to TT) holding the three values that I've returned from the subroutine "release". I.e.
hash.area = [42.6, 1.55, ordino] according to TT.
That is, to get the value "Ordino" within the web page, I have to access [% hash.area.2 %].
Further, I can set $hash->{'area'} to equal any of the variables, $lat, $lng, or $area and get the same behavior. TT believes all three variables reference the same array. that is
$lat = $lng = $area = [42.6, 1.55, ordino] according to TT
This is bizare, I can happily print the variables in perl and they appears as normal - not an array. I've tried dumping the hash with dumper, no array, everything is fine. Yet somehow, TT is finding an array. It's doing my head in.
The site is quite large, with a lot of pages and I happily pass variables and hashes via TT to web pages all the time, and have been for 4 years now. I've never seen this. On other pages, I even pass exactly the same output from the "release" method and it is processed correctly.
I don't think my TT processing code is the problem, however the following is relevant.
my $tt = Template->new({
INCLUDE_PATH => [ #$template_directories ],
COMPILE_EXT => '.ttc',
COMPILE_DIR => '/tmp/ttc',
FILTERS => YMGN::View->filters,
PLUGIN_BASE => [ 'YMGN::V::TT::Plugins' ],
EVAL_PERL => 1
});
$self->{tt} = $tt;
$self->{template_directories} = $template_directories;
$self->{output} = $params->{output} || undef;
$self->{data} = $params->{data} || [];
The above creates a new tt object and is part of the "new" function (refed below).
"data" contains the hash. "output" holds the processed template ready to send to users browser. We call new (above), process the data and create the output with the code below.
sub process {
my $self = shift;
my $params = shift;
if (!ref $self || !exists $self->{tt}) {
my $class = $self;
$self = $class->new($params);
}
if (!$self->{output}) {
die "You need to specify output";
}
delete $self->{error};
$self->y->utils->untaint(\$self->{template});
my $rv = $self->{tt}->process(
$self->{template},
$self->{data},
$self->{output},
binmode => ':utf8',
);
if (!$rv) {
warn $self->{tt}->error();
return {
error => $self->{tt}->error(),
};
}
return 0;
}
All of the above is sanitised because there is a lot of other stuff going on.
I believe what's important is that the data going in looks correct, here is a full dump of the complete data that is being processed by tt (at the point of processing). The thing that is causing the problem is bubbles->[*]->{'release'} (note, that release == area in the data. The name was changed for unrelated reasons). As you can see, dumper thinks it's a string. TT deals with everything else fine.
data $VAR1 = {
'system' => {
system stuff
},
'features' => {
site feature config
},
'message_count' => '0',
'bubbles' => [
bless( {
'history' => [
{
'creator' => '73',
'points' => '10',
'screenname' => 'sarah10',
'classname' => 'Flootit::M::Bubbles',
'id' => '1378',
'updated' => '1352050471',
'type' => 'teleport',
'label' => 'teleport',
'class' => 'Flootit::M::Bubbles'
}
],
'creator' => '6',
'release' => 'Escaldes-Engordany',
'image' => 'http://six.flooting.com/files/833/7888.png',
'pop_time' => '1352050644',
'y' => $VAR1->{'y'},
'taken_by' => '0',
'city' => '3',
'title' => 'hey a new bubble',
'id' => '566',
'class' => 'Flootit::M::Bubbles',
'prize' => 'go for it kids'
}, 'Flootit::M::Bubbles' ),
bless( {
'history' => [
{
'creator' => '6',
'points' => '10',
'screenname' => 'sarah20',
'classname' => 'Flootit::M::Bubbles',
'id' => '1723',
'updated' => '1349548017',
'type' => 'teleport',
'label' => 'teleport',
'class' => 'Flootit::M::Bubbles'
},
{
'creator' => '6',
'points' => '5',
'screenname' => 'sarah20',
'classname' => 'Flootit::M::Bubbles',
'id' => '1732',
'updated' => '1349547952',
'type' => 'blow',
'label' => 'blow',
'class' => 'Flootit::M::Bubbles'
}
],
'creator' => '89',
'release' => 'Ordino',
'image' => 'http://six.flooting.com/files/1651/8035.png',
'pop_time' => '1351203843',
'y' => $VAR1->{'y'},
'taken_by' => '0',
'city' => '3',
'title' => 'test4',
'id' => '1780',
'class' => 'Flootit::M::Bubbles',
'prize' => 'asdfasdf dsadsasdfasdfasdf'
}, 'Flootit::M::Bubbles' ),
bless( {
'history' => [],
'creator' => '6',
'release' => 'Andorra la Vella',
'image' => 'http://six.flooting.com/files/1671/8042.png',
'pop_time' => '0',
'y' => $VAR1->{'y'},
'taken_by' => '0',
'city' => '3',
'title' => 'Pretty flowers, tres joli',
'id' => '1797',
'class' => 'Flootit::M::Bubbles',
'prize' => 'With lots of pretty pictures'
}, 'Flootit::M::Bubbles' ),
bless( {
'history' => [],
'creator' => '6',
'release' => 'Hillrise Ward',
'image' => 'http://six.flooting.com/files/1509/8003.png',
'pop_time' => '0',
'y' => $VAR1->{'y'},
'taken_by' => '0',
'city' => '3',
'title' => 'Test beats',
'id' => '1546',
'class' => 'Flootit::M::Bubbles',
'prize' => 'Sound great'
}, 'Flootit::M::Bubbles' )
]
};
What comes out after processing is this (in $output)
There is a
[% FOREACH floot IN bubbles %]
Floating around ARRAY(0xfaf5d448). from [% floot.release %]
if we make this [% floot.release.2 %] it gives the correct value.
All the other fields can be referenced correctly - go figure.
The code that puts "bubbles" together is;
my $bubbles = $y->model('Bubbles')->search(['type' => 'golden', 'image' => '!NULL',
'bubble_prizes' => ['p', { 'p.bubble' => 'self.id'}], ], {
order_by => '(created>CURRENT_DATE() AND thumbsup+thumbsdown<10) DESC, COALESCE(thumbsup,0)-COALESCE(thumbsdown,0) DESC, pop_time DESC',
count => 10,
fields => ['p.title as title', 'p.prize as prize', 'city', 'taken_by', 'pop_time', 'id', 'creator'],
});
for (my $i=0; $i<#$bubbles; $i++) {
# Find specified bubbles (see below for when not found here)
my ($rellat, $rellng, $area) = $bubbles->[$i]->release() ;
$bubbles->[$i]->{'release'} = $area;
}
}
The controller then takes $bubble, bundles it up with session / site data, puts it inside an anonymous hash (as you can see in the data above) and passes it to view for processing.
The code for release is :
sub release {
my $self = shift;
my $postcode = $self->y->model('Prizes')->find({bubble => $self->id})->postcode;
my ( $user_lat, $user_long, $region_name );
if($postcode)
{
( $user_lat, $user_long, $region_name ) = $self->y->api('Location')->from_postcode($postcode);
return ( $user_lat, $user_long, $region_name );
}
}
API::Location is quite large, however the relevant lines are;
$postcode_record = $self->y->model('GeoData')->find( {
source => "ALL_COUNTRIES_POSTCODES",
country => $country_code,
sourceid => $postcode, } );
return ( $postcode_record->latitude, $postcode_record->longitude, $postcode_record->town );
The data dumps I've shown you are taken from inside TT.pm (part of view).
So, any ideas what might be going on or where to start? What can I do to try and debug this further? I'm out of ideas.
Maybe it's because $area is a blessed object; Try this to convert to a scalar string :
$string = ''.$area;
# e.g.
$hash->{'area'} = ''.$area;
Following #Moritz comment, to check that $area is blessed:
print ref($area);
use Data::Dumper; warn Dumper($area);
And q{""} overloaded:
print defined ${ref($area).'::'}{'(""'};
EDIT
sub release can return
- undef if $postcode evaluate to false
- a list but as it is used in as scalar context returns the last argument $region_name like parenthesized list (comma expression)
sub release {
my $self = shift;
my $postcode = $self->y->model('Prizes')->find({bubble => $self->id})->postcode;
my ( $user_lat, $user_long, $region_name );
if($postcode)
{
( $user_lat, $user_long, $region_name ) = $self->y->api('Location')->from_postcode($postcode);
return ( $user_lat, $user_long, $region_name );
}
}
It will be relevant to Dump $region_name or $area, or to look at from_postcode.
I found that the problem went away on other development servers and the production server.
I therefore tried uninstalling and reinstalling TT, however that didn't help.
As it appears it's an environment issue on my dev server, so I am retiring the box and starting a new one.

Iterating over bless objects in Perl

I'm working on some code to query F5 load balancers using the BigIP::iControl module.
Right now, I'm getting the following output when doing a Dumper on a variable I get back from a particular function.
I've having a lot of trouble iterating of this object.
How could I go about iterating over this and only taking out the monitor_status for each member?
$VAR1 = [
bless( [
bless( {
'monitor_status' => 'MONITOR_STATUS_UP',
'member' => bless( {
'address' => '127.0.0.0.1',
'port' => '8085'
}, 'Common::IPPortDefinition' )
}, 'LocalLB::PoolMember::MemberMonitorStatus' ),
bless( {
'monitor_status' => 'MONITOR_STATUS_UP',
'member' => bless( {
'address' => '127.0.0.0.1',
'port' => '8085'
}, 'Common::IPPortDefinition' )
}, 'LocalLB::PoolMember::MemberMonitorStatus' ),
bless( {
'monitor_status' => 'MONITOR_STATUS_DOWN',
'member' => bless( {
'address' => '127.0.0.0.1',
'port' => '8085'
}, 'Common::IPPortDefinition' )
}, 'LocalLB::PoolMember::MemberMonitorStatus' ),
bless( {
'monitor_status' => 'MONITOR_STATUS_DOWN',
'member' => bless( {
'address' => '127.0.0.0.1',
'port' => '8085'
}, 'Common::IPPortDefinition' )
}, 'LocalLB::PoolMember::MemberMonitorStatus' )
], 'LocalLB::PoolMember::MemberMonitorStatus[]' )
];
I'm not sure whether those member variables are public - I'm not familiar with the modules used - so this might well violate the encapsulation of the LocalLB::PoolMember::MemberMonitorStatus class. You should check before using.
for my $mms ( #{$VAR1->[0]} ) {
warn $mms->{monitor_status};
}
It would be better to check whether the MemberMonitorStatus class provides an accessor, and possibly an iterator for the member monitor status array.
The above was tested simply by pasting your Dumper output into a Perl script with the code of the for loop implemented based on eyeballing the data structure.
(edit: based on F5 webcentral docs in the Google cache, it may be that MemberMonitorStatus is a simple struct in the underlying code, exposed in the Perl as a class with two member variables - but no behaviour. If so, the above is probably OK.)