XML::FeedPP and accessing media:* property - perl

I'm trying to parse a YouTube XML feed and want to access certain media elements in it. I can access basic elements such as title and link, but accessing anything under media:group returns an empty string.
use XML::FeedPP;
my $feed = XML::FeedPP->new("https://www.youtube.com/feeds/videos.xml?channel_id=UCzJuUAme9EABE1quatA8z-Q");
foreach my $item ( $feed->get_item() ) {
    print $item->get("media:group") . "\n";
}
Any suggestions on how I can access media:group and its child elements?

Inspecting the $item objects in that feed with Data::Printer shows that the objects know about the media:group and other things in the media: namespace.
use strict;
use warnings;
use Data::Printer;
use XML::FeedPP;
my $feed = XML::FeedPP->new("https://www.youtube.com/feeds/videos.xml?channel_id=UCzJuUAme9EABE1quatA8z-Q");
foreach my $item ( $feed->get_item() ) {
p $item;
}
__END__
XML::FeedPP::Atom::Atom10::Entry {
    Parents       XML::FeedPP::Atom::Common::Entry
    public methods (6) : category, description, get_pubDate_native, link, pubDate, title
    private methods (0)
    internals: {
        author {
            name   "Fun to Origami",
            uri    "http://www.youtube.com/channel/UCzJuUAme9EABE1quatA8z-Q"
        },
        id   "yt:video:332UeGpfY3E",
        link {
            -href   "http://www.youtube.com/watch?v=332UeGpfY3E",
            -rel    "alternate"
        },
        media:group {
            media:community {
                media:starRating {
                    -average   4.56,
                    -count     9,
                    -max       5,
                    -min       1
                },
                media:statistics {
                    -views   940
                }
            },
            media:content {
                -height   390,
                -type     "application/x-shockwave-flash",
                -url      "https://www.youtube.com/v/332UeGpfY3E?version=3",
                -width    640
            },
            media:description   "...",
            media:thumbnail {
                -height   360,
                -url      "https://i4.ytimg.com/vi/332UeGpfY3E/hqdefault.jpg",
                -width    480
            },
            media:title   "Origami Pteranodon : Paper Dinosaur Tutorial"
        },
        published   "2015-02-20T01:22:36+00:00",
        title       "Origami Pteranodon : Paper Dinosaur Tutorial",
        updated     "2016-02-15T13:42:07+00:00",
        yt:channelId   "UCzJuUAme9EABE1quatA8z-Q",
        yt:videoId     "332UeGpfY3E"
    }
}
Source: YouTube; omissions mine.
So the most obvious way would be to just access the data structure directly. Of course you don't want to do that, as it's bad style, and the underlying implementation might change.
foreach my $item ( $feed->get_item() ) {
    say $item->{'media:group'}->{'media:content'}->{'-height'};
}
__END__
390
...
If this is a run-once-and-forget script, stop here.
Now the fun part begins. The $item is an XML::FeedPP::Atom::Atom10::Entry, which is an XML::FeedPP::Item, which is an XML::FeedPP::Element. That class has a get method. It looks as if it should have no problem dealing with the : part, but it returns undef.
This module seems to be extensively tested. There is an 11_media.t that actually plays around with the media: namespace. The examples there not only apparently work (otherwise I could not have installed the module), but they are also a bit different: the media: element there is not very deep, just one tag with attributes.
Feel free to take the research further from this point.
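If you do go the direct-access route, a defensive helper keeps the hash walking in one place. A minimal sketch, where deep_get is a made-up name and the hashref below just mimics the dump above rather than a real feed item:

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: walk a nested hash-of-hashes (the shape in which
# XML::FeedPP::Item stores namespaced elements internally) along a path
# of keys, returning undef as soon as any step is missing.
sub deep_get {
    my ($node, @path) = @_;
    for my $key (@path) {
        return undef unless ref $node eq 'HASH' && exists $node->{$key};
        $node = $node->{$key};
    }
    return $node;
}

my $item = {    # stand-in for one $feed->get_item() entry's internals
    'media:group' => {
        'media:content' => { '-height' => 390, '-width' => 640 },
    },
};

say deep_get($item, 'media:group', 'media:content', '-height') // 'missing';   # 390
say deep_get($item, 'media:group', 'no:such',       '-height') // 'missing';   # missing
```

This at least concentrates the "reach into the internals" hack in a single spot, so a future change in the underlying structure breaks only one function.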

Related

Perl get all values from hashes inside of array

I'm really struggling with Perl and need to solve this. I have a REST API response that I converted from JSON; Dumper shows something like this:
$VAR1 = [
  {
    "id":"abc",
    "type":"info",
    "profile":
      {"name":"Adam",
       "description":"Adam description"}
  },
  {
    "id":"efg",
    "type":"info",
    "profile":
      {"name":"Jean",
       "description":"Jean description"}
  },
  {
    "id":"hjk",
    "type":"info",
    "profile":
      {"name":"Jack",
       "description":"Jack description"}
  },
]
What I need is to iterate over each "name" and check whether the value is Jean. I wanted to iterate over the hashes inside the array, but each time it only stores the first hash, not all of them.
What I'm trying (and failing) with:
# my json gather, Dumper is shown above.
my $result_json = JSON::from_json($rest->GET( $host, $headers )->{_res}->decoded_content);
# I've tried many things to get all hashes, but either error, or single hash, or single value:
my $list  = $result_json->[0];
my $list2 = $result_json->[0]->{'profile'};
my $list3 = @result_json->[0];
my $list4 = @result_json->[0]->{'profile'};
my $list5 = @result_json;
my $list5 = @result_json->{'profile'}; # this throws error
my $list6 = @result_json->[0]->{'profile'}->{'name'};
my $list7 = $result_json->[0]->{'profile'}->{'name'};
# and maybe more combinations... it's just an example.
foreach my $i (<lists above>){
    print $i;
};
Any idea how to set it up properly and iterate over each "name"?
Assuming that the call to JSON::from_json shown in the code sample is indeed given the JSON string shown as Dumper output,† $result_json is an array reference, so iterate over its elements (hash references):
foreach my $hr (@{ $result_json }) {
    say "profile name: ", $hr->{profile}{name};
}
† That last comma in the supposed Dumper's output can't actually be there, so I removed it to use the rest as a sample JSON for testing
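To then filter for Jean specifically, a grep over the dereferenced array works well. A minimal self-contained sketch: the JSON is inlined here via core JSON::PP so the snippet runs standalone, with field values taken from the question:

```perl
use strict;
use warnings;
use feature 'say';
use JSON::PP;    # core module; exports decode_json

my $result_json = decode_json(<<'JSON');
[
  { "id":"abc", "type":"info", "profile": { "name":"Adam", "description":"Adam description" } },
  { "id":"efg", "type":"info", "profile": { "name":"Jean", "description":"Jean description" } }
]
JSON

# Keep only the elements whose profile name is Jean.
my @matches = grep { $_->{profile}{name} eq 'Jean' } @{ $result_json };

say $_->{id} for @matches;    # efg
```

The same grep idiom scales to any field: swap the condition in the block for whatever test you need.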

What to do when an API has no documentation

I'm currently trying to use a REST API to insert data from PowerShell into a Jira custom field made by a certain plugin (Easy Links for JIRA). Unfortunately there's no documentation on the required syntax. Does anyone who has run into this plugin know the commands/syntax I'll need to use its REST API? (It's quite a small plugin, so I'll be surprised if anyone else has seen it.) Failing that, does anyone have advice on discovering how to use APIs with no or bad documentation, i.e. some standard method of making an API return a list of commands and syntax (preferably from PowerShell)?
I've tried contacting the developer but haven't heard back from them.
The code I'm using is here, if that's helpful:
function Test-Upload(){
    Param()
    Process{
        $data=@"
{
    "fields":
    {
        "project":
        {
            "key": "CCWASSET"
        },
        "summary": "Testing Linked Field",
        "issuetype":
        {
            "name": "Asset"
        },
        "description" : "Testing Linked Field"
    },
    "update":{
        "customfield_10500":[
            {
                "set":{
                    "type":{
                        "name":"Asset PO",
                        "inward":"Asset",
                        "outward":"Purchase Order"
                    },
                    "outwardIssue":{
                        "key":""
                    }
                }
            }
        ]
    }
}
"@
        return Jira-WebRequest -data $data
    }
}
function Jira-WebRequest(){
    Param(
        [Parameter(mandatory=$false)]$data,
        [Parameter(mandatory=$false)]$requesttype="issue",
        [Parameter(mandatory=$false)]$method="POST",
        [Parameter(mandatory=$false)]$ContentType='application/json'
    )
    Process{
        $path = $("/rest/api/2/$requesttype/")
        $Uri = ""
        [URI]::TryCreate([URI]::new($Settings.Jira.URL),"$path",$([ref]$Uri))
        $Params = @{
            ContentType = $ContentType
            Body        = $data # $(@{"vlan_id"=$vlanID;"port_id"="$portID";"port_mode"="$portMode"} | ConvertTo-JSON)
            Method      = $method
            URI         = $uri.AbsoluteUri
            Headers     = $JiraHeaders
            #WebSession = $Session
        }
        try{
            $result = Invoke-RestMethod @Params -Verbose
            return $result
        } Catch [System.Net.WebException] {
            $exception   = $_.Exception
            $respstream  = $exception.Response.GetResponseStream()
            $sr          = New-Object System.IO.StreamReader $respstream
            $ErrorResult = $sr.ReadToEnd()
            return $ErrorResult
        }
    }
}
It doesn't matter whether the custom field was made by a plugin or by you; in the end it's a custom field. Since the plugin doesn't have good documentation, I would recommend you stay with the Atlassian documentation and update/edit your issues based on the official REST API. You can see Atlassian's examples with custom fields here.
All you need to do is figure out the custom field's ID, which is easy: go to the admin panel, click on the custom field, and read the ID from the URL, if you don't want to go to the database.
I know it's a pain to work with an API without documents, but at least you can work around it this way.
Managed to get it working for this particular plugin. I had to include the ID field as well as the Key of the issue that I want to link to. I'd still be interested to hear if anyone has tips for working with APIs that have no or bad documentation.
"update":{
    "customfield_10500":[
        {
            "set":{
                "type":{
                    "name":"Asset PO",
                    "inward":"Asset",
                    "outward":"Purchase Order"
                },
                "outwardIssue":{
                    "key":"",
                    "ID":""
                }
            }
        }
    ]
}
}

perl Catalyst REST action not working

So I'm writing a simple API server, and obviously C::C::R is the right answer. I have an action to get a "list of thingies" working fine:
package stuff::Controller::Thingy;
use Moose;
use namespace::autoclean;
BEGIN { extends 'Catalyst::Controller::REST'; }
__PACKAGE__->config(namespace => '');
sub thingy : Local : ActionClass('REST') { }
sub thingy_GET :Args(0) :Path("/thingy") {
}
This works great. Also yay HashrefInflator and a JSON view. Makes the code really small.
But! If I add a second action to get a single thingy, my original action stops working:
sub thingy_GET :Args(1) :Path("/thingy") {
    my ( $self, $c, $thingy_id ) = @_;
}
When plackup starts, I get:
[debug] Loaded Path actions:
.-------------------------------------+--------------------------------------.
| Path | Private |
+-------------------------------------+--------------------------------------+
| /... | /default |
| /bar/thingy/... | /bar/thingy |
| /thingy/* | /thingy_GET |
| /thingy/... | /thingy |
'-------------------------------------+--------------------------------------'
If I call /thingy I get:
{
"data": []
}
Ideas?
Your second thingy_GET function needs a different name: perhaps thingy_GET_list and thingy_GET_resource, or whatever you want.
Subs cannot have the same name; a different attribute is not enough, and Sub::Multi does not help here.
Use __PACKAGE__->config(action => { … to configure the actions instead.
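Plain Perl makes the underlying behavior easy to see: defining a second sub with the same name only emits a warning, and the later definition silently wins. A small sketch with no Catalyst involved; the eval'd string just delays compilation so we can capture the warning:

```perl
use strict;
use warnings;

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };    # collect warnings

# Two subs with the same name: the second definition replaces the first,
# producing only a "Subroutine ... redefined" warning.
eval q{
    sub greet { "first" }
    sub greet { "second" }
};

print greet(), "\n";       # second
print $warnings[0] // '';  # the "Subroutine greet redefined ..." warning
```

This is why the broken controller still loaded cleanly: the redefinition is a warning scrolled past in the plackup output, not a fatal error.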
OK, that was (relatively) simple: don't call the subs the same thing. My screen is too small, so I missed the:
Subroutine thing_GET redefined at lib/foo/Controller/Thingy.pm line 40.
And yet the docs, as far as I read them, make no mention of this.
Fortunately #catalyst yelled at me, and I scrolled back up the plackup output.

perl - searching in list of objects which are an accessor of another object

I am a Perl OO beginner and am encountering a design challenge. I hope you can give me some hints toward an elegant solution. I am working with the Mouse object system here.
For a minimal example, let's say I have a User object. A user has a name.
package User;
use Mouse;

has "name" => (
    is  => "rw",
    isa => "Str|Undef",
);
Then I have a UserCache object, which gets a list of all users (from an LDAP server). You could say this is a "has-a" relationship between the cache and the User.
package UserCache;
use Mouse;

has "users" => (
    is      => 'rw',
    isa     => 'ArrayRef|Undef',
    default => sub { [] },
);
I store this list of Users as an Array of User-Objects in the accessor of the User-Cache.
my $cache = UserCache->new();

foreach my $entry ( $ldap->searchGetEntries() ) {
    my $user = User->new();
    $user->name( $entry->get_value('userdn') );
    push @{ $cache->users }, $user;
}
Now this is where my problem comes in. If I want to find a User object with specific attributes (e.g. a user named John), I have to loop over the whole array of User objects and query each one for its name. Given a list of names, this becomes a really inefficient process.
foreach my $user ( @{ $cache->users } ) {
    if ( $user->name eq 'John' ) {
        # do something with John
    }
    ...
}
Is there a way of storing lists of objects in other objects such that I can search them efficiently? Something like $cache->get_users->get_name('John') that returns the object I need?
You don't really have to write the UserCache class yourself. Instead, use CHI to cache users under the key you want to use for lookups. If you want, you can wrap your cache class to abstract away the specific cache implementation.
Also, you have this:
push @{ $cache->users }, $user;
which leaks implementation details. Instead, your UserCache object should have something like a save_user method, so calling code does not depend on the internals.
$cache->save_user( $user );
For Moose objects, you get Moose::Meta::Attribute::Native::Trait::Array; for Mouse, you get MouseX::NativeTraits::ArrayRef.
No. At least not universally. You can of course build indexes for common things, or cache searches once you have done them.
Lookups are best implemented as hashes. Those could be attached to the UserCache object. Something like:
my @users = $cache->find( name => 'John' );
That would internally map to a hashref with search fields.
package UserCache;
# ...

has _search_index => (
    is      => 'ro',
    isa     => 'HashRef',
    default => sub { {} },
);
And the hash reference would look something like this:
{
    name => {
        John => [
            User->new( name => 'John', last_name => 'Smith' ),
            User->new( name => 'John', last_name => 'Wayne' ),
            User->new( name => 'John', last_name => 'Bon Jovi' ),
        ],
        James => [ ... ],
    },
    id => {
        # ...
    },
}
But again, you'd have to build those indexes, so you still need to do each lookup once. I think the lookup should be done inside UserCache and its result stored there too.
sub find {
    my ($self, $key, $value) = @_;

    # get operation
    return @{ $self->_search_index->{$key}->{$value} }
        if exists $self->_search_index->{$key}->{$value};

    # set operation
    foreach my $user ( @{ $self->users } ) {
        push @{ $self->_search_index->{$key}->{$value} }, $user
            if $user->$key eq $value;
    }

    return @{ $self->_search_index->{$key}->{$value} };
}
This is a very naive implementation and it doesn't support multiple lookups, but it's a start.
Note that if you have a lot of users and a lot of indexes, the data structure might become large.
To make it easier, Moose's built-in traits might be helpful. If you want a stronger cache behavior, look at CHI.
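As a standalone illustration of the index idea, here is the same hash-of-arrayrefs built in plain Perl, with plain hashrefs standing in for the Mouse-based User objects:

```perl
use strict;
use warnings;
use feature 'say';

# Plain hashrefs stand in for the User objects from the question.
my @users = (
    { name => 'John', last_name => 'Smith' },
    { name => 'John', last_name => 'Wayne' },
    { name => 'Jean', last_name => 'Grey'  },
);

# Build the name index once: O(n) up front, O(1) per lookup afterwards.
my %by_name;
push @{ $by_name{ $_->{name} } }, $_ for @users;

my @johns = @{ $by_name{John} // [] };
say scalar @johns;                 # 2
say $_->{last_name} for @johns;    # Smith, then Wayne
```

The `// []` fallback keeps a lookup for an unknown name from autovivifying an entry or blowing up on undef, the same edge the find method above has to handle.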

Perl mechanize Find all links array loop issue

I am currently attempting to create a Perl webspider using WWW::Mechanize.
What I am trying to do is create a web spider that crawls the whole site at a URL (entered by the user) and extracts all of the links from every page on the site.
But I have a problem: how do I spider the whole site to get every link, without duplicates?
What I have done so far (the part I'm having trouble with, anyway):
foreach (@nonduplicates) {  # array contains URLs like www.tree.com/contact-us, www.tree.com/varieties...
    $mech->get($_);
    my @list = $mech->find_all_links(url_abs_regex => qr/^\Q$urlToSpider\E/);  # find all links on this page that start with http://www.tree.com

    # NOW THIS IS WHAT I WANT IT TO DO AFTER THE ABOVE (IN PSEUDOCODE), BUT CAN'T GET WORKING
    # foreach (@list) {
    #     if $_ is already in @nonduplicates
    #         then do nothing because that link has already been found
    #     } else {
    #         append the link to the end of @nonduplicates so that, if it has not been crawled for links already, it will be
How would I be able to do the above?
I am doing this to try and spider the whole site to get a comprehensive list of every URL on the site, without duplicates.
If you think this is not the best/easiest method of achieving the same result I'm open to ideas.
Your help is much appreciated, thanks.
Create a hash to track which links you've seen before, and put any unseen ones onto @nonduplicates for processing:
$| = 1;

my $scanned = 0;
my @nonduplicates = ( $urlToSpider );                    # Add the first link to the queue.
my %link_tracker = map { $_ => 1 } @nonduplicates;       # Keep track of the links we've found already.

while (my $queued_link = pop @nonduplicates) {
    $mech->get($queued_link);
    my @list = $mech->find_all_links(url_abs_regex => qr/^\Q$urlToSpider\E/);

    for my $new_link (@list) {
        # Add the link to the queue unless we already encountered it.
        # Increment so we don't add it again.
        push @nonduplicates, $new_link->url_abs() unless $link_tracker{$new_link->url_abs()}++;
    }

    printf "\rPages scanned: [%d] Unique Links: [%s] Queued: [%s]", ++$scanned, scalar keys %link_tracker, scalar @nonduplicates;
}

use Data::Dumper;
print Dumper(\%link_tracker);
use List::MoreUtils qw/uniq/;
...
my @list = $mech->find_all_links(...);
my @unique_urls = uniq( map { $_->url } @list );
Now @unique_urls contains the unique URLs from @list.