Associating Adjacent Elements with Perl's Web::Scraper - perl

Here is the example at hand:
#!/usr/bin/perl
use strict;
use Web::Scraper;
use Data::Dumper;
my $html = q[
<html>
<body>
<div class="mainContainer">
<div class="when">February 20, 2014</div>
<div class="name">Name 1</div>
<div class="desc">Desc 1</div>
<div class="when">February 21, 2014</div>
<div class="name">Name 2</div>
<div class="desc">Desc 2</div>
<div class="name">Name 3</div>
<div class="desc">Desc 3</div>
<div class="when">February 22, 2014</div>
<div class="name">Name 4</div>
<div class="desc">Desc 4</div>
</div>
</body>
</html>
];
my $scraper = scraper {
process ".when", "events[]" => scraper {
my $when = $_->content();
my $hash = {};
$hash->{$when}->{name} = "NAME";
$hash->{$when}->{desc} = "DESC";
return $hash;
};
};
my $result = $scraper->scrape($html);
print Dumper( $result );
What I am trying to do is associate the dates with the event details. As you can see, the divs are not nested, so it is not trivial (at least for me). Also, each event is composed of a name and a desc. I did not find a way to associate the adjacent elements into the desired structure using CSS selectors, so I figured I would need a custom subroutine to build those associations. What I would like to retrieve is something similar to the following:
[
'February 20, 2014' => [
{
'name' => 'Name 1',
'desc' => 'Desc 1'
}
],
'February 21, 2014' => [
{
'name' => 'Name 2',
'desc' => 'Desc 2'
},
{
'name' => 'Name 3',
'desc' => 'Desc 3'
}
],
'February 22, 2014' => [
{
'name' => 'Name 4',
'desc' => 'Desc 4'
}
]
]

You might be better served by getting the data first and then processing it after the scraper has run:
my $scraper = scraper {
process ".when", "dates[]" => "TEXT";
process ".name", "names[]" => "TEXT";
process ".desc", "desc[]" => "TEXT";
};
my $result = $scraper->scrape($html);
# Here you would start processing these
my @dates = @{ $result->{dates} };
my @names = @{ $result->{names} };
my @info = @{ $result->{desc} };
my %events;
for ( my $i = 0; $i < scalar @dates; $i++ ) {
my $date = $dates[$i];
my $name = $names[$i];
my $info = $info[$i];
if ( exists $events{$date} ) {
push @{ $events{$date} }, { 'name' => $name, 'desc' => $info };
}
else {
$events{$date} = [{ 'name' => $name, 'desc' => $info}];
}
}
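%events then holds the data you need. This assumes you still need it and that each date is followed by exactly one name and description; where a date has several events (as February 21 does in the sample HTML), pairing by index will mismatch them. Also, I haven't tested this.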
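A minimal sketch of the grouping step on its own, assuming the child divs of .mainContainer are already available as a flat list of [class, text] pairs in document order. How that list is produced with Web::Scraper is not shown here; the @rows data and the group_by_date name are made up for illustration. Each name starts a new event under the most recent date, and the following desc is attached to that event, so a date may own several events.

```perl
use strict;
use warnings;

# Hypothetical input: every child div of .mainContainer as [class, text],
# in document order (how this list is obtained is left open).
my @rows = (
    [ when => 'February 20, 2014' ],
    [ name => 'Name 1' ],
    [ desc => 'Desc 1' ],
    [ when => 'February 21, 2014' ],
    [ name => 'Name 2' ],
    [ desc => 'Desc 2' ],
    [ name => 'Name 3' ],
    [ desc => 'Desc 3' ],
    [ when => 'February 22, 2014' ],
    [ name => 'Name 4' ],
    [ desc => 'Desc 4' ],
);

# Attach each name/desc pair to the most recent date seen.
# Returns a hashref of date => [ events ] and the dates in first-seen order.
sub group_by_date {
    my @rows = @_;
    my ( %events, @order, $current );
    for my $row (@rows) {
        my ( $class, $text ) = @$row;
        if ( $class eq 'when' ) {
            $current = $text;
            push @order, $current unless exists $events{$current};
            $events{$current} ||= [];
        }
        elsif ( !defined $current ) {
            next;    # name/desc before any date: nothing to attach it to
        }
        elsif ( $class eq 'name' ) {
            push @{ $events{$current} }, { name => $text };
        }
        elsif ( $class eq 'desc' && @{ $events{$current} } ) {
            $events{$current}[-1]{desc} = $text;
        }
    }
    return ( \%events, \@order );
}

my ( $events, $order ) = group_by_date(@rows);
for my $date (@$order) {
    print "$date\n";
    print "  $_->{name}: $_->{desc}\n" for @{ $events->{$date} };
}
```

The dates are returned as a separate ordered list because a plain Perl hash does not preserve insertion order.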

update_post_meta for properties visibility?

I'm trying to work out how to display the list of imported properties. The problem is the metadata: after the import, the properties are not visible on the frontend, and I have to update every item manually. How can I update the metadata by editing the following code?
<?php
$args = array(
'posts_per_page' => $custom_property_items_amount,
'post_type' => 'property',
'orderby' => array(
'menu_order'=>'ASC',
'date' =>'DESC',
),
'offset' => ( max( 1, get_query_var( 'paged' ) ) - 1 ) * $custom_property_items_amount,
'ignore_sticky_posts' => 1,
'post_status' => array('publish','pending','draft','future','private'),
);
$data = new WP_Query( $args );
?>
<div class="<?php echo join( ' ', $wrapper_classes ) ?>">
<?php if ( $data->have_posts() ) :
while ( $data->have_posts() ): $data->the_post(); ?>
<?php ere_get_template( 'content-property.php', array(
'custom_property_image_size' => $custom_property_image_size,
'property_item_class' => $property_item_class
)); ?>
<?php endwhile;
else: ?>
<div class="item-not-found"><?php esc_html_e( 'Not found', 'essential-real-estate' ); ?></div>
<?php endif; ?>
<div class="clearfix"></div>
<?php
$max_num_pages = $data->max_num_pages;
ere_get_template( 'global/pagination.php', array( 'max_num_pages' => $max_num_pages ) );
wp_reset_postdata(); ?>
</div>
Solved! The problem was in this part of the code:
if (!empty($features)) {
foreach($features as $feature){
$tax_query[] = array(
'taxonomy' => 'property-feature',
'field' => 'slug',
'terms' => $feature
);
$parameters.=sprintf( __('Feature: <strong>%s</strong>; ', 'essential-real-estate'), $feature);
}
}
$args['meta_query'] = array(
'relation' => 'AND',
$meta_query
);
$tax_count = count($tax_query);
if ($tax_count > 0) {
$args['tax_query'] = array(
'relation' => 'AND',
$tax_query
);
}
After deleting this, all the items were displayed.

Paragraph breaks missing from shortcode output

I created a shortcode in WordPress to perform a query and display the content, but the line breaks in the content are being removed.
add_shortcode( 'resource' , 'Resource' );
function Resource($atts) {
$atts = shortcode_atts( array(
'category' => ''
), $atts );
$categories = explode(',' , $atts['category']);
$args = array(
'post_type' => 'resource',
'post_status' => 'publish',
'orderby' => 'title',
'order' => 'ASC',
'posts_per_page'=> -1,
'tax_query' => array( array(
'taxonomy' => 'category',
'field' => 'term_id',
'operator' => 'AND',
'terms' => $categories
) )
);
$string = '';
$query = new WP_Query( $args );
if( ! $query->have_posts() ) {
$string .= '<p>no listings at this time...</p>';
}
while( $query->have_posts() ){
$query->the_post();
$string .= '<div id="links"><div id="linksImage">' . get_the_post_thumbnail() . '</div>
<div id="linksDetails"><h1>'. get_the_title() .'</h1><p>' . get_the_content() . '</p>
<p>for more information CLICK HERE</div></div>';
}
wp_reset_postdata();
$output = '<div id="linksWrapper">' . $string . '</div>';
return $output;
}
Any suggestions on why this is happening and how to fix it? It only happens in the shortcode output; on regular pages the content displays correctly.
I found a solution after more searching:
function get_the_content_with_formatting ($more_link_text = '(more...)', $stripteaser = 0, $more_file = '') {
$content = get_the_content($more_link_text, $stripteaser, $more_file);
$content = apply_filters('the_content', $content);
$content = str_replace(']]>', ']]&gt;', $content);
return $content;
}
It works perfectly, so I thought I would share.

register View-Helper understanding issue

I tried to register a view helper for navigation. It is an example from olegkrivtsov; I chose it to learn more about the topic, and I also read the posts about it. I thought it would be really easy, but it doesn't work. A more experienced Zend developer will probably see the problem immediately.
First, the folder I use: is this the right folder, and what is the difference from the helpers folder in the Import module, for example?
Here is the content of menu.php:
<?php
namespace Application\View\Helper;
use Zend\View\Helper\AbstractHelper;
// This view helper class displays a menu bar.
class Menu extends AbstractHelper
{
// Menu items array.
protected $items = [];
// Active item's ID.
protected $activeItemId = '';
// Constructor.
public function __construct($items=[])
{
$this->items = $items;
}
// Sets menu items.
public function setItems($items)
{
$this->items = $items;
}
// Sets ID of the active items.
public function setActiveItemId($activeItemId)
{
$this->activeItemId = $activeItemId;
}
// Renders the menu.
public function render()
{
if (count($this->items)==0)
return ''; // Do nothing if there are no items.
$result = '<nav class="navbar navbar-default" role="navigation">';
$result .= '<div class="navbar-header">';
$result .= '<button type="button" class="navbar-toggle" ';
$result .= 'data-toggle="collapse" data-target=".navbar-ex1-collapse">';
$result .= '<span class="sr-only">Toggle navigation</span>';
$result .= '<span class="icon-bar"></span>';
$result .= '<span class="icon-bar"></span>';
$result .= '<span class="icon-bar"></span>';
$result .= '</button>';
$result .= '</div>';
$result .= '<div class="collapse navbar-collapse navbar-ex1-collapse">';
$result .= '<ul class="nav navbar-nav">';
// Render items
foreach ($this->items as $item) {
$result .= $this->renderItem($item);
}
$result .= '</ul>';
$result .= '</div>';
$result .= '</nav>';
return $result;
}
// Renders an item.
protected function renderItem($item)
{
$id = isset($item['id']) ? $item['id'] : '';
$isActive = ($id==$this->activeItemId);
$label = isset($item['label']) ? $item['label'] : '';
$result = '';
if(isset($item['dropdown'])) {
$dropdownItems = $item['dropdown'];
$result .= '<li class="dropdown ' . ($isActive?'active':'') . '">';
$result .= '<a href="#" class="dropdown-toggle" data-toggle="dropdown">';
$result .= $label . ' <b class="caret"></b>';
$result .= '</a>';
$result .= '<ul class="dropdown-menu">';
foreach ($dropdownItems as $item) {
$link = isset($item['link']) ? $item['link'] : '#';
$label = isset($item['label']) ? $item['label'] : '';
$result .= '<li>';
$result .= '<a href="'.$link.'">'.$label.'</a>';
$result .= '</li>';
}
$result .= '</ul>';
$result .= '</a>';
$result .= '</li>';
} else {
$link = isset($item['link']) ? $item['link'] : '#';
$result .= $isActive?'<li class="active">':'<li>';
$result .= '<a href="'.$link.'">'.$label.'</a>';
$result .= '</li>';
}
return $result;
}
}
I posted the whole example for anyone else who wants to use it.
Here is how I tried to register it in my module.config.php:
'view_helpers' => [
'factories' => [
View\Helper\Menu::class => InvokableFactory::class,
],
'aliases' => [
'mainMenu' => View\Helper\Menu::class
]
],
I placed it in layout.phtml:
<div class="collapse navbar-collapse">
<?php
$this->mainMenu()->setItems([
[
'id' => 'home',
'label' => 'Dashboard',
'link' => $this->url('home')
],
[
'id' => 'project',
'label' => 'Project',
'link' => $this->url("project", ['action'=>'index'])
],
[
'id' => 'unit',
'label' => 'Unit',
'dropdown' => [
[
'id' => 'add',
'label' => 'add Unit',
// 'link' => $this->url('unit', ['page'=>'contents'])
'link' => $this->url('unit', ['action'=>'add'])
],
[
'id' => 'help',
'label' => 'Help',
'link' => $this->url('home')
]
]
],
]);
echo $this->mainMenu()->render();
?>
</div>
With this code I replaced the former part, which came from the skeleton:
<div class="collapse navbar-collapse">
<?= $this->navigation('navigation')
->menu()
->setMinDepth(0)
->setMaxDepth(0)
->setUlClass('nav navbar-nav') ?>
I get this error message via browser:
Fatal error: Uncaught Error: Class 'Application\view\helper\Menu' not found in C:\wamp64\www\xyz\vendor\zendframework\zend-servicemanager\src\Factory\InvokableFactory.php
I'd really love to understand this because it might be really helpful in future, so any suggestion is appreciated.
Move the file Menu.php to the folder Application/src/Application/View/Helper.

Dynamic Country - city select on form

Newbie alert:
I am really enjoying Perl Catalyst; however, I have googled and can't find a solution for dynamic country/city selection. When I select a country from the dropdown, I would like the city list to change to that country's cities only. How can I achieve this in Perl with Catalyst and HTML::FormHandler?
PS:
The data comes from a MySQL database with a one-to-many relationship.
has_field 'city_id' => (
label => 'City',
type => 'Select',
empty_select => 'Choose city',
required => 1,
required_message => 'Please enter city.',
);
has_field 'country_code' => (
label => 'Country',
type => 'Select',
empty_select => 'Choose country',
required => 1,
required_message => 'Please enter your country.',
);
has_field 'submit' => (
type => 'Submit',
value => 'Save',
element_class => ['btn']
);
sub options_country_code {
my $self = shift;
return unless $self->schema;
my @countries = $self->schema->resultset('Country')->all;
my @options = map { { value => $_->country_code, label => $_->country_name } } @countries;
unshift @options, { value => 0, label => 'Choose Country' };
return @options;
}
__PACKAGE__->meta->make_immutable;
1;
I found exactly what I was looking for, thank you for the effort, I appreciate it very much.
The solution I needed is here
I am using this code as my base:
<html>
<head>
<script type="text/javascript">
function setmenu2() {
var menu1 = document.getElementById('menu1');
var sel = menu1.options[menu1.selectedIndex].text;
var menu2 = document.getElementById('menu2');
if (sel == "Foo") {
menu2.innerHTML = "<option>Foo-1</option>"
+"<option>Foo-2</option>";
} else {
menu2.innerHTML = "<option>Bar-1</option>"
+"<option>Bar-2</option>"
+"<option>Bar-3</option>";
}
}
</script>
</head>
<body>
<select id="menu1" onchange="setmenu2()">
<option>please select...</option>
<option>Foo</option>
<option>Bar</option>
</select>
<select id="menu2">
<option>- ?? -</option>
</select>
</body>
</html>
So far this is what I have:
<script type="text/javascript">
function setcities() {
var country_select = document.getElementById('country_code');
var sel = country_select.options[country_select.selectedIndex].value;
var city_select = document.getElementById('city_id');
if (sel == "AFG") {
city_select.innerHTML = "<option value='1' id='city_id.1'>Kabul</option>"
+"<option value='2' id='city_id.2'> Qandahar</option>";
} else if (sel == "AGO"){
city_select.innerHTML = "<option value='56' id='city_id.1'>Luanda</option>"
+"<option value='57' id='city_id.2'>Huambo</option>"
+"<option value='58' id='city_id.3'>Lobito</option>";
} else {
city_select.innerHTML = "<option>nothing</option>";
}
}
</script>
<select id="country_code" onchange="setcities()">
<option value="" id="country_code.0">please select...</option>
<option value="AFG" id="country_code.1">Afghanistan</option>
<option value="AGO" id="country_code.2">Angola</option>
<option value="AIA" id="country_code.3">Anguilla</option>
</select>
<select id="city_id">
<option>- ?? -</option>
</select>
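Since the cities live in the database, the hard-coded if/else branches could in principle be replaced by a lookup table generated on the server. Below is a rough sketch of that idea in Perl, assuming the city rows have already been fetched; the row shape (id, name, country_code) and the cities_by_country name are made up for illustration, since the real schema is not shown. It groups the cities by country code and serializes the result with JSON::PP, so the page's JavaScript could look up the selected country code in the resulting object.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical rows, shaped like the result of a city query.
# The real column names in the schema are not shown in the question.
my @cities = (
    { id => 1,  name => 'Kabul',    country_code => 'AFG' },
    { id => 2,  name => 'Qandahar', country_code => 'AFG' },
    { id => 56, name => 'Luanda',   country_code => 'AGO' },
    { id => 57, name => 'Huambo',   country_code => 'AGO' },
    { id => 58, name => 'Lobito',   country_code => 'AGO' },
);

# Build { country_code => [ { value => ..., label => ... }, ... ] }.
sub cities_by_country {
    my @rows = @_;
    my %map;
    for my $city (@rows) {
        push @{ $map{ $city->{country_code} } },
            { value => $city->{id}, label => $city->{name} };
    }
    return \%map;
}

my $map = cities_by_country(@cities);

# canonical() sorts the keys so the output is stable between runs.
my $json = JSON::PP->new->canonical->encode($map);
print "$json\n";
```

How the JSON reaches the page (embedded in the template or served from a separate action) is a design choice left open here.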

Unable to parse html tags with perl

I am trying to parse the following page using Perl:
http://www.inc.com/profile/fuhu
I am trying to get information such as Rank, 2013 Revenue and 2010 Revenue.
But when I fetch the data with Perl, I get the following, and the same appears in the page source:
<dl class="RankTable">
<div class="dtddwrapper">
<div class="dtdd">
<dt>Rank</dt><dd><%=rank%></dd>
</div>
</div>
<div class="dtddwrapper">
And when I check with Firebug, I get the following:
<dl class="RankTable">
<div class="dtddwrapper">
<div class="dtdd">
<dt>Rank</dt><dd>1</dd>
</div>
</div>
<div class="dtddwrapper">
My Perl code is as follows:
use WWW::Mechanize;
$url = "http://www.inc.com/profile/fuhu";
my $mech = WWW::Mechanize->new();
$mech->get( $url );
$data = $mech->content();
print $data;
As others have said, this is not plain HTML; there is some JS wizardry. The data comes from a dynamic JSON request.
The following script prints the rank and dumps everything else available in $data.
First it gets the ID of the profile and then it makes the appropriate JSON request, just like a regular browser.
use strict;
use warnings;
use WWW::Mechanize;
use JSON qw/decode_json/;
use Data::Dumper;
my $url = "http://www.inc.com/profile/fuhu";
my $mech = WWW::Mechanize->new();
$mech->get( $url );
if ($mech->content() =~ /profileID = (\d+)/) {
my $id = $1;
$mech->get("http://www.inc.com/rest/inc5000company/$id/full_list");
my $data = decode_json($mech->content());
my $rank = $data->{data}{rank};
print "rank is $rank\n";
print "\ndata hash value \n";
print Dumper($data);
}
Output:
rank is 1
data hash value
$VAR1 = {
'time' => '2014-08-22 11:40:00',
'data' => {
'ifi_industry' => 'Consumer Products & Services',
'app_revenues_lastyear' => '195640000',
'industry_rank' => '1',
'ifc_company' => 'Fuhu',
'current_industry_rank' => '1',
'app_employ_fouryearsago' => '49',
'ifc_founded' => '2008-00-00',
'rank' => '1',
'city_display_name' => 'Los Angeles',
'metro_rank' => '1',
'ifc_business_model' => 'The creator of an Android tablet for kids and an Adobe Air application that allows children to access the Internet in a parent-controlled environment.',
'next_id' => '25747',
'industry_id' => '4',
'metro_id' => '2',
'app_employ_lastyear' => '227',
'state_rank' => '1',
'ifc_filelocation' => 'fuhu',
'ifc_url' => 'http://www.fuhu.com',
'years' => [
{
'ify_rank' => '1',
'ify_metro_rank' => '1',
'ify_industry_rank' => '1',
'ify_year' => '2014',
'ify_state_rank' => '1'
},
{
'ify_industry_rank' => undef,
'ify_year' => '2013',
'ify_rank' => '1',
'ify_metro_rank' => undef,
'ify_state_rank' => undef
}
],
'ifc_twitter_handle' => 'NabiTablet',
'id' => '22890',
'app_revenues_fouryearsago' => '123000',
'ifc_city' => 'El Segundo',
'ifc_state' => 'CA'
}
};
The <%=rank%> placeholder is part of a script template, not HTML. Firebug shows the page after the script has run, so you see the filled-in value there, whereas the page source still contains the placeholder. That is why plain HTML parsing won't work here.
In cases like this, the variables (rank, for example) are usually fetched from the server with an XHR call, so check the XHR calls in Firebug and look at the responses.