php - splitting a string with HTML by the first instance of a table cell - explode

I am checking HTML content on my page, and I've split it down so that the variable is left with this content:
">
<td>Oklahoma City</td>
<td>Oklahoma</td>
<td>OK</td>
<td>405</td>
<td>CST</td>
</tr>
</table>
<div id="
These are dynamic pages, so the data will always be different, but the layout is the same.
How can I get the value out of the second <td> if that HTML is in one variable (a string)?
It was a full page. I've used explode twice to remove everything above a div and everything below the last div id, so some open HTML tags are left over; I did not know how to get rid of those along the way to be left with just this:
<td>Oklahoma City</td>
<td>Oklahoma</td>
<td>OK</td>
<td>405</td>
<td>CST</td>
</tr>
</table>
Can you tell me how to get that out? I just need the second one, because it is the county and that is what I'm checking.
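Since the fragment is still HTML, one option is to hand it to an HTML parser and take the second cell, rather than exploding it further. Below is a minimal sketch of that idea, written in Python with the standard-library html.parser. The names CellCollector and nth_cell are made up for illustration, and the same approach should carry over to PHP's DOMDocument.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collects the text of every <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []       # finished cell texts
        self._current = None  # text parts of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._current = []

    def handle_data(self, data):
        # Only keep text that appears inside a <td>...</td> pair.
        if self._current is not None:
            self._current.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._current is not None:
            self.cells.append("".join(self._current).strip())
            self._current = None


def nth_cell(fragment, n):
    """Return the text of the n-th <td> (1-based), or None if there is none."""
    parser = CellCollector()
    parser.feed(fragment)
    parser.close()
    return parser.cells[n - 1] if 0 < n <= len(parser.cells) else None


# The leftover fragment from the question, stray pieces included.
fragment = '''">
<td>Oklahoma City</td>
<td>Oklahoma</td>
<td>OK</td>
<td>405</td>
<td>CST</td>
</tr>
</table>
<div id="
'''

print(nth_cell(fragment, 2))
```

The stray `">` at the start and the unfinished `<div id="` at the end do not matter here, because only text between a `<td>` and its `</td>` is collected.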

Related

DOM element not rendering html

I have built an HTML table that is generated dynamically from a data array. It is intended to be a menu of beers on tap in my bar. Each entry in the array contains the following data: [tap_number, brewery_image, beer_name, price_for_a_pint, price_for_a_pitcher], so the table is generated with 5 columns and 14 rows (with the current data). The image data consists of an HTML image tag, and similarly the beer_name is wrapped in <h3></h3>. All the HTML-tagged data is rendering as text. What have I done wrong? By the way, I'm using Materialize CSS for basic table styling. I have tried Bootstrap as well, with the same result. Here's a snippet of the HTML element that the JS generates:
<table class = "tabel">
<thead>
<tr>
...has <td>column name</td> X5
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td><img src='[url of image]' alt=''></td>
<td><h3>[Beer_Name]</h3></td>
<td>$6</td>
<td>$10</td>
</tr>
...a bunch more rows..
</tbody>
</table>
So data positions 1 and 2 should be rendered according to their HTML tags, but they just appear on the page as the text of their HTML. All data from the array is passed as textContent. Should I be using innerHtml? When I do, nothing at all renders. I cannot figure out what amateur mistake I've made, or whether Materialize is interfering. Thanks for any advice.
It was an amateur mistake: I was using innerHtml instead of innerHTML. Property names are case-sensitive, so assigning to innerHtml silently creates an unrelated property and nothing is rendered.

Trouble pinpointing child elements while using Mojo::DOM

I'm trying to extract text from an old vBulletin forum using WWW::Mechanize and Mojo::DOM.
vBulletin doesn't use HTML and CSS for semantic markup, and I'm having trouble using Mojo::DOM->children to get at certain elements.
These vBulletin posts are structured differently depending on their content.
Single message:
<div id="postid_12345">The quick brown fox jumps over the lazy dog.</div>
Single message quoting another user:
<div id="postid_12345">
<div>
<table>
<tr>
<td>
<div>Quote originally posted by Bob</div>
<div>Everyone knows the sky is blue.</div>
</td>
</tr>
</table>
</div>
I disagree with you, Bob. It's obviously green.
</div>
Single message with spoilers:
<div id="postid_12345">
<div class="spoiler">Yoda is Luke's father!</div>
</div>
Single message quoting another user, with spoilers:
<div id="postid_12345">
<div>
<table>
<tr>
<td>
<div>Quote originally posted by Fred</div>
<div class="spoiler">Yoda is Luke's father!</div>
</td>
</tr>
</table>
</div>
<div class="spoiler">No waaaaay!</div>
</div>
Assuming the above HTML and an array packed with the necessary post IDs:
for (@post_ids) {
    $mech->get($full_url_of_specific_forum_post);
    my $dom = Mojo::DOM->new($mech->content);
    my $div_id = '#postid_' . $_;    # at() takes a CSS selector, so an ID needs '#'
    say $dom->at($div_id)->children('div')->first;
    say $dom->at($div_id)->text;
}
Using $dom->at($div_id)->all_text gives me everything in an unbroken line, which makes it difficult to tell what's quoted and what's original in the post.
Using $dom->at($div_id)->text skips all of the child elements, so quoted text and spoilers are not picked up.
I've tried variations of $dom->at($div_id)->children('div')->first, but this gives me everything, including the HTML.
Ideally, I'd like to be able to pick up all the text in each post, with each child element on its own line, e.g.
POSTID12345:
+ Quote originally posted by Bob
+ Everyone knows the sky is blue.
I disagree with you, Bob. It's obviously green.
I'm new to Mojo and rusty with Perl. I wanted to solve this on my own, but after looking over the documentation and fiddling with it for a few hours, my brain is mush and I'm at a loss. I'm just not getting how Mojo::DOM and Mojo::Collection work.
Any help will be greatly appreciated.
Looking at the source of Mojo::DOM, the all_text method basically walks the DOM recursively and extracts all text. Use that source as a model for your own DOM-walking function. Its recursive function depends on returning a single string; yours might instead return an array with whatever context you need.
EDIT:
After some discussion on IRC, the web scraping example has been updated; it might help guide you. http://mojolicio.us/perldoc/Mojolicious/Guides/Cookbook#Web_scraping
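To make the "return an array with context" idea concrete, here is a rough sketch. It is written in Python with the standard-library html.parser rather than Mojo::DOM, so it shows the shape of the walk and is not Mojo code. The names PostTextWalker and post_lines are invented for the example, and it assumes the post's tags are balanced apart from the void elements listed.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class PostTextWalker(HTMLParser):
    """Collects each non-empty text node under one element, with its depth."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0    # 0 means "outside the target element"
        self.lines = []   # (depth below the target, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth and text:
            self.lines.append((self.depth - 1, text))


def post_lines(html, post_id):
    """Return one line per text node; text from child elements gets a '+ ' prefix."""
    walker = PostTextWalker("postid_%s" % post_id)
    walker.feed(html)
    walker.close()
    return [("+ " if depth else "") + text for depth, text in walker.lines]


html = """<div id="postid_12345">
<div>
<table>
<tr>
<td>
<div>Quote originally posted by Bob</div>
<div>Everyone knows the sky is blue.</div>
</td>
</tr>
</table>
</div>
I disagree with you, Bob. It's obviously green.
</div>"""

for line in post_lines(html, 12345):
    print(line)
```

Because each entry keeps its depth, the caller can tell quoted or spoiler text (depth greater than zero) from the poster's own text (depth zero), which is the distinction all_text loses.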
There is a module for flattening an HTML tree, HTML::Linear.
The explanation of the purpose of flattening an HTML tree is a bit long and boring, so here's a picture showing the output of the xpathify tool, bundled with that module:
As you see, the HTML tree nodes become a single key/value list, where the key is the XPath for that node and the value is the node's text attribute.
In a few keystrokes, this is how you use HTML::Linear:
#!/usr/bin/env perl
use strict;
use utf8;
use warnings;
use Data::Printer;
use HTML::Linear;
my $hl = HTML::Linear->new;
$hl->parse_file(q(vboard.html));
for my $el ($hl->as_list) {
    my $hash = $el->as_hash;
    next unless keys %{$hash};
    p $hash;
}

Tooltip javascript disrupts following form element when "title" attribute not given

I'm using a javascript tooltip provided by jqueryTOOLS to give tool tips within a form.
For some form elements that do not require a tooltip I want to leave the title string blank. However, if I do this it disrupts the subsequent elements of the form, almost as if it were treating them as tooltips: on mouseover of an element with no title string, the following element is moved to hover next to the field, and when the field loses focus that element disappears permanently.
My tooltip code:
$(function() {
    $("#myform :input").tooltip({
        position: "center right",
        offset: [-2, 10],
        effect: "fade",
        opacity: 0.7
    });
});
As you have probably guessed, this tooltip is based on the 'title' attribute of a field.
The library is included in the header with:
<script src="http://cdn.jquerytools.org/1.2.7/full/jquery.tools.min.js"></script>
Hopefully my description of events made sense!
Thanks in advance for any help
I fixed a problem giving me the same error, but in a different context:
I had a table with two columns, both columns containing tooltips (a DIV in the corresponding TD).
<table>
<tbody>
<tr>
<td><div class="tooltip"></div></td>
<td><div class="tooltip"></div></td>
</tr>
</tbody>
</table>
When opening the tooltip in the first column, the height and width of the second column TD change (as if that TD was the tooltip).
Adding an extra empty DIV element after the tooltip DIV in the TD of the first column solved the problem.
<table>
<tbody>
<tr>
<td>
<div class="tooltip"></div>
<div></div> <!-- extra empty div -->
</td>
<td><div class="tooltip"></div></td>
</tr>
</tbody>
</table>
I figured it out:
The trigger was defined for all inputs, so in the absence of a title attribute the plugin was using the next element as the tooltip.
It's just a bug on the part of jqueryTOOLS, but there is an easy solution:
replace
$("#myform :input")
with
$("#myform :input[title]")
Hope this helps someone else

Why are some of my tags being removed (GWT)?

I'm adding an element to a document with the following:
Element parent = getParentElement(); // Returns the right thing.
HTML html = new HTML();
html.setHTML( "<td>BLAH</td>" );
parent.appendChild( html.getElement() );
When I view the resulting document with Firebug, though, the parent's child looks like this:
<div class="gwt-HTML"> BLAH </div>
I can use Firebug to add in the <td> elements manually, and all my formatting applies, etc. Does anyone know why the HTML element seems to be removing my <td> tags?
It turns out that it's Firefox that's stripping it out. If I just use plain old JavaScript to create a div, or a tr, and set innerHTML to be <td>BLAH</td>, it still gets stripped. A couple of others have noticed this as well: http://www.jtanium.com/2009/10/28/firefox-gotcha-innerhtml-strips-td-tags/
If I use javascript to create a <table> tag, and add it to the DOM, I can then place the <td> in that. Of course, it helpfully creates a <tbody><tr> for me as well, so I'm not really getting back what I put in....

Nesting dynamically displayed components in wicket

I need to create the following kind of markup with Wicket using Ajax:
<table>
<tr>
<td><a>first</a></td>
</tr>
<tr>
<td>displayed/closed if first is clicked <a>open</a></td>
</tr>
<tr><td>this and following displayed/closed if above open is clicked</td></tr>
<tr><td>there may be any number of these</td></tr>
<tr>
<td>there may be any number of these as well <a>open</a></td>
</tr>
<tr>
<td>any number of these as well <a>second</a></td>
</tr>
</table>
How can I use ListViews or some other Wicket component to individually toggle open the "inner" rows of the table? I don't want to resort to rendering everything and toggling visibility; the rows should really be created on the server side only when an expand is requested. The markup should also be valid XHTML (which rules out an arbitrary container for row groups). I know I can use multiple tbodys, but that is good enough for only one level of nesting (no .... allowed).
From Lord Torgamus' comment, the Ajax tree sounds appropriate.