I'm fairly new to using Selenium Remote Driver and Perl. What I'd like to do is to have Selenium find all elements on a page using a partial match of text. Then store the full text of those elements into an array.
I've tried using:
@elements = $driver->find_elements("//tbody/tr[td[2]/div/span[2][contains(text(),'matching text')]]")->get_text;
However, this doesn't seem to work.
I've also tried:
@elements = $driver->find_elements("//tbody/tr[td[2]/div/span[2][contains(text(),'matching text')]]");
This does populate the array with webelements.
my @elements;
my @elementtext;
my $elementtext;
@elements = $driver->find_elements("//tbody/tr[td[2]/div/span[2][contains(text(),'matching text')]]");
foreach my $currentelement (@elements) {
    $elementtext = $driver->find_element($currentelement)->get_text();
    push @elementtext, $elementtext;
}
This causes Perl to generate an error because WebDriver can't find the element. Any ideas on what I'm doing wrong and how to fix it? I suspect that the problem is with the contents of the @elements array not actually being XPath elements.
Here is an example of the html:
<td>
<div class='cellContent'>Atlanta</div>
</td>
<td>
<div class='cellContent'>City</div>
</td>
<td>
<div class='cellContent'>Georgia</div>
</td>
<td class='sort_column'>
<div class='cellContent'>USA</div>
</td>
</tr>
<tr>
<td>
<div class='cellContent'>Joe Passenger</div>
</td>
<td>
<div class='cellContent'> <span>NFL</span>
<span>matching text: Atlanta.Falcons.team</span>
</div>
</td>
I want to get 'matching text: Atlanta.Falcons.team' stored into the array.
You can point directly to the span tag using this XPath:
//tbody/tr/td/div/span[contains(text(),'matching text')]
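Once the XPath selects the span elements themselves, the remaining step is to read the text of each matched element into a list. In the Perl code that would presumably mean calling get_text on each WebElement that find_elements returns, rather than passing the element back into find_element. Below is a minimal, stdlib-only Python sketch of the same select-then-collect idea, run against a trimmed copy of the sample table. It does not use Selenium; the `SAMPLE` string and the `matching_texts` helper are made up for illustration, and because ElementTree has no XPath contains(), the substring check is done in Python instead.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the sample markup (illustrative only).
SAMPLE = """
<table>
  <tbody>
    <tr>
      <td><div class='cellContent'>Atlanta</div></td>
      <td><div class='cellContent'>City</div></td>
    </tr>
    <tr>
      <td><div class='cellContent'>Joe Passenger</div></td>
      <td>
        <div class='cellContent'> <span>NFL</span>
          <span>matching text: Atlanta.Falcons.team</span>
        </div>
      </td>
    </tr>
  </tbody>
</table>
"""


def matching_texts(markup, needle):
    """Return the full text of every span whose text contains needle."""
    root = ET.fromstring(markup)
    # Same path as the XPath above, minus the contains() predicate.
    spans = root.findall("./tbody/tr/td/div/span")
    # Filter on the partial match, but keep the complete text of each span.
    return [s.text for s in spans if s.text and needle in s.text]


texts = matching_texts(SAMPLE, "matching text")
print(texts)
```

The point of the sketch is that the lookup happens once and the text is then read from the elements that were already found; nothing is looked up a second time.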
I am using Flutter and want to parse HTML using parser.dart
<div class="weather-item now"><!-- now -->
<span class="time">Now</span>
<div class="temp">19.8<span>℃</span>
<small>(23℃)</small>
</div>
<table>
<tr>
<th><i class="icon01" aria-label="true"></i></th>
<td>93%</td>
</tr>
<tr>
<th><i class="icon02" aria-label="true"></i></th>
<td>south 2.2km/h</td>
</tr>
<tr>
<th><i class="icon03" aria-label="true"></i></th>
<td>-</td>
</tr>
</table>
</div>
Using,
import 'package:html/parser.dart';
I want to get this data
Now,19.8,23,93%,south 2.2km/h
How can I do this?
Since you are using the html package, you can get the desired data with some HTML parsing and string processing (as needed). Here is a Dart sample; you can use the parseData function as is in your Flutter application:
main.dart
import 'package:html/parser.dart' show parse;
void main(List<String> args) {
  parseData();
}

void parseData() {
  var document = parse("""
<div class="weather-item now"><!-- now -->
<span class="time">Now</span>
<div class="temp">19.8<span>℃</span>
<small>(23℃)</small>
</div>
<table>
<tr>
<th><i class="icon01" aria-label="true"></i></th>
<td>93%</td>
</tr>
<tr>
<th><i class="icon02" aria-label="true"></i></th>
<td>south 2.2km/h</td>
</tr>
<tr>
<th><i class="icon03" aria-label="true"></i></th>
<td>-</td>
</tr>
</table>
</div>
""");
  // Declare a list of String to hold all the data.
  List<String> data = [];
  data.add(document.getElementsByClassName("time")[0].innerHtml);
  // Keep the temp element in a variable since it is used in several places.
  var temp = document.getElementsByClassName("temp")[0];
  // The current temperature is the text before the nested <span>.
  data.add(temp.innerHtml.substring(0, temp.innerHtml.indexOf("<span>")));
  // Strip the parentheses and the unit from the <small> value.
  data.add(temp.getElementsByTagName("small")[0].innerHtml.replaceAll(RegExp("[()℃]"), ""));
  // document.getElementsByTagName("td") would also work; this is just more specific.
  var rows = document.getElementsByTagName("table")[0].getElementsByTagName("td");
  // Map each element to its innerHtml and store every value except the "-" placeholder.
  rows.map((e) => e.innerHtml).forEach((element) {
    if (element != "-") {
      data.add(element);
    }
  });
  // Print the data to the console.
  print(data);
}
Here's the sample output -
[Now, 19.8, 23, 93%, south 2.2km/h]
Hope it helps!
This article would probably be of help. It specifically uses the html package parser.
Following the example in the package's readme you can easily obtain a Document object. With this object you can obtain specific Elements of the DOM with methods like getElementById, getElementsByClassName, and getElementsByTagName. From there you can obtain the innerHtml of each Element that is returned and put together the output string you desire.
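The general recipe described above (walk the parsed document, pick elements by tag or class, then assemble the output string from their text) can be sketched without the Dart html package. The following is a minimal illustration in Python using only the standard library's html.parser; the `WeatherParser` class and the `HTML` constant are hypothetical names, and the selection rules are hard-coded for this one snippet rather than being a general solution.

```python
from html.parser import HTMLParser

HTML = """<div class="weather-item now"><!-- now -->
<span class="time">Now</span>
<div class="temp">19.8<span>℃</span>
<small>(23℃)</small>
</div>
<table>
<tr><th><i class="icon01" aria-label="true"></i></th><td>93%</td></tr>
<tr><th><i class="icon02" aria-label="true"></i></th><td>south 2.2km/h</td></tr>
<tr><th><i class="icon03" aria-label="true"></i></th><td>-</td></tr>
</table>
</div>"""


class WeatherParser(HTMLParser):
    """Collect the time, both temperatures, and the non-placeholder cells."""

    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, class) pairs of the currently open elements
        self.data = []

    def handle_starttag(self, tag, attrs):
        self.stack.append((tag, dict(attrs).get("class", "")))

    def handle_endtag(self, tag):
        # Assumes well-formed markup without void elements, as in the sample.
        if self.stack:
            self.stack.pop()

    def handle_data(self, text):
        text = text.strip()
        if not text or not self.stack:
            return
        tag, cls = self.stack[-1]
        if (tag, cls) in (("span", "time"), ("div", "temp")):
            self.data.append(text)
        elif tag == "small":
            # "(23℃)" -> "23"
            self.data.append(text.strip("()℃"))
        elif tag == "td" and text != "-":
            self.data.append(text)


parser = WeatherParser()
parser.feed(HTML)
parser.close()
print(",".join(parser.data))  # prints: Now,19.8,23,93%,south 2.2km/h
```

Tracking the open elements on a stack is what lets the sketch tell the text directly inside the temp div (19.8) apart from the text of its nested span (the unit).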
In my application, I have to select the last td (which is an img) in a table. Can anyone help me with this?
HTML
<table>
<tbody>
<tr>
<td>
<td>
<a onclick="return confirm('Delete creative?')" href="delete.page?cid=47">
<a href="edit.page?id=47"><a href="?duplicateId=47">
<img title="Duplicate" src="/tracker/images/skin2/bolean.png">
</a>
</td>
</tr>
</tbody>
</table>
Implemented as below:
@browser.img(:src => "/tracker/images/skin2/bolean.png").click
@browser.img(:src => "/tracker/images/skin2/bolean.png").last.click
which is clicking on the first image.
When you do:
@browser.img(:src => "/tracker/images/skin2/bolean.png")
This returns the first matching element.
If you want to get all of the matching elements, you need to pluralize the method:
@browser.imgs(:src => "/tracker/images/skin2/bolean.png")
You will then get a collection of all images that have the specified src. You can then get the last one and click it, similar to how Željko did it for tds:
@browser.imgs(:src => "/tracker/images/skin2/bolean.png").last.click
Try this:
@browser.tds.last.click
Given this table structure:
<table id="tblBranchDetails">
<tr>
<td width="120px">Branch:</td>
<td id="branchName" class="branchData">
<label id="lblBranchName"></label>
<input type="text" id="txtBranchName" class="hideOnLoad editBranchInfo" />
</td>
</tr>
<tr>
<td>Address Line 1:</td>
<td id="branchAddress1" class="branchData">
<label id="lblAddress1"></label>
<input type="text" id="txtAddress1" class="hideOnLoad editBranchInfo" />
</td>
</tr>...
I'm trying to select the label and input in each td so I can clear the label's text and hide the input.
This gives me the text in the label (verified in the console):
$('table#tblBranchDetails tr td:nth-child(2):eq(0)').text();
So I know I can clear the label's text with ".text('')"
Having figured that out, I thought this would give me the value of the input:
$('table#tblBranchDetails tr td:nth-child(2):eq(1)').val()
But it gives me the value of the label in the next td. So obviously I'm using the :nth-child() and :eq() functions wrong.
What's the correct way to do what I'm trying to do?
I was nuking this.
Here's the solution:
$('#btnShowBranchEditBoxes').click(function() {
    $('#tblBranchDetails tr').find('label').fadeOut(200);
    $('#tblBranchDetails tr').find('.editBranchInfo').delay(200).fadeIn(200);
    // Replace the buttons
    $('table#tblBranchDetails input#btnShowBranchEditBoxes').fadeOut(200);
    $('table#tblBranchDetails input.btnEditBranch').delay(200).fadeIn(200);
});
I am using Mojo::UserAgent->new to fetch some XML which has the following format:
<row>
<td> content1 </td>
<td> content2 </td>
<td> content3 </td>
</row>
<row>
<td> content4 </td>
<td> content5 </td>
<td> content6 </td>
</row>
Is it possible to view the results like this:
content1,content2,content3
content4,content5,content6
Below is the query I am using, which gives different results:
$ua->get($url)->res->dom->at->(row)->children->each(sub {print "$_\t"})
Sure, that's absolutely possible and not hard with Mojo::Collection working behind the scenes.
Code
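use 5.010;      # enables say
use Mojo::DOM;

# replace this line by your existing $ua->get($url)->res->dom code
my $dom = Mojo::DOM->new(do { local $/ = undef; <DATA> });

# pretty-print rows
# (newer Mojolicious releases dropped pluck; ->map('text') is the equivalent there)
$dom->find('row')->each(sub {
    my $row = shift;
    say $row->children->pluck('text')->join(', ');
});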
Data
__DATA__
<row>
<td> content1 </td>
<td> content2 </td>
<td> content3 </td>
</row>
<row>
<td> content4 </td>
<td> content5 </td>
<td> content6 </td>
</row>
Output
content1, content2, content3
content4, content5, content6
Some comments
each evaluates a code ref for each element of a collection (which is what find returns).
pluck returns a Mojo::Collection object with the return values of the given method name (text in this case). This is just a fancy way to map simple stuff.
text automagically trims the element content.
join joins all elements of the Mojo::Collection object together, all td elements of a row in this case.
Your code doesn't even compile, but using at won't work anyway because it returns just the first matching DOM element, not all. You want to iterate all rows.
HTH!
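For readers who want to see the same iterate-rows, collect-cell-text, join pattern outside Mojolicious, here is a minimal Python sketch using only the standard library. It is not the Mojo API; `rows_as_csv` and the `XML` constant are hypothetical names, and the fragment is wrapped in a dummy root element because it has several top-level row elements.

```python
import xml.etree.ElementTree as ET

XML = """
<row>
<td> content1 </td>
<td> content2 </td>
<td> content3 </td>
</row>
<row>
<td> content4 </td>
<td> content5 </td>
<td> content6 </td>
</row>
"""


def rows_as_csv(fragment):
    """Return one comma-separated line per <row> element."""
    # Wrap the fragment so that it is well-formed XML with a single root.
    root = ET.fromstring("<rows>" + fragment + "</rows>")
    return [
        # Trim each cell's text, then join the cells of one row.
        ",".join((td.text or "").strip() for td in row.findall("td"))
        for row in root.findall("row")
    ]


lines = rows_as_csv(XML)
print("\n".join(lines))
```

As in the Perl version, the outer loop runs over all rows (the role of find and each) and the inner expression collects and joins the text of each cell (the role of pluck and join).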
I tried to run the following Perl script on the HTML further below. My problem is how to define the correct hash reference, with attribs that specify attributes of interest within my HTML <table> tag itself.
#!/usr/bin/perl
use strict; use warnings;
use HTML::TableExtract;
use YAML;
my $table = HTML::TableExtract->new(keep_html=>0, depth => 1, count => 1, br_translate => 0 );
$table->parse($html);
foreach my $row ($table->rows) {
    print join("\t", @$row), "\n";
}
sub cleanup {
    for ( @_ ) {
        s/\s+//;
        s/[\xa0 ]+\z//;
        s/\s+/ /g;
    }
}
I want to apply this code on the HTML-document you see further below.
My first approach is to do this with the columns method, but I am not able to figure out how to use it on the HTML file below. My intuition makes me think it should be something like the following (but my intuition is wrong):
foreach my $column ($table->columns) {
    print join("\t", @$column), "\n";
}
The HTML::TableExtract documentation doesn't shed much light (for me anyway).
I can see in the code of the module that the columns method belongs to HTML::TableExtract::Table, but I can't figure out how to use it. I appreciate any help.
Background:
I have a very small document containing tables that I want to parse with the HTML::TableExtract module. I am trying to search for keywords in the HTML so that I can use them for the attribs and print only the necessary data.
I searched CPAN but could not find out how to search the parsed result for particular keywords. One way to do it would be HTML::TableExtract; the other would be to parse with HTML::TokeParser, with which I have very little experience.
Either way, I need to do this parsing: I want to write the result of the parsed tables to a text file or, even better, store it in a database. The problem is that I can't find any way to search through the resulting parsed table and get the necessary data.
The HTML
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta name="GENERATOR" content="Microsoft FrontPage 3.0">
<link rel="stylesheet" href="jspsrc/css/bp_style.css" type="text/css">
<title>Weitere Schulinformationen</title>
</head>
<body class="bodyclass">
<div style="text-align:center;"><center>
<!-- <fieldset><legend> general information </legend>
-->
<br/>
<table border="1" cellspacing="0" bordercolordark="white" bordercolorlight="black" width="80%" class='bp_result_tab_info'>
<!-- <table border="0" cellspacing="0" bordercolordark="white" bordercolorlight="black" width="80%" class='bp_search_info'>
-->
<tr>
<td width="100%" colspan="2" class="ldstabTitel"><strong>data_one </strong></td>
</tr>
<tr>
<td width="27%"><strong>data_two</strong></td>
<td width="73%"> 116439
</td>
</tr>
<tr>
<td width="27%"><strong>official_description</strong></td>
<td width="73%">the name </td>
</tr>
<tr>
<td width="27%"><strong>name of the street</strong></td>
<td width="73%">champs elysee</td>
</tr>
<tr>
<td width="27%"><strong>number and town</strong></td>
<td width="73%"> 75000 paris </td>
</tr>
<tr>
<td width="27%"><strong>telefon</strong></td>
<td width="73%"> 000241 49321
</td>
</tr>
<tr>
<td width="27%"><strong>fax</strong></td>
<td width="73%"> 000241 4093287
</td>
</tr>
<tr>
<td width="27%"><strong>e-mail-adresse</strong></td>
<td width="73%"> <a href=mailto:1111116439@my_domain.org>1222216439@site.org</a>
</td>
</tr>
<tr>
<td width="27%"><strong>internet-site</strong></td>
<td width="73%"> <a href=http://www.thesite.org>http://www.thesite.org</td>
</tr>
<!--
<tr>
<td width="27%"> </td>
<td width="73%" align="right"><a href="schule_aeinfo.php?SNR=<? print $SCHULNR ?>" target="_blank">
[Schuldaten ändern] </a>
</tr>
</td> -->
<tr>
<td width="27%"> </td>
<td width="73%">the department</td>
</tr>
<tr>
<td width="100%" colspan=2><strong> </strong></td>
</tr>
<tr>
<td width="27%"><strong>number of indidviduals</strong></td>
<td width="73%"> 192</td>
<tr>
<td width="100%" colspan=2><strong> </strong></td>
</tr>
<!-- if (!fsp.isEmpty()){
ztext = " ";
int i = 0;
Iterator it = fsp.iterator();
while (it.hasNext()){
String[] zwert = new String[2];
zwert = (String[])it.next();
if (i==0){
if (zwert[1].equals("0")){
ztext = ztext+zwert[0];
}else{
ztext = ztext+zwert[0]+" mit "+zwert[1];
if (zwert[1].equals("1")){
ztext = ztext+" Schüler";
}else{
ztext = ztext+" Schülern";
}
}
i++;
}else{
if (zwert[1].equals("0")){
ztext = ztext+"<br> "+zwert[0];
}else{
ztext = ztext+"<br> "+zwert[0]+" mit "+zwert[1];
if (zwert[1].equals("1")){
ztext = ztext+" Schüler";
}else{
ztext = ztext+" Schülern";
}
}
}
}
-->
</table>
<!-- </fieldset> -->
<br>
</body>
</html>
Thanks for any and all help.
You need to provide something that uniquely identifies the table in question. This can be the content of its headers or the HTML attributes. In this case, there is only one table in the document, so you don't even need to do that. But, if I were to provide anything to the constructor, I would provide the class of the table.
Also, I do not think you want the columns of the table. The first column of this table consists of labels and the second column consists of values. To get the labels and values at the same time, you should process the table row-by-row.
#!/usr/bin/perl
use strict; use warnings;
use HTML::TableExtract;
use YAML;
my $te = HTML::TableExtract->new(
    attribs => { class => 'bp_result_tab_info' },
);
$te->parse_file('t.html');
for my $table ( $te->tables ) {
    print Dump $table->columns;
}
Output:
---
- 'data_one '
- data_two
- official_description
- name of the street
- number and town
- telefon
- fax
- e-mail-adresse
- internet-site
- á
- á
- number of indidviduals
- á
---
- ~
- "á116439\r\n "
- 'the name '
- champs elysee
- ' 75000 paris '
- "á000241 49321\r\n"
- "á000241 4093287\r\n"
- "á1222216439@site.org\r\n"
- áhttp://www.thesite.org
- the department
- ~
- á192
- ~
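The row-by-row processing recommended above, where each row yields a label and its value together, can be sketched as follows. This is a minimal illustration in Python with the standard library's html.parser, not HTML::TableExtract; `RowCollector`, `SAMPLE`, and `pairs` are hypothetical names, the sample is a trimmed copy of the table, and the sketch assumes every tr and td is explicitly closed.

```python
from html.parser import HTMLParser

# Trimmed copy of the table from the question (illustrative only).
SAMPLE = """
<table class='bp_result_tab_info'>
<tr><td colspan="2"><strong>data_one </strong></td></tr>
<tr><td><strong>data_two</strong></td><td> 116439
</td></tr>
<tr><td><strong>official_description</strong></td><td>the name </td></tr>
<tr><td><strong>name of the street</strong></td><td>champs elysee</td></tr>
</table>
"""


class RowCollector(HTMLParser):
    """Collect the cell texts of each table row as a list of strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Join the chunks and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


collector = RowCollector()
collector.feed(SAMPLE)
collector.close()

# Keep only the rows that have both a label and a value.
pairs = {row[0]: row[1] for row in collector.rows if len(row) == 2}
for label, value in pairs.items():
    print(f"{label}\t{value}")
```

Rows with a single cell, such as the title row, are skipped when building the label-to-value mapping, which is one way of printing only the necessary data.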
Finally, a word of advice: It is clear that you do not have much of an understanding of Perl (or HTML for that matter). It would be better for you to try to learn some of the basics first. This way, all you are doing is incorrectly copying and pasting code from one answer into another and not learning anything.