PuppeteerJS - How can I scrape text content from a td element based on the text of the adjacent td?

I am attempting to scrape a link from a td cell that sits next to another td labeling the type or description of the link, using Puppeteer. No classes or ids distinguish these td cells; only their text content does:
<tr>
<td scope="row">1</td>
<td scope="row">10-Q</td>
<td scope="row">nflx-093018x10qxdoc.htm</td>
<td scope="row">10-Q</td>
<td scope="row">1339833</td>
</tr>
<tr class="blueRow">
<td scope="row">2</td>
<td scope="row">EXHIBIT 31.1</td>
<td scope="row">nflx311_q32018.htm</td>
<td scope="row">EX-31.1</td>
<td scope="row">14914</td>
</tr>
<tr>
<td scope="row">3</td>
<td scope="row">EXHIBIT 31.2</td>
<td scope="row">nflx312_q32018.htm</td>
<td scope="row">EX-31.2</td>
<td scope="row">14553</td>
</tr>
<tr class="blueRow">
<td scope="row">4</td>
<td scope="row">EXHIBIT 32.1</td>
<td scope="row">nflx321_q32018.htm</td>
<td scope="row">EX-32.1</td>
<td scope="row">12406</td>
</tr>
I want the link in the td that follows the td containing '10-Q'.

XPath expressions
This is where XPath expressions are great:
//td[contains(., '10-Q')]/following-sibling::td[1]/a[1]
This XPath expression queries for a td element containing the text 10-Q, then takes the td element immediately following it and returns the first link (a) inside. If you want the cell text to match exactly rather than merely contain 10-Q, start the expression with //td[text()='10-Q'] instead.
Usage within puppeteer
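To get the element with Puppeteer, use the page.$x function. To extract information (like href) from the queried node, use page.evaluate. (Recent Puppeteer releases appear to have dropped page.$x; there, the same expression can be passed to page.$$ with an xpath/ prefix.)
Putting it all together, the code looks like this: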
const [linkHandle] = await page.$x("//td[contains(., '10-Q')]/following-sibling::td[1]/a[1]");
const address = await page.evaluate(link => link.href, linkHandle);
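If the label text varies, the expression can be built by a small helper. The sketch below is only an illustration: xpathLiteral and adjacentLinkXPath are hypothetical names, not part of Puppeteer, and the quoting logic assumes XPath 1.0, which has no escape character for quotes inside string literals.

```javascript
// Hypothetical helper: turn arbitrary text into a valid XPath 1.0 string literal.
function xpathLiteral(text) {
  if (!text.includes("'")) return `'${text}'`;
  if (!text.includes('"')) return `"${text}"`;
  // Text holds both quote kinds: stitch the pieces together with concat().
  const parts = text.split("'").map(part => `'${part}'`);
  return `concat(${parts.join(`, "'", `)})`;
}

// Hypothetical helper: XPath for the first link in the td that follows
// the td matching `label` (substring match by default, exact on request).
function adjacentLinkXPath(label, { exact = false } = {}) {
  const literal = xpathLiteral(label);
  const condition = exact ? `text()=${literal}` : `contains(., ${literal})`;
  return `//td[${condition}]/following-sibling::td[1]/a[1]`;
}

console.log(adjacentLinkXPath('10-Q'));
// //td[contains(., '10-Q')]/following-sibling::td[1]/a[1]
```

The returned string can be handed to page.$x in place of the hard-coded expression, for example page.$x(adjacentLinkXPath('10-Q', { exact: true })).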

You can do this with vanilla JavaScript:
// find all tr elements
[...document.querySelectorAll('tr')]
// check which one of them includes the word
.find(e=>e.innerText.includes('10-Q'))
// get the link inside
.querySelector('a')
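With Puppeteer's $$eval, this can be simplified. Return the href rather than the element itself, since a DOM node cannot be serialized back to Node:
page.$$eval('tr', rows => rows.find(e => e.innerText.includes('10-Q')).querySelector('a').href)
Or with page.evaluate: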
page.evaluate(()=> {
// find all tr elements
return [...document.querySelectorAll('tr')]
// check which one of them includes the word
.find(e=>e.innerText.includes('10-Q'))
// get the link inside
.querySelector('a')
// do whatever you want to do with this
.href
})
Readable solution.
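Both variants above throw if no row contains the text, because find then returns undefined. A more defensive sketch follows; findLinkHref is a hypothetical helper name, and it assumes the target row really contains an a element (the sample markup in the question shows only file names).

```javascript
// Hypothetical helper: href of the first link in the first row whose text
// includes `label`, or null when no such row or link exists.
function findLinkHref(rows, label) {
  const row = rows.find(r => (r.innerText || r.textContent || '').includes(label));
  const link = row ? row.querySelector('a') : null;
  return link ? link.href : null;
}

// With Puppeteer, the function is serialized and run inside the page;
// extra arguments after the function are passed through to it:
//   const href = await page.$$eval('tr', findLinkHref, '10-Q');
```

A null result then signals "not found" to the caller instead of surfacing as an exception from inside the page.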

Related

use getByRole to select gridcell with particular description

I have some tabular data with the headers ('Type', 'Name'). I would like to select all items in column 'name', to check if they contain a search string. Each item in that column has the role 'gridcell', and the description 'Name'. See attached image1.
getByRole('gridcell', {description: /name/i}) doesn't work. I've looked through the typescript declarations of the queries and nothing seems helpful. How can one accomplish this?
Use getAllByRole('cell', {description: /name/i}) to retrieve an array containing the cells in column Name.
Check the array contains a certain value using toContain(item).
https://jestjs.io/docs/expect#tocontainitem
An example (using React):
'App.js'
function App() {
return (
<div className='App'>
<table>
<thead>
<tr>
<td id='name'>Name</td>
</tr>
</thead>
<tbody>
<tr>
<td aria-describedby='name'>J Blogs</td>
</tr>
<tr>
<td aria-describedby='name'>J Doe</td>
</tr>
<tr>
<td aria-describedby='name'>J Hancock</td>
</tr>
</tbody>
</table>
</div>
);
}
export default App;
'App.test.js'
import { render, screen } from '@testing-library/react';
import App from './App';
test('retrieves all cells described by name', () => {
render(<App />);
const cells = screen.getAllByRole('cell', {description: /name/i});
const cellValues = cells.map(cell => cell.textContent);
expect(cellValues).toContain('J Doe');
});
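Note that toContain on an array checks for an exactly equal item, so it will not match a partial search string. If the goal is a substring match, as the question suggests, something like the following may help. anyCellIncludes is a hypothetical helper, and treating the match as case-insensitive is an assumption.

```javascript
// Hypothetical helper: true when at least one cell's text contains the
// search string, ignoring case and surrounding whitespace in the search.
function anyCellIncludes(cellValues, search) {
  const needle = search.trim().toLowerCase();
  return cellValues.some(value => value.toLowerCase().includes(needle));
}

// Cell texts as produced by cells.map(cell => cell.textContent) in the test.
const cellValues = ['J Blogs', 'J Doe', 'J Hancock'];
console.log(anyCellIncludes(cellValues, 'doe')); // true
```

Inside the test this would read expect(anyCellIncludes(cellValues, 'doe')).toBe(true).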

how to show a single value of column razor view

here is my markup
<td class="subtotal">@Model.Where(s=>s.wcc.xyz=="abc").Select(s=>s.targetxyz.wcc)</td>
<td class="subtotal">@Model.Where(s=>s.wcc.xyz=="abc").Sum(s=>s.wcc.col1)</td>
<td class="subtotal">@Model.Where(s=>s.wcc.xyz=="abc").Sum(s=>s.wcc.col2)</td>
<td class="subtotal">@Model.Where(s=>s.wcc.xyz=="abc").Sum(s=>s.wcc.col3)</td>
<td class="subtotal">@Model.Where(s=>s.wcc.xyz=="abc").Sum(s=>s.wcc.col4)</td>
<td class="subtotal">@Model.Where(s=>s.wcc.xyz=="abc").Sum(s=>s.wcc.col5)</td>
<td class="subtotal">@Model.Where(s=>s.wcc.xyz=="abc").Sum(s=>s.wcc.col6)</td>
<td class="subtotal">@Model.Where(s=>s.wcc.xyz=="abc").Sum(s=>s.wcc.col7)</td>
<td class="subtotal">@Model.Where(s=>s.wcc.xyz=="abc").Sum(s=>s.wcc.col8)</td>
</tr>
my query
var data =
from b in re.wccs
join t in re.targetxyz on b.xyz equals t.dname
select new val { wcc = b, targetxyz = t };
return View(data);
my error
System.Linq.Enumerable+WhereSelectEnumerableIterator`2[db.Models.val,System.Int32]
I can load all the columns that use Sum, but where I use Select I get this output instead of a value.
Why? Any suggestions?
Select returns a sequence rather than a single value, so Razor prints the type name of that sequence. You will need to use First or Single to get the value, and possibly some ordering on the data (I am not sure what you want to show here),
e.g.
@Model.Where(s=>s.wcc.xyz=="abc").Select(s=>s.targetxyz.wcc).FirstOrDefault()
@Model.Where(s=>s.wcc.xyz=="abc").Select(s=>s.targetxyz.wcc).SingleOrDefault()
Read here for the difference between Single and First:
LINQ Single vs First

WinJS Repeater table with rows (tr) wrapped in ItemContainer

Is it possible to create a table using the Repeater control with its rows wrapped in ItemContainer controls? Something along these lines:
<table id="products">
<thead>
<tr>
<td>Name</td>
<td>Description</td>
<td>Type</td>
<td>Billing Periodicity</td>
<td>Average Life Time (in months)</td>
<td>Is default</td>
</tr>
</thead>
<tbody id="tableBody" data-win-control="WinJS.UI.Repeater" data-win-bind="winControl.data: products">
<tr data-win-control="WinJS.UI.ItemContainer">
<td data-win-bind="textContent: name"></td>
<td data-win-bind="textContent: description"></td>
<td data-win-bind="textContent: type"></td>
<td data-win-bind="textContent: costPeriodicity"></td>
<td data-win-bind="textContent: averageLifeTime"></td>
<td data-win-bind="textContent: isDefault"></td>
</tr>
</tbody>
</table>
The given example throws an exception at runtime:
Unable to get property 'children' of undefined or null reference
I'd like to use ItemContainer's functionality to make table rows clickable. Is my approach to the issue invalid? Is the ItemContainer control the wrong one to use in this scenario?
Side note: if I apply the ItemContainer control to table cells (td), everything runs smoothly (they behave like Windows 8 style clickable objects).
You declared the data source for the Repeater incorrectly: it should be set through data-win-options, not data-win-bind. When you change it to:
<tbody data-win-control="WinJS.UI.Repeater" data-win-options="{data: products}">
it should work with no problems.

Comet tables with Lift 2.4 and HTML5

I'm trying to dynamically update an HTML table via Comet. I've got something like the following:
class EventsComet extends CometClient[Event] {
def server = Event
def render = {
println("Binding on: " + defaultHtml)
data.flatMap( event =>
bind("event", "name" -> event.name.toString, "date" -> event.startDate.toString)
)
}
}
And:
<lift:comet type = "EventsComet">
<table>
<thead>
<tr>
<th>Name</th>
<th>Date</th>
</tr>
</thead>
<tbody>
<tr>
<td><event:name />Test Name</td>
<td><event:date />Oct. 25, 2012</td>
</tr>
</tbody>
</table>
</lift:comet>
This prints out the entire table over and over again, one for each event rendered by EventsComet. The println statement outputs the entire table node.
So I tried variations:
<table>
<thead>
<tr>
<th>Race</th>
<th>Track</th>
<th>Date</th>
</tr>
</thead>
<tbody>
<lift:comet type = "EventsComet">
<tr>
<td><event:name />Test Name</td>
<td><event:date />Oct. 25, 2012</td>
</tr>
</lift:comet>
</tbody>
</table>
As expected, the HTML5 parser strips out the [lift:comet] tags and no binding occurs.
So I tried switching the rows to:
<tr lift:comet = "EventsComet">
<td><event:name />Test Name</td>
<td><event:date />Oct. 25, 2012</td>
</tr>
...as is shown in a snippet example here, but with this syntax my CometClient is not being instantiated at all.
Can anyone advise on the proper syntax?
EventsComet itself works fine; it can keep lists of events up to date without problem. I only run into issue using tables (and presumably other highly-nested structures I've not tried yet?).
Thank you. This is all rather frustrating for such a simple problem, and makes me want to just start implementing my templates in a strongly-typed templating language instead of using bindings.
The proper syntax seems to be:
<tr class="lift:comet?type=EventsComet">
<td><event:name />Test Name</td>
<td><event:date />Oct. 25, 2012</td>
</tr>
From this thread:
https://groups.google.com/forum/?fromgroups=#!topic/liftweb/NUDU1_7PwmM
Sometimes I'm getting duplicate rows (inserted above the table header at that), but I'd imagine this is related to my comet actor itself.

Sort a table with multiple tbody?

I have a table structure as follows. I need to sort these nested tables separately. For example, sorting the chapter rows should only update the chapter order in one database table, whereas sorting items should update their order in another table.
I managed to set up the code and the sorting. However, when I drag the items in chapter 4, the order that gets posted is that of the items from chapter 1, apparently because they come before chapter 4.
Could someone help me with sorting only the relevant items?
NOTE: This list is dynamic and comes from a database, so I am interested in a single piece of jQuery code that covers all the ordering.
<table id=subsortable>
<tbody class=content>
<tr id="chapter_1"><td>Chapter one</td></tr>
<tr id="chapter_2"><td>Chapter two</td></tr>
<tr id="chapter_3">
<td>
<table>
<tbody class=subcontent>
<tr id="item_31"><td>three.one</td></tr>
<tr id="item_32"><td>three.two</td></tr>
</tbody>
</table>
</td>
</tr>
<tr id="chapter_4">
<td>
<table>
<tbody class=subcontent>
<tr id="item_41"><td>four.one</td></tr>
<tr id="item_42"><td>four.two</td></tr>
<tr id="item_43"><td>four.three</td></tr>
<tr id="item_44"><td>four.four</td></tr>
<tr id="item_45"><td>four.five</td></tr>
</tbody>
</table>
</td>
</tr>
<tr id="chapter_4"><td>Chapter Four</td></tr>
</tbody>
</table>
The code I am using is as follows:
//for sorting chapters - which is outer table
$("#subsortable tbody.content").sortable({
opacity: 0.7,
cursor: 'move',
placeholder: "ui-state-highlight",
forcePlaceholderSize: true,
update: function(){
var order = $('#subsortable tbody.content').sortable('serialize') + '&action=updateChaptersOrder';
$.post("/admin/ajax/ajax_calls.php", order, function(theResponse){
});
}
});
// For sorting and updating items within a specific chapter - which is nested tbody
$("tbody.subcontent").sortable({
opacity: 0.7,
cursor: 'move',
placeholder: "ui-state-highlight",
forcePlaceholderSize: true,
update: function(){
var order = $('tbody.subcontent').sortable('serialize');// + '&action=updateListings';
$.post("/admin/ajax/ajax_calls.php", order, function(theResponse){
});
}
});
I have found the answer to my own question, in case someone else encounters the same problem. I changed the following line in the code for the inner table:
var order = $('tbody.subcontent').sortable('serialize');
to
var order = $(this).sortable('serialize');
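The likely reason this works: a getter-style call such as sortable('serialize') on a selector that matches several tbody elements reads from only one of them (the first match), whereas inside the update callback this refers to the list that was actually dragged. The sketch below is a simplified, jQuery-free model of what serialize produces by default from ids of the form setname_number. serializeIds and the lists object are illustrative names only, not part of jQuery UI.

```javascript
// Simplified model of sortable('serialize'): 'item_41' becomes 'item[]=41'.
// Ids that do not fit the setname_number pattern are skipped.
function serializeIds(ids) {
  return ids
    .map(id => /(.+)[-=_](.+)/.exec(id))
    .filter(Boolean)
    .map(([, key, value]) => `${key}[]=${value}`)
    .join('&');
}

// Row ids per nested tbody, taken from the markup in the question.
const lists = {
  chapter_3: ['item_31', 'item_32'],
  chapter_4: ['item_42', 'item_41', 'item_43', 'item_44', 'item_45'], // after a drag
};

// Global selector: always reads the first matching list.
const fromSelector = serializeIds(Object.values(lists)[0]);
// $(this): reads the list that fired the update event.
const fromThis = serializeIds(lists.chapter_4);

console.log(fromSelector); // item[]=31&item[]=32
console.log(fromThis);     // item[]=42&item[]=41&item[]=43&item[]=44&item[]=45
```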