Find value in a html file using HTML::TreeBuilder

Below is the data in my HTML file. I want to find values in it using HTML::TreeBuilder.
<table id="stats" cellpadding="0" cellspacing="0">
<tbody>
<tr class="row-even">
<td class="stats_left">Main Domain</td>
<td class="stats_right"><b>myabcab.com</b></td>
</tr>
<tr class="row-odd">
<td class="stats_left">Home Directory</td>
<td class="stats_right">/home/abc</td>
</tr>
<tr class="row-even">
<td class="stats_left">Last login from</td>
<td class="stats_right">22.32.232.223 </td>
</tr>
<tr class="row-odd">
<td class="stats_left">Disk Space Usage</td>
<td class="stats_right">30.2 / ∞ MB<br>
<div class="stats_progress_bar">
<div class="cpanel_widget_progress_bar" title="0%"
style="position: relative; width: 100%; height: 100%; padding: 0px; margin: 0px; border: 0px">
</div>
<div class="cpanel_widget_progress_bar_percent" style="display: none">0</div>
</div>
</td>
</tr>
<tr class="row-even">
<td class="stats_left">Monthly Bandwidth Transfer</td>
<td class="stats_right">0 / ∞ MB<br>
<div class="stats_progress_bar">
<div class="cpanel_widget_progress_bar" title="0%"
style="position: relative; width: 100%; height: 100%; padding: 0px; margin: 0px; border: 0px">
</div>
<div class="cpanel_widget_progress_bar_percent" style="display: none">0</div>
</div>
</td>
</tr>
</tbody>
</table>
How can I find the "Disk Space Usage" value using HTML::TreeBuilder? Many of the <td> elements in the code above share the same classes.

Find the <td> with the matching content, in this case "Disk Space Usage" and then find the next <td>.
Once you have an element tree:
my $usage = $t->look_down(
_tag => 'td',
sub {
$_[0]->as_trimmed_text() =~ /^Disk Space Usage$/
}
)->right()->as_trimmed_text();
You may want to wrap that in an eval block in case look_down doesn't find a match.
The tree navigation methods in HTML::Element are a key part of using HTML::TreeBuilder effectively.
Mohini asks, "why doesn't this work?"
(formatting added by me)
use strict;
use warnings;
use HTML::TreeBuilder;
my $tree = HTML::TreeBuilder->new_from_file( "index.html");
my $disk_value; my $disk_space;
for ( $tree->look_down( _tag => q{tr}, 'class' => 'row-odd' ) ) {
$disk_space = $tree->look_down(
_tag => q{td},
'class' => 'stats_left'
)->as_trimmed_text;
if ( $disk_space eq 'Home Directory' ) {
$disk_value = $tree->look_down( _tag => q{td}, 'class' => 'stats_right' )
->right()
->as_trimmed_text();
}
}
print STDERR "my home value is $disk_space : $disk_value\n";
look_down starts from the node you invoke it on, searches down the element tree (these trees grow upside down), and returns either the list of matching nodes or the first matching node, depending on context.
Since every call to look_down in your code is made on $tree, you find the same nodes each time through the loop.
Your loop should look more like this:
my %table_stuff;
for my $odd_row ( $tree->look_down( _tag => q{tr}, 'class' => 'row-odd' ) ) {
my $heading = $odd_row->look_down(
_tag => q{td},
'class' => 'stats_left'
);
$table_stuff{ $heading->as_trimmed_text() } = $heading->right()->as_trimmed_text();
}
This populates a hash with table elements.
If you only want the one value, don't use a loop at all. look_down already acts as a loop.
my $heading = $t->look_down(
_tag => 'td',
sub {
$_[0]->as_trimmed_text() =~ /^Home Directory$/
}
);
my $value = $heading->right();
# Now $heading and $value have HTML::Element nodes that you can do whatever you want with.
my $disk_value = $value->as_trimmed_text();
my $disk_space = $heading->as_trimmed_text();

Related

Powershell email sender with html body (htmlbody.Replace)

I'm currently a C# .NET automation engineer. One of my test result outputs is sent via email, and that output goes through PowerShell. I'm fairly new to email templates and HTML in general.
It's a simple HTML body with placeholders that I replace using calls such as $EmailBody= $EmailBody.Replace("PassedTests",$Passed).
The whole premise: my script replaces the Total/Passed/Failed/Skipped placeholders with data that I extract from a .trx file after the test run.
My extraction code:
$PassedTests = $testResultsXml.TestRun.ResultSummary.Counters.Passed
$FailedTests = $testResultsXml.TestRun.ResultSummary.Counters.Failed
$SkippedTests = $testResultsXml.TestRun.ResultSummary.Counters.Skipped
$OnlyFailed = $testResultsXml.TestRun.Results.UnitTestResult | Where-Object { $_.outcome -eq "Failed" }
$FailedTestsName = $OnlyFailed.TestName
In the HTML body I have an "Error list" table (shown below) that lists the test names if there are any failed tests:
#</td>
</tr>
<!--end img-->
<tr>
<td height="15"></td>
</tr>
<!--title-->
<tr align="center">
<td align="center" style="font-family:
'Open Sans', Arial, sans-serif; font-size:16px;color:#3b3b3b;font-weight: bold;">**ERROR LIST**</td>
</tr>
<!--end title-->
<tr>
<td height="10"></td>
</tr>
<!--content-->
<tr align="center">
<td align="center" style="font-family: 'Open Sans', Arial, sans-serif; font-size:12px;color:#7f8c8d;line-height: 24px;">NoErrors</td>
</tr>
<!--end content-->
</table>
</td>
</tr>
<tr>
<td height="30"></td>
</tr>
</table>
Now the main question is: is it possible to show the "Error list" table only if there are failed tests? If there are no failed tests, it would be great for that table not to be shown at all.
Any kind of help would be greatly appreciated. Thanks!
$EmailBody= $EmailBody.Replace("PassedTests",$Passed)
$EmailBody= $EmailBody.Replace("FailedTests",$Failed)
$EmailBody= $EmailBody.Replace("SkippedTests",$Skipped)
$EmailBody= $EmailBody.Replace("ErrorList",$FailedTestsName)
$Emailattachment = "\TestResults.trx"

You are on the right path.
You just need to extend what you are already doing.
Remove the part that might or might not be in the email from the main template (the "Error list" section, since it should not be there when there are no errors).
Put the section you removed in its own variable.
Add a placeholder in your main HTML template at the location where that section belongs (just as you already do for the other values), so that it can be replaced later.
From there, the logic is:
If there are 0 errors, replace the placeholder in the main template with an empty string (you don't want the placeholder to appear in the final email).
If there are 1 or more errors, build a new variable from the section you removed, replace the error placeholder inside that section with the list of failed tests, and finally replace the placeholder in the main template with that section.
That would look something like this:
$EmailBody = @'
</td>
</tr>
<!--end img-->
<tr>
<td height="15"></td>
</tr>
**ErrorsTable**
'@
$ErrorListBody = @'
<!--title-->
<tr align="center">
<td align="center" style="font-family:
'Open Sans', Arial, sans-serif; font-size:16px;color:#3b3b3b;font-weight: bold;">**ERROR LIST**</td>
</tr>
<!--end title-->
<tr>
<td height="10"></td>
</tr>
<!--content-->
<tr align="center">
<td align="center" style="font-family: 'Open Sans', Arial, sans-serif; font-size:12px;color:#7f8c8d;line-height: 24px;">NoErrors</td>
</tr>
<!--end content-->
</table>
</td>
</tr>
<tr>
<td height="30"></td>
</tr>
</table>
'@
# $FailedTests holds the failed-test counter read from the .trx file, so compare its numeric value
if ([int]$FailedTests -gt 0) {
# inserting errors to the `$ErrorListBody` html segment
$ErrorsHtml = $ErrorListBody.Replace("ErrorList", $FailedTestsName)
# inserting the html segment into the main email
$EmailBody = $EmailBody.Replace("**ErrorsTable**", $ErrorsHtml)
} else {
# Removing the placeholder from the main template.
$EmailBody = $EmailBody.Replace("**ErrorsTable**", '')
}

grouping tr but browser closes tbody prematurely

When using LitElement to render data dynamically, the browser inserts tbody all the time negating any effort to "group" table rows.
render() {
return html`
<table style="border-collapse:collapse;border:solid 1px #666;">
${this.rows.map((row)=>{
var disp= html`
${(row=="FOUR")?html`</tbody>`:''}
${(row=="TWO")?html`
<tbody style="border:solid 2px #F00; border-collapse:separate;">
<tr>
<td>SOME HEADER</td>
</tr>
`:''}
<tr>
<td>${row}</td>
</tr>
`
return disp;
})}
</table>
`;
} //render()
constructor() {
super();
this.rows = ['ONE','TWO','THREE','FOUR'];
}
The result is the tbody is closed immediately after the closing tr of "SOME HEADER" instead of the tbody being closed after the tr "FOUR" which is what we want in this case.
It looks like the browser does this by itself because it always wants to insert a tbody whenever a tr is written to the DOM?
So I suspect this would happen to any framework that renders data dynamically - but have not verified this.
I assume this is not going to be "fixed" anytime soon.
That being the case, anyone have a suggestion on how to group tr's together in a block in this case?
Thanks!

If you have an unclosed <tbody> in a document fragment the browser will close it for you.
Instead nest the <tr> you want to group inside a template that holds both the <tbody> and closing </tbody>:
const groups = //process this.rows into groups
return html`
<table style="border-collapse:collapse;border:solid 1px #666;">
${groups.map(g => html`
<tbody style="border:solid 2px #F00; border-collapse:separate;">
<tr><th>${g.name}</th></tr>
${g.rows.map(r => html`
<tr><td>${r}</td></tr>`)}
</tbody>`)}
</table>`;
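The "process this.rows into groups" step is left open above. As a rough sketch only, assuming the groups should have the shape { name, rows } that the template iterates over, it could look like the following. The helper name groupRows and the `starts` map (row value to the name of the group that begins at that row) are made up for this illustration; the question does not define the actual grouping rule.

```javascript
// Hypothetical helper (not part of LitElement): splits a flat list of rows
// into groups shaped like { name, rows }.
// `starts` is an assumed Map from a row value to the name of the group that
// begins at that row. Rows before the first start land in an unnamed group.
function groupRows(rows, starts) {
  const groups = [];
  for (const row of rows) {
    if (starts.has(row) || groups.length === 0) {
      groups.push({ name: starts.get(row) ?? '', rows: [] });
    }
    groups[groups.length - 1].rows.push(row);
  }
  return groups;
}

// Sample data from the question: a group starts at "TWO" and ends before "FOUR".
const groups = groupRows(
  ['ONE', 'TWO', 'THREE', 'FOUR'],
  new Map([['TWO', 'SOME HEADER'], ['FOUR', '']])
);
console.log(JSON.stringify(groups));
```

Each group then maps to exactly one complete <tbody>, so the browser never sees an unclosed one. With this shape, unnamed groups would still get an empty header row from the template above, so you may want to skip that row when g.name is empty.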

Formatting output from Invoke-WebRequest in Powershell

Information
So what I am looking to do is scrape my local intranet where our HR team upload new starter information and be able to either hold that information in a usable format, or export it to a CSV to then be used by another script.
Currently our service desk team manually go looking at this intranet page, and create the users based on the information our HR team enter one by one.
Naturally, this is a very time consuming task that could be easily automated. Unfortunately, our HR team are not open for any changes to the process at the current time due to other work they are focusing on. Internal politics stuff, so sadly they can't be convinced.
Now, I have managed to use Invoke-WebRequest and get the content of the page but the formatting is awful. It returns as a load of HTML and I'm iterating through multiple steps of splitting and string replacing which just doesn't feel optimal to me and I feel like there is a better way to get the results I want.
Current Script
$webRequest = Invoke-WebRequest -Uri "http://intranet-site/HR/NewStarterList.php?action=ItToComp" -Headers @{"Header Info here"} -UseDefaultCredentials
$content = $webRequest.Content
$initialReplace = $content -replace '(?<=<).*?(?=>)', ' '
$split = $initialReplace -split "< >< >< >"
$split = $split -split "< >< >"
$split = $split -replace '< >',""
$split = $split[5..$($split.count)]
As you can see, this is not really ideal, and I'm wondering if there is a better way to grab just the information I need from the page.
The initial content returns as below (I have shortened and replaced any names to make it easy on the eye)
<html>
<head>
<title>New Starter List</title>
<link rel="STYLESHEET" type="text/css" href="/common/StyleSheet/Reports.css" /> <style> TD {font-family: Verdana; font-size: 8pt; border-left: solid 0px black; border-right: solid 0px black;} </style>
<script type="text/javascript" src="../../../cgi-bin/calendar/tableH.js"></script>
</head>
<body>
<img src="/common/images/logo.gif" border="0">
<br>
<br>
<b><span style="font-size: 12pt; font-variant: small-caps; ">New Starter List</span></b>
<br>Logged In As "UserName"<br>
<br>
<tableonMouseOver="javascript:trackTableHighlight(this.event,'FFFF66');"onMouseOut="javascript:highlightTableRow(0);" border="4" frame="border" width="80%" rules="none" cellspacing="6%" cellpadding="6%">
<th align="left">Date Started</th>
<th align="left">Name</th>
<th align="left">Initials</th>
<th align="left">Department</th>
<th align="left">Contact</th>
<th align="left">IT Completed?</th>
<th align="left">Supervisor Completed?</th>
<tr colspan="6"><td align="left">25 Sep 2019</td>
<td align="left">Joe Bloggs</td>
<td align="left">JXBL</td>
<td align="left">Team A</td>
<td align="left">Manager 1</td>
<td align="left">No</td>
<td align="left">Yes</td></tr>
<tr colspan="6"><td align="left">08 Jul 2019</td>
<td align="left">Harry Bloggs</td>
<td align="left">HXBL</td>
<td align="left">Team B</td>
<td align="left">Manager 2</td>
<td align="left">No</td>
<td align="left">Yes</td></tr>
<th align="left" colspan="7">72 starters</th>
</table>
</body>
</html>
After I run my splits and replaces, It looks like below (again, names changed)
25 Sep 2019
Joe Bloggs
JXBL
Team 1
Manager 1
No
Yes
08 Jul 2019
Harry Bloggs
HXBL
Team 2
Manager 2
No
Yes
72 starters
The idea is then to be able to run with this information to automate our on-boarding process.
I feel like I am missing something obvious, like there is a neater or more efficient way to do this, as this is the first time I'm using Invoke-WebRequest and finding it troublesome as it is anyway.
Expected Results
What I want is preferably an array of users with properties for each bit of info, like a CSV or a PSObject.
So when I call a variable holding the info, I want it to return something like the below:
Name : Joe Bloggs
Initials : JXBL
Department : Team 1
Manager : Manager 1
IT : No
Supervisor : No
StartDate : 08 Jul 2019
Name : Harry Smith
Initials : HXSM
Department : Team 2
Manager : Manager 2
IT : Yes
Supervisor : No
Similar Questions
I only saw one question that looked like it may cover what I wanted, but it ended up being about needing a "try-catch" loop.
Similar Question Link
Please let me know if you need any further information, or if you have any questions.
Thanks in advance for the help.
EDIT
Added in an expected results bit, as I realized this was missing.

The trick is to have something to denote the lines you want to keep.
In your sample above, the link stands out:
<a href="NewStarterInfo.php?id=3117">
So, if you import the page as a single array, you can parse that array finding only lines that contain "NewStarterInfo.php" for example.
$a = @"
<html>
<head>
<title>New Starter List</title>
<link rel="STYLESHEET" type="text/css" href="/common/StyleSheet/Reports.css" /> <style> TD {font-family: Verdana; font-size: 8pt; border-left: solid 0px black; border-right: solid 0px black;} </style>
<script type="text/javascript" src="../../../cgi-bin/calendar/tableH.js"></script>
</head>
<body>
<img src="/common/images/logo.gif" border="0">
<br>
<br>
<b><span style="font-size: 12pt; font-variant: small-caps; ">New Starter List</span></b>
<br>Logged In As "UserName"<br>
<br>
<tableonMouseOver="javascript:trackTableHighlight(this.event,'FFFF66');"onMouseOut="javascript:highlightTableRow(0);" border="4" frame="border" width="80%" rules="none" cellspacing="6%" cellpadding="6%">
<th align="left">Date Started</th>
<th align="left">Name</th>
<th align="left">Initials</th>
<th align="left">Department</th>
<th align="left">Contact</th>
<th align="left">IT Completed?</th>
<th align="left">Supervisor Completed?</th>
<tr colspan="6"><td align="left">25 Sep 2019</td>
<td align="left">Joe Bloggs</td>
<td align="left">JXBL</td>
<td align="left">Team A</td>
<td align="left">Manager 1</td>
<td align="left">No</td>
<td align="left">Yes</td></tr>
<tr colspan="6"><td align="left">08 Jul 2019</td>
<td align="left">Harry Bloggs</td>
<td align="left">HXBL</td>
<td align="left">Team B</td>
<td align="left">Manager 2</td>
<td align="left">No</td>
<td align="left">Yes</td></tr>
<th align="left" colspan="7">72 starters</th>
</table>
</body>
</html>
"@
With $a set to the content of the page, loop through it:
foreach($x in $a.split("<")) # break it at the "<" that starts each line.
{
if ($x.contains("NewStarterInfo.php") -eq $true) { write-host $x.split(">")[1] }
}
This will take all of the lines in a single variable (not an array) and find the lines with a person's name, and display the name.
If you actually have an array, then you can omit the .split("<") from the foreach statement.

Removing a line from HTML Programmatically through Open2

I have code whose output shows up in two different places: one is an HTML page in the browser, the other is a downloaded PDF. On this page there is a line that says "print using the print button", but of course that button isn't in the PDF version, so I would like to remove the line when the function printFilePdf is run. However, I can't add (or don't know how to add) a condition to the HTML that is in the method.
sub printHeader {
my ($outFH) = @_;
my ($sec, $min, $hour, $mday, $month, $year) = (localtime)[0, 1, 2, 3, 4, 5];
$month++;
$year += 1900;
my ($string) = scalar localtime(time());
print $outFH <<EOT;
<html>
<head>
<title>THIS IS A TITLE</title>
</head>
<body>
<table width="640" border="0" cellspacing="0" cellpadding="0">
<tr>
<td class="bold"> </td>
</tr>
<tr>
<td class="bold">
<div align="center" class="header">
I want to keep this line $string<br>
</div>
</td>
</tr>
<tr>
<td class="bold"> </td>
</tr>
<tr>
<td class="bold" style="color: red; text-align: center">
I also what to keep this line.
</td>
</tr>
<tr>
<td class="bold" style="color: red; text-align: center">
This line is not needed when the printFilePdf function is run.
</td>
</tr>
</table>
EOT
print $outFH qq(<p align="center"> </p>);
print $outFH <<EOT;
</td>
</tr>
</table>
EOT
}
Is there any way to do this? For example, could I add a name to the table row and, in the method above, write something like this:
if(!printFilePdf())
{
<tr>
<td class="bold" style="color: red; text-align: center">
This line is not needed when the printFilePdf function is run.
</td>
</tr>
}

You could check the caller. If the caller is printFilePdf, then search and replace to delete the unneeded data.
perldoc -f caller
As a side note: if you used a templating engine such as HTML::Template, this would be a lot easier. In that case you can put conditionals directly in the HTML.
<TMPL_IF NAME="NON_PDF">
Some text that only gets displayed if NON_PDF is true!
</TMPL_IF>

Just break the HTML into two parts: one is common to both the formats, one is HTML specific:
#!/usr/bin/perl
use warnings;
use strict;
sub printHeader {
my ($outFH, $goes_to_html) = @_;
print $outFH <<'__HTML__';
Here is the common text.
__HTML__
print $outFH <<'__HTML__' if $goes_to_html;
This doesn't go to PDF.
__HTML__
}
print "To HTML:\n";
printHeader(*STDOUT, 1);
print "To PDF:\n";
printHeader(*STDOUT, 0);

How to remove payment method from woocommerce admin email

Cannot find how to remove payment method from woocommerce admin email.
Searched through the files for Payment Method, but no luck

First of all, if you haven't done so, copy your WooCommerce template files to your theme's root as described in http://docs.woothemes.com/document/template-structure/.
Then, open a file that is responsible for building the email template. In my case it was (after copying it over) /wp-content/themes/MY_THEME/woocommerce/emails/admin-new-order.php
Find the following lines of code
<tfoot>
<?php
if ( $totals = $order->get_order_item_totals() ) {
$i = 0;
foreach ( $totals as $total ) {
$i++;
?><tr>
<th scope="row" colspan="2" style="text-align:left; border: 1px solid #eee; <?php if ( $i == 1 ) echo 'border-top-width: 4px;'; ?>"><?php echo $total['label']; ?></th>
<td style="text-align:left; border: 1px solid #eee; <?php if ( $i == 1 ) echo 'border-top-width: 4px;'; ?>"><?php echo $total['value']; ?></td>
</tr><?php
}
}
?>
</tfoot>
And add a condition to check if one of the labels contains Payment Method, like so
<tfoot>
<?php
if ( $totals = $order->get_order_item_totals() ) {
$i = 0;
foreach ( $totals as $total ) {
$i++;
if ( $total['label'] != 'Payment Method:' ){
?><tr>
<th scope="row" colspan="2" style="text-align:left; border: 1px solid #eee; <?php if ( $i == 1 ) echo 'border-top-width: 4px;'; ?>"><?php echo $total['label']; ?></th>
<td style="text-align:left; border: 1px solid #eee; <?php if ( $i == 1 ) echo 'border-top-width: 4px;'; ?>"><?php echo $total['value']; ?></td>
</tr><?php
}
}
}
?>
</tfoot>
You can use this for other fields also
