Extracting Content from Webpage with ParsedHtml - powershell

I've been trying to use Invoke-WebRequest with ParsedHtml.getElementsByTagName:
(ParsedHtml.getElementsByTagName("div") | Where{ $_.className -eq 'pricingContainer-priceContainer' } ).innerText
to get the value $8.29, but running it against the markup below produces no result. What am I doing wrong?
<div class="pricingContainer pricingContainer--grid u-ngFade noCenterTag" ng-class="::{'noCenterTag': !showCenterTag}" ng-if="::featuresEnabled">
<!-- ngIf: ::(product.IsOnSpecial && !product.HideWasSavedPrice) -->
<div class="pricingContainer-priceContainer">
<span class="pricingContainer-priceAmount" ng-class="::specialClass">$8.29</span>
<!-- ngIf: ::product.CupPrice --><span ng-if="::product.CupPrice" class="pricingContainer-priceCup">
$5.19 / 100G
</span><!-- end ngIf: ::product.CupPrice -->
</div>
</div>

Replace className with class:
($html.getElementsByTagName("span") | Where{ $_.class -eq 'pricingContainer-priceCup' }).innerText
or
($html.getElementsByTagName("div") | Where{ $_.class -eq 'pricingContainer-priceContainer' }).innerText
For example:
$Site = "http://example.com/index.html"
$all = Invoke-WebRequest -Uri $Site
# $all holds the full response for the page
$html = [xml]$all.Content
# [xml] casts the content to an XML document; this only succeeds if the page is well-formed XML
$html.getElementsByTagName("div")
You can also automate Internet Explorer. Select the div that contains the cards and read its innerHTML like this:
$ie = New-Object -ComObject "InternetExplorer.Application"
$ie.Navigate("http://www.example.com/index.html")
$ie.Visible = $true
while ($ie.Busy -eq $true) { Start-Sleep -Milliseconds 2000; }
$html= $ie.Document.body.getElementsByTagName('div') | Where-Object {$_.className -eq "cardList-cards cardList-isotopeContainer"}
$lines = $html.innerHTML.split("`n")
$prices = $lines | Where-Object { $_ -Match '<span class=\"pricingContainer\-priceAmount\"' }
$prices = $prices | foreach { [regex]::matches($_, '>([0-9.$]*)</span>').Groups[1].Value }
echo $prices
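The regex step above can be tried without a browser. This is a minimal sketch that runs a similar pattern against the markup quoted in the question; the $sample here-string and the $pattern variable are illustrative, and in practice the text would come from $ie.Document.body.innerHTML or the Content of an Invoke-WebRequest response.

```powershell
# Sample markup copied from the question (stand-in for the real page source).
$sample = @'
<div class="pricingContainer-priceContainer">
<span class="pricingContainer-priceAmount" ng-class="::specialClass">$8.29</span>
<span ng-if="::product.CupPrice" class="pricingContainer-priceCup">
$5.19 / 100G
</span>
</div>
'@

# Capture the trimmed text inside every priceAmount span.
$pattern = '<span class="pricingContainer-priceAmount"[^>]*>\s*([^<]+?)\s*</span>'
$prices = [regex]::Matches($sample, $pattern) | ForEach-Object { $_.Groups[1].Value }
$prices
```

Matching against the whole string rather than line by line means the pattern still works if the span's text sits on a separate line from its opening tag.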

I worked this out by opening the webpage, waiting for the dynamic HTML to finish loading, then dumping the page to a text file to read and search.
$path = "c:\sourcecode.txt"
$ie = New-Object -com InternetExplorer.Application
$ie.visible=$true
$ie.navigate("blahblahblahblah insert webpage here")
while($ie.ReadyState -ne 4) {start-sleep -s 10}
$ie.Document.body.outerHTML | Out-File -FilePath $path
$pricebf = select-string -path $path -pattern "pricingContainer-priceAmount" | select-object -First 1 | select Line
$Descriptionbf = select-string -path $path -pattern "canOpenDetail --><a title=" | select-object -First 1 | select Line

Related

Downloading .xlsx attachment from Outlook of Specific Date using Powershell

I have the script below. $Test lists the .xlsx attachments for a specific date, but the script fails to download them and throws an error.
Add-type -assembly "Microsoft.Office.Interop.Outlook"
$olDefaultFolders = "Microsoft.Office.Interop.Outlook.olDefaultFolders" -as [type]
$outlook = New-Object -comobject Outlook.Application
$mapi = $outlook.GetNameSpace("MAPI")
$inbox = $mapi.GetDefaultFolder(6)
$FilePath= "c:\temp\Test\"
$subfolder = $inbox.Folders | Where-Object {$_.Name -eq "Test"}
$mail=$subfolder.Items |Select-Object -Property "ReceivedTime",@{name="Attachments";expression={$_.Attachments|%{$_.DisplayName}}} | Where-Object{$_.attachments -match ".xlsx" -and ($_.receivedtime -match "9/15/2020")} | Select-Object "attachments"
$Test = $mail.attachments
foreach ($out in $test) {$_.attachments|foreach {
Write-Host $_.filename
$Filename = $_.filename
If ($out.Contains("xlsx")) {
$_.saveasfile((Join-Path $FilePath "$out")) }}}
I am able to filter the .xlsx Attachments with Specific Date. But after this, I don't know how to save/download them.
Working with COM objects can be rather frustrating in PowerShell. I recommend getting very familiar with Get-Member; you really have to interrogate each object. I've simplified your script and tested it thoroughly. It downloads each matching attachment (by name) from each matching email (by received date).
Add-type -assembly "Microsoft.Office.Interop.Outlook"
$olDefaultFolders = "Microsoft.Office.Interop.Outlook.olDefaultFolders" -as [type]
$outlook = New-Object -comobject Outlook.Application
$mapi = $outlook.GetNameSpace("MAPI")
$inbox = $mapi.GetDefaultFolder(6)
$FilePath= "c:\temp\Test\"
# The subfolder lookup from the question is still required
$subfolder = $inbox.Folders | Where-Object {$_.Name -eq "Test"}
$subfolder.Items | Where-Object {$_.receivedtime -match "9/20/2020" -and $($_.attachments).filename -match '.xlsx'} | foreach {
$filename = $($_.attachments | where filename -match '.xlsx').filename
foreach($file in $filename)
{
Write-Host Downloading $file to $filepath -ForegroundColor green
$outpath = join-path $filepath $file
# Save only the attachment whose name matches the current file
$($_.attachments | where filename -eq $file).saveasfile($outpath)
}
}
You may use this for more of an "in-line" approach.
Add-type -assembly "Microsoft.Office.Interop.Outlook"
$olDefaultFolders = "Microsoft.Office.Interop.Outlook.olDefaultFolders" -as [type]
$outlook = New-Object -comobject Outlook.Application
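$mapi = $outlook.GetNameSpace("MAPI")
$inbox = $mapi.GetDefaultFolder(6)
$FilePath= "c:\temp\Test\"
# The subfolder lookup from the question is still required
$subfolder = $inbox.Folders | Where-Object {$_.Name -eq "Test"}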
$subfolder.Items | Where-Object {$_.receivedtime -match "9/20/2020" -and $($_.attachments).filename -match '.xlsx'} | foreach {
foreach($attachment in $($_.attachments | where filename -match '.xlsx'))
{
Write-Host Downloading $attachment.filename to $filepath -ForegroundColor green
$attachment.SaveAsFile((join-path $FilePath $attachment.filename))
}
}
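The scripts above pick mails by matching the date as text. A minimal sketch of an alternative filter, assuming ReceivedTime is exposed as a [datetime]: compare its Date part against a target day. The $items array below is hypothetical mock data standing in for $subfolder.Items so the filter can be tried without Outlook.

```powershell
# Mock objects standing in for Outlook mail items (hypothetical data).
$items = @(
    [PSCustomObject]@{
        ReceivedTime = [datetime]'2020-09-20T08:15:00'
        Attachments  = @(
            [PSCustomObject]@{ FileName = 'report.xlsx' }
            [PSCustomObject]@{ FileName = 'notes.txt' }
        )
    }
    [PSCustomObject]@{
        ReceivedTime = [datetime]'2020-09-15T11:00:00'
        Attachments  = @([PSCustomObject]@{ FileName = 'old.xlsx' })
    }
    [PSCustomObject]@{
        ReceivedTime = [datetime]'2020-09-20T17:40:00'
        Attachments  = @([PSCustomObject]@{ FileName = 'summary.xlsx' })
    }
)

# Keep mails received on the target day, then keep only their .xlsx attachments.
$target = [datetime]'2020-09-20'
$matching = $items |
    Where-Object { $_.ReceivedTime.Date -eq $target } |
    ForEach-Object { $_.Attachments } |
    Where-Object { $_.FileName -like '*.xlsx' }

$matching.FileName
```

With real mail items, each object in $matching would be an attachment on which SaveAsFile can be called, as in the answer above.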

Powershell Error: "There is not enough memory or disk to complete the operation"

I'm running a PowerShell script that reads multiple Word documents. After about 700 documents it fails with the error "There is not enough memory or disk to complete the operation".
Here is my code:
$excel = New-Object -ComObject Excel.application
$source = 'powershell/attachments'
$docs = Get-ChildItem -Path $source -Recurse -Filter *cover*.docx
$XL = New-Object -ComObject Excel.Application
#Open the workbook
$WB = $XL.Workbooks.Open("powershell/result.xlsx")
#Activate Sheet1, pipe to Out-Null to avoid 'True' output to screen
$WB.Sheets.Item("Sheet1").Activate() | Out-Null
$SearchArray = @('employment', 'source of income', 'US address', 'residential address', 'ID', 'driver license', 'visa', 'passport', 'I-20', 'Social Security Card', 'information update form', 'w9', 'w8', 'tax', 'email address')
$word = New-Object -ComObject Word.application
foreach ($doc in $docs) {
$Document = $word.Documents.Open($doc)
$CVSInfo = $Document.Paragraphs | ForEach-Object{
foreach ($SerchText in $SearchArray) {
$_.Range.Text | Where-Object { $_-match $SerchText} | ForEach-Object {
$_-split ' ' | Select-Object -Last 1
}
}
}
$PathArray = $doc.FullName
#Launch Excel
#Find first blank row #, and activate the first cell in that row
$FirstBlankRow = $($xl.ActiveSheet.UsedRange.Rows)[-1].Row + 1
$XL.ActiveSheet.Range("A$FirstBlankRow").Activate()
#Create PSObject with the properties that we want, convert it to a tab delimited CSV, and copy it to the clipboard
$Record = [PSCustomObject]@{
'ID' = $PathArray
'Context' = $CVSInfo
}
$Record | ConvertTo-Csv -Delimiter "`t" -NoTypeInformation | Clip
#Paste at the currently active cell
$XL.ActiveSheet.Paste() | Out-Null
# Save and close
$WB.Save() | Out-Null
}
$WB.Close() | Out-Null
$XL.Quit() | Out-Null
#Release ComObject
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($XL)
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($word)
Thanks in advance!
It looks like you have hundreds of Word documents open at once. Close each one at the end of its loop iteration:
$Document.Close()
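One way to make sure the Close call is never skipped is to wrap the per-document work in try/finally. The following is a sketch only: Get-DocumentText is a hypothetical helper name, and it assumes $Word is a Word.Application COM object created as in the question.

```powershell
function Get-DocumentText {
    param(
        $Word,              # a Word.Application COM object (assumed)
        [string[]]$Path     # full paths of the documents to read
    )
    foreach ($p in $Path) {
        $document = $Word.Documents.Open($p)
        try {
            # Emit the text of every paragraph in this document.
            $document.Paragraphs | ForEach-Object { $_.Range.Text }
        }
        finally {
            # Runs even if reading fails, so documents never pile up.
            $document.Close()
        }
    }
}

# Usage (requires Word to be installed):
# $word = New-Object -ComObject Word.Application
# $docs = Get-ChildItem -Path $source -Recurse -Filter *cover*.docx
# Get-DocumentText -Word $word -Path $docs.FullName
# $word.Quit()
```

The finally block keeps at most one document open at a time, which is the point of the answer above.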

Resolve Powershell Script to Click On Google Result

I have a PowerShell script that should find a website in Google's results and click its link:
$IE = new-object -com internetexplorer.application
$IE.navigate("https://www.google.com/search?q=%D8%AE%D8%A8%D8%B1%DA%AF%D8%B2%D8%A7%D8%B1%DB%8C+%D9%81%D8%A7%D8%B1%D8%B3&oq=%D8%AE%D8%A8%D8%B1%DA%AF%D8%B0%D8%A7%D8%B1%DB%8C")
$IE.visible=$true
while ($IE.busy) {sleep 10}
$Link = ($html.Links |Where-Object { $_.class -eq 'https://khabarfarsi.com' }) |Select-Object -ExpandProperty href
$Link = ($HTML.ParsedHtml.getElementsByTagName("a") | Where {$_.className -eq 'https://khabarfarsi.com'}).InnerHTML
$Link = @($IE.Document.getElementsByTagName("a") | ? {$_.InnerHTML -like 'https://khabarfarsi.com'})[0]
if ($Link -eq $null){ $Link = $IE.Document.getElementsByTagName("a") | ? {$_.InnerHTML -like 'https://khabarfarsi.com'} }
if ($Link -eq $null){$ie.quit(); Break}
$Link.click()
I have several beginner trainees who often have trouble doing this. I want to create a script so they don't have to do it manually.
Thank you for your help.
Here is a working script that can find and click the link.
$ie = new-object -com internetexplorer.application
$ie.visible=$true
$ie.navigate("https://www.google.com/search?q=%D8%AE%D8%A8%D8%B1%DA%AF%D8%B2%D8%A7%D8%B1%DB%8C+%D9%81%D8%A7%D8%B1%D8%B3&oq=%D8%AE%D8%A8%D8%B1%DA%AF%D8%B0%D8%A7%D8%B1%DB%8C")
while($ie.busy) {sleep 1}
$link = $ie.Document.getElementsByTagName('A') | where-object {$_.href -eq 'https://khabarfarsi.com/w/farsnews.com'}
$link.click()
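If the exact href is not known in advance, the filter can match on a prefix and guard against an empty result before clicking. This is a sketch under assumptions: $anchors is hypothetical mock data standing in for the collection returned by $ie.Document.getElementsByTagName('A').

```powershell
# Hypothetical stand-ins for the <a> elements on the results page.
$anchors = @(
    [PSCustomObject]@{ href = 'https://example.com/other' }
    [PSCustomObject]@{ href = 'https://khabarfarsi.com/w/farsnews.com' }
)

# Take the first anchor whose href starts with the wanted site.
$link = $anchors |
    Where-Object { $_.href -like 'https://khabarfarsi.com/*' } |
    Select-Object -First 1

if ($null -eq $link) {
    Write-Warning 'No matching link found.'
}
else {
    # With a real IE element this is where $link.click() would go.
    $link.href
}
```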

Download an image from website

I ran this PowerShell script to download an image from a website (downloading it requires certain steps, which is why I used IE's Navigate). I generate a random string of two groups of 4 characters separated by a space.
But I get an error, and the script doesn't even start filling the field with the string:
Exception from HRESULT: 0x800A01B6
At E:\getbd.ps1:13 char:1
+ $ie.Document.getElementsByTagName("text") | where { $.name -eq "words ...
Here is the code:
$url = "https://fakecaptcha.com"
$set = "abcdefghijklmnopqrstuvwxyz0123456789".ToCharArray()
for($i=1; $i -le 4; $i++){
$result += $set | Get-Random}
$result += ' '
for($i=1; $i -le 4; $i++){
$result += $set | Get-Random}
$ie = New-Object -comobject InternetExplorer.Application
$ie.visible = $true
$ie.silent = $true
$ie.Navigate( $url )
while( $ie.busy){Start-Sleep 1}
$ie.Document.getElementsByTagName("text") | where { $.name -eq "words" }.value = $result
$generateBtn = $ie.Document.getElementsById('input') | Where-Object {$_.Type -eq 'submit' -and $_.Value -eq 'Create it now!'}
$generateBtn.click()
while( $ie.busy){Start-Sleep 1}
$readyBtn = $ie.Document.getElementsById('input') | Where-Object {$_.Type -eq 'button' -and $_.Value -eq 'Your captcha is done! Please click here to view it!!'}
$readyBtn.click()
while( $wc.busy){Start-Sleep 1}
$downloadBtn = $ie.Document.getElementsById('input') | Where-Object {$_.Type -eq 'button' -and $_.Value -eq 'DOWNLOAD'}
$downloadBtn.click()
while( $ie.busy){Start-Sleep 1}
$source = $ie.document.getElementsByTagName('img') | Select-Object -ExpandProperty src
$file = '$E:\bdlic\'+$result+'.jpg'
$wc = New-Object System.Net.WebClient
$wc.DownloadFile($source,$file)
while( $wc.busy){Start-Sleep 1}
$ie.quit()
You have 2 syntax errors in that line:
$ie.Document.getElementsByTagName("text") | where { $.name -eq "words" }.value = $result
#                                                   ^                   ^^^^^^
$.Name: The "current object" variable is $_, not just $.
where {...}.value: You cannot use dot-notation on the scriptblock of a Where-Object statement. You need to put the entire statement in a (sub)expression for that.
Change the line to this:
($ie.Document.getElementsByTagName("text") | where { $_.name -eq "words" }).value = $result
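Separately from the syntax fix, the random string from the question can be built in a single expression, so the result does not depend on $result being empty beforehand. A minimal sketch; New-RandomGroup is a hypothetical helper name.

```powershell
$set = 'abcdefghijklmnopqrstuvwxyz0123456789'.ToCharArray()

function New-RandomGroup {
    param([char[]]$Chars, [int]$Length)
    # Draw $Length characters (repeats allowed) and join them into one string.
    -join (1..$Length | ForEach-Object { $Chars | Get-Random })
}

# Two groups of 4 characters separated by a space.
$result = '{0} {1}' -f (New-RandomGroup $set 4), (New-RandomGroup $set 4)
$result
```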

Powershell, URL shortcut

I've got a problem and I need your help. I'm trying to create a shortcut from the active URL. I tried a few things and arrived at this:
Param([switch]$Full, [switch]$Location, [switch]$Content)
$urls = (New-Object -ComObject Shell.Application).Windows() |
Where-Object {$_.LocationUrl -match "(^https?://.+)|(^ftp://)"} |
Where-Object {$_.LocationUrl}
if($Full)
{
$urls
}
elseif($Location)
{
$urls | select Location*
}
elseif($Content)
{
$urls | ForEach-Object {
$ie.LocationName;
$ie.LocationUrl;
$_.Document.body.innerText
}
}
else
{
$urls | ForEach-Object {$_.LocationUrl}
}
$Shortcut = $WshShell.CreateShortcut("E:\Powershell\Ziel\short.lnk")
$Shortcut.TargetPath = "$urls"
$Shortcut.Save()
But I get a shortcut that makes no sense. What am I doing wrong? I'd appreciate any suggestions.
I tried now doing it like this:
Param([switch]$Full, [switch]$Location, [switch]$Content)
$urls = (New-Object -ComObject Shell.Application).Windows() |
Where-Object {$_.LocationUrl -match "(^https?://.+)|(^ftp://)"} |
Where-Object {$_.LocationUrl}
if($Full)
{
$urls
}
elseif($Location)
{
$urls | select Location*
}
elseif($Content)
{
$urls | ForEach-Object {
$ie.LocationName;
$ie.LocationUrl;
$_.Document.body.innerText
}
}
else
{
$urls | ForEach-Object {$_.LocationUrl}
}
$url = $urls | ForEach-Object {$_.LocationUrl} | select -First 1
$Shortcut = $WshShell.CreateShortcut("E:\Powershell\Ziel\short.lnk")
$Shortcut.TargetPath = $url
$Shortcut.Save()
But now it tells me that "$Shortcut = $WshShell.CreateShortcut("E:\Powershell\Ziel\short.lnk")" has the value NULL. How is that even possible? I don't get it. Please help.
This line is wrong: "$urls" is not a valid URL, just an array of objects converted to a string.
$Shortcut.TargetPath = "$urls"
You need to select one of the URLs first, for example:
$url = $urls | ForEach-Object {$_.LocationUrl} | select -First 1
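Then create the shell object before using it (neither script above defines $WshShell, which is why CreateShortcut is called on NULL):
$WshShell = New-Object -ComObject WScript.Shell
$Shortcut = $WshShell.CreateShortcut("E:\Powershell\Ziel\short.lnk")
$Shortcut.TargetPath = $url
$Shortcut.Save()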
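If you want to create a shortcut for each URL in the array, you can use foreach, like this:
$WshShell = New-Object -ComObject WScript.Shell
foreach ($url in $URLs)
{
# Use at most the first 8 characters of the page title as the file name
$Title = $url.LocationName
$UrlName = $Title.Substring(0, [Math]::Min(8, $Title.Length))
$Link = $url.LocationUrl
$Shortcut = $WshShell.CreateShortcut("E:\Powershell\Ziel\$UrlName.lnk")
$Shortcut.TargetPath = $Link
$Shortcut.Save()
}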
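A different technique, not the one used in the answer above: a web link can also be stored as a plain-text .url file, which needs no COM object. This is a sketch under assumptions; New-UrlShortcut is a hypothetical helper name and the temp-folder path is only for illustration.

```powershell
function New-UrlShortcut {
    param([string]$Url, [string]$Path)
    # A .url file is a small INI-style text file with an [InternetShortcut] section.
    $content = '[InternetShortcut]', "URL=$Url"
    Set-Content -LiteralPath $Path -Value $content -Encoding ASCII
}

# Write a shortcut into the temp folder and show its contents.
$target = Join-Path ([System.IO.Path]::GetTempPath()) 'example.url'
New-UrlShortcut -Url 'https://example.com/' -Path $target
Get-Content -LiteralPath $target
```

In the question's script, $Url would be the value selected from $urls and $Path a file under E:\Powershell\Ziel.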