Issues pulling back values while web scraping a table - powershell

I am attempting to pull the text from a table on a webpage. I pull the webpage using Invoke-WebRequest, set that variable to show "AllElements" and attempt to only pull the inner values matching "Table"; but when I run the script nothing is pulled back and no errors are shown.
$URI = 'https://www.python.org/downloads/release/python-2716/'
$R = Invoke-WebRequest -URI $URI
$R.AllElements|?{$_.Class -eq "table"}|select innerText
I was hoping to show the values of the table on the python.org site, but when the script is run nothing is returned.
How do I solve this problem?

That is because the page contains no table element and nothing with a "table" class. The data sits in a div that holds dynamically generated ordered list items.
You can see this in the browser developer tools, using F12 in Edge or similar in Firefox, Chrome, etc...
$URI = 'https://www.python.org/downloads/release/python-2716'
$R = Invoke-WebRequest -URI $URI
$R.AllElements |
Where {$_.Class -eq 'container' }
$R.AllElements |
Where {$_.Class -eq 'list-row-container menu' }
($R.AllElements |
Where {$_.class -eq 'list-row-container menu'}).innerText
($R.AllElements |
Where {$_.Class -eq 'release-number' })
($R.AllElements |
Where {$_.Class -eq 'release-number' }).outerHTML
(($R.AllElements |
Where {$_.Class -eq 'release-number' }).outerHTML -split '<a href="|/">Python')[2]
Or just do this...
$R.Links
$R.Links.href
$R.Links.href -match 'downloads'

Related

How to get value of a link from a website with powershell?

I want to get the download URL of the latest version of GIMP from its site. I wrote a script, but it returns only the link name, and I do not know how to get the full value.
$web = Invoke-WebRequest -Uri "https://download.gimp.org/pub/gimp/v2.10/windows/"
$web.Links | Where-Object href -like '*exe' | select -Last 1 | select -expand href
The above code returns the link name (gimp-2.10.32-setup.exe),
but I need the full value ("https://download.gimp.org/pub/gimp/v2.10/windows/gimp-2.10.32-setup.exe").
Can someone guide me on how to do it?
The URL returned by the link is relative.
Just append the root part of the URL yourself.
$Uri = 'https://download.gimp.org/pub/gimp/v2.10/windows/'
$web = Invoke-WebRequest -Uri $uri
$ExeRelLink = $web.Links | Where-Object href -like '*exe' | select -Last 1 -expand href
# Here is your download link.
$DownloadLink = $Uri + $ExeRelLink
Additional Note
You can combine the -Last and -Expand from your 2 select statements into 1.
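String concatenation works here because the base URL ends with a slash and the href is a bare file name. As a hedged alternative, the .NET System.Uri class can resolve a relative href against the page URL, which should also cope with root-relative links. The sketch below uses a hard-coded href in place of the live Invoke-WebRequest result, and the names $BaseUri, $RelativeHref, and $Resolved are illustrative only.

```powershell
# Base page URL from the question.
$BaseUri = [System.Uri]'https://download.gimp.org/pub/gimp/v2.10/windows/'

# Stand-in for the href returned by $web.Links (hard-coded for illustration).
$RelativeHref = 'gimp-2.10.32-setup.exe'

# The Uri(Uri, string) constructor resolves the relative reference against the base.
$Resolved = [System.Uri]::new($BaseUri, $RelativeHref)
$DownloadLink = $Resolved.AbsoluteUri
$DownloadLink
```

In the real script, $RelativeHref would come from the select -Last 1 -expand href pipeline shown above.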
There are several downloads sites with exactly the same or very similar layout to this GIMP page, including many Apache projects like Tomcat and ActiveMQ. I had written a little function to parse these and other pages in the past, and interestingly it also worked for this GIMP page. I thought it was worth sharing as such.
Function Extract-FilenameFromWebsite {
    [cmdletbinding()]
    Param(
        [parameter(Position=0,ValueFromPipeline)]
        $Url
    )
    begin{
        $pattern = '<a href.+">(?<FileName>.+?\..+?)</a>\s+(?<Date>\d+-.+?)\s{2,}(?<Size>\d+\w)?'
    }
    process{
        $website = Invoke-WebRequest $Url -UseBasicParsing
        switch -Regex ($website.Content -split '\r?\n'){
            $pattern {
                [PSCustomObject]@{
                    FileName = $matches.FileName
                    URL = '{0}{1}' -f $Url,$matches.FileName
                    LastModified = [datetime]$matches.Date
                    Size = $matches.Size
                }
            }
        }
    }
}
It's assumed the site passed in has a trailing slash. If you want to account for either, you can add this simple line to the process block.
if($Url -notmatch '/$'){$Url = "$Url/"}
To get the latest version, call the function like this
$url = 'https://download.gimp.org/pub/gimp/v2.10/windows/'
$latest = Extract-FilenameFromWebsite -Url $Url | Where-Object filename -like '*exe' |
Sort-Object LastModified | Select-Object -Last 1
$latest.url
Or you could expand the property while retrieving
$url = 'https://download.gimp.org/pub/gimp/v2.10/windows/'
$latesturl = Extract-FilenameFromWebsite -Url $Url | Where-Object filename -like '*exe' |
Sort-Object LastModified | Select-Object -Last 1 -ExpandProperty URL
$latesturl

Calling @odata.nextLink from powershell

I'm trying to figure out how to call the @odata.nextLink from a PowerShell script I'm writing to get Azure AD sign-in information for users.
$LastLogin = Invoke-WebRequest -Headers $AuthHeader1 -Uri "https://graph.microsoft.com/beta/users?`$select=displayName,userPrincipalName,signInActivity" -Verbose
$result = ($LastLogin.Content | ConvertFrom-Json).Value
$result | select DisplayName,UserPrincipalName,@{n="LastLoginDate";e={$_.signInActivity.lastSignInDateTime}}
This displays only the first 100 results.
If I view the $LastLogin output, I can see that the content includes the @odata.nextLink property, but I can't seem to get the URI to pass into a while loop to retrieve all the results.
If I do $LastLogin.'@odata.nextLink' it just returns a null value.
Where am I going wrong?
Thanks
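In the second step, rather than using
$result = ($LastLogin.Content | ConvertFrom-Json).Value
use $result = ($LastLogin.Content | ConvertFrom-Json) and then read the next link with $result.'@odata.nextLink'. The property lives on the parsed JSON body, not on the Invoke-WebRequest response object or on the .Value array, which is why the earlier attempts returned null.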
It worked for me.
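For the while loop the question mentions, one possible shape is sketched below. It is only an illustration under stated assumptions: Get-AllODataPages and its -GetPage parameter are hypothetical names, and the caller supplies a script block that fetches one page and returns the parsed JSON body.

```powershell
# Hypothetical helper: follows '@odata.nextLink' until no further page is advertised.
function Get-AllODataPages {
    param(
        [Parameter(Mandatory)] [string] $Uri,
        # Assumption: a script block that takes a URI and returns the parsed JSON page.
        [Parameter(Mandatory)] [scriptblock] $GetPage
    )
    $next = $Uri
    while ($next) {
        $page = & $GetPage $next
        $page.value                       # emit the records of this page
        $next = $page.'@odata.nextLink'   # $null on the last page ends the loop
    }
}
```

With the variables from the question, a call might look like this:

```powershell
$firstUri = "https://graph.microsoft.com/beta/users?`$select=displayName,userPrincipalName,signInActivity"
$result = Get-AllODataPages -Uri $firstUri -GetPage {
    param($u)
    (Invoke-WebRequest -Headers $AuthHeader1 -Uri $u).Content | ConvertFrom-Json
}
```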

Powershell: -UseBasicParsing doesn't return all the src elements

I was trying to build a simple spider that returns the urls of images from a web-page (not the whole website). And I was using this:
$iwr=Invoke-WebRequest -Uri "$Uri" -UseBasicParsing
Recently, however, I found that it sometimes doesn't return all the image URLs, especially the images I was trying to get. Removing the -UseBasicParsing switch solves the problem, as below:
$iwr=Invoke-WebRequest -Uri "$Uri"
But, then, it creates another problem. [Edit] As soon as I execute the next statement below:
$iwr.Images
or
$iwr.Images.src
it opens a pop-up saying
"You'll need an app to open this about."
I configured Internet Explorer for first-time use days ago, and I have rechecked it. I also changed the user agent to Chrome, and I am still getting the pop-up.
How do I prevent this pop-up for any web page or website in general?
[Edit]: A more efficient script solved the problem while still using the -UseBasicParsing switch. It doesn't produce any pop-up and returns all the image URLs, including the somehow 'masked' ones. The credit goes to @postanote, as below:
Clear-Host
# Regular expression Urls terminating with '.jpg' or '.png' for domain name space
$regexDomainAddress = "[(http(s)?):\/\/(www\.)?a-z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-z0-9@:%_\+.~#?&//=]*)((.jpg(\/)?)|(.png(\/)?)){1}(?!([\w\/]+))"
$images=((Invoke-WebRequest -Uri $url -UseBasicParsing).Images `
| Select-String -pattern $regexDomainAddress -Allmatches `
| ForEach-Object {$_.Matches} `
| Select-Object $_.Value -Unique).Value -replace 'href=','' `
| Select-Object -Unique
What you are attempting to do sounds very similar to this post:
How do I get the output file to contain the images on the webpage and
not just the links to the images?
invoke-webrequest to get complete web page with images
Update
Follow-up after the OP update
Using your exact post, I do not get any popups at all on the systems I tested on.
$iwr=Invoke-WebRequest -Uri "$url" -UseBasicParsing
$iwr.Images
outerHTML : <img id="id_p" class="id_avatar sw_spd" style="display:none" aria-hidden="true"
src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAEALAAAAAABAAEAAAIBTAA7" aria-label="Profile Picture"
onError="FallBackToDefaultProfilePic(this)"/>
tagName : IMG
id : id_p
class : id_avatar sw_spd
style : display:none
aria-hidden : true
src : data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAEALAAAAAABAAEAAAIBTAA7
aria-label : Profile Picture
onError : FallBackToDefaultProfilePic(this)
...
$iwr.Images.src
data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAEALAAAAAABAAEAAAIBTAA7
/sa/simg/sw_mg_l_4d_cct.png
http://tse3.mm.bing.net/th?id=OIP.fIx_Z6ywbsKCvY-PQkH8NAHaGN&w=230&h=170&rs=1&pcl=dddddd&o=5&pid=1.1
....
So, this sounds like something environmental on your host(s). So, give the below approach a shot and see if you get hit with any popups. It's more code, but may be an option, if it works for your use case.
Clear-Host
# Regular expression Urls terminating with '.jpg' or '.png' for domain name space
$regexDomainAddress = "[(http(s)?):\/\/(www\.)?a-z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-z0-9@:%_\+.~#?&//=]*)((.jpg(\/)?)|(.png(\/)?)){1}(?!([\w\/]+))"
((Invoke-WebRequest -Uri $url).Links `
| Select-String -pattern $regexDomainAddress -Allmatches `
| ForEach-Object {$_.Matches} `
| Select-Object $_.Value -Unique).Value -replace 'href=','' `
| Select-Object -Unique
Clear-Host
# Regular expression Urls terminating with '.jpg' or '.png' for relative url
$regexRelativeUrl = "[a-z]{2,6}\b([-a-z0-9@:%_\+.~#?&//=]*)((.jpg(\/)?)|(.png(\/)?)){1}(?!([\w\/]+))"
((Invoke-WebRequest -Uri $url).Links `
| Select-String -pattern $regexRelativeUrl -Allmatches `
| ForEach-Object {$_.Matches} `
| Select-Object $_.Value -Unique).Value -replace 'href=','' `
| Select-Object -Unique
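If the goal is only to list image URLs, a smaller pattern applied to the raw page content may be enough. The following is a minimal sketch and not the approach used above: the $html string is made-up sample markup standing in for the .Content of an Invoke-WebRequest -UseBasicParsing response, and the pattern only handles double-quoted src attributes.

```powershell
# Made-up sample markup; in practice this would be the .Content of the response.
$html = '<p><img src="/sa/simg/logo.png"><IMG class="x" src="http://example.com/pic.jpg"></p>'

# Capture the value of each double-quoted src attribute inside an <img> tag.
$pattern = '<img\b[^>]*?\bsrc\s*=\s*"([^"]+)"'
$options = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase

$imageUrls = [regex]::Matches($html, $pattern, $options) |
    ForEach-Object { $_.Groups[1].Value } |
    Select-Object -Unique
$imageUrls
```

Relative results such as /sa/simg/logo.png would still need to be combined with the site's base URL before they can be downloaded.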

Get first value that is not false from json mixed types

This seems like it should be straightforward, but I'm not sure why PowerShell is having trouble.
I'm fetching data from nodejs.org, converting it from JSON, and then I want to get the first object whose lts value is not false.
(Invoke-WebRequest -UseBasicParsing -Uri "https://nodejs.org/dist/index.json").Content |
ConvertFrom-Json | ? { $_.lts -ne 'False' }
I also tried this, but it didn't work either:
| ? { -not (-not $_.lts) }
I know the above doesn't actually get me the first value. I haven't found that solution yet. But help with that would be nice too!
The data set is something like this:
[
{"lts": false},
{"lts": "Carbon"}
]
You can see the complete data set here.
Update
When I set the JSON value to a variable it works. Strange.
Invoke-WebRequest -UseBasicParsing -Uri 'https://nodejs.org/dist/index.json' `
|% Content `
| ConvertFrom-Json `
|% { $_ } `
|? lts -ne $False `
;
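In Windows PowerShell, ConvertFrom-Json emits the JSON array as a single Object[], so the filter sees the whole array as one item instead of one record at a time. The ForEach-Object placed after ConvertFrom-Json unrolls the array so that each record reaches the filter individually.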
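To get only the first entry whose lts is not false, Select-Object -First 1 can be added after the filter. The sketch below uses a small inline JSON string with the same shape as the question's sample instead of the live download; the version fields are made-up illustration data.

```powershell
# Made-up sample with the same shape as the question's data set.
$json = '[{"version":"a","lts":false},{"version":"b","lts":"Carbon"},{"version":"c","lts":"Boron"}]'

$firstLts = $json |
    ConvertFrom-Json |
    ForEach-Object { $_ } |      # unroll the array if it arrives as one object
    Where-Object { $_.lts } |    # $false is falsy, a non-empty name is truthy
    Select-Object -First 1

$firstLts.lts
```

The script block form of Where-Object relies on truthiness, so it does not depend on comparing a mixed Boolean/string property against a literal.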

PowerShell TFS REST-API object loop advice

I have a piece of code that I managed to get working, but I feel it could be written much more simply. I'm new to PowerShell and am trying to understand it better. I have a double foreach below to get the key and value out of the PSCustomObject that comes out of the TFS REST-API call.
For some reason I'm doing two loops, but I don't understand why this is required.
A sample of the contents of $nameCap.userCapabilities is
Name1 Name2
----- -----
Value1 Value2
So basically I want to loop over the "name/value pairs" and get their values.
What can I do better?
$uri = "$tfsUri/_apis/distributedtask/pools/$global:agentPoolId/agents?api-version=3.0-preview&includeCapabilities=true"
$result = (Invoke-RestMethod -Uri $uri -Method Get -ContentType "application/json" -UseDefaultCredentials).value | select name, userCapabilities, systemCapabilities
#Loop over all agents and their capabilities
foreach ($nameCap in $result)
{
    $capabilityNamesList = New-Object System.Collections.ArrayList
    #Loop over all userCapabilities and store their names
    @($nameCap.userCapabilities) | %{
        $current_Cap = $_
        $req_cap_exists = $false
        Get-Member -MemberType Properties -InputObject $current_Cap | %{
            $temp_NAME = $_.Name
            $temp_Value = Select-Object -InputObject $current_Cap -ExpandProperty $_.Name
            [void]$capabilityNamesList.Add($temp_NAME)
        }
    }
}
If you just need the name and its values, such as userCapabilities, select those properties directly:
$result | select Name,userCapabilities
If that doesn't give you a table automatically, append | ft -force.
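If the aim is to walk the name/value pairs of a single capabilities object, the PSObject.Properties collection can replace the nested Get-Member and Select-Object calls. This is a minimal sketch: $caps is a hand-built stand-in for $nameCap.userCapabilities, using the sample names from the question.

```powershell
# Hand-built stand-in for $nameCap.userCapabilities.
$caps = [pscustomobject]@{ Name1 = 'Value1'; Name2 = 'Value2' }

$capabilityNamesList = New-Object System.Collections.ArrayList
foreach ($prop in $caps.PSObject.Properties) {
    # Each entry exposes both the property name and its value.
    [void]$capabilityNamesList.Add($prop.Name)
    '{0} = {1}' -f $prop.Name, $prop.Value
}
```

The second loop in the original exists only to turn property names back into values, which each entry of PSObject.Properties already provides.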