I am attempting to pull the text from a table on a webpage. I pull the webpage using Invoke-WebRequest, set that variable to show "AllElements" and attempt to only pull the inner values matching "Table"; but when I run the script nothing is pulled back and no errors are shown.
$URI = 'https://www.python.org/downloads/release/python-2716/'
$R = Invoke-WebRequest -URI $URI
$R.AllElements|?{$_.Class -eq "table"}|select innerText
I was hoping to show the values of the table on the python.org site, but when the script is run nothing is returned.
How do I solve this problem?
That is because the page contains no table element and no element with the class "table". The data sits in a div that holds dynamically generated ordered list items.
You can see this in the browser developer tools, using F12 in Edge or similar in Firefox, Chrome, etc...
$URI = 'https://www.python.org/downloads/release/python-2716'
$R = Invoke-WebRequest -URI $URI
$R.AllElements |
Where {$_.Class -eq 'container' }
$R.AllElements |
Where {$_.Class -eq 'list-row-container menu' }
($R.AllElements |
Where {$_.class -eq 'list-row-container menu'}).innerText
($R.AllElements |
Where {$_.Class -eq 'release-number' })
($R.AllElements |
Where {$_.Class -eq 'release-number' }).outerHTML
(($R.AllElements |
Where {$_.Class -eq 'release-number' }).outerHTML -split '<a href="|/">Python')[2]
Or just do this...
$R.Links
$R.Links.href
$R.Links.href -match 'downloads'
Related
I want to get the download URL of the latest version of GIMP from its site. I wrote a script, but it returns only the link name, and I do not know how to get the full URL.
$web = Invoke-WebRequest -Uri "https://download.gimp.org/pub/gimp/v2.10/windows/"
$web.Links | Where-Object href -like '*exe' | select -Last 1 | select -expand href
The above code returns the link name (gimp-2.10.32-setup.exe),
but I need the full URL ("https://download.gimp.org/pub/gimp/v2.10/windows/gimp-2.10.32-setup.exe").
Can someone guide me on how to do it?
The href returned by the page is relative.
Just prepend the base part of the URL yourself.
$Uri = 'https://download.gimp.org/pub/gimp/v2.10/windows/'
$web = Invoke-WebRequest -Uri $uri
$ExeRelLink = $web.Links | Where-Object href -like '*exe' | select -Last 1 -expand href
# Here is your download link.
$DownloadLink = $Uri + $ExeRelLink
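Plain concatenation works here because the href is a bare file name and the base URL ends in a slash. As a minimal sketch of a more general approach, the .NET System.Uri class can resolve a relative href against the page URL. The file name is the one quoted in the question, and the variable names are only illustrative.

```powershell
# Base page URL from the question (note the trailing slash).
$baseUri = [uri]'https://download.gimp.org/pub/gimp/v2.10/windows/'

# Relative href as returned by $web.Links (value taken from the question).
$relLink = 'gimp-2.10.32-setup.exe'

# The Uri(Uri, String) constructor resolves the relative part against the base.
$downloadLink = [uri]::new($baseUri, $relLink).AbsoluteUri
$downloadLink
```

This also copes with hrefs that start with a slash or contain "../", which simple string concatenation would get wrong.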
Additional Note
You can combine the -Last and -Expand from your 2 select statements into 1.
There are several download sites with the same or a very similar layout to this GIMP page, including many Apache projects such as Tomcat and ActiveMQ. I wrote a small function in the past to parse these and other pages, and it also works for this GIMP page, so I thought it was worth sharing.
Function Extract-FilenameFromWebsite {
[cmdletbinding()]
Param(
[parameter(Position=0,ValueFromPipeline)]
$Url
)
begin{
$pattern = '<a href.+">(?<FileName>.+?\..+?)</a>\s+(?<Date>\d+-.+?)\s{2,}(?<Size>\d+\w)?'
}
process{
$website = Invoke-WebRequest $Url -UseBasicParsing
switch -Regex ($website.Content -split '\r?\n'){
$pattern {
[PSCustomObject]@{
FileName = $matches.FileName
URL = '{0}{1}' -f $Url,$matches.FileName
LastModified = [datetime]$matches.Date
Size = $matches.Size
}
}
}
}
}
The function assumes the URL passed in has a trailing slash. To handle both cases, add this line at the top of the process block.
if($Url -notmatch '/$'){$Url = "$Url/"}
To get the latest version, call the function like this
$url = 'https://download.gimp.org/pub/gimp/v2.10/windows/'
$latest = Extract-FilenameFromWebsite -Url $Url | Where-Object filename -like '*exe' |
Sort-Object LastModified | Select-Object -Last 1
$latest.url
Or you could expand the property while retrieving
$url = 'https://download.gimp.org/pub/gimp/v2.10/windows/'
$latesturl = Extract-FilenameFromWebsite -Url $Url | Where-Object filename -like '*exe' |
Sort-Object LastModified | Select-Object -Last 1 -ExpandProperty URL
$latesturl
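To see what the pattern in Extract-FilenameFromWebsite captures without hitting a web server, it can be run against a single listing line. The following is only a sketch: the sample line is made up for illustration and is not copied from the GIMP site.

```powershell
# Same pattern as in Extract-FilenameFromWebsite.
$pattern = '<a href.+">(?<FileName>.+?\..+?)</a>\s+(?<Date>\d+-.+?)\s{2,}(?<Size>\d+\w)?'

# Hypothetical directory-listing line (assumed layout: link, date and time, size).
$line = '<a href="example-1.0-setup.exe">example-1.0-setup.exe</a>   2022-06-14 08:15  258M'

if ($line -match $pattern) {
    # $Matches holds the named groups of the last successful -match.
    $parsed = [PSCustomObject]@{
        FileName     = $Matches.FileName
        LastModified = $Matches.Date
        Size         = $Matches.Size
    }
    $parsed
}
```

The date group ends at the first run of two or more whitespace characters, which is why a single space between date and time stays inside the capture.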
I'm trying to figure out how to follow the odata.nextLink from a PowerShell script I'm writing to get Azure AD sign-in information for users.
$LastLogin = Invoke-WebRequest -Headers $AuthHeader1 -Uri "https://graph.microsoft.com/beta/users?`$select=displayName,userPrincipalName,signInActivity" -Verbose
$result = ($LastLogin.Content | ConvertFrom-Json).Value
$result | select DisplayName,UserPrincipalName,@{n="LastLoginDate";e={$_.signInActivity.lastSignInDateTime}}
This displays only the first 100 results.
If I view the $lastLogin output, I can see that the content includes the @odata.nextLink property, but I can't seem to get the URI to pass into a while loop to get all the results.
If I do $lastLogin.'@odata.nextLink' it just returns a null value.
Where am I going wrong?
Thanks
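In the second step, rather than using
$result = ($LastLogin.Content | ConvertFrom-Json).Value
use $result = ($LastLogin.Content | ConvertFrom-Json) and then read the next link from $result.'@odata.nextLink'. Selecting .Value keeps only the user records and discards the next link; the records are still available as $result.Value.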
It worked for me.
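The while loop the question asks about could look roughly like the sketch below. To keep it runnable offline, a hypothetical $pages hashtable stands in for the Graph responses; in a real script each lookup would be an Invoke-WebRequest call with the same headers, followed by reading .Content.

```powershell
# Hypothetical stand-in for the service: maps a URI to the JSON body it would return.
$pages = @{
    'page1' = '{"@odata.nextLink":"page2","value":[{"displayName":"A"},{"displayName":"B"}]}'
    'page2' = '{"value":[{"displayName":"C"}]}'
}

$uri = 'page1'
$all = @(
    while ($uri) {
        # Real script (assumption): (Invoke-WebRequest -Headers $AuthHeader1 -Uri $uri).Content
        $page = $pages[$uri] | ConvertFrom-Json

        # Emit this page's records into $all.
        $page.value

        # The last page has no @odata.nextLink, so this becomes $null and the loop ends.
        $uri = $page.'@odata.nextLink'
    }
)
$all.displayName
```

The property name has to be quoted because it contains "@" and ".".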
I was trying to build a simple spider that returns the urls of images from a web-page (not the whole website). And I was using this:
$iwr=Invoke-WebRequest -Uri "$Uri" -UseBasicParsing
But recently I found out that it sometimes doesn't return all the image URLs, especially the ones I was trying to get. Removing the -UseBasicParsing switch solves the problem, as below:
$iwr=Invoke-WebRequest -Uri "$Uri"
But, then, it creates another problem. [Edit] As soon as I execute the next statement below:
$iwr.Images
or
$iwr.Images.src
it opens up a pop-up saying
"You ll need an app to open this about."
I already configured Internet Explorer for first-time use days ago, and I have rechecked it. I changed the user agent to Chrome, and I am still getting the pop-up.
How do I prevent this pop-up for any webpage or website in general?
[Edit]: A more efficient script solved the problem, and it still uses the -UseBasicParsing switch. It doesn't give any pop-up and returns all the image URLs, including the somehow "masked" URLs. The credit goes to @postanote, as below:
Clear-Host
# Regular expression Urls terminating with '.jpg' or '.png' for domain name space
$regexDomainAddress = "[(http(s)?):\/\/(www\.)?a-z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-z0-9@:%_\+.~#?&//=]*)((.jpg(\/)?)|(.png(\/)?)){1}(?!([\w\/]+))"
$images=((Invoke-WebRequest -Uri $url -UseBasicParsing).Images `
| Select-String -pattern $regexDomainAddress -Allmatches `
| ForEach-Object {$_.Matches} `
| Select-Object $_.Value -Unique).Value -replace 'href=','' `
| Select-Object -Unique
What you are attempting to do sounds very similar to this post:
How do I get the output file to contain the images on the webpage and
not just the links to the images?
invoke-webrequest to get complete web page with images
Update
Follow-up after the OP update
Using your exact post, I do not get any popups at all on the systems I tested on.
$iwr=Invoke-WebRequest -Uri "$url" -UseBasicParsing
$iwr.Images
outerHTML : <img id="id_p" class="id_avatar sw_spd" style="display:none" aria-hidden="true"
src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAEALAAAAAABAAEAAAIBTAA7" aria-label="Profile Picture"
onError="FallBackToDefaultProfilePic(this)"/>
tagName : IMG
id : id_p
class : id_avatar sw_spd
style : display:none
aria-hidden : true
src : data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAEALAAAAAABAAEAAAIBTAA7
aria-label : Profile Picture
onError : FallBackToDefaultProfilePic(this)
...
$iwr.Images.src
data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAEALAAAAAABAAEAAAIBTAA7
/sa/simg/sw_mg_l_4d_cct.png
http://tse3.mm.bing.net/th?id=OIP.fIx_Z6ywbsKCvY-PQkH8NAHaGN&w=230&h=170&rs=1&pcl=dddddd&o=5&pid=1.1
....
So, this sounds like something environmental on your host(s). So, give the below approach a shot and see if you get hit with any popups. It's more code, but may be an option, if it works for your use case.
Clear-Host
# Regular expression Urls terminating with '.jpg' or '.png' for domain name space
$regexDomainAddress = "[(http(s)?):\/\/(www\.)?a-z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-z0-9@:%_\+.~#?&//=]*)((.jpg(\/)?)|(.png(\/)?)){1}(?!([\w\/]+))"
((Invoke-WebRequest -Uri $url).Links `
| Select-String -pattern $regexDomainAddress -Allmatches `
| ForEach-Object {$_.Matches} `
| Select-Object $_.Value -Unique).Value -replace 'href=','' `
| Select-Object -Unique
Clear-Host
# Regular expression Urls terminating with '.jpg' or '.png' for relative url
$regexRelativeUrl = "[a-z]{2,6}\b([-a-z0-9@:%_\+.~#?&//=]*)((.jpg(\/)?)|(.png(\/)?)){1}(?!([\w\/]+))"
((Invoke-WebRequest -Uri $url).Links `
| Select-String -pattern $regexRelativeUrl -Allmatches `
| ForEach-Object {$_.Matches} `
| Select-Object $_.Value -Unique).Value -replace 'href=','' `
| Select-Object -Unique
This seems like it should be straightforward, but I'm not sure why PowerShell is having trouble.
I'm getting data from Node.js, converting it from JSON, and then I want to get the first object whose lts value is not false.
(Invoke-WebRequest -UseBasicParsing -Uri "https://nodejs.org/dist/index.json").Content |
ConvertFrom-Json | ? { $_.lts -ne 'False' }
I also tried this, but it didn't work either:
| ? { -not (-not $_.lts) }
I know the above doesn't actually get me the first value. I haven't found that solution yet. But help with that would be nice too!
The data set is something like this:
[
{"lts": false},
{"lts": "Carbon"}
]
You can see the complete data set here.
Update
When I set the JSON value to a variable it works. Strange.
Invoke-WebRequest -UseBasicParsing -Uri 'https://nodejs.org/dist/index.json' `
|% Content `
| ConvertFrom-Json `
|% { $_ } `
|? lts -ne $False `
;
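In Windows PowerShell, ConvertFrom-Json sends a JSON array down the pipeline as a single Object[] rather than one record at a time, so Where-Object sees the whole array as one item. The ForEach-Object after ConvertFrom-Json unrolls the array so that each record is filtered individually.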
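As a small offline sketch of the same pipeline, including the "first value" part of the question: the JSON below follows the shape shown in the question, with made-up version values for illustration. Note that comparing against the string 'False' cannot work as a filter, because with a Boolean on the left PowerShell converts the right-hand string to $true.

```powershell
# Sample data in the shape shown in the question (version values are hypothetical).
$json = '[{"version":"v2","lts":false},{"version":"v1","lts":"Carbon"},{"version":"v0","lts":"Boron"}]'

$firstLts = $json |
    ConvertFrom-Json |
    ForEach-Object { $_ } |       # unrolls the array on Windows PowerShell 5.1; harmless on 7+
    Where-Object lts -ne $false | # drops records whose lts is the Boolean false
    Select-Object -First 1        # keeps only the first remaining record

$firstLts.version
```

Select-Object -First 1 stops the pipeline once it has its record, so later records are not processed.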
I have a piece of code that I managed to get working, but I feel it could be written much more simply. I'm new to PowerShell and am trying to understand it better. I have a double foreach below to get the keys and values out of the PSCustomObject that comes back from the TFS REST API call.
For some reason I'm doing two loops, but I don't understand why this is required.
A sample of the contents of $nameCap.userCapabilities is
Name1 Name2
----- -----
Value1 Value2
So basically I want to loop over the name/value pairs and get their values.
What can I do better?
$uri = "$tfsUri/_apis/distributedtask/pools/$global:agentPoolId/agents?api-version=3.0-preview&includeCapabilities=true"
$result = (Invoke-RestMethod -Uri $uri -Method Get -ContentType "application/json" -UseDefaultCredentials).value | select name, userCapabilities, systemCapabilities
#Loop over all agents and their capabilities
foreach ($nameCap in $result)
{
$capabilityNamesList = New-Object System.Collections.ArrayList
#Loop over all userCapabilities and store their names
@($nameCap.userCapabilities) | %{
$current_Cap = $_
$req_cap_exists = $false
Get-Member -MemberType Properties -InputObject $current_Cap | %{
$temp_NAME = $_.Name
$temp_Value = Select-Object -InputObject $current_Cap -ExpandProperty $_.Name
[void]$capabilityNamesList.Add($temp_NAME)
}
}
}
If you just need the agent name and its capability values, such as userCapabilities, then just select them:
$result | select Name,userCapabilities
If that doesn't give you a table automatically, append | ft -Force.
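If the individual name/value pairs are needed rather than a table, one alternative to the nested Get-Member loop is the PSObject.Properties collection that every PowerShell object exposes. This is a sketch of that different technique, not part of the answer above; the object below is a stand-in built from the sample names in the question.

```powershell
# Stand-in for one agent's userCapabilities object (sample names from the question).
$userCapabilities = [PSCustomObject]@{ Name1 = 'Value1'; Name2 = 'Value2' }

# A single loop: each property object carries both its name and its value.
$pairs = foreach ($prop in $userCapabilities.PSObject.Properties) {
    '{0} = {1}' -f $prop.Name, $prop.Value
}
$pairs
```

Inside the original outer foreach, $nameCap.userCapabilities would take the place of $userCapabilities.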