How to get the value of a link from a website with PowerShell?

I want to get the download URL of the latest version of GIMP from its site. I wrote a script, but it returns only the link name and I do not know how to get the full value.
$web = Invoke-WebRequest -Uri "https://download.gimp.org/pub/gimp/v2.10/windows/"
$web.Links | Where-Object href -like '*exe' | select -Last 1 | select -expand href
The above code returns the link name (gimp-2.10.32-setup.exe),
but I need the full URL ("https://download.gimp.org/pub/gimp/v2.10/windows/gimp-2.10.32-setup.exe").
Can someone guide me on how to do this?

The URL returned by the page is relative.
Just prepend the root part of the URL yourself.
$Uri = 'https://download.gimp.org/pub/gimp/v2.10/windows/'
$web = Invoke-WebRequest -Uri $uri
$ExeRelLink = $web.Links | Where-Object href -like '*exe' | select -Last 1 -expand href
# Here is your download link.
$DownloadLink = $Uri + $ExeRelLink
Additional Note
You can combine the -Last and -ExpandProperty parameters from your two select statements into one, as done above.
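If you also want to download the installer once you have the link, a minimal sketch (the destination folder $env:TEMP is an assumption) could look like this:
# Download the file the link points to (destination path is an assumption)
Invoke-WebRequest -Uri $DownloadLink -OutFile (Join-Path $env:TEMP $ExeRelLink)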

There are several download sites with exactly the same or a very similar layout to this GIMP page, including many Apache projects like Tomcat and ActiveMQ. I wrote a little function to parse these and other such pages in the past, and interestingly it also works for this GIMP page, so I thought it was worth sharing.
Function Extract-FilenameFromWebsite {
    [cmdletbinding()]
    Param(
        [parameter(Position=0,ValueFromPipeline)]
        $Url
    )
    begin {
        $pattern = '<a href.+">(?<FileName>.+?\..+?)</a>\s+(?<Date>\d+-.+?)\s{2,}(?<Size>\d+\w)?'
    }
    process {
        $website = Invoke-WebRequest $Url -UseBasicParsing
        switch -Regex ($website.Content -split '\r?\n') {
            $pattern {
                [PSCustomObject]@{
                    FileName     = $matches.FileName
                    URL          = '{0}{1}' -f $Url, $matches.FileName
                    LastModified = [datetime]$matches.Date
                    Size         = $matches.Size
                }
            }
        }
    }
}
It's assumed the URL passed in has a trailing slash. If you want to account for either case, you can add this simple line at the top of the process block.
if($Url -notmatch '/$'){$Url = "$Url/"}
To get the latest version, call the function like this
$url = 'https://download.gimp.org/pub/gimp/v2.10/windows/'
$latest = Extract-FilenameFromWebsite -Url $Url | Where-Object filename -like '*exe' |
Sort-Object LastModified | Select-Object -Last 1
$latest.url
Or you could expand the property while retrieving
$url = 'https://download.gimp.org/pub/gimp/v2.10/windows/'
$latesturl = Extract-FilenameFromWebsite -Url $Url | Where-Object filename -like '*exe' |
Sort-Object LastModified | Select-Object -Last 1 -ExpandProperty URL
$latesturl

Related

web scraping using powershell

I am trying to scrape pages of the website https://www.enghindi.com/.
The URLs are saved in a CSV file, for example:
URL     Hindi meaning
url1    hindi meaning
url2    hindi meaning
Now, every time I run the following script, it only shows the result for url1, and that result goes into multiple cells. I want all of url1's results to be in one cell (in the Hindi meaning column), and likewise for url2.
url1 : https://www.enghindi.com/index.php?q=close
url2 : https://www.enghindi.com/index.php?q=compose
$URLs = import-csv -path C:\Scripts\PS\urls.csv | select -expandproperty urls
foreach ($url in $urls)
{
$web = Invoke-WebRequest $url
$data = $web.AllElements | Where{$_.TagName -eq "BIG"} | Select-Object -Expand InnerText
$datafinal = $data.where({$_ -like "*which*"},'until')
}
foreach ($item in $datafinal) {
[pscustomobject]@{ Url = $url; Data = $item } | Export-Csv -Path C:\Scripts\PS\output.csv -NoTypeInformation -Encoding unicode -Append
}
Are there other ways I can get English-to-Hindi word meanings using web scraping instead of copying and pasting? I would prefer Google Translate, but I think that is more difficult, which is why I am trying enghindi.com.
Thanks a lot.
Web scraping, due to its inherent unreliability, should only be a last resort.
You can make it work in Windows PowerShell, but note that HTML DOM parsing is no longer available in PowerShell (Core) 7+.
Your code has two basic problems:
It operates on $datafinal after the foreach loop, at which point you only see the results of the last Invoke-WebRequest call.
You loop over each element of array $datafinal and create an output object for each, instead of creating an output object per input URL.
The following reformulation fixes these problems:
# Sample input URLs
$URLs = @(
    'https://www.enghindi.com/index.php?q=close',
    'https://www.enghindi.com/index.php?q=compose'
)
$URLs |
    ForEach-Object {
        $web = Invoke-WebRequest $_
        $data = $web.AllElements | Where { $_.TagName -eq "BIG" } | Select-Object -Expand InnerText
        $datafinal = $data.Where({ $_ -like "*which*" }, 'until')
        # Create the output object for the URL at hand and implicitly output it.
        # Join the $datafinal elements with newlines to form a single value.
        [pscustomobject] @{
            Url   = $_
            Hindi = $datafinal -join "`n"
        }
    } |
    ConvertTo-Csv -NoTypeInformation
Note that, for demonstration purposes, ConvertTo-Csv is used in lieu of Export-Csv, which allows you to see the results instantly.
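To write the results to a file instead, a minimal sketch (reusing the output path from the question) simply swaps the last pipeline stage:
# Swap the final ConvertTo-Csv stage for Export-Csv to persist the results (path reuses the question's)
Export-Csv -Path C:\Scripts\PS\output.csv -NoTypeInformation -Encoding unicode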

getting the latest download url and version of a software from a website with powershell

I want to get the latest version of the software and compare it with the version installed on the system; if the online version is newer, install it.
$web = Invoke-WebRequest -Uri "https://www.webex.com/downloads/jabber/jabber-vdi.html"
( $latest = $web.AllElements | Where-Object {$_.TagName -eq "li"} | Select-String "Jabber Windows client" | Select -First 1 )
To extract the version number and URL I wrote the following, but it does not work:
( $latestversion = $latest.Context | Select-String -Pattern "\d\d.\d" )
( $downloadUrl = $latest.Context | Select-String -Pattern "\w.msi" )
I have also tried this way, but it does not work either:
$latestversion = $latest.links.href
You can use the Links property to view all retrieved links, then filter it to select only those ending with "msi"
(Invoke-WebRequest -Uri "https://www.webex.com/downloads/jabber/jabber-vdi.html").Links | Where-Object href -like '*msi' | select -First 1 | select -expand href
Edit: to get both the version and the URL, you could use ParsedHtml like so:
(Invoke-WebRequest -Uri "https://www.webex.com/downloads/jabber/jabber-vdi.html").ParsedHtml.body.getElementsByClassName('vdi-links')[0].innerHTML -match "<LI>(\d{1,2}\.\d).*(https.*msi)"
write-host "Version $($Matches[1]) available at $($Matches[2])"
$Matches is an automatic variable which contains the result of the -match regex. The parentheses in the regex define our capture groups, so for our regex of "<LI>(\d{1,2}\.\d).*(https.*msi)":
The first group is (\d{1,2}\.\d), where \d matches any digit and {1,2} means match one or two of them (so we could match "9" or "10"); \. matches the dot character literally.
The second group is (https.*msi), where . matches any character and * means match any number of occurrences.
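As a minimal, self-contained sketch (the sample string below is made up for illustration), you can see how the two capture groups end up in $Matches:
# Hypothetical sample string standing in for the page's innerHTML
$sample = '<LI>14.1 Windows client https://example.com/files/installer.msi'
if ($sample -match "<LI>(\d{1,2}\.\d).*(https.*msi)") {
    # $Matches[1] -> '14.1', $Matches[2] -> 'https://example.com/files/installer.msi'
    Write-Host "Version $($Matches[1]) available at $($Matches[2])"
}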

Issues pulling back values while web scraping a table

I am attempting to pull the text from a table on a webpage. I pull the webpage using Invoke-WebRequest, inspect that variable's "AllElements" property, and attempt to pull only the inner text of elements whose class matches "table", but when I run the script nothing is pulled back and no errors are shown.
$URI = 'https://www.python.org/downloads/release/python-2716/'
$R = Invoke-WebRequest -URI $URI
$R.AllElements|?{$_.Class -eq "table"}|select innerText
I was hoping to show the values of the table on the python.org site, but when the script is run nothing is returned.
How do I solve this problem?
That is because there is no table or "table" class on that page; it is a div with dynamically generated ordered list items.
You can see this in the browser developer tools, using F12 in Edge or similar in Firefox, Chrome, etc...
$URI = 'https://www.python.org/downloads/release/python-2716'
$R = Invoke-WebRequest -URI $URI
$R.AllElements |
Where {$_.Class -eq 'container' }
$R.AllElements |
Where {$_.Class -eq 'list-row-container menu' }
($R.AllElements |
Where {$_.class -eq 'list-row-container menu'}).innerText
($R.AllElements |
Where {$_.Class -eq 'release-number' })
($R.AllElements |
Where {$_.Class -eq 'release-number' }).outerHTML
(($R.AllElements |
Where {$_.Class -eq 'release-number' }).outerHTML -split '<a href="|/">Python')[2]
Or just do this...
$R.Links
$R.Links.href
$R.Links.href -match 'downloads'
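Note that -match applied to an array (as $R.Links.href is here) returns the matching elements rather than a Boolean, so the last line above already filters the hrefs down to the download links. A minimal sketch of picking one out (purely illustrative):
# -match against an array returns the elements that match the pattern
$downloadLinks = $R.Links.href -match 'downloads'
# e.g. take the first matching link
$downloadLinks | Select-Object -First 1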

Powershell: -UseBasicParsing doesn't return all the src elements

I was trying to build a simple spider that returns the URLs of images from a web page (not the whole website), and I was using this:
$iwr=Invoke-WebRequest -Uri "$Uri" -UseBasicParsing
But recently I found out that sometimes it doesn't return all the image URLs, especially the images I was trying to get. Removing the -UseBasicParsing switch solves that problem, as below:
$iwr=Invoke-WebRequest -Uri "$Uri"
But then it creates another problem. [Edit] As soon as I execute either of the statements below:
$iwr.Images
or
$iwr.Images.src
it opens up a pop-up saying
"You ll need an app to open this about."
I already configured my Internet Explorer for first-time use days ago, and I have rechecked it. I changed the user agent to Chrome, and I am still getting the pop-up.
How do I prevent this pop-up for any webpage/website in general?
[Edit]: A more efficient script solved the problem, while still using the -UseBasicParsing switch. It doesn't give any pop-up but returns all the image URLs, including the somehow 'masked' URLs. The credit goes to @postanote, as below:
Clear-Host
# Regular expression Urls terminating with '.jpg' or '.png' for domain name space
$regexDomainAddress = "[(http(s)?):\/\/(www\.)?a-z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-z0-9@:%_\+.~#?&//=]*)((.jpg(\/)?)|(.png(\/)?)){1}(?!([\w\/]+))"
$images=((Invoke-WebRequest -Uri $url -UseBasicParsing).Images `
| Select-String -pattern $regexDomainAddress -Allmatches `
| ForEach-Object {$_.Matches} `
| Select-Object $_.Value -Unique).Value -replace 'href=','' `
| Select-Object -Unique
What you are attempting to do sounds very similar to this post:
How do I get the output file to contain the images on the webpage and not just the links to the images?
invoke-webrequest to get complete web page with images
Update
Follow-up after the OP update
Using your exact post, I do not get any popups at all on the systems I tested on.
$iwr=Invoke-WebRequest -Uri "$url" -UseBasicParsing
$iwr.Images
outerHTML : <img id="id_p" class="id_avatar sw_spd" style="display:none" aria-hidden="true"
src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAEALAAAAAABAAEAAAIBTAA7" aria-label="Profile Picture"
onError="FallBackToDefaultProfilePic(this)"/>
tagName : IMG
id : id_p
class : id_avatar sw_spd
style : display:none
aria-hidden : true
src : data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAEALAAAAAABAAEAAAIBTAA7
aria-label : Profile Picture
onError : FallBackToDefaultProfilePic(this)
...
$iwr.Images.src
data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAEALAAAAAABAAEAAAIBTAA7
/sa/simg/sw_mg_l_4d_cct.png
http://tse3.mm.bing.net/th?id=OIP.fIx_Z6ywbsKCvY-PQkH8NAHaGN&w=230&h=170&rs=1&pcl=dddddd&o=5&pid=1.1
....
So this sounds like something environmental on your host(s). Give the below approach a shot and see if you get hit with any popups. It's more code, but it may be an option if it works for your use case.
Clear-Host
# Regular expression Urls terminating with '.jpg' or '.png' for domain name space
$regexDomainAddress = "[(http(s)?):\/\/(www\.)?a-z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-z0-9@:%_\+.~#?&//=]*)((.jpg(\/)?)|(.png(\/)?)){1}(?!([\w\/]+))"
((Invoke-WebRequest -Uri $url).Links `
| Select-String -pattern $regexDomainAddress -Allmatches `
| ForEach-Object {$_.Matches} `
| Select-Object $_.Value -Unique).Value -replace 'href=','' `
| Select-Object -Unique
Clear-Host
# Regular expression Urls terminating with '.jpg' or '.png' for relative url
$regexRelativeUrl = "[a-z]{2,6}\b([-a-z0-9@:%_\+.~#?&//=]*)((.jpg(\/)?)|(.png(\/)?)){1}(?!([\w\/]+))"
((Invoke-WebRequest -Uri $url).Links `
| Select-String -pattern $regexRelativeUrl -Allmatches `
| ForEach-Object {$_.Matches} `
| Select-Object $_.Value -Unique).Value -replace 'href=','' `
| Select-Object -Unique

PowerShell TFS REST-API object loop advice

I have a piece of code that I managed to get working, but I feel it can be written a lot more simply. I'm new to PowerShell and am trying to understand it better. I have a double foreach below to get the keys and values out of the PSCustomObject that comes back from the TFS REST API call.
For some reason I'm doing 2 loops, but I don't understand why this is required.
A sample of the contents of $nameCap.userCapabilities is
Name1 Name2
----- -----
Value1 Value2
So basically I want to loop over the "name/value pairs" and get their values.
What can I do better?
$uri = "$tfsUri/_apis/distributedtask/pools/$global:agentPoolId/agents?api-version=3.0-preview&includeCapabilities=true"
$result = (Invoke-RestMethod -Uri $uri -Method Get -ContentType "application/json" -UseDefaultCredentials).value | select name, userCapabilities, systemCapabilities
# Loop over all agents and their capabilities
foreach ($nameCap in $result)
{
    $capabilityNamesList = New-Object System.Collections.ArrayList
    # Loop over all userCapabilities and store their names
    @($nameCap.userCapabilities) | %{
        $current_Cap = $_
        $req_cap_exists = $false
        Get-Member -MemberType Properties -InputObject $current_Cap | %{
            $temp_NAME = $_.Name
            $temp_Value = Select-Object -InputObject $current_Cap -ExpandProperty $_.Name
            [void]$capabilityNamesList.Add($temp_NAME)
        }
    }
}
I mean, if you just need the name and value, like userCapabilities, then just select for it.
So:
$result | select Name,userCapabilities
And if it doesn't give you a table automatically, then pipe it to | ft -Force (Format-Table).
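If you do want to enumerate the name/value pairs of each userCapabilities object without the Get-Member round-trip, a minimal sketch (reusing the variable names from the question) can use the PSObject.Properties collection instead:
foreach ($nameCap in $result)
{
    # Each property of the userCapabilities object is already a name/value pair
    foreach ($cap in $nameCap.userCapabilities.PSObject.Properties)
    {
        # e.g. Name1 = Value1, Name2 = Value2
        '{0} = {1}' -f $cap.Name, $cap.Value
    }
}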