Scraping Multiple Pages and Making a Table - powershell

I've been trying to find an easy way to scrape data from a website and have found that PowerShell gets me the results I need, although I can only figure out how to do it one page at a time.
The URLs go from www.example.com/Item/1 to www.example.com/Item/40 and present data from a form.
I've used the commands:
$WebResponse = Invoke-WebRequest "www.example.com/Item/1"
$WebResponse.Forms.Fields
The results I get are what I need, but I want to be able to do this for all 40 pages and build a readable table from the results.
I'm really new to PowerShell, so I'm assuming there's just something I'm overlooking.

Just chuck it in a loop:
for ( $i = 1; $i -le 40; $i++ ) {   # -le (not -lt) so that page 40 is included
$WebResponse = Invoke-WebRequest "www.example.com/Item/$i"
$WebResponse.Forms.Fields
}

Another possible way to write it:
$start = 1
$end = 40
$start..$end | ForEach-Object {
$WebResponse = Invoke-WebRequest "www.example.com/Item/$_"
$WebResponse.Forms.Fields
}
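Neither loop builds the table the question asks for. A minimal sketch, assuming each page has one form of interest (Forms[0]) and that the Forms property is available; as far as I know it is only populated by Windows PowerShell 5.1, and PowerShell 6 and later return a basic response object without it. The idea is to turn each page's field dictionary into one object per page, so the results can go straight into Format-Table or Export-Csv. ConvertTo-FieldRow and Get-ItemFieldTable are hypothetical helper names.

```powershell
# Sketch: one row per page, one column per form field.
# Assumptions: Forms[0] is the form you want on every page, and
# $WebResponse.Forms works (Windows PowerShell 5.1).
# ConvertTo-FieldRow / Get-ItemFieldTable are hypothetical names.

function ConvertTo-FieldRow {
    param(
        [Parameter(Mandatory)][int]$Item,
        [Parameter(Mandatory)][System.Collections.IDictionary]$Fields
    )
    # [ordered] keeps Item as the first column, then the fields in page order
    $row = [ordered]@{ Item = $Item }
    foreach ($key in $Fields.Keys) {
        $row[$key] = $Fields[$key]
    }
    New-Object -TypeName PSObject -Property $row
}

function Get-ItemFieldTable {
    param(
        [string]$BaseUrl = 'www.example.com/Item',
        [int]$First = 1,
        [int]$Last = 40
    )
    foreach ($i in $First..$Last) {
        $WebResponse = Invoke-WebRequest "$BaseUrl/$i"
        ConvertTo-FieldRow -Item $i -Fields $WebResponse.Forms[0].Fields
    }
}
```

Usage: Get-ItemFieldTable | Format-Table -AutoSize to view it, or Get-ItemFieldTable | Export-Csv items.csv -NoTypeInformation to save it. Format-Table picks its columns from the first object, so if pages have different field sets, exporting to CSV or using Format-List may show more.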

Related

Read Invoke-WebRequest line by line

I'm trying to keep a central list of log file locations from which my log file cleanup script can grab the most up-to-date list.
$logpaths = (Invoke-WebRequest -UseBasicParsing -Uri 'http://10.7.58.99/logpaths.txt').Content
foreach($logpath in $logpaths)
{
"line"
$logpath
}
My script was partly working, but I was seeing some strange behavior. When I broke it down, I found that the foreach loop runs only once and dumps the entire contents.
If I download the file to a text file on the local machine, I can then use [System.IO.File]::ReadLines and it steps through perfectly. However, I don't want to download the file each time I run the script, or store it on the local server at all for that matter. How can I step through the content of Invoke-WebRequest line by line?
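The loop runs only once because .Content is a single string holding the whole file, so foreach sees one item. A minimal sketch of the simplest workaround, assuming the file is small and the server returns it as text (so .Content is a string rather than a byte array): split that string on line breaks. Split-ContentLine is a hypothetical helper name.

```powershell
# Split-ContentLine (hypothetical helper): split one string into lines.
function Split-ContentLine {
    param([Parameter(Mandatory)][AllowEmptyString()][string]$Content)
    # '\r?\n' matches both Windows (CRLF) and Unix (LF) line endings;
    # Where-Object drops empty entries, e.g. the one after a trailing newline
    $Content -split '\r?\n' | Where-Object { $_ -ne '' }
}

# Usage with the URL from the question (not run here):
# $content = (Invoke-WebRequest -UseBasicParsing -Uri 'http://10.7.58.99/logpaths.txt').Content
# foreach ($logpath in Split-ContentLine $content) { "line"; $logpath }
```

This still downloads the whole file into memory first; the stream-based answer that follows avoids that.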
Based on this example from the .NET docs, you could read a response stream line-by-line like this, which should have better performance.
$url = 'http://10.7.58.99/logpaths.txt'
& {
    $myHttpWebRequest = [System.Net.WebRequest]::Create($url)
    $myHttpWebResponse = $myHttpWebRequest.GetResponse()
    try {
        $receiveStream = $myHttpWebResponse.GetResponseStream()
        $encode = [System.Text.Encoding]::GetEncoding("utf-8")
        $readStream = [System.IO.StreamReader]::new($receiveStream, $encode)
        # Emit each line to the pipeline as soon as it is read
        while (-not $readStream.EndOfStream) {
            $readStream.ReadLine()
        }
        $readStream.Close()
    }
    finally {
        # Always release the connection, even if reading fails
        $myHttpWebResponse.Close()
    }
} | foreach {
    $logPath = $_
    # process each $logPath here
}
You might want to turn this into a nice little function. Let me know if you need help.
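A sketch of what that function could look like, split in two so the line-reading part works on any System.IO.Stream (and can be tried with an in-memory stream), while the URL part only opens the response. Read-StreamLine and Read-UrlLine are hypothetical names, and UTF-8 is assumed as the default encoding.

```powershell
# Read-StreamLine (hypothetical): emit each line of a stream to the pipeline.
function Read-StreamLine {
    param(
        [Parameter(Mandatory)][System.IO.Stream]$Stream,
        [System.Text.Encoding]$Encoding = [System.Text.Encoding]::UTF8
    )
    $reader = [System.IO.StreamReader]::new($Stream, $Encoding)
    try {
        # ReadLine() returns $null once the end of the stream is reached
        while ($null -ne ($line = $reader.ReadLine())) {
            $line
        }
    }
    finally {
        # Disposing the reader also closes the underlying stream
        $reader.Dispose()
    }
}

# Read-UrlLine (hypothetical): stream a URL's body line by line.
function Read-UrlLine {
    param([Parameter(Mandatory)][string]$Url)
    $response = [System.Net.WebRequest]::Create($Url).GetResponse()
    try {
        Read-StreamLine -Stream $response.GetResponseStream()
    }
    finally {
        $response.Close()
    }
}
```

Usage: Read-UrlLine 'http://10.7.58.99/logpaths.txt' | ForEach-Object { $logPath = $_ }. Lines reach the pipeline one at a time, so nothing is written to disk.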

Unable to display Data View result in PowerShell

I am new to Azure Data Explorer and Kusto queries, and I am learning from the online sample below:
https://dataexplorer.azure.com/clusters/help/databases/Samples
Here is a query that returns results in Data Explorer, but whose results I am unable to display in PowerShell:
StormEvents
| where DamageProperty >0
| limit 2
| project StormSummary.TotalDamages
Below is the reference for the code I am using to run the query in PowerShell (Example 2 on the linked page):
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/api/powershell/powershell
I changed only "$Query" and modified the last line of code as follows:
$dataView | Format-Table -AutoSize
The output I get is:
StormSummary_TotalDamages
-------------------------
{}
I also tried removing "TotalDamages" from "StormSummary.TotalDamages" in the query, but with the resulting data view I still cannot get at "TotalDamages":
StormSummary
------------
{TotalDamages, StartTime, EndTime, Details}
Someone helped me fix my issue, so I am posting the solution to help others.
Explanation:
Serializing the query result to a JSON string, converting it back with ConvertFrom-Json, and then converting the columns-and-rows layout into individual PSObjects solved the issue.
Code:
As in Example 2 mentioned in my question, we call:
$reader = $queryProvider.ExecuteQuery($query, $crp)
After this call, I removed the existing code and replaced it with the following, which retrieves the projected field data (TotalDamages):
$json = [Kusto.Cloud.Platform.Data.ExtendedDataReader]::ToJsonString($reader)
$data = $json | ConvertFrom-Json
# Map each column name to its index within a row
$columns = @{}
$count = 0
foreach ($column in $data.Tables[0].Columns) {
    $columns[$column.ColumnName] = $count
    $count++
}
# Rebuild one PSObject per row from the columns-and-rows layout
$items = foreach ($row in $data.Tables[0].Rows) {
    $hash = @{}
    foreach ($property in $columns.Keys) {
        $hash[$property] = $row[$columns[$property]]
    }
    [PSCustomObject]$hash
}
foreach ($item in $items) {
    Write-Host "TotalDamages: $($item.StormSummary.TotalDamages)"
}
Output:
TotalDamages: 6200000
TotalDamages: 2000
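The column/row pivot in the code above can be pulled out into a reusable function. A minimal sketch, assuming the table has Columns entries with a ColumnName property and Rows stored as arrays of values in column order (the shape the code above indexes into). ConvertFrom-KustoTable is a hypothetical helper, not part of the Kusto SDK; it uses an ordered dictionary so the properties come out in column order, which a plain @{} does not guarantee.

```powershell
# ConvertFrom-KustoTable (hypothetical): one PSObject per row of a
# ConvertFrom-Json'd table with Columns (ColumnName) and Rows (value arrays).
function ConvertFrom-KustoTable {
    param([Parameter(Mandatory)]$Table)
    # Column names in the order the row values appear
    $names = @($Table.Columns | ForEach-Object { $_.ColumnName })
    foreach ($row in $Table.Rows) {
        $values = @($row)
        # [ordered] keeps the properties in column order
        $props = [ordered]@{}
        for ($i = 0; $i -lt $names.Count; $i++) {
            $props[$names[$i]] = $values[$i]
        }
        New-Object -TypeName PSObject -Property $props
    }
}
```

Usage with the variables above: ConvertFrom-KustoTable -Table $data.Tables[0] | ForEach-Object { $_.StormSummary.TotalDamages }.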

Partially merging files

I'm quite new to PowerShell and have so far created a couple of scripts based on what I have found on various sites.
Now I want to expand my scripts further and have run into problems. I guess it's not that difficult to do what I want, but I can't seem to get it to work.
Scenario:
I have a file called from.csv that is automatically created with below info:
from.csv
Name,Mac
Server01,00:50:56:00:00:01
Server02,00:50:56:00:00:02
Server03,00:50:56:00:00:03
I also have a file called to.csv with below info:
to.csv
Name,Mac,IP
Server01,,192.168.0.1
Server02,,192.168.0.2
Server03,,192.168.0.3
What I want to do now is copy each server's MAC address from from.csv into the Mac column of the matching server's row in to.csv.
Thanks
This is quite easy, actually.
First you'll load your from.csv:
$from = Import-CSV from.csv
Then it's easiest if you create a lookup table from that data:
$servers = @{}
$from | foreach { $servers[$_.Name] = $_.Mac }
Then you can load to.csv:
$to = Import-CSV to.csv
And add in the missing data:
$to | foreach { $_.Mac = $servers[$_.Name] }
And save the result:
$to | Export-Csv to_result.csv -NoTypeInformation
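The same lookup-table approach can be wrapped in a function that takes objects instead of file names, so it can be tried on inline CSV text first. A minimal sketch; Merge-MacAddress is a hypothetical name, and unlike the one-liner above it leaves the Mac value alone for servers missing from from.csv instead of setting it to $null.

```powershell
# Merge-MacAddress (hypothetical): fill each $To row's Mac from the $From
# row with the same Name, using a hashtable as the lookup table.
function Merge-MacAddress {
    param(
        [Parameter(Mandatory)] $From,   # rows with Name and Mac
        [Parameter(Mandatory)] $To      # rows with Name, Mac (empty) and IP
    )
    # Build the Name -> Mac lookup table (keys are case-insensitive)
    $servers = @{}
    foreach ($row in $From) { $servers[$row.Name] = $row.Mac }
    foreach ($row in $To) {
        # Only fill in servers that exist in $From; others keep their value
        if ($servers.ContainsKey($row.Name)) {
            $row.Mac = $servers[$row.Name]
        }
        $row
    }
}
```

Usage with the files: Merge-MacAddress -From (Import-Csv from.csv) -To (Import-Csv to.csv) | Export-Csv to_result.csv -NoTypeInformation.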

How can I use PowerShell to edit hyperlinks in a Word document?

I am in the process of converting a series of 3500 HTML documents to Word for a documentation repository. We've run into a problem where some hyperlinks are broken at the back end of the conversion for no apparent reason. I want to generate a list of filenames and the links contained in each, to see if I can spot any patterns and adjust my conversion program accordingly. Unfortunately, searches that include PowerShell and hyperlinks turn up a lot of items about how to ADD hyperlinks using PowerShell, and none of them applies to my needs.
Using this link and this link as my starting point, I put together this code:
$word = New-Object -ComObject Word.Application
$document = $word.documents.open("C:\users\administrator\desktop\TEST.docx")
$document.Hyperlinks
([uri]"http://domain.com/This is a bad link").AbsoluteUri
$hyperlinks = @($document.Hyperlinks)
$hyperlinks | ForEach {
If ($_.Address -match "\s") {
$newURI = ([uri]$_.address).AbsoluteUri
Write-Verbose ("Updating {0} to {1}" -f $_.Address,$newURI) -Verbose
$_.address = $newURI
}
}
$document.save()
$word.quit()
I've been trying to craft something that will meet my needs. I can duplicate the above script's results, but have not been able to get a successful run iterating through all the documents in a directory with a ForEach command. I'm trying to change all links from html to doc, but the second I insert this code:
If ($.Address. -match ".\.doc") {
$newExt = ".doc" ;
$newURI = ([uri]$$_.address).BaseName.$newExt.
I get out-of-bounds and command-failure errors at runtime. This link helped, and this link answers my question for VBA/VBScript... but not for PowerShell. Does anyone have a PowerShell solution for this?
Someone asked a similar question about Excel a while ago:
Excel & Powershell: Bulk Find and replace URL's used in formulas
So once you have the hyperlinks, you can simply replace .html with .doc using -replace. Because -replace takes a regular expression, escape the dot so that it matches only a literal period. For example:
$hyperlinks | % {$_.TextToDisplay = $_.address = $_.address -replace '\.html','.doc'}
Note that if you do not also change TextToDisplay, the hyperlink address will change but the document will still show the old text.
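Might have something to do with the following lines:
If ($.Address. -match ".\.doc") {
$newExt = ".doc" ;
$newURI = ([uri]$$_.address).BaseName.$newExt.
In the first line, $.Address. should be $_.Address: the current pipeline object is $_, and the trailing dot is a syntax error. In the last line, $$_ should be $_, the trailing dot after $newExt is another syntax error, and a [uri] object has no BaseName property.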
Why not rewrite it into something like this (you'll need to find the right types like Hyperlink yourself)
$document.Hyperlinks | ? { $_.Address -like '*.html' } | % { $_.Address = $_.Address -replace '\.html$', '.doc' }
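The part the question says it could not get working is looping over every document in a folder. A minimal sketch, assuming Word is installed for COM automation, the documents are .docx files in one folder, and only addresses ending in ".html" (optionally followed by a #fragment) should change. Convert-LinkAddress and Update-WordHyperlink are hypothetical names. It also emits one record per link, which gives the list of filenames and links the question wanted for spotting patterns.

```powershell
# Convert-LinkAddress (hypothetical): ".html" -> ".doc" at the end of an
# address, keeping any #fragment. The dot is escaped so it is literal.
function Convert-LinkAddress {
    param([string]$Address)
    $Address -replace '\.html(?=(#.*)?$)', '.doc'
}

# Update-WordHyperlink (hypothetical): fix links in every .docx in $Folder
# and emit File/Old/New for each hyperlink found.
function Update-WordHyperlink {
    param([Parameter(Mandatory)][string]$Folder)
    $word = New-Object -ComObject Word.Application
    try {
        foreach ($file in Get-ChildItem -Path $Folder -Filter *.docx) {
            $document = $word.Documents.Open($file.FullName)
            try {
                foreach ($link in @($document.Hyperlinks)) {
                    $old = $link.Address
                    $new = Convert-LinkAddress $old
                    if ($new -ne $old) {
                        $link.Address = $new
                        # Only update the visible text if it showed the address
                        if ($link.TextToDisplay -eq $old) { $link.TextToDisplay = $new }
                    }
                    [PSCustomObject]@{ File = $file.Name; Old = $old; New = $new }
                }
                $document.Save()
            }
            finally {
                $document.Close()
            }
        }
    }
    finally {
        $word.Quit()
    }
}
```

Usage, with a placeholder folder: Update-WordHyperlink -Folder 'C:\docs' | Export-Csv links.csv -NoTypeInformation. Try it on copies first, since every document is saved.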

Loop to subtract 15 days from the current date

I'm trying to figure out the best way to call an exe that requires a date range parameter (e.g. 20130801-20130815), and then loop so that it subtracts 15 days and calls the exe again with the new date range.
I thought of using a do...until loop, but I'm not sure how (I'm new to PowerShell and programming), and I'm sure this is far from the right method :). I've just started to figure this out, so thanks in advance for any and all help.
$requireddate = (Get-Date).AddDays(-94).Date   # placeholder: some date that is set ad hoc
$startdate = (Get-Date).AddDays(-34).Date
do {
    $enddate = $startdate.AddDays(-15)
    $range = "{0}-{1}" -f $enddate.ToString("yyyyMMdd"), $startdate.ToString("yyyyMMdd")
    # Call THE EXE at this point with $range (the exe path is not given)
    Write-Host $range
    # Step back: the next range ends where this one started
    $startdate = $enddate
}
until ($enddate -le $requireddate)
You can use a For loop to do what you're trying to achieve as well:
(I've split things out into variables a bit, as I find that helps when writing functions.)
$requiredAddDays = 30
$requiredDate = (get-date).AddDays($requiredAddDays)
$startDate = (get-date).AddDays(-34)
$endAddDays = -15
for($i = $startDate; $i -lt $requiredDate; $i = $i.AddDays(1))
{
Write-Output "$($i.ToString("yyyyMMdd"))-$($i.AddDays($endAddDays).ToString("yyyyMMdd"))"
}
Using Write-Output means that the result is sent down the pipeline as an object, whereas Write-Host writes text straight to the host (the console) and passes nothing down the pipeline. Because the output is an object in the pipeline, it can be fed into other commands with the pipe (|) or assigned to a variable.
The $(...) syntax inside my double-quoted string is a subexpression: the code inside the parentheses is evaluated first, and its result is inserted into the string.
You could go one step further and make this a parameterised function if you'll be using it a lot:
Function Get-DateRange
{
Param(
[datetime]$startDate,
[int]$endAddDays,
[datetime]$requiredDate
)
for($i = $startDate; $i -lt $requiredDate; $i = $i.AddDays(1))
{
Write-Output "$($i.ToString("yyyyMMdd"))-$($i.AddDays($endAddDays).ToString("yyyyMMdd"))"
}
}
Then you could call it (once it's loaded into your session) by running something like this:
Get-DateRange -startDate (get-Date).AddDays(-10) -endAddDays 15 -requiredDate (get-Date).AddDays(15)
P.S. If you would like to write functions, it's a good idea to stick to the approved PowerShell verbs where you can. Run get-verb | sort verb to see the whole list. :)
There are lots of ways. If you want a PowerShell-specific approach (rather than do...until or while () {}), you could use a pipeline:
0..15 | %{
$changingDate = $startdate.AddDays(-15 * $_)   # step back 15 days per iteration
#do your work with the .exe & $changingDate
$changingDate
}
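None of the loops above produce exactly the ranges in the question (15 days each, stepping back from the start date). A minimal sketch, assuming each range covers StepDays calendar days inclusive, ranges do not overlap, and each is written earliest date first like the question's 20130801-20130815 example. Get-BackwardDateRange is a hypothetical name, and the exe call is left to you.

```powershell
# Get-BackwardDateRange (hypothetical): walk back from StartDate in
# non-overlapping blocks of StepDays days, emitting "yyyyMMdd-yyyyMMdd"
# (earliest date first) until StopDate is reached.
function Get-BackwardDateRange {
    param(
        [Parameter(Mandatory)][datetime]$StartDate,
        [Parameter(Mandatory)][datetime]$StopDate,
        [ValidateRange(1, 3650)][int]$StepDays = 15
    )
    $rangeEnd = $StartDate.Date
    while ($rangeEnd -ge $StopDate.Date) {
        # A block of StepDays days ends on $rangeEnd, starts StepDays-1 days earlier
        $rangeStart = $rangeEnd.AddDays(-($StepDays - 1))
        # Never go further back than the stop date
        if ($rangeStart -lt $StopDate.Date) { $rangeStart = $StopDate.Date }
        '{0:yyyyMMdd}-{1:yyyyMMdd}' -f $rangeStart, $rangeEnd
        # The next block ends the day before this one started
        $rangeEnd = $rangeStart.AddDays(-1)
    }
}
```

Usage, with a placeholder exe path: Get-BackwardDateRange -StartDate (Get-Date) -StopDate $requireddate | ForEach-Object { & 'C:\path\to\THE.exe' $_ }. Generating the ranges separately from calling the exe makes it easy to check the dates before running anything.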