Parsing HTML with <DIV> class to variable - powershell

I am trying to parse a server monitoring page which doesn't have any class names. The HTML looks like this:
<div style="float:left;margin-right:50px"><div>Server:VIP Owner</div><div>Server Role:ACTIVE</div><div>Server State:AVAILABLE</div><div>Network State:GY</div>
How do I parse this HTML content into variables like:
$Server VIP Owner
$Server_Role Active
$Server_State Available
Since there are no class names, I am struggling to extract these values. This is what I tried:
$htmlcontent.ParsedHtml.getElementsByTagName('div') | ForEach-Object {
    New-Variable -Name $_.className -Value $_.textContent
}

While you are only showing us a very small part of the HTML, it is very likely there are more <div> tags in there.
Without an id property or anything else that uniquely identifies the div you are after, you can use a Where-Object clause to find the part you are looking for.
Try
$div = ($htmlcontent.ParsedHtml.getElementsByTagName('div') | Where-Object { $_.InnerHTML -like '<div>Server Name:*' }).outerText
# if you're on PowerShell version < 7.1, you need to replace the first colon on each line with an equals sign
$result = $div -replace '(?<!:.*):', '=' | ConvertFrom-StringData
# in PowerShell 7.1 and later, you can use the `-Delimiter` parameter instead
#$result = $div | ConvertFrom-StringData -Delimiter ':'
The result is a Hashtable like this:
Name                           Value
----                           -----
Server Name                    VIP Owner
Server State                   AVAILABLE
Server Role                    ACTIVE
Network State                  GY
Of course, if there are more of these in the report, you'll have to loop over divs with something like this:
$result = ($htmlcontent.ParsedHtml.getElementsByTagName('div') | Where-Object { $_.InnerHTML -like '<div>Server Name:*' }) | Foreach-Object {
$_.outerText -replace '(?<!:.*):', '=' | ConvertFrom-StringData
}
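To end up with individual values, as the question asks for, you can read them straight out of the resulting hashtable by key. A minimal sketch, assuming the matching div's outerText holds one "Name:Value" pair per line; the sample string below is hypothetical and stands in for $div:

```powershell
# Hypothetical stand-in for the outerText of the matching div: one "Name:Value" pair per line
$div = "Server Name:VIP Owner`nServer Role:ACTIVE`nServer State:AVAILABLE`nNetwork State:GY"

# Replace only the first colon on each line with '=' so ConvertFrom-StringData can parse it
$result = $div -replace '(?<!:.*):', '=' | ConvertFrom-StringData

# Pull individual values out of the hashtable by key
$serverRole  = $result['Server Role']
$serverState = $result['Server State']
"Role: $serverRole, State: $serverState"
```

The lookbehind (?<!:.*) only matches a colon that has no other colon before it on the same line, because . does not cross line breaks. A value that itself contains a colon (a time such as 10:30, for example) therefore survives intact.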
OK, so the original question did not show what we are actually dealing with.
Apparently, your HTML contains divs like this:
<div>=======================================</div>
<div>Service Name:MysqlReplica</div>
<div>Service Status:RUNNING</div>
<div>Remarks:Change role completed in 1 ms</div>
<div>=======================================</div>
<div>Service Name:OCCAS</div>
<div>Service Status:RUNNING</div>
<div>Remarks:Change role completed in 30280 ms</div>
To deal with blocks like that, you need a whole different approach:
# create a List object to store the results
$result = [System.Collections.Generic.List[object]]::new()
# create a temporary ordered dictionary to build the resulting items
$svcHash = [ordered]@{}
foreach ($div in $htmlcontent.ParsedHtml.getElementsByTagName('div')) {
    switch -Regex ($div.InnerText) {
        '^=+' {
            if ($svcHash.Count) {
                # add the completed object to the list
                $result.Add([PsCustomObject]$svcHash)
                $svcHash = [ordered]@{}
            }
        }
        '^(Service .+|Remarks):' {
            # split into the property Name and its value
            $name, $value = ($_ -split ':',2).Trim()
            $svcHash[$name] = $value
        }
    }
}
if ($svcHash.Count) {
    # the last service block is still pending when no closing
    # <div>=======================================</div>
    # was found in the HTML, so add it to the final list of objects
    $result.Add([PsCustomObject]$svcHash)
}
# output on screen
$result | Format-Table -AutoSize
# output to CSV file
$result | Export-Csv -Path 'X:\services.csv' -NoTypeInformation
Output on screen using the above example:
Service Name Service Status Remarks
------------ -------------- -------
MysqlReplica RUNNING        Change role completed in 1 ms
OCCAS        RUNNING        Change role completed in 30280 ms
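ParsedHtml relies on Internet Explorer's HTML engine and only exists in Windows PowerShell. On PowerShell 7+, a rough workaround is to pull the div texts out of the raw HTML with a regex and feed them to the same switch logic. This is a sketch under loud assumptions: the service divs are plain <div>text</div> elements with no attributes or nested tags, and a regex is not a real HTML parser. The $html sample is hypothetical; in practice it would come from something like (Invoke-WebRequest $url).Content.

```powershell
# Hypothetical raw HTML standing in for (Invoke-WebRequest $url).Content
$html = @'
<div>=======================================</div>
<div>Service Name:MysqlReplica</div>
<div>Service Status:RUNNING</div>
<div>Remarks:Change role completed in 1 ms</div>
<div>=======================================</div>
<div>Service Name:OCCAS</div>
<div>Service Status:RUNNING</div>
<div>Remarks:Change role completed in 30280 ms</div>
'@

# Naive extraction: the text of every <div> that has no attributes and no nested tags
$texts = [regex]::Matches($html, '<div>([^<]*)</div>') | ForEach-Object { $_.Groups[1].Value }

$result  = [System.Collections.Generic.List[object]]::new()
$svcHash = [ordered]@{}
foreach ($text in $texts) {
    switch -Regex ($text) {
        '^=+' {
            # a separator line closes the current service block (if any)
            if ($svcHash.Count) {
                $result.Add([pscustomobject]$svcHash)
                $svcHash = [ordered]@{}
            }
        }
        '^(Service .+|Remarks):' {
            # split on the first colon only: property name and value
            $name, $value = ($_ -split ':', 2).Trim()
            $svcHash[$name] = $value
        }
    }
}
# add the final block, which has no closing separator
if ($svcHash.Count) { $result.Add([pscustomobject]$svcHash) }

$result | Format-Table -AutoSize
```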

Related

web scraping using powershell

I am trying to scrape the pages of the website https://www.enghindi.com/.
URLs are saved in a CSV file, for example:
URL     Hindi meaning
Url1    hindi meaning
url2    hindi meaning
Now, every time I run the following script, it only shows the result for URL1, and that result is spread across multiple cells. I want all results for URL1 to be in one cell (in the Hindi meaning column), and similarly for URL2.
url1 : https://www.enghindi.com/index.php?q=close
url2 : https://www.enghindi.com/index.php?q=compose
$URLs = import-csv -path C:\Scripts\PS\urls.csv | select -expandproperty urls
foreach ($url in $urls)
{
$web = Invoke-WebRequest $url
$data = $web.AllElements | Where{$_.TagName -eq "BIG"} | Select-Object -Expand InnerText
$datafinal = $data.where({$_ -like "*which*"},'until')
}
foreach ($item in $datafinal) {
[pscustomobject]@{ Url = $url; Data = $item } | Export-Csv -Path C:\Scripts\PS\output.csv -NoTypeInformation -Encoding unicode -Append
}
Are there other ways I can get English-to-Hindi word meanings through web scraping instead of copying and pasting? I would prefer Google Translate, but I think that is difficult, which is why I am trying enghindi.com.
Thanks a lot.
Web scraping, due to its inherent unreliability, should only be a last resort.
You can make it work in Windows PowerShell, but note that the HTML DOM parsing is no longer available in PowerShell (Core) 7+.
Your code has two basic problems:
It operates on $datafinal after the foreach loop, at which point you only see the results of the last Invoke-WebRequest call.
You loop over each element of array $datafinal and create an output object for each, instead of creating an output object per input URL.
The following reformulation fixes these problems:
# Sample input URLs
$URLs = @(
'https://www.enghindi.com/index.php?q=close',
'https://www.enghindi.com/index.php?q=compose'
)
$URLs |
ForEach-Object {
$web = Invoke-WebRequest $_
$data = $web.AllElements | Where { $_.TagName -eq "BIG" } | Select-Object -Expand InnerText
$datafinal = $data.where({ $_ -like "*which*" }, 'until')
# Create the output object for the URL at hand and implicitly output it.
# Join the $datafinal elements with newlines to form a single value.
[pscustomobject] @{
Url = $_
Hindi = $datafinal -join "`n"
}
} |
ConvertTo-Csv -NoTypeInformation
Note that, for demonstration purposes, ConvertTo-Csv is used instead of Export-Csv, so that you can see the results right away.
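To see how joining puts each URL's results into a single CSV cell without hitting the website, here is a sketch in which hypothetical stand-in strings replace the scraped <BIG> texts:

```powershell
# Hypothetical stand-in for the scraped <BIG> texts per URL (no network access needed)
$scraped = [ordered]@{
    'https://www.enghindi.com/index.php?q=close'   = @('meaning A', 'meaning B')
    'https://www.enghindi.com/index.php?q=compose' = @('meaning C')
}

$rows = foreach ($url in $scraped.Keys) {
    # One output object per URL; multiple values are joined into one string
    [pscustomobject]@{
        Url   = $url
        Hindi = $scraped[$url] -join "`n"
    }
}

# Each object becomes one CSV row; the embedded newline stays inside a single quoted field
$rows | ConvertTo-Csv -NoTypeInformation
```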

matching data across two arrays and combining with additional data in array

The Goal
See if $SP.ip is in $NLIP.IpRanges and, if it is, add $NLIP.IpRanges and $NLIP.DisplayName to the $SP array, or put everything into a new array.
The Arrays
Array 1 is $SP; it's a CSV import with the properties 'name' and 'ip', and it looks like this:
name: bob
ip: 1.9.8.2
Array 2 is $NLIP and has the relevant properties 'IpRanges' and 'DisplayName'. It's fetched with $NLIP = Get-AzureADMSNamedLocationPolicy | where-object {$_.OdataType -eq "#microsoft.graph.ipNamedLocation"} and looks like this:
DisplayName : Named Location 1
IpRanges : {class IpRange {
CidrAddress: 16.29.28.9/28 #fictitious CIDR
}
, class IpRange {
CidrAddress: 1.9.8.3/28 #fictitious CIDR
}
}
The Code / the problem
I'm using IPInRange.ps1 function from https://github.com/omniomi/PSMailTools to find if the IP is in the range. It works like so:
> IPInRange 1.9.8.2 1.9.8.3/28
True
I also worked out that $NLIP.IpRanges.split() | Where-Object {$_ -like "*/*"} can return all the ranges, but $NLIP | Where-Object {$_.IpRanges.split() -like "*/*"} doesn't. I would naturally use the second one to keep the object in the pipeline so I can return the DisplayName. So I'm struggling with how to pull out the individual ranges in a way that lets me add the 'IpRange' and 'DisplayName' to an array.
Also, maybe it's because I haven't worked out the above issue, but I'm struggling to think how I would iterate through both arrays and combine them into one. I know I would probably enter into a foreach ($item in $SP) and create a temporary array, but after that it's getting hazy.
The result
What I'm hoping to have in the end is:
name: bob
ip: 1.9.8.2
IpRange: 1.9.8.3/28 #fictitious CIDR
DisplayName: Named Location 1
Thanks in advance.
I believe this will work for you if I understood the NLIP construct correctly.
We will loop through all the SP objects and see if we can find any NLIP ranges that contain the IP, using the IPInRange function you linked. We then add the two properties you want (DisplayName and IpRange) from any match to the SP object and finally pass it through to the pipeline. You can append | Export-Csv -Path YourPath to the end if you would like to send the output to a CSV file.
$SP | ForEach-Object {
$target = $_
$matched = $NLIP | ForEach-Object {
$item = $_
# Using where to single out matching range using IPinRange function
$_.IpRanges.Where({ IPInRange -IPAddress $target.ip -Range $_.CidrAddress }) |
ForEach-Object {
# for matching range output custom object containing the displayname and iprange
[PSCustomObject]@{
DisplayName = $item.DisplayName
IpRange = $_.CidrAddress
}
}
}
# add the 2 properties (DisplayName and IpRange) from the match to the original $SP
# object and then pass thru
$target | Add-Member -NotePropertyName DisplayName -NotePropertyValue $matched.DisplayName
$target | Add-Member -NotePropertyName IpRange -NotePropertyValue $matched.IpRange -PassThru
}
By the way, this is how I envisioned the NLIP objects, and what I tested with:
$NLIP = @(
    [pscustomobject]@{
        DisplayName = 'Named location 1'
        IpRanges    = @(
            [pscustomobject]@{
                CidrAddress = '16.29.28.9/28'
            },
            [pscustomobject]@{
                CidrAddress = '1.9.8.3/28'
            }
        )
    },
    [pscustomobject]@{
        DisplayName = 'Named location 2'
        IpRanges    = @(
            [pscustomobject]@{
                CidrAddress = '16.29.28.25/28'
            },
            [pscustomobject]@{
                CidrAddress = '1.9.8.25/28'
            }
        )
    }
)
Let's shed some light on the hazy darkness by first creating a Minimal, Reproducible Example (MCVE):
$SP = ConvertFrom-Csv @'
IP, Name
1.9.8.2, BOB
10.10.10.10, Apple
16.29.28.27, Pear
16.30.29.28, Banana
'@
$NLIP = ConvertFrom-Csv @'
IPRange, SubNet
16.29.28.9/28, NetA
1.9.8.3/28, NetB
'@
To tackle this, you need two loops, with the second loop nested inside the first. For the outer loop you might use the ForEach-Object cmdlet, which lets you stream each object and thereby use less memory (assuming that you import the data from a file and eventually export it to a new file). Within the inner loop you can then cross-link each IP address with each IPRange using the function you refer to and, if the condition is true, create a new PSCustomObject:
$SP |ForEach-Object { # | Import-Csv .\SP.csv |ForEach-Object { ...
ForEach($SubNet in $NLIP) {
if (IPInRange $_.IP $SubNet.IPRange) {
[PSCustomObject]@{
IP = $_.IP
Name = $_.Name
IPRange = $SubNet.IPRange
SubNet = $SubNet.SubNet
}
}
}
} # | Export-Csv .\Output.csv
Which results in:
IP          Name   IPRange       SubNet
--          ----   -------       ------
1.9.8.2     BOB    1.9.8.3/28    NetB
16.29.28.27 Pear   16.29.28.9/8  NetA
16.30.29.28 Banana 16.29.28.9/8  NetA
But as you are considering third-party scripts anyway, you might as well use this Join-Object script/Join-Object Module (see also: In Powershell, what's the best way to join two tables into one?):
$SP |Join $NLIP -Using { IPInRange $Left.IP $Right.IPRange }
Which gives the same results.
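If you would rather not depend on a third-party function at all, the range check can be done with plain bit masking. The helper below, Test-IPInCidr, is hypothetical (it is not the PSMailTools IPInRange function and handles IPv4 only); it is plugged into the same nested-loop pattern using a trimmed version of the MCVE data:

```powershell
function Test-IPInCidr {
    # Hypothetical helper, IPv4 only: $true if $IPAddress falls inside $Cidr (e.g. '1.9.8.3/28')
    param(
        [Parameter(Mandatory)] [string] $IPAddress,
        [Parameter(Mandatory)] [string] $Cidr
    )
    # Split '1.9.8.3/28' into the network part and the prefix length
    $network, $prefixText = $Cidr -split '/'
    $prefix = [int]$prefixText

    # Turn a dotted-quad IPv4 string into a 64-bit integer (avoids int32 overflow)
    $toNumber = {
        param([string] $ip)
        $b = [System.Net.IPAddress]::Parse($ip).GetAddressBytes()
        ([int64]$b[0] * 16777216) + ([int64]$b[1] * 65536) + ([int64]$b[2] * 256) + [int64]$b[3]
    }

    # Netmask as a number: 2^32 minus the size of the address block
    $mask = 4294967296 - ([int64]1 -shl (32 - $prefix))

    ((& $toNumber $IPAddress) -band $mask) -eq ((& $toNumber $network) -band $mask)
}

$SP = ConvertFrom-Csv @'
IP,Name
1.9.8.2,BOB
10.10.10.10,Apple
'@
$NLIP = ConvertFrom-Csv @'
IPRange,SubNet
16.29.28.9/28,NetA
1.9.8.3/28,NetB
'@

$joined = $SP | ForEach-Object {
    $row = $_
    foreach ($subNet in $NLIP) {
        if (Test-IPInCidr -IPAddress $row.IP -Cidr $subNet.IPRange) {
            [pscustomobject]@{
                IP      = $row.IP
                Name    = $row.Name
                IPRange = $subNet.IPRange
                SubNet  = $subNet.SubNet
            }
        }
    }
}
$joined | Format-Table -AutoSize
```

Masking both addresses with the same netmask and comparing the results is the standard way to test CIDR membership; a /28 covers 16 addresses, so 1.9.8.2 and 1.9.8.3 land in the same 1.9.8.0/28 block.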

values in this csv are not being edited - Powershell

$users = Import-Csv -Path "C:\scripts\door-system\test\testChange.csv" -Encoding UTF8
$users | ft
$output = forEach ($user in $users)
{
if ($user.GroupName -like "Normal")
{
$output.GroupName = "edited"
}
}
$output | export-csv .\modified.csv -noTypeInformation
you have two glitches in your code. [grin]
the 1st is modifying the $Output collection inside the loop AND assigning the output of the loop to the $Output collection. do one or the other, not both.
the 2nd is not outputting anything to put in the $Output collection. that will give you an empty collection since you assigned nothing at all to it.
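A minimal sketch of glitch #2, with the assignment target already corrected to the loop variable: a foreach loop whose body only assigns writes nothing to the output stream, so the variable receiving the loop's output ends up $null. The two rows are hypothetical stand-ins for the imported CSV:

```powershell
# Hypothetical rows standing in for the imported CSV
$users = @(
    [pscustomobject]@{ UserName = 'ABravo';   GroupName = 'Normal' }
    [pscustomobject]@{ UserName = 'BCharlie'; GroupName = 'Other' }
)

# The loop body only assigns; nothing is emitted, so $broken ends up $null
$broken = foreach ($user in $users) {
    if ($user.GroupName -eq 'Normal') { $user.GroupName = 'edited' }
}
"Loop output is null: $($null -eq $broken)"

# Emitting $user at the end of each iteration collects every (possibly modified) row
$output = foreach ($user in $users) {
    if ($user.GroupName -eq 'Normal') { $user.GroupName = 'edited' }
    $user
}
$output
```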
here's my version & what it does ...
fakes reading in a CSV file
when you are ready to do this with real data, remove the entire #region/#endregion block and use Import-CSV.
sets the target and replacement strings
iterates thru the imported collection
tests for the target in the .GroupName property of each object
if found, it replaces that value with the replacement string
sends the modified object out to the $Results collection
displays $Results on screen
saves $Results to a CSV file
the code ...
#region >>> fake reading in a CSV file
# in real life, use Import-CSV
$UserList = @'
UserName, GroupName
ABravo, Normal
BCharlie, Abnormal
CDelta, Other
DEcho, Normal
EFoxtrot, Edited
FGolf, Strange
'@ | ConvertFrom-Csv
#endregion >>> fake reading in a CSV file
$TargetGName = 'Normal'
$ReplacementGName = 'Edited'
$Results = foreach ($UL_Item in $UserList)
{
if ($UL_Item.GroupName -eq $TargetGName)
{
$UL_Item.GroupName = $ReplacementGName
}
# send the modified data to the $Results collection
$UL_Item
}
# show on screen
$Results
# send to CSV
$Results |
Export-Csv -LiteralPath "$env:TEMP\Connor Tuohy_-_Modified.csv" -NoTypeInformation
on screen output ...
UserName GroupName
-------- ---------
ABravo   Edited
BCharlie Abnormal
CDelta   Other
DEcho    Edited
EFoxtrot Edited
FGolf    Strange
CSV file ["C:\Temp\Connor Tuohy_-_Modified.csv"] content ...
"UserName","GroupName"
"ABravo","Edited"
"BCharlie","Abnormal"
"CDelta","Other"
"DEcho","Edited"
"EFoxtrot","Edited"
"FGolf","Strange"

Powershell function returning an array instead of string

I'm importing a CSV and I would like to add a column to it (with the result based on the previous columns).
My data looks like this:
host address,host prefix,site
10.1.1.0,24,400-01
I would like to add a column called "sub site",
so I wrote this function, but the problem is that the resulting value is an array instead of a string:
function site {
Param($s)
$s -match '(\d\d\d)'
return $Matches[0]
}
$csv = import-csv $file | select-object *,@{Name='Sub Site';expression= {site $_.site}}
If I run the command:
PS C:\>$csv[0]
Host Address :10.1.1.0
host prefix :24
site :400-01
sub site : {True,400}
when it should look like
PS C:\>$csv[0]
Host Address :10.1.1.0
host prefix :24
site :400-01
sub site : 400
EDIT: I found the solution but the question is now WHY.
If I change my function to $s -match "\d\d\d" | Out-Null, I get back the expected 400.
Good, you found the answer. I was typing this up as you found it. The reason is that -match returns a value (the Boolean True here), and that value is written to the pipeline. Everything written to the pipeline is "returned" from the function, not just what follows the return keyword.
For example, run this one line and see what it does:
"Hello" -match 'h'
It prints True.
Since I had this typed up, here is another way to phrase your question with the fix...
function site {
Param($s)
$null = $s -match '(\d\d\d)'
$ret = $Matches[0]
return $ret
}
$csv = @"
host address,host prefix,site
10.1.1.1,24,400-01
10.1.1.2,24,500-02
10.1.1.3,24,600-03
"@
$data = $csv | ConvertFrom-Csv
'1 =============='
$data | ft -AutoSize
$data2 = $data | select-object *,@{Name='Sub Site';expression= {site $_.site}}
'2 =============='
$data2 | ft -AutoSize
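To make the pipeline behavior concrete, here is a side-by-side sketch of the two variants of the function. The names Get-SiteLeaky and Get-SiteQuiet are hypothetical and only used for the comparison:

```powershell
# Writes the Boolean from -match to the pipeline, then the match itself
function Get-SiteLeaky {
    param($s)
    $s -match '\d\d\d'
    return $Matches[0]
}

# Discards the Boolean, so only the match is returned
function Get-SiteQuiet {
    param($s)
    $null = $s -match '\d\d\d'
    return $Matches[0]
}

$leaky = Get-SiteLeaky '400-01'
$quiet = Get-SiteQuiet '400-01'
"Leaky: $($leaky.GetType().Name), $($leaky.Count) items"   # Leaky: Object[], 2 items
"Quiet: $($quiet.GetType().Name)"                          # Quiet: String
```

Assigning to $null, casting to [void], or piping to Out-Null all discard the Boolean; $null = ... avoids the overhead of a pipeline.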

powershell: Check if any of a bunch of properties is set

I'm importing a csv-file which looks like this:
id,value1.1,value1.2,value1.3,Value2.1,Value2.2,Value3.1,Value3.2
row1,v1.1,,v1.3
row2,,,,v2.1,v2.2
row3,,,,,,,v3.2
Now I want to check whether any of the value properties in one group is set.
I can do
Import-Csv .\test.csv | where {$_.'Value1.1' -or $_.'Value1.2' -or $_.'Value1.3'}
or
Import-Csv .\test.csv | foreach {
if ($_.Value1 -or $_.Value2 -or $_.Value3) {
Write-Output $_
}
}
But my "real" CSV file contains about 200 columns, and I have to check 31 properties x 5 different object types that are mixed up in this CSV, so my code will be really ugly.
Is there anything like
where {$_.Value1.*}
or
where {$ArrayWithPropertyNames}
?
You could easily use the Get-Member cmdlet to get the properties which have the correct prefix (just use * as a wildcard after the prefix).
So to achieve what you want you could just filter the data based on whether any of the properties with the correct prefix contains data.
The script below uses your sample data, with a row4 added, and filters the list to find all items which have a value in any property starting with value1.
$csv = @"
id,value1.1,value1.2,value1.3,Value2.1,Value2.2,Value3.1,Value3.2
row1,v1.1,,v1.3
row2,,,,v2.1,v2.2
row3,,,,,,,v3.2
row4,v1.1,,v1.3
"@
$data = ConvertFrom-csv $csv
$data | Where {
$currentDataItem = $_
$propertyValues = $currentDataItem |
# Gets all the properties with the correct prefix
Get-Member 'value1*' -MemberType NoteProperty |
# Gets the values for each of those properties
Foreach { $currentDataItem.($_.Name) } |
# Only keep the property value if it has a value
Where { $_ }
# Could just return $propertyValues, but this makes the intention clearer
$hasValueOnPrefixedProperty = $propertyValues.Length -gt 0
Write-Output $hasValueOnPrefixedProperty
}
Alternate solution:
$PropsToCheck = 'Value1*'
Import-csv .\test.csv |
Where {
    # keep the row if any of the matching properties has a non-empty value
    (($_ | Select $PropsToCheck).psobject.properties.value -join '') -ne ''
}
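Since the real file has several groups to check, the wildcard idea can be wrapped into a small helper that takes an array of name patterns, which comes close to the where {$ArrayWithPropertyNames} syntax the question wished for. Test-AnyPropertySet is a hypothetical name; the data is the sample CSV from the question:

```powershell
$data = ConvertFrom-Csv @'
id,value1.1,value1.2,value1.3,Value2.1,Value2.2,Value3.1,Value3.2
row1,v1.1,,v1.3
row2,,,,v2.1,v2.2
row3,,,,,,,v3.2
'@

# Hypothetical helper: $true if any property whose name matches one of $Patterns has a value
function Test-AnyPropertySet {
    param(
        [Parameter(Mandatory)] [psobject] $InputObject,
        [Parameter(Mandatory)] [string[]] $Patterns
    )
    foreach ($prop in $InputObject.psobject.Properties) {
        foreach ($pattern in $Patterns) {
            # -like is case-insensitive, so 'value1*' also matches 'Value1.x'
            if ($prop.Name -like $pattern -and -not [string]::IsNullOrEmpty($prop.Value)) {
                return $true
            }
        }
    }
    $false
}

# A wildcard for a whole group, or an explicit list of property names
$group1 = $data | Where-Object { Test-AnyPropertySet -InputObject $_ -Patterns 'value1*' }
$group3 = $data | Where-Object { Test-AnyPropertySet -InputObject $_ -Patterns 'Value3.1', 'Value3.2' }
$group1
$group3
```

IsNullOrEmpty covers both empty fields and the $null values ConvertFrom-Csv uses for trailing columns a row does not have.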