HTMLDocumentClass and getElementsByClassName not working - powershell

Last year I had a PowerShell (v3) script that parsed the HTML of a festival page (and generated XML for my Windows Phone app).
I also asked a question about it here, and it worked like a charm.
But when I run the script this year, it does not work. To be specific, the method getElementsByClassName returns nothing. I also tried that method on other web pages, with no luck.
Here is my code from last year, which no longer works:
$tmpFile_bandInfo = "C:\band.txt"
Write-Host "Downloading band $($kap.Nazev) ..." -NoNewline
Invoke-WebRequest http://www.colours.cz/ucinkujici/the-asteroids-galaxy-tour/ -OutFile $tmpFile_bandInfo
$content = gc $tmpFile_bandInfo -Encoding utf8 -raw
$ParsedHtml = New-Object -com "HTMLFILE"
$ParsedHtml.IHTMLDocument2_write($content)
$ParsedHtml.Close()
$bodyK = $ParsedHtml.body
$page = $bodyK.getElementsByClassName("body four column page") # this returns NULL
$page = $page.item(0)
$aside = $page.getElementsByTagName("aside").item(0)
$img = $aside.getElementsByTagName("img").item(0)
$imgPath = $img.src
This is the code I used to work around it:
$sec = $bodyK.getElementsByTagName("section") | ? ClassName -eq "body four column page"
# but now I have no innerHTML, only the lonely tag SECTION
# so I am walking through siblings
$img = $sec.nextSibling.nextSibling.nextSibling.getElementsByTagName("img").item(0)
$imgPath = $img.src
This works, but it seems like a silly solution to me.
Does anyone know what I am doing wrong?

I actually solved this problem by abandoning the Invoke-WebRequest cmdlet and adopting HtmlAgilityPack.
I transformed my former sequential HTML parsing into a few XPath queries (everything stayed in the PowerShell script). This solution is much more elegant, and HtmlAgilityPack is a real badass ;) It is really an honour to work with a project like this!
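For illustration, an XPath-based version of the image lookup could look roughly like the sketch below. The DLL path, the sample markup, and the XPath expression are all assumptions made for this example, not the code from the original script; the real page structure may differ.

```powershell
# Assumption: HtmlAgilityPack.dll was obtained separately (for example from NuGet).
# The path below is hypothetical.
Add-Type -Path 'C:\lib\HtmlAgilityPack.dll'

# Hypothetical markup standing in for the downloaded band page
$html = @'
<section class="body four column page">
  <aside><img src="/img/band.jpg" alt="band"></aside>
</section>
'@

$doc = New-Object HtmlAgilityPack.HtmlDocument
$doc.LoadHtml($html)

# A single XPath query replaces the chain of getElementsBy* calls
$img = $doc.DocumentNode.SelectSingleNode("//section[@class='body four column page']//aside//img")

# SelectSingleNode returns $null when nothing matches, so guard before reading the attribute
$imgPath = if ($img) { $img.GetAttributeValue('src', '') } else { $null }
$imgPath
```

Because the parsing happens in managed code, this avoids the HTMLFile COM object entirely.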

This is not a bug. What looks like NULL is actually a reference to a proxy for the HTMLFile COM call into the DOM.
You can force it to evaluate and return the underlying strings by wrapping it in an array @(), like this:
@($mybody.getElementsByClassName("body four column page")).textContent
If you pipe it to Select-Object, the same thing happens automatically: the result is unwrapped via COM and returned as a string:
$mybody.getElementsByClassName("body four column page") | Select-Object -Property TextContent

How to make a PSCustomObject with Nested Values

I am trying to create a PSCustomObject with nested values and am having a really tough time. I see plenty of examples of hash tables and PSCustomObjects, but only for extremely basic sets of data.
I am trying to create a PSCustomObject that has a server name property and another property holding an array of services along with their current running status.
I was able to make a hash table of the services and their running status, but when I try to make an object with the hash table it doesn't work very well:
# $hashtable contains service names and their running status (stopped or running)
$hashtable
$myObject = [PSCustomObject]@{
    Name = "$server"
    Services = "$hashtable"
}
I am open to anything, really. I have plenty of examples of how to convert items from JSON or XML and would love to be able to use those as well, but I still have the same issue of formatting the data in the first place.
Edit: sorry about some of the vagueness of this post. As some people have already mentioned, the problem was the double quotes around the hashtable. Everything is working now.
As @t1meless noted in the comments, when you enclose a variable in double quotes PowerShell converts its value to a string. For a Hashtable object, rather than giving any information from the object, this returns "System.Collections.Hashtable". If you remove the double quotes, the hashtable itself is stored, as you intended.
Here is a full example of pulling the service information from a server and storing the values in a custom object. Note that $Server can still be left in quotes, but since it's already a string this is unnecessary.
$myObject = Foreach ($Server in $Servers) {
    $hashtable = @{}
    Get-Service -ComputerName $Server | ForEach-Object { $hashtable.Add($_.Name, $_.Status) }
    [PSCustomObject]@{
        Name = "$Server"
        Services = $hashtable
    }
}
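The effect of the quotes can be checked in isolation. The sketch below uses a small hand-written hashtable (the service names and the server name are made up for the example) so it runs without querying any server:

```powershell
# Hypothetical sample data standing in for the real service list
$hashtable = @{ Spooler = 'Running'; W32Time = 'Stopped' }

# Quoted: the hashtable is converted to a string when the object is built
$quoted = [PSCustomObject]@{ Name = 'server01'; Services = "$hashtable" }

# Unquoted: the hashtable itself is stored as the property value
$unquoted = [PSCustomObject]@{ Name = 'server01'; Services = $hashtable }

$quoted.Services                    # System.Collections.Hashtable (just the type name as text)
$quoted.Services.GetType().Name     # String
$unquoted.Services.GetType().Name   # Hashtable
$unquoted.Services['Spooler']       # Running
```

With the unquoted form, the nested values stay addressable by key, which is also what conversions such as ConvertTo-Json need in order to emit the nested data.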

Setting a DateTime to $null/empty using PowerShell and the SCSM cmdlets

I'm currently trying to sync additional attributes from the AD (Active Directory) for user objects in SCSM (System Center Service Manager) using a PowerShell script.
The extension I wrote for this includes an attribute for the expiration date of an AD user account (a DateTime value, named DateTimeAttribute in the example). If the user account doesn't expire, it should be empty/null.
Using Import-SCSMInstance, which should be similar to a CSV import, it kind of works by passing "null" for the field. The problem is that Import-SCSMInstance seems to be quite unreliable, and it gives no information about why it works or doesn't work. Update-SCSMClassInstance seems to work a lot better, but I can't figure out a way to clear the field with it; even [DateTime]::MinValue results in an error stating that it's an invalid value.
So would anyone have an idea on how to clear the value using Update-SCSMClassInstance or figure out why Import-SCSMInstance might or might not work?
A simple example for this could look like the following:
$server = "<ServerName>"
$extensionGuid = "<GUID>"
Import-Module 'C:\Program Files\System Center 2012 R2\Service Manager\Powershell\System.Center.Service.Manager.psd1'
New-SCManagementGroupConnection -ComputerName $server
$extensionClass = Get-SCSMClass -Id $extensionGuid
$scsmUserObject = Get-SCSMClassInstance -Class $extensionClass -Filter 'UserName -eq "<User>"'
# Error
$scsmUserObject.DateTimeAttribute = ""
# Works but fails on Update-SCSMClassInstance
$scsmUserObject.DateTimeAttribute = $null
$scsmUserObject.DateTimeAttribute = [DateTime]::MinValue
# Works
$scsmUserObject.DateTimeAttribute = "01-01-1970"
Update-SCSMClassInstance -Instance $scsmUserObject
It seems that you can't clear a date once it's filled. When you write $null, it sets the date to 1-1-0001 01:00:00, which is an invalid date and causes Update-SCSMClassInstance to fail.
What we have set as a user's AD property when we don't want something to expire, is 2999-12-31 (yyyy-MM-dd). Perhaps this could help, although it's not directly what you asked for...
Also, you can use the pipeline to update a property in SCSM:
Get-SCSMClassInstance -Class $extensionClass -Filter 'UserName -eq "<User>"' | % {$_.DateTimeAttribute = <date>; $_} | update-scsmclassinstance
It doesn't look like it's currently possible to clear custom date attributes using the PowerShell cmdlets.

Powershell - Can't figure out how to get web content from IE object

I am pretty new to Powershell and just using it for personal stuff. I have been experimenting with pulling specific info from websites to include in emails to family. By reading the forums I got pretty good using the Invoke-WebRequest cmdlet, but soon hit upon its limitation of not having access to content constructed dynamically at the time the page is loaded.
Thanks to these forums, I then discovered the IE object and how to pull the data. I had luck with one website, but another I tried does not work the same. Hoping for a little help figuring it out.
Here is a snippet of the inspected code for the page, with my target of interest highlighted.
Below is the code where I am trying to extract that text string. I have tried many iterations and approaches with no success. Oddly, the $ie.Document object supposedly has a "body" object, but when I try to access it, I get a null object error. I noticed the Document object itself has a getElementsByTagName method, so I tried that. It does not have a getElementsByClassName method.
Note that the URL I am loading is "https" so I am wondering if this is causing issues. Suggestions appreciated! If I can just get a surrounding chunk of the HTML, I am fine doing some string manipulation to get what I want.
# Create IE object and load URL
$WeatherURL = "https://weather.com/weather/today/l/77630"
$ie = New-Object -comobject "InternetExplorer.Application"
$ie.visible = $true
$ie.navigate($WeatherURL)
# Wait for the page to load
while ($ie.Busy -eq $true -Or $ie.ReadyState -ne 4) {Start-Sleep 2}
$Doc = $ie.Document
$Weather0 = $Doc.getElementsByTagName('span') `
| ?{$_.getAttribute('class') -eq "today-wx-descrip"} | Select-Object -First 1
You should replace
$Weather0 = $Doc.getElementsByTagName('span') `
| ?{$_.getAttribute('class') -eq "today-wx-description"} | Select-Object -First 1
With
$Weather0 = $Doc.getElementsByTagName('span') `
| ?{$_.getAttribute('class') -eq "today-wx-descrip"} | Select-Object -First 1
Note today-wx-description vs today-wx-descrip.

Export-csv formatting. Powershell

Let's start off by saying that I'm quite new to PowerShell and not the greatest at working with its code and scripts, but I'm trying to learn. And now to the problem!
I'm working on a script that fetches information from computers in the network. I've got some code that works quite well for my purposes. But I'm having problems with some information, mostly information that contains multiple objects, like services.
# This application will pull information from a list of devices. The devices list is
# sent with C:\Users\test.txt and output is pushed to a file C:\Users\devices.csv
function kopiera
{
    param
    (
        [Parameter(ValueFromPipeline=$true)]
        [string[]]$Computer=$env:computername
    )
    Process
    {
        $computer | ForEach-Object{
            $service=Get-WmiObject -Class Win32_Service -Computername $_
            $prop= [ordered]@{
                Service =$service.caption
            }
            New-Object PSCustomObject -Property $prop
        }
    }
}
Get-Content C:\Users\test.txt | kopiera | Export-csv c:\Users\devices.csv
When I export the csv file it looks like this:
#TYPE System.Management.Automation.PSCustomObject
"Service"
"System.Object[]"
So it doesn't fetch the service.caption (because there are too many?).
But if I replace the Export-Csv C:\Users\devices.csv with Out-File C:\Users\devices.txt, it looks like this instead:
{Adobe Acrobat Update Service, Adobe Flash Player Update Service, Andrea ADI Filters Service, Application Experience...}
So it's starting to look better, but it doesn't get them all (still because there are too many services?). What I'd like to do with this export/out-file is to get the information to appear vertically instead of horizontally.
(Wanted result)
Adobe Acrobat Service
Adobe Flash Player Update Service
and so on..
instead of:
(Actual result)
Adobe Acrobat Update Service, Adobe Flash Player Update Service, and so on...
Is there a way to make this possible? I've been trying for a while and can't wrap my brain around this.
Any help is appreciated!
CSV is not usually a good choice for exporting objects that contain multi-valued or complex properties. The object properties are going to be converted to a single string value. The only way to store an array of values is to convert it to a delimited string.
$prop= [ordered]@{
    Service =($service.caption) -join ';'
}
will create a semicolon-delimited string, and you'll have to deal with splitting it back out in whatever application uses the CSV later.
If you want to save and re-import the original object with the property as an array, you can switch to Export-CLIXML instead of Export-CSV.
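As a rough sketch of both options, the snippet below uses two hard-coded captions and a made-up computer name in place of the real Get-WmiObject output, and writes to the temp folder; those are assumptions for the example only. The first part emits one object per service, which is one possible way to get the vertical layout asked for; the second part round-trips the array through Export-Clixml.

```powershell
# Hypothetical sample standing in for (Get-WmiObject -Class Win32_Service).Caption
$captions = 'Adobe Acrobat Update Service', 'Adobe Flash Player Update Service'

# One object per service, so Export-Csv writes one row per service
$rows = foreach ($caption in $captions) {
    [PSCustomObject]@{ Computer = 'PC01'; Service = $caption }
}
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'devices.csv'
$rows | Export-Csv -Path $csvPath -NoTypeInformation

# Export-Clixml keeps the multi-valued property as a collection
$xmlPath = Join-Path ([System.IO.Path]::GetTempPath()) 'devices.xml'
[PSCustomObject]@{ Service = $captions } | Export-Clixml -Path $xmlPath
$restored = Import-Clixml -Path $xmlPath
$restored.Service.Count   # 2
```

The CSV then has a Computer column and a Service column with one service per line, while the XML file can be re-imported later with the Service property still holding separate values.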

How can I write a binary Stream object to a file in PowerShell?

I think I've tried every wrong way, and those few that don't just give ugly error messages write a garbled file that cannot be opened (you can still see the JFIF in it, but the jpeg magic smoke has been lost).
The Stream itself is $contactInfo.Get_Item("Photo"). I think I need to do something like this:
$br = new-object System.IO.BinaryReader $contactInfo.Get_Item("Photo")
But past that, I don't know what to do. I've tried Googling, but I'm not even sure what I'm looking for to be quite honest.
The type of the Stream object is Microsoft.Lync.Model.UCStream.
I don't have access to this particular type (UCStream) but in general you would write this in PowerShell like so:
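$br = new-object io.binaryreader $contactInfo.Get_Item("Photo")
$al = new-object collections.generic.list[byte]
# Read raw bytes from the underlying stream; BinaryReader.Read() decodes characters, not bytes
while (($i = $br.BaseStream.ReadByte()) -ne -1)
{
    $al.Add($i)
}
# -enc byte works in Windows PowerShell
Set-Content photo.jpeg $al.ToArray() -enc byte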
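If the photo object really is a System.IO.Stream (an assumption here, since the UCStream type was not available to test against), a possible alternative to the byte-by-byte loop is to let .NET copy the stream straight into a file. In this sketch a small MemoryStream stands in for $contactInfo.Get_Item("Photo"), and the output goes to the temp folder:

```powershell
# Hypothetical stand-in for $contactInfo.Get_Item("Photo"): a few bytes in memory
$bytes  = [byte[]](0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10)
$source = New-Object System.IO.MemoryStream (,$bytes)

$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'photo.jpeg'
$file = [System.IO.File]::Create($outPath)
try {
    # Stream.CopyTo moves the raw bytes without any text encoding step
    $source.CopyTo($file)
}
finally {
    # Dispose flushes and closes the file even if the copy fails
    $file.Dispose()
    $source.Dispose()
}

[System.IO.File]::ReadAllBytes($outPath).Length   # 6
```

Since no characters are decoded along the way, the bytes on disk match the bytes in the stream.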