Comparing semi-large JSON files in PowerShell gets stuck

I have two JSON files (about 20 MB each).
The structure is something like this:
{
"Property1": [
"https://testweb.site"
],
"Property2": "abc",
"Property3": [
"7676"
],
"Property4": "some string",
"Property5": "some string",
"Property6": "test",
"Property7": "C:\\Folder\\Test\\file.txt"
}
What I do:
$oldContent = Get-Content file1.json | ConvertFrom-Json
$newContent = Get-Content file2.json | ConvertFrom-Json
$diff = Compare-Object $oldContent $newContent -Property Property1, Property2, Property3 ...
I need to compare the selected properties and find what is new in file2.json.
The problem:
This has been working for the last few months, but now it seems to hang forever on the compare step.
It doesn't consume memory or CPU. I'm not sure if the comparison is too complicated or if something else is wrong.
Questions:
Is there a better way to make this comparison?
Any suggestions as to what the reason might be?
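One possible alternative, offered only as a sketch and not taken from the original post: build a string key from the selected properties and look it up in a HashSet, which avoids comparing every record against every other record. It assumes each file holds an array of objects shaped like the sample above. The inline sample data, the property list, and the helper name Get-RecordKey are all hypothetical.

```powershell
# Hypothetical sample data standing in for file1.json and file2.json.
$oldContent = @(@'
[
  { "Property1": ["https://testweb.site"], "Property2": "abc", "Property3": ["7676"] }
]
'@ | ConvertFrom-Json)

$newContent = @(@'
[
  { "Property1": ["https://testweb.site"], "Property2": "abc", "Property3": ["7676"] },
  { "Property1": ["https://other.site"],   "Property2": "xyz", "Property3": ["1234"] }
]
'@ | ConvertFrom-Json)

$properties = 'Property1', 'Property2', 'Property3'

# Hypothetical helper: flattens the selected properties of one record into a single string.
function Get-RecordKey {
    param($Record, [string[]] $Properties)
    # Array-valued properties are joined with ',', properties are separated with '|'.
    ($Properties | ForEach-Object { @($Record.$_) -join ',' }) -join '|'
}

# Remember every key that exists in the old file.
$seen = [System.Collections.Generic.HashSet[string]]::new()
foreach ($record in $oldContent) {
    [void] $seen.Add((Get-RecordKey $record $properties))
}

# Keep only the records of the new file whose key was not seen before.
$added = foreach ($record in $newContent) {
    if (-not $seen.Contains((Get-RecordKey $record $properties))) { $record }
}

$added
```

The separator characters are an assumption too; if the real values can contain ',' or '|', a different separator would be needed to keep the keys unambiguous.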

Related

Use Powershell to find and compare text in file to a folder name

My apologies, but unfortunately my PowerShell scripting is quite poor; however, I'm trying to muddle on with this... The below is actually for a Nagios check, hence the definitions of OK, WARNING etc.
Summary
I have 2 small text files containing specific text, and for each value I need to check whether a folder with the corresponding name exists.
In File 1 below, note the "prod" section starting on line 18. From it I am interested in the apples, pears and bananas values (a date plus other text) within the quotation marks, so for bananas it would be 20220817-1-64 only.
The position within the text file of the "prod" line and the subsequent data I'm interested in can't be guaranteed.
The data against apples, pears etc. will change over time.
file1.txt:
{
"default": "prod",
"pre": {
"apples": "20220711-0",
"pears": "20220711-0",
"bananas": "20220711-1-64"
},
"test": {
"apples": "20220920-0",
"pears": "20220920-0",
"bananas": "20220920-1-64"
},
"new": {
"apples": "20220910-0",
"pears": "20220910-0",
"bananas": "20220910-1-64"
},
"prod": {
"apples": "20220817-0",
"pears": "20220210-0",
"bananas": "20220817-1-64"
},
"old": {
"apples": "20220601-0",
"pears": "20220601-0",
"bananas": "20220601-1-64"
}
}
File 2 follows a similar principle: I am only interested in 20220923-0 next to the "prod" line (again, the position within the file can't be guaranteed and the data will change over time)
File2.txt:
{
"default": "prod",
"pre": "20220917-0",
"test": "20220926-0",
"new": "20220924-0",
"prod": "20220923-0"
}
Each of the values would need to be compared against a directory, to see if a folder of the same name exists. If it matches, the result would be OK, if different then result in a WARNING, if missing the result would be CRITICAL.
What I have tried
Defining the result and the folders to check against is straightforward enough:
# Result
$OK=0
$WARNING=1
$CRITICAL=2
# Folders to check
$apples_folder = (Get-Childitem c:\folder_path\apples\*).Name
$pears_folder = (Get-Childitem c:\folder_path\pears\*).Name
However the main part I'm struggling with is picking out the relevant text from the text file(s) against the prod line(s)
From what I have gathered, I suspect using regex or possibly grep commands may hold the answer, but I can't quite get my head around it.
Any pointers in the right direction would be appreciated.
Continuing from my comment, you can parse File1.txt as JSON, assuming that what you gave us as example is missing the final closing bracket '}' as a result of posting it here and that the json in the actual file is complete.
To work with that file type, you can do
$json = Get-Content -Path 'X:\somewhere\File1.txt' -Raw | ConvertFrom-Json
$prodFolder = $json.prod.bananas
# next test if you can find a folder with that name and do whatever you need to do then
if (Test-Path -Path "c:\folder_path\prod\$prodFolder" -PathType Container) {
"OK"
}
else {
"CRITICAL"
}
For File2.txt, things are quite different because that is a strange format.
What you can do is convert the data in there into a Hashtable using the ConvertFrom-StringData cmdlet.
In PowerShell 5.x you need to replace the colons that separate the names from their values with an equals sign (=). Because the file is read with -Raw, the (?m) option is needed so that ^ matches at the start of every line:
$data = (Get-Content -Path 'X:\somewhereElse\File2.txt' -Raw) -replace '(?m)^(.*?):(.*)','$1=$2' -replace '[", ]' | ConvertFrom-StringData
$prodFolder = $data.prod
# next test if you can find a folder with that name and do whatever you need to do then
if (Test-Path -Path "c:\folder_path\prod\$prodFolder" -PathType Container) {
"OK"
}
else {
"CRITICAL"
}
In PowerShell 7.x you do not have to replace the colons:
$data = (Get-Content -Path 'X:\somewhereElse\File2.txt' -Raw) -replace '[", ]' | ConvertFrom-StringData -Delimiter ':'
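If File2.txt really contains the JSON shown in the question, ConvertFrom-Json should be able to read it too. The sketch below also maps the result onto the Nagios codes from the question. The function name Get-FolderStatus is hypothetical, and the rule it applies (CRITICAL when no folders exist at all, WARNING when folders exist but none has the expected name) is only one reading of what the asker described.

```powershell
# Nagios result codes, as defined in the question.
$OK = 0
$WARNING = 1
$CRITICAL = 2

# Hypothetical helper: compares the expected folder name with the names found on disk.
function Get-FolderStatus {
    param(
        [string]   $Expected,     # value read from the file, e.g. "20220923-0"
        [string[]] $FolderNames   # e.g. (Get-ChildItem c:\folder_path\prod -Directory).Name
    )
    if (-not $FolderNames)                { return $CRITICAL }  # no folders at all
    if ($FolderNames -contains $Expected) { return $OK }        # exact match found
    return $WARNING                                             # folders exist, none match
}

# Inline copy of the File2.txt content from the question (shortened).
$json = @'
{
  "default": "prod",
  "pre": "20220917-0",
  "prod": "20220923-0"
}
'@ | ConvertFrom-Json

$expected = $json.prod

# The folder names are passed in directly here; in a real check they would come from Get-ChildItem.
$status = Get-FolderStatus -Expected $expected -FolderNames '20220923-0', '20220817-0'
$status
```

Keeping the comparison in a function that takes plain strings makes it possible to try the logic without touching the file system.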

Understanding Powershell: example - Convert JSON to CSV

I've read several posts (like Convert JSON to CSV using PowerShell) regarding using PowerShell to convert JSON to CSV. I have also read that it is relatively poor form to use the pipe syntax in scripts -- that it's really meant for command line and can create a hassle for developers to maintain over time.
Using this sample JSON file...
[
{
"a": "Value 1",
"b": 20,
"g": "Arizona"
},
{
"a": "Value 2",
"b": 40
},
{
"a": "Value 3"
},
{
"a": "Value 4",
"b": 60
}
]
...this code...
((Get-Content -Path $pathToInputFile -Raw) | ConvertFrom-Json) | Export-CSV $pathToOutputFile -NoTypeInformation
...creates a file containing CSV as expected.
"a","b","g"
"Value 1","20","Arizona"
"Value 2","40",
"Value 3",,
"Value 4","60",
This code...
$content = Get-Content -Path $pathToInputFile -Raw
$psObj = ConvertFrom-Json -InputObject $content
Export-Csv -InputObject $psObj -LiteralPath $pathToOutputFile -NoTypeInformation
...creates a file containing nonsense:
"Count","Length","LongLength","Rank","SyncRoot","IsReadOnly","IsFixedSize","IsSynchronized"
"4","4","4","1","System.Object[]","False","True","False"
It looks like maybe an object definition(?).
What is the difference? What PowerShell nuance did I miss when converting the code?
The answer to Powershell - Export a List of Objects to CSV says the problem is caused by the -InputObject option sending the object itself, not its contents, to Export-Csv, but doesn't state how to remedy the problem without using the pipe syntax. I'm thinking something like -InputObject $psObj.contents. I realize that's not a real thing, but Get-Members doesn't show me anything that looks like it will solve this.
This is not meant as an answer but just to give you a vague representation of what ConvertTo-Csv and Export-Csv are doing and to help you understand why -InputObject is meant to be bound from the pipeline and should not be used manually.
function ConvertTo-Csv2 {
param(
[parameter(ValueFromPipeline)]
[Object] $InputObject
)
begin {
$isFirstObject = $true
filter Normalize {
if($_ -match '"') { return $_.Replace('"','""') }
$_
}
}
process {
if($isFirstObject) {
$headers = $InputObject.PSObject.Properties.Name | Normalize
$isFirstObject = $false
[string]::Format('"{0}"', [string]::Join('","', $headers))
}
$values = foreach($value in $InputObject.PSObject.Properties.Value) {
$value | Normalize
}
[string]::Format('"{0}"', [string]::Join('","', $values))
}
}
As we can observe, there is no loop enumerating the $InputObject in the process block of this function, yet, because of how this block works, each object coming from the pipeline is processed and converted to a Csv string representation of the object.
Within a pipeline, the Process block executes once for each input object that reaches the function.
If instead, we attempt to use the InputObject parameter from the function, the object being passed as argument will be processed only once.
Calling the function at the beginning, or outside of a pipeline, executes the Process block once.
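A small sketch can make the difference visible by counting how often the process block runs. The function name Measure-ProcessCalls is made up for this illustration and is not part of PowerShell.

```powershell
# Hypothetical helper: returns how many times its process block was executed.
function Measure-ProcessCalls {
    param(
        [Parameter(ValueFromPipeline)]
        [object] $InputObject
    )
    begin   { $calls = 0 }
    process { $calls++ }      # runs once per object that reaches the function
    end     { $calls }
}

$items = 1, 2, 3

# The pipeline enumerates the array, so process runs three times.
$viaPipeline = $items | Measure-ProcessCalls

# The whole array is bound to -InputObject as one value, so process runs once.
$viaParameter = Measure-ProcessCalls -InputObject $items

"pipeline: $viaPipeline, parameter: $viaParameter"
```

The same mechanism is what makes Export-Csv -InputObject $psObj emit the properties of the array itself (Count, Length, and so on) instead of one row per element.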
"Get-Members doesn't show me anything that looks like it will solve this"
Get-Member (note the singular) gives different results here because how you pass values changes its behavior.
The pipeline enumerates values, almost like foreach ($item in $pipeline). Passing the value by parameter skips that enumeration.
Here I have an array of 3 letters.
$Letters = 'a'..'c'
I'm getting different types
Get-Member -InputObject $Letters
# [Object[]]
$letters | Get-Member
# [char]
Processed for each item
$letters | ForEach-Object {
"iteration: $_"
}
iteration: a
iteration: b
iteration: c
Compare to
ForEach-Object -InputObject $Letters {
"iteration: $_"
}
iteration: a b c
Detecting types
Here's a few ways to inspect objects.
using ClassExplorer
PS> ($Letters).GetType().FullName
System.Object[]
PS> ($Letters[0]).GetType().FullName # first child
System.Char
PS> $Letters.count
3
PS> $Letters[0].Count
1
PS> $Letters.pstypenames -join ', '
System.Object[], System.Array, System.Object
PS> $Letters[0].pstypenames -join ', '
System.Char, System.ValueType, System.Object
Tip: $null.count always returns 0. It does not throw an error.
if($neverExisted.count -gt 1) { ... }
Misc
"I have also read that it is relatively poor form to use the pipe syntax in scripts"
This is not true; PowerShell is designed around piping objects.
Maybe they were talking about one of these cases:
Example 1: Piping when you could use a parameter
I'm guessing they meant piping a variable in cases where you can pass it as a parameter?
$text = "hi-world"
# piping the variable
$text | Write-Host
# vs. passing it as a parameter
Write-Host -Object $text
Example 2: Slow operations
In some cases where you need something fast, the overhead of ForEach-Object compared with a foreach statement can be an issue. Switching to foreach means writing some extra syntax.
If you really need speed, you should probably be calling .NET methods anyway.

foreach through hashtable values

I know that this question has already been answered for hashtable keys... but it does not seem to work for hashtable values.
I'm creating a hash of VMs based on the cluster they reside in. So the hashtable looks like this
$clusters[$clustername][$clustervms] = @{}
The reason each VM is a hashtable is because I'm trying to associate it with its VM tag as well (VMware).
This code works incredibly fast but destroys the keys, by injecting values as keys... or in other words, Rather than key/value pairs - values become keys, keys become values ... it's just a shit show.
foreach ($value in $($clusters.values)) {
$clusters[$value] = (get-tagassignment -entity ($value).name).tag
}
This code works - but it is unbelievably slow.
foreach ($key in $($clusters.keys)) {
$vms = (Get-Cluster -Name $key | Get-Vm).name
foreach ($vm in $vms) {
$clusters[$key][$vm] = @{};
$tag = (Get-TagAssignment -Entity $vm).tag;
$clusters[$key][$vm] = $tag;
}
}
When I say unbelievably slow, I mean getting the VM names takes about 5 seconds. Getting the tag assignments through the first code (codename: shit show) takes about 7 seconds. I've waited a minute on this code, and it's only gone through 6 VMs in that time. So I know there's a better way.
Thanks,
I commented on this above, and I wrote an example script which should make this clearer. Also note that this PowerShell is meant to be illustrative, and some, many, or all things could be done in a more efficient manner.
# for example, I'm just using the $SourceData variable to make this clearer.
# you would normally be populating this hash programmatically
# let's say a VM has this payload data:
# @{ vm_name="bar"; os="win" }
$SourceData = @(
@{
cluster_name = "foo";
vms = @( @{ vm_name="bar" ; os="win" }, @{ vm_name="baz"; os="linux" })
}, @{
cluster_name = "taco";
vms = @( @{ vm_name="guac"; os="win" }, @{ vm_name="hot"; os="win" })
})
$clusters = @{}
# load the sourcedata into our clusters catalog
$SourceData | %{
$clusternm = $_.cluster_name
$clusters[ $clusternm ] = @{}
$_.vms | %{
$vmnm = $_.vm_name
$clusters[ $clusternm ][ $vmnm ] = $_
}
}
# show the whole thing
$clusters | ConvertTo-Json | Write-Output
<#
{
"taco": {
"hot": {
"os": "win",
"vm_name": "hot"
},
"guac": {
"os": "win",
"vm_name": "guac"
}
},
"foo": {
"bar": {
"os": "win",
"vm_name": "bar"
},
"baz": {
"os": "linux",
"vm_name": "baz"
}
}
}
#>
# show just a vm
$clusters['foo']['bar'] | ConvertTo-Json | Write-Output
<#
{
"os": "win",
"vm_name": "bar"
}
#>
And finally, to assure you that iterating hashes takes no appreciable time:
# now lets iterate each cluster, and each vm in that cluster. in this example, just dump the OS of each vm in each cluster
$clusters.Keys | %{
$clusternm = $_
$clusters[$clusternm].Keys | %{
$vmnm = $_
Write-Output "${clusternm}/${vmnm}: os: $( $clusters[$clusternm][$vmnm].os )"
}
}
<#
taco/hot: os: win
taco/guac: os: win
foo/bar: os: win
foo/baz: os: linux
#>
The whole script runs almost instantly. Only the JSON conversion calls, used for the illustrative output, added about 0.1 s.
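As a variation on the key-based loops above, a nested hashtable can also be walked with GetEnumerator(), which hands back each key together with its value. This is only a sketch with made-up inline data shaped like the $clusters catalog; it is not the answerer's code.

```powershell
# Made-up data in the same shape as the $clusters catalog above.
$clusters = @{
    foo  = @{ bar  = @{ os = 'win' }; baz = @{ os = 'linux' } }
    taco = @{ guac = @{ os = 'win' } }
}

# Each enumerated entry exposes .Key and .Value, so no second lookup by key is needed.
$report = foreach ($cluster in $clusters.GetEnumerator()) {
    foreach ($vm in $cluster.Value.GetEnumerator()) {
        '{0}/{1}: os: {2}' -f $cluster.Key, $vm.Key, $vm.Value.os
    }
}

# Hashtables do not guarantee an order, so sort before displaying.
$report | Sort-Object
```

Values stay values here: the loops only read from the hashtables, so the keys are never overwritten.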

Compare-Object Find Matches and Remove Found from First Object

I'm wondering if there's a simpler way to accomplish this. I have two (JSON) objects, where they have properties that are lists of IPs (the properties are individual IPs). I'm comparing the two object properties to find any matching items and want to remove any matches from the first object ($JSONConverted). I believe I can use the remove feature (which I haven't gotten working yet). I'm really wondering if there's a simpler way to accomplish this.
$JSONConverted = Get-Content -Raw -Path Output.json | ConvertFrom-Json
$FWJSONConverted = Get-Content -Raw -Path FWOutput.json | ConvertFrom-Json
$MatchingIPs = Compare-Object -IncludeEqual -ExcludeDifferent -ReferenceObject $FWJSONConverted.data.value -DifferenceObject $JSONConverted.data.value
$ListOfMatchingIPs = $MatchingIPs.InputObject
$JSONConverted.data.value | ForEach-Object {
foreach ($IP in $ListOfMatchingIPs) {
if ($_ -eq $IP) {
$JSONConverted.remove.($_)
}
}
}
Here's an example of the $JSONConverted data:
{
"number_of_elements": 1134,
"timeout_type": "LAST",
"name": "IP List",
"data": [
{
"last_seen": 1486571563476,
"source": "WORD: WORDS",
"value": "10.10.10.10",
"first_seen": 1486397213696
},
{
"last_seen": 1486736205285,
"source": "WORD: WORDS",
"value": "10.17.24.22",
"first_seen": 1486397813280
},
{
"last_seen": 1486637743793,
"source": "WORD: WORDS",
"value": "10.11.10.10",
"first_seen": 1486398713056
}
],
"creation_time": 1486394698941,
"time_to_live": "1 years 0 mons 3 days 0 hours 0 mins 0.00 secs",
"element_type":"IP"
}
Something like this should suffice (assuming you want to remove the entire child object from the data array):
$JSONConverted.data = $JSONConverted.data | Where-Object {
@($FWJSONConverted.data.value) -notcontains $_.value
}
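For longer lists, the same filter can be sketched with a HashSet lookup instead of -notcontains, which may help because each lookup no longer scans the whole firewall list. The inline JSON below is shortened sample data, and the variable name $known is introduced only for this illustration.

```powershell
# Shortened stand-ins for Output.json and FWOutput.json.
$JSONConverted = @'
{ "data": [ { "value": "10.10.10.10" }, { "value": "10.17.24.22" }, { "value": "10.11.10.10" } ] }
'@ | ConvertFrom-Json

$FWJSONConverted = @'
{ "data": [ { "value": "10.17.24.22" } ] }
'@ | ConvertFrom-Json

# Collect the firewall IPs in a set for fast membership tests.
$known = [System.Collections.Generic.HashSet[string]]::new([string[]] @($FWJSONConverted.data.value))

# Keep only the entries whose IP is not in the firewall list.
$JSONConverted.data = @($JSONConverted.data | Where-Object { -not $known.Contains($_.value) })

$JSONConverted.data.value
```

As in the answer above, the whole child object is dropped from the data array, not just its value property.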

How to store an array of hashtables in a text file and then call all of the values for a given key in each hashtable

I am using a text file as the backend for an application that I am developing. I first started off leaving the text file in a human-readable format, but I decided there was no sense in that and figured it would be best to leave out formatting.
Where I am now in the backend dev process is creating a single-line hashtable with identical keys but different values for each entry. Seems logical and easy to work with.
Here is a mock-up of the entries in the text file:
@{'bName'='1xx'; 'bTotal'='1yy'; 'bSet'='1zz'}
@{'bName'='2xx'; 'bTotal'='2yy'; 'bSet'='2zz'}
@{'bName'='3xx'; 'bTotal'='3yy'; 'bSet'='3zz'}
As you can see, the keys for each entry are identical, however, the values are going to be different. (The numerical and repetitious nature of the values are purely coincidental and put in place for the sake of a mock-up. Actual values will not be numerically-oriented and won't be repetitious as seen in the example.)
I am able to access keys and values by typing:
$hash = Get-Content .\Desktop\Test.txt | Out-String | iex
which outputs:
Name Value
---- -----
bName 1xx
bTotal 1yy
bSet 1zz
bName 2xx
bTotal 2yy
bSet 2zz
bName 3xx
bTotal 3yy
bSet 3zz
What I ultimately want to do is gather each of the values for bName, bTotal, and bSet so that I can append each to a separate WinForms ComboBox. The WinForms part will be simple, I am just having a bit of an issue with getting the values from each hashtable in the text file.
I tried:
$hash.Values | ?{$hash.Keys -contains 'bName'}
but it just prints out every $hash.Value regardless of the $hash.Key match given in the pipe.
I understand that $hash is an array and I figured I may have to pipe out each iteration in a foreach ($hash | %{}) loop but I'm not quite sure the correct way to do this. For example, when I try:
$hash | $_.Keys
or
$hash | $_.Values
it isn't treating each iteration like a hashtable.
What am I doing wrong here? Am I going about it in a convoluted way while there is a much easier way to accomplish this? I am open to all sorts of ideas or suggestions.
As an afterthought: It is kind of funny how often an obvious solution presents itself when you step away and divert your attention towards something else.
I went to grab lunch and I can't, for the life of me, begin to comprehend why I didn't realize that I could just very easily do this:
$hash.bName
or:
$hash.bTotal
or:
$hash.bSet
That does exactly what I wanted. However, considering the answers provided, I may go a different route and use an .ini file or CSV format rather than creating an array of hashtables.
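The shortcut works because of member-access enumeration: when a member does not exist on the array itself, PowerShell applies it to every element. A minimal sketch, with the array built inline instead of being read from Test.txt:

```powershell
# Inline stand-in for the array of hashtables read from the text file.
$hash = @(
    @{ bName = '1xx'; bTotal = '1yy'; bSet = '1zz' }
    @{ bName = '2xx'; bTotal = '2yy'; bSet = '2zz' }
    @{ bName = '3xx'; bTotal = '3yy'; bSet = '3zz' }
)

# Member-access enumeration: .bName is applied to each hashtable in the array.
$names = $hash.bName

# The explicit equivalent, one hashtable at a time.
$namesExplicit = $hash | ForEach-Object { $_['bName'] }

$names -join ', '
```

Either collection could then be handed to a ComboBox, for example through its Items.AddRange method.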
One way of storing hashtables in a text file is the INI format.
[hashtable1]
bName=1xx
bTotal=1yy
bSet=1zz
[hashtable2]
bName=2xx
bTotal=2yy
bSet=2zz
[hashtable3]
bName=3xx
bTotal=3yy
bSet=3zz
INI files are basically a hashtable of hashtables in text form. They can be read like this:
$ht = @{}
Get-Content 'C:\path\to\hashtables.txt' | ForEach-Object {
$_.Trim()
} | Where-Object {
$_ -notmatch '^(;|$)'
} | ForEach-Object {
if ($_ -match '^\[.*\]$') {
$section = $_ -replace '\[|\]'
$ht[$section] = @{}
} else {
$key, $value = $_ -split '\s*=\s*', 2
$ht[$section][$key] = $value
}
}
and written like this:
$ht.Keys | ForEach-Object {
'[{0}]' -f $_
foreach ($key in $ht[$_].Keys) {
'{0}={1}' -f $key, $ht[$_][$key]
}
} | Set-Content 'C:\path\to\hashtables.txt'
Individual values in such a hashtable of hashtables can be accessed like this:
$ht['section']['key']
or like this:
$ht.section.key
Another option would be to store each hashtable in a separate file
hashtable1.txt:
bName=1xx
bTotal=1yy
bSet=1zz
hashtable2.txt:
bName=2xx
bTotal=2yy
bSet=2zz
hashtable3.txt:
bName=3xx
bTotal=3yy
bSet=3zz
That would allow you to import each file into a hashtable via ConvertFrom-StringData:
$ht1 = Get-Content 'C:\path\to\hashtable1.txt' | Out-String |
ConvertFrom-Stringdata
Writing the files would basically be the same as above (there is no ConvertTo-StringData cmdlet):
$ht1.Keys | ForEach-Object {
'{0}={1}' -f $_, $ht1[$_]
} | Set-Content 'C:\path\to\hashtable1.txt'
PowerShell has built-in CSV handling, which makes CSV a good choice in this case. So, assuming you had your data stored in a file in the standard CSV format with headers:
"bName","bTotal","bSet"
"1xx","1yy","1zz"
"2xx","2yy","2zz"
"3xx","3yy","3zz"
Then you import your data like this:
$data = Import-Csv $path
Now you have an array of PsCustomObject and each header in the csv file is a property of the object. So if, for example, you wanted to get the bTotal of the second object you would do the following:
$data[1].bTotal
2yy
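To collect every value of one column, as the question asks for, the property can be read from the whole array at once. The sketch below uses ConvertFrom-Csv on an inline string so that it runs without a file; with a real file, Import-Csv would take its place.

```powershell
# Inline copy of the CSV shown above; Import-Csv $path would read the same data from disk.
$data = @'
"bName","bTotal","bSet"
"1xx","1yy","1zz"
"2xx","2yy","2zz"
"3xx","3yy","3zz"
'@ | ConvertFrom-Csv

# One property of one row.
$secondTotal = $data[1].bTotal

# The same property across all rows, e.g. to fill one ComboBox per column.
$allNames  = $data.bName
$allTotals = $data.bTotal

$allNames -join ', '
```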