PowerShell - Search every element of a large array against every element of another large array

I have two large arrays. One (call it Array1) is an array of 100,000 PSCustomObjects, each of which has a property called "Token". The other (Array2) is simply an array of 2,500 strings.
The challenge is that EVERY element of Array1 needs to be checked against all the elements of Array2 and tagged accordingly, i.e., if the Token value from Array1 matches any of the elements of Array2, label it as "Match found!"
Looping through both arrays would be extremely slow (100,000 x 2,500 = 250 million comparisons). Is there a better way to do this?
P.S.: The items in Array1 also have an ordinal number property, and the array is sorted in that order.
Here is the code:
$Array1 = @()
$Array2 = @()
#Sample object:
$obj = New-Object -TypeName PSCustomObject
$obj | Add-Member -MemberType NoteProperty -Name Token -Value "SOMEVALUEHERE"
$obj | Add-Member -MemberType NoteProperty -Name TokenOrdinalNum -Value 1
$Array1 += $obj # This array has 100K such objects
$Array2 = @("VAL1", "SOMEVALUEHERE", ......) #Array2 has 2500 such strings.
The output needs to be a new array of objects, say 'ArrayFinal', with an additional NoteProperty called 'MatchFound'.
Please help.

I would create a Hashtable from the values in your $Array2 for fast lookups.
For clarity, I have renamed $Array1 and $Array2 to $objects and $tokens.
# the object array
$objects = [PsCustomObject]@{ Token = 'SOMEVALUEHERE'; TokenOrdinalNum = 1 },
[PsCustomObject]@{ Token = 'VAL1'; TokenOrdinalNum = 123 },
[PsCustomObject]@{ Token = 'SomeOtherValue'; TokenOrdinalNum = 555 } # etcetera
# the array with token keywords to check
$tokens = 'VAL1', 'SOMEVALUEHERE', 'ShouldNotFindThis' # etcetera
# create a lookup Hashtable from the array of token values for FAST lookup
# you can also use a HashSet ([System.Collections.Generic.HashSet[string]]::new())
# see https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.hashset-1
$lookup = @{}
$tokens | ForEach-Object { $lookup[$_] = $true } # it's only the Keys that matter, the value is not important
# now loop over the objects in the first array and check their 'Token' values
$ArrayFinal = foreach ($obj in $objects) {
$obj | Select-Object *, @{Name = 'MatchFound'; Expression = { $lookup.ContainsKey($obj.Token) }}
}
# output on screen
$ArrayFinal | Format-Table -AutoSize
# write to Csv ?
$ArrayFinal | Export-Csv -Path 'Path\To\MatchedObjects.csv' -NoTypeInformation
Output:
Token TokenOrdinalNum MatchFound
----- --------------- ----------
SOMEVALUEHERE 1 True
VAL1 123 True
SomeOtherValue 555 False
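A minimal sketch of the HashSet variant mentioned in the code comments above, assuming PowerShell 5.0 or later (for the ::new() syntax). The name $tokenSet and the sample data are illustrative. StringComparer.OrdinalIgnoreCase is passed so that matching stays case-insensitive like the @{} keys. Building the output objects directly with [PSCustomObject], instead of Select-Object with a calculated property, avoids some per-object overhead at the cost of listing the properties explicitly:

```powershell
# Illustrative sample data, mirroring the example above
$objects = @(
    [PSCustomObject]@{ Token = 'SOMEVALUEHERE';  TokenOrdinalNum = 1 }
    [PSCustomObject]@{ Token = 'VAL1';           TokenOrdinalNum = 123 }
    [PSCustomObject]@{ Token = 'SomeOtherValue'; TokenOrdinalNum = 555 }
)
$tokens = 'VAL1', 'SOMEVALUEHERE', 'ShouldNotFindThis'

# Build the set once from the 2,500 tokens; OrdinalIgnoreCase keeps
# the case-insensitive behavior of a PowerShell @{} hashtable
$tokenSet = [System.Collections.Generic.HashSet[string]]::new(
    [string[]]$tokens,
    [System.StringComparer]::OrdinalIgnoreCase
)

# One set lookup per object instead of 2,500 comparisons per object
$ArrayFinal = foreach ($obj in $objects) {
    [PSCustomObject]@{
        Token           = $obj.Token
        TokenOrdinalNum = $obj.TokenOrdinalNum
        MatchFound      = $tokenSet.Contains($obj.Token)
    }
}

$ArrayFinal | Format-Table -AutoSize
```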

100kb (102,400) objects isn't too big. Here's an example using Compare-Object. By default it checks every object against every other object (919 ms). EDIT: OK, if I change the order of $b, it takes much longer (13 min). Sorting both lists first should work well if most of the positions end up the same (1.99 s with Measure-Command). If every item were off by one position, it would still take a long time ($b = 1,$b).
$a = foreach ($i in 1..100kb) { [pscustomobject]@{token = get-random} }
$a = $a | sort-object token
$b = $a.token | sort-object
compare-object $a.token $b -IncludeEqual
InputObject SideIndicator
----------- -------------
1507400001 ==
120471924 ==
28523825 ==
...
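If you only need the tokens that appear in both lists, Compare-Object can return just those. A sketch with small, made-up inputs standing in for the large arrays: -IncludeEqual together with -ExcludeDifferent keeps only the equal items, and -PassThru emits the original values instead of InputObject/SideIndicator wrapper objects:

```powershell
# Made-up inputs standing in for the 100K objects and the 2,500 tokens
$a = 'VAL1', 'SOMEVALUEHERE', 'SomeOtherValue' |
    ForEach-Object { [pscustomobject]@{ token = $_ } }
$tokens = 'VAL1', 'SOMEVALUEHERE', 'ShouldNotFindThis'

# Sort both sides first so that equal values line up
$left  = $a.token | Sort-Object
$right = $tokens | Sort-Object

# Keep only the values present on both sides
$matched = Compare-Object $left $right -IncludeEqual -ExcludeDifferent -PassThru
$matched
```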

Related

In PowerShell, how to test if an array already contains an object with all the same properties?

I want to avoid inserting duplicates into an array in PowerShell. Using -notcontains doesn't seem to work with a PSCustomObject array.
Here's a code sample:
$x = [PSCustomObject]@{
foo = 111
bar = 222
}
$y = [PSCustomObject]@{
foo = 111
bar = 222
}
$collection = @()
$collection += $x
if ($collection -notcontains $y){
$collection += $y
}
$collection.Count #Expecting to only get 1, getting 2
I would use Compare-Object for this.
$x = [PSCustomObject]@{
foo = 111
bar = 222
}
$y = [PSCustomObject]@{
foo = 111
bar = 222
}
$collection = [System.Collections.ArrayList]@()
[void]$collection.Add($x)
if (Compare-Object -Ref $collection -Dif $y -Property foo,bar | Where SideIndicator -eq '=>') {
[void]$collection.Add($y)
}
Explanation:
Comparing custom objects with comparison operators is not trivial: -contains and -notcontains compare [PSCustomObject] instances by reference, so two separate objects with identical property values are not considered equal. This solution compares only the properties you care about (foo and bar in this case). This can be done simply with Compare-Object, which outputs the differences between the two inputs by default. A SideIndicator value of => indicates that the difference lies in the object passed to the -DifferenceObject (-Dif) parameter.
The [System.Collections.ArrayList] type is used instead of an array to avoid the inefficient += typically seen when growing an array. Since the .Add() method outputs the index of the newly added item, the [void] cast is used to suppress that output.
You could also make the solution dynamic with regard to the properties, since you may not want to hard-code the property names into the Compare-Object command. You can do something like the following instead:
$x = [PSCustomObject]@{
foo = 111
bar = 222
}
$y = [PSCustomObject]@{
foo = 111
bar = 222
}
$collection = [System.Collections.ArrayList]@()
[void]$collection.Add($x)
$properties = $collection[0] | Get-Member -MemberType NoteProperty |
Select-Object -Expand Name
if (Compare-Object -Ref $collection -Dif $y -Property $properties | Where SideIndicator -eq '=>') {
[void]$collection.Add($y)
}
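For large collections, running Compare-Object for every new item gets slow. A different technique, swapped in here as a sketch, is to keep a HashSet of keys built from the property values you care about. The key format (values joined with a NUL character) and the names $properties and $seen are assumptions for illustration; two objects count as duplicates when the string forms of those property values are equal (case-sensitively, with the default HashSet comparer):

```powershell
$x = [PSCustomObject]@{ foo = 111; bar = 222 }
$y = [PSCustomObject]@{ foo = 111; bar = 222 }

# Properties that define "the same object" (assumed: foo and bar, as in the question)
$properties = 'foo', 'bar'

$collection = [System.Collections.Generic.List[object]]::new()
$seen = [System.Collections.Generic.HashSet[string]]::new()

foreach ($item in $x, $y) {
    # Illustrative composite key: the property values joined with a NUL character
    $key = ($properties | ForEach-Object { $item.$_ }) -join "`0"
    # HashSet.Add() returns $false when the key is already in the set
    if ($seen.Add($key)) {
        $collection.Add($item)
    }
}

$collection.Count  # 1
```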

Hashtable with multiple values in GridView

I am storing data in a hashtable with multiple values like this:
$hash = @{}
$folders = dir (...) | where (...)
foreach ($folder in $folders) {
$num1 = (...)
$num2 = (...)
$hash.Add($folder.Name,@($num1,$num2))
}
So this is a hash with an array in its value part. The array always has two items. When the foreach loop has finished, I want to show the data with Out-GridView like this:
$hash | select -Property @{Expression={$_.Name};Label="FolderName"},
@{Expression={$_.Name[0]};Label="num1"},
@{Expression={$_.Name[1]};Label="num2"} | Out-GridView
But as you can imagine, this is not working. How can I split the stored array in the value part of my hash into two new columns to show them in overall three columns in the GridView?
It should look something like Name, Value1, Value2 ..., with each item stored in the hashtable shown as a separate row.
Hashtables are not lists of objects with a Name and a Value property. That's just how PowerShell displays the data structure for your convenience. For processing a hashtable the way you tried you need an enumerator to produce such objects:
$hash.GetEnumerator() |
Select-Object @{n='FolderName';e={$_.Name}},
@{n='num1';e={$_.Value[0]}},
@{n='num2';e={$_.Value[1]}} |
Out-GridView
Or you can enumerate the keys of the hashtable, use them as the current objects in the pipeline, and look up the values by the respective key and index:
$hash.Keys |
Select-Object @{n='FolderName';e={$_}},
@{n='num1';e={$hash[$_][0]}},
@{n='num2';e={$hash[$_][1]}} |
Out-GridView
If you don't know the number of array elements beforehand you need an inner loop for processing the nested arrays, e.g. like this:
$hash.Keys | ForEach-Object {
$o = New-Object -Type PSObject -Property @{ 'FolderName' = $_ }
$a = $hash[$_]
for ($i = 1; $i -le $a.Count; $i++) {
$o | Add-Member -Type NoteProperty -Name "num$i" -Value $a[$i-1]
}
$o
} | Out-GridView
If you have a variable number of array elements, beware that PowerShell uses the first object to determine which properties will be displayed.
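One way around that, sketched here with made-up sample data: find the longest array first and give every object the same numN properties, so no column is dropped. This assumes strict mode is off, since indexing past the end of an array then simply returns $null. The Out-GridView line is commented out because it needs a desktop session:

```powershell
# Made-up sample data: folder names mapped to arrays of different lengths
$hash = @{
    'FolderA' = @(1, 2)
    'FolderB' = @(3, 4, 5)
}

# The longest array decides how many numN columns every row gets
$max = ($hash.Values | ForEach-Object { @($_).Count } | Measure-Object -Maximum).Maximum

$rows = foreach ($key in $hash.Keys) {
    $row = [ordered]@{ FolderName = $key }
    for ($i = 1; $i -le $max; $i++) {
        # Past the end of a shorter array this yields $null, i.e. an empty cell
        $row["num$i"] = $hash[$key][$i - 1]
    }
    [PSCustomObject]$row
}

# $rows | Sort-Object FolderName | Out-GridView
```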

Find matches in two different Powershell objects based on one property

I am trying to find the matching names in two different types of PowerShell objects:
$Object1 has two properties - Name (string), ResourceID (uint32)
$object2 has one noteproperty - Name (system.string)
The following gives me a list of the matching names, but I also want the corresponding ResourceID property from $Object1:
$computers = Compare-Object $Object1.name $WSD_CM12 | where {$_.sideindicator -eq "=>"} | foreach {$_.inputobject}
These are big objects with over 10,000 items so I'm looking for the most efficient way to accomplish this.
If I'm understanding what you're after, I'd start by creating a hash table from your Object1 collection:
$object1_hash = @{}
Foreach ($object1 in $object1_coll)
{ $object1_hash[$object1.Name] = $object1.ResourceID }
Then you can find the ResourceID for any given Object2.name with:
$object1_hash[$Object2.Name]
Test bed for creating hash table:
$object1_coll = $(
New-Object PSObject -Property @{Name = 'Name1';ResourceID = 001}
New-Object PSObject -Property @{Name = 'Name2';ResourceID = 002}
)
$object1_hash = @{}
Foreach ($object1 in $object1_coll)
{ $object1_hash[$object1.Name] = $object1.ResourceID }
$object1_hash
Name Value
---- -----
Name2 2
Name1 1
Alternative method:
# Create sample list of objects with both Name and Serial
$obj1 = New-Object -Type PSCustomObject -Property:@{ Name = "Foo"; Serial = "1234" }
$obj2 = New-Object -Type PSCustomObject -Property:@{ Name = "Cow"; Serial = "4242" }
$collection1 = @($obj1, $obj2)
# Create subset of items with only Name
$objA = New-Object -Type PSCustomObject -Property:@{ Name = "Foo"; }
$collection2 = @($objA)
#Everything above this line is just to make sample data
# replace $collection1 and $collection2 with $Object1, $WSD_CM12
# Combine into one list
($collection1 + $collection2) |
# Group by name property
Group-Object -Property Name |
# I only want items that exist in both
Where { $_.Count -gt 1 } |
# Now give me the object
Select -Expand Group |
# And get the properties
Where { $_.Serial -ne $null }
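Building on the hash table idea, a sketch that returns the full matching objects from $Object1 (so ResourceID comes along) in a single pass over each collection. It assumes $WSD_CM12 holds objects with a Name property as described in the question; if it is a plain list of strings, use $w instead of $w.Name. The sample data and the name $byName are made up:

```powershell
# Made-up sample data standing in for $Object1 and $WSD_CM12
$Object1 = @(
    [PSCustomObject]@{ Name = 'Name1'; ResourceID = [uint32]1 }
    [PSCustomObject]@{ Name = 'Name2'; ResourceID = [uint32]2 }
)
$WSD_CM12 = @(
    [PSCustomObject]@{ Name = 'Name2' }
    [PSCustomObject]@{ Name = 'Name3' }
)

# Index $Object1 by Name once
$byName = @{}
foreach ($o in $Object1) { $byName[$o.Name] = $o }

# One hashtable lookup per name on the other side; matches keep their ResourceID
$computers = foreach ($w in $WSD_CM12) {
    if ($byName.ContainsKey($w.Name)) { $byName[$w.Name] }
}

$computers | Select-Object Name, ResourceID
```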

powershell select-object outputs array on one line

I have around 20 arrays which contain over 100 values each.
I want to output these to a csv file with column headings.
If I type any of these arrays at a PowerShell command prompt, they display on multiple lines and I can select different items from the array using $arrayname[14], for example, so I think they are being stored correctly.
If I use the following line in my script:
"" | select-object @{Name="Column1"; Expression={"$Array1"}},@{Name="Column2"; Expression={"$Array2"}},@{Name="Column3"; Expression={"$Array3"}} | export-csv $exportLocation -notypeinformation
Then it creates the columns with the heading but each array variable is displayed on one line.
How can I get the output to put each element of the arrays on its own line in the respective column?
You need to convert your 4 arrays into an array of objects with 4 properties. Try this:
$Array1 = @(...)
$Array2 = @(...)
$Array3 = @(...)
$Array4 = @(...)
$len1 = [Math]::Max($Array1.Length, $Array2.Length)
$len2 = [Math]::Max($Array3.Length, $Array4.Length)
$maxlen = [Math]::Max($len1, $len2)
$csv = for ($i=0; $i -lt $maxlen; $i++) {
[PSCustomObject]@{  # the [PSCustomObject] cast keeps Column1..Column4 in order
'Column1' = $Array1[$i];
'Column2' = $Array2[$i];
'Column3' = $Array3[$i];
'Column4' = $Array4[$i];
}
}
$csv | Export-Csv 'C:\path\to\output.csv' -NoTypeInformation
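With around 20 arrays, listing each one by hand gets tedious. A sketch that generalizes the loop above, assuming the arrays are collected in an ordered dictionary whose keys become the column headings (the names and sample values are made up). ConvertTo-Csv is used here only to show the result; swap in the commented Export-Csv line to write the file:

```powershell
# Made-up sample arrays of unequal length; the keys become the CSV column headings
$arrays = [ordered]@{
    Column1 = @('a1', 'a2', 'a3')
    Column2 = @('b1', 'b2')
    Column3 = @('c1')
}

# The longest array decides how many rows the CSV gets
$maxlen = ($arrays.Values | ForEach-Object { @($_).Count } | Measure-Object -Maximum).Maximum

$csv = for ($i = 0; $i -lt $maxlen; $i++) {
    $row = [ordered]@{}
    foreach ($name in $arrays.Keys) {
        # Past the end of a shorter array this yields $null, i.e. an empty cell
        $row[$name] = $arrays[$name][$i]
    }
    [PSCustomObject]$row
}

# $csv | Export-Csv 'C:\path\to\output.csv' -NoTypeInformation
$csv | ConvertTo-Csv -NoTypeInformation
```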

Powershell Multidimensional Arrays

In other languages I can create arrays like this:
$x = "David"
$arr = @()
$arr[$x]["TSHIRTS"]["SIZE"] = "M"
This generates an error.
You are trying to create an associative array (hash). Try the following sequence of commands:
$arr=@{}
$arr["david"] = @{}
$arr["david"]["TSHIRTS"] = @{}
$arr["david"]["TSHIRTS"]["SIZE"] ="M"
$arr.david.tshirts.size
Note the difference between hashes and arrays:
$a = @{} # hash
$a = @() # array
Arrays can only have integers as indexes (in PowerShell, negative indexes count back from the end, e.g. $a[-1] is the last element).
From powershell.com:
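To get close to the $arr[$x]["TSHIRTS"]["SIZE"] = "M" style from the question, a small helper can create the missing inner hashtables for you. Set-NestedValue is a hypothetical function written for this sketch, not a built-in cmdlet, and the COLOR entry is made-up sample data:

```powershell
# Hypothetical helper (not a built-in): walks the keys in $Path,
# creating any missing inner hashtables, then sets the last key to $Value
function Set-NestedValue {
    param(
        [hashtable] $Root,
        [string[]]  $Path,
        $Value
    )
    $node = $Root
    # Every key except the last one names an intermediate hashtable
    for ($i = 0; $i -lt $Path.Count - 1; $i++) {
        $key = $Path[$i]
        if (-not $node.ContainsKey($key)) { $node[$key] = @{} }
        $node = $node[$key]
    }
    $node[$Path[-1]] = $Value
}

$x = "David"
$arr = @{}
Set-NestedValue -Root $arr -Path $x, 'TSHIRTS', 'SIZE' -Value 'M'
Set-NestedValue -Root $arr -Path $x, 'TSHIRTS', 'COLOR' -Value 'Blue'
$arr.david.tshirts.size   # M
```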
PowerShell supports two types of multi-dimensional arrays: jagged arrays and true multidimensional arrays.
Jagged arrays are normal PowerShell arrays that store arrays as elements. This is very cost-effective storage because dimensions can be of different size:
$array1 = 1,2,(1,2,3),3
$array1[0]
$array1[1]
$array1[2]
$array1[2][0]
$array1[2][1]
True multi-dimensional arrays always form a rectangular matrix. To create such an array, you need to use .NET. The next line creates a two-dimensional array with 10 and 20 elements, resembling a 10x20 matrix:
$array2 = New-Object 'object[,]' 10,20
$array2[4,8] = 'Hello'
$array2[9,16] = 'Test'
$array2
For a three-dimensional 10x20x10 array:
$array3 = New-Object 'object[,,]' 10,20,10
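A sketch of how to inspect and walk such an array, reusing the $array2 example from above: Rank gives the number of dimensions, GetLength(n) the size of dimension n, Length the total number of cells, and nested loops visit each [row,column] cell. The name $filled is illustrative:

```powershell
$array2 = New-Object 'object[,]' 10,20
$array2[4,8] = 'Hello'
$array2[9,16] = 'Test'

$array2.Rank          # 2
$array2.GetLength(0)  # 10 (first dimension)
$array2.GetLength(1)  # 20 (second dimension)
$array2.Length        # 200 (total number of cells)

# Visit every cell with nested loops and report the ones that hold a value
$filled = for ($r = 0; $r -lt $array2.GetLength(0); $r++) {
    for ($c = 0; $c -lt $array2.GetLength(1); $c++) {
        if ($null -ne $array2[$r, $c]) { "[$r,$c] = $($array2[$r, $c])" }
    }
}
$filled
```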
To extend on what manojlds said above: you can nest hashtables. It is not a true multi-dimensional array, but it may give you some ideas about how to structure the data. An example:
$hash = @{}
$computers | %{
$hash.Add(($_.Name),(@{
"Status" = ($_.Status)
"Date" = ($_.Date)
}))
}
What's cool about this is that you can reference things like:
($hash."Name1").Status
Also, looking up a key in a hashtable is far faster than searching an array. I use this to compare data rather than matching against arrays:
$hash.ContainsKey("Name1")
Hope some of that helps!
-Adam
Knowing that PowerShell pipes objects between cmdlets, it is more common in PowerShell to use an array of PSCustomObjects:
$arr = @(
[PSCustomObject]@{Name = 'David'; Article = 'TShirt'; Size = 'M'}
[PSCustomObject]@{Name = 'Eduard'; Article = 'Trouwsers'; Size = 'S'}
)
Or for older PowerShell Versions (PSv2):
$arr = @(
New-Object PSObject -Property @{Name = 'David'; Article = 'TShirt'; Size = 'M'}
New-Object PSObject -Property @{Name = 'Eduard'; Article = 'Trouwsers'; Size = 'S'}
)
And filter your selection like this:
$arr | Where {$_.Name -eq 'David' -and $_.Article -eq 'TShirt'} | Select Size
Or with the simplified Where-Object syntax (PowerShell 3.0 and later):
$arr | Where Name -eq 'David' | Where Article -eq 'TShirt' | Select Size
Or (just get the size):
$arr.Where{$_.Name -eq 'David' -and $_.Article -eq 'TShirt'}.Size
Addendum 2020-07-13
Syntax and readability
As mentioned in the comments, using an array of custom objects is more straightforward and saves typing. If you want to take this further, you can even use the ConvertFrom-Csv (or Import-Csv) cmdlet to build the array:
$arr = ConvertFrom-Csv @'
Name,Article,Size
David,TShirt,M
Eduard,Trouwsers,S
'@
Or more readable:
$arr = ConvertFrom-Csv @'
Name, Article, Size
David, TShirt, M
Eduard, Trouwsers, S
'@
Note: values that contain spaces or special characters need to be double quoted
Or use an external cmdlet like ConvertFrom-SourceTable which reads fixed width table formats:
$arr = ConvertFrom-SourceTable '
Name Article Size
David TShirt M
Eduard Trouwsers S
'
Indexing
The disadvantage of using an array of custom objects is that searching it is slower than looking up a key in a hash table, which uses hashing rather than a linear scan.
Note that the advantage of using an array of custom objects is that you can easily search on anything else, e.g. everybody who wears a TShirt in size M:
$arr | Where Article -eq 'TShirt' | Where Size -eq 'M' | Select Name
To build a hash table index from the array of objects:
$h = @{}
$arr | ForEach-Object {
If (!$h.ContainsKey($_.Name)) { $h[$_.Name] = @{} }
If (!$h[$_.Name].ContainsKey($_.Article)) { $h[$_.Name][$_.Article] = @{} }
$h[$_.Name][$_.Article] = $_ # Or: $h[$_.Name][$_.Article]['Size'] = $_.Size
}
$h.david.tshirt.size
M
Note: with Set-StrictMode enabled, referencing a hash table key that doesn't exist will cause an error:
Set-StrictMode -Version 2
$h.John.tshirt.size
PropertyNotFoundException: The property 'John' cannot be found on this object. Verify that the property exists.
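A way to look up entries without tripping strict mode, as a sketch: ContainsKey() is a method call rather than a property access, so it can check each level of the index before you drill down. The sample data repeats the $arr from above; 'John' is a made-up name that is not in the index, and $report is an illustrative variable:

```powershell
Set-StrictMode -Version 2

$arr = @(
    [PSCustomObject]@{ Name = 'David';  Article = 'TShirt';    Size = 'M' }
    [PSCustomObject]@{ Name = 'Eduard'; Article = 'Trouwsers'; Size = 'S' }
)

# Same two-level index as above: Name -> Article -> object
$h = @{}
foreach ($item in $arr) {
    if (-not $h.ContainsKey($item.Name)) { $h[$item.Name] = @{} }
    $h[$item.Name][$item.Article] = $item
}

# Check each level with ContainsKey() before drilling down
$report = foreach ($name in 'David', 'John') {
    if ($h.ContainsKey($name) -and $h[$name].ContainsKey('tshirt')) {
        "$name wears size $($h[$name]['tshirt'].Size)"
    } else {
        "$name has no TShirt entry"
    }
}
$report
```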
Here is a simple multidimensional array of strings.
$psarray = @(
('Line' ,'One' ),
('Line' ,'Two')
)
foreach($item in $psarray)
{
$item[0]
$item[1]
}
Output:
Line
One
Line
Two
Two-dimensional arrays can also be defined as jagged arrays this way:
$array = New-Object system.Array[][] 5,5
This has the nice feature that
$array[0]
outputs a one-dimensional array, containing $array[0][0] to $array[0][4].
Depending on your situation you might prefer it over $array = New-Object 'object[,]' 5,5.
(I would have commented to CB above, but stackoverflow does not let me yet)
You could also use System.Collections.ArrayList to make an array of arrays, or whatever you want.
Here is an example:
$resultsArray= New-Object System.Collections.ArrayList
[void] $resultsArray.Add(@(@('$hello'),2,0,0,0,0,0,0,1,1))
[void] $resultsArray.Add(@(@('$test', '$testagain'),3,0,0,1,0,0,0,1,2))
[void] $resultsArray.Add("ERROR")
[void] $resultsArray.Add(@(@('$var', '$result'),5,1,1,0,1,1,0,2,3))
[void] $resultsArray.Add(@(@('$num', '$number'),3,0,0,0,0,0,1,1,2))
One drawback, if you would call it that, is that you cannot set a size limit. Also, you need to cast the .Add() calls to [void]; otherwise each call outputs the index of the added item.
Using the .NET syntax (as CB pointed out above) also adds type coherence to your 'tabular' array: if you define a typed array and try to store a value that does not fit the type, PowerShell will 'alert' you:
$a = New-Object 'byte[,]' 4,4
$a[0,0] = 111  # OK
$a[0,1] = 1111 # Error: 1111 does not fit in a byte (0-255)
Of course, PowerShell will 'help' you with the obvious conversions:
$a = New-Object 'string[,]' 2,2
$a[0,0] = "1111" # OK
$a[0,1] = 111    # OK too: 111 is converted to the string "111"
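To see the difference at run time, a short sketch that catches the failed conversion instead of letting it stop the script ($rejected and $s are illustrative names):

```powershell
$a = New-Object 'byte[,]' 4,4
$a[0,0] = 111          # fits in a byte (0-255)
$rejected = $false
try {
    $a[0,1] = 1111     # does not fit in a byte, so the assignment throws
} catch {
    $rejected = $true
}
"byte cell [0,1] rejected: $rejected"

$s = New-Object 'string[,]' 2,2
$s[0,1] = 111          # the integer is converted to a string
$s[0,1].GetType().Name
```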
Another thread pointed here about how to add to a multidimensional array in Powershell. I don't know if there is some reason not to use this method, but it worked for my purposes.
$array = @()
$array += ,@( "1", "test1","a" )
$array += ,@( "2", "test2", "b" )
$array += ,@( "3", "test3", "c" )
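The leading comma is what makes this work: it is PowerShell's unary comma operator, which wraps the inner array in a one-element array so += appends it as a single element instead of splicing in its items. A sketch contrasting the two (variable names are illustrative), plus a generic List, which avoids the copy that += makes on every append:

```powershell
# Without the leading comma, += appends the three strings as separate elements
$flat = @()
$flat += @('1', 'test1', 'a')
$flat.Count        # 3

# With the unary comma, the whole inner array is appended as ONE element
$nested = @()
$nested += ,@('1', 'test1', 'a')
$nested += ,@('2', 'test2', 'b')
$nested.Count      # 2
$nested[1][1]      # test2

# A List grows in place; Add() takes the array as a single item, no comma needed
$list = [System.Collections.Generic.List[object]]::new()
$list.Add(@('3', 'test3', 'c'))
$list[0][2]        # c
```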
I found a pretty cool solution for making arrays inside an array:
$GroupArray = @()
foreach ( $Array in $ArrayList ){
$GroupArray += @($Array , $null)
}
$GroupArray = $GroupArray | Where-Object {$_ -ne $null}
Borrowed from the answer above:
$arr = ConvertFrom-Csv @'
Name,Article,Size
David,TShirt,M
Eduard,Trouwsers,S
'@
Print $arr:
$arr
Name Article Size
---- ------- ----
David TShirt M
Eduard Trouwsers S
Now select 'David'
$arr.Where({$_.Name -eq "david"})
Name Article Size
---- ------- ----
David TShirt M
Now if you want to know the Size of 'David'
$arr.Where({$_.Name -eq "david"}).size
M