I have two large arrays. One is an array (call it Array1) of 100,000 PSCustomObjects, each of which has a property called "Token". And the other array is simply an array of strings, the size of this second array being 2500 elements.
The challenge is that EVERY element of Array1 needs to be checked against all the elements in Array2 and tagged accordingly, i.e., if the Token value from Array1 matches any of the elements from Array2, label it as "Match found!"
Looping through would actually make it extremely slow. Is there a better way to do this?
P.S.: The items in Array1 have an ordinal number property as well, and the array is sorted in that order.
Here is the code:
$Array1 = @()
$Array2 = @()
# Sample object:
$obj = New-Object -TypeName PSCustomObject
$obj | Add-Member -MemberType NoteProperty -Name Token -Value "SOMEVALUEHERE"
$obj | Add-Member -MemberType NoteProperty -Name TokenOrdinalNum -Value 1
$Array1 += $obj # This array has 100K such objects
$Array2 = @("VAL1", "SOMEVALUEHERE", ......) # Array2 has 2500 such strings.
The output of this would need to be a new array of objects, say 'ArrayFinal', that has an additional noteproperty called 'MatchFound'.
Please help.
I would create a Hashtable for fast lookups from the values in your $Array2.
For clarity, I have renamed $Array1 and $Array2 into $objects and $tokens.
# the object array
$objects = [PsCustomObject]@{ Token = 'SOMEVALUEHERE'; TokenOrdinalNum = 1 },
           [PsCustomObject]@{ Token = 'VAL1'; TokenOrdinalNum = 123 },
           [PsCustomObject]@{ Token = 'SomeOtherValue'; TokenOrdinalNum = 555 } # etcetera
# the array with token keywords to check
$tokens = 'VAL1', 'SOMEVALUEHERE', 'ShouldNotFindThis' # etcetera
# create a lookup Hashtable from the array of token values for FAST lookup
# you can also use a HashSet ([System.Collections.Generic.HashSet[string]]::new())
# see https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.hashset-1
$lookup = @{}
$tokens | ForEach-Object { $lookup[$_] = $true } # it's only the Keys that matter, the value is not important
# now loop over the objects in the first array and check their 'Token' values
$ArrayFinal = foreach ($obj in $objects) {
    $obj | Select-Object *, @{Name = 'MatchFound'; Expression = { $lookup.ContainsKey($obj.Token) }}
}
# output on screen
$ArrayFinal | Format-Table -AutoSize
# write to Csv ?
$ArrayFinal | Export-Csv -Path 'Path\To\MatchedObjects.csv' -NoTypeInformation
Output:
Token          TokenOrdinalNum MatchFound
-----          --------------- ----------
SOMEVALUEHERE                1       True
VAL1                       123       True
SomeOtherValue             555      False
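The HashSet alternative mentioned in the code comments above could look roughly like the sketch below. The variable name $tokenSet, the case-insensitive comparer (chosen to mirror the default behaviour of a PowerShell hashtable) and the explicit property list are assumptions for illustration, not part of the original answer. It needs PowerShell 5 or later for ::new().

```powershell
# Sample data, same shape as in the answer above
$objects = [PSCustomObject]@{ Token = 'SOMEVALUEHERE'; TokenOrdinalNum = 1 },
           [PSCustomObject]@{ Token = 'VAL1'; TokenOrdinalNum = 123 },
           [PSCustomObject]@{ Token = 'SomeOtherValue'; TokenOrdinalNum = 555 }
$tokens  = 'VAL1', 'SOMEVALUEHERE', 'ShouldNotFindThis'

# Build the set once; lookups ignore case (an assumption, see above)
$tokenSet = [System.Collections.Generic.HashSet[string]]::new(
    [string[]]$tokens, [System.StringComparer]::OrdinalIgnoreCase)

# Emit a new object per input object, with the extra MatchFound property
$ArrayFinal = foreach ($obj in $objects) {
    [PSCustomObject]@{
        Token           = $obj.Token
        TokenOrdinalNum = $obj.TokenOrdinalNum
        MatchFound      = $tokenSet.Contains($obj.Token)
    }
}
```

Building the output object directly avoids a Select-Object call per item, at the cost of having to list the property names explicitly.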
100kb (102,400) objects isn't too many. Here's an example using Compare-Object. By default it checks every object against every other object (919 ms). EDIT: OK, if I change the order of $b, it takes much longer (13 min). Sorting both lists first should work well if most of the positions end up the same (1.99 s with Measure-Command). If every item is off by one position, it will still take a long time ($b = 1,$b).
$a = foreach ($i in 1..100kb) { [pscustomobject]@{token = get-random} }
$a = $a | sort-object token
$b = $a.token | sort-object
compare-object $a.token $b -IncludeEqual
InputObject SideIndicator
----------- -------------
1507400001 ==
120471924 ==
28523825 ==
...
I am trying to find the most efficient way of sifting out any duplicates in a large hash table which consists of almost 5k objects.
I am running all of this in PowerShell. I have a large hash table which consists, in essence, of users and subscription names:
1. User_id | Sub_name
2. User_id | Sub_name
etc...
In most cases, there are 5+ lines for each User_id as each new line represents a subscription name that user is subscribed to.
What I need to do is this: Identify any duplicate subscriptions for each user. For example
1. mm1234 | sub_1
2. mm1234 | sub_4
3. mm1234 | sub_1
4. mm9999 | sub_1
5. mm9999 | sub_2
6. mm8888 | sub_1
7. mm8888 | sub_1
So, in the above example, I would need to remove lines 3 & 7. Now, currently there is no actual grouping in terms of how users are grouped in the hash, they are just shoveled in. I'm wondering if it is possible to do it from the final product hash like seen above. Thoughts?
Maybe this can help.
If your large hash looks similar to this:
$hash = @{
    '1' = @{ 'user_uuid' = 'mm1234'; 'lob' = 'subscription_1' }
    '2' = @{ 'user_uuid' = 'mm5678'; 'lob' = 'subscription_1' }
    '3' = @{ 'user_uuid' = 'mm1234'; 'lob' = 'subscription_2' }
    '4' = @{ 'user_uuid' = 'mm5678'; 'lob' = 'subscription_5' }
    '5' = @{ 'user_uuid' = 'mm1234'; 'lob' = 'subscription_3' }
    '6' = @{ 'user_uuid' = 'mm1478'; 'lob' = 'subscription_1' }
}
You could create a new result hash where the keys are the user_uuid's and the values are arrays of uniquely sorted subscriptions (or lob as you call them)
$result = @{}
$hash.Keys | ForEach-Object {
    $uid = $hash.$_.user_uuid
    $value = $hash.$_.lob
    if ($result.ContainsKey($uid)) {
        # add to the subscriptions array for this user_uuid;
        # the outer @() keeps the value an array even when only one unique item remains
        $result[$uid] = @(($result[$uid] + $value) | Sort-Object -Unique)
    }
    else {
        # create an element for this user_uuid and make sure the value is an array
        $result[$uid] = @($value)
    }
}
The resulting Hashtable will have this content:
Name                           Value
----                           -----
mm1234                         {subscription_1, subscription_2, subscription_3}
mm1478                         {subscription_1}
mm5678                         {subscription_1, subscription_5}
If you need to convert this back into the format of the original $hash (a hash of hashes), you can do something like this:
# recreate the large hash using the deduped values
$newHash = @{}
$count = 1
$result.Keys | ForEach-Object {
    foreach ($value in $result.$_) {
        $newHash[$count++] = @{ 'user_uuid' = $_; 'lob' = $value }
    }
}
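If the goal is only to drop the duplicate lines while keeping the original hash-of-hashes shape, a single pass that remembers every user/subscription pair already seen may be enough. The sketch below uses the sample data from the question; the names $seen, $deduped and $pair, and the '|' separator (assumed not to occur in the values), are illustrative assumptions. Note that the default HashSet comparison is case-sensitive.

```powershell
# Sample data from the question, keyed by line number
$hash = @{
    '1' = @{ user_uuid = 'mm1234'; lob = 'sub_1' }
    '2' = @{ user_uuid = 'mm1234'; lob = 'sub_4' }
    '3' = @{ user_uuid = 'mm1234'; lob = 'sub_1' }
    '4' = @{ user_uuid = 'mm9999'; lob = 'sub_1' }
    '5' = @{ user_uuid = 'mm9999'; lob = 'sub_2' }
    '6' = @{ user_uuid = 'mm8888'; lob = 'sub_1' }
    '7' = @{ user_uuid = 'mm8888'; lob = 'sub_1' }
}

$seen    = [System.Collections.Generic.HashSet[string]]::new()
$deduped = @{}

# Walk the keys in numeric order so the first occurrence of each pair is kept
foreach ($key in ($hash.Keys | Sort-Object { [int]$_ })) {
    $entry = $hash[$key]
    $pair  = '{0}|{1}' -f $entry.user_uuid, $entry.lob
    # HashSet.Add() returns $false when the pair is already present
    if ($seen.Add($pair)) { $deduped[$key] = $entry }
}
```

With this input, $deduped keeps lines 1, 2, 4, 5 and 6 and drops lines 3 and 7.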
A while ago I changed my Join-Object cmdlet, which appears to have introduced a bug that did not show up in any of my testing.
The objective of the change was mainly to minimize the code and to improve performance by preparing a custom PSObject and reusing it in the pipeline.
As the Join-Object cmdlet is rather complex, I have created a simplified cmdlet to show the specific issue:
(The PowerShell version is: 5.1.16299.248)
Function Test($Count) {
    $PSObject = New-Object PSObject -Property @{Name = $Null; Value = $Null}
    For ($i = 1; $i -le $Count; $i++) {
        $PSObject.Name = "Name$i"; $PSObject.Value = $i
        $PSObject
    }
}
Directly testing the output gives exactly what I expected:
Test 3 | ft
Value Name
----- ----
1 Name1
2 Name2
3 Name3
I presumed that it shouldn't matter whether I assign the result to a variable (e.g. $a) or not, but it does:
$a = Test 3
$a | ft
Value Name
----- ----
3 Name3
3 Name3
3 Name3
So, apart from sharing this experience, I wonder whether this is a programming flaw or a PowerShell bug/quirk?
Your original approach is indeed conceptually flawed in that you're outputting the same object multiple times, iteratively modifying its properties.
The discrepancy in output is explained by the pipeline's item-by-item processing:
Outputting to the console (via ft / Format-Table) prints the then-current state of $PSObject in each iteration, which gives the appearance that everything is fine.
Capturing in a variable, by contrast, reflects $PSObject's state after all iterations have completed, at which point it contains only the last iteration's values, Name3 and 3.
You can verify that output array $a indeed references the very same custom object three times as follows:
[object]::ReferenceEquals($a[0], $a[1]) # $True
[object]::ReferenceEquals($a[1], $a[2]) # $True
The solution is therefore to create a distinct [pscustomobject] instance in each iteration:
PSv3+ offers syntactic sugar for creating custom objects: you can cast a hashtable (literal) to [pscustomobject]. Since this also creates a new instance every time, you can use it to simplify your function:
Function Test($Count) {
    For ($i = 1; $i -le $Count; $i++) {
        [pscustomobject] @{ Name = "Name$i"; Value = $i }
    }
}
Here's your own PSv2-compatible solution:
Function Test($Count) {
    $Properties = @{}
    For ($i = 1; $i -le $Count; $i++) {
        $Properties.Name = "Name$i"; $Properties.Value = $i
        New-Object PSObject -Property $Properties
    }
}
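If the original intent was to prepare an object once and reuse its shape, another option is to keep a template and emit a shallow copy of it in each iteration. This is only a sketch under that assumption: Test-Copy, $template and $item are hypothetical names, and PSObject.Copy() copies the properties without deep-cloning any reference-type values they hold.

```powershell
function Test-Copy($Count) {
    # Prepared once, never emitted itself
    $template = [pscustomobject]@{ Name = $null; Value = $null }
    for ($i = 1; $i -le $Count; $i++) {
        # A distinct instance per iteration, so earlier output is not overwritten
        $item = $template.PSObject.Copy()
        $item.Name = "Name$i"; $item.Value = $i
        $item
    }
}

$a = Test-Copy 3
[object]::ReferenceEquals($a[0], $a[1])   # distinct instances this time
```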
In other languages I can work with arrays like this:
$x = "David"
$arr = @()
$arr[$x]["TSHIRTS"]["SIZE"] = "M"
This generates an error.
You are trying to create an associative array (hash table). Try the following sequence of commands:
$arr = @{}
$arr["david"] = @{}
$arr["david"]["TSHIRTS"] = @{}
$arr["david"]["TSHIRTS"]["SIZE"] = "M"
$arr.david.tshirts.size
Note the difference between hashes and arrays:
$a = @{} # hash
$a = @() # array
Arrays can only be indexed by integers (a negative index counts back from the end).
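Since every intermediate level has to exist before the innermost value can be assigned, a small helper can create the missing levels on the way down. The function Set-NestedValue and its parameter names are hypothetical, written only to illustrate the idea, and it assumes that any existing intermediate value is itself a hashtable.

```powershell
function Set-NestedValue {
    param([hashtable]$Table, [string[]]$Path, $Value)
    $node = $Table
    # Walk all keys except the last, creating empty hashtables where needed
    for ($i = 0; $i -lt $Path.Count - 1; $i++) {
        $key = $Path[$i]
        if (-not $node.ContainsKey($key)) { $node[$key] = @{} }
        $node = $node[$key]
    }
    # Assign the value under the final key
    $node[$Path[-1]] = $Value
}

$arr = @{}
Set-NestedValue -Table $arr -Path 'David', 'TSHIRTS', 'SIZE' -Value 'M'
$arr.David.TSHIRTS.SIZE
```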
from powershell.com:
PowerShell supports two types of multi-dimensional arrays: jagged arrays and true multidimensional arrays.
Jagged arrays are normal PowerShell arrays that store arrays as elements. This is very cost-effective storage because dimensions can be of different size:
$array1 = 1,2,(1,2,3),3
$array1[0]
$array1[1]
$array1[2]
$array1[2][0]
$array1[2][1]
True multi-dimensional arrays always resemble a square matrix. To create such an array, you will need to access .NET. The next line creates a two-dimensional array with 10 and 20 elements resembling a 10x20 matrix:
$array2 = New-Object 'object[,]' 10,20
$array2[4,8] = 'Hello'
$array2[9,16] = 'Test'
$array2
For a 3-dimensional array 10*20*10:
$array3 = New-Object 'object[,,]' 10,20,10
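To walk over a true multidimensional array, the size of each dimension can be read with GetLength(). The following is a small sketch; the variable names and the int element type are chosen only for the example.

```powershell
# A 2 x 3 array of integers
$grid = New-Object 'int[,]' 2,3

# GetLength(0) is the number of rows, GetLength(1) the number of columns
for ($r = 0; $r -lt $grid.GetLength(0); $r++) {
    for ($c = 0; $c -lt $grid.GetLength(1); $c++) {
        $grid[$r,$c] = $r * 10 + $c
    }
}

$grid.Rank     # number of dimensions: 2
$grid.Length   # total number of elements: 6
$grid[1,2]     # 12
```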
To extend what manojlds said above: you can nest hashtables. It may not be a true multi-dimensional array, but it may give you some ideas about how to structure the data. An example:
$hash = @{}
$computers | %{
    $hash.Add(($_.Name),(@{
        "Status" = ($_.Status)
        "Date" = ($_.Date)
    }))
}
What's cool about this is that you can reference things like:
($hash."Name1").Status
Also, it is far faster than arrays for finding stuff. I use this to compare data rather than use matching in Arrays.
$hash.ContainsKey("Name1")
Hope some of that helps!
-Adam
Knowing that PowerShell pipes objects between cmdlets, it is more common in PowerShell to use an array of PSCustomObjects:
$arr = @(
    [PSCustomObject]@{Name = 'David'; Article = 'TShirt'; Size = 'M'}
    [PSCustomObject]@{Name = 'Eduard'; Article = 'Trouwsers'; Size = 'S'}
)
Or for older PowerShell Versions (PSv2):
$arr = @(
    New-Object PSObject -Property @{Name = 'David'; Article = 'TShirt'; Size = 'M'}
    New-Object PSObject -Property @{Name = 'Eduard'; Article = 'Trouwsers'; Size = 'S'}
)
And grep your selection like:
$arr | Where {$_.Name -eq 'David' -and $_.Article -eq 'TShirt'} | Select Size
Or, using the simplified Where-Object syntax (available since PowerShell 3.0):
$arr | Where Name -eq 'David' | Where Article -eq 'TShirt' | Select Size
Or (just get the size):
$arr.Where{$_.Name -eq 'David' -and $_.Article -eq 'TShirt'}.Size
Addendum 2020-07-13
Syntax and readability
As mentioned in the comments, using an array of custom objects is more straightforward and saves typing. If you want to take this further, you might even use the ConvertFrom-Csv (or the Import-Csv) cmdlet for building the array:
$arr = ConvertFrom-Csv @'
Name,Article,Size
David,TShirt,M
Eduard,Trouwsers,S
'@
Or more readable:
$arr = ConvertFrom-Csv @'
Name,   Article,   Size
David,  TShirt,    M
Eduard, Trouwsers, S
'@
Note: values that contain spaces or special characters need to be double quoted
Or use an external cmdlet like ConvertFrom-SourceTable which reads fixed width table formats:
$arr = ConvertFrom-SourceTable '
Name   Article   Size
David  TShirt    M
Eduard Trouwsers S
'
Indexing
The disadvantage of using an array of custom objects is that lookups are slower than with a hash table, which finds a key by its hash instead of scanning every element.
Note that the advantage of using an array of custom objects is that you can easily search for anything else, e.g. everybody who wears a TShirt with size M:
$arr | Where Article -eq 'TShirt' | Where Size -eq 'M' | Select Name
To build a hash table index from the array of objects:
$h = @{}
$arr | ForEach-Object {
    If (!$h.ContainsKey($_.Name)) { $h[$_.Name] = @{} }
    If (!$h[$_.Name].ContainsKey($_.Article)) { $h[$_.Name][$_.Article] = @{} }
    $h[$_.Name][$_.Article] = $_ # Or: $h[$_.Name][$_.Article]['Size'] = $_.Size
}
$h.david.tshirt.size
M
Note: referencing a hash table key that doesn't exist in Set-StrictMode will cause an error:
Set-StrictMode -Version 2
$h.John.tshirt.size
PropertyNotFoundException: The property 'John' cannot be found on this object. Verify that the property exists.
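One way to avoid that error is to test each level with ContainsKey() before dereferencing it. The sketch below assumes an index shaped like $h above, with a custom object at the innermost level; Get-IndexedSize is a hypothetical helper name.

```powershell
Set-StrictMode -Version 2

# Same shape as the index built above: Name -> Article -> object
$h = @{
    David = @{ TShirt = [pscustomobject]@{ Name = 'David'; Article = 'TShirt'; Size = 'M' } }
}

function Get-IndexedSize($Index, $Name, $Article) {
    # Only dereference a level once its key is known to exist
    if ($Index.ContainsKey($Name) -and $Index[$Name].ContainsKey($Article)) {
        $Index[$Name][$Article].Size
    }
}

$found   = Get-IndexedSize $h 'David' 'TShirt'   # 'M'
$missing = Get-IndexedSize $h 'John'  'TShirt'   # nothing returned, no exception
```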
Here is a simple multidimensional array of strings.
$psarray = @(
    ('Line' ,'One' ),
    ('Line' ,'Two')
)
foreach($item in $psarray)
{
$item[0]
$item[1]
}
Output:
Line
One
Line
Two
Two-dimensional arrays can also be defined as jagged arrays:
$array = New-Object system.Array[][] 5,5
This has the nice feature that
$array[0]
outputs a one-dimensional array, containing $array[0][0] to $array[0][4].
Depending on your situation you might prefer it over $array = New-Object 'object[,]' 5,5.
(I would have commented to CB above, but stackoverflow does not let me yet)
You could also use System.Collections.ArrayList to make an array of arrays, or whatever you want.
Here is an example:
$resultsArray= New-Object System.Collections.ArrayList
[void] $resultsArray.Add(@(@('$hello'),2,0,0,0,0,0,0,1,1))
[void] $resultsArray.Add(@(@('$test', '$testagain'),3,0,0,1,0,0,0,1,2))
[void] $resultsArray.Add("ERROR")
[void] $resultsArray.Add(@(@('$var', '$result'),5,1,1,0,1,1,0,2,3))
[void] $resultsArray.Add(@(@('$num', '$number'),3,0,0,0,0,0,1,1,2))
One problem, if you would call it a problem, is that you cannot set a size limit. Also, Add() returns the index of the new element, so cast the call to [void] to keep that number out of the script's output.
Using the .NET syntax (as CB pointed out above) also adds type consistency to your 'tabular' array. If you define a typed array and try to store a value of a different type, PowerShell will 'alert' you:
$a = New-Object 'byte[,]' 4,4
$a[0,0] = 111   # OK
$a[0,1] = 1111  # Error: value does not fit in a byte
Of course, PowerShell will 'help' you with the obvious conversions:
$a = New-Object 'string[,]' 2,2
$a[0,0] = "1111"  # OK
$a[0,1] = 111     # OK also (converted to a string)
Another thread pointed here about how to add to a multidimensional array in Powershell. I don't know if there is some reason not to use this method, but it worked for my purposes.
$array = @()
$array += ,@( "1", "test1","a" )
$array += ,@( "2", "test2", "b" )
$array += ,@( "3", "test3", "c" )
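One possible reason to avoid += for large collections is that each += on an array creates a new array and copies the existing elements into it. A generic list grows in place and stores each row as a single element without needing the leading comma. The sketch below is an alternative under that assumption, not the original poster's method; $rows is an illustrative name and ::new() needs PowerShell 5 or later.

```powershell
$rows = [System.Collections.Generic.List[object]]::new()

# Each Add() stores the whole inner array as one element
$rows.Add(@('1', 'test1', 'a'))
$rows.Add(@('2', 'test2', 'b'))
$rows.Add(@('3', 'test3', 'c'))

$rows.Count    # 3
$rows[1][1]    # test2
```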
I found a pretty neat solution for building an array of arrays:
$GroupArray = @()
foreach ( $Array in $ArrayList ){
    $GroupArray += @($Array , $null)
}
$GroupArray = $GroupArray | Where-Object {$_ -ne $null}
Borrowing from above:
$arr = ConvertFrom-Csv @'
Name,Article,Size
David,TShirt,M
Eduard,Trouwsers,S
'@
Print the $arr:
$arr
Name Article Size
---- ------- ----
David TShirt M
Eduard Trouwsers S
Now select 'David'
$arr.Where({$_.Name -eq "david"})
Name Article Size
---- ------- ----
David TShirt M
Now if you want to know the Size of 'David'
$arr.Where({$_.Name -eq "david"}).size
M
Does the latest version of Powershell have the ability to do something like JavaScript's:
var point = new Object();
point.x = 12;
point.y = 50;
If not, what is the equivalent or workaround?
UPDATE
Read all comments
The syntax is not directly supported, but the functionality is available via the Add-Member cmdlet. A while ago, I wrapped this functionality in a general-purpose tuple function.
This lets you create these objects in one line.
$point = New-Tuple "x",12,"y",50
Here is the code for New-Tuple
function New-Tuple()
{
param ( [object[]]$list= $(throw "Please specify the list of names and values") )
$tuple = new-object psobject
for ( $i= 0 ; $i -lt $list.Length; $i = $i+2)
{
$name = [string]($list[$i])
$value = $list[$i+1]
$tuple | add-member NoteProperty $name $value
}
return $tuple
}
Blog Post on the subject: http://blogs.msdn.com/jaredpar/archive/2007/11/29/tuples-in-powershell.aspx#comments
There are two simple ways. The first is a hashtable (available in V1):
$obj = @{}
$obj.x = 1
$obj.y = 2
The second is a PSObject (easier in V2):
$obj = new-object psobject -property @{x = 1; y = 2}
It gives you roughly the same object, but psobjects are nicer if you want to sort/group/format/export them
Sorry, even though the selected answer is good, I couldn't resist the hacky one line answer:
New-Object PsObject | Select-Object x,y | %{$_.x = 12; $_.y = 50; $foo = $_; }
You can do it like this:
$point = New-Object Object |
Add-Member NoteProperty x ([int] 12) -passThru |
Add-Member NoteProperty y ([int] 15) -passThru
Regarding one of your comments elsewhere, custom objects may be more useful than hash tables because they work better with cmdlets that expect objects to have named properties. For example:
$mypoints | Sort-Object y # mypoints sorted by y-value
$point = "" | Select @{Name='x'; Expression={12}} ,@{Name='y'; Expression={15}}
or more intuitively
$point = "" | Select x,y
$point.x=12; $point.y=15
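In PowerShell 3.0 and later, the hashtable-to-[pscustomobject] cast is probably the closest match to the JavaScript snippet in the question. A minimal sketch follows; the property z is added only to show how a property can be attached afterwards.

```powershell
# Create the object with its initial properties in one expression
$point = [pscustomobject]@{ x = 12; y = 50 }

# Existing properties can be reassigned directly
$point.x = 13

# A new property has to be attached with Add-Member
$point | Add-Member -NotePropertyName z -NotePropertyValue 7
```

Unlike JavaScript, assigning to a property that does not yet exist on a custom object (for example $point.w = 1) raises an error, which is why Add-Member is used for z.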