Powershell compare items for order when some are optional

I have some XML with a number of <task> nodes that can contain a combination of four child nodes in this order: <rules>, <preprocess>, <process>, <postprocess>.
<process> is mandatory, but the other three are optional. I need to validate this XML before using it to instantiate my Task object, and I can't use XSD because XSD 1.0 doesn't support some of the other things I have going on in the XML.
My thinking is this. I can convert the node names to a list $providedData, and also have a list $requiredOrder with the four node names in the required order, then duplicate that as a list $workingOrder. Loop through the requiredOrder and any item that isn't in $providedData is removed from $workingOrder. Now I have $workingOrder with the same items as $providedData, but in the order defined by $requiredOrder. Now a comparison tells me if $providedData is correctly ordered. So...
$requiredOrder = @('rules', 'preprocess', 'process', 'postprocess')
$providedData = @('preprocess', 'process')
CLS
$workingOrder = [System.Collections.Generic.List[String]]::new()
$workingOrder.AddRange([System.Collections.Generic.List[String]]$requiredOrder)
$providedOrder = [System.Collections.Generic.List[String]]::new()
$providedOrder.AddRange([System.Collections.Generic.List[String]]$providedData)
foreach ($item in $requiredOrder) {
    if ($providedOrder -notContains $item) {
        $workingOrder.Remove($item) > $null
    }
}
if (Compare-Object -ReferenceObject $workingOrder -DifferenceObject $providedOrder) {
    Write-Host "Correct"
} else {
    Write-Host "Incorrect"
}
I know I can't use the -eq operator, but I thought Compare-Object would work here. No dice, I get Incorrect. But if I just dump $workingOrder and $providedOrder to the console, they are (visually) the same.
So, two questions:
1: What am I doing wrong in my comparison here?
2: Is there a much better way to do this?
Interesting...
if (($workingOrder -join ',') -eq ($providedOrder -join ',')) works.
I would still like to know if there is a better way, or a way to get Compare-Object to work. But I can proceed with this for now.

To compare whether two same-typed collections are equal, both in content and order, I like to use Enumerable.SequenceEqual():
function Test-NodeOrder
{
    param([string[]]$Nodes)
    $requiredOrder = @('rules', 'preprocess', 'process', 'postprocess')
    $mandatory = @('process')
    $matchingNodes = $Nodes.Where({$_ -in $requiredOrder})
    if($missing = $mandatory.Where({$_ -notin $matchingNodes})){
        Write-Warning "The following mandatory nodes are missing: [$($missing -join ', ')]"
        return $false
    }
    $orderedNodes = $requiredOrder.Where({$_ -in $matchingNodes})
    if(-not [System.Linq.Enumerable]::SequenceEqual([string[]]$matchingNodes, [string[]]$orderedNodes)){
        Write-Warning "Wrong order provided - expected [$($orderedNodes -join ', ')] but got [$($matchingNodes -join ', ')]"
        return $false
    }
    return $true
}
Output:
PS C:\> $providedData = @('preprocess', 'process')
PS C:\> Test-NodeOrder $providedData
True
PS C:\> $providedData = @('preprocess')
PS C:\> Test-NodeOrder $providedData
WARNING: The following mandatory nodes are missing: [process]
False
PS C:\> $providedData = @('preprocess', 'process', 'rules')
PS C:\> Test-NodeOrder $providedData
WARNING: Wrong order provided - expected [rules, preprocess, process] but got [preprocess, process, rules]
False

There are two problems with your code:
You need to invert your success-test logic when calling Compare-Object.
In order to compare two arrays for not only containing the same elements, but also in the same order (sequence equality), you need to use -SyncWindow 0.
Therefore:
if (-not (Compare-Object -SyncWindow 0 $workingOrder $providedOrder)) {
    'Correct'
} else {
    'Incorrect'
}
As for the success-test logic:
Compare-Object's output doesn't indicate success of the comparison; instead, it outputs the objects that differ.
Given PowerShell's implicit to-Boolean conversion, using a Compare-Object call directly as an if conditional typically means: if there are differences, the conditional evaluates to $true, and vice versa.
Since Compare-Object with -SyncWindow 0 outputs at least two difference objects (one pair for each array position that doesn't match) and since a 2+-element array is always $true when coerced to a [bool], you can simply apply the -not operator on the result, which reports $true if the Compare-Object call had no output (implying the arrays were the same), and $false otherwise.
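For illustration, here is a minimal sketch of that to-Boolean logic (the sample arrays are mine, not from the question):
# Differences exist -> Compare-Object emits difference objects -> coerces to $true:
[bool] (Compare-Object -SyncWindow 0 @('a', 'b') @('b', 'a'))   # $true
# Sequences are identical -> no output -> coerces to $false, so -not (...) reports $true:
[bool] (Compare-Object -SyncWindow 0 @('a', 'b') @('a', 'b'))   # $false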
Optional reading: Performance comparison between Compare-Object -SyncWindow 0 and [System.Linq.Enumerable]::SequenceEqual():
Mathias R. Jessen's helpful answer shows a LINQ-based alternative for sequence-equality testing based on the System.Linq.Enumerable.SequenceEqual method, which generally performs much better than Compare-Object -SyncWindow 0, though for occasional invocations with smallish arrays that may not matter.
The following performance tests illustrate this, based on averaging 10 runs with 1,000-element arrays.
The absolute timings, measured on a macOS 10.15.7 system with PowerShell 7.1, will vary based on many factors, but the Factor column should give a sense of relative performance.
Note that the Compare-Object -SyncWindow 0 call is fastest only on the very first invocation in a session; after [System.Linq.Enumerable]::SequenceEqual() has been called once in a session, calling it is internally optimized and becomes much faster than the Compare-Object calls.
That is, if you simply re-run the tests in a session, [System.Linq.Enumerable]::SequenceEqual() will be the fastest method by far, along the lines of the 2nd group of results below:
--- 1,000 elements: ALL-positions-different case:
Factor Secs (1-run avg.) Command TimeSpan
------ ----------------- ------- --------
1.00 0.006 -not (Compare-Object $a1 $a2 -SyncWindow 0 | Select-Object -first 1) 00:00:00.0060075
1.59 0.010 [Linq.Enumerable]::SequenceEqual($a1, $a2) 00:00:00.0095582
3.78 0.023 -not (Compare-Object $a1 $a2 -SyncWindow 0) 00:00:00.0227288
--- 1,000 elements: 1-position-different-only case (Note: on first run in a session, the LINQ method is now compiled and is much faster):
Factor Secs (1-run avg.) Command TimeSpan
------ ----------------- ------- --------
1.00 0.000 [Linq.Enumerable]::SequenceEqual($a1, $a2) 00:00:00.0001879
22.40 0.004 -not (Compare-Object $a1 $a2 -SyncWindow 0) 00:00:00.0042097
24.86 0.005 -not (Compare-Object $a1 $a2 -SyncWindow 0 | Select-Object -first 1) 00:00:00.0046707
Optimizations for Compare-Object -SyncWindow 0 shown above:
Because Compare-Object -SyncWindow 0 outputs difference objects, in the worst-case scenario it outputs 2 * N objects - one pair of difference objects for each mismatched array position.
Piping to Select-Object -First 1 so as to only output one difference object is an effective optimization in this case, but note that Compare-Object still creates all objects up front (it isn't optimized to recognize that with -SyncWindow 0 it doesn't need to collect all input first).
-PassThru, to avoid construction of the [pscustomobject] wrappers, can sometimes help a little, but ultimately isn't worth combining with the more important Select-Object -First 1 optimization; the reason it doesn't help more is that the passed-through objects are still decorated with a .SideIndicator ETS property, which is expensive too.
Test code that produced the above timings, which is based on the Time-Command function available from this Gist:
Note: Assuming you have looked at the linked code to ensure that it is safe (which I can personally assure you of, but you should always check), you can install Time-Command directly as follows:
irm https://gist.github.com/mklement0/9e1f13978620b09ab2d15da5535d1b27/raw/Time-Command.ps1 | iex
foreach ($i in 1..2) {
    # Array size
    [int] $n = 1000
    # How many runs to average:
    # If you set this to 1 and $n is at around 1,300 or below, ONLY the very first
    # test result in a session will show
    #   Compare-Object $a1 $a2 -SyncWindow 0 | Select-Object -first 1
    # as the fastest method.
    # Once the LINQ method access is internally compiled,
    # [Linq.Enumerable]::SequenceEqual() is dramatically faster, with any array size.
    $runs = 1
    # Construct the arrays to use.
    # Note: In order to be able to pass the arrays directly to [Linq.Enumerable]::SequenceEqual($a1, $a2),
    #       they must be strongly typed.
    switch ($i) {
        1 {
            Write-Host ('--- {0:N0} elements: ALL-positions-different case:' -f $n)
            # Construct the arrays so that Compare-Object will report 2 * N
            # difference objects.
            # This maximizes the Select-Object -First 1 optimization.
            [int[]] $a1 = 1..$n
            [int[]] $a2 = , 0 + 1..($n-1)
        }
        default {
            Write-Host ('--- {0:N0} elements: 1-position-different-only case (Note: on first run in a session, the LINQ method is now compiled and is much faster):' -f $n)
            # Construct the arrays so that Compare-Object only outputs 2 difference objects.
            [int[]] $a1 = 1..$n
            [int[]] $a2 = 1..($n-1) + 42
        }
    }
    Time-Command -Count $runs {
        -not (Compare-Object $a1 $a2 -SyncWindow 0)
    },
    {
        -not (Compare-Object $a1 $a2 -SyncWindow 0 | Select-Object -first 1)
    },
    {
        [Linq.Enumerable]::SequenceEqual($a1, $a2)
    } | Out-Host
}


Let's say we have an array of objects $objects. Let's say these objects have a "Name" property.
This is what I want to do
$results = @()
$objects | %{ $results += $_.Name }
This works, but can it be done in a better way?
If I do something like:
$results = $objects | select Name
$results is an array of objects having a Name property. I want $results to contain an array of Names.
Is there a better way?
I think you might be able to use the ExpandProperty parameter of Select-Object.
For example, to get the list of the current directory and just have the Name property displayed, one would do the following:
ls | select -Property Name
This is still returning DirectoryInfo or FileInfo objects. You can always inspect the type coming through the pipeline by piping to Get-Member (alias gm).
ls | select -Property Name | gm
So, to expand the object to be that of the type of property you're looking at, you can do the following:
ls | select -ExpandProperty Name
In your case, you can just do the following to have a variable be an array of strings, where the strings are the Name property:
$objects = ls | select -ExpandProperty Name
As an even easier solution, you could just use:
$results = $objects.Name
Which should fill $results with an array of all the 'Name' property values of the elements in $objects.
To complement the preexisting, helpful answers with guidance of when to use which approach and a performance comparison.
Outside of a pipeline[1], use (requires PSv3+):
$objects.Name # returns .Name property values from all objects in $objects
as demonstrated in rageandqq's answer, which is both syntactically simpler and much faster.
Accessing a property at the collection level to get its elements' values as an array (if there are 2 or more elements) is called member-access enumeration and is a PSv3+ feature.
Alternatively, in PSv2, use the foreach statement, whose output you can also assign directly to a variable: $results = foreach ($obj in $objects) { $obj.Name }
If collecting all output from a (pipeline) command in memory first is feasible, you can also combine pipelines with member-access enumeration; e.g.:
(Get-ChildItem -File | Where-Object Length -lt 1gb).Name
Tradeoffs:
Both the input collection and output array must fit into memory as a whole.
If the input collection is itself the result of a command (pipeline) (e.g., (Get-ChildItem).Name), that command must first run to completion before the resulting array's elements can be accessed.
In a pipeline, in case you must pass the results to another command, notably if the original input doesn't fit into memory as a whole, use: $objects | Select-Object -ExpandProperty Name
The need for -ExpandProperty is explained in Scott Saad's answer (you need it to get only the property value).
You get the usual pipeline benefits of the pipeline's streaming behavior, i.e. one-by-one object processing, which typically produces output right away and keeps memory use constant (unless you ultimately collect the results in memory anyway).
Tradeoff:
Use of the pipeline is comparatively slow.
For small input collections (arrays), you probably won't notice the difference, and, especially on the command line, sometimes being able to type the command easily is more important.
Here is an easy-to-type alternative which, however, is the slowest approach; it uses ForEach-Object via its built-in alias, %, with simplified syntax (again, PSv3+). E.g., the following PSv3+ solution is easy to append to an existing command:
$objects | % Name # short for: $objects | ForEach-Object -Process { $_.Name }
Note: Use of the pipeline is not the primary reason this approach is slow, it is the inefficient implementation of the ForEach-Object (and Where-Object) cmdlets, up to at least PowerShell 7.2. This excellent blog post explains the problem; it led to feature request GitHub issue #10982; the following workaround greatly speeds up the operation (only somewhat slower than a foreach statement, and still faster than .ForEach()):
# Speed-optimized version of the above.
# (Use `&` instead of `.` to run in a child scope)
$objects | . { process { $_.Name } }
The PSv4+ .ForEach() array method, more comprehensively discussed in this article, is yet another, well-performing alternative, but note that it requires collecting all input in memory first, just like member-access enumeration:
# By property name (string):
$objects.ForEach('Name')
# By script block (more flexibility; like ForEach-Object)
$objects.ForEach({ $_.Name })
This approach is similar to member-access enumeration, with the same tradeoffs, except that pipeline logic is not applied; it is marginally slower than member-access enumeration, though still noticeably faster than the pipeline.
For extracting a single property value by name (string argument), this solution is on par with member-access enumeration (though the latter is syntactically simpler).
The script-block variant ({ ... }) allows arbitrary transformations; it is a faster - all-in-memory-at-once - alternative to the pipeline-based ForEach-Object cmdlet (%).
Note: The .ForEach() array method, like its .Where() sibling (the in-memory equivalent of Where-Object), always returns a collection (an instance of [System.Collections.ObjectModel.Collection[psobject]]), even if only one output object is produced.
By contrast, member-access enumeration, Select-Object, ForEach-Object and Where-Object return a single output object as-is, without wrapping it in a collection (array).
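A quick way to see that difference for yourself (a sketch of mine with a single sample object, not part of the original timings):
$single = [pscustomobject] @{ Name = 'only' }        # a one-object "collection"
$single.ForEach('Name').GetType().FullName           # Collection`1 - always a collection
($single | ForEach-Object Name).GetType().FullName   # System.String - a single object is returned as-is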
Comparing the performance of the various approaches
Here are sample timings for the various approaches, based on an input collection of 10,000 objects, averaged across 10 runs; the absolute numbers aren't important and vary based on many factors, but they should give you a sense of relative performance (the timings come from a single-core Windows 10 VM):
Important
The relative performance varies based on whether the input objects are instances of regular .NET types (e.g., as output by Get-ChildItem) or [pscustomobject] instances (e.g., as output by ConvertFrom-Csv).
The reason is that [pscustomobject] properties are dynamically managed by PowerShell, and it can access them more quickly than the regular properties of a (statically defined) regular .NET type. Both scenarios are covered below.
The tests use already-in-memory-in-full collections as input, so as to focus on the pure property extraction performance. With a streaming cmdlet / function call as the input, performance differences will generally be much less pronounced, as the time spent inside that call may account for the majority of the time spent.
For brevity, alias % is used for the ForEach-Object cmdlet.
General conclusions, applicable to both regular .NET type and [pscustomobject] input:
The member-enumeration ($collection.Name) and foreach ($obj in $collection) solutions are by far the fastest, by a factor of 10 or more faster than the fastest pipeline-based solution.
Surprisingly, % Name performs much worse than % { $_.Name } - see this GitHub issue.
PowerShell Core consistently outperforms Windows PowerShell here.
Timings with regular .NET types:
PowerShell Core v7.0.0-preview.3
Factor Command Secs (10-run avg.)
------ ------- ------------------
1.00 $objects.Name 0.005
1.06 foreach($o in $objects) { $o.Name } 0.005
6.25 $objects.ForEach('Name') 0.028
10.22 $objects.ForEach({ $_.Name }) 0.046
17.52 $objects | % { $_.Name } 0.079
30.97 $objects | Select-Object -ExpandProperty Name 0.140
32.76 $objects | % Name 0.148
Windows PowerShell v5.1.18362.145
Factor Command Secs (10-run avg.)
------ ------- ------------------
1.00 $objects.Name 0.012
1.32 foreach($o in $objects) { $o.Name } 0.015
9.07 $objects.ForEach({ $_.Name }) 0.105
10.30 $objects.ForEach('Name') 0.119
12.70 $objects | % { $_.Name } 0.147
27.04 $objects | % Name 0.312
29.70 $objects | Select-Object -ExpandProperty Name 0.343
Conclusions:
In PowerShell Core, .ForEach('Name') clearly outperforms .ForEach({ $_.Name }). In Windows PowerShell, curiously, the latter is faster, albeit only marginally so.
Timings with [pscustomobject] instances:
PowerShell Core v7.0.0-preview.3
Factor Command Secs (10-run avg.)
------ ------- ------------------
1.00 $objects.Name 0.006
1.11 foreach($o in $objects) { $o.Name } 0.007
1.52 $objects.ForEach('Name') 0.009
6.11 $objects.ForEach({ $_.Name }) 0.038
9.47 $objects | Select-Object -ExpandProperty Name 0.058
10.29 $objects | % { $_.Name } 0.063
29.77 $objects | % Name 0.184
Windows PowerShell v5.1.18362.145
Factor Command Secs (10-run avg.)
------ ------- ------------------
1.00 $objects.Name 0.008
1.14 foreach($o in $objects) { $o.Name } 0.009
1.76 $objects.ForEach('Name') 0.015
10.36 $objects | Select-Object -ExpandProperty Name 0.085
11.18 $objects.ForEach({ $_.Name }) 0.092
16.79 $objects | % { $_.Name } 0.138
61.14 $objects | % Name 0.503
Conclusions:
Note how with [pscustomobject] input .ForEach('Name') by far outperforms the script-block based variant, .ForEach({ $_.Name }).
Similarly, [pscustomobject] input makes the pipeline-based Select-Object -ExpandProperty Name faster, in Windows PowerShell virtually on par with .ForEach({ $_.Name }), but in PowerShell Core still about 50% slower.
In short: With the odd exception of % Name, with [pscustomobject] the string-based methods of referencing the properties outperform the scriptblock-based ones.
Source code for the tests:
Note:
Download function Time-Command from this Gist to run these tests.
Assuming you have looked at the linked code to ensure that it is safe (which I can personally assure you of, but you should always check), you can install it directly as follows:
irm https://gist.github.com/mklement0/9e1f13978620b09ab2d15da5535d1b27/raw/Time-Command.ps1 | iex
Set $useCustomObjectInput to $true to measure with [pscustomobject] instances instead.
$count = 1e4 # max. input object count == 10,000
$runs = 10 # number of runs to average
# Note: Using [pscustomobject] instances rather than instances of
# regular .NET types changes the performance characteristics.
# Set this to $true to test with [pscustomobject] instances below.
$useCustomObjectInput = $false
# Create sample input objects.
if ($useCustomObjectInput) {
    # Use [pscustomobject] instances.
    $objects = 1..$count | % { [pscustomobject] @{ Name = "foobar_$_"; Other1 = 1; Other2 = 2; Other3 = 3; Other4 = 4 } }
} else {
    # Use instances of a regular .NET type.
    # Note: The actual count of files and folders in your file-system
    #       may be less than $count
    $objects = Get-ChildItem / -Recurse -ErrorAction Ignore | Select-Object -First $count
}
Write-Host "Comparing property-value extraction methods with $($objects.Count) input objects, averaged over $runs runs..."
# An array of script blocks with the various approaches.
$approaches = { $objects | Select-Object -ExpandProperty Name },
{ $objects | % Name },
{ $objects | % { $_.Name } },
{ $objects.ForEach('Name') },
{ $objects.ForEach({ $_.Name }) },
{ $objects.Name },
{ foreach($o in $objects) { $o.Name } }
# Time the approaches and sort them by execution time (fastest first):
Time-Command $approaches -Count $runs | Select Factor, Command, Secs*
[1] Technically, even a command without |, the pipeline operator, uses a pipeline behind the scenes, but for the purpose of this discussion using the pipeline refers only to commands that use |, the pipeline operator, and therefore by definition involve multiple commands.
Caution, member enumeration only works if the collection itself has no member of the same name. So if you had an array of FileInfo objects, you couldn't get an array of file lengths by using
$files.length # evaluates to array length
And before you say "well obviously", consider this. If you had an array of objects with a capacity property then
$objarr.capacity
would work fine UNLESS $objarr were actually not an [Array] but, for example, an [ArrayList]. So before using member enumeration you might have to look inside the black box containing your collection.
(Note to moderators: this should be a comment on rageandqq's answer but I don't yet have enough reputation.)
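To make the pitfall concrete, here is a small sketch of my own: with an array of FileInfo objects, .Length is claimed by the array itself, so you have to ask for the property explicitly:
$files = Get-ChildItem -File
$files.Length                                  # number of elements in the array, NOT the file sizes
$files.ForEach('Length')                       # the files' Length values (PSv4+)
$files | Select-Object -ExpandProperty Length  # same, via the pipeline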
I learn something new every day! Thank you for this. I was trying to achieve the same. I was directly doing this:
$ListOfGGUIDs = $objects.{Object GUID}
Which basically made my variable an object again! I later realized I needed to define it first as an empty array,
$ListOfGGUIDs = @()
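Building on the @(...) technique discussed above, you can also force an array in a single step, without creating an empty array first (my suggestion, not part of the original comment):
# @(...) guarantees an array even if only a single GUID comes back:
$ListOfGGUIDs = @($objects.{Object GUID})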

Powershell Where-Object doesn't seem to filter

I'm trying to do some reporting on Azure Policies. I'll eventually be filtering on dates, but I'm having trouble filtering on anything, so I present the following sample.
PS C:\>$defstrings = az policy definition list --management-group "mgsandbox" # returns an array of strings
PS C:\>$def = ConvertFrom-Json -InputObject ($defstrings -join "`n") -depth 99 # converts to an array of PSCustomObject
PS C:\>$def.count
2070
PS C:\>$sel = Where-Object -inputobject $def -FilterScript { $_.displayName -eq "Kubernetes cluster containers should not share host process ID or host IPC namespace" }
PS C:\>$sel.count
2070
PS C:\> $def[0].displayName -eq "Kubernetes cluster containers should not share host process ID or host IPC namespace"
False
While I might possibly find more than one hit on the displayName, there are clearly a non-zero set of displayNames that do not match the filter, yet the selection is getting all of them.
Any suggestions what's wrong with my syntax? It seems straightforward.
Do not use an -InputObject argument with Where-Object; instead, provide input via the pipeline:
# Use the pipeline to provide input, don't use -InputObject
$def | Where-Object -FilterScript { $_.displayName -eq "Kubernetes cluster containers should not share host process ID or host IPC namespace" }
In most cmdlets, the -InputObject parameter is a mere implementation detail whose purpose is to facilitate pipeline input and cannot be meaningfully used directly; see this answer for more information.
As for what you tried:
When you use -InputObject, an argument that is a collection (enumerable) is passed as a whole to the cmdlet, whereas using the same collection in the pipeline causes its enumeration, i.e. the collection's elements are passed one by one.
A simplified example:
# Sample array.
$arr = 1, 2, 3
# WRONG: array is passed *as a whole*
# and in this case outputs *all* its elements.
# -> 1, 2, 3
Where-Object -InputObject $arr { $_ -eq 2 }
That is, the script block passed to Where-Object is executed once, with the automatic $_ variable bound to the array as a whole, so that the above is in effect equivalent to:
if ($arr -eq 2) { $arr }
Since $arr -eq 2 evaluates to $true in a Boolean context (the if conditional), $arr as a whole is output (although on output it is enumerated), giving the impression that no filtering took place.
The reason that $arr -eq 2 evaluates to $true is that the -eq operator, among others, supports arrays as its LHS, in which case the behavior changes to filtering, by returning the sub-array of matching elements instead of a Boolean, so that 1, 2, 3 -eq 2 yields @(2) (an array containing the one matching element, 2), and coercing @(2) to a Boolean yields $true ([bool] @(2)).[1]
Conversely, if the implied conditional yields $false (e.g., $_ -eq 5), no output is produced at all.
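A minimal demonstration of the filtering-then-coercion behavior just described:
1, 2, 3 -eq 2           # -> 2      (array LHS: -eq filters, returning the matching elements)
[bool] (1, 2, 3 -eq 2)  # -> True   (non-empty result array coerces to $true)
[bool] (1, 2, 3 -eq 5)  # -> False  (empty result array coerces to $false)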
By contrast, if you use the pipeline, you'll get the desired behavior:
# Sample array.
$arr = 1, 2, 3
# OK: Array elements are enumerated, i.e.
# sent *one by one* through the pipeline.
# -> 2
$arr | Where-Object { $_ -eq 2 }
Alternatively, you can bypass the pipeline by using the intrinsic .Where() method:
Note: This requires collecting all input in memory first; however, especially with data already in memory, this approach performs better than the pipeline approach:
# OK:
# -> 2 (wrapped in a collection)
@(1, 2, 3).Where({ $_ -eq 2 })
Note: .Where() always outputs an array-like collection, even when only a single object matches the filter. In practice, however, that usually doesn't matter.
[1] For a summary of PowerShell's to-Boolean coercion rules, see the bottom section of this answer.

Which operator provides quicker output -match -contains or Where-Object for large CSV files

I am trying to build logic where I have to query 4 large CSV files against 1 CSV file; in particular, finding an AD object across 4 domains and storing the results in variables for attribute comparison.
I have tried importing all the files into different variables and used the 3 different code samples below to get the desired output, but completion takes longer than expected.
CSV import:
$AllMainFile = Import-csv c:\AllData.csv
#Input file contains below
EmployeeNumber,Name,Domain
Z001,ABC,Test.com
Z002,DEF,Test.com
Z003,GHI,Test1.com
Z001,ABC,Test2.com
$AAA = Import-csv c:\AAA.csv
#Input file contains below
EmployeeNumber,Name,Domain
Z001,ABC,Test.com
Z002,DEF,Test.com
Z003,GHI,Test1.com
Z001,ABC,Test2.com
Z004,JKL,Test.com
$BBB = Import-Csv C:\BBB.csv
$CCC = Import-Csv C:\CCC.csv
$DDD = Import-Csv c:\DDD.csv
Sample code 1:
foreach ($x in $AllMainFile) {
$AAAoutput += $AAA | ? {$_.employeeNumber -eq $x.employeeNumber}
$BBBoutput += $BBB | ? {$_.employeeNumber -eq $x.employeeNumber}
$CCCoutput += $CCC | ? {$_.employeeNumber -eq $x.employeeNumber}
$DDDoutput += $DDD | ? {$_.employeeNumber -eq $x.employeeNumber}
if ($DDDoutput.Count -le 1 -and $AAAoutput.Count -le 1 -and $BBBoutput.Count -le 1 -and $CCCoutput.Count -le 1) {
#### My Other script execution code here
} else {
#### My Other script execution code here
}
}
Sample code 2 (just replacing with -match instead of Where-Object):
foreach ($x in $AllMainFile) {
$AAAoutput += $AAA -match $x.EmployeeNumber
$BBBoutput += $BBB -match $x.EmployeeNumber
$CCCoutput += $CCC -match $x.EmployeeNumber
$DDDoutput += $AllMainFile -match $x.EmployeeNumber
if ($DDDoutput.Count -le 1 -and $AAAoutput.Count -le 1 -and $BBBoutput.Count -le 1 -and $CCCoutput.Count -le 1) {
#### My Other script execution code here
} else {
#### My Other script execution code here
}
}
Sample code 3 (just replacing with -contains operator):
foreach ($x in $AllMainFile) {
foreach ($c in $AAA){ if ($AllMainFile.employeeNumber -contains $c.employeeNumber) {$AAAoutput += $c}}
foreach ($c in $BBB){ if ($AllMainFile.employeeNumber -contains $c.employeeNumber) {$BBBoutput += $c}}
foreach ($c in $CCC){ if ($AllMainFile.employeeNumber -contains $c.employeeNumber) {$CCCoutput += $c}}
foreach ($c in $DDD){ if ($AllMainFile.employeeNumber -contains $c.employeeNumber) {$DDDoutput += $c}}
if ($DDDoutput.Count -le 1 -and $AAAoutput.Count -le 1 -and $BBBoutput.Count -le 1 -and $CCCoutput.Count -le 1) {
#### My Other script execution code here
} else {
#### My Other script execution code here
}
}
I am expecting to execute the script as quickly as possible by comparing and looking up all 4 CSV files against 1 input file. Each file contains more than 1000k objects/rows with 5 columns.
Performance
Before answering the question, I would like to clear the air about measuring the performance of PowerShell cmdlets. Native PowerShell is very good at streaming objects and can therefore save a lot of memory if you stream correctly (do not assign a stream to a variable or wrap it in brackets). PowerShell is also capable of invoking almost every existing .NET method (like Add()) and technologies like LINQ.
The usual way of measuring the performance of a command is:
(Measure-Command {<myCommand>}).TotalMilliseconds
If you use this on native PowerShell streaming cmdlets, they appear not to perform very well in comparison with language statements and .NET commands. It is often concluded that, e.g., LINQ outperforms native PowerShell commands by well over a factor of a hundred. The reason is that LINQ is reactive and uses deferred (lazy) execution: it reports that it has done the job, but it actually does the work only at the moment you request a result (it also caches a lot of results, which is easiest to exclude from a benchmark by starting a new session), whereas native PowerShell is rather proactive: it passes any resolved item immediately back into the pipeline, and the next cmdlet (e.g. Export-Csv) can then finalize the item and release it from memory.
In other words, if you have a slow input (see: Advocating native PowerShell) or have a large amount of data to process (e.g. larger than the physical memory available), it might be better and easier to use the native PowerShell approach.
In any case, if you are comparing results, you should test in practice and test end-to-end, not just on data that is already available in memory.
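To make the streaming point concrete, here is a sketch (the filter and the output path are made up; AllData.csv is from your question). The first form streams rows straight to disk, the second collects everything in memory first:
# Streaming: rows flow through the pipeline one by one; memory use stays roughly constant.
Import-Csv C:\AllData.csv |
    Where-Object Domain -eq 'Test.com' |
    Export-Csv C:\Filtered.csv -NoTypeInformation
# Collecting: the whole filtered result is held in a variable before anything is written.
$rows = Import-Csv C:\AllData.csv | Where-Object Domain -eq 'Test.com'
$rows | Export-Csv C:\Filtered.csv -NoTypeInformation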
Building a list
I agree that using the Add() method on a list is much faster than using +=, which concatenates the new item with the current array and then reassigns the result back to the variable.
But again, both approaches stall the pipeline, as they collect all the data in memory, where you might be better off releasing intermediate results to disk.
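For instance, a self-contained sketch (with illustrative data, not your CSVs) of the difference:
$items = 1..10000
# Slow: += creates a new, larger array and copies all existing elements on every iteration.
$slow = @()
foreach ($i in $items) { $slow += $i }
# Fast: a generic list appends in place.
$fast = [System.Collections.Generic.List[int]]::new()
foreach ($i in $items) { $fast.Add($i) }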
HashTables
You will probably find the biggest performance improvement in using a hash table, as hash tables are optimized for fast key lookups.
Since it is required to compare two collections to each other, you can't stream both; as explained, it is best and easiest to use one hash table for one side and compare it against each item streamed in from the other side. Because you want to compare the AllData table with each of the other tables, it is best to index that table into memory (in the form of a hash table).
This is how I would do this:
$Main = @{}
ForEach ($Item in $All) {
    $Main[$Item.EmployeeNumber] = @{MainName = $Item.Name; MainDomain = $Item.Domain}
}
ForEach ($Name in 'AAA', 'BBB', 'CCC', 'DDD') {
    Import-Csv "C:\$Name.csv" | Where-Object {$Main.ContainsKey($_.EmployeeNumber)} | ForEach-Object {
        [PSCustomObject](@{EmployeeNumber = $_.EmployeeNumber; Name = $_.Name; Domain = $_.Domain} + $Main[$_.EmployeeNumber])
    } | Export-Csv "C:\Output$Name.csv"
}
Addendum
Based on the comment (and the duplicates in the lists), it appears that a join on all keys is actually requested, not just on EmployeeNumber. For this you need to concatenate the keys concerned (separated by a separator that is not used in the data) and use that as the key for the hash table.
It is not in the question, but from the comment it also appears that a full join is expected. The right-join part can be done by returning the right-hand object when no match is found in the main table ($Main.ContainsKey($Key)). The left-join part is more complex, as you will need to track ($InnerMain) which items in main have already been matched and return the leftover items at the end:
$Main = @{}
$Separator = "`t" # Choose a separator that isn't used in any value
ForEach ($Item in $All) {
    $Key = $Item.EmployeeNumber, $Item.Name, $Item.Domain -Join $Separator
    $Main[$Key] = @{MainEmployeeNumber = $Item.EmployeeNumber; MainName = $Item.Name; MainDomain = $Item.Domain} # What output is expected?
}
ForEach ($Name in 'AAA', 'BBB', 'CCC', 'DDD') {
    $InnerMain = @($False) * $Main.Count
    $Index = 0
    Import-Csv "C:\$Name.csv" | ForEach-Object {
        $Key = $_.EmployeeNumber, $_.Name, $_.Domain -Join $Separator
        If ($Main.ContainsKey($Key)) {
            $InnerMain[$Index] = $True
            [PSCustomObject](@{EmployeeNumber = $_.EmployeeNumber; Name = $_.Name; Domain = $_.Domain} + $Main[$Key])
        } Else {
            [PSCustomObject](@{EmployeeNumber = $_.EmployeeNumber; Name = $_.Name; Domain = $_.Domain; MainEmployeeNumber = $Null; MainName = $Null; MainDomain = $Null})
        }
        $Index++
    } | Export-Csv "C:\Output$Name.csv"
    $Index = 0
    # A foreach statement can't be piped directly, so wrap it in $(...);
    # -Append keeps these left-join leftovers from overwriting the rows written above.
    $(ForEach ($Item in $All) {
        If (!$InnerMain[$Index]) {
            $Key = $Item.EmployeeNumber, $Item.Name, $Item.Domain -Join $Separator
            [PSCustomObject](@{EmployeeNumber = $Null; Name = $Null; Domain = $Null} + $Main[$Key])
        }
        $Index++
    }) | Export-Csv "C:\Output$Name.csv" -Append
}
Join-Object
Just FYI, I have made a few improvements to the Join-Object cmdlet (use and installation are very simple; see: In Powershell, what's the best way to join two tables into one?), including easier handling of multiple joins, which might come in handy for a request like this one. That said, I still do not fully understand what exactly you are looking for (and have minor questions, like: how could the domains differ in a Domain column if it is an extract from one specific domain?).
I take the general description "Particularly finding an AD object against 4 domains and store them in variable for attribute comparison" as leading.
Here I presume that $AllMainFile is actually just an intermediate table consisting of a concatenation of all the tables concerned (not really necessary, and rather confusing, as it might contain two types of duplicates: employee numbers duplicated within the same domain and employee numbers that also appear in other domains). If this is correct, you can simply omit this table when using the Join-Object cmdlet:
$AAA = ConvertFrom-Csv @'
EmployeeNumber,Name,Domain
Z001,ABC,Domain1
Z002,DEF,Domain2
Z003,GHI,Domain3
'@
$BBB = ConvertFrom-Csv @'
EmployeeNumber,Name,Domain
Z001,ABC,Domain1
Z002,JKL,Domain2
Z004,MNO,Domain4
'@
$CCC = ConvertFrom-Csv @'
EmployeeNumber,Name,Domain
Z005,PQR,Domain2
Z001,ABC,Domain1
Z001,STU,Domain2
'@
$DDD = ConvertFrom-Csv @'
EmployeeNumber,Name,Domain
Z005,VWX,Domain4
Z006,XYZ,Domain1
Z001,ABC,Domain3
'@
$AAA | FullJoin $BBB -On EmployeeNumber -Discern AAA |
FullJoin $CCC -On EmployeeNumber -Discern BBB |
FullJoin $DDD -On EmployeeNumber -Discern CCC,DDD | Format-Table
Result:
EmployeeNumber AAAName AAADomain BBBName BBBDomain CCCName CCCDomain DDDName DDDDomain
-------------- ------- --------- ------- --------- ------- --------- ------- ---------
Z001 ABC Domain1 ABC Domain1 ABC Domain1 ABC Domain3
Z001 ABC Domain1 ABC Domain1 STU Domain2 ABC Domain3
Z002 DEF Domain2 JKL Domain2
Z003 GHI Domain3
Z004 MNO Domain4
Z005 PQR Domain2 VWX Domain4
Z006 XYZ Domain1

Why does select-object -first 1 not return an array for consistency?

When I pipe some objects to select-object -first n it returns an array except if n is 1:
PS C:\> (get-process | select-object -first 1).GetType().FullName
System.Diagnostics.Process
PS C:\> (get-process | select-object -first 2).GetType().FullName
System.Object[]
For consistency reasons, I'd have expected both pipelines to return an array.
Apparently, PowerShell chooses to return one object as object rather than as an element in an array.
Why is that?
"Why" questions are generally indeterminate in cases like this, but it mostly boils down to:
1. Since we asked for "-first 1" we would expect a single item.
2. If we received an array/list we would still need to index the first one to obtain just that one, which is pretty much what "Select-Object -First 1" is designed to do (in that case).
3. The result can always be wrapped in @() to force an array -- perhaps in the case where we've calculated "-First $N" and don't actually know (at that moment in the code) that we might receive only 1.
4. The designer/developer thought it should be that way.
It's #3 that keeps it from being an issue:
$PSProcess = @(Get-Process PowerShell | Select -First 1)
...this will guarantee $PSProcess is an array no matter what the count.
It even works with:
$n = Get-Random 3
@(Get-Process | Select-Object -First $n) # $n => 0, 1, or 2, but always returns an array.
The pipeline returns [System.Diagnostics.Process] objects. In your first example it's only one object; in the second it's a [System.Object[]] array of [System.Diagnostics.Process] objects.
$a = (get-process | select-object -first 1)
$a | Get-Member    # a single System.Diagnostics.Process object came through
$b = (get-process | select-object -first 2)
,$b | Get-Member   # the unary comma wraps $b, so Get-Member sees the System.Object[] array itself rather than its elements

Rewrite powershell file comparison without loops or if statements

What's the best way to rewrite the following PowerShell code, which compares two lists of files, ensuring they have the same (or greater) file count, and that the second list contains every file in the first list:
$noNewFiles = $NewFiles.Count -ge $OldFiles.Count
foreach ($oldFile in $OldFiles){
    if (!$NewFiles.Contains($oldFile)) {
        return $false
    }
}
PSv3+ syntax (? is a built-in alias for Where-Object cmdlet):
(Compare-Object $NewFiles $OldFiles | ? SideIndicator -eq '=>').Count -eq 0
More efficient PSv4+ alternative, using the Where() method (as suggested by Brian (the OP) himself):
(Compare-Object $NewFiles $OldFiles).Where({ $_.SideIndicator -eq '=>' }).Count -eq 0
By default, Compare-Object only returns differences between two sets and outputs objects whose .SideIndicator property indicates the set that an element is unique to:
Since string value => indicates an element that is unique to the 2nd set (RHS), we can filter the differences down to elements unique to the 2nd set, so if their count is 0, the implication is that there are no elements unique to the 2nd set.
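For a quick look at what that filtering is based on, a small illustration (the file names are made up):
Compare-Object @('a.txt', 'b.txt') @('a.txt', 'c.txt')
# c.txt is reported with SideIndicator '=>' (unique to the 2nd set, $OldFiles in the command above);
# b.txt is reported with '<=' (unique to the 1st set, $NewFiles).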
Side note:
How "sameness" (equality) is determined depends on the data type of the elements.
A pitfall is that instances of reference types are compared by their .ToString() values, which can result in disparate objects being considered equal. For instance, Compare-Object @{ one=1 } @{ two=2 } produces no output.
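If you run into that pitfall, one option (my addition, not part of the original answer) is to tell Compare-Object which property values to compare via -Property:
$a = [pscustomobject] @{ Name = 'a.txt'; Length = 10 }
$b = [pscustomobject] @{ Name = 'a.txt'; Length = 20 }
# Compare on the values you care about rather than relying on .ToString():
Compare-Object $a $b -Property Name, Length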