Rewrite powershell file comparison without loops or if statements - powershell

What's the best way to rewrite the following PowerShell code, which compares two lists of files, ensuring that the second list has the same (or greater) file count and that it contains every file in the first list:
$noNewFiles = $NewFiles.Count -ge $OldFiles.Count
foreach ($oldFile in $OldFiles) {
    if (!$NewFiles.Contains($oldFile)) {
        return $false
    }
}

PSv3+ syntax (? is a built-in alias for the Where-Object cmdlet):
(Compare-Object $NewFiles $OldFiles | ? SideIndicator -eq '=>').Count -eq 0
More efficient PSv4+ alternative, using the Where() method (as suggested by Brian (the OP) himself):
(Compare-Object $NewFiles $OldFiles).Where({ $_.SideIndicator -eq '=>' }).Count -eq 0
By default, Compare-Object only returns differences between two sets and outputs objects whose .SideIndicator property indicates the set that an element is unique to:
Since the string value '=>' indicates an element that is unique to the 2nd set (the RHS, $OldFiles here), we can filter the differences down to just those elements; if their count is 0, the implication is that every element of $OldFiles is also present in $NewFiles.
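For example, with two illustrative arrays (not from the question):
$NewFiles = 'a.txt', 'b.txt', 'c.txt'
$OldFiles = 'a.txt', 'b.txt'
(Compare-Object $NewFiles $OldFiles | ? SideIndicator -eq '=>').Count -eq 0   # True: every old file is also in $NewFiles
$OldFiles = 'a.txt', 'd.txt'
(Compare-Object $NewFiles $OldFiles | ? SideIndicator -eq '=>').Count -eq 0   # False: 'd.txt' is only in $OldFiles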
Side note:
How "sameness" (equality) is determined depends on the data type of the elements.
A pitfall is that instances of reference types are compared by their .ToString() values, which can result in disparate objects being considered equal. For instance, Compare-Object @{ one=1 } @{ two=2 } produces no output.
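To make the pitfall concrete, here is a quick sketch (sample hashtables):
# Both hashtables stringify to 'System.Collections.Hashtable',
# so Compare-Object reports no differences between them.
Compare-Object @{ one = 1 } @{ two = 2 }              # no output
@{ one = 1 }.ToString() -eq @{ two = 2 }.ToString()   # True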

Related

Compare 2 arrays with powershell

I have two arrays, $UsersGroup and $UsersActive. I need to find, for each entry in $UsersGroup (SamAccountName and ObjectGUID), whether there is a matching line in $UsersActive.
$UsersGroup =
SamAccountName ObjectGUID
-------------- ----------
XXXX00XX 0031e949-9120-4df1-bddb-98067a141448
XXXX01XX 0031e949-9120-4df1-bdgb-99067a141448
XXXX02XX 0031e949-9120-4df1-bdab-97067a141448
and without headers
$UsersActive =
fcb483fa146b
fcb515739a2f
fcb82f1ef74c
fcc5ee8b8722
fcd3f1f471c2
fceb26a598a3
fd0b14cecd0e
98067a141448
I need to match the values from $UsersActive against $UsersGroup.ObjectGUID, like this:
$UsersGroup | ForEach-Object {if($_.ObjectGUID -contains $UsersActive) {$_}}
But this doesn't give me the result I expect, which is:
XXXX00XX 0031e949-9120-4df1-bddb-98067a141448
Can someone help me? Thanks!
-contains is a collection containment operator, testing for the exact occurrence of the right-hand side argument in the left-hand side argument.
To test for the presence of a substring in a string, use the -like wildcard string comparison operator:
$UsersGroup | Where-Object {
    $guid = $_.ObjectGUID
    $UsersActive.Where({ $guid -like "*$_*" }, 'First')
}
Each group entry is now tested against every $UsersActive value until either a match is found (causing Where-Object to pass the object through) or the list is exhausted (causing Where-Object to filter the object out).
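To illustrate the distinction with one of the GUIDs from the question's data:
$guid = '0031e949-9120-4df1-bddb-98067a141448'
$guid -contains '98067a141448'   # False: -contains tests whole-element equality
$guid -like '*98067a141448*'     # True:  -like performs wildcard (substring) matching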
If I understand you correctly, you can use Compare-Object. The sample data below has only one match. Note that Compare-Object can be slow for large lists.
$usersgroup =
'SamAccountName,ObjectGUID
XXXX00XX,0031e949-9120-4df1-bddb-98067a141448
XXXX01XX,0031e949-9120-4df1-bdgb-99067a141448
XXXX02XX,0031e949-9120-4df1-bdab-97067a141448' | convertfrom-csv
$usersactive = -split
'fcb483fa146b
fcb515739a2f
fcb82f1ef74c
fcc5ee8b8722
fcd3f1f471c2
fceb26a598a3
fd0b14cecd0e
98067a141448'
compare ($usersgroup.objectguid -replace '.*-') $usersactive -IncludeEqual |
? sideindicator -eq '==' # order doesn't matter
InputObject SideIndicator
----------- -------------
98067a141448 ==
To offer an alternative solution:
As Mathias states in his answer, the -contains operator and its operands-reversed counterpart, -in, only perform equality comparison of a single comparison value against the elements of a collection (array).
As an aside: Given that $_.ObjectGUID is the scalar (single object) and $UsersActive the collection (array) to search in, you would have needed to use -in, not -contains ($_.ObjectGUID -in $UsersActive)
Unfortunately, as of version 7.3.2, PowerShell's pattern-matching operators, -like (with wildcard expressions) and -match (with regexes), do not support matching against multiple (an array of) patterns.
However, since you're looking to match a literal substring of each .ObjectGUID value, you can use -in if you extract that relevant substring first and use it as the comparison value:
# -> Object whose GUID is '0031e949-9120-4df1-bddb-98067a141448'
$usersGroup | Where-Object { ($_.ObjectGUID -split '-')[-1] -in $usersActive }
Note how the -split operator is used to split the GUID into tokens by -, with [-1] returning the last token, which -in then looks for by equality comparison in the $UsersActive array.
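As a quick check of that token extraction, using one of the GUIDs from the question:
('0031e949-9120-4df1-bddb-98067a141448' -split '-')[-1]   # -> 98067a141448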
As an aside:
Allowing multiple patterns as the RHS of -like and -match - so as to return $true if any of them match - would be a helpful future improvement.
GitHub issue #2132 asks for just that.

Powershell eq operator saying hashes are different, while Write-Host is showing the opposite

I have a script that periodically generates a list of all files in a directory, and then writes a text file of the results to a different directory.
I'd like to change this so it checks the newest text file in the output directory, and only makes a new one if there's differences. It seemed simple enough.
Here's what I tried:
First I get the most recent file in the directory, grab the hash, and write my variable values to the console:
$lastFile = gci C:\ReportOutputDir | sort LastWriteTime | select -last 1 | Select-Object -ExpandProperty FullName
$oldHash = Get-FileHash $lastFile | Select-Object Hash
Write-Host 'lastFile = '$lastFile
Write-Host 'oldHash = '$oldHash
Output:
lastFile = C:\ReportOutputDir\test1.txt
oldHash = @{Hash=E7787C54F5BAE236100A24A6F453A5FDF6E6C7333B60ED8624610EAFADF45521}
Then I do the exact same gci on the FileList dir, and create a new file (new_test.txt), then grab the hash of this file:
gci -Path C:\FileLists -File -Recurse -Name -Depth 2 | Sort-Object | out-file C:\ReportOutputDir\new_test.txt
$newFile = gci C:\ReportOutputDir | sort LastWriteTime | select -last 1 | Select-Object -ExpandProperty FullName
$newHash = Get-FileHash $newFile | Select-Object Hash
Write-Host 'newFile = '$newFile
Write-Host 'newHash = '$newHash
Output:
newFile = C:\ReportOutputDir\new_test.txt
newHash = @{Hash=E7787C54F5BAE236100A24A6F453A5FDF6E6C7333B60ED8624610EAFADF45521}
Finally, I attempt my -eq operator where I'd usually simply remove the newFile if it's equal. For now, I'm just doing a simple :
if ($newHash -eq $oldHash) {
    'files are equal'
}
else {
    'files are not equal'
}
And somehow, I'm getting
files are not equal
What gives? Also, for the record, I was originally trying to save the gci output to a variable and compare the contents of the last file to the gci output, but was also having trouble with the -eq operator. I'm fairly new to PowerShell, so I'm sure I'm doing something wrong here.
Select-Object Hash creates an object with a .Hash property and it is that property that contains the hash string.
The object returned is of type [pscustomobject], and two instances of this type never compare as equal - even if all their property names and values are equal:
The reason is that reference equality is tested, because [pscustomobject] is a .NET reference type that doesn't define custom equality-testing logic.
Testing reference equality means that only two references to the very same instance compare as equal.
A quick example:
PS> [pscustomobject] @{ foo = 1 } -eq [pscustomobject] @{ foo = 1 }
False # !! Two distinct instances aren't equal, no matter what they contain.
You have two options:
Compare the .Hash property values, not the objects as a whole:
if ($newHash.Hash -eq $oldHash.Hash) { # ...
If you don't need a [pscustomobject] wrapper for the hash strings, use Select-Object's -ExpandProperty parameter instead of the (possibly positionally implied) -Property parameter:
Select-Object -ExpandProperty Hash
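Applied to the script in the question, that might look like this (a sketch reusing the question's variable names):
$oldHash = Get-FileHash $lastFile | Select-Object -ExpandProperty Hash   # plain string
$newHash = Get-FileHash $newFile | Select-Object -ExpandProperty Hash    # plain string
if ($newHash -eq $oldHash) { 'files are equal' } else { 'files are not equal' }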
As for why the Write-Host output matched:
When you force objects to be converted to string representations - essentially, Write-Host calls .ToString() on its arguments - the string representations of distinct [pscustomobject] instances that have the same properties and values will be the same:
PS> "$([pscustomobject] #{ foo = 1 })" -eq "$([pscustomobject] #{ foo = 1 })"
True # Same as: '#{foo=1}' -eq '#{foo=1}'
However, you should not rely on these hashtable-like string representations to determine equality of [pscustomobject]s as a whole, because of the inherent limitations of these representations, which can easily yield false positives.
This answer shows how to compare [pscustomobject] instances as a whole, by comparing all of their property values, by passing all property names to Compare-Object -Property - but note that this assumes that all property values are either strings or instances of .NET value types or corresponding properties must again either reference the very same instance of a .NET reference type or be of a type that implements custom equality-comparison logic.
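As a rough sketch of such a whole-object comparison ($a and $b are hypothetical objects whose property values are all strings or value-type instances):
$a = [pscustomobject] @{ Name = 'foo'; Size = 1 }
$b = [pscustomobject] @{ Name = 'foo'; Size = 1 }
# Pass all property names to -Property; no output means no differences.
Compare-Object -ReferenceObject $a -DifferenceObject $b -Property $a.psobject.Properties.Name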

How best to speed up powershell processing time (compare-object)

I have a PowerShell script which uses Compare-Object to diff/compare a list of MD5 checksums against each other... how can I speed this up? It's been running for hours!
$diffmd5 =
(Compare-Object -ReferenceObject $localmd5 -DifferenceObject $remotefilehash |
Where-Object { ($_.SideIndicator -eq '=>') } |
Select-Object -ExpandProperty InputObject)
Compare-Object is convenient, but indeed slow; also avoiding the pipeline altogether is important for maximizing performance.
I suggest using a [System.Collections.Generic.HashSet[T]] instance, which supports high-performance lookups in a set of unordered[1] values:[2]
# Two sample arrays
$localmd5 = 'Foo1', 'Bar1', 'Baz1'
$remotefilehash = 'foo1', 'bar1', 'bar2', 'baz1', 'more'
# Create a hash set from the local hashes.
# Make lookups case-*insensitive*.
# Note: Strongly typing the input array ([string[]]) is a must.
$localHashSet = [System.Collections.Generic.HashSet[string]]::new(
    [string[]] $localmd5,
    [System.StringComparer]::OrdinalIgnoreCase
)
# Loop over all remote hashes to find those not among the local hashes.
$remotefilehash.Where({ -not $localHashSet.Contains($_) })
The above yields collection 'bar2', 'more'.
Note that if case-sensitive lookups are sufficient, which is the default (for string elements), a simple cast is sufficient to construct the hash set:
$localHashSet = [System.Collections.Generic.HashSet[string]] $localmd5
Note: Your later feedback states that $remotefilehash is a hashtable(-like) collection of key-value pairs rather than a collection of mere file-hash strings, in which the keys store the hash strings. In that case:
To find just the differing hash strings (note the .Keys property access to get the array of key values):
$remotefilehash.Keys.Where({ -not $localHashSet.Contains($_) })
To find those key-value pairs whose keys are not in the hash set (note the .GetEnumerator() call to enumerate all entries (key-value pairs)):
$remotefilehash.GetEnumerator().Where({ -not $localHashSet.Contains($_.Key) })
Alternatively, if the input collections are (a) of the same size and (b) have corresponding elements (that is, element 1 from one collection should be compared to element 1 from the other, and so on), using Compare-Object with -SyncWindow 0, as shown in js2010's helpful answer, with subsequent .SideIndicator filtering may be an option; to speed up the operation, the -PassThru switch should be used, which forgoes wrapping the differing objects in [pscustomobject] instances (the .SideIndicator property is then added as a NoteProperty member directly to the differing objects).
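A minimal sketch of that variant, assuming the two collections really are positionally aligned:
# Remote hashes that differ from the like-positioned local hash;
# -PassThru returns the original strings, with SideIndicator attached as a NoteProperty.
(Compare-Object $localmd5 $remotefilehash -SyncWindow 0 -PassThru).Where({ $_.SideIndicator -eq '=>' })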
[1] There is a related type for maintaining sorted values, System.Collections.Generic.SortedSet[T], but - as of .NET 6 - no built-in type for maintaining values in input order, though you can create your own type by deriving from [System.Collections.ObjectModel.KeyedCollection[TKey, TItem]]
[2] Note that a hash set - unlike a hash table - has no values associated with its entries. A hash set is "all keys", if you will - all it supports is testing for the presence of a key (which is itself the value).
By default, compare-object compares every element in the first array with every element in the second array (up to about 2 billion positions), so the order doesn't matter, but large lists would be very slow. -syncwindow 0 would be much faster but would require matches to be in the same exact positions:
Compare-Object $localmd5 $remotefilehash -syncwindow 0
As a simple demo of -syncwindow:
compare-object 1,2,3 1,3,2 -SyncWindow 0 # not equal
InputObject SideIndicator
----------- -------------
3 =>
2 <=
2 =>
3 <=
compare-object 1,2,3 1,3,2 -SyncWindow 1 # equal
compare-object 1,2,3 1,2,3 -SyncWindow 0 # equal
I feel this should be faster than Compare-Object
$result = [System.Collections.Generic.List[string]]::new()
foreach ($hash in $remotefilehash) {
    if (-not ($localmd5.Contains($hash))) {
        $result.Add($hash)
    }
}
The problem here is that the .Contains() method is case-sensitive. I believe all MD5 hashes use uppercase letters, but if that were not the case you would need to call the .ToUpper() or .ToLower() methods to normalize the arrays.
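If normalization does turn out to be necessary, a minimal sketch (assuming both inputs are plain string arrays):
$localUpper = [string[]] ($localmd5 | ForEach-Object { $_.ToUpper() })
$result = [System.Collections.Generic.List[string]]::new()
foreach ($hash in $remotefilehash) {
    if (-not $localUpper.Contains($hash.ToUpper())) {
        $result.Add($hash)
    }
}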

Powershell: Comparing Array (-notcontains) will not work

I want to compare two arrays (lists of VMs) and store the delta in a third array:
$ListOfVMs
$ListOfRunningVMs
$StoppedVMs = $ListOfVMs | { Where-Object $_.Name -notcontains $ListOfVMs.Name }
This filter still returns the complete content of $ListOfVMs, not just the delta. What am I doing wrong?
You may do the following:
$StoppedVMs = $ListOfVMs | Where-Object Name -notin $ListOfRunningVMs.Name
You need to pipe to Where-Object. Where-Object is what contains the script block (if you need to use it). You are also not comparing both lists here as you only reference $ListOfVMs.
Since you are comparing a single item against a collection, you will want to use -notin if the single item is on the left-hand side (LHS). -notcontains would be used if the collection is on the LHS.
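As a quick illustration of the two orientations (sample values, not from the question):
'b' -notin @('a', 'c')        # True: scalar on the LHS, collection on the RHS
@('a', 'c') -notcontains 'b'  # True: collection on the LHS, scalar on the RHS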

Comparing Two Arrays Without Using -Compare

I have two arrays: one contains multiple columns from a CSV file that was read in, and the other just contains server names; both are of type string. For this comparison, I plan on only using the Name column from the CSV file. I don't want to use -Compare because I want to still be able to use all CSV columns with the results. Here is an example of data from each array.
csvFile.Name:
linu40944
windo2094
windo4556
compareFile:
linu40944
windo2094
linu24455
As you can see, they contain similar server names, except $csvFile.Name contains 25,000+ records, and $compareFile contains only 3,500.
I've tried:
foreach ($server in $compareFile) {
    if ($csvFile.Name -like $server) {
        $count++
    }
}
Every time I run this, it takes forever to run, and results in $count having a value in the millions when it should be roughly 3,000. I've tried different variations of -match, -eq, etc. where -like is. Also note that my end goal is to do something else where $count is, but for now I'm just trying to make sure it is outputting as much as it should, which it is not.
Am I doing something wrong here? Am I using the wrong formatting?
One possible thought given the size of your data.
Create a hashtable (dictionary) for every name in the first/larger file. Name is the Key. Value is 0 for each.
For each name in your second/smaller/compare file, add 1 to the value in your hashtable IF it exists. If it does not exist, what is your plan???
Afterwards, you can dump all keys and values and see which ones are 0, 1, or >1 which may or may not be of value to you.
If you need help with this code, I may be able to edit my answer. Since you are new to StackOverflow, perhaps you want to try this first yourself.
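A minimal sketch of the hashtable approach described above (assuming $csvFile and $compareFile are already populated as in the question):
# Key every name from the larger file with a starting count of 0.
$lookup = @{}
foreach ($name in $csvFile.Name) { $lookup[$name] = 0 }
# For each name in the smaller file, increment the count if the key exists.
foreach ($server in $compareFile) {
    if ($lookup.ContainsKey($server)) { $lookup[$server]++ }
}
# Dump keys and counts; 0 = unmatched, 1 = matched once, >1 = duplicates.
$lookup.GetEnumerator() | Sort-Object Key | Format-Table Key, Value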
Build custom objects from $compareFile (so that you can compare the same property), then use Compare-Object with the parameter -PassThru for the comparison. Discriminate the results using the SideIndicator.
$ref = $compareFile | ForEach-Object {
    New-Object -Type PSObject -Property @{
        'Name' = $_
    }
}
Compare-Object $csvFile $ref -Property Name -PassThru | Where-Object {
    $_.SideIndicator -eq '<='
} | Select-Object -Property * -Exclude SideIndicator
The trailing Select-Object removes the additional property SideIndicator that Compare-Object adds to the result.