Powershell compare arrays and get unique values

I am currently trying to write a powershell script that can be run weekly on two CSV files, to check they both contain the same information. I want the script to output anything that appears in one file but not the other to a new file.
The script I have written so far compares the two but only adds <= and => to the values.
It also doesn't seem reliable: when I checked the output manually, I found entries that exist in both files.
Code below:
$NotPresents = Compare-Object -ReferenceObject $whatsup -DifferenceObject $vmservers -Property device
foreach ($NotPresent in $NotPresents)
{
Write-Host $NotPresent.device
}
$NotPresents | Out-File Filepath.txt
$NotPresents.count
Any ideas what I have done wrong?

In order to avoid having to iterate over one of the arrays more than once*, you may want to throw them each into a hashtable:
$whatsupTable = @{}
foreach($entry in $whatsup){
$whatsupTable[$entry.device] = $true
}
$vmserversTable = @{}
foreach($entry in $vmservers){
$vmserversTable[$entry.device] = $true
}
Now you can easily find the disjunction with a single loop and a lookup against the other table:
$NotInWhatsUp = $vmservers |Where { -not $whatsupTable[$_.device] }
$NotInVMServers = $whatsup |Where { -not $vmserversTable[$_.device] }
*) ok, technically we're looping through each twice, but still much better than nested looping
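For reference, the original Compare-Object approach can also be made to work: the <= and => markers live in the SideIndicator property of each result, so you can filter on it and select just the device names. A minimal sketch, assuming both inputs have a device column:
# '<=' marks devices only in the reference set ($whatsup);
# '=>' marks devices only in the difference set ($vmservers)
$NotPresents = Compare-Object -ReferenceObject $whatsup -DifferenceObject $vmservers -Property device
# Write just the device names (not the raw Compare-Object output) to the file
$NotPresents | ForEach-Object { $_.device } | Out-File Filepath.txt
# Or split the two directions if the report needs them separately
$onlyInWhatsup = $NotPresents | Where-Object { $_.SideIndicator -eq '<=' }
$onlyInVmservers = $NotPresents | Where-Object { $_.SideIndicator -eq '=>' }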

Related

Check if a condition is met by a line within a TXT but "in an advanced way"

I have a 1300 MB TXT file (huge thing). I want to build code that does two things:
Every line starts with a unique ID. For all lines sharing the same ID, I want to check whether the conditions are met for that "group". (This tells me: for how many lines with unique ID X have all conditions been met.)
When the script has finished, I want to remove all lines from the TXT where the condition was met, so I can rerun the script with another condition set to "narrow down" the whole document.
After a few cycles I will finally have a set of conditions that applies to all remaining lines in the document.
My current approach seems very slow (one cycle takes hours).
If you find an easier way to do that, feel free to recommend.
Help is welcome :)
Code so far (it does not fulfill everything from 1 & 2):
$array = @()
foreach ($item in $liste)
{
    # Check conditions
    if (($item -like "*XXX*") -and ($item -like "*YYY*") -and ($item -notlike "*ZZZ*")) {
        # Add a line to a document to see which lines match the condition
        Add-Content "C:\Desktop\it_seems_to_match.txt" "$item"
        # Retrieve the unique ID from the line and feed the array
        $array += $item.Split("/")[1]
        # Remove the line from the final document
        $liste = $liste -replace $item, ""
    }
}
# Pipe the "new cleaned" list somewhere
$liste | Set-Content -Path "C:\NewListToWorkWith.txt"
# Show me the counts
$array | group | % { $h = @{} } { $h[$_.Name] = $_.Count } { $h } | Out-File "C:\Desktop\count.txt"
Demo Lines:
images/STRINGA/2XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGA/3XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGB/4XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGB/5XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGC/5XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
Performance considerations:
Add-Content "C:\Desktop\it_seems_to_match.txt" "$item"
Try to avoid calling file cmdlets such as Add-Content once per iteration; each call opens and closes the file. Collect the output and write once, or stream through a single pipeline.
See also: Mastering the (steppable) pipeline
$array += $item.Split("/")[1]
Try to avoid using the increase assignment operator (+=) to create a collection
See also: Why should I avoid using the increase assignment operator (+=) to create a collection
$liste = $liste -replace $item, ""
This is a very expensive operation considering that you are reassigning (copying) a long list ($liste) with each iteration.
Besides, it is bad practice to modify a collection that you are currently iterating over.
$array | group | ...
Group-Object is a rather slow cmdlet; you are better off collecting (or counting) the items on the fly (where you do $array += $item.Split("/")[1]) using a hashtable, something like:
$HashTable = @{}   # initialized once, before the loop
$Name = $item.Split("/")[1]
if (!$HashTable.Contains($Name)) { $HashTable[$Name] = [Collections.Generic.List[String]]::new() }
$HashTable[$Name].Add($Item)
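Putting these points together, a rough sketch of a whole pass (a sketch under the stated assumptions, not a drop-in replacement; names like $counts, $matched and $kept are illustrative):
$counts  = @{}                                           # per-ID match counts
$matched = [Collections.Generic.List[string]]::new()     # lines that met the condition
$kept    = [Collections.Generic.List[string]]::new()     # lines for the next pass
foreach ($item in $liste) {
    if ($item -like "*XXX*" -and $item -like "*YYY*" -and $item -notlike "*ZZZ*") {
        $matched.Add($item)
        $Name = $item.Split('/')[1]
        $counts[$Name] = 1 + $counts[$Name]              # missing keys count as 0
    }
    else { $kept.Add($item) }
}
# one write per file instead of one Add-Content call per iteration
$matched | Set-Content "C:\Desktop\it_seems_to_match.txt"
$kept | Set-Content "C:\NewListToWorkWith.txt"
$counts | Out-File "C:\Desktop\count.txt"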
To minimize memory usage it may be better to read one line at a time and check whether it already exists. In the code below I used a StringReader; you can replace it with a StreamReader to read from a file. I'm checking whether the entire string exists, but you may want to split the line. Notice that I have duplicates in the input but not in the dictionary. See the code below:
$rows= #"
images/STRINGA/2XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGA/3XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGB/4XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGB/5XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGC/5XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGA/2XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGA/3XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGB/4XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGB/5XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGC/5XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
"#
$dict = [System.Collections.Generic.Dictionary[int, System.Collections.Generic.List[string]]]::new();
$reader = [System.IO.StringReader]::new($rows)
while(($row = $reader.ReadLine()) -ne $null)
{
$hash = $row.GetHashCode()
if($dict.ContainsKey($hash))
{
#check if list contains the string
if($dict[$hash].Contains($row))
{
#string is a duplicate
}
else
{
#add string to dictionary value if it is not in list
$list = $dict[$hash]
$list.Add($row)
}
}
else
{
#add new hash value to dictionary
$list = [System.Collections.Generic.List[string]]::new();
$list.Add($row)
$dict.Add($hash, $list)
}
}
$dict
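If the only goal is de-duplication, a HashSet[string] does the hash-collision bookkeeping above for you: Add() returns $false when the string has been seen before. A minimal sketch over the same input:
$seen = [System.Collections.Generic.HashSet[string]]::new()
$unique = [System.Collections.Generic.List[string]]::new()
$reader = [System.IO.StringReader]::new($rows)   # swap in a StreamReader for a real file
while (($row = $reader.ReadLine()) -ne $null) {
    if ($seen.Add($row)) { $unique.Add($row) }   # Add() returns $false for duplicates
}
$unique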

Powershell script. More efficient way to perform nested foreach loops? [duplicate]

This question already has answers here:
In PowerShell, what's the best way to join two tables into one?
Good day.
I wrote a script that imports Excel files and then compares the rows. Each file contains about 13K rows. It is taking about 3 hours to process, which seems too long. This is happening because I am looping through every 13K rows from fileb for each row in filea.
Is there a more efficient way to do this?
Here is sample code:
#Import rows as customObject
$rowsa = Import-Excel $filea
$rowsb = Import-Excel $fileb
#Loop through each filea rows
foreach ($rowa in $rowsa)
{
#Loop through each fileb row. If the upc code matches rowa, check if other fields match
foreach ($rowb in $rowsb)
{
$rowb | Where-Object -Property "UPC Code" -Like $rowa.upc |
Foreach-Object {
if (( $rowa.uom2 -eq 'INP') -and ( $rowb.'Split Quantity' -ne $rowa.qty1in2 ))
{
#Do Something
}
}
}
}
Seems like you can leverage Group-Object -AsHashtable for this. See about Hash Tables for more info on why this should be faster.
$mapB = Import-Excel $fileb | Group-Object 'UPC Code' -AsHashTable -AsString
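# Note: Group-Object -AsHashTable stores a collection per key, so $value below
# holds every FileB row sharing that UPC (usually just one)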
foreach($row in Import-Excel $filea) {
if($mapB.ContainsKey($row.upc)) {
$value = $mapB[$row.upc]
if($row.uom2 -eq 'INP' -and $row.qty1in2 -ne $value.'Split Quantity') {
$value # => has the row matching on UPC (FileA) / UPC Code (FileB)
$row # => current row in FileA
}
}
}
A few tricks:
The Object Pipeline may be easy, but it's not as fast as a statement
Try changing your code to use foreach statements instead of Where-Object and ForEach-Object.
Use Hashtables to group.
While you can use Group-Object to do this, Group-Object suffers from the same performance problems as anything else in the pipeline.
Try to limit looping within looping.
As a general rule, looping within looping is O(n²). If you can avoid loops within loops, great. So switching the code around to loop through A, then loop through B, will be more efficient. So will exiting your loops as quickly as possible.
Consider using a benchmarking tool
There's a little module I make called Benchpress that can help you test multiple approaches to see which is faster. The Benchpress docs have a number of general PowerShell performance benchmarks to help you determine the fastest way to script a given thing.
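Module aside, the built-in Measure-Command is enough for a quick head-to-head of two approaches. A minimal sketch (the filter here is illustrative, not the asker's exact logic):
$pipeline = Measure-Command { $rowsa | Where-Object { $_.uom2 -eq 'INP' } }
$statement = Measure-Command { foreach ($r in $rowsa) { if ($r.uom2 -eq 'INP') { $r } } }
"pipeline : {0:n0} ms" -f $pipeline.TotalMilliseconds
"statement: {0:n0} ms" -f $statement.TotalMilliseconds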
Updated Script Below:
#Import rows as customObject
$rowsa = Import-Excel $filea
$rowsb = Import-Excel $fileb
$rowAByUPC = @{}
foreach ($rowA in $rowsa) {
# This assumes there will only be one row per UPC.
# If there is more than one, you may need to make a list here instead
$rowAByUPC[$rowA.UPC] = $rowA
}
foreach ($rowB in $rowsB) {
# Skip any rows in B that don't have a UPC code.
$rowBUPC = $rowB."UPC Code"
if (-not $rowBUPC) { continue }
$RowA = $rowAByUPC[$rowBUPC]
# It seems only rows that have 'INP' in uom2 are important
# so continue if missing
if ($rowA.uom2 -ne 'INP') { continue }
if ($rowA.qty1in2 -ne $rowB.'Split Quantity') {
# Do what you came here to do.
}
}
Please note that as you have not shared the code within the matching condition, you may need to take the advice contained in this answer and apply it to the inner code.

Powershell .Where() method with multiple properties

I have a GenericList of Hashtables, and I need to test for the existence of a record based on two properties. In my hash table, I have two records that share one property value, but are different on another property value.
Specifically, DisplayName of both is Autodesk Content for Revit 2023
But UninstallString for one is MsiExec.exe /X{GUID} while the other is C:\Program Files\Autodesk\AdODIS\V1\Installer.exe followed by a few hundred characters of other info
I want to select only the one with AdODIS in the UninstallString. And I would like to do it without a loop, and specifically using the .Where() method rather than the pipeline and Where-Object.
There are also MANY other records.
I CAN select just based on one property, like this...
$rawKeys.Where({$_.displayName -eq 'Autodesk Content for Revit 2023'})
And I get the appropriate two records returned. However, when I try expanding that to two properties with different criteria, like this...
$rawKeys.Where({($_.displayName -eq 'Autodesk Content for Revit 2023') -and ($_.uninstallString -like 'MsiExec.exe*')})
nothing is returned. I also tried chaining the .Where() calls, like this...
$rawKeys.Where({$_.displayName -eq 'Autodesk Content for Revit 2023'}).Where({$_.uninstallString -like 'MsiExec.exe*'})
and again, nothing returned.
Just to be sure the second condition was working, I tried...
$rawKeys.Where({$_.uninstallString -like 'MsiExec.exe*'})
and got multiple records returned, as expected.
I found [this][1], which talks about doing it with Where-Object, and applying that approach to the method was my first attempt. But I have yet to see either an example of doing it with .Where() or something specifically saying .Where() is limited to one conditional.
So, am I just doing something wrong? Or is this actually not possible with .Where(), leaving me no choice but to use the pipeline? Even there, based on that link, I would have thought that some variation on...
$rawKeys | Where-Object {(($_.displayName -eq 'Autodesk Content for Revit 2023') -and ($_.uninstallString -like 'MsiExec.exe*'))}
would work, but that's failing too.
I also tried...
$rawKeys.Where({$_.displayName -eq 'Autodesk Content for Revit 2023'}) -and $rawKeys.Where({$_.uninstallString -like 'MsiExec.exe*'})
And THAT returns true, which for my current need is enough. But one: I would like to know if it can be done in a single method call; and two: I can imagine I will eventually want to get the record(s) back rather than just a bool, which is only possible with the single method call.
EDIT: OK, this is weird. I tried doing a minimal example of actual data, like this...
$rawKeys = New-Object System.Collections.Generic.List[Hashtable]
$rawKeys.Add(@{
displayName = 'Autodesk Content for Revit 2023'
uninstallString = 'C:\Program Files\Autodesk\AdODIS\V1\Installer.exe whatever else is here'
guid = '{019AEF66-C054-39BB-88AD-B2D8EA9BE40A}'
})
$rawKeys.Add(@{
displayName = 'Autodesk Content for Revit 2023'
uninstallString = 'MsiExec.exe /X{205C6D76-2023-0057-B227-DC6376F702DC}'
guid = '{205C6D76-2023-0057-B227-DC6376F702DC}'
})
and that WORKS. So somewhere in my real code I am changing the data, and for the life of me I can't see where it's happening. But it's happening. The ACTUAL data comes from the registry, with this code...
$uninstallKeyPaths = @('SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall',
'SOFTWARE\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall')
$rawKeys = New-Object System.Collections.Generic.List[Hashtable]
$localMachineHive = [Microsoft.Win32.RegistryKey]::OpenBaseKey([Microsoft.Win32.RegistryHive]::LocalMachine, 0)
foreach ($uninstallKeyPath in $uninstallKeyPaths) {
foreach ($uninstallKeyName in $localMachineHive.OpenSubKey($uninstallKeyPath).GetSubKeyNames()) {
if ($uninstallKeyPath -like '*Wow6432Node*') {
$bitness = 'x32'
} else {
$bitness = 'x64'
}
$uninstallKey = $localMachineHive.OpenSubKey("$uninstallKeyPath\$uninstallKeyName")
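# Caution: -and/-or short-circuit, so when InstallDate has a value the
# ($uninstallString = ...) assignment below never runs and $uninstallString keeps
# whatever it held from a previous iteration; a plausible cause of the mixed-up data.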
if (($displayName = $uninstallKey.GetValue('DisplayName')) -and ($displayVersion = $uninstallKey.GetValue('DisplayVersion')) -and
(($installDate = $uninstallKey.GetValue('InstallDate')) -or ($uninstallString = $uninstallKey.GetValue('UninstallString')))) {
$keyName = [System.IO.Path]::GetFileName($uninstallKey.Name)
$keyData = @{
displayName = $displayName
displayVersion = $displayVersion
guid = "$(if ($keyName -match $pattern.guid) {$keyName})" #$Null
publisher = $uninstallKey.GetValue('Publisher')
uninstallString = $uninstallString
installDate = $installDate
properties = (#($uninstallKey.GetValueNames()) | Sort-Object) -join ', '
type = $bitness
}
[void]$rawKeys.Add($keyData)
}
}
}
So, meaningless unless you actually have Autodesk Revit 2023 installed on your machine, but maybe someone sees where I am changing the data.
[1]: Where-object $_ matches multiple criterias

Fast compare two large CSVs (both rows and columns) in powershell

I have two large CSVs to compare. Both CSVs are basically data from the same system, 1 day apart. There are around 12k rows and 30 columns.
The aim is to identify what column data has changed for each primary key (ID).
My idea was to loop through the CSVs to identify which rows have changed and dump these into a separate CSV. Once done, I loop through the changed rows again and identify the exact change per column.
$NewCSV = Import-Csv -Path ".\Data_A.csv"
$OldCSV = Import-Csv -Path ".\Data_B.csv"
foreach ($LineNew in $NewCSV)
{
ForEach ($LineOld in $OldCSV)
{
If($LineNew -eq $LineOld)
{
Write-Host $LineNew, " Match"
}else{
Write-Host $LineNew, " Not Match"
}
}
}
But as soon as I run the loop, it takes forever for 12k rows. I was hoping there is a more efficient way to compare large files in PowerShell. Something quicker.
Well, you can give this a try. I'm not claiming it will be fast, for the reasons vonPryz has already pointed out, but it should give you a good side-by-side perspective to compare what has changed from OldCsv to NewCsv.
Note: Those cells that have the same value on both CSVs will be ignored.
$NewCSV = Import-Csv -Path ".\Data_A.csv"
$OldCSV = Import-Csv -Path ".\Data_B.csv" | Group-Object ID -AsHashTable -AsString
$properties = $newCsv[0].PSObject.Properties.Name
$result = foreach($line in $NewCSV)
{
if($ref = $OldCSV[$line.ID])
{
foreach($prop in $properties)
{
if($line.$prop -ne $ref.$prop)
{
[pscustomobject]@{
ID = $line.ID
Property = $prop
OldValue = $ref.$prop
NewValue = $line.$prop
}
}
}
continue
}
Write-Warning "ID $($line.ID) could not be found on Old Csv!!"
}
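To keep the differences around, the collected objects can then be exported in one go, e.g.:
$result | Export-Csv -Path .\Differences.csv -NoTypeInformation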
As vonPryz hints in the comments, you've written an algorithm with quadratic time complexity (O(n²) in Big-O notation) - every time the input size doubles, the number of computations performed increase 4-fold.
To avoid this, I'd suggest using a hashtable or other dictionary type to hold each data set, and use the primary key from the input as the dictionary key. This way you get constant-time lookup of corresponding records, and the time complexity of your algorithm becomes near-linear (O(2n + k)):
$NewCSV = @{}
Import-Csv -Path ".\Data_A.csv" |ForEach-Object {
$NewCSV[$_.ID] = $_
}
$OldCSV = @{}
Import-Csv -Path ".\Data_B.csv" |ForEach-Object {
$OldCSV[$_.ID] = $_
}
Now that we can efficiently resolve each row by its ID, we can inspect the whole of the data sets with an independent loop over each:
foreach($entry in $NewCSV.GetEnumerator()){
if(-not $OldCSV.ContainsKey($entry.Key)){
# $entry.Value is a new row, not seen in the old data set
continue  # skip the field comparison below for rows without a counterpart
}
$newRow = $entry.Value
$oldRow = $OldCSV[$entry.Key]
# do the individual comparison of the rows here
}
Do another loop like the one above, but iterating over $OldCSV and checking against $NewCSV, to detect deletions.
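A sketch of that per-row comparison, reusing the property-by-property diff idea from the answer above (assuming both files share the same column names):
$properties = ($NewCSV.Values | Select-Object -First 1).PSObject.Properties.Name
$diff = foreach ($entry in $NewCSV.GetEnumerator()) {
    if (-not $OldCSV.ContainsKey($entry.Key)) { continue }   # new row, handled separately
    $newRow = $entry.Value
    $oldRow = $OldCSV[$entry.Key]
    foreach ($prop in $properties) {
        if ($newRow.$prop -ne $oldRow.$prop) {
            [pscustomobject]@{
                ID = $entry.Key
                Property = $prop
                OldValue = $oldRow.$prop
                NewValue = $newRow.$prop
            }
        }
    }
}
$diff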

Powershell array of arrays loop process

I need help with loop processing an array of arrays. I have finally figured out how to do it, and I am doing it as such...
$serverList = $1Servers,$2Servers,$3Servers,$4Servers,$5Servers
$serverList | % {
% {
Write-Host $_
}
}
I can't get it to process correctly. What I'd like to do is create a CSV from each array, and title the lists accordingly: 1Servers.csv, 2Servers.csv, etc. The thing I cannot figure out is how to get the original array name into the filename. Is there a variable that holds the list object name that can be accessed within the loop? Do I need to do a separate single loop for each list?
You can try:
$1Servers = "Mach1","Mach2"
$2Servers = "Mach3","Mach4"
$serverList = $1Servers,$2Servers
$serverList | % {$i=0}{$i+=1;$_ | % {New-Object -Property @{"Name"=$_} -TypeName PsCustomObject} |Export-Csv "c:\temp\$($i)Servers.csv" -NoTypeInformation }
I take each list and create new objects that I export to a CSV file. The way I build the file name is not so nice: I don't take the variable name, I just recreate it, so if your list is not sorted it will not work.
It would perhaps be more efficient to store your servers in hash tables:
$1Servers = @{Name="1Servers"; Computers="Mach1","Mach2"}
$2Servers = @{Name="2Servers"; Computers="Mach3","Mach4"}
$serverList = $1Servers,$2Servers
$serverList | % {$name=$_.name;$_.computers | % {New-Object -Property @{"Name"=$_} -TypeName PsCustomObject} |Export-Csv "c:\temp\$($name).csv" -NoTypeInformation }
Much like JPBlanc's answer, I kinda have to kludge the filename... (FWIW, I can't see how you can get that out of the array itself).
I did this example w/ foreach instead of foreach-object (%). Since you have actual variable names you can address w/ foreach, it seems a little cleaner, if nothing else, and hopefully a little easier to read/maintain:
$1Servers = "apple.contoso.com","orange.contoso.com"
$2Servers = "peach.contoso.com","cherry.contoso.com"
$serverList = $1Servers,$2Servers
$counter = 1
foreach ( $list in $serverList ) {
$fileName = "{0}Servers.csv" -f $counter++
"FileName: $fileName"
foreach ( $server in $list ) {
"-- ServerName: $server"
}
}
I was able to resolve this issue myself. Because I wasn't able to get the object name through, I just changed the nature of the object. So now my server lists consist of two columns, one of which is the name of the list itself.
So...
$1Servers += [pscustomobject] @{
Servername = $entry.Servername
Domain = $entry.Domain
}
Then...
$serverList = $usaServers,$devsubServers,$wtencServers,$wtenclvServers,$pcidevServers
Then I am able to use that second column to name the lists within my foreach loop.
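With the list name carried in each row (here assumed to be the Domain column, per the object above), each list can be exported by grouping on that column. A sketch:
# flatten the list of lists, then write one CSV per list name
$serverList | ForEach-Object { $_ } | Group-Object Domain | ForEach-Object {
    $_.Group | Export-Csv "C:\temp\$($_.Name).csv" -NoTypeInformation
}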