I have been using code from this answer to check for additions/changes to class rosters from MS Teams:
$set = [System.Collections.Generic.HashSet[string]]::new(
[string[]] (Import-CSV -Path student.csv).UserPrincipalName,
[System.StringComparer]::InvariantCultureIgnoreCase
)
Import-Csv ad.csv | Where-Object { $set.Add($_.UserPrincipalName) } |
Export-Csv path\to\output.csv -NoTypeInformation
Ideally, I want to be able to check if there have been removals when compared to a new file, swap the import file positions, and check for additions. If my files look like Source1 and Source2 (below), the check for removals would return Export1, and the check for additions would return Export2.
Since there will be multiple instances of students across multiple classes, I want to include TeamDesc in the filter query to make sure only the specific instance of that student with that class is returned.
Source1.csv
TeamDesc,UserPrincipalName,Name
Team 1,student1@domain.com,john smith
Team 1,student2@domain.com,nancy drew
Team 2,student3@domain.com,harvey dent
Team 3,student1@domain.com,john smith
Source2.csv
TeamDesc,UserPrincipalName,Name
Team 1,student2@domain.com,nancy drew
Team 2,student3@domain.com,harvey dent
Team 2,student4@domain.com,tim tams
Team 3,student1@domain.com,john smith
Export1.csv
TeamDesc,UserPrincipalName,Name
Team 1,student1@domain.com,john smith
Export2.csv
TeamDesc,UserPrincipalName,Name
Team 2,student4@domain.com,tim tams
Try the following, which uses Compare-Object to compare the CSV files by two column values: simply pass the property (column) names of interest to -Property. The resulting output is then split into two collections with the intrinsic .Where() method, based on which input side each differing property combination is unique to:
# 'Split' returns the matching rows first ('<=', only in Source1 = removed),
# then all the others ('=>', only in Source2 = added)
$removed, $added = (
  Compare-Object (Import-Csv Source1.csv) (Import-Csv Source2.csv) -PassThru `
    -Property TeamDesc, UserPrincipalName
).Where({ $_.SideIndicator -eq '<=' }, 'Split')
$removed |
  Select-Object * -ExcludeProperty SideIndicator |
  Export-Csv -NoTypeInformation Export1.csv
$added |
  Select-Object * -ExcludeProperty SideIndicator |
  Export-Csv -NoTypeInformation Export2.csv
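To see which collection ends up where, here is a self-contained sketch of the same Compare-Object call run against the question's sample rows. The rows are inlined with ConvertFrom-Csv instead of being read from Source1.csv and Source2.csv, purely so the sketch runs without files:

```powershell
# Question's sample rows, inlined in place of Import-Csv (assumption for a file-free run)
$csv1 = @'
TeamDesc,UserPrincipalName,Name
Team 1,student1@domain.com,john smith
Team 1,student2@domain.com,nancy drew
Team 2,student3@domain.com,harvey dent
Team 3,student1@domain.com,john smith
'@ | ConvertFrom-Csv

$csv2 = @'
TeamDesc,UserPrincipalName,Name
Team 1,student2@domain.com,nancy drew
Team 2,student3@domain.com,harvey dent
Team 2,student4@domain.com,tim tams
Team 3,student1@domain.com,john smith
'@ | ConvertFrom-Csv

# '<=' marks (TeamDesc, UPN) pairs only in $csv1, '=>' pairs only in $csv2
$removed, $added = (
    Compare-Object $csv1 $csv2 -Property TeamDesc, UserPrincipalName -PassThru
).Where({ $_.SideIndicator -eq '<=' }, 'Split')

$removed | Select-Object * -ExcludeProperty SideIndicator
$added   | Select-Object * -ExcludeProperty SideIndicator
```

With this data, $removed holds only the Team 1 / student1 row and $added holds only the Team 2 / student4 row. These correspond to Export1.csv and Export2.csv.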
Assuming both CSVs are loaded into memory, with Source1.csv in $csv1 and Source2.csv in $csv2, you already have the logic for Export2.csv using the HashSet<T>:
$set = [System.Collections.Generic.HashSet[string]]::new(
[string[]] $csv1.UserPrincipalName,
[System.StringComparer]::InvariantCultureIgnoreCase
)
$csv2 | Where-Object { $set.Add($_.UserPrincipalName) }
Outputs:
TeamDesc UserPrincipalName   Name
-------- -----------------   ----
Team 2   student4@domain.com tim tams
For the first requirement, Export1.csv, the reference object would be $csv2. Instead of a HashSet<T> you could use a hash table, and Group-Object -AsHashTable makes that really easy in this case:
$map = $csv2 | Group-Object UserPrincipalName -AsHashTable -AsString
# if Csv2 has unique values for `UserPrincipalName`
$csv1 | Where-Object { $map[$_.UserPrincipalName].TeamDesc -ne $_.TeamDesc }
# if Csv2 has duplicated values for `UserPrincipalName`
$csv1 | Where-Object { $_.TeamDesc -notin $map[$_.UserPrincipalName].TeamDesc }
Outputs:
TeamDesc UserPrincipalName   Name
-------- -----------------   ----
Team 1   student1@domain.com john smith
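The HashSet<T> snippet above keys on UserPrincipalName alone, but the question also asks for TeamDesc to be taken into account. Here is a minimal sketch that builds a composite key for each row, so that one HashSet<T> per file answers both the removals and the additions question. It assumes neither column ever contains a '|' character. Get-RosterKey is a hypothetical helper name, and the sample rows are inlined in place of Import-Csv:

```powershell
# Question's sample rows, inlined so the sketch runs without the CSV files
$csv1 = @'
TeamDesc,UserPrincipalName,Name
Team 1,student1@domain.com,john smith
Team 1,student2@domain.com,nancy drew
Team 2,student3@domain.com,harvey dent
Team 3,student1@domain.com,john smith
'@ | ConvertFrom-Csv

$csv2 = @'
TeamDesc,UserPrincipalName,Name
Team 1,student2@domain.com,nancy drew
Team 2,student3@domain.com,harvey dent
Team 2,student4@domain.com,tim tams
Team 3,student1@domain.com,john smith
'@ | ConvertFrom-Csv

# Hypothetical helper: one lookup key per row, built from both columns.
# Assumes '|' never appears in TeamDesc or UserPrincipalName.
function Get-RosterKey($row) { '{0}|{1}' -f $row.TeamDesc, $row.UserPrincipalName }

# One case-insensitive set of composite keys per file
$keys1 = [System.Collections.Generic.HashSet[string]]::new(
    [string[]] ($csv1 | ForEach-Object { Get-RosterKey $_ }),
    [System.StringComparer]::InvariantCultureIgnoreCase
)
$keys2 = [System.Collections.Generic.HashSet[string]]::new(
    [string[]] ($csv2 | ForEach-Object { Get-RosterKey $_ }),
    [System.StringComparer]::InvariantCultureIgnoreCase
)

# Source1 rows whose (TeamDesc, UPN) pair is missing from Source2 -> Export1.csv
$removed = @($csv1 | Where-Object { -not $keys2.Contains((Get-RosterKey $_)) })
# Source2 rows whose pair did not exist in Source1 -> Export2.csv
$added = @($csv2 | Where-Object { -not $keys1.Contains((Get-RosterKey $_)) })

$removed
$added
```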
Using this Join-Object script/Join-Object Module (see also: "How to compare two CSV files and output the rows that are just in either of the file but not in both" and "In Powershell, what's the best way to join two tables into one?"):
Loading your sample data:
(In your case you probably want to use Import-Csv to import your data)
Install-Script -Name Read-HtmlTable
$Csv1 = Read-HtmlTable https://stackoverflow.com/q/74452725 -Table 0 # Import-Csv .\Source1.csv
$Csv2 = Read-HtmlTable https://stackoverflow.com/q/74452725 -Table 1 # Import-Csv .\Source2.csv
Install-Module -Name JoinModule
$Csv1 |OuterJoin $Csv2 -On TeamDesc, UserPrincipalName -Name Out,In
TeamDesc UserPrincipalName   OutName    InName
-------- -----------------   -------    ------
Team 1   student1@domain.com john smith
Team 2   student4@domain.com            tim tams
You might use the (single) result file as is. If you really want to work with two different files, you might split the results as in the nice answer from mklement0.
I have a PS custom object that is in this format:
|Name|Number|Email|
|----|------|-----|
|Bob| 23| bob.bob@test.com|
|Tom|124|tom.tom@test.com|
|Jeff|125|jeff.jeff@test.com|
|Jeff|127|jeff.jeff@test.com|
|Jeff|129|jeff.jeff@test.com|
|Jessica|126|jessica.jessica@test.com|
|Jessica|132|jessica.jessica@test.com|
I'd like to group together the rows that share the same name and email, combining their numbers, i.e.:
|Name|Number|Email|
|----|------|-----|
|Bob|123|bob.bob@test.com|
|Tom|124|tom.tom@test.com|
|Jeff|125,127,129|jeff.jeff@test.com|
|Jessica|126,132|jessica.jessica@test.com|
I've tried various combinations of Compare-Object, Sort-Object, creating a new array, etc., but I can't seem to get it to work.
Any ideas?
Group-Object is perfectly suited for that task. With the help of calculated properties, you can create a select statement that will produce the desired output in one sweep.
$data = @'
Name|Number|Email
Bob| 23| bob.bob@test.com
Tom|124|tom.tom@test.com
Jeff|125|jeff.jeff@test.com
Jeff|127|jeff.jeff@test.com
Jeff|129|jeff.jeff@test.com
Jessica|126|jessica.jessica@test.com
Jessica|132|jessica.jessica@test.com
'@ | ConvertFrom-Csv -Delimiter '|'
# Group by name and email
$data | Group-Object -Property Name, Email |
Select-Object @{'Name' = 'Name' ; 'Expression' = { $_.Group[0].Name } },
@{'Name' = 'Number' ; 'Expression' = { $_.Group.Number } },
@{'Name' = 'Email' ; 'Expression' = { $_.Group[0].Email } }
Output
Name    Number          Email
----    ------          -----
Bob     23              bob.bob@test.com
Tom     124             tom.tom@test.com
Jeff    {125, 127, 129} jeff.jeff@test.com
Jessica {126, 132}      jessica.jessica@test.com
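If the result is headed for Export-Csv rather than the console, an array-valued Number property would be written out as its type name (System.Object[]). The following variation of the answer's calculated property joins the numbers into one comma-separated string instead, which matches the 125,127,129 format in the question. It is a sketch using the answer's sample data:

```powershell
$data = @'
Name|Number|Email
Bob|23|bob.bob@test.com
Tom|124|tom.tom@test.com
Jeff|125|jeff.jeff@test.com
Jeff|127|jeff.jeff@test.com
Jeff|129|jeff.jeff@test.com
Jessica|126|jessica.jessica@test.com
Jessica|132|jessica.jessica@test.com
'@ | ConvertFrom-Csv -Delimiter '|'

# Same grouping as above, but Number becomes a single string such as '125,127,129'
$result = $data | Group-Object -Property Name, Email |
    Select-Object @{ Name = 'Name';   Expression = { $_.Group[0].Name } },
                  @{ Name = 'Number'; Expression = { $_.Group.Number -join ',' } },
                  @{ Name = 'Email';  Expression = { $_.Group[0].Email } }

# Safe to export now: every property is a plain string
$result | ConvertTo-Csv -NoTypeInformation
```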
References
Group-Object
about calculated properties
Say (hypothetically) I have two CSVs that I'm comparing to see which of my current members are original members. I wrote a nested ForEach-Object that compares every $name and $memberNumber from each object against every other object. It works, but it takes way too long, especially since each CSV has tens of thousands of objects. Is there another way I should approach this?
Original_Members.csv
Name, Member_Number
Alice, 1234
Jim , 4567
Current_Members.csv
Alice, 4599
Jim, 4567
$currentMembers = import-csv $home\Desktop\current_members.csv |
  ForEach-Object {
    $name = $_.Name
    $memNum = $_.Member_Number
    $ogMember = "No"
    # Re-imports and scans the whole original list for every current member (slow)
    import-csv "$home\Desktop\original_members.csv" |
      ForEach-Object {
        If ($_.Name -eq $name -and $_.Member_Number -eq $memNum) {
          $ogMember = "Yes"
        }
      }
    [pscustomobject]@{
      "Name"=$name
      "Member Number"=$memNum
      "Original Member?"=$ogMember
    }
  } |
  select "Name","Member Number","Original Member?" |
  Export-CSV "$home\Desktop\OG_Compare_$(get-date -uformat "%d%b%Y").csv" -Append -NoTypeInformation
Assuming both of your files are like the below:
Original_Members.csv
Name, Member_Number
Alice, 1234
Jim, 4567
Current_Members.csv
Name, Member_Number
Alice, 4599
Jim, 4567
You could store the original member names in a System.Collections.Generic.HashSet<T> for constant-time lookups, instead of doing a linear search for each name. We can use System.Linq.Enumerable.ToHashSet to create a HashSet<string> from the string[] of names.
We can then use Where-Object to filter the current members by checking whether the hash set contains each current member's name with System.Collections.Generic.HashSet<T>.Contains(T), which is an O(1) operation.
$originalMembers = Import-Csv -Path .\Original_Members.csv
$currentMembers = Import-Csv -Path .\Current_Members.csv
$originalMembersLookup = [Linq.Enumerable]::ToHashSet(
[string[]]$originalMembers.Name,
[StringComparer]::CurrentCultureIgnoreCase
)
$currentMembers |
Where-Object {$originalMembersLookup.Contains($_.Name)}
Which will output the current members that were original members:
Name Member_Number
---- -------------
Alice 4599
Jim 4567
Update
As requested in the comments, if we want to check both Name and Member_Number, we can concatenate both strings, with a separator between them, to use for lookups:
$originalMembers = Import-Csv -Path .\Original_Members.csv
$currentMembers = Import-Csv -Path .\Current_Members.csv
$originalMembersLookup = [Linq.Enumerable]::ToHashSet(
  [string[]]($originalMembers |
    ForEach-Object {
      # '|' keeps e.g. 'Al'+'1234' and 'Al1'+'234' from producing the same key
      $_.Name + '|' + $_.Member_Number
    }),
  [StringComparer]::CurrentCultureIgnoreCase
)
$currentMembers |
  Where-Object {$originalMembersLookup.Contains($_.Name + '|' + $_.Member_Number)}
Which will now only return:
Name Member_Number
---- -------------
Jim 4567
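The question's own script produced an "Original Member?" Yes/No column for every current member, rather than a filtered list. The sketch below combines that report shape with the composite-key lookup. It assumes the same Name/Member_Number headers (inlined here instead of Import-Csv) and a '|' separator that never occurs in names:

```powershell
$originalMembers = @'
Name,Member_Number
Alice,1234
Jim,4567
'@ | ConvertFrom-Csv
$currentMembers = @'
Name,Member_Number
Alice,4599
Jim,4567
'@ | ConvertFrom-Csv

# Build the lookup once, instead of rescanning the originals for every current row
$lookup = [System.Collections.Generic.HashSet[string]]::new(
    [System.StringComparer]::CurrentCultureIgnoreCase
)
foreach ($m in $originalMembers) {
    [void] $lookup.Add($m.Name + '|' + $m.Member_Number)
}

# One output row per current member, with an O(1) membership check each
$report = foreach ($m in $currentMembers) {
    [pscustomobject]@{
        'Name'             = $m.Name
        'Member Number'    = $m.Member_Number
        'Original Member?' = if ($lookup.Contains($m.Name + '|' + $m.Member_Number)) { 'Yes' } else { 'No' }
    }
}
$report
```

$report can then be piped to Export-Csv, just as in the question's script.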
Wondering if someone would be able to help me. I'm trying to import, group, sum and then export a CSV. The problem is that my CSV has an unknown number of columns in the following format:
GroupA,GroupB,GroupC,ValueA,ValueB,ValueC,ValueD...
GroupA, B and C are fixed and are the fields I want to group by; I know the names of these fields in advance. The problem is that there is an unknown number of Value columns, all of which I want to sum (and I don't know their names in advance).
I'm comfortable getting this code working if I know the name of the Value fields and have a fixed number of Value Fields. But I'm struggling to get code for unknown names and number of columns.
$csvImport = import-csv 'C:\input.csv'
$csvGrouped = $csvImport | Group-Object -property GroupA,GroupB,GroupC
$csvGroupedFinal = $csvGrouped | Select-Object @{Name = 'GroupA';Expression={$_.Values[0]}},
@{Name = 'GroupB';Expression={$_.Values[1]}},
@{Name = 'GroupC';Expression={$_.Values[2]}},
@{Name = 'ValueA' ;Expression={
($_.Group|Measure-Object 'ValueA' -Sum).Sum
}}
$csvGroupedFinal | Export-Csv 'C:\output.csv' -NoTypeInformation
Example Input Data -
GroupA, GroupB, Value A
Sam, Apple, 10
Sam, Apple, 20
Sam, Orange, 50
Ian, Apple, 15
Output Data -
GroupA, GroupB, Value A
Sam, Apple, 30
Sam, Orange, 50
Ian, Apple, 15
The following script should work. Pay attention to the $FixedNames variable:
$csvImport = @"
Group A,Group B,Value A
sam,apple,10
sam,apple,20
sam,orange,50
ian,apple,15
"@ | ConvertFrom-Csv
$FixedNames = @('Group A', 'Group B', 'Group C')
# $aux = ($csvImport|Get-Member -MemberType NoteProperty).Name ### sorted (wrong)
$aux = ($csvImport[0].psobject.Properties).Name ### not sorted
$auxGrpNames = @( $aux | Where-Object {$_ -in $FixedNames})
$auxValNames = @( $aux | Where-Object {$_ -notin $FixedNames})
$csvGrouped = $csvImport | Group-Object -property $auxGrpNames
$csvGroupedFinal = $csvGrouped |
ForEach-Object {
($_.Name.Replace(', ',','), (($_.Group |
Measure-Object -Property $auxValNames -Sum
).Sum -join ',')) -join ','
} | ConvertFrom-Csv -Header $aux
$csvGroupedFinal
Tested likewise for
$csvImport = @"
Group A,Group B,Value A,Value B
sam,apple,10,1
sam,apple,20,
sam,orange,50,5
ian,apple,15,51
"@ | ConvertFrom-Csv
as well as for more complex data with a Group A,Group B,Group C,Value A,Value B header.
Edit: updated according to LotPings' helpful comment.
After importing, this script splits the properties (columns) into groups and values.
It groups dynamically and sums only the value fields, regardless of how many there are.
The input column order is maintained with a final Select-Object.
## Q:\Test\2019\01\17\SO_54237887.ps1
$csvImport = Import-Csv '.\input.csv'
$Cols = ($csvImport[0].psobject.Properties).Name
# get list of group columns by name and wildcard
$GroupCols = $Cols | Where-Object {$_ -like 'Group*'}
# a different approach would be to select a number of leading columns
# $GroupCols = $Cols[0..1]
$ValueCols = $Cols | Where-Object {$_ -notin $GroupCols}
$OutCols = ,'Groups' + $ValueCols
$csvGrouped = $csvImport | Group-Object $GroupCols | ForEach-Object{
$Props = @{Groups=$_.Name}
ForEach ($ValCol in $ValueCols){
$Props.Add($ValCol,($_.Group|Measure-Object $ValCol -Sum).Sum)
}
[PSCustomObject]$Props
}
$csvGrouped | Select-Object $OutCols
With this sample input file
GroupA GroupB ValueA ValueB
------ ------ ------ ------
Sam Apple 10 15
Sam Apple 20 25
Sam Orange 50 75
Ian Apple 15 20
Sample output for any number of Groups and values
Groups ValueA ValueB
------ ------ ------
Sam, Apple 30 40
Sam, Orange 50 75
Ian, Apple 15 20
Without any change to the code, it also processes the data from Hassan's answer:
Groups ValueA ValueB ValueC
------ ------ ------ ------
Sam, Apple 30 4 20
Sam, Orange 50 4 5
Ian, Apple 15 3 3
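An alternative to the Hashtable-plus-Select-Object step is an [ordered] dictionary, which preserves insertion order. The group columns can also be copied from each group's first row, so they stay as separate columns instead of being merged into one Groups string. This is a sketch under the same Group* naming assumption, using the sample input above inlined via ConvertFrom-Csv:

```powershell
$csvImport = @'
GroupA,GroupB,ValueA,ValueB
Sam,Apple,10,15
Sam,Apple,20,25
Sam,Orange,50,75
Ian,Apple,15,20
'@ | ConvertFrom-Csv

$cols      = $csvImport[0].psobject.Properties.Name
$groupCols = @($cols | Where-Object { $_ -like 'Group*' })
$valueCols = @($cols | Where-Object { $_ -notin $groupCols })

$result = $csvImport | Group-Object $groupCols | ForEach-Object {
    # [ordered] keeps columns in insertion order, so no final Select-Object is needed
    $props = [ordered]@{}
    # Each group's first row supplies the original group-column values
    foreach ($g in $groupCols) { $props[$g] = $_.Group[0].$g }
    # Sum every remaining column, however many there are
    foreach ($v in $valueCols) { $props[$v] = ($_.Group | Measure-Object $v -Sum).Sum }
    [pscustomobject]$props
}
$result
```

For the four sample rows this gives Sam/Apple 30/40, Sam/Orange 50/75 and Ian/Apple 15/20, each with GroupA and GroupB as their own columns.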
script1.ps1
Import-Csv 'input.csv' | `
Group-Object -Property GroupA,GroupB | `
% {$b=$_.name -split ', ';$c=($_.group | `
Measure-Object -Property Value* -Sum).Sum;
[PScustomobject]@{GroupA=$b[0];
GroupB=$b[1];
Sum=($c | Measure-Object -Sum).Sum }}
input.csv
GroupA, GroupB, ValueA, ValueB, ValueC
Sam, Apple, 10, 1, 10
Sam, Apple, 20, 3, 10
Sam, Orange, 50, 4, 5
Ian, Apple, 15, 3, 3
OUTPUT
PS D:\coding> .\script1.ps1
GroupA GroupB Sum
------ ------ ---
Sam Apple 54
Sam Orange 59
Ian Apple 21
I'm comparing two CSV files that come from different sources (different column/property names) using the Compare-Object cmdlet. How can I include properties that are in either CSV file in the output without including them in the comparison?
Example CSV data
users1.csv
e-mail-address,name,side
luke@sw.com,Luke,light
users2.csv
e-mail-address,hiredate,hobbies
lando@sw.com,5/2/17,Sabacc
The following gives me a column with the e-mail address and side indicator, but how can I get $Users1.name and $Users2.hiredate without using them in the comparison?
$Users1 = Import-Csv users1.csv
$Users2 = Import-Csv users2.csv
Compare-Object $Users1 $Users2 -Property "E-mail-Address"
I'd like output similar to:
e-mail-address | SideIndicator | name | hiredate
---------------|---------------|------|----------
luke@sw.com    | <=            | Luke |
lando@sw.com   | =>            |      | 5/2/17
Add the -PassThru parameter so that Compare-Object returns the original input objects with all their properties, then use Select-Object to pick the name and hiredate properties:
Compare-Object $users1 $users2 -Property e-mail-address -PassThru|Select-Object e-mail-address,SideIndicator,name,hiredate
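Here is a self-contained run of that one-liner on the sample rows, inlined via ConvertFrom-Csv rather than Import-Csv. It shows that name and hiredate come along even though only e-mail-address is compared. Properties that a row doesn't have are simply left empty:

```powershell
$users1 = @'
e-mail-address,name,side
luke@sw.com,Luke,light
'@ | ConvertFrom-Csv
$users2 = @'
e-mail-address,hiredate,hobbies
lando@sw.com,5/2/17,Sabacc
'@ | ConvertFrom-Csv

# Only e-mail-address is compared; -PassThru emits the original row objects
# (plus SideIndicator), so their other columns are still there for Select-Object
$result = Compare-Object $users1 $users2 -Property e-mail-address -PassThru |
    Select-Object e-mail-address, SideIndicator, name, hiredate
$result
```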