How to compare 2 csv files and transfer the difference into 2nd csv file using powershell [duplicate] - powershell

I have a file first.csv
name,surname,height,city,county,state,zipCode
John,Doe,120,jefferson,Riverside,NJ,8075
Jack,Yan,220,Phila,Riverside,PA,9119
Jill,Fan,120,jefferson,Riverside,NJ,8075
Steve,Tan,220,Phila,Riverside,PA,9119
Alpha,Fan,120,jefferson,Riverside,NJ,8075
and second.csv
name,surname,height,city,county,state,zipCode
John,Doe,120,jefferson,Riverside,NJ,8075
Jack,Yan,220,Phila,Riverside,PA,9119
Jill,Fan,120,jefferson,Riverside,NJ,8075
Steve,Tan,220,Phila,Riverside,PA,9119
Bravo,Tan,220,Phila,Riverside,PA,9119
I want to compare the rows of both first.csv and second.csv files and output the rows that are either in first.csv or second.csv but not in both.
So the output.csv should have
Alpha,Fan,120,jefferson,Riverside,NJ,8075
Bravo,Tan,220,Phila,Riverside,PA,9119
There are quite a few similar questions but the output is not exactly what I want.
Thank you

$filea = Import-Csv C:\Powershell\TestCSVs\group1.csv
$fileb = Import-Csv C:\Powershell\TestCSVs\group2.csv
Compare-Object $filea $fileb -Property name, surname, height, city, county, state, zipCode | Select-Object name, surname, height, city, county, state, zipCode | export-csv C:\Powershell\TestCSVs\out.csv -NoTypeInformation
I'm using all the fields for the comparison here, but you can specify just the unique value(s) that you want to use to match the rows.
output
"name","surname","height","city","county","state","zipCode"
"Bravo","Tan","220","Phila","Riverside","PA","9119"
"Alpha","Fan","120","jefferson","Riverside","NJ","8075"
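If you also want to know which file each differing row came from, the SideIndicator property that Compare-Object attaches can be mapped to a readable label. The following is only a sketch: the sample rows are inlined with ConvertFrom-Csv instead of being read from disk, and the `source` column name and the `first`/`second` labels are my own choices, not part of the original answer.

```powershell
# Sample rows inlined so the sketch runs without any files on disk.
$first = @'
name,surname,height,city,county,state,zipCode
John,Doe,120,jefferson,Riverside,NJ,8075
Alpha,Fan,120,jefferson,Riverside,NJ,8075
'@ | ConvertFrom-Csv

$second = @'
name,surname,height,city,county,state,zipCode
John,Doe,120,jefferson,Riverside,NJ,8075
Bravo,Tan,220,Phila,Riverside,PA,9119
'@ | ConvertFrom-Csv

$columns = 'name', 'surname', 'height', 'city', 'county', 'state', 'zipCode'

# '<=' marks rows found only in $first, '=>' rows found only in $second.
# The 'source' column is a hypothetical label added for illustration.
$diff = Compare-Object $first $second -Property $columns |
    Select-Object @{ Name = 'source'; Expression = {
        if ($_.SideIndicator -eq '<=') { 'first' } else { 'second' }
    } }, name, surname, height, city, county, state, zipCode

$diff | Format-Table
```

Dropping the `source` column from the Select-Object list gives the same rows as the answer above.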

Getting the symmetric difference (everything that is unrelated) from two lists is actually a quite common use in comparing objects.
Therefore, I have added this feature (#30) to the Join-Object script/Join-Object Module (see also: In Powershell, what's the best way to join two tables into one?).
For this specific question:
PS C:\> Import-Csv .\First.csv |OuterJoin (Import-Csv .\Second.csv) |Format-Table
name surname height city county state zipCode
---- ------- ------ ---- ------ ----- -------
Alpha Fan 120 jefferson Riverside NJ 8075
Bravo Tan 220 Phila Riverside PA 9119

Related

how can I correct my reconciliation of .csv files to remove dupes/nulls

I have been using code from this answer to check for additions/changes to class rosters from MS Teams:
$set = [System.Collections.Generic.HashSet[string]]::new(
[string[]] (Import-CSV -Path stundent.csv).UserPrincipalName,
[System.StringComparer]::InvariantCultureIgnoreCase
)
Import-Csv ad.csv | Where-Object { $set.Add($_.UserPrincipalName) } |
Export-Csv path\to\output.csv -NoTypeInformation
Ideally, I want to be able to check if there have been removals when compared to a new file, swap the import file positions, and check for additions. If my files look like Source1 and Source2 (below), the check for removals would return Export1, and the check for additions would return Export2.
Since there will be multiple instances of students across multiple classes, I want to include TeamDesc in the filter query to make sure only the specific instance of that student with that class is returned.
Source1.csv
TeamDesc UserPrincipalName   Name
-------- -----------------   ----
Team 1   student1@domain.com john smith
Team 1   student2@domain.com nancy drew
Team 2   student3@domain.com harvey dent
Team 3   student1@domain.com john smith
Source2.csv
TeamDesc UserPrincipalName   Name
-------- -----------------   ----
Team 1   student2@domain.com nancy drew
Team 2   student3@domain.com harvey dent
Team 2   student4@domain.com tim tams
Team 3   student1@domain.com john smith
Export1.csv
TeamDesc UserPrincipalName   Name
-------- -----------------   ----
Team 1   student1@domain.com john smith
Export2.csv
TeamDesc UserPrincipalName   Name
-------- -----------------   ----
Team 2   student4@domain.com tim tams
Try the following, which uses Compare-Object to compare the CSV files by two column values, simply by passing the property (column) names of interest to -Property; the resulting output is split into two collections based on which input side a differing property combination is unique to, using the intrinsic .Where() method:
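# '<=' marks rows found only in Source1.csv (removed);
# the remaining rows ('=>') exist only in Source2.csv (added).
$removed, $added = (
  Compare-Object (Import-Csv Source1.csv) (Import-Csv Source2.csv) -PassThru `
    -Property TeamDesc, UserPrincipalName
).Where({ $_.SideIndicator -eq '<=' }, 'Split')
# -Property * keeps -ExcludeProperty effective in Windows PowerShell 5.1 as well.
$removed |
  Select-Object -Property * -ExcludeProperty SideIndicator |
  Export-Csv -NoTypeInformation Export1.csv
$added |
  Select-Object -Property * -ExcludeProperty SideIndicator |
  Export-Csv -NoTypeInformation Export2.csv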
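To see how the 'Split' mode of .Where() distributes elements, here is a small self-contained sketch with inlined sample rows (the variable names `$old`, `$new`, `$onlyInOld` and `$onlyInNew` are made up for illustration). The first collection returned holds the elements that satisfy the filter, the second holds everything else.

```powershell
# Two small rosters, inlined so no files are needed.
$old = @'
TeamDesc,UserPrincipalName,Name
Team 1,student1@domain.com,john smith
Team 1,student2@domain.com,nancy drew
'@ | ConvertFrom-Csv

$new = @'
TeamDesc,UserPrincipalName,Name
Team 1,student2@domain.com,nancy drew
Team 2,student4@domain.com,tim tams
'@ | ConvertFrom-Csv

# 'Split' returns two collections: elements matching the filter first,
# then all remaining elements. '<=' means "only in the reference set" ($old).
$onlyInOld, $onlyInNew = @(
    Compare-Object $old $new -Property TeamDesc, UserPrincipalName -PassThru
).Where({ $_.SideIndicator -eq '<=' }, 'Split')

$onlyInOld.Name   # john smith
$onlyInNew.Name   # tim tams
```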
Assuming both Csvs are stored in memory, Source1.csv is $csv1 and Source2.csv is $csv2, you already have the logic for Export2.csv using the HashSet<T>:
$set = [System.Collections.Generic.HashSet[string]]::new(
[string[]] $csv1.UserPrincipalName,
[System.StringComparer]::InvariantCultureIgnoreCase
)
$csv2 | Where-Object { $set.Add($_.UserPrincipalName) }
Outputs:
TeamDesc UserPrincipalName   Name
-------- -----------------   ----
Team 2   student4@domain.com tim tams
For the first requirement, Export1.csv, the reference object would be $csv2 and instead of a HashSet<T> you could use a hash table, Group-Object -AsHashTable makes it really easy in this case:
$map = $csv2 | Group-Object UserPrincipalName -AsHashTable -AsString
# if Csv2 has unique values for `UserPrincipalName`
$csv1 | Where-Object { $map[$_.UserPrincipalName].TeamDesc -ne $_.TeamDesc }
# if Csv2 has duplicated values for `UserPrincipalName`
$csv1 | Where-Object { $_.TeamDesc -notin $map[$_.UserPrincipalName].TeamDesc }
Outputs:
TeamDesc UserPrincipalName   Name
-------- -----------------   ----
Team 1   student1@domain.com john smith
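Since the question asks for TeamDesc to be part of the match, another option is to keep the HashSet approach but build the key from both columns. This is only a sketch under a few assumptions: the sample rows are inlined, the `$keyOf` script block is a hypothetical helper, and the `|` separator is assumed never to occur inside the values.

```powershell
$csv1 = @'
TeamDesc,UserPrincipalName,Name
Team 1,student1@domain.com,john smith
Team 1,student2@domain.com,nancy drew
Team 2,student3@domain.com,harvey dent
Team 3,student1@domain.com,john smith
'@ | ConvertFrom-Csv

$csv2 = @'
TeamDesc,UserPrincipalName,Name
Team 1,student2@domain.com,nancy drew
Team 2,student3@domain.com,harvey dent
Team 2,student4@domain.com,tim tams
Team 3,student1@domain.com,john smith
'@ | ConvertFrom-Csv

# Hypothetical helper: composite key made of both columns.
$keyOf = { param($row) '{0}|{1}' -f $row.TeamDesc, $row.UserPrincipalName }

# One case-insensitive key set per file.
$keys1 = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::OrdinalIgnoreCase)
$keys2 = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::OrdinalIgnoreCase)
$csv1 | ForEach-Object { [void] $keys1.Add((& $keyOf $_)) }
$csv2 | ForEach-Object { [void] $keys2.Add((& $keyOf $_)) }

# Rows whose key is missing from the other file.
$removed = $csv1 | Where-Object { -not $keys2.Contains((& $keyOf $_)) }
$added   = $csv2 | Where-Object { -not $keys1.Contains((& $keyOf $_)) }

$removed | Format-Table   # Team 1 / student1@domain.com
$added   | Format-Table   # Team 2 / student4@domain.com
```

Piping `$removed` and `$added` to Export-Csv -NoTypeInformation would then produce the two export files.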
Using this Join-Object script/Join-Object Module (see also: How to compare two CSV files and output the rows that are just in either of the file but not in both and In Powershell, what's the best way to join two tables into one?):
Loading your sample data:
(In your case you probably want to use Import-Csv to import your data)
Install-Script -Name Read-HtmlTable
$Csv1 = Read-HtmlTable https://stackoverflow.com/q/74452725 -Table 0 # Import-Csv .\Source1.csv
$Csv2 = Read-HtmlTable https://stackoverflow.com/q/74452725 -Table 1 # Import-Csv .\Source2.csv
Install-Module -Name JoinModule
$Csv1 |OuterJoin $Csv2 -On TeamDesc, UserPrincipalName -Name Out,In
TeamDesc UserPrincipalName   OutName    InName
-------- -----------------   -------    ------
Team 1   student1@domain.com john smith
Team 2   student4@domain.com            tim tams
You might use the (single) result file as is. If you really want to work with two different files, you might split the results as in the nice answer from mklement0.

Powershell make list with object grouping

Assuming a CSV File:
Name,group_name,group_id
foo,Best,1
bar,Worst,2
baz,Best,1
bob,Worst,2
What's the simplest form of Grouping by Powershell I can use to have output like:
Count group_id group_name Names
----- -------- ---------- -----
2 1 Best ["foo", "baz"]
2 2 Worst ["bar", "bob"]
Use the Group-Object cmdlet to group the rows together by name and id, then use Select-Object to extract the appropriate details from each group as individual properties:
# replace with `$Data = Import-Csv path\to\file.csv`
$Data = @'
Name,group_name,group_id
foo,Best,1
bar,Worst,2
baz,Best,1
bob,Worst,2
'@|ConvertFrom-Csv
# Group rows, then construct output record with `Select-Object`
$Data |Group-Object group_name,group_id |Select-Object Count,
  @{Name='group_id';Expression={$_.Group[0].group_id}},
  @{Name='group_name';Expression={$_.Group[0].group_name}},
  @{Name='Names';Expression={$_.Group.Name}}
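If the Names column should literally read like `["foo", "baz"]`, as in the desired output, the names can be quoted and joined into one string. A minimal sketch, assuming the same sample data; building the record with `[pscustomobject]` instead of calculated properties is just my preference here, not something the answer prescribes.

```powershell
$Data = @'
Name,group_name,group_id
foo,Best,1
bar,Worst,2
baz,Best,1
bob,Worst,2
'@ | ConvertFrom-Csv

$result = $Data | Group-Object group_name, group_id | ForEach-Object {
    [pscustomobject]@{
        Count      = $_.Count
        group_id   = $_.Group[0].group_id
        group_name = $_.Group[0].group_name
        # Wrap each name in double quotes, join with ', ' and add the brackets.
        Names      = '[' + (($_.Group.Name | ForEach-Object { '"{0}"' -f $_ }) -join ', ') + ']'
    }
}

$result | Format-Table
```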

Outputting sorted sections of sorted data to a variable?

I have the following PowerShell code:
[xml]$xml = Get-Content uccxResourceList_forReference.xml
$xml.resources.resource |
Select-Object firstname, lastname, extension,
@{Name="Team"; Expression={($_.team.name)}} |
Sort Team |
Format-Table
which produces a table like this:
firstName lastName extension Team
--------- -------- --------- ----------------
Homer Simpson 1000 SafetyInspectors
Frank Grimes 1001 SafetyInspectors
Lenford Leonard 1002 SafetyInspectors
Carlton Carlson 1003 SafetyInspectors
Montgomery Burns 2000 Executives
Waylon Smithers 2001 Executives
What I would like to do is output each team into its own file. So not just a simple | Out-File teamlist.txt at the end, but I would like to output a text file containing all of the "SafetyInspectors" and another with all of the "Executives".
I know I could get this done with a subsequent foreach loop but I feel it could also be done in the pipeline and I just don't know how to do it.
I'd prefer to output to a CSV file (which can easily be imported again), so:
[xml]$xml = Get-Content uccxResourceList_forReference.xml
$xml.resources.resource |
Select-Object firstname, lastname, extension,
@{Name="Team"; Expression={($_.team.name)}} |
Group-Object Team | ForEach-Object {
$_.Group | Export-Csv ("{0}.csv" -f $_.Name) -NoTypeInformation
}
Should return something like this:
> gc .\Executives.csv
"firstName","lastName","extension","Team"
"Montgomery","Burns","2000","Executives"
"Waylon","Smithers","2001","Executives"
> gc .\SafetyInspectors.csv
"firstName","lastName","extension","Team"
"Homer","Simpson","1000","SafetyInspectors"
"Frank","Grimes","1001","SafetyInspectors"
"Lenford","Leonard","1002","SafetyInspectors"
"Carlton","Carlson","1003","SafetyInspectors"
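If plain text files are what you need, as the question originally asked, the same Group-Object pattern works with Format-Table and Out-File. This is a sketch only: the rows are inlined rather than read from the XML file, and the `team-export-demo` folder under the temp path is a made-up location.

```powershell
# Inlined stand-in for the objects produced by the Select-Object step.
$people = @'
firstName,lastName,extension,Team
Homer,Simpson,1000,SafetyInspectors
Frank,Grimes,1001,SafetyInspectors
Montgomery,Burns,2000,Executives
Waylon,Smithers,2001,Executives
'@ | ConvertFrom-Csv

# Hypothetical output folder for the demo.
$outDir = Join-Path ([System.IO.Path]::GetTempPath()) 'team-export-demo'
New-Item -ItemType Directory -Path $outDir -Force | Out-Null

# One text file per team, named after the group key.
$people | Group-Object Team | ForEach-Object {
    $path = Join-Path $outDir ('{0}.txt' -f $_.Name)
    $_.Group | Format-Table | Out-File -FilePath $path
}

Get-ChildItem $outDir -Filter *.txt | Select-Object -ExpandProperty Name
```

Text output formatted this way is meant for reading; it cannot be re-imported as easily as the CSV files above.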

Merge csv's - no join

I need to combine a slew of Excel spreadsheets. I used PowerShell to convert them to CSVs and now need to merge them, but not as you typically would. The merge doesn't use a join. If I have 3 files with 100 rows each, my new file should have 300 rows. So, this is more of a UNION than a JOIN, to use database terms.
Some of the columns do have the same name. Some don't. If they have the same name, a new column shouldn't be created. Is there a way to do this without manually having to list out all the columns as properties?
Example (with only 2 files)
File1:
Name Address
Bob  123 Main
File2:
Name City
Bob  LA
Tom  Boston
Results
Name Address  City
Bob  123 Main
Bob           LA
Tom           Boston
At the end of the day this might not be sorted right. The trick here is to read the header of each file, collect the column names as a string array, and remove any duplicates.
This code assumes all the files are in the same location. If not you will need to account for that.
$files = Get-ChildItem -Path 'C:\temp\csv\' -Filter '*.csv' | Select-Object -ExpandProperty FullName
# Gather the headers for all the files.
$headers = $files | ForEach-Object{
(Get-Content $_ -Head 1).Split(",") | ForEach-Object{$_.Trim()}
} | Sort-Object -Unique
# Loop again now and read in the csv files as objects
$files | ForEach-Object{
Import-Csv $_
} | Select-Object $headers
The output would look like this:
Address  City   Name
-------  ----   ----
123 Main        Bob
         LA     Bob
         Boston Tom
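A variation on the same idea is to take the column names from the imported objects instead of splitting the first line of each file, which also copes with quoted headers and keeps the columns in the order they are first seen. A rough sketch with inlined data; in practice `$tables` would be filled from Import-Csv for each file, and the variable names are mine.

```powershell
$file1 = @'
Name,Address
Bob,123 Main
'@ | ConvertFrom-Csv

$file2 = @'
Name,City
Bob,LA
Tom,Boston
'@ | ConvertFrom-Csv

$tables = $file1, $file2

# Collect each distinct column name in first-seen order while passing rows through.
$headers = [System.Collections.Generic.List[string]]::new()
$rows = foreach ($table in $tables) {
    foreach ($row in $table) {
        foreach ($name in $row.PSObject.Properties.Name) {
            if ($headers -notcontains $name) { $headers.Add($name) }
        }
        $row
    }
}

# Select-Object adds the missing columns (with empty values) to every row.
$merged = $rows | Select-Object -Property $headers.ToArray()
$merged | Format-Table
```

The columns come out as Name, Address, City, matching the layout asked for in the question.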

Powershell counting same values from csv

Using PowerShell, I can import the CSV file and count how many objects are equal to "a". For example,
@(Import-csv location | where-Object{$_.id -eq "a"}).Count
Is there a way to go through every column and row looking for the same String "a" and adding onto count? Or do I have to do the same command over and over for every column, just with a different keyword?
So I made a dummy file that contains 5 columns of people names. Now to show you how the process will work I will show you how often the text "Ann" appears in any field.
$file = "C:\temp\MOCK_DATA (3).csv"
gc $file | %{$_ -split ","} | Group-Object | Where-Object{$_.Name -like "Ann*"}
Don't focus on the code but the output below.
Count Name Group
----- ---- -----
5 Ann {Ann, Ann, Ann, Ann...}
9 Anne {Anne, Anne, Anne, Anne...}
12 Annie {Annie, Annie, Annie, Annie...}
19 Anna {Anna, Anna, Anna, Anna...}
"Ann" appears 5 times on its own. However, it is a part of other names as well. Let's use a simple regex to find all the values that are only "Ann".
(select-string -Path 'C:\temp\MOCK_DATA (3).csv' -Pattern "\bAnn\b" -AllMatches | Select-Object -ExpandProperty Matches).Count
That will return 5 since \b matches a word boundary. In this data that means the value has to stand alone between commas or at the beginning or end of a line, which omits results like "Anna" and "Annie". Note that a field such as "Mary Ann" would still count, because a space is a word boundary too. Select-Object -ExpandProperty Matches is important if a single line can contain more than one match.
Small Caveat
It should not matter, but to keep the code simple I don't account for the (unlikely) case where your header matches the value you are looking for. If that is a possibility, we could use Get-Content with Select-Object -Skip 1 instead.
Try cycling through properties like this:
(Import-Csv location | %{$record = $_; $record | Get-Member -MemberType Properties |
?{$record.$($_.Name) -eq 'a';}}).Count
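The same property-cycling idea can be written with PSObject.Properties, which returns the values directly and avoids the Get-Member lookup per row. A small sketch on inlined data; the column names and the `$needle` variable are invented for the example.

```powershell
$rows = @'
id,first,second
a,Ann,a
b,a,Anna
a,Annie,c
'@ | ConvertFrom-Csv

$needle = 'a'

# Flatten every field value of every row, then keep exact (case-insensitive) matches.
$count = @(
    $rows |
        ForEach-Object { $_.PSObject.Properties.Value } |
        Where-Object { $_ -eq $needle }
).Count

$count   # 4
```

Because only field values are inspected, the header row never contributes to the count.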