Merge csv's - no join - powershell

I need to combine a slew of Excel spreadsheets. I used PowerShell to convert them to CSVs and now need to merge them, but not as you typically would. The merge doesn't use a join. If I have 3 files with 100 rows each, my new file should have 300 rows. So this is more of a UNION than a JOIN, to use database terms.
Some of the columns do have the same name. Some don't. If they have the same name, a new column shouldn't be created. Is there a way to do this without manually having to list out all the columns as properties?
Example (with only 2 files)
File1:
Name  Address
Bob   123 Main
File2:
Name  City
Bob   LA
Tom   Boston
Results:
Name  Address   City
Bob   123 Main
Bob             LA
Tom             Boston

At the end of the day the columns might not come out in the order you want. The trick here is to read the header of each file, collect the column names as a string array, and remove any duplicates.
This code assumes all the files are in the same location. If not, you will need to account for that.
$files = Get-ChildItem -Path 'C:\temp\csv\' -Filter '*.csv' | Select-Object -ExpandProperty FullName
# Gather the headers for all the files.
$headers = $files | ForEach-Object {
    (Get-Content $_ -Head 1).Split(",") | ForEach-Object { $_.Trim() }
} | Sort-Object -Unique
# Loop again now and read in the csv files as objects
$files | ForEach-Object {
    Import-Csv $_
} | Select-Object $headers
The output would look like this:
Address  City   Name
-------  ----   ----
123 Main        Bob
         LA     Bob
         Boston Tom
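Because Sort-Object -Unique sorts the header names alphabetically, Name ends up last in the output above. If you would rather keep the columns in the order in which they first appear, one possible tweak is to use Select-Object -Unique instead, which keeps the first occurrence of each name without sorting. This is only a sketch: the temporary folder and the sample rows are made up for illustration, and the header parsing still assumes plain comma-separated headers.

```powershell
# Hypothetical sample data in a throwaway folder, mirroring File1/File2 from the question.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -Path (Join-Path $dir 'File1.csv') -Value 'Name,Address', 'Bob,123 Main'
Set-Content -Path (Join-Path $dir 'File2.csv') -Value 'Name,City', 'Bob,LA', 'Tom,Boston'

$files = Get-ChildItem -Path $dir -Filter '*.csv' | Sort-Object Name |
    Select-Object -ExpandProperty FullName

# Collect the headers in first-seen order; Select-Object -Unique does not sort.
$headers = $files | ForEach-Object {
    (Get-Content $_ -TotalCount 1).Split(',') | ForEach-Object { $_.Trim().Trim('"') }
} | Select-Object -Unique

# Every row gets every column; columns missing from a file come out empty.
$merged = $files | ForEach-Object { Import-Csv $_ } | Select-Object $headers
$merged | Format-Table
```

With this sample data the columns should come out as Name, Address, City.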

Related

How to compare 2 csv files and transfer the difference into 2nd csv file using powershell [duplicate]

I have a file first.csv
name,surname,height,city,county,state,zipCode
John,Doe,120,jefferson,Riverside,NJ,8075
Jack,Yan,220,Phila,Riverside,PA,9119
Jill,Fan,120,jefferson,Riverside,NJ,8075
Steve,Tan,220,Phila,Riverside,PA,9119
Alpha,Fan,120,jefferson,Riverside,NJ,8075
and second.csv
name,surname,height,city,county,state,zipCode
John,Doe,120,jefferson,Riverside,NJ,8075
Jack,Yan,220,Phila,Riverside,PA,9119
Jill,Fan,120,jefferson,Riverside,NJ,8075
Steve,Tan,220,Phila,Riverside,PA,9119
Bravo,Tan,220,Phila,Riverside,PA,9119
I want to compare the rows of both first.csv and second.csv files and output the rows that are either in first.csv or second.csv but not in both.
So the output.csv should have
Alpha,Fan,120,jefferson,Riverside,NJ,8075
Bravo,Tan,220,Phila,Riverside,PA,9119
There are quite a few similar questions but the output is not exactly what I want.
Thank you
$filea = Import-Csv C:\Powershell\TestCSVs\group1.csv
$fileb = Import-Csv C:\Powershell\TestCSVs\group2.csv
Compare-Object $filea $fileb -Property name, surname, height, city, county, state, zipCode |
    Select-Object name, surname, height, city, county, state, zipCode |
    Export-Csv C:\Powershell\TestCSVs\out.csv -NoTypeInformation
I'm using all the fields to compare and sort here, but you can specify just the unique value(s) that you want to use to match the rows.
output
"name","surname","height","city","county","state","zipCode"
"Bravo","Tan","220","Phila","Riverside","PA","9119"
"Alpha","Fan","120","jefferson","Riverside","NJ","8075"
Getting the symmetric difference (everything that is unrelated) from two lists is actually a quite common use in comparing objects.
Therefore, I have added this feature (#30) to the Join-Object script/Join-Object Module (see also: In Powershell, what's the best way to join two tables into one?).
For this specific question:
PS C:\> Import-Csv .\First.csv |OuterJoin (Import-Csv .\Second.csv) |Format-Table
name surname height city county state zipCode
---- ------- ------ ---- ------ ----- -------
Alpha Fan 120 jefferson Riverside NJ 8075
Bravo Tan 220 Phila Riverside PA 9119
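If you stay with the built-in Compare-Object, the SideIndicator property it adds tells you which input each differing row came from: '<=' for the first (reference) collection, '=>' for the second (difference) collection. The sketch below is only an illustration with shortened, made-up sample rows; the OnlyIn column name is an assumption of this example. It also reads the column names from the first row, so they don't have to be typed out.

```powershell
# Inline sample rows (a shortened, hypothetical version of first.csv and second.csv).
$first = @'
name,surname,zipCode
John,Doe,8075
Alpha,Fan,8075
'@ | ConvertFrom-Csv

$second = @'
name,surname,zipCode
John,Doe,8075
Bravo,Tan,9119
'@ | ConvertFrom-Csv

# Take the column names from the first row instead of listing them by hand.
$cols = @($first[0].psobject.Properties.Name)

# Calculated property that translates SideIndicator into a readable label.
$onlyIn = @{
    Name       = 'OnlyIn'
    Expression = { if ($_.SideIndicator -eq '<=') { 'first' } else { 'second' } }
}

# Rows present on only one side, labelled with the side they came from.
$diff = Compare-Object $first $second -Property $cols |
    Select-Object ($cols + @($onlyIn))
$diff | Format-Table
```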

how can I correct my reconciliation of .csv files to remove dupes/nulls

I have been using code from this answer to check for additions/changes to class rosters from MS Teams:
$set = [System.Collections.Generic.HashSet[string]]::new(
[string[]] (Import-CSV -Path stundent.csv).UserPrincipalName,
[System.StringComparer]::InvariantCultureIgnoreCase
)
Import-Csv ad.csv | Where-Object { $set.Add($_.UserPrincipalName) } |
Export-Csv path\to\output.csv -NoTypeInformation
Ideally, I want to be able to check if there have been removals when compared to a new file, swap the import file positions, and check for additions. If my files look like Source1 and Source2 (below), the check for removals would return Export1, and the check for additions would return Export2.
Since there will be multiple instances of students across multiple classes, I want to include TeamDesc in the filter query to make sure only the specific instance of that student with that class is returned.
Source1.csv
TeamDesc  UserPrincipalName    Name
Team 1    student1@domain.com  john smith
Team 1    student2@domain.com  nancy drew
Team 2    student3@domain.com  harvey dent
Team 3    student1@domain.com  john smith
Source2.csv
TeamDesc  UserPrincipalName    Name
Team 1    student2@domain.com  nancy drew
Team 2    student3@domain.com  harvey dent
Team 2    student4@domain.com  tim tams
Team 3    student1@domain.com  john smith
Export1.csv
TeamDesc  UserPrincipalName    Name
Team 1    student1@domain.com  john smith
Export2.csv
TeamDesc  UserPrincipalName    Name
Team 2    student4@domain.com  tim tams
Try the following, which uses Compare-Object to compare the CSV files by two column values, simply by passing the property (column) names of interest to -Property; the resulting output is split into two collections based on which input side a differing property combination is unique to, using the intrinsic .Where() method:
$removed, $added = (
  Compare-Object (Import-Csv Source1.csv) (Import-Csv Source2.csv) -PassThru `
                 -Property TeamDesc, UserPrincipalName
).Where({ $_.SideIndicator -eq '<=' }, 'Split')  # '<=': only in Source1.csv, i.e. removed
# -Property * is needed for -ExcludeProperty to take effect in Windows PowerShell.
$removed |
  Select-Object -Property * -ExcludeProperty SideIndicator |
  Export-Csv -NoTypeInformation Export1.csv
$added |
  Select-Object -Property * -ExcludeProperty SideIndicator |
  Export-Csv -NoTypeInformation Export2.csv
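To see what the 'Split' mode of .Where() does in isolation, here is a minimal sketch with hand-made objects standing in for Compare-Object output (the two rows and the variable names are assumptions for illustration). 'Split' returns two collections: the elements that match the filter first, then everything else.

```powershell
# Hand-made stand-ins for Compare-Object output.
$rows = @(
    [pscustomobject]@{ UserPrincipalName = 'student1@domain.com'; SideIndicator = '<=' }
    [pscustomobject]@{ UserPrincipalName = 'student4@domain.com'; SideIndicator = '=>' }
)

# First variable receives the matches, second variable receives the rest.
$onlyInReference, $onlyInDifference = $rows.Where({ $_.SideIndicator -eq '<=' }, 'Split')

"Only in reference:  $($onlyInReference.UserPrincipalName)"
"Only in difference: $($onlyInDifference.UserPrincipalName)"
```

Since '<=' marks rows unique to the first (reference) argument of Compare-Object, the first collection here holds student1 and the second holds student4.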
Assuming both Csvs are stored in memory, Source1.csv is $csv1 and Source2.csv is $csv2, you already have the logic for Export2.csv using the HashSet<T>:
$set = [System.Collections.Generic.HashSet[string]]::new(
[string[]] $csv1.UserPrincipalName,
[System.StringComparer]::InvariantCultureIgnoreCase
)
$csv2 | Where-Object { $set.Add($_.UserPrincipalName) }
Outputs:
TeamDesc UserPrincipalName   Name
-------- -----------------   ----
Team 2   student4@domain.com tim tams
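The filter works because HashSet.Add() returns $true only when the value was not in the set yet, so Where-Object lets through exactly the rows whose UserPrincipalName is new. A small sketch with made-up addresses shows the behavior, including the side effect that every successful Add() also changes the set:

```powershell
$set = [System.Collections.Generic.HashSet[string]]::new(
    [string[]] ('student1@domain.com', 'student2@domain.com'),
    [System.StringComparer]::InvariantCultureIgnoreCase
)

$set.Add('STUDENT1@domain.com')   # False: already present (comparison ignores case)
$set.Add('student4@domain.com')   # True: new value, and it is now part of the set
$set.Add('student4@domain.com')   # False: the previous call added it
$set.Count                        # 3
```

One consequence is that a UserPrincipalName that appears twice in the second file is only reported the first time.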
For the first requirement, Export1.csv, the reference object would be $csv2 and instead of a HashSet<T> you could use a hash table, Group-Object -AsHashTable makes it really easy in this case:
$map = $csv2 | Group-Object UserPrincipalName -AsHashTable -AsString
# if Csv2 has unique values for `UserPrincipalName`
$csv1 | Where-Object { $map[$_.UserPrincipalName].TeamDesc -ne $_.TeamDesc }
# if Csv2 has duplicated values for `UserPrincipalName`
$csv1 | Where-Object { $_.TeamDesc -notin $map[$_.UserPrincipalName].TeamDesc }
Outputs:
TeamDesc UserPrincipalName   Name
-------- -----------------   ----
Team 1   student1@domain.com john smith
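Since the question asks for TeamDesc to be part of the match, another option is to keep the HashSet but key it on both columns. The following is only a sketch under stated assumptions: Get-RowKey is a hypothetical helper, the '|' separator is assumed not to occur in the data, and the sample rows are typed inline instead of imported from the files.

```powershell
$csv1 = @'
TeamDesc,UserPrincipalName,Name
Team 1,student1@domain.com,john smith
Team 1,student2@domain.com,nancy drew
Team 2,student3@domain.com,harvey dent
Team 3,student1@domain.com,john smith
'@ | ConvertFrom-Csv

$csv2 = @'
TeamDesc,UserPrincipalName,Name
Team 1,student2@domain.com,nancy drew
Team 2,student3@domain.com,harvey dent
Team 2,student4@domain.com,tim tams
Team 3,student1@domain.com,john smith
'@ | ConvertFrom-Csv

# Hypothetical helper: one string per row, built from both key columns.
function Get-RowKey($Row) { '{0}|{1}' -f $Row.TeamDesc, $Row.UserPrincipalName }

$comparer = [System.StringComparer]::OrdinalIgnoreCase
$keys1 = [System.Collections.Generic.HashSet[string]]::new(
    [string[]] @($csv1 | ForEach-Object { Get-RowKey $_ }), $comparer)
$keys2 = [System.Collections.Generic.HashSet[string]]::new(
    [string[]] @($csv2 | ForEach-Object { Get-RowKey $_ }), $comparer)

# Contains() only looks a key up, so the sets are not modified while filtering.
$removed = @($csv1 | Where-Object { -not $keys2.Contains((Get-RowKey $_)) })
$added   = @($csv2 | Where-Object { -not $keys1.Contains((Get-RowKey $_)) })

$removed | Format-Table   # Team 1 / student1@domain.com
$added   | Format-Table   # Team 2 / student4@domain.com
```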
Using this Join-Object script/Join-Object Module (see also: How to compare two CSV files and output the rows that are just in either of the file but not in both and In Powershell, what's the best way to join two tables into one?):
Loading your sample data:
(In your case you probably want to use Import-Csv to import your data)
Install-Script -Name Read-HtmlTable
$Csv1 = Read-HtmlTable https://stackoverflow.com/q/74452725 -Table 0 # Import-Csv .\Source1.csv
$Csv2 = Read-HtmlTable https://stackoverflow.com/q/74452725 -Table 1 # Import-Csv .\Source2.csv
Install-Module -Name JoinModule
$Csv1 |OuterJoin $Csv2 -On TeamDesc, UserPrincipalName -Name Out,In
TeamDesc UserPrincipalName   OutName    InName
-------- -----------------   -------    ------
Team 1   student1@domain.com john smith
Team 2   student4@domain.com            tim tams
You might use the (single) result file as is. If you really want to work with two different files, you might split the results as in the nice answer from mklement0.

Combine two CSV files in powershell without changing the order of columns

I have "a.csv" and "b.csv". I tried to merge them with the commands below:
cd c:/users/mine/test
Get-Content a.csv, b.csv | Select-Object -Unique | Set-Content -Encoding ASCII joined.csv
But the output file has the rows of b.csv appended below the last row of a.csv. I want the columns of b.csv to be appended after the last column of a.csv instead.
Vm Resource SID
mnvb vclkn vxjcb
vjc.v vnxc,m bvkxncb
Vm 123 456 789
mnvb apple banana orange
vjc.v lemon onion tomato
My expected output should look like the one below, without changing the column order:
Vm Resource SID 123 456 789
mnvb vclkn vxjcb apple banana orange
vjc.v vnxc,m bvkxncb lemon onion tomato
From here, there are two ways to do it -
Join-Object custom function by RamblingCookieMonster. This is short and sweet. After you import the function in your current PoSh environment, you can use the below command to get your desired result -
Join-Object -Left $a -Right $b -LeftJoinProperty vm -RightJoinProperty vm | Export-Csv Joined.csv -NTI
The accepted answer from mklement0, which would work for you as below -
# Read the 2 CSV files into collections of custom objects.
# Note: This reads the entire files into memory.
$doc1 = Import-Csv a.csv
$doc2 = Import-Csv b.csv
$outFile = 'Joined.csv'
# Determine the column (property) names that are unique to document 2.
$doc2OnlyColNames = (
Compare-Object $doc1[0].psobject.properties.name $doc2[0].psobject.properties.name |
Where-Object SideIndicator -eq '=>'
).InputObject
# Initialize an ordered hashtable that will be used to temporarily store
# each document 2 row's unique values as key-value pairs, so that they
# can be appended as properties to each document-1 row.
$htUniqueRowD2Props = [ordered] @{}
# Process the corresponding rows one by one, construct a merged output object
# for each, and export the merged objects to a new CSV file.
$i = 0
$(foreach($rowD1 in $doc1) {
  # Get the corresponding row from document 2.
  $rowD2 = $doc2[$i++]
  # Extract the values from the unique document-2 columns and store them in the ordered
  # hashtable.
  foreach($pname in $doc2OnlyColNames) { $htUniqueRowD2Props.$pname = $rowD2.$pname }
  # Add the properties represented by the hashtable entries to the
  # document-1 row at hand and output the augmented object (-PassThru).
  $rowD1 | Add-Member -NotePropertyMembers $htUniqueRowD2Props -PassThru
}) | Export-Csv -NoTypeInformation -Encoding Utf8 $outFile
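For comparison, the same positional merge can also be sketched by building a new object per row instead of adding members to the document-1 rows. This is an illustrative variant, not the accepted answer's code: it assumes that row N of both files belongs together, and it uses simplified, made-up sample values in place of the ones from the question.

```powershell
$doc1 = @'
Vm,Resource,SID
vm1,res1,sid1
vm2,res2,sid2
'@ | ConvertFrom-Csv

$doc2 = @'
Vm,123,456,789
vm1,apple,banana,orange
vm2,lemon,onion,tomato
'@ | ConvertFrom-Csv

$merged = for ($i = 0; $i -lt $doc1.Count; $i++) {
    # An ordered dictionary keeps the columns in insertion order.
    $row = [ordered]@{}
    # All document-1 columns first ...
    foreach ($p in $doc1[$i].psobject.Properties) { $row.Add($p.Name, $p.Value) }
    # ... then only those document-2 columns that are not present yet.
    foreach ($p in $doc2[$i].psobject.Properties) {
        if (-not $row.Contains($p.Name)) { $row.Add($p.Name, $p.Value) }
    }
    [pscustomobject]$row
}
$merged | Format-Table
```

Piping $merged to Export-Csv -NoTypeInformation would then write the joined file.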

Outputting sorted sections of sorted data to a variable?

I have the following PowerShell code:
[xml]$xml = Get-Content uccxResourceList_forReference.xml
$xml.resources.resource |
Select-Object firstname, lastname, extension,
@{Name="Team"; Expression={($_.team.name)}} |
Sort Team |
Format-Table
which produces a table like this:
firstName lastName extension Team
--------- -------- --------- ----------------
Homer Simpson 1000 SafetyInspectors
Frank Grimes 1001 SafetyInspectors
Lenford Leonard 1002 SafetyInspectors
Carlton Carlson 1003 SafetyInspectors
Montgomery Burns 2000 Executives
Waylon Smithers 2001 Executives
What I would like to do is output each team into its own file. So not just a simple | Out-File teamlist.txt at the end, but I would like to output a text file containing all of the "SafetyInspectors" and another with all of the "Executives".
I know I could get this done with a subsequent foreach loop but I feel it could also be done in the pipeline and I just don't know how to do it.
I'd prefer to output to a CSV file (which can easily be imported again), so:
[xml]$xml = Get-Content uccxResourceList_forReference.xml
$xml.resources.resource |
Select-Object firstname, lastname, extension,
@{Name="Team"; Expression={($_.team.name)}} |
Group-Object Team | ForEach-Object {
$_.Group | Export-Csv ("{0}.csv" -f $_.Name) -NoTypeInformation
}
Should return something like this:
> gc .\Executives.csv
"firstName","lastName","extension","Team"
"Montgomery","Burns","2000","Executives"
"Waylon","Smithers","2001","Executives"
> gc .\SafetyInspectors.csv
"firstName","lastName","extension","Team"
"Homer","Simpson","1000","SafetyInspectors"
"Frank","Grimes","1001","SafetyInspectors"
"Lenford","Leonard","1002","SafetyInspectors"
"Carlton","Carlson","1003","SafetyInspectors"

Powershell counting same values from csv

Using PowerShell, I can import the CSV file and count how many objects are equal to "a". For example,
@(Import-csv location | where-Object{$_.id -eq "a"}).Count
Is there a way to go through every column and row looking for the same String "a" and adding onto count? Or do I have to do the same command over and over for every column, just with a different keyword?
So I made a dummy file that contains 5 columns of people's names. To show you how the process works, I will show you how often the text "Ann" appears in any field.
$file = "C:\temp\MOCK_DATA (3).csv"
gc $file | %{$_ -split ","} | Group-Object | Where-Object{$_.Name -like "Ann*"}
Don't focus on the code but the output below.
Count Name  Group
----- ----  -----
    5 Ann   {Ann, Ann, Ann, Ann...}
    9 Anne  {Anne, Anne, Anne, Anne...}
   12 Annie {Annie, Annie, Annie, Annie...}
   19 Anna  {Anna, Anna, Anna, Anna...}
"Ann" appears 5 times on its own. However, it is a part of other names as well. Let's use a simple regex to find all the values that are only "Ann".
(select-string -Path 'C:\temp\MOCK_DATA (3).csv' -Pattern "\bAnn\b" -AllMatches | Select-Object -ExpandProperty Matches).Count
That will return 5 since \b matches a word boundary. For single-word fields like these, it is in essence only looking at what is between commas or at the beginning or end of each line. This omits results like "Anna" and "Annie" that you might have. Select-Object -ExpandProperty Matches is important to have if you have more than one match on a single line.
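One thing to keep in mind is that \b matches at any word boundary, not only at commas, so a field such as "Ann Marie" would be counted as well. The sketch below uses a made-up sample line to show the difference, together with a lookaround-based pattern that only accepts whole fields. That second pattern is an assumption of this example and presumes unquoted, comma-separated values.

```powershell
$line = 'Ann,Anna,Ann Marie,Annie,Ann'

# \b matches at any word boundary, so the "Ann" in "Ann Marie" counts too.
[regex]::Matches($line, '\bAnn\b').Count               # 3

# Lookarounds restrict matches to whole fields: start of line or comma before,
# comma or end of line after.
[regex]::Matches($line, '(?<=^|,)Ann(?=,|$)').Count    # 2
```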
Small Caveat
It should not matter, but to keep the code simple I ignore the possibility that your header could match the value you are looking for. That is unlikely, which is why I don't account for it. If it is a possibility, then we could use Get-Content instead with a Select -Skip 1.
Try cycling through properties like this:
(Import-Csv location | %{$record = $_; $record | Get-Member -MemberType Properties |
?{$record.$($_.Name) -eq 'a';}}).Count
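A similar idea, sketched here with inline sample data that is made up for illustration, is to flatten every row into its property values and count the exact matches. Note that -eq compares strings case-insensitively, so "A" would be counted too.

```powershell
$rows = @'
id,first,second
a,Ann,a
b,a,Anna
'@ | ConvertFrom-Csv

# Emit the value of every column of every row, then keep only exact matches.
$count = @(
    $rows | ForEach-Object { $_.psobject.Properties.Value } | Where-Object { $_ -eq 'a' }
).Count

$count   # 3
```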