How to merge two csv files with PowerShell

How to merge two csv files with PowerShell - powershell

I have two .CSV files that contain information about the employees from where I work. The first file (ActiveEmploye.csv) has approximately 70 different fields with 2500 entries. The other one (EmployeEmail.csv) has four fields (FirstName, LastName, Email, FullName) and 700 entries. Both files have the FullName field in common and it is the only thing I can use to compare each file. What I need to do is to add the Email address (from EmployeEmail.csv) to the corresponding employees in ActiveEmploye.csv. And, for those who don't have email address, the field can be left blank.
I tried to use a Join-Object function I found on internet few days ago but the first csv files contain way too much fields for the function can handle.
Any suggestions is highly appreciated!

There are probably several ways to skin this cat. I would try this one:
Import-Csv EmployeeEmail.csv | ForEach-Object -Begin {
$Employees = #{}
} -Process {
$Employees.Add($_.FullName,$_.email)
}
Import-Csv ActiveEmployees.csv | ForEach-Object {
$_ | Add-Member -MemberType NoteProperty -Name email -Value $Employees."$($_.FullName)" -PassThru
} | Export-Csv -NoTypeInformation Joined.csv

Here is an alternate method to #BartekB which I used to join multiple fields to the left and had better results for processing time.
Import-Csv EmployeeEmail.csv | ForEach-Object -Begin {
$Employees = #{}
} -Process {
$Employees.Add($_.FullName,$_)
}
Import-Csv ActiveEmployees.csv |
Select *,#{Name="email";Expression={$Employees."$($_.FullName)"."email"}} |
Export-Csv Joined.csv -NoTypeInformation
This allows one to query the array index of FullName on $Employees and then get the value of a named member on that element.

Related

How to merge 2 x CSVs with the same column but overwrite not append?

I've got this one that has been baffling me all day, and I can't seem to find any search results that match exactly what I am trying to do.
I have 2 CSV files, both of which have the same columns and headers. They look like this (shortened for the purpose of this post):
"plate","labid","well"
"1013740016604537004556","none46","F006"
"1013740016604537004556","none47","G006"
"1013740016604537004556","none48","H006"
"1013740016604537004556","3835265","A007"
"1013740016604537004556","3835269","B007"
"1013740016604537004556","3835271","C007"
Each of the 2 CSVs only have some actual Lab IDs, and the 'nonexx' are just fillers for the importing software. There is no duplication ie each 'well' is only referenced once across the 2 files.
What I need to do is merge the 2 CSVs, for example the second CSV might have a Lab ID for well H006 but the first will not. I need the lab ID from the second CSV imported into the first, overwriting the 'nonexx' currently in that column.
Here is my current code:
$CSVB = Import-CSV "$RootDir\SymphonyOutputPending\$plateID`A_Header.csv"
Import-CSV "$RootDir\SymphonyOutputPending\$plateID`_Header.csv" | ForEach-Object {
$CSVData = [PSCustomObject]#{
labid = $_.labid
well = $_.well
}
If ($CSVB.well -match $CSVData.wellID) {
write-host "I MATCH"
($CSVB | Where-Object {$_.well -eq $CSVData.well}).labid = $CSVData.labid
}
$CSVB | Export-CSV "$RootDir\SymphonyOutputPending\$plateID`_final.csv" -NoTypeInformation
}
The code runs but doesn't 'merge' the data, the final CSV output is just a replication of the first input file. I am definitely getting a match as the string "I MATCH" appears several times when debugging as expected.

Based on the responses in the comments of your question, I believe this is what you are looking for. This assumes that the both CSVs contain the exact same data with labid being the only difference.
There is no need to modify csv2 if we are just grabbing the labid to overwrite the row in csv1.
$csv1 = Import-Csv C:\temp\LabCSV1.csv
$csv2 = Import-Csv C:\temp\LabCSV2.csv
# Loop through csv1 rows
Foreach($line in $csv1) {
# If Labid contains "none"
If($line.labid -like "none*") {
# Set rows labid to the labid from csv2 row that matches plate/well
# May be able to remove the plate section if well is a unique value
$line.labid = ($csv2 | Where {$_.well -eq $line.well -and $_.plate -eq $line.plate}).labid
}
}
# Export to CSV - not overwrite - to confirm results
$csv1 | export-csv C:\Temp\LabCSV1Adjusted.csv -NoTypeInformation

Since you need to do a bi-directional comparison of the 2 Csvs you could create a new array of both and then group the objects by their well property, for this you can use Group-Object, then filter each group if their Count is equal to 2 where their labid property does not start with none else return the object as-is.
Using the following Csvs for demonstration purposes:
Csv1
"plate","labid","well"
"1013740016604537004556","none46","F006"
"1013740016604537004556","none47","G006"
"1013740016604537004556","3835265","A007"
"newrowuniquecsv1","none123","X001"
Csv2
"plate","labid","well"
"1013740016604537004556","none48","A007"
"1013740016604537004556","3835269","F006"
"1013740016604537004556","3835271","G006"
"newrowuniquecsv2","none123","X002"
Code
Note that this code assumes there will be a maximum of 2 objects with the same well property and, if there are 2 objects with the same well, one of them must have a value not starting with none.
$mergedCsv = #(
Import-Csv pathtocsv1.csv
Import-Csv pathtocsv2.csv
)
$mergedCsv | Group-Object well | ForEach-Object {
if($_.Count -eq 2) {
return $_.Group.Where{ -not $_.labid.StartsWith('none') }
}
$_.Group
} | Export-Csv pathtomerged.csv -NoTypeInformation
Output
plate labid well
----- ----- ----
1013740016604537004556 3835265 A007
1013740016604537004556 3835269 F006
1013740016604537004556 3835271 G006
newrowuniquecsv1 none123 X001
newrowuniquecsv2 none123 X002

If the lists are large, performance might be an issue as Where-Object (or any other where method) and Group-Object do not perform very well for embedded loops.
By indexing the second csv file (aka creating a hashtable), you have quicker access to the required objects. Indexing upon two (or more) items (plate and well) is issued here: Does there exist a designated (sub)index delimiter? and resolved by #mklement0 and zett42 with a nice CaseInsensitiveArrayEqualityComparer class.
To apply this class on Drew's helpful answer:
$csv1 = Import-Csv C:\temp\LabCSV1.csv
$csv2 = Import-Csv C:\temp\LabCSV2.csv
$dict = [hashtable]::new([CaseInsensitiveArrayEqualityComparer]::new())
$csv2.ForEach{ $dict.($_.plate, $_.well) = $_ }
Foreach($line in $csv1) {
If($line.labid -like "none*") {
$line.labid = $dict.($line.plate, $line.well).labid
}
}
$csv1 | export-csv C:\Temp\LabCSV1Adjusted.csv -NoTypeInformation

Export results of (2) cmdlets to separate columns in the same CSV

I'm new to PS, so your patience is appreciated.
I'm trying to grab data from (2) separate CSV files and then dump them into a new CSV with (2) columns. Doing this for (1) is easy, but I don't know how to do it for more.
This works perfectly:
Import-CSV C:\File1.csv | Select "Employee" | Export-CSV -Path D:\Result.csv -NoTypeInformation
If I add another Import-CSV, then it simply overwrites the existing data:
Import-CSV C:\File2.csv | Select "Department" | Export-CSV -Path D:\Result.csv -NoTypeInformation
How can I get columns A and B populated with the info result from these two commands? Thanks for your help.

I would have choose this option:
$1 = Import-Csv -Path "C:\Users\user\Desktop\1.csv" | Select "Employee"
$2 = Import-Csv -Path "C:\Users\user\Desktop\2.csv" | Select "Department"
$marged = [pscustomobject]#()
$object = [pscustomobject]
for ($i=0 ; $i -lt $1.Count ; $i++){
$object = [pscustomobject]#{
Employees = $1[$i].Employee
Department = $2[$i].Department}
$marged += $object
}
$marged | ForEach-Object{ [pscustomobject]$_ } | Export-Csv -Path "C:\Users\user\Desktop\3.csv" -NoTypeInformation -Force

I'll explain how I would do this, but I do it this way because I'm more comfortable working with objects than with hastables. Someone else may offer an answer using hashtables which would probably work better.
First, I would define an array to hold your data, which can later be exported to CSV:
$report = #()
Then, I would import your CSV to an object that can be iterated through:
$firstSet = Import-CSV .\File1.csv
Then I would iterate through this, importing each row into an object that has the two properties I want. In your case these are Employee and Department (potentially more which you can add easily).
foreach($row in $firstSet)
{
$employeeName = $row.Employee
$employee = [PSCustomObject]#{
Employee = $employee
Department = ""
}
$report += $employee
}
And, as you can see in the example above, add this object to your report.
Then, import the second CSV file into a second object to iterate through (for good form I would actually do this at the begining of the script, when you import your first one):
$secondSet = Import-CSV .\File2.csv
Now here is where it gets interesting. Based on just the information you have provided, I am assuming that all employees in the one file are in the same order as the departments in the other files. So for example, if I work for the "Cake Tasting Department", and my name is on row 12 of File 1, row 12 of File 2 says "Cake Tasting Department".
In this case it's fairly easy. You would just roll through both lists and update the report:
$i = 0
foreach($row in $secondSet)
{
$dept = $row.Department
$report[i].Department = $dept
$i++
}
After this, your $report object will contain all of your employees in one row and departments in the other. Then you can export it to CSV:
$report | Export-CSV .\Result.csv -NoTypeInformation
This works if, as I said, your data aligns across both files. If not, then you need to get a little fancier:
foreach($row in $secondSet)
{
$emp = $row.Employee
$dept = $row.Department
$report | Where {$_.Employee -eq $emp} foreach {$_.Department = $dept
}
Technically you could just do it this way anyway, but it depends on a lot of things. First of all whether you have the data to match in that column across both files (which obviously in my example you don't otherwise you wouldn't need to do this in the first place, but you could match across other fields you may have, like EmployeeID or DoB). Second, on the sovereignty of individual records (e.g., if you have multiple matching records in your first file, you will have a problem; you would expect duplicates in the second as there are more than one person in each department).
Anyway, I hope this helps. As I said there is probably a 'better' way to do this, but this is how I would do it.

powershell: Write specific rows from files to formatted csv

The following code gives me the correct output to console. But I would need it in a csv file:
$array = #{}
$files = Get-ChildItem "C:\Temp\Logs\*"
foreach($file in $files){
foreach($row in (Get-Content $file | select -Last 2)){
if($row -like "Total peak job memory used:*"){
$sp_memory = $row.Split(" ")[5]
$array.Add(($file.BaseName),([double]$sp_memory))
break
}
}
}
$array.GetEnumerator() | sort Value -Descending |Format-Table -AutoSize
current output (console):
required output (csv):
In order to increase performance I would like to avoid the array and write output directly to csv (no append).
Thanks in advance!

Change your last line to this -
$array.GetEnumerator() | sort Value -Descending | select #{l='FileName'; e={$_.Name}}, #{l='Memory (MB)'; e={$_.Value }} | Export-Csv -path $env:USERPROFILE\Desktop\Output.csv -NoTypeInformation
This will give you a csv file named Output.csv on your desktop.
I am using Calculated properties to change the column headers to FileName and Memory (MB) and piping the output of $array to Export-Csv cmdlet.
Just to let you know, your variable $array is of type Hashtable which won't store duplicate keys. If you need to store duplicate key/value pairs, you can use arrays. Just suggesting! :)

Select specific column based on data supplied using Powershell

I have a csv file that may have unknown headers, one of the columns will contain email addresses for example.
Is there a way to select only the column that contains the email addresses and save it as a list to a variable?
One csv could have the header say email, another could say emailaddresses, another could say email addresses another file might not even have the word email in the header. As you can see, the headers are different. So I want to be able to detect the correct column first and use that data further in the script. Once the column is identified based on the data it contains, select that column only.
I've tried the where-object and select-string cmdlets. With both, the output is the entire array and not just the data in the column I am wanting.
$CSV = import-csv file.csv
$CSV | Where {$_ -like "*#domain.com"}
This outputs the entire array as all rows will contain this data.

Sample Data for visualization
id,first_name,bagel,last_name
1,Base,bcruikshank0#homestead.com,Cruikshank
2,Regan,rbriamo1#ebay.co.uk,Briamo
3,Ryley,rsacase2#mysql.com,Sacase
4,Siobhan,sdonnett3#is.gd,Donnett
5,Patty,pesmonde4#diigo.com,Esmonde
Bagel is obviously what we are trying to find. And we will play pretend in that we have no knowledge of the columns name or position ahead of time.
Find column dynamically
# Import the CSV
$data = Import-CSV $path
# Take the first row and get its columns
$columns = $data[0].psobject.properties.name
# Cycle the columns to find the one that has an email address for a row value
# Use a VERY crude regex to validate an email address.
$emailColumn = $columns | Where-Object{$data[0].$_ -match ".*#*.\..*"}
# Example of using the found column(s) to display data.
$data | Select-Object $emailColumn
Basically read in the CSV like normal and use the first columns data to try and figure out where the email address column is. There is a caveat that if there is more than one column that matches it will get returned.
To enforce only 1 result a simple pipe to Select-Object -First 1 will handle that. Then you just have to hope the first one is the "right" one.

If you're using Import-Csv, the result is a PSCustomObject.
$CsvObject = Import-Csv -Path 'C:\Temp\Example.csv'
$Header = ($CsvObject | Get-Member | Where-Object { $_.Name -like '*email*' }).Name
$CsvObject.$Header
This filters for the header containing email, then selects that column from the object.
Edit for requirement:
$Str = #((Get-Content -Path 'C:\Temp\Example.csv') -like '*#domain.com*')
$Headers = #((Get-Content -Path 'C:\Temp\Example.csv' -TotalCount 1) -split ',')
$Str | ConvertFrom-Csv -Delimiter ',' -Header $Headers

Other method:
$PathFile="c:\temp\test.csv"
$columnName=$null
$content=Get-Content $PathFile
foreach ($item in $content)
{
$SplitRow= $item -split ','
$Cpt=0..($SplitRow.Count - 1) | where {$SplitRow[$_] -match ".*#*.\..*"} | select -first 1
if ($Cpt)
{
$columnName=($content[0] -split ',')[$Cpt]
break
}
}
if ($columnName)
{
import-csv "c:\temp\test.csv" | select $columnName
}
else
{
"No Email column founded"
}

Compare 2 csv files and match based on 1 column then export new file that contains fields from both

I have 2 csv files. Each have different headers and different number of columns, and have different number of entries.
Here are some examples of the first couple lines
CSV 1
ID,Last_Name,First_Name,Middle_Name,Email_Addr,Title,Gender
###1,smith,bill,p,smith#soso.com,boss,m
###2,smith2,billy,p,smith2#soso.com,someguy,m
CSV 2
ID,Name Id,Last Name,First Name,Middle Name,Gender
###2,ID1010,smith2,billy,p,M
I am trying to import them and compare the ID column. When a match is found I am wanting a new csv file with All info from CSV 1 and the matched Name ID from csv 2.
New CSV Example:
ID,Last_Name,First_Name,Middle_Name,Email_Addr,Title,Gender,Name Id
###1,smith,bill,p,smith#soso.com,boss,m,
###2,smith2,billy,p,smith2#soso.com,someguy,m,ID1010
Ive been looking and came across this Stackoverflow from about a year ago that seemed to be on the right track but I cant seem to get code modified for my needs. Here is what I have tried.
$csv1 = Import-Csv -Path C:\STAFF\test1sky.csv
$csv2 = Import-Csv -Path C:\STAFF\test1power.csv
ForEach($Record in $csv2){
$MatchedValue = (Compare-Object $csv1 $Record -Property "ID" -IncludeEqual -ExcludeDifferent -PassThru).value
$Record = Add-Member -InputObject $Record -Type NoteProperty -Name "Name Id" -Value $MatchedValue
}
$csv2|Export-Csv 'C:\STAFF\combined.csv' -NoTypeInformation
I get the correct header in the new file but I never get the Name ID values to come though.
Any idea where I went wrong? I maybe on the wrong path completely and there be a easier way, but I need to be able to do this nightly without user interaction. Any help is appreciated!!

Let's try to simplify this. Add the 'Name ID' field to all records in CSV1. Then loop through it, and get the matches, and update the field. Something like:
$CSV1 = C:\Path\To\File1.csv
$CSV2 = C:\Path\To\File2.csv
$CSV1|ForEach{$_|Add-Member 'Name ID' $Null}
ForEach($Record in $CSV1){
$Record.'Name ID' = $CSV2|Where{$_.ID -eq $Record.ID}|Select -Expand 'Name ID'
}

$CSV1 = import-csv C:\Path\To\File1.csv
$CSV2 = import-csv C:\Path\To\File2.csv
#adds a row named "Name ID" to the PS Object( the CSV Import)
$CSV1|ForEach{$_|Add-Member 'Name ID' $Null}
ForEach($Record in $CSV1){
#gets the value from CSV1 for comparing to CSV2
$NameValue=Record."Last_Name"
#gets the Power Shell Object from the CSV2 Import that matches the Name ID from $csv1
$Nameobject= $CSV2|Where-object "Last Name" -contains $Namevalue
#Sets the Field "Name ID" in the PS Object $CSV1 Record to the Name ID from $csv2
$record."Name ID" = $Nameobject."Name ID"
}
You can easily grab addtional fields by adding other references to the CSV1 File by manipulating the CSV2 PS Object.
$record."Middle Name" = $nameobject."Middle_Name"
Since you have the entire object in the for loop form $csv2 you can call any of its fields or manipulate them by using variables and " |select -Property "Value" Like this
$objlength = $nameobject |select "First_Name"
$objlength.length
but i prefer to call it directly from the object as the output looks cleaner like this
$nameobject."First_Name".length

The operation you are looking for is called a relational join. Sometimes it's called an inner join, and sometimes just a join. My knowledge of join comes from SQL, not from Powershell.
Here's a description of "Join-Object". It seems to be what you are looking for.
http://blogs.msdn.com/b/powershell/archive/2012/07/13/join-object.aspx

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse