Fastest way to combine multiple csv files based on 1st column value - powershell

So lets say I have 5 csv files (created in order from 1 to 5) with 8-10 columns each. Each file has about 300,000 (give or take) rows each.
Each file should match the value (unique) from the first column in every file, and then combine the records + column title(s). If files 2 through 5 do not have it's value from column 1 found in file1 (from column 1), the entire row should be excluded from the merging.
Example below of two (out of 5) csv files...
File1
ColumnTitle1,ColumnTitle2,ColumnTitle3,ColumnTitle4,ColumnTitle5,ColumnTitle6,ColumnTitle7,ColumnTitle8,ColumnTitle9,ColumnTitle10
Column1Value3145,Column2Value1,Column3Value1,Column4Value1,Column5Value1,Column6Value1,Column7Valu1,Column8Value1,Column9Value1,Column10Value1
Column1Value573,Column2Value2,Column3Value2,Column4Value2,Column5Value2,Column6Value2,Column7Valu2,Column8Value2,Column9Value2,Column10Value2
Column1Value62317,Column2Value3,Column3Value3,Column4Value3,Column5Value3,Column6Value3,Column7Valu3,Column8Value3,Column9Value3,Column10Value3
Column1Value93,Column2Value4,Column3Value4,Column4Value4,Column5Value4,Column6Value4,Column7Valu4,Column8Value4,Column9Value4,Column10Value4
File2:
ColumnTitle1,ColumnTitle11,ColumnTitle12,ColumnTitle13,ColumnTitle14,ColumnTitle15,ColumnTitle16,ColumnTitle17,ColumnTitle18
Column1Value752789,Column11Value1,Column12Value1,Column13Value1,Column14Value1,Column15Value1,Column16Value1,Column17Value1,Column18Value1
Column1Value3145,Column11Value2,Column12Value2,Column13Value2,Column14Value2,Column15Value2,Column16Value2,Column17Value2,Column18Value2
Column1Value573,Column11Value3,Column12Value3,Column13Value3,Column14Value3,Column15Value3,Column16Value3,Column17Value3,Column18Value3
Column1Value832657,Column11Value4,Column12Value4,Column13Value4,Column14Value4,Column15Value4,Column16Value4,Column17Value4,Column18Value4
Column1Value62317,Column11Value5,Column12Value5,Column13Value5,Column14Value5,Column15Value5,Column16Value5,Column17Value5,Column18Value5
Column1Value93,Column11Value6,Column12Value6,Column13Value6,Column14Value6,Column15Value6,Column16Value6,Column17Value6,Column18Value6
Column1Value423568,Column11Value7,Column12Value7,Column13Value7,Column14Value7,Column15Value7,Column16Value7,Column17Value7,Column18Value7
If I were to just merge these two files (2 out of the 5) it would look something like this:
ColumnTitle1,ColumnTitle2,ColumnTitle3,ColumnTitle4,ColumnTitle5,ColumnTitle6,ColumnTitle7,ColumnTitle8,ColumnTitle9,ColumnTitle10,ColumnTitle11,ColumnTitle12,ColumnTitle13,ColumnTitle14,ColumnTitle15,ColumnTitle16,ColumnTitle17,ColumnTitle18
Column1Value3145,Column2Value1,Column3Value1,Column4Value1,Column5Value1,Column6Value1,Column7Valu1,Column8Value1,Column9Value1,Column10Value1,Column11Value2,Column12Value2,Column13Value2,Column14Value2,Column15Value2,Column16Value2,Column17Value2,Column18Value2
Column1Value573,Column2Value2,Column3Value2,Column4Value2,Column5Value2,Column6Value2,Column7Valu2,Column8Value2,Column9Value2,Column10Value2,Column11Value3,Column12Value3,Column13Value3,Column14Value3,Column15Value3,Column16Value3,Column17Value3,Column18Value3
Column1Value62317,Column2Value3,Column3Value3,Column4Value3,Column5Value3,Column6Value3,Column7Valu3,Column8Value3,Column9Value3,Column10Value3,Column11Value5,Column12Value5,Column13Value5,Column14Value5,Column15Value5,Column16Value5,Column17Value5,Column18Value5
Column1Value93,Column2Value4,Column3Value4,Column4Value4,Column5Value4,Column6Value4,Column7Valu4,Column8Value4,Column9Value4,Column10Value4,Column11Value6,Column12Value6,Column13Value6,Column14Value6,Column15Value6,Column16Value6,Column17Value6,Column18Value6
Adding files 3 - 5 would increase the columns to around 50 (give or take).
I'm not sure if this is the quickest method, but here is the logic I am thinking (which I'm not sure how to do using powershell):
Go one file at a time to match and merge with file one
store file1 in variable
store file2 in variable
Loop through lines in file1
\\\\ Where value1 in column1 from file1 is found in column1 from file2
\\\\ append row from file2 to row in file1
\\\\ remove row from file2 (lessen the search during the next loop iteration)
clear variable holding file2
store next file in variable
repeat the loop find and append iterations

All roads lead to Rome. One of them is:
#Hashtable to store master-objects in
$data = #{}
#Import-CSV -Filter "MyMasterList.csv" | Foreach-Object { $data[$_.ColumnTitle1] = $_ }
#Sampledata below
#"
ColumnTitle1,ColumnTitle2,ColumnTitle3,ColumnTitle4,ColumnTitle5,ColumnTitle6,ColumnTitle7,ColumnTitle8,ColumnTitle9,ColumnTitle10
Column1Value3145,Column2Value1,Column3Value1,Column4Value1,Column5Value1,Column6Value1,Column7Valu1,Column8Value1,Column9Value1,Column10Value1
Column1Value573,Column2Value2,Column3Value2,Column4Value2,Column5Value2,Column6Value2,Column7Valu2,Column8Value2,Column9Value2,Column10Value2
Column1Value62317,Column2Value3,Column3Value3,Column4Value3,Column5Value3,Column6Value3,Column7Valu3,Column8Value3,Column9Value3,Column10Value3
Column1Value93,Column2Value4,Column3Value4,Column4Value4,Column5Value4,Column6Value4,Column7Valu4,Column8Value4,Column9Value4,Column10Value4
"# | ConvertFrom-Csv | % { $data[$_.ColumnTitle1] = $_ }
Get-ChildItem -Path "C:\MyOtherCSVs" -Filter "*.csv" | ForEach-Object { Import-Csv -Path $_.FullName } | ForEach-Object {
$ID = $_.ColumnTitle1
#If row is in MasterList
if($data.ContainsKey($ID)) {
#Get matching object
$obj = $data[$ID]
#Foreach line in csv
$_.psobject.Properties | Where-Object { $_.Name -ne 'ColumnTitle1' } | ForEach-Object {
#Foreach property, add to master-object
Add-Member -InputObject $obj -MemberType NoteProperty -Name $_.Name -Value $_.Value
}
#Put modified object back into hashtable
$data[$ID] = $obj
}
}
$data.Values | Export-Csv -Path "MergedCSV.csv" -NoTypeInformation
Be sure to pack some extra memory with large CSV-files.

Related

Export the matched output to a CSV as Column 1 and Column 2 using Powershell

I have below code to match the pattern and save it in CSV file. I need to save regex1 and regex2 as col1 and col2 in csv instead of saving all in 1st col.
$inputfile = ( Get-Content D:\Users\naham1224\Desktop\jil.txt )
$FilePath = "$env:USERPROFILE\Desktop\jil2.csv"
$regex1 = "(insert_job: [A-Za-z]*_*\S*)"
$regex2 = "(machine: [A-Z]*\S*)"
$inputfile |
Select-String -Pattern $regex2,$regex1 -AllMatches |
ForEach-Object {$_.matches.groups[1].value} |
Add-Content $FilePath`
Input file contains : input.txt
/* ----------------- AUTOSYS_DBMAINT ----------------- */
insert_job: AUTOSYS_DBMAINT job_type: CMD
command: %AUTOSYS%\bin\DBMaint.bat
machine: PWISTASASYS01
owner: svc.autosys#cbs
permission:
date_conditions: 1
days_of_week: su,mo,tu,we,th,fr,sa
start_times: "03:30"
description: "Runs DBmaint process on AE Database - if fails - MTS - will run next scheduled time"
std_out_file: ">$$LOGS\dbmaint.txt"
std_err_file: ">$$LOGS\dbmaint.txt"
alarm_if_fail: 0
alarm_if_terminated: 0
send_notification: 0
notification_msg: "Check DBMaint output in autouser.PD1\\out directory"
notification_emailaddress: jnatal#cbs.com
/* ----------------- TEST_ENV ----------------- */
insert_job: TEST_ENV job_type: CMD
command: set
machine: PWISTASASYS01
owner: svc.autosys#cbs
permission:
date_conditions: 1
days_of_week: su,mo,tu,we,th,fr,sa
start_times: "03:30"
description: "output env"
std_out_file: ">C:\Users\svc.autosys\Documents\env.txt"
std_err_file: ">C:\Users\svc.autosys\Documents\env.txt"
alarm_if_fail: 1
alarm_if_terminated: 1
Current output :
Current output
Expected output :
Expected output
I am trying various ways to do so but no luck. any suggestions and help is greatly appreciated.
Here is how I would do this:
$inputPath = 'input.txt'
$outputPath = 'output.csv'
# RegEx patterns to extract data.
$patterns = #(
'(insert_job): ([A-Za-z]*_*\S*)'
'(machine): ([A-Z]*\S*)'
)
# Create an ordered Hashtable to collect columns for one row.
$row = [ordered] #{}
# Loop over all occurences of the patterns in input file
Select-String -Path $inputPath -Pattern $patterns -AllMatches | ForEach-Object {
# Extract key and value from current match
$key = $_.matches.Groups[ 1 ].Value
$value = $_.matches.Value
# Save one column of current row.
$row[ $key ] = $value
# If we have all columns of current row, output it as PSCustomObject.
if( $row.Count -eq $patterns.Count ) {
# Convert hashtable to PSCustomObject and output (implicitly)
[PSCustomObject] $row
# Clear Hashtable in preparation for next row.
$row.Clear()
}
} | Export-Csv $outputPath -NoTypeInformation
Output CSV:
"insert_job","machine"
"insert_job: AUTOSYS_DBMAINT","machine: PWISTASASYS01"
"insert_job: TEST_ENV","machine: PWISTASASYS01"
Remarks:
Using Select-String with parameter -Path we don't have to read the input file beforehand.
An ordered Hashtable (a dictionary) is used to collect all columns, until we have an entire row to output. This is the crucial step to produce multiple columns instead of outputting all data in a single column.
Converting the Hashtable to a PSCustomObject is necessary because Export-Csv expects objects, not dictionaries.
While the CSV looks like your "expected output" and you possibly have good reason to expect it like that, in a CSV file the values normally shouldn't repeat the column names. To remove the column names from the values, simply replace $value = $_.matches.Value by $_.matches.Groups[ 2 ].Value, which results in an output like this:
"insert_job","machine"
"AUTOSYS_DBMAINT","PWISTASASYS01"
"TEST_ENV","PWISTASASYS01"
As for what you have tried:
Add-Content writes only plain text files from string input. While you could use it to create CSV files, you would have to add separators and escape strings all by yourself, which is easy to get wrong and more hassle than necessary. Export-CSV otoh takes objects as inputs and cares about all of the CSV format details automatically.
As zett42 mentioned Add-Content is not the best fit for this. Since you are looking for multiple values separated by commas Export-Csv is something you can use. Export-Csv will take objects from the pipeline, convert them to lines of comma-separated properties, add a header line, and save to file
I took a little bit of a different approach here with my solution. I've combined the different regex patterns into one which will give us one match that contains both the job and machine names.
$outputPath = "$PSScriptRoot\output.csv"
# one regex to match both job and machine in separate matching groups
$regex = '(?s)insert_job: (\w+).+?machine: (\w+)'
# Filter for input files
$inputfiles = Get-ChildItem -Path $PSScriptRoot -Filter input*.txt
# Loop through each file
$inputfiles |
ForEach-Object {
$path = $_.FullName
Get-Content -Raw -Path $path | Select-String -Pattern $regex -AllMatches |
ForEach-Object {
# Loop through each match found in the file.
# Should be 2, one for AUTOSYS_DBMAINT and another for TEST_ENV
$_.Matches | ForEach-Object {
# Create objects with the values we want that we can output to csv file
[PSCustomObject]#{
# remove next line if not needed in output
InputFile = $path
Job = $_.Groups[1].Value # 1st matching group contains job name
Machine = $_.Groups[2].Value # 2nd matching group contains machine name
}
}
}
} | Export-Csv $outputPath # Pipe our objects to Export-Csv
Contents of output.csv
"InputFile","Job","Machine"
"C:\temp\powershell\input1.txt","AUTOSYS_DBMAINT","PWISTASASYS01"
"C:\temp\powershell\input1.txt","TEST_ENV","PWISTATEST2"
"C:\temp\powershell\input2.txt","AUTOSYS_DBMAINT","PWISTASAPROD1"
"C:\temp\powershell\input2.txt","TEST_ENV","PWISTATTEST1"

Powershell: How to merge unique headers from one CSV to another?

Edit 1:
So I've figure out how to get the unique headers in CSV 2 to append to CSV 1.
$header = ($table | Get-Member -MemberType NoteProperty).Name
$header_add = ($table_add | Get-Member -MemberType NoteProperty).Name
$header_diff = $header + $header_add
$header_diff = ($header_diff | Sort-Object -Unique)
$header_diff = (Compare-Object -ReferenceObject $header -DifferenceObject $header_diff -PassThru)
$header is an array of headers from CSV 1 ($table). $header_add is an array of headers from CSV 2 ($table_add). $header_diff houses the unique headers in CSV 2 by the end of the code block.
So as far as I'm aware, my next step would be:
$append = ($table_add | Select-Object $header_diff)
My problem now is how do I append these objects to my CSV 1 ($table 1) object? I don't quite see a way for Add-Member to do this in a particularly nice fashion.
Original:
Here's the headers for the two CSV files I'm trying to combine.
CSV 1:
Date, Name, Assigned Router, City, Country, # of Calls , Calls in , Calls out
CSV 2:
Date, Name, Assigned Router, City, Country, # of Minutes, Minutes in, Minutes out
So a quick rundown of what these files are; both files contain call information for a set of names for one day (the date column has the same date for each row; this is because this eventually gets sent to a master .xlsx file with all dates combined). All of the columns up to Country contain the same values in the same order in both files. The files simply separate the # of calls and # of minutes data. I was wondering if there was a convenient way to move the unlike columns from one CSV to another.
I've tried using something along the lines of:
Import-Csv (Get-ChildItem <directory> -Include <common pattern in file pair>) | Export-Csv <output path> -NoTypeInformation
This didn't combine all of the matching headers and append the unique ones afterwards. Only the first file that's processed kept its unique headers. The second file that was processed had all of those headers and data discarded in the output. Shared header data in the second CSV was added as additional rows.
An example output of my described fail output:
PS > $small | Format-Table
Column_1 Column_2 Column_3
-------- -------- --------
1 a a
1 b b
1 c c
PS > $small_add | Format-Table
Column_1 Column_4 Column_5
-------- -------- --------
1 x x
1 y y
1 z z
PS > Import-Csv (Get-ChildItem ./*.* -Include "small*.csv") | Select-Object * -unique | Format-Table
Column_1 Column_2 Column_3
-------- -------- --------
1 a a
1 b b
1 c c
1
1
1
I was wondering if I could do something like the following algorithm:
Import-Csv CSV_1 and CSV_2 to separate variables
Compare CSV_2 headers to CSV_1 headers, storing the unlike headers in CSV_2 into a separate variable
Select-Object all CSV_1 headers and unlike CSV_2 headers
Pipe the Select-Object output to Export-Csv
The only other method I could only think of is doing it line by line where I would:
Import-Csv both
remove all of the shared columns from CSV_2
change it from the custom object Powershell uses for CSVs to a string
append each line of CSV_2 to each line of CSV_1
It feels a bit unrefined and inflexible (flexibility can probably be dealt with by how columns/headers are isolated so there's no problem appending strings).
* This answer focuses on a high-level-of-abstraction OO solution.
* The OP's own solution relies more on string processing, which has the potential to be faster.
# The input file paths.
$files = 'csv1.csv', 'csv2.csv'
$outFile = 'csvMerged.csv'
# Read the 2 CSV files into collections of custom objects.
# Note: This reads the entire files into memory.
$doc1 = Import-Csv $files[0]
$doc2 = Import-Csv $files[1]
# Determine the column (property) names that are unique to document 2.
$doc2OnlyColNames = (
Compare-Object $doc1[0].psobject.properties.name $doc2[0].psobject.properties.name |
Where-Object SideIndicator -eq '=>'
).InputObject
# Initialize an ordered hashtable that will be used to temporarily store
# each document 2 row's unique values as key-value pairs, so that they
# can be appended as properties to each document-1 row.
$htUniqueRowD2Props = [ordered] #{}
# Process the corresponding rows one by one, construct a merged output object
# for each, and export the merged objects to a new CSV file.
$i = 0
$(foreach($rowD1 in $doc1) {
# Get the corresponding row from document 2.
$rowD2 = $doc2[$i++]
# Extract the values from the unique document-2 columns and store them in the ordered
# hashtable.
foreach($pname in $doc2OnlyColNames) { $htUniqueRowD2Props.$pname = $rowD2.$pname }
# Add the properties represented by the hashtable entries to the
# document-1 row at hand and output the augmented object (-PassThru).
$rowD1 | Add-Member -NotePropertyMembers $htUniqueRowD2Props -PassThru
}) | Export-Csv -NoTypeInformation -Encoding Utf8 $outFile
To put the above to the test, you can use the following sample input:
# Create sample input CSV files
#'
Date,Name,Assigned Router,City,Country,# of Calls,Calls in,Calls out
dt,nm,ar,ct,cy,cc,ci,co
dt2,nm2,ar2,ct2,cy2,cc2,ci2,co2
'# > csv1.csv
# Same column layout and data as above through column 'Country', then different.
#'
Date,Name,Assigned Router,City,Country,# of Minutes,Minutes in,Minutes out
dt,nm,ar,ct,cy,mc,mi,mo
dt2,nm2,ar2,ct2,cy2,mc2,mi2,mo2
'# > csv2.csv
The code should produce the following content in csvMerged.csv:
"Date","Name","Assigned Router","City","Country","# of Calls","Calls in","Calls out","# of Minutes","Minutes in","Minutes out"
"dt","nm","ar","ct","cy","cc","ci","co","mc","mi","mo"
"dt2","nm2","ar2","ct2","cy2","cc2","ci2","co2","mc2","mi2","mo2"
Edit 1:
# Read 2 CSVs into PowerShell CSV object
$table = Import-Csv test.csv
$table_add = Import-Csv test_add.csv
# Isolate unique headers in second CSV
$unique_headers = (Compare-Object -ReferenceObject $table[0].PSObject.Properties.Name -DifferenceObject $table_add[0].PSObject.Properties.Name | Where-Object SideIndicator -eq "=>").InputObject
# Convert CSVs to strings, with second CSV only containing unique columns
$table_str = ($table | ConvertTo-Csv -NoTypeInformation)
$table_add_str = ($table_add | Select-Object $unique_headers | ConvertTo-Csv -NoTypeInformation)
# Append CSV 2's unique columns to CSV 1
# Set line counter
$line = 0
# Concatenate CSV 2 lines to the end of CSV 1 lines until one or both are out of lines
While (($table_str[$line] -ne $null) -and ($table_add_str[$line] -ne $null)) {
If ($line -eq 0) {
$table_sum_str = $table_str[$line] + "," + $table_add_str[$line]
}
If ($line -ne 0) {
$table_sum_str = $table_sum_str + "`n" + ($table_str[$line] + "," + $table_add_str[$line])
}
$line = $line + 1
}
$table_sum_str | Set-Content -Path $outpath -Encoding UTF8
Using Measure-Command, the above code on my machine for the most part takes anywhere between 14-17 milliseconds to run. Running Measure-Command on mklement's yields effectively the same times from just eyeballing it.
Note that for both solutions, the data in the 2 CSV files must be in the same order. If you want to add 2 CSVs together that have complimentary data but in different orders, you need to use mklement's object oriented approach and add mechanisms to match the data to a location or name.
Original:
For those who don't want to use a hash table to do this:
# Make sure you're in same directory as files:
# CSV 1
$table = Import-Csv test.csv
# CSV 2
$table_add = Import-Csv test_add.csv
# Get array with CSV 1 headers
$header = ($table | Get-Member -MemberType NoteProperty).Name
# Get array with CSV 2 headers
$header_add = ($table_add | Get-Member -MemberType NoteProperty).Name
# Add arrays of both headers together
$header_diff = $header + $header_add
# Sort the headers, remove duplicate headers (first couple ones), keep unique ones
$header_diff = ($header_diff | Sort-Object -Unique)
# Remove all of CSV 1's unique headers and shared headers
$header_diff = (Compare-Object -ReferenceObject $header -DifferenceObject $header_diff -PassThru)
# Generate a CSV table containing only CSV 2's unique headers
$table_diff = ($table_add | Select-Object $header_diff)
# Convert CSV 1 from a custom PSObject to a string
$table_str = ($table | Select-Object * | ConvertTo-Csv)
# Convert CSV 2 (unique headers only) from custom PSObject to a string
$table_diff_str = ($table_diff | Select-Object * | ConvertTo-Csv)
# Set line counter
$line = 0
# Set flag for if headers have been processed
$headproc = 0
# Concatenate CSV 2 lines to the end of CSV 1 lines until one or both are out of lines.
While (($table_str[$line] -ne $null) -and ($table_diff_str[$line] -ne $null)) {
If ($headproc -eq 1) {
$table_sum_str = $table_sum_str + "`n" + ($table_str[$line] + "," + $table_diff_str[$line])
}
If ($headproc -eq 0) {
$table_sum_str = $table_str[$line] + "," + $table_diff_str[$line]
$headproc = 1
}
$line = $line + 1
}
$table_sum_str | ConvertFrom-Csv | Select-Object * | Export-Csv -Path "./test_sum.csv" -Encoding UTF8 -NoTypeInformation
Ran a quick comparison using Measure-Command between this and mklement0's script.
PS > Measure-Command {./self.ps1}
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 26
Ticks : 267771
TotalDays : 3.09920138888889E-07
TotalHours : 7.43808333333333E-06
TotalMinutes : 0.000446285
TotalSeconds : 0.0267771
TotalMilliseconds : 26.7771
PS > Measure-Command {./mklement.ps1}
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 18
Ticks : 185058
TotalDays : 2.141875E-07
TotalHours : 5.1405E-06
TotalMinutes : 0.00030843
TotalSeconds : 0.0185058
TotalMilliseconds : 18.5058
I assume speed differences are because I spend time creating a separate CSV PSObject to isolate columns instead of comparing them directly. mklement's also has the advantage of keeping the columns in the same order.

Powershell removing columns and rows from CSV

I'm having trouble making some changes to a series of CSV files, all with the same data structure. I'm trying to combine all of the files into one CSV file or one tab delimited text file (don't really mind), however each file needs to have 2 empty rows removed and two of the columns removed, below is an example:
col1,col2,col3,col4,col5,col6 <-remove
col1,col2,col3,col4,col5,col6 <-remove
col1,col2,col3,col4,col5,col6
col1,col2,col3,col4,col5,col6
^ ^
remove remove
End Result:
col1,col2,col4,col6
col1,col2,col4,col6
This is my attempt at doing this (I'm very new to Powershell)
$ListofFiles = "example.csv" #this is an list of all the CSV files
ForEach ($file in $ListofFiles)
{
$content = Get-Content ($file)
$content = $content[2..($content.Count)]
$contentArray = #()
[string[]]$contentArray = $content -split ","
$content = $content[0..2 + 4 + 6]
Add-Content '...\output.txt' $content
}
Where am I going wrong here...
your example file should be read, before foreach to fetch the file list
$ListofFiles = get-content "example.csv"
Inside the foreach you are getting content of mainfile
$content = Get-Content ($ListofFiles)
instead of
$content = Get-Content $file
and for removing rows i will recommend this:
$obj = get-content C:\t.csv | select -Index 0,1,3
for removing columns (column numbers 0,1,3,5):
$obj | %{(($_.split(","))[0,1,3,5]) -join "," } | out-file test.csv -Append
According to the fact the initial files looks like
col1,col2,col3,col4,col5,col6
col1,col2,col3,col4,col5,col6
,,,,,
,,,,,
You can also try this one liner
Import-Csv D:\temp\*.csv -Header 'C1','C2','C3','C4','C5','C6' | where {$_.c1 -ne ''} | select -Property 'C1','C2','C5' | Export-Csv 'd:\temp\final.csv' -NoTypeInformation
According to the fact that you CSVs have all the same structure, you can directly open them providing the header, then remove objects with the missing datas then export all the object in a csv file.
It is sufficient to specify fictitious column names, with a column number that can exceed the number of columns in the file, change where you want and exclude columns that you do not want to take.
gci "c:\yourdirwithcsv" -file -filter *.csv |
%{ Import-Csv $_.FullName -Header C1,C2,C3,C4,C5,C6 |
where C1 -ne '' |
select -ExcludeProperty C3, C4 |
export-csv "c:\temp\merged.csv" -NoTypeInformation
}

Loop through csv compare content with an array and then add content to csv

I don't know how to append a string to CSV. What am I doing:
I have two csv files. One with a list of host-names and id's and another one with a list of host-names and some numbers.
Example file 1:
Hostname | ID
IWBW140004 | 3673234
IWBW130023 | 2335934
IWBW120065 | 1350213
Example file 2:
ServiceCode | Hostname | ID
4 | IWBW120065 |
4 | IWBW140004 |
4 | IWBW130023 |
Now I read the content of file 1 in a two dimensional array:
$pcMatrix = #(,#())
Import-Csv $outputFile |ForEach-Object {
foreach($property in $_.PSObject.Properties){
$pcMatrix += ,($property.Value.Split(";")[1],$property.Value.Split(";")[2])
}
}
Then I read the content of file 2 and compare it with my array:
Import-Csv $Group".csv" | ForEach-Object {
foreach($property in $_.PSObject.Properties){
for($i = 0; $i -lt $pcMatrix.Length; $i++){
if($pcMatrix[$i][0] -eq $property.Value.Split('"')[1]){
#Add-Content here
}
}
}
}
What do I need to do, to append $pcMatrix[$i][1] to the active column in file 2 in the row ID?
Thanks for your suggestions.
Yanick
It seems like you are over-complicating this task.
If I understand you correctly, you want to populate the ID column in file two, with the ID that corresponds to the correct hostname from file 1. The easiest way to do that, is to fill all the values from the first file into a HashTable and use that to lookup the ID for each row in the second file:
# Read the first file and populate the HashTable:
$File1 = Import-Csv .\file1.txt -Delimiter "|"
$LookupTable = #{}
$File1 |ForEach-Object {
$LookupTable[$_.Hostname] = $_.ID
}
# Now read the second file and update the ID values:
$File2 = Import-Csv .\file2.txt -Delimiter "|"
$File2 |ForEach-Object {
$_.ID = $LookupTable[$_.Hostname]
}
# Then write the updated rows back to a new CSV file:
$File2 | Export-CSV -Path .\file3.txt -NoTypeInformation -Delimiter "|"

Remove rows present in one .csv from another .csv (windows, powershell, notepad++)

I have notepad++, powershell, and excel 2007. I have two .csv files named
database.csv and import.csv . Import.csv contains new entries that I want to put
into my database online. Database.csv contains the current records in that database.
Both files contain a simple comma-newline delimited list of unique values.
However, the database may already contain some entries in the new file. And, the new
file contains entries that are not in the database. And, the database file contains
entries that are still retained for recording purposes, but are not in the input file.
Simply combining them results in duplicates of any record that has an ongoing existence.
It also results in single copies of records only present in the database and records only
present in the input file.
What I want is a file that only contains records that are only present in the input file.
Any advice?
Assuming your csv files have the columns a, b, & c:
$db = Import-Csv database.csv
$import = Import-Csv import.csv
$new = Compare-Object -ReferenceObject $db -DifferenceObject $import -Property a,b,c -PassThru | ? { $_.SideIndicator -eq "=>" } | Select a,b,c
Just replace a, b, and c with the names of the columns you want to compare
Powershell:
Get-Content <database file> -TotalCount 1 |
Set-Content C:\somedir\ToUpload.csv
$import = #{}
Get-Content <import file> |
select -Skip 1
foreach {
$import[$_] = $true
}
Get-Content <Database file> |
select -Skip 1 |
foreach {
if ($import[$_])
{
$import[$_].remove()
}
}
$import.Keys |
Add-Content C:\Somedir\ToUpload.csv
Alternatively, reading both files into memory:
Get-Content <database file> -TotalCount 1 |
Set-Content C:\somedir\ToUpload.csv
$import = Get-Content <import file>
select -Skip 1
$database = Get-Content <database file>
select -Skip 1
$import |
where {$database -notcontains $_} |
Add-Content C:\somedir\ToUpload.csv
The solutions using import / export csv will work but impose additional memory and process overhead compared to dealing with the files as text data. The difference may be trivial or substantial, depending on the size of the files and the number of columns there are in the csv files. IMHO.
Compare-Object struggles sometimes with customobject imported from csv if you don't have any specific properties to match.
If you want performance(for large csv files), you could try this:
$i = #{}
[IO.File]::ReadAllLines("C:\input.csv") | % { $i[$_] = $true }
$reader = New-Object System.IO.StreamReader "C:\db.csv"
#Skip header. This way the output file(new.csv) will get input.csv's header
$reader.ReadLine() | Out-Null
while (($line = $reader.ReadLine()) -ne $null) {
#Remove row if it exists in db.csv
if ($i.ContainsKey($line)) {
$i.Remove($line)
}
}
$reader.Close()
$i.Keys | Add-Content c:\new.csv