Edit CSV without EXCEL - powershell

I need to edit a .CSV file by shifting the first column of data down 1 row, then taking the last value in the first column and moving it to the top. Any idea how I can do this without using
$objExcel = New-Object -ComObject Excel.Application

Although I wouldn't know why you want to rotate the values in one of the columns, here is how you can do that without the need for Excel.
From your comments, I gather the CSV file has no headers and contains only data rows.
Because of that, the following adds headers when importing the data.
Suppose your csv file looks like this:
Clothing Rental,Chicago Illinois,1,25
Clothing Purchase,Dallas Texas,2,35
Clothing Free of Charge,Phoenix Arizona,3,45
Then the following should do what you want:
$data = Import-Csv -Path 'D:\yourdata.csv' -Header 'Stuff','City','Number','InStock' # or add whatever headers you like
# get the first column as an array of values
$column1 = $data.Stuff
# rotate the array values
switch ($column1.Count) {
    1 { Write-Host "Nothing to do here. There is only one row of data."; break }
    2 {
        # swap the two values
        $data[0].Stuff, $data[1].Stuff = $data[1].Stuff, $data[0].Stuff
        break
    }
    default {
        # move the last value to the top, shift the rest down one row
        $newColumn1 = @($column1[-1]; $column1[0..($column1.Count - 2)])
        # re-write the first column in the data
        for ($i = 0; $i -lt $newColumn1.Count; $i++) {
            $data[$i].Stuff = $newColumn1[$i]
        }
    }
}
# output on screen
$data
# output to new CSV file WITH headers
$data | Export-Csv -Path 'D:\your_rotated_data.csv' -NoTypeInformation -Force
# output to new CSV file WITHOUT headers
$data | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1 | Set-Content -Path 'D:\your_rotated_data.csv' -Force
The output on screen after running this looks like
Stuff                   City             Number InStock
-----                   ----             ------ -------
Clothing Free of Charge Chicago Illinois 1      25
Clothing Rental         Dallas Texas     2      35
Clothing Purchase       Phoenix Arizona  3      45
and you can see all values in the first column ("Stuff") have been rotated, i.e. the last value is now on top and the other values have moved down.
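If you ever need the same rotation on another column, the swap/slice logic above can be wrapped in a small helper. This is just a sketch with a hypothetical function name:
# return a copy of an array with the last element moved to the front
function Move-LastToTop {
    param([object[]] $Values)
    if ($Values.Count -lt 2) { return $Values } # nothing to rotate
    @($Values[-1]) + $Values[0..($Values.Count - 2)]
}
# e.g. rotate the second column instead:
$newCity = Move-LastToTop $data.City
for ($i = 0; $i -lt $newCity.Count; $i++) { $data[$i].City = $newCity[$i] }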


Insert blank columns into csv with Powershell

In my script, I am building a custom Powershell object which will be sent to Export-Csv. The receiving party has required that I include some blank columns (no data and no header) and I have no idea how to do that.
If the object looks like this:
$obj = [PSCustomObject][ordered]@{
    EMPLOYER_EIN = '123456'
    ACTION_CODE = 1
    LAST_NAME = 'Smith'
    FIRST_NAME = 'John'
    MIDDLE_INITIAL = $null
    EMPLOYEE_SSN = '111-11-1111'
}
How can I have the resulting .csv file's first row look like this:
EMPLOYER_EIN,ACTION_CODE,,LAST_NAME,FIRST_NAME,MIDDLE_INITIAL,,EMPLOYEE_SSN
Put another way, after I run Export-Csv, I want the file to look like this when opened in Excel:
EMPLOYER_EIN   ACTION_CODE           LAST_NAME   FIRST_NAME   MIDDLE_INITIAL           EMPLOYEE_SSN
123456         1                     Smith       John                                  111-11-1111
Note the extra columns between action_code/last_name and middle_initial/employee_ssn. I am using PS 5.1 but could use 7 if necessary.
As a test, I created a CSV test.csv with fields A,B, and C, and put a couple of lines of values:
"A","B","C"
1,2,3
4,5,6
I then executed the sequence of commands
Import-CSV -path Test.csv | Select-Object -Prop A," ",B,C | Export-CSV -Path test2.csv
and looked at the resultant test2.csv, which contained
#TYPE Selected.System.Management.Automation.PSCustomObject
"A"," ","B","C"
"1",,"2","3"
"4",,"5","6"
I believe that this is going to be the closest you'll get without manually processing the CSV as a text file.
This is essentially what Santiago Squarzon was suggesting in the comments.
If you need multiple "blank" columns, each one will have to have a header with a different non-zero number of spaces.
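For example, with two blank columns the Select-Object call could look like this (note the one-space and two-space headers; the column names are hypothetical):
Import-Csv -Path Test.csv |
    Select-Object -Property A, ' ', B, '  ', C |
    Export-Csv -Path test2.csv -NoTypeInformation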
I suggest:
constructing the object with blank dummy properties with a shared name prefix, such as BLANK_, followed by a sequence number (the property names must be unique)
initially piping to ConvertTo-Csv, which allows use of a -replace operation to replace the dummy property names with empty strings in the first output line (the header line).
the result - which already is in CSV format - can then be saved to a CSV file with Set-Content.
$obj = [PSCustomObject] @{
    EMPLOYER_EIN = '123456'
    ACTION_CODE = 1
    BLANK_1 = $null # first dummy property
    LAST_NAME = 'Smith'
    FIRST_NAME = 'John'
    MIDDLE_INITIAL = $null
    BLANK_2 = $null # second dummy property
    EMPLOYEE_SSN = '111-11-1111'
}
$first = $true
$obj |
    ConvertTo-Csv |
    ForEach-Object {
        if ($first) { # header row: replace dummy property names with empty string
            $first = $false
            $_ -replace '\bBLANK_\d+'
        }
        else { # data row: pass through
            $_
        }
    } # pipe to Set-Content as needed.
Output (note the blank column names after ACTION_CODE and MIDDLE_INITIAL):
"EMPLOYER_EIN","ACTION_CODE","","LAST_NAME","FIRST_NAME","MIDDLE_INITIAL","","EMPLOYEE_SSN"
"123456","1",,"Smith","John",,,"111-11-1111"

re-arrange and combine powershell custom objects

I have a system that currently reads data from a CSV file produced by a separate system that is going to be replaced.
The imported CSV file looks like this
PS> Import-Csv .\SalesValues.csv
Sale Values AA BB
----------- -- --
10          6  5
5           3  4
3           1  9
To replace this process I hope to produce an object that looks identical to the CSV above, but I do not want to continue to use a CSV file.
I already have a script that reads data in from our database and extracts the data that I need to use. I'll not detail the fairly long script that precedes this point, but in effect it looks like this:
$SQLData = Custom-SQLFunction "SELECT * FROM SALES_DATA WHERE LIST_ID = $LISTID"
$SQLData will contain ~5000+ DataRow objects that I need to query.
One of those DataRow object looks something like this:
lead_id : 123456789
entry_date : 26/10/2018 16:51:16
modify_date : 01/11/2018 01:00:02
status : WRONG
user : mrexample
vendor_lead_code : TH1S15L0NGC0D3
source_id : A543212
list_id : 333004
list_name : AA Some Text
gmt_offset_now : 0.00
SaleValue : 10
list_name is going to be prefixed with AA or BB.
SaleValue can be any integer 3 and up, however realistically extremely unlikely to be higher than 100 (as this is a monthly donation) and will be one of 3,5,10 in the vast majority of occurrences.
I already have a script that takes the content of list_name, then creates and populates the data I need into two separate psobjects ($AASalesValues and $BBSalesValues) that collate the total numbers of SaleValue across the data set.
Because I cannot reliably anticipate the values of SaleValue, I have to create the psobjects' properties dynamically, like this:
foreach ($record in $SQLData) {
    if ($record.list_name -match "BB") {
        if ($record.SaleValue -gt 0) {
            if ($BBSalesValues | Get-Member -Name $($record.SaleValue) -MemberType Properties) {
                $BBSalesValues.$($record.SaleValue) = $BBSalesValues.$($record.SaleValue) + 1
            } else {
                $BBSalesValues | Add-Member -Name $($record.SaleValue) -MemberType NoteProperty -Value 1
            }
        }
    }
}
The two resultant objects look like this:
PS> $AASalesValues
10  5  3 50
-- -- -- --
17 14  3  1
PS> $BBSalesValues
 3 10  5  4
-- -- -- --
36 12 11  1
I now have the data that I need; however, I need to format it in a way that replicates the format of the CSV, so I can pass it directly to another existing PowerShell script that is configured to expect the data in the format the CSV is in, but I do not want to write the data to a file.
I'd prefer to pass this directly to the next part of the script.
Ultimately what I want to do is to produce a new object/some output that looks like the output from Import-Csv command at the top of this post.
I'd like a new object, say $OverallSalesValues, to look like this:
PS> $OverallSalesValues
Sale Values AA BB
----------- -- --
50          1  0
10          17 12
5           14 11
4           0  1
3           3  36
In the above example the values from $AASalesValues is listed under the AA column, the values from $BBSalesValues is listed under the BB column, with the rows matching the headers of the two original objects.
I did try this with hashtables but I was unable to work out how to both create them from dynamic values and format them to how I needed them to look.
Finally got there.
$TotalList = @()
foreach ($n in 3..200) {
    if ($AASalesValues.$n -or $BBSalesValues.$n) {
        $AACount = $AASalesValues.$n
        $BBCount = $BBSalesValues.$n
        $values = [PSCustomObject]@{
            'Sale Value' = $n
            AA = $AACount
            BB = $BBCount
        }
        $TotalList += $values
    }
}
$TotalList
produces an output of
Sale Value AA BB
---------- -- --
         3  3 36
         4     2
         5 14 11
        10 18 12
        50  1
Just need to add a bit to include '0' values instead of $null.
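One way to get 0 instead of $null is to cast the looked-up counts to [int] when building the object, since [int]$null evaluates to 0; a sketch of the adjusted object creation:
$values = [PSCustomObject]@{
    'Sale Value' = $n
    AA = [int]$AASalesValues.$n # [int]$null is 0, so a missing count becomes 0
    BB = [int]$BBSalesValues.$n
}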
I'm going to assume that $record contains a list of the database results for either $AASalesValues or $BBSalesValues, not both, otherwise you'd need some kind of selector to avoid counting records of one group with the other group.
Group the records by their SaleValue property as LotPings suggested:
$BBSalesValues = $record | Group-Object SaleValue -NoElement
That will give you a list of the SaleValue values with their respective count.
PS> $BBSalesValues
Count Name
----- ----
   36 3
   12 10
   11 5
    1 4
You can then update your CSV data with these values like this:
$file = 'C:\path\to\data.csv'
# read CSV into a hashtable mapping the sale value to the complete record
# (so that we can look up the record by sale value)
$csv = @{}
Import-Csv $file | ForEach-Object {
    $csv[$_.'Sale Values'] = $_
}
# add records for missing sale values
$($AASalesValues; $BBSalesValues) | Select-Object -Expand Name -Unique | ForEach-Object {
    if (-not $csv.ContainsKey($_)) {
        $csv[$_] = New-Object -Type PSObject -Property @{
            'Sale Values' = $_
            'AA' = 0
            'BB' = 0
        }
    }
}
# update records with values from $AASalesValues
$AASalesValues | ForEach-Object {
    $csv[$_.Name].AA = [int]$csv[$_.Name].AA + $_.Count
}
# update records with values from $BBSalesValues
$BBSalesValues | ForEach-Object {
    $csv[$_.Name].BB = [int]$csv[$_.Name].BB + $_.Count
}
# write updated records back to file
$csv.Values | Export-Csv $file -NoType
Even with your updated question the approach would be pretty much the same, you'd just add another level of grouping for collecting the sales numbers:
$sales = @{}
$record | Group-Object {$_.list_name.Split()[0]} | ForEach-Object {
    $sales[$_.Name] = $_.Group | Group-Object SaleValue -NoElement
}
and then adjust the merging to something like this:
$file = 'C:\path\to\data.csv'
# read CSV into a hashtable mapping the sale value to the complete record
# (so that we can look up the record by sale value)
$csv = @{}
Import-Csv $file | ForEach-Object {
    $csv[$_.'Sale Values'] = $_
}
# add records for missing sale values
$sales.Values | Select-Object -Expand Name -Unique | ForEach-Object {
    if (-not $csv.ContainsKey($_)) {
        $prop = @{'Sale Values' = $_}
        $sales.Keys | ForEach-Object {
            $prop[$_] = 0
        }
        $csv[$_] = New-Object -Type PSObject -Property $prop
    }
}
# update records with values from $sales
$sales.GetEnumerator() | ForEach-Object {
    $name = $_.Key
    $_.Value | ForEach-Object {
        $csv[$_.Name].$name = [int]$csv[$_.Name].$name + $_.Count
    }
}
# write updated records back to file
$csv.Values | Export-Csv $file -NoType

Adding columns and manipulating existing column values in csv file using powershell

I have a lot of csv files with values arranged like so:
X1,Y1
X2,Y2
...,...
Xn,Yn
I find it very tedious processing these with Excel, so I want to set up a batch script to process these files such that they appear like this:
#where N is a specified value like 65536
X1,N-Y1,1
X2,N-Y2,2
...,...,...
Xn,N-Yn,n
I have only recently started using PowerShell for image processing (really simple scripts) and file name appending, so I am not certain how to go about this. A lot of the scripts I have encountered looking to answer this question use CSV files with titles per column, whereas my files are just arrays of values without column titles in the first row. I would like to avoid running multiple scripts to add titles.
My bonus question is something I have yet to find a good answer to at all, and it is the most tedious part of processing. Using Excel's sort function, I usually change the order of the Yn values in Col2 such that they are sorted in the exported CSV like so:
X1,N-Yn,n
...,...,...
Xn-1,N-Y2,2
Xn,N-Y1,1
I use the Col3 values as the sort order (largest to smallest), then delete this column so that the final saved CSV contains only the first two columns (crucial step). Any help at all would be greatly appreciated; I apologize for the long-windedness of this question.
"A lot of the scripts I have encountered looking to answer this question use CSV files with titles per column, whereas my files are just arrays of values without column titles in the first row."
The -Header parameter of Import-Csv is for adding column headers when the file does not contain them. It takes an array of strings, one per column, for however many columns there are.
"I would like to avoid running multiple scripts to add titles."
If you couldn't use -Header, you could read the lines with Get-Content into memory, add a header in memory, and then use ConvertFrom-Csv, all in one script.
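A minimal sketch of that approach, assuming hypothetical column names ColX and ColY:
# prepend a header line in memory, then parse the whole thing as CSV
$rows = @('ColX,ColY') + (Get-Content -LiteralPath 'c:\test\test.csv') | ConvertFrom-Csv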
That said, if I'm reading it rightly, you want:
No headers in the input file, and I imagine no headers in the output file
The whole point of adding the third column and sorting and removing it is just to reverse the lines?
The only column you keep is column 1?
I wouldn't use Import-Csv for this, it won't make it much nicer.
$n = 65536
# Read lines into a list, and reverse it
$lines = [Collections.Generic.List[String]](Get-Content -LiteralPath 'c:\test\test.csv')
$lines.Reverse()
# Split each line into two, create a new line with X and N-Y
# write new lines to an output file
$lines | ForEach-Object {
    $x, $y = $_.split(',')
    "$x,$($n - [int]$y)"
} | Set-Content -LiteralPath 'c:\test\output.csv' -Encoding Ascii
If you do want to use CSV handling, then:
$n = 65536
$counter = 1
Import-Csv -LiteralPath 'C:\test\test.csv' -Header 'ColX', 'ColY' |
    Add-Member -MemberType ScriptProperty -Name 'ColN-Y' -Value { $n - [int]$this.ColY } -PassThru |
    Add-Member -MemberType ScriptProperty -Name 'N' -Value { ($script:counter++) } -PassThru |
    Sort-Object -Property 'N' -Descending |
    Select-Object -Property 'ColX', 'ColN-Y' |
    Export-Csv -LiteralPath 'c:\test\output.csv' -NoTypeInformation
But the output will have CSV headers and double-quoted values.
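If you need to get back to a header-less, unquoted file, one option is to post-process the converted lines; a sketch (here $rows stands for the objects produced by the pipeline above, captured in a variable instead of exported), safe only when the values themselves contain no commas or embedded quotes:
$rows | ConvertTo-Csv -NoTypeInformation |
    Select-Object -Skip 1 | # drop the header line
    ForEach-Object { $_ -replace '"' } | # strip the double quotes
    Set-Content -LiteralPath 'c:\test\output.csv' -Encoding Ascii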
I would try something like the following, extending the original table with a calculated script property as a new column:
# Your N number
$N = 65536
# Import CSV file without header columns
$table = Import-Csv -Header @("colX","colY") `
                    -Delimiter ',' `
                    -Path './numbers.csv'
Write-Host "Original table"
$table | Format-Table
# Manipulate table
$newtable = $table |
    Add-Member -MemberType ScriptProperty -Name colNX -Value { $N - $this.colX } -PassThru
Write-Host "New table"
$newtable | Format-Table
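If the deferred evaluation of a ScriptProperty is a concern (it is re-computed on every access, and $N must still be in scope at that point), a calculated property with Select-Object evaluates once per row instead; a sketch using the same hypothetical columns:
$newtable = $table |
    Select-Object colX, colY, @{ Name = 'colNX'; Expression = { $N - [int]$_.colX } }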

Powershell: How to merge unique headers from one CSV to another?

Edit 1:
So I've figured out how to get the unique headers in CSV 2 to append to CSV 1.
$header = ($table | Get-Member -MemberType NoteProperty).Name
$header_add = ($table_add | Get-Member -MemberType NoteProperty).Name
$header_diff = $header + $header_add
$header_diff = ($header_diff | Sort-Object -Unique)
$header_diff = (Compare-Object -ReferenceObject $header -DifferenceObject $header_diff -PassThru)
$header is an array of headers from CSV 1 ($table). $header_add is an array of headers from CSV 2 ($table_add). $header_diff houses the unique headers in CSV 2 by the end of the code block.
So as far as I'm aware, my next step would be:
$append = ($table_add | Select-Object $header_diff)
My problem now is how do I append these objects to my CSV 1 ($table) object? I don't quite see a way for Add-Member to do this in a particularly nice fashion.
Original:
Here's the headers for the two CSV files I'm trying to combine.
CSV 1:
Date, Name, Assigned Router, City, Country, # of Calls , Calls in , Calls out
CSV 2:
Date, Name, Assigned Router, City, Country, # of Minutes, Minutes in, Minutes out
So a quick rundown of what these files are; both files contain call information for a set of names for one day (the date column has the same date for each row; this is because this eventually gets sent to a master .xlsx file with all dates combined). All of the columns up to Country contain the same values in the same order in both files. The files simply separate the # of calls and # of minutes data. I was wondering if there was a convenient way to move the unlike columns from one CSV to another.
I've tried using something along the lines of:
Import-Csv (Get-ChildItem <directory> -Include <common pattern in file pair>) | Export-Csv <output path> -NoTypeInformation
This didn't combine all of the matching headers and append the unique ones afterwards. Only the first file that's processed kept its unique headers. The second file that was processed had all of those headers and data discarded in the output. Shared header data in the second CSV was added as additional rows.
An example output of my described fail output:
PS > $small | Format-Table
Column_1 Column_2 Column_3
-------- -------- --------
1 a a
1 b b
1 c c
PS > $small_add | Format-Table
Column_1 Column_4 Column_5
-------- -------- --------
1 x x
1 y y
1 z z
PS > Import-Csv (Get-ChildItem ./*.* -Include "small*.csv") | Select-Object * -unique | Format-Table
Column_1 Column_2 Column_3
-------- -------- --------
1 a a
1 b b
1 c c
1
1
1
I was wondering if I could do something like the following algorithm:
Import-Csv CSV_1 and CSV_2 to separate variables
Compare CSV_2 headers to CSV_1 headers, storing the unlike headers in CSV_2 into a separate variable
Select-Object all CSV_1 headers and unlike CSV_2 headers
Pipe the Select-Object output to Export-Csv
The only other method I could only think of is doing it line by line where I would:
Import-Csv both
remove all of the shared columns from CSV_2
change it from the custom object Powershell uses for CSVs to a string
append each line of CSV_2 to each line of CSV_1
It feels a bit unrefined and inflexible (flexibility can probably be dealt with by how columns/headers are isolated so there's no problem appending strings).
* This answer focuses on a high-level-of-abstraction OO solution.
* The OP's own solution relies more on string processing, which has the potential to be faster.
# The input file paths.
$files = 'csv1.csv', 'csv2.csv'
$outFile = 'csvMerged.csv'
# Read the 2 CSV files into collections of custom objects.
# Note: This reads the entire files into memory.
$doc1 = Import-Csv $files[0]
$doc2 = Import-Csv $files[1]
# Determine the column (property) names that are unique to document 2.
$doc2OnlyColNames = (
    Compare-Object $doc1[0].psobject.properties.name $doc2[0].psobject.properties.name |
        Where-Object SideIndicator -eq '=>'
).InputObject
# Initialize an ordered hashtable that will be used to temporarily store
# each document-2 row's unique values as key-value pairs, so that they
# can be appended as properties to each document-1 row.
$htUniqueRowD2Props = [ordered] @{}
# Process the corresponding rows one by one, construct a merged output object
# for each, and export the merged objects to a new CSV file.
$i = 0
$(foreach ($rowD1 in $doc1) {
    # Get the corresponding row from document 2.
    $rowD2 = $doc2[$i++]
    # Extract the values from the unique document-2 columns and store them in the
    # ordered hashtable.
    foreach ($pname in $doc2OnlyColNames) { $htUniqueRowD2Props.$pname = $rowD2.$pname }
    # Add the properties represented by the hashtable entries to the
    # document-1 row at hand and output the augmented object (-PassThru).
    $rowD1 | Add-Member -NotePropertyMembers $htUniqueRowD2Props -PassThru
}) | Export-Csv -NoTypeInformation -Encoding Utf8 $outFile
To put the above to the test, you can use the following sample input:
# Create sample input CSV files
@'
Date,Name,Assigned Router,City,Country,# of Calls,Calls in,Calls out
dt,nm,ar,ct,cy,cc,ci,co
dt2,nm2,ar2,ct2,cy2,cc2,ci2,co2
'@ > csv1.csv
# Same column layout and data as above through column 'Country', then different.
@'
Date,Name,Assigned Router,City,Country,# of Minutes,Minutes in,Minutes out
dt,nm,ar,ct,cy,mc,mi,mo
dt2,nm2,ar2,ct2,cy2,mc2,mi2,mo2
'@ > csv2.csv
The code should produce the following content in csvMerged.csv:
"Date","Name","Assigned Router","City","Country","# of Calls","Calls in","Calls out","# of Minutes","Minutes in","Minutes out"
"dt","nm","ar","ct","cy","cc","ci","co","mc","mi","mo"
"dt2","nm2","ar2","ct2","cy2","cc2","ci2","co2","mc2","mi2","mo2"
Edit 1:
# Read 2 CSVs into PowerShell CSV objects
$table = Import-Csv test.csv
$table_add = Import-Csv test_add.csv
# Isolate unique headers in second CSV
$unique_headers = (Compare-Object -ReferenceObject $table[0].PSObject.Properties.Name -DifferenceObject $table_add[0].PSObject.Properties.Name | Where-Object SideIndicator -eq "=>").InputObject
# Convert CSVs to strings, with second CSV only containing unique columns
$table_str = ($table | ConvertTo-Csv -NoTypeInformation)
$table_add_str = ($table_add | Select-Object $unique_headers | ConvertTo-Csv -NoTypeInformation)
# Append CSV 2's unique columns to CSV 1
# Set line counter
$line = 0
# Concatenate CSV 2 lines to the end of CSV 1 lines until one or both run out of lines
While (($table_str[$line] -ne $null) -and ($table_add_str[$line] -ne $null)) {
    If ($line -eq 0) {
        $table_sum_str = $table_str[$line] + "," + $table_add_str[$line]
    }
    If ($line -ne 0) {
        $table_sum_str = $table_sum_str + "`n" + ($table_str[$line] + "," + $table_add_str[$line])
    }
    $line = $line + 1
}
$table_sum_str | Set-Content -Path $outpath -Encoding UTF8
Using Measure-Command, the above code on my machine mostly takes anywhere between 14 and 17 milliseconds to run. Running Measure-Command on mklement0's yields effectively the same times, just from eyeballing it.
Note that for both solutions, the data in the 2 CSV files must be in the same order. If you want to add 2 CSVs together that have complementary data but in different orders, you need to use mklement0's object-oriented approach and add mechanisms to match the data to a location or name.
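A rough sketch of such a matching mechanism, assuming both files share a unique key column (hypothetically called Name): index one file's rows in a hashtable, then look rows up by key instead of by position:
# build an index of document-2 rows keyed by the shared column
$index = @{}
foreach ($row in $doc2) { $index[$row.Name] = $row }
# then, inside the loop over document-1 rows, replace $doc2[$i++] with:
# $rowD2 = $index[$rowD1.Name]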
Original:
For those who don't want to use a hash table to do this:
# Make sure you're in same directory as files:
# CSV 1
$table = Import-Csv test.csv
# CSV 2
$table_add = Import-Csv test_add.csv
# Get array with CSV 1 headers
$header = ($table | Get-Member -MemberType NoteProperty).Name
# Get array with CSV 2 headers
$header_add = ($table_add | Get-Member -MemberType NoteProperty).Name
# Add arrays of both headers together
$header_diff = $header + $header_add
# Sort the headers, remove duplicate headers (first couple ones), keep unique ones
$header_diff = ($header_diff | Sort-Object -Unique)
# Remove all of CSV 1's unique headers and shared headers
$header_diff = (Compare-Object -ReferenceObject $header -DifferenceObject $header_diff -PassThru)
# Generate a CSV table containing only CSV 2's unique headers
$table_diff = ($table_add | Select-Object $header_diff)
# Convert CSV 1 from a custom PSObject to a string
$table_str = ($table | Select-Object * | ConvertTo-Csv)
# Convert CSV 2 (unique headers only) from custom PSObject to a string
$table_diff_str = ($table_diff | Select-Object * | ConvertTo-Csv)
# Set line counter
$line = 0
# Set flag for if headers have been processed
$headproc = 0
# Concatenate CSV 2 lines to the end of CSV 1 lines until one or both are out of lines.
While (($table_str[$line] -ne $null) -and ($table_diff_str[$line] -ne $null)) {
    If ($headproc -eq 1) {
        $table_sum_str = $table_sum_str + "`n" + ($table_str[$line] + "," + $table_diff_str[$line])
    }
    If ($headproc -eq 0) {
        $table_sum_str = $table_str[$line] + "," + $table_diff_str[$line]
        $headproc = 1
    }
    $line = $line + 1
}
$table_sum_str | ConvertFrom-Csv | Select-Object * | Export-Csv -Path "./test_sum.csv" -Encoding UTF8 -NoTypeInformation
Ran a quick comparison using Measure-Command between this and mklement0's script.
PS > Measure-Command {./self.ps1}
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 26
Ticks : 267771
TotalDays : 3.09920138888889E-07
TotalHours : 7.43808333333333E-06
TotalMinutes : 0.000446285
TotalSeconds : 0.0267771
TotalMilliseconds : 26.7771
PS > Measure-Command {./mklement.ps1}
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 18
Ticks : 185058
TotalDays : 2.141875E-07
TotalHours : 5.1405E-06
TotalMinutes : 0.00030843
TotalSeconds : 0.0185058
TotalMilliseconds : 18.5058
I assume the speed differences are because I spend time creating a separate CSV PSObject to isolate columns instead of comparing them directly. mklement0's also has the advantage of keeping the columns in the same order.

Fastest way to combine multiple csv files based on 1st column value

So let's say I have 5 csv files (created in order from 1 to 5) with 8-10 columns each. Each file has about 300,000 (give or take) rows.
Each file should be matched on the (unique) value in its first column, and then the records + column title(s) combined. If a row in files 2 through 5 has a column-1 value that is not found in file1's column 1, that entire row should be excluded from the merge.
Example below of two (out of 5) csv files...
File1
ColumnTitle1,ColumnTitle2,ColumnTitle3,ColumnTitle4,ColumnTitle5,ColumnTitle6,ColumnTitle7,ColumnTitle8,ColumnTitle9,ColumnTitle10
Column1Value3145,Column2Value1,Column3Value1,Column4Value1,Column5Value1,Column6Value1,Column7Valu1,Column8Value1,Column9Value1,Column10Value1
Column1Value573,Column2Value2,Column3Value2,Column4Value2,Column5Value2,Column6Value2,Column7Valu2,Column8Value2,Column9Value2,Column10Value2
Column1Value62317,Column2Value3,Column3Value3,Column4Value3,Column5Value3,Column6Value3,Column7Valu3,Column8Value3,Column9Value3,Column10Value3
Column1Value93,Column2Value4,Column3Value4,Column4Value4,Column5Value4,Column6Value4,Column7Valu4,Column8Value4,Column9Value4,Column10Value4
File2:
ColumnTitle1,ColumnTitle11,ColumnTitle12,ColumnTitle13,ColumnTitle14,ColumnTitle15,ColumnTitle16,ColumnTitle17,ColumnTitle18
Column1Value752789,Column11Value1,Column12Value1,Column13Value1,Column14Value1,Column15Value1,Column16Value1,Column17Value1,Column18Value1
Column1Value3145,Column11Value2,Column12Value2,Column13Value2,Column14Value2,Column15Value2,Column16Value2,Column17Value2,Column18Value2
Column1Value573,Column11Value3,Column12Value3,Column13Value3,Column14Value3,Column15Value3,Column16Value3,Column17Value3,Column18Value3
Column1Value832657,Column11Value4,Column12Value4,Column13Value4,Column14Value4,Column15Value4,Column16Value4,Column17Value4,Column18Value4
Column1Value62317,Column11Value5,Column12Value5,Column13Value5,Column14Value5,Column15Value5,Column16Value5,Column17Value5,Column18Value5
Column1Value93,Column11Value6,Column12Value6,Column13Value6,Column14Value6,Column15Value6,Column16Value6,Column17Value6,Column18Value6
Column1Value423568,Column11Value7,Column12Value7,Column13Value7,Column14Value7,Column15Value7,Column16Value7,Column17Value7,Column18Value7
If I were to just merge these two files (2 out of the 5) it would look something like this:
ColumnTitle1,ColumnTitle2,ColumnTitle3,ColumnTitle4,ColumnTitle5,ColumnTitle6,ColumnTitle7,ColumnTitle8,ColumnTitle9,ColumnTitle10,ColumnTitle11,ColumnTitle12,ColumnTitle13,ColumnTitle14,ColumnTitle15,ColumnTitle16,ColumnTitle17,ColumnTitle18
Column1Value3145,Column2Value1,Column3Value1,Column4Value1,Column5Value1,Column6Value1,Column7Valu1,Column8Value1,Column9Value1,Column10Value1,Column11Value2,Column12Value2,Column13Value2,Column14Value2,Column15Value2,Column16Value2,Column17Value2,Column18Value2
Column1Value573,Column2Value2,Column3Value2,Column4Value2,Column5Value2,Column6Value2,Column7Valu2,Column8Value2,Column9Value2,Column10Value2,Column11Value3,Column12Value3,Column13Value3,Column14Value3,Column15Value3,Column16Value3,Column17Value3,Column18Value3
Column1Value62317,Column2Value3,Column3Value3,Column4Value3,Column5Value3,Column6Value3,Column7Valu3,Column8Value3,Column9Value3,Column10Value3,Column11Value5,Column12Value5,Column13Value5,Column14Value5,Column15Value5,Column16Value5,Column17Value5,Column18Value5
Column1Value93,Column2Value4,Column3Value4,Column4Value4,Column5Value4,Column6Value4,Column7Valu4,Column8Value4,Column9Value4,Column10Value4,Column11Value6,Column12Value6,Column13Value6,Column14Value6,Column15Value6,Column16Value6,Column17Value6,Column18Value6
Adding files 3 - 5 would increase the columns to around 50 (give or take).
I'm not sure if this is the quickest method, but here is the logic I am thinking of (which I'm not sure how to implement in PowerShell):
Go one file at a time to match and merge with file one
store file1 in variable
store file2 in variable
Loop through lines in file1
    Where value1 in column1 from file1 is found in column1 from file2
    append row from file2 to row in file1
    remove row from file2 (lessen the search during the next loop iteration)
clear variable holding file2
store next file in variable
repeat the loop find and append iterations
All roads lead to Rome. One of them is:
# Hashtable to store master objects in
$data = @{}
#Import-Csv -Path "MyMasterList.csv" | ForEach-Object { $data[$_.ColumnTitle1] = $_ }
# Sample data below
@"
ColumnTitle1,ColumnTitle2,ColumnTitle3,ColumnTitle4,ColumnTitle5,ColumnTitle6,ColumnTitle7,ColumnTitle8,ColumnTitle9,ColumnTitle10
Column1Value3145,Column2Value1,Column3Value1,Column4Value1,Column5Value1,Column6Value1,Column7Valu1,Column8Value1,Column9Value1,Column10Value1
Column1Value573,Column2Value2,Column3Value2,Column4Value2,Column5Value2,Column6Value2,Column7Valu2,Column8Value2,Column9Value2,Column10Value2
Column1Value62317,Column2Value3,Column3Value3,Column4Value3,Column5Value3,Column6Value3,Column7Valu3,Column8Value3,Column9Value3,Column10Value3
Column1Value93,Column2Value4,Column3Value4,Column4Value4,Column5Value4,Column6Value4,Column7Valu4,Column8Value4,Column9Value4,Column10Value4
"@ | ConvertFrom-Csv | ForEach-Object { $data[$_.ColumnTitle1] = $_ }
Get-ChildItem -Path "C:\MyOtherCSVs" -Filter "*.csv" | ForEach-Object { Import-Csv -Path $_.FullName } | ForEach-Object {
    $ID = $_.ColumnTitle1
    # If row is in master list
    if ($data.ContainsKey($ID)) {
        # Get matching object
        $obj = $data[$ID]
        # For each property in this row (except the key), add it to the master object
        $_.psobject.Properties | Where-Object { $_.Name -ne 'ColumnTitle1' } | ForEach-Object {
            Add-Member -InputObject $obj -MemberType NoteProperty -Name $_.Name -Value $_.Value
        }
        # Put modified object back into hashtable
        $data[$ID] = $obj
    }
}
$data.Values | Export-Csv -Path "MergedCSV.csv" -NoTypeInformation
Be sure to pack some extra memory with large CSV-files.
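If you want to keep an eye on memory while a large merge runs, you can check the current PowerShell process's working set, for example:
# current process memory usage, in MB
[math]::Round((Get-Process -Id $PID).WorkingSet64 / 1MB, 1)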