Combining CSV files in Powershell - different headings - powershell

I need to take a slew of csv files from a directory and get them into an array in Powershell (to eventually manipulate and write back to a CSV).
The problem is there are 5 file types. I need around 8 columns from each. The columns are essentially the same, but have different headings.
Is there an easy way to do this? I started creating a custom object with my 8 fields, looping through the files importing each one, looking at the filename (which tells me the column names I need) and then a bunch of ifs to add it to my custom object array.
I was wondering if there is a simpler way...like with a template saying which columns from each file.

I wound up doing this. It may not be the most efficient approach, but it works. I ended up writing out each file separately and combining them at the end, because PowerShell really got bogged down (over a million rows combined).
$Newcsv = @()
$path = "c:\scrap\BWFILES\"
$files = gci -path $path -recurse -filter *.csv | Where-Object { ! ($_.psiscontainer) }
$counter=1
foreach($file in $files)
{
$csv = Import-Csv $file.FullName
if ($file.Name -like '*SAV*')
{
$Newcsv = $csv | Select-Object @{Name="PRODUCT";Expression={"SV"}},DMBRCH,DMACCT,DMSHRT
}
if ($file.Name -like '*TIME*')
{
$Newcsv = $csv | Select-Object @{Name="PRODUCT";Expression={"TM"}},TMBRCH,TMACCT,TMSHRT
}
if ($file.Name -like '*TRAN*')
{
$Newcsv = $csv | Select-Object @{Name="PRODUCT";Expression={"TR"}},DMBRCH,DMACCT,DMSHRT
}
if ($file.Name -like '*LN*')
{
$Newcsv = $csv | Select-Object @{Name="PRODUCT";Expression={"LN"}},LNBRCH,LNNOTE,LNSHRT
}
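# $(...) is needed so that the Name property, not the whole $file object, is expanded in the string
$Newcsv | Export-Csv "C:\scrap\$($file.Name)$counter.csv" -force -notypeinformation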
$counter++
}
# Start with $true so the header line of the first file is kept
$getFirstLine = $true
get-childItem "c:\scrap\*.csv" | foreach {
$filePath = $_.FullName
$lines = Get-Content $filePath
$linesToWrite = switch($getFirstLine) {
$true {$lines}
$false {$lines | Select -Skip 1}
}
$getFirstLine = $false
Add-Content "c:\scrap\combined.csv" $linesToWrite
}

With a hashtable for reference, a little RegEx matching, and using the automatic variable $Matches in a ForEach-Object loop (alias % used) that could all be shortened to:
$path = "c:\scrap\BWFILES\"
$Reference = @{
'SAV' = 'SV'
'TIME' = 'TM'
'TRAN' = 'TR'
'LN'='LN'
}
Set-Content -Value "PRODUCT,BRCH,ACCT,SHRT" -Path 'c:\scrap\combined.csv'
gci -path $path -recurse -filter *.csv | Where-Object { !($_.psiscontainer) -and $_.Name -match ".*(SAV|TIME|TRAN|LN).*"}|%{
$Product = $Reference[($Matches[1])]
Import-CSV $_.FullName | Select-Object @{Name="PRODUCT";Expression={$Product}},*BRCH,@{l='Acct';e={$_.LNNOTE, $_.DMACCT, $_.TMACCT|?{$_}}},*SHRT | ConvertTo-Csv -NoTypeInformation | Select -Skip 1 | Add-Content 'c:\scrap\combined.csv'
}
That should produce the exact same file. The only tricky part was the LNNOTE/TMACCT/DMACCT field, since a single wildcard like the one used for *SHRT cannot match all three names.
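The "template" idea from the question can also be written down literally. The following is a minimal sketch rather than either poster's method: the `$Template` table, the `Convert-ToCommonShape` function name, the file name, and the sample row are made up for illustration; only the column names (DMBRCH, TMACCT, LNNOTE and so on) come from the scripts above.

```powershell
# Hypothetical template: file-name keyword -> product code and source column names
$Template = [ordered]@{
    SAV  = @{ Product = 'SV'; BRCH = 'DMBRCH'; ACCT = 'DMACCT'; SHRT = 'DMSHRT' }
    TIME = @{ Product = 'TM'; BRCH = 'TMBRCH'; ACCT = 'TMACCT'; SHRT = 'TMSHRT' }
    TRAN = @{ Product = 'TR'; BRCH = 'DMBRCH'; ACCT = 'DMACCT'; SHRT = 'DMSHRT' }
    LN   = @{ Product = 'LN'; BRCH = 'LNBRCH'; ACCT = 'LNNOTE'; SHRT = 'LNSHRT' }
}

function Convert-ToCommonShape {
    param(
        [Parameter(Mandatory)] [string] $FileName,
        [Parameter(Mandatory)] [object[]] $Rows
    )
    # Pick the first template entry whose keyword occurs in the file name
    $key = $Template.Keys | Where-Object { $FileName -like "*$_*" } | Select-Object -First 1
    if (-not $key) { return }   # unknown file type: emit nothing
    $map = $Template[$key]
    foreach ($row in $Rows) {
        # Read each value through the column name listed in the template
        [pscustomobject]@{
            PRODUCT = $map.Product
            BRCH    = $row.($map.BRCH)
            ACCT    = $row.($map.ACCT)
            SHRT    = $row.($map.SHRT)
        }
    }
}

# Made-up sample row standing in for Import-Csv output
$loanRows = 'LNBRCH,LNNOTE,LNSHRT', '12,9001,SMITH' | ConvertFrom-Csv
$result = Convert-ToCommonShape -FileName 'BW_LN_2019.csv' -Rows $loanRows
$result | ConvertTo-Csv -NoTypeInformation
```

Because every file type ends up with the same four property names, the output can be piped straight to Export-Csv with -Append, and supporting a fifth file type only means adding one line to the table.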

Related

Powershell Piping Variable to Export-CSV not giving the data I'm looking for

In this script I'm getting a collection of CSV files, performing a replace, storing in an empty array and attempting to export it to CSV.
$CSVFiles = Get-ChildItem "C:\GALIC\Test\Test2\WindowsLists\*.csv" -Exclude M*
$AllJobsList = $CSVFiles | ForEach { (Import-CSV $_ -Delimiter ',' | Select 'Agent', 'Name', 'Folder' | Where-Object {$_.Agent -like "*AGENTGROUP*"})}
$UpdatedGroupsList = @()
$AllJobsList | Export-Csv -Path "C:\GALIC\Test\Test2\WindowsLists\FullJobs-Test.csv" -NoTypeInformation -Force
$CSVContent = Get-Content "C:\GALIC\Test\Test2\WindowsLists\FullJobs-Test.csv"
foreach($line in $CSVContent)
{
if($line.Contains('|') -and $line.Contains('HOSTG'))
{
#Write-Host $line
$null = $line.Replace('|', '').Replace('HOSTG', '')
#Write-Host $LineReplace
$UpdatedGroupsList += $line
}
}
$UpdatedGroupsList | Export-CSV -Path "C:\GALIC\Test\Test2\WindowsLists\UpdatedFullJobs.csv" -NoTypeInformation -Force
($CSVContent on down is what's giving me issues.)
After opening the CSV file, the content looks nothing like what I'm expecting. Any ideas/suggestions?
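One likely contributor, offered as a guess from the code alone since the screenshot is not available: Export-Csv writes the properties of each input object, and the only property of a string is Length, so exporting an array of text lines produces a column of numbers. Separately, String.Replace returns a new string, so assigning its result to $null discards the change. A minimal sketch with made-up sample lines:

```powershell
# Made-up lines standing in for the content of FullJobs-Test.csv
$lines = '"AGENTGROUP|HOSTG01","JobA","Folder1"', '"AGENTGROUP02","JobB","Folder2"'

# Strings expose only Length, so this is what Export-Csv would write for them
$asCsv = $lines | ConvertTo-Csv -NoTypeInformation
$asCsv[0]    # the header line names the Length property

# Keep the result of Replace instead of discarding it
$updated = foreach ($line in $lines) {
    if ($line.Contains('|') -and $line.Contains('HOSTG')) {
        $line.Replace('|', '').Replace('HOSTG', '')
    }
}
$updated
```

Since `$updated` holds plain text lines, Set-Content (or Out-File) would be the matching way to save them. Export-Csv fits when the items are objects, for example the rows returned by Import-Csv.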

Powershell - Combine CSV files and append a column

I'm trying (badly) to work through combining CSV files into one file and prepending a column that contains the file name. I'm new to PowerShell, so hopefully someone can help here.
I tried initially to do the well documented approach of using Import-Csv / Export-Csv, but I don't see any options to add columns.
Get-ChildItem -Filter *.csv | Select-Object -ExpandProperty FullName | Import-Csv | Export-Csv CombinedFile.txt -UseQuotes Never -NoTypeInformation -Append
Next I'm trying to loop through the files and append the name, which kind of works, but for some reason this stops after the first row is generated. Since it's not a CSV process, I have to use the switch to skip the first title row of each file.
$getFirstLine = $true
Get-ChildItem -Filter *.csv | Where-Object {$_.Name -NotMatch "Combined.csv"} | foreach {
$filePath = $_
$collection = Get-Content $filePath
foreach($lines in $collection) {
$lines = ($_.Basename + ";" + $lines)
}
$linesToWrite = switch($getFirstLine) {
$true {$lines}
$false {$lines | Select -Skip 1}
}
$getFirstLine = $false
Add-Content "Combined.csv" $linesToWrite
}
This is where the -PipelineVariable parameter comes in real handy. You can set a variable to represent the current iteration in the pipeline, so you can do things like this:
Get-ChildItem -Filter *.csv -PipelineVariable File | Where-Object {$_.Name -NotMatch "Combined.csv"} | ForEach-Object { Import-Csv $File.FullName } | Select *,@{l='OriginalFile';e={$File.Name}} | Export-Csv Combined.csv -Notypeinfo
Merging your CSVs into one and adding a column for the file's name can be done as follows, using a calculated property on Select-Object:
Get-ChildItem -Filter *.csv | ForEach-Object {
$fileName = $_.Name
Import-Csv $_.FullName | Select-Object @{
Name = 'FileName'
Expression = { $fileName }
}, *
} | Export-Csv path/to/merged.csv -NoTypeInformation
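As for why the loop in the question seemed to stop after one row: assigning to `$lines` inside `foreach` overwrites the same variable on every pass, so only the last prefixed line survives. If a text-based merge is still preferred, a minimal sketch could look like the following. The scratch folder, file names, and sample rows are made up, and the output is written to a .txt file so that it is never picked up as input.

```powershell
# Scratch folder with two made-up input files
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -Path (Join-Path $dir 'a.csv') -Value 'id;name', '1;ann'
Set-Content -Path (Join-Path $dir 'b.csv') -Value 'id;name', '2;bob'

$combined = Join-Path $dir 'Combined.txt'
$first = $true
$out = foreach ($file in Get-ChildItem -Path $dir -Filter *.csv | Sort-Object Name) {
    $lines = @(Get-Content $file.FullName)
    if ($lines.Count -eq 0) { continue }          # skip empty files
    if ($first) {
        'FileName;' + $lines[0]                   # header row, emitted once
        $first = $false
    }
    # Emit every prefixed data line instead of overwriting one variable
    $lines | Select-Object -Skip 1 | ForEach-Object { $file.BaseName + ';' + $_ }
}
$out | Set-Content -Path $combined
Get-Content $combined
```

The Import-Csv approaches above remain the safer choice when fields can contain the delimiter or quoted line breaks, because this version treats each line as opaque text.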

Memory exception while filtering large CSV files

I'm getting a memory exception while running this code. Is there a way to filter one file at a time, write the output, and append to it after processing each file? It seems the code below loads everything into memory.
$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"
Get-ChildItem $inputFolder -File -Filter '*.csv' |
ForEach-Object { Import-Csv $_.FullName } |
Where-Object { $_.machine_type -eq 'workstations' } |
Export-Csv $outputFile -NoType
Maybe you can filter and export your files one by one, appending each result to your output file, like this:
$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"
Remove-Item $outputFile -Force -ErrorAction SilentlyContinue
Get-ChildItem $inputFolder -Filter "*.csv" -file | %{import-csv $_.FullName | where machine_type -eq 'workstations' | export-csv $outputFile -Append -notype }
Note: The reason for not piping Get-ChildItem directly to Import-Csv, and instead calling Import-Csv from the script block ({ ... }) of an auxiliary ForEach-Object call, is a bug in Windows PowerShell that has since been fixed in PowerShell Core; see the bottom section for a more concise workaround.
However, even output from ForEach-Object script blocks should stream to the remaining pipeline commands, so you shouldn't run out of memory - after all, a salient feature of the PowerShell pipeline is object-by-object processing, which keeps memory use constant, irrespective of the size of the (streaming) input collection.
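The object-by-object behavior described above can be observed directly. This is only an illustrative sketch: the numbers 1..3 stand in for CSV rows, and the `$log` list is a made-up helper that records the order in which each pipeline stage runs.

```powershell
# Records the order in which the pipeline stages see each item
$log = New-Object 'System.Collections.Generic.List[string]'

1..3 |
    ForEach-Object { $log.Add("read $_"); $_ } |
    Where-Object { $_ -ne 2 } |
    ForEach-Object { $log.Add("write $_") }

$log -join ', '   # read 1, write 1, read 2, read 3, write 3
```

Item 1 is written before item 2 is even read, which is why a streaming pipeline normally does not need to hold the whole input in memory. Assigning the pipeline output to a variable, or wrapping a stage in parentheses, collects everything first and loses that property.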
You've since confirmed that avoiding the aux. ForEach-Object call does not solve the problem, so we still don't know what causes your out-of-memory exception.
Update:
This GitHub issue contains clues as to the reason for excessive memory use, especially with many properties that contain small amounts of data.
This GitHub feature request proposes using strongly typed output objects to help the issue.
The following workaround, which uses the switch statement to process the files as text files, may help:
$header = ''
Get-ChildItem $inputFolder -Filter *.csv | ForEach-Object {
$i = 0
switch -Wildcard -File $_.FullName {
'*workstations*' {
# NOTE: If no other columns contain the word `workstations`, you can
# simplify and speed up the command by omitting the `ConvertFrom-Csv` call
# (you can make the wildcard matching more robust with something
# like '*,workstations,*')
if ((ConvertFrom-Csv "$header`n$_").machine_type -ne 'workstations') { continue }
$_ # row whose 'machine_type' column value equals 'workstations'
}
default {
if ($i++ -eq 0) {
if ($header) { continue } # header already written
else { $header = $_; $_ } # header row of 1st file
}
}
}
} | Set-Content $outputFile
Here's a workaround for the bug of not being able to pipe Get-ChildItem output directly to Import-Csv, by passing it as an argument instead:
Import-Csv -LiteralPath (Get-ChildItem $inputFolder -File -Filter *.csv) |
Where-Object { $_.machine_type -eq 'workstations' } |
Export-Csv $outputFile -NoType
Note that in PowerShell Core you could more naturally write:
Get-ChildItem $inputFolder -File -Filter *.csv | Import-Csv |
Where-Object { $_.machine_type -eq 'workstations' } |
Export-Csv $outputFile -NoType
Solution 2 :
$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"
$encoding = [System.Text.Encoding]::UTF8 # modify encoding if necessary
$Delimiter=','
# Find the header for your files: take the first row of the first non-empty file
$Header = Get-ChildItem -Path $inputFolder -Filter *.csv | Where length -gt 0 | select -First 1 | Get-Content -TotalCount 1
# If no header was found, there is no file with size > 0, so quit
if(! $Header) {return}
# Create an array of column names from the header
$HeaderArray=$Header -split $Delimiter -replace '"', ''
# Open the output file
$w = New-Object System.IO.StreamWriter($outputfile, $true, $encoding)
# Write the header that was found
$w.WriteLine($Header)
# Loop over the CSV files
Get-ChildItem $inputFolder -File -Filter "*.csv" | %{
# Open the file for reading
$r = New-Object System.IO.StreamReader($_.fullname, $encoding)
$skiprow = $true
while ($line = $r.ReadLine())
{
# Exclude the header row
if ($skiprow)
{
$skiprow = $false
continue
}
# Parse the current row into an object, using the header found earlier
$Object=$line | ConvertFrom-Csv -Header $HeaderArray -Delimiter $Delimiter
# Write the row to the output file if it meets the condition
if ($Object.machine_type -eq 'workstations') { $w.WriteLine($line) }
}
$r.Close()
$r.Dispose()
}
$w.close()
$w.Dispose()
You have to read and write to the .csv files one row at a time, using StreamReader and StreamWriter:
$filepath = "C:\Change\2019\October"
$outputfile = "C:\Change\2019\output.csv"
$encoding = [System.Text.Encoding]::UTF8
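# machine_type is a CSV column, not a file property, so the files themselves are not filtered here
$files = Get-ChildItem -Path $filePath -Filter *.csv
$w = New-Object System.IO.StreamWriter($outputfile, $true, $encoding)
$skiprow = $false
foreach ($file in $files)
{
$r = New-Object System.IO.StreamReader($file.fullname, $encoding)
$firstLine = $true
while (($line = $r.ReadLine()) -ne $null)
{
if ($firstLine)
{
# Header row: written once, from the first file only
if (!$skiprow) { $w.WriteLine($line) }
$firstLine = $false
}
elseif ($line -like '*workstations*')
{
# Simple text match; assumes 'workstations' appears only in the machine_type column
$w.WriteLine($line)
}
}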
$r.Close()
$r.Dispose()
$skiprow = $true
}
$w.close()
$w.Dispose()
get-content *.csv | add-content combined.csv
Make sure combined.csv doesn't exist when you run this, or it's going to go full Ouroboros.
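The self-inclusion problem can also be avoided by excluding the output file by name instead of relying on it not existing. A minimal sketch, with a made-up scratch folder and one-column sample files:

```powershell
# Scratch folder with two made-up files
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -Path (Join-Path $dir 'a.csv') -Value 'h', '1'
Set-Content -Path (Join-Path $dir 'b.csv') -Value 'h', '2'

$target = Join-Path $dir 'combined.csv'

# The Where-Object stage keeps the output file from ever being read as input
Get-ChildItem -Path $dir -Filter *.csv |
    Where-Object { $_.Name -ne 'combined.csv' } |
    Sort-Object Name |
    ForEach-Object { Get-Content $_.FullName } |
    Add-Content -Path $target

Get-Content $target
```

Like the one-liner above, this copies every line, so each file's header row is repeated in the result. Running it a second time would append the same rows again, though it would no longer feed combined.csv back into itself.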

How do I process multiple CSV files in Powershell and give them output names?

So I'm trying to process CSV files and then give the output a new name. I can do it with one file by explicitly specifying the file name, but is there a way (a wildcard, say) to make the script process multiple files at once? Let's say I want to process anything with a .csv extension. Here's the script I use to process a specific file:
$objs =@();
$output = Import-csv -Path D:\TEP\FilesProcessing\Test\file1.csv | ForEach {
$Object = New-Object PSObject -Property @{
Time = $_.READ_DTTM
Value = $_.{VALUE(KWH)}
Tag = [String]::Concat($_.SUBSTATION,'_',$_.CIRCUITNAME,'_',$_.PHASE,'_',$_.METERID,'_KWH')
}
$objs += $Object;
}
$objs
$objs | Export-CSv -NoTypeInformation D:\TEP\FilesProcessing\Test\file1_out.csv
You can combine Get-ChildItem and Import-Csv.
Here's an example that specifies different input and output directories to avoid name collisions:
$inputPath = "D:\TEP\FilesProcessing\Test"
$outputPath = "D:\TEP\FilesProcessing\Output"
Get-ChildItem (Join-Path $inputPath "*.csv") | ForEach-Object {
$outputFilename = Join-Path $outputPath $_.Name
Import-Csv $_.FullName | ForEach-Object {
New-Object PSObject -Property @{
"Time" = $_.READ_DTTM
"Value" = $_.{VALUE(KWH)}
"Tag" = "{0}_{1}_{2}_{3}_KWH" -f $_.SUBSTATION,$_.CIRCUITNAME,$_.PHASE,$_.METERID
}
} | Export-Csv $outputFilename -NoTypeInformation
}
Note that there's no need for creating an array and repeatedly appending it. Just output the custom objects you want and export afterwards.
Use the Get-Childitem and cut out all the unnecessary intermediate variables so that you code it in a more Powershell type way. Something like this:
Get-ChildItem 'D:\TEP\FilesProcessing\Test\*.csv' | % {
Import-csv $_.FullName | % {
New-Object PSObject -Property @{
Time = $_.READ_DTTM
Value = $_.{VALUE(KWH)}
Tag = '{0}_{1}_{2}_{3}_KWH' -f $_.SUBSTATION, $_.CIRCUITNAME, $_.PHASE, $_.METERID
}
} | Export-CSv ($_.FullName -replace '\.csv$', '_out.csv') -NoTypeInformation
}
The Get-ChildItem is very useful for situations like this.
You can add wildcards directly into the path:
Get-ChildItem -Path D:\TEP\FilesProcessing\Test\*.csv
You can recurse a path and filter by name with -Include (the -Filter parameter does the same at the provider level and is usually faster):
Get-ChildItem -Path D:\TEP\FilesProcessing\Test\ -recurse -include *.csv
This should get you what you need.
$Props = @(
@{ Name = 'Time'; Expression = { [datetime]::Parse($_.READ_DTTM) } }
@{ Name = 'Value'; Expression = { $_.{VALUE(KWH)} } }
@{ Name = 'Tag'; Expression = { $_.SUBSTATION,$_.CIRCUITNAME,$_.PHASE,$_.METERID,'KWH' -join "_" } }
)
$data = Get-ChildItem -Path D:\TEP\FilesProcessing\Test\*.csv | Foreach-Object {Import-CSV -Path $_.FullName}
$data | Select-Object -Property $Props | Export-CSv -NoTypeInformation D:\TEP\FilesProcessing\Test\file1_out.csv
Also when using Powershell avoid doing these things:
$objs =@();
$objs += $Object;
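The reason is that PowerShell arrays have a fixed size, so every `+=` allocates a new array and copies the existing elements into it, which gets slow as the collection grows. A short sketch of two common alternatives; the loop over 1..5 and the `N` property are placeholders for real rows:

```powershell
# Pattern to avoid: each += creates a new array and copies the old elements
$objs = @()
foreach ($n in 1..5) { $objs += [pscustomobject]@{ N = $n } }

# Alternative 1: let the loop emit objects and capture them in one assignment
$fromLoop = foreach ($n in 1..5) { [pscustomobject]@{ N = $n } }

# Alternative 2: a growable list, handy when items are added from several places
$list = New-Object 'System.Collections.Generic.List[object]'
foreach ($n in 1..5) { $list.Add([pscustomobject]@{ N = $n }) }

"{0} {1} {2}" -f $objs.Count, $fromLoop.Count, $list.Count
```

All three end up holding the same five objects; only the cost of building the collection differs.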

Comparing csv files with -like in Powershell

I have two csv files, each that contain a PATH column. For example:
CSV1.csv
PATH,Data,NF
\\server1\folderA,1,1
\\server1\folderB,1,1
\\server2\folderA,1,1
\\server2\folderB,1,1
CSV2.csv
PATH,User,Access,Size
\\server1\folderA\file1,don,1
\\server1\folderA\file2,don,1
\\server1\folderA\file3,sue,1
\\server2\folderB\file1,don,1
What I'm attempting to do is create a script that will result in separate csv exports based on the paths in CSV1 such that the new files contain file values from CSV2 that match. For example, from the above, I'd end up with 2 results:
result1.csv
\\server1\folderA\file1,don,1
\\server1\folderA\file2,don,1
\\server1\folderA\file3,sue,1
result2.csv
\\server2\folderB\file1,don,1
Previously I've used a script like this when the two values match exactly:
$reportfile = import-csv $apireportoutputfile -delimiter ';' -encoding unicode
$masterlist = import-csv $pathlistfile
foreach ($record in $masterlist)
{
$path=$record.Path
$filename = $path -replace '\\','_'
$filename = '.\Working\sharefiles\' + $filename + '.csv'
$reportfile | where-object {$_.path -eq $path} | select FilePath,UserName,LastAccessDate,LogicalSize | export-csv -path $filename
write-host " Creating files list for $path" -foregroundcolor red -backgroundcolor white
}
However, since the two path values are not the same, it returns nothing. I found the -like operator but am not sure how to use it in this code to get the results I want. where-object is a filter, while -like returns true/false. Am I on the right track? Any ideas for a solution?
Something like this, maybe?
$ht = @{}
Import-Csv csv1.csv |
foreach { $ht[$_.path] = New-Object collections.arraylist }
Import-Csv csv2.csv |
foreach {
$path = $_.path | Split-Path -Parent
$ht[$path].Add($_) > $null
}
$i=1
$ht.Values |
foreach { if ($_.count)
{
$_ | Export-Csv "result$i.csv" -NoTypeInformation
$i++
}
}
My suggestion:
$1=ipcsv .\csv1.CSV
$2=ipcsv .\csv2.CSV
$equal = diff ($2|select @{n='PATH';e={Split-Path $_.PATH}}) $1 -Property PATH -IncludeEqual -ExcludeDifferent -PassThru
0..(-1 + $equal.Count) | %{%{$i = $_}{
$2 | ?{ (Split-Path $_.PATH) -eq $equal[$i].PATH } | epcsv ".\Result$i.CSV"
}}
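To answer the question as asked: a true/false operator is exactly what Where-Object needs, so -like can replace -eq directly in the filter. The following is a minimal sketch with made-up sample rows (including an extra folderAB row to show the match stops at the folder boundary); the `$groups` variable and its `Folder`/`Files` properties are illustrative names, not part of either answer.

```powershell
# Made-up sample data standing in for Import-Csv csv1.csv / csv2.csv
$csv1 = 'PATH,Data,NF', '\\server1\folderA,1,1', '\\server2\folderB,1,1' | ConvertFrom-Csv
$csv2 = 'PATH,User,Access',
        '\\server1\folderA\file1,don,1',
        '\\server1\folderA\file3,sue,1',
        '\\server2\folderB\file1,don,1',
        '\\server1\folderAB\file9,don,1' | ConvertFrom-Csv

$groups = foreach ($record in $csv1) {
    # Escape wildcard characters such as [ ] in the folder path,
    # then append \* so that only entries below that folder match
    $pattern = [System.Management.Automation.WildcardPattern]::Escape($record.PATH) + '\*'
    $matching = @($csv2 | Where-Object { $_.PATH -like $pattern })
    [pscustomobject]@{ Folder = $record.PATH; Files = $matching }
}

$groups | ForEach-Object { '{0}: {1} file(s)' -f $_.Folder, $_.Files.Count }
```

In the original script, the same filter would replace `{$_.path -eq $path}`, with the matching rows piped on to Export-Csv. Including the trailing backslash in the pattern is what keeps folderAB from being counted under folderA.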