I'm trying to merge CSV files in PowerShell. I've read numerous answers here, but I'm stuck on this problem.
I have a list of CSV files, with two difficulties:
[A] Each file has a metadata line; the headers are in the second line.
[B] Each file has the same structure, but sometimes quotes surround a column to escape its content.
Thanks to this question: Merging multiple CSV files into one using PowerShell,
I'm able to solve these two problems individually.
However, I'm stuck at combining the solutions.
Partial solution A
Skips every metadata line as well as header for subsequent files
Adapting the answer from kemiller2002:
$sourcefilefolderPath = "C:\CSV_folder"
$destinationfilePath = "C:\appended_files.csv"
$getHeader = $true
Get-ChildItem -Path $sourcefilefolderPath -Filter *.csv -Recurse| foreach {
$filePath = $_.FullName
$lines = Get-Content $filePath
$linesToWrite = switch($getHeader) {
$true {$lines | Select -Skip 1} # skips only the metadata line
$false {$lines | Select -Skip 2} # skips both the metadata line as well as headers
}
$getHeader = $false
Add-Content $destinationfilePath $linesToWrite
}
The problem: Import-Csv $destinationfilePath gives inconsistent results, as the quoting can differ between source files.
Partial solution B
handles successfully random quoted columns
Solution provided by stinkyfriend.
Import-Csv seems to import the data gracefully when the column quoting, however different from one column to the other, is consistent for each line of the source file.
I could not combine this solution with the one above.
Get-ChildItem -Path $sourcefilefolderPath -File -Filter *.csv -Recurse |
Select-Object -ExpandProperty FullName |
Import-Csv |
Export-Csv $destinationfilePath -NoTypeInformation -Append
Thanks a lot for your help !
Solution C
produces blank file on my PC
using suggestion from Mathias R. Jessen
Get-ChildItem -Path $sourcefilefolderPath -File -Filter *.csv -Recurse | foreach {
Write-Host $_.FullName |
Get-Content $_.FullName | Select-Object -Skip 1 | ConvertFrom-Csv |
Export-Csv $destinationfilePath -NoTypeInformation -Append
}
--- EDIT ---
RESULT
I could solve the problem by creating appended_files.csv using the first matching source file and then append to it.
$pattern_sourceFile = "*.csv*"
$list_files = Get-ChildItem -Path $sourcefilefolderPath -File -Recurse | Where {
$_.FullName -like $pattern_sourceFile } # -like, because "*.csv*" is a wildcard pattern, not a regex
Get-Content $list_files[0].FullName |
Select-Object -Skip 1 | # skips metadata line
ConvertFrom-Csv | Export-Csv $destinationfilePath -NoTypeInformation
$list_files |
Select-Object -Skip 1 | # skips $list_files[0]
foreach { Get-Content $_.FullName |
Select-Object -Skip 1 | # skips metadata line
ConvertFrom-Csv |
Export-Csv $destinationfilePath -NoTypeInformation -Append }
Use ConvertFrom-Csv instead of Import-Csv; this way you can still control how many lines to skip:
Get-Content $file |Select -Skip 1 |ConvertFrom-Csv
So you'll end up with something like:
$sourcefilefolderPath = "C:\CSV_folder"
$destinationfilePath = "C:\appended_files.csv"
Get-ChildItem -Path $sourcefilefolderPath -Filter *.csv -Recurse | foreach {
Get-Content $_.FullName |Select-Object -Skip 1 |ConvertFrom-Csv |Export-Csv -Path $destinationfilePath -NoTypeInformation -Append
}
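To see the approach end to end, here is a minimal sketch that builds two throwaway files (one unquoted, one quoted, each with a metadata line) and merges them. The scratch folder, file names and sample rows are made up for the demonstration; only the Get-Content / Select-Object -Skip 1 / ConvertFrom-Csv / Export-Csv -Append chain comes from the answer above.

```powershell
# Hypothetical scratch locations, used only for this demonstration.
$tmp        = [System.IO.Path]::GetTempPath()
$demoFolder = Join-Path $tmp ("csvmerge_" + [guid]::NewGuid().ToString('N'))
$merged     = Join-Path $tmp ("merged_" + [guid]::NewGuid().ToString('N') + ".csv")
New-Item -ItemType Directory -Path $demoFolder | Out-Null

# File 1: metadata line, header line, unquoted values.
Set-Content -Path (Join-Path $demoFolder 'a.csv') -Value @(
    'exported 2020-01-01'
    'id,name'
    '1,alpha'
)
# File 2: same structure, but every field is quoted.
Set-Content -Path (Join-Path $demoFolder 'b.csv') -Value @(
    'exported 2020-01-02'
    '"id","name"'
    '"2","beta, with comma"'
)

Get-ChildItem -Path $demoFolder -Filter *.csv | Sort-Object Name | ForEach-Object {
    Get-Content $_.FullName |
        Select-Object -Skip 1 |   # drop the metadata line, keep the header
        ConvertFrom-Csv |         # parses quoted and unquoted fields alike
        Export-Csv -Path $merged -NoTypeInformation -Append
}

# Read the result back: one object per data row, header written only once.
$rows = @(Import-Csv $merged)
$rows | Format-Table -AutoSize

# Clean up the scratch files.
Remove-Item -Path $demoFolder -Recurse -Force
Remove-Item -Path $merged -Force
```

Because every file is parsed into objects before being exported again, the quoting style of the individual source files no longer matters in the merged output.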
Related
I'm trying (badly) to work through combining CSV files into one file and prepending a column that contains the file name. I'm new to PowerShell, so hopefully someone can help here.
I tried initially to do the well documented approach of using Import-Csv / Export-Csv, but I don't see any options to add columns.
Get-ChildItem -Filter *.csv | Select-Object -ExpandProperty FullName | Import-Csv | Export-Csv CombinedFile.txt -UseQuotes Never -NoTypeInformation -Append
Next I'm trying to loop through the files and append the name, which kind of works, but for some reason this stops after the first row is generated. Since it's not a CSV process, I have to use the switch to skip the first title row of each file.
$getFirstLine = $true
Get-ChildItem -Filter *.csv | Where-Object {$_.Name -NotMatch "Combined.csv"} | foreach {
$filePath = $_
$collection = Get-Content $filePath
foreach($lines in $collection) {
$lines = ($_.Basename + ";" + $lines)
}
$linesToWrite = switch($getFirstLine) {
$true {$lines}
$false {$lines | Select -Skip 1}
}
$getFirstLine = $false
Add-Content "Combined.csv" $linesToWrite
}
This is where the -PipelineVariable parameter comes in real handy. You can set a variable to represent the current iteration in the pipeline, so you can do things like this:
Get-ChildItem -Filter *.csv -PipelineVariable File | Where-Object {$_.Name -NotMatch "Combined.csv"} | ForEach-Object { Import-Csv $File.FullName } | Select *,@{l='OriginalFile';e={$File.Name}} | Export-Csv Combined.csv -Notypeinfo
Merging your CSVs into one and adding a column for the file's name can be done as follows, using a calculated property on Select-Object:
Get-ChildItem -Filter *.csv | ForEach-Object {
$fileName = $_.Name
Import-Csv $_.FullName | Select-Object @{
Name = 'FileName'
Expression = { $fileName }
}, *
} | Export-Csv path/to/merged.csv -NoTypeInformation
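A small self-contained run of the calculated-property approach may help. The folder, the file names (north.csv, south.csv) and their contents are invented for the example; the result is kept in a variable instead of being exported so it can be inspected directly.

```powershell
# Hypothetical scratch folder with two tiny CSV files.
$demoFolder = Join-Path ([System.IO.Path]::GetTempPath()) ("csvname_" + [guid]::NewGuid().ToString('N'))
New-Item -ItemType Directory -Path $demoFolder | Out-Null
Set-Content -Path (Join-Path $demoFolder 'north.csv') -Value 'id,qty', '1,10'
Set-Content -Path (Join-Path $demoFolder 'south.csv') -Value 'id,qty', '2,20'

$merged = @(
    Get-ChildItem -Path $demoFolder -Filter *.csv | Sort-Object Name | ForEach-Object {
        # Capture the name first: inside the calculated property, $_ refers
        # to the CSV row, not to the file.
        $fileName = $_.Name
        Import-Csv $_.FullName | Select-Object @{
            Name       = 'FileName'
            Expression = { $fileName }
        }, *
    }
)

$merged | Format-Table -AutoSize

# Clean up the scratch folder.
Remove-Item -Path $demoFolder -Recurse -Force
```

Putting the calculated property before `*` makes FileName the first column; swapping the two would append it as the last column instead.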
I have a task that requires scanning the properties of all files in certain directories. My code needs to read the following line of information, separated by the delimiter ",", from a .txt file (I made up the directories on my own device and created some blank .xlsx files to test my code):
Jakarta,C:\\temp\Hfolder,C:\temp\Lfolder
I currently have code that looks like this:
$LocContent = Import-Csv "C:\temp\Location.txt" # -Header $fileHeaders
ForEach($line in $LocContent){C:\temp\test1.csv -NoTypeInformation
#split fields into values
$line = $LocContent -split (",")
$country = $line[0]
$hDrivePath = $line[1]
$lDrivePath = $line[2]
Get-ChildItem $hDrivePath -force -include *.xlsx, *.accdb, *.accde, *.accdt, *.accdr -Recurse
Get-ChildItem $lDrivePath -force -include *.xlsx, *.accdb, *.accde, *.accdt, *.accdr -Recurse
? {
$_.LastWriteTime -gt (Get-Date).AddDays(-5)
}
Select-Object -Property Name, Directory, @{Name="Owner";Expression={(Get-ACL $_.Fullname).Owner}}, CreationTime, LastAccessTime, @{N="Location";E={$country}}, @{N='size in MB';E={$_.Length/1024kb}} | Export-Csv
}
However there is no output on the .csv file I assigned to output the information. What is wrong in my code?
Thanks!
There are several flaws within your code:
The Select-Object has neither an -InputObject nor anything piped to it, so there can't be any output.
You should decide whether you treat C:\temp\Location.txt as
a text file with Get-Content and a split,
or as a CSV with headers,
or as a CSV without headers, supplying them to the import.
The Get-ChildItem output isn't piped anywhere nor stored in a variable so it goes to the screen.
Export-Csv needs a file name to export to.
Try this untested script:
## Q:\Test\2018\06\26\SO_51038180.ps1
$fileHeaders = @('country','hDrivePath','lDrivePath')
$extensions = @('*.xlsx','*.accdb','*.accde','*.accdt','*.accdr')
$LocContent = Import-Csv "C:\temp\Location.txt" -Header $fileHeaders
$NewData = ForEach($Row in $LocContent){
Get-ChildItem $Row.hDrivePath,$Row.lDrivePath -Force -Include $extensions -Recurse |
Where-Object LastWriteTime -gt (Get-Date).AddDays(-5) |
Select-Object -Property Name,
Directory,
@{Name="Owner";Expression={(Get-ACL $_.Fullname).Owner}},
CreationTime,
LastAccessTime,
@{N="Location";E={$Row.country}},
@{N='size in MB';E={$_.Length/1024kb}}
}
# choose what to do with the result: uncomment the desired line
$NewData | Format-Table -Auto
# $NewData | Out-Gridview
# $NewData | Export-Csv '.\NewData.csv' -NoTypeInformation
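The difference between the ways of reading Location.txt listed above can be tried in isolation. This is only a sketch: the temp file stands in for C:\temp\Location.txt, and the header names are the ones used in the script above.

```powershell
# Hypothetical stand-in for C:\temp\Location.txt: one line, no header row.
$locFile = Join-Path ([System.IO.Path]::GetTempPath()) ("Location_" + [guid]::NewGuid().ToString('N') + ".txt")
Set-Content -Path $locFile -Value 'Jakarta,C:\temp\Hfolder,C:\temp\Lfolder'

# Option 1: plain text plus a manual split, giving an array of strings.
$fields = (Get-Content -Path $locFile) -split ','

# Option 2: headerless CSV with the column names supplied on import,
# giving objects with named properties.
$rows = @(Import-Csv -Path $locFile -Header 'country', 'hDrivePath', 'lDrivePath')

"split : {0} | {1} | {2}" -f $fields[0], $fields[1], $fields[2]
"import: {0} | {1} | {2}" -f $rows[0].country, $rows[0].hDrivePath, $rows[0].lDrivePath

# Without -Header, Import-Csv would treat that single line as the header
# row and return no data rows at all.
$noHeader = @(Import-Csv -Path $locFile)

Remove-Item -Path $locFile -Force
```

Mixing the two styles, as the question does by calling Import-Csv and then splitting the result on ",", is what leaves the path variables without usable values.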
I wish to search for specific files listed in searchFiles and pipe their locations to TestFileLocation.CSV. However, my current script only generates an empty CSV. What am I missing?
My TestFindFile.csv is of the form:
Name
123.pdf
321.pdf
aaa.pdf
SNIPPET
$searchFiles = Import-CSV 'C:\Data\SCRIPTS\PS1\TestFindFile.csv' -Header ("Name")
$source = 'C:\Data'
ForEach($File in $searchFiles)
{
Get-ChildItem $source -Filter $File -rec | where {!$_.PSIsContainer} | select-object FullName | export-csv -notypeinformation -delimiter '|' -path c:\data\scripts\ps1\TestFileLocation.csv
}
You were overwriting the CSV for each iteration of the loop.
$searchFiles = Import-CSV 'C:\Data\SCRIPTS\PS1\TestFindFile.csv' -Header ("Name")
$source = 'C:\Data'
$outputPath = 'c:\data\scripts\ps1\TestFileLocation.csv'
$searchFiles | ForEach-Object {
# SilentlyContinue is used to ignore errors such as
# not being able to read paths which are too long
Get-ChildItem $source -Filter $_.Name -rec -ErrorAction SilentlyContinue | where {!$_.PSIsContainer} | select-object FullName
} | export-csv -notypeinformation -delimiter '|' -path $outputPath
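The overwriting effect is easy to reproduce without touching the file system search at all. In this sketch the three names and the two temp output files are made up; the only point is where Export-Csv sits relative to the loop.

```powershell
$names   = 'a.pdf', 'b.pdf', 'c.pdf'   # hypothetical search results
$tmp     = [System.IO.Path]::GetTempPath()
$inside  = Join-Path $tmp ("inside_"  + [guid]::NewGuid().ToString('N') + ".csv")
$outside = Join-Path $tmp ("outside_" + [guid]::NewGuid().ToString('N') + ".csv")

# Export-Csv inside the loop: every iteration replaces the previous file.
foreach ($n in $names) {
    [pscustomobject]@{ FullName = $n } |
        Export-Csv -Path $inside -NoTypeInformation -Delimiter '|'
}

# Export-Csv after the loop: the whole stream is written in one go.
$names | ForEach-Object {
    [pscustomobject]@{ FullName = $_ }
} | Export-Csv -Path $outside -NoTypeInformation -Delimiter '|'

$insideRows  = @(Import-Csv -Path $inside  -Delimiter '|')
$outsideRows = @(Import-Csv -Path $outside -Delimiter '|')
"inside the loop : {0} row(s)" -f $insideRows.Count
"after the loop  : {0} row(s)" -f $outsideRows.Count

Remove-Item -Path $inside, $outside -Force
```

If the last iteration of the original loop finds nothing, the file written by it is empty, which matches the symptom in the question. Adding -Append inside the loop would be another way around it, at the cost of reopening the file on every iteration.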
Example using AlphaFS
A comment asked for an example using AlphaFS because it claims to overcome the long path issue. I'm not going into all the details, but here is how I got it to work.
# download and unzip to C:\AlphaFS
# dir C:\AlphaFS\* -Recurse -File | Unblock-File
[System.Reflection.Assembly]::LoadFrom('C:\AlphaFS\lib\net451\AlphaFS.dll')
$searchFiles = Import-CSV 'C:\Data\SCRIPTS\PS1\TestFindFile.csv' -Header ("Name")
$source = 'C:\Data'
$outputPath = 'c:\data\scripts\ps1\TestFileLocation.csv'
$searchFiles | ForEach-Object {
$files = [Alphaleonis.Win32.Filesystem.Directory]::EnumerateFiles($source,'*',[System.IO.SearchOption]::AllDirectories)
$files | ForEach-Object { [PSCustomObject] @{FileName = $_} }
} | export-csv -notypeinformation -delimiter '|' -path $outputPath
# type $outputPath
If your .csv file already contains the header "Name", there is no need to declare it again when running Import-Csv.
The output is empty because -Filter receives the whole imported object (which merely has a Name property) rather than the file name string. Search for $File.Name instead. Also move commands that don't need to be inside the loop outside of it:
$searchFiles | Select -ExpandProperty Name | % {
Get-ChildItem $source -Filter $_ -Recurse | where {!$_.PSIsContainer}
} | select-object FullName | export-csv -notypeinformation -delimiter '|' -path c:\data\scripts\ps1\TestFileLocation.csv
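What -Filter actually receives can be checked with a row built the same way Import-Csv builds it. The sample data below is invented; ConvertFrom-Csv is used only so that no file is needed.

```powershell
# One row shaped like the rows Import-Csv returns for TestFindFile.csv.
$row = 'Name', '123.pdf' | ConvertFrom-Csv

# Passing the object where a string is expected converts the whole object
# to text (something like "@{Name=123.pdf}"), which matches no file.
$asFilter = [string]$row

# Expanding the property yields the plain file name.
$asName = $row.Name

"object as string : $asFilter"
"property value   : $asName"
```

Select-Object -ExpandProperty Name, as used in the answer above, does the same expansion for the whole list before the loop starts.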
I need to take a slew of csv files from a directory and get them into an array in Powershell (to eventually manipulate and write back to a CSV).
The problem is there are 5 file types. I need around 8 columns from each. The columns are essentially the same, but have different headings.
Is there an easy way to do this? I started creating a custom object with my 8 fields, looping through the files importing each one, looking at the filename (which tells me the column names I need) and then a bunch of ifs to add it to my custom object array.
I was wondering if there is a simpler way...like with a template saying which columns from each file.
I wound up doing this. It may not have been the most efficient, but it works. I wound up writing out each file separately and combining them at the end, as PS really got bogged down (over a million rows combined).
$Newcsv = @()
$path = "c:\scrap\BWFILES\"
$files = gci -path $path -recurse -filter *.csv | Where-Object { ! ($_.psiscontainer) }
$counter=1
foreach($file in $files)
{
$csv = Import-Csv $file.FullName
if ($file.Name -like '*SAV*')
{
$Newcsv = $csv | Select-Object @{Name="PRODUCT";Expression={"SV"}},DMBRCH,DMACCT,DMSHRT
}
if ($file.Name -like '*TIME*')
{
$Newcsv = $csv | Select-Object @{Name="PRODUCT";Expression={"TM"}},TMBRCH,TMACCT,TMSHRT
}
if ($file.Name -like '*TRAN*')
{
$Newcsv = $csv | Select-Object @{Name="PRODUCT";Expression={"TR"}},DMBRCH,DMACCT,DMSHRT
}
if ($file.Name -like '*LN*')
{
$Newcsv = $csv | Select-Object @{Name="PRODUCT";Expression={"LN"}},LNBRCH,LNNOTE,LNSHRT
}
$Newcsv | Export-Csv "C:\scrap\$($file.Name)$counter.csv" -force -notypeinformation # $() is needed to expand a property inside a string
$counter++
}
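$getFirstLine = $true # must be initialized, otherwise the first file matches neither switch branch
get-childItem "c:\scrap\*.csv" | foreach {
$filePath = $_
$lines = Get-Content $filePath
$linesToWrite = switch($getFirstLine) {
$true {$lines} # first file: keep the header
$false {$lines | Select -Skip 1} # later files: drop the header
}
$getFirstLine = $false
Add-Content "c:\scrap\combined.csv" $linesToWrite
}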
With a hashtable for reference, a little RegEx matching, and using the automatic variable $Matches in a ForEach-Object loop (alias % used) that could all be shortened to:
$path = "c:\scrap\BWFILES\"
$Reference = @{
'SAV' = 'SV'
'TIME' = 'TM'
'TRAN' = 'TR'
'LN'='LN'
}
Set-Content -Value "PRODUCT,BRCH,ACCT,SHRT" -Path 'c:\scrap\combined.csv'
gci -path $path -recurse -filter *.csv | Where-Object { !($_.psiscontainer) -and $_.Name -match ".*(SAV|TIME|TRAN|LN).*"}|%{
$Product = $Reference[($Matches[1])]
Import-CSV $_.FullName | Select-Object @{Name="PRODUCT";Expression={$Product}},*BRCH,@{l='Acct';e={$_.LNNOTE, $_.DMACCT, $_.TMACCT|?{$_}}},*SHRT | ConvertTo-Csv -NoTypeInformation | Select -Skip 1 | Add-Content 'c:\scrap\combined.csv'
}
That should produce the exact same file. The only slightly tricky part was the LNNOTE/TMACCT/DMACCT field, since you can't pick it up with a wildcard the way *BRCH and *SHRT are picked up.
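The two ideas in that answer, looking up the product code through $Matches and picking whichever account column is populated, can be tried on their own. The file names and the sample row below are invented; the hashtable is the one from the answer.

```powershell
$Reference = @{
    'SAV'  = 'SV'
    'TIME' = 'TM'
    'TRAN' = 'TR'
    'LN'   = 'LN'
}

# Hypothetical file names; the last one matches none of the keywords.
$names = 'BW_SAV_2019.csv', 'BW_TIME_2019.csv', 'BW_LN_2019.csv', 'readme.csv'

$products = @(
    foreach ($name in $names) {
        # A successful -match fills the automatic $Matches variable;
        # $Matches[1] holds the text captured by the first group.
        if ($name -match '(SAV|TIME|TRAN|LN)') {
            [pscustomobject]@{ File = $name; Product = $Reference[$Matches[1]] }
        }
    }
)
$products | Format-Table -AutoSize

# Picking the one populated account field: Where-Object { $_ } drops
# values that are $null or empty and lets the remaining one through.
$row  = [pscustomobject]@{ LNNOTE = '777'; DMACCT = $null; TMACCT = '' }
$acct = $row.LNNOTE, $row.DMACCT, $row.TMACCT | Where-Object { $_ }
"Acct: $acct"
```

Note that the filter treats 0 and empty strings as "not populated" too, so this shortcut only suits columns where such values cannot be legitimate data.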
I'm looking for a way to add a header line into multiple CSV files.
The problem with the code below is that it adds an extra empty row at the end of each file. I don't understand why there is an extra empty line, but I need to get rid of those lines.
$header="Column1,Column2,Column3,Column4,Column5,Column6"
Get-ChildItem .\ -Recurse -Filter *.csv| Foreach-Object {
$header+"`r`n"+ (Get-Content $_.FullName | Out-String) |
Set-Content -Path $_.FullName
}
The canonical way would be to import the files specifying the headers and then re-export them:
$header = 'Column1', 'Column2', 'Column3', 'Column4', 'Column5', 'Column6'
Get-ChildItem .\ -Recurse -Filter '*.csv' | ForEach-Object {
$file = $_.FullName
(Import-Csv -Path $file -Header $header) | Export-Csv -Path $file -NoType
}
Export-Csv does add double quotes around all fields of the CSV, though. Also, parsing the data into objects has a performance impact. If you don't want the double quotes added or are pressed for performance, the solution suggested by @PetSerAl in the comments to your question might be a better approach for you:
$header = 'Column1,Column2,Column3,Column4,Column5,Column6'
Get-ChildItem .\ -Recurse -Filter '*.csv' | ForEach-Object {
$file = $_.FullName
@($header; Get-Content -Path $file) | Set-Content -Path $file
}
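As for where the extra empty line in the question comes from: Out-String ends its text with a newline, and Set-Content then adds one of its own. A quick check of the array-based variant on a throwaway file (the file path, header and rows are made up for the example):

```powershell
# Hypothetical headerless file with two data lines.
$file = Join-Path ([System.IO.Path]::GetTempPath()) ("noheader_" + [guid]::NewGuid().ToString('N') + ".csv")
Set-Content -Path $file -Value '1,2,3', '4,5,6'

$header = 'Column1,Column2,Column3'

# @( ... ) collects the header and all existing lines into an array first,
# so the file has been read completely before Set-Content rewrites it.
@($header; Get-Content -Path $file) | Set-Content -Path $file

$lines = @(Get-Content -Path $file)
$raw   = [System.IO.File]::ReadAllText($file)

"line count : {0}" -f $lines.Count
"first line : {0}" -f $lines[0]
"ends with a blank line: {0}" -f ($raw -match '(\r?\n){2}$')

Remove-Item -Path $file -Force
```

Each array element is written as its own line, so nothing needs to be joined with "`r`n" by hand and no second trailing newline appears.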