PowerShell script to combine multiple CSV files into a single one - powershell

I have 3 CSV files in C:\temp and am trying to combine all 3 into a single file.
F1.csv, F2.csv, F3.csv [all have unique headers and different numbers of rows and columns]. Below are the sample contents of each file.
F1.csv
F1C1 F1C2
ABC 123
F2.csv
F2C1 F2C2
DEF 456
GHI 789
JKL 101112
F3.csv
F3C1
MNO
PQR
I want the result csv file FR.csv to be like below
FR.csv
F1C1 F1C2 F2C1 F2C2 F3C1
ABC 123 DEF 456 MNO
GHI 789 PQR
JKL 101112
I tried running the script below, but FR.csv ends up with all the output in a single column.
Get-Content C:\temp\*csv | Add-Content C:\temp\FinalResult.csv

The following solutions assume that Get-ChildItem *.csv enumerates the files to merge, in the desired order (which works with input files F1.csv, F2.csv, F3.csv in the current dir).
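If the enumeration order ever needs to be made explicit, the files can be sorted up front; a minimal sketch (the variable name is illustrative, and either solution below could use it in place of its own Get-ChildItem call):
# Hedged sketch: enumerate the inputs in an explicit, name-sorted order.
$inputFiles = Get-ChildItem *.csv -Exclude FR.csv | Sort-Object Name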
Plain-text solution, using the .NET APIs System.IO.StreamReader and System.IO.StreamWriter:
This solution performs much better than the OO solution below, but the latter gives you more flexibility. Input files without a Unicode BOM are assumed to be UTF-8-encoded, and the output is saved to a BOM-less UTF-8 file named FR.csv in the current dir (the APIs used allow you to specify different encodings, if needed; see the sketch after the code below).
$outFile = 'FR.csv'
# IMPORTANT: Always use *full* paths with .NET APIs.
# Writer for the output file.
$writer = [System.IO.StreamWriter] (Join-Path $Pwd.ProviderPath $outFile)
# Readers for all input files.
$readers = [System.IO.StreamReader[]] (Get-ChildItem *.csv -Exclude $outFile).FullName
# Read all files in batches of corresponding lines, join the
# lines of each batch with ",", and save to the output file.
$isHeader = $true
while ($readers.EndOfStream -contains $false) {
    if ($isHeader) {
        $headerLines = $readers.ReadLine()
        $colCounts = $headerLines.ForEach({ ($_ -split ',').Count })
        $writer.WriteLine($headerLines -join ',')
        $isHeader = $false
    } else {
        $i = 0
        $lines = $readers.ForEach({
            if ($line = $_.ReadLine()) { $line }
            else { ',' * ($colCounts[$i] - 1) }
            ++$i
        })
        $writer.WriteLine($lines -join ',')
    }
}
$writer.Close()
$readers.Close()
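If the inputs or the output need a specific encoding (as noted above), the reader and writer constructors accept an Encoding argument; a minimal sketch, assuming hypothetical Windows-1252 inputs and UTF-8-with-BOM output:
# Hypothetical encodings for illustration; adjust to your actual files.
$inEncoding  = [System.Text.Encoding]::GetEncoding(1252)   # Windows-1252 inputs
$outEncoding = [System.Text.UTF8Encoding]::new($true)      # UTF-8 with BOM for the output
$writer  = [System.IO.StreamWriter]::new((Join-Path $Pwd.ProviderPath $outFile), $false, $outEncoding)
$readers = [System.IO.StreamReader[]] (Get-ChildItem *.csv -Exclude $outFile).ForEach({
    [System.IO.StreamReader]::new($_.FullName, $inEncoding)
})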
OO solution, using Import-Csv and ConvertTo-Csv / Export-Csv:
# Read all CSV files into an array of object arrays.
$objectsPerCsv =
    Get-ChildItem *.csv -Exclude FR.csv |
        ForEach-Object {
            , @(Import-Csv $_.FullName)
        }
# Determine the max. row count.
$maxCount = [Linq.Enumerable]::Max([int[]] $objectsPerCsv.ForEach('Count'))
# Get all column names per CSV.
$colNamesPerCsv = $objectsPerCsv.ForEach({ , $_[0].psobject.Properties.Name })
0..($maxCount-1) | ForEach-Object {
    $combinedProps = [ordered] @{}
    $row = $_; $col = 0
    $objectsPerCsv.ForEach({
        if ($object = $_[$row]) {
            foreach ($prop in $object.psobject.Properties) {
                $combinedProps.Add($prop.Name, $prop.Value)
            }
        }
        else {
            foreach ($colName in $colNamesPerCsv[$col]) {
                $combinedProps.Add($colName, $null)
            }
        }
        ++$col
    })
    [pscustomobject] $combinedProps
} | ConvertTo-Csv
Replace ConvertTo-Csv with Export-Csv to export the data to a file; use the -NoTypeInformation and -Encoding parameters as needed, e.g. ... | Export-Csv -NoTypeInformation -Encoding utf8 Merged.csv

How to make a script that merges all .txt files into one .csv file into multiple columns in Powershell

I don't know how to merge multiple .txt files with data into one .csv file, with each .txt file separated into its own columns.
This is my code so far,
$location = (Get-Location).Path
$files = Get-ChildItem $location -Filter "*.asd.txt"
$data = @()
foreach ($file in $files) {
    $fileData = Get-Content $file.FullName
    foreach ($line in $fileData) {
        $lineData = $line -split "\t"
        $data = $lineData[1]
        Add-Content -Path "$location\output.csv" -Value $data
    }
}
Each of the files looks like this:
I want to keep the first column "Wavelength" and put the second columns from all the files in the folder next to each other. Each header will start with the exact name
"stovikmladyDoupno2 2020080500001.asd" or "stovikmladyDoupno2 2020080500002.asd" and so on ....
So it should look like this:
I have been looking for information for two days and still don't know how. I tried putting "," at the end of each file, thinking Excel would handle that, but nothing helped.
Here I provide a few files as test data:
https://mega.nz/folder/zNhTzR4Z#rpc-BQdRfm3wxl87r9XUkw
A few lines of data:
Wavelength stovikmladyDoupno2 2020080500000.asd
350 6.38961399706465E-02
351 6.14107911262903E-02
352 6.04866108251357E-02
353 5.83485359067184E-02
354 0.054978792413247
355 5.27014859356317E-02
356 5.34849237528764E-02
357 5.32841277775603E-02
358 5.23466655229364E-02
359 5.47595002186027E-02
360 5.22061034631109E-02
361 4.90149806042666E-02
362 4.81633530421385E-02
363 4.83974076557941E-02
364 4.65219929658367E-02
365 0.044800930294557
366 4.47830287392802E-02
367 4.46947539436297E-02
368 0.043756926558447
369 4.31725380363072E-02
370 4.36867609723618E-02
371 4.33227601805265E-02
372 4.29978664449687E-02
373 4.23860463187361E-02
374 4.12183604375401E-02
375 4.14306521081773E-02
376 4.11760903772502E-02
377 4.06421127128478E-02
378 4.09771489689262E-02
379 4.10083126746385E-02
380 4.05161601354181E-02
381 3.97904564387456E-02
I assumed a location since I'm not fond of declaring file paths without a literal path. Please adjust path as needed.
$Files = Get-ChildItem J:\Test\*.txt -Recurse
$Filecount = 0
$ObjectCollectionArray = @()
# First parse and collect each row in an array, while keeping the datetime information from the filename.
foreach($File in $Files){
    $Filecount++
    Write-Host $Filecount
    $DateTime = $File.fullname.split(" ").split(".")[1]
    $Content = Get-Content $File.FullName
    foreach($Row in $Content){
        $Split = $Row.Split("`t")
        if($Split[0] -ne 'Wavelength'){
            $Object = [PSCustomObject]@{
                'Datetime'   = $DateTime
                'Number'     = $Split[0]
                'Wavelength' = $Split[1]
            }
            $ObjectCollectionArray += $Object
        }
    }
}
# Match by number and create a new object with relation to the number and different datetime.
$GroupedCollection = @()
$Grouped = $ObjectCollectionArray | Group-Object number
foreach($GroupedNumber in $Grouped){
    $NumberObject = [PSCustomObject]@{
        'Number' = $GroupedNumber.Name
    }
    foreach($Occurance in $GroupedNumber.Group){
        $NumberObject | Add-Member -NotePropertyName $Occurance.Datetime -NotePropertyValue $Occurance.wavelength
    }
    $GroupedCollection += $NumberObject
}
$GroupedCollection | Export-Csv -Path J:\Test\result.csv -NoClobber -NoTypeInformation
What you're looking to do is quite a hard task; there are a few ways to do it. This first method requires that all files be in memory to process them. You can definitely treat these files as TSVs, so Import-Csv -Delimiter "`t" is an option, letting you deal with objects instead of plain text.
# using this temp dictionary to create objects for each line of each tsv
$tmp = [ordered]@{}
# get all files and enumerate
$csvs = Get-ChildItem $location -Filter *.asd.txt | ForEach-Object {
    # get their content as objects
    $content = $_ | Import-Csv -Delimiter "`t"
    # get their property Name that is not `Wavelength`
    $property = $content[0].PSObject.Properties.Where{ $_.Name -ne 'Wavelength' }.Name
    # output an object holding the total lines of this csv,
    # its content and the property name of interest
    [pscustomobject]@{
        Lines    = $content.Count
        Content  = $content
        Property = $property
    }
}
# use a scriptblock to allow streaming so `Export-Csv` starts exporting as
# output is going through the pipeline
& {
    # for loop used for each line of the Tsv having the highest number of lines
    for($i = 0; $i -lt [System.Linq.Enumerable]::Max([int[]] $csvs.Lines); $i++) {
        # this boolean is used to preserve the "Wavelength" value of the first Tsv
        $isFirstCsv = $true
        foreach($csv in $csvs) {
            # if this is the first object
            if($isFirstCsv) {
                # add the value of "Wavelength"
                $tmp['Wavelength'] = $csv.Content[$i].Wavelength
                # and set the bool to false, since we are only using this once
                $isFirstCsv = $false
            }
            # then add the value of each property of each Tsv to the temp dictionary
            $tmp[$csv.Property] = $csv.Content[$i].($csv.Property)
        }
        # then output this object
        [pscustomobject] $tmp
        # clear the temp dictionary
        $tmp.Clear()
    }
} | Export-Csv path\to\result.csv -NoTypeInformation
Here is a much more efficient approach that treats the files as plain text; this method is much faster and more memory efficient, however not as reliable. It uses StreamReader to read the file contents line-by-line and a StringBuilder to construct each line.
& {
    # get all files and enumerate
    $readers = Get-ChildItem $location -Filter *.asd.txt | ForEach-Object {
        # create a stream reader for each file
        [System.IO.StreamReader] $_.FullName
    }
    # this StringBuilder is used to construct each line
    $sb = [System.Text.StringBuilder]::new()
    # while any of the readers has more content
    while($readers.EndOfStream -contains $false) {
        # signals this is our first Tsv
        $isFirstReader = $true
        # enumerate each reader
        foreach($reader in $readers) {
            # if this is the first Tsv
            if($isFirstReader) {
                # append the line as-is, only trimming excess whitespace
                $sb = $sb.Append($reader.ReadLine().Trim())
                $isFirstReader = $false
                # go to next reader
                continue
            }
            # if this is not the first Tsv,
            # split on Tab and exclude the first token (Wavelength)
            $null, $line = $reader.ReadLine().Trim() -split '\t'
            # append a Tab + this line
            $sb = $sb.Append("`t$line")
        }
        # append a new line and output the constructed string
        $sb.AppendLine().ToString()
        # and clear it for next lines
        $sb = $sb.Clear()
    }
    # dispose all readers when done
    $readers | ForEach-Object Dispose
} | Set-Content path\to\result.tsv -NoNewline

Export CSV. Folder, subfolder and file into separate column

I created a script that lists all the folders, subfolders and files and exports them to CSV:
$path = "C:\tools"
Get-ChildItem $path -Recurse |select fullname | export-csv -Path "C:\temp\output.csv" -NoTypeInformation
But I would like each folder, subfolder and file in the path to be written into a separate column of the CSV.
Something like this:
c:\tools\test\1.jpg

Column1   Column2   Column3
tools     test      1.jpg
I will be grateful for any help.
Thank you.
You can split the Fullname property using the Split() method. The tricky part is that you need to know the maximum path depth in advance, as the CSV format requires that all rows have the same number of columns (even if some columns are empty).
# Process directory $path recursively
$allItems = Get-ChildItem $path -Recurse | ForEach-Object {
    # Split on directory separator (typically '\' for Windows and '/' for Unix-like OS)
    $FullNameSplit = $_.FullName.Split( [IO.Path]::DirectorySeparatorChar )
    # Create an object that contains the split path and the path depth.
    # This is implicit output that PowerShell captures and adds to $allItems.
    [PSCustomObject] @{
        FullNameSplit = $FullNameSplit
        PathDepth     = $FullNameSplit.Count
    }
}
# Determine highest column index from maximum depth of all paths.
# Minus one, because we'll skip root path component.
$maxColumnIndex = ( $allItems | Measure-Object -Maximum PathDepth ).Maximum - 1
$allRows = foreach( $item in $allItems ) {
    # Create an ordered hashtable
    $row = [ordered]@{}
    # Add all path components to hashtable. Make sure all rows have same number of columns.
    foreach( $i in 1..$maxColumnIndex ) {
        $row[ "Column$i" ] = if( $i -lt $item.FullNameSplit.Count ) { $item.FullNameSplit[ $i ] } else { $null }
    }
    # Convert hashtable to object suitable for output to CSV.
    # This is implicit output that PowerShell captures and adds to $allRows.
    [PSCustomObject] $row
}
# Finally output to CSV file
$allRows | Export-Csv -Path "C:\temp\output.csv" -NoTypeInformation
Notes:
The syntax Select-Object @{ Name = ...; Expression = ... } creates a calculated property (see the sketch after these notes).
$allRows = foreach captures and assigns all output of the foreach loop to the variable $allRows, which will be an array if the loop outputs more than one object. This works with most other control statements as well, e.g. if and switch.
Within the loop I could have created a [PSCustomObject] directly (and used Add-Member to add properties to it) instead of first creating a hashtable and then converting to [PSCustomObject]. The chosen way should be faster, as no additional overhead for calling cmdlets is required.
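A minimal sketch of that calculated-property syntax, adding a hypothetical Depth column next to FullName (the property name is made up for the example):
# Hypothetical example: derive a 'Depth' column from FullName via a calculated property.
Get-ChildItem $path -Recurse |
    Select-Object FullName, @{ Name = 'Depth'; Expression = { $_.FullName.Split([IO.Path]::DirectorySeparatorChar).Count } }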
While a file with rows containing a variable number of items is not actually a CSV file, you can roll your own and Microsoft Excel can read it.
=== Get-DirCsv.ps1
Get-ChildItem -File |
    ForEach-Object {
        $NameParts = $_.FullName -split '\\'
        $QuotedParts = [System.Collections.ArrayList]::new()
        foreach ($NamePart in $NameParts) {
            $QuotedParts.Add('"' + $NamePart + '"') | Out-Null
        }
        Write-Output $($QuotedParts -join ',')
    }
Use this to capture the output to a file with:
.\Get-DirCsv.ps1 | Out-File -FilePath '.\dir.csv' -Encoding ascii

Stuck with this PS Script

I have a text file that contains millions of records
I want to find each line that does not start with the string, plus that line's number (the string starts with a double quote: "01/01/2019).
Can you help me modify this code?
Get-Content "(path).txt" | Foreach { if ($_.Split(',')[-1] -inotmatch "^01/01/2019") { $_; } }
Thanks
Based on your comments, the content will look something like the array below.
So you want to read the content, filter it, and get the resulting lines from that content:
# Get the content
# $content = Get-Content -Path 'pathtofile.txt'
$content = @('field1,field2,field3', '01/01/2019,b,c')
# Convert from csv
$csvContent = $content | ConvertFrom-Csv
# Add your filter based on the field
$results = $csvContent | Where-Object { $_.field1 -notmatch '01/01/2019'} | % { $_ }
# Convert your results back to csv if needed
$results | ConvertTo-Csv
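If the filtered rows should end up in a file rather than on the console, Export-Csv can stand in for ConvertTo-Csv; the output path below is just an example:
# Write the filtered rows to a new CSV file (example path).
$results | Export-Csv -Path 'C:\temp\filtered.csv' -NoTypeInformation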
If performance is an issue, then .NET can handle millions of records with CsvHelper, just like Power BI does.
# install CsvHelper
nuget install CsvHelper
# import csvhelper
Import-Module CsvHelper.2.16.3.0\lib\net45\CsvHelper.dll
# write the content to the file just for this example
@('field1,field2,field3', '01/01/2019,b,c') | Set-Content -Path "c:\temp\text.csv"
# use a resizable collection so .Add() works
$results = [System.Collections.ArrayList]::new()
# open the file for reading
try {
    $stream = [System.IO.File]::OpenRead("c:\temp\text.csv")
    $sr = [System.IO.StreamReader]::new($stream)
    $csv = [CsvHelper.CsvReader]::new($sr)
    # read in the records
    while($csv.Read()){
        # add in the result
        $result = @{}
        [string] $value = ""
        for($i = 0; $csv.TryGetField($i, [ref] $value ); $i++) {
            $result.Add($i, $value)
        }
        # add your filter here for the results
        $null = $results.Add($result)
    }
    # dispose of everything once we are done
}finally {
    $stream.Dispose()
    $sr.Dispose()
    $csv.Dispose()
}
My .txt file looks like this...
date,col2,col3
"01/01/2019 22:42:00", "column2", "column3"
"01/02/2019 22:42:00", "column2", "column3"
"01/01/2019 22:42:00", "column2", "column3"
"02/01/2019 22:42:00", "column2", "column3"
This command does exactly what you are asking...
Get-Content -Path C:\myFile.txt | ? {$_ -notmatch "01/01/2019"} | Select -Skip 1
The output is:
"01/02/2019 22:42:00", "column2", "column3"
"02/01/2019 22:42:00", "column2", "column3"
I skipped the top row. If you want to deal with particular columns, change myFile.txt to a .csv and import it.
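For example, a minimal sketch of that column-based variant, assuming the header row shown above (date,col2,col3) and an illustrative output path:
# Import the file as CSV and filter on the 'date' column instead of matching the whole line.
Import-Csv C:\myFile.csv |
    Where-Object { $_.date -notmatch '^01/01/2019' } |
    Export-Csv C:\myFile_filtered.csv -NoTypeInformation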
Looking at the question and comments, it seems you are dealing with a headerless CSV file. Because the file contains millions of records, I think using Get-Content or Import-Csv would be too slow. Using [System.IO.File]::ReadLines() would then be faster.
If indeed each line starts with a quoted date, you could use various methods of figuring out whether the line starts with "01/01/2019 or not. Here, I use the -notlike operator:
$fileIn = "D:\your_text_file_which_is_in_fact_a_CSV_file.txt"
$fileOut = "D:\your_text_file_which_is_in_fact_a_CSV_file_FILTERED.txt"
foreach ($line in [System.IO.File]::ReadLines($fileIn)) {
    if ($line -notlike '"01/01/2019*') {
        # write to a NEW file
        Add-Content -Path $fileOut -Value $line
    }
}
Update
Judging from your comment, you are apparently using an older .NET Framework, as [System.IO.File]::ReadLines() only became available in version 4.0.
In that case, the below code should work for you:
$fileIn = "D:\your_text_file_which_is_in_fact_a_CSV_file.txt"
$fileOut = "D:\your_text_file_which_is_in_fact_a_CSV_file_FILTERED.txt"
$reader = New-Object System.IO.StreamReader($fileIn)
$writer = New-Object System.IO.StreamWriter($fileOut)
while (($line = $reader.ReadLine()) -ne $null) {
    if ($line -notlike '"01/01/2019*') {
        # write to a NEW file
        $writer.WriteLine($line)
    }
}
$reader.Dispose()
$writer.Dispose()

Powershell .csv merge with column remove

Using the code below I am able to merge several .csv files in 5 seconds.
$getFirstLine = $true
get-childItem "C:\my\dir\*.csv" | foreach {
    $filePath = $_
    $lines = Get-Content $filePath
    $linesToWrite = switch($getFirstLine) {
        $true  {$lines}
        $false {$lines | Select -Skip 1}
    }
    $getFirstLine = $false
    Add-Content "C:\my\dir\output_code2.csv" $linesToWrite
}
I would like to take this one step further, preferably using piping, to remove several of the columns using a command like:
select DateAndTime,DG1_KW,DG2_KW,WT_KW,HTR1_KW,POSS_Load_KW,INV1_KW,INV2_SOC|Export-csv output_test.csv -Notypeinformation
those being the variables in the header of each file.
How would I modify this code to make this work? The idea here is that I am going to be working with hundreds up to thousands of files.
I have other code which can do this, but it is nowhere near as fast.
For instance, using 10 .csv files that are 450 KB each, the code below takes 20 seconds to process and spits out a .csv file, removing 48 of the 56 columns and leaving the variables I need. If I remove the part of the code that trims the columns, it still takes 12+ seconds.
# Directory containing csv files, include *.*
$directory = "C:\my\dir\*.*";
# Get the csv files
$csvFiles = Get-ChildItem -Path $directory -Filter *.csv;
#$content = $null;
$content = @();
# Process each file
foreach($csv in $csvFiles)
{
    $content += Import-Csv $csv;
}
# Write a datetime stamped csv file
$datetime = Get-Date -Format "yyyyMMddhhmmss";
$content |Export-Csv -Path "C:\my\dir\output_code2_$datetime.csv" -NoTypeInformation;
The code I would like to modify runs those same 10 files in 5 seconds but does not remove the 48 columns.
Any ideas, guys?
Ok, you want an example... Let's say your CSVs always look like this:
Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8,Col9,Col10
data1,data2,data3,data4,data5,data6,data7,data8,data9,data10
dataA,dataB,dataC,dataD,dataE,dataF,dataG,dataH,dataI,dataJ
Now let's say you only want Col1, Col2, Col6, Col9, and Col10. You could do a RegEx replace something like:
$Files = get-childItem "C:\my\dir\*.csv" | Select -Expand FullName
ForEach($File in $Files){
    # $SkipFirst controls whether the header row is skipped; set it before (or inside) the loop as needed.
    If($SkipFirst){
        Get-Content $File | Select -Skip 1 | ForEach{$_ -replace "^((?:.*?\,){2})(?:.*\,){3}(.*?\,)(?:(?:.*?\,){2})(.*?,.*?)$", '$1$2$3'} | Add-Content "C:\my\dir\output_code2.csv"
    }Else{
        Get-Content $File | ForEach{$_ -replace "^((?:.*?\,){2})(?:.*\,){3}(.*?\,)(?:(?:.*?\,){2})(.*?,.*?)$", '$1$2$3'} | Add-Content "C:\my\dir\output_code2.csv"
    }
}
That would extract just the columns that I noted above. See https://regex101.com/r/jY4oO6/1 for detailed breakdown of RegEx string. Effective output would be (skipping first line if so dictated):
Col1,Col2,Col6,Col9,Col10
data1,data2,data6,data9,data10
dataA,dataB,dataF,dataI,dataJ
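If readability matters more than raw speed, the same column trim can also be expressed with the object pipeline the question mentions; a hedged sketch using the question's own column names and paths (expect it to be noticeably slower than the regex approach):
# Object-based variant: import every CSV, keep only the wanted columns, export once.
Get-ChildItem "C:\my\dir\*.csv" |
    ForEach-Object { Import-Csv $_.FullName } |
    Select-Object DateAndTime, DG1_KW, DG2_KW, WT_KW, HTR1_KW, POSS_Load_KW, INV1_KW, INV2_SOC |
    Export-Csv "C:\my\dir\output_test.csv" -NoTypeInformation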

How to modify contents of a pipe-delimited text file with PowerShell

I have a pipe-delimited text file. The file contains "records" of various types. I want to modify certain columns for each record type. For simplicity, let's say there are 3 record types: A, B, and C. A has 3 columns, B has 4 columns, and C has 5 columns. For example, we have:
A|stuff|more_stuff
B|123|other|x
C|something|456|stuff|more_stuff
B|78903|stuff|x
A|1|more_stuff
I want to append the prefix "P" to all desired columns. For A, the desired column is 2. For B, the desired column is 3. For C, the desired column is 4.
So, I want the output to look like:
A|Pstuff|more_stuff
B|123|Pother|x
C|something|456|Pstuff|more_stuff
B|78903|Pstuff|x
A|P1|more_stuff
I need to do this in PowerShell. The file could be very large, so I'm thinking about going with the File class of .NET. If it were a simple string replacement, I would do something like:
$content = [System.IO.File]::ReadAllText("H:\test_modify_contents.txt").Replace("replace_text","something_else")
[System.IO.File]::WriteAllText("H:\output_file.txt", $content)
But it's not so simple in my particular situation. So, I'm not even sure if ReadAllText and WriteAllText are the best solution. Any ideas on how to do this?
I would use ConvertFrom-Csv so you can check each line as an object. In this code I did add a header, mainly for readability; the header is cut out of the output on the last line anyway:
$input = "H:\test_modify_contents.txt"
$output = "H:\output_file.txt"
$data = Get-Content -Path $input | ConvertFrom-Csv -Delimiter '|' -Header 'Column1','Column2','Column3','Column4','Column5'
$data | % {
    If ($_.Column5) {
        #type C:
        $_.Column4 = "P$($_.Column4)"
    } ElseIf ($_.Column4) {
        #type B:
        $_.Column3 = "P$($_.Column3)"
    } Else {
        #type A:
        $_.Column2 = "P$($_.Column2)"
    }
}
$data | Select Column1,Column2,Column3,Column4,Column5 | ConvertTo-Csv -Delimiter '|' -NoTypeInformation | Select-Object -Skip 1 | Set-Content -Path $output
It does add extra | for the type A and B lines. Output:
"A"|"Pstuff"|"more_stuff"||
"B"|"123"|"Pother"|"x"|
"C"|"something"|"456"|"Pstuff"|"more_stuff"
"B"|"78903"|"Pstuff"|"x"|
"A"|"P1"|"more_stuff"||
If your file sizes are large, then reading the complete file contents at once using Import-Csv or ReadAll is probably not a good idea. I would use the Get-Content cmdlet with the -ReadCount parameter, which will stream the file one row at a time, and then use a regex for the processing. Something like this:
Get-Content your_in_file.txt -ReadCount 1 | % {
    $_ -replace '^(A\||B\|[^\|]+\||C\|[^\|]+\|[^\|]+\|)(.*)$', '$1P$2'
} | Set-Content your_out_file.txt
EDIT:
This version should output faster:
$d = Get-Date
Get-Content input.txt -ReadCount 1000 | % {
    $_ | % {
        $_ -replace '^(A\||B\|[^\|]+\||C\|[^\|]+\|[^\|]+\|)(.*)$', '$1P$2'
    } | Add-Content output.txt
}
(New-TimeSpan $d (Get-Date)).Milliseconds
For me this processed 50k rows in 350 milliseconds. You can probably get more speed by tweaking the -ReadCount value to find the ideal amount.
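A quick, hedged way to compare -ReadCount values on your own data (the file names are placeholders):
# Time the same regex filter with different -ReadCount batch sizes.
foreach ($batch in 100, 1000, 10000) {
    $elapsed = Measure-Command {
        Get-Content input.txt -ReadCount $batch | % {
            $_ -replace '^(A\||B\|[^\|]+\||C\|[^\|]+\|[^\|]+\|)(.*)$', '$1P$2'
        } | Set-Content "output_$batch.txt"
    }
    "ReadCount {0}: {1} ms" -f $batch, [int]$elapsed.TotalMilliseconds
}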
Given the large input file, I would not use either ReadAllText or Get-Content; they read the entire file into memory. Consider using something along the lines of:
$filename = ".\input2.csv"
$outfilename = ".\output2.csv"
function ProcessFile($inputfilename, $outputfilename)
{
    $reader = [System.IO.File]::OpenText($inputfilename)
    $writer = New-Object System.IO.StreamWriter $outputfilename
    $record = $reader.ReadLine()
    while ($record -ne $null)
    {
        $writer.WriteLine(($record -replace '^(A\||B\|[^\|]+\||C\|[^\|]+\|[^\|]+\|)(.*)$', '$1P$2'))
        $record = $reader.ReadLine()
    }
    $reader.Close()
    $reader.Dispose()
    $writer.Close()
    $writer.Dispose()
}
ProcessFile $filename $outfilename
EDIT: After testing all the suggestions on this page, I have borrowed the regex from Dave Sexton and this is the fastest implementation. It processes a 1 GB+ file in 175 seconds. All other implementations are significantly slower on large input files.