I am using PowerShell to read and loop through a CSV file in order to create a new file for each row of the CSV file. I need to use the header names as part of each new file.
For each row of the CSV, how can I loop through each column and output the key and value for each variable in the output of each new file?
For example, if Master.csv contains
a,b,c
1,2,3
4,5,6
I would like to output a file named file1.txt:
a=1
b=2
c=3
and a file named file2.txt:
a=4
b=5
c=6
Is there an advantage to converting the array into a hash table, and using something like $d.Keys?
I am trying the below, but cannot get the key:
Import-Csv "C:\Master.csv" | %{
$CsvObject = $_
Write-Output "Working with $($CsvObject.a)"
$CsvObject | ForEach-Object {
Write-Output "Key = Value`n"
}
}
This will do the job, it seems. [grin] It uses the hidden .PSObject property to iterate through the properties of each object.
# fake reading in a CSV file
# in real life, use Import-CSV
$Instuff = @'
a,b,c
1,2,3
4,5,6
'@ | ConvertFrom-Csv
$Counter = 1
foreach ($IS_Item in $Instuff)
{
$FileName = "$env:TEMP\HamletHub_File$Counter.txt"
$TextLines = foreach ($Prop in $IS_Item.PSObject.Properties.Name)
{
'{0} = {1}' -f $Prop, $IS_Item.$Prop
}
Set-Content -LiteralPath $FileName -Value $TextLines
$Counter ++
}
HamletHub_File1.txt content ...
a = 1
b = 2
c = 3
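Each entry in PSObject.Properties also exposes Name and Value directly, which gives the key and the value the question asks about without converting the row to a hash table first. A minimal sketch, assuming inline sample data in place of Master.csv and the temp directory as a hypothetical output location:

```powershell
# Sample data standing in for: Import-Csv "C:\Master.csv"
$rows = @'
a,b,c
1,2,3
4,5,6
'@ | ConvertFrom-Csv

$n = 1
foreach ($row in $rows) {
    # Each property object carries the header (Name) and the cell (Value)
    $lines = foreach ($p in $row.PSObject.Properties) {
        '{0}={1}' -f $p.Name, $p.Value
    }
    # Hypothetical output location: file1.txt, file2.txt, ... in the temp dir
    $target = Join-Path ([IO.Path]::GetTempPath()) "file$n.txt"
    Set-Content -LiteralPath $target -Value $lines
    $n++
}
```

Because the property list is read from each row, the same loop should work for any number of columns without naming them.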
Related
I have 3 csv files in C:\temp. Trying to combine all 3 csv files to single file.
F1.csv, F2.csv, F3.csv [All having unique headers and different number of rows and columns]. Below are sample contents in the file.
F1.csv
F1C1 F1C2
ABC 123
F2.csv
F2C1 F2C2
DEF 456
GHI 789
JKL 101112
F3.csv
F3C1
MNO
PQR
I want the result csv file FR.csv to be like below
FR.csv
F1C1  F1C2  F2C1  F2C2    F3C1
ABC   123   DEF   456     MNO
            GHI   789     PQR
            JKL   101112
I tried running the below script, but FR.csv gives output in single column.
Get-Content C:\temp\*csv | Add-Content C:\temp\FinalResult.csv
The following solutions assume that Get-ChildItem *.csv enumerates the files to merge, in the desired order (which works with input files F1.csv, F2.csv, F3.csv in the current dir).
Plain-text solution, using .NET APIs, System.IO.StreamReader and System.IO.StreamWriter:
This solution performs much better than the OO solution below, but the latter gives you more flexibility. Input files without a Unicode BOM are assumed to be UTF-8-encoded, and the output is saved to a BOM-less UTF-8 file named FR.csv in the current dir. (The APIs used do allow you to specify different encodings, if needed.)
$outFile = 'FR.csv'
# IMPORTANT: Always use *full* paths with .NET APIs.
# Writer for the output file.
$writer = [System.IO.StreamWriter] (Join-Path $Pwd.ProviderPath $outFile)
# Readers for all input files.
$readers = [System.IO.StreamReader[]] (Get-ChildItem *.csv -Exclude $outFile).FullName
# Read all files in batches of corresponding lines, join the
# lines of each batch with ",", and save to the output file.
$isHeader = $true
while ($readers.EndOfStream -contains $false) {
if ($isHeader) {
$headerLines = $readers.ReadLine()
$colCounts = $headerLines.ForEach({ ($_ -split ',').Count })
$writer.WriteLine($headerLines -join ',')
$isHeader = $false
} else {
$i = 0
$lines = $readers.ForEach({
if ($line = $_.ReadLine()) { $line }
else { ',' * ($colCounts[$i] - 1) }
++$i
})
$writer.WriteLine($lines -join ',')
}
}
$writer.Close()
$readers.Close()
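The padding expression ',' * ($colCounts[$i] - 1) relies on PowerShell's string repetition: a file with n columns that has run out of rows contributes n - 1 commas, which stands for n empty fields. A small sketch using the header lines of the sample files (the variable $padding exists only for this illustration):

```powershell
# Header lines as they would be read from F1.csv, F2.csv and F3.csv
$headerLines = 'F1C1,F1C2', 'F2C1,F2C2', 'F3C1'

# Number of columns per input file
$colCounts = $headerLines.ForEach({ ($_ -split ',').Count })

# Placeholder for each file once it has no more rows:
# n columns need n - 1 separators (a 1-column file needs an empty string)
$padding = $colCounts.ForEach({ ',' * ($_ - 1) })

# Joining the placeholders yields a row of five empty fields
$emptyRow = $padding -join ','
```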
OO solution, using Import-Csv and ConvertTo-Csv / Export-Csv:
# Read all CSV files into an array of object arrays.
$objectsPerCsv =
Get-ChildItem *.csv -Exclude FR.csv |
ForEach-Object {
, @(Import-Csv $_.FullName)
}
# Determine the max. row count.
$maxCount = [Linq.Enumerable]::Max($objectsPerCsv.ForEach('Count'))
# Get all column names per CSV.
$colNamesPerCsv = $objectsPerCsv.ForEach({ , $_[0].psobject.Properties.Name })
0..($maxCount-1) | ForEach-Object {
$combinedProps = [ordered] @{}
$row = $_; $col = 0
$objectsPerCsv.ForEach({
if ($object = $_[$row]) {
foreach ($prop in $object.psobject.Properties) {
$combinedProps.Add($prop.Name, $prop.Value)
}
}
else {
foreach ($colName in $colNamesPerCsv[$col]) {
$combinedProps.Add($colName, $null)
}
}
++$col
})
[pscustomobject] $combinedProps
} | ConvertTo-Csv
Replace ConvertTo-Csv with Export-Csv to export the data to a file; use the -NoTypeInformation parameter and -Encoding as needed; e.g. ... | Export-Csv -NoTypeInformation -Encoding utf8 Merged.csv
I created a script that lists all the folders, subfolders and files and export them to csv:
$path = "C:\tools"
Get-ChildItem $path -Recurse |select fullname | export-csv -Path "C:\temp\output.csv" -NoTypeInformation
But I would like each folder, subfolder and file in the path to be written into a separate column in the CSV.
Something like this:
c:\tools\test\1.jpg
Column1  Column2  Column3
tools    test     1.jpg
I will be grateful for any help.
Thank you.
You can split the Fullname property using the Split() method. The tricky part is that you need to know the maximum path depth in advance, as the CSV format requires that all rows have the same number of columns (even if some columns are empty).
# Process directory $path recursively
$allItems = Get-ChildItem $path -Recurse | ForEach-Object {
# Split on directory separator (typically '\' for Windows and '/' for Unix-like OS)
$FullNameSplit = $_.FullName.Split( [IO.Path]::DirectorySeparatorChar )
# Create an object that contains the split path and the path depth.
# This is implicit output that PowerShell captures and adds to $allItems.
[PSCustomObject] @{
FullNameSplit = $FullNameSplit
PathDepth = $FullNameSplit.Count
}
}
# Determine highest column index from maximum depth of all paths.
# Minus one, because we'll skip root path component.
$maxColumnIndex = ( $allItems | Measure-Object -Maximum PathDepth ).Maximum - 1
$allRows = foreach( $item in $allItems ) {
# Create an ordered hashtable
$row = [ordered]@{}
# Add all path components to hashtable. Make sure all rows have same number of columns.
foreach( $i in 1..$maxColumnIndex ) {
$row[ "Column$i" ] = if( $i -lt $item.FullNameSplit.Count ) { $item.FullNameSplit[ $i ] } else { $null }
}
# Convert hashtable to object suitable for output to CSV.
# This is implicit output that PowerShell captures and adds to $allRows.
[PSCustomObject] $row
}
# Finally output to CSV file
$allRows | Export-Csv -Path "C:\temp\output.csv" -NoTypeInformation
Notes:
The syntax Select-Object @{ Name = ...; Expression = ... } creates a calculated property.
$allRows = foreach captures and assigns all output of the foreach loop to variable $allRows, which will be an array if the loop outputs more than one object. This works with most other control statements as well, e.g. if and switch.
Within the loop I could have created a [PSCustomObject] directly (and used Add-Member to add properties to it) instead of first creating a hashtable and then converting to [PSCustomObject]. The chosen way should be faster as no additional overhead for calling cmdlets is required.
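The last two notes can be illustrated with a short sketch. The values and variable names below are made up for the example; it only shows that the output of if, switch and foreach can be assigned directly, and that both ways of building the row object end up with the same properties:

```powershell
$depth = 3   # hypothetical value

# Output of control statements can be assigned directly
$label = if ($depth -gt 2) { 'deep' } else { 'shallow' }
$kind = switch ('.jpg') {
    '.jpg'  { 'image' }
    '.txt'  { 'text' }
    default { 'other' }
}
$squares = foreach ($i in 1..3) { $i * $i }

# Way 1: fill an ordered hashtable, then convert it once
$viaCast = [PSCustomObject] ([ordered] @{ Column1 = 'tools'; Column2 = 'test' })

# Way 2: start with an empty object and call Add-Member per property
$viaAddMember = New-Object PSObject
$viaAddMember | Add-Member -NotePropertyName Column1 -NotePropertyValue 'tools'
$viaAddMember | Add-Member -NotePropertyName Column2 -NotePropertyValue 'test'
```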
While a file with rows containing a variable number of items is not actually a CSV file, you can roll your own and Microsoft Excel can read it.
=== Get-DirCsv.ps1
Get-Childitem -File |
ForEach-Object {
$NameParts = $_.FullName -split '\\'
$QuotedParts = [System.Collections.ArrayList]::new()
foreach ($NamePart in $NameParts) {
$QuotedParts.Add('"' + $NamePart + '"') | Out-Null
}
Write-Output $($QuotedParts -join ',')
}
Use this to capture the output to a file with:
.\Get-DirCsv.ps1 | Out-File -FilePath '.\dir.csv' -Encoding ascii
So I have a file that looks like :
A=www.google.com
B=www.yahoo.com
Now, I want to convert this text file to a HashTable and read values using keys ie A or B
This is what I have come up with:
$hash = Get-Content .\test.txt
$hash[1].Split('=')[1]
The above script works fine except that I want to use key instead of number
Something like :
$hash['B'].Split('=')[1]
You will need to convert the file data into a hashtable object first. There are several techniques to add data to a hashtable object. The following will convert all lines to a hash table value provided they have the format key=value.
$hash = [ordered]@{}
Get-Content test.txt | Foreach-Object {
$key,$value = ($_ -split '=',2).Trim()
$hash[$key] = $value
}
# Value Retrieval syntax
$hash.A
$hash['A']
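The 2 in -split '=',2 limits the split to two parts, which matters as soon as a value itself contains an equals sign. A small sketch with a made-up line (the key Q and the URL are assumptions for illustration, not part of test.txt):

```powershell
$line = 'Q = https://example.com/?q=1'   # hypothetical input line

# Limited split: everything after the first '=' stays in the value
$key, $value = ($line -split '=', 2).Trim()

# Unlimited split: the value is broken at every '='
$naiveParts = $line -split '='
```

Here $key is Q and $value keeps the full URL, while $naiveParts has three elements.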
If you want to target a specific line in the file, you can do the following:
$hash = [ordered]@{}
$data = Get-Content test.txt
$temp = $data[1] -split '='
$hash[$temp[0]] = $temp[1]
# Value Retrieval Syntax
$hash.B
$hash['B']
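You could technically convert the file data with two commands, but the key order may vary, because ConvertFrom-StringData returns a regular (unordered) hashtable. It also treats backslashes as escape characters. I'm not sure if ConvertFrom-StringData is favorable anymore.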
$hash = Get-Content test.txt -Raw | ConvertFrom-StringData
# Value Retrieval Syntax
$hash.B
$hash['B']
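As a rough illustration of how ConvertFrom-StringData behaves, the sketch below feeds it inline text instead of test.txt. The second call uses a made-up key P to show that a literal backslash has to be doubled, since the cmdlet processes escape sequences:

```powershell
# Inline stand-in for: Get-Content test.txt -Raw
$text = "A=www.google.com`nB=www.yahoo.com"
$hash = $text | ConvertFrom-StringData

# The result is a plain [hashtable]; lookups work as before
$valueB = $hash['B']

# Hypothetical entry with a Windows path: the backslash must be doubled
$escaped = ConvertFrom-StringData -StringData 'P=C:\\temp'
$pathValue = $escaped['P']
```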
Output From First Code Snippet:
Get-Content test.txt
A=www.google.com
B=www.yahoo.com
$hash = [ordered]@{}
Get-Content test.txt | Foreach-Object {
$temp = ($_ -split '=').Trim()
$hash[$temp[0]] = $temp[1]
}
$hash
Name                           Value
----                           -----
A                              www.google.com
B                              www.yahoo.com
$hash['B']
www.yahoo.com
I have a pipe-delimited text file. The file contains "records" of various types. I want to modify certain columns for each record type. For simplicity, let's say there are 3 record types: A, B, and C. A has 3 columns, B has 4 columns, and C has 5 columns. For example, we have:
A|stuff|more_stuff
B|123|other|x
C|something|456|stuff|more_stuff
B|78903|stuff|x
A|1|more_stuff
I want to append the prefix "P" to all desired columns. For A, the desired column is 2. For B, the desired column is 3. For C, the desired column is 4.
So, I want the output to look like:
A|Pstuff|more_stuff
B|123|Pother|x
C|something|456|Pstuff|more_stuff
B|78903|Pstuff|x
A|P1|more_stuff
I need to do this in PowerShell. The file could be very large. So, I'm thinking about going with the File-class of .NET. If it were a simple string replacement, I would do something like:
$content = [System.IO.File]::ReadAllText("H:\test_modify_contents.txt").Replace("replace_text","something_else")
[System.IO.File]::WriteAllText("H:\output_file.txt", $content)
But, it's not so simple in my particular situation. So, I'm not even sure if ReadAllText and WriteAllText is the best solution. Any ideas on how to do this?
I would ConvertFrom-Csv so you can check each line as an object. On this code, I did add a header, but mainly for code readability. The header is cut out of the output on the last line anyway:
# Note: $input is an automatic variable in PowerShell, so different names are used here.
$inputPath = "H:\test_modify_contents.txt"
$outputPath = "H:\output_file.txt"
$data = Get-Content -Path $inputPath | ConvertFrom-Csv -Delimiter '|' -Header 'Column1','Column2','Column3','Column4','Column5'
$data | % {
If ($_.Column5) {
#type C:
$_.Column4 = "P$($_.Column4)"
} ElseIf ($_.Column4) {
#type B:
$_.Column3 = "P$($_.Column3)"
} Else {
#type A:
$_.Column2 = "P$($_.Column2)"
}
}
$data | Select Column1,Column2,Column3,Column4,Column5 | ConvertTo-Csv -Delimiter '|' -NoTypeInformation | Select-Object -Skip 1 | Set-Content -Path $outputPath
It does add extra | for the type A and B lines. Output:
"A"|"Pstuff"|"more_stuff"||
"B"|"123"|"Pother"|"x"|
"C"|"something"|"456"|"Pstuff"|"more_stuff"
"B"|"78903"|"Pstuff"|"x"|
"A"|"P1"|"more_stuff"||
If your files are large, then reading the complete file contents at once using Import-Csv or ReadAllText is probably not a good idea. I would use the Get-Content cmdlet with the -ReadCount parameter, which streams the file one row at a time, and then use a regex for the processing. Something like this:
Get-Content your_in_file.txt -ReadCount 1 | % {
$_ -replace '^(A\||B\|[^\|]+\||C\|[^\|]+\|[^\|]+\|)(.*)$', '$1P$2'
} | Set-Content your_out_file.txt
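To see what the regex does: the first group captures the record type plus however many leading fields precede the target column (none for A, one for B, two for C), the second group captures the rest of the line, and the replacement puts P between them. A quick check against three of the sample lines from the question:

```powershell
$pattern = '^(A\||B\|[^\|]+\||C\|[^\|]+\|[^\|]+\|)(.*)$'

$sample = 'A|stuff|more_stuff',
          'B|123|other|x',
          'C|something|456|stuff|more_stuff'

# -replace is applied to each element of the array
$result = $sample -replace $pattern, '$1P$2'
$result
# A|Pstuff|more_stuff
# B|123|Pother|x
# C|something|456|Pstuff|more_stuff
```

Lines that start with any other record type do not match and pass through unchanged.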
EDIT:
This version should output faster:
$d = Get-Date
Get-Content input.txt -ReadCount 1000 | % {
$_ | % {
$_ -replace '^(A\||B\|[^\|]+\||C\|[^\|]+\|[^\|]+\|)(.*)$', '$1P$2'
} | Add-Content output.txt
}
(New-TimeSpan $d (Get-Date)).TotalMilliseconds
For me this processed 50k rows in 350 milliseconds. You can probably get more speed by tweaking the -ReadCount value to find the ideal amount.
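One thing to keep in mind when timing runs like this: on a TimeSpan, the Milliseconds property is only the millisecond component (0 to 999), while TotalMilliseconds is the whole duration. A tiny sketch with a made-up duration of 2.35 seconds:

```powershell
# Hypothetical elapsed time of 2350 ms
$elapsed = [timespan]::FromMilliseconds(2350)

$component = $elapsed.Milliseconds        # millisecond part only: 350
$total     = $elapsed.TotalMilliseconds   # full duration: 2350
```

For anything that runs longer than a second, the two values differ, so TotalMilliseconds is the one to report.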
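Given the large input file, I would not use either ReadAllText or Get-Content.
ReadAllText reads the entire file into memory, and so does Get-Content when its output is collected into a variable; when piped it streams, but slowly.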
Consider using something along the lines of
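# Note: .NET APIs resolve relative paths against the process working directory,
# which may differ from PowerShell's current location; use full paths if in doubt.
$filename = ".\input2.csv"
$outfilename = ".\output2.csv"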
function ProcessFile($inputfilename, $outputfilename)
{
$reader = [System.IO.File]::OpenText($inputfilename)
$writer = New-Object System.IO.StreamWriter $outputfilename
$record = $reader.ReadLine()
while ($record -ne $null)
{
$writer.WriteLine(($record -replace '^(A\||B\|[^\|]+\||C\|[^\|]+\|[^\|]+\|)(.*)$', '$1P$2'))
$record = $reader.ReadLine()
}
$reader.Close()
$reader.Dispose()
$writer.Close()
$writer.Dispose()
}
ProcessFile $filename $outfilename
EDIT: After testing all the suggestions on this page, I have borrowed the regex from Dave Sexton, and this is the fastest implementation. It processes a 1 GB+ file in 175 seconds. All other implementations are significantly slower on large input files.
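If the list of record types grows, the regex becomes hard to maintain. As an alternative that is not taken from any of the answers above, the rule "type X gets its prefix in column n" can be kept in a lookup table. This is only a sketch: the function name Add-ColumnPrefix and the table $targetIndex are hypothetical, and no claim is made about its speed relative to the regex.

```powershell
# Record type -> zero-based index of the field to prefix
# (A: column 2, B: column 3, C: column 4, counting from 1)
$targetIndex = @{ A = 1; B = 2; C = 3 }

function Add-ColumnPrefix {
    param([string] $Record, [string] $Prefix = 'P')

    $fields = $Record -split '\|'
    $i = $targetIndex[$fields[0]]

    # Unknown record types and too-short records are left untouched
    if ($null -ne $i -and $i -lt $fields.Count) {
        $fields[$i] = $Prefix + $fields[$i]
    }
    $fields -join '|'
}

Add-ColumnPrefix 'B|78903|stuff|x'
# B|78903|Pstuff|x
```

Inside the ReadLine loop above, the call would take the place of the -replace expression.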
I have 3 CSV files and they are all only 1 column long. I have tried lots of things to put it all in one CSV file, but I can't get it to work. When I output it, it all ends up in one column. Here is what I did so far:
#Putting Csv files to Array
$CSV1 = @(gc $EmailPathCsv)
$CSV2 = @(gc $UserLPathCsv)
$CSV3 = @(gc $EmailPathCsv)
#
for ($i=0; $i -lt $CSV1.Count; $i++)
{
$CSV4 += $CSV1[$i] + "," + $CSV2[$i] + "," + $CSV3[$i] + " "
}
$csv4 | out-file -append $MergedCsvExport
Your loop is adding everything into $CSV4; each time through the loop, $CSV4 gets longer and longer.
Then you print it once. That's why you get one really long line. Try printing it once every time through the loop, and overwriting $CSV4 every time:
#Putting Csv files to Array
$CSV1 = @(gc $EmailPathCsv)
$CSV2 = @(gc $UserLPathCsv)
$CSV3 = @(gc $EmailPathCsv)
#
for ($i=0; $i -lt $CSV1.Count; $i++)
{
$CSV4 = $CSV1[$i] + "," + $CSV2[$i] + "," + $CSV3[$i] + " "
out-file -InputObject $csv4 -Append -FilePath $MergedCsvExport
}
I'd use a format string for that.
$CSV1 = @(gc $EmailPathCsv)
$CSV2 = @(gc $UserLPathCsv)
$CSV3 = @(gc $EmailPathCsv)
for ($i=0; $i -lt $CSV1.Count; $i++)
{
'"{0}","{1}","{2}"' -f $CSV1[$i],$CSV2[$i],$CSV3[$i] |
add-content $MergedCsvExport
}
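The format-string approach can also be combined with capturing the loop output, so that the file is written once instead of being appended to on every iteration. A sketch with made-up in-memory columns standing in for the three files (all names and values here are assumptions):

```powershell
# Hypothetical column data; in practice these would come from Get-Content
$col1 = 'a@example.com', 'b@example.com'
$col2 = 'alice', 'bob'
$col3 = 'x', 'y'

# Each iteration emits one quoted, comma-separated line;
# the assignment collects all of them
$merged = for ($i = 0; $i -lt $col1.Count; $i++) {
    '"{0}","{1}","{2}"' -f $col1[$i], $col2[$i], $col3[$i]
}

# $merged | Set-Content $MergedCsvExport   # single write at the end
```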
As a more fun answer:
$CSV1 = 1,2,3,4
$CSV2 = 10,11,12,13
$CSV3 = 'a','b','c','d'
$c = gv -Name CSV* | select -expand name | sort
(gv -Va $c[0])|%{$r=0}{@($c|%{(gv -Va $_)[$r]}) -join ',';$r++}
Sample output:
1,10,a
2,11,b
3,12,c
4,13,d
You could put |Out-File "merged-data.csv" on the end to save to a file.
It works for more columns too, just add more arrays called $CSV{something}.
Edit: I wonder if Get-Variable's output is in a predictable order, or unspecified? If you don't mind if the column order might change, it collapses to:
$CSV1 = 1,2,3,4
$CSV2 = 10,11,12,13
$CSV3 = 'a','b','c','d'
(gv CSV*)[0].Value|%{$r=0}{@((gv CSV*)|%{(gv -Va $_.Name)[$r]}) -join ',';$r++}
Edit again: Well, in case anyone notices and is curious, and has time on their hands, I've expanded it all out with an explanation of what it's doing:
# Search the local variable scope for variables named CSV*
# This will find $CSV1, $CSV2, etc.
# This means the number of columns
# isn't fixed, you can easily add more.
# Take their names, sort them.
# Result: an array of strings "CSV1", "CSV2", ...
# for however many variables there are
$columnVariables = Get-Variable -Name "CSV*" | Select-Object -Expand Name | Sort-Object
# NB. if you remove $CSV3= from your code, it is
# still in memory from previous run. To remove
# it, use `Remove-Variable "CSV3"`
# In pseudo-code, what the next part does is
# for row in range(data):
# @(CSV1[row], CSV2[row], ... CSVn[row]) -join ','
# The outer loop over the number of rows
# is done by piping something of the right length
# into a foreach loop, but ignoring the content.
# So get the first column array content:
$firstColumn = (Get-Variable $columnVariables[0]).Value
# and pipe it into a loop.
# The loop uses ForEach {setup} {loop} pattern
# to setup a row-counter before the loop body
$firstColumn | ForEach { $row=0 } {
# Now we know the row we are on, we can get
# $CSV1[row], $CSV2[row], ...
# Take the column variable array "CSV1", "CSV2", ..
# Loop over them
$rowContent = $columnVariables | ForEach {
# $_ a string of the name, e.g. "CSV1"
# Use Get-Variable to convert it
# into the variable $CSV1
# with -ValueOnly to get the array itself
# rather than details about the variable
$columnVar = Get-Variable -ValueOnly $_
# $columnVar is now one of the $CSVn variables
# so it contains, e.g. 1,2,3,4
# Index into that for the current row
# to get one item, e.g. 3
# Output it into the pipeline
($columnVar)[$row]
} # This is the end of the inner loop
# The pipeline now contains the column
# values/content making up a single row
# 1
# 10
# 'a'
# Back in the outer row loop, Take the row
# content and make it a comma separated string
# e.g. "1,10,a"
# Output this into the pipeline
@($rowContent) -join ','
# Increment the row counter
$row++
}
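The same row-by-row idea can be sketched without Get-Variable by keeping the columns in one array of arrays and looping over the row indexes. This is a variation for illustration, not the code from the answer; the name $columns is an assumption:

```powershell
# One inner array per column
$columns = @(1, 2, 3, 4), @(10, 11, 12, 13), @('a', 'b', 'c', 'd')

# Outer loop: one pass per row index of the first column
$rows = 0..($columns[0].Count - 1) | ForEach-Object {
    $r = $_
    # Inner loop: pick element $r from every column, then join into one line
    ($columns | ForEach-Object { $_[$r] }) -join ','
}

$rows
# 1,10,a
# 2,11,b
# 3,12,c
# 4,13,d
```

Adding a column then means adding one more inner array to $columns.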