I have a source CSV file (without a header, all columns delimited by a comma) which I am trying to split into separate CSV files based on the value in the first column, using that column value as the output file name.
Input file:
S00000009,2016,M04 01/07/2016,0.00,0.00,0.00,0.00,0.00,0.00,750.00,0.00,0.00
S00000009,2016,M05 01/08/2016,0.00,0.00,0.00,0.00,0.00,0.00,600.00,0.00,0.00
S00000009,2016,M06 01/09/2016,0.00,0.00,0.00,0.00,0.00,0.00,600.00,0.00,0.00
S00000010,2015,W28 05/10/2015,2275.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
S00000010,2015,W41 04/01/2016,0.00,0.00,0.00,0.00,0.00,0.00,568.75,0.00,0.00
S00000010,2015,W42 11/01/2016,0.00,0.00,0.00,0.00,0.00,0.00,568.75,0.00,0.00
S00000012,2015,W10 01/06/2015,0.00,0.00,0.00,0.00,0.00,0.00,650.00,0.00,0.00
S00000012,2015,W11 08/06/2015,0.00,0.00,0.00,0.00,0.00,0.00,650.00,0.00,0.00
S00000012,2015,W12 15/06/2015,0.00,0.00,0.00,0.00,0.00,0.00,650.00,0.00,0.00
My PowerShell script looks like this:
Import-Csv INPUT_FILE.csv -Header service_id,year,period,cash_exp,cash_inc,cash_def,act_exp,act_inc,act_def,comm_exp,comm_inc,comm_def |
Group-Object -Property "service_id" |
Foreach-Object {
$path = $_.Name + ".csv";
$_.group | Export-Csv -Path $path -NoTypeInformation
}
Output files:
S00000009.csv:
"service_id","year","period","cash_exp","cash_inc","cash_def","act_exp","act_inc","act_def","comm_exp","comm_inc","comm_def"
"S00000009","2016","M04 01/07/2016","0.00","0.00","0.00","0.00","0.00","0.00","750.00","0.00","0.00"
"S00000009","2016","M05 01/08/2016","0.00","0.00","0.00","0.00","0.00","0.00","600.00","0.00","0.00"
"S00000009","2016","M06 01/09/2016","0.00","0.00","0.00","0.00","0.00","0.00","600.00","0.00","0.00"
S00000010.csv:
"service_id","year","period","cash_exp","cash_inc","cash_def","act_exp","act_inc","act_def","comm_exp","comm_inc","comm_def"
"S00000010","2015","W28 05/10/2015","2275.00","0.00","0.00","0.00","0.00","0.00","0.00","0.00","0.00"
"S00000010","2015","W41 04/01/2016","0.00","0.00","0.00","0.00","0.00","0.00","568.75","0.00","0.00"
"S00000010","2015","W42 11/01/2016","0.00","0.00","0.00","0.00","0.00","0.00","568.75","0.00","0.00"
It is generating the new files correctly, named after the value in column 1 (service_id).
There are 2 problems:
1. The output CSV files contain a header row, which I don't need.
2. The columns are enclosed in double quotes, which I don't need.
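In other words, the desired S00000009.csv would presumably contain just the original input rows, unquoted and without a header:
S00000009,2016,M04 01/07/2016,0.00,0.00,0.00,0.00,0.00,0.00,750.00,0.00,0.00
S00000009,2016,M05 01/08/2016,0.00,0.00,0.00,0.00,0.00,0.00,600.00,0.00,0.00
S00000009,2016,M06 01/09/2016,0.00,0.00,0.00,0.00,0.00,0.00,600.00,0.00,0.00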
First of all, a .csv file is expected to have a header row and quote marks as part of the CSV structure. But if you don't want them, you can produce plain text instead, for example:
Import-Csv INPUT_FILE.csv -Header service_id,year,period,cash_exp,cash_inc,cash_def,act_exp,act_inc,act_def,comm_exp,comm_inc,comm_def |
    Group-Object -Property "service_id" |
    ForEach-Object {
        $path = $_.Name + ".csv"
        # Re-serialize the group, drop the header line, then strip every double quote
        $temp0 = $_.Group | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1
        $temp1 = $temp0.Replace('"', '')
        $temp1 > $path
    }
But this output is not a "real" csv file.
Hope that helps.
For your particular scenario you could probably use a simpler approach. Read the input file as a plain text file, group the lines by splitting off the first field, then write the groups to output files named after the groups:
Get-Content 'INPUT_FILE.csv' |
Group-Object { $_.Split(',')[0] } |
ForEach-Object { $_.Group | Set-Content ($_.Name + '.csv') }
Another solution, which:
- uses no named headers, just numbers (as they aren't wanted in the output anyway)
- avoids unnecessary temporary files
- removes only the field-delimiting double quotes
Import-Csv INPUT_FILE.csv -Header (1..12) |
    Group-Object -Property "1" |
    ForEach-Object {
        # Drop the header row, trim the outer quotes, and collapse the quoted delimiters
        ($_.Group | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1).Trim('"') -replace '","', ',' |
            Set-Content -Path ("{0}.csv" -f $_.Name)
    }
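With this version the output files contain neither a header row nor quotes; S00000010.csv, for example, should come out identical to the corresponding input rows:
S00000010,2015,W28 05/10/2015,2275.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
S00000010,2015,W41 04/01/2016,0.00,0.00,0.00,0.00,0.00,0.00,568.75,0.00,0.00
S00000010,2015,W42 11/01/2016,0.00,0.00,0.00,0.00,0.00,0.00,568.75,0.00,0.00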
I have multiple CSV files that need to be merged into one. Every CSV file has a header, and in the second row there is some text that I don't need.
I noticed the | Select -Skip 1 statement for skipping the headers. Now I'm wondering how I can skip the 3rd row as well?
I tried this, but it gives me an empty file:
Get-ChildItem -Path $CSVFolder -Recurse -Filter "*.csv" | %{
Import-Csv $_.FullName -Header header1, header3, header4 |
Select -Skip 1 | Select -Skip 2
} | Export-Csv "C:\Export\result.csv" -NoTypeInformation
Select-Object doesn't allow you to skip arbitrary rows in between other rows. If you want to remove a particular row from a text input file, you can do so with a counter, e.g. like this:
$cnt = 0
# Note: $cnt++ returns the value before incrementing, so the first data record is counted as 0
Import-Csv 'C:\path\to\input.csv' |
Where-Object { ($cnt++) -ne 3 } |
Export-Csv 'C:\path\to\output.csv' -NoType
If the records in your input CSV don't have nested line breaks you could also use Get-Content/Set-Content, which is probably a little faster than Import-Csv/Export-Csv (due to less parsing overhead). Increase the line number you want to skip by one to account for the header line.
$cnt = 0
Get-Content 'C:\path\to\input.csv' |
Where-Object { ($cnt++) -ne 4 } |
Set-Content 'C:\path\to\output.csv'
Try this:
$i = 0
Import-Csv "C:\temp2\missing.csv" |
    ForEach-Object { $i++; if ($i -ne 3) { $_ } } |
    Export-Csv "C:\temp2\result.csv" -NoTypeInformation
If all you are doing is skipping the first three rows in every case, just use -Skip 3.
Get-Content -Path 'D:\Temp\UserRecord.csv'
# Results
<#
Name Codes
------- ---------
John AJFKC,EFUY
Ben EFOID, EIUF
Alex OIPORE, OUOIJE
#>
# Return all text after the header rows and row 3
(Get-Content -Path 'D:\Temp\UserRecord.csv') |
Select -Skip 3
# Results
<#
Ben EFOID, EIUF
Alex OIPORE, OUOIJE
#>
See also:
Parsing Text with PowerShell (1/3)
I think I must be missing something obvious, because I'm trying to use Import-Csv to import CSV files that have commented-out lines (always beginning with a # as the first character) at the top of the file, so the file looks like this:
#[SpecialCSV],,,,,,,,,,,,,,,,,,,,
#Version,1.0.0,,,,,,,,,,,,,,,,,,,
#,,,,,,,,,,,,,,,,,,,,
#,,,,,,,,,,,,,,,,,,,,
#[Table],,,,,,,,,,,,,,,,,,,,
Header1,Header2,Header3,Header4,Header5,Header6,Header7,...
Data1,Data2,Data3,Data4,Data5,Data6,Data7,...
I'd like to ignore those first 5 lines, but still use Import-Csv to get the rest of the information nicely into PowerShell.
Thanks
Simple - just use Select-String to exclude commented lines with a regex, and pipe to ConvertFrom-Csv:
Get-Content <path to CSV file> | Select-String '^[^#]' | ConvertFrom-Csv
The difference between Import-Csv and ConvertFrom-Csv is that the former takes input from a file while the latter takes pipeline input; otherwise they do the same thing - convert CSV data to an array of PSCustomObjects. So, by using ConvertFrom-Csv you can do this without modifying the CSV file or using a temp file. You can assign the results to an array or pipe them to a ForEach-Object block, just as you would with Import-Csv:
$array = Get-Content <path to CSV file> | Select-String '^[^#]' | ConvertFrom-Csv
or
Get-Content <path to CSV file> | Select-String '^[^#]' | ConvertFrom-Csv | %{
<whatever you want to do with the data>
}
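And if you eventually want the cleaned data back on disk as a regular CSV rather than as objects in memory, you could (as a rough sketch, with the file paths being placeholders) re-export the filtered result:
Get-Content 'C:\path\to\special.csv' |
    Select-String '^[^#]' |
    ConvertFrom-Csv |
    Export-Csv 'C:\path\to\special-clean.csv' -NoTypeInformation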
CSV has no notion of "comments" - it's just flat data. You'll need to use Get-Content and inspect each line. If a line starts with #, ignore it, otherwise process it.
If you're OK with using a temp file:
Get-content special.csv |where-object{!$_.StartsWith("#")}|add-content -path $(join-path -path $env:temp -childpath "special-filtered.csv");
$mydata = import-csv -path $(join-path -path $env:temp -childpath "special-filtered.csv");
remove-item -path $(join-path -path $env:temp -childpath "special-filtered.csv")
$mydata |format-table -autosize; #Just for illustration
Edit: Forgot about convertfrom-csv. It gets much simpler this way.
$mydata = Get-Content special.csv |
Where-Object { !$_.StartsWith("#") } |
ConvertFrom-Csv
If you feed ConvertFrom-Csv CSV data as an array of lines, it seems to filter out comments automatically. I frequently use ConvertFrom-Csv this way, but I haven't seen this behavior documented.
cat data.csv | convertfrom-csv #skips commented lines automagically
("co1,col2,col3", "abc,def,ghi", "#this,is,a,comment", "abc1,def1,ghi1")|convertfrom-csv
co1 col2 col3
--- ---- ----
abc def ghi
abc1 def1 ghi1
However, the following will not skip comments:
"co1,col2,col3
abc,def,ghi
#this,is,a,comment
abc1,def1,ghi1
"|convertfrom-csv
co1 col2 col3
--- ---- ----
abc def ghi
#this is a
abc1 def1 ghi1
Where-Object will also work after Import-Csv; you just have to reference the first column of the CSV in the filter clause.
e.g.:
$EscapeCharacter = '#'
$FilteredData = Import-Csv -Path "$($Home)\Documents\sample.csv" -Delimiter "`t" -Encoding UTF8 |
    Where-Object { $_.coll1 -notlike "$EscapeCharacter*" }
A sample of the tab-delimited CSV:
coll1 coll2
#Kotehulky SomeValue
Cakovice OtherValue
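Given that sample, $FilteredData should end up containing only the Cakovice row, since the #Kotehulky row is removed by the -notlike test:
$FilteredData
# Result
<#
coll1    coll2
-----    -----
Cakovice OtherValue
#>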