I have a source CSV file (no header, all columns comma-delimited) that I am trying to split into separate CSV files based on the value in the first column, using that value as the output file name.
Input file:
S00000009,2016,M04 01/07/2016,0.00,0.00,0.00,0.00,0.00,0.00,750.00,0.00,0.00
S00000009,2016,M05 01/08/2016,0.00,0.00,0.00,0.00,0.00,0.00,600.00,0.00,0.00
S00000009,2016,M06 01/09/2016,0.00,0.00,0.00,0.00,0.00,0.00,600.00,0.00,0.00
S00000010,2015,W28 05/10/2015,2275.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
S00000010,2015,W41 04/01/2016,0.00,0.00,0.00,0.00,0.00,0.00,568.75,0.00,0.00
S00000010,2015,W42 11/01/2016,0.00,0.00,0.00,0.00,0.00,0.00,568.75,0.00,0.00
S00000012,2015,W10 01/06/2015,0.00,0.00,0.00,0.00,0.00,0.00,650.00,0.00,0.00
S00000012,2015,W11 08/06/2015,0.00,0.00,0.00,0.00,0.00,0.00,650.00,0.00,0.00
S00000012,2015,W12 15/06/2015,0.00,0.00,0.00,0.00,0.00,0.00,650.00,0.00,0.00
My PowerShell script looks like this:
Import-Csv INPUT_FILE.csv -Header service_id,year,period,cash_exp,cash_inc,cash_def,act_exp,act_inc,act_def,comm_exp,comm_inc,comm_def |
Group-Object -Property "service_id" |
Foreach-Object {
$path = $_.Name + ".csv";
$_.group | Export-Csv -Path $path -NoTypeInformation
}
Output files:
S00000009.csv:
"service_id","year","period","cash_exp","cash_inc","cash_def","act_exp","act_inc","act_def","comm_exp","comm_inc","comm_def"
"S00000009","2016","M04 01/07/2016","0.00","0.00","0.00","0.00","0.00","0.00","750.00","0.00","0.00"
"S00000009","2016","M05 01/08/2016","0.00","0.00","0.00","0.00","0.00","0.00","600.00","0.00","0.00"
"S00000009","2016","M06 01/09/2016","0.00","0.00","0.00","0.00","0.00","0.00","600.00","0.00","0.00"
S00000010.csv:
"service_id","year","period","cash_exp","cash_inc","cash_def","act_exp","act_inc","act_def","comm_exp","comm_inc","comm_def"
"S00000010","2015","W28 05/10/2015","2275.00","0.00","0.00","0.00","0.00","0.00","0.00","0.00","0.00"
"S00000010","2015","W41 04/01/2016","0.00","0.00","0.00","0.00","0.00","0.00","568.75","0.00","0.00"
"S00000010","2015","W42 11/01/2016","0.00","0.00","0.00","0.00","0.00","0.00","568.75","0.00","0.00"
It is generating the new files using the value in column 1 (service_id).
There are two problems:
1. The output CSV file contains a header row, which I don't need.
2. The columns are enclosed in double quotes, which I don't need.
Export-Csv always writes a header row and wraps every field in double quotes; in Windows PowerShell there is no switch to turn either off. If you don't want them, convert the objects to CSV text, skip the header line, and strip the quotes yourself:
$temp = Import-Csv INPUT_FILE.csv -Header service_id,year,period,cash_exp,cash_inc,cash_def,act_exp,act_inc,act_def,comm_exp,comm_inc,comm_def | Group-Object -Property "service_id" |
Foreach-Object {
$path=$_.name+".csv"
$temp0 = $_.group | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1
$temp1 = $temp0.replace("""","")
$temp1 > $path
}
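Note that this removes every double quote, including any inside field values, so it is only safe when no field contains a comma or a quote, as in your sample data.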
Hope that helps.
For your particular scenario you could probably use a simpler approach. Read the input file as a plain text file, group the lines by splitting off the first field, then write the groups to output files named after the groups:
Get-Content 'INPUT_FILE.csv' |
Group-Object { $_.Split(',')[0] } |
ForEach-Object { $_.Group | Set-Content ($_.Name + '.csv') }
Another solution, which uses plain numbers instead of named headers (they aren't wanted in the output anyway), avoids unnecessary temporary files, and removes only the field-delimiting double quotes:
Import-Csv INPUT_FILE.csv -Header (1..12) |
Group-Object -Property "1" | Foreach-Object {
($_.Group | ConvertTo-Csv -NoType | Select-Object -Skip 1).Trim('"') -replace '","',',' |
Set-Content -Path ("{0}.csv" -f $_.Name)
}
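If PowerShell 7.0 or later is available, the quote stripping can be left to ConvertTo-Csv itself through its -UseQuotes parameter. The following is only a sketch under that assumption: the $lines array stands in for INPUT_FILE.csv, and $outDir is a hypothetical output folder that does not appear in the question.

```powershell
# Requires PowerShell 7.0+; -UseQuotes does not exist in Windows PowerShell 5.1.
# Sample rows standing in for INPUT_FILE.csv (taken from the question).
$lines = @(
    'S00000009,2016,M04 01/07/2016,0.00,0.00,0.00,0.00,0.00,0.00,750.00,0.00,0.00'
    'S00000009,2016,M05 01/08/2016,0.00,0.00,0.00,0.00,0.00,0.00,600.00,0.00,0.00'
    'S00000010,2015,W28 05/10/2015,2275.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00'
)

# Hypothetical output folder, used here only to keep the demo self-contained.
$outDir = Join-Path ([IO.Path]::GetTempPath()) 'split-demo'
New-Item -ItemType Directory -Path $outDir -Force | Out-Null

$lines |
    ConvertFrom-Csv -Header (1..12) |
    Group-Object -Property '1' |
    ForEach-Object {
        $_.Group |
            ConvertTo-Csv -UseQuotes Never |   # no quotes around any field
            Select-Object -Skip 1 |            # drop the header line
            Set-Content -Path (Join-Path $outDir ('{0}.csv' -f $_.Name))
    }
```

With a real file, Import-Csv INPUT_FILE.csv -Header (1..12) would replace the first two pipeline stages. As with the manual approach, -UseQuotes Never is only safe when no field contains a comma or a quote.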
I have a CSV file which contains many lines and I want to take the text between <STR_0.005_Long>, and µm,5.000µm.
Example line from the CSV:
Straightness(Up/Down) <STR_0.005_Long>,4.444µm,5.000µm,,Pass,2.476µm,1.968µm,25,0.566µm,0.720µm
This is the script that I am trying to write:
$arr = @()
$path = "C:\Users\georgi\Desktop\5\test.csv"
$pattern = "(?<=.*<STR_0.005_Long>,)\w+?(?=µm,5.000µm*)"
$Text = Get-Content $path
$Text.GetType() | Format-Table -AutoSize
$Text[14] | Foreach {
if ([Regex]::IsMatch($_, $pattern)) {
$arr += [Regex]::Match($_, $pattern)
Out-File C:\Users\georgi\Desktop\5\test.txt -Append
}
}
$arr | Foreach {$_.Value} | Out-File C:\Users\georgi\Desktop\5\test.txt -Append
Use a Where-Object filter with your regular expression and simply output the match to the output file:
Get-Content $path |
Where-Object { $_ -match $pattern } |
ForEach-Object { $matches[0] } |
Out-File 'C:\Users\georgi\Desktop\5\test.txt'
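One caveat: the pattern from the question uses \w+? for the value, and \w does not match the decimal point in 4.444, so the filter may find nothing on lines like the example. A possible adjustment, shown here only as a sketch against the single example line, is to allow digits and dots in the value and to escape the literal dots elsewhere in the pattern:

```powershell
# Example line from the question.
$line = 'Straightness(Up/Down) <STR_0.005_Long>,4.444µm,5.000µm,,Pass,2.476µm,1.968µm,25,0.566µm,0.720µm'

# [\d.]+ accepts digits and the decimal point; \. matches a literal dot.
$pattern = '(?<=<STR_0\.005_Long>,)[\d.]+(?=µm,5\.000µm)'

# -match fills the automatic $Matches variable on success.
$value = if ($line -match $pattern) { $Matches[0] }
$value   # 4.444
```

The same $pattern can be dropped into the Get-Content pipeline unchanged.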
Of course, since you have a CSV, you could simply use Import-Csv and export the value of that particular column:
Import-Csv $path | Select-Object -Expand 'column_name' |
Out-File 'C:\Users\georgi\Desktop\5\test.txt'
Replace column_name with the actual name of the column. If the CSV doesn't have a column header you can specify one via the -Header parameter:
Import-Csv $path -Header 'col1','col2','col3',... |
Select-Object -Expand 'col2' |
Out-File 'C:\Users\georgi\Desktop\5\test.txt'
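Keep in mind that the column approach returns the whole field (including the µm suffix) for every row, so some filtering and trimming is probably still needed. A minimal sketch, assuming the label sits in the first column and the value in the second; the second sample row and the names col1 to col3 are made up for illustration:

```powershell
$lines = @(
    'Straightness(Up/Down) <STR_0.005_Long>,4.444µm,5.000µm,,Pass,2.476µm,1.968µm,25,0.566µm,0.720µm'
    'Flatness <FLT_0.010>,1.000µm,2.000µm'   # hypothetical second row
)

$values = $lines |
    ConvertFrom-Csv -Header 'col1','col2','col3' |            # columns beyond col3 are discarded
    Where-Object { $_.col1 -like '*<STR_0.005_Long>' } |      # keep only the rows of interest
    ForEach-Object { $_.col2 -replace 'µm$' }                 # strip the unit suffix

$values   # 4.444
```

With a file on disk, Import-Csv $path -Header ... would take the place of ConvertFrom-Csv.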
As a continuation of a script I'm running, I'm working on the following.
I have a CSV file that has formatted information, example as follows:
File named Import.csv:
Name,email,x,y,z
\I\RS\T\Name1\c\x,email@jksjks,d,f
\I\RS\T\Name2\d\f,email@jsshjs,d,f
...
This file is large.
I also have another file called Note.txt.
Name1
Name2
Name3
...
With help from @mathias-r-jessen:
$Dir = PathToFile
$import = Import-Csv $Dir\import.csv
$NoteFile = "$Dir\Note.txt"
$Note = GC $NoteFile
$Import |Where-Object {$Note -contains $_.Name.Split('\')[4]} |Export-Csv "$Dir\Result.csv" -NoTypeInformation -Append
This code quickly and effortlessly parses the big csv and extracts every line that contains any of the lines in the $note file.
My next question is: how do I log any lines in the $note file that were not found in the CSV file?
I tried the following:
$result = $Import |Where-Object {$Note -contains $_.Name.Split('\')[4]} |Export-Csv "$Dir\Result.csv" -NoTypeInformation -Append
$Note | Where-Object {$result.Name.Split('\')[4] -notcontains $Note} | out-file $dir\not-found.log -append
This seems to return every line in $note.
@mathias-r-jessen, any help you can provide would be appreciated.
You could use a Switch to do that.
Switch($Import){
{$Note -contains $_.Name.Split('\')[4]} {$_ | Export-Csv "$Dir\Result.csv" -NoTypeInformation -Append; continue}
default {$_ | Export-csv "$Dir\Not-Found.csv" -NoType -Append}
}
The continue in the first option makes it so that if the first case is a match it performs the relevant action, and then continues to the next record. If the first case doesn't match it moves on to the default action, which outputs it to a different file.
I solved it by using the following:
$result = $Import |Where-Object {$Note -contains $_.Name.Split('\')[4]}
$result | Export-Csv "$Dir\Result.csv" -NoTypeInformation -Append
# $matches is an automatic variable set by -match, so a different name is used here
$found = $note | Where-Object { $result.Name -match $_ }
Compare-Object $note $found | Where-Object { $_.SideIndicator -eq "<=" } | Select-Object -ExpandProperty InputObject | Out-File "$Dir\Not_found.txt" -Append
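For what it's worth, the earlier attempt returned every line for two reasons: Export-Csv emits nothing, so $result was empty, and the filter compared against the whole $Note array instead of the current item ($_). A simpler alternative to Compare-Object, sketched here with made-up in-memory data in place of Import.csv and Note.txt, is to collect the name segments that occur in the CSV once and then filter the notes against that list:

```powershell
# Made-up sample data standing in for Import.csv and Note.txt.
$import = @'
Name,email
\I\RS\T\Name1\c\x,a@example.com
\I\RS\T\Name2\d\f,b@example.com
'@ | ConvertFrom-Csv
$note = 'Name1', 'Name2', 'Name3'

# Distinct name segments present in the CSV (index 4 after splitting on '\').
$namesInCsv = $import |
    ForEach-Object { $_.Name.Split('\')[4] } |
    Select-Object -Unique

# Notes that never occur in the CSV; $_ is the current note line.
$notFound = $note | Where-Object { $namesInCsv -notcontains $_ }
$notFound   # Name3
```

Piping $notFound to Out-File "$Dir\Not_found.txt" would then write the log. Unlike -match, -notcontains compares whole strings, so Name1 is not treated as found just because Name10 exists.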