I am using Get-ChildItem to recurse through directories (skipping some at the top level), open a series of CSV files, append the filename to the end of each line of data, and combine the data into one file.
$mergedData = Get-ChildItem $path -Exclude yesterday,"OHCC Extract",output |
    Get-ChildItem -Recurse -Filter *.csv |
    Where-Object { $_.CreationTime -gt (Get-Date).AddDays(-1) } |
    ForEach-Object {
        $file = $_.Name
        $fn = $_.FullName
        ## capture the header line
        $FirstLine = Get-Content $fn -TotalCount 1
        ## add the column header for filename
        $header = $FirstLine + ",Filename"
        ## get the contents of the file without the first line and append the filename
        Get-Content $fn | Select-Object -Skip 1 | ForEach-Object { "$_,$file" }
    }
Each file has 5 columns: ID, First Name, Last Name, Phone, Address. The column names are surrounded by double quotes ("ID", "First Name").
The request now is to drop everything except the ID and Last Name columns. So I tried this (starting with just ID; I will add the name column later):
Get-Content $fn | SELECT -Skip 1 -Property ID | %{ "$_,$file" }
I get @{ID=} in the resulting file.
Then I tried
Get-Content $fn | SELECT -Skip 1 | %{ $_.ID }
which yields blank lines, and then
Import-Csv -Path $fn -Delimiter ',' | SELECT ID
which gives @{ID=73aec2fe-6cb3-492e-a157-25e355ed9691}.
At this point I am just flailing, because I obviously don't know how to handle objects in PS.
I have PowerShell 5.1.19041.1682 on Windows 10.
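A minimal sketch of why those attempts behave as they do, using a throwaway file in the temp folder (the name sample.csv and its values are made up). Get-Content emits plain strings, and a string has no ID property, so Select-Object -Property ID builds a new object whose ID is empty; string interpolation then renders that object as @{ID=}. Import-Csv emits objects whose properties come from the header row; Select-Object ID still wraps the value in an object, whereas -ExpandProperty returns the bare value.

```powershell
# Hypothetical sample file with quoted headers, like the ones described above
$csvPath = Join-Path ([IO.Path]::GetTempPath()) 'sample.csv'
@(
    '"ID","First Name","Last Name","Phone","Address"'
    '"73aec2fe","Ann","Lee","555-0100","1 Main St"'
) | Set-Content -Path $csvPath

# Get-Content yields strings; a string has no ID property, so ID is empty
$fromText = Get-Content $csvPath | Select-Object -Skip 1 -Property ID
"$fromText"                      # renders as @{ID=}

# Import-Csv yields objects whose properties come from the header row
$rows = Import-Csv -Path $csvPath

# Select-Object ID keeps a wrapper object: renders as @{ID=73aec2fe}
"$($rows | Select-Object ID)"

# -ExpandProperty (or $_.ID inside ForEach-Object) returns just the value
$rows | Select-Object -ExpandProperty ID

# Keep only some columns and turn the objects back into CSV text
$rows | Select-Object ID, 'Last Name' | ConvertTo-Csv -NoTypeInformation
```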
I was asked for sample data, so here it is. There are 35 files across multiple subdirectories.
Input FileA
Input FileB
But I did figure it out myself. Working code:
$path = "<directory where the files are located>"
$pathout = "<path to outputted file>"
$out = "$pathout\csv_merged_$(Get-Date -f MMddyyyy).csv"
$mergedData = Get-ChildItem $path -Exclude yesterday,output |
    Get-ChildItem -Recurse -Filter *.csv |
    Where-Object { $_.CreationTime -gt (Get-Date).AddDays(-1) } |
    ForEach-Object {
        $file = $_.Name
        $fn = $_.FullName
        Write-Host $fn, $_.CreationTime
        ## get the contents of the file, exclude columns and add the Filename column
        $Data = Import-Csv -Path $fn -Delimiter ',' |
            Select-Object *, @{Name = 'Filename'; Expression = {$file}} -ExcludeProperty ID
        ## get the header line
        $header = $Data | ConvertTo-Csv -NoTypeInformation | Select-Object -First 1
        Write-Host $header
        ## convert the objects to CSV text and drop each file's header line
        $Data | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1
        Write-Host '-----------------------'
    }
# Prefix the header before the compiled data
$header, $mergedData | Set-Content -Encoding utf8 $out
The missing piece was ConvertTo-Csv, which turns the objects back into lines of CSV text.
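A small sketch of the same idea, assuming a scratch folder with two made-up files that share the same columns. It keeps only ID and Last Name (the columns the request asks for), adds a calculated Filename column, and converts everything in a single ConvertTo-Csv call so the header appears once, instead of stripping it per file.

```powershell
# Hypothetical scratch folder with two small input files
$dir = Join-Path ([IO.Path]::GetTempPath()) ('csvdemo_' + [guid]::NewGuid())
New-Item -ItemType Directory -Path $dir | Out-Null
'"ID","First Name","Last Name"', '"1","Ann","Lee"' | Set-Content (Join-Path $dir 'a.csv')
'"ID","First Name","Last Name"', '"2","Bob","Kim"' | Set-Content (Join-Path $dir 'b.csv')

$rows = Get-ChildItem -Path $dir -Filter *.csv | Sort-Object Name | ForEach-Object {
    $file = $_.Name
    # Objects in, objects out: pick columns and add a calculated Filename column
    Import-Csv -Path $_.FullName |
        Select-Object ID, 'Last Name', @{ Name = 'Filename'; Expression = { $file } }
}

# One ConvertTo-Csv call over all rows emits the header line only once
$merged = $rows | ConvertTo-Csv -NoTypeInformation
$merged | Set-Content -Path (Join-Path $dir 'merged.csv')
```

This relies on every file producing the same selected columns: ConvertTo-Csv takes the header from the first object, so rows from a file with different columns would be forced into that first layout.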
I have a source CSV file (without a header, all columns delimited by a comma) which I am trying to split into separate CSV files based on the value in the first column, using that value as the output file name.
Input file:
S00000009,2016,M04 01/07/2016,0.00,0.00,0.00,0.00,0.00,0.00,750.00,0.00,0.00
S00000009,2016,M05 01/08/2016,0.00,0.00,0.00,0.00,0.00,0.00,600.00,0.00,0.00
S00000009,2016,M06 01/09/2016,0.00,0.00,0.00,0.00,0.00,0.00,600.00,0.00,0.00
S00000010,2015,W28 05/10/2015,2275.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
S00000010,2015,W41 04/01/2016,0.00,0.00,0.00,0.00,0.00,0.00,568.75,0.00,0.00
S00000010,2015,W42 11/01/2016,0.00,0.00,0.00,0.00,0.00,0.00,568.75,0.00,0.00
S00000012,2015,W10 01/06/2015,0.00,0.00,0.00,0.00,0.00,0.00,650.00,0.00,0.00
S00000012,2015,W11 08/06/2015,0.00,0.00,0.00,0.00,0.00,0.00,650.00,0.00,0.00
S00000012,2015,W12 15/06/2015,0.00,0.00,0.00,0.00,0.00,0.00,650.00,0.00,0.00
My PowerShell script looks like this:
Import-Csv INPUT_FILE.csv -Header service_id,year,period,cash_exp,cash_inc,cash_def,act_exp,act_inc,act_def,comm_exp,comm_inc,comm_def |
    Group-Object -Property "service_id" |
    ForEach-Object {
        $path = $_.Name + ".csv"
        $_.Group | Export-Csv -Path $path -NoTypeInformation
    }
Output files:
"S00000009","2016","M04 01/07/2016","0.00","0.00","0.00","0.00","0.00","0.00","750.00","0.00","0.00"
"S00000009","2016","M05 01/08/2016","0.00","0.00","0.00","0.00","0.00","0.00","600.00","0.00","0.00"
"S00000009","2016","M06 01/09/2016","0.00","0.00","0.00","0.00","0.00","0.00","600.00","0.00","0.00"
"S00000010","2015","W28 05/10/2015","2275.00","0.00","0.00","0.00","0.00","0.00","0.00","0.00","0.00"
"S00000010","2015","W41 04/01/2016","0.00","0.00","0.00","0.00","0.00","0.00","568.75","0.00","0.00"
"S00000010","2015","W42 11/01/2016","0.00","0.00","0.00","0.00","0.00","0.00","568.75","0.00","0.00"
It generates the new files, using the value in column 1 (service_id) as the file name.
There are 2 problems.
The output CSV file contains a header row which I don't need.
The columns are enclosed with double quotes which I don't need.
First of all, the header row and the quote marks are part of the CSV structure that Export-Csv writes. If you don't want them, you can treat the output as a plain text file instead, for example:
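Import-Csv INPUT_FILE.csv -Header service_id,year,period,cash_exp,cash_inc,cash_def,act_exp,act_inc,act_def,comm_exp,comm_inc,comm_def |
    Group-Object -Property "service_id" |
    ForEach-Object {
        $path = $_.Name + ".csv"
        # convert to CSV text and drop the header line
        $temp0 = $_.Group | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1
        # remove every double quote
        $temp1 = $temp0.Replace('"', '')
        # Set-Content rather than > (which writes UTF-16 in Windows PowerShell 5.1)
        $temp1 | Set-Content -Path $path
    }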
But this output is not a "real" csv file.
Hope that helps.
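With the sample data above (IDs, years, dates and amounts, none containing commas), removing every quote is harmless. A minimal sketch of where it breaks, using a made-up note field that contains a comma:

```powershell
# A made-up row whose note value contains a comma
$row = [pscustomobject]@{ service_id = 'S00000009'; note = 'Flat 2, High St' }

# CSV text without the header: "S00000009","Flat 2, High St"
$quoted = $row | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1

# Without quotes the embedded comma becomes a delimiter: S00000009,Flat 2, High St
$stripped = $quoted.Replace('"', '')

# The line now splits into 3 fields instead of 2
($stripped -split ',').Count
```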
For your particular scenario you could probably use a simpler approach. Read the input file as a plain text file, group the lines by splitting off the first field, then write the groups to output files named after the groups:
Get-Content 'INPUT_FILE.csv' |
Group-Object { $_.Split(',')[0] } |
ForEach-Object { $_.Group | Set-Content ($_.Name + '.csv') }
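A sketch of the plain-text approach run against a few lines of the sample input, assuming a hypothetical scratch folder so the per-ID files don't land in the current directory. Because the lines are never parsed as CSV, no header row or quotes are added; the trade-off is that Split(',') assumes the first field never contains a quoted comma.

```powershell
# Scratch folder (hypothetical) holding a few lines of the sample input
$dir = Join-Path ([IO.Path]::GetTempPath()) ('split_' + [guid]::NewGuid())
New-Item -ItemType Directory -Path $dir | Out-Null
$inputFile = Join-Path $dir 'INPUT_FILE.csv'
@(
    'S00000009,2016,M04 01/07/2016,0.00,0.00,0.00,0.00,0.00,0.00,750.00,0.00,0.00'
    'S00000010,2015,W28 05/10/2015,2275.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00'
    'S00000010,2015,W41 04/01/2016,0.00,0.00,0.00,0.00,0.00,0.00,568.75,0.00,0.00'
) | Set-Content -Path $inputFile

# Group raw lines by their first field and write each group to <first field>.csv
Get-Content $inputFile |
    Group-Object { $_.Split(',')[0] } |
    ForEach-Object { $_.Group | Set-Content (Join-Path $dir ($_.Name + '.csv')) }

$s10 = Get-Content (Join-Path $dir 'S00000010.csv')
```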
Another solution, which uses no named headers, just numbers (they aren't wanted in the output anyway), avoids unnecessary temporary files, and removes only the field-delimiting double quotes:
Import-Csv INPUT_FILE.csv -Header (1..12) |
    Group-Object -Property "1" | ForEach-Object {
        ($_.Group | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1).Trim('"') -replace '","',',' |
            Set-Content -Path ("{0}.csv" -f $_.Name)
    }
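A small sketch of what the Trim/-replace step does to a couple of made-up rows. The quotes that ConvertTo-Csv puts around each field are removed, while a quote inside a value survives (still doubled, as CSV escaping requires). One caveat: Trim('"') strips every quote at either end of the line, so a last field whose value itself ends in a quote would lose it.

```powershell
# Made-up rows using numeric column names, as with -Header (1..12)
$rows = @(
    [pscustomobject]@{ '1' = 'S00000009'; '2' = '2016'; '3' = 'M04 01/07/2016' }
    [pscustomobject]@{ '1' = 'S00000010'; '2' = '2015'; '3' = 'note "x" here' }
)

# "S00000009","2016","M04 01/07/2016"
# "S00000010","2015","note ""x"" here"
$lines = $rows | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1

# Drop the outer quotes, then collapse each "," delimiter to a bare comma
$clean = $lines.Trim('"') -replace '","', ','
$clean
```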
I am using the following script, which iterates through hundreds of text files looking for matches of a regex pattern. I need to add a second data point to the array that tells me which file the pattern matched in.
In the script below, the [Regex]::Matches($str, $Pattern) | % { $_.Value } piece returns multiple rows per file, which are not easy to write out to a file.
What I would like to know is how to output a two-column CSV file: one column with the file name (which should be $_.FullName), and one column with the regex results. The code I have so far is below.
$FolderPath = "C:\Test"
$Pattern = "(?i)(?<=\b^test\b)\s+(\w+)\S+"
$Lines = @()
Get-ChildItem -Recurse $FolderPath -File | ForEach-Object {
    $str = Get-Content $_.FullName
    $Lines += [Regex]::Matches($str, $Pattern) |
        ForEach-Object { $_.Value } |
        Sort-Object
}
# Cleaning up data in array
$Lines = $Lines.Trim().ToUpper() -replace '[\r\n]+', ' ' -replace ";", '' |
    Sort-Object |
    Get-Unique
I can think of two ways, but the simplest is to use a hashtable (dictionary) keyed by file path. The other way is to create PSObjects to fill your $Lines variable. I'll go with the simple way, so you only need one variable: the hashtable.
$FolderPath = "C:\Test"
$Pattern = "(?i)(?<=\b^test\b)\s+(\w+)\S+"
$Results = @{}
Get-ChildItem -Recurse $FolderPath -File |
    ForEach-Object {
        $str = Get-Content $_.FullName
        $Line = [regex]::Matches($str, $Pattern) | ForEach-Object { $_.Value } | Sort-Object | Get-Unique
        # skip files with no matches (calling .Trim() on $null would throw)
        if ($Line) {
            # Cleaning up data in array
            $Line = $Line.Trim().ToUpper() -replace '[\r\n]+', ' ' -replace ";", '' | Sort-Object | Get-Unique
            $Results[$_.FullName] = $Line
        }
    }
# join multiple matches into one cell, otherwise Export-Csv writes System.Object[]
$Results.GetEnumerator() |
    Select-Object @{L="File";E={$_.Key}}, @{L="Matches";E={$_.Value -join '; '}} |
    Export-Csv -NoTypeInformation -Path <Path to save CSV>
Your results will be in $Results. $Results.Keys contains the full file paths, and $Results.Values holds the cleaned-up regex matches. You can look up the results for a particular file by its key, e.g. $Results["<full file path>"]; if the key does not exist, this returns $null rather than raising an error.
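The answer mentions PSObjects as the other option. A minimal sketch of that variant, which writes one row per match rather than one row per file, so no joining is needed. The scratch folder, the sample text and the (?im) pattern are assumptions for the demo, not the original pattern; Get-Content -Raw reads each file as one string, and the m flag lets ^ anchor at the start of every line.

```powershell
# Hypothetical test folder, sample files and pattern; swap in your own values
$FolderPath = Join-Path ([IO.Path]::GetTempPath()) ('rx_' + [guid]::NewGuid())
New-Item -ItemType Directory -Path $FolderPath | Out-Null
"test alpha;`nother line`ntest beta" | Set-Content (Join-Path $FolderPath 'one.txt')
'nothing here' | Set-Content (Join-Path $FolderPath 'two.txt')
$Pattern = '(?im)(?<=^test)\s+\w+'

# One [pscustomobject] per match with File and Match properties
$rows = Get-ChildItem -Path $FolderPath -Recurse -File | ForEach-Object {
    $file = $_.FullName
    $text = Get-Content -Path $file -Raw
    [regex]::Matches($text, $Pattern) | ForEach-Object {
        [pscustomobject]@{ File = $file; Match = $_.Value.Trim().ToUpper() }
    }
}

# Each object becomes one CSV row: "File","Match"
$rows | Export-Csv -NoTypeInformation -Path (Join-Path $FolderPath 'matches.csv')
```

Files without matches simply contribute no rows, so there is no need to guard against $null as in the hashtable version.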