Bulk Merge of CSV Files in PowerShell using parent folder name

After beginning this task at the command line I realised I need to get down and dirty with PowerShell. I have about 100 folders, and each folder has a few thousand CSV files that I would like to merge together inside each folder. Ideally the merged CSV file(s) in each folder would use the parent folder's name. For example, here is a top-level folder containing the 100 folders:
E:\CSVFolders
The subfolders are named in a semi-random fashion like this:
E:\CSVFolders\Folder1
E:\CSVFolders\Folder18
So far I am at this point:
# Merge csv files and use the parent folder name
Import-Csv (Get-ChildItem File*.csv) |
Export-Csv $folderName.csv -NoTypeInformation -Encoding UTF8
I am struggling to make the script enumerate the subfolders and then use their names as the basis for the merged CSV file, so if anyone is able to shed light on this I would appreciate it!

Use two loops:
Get-ChildItem 'E:\CSVFolders' | Where-Object {
    $_.PSIsContainer
} | ForEach-Object {
    $csv = Join-Path $_.FullName ($_.Name + '.csv')
    Get-ChildItem $_.FullName -Filter File*.csv | ForEach-Object {
        Import-Csv $_.FullName
    } | Export-Csv $csv -NoType -Encoding UTF8
}

You can group by directory like this:
Get-ChildItem "c:\temp" -File -Filter "*.csv" -Recurse |
    Group-Object DirectoryName |
    ForEach-Object { $dir = $_.Name; $_.Group.FullName | ForEach-Object { Import-Csv -Path $_ } | Export-Csv "$dir\global.csv" -NoTypeInformation }
Short version (for the non-purists):
gci "c:\temp" -file -Filter "*.csv" -Rec |
    group DirectoryName |
    %{ $dir = $_.Name; $_.Group.FullName | %{ ipcsv -path $_ } | epcsv "$dir\global.csv" -NoType }
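If you want each merged file to carry its parent folder's name, as in the original question, rather than a fixed name like global.csv, the same grouping approach can derive the name with Split-Path. This is only a sketch, reusing the E:\CSVFolders path and File*.csv pattern from the question:
# Group the CSVs by folder and name each merged file after its parent folder
Get-ChildItem 'E:\CSVFolders' -File -Filter 'File*.csv' -Recurse |
    Group-Object DirectoryName |
    ForEach-Object {
        $dir  = $_.Name                   # full path of the folder
        $name = Split-Path $dir -Leaf     # e.g. 'Folder18'
        $_.Group.FullName |
            ForEach-Object { Import-Csv -Path $_ } |
            Export-Csv (Join-Path $dir "$name.csv") -NoTypeInformation
    }
Because each merged file is named after its folder rather than matching File*.csv, re-running the merge won't pick the output back up (assuming no folder name itself starts with 'File').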

Related

Is there a way to pull a variable value into a select-object expression?

The purpose of this script is to find the Name, directory and last write time of files and output it to a csv file.
get-childitem D:\testing -recurse -filter *.txt | select-object Name,DirectoryName,LastWriteTime, @{Name="New_colimn";Expression={"copy-item `"DirectoryName`" To_Compile_directory"}} | where { $_.DirectoryName -ne $NULL } | Export-CSV D:\testing\rdf.csv
My problem is that there is one cell I want to fill with another script that takes values from the generated csv file. Is there a way to pull the value of each DirectoryName and paste it into the Expression of the same row? I only get an error that says DirectoryName is an invalid key.
When I try to pull it using $_.DirectoryName, the script only reads the $_ and the value it has is the Name.
Thanks for helping.
Did you mean to collect the data from the files like this:
Get-ChildItem -Path 'D:\testing' -Filter '*.txt' -File -Recurse |
Select-Object Name,DirectoryName,LastWriteTime | Export-Csv -Path 'D:\testing\rdf.csv' -NoTypeInformation
and then have your other script read the DirectoryName values from it like this?
$directories = (Import-Csv -Path 'D:\testing\rdf.csv').DirectoryName | Select-Object -Unique
# maybe do something with these directories here?
foreach ($folderPath in $directories) {
    # copy the directories including the files to an existing root destination folder
    Copy-Item -Path $folderPath -Destination 'D:\SomeExistingDestinationPath' -Recurse -Force
}
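If the goal was instead to embed each row's own DirectoryName inside that calculated column, note that $_ is available inside the Expression script block, so the value can be interpolated into the string. A sketch based on the command in the question (the column name and the copy-item text are just the question's example, lightly tidied):
Get-ChildItem -Path 'D:\testing' -Filter '*.txt' -File -Recurse |
    Select-Object Name, DirectoryName, LastWriteTime,
        @{Name = 'New_column'; Expression = { "copy-item `"$($_.DirectoryName)`" To_Compile_directory" }} |
    Export-Csv -Path 'D:\testing\rdf.csv' -NoTypeInformation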

List file count by subfolder

I am trying to use powershell to produce a list of folder names and how many files are in each folder.
I have this script
$dir = "C:\Users\folder"
Get-ChildItem $dir -Recurse -Directory | ForEach-Object {
    [pscustomobject]@{
        Folder = $_.FullName
        Count  = @(Get-ChildItem -Path $_.FullName -File).Count
    }
} | Select-Object Folder, Count
Which lists the file count, but it puts the full path (i.e. C:\Users\name\Desktop\1\2\-movi...). Is there any way to just display the last folder ("movies") as well as save the result to a .txt file?
Thank you
Instead of $_.FullName, use $_.Name to only get the directory name.
Your Select-Object call is redundant - it is effectively a no-op.
While it's easy to send the results to a .txt file with >, for instance, it's better to use a more structured format for later programmatic processing.
In the simplest form, that means outputting to a CSV file via Export-Csv; generally, however, the most faithful way of serializing objects to a file is to use Export-Clixml (a sketch of that follows the Export-Csv example below).
Using Export-Csv for serialization:
$dir = 'C:\Users\folder'
Get-ChildItem -LiteralPath $dir -Recurse -Directory | ForEach-Object {
    [pscustomobject] @{
        Folder = $_.Name
        Count  = @(Get-ChildItem -LiteralPath $_.FullName -File).Count
    }
} | Export-Csv -NoTypeInformation results.csv
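And a minimal sketch of the Export-Clixml route mentioned above, which round-trips the objects with their types intact (results.xml is just an example output name):
$dir = 'C:\Users\folder'
Get-ChildItem -LiteralPath $dir -Recurse -Directory | ForEach-Object {
    [pscustomobject] @{
        Folder = $_.Name
        Count  = @(Get-ChildItem -LiteralPath $_.FullName -File).Count
    }
} | Export-Clixml results.xml

# later, rehydrate the objects:
$results = Import-Clixml results.xml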
Note that you could streamline your command by replacing the ForEach-Object call with a Select-Object call that uses a calculated property:
$dir = 'C:\Users\folder'
Get-ChildItem -LiteralPath $dir -Recurse -Directory |
  Select-Object Name,
    @{ n='Count'; e={ @(Get-ChildItem -LiteralPath $_.FullName -File).Count } } |
  Export-Csv -NoTypeInformation results.csv
You mean something like this...
Clear-Host
Get-ChildItem -Path 'd:\temp' -Recurse -Directory |
Select-Object Name, FullName,
    @{ Name='FileCount'; Expression={ (Get-ChildItem -Path $_.FullName -File -Recurse | Measure-Object).Count } } |
    Format-Table -AutoSize
# Results
Name        FullName            FileCount
----        --------            ---------
abcpath0    D:\temp\abcpath0            5
abcpath1    D:\temp\abcpath1            5
abcpath2    D:\temp\abcpath2            5
Duplicates  D:\temp\Duplicates      12677
EmptyFolder D:\temp\EmptyFolder         0
NewFiles    D:\temp\NewFiles            4
PngFiles    D:\temp\PngFiles            4
results     D:\temp\results           905
...
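The question also asked to save the result to a .txt file; one way is to pipe the formatted output to Out-File. A sketch reusing the same example path (the output file name is just a placeholder):
Get-ChildItem -Path 'd:\temp' -Recurse -Directory |
    Select-Object Name, FullName,
        @{ Name='FileCount'; Expression={ (Get-ChildItem -Path $_.FullName -File -Recurse | Measure-Object).Count } } |
    Format-Table -AutoSize |
    Out-File 'd:\temp\FileCounts.txt'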

Merge csv files from a directory into a single csv file in PowerShell

How can I run one single PowerShell script that does the following in series?
Adds the filename of each csv file in a directory as a column at the end of that file, using this script:
Get-ChildItem *.csv | ForEach-Object {
    $CSV = Import-CSV -Path $_.FullName -Delimiter ","
    $FileName = $_.Name
    $CSV | Select-Object *, @{N='Filename'; E={$FileName}} | Export-CSV $_.FullName -NTI -Delimiter ","
}
Merges all csv files in the directory into a single csv file
Keeps only the header (first row) from the first csv and excludes the first rows of all the other files.
Similar to what kemiller2002 has done here, except as one script with csv inputs and a csv output.
Bill's answer allows you to combine CSVs, but doesn't tack file names onto the end of each row. I think the best way to do that would be to use the PipelineVariable common parameter to add that within the ForEach loop.
Get-ChildItem \inputCSVFiles\*.csv -PipelineVariable File |
    ForEach-Object { Import-Csv $_ | Select *, @{l='FileName'; e={$File.Name}} } |
    Export-Csv \outputCSVFiles\newOutputFile.csv -NoTypeInformation
That should accomplish what you're looking for.
This is the general pattern:
Get-ChildItem \inputCSVFiles\*.csv |
ForEach-Object { Import-Csv $_ } |
Export-Csv \outputCSVFiles\newOutputFile.csv -NoTypeInformation
Make sure the output CSV file has a different filename pattern, or use a different directory name (like in this example).
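If the merged file has to live in the same directory as the inputs, one way to keep it from being re-imported on later runs is to exclude it by name. A sketch reusing the example names above:
Get-ChildItem \inputCSVFiles\*.csv -Exclude 'newOutputFile.csv' |
    ForEach-Object { Import-Csv $_ } |
    Export-Csv \inputCSVFiles\newOutputFile.csv -NoTypeInformation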
If your csv files all have the same header, you can also do it like this:
$Dir="c:\temp\"
#get header first csv file founded
$header=Get-ChildItem $Dir -file -Filter "*.csv" | select -First 1 | Get-Content -head 1
#header + all rows without header into new file
$header, (Get-ChildItem $Dir -file -Filter "*.csv" | %{Get-Content $_.fullname | select -skip 1}) | Out-File "c:\temp\result.csv"

PowerShell: delete files but keep the last x versions

I have a folder structure with, for example, 100 folders. Each folder has 200 files in it.
I would like to delete (via a scheduled task) all files in each folder but keep the last 10 versions in each folder.
I am trying to upskill in PowerShell, so I am guessing that this should be pretty simple. I have created this script:
#Delete all files, keep last 10 versions#
$Directory = "D:\Octopus\Packages"
$Keep = "10"
Get-ChildItem $Directory| ?{ $_.PSIsContainer } | Select-Object FullName | Export-Csv $Directory\FolderList.csv
$FolderList = import-csv $Directory\FolderList.csv
ForEach ($row in $FolderList)
{
Get-ChildItem -Recurse | where{-not $_.PsIsContainer}| sort CreationTime -desc | select -Skip $Keep | Remove-Item -Force
}
It appears to be looping through each folder, but keeping the last 10 files for the entire folder structure, not per folder. So some folders have 0 files, some may have 2 files, some may have 8 files.
Any pointers would be appreciated
Thanks !
If you actually need to have that CSV, then just change Get-ChildItem -Recurse to Get-ChildItem $row.FullName -Recurse. However, if you don't need to create the CSV, you can remove that step and just pipe the results of your first Get-ChildItem into the next action.
$Directory = "D:\Octopus\Packages"
$Keep = "10"
Get-ChildItem $Directory | ?{ $_.PSIsContainer } | Select-Object FullName |
    ForEach-Object { Get-ChildItem $_.FullName -Recurse |
        where { -not $_.PsIsContainer } | sort CreationTime -desc |
        select -Skip $Keep | Remove-Item -Force }
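If you do want to keep the intermediate CSV, a sketch of that variant using the variables from your script, with the loop reading the path from each imported row as described above:
$Directory = "D:\Octopus\Packages"
$Keep = 10

Get-ChildItem $Directory | ?{ $_.PSIsContainer } |
    Select-Object FullName | Export-Csv $Directory\FolderList.csv -NoTypeInformation

$FolderList = Import-Csv $Directory\FolderList.csv
ForEach ($row in $FolderList)
{
    # sort this folder's files newest first, keep the newest $Keep, remove the rest
    Get-ChildItem $row.FullName -Recurse |
        where { -not $_.PsIsContainer } |
        sort CreationTime -desc |
        select -Skip $Keep |
        Remove-Item -Force
}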

Export CSV list with all specific files in directory - with their "path"

I am currently running the following script:
Get-ChildItem $dir1 -recurse -include *ending.csv |
Sort-Object fullname |
Select $CurrentDate,Path,FullName,Name,CreationTime,Length,Chosen |
Export-Csv -Delimiter ';' -Force -NoTypeInformation $dir2
$dir1 and $dir2 are the source / destination paths.
Now the script just creates a CSV file that lists all files whose names end the way I specified, along with some other properties of the files.
I'd also like to have a column that contains the "path without the file name" for each file. So basically the FullName without the Name of the file.
Is there such a cmdlet / command?
You're looking for the Directory property:
Get-ChildItem $dir1 -recurse -include *ending.csv |
Sort-Object fullname |
Select $CurrentDate,Directory,FullName,Name,CreationTime,Length,Chosen |
Export-Csv -Delimiter ';' -Force -NoTypeInformation $dir2
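As a side note, Directory is a DirectoryInfo object, so the exported column will contain the directory's path as a string. If you would rather name the column Path, as in your original attempt, a calculated property works too; a sketch omitting the $CurrentDate and Chosen entries for brevity:
Get-ChildItem $dir1 -Recurse -Include *ending.csv |
    Sort-Object FullName |
    Select-Object @{ n='Path'; e={ $_.Directory.FullName } },
        FullName, Name, CreationTime, Length |
    Export-Csv -Delimiter ';' -Force -NoTypeInformation $dir2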