Export CSV: file structure with folders as columns - PowerShell

My question is quite similar to one posted here: Export CSV. Folder, subfolder and file into separate column
I have a file and folder structure that may go up to 10 folders deep, and I want to run PowerShell to create a hash table that writes each file into a row, with each of its folders as a separate column and the filename in a dedicated column.
I start off with:
gci -Path C:\test -File -Recurse | Export-Csv C:\temp\out.csv -NoTypeInformation
But this produces the standard table with some of the info I need, with the directory presented as one long string. I'd like an output where each folder and subfolder that houses the file is presented as its own column.
C:\Test\Folder1\Folder2\Folder3\file.txt

to be presented as:

Name        Parent1  Parent2  Parent3  Parent4  Parent5  Parent6  Filename
file.txt    Folder1  Folder2  Folder3                             file.txt
image1.png  Folder1                                               image1.png
Doc1.docx   Folder1  Folder2  Folder3  Folder4  Folder5  Folder6  Doc1.docx
table3.csv  Folder1  Folder2                                      table3.csv
As you can see, some files have just one parent folder while others can be stored several folders deep. I need to keep the columns consistent, as I want to use Power Automate and the File System connector to read the file paths from the Excel table and then parse and create each file in SharePoint, using the parent/folder levels as metadata/columns in the document library.
I took zett42's code from the linked question and modified it.
$allItems = Get-ChildItem C:\Test -File -Recurse | ForEach-Object {
    # Split on directory separator (typically '\' on Windows and '/' on Unix-like OS)
    $FullNameSplit = $_.FullName.Split( [IO.Path]::DirectorySeparatorChar )
    # Create an object that contains the split path and the path depth.
    # This is implicit output that PowerShell captures and adds to $allItems.
    [PSCustomObject] @{
        FullNameSplit = $FullNameSplit
        PathDepth     = $FullNameSplit.Count
        Filename      = $_.Name
    }
}
# Determine the highest column index from the maximum depth of all paths.
# Minus one, because we'll skip the root path component.
$maxColumnIndex = ( $allItems | Measure-Object -Maximum PathDepth ).Maximum - 1
$allRows = foreach( $item in $allItems ) {
    # Create an ordered hashtable
    $row = [ordered]@{}
    # Add all path components to the hashtable. Make sure all rows have the same number of columns.
    foreach( $i in 1..$maxColumnIndex ) {
        $row[ "Filename" ] = $item.Filename
        $row[ "Column$i" ] = if( $i -lt $item.FullNameSplit.Count ) { $item.FullNameSplit[ $i ] } else { $null }
    }
    # Convert the hashtable to an object suitable for output to CSV.
    # This is implicit output that PowerShell captures and adds to $allRows.
    [PSCustomObject] $row
}
I can get the filename to show as a separate column, but I don't want the script to add the filename as the last column.
[Screenshot of the $allRows output]
Thanks

I've answered my own question.
I modified zett42's script, adding a few variables so that only the directory portion of the Get-ChildItem result is split (via Split-Path, rather than splitting FullName), plus a fixed column holding just the filename in the hash table.
$allItems = Get-ChildItem C:\Test -File -Recurse | ForEach-Object {
    # Split on directory separator (typically '\' on Windows and '/' on Unix-like OS)
    # $FullNameSplit = $_.FullName.Split( [IO.Path]::DirectorySeparatorChar )
    $FullNameSplit = Split-Path -Path $_.FullName
    $DirNameSplit = $FullNameSplit.Split( [IO.Path]::DirectorySeparatorChar )
    # Create an object that contains the split path and the path depth.
    # This is implicit output that PowerShell captures and adds to $allItems.
    [PSCustomObject] @{
        #FullNameSplit = $FullNameSplit
        #PathDepth = $FullNameSplit.Count
        DirNameSplit = $DirNameSplit
        PathDepth    = $DirNameSplit.Count
        Filename     = $_.Name
    }
}
# Determine the highest column index from the maximum depth of all paths.
# Minus one, because we'll skip the root path component.
$maxColumnIndex = ( $allItems | Measure-Object -Maximum PathDepth ).Maximum - 1
$allRows = foreach( $item in $allItems ) {
    # Create an ordered hashtable
    $row = [ordered]@{}
    # Add all path components to the hashtable. Make sure all rows have the same number of columns.
    foreach( $i in 1..$maxColumnIndex ) {
        $row[ "Filename" ] = $item.Filename
        #$row[ "Column$i" ] = if( $i -lt $item.FullNameSplit.Count ) { $item.FullNameSplit[ $i ] } else { $null }
        $row[ "Parent$i" ] = if( $i -lt $item.DirNameSplit.Count ) { $item.DirNameSplit[ $i ] } else { $null }
        # $row[ "Column$i" ] = $item.DirNameSplit[$i]
    }
    # Convert the hashtable to an object suitable for output to CSV.
    # This is implicit output that PowerShell captures and adds to $allRows.
    [PSCustomObject] $row
}
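To produce the Excel table for Power Automate, the rows can then be exported as in the original attempt (the output path is just an example):

$allRows | Export-Csv C:\temp\out.csv -NoTypeInformation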

Related

Split property in array into many properties

I use Get-ChildItem to get files from a directory structure.
Get-ChildItem $path -Recurse -Include *.jpg, *.png | Select-Object Directory, BaseName, Extension
I get an array of objects with the properties Directory, BaseName, Extension.
Like:

Directory               BaseName  Extension
C:\dir1\dir2\dir3\dir4  file      txt
I want to break the directory structure into multiple properties inside the same array, with each subdirectory level as its own property.
The end result properties should be (I can remove C:\ earlier in the script):

dir1  dir2  dir3  dir4  BaseName  Extension
dir1  dir2  dir3  dir4  file      txt

I used to export that to CSV and import it back with a delimiter into another array and then rebuild the original array, but I think there must be an easier way!
Here is a possible approach:
$path = 'C:\test'
$maxDepth = 0

Set-Location $path  # Set base path for Resolve-Path -Relative

# Get all files and split their directory paths
$tempResult = Get-ChildItem $path -Recurse -Include *.jpg, *.png | ForEach-Object {
    # Make directory path relative to $path
    $relPath = Resolve-Path $_.Directory -Relative
    # Create an array of directory path components, skipping the first '.' directory
    $dirNames = $relPath -replace '^\.\\' -split '\\|/'
    # Remember maximum directory depth
    $maxDepth = [Math]::Max( $dirNames.Count, $maxDepth )
    # Implicit output that PowerShell adds to $tempResult
    [PSCustomObject]@{
        dirNames = $dirNames
        fileInfo = $_ | Select-Object BaseName, Extension
    }
}

# Process $tempResult to add directory properties and file name properties
$finalResult = $tempResult.ForEach{
    $outProps = [ordered] @{}
    # Add directory properties
    for( $i = 0; $i -lt $maxDepth; ++$i ) {
        $outProps[ "Dir$i" ] = if( $i -lt $_.dirNames.Count ) { $_.dirNames[ $i ] } else { $null }
    }
    # Add all fileInfo properties
    $_.fileInfo.PSObject.Properties.ForEach{ $outProps[ $_.Name ] = $_.Value }
    # Implicit output that PowerShell adds to $finalResult
    [PSCustomObject] $outProps
}

$finalResult  # Output to console
This is done in two passes to ensure all output elements have the same number of directory properties:
1. Iterate over all files and split their directory paths. Determine the maximum directory depth (the number of directory properties).
2. Iterate over the intermediate result to create the desired objects consisting of directory properties and file name properties. This is done by first adding the properties to an ordered hashtable and then converting that hashtable to a PSCustomObject. This is easier and more efficient than using Add-Member.
Test input:

C:\test
    \subdir1
        file1.png
        file2.jpg
    \subdir2
        \subdir3
            file3.jpg

Output:

Dir0    Dir1    BaseName Extension
----    ----    -------- ---------
subdir1         file1    .png
subdir1         file2    .jpg
subdir2 subdir3 file3    .jpg

Export CSV. Folder, subfolder and file into separate column

I created a script that lists all the folders, subfolders and files and exports them to CSV:
$path = "C:\tools"
Get-ChildItem $path -Recurse | Select-Object FullName | Export-Csv -Path "C:\temp\output.csv" -NoTypeInformation
But I would like each folder, subfolder and file in the path to be written into a separate column in the CSV.
Something like this, for c:\tools\test\1.jpg:

Column1  Column2  Column3
tools    test     1.jpg
I will be grateful for any help.
Thank you.
You can split the FullName property using the Split() method. The tricky part is that you need to know the maximum path depth in advance, as the CSV format requires that all rows have the same number of columns (even if some columns are empty).
# Process directory $path recursively
$allItems = Get-ChildItem $path -Recurse | ForEach-Object {
    # Split on directory separator (typically '\' on Windows and '/' on Unix-like OS)
    $FullNameSplit = $_.FullName.Split( [IO.Path]::DirectorySeparatorChar )
    # Create an object that contains the split path and the path depth.
    # This is implicit output that PowerShell captures and adds to $allItems.
    [PSCustomObject] @{
        FullNameSplit = $FullNameSplit
        PathDepth     = $FullNameSplit.Count
    }
}

# Determine the highest column index from the maximum depth of all paths.
# Minus one, because we'll skip the root path component.
$maxColumnIndex = ( $allItems | Measure-Object -Maximum PathDepth ).Maximum - 1

$allRows = foreach( $item in $allItems ) {
    # Create an ordered hashtable
    $row = [ordered]@{}
    # Add all path components to the hashtable. Make sure all rows have the same number of columns.
    foreach( $i in 1..$maxColumnIndex ) {
        $row[ "Column$i" ] = if( $i -lt $item.FullNameSplit.Count ) { $item.FullNameSplit[ $i ] } else { $null }
    }
    # Convert the hashtable to an object suitable for output to CSV.
    # This is implicit output that PowerShell captures and adds to $allRows.
    [PSCustomObject] $row
}

# Finally output to CSV file
$allRows | Export-Csv -Path "C:\temp\output.csv" -NoTypeInformation
Notes:
The syntax Select-Object @{ Name = ...; Expression = ... } creates a calculated property.
$allRows = foreach captures and assigns all output of the foreach loop to the variable $allRows, which will be an array if the loop outputs more than one object. This works with most other control statements as well, e.g. if and switch.
Within the loop I could have created a [PSCustomObject] directly (and used Add-Member to add properties to it) instead of first creating a hashtable and then converting it to [PSCustomObject]. The chosen way should be faster, as no additional overhead for calling cmdlets is required (see the sketch below).
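For illustration, a minimal sketch of both approaches (the property names are just examples):

# Hashtable first, one conversion at the end (the approach used above):
$row = [ordered]@{}
$row[ "Column1" ] = 'a'
$row[ "Column2" ] = 'b'
$obj = [PSCustomObject] $row

# Add-Member instead: one cmdlet call per property, noticeably slower in tight loops.
$obj2 = [PSCustomObject]@{}
$obj2 | Add-Member -NotePropertyName Column1 -NotePropertyValue 'a'
$obj2 | Add-Member -NotePropertyName Column2 -NotePropertyValue 'b'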
While a file whose rows contain a variable number of items is not actually a CSV file, you can roll your own, and Microsoft Excel can read it.
=== Get-DirCsv.ps1
Get-ChildItem -File |
    ForEach-Object {
        $NameParts = $_.FullName -split '\\'
        $QuotedParts = [System.Collections.ArrayList]::new()
        foreach ($NamePart in $NameParts) {
            $QuotedParts.Add('"' + $NamePart + '"') | Out-Null
        }
        Write-Output $($QuotedParts -join ',')
    }
Use this to capture the output to a file with:
.\Get-DirCsv.ps1 | Out-File -FilePath '.\dir.csv' -Encoding ascii

Split a large csv file into multiple csv files according to the size in powershell

I have a large CSV file and I want to split it with respect to size, with the header repeated in every file.
For example, I have a 1.6MB file and the child files shouldn't be more than 512KB each, so the parent file should produce 4 child files.
I tried the simple program below, but the child files come out blank.
function csvSplitter {
    $csvFile = "D:\Test\PTest\Dummy.csv";
    $split = 10;
    $content = Import-Csv $csvFile;
    $start = 1;
    $end = 0;
    $records_per_file = [int][Math]::Ceiling($content.Count / $split);
    for($i = 1; $i -le $split; $i++) {
        $end += $records_per_file;
        $content | Where-Object {[int]$_.Id -ge $start -and [int]$_.Id -le $end} |
            Export-Csv -Path "D:\Test\PTest\Destination\file$i.csv" -NoTypeInformation;
        $start = $end + 1;
    }
}
csvSplitter
The logic for the size of the file is yet to be written.
I tried to attach both files, but I guess there is no option to attach files here.
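For the missing size logic, here is a minimal sketch that splits on line boundaries once a part reaches roughly 512KB; the paths and the size estimate (characters + CRLF, so multi-byte characters are under-counted) are assumptions:

$csvFile  = 'D:\Test\PTest\Dummy.csv'
$maxBytes = 512KB
$lines    = Get-Content $csvFile
$header   = $lines[0]
$part     = 1
$buffer   = [System.Collections.Generic.List[string]]::new()
$size     = $header.Length + 2   # rough byte count: characters + CRLF
foreach ($line in ($lines | Select-Object -Skip 1)) {
    if ($buffer.Count -gt 0 -and ($size + $line.Length + 2) -gt $maxBytes) {
        # Flush the current batch, header included, and start a new part file.
        @($header) + $buffer | Set-Content "D:\Test\PTest\Destination\file$part.csv"
        $part++
        $buffer.Clear()
        $size = $header.Length + 2
    }
    $buffer.Add($line)
    $size += $line.Length + 2
}
if ($buffer.Count -gt 0) {
    @($header) + $buffer | Set-Content "D:\Test\PTest\Destination\file$part.csv"
}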
This takes a slightly different path to a solution. [grin]
It ...
- loads the CSV as a plain text file
- saves the 1st line as a header line
- calcs the batch size from the total line count & the batch count
- uses array index ranges to grab the lines for each batch
- combines the header line with the current batch of lines
- writes that out to a text file
The reason for such a roundabout method is to save RAM. One drawback to loading the file as a CSV is the sheer amount of RAM needed; just loading the lines of text requires noticeably less RAM.
$SourceDir = $env:TEMP
$InFileName = 'LargeFile.csv'
$InFullFileName = Join-Path -Path $SourceDir -ChildPath $InFileName

$BatchCount = 4

$DestDir = $env:TEMP
$OutFileName = 'LF_Batch_.csv'
$OutFullFileName = Join-Path -Path $DestDir -ChildPath $OutFileName

#region >>> build file to work with
# remove this region when you are ready to do this with your test data OR to do this with real data
if (-not (Test-Path -LiteralPath $InFullFileName))
{
    Get-ChildItem -LiteralPath $env:APPDATA -Recurse -File |
        Sort-Object -Property Name |
        Select-Object Name, Length, LastWriteTime, Directory |
        Export-Csv -LiteralPath $InFullFileName -NoTypeInformation
}
#endregion >>> build file to work with

$CsvAsText = Get-Content -LiteralPath $InFullFileName

[array]$HeaderLine = $CsvAsText[0]
$BatchSize = [int]($CsvAsText.Count / $BatchCount) + 1

$StartLine = 1
foreach ($B_Index in 1..$BatchCount)
{
    if ($B_Index -ne 1)
    {
        $StartLine = $StartLine + $BatchSize + 1
    }
    $CurrentOutFullFileName = $OutFullFileName.Replace('_.', ('_{0}.' -f $B_Index))
    $HeaderLine + $CsvAsText[$StartLine..($StartLine + $BatchSize)] |
        Set-Content -LiteralPath $CurrentOutFullFileName
}
There is no output on screen, but I got 4 files named LF_Batch_1.csv through LF_Batch_4.csv that contained the four parts of the source file as expected. The last file has a slightly smaller number of rows, but that is what happens when the row count is not evenly divisible by the batch count. [grin]
Try this:
Add-Type -AssemblyName System.Collections

function Split-Csv {
    param (
        [string]$filePath,
        [int]$partsNum
    )
    # Use generic lists for import/export
    [System.Collections.Generic.List[object]]$contentImport = @()
    [System.Collections.Generic.List[object]]$contentExport = @()
    # import csv-file
    $contentImport = Import-Csv $filePath
    # how many lines per export file
    $linesPerFile = [Math]::Max( [int]($contentImport.Count / $partsNum), 1 )
    # start pointer for source list
    $startPointer = 0
    # counter for file name
    $counter = 1
    # main loop
    while( $startPointer -lt $contentImport.Count ) {
        # clear export list
        [void]$contentExport.Clear()
        # determine the from-to range of the source list to export
        $endPointer = [Math]::Min( $startPointer + $linesPerFile, $contentImport.Count )
        # copy the lines to export into the export list
        [void]$contentExport.AddRange( $contentImport.GetRange( $startPointer, $endPointer - $startPointer ) )
        # export
        $contentExport | Export-Csv -Path ($filePath.Replace('.', $counter.ToString() + '.' ) ) -NoTypeInformation -Force
        # move pointer
        $startPointer = $endPointer
        # increase counter for filename
        $counter++
    }
}

Split-Csv -filePath 'test.csv' -partsNum 7
Try running this script:
$sw = new-object System.Diagnostics.Stopwatch
$sw.Start()
$FilePath = $HOME +'\Documents\Projects\ADOPT\Data8277.csv'
$SplitDir = $HOME +'\Documents\Projects\ADOPT\Split\'
CSV-FileSplitter -Path $FilePath -PartSizeBytes 35MB -SplitDir $SplitDir #-Verbose
$sw.Stop()
Write-Host "Split complete in " $sw.Elapsed.TotalSeconds "seconds"
I created this for files larger than 50 GB.
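Note that CSV-FileSplitter itself is not defined in this answer. For reference, a minimal sketch of what such a function might look like, inferred only from the call above; it streams line by line so huge files never have to fit in RAM, and the part-file naming and the byte estimate are assumptions:

function CSV-FileSplitter {
    param (
        [string]$Path,
        [long]$PartSizeBytes,
        [string]$SplitDir
    )
    $reader = [System.IO.StreamReader]::new($Path)
    try {
        $writer = $null
        $header = $reader.ReadLine()
        $part = 0
        $bytesWritten = 0
        while ($null -ne ($line = $reader.ReadLine())) {
            # Open a new part file when none is open or the size budget is spent.
            if (-not $writer -or $bytesWritten -ge $PartSizeBytes) {
                if ($writer) { $writer.Dispose() }
                $part++
                $writer = [System.IO.StreamWriter]::new((Join-Path $SplitDir "part$part.csv"))
                $writer.WriteLine($header)
                $bytesWritten = $header.Length + 2   # rough count: characters + CRLF
            }
            $writer.WriteLine($line)
            $bytesWritten += $line.Length + 2
        }
    }
    finally {
        if ($writer) { $writer.Dispose() }
        $reader.Dispose()
    }
}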

How to output jagged noteproperty values using Export-csv powershell

I wrote a function that parses out the folder names for files and stores them as note properties, one per folder encountered: Directory0, Directory1, Directory2, and so on. So for each file the set of directory properties has a different length, depending on where the file sits in the directory structure.
The problem I am facing is how to output these jagged directory results in column format using Export-Csv, combined with other static property values. Since the note property counts vary from file to file (jagged), I am struggling to figure out the logic to output the directories as CSV columns.
The output should have headers like the following:
Example File1:
Directory 1, Directory 2, Directory 3, Other properties
Directory Value 1, Directory Value 2, Directory Value 3
File2:
Directory 1, Directory 2, Directory 3, Directory 4
Directory Value 1, Directory Value 2, Directory Value 3, Directory Value 4
function Get-Folder ($Files)
{
    foreach ($file in $Files)
    {
        $TotalDirLvl = ($file.FullName.Split('\').Count) - 1
        $x = 0
        while ($x -lt $TotalDirLvl) {
            $file | Add-Member -NotePropertyName Directory$x -NotePropertyValue $file.FullName.Split('\')[$x]
            $x++
        }
    }
    return $Files
}
You need to know how many directories will be in the tree that you export, so that you can create the proper number of properties on the objects you pass to Export-Csv; otherwise your CSV file will not have properties "to the right" of the first row, i.e. file 2 in your example would have Directory 1..3, but not 4. The way I did this is by looping over the files twice: the first pass gets the max depth you will traverse, and the second constructs a PSObject per file and adds it to an array that is written to a CSV file at the end.
For files that have fewer path segments than the maximum, you need to specify an empty or null value for the segments that are not filled. Also, if you want to include other properties, you should probably add them to the left of this directory tree. If you do not want a property for a given file, you still need to pass a null/blank value into your object's property, or else the columns will not line up.
The script below creates a CSV file (screenshot omitted) from a directory structure like the one you explain in your post.
$files = Get-ChildItem -Path "$env:temp\SO" -Recurse | Where-Object { ! $_.PSIsContainer }
$outObjs = @()

$maxDepth = 0
foreach ($file in $files) {
    $TotalDirLvl = ($file.FullName.Split('\').Count) - 1
    if ($TotalDirLvl -gt $maxDepth) {
        $maxDepth = $TotalDirLvl
    }
}

foreach ($file in $files) {
    $outObj = New-Object PSObject
    $fileDepth = ($file.FullName.Split('\').Count) - 1
    $outObj | Add-Member -NotePropertyName DirectoryDepth -NotePropertyValue $fileDepth
    $x = 0
    # Add other properties for each file here, to the left of your directory tree
    while ($x -le $maxDepth) {
        if ($x -gt $fileDepth) {
            $value = ''
        }
        else {
            $value = $file.FullName.Split('\')[$x]
        }
        $outObj | Add-Member -NotePropertyName "Directory$x" -NotePropertyValue $value
        $x++
    }
    $outObjs += $outObj
}

$outObjs | Export-Csv -Path "$env:temp\SO\test.csv" -NoTypeInformation -Force

Powershell - Pass list of directory paths to FOR Loop - Output results to CSV

The code below works. Rather than specify the path manually, I would like to pass in a list of values from a CSV file (E:\Data\paths.csv) and then output an individual CSV file for each path processed, listing the folders down to $Depth for that directory.
$StartLevel = 0  # 0 = include base folder, 1 = sub-folders only, 2 = start at 2nd level
$Depth = 10      # How many levels deep to scan
$Path = "E:\Data\MyPath"  # starting path
For ($i = $StartLevel; $i -le $Depth; $i++) {
    $Levels = "\*" * $i
    (Resolve-Path $Path$Levels).ProviderPath | Get-Item | Where PsIsContainer |
        Select FullName
}
Thanks,
Phil
Get-Help Import-Csv will help you in this regard.
Regards,
kvprasoon
I assume you want something like the following:
# Create sample input CSV
@"
Path,StartLevel,Depth
"E:\Data\MyPath",0,10
"@ > PathSpecs.csv

# Loop over each input CSV row (object with properties
# .Path, .StartLevel, and .Depth)
foreach ($pathSpec in Import-Csv PathSpecs.csv) {
    & { For ([int] $i = $pathSpec.StartLevel; $i -le $pathSpec.Depth; $i++) {
        $Levels = "\*" * $i
        Resolve-Path "$($pathSpec.Path)$Levels" | Get-Item | Where PsIsContainer |
            Select FullName
    } } | # Export paths to a CSV file named "Path-<input-path-with-punct-stripped>.csv"
        Export-Csv -NoTypeInformation "Path-$($pathSpec.Path -replace '[^\p{L}0-9]+', '_').csv"
}
Note that your approach to breadth-first enumeration of subdirectories in the subtree works, but will be quite slow with large subtrees; see the sketch below for a faster alternative.
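A minimal sketch of that alternative, using the -Depth parameter that Get-ChildItem supports alongside -Recurse since Windows PowerShell 5.0 (it does not reproduce the $StartLevel semantics, which you would have to add):

# Depth-limited enumeration instead of expanding "\*" wildcards level by level.
foreach ($pathSpec in Import-Csv PathSpecs.csv) {
    Get-ChildItem -LiteralPath $pathSpec.Path -Directory -Recurse -Depth ([int]$pathSpec.Depth) |
        Select-Object FullName |
        Export-Csv -NoTypeInformation "Path-$($pathSpec.Path -replace '[^\p{L}0-9]+', '_').csv"
}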