PowerShell script - Loop list of folders to get file count and sum of files for each folder listed - powershell

I want to get the file count & the sum of files for each individual folder listed in DGFoldersTEST.txt.
However, I’m currently getting the sum of all 3 folders.
And now I'm getting 'Index was outside the bounds of the array' error message.
$DGfolderlist = Get-Content -Path C:\DiskGroupsFolders\DGFoldersTEST.txt
$FolderSize = @()
$int = 0
Foreach ($DGfolder in $DGfolderlist)
{
    $FolderSize[$int] =
        Get-ChildItem -Path $DGfolderlist -File -Recurse -Force -ErrorAction SilentlyContinue |
        Measure-Object -Property Length -Sum |
        Select-Object -Property Count, @{Name='Size(MB)'; Expression={('{0:N2}' -f($_.Sum/1mb))}}
    Write-Host $DGfolder
    Write-Host $FolderSize[$int]
    $int++
}

To explain the error: you're trying to assign a value at index $int of your $FolderSize array, but an array created with the array subexpression operator @(..) is initialized with a Length of 0, so that index does not exist. Compare this with an array created with a specific Length:
$arr = @()
$arr.Length # 0
$arr[0] = 'hello' # Error
$arr = [array]::CreateInstance([object], 10)
$arr.Length # 10
$arr[0] = 'hello' # all good
As for how to approach your code: since you don't know in advance how many items your loop will output, you can't initialize the array with a specific Length. PowerShell offers the += operator for appending elements, but it is an expensive operation. Arrays have a fixed size, so each += has to create a new array and copy every existing element into it. See this answer for more information and better approaches.
You can simply let PowerShell capture the output of your loop by assigning the variable to the loop itself:
$FolderSize = foreach ($DGfolder in $DGfolderlist) {
    Get-ChildItem -Path $DGfolder -File -Recurse -Force -ErrorAction SilentlyContinue |
        Measure-Object -Property Length -Sum |
        Select-Object @(
            @{ Name = 'Folder'; Expression = { $DGfolder }}
            'Count'
            @{ Name = 'Size(MB)'; Expression = { ($_.Sum / 1mb).ToString('N2') }}
        )
}
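If you would rather add items explicitly than capture the loop output, a generic list is one commonly used alternative to +=. The following is only a sketch: the temporary folders, the file names, and the SizeMB property name are made up for the demonstration and are not part of the original question.

```powershell
# Hypothetical demo data: two temporary folders created only for this sketch.
$root = Join-Path ([System.IO.Path]::GetTempPath()) ('DGDemo_' + [guid]::NewGuid().ToString('N'))
$folders = foreach ($name in 'FolderA', 'FolderB') {
    (New-Item -ItemType Directory -Path (Join-Path $root $name) -Force).FullName
}
Set-Content -Path (Join-Path $folders[0] 'one.txt')   -Value 'abc'
Set-Content -Path (Join-Path $folders[0] 'two.txt')   -Value 'abcdef'
Set-Content -Path (Join-Path $folders[1] 'three.txt') -Value 'xyz'

# A generic list grows in place; += on an array copies the whole array each time.
$report = [System.Collections.Generic.List[object]]::new()
foreach ($folder in $folders) {
    $stats = Get-ChildItem -Path $folder -File -Recurse -Force -ErrorAction SilentlyContinue |
        Measure-Object -Property Length -Sum
    $report.Add([pscustomobject]@{
        Folder = $folder
        Count  = $stats.Count
        SizeMB = [math]::Round($stats.Sum / 1MB, 2)
    })
}

$report | Format-Table -AutoSize

# Clean up the demo folders.
Remove-Item -Path $root -Recurse -Force
```

Each entry in $report describes exactly one folder, because Get-ChildItem receives the loop variable rather than the whole list.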

Related

Why Powershell outputting this table?

I'm a PowerShell noob. How come the following code also outputs the table at the end, after the "File to Delete" loop?
$stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
# use partial hashes for files larger than 100KB:
# see documentation at: https://powershell.one/tricks/filesystem/finding-duplicate-files#finding-duplicate-files-fast
$result = Find-PSOneDuplicateFileFast -Path '\\READYNAS\Pictures\2020\10' #-Debug -Verbose
$stopwatch.Stop()
# output duplicates
$allFilesToDelete = @(foreach($key in $result.Keys)
{
#filters out the LAST item in the array of duplicates, because a file name of xxxx (0) comes before one without the (0)
$filesToDelete = $result[$key][0..($result[$key].count - 2)]
#add each remaining duplicate file to table
foreach($file in $filesToDelete)
{
$file |
Add-Member -MemberType NoteProperty -Name Hash -Value $key -PassThru |
Select-Object Hash, Length, FullName
}
}
)
$allFilesToDelete | Format-Table -GroupBy Hash -Property FullName | Out-String | Write-Host
$allFilesToDelete | Sort-Object -Property FullName -OutVariable allFilesToDelete
$allFilesToDelete | Format-Table -Property FullName | Out-String | Write-Host
$confirmation = Read-Host "Are you Sure You Want To Delete $($allFilesToDelete.count) files? (y/n)"
if ($confirmation -eq 'y') {
$i = 0
foreach($fileToDelete in $allFilesToDelete)
{
$i++
Write-Host "$i File to Delete: $($fileToDelete.FullName)"
#Remove-Item $file.FullName -Force -Verbose 4>&1 | % { $x = $_; Write-Host "Deleted file ($i) $x" }
}
} else {
Write-Host "User chose NOT to delete files!"
}
$allFilesToDelete | Sort-Object -Property FullName -OutVariable allFilesToDelete produces output (the input objects in the requested sort order), and since you're not capturing or redirecting it, it prints to the host (display, terminal) by default.
It seems your intent is to sort the objects stored in $allFilesToDelete, which your command does, but it also produces output (the common -OutVariable parameter does not affect a cmdlet's output behavior, it simply also stores the output objects in the given variable); you could simply assign the output back to the original variable, which wouldn't produce any output:
$allFilesToDelete = $allFilesToDelete | Sort-Object -Property FullName
In cases where actively suppressing (discarding) output is needed, $null = ... is the simplest solution.
See this answer for details and alternatives.
Also see this blog post, which you found yourself.
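As a small illustration of the $null = ... approach, here is a sketch that keeps -OutVariable but discards the pipeline output. The two sample objects and the variable name sorted are hypothetical stand-ins for the question's file list.

```powershell
# Hypothetical sample objects standing in for the question's file list.
$items = @(
    [pscustomobject]@{ FullName = 'C:\demo\b.jpg' }
    [pscustomobject]@{ FullName = 'C:\demo\a.jpg' }
)

# -OutVariable stores the sorted objects in $sorted; assigning the pipeline to
# $null discards the copy that would otherwise go to the success output stream.
$null = $items | Sort-Object -Property FullName -OutVariable sorted

# Nothing was printed by the line above; the sorted objects are in $sorted.
$sorted.FullName
```

Assigning the sorted result back to the original variable, as shown above, is usually simpler. The $null form is mainly useful when a command has a side effect you want and output you don't.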
Because the output resulted in implicitly Format-Table-formatted display representations (for custom objects that have no predefined formatting data), the subsequent Read-Host and Write-Host statements - surprisingly - printed first.
The reason is that this implicit use of Format-Table results in asynchronous behavior: output objects are collected for 300 milliseconds in an effort to determine suitable column widths, and during that period output to other output streams may print.
The - suboptimal - workaround is to force pipeline output to print synchronously to the host (display), using Out-Host.
See this answer for details.
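A minimal sketch of the Out-Host workaround, using made-up sample objects: the table is rendered before the following Write-Host line runs, at the cost of the objects no longer being available as pipeline output.

```powershell
# Hypothetical sample objects without predefined formatting data.
$rows = @(
    [pscustomobject]@{ Name = 'first';  Value = 1 }
    [pscustomobject]@{ Name = 'second'; Value = 2 }
)

# Out-Host sends the formatted table straight to the host, synchronously,
# so it cannot be overtaken by later Write-Host or Read-Host calls.
$rows | Out-Host

Write-Host 'This line prints after the table.'
```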

PowerShell: Find unique values from multiple CSV files

Let's say that I have several CSV files and I need to check a specific column and find values that exist in one file but not in any of the others. I'm having a bit of trouble coming up with the best way to go about it, as I wanted to use Compare-Object and, if possible, keep all columns and not just the one that contains the values I'm checking.
So I do indeed have several CSV files and they all have a Service Code column, and I'm trying to create a list for each Service Code that only appears in one file. So I would have "Service Codes only in CSV1", "Service Codes only in CSV2", etc.
Based on some testing and a semi-related question, I've come up with a workable solution, but with all of the nesting and For loops, I'm wondering if there is a more elegant method out there.
Here's what I do have:
$files = Get-ChildItem -LiteralPath "C:\temp\ItemCompare" -Include "*.csv"
$HashList = [System.Collections.Generic.List[System.Collections.Generic.HashSet[String]]]::New()
For ($i = 0; $i -lt $files.Count; $i++){
$TempHashSet = [System.Collections.Generic.HashSet[String]]::New([String[]](Import-Csv $files[$i])."Service Code")
$HashList.Add($TempHashSet)
}
$FinalHashList = [System.Collections.Generic.List[System.Collections.Generic.HashSet[String]]]::New()
For ($i = 0; $i -lt $HashList.Count; $i++){
$UniqueHS = [System.Collections.Generic.HashSet[String]]::New($HashList[$i])
For ($j = 0; $j -lt $HashList.Count; $j++){
#Skip the check when the HashSet would be compared to itself
If ($j -eq $i){Continue}
$UniqueHS.ExceptWith($HashList[$j])
}
$FinalHashList.Add($UniqueHS)
}
It seems a bit messy to me using so many different .NET references, and I know I could make it cleaner with a tag to say using namespace System.Collections.Generic, but I'm wondering if there is a way to make it work using Compare-Object which was my first attempt, or even just a simpler/more efficient method to filter each file.
I believe I found an "elegant" solution based on Group-Object, using only a single pipeline:
# Import all CSV files.
Get-ChildItem $PSScriptRoot\csv\*.csv -File -PipelineVariable file | Import-Csv |
# Add new column "FileName" to distinguish the files.
Select-Object *, @{ label = 'FileName'; expression = { $file.Name } } |
# Group by ServiceCode to get a list of files per distinct value.
Group-Object ServiceCode |
# Filter by ServiceCode values that exist only in a single file.
# Sort-Object -Unique takes care of possible duplicates within a single file.
Where-Object { ( $_.Group.FileName | Sort-Object -Unique ).Count -eq 1 } |
# Expand the groups so we get the original object structure back.
ForEach-Object Group |
# Format-Table requires sorting by FileName, for -GroupBy.
Sort-Object FileName |
# Finally pretty-print the result.
Format-Table -Property ServiceCode, Foo -GroupBy FileName
Test Input
a.csv:
ServiceCode,Foo
1,fop
2,fip
3,fap
b.csv:
ServiceCode,Foo
6,bar
6,baz
3,bam
2,bir
4,biz
c.csv:
ServiceCode,Foo
2,bla
5,blu
1,bli
Output
   FileName: b.csv

ServiceCode Foo
----------- ---
4           biz
6           bar
6           baz

   FileName: c.csv

ServiceCode Foo
----------- ---
5           blu
Looks correct to me. The values 1, 2 and 3 are duplicated between multiple files, so they are excluded. 4, 5 and 6 exist only in single files, while 6 is a duplicate value only within a single file.
Understanding the code
Maybe it is easier to understand how this code works, by looking at the intermediate output of the pipeline produced by the Group-Object line:
Count Name Group
----- ---- -----
    2 1    {@{ServiceCode=1; Foo=fop; FileName=a.csv}, @{ServiceCode=1; Foo=bli; FileName=c.csv}}
    3 2    {@{ServiceCode=2; Foo=fip; FileName=a.csv}, @{ServiceCode=2; Foo=bir; FileName=b.csv}, @{ServiceCode=2; Foo=bla; FileName=c.csv}}
    2 3    {@{ServiceCode=3; Foo=fap; FileName=a.csv}, @{ServiceCode=3; Foo=bam; FileName=b.csv}}
    1 4    {@{ServiceCode=4; Foo=biz; FileName=b.csv}}
    1 5    {@{ServiceCode=5; Foo=blu; FileName=c.csv}}
    2 6    {@{ServiceCode=6; Foo=bar; FileName=b.csv}, @{ServiceCode=6; Foo=baz; FileName=b.csv}}
Here the Name contains the unique ServiceCode values, while Group "links" the data to the files.
From here it should already be clear how to find values that exist only in single files. If duplicate ServiceCode values within a single file weren't allowed, we could even simplify the filter to Where-Object Count -eq 1. Since it was stated that dupes within single files may exist, we need the Sort-Object -Unique to count multiple equal file names within a group as only one.
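To make the difference between the two filters concrete, here is a sketch on a few in-memory rows (a hypothetical subset of the test input, with only the columns the filter needs):

```powershell
# Hypothetical in-memory rows: code 1 is in two files, code 5 in one file,
# code 6 appears twice but only within b.csv.
$rows = @(
    [pscustomobject]@{ ServiceCode = '1'; FileName = 'a.csv' }
    [pscustomobject]@{ ServiceCode = '6'; FileName = 'b.csv' }
    [pscustomobject]@{ ServiceCode = '6'; FileName = 'b.csv' }
    [pscustomobject]@{ ServiceCode = '1'; FileName = 'c.csv' }
    [pscustomobject]@{ ServiceCode = '5'; FileName = 'c.csv' }
)
$groups = $rows | Group-Object ServiceCode

# Simplified filter: only correct if a code never repeats within one file.
$byCount = @($groups | Where-Object Count -eq 1)

# Filter from the answer: counts distinct file names per code.
$byFile = @($groups | Where-Object { @($_.Group.FileName | Sort-Object -Unique).Count -eq 1 })

'Count -eq 1 keeps    : ' + ($byCount.Name -join ', ')
'Distinct files keeps : ' + ($byFile.Name -join ', ')
```

The simplified filter drops code 6 because its group has two rows, even though both rows come from the same file. The distinct-file filter keeps it.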
It is not completely clear what you expect as output.
If you just want the ServiceCodes that intersect, then this is actually a duplicate of:
Comparing two arrays & get the values which are not common
Union and Intersection in PowerShell?
But taking that you actually want the related object and files, you might use this approach:
$HashTable = @{}
ForEach ($File in Get-ChildItem .\*.csv) {
    ForEach ($Object in (Import-Csv $File)) {
        $HashTable[$Object.ServiceCode] = $Object |Select-Object *,
            @{ n='File'; e={ $File.Name } },
            @{ n='Count'; e={ $HashTable[$Object.ServiceCode].Count + 1 } }
    }
}
$HashTable.Values |Where-Object Count -eq 1
Here is my take on this fun exercise. I'm using a similar approach to yours with the HashSet, but adding [System.StringComparer]::OrdinalIgnoreCase to leverage the .Contains(..) method:
using namespace System.Collections.Generic
# Generate Random CSVs:
$charset = 'abABcdCD0123xXyYzZ'
$ran = [random]::new()
$csvs = @{}
foreach($i in 1..50) # Create 50 CSVs for testing
{
$csvs["csv$i"] = foreach($z in 1..50) # With 50 Rows
{
$index = (0..2).ForEach({ $ran.Next($charset.Length) })
[pscustomobject]@{
ServiceCode = [string]::new($charset[$index])
Data = $ran.Next()
}
}
}
# Get Unique 'ServiceCode' per CSV:
$result = @{}
foreach($key in $csvs.Keys)
{
# Get all unique `ServiceCode` from the other CSVs
$tempHash = [HashSet[string]]::new(
[string[]]($csvs[$csvs.Keys -ne $key].ServiceCode),
[System.StringComparer]::OrdinalIgnoreCase
)
# Filter the unique `ServiceCode`
$result[$key] = foreach($line in $csvs[$key])
{
if(-not $tempHash.Contains($line.ServiceCode))
{
$line
}
}
}
# Test if the code worked,
# If something is returned from here means it didn't work
foreach($key in $result.Keys)
{
$tmp = $result[$result.Keys -ne $key].ServiceCode
foreach($val in $result[$key])
{
if($val.ServiceCode -in $tmp)
{
$val
}
}
}
I was able to get the unique items as follows:
# Get all items of CSVs in a single variable with adding the file name at the last column
$CSVs = Get-ChildItem "C:\temp\ItemCompare\*.csv" | ForEach-Object {
$CSV = Import-CSV -Path $_.FullName
$FileName = $_.Name
$CSV | Select-Object *,@{N='Filename';E={$FileName}}
}
Foreach($line in $CSVs){
$ServiceCode = $line.ServiceCode
$file = $line.Filename
if (!($CSVs | where {$_.ServiceCode -eq $ServiceCode -and $_.filename -ne $file})){
$line
}
}

delete objects from array when their path property equals object in another array

I have an $array of PSCustomObjects which contain a path,days,filter and recurse property
I Test-Path the path of each PSCustomObject and if it's false, I save only the path in another variable like $failpath
Now I want to remove all Objects inside $array when the path is inside $failpath
I tried things like the .remove() method for the $array, but that doesn't work and gave me this error (example pic from web): https://i0.wp.com/www.sapien.com/blog/wp-content/uploads/2014/11/image8.png
So I tried creating a new array, but it's giving me a hard time because I don't know how to iterate over the failpaths correctly, so that each correct object only gets sent to the new array once (when I tried it, the correct object was there multiple times). I can't show you the code for this because I already edited it too many times and now it's just a mess.
This is what $array and $faultypath look like:
$array = @(
    [pscustomobject]@{
        path = "\\server\daten\Alle Adressen\Dokumente 70"
        filter = "*.pdf"
        days = "90"
        recurse = "false"
    }
    [pscustomobject]@{
        path = "\\server\Tobit\itacom\ERP2UMS"
        filter = "*.fax"
        days = "7"
        recurse = "false"
    }
)
[string[]]$faultypath = @()
$pfade | % { if (!(Test-Path $_.path)) { $faultypath += $_.path } }
How can I subtract everything that is in $faultypath from $array?
For PowerShell 3 or higher
$faultyPath = $pfade | Where-Object { -not (Test-Path $_.Path) } | ForEach-Object Path
$array | Where-Object Path -notin $faultyPath
For PowerShell 2 or lower
$faultyPath = $pfade | Where-Object { -not (Test-Path $_.Path) } | ForEach-Object { $_.Path }
$array | Where-Object { $faultyPath -notcontains $_.Path }
This is potentially an expensive array comparison if both sets are large. In that case dictionaries or hashtables will provide better performance for the comparison.
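One possible shape for the faster lookup mentioned above is a HashSet of the faulty paths. This is a sketch under assumptions: the sample paths are invented, and the case-insensitive comparer is my choice to mirror how Windows paths are usually compared, not something the original answer specifies.

```powershell
# Hypothetical sample data in the shape used in the question.
$array = @(
    [pscustomobject]@{ path = 'C:\demo\keep';    days = '90' }
    [pscustomobject]@{ path = 'C:\demo\missing'; days = '7' }
)
$faultyPath = @('C:\demo\missing')

# Build the set once; each Contains() call is then a hash lookup
# instead of a scan through the whole $faultyPath array.
$lookup = [System.Collections.Generic.HashSet[string]]::new(
    [string[]]$faultyPath,
    [System.StringComparer]::OrdinalIgnoreCase
)

$remaining = @($array | Where-Object { -not $lookup.Contains($_.path) })
$remaining.path
```

For a handful of paths the -notin version is perfectly adequate; the set only pays off when both collections are large.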

How do I return a custom object in Powershell that's formatted as a table?

I'm pretty new to powershell, so I won't be surprised at all if I'm going about this all wrong. I'm trying to create a function that, when executed, prints results formatted as a table. Maybe it would even be possible to pipe those results to another function for further analysis.
Here's what I have so far. This is a simple function that iterates through a list of paths and collects the name of the directory and the number of items in that directory, putting the data in a hashtable, and returning an array of hashtables:
function Check-Paths(){
$paths =
"C:\code\DirA",
"C:\code\DirB"
$dirs = @()
foreach ($path in $paths){
if (Test-Path $path){
$len = (ls -path $path).length
}
else{
$len = 0
}
$dirName = ($path -split "\\")[-1]
$dirInfo = @{DirName = $dirName; NumItems = $len}
$dirs += $dirInfo
}
return $dirs
}
That seems straightforward enough. However, when I go run the command, this is what I get:
PS > Check-Paths
Name                           Value
----                           -----
DirName                        DirA
NumItems                       0
DirName                        DirB
NumItems                       0
What I want is this:
DirName NumItems
------- --------
DirA    0
DirB    0
I could just hack my function to use a write statement, but I think there must be a much better way to do this. Is there a way to get the data formatted as a table, even better if that can be such that it can be piped to another method?
How 'bout using
return new-object psobject -Property $dirs
That would return an object whose properties match the items in the hashtable. Then you can use the built-in PowerShell formatting cmdlets to make it look the way you want. Since you only have 2 properties, it will be formatted as a table by default.
EDIT: Here's how the whole thing would look (After the various suggestions):
function Check-Paths(){
$paths =
"C:\code\DirA",
"C:\code\DirB"
$dirs = @()
foreach ($path in $paths){
if (Test-Path $path){
$len = (ls -path $path).length
}
else{
$len = 0
}
$dirName = ($path -split "\\")[-1]
new-object psobject -property @{DirName = $dirName; NumItems = $len}
}
}
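Because the function now emits objects rather than hashtables, its output can also be piped to other cmdlets, which was the second part of the question. The sketch below uses a hypothetical function name (Get-DirSummary), a -Path parameter, and a made-up missing directory under the temp folder, purely for illustration.

```powershell
# Hypothetical variant of the function that takes the paths as a parameter.
function Get-DirSummary {
    param([string[]]$Path)

    foreach ($p in $Path) {
        # @(...) makes .Count reliable even when a directory holds 0 or 1 items.
        $count = if (Test-Path -LiteralPath $p) { @(Get-ChildItem -LiteralPath $p).Count } else { 0 }

        [pscustomobject]@{
            DirName  = Split-Path -Path $p -Leaf
            NumItems = $count
        }
    }
}

# One path that exists (the temp folder) and one that is assumed not to exist.
$existing = [System.IO.Path]::GetTempPath()
$missing  = Join-Path $existing 'no-such-dir-for-demo'

$summary = @(Get-DirSummary -Path $existing, $missing)

# The objects flow through the pipeline like any other PowerShell output.
$summary | Where-Object NumItems -eq 0 | Sort-Object DirName | Format-Table -AutoSize
```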
Here's a one-liner that will give you the number of children for each folder.
"C:\code\DirA", "C:\code\DirB" | ? {Test-Path $_} | Get-Item | select -property Name, @{ Name="NumOfItems" ; Expression = {$_.GetFileSystemInfos().Count} }
It passes an array of strings to Where-Object to select the paths that exist. Those paths are passed to Get-Item to get the corresponding DirectoryInfo objects, which are passed to Select-Object to create PSCustomObject objects. Each PSCustomObject has two properties: the name of the directory and the number of children.
If you want the outputted table columns closer together you can pipe the above to: Format-Table -AutoSize
Example usage and output:
dir | ? {$_.PsIsContainer} | select -property Name, @{ Name="NumOfItems" ; Expression = {$_.GetFileSystemInfos().Count} } | Format-Table -AutoSize
Name         NumOfItems
----         ----------
Desktop              12
Favorites             3
My Documents          3
Start Menu            2

Powershell Select-Object from array not working

I am trying to separate the values in an array so I can pass them to another function.
I am using Select-Object within a for loop to go through each line and separate the timestamp and value fields.
However, no matter what I do, the code below only displays the first Select-Object result for each line. The second Select-Object command doesn't seem to work, as my output is a blank line for each of the 6 rows.
Any ideas on how to get both values?
$ReportData = $SystemStats.get_performance_graph_csv_statistics( (,$Query) )
### Allocate a new encoder and turn the byte array into a string
$ASCII = New-Object -TypeName System.Text.ASCIIEncoding
$csvdata = $ASCII.GetString($ReportData[0].statistic_data)
$csv2 = convertFrom-CSV $csvdata
$newarray = $csv2 | Where-Object {$_.utilization -ne "0.0000000000e+00" -and $_.utilization -ne "nan" }
for ( $n = 0; $n -lt $newarray.Length; $n++)
{
$nTime = $newarray[$n]
$nUtil = $newarray[$n]
$util = $nUtil | select-object Utilization
$util
$tstamp = $nTime | select-object timestamp
$tstamp
}
Let me slightly modify the processing code, if it will help.
$csv2 |
Where-Object {$_.utilization -ne "0.0000000000e+00" -and $_.utilization -ne "nan" } |
Select-Object Utilization,TimeStamp
It will produce somewhat different output, but one that should be easier to work with.
The results are objects with the properties Utilization and TimeStamp. You can pass them to another function, as you mention.
Generally it is better to use the pipeline instead of for loops: you don't need to care about indexes, and it works with arrays as well as with scalar values.
If my updated code doesn't work: is the TimeStamp property actually filled with a value?
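A likely reason for the blank lines in the original loop (an assumption, since the question does not show its output) is that PowerShell's default table formatting takes its columns from the first object it receives. Later objects that only carry a timestamp property then render as empty rows. If plain values are needed rather than objects, reading the properties directly avoids the issue. The CSV text below is made-up sample data, not the real report.

```powershell
# Hypothetical CSV data in the shape described in the question.
$csv = @'
timestamp,utilization
1600000000,1.5000000000e+01
1600000060,0.0000000000e+00
1600000120,nan
1600000180,2.2500000000e+01
'@ | ConvertFrom-Csv

# Same filter as in the question; @(...) keeps the result an array
# even if only one row survives the filter.
$rows = @($csv | Where-Object { $_.utilization -ne '0.0000000000e+00' -and $_.utilization -ne 'nan' })

foreach ($row in $rows) {
    # Property access yields the plain string values, not wrapper objects.
    $util   = $row.utilization
    $tstamp = $row.timestamp
    '{0} -> {1}' -f $tstamp, $util
}
```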