How can I improve the speed and memory usage of calculating the size of the N largest files? - powershell

I am getting the total number of bytes of the 32 largest files in the folder:
$big32 = Get-ChildItem c:\temp -recurse |
Sort-Object length -descending |
select-object -first 32 |
measure-object -property length -sum
$big32.sum / 1gb
However, it's working very slowly. We have about 10 TB of data in 1.4 million files.

The following implements improvements using only PowerShell cmdlets. Using System.IO.Directory.EnumerateFiles() as a basis, as suggested by this answer, might give another performance improvement, but you should do your own measurements to compare.
(Get-ChildItem c:\temp -Recurse -File).ForEach('Length') |
Sort-Object -Descending -Top 32 |
Measure-Object -Sum
This should reduce memory consumption considerably as it only sorts an array of numbers instead of an array of FileInfo objects. Maybe it's also somewhat faster due to better caching (an array of numbers is stored in a contiguous, cache-friendly block of memory, whereas an array of objects only stores the references in a contiguous way, but the objects themselves can be scattered all around in memory).
Note the use of .ForEach('Length') instead of just .Length, because of member-enumeration ambiguity (.Length on an array returns the element count rather than the files' sizes).
By using the Sort-Object parameter -Top (available in PowerShell 6.1+), we can get rid of the Select-Object cmdlet, further reducing pipeline overhead.
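For reference, a sketch combining both ideas (EnumerateFiles() as the producer, with only the lengths flowing through the pipeline); it is untested here, so measure it against the cmdlet-only version above:
# Stream file paths without materializing FileInfo objects up front,
# then send only the lengths down the pipeline:
[System.IO.Directory]::EnumerateFiles('c:\temp', '*.*', [System.IO.SearchOption]::AllDirectories) |
ForEach-Object { [System.IO.FileInfo]::new($_).Length } |
Sort-Object -Descending -Top 32 |
Measure-Object -Sum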

I can think of some improvements, especially to memory usage, but the following should be considerably faster than Get-ChildItem:
[System.IO.Directory]::EnumerateFiles('c:\temp', '*.*', [System.IO.SearchOption]::AllDirectories) |
Foreach-Object {
[PSCustomObject]@{
filename = $_
length = [System.IO.FileInfo]::New($_).Length
}
} |
Sort-Object length -Descending |
Select-Object -First 32
Edit
I would look at trying to implement an implicit heap to reduce memory usage without hurting performance (it possibly even improves it... to be tested).
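As a hedged sketch of that idea, assuming PowerShell 7.2+ (whose .NET runtime provides the generic PriorityQueue type): keep a min-heap of the 32 largest lengths seen so far, so at most 32 values are ever retained in memory:
$heap = [System.Collections.Generic.PriorityQueue[long, long]]::new()
foreach ($path in [System.IO.Directory]::EnumerateFiles('c:\temp', '*.*', [System.IO.SearchOption]::AllDirectories)) {
    $len = [System.IO.FileInfo]::new($path).Length
    if ($heap.Count -lt 32) {
        $heap.Enqueue($len, $len)   # element and priority are both the length
    }
    elseif ($len -gt $heap.Peek()) {
        $null = $heap.Dequeue()     # evict the smallest of the 32 kept lengths
        $heap.Enqueue($len, $len)
    }
}
# Sum the 32 largest lengths, as in the original question:
$sum = 0L
while ($heap.Count -gt 0) { $sum += $heap.Dequeue() }
$sum / 1gb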
Edit 2
If the filenames are not required, the easiest gain on memory is to not include them in the results.
[System.IO.Directory]::EnumerateFiles('c:\temp', '*.*', [System.IO.SearchOption]::AllDirectories) |
Foreach-Object {
[System.IO.FileInfo]::New($_).Length
} |
Sort-Object -Descending |
Select-Object -First 32

Firstly, if you're going to use Get-ChildItem then you should pass the -File switch parameter so that [System.IO.DirectoryInfo] instances never enter the pipeline.
Secondly, you're not passing the -Force switch parameter to Get-ChildItem, so any hidden files in that directory structure won't be retrieved.
Thirdly, note that your code is retrieving the 32 largest files, not the files with the 32 largest lengths. That is, if files 31, 32, and 33 are all the same length, then file 33 will be arbitrarily excluded from the final count. If that distinction is important to you, you could rewrite your code like this...
$filesByLength = Get-ChildItem -File -Force -Recurse -Path 'C:\Temp\' |
Group-Object -AsHashTable -Property Length
$big32 = $filesByLength.Keys |
Sort-Object -Descending |
Select-Object -First 32 |
ForEach-Object -Process { $filesByLength[$_] } |
Measure-Object -Property Length -Sum
$filesByLength is a [Hashtable] that maps from a length to the file(s) with that length. The Keys property contains all of the unique lengths of all of the retrieved files, so we get the 32 largest keys/lengths and use each one to send all the files of that length down the pipeline.
Most importantly, sorting the retrieved files to find the largest ones is problematic for several reasons:
Sorting cannot start until all of the input data is available, meaning at that point in time all 1.4 million [System.IO.FileInfo] instances will be present in memory.
I'm not sure how Sort-Object buffers the incoming pipeline data, but I imagine it would be some kind of list that doubles in size every time it needs more capacity, leading to further garbage in memory to be cleaned up.
Each of the 1.4 million [System.IO.FileInfo] instances will be accessed a second time to get their Length property, all the while whatever sorting manipulations (depending on what algorithm Sort-Object uses) are occurring, too.
Since we only care about a mere 32 largest files/lengths out of 1.4 million files, what if we only kept track of those 32 instead of all 1.4 million? Consider if we only wanted to find the single largest file...
$largestFileLength = 0
$largestFile = $null
foreach ($file in Get-ChildItem -File -Force -Recurse -Path 'C:\Temp\')
{
# Track the largest length in a separate variable to avoid two comparisons...
# if ($largestFile -eq $null -or $file.Length -gt $largestFile.Length)
if ($file.Length -gt $largestFileLength)
{
$largestFileLength = $file.Length
$largestFile = $file
}
}
Write-Host "The largest file is named ""$($largestFile.Name)"" and has length $largestFileLength."
As opposed to Get-ChildItem ... | Sort-Object -Property Length -Descending | Select-Object -First 1, this has the advantage of only one [FileInfo] object being "in-flight" at a time and the complete set of [System.IO.FileInfo]s being enumerated only once. Now all we need to do is to take the same approach but expanded from 1 file/length "slot" to 32...
$basePath = 'C:\Temp\'
$lengthsToKeep = 32
$includeZeroLengthFiles = $false
$listType = 'System.Collections.Generic.List[System.IO.FileInfo]'
# A SortedDictionary[,] could be used instead to avoid having to fully enumerate the Keys
# property to find the new minimum length, but add/remove/retrieve performance is worse
$dictionaryType = "System.Collections.Generic.Dictionary[System.Int64, $listType]"
# Create a dictionary pre-sized to the maximum number of lengths to keep
$filesByLength = New-Object -TypeName $dictionaryType -ArgumentList $lengthsToKeep
# Cache the minimum length currently being kept
$minimumKeptLength = -1L
Get-ChildItem -File -Force -Recurse -Path $basePath |
ForEach-Object -Process {
if ($_.Length -gt 0 -or $includeZeroLengthFiles)
{
$list = $null
if ($filesByLength.TryGetValue($_.Length, [ref] $list))
{
# The current file's length is already being kept
# Add the current file to the existing list for this length
$list.Add($_)
}
else
{
# The current file's length is not being kept
if ($filesByLength.Count -lt $lengthsToKeep)
{
# There are still available slots to keep more lengths
$list = New-Object -TypeName $listType
# The current file's length will occupy an empty slot of kept lengths
}
elseif ($_.Length -gt $minimumKeptLength)
{
# There are no available slots to keep more lengths
# The current file's length is large enough to keep
# Get the list for the minimum length
$list = $filesByLength[$minimumKeptLength]
# Remove the minimum length to make room for the current length
$filesByLength.Remove($minimumKeptLength) |
Out-Null
# Reuse the list for the now-removed minimum length instead of allocating a new one
$list.Clear()
# The current file's length will occupy the newly-vacated slot of kept lengths
}
else
{
# There are no available slots to keep more lengths
# The current file's length is too small to keep
return
}
$list.Add($_)
$filesByLength.Add($_.Length, $list)
$minimumKeptLength = ($filesByLength.Keys | Measure-Object -Minimum).Minimum
}
}
}
# Unwrap the files in each by-length list
foreach ($list in $filesByLength.Values)
{
foreach ($file in $list)
{
$file
}
}
I went with the approach, described above, of retrieving the files with the 32 largest lengths. A [Dictionary[Int64, List[FileInfo]]] is used to track those 32 largest lengths and the corresponding files with each length. For each input file, we first check if its length is among the largest so far and, if so, add the file to the existing List[FileInfo] for that length. Otherwise, if there's still room in the dictionary, we can unconditionally add the input file and its length; or, if the input file is at least bigger than the smallest tracked length, we can remove that smallest length and add in its place the input file and its length. Once there are no more input files, we output all of the [FileInfo] objects from all of the [List[FileInfo]]s in the [Dictionary[Int64, List[FileInfo]]].
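As a side note, here is a hedged, minimal sketch of the SortedDictionary alternative mentioned in the code comment above; its Keys are maintained in ascending order, so the new minimum is simply the first key rather than a full Measure-Object scan over all keys (the lengths below are illustrative):
$listType = 'System.Collections.Generic.List[System.IO.FileInfo]'
$sorted = New-Object -TypeName "System.Collections.Generic.SortedDictionary[System.Int64, $listType]"
$sorted.Add(100, (New-Object -TypeName $listType))
$sorted.Add(42, (New-Object -TypeName $listType))
$sorted.Add(7, (New-Object -TypeName $listType))
# Keys enumerate in ascending order; grab the first one and stop:
foreach ($key in $sorted.Keys) { $minimumKeptLength = $key; break }
$minimumKeptLength # 7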
I ran this simple benchmarking template...
1..5 |
ForEach-Object -Process {
[GC]::Collect()
return Measure-Command -Expression {
# Code to test
}
} | Measure-Object -Property 'TotalSeconds' -Minimum -Maximum -Average
...on PowerShell 7.2 against my $Env:WinDir directory (325,000 files) with these results:
# Code to test                                                                      Minimum     Maximum     Average      Memory usage*
----------------------------------------------------------------------------------  ----------  ----------  -----------  -------------
Get-ChildItem -File -Force -Recurse -Path $Env:WinDir                                69.7240896  79.727841   72.81731518  +260 MB
Get $Env:WinDir files with 32 largest lengths using -AsHashtable, Sort-Object        82.7488729  83.5245153  83.04068032  +1 GB
Get $Env:WinDir files with 32 largest lengths using dictionary of by-length lists    81.6003697  82.7035483  82.15654538  +235 MB

* As observed in the Task Manager → Details tab → Memory (active private working set) column
I'm a little disappointed that my solution is only about 1% faster than the code using the Keys of a [Hashtable], but perhaps grouping the files using a compiled cmdlet vs. not grouping or sorting them but with more (interpreted) PowerShell code is a wash. Still, the difference in memory usage is significant, though I can't explain why the Get-ChildItem call to simply enumerate all files ended up using a bit more.

Related

PowerShell script - Loop list of folders to get file count and sum of files for each folder listed

I want to get the file count & the sum of files for each individual folder listed in DGFoldersTEST.txt.
However, I’m currently getting the sum of all 3 folders.
And now I'm getting an 'Index was outside the bounds of the array' error message.
$DGfolderlist = Get-Content -Path C:\DiskGroupsFolders\DGFoldersTEST.txt
$FolderSize = @()
$int=0
Foreach ($DGfolder in $DGfolderlist)
{
$FolderSize[$int] =
Get-ChildItem -Path $DGfolderlist -File -Recurse -Force -ErrorAction SilentlyContinue |
Measure-Object -Property Length -Sum |
Select-Object -Property Count, @{Name='Size(MB)'; Expression={('{0:N2}' -f($_.Sum/1mb))}}
Write-Host $DGfolder
Write-Host $FolderSize[$int]
$int++
}
To explain the error: you're trying to assign a value at index $int of your $FolderSize array; however, when arrays are initialized using the array subexpression operator @(), they're initialized with Length 0, hence the error. It's different when you initialize them with a specific Length:
$arr = @()
$arr.Length # 0
$arr[0] = 'hello' # Error
$arr = [array]::CreateInstance([object], 10)
$arr.Length # 10
$arr[0] = 'hello' # all good
As for how to approach your code: since you don't really know how many items will come as output from your loop, initializing an array with a specific Length is not possible. PowerShell offers the += operator for appending elements, but this is a very expensive operation and not a very good idea: because arrays are of a fixed size, each append has to create a brand-new array and copy the old elements over. See this answer for more information and better approaches, such as the List[T] approach sketched below.
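To illustrate the difference between the two growth strategies (the element counts here are arbitrary):
# Appending with += copies the entire array on every iteration: O(n) per append.
$arr = @()
foreach ($i in 1..10000) { $arr += $i }

# A generic List[T] grows in place (amortized O(1) per Add) and is the usual alternative.
$list = [System.Collections.Generic.List[int]]::new()
foreach ($i in 1..10000) { $list.Add($i) }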
You can simply let PowerShell capture the output of your loop by assigning the variable to the loop itself:
$FolderSize = foreach ($DGfolder in $DGfolderlist) {
Get-ChildItem -Path $DGfolder -File -Recurse -Force -ErrorAction SilentlyContinue |
Measure-Object -Property Length -Sum |
Select-Object @(
@{ Name = 'Folder'; Expression = { $DGfolder }}
'Count'
@{ Name = 'Size(MB)'; Expression = { ($_.Sum / 1mb).ToString('N2') }}
)
}

powershell get mac address and output as text [duplicate]

Let's say we have an array of objects $objects. Let's say these objects have a "Name" property.
This is what I want to do
$results = @()
$objects | %{ $results += $_.Name }
This works, but can it be done in a better way?
If I do something like:
$results = $objects | select Name
$results is an array of objects having a Name property. I want $results to contain an array of Names.
Is there a better way?
I think you might be able to use the ExpandProperty parameter of Select-Object.
For example, to get the list of the current directory and just have the Name property displayed, one would do the following:
ls | select -Property Name
This is still returning DirectoryInfo or FileInfo objects. You can always inspect the type coming through the pipeline by piping to Get-Member (alias gm).
ls | select -Property Name | gm
So, to expand the object to be that of the type of property you're looking at, you can do the following:
ls | select -ExpandProperty Name
In your case, you can just do the following to have a variable be an array of strings, where the strings are the Name property:
$objects = ls | select -ExpandProperty Name
As an even easier solution, you could just use:
$results = $objects.Name
Which should fill $results with an array of all the 'Name' property values of the elements in $objects.
To complement the preexisting, helpful answers with guidance on when to use which approach, and with a performance comparison.
Outside of a pipeline[1], use (requires PSv3+):
$objects.Name # returns .Name property values from all objects in $objects
as demonstrated in rageandqq's answer, which is both syntactically simpler and much faster.
Accessing a property at the collection level to get its elements' values as an array (if there are 2 or more elements) is called member-access enumeration and is a PSv3+ feature.
Alternatively, in PSv2, use the foreach statement, whose output you can also assign directly to a variable: $results = foreach ($obj in $objects) { $obj.Name }
If collecting all output from a (pipeline) command in memory first is feasible, you can also combine pipelines with member-access enumeration; e.g.:
(Get-ChildItem -File | Where-Object Length -lt 1gb).Name
Tradeoffs:
Both the input collection and output array must fit into memory as a whole.
If the input collection is itself the result of a command (pipeline) (e.g., (Get-ChildItem).Name), that command must first run to completion before the resulting array's elements can be accessed.
In a pipeline, in case you must pass the results to another command, notably if the original input doesn't fit into memory as a whole, use: $objects | Select-Object -ExpandProperty Name
The need for -ExpandProperty is explained in Scott Saad's answer (you need it to get only the property value).
You get the usual pipeline benefits of the pipeline's streaming behavior, i.e. one-by-one object processing, which typically produces output right away and keeps memory use constant (unless you ultimately collect the results in memory anyway).
Tradeoff:
Use of the pipeline is comparatively slow.
For small input collections (arrays), you probably won't notice the difference, and, especially on the command line, sometimes being able to type the command easily is more important.
Here is an easy-to-type alternative which, however, is the slowest approach; it uses ForEach-Object via its built-in alias, %, with simplified syntax (again, PSv3+); e.g., the following PSv3+ solution is easy to append to an existing command:
$objects | % Name # short for: $objects | ForEach-Object -Process { $_.Name }
Note: Use of the pipeline is not the primary reason this approach is slow, it is the inefficient implementation of the ForEach-Object (and Where-Object) cmdlets, up to at least PowerShell 7.2. This excellent blog post explains the problem; it led to feature request GitHub issue #10982; the following workaround greatly speeds up the operation (only somewhat slower than a foreach statement, and still faster than .ForEach()):
# Speed-optimized version of the above.
# (Use `&` instead of `.` to run in a child scope)
$objects | . { process { $_.Name } }
The PSv4+ .ForEach() array method, more comprehensively discussed in this article, is yet another, well-performing alternative, but note that it requires collecting all input in memory first, just like member-access enumeration:
# By property name (string):
$objects.ForEach('Name')
# By script block (more flexibility; like ForEach-Object)
$objects.ForEach({ $_.Name })
This approach is similar to member-access enumeration, with the same tradeoffs, except that pipeline logic is not applied; it is marginally slower than member-access enumeration, though still noticeably faster than the pipeline.
For extracting a single property value by name (string argument), this solution is on par with member-access enumeration (though the latter is syntactically simpler).
The script-block variant ({ ... }) allows arbitrary transformations; it is a faster - all-in-memory-at-once - alternative to the pipeline-based ForEach-Object cmdlet (%).
Note: The .ForEach() array method, like its .Where() sibling (the in-memory equivalent of Where-Object), always returns a collection (an instance of [System.Collections.ObjectModel.Collection[psobject]]), even if only one output object is produced.
By contrast, member-access enumeration, Select-Object, ForEach-Object and Where-Object return a single output object as-is, without wrapping it in a collection (array).
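A quick demonstration of that difference with a one-element input:
$one = @([pscustomobject] @{ Name = 'a' })
$one.ForEach('Name').GetType().Name                  # Collection`1 - always wrapped in a collection
($one | ForEach-Object { $_.Name }).GetType().Name   # String - the single object as-is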
Comparing the performance of the various approaches
Here are sample timings for the various approaches, based on an input collection of 10,000 objects, averaged across 10 runs; the absolute numbers aren't important and vary based on many factors, but it should give you a sense of relative performance (the timings come from a single-core Windows 10 VM):
Important
The relative performance varies based on whether the input objects are instances of regular .NET types (e.g., as output by Get-ChildItem) or [pscustomobject] instances (e.g., as output by ConvertFrom-Csv).
The reason is that [pscustomobject] properties are dynamically managed by PowerShell, and it can access them more quickly than the regular properties of a (statically defined) regular .NET type. Both scenarios are covered below.
The tests use already-in-memory-in-full collections as input, so as to focus on the pure property extraction performance. With a streaming cmdlet / function call as the input, performance differences will generally be much less pronounced, as the time spent inside that call may account for the majority of the time spent.
For brevity, alias % is used for the ForEach-Object cmdlet.
General conclusions, applicable to both regular .NET type and [pscustomobject] input:
The member-enumeration ($collection.Name) and foreach ($obj in $collection) solutions are by far the fastest, by a factor of 10 or more faster than the fastest pipeline-based solution.
Surprisingly, % Name performs much worse than % { $_.Name } - see this GitHub issue.
PowerShell Core consistently outperforms Windows PowerShell here.
Timings with regular .NET types:
PowerShell Core v7.0.0-preview.3
Factor Command Secs (10-run avg.)
------ ------- ------------------
1.00 $objects.Name 0.005
1.06 foreach($o in $objects) { $o.Name } 0.005
6.25 $objects.ForEach('Name') 0.028
10.22 $objects.ForEach({ $_.Name }) 0.046
17.52 $objects | % { $_.Name } 0.079
30.97 $objects | Select-Object -ExpandProperty Name 0.140
32.76 $objects | % Name 0.148
Windows PowerShell v5.1.18362.145
Factor Command Secs (10-run avg.)
------ ------- ------------------
1.00 $objects.Name 0.012
1.32 foreach($o in $objects) { $o.Name } 0.015
9.07 $objects.ForEach({ $_.Name }) 0.105
10.30 $objects.ForEach('Name') 0.119
12.70 $objects | % { $_.Name } 0.147
27.04 $objects | % Name 0.312
29.70 $objects | Select-Object -ExpandProperty Name 0.343
Conclusions:
In PowerShell Core, .ForEach('Name') clearly outperforms .ForEach({ $_.Name }). In Windows PowerShell, curiously, the latter is faster, albeit only marginally so.
Timings with [pscustomobject] instances:
PowerShell Core v7.0.0-preview.3
Factor Command Secs (10-run avg.)
------ ------- ------------------
1.00 $objects.Name 0.006
1.11 foreach($o in $objects) { $o.Name } 0.007
1.52 $objects.ForEach('Name') 0.009
6.11 $objects.ForEach({ $_.Name }) 0.038
9.47 $objects | Select-Object -ExpandProperty Name 0.058
10.29 $objects | % { $_.Name } 0.063
29.77 $objects | % Name 0.184
Windows PowerShell v5.1.18362.145
Factor Command Secs (10-run avg.)
------ ------- ------------------
1.00 $objects.Name 0.008
1.14 foreach($o in $objects) { $o.Name } 0.009
1.76 $objects.ForEach('Name') 0.015
10.36 $objects | Select-Object -ExpandProperty Name 0.085
11.18 $objects.ForEach({ $_.Name }) 0.092
16.79 $objects | % { $_.Name } 0.138
61.14 $objects | % Name 0.503
Conclusions:
Note how with [pscustomobject] input .ForEach('Name') by far outperforms the script-block based variant, .ForEach({ $_.Name }).
Similarly, [pscustomobject] input makes the pipeline-based Select-Object -ExpandProperty Name faster, in Windows PowerShell virtually on par with .ForEach({ $_.Name }), but in PowerShell Core still about 50% slower.
In short: With the odd exception of % Name, with [pscustomobject] the string-based methods of referencing the properties outperform the scriptblock-based ones.
Source code for the tests:
Note:
Download function Time-Command from this Gist to run these tests.
Assuming you have looked at the linked code to ensure that it is safe (which I can personally assure you of, but you should always check), you can install it directly as follows:
irm https://gist.github.com/mklement0/9e1f13978620b09ab2d15da5535d1b27/raw/Time-Command.ps1 | iex
Set $useCustomObjectInput to $true to measure with [pscustomobject] instances instead.
$count = 1e4 # max. input object count == 10,000
$runs = 10 # number of runs to average
# Note: Using [pscustomobject] instances rather than instances of
# regular .NET types changes the performance characteristics.
# Set this to $true to test with [pscustomobject] instances below.
$useCustomObjectInput = $false
# Create sample input objects.
if ($useCustomObjectInput) {
# Use [pscustomobject] instances.
$objects = 1..$count | % { [pscustomobject] @{ Name = "foobar_$_"; Other1 = 1; Other2 = 2; Other3 = 3; Other4 = 4 } }
} else {
# Use instances of a regular .NET type.
# Note: The actual count of files and folders in your file-system
# may be less than $count
$objects = Get-ChildItem / -Recurse -ErrorAction Ignore | Select-Object -First $count
}
Write-Host "Comparing property-value extraction methods with $($objects.Count) input objects, averaged over $runs runs..."
# An array of script blocks with the various approaches.
$approaches = { $objects | Select-Object -ExpandProperty Name },
{ $objects | % Name },
{ $objects | % { $_.Name } },
{ $objects.ForEach('Name') },
{ $objects.ForEach({ $_.Name }) },
{ $objects.Name },
{ foreach($o in $objects) { $o.Name } }
# Time the approaches and sort them by execution time (fastest first):
Time-Command $approaches -Count $runs | Select Factor, Command, Secs*
[1] Technically, even a command without |, the pipeline operator, uses a pipeline behind the scenes, but for the purpose of this discussion using the pipeline refers only to commands that use |, the pipeline operator, and therefore by definition involve multiple commands.
Caution: member enumeration only works if the collection itself has no member of the same name. So if you had an array of FileInfo objects, you couldn't get an array of file lengths by using
$files.length # evaluates to array length
And before you say "well obviously", consider this. If you had an array of objects with a capacity property then
$objarr.capacity
would work fine UNLESS $objarr were actually not an [Array] but, for example, an [ArrayList]. So before using member enumeration you might have to look inside the black box containing your collection.
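A brief illustration with the FileInfo example from above; .ForEach('Length') sidesteps the name collision:
$files = @(Get-ChildItem -File)
$files.Length             # the array's own Length: the element count, not file sizes
$files.ForEach('Length')  # the files' Length property values, despite the name clash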
I learn something new every day! Thank you for this. I was trying to achieve the same. I was directly doing this:
$ListOfGGUIDs = $objects.{Object GUID}
Which basically made my variable an object again! I later realized I needed to define it first as an empty array,
$ListOfGGUIDs = @()

Powershell Process CPU checking

I have the following, which works OK, but with an issue in PowerShell:
$FileName = "E:\Work\ps\Inventory.htm"
$serverlist = "E:\Work\ps\Monitored_computers.txt"
foreach ($server in Get-Content $serverlist)
{
$servern=$server.split(",")[0]
$ip=$server.split(",")[1]
$cpu = gwmi Win32_PerfFormattedData_PerfProc_Process -Computer $servern -filter "Name <> '_Total' and Name <> 'Idle'" | Sort-Object PercentProcessorTime -Descending | where { $_.PercentProcessorTime -gt 0 }| select -First 1
if ($cpu.PercentProcessorTime -ge "92") {
write-host $servern ' ' $cpu.Name ' ' $cpu.PercentProcessorTime
}
}
I have seen some other code in PowerShell that takes an Average, but it almost seems like an "average of an average", which is meaningless. And this is for overall CPU usage:
gwmi win32_processor | Measure-Object -property LoadPercentage -Average | Foreach {$_.Average}
Now, if we can take the same logic and apply for our process issue:
gwmi Win32_PerfFormattedData_PerfProc_Process | Sort-Object PercentProcessorTime -Descending | where { $_.PercentProcessorTime -gt 0 } | select -First 1 | Measure-Object -property PercentProcessorTime -Average | Foreach {$_.PercentProcessorTime}
What I am trying to ask is: I do get the CPU percentage, which seems to be a "point in time". How do I locate the true CPU percentage? This is why I am pointing out the average. I really want to get around the "point in time" part of this.
The point being, when we have seen on several occasions, a high CPU per process on a server, we login to the server and the high CPU has subsided. This is not to say, this has been each time, but we know that sometimes a CPU will spike and then quiet down.
Thanks for any insight!
First issue: you are stuck at a point in time because when you execute your script it captures a snapshot of what is happening right then and there. What you are looking for is historical data, so you can figure out the average CPU usage of processes over a set amount of time and pinpoint the process that's bogging down your CPU. Do you have performance monitors in place to track CPU usage for individual processes? You may need to set up performance logging if you want to be able to get the numbers you're looking for after the fact.
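If setting up full performance logging is more than you need, a middle ground is to sample from PowerShell itself over a window of time; here is a hedged sketch using Get-Counter (the counter path, interval, and sample count are illustrative):
# Average a process's CPU counter over 12 samples taken 5 seconds apart (~1 minute),
# rather than reading a single point-in-time value:
(Get-Counter -Counter '\Process(tomcat*)\% Processor Time' -SampleInterval 5 -MaxSamples 12).CounterSamples |
Measure-Object -Property CookedValue -Average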
Secondly, I think that you misunderstand how Measure-Object works. If you run Get-Help on the cmdlet and check the Output you'll see that it outputs a GenericMeasureInfo object. This object will have a property for the relevant stat that you are looking for, which in your case is the Average property. It is not an average of an average, the most common usage I see for it is to calculate something, like a Sum or Average, and then output the value of that property.
Let's try a simple example...
Find the average size of the files in a folder. First we use Get-ChildItem to get a collection of files, and pipe it to Measure-Object. We will specify the -Average argument to specify that we want the Average calculated, and -Property length, so that it knows what to average:
GCI C:\Temp\* -file | Measure-Object -Average -Property length
This outputs a GenericMeasureInfo object like this:
Count : 30
Average : 55453155
Sum :
Maximum :
Minimum :
Property : Length
That lets me know that it had 30 files piped to it, and it found the Average for the Length property. Now, sometime you want to calculate more than one thing, so you can use more than one argument, such as -Sum and -Maximum, and those values will be populated as well:
Count : 30
Average : 55453155
Sum : 1663594650
Maximum : 965376000
Minimum :
Property : Length
So it looks like my average file is ~55MB, but out of the 1.6GB in the whole folder I've got one file that's 965MB! That file is undoubtedly skewing my numbers. With that output I could find folders that have multiple files, but one file is taking up over half of the space for the folder, and find anomalies... such as the ISO that I have saved to my C:\temp folder for some reason. Looks like I need to do some file maintenance.
Thanks to @TheMadTechnician I have been able to sort this out. I had a wrong component with
$_.Average
where I had
$_.PercentProcessorTime
and that would never work. Here is the correct script:
$serverlist = "D:\Work\ps\Monitored_computers.txt"
foreach ($server in Get-Content $serverlist) {
$servern=$server.split(",")[0]
$ip=$server.split(",")[1]
$cpu = gwmi Win32_PerfFormattedData_PerfProc_Process -Computer $ip | `
Where-Object {$_.Name -like "*tomcat*"} | `
Measure-Object -property PercentProcessorTime -Average | `
Foreach {$_.Average}
if ($cpu -ge "20") {
write-host $servern $cpu ' has a tomcat process greater than 20'
}
}

Using Powershell to compare two files and then output only the different string names

So I am a complete beginner at PowerShell but need to write a script that will take a file, compare it against another file, and tell me what strings are different in the first compared to the second. I have had a go at this but I am struggling with the outputs, as my script will currently only tell me on which line things are different, and it also seems to count lines that are empty.
To give some context for what I am trying to achieve: I would like to have a static file of known good Windows processes ($Authorized), and I want my script to pull a list of current running processes, filter by the process name column so as to pull just the process name strings, match anything over 1 character, sort the file by unique values, and then compare it against $Authorized, finally either outputting the different process strings found in $Processes (to the ISE Output Pane) or just outputting the different process names to a file.
I have spent today attempting the following in PowerShell ISE and also Googling around to try and find solutions. I heard fc is a better choice than Compare-Object, but I could not get that to work. I have thus far managed to get it working, but the final part, where it compares the two files, seems to compare line by line, which would always give me false positives as the line positions of the process names in the supplied file would change; furthermore, I only want to see the changed process names, not the line numbers it is reporting ("The Process at Line 34 is an Outlier" is what currently gets outputted).
I hope this makes sense, and any help on this would be very much appreciated.
Get-Process | Format-Table -Wrap -Autosize -Property ProcessName | Out-File c:\users\me\Desktop\Processes.txt
$Processes = 'c:\Users\me\Desktop\Processes.txt'
$Output_file = 'c:\Users\me\Desktop\Extracted.txt'
$Sorted = 'c:\Users\me\Desktop\Sorted.txt'
$Authorized = 'c:\Users\me\Desktop\Authorized.txt'
$regex = '.{1,}'
select-string -Path $Processes -Pattern $regex |% { $_.Matches } |% { $_.Value } > $Output_file
Get-Content $Output_file | Sort-Object -Unique > $Sorted
$dif = Compare-Object -ReferenceObject $(Get-Content $Sorted) -DifferenceObject $(get-content $Authorized) -IncludeEqual
$lineNumber = 1
foreach ($difference in $dif)
{
if ($difference.SideIndicator -ne "==")
{
Write-Output "The Process at Line $linenumber is an Outlier"
}
$lineNumber ++
}
Remove-Item c:\Users\me\Desktop\Processes.txt
Remove-Item c:\Users\me\Desktop\Extracted.txt
Write-Output "The Results are Stored in $Sorted"
From the length and complexity of your script, I feel like I'm missing something, but your description seems clear.
Running process names:
$ProcessNames = @(Get-Process | Select-Object -ExpandProperty Name)
.. which aren't blank: $ProcessNames = $ProcessNames | Where-Object {$_ -ne ''}
List of authorised names from a file:
$AuthorizedNames = Get-Content 'c:\Users\me\Desktop\Authorized.txt'
Compare:
$UnAuthorizedNames = $ProcessNames | Where-Object { $_ -notin $AuthorizedNames }
optional output to file:
$UnAuthorizedNames | Set-Content out.txt
or in the shell:
@(gps).Name -ne '' |? { $_ -notin (gc authorized.txt) } | sc out.txt
1. @() forces something to be an array, even if it only returns one thing
2. gps is a default alias of Get-Process
3. using .Property on an array takes that property value from every item in the array
4. using an operator on an array filters the array by whether the items pass the test
5. ? is an alias of Where-Object
6. -notin tests if one item is not in a collection
7. gc is an alias of Get-Content
8. sc is an alias of Set-Content
You should use Set-Content instead of Out-File and > because it handles character encoding nicely, and they don't. And because Get-Content/Set-Content sounds like a memorable matched pair, and Get-Content/Out-File doesn't.
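For example, if you do want to pin the encoding down explicitly, both cmdlets accept -Encoding; with Set-Content:
# utf8 shown here as an example choice of encoding:
$UnAuthorizedNames | Set-Content out.txt -Encoding utf8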

How can I reference the original object from values in an array?

I've done quite a bit of searching but can't seem to find an answer to this, but if it has been answered, I apologize, and just link me to it if you can.
What I'm trying to do is distribute files across 6 different paths based on which path currently has the least number of files in it.
What I thought to do is to add the responses from these ($Queue1-6 are just file paths) to an array and then sort them and get the path from the first object.
$QueueFiles1 = ( Get-ChildItem $Queue1 | Measure-Object ).Count
$QueueFiles2 = ( Get-ChildItem $Queue2 | Measure-Object ).Count
$QueueFiles3 = ( Get-ChildItem $Queue3 | Measure-Object ).Count
$QueueFiles4 = ( Get-ChildItem $Queue4 | Measure-Object ).Count
$QueueFiles5 = ( Get-ChildItem $Queue5 | Measure-Object ).Count
$QueueFiles6 = ( Get-ChildItem $Queue6 | Measure-Object ).Count
$FileNumArray = @($QueueFiles1, $QueueFiles2, $QueueFiles3, $QueueFiles4, $QueueFiles5, $QueueFiles6)
$FileNumArray = $FileNumArray | Sort-Object
The problem is (as far as I can tell) that after adding these values to the array, the object is lost and all that is left is the value, so now I don't know how to reference back to the original object to obtain the path information.
Any thoughts on how to do this would be appreciated and it doesn't need to be done with an array, like this, if there's an easier way to compare those file count values and obtain the path information of the lowest value.
Also, if there is more than 1 path with the lowest value, it doesn't matter which is returned.
Thanks in advance for any assistance.
Note that I have used some of my own folders to take the place of the $Queue paths. Also, depending on the location of these, you might be able to build a simple foreach loop, i.e. if they were all subfolders of the same parent (a loop variant is sketched at the end of this answer).
$Queue1 = "C:\Temp\NewName"
$Queue2 = "C:\temp\TaskManagement"
$Queue3 = "C:\temp\message_log.csv"
# Build a hashtable. Add each queue to the hash table.
$fileCount = @{}
# Set the queue as the name and the count as the value
$fileCount.Add("$Queue1", (Get-ChildItem $Queue1 | Measure-Object ).Count)
$fileCount.Add("$Queue2", (Get-ChildItem $Queue2 | Measure-Object ).Count)
$fileCount.Add("$Queue3", (Get-ChildItem $Queue4 | Measure-Object ).Count)
# Sort the results by the value of the hashtable (Counts from earlier) and select only the one.
$fileCount.GetEnumerator() | Sort-Object value | Select-Object -First 1 -ExpandProperty Name
Explaining the last line
.GetEnumerator() is required in order to sort the hashtable.
Sort-Object is Ascending by default so there is no need to mention it.
Select-Object -First 1 if you don't care which one you get, as long as it has the smallest amount of files.
-ExpandProperty Name since you only really need the path and not the hashtable entry itself.
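Finally, a hedged sketch of the loop variant mentioned at the top of this answer, assuming the queue paths are collected in an array:
# Build the hashtable in a loop instead of one Add() call per queue:
$queues = $Queue1, $Queue2, $Queue3
$fileCount = @{}
foreach ($queue in $queues) {
    $fileCount[$queue] = (Get-ChildItem $queue | Measure-Object).Count
}
# Emptiest queue:
$fileCount.GetEnumerator() | Sort-Object Value | Select-Object -First 1 -ExpandProperty Name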