Does Powershell have an Aggregate/Reduce function?

I realize there are related questions, but all the answers seem to be work-arounds that avoid the heart of the matter. Does PowerShell have an operation that can use a scriptblock to aggregate elements of an array into a single value? This is what is known in other languages as aggregate, reduce, or fold.
I can write it myself pretty easily, but given that it's the base operation of any list processing, I would assume there's something built in that I just don't know about.
So what I'm looking for is something like this
1..10 | Aggregate-Array {param($memo, $x); $memo * $x}

There isn't anything as obviously named as Reduce-Object, but you can achieve your goal with ForEach-Object:
1..10 | Foreach {$total=1} {$total *= $_} {$total}
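For reference, the three positional script blocks bind to ForEach-Object's -Begin, -Process, and -End parameters. The same fold written out with named parameters, computing a sum instead of a product (a minimal sketch, not from the original answer):
# Folds 1..10 into a sum (55) using explicit Begin/Process/End blocks.
1..10 | ForEach-Object -Begin { $total = 0 } -Process { $total += $_ } -End { $total }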
BTW there also isn't a Join-Object to merge two sequences of data based on some matching property.

If you need Maximum, Minimum, Sum, or Average you can use Measure-Object; sadly, it doesn't handle any other aggregate method.
Get-ChildItem | Measure-Object -Property Length -Minimum -Maximum -Average
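For instance, a plain sum (illustrative):
(1..10 | Measure-Object -Sum).Sum
# 55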

This is something I had wanted to start for a while. Seeing this question, I just wrote a pslinq (https://github.com/manojlds/pslinq) utility. The first and only cmdlet as of now is Aggregate-List, which can be used like below:
1..10 | Aggregate-List { $acc * $input } -seed 1
#3628800
Sum:
1..10 | Aggregate-List { $acc + $input }
#55
String reverse:
"abcdefg" -split '' | Aggregate-List { $input + $acc }
#gfedcba
PS: This is more of an experiment.

Ran into a similar issue recently. Here's a pure PowerShell solution. It doesn't handle arrays within arrays or strings like the JavaScript version does, but it may be a good starting point.
function Reduce-Object {
    [CmdletBinding()]
    [Alias("reduce")]
    [OutputType([Int])]
    param(
        # Meant to be passed in through pipeline.
        [Parameter(Mandatory=$True,
            ValueFromPipeline=$True,
            ValueFromPipelineByPropertyName=$True)]
        [Array] $InputObject,
        # Position=0 because we assume pipeline usage by default.
        [Parameter(Mandatory=$True,
            Position=0)]
        [ScriptBlock] $ScriptBlock,
        [Parameter(Mandatory=$False,
            Position=1)]
        [Int] $InitialValue
    )
    begin {
        # Check ContainsKey rather than truthiness, so an InitialValue of 0 still counts.
        $HasAccumulator = $PSBoundParameters.ContainsKey('InitialValue')
        if ($HasAccumulator) { $Accumulator = $InitialValue }
    }
    process {
        foreach ($Value in $InputObject) {
            if ($HasAccumulator) {
                # Execute script block given as param with values.
                $Accumulator = $ScriptBlock.InvokeReturnAsIs($Accumulator, $Value)
            } else {
                # Contingency for no initial value given: seed with the first element.
                # (A flag is used because a legitimate accumulator value of 0 is falsy.)
                $Accumulator = $Value
                $HasAccumulator = $True
            }
        }
    }
    end {
        return $Accumulator
    }
}
1..10 | reduce {param($a, $b) $a + $b}
# Or
reduce -InputObject @(1,2,3,4) {param($a, $b) $a + $b} -InitialValue 2

There's a functional module I came upon, https://github.com/chriskuech/functional, that has a Reduce-Object, plus Merge-Object, Test-Equality, etc. It's surprising that PowerShell has map (ForEach-Object) and filter (Where-Object) but not reduce (https://medium.com/swlh/functional-programming-in-powershell-876edde1aadb). Even JavaScript has reduce for arrays. A reducer is actually very powerful: you can define map and filter in terms of it (see the sketch at the end of this answer).
1..10 | reduce-object { $a * $b }
3628800
Measure-Object can't sum [timespan] values:
1..10 | % { [timespan]"$_" } | Reduce-Object { $a + $b } | ft
Days Hours Minutes Seconds Milliseconds
---- ----- ------- ------- ------------
  55     0       0       0            0
Merge two objects:
@{a=1},@{b=2} | % { [pscustomobject]$_ } | Merge-Object -Strategy fail
b a
- -
2 1
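To make the "map and filter in terms of reduce" point above concrete, here is a minimal sketch; the Fold helper is hypothetical and not part of the functional module:
# Hypothetical fold: threads $acc through $Step for each item of $Seq.
function Fold ($Seq, [scriptblock]$Step, $Seed) {
    $acc = $Seed
    foreach ($item in $Seq) { $acc = & $Step $acc $item }
    $acc
}
# Map as a fold: accumulate transformed items into an array.
Fold (1..5) { param($acc, $x) @($acc) + ($x * $x) } @()
# 1 4 9 16 25
# Filter as a fold: keep only items that satisfy the predicate.
Fold (1..10) { param($acc, $x) if ($x % 2) { @($acc) + $x } else { $acc } } @()
# 1 3 5 7 9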

Is it possible to set parameter positions for default commandlets?

For example I would like the Select-Object commandlet to interpret Get-ChildItem|Select-Object 1 as Get-ChildItem|Select-Object -First 1
I have been getting by with "Wrappers" like this for my most common commandlets:
function Select-Object {
    [CmdletBinding()]
    param (
        [Parameter(ValueFromPipeline)]
        $InputObject,
        [Parameter(Position = 1)]
        [int]$First,
        [Parameter(Position = 2)]
        [int]$Last
    )
    # Module-qualified so the wrapper doesn't recursively call itself.
    $input | Microsoft.PowerShell.Utility\Select-Object -First $First -Last $Last
}
But these are sometimes buggy and I always end up rewriting them to add more parameters.
I have been reading the docs and have not found anything other than parameter splicing.
It does not have to be an official solution, so if anyone has come up with one I would like to know.
PS: I know this can lead to confusing code, but I intend to use it only in interactive/terminal sessions. Doing gci | sel 1 is infinitely preferable to Get-ChildItem | Select-Object -First 1.
Any help would be greatly appreciated!
For wrappers like this you should use a ProxyCommand. Most of the code below can be auto-generated for you via its .Create(..) Method:
[System.Management.Automation.ProxyCommand]::Create((Get-Command Select-Object))
Using only the 2 parameters you're interested in -First and -Last in addition to the pipeline parameter, the function would look like this:
function sel {
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline)]
        [object] $InputObject,
        [Parameter(Position = 1)]
        [int] $First,
        [Parameter(Position = 2)]
        [int] $Last
    )
    begin {
        $wrapper = { Microsoft.PowerShell.Utility\Select-Object @PSBoundParameters }
        $pipeline = $wrapper.GetSteppablePipeline($MyInvocation.CommandOrigin)
        $pipeline.Begin($PSCmdlet)
    }
    process {
        $pipeline.Process($InputObject)
    }
    end {
        $pipeline.End()
    }
}
Now you can use positional binding without problems:
0..10 | sel 1 # for `-First 1`
0..10 | sel 1 1 # for `-First 1` `-Last 1`
As for "parameter splicing", you might be referring to what's known as splatting in PowerShell. For that you can look into about_Splatting. As an example, the code above uses splatting with the automatic variable $PSBoundParameters.
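A minimal splatting sketch for reference (illustrative):
# Splatting passes a hashtable of parameter names/values with the @ sigil.
$params = @{ First = 2; Last = 1 }
0..10 | Select-Object @params # same as: Select-Object -First 2 -Last 1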

What is the best way to structure an advanced multi-threaded Powershell function to work both on the pipeline as well as non-pipeline calls?

I have a function that flattens directories in parallel for multiple folders. It works great when I call it in a non-pipeline fashion:
$Files = Get-Content $FileList
Merge-FlattenDirectory -InputPath $Files
But now I want to update my function to work both on the pipeline and when called off the pipeline. Someone on Discord recommended deferring all processing to the end block, and using the begin and process blocks to add pipeline input to a list. Basically this:
function Merge-FlattenDirectory {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory,Position = 0,ValueFromPipeline)]
        [string[]]
        $InputPath
    )
    begin {
        $List = [System.Collections.Generic.List[PSObject]]@()
    }
    process {
        if (($InputPath.GetType().BaseType.Name) -eq "Array") {
            Write-Host "Array detected"
            $List = $InputPath
        } else {
            $List.Add($InputPath)
        }
    }
    end {
        $List | ForEach-Object -Parallel {
            # Code here...
        } -ThrottleLimit 16
    }
}
However, this is still not working on the pipeline for me. When I do this:
$Files | Merge-FlattenDirectory
It actually passes individual arrays of length 1 to the function. So testing for ($InputPath.GetType().BaseType.Name) -eq "Array" isn't really the way forward, as only the first pipeline value gets used.
My million dollar question is the following:
What is the most robust way in the process block to differentiate between pipeline input and non-pipeline input? The function should add all pipeline input to a generic list, and in the case of non-pipeline input, should skip this step and process the collection as-is moving directly to the end block.
The only thing I could think of is the following:
if ((($InputPath.GetType().BaseType.Name) -eq "Array") -and ($InputPath.Length -gt 1)) {
    $List = $InputPath
} else {
    $List.Add($InputPath)
}
But this just doesn't feel right. Any help would be extremely appreciated.
You might just do
function Merge-FlattenDirectory {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory,Position = 0,ValueFromPipeline)]
        [string[]]
        $InputPath
    )
    begin {
        $List = [System.Collections.Generic.List[String]]::new()
    }
    process {
        $InputPath.ForEach{ $List.Add($_) }
    }
    end {
        $List | ForEach-Object -Parallel {
            # Code here...
        } -ThrottleLimit 16
    }
}
Which will process the input values either from the pipeline or the input parameter.
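Both call patterns then behave the same (illustrative, with hypothetical paths):
'C:\Temp\a', 'C:\Temp\b' | Merge-FlattenDirectory # pipeline: the process block runs once per item
Merge-FlattenDirectory -InputPath 'C:\Temp\a', 'C:\Temp\b' # direct call: process runs once and .ForEach adds each element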
But that doesn't comply with the Strongly Encouraged Development Guidelines, specifically Support Well Defined Pipeline Input (SC02) and in particular Implement for the Middle of a Pipeline.
This means that if you want to implement the PowerShell pipeline correctly, you should directly (parallel) process your items in the process block and immediately output any results from there:
function Merge-FlattenDirectory {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory,Position = 0,ValueFromPipeline)]
        [string[]]
        $InputPath
    )
    begin {
        # Note: New-ThreadPool and -threadPool are illustrative placeholders here,
        # not built-in cmdlets/parameters; see the steppable pipeline caveat below.
        $SharedPool = New-ThreadPool -Limit 16
    }
    process {
        $InputPath | ForEach-Object -Parallel -threadPool $Using:SharedPool {
            # Process your current item ($_) here ...
        }
    }
}
In general, script authors are advised to use idiomatic PowerShell, which often comes down to fewer object manipulations and usually results in a correct PowerShell pipeline implementation with less memory usage.
Please let me know if you intend to collect (and e.g. order) the output based on this suggestion.
Caveat
The full invocation of the ForEach-Object -Parallel cmdlet itself is somewhat inefficient, as you open and close a new pipeline on each iteration. Resolving this undercuts my general statement about idiomatic PowerShell a bit, but it should be possible with a steppable pipeline.
To implement this, you might use the ForEach-Object cmdlet as a template:
[System.Management.Automation.ProxyCommand]::Create((Get-Command ForEach-Object))
And set the ThrottleLimit of the thread pool in the begin block:
function Merge-FlattenDirectory {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory,Position = 0,ValueFromPipeline)]
        [string[]]
        $InputPath
    )
    begin {
        $PSBoundParameters += @{
            ThrottleLimit = 4
            Parallel = {
                Write-Host (Get-Date).ToString('HH:mm:ss.s') 'Started' $_
                Start-Sleep -Seconds 3
                Write-Host (Get-Date).ToString('HH:mm:ss.s') 'finished' $_
            }
        }
        $wrappedCmd = $ExecutionContext.InvokeCommand.GetCommand('ForEach-Object', [System.Management.Automation.CommandTypes]::Cmdlet)
        $scriptCmd = { & $wrappedCmd @PSBoundParameters }
        $steppablePipeline = $scriptCmd.GetSteppablePipeline($myInvocation.CommandOrigin)
        $steppablePipeline.Begin($PSCmdlet)
    }
    process {
        $InputPath.ForEach{ $steppablePipeline.Process($_) }
    }
    end {
        $steppablePipeline.End()
    }
}
1..5 |Merge-FlattenDirectory
17:57:40.40 Started 3
17:57:40.40 Started 2
17:57:40.40 Started 1
17:57:40.40 Started 4
17:57:43.43 finished 3
17:57:43.43 finished 1
17:57:43.43 finished 4
17:57:43.43 finished 2
17:57:43.43 Started 5
17:57:46.46 finished 5
Here's how I would write it with comments where I have changed it.
function Merge-FlattenDirectory {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory,Position = 0,ValueFromPipeline)]
        $InputPath # <- this may be a string, a path object, a file object, or an array
    )
    begin {
        $List = @() # Use an array for less than 100K objects.
    }
    process {
        # Even if InputPath is a single string, foreach will iterate once and set $p.
        # If it is an array of strings, add each one. If it is one or more objects,
        # try to find the right property for the path.
        foreach ($p in $InputPath) {
            if ($p -is [String]) { $List += $p }
            elseif ($p.Path) { $List += $p.Path }
            elseif ($p.FullName) { $List += $p.FullName }
            elseif ($p.PSPath) { $List += $p.PSPath }
            else { Write-Warning "$p makes no sense" }
        }
    }
    end {
        $List | ForEach-Object -Parallel {
            # Code here...
        } -ThrottleLimit 16
    }
}
@iRon: that "write for the middle of the pipeline" guidance in the docs does not mean write everything in the process block.
function one { @(1,2,3,4,5) }
function two {
    param ([parameter(ValueFromPipeline=$true)] $p)
    begin { Write-Host "Two begins"; $a = @() }
    process { Write-Host "Two received $p"; $a += $p }
    end { Write-Host "Two ending"; $a; Write-Host "Two ended" }
}
function three {
    param ([parameter(ValueFromPipeline=$true)] $p)
    begin { Write-Host "three Starts" }
    process { Write-Host "Three received $p" }
    end { Write-Host "Three ended" }
}
one | two | three
One's body is treated as an end block.
One, two, and three all run their begin blocks (one's is empty).
One's output goes to the process block in two, which just collects the data. Two's end block starts after one's end block finishes, and sends its output along.
At that point three's process block gets the input. After two's end block ends, three's end block runs.
Two is "in the middle": it has a process block to deal with multiple piped items (if it were all one end block it would only process the last one).

Find index of array where condition is true in PowerShell [duplicate]

I'm having trouble getting the index of the current element when multiple elements are exactly the same object:
$b = "A","D","B","D","C","E","D","F"
$b | ? { $_ -contains "D" }
Alternative version:
$b = "A","D","B","D","C","E","D","F"
[Array]::FindAll($b, [Predicate[String]]{ $args[0] -contains "D" })
This will return:
D
D
D
But this code:
$b | % { $b.IndexOf("D") }
Alternative version:
[Array]::FindAll($b, [Predicate[String]]{ $args[0] -contains "D" }) | % { $b.IndexOf($_) }
Returns:
1
1
1
so it's pointing at the index of the first element. How do I get the indexes of the other elements?
You can do this:
$b = "A","D","B","D","C","E","D","F"
(0..($b.Count-1)) | where {$b[$_] -eq 'D'}
1
3
6
mjolinor's answer is conceptually elegant, but slow with large arrays, presumably due to having to build a parallel array of indices first (which is also memory-inefficient).
It is conceptually similar to the following LINQ-based solution (PSv3+), which is more memory-efficient and about twice as fast, but still slow:
$arr = 'A','D','B','D','C','E','D','F'
[Linq.Enumerable]::Where(
[Linq.Enumerable]::Range(0, $arr.Length),
[Func[int, bool]] { param($i) $arr[$i] -eq 'D' }
)
While any PowerShell looping solution is ultimately slow compared to a compiled language, the following alternative, while more verbose, is still much faster with large arrays:
PS C:\> & { param($arr, $val)
$i = 0
foreach ($el in $arr) { if ($el -eq $val) { $i } ++$i }
} ('A','D','B','D','C','E','D','F') 'D'
1
3
6
Note:
Perhaps surprisingly, this solution is even faster than Matt's solution, which calls [array]::IndexOf() in a loop instead of enumerating all elements.
A script block (invoked with the call operator & and arguments), while not strictly necessary, is used to prevent polluting the enclosing scope with the helper variable $i.
The foreach statement is faster than the Foreach-Object cmdlet (whose built-in aliases are % and, confusingly, also foreach).
Simply (implicitly) outputting $i for each match makes PowerShell collect multiple results in an array.
If only one index is found, you'll get a scalar [int] instance instead; wrap the whole command in @(...) to ensure that you always get an array.
While $i by itself outputs the value of $i, ++$i by design does NOT (though you could use (++$i) to achieve that, if needed).
Unlike Array.IndexOf(), PowerShell's -eq operator is case-insensitive by default; for case-sensitivity, use -ceq instead.
It's easy to turn the above into a (simple) function (note that the parameters are purposely untyped, for flexibility):
function get-IndicesOf($Array, $Value) {
    $i = 0
    foreach ($el in $Array) {
        if ($el -eq $Value) { $i }
        ++$i
    }
}
# Sample call
PS C:\> get-IndicesOf ('A','D','B','D','C','E','D','F') 'D'
1
3
6
You would still need to loop with the static methods from [array], but if you are still curious, something like this would work.
$b = "A","D","B","D","C","E","D","F"
$results = #()
$singleIndex = -1
Do{
$singleIndex = [array]::IndexOf($b,"D",$singleIndex + 1)
If($singleIndex -ge 0){$results += $singleIndex}
}While($singleIndex -ge 0)
$results
1
3
6
Loop until a match is not found. Start by assigning $singleIndex to -1 (which is what a non-match would return). When a match is found, add the index to the $results array.

Simulating `ls` in Powershell

I'm trying to get something that looks like UNIX ls output in PowerShell. This is getting there:
Get-ChildItem | Format-Wide -AutoSize -Property Name
but it's still outputting the items in row-major instead of column-major order:
PS C:\Users\Mark Reed> Get-ChildItem | Format-Wide -AutoSize -Property Name
Contacts Desktop Documents Downloads Favorites
Links Music Pictures Saved Games
Searches Videos
Desired output:
PS C:\Users\Mark Reed> My-List-Files
Contacts Downloads Music Searches
Desktop Favorites Pictures Videos
Documents Links Saved Games
The difference is in the sorting: 1 2 3 4 5/6 7 8 9 reading across the lines, vs 1/2/3 4/5/6 7/8/9 reading down the columns.
I already have a script that will take an array and print it out in column-major order using Write-Host, though I found a lot of PowerShell-idiomatic improvements to it by reading Keith's and Roman's takes. But my impression from reading around is that this is the wrong way to go about it: instead of calling Write-Host, a script should output objects and let the formatters and outputters take care of getting the right stuff written to the user's console.
When a script uses Write-Host, its output is not capturable; if I assign the result to a variable, I get a null variable and the output is written to the screen anyway. It's like a command in the middle of a UNIX pipeline writing directly to /dev/tty instead of standard output or even standard error.
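A quick illustration of that behavior (illustrative, not from the original post):
$x = Write-Host 'hello' # 'hello' prints to the console anyway; $x is $null
$y = Write-Output 'hello' # nothing prints here; $y now holds 'hello'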
Admittedly, I may not be able to do much with the array of Microsoft.PowerShell.Commands.Internal.Format.* objects I get back from e.g. Format-Wide, but at least it contains the output, which doesn't show up on my screen in rogue fashion, and which I can recreate at any time by passing the array to another formatter or outputter.
This is a simple-ish function that formats column-major. You can do this all in PowerShell script:
function Format-WideColMajor {
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline)]
        [AllowNull()]
        [AllowEmptyString()]
        [PSObject]
        $InputObject,
        [Parameter()]
        $Property
    )
    begin {
        $list = new-object System.Collections.Generic.List[PSObject]
    }
    process {
        $list.Add($InputObject)
    }
    end {
        if ($Property) {
            $output = $list | Foreach {"$($_.$Property)"}
        }
        else {
            $output = $list | Foreach {"$_"}
        }
        $conWidth = $Host.UI.RawUI.BufferSize.Width - 1
        $maxLen = ($output | Measure-Object -Property Length -Maximum).Maximum
        $colWidth = $maxLen + 1
        $numCols = [Math]::Floor($conWidth / $colWidth)
        $numRows = [Math]::Ceiling($output.Count / $numCols)
        for ($i = 0; $i -lt $numRows; $i++) {
            $line = ""
            for ($j = 0; $j -lt $numCols; $j++) {
                # Column-major: items in the same column are $numRows apart.
                $item = $output[$i + ($j * $numRows)]
                $line += "$item$(' ' * ($colWidth - $item.Length))"
            }
            $line
        }
    }
}
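Usage would then look like this (assuming the function above has been defined or dot-sourced):
Get-ChildItem | Format-WideColMajor -Property Name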

How does Select-Object stop the pipeline in PowerShell v3?

In PowerShell v2, the following line:
1..3| foreach { Write-Host "Value : $_"; $_ }| select -First 1
Would display:
Value : 1
1
Value : 2
Value : 3
Since all elements were pushed down the pipeline. However, in v3 the above line displays only:
Value : 1
1
The pipeline is stopped before 2 and 3 are sent to Foreach-Object (Note: the -Wait switch for Select-Object allows all elements to reach the foreach block).
How does Select-Object stop the pipeline, and can I now stop the pipeline from a foreach or from my own function?
Edit: I know I can wrap a pipeline in a do...while loop and continue out of the pipeline (sketched below). I have also found that in v3 I can do something like this (it doesn't work in v2):
function Start-Enumerate ($array) {
    do { $array } while ($false)
}
Start-Enumerate (1..3)| foreach {if($_ -ge 2){break};$_}; 'V2 Will Not Get Here'
But Select-Object doesn't require either of these techniques so I was hoping that there was a way to stop the pipeline from a single point in the pipeline.
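For reference, the do...while wrapper mentioned in the edit looks something like this (a sketch of the technique described in the link below):
# 'continue' inside the ForEach-Object script block is not caught by the cmdlet;
# it tears down the running pipeline and then continues the enclosing loop.
do {
    1..10 | ForEach-Object { if ($_ -gt 3) { continue }; $_ }
} while ($false)
# Output: 1 2 3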
Check this post on how you can cancel a pipeline:
http://powershell.com/cs/blogs/tobias/archive/2010/01/01/cancelling-a-pipeline.aspx
In PowerShell 3.0 it's an engine improvement. From the CTP1 samples folder ('\Engines Demos\Misc\ConnectBugFixes.ps1'):
# Connect Bug 332685
# Select-Object optimization
# Submitted by Shay Levi
# Connect Suggestion 286219
# PSV2: Lazy pipeline - ability for cmdlets to say "NO MORE"
# Submitted by Karl Prosser
# Stop the pipeline once the objects have been selected
# Useful for commands that return a lot of objects, like dealing with the event log
# In PS 2.0, this took a long time even though we only wanted the first 10 events
Start-Process powershell.exe -Args '-Version 2 -NoExit -Command Get-WinEvent | Select-Object -First 10'
# In PS 3.0, the pipeline stops after retrieving the first 10 objects
Get-WinEvent | Select-Object -First 10
After trying several methods, including throwing StopUpstreamCommandsException, ActionPreferenceStopException, and PipelineClosedException, calling $PSCmdlet.ThrowTerminatingError, and calling $ExecutionContext.Host.Runspace.GetCurrentlyRunningPipeline().stopper.set_IsStopping($true), I finally found that just utilizing Select-Object was the only thing that didn't abort the whole script (versus just the pipeline). [Note that some of the items mentioned above require access to private members, which I accessed via reflection.]
# This looks like it should put a zero in the pipeline but on PS 3.0 it doesn't
function stop-pipeline {
    $sp = { select-object -f 1 }.GetSteppablePipeline($MyInvocation.CommandOrigin)
    $sp.Begin($true)
    $x = $sp.Process(0) # this call doesn't return
    $sp.End()
}
A new method follows, based on a comment from the OP. Unfortunately this method is a lot more complicated and uses private members. Also, I don't know how robust it is - I just got the OP's example to work and stopped there. So FWIW:
# wh is alias for write-host
# sel is alias for select-object
# The following two use reflection to access private members:
# invoke-method invokes private methods
# select-properties is similar to select-object, but it gets private properties
# Get the system.management.automation assembly
$smaa = [appdomain]::currentdomain.getassemblies() |
    ? location -like "*system.management.automation*"
# Get the StopUpstreamCommandsException class
$upcet = $smaa.gettypes() | ? name -like "*upstream*"
filter x {
    [CmdletBinding()]
    param(
        [parameter(ValueFromPipeline=$true)]
        [object] $inputObject
    )
    process {
        if ($inputObject -ge 5) {
            # Create a StopUpstreamCommandsException
            $upce = [activator]::CreateInstance($upcet, @($pscmdlet))
            $PipelineProcessor = $pscmdlet.CommandRuntime | select-properties PipelineProcessor
            $commands = $PipelineProcessor | select-properties commands
            $commandProcessor = $commands[0]
            $null = $upce.RequestingCommandProcessor | select-properties *
            $upce.RequestingCommandProcessor.commandinfo =
                $commandProcessor | select-properties commandinfo
            $upce.RequestingCommandProcessor.Commandruntime =
                $commandProcessor | select-properties commandruntime
            $null = $PipelineProcessor |
                invoke-method recordfailure @($upce, $commandProcessor.command)
            1..($commands.count - 1) | % {
                $commands[$_] | invoke-method DoComplete
            }
            wh throwing
            throw $upce
        }
        wh "< $inputObject >"
        $inputObject
    } # end process
    end {
        wh in x end
    }
} # end filter x
filter y {
    [CmdletBinding()]
    param(
        [parameter(ValueFromPipeline=$true)]
        [object] $inputObject
    )
    process {
        $inputObject
    }
    end {
        wh in y end
    }
}
1..5| x | y | measure -Sum
PowerShell code to retrieve the PipelineProcessor value through reflection:
$t_cmdRun = $pscmdlet.CommandRuntime.gettype()
# Get pipelineprocessor value ($pipor)
$bindFlags = [Reflection.BindingFlags]"NonPublic,Instance"
$piporProp = $t_cmdRun.getproperty("PipelineProcessor", $bindFlags )
$pipor=$piporProp.GetValue($PSCmdlet.CommandRuntime,$null)
PowerShell code to invoke a method through reflection:
$proc = (gps)[12] # semi-random process
$methinfo = $proc.gettype().getmethod("GetComIUnknown", $bindFlags)
# Return ComIUnknown as an IntPtr
$comIUnknown = $methinfo.Invoke($proc, @($true))
I know that throwing a PipelineStoppedException stops the pipeline. The following example simulates, in v2.0, what you see with Select -First 1 in v3.0:
filter Select-Improved($first) {
    begin {
        $count = 0
    }
    process {
        $_
        $count++
        if ($count -ge $first) { throw (new-object System.Management.Automation.PipelineStoppedException) }
    }
}
trap{continue}
1..3| foreach { Write-Host "Value : $_"; $_ }| Select-Improved -first 1
write-host "after"