How to stop PowerShell from unpacking an Enumerable object? - powershell

Working on a simple helper function in PowerShell that takes a couple of parameters and creates a custom Enumerable object and outputs that object to the pipeline. The problem I am having is that PowerShell is always outputting a System.Array that contains the objects that are enumerated by my custom Enumerable object. How can I keep PowerShell from unpacking the Enumerable object?
The code: http://gist.github.com/387768

Try to change the line 46 from
$row
to
, $row
EDIT: as Johannes correctly pointed out, the unary operator comma creates an array with one member.
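The effect of the unary comma can be sketched with two small functions. This is only an illustration: the function names and the List[int] contents are hypothetical, not taken from the gist, and the ::new() syntax assumes PowerShell 5 or later.

```powershell
# Hypothetical helpers for illustration; not the code from the gist.
function Get-RowUnwrapped {
    $row = [System.Collections.Generic.List[int]]::new()
    $row.Add(1); $row.Add(2); $row.Add(3)
    $row        # the list is enumerated: 1, 2, 3 are emitted separately
}

function Get-RowWrapped {
    $row = [System.Collections.Generic.List[int]]::new()
    $row.Add(1); $row.Add(2); $row.Add(3)
    , $row      # only the one-element wrapper array is unrolled; the list survives
}

$unwrapped = Get-RowUnwrapped
$wrapped   = Get-RowWrapped

$unwrapped.GetType().Name   # Object[]
$wrapped.GetType().Name     # List`1
```

The caller of the first function receives a plain array that PowerShell collected from the enumerated items, while the caller of the second receives the original list object.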

Related

Why can't I call IEnumerable.Sum() on an array in PowerShell? [duplicate]

I am trying to use LINQ in PowerShell. It seems like this should be entirely possible since PowerShell is built on top of the .NET Framework, but I cannot get it to work. For example, when I try the following (contrived) code:
$data = 0..10
[System.Linq.Enumerable]::Where($data, { param($x) $x -gt 5 })
I get the following error:
Cannot find an overload for "Where" and the argument count: "2".
Never mind the fact that this could be accomplished with Where-Object. The point of this question is not to find an idiomatic way of doing this one operation in PowerShell. Some tasks would be light-years easier to do in PowerShell if I could use LINQ.
The problem with your code is that PowerShell cannot decide to which specific delegate type the ScriptBlock instance ({ ... }) should be cast.
So it isn't able to choose a type-concrete delegate instantiation for the generic 2nd parameter of the Where method. It also doesn't have syntax to specify a generic parameter explicitly. To resolve this problem, you need to cast the ScriptBlock instance to the right delegate type yourself:
$data = 0..10
[System.Linq.Enumerable]::Where($data, [Func[object,bool]]{ param($x) $x -gt 5 })
Why does [Func[object, bool]] work, but [Func[int, bool]] does not?
Because your $data is [object[]], not [int[]], given that PowerShell creates [object[]] arrays by default; you can, however, construct [int[]] instances explicitly:
$intdata = [int[]]$data
[System.Linq.Enumerable]::Where($intdata, [Func[int,bool]]{ param($x) $x -gt 5 })
To complement PetSerAl's helpful answer with a broader answer to match the question's generic title:
Note: The following applies up to at least PowerShell 7.2. Direct support for LINQ - with syntax comparable to the one in C# - is being discussed for a future version of PowerShell Core in GitHub issue #2226.
Using LINQ in PowerShell:
You need PowerShell v3 or higher.
You cannot call the LINQ extension methods directly on collection instances and instead must invoke the LINQ methods as static methods of the [System.Linq.Enumerable] type to which you pass the input collection as the first argument.
Having to do so takes away the fluidity of the LINQ API, because method chaining is no longer an option. Instead, you must nest static calls, in reverse order.
E.g., instead of $inputCollection.Where(...).OrderBy(...) you must write [Linq.Enumerable]::OrderBy([Linq.Enumerable]::Where($inputCollection, ...), ...)
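A minimal sketch of such a nested call, assuming an [int[]] input so that the delegates can be typed as Func[int,...]; the variable names and sample values are made up for illustration:

```powershell
# Sample input; typed as [int[]] so that Func[int,...] delegates match.
$inputCollection = [int[]](5, 3, 8, 1, 9)

# Delegates created by casting script blocks.
$isLarge  = [Func[int,bool]]{ param($x) $x -gt 2 }
$identity = [Func[int,int]]{ param($x) $x }

# Equivalent of $inputCollection.Where(...).OrderBy(...) in C#:
# the innermost call runs first, so the calls appear in reverse order.
$sorted = [Linq.Enumerable]::OrderBy([Linq.Enumerable]::Where($inputCollection, $isLarge), $identity)

# Materialize the lazy result into an array.
$result = [Linq.Enumerable]::ToArray($sorted)
$result -join ','   # 3,5,8,9
```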
Helper functions and classes:
Some methods, such as .Select(), have parameters that accept generic Func<> delegates (e.g., Func<T,TResult>); these can be created using PowerShell code, via a cast applied to a script block; e.g.:
[Func[object, bool]] { $Args[0].ToString() -eq 'foo' }
The first generic type parameter of Func<> delegates must match the type of the elements of the input collection; keep in mind that PowerShell creates [object[]] arrays by default.
Some methods, such as .Contains() and .OrderBy(), have parameters that accept objects that implement specific interfaces, such as IEqualityComparer<T> and IComparer<T>. Additionally, input types may need to implement IEquatable<T> in order for comparisons to work as intended, such as with .Distinct(). All of these require compiled classes, typically written in C# (though you can create them from PowerShell by passing a string with embedded C# code to the Add-Type cmdlet). In PSv5+, however, you may also use custom PowerShell classes, with some limitations.
Generic methods:
Some LINQ methods themselves are generic and therefore require one or more type arguments.
In PowerShell (Core) 7.2- and Windows PowerShell, PowerShell cannot directly call such methods and must use reflection instead, because it only supports inferring type arguments, which cannot be done in this case; e.g.:
# Obtain a [string]-instantiated method of OfType<T>.
$ofTypeString = [Linq.Enumerable].GetMethod("OfType").MakeGenericMethod([string])
# Output only [string] elements in the collection.
# Note how the array must be nested for the method signature to be recognized.
PS> $ofTypeString.Invoke($null, (, ('abc', 12, 'def')))
abc
def
For a more elaborate example, see this answer.
In PowerShell (Core) 7.3+, you now have the option of specifying type arguments explicitly (see the conceptual about_Calling_Generic_Methods help topic); e.g.:
# Output only [string] elements in the collection.
# Note the need to enclose the input array in (...)
# -> 'abc', 'def'
[Linq.Enumerable]::OfType[string](('abc', 12, 'def'))
The LINQ methods return a lazy enumerable rather than an actual collection; that is, what is returned isn't the actual data yet, but something that will produce the data when enumerated.
In contexts where enumeration is automatically performed, notably in the pipeline, you'll be able to use the enumerable as if it were a collection.
However, since the enumerable isn't itself a collection, you cannot get the result count by invoking .Count nor can you index into the iterator; however, you can use member-access enumeration (extracting the values of a property of the objects being enumerated).
If you do need the results as a static array to get the usual collection behavior, wrap the invocation in [Linq.Enumerable]::ToArray(...).
Similar methods that return different data structures exist, such as ::ToList().
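As a small sketch of the difference between the lazy enumerable and a materialized array (the variable names and the even-number filter are illustrative assumptions):

```powershell
$numbers = [int[]](1..10)
$isEven  = [Func[int,bool]]{ param($x) $x % 2 -eq 0 }

# $lazy is an enumerable that produces the matches only when enumerated.
$lazy = [Linq.Enumerable]::Where($numbers, $isEven)

# ToArray() enumerates it once and returns a regular [int[]] array,
# which supports .Count and indexing as usual.
$evens = [Linq.Enumerable]::ToArray($lazy)

$evens.Count   # 5
$evens[0]      # 2
```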
For an advanced example, see this answer.
For an overview of all LINQ methods including examples, see this great article.
In short: using LINQ from PowerShell is cumbersome and is only worth the effort if any of the following apply:
you need advanced query features that PowerShell's cmdlets cannot provide.
performance is paramount - see this article.
If you want to achieve LINQ like functionality then PowerShell has some cmdlets and functions, for instance: Select-Object, Where-Object, Sort-Object, Group-Object. It has cmdlets for most of LINQ features like Projection, Restriction, Ordering, Grouping, Partitioning, etc.
See Powershell One-Liners: Collections and LINQ.
For more details on using Linq and possibly how to make it easier, the article LINQ Through Powershell may be helpful.
I ran across LINQ when I wanted a stable sort in PowerShell (stable: if the property to sort by has the same value on two or more elements, their order is preserved). Sort-Object has a -Stable switch, but only in PS 6.1+. Also, the Sort() implementations in the generic collections in .NET are not stable, so I came across LINQ, whose documentation says its sort is stable.
Here's my (Test-)Code:
# Getting a stable sort in PowerShell, using LINQs OrderBy
# Testdata
# Generate List to Order and insert Data there. o will be sequential Number (original Index), i will be Property to sort for (with duplicates)
$list = [System.Collections.Generic.List[object]]::new()
foreach($i in 1..10000){
    $list.Add([PSCustomObject]@{o=$i;i=$i % 50})
}
# Sort Data
# Order Object by using LINQ. Note that OrderBy does not sort. It's using Delayed Evaluation, so it will sort only when GetEnumerator is called.
$propertyToSortBy = "i" # if wanting to sort by another property, set its name here
$scriptBlock = [Scriptblock]::Create("param(`$x) `$x.$propertyToSortBy")
$resInter = [System.Linq.Enumerable]::OrderBy($list, [Func[object,object]]$scriptBlock )
# $resInter.GetEnumerator() | Out-Null
# $resInter is of Type System.Linq.OrderedEnumerable<...>. We'll copy results to a new Generic List
$res = [System.Collections.Generic.List[object]]::new()
foreach($elem in $resInter.GetEnumerator()){
    $res.Add($elem)
}
# Validation
# Check Results. If PropertyToSort is the same as in previous record, but previous sequence-number is higher, then the Sort has not been stable
$propertyToSortBy = "i" ; $originalOrderProp = "o"
for($i = 1; $i -lt $res.Count ; $i++){
    if(($res[$i-1].$propertyToSortBy -eq $res[$i].$propertyToSortBy) -and ($res[$i-1].$originalOrderProp -gt $res[$i].$originalOrderProp)){
        Write-Host "Error on line $i - Sort is not Stable! $($res[$i]), Previous: $($res[$i-1])"
    }
}
There is a simple way to make LINQ chaining fluent: add a using statement for the LINQ namespace. Then you can call the Where method directly, without calling the static Where function.
using namespace System.Linq
$b.Where({$_ -gt 0})
$b is an array of bytes, and I want to get all bytes that are greater than 0.
Works perfectly.

ArrayList .Add vs .AddRange vis-a-vis the Pipeline

Given a properly defined variable
$test = New-Object System.Collections.ArrayList
.Add pollutes the pipeline with the count of items in the array, while .AddRange does not.
$test.Add('Single') will dump the count to the console. $test.AddRange(@('Single2')) will be clean with no extra effort. Why the different behavior? Is it just an oversight, or is there some intentional behavior I am not understanding?
Given that .AddRange requires coercing to an array when not using a variable (that is already an array) I am tending towards using [void]$variable.Add('String') when I know I need to only add one item, and [void]$test.AddRange($variable) when I am adding an array to an array, even when $variable only contains, or could only contain, a single item. The [void] here isn't required, but I wonder if it's just best practice to have it, depending of course on the answer above. Or am I missing something there too?
Why the different behavior? Is it just an oversight, or is there some intentional behavior I am not understanding?
Because many years ago, someone decided that's how ArrayList should behave!
Add() returns the index at which the argument was inserted into the list, which may indeed be useful and makes sense.
With AddRange() on the other hand, it's not immediately clear why it should return anything, and if yes, what? The index of the first item in the input arguments? The last? Or should it return a variable-sized array with all the insert indices? That would be awkward! So whoever implemented ArrayList decided not to return anything at all.
In C# or VB.NET, for which ArrayList was initially designed, "polluting the pipeline" doesn't really exist as a concept; the runtime would simply omit copying the return value back to the caller if someone invokes .Add() without assigning to a variable.
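A short sketch of the two return behaviors and of the usual ways to discard the index that Add() returns (the item values are arbitrary):

```powershell
$test = [System.Collections.ArrayList]::new()

# Add() returns the insert index; capturing it keeps it out of the output stream.
$index = $test.Add('Single')             # $index is 0

# Two common ways to discard the return value instead:
[void]$test.Add('Second')
$null = $test.Add('Third')

# AddRange() has no return value, so nothing needs to be suppressed.
$test.AddRange(@('Fourth', 'Fifth'))

$test.Count   # 5
```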
The [void] here isn't required, but I wonder if it's just best practice to have it, depending of course on the answer above. Or am I missing something there too?
No, it's completely unnecessary. AddRange() is not magically one day gonna change to output anything.
If you don't ever need to know the insert index, use a [System.Collections.Generic.List[psobject]] instead:
$list = [System.Collections.Generic.List[psobject]]::new()
# this won't return anything, no need for `[void]`
$list.Add(123)
If for some reason you must use an ArrayList, you can "silence" it by overriding the Add() method:
function New-SilentArrayList {
    # Create a new ArrayList
    $newList = [System.Collections.ArrayList]::new()
    # Create a new `Add()` method, then return the list
    $newAdd = @{
        InputObject = $newList
        MemberType  = 'ScriptMethod'
        Name        = 'Add'
        Value       = { param($obj) $this.AddRange(@($obj)) }
    }
    Write-Output $(
        Add-Member @newAdd -Force -PassThru
    ) -NoEnumerate
}
Now your ArrayList's Add() will never make a peep again!
PS C:\> $list = New-SilentArrayList
PS C:\> $list.Add(123)
PS C:\> $list
123
Apparently I didn't quite understand where you were heading.
"Add pollutes the pipeline" is, on second thought, a correct statement, but .NET methods like $variable.Add('String') do not use the PowerShell pipeline by themselves (until the moment you output the array using the Write-Output command, which is the default command if you do not assign it to a variable).
The Write-Output cmdlet is typically used in scripts to display
strings and other objects on the console. However, because the default
behavior is to display the objects at the end of a pipeline, it is
generally not necessary to use the cmdlet.
The point is that the Add method of ArrayList returns an [Int32] ("The ArrayList index at which the value has been added"), whereas AddRange doesn't return anything. This means that if you don't assign the result to something else (which includes $Null = $test.Add('Single')), it will indeed be output to the PowerShell pipeline.
Instead you might also consider to use the Add method of the List class which also doesn't return anything, see also: ArrayList vs List<> in C#.
But in general, I recommend to use native PowerShell commands that do use the Pipeline
(I can't give you a good example as it is not clear what output you expect but I noticed another question you removed and from that question, I presume that this Why should I avoid using the increase assignment operator (+=) to create a collection answer might help you further)

Is there a way to access the input object after it was piped?

Is there a way to pipe a whole object through a pipeline and process mentioned Object in one step? Put simply the $PSItem Variable on the other side of my pipeline should have the same value as the whole object which was put through the pipe.
I've found the following method to have a sort of anonymous functions in posh though this processes every item in the input object separately (As this is what the process block in advanced functions is meant for).
Therefore, the code:
Get-Service | & {process {return $_.length}}
Returns:
1 1 1 1 1 1 1..
What I'm looking for is a way to access the full object with the $_/$PSItem variable after the pipeline and process it further / return properties of this object.
The process block in PowerShell can take a single-member array as its input, which leads it to process the whole member of that array rather than each member of the object separately.
Using the comma operator one can create a single member array in a simple fashion.
Further information about Operators
The following code uses the comma operator to put the object array which is returned by Get-Process into a single member array.
,(Get-Process)
You are now free to use the object in the pipeline and access properties of it.
,(Get-Process) | & {process {if($_.length -ge 10) {return "Greater / equals 10"}else{return "Smaller than 10"}}}

How to force array type?

When I run Get-ChildItem in a directory with only one file, I get a single DirectoryInfo object:
PS H:\> (ls).GetType().Name
DirectoryInfo
As soon as I add a second file, the output becomes an array:
PS H:\> (ls).GetType().Name
Object[]
How should I deal with this dichotomy in a function? Ideally, I'd like to force it to return an Array even when there's only one element, preferably without having to put in conditional logic based on the result of GetType() or Length or whatever.
Use the array operator @(): $Array=@(ls). That operator guarantees that you will have an array even if the pipeline returns zero or one object.
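To see the array subexpression operator @() at work without depending on the contents of a directory, here is a sketch that uses Where-Object filters as stand-ins for commands that return zero, one, or several objects:

```powershell
# Stand-ins for a command that returns zero, one, or several objects.
$none = @(1..3 | Where-Object { $_ -gt 5 })
$one  = @(1..3 | Where-Object { $_ -eq 2 })
$many = @(1..3 | Where-Object { $_ -ge 2 })

$none.GetType().Name   # Object[]
$one.GetType().Name    # Object[]
$many.GetType().Name   # Object[]

$none.Count   # 0
$one.Count    # 1
$many.Count   # 2
```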
Expanding on PetSerAl's answer, you could also cast the type you need more explicitly:
[array](ls) will get you a System.Array object with a single member, so you could use this in a place where you want to avoid creating a new variable but need a specific type
You can also specify arrays that contain only specific types by casting: [int[]]$integersOnly = 1,2,3 will give you a System.Array object that can only hold objects of type [int]
Keep in mind you can use .Net classes - what if you want an array you can modify easily? [System.Collections.ArrayList](ls) does that, enjoy using the Remove() method
A few other hints, while I'm at it:
Want to see what you can do with an object of a specific type? Pipe it to Get-Member, the single most useful command I can think of; it'll show you everything you can do with the object
Curious about a class and what it can do, or looking for details like the different constructors that are available? Just enter [<class_name_here>] and if the assembly is loaded it'll show you everything you want to know

How to create an ArrayList from an Array in PowerShell?

I've got a list of files in an array. I want to enumerate those files, and remove specific files from it. Obviously I can't remove items from an array, so I want to use an ArrayList.
But the following doesn't work for me:
$temp = Get-ResourceFiles
$resourceFiles = New-Object System.Collections.ArrayList($temp)
Where $temp is an Array.
How can I achieve that?
I can't get that constructor to work either. This however seems to work:
# $temp = Get-ResourceFiles
$resourceFiles = New-Object System.Collections.ArrayList($null)
$resourceFiles.AddRange($temp)
You can also pass an integer in the constructor to set an initial capacity.
What do you mean when you say you want to enumerate the files? Why can't you just filter the wanted values into a fresh array?
Edit:
It seems that you can use the array constructor like this:
$resourceFiles = New-Object System.Collections.ArrayList(,$someArray)
Note the comma. I believe what is happening is that when you call a .NET method, you always pass parameters as an array. PowerShell unpacks that array and passes it to the method as separate parameters. In this case, we don't want PowerShell to unpack the array; we want to pass the array as a single unit. Now, the comma operator creates arrays. So PowerShell unpacks the array, then we create the array again with the comma operator. I think that is what is going on.
Probably the shortest version:
[System.Collections.ArrayList]$someArray
It is also faster because it does not call relatively expensive New-Object.
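Both approaches can be tried side by side. In this sketch the file names are made up, and the constructor call spells out -ArgumentList explicitly, which is assumed to be equivalent to the ArrayList(,$someArray) form shown above:

```powershell
# Hypothetical sample data standing in for the file list.
$someArray = 'a.txt', 'b.txt', 'c.txt'

# Constructor form: the unary comma keeps the array together as ONE argument.
$viaConstructor = New-Object System.Collections.ArrayList -ArgumentList (, $someArray)

# Cast form: shorter, and avoids the New-Object call.
$viaCast = [System.Collections.ArrayList]$someArray

# Unlike a fixed-size array, an ArrayList supports removal.
$viaCast.Remove('b.txt')

$viaConstructor.Count   # 3
$viaCast.Count          # 2
$someArray.Count        # 3 (the original array is unchanged)
```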