As part of a refactor of a large PowerShell program from PS2.0, functions and scripting quick practices to PS5.0, classes and programming best practices, I have been moving to strong typing everywhere and finding some places where that brings up questions. The latest one being hash tables.
With both [Array] and [Hashtable] you can have a mix of content, which then makes enumerating that collection impossible to strongly type. For Arrays you have options like [String[]] or moving to Generic Lists with [System.Collections.Generic.List[String]]. But Dictionaries seem to pose a problem. I can't find a way to create a dictionary and limit it's values to a particular type. Something like [System.Collections.Specialized.OrderedDictionary[Int32,Int32]] fails.
So, IS there a way to make Dictionaries and OrderedDictionaries with strong typing of the index, the value or both? And if there isn't a way, is this considered a bit of a problem that must be overcome, or not a problem and if so why is it not a problem?
You can create generic dictionaries, but as per Create new System.Collections.Generic.Dictionary object fails in PowerShell you have to use a backtick to escape the comma in the list of generic type parameters:
e.g.
PS> $dict = new-object System.Collections.Generic.Dictionary[int`,string]
Note that PowerShell will still do its best to coerce types, so it will only throw an exception if it can't convert the type:
PS> $dict.Add("1", "aaa") # equivalent to $dict.Add(1, "aaa")
And I don't believe there is a generic OrderedDictionary out-of-the-box, which is probably why that fails :-)
PS> new-object System.Collections.Specialized.OrderedDictionary[Int32`,Int32]
New-Object: Cannot find type [System.Collections.Specialized.OrderedDictionary[Int32,Int32]]: verify that the assembly containing this type is loaded.
Related
I am trying to use LINQ in PowerShell. It seems like this should be entirely possible since PowerShell is built on top of the .NET Framework, but I cannot get it to work. For example, when I try the following (contrived) code:
$data = 0..10
[System.Linq.Enumerable]::Where($data, { param($x) $x -gt 5 })
I get the following error:
Cannot find an overload for "Where" and the argument count: "2".
Never mind the fact that this could be accomplished with Where-Object. The point of this question is not to find an idiomatic way of doing this one operation in PowerShell. Some tasks would be light-years easier to do in PowerShell if I could use LINQ.
The problem with your code is that PowerShell cannot decide to which specific delegate type the ScriptBlock instance ({ ... }) should be cast.
So it isn't able to choose a type-concrete delegate instantiation for the generic 2nd parameter of the Where method. And it also does't have syntax to specify a generic parameter explicitly. To resolve this problem, you need to cast the ScriptBlock instance to the right delegate type yourself:
$data = 0..10
[System.Linq.Enumerable]::Where($data, [Func[object,bool]]{ param($x) $x -gt 5 })
Why does [Func[object, bool]] work, but [Func[int, bool]] does not?
Because your $data is [object[]], not [int[]], given that PowerShell creates [object[]] arrays by default; you can, however, construct [int[]] instances explicitly:
$intdata = [int[]]$data
[System.Linq.Enumerable]::Where($intdata, [Func[int,bool]]{ param($x) $x -gt 5 })
To complement PetSerAl's helpful answer with a broader answer to match the question's generic title:
Note: The following applies up to at least PowerShell 7.2. Direct support for LINQ - with syntax comparable to the one in C# - is being discussed for a future version of PowerShell Core in GitHub issue #2226.
Using LINQ in PowerShell:
You need PowerShell v3 or higher.
You cannot call the LINQ extension methods directly on collection instances and instead must invoke the LINQ methods as static methods of the [System.Linq.Enumerable] type to which you pass the input collection as the first argument.
Having to do so takes away the fluidity of the LINQ API, because method chaining is no longer an option. Instead, you must nest static calls, in reverse order.
E.g., instead of $inputCollection.Where(...).OrderBy(...) you must write [Linq.Enumerable]::OrderBy([Linq.Enumerable]::Where($inputCollection, ...), ...)
Helper functions and classes:
Some methods, such as .Select(), have parameters that accept generic Func<> delegates (e.g, Func<T,TResult> can be created using PowerShell code, via a cast applied to a script block; e.g.:
[Func[object, bool]] { $Args[0].ToString() -eq 'foo' }
The first generic type parameter of Func<> delegates must match the type of the elements of the input collection; keep in mind that PowerShell creates [object[]] arrays by default.
Some methods, such as .Contains() and .OrderBy have parameters that accept objects that implement specific interfaces, such as IEqualityComparer<T> and IComparer<T>; additionally, input types may need to implement IEquatable<T> in order for comparisons to work as intended, such as with .Distinct(); all these require compiled classes written, typically, in C# (though you can create them from PowerShell by passing a string with embedded C# code to the Add-Type cmdlet); in PSv5+, however, you may also use custom PowerShell classes, with some limitations.
Generic methods:
Some LINQ methods themselves are generic and therefore require one or more type arguments.
In PowerShell (Core) 7.2- and Windows PowerShell, PowerShell cannot directly call such methods and must use reflection instead, because it only supports inferring type arguments, which cannot be done in this case; e.g.:
# Obtain a [string]-instantiated method of OfType<T>.
$ofTypeString = [Linq.Enumerable].GetMethod("OfType").MakeGenericMethod([string])
# Output only [string] elements in the collection.
# Note how the array must be nested for the method signature to be recognized.
PS> $ofTypeString.Invoke($null, (, ('abc', 12, 'def')))
abc
def
For a more elaborate example, see this answer.
In PowerShell (Core) 7.3+, you now have the option of specifying type arguments explicitly (see the conceptual about_Calling_Generic_Methods help topic); e.g.:
# Output only [string] elements in the collection.
# Note the need to enclose the input array in (...)
# -> 'abc', 'def'
[Linq.Enumerable]::OfType[string](('abc', 12, 'def'))
The LINQ methods return a lazy enumerable rather than an actual collection; that is, what is returned isn't the actual data yet, but something that will produce the data when enumerated.
In contexts where enumeration is automatically performed, notably in the pipeline, you'll be able to use the enumerable as if it were a collection.
However, since the enumerable isn't itself a collection, you cannot get the result count by invoking .Count nor can you index into the iterator; however, you can use member-access enumeration (extracting the values of a property of the objects being enumerated).
If you do need the results as a static array to get the usual collection behavior, wrap the invocation in [Linq.Enumerable]::ToArray(...).
Similar methods that return different data structures exist, such as ::ToList().
For an advanced example, see this answer.
For an overview of all LINQ methods including examples, see this great article.
In short: using LINQ from PowerShell is cumbersome and is only worth the effort if any of the following apply:
you need advanced query features that PowerShell's cmdlets cannot provide.
performance is paramount - see this article.
If you want to achieve LINQ like functionality then PowerShell has some cmdlets and functions, for instance: Select-Object, Where-Object, Sort-Object, Group-Object. It has cmdlets for most of LINQ features like Projection, Restriction, Ordering, Grouping, Partitioning, etc.
See Powershell One-Liners: Collections and LINQ.
For more details on using Linq and possibly how to make it easier, the article LINQ Through Powershell may be helpful.
I ran accross LINQ, when wanting to have a stable sort in PowerShell (stable: if property to sort by has the same value on two (or more) elements: preserve their order). Sort-Object has a -Stable-Switch, but only in PS 6.1+. Also, the Sort()-Implementations in the Generic Collections in .NET are not stable, so I came accross LINQ, where documentation says it's stable.
Here's my (Test-)Code:
# Getting a stable sort in PowerShell, using LINQs OrderBy
# Testdata
# Generate List to Order and insert Data there. o will be sequential Number (original Index), i will be Property to sort for (with duplicates)
$list = [System.Collections.Generic.List[object]]::new()
foreach($i in 1..10000){
$list.Add([PSCustomObject]#{o=$i;i=$i % 50})
}
# Sort Data
# Order Object by using LINQ. Note that OrderBy does not sort. It's using Delayed Evaluation, so it will sort only when GetEnumerator is called.
$propertyToSortBy = "i" # if wanting to sort by another property, set its name here
$scriptBlock = [Scriptblock]::Create("param(`$x) `$x.$propertyToSortBy")
$resInter = [System.Linq.Enumerable]::OrderBy($list, [Func[object,object]]$scriptBlock )
# $resInter.GetEnumerator() | Out-Null
# $resInter is of Type System.Linq.OrderedEnumerable<...>. We'll copy results to a new Generic List
$res = [System.Collections.Generic.List[object]]::new()
foreach($elem in $resInter.GetEnumerator()){
$res.Add($elem)
}
# Validation
# Check Results. If PropertyToSort is the same as in previous record, but previous sequence-number is higher, than the Sort has not been stable
$propertyToSortBy = "i" ; $originalOrderProp = "o"
for($i = 1; $i -lt $res.Count ; $i++){
if(($res[$i-1].$propertyToSortBy -eq $res[$i].$propertyToSortBy) -and ($res[$i-1].$originalOrderProp -gt $res[$i].$originalOrderProp)){
Write-host "Error on line $i - Sort is not Stable! $($res[$i]), Previous: $($res[$i-1])"
}
}
There is a simple way to make Linq chaining fluent, by setting a using statement to the Linq namespace, Then you can call the where function directly, no need to call the static Where function.
using namespace System.Linq
$b.Where({$_ -gt 0})
$b is an array of bytes, and I want to get all bytes that are greater than 0.
Works perfect.
Well i do get a simple (?) issue : i do not manage to remove a given hashtable from an ArrayList :
$testarraylist = New-Object System.Collections.ArrayList
$testarraylist.Add(1)
$testarraylist.Add(2)
$testarraylist.Add(3)
$testarraylist.Add(4)
$testarraylist.Add(#{1=1})
$testarraylist.Add(#{2=2})
$testarraylist.Add(#{3=3})
$testarraylist.Add(5)
$testarraylist.Add(6)
$testarraylist.Add(7)
i can remove simple "list" element, but i fail at removing the hashtable one
$testarraylist.remove(x)
the only way i found i by using
$testarraylist.removeat(4)
Which work, but aren't there an easier way ?
I searched quit a bit and i found lots of exemple but strangely none on this specific case.
Well maybe not strange as it may be super easy so that's why no one ever had to ask this question ?
or my google skill are failling me ... ?
thanks in advance.
This all comes from how the ArrayList will look for the object you want to remove. If you have a look at the .NET specification again, numbers like System.Int32 are a value types, but collections like System.Collections.Hashtable are reference types.
Basically, what it means, value types are always passed "by value", but for reference types, only a "reference" to that instance is passed.
You can try that out, so 1 -eq 1, because both have the same value, but #{1=1} -ne #{1=1}, because they are two separate instances, and thus two different references.
So, what you would have to do, is store the reference to the original instance in a variable first:
$h = #{1=1}
$testarraylist.Add($h)
$testarraylist.Remove($h)
Because this is such a basic and important concept in .NET, and basically all programming languages, I recommend you to take a few minutes and read more about it.
I can define an array as a generic list like this
$array = [Collections.Generic.List[String]]#()
And I can define an element in a hash table as an array like this
$hash = #{
array = #()
}
But I can't define an element in a hash table as a Generic List, like this
$hash = #{
array = [Collections.Generic.List[String]]#()
}
Instead I get this error
Cannot convert the "System.Object[]" value of type "System.Object[]"
to type "System.Collections.Generic.List`1[System.String]
I have been using Generic Lists to avoid the (minor in my case, to be sure) performance issue with regularly adding to a standard array. But this is the first time I have needed to create a hash table that contains a generic list (for a complex return value).
So, first question, is this is even possible? And second question, what is the difference under the hood between simply setting a variable and a hash table element?
EDIT: This is interesting. I CAN use
[System.Collections.ArrayList]#()
and it works. So, now I am curious what exactly is the difference between
[System.Collections.ArrayList]
and
[Collections.Generic.List[String]]
I guess this is the down side of being self taught. I found reference to [Collections.Generic.List[String]] on a BLOG, and maybe [System.Collections.ArrayList] is a much better answer? What I think I understand from this is that the former is specifically typed as a list of strings, while the latter is a list of generic objects, which then must be cast in use, which has potential bug and performance issues. Still, I wonder why the typed generic doesn't work in a hash table.
When I run Get-ChildItem in a directory with only one file, I get a single DirectoryInfo object:
PS H:\> (ls).GetType().Name
DirectoryInfo
As soon as I add a second file, the output becomes an array:
PS H:\> (ls).GetType().Name
Object[]
How should I deal with this dichotomy in a function? Ideally, I'd like to force it to return an Array even when there's only one element, preferably without having to put in conditional logic based on the result of GetType() or Length or whatever.
Use array operator #(): $Array=#(ls). That operator guaranteed that you will have an array even if pipeline return zero or one object.
Expanding on PetSerAl's answer, you could also cast the type you need more explicitly:
[array](ls) will get you a System.Array object with a single member, so you could use this in a place where you want to avoid creating a new variable but need a specific type
You can also specify arrays that contain only specific types by casting: [int[]]$integersOnly = 1,2,3 will give you a System.Array object that can only hold objects of type [int]
Keep in mind you can use .Net classes - what if you want an array you can modify easily? [System.Collections.ArrayList](ls) does that, enjoy using the Remove() method
A few other hints, while I'm at it:
Want to see what you can do with an object of a specific type? Pipe it to Get-Member, the single most useful command I can think of; it'll show you everything you can do with the object
Curious about a class and what it can do, or looking for details like the different constructors that are available? Just enter [<class_name_here>] and if the assembly is loaded it'll show you everything you want to know
I want to create a List<Tuple<int,string>> in PowerShell, but
New-Object System.Collections.Generic.List[System.Collections.Generic.Tuple[int,string]]
does not work. What am I missing?
Lee's answer is the correct way to create a List of Tuples (although you can make the statement much shorter by omitting the System namespace). However, the better questions to ask in while programming in PowerShell are:
Why should I return a strongly typed object?
Do I really want to output a list?
The first one has its pros and cons. Strongly typed objects are useful to return if they have methods or events that will be useful for the next step in the pipeline. If, on the other hand, you just want to return a bunch of items with a name an int, I'd use something like:
[PSCustomObject]#{string="string";int=1}
This will create a property bag (what most devs know as a Tuple) containing the data you need, with more descriptive names than a .NET tuple will give you on the object. It's also pretty fast. If, on the other hand, the data is meant for an API, then by all means created the strongly typed object it expects.
The second question is a little bit harder to understand but has a much clearer answer. In many cases, you'll want to accept input for another function from the output of one function. In this, for many reasons, a strongly typed list is not your best friend. Strongly typed lists do not always clearly convert into arrays (this is especially true for generics), and, as arguments to a function, severely limit the different types of data you can put into the function. They also end up providing a little bit of a misleading and harder to use output (especially when piping in objects and producing multiple results), since the whole list will be displayed as one outputted item, instead of displaying each item on its own. Most annoyingly, strongly typed lists behave differently than arrays in PowerShell when you "over-index" (i.e. ask for item 10000 in a list of 5 items) Arrays will quietly return null. Lists will barf loudly. More practically, accumulating items into a list and then outputting the list will "hold" the pipeline until all items are in. This may be what you want, but in most cases it's nice to see output coming out of a function as it runs. Finally, lists add to the memory overhead of the function, as you need to accumulate a set of objects in the function's stack.
What I generally do is simply emit multiple objects. That is, I avoid using the return keyword and I take advantage of PowerShell's ability to return objects that are not captured into a variable. If I assign the result into a variable, the items will be accumulated within an arraylist and returned to you as an array. This quick little demonstration function shows you how.
function Get-RandomData {
param($count = 10)
foreach ($n in 1..$count){
[PSCustomObject]#{Name="Number$n";Number=Get-Random}
}
}
It's worth noting that specialized collections are still quite useful. I very often use Queues and Stacks when the need arises. However, I very rarely find myself using generics or lists unless I am working with a part of .NET that specifically requires generics or lists. This is pretty personally ironic, since I was the person who tested support for generics in PowerShell V2. It's absolutely required when you want to work with a piece of .NET that can only take a list of tuples. It's slightly to severely counterproductive in all other cases.
You can create it with:
New-Object 'Collections.Generic.List[Tuple[int,string]]'
You are spelling Generic wrongly, and the Tuple types are in the System namespace, not System.Collections.Generic.
This is what worked for me. I am also providing sample code to insert into the list:
$myList = New-Object System.Collections.ArrayList
#add range
$myList.AddRange((
[Tuple]::Create(1,"string 1"),
[Tuple]::Create(2,"string 2"),
[Tuple]::Create(3,"string 3")
));
#add single item
$myList.Add([Tuple]::Create(4,"string 4"))
#create variable and add to list
$myTuple = [Tuple]::Create(5,"string 5")
$myList.Add( $myTuple)
Write-Host $myList
Reference:
Using and Understanding Tuples in PowerShell
You can create a tuple with up to 7 elements like this:
$tuple = [tuple]::Create(1,2,3,4,5,6,7)
and you can get the value of an element by naming its item (starting with item1):
$tuple.item1
1
If you have 8 or more elements then use another tuple for the 8th element:
$tuple = [tuple]::Create(1,2,3,4,5,6,7,[tuple]::create(8,9))
internally, the 8th element is called "rest". You can get the values like this:
$tuple.rest.item1.item1
8
If you need to specify a type for each element then do it in front of each value:
$tuple = [tuple]::create([string]"a", [int]1, [byte]255)
Finally, adding a tuple to a list works like this:
$list = New-Object 'Collections.ArrayList'
$tuple = [tuple]::create([string]"a", [int]1, [byte]255)
$list.add($tuple)
There is no need to specify the tuple-details for creating the list.
Keep in mind, that you cannot change a value of a tuple later and the default sort-method sorts ascendig in the order of items1, item2 etc. (or you need a custom IComparer), but using them is super fast (way faster than working with large lists of PsObject or PSCustomObject and also faster than an import-Csv)!