Intrinsic index of item in a Powershell filtered item (using Where-Object for example) - powershell

Is there any way of getting the intrinsic numeric index (order#)
of the selected/filtered/matched object
from a Where-Object processing?
for example:
Get-Process | Where-Object -property id -eq 1024
without using further code...
?is it possible to get the index of the object with ID=4
from some 'inner/hidden' Powershell mechanism???
or instruct Where-Object to spit out the index
where the match(es) took place?
(in this case would be 1, 0 is the 'idle' process)

You could capture the result of the Get-Process cmdlet as array, and use the IndexOf() method to get the index or -1 if that Id is not found:
$gp = (Get-Process).Id
$gp.IndexOf(1024)

No, there is no built-in mechanism for what you're looking for as of PowerShell 7.1
If only one item is to be matched - as in your case - you can use the Array type's static .FindIndex() method:
$processes = Get-Process
# Find the 0-based index of the process with ID 1024.
[Array]::FindIndex($processes, [Predicate[object]] { param($o) $o.Id -eq 1024 })
Note that this returns a zero-based index if a match is found, and -1 otherwise.
The ::FindIndex() method has the advantage of searching only for the first match, unlike Where-Object, which as of PowerShell 7.1 always searches the entire input collection (see below). As a method rather than a pipeline-based cmdlet, it invariably requires the input array to be in memory in full (which the pipeline doesn't require).
While it wouldn't directly address your use case, note that there's conceptually related feature request #13772 on GitHub, which proposes introducing an automatic $PSIndex variable to be made available inside ForEach-Object and Where-Object script blocks.
As an aside:
Note that while [Array]::FindIndex only ever finds the first match's index, Where-Object is limited in the opposite way: as of PowerShell 7.1, it always finds all matches, which is inefficient if you're only looking for one match.
While the related .Where() array method does offer a way to stop processing after the first match (e.g.,
('long', 'foo', 'bar').Where({ $_.Length -eq 3 }, 'First')), methods operate on in-memory collections only, so it would be helpful if the pipeline-based Where-Object cmdlet supported such a feature as well - see GitHub feature request #13834.

Or something like this. Add a property called line that keeps increasing for each object.
ps | % { $line = 1 } { $_ | add-member line ($line++) -PassThru } | where id -eq 13924 |
select line,id
line Id
---- --
184 13924

Related

How to pipe results into output array

After playing around with some powershell script for a while i was wondering if there is a version of this without using c#. It feels like i am missing some information on how to pipe things properly.
$packages = Get-ChildItem "C:\Users\A\Downloads" -Filter "*.nupkg" |
%{ $_.Name }
# Select-String -Pattern "(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)" |
# %{ #($_.Matches[0].Groups["packageId"].Value, $_.Matches[0].Groups["version"].Value) }
foreach ($package in $packages){
$match = [System.Text.RegularExpressions.Regex]::Match($package, "(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)")
Write-Host "$($match.Groups["packageId"].Value) - $($match.Groups["version"].Value)"
}
Originally i tried to do this with powershell only and thought that with #(1,2,3) you could create an array.
I ended up bypassing the issue by doing the regex with c# instead of powershell, which works, but i am curious how this would have been done with powershell only.
While there are 4 packages, doing just the powershell version produced 8 lines. So accessing my data like $packages[0][0] to get a package id never worked because the 8 lines were strings while i expected 4 arrays to be returned
Terminology note re without using c#: You mean without direct use of .NET APIs. By contrast, C# is just another .NET-based language that can make use of such APIs, just like PowerShell itself.
Note:
The next section answers the following question: How can I avoid direct calls to .NET APIs for my regex-matching code in favor of using PowerShell-native commands (operators, automatic variables)?
See the bottom section for the Select-String solution that was your true objective; the tl;dr is:
# Note the `, `, which ensures that the array is output *as a single object*
%{ , #($_.Matches[0].Groups["packageId"].Value, $_.Matches[0].Groups["version"].Value) }
The PowerShell-native (near-)equivalent of your code is (note tha the assumption is that $package contains the content of the input file):
# Caveat: -match is case-INSENSITIVE; use -cmatch for case-sensitive matching.
if ($package -match '(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)') {
"$($Matches['packageId']) - $($Matches['Version'])"
}
-match, the regular-expression matching operator, is the equivalent of [System.Text.RegularExpressions.Regex]::Match() (which you can shorten to [regex]::Match()) in that it only looks for (at most) one match.
Caveat re case-sensitivity: -match (and its rarely used alias -imatch) is case-insensitive by default, as all PowerShell operators are; for case-sensitive matching, use the c-prefixed variant, -cmatch.
By contrast, .NET APIs are case-sensitive by default; you'd have to pass the [System.Text.RegularExpressions.RegexOptions]::IgnoreCase flag to [regex]::Match() for case-insensitive matching (you may use 'IgnoreCase', which PowerShell auto-converts for you).
As of PowerShell 7.2.x, there is no operator that is the equivalent of the related return-ALL-matches .NET API, [regex]::Matches(). See GitHub issue #7867 for a green-lit but yet-to-be-implemented proposal to introduce one, named -matchall.
However, instead of directly returning an object describing what was (or wasn't) matched, -match returns a Boolean, i.e. $true or $false, to indicate whether matching succeeded.
Only if -match returns $true does information about a match become available, namely via the automatic $Matches variable, which is a hashtable reflecting the matching parts of the input string: entry 0 is always the full match, with optional additional entries reflecting what any capture groups ((...)) captured, either by index, if they're anonymous (starting with 1) or, as in your case, for named capture groups ((?<name>...)) by name.
Syntax note: Given that PowerShell allows use of dot notation (property-access syntax) even with hashtables, the above command could have used $Matches.packageId instead of $Matches['packageId'], for instance, which also works with the numeric (index-based) entries, e.g., $Matches.0 instead of $Matches[0]
Caveat: If an array (enumerable) is used as the LHS operand, -match' behavior changes:
$Matches is not populated.
filtering is performed; that is, instead of returning a Boolean indicating whether matching succeeded, the subarray of matching input strings is returned.
Note that the $Matches hashtable only provides the matched strings, not also metadata such as index and length, as found in [regex]::Match()'s return object, which is of type [System.Text.RegularExpressions.Match].
Select-String solution:
$packages |
Select-String '(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)' |
ForEach-Object {
"$($_.Matches[0].Groups['packageId'].Value) - $($_.Matches[0].Groups['version'].Value)"
}
Select-String outputs Microsoft.PowerShell.Commands.MatchInfo instances, whose .Matches collection contains one or more [System.Text.RegularExpressions.Match] instances, i.e. instances of the same type as returned by [regex]::Match()
Unless -AllMatches is also passed, .Matches only ever has one entry, hence the use of [0] to target that entry above.
As you can see, working with Select-Object's output objects requires you to ultimately work with the same .NET type as when you call [regex]::Match() directly.
However, no method calls are required, and discovering the properties of the output objects is made easy in PowerShell via the Get-Member cmdlet.
If you want to capture the matches in a jagged array:
$capturedStrings = #(
$packages |
Select-String '(?<packageId>[^\d]+)\.(?<version>[\w\d\.-]+)(?=.nupkg)' |
ForEach-Object {
# Output an array of all capture-group matches,
# *as a single object* (note the `, `)
, $_.Matches[0].Groups.Where({ $_.Name -ne '0' }).Value
}
)
This returns an array of arrays, each element of which is the array of capture-group matches for a given package, so that $capturedStrings[0][0] returns the packageId value for the first package, for instance.
Note:
$_.Matches[0].Groups.Where({ $_.Name -ne '0' }).Value programmatically enumerates all capture-group matches and returns an their .Value property values as an array, using member-access enumeration; note how name '0' must be excluded, as it represents the whole match.
With the capture groups in your specific regex, the above is equivalent to the following, as shown in a commented-out line in your question:
#($_.Matches[0].Groups['packageId'].Value, $_.Matches[0].Groups['version'].Value)
, ..., the unary form of the array-construction operator, is used as a shortcut for outputting the array (symbolized by ... here) as a whole, as a single object. By default, enumeration would occur and the elements would be emitted one by one. , ... is in effect a shortcut to the conceptually clearer Write-Output -NoEnumerate ... - see this answer for an explanation of the technique.
Additionally, #(...), the array subexpression operator is needed in order to ensure that a jagged array (nested array) is returned even in the event that only one array is returned across all $packages.

PowerShell: How to get count of piped collection?

Suppose I have a process that generates a collection of objects. For a very simple example, consider $(1 | get-member). I can get the number of objects generated:
PS C:\WINDOWS\system32> $(1 | get-member).count
21
or I can do something with those objects.
PS C:\WINDOWS\system32> $(1 | get-member) | ForEach-object {write-host $_.name}
CompareTo
Equals
...
With only 21 objects, doing the above is no problem. But what if the process generates hundreds of thousands of objects? Then I don't want to run the process once just to count the objects and then run it again to execute what I want to do with them. So how can I get a count of objects in a collection sent down the pipeline?
A similar question was asked before, and the accepted answer was to use a counter variable inside the script block that works on the collection. The problem is that I already have that counter and what I want is to check that the outcome of that counter is correct. So I don't want to just count inside the script block. I want a separate, independent measure of the size of the collection that I sent down the pipeline. How can I do that?
If processing and counting is needed:
Doing your own counting inside a ForEach-Object script block is your best bet to avoid processing in two passes.
The problem is that I already have that counter and what I want is to check that the outcome of that counter is correct.
ForEach-Object is reliably invoked for each and every input object, including $null values, so there should be no need to double-check.
If you want a cleaner separation of processing and counting, you can pass multiple -Process script blocks to ForEach-Object (in this example, { $_ + 1 } is the input-processing script block and { ++$count } is the input-counting one):
PS> 1..5 | ForEach-Object -Begin { $count = 0 } `
-Process { $_ + 1 }, { ++$count } `
-End { "--- count: $count" }
2
3
4
5
6
--- count: 5
Note that, due to a quirk in ForEach-Object's parameter binding, passing -Begin and -End script blocks is actually required in order to pass multiple -Process (per-input-object) blocks; pass $null if you don't actually need -Begin and/or -End - see GitHub issue #4513.
Also note that the $count variable lives in the caller's scope, and is not scoped to the ForEach-Object call; that is, $count = 0 potentially updates a preexisting $count variable, and, if it didn't previously exist, lives on after the ForEach-Object call.
If only counting is needed:
Measure-Object is the cmdlet to use with large, streaming input collections in the pipeline[1]:
The following example generates 100,000 integers one by one and has Measure-Object count them one by one, without collecting the entire input in memory.
PS> (& { $i=0; while ($i -lt 1e5) { (++$i) } } | Measure-Object).Count
100000
Caveat: Measure-Object ignores $null values in the input collection - see GitHub issue #10905.
Note that while counting input objects is Measure-Object's default behavior, it supports a variety of other operations as well, such as summing -Sum and averaging (-Average), optionally combined in a single invocation.
[1] Measure-Object, as a cmdlet, is capable of processing input in a streaming fashion, meaning it counts objects it receives one by one, as they're being received, which means that even very large streaming input sets (those also created one by one, such as enumerating the rows of a large CSV file with Import-Csv) can be processed without the risk of running out of memory - there is no need to load the input collection as a whole into memory. However, if (a) the input collection already is in memory, or (b) it can fit into memory and performance is important, then use (...).Count.

PowerShell: What is the point of ForEach-Object with InputObject?

The documentation for ForEach-object says "When you use the InputObject parameter with ForEach-Object, instead of piping command results to ForEach-Object, the InputObject value is treated as a single object." This behavior can easily be observed directly:
PS C:\WINDOWS\system32> ForEach-Object -InputObject #(1, 2, 3) {write-host $_}
1 2 3
This seems weird. What is the point of a "ForEach" if there is no "each" to do "for" on? Is there really no way to get ForEach-object to act directly on the individual elements of an array without piping? if not, it seems that ForEach-Object with InputObject is completely useless. Is there something I don't understand about that?
In the case of ForEach-Object, or any cmdlet designed to operate on a collection, using the -InputObject as a direct parameter doesn't make sense because the cmdlet is designed to operate on a collection, which needs to be unrolled and processed one element at a time. However, I would also not call the parameter "useless" because it still needs to be defined so it can be set to allow input via the pipeline.
Why is it this way?
-InputObject is, by convention, a generic parameter name for what should be considered to be pipeline input. It's a parameter with [Parameter(ValueFromPipeline = $true)] set to it, and as such is better suited to take input from the pipeline rather passed as a direct argument. The main drawback of passing it in as a direct argument is that the collection is not guaranteed to be unwrapped, and may exhibit some other behavior that may not be intended. From the about_pipelines page linked to above:
When you pipe multiple objects to a command, PowerShell sends the objects to the command one at a time. When you use a command parameter, the objects are sent as a single array object. This minor difference has significant consequences.
To explain the above quote in different words, passing in a collection (e.g. an array or a list) through the pipeline will automatically unroll the collection and pass it to the next command in the pipeline one at a time. The cmdlet does not unroll -InputObject itself, the data is delivered one element at a time. This is why you might see problems when passing a collection to the -InputObject parameter directly - because the cmdlet is probably not designed to unroll a collection itself, it expects each collection element to be handed to it in a piecemeal fashion.
Consider the following example:
# Array of hashes with a common key
$myHash = #{name = 'Alex'}, #{name='Bob'}, #{name = 'Sarah'}
# This works as intended
$myHash | Where-Object { $_.name -match 'alex' }
The above code outputs the following as expected:
Name Value
---- -----
name Alex
But if you pass the hash as InputArgument directly like this:
Where-Object -InputObject $myHash { $_.name -match 'alex' }
It returns the whole collection, because -InputObject was never unrolled as it is when passed in via the pipeline, but in this context $_.name -match 'alex' still returns true. In other words, when providing a collection as a direct parameter to -InputObject, it's treated as a single object rather than executing each time against each element in the collection. This can also give the appearance of working as expected when checking for a false condition against that data set:
Where-Object -InputObject $myHash { $_.name -match 'frodo' }
which ends up returning nothing, because even in this context frodo is not the value of any of the name keys in the collection of hashes.
In short, if something expects the input to be passed in as pipeline input, it's usually, if not always, a safer bet to do it that way, especially when passing in a collection. However, if you are working with a non-collection, then there is likely no issue if you opt to use the -InputObject parameter directly.
Bender the Greatest's helpful answer explains the current behavior well.
For the vast majority of cmdlets, direct use of the -InputObject parameter is indeed pointless and the parameter should be considered an implementation detail whose sole purpose is to facilitate pipeline input.
There are exceptions, however, such as the Get-Member cmdlet, where direct use of -InputObject allows you to inspect the type of a collection itself, whereas providing that collection via the pipeline would report information about its elements' types.
Given how things currently work, it is quite unfortunate that the -InputObject features so prominently in most cmdlets' help topics, alongside "real" parameters, and does not frame the issue with enough clarity (as of this writing): The description should clearly convey the message "Don't use this parameter directly, use the pipeline instead".
This GitHub issue provides an categorized overview of which cmdlets process direct -InputObject arguments how.
Taking a step back:
While technically a breaking change, it would make sense for -InputObject parameters (or any pipeline-binding parameter) to by default accept and enumerate collections even when they're passed by direct argument rather than via the pipeline, in a manner that is transparent to the implementing command.
This would put direct-argument input on par with pipeline input, with the added benefit of the former resulting in faster processing of already-in-memory collections.

Passing subset of objects down pipeline, based on count of properties?

I need to script up some things in PowerCLI (VMWare's bolt on to PowerShell). Basically we have a server cluster with three hosts. Each host has multiple virtual switches. Each virtual switch has multiple vlans ('port groups' in VMWare speak). I need to audit the fact that the same port groups exist on each host (so things keep working if the VM is moved).
Step 1 to achieving this is would be to know that the port group name exists on each of the three host machines.
I'm falling over with how to filter some objects out of all the ones returned by a cmdlet, based on number of results returned from a property of those objects. I then need to perform further operations with original object type that passes the filter test to go on down the pipeline.
To give some specifics, this an example showing 'Some PortGroup Name' and the three hosts it exists on (and as a bonus, the vSwitch):
Get-VirtualPortGroup -Name 'Some PortGroup Name' |
Select-Object Name, VMHostID, VirtualSwitchId
produces the output
Name VMHostId VirtualSwitchId
---- -------- ---------------
Some PortGroup Name HostSystem-host-29459 key-vim.host.VirtualSwitch-vSwitch6
Some PortGroup Name HostSystem-host-29463 key-vim.host.VirtualSwitch-vSwitch6
Some PortGroup Name HostSystem-host-29471 key-vim.host.VirtualSwitch-vSwitch6
Instead of 3, I'm starting with the 1849 port group names that are being returned by Get-VirtualPortGroup. I need a pipeline to whittle the number of VirtualPortGroup objects down to a collection consisting of only those objects where a count of the 'VMHostId' property is less than 3, and pass the remaining VirtualPortGroup objects down the pipeline for further processing.
This seems simple enough to do. I'm still failing though.
The following almost works. Piping it to measure shows a count of 229, instead of the original 1849 (so it's definitely filtered a lot out, and is possibly correctly returning the subset I'm after...?). The problem is, the object type is now a 'Group' or something at this point in the pipeline, and doesn't have all the properties and methods of the original Get-VirtualPortGroup objects.
Get-VirtualPortGroup |
Group-Object -Property Name |
Where-Object $_.Count -lt 3
Bolting a | Select-Object -ExpandProperty Group to the end of the above seemed promising, except it then seems to return the entire collection of Get-VirtualPortGroup objects as though I had done no filtering in there at all....
Am I doing something fundamentally wrong?
How can I filter out objects based on the count of the number of results returned by a specific property of an object, but still pass the original object type down the pipe?
Your approach is correct, but you got the Where-Object syntax wrong. The abbreviated syntax is:
Where-Object <property> <op> <value>
without the current object variable ($_). In your case that would be:
Where-Object Count -lt 3
Otherwise you must use scriptblock notation:
Where-Object { $_.Count -lt 3 }
This should do what you want:
Get-VirtualPortGroup |
Group-Object -Property Name |
Where-Object { $_.Count -lt 3 } |
Select-Object -Expand Group

using powershell and pipeing output od Select-Object to access selected columns

I have the power shell below that selectes certain fields
dir -Path E:\scripts\br\test | Get-FileMetaData | Select-Object name, Comments, Path, Rating
what i want to do is utilize Name,Comments,Path,Rating in further Pipes $_.name etc dosnt work
If I understand your question correctly, you want to do something with the output of Select-Object, but you want to do it in a pipeline.
To do this, you need to pass the output down the pipeline into a Cmdlet that accepts pipeline input (such as ForEach-Object). If the next operation in the pipeline does not accept pipeline input, you will have to set the output to a variable and access the information through the variable,
Using ForEach-Object
In this method, you will be processing each object individually. This will be similar to the first option in Method 1 (that is, dealing with individual items in the collection of items returned by Select-Object).
dir | Get-FileMetaData | Select-Object Name,Comments,Path,Rating | ForEach-Object {
# Do stuff with $_
# Note that $_ is a single item in the collection returned by Select-Object
}
The variable method is included in case your next Cmdlet does not accept pipeline input.
Using Variable
In this method, you will treat $tempVariable as an array and you can operate on each item. If need be, you can actually access each column individually, getting everything at once.
$tempVariable = dir | Get-FileMetaData | Select-Object Name,Comments,Path,Rating
# Do stuff with each Name by using $tempVariable[i].Name, etc.
# Or do stuff with all Names by using $tempVariable.Name, etc.