PowerShell ForEach / Piping confusion

I am using the TFS PowerTools Cmdlets in PowerShell to try to get at some information about Changesets and related WorkItems from my server. I have boiled the problem down to behavior I don't understand and I am hoping it is not TFS specific (so someone out there might be able to explain the problem to me :) )
Here's the only command that I can get to work:
Get-TfsItemHistory C:\myDir -recurse -stopafter 5 | % { Write-Host $_.WorkItems[0]["Title"] }
It does what I expect - Get-TfsItemHistory returns a list of 5 ChangeSets, and it pipes those to a foreach that prints out the Title of the first associated WorkItem. So what's my problem? I am trying to write a large script, and I prefer to code things to look more like a C# program (powershell syntax makes me cry). Whenever I try to do the above written any other way, the WorkItems collection is null.
The following commands (which I interpret to be logically equivalent) do not work (The WorkItems collection is null):
$items = Get-TfsItemHistory C:\myDir -recurse -stopafter 5
$items | ForEach-Object { Write-Host $_.WorkItems[0]["Title"] }
The one I would really prefer:
$items = Get-TfsItemHistory C:\myDir -recurse -stopafter 5
foreach ($item in $items)
{
$item.WorkItems[0]["Title"]
# do lots of other stuff
}
I read an article about the difference between the 'foreach' operator and the ForEach-Object Cmdlet, but that seems to be more of a performance debate. This really appears to be an issue about when the piping is being used.
I'm not sure why all three of these approaches don't work. Any insight is appreciated.

This is indeed confusing. For now a work-around is to grab the items like so:
$items = @(Get-TfsItemHistory . -r -Stopafter 25 |
Foreach {$_.WorkItems.Count > $null; $_})
This accesses the WorkItems collection, which seems to cause the property to be populated (I know - WTF?). I tend to use @() to generate an array in cases where I want to use the foreach keyword. The thing with the foreach keyword is that it will iterate a scalar value, including $null. So if the query returns nothing, $items gets assigned $null and the foreach will iterate the loop once with $item set to null (this is the behavior up to PowerShell v2; since v3, foreach skips a $null collection). Now PowerShell generally deals with nulls very nicely. However, if you hand that value back to the .NET Framework, it usually isn't as forgiving. The @() will guarantee an array with either 0, 1 or N elements in it. If it is 0, then the foreach loop will not execute its body at all.
BTW your last approach - foreach ($item in $items) { ... } - should work just fine.
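A minimal sketch of the @() guarantee described above. Get-Nothing, Get-One and Get-Many are hypothetical stand-ins for a query such as Get-TfsItemHistory; they are not part of the TFS Power Tools.

```powershell
# Hypothetical stand-ins for a command that returns 0, 1 or N objects.
function Get-Nothing { }
function Get-One { 'a' }
function Get-Many { 'a', 'b', 'c' }

# @() always yields an array, whatever the command returns.
$none = @(Get-Nothing)
$one  = @(Get-One)
$many = @(Get-Many)

"{0} {1} {2}" -f $none.Count, $one.Count, $many.Count   # 0 1 3

# With an empty array, the loop body never runs.
$iterations = 0
foreach ($item in $none) { $iterations++ }
$iterations   # 0
```

Wrapping the call in @() means the code after it can rely on .Count and indexing without first checking whether the result is a scalar.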

Related

Powershell ().count output doesn't show data if counter<1 [duplicate]

I have a foreach loop that currently puts three entries in my hashtable:
$result = foreach($key in $serverSpace.Keys){
if($serverSpace[$key] -lt 80){
[pscustomobject]@{
Server = $key
Space = $serverSpace[$key]}}}
When I use
$result.count
I get 3 as expected.
I changed the foreach loop to exclude the entries less than or equal to one using
$result = foreach($key in $serverSpace.Keys){
if($serverSpace[$key] -lt 80 -and $serverSpace[$key] -gt 1){
[pscustomobject]@{
Server = $key
Space = $serverSpace[$key]}}}
$result.count should have 1 as its output, but it doesn't recognize .count as a suggested command and $result.count doesn't output anything anymore. I'm assuming that when there's only one entry in the hash table it won't allow a count? Not sure what's going on, but the conditions in my script depend on the count of $result. Any help would be appreciated.

$result is not a hashtable, so I wrapped it in the array operator: @($result).count. Thank you to @Theo and @Lee_Dailey

What you're seeing is a bug in Windows PowerShell (as of the latest and final version, 5.1), which has since been corrected in PowerShell (Core) - see GitHub issue #3671 for the original bug report.
That is, since v3 all objects should have an intrinsic .Count property, not just collections, in the interest of unified treatment of scalars and collections - see this answer for more information.
The workaround for Windows PowerShell is indeed to force a value to be an array via @(...), the array-subexpression operator, which is guaranteed to have a .Count property, as shown in your answer, but it shouldn't be necessary and indeed isn't anymore in PowerShell (Core, v6+).
# !! Due to a BUG, this outputs $null in *Windows PowerShell*,
# !! but correctly outputs 1 in PowerShell (Core).
([pscustomobject] @{}).Count
# Workaround for Windows PowerShell that is effective in *both* editions,
# though potentially wasteful in PowerShell (Core):
@([pscustomobject] @{}).Count
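Applied to the loop from the question, the workaround could look like the sketch below. The contents of $serverSpace are hypothetical sample values, since the question does not show the real data.

```powershell
# Hypothetical sample data: only srv3 passes both conditions.
$serverSpace = @{ srv1 = 0; srv2 = 1; srv3 = 42 }

# Wrapping the whole foreach statement in @() guarantees an array,
# so .Count works even when exactly one object is emitted.
$result = @(
    foreach ($key in $serverSpace.Keys) {
        if ($serverSpace[$key] -lt 80 -and $serverSpace[$key] -gt 1) {
            [pscustomobject]@{
                Server = $key
                Space  = $serverSpace[$key]
            }
        }
    }
)

$result.Count   # 1
```

Applying @() at assignment time, rather than at each point of use, keeps later conditions such as `if ($result.Count -gt 0)` free of edition-specific handling.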

Can the following Nested foreach loop be simplified in PowerShell?

I have created a script that loops through an array and excludes any variables that are found within a second array.
While the code works; it got me wondering if it could be simplified or piped.
$result = @()
$ItemArray = @("a","b","c","d")
$exclusionArray = @("b","c")
foreach ($Item in $ItemArray)
{
$matchFailover = $false
:gohere
foreach ($ExclusionItem in $exclusionArray)
{
if ($Item -eq $ExclusionItem)
{
Write-Host "Match: $Item = $ExclusionItem"
$matchFailover = $true
break :gohere
}
else{
Write-Host "No Match: $Item != $ExclusionItem"
}
}
if (!($matchFailover))
{
Write-Host "Adding $Item to results"
$result += $Item
}
}
Write-Host "`nResults are"
$result

To give your task a name: You're looking for the relative complement aka set difference between two arrays:
In set-theory notation, it would be $ItemArray \ $ExclusionArray, i.e., those elements in $ItemArray that aren't also in $ExclusionArray.
This related question is looking for the symmetric difference between two sets, i.e., the set of elements that are unique to either side - at least that's what the Compare-Object-based solutions there implement, but only under the assumption that each array has no duplicates.
EyIM's helpful answer is conceptually simple and concise.
A potential problem is performance: a lookup in the exclusion array must be performed for each element in the input array.
With small arrays, this likely won't matter in practice.
With larger arrays, LINQ offers a substantially faster solution:
Note: In order to benefit from the LINQ solution, your arrays should be in memory already, and the benefit is greater the larger the exclusion array is. If your input is streaming via the pipeline, the overhead from executing the pipeline may make attempts to optimize array processing pointless or even counterproductive, in which case sticking with the native PowerShell solution makes sense - see iRon's answer.
# Declare the arrays as [string[]]
# so that calling the LINQ method below works as-is.
# (You could also cast to [string[]] ad hoc.)
[string[]] $ItemArray = 'a','b','c','d'
[string[]] $exclusionArray = 'b','c'
# Return only those elements in $ItemArray that aren't also in $exclusionArray
# and convert the result (a lazy enumerable of type [IEnumerable[string]])
# back to an array to force its evaluation
# (If you directly enumerate the result in a pipeline, that step isn't needed.)
[string[]] [Linq.Enumerable]::Except($ItemArray, $exclusionArray) # -> 'a', 'd'
Note the need to use the LINQ types explicitly, via their static methods, because PowerShell, as of v7, has no support for extension methods.
However, there is a proposal on GitHub to add such support; this related proposal asks for improved support for calling generic methods.
See this answer for an overview of how to currently call LINQ methods from PowerShell.
Performance comparison:
Tip of the hat to iRon for his input.
The following benchmark code uses the Time-Command function to compare the two approaches, using arrays with roughly 4000 and 2000 elements, respectively, which - as in the question - differ by only 2 elements.
Note that in order to level the playing field, the .Where() array method (PSv4+) is used instead of the pipeline-based Where-Object cmdlet, as .Where() is faster with arrays already in memory.
Here are the results averaged over 10 runs; note the relative performance, as shown in the Factor columns; from a single-core Windows 10 VM running Windows PowerShell v5.1.:
Factor Secs (10-run avg.) Command                       TimeSpan
------ ------------------ -------                       --------
1.00   0.046              # LINQ...                     00:00:00.0455381
8.40   0.382              # Where ... -notContains...   00:00:00.3824038
The LINQ solution is substantially faster - by a factor of 8+ (though even the much slower solution only took about 0.4 seconds to run).
It seems that the performance gap is even wider in PowerShell Core, where I've seen a factor of around 19 with v7.0.0-preview.4.; interestingly, both tests ran faster individually than in Windows PowerShell.
Benchmark code:
# Script block to initialize the arrays.
# The filler arrays are randomized to eliminate caching effects in LINQ.
$init = {
$fillerArray = 1..1000 | Get-Random -Count 1000
[string[]] $ItemArray = $fillerArray + 'a' + $fillerArray + 'b' + $fillerArray + 'c' + $fillerArray + 'd'
[string[]] $exclusionArray = $fillerArray + 'b' + $fillerArray + 'c'
}
# Compare the average of 10 runs.
Time-Command -Count 10 { # LINQ
. $init
$result = [string[]] [Linq.Enumerable]::Except($ItemArray, $exclusionArray)
}, { # Where ... -notContains
. $init
$result = $ItemArray.Where({ $exclusionArray -notcontains $_ })
}

You can use Where-Object with -notcontains:
$ItemArray | Where-Object { $exclusionArray -notcontains $_ }
Output:
a, d
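For completeness, here is the same filter run end to end against the arrays from the question. The second variant, which uses the -notin operator (PowerShell v3+), is an addition for illustration and is not part of the original answer.

```powershell
$ItemArray      = @('a', 'b', 'c', 'd')
$exclusionArray = @('b', 'c')

# Pipeline form from the answer; @() keeps the result an array even for 0 or 1 matches.
$viaNotContains = @($ItemArray | Where-Object { $exclusionArray -notcontains $_ })

# Equivalent test with the operands swapped, using -notin.
$viaNotIn = @($ItemArray | Where-Object { $_ -notin $exclusionArray })

$viaNotContains -join ', '   # a, d
```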

Advocating native PowerShell:
As per @mklement0's answer, there is no doubt that Language Integrated Query (LINQ) is //Fast...
But in some circumstances, native PowerShell commands using the pipeline, as suggested by @EylM, can still beat LINQ. This is not just theoretical; it might happen in use cases where the process concerned is idle and waiting for slow input, e.g. where the input comes from:
A remote server (e.g. Active Directory)
A slow device
A separate thread that has to make a complex calculation
The internet ...
Although I haven't seen an easy proof of this yet, it is suggested on several sites and can be deduced from articles such as High Performance PowerShell with LINQ and Ins and Outs of the PowerShell Pipeline.
Proof
To prove the above thesis, I have created a small Slack cmdlet that slows down each item dropped into the pipeline by 1 millisecond (by default):
Function Slack-Object ($Delay = 1) {
process {
Start-Sleep -Milliseconds $Delay
Write-Output $_
}
}; Set-Alias Slack Slack-Object
Now let's see if native PowerShell can actually beat LINQ:
(To get a good performance comparison, caches should be cleared by e.g. starting a fresh PowerShell session.)
[string[]] $InputArray = 1..200
[string[]] $ExclusionArray = 100..300
(Measure-Command {
$Result = [Linq.Enumerable]::Except([string[]] ($InputArray | Slack), $ExclusionArray)
}).TotalMilliseconds
(Measure-Command {
$Result = $InputArray | Slack | Where-Object {$ExclusionArray -notcontains $_}
}).TotalMilliseconds
Results:
LINQ: 411,3721
PowerShell: 366,961
To exclude the LINQ cache, a single-run test should be done, but as commented by @mklement0, the results of single runs might vary from run to run.
The results also highly depend on the size of the input arrays, the size of the result, the slack, the test system, etc.
Conclusion:
PowerShell might still be faster than LINQ in some scenarios!
Quoting mklement0's comment:
"Overall, it's fair to say that the difference in performance is so small in this scenario that it's not worth picking the approach based on performance - and it makes sense to go with the more PowerShell-like approach (Where-Object), given that the LINQ approach is far from obvious. The bottom line is: choose LINQ only if you have large arrays that are already in memory. If the pipeline is involved, the pipeline overhead alone may make optimizations pointless."

Powershell - clarification about foreach

I am learning powershell and I need someone to give me an initial push to get me through the learning curve. I am familiar with programming and dos but not powershell.
What I would like to do is listing all files from my designated directory and pushing the filenames into an array. I am not very familiar with the syntax and when I tried to run my test I was asked about entering parameters.
Could someone please enlighten me and show me the correct way to get what I want?
This is what powershell asked me:
PS D:\ABC> Test.ps1
cmdlet ForEach-Object at command pipeline position 2
Supply values for the following parameters:
Process[0]:
This is my test:
[string]$filePath = "D:\ABC\*.*";
Get-ChildItem $filePath | foreach
{
$myFileList = $_.BaseName;
write-host $_.BaseName
}
Why was ps asking about Process[0]?
I would want to ps to list all the files from the directory and pipe the results to foreach where I put each file into $myFileList array and print out the filename as well.

Don't confuse foreach (the statement) with ForEach-Object (the cmdlet). Microsoft does a terrible job with this because there is an alias of foreach that points to ForEach-Object, so when you use foreach you have to know which version you're using based on how you're using it. Their documentation makes this worse by further conflating the two.
The one you're trying to use in your code is ForEach-Object, so you should use the full name of it to differentiate it. From there, the issue is that the { block starts on the next line.
{} is used in PowerShell for blocks of code related to statements (like while loops) but is also used to denote a [ScriptBlock] object.
When you use ForEach-Object it's expecting a scriptblock, which can be taken positionally, but it must be on the same line.
Conversely, since foreach is a statement, it can use its {} on the next line.
Your code with ForEach-Object:
Get-ChildItem $filePath | ForEach-Object {
$myFileList = $_.BaseName;
write-host $_.BaseName
}
Your code with foreach:
$files = Get-ChildItem $filePath
foreach ($file in $Files)
{
$myFileList = $file.BaseName;
write-host $file.BaseName
}
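Neither version above actually builds an array, because $myFileList is overwritten on each iteration. A minimal sketch of collecting the names in both forms follows; the scratch directory and its two files are hypothetical and are created only so that the example is self-contained.

```powershell
# Hypothetical setup: a scratch directory with two files.
$dir = Join-Path ([IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
'one', 'two' | ForEach-Object {
    New-Item -ItemType File -Path (Join-Path $dir "$_.txt") | Out-Null
}

# Cmdlet form: the opening brace sits on the same line as ForEach-Object.
# Whatever the script block outputs is collected by the assignment.
$myFileList = @(Get-ChildItem $dir | ForEach-Object { $_.BaseName })

# Statement form: the loop's output can be collected the same way.
$myFileList2 = @(foreach ($file in Get-ChildItem $dir) { $file.BaseName })

$myFileList | ForEach-Object { Write-Host $_ }

# Clean up the scratch directory.
Remove-Item $dir -Recurse
```

Assigning the output of the whole loop avoids growing an array with += inside the body.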

How to get Select-Object to return a raw type (e.g. String) rather than PSCustomObject?

The following code gives me an array of PSCustomObjects, how can I get it to return an array of Strings?
$files = Get-ChildItem $directory -Recurse | Select-Object FullName | Where-Object {!($_.psiscontainer)}
(As a secondary question, what's the psiscontainer part for? I copied that from an example online)
Post-Accept Edit: Two great answers, wish I could mark both of them. Have awarded the original answer.

You just need to pick out the property you want from the objects. FullName in this case.
$files = Get-ChildItem $directory -Recurse | Select-Object FullName | Where-Object {!($_.psiscontainer)} | foreach {$_.FullName}
Edit: Explanation for Mark, who asks, "What does the foreach do? What is that enumerating over?"
Sung Meister's explanation is very good, but I'll add a walkthrough here because it could be helpful.
The key concept is the pipeline. Picture a series of pingpong balls rolling down a narrow tube one after the other. These are the objects in the pipeline. Each stage of pipeline--the code segments separated by pipe (|) characters--has a pipe going into it and pipe going out of it. The output of one stage is connected to the input of the next stage. Each stage takes the objects as they arrive, does things to them, and sends them back out into the output pipeline or sends out new, replacement objects.
Get-ChildItem $directory -Recurse
Get-ChildItem walks through the filesystem creating FileSystemInfo objects that represent each file and directory it encounters, and puts them into the pipeline.
Select-Object FullName
Select-Object takes each FileSystemInfo object as it arrives, grabs the FullName property from it (which is a path in this case), puts that property into a brand new custom object it has created, and puts that custom object out into the pipeline.
Where-Object {!($_.psiscontainer)}
This is a filter. It takes each object, examines it, and sends it back out or discards it depending on some condition. Your code here has a bug, by the way. The custom objects that arrive here don't have a psiscontainer property. This stage doesn't actually do anything. Sung Meister's code is better.
foreach {$_.FullName}
Foreach, whose long name is ForEach-Object, grabs each object as it arrives, and here, grabs the FullName property, a string, from it. Now, here is the subtle part: Any value that isn't consumed, that is, isn't captured by a variable or suppressed in some way, is put into the output pipeline. As an experiment, try replacing that stage with this:
foreach {'hello'; $_.FullName; 1; 2; 3}
Actually try it out and examine the output. There are five values in that code block. None of them are consumed. Notice that they all appear in the output. Now try this:
foreach {'hello'; $_.FullName; $x = 1; 2; 3}
Notice that one of the values is being captured by a variable. It doesn't appear in the output pipeline.
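The experiment can be reproduced without touching the file system. In this sketch a single hypothetical string, 'C:\some\file.txt', stands in for the objects that Get-ChildItem would send down the pipeline.

```powershell
# One hypothetical input object in place of the Get-ChildItem output.
$in = 'C:\some\file.txt'

# Nothing is captured: all five values reach the output.
$all = @($in | ForEach-Object { 'hello'; $_; 1; 2; 3 })

# The assignment consumes the value 1, so only four values reach the output.
$captured = @($in | ForEach-Object { 'hello'; $_; $x = 1; 2; 3 })

$all.Count        # 5
$captured.Count   # 4
```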

To get the string for the file name you can use
$files = Get-ChildItem $directory -Recurse | Where-Object {!($_.psiscontainer)} | Select-Object -ExpandProperty FullName
The -ExpandProperty parameter allows you to get back an object based on the type of the property specified.
Further testing shows that this did not work with V1, but that functionality is fixed as of the V2 CTP3.
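A short sketch of the difference, using the temp directory as an arbitrary item to inspect (any file or folder would do):

```powershell
# Any file system item works here; the temp directory is just a convenient choice.
$item = Get-Item ([IO.Path]::GetTempPath())

# Projection: a new custom object that carries only a FullName property.
$wrapped = $item | Select-Object FullName

# Expansion: the value of the property itself.
$raw = $item | Select-Object -ExpandProperty FullName

$wrapped.GetType().Name   # PSCustomObject
$raw.GetType().Name       # String

# The projected object has lost every other property, including PSIsContainer,
# which is why filtering on it after Select-Object FullName has no effect.
$wrapped.PSObject.Properties.Name   # FullName
```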

For Question #1
I have removed the Select-Object portion, since it's redundant, and (unlike dangph's answer) moved the Where-Object filter before foreach: filter as early as possible, so that the next pipeline stage only has to deal with the subset you actually need.
$files = Get-ChildItem $directory -Recurse | Where-Object {!$_.PsIsContainer} | foreach {$_.FullName}
That code snippet essentially reads
Get all files and directories recursively (Get-ChildItem $directory -Recurse)
Filter out directories (Where-Object {!$_.PsIsContainer})
Return full file name only (foreach {$_.FullName})
Save all file names into $files
Note that for foreach {$_.FullName}, in PowerShell, any value in a script block ({...}) that isn't captured or suppressed is returned, in this case $_.FullName of type string
If you really need to get a raw object, you don't need to do anything after getting rid of "select-object". If you were to use Select-Object but want to access raw object, use "PsBase", which is a totally different question(topic) - Refer to "What's up with PSBASE, PSEXTENDED, PSADAPTED, and PSOBJECT?" for more information on that subject
For Question #2
Filtering by !$_.PsIsContainer means that you are excluding container-level objects. In your case, you are running Get-ChildItem on the FileSystem provider (you can list PowerShell providers with Get-PsProvider), so the container is a DirectoryInfo (folder).
What counts as a container differs between PowerShell providers;
e.g., for the Registry provider, a container is of type Microsoft.Win32.RegistryKey.
Try this:
>pushd HKLM:\SOFTWARE
>ls | gm
[UPDATE] to following question: What does the foreach do? What is that enumerating over?
To clarify, "foreach" is an alias for "Foreach-Object"
You can find out through,
get-help foreach
-- or --
get-alias foreach
Now in my answer, "foreach" is enumerating each object instance of type FileInfo returned from previous pipe (which has filtered directories). FileInfo has a property called FullName and that is what "foreach" is enumerating over.
And you reference object passed through pipeline through a special pipeline variable called "$_" which is of type FileInfo within the script block context of "foreach".

For V1, add the following filter to your profile:
filter Get-PropertyValue([string]$name) { $_.$name }
Then you can do this:
gci . -r | ?{!$_.psiscontainer} | Get-PropertyValue fullname
BTW, if you are using the PowerShell Community Extensions you already have this.
Regarding the ability to use Select-Object -Expand in V2, it is a cute trick but not obvious and really isn't what Select-Object nor -Expand was meant for. -Expand is all about flattening like LINQ's SelectMany and Select-Object is about projection of multiple properties onto a custom object.
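The distinction between projection and flattening can be seen with a collection-valued property. The objects and their Tags property below are hypothetical and exist only for illustration.

```powershell
# Two hypothetical objects, each with a collection-valued Tags property.
$objs = @(
    [pscustomobject]@{ Name = 'a'; Tags = @(1, 2) }
    [pscustomobject]@{ Name = 'b'; Tags = @(3) }
)

# Projection: one custom object per input, carrying the selected properties.
$projected = @($objs | Select-Object Name, Tags)

# Flattening: the elements of every Tags collection, emitted one by one
# (comparable to SelectMany in LINQ).
$flat = @($objs | Select-Object -ExpandProperty Tags)

$projected.Count   # 2
$flat.Count        # 3
```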
