How does pipeline and $_ work in PowerShell?

1. Get-Process | Get-Member $_, why doesn't this work? If $_ represents the current object in the pipeline, shouldn't the above return the members of each process object?
2. Will a cmdlet send all its output objects to the next cmdlet at once, or one by one as each object becomes available?
3. As #1 does not work, when exactly can I use the $_ variable? The conditions under which this variable gets created would be more helpful than just a cmdlet example demonstrating the use of $_.

$_ is an automatic variable that is available for use within the scriptblock input of certain cmdlets to represent the current item in the pipeline.
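Get-Process | Get-Member $_ doesn't work because $_ is only populated inside a script block that a cmdlet runs for each pipeline item. Here $_ is just an ordinary argument to Get-Member, evaluated outside any such script block; the process objects already reach Get-Member through the |, and you have no way to hook into how Get-Member handles them internally.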
You could do this:
Get-Process | ForEach-Object {
$_ | Get-Member
}
You would then get a Get-Member output for every item in the collection of objects output by Get-Process, although this would be redundant as each would be the same.
Cmdlets do send the objects down the pipeline one at a time. You can see that with this example:
Get-Process | ForEach-Object {
$_
Start-Sleep 1
}
You can see with the added delay that the results are arriving in the ForEach-Object one at a time as soon as they are available.
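The same one-at-a-time behaviour can be made visible without a delay. The following is a minimal sketch that logs when each stage handles an item; the $log list and the produce/consume labels are illustrative names, not part of the answer above.

```powershell
# Record the order in which two pipeline stages see each item.
$log = [System.Collections.Generic.List[string]]::new()

1..3 |
    ForEach-Object { $log.Add("produce $_"); $_ } |   # first stage passes the item on
    ForEach-Object { $log.Add("consume $_") }         # second stage receives it

# If objects stream one at a time, the entries alternate between the stages.
$log
```

If the first stage finished all items before the second stage started, the three produce entries would come first; with streaming they interleave.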
Other places you can use the $_ variable are in Where-Object and Select-Object. For example:
Get-Process | Where-Object { $_.name -like 'win*' }
Here the Where-Object cmdlet takes each item from the pipeline, and we use $_ to access the name property of that item to see whether it starts with the string win. If it does, the item is sent onwards (and so comes out to the console); if it doesn't, Where-Object discards it.
You can use $_ in a Select-Object when doing calculated properties. For example:
Get-Process | Select-Object name,@{N='WorkingSetGB';E={$_.WorkingSet / 1GB}}
Here we use $_ to get at the WorkingSet property of each item and then convert it to a GB value by using / 1GB.
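As for when $_ gets created: besides script blocks passed to cmdlets such as ForEach-Object, Where-Object and Select-Object, it is also set inside the process block of a function that receives pipeline input. A minimal sketch, where Get-Doubled is a hypothetical function name used only for illustration:

```powershell
# Hypothetical helper: the process block runs once per pipeline object,
# and $_ holds the current object on each run.
function Get-Doubled {
    process {
        $_ * 2
    }
}

$doubled = 1..3 | Get-Doubled
$doubled
```

Outside such a per-item block, for example as a plain argument on the command line, $_ is not filled in by the pipeline.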

Yes, $_ represents the current object in the pipeline but since Get-Member takes a pipeline input you just have to pipe the result to the cmdlet:
Get-Process | Get-Member
Another example is
Get-Process | Export-Csv MyFile.csv
Here again, $_ is not needed, because Export-Csv takes pipeline input and receives the output of Get-Process, one process at a time, through the pipeline. There is a loop inside the implementation of Export-Csv, but that need not concern you here.
You typically use $_ when you pipe an object to the ForEach-Object cmdlet:
Get-Process | ForEach-Object {
Write-Host $_.Name
}
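To illustrate why cmdlets such as Get-Member or Export-Csv need no $_, here is a sketch of a function that accepts pipeline input through a parameter. Get-NameLength and its $Name parameter are hypothetical names chosen for this example; real cmdlets are implemented differently, but the binding idea is the same.

```powershell
# Hypothetical function: the pipeline binds each incoming object to $Name,
# and the process block plays the role of the "loop inside" the command.
function Get-NameLength {
    param(
        [Parameter(ValueFromPipeline)]
        [string] $Name
    )
    process {
        [pscustomobject]@{ Name = $Name; Length = $Name.Length }
    }
}

$lengths = 'alpha', 'be' | Get-NameLength
$lengths
```

The caller simply pipes values in; the current item is available inside the function as $Name without any reference to $_.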

Related

Powershell Performance tuning for aggregation operation on big delimited files

I have a delimited file with 350 columns. The delimiter is \034(Field separator).
I have to extract a particular column value and find out the count of each distinct value of that column in the file. If the count of distinct value is greater or equal to 2, I need to output it to a file.
The source file is 1GB. I have written the following command. It is very slow.
Get-Content E:\Test\test.txt | Foreach {($_ -split '\034')[117]} | Group-Object -Property { $_ } | %{ if($_.Count -ge 2) { Select-Object -InputObject $_ -Property Name,Count} } | Export-csv -Path "E:\Test\test2.csv" -NoTypeInformation
Please help!

I suggest using a switch statement to process the input file quickly (by PowerShell standards):
# Get an array of all the column values of interest.
$allColValues = switch -File E:\Test\test.txt {
default { # each input line
# For better performance with *literal* separators,
# use the .Split() *method*.
# Generally, however, use of the *regex*-based -split *operator* is preferable.
$_.Split([char] 0x1c)[117] # hex 0x1c is octal 034
}
}
# Group the column values, and only output those that occur at least
# twice.
$allColValues | Group-Object -NoElement | Where-Object Count -ge 2 |
Select-Object Name, Count | Export-Csv E:\Test\test2.csv -NoTypeInformation
Tip of the hat to Mathias R. Jessen for suggesting the -NoElement switch, which streamlines the Group-Object call by only maintaining abstract group information; that is, only the grouping criteria (as reflected in .Name), not also the individual objects that make up the group (as normally reflected in .Group), are returned via the output objects.
As for what you tried:
Get-Content with line-by-line streaming in the pipeline is slow, both generally (the object-by-object passing introduces overhead) and, specifically, because Get-Content decorates each line it outputs with ETS (Extended Type System) metadata.
GitHub issue #7537 proposes adding a way to opt-out of this decoration.
At the expense of memory consumption and potentially additional work for line-splitting, the -Raw switch reads the entire file as a single, multi-line string, which is much faster.
Passing -Property { $_ } to Group-Object isn't necessary - just omit it. Without a -Property argument, the input objects are grouped as a whole.
Chaining Where-Object and Select-Object - rather than filtering via an if statement in a ForEach-Object call combined with multiple Select-Object calls - is not only conceptually clearer, but performs better.
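The grouping and filtering steps can be tried on a few in-memory lines before running them against the 1GB file. This is only a sketch: the sample lines, the three-column layout and the column index 1 are made up for illustration, and the file reading and CSV export are left out.

```powershell
# Made-up sample data: three lines, three columns, separated by 0x1c.
$sep = [char] 0x1c
$lines = "a${sep}x${sep}1", "b${sep}y${sep}2", "c${sep}x${sep}3"

# Extract the second column, group without keeping the elements,
# and keep only values that occur at least twice.
$counts = $lines |
    ForEach-Object { $_.Split($sep)[1] } |
    Group-Object -NoElement |
    Where-Object Count -ge 2 |
    Select-Object Name, Count

$counts
```

Only the value x survives the filter here, because y occurs just once.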

Why Doesn't the Command Line Argument Work When Put Directly Into a Command in a Script?

I am struggling to understand why using $args[$i] directly in this command doesn't work. It gives a completely wrong answer.
$memory=$(Get-Process | Where-Object {$_.id -eq "$($args[$i])"} | select -expand VirtualMemorySize64)
However, putting the command line argument into another variable and using that one works.
$id=($args[$i])
$memory=$(Get-Process | Where-Object {$_.id -eq "$id"} | select -expand VirtualMemorySize64)
An explanation on why this is would be great.

Every script block ({ ... }) in PowerShell has its own copy of the automatic $args array in which positionally passed arguments are automatically collected.
Therefore, $args inside {$_.id -eq "$($args[$i])"} is not the same as $args at the script level, so you indeed need to save the script-level value in an auxiliary variable first, as in your 2nd snippet, which can be streamlined as follows:
# Must use aux. variable to access the script-level $args inside
# the Where-Object script block.
$id = $args[$i]
$memory = Get-Process | Where-Object { $_.id -eq $id } |
Select-Object -ExpandProperty VirtualMemorySize64
Note the absence of superfluous (...) and $(...), and the removal of quoting around "$id", given that the .Id property of process objects is a number (type [int]).
Taking a step back, I suggest declaring parameters in your script, which is preferable to using $args - the variables holding the values of such parameters can be used without a problem in Where-Object script blocks.
Generally:
It is only meaningful to access $args inside a script block that you've invoked with arguments, which is not the case in a script block passed to Where-Object, where the input to the script block comes (only) from the pipeline, via the automatic $_ variable.
By contrast, you can pass arguments to a script block, if you invoke it with &, the call operator, for instance: & { "[$args]" } 'foo' yields [foo].
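The difference can be reproduced in a few lines. The sketch below uses three hypothetical functions (Find-Direct, Find-ViaVariable and Find-ViaParam are names invented for this example) that each try to pick one number out of 1..5.

```powershell
# Uses $args inside the Where-Object script block: that block has its own,
# empty $args, so nothing matches.
function Find-Direct { 1..5 | Where-Object { $_ -eq $args[0] } }

# Copies the function-level $args value into a variable first.
function Find-ViaVariable {
    $wanted = $args[0]
    1..5 | Where-Object { $_ -eq $wanted }
}

# Declares a parameter instead of relying on $args.
function Find-ViaParam {
    param([int] $Wanted)
    1..5 | Where-Object { $_ -eq $Wanted }
}

$direct      = @(Find-Direct 3)
$viaVariable = @(Find-ViaVariable 3)
$viaParam    = @(Find-ViaParam 3)
```

Only the first variant comes back empty; the other two return the requested number.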

Where-Object, Select-Object and ForEach-object - Differences and Usage

Where-Object, Select-Object and ForEach-Object
I am a PowerShell beginner. I don't understand too much. Can someone give examples to illustrate the differences and usage scenarios between them?

If you are at all familiar with either LINQ or SQL then it should be much easier to understand because it uses the same concepts for the same words with a slight tweak.
Where-Object
is used for filtering out objects from the pipeline and is similar to how SQL filters rows. Here, objects are compared against a condition, or optionally a ScriptBlock, to determine whether it should be passed on to the next cmdlet in the pipeline. To demonstrate:
# Approved Verbs
Get-Verb | Measure-Object # count of 98
Get-Verb | Where-Object Verb -Like G* | Measure-Object # 3
# Integers 1 to 100
1..100 | Measure-Object # count of 100
1..100 | Where-Object {$_ -LT 50} | Measure-Object # count of 49
The syntax without a ScriptBlock is usually the most readable, but a ScriptBlock is necessary if you want to refer to the object itself (not a property) or need a more complicated boolean condition. Note: many resources will recommend (as @Iftimie Tudor mentions) trying to filter sooner (further left) in the pipeline for performance benefits.
Select-Object
is used for filtering properties of an object and is similar to how SQL filters columns. Importantly, it transforms the pipeline object into a new PSCustomObject that only has the requested properties with the object's values copied. To demonstrate:
Get-Process
Get-Process | Select-Object Name,CPU
Note, though, that this is only the standard usage. Explore its parameter sets using Get-Help Select-Object: it also has row-like filtering capabilities, such as passing only the first n objects from the pipeline on to the next cmdlet (for example, Get-Process | Select-Object -First 3).
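That Select-Object with a property list produces a new object, rather than a trimmed view of the original, can be checked directly. A small sketch, using the current PowerShell process as sample input (the variable names are illustrative):

```powershell
# Take one real object and project two of its properties.
$original  = Get-Process -Id $PID
$projected = $original | Select-Object Name, Id

$original.GetType().Name             # the original .NET type
$projected.GetType().Name            # the type of the projected copy
$projected.PSObject.Properties.Name  # only the requested properties remain
```

The projected object carries copies of the values, so methods and other properties of the original process object are no longer available on it.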
ForEach-Object
is like your foreach loops in other languages, with its own important flavour. In fact, PowerShell also has a foreach loop of its own! These may be easily confused but are operationally quite different. The main visual difference is that the foreach loop cannot be used in a pipeline, but ForEach-Object can. The latter, ForEach-Object, is a cmdlet (foreach is not) and can be used for transforming the current pipeline or for running a segment of code against the pipeline. It is really the most flexible cmdlet there is.
The best way to think about it is that it is the body of a loop, where the current element, $_, is coming from the pipeline and any output is passed onto the next cmdlet. To demonstrate:
# Transform
Get-Verb | ForEach-Object {"$($_.Verb) comes from the group $($_.Group)"}
# Retrieve Property
Get-Verb | ForEach-Object Verb
# Call Method
Get-Verb | ForEach-Object GetType
# Run Code
1..100 | ForEach-Object {
$increment = $_ + 1
$multiplied = $increment * 3
Write-Output $multiplied
}
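To make the contrast with the foreach loop concrete, the sketch below runs the same doubling logic both ways; the variable names are illustrative.

```powershell
# foreach statement: iterates over a collection that is already in memory
# and uses a loop variable of your choosing.
$fromStatement = foreach ($n in 1..3) { $n * 2 }

# ForEach-Object cmdlet: sits in a pipeline and sees each item as $_.
$fromCmdlet = 1..3 | ForEach-Object { $_ * 2 }

$fromStatement
$fromCmdlet
```

Both produce the same values; the difference lies in where the input comes from and whether the construct can be a pipeline stage.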
Edit (Feb, 2023): thanks to @IkemKrueger for a missing }.

You have two things in there: filtering and iterating through a collection.
Filtering:
Principle: always filter as far left as possible. These two commands do the same thing, but the second one won't transmit a huge chunk of data through the pipe (or network):
Get-Process | Where-Object {$_.Name -like 'chrome'} | Export-Csv 'c:\temp\processes.csv'
Get-Process -Name chrome | Export-Csv c:\temp\processes.csv
This is great when working with huge lists of computers or big files.
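A version of this comparison that runs on any machine can use the current PowerShell process instead of chrome. This is a sketch only; the variable names are illustrative and no timing claims are made.

```powershell
# Late filtering: every process object travels down the pipeline first.
$filteredLate = Get-Process | Where-Object Id -eq $PID

# Early filtering: the cmdlet itself returns only the wanted process.
$filteredEarly = Get-Process -Id $PID

$filteredLate.Id
$filteredEarly.Id
```

Both variables end up describing the same process; the second form simply avoids producing objects that are thrown away afterwards.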
Many cmdlets have their own filtering capabilities. Run Get-Help Get-Process -Full to see what they offer before piping.
iterating through collections:
Here you have 3 possibilities:
Batch cmdlets: a cmdlet's built-in ability to accept a whole collection piped from another cmdlet:
Get-Service -Name BITS,Spooler,W32Time | Set-Service -StartupType Automatic
WMI methods: WMI has its own way of doing the same thing (different syntax):
gwmi win32_networkadapterconfiguration -filter "description like '%intel%'" | Invoke-WmiMethod -Name EnableDHCP
Enumerating objects: iterating through the list:
Get-WmiObject Win32_Service -Filter "name = 'BITS'" | ForEach-Object -Process { $_.Change($null,$null,$null,$null,$null,$null,$null,"P@ssw0rd") }
Credits:
I found explanations that cleared up the mess in my head around all these things in a book called Learn PowerShell in a Month of Lunches (chapters 9 and 13 in this case).

Using PowerShell and piping output of Select-Object to access selected columns

I have the PowerShell below that selects certain fields:
dir -Path E:\scripts\br\test | Get-FileMetaData | Select-Object name, Comments, Path, Rating
What I want to do is use Name, Comments, Path and Rating in further pipes; $_.name etc. doesn't work.

If I understand your question correctly, you want to do something with the output of Select-Object, but you want to do it in a pipeline.
To do this, you need to pass the output down the pipeline into a cmdlet that accepts pipeline input (such as ForEach-Object). If the next operation in the pipeline does not accept pipeline input, you will have to assign the output to a variable and access the information through the variable.
Using ForEach-Object
In this method, you process each object individually. This is similar to indexing into the variable in the variable method below (that is, dealing with individual items in the collection of items returned by Select-Object).
dir | Get-FileMetaData | Select-Object Name,Comments,Path,Rating | ForEach-Object {
# Do stuff with $_
# Note that $_ is a single item in the collection returned by Select-Object
}
The variable method is included in case your next Cmdlet does not accept pipeline input.
Using Variable
In this method, you will treat $tempVariable as an array and you can operate on each item. If need be, you can actually access each column individually, getting everything at once.
$tempVariable = dir | Get-FileMetaData | Select-Object Name,Comments,Path,Rating
# Do stuff with each Name by using $tempVariable[$i].Name, etc.
# Or do stuff with all Names by using $tempVariable.Name, etc.
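Since Get-FileMetaData is not a built-in command, the two approaches can be tried with stand-in objects. In the sketch below, the sample objects and their values are invented for illustration; only the Select-Object, ForEach-Object and variable access patterns come from the answer.

```powershell
# Stand-in input: two objects with some of the columns from the question.
$source = @(
    [pscustomobject]@{ Name = 'a.txt'; Rating = 3; Extra = 'ignored' }
    [pscustomobject]@{ Name = 'b.txt'; Rating = 5; Extra = 'ignored' }
)

# Pipeline approach: $_ is one selected object at a time.
$sentences = $source | Select-Object Name, Rating |
    ForEach-Object { "$($_.Name) is rated $($_.Rating)" }

# Variable approach: index for one item, or read a column across all items.
$tempVariable = $source | Select-Object Name, Rating
$firstName = $tempVariable[0].Name
$allNames  = $tempVariable.Name
```

Reading $tempVariable.Name across the whole array relies on member-access enumeration, which is available from PowerShell 3.0 onwards.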

How to get Select-Object to return a raw type (e.g. String) rather than PSCustomObject?

The following code gives me an array of PSCustomObjects, how can I get it to return an array of Strings?
$files = Get-ChildItem $directory -Recurse | Select-Object FullName | Where-Object {!($_.psiscontainer)}
(As a secondary question, what's the psiscontainer part for? I copied that from an example online)
Post-Accept Edit: Two great answers, wish I could mark both of them. Have awarded the original answer.

You just need to pick out the property you want from the objects. FullName in this case.
$files = Get-ChildItem $directory -Recurse | Select-Object FullName | Where-Object {!($_.psiscontainer)} | foreach {$_.FullName}
Edit: Explanation for Mark, who asks, "What does the foreach do? What is that enumerating over?"
Sung Meister's explanation is very good, but I'll add a walkthrough here because it could be helpful.
The key concept is the pipeline. Picture a series of pingpong balls rolling down a narrow tube one after the other. These are the objects in the pipeline. Each stage of pipeline--the code segments separated by pipe (|) characters--has a pipe going into it and pipe going out of it. The output of one stage is connected to the input of the next stage. Each stage takes the objects as they arrive, does things to them, and sends them back out into the output pipeline or sends out new, replacement objects.
Get-ChildItem $directory -Recurse
Get-ChildItem walks through the filesystem creating FileSystemInfo objects that represent each file and directory it encounters, and puts them into the pipeline.
Select-Object FullName
Select-Object takes each FileSystemInfo object as it arrives, grabs the FullName property from it (which is a path in this case), puts that property into a brand new custom object it has created, and puts that custom object out into the pipeline.
Where-Object {!($_.psiscontainer)}
This is a filter. It takes each object, examines it, and sends it back out or discards it depending on some condition. Your code here has a bug, by the way. The custom objects that arrive here don't have a psiscontainer property. This stage doesn't actually do anything. Sung Meister's code is better.
foreach {$_.FullName}
Foreach, whose long name is ForEach-Object, grabs each object as it arrives, and here, grabs the FullName property, a string, from it. Now, here is the subtle part: Any value that isn't consumed, that is, isn't captured by a variable or suppressed in some way, is put into the output pipeline. As an experiment, try replacing that stage with this:
foreach {'hello'; $_.FullName; 1; 2; 3}
Actually try it out and examine the output. There are five values in that code block. None of them are consumed. Notice that they all appear in the output. Now try this:
foreach {'hello'; $_.FullName; $x = 1; 2; 3}
Notice that one of the values is being captured by a variable. It doesn't appear in the output pipeline.

To get the string for the file name you can use
$files = Get-ChildItem $directory -Recurse | Where-Object {!($_.psiscontainer)} | Select-Object -ExpandProperty FullName
The -ExpandProperty parameter allows you to get back an object based on the type of the property specified.
Further testing shows that this did not work with V1, but that functionality is fixed as of the V2 CTP3.
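The difference between the two forms of Select-Object can be seen on a single stand-in object. In this sketch the object and its path value are invented for illustration.

```powershell
# Stand-in for a file object with a FullName property.
$item = [pscustomobject]@{ FullName = 'C:\temp\a.txt' }

# Property list: the result is a new object that has a FullName property.
$wrapped = $item | Select-Object FullName

# -ExpandProperty: the result is the property value itself.
$raw = $item | Select-Object -ExpandProperty FullName

$wrapped.GetType().Name
$raw.GetType().Name
```

With the expanded form, $raw can be used directly wherever a string is expected.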

For Question #1
I have removed the "select-object" portion because it's redundant and, unlike dangph's answer, moved the "where" filter before "foreach": filter as soon as possible so that the next stage of the pipeline only has to deal with a subset of the objects.
$files = Get-ChildItem $directory -Recurse | Where-Object {!$_.PsIsContainer} | foreach {$_.FullName}
That code snippet essentially reads
Get all files and directories recursively (Get-ChildItem $directory -Recurse)
Filter out directories (Where-Object {!$_.PsIsContainer})
Return the full file name only (foreach {$_.FullName})
Save all file names into $files
Note that in foreach {$_.FullName}, the script block ({...}) outputs the value of any statement whose result is not captured, in this case $_.FullName, which is of type string.
If you really need to get a raw object, you don't need to do anything after getting rid of "select-object". If you were to use Select-Object but want to access raw object, use "PsBase", which is a totally different question(topic) - Refer to "What's up with PSBASE, PSEXTENDED, PSADAPTED, and PSOBJECT?" for more information on that subject
For Question #2
Filtering by !$_.PsIsContainer means that you are excluding container-level objects. In your case, you are running Get-ChildItem against the FileSystem provider (you can list PowerShell providers with Get-PSProvider), so a container is a DirectoryInfo (folder).
What counts as a container differs between PowerShell providers;
e.g., for the Registry provider, a container is of type Microsoft.Win32.RegistryKey.
Try this:
>pushd HKLM:\SOFTWARE
>ls | gm
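For the FileSystem provider, the value of PSIsContainer can be inspected on one directory and one file. The sketch below uses the system temp directory and a throwaway temp file as sample items; the variable names are illustrative.

```powershell
# A directory and a freshly created temporary file as sample items.
$dir      = Get-Item -LiteralPath ([System.IO.Path]::GetTempPath())
$filePath = [System.IO.Path]::GetTempFileName()
$file     = Get-Item -LiteralPath $filePath

$dirIsContainer  = $dir.PSIsContainer    # directories are containers
$fileIsContainer = $file.PSIsContainer   # files are not

# Clean up the temporary file again.
Remove-Item -LiteralPath $filePath

$dirIsContainer
$fileIsContainer
```

This is why !$_.PsIsContainer keeps files and drops folders in the Get-ChildItem pipeline above.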
[UPDATE] to following question: What does the foreach do? What is that enumerating over?
To clarify, "foreach" is an alias for "Foreach-Object"
You can find out through,
get-help foreach
-- or --
get-alias foreach
Now in my answer, "foreach" enumerates each object of type FileInfo returned from the previous stage of the pipeline (which has filtered out directories). FileInfo has a property called FullName, and that is what the "foreach" script block returns for each object.
Within the script block of "foreach", you reference the object passed through the pipeline via the special pipeline variable "$_", which here is of type FileInfo.

For V1, add the following filter to your profile:
filter Get-PropertyValue([string]$name) { $_.$name }
Then you can do this:
gci . -r | ?{!$_.psiscontainer} | Get-PropertyValue fullname
BTW, if you are using the PowerShell Community Extensions you already have this.
Regarding the ability to use Select-Object -Expand in V2, it is a cute trick but not obvious and really isn't what Select-Object nor -Expand was meant for. -Expand is all about flattening like LINQ's SelectMany and Select-Object is about projection of multiple properties onto a custom object.
