Which way is better in PowerShell and why

I am a novice in PowerShell and use it very rarely, for small things.
I am using this one-liner to extract email addresses recursively:
(Get-ChildItem -Include *.txt -Recurse | Get-Content | Select-String -Pattern "(?:[a-zA-Z0-9_\-\.]+)@(?:[a-zA-Z0-9_\-\.]+)\.(?:[a-zA-Z]{2,5})").Matches | Select-Object -ExpandProperty Value -Unique
In order to access the Matches property I added parentheses. Later I came up with this way:
Get-ChildItem -Include *.txt -Recurse | Get-Content | Select-String -Pattern "(?:[a-zA-Z0-9_\-\.]+)@(?:[a-zA-Z0-9_\-\.]+)\.(?:[a-zA-Z]{2,5})" | Select-Object -ExpandProperty Matches -Unique | Select-Object -ExpandProperty Value
I want to ask what exactly the parentheses do in the first version.

Say you have some $output from a function (gci in your case) and you are interested in the field $output.Matches.
If you run $output | Select-Object Matches (example 1), you effectively run a ForEach-Object over every object in your array. This pipeline uses very little RAM, because the objects are processed serially: every object of $output is handled one after the other.
If you run $output.Matches (example 2), you select a field from the whole array at once. This uses a lot of RAM in one go, but the field is processed as one big object instead of many little objects.
As for performance: as always, note that PowerShell is not the way to go if you need high performance. It was never designed to be a fast programming language.
When you're working with small objects (like gci $env:userprofile\Desktop), the performance hit will be small. When working with large objects or a lot of nested pipes, the performance hit will be large.
I've just tested it with gci Z:\ -Recurse, where Z:\ is a network drive. Performance dropped by a factor of 20 in this specific case. (Use Measure-Command to test this.)
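If in doubt, measure both variants yourself. A minimal sketch, assuming a folder containing some *.txt files and reusing the pattern from the question:
# Compare the two approaches with Measure-Command; the timings depend entirely on your data.
$pattern = '(?:[a-zA-Z0-9_\-\.]+)@(?:[a-zA-Z0-9_\-\.]+)\.(?:[a-zA-Z]{2,5})'
$collectFirst = Measure-Command {
    (Get-ChildItem -Include *.txt -Recurse | Get-Content |
        Select-String -Pattern $pattern).Matches |
        Select-Object -ExpandProperty Value -Unique
}
$streamed = Measure-Command {
    Get-ChildItem -Include *.txt -Recurse | Get-Content |
        Select-String -Pattern $pattern |
        Select-Object -ExpandProperty Matches -Unique |
        Select-Object -ExpandProperty Value
}
"Parentheses (collect first): $($collectFirst.TotalMilliseconds) ms"
"Fully piped:                 $($streamed.TotalMilliseconds) ms"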

Related

Get-ChildItem - improve memory usage and performance

I would like to be able to also retrieve the file owner, LastAccessTime, LastWriteTime and CreationTime. Get-ChildItem has known performance issues when scaled to large directory structures.
We had some performance issues while looking for files in a folder which has more than 100000 subfolders.
Here is my script:
$Dir = Get-ChildItem "W:\DATA" -Recurse -Force
$Dir | Select-Object Name, FullName, LastAccessTime, LastWriteTime, CreationTime, @{N='Owner';E={$_.GetAccessControl().Owner}} | Export-Csv -Path C:\Scripts\xlsx.csv -NoTypeInformation
thanks in advance,
Memory
PowerShell objects (PSCustomObject) are optimized for streaming (one-at-a-time processing) and are therefore quite heavy.
Using parentheses ((...)) or assigning your stream to a variable (like $Dir = ) will choke the pipeline and pile up all the objects in memory.
To reduce memory usage, immediately pass your objects through the pipeline by chaining the relevant cmdlets with a pipe character:
Get-childitem "W:\DATA" -recurse -force |
Select-Object astAccessTime, LastWriteTime, CreationTime |
Export-Csv -path C:\Scripts\xlsx.csv -NoTypeInformation
Performance
Starting with a quote from PowerShell scripting performance considerations:
PowerShell scripts that leverage .NET directly and avoid the pipeline tend to be faster than idiomatic PowerShell. Idiomatic PowerShell typically uses cmdlets and PowerShell functions heavily, often leveraging the pipeline, and dropping down into .NET only when necessary.
In your case, the performance bottleneck is likely not in PowerShell but due to the server and the network, meaning that leveraging .NET directly would probably not have any effect on the performance.
In fact, using the PowerShell pipeline might even be faster in this case: you do not have to wait until the last file-info item is loaded into memory, because the native PowerShell pipeline starts processing the first item immediately while the next items are (slowly) provided by the server.
If you change the last cmdlet (Export-Csv) to ConvertTo-Csv, you will probably see the difference: a (correctly set up) pipeline starts producing output almost on the fly, while other solutions take a while before writing any data to the console.
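For example, a quick way to see the streaming behaviour (just a sketch; the trailing Select-Object -First 10 is only there to cut the output short):
Get-ChildItem "W:\DATA" -Recurse -Force |
    Select-Object Name, FullName, LastAccessTime, LastWriteTime, CreationTime |
    ConvertTo-Csv -NoTypeInformation |
    Select-Object -First 10   # rows appear almost immediately when the pipeline streams correctly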
The numbers tell the tale
(In Dutch: "meten is weten", which literally means: measuring is knowing)
If you aren't sure which technique will give you the best performance, I recommend simply testing it (on a subset), like:
Measure-Command {
    Get-ChildItem "W:\DATA" -Recurse -Force |
        Select-Object Name, FullName, LastAccessTime, LastWriteTime, CreationTime |
        Export-Csv -Path C:\Scripts\xlsx.csv -NoTypeInformation
} | Select-Object TotalMilliseconds
and compare the results.
Give this a try; it should be faster than Get-ChildItem. You could also use [SearchOption]::AllDirectories and no Collections.Queue, but I'm not certain whether that would consume less memory.
using namespace System.Collections
using namespace System.IO

class InfoProps {
    [string] $Name
    [string] $FullName
    [datetime] $LastAccessTime
    [datetime] $LastWriteTime
    [datetime] $CreationTime
    [string] $Owner

    InfoProps([object] $FileInfo)
    {
        $this.Name = $FileInfo.Name
        $this.FullName = $FileInfo.FullName
        $this.LastAccessTime = $FileInfo.LastAccessTime
        $this.LastWriteTime = $FileInfo.LastWriteTime
        $this.CreationTime = $FileInfo.CreationTime
        $this.Owner = $FileInfo.GetAccessControl().Owner
    }
}

$initialDirectory = $pwd.Path
$queue = [Queue]::new()
$queue.Enqueue($initialDirectory)

& {
    while ($queue.Count)
    {
        $target = $queue.Dequeue()
        foreach ($child in [Directory]::EnumerateDirectories($target)) {
            $queue.Enqueue($child)
        }
        [InfoProps] [DirectoryInfo] $target # => Remove this line if you want only files!
        [InfoProps[]] [FileInfo[]] [Directory]::GetFiles($target)
    }
} | Export-Csv test.csv -NoTypeInformation

Where-Object, Select-Object and ForEach-Object - Differences and Usage

Where-Object, Select-Object and ForEach-Object
I am a PowerShell beginner and don't understand these very well. Can someone give examples to illustrate the differences between them and the scenarios in which each is used?
If you are at all familiar with either LINQ or SQL, then this should be much easier to understand, because PowerShell uses the same concepts for the same words, with a slight tweak.
Where-Object
is used for filtering objects out of the pipeline and is similar to how SQL filters rows. Objects are compared against a condition, or optionally a ScriptBlock, to determine whether they should be passed on to the next cmdlet in the pipeline. To demonstrate:
# Approved Verbs
Get-Verb | Measure-Object # count of 98
Get-Verb | Where-Object Verb -Like G* | Measure-Object # 3
# Integers 1 to 100
1..100 | Measure-Object # count of 100
1..100 | Where-Object {$_ -LT 50} | Measure-Object # count of 49
This syntax is usually the most readable when not using a ScriptBlock, but a ScriptBlock is necessary if you want to refer to the object itself (not a property) or for more complicated Boolean results. Note: many resources recommend (as @Iftimie Tudor mentions) filtering as early (as far left) in the pipeline as possible, for performance benefits.
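For example, a small sketch of a compound condition that does require the ScriptBlock form (the property names are real; the thresholds are arbitrary):
# ScriptBlock form: needed when combining several conditions on $_
Get-Process | Where-Object { $_.CPU -gt 10 -and $_.WorkingSet64 -gt 100MB }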
Select-Object
is used for filtering properties of an object and is similar to how SQL filters columns. Importantly, it transforms the pipeline object into a new PSCustomObject that only has the requested properties with the object's values copied. To demonstrate:
Get-Process
Get-Process | Select-Object Name,CPU
Note, though, that this is only the most common usage. Explore its parameter sets with Get-Help Select-Object: it also has row-like filtering capabilities, such as only passing the first n objects from the pipeline on to the next cmdlet (e.g., Get-Process | Select-Object -First 3).
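Another parameter worth knowing about is -ExpandProperty, which returns the raw property values instead of wrapping them in a new PSCustomObject. A small sketch:
# Select-Object Name keeps objects that each have a Name property;
# -ExpandProperty Name returns the bare strings instead.
Get-Process | Select-Object -First 3 -ExpandProperty Name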
ForEach-Object
is like your foreach loops in other languages, with its own important flavour. In fact, PowerShell also has a foreach loop of its own! These may be easily confused but are operationally quite different. The main visual difference is that the foreach loop cannot be used in a pipeline, but ForEach-Object can. The latter, ForEach-Object, is a cmdlet (foreach is not) and can be used for transforming the current pipeline or for running a segment of code against the pipeline. It is really the most flexible cmdlet there is.
The best way to think about it is that it is the body of a loop, where the current element, $_, is coming from the pipeline and any output is passed onto the next cmdlet. To demonstrate:
# Transform
Get-Verb | ForEach-Object {"$($_.Verb) comes from the group $($_.Group)"}
# Retrieve Property
Get-Verb | ForEach-Object Verb
# Call Method
Get-Verb | ForEach-Object GetType
# Run Code
1..100 | ForEach-Object {
    $increment = $_ + 1
    $multiplied = $increment * 3
    Write-Output $multiplied
}
Edit (Feb 2023): thanks to @IkemKrueger for a missing }.
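For contrast, a sketch of the foreach statement, which cannot sit in the middle of a pipeline but works well once the collection is already in memory:
# foreach statement: the collection is enumerated from a variable (or expression),
# and the loop itself cannot receive pipeline input.
$verbs = Get-Verb
foreach ($v in $verbs) {
    "$($v.Verb) comes from the group $($v.Group)"
}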
You have two things in there: filtering and iterating through a collection.
Filtering:
Principle: always filter as far to the left as possible. These two commands do the same thing, but the second one won't push a huge chunk of data through the pipe (or across the network):
Get-Process | Where-Object {$_.Name -like 'chrome'} | Export-Csv 'c:\temp\processes.csv'
Get-Process -Name chrome | Export-Csv c:\temp\processes.csv
This is great when working with huge lists of computers or big files.
Many cmdlets have their own filtering capabilities. Run Get-Help Get-Process -Full to see what they offer before piping.
Iterating through collections:
Here you have 3 possibilities:
Batch cmdlets: a cmdlet's built-in capability of passing a whole collection to another cmdlet:
Get-Service -Name BITS,Spooler,W32Time | Set-Service -StartupType Automatic
WMI methods: WMI has its own way of doing the first one (different syntax):
gwmi win32_networkadapterconfiguration -filter "description like '%intel%'" | Invoke-WmiMethod -Name EnableDHCP
Enumerating objects: iterating through the list:
Get-WmiObject Win32_Service -filter "name = 'BITS'" | ForEach-Object -Process { $_.change($null,$null,$null,$null,$null,$null,$null,"P@ssw0rd") }
Credits:
I found explanations that cleared the mess in my head around all these things in a book called Learn PowerShell in a Month of Lunches (chapters 9 and 13 in this case).

PowerShell: Find similar filenames in a directory

In a purely hypothetical situation of a person that downloaded some TV episodes, but is wondering if he/she accidentally downloaded an HDTV, a WEBRip and a WEB-DL version of an episode, how could PowerShell find these 'duplicates' so the lower quality versions can be automagically deleted?
First, I'd get all the files in the directory:
$Files = Get-ChildItem -Path $Directory -Exclude '*.nfo','*.srt','*.idx','*.sub' |
Sort-Object -Property Name
I exclude the non-video extensions for now, since they would cause false positives. I would still have to deal with them though (during the delete phase).
At this point, I would likely use a ForEach construct to parse through the files one by one and look for files that have the same episode number. If there are any, they should be looked at.
Assuming the common naming convention where spaces are replaced by dots, a typical filename would be AwesomeSeries.S01E01.HDTV.x264-RLSGRP
To compare, I need to get only the episode number. In the above case, that means S01E01:
If ($File.BaseName -match 'S*(\d{1,2})(x|E)(\d{1,2})') { $EpisodeNumber = $Matches[0] }
In the case of S01E01E02 I would simply add a second if-statement, so I'm not concerned with that for now.
$EpisodeNumber should now contain S01E01. I can use that to discover if there are any other files with that episode number in $Files. I can do that with:
$Files -match $EpisodeNumber
This is where my trouble starts. The above will also return the file I'm processing. I could at this point handle the duplicates immediately, but then I would have to do the Get-ChildItem again because otherwise the same match would be returned when the ForEach construct gets to the duplicate file which would then result in an error.
I could store the files I wish to delete in an array and process them after the ForEach construct is over, but then I'd still have to filter out all the duplicates. After all, in the ForEach loop, AwesomeSeries.S01E01.HDTV.x264-RLSGRP would first match AwesomeSeries.S01E01.WEB-DL.x264.x264-RLSGRP, only for AwesomeSeries.S01E01.WEB-DL.x264.x264-RLSGRP to match AwesomeSeries.S01E01.HDTV.x264-RLSGRP afterwards.
So maybe I should process every episode number only once, but how?
I get the feeling I'm being very inefficient here and there must be a better way to do this, so I'm asking for help. Can anyone point me in the right direction?
Filter the $Files array to exclude the current file when matching:
($Files | Where-Object {$_.FullName -ne $File.FullName}) -match $EpisodeNumber
Regarding the duplicates in the array at the end, you can use Select-Object -Unique to only get distinct entries.
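Put together, a minimal sketch of that approach (reusing the regex from the question; $duplicates is just an illustrative name):
$duplicates = foreach ($File in $Files) {
    if ($File.BaseName -match 'S*(\d{1,2})(x|E)(\d{1,2})') {
        $EpisodeNumber = $Matches[0]
        # Other files that share the current file's episode number
        ($Files | Where-Object { $_.FullName -ne $File.FullName }) -match $EpisodeNumber
    }
}
# Each duplicate pair shows up once per member, so reduce to distinct entries:
$duplicates | Select-Object -Unique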
Since you know how to get the episode number let's use that to group the files together.
$Files = Get-ChildItem -Path $Directory -Exclude '*.nfo','*.srt','*.idx','*.sub' | Select-Object FullName, @{Name="EpisodeIndex";Expression={
    # We do not have to do it like this, but if your detection logic gets more complicated
    # then having this Select-Object block will be a cleaner option than using a calculated property.
    If ($_.BaseName -match 'S*(\d{1,2})(x|E)(\d{1,2})'){$Matches[0]}
}}
# Group the files by season/episode index (those that have one). Return groups that have
# more than one member, as those would need attention.
$Files | Where-Object { $_.EpisodeIndex } | Group-Object -Property EpisodeIndex |
    Where-Object { $_.Count -gt 1 } | ForEach-Object {
        # Expand the group members
        $_.Group
        # Not sure how you plan on dealing with it.
    }
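If you do want to act on each group, here is one possible follow-up sketch; the quality ranking and the use of -WhatIf are my assumptions, not part of the answer above:
# Assumed quality ranking, best first; -WhatIf only previews the deletions.
$qualityOrder = 'WEB-DL', 'WEBRip', 'HDTV'
$Files | Where-Object { $_.EpisodeIndex } | Group-Object -Property EpisodeIndex |
    Where-Object { $_.Count -gt 1 } | ForEach-Object {
        $ranked = $_.Group | Sort-Object {
            # Rank each file by the first quality tag found in its name (unknown tags sort last)
            $file = $_
            $rank = $qualityOrder.Count
            for ($i = 0; $i -lt $qualityOrder.Count; $i++) {
                if ($file.FullName -like "*$($qualityOrder[$i])*") { $rank = $i; break }
            }
            $rank
        }
        # Keep the best-ranked file and preview removal of the rest
        $ranked | Select-Object -Skip 1 | ForEach-Object { Remove-Item -LiteralPath $_.FullName -WhatIf }
    }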

PowerShell memory exhaustion using NTFSSecurity module on a deep folder traverse

I have been tasked with reporting all of the ACL's on each folder in our Shared drive structure. Added to that, I need to do a look up on the membership of each unique group that gets returned.
I'm using the NTFSSecurity module in conjunction with the Get-ChildItem2 cmdlet to get past the 260-character path length limit. The path(s) I am traversing are many hundreds of folders deep and have long since passed the 260-character limit.
I have been banging on this for a couple of weeks. My first challenge was crafting my script to do the whole task at once, but now I'm thinking that's my problem... The issue at hand is resources, specifically memory exhaustion. Once the script gets into one of the deep folders, it consumes all RAM and starts swapping to disk, and I eventually run out of disk space.
Here is the script:
$csvfile = 'C:\users\user1\Documents\acl cleanup\dept2_Dir_List.csv'
foreach ($record in Import-Csv $csvfile)
{
    $Groups = Get-ChildItem2 -Directory -Path $record.FullName -Recurse | Get-NTFSAccess | where -property accounttype -eq -value group
    $groups2 = $Groups | where -property account -notmatch -value '^builtin|^NT AUTHORITY\\|^Creator|^AD\\Domain'
    $groups3 = $groups2 | select account -Unique
    $GroupMembers = ForEach ($Group in $Groups3) {
        Get-ADGroup $Group.account.sid | Get-ADGroupMember | select Name, @{N="GroupName";E={$Group.Account}}
    }
    $groups2 | select FullName,Account,AccessControlType,AccessRights,IsInherited | Export-Csv "C:\Users\user1\Documents\acl cleanup\Dept2\$($record.name).csv"
    $GroupMembers | Export-Csv "C:\Users\user1\Documents\acl cleanup\Dept2\$($record.name)_GroupMembers.csv"
}
NOTE: the directory list it reads in contains the top-level folders, created from a Get-ChildItem2 -Directory | Export-Csv filename.csv
During the run, it appears not to be flushing memory properly; that is just a guess from observation. At the end of each pass through the loop the variables should be overwritten, I thought, but memory usage never goes back down, so it looks like it isn't being released. I have been reading about runspaces, but I am confused about how to implement them with this script. Is that the right direction for this?
Thanks in advance for any assistance...!
Funny you should post about this, as I just finished a modified version of the script that I think works much better. A friend turned me on to 'function filters', which seem to work well here. I'll test it on the big directories tomorrow to see how much better the memory management is, but so far it looks great.
#Define the 'filter' function here and call it 'GetAcl'. PROCESS is the keyword that tells the function to deal with each item in the pipeline one at a time
Function GetAcl {
    PROCESS {
        Get-NTFSAccess $_ | where -property accounttype -eq -value group | where -property account -notmatch -value '^builtin|^NT AUTHORITY\\|^Creator|^AD\\Domain'
    }
}

#Import the directory top level paths
$Paths = Import-Csv 'C:\users\rknapp2\Documents\acl cleanup\dept2_Dir_List.csv'

#Process each line from the Import-Csv one at a time and run Get-ChildItem2 against it.
#Notice the second part - I '|' pipe the results of Get-ChildItem2 to the function, which (because of the type of function it is) handles each item one at a time.
#When done, pass the results to Export-Csv and send them to a file named after the path. This puts each dir into its own file.
ForEach ($Path in $Paths) {
    (Get-ChildItem2 -Path $Path.FullName -Recurse -Directory) | GetAcl | Export-Csv "C:\Users\rknapp2\Documents\acl cleanup\TestFilter\$($Path.name).csv"
}

How do I write a powershell script that gets the file with the most recent last write time from a folder?

The subject line says it all. I'd also like to do this using pipes.
I figured that I could use Get-ChildItem, Measure-Object and Where-Object, but Measure-Object doesn't like dates.
Should I have a script block which loops through each item returned from Get-ChildItem and does a comparison to see if it's the most recent? I thought that there should be a handy PS cmdlet for that.
Get-ChildItem | Sort LastWriteTime -Descending | Select -First 1
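That is the simplest way; the loop-and-compare idea from the question also works and avoids sorting the whole listing, which may matter for very large folders. A sketch of that approach (assumptions: files only, current directory):
Get-ChildItem -File | ForEach-Object -Begin { $newest = $null } -Process {
    # Keep whichever file has the latest LastWriteTime seen so far
    if (-not $newest -or $_.LastWriteTime -gt $newest.LastWriteTime) { $newest = $_ }
} -End { $newest }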