powershell slow(?) - write names of subfolders to a text file - powershell

My PowerShell script seems slow: when I run the code below in ISE, it keeps running and never finishes.
I am trying to write the list of subfolders in a folder (the folder path is in $scratchpath) to a text file. There are more than 30,000 subfolders.
$limit = (Get-Date).AddDays(-15)
$path = "E:\Data\PathToScratch.txt"
$scratchpath = Get-Content $path -TotalCount 1
Get-ChildItem -Path $scratchpath -Recurse -Force | Where-Object { $_.PSIsContainer -and $_.CreationTime -lt $limit } | Add-Content C:\Data\eProposal\POC\ScratchContents.txt
Let me know if my approach is not optimal. Ultimately, I will read the text file, zip the subfolders for archival and delete them.
Thanks for your help in advance. I am new to PS and have watched a few videos on MVA.

Add-Content, Set-Content, and even Out-File are notoriously slow in PowerShell. This is because each call opens the file, writes to it, and closes the handle. It never does anything more intelligently than that.
That doesn't sound bad until you consider how pipelines work with Get-ChildItem (and Where-Object and Select-Object). It doesn't wait until it's completed before it begins passing objects into the pipeline. It starts passing objects as soon as the provider returns them. For a large result set, this means that objects are still feeding into the pipeline long after the first ones have finished processing. Generally speaking, this is great! It means the system will function more efficiently, and it's why stuff like this:
$x = Get-ChildItem;
$x | ForEach-Object { [...] };
Is significantly slower than stuff like this:
Get-ChildItem | ForEach-Object { [...] };
And it's why stuff like this appears to stall:
Get-ChildItem | Sort-Object Name | ForEach-Object { [...] };
The Sort-Object cmdlet needs to wait until it has received all pipeline objects before it sorts. It kind of has to in order to be able to sort. The sort itself is nearly instantaneous; it's just the cmdlet waiting until it has the full results.
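A small sketch of that difference, assuming PowerShell 5 or later. Get-Number is a hypothetical stand-in for Get-ChildItem, and the $log list only records the order in which things happen:

```powershell
# Hypothetical stand-in for Get-ChildItem: emits three objects and logs each one.
$log = [System.Collections.Generic.List[string]]::new()
function Get-Number {
    foreach ($i in 1..3) {
        $log.Add("emit $i")
        $i
    }
}

# Streaming: each object is processed as soon as it is emitted.
$log.Clear()
Get-Number | ForEach-Object { $log.Add("process $_") }
$streamed = $log -join ', '

# Blocking: Sort-Object collects every object before passing any of them on.
$log.Clear()
Get-Number | Sort-Object | ForEach-Object { $log.Add("process $_") }
$sorted = $log -join ', '

$streamed
$sorted
```

In the first run the "emit" and "process" entries interleave; in the second, all three "emit" entries come before the first "process" entry.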
The issue with Add-Content is that, well, it experiences the pipeline not as, "Here's a giant string to write once," but instead as, "Here's a string to write. Here's a string to write. Here's a string to write. Here's a string to write." You'll be sending content to Add-Content here line by line. Each line will instantiate a new call to Add-Content, requiring the file to open, write, and close. You'll likely see better performance if you assign the result of Get-ChildItem [...] | Where-Object [...] to a variable, and then write the entire variable to the file at once:
$limit = (Get-Date).AddDays(-15);
$path = "E:\Data\PathToScratch.txt";
$scratchpath = Get-Content $path -TotalCount 1;
$Results = Get-ChildItem -Path $scratchpath -Recurse -Force -Directory |
    Where-Object { $_.CreationTime -lt $limit } |
    Select-Object -ExpandProperty FullName;
Add-Content C:\Data\eProposal\POC\ScratchContents.txt -Value $Results;
However, you might be concerned about memory usage if your results are actually going to be extremely large. You can actually use System.IO.StreamWriter for this purpose, too. My process improved in speed by nearly two orders of magnitude (from 12 hours to 20 minutes) by switching to StreamWriter and also only calling StreamWriter when I had about 250 lines to write (that seemed to be the break-even point for StreamWriter's overhead). But I was parsing all ACLs for user home and group shares for about 10,000 users and nearly 10 TB of data. Your task might not be as large.
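A minimal sketch of the StreamWriter idea, assuming PowerShell 5 or later. Write-OldDirectoryList and its parameter names are hypothetical, and this is not the script from the ACL job described above. It keeps one file handle open for the whole run and lets the writer's own buffer batch the writes:

```powershell
# Sketch: stream matching directory paths to a file through a single open StreamWriter.
# Write-OldDirectoryList and its parameters are hypothetical names.
function Write-OldDirectoryList {
    param(
        [Parameter(Mandatory)] [string]   $Root,
        [Parameter(Mandatory)] [string]   $OutFile,   # use a full path; .NET ignores the PowerShell location
        [Parameter(Mandatory)] [datetime] $Limit
    )
    # $false = overwrite instead of append; the writer buffers internally.
    $writer = [System.IO.StreamWriter]::new($OutFile, $false)
    try {
        Get-ChildItem -Path $Root -Recurse -Force -Directory |
            Where-Object { $_.CreationTime -lt $Limit } |
            ForEach-Object { $writer.WriteLine($_.FullName) }
    }
    finally {
        # Dispose flushes the buffer and releases the file handle, even on error.
        $writer.Dispose()
    }
}
```

With the values from the question, a call might look like: Write-OldDirectoryList -Root $scratchpath -OutFile 'C:\Data\eProposal\POC\ScratchContents.txt' -Limit (Get-Date).AddDays(-15). Nothing is held in memory beyond the writer's buffer, so the result set can be as large as it needs to be.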
Here's a good blog explaining the issue.

Do you have at least PowerShell 3.0? If you do, you should be able to reduce the time by having Get-ChildItem return only directories, since you are currently returning files as well.
Get-ChildItem -Path $scratchpath -Recurse -Force -Directory | ...
Currently you are returning all files and folders and then filtering out the files with $_.PSIsContainer, which is slower. You should end up with something like this:
Get-ChildItem -Path $scratchpath -Recurse -Force -Directory |
    Where-Object { $_.CreationTime -lt $limit } |
    Select-Object -ExpandProperty FullName |
    Add-Content C:\Data\eProposal\POC\ScratchContents.txt

Related

Get-Childitem - improve memory usage and performance

I would like to be able to also retrieve the file owner , LastAccessTime, LastWriteTime, CreationTime. Get-Childitem has known performance issues when scaled to large directory structures.
We had some performance issue while looking for files in a folder which have more than 100000 subfolders.
Here is my script:
$Dir = get-childitem "W:\DATA" -recurse -force
$Dir | Select-Object name,fullname, LastAccessTime, LastWriteTime, CreationTime, @{N='Owner';E={$_.GetAccessControl().Owner}} | Export-Csv -path C:\Scripts\xlsx.csv -NoTypeInformation
thanks in advance,
Memory
PowerShell objects (PSCustomObject) are optimized for streaming (one-at-a-time processing) and are therefore quite heavy.
Using parentheses ((...)) or assigning your stream to a variable (like: $Dir =) will choke the pipeline and pile up all the objects in memory.
To reduce memory usage, immediately pass your objects through the pipeline by chaining the concerned cmdlets with a pipe character:
Get-ChildItem "W:\DATA" -Recurse -Force |
    Select-Object LastAccessTime, LastWriteTime, CreationTime |
    Export-Csv -Path C:\Scripts\xlsx.csv -NoTypeInformation
Performance
Starting with a quote from PowerShell scripting performance considerations:
PowerShell scripts that leverage .NET directly and avoid the pipeline tend to be faster than idiomatic PowerShell. Idiomatic PowerShell typically uses cmdlets and PowerShell functions heavily, often leveraging the pipeline, and dropping down into .NET only when necessary.
In your case, the performance bottleneck is likely not PowerShell itself but the server and the network, so calling .NET directly would probably have little effect on performance.
In fact, the PowerShell pipeline might even be faster here: you do not have to wait until the last file info item is loaded into memory, because the pipeline starts processing the first item immediately while the following items are (slowly) provided by the server.
If you change the last cmdlet (Export-Csv) to ConvertTo-Csv, you will probably see the difference: a correctly set up pipeline starts producing output almost immediately, whereas other solutions take a while before writing any data to the console.
The numbers tell the tale
(In Dutch: "meten is weten", which literally means: measuring is knowing)
If you aren't sure which technique would give you the best performance, I recommend simply testing it (on a subset), like:
Measure-Command {
    Get-ChildItem "W:\DATA" -Recurse -Force |
        Select-Object LastAccessTime, LastWriteTime, CreationTime |
        Export-Csv -Path C:\Scripts\xlsx.csv -NoTypeInformation
} | Select-Object TotalMilliseconds
and compare the results.
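As a rough sketch of such a comparison, the two variants can be timed side by side. The $sample folder here is an assumption (the temp folder is used only so the snippet runs anywhere); point it at a representative subset of the real share instead:

```powershell
# Assumption: any small folder works as a test subset; replace with a slice of the real data.
$sample = [System.IO.Path]::GetTempPath()

# Variant 1: collect everything in a variable first, then process it.
$collected = Measure-Command {
    $dir = Get-ChildItem -Path $sample -Force -ErrorAction SilentlyContinue
    $dir | Select-Object Name, LastWriteTime | ConvertTo-Csv -NoTypeInformation | Out-Null
}

# Variant 2: stream straight through the pipeline.
$streamed = Measure-Command {
    Get-ChildItem -Path $sample -Force -ErrorAction SilentlyContinue |
        Select-Object Name, LastWriteTime | ConvertTo-Csv -NoTypeInformation | Out-Null
}

[pscustomobject]@{
    CollectedMs = [math]::Round($collected.TotalMilliseconds, 1)
    StreamedMs  = [math]::Round($streamed.TotalMilliseconds, 1)
}
```

A single run is noisy, especially against a network share with caching, so it is worth repeating each variant a few times before drawing conclusions.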
Give this a try, should be faster than Get-ChildItem. You could also use [SearchOption]::AllDirectories and no Collections.Queue but I'm not certain if that would consume less memory.
using namespace System.Collections
using namespace System.IO
class InfoProps {
[string] $Name
[string] $FullName
[datetime] $LastAccessTime
[datetime] $LastWriteTime
[datetime] $CreationTime
[string] $Owner
Infoprops([object]$FileInfo)
{
$this.Name = $FileInfo.Name
$this.FullName = $FileInfo.FullName
$this.LastAccessTime = $FileInfo.LastAccessTime
$this.LastWriteTime = $FileInfo.LastWriteTime
$this.CreationTime = $FileInfo.CreationTime
$this.Owner = $FileInfo.GetAccessControl().Owner
}
}
$initialDirectory = $pwd.Path
$queue = [Queue]::new()
$queue.Enqueue($initialDirectory)
& {
while ($queue.Count)
{
$target = $queue.Dequeue()
foreach ($childs in [Directory]::EnumerateDirectories($target)) {
$queue.Enqueue($childs)
}
[InfoProps] [DirectoryInfo] $target # => Remove this line if you want only files!
[InfoProps[]] [FileInfo[]] [Directory]::GetFiles($target)
}
} | Export-Csv test.csv -NoTypeInformation
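The [SearchOption]::AllDirectories variant mentioned above could look roughly like the sketch below. Get-TreeInfo is a hypothetical name, and the owner lookup is left out to keep the example short. Whether this uses less memory than the queue is not something the sketch settles:

```powershell
# Sketch of the AllDirectories variant: no queue, .NET walks the tree itself.
# Get-TreeInfo is a hypothetical name; Owner is omitted for brevity.
function Get-TreeInfo {
    param([Parameter(Mandatory)] [string] $Root)

    $rootInfo = [System.IO.DirectoryInfo]::new($Root)
    # EnumerateFileSystemInfos is lazy: items are yielded while the walk is still running.
    foreach ($item in $rootInfo.EnumerateFileSystemInfos('*', [System.IO.SearchOption]::AllDirectories)) {
        [pscustomobject]@{
            Name           = $item.Name
            FullName       = $item.FullName
            LastAccessTime = $item.LastAccessTime
            LastWriteTime  = $item.LastWriteTime
            CreationTime   = $item.CreationTime
        }
    }
}
```

Usage would be along the lines of Get-TreeInfo -Root 'W:\DATA' | Export-Csv test.csv -NoTypeInformation. One trade-off to be aware of: a single folder that cannot be read throws an exception and ends the whole enumeration, whereas the queue-based loop can be extended to catch errors per directory.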

Powershell - Match ID's in a text file against filenames in multiple folders

I need to search through 350,000 files to find any that contains certain patterns in the filename. However, the list of patterns (id numbers) that it needs to match is 1000! So I would very much like to be able to script this, because they were originally planning on doing it manually...
So to make it clearer:
Check each File in folder and all subfolders.
If the filename contains any of the IDs in the text file, then move it to another folder
Otherwise, ignore it.
So I have the basic code that works with a single value:
$name = Get-Content 'C:\test\list.txt'
get-childitem -Recurse -path "c:\test\source\" -filter "*$name*" |
move-item -Destination "C:\test\Destination"
If I change $name to point to a single ID, it works, if I have a single ID in the txt file, it works. Multiple items in a list:
1111111
2222222
3333333
It fails. What am I doing wrong? How can I get it to work? I'm still new to powershell so please be a little more descriptive in any answers.
Your test fails because it is effectively trying to do this (using your test data).
Get-ChildItem -Recurse -Path "c:\test\source\" -filter "*1111111 2222222 3333333*"
Which obviously does not work. It is squishing the array into one single space delimited string. You have to account for the multiple id logic in a different way.
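The collapse is easy to reproduce without touching any files. In this sketch the array literal stands in for what Get-Content returns for a three-line file:

```powershell
# Get-Content returns one string per line, so $name is an array of three IDs.
$name = '1111111', '2222222', '3333333'

# Expanding an array inside a double-quoted string joins its elements with a space
# (the default value of $OFS), which produces a single, useless filter.
$filter = "*$name*"
$filter
```

The resulting string is the same one shown in the -filter argument above.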
I am not sure which of these will perform better so make sure you test both of these with your own data to get a better idea of execution time.
Cycle each "filter"
$filters = Get-Content 'C:\test\list.txt'
# Get the files once
$files = Get-ChildItem -Recurse -Path "c:\test\source" -File
# Cycle Each ID filter manually
$filters | ForEach-Object{
    # Capture the current ID; inside the nested Where-Object, $_ refers to the file instead
    $singleFilter = $_
    $files | Where-Object{$_.Name -like "*$singleFilter*"}
} | Move-Item -Destination "C:\test\Destination"
Make one larger filter
$filters = Get-Content 'C:\test\list.txt'
# Build a large regex alternative match pattern. Escape each ID in case there are regex metacharacters.
$regex = ($filters | ForEach-Object{[regex]::Escape($_)}) -join "|"
# Get the files once
Get-ChildItem -Recurse -path "c:\test\source" -File |
Where-Object{$_.Name -match $regex} |
Move-Item -Destination "C:\test\Destination"
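Before moving anything, the combined pattern can be checked against a few made-up names. The IDs and file names in this sketch are hypothetical; the third ID contains a dot to show what the escaping is for:

```powershell
# Hypothetical IDs; '3.3' contains a regex metacharacter on purpose.
$ids = '1111111', '2222222', '3.3'
$regex = ($ids | ForEach-Object { [regex]::Escape($_) }) -join '|'

# Hypothetical file names to test the pattern against.
$names = 'report_1111111.pdf', 'x_9999999.txt', 'scan-2222222.tif', 'a3x3.txt', 'a3.3.txt'
$matched = $names | Where-Object { $_ -match $regex }

$regex
$matched
```

Only the names containing a literal ID come back; 'a3x3.txt' is skipped because the escaped dot no longer matches any character. Adding -WhatIf to Move-Item is a similar safety net on the real data.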
Try following this tutorial on how to use the Get-Content cmdlet. When the file has multiple lines, you get an array back. You then have to iterate through the array and apply the logic you used for a single item.

Powershell -- Get-ChildItem Directory full path and lastaccesstime

I am attempting to output full directory path and lastaccesstime in one line.
Needed --
R:\Directory1\Directory2\Directory3, March 10, 1015
What I am getting --
R:\Directory1\Directory2\Directory3
March 10, 1015
Here is my code, It isn't that complicated, but it is beyond me.
Get-ChildItem -Path "R:\" -Directory | foreach-object -process{$_.FullName, $_.LastAccessTime} | Where{ $_.LastAccessTime -lt [datetime]::Today.AddYears(-2) } | Out-File c:\temp\test.csv
I have used foreach-object in the past in order to ensure I do not truncate the excessively long directory names and paths, but never used it when pulling two properties. I would like the information to be on all one line, but haven't been successful. Thanks in advance for the assist.
I recommend filtering (Where-Object) before selecting the properties you want. Also I think you want to replace ForEach-Object with Select-Object, and lastly I think you want Export-Csv rather than Out-File. Example:
Get-ChildItem -Path "R:\" -Directory |
Where-Object { $_.LastAccessTime -lt [DateTime]::Today.AddYears(-2) } |
Select-Object FullName,LastAccessTime |
Export-Csv C:\temp\test.csv -NoTypeInformation
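To preview the rows without touching R:\ or writing a file, the same pipeline can be run against sample objects and ended with ConvertTo-Csv. The two objects in this sketch are hypothetical stand-ins for Get-ChildItem output:

```powershell
# Hypothetical sample objects standing in for the directories Get-ChildItem would return.
$dirs = @(
    [pscustomobject]@{ FullName = 'R:\Old';    LastAccessTime = [datetime]'2000-01-01'; Attributes = 'Directory' }
    [pscustomobject]@{ FullName = 'R:\Recent'; LastAccessTime = Get-Date;               Attributes = 'Directory' }
)
$cutoff = [DateTime]::Today.AddYears(-2)

# Filter first, then keep only the two wanted properties: one CSV row per directory.
$rows = $dirs |
    Where-Object { $_.LastAccessTime -lt $cutoff } |
    Select-Object FullName, LastAccessTime |
    ConvertTo-Csv -NoTypeInformation

$rows
```

The output is a header line followed by one line per matching directory, with the path and the timestamp side by side. The date format in the second column follows the current culture.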
We can get your output on one line pretty easily, but to make it easy to read we may have to split your script out to multiple lines. I'd recommend saving the script below as a ".ps1" which would allow you to right click and select "run with powershell" to make it easier in the future. This script could be modified to play around with more inputs and variables in order to make it more modular and work in more situations, but for now we'll work with the constants you provided.
$dirs = Get-ChildItem -Path "R:\" -Directory
We'll keep the first line you made, since that is solid and there's nothing to change.
$arr = $dirs | Select-Object FullName, LastAccessTime | Where-Object { $_.LastAccessTime -lt [datetime]::Today.AddYears(-2) }
For the second line, we'll use "Select-Object" instead. In my opinion, it's a lot easier to create an array this way. We'll want to deal with the answers as an array since it'll be easiest to post the key,value pairs next to each other this way. I've expanded your "Where" to "Where-Object" since it's best practice to use the full cmdlet name instead of the alias.
Lastly, we'll want to convert our "$arr" object to csv before putting in the temp out-file.
$arr | ConvertTo-Csv -NoTypeInformation | Out-File "C:\Temp\test.csv"
Putting it all together, your final script will look like this:
$dirs = Get-ChildItem -Path "R:\" -Directory
$arr = $dirs | Select-Object FullName, LastAccessTime | Where-Object { $_.LastAccessTime -lt [datetime]::Today.AddYears(-2) }
$arr | ConvertTo-Csv -NoTypeInformation | Out-File "C:\Temp\test.csv"
Again, you can take this further by creating a function, binding it to a cmdlet, and creating parameters for your path, output file, and all that fun stuff.
Let me know if this helps!

Powershell memory exhaustion using NTFSSecurity module on a deep folder traverse

I have been tasked with reporting all of the ACL's on each folder in our Shared drive structure. Added to that, I need to do a look up on the membership of each unique group that gets returned.
I'm using the NTFSSecurity module in conjunction with the Get-ChildItem2 cmdlet to get past the 260-character path length limit. The paths I am traversing are many hundreds of folders deep and have long since passed the 260-character limit.
I have been banging on this for a couple of weeks. My first challenge was crafting my script to do my task all at once, but now I'm thinking that's my problem... The issue at hand is resources, specifically memory exhaustion. Once the script gets into one of the deep folders, it consumes all RAM and starts swapping to disk, and I eventually run out of disk space.
Here is the script:
$csvfile = 'C:\users\user1\Documents\acl cleanup\dept2_Dir_List.csv'
foreach ($record in Import-Csv $csvFile)
{
$Groups = get-childitem2 -directory -path $record.FullName -recurse | Get-ntfsaccess | where -property accounttype -eq -value group
$groups2 = $Groups | where -property account -notmatch -value '^builtin|^NT AUTHORITY\\|^Creator|^AD\\Domain'
$groups3 = $groups2 | select account -Unique
$GroupMembers = ForEach ($Group in $Groups3) {
(Get-ADGroup $Group.account.sid | get-adgroupmember | select Name, @{N="GroupName";e={$Group.Account}}
)}
$groups2 | select FullName,Account,AccessControlType,AccessRights,IsInherited | export-csv "C:\Users\user1\Documents\acl cleanup\Dept2\$($record.name).csv"
$GroupMembers | export-csv "C:\Users\user1\Documents\acl cleanup\Dept2\$($record.name)_GroupMembers.csv"
}
NOTE: The dir list it reads in is the top level folders created from a get-childitem2 -directory | export-csv filename.csv
During the run, it appears to not be flushing memory properly. This is just a guess from observation. At the end of each run through the code, the variables should be getting over-written, I thought, but memory doesn't go down, so it looked to me that since memory didn't go back down, that it wasn't properly releasing it? Like I said, a guess... I have been reading about runspaces but I am confused about how to implement that with this script. Is that the right direction for this?
Thanks in advance for any assistance...!
Funny you should post about this, as I just finished a modified version of the script that I think works much better. A friend turned me on to 'Function Filters', which seem to work well here. I'll test it on the big directories tomorrow to see how much better the memory management is, but so far it looks great.
#Define the function ‘filter’ here and call it ‘GetAcl’. Process is the keyword that tells the function to deal with each item in the pipeline one at a time
Function GetAcl {
PROCESS {
Get-NTFSAccess $_ | where -property accounttype -eq -value group | where -property account -notmatch -value '^builtin|^NT AUTHORITY\\|^Creator|^AD\\Domain'
}
}
#Import the directory top level paths
$Paths = import-csv 'C:\users\rknapp2\Documents\acl cleanup\dept2_Dir_List.csv'
#Process each line from the importcsv one at a time and run GetChilditem against it.
#Notice the second part – I ‘|’ pipe the results of the GetChildItem to the function that because of the type of function it is, handles each item one at a time
#When done, pass results to Exportcsv and send it to a file name based on the path name. This puts each dir into its own file.
ForEach ($Path in $paths) {
(Get-ChildItem2 -path $path.FullName -Recurse -directory) | getacl | export-csv "C:\Users\rknapp2\Documents\acl cleanup\TestFilter\$($path.name).csv" }
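For reference, a function whose body is only a process block can also be written with the filter keyword. The sketch below uses hypothetical names and plain numbers instead of folders to show that the two forms behave the same:

```powershell
# A function whose whole body is a process block handles pipeline input one item at a time...
function Select-EvenFunction {
    process { if ($_ % 2 -eq 0) { $_ } }
}

# ...and the filter keyword is shorthand for exactly that shape.
filter Select-EvenFilter { if ($_ % 2 -eq 0) { $_ } }

$fromFunction = 1..6 | Select-EvenFunction
$fromFilter   = 1..6 | Select-EvenFilter

$fromFunction
$fromFilter
```

Either way, each item is handled and passed on as it arrives, so nothing forces the whole result set into a variable before the export starts.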

PowerShell script file modify time>10h and return a value if nothing is found

I am trying to compose a script/one liner, which will find files which have been modified over 10 hours ago in a specific folder and if there are no files I need it to print some value or string.
Get-ChildItem -Path C:\blaa\*.* | where {$_.Lastwritetime -lt (date).addhours(-10)}) | Format-table Name,LastWriteTime -HideTableHeaders"
With that one liner I am getting the wanted result when there are files with
modify time over 10 hours, but I also need it to print value/string if there are
no results, so that I can monitor it properly.
The reason for this is to utilize the script/one liner for monitoring purposes.
The Get-ChildItem cmdlet and Where clause you have would return null if nothing was found, so you have to account for that case separately. I would also caution against using Format-Table for output unless you are only reading it on screen. If you want a "one-liner", you could do this; all PowerShell code can be a one-liner if you want it to be.
$results = Get-ChildItem -Path C:\blaa\*.* | where {$_.Lastwritetime -lt (date).addhours(-10)} | Select Name,LastWriteTime; if($results){$results}else{"No files found matching criteria"}
You have an extra closing bracket in your code, possibly a copy artifact, which I had to remove. Formatted properly, it would look like this:
$results = Get-ChildItem -Path "C:\blaa\*.*" |
Where-Object {$_.Lastwritetime -lt (date).addhours(-10)} |
Select Name,LastWriteTime
if($results){
$results
}else{
"No files found matching criteria"
}
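For monitoring, it may be handy to wrap this in a function. The sketch below is one possible shape; Get-StaleFileReport and its parameters are hypothetical names. It wraps the pipeline in @(...) so the result is always an array and the empty case can be tested with .Count:

```powershell
# Sketch: report files older than a given number of hours, or a fixed message if there are none.
# Get-StaleFileReport and its parameters are hypothetical names.
function Get-StaleFileReport {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [int] $Hours = 10
    )
    $cutoff = (Get-Date).AddHours(-$Hours)

    # @(...) guarantees an array, so .Count is 0 when nothing matches.
    $stale = @(Get-ChildItem -Path $Path -File |
        Where-Object { $_.LastWriteTime -lt $cutoff } |
        Select-Object Name, LastWriteTime)

    if ($stale.Count -gt 0) { $stale } else { 'No files found matching criteria' }
}
```

A call such as Get-StaleFileReport -Path 'C:\blaa' then always returns something the monitoring tool can read, either the file list or the message.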