Powershell Get-ChildItem needs a lot of memory

My question is pretty much the same as the one posted on metafilter.
I need to use a PowerShell script to scan through a large number of files. The issue is that Get-ChildItem seems to insist on shoving the whole folder and file structure into memory. Since the drive has over a million files in over 30,000 folders, the script needs a lot of memory.
http://ask.metafilter.com/134940/PowerShell-recursive-processing-of-all-files-and-folders-without-OutOfMemory-exception
All I need is the name, size, and location of the files.
What I've been doing up to now is:
$filesToIndex = Get-ChildItem -Path $path -Recurse | Where-Object { !$_.PSIsContainer }
It works but I don't want to punish my memory :-)
Best regards,
greenhoorn

If you want to optimize the script to use less memory, you need to make proper use of the pipeline. What you are doing is saving the entire result of Get-ChildItem -Recurse into memory, all of it! What you could do instead is something like this:
Get-ChildItem -Path $Path -Recurse | ForEach-Object {
    if (-not $_.PSIsContainer) {
        # do stuff / get the info you need here
    }
}
This way you are always streaming the data through the pipeline, and you will see that PowerShell consumes less memory (if done correctly).

One thing you can do to help is to reduce the size of the objects you're saving by paring them down to just the properties you're interested in.
$filesToIndex = Get-ChildItem -Path $path -Recurse |
    Where-Object { !$_.PSIsContainer } |
    Select-Object Name, FullName, Length
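Combining both suggestions: if the index is ultimately written out anyway, you can stream straight from Get-ChildItem into Export-Csv so nothing ever accumulates in a variable. A minimal sketch, assuming a hypothetical output path of C:\Temp\fileIndex.csv:
# Streams one file at a time; only Name, FullName and Length are kept per object
Get-ChildItem -Path $path -Recurse |
    Where-Object { !$_.PSIsContainer } |
    Select-Object Name, FullName, Length |
    Export-Csv -Path 'C:\Temp\fileIndex.csv' -NoTypeInformation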

Related

Apply a file to multiple folders using a better PowerShell script

I'm working on a project where I have to apply a file to multiple folders every so often. I'm trying to learn some PowerShell commands to make this a little easier. I came up with the following script, which works, but I feel that this is too verbose and could be distilled down with a better script:
[string]$sourceDirectory = "C:\Setup\App Folder Files\*"
# Create an array of folders
$destinationDirectories = @(
'C:\Users\GG_RCB1\Documents\',
'C:\Users\GG_RCB2\Documents\',
'C:\Users\LA_RCB1\Documents\',
'C:\Users\PR_RCB1\Documents\',
'C:\Users\PQ_RCB1\Documents\',
'C:\Users\PQ_RCB2\Documents\',
'C:\Users\XC_RCB1\Documents\',
'C:\Users\XC_RCB2\Documents\',
'C:\Users\XC_RCB3\Documents\',
'C:\Users\XC_RCB4\Documents\',
'C:\Users\XC_RCB5\Documents\',
'C:\Users\XC_RCB6\Documents\',
'C:\Users\XC_RCB7\Documents\',
'C:\Users\XC_RCB8\Documents\')
# Perform iteration to create the same file in each folder
foreach ($i in $destinationDirectories) {
    Copy-Item -Force -Recurse -Verbose $sourceDirectory -Destination $i
}
I go into this process knowing that every folder in the User folder area is going to have the same format: _RCB<#>\Documents\
I know that I can loop through those files using this code:
Get-ChildItem -Path 'C:\Users' | Where-Object {$_.Name -match "^[A-Z][A-Z]_RCB"}
What I'm not sure how to do is how, within that loop, to drill down to the Documents folder and do the copy. I want to avoid having to keep updating the array from the first code sample, particularly since I know the naming convention of the subfolders in the Users folder. I'm just looking for a cleaner way to do this.
Thanks for any suggestions!
Ehh, I'll go ahead and post what I had in mind as well. Not to take away from @Mathias' suggestion in the comments, but to offer my solution, here's my take:
Get-ChildItem -Path "C:\users\[A-Z][A-Z]_RCB*\documents" |
Copy-Item -Path $sourceDirectory -Destination { $_.FullName } -Recurse -WhatIf
Since everyone loves the "one-liners" that can accomplish your needs: Get-ChildItem accepts wildcard expressions in its path, which lets us accomplish this in one go. Given that your directories are...
consistent with the same naming pattern,
[A-Z][A-Z]_*
and the folder destination is the same.
Documents
Luckily, Copy-Item also has some cool features of its own, such as accepting a script block for its destination, which allows passing the $_.FullName property as the destination while the items come down the pipeline one at a time.
Remove the -WhatIf parameter once you've confirmed the results are what you're after.
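For completeness, a usage sketch with the $sourceDirectory value taken from the question (paths assumed to match your environment); run it with -WhatIf first, then drop the switch:
[string]$sourceDirectory = "C:\Setup\App Folder Files\*"

# Every XX_RCB<n>\Documents folder is matched by the wildcard path, no array needed
Get-ChildItem -Path "C:\users\[A-Z][A-Z]_RCB*\documents" |
    Copy-Item -Path $sourceDirectory -Destination { $_.FullName } -Recurse -Force -Verbose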

Get-Childitem - improve memory usage and performance

I would also like to retrieve the file owner, LastAccessTime, LastWriteTime, and CreationTime. Get-ChildItem has known performance issues when scaled to large directory structures.
We had some performance issue while looking for files in a folder which have more than 100000 subfolders.
Here is my script:
$Dir = get-childitem "W:\DATA" -recurse -force
$Dir | Select-Object Name, FullName, LastAccessTime, LastWriteTime, CreationTime, @{N='Owner';E={$_.GetAccessControl().Owner}} | Export-Csv -Path C:\Scripts\xlsx.csv -NoTypeInformation
thanks in advance,
Memory
PowerShell objects (PSCustomObject) are optimized for streaming (one-at-a-time processing) and are therefore quite heavy.
Using parentheses ((...)) or assigning your stream to a variable (like $Dir = ...) will choke the pipeline and pile all the objects up in memory.
To reduce memory usage, immediately pass your objects through the pipeline by chaining the concerned cmdlets with a pipe character:
Get-childitem "W:\DATA" -recurse -force |
Select-Object astAccessTime, LastWriteTime, CreationTime |
Export-Csv -path C:\Scripts\xlsx.csv -NoTypeInformation
Performance
Starting with a quote from PowerShell scripting performance considerations:
PowerShell scripts that leverage .NET directly and avoid the pipeline tend to be faster than idiomatic PowerShell. Idiomatic PowerShell typically uses cmdlets and PowerShell functions heavily, often leveraging the pipeline, and dropping down into .NET only when necessary.
In your case, the performance bottleneck is likely not in PowerShell but in the server and the network, meaning that leveraging .NET directly would probably not have any effect on performance.
In fact, using the PowerShell pipeline might even be faster in this case: you do not have to wait until the last file info item is loaded into memory, because the native PowerShell pipeline starts processing the first item immediately while the next items are (slowly) provided by the server.
If you change the last cmdlet (Export-Csv) to ConvertTo-Csv you will probably see the difference: a (correctly set up) pipeline starts producing output almost on the fly, while other solutions take a while before outputting any data to the console.
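A quick way to see that difference, as a sketch: swap the final Export-Csv for ConvertTo-Csv and watch how soon the first rows reach the console.
# With a streaming pipeline, rows appear almost immediately;
# a buffered approach sits silent until everything has been collected
Get-ChildItem "W:\DATA" -Recurse -Force |
    Select-Object Name, FullName, LastAccessTime, LastWriteTime, CreationTime |
    ConvertTo-Csv -NoTypeInformation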
The numbers tell the tale
(In Dutch: "meten is weten", which literally means: measuring is knowing)
If you aren't sure what technique would give you the best performance, I recommend you to simply test it (on a subset), like:
Measure-Command {
    Get-ChildItem "W:\DATA" -Recurse -Force |
        Select-Object Name, FullName, LastAccessTime, LastWriteTime, CreationTime |
        Export-Csv -Path C:\Scripts\xlsx.csv -NoTypeInformation
} | Select-Object TotalMilliseconds
and compare the results.
Give this a try; it should be faster than Get-ChildItem. You could also use [SearchOption]::AllDirectories and no Collections.Queue, but I'm not certain whether that would consume less memory (a sketch of that variant follows the code below).
using namespace System.Collections
using namespace System.IO

class InfoProps {
    [string] $Name
    [string] $FullName
    [datetime] $LastAccessTime
    [datetime] $LastWriteTime
    [datetime] $CreationTime
    [string] $Owner

    InfoProps([object]$FileInfo)
    {
        $this.Name = $FileInfo.Name
        $this.FullName = $FileInfo.FullName
        $this.LastAccessTime = $FileInfo.LastAccessTime
        $this.LastWriteTime = $FileInfo.LastWriteTime
        $this.CreationTime = $FileInfo.CreationTime
        $this.Owner = $FileInfo.GetAccessControl().Owner
    }
}

$initialDirectory = $pwd.Path
$queue = [Queue]::new()
$queue.Enqueue($initialDirectory)

& {
    while ($queue.Count)
    {
        $target = $queue.Dequeue()
        foreach ($child in [Directory]::EnumerateDirectories($target)) {
            $queue.Enqueue($child)
        }
        [InfoProps] [DirectoryInfo] $target # => Remove this line if you want only files!
        [InfoProps[]] [FileInfo[]] [Directory]::GetFiles($target)
    }
} | Export-Csv test.csv -NoTypeInformation
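For reference, the [SearchOption]::AllDirectories variant mentioned above could look roughly like the sketch below; it reuses the InfoProps class and the using statements from the block above. Unlike the queue version, a single EnumerateFiles call over the whole tree will stop with an exception at the first access-denied folder, so whether it actually saves memory or time would have to be measured.
$initialDirectory = $pwd.Path

& {
    # EnumerateFiles yields paths lazily instead of building one big array up front
    foreach ($file in [Directory]::EnumerateFiles($initialDirectory, '*', [SearchOption]::AllDirectories)) {
        [InfoProps] [FileInfo] $file
    }
} | Export-Csv test_alldirs.csv -NoTypeInformation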

Copying files defined in a list from network location

I'm trying to teach myself enough PowerShell or batch programming to figure out how to achieve the following (I've had a search and looked through a couple of hours of YouTube tutorials, but can't quite piece it all together to figure out what I need; I don't get tokens, for example, but they seem necessary in the for loop). Also, I'm not sure if the below is best achieved with robocopy or xcopy.
Task:
Define a list of files to retrieve in a CSV (the file name will be listed as a 13-digit number; the extension will be UNKNOWN, but will usually be .jpg and might occasionally be .png - could this be achieved with a wildcard?)
list would read something like:
9780761189931
9780761189988
9781579657159
For each line in this text file, do:
Search a network folder and all subfolders
If exact filename is found, copy to an arbitrary target (say a new folder created on desktop)
(Not 100% necessary, but nice to have) Once the For loop has completed, output a list of files copied into a text file in the newly created destination folder
I gather that I'll maybe need to do a couple of things first, like define variables for the source and destination folders? I found the below elsewhere but couldn't quite get my head around it.
set src_folder=O:\2017\By_Month\Covers
set dst_folder=c:\Users\%USERNAME%\Desktop\GetCovers
for /f "tokens=*" %%i in (ISBN.txt) DO (
xcopy /K "%src_folder%\%%i" "%dst_folder%"
)
Thanks in advance!
This solution is in powershell, by the way.
To get all the files under a folder, use Get-ChildItem and the pipeline; you can then compare each name to the contents of your CSV (which you can read using Import-Csv, by the way).
Get-ChildItem -path $src_folder -recurse | foreach{$_.fullname}
I'd personally then use a function to edit the name as a string, but I know this probably isn't the best way to do it. Create a function outside of the pipeline and have it return a modified path, in such a way that you can continue the previous line like this:
Get-ChildItem -path $src_folder -recurse | foreach{ $_.CopyTo((edit-path $_.fullname)) }
Where "edit-directory" is your function that takes in the path, and modifies it to return your destination path. Also, you can alternatively use robocopy or xcopy instead of CopyTo, but Copy-Item is a powershell native and doesn't require much string manipulation (which in my experience, the less, the better).
Edit: Here's a function that could do the trick:
function edit-path{
    Param([string] $path)
    # Swap the source root for the destination root, keeping the relative part of the path
    $modified_path = $dst_folder + "\" + $path.Substring($src_folder.Length)
    return $modified_path
}
Edit: Here's how to integrate the importing from CSV, so that the copy only happens to files that are written in the CSV (which I had left out, oops):
$csv = import-csv $CSV_path
Get-ChildItem -path $src_folder -recurse | where-object{ $csv -contains $_.name } | foreach{ $_.CopyTo((edit-path $_.fullname)) }
Note that you have to put the whole CSV path in the $CSV_path variable, and depending on how the contents of that file are written, you may have to use $_.FullName or other properties.
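Since the list in the question holds 13-digit numbers with no extension, matching on $_.Name will never hit; comparing against $_.BaseName sidesteps the extension problem. A rough sketch, assuming one number per line in the list and that the $src_folder, $dst_folder, and $CSV_path variables are already set as above (Get-ChildItem -File needs PowerShell 3.0 or later):
# One 13-digit number per line; Get-Content is enough when there is no real CSV header
$wanted = Get-Content $CSV_path

Get-ChildItem -Path $src_folder -Recurse -File |
    Where-Object { $wanted -contains $_.BaseName } |
    ForEach-Object {
        Copy-Item -Path $_.FullName -Destination $dst_folder
        $_.Name   # emit the copied name so it can be logged below
    } |
    Set-Content (Join-Path $dst_folder 'copied.txt')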
This seems like an average enough problem:
$Arr = Import-CSV -Path $CSVPath
Get-ChildItem -Path $Folder -Recurse |
    Where-Object -FilterScript { $Arr -contains $PSItem.Name.Substring(0, $PSItem.Name.Length - 4) } |
    ForEach-Object -Process {
        Copy-Item -Path $PSItem.FullName -Destination $env:UserProfile\Desktop
        $PSItem.Name | Out-File -FilePath $env:UserProfile\Desktop\Results.txt -Append
    }
I'm not great with string manipulation so the string bit is a bit confusing, but here's everything spelled out.

Powershell memory exhaustion using NTFSSecurity module on a deep folder traverse

I have been tasked with reporting all of the ACL's on each folder in our Shared drive structure. Added to that, I need to do a look up on the membership of each unique group that gets returned.
I'm using the NTFSSecurity module in conjunction with the Get-ChildItem2 cmdlet to get past the 260-character path length limit. The path(s) I am traversing are many hundreds of folders deep and long since passed the 260-character limit.
I have been banging on this for a couple of weeks. My first challenge was crafting my script to do my task all at once, but now I'm thinking that's my problem... The issue at hand is resources, specifically memory exhaustion. Once the script gets into one of the deep folders, it consumes all RAM and starts swapping to disk, and I eventually run out of disk space.
Here is the script:
$csvfile = 'C:\users\user1\Documents\acl cleanup\dept2_Dir_List.csv'
foreach ($record in Import-Csv $csvFile)
{
    $Groups = Get-ChildItem2 -Directory -Path $record.FullName -Recurse | Get-NTFSAccess | where -Property AccountType -eq -Value group
    $Groups2 = $Groups | where -Property Account -notmatch -Value '^builtin|^NT AUTHORITY\\|^Creator|^AD\\Domain'
    $Groups3 = $Groups2 | select Account -Unique
    $GroupMembers = ForEach ($Group in $Groups3) {
        (Get-ADGroup $Group.Account.Sid | Get-ADGroupMember | select Name, @{N="GroupName";e={$Group.Account}})
    }
    $Groups2 | select FullName,Account,AccessControlType,AccessRights,IsInherited | export-csv "C:\Users\user1\Documents\acl cleanup\Dept2\$($record.name).csv"
    $GroupMembers | export-csv "C:\Users\user1\Documents\acl cleanup\Dept2\$($record.name)_GroupMembers.csv"
}
NOTE: The dir list it reads in is the top level folders created from a get-childitem2 -directory | export-csv filename.csv
During the run, it appears not to be flushing memory properly; that's just a guess from observation. At the end of each pass through the loop, the variables should be overwritten, I thought, but memory usage never goes down, so it looks to me like it isn't properly releasing it. Like I said, a guess... I have been reading about runspaces, but I am confused about how to implement that with this script. Is that the right direction for this?
Thanks in advance for any assistance...!
Funny you should post about this, as I just finished a modified version of the script that I think works much better. A friend turned me on to "function filters", which seem to work well here. I'll test it on the big directories tomorrow to see how much better the memory management is, but so far it looks great.
#Define the function 'filter' here and call it 'GetAcl'. Process is the keyword that tells the function to deal with each item in the pipeline one at a time
Function GetAcl {
PROCESS {
Get-NTFSAccess $_ | where -property accounttype -eq -value group | where -property account -notmatch -value '^builtin|^NT AUTHORITY\\|^Creator|^AD\\Domain'
}
}
#Import the directory top level paths
$Paths = import-csv 'C:\users\rknapp2\Documents\acl cleanup\dept2_Dir_List.csv'
#Process each line from the importcsv one at a time and run GetChilditem against it.
#Notice the second part - I pipe ('|') the results of Get-ChildItem2 to the function, which, because of the type of function it is, handles each item one at a time
#When done, pass the results to Export-Csv and send them to a file named after the path. This puts each dir into its own file.
ForEach ($Path in $paths) {
(Get-ChildItem2 -path $path.FullName -Recurse -directory) | getacl | export-csv "C:\Users\rknapp2\Documents\acl cleanup\TestFilter\$($path.name).csv" }
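As a side note, PowerShell has a dedicated filter keyword that is shorthand for a function consisting only of a process block, so the same helper could be written like the sketch below (same NTFSSecurity cmdlet assumed):
# Equivalent to the Function/PROCESS pair above: the body runs once per pipeline item
filter GetAcl {
    Get-NTFSAccess $_ |
        Where-Object AccountType -eq 'group' |
        Where-Object Account -notmatch '^builtin|^NT AUTHORITY\\|^Creator|^AD\\Domain'
}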

powershell slow(?) - write names of subfolders to a text file

My PowerShell script seems slow; when I run the code below in the ISE, it keeps running and doesn't stop.
I am trying to write the list of subfolders in a folder (the folder path is in $scratchpath) to a text file. There are >30k subfolders.
$limit = (Get-Date).AddDays(-15)
$path = "E:\Data\PathToScratch.txt"
$scratchpath = Get-Content $path -TotalCount 1
Get-ChildItem -Path $scratchpath -Recurse -Force | Where-Object { $_.PSIsContainer -and $_.CreationTime -lt $limit } | Add-Content C:\Data\eProposal\POC\ScratchContents.txt
Let me know if my approach is not optimal. Ultimately, I will read the text file, zip the subfolders for archival and delete them.
Thanks for your help in advance. I'm new to PowerShell and have watched a few videos on MVA.
Add-Content, Set-Content, and even Out-File are notoriously slow in PowerShell. This is because each call opens the file, writes to it, and closes the handle. It never does anything more intelligently than that.
That doesn't sound bad until you consider how pipelines work with Get-ChildItem (and Where-Object and Select-Object). It doesn't wait until it's completed before it begins passing objects into the pipeline; it starts passing objects as soon as the provider returns them. For a large result set, this means that objects are still feeding into the pipeline long after several have finished processing. Generally speaking, this is great! It means the system functions more efficiently, and it's why stuff like this:
$x = Get-ChildItem;
$x | ForEach-Object { [...] };
Is significantly slower than stuff like this:
Get-ChildItem | ForEach-Object { [...] };
And it's why stuff like this appears to stall:
Get-ChildItem | Sort-Object Name | ForEach-Object { [...] };
The Sort-Object cmdlet needs to wait until it has received all pipeline objects before it sorts; it kind of has to, to be able to sort. The sort itself is nearly instantaneous; it's just the cmdlet waiting until it has the full results.
The issue with Add-Content is that, well, it experiences the pipeline not as "Here's a giant string to write once," but instead as "Here's a string to write. Here's a string to write. Here's a string to write." You'll be sending content to Add-Content line by line. Each line will instantiate a new call to Add-Content, requiring the file to open, write, and close. You'll likely see better performance if you assign the result of Get-ChildItem [...] | Where-Object [...] to a variable, and then write the entire variable to the file at once:
$limit = (Get-Date).AddDays(-15);
$path = "E:\Data\PathToScratch.txt";
$scratchpath = Get-Content $path -TotalCount 1;
$Results = Get-ChildItem -Path $scratchpath -Recurse -Force -Directory |
    Where-Object { $_.CreationTime -lt $limit } |
    Select-Object -ExpandProperty FullName;
Add-Content C:\Data\eProposal\POC\ScratchContents.txt -Value $Results;
However, you might be concerned about memory usage if your results are actually going to be extremely large. You can actually use System.IO.StreamWriter for this purpose, too. My process improved in speed by nearly two orders of magnitude (from 12 hours to 20 minutes) by switching to StreamWriter and also only calling StreamWriter when I had about 250 lines to write (that seemed to be the break-even point for StreamWriter's overhead). But I was parsing all ACLs for user home and group shares for about 10,000 users and nearly 10 TB of data. Your task might not be as large.
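A rough sketch of that StreamWriter approach, assuming roughly 250 lines per flush (the break-even point mentioned above) and the same paths as in the question:
$limit = (Get-Date).AddDays(-15);
$scratchpath = Get-Content "E:\Data\PathToScratch.txt" -TotalCount 1;

# One writer stays open for the whole run instead of reopening the file per line
$writer = New-Object System.IO.StreamWriter 'C:\Data\eProposal\POC\ScratchContents.txt';
$buffer = New-Object System.Text.StringBuilder;
$count = 0;
try {
    Get-ChildItem -Path $scratchpath -Recurse -Force -Directory |
        Where-Object { $_.CreationTime -lt $limit } |
        ForEach-Object {
            [void]$buffer.AppendLine($_.FullName);
            $count++;
            if ($count % 250 -eq 0) {
                # Flush the buffered block in a single write
                $writer.Write($buffer.ToString());
                [void]$buffer.Clear();
            }
        };
    $writer.Write($buffer.ToString());  # write whatever is left over
}
finally {
    $writer.Dispose();
}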
Here's a good blog explaining the issue.
Do you have at least PowerShell 3.0? If you do, you should be able to reduce the time by filtering out the files, since you are returning those as well.
Get-ChildItem -Path $scratchpath -Recurse -Force -Directory | ...
Currently you are returning all files and folders and then filtering out the files with $_.PSIsContainer, which is slower. So you should end up with something like this:
Get-ChildItem -Path $scratchpath -Recurse -Force -Directory |
    Where-Object { $_.CreationTime -lt $limit } |
    Select-Object -ExpandProperty FullName |
    Add-Content C:\Data\eProposal\POC\ScratchContents.txt