Getting files in a directory that has over 7 million items using PowerShell

This is currently what I am trying to execute.
$folderPath = 'M:\abc\WORKFORCE\Media\Attachments'
Write-Host "Executing Script..."
foreach ($file in Get-ChildItem $folderPath -file)
{
# execute code
}
However, when I execute the PowerShell script it freezes on me. It's been this way for an hour now. I'm assuming it might be because the directory has over 8 million items in it. Is there a more efficient way to move these items? Is waiting my only option? Or is it not possible to do this at all with PowerShell because of how large the directory is?

When you do not need any information except the file name, you should use [System.IO.Directory]::EnumerateFiles($folderPath, '*').
EnumerateFiles returns IEnumerable[String].
IEnumerable is a special type that can be used in foreach statements. It does not load the whole listing into memory; instead, it fetches the next item only when requested, so it starts returning results almost immediately.
So, your code will be
$filesIEnumerable = [System.IO.Directory]::EnumerateFiles($folderPath, '*')
foreach ($fullName in $filesIEnumerable) {
    # code here
    $fileName = [System.IO.Path]::GetFileName($fullName)
    # more code here
}
In case you want to keep the whole list of files in memory instead of iterating once (for example, if you need to iterate several times), EnumerateFiles is still faster and requires less memory than Get-ChildItem because it does not retrieve any extended file attributes:
$files = @([System.IO.Directory]::EnumerateFiles($folderPath, '*'))
Read more about EnumerateFiles at learn.microsoft.com.
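Since the end goal in the question is to move these items, here is a minimal sketch that combines EnumerateFiles with a per-file move; the destination path is a placeholder:
$folderPath = 'M:\abc\WORKFORCE\Media\Attachments'
$destination = 'F:\ArchiveDestination'   # placeholder destination folder

foreach ($fullName in [System.IO.Directory]::EnumerateFiles($folderPath, '*')) {
    $fileName = [System.IO.Path]::GetFileName($fullName)
    # Move-Item works too; the static .NET call skips the per-file provider overhead
    [System.IO.File]::Move($fullName, (Join-Path $destination $fileName))
}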

Without further explanation of what the end goal of the script is, there cannot really be a complete solution to this question.
However, a tip on performance can be given.
Original script:
$folderPath = 'M:\abc\WORKFORCE\Media\Attachments'
Write-Host "Executing Script..."
foreach ($file in Get-ChildItem $folderPath -file)
{
# execute code
}
Suggested approach:
$Files = Get-ChildItem 'M:\abc\WORKFORCE\Media\Attachments' -File
$DestinationPath = 'F:\DestinationFolder'
Write-Host "Executing Script..."
$Files | ForEach-Object {
    # execute code
    # Write-Verbose "Moving $($_.Name)"
    # $_ | Move-Item -Destination $DestinationPath
}
That being said, it looks like filimonic's answer executes faster than my suggestion.
(To expand on that, check this thread. A rough way to verify the difference yourself is sketched below.)
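A minimal timing sketch, assuming you only care about the cost of the listing itself and not the per-file work:
$folderPath = 'M:\abc\WORKFORCE\Media\Attachments'

# Enumerate via .NET and force the full list into memory
(Measure-Command {
    $null = @([System.IO.Directory]::EnumerateFiles($folderPath, '*'))
}).TotalSeconds

# Same listing via Get-ChildItem
(Measure-Command {
    $null = @(Get-ChildItem $folderPath -File)
}).TotalSeconds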

Related

How to find a file from files via PowerShell?

I had an Excel script to search for files with a command.
I found this example on the forum; the statement says that to search for a file by name, you need to write the name and append (*), but when I run it, it does not find anything:
Get-ChildItem -Path "C:\Folder\test*"
What can I do to simplify the code and make it much faster? Waiting 10 minutes to find one file out of 10,000 is very long.
I have a folder with 10,000 files, and the Excel VBA script searches it in about 2-3 seconds.
When I script it in PowerShell via
$find = Get-ChildItem -Path "C:\Folder"
for ($f = 0; $f -lt $find.Count; $f++) {
    $path_name = $find[$f].Name
    if ($path_name -eq 'test') {
        Write-Host 'success'
    }
}
it takes far too long; the script hangs for 10 minutes and does not respond, and may or may not ever finish.
How can I find a file by filter using
Get-ChildItem
To make your search faster you can use the -Filter parameter of Get-ChildItem.
$fileName = "test.txt"
$filter = "*.txt"
$status = Get-ChildItem -Path "C:\PS\" -Recurse -Filter $filter | Where-Object {$_.Name -match $fileName}
if ($status) {
Write-Host "$($status.Name) is found"
} else {
Write-Host "No such file is available"
}
You could also compare the speed of the searches by using Measure-Command, for example:
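A minimal sketch (assuming the C:\PS folder from the example above) that times provider-side filtering against post-filtering with Where-Object:
# Provider-side filtering with -Filter
(Measure-Command {
    Get-ChildItem -Path "C:\PS\" -Recurse -Filter "*.txt"
}).TotalSeconds

# Post-filtering in the pipeline
(Measure-Command {
    Get-ChildItem -Path "C:\PS\" -Recurse | Where-Object { $_.Name -like "*.txt" }
}).TotalSeconds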
If the disk the data is on is slow then it'll be slow no matter what you do.
If the folder is full of files then it'll also be slow depending on the amount of RAM in the system.
Fewer files per folder equals more performance, so try to split them up into several folders if possible.
Doing that may also mean you can run several Get-ChildItem calls at once (disk permitting) using PSJobs; a sketch of that follows the timing example below.
Using several loops to take care of a related problem usually makes the whole thing run "number of loops" times as long. That's what Where-Object is for (in addition to the -Filter, -Include and -Exclude parameters of Get-ChildItem).
Console I/O takes A LOT of time. Do NOT output ANYTHING unless you have to, especially not inside loops (or cmdlets that act like loops).
For example, including basic statistics:
$startTime = Get-Date
$FileList = Get-ChildItem -Path "C:\Folder" -File -Filter 'test'
$EndTime = Get-Date
$FileList
$AfterOutputTime = Get-Date
'Seconds taken for listing:'
($EndTime - $startTime).TotalSeconds
'Seconds taken including output:'
($AfterOutputTime - $startTime).TotalSeconds
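As for the PSJobs idea mentioned above, here is a minimal sketch that starts one background job per top-level subfolder and then collects the results; the C:\Folder path and the 'test*' filter are just placeholders for this question:
$roots = Get-ChildItem -Path "C:\Folder" -Directory

# One background job per subfolder
$jobs = foreach ($root in $roots) {
    Start-Job -ScriptBlock {
        param($folder)
        Get-ChildItem -Path $folder -File -Filter 'test*'
    } -ArgumentList $root.FullName
}

# Wait for all jobs, collect their output, then clean up
$results = $jobs | Wait-Job | Receive-Job
$jobs | Remove-Job
$results
Keep in mind that job start-up and result serialization have their own overhead, so this only pays off when the disk can keep up and the folders are large.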

Sensitive word search with powershell

I am somewhat new to PowerShell, so I would appreciate any help.
I am trying to put a PS script together so that I can search a file for sensitive words before transferring it from one network to another, like 'Classified' and multiple other words that I can add to a word bank in a text file, instead of updating the code every time.
Right now I am forced to use PowerShell 2 on Windows 7 and Server 2008.
Select-String -Path e:\transfer_folder\*.* -pattern Classified,restricted
Then I can get output for any hits on the list of words so that I can find them. I am trying to speed up my searching through hundreds of pages of documents with what I like to call a dirty word search, so I do not put something on the wrong network that should not go there.
You've got the right idea. The -Pattern parameter in PowerShell accepts regular expressions. If you've never worked with regular expressions, take a look at this beginner's guide to using regex pattern matching. What you probably want is a set of variables that you can use to dynamically pick out those sensitive keywords.
The short and simple answer is that you want to use a pipe to separate your options for pattern, and pass it in as a string.
Select-String -Path e:\transfer_folder\*.* -pattern "Classified|Restricted"
Also, you might want to think about doing this at the file level rather than just importing all of your stuff willy-nilly like that. I would go for something like:
$files = @(Get-ChildItem -Path E:\transfer_folder\* -Include "*.txt", "*.etc").FullName
(The @ symbol means that you get your output as an array. The .FullName means that you're only selecting the FullName property from the objects produced by the command.)
Then you can process each file individually, like:
Foreach ($file in $files) {
Write-Host "Processing $file"
echo (Select-String -Path $file -Pattern $pattern)
}
One of the reasons that I love powershell is how comparatively easy it is to perform these types of matching operations. If you dig into Regex, you'll notice that you can represent "OR" as "|". So you have two options to do this logically:
Just hard write it out
$pattern = "Classified|Forbidden|Death|Danger"
Do it dynamically
Scripting is all about not having to do things more than once, right? So you'll probably want to encapsulate this in a function or something. Or maybe you want to get your words from a text file? You can be like:
(might take some tweaking)
function Get-ForbiddenWords ([string[]]$words, [string]$folder) {
    foreach ($word in $words) {
        $pattern += "$word|"
    }
    # remove the trailing pipe
    $pattern = $pattern -replace '.$'
    $files = @(Get-ChildItem -Path (Join-Path $folder '*') -Include "*.txt", "*.etc").FullName
    foreach ($file in $files) {
        Write-Host "Processing $file"
        echo (Select-String -Path $file -Pattern $pattern)
    }
}
Now you can put this in your powershell profile and invoke it with
Get-ForbiddenWords -words secret, dangerous, whatever -folder E:\transfer_folder\
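If you would rather skip the loop entirely, the same pattern can be built in one step with -join; a minimal sketch (the word list and the word-bank path are placeholders):
$words = 'Classified', 'Restricted', 'Secret'        # example word bank
$pattern = $words -join '|'

# Or read the word bank from a text file, one word per line (placeholder path):
# $pattern = (Get-Content 'E:\wordbank.txt') -join '|'

Select-String -Path E:\transfer_folder\*.* -Pattern $pattern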

Powershell - clarification about foreach

I am learning PowerShell and I need someone to give me an initial push to get me through the learning curve. I am familiar with programming and DOS, but not PowerShell.
What I would like to do is list all the files from my designated directory and push the filenames into an array. I am not very familiar with the syntax, and when I tried to run my test I was asked to supply parameters.
Could someone please enlighten me and show me the correct way to get what I want?
This is what powershell asked me:
PS D:\ABC> Test.ps1
cmdlet ForEach-Object at command pipeline position 2
Supply values for the following parameters:
Process[0]:
This is my test:
[string]$filePath = "D:\ABC\*.*";
Get-ChildItem $filePath | foreach
{
$myFileList = $_.BaseName;
write-host $_.BaseName
}
Why was ps asking about Process[0]?
I want PowerShell to list all the files from the directory and pipe the results to foreach, where I put each file into the $myFileList array and print out the filename as well.
Don't confuse foreach (the statement) with ForEach-Object (the cmdlet). Microsoft does a terrible job with this because there is an alias of foreach that points to ForEach-Object, so when you use foreach you have to know which version you're using based on how you're using it. Their documentation makes this worse by further conflating the two.
The one you're trying to use in your code is ForEach-Object, so you should use the full name of it to differentiate it. From there, the issue is that the { block starts on the next line.
{} is used in PowerShell for blocks of code related to statements (like while loops) but is also used to denote a [ScriptBlock] object.
When you use ForEach-Object it's expecting a scriptblock, which can be taken positionally, but it must be on the same line.
Conversely, since foreach is a statement, it can use its {} on the next line.
Your code with ForEach-Object:
Get-ChildItem $filePath | ForEach-Object {
$myFileList = $_.BaseName;
write-host $_.BaseName
}
Your code with foreach:
$files = Get-ChildItem $filePath
foreach ($file in $Files)
{
$myFileList = $file.BaseName;
write-host $file.BaseName
}
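Since the original goal was to collect the file names into an array, here is a minimal sketch of both forms doing that; it assumes the $filePath variable from the question:
$filePath = "D:\ABC\*.*"

# Pipeline form - wrap the pipeline in @() to guarantee an array
$myFileList = @(Get-ChildItem $filePath | ForEach-Object { $_.BaseName })

# Statement form - PowerShell collects the loop output for you
$myFileList = foreach ($file in Get-ChildItem $filePath) { $file.BaseName }

foreach ($name in $myFileList) { Write-Host $name }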

Runtime of Foreach-Object vs Foreach loop

I want to add a progress bar to my script, but for that I need the total number of folders.
Is there a significant runtime difference between:
Get-ChildItem $path -Directory | ForEach-Object {
#do work
}
and
$folders = Get-ChildItem $path -Directory
foreach($folder in $folders){
#do work
}
Then I can use $folders.Count as my total amount of folders. I don't know how to do it with a foreach-object loop.
You can check for yourself:
Measure-Command {
1..100000 | ForEach-Object { $_ }
}
1.17s
Measure-Command {
foreach ($i in 1..100000)
{
$i
}
}
0.15s
Piping is designed to process items immediately as they appear so the entire length of the list is not known while it's being piped.
Get-ChildItem $path -Directory | ForEach {
# PROCESSING STARTS IMMEDIATELY
# LENGTH IS NOT KNOWN
}
Advantage: processing starts immediately, no delay to build the list.
Disadvantage: the list length is not known until it's fully processed
On the other hand, assigning the list to a variable builds the entire list at this point, which can take an extremely large amount of time if the list contains lots of items or it's slow to build, for example, if it's a directory with lots of nested subdirectories, or a slow network directory.
# BUILD THE ENTIRE LIST AND ASSIGN IT TO A VARIABLE
$folders = Get-ChildItem $path -Directory
# A FEW MOMENTS/SECONDS/MINUTES/HOURS LATER WE CAN PROCESS IT
ForEach ($folder in $folders) {
# LENGTH IS KNOWN: $folders.count
}
Advantage of building the list + foreach statement: the overall time spent is less, because the processing { } block is not invoked as a scriptblock for every item; with piping it is invoked like a function for each item, and that invocation overhead is significant in PowerShell.
Disadvantage: the initial delay while the list assignment statement completes can be very large.
Yes, there is a performance difference. foreach is faster than ForEach-Object, but requires more memory, because all items ($folders) must be in memory. ForEach-Object processes one item at a time as they're passed through the pipeline, so it has a smaller memory footprint, but isn't as fast as foreach.
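Since the end goal here was a progress bar, a minimal sketch using the foreach form, where the total count is known up front (it assumes the $path variable from the question and at least one folder):
$folders = Get-ChildItem $path -Directory
$total = $folders.Count
$i = 0
foreach ($folder in $folders) {
    $i++
    Write-Progress -Activity 'Processing folders' -Status $folder.Name -PercentComplete (($i / $total) * 100)
    # do work
}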

powershell slow(?) - write names of subfolders to a text file

My PowerShell script seems slow; when I run the code below in the ISE, it keeps running and doesn't stop.
I am trying to write the list of subfolders in a folder (the folder path is in $scratchpath) to a text file. There are more than 30k subfolders.
$limit = (Get-Date).AddDays(-15)
$path = "E:\Data\PathToScratch.txt"
$scratchpath = Get-Content $path -TotalCount 1
Get-ChildItem -Path $scratchpath -Recurse -Force | Where-Object { $_.PSIsContainer -and $_.CreationTime -lt $limit } | Add-Content C:\Data\eProposal\POC\ScratchContents.txt
Let me know if my approach is not optimal. Ultimately, I will read the text file, zip the subfolders for archival and delete them.
Thanks for your help in advance. I am new to PS and have watched a few videos on MVA.
Add-Content, Set-Content, and even Out-File are notoriously slow in PowerShell. This is because each call opens the file, writes to it, and closes the handle. It never does anything more intelligent than that.
That doesn't sound bad until you consider how pipelines work with Get-ChildItem (and Where-Object and Select-Object). It doesn't wait until it's completed before it begins passing objects into the pipeline. It starts passing objects as soon as the provider returns them. For a large result set, this means that objects are still feeding into the pipeline long after several have finished processing. Generally speaking, this is great! It means the system functions more efficiently, and it's why stuff like this:
$x = Get-ChildItem;
$x | ForEach-Object { [...] };
Is significantly slower than stuff like this:
Get-ChildItem | ForEach-Object { [...] };
And it's why stuff like this appears to stall:
Get-ChildItem | Sort-Object Name | ForEach-Object { [...] };
The Sort-Object cmdlet needs to wait until it has received all pipeline objects before it sorts; it kind of has to, in order to be able to sort. The sort itself is nearly instantaneous; it's just the cmdlet waiting until it has the full results.
The issue with Add-Content is that, well, it experiences the pipeline not as, "Here's a giant string to write once," but instead as, "Here's a string to write. Here's a string to write. Here's a string to write. Here's a string to write." You'll be sending content to Add-Content here line by line. Each line will instantiate a new call to Add-Content, requiring the file to open, write, and close. You'll likely see better performance if you assign the result of Get-ChildItem [...] | Where-Object [...] to a variable, and then write the entire variable to the file at once:
$limit = (Get-Date).AddDays(-15);
$path = "E:\Data\PathToScratch.txt";
$scratchpath = Get-Content $path -TotalCount 1;
$Results = Get-ChildItem -Path $scratchpath -Recurse -Force -Directory | `
Where-Object{$_.CreationTime -lt $limit } | `
Select-Object -ExpandProperty FullName;
Add-Content C:\Data\eProposal\POC\ScratchContents.txt -Value $Results;
However, you might be concerned about memory usage if your results are actually going to be extremely large. You can actually use System.IO.StreamWriter for this purpose, too. My process improved in speed by nearly two orders of magnitude (from 12 hours to 20 minutes) by switching to StreamWriter and also only calling StreamWriter when I had about 250 lines to write (that seemed to be the break-even point for StreamWriter's overhead). But I was parsing all ACLs for user home and group shares for about 10,000 users and nearly 10 TB of data. Your task might not be as large.
Here's a good blog explaining the issue.
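For reference, a rough sketch of the StreamWriter approach mentioned above, reusing the $scratchpath and $limit variables from this answer and the output path from the question:
# StreamWriter keeps one file handle open for the whole run, instead of
# opening and closing the file for every line the way Add-Content does
$writer = New-Object System.IO.StreamWriter 'C:\Data\eProposal\POC\ScratchContents.txt'
try {
    Get-ChildItem -Path $scratchpath -Recurse -Force -Directory |
        Where-Object { $_.CreationTime -lt $limit } |
        ForEach-Object { $writer.WriteLine($_.FullName) }
}
finally {
    $writer.Close()
}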
Do you have at least PowerShell 3.0? If you do, you should be able to reduce the time by filtering out the files, since you are returning those as well.
Get-ChildItem -Path $scratchpath -Recurse -Force -Directory | ...
Currently you are returning all files and folders and then filtering out the files with $_.PSIsContainer, which is slower. So you should end up with something like this:
Get-ChildItem -Path $scratchpath -Recurse -Force -Directory |
Where-Object{$_.CreationTime -lt $limit } |
Select-Object -ExpandProperty FullName |
Add-Content C:\Data\eProposal\POC\ScratchContents.txt