Increase speed of PowerShell Get-ChildItem on a large directory

I have a script that references a .csv document of filenames and then runs Get-ChildItem over a large directory to find each file and pull its owner. Finally, the info is output to another .csv document. We use this to find who created files. Additionally, I have it create .txt files with the filename and a timestamp to see how fast the script is finding the data. The code is as follows:
Get-ChildItem -Path $serverPath -Filter $properFilename -Recurse -ErrorAction SilentlyContinue |
    Where-Object { $_.LastWriteTime -lt (Get-Date).AddDays(30) -and
                   $_.Extension -eq '.jpg' } |
    Select-Object -Property @{
        Name       = 'Owner'
        Expression = { (Get-Acl -Path $_.FullName).Owner }
    }, '*' |
    Export-Csv -Path "$desktopPath\Owner_Reports\Owners.csv" -NoTypeInformation -Append

$time = Get-Date -Format 'hhmm'
Out-File -FilePath "$desktopPath\Owner_Reports\${fileName}_$time.txt"
}   # closes the surrounding loop over filenames (not shown in this snippet)
This script serves its purpose but is extremely slow because of the large size of the parent directory. Currently it takes 12 minutes per filename. We query approximately 150 files at a time, and this long wait is hindering production.
Does anyone have better logic that could increase the speed? I assume that each time the script runs, Get-ChildItem re-enumerates the parent directory, but I am not sure. Is there a way we can build that listing once instead of for each filename?
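For illustration, enumerating the directory once and then matching every CSV filename against that single listing might look roughly like this (a sketch only; $csvFilenames is an assumed name for the list read from the .csv, and the other variables are taken from the snippet above):

$index = @{}
Get-ChildItem -Path $serverPath -Recurse -ErrorAction SilentlyContinue |
    Where-Object { $_.Extension -eq '.jpg' -and $_.LastWriteTime -lt (Get-Date).AddDays(30) } |
    ForEach-Object {
        # group files by name so every match is kept, even when the same name appears in different subfolders
        if (-not $index.ContainsKey($_.Name)) { $index[$_.Name] = @() }
        $index[$_.Name] += $_
    }

foreach ($properFilename in $csvFilenames) {
    if ($index.ContainsKey($properFilename)) {
        $index[$properFilename] |
            Select-Object -Property @{ Name = 'Owner'; Expression = { (Get-Acl -Path $_.FullName).Owner } }, '*' |
            Export-Csv -Path "$desktopPath\Owner_Reports\Owners.csv" -NoTypeInformation -Append
    }
}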
I am open to any and all suggestions! If more data is required (such as the variable naming, etc.), I will provide it upon request.
Thanks!

Related

Powershell: Get-ChildItem performance to deal with bulk files

The scenario: on a remote server, a folder is shared for people to access the log files.
The log files are kept for around 30 days before they are aged out, and each day around 1,000 log files are generated.
For problem analysis, I need to copy log files to my own machine according to the file timestamp.
My previous strategy was:
1. Use the dir /OD command to get the list of files into a file on my local PC
2. Open the file, find the timestamp, and get the list of files I need to copy
3. Use the copy command to copy the actual log files
It works but needs some manual work; in step 2 I use Notepad++ and regular expressions to filter on the timestamp.
I tried to use PowerShell like this:
Get-ChildItem -Path $remotedir |
    Where-Object { $_.LastWriteTime -gt $starttime -and $_.LastWriteTime -lt $endtime } |
    ForEach-Object { Copy-Item -Path $_.FullName -Destination '.\' }
However, using this approach it ran for hours and hours and no file was copied. By comparison, the dir solution took around 7-8 minutes to generate the list of files, and the copy itself took some time, but not hours.
I guess most of the time is spent filtering the files. I'm not quite sure why Get-ChildItem's performance is so poor.
Can you please advise if there's anything I can change?
Thanks
For directories with a lot of files, Get-ChildItem is too slow. It looks like most of the time is spent enumerating the directory, then filtering with Where-Object, then copying each file.
Use .NET directly, particularly [io.directoryinfo] with the GetFileSystemInfos() method.
e.g.
$remotedir   = [IO.DirectoryInfo]'\\server\share'
$destination = '.\'
$filemask    = '*.*'
$starttime   = [datetime]'jan-21-2021 1:23pm'
$endtime     = [datetime]'jan-21-2021 4:56pm'

$remotedir.GetFileSystemInfos($filemask, [System.IO.SearchOption]::TopDirectoryOnly) | ForEach-Object {
    if ($_.LastWriteTime -gt $starttime -and $_.LastWriteTime -lt $endtime) {
        Copy-Item -Path $_.FullName -Destination $destination
    }
}
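A related option, included here only as an assumption worth testing rather than as part of the answer above, is [System.IO.Directory]::EnumerateFiles(), which streams entries as they are found instead of building the whole array first:

# Streaming enumeration; results start flowing before the full listing is complete.
[System.IO.Directory]::EnumerateFiles($remotedir.FullName, $filemask, [System.IO.SearchOption]::TopDirectoryOnly) |
    ForEach-Object {
        $fi = [System.IO.FileInfo]$_
        if ($fi.LastWriteTime -gt $starttime -and $fi.LastWriteTime -lt $endtime) {
            Copy-Item -Path $fi.FullName -Destination $destination
        }
    }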

How to speed up Get-ChildItem in Powershell

Just wondering, how could I speed up Get-ChildItem in PowerShell?
I have the following script to search for files created today and copy them over to another folder.
$fromDirectory = "test_file_*.txt"
$fromDirectory = "c:\sour\"
$toDirectory = "c:\test\"
Get-ChildItem $fromDirectory -Include $fileName -Recurse | Where {$_.LastWriteTime -gt (Get-Date).Date} | Copy-Item -Destination $toDirectory
Because the folder I'm searching has 124,553 history files, the search takes ages. Does anyone know how I could improve my script to speed up the search and copy?
Here are some things to try:
First, use Measure-Command {} to get the actual performance:
Measure-Command { Get-ChildItem $fromDirectory -Include $fileName -Recurse | Where {$_.LastWriteTime -gt (Get-Date).Date} | Copy-Item -Destination $toDirectory }
Then, consider removing the '-Recurse' flag, because this actually descends into every directory and every child of every child. If your target log files are really that scattered, then...
Try using robocopy to match a pattern on the filename and LastWriteTime, then use PowerShell to copy over. You could even use robocopy to do the copying.
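For example, a rough robocopy sketch (paths and mask taken from the question; /MAXAGE:1 approximates "written within the last day" rather than strictly "today", and adding /L would preview the matches without copying):

robocopy "c:\sour" "c:\test" test_file_*.txt /S /MAXAGE:1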
It's possible that you just have a huge, slow problem to solve, but try these to see if you can break it down.
This is a well-known limitation of NTFS. Microsoft's docs say performance starts to decrease at about 50,000 files in a directory.
If the file names are very similar, creation of 8dot3 legacy names will start to slow things down at about 300,000 files. Though you have "only" 120 k files, it's the same order of magnitude.
Some previous questions discuss this issue. Sadly, there is no single good solution other than a better directory hierarchy. The usual tricks are to disable 8dot3 name creation with fsutil and last-access-date updates via the registry, but those will only help so much.
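For reference, those tweaks usually take the following form (shown as a sketch; both require administrator rights, and you should verify them against your own environment before applying):

# disable 8dot3 short-name creation on the C: volume
fsutil 8dot3name set C: 1

# disable last-access-time updates via the registry (1 = disabled)
Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem' -Name 'NtfsDisableLastAccessUpdate' -Value 1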
Can you redesign the directory structure? Moving old files into, say, year-quarter subdirectories might keep the main directory clean enough. To find out a file's year-quarter, a quick way is like so:
gci | % {
    $("{2} => {1}\Q{0:00}" -f [Math]::Ceiling( ($_.LastAccessTime.ToString('MM')) / 3 ),
        $_.LastAccessTime.ToString('yyyy'),
        $_.Name
    )
}
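Building on that, a sketch of actually moving files into year\quarter subfolders (an extension of the snippet above, not part of the original answer):

Get-ChildItem -File | ForEach-Object {
    $quarter = [Math]::Ceiling($_.LastAccessTime.Month / 3)
    $subdir  = Join-Path $_.DirectoryName ("{0}\Q{1:00}" -f $_.LastAccessTime.Year, $quarter)
    # create the target folder if needed, then move the file into it
    New-Item -Path $subdir -ItemType Directory -Force | Out-Null
    Move-Item -Path $_.FullName -Destination $subdir
}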
I would try putting the Get-ChildItem $fromDirectory -Include $fileName -Recurse | Where {$_.LastWriteTime -gt (Get-Date).Date} in an Array and then copying results from the Array.
$GrabFiles = @()
$GrabFiles = @( Get-ChildItem $fromDirectory -Include $fileName -Recurse | Where-Object { $_.LastWriteTime -gt (Get-Date).Date } )
Copy-Item -Path $GrabFiles.FullName -Destination $toDirectory

Backup QVD files every day & save three versions before removing the first generation

I have a few QlikView servers with a lot of QVD files I need to back up.
The idea is to back up three generations, so let's say the app is named tesla.qvd.
It would be backed up with a name like tesla.qvd.2019-06-05 if the file was modified today.
Then it would back up a new one the next time it's modified/written to.
In total I would like to save two generations before the first one is removed.
This is for a Windows 2012 server, using PowerShell 4.0.
#$RemotePath = "C:\qlikview Storage\privatedata\backup\"
$LocalPath = "C:\qlikview Storage\privatedata"
$nomatch = "*\backup\*"
$predetermined = [System.DateTime](Get-Date)
$date = ($predetermined).AddDays(-1).ToString("MM/dd/yyyy:")

foreach ($file in (Get-ChildItem -File $LocalPath -Recurse | Where-Object { $_.FullName -notlike $nomatch } -Verbose)) {
    Copy-Item -Path $file.FullName -Destination "C:\qlikview Storage\privatedata\backup\$file.$(Get-Date -Format yyyy-MM-dd)"
}
The code above would back the files up with the dates as described in the text before the code.
It's proceeding from here that's my problem.
I tried Google and searched the forum.
I'm not asking for someone to solve the whole issue I have.
But if you can help me out with which functions / what I should look at to get my end result, it would help a lot so I can proceed.
In the picture you can see an example of how the library looks after a backup has been done. The last write time on the files would be the same as the date, though; this example was created fictionally for this question.
You can use the BaseName property of the files in the backup folder, since you add a new extension to the files. It would look something like this:
# Group by BaseName and find groups with more than 2 backups
$Groups = Get-ChildItem -Path "C:\qlikview Storage\privatedata\backup" |
    Group-Object -Property BaseName |
    Where-Object { $_.Count -gt 2 }
foreach ($g in $Groups) {
    $g.Group | Sort-Object LastWriteTime -Descending | Select-Object -Skip 2 |
        ForEach-Object { Remove-Item -Path $_.FullName -Force }
}
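The snippet above handles pruning old generations. For the other part of the question, copying a file only when it has changed since its most recent backup, a sketch might look like this (an assumption on my part, reusing $LocalPath and $nomatch from the question's code):

$backupDir = "C:\qlikview Storage\privatedata\backup"
foreach ($file in (Get-ChildItem -File $LocalPath -Recurse | Where-Object { $_.FullName -notlike $nomatch })) {
    # newest existing backup of this file, if any (backups are named <name>.<yyyy-MM-dd>)
    $latest = Get-ChildItem -Path $backupDir -Filter "$($file.Name).*" -ErrorAction SilentlyContinue |
        Sort-Object LastWriteTime -Descending | Select-Object -First 1
    # Copy-Item preserves the source's LastWriteTime, so this comparison detects changes
    if (-not $latest -or $file.LastWriteTime -gt $latest.LastWriteTime) {
        Copy-Item -Path $file.FullName -Destination (Join-Path $backupDir "$($file.Name).$(Get-Date -Format yyyy-MM-dd)")
    }
}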

Powershell: Recursively search a drive or directory for a file type in a specific time frame of creation

I am trying to incorporate Powershell into my everyday workflow so I can move up from a Desktop Support guy to a Systems Admin. One question that I encountered when helping a coworker was how to search for a lost or forgotten file saved in an unknown directory. The pipeline I came up with was:
dir C:\ -Recurse -Filter *.pdf -ErrorAction SilentlyContinue -Force | Out-File pdfs.txt
This code performed exactly how I wanted, but now I want to extend the command and make it more efficient, especially since my company has clients with very messy file management.
What I want to do with this pipeline:
1. Recursively search for a specific file type that was created in a specified time frame. Let's say the oldest file allowed in this search is a file from two days ago.
2. Save the results to a text file with columns for the Name and FullName (path), sorted by creation time in descending order.
What I have so far:
dir C:\ -Recurse -Filter *.pdf -ErrorAction SilentlyContinue -Force | Select-Object Name, FullName | Out-File pdfs.txt
I really need help with how to create a filter for the time the file was created. I think I need to use the Where-Object cmdlet right after the dir pipe and before the Select-Object pipe, but I don't know how to set that up. This is what I wrote: Where-Object {$_.CreationTime <
You're on the right track. To get files from a specific creation-date range, you can pipe the dir command results to:
Where-Object {$_.CreationTime -ge "06/20/2017" -and $_.CreationTime -le "06/22/2017"}
If you want something more repeatable where you don't have to hard-code the dates every time, and you just want to search for files from up to 2 days ago, you can set variables:
$today = (Get-Date)
$daysago = (Get-Date).AddDays(-2)
then plugin the variables:
Where-Object {$_.CreationTime -ge $daysago -and $_.CreationTime -le $today}
I'm not near my Windows PC to test this but I think it should work!
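Putting the pieces together with the output format the question asks for (Name, FullName, newest first) gives something like the sketch below; the column and sort choices follow the question's wording:

dir C:\ -Recurse -Filter *.pdf -ErrorAction SilentlyContinue -Force |
    Where-Object { $_.CreationTime -ge $daysago -and $_.CreationTime -le $today } |
    Sort-Object CreationTime -Descending |
    Select-Object Name, FullName, CreationTime |
    Out-File pdfs.txt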
See if this helps
dir c:\ -Recurse -Filter *.ps1 -ErrorAction SilentlyContinue -Force |
    Select-Object LastWriteTime, Name |
    Where-Object { $_.LastWriteTime -ge [DateTime]::Now.AddDays(-2) } |
    Out-File Temp.txt

Is my PowerShell script inefficient?

I wrote a script to delete temporary files/folders older than 90 days on remote servers. The server.txt file is loaded with Get-Content, and I use 'net use' to map to the IPC$ share. I'm worried that I'm not using Best Practices to delete the old temp files. Here is the meat of my script:
net use \\$server\IPC$ /user:$Authname $pw /persistent:yes
Get-ChildItem -Path "\\$($server)\C$\Temp" -Recurse |
    Where-Object { !$_.PSIsContainer -and $_.LastAccessTime -lt $cutoffdate } |
    Remove-Item -Recurse
(Get-ChildItem -Path "\\$($server)\C$\Temp" -Recurse | Where-Object { $_.PSIsContainer -eq $True }) |
    Where-Object { $_.GetFiles().Count -eq 0 } |
    Remove-Item -Recurse
net use \\$Server\IPC$ /delete
The first gci deletes old files, the second deletes empty folders.
The reason I'm concerned is that in my initial tests, it's taking about half an hour to delete approximately 4 GB from one server. And I work in a big shop; my script needs to be run against about 10,000 servers. At that rate my script won't finish for more than six months, and I was hoping to run it on a quarterly basis.
Am I doing something the hard way?
Get a list of your servers.
Cycle through the list and use Invoke-Command -ComputerName.
Your command will be executed on the remote server rather than pulling all the data across the network, which is very slow.
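A minimal sketch of that approach (the server-list path, credential handling, and throttle value are assumptions, not part of the answer):

$servers    = Get-Content -Path .\servers.txt
$cred       = Get-Credential
$cutoffdate = (Get-Date).AddDays(-90)

Invoke-Command -ComputerName $servers -Credential $cred -ThrottleLimit 32 -ArgumentList $cutoffdate -ScriptBlock {
    param($cutoff)
    # delete old files locally on each server, then remove now-empty folders
    Get-ChildItem -Path 'C:\Temp' -Recurse |
        Where-Object { !$_.PSIsContainer -and $_.LastAccessTime -lt $cutoff } |
        Remove-Item -Force
    Get-ChildItem -Path 'C:\Temp' -Recurse |
        Where-Object { $_.PSIsContainer -and $_.GetFiles().Count -eq 0 } |
        Remove-Item -Recurse -Force
}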