How to speed up Powershell Get-Childitem over UNC

DIR or GCI is slow in Powershell, but fast in CMD. Is there any way to speed this up?
In CMD.exe, after a sub-second delay, this responds as fast as the CMD window can keep up
dir \\remote-server.domain.com\share\folder\file*.*
In Powershell (v2), after a 40+ second delay, this responds with noticeable slowness (maybe 3-4 lines per second)
gci \\remote-server.domain.com\share\folder\file*.*
I'm trying to scan logs on a remote server, so maybe there's a faster approach.
get-childitem \\$s\logs -include $filemask -recurse | select-string -pattern $regex

Okay, this is how I'm doing it, and it seems to work.
$files = cmd /c "$GETFILESBAT \\$server\logs\$filemask"
foreach( $f in $files ) {
if( $f.length -gt 0 ) {
select-string -Path $f -pattern $regex | foreach-object { $_ }
}
}
Then $GETFILESBAT points to this:
@dir /a-d /b /s %1
@exit
I'm writing and deleting this BAT file from the PowerShell script, so I guess it's a PowerShell-only solution, but it doesn't use only PowerShell.
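For illustration, a minimal sketch of that write-run-delete wrapper (the BAT file location is arbitrary; $server and $filemask come from the snippet above):
$GETFILESBAT = Join-Path $env:TEMP 'getfiles.bat'
"@dir /a-d /b /s %1`r`n@exit" | Set-Content -Path $GETFILESBAT -Encoding Ascii
$files = cmd /c "$GETFILESBAT \\$server\logs\$filemask"
Remove-Item $GETFILESBAT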
My preliminary performance metrics show this to be eleventy-thousand times faster.
I tested gci vs. cmd dir vs. FileIO.FileSystem.GetFiles from @Shawn Melton's referenced link.
The bottom line is that, for daily use on local drives, GetFiles is the fastest. By far. CMD DIR is respectable. Once you introduce a slower network connection with many files, CMD DIR is slightly faster than GetFiles. Then Get-ChildItem... wow, this ranges from not too bad to horrible, depending on the number of files involved and the speed of the connection.
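For context, GetFiles here means the Microsoft.VisualBasic FileIO API called directly from PowerShell, exactly as in the test script further down; a minimal call looks like this (the UNC path is just the example from the question):
[Reflection.Assembly]::LoadWithPartialName("Microsoft.VisualBasic") | Out-Null
[Microsoft.VisualBasic.FileIO.FileSystem]::GetFiles(
    "\\remote-server.domain.com\share\folder",
    [Microsoft.VisualBasic.FileIO.SearchOption]::SearchTopLevelOnly,   # or SearchAllSubDirectories
    "file*.*")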
Some test runs. I've moved GCI around in the tests to make sure the results were consistent.
10 iterations of scanning c:\windows\temp for *.tmp files
.\test.ps1 "c:\windows\temp" "*.tmp" 10
GetFiles ... 00:00:00.0570057
CMD dir ... 00:00:00.5360536
GCI ... 00:00:01.1391139
GetFiles is 10x faster than CMD dir, which itself is more than 2x faster than GCI.
10 iterations of scanning c:\windows\temp for *.tmp files with recursion
.\test.ps1 "c:\windows\temp" "*.tmp" 10 -recurse
GetFiles ... 00:00:00.7020180
CMD dir ... 00:00:00.7644196
GCI ... 00:00:04.7737224
GetFiles is a little faster than CMD dir, and both are almost 7x faster than GCI.
10 iterations of scanning an on-site server on another domain for application log files
.\test.ps1 "\\closeserver\logs\subdir" "appname*.*" 10
GetFiles ... 00:00:00.3590359
CMD dir ... 00:00:00.6270627
GCI ... 00:00:06.0796079
GetFiles is about 2x faster than CMD dir, itself 10x faster than GCI.
One iteration of scanning a distant server on another domain for application log files, with many files involved
.\test.ps1 "\\distantserver.company.com\logs\subdir" "appname.2011082*.*"
CMD dir ... 00:00:00.3340334
GetFiles ... 00:00:00.4360436
GCI ... 00:11:09.5525579
CMD dir is fastest going to the distant server with many files, but GetFiles is respectably close. GCI on the other hand is a couple of thousand times slower.
Two iterations of scanning a distant server on another domain for application log files, with many files
.\test.ps1 "\\distantserver.company.com\logs\subdir" "appname.20110822*.*" 2
CMD dir ... 00:00:00.9360240
GetFiles ... 00:00:01.4976384
GCI ... 00:22:17.3068616
More or less linear increase as test iterations increase.
One iteration of scanning a distant server on another domain for application log files, with fewer files
.\test.ps1 "\\distantserver.company.com\logs\othersubdir" "appname.2011082*.*" 10
GetFiles ... 00:00:00.5304170
CMD dir ... 00:00:00.6240200
GCI ... 00:00:01.9656630
Here GCI is not too bad, GetFiles is 3x faster, and CMD dir is close behind.
Conclusion
GCI needs a -raw or -fast option that does not try to do so much. In the meantime, GetFiles is a healthy alternative that is only occasionally a little slower than CMD dir, and usually faster (perhaps because CMD dir has the overhead of spawning CMD.exe).
For reference, here's the test.ps1 code.
param ( [string]$path, [string]$filemask, [switch]$recurse = $false, [int]$n = 1 )

[reflection.assembly]::loadwithpartialname("Microsoft.VisualBasic") | Out-Null

write-host "GetFiles... " -nonewline
$dt = get-date
for ($i = 0; $i -lt $n; $i++) {
    if ($recurse) {
        [Microsoft.VisualBasic.FileIO.FileSystem]::GetFiles($path,
            [Microsoft.VisualBasic.FileIO.SearchOption]::SearchAllSubDirectories, $filemask
        ) | out-file ".\testfiles1.txt"
    }
    else {
        [Microsoft.VisualBasic.FileIO.FileSystem]::GetFiles($path,
            [Microsoft.VisualBasic.FileIO.SearchOption]::SearchTopLevelOnly, $filemask
        ) | out-file ".\testfiles1.txt"
    }
}
$dt2 = get-date
write-host $dt2.subtract($dt)

write-host "CMD dir... " -nonewline
$dt = get-date
for ($i = 0; $i -lt $n; $i++) {
    if ($recurse) {
        cmd /c "dir /a-d /b /s $path\$filemask" | out-file ".\testfiles2.txt"
    }
    else {
        cmd /c "dir /a-d /b $path\$filemask" | out-file ".\testfiles2.txt"
    }
}
$dt2 = get-date
write-host $dt2.subtract($dt)

write-host "GCI... " -nonewline
$dt = get-date
for ($i = 0; $i -lt $n; $i++) {
    if ($recurse) {
        get-childitem "$path\*" -include $filemask -recurse | out-file ".\testfiles0.txt"
    }
    else {
        get-childitem "$path\*" -include $filemask | out-file ".\testfiles0.txt"
    }
}
$dt2 = get-date
write-host $dt2.subtract($dt)

Here is a good explanation by Lee Holmes on why Get-ChildItem is slow. If you take note of the comment from "Anon 11 Mar 2010 11:11 AM" at the bottom of the page, his solution might work for you.
Anon's Code:
# SCOPE: SEARCH A DIRECTORY FOR FILES (W/WILDCARDS IF NECESSARY)
# Usage:
# $directory = "\\SERVER\SHARE"
# $searchterms = "filename[*].ext"
# PS> $Results = Search $directory $searchterms
[reflection.assembly]::loadwithpartialname("Microsoft.VisualBasic") | Out-Null
Function Search {
    # Parameters $Path and $SearchString
    param (
        [Parameter(Mandatory=$true, ValueFromPipeline = $true)][string]$Path,
        [Parameter(Mandatory=$true)][string]$SearchString
    )
    try {
        # .NET FindInFiles method to look for the file
        # BENEFITS: possibly running as a background job (haven't looked into it yet)
        [Microsoft.VisualBasic.FileIO.FileSystem]::GetFiles(
            $Path,
            [Microsoft.VisualBasic.FileIO.SearchOption]::SearchAllSubDirectories,
            $SearchString
        )
    } catch { $_ }
}

I tried some of the suggested methods with a large number of files (~190,000). As mentioned in Kyle's comment, GetFiles isn't very useful here, because it takes nearly forever.
cmd dir was better than Get-ChildItem in my first tests, but it seems GCI speeds up a lot if you use the -Force parameter. With it, the time needed was about the same as for cmd dir.
P.S.: In my case I had to exclude most of the files because of their extension. This was done with -Exclude in gci and with a | where in the other commands, so the results for just searching files might differ slightly.
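Roughly, the two variants being compared look like this (path, mask and excluded extensions are placeholders, not the original test setup):
Get-ChildItem \\server\share\logs -Recurse -Force -Exclude *.tmp, *.bak
cmd /c "dir /a-d /b /s \\server\share\logs" | Where-Object { $_ -notmatch '\.(tmp|bak)$' }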

Here's an interactive reader that parses cmd /c dir (which can handle UNC paths) and collects the three properties most people care about: full path, size, and timestamp.
Usage would be something like $files_with_details = $faster_get_files.GetFileList($unc_compatible_folder),
and there's a helper function to check the combined size: $faster_get_files.GetSize($files_with_details). A short usage example follows the code below.
$faster_get_files = New-Module -AsCustomObject -ScriptBlock {
    #$DebugPreference = 'Continue'        # verbose, this will take figuratively forever
    #$DebugPreference = 'SilentlyContinue'
    $directory_filter = "Directory of (.+)"
    $file_filter = "(\d+/\d+/\d+)\s+(\d+:\d+ \w{2})\s+([\d,]+)\s+(.+)" # [1] is date, [2] is time (AM/PM), [3] is size, [4] is filename
    $extension_filter = "(.+)[\.](\w{3,4})" # [1] is leaf, [2] is extension
    $directory = ""
    function GetFileList ($directory = $this.directory) {
        if ([System.IO.Directory]::Exists($directory)) {
            # Gather raw file list
            Write-Information "Gathering files..."
            $files_raw = cmd /c dir "$directory\*.*" /s /a-d
            # Parse file list
            Write-Information "Parsing file list..."
            $files_with_details = foreach ($line in $files_raw) {
                Write-Debug "starting line {$($line)}"
                Switch -regex ($line) {
                    $this.directory_filter {
                        $directory = $matches[1]
                        break
                    }
                    $this.file_filter {
                        Write-Debug "parsing matches {$($matches.value -join ";")}"
                        $date = $matches[1]
                        $time = $matches[2] # am/pm style
                        $size = $matches[3]
                        $filename = $matches[4]
                        # we do a second match here so as to not append a fake period to files without an extension,
                        # otherwise we could do a single match up above
                        Write-Debug "parsing extension from {$($filename)}"
                        if ($filename -match $this.extension_filter) {
                            $file_leaf = $matches[1]
                            $file_extension = $matches[2]
                        } else {
                            $file_leaf = $filename
                            $file_extension = ""
                        }
                        [pscustomobject][ordered]@{
                            "fullname"  = [string]"$($directory)\$($filename)"
                            "filename"  = [string]$filename
                            "folder"    = [string]$directory
                            "file_leaf" = [string]$file_leaf
                            "extension" = [string]$file_extension
                            "date"      = get-date "$($date) $($time)"
                            "size"      = [int64]($size -replace ',') # strip dir's thousands separators
                        }
                        break
                    }
                } # finish directory/file test
            } # finish all files
            return $files_with_details
        } # finish directory exists test
        else { throw ("Directory not found") } # directory doesn't exist
    }
    function GetSize ($files_with_details) {
        $combined_size = ($files_with_details | measure -Property size -Sum).Sum
        $pretty_size_gb = "$([math]::Round($combined_size / 1GB, 4)) GB"
        return $pretty_size_gb
    }
    Export-ModuleMember -Function * -Variable *
}
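For example, usage against a UNC folder might look like this (the path and the returned size are illustrative only):
$files_with_details = $faster_get_files.GetFileList('\\remote-server\share\logs')
$faster_get_files.GetSize($files_with_details)   # returns a string such as "12.3456 GB"
$files_with_details | Sort-Object size -Descending | Select-Object -First 5 fullname, size, date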

Related

Efficiently loop over array, appending result to string

I'm very much a Powershell beginner (if that). I've been stringing together code from everywhere to make something work, but now I want to make it a bit more manageable to edit.
I previously had the whole script written out, but it is very repetitive. It copies from source to destination, and was easy to manage with 1-5 share pairs; I'm now up to 14, and it's a lot of repetitive code.
I'm trying to consolidate it into an easier to maintain script.
Previously, I was copying the $copy and $result lines multiple times for each share couple.
I've now moved that into an array which I want to loop over:
# shares to copy source and destination
# blank 1st entry for easier numbering
$shares = @(
@(),
@("\\nas01\share01\", "\\nas02\share01\"),
...
@("\\nas01\share99\", "\\nas02\share99\"),
)
# loop
for($parentLoop=1; $parentLoop -lt 99; $parentLoop++) {
for($childLoop=0; $childLoop -lt 2 ; $childLoop++) {
$args = '/cmd=sync /open_window /force_close "'+$shares[$parentLoop][0]+'" /to="'+$shares[$parentLoop][1]+'"'
$copy = Start-Process -FilePath "C:\Program Files\FastCopy\FastCopy.exe" -ArgumentList $args -Wait -NoNewWindow -PassThru
$result = if( $copy.ExitCode -eq -1 ) { "FAILED" } else { "SUCCESS" }
$source = [Math]::Round(("{0:N2}" -f ((Get-ChildItem $shares[$parentLoop][0] -recurse | Measure-Object -property length -sum).Sum / 1GB) / 1024 ), 2)
$destination = [Math]::Round(("{0:N2}" -f ((Get-ChildItem $shares[$parentLoop][1] -recurse | Measure-Object -property length -sum).Sum / 1GB) / 1024 ), 2)
}
$response = "$result ( Source: $source / Destination: $destination )"
}
# get the date time
$dateTime = Get-Date -Format "dd-MM-yyyy HH:mm:ss"
# compile email
$emailBody = "Completed copy\n---\n\nShare 1: $response"
$emailArgs = "-host:192.168.0.1:25 -from:email@domain.ltd `"-to:me@domain.ltd`" `"-subject:Copy report - $dateTime`" `"-body:$emailBody`""
$email = Start-Process -FilePath "C:\Program Files\cMail\CMail.exe" -ArgumentList $emailArgs -Wait -NoNewWindow -PassThru
What I don't know how to do is to add in the output of the $response for each loop into the $emailBody.
I also don't know if there is an easier way to get the share size. I thought about mounting the share (source and destination) then capturing it locally (Get-WmiObject Win32_LogicalDisk -Filter "DeviceID='A:'" | Select-Object Size) and then unmounting it for the next loop, but wasn't sure it was efficient or if I'd run into issues of "failure to dismount".
Ideally, I'd like to be able to scale the $shares array up or down, and have the email output scale too. This way I can have a plaintext email (quicker to read than attachments) that says:
Completed copy
---
Share 1: SUCCESS (Source: 256 / Destination: 256)
...
Share 99: FAILURE (Source: 25 / Destination: 985)
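One way to build $emailBody from a per-share $response (a rough sketch only, reusing the question's variables and omitting the size calculation; not from a posted answer):
$responses = for ($i = 1; $i -lt $shares.Count; $i++) {
    $fcArgs = '/cmd=sync /open_window /force_close "' + $shares[$i][0] + '" /to="' + $shares[$i][1] + '"'
    $copy = Start-Process -FilePath "C:\Program Files\FastCopy\FastCopy.exe" -ArgumentList $fcArgs -Wait -NoNewWindow -PassThru
    $result = if ($copy.ExitCode -eq -1) { "FAILED" } else { "SUCCESS" }
    "Share ${i}: $result"   # one line per share, collected into $responses
}
$emailBody = "Completed copy`n---`n`n" + ($responses -join "`n")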

Converting a line of cmd to powershell

EDIT 2: Final code below.
I need help converting some code, as I am very new to mkvmerge, PowerShell and the command prompt.
The CMD code is from https://github.com/Serede/mkvtoolnix-batch/blob/master/mkvtoolnix-batch.bat
for %%f in (*.mkv) do %mkvmerge% @options.json -o "mkvmerge_out/%%f" "%%f"
What I've managed so far
$SourceFolder = "C:\tmp" #In my actual code, this is done using folder browser
$SourceFiles = Get-ChildItem -LiteralPath $SourceFolder -File -Include *.mkv
$SourceFiles | foreach
{
start-process "F:\Desktop\#progs\mkvtoolnix\mkvmerge.exe"
}
I'd be grateful for any help as I'm having trouble understanding and converting while learning both sides. Thank you very much.
EDIT 2: Here's my final working code.
Function Get-Folder($initialDirectory) {
#Prompt to choose source folder
[void] [System.Reflection.Assembly]::LoadWithPartialName('System.Windows.Forms')
$FolderBrowserDialog = New-Object System.Windows.Forms.FolderBrowserDialog
$FolderBrowserDialog.Description = 'Choose the video folder'
$FolderBrowserDialog.RootFolder = 'MyComputer'
if ($initialDirectory) { $FolderBrowserDialog.SelectedPath = $initialDirectory }
[void] $FolderBrowserDialog.ShowDialog()
return $FolderBrowserDialog.SelectedPath
}
Function ExitMessage
{
#endregion Function output
Write-Host "`nOperation complete";
Write-Host -NoNewLine 'Press any key to continue...';
$null = $Host.UI.RawUI.ReadKey('NoEcho,IncludeKeyDown');
Exit;
}
($SourceFolder = Get-Folder | select )
#Check for output folder and create if unavailable
$TestFile = "$SourceFolder" + "\mkvmerge_out"
if ((Test-Path -LiteralPath $TestFile) -like "False")
{
new-item -Path $SourceFolder -name "mkvmerge_out" -type directory
Write-Host 'Folder created';
}
#Checking for the presence of a Json file
$TestFile = (Get-ChildItem -LiteralPath $SourceFolder -File -Filter *.json)
if ($TestFile.count -eq 0)
{
Write-Host 'json file not found';
ExitMessage;
}
$TestFile = "$SourceFolder" + "\$TestFile"
#Getting the total number of files and start timer.
[Int] $TotalFiles = 0;
[Int] $FilesDone = 0;
$TotalFiles = (Get-ChildItem -LiteralPath $SourceFolder -File -Filter *.mkv).count
$PercentFiles = 0;
$Time = [System.Diagnostics.Stopwatch]::StartNew()
#Start mkvmerge process with progress bar
$mkvmergeExe = 'F:\Desktop\#progs\mkvtoolnix\mkvmerge.exe'
$JsonFile = "$TestFile" # alternatively, use Join-Path
Get-ChildItem -LiteralPath $SourceFolder -File -Filter *.mkv | ForEach-Object {
$PercentFiles = [math]::truncate(($FilesDone/$TotalFiles)*100)
Write-Progress -Activity mkvmerge -Status ("{0}% Completed; {1}/{2} done; Time Elapsed: {3:d2}:{4:d2}:{5:d2}" -f $PercentFiles, $FilesDone, $TotalFiles, $Time.Elapsed.Hours, $Time.Elapsed.minutes, $Time.Elapsed.seconds) -PercentComplete $PercentFiles;
Write-Host "Processing $_"
$f = $_.FullName
$of = "$SourceFolder\mkvmerge_out\$($_.Name)"
& $mkvmergeExe -q `@$JsonFile -o $of $f
$FilesDone++
}
Remove-Item -LiteralPath $JsonFile #Remove this line if you want to keep the Json file
$PercentFiles = [math]::truncate(($FilesDone/$TotalFiles)*100)
Write-Progress -Activity mkvmerge -Status ("{0}% Completed; {1}/{2} done; Time Elapsed: {3:d2}:{4:d2}:{5:d2}" -f $PercentFiles, $FilesDone, $TotalFiles, $Time.Elapsed.Hours, $Time.Elapsed.minutes, $Time.Elapsed.seconds) -PercentComplete $PercentFiles;
ExitMessage;
$mkvmergeExe = 'F:\Desktop\#progs\mkvtoolnix\mkvmerge.exe'
$optionsFile = "$SourceFolder\options.json" # alternatively, use Join-Path
Get-ChildItem -LiteralPath $SourceFolder -File -Filter *.mkv | ForEach-Object {
$f = $_.FullName
$of = "$SourceFolder\mkvmerge_out\$($_.Name)"
& $mkvmergeExe `@$optionsFile -o $of $f
}
Note that your cmd code assumes that it's operating in the current directory, while your PowerShell code passes a directory explicitly via $SourceFolder; therefore, the options.json file must be looked for in $SourceFolder too, and the output file path passed to -o must be prefixed with $SourceFolder as well, which is achieved via expandable strings ("...").
The main points to consider:
for %%f in (*.mkv) has no direct counterpart in PowerShell; you correctly used Get-ChildItem instead, to get a list of matching files, which are returned as System.IO.FileInfo instances.
However, -Include won't work as intended in the absence of -Recurse (unless you append \* - see this GitHub issue); -Filter does, and is also the faster method, but it has its limitations and legacy quirks (see this answer).
While PowerShell too allows you to execute commands whose names or paths are stored in a variable (or specified as a quoted string literal), you then need &, the call operator, to invoke it, for syntactic reasons.
Inside a script block ({ ... }) passed to the ForEach-Object cmdlet, automatic variable $_ represents the pipeline input object at hand.
$_.FullName ensures that the System.IO.FileInfo input instances are represented by their full path when used in a string context.
This extra step is no longer necessary in PowerShell [Core] 6+, where System.IO.FileInfo instances thankfully always stringify as their full paths.
The @ character is preceded by ` (backtick), PowerShell's escape character, because @ - unlike in cmd - is a metacharacter, i.e. a character with special syntactic meaning. `@ ensures that the @ is treated verbatim, and therefore passed through to mkvmerge.
Alternatively, you could have quoted the argument instead of escaping just the @: "@$optionsFile" (see the sketch after this list).
See this answer for background information.
You generally do not need to enclose variable-based arguments in "..." in PowerShell, even if their values contain spaces or other metacharacters.
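Putting the points above together, a self-contained sketch (the tool path, folder and option-file name are placeholders, not taken from the answer):
$mkvmergeExe = 'C:\Tools\mkvtoolnix\mkvmerge.exe'
$optionsFile = 'C:\Videos\options.json'
Get-ChildItem -LiteralPath 'C:\Videos' -File -Filter *.mkv | ForEach-Object {   # -Filter works without -Recurse; -Include would not
    & $mkvmergeExe "@$optionsFile" -o "C:\Videos\mkvmerge_out\$($_.Name)" $_.FullName   # quoting the @ argument
    # equivalently: & $mkvmergeExe `@$optionsFile -o ... (escaping just the @)
}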

How to use Powershell Pipeline to Avoid Large Objects?

I'm using a custom function to essentially do a DIR command (recursive file listing) on an 8TB drive (thousands of files).
My first iteration was:
$results = $PATHS | % {Get-FolderItem -Path "$($_)" } | Select Name,DirectoryName,Length,LastWriteTime
$results | Export-CSV -Path $csvfile -Force -Encoding UTF8 -NoTypeInformation -Delimiter "|"
This resulted in a HUGE $results variable and slowed the system down to a crawl by spiking the powershell process to use 99%-100% of the CPU as the processing went on.
I decided to use the power of the pipeline to WRITE to the CSV file directly (presumably freeing up the memory) instead of saving to an intermediate variable, and came up with this:
$PATHS | % {Get-FolderItem -Path "$($_)" } | Select Name,DirectoryName,Length,LastWriteTime | ConvertTo-CSV -NoTypeInformation -Delimiter "|" | Out-File -FilePath $csvfile -Force -Encoding UTF8
This seemed to be working fine (the CSV file was growing and CPU seemed to be stable), but then it abruptly stopped when the CSV file size hit ~200MB, and the error to the console was "The pipeline has been stopped".
I'm not sure whether the CSV file size had anything to do with the error message, but I'm unable to process this large directory with either method! Any suggestions on how to allow this process to complete successfully?
Get-FolderItem runs robocopy to list the files and converts its output into a PSObject array. This is a slow operation which, strictly speaking, isn't required for the actual task. Pipelining also adds big overhead compared to the foreach statement. In the case of thousands or hundreds of thousands of repetitions that becomes noticeable.
We can speed up the process beyond anything pipelining and standard PowerShell cmdlets can offer, writing the info for 400,000 files on an SSD drive in 10 seconds. The solution below uses:
.NET Framework 4 or newer (included since Win8, installable on Win7/XP) and IO.DirectoryInfo's EnumerateFileSystemInfos to enumerate the files in a non-blocking, pipeline-like fashion;
PowerShell 3 or newer, as it's faster than PS2 overall;
the foreach statement, which doesn't need to create a ScriptBlock context for each item and is thus much faster than the ForEach-Object cmdlet;
IO.StreamWriter to write each file's info immediately, in a non-blocking, pipeline-like fashion;
the \\?\ prefix trick to lift the 260-character path length restriction;
manual queuing of the directories to process, to get past "access denied" errors that would otherwise stop naive IO.DirectoryInfo enumeration;
progress reporting.
function List-PathsInCsv([string[]]$PATHS, [string]$destination) {
    $prefix = '\\?\' # \\?\ prefix lifts the 260-character path length restriction
    $writer = [IO.StreamWriter]::new($destination, $false, [Text.Encoding]::UTF8, 1MB)
    $writer.WriteLine('Name|Directory|Length|LastWriteTime')
    $queue = [Collections.Generic.Queue[string]]($PATHS -replace '^', $prefix)
    $numFiles = 0
    while ($queue.Count) {
        $dirInfo = [IO.DirectoryInfo]$queue.Dequeue()
        try {
            $dirEnumerator = $dirInfo.EnumerateFileSystemInfos()
        } catch {
            Write-Warning ("$_".replace($prefix, '') -replace '^.+?: "(.+?)"$', '$1')
            continue
        }
        $dirName = $dirInfo.FullName.replace($prefix, '')
        foreach ($entry in $dirEnumerator) {
            if ($entry -is [IO.FileInfo]) {
                $writer.WriteLine([string]::Join('|', @(
                    $entry.Name
                    $dirName
                    $entry.Length
                    $entry.LastWriteTime
                )))
            } else {
                $queue.Enqueue($entry.FullName)
            }
            if (++$numFiles % 1000 -eq 0) {
                Write-Progress -activity Digging -status "$numFiles files, $dirName"
            }
        }
    }
    $writer.Close()
    Write-Progress -activity Digging -Completed
}
Usage:
List-PathsInCsv 'c:\windows', 'd:\foo\bar' 'r:\output.csv'
Don't use robocopy; use a native PowerShell command, like this:
$PATHS = 'c:\temp', 'c:\temp2'
$csvfile='c:\temp\listresult.csv'
$PATHS | % {Get-ChildItem $_ -file -recurse } | Select Name,DirectoryName,Length,LastWriteTime | export-csv $csvfile -Delimiter '|' -Encoding UTF8 -NoType
Short version, for non-purists:
$PATHS | % {gci $_ -file -rec } | Select Name,DirectoryName,Length,LastWriteTime | epcsv $csvfile -D '|' -E UTF8 -NoT

'System.OutOfMemoryException' while looping through array in powershell

I was trying to write a function to look for pool tags in .sys files. I created an array of all the directories that had .sys files, then looped through them using the Sysinternals Strings utility.
This is the array:
$paths = Get-ChildItem \\$server\c$ *.sys -Recurse -ErrorAction SilentlyContinue |
Select-Object Directory -unique
This was my first attempt at a loop:
foreach ($path in $paths) {
#convert object IO fileobject to string and strip out extraneous characters
[string]$path1 = $path
$path2 = $path1.replace("@{Directory=","")
$path3 = $path2.replace("}","")
$path4 = "$path3\*.sys"
Invoke-Command -ScriptBlock {strings -s $path4 | findstr $string}
}
I found some references to the error indicating that in foreach loops, all of the information is stored in memory until it completes its processing.
So I tried this:
for ($i = 0; $i -lt $paths.count; $i++){
[string]$path1 = $paths[$i]
$path2 = $path1.replace("@{Directory=","")
$path3 = $path2.replace("}","")
$path4 = "$path3\*.sys"
Invoke-Command -ScriptBlock {strings -s $path4 | findstr $string}
}
But it had the same result. I've read that sending an item at a time across the pipeline will prevent this error/issue, but I'm at a loss on how to proceed. Any thoughts?
Yeah, it is usually better to approach this problem using streaming so you don't have to buffer up a bunch of objects e.g.:
Get-ChildItem \\server\c$ -r *.sys -ea 0 | Foreach {
"Processing $_"; strings $_.Fullname | findstr $string}
Also, I'm not sure why you're using Invoke-Command when you can invoke strings and findstr directly. You typically use Invoke-Command to run a command on a remote computer.
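To illustrate that point (a sketch only; it assumes Sysinternals strings.exe is available on whichever machine runs it, and reuses the question's $path4 and $string variables):
# Local: call the external tools directly - no Invoke-Command needed.
strings -s $path4 | findstr $string
# Remote: Invoke-Command runs the script block on another computer, so local variables
# must be passed in explicitly (e.g. via -ArgumentList or $using:).
Invoke-Command -ComputerName $server -ScriptBlock {
    param($mask, $pattern)
    strings -s $mask | findstr $pattern
} -ArgumentList $path4, $string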

How to retrieve a recursive directory and file list from PowerShell excluding some files and folders?

I want to write a PowerShell script that will recursively search a directory, but exclude specified files (for example, *.log, and myFile.txt), and also exclude specified directories, and their contents (for example, myDir and all files and folders below myDir).
I have been working with the Get-ChildItem CmdLet, and the Where-Object CmdLet, but I cannot seem to get this exact behavior.
I like Keith Hill's answer except it has a bug that prevents it from recursing past two levels. These commands manifest the bug:
New-Item level1/level2/level3/level4/foobar.txt -Force -ItemType file
cd level1
GetFiles . xyz | % { $_.fullname }
With Hill's original code you get this:
...\level1\level2
...\level1\level2\level3
Here is a corrected, and slightly refactored, version:
function GetFiles($path = $pwd, [string[]]$exclude)
{
    foreach ($item in Get-ChildItem $path)
    {
        if ($exclude | Where {$item -like $_}) { continue }
        $item
        if (Test-Path $item.FullName -PathType Container)
        {
            GetFiles $item.FullName $exclude
        }
    }
}
With that bug fix in place you get this corrected output:
...\level1\level2
...\level1\level2\level3
...\level1\level2\level3\level4
...\level1\level2\level3\level4\foobar.txt
I also like ajk's answer for conciseness though, as he points out, it is less efficient. The reason it is less efficient, by the way, is because Hill's algorithm stops traversing a subtree when it finds a prune target while ajk's continues. But ajk's answer also suffers from a flaw, one I call the ancestor trap. Consider a path such as this that includes the same path component (i.e. subdir2) twice:
\usr\testdir\subdir2\child\grandchild\subdir2\doc
Set your location somewhere in between, e.g. cd \usr\testdir\subdir2\child, then run ajk's algorithm to filter out the lower subdir2 and you will get no output at all, i.e. it filters out everything because of the presence of subdir2 higher in the path. This is a corner case, though, and not likely to be hit often, so I would not rule out ajk's solution due to this one issue.
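A quick way to see the ancestor trap in action (illustrative, using the example path above):
cd \usr\testdir\subdir2\child
# Intending to prune only the lower subdir2...
Get-ChildItem -Recurse | Where-Object { $_.FullName -notmatch '\\subdir2($|\\)' }
# ...but this returns nothing: every FullName under this location already contains '\subdir2\'.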
Nonetheless, I offer here a third alternative, one that does not have either of the above two bugs. Here is the basic algorithm, complete with a convenience definition for the path or paths to prune--you need only modify $excludeList to your own set of targets to use it:
$excludeList = @("stuff","bin","obj*")
Get-ChildItem -Recurse | % {
$pathParts = $_.FullName.substring($pwd.path.Length + 1).split("\");
if ( ! ($excludeList | where { $pathParts -like $_ } ) ) { $_ }
}
My algorithm is reasonably concise but, like ajk's, it is less efficient than Hill's (for the same reason: it does not stop traversing subtrees at prune targets). However, my code has an important advantage over Hill's--it can pipeline! It is therefore amenable to fit into a filter chain to make a custom version of Get-ChildItem while Hill's recursive algorithm, through no fault of its own, cannot. ajk's algorithm can be adapted to pipeline use as well, but specifying the item or items to exclude is not as clean, being embedded in a regular expression rather than a simple list of items that I have used.
I have packaged my tree pruning code into an enhanced version of Get-ChildItem. Aside from my rather unimaginative name--Get-EnhancedChildItem--I am excited about it and have included it in my open source Powershell library. It includes several other new capabilities besides tree pruning. Furthermore, the code is designed to be extensible: if you want to add a new filtering capability, it is straightforward to do. Essentially, Get-ChildItem is called first, and pipelined into each successive filter that you activate via command parameters. Thus something like this...
Get-EnhancedChildItem -Recurse -Force -Svn
-Exclude *.txt -ExcludeTree doc*,man -FullName -Verbose
... is converted internally into this:
Get-ChildItem | FilterExcludeTree | FilterSvn | FilterFullName
Each filter must conform to certain rules: accepting FileInfo and DirectoryInfo objects as inputs, generating the same as outputs, and using stdin and stdout so it may be inserted in a pipeline. Here is the same code refactored to fit these rules:
filter FilterExcludeTree()
{
    $target = $_
    Coalesce-Args $Path "." | % {
        $canonicalPath = (Get-Item $_).FullName
        if ($target.FullName.StartsWith($canonicalPath)) {
            $pathParts = $target.FullName.substring($canonicalPath.Length + 1).split("\");
            if ( ! ($excludeList | where { $pathParts -like $_ } ) ) { $target }
        }
    }
}
The only additional piece here is the Coalesce-Args function (found in this post by Keith Dahlby), which merely sends the current directory down the pipe in the event that the invocation did not specify any paths.
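For reference, a minimal stand-in for Coalesce-Args (not the original implementation from Keith Dahlby's post, just a sketch of its behavior): it returns its first non-empty argument, so Coalesce-Args $Path "." yields $Path when it is set and "." otherwise.
function Coalesce-Args {
    foreach ($a in $args) {
        if ($a) { return $a }   # first non-empty argument wins
    }
}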
Because this answer is getting somewhat lengthy, rather than go into further detail about this filter, I refer the interested reader to my recently published article on Simple-Talk.com entitled Practical PowerShell: Pruning File Trees and Extending Cmdlets where I discuss Get-EnhancedChildItem at even greater length. One last thing I will mention, though, is another function in my open source library, New-FileTree, that lets you generate a dummy file tree for testing purposes so you can exercise any of the above algorithms. And when you are experimenting with any of these, I recommend piping to % { $_.fullname } as I did in the very first code fragment for more useful output to examine.
The Get-ChildItem cmdlet has an -Exclude parameter that is tempting to use but it doesn't work for filtering out entire directories from what I can tell. Try something like this:
function GetFiles($path = $pwd, [string[]]$exclude)
{
    foreach ($item in Get-ChildItem $path)
    {
        if ($exclude | Where {$item -like $_}) { continue }
        if (Test-Path $item.FullName -PathType Container)
        {
            $item
            GetFiles $item.FullName $exclude
        }
        else
        {
            $item
        }
    }
}
Here's another option, which is less efficient but more concise. It's how I generally handle this sort of problem:
Get-ChildItem -Recurse .\targetdir -Exclude *.log |
Where-Object { $_.FullName -notmatch '\\excludedir($|\\)' }
The '\\excludedir($|\\)' expression allows you to exclude the directory and its contents at the same time.
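To sanity-check the pattern against sample paths (illustrative only; True means the path would be filtered out by -notmatch):
'C:\targetdir\excludedir',
'C:\targetdir\excludedir\sub\file.log',
'C:\targetdir\excludedirs\file.txt' | ForEach-Object { "$_ : $($_ -match '\\excludedir($|\\)')" }
# True, True, False - the ($|\\) alternation keeps names that merely start with 'excludedir' from matching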
Update: Please check the excellent answer from msorens for an edge case flaw with this approach, and a much more fleshed out solution overall.
Recently, I explored ways to parameterize the folder to scan and the place where the result of the recursive scan will be stored. At the end, I also summarize the number of folders scanned and the number of files inside. Sharing it with the community in case it helps other developers.
##Script Starts
#read folder to scan and file location to be placed
$whichFolder = Read-Host -Prompt 'Which folder to Scan?'
$whereToPlaceReport = Read-Host -Prompt 'Where to place Report'
$totalFolders = 1
$totalFiles = 0
Write-Host "Process started..."
#IMP: '?' is used as the separator because a Windows file name cannot contain this character
#Get Foldernames into Variable for ForEach Loop
$DFSFolders = get-childitem -path $whichFolder | where-object {$_.Psiscontainer -eq "True"} |select-object name ,fullName
#Below Logic for Main Folder
$mainFiles = get-childitem -path "C:\Users\User\Desktop" -file
("Folder Path" + "?" + "Folder Name" + "?" + "File Name " + "?"+ "File Length" )| out-file "$whereToPlaceReport\Report.csv" -Append
#Loop through folders in main Directory
foreach($file in $mainFiles)
{
$totalFiles = $totalFiles + 1
("C:\Users\User\Desktop" + "?" + "Main Folder" + "?"+ $file.name + "?" + $file.length ) | out-file "$whereToPlaceReport\Report.csv" -Append
}
foreach ($DFSfolder in $DFSfolders)
{
#write the folder name in begining
$totalFolders = $totalFolders + 1
write-host " Reading folder C:\Users\User\Desktop\$($DFSfolder.name)"
#$DFSfolder.fullName | out-file "C:\Users\User\Desktop\PoC powershell\ok2.csv" -Append
#For Each Folder obtain objects in a specified directory, recurse then filter for .sft file type, obtain the filename, then group, sort and eventually show the file name and total incidences of it.
$files = get-childitem -path "$whichFolder\$($DFSfolder.name)" -recurse
foreach($file in $files)
{
$totalFiles = $totalFiles + 1
($DFSfolder.fullName + "?" + $DFSfolder.name + "?"+ $file.name + "?" + $file.length ) | out-file "$whereToPlaceReport\Report.csv" -Append
}
}
# If running in the console, wait for input before closing.
if ($Host.Name -eq "ConsoleHost")
{
Write-Host ""
Write-Host ""
Write-Host ""
Write-Host " **Summary**" -ForegroundColor Red
Write-Host " ------------" -ForegroundColor Red
Write-Host " Total Folders Scanned = $totalFolders " -ForegroundColor Green
Write-Host " Total Files Scanned = $totalFiles " -ForegroundColor Green
Write-Host ""
Write-Host ""
Write-Host "I have done my Job,Press any key to exit" -ForegroundColor white
$Host.UI.RawUI.FlushInputBuffer() # Make sure buffered input doesn't "press a key" and skip the ReadKey().
$Host.UI.RawUI.ReadKey("NoEcho,IncludeKeyUp") > $null
}
##Output
##Bat Code to run above powershell command
@ECHO OFF
SET ThisScriptsDirectory=%~dp0
SET PowerShellScriptPath=%ThisScriptsDirectory%MyPowerShellScript.ps1
PowerShell -NoProfile -ExecutionPolicy Bypass -Command "& {Start-Process PowerShell -ArgumentList '-NoProfile -ExecutionPolicy Bypass -File ""%PowerShellScriptPath%""' -Verb RunAs}";
A bit late, but try this one.
function Set-Files($Path) {
if(Test-Path $Path -PathType Leaf) {
# Do any logic on file
Write-Host $Path
return
}
if(Test-Path $path -PathType Container) {
# Do any logic on folder use exclude on get-childitem
# cycle again
Get-ChildItem -Path $path | foreach { Set-Files -Path $_.FullName }
}
}
# call
Set-Files -Path 'D:\myFolder'
Commenting here as this seems to be the most popular answer on the subject for searching for files whilst excluding certain directories in powershell.
To avoid issues with post-filtering of results (i.e. permission issues etc.), I only needed to filter out top-level directories, and that is all this example is based on. Whilst this example doesn't filter child directory names, it could easily be made recursive to support this, if you were so inclined.
Quick breakdown of how the snippet works
$folders << Uses Get-Childitem to query the file system and perform folder exclusion
$file << The pattern of the file I am looking for
foreach << Iterates the $folders variable performing a recursive search using the Get-Childitem command
$folders = Get-ChildItem -Path C:\ -Directory -Name -Exclude Folder1,"Folder 2"
$file = "*filenametosearchfor*.extension"
foreach ($folder in $folders) {
Get-Childitem -Path "C:/$folder" -Recurse -Filter $file | ForEach-Object { Write-Output $_.FullName }
}