How to retrieve a recursive directory and file list from PowerShell excluding some files and folders? - powershell

I want to write a PowerShell script that will recursively search a directory, but exclude specified files (for example, *.log, and myFile.txt), and also exclude specified directories, and their contents (for example, myDir and all files and folders below myDir).
I have been working with the Get-ChildItem CmdLet, and the Where-Object CmdLet, but I cannot seem to get this exact behavior.

I like Keith Hill's answer except it has a bug that prevents it from recursing past two levels. These commands manifest the bug:
New-Item level1/level2/level3/level4/foobar.txt -Force -ItemType file
cd level1
GetFiles . xyz | % { $_.fullname }
With Hill's original code you get this:
...\level1\level2
...\level1\level2\level3
Here is a corrected, and slightly refactored, version:
function GetFiles($path = $pwd, [string[]]$exclude)
{
    foreach ($item in Get-ChildItem $path)
    {
        if ($exclude | Where {$item -like $_}) { continue }
        $item
        if (Test-Path $item.FullName -PathType Container)
        {
            GetFiles $item.FullName $exclude
        }
    }
}
With that bug fix in place you get this corrected output:
...\level1\level2
...\level1\level2\level3
...\level1\level2\level3\level4
...\level1\level2\level3\level4\foobar.txt
I also like ajk's answer for its conciseness though, as he points out, it is less efficient. The reason it is less efficient, by the way, is that Hill's algorithm stops traversing a subtree when it finds a prune target, while ajk's continues. But ajk's answer also suffers from a flaw, one I call the ancestor trap. Consider a path that includes the same path component (i.e. subdir2) twice:
\usr\testdir\subdir2\child\grandchild\subdir2\doc
Set your location somewhere in between, e.g. cd \usr\testdir\subdir2\child, then run ajk's algorithm to filter out the lower subdir2 and you will get no output at all, i.e. it filters out everything because of the presence of subdir2 higher in the path. This is a corner case, though, and not likely to be hit often, so I would not rule out ajk's solution due to this one issue.
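To see the trap concretely, note that a component-matching filter like ajk's regex cannot tell the lower subdir2 from its ancestor (the file name here is hypothetical):
# Intent: prune only the lower subdir2; but the ancestor subdir2 also matches,
# so even files outside the lower subdir2 are excluded.
'C:\usr\testdir\subdir2\child\grandchild\other\file.txt' -notmatch '\\subdir2($|\\)'
# Returns False -> the file is (wrongly) filtered out.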
Nonetheless, I offer here a third alternative, one that does not have either of the above two bugs. Here is the basic algorithm, complete with a convenience definition for the path or paths to prune--you need only modify $excludeList to your own set of targets to use it:
$excludeList = @("stuff","bin","obj*")
Get-ChildItem -Recurse | % {
    $pathParts = $_.FullName.substring($pwd.path.Length + 1).split("\");
    if ( ! ($excludeList | where { $pathParts -like $_ } ) ) { $_ }
}
My algorithm is reasonably concise but, like ajk's, it is less efficient than Hill's (for the same reason: it does not stop traversing subtrees at prune targets). However, my code has an important advantage over Hill's--it can pipeline! It is therefore amenable to fitting into a filter chain to make a custom version of Get-ChildItem, while Hill's recursive algorithm, through no fault of its own, cannot. ajk's algorithm can be adapted to pipeline use as well, but specifying the item or items to exclude is not as clean, being embedded in a regular expression rather than the simple list of items that I have used.
I have packaged my tree-pruning code into an enhanced version of Get-ChildItem. Aside from my rather unimaginative name--Get-EnhancedChildItem--I am excited about it and have included it in my open source PowerShell library. It includes several other new capabilities besides tree pruning. Furthermore, the code is designed to be extensible: if you want to add a new filtering capability, it is straightforward to do. Essentially, Get-ChildItem is called first and pipelined into each successive filter that you activate via command parameters. Thus something like this...
Get-EnhancedChildItem -Recurse -Force -Svn -Exclude *.txt -ExcludeTree doc*,man -FullName -Verbose
... is converted internally into this:
Get-ChildItem | FilterExcludeTree | FilterSvn | FilterFullName
Each filter must conform to certain rules: accepting FileInfo and DirectoryInfo objects as inputs, generating the same as outputs, and using stdin and stdout so it may be inserted in a pipeline. Here is the same code refactored to fit these rules:
filter FilterExcludeTree()
{
    $target = $_
    Coalesce-Args $Path "." | % {
        $canonicalPath = (Get-Item $_).FullName
        if ($target.FullName.StartsWith($canonicalPath)) {
            $pathParts = $target.FullName.substring($canonicalPath.Length + 1).split("\");
            if ( ! ($excludeList | where { $pathParts -like $_ } ) ) { $target }
        }
    }
}
The only additional piece here is the Coalesce-Args function (found in this post by Keith Dahlby), which merely sends the current directory down the pipe in the event that the invocation did not specify any paths.
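For readers who skip the link, here is a minimal sketch of what such a helper might look like (my approximation, not necessarily Keith Dahlby's actual implementation):
function Coalesce-Args {
    # Emit the first argument that is neither $null nor empty,
    # so Coalesce-Args $Path "." yields $Path when set, otherwise ".".
    foreach ($arg in $args) {
        if ($arg) { return $arg }
    }
}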
Because this answer is getting somewhat lengthy, rather than go into further detail about this filter, I refer the interested reader to my recently published article on Simple-Talk.com entitled Practical PowerShell: Pruning File Trees and Extending Cmdlets where I discuss Get-EnhancedChildItem at even greater length. One last thing I will mention, though, is another function in my open source library, New-FileTree, that lets you generate a dummy file tree for testing purposes so you can exercise any of the above algorithms. And when you are experimenting with any of these, I recommend piping to % { $_.fullname } as I did in the very first code fragment for more useful output to examine.
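If you would rather not pull in the library just to experiment, a rough stand-in for such a generator (not the actual New-FileTree) could be:
# Build a small nested tree with one dummy file per level, for testing.
$path = "."
foreach ($level in 1..4) {
    $path = Join-Path $path "level$level"
    New-Item -ItemType Directory -Path $path -Force | Out-Null
    New-Item -ItemType File -Path (Join-Path $path "file$level.txt") -Force | Out-Null
}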

The Get-ChildItem cmdlet has an -Exclude parameter that is tempting to use, but it doesn't work for filtering out entire directories, from what I can tell. Try something like this:
function GetFiles($path = $pwd, [string[]]$exclude)
{
    foreach ($item in Get-ChildItem $path)
    {
        if ($exclude | Where {$item -like $_}) { continue }
        if (Test-Path $item.FullName -PathType Container)
        {
            $item
            GetFiles $item.FullName $exclude
        }
        else
        {
            $item
        }
    }
}

Here's another option, which is less efficient but more concise. It's how I generally handle this sort of problem:
Get-ChildItem -Recurse .\targetdir -Exclude *.log |
Where-Object { $_.FullName -notmatch '\\excludedir($|\\)' }
The '\\excludedir($|\\)' expression allows you to exclude the directory and its contents at the same time.
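A quick sanity check of the pattern against hypothetical paths:
'C:\targetdir\excludedir' -notmatch '\\excludedir($|\\)'           # False: the directory itself is dropped
'C:\targetdir\excludedir\sub\a.log' -notmatch '\\excludedir($|\\)' # False: its contents are dropped
'C:\targetdir\excludedir2\b.txt' -notmatch '\\excludedir($|\\)'    # True: a similarly named directory is kept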
Update: Please check the excellent answer from msorens for an edge case flaw with this approach, and a much more fleshed out solution overall.

Recently, I explored the possibility of parameterizing the folder to scan and the location where the results of the recursive scan are stored. At the end, I also summarize the number of folders scanned and the number of files found. Sharing it with the community in case it helps other developers.
##Script Starts
#Read the folder to scan and the location where the report will be placed
$whichFolder = Read-Host -Prompt 'Which folder to scan?'
$whereToPlaceReport = Read-Host -Prompt 'Where to place the report'
$totalFolders = 1
$totalFiles = 0
Write-Host "Process started..."
#IMPORTANT: '?' is used as the field separator because a Windows file name cannot contain that character
#Get folder names into a variable for the ForEach loop
$DFSFolders = Get-ChildItem -Path $whichFolder | Where-Object { $_.PsIsContainer } | Select-Object Name, FullName
#Below logic handles files directly in the main folder
$mainFiles = Get-ChildItem -Path $whichFolder -File
("Folder Path" + "?" + "Folder Name" + "?" + "File Name" + "?" + "File Length") | Out-File "$whereToPlaceReport\Report.csv" -Append
#Loop through files in the main directory
foreach ($file in $mainFiles)
{
    $totalFiles = $totalFiles + 1
    ($whichFolder + "?" + "Main Folder" + "?" + $file.Name + "?" + $file.Length) | Out-File "$whereToPlaceReport\Report.csv" -Append
}
foreach ($DFSfolder in $DFSfolders)
{
    #Write the folder name at the beginning
    $totalFolders = $totalFolders + 1
    Write-Host " Reading folder $($DFSfolder.FullName)"
    #For each folder, recursively obtain all items and write each file's details to the report
    $files = Get-ChildItem -Path "$whichFolder\$($DFSfolder.name)" -Recurse
    foreach ($file in $files)
    {
        $totalFiles = $totalFiles + 1
        ($DFSfolder.FullName + "?" + $DFSfolder.Name + "?" + $file.Name + "?" + $file.Length) | Out-File "$whereToPlaceReport\Report.csv" -Append
    }
}
# If running in the console, wait for input before closing.
if ($Host.Name -eq "ConsoleHost")
{
Write-Host ""
Write-Host ""
Write-Host ""
Write-Host " **Summary**" -ForegroundColor Red
Write-Host " ------------" -ForegroundColor Red
Write-Host " Total Folders Scanned = $totalFolders " -ForegroundColor Green
Write-Host " Total Files Scanned = $totalFiles " -ForegroundColor Green
Write-Host ""
Write-Host ""
Write-Host "I have done my Job,Press any key to exit" -ForegroundColor white
$Host.UI.RawUI.FlushInputBuffer() # Make sure buffered input doesn't "press a key" and skip the ReadKey().
$Host.UI.RawUI.ReadKey("NoEcho,IncludeKeyUp") > $null
}
##Output
##Batch code to run the above PowerShell script
@ECHO OFF
SET ThisScriptsDirectory=%~dp0
SET PowerShellScriptPath=%ThisScriptsDirectory%MyPowerShellScript.ps1
PowerShell -NoProfile -ExecutionPolicy Bypass -Command "& {Start-Process PowerShell -ArgumentList '-NoProfile -ExecutionPolicy Bypass -File ""%PowerShellScriptPath%""' -Verb RunAs}"

A bit late, but try this one.
function Set-Files($Path) {
    if (Test-Path $Path -PathType Leaf) {
        # Do any logic on the file
        Write-Host $Path
        return
    }
    if (Test-Path $Path -PathType Container) {
        # Do any logic on the folder; use -Exclude on Get-ChildItem if needed,
        # then recurse
        Get-ChildItem -Path $Path | foreach { Set-Files -Path $_.FullName }
    }
}
# call
Set-Files -Path 'D:\myFolder'

Commenting here as this seems to be the most popular answer on the subject of searching for files while excluding certain directories in PowerShell.
To avoid issues with post-filtering of results (i.e. avoiding permission issues etc.), I only needed to filter out top-level directories, and that is all this example does. So while it doesn't filter child directory names, it could easily be made recursive to support that, if you were so inclined.
Quick breakdown of how the snippet works:
$folders << uses Get-ChildItem to query the file system and perform the folder exclusion
$file << the pattern of the file I am looking for
foreach << iterates over the $folders variable, performing a recursive search using Get-ChildItem
$folders = Get-ChildItem -Path C:\ -Directory -Name -Exclude Folder1,"Folder 2"
$file = "*filenametosearchfor*.extension"
foreach ($folder in $folders) {
    Get-ChildItem -Path "C:\$folder" -Recurse -Filter $file | ForEach-Object { Write-Output $_.FullName }
}


How to pipe Rename-Item into Move-Item (powershell)

I'm in the process of writing up a PowerShell script that can take a bunch of .TIF images, rename them, and place them in a new folder structure depending on the original file name.
For example, a folder containing the file named:
ABC-ALL-210316-0001-3001-0001-1-CheckInvoice-Front.TIF
would be renamed to "00011CIF.TIF", and placed in the following folder:
\20220316\03163001\
I've been trying to put together a script to perform this task, and I got one to work using two separate ForEach loops. The first did a bunch of file renaming to remove "-" and shorten "CheckInvoiceFront" to "CIF" and such. The second pulled all the .TIF images again, created substrings of the image names, created folders from those substrings, and then moved each image to its new folder, shortening the file name. Like I said, it worked, but I wanted to combine the two ForEach loops into one. However, each time I try to run the combined version, it fails for various reasons. I've tried changing things around, but I just can't seem to get it to work.
Here's the current (non-working) code:
# Prompt user for directory to search through
$sorDirectory = Read-Host -Prompt 'Input source directory to search for images: '
$desDirectory = Read-Host -Prompt 'Input target directory to output folders: '
Set-Location $sorDirectory
# Check directory for TIF images, and close if none are found
Write-Host "Scanning "$sorDirectory" for images... "
$imageCheck = Get-ChildItem -File -Recurse -Path $sorDirectory -Include '*.tif'
$imageCount = $imageCheck.count
if ($imageCount -gt 0) {
    Write-Host "Total number of images found: $imageCount"
    ""
    Read-Host -Prompt "Press ENTER to continue or CTRL+C to quit"
    $count1 = 1
    # Rename all images, removing "ABCALL" from the start and inserting "20", shortening long filetype names, and moving files to new folders with new names
    Clear-Host
    Write-Host "Reformatting images for processing..."
    ""
    Get-ChildItem -File -Recurse -Path $sorDirectory -Include '*.tif' |
    ForEach-Object {
        Write-Progress -Activity "Total Formatted Images: $count1/$imageCount" -Status "0--------10%--------20%--------30%--------40%--------50%--------60%--------70%--------80%--------90%-------100" -CurrentOperation $_ -PercentComplete (($count1 / $imageCount) * 100)
        Rename-Item $_ -NewName $_.Name.Replace("-", "").Replace("ABCALL", "20").Replace("CheckInvoiceFront", "CIF").Replace("CheckInvoiceBack", "CIB").Replace("CheckFront", "CF").Replace("CheckBack", "CB") | Out-Null
        $year = $_.Name.SubString(0, 4)
        $monthday = $_.Name.Substring(4, 4)
        $batch = $_.Name.SubString(12, 4)
        $fulldate = $year + $monthday
        $datebatch = $monthday + $batch
        $image = $_.Name.SubString(16)
        $fullPath = "$desDirectory\$fulldate\$datebatch"
        if (-not (Test-Path $fullPath)) { mkdir $fullPath | Out-Null }
        Move-Item $_ -Destination "$fullPath\$image" | Out-Null
        $count1++
    }
    # Finished
    Clear-Host
    Write-Host "Job complete!"
    Timeout /T -1
}
# Closes if no images are found (likely a bad path)
else {
    Write-Host "There were no images in the selected folder. Now closing..."
    Timeout /T 10
    Exit
}
Usually this results in an error stating that it can't find the path of the original file name, as if it's still looking for the original non-renamed image. I tried adding some other things, but then it said I was passing null values. I'm just not sure what I'm doing wrong.
Note that if I take everything after the "Rename-Item" (starting with "$year =") and put it in a separate ForEach loop, it works. I guess I just don't know how to make Rename-Item return its results back to "$_" before everything else tries to work on it. I tried messing around with "-PassThru", but I don't think I was doing it right.
Any suggestions?
As Olaf points out, situationally you may not need both a Rename-Item and a Move-Item call, because Move-Item can rename and move in a single operation.
That said, Move-Item does not support implicit creation of the target directory to move a file to, so in your case you do need separate calls.
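As a quick illustration (paths and file names here are hypothetical):
# Move-Item renames and moves in a single step, assuming C:\existing already exists:
Move-Item -Path .\old-name.tif -Destination C:\existing\new-name.tif

# It will not create a missing destination directory; create it first:
New-Item -Path C:\newdir -ItemType Directory -Force | Out-Null
Move-Item -Path .\old-name.tif -Destination C:\newdir\new-name.tif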
You can use Rename-Item's -PassThru switch to make it output a System.IO.FileInfo instance (or, if a directory is being renamed, a System.IO.DirectoryInfo instance) representing the already renamed file; you can directly pass such an instance to Move-Item via the pipeline:
Get-ChildItem -File -Recurse -Path $sorDirectory -Include '*.tif' |
    ForEach-Object {
        # ...
        # Use -PassThru with Rename-Item to output a file-info object describing
        # the already renamed file.
        $renamedFile = $_ | Rename-Item -PassThru -NewName $_.Name.Replace("-", "").Replace("ABCALL", "20").Replace("CheckInvoiceFront", "CIF").Replace("CheckInvoiceBack", "CIB").Replace("CheckFront", "CF").Replace("CheckBack", "CB")
        # ...
        # Pass $renamedFile to Move-Item via the pipeline.
        $renamedFile | Move-Item -Destination "$fullPath\$image"
        # ...
    }
As for your desire to:
make the Rename-Item return its results back to "$_"
While PowerShell doesn't prevent you from modifying the automatic $_ variable, it is better to treat automatic variables as read-only.
Therefore, a custom variable is used above to store the output from Rename-Item -PassThru.
You need -passthru and -destination:
rename-item file1 file2 -PassThru | move-item -Destination dir1

Powershell dropping characters while creating folder names

I am having a strange problem in PowerShell (version 2021.8.0) while creating and naming folders. I start with a number of individual ebook files in a folder that I set using Set-Location. I use the file name minus the extension to create a new folder with the same name as the e-book file. The code works fine the majority of the time with the various file extensions I have stored in an array at the beginning of the code.
What's happening is that the code creates the proper folder name the majority of the time and moves the source file into the folder after it's created.
The problem is that if the name of a source file with the ".epub" extension ends in an "e", then the "e" is missing from the end of the created folder name. I thought I also saw it drop "r" and "p", but I have been unable to replicate that error recently.
Below is my code. It is set up to run against file extensions for e-books and audiobooks. Please ignore the error messages that are being generated when files of a specific type don't exist in the working folder. I am just using the array for testing and it will be filled automatically later by reading the folder contents.
This code creates a folder for each file and moves the file into that folder:
Clear-Host
$SourceFileFolder = 'N:\- Books\- - BMS\- Books Needing Folders'
Set-Location $SourceFileFolder
$MyArray = ( "*.azw3", "*.cbz", "*.doc", "*.docx", "*.djvu", "*.epub", "*.mobi", "*.mp3", "*.pdf", "*.txt" )
Foreach ($FileExtension in $MyArray) {
    Get-ChildItem -Include $FileExtension -Name -Recurse | Sort-Object | ForEach-Object {
        $SourceFileName = $_
        $NewDirectoryName = $SourceFileName.TrimEnd($FileExtension)
        New-Item -Name $NewDirectoryName -ItemType "directory"
        $OriginalFileName = Join-Path -Path $SourceFileFolder -ChildPath $SourceFileName
        $DestinationFilename = Join-Path -Path $NewDirectoryName -ChildPath $SourceFileName
        $DestinationFilename = Join-Path -Path $SourceFileFolder -ChildPath $DestinationFilename
        Move-Item $OriginalFileName -Destination $DestinationFilename
    }
}
Thanks for any help you can give. Driving me nuts and I am pretty sure it's something that I am doing wrong, like always.
String.TrimEnd()
Removes all the trailing occurrences of a set of characters specified in an array from the current string.
The TrimEnd method removes all trailing characters that appear in the character array you provide. It does not check whether ".epub" is at the end of the string; rather, it trims any of the supplied characters from the end of the string. In your case, all trailing dots, "e", "p", "u", and "b" characters are removed until none of them remain at the end, so you eventually (and you do) remove more than you intended.
I'd suggest using EndsWith to match your extension and taking a substring instead, as below. If you deal only with single extensions (e.g. not with .tar.gz or other double extensions), you can also use the .NET method [System.IO.Path]::GetFileNameWithoutExtension($MyFileName).
$MyFileName = "Teste.epub"
$FileExt = '.epub'
# Wrong approach
$output = $MyFileName.TrimEnd($FileExt)
write-host $output -ForegroundColor Yellow
#Output returns Test
# Proper method
if ($MyFileName.EndsWith($FileExt)) {
$output = $MyFileName.Substring(0,$MyFileName.Length - $FileExt.Length)
Write-Host $output -ForegroundColor Cyan
}
# Returns Tested
#Alternative method. Won't work if you want to trim out double extensions (eg. tar.gz)
if ($MyFileName.EndsWith($FileExt)) {
$Output = [System.IO.Path]::GetFileNameWithoutExtension($MyFileName)
Write-Host $output -ForegroundColor Cyan
}
You're making this too hard on yourself. Use the .BaseName to get the filename without extension.
Your code simplified:
$SourceFileFolder = 'N:\- Books\- - BMS\- Books Needing Folders'
$MyArray = "*.azw3", "*.cbz", "*.doc", "*.docx", "*.djvu", "*.epub", "*.mobi", "*.mp3", "*.pdf", "*.txt"
(Get-ChildItem -Path $SourceFileFolder -Include $MyArray -File -Recurse) | Sort-Object Name | ForEach-Object {
    # BaseName is the filename without extension
    $NewDirectory = Join-Path -Path $SourceFileFolder -ChildPath $_.BaseName
    $null = New-Item -Path $NewDirectory -ItemType Directory -Force
    $_ | Move-Item -Destination $NewDirectory
}

Parse directory listing and pass to another script?

I am trying to write a PowerShell script that will loop through a directory on the C:\ drive and pass each filename, with its extension, to another script.
Basically, the output of the directory listing should be passed to the other script one file at a time. That script is a compiling script which expects an argument (parameter) naming the specific module (filename) to compile.
Code:
Clear-Host
$Path = "C:\SandBox\"
Get-ChildItem $Path -recurse -force | ForEach { If ($_.extension -eq ".cob")
{
Write-Host $_.fullname
}
}
If ($_.extension -eq ".pco")
{
Write-Host $_.fullname }
}
You don't need to parse the output as text; that approach is discouraged in PowerShell.
Here's something that might work for you:
# getmyfiles.ps1
Param( [string]$Path = (Get-Location) )
dir $Path -Recurse -Force | where {
    $_.Extension -in @('.cob', '.pco')
}
# this is another script that calls the one above
.\getmyfiles.ps1 -Path c:\sandbox | ForEach-Object {
    # $_ is a file object. I'm just printing its full path, but you can do other stuff with it
    Write-Host $_.FullName
}
Clear-Host
$Path = "C:\Sandbox\"
$Items = Get-ChildItem $Path -recurse -Include "*.cob", "*.pco"
From your garbled code, I am guessing you want to return a list of files with the .cob and .pco file extensions. You could use the above code to gather those.
$File = $Items.name
$FullName = $items.fullname
Write-Host $Items.name
$File
$FullName
Adding the above lines will allow you to display them in various ways. You can pick the one that suits your needs and then loop through them with a foreach, as shown below.
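For instance, a minimal loop over the collected items might look like this (which property you use is up to you):
foreach ($Item in $Items) {
    # Each $Item is a FileInfo object; act on whichever properties you need
    Write-Host $Item.FullName
}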
As a rule, this is not the place for code to be written for you, but you have tried to add some to the question, so I've taken a look. Sometimes you just want a nudge in the right direction.

How to use Powershell to list duplicate files in a folder structure that exist in one of the folders

I have a source tree, say c:\s, with many sub-folders. One of the sub-folders is called "c:\s\Includes" which can contain one or more .cs files recursively.
I want to make sure that none of the .cs files in the c:\s\Includes... path exist in any other folder under c:\s, recursively.
I wrote the following PowerShell script which works, but I'm not sure if there's an easier way to do it. I've had less than 24 hours experience with PowerShell so I have a feeling there's a better way.
I can assume at least PowerShell 3 being used.
I will accept any answer that improves my script, but I'll wait a few days before accepting the answer. When I say "improve", I mean it makes it shorter, more elegant or with better performance.
Any help from anyone would be greatly appreciated.
The current code:
$excludeFolder = "Includes"
$h = #{}
foreach ($i in ls $pwd.path *.cs -r -file | ? DirectoryName -notlike ("*\" + $excludeFolder + "\*")) { $h[$i.Name]=$i.DirectoryName }
ls ($pwd.path + "\" + $excludeFolder) *.cs -r -file | ? { $h.Contains($_.Name) } | Select #{Name="Duplicate";Expression={$h[$_.Name] + " has file with same name as " + $_.Fullname}}
1
I stared at this for a while, determined to write it without studying the existing answers, but I'd already glanced at the first sentence of Matt's answer mentioning Group-Object. After some different approaches, I ended up with basically the same answer, except that his is long-form and robust, with regex character escaping and setup variables, while mine is terse, because you asked for shorter answers and because that's more fun.
$inc = '^c:\\s\\includes'
$cs = (gci -R 'c:\s' -File -I *.cs) | group name
$nopes = $cs |?{($_.Group.FullName -notmatch $inc)-and($_.Group.FullName -match $inc)}
$nopes | % {$_.Name; $_.Group.FullName}
Example output:
someFile.cs
c:\s\includes\wherever\someFile.cs
c:\s\lib\factories\alt\someFile.cs
c:\s\contrib\users\aa\testing\someFile.cs
The concept is:
Get all the .cs files in the whole source tree
Split them into groups of {filename: {files which share this filename}}
For each group, keep only those where the set of files contains any file with a path that matches the include folder and contains any file with a path that does not match the includes folder. This step covers
duplicates (if a file only exists once it cannot pass both tests)
duplicates across the {includes/not-includes} divide, instead of being duplicated within one branch
handles triplicates, n-tuplicates, as well.
Edit: I added the ^ to $inc to say it has to match at the start of the string, so the regex engine can fail faster for paths that don't match. Maybe this counts as premature optimization.
2
After that pretty dense attempt, the shape of a cleaner answer is much much easier:
Get all the files, split them into include, not-include arrays.
Nested for-loop testing every file against every other file.
Longer, but enormously quicker to write (it runs slower, though) and I imagine easier to read for someone who doesn't know what it does.
$sourceTree = 'c:\\s'
$allFiles = Get-ChildItem $sourceTree -Include '*.cs' -File -Recurse
$includeFiles = $allFiles | where FullName -imatch "$($sourceTree)\\includes"
$otherFiles = $allFiles | where FullName -inotmatch "$($sourceTree)\\includes"
foreach ($incFile in $includeFiles) {
foreach ($oFile in $otherFiles) {
if ($incFile.Name -ieq $oFile.Name) {
write "$($incFile.Name) clash"
write "* $($incFile.FullName)"
write "* $($oFile.FullName)"
write "`n"
}
}
}
3
Because code-golf is fun. If the hashtables are faster, what about this even less tested one-liner...
$h=@{};gci c:\s -R -file -Filt *.cs|%{$h[$_.Name]+=@($_.FullName)};$h.Values|?{$_.Count -gt 1 -and $_ -like 'c:\s\includes*'}
Edit: explanation of this version: It's doing much the same solution approach as version 1, but the grouping operation happens explicitly in the hashtable. The shape of the hashtable becomes:
$h = @{
    'fileA.cs' = @('c:\cs\wherever\fileA.cs', 'c:\cs\includes\fileA.cs')
    'file2.cs' = @('c:\cs\somewhere\file2.cs')
    'file3.cs' = @('c:\cs\includes\file3.cs', 'c:\cs\x\file3.cs', 'c:\cs\z\file3.cs')
}
It hits the disk once for all the .cs files, iterates the whole list to build the hashtable. I don't think it can do less work than this for that bit.
It uses +=, so it can add files to the existing array for that filename; otherwise it would overwrite each hashtable entry, leaving a one-item list holding only the most recently seen file.
It uses @() because, when it hits a filename for the first time, $h[$_.Name] won't return anything, and the script needs to put an array into the hashtable at first, not a string. If it were +=$_.FullName, then the first file would go into the hashtable as a string, and the += next time would do string concatenation, which is no use to me. Wrapping every file in @(..) forces the first file for each name to start a one-item array. The least-code way to get this result is +=@(..), but the churn of creating a throwaway array for every single file is needless work. Maybe changing it to longer code which does less array creation would help?
Changing the section
%{$h[$_.Name]+=@($_.FullName)}
to something like
%{if (!$h.ContainsKey($_.Name)){$h[$_.Name]=@()};$h[$_.Name]+=$_.FullName}
(I'm guessing, I don't have much intuition for what's most likely to be slow PowerShell code, and haven't tested).
After that, using $h.Values isn't going over every file a second time; it's going over every array in the hashtable - one per unique filename. That has to happen to check the array sizes and prune the non-duplicates, but the -and operation short-circuits: when Count -gt 1 fails, the path check on the right doesn't run.
If the array has two or more files in it, the -and $_ -like ... executes and pattern matches to see if at least one of the duplicates is in the includes path. (Bug: if all the duplicates are in c:\cs\includes and none anywhere else, it will still show them).
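A tiny demonstration of that short-circuiting:
# The right-hand side of -and is never evaluated when the left is false:
$false -and (Write-Host 'never prints')   # outputs False; Write-Host does not run
$true -and (Write-Host 'prints')          # Write-Host executes here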
--
4
This is edited version 3 with the hashtable initialization tweak, and now it keeps track of seen files in $s, and then only considers those it's seen more than once.
$h=@{};$s=@{};gci 'c:\s' -R -file -Filt *.cs|%{if($h.ContainsKey($_.Name)){$s[$_.Name]=1}else{$h[$_.Name]=@()};$h[$_.Name]+=$_.FullName};$s.Keys|%{if ($h[$_] -like 'c:\s\includes*'){$h[$_]}}
Assuming it works, that's what it does, anyway.
--
Edit, branching off the topic: I keep thinking there ought to be a way to do this with the things in the System.Data namespace. Does anyone know if you can connect System.Data.DataTable().ReadXML() to gci | ConvertTo-Xml without reams of boilerplate?
I'd do more or less the same, except I'd build the hashtable from the contents of the includes folder and then run over everything else to check for duplicates:
$root = 'C:\s'
$includes = "$root\includes"
$includeList = @{}
Get-ChildItem -Path $includes -Filter '*.cs' -Recurse -File |
    % { $includeList[$_.Name] = $_.DirectoryName }
Get-ChildItem -Path $root -Filter '*.cs' -Recurse -File |
    ? { $_.FullName -notlike "$includes\*" -and $includeList.Contains($_.Name) } |
    % { "Duplicate of '{0}': {1}" -f $includeList[$_.Name], $_.FullName }
I'm not as impressed with this as I would like to be, but I thought Group-Object might have a place in this question, so I present the following:
$base = 'C:\s'
$unique = "$base\includes"
$extension = "*.cs"
Get-ChildItem -Path $base -Filter $extension -Recurse |
    Group-Object Name |
    Where-Object { ($_.Count -gt 1) -and (($_.Group).FullName -match [regex]::Escape($unique)) } |
    ForEach-Object {
        $filename = $_.Name
        ($_.Group).FullName -notmatch [regex]::Escape($unique) | ForEach-Object {
            "'{0}' has file with same name as '{1}'" -f (Split-Path $_), $filename
        }
    }
Collect all the files with the extension filter $extension. Group the files based on their names. Then, of those groups, find every group where there is more than one of that particular file and at least one of the group members is in the directory $unique. Take those groups and print out all the files that are not from the unique directory.
From Comment
For what it's worth, this is what I used for testing to create a bunch of files. (I know folder 9 is empty.)
$base = "E:\Temp\dev\cs"
Remove-Item "$base\*" -Recurse -Force
0..9 | % { [void](New-Item -ItemType Directory "$base\$_") }
1..1000 | % {
    $number = Get-Random -Minimum 1 -Maximum 100
    $folder = Get-Random -Minimum 0 -Maximum 9
    [void](New-Item -Path $base\$folder -ItemType File -Name "$number.txt" -Force)
}
After looking at all the others, I thought I would try a different approach.
$includes = "C:\s\includes"
$root = "C:\s"
# First script
Measure-Command {
    [string[]]$filter = ls $includes -Filter *.cs -Recurse | % Name
    ls $root -Include $filter -Recurse -Filter *.cs |
        Where-Object { $_.FullName -notlike "$includes*" }
}
# Second script
Measure-Command {
    $filter2 = ls $includes -Filter *.cs -Recurse
    ls $root -Recurse -Filter *.cs |
        Where-Object { $filter2.Name -eq $_.Name -and $_.FullName -notlike "$includes*" }
}
In my first script, I get all the include file names into a string array, then use that array as the -Include parameter on Get-ChildItem. At the end, I filter the include folder out of the results.
In my second script, I enumerate everything and then filter after the pipe.
Remove the Measure-Command to see the results; I was using it to check the speed. With my dataset, the first one was 40% faster.
$FilesToFind = Get-ChildItem -Recurse 'c:\s\includes' -File -Include *.cs | Select-Object -ExpandProperty Name
Get-ChildItem -Recurse C:\s -File -Include *.cs | ? { $_.Name -in $FilesToFind -and $_.DirectoryName -notmatch '^c:\\s\\includes' } | Select-Object Name, Directory
Create a list of file names to look for.
Find all files that are in the list but not part of the directory the list was generated from
Print their name and directory

Get-ChildItem -force reports "Access Denied" on My Documents folder and other junction points

I have a script that I wrote that replaces files. I pass params to it for the name of the file, and the base location to search from. The worker lines are:
$SubLocations = Get-ChildItem -Path $Startlocation -Recurse -include $Filename -Force |
Where { $_.FullName.ToUpper().contains($Filter.ToUpper())}
I set $Startlocation to "C:\Users"; however, I am getting access denied when trying to recurse through other users' folders. I'm a full admin on the machine, and I have already tried running PowerShell as admin. I can access all the files via Windows Explorer with no issue. Any ideas?
Get-ChildItem : Access to the path 'C:\Users\jepa227\Documents\My Music' is denied.
At C:\Users\krla226\Google Drive\Documents\PowerShell\Replace-File.ps1:35 char:46
+ $SubLocations = Get-ChildItem <<<< -Path $Startlocation -Recurse -include $Filename -Force |
    + CategoryInfo          : PermissionDenied: (C:\Users\jepa227\Documents\My Music:String) [Get-ChildItem], UnauthorizedAccessException
    + FullyQualifiedErrorId : DirUnauthorizedAccessError,Microsoft.PowerShell.Commands.GetChildItemCommand
UPDATE
While I was unable to get it working via GCI, I was able to use WMI to solve my problem. For those interested:
$SubLocations = Get-WmiObject -Class cim_datafile -Filter "fileName = '$filename' AND Extension = '$extension'" |
Where { $_.Name.ToUpper().contains($Filter.ToUpper()) }
I was able to reproduce this on a Windows 7 machine while logged in as an admin user named "admin", running PowerShell with elevated privileges and UAC disabled, with the following commands:
get-childitem "c:\users\Admin\my documents"
and
cd "c:\users\admin\my documents"
get-childitem
Based on the article here, it looks like My Documents, My Music, etc. are defined as junction points for backwards compatibility with pre-Vista software. PowerShell doesn't natively handle junction points well. It seems like there are a couple of options here:
1) Remove the -force from the Get-ChildItem command. This is likely your best bet.
get-childitem c:\users -recurse
works without error and skips junction points and system directories like AppData.
Editor's note: Omitting -Force does solve the immediate problem, but invariably skips all hidden items, not just the hidden junction points that cause the access-denied errors.
2) If you absolutely need to use the -Force for some reason, you could programmatically recurse each subdirectory, skipping junction points. This article describes the mechanism to identify junction points. A skeleton of this in a .ps1 script file might look like:
Param( [Parameter(Mandatory=$true)][string]$startLocation )
$errorActionPreference = "Stop"

function ProcessDirectory( $dir )
{
    Write-Host ("Working on " + $dir.FullName)
    # Work on the files in this folder here
    $filesToProcess = ( gci $dir.FullName | where { $_.PsIsContainer -eq 0 } ) # and file matches the requested pattern
    # process files
    $subdirs = gci $dir.FullName -Force | where { ($_.Attributes -band [IO.FileAttributes]::ReparsePoint) -eq 0 -and ($_.PsIsContainer -eq 1) -and (![string]::IsNullOrEmpty($_.FullName)) }
    foreach( $subdir in $subdirs )
    {
        # Write-Host( $subdir.Name + ", " + $subdir.FullName )
        if ( $subdir -ne $null )
        {
            ProcessDirectory -dir $subdir
        }
    }
}

$dirs = Get-ChildItem $startLocation -Force
$dirs | foreach { ProcessDirectory -dir $_ }
mcating's helpful answer explains the problem well.
The quick fix he suggests is to omit -Force, which works, because PowerShell ignores hidden items unless -Force is used, and these system-defined junction points do have the Hidden attribute (alongside the ReparsePoint and System attributes).
If you do need -Force in order to process hidden items in general and only want to ignore these system-defined junction points, you can use Get-ChildItem's -Attributes parameter as follows:
Get-ChildItem -Force -Recurse -Attributes !Hidden, !System, !ReparsePoint
The -Attributes value excludes all items that have all of the following attributes set: Hidden, System, ReparsePoint, which is true of all system-defined junction points.
While it is technically possible to create your own junction points (or symbolic links) with these attributes, this is unlikely to occur in practice.
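If you want to verify the behavior yourself, here is a rough test sketch of my own construction (assumes PowerShell 5 or later on Windows; the paths are hypothetical):
# Create a junction and give it the Hidden and System attributes,
# mimicking the system-defined junction points.
New-Item -ItemType Directory -Path C:\temp\real -Force | Out-Null
$j = New-Item -ItemType Junction -Path C:\temp\link -Target C:\temp\real
$j.Attributes = $j.Attributes -bor [IO.FileAttributes]::Hidden -bor [IO.FileAttributes]::System

# The junction is skipped, while ordinary hidden files are still returned.
Get-ChildItem C:\temp -Force -Recurse -Attributes !Hidden, !System, !ReparsePoint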