PowerShell script to execute if threshold exceeded - powershell

First off, sorry for the long post - I'm trying to be detailed!
I'm looking to automate a workaround for an issue I discovered. I have a worker process that periodically bombs once the "working" directory has more than 100,000 files in it. As a preventative measure I can stop the process, rename the working directory to "HOLD", and create a new working dir to keep it going. Then I move files from the HOLD folder(s) back into the working dir a little bit at a time until it's caught up.
What I would like to do is automate the entire process via Task Scheduler with 2 PowerShell scripts.
----SCRIPT 1----
Here's the condition:
If file count in working dir is greater than 60,000
I find that [System.IO.Directory]::EnumerateFiles($Working) is faster than Get-ChildItem.
The actions:
Stop-Service for Service1, Service2, Service3
Rename-Item -Path "C:\Prod\Working\" -NewName "Hold" (or "Hold1", "Hold2", "Hold3", etc. if the folder already exists). I'm not particular about the numbering as long as it is consistent, so if it's easier to let the system name it HOLD, HOLD(1), HOLD(2), etc., or to append the date after HOLD, that's fine.
New-Item C:\Prod\Working -type directory
Start-Service Service1, Service2, Service3
---SCRIPT 2----
Condition:
If file count in working dir is less than 50,000
Actions:
Move 5,000 files from the HOLD* folder(s): move 5k files from the HOLD folder until it is empty, then skip the empty folder and start moving files from HOLD1. This process should be dynamic and repeat for the next folders.
Before it comes up, I'm well aware it would be easier to simply move the files from the working folder to a Hold folder, but the size of the files can be very large and moving them always seems to take much longer.
I greatly appreciate any input and I'm eager to see some solid answers!
EDIT
Here's what I'm running for Script 2 (courtesy of Bacon):
# Setup
# Ensure there's enough room so that restoring $restoreBatchSize
# files won't push $Working's file count above $restoreThreshold
$restoreThreshold = 30000;
$restoreBatchSize = 500;
$Working = "E:\UnprocessedTEST\"
$HoldBaseDirectory = "E:\"
while (@(Get-ChildItem -File -Path $Working).Length -lt $restoreThreshold - $restoreBatchSize)
{
    $holdDirectory = Get-ChildItem -Path $HoldBaseDirectory -Directory -Filter '*Hold*' |
        Select-Object -Last 1;
    if ($holdDirectory -eq $null)
    {
        # There are no Hold directories to process; don't keep looping
        break;
    }
    # Restore the first $restoreBatchSize files from $holdDirectory and store the count of files restored
    $restoredCount = Get-ChildItem $holdDirectory -File |
        Select-Object -First $restoreBatchSize |
        Move-Item -Destination $Working -PassThru |
        Measure-Object |
        Select-Object -ExpandProperty 'Count';
    # If fewer than $restoreBatchSize files were restored then $holdDirectory is now empty; delete it
    if ($restoredCount -lt $restoreBatchSize)
    {
        Remove-Item -Path $holdDirectory;
    }
}

The first script could look like this:
$rotateThreshold = 60000;
$isThresholdExceeded = @(
    Get-ChildItem -File -Path $Working |
        Select-Object -First ($rotateThreshold + 1)
).Length -gt $rotateThreshold;
# Alternative: $isThresholdExceeded = @(Get-ChildItem -File -Path $Working).Length -gt $rotateThreshold;
if ($isThresholdExceeded)
{
    Stop-Service -Name 'Service1', 'Service2', 'Service3';
    try
    {
        $newName = 'Hold_{0:yyyy-MM-ddTHH-mm-ss}' -f (Get-Date);
        Rename-Item -Path $Working -NewName $newName;
    }
    finally
    {
        New-Item -ItemType Directory -Path $Working -ErrorAction SilentlyContinue;
        Start-Service -Name 'Service1', 'Service2', 'Service3';
    }
}
The reason for assigning $isThresholdExceeded the way I am is that we don't care what the exact count of files is, just whether it's above or below that threshold. As soon as we know the threshold has been exceeded we don't need any further results from Get-ChildItem (or likewise from [System.IO.Directory]::EnumerateFiles($Working)), so as an optimization Select-Object will terminate the pipeline on the element after the threshold is reached. In a directory with 100,000 files on an SSD I found this to be almost 40% faster than allowing Get-ChildItem to enumerate all files (4.12 vs. 6.72 seconds). Other implementations using foreach or ForEach-Object proved to be slower than @(Get-ChildItem -File -Path $Working).Length.
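If you'd rather lean on [System.IO.Directory]::EnumerateFiles for the check, the same early-exit idea can be expressed with a plain loop. This is only a rough, unbenchmarked sketch and assumes $Working and $rotateThreshold are set as above:
$count = 0;
foreach ($file in [System.IO.Directory]::EnumerateFiles($Working))
{
    $count++;
    # Stop enumerating as soon as the threshold is crossed
    if ($count -gt $rotateThreshold) { break; }
}
$isThresholdExceeded = $count -gt $rotateThreshold;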
As for generating the new name for the 'Hold' directories, you could save and update an identifier somewhere, or just generate new names with an incrementing suffix until you find one that's not in use. I think it's easier to just base the name on the current time. As long as the script doesn't run more than once a second you'll know the name is unique, they'll sort just as well as numerals, plus it gives you a little diagnostic information (the time that directory was rotated out) for free.
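If you do prefer numbered names, a minimal sketch of the incrementing-suffix approach would be something like the following; it assumes the Hold directories live alongside the working folder under C:\Prod, which is my assumption based on the question, not part of the original answer:
$suffix = 1;
do
{
    $newName = "Hold$suffix";   # Hold1, Hold2, Hold3, ...
    $suffix++;
} while (Test-Path -Path (Join-Path 'C:\Prod' $newName));
Rename-Item -Path $Working -NewName $newName;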
Here's some basic code for the second script:
$restoreThreshold = 50000;
$restoreBatchSize = 5000;
# Ensure there's enough room so that restoring $restoreBatchSize
# files won't push $Working's file count above $restoreThreshold
while (@(Get-ChildItem -File -Path $Working).Length -lt $restoreThreshold - $restoreBatchSize)
{
    $holdDirectory = Get-ChildItem -Path $HoldBaseDirectory -Directory -Filter 'Hold_*' |
        Select-Object -First 1;
    if ($holdDirectory -eq $null)
    {
        # There are no Hold directories to process; don't keep looping
        break;
    }
    # Restore the first $restoreBatchSize files from $holdDirectory and store the count of files restored
    $restoredCount = Get-ChildItem -File -Path $holdDirectory.FullName |
        Select-Object -First $restoreBatchSize |
        Move-Item -Destination $Working -PassThru |
        Measure-Object |
        Select-Object -ExpandProperty 'Count';
    # If fewer than $restoreBatchSize files were restored then $holdDirectory is now empty; delete it
    if ($restoredCount -lt $restoreBatchSize)
    {
        Remove-Item -Path $holdDirectory.FullName;
    }
}
As noted in the comment before the while loop, the condition ensures that the count of files in $Working is at least $restoreBatchSize files away from $restoreThreshold, so that restoring $restoreBatchSize files won't exceed the threshold in the process. If you don't care about that, or the chosen threshold already accounts for that, you can change the condition to compare against $restoreThreshold instead of $restoreThreshold - $restoreBatchSize. Alternatively, leave the condition the same and change $restoreThreshold to 55000.
The way I've written the loop, on each iteration at most $restoreBatchSize files will be restored from the first 'Hold_*' directory it finds, then the file count in $Working is reevaluated. Considering that, as I understand it, files are being added to and removed from $Working externally to this script and simultaneously with its execution, this is probably the safest and also the simplest approach. You could certainly enhance this by calculating how far below $restoreThreshold you are and performing the necessary number of batch restores, from one or more 'Hold_*' directories, all in one iteration of the loop, as sketched below.
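For what it's worth, here's a rough, untested sketch of that enhancement. It assumes the same $Working, $HoldBaseDirectory and $restoreThreshold variables as above, and simply takes however many files are needed from the first 'Hold_*' directory on each pass:
while ($true)
{
    # How far below the threshold is $Working right now?
    $deficit = $restoreThreshold - @(Get-ChildItem -File -Path $Working).Length;
    if ($deficit -le 0) { break; }
    $holdDirectory = Get-ChildItem -Path $HoldBaseDirectory -Directory -Filter 'Hold_*' |
        Select-Object -First 1;
    if ($holdDirectory -eq $null) { break; }   # no Hold directories left
    # Move up to $deficit files in one pass instead of a fixed batch size.
    # Note: files arriving in $Working while this runs can still push the count past the threshold.
    $restoredCount = Get-ChildItem -File -Path $holdDirectory.FullName |
        Select-Object -First $deficit |
        Move-Item -Destination $Working -PassThru |
        Measure-Object |
        Select-Object -ExpandProperty 'Count';
    if ($restoredCount -lt $deficit)
    {
        # $holdDirectory ran out of files; remove it and move on to the next one
        Remove-Item -Path $holdDirectory.FullName;
    }
}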

Related

How to pipe Rename-Item into Move-Item (powershell)

I'm in the process of writing up a PowerShell script that can take a bunch of .TIF images, rename them, and place them in a new folder structure depending on the original file name.
For example, a folder containing the file named:
ABC-ALL-210316-0001-3001-0001-1-CheckInvoice-Front.TIF
would be renamed to "00011CIF.TIF", and placed in the following folder:
\20220316\03163001\
I've been trying to put together a code to perform this task, and I got one to work where I had two different "ForEach" methods. One would do a bunch of file renaming to remove "-" and shorten "CheckInvoiceFront" to "CIF" and such. Then the second method would again pull all .TIF images, create substrings of the image names, and create folders from those substrings, and then move the image to the new folder, shortening the file name. Like I said, it worked... but I wanted to combine the ForEach methods into one process. However, each time I try to run it, it fails for various reasons... I've tried to change things around, but I just can't seem to get it to work.
Here's the current (non-working) code:
# Prompt user for directory to search through
$sorDirectory = Read-Host -Prompt 'Input source directory to search for images: '
$desDirectory = Read-Host -Prompt 'Input target directory to output folders: '
Set-Location $sorDirectory
# Check directory for TIF images, and close if none are found
Write-Host "Scanning "$sorDirectory" for images... "
$imageCheck = Get-ChildItem -File -Recurse -Path $sorDirectory -include '*.tif'
$imageCount = $imageCheck.count
if ($imageCount -gt 0) {
    Write-Host "Total number of images found: $imageCount"
    ""
    Read-Host -Prompt "Press ENTER to continue or CTRL+C to quit"
    $count1=1;
    # Rename all images, removing "ABCALL" from the start and inserting "20", then shorten long filetype names, and move files to new folders with new names
    Clear-Host
    Write-Host "Reformatting images for processing..."
    ""
    Get-ChildItem -File -Recurse -Path $sorDirectory -include '*.tif' |
    ForEach-Object {
        Write-Progress -Activity "Total Formatted Images: $count1/$imageCount" -Status "0--------10%--------20%--------30%--------40%--------50%--------60%--------70%--------80%--------90%-------100" -CurrentOperation $_ -PercentComplete (($count1 / $imageCount) * 100)
        Rename-Item $_ -NewName $_.Name.Replace("-", "").Replace("ABCALL", "20").Replace("CheckInvoiceFront", "CIF").Replace("CheckInvoiceBack", "CIB").Replace("CheckFront", "CF").Replace("CheckBack", "CB") |Out-Null
        $year = $_.Name.SubString(0, 4)
        $monthday = $_.Name.Substring(4,4)
        $batch = $_.Name.SubString(12, 4)
        $fulldate = $year+$monthday
        $datebatch = $monthday+$batch
        $image = $_.Name.SubString(16)
        $fullPath = "$desDirectory\$fulldate\$datebatch"
        if (-not (Test-Path $fullPath)) { mkdir $fullPath |Out-Null }
        Move-Item $_ -Destination "$fullPath\$image" |Out-Null
        $count1++
    }
    # Finished
    Clear-Host
    Write-Host "Job complete!"
    Timeout /T -1
}
# Closes if no images are found (likely bad path)
else {
    Write-Host "There were no images in the selected folder. Now closing..."
    Timeout /T 10
    Exit
}
Usually this results in an error stating that it can't find the path of the original file name, as if it's still looking for the original non-renamed image. I tried adding some other things, but then it said I was passing null values. I'm just not sure what I'm doing wrong.
Note that if I take everything after the "Rename-Item" (starting with "$year =") and put it in a different ForEach method, it works. I guess I just don't know how to make Rename-Item return its result back to "$_" before everything else tries working on it. I tried messing around with "-PassThru" but I don't think I was doing it right.
Any suggestions?
As Olaf points out, situationally you may not need both a Rename-Item and a Move-Item call, because Move-Item can rename and move in a single operation.
That said, Move-Item does not support implicit creation of the target directory to move a file to, so in your case you do need separate calls.
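For completeness, here's a minimal sketch of that single-call form for the case where the target folder already exists, using the sample names from the question; the C:\out root is just a placeholder of mine:
# Hypothetical example: rename and move in one Move-Item call,
# assuming C:\out\20220316\03163001 has already been created.
Move-Item -Path '.\ABC-ALL-210316-0001-3001-0001-1-CheckInvoice-Front.TIF' `
          -Destination 'C:\out\20220316\03163001\00011CIF.TIF'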
You can use Rename-Item's -PassThru switch to make it output a System.IO.FileInfo instance (or, if a directory is being renamed, a System.IO.DirectoryInfo instance) representing the already renamed file; you can directly pass such an instance to Move-Item via the pipeline:
Get-ChildItem -File -Recurse -Path $sorDirectory -include '*.tif' |
ForEach-Object {
    # ...
    # Use -PassThru with Rename-Item to output a file-info object describing
    # the already renamed file.
    $renamedFile = $_ | Rename-Item -PassThru -NewName $_.Name.Replace("-", "").Replace("ABCALL", "20").Replace("CheckInvoiceFront", "CIF").Replace("CheckInvoiceBack", "CIB").Replace("CheckFront", "CF").Replace("CheckBack", "CB")
    # ...
    # Pass $renamedFile to Move-Item via the pipeline.
    $renamedFile | Move-Item -Destination "$fullPath\$image"
    # ...
}
As for your desire to:
make the Rename-Item return its results back to "$_"
While PowerShell doesn't prevent you from modifying the automatic $_ variable, it is better to treat automatic variables as read-only.
Therefore, a custom variable is used above to store the output from Rename-Item -PassThru.
You need -passthru and -destination:
rename-item file1 file2 -PassThru | move-item -Destination dir1

Powershell script been running for days when doing comparison

I've got a PowerShell query that works fine for a smaller amount of data, but I am trying to run my CSV against a folder which has multiple folders and files within. The folder size is nearly 800GB, with 180 folders within.
I want to see if each file exists in the folder. I can manually search the files within Windows and it does not take too long to return a result, but my CSV has 3000 rows and I do not wish to do this for 3000 rows. My script works fine for a smaller amount of data.
The script has been running for 6 days and it has not generated a file with data as of yet; it is 0KB and I am running it via Task Scheduler.
Script is below.
$myFolder = Get-ChildItem 'C:\Test\TestData' -Recurse -ErrorAction SilentlyContinue -Force
$myCSV = Import-Csv -Path 'C:\Test\differences.csv' | % {$_.'name' -replace "\\", ""}
$compare = Compare-Object -ReferenceObject $myCSV -DifferenceObject $myFolder
Write-Output "`n_____MISSING FILES_____`n"
$compare
Write-Output "`n_____MISSING FILES DETAILS____`n"
foreach($y in $compare){
    if($y.SideIndicator -eq "<="){
        write-output "$($y.InputObject) Is present in the CSV but not in Missing folder."
    }
}
I then created another script which runs the above script and pipes it to Out-File, and I run that with Task Scheduler.
C:\test\test.ps1 | Out-File 'C:\test\Results.csv'
is there a better way of doing this?
Thanks
is there a better way of doing this?
Yes!
1. Add each file name on disk to a HashSet[string] - the HashSet type is SUPER FAST at determining whether it contains a specific value or not, much faster than Compare-Object
2. Loop over your CSV records and check if each file name exists in the set from step 1
# 1. Build our file name index using a HashSet
$fileNames = [System.Collections.Generic.HashSet[string]]::new()
Get-ChildItem 'C:\Test\TestData' -Recurse -ErrorAction SilentlyContinue -Force |ForEach-Object {
    [void]$fileNames.Add($_.Name)
}
# 2. Check each CSV record against the file name index
Import-Csv -Path 'C:\Test\differences.csv' |ForEach-Object {
    $referenceName = $_.name -replace '\\'
    if(-not $fileNames.Contains($referenceName)){
        "${referenceName} is present in CSV but not on disk"
    }
}
Another option is to use the hash set from step 1 in a Where-Object filter:
$csvRecordsMissingFromDisk = Import-Csv -Path 'C:\Test\differences.csv' |Where-Object { -not $fileNames.Contains(($_.name -replace '\\')) }
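If you then want the result persisted, as with the Out-File wrapper above, something like this should do; the output path is just an example:
# Sketch: write the mismatching CSV records out to a file of your choosing
$csvRecordsMissingFromDisk | Export-Csv -Path 'C:\test\Results.csv' -NoTypeInformation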

How to select [n] Items from a CSV list to assign them to a variable and afterwards remove those items and save the file using PowerShell

I'm parsing a CSV file to get the names of folders which I need to copy to another location. Because there are hundreds of them, I need to select the first 10 or so and run the copy routine, but to avoid copying them again I'm removing them from the list and saving the file.
I'll run this on a daily scheduled task to avoid having to wait for the folders to finish copying. I'm having a problem using the 'Select' and 'Skip' options in the code (see below): if I remove those lines the folders are copied (I'm using empty folders to test), but if I leave them in, then nothing happens when I run this in PowerShell.
I looked around in other questions about similar issues but did not find anything that answers this particular issue selecting and skipping rows in the CSV.
$source_location = 'C:\Folders to Copy'
$folders_Needed = gci $source_location
Set-Location -Path $source_location
$Dest = 'C:\Transferred Folders'
$csv_name = 'C:\List of Folders.csv'
$csv_Import = Get-Content $csv_name
foreach($csv_n in $csv_Import | Select-Object -First 3){
    foreach ($folder_Tocopy in $folders_Needed){
        if("$folder_Tocopy" -contains "$csv_n"){
            Copy-Item -Path $folder_Tocopy -Destination $Dest -Recurse -Verbose
        }
    }
    $csv_Import | Select-Object -Skip 3 | Out-File -FilePath $csv_name
}
It should work with Skip/First as in your example, but I cannot really test it without your sample data. Also, it seems wrong that you write the same output to the csv file on every iteration of the loop. And I assume it's not really a csv file but just a plain text file, i.e. a list of folders? Just folder names or full paths? (I assume the former.)
Anyways, here is my suggested update to the script (see comments):
$source_location = 'C:\Folders to Copy'
$folders_Needed = Get-ChildItem $source_location
$Dest = 'C:\Transferred Folders'
$csv_name = 'C:\List of Folders.csv'
$csv_Import = @(Get-Content $csv_name)
# optional limit
# set this to $csv_Import.Count if you want to copy all folders
$limit = 10
# loop over the csv entries
for ($i = 0; $i -lt $csv_Import.Count -and $i -lt $limit; $i++) {
    # current line in the csv file
    $csv_n = $csv_Import[$i]
    # copy the folder(s) which name matches the csv entry
    $folders_Needed | where {$_.Name -eq $csv_n} | Copy-Item -Destination $Dest -Recurse -Verbose
    # update the csv file (skip all processed entries)
    $csv_Import | Select-Object -Skip ($i + 1) | Out-File -FilePath $csv_name
}
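If you'd rather stay closer to your original Select-Object -First/-Skip idea, an untested variant that copies one batch and rewrites the list only once per run might look like this (same variables as above):
$toCopy    = $csv_Import | Select-Object -First $limit
$remaining = $csv_Import | Select-Object -Skip $limit
foreach ($csv_n in $toCopy) {
    # copy the folder(s) whose name matches the current entry
    $folders_Needed | Where-Object { $_.Name -eq $csv_n } |
        Copy-Item -Destination $Dest -Recurse -Verbose
}
# rewrite the list once, with the processed entries removed
$remaining | Out-File -FilePath $csv_name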

Get-ChildItem in an enormous directory and RAM usage

I have created a PS script on a domain controller (SRV2012R2).
The script checks every shared folder (mapped drive) to see if there are any files present larger than 2GB:
foreach($dir in $Dirs)
{
    $files = Get-ChildItem $dir -Recurse
    foreach ($item in $files)
    {
        #Check if $item.Size is greater than 2GB
    }
}
I have the following problem:
The shares are pretty full, with over 800GB of (sub)folders and files, and most of them are just normal documents.
Whenever I run my script, I see that CPU and RAM consumption grows enormously while it runs (after 5 minutes into the Get-ChildItem line, the RAM has already reached >4GB).
My question is, why does Get-ChildItem need so many resources, and what alternative can I use? I haven't managed to run my script successfully.
I have seen that I can use | SELECT fullname, length after my Get-ChildItem clause as an improvement, but this hasn't helped me at all (the query still consumes enormous RAM).
Is there anything I can do so that I can loop through the directories without putting so much strain on the machine's resources?
Instead of saving every single file to a variable, use the pipeline to filter out the ones you don't need (ie. those smaller than 2GB):
foreach($dir in $Dirs)
{
    $bigFiles = Get-ChildItem $dir -Recurse |Where Length -gt 2GB
}
If you need to process or analyze these big files further, I'd suggest you extend that pipeline with ForEach-Object:
foreach($dir in $Dirs)
{
    Get-ChildItem $dir -Recurse |Where Length -gt 2GB |ForEach-Object {
        # do whatever else you must
        # $_ contains the current file
    }
}
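If memory is still tight on a very large tree, another option (not part of the original answer, just a sketch) is to stream paths with .NET's enumerator, which avoids building Get-ChildItem's richer wrapper objects; note that unlike -ErrorAction SilentlyContinue it will throw on folders you can't access:
foreach ($dir in $Dirs)
{
    foreach ($path in [System.IO.Directory]::EnumerateFiles($dir, '*', 'AllDirectories'))
    {
        # One FileInfo is created at a time, so memory use stays flat
        $file = [System.IO.FileInfo]$path
        if ($file.Length -gt 2GB)
        {
            # do whatever else you must with $file
        }
    }
}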
Try this:
# Create an array
$BigFilesArray = @()
# Populate it with your data
$BigFilesArray = @(foreach ($dir in $dirs) {Get-ChildItem $dir -Recurse | Where-Object Length -GT 2GB})
# Number of items in the array
$BigFilesArray.Count
# Looping to get the name and size of each item in the array
foreach ($bf in $BigFilesArray) {"$($bf.name) - $($bf.length)"}
Hope this helps!

How to use Powershell to list duplicate files in a folder structure that exist in one of the folders

I have a source tree, say c:\s, with many sub-folders. One of the sub-folders is called "c:\s\Includes" which can contain one or more .cs files recursively.
I want to make sure that none of the .cs files in the c:\s\Includes... path exist in any other folder under c:\s, recursively.
I wrote the following PowerShell script which works, but I'm not sure if there's an easier way to do it. I've had less than 24 hours experience with PowerShell so I have a feeling there's a better way.
I can assume at least PowerShell 3 being used.
I will accept any answer that improves my script, but I'll wait a few days before accepting the answer. When I say "improve", I mean it makes it shorter, more elegant or with better performance.
Any help from anyone would be greatly appreciated.
The current code:
$excludeFolder = "Includes"
$h = @{}
foreach ($i in ls $pwd.path *.cs -r -file | ? DirectoryName -notlike ("*\" + $excludeFolder + "\*")) { $h[$i.Name]=$i.DirectoryName }
ls ($pwd.path + "\" + $excludeFolder) *.cs -r -file | ? { $h.Contains($_.Name) } | Select @{Name="Duplicate";Expression={$h[$_.Name] + " has file with same name as " + $_.Fullname}}
1
I stared at this for a while, determined to write it without studying the existing answers, but I'd already glanced at the first sentence of Matt's answer mentioning Group-Object. After some different approaches, I get basically the same answer, except his is long-form and robust with regex character escaping and setup variables, mine is terse because you asked for shorter answers and because that's more fun.
$inc = '^c:\\s\\includes'
$cs = (gci -R 'c:\s' -File -I *.cs) | group name
$nopes = $cs | ?{ ($_.Group.FullName -notmatch $inc) -and ($_.Group.FullName -match $inc) }
$nopes | % {$_.Name; $_.Group.FullName}
Example output:
someFile.cs
c:\s\includes\wherever\someFile.cs
c:\s\lib\factories\alt\someFile.cs
c:\s\contrib\users\aa\testing\someFile.cs
The concept is:
Get all the .cs files in the whole source tree
Split them into groups of {filename: {files which share this filename}}
For each group, keep only those where the set of files contains any file with a path that matches the include folder and contains any file with a path that does not match the includes folder. This step covers
duplicates (if a file only exists once it cannot pass both tests)
duplicates across the {includes/not-includes} divide, instead of being duplicated within one branch
triplicates, n-tuplicates, and so on, as well.
Edit: I added the ^ to $inc to say it has to match at the start of the string, so the regex engine can fail faster for paths that don't match. Maybe this counts as premature optimization.
2
After that pretty dense attempt, the shape of a cleaner answer is much much easier:
Get all the files, split them into include, not-include arrays.
Nested for-loop testing every file against every other file.
Longer, but enormously quicker to write (it runs slower, though) and I imagine easier to read for someone who doesn't know what it does.
$sourceTree = 'c:\\s'
$allFiles = Get-ChildItem $sourceTree -Include '*.cs' -File -Recurse
$includeFiles = $allFiles | where FullName -imatch "$($sourceTree)\\includes"
$otherFiles = $allFiles | where FullName -inotmatch "$($sourceTree)\\includes"
foreach ($incFile in $includeFiles) {
foreach ($oFile in $otherFiles) {
if ($incFile.Name -ieq $oFile.Name) {
write "$($incFile.Name) clash"
write "* $($incFile.FullName)"
write "* $($oFile.FullName)"
write "`n"
}
}
}
3
Because code-golf is fun. If the hashtables are faster, what about this even less tested one-liner...
$h=@{};gci c:\s -R -file -Filt *.cs|%{$h[$_.Name]+=@($_.FullName)};$h.Values|?{$_.Count -gt 1 -and $_ -like 'c:\s\includes*'}
Edit: explanation of this version: It's doing much the same solution approach as version 1, but the grouping operation happens explicitly in the hashtable. The shape of the hashtable becomes:
$h = @{
    'fileA.cs' = @('c:\cs\wherever\fileA.cs', 'c:\cs\includes\fileA.cs')
    'file2.cs' = @('c:\cs\somewhere\file2.cs')
    'file3.cs' = @('c:\cs\includes\file3.cs', 'c:\cs\x\file3.cs', 'c:\cs\z\file3.cs')
}
It hits the disk once for all the .cs files and iterates the whole list to build the hashtable. I don't think it can do less work than this for that bit.
It uses +=, so it can add files to the existing array for that filename; otherwise it would overwrite each of the hashtable lists and they would be one item long, holding only the most recently seen file.
It uses @() because, when it hits a filename for the first time, $h[$_.Name] won't return anything, and the script needs to put an array into the hashtable at that point, not a string. If it was +=$_.FullName then the first file would go into the hashtable as a string, and the += next time would do string concatenation, which is no use to me. Wrapping every file in a one-item array forces the first file for each name to start an array. The least-code way to get this result is with +=@(..), but that churn of creating throwaway arrays for every single file is needless work. Maybe changing it to longer code which does less array creation would help?
Changing the section
%{$h[$_.Name]+=@($_.FullName)}
to something like
%{if (!$h.ContainsKey($_.Name)){$h[$_.Name]=@()};$h[$_.Name]+=$_.FullName}
(I'm guessing, I don't have much intuition for what's most likely to be slow PowerShell code, and haven't tested).
After that, using $h.Values isn't going over every file a second time; it's going over every array in the hashtable - one per unique filename. That has to happen to check the array size and prune the non-duplicates, but the -and operation short-circuits: when the Count -gt 1 check fails, the bit on the right checking the path name doesn't run.
If the array has two or more files in it, the -and $_ -like ... executes and pattern matches to see if at least one of the duplicates is in the includes path. (Bug: if all the duplicates are in c:\cs\includes and none anywhere else, it will still show them).
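One way to close that gap (my sketch, not part of the original one-liner) is to require both a path inside and a path outside the includes folder, the same trick version 1 uses:
$h.Values|?{$_.Count -gt 1 -and ($_ -like 'c:\s\includes*') -and ($_ -notlike 'c:\s\includes*')}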
--
4
This is edited version 3 with the hashtable initialization tweak, and now it keeps track of seen files in $s, and then only considers those it's seen more than once.
$h=@{};$s=@{};gci 'c:\s' -R -file -Filt *.cs|%{if($h.ContainsKey($_.Name)){$s[$_.Name]=1}else{$h[$_.Name]=@()};$h[$_.Name]+=$_.FullName};$s.Keys|%{if ($h[$_] -like 'c:\s\includes*'){$h[$_]}}
Assuming it works, that's what it does, anyway.
--
Edit branch of topic; I keep thinking there ought to be a way to do this with the things in the System.Data namespace. Anyone know if you can connect System.Data.DataTable().ReadXML() to gci | ConvertTo-Xml without reams of boilerplate?
I'd do more or less the same, except I'd build the hashtable from the contents of the includes folder and then run over everything else to check for duplicates:
$root = 'C:\s'
$includes = "$root\includes"
$includeList = @{}
Get-ChildItem -Path $includes -Filter '*.cs' -Recurse -File |
% { $includeList[$_.Name] = $_.DirectoryName }
Get-ChildItem -Path $root -Filter '*.cs' -Recurse -File |
? { $_.FullName -notlike "$includes\*" -and $includeList.Contains($_.Name) } |
% { "Duplicate of '{0}': {1}" -f $includeList[$_.Name], $_.FullName }
I'm not as impressed with this as I would like but I thought that Group-Object might have a place in this question so I present the following:
$base = 'C:\s'
$unique = "$base\includes"
$extension = "*.cs"
Get-ChildItem -Path $base -Filter $extension -Recurse |
    Group-Object Name |
    Where-Object{($_.Count -gt 1) -and (($_.Group).FullName -match [regex]::Escape($unique))} |
    ForEach-Object {
        $filename = $_.Name
        ($_.Group).FullName -notmatch [regex]::Escape($unique) | ForEach-Object{
            "'{0}' has file with same name as '{1}'" -f (Split-Path $_),$filename
        }
    }
Collect all the files with the extension filter $extension. Group the files based on their names. Then of those groups find every group where there are more than one of that particular file and one of the group members is at least in the directory $unique. Take those groups and print out all the files that are not from the unique directory.
From Comment
For what its worth this is what I used for testing to create a bunch of files. (I know the folder 9 is empty)
$base = "E:\Temp\dev\cs"
Remove-Item "$base\*" -Recurse -Force
0..9 | %{[void](New-Item -ItemType directory "$base\$_")}
1..1000 | %{
    $number = Get-Random -Minimum 1 -Maximum 100
    $folder = Get-Random -Minimum 0 -Maximum 9
    [void](New-Item -Path $base\$folder -ItemType File -Name "$number.txt" -Force)
}
After looking at all the others, I thought I would try a different approach.
$includes = "C:\s\includes"
$root = "C:\s"
# First script
Measure-Command {
    [string[]]$filter = ls $includes -Filter *.cs -Recurse | % name
    ls $root -include $filter -Recurse -Filter *.cs |
        Where-object{$_.FullName -notlike "$includes*"}
}
# Second Script
Measure-Command {
    $filter2 = ls $includes -Filter *.cs -Recurse
    ls $root -Recurse -Filter *.cs |
        Where-object{$filter2.name -eq $_.name -and $_.FullName -notlike "$includes*"}
}
In my first script, I get all the include files into a string array. Then I use that string array as the -Include parameter on Get-ChildItem. In the end, I filter out the include folder from the results.
In my second script, I enumerate everything and then filter after the pipe.
Remove the measure-command to see the results. I was using that to check the speed. With my dataset, the first one was 40% faster.
$FilesToFind = Get-ChildItem -Recurse 'c:\s\includes' -File -Include *.cs | Select-Object -ExpandProperty Name
Get-ChildItem -Recurse C:\S -File -Include *.cs | ? { $_.Name -in $FilesToFind -and $_.Directory -notmatch '^c:\\s\\includes' } | Select Name, Directory
Create a list of file names to look for.
Find all files that are in the list but not part of the directory the list was generated from
Print their name and directory