I have a folder full of 500,000+ files. I'm trying to iterate through this folder and run some logic to determine whether we can delete unneeded files. The problem is that this process needs to run semi-regularly, and the new files that need to be deleted currently seem to be at the end of the list.
I put together the following code to sort through it all:
gci $RPT | %{
$flag = 0;
$number = [int]($_.Name | select-string -pattern "\d{12}" -Allmatches).Matches.Value
if ($submidlist -match "^$number$"){
if ($_ -notmatch "acct\.csv|jpd\.csv|jss\.pdf|jman\.pdf|3600\.pdf|cont\.pdf|msl\.txt|pres\.pdf|tray\.pdf|qual\.pdf|zipl\.pdf"){
echo "DELETE SUBMID $_"
remove-item $RPT\$_
$count++
$totalcount++
$flag = 1;
}
}
if ($jobidlist -match "^$number$"){
if ($_ -match "acct\.csv|jpd\.csv|jss\.pdf|jman\.pdf|3600\.pdf|cont\.pdf|msl\.txt|pres\.pdf|tray\.pdf|qual\.pdf|zipl\.pdf"){
echo "DELETE JOBID $_"
remove-item $RPT\$_
$count++
$totalcount++
$flag = 1;
}
}
}
Currently, running the above script takes over 24 hours and it still doesn't make it to the end of the list. Is there a way to optimize this or reverse the order that get-childitem iterates through this folder?
function Delete-Items($List, [string]$ListName){
$DoNotDelete = @("acct.csv","jpd.csv","jss.pdf","jman.pdf","3600.pdf","cont.pdf","msl.txt","pres.pdf","tray.pdf","qual.pdf","zipl.pdf")
$List = $List | %{
"*$_*"
}
Get-ChildItem C:\TEST\56381643\ -Recurse -Include $List -Directory | %{
Get-ChildItem $_.FullName -Exclude $DoNotDelete -Recurse | %{
echo "DELETE $ListName $($_.name | select-string -pattern "\d{12}")"
Remove-Item -Path $_.FullName -WhatIf
}
}
}
#Example Usage
$JobList = @(
098765432109
123456789012
)
$SubmitList = @(
234567890123
)
Delete-Items -List $JobList -ListName JOBID
Delete-Items -List $SubmitList -ListName SUBMID
Let's go over a basic rundown of what's happening in the function.
We have an array of file names not to delete.
We turn the $List numbers into wildcards by adding a * before and after each item in the array. We then search only for directories whose names contain those numbers.
We then use another Get-ChildItem to get the files in each directory, excluding the ones listed in $DoNotDelete.
When you are ready to actually delete the files, remove the -WhatIf switch from Remove-Item.
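If the files sit directly in one flat folder, as in the question, another option is to replace the -match scan over each ID list with a hash-set lookup, so each file costs one lookup instead of a pass over the whole list. A minimal sketch, assuming the file names contain a 12-digit ID; Get-DeleteCandidates and its parameter names are hypothetical and not part of the answer above:

```powershell
# Hypothetical helper: returns the names that the question's rules would delete.
function Get-DeleteCandidates {
    param(
        [string[]]$Names,      # file names to classify
        [string[]]$SubmitIds,  # 12-digit submit IDs, as strings
        [string[]]$JobIds      # 12-digit job IDs, as strings
    )
    # HashSet gives constant-time membership tests
    $submit = New-Object 'System.Collections.Generic.HashSet[string]'
    $job    = New-Object 'System.Collections.Generic.HashSet[string]'
    foreach ($id in $SubmitIds) { [void]$submit.Add($id) }
    foreach ($id in $JobIds)    { [void]$job.Add($id) }

    # Same keep-list as the question, written as one regex
    $keep = 'acct\.csv|jpd\.csv|jss\.pdf|jman\.pdf|3600\.pdf|cont\.pdf|msl\.txt|pres\.pdf|tray\.pdf|qual\.pdf|zipl\.pdf'

    foreach ($name in $Names) {
        # Keep the ID as a string: a 12-digit number does not fit in [int]
        if ($name -notmatch '\d{12}') { continue }
        $id = $Matches[0]
        $isListed = $name -match $keep
        if ($submit.Contains($id) -and -not $isListed) { $name }
        elseif ($job.Contains($id) -and $isListed) { $name }
    }
}
```

The names could be streamed from disk with [System.IO.Directory]::EnumerateFiles($RPT), which yields full paths lazily instead of building the whole list first; whether that helps enough on a folder this size would need measuring.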
In a directory, there are files with the following filenames:
ExampleFile.mp3
ExampleFile_pn.mp3
ExampleFile2.mp3
ExampleFile2_pn.mp3
ExampleFile3.mp3
I want to iterate through the directory, and IF there is a filename that contains the string '_pn.mp3', I want to test if there is a similarly named file without the '_pn.mp3' in the same directory. If that file exists, I want to remove it.
In the above example, I'd want to remove:
ExampleFile.mp3
ExampleFile2.mp3
and I'd want to keep ExampleFile3.mp3
Here's what I have so far:
$pattern = "_pn.mp3"
$files = Get-ChildItem -Path '$path' | Where-Object {! $_.PSIsContainer}
Foreach ($file in $files) {
If($file.Name -match $pattern){
# filename with _pn.mp3 exists
Write-Host $file.Name
# search in the current directory for the same filename without _pn
<# If(Test-Path $currentdir $filename without _pn.mp3) {
Remove-Item -Force}
#>
}
}
You could use Group-Object to group all files by their BaseName (with the pattern removed), and then loop over the groups that contain more than one file. The result of grouping the files and filtering by count would look like this:
$files | Group-Object { $_.BaseName.Replace($pattern,'') } |
Where-Object Count -GT 1
Count Name Group
----- ---- -----
2 ExampleFile {ExampleFile.mp3, ExampleFile_pn.mp3}
2 ExampleFile2 {ExampleFile2.mp3, ExampleFile2_pn.mp3}
Then if we loop over these groups we can search for the files that do not end with the $pattern:
@'
ExampleFile.mp3
ExampleFile_pn.mp3
ExampleFile2.mp3
ExampleFile2_pn.mp3
ExampleFile3.mp3
'@ -split '\r?\n' -as [System.IO.FileInfo[]] | Set-Variable files
$pattern = "_pn"
$files | Group-Object { $_.BaseName.Replace($pattern,'') } |
Where-Object Count -GT 1 | ForEach-Object {
$_.Group.Where({-not $_.BaseName.Endswith($pattern)})
}
This is how your code would look. Remove the -WhatIf switch once you are satisfied the code does what you want.
$pattern = "_pn"
$files = Get-ChildItem -Path $path -Filter *.mp3 -File
$files | Group-Object { $_.BaseName.Replace($pattern,'') } |
Where-Object Count -GT 1 | ForEach-Object {
# Keep the "_pn" files; select the ones without the suffix for removal
$toRemove = $_.Group.Where({-not $_.BaseName.EndsWith($pattern)})
$toRemove | Remove-Item -WhatIf
}
I think you can get by here by adding file names into a hash map as you go. If you encounter a file with the ending you are interested in, check if a similar file name was added. If so, remove both the file and the similar match.
$ending = "_pn.mp3"
$files = Get-ChildItem -Path $path -File | Where-Object { ! $_.PSIsContainer }
$hash = @{}
Foreach ($file in $files) {
# Check if file has an ending we are interested in
If ($file.Name.EndsWith($ending)) {
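# Strip the ending by length; in Windows PowerShell, Split($ending) splits on each character of $ending
$similar = $file.Name.Substring(0, $file.Name.Length - $ending.Length) + ".mp3"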
# Check if we have seen the similar file in the hashmap
If ($hash.Contains($similar)) {
Write-Host $file.Name
Write-Host $similar
Remove-Item -Force $file
Remove-Item -Force $hash[$similar]
# Remove similar from hashmap as it is removed and no longer of interest
$hash.Remove($similar)
}
}
else {
# Add entry for file name and reference to the file
$hash.Add($file.Name, $file)
}
}
Just get a list of the files with the _pn then process against the rest.
$pattern = "*_pn.mp3"
$files = Get-ChildItem -Path "$path" -File -filter "$pattern"
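Foreach ($file in $files) {
# Full path of the matching file without the "_pn" suffix
$TestFN = Join-Path -Path $path -ChildPath ($file.Name -replace "_pn\.mp3$", ".mp3")
If (Test-Path -Path $TestFN) {
# Remove the file without "_pn", as the question asks
Remove-Item -Path $TestFN -Force
}
} #End Foreach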
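The matching rule itself can be tried on plain names before touching any files. A small sketch, assuming the "_pn.mp3" naming from the question; Get-NamesWithPnTwin is a hypothetical helper name:

```powershell
# Hypothetical helper: given file names, return the plain names that have a "_pn" twin.
function Get-NamesWithPnTwin {
    param([string[]]$Names)
    # Case-insensitive set, since Windows file names are case-insensitive
    $set = New-Object 'System.Collections.Generic.HashSet[string]' ([System.StringComparer]::OrdinalIgnoreCase)
    foreach ($n in $Names) { [void]$set.Add($n) }
    foreach ($n in $Names) {
        if ($n -like '*_pn.mp3') {
            # ExampleFile_pn.mp3 -> ExampleFile.mp3
            $plain = $n -replace '_pn\.mp3$', '.mp3'
            if ($set.Contains($plain)) { $plain }
        }
    }
}

Get-NamesWithPnTwin -Names 'ExampleFile.mp3','ExampleFile_pn.mp3','ExampleFile2.mp3','ExampleFile2_pn.mp3','ExampleFile3.mp3'
```

With the five names from the question this returns ExampleFile.mp3 and ExampleFile2.mp3, and leaves ExampleFile3.mp3 alone.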
I am writing a script that computes the hashes of all the files under a path, recursively. That part works.
My problem comes after I have identified which hashes are the same: I want to save those files into an array so that later I can delete the files that share a hash (if I want to), or just print the duplicates. I have spent all afternoon and evening trying to figure out how to do it.
My code at the moment:
Write-Host "Write a path: "
$UserInput=Read-Host
Get-ChildItem -Path $UserInput -Recurse
#Get-FileHash cmdlet to get the hashes
$files = Get-ChildItem -Path $UserInput -Recurse | where { !$_.PSIsContainer }
$files | % {(Get-FileHash -Path $_.FullName -Algorithm MD5)}
#Creating an array for all the values and an array for the duplicates
$originals=@()
$copies=@()
#grouping the hashes that are duplicated cmdlet Group-Object:
$Duplicates = Get-ChildItem -Path $UserInput -Recurse -File |Group {($_|Get-FileHash).Hash} |Where Count -gt 1
foreach($FileGroup in $Duplicates)
{
Write-Host "These files share hash : $($FileGroup.Name)"
$FileGroup.Group.FullName |Write-Host
$copies+=$Duplicates
}
So the last part, "$copies+=$Duplicates", does not work properly.
In the beginning I was thinking of saving the first file in the "originals" array. If the second one has the same hash, save that second file in the "copies" array. But I am not sure whether I can do that in the first part of the script, when getting the hashes.
After that, the second array would have the duplicates, so it would be easy to delete them from the computer.
I think you should filter the items. I did it and I have a list with only one item of duplicate files and a list with all duplicated files.
You can use the SHA1 algorithm instead of MD5.
Either works for spotting duplicates; which one hashes faster depends on the implementation, so measure on your own files if speed matters.
$fileHashes = Get-ChildItem -Path $myFilePath -Recurse -File | Get-Filehash -Algorithm SHA1
$duplicates = $fileHashes | Group hash | ? {$_.count -gt 1} | % {$_.Group}
$uniqueItems = @{}
$doubledItems = @()
foreach($item in $duplicates) {
if(-not $uniqueItems.ContainsKey($item.Hash)){
$uniqueItems.Add($item.Hash,$item)
}else{
$doubledItems += $item
}
}
# all duplicates files
$doubledItems
# Remove the duplicate files
# $doubledItems | % { Remove-Item $_.Path -Verbose }
# one of the duplicate files
$uniqueItems
Set the search root folder before running the code above:
$myFilePath = ''
You should only need to use Get-ChildItem once, once you have all the files you can create a hash for them and then group the hashes to find duplicates. See my example code below:
Write-Host "Write a path: "
$UserInput=Read-Host
#Get-FileHash cmdlet to get the hashes
$files = Get-ChildItem -Path $UserInput -Recurse | Where-Object -FilterScript { !$_.PSIsContainer }
$hashes = $files | ForEach-Object -Process {Get-FileHash -Path $_.FullName -Algorithm MD5}
$duplicates = $hashes | Group-Object -Property Hash | Where-Object -FilterScript {$_.Count -gt 1}
foreach($duplicate in $duplicates)
{
Write-Host -Object "These files share hash : $($duplicate.Group.Path -join ', ')"
# delete first duplicate
# Remove-Item -Path $duplicate.Group[0].Path -Force -WhatIf
# delete second duplicate
# Remove-Item -Path $duplicate.Group[1].Path -Force -WhatIf
# delete all duplicates except the first
# foreach($duplicatePath in ($duplicate.Group.Path | Select-Object -Skip 1))
# {
# Remove-Item -Path $duplicatePath -Force -WhatIf
# }
}
Uncomment the code at the end to delete duplicates based on your preferences and when you're ready to delete files make sure you also remove the -WhatIf parameter.
This is the output I receive from the above command if I uncomment the "delete all duplicates except the first" block:
Write a path:
H:\
These files share hash : H:\Rename template 2.csv, H:\Rename template.csv
What if: Performing the operation "Remove File" on target "H:\Rename template.csv".
In the most basic sense, I have a SQL query which returns an array of IDs, which I've stored in a variable $ID. I then want to run Get-ChildItem on a specific folder for any filenames that contain any of the IDs in that variable ($ID). There are three possible filenames that could exist:
$ID.xml
$ID_input.xml
$ID_output.xml
Once I have the results of get-childitem, I want to output this as a text file and delete the files from the folder. The part I'm having trouble with is filtering the results of get-childitem to define the filenames I'm looking for, so that only files that contain the IDs from the SQL output are displayed in my get-childitem results.
I found another way of doing this, which works fine, by using foreach ($i in $id), then building the desired filenames from that and running Remove-Item on them:
# Build list of XML files
$XMLFile = foreach ($I in $ID)
{
"$XMLPath\$I.xml","$XMLPath\$I`_output.xml","$XMLPath\$I`_input.xml"
}
# Delete XML files
$XMLFile | Remove-Item -Force
However, this produces a lot of errors in the shell, as it tries to delete files that don't exist, but whose IDs do exist in the database. I also can't figure out how to produce a text output of the files that were actually deleted, doing it this way, so I'd like to get back to the get-childitem approach, if possible.
Any ideas would be greatly appreciated. If you require more info, just ask.
You can find all *.xml files with Get-ChildItem to minimize the number of files to test and then use regex to match the filenames. It's faster than a loop/multiple test, but harder to read if you're not familiar with regex.
$id = 123,111
#Create regex-pattern (search-pattern)
$regex = "^($(($id | ForEach-Object { [regex]::Escape($_) }) -join '|'))(?:_input|_output)?$"
$filesToDelete = Get-ChildItem -Path "c:\users\frode\Desktop\test" -Filter "*.xml" | Where-Object { $_.BaseName -match $regex }
#Save list of files
$filesToDelete | Select-Object -ExpandProperty FullName | Out-File "deletedfiles.txt" -Append
#Remove files (remove -WhatIf when ready)
$filesToDelete | Remove-Item -Force -WhatIf
Regex demo: https://regex101.com/r/dS2dJ5/2
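To see what the generated pattern accepts, it can be run against a few sample base names. A small sketch; the sample names are made up for illustration:

```powershell
$id = 123, 111
# Same pattern construction as above; it expands to ^(123|111)(?:_input|_output)?$
$regex = "^($(($id | ForEach-Object { [regex]::Escape($_) }) -join '|'))(?:_input|_output)?$"

# Hypothetical base names (file names without the .xml extension)
$samples = '123', '123_input', '111_output', '1234', '123_other', '999'
$matched = $samples | Where-Object { $_ -match $regex }
$matched
```

Only 123, 123_input and 111_output match. The ^ and $ anchors are what keep 1234 and 123_other out, which a plain wildcard such as *123* would let through.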
Try this:
clear
$ID = "a", "b", "c"
$filesToDelete = New-Object System.Collections.ArrayList
$files = Get-ChildItem e:\
foreach ($I in $ID)
{
# Use ${I} so PowerShell does not read "$I_input" as a variable named I_input
$names = "$I.xml", "${I}_input.xml", "${I}_output.xml"
$files | Where-Object { $names -contains $_.Name } | ForEach-Object { [void]$filesToDelete.Add($_.FullName) }
}
$filesToDelete | Select-Object -Unique | ForEach-Object { Remove-Item $_ -Force }
I am using the below script to search for credit card numbers inside a folder that contains many subfolders:
Get-ChildItem -rec | ?{ findstr.exe /mprc:. $_.FullName } |
select-string "[456][0-9]{15}","[456][0-9]{3}[-| ][0-9]{4} [-| ][0-9]{4}[-| ][0-9]{4}"
However, this will return all instances found in every folder/subfolder.
How can I amend the script to skip the current folder on the first instance found? That is, if it finds a credit card number, it should stop processing the current folder and move on to the next folder.
I appreciate your answers and help.
Thanks in advance,
You could use this recursive function:
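function cards ($dir) {
# Recurse into subdirectories first
Get-ChildItem -Directory $dir | % { cards($_.FullName) }
# Use a foreach statement (not ForEach-Object) so that return exits the function
foreach ($file in Get-ChildItem -File $dir\*) {
# Name the parameters: positionally, Select-String expects the pattern first
if ( Select-String -Path $file.FullName -Pattern "[456][0-9]{15}","[456][0-9]{3}[-| ][0-9]{4} [-| ][0-9]{4}[-| ][0-9]{4}" -Quiet ) {
write-host "card found in $dir"
return
}
}
}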
cards "C:\path\to\base\dir"
It recurses into the subdirectories of the top-level directory you specify. Once it reaches a directory with no subdirectories, or has been through all the subdirectories of the current directory, it starts looking through the files for the matching regex, and bails out of the function when the first match is found.
So really what you want is the first file in every folder that has a credit card number in the contents.
Break it into two parts. Get a list of all your folders, recursively. Then, for each folder, get the list of files, non-recursively. Search each file until you find one that matches.
I don't see any easy way to do this with pipes alone. That means more traditional programming techniques.
This requires PowerShell 3.0. I've eliminated ?{ findstr.exe /mprc:. $_.FullName } because all I can see that it does is eliminate folders (and zero length files) and this already handles that.
Get-ChildItem -Directory -Recurse | ForEach-Object {
$Found = $false;
$i = 0;
$Files = $_ | Get-ChildItem -File | Sort-Object -Property Name;
for ($i = 0; ($Files[$i] -ne $null) -and ($Found -eq $false); $i++) {
$SearchResult = $Files[$i] | Select-String "[456][0-9]{15}","[456][0-9]{3}[-| ][0-9]{4} [-| ][0-9]{4}[-| ][0-9]{4}";
if ($SearchResult) {
$Found = $true;
Write-Output $SearchResult;
}
}
}
Didn't have the time to test it fully, but I thought about something like this:
$Location = 'H:\'
$Dirs = Get-ChildItem $Location -Directory -Recurse
$Regex1 = "[456][0-9]{3}[-| ][0-9]{4} [-| ][0-9]{4}[-| ][0-9]{4}"
$Regex2 = "[456][0-9]{15}"
Foreach ($d in $Dirs) {
$Files = Get-ChildItem $d.FullName -File
foreach ($f in $Files) {
if (($f.Name -match $Regex1) -or ($f.Name -match $Regex2)) {
Write-Host 'Match found'
Return
}
}
}
Here is another one, why not, the more the merrier.
I'm assuming that your Regex is correct.
Using break in the second loop will skip looking for a credit card in the remaining files if one is found and continue to the next folder.
$path = '<your path here>'
$folders = Get-ChildItem $path -Directory -rec
foreach ($folder in $folders)
{
$items = Get-ChildItem $folder.fullname -File
foreach ($i in $items)
{
if (($found = $i.FullName| select-string "[456][0-9]{15}","[456][0-9]{3}[-| ][0-9]{4} [-| ][0-9]{4}[-| ][0-9]{4}") -ne $null)
{
break
}
}
}
I think the intention was to look inside each file for the PII data, right?
If so, you need to load the file and search each line. The code you posted will only run a regex on the name of the file.
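A minimal sketch of searching file contents rather than names, stopping at the first hit in each folder. Find-FirstMatchPerFolder is a hypothetical name, and the pattern is passed in by the caller rather than assumed correct:

```powershell
function Find-FirstMatchPerFolder {
    param(
        [string]$Root,       # top-level folder to scan
        [string[]]$Pattern   # regex patterns to look for inside files
    )
    # Include the root itself plus every subfolder
    $folders = @(Get-Item -LiteralPath $Root) + @(Get-ChildItem -LiteralPath $Root -Directory -Recurse)
    foreach ($folder in $folders) {
        foreach ($file in Get-ChildItem -LiteralPath $folder.FullName -File) {
            # Passing a path makes Select-String read the file; -Quiet returns $true on the first match
            if (Select-String -LiteralPath $file.FullName -Pattern $Pattern -Quiet) {
                $file.FullName   # report the file, then move on to the next folder
                break
            }
        }
    }
}
```

It could be called as Find-FirstMatchPerFolder -Root 'C:\path\to\base\dir' -Pattern '[456][0-9]{15}'. Piping a path string into Select-String would search the text of the path itself, which is why the file is passed by path parameter here.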
I am trying to count the files in all subfolders in a directory and display them in a list.
For instance the following dirtree:
TEST
/VOL01
file.txt
file.pic
/VOL02
/VOL0201
file.nu
/VOL020101
file.jpg
file.erp
file.gif
/VOL03
/VOL0301
file.org
Should give as output:
PS> DirX C:\TEST
Directory Count
----------------------------
VOL01 2
VOL02 0
VOL02/VOL0201 1
VOL02/VOL0201/VOL020101 3
VOL03 0
VOL03/VOL0301 1
I started with the following:
Function DirX($directory)
{
foreach ($file in Get-ChildItem $directory -Recurse)
{
Write-Host $file
}
}
Now I have a question: why is my Function not recursing?
Something like this should work:
dir -recurse | ?{ $_.PSIsContainer } | %{ Write-Host $_.FullName (dir $_.FullName | Measure-Object).Count }
dir -recurse lists all files under current directory and pipes (|) the result to
?{ $_.PSIsContainer } which filters directories only then pipes again the resulting list to
%{ Write-Host $_.FullName (dir $_.FullName | Measure-Object).Count } which is a foreach loop that, for each member of the list ($_) displays the full name and the result of the following expression
(dir $_.FullName | Measure-Object).Count which provides a list of files under the $_.FullName path and counts members through Measure-Object
?{ ... } is an alias for Where-Object
%{ ... } is an alias for ForEach-Object
Similar to David's solution, this will work in PowerShell v3.0 and does not use aliases, in case someone is not familiar with them.
Get-ChildItem -Directory -Recurse | ForEach-Object { Write-Host $_.FullName $(Get-ChildItem $_.FullName -File | Measure-Object).Count}
Answer Supplement
Based on a comment about keeping your function and loop structure, I provide the following. Note: I do not recommend this solution, as it is ugly and the built-in cmdlets handle this very well. However, I like to help, so here is an update of your script.
Function DirX($directory)
{
$output = @{}
foreach ($singleDirectory in (Get-ChildItem $directory -Recurse -Directory))
{
$count = 0
foreach($singleFile in Get-ChildItem $singleDirectory.FullName)
{
$count++
}
$output.Add($singleDirectory.FullName,$count)
}
$output | Out-String
}
For each $singleDirectory count all files using $count ( which gets reset before the next sub loop ) and output each finding to a hash table. At the end output the hashtable as a string. In your question you looked like you wanted an object output instead of straight text.
Well, the way you are doing it the entire Get-ChildItem cmdlet needs to complete before the foreach loop can begin iterating. Are you sure you're waiting long enough? If you run that against very large directories (like C:) it is going to take a pretty long time.
Edit: saw you asked earlier for a way to make your function do what you are asking, here you go.
Function DirX($directory)
{
foreach ($file in Get-ChildItem $directory -Recurse -Directory )
{
[pscustomobject] @{
'Directory' = $File.FullName
'Count' = (GCI $File.FullName -Recurse).Count
}
}
}
DirX D:\
The foreach loop only gets directories, since that is all we care about; inside the loop a custom object is created on each iteration with the full path of the folder and the count of the items inside it.
Also, please note that this only works in PowerShell 3.0 or newer, since the -Directory parameter did not exist in 2.0.
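For output closer to the table in the question (paths relative to the root, counting only the files directly inside each folder), a variant could look like this. It is a sketch under those assumptions; Get-FileCountPerFolder is a hypothetical name:

```powershell
function Get-FileCountPerFolder {
    param([string]$Root)
    $rootPath = (Resolve-Path -LiteralPath $Root).Path
    # Length of the root prefix plus one separator, used to cut paths down to relative form
    $prefixLength = $rootPath.TrimEnd('\', '/').Length + 1
    Get-ChildItem -LiteralPath $rootPath -Directory -Recurse | ForEach-Object {
        [pscustomobject]@{
            # Path relative to the root, e.g. VOL02\VOL0201
            Directory = $_.FullName.Substring($prefixLength)
            # Files directly inside this folder only; subfolders get their own row
            Count     = @(Get-ChildItem -LiteralPath $_.FullName -File).Count
        }
    }
}
```

Because it emits objects rather than text, the result can be piped to Sort-Object, Format-Table or Export-Csv as needed.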
Get-ChildItem $rootFolder `
-Recurse -Directory |
Select-Object `
FullName, `
@{Name="FileCount";Expression={(Get-ChildItem $_.FullName -File |
Measure-Object).Count }}
My version - slightly cleaner and dumps content to a file
Original - Recursively count files in subfolders
Second Component - Count items in a folder with PowerShell
$FOLDER_ROOT = "F:\"
$OUTPUT_LOCATION = "F:\DLS\OUT.txt"
Function DirX($directory)
{
Remove-Item $OUTPUT_LOCATION
foreach ($singleDirectory in (Get-ChildItem $directory -Recurse -Directory))
{
$count = Get-ChildItem $singleDirectory.FullName -File | Measure-Object | %{$_.Count}
$summary = $singleDirectory.FullName+" "+$count+" "+$singleDirectory.LastAccessTime
Add-Content $OUTPUT_LOCATION $summary
}
}
DirX($FOLDER_ROOT)
I modified David Brabant's solution just a bit so I could evaluate the result:
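# Sum the per-folder counts so $FileCounter is a single number rather than a list
$FileCounter=(gci "$BaseDir" -recurse | ?{ $_.PSIsContainer } | %{ (gci "$($_.FullName)" | Measure-Object).Count } | Measure-Object -Sum).Sum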
Write-Host "File Count=$FileCounter"
If($FileCounter -gt 0) {
... take some action...
}