How to delete duplicate files with similar names - PowerShell

I'm fairly new to PowerShell and I've not been able to find a definitive answer for my problem. I have a bunch of Excel files in different folders which are duplicates but have varying file names due to them being updated.
e.g.
015 Approved warranty - Turkey - Case-2019 08-1437015 (issue 3)
015 Approved warranty - Turkey - Case-2019 08-1437015 (final issue)
015 Approved warranty - Turkey - Case-2019 08-1437015
015 Approved warranty - Turkey - Case-2019 08-1437015 amended
I've tried different things, and I now know the easiest way is to filter the files, but I don't know the syntax. The anchor point will be the case number just after the date. I want to compare the case numbers against each other, keep only the newest file for each (by date modified), and delete the rest. Any guidance is appreciated.
#take files from folder
$dupesource = 'C:\Users\W_Brooker\Documents\Destination\2019\08'
#filter files by case number (7 digit number after date)
$files = Get-ChildItem $dupesource -Filter "08-aaaaaaa"
#If case number is the same keep newest file delete rest
foreach ($file in $files){
$file | Delete-Item - sort -property Datemodified |select -Last 1
}

A PowerShell-idiomatic solution is to combine multiple cmdlets in a single pipeline, in which Group-Object provides the core functionality of grouping the duplicate files by the shared case number in the file name:
# Define the regex that matches a case number:
# A 7-digit number embedded in filenames that duplicates share.
$regex = '\b\d{7}\b'
# Enumerate all files and select only those whose name contains a case number.
Get-ChildItem -File $dupesource | Where-Object { $_.BaseName -match $regex } |
# Group the resulting files by shared embedded case number.
Group-Object -Property { [regex]::Match($_.BaseName, $regex).Value } |
# Process each group:
ForEach-Object {
# In each group, sort files by most recently updated first.
$_.Group | Sort-Object -Descending LastWriteTimeUtc |
# Skip the most recent file and delete the older ones.
Select-Object -Skip 1 | Remove-Item -WhatIf
}
The -WhatIf common parameter previews the operation. Remove it once you're sure it will do what you want.
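If you first want to see what would be kept and what would be deleted per case number, a small preview sketch (reusing the $dupesource and $regex variables from above; the Keep/Delete property names are just for illustration) prints one summary object per group without touching any files:
# Preview the keep/delete decision per case number
Get-ChildItem -File $dupesource | Where-Object { $_.BaseName -match $regex } |
Group-Object -Property { [regex]::Match($_.BaseName, $regex).Value } |
ForEach-Object {
$sorted = $_.Group | Sort-Object -Descending LastWriteTimeUtc
[pscustomobject]@{
CaseNumber = $_.Name
Keep = $sorted[0].Name
Delete = ($sorted | Select-Object -Skip 1).Name -join ', '
}
}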

This should do the trick:
$files = Get-ChildItem 'C:\Users\W_Brooker\Documents\Destination\2019\08' -Recurse
# create datatable to store file Information in it
$dt = New-Object system.Data.DataTable
[void]$dt.Columns.Add('FileName',[string]::Empty.GetType() )
[void]$dt.Columns.Add('CaseNumber',[string]::Empty.GetType() )
[void]$dt.Columns.Add('FileTimeStamp',[DateTime]::MinValue.GetType() )
[void]$dt.Columns.Add('DeleteFlag',[byte]::MinValue.GetType() )
# Step 1: Make inventory
foreach( $file in $files ) {
if( !$file.PSIsContainer -and $file.Extension -like '.xls*' -and $file.Name -match '^.*\-\d+ *[\(\.].*$' ) {
$row = $dt.NewRow()
$row.FileName = $file.FullName
$row.CaseNumber = $file.Name -replace '^.*\-(\d+) *[\(\.].*$', '$1'
$row.FileTimeStamp = $file.LastWriteTime
$row.DeleteFlag = 0
[void]$dt.Rows.Add( $row )
}
}
# Step 2: Mark files to delete
$rows = $dt.Select('', 'CaseNumber, FileTimeStamp DESC')
$caseNumber = ''
foreach( $row in $rows ) {
if( $row.CaseNumber -ne $caseNumber ) {
$caseNumber = $row.CaseNumber
Continue
}
$row.DeleteFlag = 1
[void]$dt.AcceptChanges()
}
# Step 3: Delete files
$rows = $dt.Select('DeleteFlag = 1', 'FileTimeStamp DESC')
foreach( $row in $rows ) {
$fileName = $row.FileName
Remove-Item -Path $fileName -Force | Out-Null
}
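As an optional sanity check (an addition, not part of the original flow), you can dump the DataTable after Step 2 to review what is flagged before Step 3 removes anything:
# Optional: inspect the inventory and delete flags before Step 3 runs
$dt | Format-Table FileName, CaseNumber, FileTimeStamp, DeleteFlag -AutoSize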

Here's an alternative that leverages the PowerShell Group-Object cmdlet.
It uses a regex to match files on the case number, ignoring those that don't have a case number. See the screenshot at the bottom that shows the test data (a collection of test xlsx files).
cls
#Assume that each file has an xlsx extension.
#Assume that a case number always looks like this: "Case-YYYY~XX-Z" where YYYY is 4 digits, ~ is a single space, XX is two digits, and Z is one or more digits
#make a list of xlsx files (recursive)
$files = Get-ChildItem -LiteralPath .\ExcelFiles -Recurse -Include *.xlsx
#$file is a System.IO.FileInfo object. Parse out the Case number and add it to the $file object as CaseNumber property
foreach ($file in $files)
{
$Matches = $null
$file.Name -match "(^.*)(Case-\d{4}\s{1}\d{2}-\d{1,})(.*\.xlsx$)" | out-null
if ($Matches.Count -eq 4)
{
$caseNumber = $Matches[2]
$file | Add-Member -NotePropertyName CaseNumber -NotePropertyValue $caseNumber
}
Else
{
#child folders will end up in this group too
$file | Add-Member -NotePropertyName CaseNumber -NotePropertyValue "NoCaseNumber"
}
}
#group the files by CaseNumber
$files | Group-Object -Property CaseNumber -OutVariable fileGroups | out-null
foreach ($fileGroup in $fileGroups)
{
#skip folders and files that don't have a valid case #
if ($fileGroup.Name -eq "NoCaseNumber")
{
continue
}
#for each group: sort files descending by LastWriteTime. Newest file will be first, so skip 1st file and remove the rest
$fileGroup.Group | sort -Descending -Property LastWriteTime | select -skip 1 | foreach {Remove-Item -LiteralPath $_.FullName -Force}
}
Test Data (screenshot of the test .xlsx files not reproduced here)

Related

How to find the count of a certain string from multiple text files in PowerShell

I have multiple text files and I need to find and count specific unique words in those files.
For example, we need to find how many users logged in during a certain time from multiple log files.
I have created the following code; it works fine for a few files, but for many larger files it takes too much time:
$A =Get-Content C:\Users\XXXXXXX\Documents\Python\Test\*.log | ForEach-Object { $wrds=$_.Split(" "); foreach ($i in $wrds) { Write-Output $i } } | Sort-Object | Get-Unique | select-string -pattern "AAA" -CaseSensitive -SimpleMatch
Is it possible to fine-tune this to run faster?
If I understand correctly, you would like to find certain user logins occurring in many log files, based on your use of Select-String.
# an array of usernames to search for
$users = 'user1', 'user2', 'userX'
# create a regex from this array by joining the values with regex 'OR' (the pipe symbol)
[regex]$regex = ($users | ForEach-Object { [regex]::Escape($_)}) -join '|'
# or if you need whole string matches instead of allowing partial matches, use
# [regex]$regex = '\b({0})\b' -f (($users | ForEach-Object { [regex]::Escape($_)}) -join '|')
# get a list of all log files
$logFiles = Get-ChildItem -Path 'C:\Users\XXXXXXX\Documents\Python\Test' -Filter '*.log' -File
# loop through the list of log files and find the matches in each of them
$result = foreach ($file in $logFiles) {
$allmatches = $regex.Matches(($file | Get-Content -Raw))
$logins = @($allmatches.Value | Select-Object -Unique)
if ($logins.Count) {
[PsCustomObject]@{
LogFile = $file.FullName
LoginCount = $logins.Count
Users = $logins -join ', '
}
}
}
# visual output
$result | Out-GridView -Title 'Login search results'
# or save as CSV file
$result | Export-Csv -Path 'X:\somewhere\results.csv' -NoTypeInformation
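As a further, unbenchmarked sketch: Select-String can also scan the files directly, which saves you from reading each file yourself via Get-Content; -AllMatches collects every hit per line. This reuses the same $regex as above:
# Alternative sketch: let Select-String scan the log files directly
$result = Select-String -Path 'C:\Users\XXXXXXX\Documents\Python\Test\*.log' -Pattern $regex -AllMatches |
Group-Object -Property Path |
ForEach-Object {
$logins = @($_.Group.Matches.Value | Select-Object -Unique)
[PsCustomObject]@{
LogFile = $_.Name
LoginCount = $logins.Count
Users = $logins -join ', '
}
}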

PowerShell: How to remove all files in a directory except recent ones

I have a folder which holds backups of SQL databases, with the backup date in the file name,
e.g. the C:\Backup folder.
Example of backup files:
archive_1_01022022.bak
archive_1_02022022.bak
archive_1_03022022.bak
archive_2_01022022.bak
archive_2_02022022.bak
archive_2_03022022.bak
archive_3_01022022.bak
archive_3_02022022.bak
archive_3_03022022.bak
I need a PowerShell script which removes all files from this directory but keeps the recent ones (e.g. from the last 5 days), while at the same time keeping at least 3 copies of each database (in case no backups have been made for more than the last 5 days).
The script below removes all files except the recent ones from the last 5 days:
$Folder = "C:\Backup"
$CurrentDate = Get-Date
$DateDel = $CurrentDate.AddDays(-5)
Get-ChildItem $Folder | Where-Object { $_.LastWriteTime -lt $DateDel } | Remove-Item
The above is working fine, but if there are no recent backups for the last 10 days and I run the above code, it will remove all files in C:\Backup. For such cases I need to keep at least 3 backup files of each database.
If I use the code below (for example, with 9 different databases), then it does the job:
$Folder = "C:\Backup"
Get-ChildItem $Folder | ? { -not $_.PSIsContainer } |
Sort-Object -Property LastWriteTime -Descending |
Select-Object -Skip 27 |
Remove-Item -Force
But the implementation is awkward. For example, if I have backups of 9 databases, I need to give "Select-Object -Skip" the value 27 (9 databases × keep 3 files each). If I have more or fewer databases, I have to adjust this number each time. How can I make "Select-Object -Skip 3" a static value?
In that case, you need to test how many files in the folder have a date newer than or equal to the reference date. If there are fewer than 3, sort the files by the LastWriteTime property and keep the top 3. If there are enough newer files left, you can delete the old ones:
$Folder = "C:\Backup"
$DateDel = (Get-Date).AddDays(-5).Date # set to midnight
# get a list of all backup files
$allFiles = Get-ChildItem -Path $Folder -Filter 'archive*.bak' -File
# test how many of these are newer than 5 days ago
$latestFiles = @($allFiles | Where-Object { $_.LastWriteTime -ge $DateDel })
if ($latestFiles.Count -lt 3) {
# if less than three keep the latest 3 files and remove the rest
$allFiles | Sort-Object LastWriteTime -Descending | Select-Object -Skip 3 | Remove-Item -WhatIf
}
else {
# there are plenty of newer files, so we can remove the older ones
$allFiles | Where-Object { $_.LastWriteTime -lt $DateDel } | Remove-Item -WhatIf
}
I have added the -WhatIf safety switch to both Remove-Item cmdlets, so you can first see what would happen before actually destroying files. Once you are satisfied with what the console shows, remove those -WhatIf switches and run again.
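For reference, each -WhatIf preview line looks something like this (the exact paths will differ on your system):
What if: Performing the operation "Remove File" on target "C:\Backup\archive_1_01022022.bak".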
If you have 9 databases and the number in the filename after archive_ makes the distinction between those database backup files, just put the above inside a loop and adjust the -Filter:
$Folder = "C:\Backup"
$DateDel = (Get-Date).AddDays(-5).Date # set to midnight
# loop through the 9 database files
for ($i = 1; $i -le 9; $i++) {
# get a list of all backup files per database
$allFiles = Get-ChildItem -Path $Folder -Filter "archive_$($i)_*.bak" -File
# test how many of these are newer than 5 days ago
$latestFiles = @($allFiles | Where-Object { $_.LastWriteTime -ge $DateDel })
if ($latestFiles.Count -lt 3) {
# if less than three keep the latest 3 files and remove the rest
$allFiles | Sort-Object LastWriteTime -Descending | Select-Object -Skip 3 | Remove-Item -WhatIf
}
else {
# there are plenty of newer files, so we can remove the older ones
$allFiles | Where-Object { $_.LastWriteTime -lt $DateDel } | Remove-Item -WhatIf
}
}
Ok, so now we know the example names you gave bear no resemblance to the real names, the code could be as simple as this:
$dbNames = 'archive', 'master', 'documents', 'rb' # the names used in the backup files each database creates
$Folder = "C:\Backup"
$DateDel = (Get-Date).AddDays(-5).Date # set to midnight
# loop through the database files
foreach ($name in $dbNames) {
# get a list of all backup files per database
$allFiles = Get-ChildItem -Path $Folder -Filter "$($name)_*.bak" -File
# test how many of these are newer than 5 days ago
$latestFiles = @($allFiles | Where-Object { $_.LastWriteTime -ge $DateDel })
if ($latestFiles.Count -lt 3) {
# if less than three keep the latest 3 files and remove the rest
$allFiles | Sort-Object LastWriteTime -Descending | Select-Object -Skip 3 | Remove-Item -WhatIf
}
else {
# there are plenty of newer files, so we can remove the older ones
$allFiles | Where-Object { $_.LastWriteTime -lt $DateDel } | Remove-Item -WhatIf
}
}
Based on the assumption that your backups follow the naming convention DBNAME_ddMMyyyy.bak, where the date corresponds to the backup date, I would do something like below.
$Params = @{
MinBackupThreshold = 1
MinBackupDays = 5
SimulateDeletion = $False # Set to true to perform a Remove-Item -WhatIf deletion
}
$Folder = "C:\temp\test"
$CurrentDate = Get-Date
$DateDel = $CurrentDate.AddDays($Params.MinBackupDays).Date # set to midnight
$Archives = Foreach ($File in Get-ChildItem $Folder ) {
# -13 comes from assuming the naming convention DBName_8CharBackupDate.ext (eg: Db1_01012022.bak)
$DbNameEndIndex = $File.Name.Length - 13
# +1 since our naming convention has an underscore between db name and date.
$RawDateStr = $File.Name.Substring($DbNameEndIndex + 1 , 8)
[PSCustomObject]@{
Path = $File.FullName
LastWriteTime = $File.LastWriteTime
DBName = $File.Name.Substring(0, $DbNameEndIndex)
BackupDate = [datetime]::ParseExact( $RawDateStr, 'ddMMyyyy', $null)
}
}
# Here we group archives by their DBName so we can make sure to keep a minimum number of backups for each.
$GroupedArchives = $Archives | Group DBName
Foreach ($Db in $GroupedArchives) {
if ($Db.Count -gt $Params.MinBackupThreshold) {
$Db.Group | Sort BackupDate | Select-Object -Skip $Params.MinBackupThreshold | Where-Object { $_.BackupDate -lt $DateDel } | % { Remove-Item -Path $_.Path -Force -WhatIf:$Params.SimulateDeletion }
} else {
# You could include additional checks to verify last backup, alert you if there should be more in there, etc...
}
}
Note: Using the date extracted from the filename will be more accurate than the LastWriteTime, which could be updated for other reasons (since we have it, we might as well use it).
Note 2: Added the SimulateDeletion toggle for -WhatIf to $Params so you can easily switch between actual removal and simulation (Theo's answer gave me the idea of providing that switch), and borrowed his .Date trick to make sure the date is set to midnight instead of the current time of day.
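As a quick illustration of the ParseExact call used above, with a hypothetical file name:
# 'archive_1_01022022.bak' -> date portion '01022022' -> 1 February 2022
[datetime]::ParseExact('01022022', 'ddMMyyyy', $null)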

How do I write a PowerShell script that checks when a file was last added to a folder?

I'm currently writing a script that checks each folder in a directory for the last time a file was written to each folder. I'm having trouble figuring out how to obtain the last time a file was written to the folder, as opposed to just retrieving the folder's creation date.
I've tried using PowerShell's recursive method, but couldn't figure out how to properly set it up. Right now, the script successfully prints the name of each folder to the Excel spreadsheet, and also prints the last write time of each folder, which is the incorrect information.
$row = 2
$column = 1
Get-ChildItem "C:\Users\Sylveon\Desktop\Test"| ForEach-Object {
#FolderName
$sheet.Cells.Item($row,$column) = $_.Name
$column++
#LastBackup
$sheet.Cells.Item($row,$column) = $_.LastWriteTime
$column++
#Increment to next Row and reset Column
$row++
$column = 1
}
The current state of the script prints each folder name to the report, but gives the folder's creation date rather than the last time a file was written to that folder.
The following should work to get the most recent edit date of any file in the current directory.
Get-ChildItem | Sort-Object -Property LastWriteTime -Descending | Select-Object -first 1 -ExpandProperty "LastWriteTime"
Get-ChildItem gets items in your directory
Sort-Object -Property LastWriteTime -Descending sorts by write-time, latest first
Select-Object -first 1 -ExpandProperty "LastWriteTime" gets the first one in the list, then gets its write-time
I made this to get the data you're trying to get. The last line gives us an empty string if the directory is empty, which is probably what's safest for Excel, but you could also default to something other than an empty string, like the directory's creation date:
$ChildDirs = Get-ChildItem | Where-Object { $_ -is [System.IO.DirectoryInfo] }
$EditNames = $ChildDirs | ForEach-Object Name
$EditTimes = $EditNames | ForEach-Object { @( (Get-ChildItem $_ | Sort-Object -Property LastWriteTime -Descending | Select-Object -first 1 LastWriteTime), '' -ne $null)[0] }
for($i=0; $i -lt $ChildDirs.Length; $i++) {
Write-Output $EditNames[$i]
Write-Output $EditTimes[$i]
}
To implement this for what you're doing, if I understand your question correctly, try the following:
$ChildDirs = Get-ChildItem | Where-Object { $_ -is [System.IO.DirectoryInfo] }
$EditNames = $ChildDirs | ForEach-Object Name
$EditTimes = $EditNames | ForEach-Object { @( (Get-ChildItem $_ | Sort-Object -Property LastWriteTime -Descending | Select-Object -first 1 LastWriteTime), '' -ne $null)[0] }
for($i=0; $i -lt $ChildDirs.Length; $i++) {
#FolderName
$sheet.Cells.Item($row, $column) = $EditNames[$i]
$column++
#LastBackup
$sheet.Cells.Item($row, $column) = $EditTimes[$i]
$row++
$column = 1
}
If you're only looking at the first level of files in each folder, you can do it using a nested loop:
$row = 2
$column = 1
$folders = Get-ChildItem $directorypath
ForEach ($folder in $folders) {
# start off with LastEdited set to the last write time of the folder itself
$LastEdited = $folder.LastWriteTime
$folderPath = $directoryPath + '\' + $folder.Name
# this 'dynamically' sets each folder's path
$files = Get-Childitem $folderPath
ForEach ($file in $files) {
if ((Get-Date $file.LastWriteTime) -gt (Get-Date $LastEdited)) {
$LastEdited = $file.LastWriteTime
}
}
$sheet.Cells.Item($row,$column) = $folder.Name
$column++
$sheet.Cells.Item($row,$column) = $LastEdited
$row++
$column = 1
}
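If you also need to consider files nested deeper than the first level, a compact variation of the same idea (a sketch, assuming the same $directorypath, $sheet and $row as above) can let -Recurse do the walking:
$row = 2
Get-ChildItem $directorypath -Directory | ForEach-Object {
# newest file anywhere under this folder, if any
$newest = Get-ChildItem $_.FullName -File -Recurse |
Sort-Object LastWriteTime -Descending | Select-Object -First 1
# fall back to the folder's own LastWriteTime when it contains no files
$stamp = if ($newest) { $newest.LastWriteTime } else { $_.LastWriteTime }
$sheet.Cells.Item($row, 1) = $_.Name
$sheet.Cells.Item($row, 2) = $stamp
$row++
}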

Compare contents of 6 objects and delete which are not matching

I have 6 files which are created dynamically (so I don't know their contents). I need to compare these 6 files (strictly speaking, compare one file with the 5 others) and see which contents of file 1 match the other 5. The contents which match should be saved; the others need to be deleted.
I coded something like below, but it is deleting everything (including the matches).
$lines = Get-Content "C:\snaps.txt"
$check1 = Get-Content "C:\Previous_day_latest.txt"
$check2 = Get-Content "C:\this_week_saved_snaps.txt"
$check3 = Get-Content "C:\all_week_latest_snapshots.txt"
$check4 = Get-Content "C:\each_month_latest.txt"
$check5 = Get-Content "C:\exclusions.txt"
foreach($l in $lines)
{
if(($l -notmatch $check1) -and ($l -notmatch $check2) -and ($l -notmatch $check3) -and ($l -notmatch $check4))
{
Remove-Item -Path "C:\$l.txt"
}else
{
#nothing
}
}
foreach($ch in $check5)
{
Remove-Item -Path "C:\$ch.txt"
}
Contents of 6 files will be as shown below:
$lines
testinstance-01-07-15-08-00
testinstance-10-07-15-23-00
testinstance-13-02-15-13-00
testinstance-15-06-15-23-00
testinstance-19-01-15-23-00
testinstance-23-05-15-20-00
testinstance-27-03-15-23-00
testinstance-28-02-15-23-00
testinstance-29-07-15-08-00
testinstance-30-04-15-23-00
testinstance-30-06-15-23-00
testinstance-31-01-15-23-00
testinstance-31-12-14-23-00
$check1
testinstance-29-07-15-08-00
$check2
testinstance-23-05-15-20-00
testinstance-27-03-15-23-00
$check3
testinstance-01-07-15-23-00
testinstance-13-02-15-13-00
testinstance-19-01-15-23-00
$check4
testinstance-28-02-15-23-00
testinstance-30-04-15-23-00
testinstance-30-06-15-23-00
testinstance-31-01-15-23-00
$check5
testinstance-31-12-14-23-00
I've read about Compare-Object, but I'm not sure how it can be implemented in my case, since the contents of all 5 files will be different and all those contents should be saved from deletion. Can someone please guide me to achieve this? Any help would be really appreciated.
I would create an array of the files to check so you can simply add new files without modifying other parts of your script.
I use the Where-Object cmdlet, which filters all lines that are in the reference file using the -in operator, and finally overwrite the file:
$referenceFile = 'C:\snaps.txt'
$compareFiles = @(
'C:\Previous_day_latest.txt',
'C:\this_week_saved_snaps.txt',
'C:\all_week_latest_snapshots.txt',
'C:\each_month_latest.txt',
'C:\exclusions.txt'
)
# get the content of the reference file
$referenceContent = (gc $referenceFile)
foreach ($file in $compareFiles)
{
# get the content of the file to check
$content = (gc $file)
# filter all contents from the file to check which are in the reference file and save it
$content | where { $_ -in $referenceContent } | sc $file
}
You can use the -contains operator to compare array contents. If you read all the files you want to check and store their lines in one array, you can compare that with the reference file:
$lines = Get-Content "C:\snaps.txt"
$check1 = "C:\Previous_day_latest.txt"
$check2 = "C:\this_week_saved_snaps.txt"
$check3 = "C:\all_week_latest_snapshots.txt"
$check4 = "C:\each_month_latest.txt"
$check5 = "C:\exclusions.txt"
$checklines = @()
(1..5) | ForEach-Object {
$comp = Get-Content $(Get-Variable check$_).value
$checklines += $comp
}
$matched = $lines | ? { $checklines -contains $_ } # named $matched to avoid clobbering the automatic $Matches variable
If you switch the -contains to -notcontains, you'll see the three lines that don't match.
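A quick demonstration with the sample data from the question:
$checklines -contains 'testinstance-29-07-15-08-00' # True
$lines | Where-Object { $checklines -notcontains $_ } # prints the three unmatched lines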
The other answers here are great but I wanted to show you that Compare-Object could still work. You need to use it in a loop, however. Just to show something else, I included a simple use of Join-Path for building the array of checks. Basically, we save some typing for when you move your files to a production area: update one path instead of several.
$rootPath = "C:\"
$fileNames = "Previous_day_latest.txt", "this_week_saved_snaps.txt", "all_week_latest_snapshots.txt", "each_month_latest.txt", "exclusions.txt"
$lines = Get-Content (Join-path $rootPath "snaps.txt")
$checks = $fileNames | ForEach-Object{Join-Path $rootPath $_}
ForEach($check in $checks){
Compare-Object -ReferenceObject $lines -DifferenceObject (Get-Content $check) -IncludeEqual |
Where-Object{$_.SideIndicator -eq "=="} |
Select-Object -ExpandProperty InputObject |
Set-Content $check
}
So we take each file path and use Compare-Object in a loop comparing each to the $lines array. Using -IncludeEqual we find the lines that both files share and write those back to the file.
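To see why the Where-Object filter keeps only the shared lines, here is a minimal throwaway demonstration (output order may vary):
Compare-Object -ReferenceObject 'a','b','c' -DifferenceObject 'b','c','d' -IncludeEqual
# InputObject SideIndicator
# ----------- -------------
# b           ==
# c           ==
# d           =>
# a           <=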
Depending on how many checks you have and where they are, it might be easier to build the $checks array with this line:
$checks = Get-ChildItem "C:\" -Filter "*.txt" | Select-Object -Expand FullName

Removing items from one CSV based on items in another CSV file

I have a script that will generate a CSV file. The purpose of the script is to verify if a certain file is missing. For example, let's say I have the following files:
1.jpg
2.jpg
3.jpg
4.jpg
1.gif
3.gif
2.txt
3.txt
Once the script is run, it will generate a report so I can visually see what file is missing. The report looks like:
JPG Files   GIF Files   TXT Files
1.jpg       1.gif
2.jpg                   2.txt
3.jpg       3.gif       3.txt
So you can see, I'm missing 1.txt and 2.gif.
Here's where my problem comes in....
I now have a SECOND CSV file that has a list of files that MUST be kept in the FIRST CSV. Anything that is NOT in the SECOND CSV file must now be removed from my FIRST CSV. For example:
My FIRST CSV contains:
1.jpg
2.jpg
3.jpg
1.gif
3.gif
2.txt
3.txt
The SECOND CSV says that the following files need to remain:
1.jpg
3.jpg
1.gif
2.txt
Therefore, anything that does not appear in the SECOND CSV file needs to be removed from the FIRST CSV while retaining the same format, meaning that if 1.jpg is missing (it is still listed in the SECOND CSV but does not exist in the C:\JPG folder), it must show as a blank space in the FIRST CSV.
I hope this makes sense. Please ask me if you have any questions or need clarification.
Below is the portion of code from my script that generates the FIRST CSV:
# Get dirs
$dirJPG = "C:\JPG"
$dirGIF = "C:\GIF"
$dirTXT = "C:\TXT"
$files = @()
$files += Get-ChildItem -Path $dirJPG -Filter "*.jpg"
$files += Get-ChildItem -Path $dirGIF -Filter "*.gif"
$files += Get-ChildItem -Path $dirTXT -Filter "*.txt"
# Write a datetime stamped CSV file
$datetime = Get-Date -Format "MM_dd_yyyy_hhmm"
$files | Sort-Object -Property { $_.Name } | Group-Object -Property {
[System.IO.Path]::GetFileNameWithoutExtension($_.Name) } | % {
New-Object psobject -Property @{
"JPG Files" = $_.Group | ? { $_.Extension -eq ".jpg" } | % { $_.Name }
"GIF Files" = $_.Group | ? { $_.Extension -eq ".gif" } | % { $_.Name }
"TXT Files" = $_.Group | ? { $_.Extension -eq ".txt" } | % { $_.Name }
} } | Export-Csv -Path "$datetime.csv" -NoTypeInformation
Thanks in advance for your assistance! :D
It is possible to use arrays, but it will probably be more efficient to use hashtables. You can iterate (foreach) through the first CSV's items and check which files are in CSV1 but not in CSV2:
# Get the files by directory for each file type
function Get-FilesByType() {
param ([hashtable]$filters)
$result = @{}
foreach ($filter in $filters.Keys) {
$path = $filters[$filter]
Get-ChildItem -Path $path -Filter $filter | % {
$result.Add($_.Name, $_)
}
}
return $result
}
# Assume CSV1 hashtable already exists and is loaded
# Get the hashtable of files for CSV2
$csv2 = Get-FilesByType @{"*.jpg"="C:\JPG"; "*.gif"="C:\GIF"; "*.txt"="C:\TXT" }
# Remove items from CSV1 that do not exist in CSV2
# NOTE: You cannot remove items from the hashtable while
# iterating through the collection, so use a copy of the
# keys to iterate.
$keys = @()
$keys += $csv1.Keys
$keys | % {
if ( ! $csv2.ContainsKey($_) ) {
Write-Host "Removing $_"
$csv1.Remove($_)
}
}
# Write a datetime stamped CSV file
$datetime = Get-Date -Format "MM_dd_yyyy_hhmm"
$csv1.Values | Sort-Object -Property { $_.Name } | Group-Object -Property {
[System.IO.Path]::GetFileNameWithoutExtension($_.Name)
} | % {
New-Object psobject -Property @{
"JPG Files" = $_.Group | ? { $_.Extension -eq ".jpg" } | % { $_.Name }
"GIF Files" = $_.Group | ? { $_.Extension -eq ".gif" } | % { $_.Name }
"TXT Files" = $_.Group | ? { $_.Extension -eq ".txt" } | % { $_.Name }
}
} | Export-Csv -Path "$datetime.csv" -NoTypeInformation
Don't use an array - use the Hashtable like Ryan said. The array is not a good choice when you want to remove elements from it.
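A minimal illustration of why the key snapshot matters, with throwaway data: removing entries while enumerating a hashtable's live Keys collection throws an error, so copy the keys into an array first.
$ht = @{ a = 1; b = 2; c = 3 }
foreach ($key in @($ht.Keys)) { # @() snapshots the keys before we mutate $ht
if ($key -ne 'a') { $ht.Remove($key) }
}
$ht # only 'a' remains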
Found what my problem was... I was matching the files that needed to be kept and then removing them. I simply needed to add a not condition:
$keys = @()
$keys += $currentFiles.Keys
$keys | % {
if (! $filesToKeep.ContainsKey($_)) {
Write-Host "Removing $_"
$currentFiles.Remove($_)
}
}