Identify duplicate files from leading number of characters - powershell

I have a file directory which contains approx. 600 employee image files which have been copied from an alternative source.
The filename format is:
xxxxxx_123456_123_20141212.jpg
When the employee image file is updated it just creates another file in the same location and only the datetime changes at the end.
I need to be able to identify the most recent file; however, I first need to establish which files are 'duplicated'.
My initial thoughts were to try and match the first 14 characters and, if they matched, work out the recent modified date and then delete the older file.

This requires PowerShell version 3 or later.
$Path = 'C:\Users\madtomvane\Documents\PowerShellTest'
# Get the files, group them by the first 14 characters of the name, and select the most recent file in each group
$FilesToKeep = Get-ChildItem $Path -Recurse -File | Group-Object -Property {$_.Name.Substring(0, 14)} | ForEach-Object {$_.Group | Sort-Object -Property LastWriteTime -Descending | Select-Object -First 1}
# Get the files, group them the same way, keep only groups with more than one file, and select the older files
$FilesToRemove = Get-ChildItem $Path -Recurse -File | Group-Object -Property {$_.Name.Substring(0, 14)} | Where-Object {$_.Group.Count -gt 1} | ForEach-Object {$_.Group | Sort-Object -Property LastWriteTime -Descending | Select-Object -Skip 1}
$FilesToRemove | Remove-Item
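The grouping idea can be tried without touching the disk. The following is a minimal sketch that uses hypothetical in-memory objects in place of `Get-ChildItem` output; the file names and dates are made up, and it assumes every name is at least 14 characters long (`Substring` throws otherwise).

```powershell
# Hypothetical sample data standing in for Get-ChildItem output.
$files = @(
    [pscustomobject]@{ Name = 'xxxxxx_123456_123_20141212.jpg'; LastWriteTime = [datetime]'2014-12-12' }
    [pscustomobject]@{ Name = 'xxxxxx_123456_123_20150301.jpg'; LastWriteTime = [datetime]'2015-03-01' }
    [pscustomobject]@{ Name = 'xxxxxx_654321_123_20140101.jpg'; LastWriteTime = [datetime]'2014-01-01' }
)

# Group on the first 14 characters of the name; each group is one employee.
$groups = $files | Group-Object -Property { $_.Name.Substring(0, 14) }

# Newest file per group (to keep) and every older file (candidates for removal).
$keep   = $groups | ForEach-Object { $_.Group | Sort-Object LastWriteTime -Descending | Select-Object -First 1 }
$remove = $groups | ForEach-Object { $_.Group | Sort-Object LastWriteTime -Descending | Select-Object -Skip 1 }

$remove | ForEach-Object { "Would remove: $($_.Name)" }
```

Against real files, piping the removal list to `Remove-Item -WhatIf` first shows what would be deleted before anything is actually removed.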

Related

How to get last modified folder name in powershell?

I want to get the name of the last modified folder. I have tried the below command but it is not giving me the correct folder name.
(Get-ChildItem c:\ -Directory).Name | Sort-object -Property lastWriteTime -Descending | Select -First 1
Don't select the name in Get-ChildItem, but in the later select, and use -First because you are already sorting it descending:
Get-ChildItem c:\ -Directory | Sort-object -Property lastWriteTime -Descending | select name -first 1
(Get-ChildItem -Path C:\example -Directory | Sort-Object LastWriteTime | Select-Object -Last 1).Name
Get-ChildItem -Path C:\example -Directory: gets a list of all the subfolders in the "C:\example" directory.
Sort-Object LastWriteTime: sorts the folders by their last modified date.
Select-Object -Last 1: selects the last folder in the sorted list.
.Name: displays the name of the selected folder.
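Why the original attempt fails can be shown with a small sketch on hypothetical in-memory objects (the folder names and dates are invented): once `.Name` has been expanded, only plain strings remain, and a string has no `LastWriteTime` property left to sort on.

```powershell
# Hypothetical folder objects standing in for Get-ChildItem -Directory output.
$folders = @(
    [pscustomobject]@{ Name = 'Alpha'; LastWriteTime = [datetime]'2020-01-01' }
    [pscustomobject]@{ Name = 'Zulu';  LastWriteTime = [datetime]'2019-01-01' }
    [pscustomobject]@{ Name = 'Mike';  LastWriteTime = [datetime]'2021-06-15' }
)

# Sort the full objects first, then read the name of the winner.
$latest = ($folders | Sort-Object LastWriteTime -Descending | Select-Object -First 1).Name

# After expanding .Name the items are strings, which carry no LastWriteTime.
$firstName   = $folders.Name[0]
$hasProperty = $null -ne $firstName.PSObject.Properties['LastWriteTime']

"Latest folder: $latest"
"String has LastWriteTime: $hasProperty"
```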

Powershell: Get-ChildItem, get latest file of files with similar names

I got a script to get files from a folder that I then put in a HTML-Table and mail.
In the folder you have files like HR May 2020, HR April 2020, RR May 2021 etc.
Below is the code itself as a sample, this looks for other files but they come every month as well. In total I will filter 8 files.
Get-ChildItem -Path D:\Temp\Test |
Where-Object { $_.Name -match '^RR_Prognos.*|^AllokeringBogNycklar.*' } |
Sort-Object -Property LastWriteTime |
Select-Object LastWriteTime,FullName
Now I am only interested in seeing the latest file of each, so using -Last or a filter on days, months, hours or similar won't work.
I tried to find a better solution googling it but could not come up with anything that solved the problem.
So I just need to extend the code so that it picks the latest of each file I filter on; the filter is written so that it ignores the month in the name.
Edit: Let's say I would use:
Get-ChildItem -Path c:\tm1 | Where-Object { $_.Name -match '^RR_Prognos.*|^AllokeringBogNycklar.*|^Aktivversion.*|^AllokeringNycklar.*|^HR_prognos.*|^KostnaderDK.*|KostnaderProdukt_prognos.*|^Parametrar_prognos.*|ProduktNyckel_prognos apr.*' } | Sort-Object -Property LastWriteTime -Descending | Select-Object -First 8 | Select-Object LastWriteTime,FullName
Then if one file does not come with the batch, an older copy would show up in the results as well. Is there an easier way to prevent that from happening?
Ok, now that you've provided the whole list that you filter against I can write up a real answer. Here we'll group by the matched file name prefix, then sort each group newest first and grab the first one from each group:
Get-ChildItem -Path c:\tm1 |
Where-Object { $_.Name -match '^RR_Prognos.*|^AllokeringBogNycklar.*|^Aktivversion.*|^AllokeringNycklar.*|^HR_prognos.*|^KostnaderDK.*|KostnaderProdukt_prognos.*|^Parametrar_prognos.*|ProduktNyckel_prognos apr.*' } |
Group {$_.Name -replace '.*?(^RR_Prognos|^AllokeringBogNycklar|^Aktivversion|^AllokeringNycklar|^HR_prognos|^KostnaderDK|KostnaderProdukt_prognos|^Parametrar_prognos|ProduktNyckel_prognos apr).*','$1'} |
ForEach-Object {
$_.Group |
Sort-Object -Property LastWriteTime -Descending |
Select -First 1
} |
Select-Object LastWriteTime,FullName
This might be what you are looking for. This iterates over each search string where you need the newest file.
$searchStrings = @('^pattern1*', '^pattern2*', '^pattern3*')
foreach($searchString in $searchStrings) {
    $items = Get-ChildItem -Path $folder | Where-Object { $_.Name -match "$searchString" } | Sort-Object -Property LastWriteTime -Descending
    $newestItem = $items[0]
    Write-Host "newest item for '$searchString' is $newestItem"
}
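Grouping on the matched prefix can be sketched on hypothetical in-memory data (the file names, dates and the two-prefix list are invented for illustration). Each prefix yields at most one result, so a file missing from one batch does not pull in an older copy of another.

```powershell
# Hypothetical prefixes and file objects; real input would come from Get-ChildItem.
$prefixes = 'RR_Prognos', 'HR_prognos'
$pattern  = '^(' + ($prefixes -join '|') + ')'

$files = @(
    [pscustomobject]@{ Name = 'RR_Prognos May 2020.xlsx';   LastWriteTime = [datetime]'2020-05-31' }
    [pscustomobject]@{ Name = 'RR_Prognos April 2020.xlsx'; LastWriteTime = [datetime]'2020-04-30' }
    [pscustomobject]@{ Name = 'HR_prognos April 2020.xlsx'; LastWriteTime = [datetime]'2020-04-30' }
    [pscustomobject]@{ Name = 'Other.txt';                  LastWriteTime = [datetime]'2020-06-01' }
)

$latestPerPrefix = $files |
    Where-Object { $_.Name -match $pattern } |
    # Use the captured prefix as the group key.
    Group-Object -Property { [regex]::Match($_.Name, $pattern, 'IgnoreCase').Groups[1].Value } |
    ForEach-Object { $_.Group | Sort-Object LastWriteTime -Descending | Select-Object -First 1 }

$latestPerPrefix | ForEach-Object { "$($_.LastWriteTime.ToString('yyyy-MM-dd'))  $($_.Name)" }
```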

Powershell script deleting files despite -Exclude switch

I have the following script where I'm trying to delete all the SQL .bak files except for the last two. When I run it it wipes out everything in the folder. Does -Exclude not work with array values?
$excludefile=get-childitem D:\TempDB | sort lastwritetime | select-object -Last 2 | select-object -Property Name | select-object -expandproperty Name
foreach ($element in $excludefile)
{
$element
remove-item -Path D:\TempDB -Exclude ($element) -Force
}
Is this what you're looking for?
Get-ChildItem D:\TempDB |
Sort-Object LastWriteTime -Descending |
Select-Object -Skip 2 |
Remove-Item -WhatIf
Of course, you can remove -WhatIf if this is what you need.
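The selection step can be checked on its own with hypothetical in-memory objects (the backup names and dates below are invented): sorting newest first and skipping two leaves exactly the files that would be deleted.

```powershell
# Five hypothetical backups, db_1.bak (oldest) through db_5.bak (newest).
$backups = 1..5 | ForEach-Object {
    [pscustomobject]@{
        Name          = "db_$_.bak"
        LastWriteTime = ([datetime]'2024-01-01').AddDays($_)
    }
}

# Newest first, skip the two to keep; the remainder is what would be removed.
$toRemove = $backups | Sort-Object LastWriteTime -Descending | Select-Object -Skip 2

$toRemove | ForEach-Object { "Would remove: $($_.Name)" }
```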

How do I group and sort values into a dictionary?

I'm attempting to find the latest version of multiple files within a directory. Currently, I'm calling GCI per file, but that is extremely slow, so I want to instead cache all the results by unique file name and then just perform a lookup in the cache.
I'm currently doing the following:
Gci $filePath -Recurse | ?{ -Not $_.PSIsContainer } | Group-Object Name
I'm trying to convert this to the powershell equivalent of the C# code:
group.ToDictionary(g => g.Key, g => g.Values.OrderByDescending(v => v.ModifiedAt).First().FullName)
How would I accomplish this in Powershell?
What I would do to create a hashtable of files would be to create an empty hashtable, and then populate it with the results of your GCI:
$files = @{}
GCI $filepath -Recurse -File | Group Name | ForEach{ $files.Add($_.Name, ($_.Group | Sort LastWriteTime)) }
Or if all you want is the most recent file, add | Select -Last 1 after the Sort LastWriteTime. If all you care about is the path, you could even do | Select -Last 1 -ExpandProperty FullName.
$Files = @{}
GCI $filepath -recurse | Group Name | ForEach{ $files.Add($_.Name, ($_.Group | Sort LastWriteTime | Select -Last 1 -ExpandProperty FullName)) }
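As a rough equivalent of the C# `ToDictionary` call, here is a minimal sketch that builds the lookup from hypothetical in-memory objects (the paths and dates are invented) and maps each file name to the full path of its newest copy.

```powershell
# Hypothetical file objects standing in for Get-ChildItem -Recurse -File output.
$found = @(
    [pscustomobject]@{ Name = 'a.txt'; FullName = 'C:\x\a.txt'; LastWriteTime = [datetime]'2020-01-01' }
    [pscustomobject]@{ Name = 'a.txt'; FullName = 'C:\y\a.txt'; LastWriteTime = [datetime]'2021-01-01' }
    [pscustomobject]@{ Name = 'b.txt'; FullName = 'C:\x\b.txt'; LastWriteTime = [datetime]'2019-01-01' }
)

# Key: file name. Value: full path of the most recently written copy.
$latest = @{}
$found | Group-Object Name | ForEach-Object {
    $latest[$_.Name] = $_.Group |
        Sort-Object LastWriteTime |
        Select-Object -Last 1 -ExpandProperty FullName
}

$latest['a.txt']
```

Indexed assignment (`$latest[$key] = ...`) is used here instead of `.Add()` only because it does not throw if a key is seen twice; with `Group-Object` the keys are unique either way.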

Sum of file folder size based on file/folder name

I have multiple folders across a number of SQL Servers that contain hundreds/thousands of databases. Each database comprises three elements:
<dbname>.MDF
<dbname>.LDF
<dbname>files (Folder that contains db files/attachments)
I need to marry these files together and add up their total size, does anyone have any advice on how to do this?
EDIT : Just to clarify, I'm currently able to output the filesizes of the MDF/LDF files, I have a separate script that summarises the folder sizes. I need a method of adding together a .MDF/.LDF/DBFiles folder when their name matches. Bearing in mind all of the files are prefixed with the database name.
EDIT #2: The 2 options given so far sum together the .mdf/.ldf files with no problem, but do not add the folder size of the DBFiles folder. Does anyone have any input on how to amend these scripts to include a folder beginning with the same name.
First provided script:
$root = 'C:\db\folder'
Get-ChildItem "$root\*.mdf" | Select-Object -Expand BaseName |
ForEach-Object {
New-Object -Type PSObject -Property @{
Database = $_
Size = Get-ChildItem "$root\$_*" -Recurse |
Measure-Object Length -Sum |
Select-Object -Expand Sum
}
}
Second provided script:
gci "c:\temp" -file -Include "*.mdf", "*.ldf" -Recurse |
group BaseName, DirectoryName |
%{new-object psobject -Property @{FilesAndPath=$_.Name; Size=($_.Group | gci | Measure-Object Length -Sum).Sum } }
EDIT #3:
Thanks to Ansgar (below), the updated solution has done the trick perfectly. Updating question with final solution:
$root = 'C:\db\folder'
Get-ChildItem "$root\*.mdf" | Select-Object -Expand BaseName |
ForEach-Object {
New-Object -Type PSObject -Property @{
Database = $_
Size = Get-ChildItem "$root\$_*\*" -Recurse |
Measure-Object Length -Sum |
Select-Object -Expand Sum
}
}
Enumerate just the .mdf files from your database folder, then enumerate the files and folders for each basename.
$root = 'C:\db\folder'
Get-ChildItem "$root\*.mdf" | Select-Object -Expand BaseName |
ForEach-Object {
New-Object -Type PSObject -Property @{
Database = $_
Size = Get-ChildItem "$root\$_*\*" -Recurse |
Measure-Object Length -Sum |
Select-Object -Expand Sum
}
}
If you want the sum of the database file sizes grouped by directory and file name (without extension), try this:
gci "c:\temp" -file -Include "*.mdf", "*.ldf" -Recurse |
group BaseName, DirectoryName |
%{new-object psobject -Property @{FilesAndPath=$_.Name; Size=($_.Group | gci | Measure-Object Length -Sum).Sum } }
Adjust the -Include list on the gci call if necessary.
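The summing step itself can be sketched on hypothetical in-memory data. The relative paths, sizes and database names below are invented, and the sketch assumes the attachment folder is named exactly `<dbname>files`, as described in the question.

```powershell
# Hypothetical items: relative path plus size in bytes.
$items = @(
    [pscustomobject]@{ Path = 'Sales.MDF';           Length = 100 }
    [pscustomobject]@{ Path = 'Sales.LDF';           Length = 20 }
    [pscustomobject]@{ Path = 'Salesfiles\doc1.pdf'; Length = 5 }
    [pscustomobject]@{ Path = 'HR.MDF';              Length = 50 }
)
$databases = 'Sales', 'HR'

$report = foreach ($db in $databases) {
    # Match "<dbname>.<ext>" files and anything under "<dbname>files\".
    $size = ($items |
        Where-Object { $_.Path -like "${db}.*" -or $_.Path -like "${db}files\*" } |
        Measure-Object Length -Sum).Sum
    [pscustomobject]@{ Database = $db; Size = $size }
}

$report | ForEach-Object { "$($_.Database): $($_.Size) bytes" }
```

Matching on `<dbname>.` and `<dbname>files\` rather than on a bare prefix avoids counting a database whose name merely starts with another database's name.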