Delete duplicate files with Powershell except the file specified - powershell

I am using the following code to delete duplicate files in one folder:
ls *.wav -recurse | get-filehash | group -property hash | where { $_.count -gt 1 } | % { $_.group | select -skip 1 } | del
I have two issues. I want to limit this to only one filehash at a time and I need to specify the file name I want to keep.
As an example, I have a folder named Recordings. The first five files listed all have the same filehash but only one has the filename that has already been entered in my database.
Recordings
It would be great if I could use the -Exclude parameter for the del cmdlet but that parameter does not accept pipeline input.
I also considered using the code above and then renaming the remaining file afterward but the code is not limited to one filehash.

It all depends on how you want it to work. For example, if you know the file name you want to keep in advance, you could do it this way:
$fileName = 'file1.txt'
$fileHash = Get-FileHash .\$filename
$duplicates = ls -Recurse | Get-FileHash | Where-Object {$_.Hash -eq $fileHash.Hash -and ($_.Path | Split-Path -Leaf) -ne $fileName }
$duplicates | del
This sequence sets the filename, gets the hash of that file, and then the main command checks for other files with that same hash that doesn't have the same filename.
Note: Test first to make sure this will do what you expect before you execute the del command.
Update: It appears that Get-FileHash puts some sort of lock on the files being hashed so you cannot immediately pipe to the del (Remove-Item) command. I modified the results to save the array of duplicates to a variable and then pass that to the delete command which works.

Related

Foreach/copy-item based on name contains

I'm trying to create a list of file name criteria (MS Hotfixes) then find each file name containing that criteria in a directory and copy it to another directory. I think I'm close here but missing something simple.
Here is my current attempt:
#Create a list of the current Hotfixes.
Get-HotFix | Select-Object HotFixID | Out-File "C:\Scripts\CurrentHotfixList.txt"
#
#Read the list into an Array (dropping the first 3 lines).
$HotfixList = Get-Content "C:\Scripts\CurrentHotfixList.txt" | Select-Object -Skip 3
#
#Use the Hotfix names and copy the individual hotfixes to a folder
ForEach ($Hotfix in $HotfixList) {
Copy-Item -Path "C:\KBtest\*" -Include *$hotfix* -Destination "C:\KBtarget"
}
If I do a Write-Host $Hotfix and comment out my Copy-Item line I get the list of hotfixes as expected.
If I run just the copy command and input the file name I am looking for - it works.
Copy-Item -Path "C:\KBtest\*" -Include *kb5016693* -Destination "C:\KBtarget"
But when I run my script it copies all the files in the folder and not just the one file I am looking for. I have several hotfixes in that KBtest folder but only one that is correct for testing.
What am I messing up here?
The simplest solution to your problem, taking advantage of the fact that -Include can accept an array of patterns:
# Construct an array of include patterns by enclosing each hotfix ID
# in *...*
$includePatterns = (Get-HotFix).HotfixID.ForEach({ "*$_*" })
# Pass all patterns to a single Copy-Item call.
Copy-Item -Path C:\KBtest\* -Include $includePatterns -Destination C:\KBtarget
As for what you tried:
To save just the hotfix IDs to a plain-text file, each on its own line, use the following, don't use Select-Object -Property HotfixId (-Property is implied if you omit it), use Select-Object -ExpandProperty HotfixId:
Get-HotFix | Select-Object -ExpandProperty HotFixID | Out-File "C:\Scripts\CurrentHotfixList.txt"
Or, more simply, using member-access enumeration:
(Get-HotFix).HotFixID > C:\Scripts\CurrentHotfixList.txt
Using Select-Object -ExpandProperty HotfixID or (...).HotfixID returns only the values of the .HotfixID properties, whereas Select-Object -Property HotfixId - despite only asking for one property - returns an object that has a .HotfixID property - this is a common pitfall; see this answer for more information.
Then you can use a Get-Content call alone to read the IDs (as strings) back into an array (no need for Select-Object -Skip 3):
$HotfixList = Get-Content "C:\Scripts\CurrentHotfixList.txt"
(Note that, as the solution at the top demonstrates, for use in the same script you don't need to save the IDs to a file in order to capture them.)
This will likely fix your primary problem, which stems from how Out-File creates for-display string representations of the objects ([pscustomobject] instances) that Select-Object -Property HotfixID created:
Not only is there an empty line followed by a table header at the start of the output (which your Select-Object -Skip 3 call skips), there are also two empty lines at the end.
When these empty lines were read into $hotfix in your foreach loop, -Include *$hotfix* effectively became -Include **, which therefore copied all files.
first, you do not need to create and import those textfiles:
get-hotfix | ?{$_.hotfixid -notmatch 'KB5016594|KB5004567|KB5012170'} | %{
copy-item -path "C:\kbtest\$($_.HotfixId).exe" -Destination "C:\kbTarget"
}
This filters for the hotfixes you do not want, if you do not need it remove:
?{$_.hotfixid -notmatch 'KB5016594|KB5004567|KB5012170'}
I assume that those files are exe files in my example.

Powershell Delete all files apart from the latest file per day

I have a folder that contains a lot of files, multiple files per day.
I would like to script something that deletes all but the latest file per day.
I have seen a lot of scripts that delete files over X days old but this is slightly different and having written no powershell before yesterday (I'm exclusively tsql), I'm not really sure how to go about it.
I'm not asking anyone to write the code for me but maybe describe the methods of achieving this would be good and I can go off an research how to put it into practise.
All files are in a single directory, no subfolders. there are files I dont want to delete, the files i want to delete have file name in format constant_filename_prefix_YYYYMMDDHHMMSS.zip
Is powershell the right tool? Should i instead be looking at Python (which I also don't know) Powershell is more convinient since other code we have is written in PS.
PowerShell has easy to use cmdlets for this kind of thing.
The question to me is if you want the use the dates in the file names, or the actual LastWriteTime dates of the files themselves (as shown in File Explorer).
Below two ways of handling this. I've put in a lot of code comments to help you get the picture.
If you want to remove the files based on their actual last write times:
$sourceFolder = 'D:\test' # put the path to the folder where your files are here
$filePrefix = 'constant_filename_prefix'
Get-ChildItem -Path $sourceFolder -Filter "$filePrefix*.zip" -File | # get files that start with the prefix and have the extension '.zip'
Where-Object { $_.BaseName -match '_\d{14}$' } | # that end with an underscore followed by 14 digits
Sort-Object -Property LastWriteTime -Descending | # sort on the LastWriteTime property
Select-Object -Skip 1 | # select them all except for the first (most recent) one
Remove-Item -Force -WhatIf # delete these files
OR
If you want to remove the files based the dates in the file names.
Because the date formats you used are sortable, you can safely sort on the last 14 digits of the file BaseName:
$sourceFolder = 'D:\test'
$filePrefix = 'constant_filename_prefix'
Get-ChildItem -Path $sourceFolder -Filter "$filePrefix*.zip" -File | # get files that start with the prefix and have the extension '.zip'
Where-Object { $_.BaseName -match '_\d{14}$' } | # that end with an underscore followed by 14 digits
Sort-Object -Property #{Expression = {$_.BaseName.Substring(14)}} -Descending | # sort on the last 14 digits descending
Select-Object -Skip 1 | # select them all except for the first (most recent) one
Remove-Item -Force -WhatIf # delete these files
In both alternatives you will find there is a switch -WhatIf at the end of the Remove-Item cmdlet. Yhis is for testing the code and no files wil actually be deleted. Instead, with this switch, in the console it writes out what would happen.
Once you are satisfied with this output, you can remove or comment out the -WhatIf switch to have the code delete the files.
Update
As I now understand, there are multiple files for several days in that folder and you want to keep the newest file for each day, deleting the others.
In that case, we have to create 'day' groups of the files and withing every group sort by date and delete the old files.
This is where the Group-Object comes in.
Method 1) using the LastWriteTime property of the files
$sourceFolder = 'D:\test' # put the path to the folder where your files are here
$filePrefix = 'constant_filename_prefix'
Get-ChildItem -Path $sourceFolder -Filter "$filePrefix*.zip" -File | # get files that start with the prefix and have the extension '.zip'
Where-Object { $_.BaseName -match '_\d{14}$' } | # that end with an underscore followed by 14 digits
Group-Object -Property #{Expression = { $_.LastWriteTime.Date }} | # create groups based on the date part without time part
ForEach-Object {
$_.Group |
Sort-Object -Property LastWriteTime -Descending | # sort on the LastWriteTime property
Select-Object -Skip 1 | # select them all except for the first (most recent) one
Remove-Item -Force -WhatIf # delete these files
}
Method 2) using the date taken from the file names:
$sourceFolder = 'D:\test' # put the path to the folder where your files are here
$filePrefix = 'constant_filename_prefix'
Get-ChildItem -Path $sourceFolder -Filter "$filePrefix*.zip" -File | # get files that start with the prefix and have the extension '.zip'
Where-Object { $_.BaseName -match '_\d{14}$' } | # that end with an underscore followed by 14 digits
Group-Object -Property #{Expression = { ($_.BaseName -split '_')[-1].Substring(0,8)}} | # create groups based on the date part without time part
ForEach-Object {
$_.Group |
Sort-Object -Property #{Expression = {$_.BaseName.Substring(14)}} -Descending | # sort on the last 14 digits descending
Select-Object -Skip 1 | # select them all except for the first (most recent) one
Remove-Item -Force -WhatIf # delete these files
}

File Size with Powershell

What I am trying to do is create a PS script to see when a certain folder has a file over 1GB. If it found a file over 1GB, I want it to write a log file with info saying the name of the certain file and its size.
This works but not fully, if the file is less than 1GB I don't want a log file. (right now this will display the file info for over 1GB but if its less that 1GB it still creates a log file with no data). I don't want it to create a log for anything less than 1GB.
Any idea on how to do that?
Thanks!
Ryan
Get-ChildItem -Path C:\Tomcat6.0.20\logs -File -Recurse -ErrorAction SilentlyContinue |
Where-Object {$_.Length -gt 1GB} |
Sort-Object length -Descending |
Select-Object Name,#{n='GB';e={"{0:N2}" -F ($_.length/ 1GB)}} |
Format-List Name,Directory,GB > C:\Users\jensen\Desktop\FolderSize\filesize.log`
First, set a variable with the term/filter you're after and store the results
$items = Get-ChildItem -Path C:\Tomcat6.0.20\logs -File -Recurse -ErrorAction SilentlyContinue |
Where-Object {$_.Length -gt 1GB} |
Sort-Object Length -Descending |
Select-Object Name,#{n='GB';e={"{0:N2}" -F ($_.length/ 1GB)}}
Then pipe that to Out-File to your desired output path. This example will output a file to the Desktop of the user running the script, change as needed:
$items | Out-File -FilePath $ENV:USERPROFILE\Desktop\filesize.log -Force
The -Force parameter will overwrite an existing filesize.log file if one already exists.
To make sure you don't write blank files you should collect the minimal starting results that match your filter, and test them to see if they contain anything at all.
Then if they on;t you can ed the script, but if they do you can go on to do the sort and select the data and output it to a log file.
# Collect Matching Files
$Matched = GCI -Path "C:\Tomcat6.0.20\logs" -File -R -ErrorA "SilentlyContinue" | ? {
$_.Length -gt 1GB
}
# Check is $Matched contains Results before further processing, otherwise, we're done!
IF ([bool]$Matched) {
# If here, we have Data so select what we need and output to the log file:
$Matched | Sort Length -D | FT Name,Directory,#{
n="GB";e={"{0:N2}" -F ($_.Length/ 1GB)}
} -Auto | Out-File "C:\Users\jensen\Desktop\FolderSize\filesize_$(Get-Date -F "yyyy-MM-dd_HH-mm-ss").log"
}
In the above script, I fixed the $. to be $_., and separated Matching the 1GB files from Manipulating them, and Outputting them to a file.
We simply test if any files were matched at 1 GB by checking to see if the Variable has any results or is $NULL/undefined.
If so, there is no need to take any further action.
Only when 1Gb files are matched do we quickly sort them, and select the details you wanted, but instead we'll just Use Format-Table (FT) with -Auto-size to get nice looking output that is much easier to review for this sort of data.
(Note Format-Table selects and formats the info into a table in one step, saving the unnecessary step of using Select to get the data and then piping (|) it to Format-list or Format-Table, as that just adds a bit of a redundant step. Select-Object is best used when you will need to do further steps with that data that require "object-oriented" operations in future steps.)
Then we pipe that output to save it all to a Log file using Out-File, and I also changed the log file name to contain the current date and time in ISO format filesize_$(Get-Date -F "yyyy-MM-dd_HH-mm-ss").log So you can save each run and review them later and won't want to have one gigantic file or have no history of runs.

Check file names and delete files with lower number in suffix

I have a following task on PowerShell:
I need to check files on remote machines:
For instance:
Get-ChildItem \\ServerName\data\
In this folder I have following files:
standard_file.0.tst
standard_file.1.tst
standard_file.2.tst
standard_file.3.tst
So, i need to delete files with lower number prefix (based on file name).
In the end, into the folder should be only one file with biggest prefix.
For instance:
standard_file.3.tst
I broke up my mind - and have no any ideas how to perform this.
Could you please push me to the right direction?
Thanks in advance.
You could use a regex to get the number and cast it to an int. Then sort the filenames by the number using the Sort-Object cmdlet so the file with the highest number will be the last. Then you select all objects using Select-Object and skip the last one and finally remove it using Remove-Item:
Get-ChildItem '\ServerName\data\' |
Sort-Object { [int][regex]::Match($_, '.*?(\d+)\.[^.]+$').Groups[1].Value } |
Select-Object -SkipLast 1 |
Remove-Item
Regex:
.*?(\d+)\.[^.]+$
This will gather all the files in the path that have numerical suffixes in their names. The way that is done is by using a regex to match all of the digits on the end of the basename. Sorting on the result of that match in descending order will put the wanted file on the top of the list. We then remove the remaining files by skipping that first result.
$path = "c:\temp"
Get-ChildItem $path | Where-Object{$_.BaseName -match "\.\d+$"} |
Sort-Object -Property {$_.BaseName -match "\.(\d+)$";[int]$Matches[1]} -Descending |
Select-Object -Skip 1 |
Remove-Item -Confirm:$false -WhatIf
Remove -WhatIf when you are sure that it will remove the files you want.

Delete files containing string

How can I delete all files in a directory that contain a string using powershell?
I've tried something like
$list = get-childitem *.milk | select-string -pattern "fRating=2" | Format-Table Path
$list | foreach { rm $_.Path }
And that worked for some files but did not remove everything. I've tried other various things but nothing is working.
I can easily get the list of file names and can create an array with the path's only using
$lista = #(); foreach ($f in $list) { $lista += $f.Path; }
but can't seem to get any command (del, rm, or Remove-Item) to do anything. Just returns immediately without deleting the files or giving errors.
Thanks
First we can simplify your code as:
Get-ChildItem "*.milk" | Select-String -Pattern "fRating=2" | Select-Object -ExcludeProperty path | Remove-Item -Force -Confirm
The lack of action and errors might be addressable by one of two things. The Force parameter which:
Allows the cmdlet to remove items that cannot otherwise be changed,
such as hidden or read-only files or read-only aliases or variables.
I would aslo suggest that you run this script as administrator. Depending where these files are located you might not have permissions. If this is not the case or does not work please include the error you are getting.
Im going to guess the error is:
remove-item : Cannot remove item C:\temp\somefile.txt: The process cannot access the file 'C:\temp\somefile.txt'
because it is being used by another process.
Update
In testing, I was also getting a similar error. Upon research it looks like the Select-String cmd-let was holding onto the file preventing its deletion. Assumption based on i have never seen Get-ChildItem do this before. The solution in that case would be encase the first part of this in parentheses as a sub expression so it would process all the files before going through the pipe.
(Get-ChildItem | Select-String -Pattern "tes" | Select-Object -ExpandProperty path) | Remove-Item -Force -Confirm
Remove -Confirm if deemed required. It exists as a precaution so that you don't open up a new powershell in c:\windows\system32 and copy paste a remove-item cmdlet in there.
Another Update
[ and ] are wildcard searches in powershell in order to escape those in some cmdlets you use -Literalpath. Also Select-String can return multiple hits in files so we should use -Unique
(Get-ChildItem *.milk | Select-String -Pattern "fRating=2" | Select-Object -ExpandProperty path -Unique) | ForEach-Object{Remove-Item -Force -LiteralPath $_}
Why do you use select-string -pattern "fRating=2"? You would like to select all files with this name?
I think the Format-Table Path don't work. The command Get-ChildItem don't have a property called "Path".
Work this snipped for you?
$list = get-childitem *.milk | Where-Object -FilterScript {$_.Name -match "fRating=2"}
$list | foreach { rm $_.FullName }
The following code gets all files of type *.milk and puts them in $listA, then uses that list to get all the files that contain the string fRating=[01] and stores them in $listB. The files in $listB are deleted and then the number of files deleted versus the number of files that contained the match is displayed(they should be equal).
sv -name listA -value (Get-ChildItem *.milk); sv -name listB -value ($listA | Select-String -Pattern "fRating=[01]"); (($listB | Select-Object -ExpandProperty path) | ForEach-Object {Remove-Item -Force -LiteralPath $_}); (sv -name FCount -value ((Get-ChildItem *.milk).Count)); Write-Host -NoNewline Files Deleted ($listA.Count - $FCount)/($listB.Count)`n;
No need to complicate things:
1. $sourcePath = "\\path\to\the\file\"
2. Remove-Item "$sourcePath*whatever*"
I tried the answer, unfortunately, errors seems to always come up, however, I managed to create a solution to get this done:
Without using Get-ChilItem; You can use select-string directly to search for files matching a certain string, yes, this will return the filename:count:content ... etc, but, internally these have names that you can chose or omit, the one you need is the "filename" to do this pipe this into "select-object" choosing the "FileName" from the output.
So, to select all *.MSG files that has the pattern of "Subject: Webservices restarted", you can do the following:
Select-String -Path .*.MSG -Pattern 'Subject: WebServices Restarted'
-List | select-object Filename
Also, to remove these files on the fly, you could pip into a ForEach statement with the RM command as follows:
Select-String -Path .*.MSG -Pattern 'Subject: WebServices Restarted'
-List | select-object Filename | foreach { rm $_.FileName }
I tried this myself, works 100%.
I hope this helps