PowerShell "if more than one, then delete all but one"

Is there a way to do something like this in PowerShell:
"If more than one file includes a certain set of text, delete all but one"
Example:
"...Cam1....jpg"
"...Cam2....jpg"
"...Cam2....jpg"
"...Cam3....jpg"
Then I would want one of the two "...Cam2....jpg" deleted, while the other one should stay.
I know that I can use something like
gci *Cam2* | del
but I don't know how I can make one of these files stay.
Also, for this to work, I need to look through all the files to see if there are any duplicates, which defeats the purpose of automating this process with a Powershell script.
I searched for a solution to this for a long time, but I just can't find something that is applicable to my scenario.

Get the files into a collection and use the range operator to select a subset of its elements. To remove all but the first element, start the range at index one, like so:
$cams = gci "*cam2*"
if ($cams.Count -gt 1) {
    $cams[1..($cams.Count - 1)] | Remove-Item
}
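For illustration, here is the same slice on a plain array (nothing file-specific about it):
$a = 'one','two','three','four'
$a[1..($a.Count - 1)]   # -> two, three, four (everything but the first)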

Expanding on the idea of commenter boxdog:
# Find all duplicately named files.
$dupes = Get-ChildItem c:\test -file -recurse | Group-Object Name | Where-Object Count -gt 1
# Delete all duplicates except the 1st one per group.
$dupes | ForEach-Object { $_.Group | Select-Object -Skip 1 | Remove-Item -Force }
I've split this up into two subtasks to make it easier to understand. It is also a good idea to separate directory iteration from file deletion, to avoid inconsistent results.
The first statement uses Group-Object to group the files by name. Each group carries a Count property containing the number of files in it. Where-Object is then used to keep only the groups that contain more than one file, which are the dupes. The result is stored in the variable $dupes, an array that looks like this:
Count Name      Group
----- ----      -----
    2 file1.txt {C:\test\subdir1\file1.txt, C:\test\subdir2\file1.txt}
    2 file2.txt {C:\test\subdir1\file2.txt, C:\test\subdir2\file2.txt}
The second statement uses ForEach-Object to iterate over the groups of duplicates. The Group property produced by Group-Object in the first statement contains an array of file objects. Using Select-Object -Skip 1 we select all but the first element of this array, and these are passed to Remove-Item to delete the files.
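Select-Object -Skip 1 on its own simply drops the first pipeline element:
'first','second','third' | Select-Object -Skip 1   # -> second, third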

Related

PowerShell: Find similar filenames in a directory

In a purely hypothetical situation where a person downloaded some TV episodes but wonders whether he/she accidentally downloaded an HDTV, a WEBRip and a WEB-DL version of the same episode, how could PowerShell find these 'duplicates' so the lower-quality versions can be automagically deleted?
First, I'd get all the files in the directory:
$Files = Get-ChildItem -Path $Directory -Exclude '*.nfo','*.srt','*.idx','*.sub' |
    Sort-Object -Property Name
I exclude the non-video extensions for now, since they would cause false positives. I would still have to deal with them though (during the delete phase).
At this point, I would likely use a ForEach construct to parse through the files one by one and look for files that have the same episode number. If there are any, they should be looked at.
Assuming the common spaces-become-dots naming convention, a typical filename would be AwesomeSeries.S01E01.HDTV.x264-RLSGRP
To compare, I need to get only the episode number. In the above case, that means S01E01:
If ($File.BaseName -match 'S*(\d{1,2})(x|E)(\d{1,2})') { $EpisodeNumber = $Matches[0] }
In the case of S01E01E02 I would simply add a second if-statement, so I'm not concerned with that for now.
$EpisodeNumber should now contain S01E01. I can use that to discover if there are any other files with that episode number in $Files. I can do that with:
$Files -match $EpisodeNumber
This is where my trouble starts. The above will also return the file I'm currently processing. I could handle the duplicates immediately at this point, but then I would have to run Get-ChildItem again, because otherwise the same match would be returned when the ForEach construct reaches the already-handled duplicate file, which would then result in an error.
I could store the files I wish to delete in an array and process them after the ForEach construct is over, but then I'd still have to filter out the duplicates. After all, in the ForEach loop, AwesomeSeries.S01E01.HDTV.x264-RLSGRP would first match AwesomeSeries.S01E01.WEB-DL.x264.x264-RLSGRP, only for AwesomeSeries.S01E01.WEB-DL.x264.x264-RLSGRP to match AwesomeSeries.S01E01.HDTV.x264-RLSGRP afterwards.
So maybe I should process every episode number only once, but how?
I get the feeling I'm being very inefficient here and there must be a better way to do this, so I'm asking for help. Can anyone point me in the right direction?
Filter the $Files array to exclude the current file when matching:
($Files | Where-Object {$_.FullName -ne $File.FullName}) -match $EpisodeNumber
Regarding the duplicates in the array at the end, you can use Select-Object -Unique to get only distinct entries.
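For example:
'a.txt','b.txt','a.txt' | Select-Object -Unique   # -> a.txt, b.txt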
Since you know how to get the episode number, let's use that to group the files together.
$Files = Get-ChildItem -Path $Directory -Exclude '*.nfo','*.srt','*.idx','*.sub' |
    Select-Object FullName, @{Name="EpisodeIndex";Expression={
        # We do not have to do it like this, but if your detection logic gets more
        # complicated, an expanded block like this is cleaner than a one-liner
        # calculated property.
        If ($_.BaseName -match 'S*(\d{1,2})(x|E)(\d{1,2})') { $Matches[0] }
    }}
# Group the files by season episode index (those that have one).
# Return groups that have more than one member, as those would need attention.
$Files | Where-Object{ $_.EpisodeIndex } | Group-Object -Property EpisodeIndex |
    Where-Object{ $_.Count -gt 1 } | ForEach-Object{
        # Expand the group members.
        $_.Group
        # Not sure how you plan on dealing with it.
    }
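If you do want to act on each group right there, here is a rough sketch. Big caveat: the $quality ranking below (worst to best) is my own assumption, not anything stated above, and -WhatIf is left on so nothing is actually deleted until you remove it.
$quality = 'HDTV', 'WEBRip', 'WEB-DL'   # assumed ranking, worst first
$Files | Where-Object{ $_.EpisodeIndex } | Group-Object -Property EpisodeIndex |
    Where-Object{ $_.Count -gt 1 } | ForEach-Object{
        # Sort each group by the position of its quality tag; untagged files sort first.
        $sorted = $_.Group | Sort-Object {
            $name = $_.FullName
            $quality | Where-Object { $name -match $_ } |
                ForEach-Object { $quality.IndexOf($_) } |
                Select-Object -First 1
        }
        # Keep the last (highest-ranked) file, remove the rest (needs PowerShell 5+).
        $sorted | Select-Object -SkipLast 1 |
            ForEach-Object { Remove-Item -LiteralPath $_.FullName -WhatIf }
    }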

PowerShell find most recent file

I'm new to PowerShell and scripting in general. I've been doing lots of reading and testing, and this is my first post.
Here is what I am trying to do. I have a folder that contains sub-folders for each report that runs daily. A new sub-folder is created each day.
The file names in the sub-folders are the same with only the date changing.
I want to get a specific file from yesterday's folder.
Here is what I have so far:
Get-ChildItem -filter "MBVOutputQueriesReport_C12_Custom.html" -recurse -path D:\BHM\Receive\ | where(get-date).AddDays(-1)
Both parts (before and after pipe) work. But when I combine them it fails.
What am I doing wrong?
What am I doing wrong?
0,1,2,3,4,5 | Where { $_ -gt 3 }
This will compare the incoming number from the pipeline ($_) with 3 and allow things that are greater than 3 to get past it - that is, whenever the $_ -gt 3 test evaluates to $True.
0,1,2,3,4,5 | where { $_ }
This has nothing to compare against, so it casts the value itself to Boolean - 'truthy' or 'falsey' - and lets everything 'truthy' through: 0 is dropped, the rest are allowed.
Get-ChildItem | where Name -eq 'test.txt'
without the {} is a syntax where Where-Object expects Name to be a property of the things coming through the pipeline (in this case, file objects), compares those against 'test.txt', and only allows file objects with that name to go through.
Get-ChildItem | where Length
In this case, the property it's looking for is Length (the file size) and there is no comparison given, so it's back to doing the "casting to true/false" thing from earlier. This will only show files with some content (non-0 length), and will drop 0 size files, for example.
OK, that brings me to your code:
Get-ChildItem | where(get-date).AddDays(-1)
With no {} and only one thing given to Where, it expects that parameter to be a property name, and casts the value of that property to true/false to decide what to do. So this says: "filter where the things in the pipeline have a property named "09/08/2016 14:12:06" (yesterday's date with the current time) and the value of that property is truthy". No file has a property whose name is yesterday's date, so that lookup returns $null for every file, and Where drops everything from the pipeline.
You can do as Jimbo answers, and filter by comparing the file's write time against yesterday's date. But if you know the files and folders are named in date order, you can save yourself recursing through the entire folder tree and looking at everything, because you know what yesterday's file will be called. Although you didn't say how your folders are named, you could take an approach like
$yesterday = (Get-Date).AddDays(-1).ToString('MM-dd-yyyy')
Get-ChildItem "d:\receive\bhm\$yesterday\MBVOutputQueriesReport_C12_Custom.html"
# (or whatever date pattern gets you directly to that file)
or
Get-ChildItem | sort -Property CreationTime -Descending | Select -Skip 1 -First 1
to get the 'last but one' thing, ordered by reverse created date.
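The same pattern on plain numbers, for intuition:
5,3,9,7 | Sort-Object -Descending | Select-Object -Skip 1 -First 1   # -> 7 (the second-largest)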
Read the output of get-date | Get-Member -MemberType Property, and then apply the Where-Object docs:
Get-ChildItem -filter "MBVOutputQueriesReport_C12_Custom.html" -recurse -path D:\BHM\Receive\ |
    Where-Object {$_.LastWriteTime.Date -eq (get-date).AddDays(-1).Date}
Try:
where {$_.lastwritetime.Day -eq ((get-date).AddDays(-1)).Day}
You could pipe the results to the Sort command, and pipe that to Select to just get the first result.
Get-ChildItem -filter "MBVOutputQueriesReport_C12_Custom.html" -recurse -path D:\BHM\Receive\ | Sort LastWriteTime -Descending | Select -First 1
You can do something like this:
$time = (get-date).AddDays(-1).Day
Get-ChildItem -Filter "MBVOutputQueriesReport_C12_Custom.html" -Recurse -Path D:\BHM\Receive\ | Where-Object { $_.LastWriteTime.Day -eq $time }

Get PathTwo Counts In Dynamic PathOne

I need to find all the PathTwos in PathOne under the directory Path; currently I can get all the PathOnes by using:
$path = Get-ChildItem "C:\Path\" | ?{ $_.PSIsContainer }
Since PathOne is dynamic (its name may be anything), this helps me loop through all the possible paths. Now, PathOne may have 2 or more folders, like PathTwo1, PathTwo2 and PathTwo3. I need to know how many folders are in each dynamic PathOne. Originally I thought that I could loop within PathOne, get the name of the dynamic path, then loop through PathOne counting all the PathTwos and return everything over 1; unfortunately that doesn't return what I need.
I've tried:
A loop within a loop: creates a mess and doesn't return the correct result.
Use C:\Path\.\ to get the count of the folders within PathOne, by jumping to whatever the next folder would be.
Based on comments, example:
C:\Path\PathOne1\PathTwo1
C:\Path\PathOne1\PathTwo2
C:\Path\PathOne2\PathTwo1
C:\Path\PathOne2\PathTwo2
C:\Path\PathOne2\PathTwo3
C:\Path\PathOne3\PathTwo1 # don't want because only one PathTwo
I don't care how many PathOnes there are, but I do need every PathOne that has more than one PathTwos.
You can get the desired result with this command:
# Get all items two levels deep.
Get-Item C:\Path\*\* |
# Get only directories.
Where-Object PSIsContainer |
# Group them by parent.
Group-Object {$_.Parent.FullName} -NoElement |
# Choose groups with a Count of more than one.
Where-Object Count -gt 1 |
# Select the name of the parent directory.
Select-Object -ExpandProperty Name
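Run against the example tree above, this outputs:
C:\Path\PathOne1
C:\Path\PathOne2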

PowerShell file rename and counter issue

I have made a very simple ps1 script that renames my txt files with an "ID" number, like
Card.321.txt
This works for single renames, but I need to mass-rename files, so I need something different.
$i = 1
Get-ChildItem *.txt | %{Rename-Item $_ -NewName ('Card.{0:D3}.txt' -f $i++)}
But I can only run it properly if there are no other files named Card.xxx.txt, and every day I get one new file that I store in an archive folder after it has been renamed.
How can I make a script that doesn't redo the whole mass renaming?
I need a counter that continues from yesterday's run of the same script.
Card.321.txt
Card.322.txt
Card.323.txt
Card.324.txt
Card.325.txt
ToDaysFiledToBeRenamed.txt
How about something like this? It's not the most efficient, but it should be easy to read.
$thePath = "C:\temp"
$allTXTFiles = Get-ChildItem $thePath *.txt
$filestoberenamed = $allTXTFiles | Where-Object{$_.BaseName -notmatch "^Card\.\d{3}$"}
$highestNumber = $allTXTFiles | Where-Object{$_.BaseName -match "^Card\.\d{3}$"} |
    ForEach-Object{[int]($_.BaseName -split "\.")[-1]} |
    Measure-Object -Maximum | Select-Object -ExpandProperty Maximum
# Pre-increment so the first new name is one past the current highest number.
$filestoberenamed | ForEach-Object{Rename-Item $_.FullName -NewName ('Card.{0:D3}.txt' -f (++$highestNumber))}
This collects all the files and splits them into two groups. From $allTXTFiles we filter the files that already follow the "Card" naming convention and parse out their numbers. Of those numbers, we determine the current highest one as $highestNumber.
Then we take the remaining files as $filestoberenamed and put them through your Rename-Item snippet, continuing from $highestNumber as the index.
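With the example listing above, ToDaysFiledToBeRenamed.txt would then become Card.326.txt, one past the highest existing number (325).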
Known Caveats
This would not act correctly if there is ever a file with a number higher than 999. The rename would happily create one, but currently I am only looking for files with exactly 3 digits. We could change the pattern to ^Card\.\d+$ instead; it depends on what logic you want.
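A hedged sketch of that variant - only the two regexes change, the rest stays as above:
$filestoberenamed = $allTXTFiles | Where-Object{$_.BaseName -notmatch "^Card\.\d+$"}
$highestNumber = $allTXTFiles | Where-Object{$_.BaseName -match "^Card\.\d+$"} |
    ForEach-Object{[int]($_.BaseName -split "\.")[-1]} |
    Measure-Object -Maximum | Select-Object -ExpandProperty Maximum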

Find the first file that matches a pattern using PowerShell

I would like to select any one ".xls" file in a directory. The problem is that the dir command can return different types.
gci *.xls
will return
object[] if there is more than one file
FileInfo if there is exactly one file
null if there are no files
I can deal with null, but how do I just select the "first" file?
You can force PowerShell into returning an array, even when only one item is present, by wrapping a statement into @(...):
@(gci *.xls)[0]
will work for each of your three cases:
it returns the first object of a collection of files
it returns the only object if there is only one
it returns $null if there wasn't any object to begin with
There is also the -First parameter to Select-Object:
Get-ChildItem -Filter *.xls | Select-Object -First 1
gci -Filter *.xls | select -First 1
which works pretty much identically to the above, except that the list of files doesn't need to be enumerated completely by Get-ChildItem, as the pipeline is stopped after the first item. This can make a difference when many files match the filter.