PowerShell CSV Compare files with dynamic file names [duplicate]

We produce files with a date in the name (the * below is the wildcard for the date).
I want to grab the latest file; the folder that contains the file also has a date (month only) in its title.
I am using PowerShell and I am scheduling it to run each day. Here is the script so far:
$LastFile = *_DailyFile
$compareDate = (Get-Date).AddDays(-1)
$LastFileCaptured = Get-ChildItem -Recurse | Where-Object {$LastFile.LastWriteTime -ge $compareDate}

If you want the latest file in the directory and you are using only the LastWriteTime to determine the latest file, you can do something like below:
gci path | sort LastWriteTime | select -last 1
On the other hand, if you want to rely only on the names that have the dates in them, you should be able to do something similar:
gci path | select -last 1
Also, if there are directories in the directory, you might want to add a ?{-not $_.PsIsContainer} to filter them out.
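Putting those pieces together, a minimal sketch (with path standing in for your folder):
# Skip directories, then take the file with the newest LastWriteTime
Get-ChildItem path | Where-Object { -not $_.PSIsContainer } |
    Sort-Object LastWriteTime | Select-Object -Last 1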

Yes, I think this would be quicker (note that Sort-Object's -Top parameter requires PowerShell 6 or later):
Get-ChildItem $folder | Sort-Object -Descending -Property LastWriteTime -Top 1
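On Windows PowerShell 5.1, where Sort-Object has no -Top parameter, an equivalent sketch:
# Sort newest-first, then keep just the first result
Get-ChildItem $folder | Sort-Object -Descending -Property LastWriteTime | Select-Object -First 1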

Try:
$latest = (Get-ChildItem -Attributes !Directory | Sort-Object -Descending -Property LastWriteTime | select -First 1)
$latest_filename = $latest.Name
Explanation:
PS C:\Temp> Get-ChildItem -Attributes !Directory *.txt | Sort-Object -Descending -Property LastWriteTime | select -First 1
Directory: C:\Temp
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 5/7/2021 5:51 PM 1802 Prison_Mike_autobiography.txt
Get-ChildItem -Attributes !Directory *.txt (or plain Get-ChildItem / gci): gets the list of files only (no directories) in the current directory. A file-extension filter such as *.txt can be added as needed.
Sort-Object -Descending -Property LastWriteTime: sorts the files by LastWriteTime (modified time) in descending order.
select -First 1: takes the first/top record (select is an alias for Select-Object).
Getting file metadata
PS C:\Temp> $latest.Name
Prison_Mike_autobiography.txt
PS C:\Temp> $latest.DirectoryName
C:\Temp
PS C:\Temp> $latest.FullName
C:\Temp\Prison_Mike_autobiography.txt
PS C:\Temp> $latest.CreationTime
Friday, May 7, 2021 5:51:19 PM
PS C:\Temp> $latest.Mode
-a----

@manojlds's answer is probably the best for the scenario where you are only interested in files within a root directory:
\path
    \file1
    \file2
    \file3
However, if the files you are interested in are part of a tree of files and directories, such as:
\path
    \file1
    \file2
    \dir1
        \file3
    \dir2
        \file4
To find, recursively, the list of the 10 most recently modified files in Windows, you can run:
PS > $Path = pwd # your root directory
PS > $ChildItems = Get-ChildItem $Path -Recurse -File
PS > $ChildItems | Sort-Object LastWriteTime -Descending | Select-Object -First 10 FullName, LastWriteTime

You could try sorting descending ("sort LastWriteTime -Descending") and then "select -First 1". I am not sure which one is faster.
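If you want to check for yourself, a quick sketch with Measure-Command (timings will vary with directory size):
# Sort ascending and take the last item...
Measure-Command { Get-ChildItem | Sort-Object LastWriteTime | Select-Object -Last 1 }
# ...versus sort descending and take the first item
Measure-Command { Get-ChildItem | Sort-Object LastWriteTime -Descending | Select-Object -First 1 }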

Related

How to get last modified folder name in powershell?

I want to get the name of the last modified folder. I have tried the command below, but it is not giving me the correct folder name.
(Get-ChildItem c:\ -Directory).Name | Sort-object -Property lastWriteTime -Descending | Select -First 1
Don't select the name in Get-ChildItem but in the later select, and use -First 1 because you are already sorting in descending order:
Get-ChildItem c:\ -Directory | Sort-object -Property lastWriteTime -Descending | select name -first 1
(Get-ChildItem -Path C:\example -Directory | Sort-Object LastWriteTime | Select-Object -Last 1).Name
Get-ChildItem -Path C:\example -Directory: gets a list of all the subfolders in the "C:\example" directory.
Sort-Object LastWriteTime: sorts the folders by their last modified date.
Select-Object -Last 1: selects the last folder in the sorted list.
.Name: displays the name of the selected folder.

Search for specific folder name in filtered path

I need to search for a specific folder name in a filtered path.
For example, I have some folders on disk M: like:
M:\
├───2.46.567
│   └───A
├───3.09.356
│   └───A
├───4.05.123
│   └───A
└───4.05.124
    └───B
I want to search for folder A only under the 4.05.xxx directories, and I also want to check whether that folder is the last one that contains folder A.
I have tried something like the following command:
Get-ChildItem -Path m:\* -recurse -filter '*4.05*' | sort -descending LastWriteTime
Can I do this in PowerShell?
Get-ChildItem allows wildcards at several levels in a path, not just the last, so no -Recurse is needed.
Get-ChildItem 'M:\4.05.*\A' -Directory | Sort-Object -Descending LastWriteTime
In the above tree this returns just one entry:
Directory: M:\4.05.123
Mode LastWriteTime Length Name
---- ------------- ------ ----
d----- 2019-08-30 12:10 A
An alternative with the same result based on above tree:
Get-ChildItem -Path 'M:\4.05.*' -Filter A -Recurse -Directory | Sort-Object -Descending LastWriteTime
PowerShell Version 2 variant
Get-ChildItem 'M:\4.05.*\A' | Where-Object {$_.PSIsContainer} |
Sort-Object -Desc LastWriteTime | Select-Object -First 1 | Set-Location
Try this:
param(
    $SourceDir = "M:\"
)
$a = gci $SourceDir | foreach { $i = gci $SourceDir\$_ -Name; if ($i.Equals("A")) { "$_" } }
for ($h = 0; $h -le $a.Length - 1; $h++) {
    if ($a[$h] -like "4.05.*") {
        $a[$h]
        if ($a[$h].Equals($a[$a.Length - 1])) {
            "It is the last one."
        }
    }
}
This will return all folders that contain a folder "A" and have "4.05." as part of their name. It will also report whether a match is the last folder in the array, and therefore the last folder that contains "A".
You can also use Resolve-Path.
(Resolve-Path "M:\4.05.*\A").ProviderPath
This returns the string (not the folder object!) of the paths you're after.
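If you need the folder objects again (for example, to sort by LastWriteTime), a sketch that feeds the resolved path strings back through Get-Item:
# Resolve the wildcard paths, then rehydrate DirectoryInfo objects
Get-Item (Resolve-Path 'M:\4.05.*\A').ProviderPath |
    Sort-Object LastWriteTime -Descending |
    Select-Object -First 1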

Column Directory empty?

I thought that applying this answer about replacing a path would work for:
$folder = 'C:\test'
$List = Get-ChildItem $folder -Recurse | Sort-Object -Property LastWriteTime
$List | Format-Table name, LastWriteTime, @{Label="Directory"; Expression={$_.Directory.Replace($folder, "")}}
Instead I get nothing in the Directory column, whereas I should get
\subfolder\
since the files are in
c:\test\subfolder
Name LastWriteTime Directory
---- ------------- ---------
test.1.png 7/21/2018 10:20:44 PM
test.2.png 7/21/2018 10:21:16 PM
test.3.png 7/21/2018 10:21:43 PM
subfolder 9/10/2018 6:53:28 PM
The Directory member of the file objects returned by Get-ChildItem is a System.IO.DirectoryInfo, not a string, so the calculated property's String.Replace call has nothing to work on. DirectoryInfo does have a Name member that can be used.
PS H:\clan\2018-09-05> (Get-ChildItem).Directory | Get-Member
TypeName: System.IO.DirectoryInfo
Try using:
Get-ChildItem | ForEach-Object { $_.Directory.Name }
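And if the original goal was the path relative to $folder, a sketch that does the string replace on the file's string-typed DirectoryName property instead (restricted to files, since directories have no DirectoryName):
$folder = 'C:\test'
Get-ChildItem $folder -Recurse -File | Sort-Object LastWriteTime |
    Format-Table Name, LastWriteTime,
        @{Label="Directory"; Expression={$_.DirectoryName.Replace($folder, "")}}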

Powershell to display duplicate files

I have a task to check whether new files have been imported into a shared folder for the day, and to alert on any duplicate files; no recursive check is needed.
The code below displays the details (with size) of all files that are one day old. However, I need only the files with the same size, as I cannot compare them by name.
$Files = Get-ChildItem -Path E:\Script\test |
Where-Object {$_.CreationTime -gt (Get-Date).AddDays(-1)}
$Files | Select-Object -Property Name, hash, LastWriteTime, @{N='SizeInKb';E={[double]('{0:N2}' -f ($_.Length/1kb))}}
I didn't like the big DOS-like script answer written here, so here's an idiomatic way of doing it in PowerShell.
From the folder in which you want to find the duplicates, just run this simple pipeline:
Get-ChildItem -Recurse -File `
| Group-Object -Property Length `
| ?{ $_.Count -gt 1 } `
| %{ $_.Group } `
| Get-FileHash `
| Group-Object -Property Hash `
| ?{ $_.Count -gt 1 } `
| %{ $_.Group }
Which will show all files and their hashes that match other files.
Each line does the following:
get files
from current directory (use -Path $directory otherwise)
recursively (if not wanted, remove -Recurse)
group based on file size
discard groups with less than 2 files
grab all those files
get hashes for each
group based on hash
discard groups with less than 2 files
get all those files
Add | %{ $_.path } to just show the paths instead of the hashes.
Add | %{ $_.path -replace "$([regex]::escape($(pwd)))",'' } to only show the relative path from the current directory (useful in recursion).
For the question-asker specifically, don't forget to add | Where-Object {$_.CreationTime -gt (Get-Date).AddDays(-1)} right after the gci so you're not comparing files you don't want to consider; otherwise this can get very time-consuming if the shared folder holds many coincidentally same-length files. See the sketch below.
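Put together for the asker's scenario (non-recursive, only files from the last day; E:\Script\test is the path from the question), a sketch:
Get-ChildItem -Path E:\Script\test -File |
    Where-Object { $_.CreationTime -gt (Get-Date).AddDays(-1) } |
    Group-Object -Property Length |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object { $_.Group } |
    Get-FileHash |
    Group-Object -Property Hash |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object { $_.Group.Path }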
Finally, if you're like me and just wanted to find dupes based on name, as google will probably take you here too:
gci -Recurse -file | Group-Object name | Where-Object { $_.Count -gt 1 } | select -ExpandProperty group | %{ $_.fullname }
All the examples here take into account only timestamp, length, and name. That is certainly not enough.
Imagine this example:
You have two files, c:\test_path\test.txt and c:\test_path\temp\text.txt.
The first one contains 12345. The second contains 54321. In this case the files will be considered identical even though they are not.
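A quick way to reproduce that pitfall (the contents and paths below come from the example above; writing the files is just for demonstration):
# Build the two same-length files from the example
New-Item -ItemType Directory C:\test_path\temp -Force | Out-Null
Set-Content -NoNewline C:\test_path\test.txt '12345'
Set-Content -NoNewline C:\test_path\temp\text.txt '54321'
# Same length, so grouping by Length lumps them together...
(Get-Item C:\test_path\test.txt).Length -eq (Get-Item C:\test_path\temp\text.txt).Length    # True
# ...but the content hashes differ
(Get-FileHash C:\test_path\test.txt).Hash -eq (Get-FileHash C:\test_path\temp\text.txt).Hash # False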
I have created a duplicate checker based on hash calculation. It was written just now off the top of my head, so it is rather crude (but I think you get the idea, and it will be easy to optimize).
Edit: I've decided the source code was "too crude" (a nickname for incorrect), and I have improved it (removed superfluous code):
# The current directory where the script is executed
$path = (Resolve-Path .\).Path
$hash_details = @{}
$duplicities = @{}

# Remove unique records by size (different size = different hash)
# You can select only those you need with e.g. "*.jpg"
$file_names = Get-ChildItem -Path $path -Recurse -Include "*.*" |
    ? { ! $_.PSIsContainer } |
    Group Length | ? { $_.Count -gt 1 } |
    Select -Expand Group | Select FullName, Length

# I'm using SHA256 due to the SHA1 collisions found
$hash_details = ForEach ($file in $file_names) {
    Get-FileHash -Path $file.FullName -Algorithm SHA256
}

# Just a counter for the hash table key
$counter = 0
ForEach ($first_file_hash in $hash_details) {
    ForEach ($second_file_hash in $hash_details) {
        If (($first_file_hash.Hash -eq $second_file_hash.Hash) -and ($first_file_hash.Path -ne $second_file_hash.Path)) {
            $duplicities.Add($counter, $second_file_hash)
            $counter += 1
        }
    }
}

## Output the duplicate files
If ($duplicities.Count -gt 0) {
    Write-Output "Duplicate files found:" $duplicities.Values.Path
    $duplicities.Values | Out-File -Encoding UTF8 duplicate_log.txt
} Else {
    Write-Output 'No duplicities found'
}
I have created a test structure:
PS C:\prg\PowerShell\_Snippets\_file_operations\duplicities> Get-ChildItem -path $path -Recurse
Directory: C:\prg\PowerShell\_Snippets\_file_operations\duplicities
Mode LastWriteTime Length Name
---- ------------- ------ ----
d---- 9.4.2018 9:58 test
-a--- 9.4.2018 11:06 2067 check_for_duplicities.ps1
-a--- 9.4.2018 11:06 757 duplicate_log.txt
Directory: C:\prg\PowerShell\_Snippets\_file_operations\duplicities\test
Mode LastWriteTime Length Name
---- ------------- ------ ----
d---- 9.4.2018 9:58 identical_file
d---- 9.4.2018 9:56 t
-a--- 9.4.2018 9:55 5 test.txt
Directory: C:\prg\PowerShell\_Snippets\_file_operations\duplicities\test\identical_file
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 9.4.2018 9:55 5 test.txt
Directory: C:\prg\PowerShell\_Snippets\_file_operations\duplicities\test\t
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 9.4.2018 9:55 5 test.txt
(where the file in ..\duplicities\test\t is different from the others).
The result of running the script:
The console output:
PS C:\prg\PowerShell\_Snippets\_file_operations\duplicities> .\check_for_duplicities.ps1
Duplicate files found:
C:\prg\PowerShell\_Snippets\_file_operations\duplicities\test\identical_file\test.txt
C:\prg\PowerShell\_Snippets\_file_operations\duplicities\test\test.txt
The duplicate_log.txt file contains more detailed information:
Algorithm Hash Path
--------- ---- ----
SHA256 5994471ABB01112AFCC18159F6CC74B4F511B99806DA59B3CAF5A9C173CACFC5 C:\prg\PowerShell\_Snippets\_file_operations\duplicities\test\identical_file\test.txt
SHA256 5994471ABB01112AFCC18159F6CC74B4F511B99806DA59B3CAF5A9C173CACFC5 C:\prg\PowerShell\_Snippets\_file_operations\duplicities\test\test.txt
Conclusion
As you see the different file is correctly omitted from the result set.
Since it is the file contents you are determining to be duplicates, it's more prudent to just hash the files and compare the hashes.
Name, size, and timestamp are not reliable attributes for this use case, whereas the hash tells you whether the files have the same content.
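A minimal sketch of that comparison for two specific files (paths reuse the earlier example):
$a = Get-FileHash 'C:\test_path\test.txt'
$b = Get-FileHash 'C:\test_path\temp\text.txt'
$a.Hash -eq $b.Hash   # True only if the contents match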
See these discussions
Need a way to check if two files are the same? Calculate a hash of
the files. Here is one way to do it:
https://blogs.msdn.microsoft.com/powershell/2006/04/25/duplicate-files
Duplicate File Finder and Remover
And now the moment you have been waiting for....an all PowerShell file
duplicate finder and remover! Now you can clean up all those copies of
pictures, music files, and videos. The script opens a file dialog box
to select the target folder, recursively scans each file for duplica
https://gallery.technet.microsoft.com/scriptcenter/Duplicate-File-Finder-and-78f40ae9
This might be helpful for you:
# Group files created within the last day by size
$files = Get-ChildItem 'E:\SC' |
    Where-Object { $_.CreationTime -gt (Get-Date).AddDays(-1) } |
    Group-Object -Property Length
# Open every file whose size matches at least one other file
foreach ($filegroup in $files) {
    if ($filegroup.Count -ne 1) {
        foreach ($file in $filegroup.Group) {
            Invoke-Item $file.FullName
        }
    }
}
