Removing duplicate files with Powershell - powershell

I have several thousand duplicate files (jar files, for example) and I'd like to use PowerShell to
Search through the file system recursively
Find the dups (either by name only, by a checksum method, or both)
Delete all duplicates but one.
I'm new to PowerShell and am throwing this out there to the PS folks who might be able to help.

try this:
ls *.txt -recurse | get-filehash | group -property hash | where { $_.count -gt 1 } | % { $_.group | select -skip 1 } | del
from: http://n3wjack.net/2015/04/06/find-and-delete-duplicate-files-with-just-powershell/
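If you want to preview what would be removed before committing, the same pipeline can be dry-run by adding -WhatIf to the final Remove-Item (del) step - a hedged variant of the one-liner above:
ls *.txt -recurse | get-filehash | group -property hash | where { $_.count -gt 1 } | % { $_.group | select -skip 1 } | del -WhatIf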

Keep a dictionary of files and delete a file when its name has already been encountered before:
$dict = @{}
dir c:\admin -Recurse | foreach {
    $key = $_.Name   # replace this with your checksum function
    $find = $dict[$key]
    if ($find -ne $null) {
        # current file is a duplicate
        # Remove-Item -Path $_.FullName ?
    }
    $dict[$key] = 0   # dummy placeholder to save memory
}
I used the file name as the key, but you can use a checksum if you want (or both) - see the code comment.
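For illustration, here is a minimal sketch of the same dictionary approach keyed on a content hash instead of the file name (it assumes Get-FileHash, available in PowerShell 4+; the folder path is just a placeholder and the Remove-Item stays commented out until you have verified the results):
$dict = @{}
dir c:\admin -Recurse -File | foreach {
    $key = (Get-FileHash -LiteralPath $_.FullName -Algorithm SHA256).Hash
    if ($dict.ContainsKey($key)) {
        # this file has the same content as one seen before
        # Remove-Item -LiteralPath $_.FullName -WhatIf
    }
    $dict[$key] = 0   # dummy placeholder to save memory
}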

Even though the question is old, I needed to clean up all duplicate files based on content. The idea is simple; the algorithm for it is not entirely straightforward. Here is the code, which accepts a "Path" parameter to delete duplicates from.
Function Delete-Duplicates {
    param(
        [Parameter(
            Mandatory=$True,
            ValueFromPipeline=$True,
            ValueFromPipelineByPropertyName=$True
        )]
        [string[]]$PathDuplicates)

    $DuplicatePaths =
        Get-ChildItem $PathDuplicates |
        Get-FileHash |
        Group-Object -Property Hash |
        Where-Object -Property Count -gt 1 |
        ForEach-Object {
            $_.Group.Path |
            Select -First ($_.Count - 1) }
    $TotalCount = (Get-ChildItem $PathDuplicates).Count
    Write-Warning ("You are going to delete {0} files out of {1} total. Please confirm the prompt" -f $DuplicatePaths.Count, $TotalCount)
    $DuplicatePaths | Remove-Item -Confirm
}
The script
a) Lists all child items
b) Retrieves the FileHash for each of them
c) Groups them by the Hash property (so all identical files end up in a single group)
d) Filters out the already-unique files (group count -eq 1)
e) Loops through each group and lists all but the last path - ensuring one file of each "Hash" always stays
f) Warns before proceeding, saying how many files there are in total and how many are going to be deleted.
Probably not the most performant option (hashing every file) but it ensures the file really is a duplicate.
Works perfectly fine for me :)
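A hypothetical call, just for illustration (the path and wildcard are placeholders; the confirmation prompt comes from the Remove-Item -Confirm inside the function):
Delete-Duplicates -PathDuplicates 'C:\temp\jars\*.jar'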

Evolution of @KaiWang's answer which:
Avoids calculating the hash of every single file by comparing file lengths first;
Allows choosing which file you want to keep (here it keeps the file with the longest name).
Get-ChildItem *.ttf -Recurse |
    Group -Property Length |
    Where { $_.Count -gt 1 } |
    ForEach { $_.Group } |
    Get-FileHash -Algorithm 'MD5' |
    Group -Property Hash |
    Where { $_.Count -gt 1 } |
    ForEach {
        $_.Group |
            Sort -Property @{ Expression = { $_.Path.Length } } |
            Select -SkipLast 1
    } |
    ForEach { $_.Path } |
    ForEach {
        Write-Host $_
        Del -LiteralPath $_
    }

Instead of just removing your duplicate files, you can replace each of them with a shortcut:
#requires -version 3
<#
.SYNOPSIS
    Duplicate cleanup script
.DESCRIPTION
    Finds duplicates by size, compares their MD5 checksums and groups them by size and MD5;
    can replace each duplicate with a shortcut pointing to the first (original) file.
.PARAMETER Path
    Path in which to search for duplicates
.PARAMETER ReplaceByShortcut
    If specified, duplicates are replaced with shortcuts
.PARAMETER MinLength
    Ignore files smaller than this size (in bytes)
.EXAMPLE
    .\Clean-Duplicate '\\dfs.adds\donnees\commun'
.EXAMPLE
    Search for duplicates of 10 KB and larger
    .\Clean-Duplicate '\\dfs.adds\donnees\commun' -MinLength 10000
.EXAMPLE
    .\Clean-Duplicate '\\dpm1\d$\Coaxis\Logiciels' -ReplaceByShortcut
#>
[CmdletBinding()]
param (
[string]$Path = '\\Contoso.adds\share$\path\data',
[switch]$ReplaceByShortcut = $false,
[int]$MinLength = 10*1024*1024 # 10 MB
)
$version = '1.0'
function Create-ShortCut ($ShortcutPath, $shortCutName, $Target) {
$link = "$ShortcutPath\$shortCutName.lnk"
$WshShell = New-Object -ComObject WScript.Shell
$Shortcut = $WshShell.CreateShortcut($link)
$Shortcut.TargetPath = $Target
#$Shortcut.Arguments ="shell32.dll,Control_RunDLL hotplug.dll"
#$Shortcut.IconLocation = "hotplug.dll,0"
$Shortcut.Description = "Duplicate copy"
#$Shortcut.WorkingDirectory ="C:\Windows\System32"
$Shortcut.Save()
# write-host -fore Cyan $link -nonewline; write-host -fore Red ' >> ' -nonewline; write-host -fore Yellow $Target
return $link
}
function Replace-ByShortcut {
Param(
[Parameter(ValueFromPipeline=$true,ValueFromPipelineByPropertyName=$true)]
$SameItems
)
begin{
$result = [pscustomobject][ordered]@{
    Replaced = @()
    Gain     = 0
    Count    = 0
}
}
Process{
$Original = $SameItems.group[0]
foreach ($doublon in $SameItems.group) {
if ($doublon -ne $Original) {
$result.Replaced += [pscustomobject][ordered]@{
lnk = Create-Shortcut -ShortcutPath $doublon.DirectoryName -shortCutName $doublon.BaseName -Target $Original.FullName
target = $Original.FullName
size = $doublon.Length
}
$result.Gain += $doublon.Length
$result.Count++
Remove-item $doublon.FullName -force
}
}
}
End{
$result
}
}
function Get-MD5 {
param (
[Parameter(Mandatory)]
[string]$Path
)
$HashAlgorithm = New-Object -TypeName System.Security.Cryptography.MD5CryptoServiceProvider
$Stream = [System.IO.File]::OpenRead($Path)
try {
$HashByteArray = $HashAlgorithm.ComputeHash($Stream)
} finally {
$Stream.Dispose()
}
return [System.BitConverter]::ToString($HashByteArray).ToLowerInvariant() -replace '-',''
}
if (-not $Path) {
if ((Get-Location).Provider.Name -ne 'FileSystem') {
Write-Error 'Specify a file system path explicitly, or change the current location to a file system path.'
return
}
$Path = (Get-Location).ProviderPath
}
$DuplicateFiles = Get-ChildItem -Path $Path -Recurse -File |
Where-Object { $_.Length -gt $MinLength } |
Group-Object -Property Length |
Where-Object { $_.Count -gt 1 } |
ForEach-Object {
$_.Group |
ForEach-Object {
$_ | Add-Member -MemberType NoteProperty -Name ContentHash -Value (Get-MD5 -Path $_.FullName)
}
$_.Group |
Group-Object -Property ContentHash |
Where-Object { $_.Count -gt 1 }
}
$somme = ($DuplicateFiles.group | Measure-Object length -Sum).sum
write-host "$($DuplicateFiles.group.count) duplicates, totalling $($somme/1024/1024) MB" -fore cyan
if ($ReplaceByShortcut) {
$DuplicateFiles | Replace-ByShortcut
} else {
$DuplicateFiles
}

Related

PowerShell script to compare two directories (including subdirectories and contents) that are supposed to be identical but on different servers

I would like to run a PowerShell script that can be supplied a directory name by the user; it should then check that directory, its subdirectories, and all file contents of those directories to compare whether they are identical to each other across servers. There are 8 servers that should all have identical files and contents. The code below does not appear to be doing what I intended. I have seen the use of Compare-Object, Get-ChildItem, and Get-FileHash, but have not found the right combination that I am certain actually accomplishes the task. Any and all help is appreciated!
$35 = "\\server1\"
$36 = "\\server2\"
$37 = "\\server3\"
$38 = "\\server4\"
$45 = "\\server5\"
$46 = "\\server6\"
$47 = "\\server7\"
$48 = "\\server8\"
do{
Write-Host "|1 : New |"
Write-Host "|2 : Repeat|"
Write-Host "|3 : Exit |"
$choice = Read-Host -Prompt "Please make a selection"
switch ($choice){
1{
$App = Read-Host -Prompt "Input Directory Application"
}
2{
#rerun
}
3{
exit; }
}
$c35 = $35 + "$App" +"\*"
$c36 = $36 + "$App" +"\*"
$c37 = $37 + "$App" +"\*"
$c38 = $38 + "$App" +"\*"
$c45 = $45 + "$App" +"\*"
$c46 = $46 + "$App" +"\*"
$c47 = $47 + "$App" +"\*"
$c48 = $48 + "$App" +"\*"
Write-Host "Comparing Server1 -> Server2"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c36 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}
Write-Host "Comparing Server1 -> Server3"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c37 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}
Write-Host "Comparing Server1 -> Server4"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c38 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}
Write-Host "Comparing Server1 -> Server5"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c45 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}
Write-Host "Comparing Server1 -> Server6"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c46 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}
Write-Host "Comparing Server1 -> Server7"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c47 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}
Write-Host "Comparing Server1 -> Server8"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c48 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}
} until ($choice -eq 3)
Here is an example function that tries to compare one reference directory against multiple difference directories efficiently. It does so by comparing the most readily available information first and stopping at the first difference.
Get all relevant information about the files in the reference directory once, including hashes (though this could be optimized further by computing hashes only when necessary).
For each difference directory, compare in this order:
file count - if different, then the directories are obviously different
relative file paths - if not all paths from the difference directory can be found in the reference directory, the directories are different
file sizes - should be obvious
file hashes - hashes only need to be calculated for files of equal size
Function Compare-MultipleDirectories {
param(
[Parameter(Mandatory)] [string] $ReferencePath,
[Parameter(Mandatory)] [string[]] $DifferencePath
)
# Get basic file information recursively by calling Get-ChildItem with the addition of the relative file path
Function Get-ChildItemRelative {
param( [Parameter(Mandatory)] [string] $Path )
Push-Location $Path # Base path for Get-ChildItem and Resolve-Path
try {
Get-ChildItem -File -Recurse |
Select-Object FullName, Length, @{ n = 'RelativePath'; e = { Resolve-Path $_.FullName -Relative } }
} finally {
Pop-Location
}
}
Write-Verbose "Reading reference directory '$ReferencePath'"
# Create hashtable with all infos of reference directory
$refFiles = @{}
Get-ChildItemRelative $ReferencePath |
Select-Object *, @{ n = 'Hash'; e = { (Get-FileHash $_.FullName -Algorithm MD5).Hash } } |
ForEach-Object { $refFiles[ $_.RelativePath ] = $_ }
# Compare content of each directory of $DifferencePath with $ReferencePath
foreach( $diffPath in $DifferencePath ) {
Write-Verbose "Comparing directory '$diffPath' with '$ReferencePath'"
$areDirectoriesEqual = $false
$differenceType = $null
$diffFiles = Get-ChildItemRelative $diffPath
# Directories must have same number of files
if( $diffFiles.Count -eq $refFiles.Count ) {
# Find first different path (if any)
$firstDifferentPath = $diffFiles | Where-Object { -not $refFiles.ContainsKey( $_.RelativePath ) } |
Select-Object -First 1
if( -not $firstDifferentPath ) {
# Find first different content (if any) by file size comparison
$firstDifferentFileSize = $diffFiles |
Where-Object { $refFiles[ $_.RelativePath ].Length -ne $_.Length } |
Select-Object -First 1
if( -not $firstDifferentFileSize ) {
# Find first different content (if any) by hash comparison
$firstDifferentContent = $diffFiles |
Where-Object { $refFiles[ $_.RelativePath ].Hash -ne (Get-FileHash $_.FullName -Algorithm MD5).Hash } |
Select-Object -First 1
if( -not $firstDifferentContent ) {
$areDirectoriesEqual = $true
}
else {
$differenceType = 'Content'
}
}
else {
$differenceType = 'FileSize'
}
}
else {
$differenceType = 'Path'
}
}
else {
$differenceType = 'FileCount'
}
# Output comparison result
[PSCustomObject]@{
ReferencePath = $ReferencePath
DifferencePath = $diffPath
Equal = $areDirectoriesEqual
DiffCause = $differenceType
}
}
}
Usage example:
# compare each of directories B, C, D, E, F against A
Compare-MultipleDirectories -ReferencePath 'A' -DifferencePath 'B', 'C', 'D', 'E', 'F' -Verbose
Output example:
ReferencePath DifferencePath Equal DiffCause
------------- -------------- ----- ---------
A B True
A C False FileCount
A D False Path
A E False FileSize
A F False Content
The DiffCause column tells you why the function considers the directories different.
Note:
Select-Object -First 1 is a neat trick to stop searching once we have the first result. It is efficient because it doesn't process all input and then drop everything except the first item; instead it actually cancels the pipeline after the first item has been found.
A hashtable keyed by RelativePath (which could also be built with Group-Object RelativePath -AsHashTable) holds the file information so it can be looked up quickly by relative path.
Empty subdirectories are ignored, because the function only looks at files. E.g. if the reference path contains some empty directories but the difference path does not, and the files in all other directories are equal, the function treats the directories as equal.
I've chosen the MD5 algorithm because it is faster than the default SHA-256 algorithm used by Get-FileHash, but it is insecure: someone could deliberately craft a different file with the same MD5 hash as the original. In a trusted environment this won't matter. Remove -Algorithm MD5 if you need a more secure comparison.
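To illustrate the two pipeline tricks in the notes above, here is a small, hedged sketch (the folder path and file name are placeholders):
# Select-Object -First 1 cancels the upstream pipeline after the first hit,
# so the remaining items are never evaluated:
1..1000000 | Where-Object { $_ -gt 10 } | Select-Object -First 1    # -> 11, returns immediately
# Group-Object -AsHashTable builds a lookup table keyed by a property value,
# which is what the reference-file hashtable above achieves as well:
$lookup = Get-ChildItem 'C:\some\folder' -File -Recurse | Group-Object Name -AsHashTable -AsString
$lookup['readme.txt']    # fast keyed lookup instead of scanning the whole list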
A simple place to start:
compare (dir -r dir1) (dir -r dir2) -Property name,length,lastwritetime
You can also add -PassThru to see the original objects, or -IncludeEqual to see the equal elements. The order of each array doesn't matter without -SyncWindow. I'm assuming all the LastWriteTimes are in sync, to the millisecond. Don't assume you can skip specifying the properties to compare. See also Comparing folders and content with PowerShell
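For example, both switches can be combined (a hedged one-liner; dir1 and dir2 are placeholder folder names):
compare (dir -r dir1) (dir -r dir2) -Property name,length,lastwritetime -IncludeEqual -PassThru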
I was looking into calculated properties for the relative path, but it looks like you can't name them here, even in PowerShell 7. I'm chopping off the first four path elements, 0..3.
compare (dir -r foo1) (dir -r foo2) -Property length,lastwritetime,@{e={($_.fullname -split '\\')[4..$_.fullname.length] -join '\'}}
length lastwritetime ($_.fullname -split '\\')[4..$_.fullname.length] -join '\' SideIndicator
------ ------------- ---------------------------------------------------------- -------------
16 11/12/2022 11:30:20 AM foo2\file2 =>
18 11/12/2022 11:30:20 AM foo1\file2 <=

How to speed up my SUPER SLOW search script

UPDATE (06/21/22): See my updated script below, which utilizes some of the answer.
I am building a script to search for $name through a large batch of CSV files. These files can be as big as 67,000 KB. This is my script that I use to search the files:
Powershell Script
Essentially, I use Import-Csv. I change a few things depending on the file name, however. For example, some files don't have headers, or they may use a different delimiter. Then I store all the matches in $results and then return that variable. This is all put in a function called CSVSearch for ease of running.
#create function called CSV Search
function CSVSearch{
#prompt
$name = Read-Host -Prompt 'Input name'
#set path to root folder
$path = 'Path\to\root\folder\'
#get the file path for each CSV file in root folder
$files = Get-ChildItem $path -Filter *.csv | Select-Object -ExpandProperty FullName
#count files in $files
$filesCount = $files.Count
#create empty array, $results
$results = @()
#count for write-progress
$i = 0
foreach($file in $files){
Write-Progress -Activity "Searching files: $i out of $filesCount searched. $resultsCount match(es) found" -PercentComplete (($i/$files.Count)*100)
#import method changes depending on CSV file name found in $file (headers, delimiters).
if($file -match 'File1*'){$results += Import-Csv $file -Header A, Name, C, D -Delimiter '|' | Select-Object *,@{Name='FileName';Expression={$file}} | Where-Object { $_.'Name' -match $name}}
if($file -match 'File2*'){$results += Import-Csv $file -Header A, B, Name -Delimiter '|' | Select-Object *,@{Name='FileName';Expression={$file}} | Where-Object { $_.'Name' -match $name}}
if($file -match 'File3*'){$results += Import-Csv $file | Select-Object *,@{Name='FileName';Expression={$file}} | Where-Object { $_.'Name' -match $name}}
if($file -match 'File4*'){$results += Import-Csv $file | Select-Object *,@{Name='FileName';Expression={$file}} | Where-Object { $_.'Name' -match $name}}
$i++
$resultsCount = $results.Count
}
#if the loop ends and $results array is empty, return "No matches."
if(!$results){Write-Host 'No matches found.' -ForegroundColor Yellow}
#return results stored in $results variable
else{$results
Write-Host $resultsCount 'matches found.' -ForegroundColor Green
Write-Progress -Activity "Completed" -Completed}
}
CSVSearch
Below is what the CSV files look like. Obviously, the amount of data below does not reflect the actual size of the files, but it shows the basic structure:
CSV files
File1.csv
1|Moonknight|QWEPP|L
2|Star Wars|QWEPP|T
3|Toy Story|QWEPP|U
File2.csv
JKLH|1|Moonknight
ASDF|2|Star Wars
QWER|3|Toy Story
File3.csv
1,Moonknight,AA,DDD
2,Star Wars,BB,CCC
3,Toy Story,CC,EEE
File4.csv
1,Moonknight,QWE
2,Star Wars,QWE
3,Toy Story,QWE
The script works great. Here is an example of the output I would receive if $name = Moonknight:
Example of results
A : 1
Name : Moonknight
C: QWE
FileName: Path\to\root\folder\File4.csv
A : 1
Name : Moonknight
B : AA
C : DDD
FileName: Path\to\root\folder\File3.csv
A : JKLH
B : 1
Name : Moonknight
FileName: Path\to\root\folder\File2.csv
A : 1
Name : Moonknight
C : QWEPP
D : L
FileName: Path\to\root\folder\File1.csv
4 matches found.
However, it is very slow, and I have a lot of files to search through. Any ideas on how to speed my script up?
Edit: I should mention that I tried importing the data into a hash table and then searching the hash table, but that was much slower.
UPDATED SCRIPT - My Solution (06/21/22):
This update utilizes some of Santiago's script below. I was having a hard time decoding everything he did, as I am new to PowerShell, so I sort of jerry-rigged my own solution that uses a lot of his script/ideas.
The one thing that made a huge difference was outputting $results[$i], which returns the most recent match while the script is running. Probably not the most efficient way to do it, but it works for what I'm trying to do. Thanks!
function CSVSearch{
[cmdletbinding()]
param(
[Parameter(Mandatory)]
[string] $Name
)
$files = Get-ChildItem 'Path\to\root\folder\' -Filter *.csv -Recurse | %{$_.FullName}
$results = @()
$i = 0
foreach($file in $files){
if($file -like '*File1*'){$results += Import-Csv $file -Header A, Name, C, D -Delimiter '|' | Where-Object { $_.'Name' -match $Name} | Select-Object *,@{Name='FileName';Expression={$file}}}
if($file -like '*File2*'){$results += Import-Csv $file -Header A, B, Name -Delimiter '|' | Where-Object { $_.'Name' -match $Name} | Select-Object *,@{Name='FileName';Expression={$file}}}
if($file -like '*File3*'){$results += Import-Csv $file | Where-Object { $_.'Name' -match $Name} | Select-Object *,@{Name='FileName';Expression={$file}}}
if($file -like '*File4*'){$results += Import-Csv $file | Where-Object { $_.'Name' -match $Name} | Select-Object *,@{Name='FileName';Expression={$file}}}
$results[$i]
$i++
}
if(-not $results) {
Write-Host 'No matches found.' -ForegroundColor Yellow
return
}
Write-Host "$($results.Count) matches found." -ForegroundColor Green
}
Give this one a try; it should be a bit faster. Select-Object has to reconstruct your objects: if you use it before filtering, you're effectively recreating your entire CSV. You want to filter first (Where-Object / .Where) before reconstructing it.
.Where should be faster than Where-Object here; the caveat is that the intrinsic method requires the collection to already exist in memory - there is no pipeline processing and no streaming.
Write-Progress will only slow down your script; better to remove it.
Lastly, you can use splatting to avoid having multiple if conditions.
function CSVSearch {
[cmdletbinding()]
param(
[Parameter(Mandatory)]
[string] $Name,
[Parameter()]
[string] $Path = 'Path\to\root\folder\'
)
$param = @{
    File1 = @{ Header = 'A', 'Name', 'C', 'D'; Delimiter = '|' }
    File2 = @{ Header = 'A', 'B', 'Name' ; Delimiter = '|' }
    File3 = @{}; File4 = @{} # File3 & 4 should have headers ?
}
$results = foreach($file in Get-ChildItem $Path -Filter file*.csv) {
$thisparam = $param[$file.BaseName]
$thisparam['LiteralPath'] = $file.FullName
(Import-Csv #thisparam).where{ $_.Name -match $name } |
Select-Object *, @{Name='FileName';Expression={$file}}
}
if(-not $results) {
Write-Host 'No matches found.' -ForegroundColor Yellow
return
}
Write-Host "$($results.Count) matches found." -ForegroundColor Green
$results
}
CSVSearch -Name Moonknight
If you want the function to stream results as they're found, you can use a filter; this is a very efficient filtering technique, certainly faster than Where-Object:
function CSVSearch {
[cmdletbinding()]
param(
[Parameter(Mandatory)]
[string] $Name,
[Parameter()]
[string] $Path = 'Path\to\root\folder\'
)
begin {
$param = @{
    File1 = @{ Header = 'A', 'Name', 'C', 'D'; Delimiter = '|' }
    File2 = @{ Header = 'A', 'B', 'Name' ; Delimiter = '|' }
    File3 = @{}; File4 = @{} # File3 & 4 should have headers ?
}
$counter = [ref] 0
filter myFilter {
if($_.Name -match $name) {
$counter.Value++
$_ | Select-Object *, @{N='FileName';E={$file}}
}
}
}
process {
foreach($file in Get-ChildItem $path -Filter *.csv) {
$thisparam = $param[$file.BaseName]
$thisparam['LiteralPath'] = $file.FullName
Import-Csv #thisparam | myFilter
}
}
end {
if(-not $counter.Value) {
Write-Host 'No matches found.' -ForegroundColor Yellow
return
}
Write-Host "$($counter.Value) matches found." -ForegroundColor Green
}
}
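A hypothetical call of this streaming version (the path is a placeholder); matches are written out as each file is parsed rather than collected until the end:
CSVSearch -Name 'Moonknight' -Path 'Path\to\root\folder\'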

Calling a Function During Array Creation to export contents of folder containing Zip files

I am having trouble trying to call a function for a script that I'm using to build a list of zip files in a folder on my PC. The final CSV I need to create is a list of the zip files with their uncompressed sizes. Here is what I have so far (compiled from several posts):
Function to get the uncompressed size:
function Get-UncompressedZipFileSize {
param (
$Path
)
$shell = New-Object -ComObject shell.application
$zip = $shell.NameSpace($Path)
$size = 0
foreach ($item in $zip.items()) {
if ($item.IsFolder) {
$size += Get-UncompressedZipFileSize -Path $item.Path
} else {
$size += $item.size
}
}
[System.Runtime.InteropServices.Marshal]::ReleaseComObject([System.__ComObject]$shell) | Out-Null
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
return $size
}
Here is my Array Creation:
$arr = @()
gci C:\zips -recurse | ? {$_.PSIsContainer -eq $False} | % {
$obj = New-Object PSObject
$obj | Add-Member NoteProperty Name $_.Name
$obj | Add-Member NoteProperty FullPath $_.FullName
$arr += $obj
}
$arr | Export-CSV -notypeinformation c:\zips
I'm stuck at creating a new member in my array that will call the Get-UncompressedZipFileSize function to pass that size back into the array as a new column. Is something like this even possible?
Here is an alternative using the ZipFile class. The SizeConvert class is inspired by this answer. The output of Get-ZipFileSize is the absolute path of the zip file, its compressed and expanded sizes, and their formatted friendly sizes (i.e. 7.88 MB instead of 8262942).
using namespace System.IO
using namespace System.IO.Compression
using namespace System.Linq
function Get-ZipFileSize {
[cmdletbinding()]
param(
[parameter(ValueFromPipelineByPropertyName)]
[string] $FullName
)
begin {
if(-not $IsCoreCLR) {
Add-Type -AssemblyName System.IO.Compression.FileSystem
}
class SizeConvert {
static [string[]] $Suffix = "B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"
static [string] ToFriendlySize([int64] $Length, [int] $DecimalPoints) {
$idx = 0
while ($Length -ge 1kb) {
$Length /= 1kb
$idx++
}
return '{0} {1}' -f [math]::Round($Length, $DecimalPoints), [SizeConvert]::Suffix[$idx]
}
}
}
process {
try {
$zip = [ZipFile]::OpenRead($FullName)
$expanded = [Enumerable]::Sum([Int64[]] $zip.Entries.Length)
$compressed = [Enumerable]::Sum([int64[]] $zip.Entries.CompressedLength)
[pscustomobject]@{
FilePath = $FullName
RawExpanded = $expanded
RawCompressed = $compressed
FormattedExpanded = [SizeConvert]::ToFriendlySize($expanded, 2)
FormattedCompressed = [SizeConvert]::ToFriendlySize($compressed, 2)
}
}
catch {
$PSCmdlet.WriteError($_)
}
finally {
if($zip -is [System.IDisposable]) {
$zip.Dispose()
}
}
}
}
Get-ChildItem -Filter *.zip -Recurse | Get-ZipFileSize | Export-Csv ....
To make this simpler, since you're only calling your function to get the size of the current folder (zip), you can use a Calculated Property for this:
$Path = "C:\Zips"
Get-ChildItem -Path $Path -Directory -Recurse |
Select-Object -Property Name, FullName,
@{
Name = "Size"
Expression = {
Get-UncompressedZipFileSize -Path $_.FullName
}
} | Export-Csv -Path "$Path\zip.csv" -Force -NoTypeInformation -Append
On another note, if you ever find yourself explicitly adding to an array, take advantage of PowerShell's pipeline streaming.
$Path = "C:\Zips"
Get-ChildItem -Path $Path -Directory -Recurse |
ForEach-Object -Process {
[PSCustomObject]@{
Name = $_.Name
FullPath = $_.FullName
Size = Get-UncompressedZipFileSize -Path $_.FullName
} | Export-Csv -Path "$Path\zip.csv" -Force -NoTypeInformation -Append
}
Not only is adding to a fixed array (+=) computationally expensive (if you have a large directory), it is slow. Fixed arrays mean just that: they are a fixed size, and in order for you to add to one, it needs to be broken down and recreated. An alternative solution would be an ArrayList or a generic list (see the sketch below) but, in this case - and in most cases - it's not needed.
Get-ChildItem also includes a -Directory switch to search for just folders, introduced in PowerShell v3.
I would also recommend filtering on the compressed folders' file extension (using -Filter) so you don't run into any issues.
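If you do need to collect results in memory, here is a hedged sketch of the list-based alternative to +=; the output path mirrors the example above and the size function is the one from the question:
$list = [System.Collections.Generic.List[object]]::new()   # PowerShell 5+; use New-Object on older versions
Get-ChildItem -Path 'C:\Zips' -Filter *.zip -Recurse -File | ForEach-Object {
    $list.Add([PSCustomObject]@{
        Name     = $_.Name
        FullPath = $_.FullName
        Size     = Get-UncompressedZipFileSize -Path $_.FullName
    })
}
$list | Export-Csv -Path 'C:\Zips\zip.csv' -NoTypeInformation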

Powershell Script is printing out duplicate entries of the same path

My objective is to write a powershell script that will recursively check a file server for any directories that are "x" (insert days) old or older.
I ran into a few issues initially, and I think I got most of it worked out. One of the issues I ran into was with the path limitation of 248 characters. I found a custom function that I am implementing in my code to bypass this limitation.
The end result is I would like to output the path and LastAccessTime of the folder and export the information into an easy to read csv file.
Currently everything is working properly, but for some reason I get some paths output several times (duplicates, triples, even 4 times). I just want it output once for each directory and subdirectory.
I'd appreciate any guidance I can get. Thanks in advance.
Here's my code
#Add the import and snapin in order to perform AD functions
Add-PSSnapin Quest.ActiveRoles.ADManagement -ea SilentlyContinue
Import-Module ActiveDirectory
#Clear Screen
CLS
Function Get-FolderItem
{
[cmdletbinding(DefaultParameterSetName='Filter')]
Param (
[parameter(Position=0,ValueFromPipeline=$True,ValueFromPipelineByPropertyName=$True)]
[Alias('FullName')]
[string[]]$Path = $PWD,
[parameter(ParameterSetName='Filter')]
[string[]]$Filter = '*.*',
[parameter(ParameterSetName='Exclude')]
[string[]]$ExcludeFile,
[parameter()]
[int]$MaxAge,
[parameter()]
[int]$MinAge
)
Begin
{
$params = New-Object System.Collections.Arraylist
$params.AddRange(@("/L","/S","/NJH","/BYTES","/FP","/NC","/NFL","/TS","/XJ","/R:0","/W:0"))
If ($PSBoundParameters['MaxAge'])
{
$params.Add("/MaxAge:$MaxAge") | Out-Null
}
If ($PSBoundParameters['MinAge'])
{
$params.Add("/MinAge:$MinAge") | Out-Null
}
}
Process
{
ForEach ($item in $Path)
{
Try
{
$item = (Resolve-Path -LiteralPath $item -ErrorAction Stop).ProviderPath
If (-Not (Test-Path -LiteralPath $item -Type Container -ErrorAction Stop))
{
Write-Warning ("{0} is not a directory and will be skipped" -f $item)
Return
}
If ($PSBoundParameters['ExcludeFile'])
{
$Script = "robocopy `"$item`" NULL $Filter $params /XF $($ExcludeFile -join ',')"
}
Else
{
$Script = "robocopy `"$item`" NULL $Filter $params"
}
Write-Verbose ("Scanning {0}" -f $item)
Invoke-Expression $Script | ForEach {
Try
{
If ($_.Trim() -match "^(?<Children>\d+)\s+(?<FullName>.*)")
{
$object = New-Object PSObject -Property @{
ParentFolder = $matches.fullname -replace '(.*\\).*','$1'
FullName = $matches.FullName
Name = $matches.fullname -replace '.*\\(.*)','$1'
}
$object.pstypenames.insert(0,'System.IO.RobocopyDirectoryInfo')
Write-Output $object
}
Else
{
Write-Verbose ("Not matched: {0}" -f $_)
}
}
Catch
{
Write-Warning ("{0}" -f $_.Exception.Message)
Return
}
}
}
Catch
{
Write-Warning ("{0}" -f $_.Exception.Message)
Return
}
}
}
}
Function ExportFolders
{
#================ Global Variables ================
#Path to folders
$Dir = "\\myFileServer\somedir\blah"
#Get all folders
$ParentDir = Get-ChildItem $Dir | Where-Object {$_.PSIsContainer -eq $True}
#Export file to our destination
$ExportedFile = "c:\temp\dirFolders.csv"
#Duration in Days+ the file hasn't triggered "LastAccessTime"
$duration = 800
$cutOffDate = (Get-Date).AddDays(-$duration)
#Used to hold our information
$results = @()
#=============== Done with Variables ===============
ForEach ($SubDir in $ParentDir)
{
$FolderPath = $SubDir.FullName
$folders = Get-ChildItem -Recurse $FolderPath -force -directory| Where-Object { ($_.LastAccessTimeUtc -le $cutOffDate)} | Select-Object FullName, LastAccessTime
ForEach ($folder in $folders)
{
$folderPath = $folder.fullname
$fixedFolderPaths = ($folderPath | Get-FolderItem).fullname
ForEach ($fixedFolderPath in $fixedFolderPaths)
{
#$fixedFolderPath
$getLastAccessTime = $(Get-Item $fixedFolderPath -force).lastaccesstime
#$getLastAccessTime
$details = @{ "Folder Path" = $fixedFolderPath; "LastAccessTime" = $getLastAccessTime}
$results += New-Object PSObject -Property $details
$results
}
}
}
}
ExportFolders
I updated my code a bit and simplified it. Here is the new code.
#Add the import and snapin in order to perform AD functions
Add-PSSnapin Quest.ActiveRoles.ADManagement -ea SilentlyContinue
Import-Module ActiveDirectory
#Clear Screen
CLS
Function ExportFolders
{
#================ Global Variables ================
#Path to user profiles in Barrington
$Dir = "\\myFileServer\somedir\blah"
#Get all user folders
$ParentDir = Get-ChildItem $Dir | Where-Object {$_.PSIsContainer -eq $True} | where {$_.GetFileSystemInfos().Count -eq 0 -or $_.GetFileSystemInfos().Count -gt 0}
#Export file to our destination
$ExportedFile = "c:\temp\dirFolders.csv"
#Duration in Days+ the file hasn't triggered "LastAccessTime"
$duration = 1
$cutOffDate = (Get-Date).AddDays(-$duration)
#Used to hold our information
$results = @()
$details = $null
#=============== Done with Variables ===============
ForEach ($SubDir in $ParentDir)
{
$FolderName = $SubDir.FullName
$FolderInfo = $(Get-Item $FolderName -force) | Select-Object FullName, LastAccessTime #| ft -HideTableHeaders
$FolderLeafs = gci -Recurse $FolderName -force -directory | Where-Object {$_.PSIsContainer -eq $True} | where {$_.GetFileSystemInfos().Count -eq 0 -or $_.GetFileSystemInfos().Count -gt 0} | Select-Object FullName, LastAccessTime #| ft -HideTableHeaders
$details = @{ "LastAccessTime" = $FolderInfo.LastAccessTime; "Folder Path" = $FolderInfo.FullName}
$results += New-Object PSObject -Property $details
ForEach ($FolderLeaf in $FolderLeafs.fullname)
{
$details = @{ "LastAccessTime" = $(Get-Item $FolderLeaf -force).LastAccessTime; "Folder Path" = $FolderLeaf}
$results += New-Object PSObject -Property $details
}
$results
}
}
ExportFolders
The FolderInfo variable is sometimes printed out multiple times, but the FolderLeaf variable is printed once from what I can see. The problem is that if I move or remove the $results variable from under the details that print out the FolderInfo, then the parent directories don't get printed out - only the subdirectories are shown. Also, some directories are empty and don't get printed out, and I want all directories printed, including empty ones.
The updated code seems to print all directories fine, but as I mentioned I am still getting some duplicate $FolderInfo variables.
I think I have to put in a condition or something to check if it has already been processed, but I'm not sure which condition I would use to do that, so that it wouldn't print out multiple times.
In your ExportFolders you Get-ChildItem -Recurse and then loop over all of the subfolders calling Get-FolderItem. Then in Get-FolderItem you provide Robocopy with the /S flag in $params.AddRange(@("/L", "/S", "/NJH", "/BYTES", "/FP", "/NC", "/NFL", "/TS", "/XJ", "/R:0", "/W:0")). The /S flag means copy subdirectories, but not empty ones, so you are recursing a second time. Likely you just need to remove the /S flag, so that all of your recursion happens in ExportFolders.
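That is, the AddRange line would become something like this (a sketch with only /S removed):
$params.AddRange(@("/L","/NJH","/BYTES","/FP","/NC","/NFL","/TS","/XJ","/R:0","/W:0"))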
In response to the edit:
Your $results is inside of the loop, so you will have n duplicates for the first $subdir, then n-1 duplicates for the second, and so forth.
ForEach ($SubDir in $ParentDir) {
#skipped code
ForEach ($FolderLeaf in $FolderLeafs.fullname) {
#skipped code
}
$results
}
should be
ForEach ($SubDir in $ParentDir) {
#skipped code
ForEach ($FolderLeaf in $FolderLeafs.fullname) {
#skipped code
}
}
$results

Powershell get total size of files which user owns

I need a PowerShell script that will go through all users in the system and find the total size of all the files each user owns... I have a script that goes through all users, but I have no idea how to continue with counting the total size owned by each user.
Here is the script I have now:
$users = Get-WmiObject -class Win32_UserAccount
foreach($user in $users) {
$name = $user.Name
$fullName = $user.FullName;
if(Test-Path "C:\Users\$name") {
$path = "C:\Users\$name"
} else {
$path = "C:\Users\Public"
}
$dirSize = (Get-ChildItem $path -recurse | Measure-Object -property length -sum)
"{0:N2}" -f ($dirSize.sum / 1Gb) + " Gb"
echo "$dirSize"
Add-Content -path "pathototxt..." -value "$name $fullName $path"
}
I would be more than happy if somebody knows the answer and can tell me...
Thank you
If there are a lot of files, you might want to consider:
$oSIDs = @{}
get-childitem <filespec> |
    foreach {
        $oSID = $_.GetAccessControl().Sddl -replace '^o:(.+?).:.+','$1'
        $oSIDs[$oSID] += $_.length
    }
Then resolve the SIDs when you're done. Parsing the owner SID or well-known security principal ID from the SDDL string saves the provider from having to do a lot of repetitive name resolution to give you back the "friendly" names.
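When you're done collecting, the SIDs can be translated back to account names; a minimal sketch (SIDs that cannot be resolved are left as raw SID strings):
$oSIDs.GetEnumerator() | ForEach-Object {
    $sid   = $_.Key
    $bytes = $_.Value
    try {
        $owner = (New-Object System.Security.Principal.SecurityIdentifier($sid)).Translate([System.Security.Principal.NTAccount]).Value
    } catch {
        $owner = $sid   # keep the raw SID if it cannot be translated
    }
    [PSCustomObject]@{ Owner = $owner; 'Size(GB)' = [math]::Round($bytes / 1GB, 2) }
}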
I have no idea what you're asking for here.
"to continue with counting total size which user owns for each user". huh? Do want to check every file on the system or just the userfolder as you currently do?
Your script works fine if you just tweak it to include the filesize in the output. Personally I'd consider using a csv to store this because not all users will have e.g. a full name(admin, guest etc.). Also, atm. your script is counting the public folder multiple times(each time a user doesn't have a profile). E.g. admin(if it has never logged in), guest etc. might both get it specified.
Updated script that outputs both textfile and csv
$users = Get-WmiObject -class Win32_UserAccount
$out = @()
#If you want to append to a csv-file, replace the $out line above with the one below
#$out = Import-Csv "file.csv"
foreach($user in $users) {
$name = $user.Name
$fullName = $user.FullName;
if(Test-Path "C:\Users\$name") {
$path = "C:\Users\$name"
} else {
$path = "C:\Users\Public"
}
$dirSize = (Get-ChildItem $path -Recurse -ErrorAction SilentlyContinue | ? { !$_.PSIsContainer } | Measure-Object -Property Length -Sum)
$size = "{0:N2}" -f ($dirSize.Sum / 1Gb) + " Gb"
#Saving as textfile
#Add-Content -path "pathototxt..." -value "$name $fullName $path $size"
Add-Content -path "file.txt" -value "$name $fullName $path $size"
#CSV-way
$o = New-Object psobject -Property @{
Name = $name
FullName = $fullName
Path = $path
Size = $size
}
$out += $o
}
#Exporting to csv format
$out | Export-Csv "file.csv" -NoTypeInformation
EDIT: Another solution using the answer provided by @mjolinor and @C.B., modified to scan your c:\ drive while excluding some "root folders" like "Program Files", "Windows" etc. It exports the result to a CSV file ready for Excel:
$oSIDs = @{}
$exclude = @("Program Files", "Program Files (x86)", "Windows", "Perflogs");
Get-ChildItem C:\ | ? { $exclude -notcontains $_.Name } | % { Get-ChildItem $_.FullName -Recurse -ErrorAction SilentlyContinue | ? { !$_.PSIsContainer } } | % {
$oSID = $_.GetAccessControl().Sddl -replace '^o:(.+?).:.+','$1'
$oSIDs[$oSID] += $_.Length
}
$out = @()
$oSIDs.GetEnumerator() | % {
$user = (New-Object System.Security.Principal.SecurityIdentifier($_.Key)).Translate([System.Security.Principal.NTAccount]).Value
$out += New-Object psobject -Property @{
User = if($user) { $user } else { $_.Key }
"Size(GB)" = $oSIDs[$_.Key]/1GB
}
}
$out | Export-Csv file.csv -NoTypeInformation -Delimiter ";"