How to check for duplicate files using PowerShell?

I want to check for duplicate files. Files like the following count as duplicates: the same name but a different extension.
AAA18WWQ6BT602.PRO
AAA18WWQ6BT602.XML
I can handle this case with my script. But I have a problem if there is more than one .XML file, like this:
AAA18WWQ6BT602.PRO
AAA18WWQ6BT602.XML
AAA18WWQ6BT601.XML
AAA18WWQ6BT604.XML
In this case, it does not detect that AAA18WWQ6BT602.PRO and AAA18WWQ6BT602.XML are duplicates.
Can anyone help me, please?
Thanks
$duplicate = @()
@(Get-ChildItem "$Flag_Path\*.xml") | ForEach-Object { $duplicate += $_.basename }
if(Test-Path -Path "$Flag_Path\*$duplicate*" -Exclude *.xml)
{
Get-ChildItem -Path "$Flag_Path\*$duplicate*" -Include *.xml | Out-File $Flag_Path\Flag_Duplicate
Write-Host "Flag duplicated, continue for Error_Monitoring"
pause
Error_Monitoring
}
else{
Write-Host "Flag does not duplicate, continue the process"
}

The -Include parameter only works if the path on Get-ChildItem ends in \* OR if the -Recurse switch is used.
The following should do what you want:
$flagFolder = 'D:\*'
$dupeReport = 'D:\Flag_Duplicate.txt'
$duplicates = Get-ChildItem -Path $flagFolder -File -Include '*.xml', '*.pro' |
Group-Object -Property BaseName | Where-Object { $_.Count -gt 1 }
if ($duplicates) {
# output the duplicate XML to Flag_Duplicate.txt
$duplicates.Group | Where-Object {$_.Extension -eq '.xml' } | ForEach-Object {
$_.FullName | Out-File -FilePath $dupeReport -Append
}
# do the rest of your code
Write-Host "Flag duplicated, continue for Error_Monitoring"
Error_Monitoring
}
else {
Write-Host "Flag does not duplicate, continue the process"
}
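To try the grouping approach without touching real data, the following sketch builds a throwaway folder containing the file names from the question and lists the base names that occur with more than one extension. The scratch folder location and the variables $demoDir and $dupeNames are assumptions made for illustration only; it needs PowerShell 3.0 or newer.

```powershell
# Build a scratch folder (hypothetical location under the temp directory).
$demoDir = Join-Path ([System.IO.Path]::GetTempPath()) ("dupe_demo_" + [guid]::NewGuid().ToString('N'))
New-Item -ItemType Directory -Path $demoDir | Out-Null

# File names taken from the question.
'AAA18WWQ6BT602.PRO', 'AAA18WWQ6BT602.XML', 'AAA18WWQ6BT601.XML', 'AAA18WWQ6BT604.XML' |
    ForEach-Object { New-Item -ItemType File -Path (Join-Path $demoDir $_) | Out-Null }

# Group by base name; any group with more than one member is a duplicate.
$dupeNames = Get-ChildItem -Path $demoDir -File |
    Where-Object { $_.Extension -in '.xml', '.pro' } |
    Group-Object -Property BaseName |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object { $_.Name }

$dupeNames    # should list only AAA18WWQ6BT602

# Clean up the scratch folder.
Remove-Item -Path $demoDir -Recurse -Force
```

Because every file is grouped in a single pass, extra .XML files with other base names do not affect the result.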

Your script does not iterate correctly: you need a loop that checks each file name. The Test-Path logic also looks mixed up to me. I tried to keep as much of your code as possible.
This script checks every XML base name against files with any other extension (not only .pro):
$Flag_Path = "C:\dir_to_be_checked"
$xmlFilesArray = @()
$allFilesExceptXml = @() # all files excluding xml files
# Get all the xml files (-Filter is used because -Include needs a trailing \* or -Recurse)
Get-ChildItem -Path $Flag_Path -Filter "*.xml" | ForEach-Object { $xmlFilesArray += $_.basename }
# Get all files from the directory except the xml files
Get-ChildItem -Path $Flag_Path -Exclude "*.xml" | ForEach-Object { $allFilesExceptXml += $_.basename }
# Iterate over list of files names without suffix
ForEach ($xmlFile in $xmlFilesArray) {
ForEach ($fileToCheck in $allFilesExceptXml) {
If ($xmlFile -eq $fileToCheck) {
# logging the duplicate file (specifying utf8 or the output would be UTF-16)
Write-Output "$Flag_Path\$xmlFile.xml" | Out-File -Append -Encoding utf8 $Flag_Path\Flag_Duplicate
Write-Host "Flag duplicated, continue with duplicate search"
# pause
Write-Host "Press any key to continue ..."
$x = $host.UI.RawUI.ReadKey("NoEcho,IncludeKeyDown")
Error_Monitoring
} Else {
Write-Host "Flag is not duplicated. Continue with the search."
}
}
}

Related

Find similarly-named files, and if present, remove the files without a specific string using PowerShell

In a directory, there are files with the following filenames:
ExampleFile.mp3
ExampleFile_pn.mp3
ExampleFile2.mp3
ExampleFile2_pn.mp3
ExampleFile3.mp3
I want to iterate through the directory, and IF there is a filename that contains the string '_pn.mp3', I want to test if there is a similarly named file without the '_pn.mp3' in the same directory. If that file exists, I want to remove it.
In the above example, I'd want to remove:
ExampleFile.mp3
ExampleFile2.mp3
and I'd want to keep ExampleFile3.mp3
Here's what I have so far:
$pattern = "_pn.mp3"
$files = Get-ChildItem -Path '$path' | Where-Object {! $_.PSIsContainer}
Foreach ($file in $files) {
If($file.Name -match $pattern){
# filename with _pn.mp3 exists
Write-Host $file.Name
# search in the current directory for the same filename without _pn
<# If(Test-Path $currentdir $filename without _pn.mp3) {
Remove-Item -Force}
#>
}
You could use Group-Object to group all files by their BaseName (with the pattern removed), and then loop over the groups where there are more than one file. The result of grouping the files and filtering by count would look like this:
$files | Group-Object { $_.BaseName.Replace($pattern,'') } |
Where-Object Count -GT 1
Count Name Group
----- ---- -----
2 ExampleFile {ExampleFile.mp3, ExampleFile_pn.mp3}
2 ExampleFile2 {ExampleFile2.mp3, ExampleFile2_pn.mp3}
Then if we loop over these groups we can search for the files that do not end with the $pattern:
@'
ExampleFile.mp3
ExampleFile_pn.mp3
ExampleFile2.mp3
ExampleFile2_pn.mp3
ExampleFile3.mp3
'@ -split '\r?\n' -as [System.IO.FileInfo[]] | Set-Variable files
$pattern = "_pn"
$files | Group-Object { $_.BaseName.Replace($pattern,'') } |
Where-Object Count -GT 1 | ForEach-Object {
$_.Group.Where({-not $_.BaseName.Endswith($pattern)})
}
This is how your code would look. Note that $pattern must not include the extension, because BaseName does not contain it. Remove the -WhatIf switch once you are satisfied the code does what you want.
$pattern = "_pn"
$files = Get-ChildItem -Path $path -Filter *.mp3 -File
$files | Group-Object { $_.BaseName.Replace($pattern,'') } |
Where-Object Count -GT 1 | ForEach-Object {
$toRemove = $_.Group.Where({-not $_.BaseName.Endswith($pattern)})
$toRemove | Remove-Item -WhatIf
}
I think you can get by here by adding file names into a hash map as you go. If you encounter a file with the ending you are interested in, check if a similar file name was added. If so, remove both the file and the similar match.
$ending = "_pn.mp3"
$files = Get-ChildItem -Path $path -File | Where-Object { ! $_.PSIsContainer }
$hash = @{}
Foreach ($file in $files) {
# Check if file has an ending we are interested in
If ($file.Name.EndsWith($ending)) {
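# Strip the ending (in Windows PowerShell, String.Split would treat it as a set of separator characters)
$similar = $file.Name.Substring(0, $file.Name.Length - $ending.Length) + ".mp3"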
# Check if we have seen the similar file in the hashmap
If ($hash.Contains($similar)) {
Write-Host $file.Name
Write-Host $similar
Remove-Item -Force $file
Remove-Item -Force $hash[$similar]
# Remove similar from hashmap as it is removed and no longer of interest
$hash.Remove($similar)
}
}
else {
# Add entry for file name and reference to the file
$hash.Add($file.Name, $file)
}
}
Just get a list of the files with the _pn then process against the rest.
$pattern = "*_pn.mp3"
$files = Get-ChildItem -Path "$path" -File -filter "$pattern"
Foreach ($file in $files) {
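# Full path of the counterpart without "_pn"
$TestFN = Join-Path -Path $path -ChildPath ($file.Name -replace '_pn\.mp3$', '.mp3')
If (Test-Path -Path $TestFN) {
# Remove the counterpart; the _pn file is kept
Remove-Item -Path $TestFN -Force
}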
} #End Foreach

Memory exception while filtering large CSV files

I am getting a memory exception while running this code. Is there a way to filter one file at a time, writing the output and appending after each file is processed? It seems the code below loads everything into memory.
$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"
Get-ChildItem $inputFolder -File -Filter '*.csv' |
ForEach-Object { Import-Csv $_.FullName } |
Where-Object { $_.machine_type -eq 'workstations' } |
Export-Csv $outputFile -NoType
Maybe you can filter and export your files one by one, appending each result to your output file like this:
$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"
Remove-Item $outputFile -Force -ErrorAction SilentlyContinue
Get-ChildItem $inputFolder -Filter "*.csv" -file | %{import-csv $_.FullName | where machine_type -eq 'workstations' | export-csv $outputFile -Append -notype }
Note: The reason for not piping Get-ChildItem directly to Import-Csv (Get-ChildItem ... | Import-Csv ...), and instead calling Import-Csv from the script block ({ ... }) of an auxiliary ForEach-Object call, is a bug in Windows PowerShell that has since been fixed in PowerShell Core. See the bottom section for a more concise workaround.
However, even output from ForEach-Object script blocks should stream to the remaining pipeline commands, so you shouldn't run out of memory - after all, a salient feature of the PowerShell pipeline is object-by-object processing, which keeps memory use constant, irrespective of the size of the (streaming) input collection.
You've since confirmed that avoiding the aux. ForEach-Object call does not solve the problem, so we still don't know what causes your out-of-memory exception.
Update:
This GitHub issue contains clues as to the reason for excessive memory use, especially with many properties that contain small amounts of data.
This GitHub feature request proposes using strongly typed output objects to help the issue.
The following workaround, which uses the switch statement to process the files as text files, may help:
$header = ''
Get-ChildItem $inputFolder -Filter *.csv | ForEach-Object {
$i = 0
switch -Wildcard -File $_.FullName {
'*workstations*' {
# NOTE: If no other columns contain the word `workstations`, you can
# simplify and speed up the command by omitting the `ConvertFrom-Csv` call
# (you can make the wildcard matching more robust with something
# like '*,workstations,*')
if ((ConvertFrom-Csv "$header`n$_").machine_type -ne 'workstations') { continue }
$_ # row whose 'machine_type' column value equals 'workstations'
}
default {
if ($i++ -eq 0) {
if ($header) { continue } # header already written
else { $header = $_; $_ } # header row of 1st file
}
}
}
} | Set-Content $outputFile
Here's a workaround for the bug of not being able to pipe Get-ChildItem output directly to Import-Csv, by passing it as an argument instead:
Import-Csv -LiteralPath (Get-ChildItem $inputFolder -File -Filter *.csv) |
Where-Object { $_.machine_type -eq 'workstations' } |
Export-Csv $outputFile -NoType
Note that in PowerShell Core you could more naturally write:
Get-ChildItem $inputFolder -File -Filter *.csv | Import-Csv |
Where-Object { $_.machine_type -eq 'workstations' } |
Export-Csv $outputFile -NoType
Solution 2 :
$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"
$encoding = [System.Text.Encoding]::UTF8 # modify encoding if necessary
$Delimiter=','
#find the header for your files => take the first row of the first file that contains data
$Header = Get-ChildItem -Path $inputFolder -Filter *.csv | Where length -gt 0 | select -First 1 | Get-Content -TotalCount 1
#if no header was found, there is no file with size > 0 => quit
if(! $Header) {return}
#create an array for the header
$HeaderArray=$Header -split $Delimiter -replace '"', ''
#open output file
$w = New-Object System.IO.StreamWriter($outputfile, $true, $encoding)
#write header founded
$w.WriteLine($Header)
#loop on file csv
Get-ChildItem $inputFolder -File -Filter "*.csv" | %{
#open file for read
$r = New-Object System.IO.StreamReader($_.fullname, $encoding)
$skiprow = $true
#compare against $null so that an empty line does not end the loop early
while ($null -ne ($line = $r.ReadLine()))
{
#exclude header
if ($skiprow)
{
$skiprow = $false
continue
}
#get an object for the current row using the header that was found
$Object=$line | ConvertFrom-Csv -Header $HeaderArray -Delimiter $Delimiter
#write the row to the output file if it meets the condition
if ($Object.machine_type -eq 'workstations') { $w.WriteLine($line) }
}
$r.Close()
$r.Dispose()
}
$w.close()
$w.Dispose()
You have to read and write to the .csv files one row at a time, using StreamReader and StreamWriter:
$filepath = "C:\Change\2019\October"
$outputfile = "C:\Change\2019\output.csv"
$encoding = [System.Text.Encoding]::UTF8
$files = Get-ChildItem -Path $filepath -Filter *.csv
$w = New-Object System.IO.StreamWriter($outputfile, $true, $encoding)
$skiprow = $false
foreach ($file in $files)
{
$r = New-Object System.IO.StreamReader($file.fullname, $encoding)
$isHeader = $true
while (($line = $r.ReadLine()) -ne $null)
{
if ($isHeader)
{
# write the header row only for the first file
if (!$skiprow) { $w.WriteLine($line) }
$isHeader = $false
}
elseif ($line -like '*workstations*')
{
# plain-text match; assumes 'workstations' only appears in the machine_type column
$w.WriteLine($line)
}
}
$r.Close()
$r.Dispose()
$skiprow = $true
}
$w.close()
$w.Dispose()
get-content *.csv | add-content combined.csv
Make sure combined.csv doesn't exist when you run this, or it's going to go full Ouroboros.

Copy-Item by header & timestamp

I am using the code below to filter out files depending on the headers in the file.
It works like a charm, but it processes every file in $InputDirectory.
I would like to limit it to files that are at most 1-2 hours old.
There are two ways to get the date for this:
Filename contains timestamp = XXXXXXXXXXX_XXXXXXXX_valuereport_YYYYMMDDhhmmss.csv
The timestamp the file was created (please note we are talking about 800K-1M files in the directory, and more are added every hour, so the fastest way would be appreciated).
So how do I change my code so that, besides checking the header, it only takes files that are less than 1-2 hours old?
Sorry about the code example, I am new to this site and not sure how to get it in the right order.
Nothing yet.
foreach ($FilePath in (Get-ChildItem $InputDirectory -File) | Select-Object -ExpandProperty FullName) {
$Header = Get-Content $FilePath -First 1
# test for a string in the header line that distincts it from the other files
if ($Header -match ';energy,Wh,') {
# the substring ';energy,Wh,' defines this file as a 'HeatMeter' file
Copy-Item -Path $FilePath -Destination $OutputPathHeat
} elseif ($Header -match ';fabrication-no,,inst-value,0,0,0;datetime,,inst-value,0,0,0;volume,m3') {
# the substring ';datetime,,inst-value,0,0,0;volume,m3' defines this file as a 'WaterMeter' file
Copy-Item -Path $FilePath -Destination $OutputPathWater
} else {
# if all key substrings above did not match, move to the 'Other' directory
Copy-Item -Path $FilePath -Destination $OutputPathOther
}
There are several ways to filter a directory listing. The easiest way is to pipe the result of Get-ChildItem through Where-Object like:
Get-ChildItem -Path $InputDirectory -File |
Where-Object { $_.CreationTime -gt (Get-Date).AddHours(-2) } |
Select-Object -ExpandProperty FullName |
ForEach-Object {
$FilePath = $_
$Header = Get-Content $FilePath -First 1
# test for a string in the header line that distincts it from the other files
if ($Header -match ';energy,Wh,') {
# the substring ';energy,Wh,' defines this file as a 'HeatMeter' file
Copy-Item -Path $FilePath -Destination $OutputPathHeat
}
elseif ($Header -match ';fabrication-no,,inst-value,0,0,0;datetime,,inst-value,0,0,0;volume,m3') {
# the substring ';datetime,,inst-value,0,0,0;volume,m3' defines this file as a 'WaterMeter' file
Copy-Item -Path $FilePath -Destination $OutputPathWater
}
else {
# if all key substrings above did not match, move to the 'Other' directory
Copy-Item -Path $FilePath -Destination $OutputPathOther
}
}
It checks that the CreationTime is greater than now - 2h. Note that the last modified (LastWriteTime) timestamp may also be suitable for your use case.
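The question also mentions that the file name itself carries a timestamp (..._valuereport_YYYYMMDDhhmmss.csv). As a rough sketch of that alternative, the name can be parsed instead of reading CreationTime. The helper name Get-ReportTimestamp, the sample file names and the fixed "now" value are all hypothetical and only serve the illustration; the regular expression assumes the 14-digit stamp sits directly before the .csv extension.

```powershell
# Hypothetical helper: extracts the trailing yyyyMMddHHmmss stamp from a report file name.
function Get-ReportTimestamp {
    param([string]$FileName)

    if ($FileName -match '_(\d{14})\.csv$') {
        $culture = [System.Globalization.CultureInfo]::InvariantCulture
        return [datetime]::ParseExact($Matches[1], 'yyyyMMddHHmmss', $culture)
    }
    return $null   # name does not follow the expected pattern
}

# Made-up file names and a fixed "now" so the result is reproducible.
$now    = [datetime]'2019-10-01T12:00:00'
$cutoff = $now.AddHours(-2)
$names  = @(
    'A_B_valuereport_20191001113000.csv'   # 30 minutes before $now
    'A_B_valuereport_20191001080000.csv'   # 4 hours before $now
    'notes.txt'                            # no timestamp in the name
)

# Keep only names whose embedded timestamp is newer than the cutoff.
$recent = $names | Where-Object {
    $stamp = Get-ReportTimestamp $_
    $stamp -and $stamp -gt $cutoff
}
$recent   # only the 11:30 file passes the filter
```

In the real pipeline the same test would go into the Where-Object block, applied to $_.Name and with (Get-Date).AddHours(-2) as the cutoff. Whether this is faster than comparing CreationTime on a directory of this size would have to be measured.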

Write folder name and subfolder name being deleted to output file

I have written the below with the intention of deleting all folders in a directory that have a creation date of 2 days or more and log this in an output file if it is successful or not.
The script works as I would like with the exception that the name of the file will not show in the output file. All that is displayed is 'Deletion of Failed/Successful'
$dump_path = "C:\desktop"
$max_days = "-2"
$curr_date = Get-Date
$del_date = $curr_date.AddDays($max_days)
ForEach-Object {
$filename = $_
Get-ChildItem $statfolder\$_ -Recurse | Where-Object {
$_.CreationTime -lt $del_date
} | Remove-Item -Recurse -Force
if ($? -eq $false) {
echo "$Deletion of $filename Failed" |
Out-File -Append C:\Logs\DELETION_FAIL_K_$(Get-Date -Format `"dd-MMM-yyyy`").txt
} else {
Write-Output "Deletion of $filename Successful" |
Out-File -Append C:\Logs\DELETION_SUCCESS_K_$(Get-Date -format `"dd-MMM-yyyy`").txt
}
}
I would ideally like the log to display the parent folder name and a list of the sub folders in next level down only. Is this possible?
Eg. the log would read
Deletion of folder 12-Jan-2017 containing sub folders R2015, R2086 was Successful
If the sub folders in the next level cannot be added then just the below would be great:
Deletion of folder 12-Jan-2017 was Successful
The way you're using ForEach-Object the current object variable $_ (and consequentially the variable $filename) is never populated. Where would you expect the value to come from?
Feed the output of Get-ChildItem | Where-Object into ForEach-Object, but sort the results by full name first, so that nested folders are deleted before their parents.
Get-ChildItem $statfolder\$_ -Recurse | Where-Object {
$_.PSIsContainer -and
$_.CreationTime -lt $del_date
} | Sort-Object FullName -Descending | ForEach-Object {
$folder = $_.FullName
"Deleting folder '$folder'."
Remove-Item $folder -Recurse -Force -WhatIf
}
With the -WhatIf switch present you'll be doing a dry-run, just echoing what would be deleted without actually deleting it. After you verified that everything would work as intended remove the switch and re-run.
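For the log format asked for in the question, the message can be assembled from the folder name and the names of its immediate subfolders. The following is only a sketch: the function name Get-DeletionLogLine and its parameters are hypothetical and not part of the original script.

```powershell
# Hypothetical helper: builds the log line for one folder from its name and
# the names of its immediate subfolders.
function Get-DeletionLogLine {
    param(
        [string]$FolderName,
        [string[]]$SubFolderNames,
        [bool]$Succeeded
    )

    $status = if ($Succeeded) { 'Successful' } else { 'Failed' }
    if ($SubFolderNames) {
        # Mention the first-level subfolders when there are any.
        "Deletion of folder $FolderName containing sub folders $($SubFolderNames -join ', ') was $status"
    }
    else {
        "Deletion of folder $FolderName was $status"
    }
}

$line = Get-DeletionLogLine -FolderName '12-Jan-2017' -SubFolderNames 'R2015', 'R2086' -Succeeded $true
$line
```

In the deletion loop, the subfolder names would have to be collected before Remove-Item runs (for example from Get-ChildItem on the folder, filtered with PSIsContainer), and the value of $? captured right after Remove-Item can be passed as -Succeeded before the line is sent to Out-File -Append.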

Recursively count files in subfolders

I am trying to count the files in all subfolders in a directory and display them in a list.
For instance the following dirtree:
TEST
/VOL01
file.txt
file.pic
/VOL02
/VOL0201
file.nu
/VOL020101
file.jpg
file.erp
file.gif
/VOL03
/VOL0301
file.org
Should give as output:
PS> DirX C:\TEST
Directory Count
----------------------------
VOL01 2
VOL02 0
VOL02/VOL0201 1
VOL02/VOL0201/VOL020101 3
VOL03 0
VOL03/VOL0301 1
I started with the following:
Function DirX($directory)
{
foreach ($file in Get-ChildItem $directory -Recurse)
{
Write-Host $file
}
}
Now I have a question: why is my Function not recursing?
Something like this should work:
dir -recurse | ?{ $_.PSIsContainer } | %{ Write-Host $_.FullName (dir $_.FullName | Measure-Object).Count }
dir -recurse lists all files under current directory and pipes (|) the result to
?{ $_.PSIsContainer } which filters directories only then pipes again the resulting list to
%{ Write-Host $_.FullName (dir $_.FullName | Measure-Object).Count } which is a foreach loop that, for each member of the list ($_) displays the full name and the result of the following expression
(dir $_.FullName | Measure-Object).Count which provides a list of files under the $_.FullName path and counts members through Measure-Object
?{ ... } is an alias for Where-Object
%{ ... } is an alias for ForEach-Object
Similar to David's solution, this works in PowerShell v3.0 and does not use aliases, in case someone is not familiar with them:
Get-ChildItem -Recurse -Directory | ForEach-Object { Write-Host $_.FullName (Get-ChildItem $_.FullName | Measure-Object).Count }
Answer Supplement
Based on a comment about keeping your function and loop structure, I provide the following. Note: I do not recommend this solution, as it is ugly and the built-in cmdlets handle this very well. However, I like to help, so here is an update of your script.
Function DirX($directory)
{
$output = @{}
foreach ($singleDirectory in (Get-ChildItem $directory -Recurse -Directory))
{
$count = 0
foreach($singleFile in Get-ChildItem $singleDirectory.FullName)
{
$count++
}
$output.Add($singleDirectory.FullName,$count)
}
$output | Out-String
}
For each $singleDirectory count all files using $count ( which gets reset before the next sub loop ) and output each finding to a hash table. At the end output the hashtable as a string. In your question you looked like you wanted an object output instead of straight text.
Well, the way you are doing it the entire Get-ChildItem cmdlet needs to complete before the foreach loop can begin iterating. Are you sure you're waiting long enough? If you run that against very large directories (like C:) it is going to take a pretty long time.
Edit: saw you asked earlier for a way to make your function do what you are asking, here you go.
Function DirX($directory)
{
foreach ($file in Get-ChildItem $directory -Recurse -Directory )
{
[pscustomobject] @{
'Directory' = $File.FullName
'Count' = (GCI $File.FullName -Recurse).Count
}
}
}
DirX D:\
The foreach loop only gets directories, since that is all we care about; inside the loop, a custom object is created for each iteration with the full path of the folder and the count of the items inside the folder.
Also, please note that this only works in PowerShell 3.0 or newer, since the -Directory parameter did not exist in 2.0.
Get-ChildItem $rootFolder `
-Recurse -Directory |
Select-Object `
FullName, `
@{Name="FileCount";Expression={(Get-ChildItem $_.FullName -File |
Measure-Object).Count }}
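To get output shaped like the table in the question (paths relative to the root, counting only files directly inside each folder), something along these lines may work. It is a sketch that builds part of the question's tree in a scratch folder; the scratch location and the variable names $root, $files and $result are assumptions, and it needs PowerShell 3.0 or newer for -Directory and -File.

```powershell
# Scratch tree under the temp directory, using some of the names from the question.
$rootPath = Join-Path ([System.IO.Path]::GetTempPath()) ("dirx_demo_" + [guid]::NewGuid().ToString('N'))
$root = (New-Item -ItemType Directory -Path $rootPath).FullName
$files = 'VOL01/file.txt', 'VOL01/file.pic', 'VOL02/VOL0201/file.nu', 'VOL03/VOL0301/file.org'
foreach ($relative in $files) {
    $full = Join-Path $root $relative
    New-Item -ItemType Directory -Path (Split-Path $full -Parent) -Force | Out-Null
    New-Item -ItemType File -Path $full | Out-Null
}

# One object per subfolder: path relative to the root, plus the number of
# files directly inside that folder (not in its subfolders).
$result = Get-ChildItem -Path $root -Recurse -Directory | ForEach-Object {
    [pscustomobject]@{
        Directory = $_.FullName.Substring($root.Length + 1)
        Count     = @(Get-ChildItem -LiteralPath $_.FullName -File).Count
    }
} | Sort-Object Directory

$result | Format-Table -AutoSize

# Clean up the scratch tree.
Remove-Item -Path $root -Recurse -Force
```

Wrapping the Get-ChildItem call in @( ) keeps .Count reliable when a folder contains zero or one file.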
My version - slightly cleaner and dumps content to a file
Original - Recursively count files in subfolders
Second Component - Count items in a folder with PowerShell
$FOLDER_ROOT = "F:\"
$OUTPUT_LOCATION = "F:DLS\OUT.txt"
Function DirX($directory)
{
Remove-Item $OUTPUT_LOCATION
foreach ($singleDirectory in (Get-ChildItem $directory -Recurse -Directory))
{
$count = Get-ChildItem $singleDirectory.FullName -File | Measure-Object | %{$_.Count}
$summary = $singleDirectory.FullName+" "+$count+" "+$singleDirectory.LastAccessTime
Add-Content $OUTPUT_LOCATION $summary
}
}
DirX($FOLDER_ROOT)
I modified David Brabant's solution just a bit so I could evaluate the result:
$FileCounter=gci "$BaseDir" -recurse | ?{ $_.PSIsContainer } | %{ (gci "$($_.FullName)" | Measure-Object).Count }
Write-Host "File Count=$FileCounter"
If($FileCounter -gt 0) {
... take some action...
}