Compare multiple folders for file differences - powershell

I began to compare 2 folder structures to find files that did not match by date and size, but the requirement has been changed to 4 folders and I am stuck.
So here is what I am trying to do:
We upload several hundred folders\files to 4 different servers. The files must all match. Sometimes a file will not copy properly. So I need a script to read all four directories and compare all the files to make sure they match by size and date.
Output should only be a simple list that shows me the files that didn't match.
Any ideas?
Thanks.
I can do two folders but am lost on four. Also, this output is confusing. Not sure how to only list those that don't match.
$path1 = "\\path\folder
$path2 = "\\path\folder1
$dif = Compare-Object -ReferenceObject $path1 -DifferenceObject $path2 -Property FullName, Length, LastWriteTime
$dif | ft -AutoSize

I'd go about it with a hash-based approach, and possibly use a database table somewhere to help yourself out. BTW, PSCX has the Get-Hash cmdlet which will help you do this.
Basic approach
Traverse each server's desired folder tree (you want to do this on the servers involved for performance reasons, not over a network share!) and generate a hash for each file you find. Store the hash, the full path and the server name somewhere, preferably a database table accessible from all four servers--it'll make processing much easier.
Then, if you've used a database table, write a few simple queries:
1. Find any hash where there are fewer than 4 instances of the hash.
2. Find any file path (you may have to process the path string to get it to the same relative root for each server) where there are differing hashes (although this might be covered by 1. above).
All of this can be done from within PS, of course.
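As an illustration, a minimal per-server hash pass might look like the sketch below. It uses the built-in Get-FileHash cmdlet (PowerShell 4.0+) rather than PSCX Get-Hash, and the root folder and output path are placeholders:
# Run on each server against its local copy of the upload root (placeholder paths)
$root       = 'D:\Uploads'
$serverName = $env:COMPUTERNAME

Get-ChildItem -Path $root -Recurse -File | ForEach-Object {
    [pscustomobject]@{
        Server       = $serverName
        RelativePath = $_.FullName.Substring($root.Length)
        Hash         = (Get-FileHash -Path $_.FullName -Algorithm MD5).Hash
    }
} | Export-Csv -Path "D:\Hashes_$serverName.csv" -NoTypeInformation
Those rows can then be bulk-loaded into the shared table, and the two queries above become simple GROUP BY / HAVING statements.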
Why this way of doing things may be helpful
You don't have to run a four-way Compare-Object. The hashes serve as your point of comparison.
Your Powershell code to generate the hashes is one identical function that gets run on each server.
It scales. You could easily do this for 100 folders.
You end up with something easily manipulated and "distributed", i.e. accessible to the servers involved--the database table.
Downside
PSCX Get-Hash isn't very fast. This can easily be remedied by having PowerShell shell out to a faster hash-generating command, such as md5sums.
How to do without using a database table
1. Write the hashes, file paths and server names to files on each server as you are processing folders for hashes, and bring those files back when done.
2. Process the files into a hash table that keys on the hash codes and counts each hash code.
3. You can have a parallel hash table (built at the same time as 2. while you pass through the result files) that keys each hash code to an array of paths/servers for that hash code.
4. Look for hash codes in hash table 1 with a count of less than 4. Use parallel hash table 2 to look up the hash codes found with a count of less than 4, to find out what the file path(s) and server(s) were.
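A rough, untested sketch of steps 1-4, assuming the per-server result files were produced like the CSV sketch above (file locations and column names are assumptions):
# 1. Gather every server's result file
$allRows = Get-ChildItem -Path 'D:\Results\Hashes_*.csv' | ForEach-Object { Import-Csv $_.FullName }

# 2. + 3. Build the two parallel hash tables: counts per hash code, and paths/servers per hash code
$counts = @{}
$owners = @{}
foreach ($row in $allRows) {
    $counts[$row.Hash] = 1 + [int]$counts[$row.Hash]
    $owners[$row.Hash] += @("$($row.Server): $($row.RelativePath)")
}

# 4. Any hash seen fewer than 4 times points at a file that is missing or different somewhere
foreach ($hash in $counts.Keys) {
    if ($counts[$hash] -lt 4) {
        $owners[$hash]
    }
}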

Try this:
Remember that the PrimaryPath has to be a master location (contents are correct). Also, be consistent with how you write the paths (whether you include the trailing \ or not). E.g. either use c:\folders\folder1\ for all paths or c:\folders\folder1.
Compare.ps1
Param(
    [parameter(Mandatory=$true)] [alias("p")] [string]$PrimaryPath,
    [parameter(Mandatory=$true)] [alias("c")] [string[]]$ComparePath
)

# Get file list with a RelativePath property added to each file
function Get-FilesWithRelativePath ($Path) {
    Get-ChildItem $Path -Recurse | Where-Object { !$_.PSIsContainer } | ForEach-Object {
        Add-Member -InputObject $_ -MemberType NoteProperty -Name RelativePath -Value $_.FullName.Substring($Path.Length)
        $_
    }
}

# If path exists and is a folder
if (Test-Path $PrimaryPath -PathType Container) {
    # Get master file list
    $Masterfiles = Get-FilesWithRelativePath (Resolve-Path $PrimaryPath).Path

    # Compare folders
    foreach ($Folder in $ComparePath) {
        if (Test-Path $Folder -PathType Container) {
            # Get file list and add the relative-path property to its files
            $ResolvedFolder = (Resolve-Path $Folder).Path
            $Files = Get-FilesWithRelativePath $ResolvedFolder

            # Compare and output the file path of any missing or outdated file
            Compare-Object -ReferenceObject $Masterfiles -DifferenceObject $Files -Property RelativePath, Length, LastWriteTime |
                Where-Object { $_.SideIndicator -eq "<=" } |
                Select-Object @{n="FilePath";e={Join-Path $ResolvedFolder $_.RelativePath}}
        } else { Write-Error "$Folder is not a valid foldername. Foldertype: Compare" }
    }
} else { Write-Error "$PrimaryPath is not a valid foldername. Foldertype: Master" }

Related

How to handle copy file with infinity loop using Powershell?

I want to check for .jpg files in the 2nd folder. The 2nd folder has some subfolders. If a .jpg exists in a subfolder of the 2nd folder, I copy a file from the 1st folder to that subfolder of the 2nd folder, based on the base name. I can do this part thanks to this great answer: How to copy file based on matching file name using PowerShell?
https://stackoverflow.com/a/58182359/11066255
But I want to run this process in an infinite loop. Once I use an infinite loop, I find that I get a lot of duplicate files. How do I add a limitation so that, if I have already copied a file, I do not copy it again on the next loop?
Can anyone help me please? Thank you.
for(;;)
{
    $Job_Path = "D:\Initial"
    $JobError = "D:\Process"

    Get-ChildItem -Path "$OpJob_Path\*\*.jpg" | ForEach-Object {
        $basename = $_.BaseName.Substring(15)
        $job = "$Job_Path\${basename}.png"
        if (Test-Path $job) {
            $timestamp = Get-Date -Format 'yyyyMMddhhmmss'
            $dst = Join-Path $_.DirectoryName "${timestamp}_${basename}.gif"
            $Get = (Get-ChildItem -Name "$OpJob_Path\*\*$basename.jpg*" | Measure-Object).Count
            Copy-Item $job $dst -Force
        }
    }
}
File management 101: Windows will not allow duplicate file names in the same location. You can only end up with duplicate files when each file's name is unique but the content is the same. So just check for the file name (it has to be exactly the same file name), do your work if it is not a match, and do nothing otherwise.
Also, personally, I'd suggest using a PowerShell FileSystemWatcher instead of an infinite loop. Just saying...
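For reference, a bare-bones FileSystemWatcher setup might look like this (the watched path and filter are placeholders; your copy logic would go inside the action block instead of a polling loop):
$watcher = New-Object System.IO.FileSystemWatcher
$watcher.Path                  = 'D:\Process'
$watcher.Filter                = '*.jpg'
$watcher.IncludeSubdirectories = $true
$watcher.EnableRaisingEvents   = $true

Register-ObjectEvent -InputObject $watcher -EventName Created -Action {
    # Fires once per new .jpg instead of rescanning everything in a loop
    $newFile = $Event.SourceEventArgs.FullPath
    Write-Host "New file detected: $newFile"
    # ... copy/rename logic goes here ...
}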
This line …
$timestamp = Get-Date -Format 'yyyyMMddhhmmss'
… will always generate a unique filename by design; the content of the file plays no part in it, so it cannot tell you whether you already copied the file, unless you use file hashing for the compare as part of this.
Either remove or change that line to something else, or use a file hash (a hash identifies the content regardless of the name used) ...
Get-FileHash -Path 'D:\Temp\input.txt'
Algorithm Hash Path
--------- ---- ----
SHA256 1C5B508DED35A28B9CCD815D47ECF500ECF8DDC2EDD028FE72AB5505C0EC748B D:\Temp\input.txt
... for the compare, prior to the copy, inside another if/then.
something like...
If ($job.Hash -ne $dst.Hash)
{Copy-Item $job.Path $dst.Path}
Else
{
#Do nothing
}
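Fleshed out a little, that check could look like the sketch below. It is only a sketch and assumes $dst is a stable destination name (not re-timestamped on every pass), so an existing copy can actually be found and compared:
if (-not (Test-Path $dst)) {
    # Nothing there yet, so copy it
    Copy-Item -Path $job -Destination $dst -Force
}
elseif ((Get-FileHash -Path $job).Hash -ne (Get-FileHash -Path $dst).Hash) {
    # Destination exists but the content differs, so copy again
    Copy-Item -Path $job -Destination $dst -Force
}
# Otherwise the file is already there with identical content; do nothing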
There are of course other ways to do this as well, this is just one idea.

How do I copy a list of files and rename them in a PowerShell Loop

We are copying a long list of files from their different directories into a single location (same server). Once there, I need to rename them.
I was able to move the files until I found out that there are duplicates in the list of file names to move (and rename). It would not allow me to copy the file multiple times into the same destination.
Here is the list of file names after the move:
"10.csv",
"11.csv",
"12.csv",
"13.csv",
"14.csv",
"15.csv",
"16.csv",
"17.csv",
"18.csv",
"19.csv",
"20.csv",
"Invoices_Export(16) - Copy.csv" (this one's name should be "Zebra.csv")
I wrote a couple of foreach loops, but it is not working exactly correctly.
The script moves the files just fine. It is the rename that is not working the way I want. The first file does not rename; the other files rename. However, they leave the moved file in place too.
This script requires a csv that has 3 columns:
Path of the file, including the file name (eg. c:\temp\smefile.txt)
Destination of the file, including the file name (eg. c:\temp\smefile.txt)
New name of the file. Just the name and extension.
# Variables
$Path = (import-csv C:\temp\Test-CSV.csv).Path
$Dest = (import-csv C:\temp\Test-CSV.csv).Destination
$NN = (import-csv C:\temp\Test-CSV.csv).NewName
#Script
foreach ($D in $Dest) {
$i -eq 0
Foreach ($P in $Path) {
Copy-Item $P -destination C:\Temp\TestDestination -force
}
rename-item -path "$D" -newname $NN[$i] -force
$i += 1
}
There were no error per se, just not the outcome that I expected.
Welcome to Stack Overflow!
There are a couple ways to approach the duplicate names situation:
1. Check if the file already exists in the destination with Test-Path. If it does, start a while loop that appends a number to the end of the name and checks whether that exists, incrementing the number after each Test-Path check. Keep looping until Test-Path comes back $false, then break out of the loop (a sketch of this is shown after the refactored script below).
2. Write an error message and skip that row in the CSV.
I'm going to show a refactored version of your script with approach #2 above:
$csv = Import-Csv 'C:\temp\Test-CSV.csv'

foreach ($row in $csv)
{
    $fullDestinationPath = Join-Path -Path $row.Destination -ChildPath $row.NewName

    if (Test-Path $fullDestinationPath)
    {
        Write-Error ("The path '$fullDestinationPath' already exists. " +
            "Skipping row for $($row.Path).")
        continue
    }

    # You may also want to check if $row.Path exists before attempting to copy it
    Copy-Item -Path $row.Path -Destination $fullDestinationPath
}
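And for completeness, here is a minimal sketch of approach #1. It reuses the same $row and $fullDestinationPath variables, and the body of the foreach loop would become something like this, appending a counter to the new name until a free path is found:
$candidatePath = $fullDestinationPath
$counter = 1

while (Test-Path $candidatePath)
{
    # Append (1), (2), ... to the file name until the path no longer exists
    $baseName  = [System.IO.Path]::GetFileNameWithoutExtension($row.NewName)
    $extension = [System.IO.Path]::GetExtension($row.NewName)
    $candidatePath = Join-Path -Path $row.Destination -ChildPath "$baseName($counter)$extension"
    $counter++
}

Copy-Item -Path $row.Path -Destination $candidatePath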
Now that your question is answered, here are some thoughts for improving your code:
Avoid using acronyms and abbreviations in identifiers (variable names, function names, etc.) when possible. Remember that code is written for humans and someone else has to be able to understand your code; make everything as obvious as possible. Someone else will have to read your code eventually, even if it's Future-You™!
Don't Repeat Yourself (called the "DRY" principle). As Lee_daily mentioned in the comments, you don't need to import the CSV file three times. Import it once into a variable and then use the variable to access the properties.
Try to be consistent. PowerShell is case-insensitive, but you should pick a style and stick to it (i.e. ForEach or foreach, Rename-Item or rename-item, etc.). I would recommend PascalCase as PowerShell cmdlets are all in PascalCase.
Wrap literal paths in single quotes (or double quotes if you need string interpolation). Paths can have spaces in them and without quotes, PowerShell interprets a space as you are passing another argument.
$i -eq 0 is not an assignment statement, it is a boolean expression. When you run $i -eq 0, PowerShell will return $true or $false because you are asking it if the value stored in $i is 0. To assign the value 0 to $i, you need to write it like this: $i = 0.
There's nothing wrong with $i += 1, but it could be shortened to $i++, if you want to.
When you can, try to check for common issues that may come up with your code. Always think about what can go wrong. "If I copy a file, what can go wrong? Does the source file or folder exist? Is the name pulled from the CSV a valid path name or does it contain characters that are invalid in a path (like :)?" This is called defensive programming and it will save you so so many headaches. As with anything in life, be careful not to go overboard. Only check for likely scenarios; rare edge-cases should just raise errors.
Write some decent logs so you can see what happened at runtime. PowerShell provides a pair of great cmdlets called Start-Transcript and Stop-Transcript. These cmdlets log all the output that was sent to the PowerShell console window, in addition to some system information like the version of PowerShell installed on the machine. Very handy!
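For example (the log file path here is just a placeholder):
Start-Transcript -Path 'C:\Temp\copy-rename.log' -Append
# ... run the copy/rename script here ...
Stop-Transcript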

PowerShell: Find similar filenames in a directory

In a purely hypothetical situation of a person that downloaded some TV episodes, but is wondering if he/she accidentally downloaded an HDTV, a WEBRip and a WEB-DL version of an episode, how could PowerShell find these 'duplicates' so the lower quality versions can be automagically deleted?
First, I'd get all the files in the directory:
$Files = Get-ChildItem -Path $Directory -Exclude '*.nfo','*.srt','*.idx','*.sub' |
Sort-Object -Property Name
I exclude the non-video extensions for now, since they would cause false positives. I would still have to deal with them though (during the delete phase).
At this point, I would likely use a ForEach construct to parse through the files one by one and look for files that have the same episode number. If there are any, they should be looked at.
Assuming the common spaces-become-dots naming convention here, a typical filename would be AwesomeSeries.S01E01.HDTV.x264-RLSGRP
To compare, I need to get only the episode number. In the above case, that means S01E01:
If ($File.BaseName -match 'S*(\d{1,2})(x|E)(\d{1,2})') { $EpisodeNumber = $Matches[0] }
In the case of S01E01E02 I would simply add a second if-statement, so I'm not concerned with that for now.
$EpisodeNumber should now contain S01E01. I can use that to discover if there are any other files with that episode number in $Files. I can do that with:
$Files -match $EpisodeNumber
This is where my trouble starts. The above will also return the file I'm processing. I could at this point handle the duplicates immediately, but then I would have to do the Get-ChildItem again because otherwise the same match would be returned when the ForEach construct gets to the duplicate file which would then result in an error.
I could store the files I wish to delete in an array and process them after the ForEach construct is over, but then I'd still have to filter out all the duplicates. After all, in the ForEach loop,
AwesomeSeries.S01E01.HDTV.x264-RLSGRP
would first match
AwesomeSeries.S01E01.WEB-DL.x264.x264-RLSGRP, only for
AwesomeSeries.S01E01.WEB-DL.x264.x264-RLSGRP
to match
AwesomeSeries.S01E01.HDTV.x264-RLSGRP afterwards.
So maybe I should process every episode number only once, but how?
I get the feeling I'm being very inefficient here and there must be a better way to do this, so I'm asking for help. Can anyone point me in the right direction?
Filter the $Files array to exclude the current file when matching:
($Files | Where-Object {$_.FullName -ne $File.FullName}) -match $EpisodeNumber
Regarding the duplicates in the array at the end, you can use Select-Object -Unique to only get distinct entries.
Since you know how to get the episode number let's use that to group the files together.
$Files = Get-ChildItem -Path $Directory -Exclude '*.nfo','*.srt','*.idx','*.sub' |
    Select-Object FullName, @{Name="EpisodeIndex";Expression={
        # We do not have to do it like this, but if your detection logic gets more complicated then
        # having this Select-Object block will be a cleaner option than using a calculated property
        If ($_.BaseName -match 'S*(\d{1,2})(x|E)(\d{1,2})'){$Matches[0]}
    }}

# Group the files by season/episode index (those that have one). Return groups that have more than one member, as those would need attention.
$Files | Where-Object{$_.EpisodeIndex} | Group-Object -Property EpisodeIndex |
Where-Object{$_.Count -gt 1} | ForEach-Object{
    # Expand the group members
    $_.Group
    # Not sure how you plan on dealing with it.
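    # One possible (purely hypothetical) way of dealing with it: rank each copy by a quality
    # keyword and keep only the best one. The keyword ranking below is an assumption; adjust to taste.
    $qualityRank = @{ 'WEB-DL' = 3; 'WEBRip' = 2; 'HDTV' = 1 }
    $_.Group |
        Sort-Object -Property @{Expression={
            $rank = 0
            foreach ($keyword in $qualityRank.Keys) {
                if ($_.FullName -match [regex]::Escape($keyword)) { $rank = $qualityRank[$keyword] }
            }
            $rank
        }} -Descending |
        Select-Object -Skip 1   # everything after the best-ranked copy is a candidate for deletion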
}

Copying files defined in a list from network location

I'm trying to teach myself enough PowerShell or batch programming to figure out how to achieve the following (I've had a search and looked through a couple of hours of YouTube tutorials but can't quite piece it all together to figure out what I need - I don't get Tokens, for example, but they seem necessary in the For loop). Also, I'm not sure if the below is best achieved by robocopy or xcopy.
Task:
Define a list of files to retrieve in a csv (file name will be listed as a 13 digit number, extension will be UNKNOWN, but will usually be .jpg but might occasionally be .png - could this be achieved with a wildcard?)
list would read something like:
9780761189931
9780761189988
9781579657159
For each line in this text file, do:
Search a network folder and all subfolders
If exact filename is found, copy to an arbitrary target (say a new folder created on desktop)
(Not 100% necessary, but nice to have) Once the For loop has completed, output a list of files copied into a text file in the newly created destination folder
I gather that I'll maybe need to do a couple of things first, like define variables for the source and destination folders? I found the below elsewhere but couldn't quite get my head around it.
set src_folder=O:\2017\By_Month\Covers
set dst_folder=c:\Users\%USERNAME%\Desktop\GetCovers
for /f "tokens=*" %%i in (ISBN.txt) DO (
xcopy /K "%src_folder%\%%i" "%dst_folder%"
)
Thanks in advance!
This solution is in powershell, by the way.
To get all the files under a folder, use Get-ChildItem and the pipeline, and you can then compare each name to the contents of your CSV (which you can read using Import-Csv, by the way).
Get-ChildItem -path $src_folder -recurse | foreach{$_.fullname}
I'd personally then use a function to edit the name as a string, but I know this probably isn't the best way to do it. Create a function outside of the pipeline, and have it return a modified path in such a way that you can continue the previous line like this:
Get-ChildItem -path $src_folder -recurse | foreach{$_.CopyTo((edit-path $_.FullName))}
Where "edit-path" is your function that takes in the path and modifies it to return your destination path. Also, you can alternatively use robocopy or xcopy instead of CopyTo, but Copy-Item is a PowerShell native and doesn't require much string manipulation (which in my experience, the less, the better).
Edit: Here's a function that could do the trick:
function edit-path{
    Param([string] $path)
    # Swap the source root for the destination root, keeping the relative part of the path
    $relative_path = $path.Substring($src_folder.Length).TrimStart('\')
    $modified_path = Join-Path $dst_folder $relative_path
    return $modified_path
}
Edit: Here's how to integrate the importing from CSV, so that the copy only happens to files that are written in the CSV (which I had left out, oops):
$csv = import-csv $CSV_path
Get-ChildItem -path $src_folder -recurse | where-object{$csv -contains $_.Name} | foreach{$_.CopyTo((edit-path $_.FullName))}
Note that you have to put the whole CSV path in the $CSV_path variable, and depending on how the contents of that file are written, you may have to use $_.fullname, or other parameters.
This seems like an average enough problem:
$Arr = Import-Csv -Path $CSVPath
Get-ChildItem -Path $Folder -Recurse |
    Where-Object -FilterScript { $Arr -contains $PSItem.Name.Substring(0, ($PSItem.Name.Length - 4)) } |
    ForEach-Object -Process {
        Copy-Item -Path $PSItem.FullName -Destination $env:UserProfile\Desktop
        $PSItem.Name | Out-File -FilePath $env:UserProfile\Desktop\Results.txt -Append
    }
I'm not great with string manipulation so the string bit is a bit confusing, but here's everything spelled out.
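If the string math feels fragile, a hedged alternative is to compare on BaseName so the unknown extension (.jpg vs .png) never matters. The CSV column name 'ISBN' below is an assumption; if the list has no header row at all, Get-Content would do instead of Import-Csv:
# Copy every file whose base name appears in the list, regardless of extension
$Arr = Import-Csv -Path $CSVPath
Get-ChildItem -Path $Folder -Recurse -File |
    Where-Object { $Arr.ISBN -contains $PSItem.BaseName } |
    ForEach-Object {
        Copy-Item -Path $PSItem.FullName -Destination $env:UserProfile\Desktop
        $PSItem.Name | Out-File -FilePath $env:UserProfile\Desktop\Results.txt -Append
    }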

Compare Directory and File removing equals from Directory

This is the code I am trying to execute
$objDir = Get-ChildItem "C:\Users\Bruce\Serena\"
$objFile = Get-Content "C:\Users\Bruce\process.txt"
$matches = (Compare-Object -ReferenceObject $objFile -DifferenceObject $objDir -Property Name,Length -excludedifferent -includeequal)
foreach ($file in $matches)
{
Remove-Item C:\Users\Bruce\Serena\$($file.Name)
}
I want to delete from the directory all items that appear in both the directory and the txt file. Will this code do that?
It's hard to tell what you should do without seeing the format of the data in process.txt, but I can tell you definitively that invoking Compare-Object on the results of Get-ChildItem and Get-Content can't possibly work, because the former returns an array of FileInfo objects and the latter returns an array of strings (or just a string, if the file has only one line).
Compare-Object is intended primarily to compare sets of objects of the same type, though it can be used to compare sets of objects that have common property names. However, in the latter case the properties need to have the same type of information, not just the same names, in order for the comparison to be meaningful.
There's no way for it to guess what content in the strings in $objFile to compare to properties of the FileInfo objects in $objDir. The only property name these object types have in common is Length, but any matches on that property would be meaningless (and very unlikely) coincidences, because they have completely different meanings—the number of characters in the string, and the size of the file in bytes, respectively.
How you should do it depends on what kind of data you have in process.txt. If it's just a list of filenames, then it's as simple as
foreach ($file in (Get-Content 'C:\Users\Bruce\process.txt')) {
Remove-Item -ErrorAction SilentlyContinue "C:\Users\Bruce\Serena\$file"
}
If it's a CSV file containing the name and size of each file, then you'd use Import-Csv rather than Get-Content, to import the data into an array of objects with properties you can compare to the directory listing, but I'd need to see some sample data before getting specific.