Compare two directories for files of the same name but different content - PowerShell

I have dir1 and dir2, which have subfolders and files of the same name. Both folders have roughly 1800 items and I need to compare them to find which files are different. I need to be able to report the name of any file that is either in one and not the other, or in both but with different content.
I have used tools such as WinMerge, which can spot the differences in under a minute. However, I am trying to automate this process, so being able to do it in PowerShell or as a batch command would be ideal.
From a PowerShell standpoint, my searches have suggested pulling the hash of each file and comparing them, which works, but takes forever due to the size of the directories.
If anyone could help steer me in the right direction or how I should approach this, it would be much appreciated.

WinMerge has a CLI which should give you exactly what you need.
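If you'd rather stay in pure PowerShell, one way to avoid hashing all ~1800 files is to compare by relative path and size first, and only hash the pairs whose sizes match. A rough sketch (the $dir1/$dir2 paths are placeholders):

# Sketch: compare two trees by relative path, then by size, hashing only when needed.
# $dir1/$dir2 are placeholders - point them at your actual folders.
$dir1 = 'C:\dir1'
$dir2 = 'C:\dir2'

# Index every file by its path relative to the root.
function Get-FileIndex($root) {
    $index = @{}
    Get-ChildItem -Path $root -Recurse -File | ForEach-Object {
        $relative = $_.FullName.Substring($root.Length).TrimStart('\')
        $index[$relative] = $_
    }
    $index
}

$left  = Get-FileIndex $dir1
$right = Get-FileIndex $dir2

# Files present on only one side.
$left.Keys  | Where-Object { -not $right.ContainsKey($_) } | ForEach-Object { "Only in dir1: $_" }
$right.Keys | Where-Object { -not $left.ContainsKey($_)  } | ForEach-Object { "Only in dir2: $_" }

# Files present on both sides: a different size means different content;
# an equal size means we still have to hash to be sure.
$left.Keys | Where-Object { $right.ContainsKey($_) } | ForEach-Object {
    if ($left[$_].Length -ne $right[$_].Length) {
        "Different (size): $_"
    }
    elseif ((Get-FileHash $left[$_].FullName).Hash -ne (Get-FileHash $right[$_].FullName).Hash) {
        "Different (content): $_"
    }
}

Only the files whose sizes already match get hashed, which is usually a small fraction of the tree.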

Related

Using PowerShell to only read the last month of folders, then copy in 7 days' worth of folders using robocopy

Apologies to start, as I'm new to PowerShell and robocopy.
I have a robocopy command that pulls in any files within its many subfolders that are within a maxage of 7. However, the main folder has a huge number of folders dating back years (and I only need the last 7 days each week it runs), so it is slow reading every file in every folder before it even starts copying.
It looks like PowerShell commands may be a way for me to limit the search of files for my robocopy; would this be possible? Currently robocopy reads each file in each folder in my main folder; ideally I would want it to be smart enough to only search, say, a month's worth of folders and then copy over the last 7 days. This would speed up the run time hugely.
If possible, to go even further: I only want CSV files from each of the folders in my main folder, but currently robocopy searches the other folders and their files as well, which takes time. All the CSV files are in a folder called "run" inside each parent folder (each parent folder is a unique number within the "mainfolder").
my robocopy command:
robocopy \\server\mainfolder \\server\new_main_folder /S /maxage:7 /r:0 /w:0
I was going to point you to either FastCopy or FreeFileSync; both handle long file name paths and work well for me. But I found problems running FastCopy when trying to filter folders the way you described (I wasn't getting the results I expected), so that leaves FreeFileSync. There is a bit of a learning curve with FreeFileSync, but really, the only problem/complaint I've had with it is that the XML-based batch script you can use to automate the program kept changing formats, and they haven't been providing a way to read the old XML batch scripts with the new version of the software. Maybe that has changed; I haven't looked into it lately.
Maybe other people have had better experiences with RoboCopy, but I found it to take many multiples longer to do the same job as many other copy programs. I don't think FreeFileSync is as fast as FastCopy, but I've never seen it behave as badly as what I experienced with RoboCopy.
The way FreeFileSync works is:
You define 1 or more source/destination pairs.
There is a global setting at the top to set the defaults for all copy pairs.
There are individual settings per copy pair that, when set, override the global settings.
In the filter tab of the settings you can set "Time span:" to "Last x days:" and set it to the 7 days that you want.
You can change include from * to something like \run\*.csv. I didn't try that exact pattern, but the patterns I did try worked as expected (Unlike FastCopy).
The Synchronization tab is the tricky/fun one. You can do logs, versioning, tell the system to shutdown or restart when done, maintain a database for tracking moved files ("Detect moved files" checkbox), and all kinds of adjustments to how it behaves when files don't match.
When done, there are, I believe, at least two options for saving the configuration, though I've always just created the XML-based batch script and called that from another scripting language or an icon on the desktop.
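For completeness, if you do want to stay with the PowerShell + robocopy route from the question, the pre-filtering idea could look roughly like this. The folder layout, the "run" subfolder and the one-month window are taken from the question; treat it as a sketch rather than a drop-in solution:

# Sketch: only hand robocopy the parent folders that have been touched recently,
# and only copy CSV files out of each folder's "run" subfolder.
$mainFolder = '\\server\mainfolder'        # placeholder, as in the question
$destRoot   = '\\server\new_main_folder'   # placeholder, as in the question
$cutoff     = (Get-Date).AddDays(-30)      # only look at roughly a month of folders

Get-ChildItem -Path $mainFolder -Directory |
    Where-Object { $_.LastWriteTime -ge $cutoff } |      # skip folders untouched for a month
    ForEach-Object {
        $runFolder = Join-Path $_.FullName 'run'
        if (Test-Path $runFolder) {
            $dest = Join-Path $destRoot (Join-Path $_.Name 'run')
            # robocopy still applies the 7-day window, but now only inside these folders.
            robocopy $runFolder $dest *.csv /S /maxage:7 /r:0 /w:0
        }
    }

One caveat: a folder's LastWriteTime only changes when its direct contents change, so depending on how files land in the "run" subfolders you may want to test the "run" folder's timestamp instead.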

How to create a script that uses a path list as a reference for copying files, in PowerShell or a .bat script

I'm looking for a way to automate archiving where, after I plug in my two external drives, I can copy all my resources. The problem is that I have different file structures on my laptop and on both external drives, so I need to select specific folders to be copied. That means I can't select one root folder and copy it straight over. I tried to find a way to declare more than one path in the cp command and in the copy command, without success. An example path:
/my_programming_stuff
/folder1
/folder2
/folder3
/folder4
I want to select only the first 3 folders and copy them to external drive 1 and external drive 2. The idea is to create a .bat file that will copy everything at once (in the best-case scenario it will be copied simultaneously to both external drives, so it will be much faster). Another problem is that there needs to be a way to bypass the NTFS long-path limitation (max. 260 characters).
Flags that I want to use:
Copy the files and directories and all of their attributes, including ownerships and permissions.
Recursively copy directories and their contents.
When copying files from one directory to another, only copy files that either don't exist in the destination directory or are newer than the existing corresponding files.
Data verification (so it's certain that the copy was verified).
Progress bar with time ETA.
Until now I have been using Total Commander to do this, but every day I need to pick only a few folders to be copied, which takes time and is inefficient.
I have experience with Bash and PowerShell but I am not sure how to handle this topic.
Create a static batch file with robocopy commands. I think /copyall is the only switch you need to specify for all this. Other defaults should satisfy requirements.
https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/robocopy
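As a sketch of what that could look like, here is a small script that mirrors three folders to two drives, with each drive handled as a background job so both copies run at the same time. The folder names and drive letters are placeholders taken from the question; the same robocopy lines would work in a plain .bat file too:

# Sketch: mirror three source folders to two external drives with robocopy.
# /E       - recurse into subdirectories, including empty ones
# /COPYALL - copy data, attributes, timestamps, security, owner and auditing info (run elevated)
# /XO      - skip files whose source copy is older than the destination copy
# /ETA     - show an estimated time of arrival per file
$folders = 'folder1', 'folder2', 'folder3'      # the three folders to back up
$source  = 'C:\my_programming_stuff'            # placeholder source root
$drives  = 'E:\backup', 'F:\backup'             # placeholder external drive paths

$jobs = foreach ($drive in $drives) {
    Start-Job -ArgumentList $source, $drive, $folders -ScriptBlock {
        param($src, $dst, $dirs)
        foreach ($dir in $dirs) {
            robocopy (Join-Path $src $dir) (Join-Path $dst $dir) /E /COPYALL /XO /ETA /R:1 /W:1
        }
    }
}
$jobs | Wait-Job | Receive-Job    # wait for both drives to finish and show the robocopy output

One gap: robocopy has no built-in post-copy verification pass, so if the verification requirement is strict you would still need a separate hash comparison (or one of the tools mentioned in the next answer).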
I think your time will be better spent learning how to use either FastCopy or FreeFileSync. I used FreeFileSync some years ago but got disgusted with the constantly changing format of the XML file it uses for starting a backup, so I switched to FastCopy. But it looks like FreeFileSync may be getting its act together, and I aim to do some experiments over the summer to see if I want to switch back to it.
Both can handle the long-filename issues, both can be executed from a batch file, and both seem to have a lot of quality, but FreeFileSync has more features (and is more bloated because of them). Speed-wise, I think FastCopy is probably one of the better products out there and very streamlined in use and design.

Tool/Script to Automate File Naming From Metadata

I have a folder containing 1000+ PDF files with generic names (e.g. DOC (1)) that I need to rename with the order number, which is listed inside the file itself. So far I've had to manually open each document, look for the order number, then change the file name.
I have a spreadsheet list of all of the order numbers contained in that folder. I've used the search function of File Explorer to look up each individual order number, then renamed whichever file comes up. This has reduced the task time substantially, but it's still very time-consuming.
Does anyone know of a macro or script that I can build to automate the task? Again, I would need something that grabs the order number line by line from the spreadsheet, then searches for any files containing that specific term, and finally renames whichever file comes up with said term.
Any and all relevant help is greatly appreciated.
Thank you in advance for your time!
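One hedged way to script this in PowerShell is sketched below. It assumes the spreadsheet has been exported to a CSV with a column named OrderNumber (a hypothetical name; use whatever your column is actually called), and that a command-line text extractor such as pdftotext (from the Poppler utilities) is on the PATH, since PowerShell cannot read PDF text on its own. If the PDFs are scanned images rather than searchable text, this will not find anything.

# Sketch: rename generically-named PDFs using order numbers pulled from a CSV.
# Assumes: orders.csv has an OrderNumber column (hypothetical name),
#          pdftotext.exe is on the PATH, and the PDFs contain searchable text.
$pdfFolder = 'C:\pdfs'                      # placeholder folder with DOC (1).pdf etc.
$orders    = Import-Csv 'C:\orders.csv'     # placeholder spreadsheet export

foreach ($pdf in Get-ChildItem -Path $pdfFolder -Filter *.pdf) {
    # Extract the PDF's text to stdout and keep it as one string.
    $text = & pdftotext $pdf.FullName - | Out-String

    # Find the first order number from the spreadsheet that appears in this PDF.
    $match = $orders | Where-Object { $text -match [regex]::Escape($_.OrderNumber) } |
             Select-Object -First 1

    if ($match) {
        $newName = "$($match.OrderNumber).pdf"
        if (-not (Test-Path (Join-Path $pdfFolder $newName))) {
            Rename-Item -Path $pdf.FullName -NewName $newName
        }
        else {
            Write-Warning "Skipping $($pdf.Name): $newName already exists"
        }
    }
    else {
        Write-Warning "No known order number found in $($pdf.Name)"
    }
}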

Use Powershell to copy Long Filename files

Unfortunately for me, we have a legacy file system which has been around since the dinosaurs roamed. Over time, files have been moved/copied to locations which have generated a folder + file structure that is too long to simply copy from one location to another (long file names).
What I see as a solution is to use powershell to map a drive to each file location, copy it then recurse for each.
e.g.
net use w: \\newlocation\
net use x: \\silly long file path\folder1\
xcopy x:\file.txt w:\folder1
then use a foreach to copy the same structure as it was originally.
I know it seems long-winded, but I cannot think of a better way forward. I've tried robocopy, which is supposed to avoid this problem, but after much testing... nope, it still fails to copy.
Can you please suggest a good method for doing this?
Not sure if you are still looking for an answer, but you can try the program called Beyond Compare, which should work for you.
Honestly, I didn't spend too much time with it as it has a bunch of options, but I know people who successfully use it every day for files with long file names.
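If you'd rather stick with the drive-mapping idea from the question, here is a rough PowerShell sketch of it using subst to give each deep local folder a short alias (the Z: letter and the paths are placeholders; for UNC locations you would use net use against a share instead, as in the question). PowerShell 7+ also handles long paths far better than Windows PowerShell 5.1, so upgrading may be the simpler fix.

# Sketch: give each deep source folder a short alias, copy its files, release the alias.
# $source/$destination and the Z: letter are placeholders.
$source      = 'C:\silly long file path'
$destination = 'D:\newlocation'

Get-ChildItem -Path $source -Recurse -Directory | ForEach-Object {
    subst Z: $_.FullName          # short alias so the effective paths stay under the limit
    try {
        # Recreate the same relative folder structure on the destination side.
        $relative = $_.FullName.Substring($source.Length).TrimStart('\')
        $target   = Join-Path $destination $relative
        New-Item -ItemType Directory -Path $target -Force | Out-Null

        # Copy just the files at this level; subfolders get their own pass through the loop.
        Get-ChildItem -Path 'Z:\' -File | Copy-Item -Destination $target
    }
    finally {
        subst Z: /D               # always release the mapping before the next folder
    }
}

Note that this only walks subfolders (files sitting directly in the source root would need one extra pass), and in Windows PowerShell 5.1 the initial recursive listing can itself trip over the 260-character limit, which is another reason to try PowerShell 7 first.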

How do you compare the content of two archive files programmatically?

I'm doing some testing to ensure that the all-in-one zip file I created using a script produces the same output as the contents of a few zip files that I must manually click and create via a web interface. Because of that, the zips will have different folder structures.
Of course I could manually extract them and use my powerful eyeball technique to scan them, or, being lazier, write a script to do that, but before I invest more time and get accused by my boss of company time robbery, I'm asking if there's a better way to do this.
I'm using a Perl LAMP stack, by the way.
Thanks.
You can use Perl's Archive::Zip or Python's zipfile to extract the filenames, sizes and CRC checksums of the files in the archives. Create a file which contains the results sorted by file name (ignore the path).
For your smaller ZIPs, merge the results of the script (cat list1 list2 list3 | sort).
Now, you can use diff to compare the results.
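For what it's worth, the same "compare the listings" idea can be done from PowerShell with the .NET zip classes; they expose entry names and sizes but not the CRC, so for real content certainty you would still fall back to extracting and hashing (as in the later answers). A sketch, with the zip paths as placeholders:

# Sketch: compare two sets of zip archives by entry name and uncompressed size, ignoring paths.
Add-Type -AssemblyName System.IO.Compression.FileSystem

function Get-ZipListing($zipPath) {
    $zip = [System.IO.Compression.ZipFile]::OpenRead($zipPath)
    try {
        # Keep only the file name and size, so different folder structures still line up.
        $zip.Entries | Where-Object { $_.Name } |       # skip directory entries
            Select-Object Name, Length
    }
    finally {
        $zip.Dispose()
    }
}

$a = Get-ZipListing 'C:\A.zip'                                          # placeholder: script-built archive
$b = 'C:\B1.zip', 'C:\B2.zip' | ForEach-Object { Get-ZipListing $_ }    # placeholder: the smaller zips

# Anything reported here exists (or has that size) on only one side.
Compare-Object -ReferenceObject $a -DifferenceObject $b -Property Name, Length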
I can wholeheartedly recommend Beyond Compare. Unless you're really getting underpaid, it's the biggest bang for your (boss's) buck.
[Edit] I seem to have skimmed over the different folder structure, sorry about that. Beyond Compare can compare all files in folders with the same folder structure. It does not have (I believe) the intelligence to go searching for matches in files in different folders.
Regards,
Lieven
Create a CRC checksum for your files.
If the checksum is the same for the original files and the unzipped files, you can be sure the files are the same. It even works for non-text data.
A checksum can easily be created with an external program such as "SFV Checker" or programmatically (.NET/Java, for example, include libraries to do this).
Taking a cue from Carra's answer... if A.zip is your single big archive and B.zip is the archive generated through the web, then use the following algorithm:
Extract all files from A.zip and recursively (w.r.t. folders) compute the checksum of the files in the folder where the contents were extracted (using cksum, md5sum, etc.), and save this information, after sorting it (pipe it through sort), to a file (say A.txt).
Do the same for B.zip and generate B.txt
Compare A.txt with B.txt they should be exactly the same.
OR
Use unzip -l to get file/directory lists for both the (zip) archives, then flatten the hierarchy of the user-generated zip file and compare it with the contents of your script-generated zip file using something like diff. By flattening the hierarchy I mean you may need to do some kind of pre-processing on one or both lists before you can do a meaningful comparison with diff.
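On a LAMP box the first recipe is essentially unzip + md5sum/cksum + sort + diff. Since most of this page is PowerShell, here is the same algorithm as a PowerShell sketch (archive names and paths are placeholders), in case that's the environment you end up scripting in:

# Sketch: extract both archives, hash every file, and diff the sorted hash lists.
# A.zip is the script-built archive, B.zip the web-built one (placeholders).
Expand-Archive -Path 'C:\A.zip' -DestinationPath 'C:\cmp\A' -Force
Expand-Archive -Path 'C:\B.zip' -DestinationPath 'C:\cmp\B' -Force

function Get-HashList($root) {
    Get-ChildItem -Path $root -Recurse -File |
        ForEach-Object {
            # Record name + hash only, so differing folder structures don't matter.
            [pscustomobject]@{
                Name = $_.Name
                Hash = (Get-FileHash $_.FullName -Algorithm MD5).Hash
            }
        } |
        Sort-Object Name
}

# Anything listed here differs in content or exists on one side only.
Compare-Object (Get-HashList 'C:\cmp\A') (Get-HashList 'C:\cmp\B') -Property Name, Hash

Like the recipes above, this ignores paths and keys on file names, so two different files that happen to share a name in different folders would need extra handling.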