We use Robocopy as part of our backup concept.
Now when the destination computer has crashed, we will restore files from our backup/source computer using the backup one hour (which is the related backup interval) ago. However the destination computer might receive files this way, which had been deleted in the meantime by users on the destination server (which acts very dynamic) on purpose.
This is something, we would like avoid. Is this possible with Robocopy? Per my understanding it is not...
If I understand the situation correctly it is something like this:
- One hour ago: your backup is created
- 30 mins ago: User deletes a file
- 10 mins ago: server disk crashes
- Now you need to restore from backup
The server hard disk is now gone. The only information you now have is on the backup from one hour ago. The backup was done before the user deleted the file. So you have no way of knowing that the user deleted the file.
It seems to me that robocopy will not help in this situation, as there is no way to know what files were deleted after the backup was taken.
Does this make sense?
Related
recently I've got some problems with my old hard drive and windows so I switched over to a new one. I can still access any files on the old hard drive and I still have an installation of mongodb on there with some pretty important data (should have done a backup sometime).
What would be the smartest way to get this data and transfer it to my new instance? Just copying the data files over? This "old" instance is btw not running and its not possible for me to start it again.
You could get the "old" instance running again by running mongo on the system and pointing the DBPath to the folder on the old hard drive.
It looks like copying the files over is a valid option though.
See https://docs.mongodb.com/manual/core/backups/#back-up-with-cp-or-rsync
We've been using gsutil -m rsync -r to keep dev and deploy boxes in sync with a GCS bucket for nearly 2 years without any problem. There are about 85k objects in the bucket.
Until recently, this worked perfectly: we'd run a deploy-box -> GCS rsync every 15 mins or so, to keep all new uploaded resource backed up, and then a GCS -> dev box rsync whenever we wanted to refresh the local dev data (running on OSX El Capitan).
Within the last couple of months, though, the GCS->dev rsync has started to bloat, downloading more and more images.
Initially I just thought "great, we're getting more resources uploaded", but it's been growing way faster than the data, until today when it seems to be downloading the whole 85k images.
I've double-checked I'm in the right place, the command is correct, the paths are correct, etc. For all that the gsutil output is scrolling by with reams and reams of "Copying..." and "Downloading..." messages, making good parallel use of our 100mbps connection, when I go to another terminal and run find . -type f | wc -l on the destination directory every 10 seconds, it shows that barely 2 or 3 new files are being added a minute. I look at modification times on files that gsutil says it's downloading right now and in the large majority they're old, plenty haven't changed in a year or more. Meaning: it's downloading all the data, using tons of time and bandwidth, all for the sake of a few hundred files.
Has something changed in recent OSX gsutil versions? Is there possibly a bug? How would I even start to go about tracking this down? Or reporting it? The newsgroups gsutil-discuss and gs-discussion have been archived, and the talk in gce-discussion is all about using gsutil from GCE instances.
Thanks!
I had a similar issue where the same files were synced over and over. I don't have that many files so you might need to check for performance but I decided to use the -c option to force using the checksum instead of mtime which was modified locally in my build process.
I think (and hope) the documentation is slightly wrong stating that
compare checksums for files if the size of source and destination as
well as mtime match
as it seems to use checksum even if mtime does not match
gsutil 4.20 (released 2016-07-20) modified the change detection algorithm for rsync. Instead of comparing only the size of the local file with its cloud counterpart, it now compares both the size and file modification time of local files. The file modification time is stored in the custom user metadata for the file when it is uploaded with rsync. If that doesn't exist the object creation time is used.
We're using GCS for our archive backup and I was curious what people thought was better for the initial update - rsync or cp?
I've gotten hung up twice (once on a non-unicode character and again on what seemed like a long path) and would like to be able to pick up where I left off.
Any advice would be appreciated!
(and if this is a bad question, can someone tell me exactly why its bad or how to fix it? it seems I suck at asking questions here!)
rsync is better suited for doing archives/backups, for the reason you hinted at - if you started uploading data and then encountered a problem partway through, restarting a cp would cause you to re-upload files that were already successfully uploaded, while rsync would only upload files that weren't uploaded (or that changed since the last upload). Moreover, if some of the source files were deleted since you last started uploading, rsync will remove them from the destination bucket, making the destination content match the source content.
I have two drives A and B. Using a python script I am creating some files in "A" drive and I am running a powerscript which copies all the files in the drive A to drive B in the interval of 1 sec.
I am getting this error in my powershell.
2015/03/10 23:55:35 ERROR 32 (0x00000020) Time-Stamping Destination
File \x.x.x.x\share1\source\ Dummy_100.txt The process cannot access
the file because it is being used by another process. Waiting 30
seconds...
How will I overcome this error?
This happened is because the file is locked by running process. To fix this, download Process Explorer. Then use Find>Find Handle or DLL, find out which process locked this file. Use 'taskkill' to kill that process in commandline. You will be fine.
if you want to skip this files you can use /r:n that n is times of tries
for example /w:3 /r:5 will try 5 time every 3 seconds
How will I overcome this error?
If backup is, what you got in mind, and you encounter in-use files frequently, you look into Volume Shadow Copies (VSS), which allow to copy files despite them being ‘in use’. It's not a product, but a windows technology used by various backup tool.
Sadly, it's not built into robocopy, but can be used in conjunction with it. See
➝ https://superuser.com/a/602833/75914
and especially:
➝ https://github.com/candera/shadowspawn
It could be many reasons.
In my case, I was running a CMD script to copy from one server to another, a heap of SQL Server backups and transaction logs. I too had the same problem because it was trying to write into a log file that was supposedly opened by another process. It was not.
I ran many IP checks and Process ID checkers that I ran out of knowing what was hogging the log file. Event viewer said nothing.
I found out it was not even the log file that was being locked. I was able to delete it by logging into the server as a normal user with no admin privileges!
It was the backup files themselves by the SQL Server Agent. Like #Oseack said, there may have been the need to use another tool whilst the backup files themselves were still being used or locked by the SQL Server Agent.
The way I got around it was to force ROBOCOPY to wait.
/W:5
did it.
I've wrote the code that creates full backups of my ESENT database, using JetBeginExternalBackup API.
Following the MSDN guidelines, I backed up every file returned by JetGetAttachInfo and JetGetLogInfo.
I've made the backup, erased old database, and copied the backup data to the database folder.
The DB engine was unable to start, the JetInit error code is "JET_errMissingLogFile".
I've checked the backup, it only contains the database file, and "<inst>XXXXX.log" log files. It lacks the current log file (I'm using circular logging, BTW).
Is there any way to restore such backup?
I don't want to use JetExternalRestore API because it's too complex: I don't need to restore to another location, I don't understand why there're 3 input folders not 2, and I don't know the values to supply in genLow and genHigh arguments.
I do need external backups: the ESENT database is used by ASP.NET on a remote server, and I'm backing it up over the Internet.
Or, maybe there's a way to retrieve the name of the current log file, and I should just add it to the backup?
Thanks in advance!
P.S. I've got no permissions to span processes on my web server, so using eseutil.exe is not an option.
Unpack all backed up files to a single folder.
Take the name of your main database file. Replace extension to .pat. Create zero-length file with that name, e.g. database.pat.
After this simple step, call JetRestoreInstance API, it will restore the backup from that folder.