Copying constantly changing directory - command-line

I am trying to copy files from a directory that is in constant use by a security cam program. To archive these .jpg files to another HD, I first need to copy them. The problem is, the directory is being filled as the copying proceeds at the rate of about 10 .jpgs per second. I have the option of stopping the program, do the copy then start it again which is not what I want to do for many reasons. Or I could do the find/mtime approach. I have tried the following:
find /var/cache/zm/events/* -mmin +5 -exec cp -r {} /media/events_cache/ \;
Which under normal circumstances would work. But it seems the directories are also changing their timestamps and branch off in different directions so it never comes out logically and for some reason each directory is very deep like /var/cache/zm/events/../../../../../../../001.jpg x 3000. All I want to do is copy the files and directories via cron with a simple command line if possible. With the directories constantly changing, is there are way to make this copy without stopping the program?
Any insight would be appreciated.

rsync should be a better option in this case but you will need to try it out. Try setting it up at off peak hours when the traffic is not that high.
Another option would be setting up the directory on a volume which uses say mirroring or RAID 5 ; this way you do not have to worry about losing data (if that indeed is your concern).

Related

using powershell to only read last 1month of folders. copy in 7 days worth of folders then using robocopy

apologies to start as im new to powershell and robocopy.
i have a robocopy command that pulls in any files within its many subfolders that are within a maxage of 7. however, the main folder has a huge amount of folders dating back years(and i only need last 7 days each week it runs) so its slow reading each file in each folder before it even copies using robocopy.
it looks like powershell commands may be a way for me to limit the search of files for my robocopy, would this be possible? currently robocopy search each files in each folder in my main folder, ideally i would want it to be smart enough to only search even a months worth of files and then copy over last 7 days. this would speed up the run time hugely.
if possible even further, i only want csv files in each of the folders in my main folder but current robocopy searches the other folders and its files as well which takes time. all the csv files are in a folder called "run" in each parent folder(parent folder is a unique number within the "mainfolder".
my robocopy command:
robocopy \\server\mainfolder \\server\new_main_folder /S /maxage:7 /r:0 /w:0
I was going to point to you either FastCopy or FreeFileSync, both handle long file name paths and work well for me. But found problems running FastCopy when trying to filter folders the way you described. I wasn't getting the results I expected, so that leaves FreeFileSync. There is a little bit of a learning curve with FreeFileSync, but really, the only problem/complaint I've had with it is the xml based batch script that you can use to automate the program kept changing formats and they haven't been providing a way to read the old xml batch scripts with the new version of the software. Maybe that has changed, I haven't looked into that lately.
Maybe other people have had better experience with RoboCopy, but I found it to take literally many multiples longer to do the same job as many other copy programs. I don't think FreeFileSync is as fast as FastCopy, but I've never seen it act as bad as what I experienced with RoboCopy.
The way FreeFileSync works is:
You define 1 or more source/destination pairs.
There is a global setting at the top to set the defaults for all copy pairs.
There are individual settings per each copy pair that when set override the global settings.
In the filter tab of the settings you can set "Time span:" to "Last x days:" and set it to the 7 days that you want.
You can change include from * to something like \run\*.csv. I didn't try that exact pattern, but the patterns I did try worked as expected (Unlike FastCopy).
The Synchronization tab is the tricky/fun one. You can do logs, versioning, tell the system to shutdown or restart when done, maintain a database for tracking moved files ("Detect moved files" checkbox), and all kinds of adjustments to how it behaves when files don't match.
When done, there is I believe at least 2 options for saving the configuration - though I've always just created the xml based batch script and called that from another scripting language or an icon on the desktop.

how to create a script that allows to use the path list as a reference for copying files in PowerShell in .bat script

I'm looking for a way to automate archiving where after I plug my two external drives I can copy all my resources. The problem is that I have different file structures on my laptop and on both external drives so I need to select specific folders to be copied. It means that I can't select one root folder and copy it straightforward. I tried to find a way to declare more than one path in the cp command and in the copy command, without success. An example path:
/my_programming_stuff
/folder1
/folder2
/folder3
/folder4
I want to select only the first 3 folders to copy them into external drive1 and external drive 2. The idea is to create a .bat file that will copy everything at once ( in the best case scenario it will be copied simultaneously on both external drives, so it will be much faster). Another problem is that there needs to be a bypass the ntfs long path limitations (max. 260 characters).
Flags that I want to use:
Copy the files and directories and all of their attributes,
including ownerships and permissions.
Recursively copy directories and their contents.
When copying files from one directory to another, only
copy files that either doesn't exist or are newer than the
existing corresponding files, in the destination
directory.
data verification (so it's certain that the copy was verified)
progression bar with time eta
Until now I was using Total Commander to do this but every day I need to pick only a few folders to be copied which takes time and is inefficient.
I have experience with Bash and PowerShell but I am not sure how to handle this topic.
Create a static batch file with robocopy commands. I think /copyall is the only switch you need to specify for all this. Other defaults should satisfy requirements.
https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/robocopy
I think your time will be better spent learning how to use either FastCopy or FreeFileSynce. I used FreeFileSync some years ago but got disgusted with the it's constantly changing format of its xml file used for starting a backup, so I switched to FastCopy. But it looks like FreeFileSync may be getting their act together and I aim to do some experiments over the summer to see if I want to switch back to it.
Both can handle the long filename format issues, both can be executed by a batch file, both seem to have a lot of quality, but FreeFileSync has more features - and more bloated because of the features. But speed wise, I think FastCopy is probably one of the better products out there and very streamline in use and design.

ClearCase, a makefile use case

I have an issue with the clearmake command in IBM ClearCase,
I use clearmake command to run my own makefile so i can build my program from the 'C' sourse code.
I want to put a command in make file, like shell cleartool -some-command to ignore all checkouts and all private files.
The disadvantage is that in config spec, i must include the command element * CHECKEDOUT.
But in my use case i want to working with files and the same time i could make a compile/build with the old files, so i could work faster and i shouldn't change views or edit configspecs.
But my contemplation is, if i can ignore the checked outed files with a command, without to lose it.
Could you give me a solution ?
I want to working with files and the same time i could make a compile/build with the old files,
It would be easier to use two different snapshot views loaded on the disk at two different places.
In one (where no checkout has ever been done), you can set all files writables (through Windows, not ClearCase): all the files becomes hijacked, but modifiable, host for compilation/testing purposes.
In the other view, you keep your checked out files and your work in progress (but do not run your clearmake).

Reduce relocatable win32 Perl to as few files and bytes as possible

I'm trying to use a perl program on a Windows HTCondor computing cluster. The way HTCondor on windows works is it copies all dependencies into a temporary directory (used as a chroot of sorts) and then it deletes the directory after the specified outputs are moved to a designated place.
If I take only perl.exe and perl514.dll and make a job like this: perl -e "print qq/hello\n/" and tell the cluster to run it 200 times, then each replication winds up taking about 15 seconds, which is acceptable overhead. That's almost all time spent repeatedly copying the files over the network and then deleting them. echo_hello.bat run 200 times takes more like two seconds per replication.
The problem I have is that when I try to use my full blown perl distribution of 55MB and 2,289 files, a single "hello" rep takes something like four minutes of copying and deleting, which is unacceptable. When I try to do many runs the disks on the machines grind to a halt trying to concurrently handle all the file operations across all the reps, so it doesn't work at all. I don't know how long it might take to eventually finish because I gave up after half an hour and no jobs had finished.
I figured PAR::Packer might fix the issue, but nope. I tried print_hello.exe created like this: pp -o print_hello.exe -e "print qq/hello\n/". It still makes things grind to a halt, apparently by swamping the filesystem. I think a PAR::Packer executable makes a ton of temporary files as it pulls out files it needs from the archive. I think the windows file system totally chokes when there are a bunch of concurrent small file operations.
So how can I go about cutting down the perl I built to something like 6MB and a dozen files? I'm really only using a tiny number of core modules and don't need most of the crap in bin and lib, but I have no idea how to proceed ripping out stuff in a sane way.
Is there an automated way to strip away un-needed files and modules?
I know TCL has a bunch of facilities for packing files into a single uncompressed archive that can then be accessed through a "virtual filesystem" without expanding the file. Is there some way to do this with perl itself sort of like with PAR? The problem is PAR compresses everything and then has to extract to temporary files, rather than directly work through a virtual filesystem layer. (If I understand correctly.)
My usage of perl is actually as a scripting layer. It's embedded in a simulation. So I'm really running my_simulation.exe which depends on per514.dll, but you get the idea. I also cannot realistically do anything to the HTCondor cluster other than use it. So there's no need to think outside the box on what I should be using instead of perl and what I could administratively tweak in Windows and HTCondor, thanks.
You can use Module::ScanDeps to get list of actual dependencies of your perl. It was terrible, that it took significant amount of time, when PAR::Packer unpacked the whole application, so I decided to build the executable by myself.
Here is my ready to use script which gathers perl dependencies into some directory; it might be useful for you to reduce the number of perl-modules, e.g. by manually removing some dependencies after copying.
In theory (I have never tried that), the next your step could be merge all pure-perl dependencies into single file (like deps.pm); although it might be non-trivial due to perl's autoload magic and some other tricks.
You can list the modules that are needed by your program using the very nice ListDependencies module
To my knowledge it isn't downloadable anywhere, but it is simple to copy and paste into your own ListDependencies.pm file
You should read the POD documentation within the module for usage instructions

How to detect changing directory size in Perl

I am trying to find a way of monitoring directories in Perl, in particular the size of a directory, and upon detecting a change in directory size, perform a particular action.
The issue I have is with large files that require a noticeable amount of time to copy into this directory, i.e. > 100MB. What happens (in Windows, not Unix) is the system reserves enough disk space for the entire file, even though the file is still copying in progress. This causes problems for me, because my script will try to perform an action on this file that has not finished copying over. I can easily detect directory size changes in Unix via 'du', but 'du' in Windows does not behave the same way.
Are there any accurate methods of detecting directory size changes in Perl?
Edit: Some points to clarify:
- My Perl script is only monitoring a particular directory, and upon detecting a new file or a new directory, perform an action on this new file or directory. It is not copying any files; users on the network will be copying files into the directory I am monitoring.
- The problem occurs when a new file or directory appears (copied, not moved) that is significantly large (> 100MB, but usually a couple GB) and my program fires before this copy completes
- In Unix I can easily 'du' to see that the file/directory in question is growing in size, and take the appropriate action
- In Windows the size is static, so I cannot detect this change
- opendir/readdir/closedir is not feasible, as some of the directories that appear may contain thousands of files, and I want to avoid the overhead of
Ideally I would like my program to be triggered on change, but I am not sure how to do this. As of right now it busy waits until it detects a change. The change in file/directory size is not in my control.
You seem to be working around the underlying issue rather than addressing it -- your program is not properly sending a notification when it is finished copying a file. Why not do that instead of using OS-specific mechanisms to try to indirectly determine when the operation is complete?
You can use Linux::Inotify2 or Win32::ChangeNotify to detect directory/file changes.
EDIT: File::ChangeNotify seems a better option (cross-platform & used by Catalyst)
As I understand it, you are polling a directory with thousands of files. When you see a new file, there is an action that is taken on the file. This causes problems if the file is in use or still being copied, correct?
There are potentially several solutions:
1) Use flock to detect if the file is still in use by another process (test if it works properly on your OS, file system, and Perl version).
2) Use a LockFile call on Windows. If it fails, the OS or another process is using that file.
3) Change the poll interval to a non busy time on the server and take the directory off line while your process completes.
Evaluating the size of a directory is something all but the most inexperienced Perl programmers should be able to do. You can write your own portable version of du in 15 lines of code if you know about:
Either glob or opendir / readdir / closedir to iterate through the files in a directory
The filetest operators (-f file, -d file, etc.) to distinguish between regular files and directory names
The stat function or file size operator -s file to obtain the size of a file
There is a nice module called File::Monitor, it will detect new files, deleted files, changes in size and any other attribute that can be done with stat. It will then go and out put the files for you.
http://metacpan.org/pod/File::Monitor
You set up a baseline scan, then set up a call back for each item you are looking for, so new changes you can see via
$monitor->watch( {
name => 'somedir',
recurse => 1,
callback => {
files_created => sub {
my ($name, $event, $change) = #_;
# Do stuff
}
}
} );
If you need to go deeper than one level just do it to whatever level you need. After this is done and it finds new files you can trigger you application to do what you want on the files.