How to detect changing directory size in Perl

I am trying to find a way of monitoring directories in Perl, in particular the size of a directory, and upon detecting a change in directory size, perform a particular action.
The issue I have is with large files that require a noticeable amount of time to copy into this directory, i.e. > 100MB. What happens (in Windows, not Unix) is that the system reserves enough disk space for the entire file, even though the copy is still in progress. This causes problems for me, because my script will try to perform an action on a file that has not finished copying over. I can easily detect directory size changes in Unix via 'du', but 'du' in Windows does not behave the same way.
Are there any accurate methods of detecting directory size changes in Perl?
Edit: Some points to clarify:
- My Perl script is only monitoring a particular directory, and upon detecting a new file or a new directory, it performs an action on that new file or directory. It is not copying any files; users on the network will be copying files into the directory I am monitoring.
- The problem occurs when a new file or directory appears (copied, not moved) that is significantly large (> 100MB, but usually a couple GB), and my program fires before the copy completes.
- In Unix I can easily 'du' to see that the file/directory in question is growing in size, and take the appropriate action
- In Windows the size is static, so I cannot detect this change
- opendir/readdir/closedir is not feasible, as some of the directories that appear may contain thousands of files, and I want to avoid the overhead of reading and stat-ing every entry on each poll.
Ideally I would like my program to be triggered on change, but I am not sure how to do this. As of right now it busy-waits until it detects a change. The change in file/directory size is not in my control.

You seem to be working around the underlying issue rather than addressing it -- your program is not properly sending a notification when it is finished copying a file. Why not do that instead of using OS-specific mechanisms to try to indirectly determine when the operation is complete?

You can use Linux::Inotify2 or Win32::ChangeNotify to detect directory/file changes.
EDIT: File::ChangeNotify seems a better option (cross-platform & used by Catalyst)
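A minimal sketch using File::ChangeNotify, assuming the module is installed; the watched directory is a placeholder:
use strict;
use warnings;
use File::ChangeNotify;

# 'C:/incoming' is a placeholder for the directory being monitored.
my $watcher = File::ChangeNotify->instantiate_watcher(
    directories => ['C:/incoming'],
);

# wait_for_events blocks until something changes, so there is no busy-waiting.
while ( my @events = $watcher->wait_for_events ) {
    for my $event (@events) {
        printf "%s: %s\n", $event->type, $event->path;
        # act on newly created or modified files here
    }
}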

As I understand it, you are polling a directory with thousands of files. When you see a new file, there is an action that is taken on the file. This causes problems if the file is in use or still being copied, correct?
There are potentially several solutions:
1) Use flock to detect if the file is still in use by another process (test whether it works properly on your OS, file system, and Perl version); a sketch follows this list.
2) Use a LockFile call on Windows. If it fails, the OS or another process is using that file.
3) Change the poll interval to a non-busy time on the server and take the directory offline while your process completes.
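For point 1, a rough sketch of the flock check (whether this actually detects an in-progress copy depends on the OS, the filesystem, and the program doing the copying):
use strict;
use warnings;
use Fcntl qw(:flock);

# Returns true if we can grab an exclusive, non-blocking lock,
# i.e. nothing else appears to be holding the file.
sub file_is_free {
    my ($path) = @_;
    open my $fh, '<', $path or return 0;   # a sharing violation also means "busy"
    my $got_lock = flock( $fh, LOCK_EX | LOCK_NB );
    flock( $fh, LOCK_UN ) if $got_lock;
    close $fh;
    return $got_lock;
}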

Evaluating the size of a directory is something all but the most inexperienced Perl programmers should be able to do. You can write your own portable version of du in 15 lines of code (a sketch follows this list) if you know about:
Either glob or opendir / readdir / closedir to iterate through the files in a directory
The filetest operators (-f file, -d file, etc.) to distinguish between regular files and directory names
The stat function or file size operator -s file to obtain the size of a file
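For example, a rough sketch along those lines, using only opendir/readdir, the -f/-d filetests, and -s:
use strict;
use warnings;

# Recursively sum the sizes (in bytes) of all regular files under a directory.
sub dir_size {
    my ($dir) = @_;
    my $total = 0;
    opendir my $dh, $dir or return 0;
    for my $entry ( readdir $dh ) {
        next if $entry eq '.' or $entry eq '..';
        my $path = "$dir/$entry";
        if    ( -f $path ) { $total += -s $path }
        elsif ( -d $path ) { $total += dir_size($path) }
    }
    closedir $dh;
    return $total;
}

my $dir = shift @ARGV || '.';
print dir_size($dir), "\n";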

There is a nice module called File::Monitor; it will detect new files, deleted files, changes in size, and any other attribute that can be checked with stat. It will then report the changed files to you.
http://metacpan.org/pod/File::Monitor
You set up a baseline scan, then register a callback for each kind of change you are looking for, so you can see new changes via:
$monitor->watch( {
    name     => 'somedir',
    recurse  => 1,
    callback => {
        files_created => sub {
            my ($name, $event, $change) = @_;
            # Do stuff
        },
    },
} );
If you need to go deeper than one level, just do it to whatever level you need. Once this is set up and it finds new files, you can trigger your application to do what you want with them.
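To put the pieces together, a hedged sketch of the surrounding loop (the 10-second interval is an arbitrary choice): the first scan only records the baseline, and each later scan fires the callbacks registered in the watch.
use strict;
use warnings;
use File::Monitor;

my $monitor = File::Monitor->new;
$monitor->watch( {
    name     => 'somedir',     # the directory being monitored, as above
    recurse  => 1,
    callback => {
        files_created => sub {
            my ($name, $event, $change) = @_;
            print "something new under $name\n";
        },
    },
} );

$monitor->scan;                # baseline scan; no callbacks fire here
while (1) {
    sleep 10;                  # arbitrary polling interval
    $monitor->scan;            # later scans fire the callbacks on change
}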

Related

How to create a script that uses a path list as a reference for copying files in PowerShell or a .bat script

I'm looking for a way to automate archiving so that, after I plug in my two external drives, I can copy all my resources. The problem is that I have different file structures on my laptop and on both external drives, so I need to select specific folders to be copied. That means I can't just select one root folder and copy it straight across. I tried to find a way to declare more than one path in the cp command and in the copy command, without success. An example path:
/my_programming_stuff
/folder1
/folder2
/folder3
/folder4
I want to select only the first 3 folders and copy them onto external drive 1 and external drive 2. The idea is to create a .bat file that will copy everything at once (in the best-case scenario it will be copied to both external drives simultaneously, so it will be much faster). Another problem is that there needs to be a way to bypass the NTFS long path limitation (max. 260 characters).
Flags that I want to use:
- Copy the files and directories and all of their attributes, including ownerships and permissions.
- Recursively copy directories and their contents.
- When copying files from one directory to another, only copy files that either don't exist or are newer than the existing corresponding files in the destination directory.
- Data verification (so it's certain that the copy was verified).
- Progress bar with ETA.
Until now I have been using Total Commander to do this, but every day I need to pick only a few folders to be copied, which takes time and is inefficient.
I have experience with Bash and PowerShell but I am not sure how to handle this topic.
Create a static batch file with robocopy commands. I think /copyall is the only switch you need to specify for all this. Other defaults should satisfy requirements.
https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/robocopy
I think your time will be better spent learning how to use either FastCopy or FreeFileSync. I used FreeFileSync some years ago but got disgusted with the constantly changing format of the XML file it uses for starting a backup, so I switched to FastCopy. But it looks like FreeFileSync may be getting its act together, and I aim to do some experiments over the summer to see if I want to switch back to it.
Both can handle the long filename issues, both can be executed from a batch file, and both seem to be of high quality, but FreeFileSync has more features and is more bloated because of them. Speed-wise, I think FastCopy is probably one of the better products out there, and it is very streamlined in use and design.

Force overwrite or delete file in use (executable that currently runs)

I'm looking for a solution to delete or (preferably) directly overwrite the source of an exe file while it is running.
To explain further before you get it all wrong, I'll give an example:
I have an exe file on drive D:\ which I run (per a previously posted question's answer, setting the "Start in" folder to C:\Program Files\MyProgram\ so it finds its DLLs).
Now, while the file is running, I'd like to rewrite its byte stream (just like opening it in a hex editor...), or at least delete it so I can copy a new exe file over directly, using the same name.
So far the solution I'm using is to trigger a format D: command for the whole drive D:\ (which, in my case, is a ramdisk or thumb drive; I only have this exe on it and copy it there as necessary), since that removes the file and lets me copy the new file there.
Trying to use del myProgram.exe, even with the -force flag, triggers an error that access to the file is denied. The same happens if I try to overwrite the contents of the file.
Is there any alternative that does not use the format command, as that requires dedicating a partition solely for this purpose?
Update: Note: MoveFileEx and similar techniques that require terminating the process or restarting/rebooting the system do not qualify as a solution. This should be done while the process is running, without further actions that could compromise the process's run state.
On a side note, when formatting the drive using PowerShell's format command, the file is gone, although when viewing the partition with a hex viewer tool the full binary (hex) content of the exe is still visible and can be restored with something as simple as a copy-paste. This is one of the reasons why overwriting the file contents would be preferable to deleting the file directly.
Please note: this is a knowledge- and skills-based question, so I would appreciate sparing the moral and security-related comments about such actions and behaviour.
For deleting/replacing/overwriting a file at least two conditions must be met:
The user performing the operation must have the required permissions to do so. This can be verified for instance via Get-Acl or icacls.
Windows must not have an open handle to the file. This can be checked for instance with tools like Process Explorer or handle. These tools can also be used to forcibly close open handles, although that's not recommended as it may cause data loss and/or damage to the files in question. I'm not sure, though, if it's actually possible to close handles to an executable without terminating the process.
Note that antivirus software is likely to interfere with this kind of operation.
The basic problem here is that Windows loads from the .EXE on demand; it's not all read in at once.
If you destroy the original file what happens when it tries to load in a page that no longer exists?
If I had to write something of this sort I would copy the .exe to a temporary location (beware that running code from the temp directory may be prohibited), run the new .exe, terminate the old one and then do what I want to it.

Perl: Is it better to clobber a file or remove it and open a new one?

For example,
#!/usr/bin/perl
open FILE1, '>out/existing_file1.txt';
open FILE2, '>out/existing_file2.txt';
open FILE3, '>out/existing_file3.txt';
versus
#!/usr/bin/perl
if (-d 'out') {
system('rm -f out/*');
}
open FILE1, '>out/new_file1.txt';
open FILE2, '>out/new_file2.txt';
open FILE3, '>out/new_file3.txt';
In the first example, we clobber the files (truncate them to zero length). In the second, we clean the directory and then create new files.
The second method (where we clean the directory) seems redundant and unnecessary. The only advantage to doing this (in my mind) is that it resets permissions, as well as the change date.
Which is considered the best practice? (I suspect the question is pedantic, and the first example is more common.)
Edit: The reason I ask is because I have a script that will parse data and write output files to a directory - each time with the same filename/path. This script will be run many times, and I'm curious whether at the start of the script I should partially clean the directory (of the files I am writing to) or just let the file handle '>' clobber the files for me, and take no extra measures myself.
Other than the permissions issue you mentioned, the only significant difference between the two methods is if another process has one of the output files open while you do this. If you remove the file and then recreate it, the other process will continue to see the data in the original file. If you clobber the file, the other process will see the file contents change immediately (although if it's using buffered I/O, it may not notice it until it needs to refill the buffer).
Removing the files will also update the modification time of the containing directory.
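For illustration, a small sketch contrasting the two approaches with three-argument open (the Unix inode semantics described above are assumed for the unlink case):
use strict;
use warnings;

# Clobber in place: '>' truncates the existing file, keeping the same inode,
# so another process that already has it open sees the contents change.
open my $fh1, '>', 'out/existing_file1.txt' or die "open: $!";

# Remove and recreate: a process that still has the old file open keeps
# reading the original data; new readers see the fresh (empty) file.
unlink 'out/existing_file2.txt' or warn "unlink: $!";
open my $fh2, '>', 'out/existing_file2.txt' or die "open: $!";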

Copying constantly changing directory

I am trying to copy files from a directory that is in constant use by a security cam program. To archive these .jpg files to another HD, I first need to copy them. The problem is that the directory is being filled as the copying proceeds, at a rate of about 10 .jpgs per second. I have the option of stopping the program, doing the copy, then starting it again, which is not what I want to do for many reasons. Or I could use the find/mtime approach. I have tried the following:
find /var/cache/zm/events/* -mmin +5 -exec cp -r {} /media/events_cache/ \;
Under normal circumstances this would work. But it seems the directories are also changing their timestamps and branching off in different directions, so it never comes out logically, and for some reason each directory is very deep, like /var/cache/zm/events/../../../../../../../001.jpg x 3000. All I want to do is copy the files and directories via cron with a simple command line, if possible. With the directories constantly changing, is there a way to make this copy without stopping the program?
Any insight would be appreciated.
rsync should be a better option in this case, but you will need to try it out. Try setting it up at off-peak hours when the traffic is not that high.
Another option would be setting up the directory on a volume which uses, say, mirroring or RAID 5; this way you do not have to worry about losing data (if that indeed is your concern).

Automatically running a script to read particular information from a .txt file ? (Perl Script, or suggest)

My scenario: text files will keep arriving in a folder. I need to detect each new text file and read particular information from it, the format being something like (word : info, or a word with a column of info under it, etc.). And this process needs to keep running indefinitely.
Problem: How should I go about doing this? I guess I should use a Perl script, but where do I go from there? I am getting ideas, and also help on the internet, but I thought asking here might make my thoughts clearer.
Kindly help; please suggest a path to do this.
Regards,
Chirayu
First thing: you want a daemon process, so you may want to have a look at Proc::Daemon.
Second thing: you need to read and parse your file. Parsing depends on its format, and your question is not really clear about that.
Finally, you may want to consider moving (or renaming) a newly detected file while processing it, and then (possibly) deleting it after it has been processed. This depends on your requirements. Alternatively, you may want to move newly detected files into an archive directory after having processed them.
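A rough sketch combining those points, assuming the Proc::Daemon CPAN module is installed; the paths, the .txt filter and the polling interval are placeholders, and the actual parsing is left as a stub:
use strict;
use warnings;
use Proc::Daemon;
use File::Copy qw(move);

Proc::Daemon::Init();                      # detach and run in the background

my $incoming = '/some/incoming/dir';       # placeholder paths
my $archive  = '/some/archive/dir';

while (1) {
    opendir my $dh, $incoming or die "opendir $incoming: $!";
    my @files = grep { /\.txt$/ && -f "$incoming/$_" } readdir $dh;
    closedir $dh;

    for my $file (@files) {
        # ... parse "$incoming/$file" here ...
        move( "$incoming/$file", "$archive/$file" )
            or warn "could not archive $file: $!";
    }
    sleep 5;                               # arbitrary polling interval
}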
One approach might be to have a Perl process that regularly (say every 5 seconds, every 5 minutes or every 5 hours, your call really) scans said directory and, as soon as any new text file appears, spawns a child process that processes it.
The child process might be another Perl script which gets the name of the text file as its argument and which reads the file, detects the word you mention and then extracts the information you are interested in (and then does whatever you consider necessary with that information).
One thing to look out for is what to do with the text files once they are processed. Are they supposed to stay around? Then you need to keep track of which of them you have processed, so they do not get processed again in case your master process (the one that scans the directory and spawns Perl children) has to be restarted (due to either a crash or a deliberate restart).
If the text files are supposed to disappear once they are processed, then I assume it could be a good idea either to let the children remove them after completion or to let the master process remove them, provided the master process always waits for the children to complete before it continues running. The drawback with a master process waiting for children to complete is that the children then cannot run in parallel but have to run in strict sequence (not necessarily a drawback, depending on your situation).
(If you have a master process that always waits for the child process to finish, you can actually skip having child processes altogether and create a subroutine in the master program which reads and processes the text file.)
High level description but hope it helps.
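A hedged sketch of that master/child structure; process_file.pl is a hypothetical child script that takes the filename as its argument, and the directory, filter and interval are arbitrary:
use strict;
use warnings;

$SIG{CHLD} = 'IGNORE';                     # reap children automatically

my $dir = '/path/to/watched/dir';          # placeholder
my %seen;                                  # files already handed to a child (in-memory only)

while (1) {
    opendir my $dh, $dir or die "opendir $dir: $!";
    my @new = grep { /\.txt$/ && !$seen{$_}++ } readdir $dh;
    closedir $dh;

    for my $file (@new) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ( $pid == 0 ) {                 # child: hand the file to the worker script
            exec( 'perl', 'process_file.pl', "$dir/$file" )
                or die "exec failed: $!";
        }
    }
    sleep 300;                             # "every 5 minutes"; adjust to taste
}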
What is the operating system you are using?
On Windows, you can use Win32::ChangeNotify and on Linux, you can use Linux::Inotify2 to be notified of changes to the contents of a directory.
Your script can simply block until it is notified and then take action, instead of polling the contents of the directory, which either wastes resources or risks missing some changes.
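For instance, a minimal Linux::Inotify2 sketch (Linux only; the watched path is a placeholder) that blocks until new entries appear:
use strict;
use warnings;
use Linux::Inotify2;

my $inotify = Linux::Inotify2->new
    or die "unable to create inotify object: $!";

# '/var/incoming' is a placeholder for the watched directory.
$inotify->watch( '/var/incoming', IN_CREATE | IN_MOVED_TO, sub {
    my $event = shift;
    print "new entry: ", $event->fullname, "\n";
    # act on the new file or directory here
} );

# poll() blocks until events arrive and invokes the callback for each one.
1 while $inotify->poll;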