MCE chunk size when reading from STDIN - perl

I'm writing a Perl program that processes a large number of log entries. To speed things up I'm using MCE to create a number of worker processes that handle the processing in parallel. So far things are great, but I've found myself trying different chunk sizes in a very unscientific manner. Here's some background before I get to my question.
We're receiving logs from a number of syslog sources and collecting them at a central location. The central syslog server writes these log entries to a single text file. The program in question picks up the logs from this text file, does some munging, and ships them elsewhere. The raw log files are then archived.
To read the log files I'm doing this:
my $tail = 'tail -q -F '.$logdir.$logFile;
open my $tail_fh, "-|", $tail or die "Can't open tail\n";
I'm then using mce_loop_f to iterate over the file handle:
mce_loop_f { my $hr = process($_); MCE->gather($hr); } $tail_fh;
This works well for the most part, but if there is a spike in log activity the program starts to get bogged down. There are a number of ways to make things "go faster", but the one I'm a little unsure of is chunk_size in MCE.
I understand that chunk_size has an "auto" value, but how would that work on a file handle that is a pipe from tail? Would auto be appropriate here?
What factors should I consider when adjusting the chunk_size? Log entries occur at a rate of 1000-2000 events per second depending on time of day (1000 at night, 2000 during the day).
I'm also a neophyte when it comes to MCE, so if mce_loop_f is a bad idea for this use case, please let me know.

Related

Tcl How to write data to a specific line number in the middle of operating

Is there any way or command in Tcl for writing into the middle of a file (data.txt) at a specific line number?
For example, after writing data to a text file and reaching line number 1000, is there any way to go back to line number 20 and add data to that line of the output? (Something like lappend and append for list variables, but with the puts command.)
You can use seek to move the current location in a channel opened on a file (it's not meaningful for pipes, sockets and serial lines); the next read (gets, read) or write (puts) will happen at that point. Except in append mode, when writes always go to the end.
seek $channel $theNewLocation
However, the location to seek to is in bytes; the only locations that it is trivially easy to go to are the start of the file and the end of the file (the latter using end-based indexing). Thus you need to either remember where “line 20” really is from the first time, or go to the start and read forward a few lines.
seek $channel 0
for {set i 0} {$i < 20} {incr i} {gets $channel}
Also be aware that the data afterwards does not shuffle up or down to accommodate what you've written the second time. If you don't write exactly the same length of data as was already there, you end up with partial lines in the file. Truncating the file with chan truncate might help, but might also be totally the wrong thing to do. If you're going to go back and patch a text file, writing an extra long line where you're going to do the patch (e.g., with lots of spaces on the end) can make it much easier. It's a cheesy hack, but simple to do.
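The padded-line trick can be sketched as follows. This is a minimal illustration in Perl rather than Tcl, and it rests on some assumptions: the helper name patch_line is hypothetical, the file holds single-byte characters, and the replacement text never exceeds the length of the line it overwrites.

```perl
use strict;
use warnings;

# Hypothetical helper: overwrite line number $target (1-based) in $path
# with $text, padded with spaces to the length of the existing line.
# Dies if $text is longer than the line it replaces, because writing more
# bytes than were there would run into the following line.
sub patch_line {
    my ($path, $target, $text) = @_;

    open my $fh, '+<', $path or die "Can't open $path: $!";

    # Read forward from the start to find the byte offset of the target line.
    my $offset = 0;
    my $line;
    for my $n (1 .. $target) {
        $offset = tell $fh;
        $line   = <$fh>;
        die "File has fewer than $target lines\n" unless defined $line;
    }

    chomp(my $old = $line);
    die "Replacement is longer than the existing line\n"
        if length($text) > length($old);

    # Seek back and write exactly as many bytes as were there before.
    seek($fh, $offset, 0) or die "Can't seek: $!";
    print {$fh} $text . (' ' x (length($old) - length($text)));
    close $fh or die "Can't close $path: $!";
    return;
}
```

Calling patch_line('data.txt', 20, 'new value') would then replace line 20 without disturbing the lines after it, provided line 20 was originally written with enough trailing spaces.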

Error when using an Abaqus subroutine to read a file with multiple processors (CPUs)

I get an error when my Abaqus subroutine reads a file while running on multiple processors (CPUs). Could you help me deal with it? Thanks a lot.
I want to read variables from a file. When one CPU is used, everything is OK,
but when more than one CPU is used, the result is wrong; it seems that every CPU repeats the same command.
For example, the following is the contents of the file to read from; the file name is data.dat:
*matID ,2,1
131000.000, 8880.000, 8180.000
0.324, 0.324, 0.300
3990.000, 5320.000, 5320.000
1871.000, 59.700, 59.700
1291.000, 215.000, 215.000
90.000, 102.000, 102.000
My subroutine is as follows:
character*12 check1
integer check2,error
OPEN(10,file='data.dat',status='old',iostat=error)
if (error.EQ.0) then
read(10,*,iostat=error) check1,Nm
end if
close(10)
print *,'Nm=',nm,error
print *,'**'
When I use 2 CPUs, the printed results are:
Nm= 2 0
Nm= 8880 0
**
**
Depending on the reason for reading in data from a file, there are a couple of ways to avoid this problem:
If you only need to access the data once:
Read in the data in a subroutine that is always called in serial. UEXTERNALDB is a good example and can be used so that the file open only happens at the beginning of an analysis or the beginning of an increment as needed. You can then carefully store the information in common blocks. Reading from a common block in parallel should work fine, but do not write to them from the parallel subroutines.
Another way to get in a smaller amount of data is to define solution variables in your input file instead.
If you really need to open this file locally within each parallel thread (can't see why but open to correction), you can use GETNUMCPUS and GETRANK to open different copies of the files within each thread. GETRANK returns an integer giving you the rank/id of the process. I would advise against this method though. If your problem is large enough to warrant using parallel, then you should avoid slowing it down with file reads.
For more info see sections 1.1.31 and 2.1.4 of the Abaqus 6.14 docs.

How to read a large (multi-gigabyte) file quickly in Perl

We are currently reading the file line by line, which makes reading the whole file slow.
We need to read the file quickly and then proceed with our commands.
The code I tried, using fork and an array, displays only the first set of lines and does not proceed to the other sets.
Please help.
Reading a large file takes a fair bit of time - disks are slow, after all. Before you start looking at Perl, first try (assuming you're on a unix-type system):
time cat /path/to/your/large/file >/dev/null
The output will tell you how long it takes to just read that file from disk without doing anything to it. Alternately, open the file in your favorite text editor and time how long it takes to load. Once you have that time, compare it to how long your Perl program takes to read the file. Unless the Perl program takes significantly longer, you're not likely to be able to do anything about it because the time is being spent on getting the data from disk rather than on processing it.
Of course, that's assuming that you actually do need to read the entire file. If you can get by with only reading specific parts of it, then you could create an index file and use that to jump directly to the part that's of interest, but you haven't provided enough information for us to tell whether that would apply to your case or not.
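To make the index-file idea concrete, here is a minimal sketch. It assumes the parts of interest can be addressed by line number, which the question does not confirm; the helper names build_index and fetch_line are hypothetical, and the 'Q>' pack format assumes a Perl built with 64-bit integer support.

```perl
use strict;
use warnings;

# Hypothetical helper: record the byte offset at which every line starts.
# Each offset is stored as a fixed-width 8-byte integer (pack 'Q>'), so the
# entry for line N sits at byte 8 * (N - 1) of the index file.
sub build_index {
    my ($data_path, $index_path) = @_;
    open my $data, '<:raw', $data_path  or die "Can't open $data_path: $!";
    open my $idx,  '>:raw', $index_path or die "Can't open $index_path: $!";

    my $offset = 0;
    while (my $line = <$data>) {
        print {$idx} pack('Q>', $offset);
        $offset += length $line;    # bytes, because of the :raw layer
    }
    close $idx or die "Can't close $index_path: $!";
    close $data;
    return;
}

# Hypothetical helper: fetch line $n (1-based) with two seeks instead of
# reading everything before it. Returns undef if there is no such line.
sub fetch_line {
    my ($data_path, $index_path, $n) = @_;
    open my $idx, '<:raw', $index_path or die "Can't open $index_path: $!";
    seek($idx, 8 * ($n - 1), 0) or die "Can't seek in index: $!";
    my $got = read($idx, my $packed, 8);
    return unless defined $got && $got == 8;

    open my $data, '<:raw', $data_path or die "Can't open $data_path: $!";
    seek($data, unpack('Q>', $packed), 0) or die "Can't seek in data: $!";
    return scalar <$data>;
}
```

Building the index still costs one full pass over the file, so this only pays off when the same file is consulted more than once.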
If you need more specific help, please provide a better description of what you mean to accomplish and a small, runnable piece of Perl code which shows how you're currently reading and processing the file so that we can see whether you're doing anything particularly inefficient that can be improved on.

Automatically running a script to read particular information from a .txt file ? (Perl Script, or suggest)

My scenario: Text files will keep arriving in a folder. I need to detect each new text file and read particular information from it (the format being, say, "word : info", or a word with a column of info under it, etc.). This process needs to keep running indefinitely.
Problem: How should I go about doing this? I guess I should use a Perl script, but where do I go from there? I am getting ideas and help from the internet, but I thought asking here might make my thoughts clearer.
Kindly help, please suggest a path to do this.
Regards,
Chirayu
First thing: you want a daemon process, so you may want to have a look at Proc::Daemon.
Second thing, you need to read & parse your file. Parsing it, depends on its format, and your question is not really clear about it.
Finally, you may want to consider moving a newly detected file (or renaming it) while processing it, and then (possibly) deleting it after it has been processed. This depends on the requirements that you have. Alternatively, you may want to move newly detected files into an archive directory after they have been processed.
One approach might be to have a Perl process that regularly (say every 5 seconds, every 5 minutes or every 5 hours, your call really) scans said directory and, as soon as any new text file appears, spawns a child process that processes it.
The child process might be another Perl script which gets the name of the text file as its argument and which reads the file, detects the word you mention and then extracts the information you are interested in (and then does whatever you consider necessary with that information).
One thing to look out for is what to do with the text files once they are processed. Are they supposed to stay around? Then you need to keep track of which of them you have processed, so they do not get processed again in case your master process (the one that scans the directory and spawns Perl children) has to be restarted (due to either a crash or a deliberate restart).
If the text files are supposed to disappear once they are processed, then it could be a good idea either to let the children remove them after completion or to let the master process remove them, provided the master process always waits for the children to complete before it continues running. The drawback with a master process waiting for children to complete is that the children then cannot be run in parallel but have to be run in strict sequence (not necessarily a drawback, depending on your situation).
(If you have a master process always waiting for the child process to run, you can actually skip having child processes altogether and create a subroutine in the master program which reads and processes the text file).
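The scan-and-remember part of this approach can be sketched as below. It assumes the files of interest end in .txt and keeps its record of processed files in memory only; the names new_text_files and process_file, and the path in the commented loop, are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the .txt files in $dir that are not yet
# recorded in %$seen, and record them so each is reported only once.
sub new_text_files {
    my ($dir, $seen) = @_;
    opendir my $dh, $dir or die "Can't open directory $dir: $!";
    my @names = readdir $dh;
    closedir $dh;

    my @new = grep { /\.txt\z/ && -f "$dir/$_" && !$seen->{$_}++ } sort @names;
    return map { "$dir/$_" } @new;
}

# Main loop sketch, left commented out so the helper can be loaded alone.
# process_file() stands for whatever parsing the files need.
#
# my %seen;
# while (1) {
#     process_file($_) for new_text_files('/path/to/incoming', \%seen);
#     sleep 5;
# }
```

Because %seen lives in memory, a restart would report every remaining file again; saving the hash to disk, or moving processed files out of the directory, avoids that.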
High level description but hope it helps.
What is the operating system you are using?
On Windows, you can use Win32::ChangeNotify and on Linux, you can use Linux::Inotify2 to be notified of changes to the contents of a directory.
Your script can simply wait to be notified and take action when notified instead of polling the contents of the directory which will either waste resources or potentially miss some changes.

How to detect changing directory size in Perl

I am trying to find a way of monitoring directories in Perl, in particular the size of a directory, and upon detecting a change in directory size, perform a particular action.
The issue I have is with large files that take a noticeable amount of time to copy into this directory, i.e. > 100MB. What happens (in Windows, not Unix) is that the system reserves enough disk space for the entire file, even though the file is still being copied. This causes problems for me, because my script will try to perform an action on a file that has not finished copying. I can easily detect directory size changes in Unix via 'du', but 'du' in Windows does not behave the same way.
Are there any accurate methods of detecting directory size changes in Perl?
Edit: Some points to clarify:
- My Perl script only monitors a particular directory and, upon detecting a new file or a new directory, performs an action on it. It is not copying any files; users on the network will be copying files into the directory I am monitoring.
- The problem occurs when a new file or directory appears (copied, not moved) that is significantly large (> 100MB, but usually a couple of GB) and my program fires before the copy completes.
- In Unix I can easily 'du' to see that the file/directory in question is growing in size, and take the appropriate action
- In Windows the size is static, so I cannot detect this change
- opendir/readdir/closedir is not feasible, as some of the directories that appear may contain thousands of files, and I want to avoid the overhead of
Ideally I would like my program to be triggered on change, but I am not sure how to do this. As of right now it busy waits until it detects a change. The change in file/directory size is not in my control.
You seem to be working around the underlying issue rather than addressing it -- your program is not properly sending a notification when it is finished copying a file. Why not do that instead of using OS-specific mechanisms to try to indirectly determine when the operation is complete?
You can use Linux::Inotify2 or Win32::ChangeNotify to detect directory/file changes.
EDIT: File::ChangeNotify seems a better option (cross-platform & used by Catalyst)
As I understand it, you are polling a directory with thousands of files. When you see a new file, there is an action that is taken on the file. This causes problems if the file is in use or still being copied, correct?
There are potentially several solutions:
1) Use flock to detect if the file is still in use by another process (test if it works properly on your OS, file system, and Perl version).
2) Use a LockFile call on Windows. If it fails, the OS or another process is using that file.
3) Change the poll interval to a non busy time on the server and take the directory off line while your process completes.
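Option 1 might look like the sketch below. It is only a rough illustration: the helper name is_locked is hypothetical, and because flock is advisory it can only detect a writer that takes a lock itself, which many copy tools do not, so it needs testing on the target system as noted above. Some platforms also require the handle to be opened for writing before an exclusive lock is granted.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: true if an exclusive, non-blocking lock on $path
# cannot be obtained, i.e. some other open handle currently holds a lock.
sub is_locked {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    my $got_lock = flock($fh, LOCK_EX | LOCK_NB);
    flock($fh, LOCK_UN) if $got_lock;    # release straight away
    close $fh;
    return !$got_lock;
}
```

A polling loop would skip any file for which is_locked returns true and look at it again on the next pass.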
Evaluating the size of a directory is something all but the most inexperienced Perl programmers should be able to do. You can write your own portable version of du in 15 lines of code if you know about:
Either glob or opendir / readdir / closedir to iterate through the files in a directory
The filetest operators (-f file, -d file, etc.) to distinguish between regular files and directory names
The stat function or file size operator -s file to obtain the size of a file
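Put together, those three pieces give something like the following sketch. The helper name dir_size is hypothetical, and skipping symbolic links is an assumption made here to avoid loops, not something the original answer specifies.

```perl
use strict;
use warnings;

# Hypothetical helper: total size in bytes of all regular files under $dir,
# descending into subdirectories. Symbolic links are skipped so that a link
# pointing back up the tree cannot cause an endless loop.
sub dir_size {
    my ($dir) = @_;
    my $total = 0;
    opendir my $dh, $dir or die "Can't open directory $dir: $!";
    for my $entry (readdir $dh) {
        next if $entry eq '.' || $entry eq '..';
        my $path = "$dir/$entry";
        next if -l $path;
        if (-d $path) {
            $total += dir_size($path);
        }
        elsif (-f $path) {
            # -s yields an empty string for zero-length files, hence "|| 0".
            $total += (-s $path) || 0;
        }
    }
    closedir $dh;
    return $total;
}
```

Comparing the value returned by dir_size on successive polls shows whether the directory has changed size in between.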
There is a nice module called File::Monitor; it will detect new files, deleted files, changes in size, and changes to any other attribute that stat reports. It then reports the affected files to you.
http://metacpan.org/pod/File::Monitor
You set up a baseline scan, then set up a callback for each kind of change you are looking for, so you can see new files via:
$monitor->watch( {
    name     => 'somedir',
    recurse  => 1,
    callback => {
        files_created => sub {
            my ($name, $event, $change) = @_;
            # Do stuff
        }
    }
} );
If you need to go deeper than one level, just do it to whatever level you need. Once this is done and it finds new files, you can trigger your application to do what you want with them.