I have a Perl program that reads the packets of a flow from a pcap file, but it takes a lot of time. I want to make it parallel, but I don't know whether that is possible, and if it is, whether I can do it with MPI. Also, what is the best way to parallelize this code? Here is the relevant piece (I think this is the part to work on for parallelizing, but I don't know the best way!):
while (!eof($inFileH))
{
    # $inFileH is the handle of the pcap file
    # each iteration of the while loop reads one packet
    $ts_sec   = readBytes($inFileH, 4);
    $ts_usec  = readBytes($inFileH, 4);
    $incl_len = readBytes($inFileH, 4);
    $orig_len = readBytes($inFileH, 4);
    if ($totalLen == 0) # it is the 1st packet
    {
        $startTime = $ts_sec + $ts_usec/1000000;
    }
    $timeStamp = $ts_sec + $ts_usec/1000000 - $startTime;
    $totalLen += $orig_len;
    $#packet = -1; # empty the array
    for (my $i = 0; $i < $incl_len; $i++) # read all included octets of the current packet
    {
        read $inFileH, $packet[$i], 1;
        $packet[$i] = ord($packet[$i]);
    }
    # and after that I work on @packet and analyze it
}
So how should I send the file contents to other processors so they can work on it in parallel?
First you need to determine the bottleneck. If it is really CPU usage (i.e. CPU usage is at 100% while you are running the script), you need to figure out where the processing spends its time.
This may well be in the way that you are parsing the input. There may be obvious ways to speed this up. For instance, if you use complex regular expressions, and focus exclusively on matching input correctly, there may be ways to make the matching a lot faster by rewriting the expressions or doing simpler matches before trying more complex ones.
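One concrete candidate in the loop above (an assumption on my part, since only a fragment was posted): the inner for issues one read() call per octet. Reading the whole packet with a single read() and unpacking it is usually far cheaper:

# Drop-in replacement for the per-octet loop; $inFileH and $incl_len are
# the variables from the question. unpack('C*', ...) yields the same octet values.
read($inFileH, my $buf, $incl_len) == $incl_len
    or die "short read on packet body";
my @packet = unpack('C*', $buf);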
If you can't reduce CPU usage far enough in this way, and you really want to parallelize, see if you can employ the mechanism with which Perl was born: Unix pipes. You can write Perl scripts that pass data through to each other in a pipeline, or you can do the creation of the processes and pipes within Perl itself (see perlopentut, and if that isn't enough, perlipc).
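For example, here is a minimal sketch of the pipe approach, assuming a hypothetical per-packet analyzer script analyze_packet.pl that reads one packet per line on STDIN:

#!/usr/bin/perl
# Fork N analyzers and round-robin packets across them; the analyzer script
# name and the one-packet-per-line format are assumptions, not your code.
use strict;
use warnings;

my $nworkers = 4;    # illustrative worker count
my @workers;
for my $i (0 .. $nworkers - 1) {
    open $workers[$i], '|-', 'perl', 'analyze_packet.pl'
        or die "cannot fork worker $i: $!";
}

# Stand-in for the pcap-reading while loop: each line here would be the
# packet fields and octets read in your code, joined into one record.
my $next = 0;
while (defined(my $packet = <STDIN>)) {
    my $fh = $workers[$next++ % $nworkers];
    print {$fh} $packet;
}

# Closing the pipes waits for the children to finish.
close $_ or warn "worker exited with status $?" for @workers;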
As a general rule, I would consider these options first before trying other mechanisms, but it really depends on the details of what you're trying to do and the context in which you need to do it.
TL;DR I am trying to send syslog messages I have already created to a syslog server using spoofed source IPs, but I'm making it really hard for myself and could do with a succinct approach.
Most questions about syslog do not include the spoofed-IP problem, hence I am asking afresh.
I am writing (well, updating) a script I wrote a long time ago that generates spoofed syslog messages (using UDP). It currently uses Net::RawIP, which is terrible for portability, and the transmission code I wrote ages ago has decided to stop working in Perl 5 (I haven't used this for ages and am refreshing it). I have been meaning to get rid of Net::RawIP for ages, but it's the only module I know how to use!
Given that I have to fix it and have a little time at the moment, I probably want to move to the Socket capability, which is what I have been playing with (using code from SO, gists, or other places I can find) rather than something like IO::Socket, since spoofing the source IP requires low-level ability to write to sockets.
However, I have tied myself in knots with this. What I have right now forms the packets from scratch, then creates a socket and sends them, but in the process it wraps a superfluous IPv4 header (which I can see using Wireshark). Without starting afresh I think it's stuck like that, as it has a fundamental flaw, hence I'm not sharing the old code.
Basically, I can keep playing with the overly complicated code I have, or ask for help simplifying it, as I am beyond my knowledge of sockets and many hours of googling haven't helped much. What I keep finding is code that works but is not compliant in some way; probably not an issue for its usual purposes, which are DDoS or SYN attacks or whatever.
Key requirements (which every attempt I have made has failed on in some form!):
- must come from a spoofed source IP and go to a known destination IP (both of which I have in config variables), so that the syslog server thinks many different devices generated the logs (hence I'm using UDP)
- must come from a set port and go to a set port (both of which I have in my existing config variables)
- must contain a message I have already formed, which includes all the syslog content (the PRI and the syslog message content)
- must be fully compliant with checksums, packet lengths, etc. when received
- must be as portable as possible (I'll probably embed it in my main script to keep it all in one file I can share with others)
I just feel there must be a simple and easy way to do this, as everything I have is overly flexible, overly complicated, and a nightmare to unwind.
NB This is shared on SourceForge as "must syslog", so you can see what I used to do, but be aware it has stopped working, so it won't work if you run it as it stands! Once I fix this I'll upload a new version.
Cheers, --Chris
Actually, I just had a breakthrough. I have linked the sources in the code, but I found a test harness that is pretty basic, plus some UDP checksum code, and with a bit of playing they worked together; Wireshark confirms it's all correct.
Posted in case someone else needs it, as it's taken me a good couple of days to get to this.
#!/usr/bin/perl
use Socket;
use List::Util qw(sum);
sub udp_checksum {
    # thanks to ikegami in https://stackoverflow.com/questions/46181281/udp-checksum-function-in-perl
    my $packed_src_addr = shift;  # As 4-char string, e.g. as returned by inet_aton.
    my $packed_dst_addr = shift;  # As 4-char string, e.g. as returned by inet_aton.
    my $udp_packet      = shift;  # UDP header and data as a string.

    my $sum = sum(
        IPPROTO_UDP,
        length($udp_packet),
        map({ unpack('n*', $_) }
            $packed_src_addr,
            $packed_dst_addr,
            $udp_packet."\0",  # Extra byte ignored if length($udp_packet) is even.
        ),
    );
    while (my $hi = $sum >> 16) {
        $sum = ($sum & 0xFFFF) + $hi;
    }
    return ~$sum & 0xFFFF;
}
#this was found and adapted from http://rhosted.blogspot.com/2009/08/creating-udp-packetip-spoofing-through.html
$src_host = $ARGV[0];
$dst_host = $ARGV[1];
$src_port = 33333;
$dest_port = 514;
$cksum = 0; #initialise, we will sort this later
#$udp_len is the udp packet length in the udp header. It's just 8 plus the length of the data
#for this test harness we will set the data here too
$data = "<132> %FWSM-3-106010: Deny inbound tcp src outside:215.251.218.222/11839 dst inside:192.168.1.1/369";
$udp_len = 8+length($data);
$udp_proto = 17; #17 is the code for udp
#Prepare the udp packet, needed for the checksum to happen, then get the checksum (horrifically complicated, just google it)
$udp_packet = pack("nnnna*", $src_port,$dest_port,$udp_len, $cksum, $data);
$cksum = udp_checksum(inet_aton($src_host),inet_aton($dst_host),$udp_packet);
$zero_cksum = 0;
#test harness checks about host IPs
my $dst_host = (gethostbyname($dst_host))[4];
my $src_host = (gethostbyname($src_host))[4];
# Now lets construct the IP packet
my $ip_ver = 4;
my $ip_len = 5;
my $ip_ver_len = $ip_ver . $ip_len;
my $ip_tos = 00;
my ($ip_tot_len) = $udp_len + 20;
my $ip_frag_id = 19245;
my $ip_frag_flag = "010";
my $ip_frag_oset = "0000000000000";
my $ip_fl_fr = $ip_frag_flag . $ip_frag_oset;
my $ip_ttl = 30;
# H2H2nnB16C2na4a4 for the IP header part, nnnna* for the UDP header part.
# To understand these, see the manual of the pack function and the IP and UDP header formats.
# IP checksum ($zero_cksum) is calculated by the kernel. Don't worry about it.
my ($pkt) = pack('H2H2nnB16C2na4a4nnnna*',
                 $ip_ver_len, $ip_tos, $ip_tot_len, $ip_frag_id,
                 $ip_fl_fr, $ip_ttl, $udp_proto, $zero_cksum, $src_host,
                 $dst_host, $src_port, $dest_port, $udp_len, $cksum, $data);
# the bit that makes the socket and sends the packet
socket(RAW, AF_INET, SOCK_RAW, 255) || die $!;
setsockopt(RAW, 0, 1, 1);
my ($destination) = pack('Sna4x8', AF_INET, $dest_port, $dst_host);
send(RAW, $pkt, 0, $destination);
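For anyone reusing this: the spoofed source and the real destination come from the command line ($ARGV[0] and $ARGV[1]), so a run looks something like sudo perl spoof_syslog.pl 10.1.2.3 192.168.1.50 (the script name is whatever you saved it as), and it needs root (or CAP_NET_RAW) because of the raw socket.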
I have a file where I keep a serialized Perl structure stored. In my current script, I load the values like this:
use Storable qw(retrieve);   # retrieve() comes from Storable

my $arrayref = retrieve("mySerializedFile");
my $a = $arrayref->[0];
my $b = $arrayref->[1];
my $c = $arrayref->[2];
My problem is that the file is about 1 GB, so it takes about ten seconds to load, and then one more second to perform some operations. I would like to reduce the retrieve time.
Is there any way of having this info loaded before the script executes? I mean, mySerializedFile is not supposed to change for a long time, so if I could keep it always loaded on the system, that would be nice and would improve my execution time from 11 seconds to 1.
Following the suggestions in the comments, I used a DB engine, which improved the execution time a lot; it is about 5 seconds now.
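For reference, a minimal sketch of that kind of DB-engine approach, assuming SQLite via DBI (with DBD::SQLite installed); the file, table, and column names here are illustrative, not the ones I actually used:

use strict;
use warnings;
use DBI;

# Connect to (or create) the SQLite file.
my $dbh = DBI->connect("dbi:SQLite:dbname=mySerializedFile.db", "", "",
                       { RaiseError => 1, AutoCommit => 1 });

# One-time conversion from the Storable file:
# $dbh->do("CREATE TABLE kv (k INTEGER PRIMARY KEY, v BLOB)");
# $dbh->do("INSERT INTO kv VALUES (?, ?)", undef, $_, $arrayref->[$_]) for 0 .. 2;

# Per-run lookup: only the rows actually requested are read from disk,
# so there is no multi-second deserialization of the whole structure.
my ($a_val) = $dbh->selectrow_array("SELECT v FROM kv WHERE k = ?", undef, 0);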
I've inherited some code at work that I'm trying to improve. My Perl skills are somewhat lacking, so I would love some assistance!
Essentially this script SNMP-polls a network of thousands of nodes to update its local interface index cache. I've found it's hitting a problem where it exhausts its memory and fails. Code as follows (heavily reduced, but I think you'll get the gist):
use strict;
use warnings;
use Parallel::Loops;
my %snmp_results;
my $maxProcs = 50;
my @exceptions;
my @devices;

my $pl = Parallel::Loops->new($maxProcs);
$pl->share(\%snmp_results, \@exceptions);

load_devices();
get_snmp_interfaces();

sub get_snmp_interfaces {
    $pl->foreach( \@devices, sub {
        my ($name, $community, $snmp_ver) = @$_;
        # Create the new ifindex cache, and return an array reference to the new entries
        my $result = getSNMPIFFull($name, $community, $snmp_ver);
        if (defined $result && $result ne "") {
            my %cache = %{$result};
            print "Got cache for $name\n";
            # Build hash of all the links polled through SNMP
            # [ifindex, ifdesc, ifalias, ifspeed, ip]
            for my $link (keys %cache) {
                $snmp_results{$name}{$cache{$link}[0]} =
                    [ $cache{$link}[0], $cache{$link}[1], $cache{$link}[2],
                      $cache{$link}[3], $cache{$link}[4] ];
            }
        }
        else {
            push(@exceptions, "Unable to poll $name - $community - $snmp_ver");
        }
    });
}
This particular VM has 3.1 GB of allocatable RAM and idles at about 83 MB usage when this script is not running. If I drop maxProcs down to 25, it finishes fine, but this script can already take a long time given the sheer number of devices plus latency, so I would rather keep the parallelism high!
I have a feeling that $pl->share() is sharing the ever-expanding %snmp_results with each forked process, which is definitely not necessary, since each worker only adds new entries and never reads or modifies the others'. Is there a better way I can be doing this?
I'm also slightly unsure about my %cache = %{$result};. If this just creates an alias to the hash then cool, but if it makes a copy, that's also a bit wasteful!
Any help will be greatly appreciated!
Documentation of the module can be found on CPAN here.
There's one part talking about performance:
Also, if each loop sub returns a massive amount of data, this needs to be communicated back to the parent process, and again that could outweigh parallel performance gains unless the loop body does some heavy work too.
You are probably moving complete copies of the variables around in memory, pushing the machine to its limit if the MIB to poll and the number of machines are big enough.
Since what you are doing is an I/O-intensive task, not a CPU-bound one that could benefit from parallel CPU processing, I would reconsider the approach of launching so many (50!) processes for polling.
Run the program with $maxProcs down to 1 to 5 processes and see how it behaves. Do some profiling of your code, attaching Devel::NYTProf, to check where you are spending time and whether increasing the number of processes actually leads to better performance.
Reconsider using Parallel::Loops for this task. You may get better performance with use threads [1] and a hash shared between the different threads (use threads::shared), along the lines of the sketch below.
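A minimal, untested sketch of that threads-based shape; getSNMPIFFull and the [name, community, snmp_version] tuples are from your script, while load_devices() returning the list and $max_threads are placeholders of mine:

use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

my @devices     = load_devices();   # placeholder: however you build the device list
my $max_threads = 10;               # start low and measure, per the advice above

my %snmp_results :shared;
my @exceptions   :shared;

my $queue = Thread::Queue->new(@devices);
$queue->end;                        # workers see undef once the queue drains

sub worker {
    while (defined(my $dev = $queue->dequeue)) {
        my ($name, $community, $snmp_ver) = @$dev;
        my $result = getSNMPIFFull($name, $community, $snmp_ver);
        if (defined $result && $result ne "") {
            lock(%snmp_results);
            # shared_clone makes the nested structure visible to every thread
            # without serializing it back to a parent process.
            $snmp_results{$name} = shared_clone($result);
        }
        else {
            lock(@exceptions);
            push @exceptions, "Unable to poll $name - $community - $snmp_ver";
        }
    }
}

threads->create(\&worker) for 1 .. $max_threads;
$_->join() for threads->list();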
Apologies if this could have been a comment; starting out on SO is difficult due to all the limitations that are in place :(
If you have already found a solution, it would be great if you could share your findings. I didn't know Parallel::Loops before, and I think I can give it some use.
I have a mail-parser Perl script which is called every time a mail arrives for a user (using .qmail). It extracts a calendar attachment out of the mail and places the path of the file in a FIFO queue implemented using the Directory::Queue module.
Another Perl script, which reads the path of the calendar attachment and performs certain file operations on the local system as well as on the remote CalDAV server, is run as a daemon, as explained here. So basically this script looks like:
my $declarations;

sub foo {
    .
    .
}

sub bar {
    .
    .
}

while ($keep_running) {
    for (keep-checking-the-queue-for-new-entries) {
        sub caldav_logic1 {
            .
            .
        }
        sub caldav_logic2 {
            .
            .
        }
    }
}
I am using Proc::Daemon to run the script as a daemon. Now the problem is that this process has almost 100% CPU usage. What are the suggested ways to implement the daemon in a more standard, safer way? I am using pretty much the same code as mentioned in the link above for the usage of Proc::Daemon.
I bet it is your for loop busy-checking for new queue entries.
There are ways to watch a directory for file changes. They are OS-dependent, but there may be a Perl module that wraps them up for you. Use that instead of busy-looping. Even with a sleep delay, the looping is inefficient when the OS can tell your program exactly when to wake up.
File::ChangeNotify looks promising.
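Something along these lines (untested sketch; the queue directory path is a placeholder for your Directory::Queue location):

use strict;
use warnings;
use File::ChangeNotify;

my $watcher = File::ChangeNotify->instantiate_watcher(
    directories => ['/path/to/queue'],   # placeholder
);

# wait_for_events() blocks until something changes, so the daemon
# sleeps in the OS instead of spinning in Perl.
while (my @events = $watcher->wait_for_events()) {
    for my $event (@events) {
        printf "%s: %s\n", $event->type, $event->path;
        # hand the new queue entry to the CalDAV-processing code here
    }
}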
Maybe you don't want truly continuous polling. Is keep-checking-the-queue-for-new-entries a CPU-intensive part of the code, even when the queue is empty? That would explain why your processor is always busy.
Try putting a sleep 1 statement at the very top (or very bottom) of the while loop to let the processor rest between queue checks. If that doesn't degrade the program's performance too much (i.e., if everyone can tolerate waiting an extra second before the company calendars get updated) and the CPU usage still seems high, try sleep 2, sleep 5, etc.
cpan Linux::Inotify2
The kernel knows when files change and sends this information to your program, which then runs the sub. This should be better because the program runs the sub only when a file has changed.
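A minimal sketch (the watched directory and the handler body are placeholders):

use strict;
use warnings;
use Linux::Inotify2;

my $inotify = Linux::Inotify2->new
    or die "unable to create inotify object: $!";

$inotify->watch('/path/to/queue', IN_CREATE | IN_MOVED_TO, sub {
    my $event = shift;
    print "new queue entry: ", $event->fullname, "\n";
    # run the CalDAV logic for this entry here
});

# poll() blocks in the kernel until an event arrives; no busy loop.
1 while $inotify->poll;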
I'm trying to profile a Perl script, but CORE::sleep gobbles up all the space (and time) in my report.
How can I tell NYTProf to ignore sleep calls?
Assume we have the following script:
sub BrandNewSubroutine {
    sleep 10;
    print "Odelay\n";
}
BrandNewSubroutine();
I want to get rid of the following line of the report:
Exclusive Time  Inclusive Time  Subroutine
10.0s           10.0s           main::CORE:sleep (opcode)
Edit: Using DB::disable_profile() and DB::enable_profile() won't do the trick, as it adds the sleep time to BrandNewSubroutine's inclusive time.
Thanks in advance.
I'd suggest either wrapping the calls to sleep (possibly by the method of overriding built-ins mentioned in perlsub) with DB::disable_profile() and DB::enable_profile() calls (see "RUN-TIME CONTROL OF PROFILING" in the NYTProf documentation), or post-processing the report to remove the offending calls.
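A sketch of that wrapping idea, assuming a CORE::GLOBAL override as described in perlsub (and bearing in mind the question's edit: the paused time may still surface in the caller's inclusive time):

# Must be compiled before any code that calls sleep is compiled.
BEGIN {
    *CORE::GLOBAL::sleep = sub {
        DB::disable_profile();    # stop NYTProf while we sleep
        my $slept = CORE::sleep($_[0]);
        DB::enable_profile();     # resume profiling afterwards
        return $slept;
    };
}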
CORE::accept is already ignored in the way you'd like CORE::sleep to be, so the mechanism is already in place. See this code in NYTProf.xs:
/* XXX make configurable eg for wait(), and maybe even subs like FCGI::Accept
* so perhaps use $hide_sub_calls->{$package}{$subname} to make it general.
* Then the logic would have to move out of this block.
*/
if (OP_ACCEPT == op_type)
    subr_entry->hide_subr_call_time = 1;
So with a little hacking (OP_SLEEP == op_type || OP_ACCEPT == op_type) you'd be able to ignore CORE::sleep in the same way.
I'd accept a patch to enable that as an option.