pcap: Proper capitalization when referring to the file standard?

How does one properly refer to a Packet Capture file in shorthand when writing about it in documentation?
I see a mix of PCAP, PCap and pcap in various places and wikis.

The proper way to refer to a packet capture file is "a packet capture file"; "pcap"/"PCAP"/"PCap" are often used to refer to a particular type of packet capture file: those written in the format that libpcap/WinPcap supports for writing. There are several other capture file formats: one of them (pcap-ng) can be read by Wireshark and by libpcap 1.1.0 and later, several others can be read by Wireshark, and some can't be read by Wireshark at all.
As a core developer of libpcap, tcpdump, and Wireshark, I would say the proper way to refer to files in that format is "pcap", with no extra capitalization. The "pcap" comes from "libpcap", not directly from "packet capture", and "libpcap" is not capitalized: it's a UN*X library, and those tend to have all-lower-case names, given that almost all UN*X file systems are case-sensitive.
Others may call it "PCAP", perhaps because many terms in the computing and networking fields are acronyms or other initialisms and they assume "PCAP" must be one as well, or "PCap", because they think of it as standing for "Packet Capture" rather than referring to libpcap and WinPcap. But then again, people also referred to Sun Microsystems as "SUN": the name did come from the Stanford University Network project, but the company wasn't "Stanford University Network Microsystems", it was just "Sun Microsystems".
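To make the format distinction concrete: pcap and pcap-ng files can be told apart by their first four bytes. The sketch below is Python chosen for illustration, and `classify_capture` / `classify_capture_file` are hypothetical helpers, not part of libpcap. It assumes the classic pcap magic number is written in the byte order of the machine that wrote the file, and that a pcap-ng file begins with a Section Header Block.

```python
# First four bytes of a capture file -> format description.
# Classic pcap writes 0xA1B2C3D4 (microsecond timestamps) or 0xA1B23C4D
# (nanosecond timestamps) in the writer's native byte order. A pcap-ng file
# starts with a Section Header Block whose type, 0x0A0D0D0A, reads the same
# in either byte order.
CAPTURE_MAGICS = {
    b"\xa1\xb2\xc3\xd4": "pcap (big-endian, microsecond timestamps)",
    b"\xd4\xc3\xb2\xa1": "pcap (little-endian, microsecond timestamps)",
    b"\xa1\xb2\x3c\x4d": "pcap (big-endian, nanosecond timestamps)",
    b"\x4d\x3c\xb2\xa1": "pcap (little-endian, nanosecond timestamps)",
    b"\x0a\x0d\x0d\x0a": "pcap-ng",
}


def classify_capture(header: bytes) -> str:
    """Return a description of the capture format, or 'unknown'."""
    return CAPTURE_MAGICS.get(bytes(header[:4]), "unknown")


def classify_capture_file(path: str) -> str:
    """Read just enough of a file on disk to classify it."""
    with open(path, "rb") as f:
        return classify_capture(f.read(4))


print(classify_capture(b"\xd4\xc3\xb2\xa1\x02\x00\x04\x00"))
```

A file written by libpcap on an x86 machine would typically report the little-endian microsecond variant.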

Related

Why can file descriptors under UNIX be transmitted over sockets, but not over pipes?

I just learned about pipes and sockets today, and that sockets are special because they allow you to transmit file descriptors between processes.
I've also found that sendmsg() and the msghdr structure seem to be what produces this behavior.
My professor told me that pipes can't be used to replicate this behavior, but I'm interested in exactly what part of the implementation allows sockets to do what pipes can't.
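One way to see the difference: with sendmsg(), the descriptor travels in the msghdr's control (ancillary) data as an SCM_RIGHTS message, and the kernel installs a new descriptor for the same open file in the receiving process. A pipe's read()/write() interface carries only bytes and has no control-data channel (calling sendmsg() on a pipe fails with ENOTSOCK), so there is nowhere to attach a descriptor. A minimal Python sketch, assuming a Unix-like system; `send_fd` and `recv_fd` are hypothetical helper names:

```python
import array
import os
import socket


def send_fd(sock, fd):
    """Send one file descriptor as SCM_RIGHTS ancillary data."""
    # At least one byte of ordinary data must accompany the control message.
    fd_bytes = array.array("i", [fd]).tobytes()
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fd_bytes)])


def recv_fd(sock):
    """Receive one file descriptor sent with send_fd()."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(1, socket.CMSG_SPACE(fds.itemsize))
    for level, kind, data in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS:
            # Drop any trailing bytes that don't form a whole int.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
    if not fds:
        raise RuntimeError("no descriptor received")
    return fds[0]


# Demo in one process: pass the write end of a pipe through a socket pair.
left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
read_end, write_end = os.pipe()
send_fd(left, write_end)
received = recv_fd(right)   # a new descriptor number for the same pipe
os.write(received, b"hello")
data = os.read(read_end, 5)
print(received != write_end, data)
for fd in (read_end, write_end, received):
    os.close(fd)
left.close()
right.close()
```

The socket is only the carrier here: the descriptor being passed is a pipe, but the pipe itself could never carry it.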

Difference between machine language, binary code and a binary file

I'm studying programming, and in many sources I see the concepts "machine language", "binary code" and "binary file". The distinction between these three is unclear to me, because according to my understanding, machine language means the raw language that a computer can understand, i.e., sequences of 0s and 1s.
Now, if machine language is a sequence of 0s and 1s and binary code is also a sequence of 0s and 1s, then does machine language = binary code?
What about a binary file? What really is a binary file? To me, the term "binary file" means a file that consists of binary code. So, for example, if my file was:
010010101010010
010010100110100
010101100111010
010101010101011
010101010100101
010101010010111
Would this be a binary file? If I google "binary file" and look at Wikipedia, I see an example picture of a binary file that confuses me (it's not in binary?...).
Where is my confusion happening? Am I mixing up file encoding here, or what? If I were to ask someone to SHOW me what machine language, binary code and a binary file are, what would they be? =) I guess the distinction is too abstract for me.
Thnx for any help! =)
UPDATE:
In Python, for example, there is one phrase in a file I/O tutorial that I don't understand: "Opens a file for reading only in binary format." What does reading a file in binary format mean?
Machine code is just numbers, and binary is one way of writing them: a number system with base 2, where each digit is either a 1 or a 0. But machine code can also be expressed in hex (hexadecimal), a number system with base 16. Binary and hex are closely related: it's easy to convert from binary to hex and back again. And because hex is much more readable and useful than binary, it's often what gets shown. For instance, the picture in your question uses hex numbers!
Let's say you have the binary sequence 1001111000001010: it can easily be converted to hex by grouping it into blocks of four bits each.
1001 1110 0000 1010 => 9 14 0 10 which in hex becomes: 9E0A.
One can agree that 9E0A is much more readable than the binary - and hex is what you see in the image.
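The four-bit grouping is mechanical enough to write down. A small Python sketch; `bits_to_hex` and `hex_to_bits` are illustrative names, and the built-in `int(..., 2)` / `format(..., "X")` would do the same conversion in one step:

```python
HEX_DIGITS = "0123456789ABCDEF"


def bits_to_hex(bits: str) -> str:
    """Convert a string of 0s and 1s to hex by grouping four bits at a time."""
    bits = bits.replace(" ", "")
    # Pad on the left so the length is a multiple of 4.
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    # Each 4-bit group is a value 0..15, i.e. exactly one hex digit.
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)


def hex_to_bits(hex_string: str) -> str:
    """Expand each hex digit back into its four bits."""
    return " ".join(format(int(digit, 16), "04b") for digit in hex_string)


print(bits_to_hex("1001111000001010"))  # 9E0A
print(hex_to_bits("9E0A"))              # 1001 1110 0000 1010
```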
I'm honestly surprised to not see the information I was looking for, looking back though, I guess the title of this thread isn't fully appropriate to the question the OP was asking.
You guys all say "Machine Code is a bunch of numbers".
Sure, the "CODE" is a bunch of numbers, but what people are wondering (I'm guessing) is "what actually is happening physically?"
I'm quite a novice when it comes to programming, but I understand enough to feel confident in 'roughly' answering this question.
Machine code, to the actual circuitry, isn't numbers or values.
To the circuitry, it's a pattern of high and low voltages that switch transistors on or off, and depending on what they're connected to, a certain light will flicker at a certain time, etc.
I'm guessing that the "machine code" dictates the pathway and timing for specific electrical signals that will travel to reach their overall destination.
So for 010101, three switches would be off (the 0s) and three would be on (the 1s).
I know I'm close to the right answer here, but I also know it's much more sophisticated - because I can imagine that which I don't know.
010101 would be easy instructions for a simple circuit, but what I can't begin to fathom is how a complex computer processes all of the information.
So I guess let's break it down?
The "x" in an x-bit processor tells how many bits the processor can process at once.
A bit is either 1 or 0, "On" or "Off", "Open" or "Closed".
So a 32-bit processor processes "10101010 10101010 10101010 10101010" (this many bits) at once.
A processor is an "integrated circuit", which is like a compact circuit board containing resistors/capacitors/transistors and some memory. I'm not sure if processors have resistors, but I know you'll usually find a ton of them around the actual processor on the circuit board.
Anyway, a transistor works like a switch: a voltage on its control input either lets current flow through it or blocks it.
So I imagine that as machine code goes... the segment of code the processor receives changes the voltage channels in such a way that it sends a signal to another part of the computer (why do you think processors have so many pins?), probably another integrated circuit more specialized to a specific task.
That integrated circuit then receives a chunk of code, let's say 2 to 4 bits 01 or 1100 or something, which further defines where the final destination of the signal will end up, which might be straight back to the processor, or possibly to some output device.
Machine code is a very efficient way of taking a circuit and connecting it to a lightbulb, and then taking that lightbulb out of the circuit and switching the circuit over to a different lightbulb.
Memory in a computer is highly necessary because otherwise, to get your computer to do anything, you would need to type everything out (in machine code). Instead, all of the 1s and 0s are stored on some storage device: either a spinning hard disk, whose read/write head senses the magnetization of the platter as 1s or 0s, or a flash memory device, which stores them as electric charge trapped in its transistors.
Fortunately, people came up with more convenient ways to write programs: hex as a compact notation for the numbers, and assemblers and compilers that translate human-readable code into binary. All software has branched out from there.
Each key on the keyboard creates a specific binary signal that, in the end, translates to a bunch of switches being turned on or off using certain voltages, so that current can be run through the specific individual pixels on your screen that form "1" or "0" or "F", or all the characters of this post.
So I wonder how a program 'programs', or 'makes', the computer 'do' something. Or rather, how does a compiler compile a program written in something other than binary?
It's hard to think about now because I'm extremely tired (so I won't try) but also because EVERYTHING you do on a computer is because of some program.
There are actively running programs (processes) in task manager. These keep your computer screen looking the way you've become accustomed, and also allow for the screen to be manipulated as if to say the pictures on the screen were real-life objects. (They aren't, they're just pictures, even your mouse cursor)
(OK, I'm done. Enough editing and elongating my thoughts; it's time for bed.)
Also, what I don't really get is how 0s are 'read' by the computer.
It seems that a '0' must not be a 'lack of voltage'; rather, it must be some other type of signal.
Perhaps something like 1 volt = 1 and 0.5 volts = 0: some distinguishable difference between voltages in a circuit that would still send a signal, but could be the difference between opening and closing a specific circuit.
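For what it's worth, in typical digital logic a 0 is a low voltage, usually near 0 V, and the receiving circuit classifies a voltage by thresholds rather than by an exact value. A hedged Python sketch, assuming classic 5 V TTL input thresholds (at most 0.8 V reads as 0, at least 2.0 V reads as 1); other logic families use different numbers, and `read_logic_level` is an illustrative name:

```python
# Assumed thresholds: standard 5 V TTL inputs (other logic families differ).
V_IL_MAX = 0.8  # volts: at or below this, the input is guaranteed to read as 0
V_IH_MIN = 2.0  # volts: at or above this, the input is guaranteed to read as 1


def read_logic_level(voltage: float):
    """Map an input voltage to 0, 1, or None (undefined region)."""
    if voltage <= V_IL_MAX:
        return 0
    if voltage >= V_IH_MIN:
        return 1
    return None  # between the thresholds, the result is not guaranteed


def read_bits(voltages):
    return [read_logic_level(v) for v in voltages]


print(read_bits([0.1, 4.8, 0.3, 3.3]))  # [0, 1, 0, 1]
```

The wide gap between the thresholds is what lets noisy, imperfect voltages still be read reliably.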
If I'm close to right about any of this, serious props to the computer engineers of the world; the level of sophistication is mouthwatering. I hope to know everything about technology someday. For now I'm just trying to get through Arduino.
Lastly... something I've wondered about... would it even be possible to program today's computers without the use of another computer?
Machine language is a low-level programming language that generally consists entirely of numbers. Because they are just numbers, they can be viewed in binary, octal, decimal, hexadecimal, or any other way. Dave4723 gave a more thorough explanation in his answer.
Binary code isn't a very well-defined technical term, but it could mean any information represented by a sequence of 1s and 0s, or it could mean code in a machine language, or it could mean something else depending on context.
Technically, all files are stored in binary, we just don't usually look at the binary when we view a file. However, the term binary file is usually used to refer to any non-text file; e.g. an .exe, a .png, etc.
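This also answers the Python question: text mode decodes the stored bytes into characters (and translates line endings), while binary mode ("rb") hands you the raw bytes. A file containing the characters 0101, like the example in the question, is a text file whose bytes are the ASCII codes for '0' and '1'. A minimal Python sketch; `read_both_ways` and the file name are illustrative:

```python
import os
import tempfile


def read_both_ways(path):
    """Return the file's contents read in text mode and in binary mode."""
    with open(path, "r", encoding="ascii") as f:
        as_text = f.read()    # str: bytes decoded as ASCII characters
    with open(path, "rb") as f:
        as_bytes = f.read()   # bytes: exactly what is stored on disk
    return as_text, as_bytes


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "digits.txt")
    with open(path, "w", encoding="ascii") as f:
        f.write("0101")       # four characters, not four bits
    text, raw = read_both_ways(path)

print(text)        # 0101
print(list(raw))   # [48, 49, 48, 49]  (ASCII codes for '0' and '1')
```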
You have to understand how a computer works at its most basic level, and this will clear things up for you. I recommend reading about the von Neumann architecture.
In a very simple computer there is only one memory, like an array,
which holds the instructions for your processor and the data; everything in it is binary numbers.
Your program starts at a certain place in your memory and reads the first number...
So here comes the twist: these numbers can be instructions or data.
Your processor reads these numbers and interprets them as instructions
Example: the start address is 0.
At address 0 is an instruction like "read value from address 120 into the ALU (math unit)"
then it steps to address 1
"read value from address 121 into ALU"
then it steps to address 2
"subtract numbers in ALU"
then it steps to address 3
"if ALU-Value is smaller than zero go to address 10"
it is not smaller than zero so it steps to address 4
"go to address 20"
you see that this is a basic if(a < b)
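The walk-through above can be simulated in a few lines. A minimal Python sketch; the instruction encoding (opcode * 1000 + address), the opcode names, and the two ALU registers are hypothetical, chosen only so that instructions and data are plain numbers sharing one memory:

```python
# Hypothetical encoding (not a real CPU): each memory word is
# opcode * 1000 + address, so instructions and data are both plain numbers.
HALT, LOAD_A, LOAD_B, SUB, JUMP_IF_NEG, JUMP = range(6)


def encode(opcode, address=0):
    return opcode * 1000 + address


def run(memory, pc=0, max_steps=1000):
    """Execute from address pc; return the address of the HALT reached."""
    a = b = 0  # two ALU input registers (an assumption for this sketch)
    for _ in range(max_steps):
        opcode, address = divmod(memory[pc], 1000)
        if opcode == HALT:
            return pc
        elif opcode == LOAD_A:
            a = memory[address]
        elif opcode == LOAD_B:
            b = memory[address]
        elif opcode == SUB:
            a = a - b
        elif opcode == JUMP_IF_NEG:
            if a < 0:
                pc = address
                continue
        elif opcode == JUMP:
            pc = address
            continue
        else:
            raise ValueError(f"unknown opcode {opcode} at address {pc}")
        pc += 1
    raise RuntimeError("step limit reached")


def load_program(a_value, b_value):
    memory = [0] * 128                    # unused words are 0, i.e. HALT
    memory[0] = encode(LOAD_A, 120)       # read value from address 120
    memory[1] = encode(LOAD_B, 121)       # read value from address 121
    memory[2] = encode(SUB)               # subtract: a - b
    memory[3] = encode(JUMP_IF_NEG, 10)   # if the result < 0, go to address 10
    memory[4] = encode(JUMP, 20)          # otherwise go to address 20
    memory[120] = a_value                 # data lives in the same memory
    memory[121] = b_value
    return memory


print(run(load_program(3, 5)))  # 10: a < b
print(run(load_program(5, 3)))  # 20: a >= b
```

Nothing marks address 120 as "data" except that the program never jumps there; if it did, the processor would happily decode the number as an instruction.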
You can write these instructions as numbers and your processor can run them, but nobody wants to do that work (it was what people did with punch cards in the '60s), so
assembler was invented.
It looks like:
add 10, 11, 20 // load the values at addresses 10 and 11, add them, and store the result at address 20
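An assembler is essentially a translator from that text to numbers. A minimal Python sketch for the one-line example; the opcode values and the one-byte-per-field layout are hypothetical, not any real processor's encoding:

```python
# Hypothetical opcode table: not a real instruction set.
OPCODES = {"add": 0x01, "sub": 0x02}


def assemble_line(line: str) -> int:
    """Translate 'add 10, 11, 20' into a single 32-bit instruction word."""
    code = line.split("//", 1)[0].strip()         # drop a trailing // comment
    mnemonic, _, operand_text = code.partition(" ")
    src1, src2, dest = (int(x) for x in operand_text.split(","))
    for value in (src1, src2, dest):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"address {value} does not fit in one byte")
    # Hypothetical layout: opcode | source 1 | source 2 | destination.
    return (OPCODES[mnemonic] << 24) | (src1 << 16) | (src2 << 8) | dest


word = assemble_line("add 10, 11, 20 // load, add, store")
print(f"{word:08X}")  # 010A0B14
```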
In Conclusion:
Machine code (the processor instructions an assembler produces) can be called binary because it's stored as plain numbers.
But anything else can be a binary file, too.
In reality, a simple .exe file is both: if you have variables in there like a = 10 and b = 20, those values can be stored somewhere between the code for if clauses and for loops. Where they end up depends on the compiler.
But if you have a complex 3D-model it can be stored in a separate file with no executable code in it...
I hope it helps to clear things up a little.

Perl network frame/packet parser

I am writing a small sniffer as part of a personal project. I am using Net::Pcap (really really great tool).
In the packet-processing loop I am using the excellent Net::Frame for unpacking all the headers and getting at the data. I am getting concerned that this might not be terribly efficient (Net::Frame is great but seems to be more than I need for this project).
Also, I dislike that on some Debian systems I had to compile libdumbnet manually (the package in the official apt repositories didn't seem to work; Net-Libdnet-0.92 didn't like it).
All I want is to get at the payload inside a TCP segment. Is there any alternative?
Thank you.
P.S. Would it be really really bad (read "thedailywtf.com worthy") if I just took the packet and searched it for some pattern ?
I recently wrote a pcap dump file unpacker in C and afterwards wished I'd just used the open source libraries instead (when I realised they existed and were so easy to use). I have to say that, as it's a binary file format, it's probably easier to do in C than in Perl, but I'll no doubt get booed by all the Perl fanatics out there.
What I will say is that using existing code will be quicker all round than coding it yourself, but if you really really want to, the file format is freely available online and is really quite simple.
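To show how simple the classic format is: a 24-byte global header is followed by records, each with a 16-byte header (timestamp seconds, timestamp microseconds, captured length, original length) and then the captured bytes. The sketch below is Python rather than Perl, handles only microsecond-resolution pcap (not pcap-ng), and `read_pcap` / `build_pcap` are hypothetical helpers. The captured length can be shorter than the original length when the capture used a snapshot length.

```python
import struct


def read_pcap(data: bytes):
    """Parse a classic (microsecond) pcap file held in memory.

    Returns (linktype, records) where each record is
    (ts_sec, ts_usec, orig_len, packet_bytes).
    """
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"   # written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"   # written on a big-endian machine
    else:
        raise ValueError("not a microsecond-resolution pcap file")
    # Rest of the global header: version major/minor, thiszone, sigfigs,
    # snaplen, linktype.
    _, _, _, _, _, linktype = struct.unpack_from(endian + "HHiIII", data, 4)
    records = []
    offset = 24
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack_from(
            endian + "IIII", data, offset)
        offset += 16
        records.append((ts_sec, ts_usec, orig_len, data[offset:offset + incl_len]))
        offset += incl_len
    return linktype, records


def build_pcap(packets, linktype=1):
    """Build a little-endian pcap file in memory (linktype 1 = Ethernet)."""
    out = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, linktype)
    for i, pkt in enumerate(packets):
        out += struct.pack("<IIII", i, 0, len(pkt), len(pkt)) + pkt
    return out


linktype, records = read_pcap(build_pcap([b"abc", b"hello"]))
print(linktype, [r[3] for r in records])
```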
As for searching for a pattern, it almost certainly won't work. It's a binary file format and the packets can be fragmented and/or duplicated, so the only reliable way to know where a message starts and ends is by unpacking the headers, checking the packet flags, reading the content length field, etc. etc. Doing pattern searches may work 90% of the time, but at some point you'll find a packet capture log that means you need to change your code. And then a while later find another packet that means another change, and so on and so forth.
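Unpacking the headers to reach the TCP payload takes only a handful of fixed offsets. A minimal sketch in Python rather than Perl (the same fields and offsets apply with Perl's unpack), assuming Ethernet II framing, IPv4 without VLAN tags, and a fully captured segment; `tcp_payload` and `build_frame` are hypothetical helpers, and IP fragments are rejected rather than reassembled:

```python
import struct
from typing import Optional


def tcp_payload(frame: bytes) -> Optional[bytes]:
    """Return the TCP payload of an Ethernet II / IPv4 frame, or None."""
    if len(frame) < 14 + 20:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != 0x0800:                  # IPv4 only; VLAN tags not handled
        return None
    ip = 14
    if frame[ip] >> 4 != 4:
        return None
    ihl = (frame[ip] & 0x0F) * 4             # IP header length in bytes
    (total_length,) = struct.unpack_from("!H", frame, ip + 2)
    (flags_fragment,) = struct.unpack_from("!H", frame, ip + 6)
    if flags_fragment & 0x3FFF:              # MF flag set or nonzero offset
        return None                          # a fragment: needs reassembly
    if frame[ip + 9] != 6:                   # protocol 6 = TCP
        return None
    tcp = ip + ihl
    if len(frame) < tcp + 20:
        return None
    data_offset = (frame[tcp + 12] >> 4) * 4  # TCP header length in bytes
    # Stop at the end of the IP datagram so Ethernet padding is excluded.
    return frame[tcp + data_offset:ip + total_length]


def build_frame(payload: bytes) -> bytes:
    """Build a minimal Ethernet/IPv4/TCP frame for testing (checksums left 0)."""
    eth = b"\x00" * 12 + struct.pack("!H", 0x0800)
    tcp = struct.pack("!HHIIBBHHH", 1234, 80, 0, 0, 5 << 4, 0x18, 65535, 0, 0)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(tcp) + len(payload),
                     0, 0x4000, 64, 6, 0, bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    return eth + ip + tcp + payload + b"\x00" * 4   # trailing Ethernet padding


print(tcp_payload(build_frame(b"hello")))
```

Even with the payload in hand, one application message may span several segments, so a stream-level search still needs TCP reassembly.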

Is there some kind of tool to look at the encoding of Intel x86 instructions?

Forgive me if this is a dumb question, but I'm in an assembly class that was mostly taught using an emulated CPU that was supposed to teach the concepts of assembly code. We haven't even written an Intel program yet, so I'm trying to adjust. With our emulated CPU, we were able to generate a symbol table file that gave the byte equivalents of instructions:
http://imgur.com/tw5S8.png
Would I be able to do such a thing with Intel x86 instructions?
Try IDA. It has an option to show binary values of opcodes.
EDIT: Well, it's a disassembler. Try opening a binary file, and set the number of opcode bytes to show (in Options/General) to a nonzero value.
If you are looking for an IDE that shows you the opcodes for the instructions you've used in real time, then I don't think you'll find one, because there's no "market" for it. Can you explain why you need it? Do you just want to know their length, or do you want to learn them? There is a simple pattern for lengths, so by disassembling many binaries you'll catch on. If it's the opcodes you want, well, there are lots of them, almost no rules, and practically no use in doing it.
I see. Then you have to generate a listing file. Your assembler should have an option for that (for NASM it's -l listfile). Just put any instruction(s) in your .asm file and generate a listing for it; it should contain the binary encoding of each instruction.
First, get the Intel Instruction Set Reference or, better, this link: http://siyobik.info/index.php?module=x86 . There you'll find that many opcodes have several encodings. In your particular case, bit 1 of the opcode specifies the direction, and since both operands are registers, you can toggle the direction bit and swap the register codes, and the result will be the same. You usually have this freedom with register-to-register arithmetic operations. To check this, try disassembling this source file with IDA:
db 02h, 0E0h    ; add ah, al (direction bit set: reg field is the destination)
db 00h, 0C4h    ; add ah, al (direction bit clear: r/m field is the destination)
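The direction-bit trick can also be checked by hand. A minimal Python sketch that decodes only the register-to-register forms of ADD with opcodes 00h (ADD r/m8, r8) and 02h (ADD r8, r/m8); `decode_add_r8` and `alternate_encoding` are hypothetical helpers, not a general x86 disassembler:

```python
# 8-bit register numbers as used in the ModRM reg and r/m fields.
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]


def split_modrm(modrm: int):
    """Split a ModRM byte into its mod (2 bits), reg (3), and r/m (3) fields."""
    return modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111


def decode_add_r8(opcode: int, modrm: int) -> str:
    """Decode ADD with opcode 00h or 02h when both operands are registers."""
    if opcode not in (0x00, 0x02):
        raise ValueError("only opcodes 00h and 02h are handled")
    mod, reg, rm = split_modrm(modrm)
    if mod != 0b11:
        raise ValueError("only register-to-register (mod = 11b) forms are handled")
    direction = (opcode >> 1) & 1    # bit 1 of the opcode
    if direction:                    # d = 1: reg field is the destination
        dst, src = REG8[reg], REG8[rm]
    else:                            # d = 0: r/m field is the destination
        dst, src = REG8[rm], REG8[reg]
    return f"add {dst}, {src}"


def alternate_encoding(opcode: int, modrm: int) -> tuple:
    """Flip the direction bit and swap reg/rm: same instruction, other bytes."""
    mod, reg, rm = split_modrm(modrm)
    return opcode ^ 0b10, (mod << 6) | (rm << 3) | reg


print(decode_add_r8(0x02, 0xE0))   # add ah, al
print(decode_add_r8(0x00, 0xC4))   # add ah, al
print([hex(b) for b in alternate_encoding(0x02, 0xE0)])   # ['0x0', '0xc4']
```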
There is a demo program shipped with fasm.dll that has an editor and a hex viewer.

Named pipe similar to "mkfifo" creation, but bidirectional

I'd like to create a named pipe, like the one created by "mkfifo", but with one caveat: I want the pipe to be bidirectional. That is, I want process A to write to the FIFO and process B to read from it, and vice versa. A pipe created by "mkfifo" allows process A to read the data it has written to the pipe. Normally I'd use two pipes, but I am trying to simulate an actual device, so I'd like the semantics of open(), read(), write(), etc. to be as similar to the actual device as possible. Does anyone know of a technique to accomplish this without resorting to two pipes or a named socket?
Or pty ("pseudo-terminal interface"). man pty.
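A pty pair gives the simulated device a path that clients can open with ordinary open(), read(), and write(), with data flowing in both directions. A minimal Python sketch, assuming Linux or another system where os.openpty() is available; the raw-mode call turns off echo and line editing so bytes pass through unchanged. Note that a pty brings terminal semantics with it (the path is a tty with termios settings), which may or may not match the device being simulated.

```python
import os
import tty

# The simulated device keeps the master side; clients open the slave by path.
master_fd, slave_fd = os.openpty()
tty.setraw(slave_fd)                 # no echo, no line editing, no CR/LF mapping
device_path = os.ttyname(slave_fd)   # e.g. a /dev/pts/N path on Linux

# A client opens the path with plain open(), like a real device node.
client_fd = os.open(device_path, os.O_RDWR | os.O_NOCTTY)

os.write(client_fd, b"ping")         # client -> simulated device
from_client = os.read(master_fd, 4)
os.write(master_fd, b"pong")         # simulated device -> client
from_device = os.read(client_fd, 4)
print(device_path, from_client, from_device)

for fd in (client_fd, slave_fd, master_fd):
    os.close(fd)
```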
Use a Unix-domain socket.
Oh, you said you don't want to use the only available solution - a Unix-domain socket.
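For comparison, a Unix-domain socket bound to a filesystem path is bidirectional, but clients must use socket() and connect() rather than open() on the path (open() on a socket file fails), which is the semantic difference the question is trying to avoid. A minimal Python sketch; `connected_pair_at_path` and the path name are hypothetical:

```python
import os
import socket
import tempfile


def connected_pair_at_path(path):
    """Bind a listening socket at path, connect a client, return (device, client)."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)            # creates the socket file at path
    server.listen(1)
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(path)         # completes against the listen backlog
    device, _ = server.accept()
    server.close()               # no more clients needed for this demo
    return device, client


tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "fake-device")   # hypothetical device path
device, client = connected_pair_at_path(path)
client.sendall(b"ping")                      # client -> simulated device
from_client = device.recv(4)
device.sendall(b"pong")                      # simulated device -> client
from_device = client.recv(4)
print(from_client, from_device)

device.close()
client.close()
os.unlink(path)
os.rmdir(tmpdir)
```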
In that case, you are stuck with opening two named pipes, or doing without. Or write your own device driver for them, of course - you could do it for the open source systems, anyway; it might be harder for the closed source systems (Windows, AIX, HP-UX).