For a .pcap file, does a swapped magic_number 0xd4c3b2a1 affect the network packet? - pcap

For example, when the magic_number is 0xa1b2c3d4, the ether_type is 0x0008.
Then, if the magic_number is 0xd4c3b2a1, will the ether_type be 0x0800?
I only have .pcap files whose magic_number is 0xa1b2c3d4, so I can't verify this myself.
Or perhaps someone could upload a .pcap file whose magic_number is 0xd4c3b2a1, and then I could analyze it myself.
Thanks.

According to the tcpdump manpage:
That allows software reading the file to determine
whether the byte order of the host that wrote the file is the same as
the byte order of the host on which the file is being read, and thus
whether the values in the per-file and per-packet headers need to be
byte-swapped.
I think the answer is yes.

The Ethernet type field is always in big-endian ("network") byte order in a packet, so, to read it, you must convert it from big-endian byte order to your machine's byte order, using, for example, ntohs().
This is independent of the byte order of the host that wrote the file; using ntohs() will work on all machines with all pcap files.
(Bear in mind, however, that there are machines on which you would have to fetch the Ethernet type field - or any other multi-byte integral field - a byte at a time and assemble those bytes into the value, because those machines trap on unaligned memory accesses. See, for example, the EXTRACT_ macros/functions in extract.h in the tcpdump source.)
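Here is a minimal sketch in Python of the same idea (assuming a classic pcap file with an Ethernet link type; the file name and function names are just for illustration): the magic number only tells you how to byte-swap the file and record headers, while the EtherType inside the frame is read as big-endian either way.

import struct

# Illustrative sketch: pick the byte order for the *file and record headers*
# from the magic number, then read the EtherType of the first frame. The
# EtherType bytes on disk are 08 00 for IPv4 with either magic; only the
# header fields swap.
def header_byte_order(path):
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic == b"\xa1\xb2\xc3\xd4":
        return ">"                      # headers written in big-endian order
    if magic == b"\xd4\xc3\xb2\xa1":
        return "<"                      # headers written in little-endian order
    raise ValueError("not a classic pcap file")

def first_ether_type(path):
    endian = header_byte_order(path)
    with open(path, "rb") as f:
        f.seek(24)                                   # skip the 24-byte global header
        record_header = f.read(16)                   # ts_sec, ts_usec, incl_len, orig_len
        incl_len = struct.unpack(endian + "I", record_header[8:12])[0]
        frame = f.read(incl_len)
    # The EtherType is always big-endian ("network order") inside the packet,
    # so '>H' (the ntohs() equivalent) is correct for both magic numbers.
    return struct.unpack(">H", frame[12:14])[0]      # 0x0800 for IPv4

print(hex(first_ether_type("capture.pcap")))         # hypothetical file name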


Has anyone ever heard about asciihex encoding?

This type of encoding is used in SOAP messages...
I'm receiving a message encoded in ASCIIHEX and I have no idea how this encoding actually works, although I have a clear description of the encoding method:
"If this mode is used, every single original byte is encoded as a sequence of two characters representing it in hexadecimal. So, if the original byte was 0x0a, the transmitted bytes are 0x30 and 0x41 (‘0’ and ‘a’ in ASCII)."
The buffer received : "1f8b0800000000000000a58e4d0ac2400c85f78277e811f2e665329975bbae500f2022dd2978ff95715ae82cdcf9415efec823c6710247582d5965c32c65aab0f5fc0a5204c415855e7c190ef61b34710bcdc7486d2bab8a7a4910d022d5e107d211ed345f2f37a103da2ddb1f619ab8acefe7fdb1beb6394998c7dfbde3dcac3acf3f399f3eeae152012e010000"
The actual file contains this : "63CD13C1697540000000662534034000030000120011084173878R 00000001000018600050000000100460000009404872101367219 000000000000 DNSO_038114 000000002001160023Replacem000000333168625 N0000 00000000"
The provider sent me the file that contains the string above. I tried to start from the buffer string and reproduce the result the provider sent, but with no luck. I also tried searching for this "asciihex" encoding, with the same result. If someone knows anything about this encoding or can give me any advice, I would really appreciate it. I have pretty much no experience with SOAP services.
Based on the comments above, it's possible the buffer is compressed. It starts with 1F 8B, which is the signature for GZIP compression. See the following list of signatures.
Write the bytes that correspond to the hex strings into a file. Name that file with a gz or tar.gz extension and try to extract it or open it with some file archiver tool.
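For example, here is a minimal Python sketch of those two steps, assuming you have saved the hex buffer from the SOAP response into a local text file (the file name is a placeholder):

import gzip

# "asciihex" decode: two hex characters -> one byte, then gunzip the result,
# since the 1f 8b prefix marks a gzip stream.
buffer_hex = open("buffer.txt").read().strip()       # paste the full buffer string here
raw = bytes.fromhex(buffer_hex)
text = gzip.decompress(raw)
print(text.decode("ascii", errors="replace"))        # should resemble the provider's original file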
Another thing you could try would be to not send the Compress element in your request, assuming it's an optional field and you can do that. If you can, check if the buffer changes and has the proper length and you can see similar patterns as the original content (for those zeros at the end, for example).

Can someone explain the sequence of events that occurs in the encoding/decoding process?

I'm trying to solidify my understanding of encoding and decoding. I'm not sure how the sequence of events works in different settings:
When I type on my computer, is the computer (or whatever program I'm in) automatically decoding my letters in UTF-8 (or whatever encoding is used)?
When I save a file, is it automatically saving it using the encoding standard that was used to decode my text? Let's say I send that document or dataset to someone: am I sending a bunch of 1s and 0s to them? And then their decoder decodes it based on whatever default or encoding standard they specify?
How do code points play into this? Does my computer also have a default code point dictionary it uses?
If the above is true, how do I find out what kind of decoding/encoding my computer/program is using?
Sorry if this isn't clear, or if I'm misunderstanding/using terminology incorrectly.
There are a few ways that this can work, but here is one possibility.
First, yes, in a way, the computer "decodes" each letter you type into some encoding. Each time you press a key on your keyboard, you close a circuit, which signals to other hardware in your computer (e.g., a keyboard controller) that a key was pressed. This hardware then populates a buffer with information about the keyboard event (key up, key down, key repeat) and sends an interrupt to the CPU.
When the CPU receives the interrupt, it jumps to a hardware-defined location in memory and begins executing the code it finds there. This code often will examine which device sent the interrupt and then jump to some other location that has code to handle an interrupt sent by the particular device. This code will then read a "scan code" from the buffer on the device to determine which key event occurred.
The operating system then processes the scan code and delivers it to the application that is waiting for keyboard input. One way it can do this is by populating a buffer with the UTF-8-encoded character that corresponds to the key (or keys) that was pressed. The application would then read the buffer when it receives control back from the operating system.
To answer your second question, we first have to remember what happens as you enter data into your file. As you type, your application receives the letters (perhaps UTF-8-encoded, as mentioned above) corresponding to the keys that you press. Now, your application will need to keep track of which letters it has received so that it can later save the data you've entered to a file. One way that it can do this is by allocating a buffer when the program is started and then copying each character into the buffer as it is received. If the characters are delivered from the OS UTF-8-encoded, then your application could simply copy those bytes to the other buffer. As you continue typing, your buffer will continue to be populated by the characters that are delivered by the OS. When it's time to save your file, your application can ask the OS to write the contents of the buffer to a file or to send them over the network. Device drivers for your disk or network interface know how to send this data to the appropriate hardware device. For example, to write to a disk, you may have to write your data to a buffer, write to a register on the disk controller to signal to write the data in the buffer to the disk, and then repeatedly read from another register on the disk controller to check if the write is complete.
Third, Unicode defines a code point for each character. Each code point can be encoded in more than one way. For example, the code point U+004D ("Latin capital letter M") can be encoded in UTF-8 as 0x4D, in UTF-16 as 0x004D, or in UTF-32 as 0x0000004D (see Table 3-4 in The Unicode Standard). If you have data in memory, then it is encoded using some encoding, and there are libraries available that can convert from one encoding to another.
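As a quick illustration (in Python, purely as an example), the same code point produces different bytes under each encoding, and decoding converts the bytes back:

# Byte values match Table 3-4 of The Unicode Standard.
m = "\u004D"                               # U+004D, LATIN CAPITAL LETTER M
print(m.encode("utf-8"))                   # b'M'                -> 0x4D
print(m.encode("utf-16-be"))               # b'\x00M'            -> 0x004D
print(m.encode("utf-32-be"))               # b'\x00\x00\x00M'    -> 0x0000004D
print(m.encode("utf-8").decode("utf-8"))   # decoding reverses the step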
Finally, you can find out how your computer processes keyboard input by examining the device drivers. You could start by looking at some Linux drivers, as many are open source. Each program, however, can encode and decode data however it chooses to. You would have to examine the code for each individual program to understand how its encoding and decoding works.
It is a complex question, partly because it depends on many things.
When I type on my computer, is the computer (or whatever program I'm in) automatically decoding my letters in UTF-8 (or whatever encoding is used)?
This is very complex. Some programs read the raw keyboard codes directly (e.g. games), but most programs use operating system services to interpret keyboard codes (taking the various keyboard layouts into account, but also modifying the result according to Shift, Control, etc.).
So which encoding you get depends on the operating system and the program. For terminal programs, the locale of the process also includes the encoding of stdin/stdout (standard input and standard output). For graphical interfaces, you may get a different encoding (according to the system encoding).
But UTF-8 is an encoding, so "decoding my letters in UTF-8" uses the word decoding wrongly: the letters are encoded into UTF-8.
When I save a file, is it automatically saving it using the encoding standard that was used to decode my text? Let's say I send that document or dataset to someone: am I sending a bunch of 1s and 0s to them? And then their decoder decodes it based on whatever default or encoding standard they specify?
This is the complex part. Many systems and computer languages are old, so they were designed with just one system encoding, e.g. the C language. So there is not really any decoding: programs use the encoding directly, and they hard-code the fact that the letter A has a specific value. For the computer, only the numeric value matters. Only when data is printed are things interpreted, and in a complex way (fonts, character size, ligatures, line breaks, ...). [And if you use string functions, you explicitly tell the program to treat the numbers as a string of characters.]
Some languages (and HTML: you view a page generated by an external machine, so the system encoding is no longer the same) introduced the decoding part: internally a program has one single way to represent a string (e.g. with Unicode code points). But to get such a uniform format, we need to decode strings (and in exchange we can now handle different encodings and are no longer restricted to the encoding of the system).
If you save a file, it will contain a sequence of bytes. To interpret it (also known as decoding it), you need to know which encoding the file uses. In general you should either know it or be given out-of-band information ("the following file is UTF-8"), e.g. in HTTP headers, in the file extension, in the field definition of a database, etc. Some systems (Microsoft Windows) use a BOM (byte order mark) to distinguish between UTF-16LE, UTF-16BE, UTF-8, and the old system encoding (some people call it ANSI, but it is not ANSI, and it could be any of many different code pages).
The decoder usually has to know the encoding; otherwise it either uses a default or guesses. HTML has a list of steps to perform to get an estimate. The BOM method above can help, and some tools check for common combinations of characters (in various languages). But this is still guesswork: without a BOM or out-of-band data we can only estimate, and we often get it wrong.
How do code points play into this? Does my computer also have a default code point dictionary it uses?
Code points are the basis of Unicode. Every "character" has a code point: a fixed number with a description. This is abstract. In UTF-32 you use that same number for the encoding (as 32-bit integers); for every other encoding there is a function (or a map) from code points to encoded values (and back). A code point is just a numeric value that describes the semantics (the meaning) of a character. To transmit that information, we usually need an encoding (or just an escape sequence; e.g. U+FEFF represents, as text, the BOM character).
If the above is true, how do I find out what kind of decoding/encoding my computer/program is using?
Nobody can answer this in general: your computer uses a lot of encodings.
macOS, Unix, POSIX systems: modern systems (outside the root account) will probably use UTF-8. Root will probably use just ASCII (7-bit).
Windows: internally it often uses UTF-16. For output it depends on the program, but it nearly always uses an 8-bit encoding (so not UTF-16). Windows can read and write several encodings. You can ask the system for the default encoding (but programs can still write UTF-8 or another encoding if they want). The terminal and its settings may give you different default encodings for different programs.
For this reason, if you program on Windows, you should explicitly save files as UTF-8 (my recommendation), possibly with a BOM (but if you need interoperability with non-Windows machines, ignore the BOM; in that case you should already know that such files must be UTF-8).
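As an illustration (a Python sketch; the file name and text are placeholders), writing and reading a file with an explicit UTF-8 encoding and an optional BOM looks like this:

# Write a file explicitly as UTF-8 (with a BOM) and read it back with an
# explicit encoding instead of relying on the system default.
with open("example.txt", "w", encoding="utf-8-sig") as f:   # "utf-8-sig" prepends the BOM
    f.write("Grüße, мир, 世界\n")

with open("example.txt", "r", encoding="utf-8-sig") as f:   # strips the BOM if it is present
    print(f.read())

print(open("example.txt", "rb").read()[:3])                  # b'\xef\xbb\xbf' (the UTF-8 BOM)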

Why does PNG include NLEN (one's complement of LEN)?

In the PNG spec, uncompressed blocks include two pieces of header information:
LEN is the number of data bytes in the block. NLEN is the one's complement of LEN
Why would the file include the one's complement of a value? How would this be used and/or for what purpose?
Rather than inventing a new compression type for PNG, its authors decided to use an existing industry standard: zlib.
The link you provide does not point to the official PNG specifications at http://www.w3.org/TR/PNG/ but only to this part: the DEFLATE compression scheme. NLEN is not mentioned in the official specs; it only says the default compression is done according to zlib (https://www.rfc-editor.org/rfc/rfc1950), and therefore DEFLATE (https://www.rfc-editor.org/rfc/rfc1951).
As to "why": zlib precedes current day high-speed internet connections, and at the time it was invented, private internet communication was still done using audio line modems. Only few institutions could afford dedicated landlines for just data; the rest of the world was connected via dial-up. Due to this, data transmission was highly susceptible to corruption. For simple text documents, a corrupted file might still be usable, but in compressed data literally every single bit counts.
Apart from straight-on data corruption, a dumb (or badly configured) transmission program might try to interpret certain bytes, for instance changing Carriage Return (0x0D) into Newline (0x0A), which was a common option at the time. "One's complement" is the inversion of every single bit for 0 to 1 and the reverse. If either LEN or NLEN happened to be garbled or changed by the transmission software, then its one's complement would not match anymore.
Effectively, the presence of both LEN and NLEN doubles the level of protection against transmission errors: if they do not match, there is an error. It adds another layer of error checking over zlib's ADLER32, and PNGs own per-block checksum.
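As an illustration of how a decoder uses the pair, here is a simplified Python sketch of validating a stored (uncompressed) block per RFC 1951. The function name and byte-level offset handling are simplifications introduced here; a real inflater works at the bit level.

import struct

# After the 3 block-header bits and padding to a byte boundary come LEN and
# NLEN, two little-endian 16-bit fields; NLEN must be the one's complement
# of LEN, otherwise the block is considered corrupted.
def read_stored_block(data, offset):
    length, nlen = struct.unpack_from("<HH", data, offset)
    if length != (~nlen & 0xFFFF):
        raise ValueError("LEN/NLEN mismatch: the stored block is corrupted")
    start = offset + 4
    return data[start:start + length]        # the LEN literal bytes that follow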

Receiving unknown strings lengths?

So I'm converting a Python program I wrote to Erlang, and it's been a long time since I used Erlang, so I guess I've moved back to beginner level. Anyway, in my experience every language I've used when dealing with sockets has send/recv functions that return the length of the data sent/received. Erlang's gen_tcp, however, doesn't seem to do that.
So when I call send/recv or inet:setopts, does it know when the packet has ended? Will I need to write a looping recvAll/sendAll function so I can find the escape character or \n in the packet (string) I wish to receive?
http://erlang.org/doc/man/gen_tcp.html#recv-2
Example code I'm using:
server(LS) ->
    case gen_tcp:accept(LS) of
        {ok,S} ->
            loop(S),
            server(LS);
        Other ->
            io:format("accept returned ~w - goodbye!~n",[Other]),
            ok
    end.

loop(S) ->
    inet:setopts(S,[{active,once}]),
    receive
        {tcp,S,Data} ->
            Answer = process(Data), % Not implemented in this example
            gen_tcp:send(S,Answer),
            loop(S);
        {tcp_closed,S} ->
            io:format("Socket ~w closed [~w]~n",[S,self()]),
            ok
    end.
Just from looking at examples and documentation it seems like Erlang just knows. I want to confirm, because the length of the data being received can be anywhere from 20 bytes to 9216 bytes, and it could be sent in chunks, since the client is a PHP socket library I'm writing.
Thank you,
Ajm.
TL;DR
So when I call send/recv or inet:setopts, does it know when the packet has ended?
No, it doesn't.
Will I need to write a looping recvAll/sendAll function so I can find the escape character or \n in the packet (string) I wish to receive?
Yes, generally, you will. But Erlang can do this job for you.
HOW?
Actually, you can't rely on TCP to split messages into packets for you. In general, TCP will split your stream into arbitrarily sized chunks, and your program has to assemble those chunks and parse the stream on its own. So, first, your protocol must be "self-delimiting". For example you can:
In a binary protocol - precede each packet with its length (a fixed-size field). A protocol frame will then look like this: <<PacketLength:16/big-unsigned-integer, Packet/binary>>
In a text protocol - terminate each line with a line feed symbol.
Erlang can help you with this. Take a look here: http://erlang.org/doc/man/gen_tcp.html#type-option. There is an important option:
{packet, PacketType}(TCP/IP sockets)
Defines the type of packets to use for a socket. The following values are valid:
raw | 0
No packaging is done.
1 | 2 | 4
Packets consist of a header specifying the number of bytes in the packet, followed by that number of bytes. The length of header can be one, two, or four bytes; containing an unsigned integer in big-endian byte order. Each send operation will generate the header, and the header will be stripped off on each receive operation.
In current implementation the 4-byte header is limited to 2Gb.
line
Line mode, a packet is a line terminated with newline, lines longer than the receive buffer are truncated.
The last option (line) is the most interesting for you. If you set this option, Erlang will parse the input stream internally and yield packets split by lines.
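For completeness, here is a sketch of the sending side (written in Python rather than your PHP client, and with placeholder host and port) showing the framing that each option expects:

import socket
import struct

def send_line(sock, text):
    # Matches {packet, line} on the Erlang server: every message ends with '\n'.
    sock.sendall(text.encode("utf-8") + b"\n")

def send_length_prefixed(sock, payload):
    # Matches {packet, 2}: a 2-byte big-endian length header, which gen_tcp
    # generates on send and strips off on receive.
    sock.sendall(struct.pack(">H", len(payload)) + payload)

with socket.create_connection(("127.0.0.1", 5000)) as sock:
    send_line(sock, "hello server")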

tcprewrite - truncated packet error

I use tcprewrite to rewrite my raw pcap files from the CAIDA dataset (change the MAC address and then change the IP). I came across this problem.
The command used is as follows:
sudo tcprewrite --infile=xxx.pcap --dlt=enet --outfile=yyy.pcap --enet-dmac=00:00:00:03 --enet-smac=00:00:00:1f
The error:
pcap was captured using a snaplen of 65000 bytes. This may mean you have truncated packets.
I tried to search for solutions on the web and unfortunately I cannot solve it. According to this thread
http://sourceforge.net/p/tcpreplay/mailman/tcpreplay-users/?viewmonth=201201
the error arises because the packets were not captured in their entirety.
Does anyone have an idea how to solve this problem?
pcap was captured using a snaplen of 65000 bytes. This may mean you have truncated packets.
Or it may not and, in fact, it probably doesn't mean you have truncated packets.
The packet capture mechanism on some systems requires that some maximum packet length be specified, but it can be something sufficiently large that, in practice, no packets will be larger than the maximum and thus no packets will be truncated.
65535 was often used as such a maximum - Wireshark's done so even before it was renamed Wireshark, and tcpdump was first changed so that "-s 0" would use 65535 and then changed to default to 65535. tcpreplay is treating any maximum packet length < 65535 as "not trying to capture the entire packet", but warning about, for example, 65534 is a bit silly.
Tcpdump and Wireshark recently boosted the limit to 262144 to handle some USB captures.
(The limit is chosen not to be too big - the pcap and pcap-ng file formats allow up to 2^32-1, but some software might, when reading such a file, try to allocate a 2^32-1-byte buffer and fail.)
So, don't worry about it, but also don't explicitly specify a snapshot length of 65000 - explicit snapshot lengths are useful only if you want to capture only part of the packet or if you have an old tcpdump (or ANCIENT Ethereal) that defaults to 68 or 96 bytes, and, in the latter case, you might as well go with a value >= 65535.
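If you want to see the snaplen that tcprewrite is complaining about, you can read it straight from the pcap global header; here is a rough Python sketch (the file name is a placeholder):

import struct

# Global header layout: magic, version_major, version_minor, thiszone,
# sigfigs, snaplen, linktype; byte order depends on the magic number.
def read_snaplen(path):
    with open(path, "rb") as f:
        header = f.read(24)
    endian = ">" if header[:4] == b"\xa1\xb2\xc3\xd4" else "<"
    _vmaj, _vmin, _zone, _sigfigs, snaplen, _linktype = struct.unpack(endian + "HHiIII", header[4:])
    return snaplen

print(read_snaplen("xxx.pcap"))    # prints 65000 for the file in the question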