Inspecting binary over sockets

Inspecting binary over sockets - sockets

I'm using WireShark to inspect data sent/received over a web-socket, however, all I see is nonsense.
0000 1c 74 0d 7d 42 24 d8 5d e2 26 c1 7d 08 00 45 00 .t.}B$.].&.}..E.
0010 00 3c 75 4e 40 00 80 06 22 eb c0 a8 01 c0 4f 89 .<uN#...".....O.
0020 50 91 c4 f1 0f 78 72 e5 d0 f4 ea 5e 6e e2 50 18 P....xr....^n.P.
0030 00 40 91 b3 00 00 c2 8e 6d 06 87 95 7f 76 78 62 .#......m....vxb
0040 92 f9 54 2a 92 f9 b4 95 6c 06 ..T*....l.
I've seen this type of output before. The left is a line of binary, and the right is the decoded string (ASCII), right?
Is this data obfuscated/encrypted?
Is it possible to get cogent information from my socket?
Also, what do the [FIN] and [MASKED] flags mean?

If you copy and paste the data you supplied into a text file and append a line beginning with 0050 with nothing following it, you can then run text2pcap -a infile.txt outfile.pcap to convert the data to a pcap file that Wireshark can read and decode for you.
See the text2pcap man page for more information about this tool.
I have done this and the packet appears to just be a simple TCP segment. There is no [FIN] or [MASKED] flag, only PSH and ACK. For information about these TCP flags, refer to RFC 793, section 3.1, as well as the other RFC's mentioned at the top, which update this one.

Related

Unknown data between h264 NAL units in an AVC file

I want to understand a weird observation I had while working with h264 encoded AVC files. In such files, each NAL unit is preceded by 1/2/4 bytes that encode the size of the NAL unit (without the size header). However, there has been some cases where the end of one NAL unit doesn't take to another NAL unit, instead it takes to a sequence of some data till it eventually reaches another NAL unit
For example, starting at 01ADF399, we have:
*00 00 35 99 41* 9A 12 25 83 A5 F0 7A 08 41 0C 1E 02 50 20 03 80 A4 12 30 B6 44 90
0C E1 CD A2 68 9F 9F 2E C0 2E 1C 18 A2 28 8A 85 65 AC 0B 7D F1 DD 0F ...
Which ends at 01AE2936 as:
21 1A 54 6D FC 34 3B 32 FA AA D6 71 8A BC 92 F9 95 79 75 8A E6 B5 A9 77 24 4A AC
1C E3 EF A2 9D 97 30 51 D1 7B EB 75 FD B2 8D 8A A7 B9 47 8A C6 59 1A 32 FB 9E 77
03 8E CA 67 23 B7 52 EE 2E A4 BA 43 CE F9 CD 46 48 C5 C4 41 35 32 F3 D6 5B CD BE
DA B8 B3 3E 1B 33 87 AE 65 A0 45 74 DF EB 37 96 2F DA 9C ...
Clearly not the start of an NAL unit (since FC doesn't have forbidden zero bit)
However, at 01AE7535, we have the following:
00 00 27 EA 41 9A 14 29 81 29 7C 80 41 04 18 98 44 64 01 C6 54 00 0D 9F 34 58 71
E5 0A A6 CD B0 4B 38 60 7F E6 1F C8 00 24 7A 06 E5 9B 21 99 F0 51 24 9B ...
Which is the start of an NAL unit. I verified that those two NAL units are consecutive since filtering the file to the annex B h264 format removes the unknown data in between and places those two exact NAL units right next to each other.
I tried looking at ISO 14496 part 15 but it doesn't mention anything about this.

Where are files marked as assume-unchanged in Git? [duplicate]

I like to modify config files directly (like .gitignore and .git/config) instead of remembering arbitrary commands, but I don't know where Git stores the file references that get passed to "git update-index --assume-unchanged file".
If you know, please do tell!

It says where in the command - git update-index
So you can't really be editing the index as it is not a text file.
Also, to give more detail on what is stored with the git update-index --assume-unchanged command, see the Using “assume unchanged” bit section in the manual

As others said, it's stored in the index, which is located at .git/index.
After some detective work, I found that it is located at the: assume valid bit of each index entry.
Therefore, before understanding what follows, you should first understand the global format of the index, as explained in my other answer.
Next, I will explain how I verified that the "assume valid" bit is the culprit:
empirically
by reading the source
Empirical
Time to hd it up.
Setup:
git init
echo a > b
git add b
Then:
hd .git/index
Gives:
00000000 44 49 52 43 00 00 00 02 00 00 00 01 54 e9 b6 f3 |DIRC........T...|
00000010 2d 4f e1 2f 54 e9 b6 f3 2d 4f e1 2f 00 00 08 05 |-O./T...-O./....|
00000020 00 de 32 ff 00 00 81 a4 00 00 03 e8 00 00 03 e8 |..2.............|
00000030 00 00 00 00 e6 9d e2 9b b2 d1 d6 43 4b 8b 29 ae |...........CK.).|
00000040 77 5a d8 c2 e4 8c 53 91 00 01 62 00 c9 a2 4b c1 |wZ....S...b...K.|
00000050 23 00 1e 32 53 3c 51 5d d5 cb 1a b4 43 18 ad 8c |#..2S<Q]....C...|
00000060
Now:
git update-index --assume-unchanged b
hd .git/index
Gives:
00000000 44 49 52 43 00 00 00 02 00 00 00 01 54 e9 b6 f3 |DIRC........T...|
00000010 2d 4f e1 2f 54 e9 b6 f3 2d 4f e1 2f 00 00 08 05 |-O./T...-O./....|
00000020 00 de 32 ff 00 00 81 a4 00 00 03 e8 00 00 03 e8 |..2.............|
00000030 00 00 00 00 e6 9d e2 9b b2 d1 d6 43 4b 8b 29 ae |...........CK.).|
00000040 77 5a d8 c2 e4 8c 53 91 80 01 62 00 17 08 a8 58 |wZ....S...b....X|
00000050 f7 c5 b3 e1 7d 47 ac a2 88 d9 66 c7 5c 2f 74 d7 |....}G....f.\/t.|
00000060
By comparing the two indexes, and looking at the global structure of the index, see that the only differences are:
byte number 0x48 (9th on line 40) changed from 00 to 80. That is our flag, the first bit of the cache entry flags.
the 20 bytes from 0x4C to 0x5F. This is expected since that is a SHA-1 over the entire index.
This has also though me that the SHA-1 of the index entry in bytes from 0x34 to 0x47 does not take into account the flags, since it did not changed between both indexes. This is probably why the flags are placed after the SHA, which only considers what comes before it.
Source code
Now let's see if that is coherent with source code of Git 2.3.
First look at the source of update-index, grep assume-unchanged.
This leads to the following line:
{OPTION_SET_INT, 0, "assume-unchanged", &mark_valid_only, NULL,
N_("mark files as \"not changing\""),
PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, MARK_FLAG},
{OPTION_SET_INT, 0, "no-assume-unchanged", &mark_valid_only, NULL,
N_("clear assumed-unchanged bit"),
PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, UNMARK_FLAG},
so the value is stored at mark_valid_only. Grep it, and find that it is only used at one place:
if (mark_valid_only) {
if (mark_ce_flags(path, CE_VALID, mark_valid_only == MARK_FLAG))
die("Unable to mark file %s", path);
return;
}
CE means Cache Entry.
By quickly inspecting mark_ce_flags, we see that:
if (mark)
active_cache[pos]->ce_flags |= flag;
else
active_cache[pos]->ce_flags &= ~flag;
So the function basically sets or unsets the CE_VALID bit, depending on mark_valid_only, which is a tri-state:
mark: --assume-unchanged
unmark: --no-assume-unchanged
do nothing: the default value 0 of the option set at {OPTION_SET_INT, 0
Next, by grepping under builtin/, we see that no other place sets the value of CE_VALID, so --assume-unchanged must be the only command that sets it.
The flag is however used in many places of the source code, which should be expected as it has many side-effects, and it is used every time like:
ce->ce_flags & CE_VALID
so we conclude that it is part of the ce_flags field of struct cache_entry.
The index is specified at cache.h because one of its functions is to be a cache for creating commits faster.
By looking at the definition of CE_VALID under cache.h and surrounding lines we have:
#define CE_STAGEMASK (0x3000)
#define CE_EXTENDED (0x4000)
#define CE_VALID (0x8000)
#define CE_STAGESHIFT 12
So we conclude that it is the very first bit of that integer (0x8000), just next to the CE_EXTENDED, which is coherent with my earlier experiment.

Identifying an unknown data encoding?

I'm trying understand an undocumented API I have discovered, and I can't get past the format of the data that is being returned.
Here is an example of what I get back when I perform a GET on the url I'm looking at:
A+uZL4258wXdnWztlEPJNXtdl3Tu4hRITtW2AUwQHUK5c6BATSBU/XsQEVIttCpI7wrW/oXWiBloT8+cdtUWBag3mzk3cLohKPvi7PWpf7jqCSbjNGh+5Iv5Gb8by2k31kp62sfwZ+i8r/3TA6nGrnJb6edOB7d0c6F34RTFRrrZSeJtiWYXAJ5JeD3yJY+C
At first I thought this was base64 encoded, but that just gives me back gibberish:
echo -n "<above snippet>" | base64 -D
?/???ݝl?C?5Vy??????,?8?s?#M T?{R-?*H?
???ֈhOϜv??7?97p?!(??????? &?4h~???i7?Jz???g輯???Ʈr[??N?ts?w??F??I?m?f?Ix=?%?
When I strip the URL down to just the domain, I get a website with cyrillic text. Maybe the data could be converted to cyrillic somehow?
Does this data format look familiar to you?
I'll continue to keep trying and report back if I make any progress.

This is definitely base64, because of the / and + characters.
When you decode that string using base64, you get this hexdump:
00000000 03 eb 99 2f 8d b9 f3 05 dd 9d 6c ed 94 43 c9 35 |.../......l..C.5|
00000010 7b 5d 97 74 ee e2 14 48 4e d5 b6 01 4c 10 1d 42 |{].t...HN...L..B|
00000020 b9 73 a0 40 4d 20 54 fd 7b 10 11 52 2d b4 2a 48 |.s.#M T.{..R-.*H|
00000030 ef 0a d6 fe 85 d6 88 19 68 4f cf 9c 76 d5 16 05 |........hO..v...|
00000040 a8 37 9b 39 37 70 ba 21 28 fb e2 ec f5 a9 7f b8 |.7.97p.!(.......|
00000050 ea 09 26 e3 34 68 7e e4 8b f9 19 bf 1b cb 69 37 |..&.4h~.......i7|
00000060 d6 4a 7a da c7 f0 67 e8 bc af fd d3 03 a9 c6 ae |.Jz...g.........|
00000070 72 5b e9 e7 4e 07 b7 74 73 a1 77 e1 14 c5 46 ba |r[..N..ts.w...F.|
00000080 d9 49 e2 6d 89 66 17 00 9e 49 78 3d f2 25 8f 82 |.I.m.f...Ix=.%..|
This just looks like 128 bytes of random data. And whenever you call this API URL again, you get a different string, although it starts with the same few characters.
Perhaps you should ask the maintainers of that website how to use their API. Maybe this string is some session ID that you should use in further calls.

What is the size (network footprint) of a socket connection?

Out of curiosity, how much data is actually sent when establishing a connection to a port (using Java sockets). Is it the size of a Socket object? SocketConnection object?

Your understanding of TCP network connections seems to conflate them with electrical circuits. (Understandable, given your background.)
From a physical standpoint, there's no such thing as a connection, only data packets. Through the TCP protocol, two devices agree to establish a logical (that is, software) connection. A connection is established by a client first sending data to the remote host (SYN), the server sending data back to the client (SYN-ACK), and the client sending a final acknowledgement (ACK). All of this negotation necessarily consumes bandwidth, and when you terminate a connection, you must negotiate a completely new connection to begin sending data again.
For example, I'll connect from my machine to a local web server, 192.168.1.2:80.
First, my machine sends a TCP SYN. This sends 66 bytes over the wire: (headers deliniated with |)
0000 00 24 8c a9 4c b4 00 1e 68 66 20 79 08 00|45 00 .$..L... hf y..E.
0010 00 34 53 98 40 00 80 06 00 00 c0 a8 01 0b c0 a8 .4S.#... ........
0020 01 02|36 0a 00 50 09 ef 3a a7 00 00 00 00 80 02 ..6..P.. :.......
0030 20 00 50 c8 00 00 02 04 05 b4 01 03 03 02 01 01 .P..... ........
0040 04 02 ..
The first 14 bytes are the Ethernet frame, specifying that this packet's destination MAC address. This will typically be an upstream router, but in this case, the server happens to be on the same switch, so it's the machine's MAC address, 00:24:8c:a9:4c:b4. The source (my) MAC follows, along with the payload type (IP, 0x0800). The next 20 bytes are the IPv4 headers, followed by 32 bytes of TCP headers.
The server responds with a 62-byte SYN-ACK:
0000 00 1e 68 66 20 79 00 24 8c a9 4c b4 08 00|45 00 ..hf y.$ ..L...E.
0010 00 30 69 b9 40 00 80 06 0d b1 c0 a8 01 02 c0 a8 .0i.#... ........
0020 01 0b|00 50 36 0a d3 ae 9a 73 09 ef 3a a8 70 12 ...P6... .s..:.p.
0030 20 00 f6 9d 00 00 02 04 05 b4 01 01 04 02 ....... ......
Again, 14 bytes of Ethernet headers, 20 bytes of IP headers, and 28 bytes of TCP headers. I send an ACK:
0000 00 24 8c a9 4c b4 00 1e 68 66 20 79 08 00|45 00 .$..L... hf y..E.
0010 00 28 53 9a 40 00 80 06 00 00 c0 a8 01 0b c0 a8 .(S.#... ........
0020 01 02|36 0a 00 50 09 ef 3a a8 d3 ae 9a 74 50 10 ..6..P.. :....tP.
0030 fa f0 83 78 00 00 ...x..
14 + 20 + 20 = 54 bytes over the wire (this is the smallest possible TCP packet size, by the way – the SYN and SYN-ACK packets were larger because they included options).
This adds up to 182 bytes over the wire to establish a connection; now I can begin sending actual data to the server:
0000 00 24 8c a9 4c b4 00 1e 68 66 20 79 08 00 45|00 .$..L... hf y..E.
0010 01 9d 53 9d 40 00 80 06 00 00 c0 a8 01 0b c0 a8 ..S.#... ........
0020 01 02|36 0a 00 50 09 ef 3a a8 d3 ae 9a 74 50 18 ..6..P.. :....tP.
0030 fa f0 84 ed 00 00|47 45 54 20 2f 20 48 54 54 50 ......GE T / HTTP
0040 2f 31 2e 31 0d 0a 48 6f 73 74 3a 20 66 73 0d 0a /1.1..Ho st: fs..
...
14 Ethernet + 20 IP + 20 TCP + data, in this case HTTP.
So we can see that it costs ~182 bytes to establish a TCP connection, and an additional 162-216 bytes to terminate a TCP connection (depending on whether a 4-way FIN ACK FIN ACK or more common 3-way FIN FIN-ACK ACK termination handshake is used), adding up to nearly 400 bytes to "pulse" a connection by disconnecting and reconnecting.
Compared to the 55 bytes you'd use to send one byte of data over an already established connection, this is obviously wasteful.
What you want to do is establish one connection and then send data as-needed. If you're really bandwidth constrained, you could use UDP (which requires no handshaking at all and has an overhead of only 14 Ethernet + 20 IP + 8 UDP bytes per packet), but then you face the problem of using an unreliable transport, and having to handle lost packets on your own.

The minimum size of a TCP packet is 40 bytes. It takes three exchanged packets to create a connection, two from the client and one from the server, and four more to close it, two in each direction. The last packet in a connect exchange can also contain data, as can the first in the close exchange in each direction, which can amortize it a bit, as can combining the outgoing FIN and ACK as per #josh3736's comment below.

Decoding an arbitrary block of NSData?

If I have an arbitrary block of NSData as a hex value, is there a way to determine what the object might have been before it was archived or serialized? I don't mind a few guess and check methods, but I need some pointers in the right direction.
I have an NSData object with some hex in it. What methods of NSData should I look at? Are there other classes to try as well
Don't want to scare people away from answering, but I have a file of game data which was likely encoded using a Cocoa Touch class. The data, when viewed in a hex editor, shows gibberish and a username, which leads me to suspect that it's an archived or encoded object of sorts. I have copied the hex from the hex editor into a sample project which I am using to try and unarchive the data.
I don't believe this is related to the 3d format, the file extension is arbitrary.
Here's the data. I'm hoping it doesn't get lost in translation:
'µköXN[ÎÀü÷h/F9ó9Vìñ°ceE¸z¶=Hmoshbermú«ó¼Ppù#ÝVÔ=4â®L,K;Êç;ASÀ&Ë÷ëÓ%È;Úf¬G}tmQ;µéüø_87´y©ã©!ß¶óQòAçÛl©âSG4S½3ýJ×ª9Ã¤ô¡wxiD²M¼ÏB]39øþ:óñ7ª¾÷èº£È3Ï¢ÍEFÍ¢ª»r]BmÁ'Ò+åygÞÅQ?luó>÷ú¼è6¸|}[¼[¶Ñ¦g!\OÎÒJSE..pSß&_ÈEäø)6òëó¨¼2¶ð°æà`ï7Ë=Ã¥:cÆ§=L4qG-"µ(ÐÝïß ÓãXkÀ4fzæ·p\ññT<tu¥Æ©;Ìn4£³Ï¢ÌFåG´
And the corresponding hex:
27 B5 6B F6 01 00 00 00 58 4E 5B CE C0 FC F7 68 2F 46 86 87 83 39 F3 39 9E 56 EC F1 B0 63 9E 65 45 B8 7A B6 3D 07 99 48 6D 6F 73 68 62 65 72 6D 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 90 86 FA 03 0E AB F3 BC 0B 50 70 F9 23 DD 87 56 03 D4 3D 34 90 E2 AE 4C 2C 94 9E 8E 15 4B 0C 83 8C 3B 03 CA E7 3B 1B 41 53 C0 26 04 CB F7 EB D3 25 C8 3B DA 66 8A AC 47 7D 8A 7F 74 6D 51 3B B5 19 E9 FC F8 5F 38 37 B4 11 0C 79 A9 12 E3 A9 21 DF B6 F3 51 F2 41 E7 DB 85 02 9F 6C A9 E2 53 47 1F 34 86 53 BD 33 FD 4A D7 AA 39 C3 A4 F4 A1 77 78 69 44 B2 4D BC CF 42 5D 33 39 F8 FE 97 3A 81 F3 F1 10 37 AA BE 86 91 F7 1F E8 83 BA A3 C8 33 CF 1D A2 CD 45 7F 46 1F CD A2 AA BB 1A 72 5D 42 02 6D C1 0F 27 D2 2B E5 0B 79 67 DE C5 1A 51 3F 14 6C 75 F3 3E F7 FA BC E8 36 8E B8 7C 02 1C 7D 01 00 92 8C 19 5B BC 5B B6 D1 A6 67 7F 21 5C 84 13 4F CE 0C D2 4A 53 19 82 45 1B 2E 2E 96 70 53 DF 26 5F C8 1C 45 8F E4 F8 29 36 F2 EB 9D 95 F3 A8 BC 32 B6 F0 B0 E6 91 98 1A E0 99 60 EF 37 CB 3D C3 A5 3A 63 0C C6 A7 3D 4C 34 71 47 2D 22 B5 28 D0 DD EF DF 09 D3 E3 58 6B C0 17 34 66 7A E6 B7 70 5C F1 F1 54 3C 74 94 75 A5 C6 15 A9 9E 14 3B CC 15 10 83 6E 34 A3 B3 CF 0F A2 9C CC 8E 46 8C E5 00 00 47 B4 17 05 00 00 00 00
If anyone cares to help figure this out it would be much appreciated.

If I have an arbitrary block of NSData as a hex value, is there a way to determine what the object might have been before it was archived or serialized?
Not really. That's about as 'trivial' as reading arbitrary files correctly without the use of a UTI, extension, MIME type. Of course, your program would also need to support reading of all those files/formats.
I don't mind a few guess and check methods, but I need some pointers in the right direction.
You need to narrow your problem/inputs down, if you don't want an impossibly difficult task.
I have an NSData object with some hex in it. What methods of NSData should I look at?
It's just a data blob of length bytes. It could represent anything -- if you don't know where it came from.
Are there other classes to try as well?
Perhaps you would start by saving all your data via NSCoder or another serializer/archiver which offers some introspection and support for you to enter your own information (which would be comparable to a UTI or MIME type).
Edit:
Don't want to scare people away from answering, but I have a file of game data which was likely encoded using a Cocoa Touch class. The data, when viewed in a hex editor, shows gibberish and a username, which leads me to suspect that it's an archived or encoded object of sorts. I have copied the hex from the hex editor into a sample project which I am using to try and unarchive the data.
Using these APIs, the data may be represented multiple ways. You're probably facing something within the domain of 1) a proprietary file format through 2) a keyed archive.
The latter is easier for nontrivial data representations. You would need to define any objc classes you do not have available when unarchived. In that case, a few sample representations would offer a rough outline of the data structures you will need (under conventional implementations). It could also be an archive similar to an NSDictionary, if the unarchiver is capable of opening it. This is a problem which is easier than with other langs, since archiving often falls back on keys and values mapped to members in Cocoa.
Edit2:
The file came from the Draw Something directory. It's called gamedata.i3d
(shrug)

Try using NSKeyedUnarchiver to read it. It's not uncommon to use just the standard Foundation containers like NSArray, NSDictionary, and NSString to store data, so you might get lucky. That obviously won't work if custom classes were involved, but it might be worth 15 minutes of your time to try it.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse