Identifying an unknown data encoding? - encoding

I'm trying understand an undocumented API I have discovered, and I can't get past the format of the data that is being returned.
Here is an example of what I get back when I perform a GET on the url I'm looking at:
A+uZL4258wXdnWztlEPJNXtdl3Tu4hRITtW2AUwQHUK5c6BATSBU/XsQEVIttCpI7wrW/oXWiBloT8+cdtUWBag3mzk3cLohKPvi7PWpf7jqCSbjNGh+5Iv5Gb8by2k31kp62sfwZ+i8r/3TA6nGrnJb6edOB7d0c6F34RTFRrrZSeJtiWYXAJ5JeD3yJY+C
At first I thought this was base64 encoded, but that just gives me back gibberish:
echo -n "<above snippet>" | base64 -D
?/???ݝl?C?5Vy??????,?8?s?#M T?{R-?*H?
???ֈhOϜv??7?97p?!(??????? &?4h~???i7?Jz???g輯???Ʈr[??N?ts?w??F??I?m?f?Ix=?%?
When I strip the URL down to just the domain, I get a website with cyrillic text. Maybe the data could be converted to cyrillic somehow?
Does this data format look familiar to you?
I'll continue to keep trying and report back if I make any progress.

This is definitely base64, because of the / and + characters.
When you decode that string using base64, you get this hexdump:
00000000 03 eb 99 2f 8d b9 f3 05 dd 9d 6c ed 94 43 c9 35 |.../......l..C.5|
00000010 7b 5d 97 74 ee e2 14 48 4e d5 b6 01 4c 10 1d 42 |{].t...HN...L..B|
00000020 b9 73 a0 40 4d 20 54 fd 7b 10 11 52 2d b4 2a 48 |.s.#M T.{..R-.*H|
00000030 ef 0a d6 fe 85 d6 88 19 68 4f cf 9c 76 d5 16 05 |........hO..v...|
00000040 a8 37 9b 39 37 70 ba 21 28 fb e2 ec f5 a9 7f b8 |.7.97p.!(.......|
00000050 ea 09 26 e3 34 68 7e e4 8b f9 19 bf 1b cb 69 37 |..&.4h~.......i7|
00000060 d6 4a 7a da c7 f0 67 e8 bc af fd d3 03 a9 c6 ae |.Jz...g.........|
00000070 72 5b e9 e7 4e 07 b7 74 73 a1 77 e1 14 c5 46 ba |r[..N..ts.w...F.|
00000080 d9 49 e2 6d 89 66 17 00 9e 49 78 3d f2 25 8f 82 |.I.m.f...Ix=.%..|
This just looks like 128 bytes of random data. And whenever you call this API URL again, you get a different string, although it starts with the same few characters.
Perhaps you should ask the maintainers of that website how to use their API. Maybe this string is some session ID that you should use in further calls.

Related

Unknown data between h264 NAL units in an AVC file

I want to understand a weird observation I had while working with h264 encoded AVC files. In such files, each NAL unit is preceded by 1/2/4 bytes that encode the size of the NAL unit (without the size header). However, there has been some cases where the end of one NAL unit doesn't take to another NAL unit, instead it takes to a sequence of some data till it eventually reaches another NAL unit
For example, starting at 01ADF399, we have:
*00 00 35 99 41* 9A 12 25 83 A5 F0 7A 08 41 0C 1E 02 50 20 03 80 A4 12 30 B6 44 90
0C E1 CD A2 68 9F 9F 2E C0 2E 1C 18 A2 28 8A 85 65 AC 0B 7D F1 DD 0F ...
Which ends at 01AE2936 as:
21 1A 54 6D FC 34 3B 32 FA AA D6 71 8A BC 92 F9 95 79 75 8A E6 B5 A9 77 24 4A AC
1C E3 EF A2 9D 97 30 51 D1 7B EB 75 FD B2 8D 8A A7 B9 47 8A C6 59 1A 32 FB 9E 77
03 8E CA 67 23 B7 52 EE 2E A4 BA 43 CE F9 CD 46 48 C5 C4 41 35 32 F3 D6 5B CD BE
DA B8 B3 3E 1B 33 87 AE 65 A0 45 74 DF EB 37 96 2F DA 9C ...
Clearly not the start of an NAL unit (since FC doesn't have forbidden zero bit)
However, at 01AE7535, we have the following:
00 00 27 EA 41 9A 14 29 81 29 7C 80 41 04 18 98 44 64 01 C6 54 00 0D 9F 34 58 71
E5 0A A6 CD B0 4B 38 60 7F E6 1F C8 00 24 7A 06 E5 9B 21 99 F0 51 24 9B ...
Which is the start of an NAL unit. I verified that those two NAL units are consecutive since filtering the file to the annex B h264 format removes the unknown data in between and places those two exact NAL units right next to each other.
I tried looking at ISO 14496 part 15 but it doesn't mention anything about this.

RSA Private Key PEM in ASN.1 Format Contains Extra Bytes

I'm looking at an RSA private key in PEM format. When I decode the base64 string and review the components of the key, some of them have an extra byte, specifically the modulus, P, DP, and IQ. They all have a leading 0x00 byte. I'm handling this by just trimming the byte[] down to the expected lengths of 256 or 128 so I can use them with .NET RSAParameters and RSACryptoServiceProvider, but wondering why some of these INTEGER structures have the extra byte while others don't. It would appear that online and other libraries that decode the PEM to XML, etc handle this gracefully, so is this part of the RFC, just something that you have to protect against, or only a concern because I'm using the .NET libraries after decoding? Here's an example of the modulus with 257 bytes:
00 B7 55 AA 3F 14 89 BC CE ED AF 80 1C 54 2A DF
AB 3C 6A 44 B4 55 58 90 0E 0D 32 96 E6 EF 35 2D
AD B7 44 A7 AB CE 6F D3 BB 9D B4 4B FD 0A DE 87
96 03 55 23 81 49 FE 1B 3E CE 62 B6 2F B1 4C 33
E4 F8 C2 09 5F 0E 10 78 22 D0 F3 C9 BF B9 AC AC
11 00 17 28 09 23 10 D5 8A C9 2B E2 86 96 A7 E2
57 68 D7 3B 63 BE 74 ED B8 02 E2 63 EF F5 40 85
0C A6 9F D0 B6 88 36 8B 4E 6B 35 27 BE 11 CC C8
C3 0A 66 25 E0 AB B6 DD 6D E6 2B AF 9E 1C D7 11
CE 5F E7 C8 1F EB 3D 79 B3 B2 E1 FF D8 20 6D 76
A2 43 9E 20 67 58 97 39 46 D8 73 F6 F0 76 01 E0
61 8E 4A EE C4 03 A6 44 C7 D3 50 E3 C8 62 CF 33
D1 37 6B 85 F5 D4 3C 6D 1F 1A 14 B3 30 B5 E0 82
A5 94 83 4F 7A 17 DA 86 2B F7 2A 47 A3 5F D2 D5
7B 96 32 86 27 5E 2A 6A 85 6E C6 24 15 A9 09 65
BB 04 8C 0D 39 F7 15 D4 F0 F8 5F 0F B0 1D A7 2F
D7
Here's an example of the "D" parameter that is not padded with a leading 0x00:
04 07 EF 8A 5D 88 3D C7 8B 00 5D DF C1 96 03 BE
FF 20 1D 0C A8 07 BF 7B 1F 9D 2A 26 3F C2 3A 93
E4 40 B5 33 18 E1 EA 94 E8 7D C0 61 EF F8 3E A0
F4 C7 CD 75 0D 4C 72 0A EA 7C CF 26 B3 4E 4A A1
D1 3A 6A FA 55 11 D5 A2 66 57 C5 EA DA 49 4A AB
41 06 41 52 1A 1C 47 A5 BA 90 A5 75 72 20 94 E0
79 24 AA 60 A2 12 6E 1B AA AC 91 A7 F8 0B 88 21
64 14 85 81 4D F3 6D 12 B7 56 BE DD F6 04 3B B1
CC 95 A6 8C 9D A6 8D BF 05 C1 72 A4 0B 03 75 F6
40 B6 8E 25 91 3D 87 84 CD 23 EF 2C 29 13 DD A7
75 6E 48 F4 DE 49 98 4F B7 09 CF 5A A3 F5 39 05
37 C8 2B 79 64 F0 B8 AD 11 EF 79 FD 78 C0 6B 2B
50 7F DE BC 59 3D D1 A1 90 59 B7 7E 57 B4 2C A0
D2 20 D2 D6 7C 4A B3 3C 63 5D FA E6 67 18 58 AC
F3 EF 0E E1 C0 C9 B6 D9 8C D1 8E 3D CE 8A FF F0
12 BF C2 FE 72 DC 07 E4 3C 00 5B BE 05 D9 5A 61
And the DP parameter without leading 0:
3E 50 B2 28 A3 B1 71 F3 D5 31 B1 2D FD B3 60 4B
57 F8 C1 46 C7 89 B7 95 F4 7D AE 54 F2 EA 11 98
F7 61 93 30 50 D9 24 19 BF 7F 06 19 DB 97 01 06
8B 20 D7 7A 5E 1A FA 76 9A 0E 27 46 AB FF 25 3C
74 61 E2 9B 3E CE A5 F9 58 40 70 15 94 F2 58 3E
DB E4 90 91 3C 50 B0 24 8F C7 A7 55 EB E3 59 A7
5D 01 19 29 4F F9 F9 E6 EB 78 D1 93 14 61 E4 5C
36 D7 E7 82 58 E7 C5 60 21 F3 1E 5A D4 49 C6 D1
The RSA modulus is a positive number, and ASN.1 integers are all signed.
So if the leading 0x00 wasn't there, this byte encoding would represent a negative number because the first byte would have the high bit set (0xB7 >= 0x80). As a consequence, the 0x00 gets inserted into the DER data stream.
.NET's representation is based on the Windows CAPI representation. CAPI uses domain knowledge to know that the values are all unsigned integers, and then omits the leading 0x00 bytes. So it's up to whoever translates between the DER data and the .NET/CAPI data to add or remove the bytes as needed.
These values are encoded as INTEGER ASN type which uses two-complement notation. That is if most significant bit is set to 1, then the number is negative. However, all numbers (modulus, exponents, primes) in key are positive numbers and prepended with extra leading zero octet to denote positive integer. If the most significant bit is already set to 0, then no extra bytes are added.

How is this data encoded?

I've found several website exchange data with the browser through a seemingly similar encoding consisting of [A-z0-9/+] such as
11UmFuZG9tSVYkc2RlIyh9YWJdm48m5cJDWuLLIYaigN6cCfiKOtB/Xb00yCTJRLU1iZilyDQAF8okuqrX2gHm2pVsKeYHWxDNNLB9teqqv2msRNJADEmHqHJWhIpgq6kB4LRtzrU7x1ms9VfChurilER4JfNiBiNqsetbxxPaDn3+SZFhbqDh1V6ufUeNbri49YKKra3WOe4=
or
11UmFuZG9tSVYkc2RlIyh9YWJdm48m5cJDWuLLIYaigN6cCfiKOtB/Xb00yCTJRLU1iZilyDQAF8okuqrX2gHm2pVsKeYHWxDN9IdK3oj1qFvfsjLKRaHSu60i6RSRagBfu5CtrYWAiHIZXA5wKwOGZwh5lUwHApYhPZzbdpfT6ubbVr5WFlrO5VHDBw9bFi1hduxQaia2i8k=
or
11UmFuZG9tSVYkc2RlIyh9YUbM7JQKVn66Jan93YWDhnDQ1nd8g669ScUfvmjw724qB521xitv+brNAzNRj46wgJBqF3rIKqgVRRsk8d9ccybqIhLm0E76CUjS/XnuWXzxZM+wTAx8lSbkiqHWg7hcwb7Fr8bMqW0d+1gZqUJto3zdImOPwDTNRFmEmRhWnOEAV86C5yFfCyw=
It appears to be more than a simple ID - rather encoded data of some sort.
I've tried Base 64 - the second half (following the /) appears to be a valid base64 encoding of non Text data.
The hex values of the example strings (base64 decoded after the /) are:
5d bd 34 c8 24 c9 44 b5 35 89 98 a5 c8 34 00 17 ca 24 ba aa d7 da 01 e6 da 95 6c 29 e6 07 5b 10 cd 34 b0 7d b5 ea aa bf 69 ac 44 d2 40 0c 49 87 a8 72 56 84 8a 60 ab a9 01 e0 b4 6d ce b5 3b c7 59 ac f5 57 c2 86 ea e2 94 44 78 25 f3 62 06 23 6a b1 eb 5b c7 13 da 0e 7d fe 49 91 61 6e a0 e1 d5 5e ae 7d 47 8d 6e b8 b8 f5 82 8a ad ad d6 39 ee
5d bd 34 c8 24 c9 44 b5 35 89 98 a5 c8 34 00 17 ca 24 ba aa d7 da 01 e6 da 95 6c 29 e6 07 5b 10 cd f4 87 4a de 88 f5 a8 5b df b2 32 ca 45 a1 d2 bb ad 22 e9 14 91 6a 00 5f bb 90 ad ad 85 80 88 72 19 5c 0e 70 2b 03 86 67 08 79 95 4c 07 02 96 21 3d 9c db 76 97 d3 ea e6 db 56 be 56 16 5a ce e5 51 c3 07 0f 5b 16 2d 61 76 ec 50 6a 26 b6 8b c9
5e 7b 96 5f 3c 59 33 ec 13 03 1f 25 49 b9 22 a8 75 a0 ee 17 30 6f b1 6b f1 b3 2a 5b 47 7e d6 06 6a 50 9b 68 df 37 48 98 e3 f0 0d 33 51 16 61 26 46 15 a7 38 40 15 f3 a0 b9 c8 57 c2 cb

Decoding an arbitrary block of NSData?

If I have an arbitrary block of NSData as a hex value, is there a way to determine what the object might have been before it was archived or serialized? I don't mind a few guess and check methods, but I need some pointers in the right direction.
I have an NSData object with some hex in it. What methods of NSData should I look at? Are there other classes to try as well
Don't want to scare people away from answering, but I have a file of game data which was likely encoded using a Cocoa Touch class. The data, when viewed in a hex editor, shows gibberish and a username, which leads me to suspect that it's an archived or encoded object of sorts. I have copied the hex from the hex editor into a sample project which I am using to try and unarchive the data.
I don't believe this is related to the 3d format, the file extension is arbitrary.
Here's the data. I'm hoping it doesn't get lost in translation:
'µköXN[ÎÀü÷h/F9ó9Vìñ°ceE¸z¶=Hmoshbermú«ó¼Ppù#ÝVÔ=4â®L,K;Êç;ASÀ&Ë÷ëÓ%È;Úf¬G}tmQ;µéüø_87´y©ã©!߶óQòAçÛl©âSG4S½3ýJת9äô¡wxiD²M¼ÏB]39øþ:óñ7ª¾÷躣È3Ï¢ÍEFÍ¢ª»r]BmÁ'Ò+åygÞÅQ?luó>÷ú¼è6¸|}[¼[¶Ñ¦g!\OÎÒJSE..pSß&_ÈEäø)6òëó¨¼2¶ð°æà`ï7Ë=Ã¥:cƧ=L4qG-"µ(ÐÝïß ÓãXkÀ4fzæ·p\ññT<tu¥Æ©;Ìn4£³Ï¢ÌFåG´
And the corresponding hex:
27 B5 6B F6 01 00 00 00 58 4E 5B CE C0 FC F7 68 2F 46 86 87 83 39 F3 39 9E 56 EC F1 B0 63 9E 65 45 B8 7A B6 3D 07 99 48 6D 6F 73 68 62 65 72 6D 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 90 86 FA 03 0E AB F3 BC 0B 50 70 F9 23 DD 87 56 03 D4 3D 34 90 E2 AE 4C 2C 94 9E 8E 15 4B 0C 83 8C 3B 03 CA E7 3B 1B 41 53 C0 26 04 CB F7 EB D3 25 C8 3B DA 66 8A AC 47 7D 8A 7F 74 6D 51 3B B5 19 E9 FC F8 5F 38 37 B4 11 0C 79 A9 12 E3 A9 21 DF B6 F3 51 F2 41 E7 DB 85 02 9F 6C A9 E2 53 47 1F 34 86 53 BD 33 FD 4A D7 AA 39 C3 A4 F4 A1 77 78 69 44 B2 4D BC CF 42 5D 33 39 F8 FE 97 3A 81 F3 F1 10 37 AA BE 86 91 F7 1F E8 83 BA A3 C8 33 CF 1D A2 CD 45 7F 46 1F CD A2 AA BB 1A 72 5D 42 02 6D C1 0F 27 D2 2B E5 0B 79 67 DE C5 1A 51 3F 14 6C 75 F3 3E F7 FA BC E8 36 8E B8 7C 02 1C 7D 01 00 92 8C 19 5B BC 5B B6 D1 A6 67 7F 21 5C 84 13 4F CE 0C D2 4A 53 19 82 45 1B 2E 2E 96 70 53 DF 26 5F C8 1C 45 8F E4 F8 29 36 F2 EB 9D 95 F3 A8 BC 32 B6 F0 B0 E6 91 98 1A E0 99 60 EF 37 CB 3D C3 A5 3A 63 0C C6 A7 3D 4C 34 71 47 2D 22 B5 28 D0 DD EF DF 09 D3 E3 58 6B C0 17 34 66 7A E6 B7 70 5C F1 F1 54 3C 74 94 75 A5 C6 15 A9 9E 14 3B CC 15 10 83 6E 34 A3 B3 CF 0F A2 9C CC 8E 46 8C E5 00 00 47 B4 17 05 00 00 00 00
If anyone cares to help figure this out it would be much appreciated.
If I have an arbitrary block of NSData as a hex value, is there a way to determine what the object might have been before it was archived or serialized?
Not really. That's about as 'trivial' as reading arbitrary files correctly without the use of a UTI, extension, MIME type. Of course, your program would also need to support reading of all those files/formats.
I don't mind a few guess and check methods, but I need some pointers in the right direction.
You need to narrow your problem/inputs down, if you don't want an impossibly difficult task.
I have an NSData object with some hex in it. What methods of NSData should I look at?
It's just a data blob of length bytes. It could represent anything -- if you don't know where it came from.
Are there other classes to try as well?
Perhaps you would start by saving all your data via NSCoder or another serializer/archiver which offers some introspection and support for you to enter your own information (which would be comparable to a UTI or MIME type).
Edit:
Don't want to scare people away from answering, but I have a file of game data which was likely encoded using a Cocoa Touch class. The data, when viewed in a hex editor, shows gibberish and a username, which leads me to suspect that it's an archived or encoded object of sorts. I have copied the hex from the hex editor into a sample project which I am using to try and unarchive the data.
Using these APIs, the data may be represented multiple ways. You're probably facing something within the domain of 1) a proprietary file format through 2) a keyed archive.
The latter is easier for nontrivial data representations. You would need to define any objc classes you do not have available when unarchived. In that case, a few sample representations would offer a rough outline of the data structures you will need (under conventional implementations). It could also be an archive similar to an NSDictionary, if the unarchiver is capable of opening it. This is a problem which is easier than with other langs, since archiving often falls back on keys and values mapped to members in Cocoa.
Edit2:
The file came from the Draw Something directory. It's called gamedata.i3d
(shrug)
Try using NSKeyedUnarchiver to read it. It's not uncommon to use just the standard Foundation containers like NSArray, NSDictionary, and NSString to store data, so you might get lucky. That obviously won't work if custom classes were involved, but it might be worth 15 minutes of your time to try it.

S-Box used in AES 128 bit CFB mode of encryption

Can anybody give me the S-Box used for 128 bit AES CFB mode of encryption.
Will this S-Box be same for every 128 -bit AES CFB Implementation.
Thanks
Here it is:
| 0 1 2 3 4 5 6 7 8 9 a b c d e f
---|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|
00 |63 7c 77 7b f2 6b 6f c5 30 01 67 2b fe d7 ab 76
10 |ca 82 c9 7d fa 59 47 f0 ad d4 a2 af 9c a4 72 c0
20 |b7 fd 93 26 36 3f f7 cc 34 a5 e5 f1 71 d8 31 15
30 |04 c7 23 c3 18 96 05 9a 07 12 80 e2 eb 27 b2 75
40 |09 83 2c 1a 1b 6e 5a a0 52 3b d6 b3 29 e3 2f 84
50 |53 d1 00 ed 20 fc b1 5b 6a cb be 39 4a 4c 58 cf
60 |d0 ef aa fb 43 4d 33 85 45 f9 02 7f 50 3c 9f a8
70 |51 a3 40 8f 92 9d 38 f5 bc b6 da 21 10 ff f3 d2
80 |cd 0c 13 ec 5f 97 44 17 c4 a7 7e 3d 64 5d 19 73
90 |60 81 4f dc 22 2a 90 88 46 ee b8 14 de 5e 0b db
a0 |e0 32 3a 0a 49 06 24 5c c2 d3 ac 62 91 95 e4 79
b0 |e7 c8 37 6d 8d d5 4e a9 6c 56 f4 ea 65 7a ae 08
c0 |ba 78 25 2e 1c a6 b4 c6 e8 dd 74 1f 4b bd 8b 8a
d0 |70 3e b5 66 48 03 f6 0e 61 35 57 b9 86 c1 1d 9e
e0 |e1 f8 98 11 69 d9 8e 94 9b 1e 87 e9 ce 55 28 df
f0 |8c a1 89 0d bf e6 42 68 41 99 2d 0f b0 54 bb 16
It can also be found in FIPS Pub 197, the official standard.
And yes, it is exactly the same for every implementation of AES. Otherwise you wouldn't be able to encrypt something others could decrypt or vice-versa.