Search for string in base64 encoded document?

I have some data that is stored in a base64 stream. I need to search for a certain string, let's say "White Rabbit". Rather than decoding all the records and searching for "White Rabbit", I thought I might be able to encode the string and search for that.
I don't know much about base64 and encoding, but I did notice that ANYTHING I encode has an = or an == at the end of it. There are no equal signs anywhere else in any encoded string.
So, what does this mean for my search? Can I just remove the equal signs?

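The = serves as padding. Base64 turns every 3 bytes of input into 4 characters; when the input length is not a multiple of 3, the output ends with one or two = to pad it out to a multiple of 4. That is why = only ever appears at the end, and input whose length is a multiple of 3 (such as the 12-byte "White Rabbit") gets no padding at all.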
Reference: https://en.wikipedia.org/wiki/Base64#Padding
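As you mentioned, the process would be to encode your search string, remove any = characters, and then perform a normal search for it. Two caveats apply. First, also drop the character just before any =, since it only partly encodes your string and will differ in the stored data. Second, base64 works in 3-byte groups, so the encoded form of "White Rabbit" depends on whether it starts at a byte offset of 0, 1 or 2 (mod 3) in the original record; to find every occurrence, encode the string at all three alignments and search for each result, minus the leading characters that also encode the preceding bytes.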
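Because base64 turns every 3 input bytes into 4 characters, the encoded form of a search string depends on its byte offset modulo 3 in the original data. The Python sketch below uses only the standard-library `base64` module; the helper names `base64_fragments` and `base64_contains` are made up for illustration. It encodes the needle at all three alignments, keeps only the characters that depend on the needle alone, and checks that a hit sits at the right position within a 4-character group. It assumes each record is a single base64 string without line breaks.

```python
import base64


def base64_fragments(needle: bytes) -> list[tuple[str, int]]:
    """For each alignment of `needle` (byte offset mod 3 in the decoded data),
    return the base64 characters determined by `needle` alone and the
    position (mod 4) at which they must start in the encoded stream."""
    fragments = []
    for offset in range(3):
        # Dummy prefix bytes shift the needle to this alignment.
        encoded = base64.b64encode(b"\x00" * offset + needle).decode("ascii")
        start = (8 * offset + 5) // 6            # first char with no prefix bits
        end = (8 * (offset + len(needle))) // 6  # drop the partial last char and '='
        fragments.append((encoded[start:end], start % 4))
    return fragments


def base64_contains(stream: str, needle: bytes) -> bool:
    """Return True if `stream` (one base64 string, no line breaks) very likely
    decodes to data containing `needle`."""
    for fragment, phase in base64_fragments(needle):
        pos = stream.find(fragment)
        while pos != -1:
            if pos % 4 == phase:  # fragment must line up with the 4-char groups
                return True
            pos = stream.find(fragment, pos + 1)
    return False


if __name__ == "__main__":
    record = base64.b64encode(b"Alice followed the White Rabbit").decode("ascii")
    print(base64_contains(record, b"White Rabbit"))
```

A hit is worth confirming by decoding that record, because the few bits at either end of the needle that share a character with neighbouring bytes are not part of any fragment. Needles shorter than about 3 bytes produce empty or one-character fragments and will match almost anything.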

Related

how to fix Base64 mutated encoding?

I've got a string that looks like a Base64-encoded ASCII string:
#2aHR0cDovL2RhdGEwMi1jZG4uZGF0YWxvY2sucnUvZmkybG0vZTAxNGNmZTZhMzE1ZjgyODgyZWUxMmJjOTY5MzQ1MDkvN2ZfVGVzdC5uYS5iZXJlbWVubm9zdC5zMDMuZTAyLldFQi1ETFJpcC4yNUt1em1pY2g\/\/b2xvbG8=uYTEuMTYuMTEuMjIubXA0
If I decode it without any edits, it looks like a mess, but if I remove the 2 chars at the very beginning (#2), it decodes into a mostly correct string:
http://data02-cdn.datalock.ru/fi2lm/e014cfe6a315f82882ee12bc96934509/7f_Test.na.beremennost.s03.e02.WEB-DLRip.25Kuzmich?
but it is still not complete. The URL should look like this:
http://data02-cdn.datalock.ru/fi2lm/f03143c36c778262bd9906da5d545f85/7f_Test.na.beremennost.s03.e02.WEB-DLRip.25Kuzmich.a1.16.11.22.mp4
If I remove some more characters from the start of the string (#2aHR0cDovL2RhdGEwMi1jZG4uZ), I get corrupted text with the correct ending of the decoded URL:
][KLKLMMLMYYLLML
LK\K\[Y[LPQ\R^ZXololo.a1.16.11.22.mp4
Is this a common problem with base64 encoding, or is there some sort of mutation in the encoded string that can be undone?
In my experiments I used base64decode.org.
The slashes should not be backslash-escaped, and there should not be an equals sign in the middle of the stream.
If I remove the backslashed slashes and the equals sign, I get
http://data02-cdn.datalock.ru/fi2lm/e014cfe6a315f82882ee12bc96934509/7f_Test.na.beremennost.s03.e02.WEB-DLRip.25Kuzmich.16.11.22.184
I don't think there's anything "regular" about this corruption. Lucky they didn't inject valid base64 or a lot of spurious junk.
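The clean-up can be checked with a short Python sketch. It assumes the damage is exactly what shows up in the question: a two-character `#2` prefix, JSON-style `\/` escaping, and an injected `//b2xvbG8=` chunk (`b2xvbG8=` is the base64 encoding of `ololo`, the word that surfaces in the corrupted output). Removing the whole injected chunk, rather than only the slashes and the `=`, keeps the 4-character groups aligned. The helper name `repair` is made up for illustration, and other strings from the same source may be mangled differently.

```python
import base64

# The mangled value from the question, copied verbatim (raw string keeps the backslashes).
MANGLED = r"#2aHR0cDovL2RhdGEwMi1jZG4uZGF0YWxvY2sucnUvZmkybG0vZTAxNGNmZTZhMzE1ZjgyODgyZWUxMmJjOTY5MzQ1MDkvN2ZfVGVzdC5uYS5iZXJlbWVubm9zdC5zMDMuZTAyLldFQi1ETFJpcC4yNUt1em1pY2g\/\/b2xvbG8=uYTEuMTYuMTEuMjIubXA0"


def repair(mangled: str, prefix: str = "#2", junk: str = "//b2xvbG8=") -> str:
    """Undo the assumed mangling and decode the URL."""
    s = mangled.removeprefix(prefix)  # drop the 2-character marker (Python 3.9+)
    s = s.replace("\\/", "/")         # undo JSON-style escaping of '/'
    s = s.replace(junk, "", 1)        # cut out "//" + base64("ololo")
    # validate=True rejects any character outside the base64 alphabet.
    return base64.b64decode(s, validate=True).decode("ascii")


if __name__ == "__main__":
    print(repair(MANGLED))
```

Under these assumptions the result ends in `.25Kuzmich.a1.16.11.22.mp4`. The hash segment decodes to the one actually present in the encoded string (`e014cfe6…`), not the `f03143c3…` one in the expected URL.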

Should bcrypted passwords be stored as byte array or as string and why?

I read in some SO answer (which I can't seem to find again) that bcrypted passwords must be stored as a byte array in the database. I'm storing them as a string. Is there any advantage to storing them as a byte array, or is it maybe more secure that way?
Storing a base64 string? That's exactly as secure as raw binary. Treating raw binary as an ASCII string? That might get mangled during reading, writing, string processing, or table export/backup. For example, a zero byte (\0) is the end-of-string marker in many languages.
An encoding doesn't change the security of the system. Use whatever is easier and compatible with your database. As always, the encoding you choose and the database API must be able to handle arbitrary binary data; most character encodings, such as UTF-8 and ASCII, cannot.
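A minimal Python illustration of that point, using made-up bytes rather than a real bcrypt hash and hypothetical helper names: base64 text round-trips arbitrary bytes exactly, while the same bytes contain a NUL and are not valid UTF-8. (bcrypt hashes in the usual `$2b$…` modular-crypt format are already 60 printable ASCII characters, so those can go into a text column as they are.)

```python
import base64

# Arbitrary example bytes (not a real hash): includes a 0x00 and bytes >= 0x80.
RAW = bytes([0x24, 0x00, 0xFF, 0x80, 0x41, 0x7F])


def to_db_text(raw: bytes) -> str:
    """Encode binary for a text column; the result is plain ASCII."""
    return base64.b64encode(raw).decode("ascii")


def from_db_text(text: str) -> bytes:
    """Recover the exact original bytes."""
    return base64.b64decode(text)


def is_valid_utf8(raw: bytes) -> bool:
    """Would these bytes survive being decoded as UTF-8 text?"""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    text = to_db_text(RAW)
    print(text, from_db_text(text) == RAW, is_valid_utf8(RAW))
```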

Is it possible to base64 decode part of a base64 encoded message

I am working on a project where I am getting parts of base64 encoded data, but not the whole thing. Is it possible to figure out what that part of the base64 encoded data was?
For example, say I base64-encode "hello world".
It becomes aGVsbG8gd29ybGQ=
But say I am only able to capture sbG8gd29y
Which base64-decodes to ݽ
I am familiar with how base64 encoding works, and I cannot think of a way to figure out what a part of a base64-encoded message is without adding data randomly to the front and back of the chunk and comparing the results with dictionary words. The problem is that I am not even 100% sure the data I am working with includes dictionary words.
Thanks
I just spent a little time with an online converter (http://www.convertstring.com/EncodeDecode/Base64Decode).
If you run your captured section through the converter, you can see that it's an invalid length for a base64-encoded string.
For a captured section to have a valid length, you need to add some extra characters (0-3, depending on the length of the section). A valid (padded) base64 string has a length that is exactly divisible by 4.
Pick a character ('a' for example) and then try each possible way of adding the right number of characters to the front and back of the section. With the added characters the string becomes decodable, and one of the decoded values will be more readable; that is the one containing the partially decoded data.
E.g.:
sbG8gd29yaaa
and
aaasbG8gd29y
decodes to:
����ݽɦ�
and
i��lo wor
You can make a rudimentary programmatic test for readability by counting the number of 'normal' characters in the string (a-z, for example). You will need to decide for yourself what counts as 'normal'; it depends on the expected language of the data and the context (for example, whether it is known to be numeric only).
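The pad-and-score approach above can be sketched in Python. This version swaps the answer's 'a' filler for 'A', which encodes six zero bits, so filler bytes come out as 0x00 and are easy to spot. The `NORMAL` character set and the helper names are assumptions for illustration; adjust them to the data you expect.

```python
import base64
import string

# What counts as "normal" is an assumption; adjust for the expected data.
NORMAL = set((string.ascii_letters + string.digits + " .,").encode("ascii"))


def candidates(fragment: str, filler: str = "A") -> list[bytes]:
    """Decode `fragment` at each of the 4 possible alignments by padding
    the front with 0-3 filler characters and the back up to a multiple of 4."""
    out = []
    for front in range(4):
        s = filler * front + fragment
        s += filler * (-len(s) % 4)
        out.append(base64.b64decode(s))
    return out


def score(data: bytes) -> int:
    """Count 'normal' bytes; a higher score means more readable."""
    return sum(b in NORMAL for b in data)


def best_guess(fragment: str) -> bytes:
    """Return the most readable of the four candidate decodings."""
    return max(candidates(fragment), key=score)


if __name__ == "__main__":
    print(best_guess("sbG8gd29y"))
```

For the captured `sbG8gd29y`, the best candidate is `b'\x00\x00,lo wor'`: the byte that straddles the filler comes out as `,` (0x2C) instead of `l` (0x6C), because its top two bits lived in the character that was not captured.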

Special chars in base64 decoded string

When I decode a base64 string (using one of the online decoders), the decoded data contains several special chars like square blocks and `"
Base64 encodes binary data as visible characters. If you decode it, the string is turned back into the binary data, where some of the bytes won't have an ASCII/Unicode representation and will show up as squares. This is normal behaviour. You should decode the data in the program you want to use the data in.
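A small Python example of why the squares appear; the PNG file signature is just a convenient, well-known piece of binary. Decoding gives back the exact bytes, and only when those bytes are forced into text does the 0x89 byte turn into a replacement character, which many viewers draw as a box.

```python
import base64

# The first 8 bytes of every PNG file (its signature), base64-encoded.
encoded = "iVBORw0KGgo="

raw = base64.b64decode(encoded)                  # the original binary: keep it as bytes
as_text = raw.decode("utf-8", errors="replace")  # roughly what a text-only viewer shows

print(raw)  # b'\x89PNG\r\n\x1a\n'
print(repr(as_text))
```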

String to Unicode in C#

I want to use Unicode in my code. My Unicode value is 0100, and I am prefixing it with \u. When I use string myVal="\u0100", it works, but when I build it as below, it doesn't: the value looks like "\\u1000". How do I resolve this?
I want to build it as below because the Unicode value may vary.
string uStr = @"\u" + "0100";
There are a couple of problems here. One is that @"\u" is actually the literal string \u, a backslash followed by u (it can also be written as "\\u").
The other issue is that you cannot construct the string the way you describe, because "\u" is not a valid string literal by itself. The compiler expects four hex digits to follow \u (as in "\u0100") so it can determine which character is meant.
You need to keep in mind that strings in .NET are immutable, which means that when you look at what is going on behind the scenes with your concatenated example (`@"\u" + "0100"`), this is what is actually happening:
Create the string "\u"
Create the string "0100"
Create the string "\u0100"
So you have three strings in memory. In order for that to happen all of the strings must be valid.
The first option that comes to mind for handling those values is to actually parse them as integers, and then convert them to characters. Something like:
var unicodeValue = (char)int.Parse("0100",System.Globalization.NumberStyles.AllowHexSpecifier);
Which will give you the single Unicode character value. From there, you can add it to a string, or convert it to a string using ToString().