What encoding does payload.encoded retrieve by default in Metasploit?

I am analyzing this Metasploit module, and I am wondering what encoding method payload.encoded uses by default.
I did a print payload.encoded in that exploit (without setting any encoder), and I get a normal string like:
PYIIIIIIIIIIQZVTX30VX4AP0A3HH0A00ABAABTAAQ2AB2BB0BBXP........
The module has an encoder option but it's commented.
I am used to seeing payloads encoded as the standard hex escapes, like:
\xd9\xf7\xbd\x0f\xee\xaa\x47.......
Could someone help me understand where that string returned by payload.encoded comes from and what encoding it uses?

It turns out the first one is an alpha_upper-encoded payload; the second is just binary data displayed as hex escapes.
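The difference is only in how the bytes are presented. Here is a minimal Python sketch (not Metasploit code; the two byte strings are taken from the question) showing why one prints as a readable string and the other as \x.. escapes:

# An alpha_upper-style payload contains only printable ASCII, so printing it
# looks like a normal string; raw shellcode contains non-printable bytes, so
# its repr falls back to \x.. hex escapes.
alpha_payload = b"PYIIIIIIIIIIQZVTX30VX4AP0A3HH0A00ABAABTAAQ2AB2BB0BBXP"
raw_payload = b"\xd9\xf7\xbd\x0f\xee\xaa\x47"
print(alpha_payload.decode("ascii"))                  # readable: PYIIII...
print(raw_payload)                                    # shows b'\xd9\xf7...'
print(all(0x20 <= b < 0x7f for b in alpha_payload))   # True  - all printable
print(all(0x20 <= b < 0x7f for b in raw_payload))     # False - raw binary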


Is it possible to base64 decode part of a base64 encoded message

I am working on a project where I am getting parts of base64 encoded data, but not the whole thing. Is it possible to figure out what that part of the base64 encoded data was?
For example, say I base64 encode hello world.
It becomes aGVsbG8gd29ybGQ=
But say I am only able to capture sbG8gd29y
Which base64 decodes to ݽ
I am familiar with how the base64 encoding process works, and I cannot think of a way to figure out which part of a base64-encoded message I have, other than randomly adding data to the front and back of the chunk and comparing the results with dictionary words. The problem is that I am not even 100% sure that the data I am working with includes dictionary words.
Thanks
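(For reference, the captured chunk is a slice out of the middle of the full base64 string; a quick Python check of the example above:)

import base64
full = base64.b64encode(b"hello world").decode("ascii")
print(full)        # aGVsbG8gd29ybGQ=
print(full[3:12])  # sbG8gd29y - the captured piece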
I just spent a little time using an online converter (http://www.convertstring.com/EncodeDecode/Base64Decode).
If you take your captured section you can run it through the converter and see that it's an invalid length for a base64-encoded string.
For a captured section to have a valid length you will need to add some extra characters (0-3, depending on the length of the section). A valid base64 string has a length that is exactly divisible by 4.
Pick a character ('a' for example) and then run through the possibilities of adding the correct number of characters to the section, front and back. With your added characters the string will be decodable, and one of the decoded values will be more readable; that will be the one that has the partially decoded data.
E.g.:
sbG8gd29yaaa
and
aaasbG8gd29y
decodes to:
����ݽɦ�
and
i¦¬lo wor
You can make a rudimentary programmatic test for readability by counting the number of 'normal' characters within the string (a-z, for example). You will need to make up your own mind about what counts as 'normal'; it will depend on the expected language of the data and the context (whether it is known to be numeric only, for example).
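A rough Python sketch of the whole procedure, using the captured fragment and the 'a' filler from above (the readability score is just one possible definition of 'normal'):

import base64
import string

def candidates(fragment, filler="a"):
    # Pad the fragment to a valid length (a multiple of 4) with filler
    # characters at the front and/or back, and decode every possibility.
    missing = (-len(fragment)) % 4          # 0-3 characters are needed
    results = []
    for front in range(missing + 1):
        padded = filler * front + fragment + filler * (missing - front)
        results.append((padded, base64.b64decode(padded)))
    return results

def readability(decoded):
    # Rudimentary score: the fraction of 'normal' bytes (letters, digits, space).
    normal = (string.ascii_letters + string.digits + " ").encode("ascii")
    return sum(b in normal for b in decoded) / max(len(decoded), 1)

for padded, decoded in candidates("sbG8gd29y"):
    print(padded, round(readability(decoded), 2), decoded)

The candidate with the highest score ("aaasbG8gd29y" here) is the one holding the partially decoded data.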

Polish name (Wężarów) returned from JSON service as W\u0119\u017car\u00f3w, renders as WÄ™Å¼arÃ³w. Can't figure out encoding/charset.

I'm using DB-IP.com to get city names from IP addresses. Many of these are international cities, with special characters in the names.
As an example, one of these cities is Wężarów in Poland. Checking the JSON return in the console or opening the request URL directly, it's being returned from DB-IP as "W\u0119\u017car\u00f3w" with a Content-Type of text/javascript;charset=UTF-8. This is rendered in the browser as WÄ™Å¼arÃ³w; it is also saved in my MySQL database as WÄ™Å¼arÃ³w (which I've tried with both utf8 and latin1 encoding).
I'm ok with saving it in the DB as another format, as long as I can convert it back to Wężarów for display in browser. I've tried encoding and decoding to/from several formats, even just to display directly on the screen (ignoring the DB entirely). I'm completely confused on what I need to do here to get it in readable format.
I'm working with Perl; however, if I can figure out what I need to do with the encoding/decoding/charset (as I'm currently clueless), I'm sure I can figure it out from there.
It looks like the UTF-8 encoded string was interpreted by the browser as if it were Windows-1252. Here's how I deduced it:
% python3
>>> s = "W\u0119\u017car\u00f3w"
>>> b = bytes(s, encoding='utf-8')
>>> b
b'W\xc4\x99\xc5\xbcar\xc3\xb3w'
>>> str(b, encoding='utf-8')
'Wężarów'
>>> str(b, encoding='latin-1')
'WÄ\x99Å¼arÃ³w'
>>> str(b, encoding='windows-1252')
'WÄ™Å¼arÃ³w'
If you're not good with Python, what I'm doing here is encoding the string "W\u0119\u017car\u00f3w" into UTF-8, yielding the byte sequence 'W\xc4\x99\xc5\xbcar\xc3\xb3w'. Decoding that with UTF-8 yielded 'Wężarów', confirming that this is the correct UTF-8 encoding of the string you want. So I took a guess that the browser is using the wrong encoding to render it, and decoded it using Latin-1. That gave me something very close, so I looked up Latin-1 and noticed that it's named as the basis for Windows-1252. Decoding again as Windows-1252 gives the result you saw.
What's gone wrong here is that the browser can't tell what encoding to use to render the page, and it's guessing wrong. You need to fix this by telling it explicitly to use UTF-8. Here's a page by the W3C that describes how to do that. Essentially what you need to do is add an HTML <meta> element (for example, <meta charset="utf-8">) to the document head. If you also set an HTTP header with the encoding name in it, make sure they are consistent.
(In Firefox, while you're debugging, you can go to View -> Character Encoding to set the encoding on a page-by-page basis. I assume other browsers have the same feature.)
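As an aside, the same round trip also shows how to repair a string that has already been mangled: re-encode it with the wrong encoding, then decode it as UTF-8. A minimal Python sketch, assuming the misinterpretation really was Windows-1252:

mangled = "WÄ™Å¼arÃ³w"                       # what the browser displayed
repaired = mangled.encode("windows-1252").decode("utf-8")
print(repaired)                              # Wężarów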

How to auto detect a String encoding?

I have a String which contains a value encoded in some way, perhaps Base64.
The problem is that I don't really know if it's actually Base64 (the characters are A-Z, a-z, 0-9, +, /), so it could be some other encoding I'm not familiar with.
Is there a way, or an online site, where I can submit an encoded input and be told which encoding it uses?
NOTE:
I'm not asking how to know if my String is UTF-8 or iso-8859-1 or something like that.
What I need to know is which encoding my string is encoded in.
EDIT:
To be more clear,
I need something to get an input like: 23Nzi4lUE4qlc+Pmc3blWMS1Irmgo3i8UTQHhoL7VyzqpEV/i9bDhoiteZ0a7/TqcVSkrXR89V2Yj7tEFDGJx4gvWEBs= this is the encoded String that I have.
The output should be the type of the encoding and its decoded value, like:
Base64 -> "Big yellow fish is swimming in the tube."
Maybe there is some program which gets an input and tries to decode it with a list of encoding types (Base64, etc.). The output doesn't really matter, because it's the user's decision whether it's good or not.
This site handles base64 de/encoding.
Since Base64 is just one instance of a class of encoding schemes (specifically, encoding a bit stream as a base-<n> number), you probably will never fare better than testing for just a couple of standard encoding schemes.
You either check the well-formedness of the encoding scheme, or try to decode without getting an error thrown, using a web service or your own code.
In (possibly pathological) cases there will be more than one encoding scheme for which a given octet stream will successfully decode.
Best practice would be to put the effort you would invest in setting up such verification into committing the data provider to one (or a few) encodings up front instead (which won't always be possible, of course).
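A minimal sketch of the "try to decode and see whether an error is thrown" approach, in Python (the list of schemes is illustrative; extend it with whatever encodings are plausible for your data):

import base64
import binascii

# Each decoder raises an error on input that is not well-formed for its scheme.
CHECKS = [
    ("base64", lambda t: base64.b64decode(t, validate=True)),
    ("base32", lambda t: base64.b32decode(t)),
    ("base16/hex", lambda t: base64.b16decode(t)),
]

def guess_encodings(text):
    # Return the schemes under which `text` decodes without an error.
    # More than one may succeed, so these are candidates, not an answer.
    results = {}
    for name, decode in CHECKS:
        try:
            results[name] = decode(text)
        except (binascii.Error, ValueError):
            pass
    return results

print(guess_encodings("aGVsbG8gd29ybGQ="))   # {'base64': b'hello world'}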

Perl encodings question

I need to get a string from <STDIN>, written in a mix of Latin and Russian encodings, and convert it to a URL:
$search_url = "http://searchengine.com/search?text=" . uri_escape($query);
But this process goes bad and produces mojibake (a mixture of weird letters). What can I do in Perl to solve it?
Before you can get started, there are a few things you need to know.
You'll need to know the encoding of your input. "Latin" and "Russian" aren't (character) encodings.
If you're dealing with multiple encodings, you'll need to know what is encoded using which encoding. "It's a mix" isn't good enough.
You'll need to know the encoding the site expects the query to use. This should be the same encoding as the page that contains the search form.
Then, it's just a matter of decoding the input using the correct encoding, and encoding the query using the correct encoding. That's the easy part. Encode provides functions decode and encode to do just that.
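The same two steps sketched in Python for illustration (in Perl the equivalents are Encode's decode/encode plus uri_escape; the koi8-r input encoding and UTF-8 target encoding below are placeholders you must replace with the real ones):

import sys
from urllib.parse import quote

# 1. Decode the raw input bytes using the *input* encoding.
#    koi8-r is only a placeholder; use whatever encoding the input really is.
raw = sys.stdin.buffer.readline().rstrip()
query = raw.decode("koi8-r")

# 2. Percent-encode the query using the encoding the search form expects.
#    UTF-8 is assumed here; it must match the encoding of the site's search page.
search_url = "http://searchengine.com/search?text=" + quote(query, safe="", encoding="utf-8")
print(search_url)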

How to read text file without knowing the encoding

When reading a text file that was created somewhere else, outside my app, the encoding used is unknown. My app has been using NSUnicodeStringEncoding (which is the same as NSUTF16StringEncoding), so it has problems reading files that are not UTF-16 encoded.
Is there a way I can guess the encoding of a file? My priority is to be able to read UTF8 files and then all other files.
Is iterating through the available encodings and checking whether the read string's length is more than zero really a good approach?
Thanks in advance.
Ignacio
Apple's documentation has some guidance on how to proceed: String Programming Guide: Reading data with an unknown encoding:
If you are forced to guess the encoding (and note that in the absence of explicit information, it is a guess):
1. Try stringWithContentsOfFile:usedEncoding:error: or initWithContentsOfFile:usedEncoding:error: (or the URL-based equivalents). These methods try to determine the encoding of the resource, and if successful return by reference the encoding used.
2. If (1) fails, try to read the resource by specifying UTF-8 as the encoding.
3. If (2) fails, try an appropriate legacy encoding. "Appropriate" here depends a bit on circumstances; it might be the default C string encoding, it might be ISO or Windows Latin 1, or something else, depending on where your data is coming from.
If the file is properly constructed, you can read the first four bytes and see whether they are a BOM (Byte Order Mark):
http://en.wikipedia.org/wiki/Byte-order_mark
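Here is the same idea as a Python sketch, outside of Cocoa: check for a BOM first, and if there is none, fall back to the "UTF-8 first, then a legacy encoding" guessing described above (the windows-1252 fallback is only an assumption):

import codecs

# BOM -> encoding; UTF-32 is listed before UTF-16 because its LE BOM
# starts with the same two bytes as the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def read_text(path):
    with open(path, "rb") as f:
        data = f.read()
    # 1. If the file starts with a BOM, the encoding is known.
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    # 2. Otherwise guess: UTF-8 first, then one legacy encoding.
    for encoding in ("utf-8", "windows-1252"):
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("could not determine the encoding of %r" % path)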