Encoded string decoding - encoding

I got a string in JSON that is definitely encoded. The question is: is there any way to find out which method was used to encode it, and how to make it readable again?
#2aHR0cDovL2RhdGEtaGQuZGF0YWxvY2sucnUvZmkybG0vYS8vNDQwMjg1L2hkX1RoZS5XYWxraW5nLkRlYWQuUzExRTAwLjcyMHAucnVzLkxvc3RGaWxtLlRWLmExLjE2LjExLjIyLm1wNCBvciBodHRwOi8vdGVtcC1jZG4uZGF0YWxvY2sucnUvZmkybG0vYS8vNDQwMjg1LzdmX1RoZS5XYWxraW5nLkRlYWQuUzExRTAwLjcyMHAucnVzLkxvc3RGaWxtLlRWLmExLjE2LjExLj\/\/b2xvbG8=IyLm1wNA==
The whole object looks like:
{"title":"1 \u0441\u0435\u0440\u0438\u044f SD<br>","file":"#2aHR0cDovL3RlbXAtY2RuLmRhdGFsb2NrLnJ1L2ZpMmxtL2UwMTRjZmU2YTMxNWY4Mjg4MmVlMTJiYzk2OTM0NTA5LzdmX\/\/b2xvbG8=1Rlc3QubmEuYmVyZW1lbm5vc3QuczAzLmUwMS5XRUItRExSaXAuMjVLdXptaWNoLmExLjE2LjExLjIyLm1wNA==","subtitle":"","galabel":"34382_880414","id":"1","vars":"880414"}
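Judging only from the two samples above, the value looks like ordinary Base64 with two obfuscation tweaks: a leading "#2" marker and an inserted "//b2xvbG8=" chunk ("b2xvbG8=" is itself Base64 for "ololo"; in the JSON text it appears as "\/\/b2xvbG8=" because JSON allows "/" to be escaped). With both removed, what remains decodes to a plain http:// URL. Below is a minimal Python sketch under that assumption. The names PREFIX, JUNK and decode_file_field are hypothetical, and the scheme is inferred from these samples rather than documented, so other responses from the same site may use different markers.

```python
import base64
import json

# The JSON object from the question, as raw text (\u0441 and \/ are JSON escapes).
RAW = r'''{"title":"1 \u0441\u0435\u0440\u0438\u044f SD<br>","file":"#2aHR0cDovL3RlbXAtY2RuLmRhdGFsb2NrLnJ1L2ZpMmxtL2UwMTRjZmU2YTMxNWY4Mjg4MmVlMTJiYzk2OTM0NTA5LzdmX\/\/b2xvbG8=1Rlc3QubmEuYmVyZW1lbm5vc3QuczAzLmUwMS5XRUItRExSaXAuMjVLdXptaWNoLmExLjE2LjExLjIyLm1wNA==","subtitle":"","galabel":"34382_880414","id":"1","vars":"880414"}'''

# Assumptions inferred from the two samples only, not a documented format:
PREFIX = "#2"        # leading marker, not part of the Base64 payload
JUNK = "//b2xvbG8="  # inserted filler; "b2xvbG8=" is Base64 for "ololo"


def decode_file_field(value: str) -> str:
    """Strip the assumed prefix and filler, then Base64-decode the rest."""
    if value.startswith(PREFIX):
        value = value[len(PREFIX):]
    value = value.replace(JUNK, "")
    return base64.b64decode(value).decode("utf-8")


obj = json.loads(RAW)  # json.loads turns "\/\/" into "//"
print(decode_file_field(obj["file"]))
```

If a sample does not decode cleanly, that is a sign the markers differ from what this sketch assumes.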

Related

How to handle complex transformations in circe decode

Let's say that I have JSON
{"some": "123"}
And I want to create a custom decoder that would decode it to
case class Test(some: Long Refined NonNegative = 0)
So first I would need to decode the JSON value to a String. Then I turn the String into a Long, which may throw an error; I can wrap that in a Try and use Decoder.decodeString.emapTry. But then I have to turn the Long into a Long Refined NonNegative, which can be done via refinedV[NonNegative](x: Long), which returns Either[String, Long Refined NonNegative]. On top of all that, there is the default value I have to handle. It gets very messy.
How can I manage such complex transformations during decoding? Are there more powerful tools than emap and emapTry?

How can I save a string array to PlayerPrefs in Unity?

I have an array and I would like to save it to PlayerPrefs. I heard I can do this:
PlayerPrefs.SetStringArray("title", anArray);
but for some reason it does not work.
Maybe I'm missing an import like using UnityEngine.PlayerPrefs;?
Can someone help me?
Thanks in advance
You can't. PlayerPrefs doesn't support arrays.
But you could join the entries with a special separator, e.g.
PlayerPrefs.SetString("title", string.Join("###", anArray));
and then for reading use
var anArray = PlayerPrefs.GetString("title").Split(new []{"###"}, StringSplitOptions.None);
Or, if you know the content and in particular a character that never occurs in it, you could use a single char instead, e.g. a newline:
PlayerPrefs.SetString("title", string.Join("\n", anArray));
and then for reading use
var anArray = PlayerPrefs.GetString("title").Split('\n');
Yes, as TEEBQNE mentioned, there is PlayerPrefsX.cs, which might be the source of the confusion.
I would NOT recommend it, though! It simply converts all the different input types into byte[] and from there into Base64 strings.
That might be fine for int[], bool[], etc. But for string[] it is wasteful: Base64 produces 4 characters for every 3 bytes, so the stored representation is about a third longer than the strings themselves.
It might be a valid alternative, though, if you cannot rely on the contents of your strings and cannot be sure that your separator sequence never appears in any of them.
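The trade-off between a plain separator and Base64 can be illustrated outside Unity. This is a minimal Python sketch, not C# and not PlayerPrefsX's actual storage format: the sample strings and the Base64-per-item layout (hypothetical helpers encode_items and decode_items) are assumptions chosen to show why Base64 is collision-proof but larger.

```python
import base64

SEP = "###"                          # separator from the answer above
items = ["alpha", "beta", "gamma"]   # hypothetical sample data

# Separator approach: compact, and round-trips as long as no item contains SEP.
joined = SEP.join(items)
assert joined.split(SEP) == items

# It silently breaks when an item contains the separator: 2 items come back as 3.
bad = ["a###b", "c"]
assert SEP.join(bad).split(SEP) == ["a", "b", "c"]


# Base64-per-item approach: safe for any content, because the Base64
# alphabet (A-Z, a-z, 0-9, '+', '/', '=') never contains '#'.
def encode_items(values):
    return SEP.join(base64.b64encode(v.encode("utf-8")).decode("ascii") for v in values)


def decode_items(blob):
    return [base64.b64decode(part).decode("utf-8") for part in blob.split(SEP)]


assert decode_items(encode_items(bad)) == bad

# Cost: Base64 emits 4 characters per 3 input bytes (padded to a multiple of 4).
print(len(joined), len(encode_items(items)))  # 20 30
```

For short strings the padding makes the overhead even larger than the one-third average.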

How can I find which value caused a bson.errors.InvalidStringData

I have a system that reads data from various sources and stores them in MongoDB. The data I receive is already properly encoded in utf-8 or in unicode. Documents are loosely related and vary greatly in schema, if you will.
Every now and then, a document has a field value that is pure binary data, like a JPEG image. I know how to wrap that value in a bson.binary.Binary object to avoid the bson.errors.InvalidStringData exception.
Is there a way to tell which part of a document made the pymongo driver raise a bson.errors.InvalidStringData, or do I have to try to convert each field to find it?
(If by chance a binary value happens to be a valid Unicode or UTF-8 string, it will be stored as a string, and that's OK.)
PyMongo has two BSON implementations, one in Python for portability and one in C for speed. _make_c_string in the Python version will tell you what it failed to encode, but the C version, which is evidently what you're using, does not. You can tell which BSON implementation you have with import bson; bson.has_c(). I've filed PYTHON-533; it'll be fixed soon.
(Answering my own question)
You can't tell from the exception, and some rewrite of the driver would be required to support that feature.
The code is in bson/__init__.py. A function named _make_c_string raises InvalidStringData when converting the string to UTF-8 fails with a UnicodeError. The same function is used for both keys and values that are strings.
In other words, at this point in code, the driver does not know if it is dealing with a key or value.
The offending data is passed as a raw string to the exception's constructor, but for a reason I don't understand, it does not come out of the driver.
>>> bad['zzz'] = '0\x82\x05\x17'
>>> try:
...     db.test.insert(bad)
... except bson.errors.InvalidStringData as isd:
...     print isd
...
strings in documents must be valid UTF-8
But that does not matter: you would have to look up the keys for that value anyway.
The best way is to iterate over the values, trying to decode them in utf-8. If a UnicodeDecodeError is raised, wrap the value in a Binary object.
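Something like this, where doc is the document about to be inserted:
import bson.binary

for key, value in doc.items():
    if not isinstance(value, str):
        continue  # only (Python 2) byte strings need checking here
    try:
        # This code could deal with other encodings, like latin_1,
        # but that's not the point here
        value.decode('utf-8')
    except UnicodeDecodeError:
        doc[key] = bson.binary.Binary(str(value))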
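Following the same idea of checking values one by one, here is a minimal Python 3 sketch that walks a whole document, including nested dicts and lists, and reports the dotted path of every bytes value that is not valid UTF-8, so you can see which field would trip InvalidStringData before inserting. The function name find_invalid_utf8 and the sample document are hypothetical; it assumes raw data arrives as bytes (as Python 2 str values did), uses only the standard library, and checks values only, not keys.

```python
from typing import Any, Iterator


def find_invalid_utf8(value: Any, path: str = "") -> Iterator[str]:
    """Yield dotted paths of bytes values that are not valid UTF-8.

    Recurses into dicts and lists; list items use their index as the path segment.
    """
    if isinstance(value, dict):
        for key, item in value.items():
            child = f"{path}.{key}" if path else str(key)
            yield from find_invalid_utf8(item, child)
    elif isinstance(value, (list, tuple)):
        for i, item in enumerate(value):
            child = f"{path}.{i}" if path else str(i)
            yield from find_invalid_utf8(item, child)
    elif isinstance(value, (bytes, bytearray)):
        try:
            bytes(value).decode("utf-8")
        except UnicodeDecodeError:
            yield path


# Hypothetical document: one nested bad value and one bad list item.
doc = {"title": b"ok", "meta": {"zzz": b"0\x82\x05\x17"}, "thumbs": [b"x", b"\xff\xd8"]}
print(list(find_invalid_utf8(doc)))  # ['meta.zzz', 'thumbs.1']
```

Each reported path can then be wrapped in bson.binary.Binary before the insert.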

Getting UTF-8 Request Parameter Strings in mod_perl2

I'm using mod_perl2 for a website and use CGI::Apache2::Wrapper to get the request parameters for the page (e.g. POST data). I've noticed that the string the $req->param("parameter") function returns is not decoded from UTF-8. If I use the string as-is I can end up with garbled results, so I need to decode it using Encode::decode_utf8(). Is there any way to either get the parameters already decoded from UTF-8 into character strings, or loop through the parameters and safely decode them?
To get the parameters already decoded, we would need to override the behaviour of the underlying class Apache2::Request from libapreq2, thus losing its XS speed advantage. But even that is not straightforwardly possible, as unfortunately we are sabotaged by the CGI::Apache2::Wrapper constructor:
unless (defined $r and ref($r) and ref($r) eq 'Apache2::RequestRec') {
This is wrong OO programming; it should say
… $r->isa('Apache2::RequestRec')
or perhaps forego class names altogether and just test for behaviour (… $r->can('param')).
I say, with those obstacles, it's not worth it. I recommend keeping your existing solution that decodes parameters explicitly. It's clear enough.
To loop over the request parameters, simply call the param method without an argument and you get a list of the names. This is documented; please read the documentation more carefully.

What kind of data type is this?

In a class header I have seen something like this:
enum {
kAudioSessionProperty_PreferredHardwareSampleRate = 'hwsr', // Float64
kAudioSessionProperty_PreferredHardwareIOBufferDuration = 'iobd' // Float32
};
Now I wonder what data type such a kAudioSessionProperty_PreferredHardwareSampleRate actually is.
I mean, this looks like plain old C, but in Objective-C I would write @"hwsr" if I wanted to make it a string.
I want to pass such a "constant" or "enum thing" as an argument to a method.
This is a multi-character constant: the compiler packs the ASCII values of the four characters into a single 32-bit integer, which is used as a UInt32 enum value. This style has been around for a long time in Mac OS headers.
'hwsr' has the same value as if you had written 0x68777372, but is a lot more reader-friendly. If you used the @"hwsr" style instead, you would get an NSString object, which needs more than 4 bytes to represent the same thing.
The advantage of using this style is that you are actually able to quickly identify the content of a raw data stream if you can see the ASCII values of it.
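To make the arithmetic concrete, here is a small Python sketch (hypothetical helper names fourcc and fourcc_str) that packs a four-character code the way GCC and Clang evaluate 'hwsr', with the first character in the most significant byte, and unpacks it again. It assumes that big-endian packing; C itself leaves the value of multi-character constants implementation-defined.

```python
import struct


def fourcc(code: str) -> int:
    """Pack four ASCII characters, first character in the most significant byte."""
    raw = code.encode("ascii")
    if len(raw) != 4:
        raise ValueError("a four-char code needs exactly 4 ASCII characters")
    return int.from_bytes(raw, "big")


def fourcc_str(value: int) -> str:
    """Inverse: unpack a 32-bit value back into its four characters."""
    return value.to_bytes(4, "big").decode("ascii")


print(hex(fourcc("hwsr")))     # 0x68777372
print(fourcc_str(0x696F6264))  # iobd

# Written big-endian, the value shows up in a raw byte dump as readable text:
print(struct.pack(">I", fourcc("hwsr")))  # b'hwsr'
```

On a little-endian machine the same UInt32 sits in memory as "rswh", so the readable-dump advantage applies to data stored big-endian.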