UTF8 Encoding problem - in Java and VC interaction

I have a Java application that opens a socket connection to a VC application and sends a string over it. The string is UTF-8 encoded before sending.
Due to security requirements, I need to encrypt part of the string; the encryption produces some characters whose codes fall outside the 7-bit ASCII range (128 and above).
When I send such characters over the socket connection and convert them back to ANSI, they are changed to ? in the receiving VC application.
I am not able to find out what may be causing this; please help.
Regards
Nitin
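
The question does not show how the encrypted part is placed into the string, so the following is only a sketch of one common pitfall and one common workaround, not a diagnosis. It assumes the cipher output is raw bytes; SAMPLE is a made-up stand-in for that output. Raw bytes pushed through a UTF-8 string are not guaranteed to survive, whereas Base64 text is plain ASCII and passes through UTF-8 and ANSI conversions unchanged. The receiving VC side would then have to Base64-decode before decrypting.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

final class CipherTextTransport {
    // Hypothetical stand-in for encrypted output: arbitrary bytes, some >= 0x80.
    static final byte[] SAMPLE = {0x41, (byte) 0x9C, (byte) 0xE9, 0x7A};

    // Treats raw bytes as UTF-8 text. Malformed sequences are replaced
    // with U+FFFD, so the original bytes cannot be recovered afterwards.
    static byte[] viaUtf8String(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Wraps the bytes in Base64 (ASCII only) before they travel as text.
    static byte[] viaBase64(byte[] raw) {
        String text = Base64.getEncoder().encodeToString(raw);
        byte[] wire = text.getBytes(StandardCharsets.UTF_8);   // what goes on the socket
        String received = new String(wire, StandardCharsets.UTF_8);
        return Base64.getDecoder().decode(received);
    }

    public static void main(String[] args) {
        System.out.println("raw round trip intact:    "
                + Arrays.equals(SAMPLE, viaUtf8String(SAMPLE)));
        System.out.println("Base64 round trip intact: "
                + Arrays.equals(SAMPLE, viaBase64(SAMPLE)));
    }
}
```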

Related

SOAP asciihex encoding

I would really appreciate it if anyone could help me with this. I searched the internet for this issue but found nothing.
As a client, I'm retrieving a SOAP message through a GET request; the message is encoded with "asciihex" encoding.
The documentation says this - "If the file is downloaded, it will be encoded and the client has to decode it to read it correctly. Currently, CCon supports just asciihex encoding mode. If this mode is used, every single original byte is encoded as a sequence of two characters representing it in hexadecimal. So, if the original byte was 0x0a, the transmitted bytes are 0x30 and 0x41 (‘0’ and ‘a’ in ASCII)."
The problem is decoding this message: I have no idea how to translate it into intelligible text.
The messaging protocol used is SOAP, and I'm testing with SoapUI.
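
Going by the quoted documentation (two hexadecimal characters per original byte), decoding could look roughly like the sketch below. The class name AsciiHex and the sample payload are made up for illustration, and treating the decoded bytes as UTF-8 text is an assumption; the service may use a different charset.

```java
import java.nio.charset.StandardCharsets;

final class AsciiHex {
    // Decodes text in which every original byte is written as two hex digits.
    static byte[] decode(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit near index " + (2 * i));
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up payload, not taken from the CCon service.
        byte[] bytes = decode("48656c6c6f0a");
        // Assumption: the original file is UTF-8 text.
        System.out.print(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

Character.digit accepts both upper-case and lower-case hex digits, so the sketch does not depend on which case the server emits.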

Base64 Encoding difference in a particular String

I have a question about the Base64 encoding of one particular string.
We have an application which allows REST web services to be executed once Basic Authentication succeeds.
I have set the password for a user USER_NAME to CP#5N0v22nD17RrV8f4.
From my system, using Postman/Advanced REST client, the request is processed successfully. But the same request fails when made from most other systems using the same REST client.
When I set this password for another user, that user's credentials have the same problem.
I noticed that the Base64 encoding output charset is the problem, but there is no way to change it in the REST clients (at least not in most of the ready-made ones).
But why is this happening only for this particular password? I checked with every other password and they all work fine.
String: USER_NAME:CP#5N0v22nD17RrV8f4
UTF-8: VVNFUl9OQU1FOkNQQDVOMHYyMm5EMTdSclY4ZjTigIs=
Windows-1252: VVNFUl9OQU1FOkNQQDVOMHYyMm5EMTdSclY4ZjQ=
ASCII: VVNFUl9OQU1FOkNQQDVOMHYyMm5EMTdSclY4ZjQ=
Only for CP#5N0v22nD17RrV8f4 the UTF-8 output charset encoding in Base64 is giving a different result.
Using any other passwords, all the outputs are the same.
Please help me understand why CP#5N0v22nD17RrV8f4 is different from the rest of the strings.
Thanks in Advance
Balu

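The string has an invisible character at the end: the UTF-8 output ends in the bytes E2 80 8B, which is a zero-width space (U+200B), not a regular or non-breaking space.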
I tested this using the following steps.
Decoded the UTF-8 string VVNFUl9OQU1FOkNQQDVOMHYyMm5EMTdSclY4ZjTigIs= at https://www.base64decode.org/
Copied the result to encode in UTF-8 at https://www.base64decode.org/, but this time pressed backspace once at the end of the string to delete the invisible character. This gives the output VVNFUl9OQU1FOkNQQDVOMHYyMm5EMTdSclY4ZjQ=
You could also try typing the characters manually, and encoding.
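
The same check can be done without a web site. The sketch below decodes the two Base64 values quoted in the question and reports the code point of the last character; for the UTF-8 variant this is U+200B. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class TrailingCharCheck {
    // The two Base64 values quoted in the question.
    static final String UTF8_VARIANT = "VVNFUl9OQU1FOkNQQDVOMHYyMm5EMTdSclY4ZjTigIs=";
    static final String ASCII_VARIANT = "VVNFUl9OQU1FOkNQQDVOMHYyMm5EMTdSclY4ZjQ=";

    static String decodeUtf8(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    static String encodeUtf8(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    // Code point of the last character of a non-empty string.
    static int lastCodePoint(String s) {
        return s.codePointBefore(s.length());
    }

    public static void main(String[] args) {
        String decoded = decodeUtf8(UTF8_VARIANT);
        System.out.printf("last code point: U+%04X%n", lastCodePoint(decoded));

        // Dropping the last character reproduces the ASCII/Windows-1252 value.
        String trimmed = decoded.substring(0, decoded.length() - 1);
        System.out.println("matches ASCII variant: " + encodeUtf8(trimmed).equals(ASCII_VARIANT));
    }
}
```

A character like this usually arrives through copy and paste, which would explain why only this one password is affected.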

How to auto detect a String encoding?

I have a string that contains values encoded in some way, possibly Base64.
The problem is that I don't know whether it's actually Base64 (it contains A-Z, a-z, 0-9, +, /), so it could be some other encoding that I'm not familiar with.
Is there a way, or an online site, to which I can submit an encoded input and have it tell me which encoding was used?
NOTE:
I'm not asking how to know whether my string is UTF-8 or ISO-8859-1 or something like that.
What I need to know is which scheme my string is encoded with.
EDIT:
To be more clear,
I need something to get an input like: 23Nzi4lUE4qlc+Pmc3blWMS1Irmgo3i8UTQHhoL7VyzqpEV/i9bDhoiteZ0a7/TqcVSkrXR89V2Yj7tEFDGJx4gvWEBs= this is the encoded String that I have.
The output should be the type of encoding and the decoded text, like:
Base64 -> "Big yellow fish is swimming in the tube."
Maybe there is some program that takes an input and tries to decode it with a list of encoding types (Base64, etc.). The output doesn't really matter, because it's the user's decision whether it's good or not.

This site handles base64 de/encoding.
Since Base64 is just one instance of a class of encoding schemes (specifically, encoding a bit stream as a base-n number), you will probably never do better than testing for a handful of standard encoding schemes.
You either check the well-formedness of the input against each encoding scheme, or try to decode it and see whether an error is thrown, using a web service or your own code.
In (possibly pathological) cases there will be more than one encoding scheme for which a given octet stream decodes successfully.
Best practice would be to redirect the effort you would invest in setting up the verification into getting the data provider to commit to one (or a few) encoding(s) first (this won't always be possible, of course).
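
A minimal sketch of the "try to decode and see whether it fails" approach, limited to two schemes (Base64 and hex) chosen purely for illustration. The class EncodingGuess and its method names are hypothetical. It returns every scheme that accepts the input, because more than one may.

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

final class EncodingGuess {
    // True if the JDK's basic Base64 decoder accepts the input.
    static boolean decodesAsBase64(String s) {
        try {
            Base64.getDecoder().decode(s);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // True if the input is a non-empty, even-length run of hex digits.
    static boolean decodesAsHex(String s) {
        return !s.isEmpty() && s.length() % 2 == 0 && s.matches("[0-9a-fA-F]+");
    }

    // Names of all schemes that accept the input; may be empty or hold several.
    static List<String> candidates(String s) {
        List<String> names = new ArrayList<>();
        if (decodesAsBase64(s)) names.add("Base64");
        if (decodesAsHex(s)) names.add("hex");
        return names;
    }

    public static void main(String[] args) {
        System.out.println(candidates("SGVsbG8="));
        System.out.println(candidates("48656c6c6f"));
        System.out.println(candidates("hello world!"));
    }
}
```

A successful decode only shows that the input is well-formed for that scheme, not that the scheme was actually used, so the decoded result still needs a human (or a further plausibility check) to judge it.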

How to send text in UTF-8 using Indy TIdTCPServer in c++ builder

My client J2ME application reads the text input stream using UTF-8:
reader = new InputStreamReader(in,"UTF-8");
and my server, once connected, sends text using this statement:
AContext->Connection->IOHandler->WriteLn(cxMemo1->Text,TEncoding::UTF8);
but the resulting text shows weird characters like ?????????????????????????? ?????????????
What am I doing wrong?
Also, when I tried sending a UTF-8 encoded data file like this:
AContext->Connection->IOHandler->WriteFile("c:\\fids.xml");
the result is the same!

Indy 10 fully supports UTF-8 encoding. I have worked with its TIdFTP component myself and successfully uploaded Unicode text files. From what I can make of it, one of the following applies:
Your connection/transfer type is set to ftASCII rather than ftBinary.
Your J2ME applet/host platform does not support UTF-8.

'?' characters occur when data is going through a Unicode-to-Ansi conversion to an Ansi charset that does not support the Unicode characters being converted.
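
The effect is easy to reproduce on the Java side as well. This is only an illustrative sketch (the class name and sample text are made up), showing that characters the target charset cannot represent come out as '?':

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class UnmappableDemo {
    // Encodes text to the target charset and decodes it back with the same charset.
    static String roundTrip(String text, Charset target) {
        return new String(text.getBytes(target), target);
    }

    public static void main(String[] args) {
        // "café €5", written with escapes so the source file encoding does not matter.
        String text = "caf\u00e9 \u20ac5";

        // US-ASCII cannot represent é or €, so both are replaced by '?'.
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII));

        // UTF-8 can represent every character, so nothing is lost.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text));
    }
}
```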
What version of C++Builder are you using? In versions prior to CB2009, you should tell Indy the encoding of the AnsiString data that you are passing in. Indy defaults to ASCII (ie: TIdTextEncoding::ASCII) for most String-based operations. That can be overridden when needed, either with optional AAnsiEncoding parameters, the TIdIOHandler::DefAnsiEncoding property, or the global Idglobal::GIdDefaultAnsiEncoding setting. If you do not specify the correct encoding, the AnsiString data may not be converted to Unicode correctly before then being converted to UTF-8. For example:
AContext->Connection->IOHandler->WriteLn(cxMemo1->Text, TIdTextEncoding_UTF8, TIdTextEncoding_Default);
Or:
AContext->Connection->IOHandler->DefAnsiEncoding = TIdTextEncoding_Default;
AContext->Connection->IOHandler->WriteLn(cxMemo1->Text, TIdTextEncoding_UTF8);
You can optionally also use the TIdIOHandler::DefStringEncoding property if you do not want to specify the UTF-8 encoding on every call:
AContext->Connection->IOHandler->DefStringEncoding = TIdTextEncoding_UTF8;
AContext->Connection->IOHandler->WriteLn(cxMemo1->Text);
Now, with that said, the fact that WriteFile() is also sending data that J2ME is not handling correctly tells me that Indy is not the root of the issue. WriteFile() simply dumps the raw file data as-is to the connection without any interpretation at all. If you send a UTF-8 encoded file, then UTF-8 encoded octets will be sent to J2ME.
I suggest you use a packet sniffer, such as Wireshark, to verify the data that Indy is sending. That will tell you for sure whether Indy is really at fault or not.
*PS: notice in the examples above that I use Indy's TIdTextEncoding macros instead of TEncoding directly. This is because Indy's TIdTextEncoding logic works around some bugs in Embarcadero's TEncoding classes. Also, we're going to phase out direct support for TEncoding in Indy 11 and expand on TIdTextEncoding so Indy has more control than Embarcadero offers.

How to read text file without knowing the encoding

When reading a text file that was created somewhere outside my app, the encoding used is unknown. My app has been using NSUnicodeStringEncoding (which is the same as NSUTF16StringEncoding), so it has problems reading files that are not UTF-16 encoded.
Is there a way I can guess the encoding of a file? My priority is to be able to read UTF-8 files, and then all other files.
Is iterating through the available encodings and checking whether the resulting string's length is greater than zero really a good approach?
Thanks in advance.
Ignacio

Apple's documentation has some guidance on how to proceed: String Programming Guide: Reading data with an unknown encoding:
If you are forced to guess the encoding (and note that in the absence of explicit information, it is a guess):
1. Try stringWithContentsOfFile:usedEncoding:error: or initWithContentsOfFile:usedEncoding:error: (or the URL-based equivalents). These methods try to determine the encoding of the resource, and if successful return by reference the encoding used.
2. If (1) fails, try to read the resource by specifying UTF-8 as the encoding.
3. If (2) fails, try an appropriate legacy encoding. "Appropriate" here depends a bit on circumstances; it might be the default C string encoding, it might be ISO or Windows Latin 1, or something else, depending on where your data is coming from.

If the file was written with one, you can read the first four bytes and check whether they start with a BOM (Byte Order Mark); keep in mind that many UTF-8 files carry no BOM at all:
http://en.wikipedia.org/wiki/Byte-order_mark
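
A sketch of such a check, shown in Java rather than Objective-C. The class name BomSniffer is made up; the byte patterns are the standard BOMs for UTF-8, UTF-16 and UTF-32.

```java
final class BomSniffer {
    // Returns the encoding named by a leading BOM, or null if none is present.
    static String detect(byte[] head) {
        // The 4-byte UTF-32 marks are checked first, because the UTF-32LE
        // mark begins with the same two bytes as the UTF-16LE mark.
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] head = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(detect(head));
    }
}
```

A null result only means that no BOM was found; the file may still be UTF-8 or anything else, so a fallback strategy is still needed. The bytes FF FE 00 00 are also ambiguous in principle, since a UTF-16LE file whose first character is NUL starts the same way.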
