I am trying to solve the problem with chars encoding in api results. I have tried already to convince API provide to add utf-8 definition to header, but without success. So I need to convert result by myself in dart/flutter.
Tried already many things with Utf8Decoder or utf8.encode/decode, but without proper result.
Below example is in Polish. What I should have is:
Czesi walczą z pożarem
What I have in API results is:
Czesi walczÄ z pożarem
How can I decode the string to proper format?
Thanks!
Related
I'm using camel 2.14.1 and splitting huge xml file with Chinese/Japanese characters using group=10000 within tokenize tag.
Files are created successfully based on grouping but Chinese/Japanese text codes are converted to Junk characters.
I tried enforcing UTF-8 before new XML creation using "ConvertBodyTo" but still issue persists.
Can someone help me !!
I had run into a similar issue while trying to split a csv file using tokenize with grouping.
Sample csv file: (with Delimiter - '|')
CandidateNumber|CandidateLastName|CandidateFirstName|EducationLevel
CAND123C001|Wells|Jimmy|Bachelor's Degree (±16 years)
CAND123C002|Wells|Tom|Bachelor's Degree (±16 years)
CAND123C003|Wells|James|Bachelor's Degree (±16 years)
CAND123C004|Wells|Tim|Bachelor's Degree (±16 years)
The ± character is corrupted after tokenize with grouping. I was initially under the assumption that the problem was with not setting the proper File Encoding for split, but the exchange seems to have the right value for property CamelCharsetName=ISO-8859-1.
from("file://<dir with csv files>?noop=true&charset=ISO-8859-1")
.split(body().tokenize("\n",2,true)).streaming()
.log("body: ${body}");
The same works fine with dont use grouping.
from("file://<dir with csv files>?noop=true&charset=ISO-8859-1")
.split(body().tokenize("\n")).streaming()
.log("body: ${body}");
Thanks to this post, it confirmed the issue is while grouping.
Looking at GroupTokenIterator in camel code base the problem seems to be with the way TypeConverter is used to convert String to InputStream
// convert to input stream
InputStream is =
camelContext.getTypeConverter().mandatoryConvertTo(InputStream.class, data);
...
Note: the mandatoryConvertTo() has an overloaded method with exchange
<T> T mandatoryConvertTo(Class<T> type, Exchange exchange, Object value)
As the exchange is not passed as argument it always falls back to default charset set using system property "org.apache.camel.default.charset"
Potential Fix:
// convert to input stream
InputStream is =
camelContext.getTypeConverter().mandatoryConvertTo(InputStream.class, exchange, data);
...
As this fix is in the camel-core, another potential option is to use split without grouping and use AgrregateStrategy with completionSize() and completionTimeout().
Although it would be great to get this fixed in camel-core.
I have this kind of request which seems to be GWT-RPC format :
R7~"61~com.foo.Service~"14~MethodA~D2~"3~F7e~"3~B0e~Ecom.foo.data.BeanType~I116~Lcom.foo.Parameter~I5~"1~b~Z0~"1~n~V~"1~o~Z1~"1~p~Z1~"1~q~V~'
But it is not in line with protocol described here:
https://docs.google.com/document/d/1eG0YocsYYbNAtivkLtcaiEE5IOF5u4LUol8-LL0TIKU/edit
What exactly is this protocol ? Is it really GWT-RPC or something else (deRPC?) ?
Looking into gwt-2.5.1 source code, I notice it seems that following packages could be generating this kind of format:
com.google.gwt.rpc.client
com.google.gwt.rpc.server
Is this deRPC ?
Based on a quick glance through the deRPC classes in the packages you listed, it does indeed appear to be deRPC. Note that deRPC has always been marked as experimental, and is now deprecated, and that either RPC or RequestFactory should be used instead.
Details that seem to confirm this:
com.google.gwt.rpc.client.impl.SimplePayloadSink#RPC_SEPARATOR_CHAR is a constant equal to the ~ character, which appears to be a separator between different tokens in the sample string you provided.
Both com.google.gwt.rpc.client.impl.SimplePayloadSink andcom.google.gwt.rpc.server.SimplePayloadDecoder` have many comments that appear to depict the same basic format that you are seeing in there:
// "4~abcd in endVisit(StringValueCommand x, Context ctx) closely matches several tokens in the sample string - a quote indicating a string, an int describing the length, a ~ separator, then the string itself (this doesnt match everything, I suspect because you removed details about the service and the name of the method):
"3~F7e
"3~B0e
"1~b
Booleans all follow Z1 or Z0, as in your sample string
Im working on a script to hash a "fingerprint" for communicating with the secure Pay Direct Post API.
The issue I have is im trying to create a SHA-1 String that matches the sample code provided so that i can ensure things get posted accurately.
the example Sha-1 string appears encoded like
01a1edbb159aa01b99740508d79620251c2f871d
However my string when converted appears as
7871D5C9A366339DA848FC64CB32F6A9AD8FCADD
completely different...
my code for this is as follows..
<cfset variables.finger_print = "ABC0010|txnpassword|0|Test Reference|1.00|20110616221931">
<cfset variables.finger_print = hash(variables.finger_print,'SHA-1')>
<cfoutput>
#variables.finger_print#
</cfoutput>
Im using Coldfusion 8 to do this
it generates a 40 character hash, but i can see its generating completely different strings.
Hopefully someone out there has done this before and can point me in the right direction...
thanks in advance
** EDIT
The article for creating the Hash only contains the following information.
Example: Setting the fingerprint Fields joined with a | separator:
ABC0010|txnpassword|0|Test Reference|1.00|20110616221931
SHA1 the above string: 01a1edbb159aa01b99740508d79620251c2f871d
When generating the above example string using coldfusion hash it turns it into this
7871D5C9A366339DA848FC64CB32F6A9AD8FCADD
01a1edbb159aa01b99740508d79620251c2f871d
Sorry, but I do not see how the sample string could possibly produce that result given that php, CF and java all say otherwise. I suspect an error in the documentation. The one thing that stands out is the use of "txnpassword" instead of a sample value, like with the other fields. Perhaps they used a different value to produce the string and forgot to plug it into the actual example?
Update:
Example 5.2.1.12, on page 27, makes more sense. Ignoring case, the results from ColdFusion match exactly. I noticed the description also mentions something about a summarycode value, which is absent from the example in section 3.3.6. So that tends to support the theory of documentation error with the earlier example.
Code:
<cfset input = "ABC0010|mytxnpasswd|MyReference|1000|201105231545|1">
<cfoutput>#hash(input, "sha-1")#</cfoutput>
Result:
3F97240C9607E86F87C405AF340608828D331E10
I'm struggling with Solr search Arabic for several days and made some experiment. Here is the simple reflection of the problem.
After I store some Arabic sentence (now only 1 word السوري ) into database and have Solr index it, then query it by q=*:*&wt=python,(if no wt part, it was garbled chars) the response is:
'\u00d8\u00a7\u00d9\u201e\u00d8\u00b3\u00d9\u02c6\u00d8\u00b1\u00d9\u0160'
The actual word I store there for index is coding in another way:
'\xd8\xa7\xd9\x84\xd8\xb3\xd9\x88\xd8\xb1\xd9\x8a'
As you can tell, there is a one-to-to corresponding from \xd8↔\u00d8. But I don't know what is the name of this coding, thus I cannot convert it. And when I do the search as: <>/select/?q=السوري&wt=python,the response is:
{'responseHeader':{'status':0,'QTime':0,'params':{'wt':'python','q':u'\u0627\u0644\u0633\u0648\u0631\u064a'}},'response':{'numFound':0,'start':0,'docs':[]}}
No docs found and it seems using a third version for coding u'\u0627\u0644\u0633\u0648\u0631\u064a'. if I take it and encode('utf8') then it convert back to '\xd8\xa7\xd9\x84\xd8\xb3\xd9\x88\xd8\xb1\xd9\x8a'.
In summary, when it (السوري) is in my code (python) or in data base (mysql),
it presents as 'form1':
'\xd8\xa7\xd9\x84\xd8\xb3\xd9\x88\xd8\xb1\xd9\x8a'
When it is indexed by Solr, it converts to form2:
'\u00d8\u00a7\u00d9\u201e\u00d8\u00b3\u00d9\u02c6\u00d8\u00b1\u00d9\u0160'
And when I use <>/select/?q=السوري&wt=python, to query from browser (Google chrome), it becomes form3:
'\u0627\u0644\u0633\u0648\u0631\u064a'
(which could convert back to form1 by encode('utf8') But since they are different, the search matches nothing.
Therefore, those three different encode strategy may be the core problem. Could anyone help me figure it out and solve the search problem?
Thanks in advance.
I am working on an API using SOAP and WSDL. The WSDL expects integers to come through. I am fairly new to ALL of this, and constructing XML in Python. I have chosen to use minidom to create my SOAP message. So using minidom, to get a value into a node I found I have to do this:
weight_node = xml_file.createElement("web:Weight")
weight_contents = xml_file.createTextNode(weight)
weight_node.appendChild(weight_contents)
So say weight needs to go in as an integer and IS an integer. The function is 'createTextNode' does this mean its going to be text, or what I put in there has to be text? Again I am fairly new to all of this. So if what I have explained seems way off base, please speak up.