How to convert all \u**** string into the real readable NSString? [duplicate] - iphone

I am trying to support arbitrary Unicode from a variety of international users. They have already put a bunch of data into sqlite databases on their iPhones, and now I want to capture the data into a database, then send it back to their device. Right now I am using a PHP page that sends data back from a MySQL database on the internet. The data is saved in the MySQL database properly, but when it's sent back it comes out as escaped Unicode text, such as
Frank\u00e2\u0080\u0099s iPad
instead of just
Frank's iPad
where the apostrophe should really be a curly apostrophe.
The answer posted to another question indicates that there is no built-in Cocoa methods to convert the "\u00e2\u0080\u0099" portion of the unicode string from the webserver to an NSString object. Is this correct?
That seems really surprising (and scarily disappointing), since Cocoa definitely allows input from many different Unicode characters, and I need to support any arbitrary language that I have never heard of, and all of the possible characters. I save them to and from the local sqlite database just fine now, but once I send it to a web server, then perhaps pull down different data, I want to ensure the data pulled from the web server is correctly formatted.

[...] there is no built-in Cocoa methods to convert [...]. Is this correct?
It's not correct.
You might be interested in CFStringTransform and its capabilities. It is a full-blown ICU transformation engine, which can (also) perform your requested transformation.
See Using Objective C/Cocoa to unescape unicode characters, ie \u1234

All NSStrings are Unicode.
The problem with the “Frank\u00e2\u0080\u0099s iPad” data isn't that it's Unicode; it's that it's escaped to ASCII. “Frank’s iPad” is valid Unicode in any UTF, and is what you need.
So, you need to see whether the database is returning the data escaped or the PHP layer is escaping it at some point. If either of those is the case, fix it if you can; the PHP resource should return UTF-8/16/32. Only if that approach fails should you seek to unescape the string on the Cocoa side.
You're correct that there is no built-in way to unescape the string in Cocoa. If you get to that point, see if you can find some open-source code to do it; if not, you'll need to do it yourself, probably using NSScanner.
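If unescaping on the client does turn out to be necessary, the logic itself is small. The sketch below is in Python rather than Objective-C, purely to illustrate the steps; `unescape_u` and `repair_utf8` are hypothetical helper names. It also relies on an observation about the sample string: \u00e2, \u0080 and \u0099 are the three UTF-8 bytes of U+2019 (the curly apostrophe), each escaped as if it were a separate code point, so a second step reinterprets them as UTF-8. It assumes plain four-digit escapes only (no surrogate pairs).

```python
import re

# Matches a literal backslash, 'u', and exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def unescape_u(text: str) -> str:
    """Replace each \\uXXXX escape with the code point it names."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def repair_utf8(text: str) -> str:
    """Reinterpret code points 0-255 as UTF-8 bytes, if that is possible.

    Assumption: the server escaped UTF-8 bytes one by one. If the text
    does not fit that pattern, it is returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


raw = r"Frank\u00e2\u0080\u0099s iPad"
step1 = unescape_u(raw)      # still garbled: three separate characters
fixed = repair_utf8(step1)   # the three bytes become one apostrophe
print(fixed)
```

That the second step is needed at all suggests the text is being encoded twice somewhere on the server side, which is another reason to fix the PHP or database layer first.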

Check that your web service response specifies a Content-Type and charset, and that the XML declares its encoding. In PHP, add the following before printing the XML:
header('Content-type: text/xml; charset=UTF-8');
print '<?xml version="1.0" encoding="UTF-8"?>';
I guess there is just no encoding specified.

Related

Spring cloud gateway uri decoding failing

I wrote a gateway application using Spring Cloud Greenwich binaries. I'm seeing issues when special characters are present in the URL. The request fails in Spring Cloud Gateway with the exception below when the request URI contains special characters.
localhost:8080/myresource/WG_splchar_%26%5E%26%25%5E%26%23%25%24%5E%26%25%26*%25%2B)!%24%23%24%25%26%5E_new
When I hit the above URL, Spring fails with the exception below. I can't figure out why it's an invalid sequence or how cases like this can be handled.
java.lang.IllegalArgumentException: Invalid encoded sequence "%^&#%$^&%&*%+)!$#$%&^_new"
at org.springframework.util.StringUtils.uriDecode(StringUtils.java:741) ~[spring-core-5.1.4.RELEASE.jar:5.1.4.RELEASE]
at org.springframework.http.server.DefaultPathContainer.parsePathSegment(DefaultPathContainer.java:126) ~[spring-web-5.1.4.RELEASE.jar:5.1.4.RELEASE]
at org.springframework.http.server.DefaultPathContainer.createFromUrlPath(DefaultPathContainer.java:111) ~[spring-web-5.1.4.RELEASE.jar:5.1.4.RELEASE]
at org.springframework.http.server.PathContainer.parsePath(PathContainer.java:76) ~[spring-web-5.1.4.RELEASE.jar:5.1.4.RELEASE]
at org.springframework.cloud.gateway.handler.predicate.PathRoutePredicateFactory.lambda$apply$2(PathRoutePredicateFactory.java:79) ~[spring-cloud-gateway-core-2.1.0.RC3.jar:2.1.0.RC3]
at org.springframework.cloud.gateway.support.ServerWebExchangeUtils.lambda$toAsyncPredicate$1(ServerWebExchangeUtils.java:128) ~[spring-cloud-gateway-core-2.1.0.RC3.jar:2.1.0.RC3]
at org.springframework.cloud.gateway.handler.AsyncPredicate.lambda$and$1(AsyncPredicate.java:35) ~[spring-cloud-gateway-core-2.1.0.RC3.jar:2.1.0.RC3]
at org.springframework.cloud.gateway.handler.RoutePredicateHandlerMapping.lambda$null$2(RoutePredicateHandlerMapping.java:112) ~[spring-cloud-gateway-core-2.1.0.RC3.jar:2.1.0.RC3]
at reactor.core.publisher.MonoFilterWhen$MonoFilterWhenMain.onNext(MonoFilterWhen.java:116) [reactor-core-3.2.5.RELEASE.jar:3.2.5.RELEASE]
at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2070) [reactor-core-3.2.5.RELEASE.jar:3.2.5.RELEASE]
at reactor.core.publisher.MonoFilterWhen$MonoFilterWhenMain.onSubscribe(MonoFilterWhen.java:103) [reactor-core-3.2.5.RELEASE.jar:3.2.5.RELEASE]
at reactor.core.publisher.MonoJust.subscribe(MonoJust.java:54) [reactor-core-3.2.5.RELEASE.jar:3.2.5.RELEASE]
at reactor.core.publisher.MonoFilterWhen.subscribe(MonoFilterWhen.java:56) [reactor-core-3.2.5.RELEASE.jar:3.2.5.RELEASE]
I answered the other question already and don't feel like retyping. The spirit of the answer is the exact same.
Write a unit test exercising this method from Spring's StringUtils, since this is what's breaking. Try passing in more or less of the string you're concerned about to find where the breakage is; a binary search works well. Make sure you don't split the string in the middle of an encoded character, or you'll give yourself a false positive. When it says you have an invalid sequence, I would expect something like %99, where 99 does not map to any valid character (I'm just making one up).
https://docs.spring.io/spring/docs/current/javadoc-api/org/springframework/util/StringUtils.html#uriDecode-java.lang.String-java.nio.charset.Charset-
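As a rough illustration of that kind of test, here is a sketch in Python rather than Java. `strict_uri_decode` is a hypothetical helper that only mimics the check described above (reject any % that is not followed by two hex digits); it is not Spring's implementation. It feeds in the path segment from the question, then feeds the result in a second time.

```python
from urllib.parse import unquote

_HEX = set("0123456789abcdefABCDEF")


def strict_uri_decode(source: str) -> str:
    """Percent-decode, raising ValueError on a malformed %XX sequence."""
    for i, ch in enumerate(source):
        if ch == "%":
            pair = source[i + 1:i + 3]
            if len(pair) < 2 or any(c not in _HEX for c in pair):
                raise ValueError(f'Invalid encoded sequence "{source[i:]}"')
    return unquote(source)


path = ("WG_splchar_%26%5E%26%25%5E%26%23%25%24%5E%26%25%26*"
        "%25%2B)!%24%23%24%25%26%5E_new")

# The first pass succeeds: every % is followed by two hex digits.
once = strict_uri_decode(path)

# The second pass sees literal % characters left over from %25.
try:
    strict_uri_decode(once)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(once)
print(error_message)
```

In this sketch the string as sent decodes cleanly, and only the second pass fails, with the same text that appears in the stack trace. That hints that the value may already have been decoded once before it reached uriDecode, which is worth checking in the unit test.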
As an aside
Where is this encoded string coming from? Did someone at your company create their own solution to encode this string to begin with? Are you accepting user data? It's VERY POSSIBLE that whoever is responsible for producing this string encoded it incorrectly by rolling their own encoder.
ALTERNATIVELY
spring.cloud.gateway.routes[7].predicates[0]=Path=/test/{testId}/test1/test_%26%5E%26%25%5E%26%25%26*%25%2B)!
When I look at this I see a path that is already encoded. For example, you've taken your ampersand & character and replaced it with %26
Have you tried inputting a path that is NOT already encoded?
For example
spring.cloud.gateway.routes[7].predicates[0]=Path=/test/{testId}/test1/test_&^&%^
I only partially decoded it by hand, using this chart: https://www.w3schools.com/tags/ref_urlencode.asp

Encoding a GPX file such that it's accepted by the /matchroute endpoint of the Here API

I am trying to call the resource /matchroute via a GET request.
However, I can't figure out how to encode the GPX file so that the resource accepts my request: I always receive HTTP error 400 as a response from the Here server.
As exemplary data I used the following file:
<?xml version="1.0"?>
<gpx version="1.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.topografix.com/GPX/1/0"
xsi:schemaLocation="http://www.topografix.com/GPX/1/0
http://www.topografix.com/GPX/1/0/gpx.xsd">
<trk>
<trkseg>
<trkpt lat="51.10177" lon="0.39349"/>
<trkpt lat="51.10181" lon="0.39335"/>
<trkpt lat="51.10255" lon="0.39366"/>
<trkpt lat="51.10398" lon="0.39466"/>
<trkpt lat="51.10501" lon="0.39533"/>
</trkseg>
</trk>
</gpx>
that I got from this example.
I encoded this file using MATLAB's function matlab.net.base64encode which yielded the following base64-encoded string:
PD94bWwgdmVyc2lvbj0iMS4wIj8+PGdweCB2ZXJzaW9uPSIxLjAieG1sbnM6eHNpPSJodHRwOi8vd3d3LnczLm9yZy8y
MDAxL1hNTFNjaGVtYS1pbnN0YW5jZSJ4bWxucz0iaHR0cDovL3d3dy50b3BvZ3JhZml4LmNvbS9HUFgvMS8wInhzaTpz
Y2hlbWFMb2NhdGlvbj0iaHR0cDovL3d3dy50b3BvZ3JhZml4LmNvbS9HUFgvMS8wIGh0dHA6Ly93d3cudG9wb2dyYWZp
eC5jb20vR1BYLzEvMC9ncHgueHNkIj48dHJrPjx0cmtzZWc+PHRya3B0IGxhdD0iNTEuMTAxNzciIGxvbj0iMC4zOTM0
OSIvPjx0cmtwdCBsYXQ9IjUxLjEwMTgxIiBsb249IjAuMzkzMzUiLz48dHJrcHQgbGF0PSI1MS4xMDI1NSIgbG9uPSIw
LjM5MzY2Ii8+PHRya3B0IGxhdD0iNTEuMTAzOTgiIGxvbj0iMC4zOTQ2NiIvPjx0cmtwdCBsYXQ9IjUxLjEwNTAxIiBs
b249IjAuMzk1MzMiLz48L3Rya3NlZz48L3Ryaz48L2dweD4=
However, as stated before, the HERE server consistently responds with HTTP-error 400 to my request
https://rme.api.here.com/2/matchroute.json?app_id={app_id}&app_code={app_code}&routemode=car&file=...
where "..." equals the above mentioned base64-encoded string.
Question: Could anyone please provide a code sample showing how to encode the above mentioned GPX file correctly (ideally in MATLAB language) so that the /matchroute resource is able to respond?
Remarks:
If I use the base64 string
UEsDBBQAAAAIANmztEQSwaeZzwAAAM8BAAAQAAAAc2FtcGxlLXRyYWNlLmdweIXPTQuCMBwG8HufQnZv%2F605S0k9dj
EIungdZjpSJ27kPn6%2BRBgYXcYYv2cPzzG2deU8805L1YSIYoLiaHMsWvv9uBlYowOrZYhKY9oAoO973DOsugJ2hFBI
z8k1K%2FNabGWjjWiy%2FJ36ShjVqqITd2lxpmo4XVKgMP6vZaCneKIyYabivzHnr4BhCbb6hoZRpnvMp86L%2BdIapx
ImRJxiSuh%2Bj5xq7CWY%2Bcz1EaypA10qxlfVjvOl8rxVxfzDQrk%2FFCfLRs7YpOCzA%2BZd49LoBVBLAQIUABQAAA
AIANmztEQSwaeZzwAAAM8BAAAQAAAAAAAAAAEAIAAAAAAAAABzYW1wbGUtdHJhY2UuZ3B4UEsFBgAAAAABAAEAPgAAAP
0AAAAAAA%3D%3D
from this example the GET request works. However, I couldn't figure out how to reproduce this encoding myself so that I am able to encode my own data accordingly.
Link to the Here API definition: https://developer.here.com/documentation/route-match/topics/resource-matchroute-request.html
Looking at the two base64 strings, I can tell you the fundamental difference between them: the first one (which doesn't work) is not URL-escaped, whereas the second one (which works) is.
You can convert between the two formats manually using various online tools like this one. The escaped version of the non-working base64 string, in case you want to test it, is:
PD94bWwgdmVyc2lvbj0iMS4wIj8+PGdweCB2ZXJzaW9uPSIxLjAieG1sbnM6eHNpPSJodHRwOi8vd3d3LnczLm9yZy8y
%0AMDAxL1hNTFNjaGVtYS1pbnN0YW5jZSJ4bWxucz0iaHR0cDovL3d3dy50b3BvZ3JhZml4LmNvbS9HUFgvMS8wInhza
Tpz%0AY2hlbWFMb2NhdGlvbj0iaHR0cDovL3d3dy50b3BvZ3JhZml4LmNvbS9HUFgvMS8wIGh0dHA6Ly93d3cudG9wb2
dyYWZp%0AeC5jb20vR1BYLzEvMC9ncHgueHNkIj48dHJrPjx0cmtzZWc+PHRya3B0IGxhdD0iNTEuMTAxNzciIGxvbj0
iMC4zOTM0%0AOSIvPjx0cmtwdCBsYXQ9IjUxLjEwMTgxIiBsb249IjAuMzkzMzUiLz48dHJrcHQgbGF0PSI1MS4xMDI1
NSIgbG9uPSIw%0ALjM5MzY2Ii8+PHRya3B0IGxhdD0iNTEuMTAzOTgiIGxvbj0iMC4zOTQ2NiIvPjx0cmtwdCBsYXQ9I
jUxLjEwNTAxIiBs%0Ab249IjAuMzk1MzMiLz48L3Rya3NlZz48L3Ryaz48L2dweD4%3D
I'm not an expert on this, but as I understand, you need to URL-encode strings only when you want to paste them as-is into the web path of your browser (read about "URL Params"). If you construct your HTTP requests the right way™ (by this I mean specify the headers of the request and the key-value pairs correctly), you shouldn't have to worry about URL-encoding at all, since the tool that you're using (in this case, MATLAB) should take care of the conversion for you.
Unfortunately, I cannot test this theory, as I have no access to the discussed API - but I am fairly certain that this would solve your problem.
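For illustration, the escaping step can be sketched in Python (the question asks for MATLAB, so treat this as a description of the steps rather than a drop-in answer). `encode_file_param` is a hypothetical helper name, and whether the API accepts the result has not been verified here. Unlike the MATLAB output shown above, `base64.b64encode` does not insert line breaks, so no %0A sequences appear.

```python
import base64
from urllib.parse import quote


def encode_file_param(file_bytes: bytes) -> str:
    """Base64-encode the file, then percent-encode the base64 text.

    safe="" makes quote() escape '/', and '+' and '=' are always
    escaped, so the value can be pasted directly into a query string.
    """
    b64 = base64.b64encode(file_bytes).decode("ascii")
    return quote(b64, safe="")


# First bytes of the GPX file from the question.
header = b'<?xml version="1.0"?>'
print(base64.b64encode(header).decode("ascii"))
print(encode_file_param(header))
```

The only difference between the two printed strings is that the trailing + becomes %2B. A + left unescaped in a query string is usually read as a space by the server, which would corrupt the base64 data.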
I had the exact same problem.
The documentation seems to be incomplete. You can check here for additional information. Several ways I solved this:
Use filetype='CSV' or filetype='GPX' as a parameter. The documentation says the file type is guessed if it is not passed, but that is not actually true: after I passed an XML file, the API told me my file didn't look like a 'CSV'.
Compression is OPTIONAL. I suggest avoiding it completely; I could not find a suitable compression either. It works fine with plain base64 encoding.
I suggest actually using CSV, because the XML returns parsing errors.
In Python:
import base64
import os
import requests

data = '''latitude,longitude
51.10177,0.39349
'''
# Passing the values via params lets requests URL-encode them,
# including any '+', '/' or '=' in the base64 text.
r = requests.get('https://rme.api.here.com/2/matchroute.json', params={
    'app_id': os.getenv('HERE_APP_ID'),
    'app_code': os.getenv('HERE_APP_CODE'),
    'routemode': 'car',
    'filetype': 'CSV',
    'file': base64.b64encode(data.encode()).decode(),
})

Wrong encoding when saving forms on Orbeon

I created my own persistence layer for SQL Server, and the CRUD works fine,
but I think I'm having some trouble with the encoding.
When I go to save something, I receive the XML text from XForms like this:
<?xml version="1.0" encoding="UTF-8"?><xhtml:html xmlns:xhtml="http://www.w3 ...............
<metadata>
<application-name>w4</application-name>
<form-name>usuario</form-name>
<title xml:lang="en">Cadastro</title>
<description xml:lang="en">Usuário</description> ---------PROBLEM!!!
</metadata>
<xforms:instance....................
Any ideas how to solve this??
In general, you need to make sure, when you are decoding the XML, to properly deal with the character encoding. How exactly to do that depends on the programming language or framework you are using, but you should:
if possible use an XML parser and just feed it the bytes (the parser will take care of handling the encoding by itself)
never assume a default or platform encoding when converting bytes to characters (Java in particular has a number of APIs which, for very wrong reasons, use a default encoding which is platform-dependent)
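The difference between the two points above can be shown with a small sketch. It uses Python for illustration (the persistence layer in the question is presumably written in something else), and the one-element document is a made-up stand-in for the real form.

```python
import xml.etree.ElementTree as ET

# The bytes as they would arrive over the wire, encoded as UTF-8.
raw = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<description xml:lang="en">Usuário</description>'
).encode("utf-8")

# Good: hand the bytes to the parser, which reads the declared encoding.
parsed = ET.fromstring(raw).text

# Bad: turn bytes into characters with a guessed single-byte encoding.
wrong = raw.decode("latin-1")

print(parsed)
print(wrong)
```

In the second case each byte of the two-byte UTF-8 sequence for "á" becomes its own character, which is the typical symptom of this kind of bug.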

Is form data automatically encoded by browsers?

I have read some stuff about form data encoding, but one thing remains unclear. In case of enctype="application/x-www-form-urlencoded" we need to urlencode data by hand, don't we?
... Forms submitted with this content type must be encoded as follows
Must be encoded by whom? By browsers? Or by application developers?
The other thing is -- what encoding (if any) is used, or should be used, in case of multipart/form-data?
I'm kind of confused, so big thanks in advance.
Actually, browsers URL-encode the data automatically. The W3 documentation is written first of all for those who make browsers, so the phrase "Forms submitted with this content type must be encoded as follows" means that the data should be encoded by the browser. Anyway, you can check this by viewing the raw POST body in the script that handles the form data (in PHP, it looks like file_get_contents("php://input");).
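To see roughly what a browser puts in the request body for application/x-www-form-urlencoded, here is a sketch using Python's standard library. The field names and values are made up for illustration; `urlencode` stands in for the browser and `parse_qs` for the server-side form handling.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form fields, including characters that must be escaped.
fields = {"name": "Frank’s iPad", "q": "a&b=c"}

# Roughly what ends up in the raw POST body.
body = urlencode(fields)

# What the server-side code gets back after decoding.
decoded = parse_qs(body)

print(body)
print(decoded)
```

Spaces become +, reserved characters such as & and = become %XX sequences, and non-ASCII characters are encoded as UTF-8 bytes and then percent-escaped, all without the page author doing anything by hand.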

Hotmail messing with encoded URL parameters

We have a system that sends out regular emails with links in, many of which contain URL encoded parameters such as this:
href="http://www.mydomain.com/login.aspx?returnurl=http%3A%2F%2Fwww.mydomain.com%2Fview.aspx%3Fid%3D1234%26alert%3Dtrue"
You can see that the "returnurl" parameter is encoded. However, it seems that a large number of our users (seemingly on Hotmail) are receiving the emails with this parameter partly decoded, such as:
href="http://www.mydomain.com/login.aspx?returnurl=http://www.mydomain.com/view.aspx?view.aspx%3Fid%3D1234%26alert%3Dtrue"
Why would it decode like this? Why only partly decode? I have no idea how to deal with it. I thought of base64 encoding, but base64 strings contain characters that would need encoding too... I thought of double encoding, but then I would not know whether to double-decode the parameter or not... Can anyone help? Thanks.
One reason this could be happening is that the URL encoding rules are different before and after the ?. If the mechanism doing the decoding works from the back of the URL and applies query decoding rules until it finds the first ?, that could cause the problem you are describing.
I'm not sure how to deal with it, though, since as I understand it the system doing this inappropriate decoding is outside of your control. I would try to hide the ? in the return URL query somehow...
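One possible way to hide the ? is the URL-safe variant of base64, which uses only letters, digits, - and _ once the = padding is stripped, so there is nothing left for a mail client to decode. The sketch below is in Python for illustration only; `pack_return_url` and `unpack_return_url` are hypothetical names, and it assumes the login page can be changed to decode the token.

```python
import base64


def pack_return_url(url: str) -> str:
    """Encode a URL as a token with no characters that need escaping."""
    token = base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii")
    return token.rstrip("=")


def unpack_return_url(token: str) -> str:
    """Restore the padding that was stripped, then decode."""
    padded = token + "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


target = "http://www.mydomain.com/view.aspx?id=1234&alert=true"
token = pack_return_url(target)
link = "http://www.mydomain.com/login.aspx?returnurl=" + token

print(link)
print(unpack_return_url(token))
```

Since the token contains no %, ?, & or = characters, partial decoding by an intermediary has nothing to act on.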