How to avoid malformed URI sequence error? - perl

I'm working with Perl. I have data saved in the database as “, and I want to escape those characters to avoid a malformed URI sequence error on the client side. This error seems to happen in Firefox only. The fix I found while googling is not to use decodeURI, yet I need it for other characters to be displayed correctly.
Any help? uri_escape does not seem to be enough on the server side.
Thanks in advance.
Details:
In Perl I'm doing the following:
print "<div style='display:none;' id='summary_".$note_count."_note'>".uri_escape($summary)."</div>";
and on the JavaScript side I want to read from this div and put its content somewhere else, like this:
getObj('summary_div').innerHTML= unescape(decodeURI(note_obj.innerHTML));
where note_obj is the hidden div that holds the summary written by Perl.
When I remove decodeURI the problem goes away: I no longer get the malformed URI sequence error in JavaScript. Yet I need decodeURI for other characters.
The issue reproduces in Firefox and IE7.

You can try the CGI module and do:
$uri = CGI::escape($uri);
It may depend on the context in which you are trying to escape the URI.
This worked fine for me in a CGI context.
Now that you have added details, I can suggest:
print "<div style='display:none;' id='summary_".$note_count."_note'>".CGI::escape($summary)."</div>";

URL escaping won't help you here -- that's for escaping URLs, not escaping text in HTML. What you really want is to encode the string when you output it. See the Encode.pm built-in library. Make sure that you get your charset statements right in the HTTP headers: "Content-Type: text/html; charset=UTF-8" or something like that.
If you're unlucky, you may also have to decode the string as it comes out of the database. That depends on the database driver and the encoding...
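To see why the browser complains, here is a minimal JavaScript sketch. It assumes, and this is only an assumption since the question does not say which encoding the database uses, that the quote character reaches uri_escape as a single legacy byte (0x93 in Windows-1252) rather than as UTF-8. decodeURI and decodeURIComponent only accept percent-escapes that form valid UTF-8, so "%93" throws a URIError, while unescape does not check and therefore never fails. The helper name safeDecode is hypothetical. On the Perl side, URI::Escape also provides uri_escape_utf8, which encodes the string as UTF-8 before escaping.

```javascript
// Hypothetical helper (not from the original posts): decode percent-escapes
// as UTF-8, and return the input unchanged if the escapes are not valid UTF-8.
function safeDecode(text) {
  try {
    return decodeURIComponent(text);
  } catch (err) {
    if (err instanceof URIError) {
      return text; // leave undecodable input as-is instead of throwing
    }
    throw err;
  }
}

// "%E2%80%9C" is the UTF-8 encoding of the left double quote (U+201C).
console.log(safeDecode("%E2%80%9C") === "\u201C"); // true

// "%93" is the same quote as a single Windows-1252 byte. It is not valid
// UTF-8, so decodeURIComponent throws and the helper returns it unchanged.
console.log(safeDecode("%93")); // %93
```

A fallback like this only hides the symptom; producing UTF-8 escapes on the server is the more robust fix.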

Related

php preg_replace string doesn't work

I tried to minify a JSON file with str_replace(), but it doesn't work the way I used it.
//I want to minify a json file with php.
//Here I am trying to replace ", {" with ",{"
//$result = preg_replace('/abc/', 'def', $string); # Replace all 'abc' with 'def'
$new = preg_replace('/, {/', ',{', $new); //doesn't work.. why?
As for the specific issue, { is a special character in regular expressions and you need to escape it. See the Meta-characters section of PCRE syntax in the PHP manual. So change the first argument to '/, \{/'. Never mind: as @Hugo demonstrated, it should work as written, and without knowing how your approach failed, we can't help more.
More importantly, this is terribly error-prone. What about a JSON string like ['hello, {name}']? Your attempt will incorrectly "minify" the part inside the quotes and turn it into ['hello,{name}']. That is not a critical bug in this case, but it might be more severe in others. Handling string literals properly is a pain; the simplest way to actually minify JSON is json_encode(json_decode($json)), since PHP by default does not pretty-print or add unnecessary whitespace to JSON.
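The same parse-then-serialize idea can be sketched in JavaScript (used here instead of PHP purely for illustration). JSON.parse and JSON.stringify stand in for json_decode and json_encode, and the function name minifyJson is hypothetical. Because the parser understands string literals, whitespace inside quoted strings is left alone, which the regex replacement cannot guarantee.

```javascript
// Minify JSON by parsing it and serializing it again without extra whitespace.
function minifyJson(text) {
  return JSON.stringify(JSON.parse(text));
}

const pretty = '[ "hello, {name}" , { "a" : 1 } ]';

// The round trip strips whitespace between tokens but keeps the string intact.
console.log(minifyJson(pretty)); // ["hello, {name}",{"a":1}]

// The naive replacement also rewrites the text inside the string literal.
const naive = pretty.replace(/, \{/g, ",{");
console.log(naive.includes('"hello,{name}"')); // true: the string was altered
```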
And finally, maybe you don't really need to do this. If you are doing this to save HTTP traffic or something, just make sure your server gzips responses, caches properly, etc.

Perl CGI.pm encoding - wrong encoding for "ě"

I have a simple web page that uses CGI.pm. This is what I do:
When I call any CGI.pm function and use the Czech character "ě" as the value of a textfield, the label of a radio_group, or anything else, I get � instead of "ě".
This is extremely weird, since the whole page is UTF-8 (<meta name="charset" content="utf-8"/>), especially since this works:
print '<textfield value="ěěěě" >';
Therefore I am positive that CGI.pm is causing the problem. I tried to put
use utf8;
utf8::decode($textfield_value);
at the beginning of my script, and it fixed the CGI.pm problem but made all the other characters in the script (those that are printed normally) look funny.
Any ideas?
Set the accept-charset attribute in your form fields to UTF-8?
<form action="/..." accept-charset="UTF-8">
This might not be sufficient to solve your problem, but it is often necessary to force the client browser to utf8-encode the form data that gets sent to the server.
Have you tried replacing the ě's with their octal or hex escapes? Unfortunately, there doesn't seem to be an HTML code for the character.

Perl Escape::HTML function doesn't escape #?

Managed to narrow the code down a lot more:
http://pastebin.com/J40Atm9m
Sorry to be a pain, but I really thought I had it cracked by using uri_escape in the GetQueryString subroutine. Now I'm really out of ideas, otherwise I wouldn't ask.
Any insights are much appreciated.
Martin
That is a lot of code. A reduced test case would be helpful.
Rather than read all of it, I'm going to assume that this is what you are doing:
You get raw data
You put raw data in a URI
You encode the URI for HTML
You put the encoded URI in the HTML
If so, then what you missed is this:
You need to encode the data for the URI.
HTML::Escape isn't supposed to escape "#" because "#" isn't unsafe for HTML. The problem is that you're not URI-escaping your data before putting it into a URI; use URI::Escape for that.
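The two layers of escaping can be sketched as follows, in JavaScript rather than Perl for illustration only. encodeURIComponent plays the role of uri_escape, and the hand-written escapeHtml plays the role of the HTML escaping step; both helper names, the /search path, and the parameter names are hypothetical and not taken from the linked code.

```javascript
// Escape the characters that are unsafe in HTML text and attribute values.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Step 1: URI-escape each piece of raw data while building the URI.
// Step 2: HTML-escape the finished URI when placing it in the markup.
function buildLink(base, params) {
  const query = Object.entries(params)
    .map(([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(value))
    .join("&");
  const url = base + "?" + query;
  return '<a href="' + escapeHtml(url) + '">' + escapeHtml(url) + "</a>";
}

// "#" becomes %23 in the URI step; "&" becomes &amp; in the HTML step.
console.log(buildLink("/search", { q: "C#", lang: "en" }));
// <a href="/search?q=C%23&amp;lang=en">/search?q=C%23&amp;lang=en</a>
```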

UTF-8 incorrectly displayed in Lua/ Corona

In Lua, for an iPad Corona project, I'm requesting a UTF-8 server text file (containing Chinese characters) using network.request, but the result when displayed in the console or in the app shows as "garbage". Google Chrome, for instance, displays the same UTF-8 page fine, as I'm setting the http header when the server sends this (using PHP) to 'Content-Type: text/plain; charset=utf-8' (and there's no BOM, byte order mark either). The "garbage" I'm seeing in Lua looks similar to when I "force" Chrome to render the page as ISO-8859-1 using the options menu.
Does anyone have any help or pointers?
If all else fails, how would I convert the "garbage" string back to its UTF-8 origins within Lua?
Thanks for any help!
Lua doesn't know anything about UTF-8; Lua strings are just sequences of bytes. It sounds like Corona itself is parsing the strings as ISO-8859-1. The most likely cause for this is them doing something really stupid and naive like treating each byte of the string as a Unicode code point.
I'm afraid I don't know Corona, so can't provide any specific solutions, but I'd suggest looking to see what functions it's got that involve encodings --- there may be a specific function to render a string with a particular encoding, for example.
Can you show the code for your network.request() call?
If you're downloading an HTML page, you should use network.download().
I had this exact same problem, except with Japanese characters. Although Lua doesn't support UTF-8, Corona acts like it does. That means that if you pass a UTF-8 string to display.newText(...), it should display properly. If you output it to the console, however, it will print the raw bytes of the string, and if you print the length of the string, you will get the number of bytes.
So, in summary, Lua treats all strings as an array of bytes. It knows nothing about UTF-8. Some Corona API methods, when passed UTF-8 strings, will display the strings correctly.
I had issues when I mixed UTF-8 with plain ASCII characters, which I believe confused Corona (what I mean is that I mixed English characters with Japanese characters... still all UTF-8, though). I have a hunch that each character in the string must be of the same length in bytes for Corona to display it properly. Try printing out one character at a time to see if that helps. Please feel free to post comments here if you run into trouble. I'd like to figure this issue out myself, too.
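The byte-versus-character distinction in the answers above, and the question of how to turn the "garbage" back into UTF-8, can be illustrated with a small JavaScript sketch (not Lua, and not Corona's API). It assumes the garbling is exactly "each UTF-8 byte was treated as one code point", which is what an ISO-8859-1 interpretation does; the function names are hypothetical.

```javascript
// Misread UTF-8 bytes as if each byte were one character
// (what an ISO-8859-1 interpretation does).
function misreadAsLatin1(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

// Undo the misreading: turn each character back into one byte,
// then decode the byte sequence as UTF-8.
function recoverUtf8(garbled) {
  const bytes = Uint8Array.from(garbled, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

const original = "中文";
const garbled = misreadAsLatin1(original);

console.log(garbled.length);                    // 6: one "character" per UTF-8 byte
console.log(recoverUtf8(garbled) === original); // true
```

Plain ASCII survives the misreading unchanged, because each ASCII character is a single byte in UTF-8.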

How can I encode spaces as %20 in a POST from WWW::Mechanize?

I'm using WWW::Mechanize to do some standard website traversal, but at one point I have to construct a special POST request and send it off. All this requires session cookies.
In the POST request I'm making, spaces are being encoded to + symbols, but I need them encoded as a %20.
I can't figure out how to alter this behaviour. I realise that they are equivalent, but for reasons that are out of my hands, this is what I have to do.
Thanks for any help.
This is hard-coded in URI::_query::query_form(). It translates the spaces to +.
$val =~ s/ /+/g;
It then calls URI::_query::query with the joined pairs, where the only + signs should be encoded spaces. The easiest thing to do is probably to intercept calls to URI::_query::query with Hook::LexWrap, modify the argument before the call starts so you can turn + into %20, and go on from there.
A little bit more annoying would be to redefine URI::_query::query. It's not that long, and you just need to add some code at the beginning of the subroutine to transform the arguments before it continues.
Or, you can fix the broken parser on the other side. :)
I have a couple chapters on dealing with method overriding and dynamic subroutines in Mastering Perl. The trick is to do it without changing the original source so you don't introduce new problems for everyone else.
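The reason the rewrite is safe, namely that the only + signs left in the joined query string are encoded spaces, can be shown with a short JavaScript sketch. This illustrates only the string transformation; it is not Perl, and it does not show Hook::LexWrap or the internals of URI::_query. URLSearchParams stands in for the form encoder, and the function name is hypothetical.

```javascript
// Form-encode the fields (spaces become "+", literal "+" becomes "%2B"),
// then rewrite the remaining "+" signs as "%20".
function encodeFormWithPercent20(fields) {
  const body = new URLSearchParams(fields).toString();
  return body.replace(/\+/g, "%20");
}

console.log(encodeFormWithPercent20({ q: "a b+c", name: "x y" }));
// q=a%20b%2Bc&name=x%20y
```

Because a literal plus sign has already been escaped as %2B before the replacement runs, the substitution cannot corrupt the data.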
This appears to be hardcoded in URI::_query::query_form(). I'd conditionally modify that based on a global as is done with $URI::DEFAULT_QUERY_FORM_DELIMITER and submit your change to the URI maintainer.
Other than that, perhaps you could use a LWP::UserAgent request_prepare callback handler?