Perl Escape::HTML function doesn't escape #? - perl

Managed to narrow the code down a lot more:
http://pastebin.com/J40Atm9m
Sorry to be a pain, but I really thought I had it cracked by using uri_escape in the GetQueryString subroutine. Now I'm out of ideas; otherwise I wouldn't ask.
Any insights are much appreciated.
Martin

That is a lot of code. A reduced test case would be helpful.
Rather than read all of it, I'm going to assume that this is what you are doing:
You get raw data
You put raw data in a URI
You encode the URI for HTML
You put the encoded URI in the HTML
If so, then what you missed is this:
You need to encode the data for the URI.

HTML::Escape isn't supposed to escape "#" because "#" isn't unsafe for HTML. The problem is that you're not URI-escaping your data before putting it into a URI; use URI::Escape for that.
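To make the order of the two escaping steps concrete, here is a minimal sketch in Python rather than Perl (urllib.parse.quote plays the role of uri_escape, html.escape the role of the HTML escaper). The function name build_link and the sample URL are made up for illustration and are not from the pastebin code.

```python
from html import escape
from urllib.parse import quote


def build_link(base_url, param, raw_value):
    """Hypothetical helper: URI-encode the data first, then HTML-escape the URI."""
    # Step 1: percent-encode the raw data for the URI ("#" becomes "%23").
    uri = f"{base_url}?{param}={quote(raw_value, safe='')}"
    # Step 2: HTML-escape the finished URI for use inside an attribute.
    return f'<a href="{escape(uri, quote=True)}">link</a>'


# HTML escaping alone leaves "#" untouched, because "#" is safe in HTML.
html_only = escape("c#")
link = build_link("/search", "tag", "c#")
```

Skipping step 1 leaves a literal "#" in the URI, which the browser then treats as the start of the fragment.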

Related

php preg_replace string doesn't work

I tried to minify a JSON file with str_replace(), but it doesn't work the way I used it.
//I want to minify a json file with php.
//Here I am trying to replace ", {" with ",{"
//$result = preg_replace('/abc/', 'def', $string); # Replace all 'abc' with 'def'
$new = preg_replace('/, {/', ',{', $new); //doesn't work.. why?
As for the specific issue, { is a special character in regular expressions and you need to escape it. See the Meta-characters section of PCRE syntax in the PHP manual. So change the first argument to '/, \{/'. Never mind, as @Hugo demonstrated, it should work, and without telling us how your approach failed, we can't help more.
More importantly, this is terribly error-prone. What about a JSON string like ["hello, {name}"]? Your attempt will incorrectly "minify" the part inside the quotes and turn it into ["hello,{name}"]. Not a critical bug in this case, but it might be more severe in other cases. Handling string literals properly is a pain; the simplest way to actually minify JSON is json_encode(json_decode($json)), since PHP by default does not pretty-print or put unnecessary whitespace into JSON.
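The same point can be sketched in Python instead of PHP: json.loads and json.dumps stand in for json_decode and json_encode. The sample document and the name minify_json are made up for illustration.

```python
import json


def minify_json(text):
    """Parse the JSON, then re-serialize it with the most compact separators."""
    return json.dumps(json.loads(text), separators=(",", ":"))


pretty = '[ "hello, {name}", { "a": 1 } ]'

# Naive text replacement also rewrites the inside of the string literal.
naive = pretty.replace(", {", ",{")

# Round-tripping through a parser only drops whitespace between tokens.
compact = minify_json(pretty)
```

Here naive still parses, but its first element has silently become "hello,{name}", while compact keeps the string intact.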
And finally, maybe you don't really need to do this. If you are doing this to save HTTP traffic or something, just make sure your server gzips responses, caches properly, etc.

What kind of encoding is this (not base64)?

Recently I've been working with an API that returns response codes like this:
P257SIae5AEchhrQy6
I've tried Base64 and it doesn't seem to be that. Any clue? It could still be some sort of Base64, because sometimes it ends like " VEPJlm/a2cDz9JMY+ignA= ", which uses = as padding.
edit:
NjFpSYZO7LQByUEhBVWKR6R69DLVKdJxvC+PfsQAvdkADzYl/P257SIae5AEchhrQy6/Gx2FCtHP7ykVmc6kHQRczgq3WF3AvNAHuPsMeXphjWbHGUGyuEz5Jd4QD9xt0UdZVFt/tQW6+l+CkSA3U1CwsV8n787tB+t/XbB42F57k1LpT39OUYTvRS4lbnq3
It may not be any encoding. Just a unique id.
It is probably Base64 encoded binary data.
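A quick way to test the Base64 guess, as a rough Python sketch: standard Base64 uses only A-Z, a-z, 0-9, "+" and "/", pads with "=", and always has a length that is a multiple of 4. The helper name try_base64 is made up. Passing this check does not prove the data is Base64, and the decoded bytes may still be encrypted or compressed.

```python
import base64
import binascii
import re

# Standard Base64 alphabet, with at most two "=" padding characters at the end.
_B64_RE = re.compile(r"[A-Za-z0-9+/]*={0,2}")


def try_base64(text):
    """Return the decoded bytes if text is well-formed Base64, else None."""
    candidate = text.strip()
    if len(candidate) % 4 != 0 or not _B64_RE.fullmatch(candidate):
        return None
    try:
        return base64.b64decode(candidate, validate=True)
    except binascii.Error:
        return None


# The short sample from the question is 18 characters long, so on its own
# it cannot be a complete Base64 value; it may be a fragment of a longer one.
short_sample = try_base64("P257SIae5AEchhrQy6")
known_good = try_base64("aGVsbG8=")
```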
Is this over a serial line? May be a mismatched baudrate, parity, data/stop bit or flow control handshake problem.

Regex to remove HTML-head-tag

How can I remove the entire head tag from an HTML file with NSRegularExpression? Can someone give me a regex?
Thanks in advance,
Ph99Ph
There is none! HTML is a type-2 language and thus not parsable with a regular expression (type-3).
See this wiki article in case of doubt.
Lots of people use regex for parsing/editing HTML. This works quite well in simple cases but is utterly error prone.
This being said: You should have fairly reliable results with this regex:
<head>.+?</head>
This requires "." to also match line breaks. If it doesn't, then use this:
<head>(?:.|\n|\r)+?</head>
Again: This is error prone, don't do it.
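For illustration only, here is the first pattern tried out in Python instead of NSRegularExpression, with re.DOTALL as the switch that makes "." match line breaks. The name strip_head and the sample markup are made up. The second call shows one of the ways this breaks.

```python
import re

# re.DOTALL lets "." match line breaks, so the head may span several lines.
HEAD_RE = re.compile(r"<head>.+?</head>", re.DOTALL)


def strip_head(html_text):
    """Remove the first-to-last-line span of a plain <head>...</head> element."""
    return HEAD_RE.sub("", html_text)


simple = strip_head("<html><head>\n<title>t</title>\n</head><body>x</body></html>")

# An attribute on the tag is enough to defeat the pattern: nothing is removed.
with_attr = strip_head('<html><head lang="en"><title>t</title></head></html>')
```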
What you should use is an XML parser such as NSXMLParser.
Please see the accepted answer at RegEx match open tags except XHTML self-contained tags. Or any version of this exact same question posted each day since the beginning of Stack Overflow.
In short, you cannot reliably parse HTML with Regular Expressions. RegEx is simply not advanced enough because of the complexities of HTML.
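Use something like this:
// Normalize any opening head tag (extra spaces, attributes) to <head>.
// The \s keeps the pattern from also matching <header>.
result = System.Text.RegularExpressions.Regex.Replace(result,
@"<( )*head(\s[^>]*)?>", "<head>",
System.Text.RegularExpressions.RegexOptions.IgnoreCase);
// Normalize any closing head tag to </head>.
result = System.Text.RegularExpressions.Regex.Replace(result,
@"(<( )*(/)( )*head( )*>)", "</head>",
System.Text.RegularExpressions.RegexOptions.IgnoreCase);
// Remove the whole element. Singleline lets "." match line breaks,
// and the lazy .*? stops at the first </head>.
result = System.Text.RegularExpressions.Regex.Replace(result,
"(<head>).*?(</head>)", " ",
System.Text.RegularExpressions.RegexOptions.IgnoreCase |
System.Text.RegularExpressions.RegexOptions.Singleline);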

How to avoid malformed URI sequence error?

I'm working with Perl. I have data saved in the database as  “
and I want to escape those characters to avoid a "malformed URI sequence" error on the client side. This error seems to happen in Firefox only. The fix I found while googling is not to use decodeURI, yet I need it for other characters to be displayed correctly.
Any help? uri_escape does not seem enough on the server side.
Thanks in advance.
Details:
In Perl I'm doing the following:
print "<div style='display:none;' id='summary_".$note_count."_note'>".uri_escape($summary)."</div>";
On the JavaScript side, I want to read from this div and put its content somewhere else, like this:
getObj('summary_div').innerHTML= unescape(decodeURI(note_obj.innerHTML));
where note_obj is the hidden div that holds the summary written by Perl.
When I remove decodeURI the problem is solved: I don't get the malformed URI sequence error in JavaScript. Yet I need decodeURI for other characters.
The issue reproduces in Firefox and IE7.
You can try the CGI module and do:
$uri = CGI::escape($uri);
It may depend on the context in which you are trying to escape the URI.
This worked fine for me in a CGI context.
Now that you have added details, I can suggest:
print "<div style='display:none;' id='summary_".$note_count."_note'>".CGI::escape($summary)."</div>";
URL escaping won't help you here -- that's for escaping URLs, not escaping text in HTML. What you really want is to encode the string when you output it. See the Encode.pm built-in library. Make sure that you get your charset statements right in the HTTP headers: "Content-Type: text/html; charset=UTF-8" or something like that.
If you're unlucky, you may also have to decode the string as it comes out of the database. That depends on the database driver and the encoding...
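One plausible cause of the error, assuming the summary reaches uri_escape as single-byte text (Windows-1252 here, which is a guess) rather than UTF-8: JavaScript's decodeURI only accepts percent escapes that form valid UTF-8. The Python sketch below stands in for both sides, with quote as the escaper and a strict unquote as the decoder. The helper name decodes_as_utf8 is made up.

```python
from urllib.parse import quote, unquote

text = "\u201c"  # left double quotation mark

# Escaping the UTF-8 bytes gives three escapes that decode cleanly.
as_utf8 = quote(text, encoding="utf-8")

# Escaping the single Windows-1252 byte gives one escape that is not valid UTF-8.
as_cp1252 = quote(text, encoding="cp1252")


def decodes_as_utf8(escaped):
    """Mimic a strict UTF-8 decoder: report whether the escapes are well-formed."""
    try:
        unquote(escaped, encoding="utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False
```

If that assumption holds, encoding the string as UTF-8 before escaping it on the Perl side should make the client-side error go away.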

How can I encode spaces as %20 in a POST from WWW::Mechanize?

I'm using WWW::Mechanize to do some standard website traversal, but at one point I have to construct a special POST request and send it off. All this requires session cookies.
In the POST request I'm making, spaces are being encoded to + symbols, but I need them encoded as a %20.
I can't figure out how to alter this behaviour. I realise that they are equivalent, but for reasons that are out of my hands, this is what I have to do.
Thanks for any help.
This is hard-coded in URI::_query::query_form(). It translates the spaces to +.
$val =~ s/ /+/g;
It then calls URI::_query::query with the joined pairs, where the only + signs should be encoded spaces. The easiest thing to do is probably to intercept calls to URI::_query::query with Hook::LexWrap, modify the argument before the call starts so you can turn + into %20, and go on from there.
A little bit more annoying would be to redefine URI::_query::query. It's not that long, and you just need to add some code at the beginning of the subroutine to transform the arguments before it continues.
Or, you can fix the broken parser on the other side. :)
I have a couple chapters on dealing with method overriding and dynamic subroutines in Mastering Perl. The trick is to do it without changing the original source so you don't introduce new problems for everyone else.
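Why rewriting "+" to "%20" after the fact is safe can be shown with a small Python sketch (urllib.parse instead of URI.pm, and the form data is made up): in form encoding a literal plus sign in the data is already escaped as %2B, so every remaining "+" is an encoded space.

```python
from urllib.parse import quote, urlencode

form = {"q": "a b+c"}

# Default form encoding: space becomes "+", a literal "+" becomes "%2B".
plus_style = urlencode(form)

# Post-process the finished query string, as the wrapper approach would.
rewritten = plus_style.replace("+", "%20")

# For comparison: ask the encoder for %20 directly.
direct = urlencode(form, quote_via=quote)
```

Both routes end up with the same query string, so the wrapper only has to do a plain substitution on the joined pairs.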
This appears to be hardcoded in URI::_query::query_form(). I'd conditionally modify that based on a global as is done with $URI::DEFAULT_QUERY_FORM_DELIMITER and submit your change to the URI maintainer.
Other than that, perhaps you could use a LWP::UserAgent request_prepare callback handler?