php preg_replace string doesn't work - preg-replace

I have tried to minify a json file with str_replace() but that doesn't work well as I used it.
//I want to minify a json file with php.
//Here I am trying to replace ", {" with ",{"
//$result = preg_replace('/abc/', 'def', $string); # Replace all 'abc' with 'def'
$new = preg_replace('/, {/', ',{', $new); //doesn't work.. why?

As for the specific issue: { is a meta-character in regular expressions (see the Meta-characters section of PCRE syntax in the PHP manual), so my first thought was that you need to escape it and change the first argument to '/, \{/'. Never mind: as @Hugo demonstrated, your pattern should work as written, because PCRE treats a { that does not begin a valid quantifier as a literal character. Without knowing how your approach failed, we can't help more.
More importantly, this is terribly error-prone. What about a JSON string like ["hello, {name}"]? Your attempt will incorrectly "minify" the part inside the quotes and turn it into ["hello,{name}"]. Not a critical bug in this case, but it might be more severe in others. Handling string literals properly is a pain; the simplest way to actually minify JSON is json_encode(json_decode($json)), since PHP by default does not pretty-print or put unnecessary whitespace into JSON.
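To make the decode-then-encode idea concrete, here is a minimal sketch of the same round trip in Python rather than PHP. The function name minify_json and the sample input are made up for illustration. Unlike PHP's json_encode(), Python's json.dumps() adds a space after separators by default, so compact separators are passed explicitly.

```python
import json


def minify_json(text):
    """Parse the JSON text and re-serialize it without optional whitespace."""
    # separators=(",", ":") drops the spaces json.dumps inserts by default.
    # ensure_ascii=False keeps non-ASCII characters as-is instead of \uXXXX.
    return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)


# Hypothetical input: the comma and brace inside the string must survive.
pretty = '[ "hello, {name}", {"a": 1}, {"b": [1, 2]} ]'
print(minify_json(pretty))
```

Because the parser knows where string literals start and end, the text inside the quotes is left untouched. A round trip like this may normalize other details (number formatting, duplicate keys), so treat it as a sketch rather than a byte-exact minifier.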
And finally, maybe you don't really need to do this. If you are doing this to save HTTP traffic or something, just make sure your server gzips responses, caches properly, etc.

Related

Does psycopg2's "execute()" offer sufficient SQL injection prevention?

Can I sleep easy knowing that no SQL injection can get past psycopg2?
Of course assuming that I use it correctly. By this I understand that I have to actually use the parameterisation (sp?) feature of the cursor.execute() function, e.g.
my_cur.execute(insert_statement, value_list)
And NOT something like
my_cur.execute(insert_statement % value_list)
The question is whether there is any value in me ALSO parsing and adding escapes to the strings in value_list.
No, you should not need to do that. The entire point of the two-argument form is to avoid having to escape strings. If you escape them manually, psycopg2 will escape them again, so that the escaped form is visible to end users. This is probably not what you intend.
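The double-escaping effect can be seen in a small sketch. It uses Python's standard-library sqlite3 module as a stand-in so that it runs without a database server; this is an assumption made for illustration, not psycopg2 itself. sqlite3 uses ? placeholders where psycopg2 uses %s, but the two-argument execute() form works on the same principle. The table and values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE people (name TEXT)")

name = "O'Brien"

# Two-argument form: the driver passes the value separately from the SQL,
# so no manual escaping is needed.
cur.execute("INSERT INTO people (name) VALUES (?)", (name,))

# Escaping by hand *and* using parameters stores the escaped form verbatim.
manually_escaped = name.replace("'", "''")
cur.execute("INSERT INTO people (name) VALUES (?)", (manually_escaped,))

rows = [row[0] for row in cur.execute("SELECT name FROM people ORDER BY rowid")]
print(rows)
```

The second row keeps the doubled quote, which is the "escaped form visible to end users" problem described above.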

Regex for strings in Bibtex

I'm trying to write a Bibtex parser with flex/bison. Here are the rules for strings in bibtex:
Strings can be enclosed in double quotes "..." or in braces {...}
In a string, braces can be nested
Inside a string, the braces should be balanced (invalid string: {this is a { test})
Inside an "internal" {}, you can have any characters. So this string is valid: {This is a string {test"} and it is valid}
Any ideas on how to do this?
Now you're going into the field of a text parser. Surprisingly, nobody has made a bibtex library for Actionscript that I could find, so it's an interesting problem. If you do make one, do the community a favor and open source it :)
It won't be easy to do since you essentially have to go character by character and check for the chars that you need and do logic around that. However, I recommend you look at as3corelib's implementation of the JSON parser which is somewhat similar to what you're trying to accomplish. You'll at least get an idea of how to do it using a tokenizer and it's a very good start on your project.
Good luck.
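As a rough illustration of the character-by-character approach, here is a minimal sketch in Python (not flex/bison or ActionScript). It implements only the four rules listed in the question; the function name read_bibtex_string and its return convention are assumptions made for this example, and real BibTeX has further details (such as string concatenation with #) that are not handled.

```python
def read_bibtex_string(text, start=0):
    """Scan one BibTeX string starting at text[start].

    Returns (content, end), where end is the index just past the closing
    delimiter. Raises ValueError if the braces are not balanced.
    """
    opener = text[start]
    if opener not in '{"':
        raise ValueError("a string must start with '{' or '\"'")
    depth = 0  # nesting depth of braces inside the outer delimiters
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            if depth == 0:
                if opener == "{":
                    return text[start + 1:i], i + 1  # outer brace closed
                raise ValueError("unbalanced '}' at index %d" % i)
            depth -= 1
        elif ch == '"' and opener == '"' and depth == 0:
            return text[start + 1:i], i + 1  # closing quote at top level
        # Any other character, including '"' inside braces, is plain content.
        i += 1
    raise ValueError("unterminated string")


print(read_bibtex_string('{This is a string {test"} and it is valid}'))
```

A regular expression alone cannot count arbitrarily nested braces, which is why the depth counter is kept by hand; in flex the same idea is usually expressed with a start condition plus a counter.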

How do I protect against cross-site scripting?

I am using PHP and MySQL with Smarty, and I have places where users can post comments and so on. I've already escaped characters before inserting into the database to guard against SQL injection. What else do I need to do?
XSS is mostly about the HTML-escaping(*). Any time you take a string of plain text and put it into an HTML page, whether that text is from the database, directly from user input, from a file, or from somewhere else entirely, you need to escape it.
The minimal HTML escape is to convert all the & symbols to &amp; and all the < symbols to &lt;. When you're putting something into an attribute value you would also need to escape the quote character being used to delimit the attribute, usually " to &quot;. It does no harm to always escape both quotes (" and the single-quote apostrophe '), and some people also escape > to &gt;, though this is only necessary for one corner case in XHTML.
Any good web-oriented language should provide a function to do this for you. For example in PHP it's htmlspecialchars():
<p> Hello, <?php echo htmlspecialchars($name); ?>! </p>
and in Smarty templates it's the escape modifier:
<p> Hello, {$name|escape:'html'}! </p>
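For comparison, the same output-stage escaping can be sketched in Python with the standard-library html.escape() function. The render_greeting helper and the sample input are hypothetical and only illustrate where the escape belongs: at the point where plain text is placed into HTML.

```python
import html


def render_greeting(name):
    """Build an HTML fragment, escaping the plain-text name on output."""
    # quote=True also escapes " and ', which matters inside attribute values.
    return "<p> Hello, {}! </p>".format(html.escape(name, quote=True))


print(render_greeting('<script>alert("x")</script> & co'))
```

The stored name stays unmodified; only the HTML rendering of it is escaped.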
Really, since HTML-escaping is what you want 95% of the time (it's relatively rare to want to allow raw HTML markup to be included), this should have been the default. Newer templating languages have learned that making HTML-escaping opt-in is a huge mistake that causes endless XSS holes, so they HTML-escape by default.
You can make Smarty behave like this by changing the default modifiers to html. (Don't use htmlall as they suggest there unless you really know what you're doing, or it'll likely screw up all your non-ASCII characters.)
Whatever you do, don't fall into the common PHP mistake of HTML-escaping or “sanitising” for HTML on the input, before it gets processed or put in the database. This is the wrong place to be performing an output-stage encoding and will give you all sorts of problems. If you want to validate your input to make sure it's what the particular application expects, then fine, but weeding out or escaping “special” characters at this stage is inappropriate.
*: Other aspects of XSS are present when (a) you actually want to allow users to post HTML, in which case you have to whittle it down to acceptable elements and attributes, which is a complicated process usually done by a library like HTML Purifier, and even then there have been holes. Alternative, simpler markup schemes may help. And (b) when you allow users to upload files, which is something very difficult to make secure.
In regards to SQL Injection, escaping is not enough - you should use data access libraries where possible and parameterized queries.
For XSS (cross site scripting), start with html encoding outputted data. Again, anti XSS libraries are your friend.
One current approach is to only allow a very limited number of tags in and sanitize those in the process (whitelist + cleanup).
You'll want to make sure people can't post JavaScript code or scary HTML in their comments. I suggest you disallow anything but very basic markup.
If comments are not supposed to contain any markup, doing a
echo htmlspecialchars($commentText);
should suffice, but it's very crude. Better would be to sanitize all input before even putting it in your database. The PHP strip_tags() function could get you started.
If you want to allow HTML comments, but be safe, you could give HTML Purifier a go.
You should not modify data that is entered by the user before putting it into the database. The modification should take place as you're outputting it to the website. You don't want to lose the original data.
As you're spitting it out to the website, you want to escape the special characters into HTML codes using something like htmlspecialchars("my output & stuff", ENT_QUOTES, 'UTF-8') -- make sure to specify the charset you are using. This string will be translated into my output &amp; stuff for the browser to read.
The best way to prevent SQL injection is simply not to use dynamic SQL that accepts user input. Instead, pass the input in as parameters; that way it will be strongly typed and can't inject code.

How can I encode spaces as %20 in a POST from WWW::Mechanize?

I'm using WWW::Mechanize to do some standard website traversal, but at one point I have to construct a special POST request and send it off. All this requires session cookies.
In the POST request I'm making, spaces are being encoded to + symbols, but I need them encoded as a %20.
I can't figure out how to alter this behaviour. I realise that they are equivalent, but for reasons that are out of my hands, this is what I have to do.
Thanks for any help.
This is hard-coded in URI::_query::query_form(). It translates the spaces to +.
$val =~ s/ /+/g;
It then calls URI::_query::query with the joined pairs, where the only + signs should be encoded spaces. The easiest thing to do is probably to intercept calls to URI::_query::query with Hook::LexWrap, modify the argument before the call starts so you can turn + into %20, and go on from there.
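The reason a blanket "+ to %20" rewrite is safe at that point can be illustrated with a short Python sketch using urllib.parse (an analogue chosen for illustration, not the Perl URI module). The field name and value are invented. In form encoding a literal plus sign is already percent-encoded as %2B, so every + left in the encoded string stands for a space.

```python
from urllib.parse import quote, urlencode

fields = {"q": "a b+c"}

# Default form encoding: spaces become '+', a literal '+' becomes %2B.
form_encoded = urlencode(fields)

# Post-processing the encoded string only touches encoded spaces.
rewritten = form_encoded.replace("+", "%20")

# Asking the encoder for %20 directly gives the same result.
direct = urlencode(fields, quote_via=quote)

print(form_encoded, rewritten, direct)
```

A wrapper around the Perl query routine would perform the equivalent of the replace() step on the already-joined pairs.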
A little bit more annoying would be to redefine URI::_query::query. It's not that long, and you just need to add some code at the beginning of the subroutine to transform the arguments before it continues.
Or, you can fix the broken parser on the other side. :)
I have a couple chapters on dealing with method overriding and dynamic subroutines in Mastering Perl. The trick is to do it without changing the original source so you don't introduce new problems for everyone else.
This appears to be hardcoded in URI::_query::query_form(). I'd conditionally modify that based on a global as is done with $URI::DEFAULT_QUERY_FORM_DELIMITER and submit your change to the URI maintainer.
Other than that, perhaps you could use a LWP::UserAgent request_prepare callback handler?

Are quotes around hash keys a good practice in Perl?

Is it a good idea to quote keys when using a hash in Perl?
I am working on an extremely large legacy Perl code base and trying to adopt a lot of the best practices suggested by Damian Conway in Perl Best Practices. I know that best practices are always a touchy subject with programmers, but hopefully I can get some good answers on this one without starting a flame war. I also know that this is probably something that a lot of people wouldn't argue over due to it being a minor issue, but I'm trying to get a solid list of guidelines to follow as I work my way through this code base.
In the Perl Best Practices book by Damian Conway, there is this example which shows how alignment helps legibility of a section of code, but it doesn't mention (anywhere in the book that I can find) anything about quoting the hash keys.
$ident{ name } = standardize_name($name);
$ident{ age } = time - $birth_date;
$ident{ status } = 'active';
Wouldn't this be better written with quotes to emphasize that you are not using bare words?
$ident{ 'name' } = standardize_name($name);
$ident{ 'age' } = time - $birth_date;
$ident{ 'status' } = 'active';
Without quotes is better. It's in {} so it's obvious that you are not using barewords, plus it is both easier to read and type (two fewer symbols). But all of this depends on the programmer, of course.
When specifying constant string hash keys, you should always use (single) quotes, e.g. $hash{'key'}. This is the best choice because it obviates the need to think about this issue and results in consistent formatting. If you leave off the quotes sometimes, you have to remember to add them when your key contains internal hyphens, spaces, or other special characters. You must use quotes in those cases, leading to inconsistent formatting (sometimes unquoted, sometimes quoted). Quoted keys are also more likely to be syntax-highlighted by your editor.
Here's an example where using the "quoted sometimes, not quoted other times" convention can get you into trouble:
$settings{unlink-devices} = 1; # I saved two characters!
That'll compile just fine under use strict, but won't quite do what you expect at runtime. Hash keys are strings. Strings should be quoted as appropriate for their content: single quotes for literal strings, double quotes to allow variable interpolation. Quote your hash keys. It's the safest convention and the simplest to understand and follow.
I never single-quote hash keys. I know that {} basically works like quotes do, except in special cases (a +, and double-quotes). My editor knows this too, and gives me some color-based cues to make sure that I did what I intended.
Using single-quotes everywhere seems to me like a "defensive" practice perpetrated by people that don't know Perl. Save some keyboard wear and learn Perl :)
With the rant out of the way, the real reason I am posting this comment...the other comments seem to have missed the fact that + will "unquote" a bareword. That means you can write:
sub foo {
    $hash{+shift} = 42;
}
or:
use constant foo => 'OH HAI';
$hash{+foo} = 'I AM A LOLCAT';
So it's pretty clear that +shift means "call the shift function" and shift means "the string 'shift'".
I will also point out that cperl-mode highlights all of the various cases correctly. If it doesn't, ping me on IRC and I will fix it :)
(Oh, and one more thing. I do quote attribute names in Moose, as in has 'foo' => .... This is a habit I picked up from working with stevan, and although I think it looks nice... it is a bit inconsistent with the rest of my code. Maybe I will stop doing it soon.)
Quoteless hash keys received syntax-level attention from Larry Wall to make sure that there would be no reason for them to be other than best practice. Don't sweat the quotes.
(Incidentally, quotes on array keys are best practice in PHP, and there can be serious consequences to failing to use them, not to mention tons of E_WARNINGs. Okay in Perl != okay in PHP.)
I don't think there's a best practice on this one. Personally I use them in hash keys like so:
$ident{'name'} = standardize_name($name);
but don't use them to the left of the arrow operator:
$ident = {name => standardize_name($name)};
Don't ask me why, it's just the way I do it :)
I think the most important thing you can do is to always, always, always:
use strict;
use warnings;
That way the compiler will catch any semantic errors for you, leaving you less likely to mistype something, whichever way you decide to go.
And the second most important thing is to be consistent.
I go without quotes, just because it's less to type and read and worry about. The times when I have a key which won't be auto-quoted are so few and far between that quoting is not worth all the extra work and clutter. Perhaps my choice of hash keys has changed to fit my style, which is just as well. Avoid the edge cases entirely.
It is sort of the same reason I use " by default. It's more common for me to plop a variable in the middle of a string than to use a character that I don't want interpolated. Which is to say, I've more often written 'Hello, my name is $name' than "You owe me $1000".
At the very least, quoting prevents not-so-perfect editors from syntax-highlighting the keys as reserved words. Check out:
$i{keys} = $a;
$i{values} = [1,2];
...
I prefer to go without quotes, unless I want some string interpolation. And then I use double quotes. I liken it to literal numbers. Perl would really allow you to do the following:
$achoo['1'] = 'kleenex';
$achoo['14'] = 'hankies';
But nobody does that. And it doesn't help with clarity; it simply adds two more characters to type. Just like sometimes we specifically want slot #3 in an array, sometimes we want the PATH entry out of %ENV. Single-quoting it adds no clarity as far as I'm concerned.
The way Perl parses code makes it impossible to use other types of "bare words" in a hash index.
Try
$myhash{shift}
and you're only going to get the item stored in the hash under the 'shift' key, you have to do this
$myhash{shift()}
in order to specify that you want the first argument to index your hash.
In addition, I use jEdit, the ONLY visual editor (that I've seen, besides emacs) that allows you total control over highlighting. So it's doubly clear to me. Anything looking like the former gets KEYWORD3 ($myhash) + SYMBOL ({) + LITERAL2 (shift) + SYMBOL (}). If there is a parenthesis before the closing curly, it gets KEYWORD3 + SYMBOL + KEYWORD1 + SYMBOL (()}). Plus I'll likely format it like this as well:
$myhash{ shift() }
Go with the quotes! They visually break up the syntax and more editors will support them in the syntax highlighting (hey, even Stack Overflow is highlighting the quote version). I'd also argue that you'd notice typos quicker with editors checking that you ended your quote.
It is better with quotes because it allows you to use special characters not permitted in barewords. By using quotes I can use the special characters of my mother tongue in hash keys.
I've wondered about this myself, especially when I found I've made some lapses:
use constant CONSTANT => 'something';
...
my %hash = ();
$hash{CONSTANT} = 'whoops!'; # Not what I intended
$hash{word-with-hyphens} = 'whoops!'; # wrong again
What I tend to do now is to apply quotes universally on a per-hash basis if at least one of the literal keys needs them; and use parentheses with constants:
$hash{CONSTANT()} = 'ugly, but what can you do?';
You can precede the key with a "-" (minus character) too, but be aware that this prepends the "-" to your key. From some of my code:
$args{-title} ||= "Intrig";
I use the single quote, double quote, and quoteless way too. All in the same program :-)
I've always used them without quotes but I would echo the use of strict and warnings as they pick out most of the common mistakes.