Need token delimiter that doesn't conflict with PowerShell or RegEx special characters

Need token delimiter that doesn't conflict with PowerShell or RegEx special characters - powershell

I have written a PowerShell function that expands tokens quite nicely. Unfortunately, I used % as my token delimiter, and now I find that % is a special character in PowerShell v3 that is likely to conflict. I am thinking I might use ~ as my delimiter, but I wonder if there is a best practice here? Something that is sure to not conflict with either PowerShell or RegEx, and since my users are not always IT folks, and only need to use my tool a few weeks out of the year, something that is really obvious would be helpful.
I am currently expanding tokens before I do any other RegEx processing, and before I do any $ExecutionContext.InvokeCommand.ExpandString(), but I keep finding new opportunities to expand the tool and I may be doing those things earlier at some point, so I want to future proof the data as much as possible.
Any advice is greatly appreciated!
Gordon

How about using # as your delimiter?
It's not used at all in regex. In Powershell is designates a comment, but only if it appears outside the context of a quoted string literal. In this application you're using it as a string delimiter, so it's always going to be used in a quoted string argument.

Related

Is double escaping in postgres enough to prevent SQL injections/attacks? (Alternative to using parameters) [duplicate]

I realize that parameterized SQL queries is the optimal way to sanitize user input when building queries that contain user input, but I'm wondering what is wrong with taking user input and escaping any single quotes and surrounding the whole string with single quotes. Here's the code:
sSanitizedInput = "'" & Replace(sInput, "'", "''") & "'"
Any single-quote the user enters is replaced with double single-quotes, which eliminates the users ability to end the string, so anything else they may type, such as semicolons, percent signs, etc., will all be part of the string and not actually executed as part of the command.
We are using Microsoft SQL Server 2000, for which I believe the single-quote is the only string delimiter and the only way to escape the string delimiter, so there is no way to execute anything the user types in.
I don't see any way to launch an SQL injection attack against this, but I realize that if this were as bulletproof as it seems to me someone else would have thought of it already and it would be common practice.
What's wrong with this code? Is there a way to get an SQL injection attack past this sanitization technique? Sample user input that exploits this technique would be very helpful.
UPDATE:
I still don't know of any way to effectively launch a SQL injection attack against this code. A few people suggested that a backslash would escape one single-quote and leave the other to end the string so that the rest of the string would be executed as part of the SQL command, and I realize that this method would work to inject SQL into a MySQL database, but in SQL Server 2000 the only way (that I've been able to find) to escape a single-quote is with another single-quote; backslashes won't do it.
And unless there is a way to stop the escaping of the single-quote, none of the rest of the user input will be executed because it will all be taken as one contiguous string.
I understand that there are better ways to sanitize input, but I'm really more interested in learning why the method I provided above won't work. If anyone knows of any specific way to mount a SQL injection attack against this sanitization method I would love to see it.

First of all, it's just bad practice. Input validation is always necessary, but it's also always iffy.
Worse yet, blacklist validation is always problematic, it's much better to explicitly and strictly define what values/formats you accept. Admittedly, this is not always possible - but to some extent it must always be done.
Some research papers on the subject:
http://www.imperva.com/docs/WP_SQL_Injection_Protection_LK.pdf
http://www.it-docs.net/ddata/4954.pdf (Disclosure, this last one was mine ;) )
https://www.owasp.org/images/d/d4/OWASP_IL_2007_SQL_Smuggling.pdf (based on the previous paper, which is no longer available)
Point is, any blacklist you do (and too-permissive whitelists) can be bypassed. The last link to my paper shows situations where even quote escaping can be bypassed.
Even if these situations do not apply to you, it's still a bad idea. Moreover, unless your app is trivially small, you're going to have to deal with maintenance, and maybe a certain amount of governance: how do you ensure that its done right, everywhere all the time?
The proper way to do it:
Whitelist validation: type, length, format or accepted values
If you want to blacklist, go right ahead. Quote escaping is good, but within context of the other mitigations.
Use Command and Parameter objects, to preparse and validate
Call parameterized queries only.
Better yet, use Stored Procedures exclusively.
Avoid using dynamic SQL, and dont use string concatenation to build queries.
If using SPs, you can also limit permissions in the database to executing the needed SPs only, and not access tables directly.
you can also easily verify that the entire codebase only accesses the DB through SPs...

Okay, this response will relate to the update of the question:
"If anyone knows of any specific way to mount a SQL injection attack against this sanitization method I would love to see it."
Now, besides the MySQL backslash escaping - and taking into account that we're actually talking about MSSQL, there are actually 3 possible ways of still SQL injecting your code
sSanitizedInput = "'" & Replace(sInput, "'", "''") & "'"
Take into account that these will not all be valid at all times, and are very dependant on your actual code around it:
Second-order SQL Injection - if an SQL query is rebuilt based upon data retrieved from the database after escaping, the data is concatenated unescaped and may be indirectly SQL-injected. See
String truncation - (a bit more complicated) - Scenario is you have two fields, say a username and password, and the SQL concatenates both of them. And both fields (or just the first) has a hard limit on length. For instance, the username is limited to 20 characters. Say you have this code:
username = left(Replace(sInput, "'", "''"), 20)
Then what you get - is the username, escaped, and then trimmed to 20 characters. The problem here - I'll stick my quote in the 20th character (e.g. after 19 a's), and your escaping quote will be trimmed (in the 21st character). Then the SQL
sSQL = "select * from USERS where username = '" + username + "' and password = '" + password + "'"
combined with the aforementioned malformed username will result in the password already being outside the quotes, and will just contain the payload directly.
3. Unicode Smuggling - In certain situations, it is possible to pass a high-level unicode character that looks like a quote, but isn't - until it gets to the database, where suddenly it is. Since it isn't a quote when you validate it, it will go through easy... See my previous response for more details, and link to original research.

In a nutshell: Never do query escaping yourself. You're bound to get something wrong. Instead, use parameterized queries, or if you can't do that for some reason, use an existing library that does this for you. There's no reason to be doing it yourself.

I realize this is a long time after the question was asked, but ..
One way to launch an attack on the 'quote the argument' procedure is with string truncation.
According to MSDN, in SQL Server 2000 SP4 (and SQL Server 2005 SP1), a too long string will be quietly truncated.
When you quote a string, the string increases in size. Every apostrophe is repeated.
This can then be used to push parts of the SQL outside the buffer. So you could effectively trim away parts of a where clause.
This would probably be mostly useful in a 'user admin' page scenario where you could abuse the 'update' statement to not do all the checks it was supposed to do.
So if you decide to quote all the arguments, make sure you know what goes on with the string sizes and see to it that you don't run into truncation.
I would recommend going with parameters. Always. Just wish I could enforce that in the database. And as a side effect, you are more likely to get better cache hits because more of the statements look the same. (This was certainly true on Oracle 8)

I've used this technique when dealing with 'advanced search' functionality, where building a query from scratch was the only viable answer. (Example: allow the user to search for products based on an unlimited set of constraints on product attributes, displaying columns and their permitted values as GUI controls to reduce the learning threshold for users.)
In itself it is safe AFAIK. As another answerer pointed out, however, you may also need to deal with backspace escaping (albeit not when passing the query to SQL Server using ADO or ADO.NET, at least -- can't vouch for all databases or technologies).
The snag is that you really have to be certain which strings contain user input (always potentially malicious), and which strings are valid SQL queries. One of the traps is if you use values from the database -- were those values originally user-supplied? If so, they must also be escaped. My answer is to try to sanitize as late as possible (but no later!), when constructing the SQL query.
However, in most cases, parameter binding is the way to go -- it's just simpler.

Input sanitation is not something you want to half-ass. Use your whole ass. Use regular expressions on text fields. TryCast your numerics to the proper numeric type, and report a validation error if it doesn't work. It is very easy to search for attack patterns in your input, such as ' --. Assume all input from the user is hostile.

It's a bad idea anyway as you seem to know.
What about something like escaping the quote in string like this: \'
Your replace would result in: \''
If the backslash escapes the first quote, then the second quote has ended the string.

Simple answer: It will work sometimes, but not all the time.
You want to use white-list validation on everything you do, but I realize that's not always possible, so you're forced to go with the best guess blacklist. Likewise, you want to use parametrized stored procs in everything, but once again, that's not always possible, so you're forced to use sp_execute with parameters.
There are ways around any usable blacklist you can come up with (and some whitelists too).
A decent writeup is here: http://www.owasp.org/index.php/Top_10_2007-A2
If you need to do this as a quick fix to give you time to get a real one in place, do it. But don't think you're safe.

There are two ways to do it, no exceptions, to be safe from SQL-injections; prepared statements or prameterized stored procedures.

If you have parameterised queries available you should be using them at all times. All it takes is for one query to slip through the net and your DB is at risk.

Patrick, are you adding single quotes around ALL input, even numeric input? If you have numeric input, but are not putting the single quotes around it, then you have an exposure.

Yeah, that should work right up until someone runs SET QUOTED_IDENTIFIER OFF and uses a double quote on you.
Edit: It isn't as simple as not allowing the malicious user to turn off quoted identifiers:
The SQL Server Native Client ODBC driver and SQL Server Native Client OLE DB Provider for SQL Server automatically set QUOTED_IDENTIFIER to ON when connecting. This can be configured in ODBC data sources, in ODBC connection attributes, or OLE DB connection properties. The default for SET QUOTED_IDENTIFIER is OFF for connections from DB-Library applications.
When a stored procedure is created, the SET QUOTED_IDENTIFIER and SET ANSI_NULLS settings are captured and used for subsequent invocations of that stored procedure.
SET QUOTED_IDENTIFIER also corresponds to the QUOTED_IDENTIFER setting of ALTER DATABASE.
SET QUOTED_IDENTIFIER is set at parse time. Setting at parse time means that if the SET statement is present in the batch or stored procedure, it takes effect, regardless of whether code execution actually reaches that point; and the SET statement takes effect before any statements are executed.
There's a lot of ways QUOTED_IDENTIFIER could be off without you necessarily knowing it. Admittedly - this isn't the smoking gun exploit you're looking for, but it's a pretty big attack surface. Of course, if you also escaped double quotes - then we're back where we started. ;)

Your defence would fail if:
the query is expecting a number rather than a string
there were any other way to represent a single quotation mark, including:
an escape sequence such as \039
a unicode character
(in the latter case, it would have to be something which were expanded only after you've done your replace)

What ugly code all that sanitisation of user input would be! Then the clunky StringBuilder for the SQL statement. The prepared statement method results in much cleaner code, and the SQL Injection benefits are a really nice addition.
Also why reinvent the wheel?

Rather than changing a single quote to (what looks like) two single quotes, why not just change it to an apostrophe, a quote, or remove it entirely?
Either way, it's a bit of a kludge... especially when you legitimately have things (like names) which may use single quotes...
NOTE: Your method also assumes everyone working on your app always remembers to sanitize input before it hits the database, which probably isn't realistic most of the time.

I'm not sure about your case, but I just encountered a case in Mysql that Replace(value, "'", "''") not only can't prevent SQL injection, but also causes the injection.
if an input ended with \', it's OK without replace, but when replacing the trailing ', the \ before end of string quote causes the SQL error.

While you might find a solution that works for strings, for numerical predicates you need to also make sure they're only passing in numbers (simple check is can it be parsed as int/double/decimal?).
It's a lot of extra work.

It might work, but it seems a little hokey to me. I'd recommend verifing that each string is valid by testing it against a regular expression instead.

Yes, you can, if...
After studying the topic, I think input sanitized as you suggested is safe, but only under these rules:
you never allow string values coming from users to become anything else than string literals (i.e. avoid giving configuration option: "Enter additional SQL column names/expressions here:"). Value types other than strings (numbers, dates, ...): convert them to their native data types and provide a routine for SQL literal from each data type.
SQL statements are problematic to validate
you either use nvarchar/nchar columns (and prefix string literals with N) OR limit values going into varchar/char columns to ASCII characters only (e.g. throw exception when creating SQL statement)
this way you will be avoiding automatic apostrophe conversion from CHAR(700) to CHAR(39) (and maybe other similar Unicode hacks)
you always validate value length to fit actual column length (throw exception if longer)
there was a known defect in SQL Server allowing to bypass SQL error thrown on truncation (leading to silent truncation)
you ensure that SET QUOTED_IDENTIFIER is always ON
beware, it is taken into effect in parse-time, i.e. even in inaccessible sections of code
Complying with these 4 points, you should be safe. If you violate any of them, a way for SQL injection opens.

Does psycopg2's "execute()" offer sufficient SQL injection prevention?

Can I sleep easy knowing that no SQL Injection can get past pycopg2?
Of course assuming that I correctly use it. By this I understand that I have to actually use the parameterisation (sp?) feature of the cursor.execute() function, eg
my_cur.execute(insert_statement, value_list)
And NOT something like
my_cur.execute(insert_statement % value_list)
The question is whether there is any value in me ALSO parsing and adding escapes to the strings in value_list.

The question is whether there is any value in me ALSO parsing and adding escapes to the strings in value_list.
No, you should not need to do that. The entire point of the two-argument form is to avoid having to escape strings. If you escape them manually, psycopg2 will escape them again, so that the escaped form is visible to end users. This is probably not what you intend.

lex default token definition syntax

I guess this is a simple question, but I have found no reference. I have a small lex file defining some tokens from a string and altering them (actually converting them to uppercase).
Basically it is a list of commands like this:
word {setToUppercase(yytext);}
Where setToUppercase is a procedure to change case and store it.
I need to have the complete entry string with the altered words. Is there a way to define a default token / rest of tokens so I can asociate them with an unaltered storage in an output string?

You can do that in one shot with:
.|\n {save_str(yytext);}

I said it was an easy one.
. {save_str(yytext);}
\n {save_str(yytext);}
This way all characters and newline are treated.

Regular expression to prevent SQL injection

I know I have to escape single quotes, but I was just wondering if there's any other character, or text string I should guard against
I'm working with mysql and h2 database...

If you check the MySQL function mysql-real-escape-string which is used by all upper level languages you'll see that the strange characters list is quite huge:
\
'
"
NUL (ASCII 0)
\n
\r
Control+Z
The upper language wrappers like the PHP one may also protect the strings from malformed unicode characters which may end up as a quote.
The conclusion is: do not escape strings, especially with hard-to-debug hard-to-read, hard-to-understand regular expressions. Use the built-in provided functions or use parameterized SQL queries (where all parameters cannot contain anything interpredted as SQL by the engine). This is also stated in h2 documentation: h2 db sql injection protection.
A simple solution for the problem above is to use a prepared statement:

This will somewhat depend on what type of information you need to obtain from the user. If you are only looking for simple text, then you might as well ignore all special characters that a user might input (if it's not too much trouble)--why allow the user to input characters that don't make sense in your query?
Some languages have functions that will take care of this for you. For example, PHP has the mysql_real_escape_string() function (http://php.net/manual/en/function.mysql-real-escape-string.php).
You are correct that single quotes (') are user input no-no's; but double quotes (") and backslashes (\) should also definitely be ignored (see the above link for which characters the PHP function ignores, since those are the most important and basic ones).
Hope this is at least a good start!

Are quotes around hash keys a good practice in Perl?

Is it a good idea to quote keys when using a hash in Perl?
I am working on an extremely large legacy Perl code base and trying to adopt a lot of the best practices suggested by Damian Conway in Perl Best Practices. I know that best practices are always a touchy subject with programmers, but hopefully I can get some good answers on this one without starting a flame war. I also know that this is probably something that a lot of people wouldn't argue over due to it being a minor issue, but I'm trying to get a solid list of guidelines to follow as I work my way through this code base.
In the Perl Best Practices book by Damian Conway, there is this example which shows how alignment helps legibility of a section of code, but it doesn't mention (anywhere in the book that I can find) anything about quoting the hash keys.
$ident{ name } = standardize_name($name);
$ident{ age } = time - $birth_date;
$ident{ status } = 'active';
Wouldn't this be better written with quotes to emphasize that you are not using bare words?
$ident{ 'name' } = standardize_name($name);
$ident{ 'age' } = time - $birth_date;
$ident{ 'status' } = 'active';

Without quotes is better. It's in {} so it's obvious that you are not using barewords, plus it is both easier to read and type (two less symbols). But all of this depends on the programmer, of course.

When specifying constant string hash keys, you should always use (single) quotes. E.g., $hash{'key'} This is the best choice because it obviates the need to think about this issue and results in consistent formatting. If you leave off the quotes sometimes, you have to remember to add them when your key contains internal hypens, spaces, or other special characters. You must use quotes in those cases, leading to inconsistent formatting (sometimes unquoted, sometimes quoted). Quoted keys are also more likely to be syntax-highlighted by your editor.
Here's an example where using the "quoted sometimes, not quoted other times" convention can get you into trouble:
$settings{unlink-devices} = 1; # I saved two characters!
That'll compile just fine under use strict, but won't quite do what you expect at runtime. Hash keys are strings. Strings should be quoted as appropriate for their content: single quotes for literal strings, double quotes to allow variable interpolation. Quote your hash keys. It's the safest convention and the simplest to understand and follow.

I never single-quote hash keys. I know that {} basically works like quotes do, except in special cases (a +, and double-quotes). My editor knows this too, and gives me some color-based cues to make sure that I did what I intended.
Using single-quotes everywhere seems to me like a "defensive" practice perpetrated by people that don't know Perl. Save some keyboard wear and learn Perl :)
With the rant out of the way, the real reason I am posting this comment...the other comments seem to have missed the fact that + will "unquote" a bareword. That means you can write:
sub foo {
$hash{+shift} = 42;
}
or:
use constant foo => 'OH HAI';
$hash{+foo} = 'I AM A LOLCAT';
So it's pretty clear that +shift means "call the shift function" and shift means "the string 'shift'".
I will also point out that cperl-mode highlights all of the various cases correctly. If it doesn't, ping me on IRC and I will fix it :)
(Oh, and one more thing. I do quote attribute names in Moose, as in has 'foo' => .... This is a habit I picked up from working with stevan, and although I think it looks nice... it is a bit inconsistent with the rest of my code. Maybe I will stop doing it soon.)

Quoteless hash keys received syntax-level attention from Larry Wall to make sure that there would be no reason for them to be other than best practice. Don't sweat the quotes.
(Incidentally, quotes on array keys are best practice in PHP, and there can be serious consequences to failing to use them, not to mention tons of E_WARNINGs. Okay in Perl != okay in PHP.)

I don't think there's a best practice on this one. Personally I use them in hash keys like so:
$ident{'name'} = standardize_name($name);
but don't use them to the left of the arrow operator:
$ident = {name => standardize_name($name)};
Don't ask me why, it's just the way I do it :)
I think the most important thing you can do is to always, always, always:
use strict;
use warnings;
That way the compiler will catch any semantic errors for you, leaving you less likely to mistype something, whichever way you decide to go.
And the second most important thing is to be consistent.

I go without quotes, just because it's less to type and read and worry about. The times when I have a key which won't be auto-quoted are few and far between so as not to be worth all the extra work and clutter. Perhaps my choice of hash keys have changed to fit my style, which is just as well. Avoid the edge cases entirely.
It is sort of the same reason I use " by default. It's more common for me to plop a variable in the middle of a string than to use a character that I don't want interpolated. Which is to say, I've more often written 'Hello, my name is $name' than "You owe me $1000".

At least, quoting prevent syntax highlighting reserved words in not-so-perfect editors. Check out:
$i{keys} = $a;
$i{values} = [1,2];
...

I prefer to go without quotes, unless I want some string interpolation. And then I use double quotes. I liken it to literal numbers. Perl would really allow you to do the following:
$achoo['1'] = 'kleenex';
$achoo['14'] = 'hankies';
But nobody does that. And it doesn't help with clarity, simply because we add two more characters to type. Just like sometimes we specifically want slot #3 in an array, sometimes we want the PATH entry out of %ENV. Single-quoting it add no clarity as far as I'm concerned.
The way Perl parses code makes it impossible to use other types of "bare words" in a hash index.
Try
$myhash{shift}
and you're only going to get the item stored in the hash under the 'shift' key, you have to do this
$myhash{shift()}
in order to specify that you want the first argument to index your hash.
In addition, I use jEdit, the ONLY visual editor (that I've seen--besides emacs) that allows you total control over highlighting. So it's doubly clear to me. Anything looking like the former gets KEYWORD3 ($myhash) + SYMBOL ({) + LITERAL2 (shift) + SYMBOL (}) if there is a paranthesis before the closing curly it gets KEYWORD3 + SYMBOL + KEYWORD1 + SYMBOL (()}). Plus I'll likely format it like this as well:
$myhash{ shift() }

Go with the quotes! They visually break up the syntax and more editors will support them in the syntax highlighting (hey, even Stack Overflow is highlighting the quote version). I'd also argue that you'd notice typos quicker with editors checking that you ended your quote.

It is better with quotes because it allows you to use special characters not permitted in barewords. By using quotes I can use the special characters of my mother tongue in hash keys.

I've wondered about this myself, especially when I found I've made some lapses:
use constant CONSTANT => 'something';
...
my %hash = ()
$hash{CONSTANT} = 'whoops!'; # Not what I intended
$hash{word-with-hyphens} = 'whoops!'; # wrong again
What I tend to do now is to apply quotes universally on a per-hash basis if at least one of the literal keys needs them; and use parentheses with constants:
$hash{CONSTANT()} = 'ugly, but what can you do?';

You can precede the key with a "-" (minus character) too, but be aware that this appends the "-" to the beginning of your key. From some of my code:
$args{-title} ||= "Intrig";
I use the single quote, double quote, and quoteless way too. All in the same program :-)

I've always used them without quotes but I would echo the use of strict and warnings as they pick out most of the common mistakes.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse