How can I disable Lucene.Net query parser regular expressions - lucene.net

I would like to disable the query parser regular expression feature so that forward slashes will be considered as normal characters without having to be escaped. Is this possible out of the box? If not, how would I create such a query parser?

Related

Complex RegEx in T-SQL

I recently started using RegEx as conditional in my queries, but it seems that T-SQL has limited support for the official syntax.
As an example, I wish to test if a string is valid as a time between 00:00 and 23:59, and a fine RegEx expression would be "([0-1][0-9]|[2][0-3]):([0-5][0-9])":
select iif('16:06' like '([0-1][0-9]|[2][0-3]):([0-5][0-9])', 'Valid', 'Invalid')
.. fails and outputs "Invalid". Am I right to understand that T-SQL cannot handle groupings and conditionals (|)? I wound up lazily using a simplified RegEx which does not properly test the string - which I am fairly unhappy with:
select iif('16:06' like '[0-2][0-9]:[0-5][0-9]', 'Valid, 'Invalid')
.. which returns "Valid", but would also consider the string "28:06" as valid.
I know I can add further checks to fully check if it is a valid time string, but I would much prefer to take full advantage of RegEx.
Simply asked: Am I just doing or thinking things wrong about this being a limitation, and if yes - how can I use proper RegEx in T-SQL?
The pattern syntax used for LIKE and PATINDEX is much more limited than what's commonly known as Regular Expressions.
In standard SQL it actually has only 2 special characters.
% : wildcard for 0 or more characters
_ : any 1 character
And T-SQL added the character class [...] to the syntax.
But to test if a string contains a time, using LIKE is a clumsy way to do it.
In MS Sql Server one can use the TRY_CONVERT or TRY_CAST functions.
They'll return NULL is the conversion to a datatype fails.
select IIF(TRY_CAST('16:06' AS TIME) IS NOT NULL, 'Valid', 'Invalid')
This will return 'Valid' for '23:59', but 'Invalid' for '24:00'
You may use the following logic:
SELECT IIF('16:06' LIKE '[01][0-9]:[0-5][0-9]' OR
'16:06' LIKE '2[0-3]:[0-5][0-9]', 'Valid', 'Invalid');
The first LIKE expression matches 00:00 to 19:59, and the second LIKE matches 20:00 to 23:59. If SQL Server supported full regex, we could just use a single regex expression with an alternation.
I would recommend writing a user defined function using SQLCLR. Since .Net supports Regex you can port it to T-SQL. First link in Google gave this implementation, but there may be other (better) implementations.
Caveat - use of SQLCLR requires elevated permissions and may lead to security issues or performance issues or even issues with stability of the SQL Server if not implemented correctly. But if you know what you are doing this may lead to significant enhancements of T-SQL specific for your use cases.

"sqlLike" and "sqlLikeCaseInsensitive" escape character?

Is there any way to escape SQL Like string when using "sqlLike" and "sqlLikeCaseInsensitive"?
Example: I want a match for "abc_123". Using "_______" (7 underscores) would also return "abcX123", how can I enforce "_" as the 4th character?
If you issue the query in persistence, this is actually not a mdriven issue but an SQL issue as mdriven converts the Expression into SQL. So if you really want to restrict the results to underscores only take a look to this question:
Why does using an Underscore character in a LIKE filter give me all the results?
The way to escape the underscore may depend on the needs of your SQL database as the different answers indicate.

Algolia tag not searchable when ending with special characters

I'm coming across a strange situation where I cannot search on string tags that end with a special character. So far I've tried ) and ].
For example, given a Fruit index with a record with a tag apple (red), if you query (using the JS library) with tagFilters: "apple (red)", no results will be returned even if there are records with this tag.
However, if you change the tag to apple (red (not ending with a special character), results will be returned.
Is this a known issue? Is there a way to get around this?
EDIT
I saw this FAQ on special characters. However, it seems as though even if I set () as separator characters to index that only effects the direct attriubtes that are searchable, not the tag. is this correct? can I change the separator characters to index on tags?
You should try using the array syntax for your tags:
tagFilters: ["apple (red)"]
The reason it is currently failing is because of the syntax of tagFilters. When you pass a string, it tries to parse it using a special syntax, documented here, where commas mean "AND" and parentheses delimit an "OR" group.
By the way, tagFilters is now deprecated for a much clearer syntax available with the filters parameter. For your specific example, you'd use it this way:
filters: '_tags:"apple (red)"'

How to use a text search and parameterized queries in sql without allowing regex injection

So I'm trying to wrap my head around how to combine three conflicting things (bound parameters, regex, partial-match-searching using user input) securely, and I am not sure I have discovered the right/secure way to deal with these things. This shouldn't be an uncommon concern, but the documentation that deals with the intersection of all three security factors for PDO & php is either hard to find or non-existent.
My needs are relatively simple and standard, in order of priority:
I want to prevent sql injection (currently I'm using bound
parameters)
I want to prevent regex injection
I want to search with partial matching using user input strings
So for example, I want to allow a user to search through usernames with a case insensitive partial match, e.g.
A search for Xiu will bring up the username Xiu and Xiulu and also xiuislowercase
Currently I have the following statement:
select * from users where username ilike :search_string || '%'
and elsewhere, I use more complex cases using the regex operator similarly to:
select * from users where username ~* :search_string || '%'
Where :search_string is a bound statement in php pdo, and the database is postgresql.
This performs the right search, returns the right results, and I'm reasonably certain that it is proof against sql injection since it's a bound parameter. However, I'm not certain that it is proof against regex injection, and I have no idea how to make it proof against regex injection at the same time as having it be proof against sql-injection.
How would I completely secure it against regex injection as well, using php, PDO, and postgresql?
LIKE does not support regular expressions, it only has limited pattern-matching metacharacters % and _. So if you escape those two characters with a backslash in the string before you pass it the parameter value, you should be safe.
<?php
$search_string = preg_replace('/[%_]/', '\\\\$0', $search_string);
$pdoStmt->execute(array('search_string'=>$search_string));
Alternatively, you could compare a left-substring of the username to your input, then it's comparing against a fixed string with no metacharacter pattern-matching features.
select * from users where left(username, :search_string_length) = :search_string
Re your comment:
The general rule to avoid code injection is: never execute arbitrary user input as code.
This applies to SQL injection of course, which is why we use parameters to force user input to be interpreted as values, and not modify the syntax of the SQL statement.
But it also applies to the "code" in a regular expression string within an SQL operation. A regular expression is itself a type of code logic, it's a very compact representation of a finite state machine for matching input.
The solution to avoid code injection is that it's okay to let user input choose code (as in whitelisting), but don't let the user input be interpreted as code.

Regular expression to prevent SQL injection

I know I have to escape single quotes, but I was just wondering if there's any other character, or text string I should guard against
I'm working with mysql and h2 database...
If you check the MySQL function mysql-real-escape-string which is used by all upper level languages you'll see that the strange characters list is quite huge:
\
'
"
NUL (ASCII 0)
\n
\r
Control+Z
The upper language wrappers like the PHP one may also protect the strings from malformed unicode characters which may end up as a quote.
The conclusion is: do not escape strings, especially with hard-to-debug hard-to-read, hard-to-understand regular expressions. Use the built-in provided functions or use parameterized SQL queries (where all parameters cannot contain anything interpredted as SQL by the engine). This is also stated in h2 documentation: h2 db sql injection protection.
A simple solution for the problem above is to use a prepared statement:
This will somewhat depend on what type of information you need to obtain from the user. If you are only looking for simple text, then you might as well ignore all special characters that a user might input (if it's not too much trouble)--why allow the user to input characters that don't make sense in your query?
Some languages have functions that will take care of this for you. For example, PHP has the mysql_real_escape_string() function (http://php.net/manual/en/function.mysql-real-escape-string.php).
You are correct that single quotes (') are user input no-no's; but double quotes (") and backslashes (\) should also definitely be ignored (see the above link for which characters the PHP function ignores, since those are the most important and basic ones).
Hope this is at least a good start!