Postgres: LIKE query against a string returns no results, though results exist - postgresql

I feel like I'm missing something really simple here. I'm trying to get a list of results from Postgres, and I know the rows exist since I can see them when I do a simple SELECT *. However, when I try to do a LIKE query, I get nothing back.
Sample data:
{
id: 1,
userId: 2,
cache: '{"dataset":"{\"id\":4,\"name\":\"Test\",\"directory\":\"/data/"...'
};
Example:
SELECT cache FROM storage_table WHERE cache LIKE '%"dataset":"{\"id\":4%';
Is there some escaping I need?

The LIKE operator supports escaping of wildcards (e.g. so that you can search for an underscore or % sign). The default escape character is a backslash.
In order to tell the LIKE operator that you do not want to use the backslash as an ESCAPE character you need to define a different one, e.g. the ~
SELECT cache
FROM storage_table
WHERE cache LIKE '%"dataset":"{\"id\":4%' ESCAPE '~';
You can use any character that does not appear in your data, ESCAPE '#' or ESCAPE 'ยง'
SQLFiddle example: http://sqlfiddle.com/#!15/7703f/1
But you should seriously consider upgrading to a more recent Postgres version that fully supports JSON. Not only will your queries be more robust, they can also be a lot faster due to the ability to index a JSON document.

Related

"sqlLike" and "sqlLikeCaseInsensitive" escape character?

Is there any way to escape SQL Like string when using "sqlLike" and "sqlLikeCaseInsensitive"?
Example: I want a match for "abc_123". Using "_______" (7 underscores) would also return "abcX123", how can I enforce "_" as the 4th character?
If you issue the query in persistence, this is actually not a mdriven issue but an SQL issue as mdriven converts the Expression into SQL. So if you really want to restrict the results to underscores only take a look to this question:
Why does using an Underscore character in a LIKE filter give me all the results?
The way to escape the underscore may depend on the needs of your SQL database as the different answers indicate.

PQgetvalue() strips spaces from result of string_agg()

I have a GNU C++ project that uses the PostgreSQL API and for some reason, it strips spaces from the result of a certain query. Other environments (psql and pgAdmin) don't. The query is:
SELECT string_agg(my_varchar, ', ') FROM my table;
Notice the space after the comma in the delimiter. Instead of 1046976, 1046977 being returned by PQgetvalue(), I get 1046976,1046977. Just for kicks, I tried changing the delimiter to silly things like string_agg(my_varchar, ',:) ' and string_agg(my_varchar, ', :)'. It doesn't strip the space if the space is in the middle of the delimiter.
Again, I don't have this problem if I do the same queries in db browsers like psql and pgAdmin; they don't strip the space in any of those queries.
Yes, I considered the possibility that because the columns from which they extract are varchars, but the data are 7-bit integers, the engine might be confused. I changed the query to something that is truly a varchar, but the spaces were still stripped.
Looking at https://www.postgresql.org/docs/9.4/static/functions-aggregate.html, I see that string_agg() expects its arguments to be texts or byteas. Well, I never got an error, but to be sure, I tried string_agg(my_varchar::text, ', '::text). It didn't make a difference.
I don't know a great deal about this API, but it doesn't appear to connect to the db with any options, so I don't think there's much to say about the configuration.
I'm running this in GNU C++ v4.9.2 on Debian 8.10. The PostgreSQL engine and API are 9.4.

PostgreSQL Trimming Leading and Trailing Characters: = and "

I'm working to build an import tool that utilizes a quoted CSV file. However, several of the fields in the CSV file are reported as such:
"=""38000"""
Where 38000 is the data I need. The data integration software I use (Talend 6.11) already strips the leading and trailing double quotes for me (so, "38000" becomes 38000), but I can't find a way to get rid of those others.
So, essentially, I need "=""38000""" to become "38000" where the leading "=" is removed and the trailing "" is removed.
Is there a TRIM function that can accomplish this for me? Perhaps there is a method in Talend that can do this?
As the other answer stated, you could do that operation in SQL. Or, you could do it in Java, Groovy, etc, within Talend. However, if there is an existing Talend component which does the job, my preference is to use it. That leads to faster development, potentially less testing, and easier maintenance. Having said that, it is important to review all the components which are available, so you know what's available to you.
You can use the Talend component tReplace, to inspect each of the input columns you want to trim of quotes and equal signs. A single tReplace component can do search and replace operations on multiple input columns. If all the of the replaces are related to each other, I would keep them within a single tReplace. When it gets to the point of doing unrelated replacements, I might place those within a new tReplace so that logical operations are organized and grouped together.
tReplace
For a given Input Column
search for "=", replace with ""
search for "\"", replace with ""
Something like that:
SELECT format( '"%s"', trim( both '"=' from '"=""38000"""' ) );
-[ RECORD 1 ]---
format | "38000"
1st: trim() function removes all " and = chars. Result is simply 38000
2nd: with format can add double quote back to get wishful end result
Alternatively, can use regexp and other Postgres string functions.
See more:
https://www.postgresql.org/docs/current/static/functions-string.html

Regular expression to prevent SQL injection

I know I have to escape single quotes, but I was just wondering if there's any other character, or text string I should guard against
I'm working with mysql and h2 database...
If you check the MySQL function mysql-real-escape-string which is used by all upper level languages you'll see that the strange characters list is quite huge:
\
'
"
NUL (ASCII 0)
\n
\r
Control+Z
The upper language wrappers like the PHP one may also protect the strings from malformed unicode characters which may end up as a quote.
The conclusion is: do not escape strings, especially with hard-to-debug hard-to-read, hard-to-understand regular expressions. Use the built-in provided functions or use parameterized SQL queries (where all parameters cannot contain anything interpredted as SQL by the engine). This is also stated in h2 documentation: h2 db sql injection protection.
A simple solution for the problem above is to use a prepared statement:
This will somewhat depend on what type of information you need to obtain from the user. If you are only looking for simple text, then you might as well ignore all special characters that a user might input (if it's not too much trouble)--why allow the user to input characters that don't make sense in your query?
Some languages have functions that will take care of this for you. For example, PHP has the mysql_real_escape_string() function (http://php.net/manual/en/function.mysql-real-escape-string.php).
You are correct that single quotes (') are user input no-no's; but double quotes (") and backslashes (\) should also definitely be ignored (see the above link for which characters the PHP function ignores, since those are the most important and basic ones).
Hope this is at least a good start!

How to escape string while matching pattern in PostgreSQL

I want to find rows where a text column begins with a user given string, e.g. SELECT * FROM users WHERE name LIKE 'rob%' but "rob" is unvalidated user input. If the user writes a string containing a special pattern character like "rob_", it will match both "robert42" and "rob_the_man". I need to be sure that the string is matched literally, how would I do that? Do I need to handle the escaping on an application level or is it a more beautiful way?
I'm using PostgreSQL 9.1 and go-pgsql for Go.
The _ and % characters have to be quoted to be matched literally in a LIKE statement, there's no way around it. The choice is about doing it client-side, or server-side (typically by using the SQL replace(), see below). Also to get it 100% right in the general case, there are a few things to consider.
By default, the quote character to use before _ or % is the backslash (\), but it can be changed with an ESCAPE clause immediately following the LIKE clause.
In any case, the quote character has to be repeated twice in the pattern to be matched literally as one character.
Example: ... WHERE field like 'john^%node1^^node2.uucp#%' ESCAPE '^' would match john%node1^node2.uccp# followed by anything.
There's a problem with the default choice of backslash: it's already used for other purposes when standard_conforming_strings is OFF (PG 9.1 has it ON by default, but previous versions being still in wide use, this is a point to consider).
Also if the quoting for LIKE wildcard is done client-side in a user input injection scenario, it comes in addition to to the normal string-quoting already necessary on user input.
A glance at a go-pgsql example tells that it uses $N-style placeholders for variables... So here's an attempt to write it in a somehow generic way: it works with standard_conforming_strings both ON or OFF, uses server-side replacement of [%_], an alternative quote character, quoting of the quote character, and avoids sql injection:
db.Query("SELECT * from USERS where name like replace(replace(replace($1,'^','^^'),'%','^%'),'_','^_') ||'%' ESCAPE '^'",
variable_user_input);
To escape the underscore and the percent to be used in a pattern in like expressions use the escape character:
SELECT * FROM users WHERE name LIKE replace(replace(user_input, '_', '\\_'), '%', '\\%');
As far as I can tell the only special characters with the LIKE operator is percent and underscore, and these can easily be escaped manually using backslash. It's not very beautiful but it works.
SELECT * FROM users WHERE name LIKE
regexp_replace('rob', '(%|_)', '\\\1', 'g') || '%';
I find it strange that there is no such functions shipped with PostgreSQL. Who wants their users to write their own patterns?
The best answer is that you shouldn't be interpolating user input into your sql at all. Even escaping the sql is still dangerous.
The following which uses go's db/sql library illustrates a much safer way. Substitute the Prepare and Exec calls with whatever your go postgresql library's equivalents are.
// The question mark tells the database server that we will provide
// the LIKE parameter later in the Exec call
sql := "SELECT * FROM users where name LIKE ?"
// no need to escape since this won't be interpolated into the sql string.
value := "%" + user_input
// prepare the completely safe sql string.
stmt, err := db.Prepare(sql)
// Now execute that sql with the values for every occurence of the question mark.
result, err := stmt.Exec(value)
The benefits of this are that user input can safely be used without fear of it injecting sql into the statements you run. You also get the benefit of reusing the prepared sql for multiple queries which can be more efficient in certain cases.