LIKE clause with \ character in PostgreSQL

I have this behavior in PostgreSQL 9.3:
-- (1) this "doesn't" work
select 't\om' like '%t\om%'
-- result = false
-- (2) this works
select 't/om' like '%t/om%'
-- result = true
Why is the (1) query result false? What is the best way to get true in (1) query?

The \ has no special meaning in SQL except inside the pattern of the LIKE operator, where it can be used to escape the wildcard characters.
But you can define a different escape character for LIKE which then makes the \ a "normal" character:
select 't\om' like '%t\om%' escape '#';
Edit:
As Sunrelax has commented, you can also use an empty string as the "escape" sequence:
select 't\om' like '%t\om%' escape '';
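One caveat worth a small sketch (it follows from the standard ESCAPE semantics quoted further down on this page): whatever character you pick as the escape, a literal occurrence of it in the pattern must itself be doubled:
select 't#om' like '%t##om%' escape '#';  -- true: ## stands for a literal #
select 't\om' like '%t\om%' escape '#';   -- true: \ is now an ordinary character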

\ is the default escape character in a LIKE pattern, so you need to escape it, too:
select 't\om' like '%t\\om%';
There is also a configuration option you can set. See Escaping backslash in Postgresql
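The option referred to is presumably standard_conforming_strings, which defaults to on since PostgreSQL 9.1. A short sketch of how it interacts with the pattern:
show standard_conforming_strings;   -- on (the default since 9.1)
select 't\om' like '%t\\om%';       -- true: \\ in the pattern escapes the backslash
select 't\om' like E'%t\\\\om%';    -- true: E'...' consumes one level of backslashes first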

Related

POSTGRES - combine LIKE with escape (E') literal

I'm trying to match text that includes a single quote (e.g. 'company's report...').
Normally I would have used the E'' literal or doubled single quotes, but when it comes to combining them with the LIKE '%' operator, things get complicated.
What is the best approach to match text containing a single quote?
You can escape a single quote with another single quote. Example:
WHERE column LIKE 'RSNboim''s'
From https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS
To include a single-quote character within a string constant, write two adjacent single quotes, e.g., 'Dianne''s horse'. Note that this is not the same as a double-quote character (").
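As a minimal check, the doubled quote composes with LIKE wildcards too, using the text from the question:
select 'company''s report' like '%company''s%';  -- true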
You can use dollar-quoted string constants (see Lexical Structure in the manual).
Your condition should look something like this:
select * from atable
where afield like $$Dianne's %$$
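For completeness, the E'' literal mentioned in the question works with LIKE as well; these two are equivalent (atable and afield as above):
select * from atable where afield like E'Dianne\'s %';
select * from atable where afield like 'Dianne''s %';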

How to efficiently escape meta-characters in vim search

Perl provides quotemeta function, as well as the possibility to surround strings using the \Qlots-of-meta-characters\E construct, to make sure that all the characters between \Q and \E are interpreted as literals.
Very often I search for strings full of meta-characters in Vim. It's counterproductive to escape every special character individually. Is there anything like /\Qstring-to-search\E in Vim, which would make life easier?
You can use /\Vstring-to-search.
There are two caveats:
\ is special. You can still use all regex metacharacters by putting a \ in front of them.
There is no \E equivalent. \V affects the rest of the regex.
See :help /\V.
You could combine this with the code from the answer in https://stackoverflow.com/a/676619/1848654 as follows:
vnoremap <C-f> "hy/\V<C-r>=substitute(@h,'[\/]','\\&','g')<cr>
The idea is:
Copy ("yank") the selected text into register h: "hy
Start search mode: /
Prefill the beginning of the regex: \V
Insert the contents of a register: <C-r>
Don't use a real register; take the result of evaluating an expression instead: =
Our expression (terminated by <cr>) is: substitute(@h,'[\/]','\\&','g')
Take the contents of the h register: @h
Apply a substitution. Insert a \ before every \ and /: substitute(...,'[\/]','\\&','g')

Why does my LIKE statement fail with '\\_' for matching?

I have a database table with entries that look like this:
id | name | code_set_id
I have this particular entry that I need to find:
674272310 | raphodo/qrc_resources.py | 782732
In my Rails app (2.3.8), I have a statement that evaluates to this:
SELECT * from fyles WHERE code_set_id = 782732 AND name LIKE 'raphodo/qrc\\_resources.py%';
From reading up on escaping, the above query should be correct: it double-escapes the underscore. However, this query does not find the record in the database. These queries do:
SELECT * from fyles WHERE code_set_id = 782732 AND name LIKE 'raphodo/qrc\_resources.py%';
SELECT * from fyles WHERE code_set_id = 782732 AND name LIKE 'raphodo/qrc_resources.py%';
Am I missing something here? Why is the first SQL statement not finding the correct entry?
A single backslash in the RHS of a LIKE escapes the following character:
9.7.1. LIKE
[...]
To match a literal underscore or percent sign without matching other characters, the respective character in pattern must be preceded by the escape character. The default escape character is the backslash but a different one can be selected by using the ESCAPE clause. To match the escape character itself, write two escape characters.
So this is a literal underscore in a LIKE pattern:
\_
and this is a single backslash followed by an "any character" pattern:
\\_
You want LIKE to see this:
raphodo/qrc\_resources.py%
PostgreSQL used to interpret C-style backslash escapes in strings by default, but no longer: now you have to use E'...' to get backslash escapes in string literals (unless you've changed the configuration options). The String Constants with C-style Escapes section of the manual covers this, but the simple version is that these two:
name LIKE E'raphodo/qrc\\_resources.py%'
name LIKE 'raphodo/qrc\_resources.py%'
do the same thing as of PostgreSQL 9.1.
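A quick sanity check of that equivalence (a sketch assuming standard_conforming_strings is on, the 9.1+ default):
select 'raphodo/qrc_resources.py' like E'raphodo/qrc\\_resources.py%';  -- true
select 'raphodo/qrc_resources.py' like 'raphodo/qrc\_resources.py%';    -- true
select 'raphodo/qrcXresources.py' like 'raphodo/qrc\_resources.py%';    -- false: the _ is escaped, so it is not a wildcard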
Presumably your Rails 2.3.8 app (or whatever is preparing your LIKE patterns) is assuming an older version of PostgreSQL than the one you're actually using. You'll need to adjust things to not double your backslashes (or prefix the pattern string literals with Es).

How to safely unload/copy a table in RedShift?

In RedShift, it is convenient to use unload/copy to move data to S3 and load it back, but I find it hard to choose the delimiter each time. The right delimiter depends on the content of the table! I had to change the delimiter every time I hit load errors.
For example, when I use the following command to unload/copy a table:
unload ('select * from tbl_example') to 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' addquotes allowoverwrite;
copy tbl_example2 from 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' removequotes;
I get a load error if the table happens to have a field containing "||". Then I have to change the delimiter '|' to another one like ',' and try again; if I'm unlucky, it takes multiple tries to succeed.
I'm wondering if there's a way to unload/copy a RedShift table that is independent of the table's content and will always succeed, no matter what weird strings are stored in it.
Finally I figured out the right approach: add escape to both the unload and the copy command:
unload ('select * from tbl_example') to 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' addquotes escape allowoverwrite;
copy tbl_example2 from 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' removequotes escape;
With escape in the unload command, for CHAR and VARCHAR columns in delimited unload files, an escape character (\) is placed before every occurrence of the following characters:
Linefeed: \n
Carriage return: \r
The delimiter character specified for the unloaded data.
The escape character: \
A quote character: " or ' (if both ESCAPE and ADDQUOTES are specified in the UNLOAD command).
And with escape in the copy command, the backslash character (\) in input data is treated as an escape character. The character that immediately follows the backslash character is loaded into the table as part of the current column value, even if it is a character that normally serves a special purpose. For example, you can use this option to escape the delimiter character, a quote, an embedded newline, or the escape character itself when any of these characters is a legitimate part of a column value.
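A hypothetical round-trip test (the table names and bucket here are placeholders, not from the question):
create table tbl_escape_test (id int, val varchar(100));
create table tbl_escape_test2 (like tbl_escape_test);
insert into tbl_escape_test values (1, 'has a | pipe and a " quote');
-- escape backslash-protects the embedded | and " on the way out ...
unload ('select * from tbl_escape_test') to 's3://s3bucket/tbl_escape_test' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' addquotes escape allowoverwrite;
-- ... and consumes the backslashes again on the way back in
copy tbl_escape_test2 from 's3://s3bucket/tbl_escape_test' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' removequotes escape;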
Try an unload like this:
unload ('select * from tbl_example') to 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter as ',' addquotes escape
To load it back, use this:
copy tbl_example2 from 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter ',' removequotes escape;
This will work even if your data contains , characters.
Since this topic comes up in many places, we decided to package up the UNLOAD/extract process into a Docker service. All the code is on GitHub so you can use it as-is or grab the underlying Python code to create your own version: https://github.com/openbridge/ob_redshift_unload
You can set the delimiter, dates, and ad hoc SQL via run-time configuration. It will also export a header row, something that is a little more complicated to do with plain UNLOAD.
Here are a few of the runtime options:
-t: The table you wish to UNLOAD
-f: The S3 key at which the file will be placed
-s (Optional): The file you wish to read a custom valid SQL WHERE clause from. This will be sanitized then inserted into the UNLOAD command.
-r (Optional): The range column you wish to use to constrain the results. Any type supported by Redshift's BETWEEN function is accepted here (date, integer, etc.)
-r1 (Optional): The desired start range to constrain the result set
-r2 (Optional): The desired end range to constrain the result set
Note: -s and -d are mutually exclusive and cannot be used together. If neither is used, the script defaults to no WHERE clause and outputs the entire table.
Then you can run it like this to UNLOAD:
docker run -it -v /local/path/to/my/config.json:/config.json openbridge/ob_redshift_unload python /unload.py -t mytable -f s3://dest-bucket/foo/bar/output_file.csv -r datecol -r1 2017-01-01 -r2 2017-06-01
The goal was to enhance the default UNLOAD process and wrap it into something that can help ensure consistency in generating outputs.
Here is a write-up that details the features/capabilities:
https://blog.openbridge.com/how-to-easily-extract-data-from-amazon-redshift-4e55435f7003

PostgreSQL regexp_replace with matched expression

I am using the PostgreSQL regexp_replace function to escape square brackets, parentheses and backslashes in a string so that I can use that string as a regex pattern itself (there are other manipulations done on this string as well before using it, but they are outside the scope of this question). The idea is to replace:
[ with \[
] with \]
( with \(
) with \)
\ with \\
The Postgres documentation page on regular expressions states the following:
The replacement string can contain \n, where n is 1 through 9, to indicate that the source substring matching the n'th parenthesized subexpression of the pattern should be inserted, and it can contain \& to indicate that the substring matching the entire pattern should be inserted. Write \\ if you need to put a literal backslash in the replacement text.
However regexp_replace('abc [def]', '([\[\]\(\)\\])', E'\\\1', 'g'); produces abc \ def\.
Further down on that same page, an example is given, which uses \\1 notation - so I tried that.
Yet, regexp_replace('abc [def]', '([\[\]\(\)\\])', E'\\\\1', 'g'); produces abc \1def\1.
I would guess this is expected, but regexp_replace('abc [def]', '([\[\]\(\)\\])', E'.\\1', 'g'); produces abc .[def.]. That is, escaping works with characters other than the standard backslash.
At this point I don't know how to proceed. What can I do to actually give me the replacement I want?
OK, found the answer. Apparently, I need to double-escape the backslash in the replacement. Also, I need to E-prefix and double-escape backslashes in the search pattern on older versions of postgres (8.3 in my case). The final code looks like this:
regexp_replace('abc [def]', E'([\\[\\]\\(\\)\\\\\?\\|_%])', E'\\\\\\1', 'g')
Yes, it looks horrible, but it works :)
The simplest way:
select regexp_replace('abc [def]', '([\[\]\(\)\\])', '\\\1', 'g')
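If you need this in more than one place, you could wrap it in a function (a sketch only; regexp_escape is not a built-in, just a name chosen here):
create function regexp_escape(text) returns text as $$
  select regexp_replace($1, '([\[\]\(\)\\])', '\\\1', 'g')
$$ language sql immutable;
select regexp_escape('abc [def]');  -- abc \[def\]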