PSQLException: ERROR: syntax error in tsquery - postgresql

Which characters must be avoided to make sure PSQLException: ERROR: syntax error in tsquery will not occur?
The documentation does not say anything about how to escape the search string: http://www.postgresql.org/docs/8.3/static/datatype-textsearch.html

Use quotes around your terms if you want them treated as phrases/verbatim, or if they contain characters used in the syntax:
select to_tsquery('"hello there" | hi');
Bear in mind that you shouldn't really have crazy characters in your terms, since they are not going to match anything in the tsvector.
The (non-token) characters recognized by the tsquery parser are: \0 (null), (, ), (whitespace), |, &, :, * and !. But how you tokenize your query should be based on how you have set up your dictionary. There are a great many other characters that you will likely not want in your query, not because they will cause a syntax error, but because it means you are not tokenizing your query correctly.
Use the plainto_tsquery version if it's a simple AND query and you don't want to deal with creating the query manually.
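For example, a minimal sketch of that approach (the table and column names in the commented line are made up for illustration):
-- plainto_tsquery ignores tsquery operators and punctuation and simply ANDs the lexemes,
-- so the user's raw search string never needs escaping:
select plainto_tsquery('hello (world) & stuff');   -- 'hello' & 'world' & 'stuff' with the default english config
-- typical usage against a tsvector column (docs/body_tsv are hypothetical names):
-- select * from docs where body_tsv @@ plainto_tsquery('hello world');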


Oddities in fail2ban regex

This appears to be a bug in fail2ban, with different behaviour between the fail2ban-regex tool and a failregex filter.
I am attempting to develop a new regex rule for fail2ban, to match:
\"%20and%20\"x\"%3D\"x
When using fail2ban-regex, this appears to produce the desired result:
^<HOST>.*GET.*\\"%20and%20\\"x\\"%3D\\"x.* 200.*$
As does this:
^<HOST>.*GET.*\\\"%20and%20\\\"x\\\"%3D\\\"x.* 200.*$
However, when I put either of these into a filter, I get the following error:
Failed during configuration: '%' must be followed by '%' or '(', found:…
To have this work in a filter you have to double up the '%', i.e. '%%':
^<HOST>.*GET.*\\\"%%20and%%20\\\"x\\\"%%3D\\\"x.* 200.*$
While this gets the required hits running as a filter, it gets none running through fail2ban-regex.
I tried the \\\\ as Andre suggested below, but this gets no results in fail2ban-regex.
So, as this appears to be differential behaviour, I am going to file it as a bug.
According to Python's own site, a single backslash "\" has to be written as "\\\\", and there's no mention of %.
Regular expressions use the backslash character ('\') to indicate
special forms or to allow special characters to be used without
invoking their special meaning. This collides with Python's usage of
the same character for the same purpose in string literals; for
example, to match a literal backslash, one might have to write '\\\\'
as the pattern string, because the regular expression must be \\, and
each backslash must be expressed as \\ inside a regular Python string
literal.
I would just go with:
failregex = (?i)^<HOST> -.*"(GET|POST|HEAD|PUT).*20and.*3d.*$
The .* will match anything in between anyway, and (?i) makes the entire regex case-insensitive.
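Putting that together, a minimal filter file might look like the sketch below (the file name is only an example, not from the question); note that because this simplified pattern contains no literal % signs, the %%-doubling problem does not arise:
# /etc/fail2ban/filter.d/example-sqli.conf  (hypothetical name)
[Definition]
failregex = (?i)^<HOST> -.*"(GET|POST|HEAD|PUT).*20and.*3d.*$
ignoreregex =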

Papa Parse with backslash escaping

I have input that people will probably say "that's not really CSV", but I still have to parse it. (using Papa Parse)
Comma is the delimiter. Backslash is the escape. Comma, double quote, backslash, r and n (to denote newlines) can all be escaped. There is no "quoting" of strings.
so... I see data like:
this is one\, field,1/2\" bolt,this is text with \\ and a new line \r\n embedded
and I want:
[0] this is one\, field
[1] 1/2\" bolt
[2] this is text with \\ and a new line \r\n embedded
but I'm getting
[0] this is one\
[1] field
[2] 1/2\" bolt
...
I can deal with the other \x things in post processing... I'd just like to get it to handle \, correctly.
I've tried the obvious values of quoteChar and escapeChar with no luck.
oh... and the Donate link is broken on https://www.papaparse.com/ if Matt Holt is listening.
const parsed = window.Papa.parse(csvText, {
escapeChar: '\\',
});
Seems like the default escape character is ", but it can be overridden in the parameters.
Update: though now that I look at it, it does not seem to work for your case. It only fixed an issue I had where 2.5\","Shell was treated as a single value because " was interpreted as the escape character for ,.
I'm starting to get a feeling that the only way to escape a comma is to enclose the field in quotes.
Hope someone will post the right answer eventually...
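In the meantime, one workaround consistent with that observation is to skip Papa Parse for the line splitting and split on unescaped commas yourself, keeping the escape pairs intact for the post-processing you already planned. A rough sketch (this is a hand-rolled splitter for the dialect described above, not a Papa Parse feature):
// Split one line of the backslash-escaped dialect described above:
// backslash escapes the next character, comma is the delimiter, no quoting.
function splitBackslashCsvLine(line) {
  const fields = [];
  let cur = '';
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (ch === '\\' && i + 1 < line.length) {
      cur += ch + line[i + 1]; // keep the escape pair verbatim for post-processing
      i++;
    } else if (ch === ',') {
      fields.push(cur);
      cur = '';
    } else {
      cur += ch;
    }
  }
  fields.push(cur);
  return fields;
}
// splitBackslashCsvLine('this is one\\, field,1/2\\" bolt,rest')
//   -> ['this is one\\, field', '1/2\\" bolt', 'rest']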

unterminated CSV quoted field in Postgres

I'm trying to insert some data into my table using the COPY command:
copy otype_cstore from '/tmp/otype_fdw.csv' delimiter ';' quote '"' csv;
And I get this error:
ERROR: unterminated CSV quoted field
Here is the line in my CSV file where I have the problem:
533696;PoG;-251658240;from id GSW C";
This is the only line with a double quote, and I can't remove it, so do you have any advice for me?
Thank you in advance
If you have lines like this in your csv:
533696;PoG;-251658240;from id GSW C";
this actually means/shows the fields are not quoted, which is still perfectly valid csv as long as there are no separators inside the fields.
In this case the parser should be told the fields are not quoted.
So, instead of using quote '"' (which is actually telling the parser the fields are quoted and why you get the error), you should use something like quote 'none', or leave the quote parameter out (I don't know Postgres, so I can't give you the exact option to do this).
Ok, I did a quick lookup of the parameters. It looks like there is not really an option to turn quoting off. The only option left would be to provide a quote character that is never used in the data.
quote E'\b' (backspace) seems to work ok.
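For reference, the full command from the question would then become something like this (same file, table and delimiter; E'\b' just supplies a quote character that never appears in the data):
copy otype_cstore from '/tmp/otype_fdw.csv' delimiter ';' quote E'\b' csv;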
A bit late to reply, but I was facing the same issue and found that each double quote needs one more double quote to escape it, along with a prefix and suffix double quote around that special character. For your case it would be:
The input data is: from id GSW C"
Change the data to: from id GSW C""""
Note that there are 4 consecutive double quotes.
The first and last double quotes are the prefix and suffix, as per the documentation.
The middle 2 double quotes are the data: one double quote plus its escaping double quote.
Hope this helps readers with a similar issue going forward.
So every double quote in the data needs to be escaped with one escape character (the default is a double quote). This is as per the documentation.

Why does my LIKE statement fail with '\\_' for matching?

I have a database table with entries that look like this:
id | name | code_set_id
I have this particular entry that I need to find:
674272310 | raphodo/qrc_resources.py | 782732
In my Rails app (2.3.8), I have a statement that evaluates to this:
SELECT * from fyles WHERE code_set_id = 782732 AND name LIKE 'raphodo/qrc\\_resources.py%';
From reading up on escaping, the above query is correct. It is supposed to correctly double-escape the underscore. However, this query does not find the record in the database. These queries will:
SELECT * from fyles WHERE code_set_id = 782732 AND name LIKE 'raphodo/qrc\_resources.py%';
SELECT * from fyles WHERE code_set_id = 782732 AND name LIKE 'raphodo/qrc_resources.py%';
Am I missing something here? Why is the first SQL statement not finding the correct entry?
A single backslash in the RHS of a LIKE escapes the following character:
9.7.1. LIKE
[...]
To match a literal underscore or percent sign without matching other characters, the respective character in pattern must be preceded by the escape character. The default escape character is the backslash but a different one can be selected by using the ESCAPE clause. To match the escape character itself, write two escape characters.
So this is a literal underscore in a LIKE pattern:
\_
and this is a single backslash followed by an "any character" pattern:
\\_
You want LIKE to see this:
raphodo/qrc\_resources.py%
PostgreSQL used to interpret C-style backslash escapes in strings by default, but no longer does; now you have to use E'...' to use backslash escapes in string literals (unless you've changed the configuration options). The String Constants with C-style Escapes section of the manual covers this, but the simple version is that these two:
name LIKE E'raphodo/qrc\\_resources.py%'
name LIKE 'raphodo/qrc\_resources.py%'
do the same thing as of PostgreSQL 9.1.
Presumably your Rails 2.3.8 app (or whatever is preparing your LIKE patterns) is assuming an older version of PostgreSQL than the one you're actually using. You'll need to adjust things to not double your backslashes (or prefix the pattern string literals with Es).
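A quick way to see the difference is to run the patterns directly (this assumes standard_conforming_strings is on, the default since PostgreSQL 9.1):
SELECT 'raphodo/qrc_resources.py' LIKE 'raphodo/qrc\_resources.py%';   -- true: \_ is a literal underscore
SELECT 'raphodo/qrc_resources.py' LIKE 'raphodo/qrc\\_resources.py%';  -- false: \\ is a literal backslash, _ is a wildcard
SELECT 'raphodo/qrc_resources.py' LIKE E'raphodo/qrc\\_resources.py%'; -- true: E'\\' collapses to one backslash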

PostgreSQL regexp_replace with matched expression

I am using the PostgreSQL regexp_replace function to escape square brackets, parentheses and backslashes in a string so that I can use that string as a regex pattern itself (there are other manipulations done on this string as well before using it, but they are outside the scope of this question). The idea is to replace:
[ with \[
] with \]
( with \(
) with \)
\ with \\
The Postgres documentation page on regular expressions states the following:
The replacement string can contain \n, where n is 1 through 9, to
indicate that the source substring matching the n'th parenthesized
subexpression of the pattern should be inserted, and it can contain \&
to indicate that the substring matching the entire pattern should be
inserted. Write \\ if you need to put a literal backslash in the
replacement text.
However regexp_replace('abc [def]', '([\[\]\(\)\\])', E'\\\1', 'g'); produces abc \ def\.
Further down on that same page, an example is given, which uses \\1 notation - so I tried that.
Yet, regexp_replace('abc [def]', '([\[\]\(\)\\])', E'\\\\1', 'g'); produces abc \1def\1.
I would guess this is expected, but regexp_replace('abc [def]', '([\[\]\(\)\\])', E'.\\1', 'g'); produces abc .[def.]. That is, escaping works with characters other than the standard backslash.
At this point I don't know how to proceed. What can I do to actually give me the replacement I want?
OK, found the answer. Apparently, I need to double-escape the backslash in the replacement. Also, I need to E-prefix and double-escape backslashes in the search pattern on older versions of postgres (8.3 in my case). The final code looks like this:
regexp_replace('abc [def]', E'([\\[\\]\\(\\)\\\\\?\\|_%])', E'\\\\\\1', 'g')
Yes, it looks horrible, but it works :)
It's the simplest way:
select regexp_replace('abc [def]', '([\[\]\(\)\\])', '\\\1', 'g')
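A quick sanity check (assuming standard_conforming_strings is on, the default since PostgreSQL 9.1) is to feed the escaped string back in as a pattern against the original text:
-- the escaped version should now match the original text literally
select 'abc [def]' ~ ('^' || regexp_replace('abc [def]', '([\[\]\(\)\\])', '\\\1', 'g') || '$');   -- true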