How to append prefix match to tsquery in PostgreSQL

I'm trying to use PostgreSQL's full text search. In particular, when the user types in some search text, I would like to show results on the assumption that the last word is incomplete.
For that purpose the "*" wildcard character needs to be attached to the last tsquery lexeme. E.g. if the user types in "The fat ra", the tsquery should be 'fat' & 'ra':*.
If I append the wildcard to the input string and parse it with the plainto_tsquery function, the wildcard is removed: plainto_tsquery('The fat ra' || ':*') => 'fat' & 'ra'.
Constructing a tsquery manually with the to_tsquery function requires a lot of modifications to the string (such as trimming spaces and other special characters, and replacing spaces with the ampersand character) before the function accepts it.
Is there an easier way to do that?

You can make the last lexeme in a tsquery a prefix match by casting it to a string, appending ':*', then casting it back to a tsquery:
=> SELECT ((to_tsquery('foo <-> bar')::text || ':*')::tsquery);
tsquery
-------------------
'foo' <-> 'bar':*
For your use case, you'll want to use <-> instead of & to require the words to be next to each other. Here's a demonstration of how they differ:
=> SELECT 'foo bar baz' @@ tsquery('foo & baz');
?column?
----------
t
(1 row)
=> SELECT 'foo bar baz' @@ tsquery('foo <-> baz');
?column?
----------
f
(1 row)
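As a rough illustration of the difference, here is a minimal Python model of the two operators over an already-tokenized document. It is a sketch under simplifying assumptions: the helper names are hypothetical, and it ignores stemming, stop-word removal and the position bookkeeping that to_tsvector actually performs. It also models the :* prefix marker on the last term, which is what the question is after.

```python
def and_match(tokens, terms):
    """Rough model of 'a' & 'b': every term appears somewhere in the document."""
    return all(term in tokens for term in terms)


def term_matches(token, term, prefix=False):
    """Exact lexeme match, or a prefix match when the term carries :*."""
    return token.startswith(term) if prefix else token == term


def phrase_match(tokens, terms, prefix_last=False):
    """Rough model of 'a' <-> 'b' <-> ...: the terms occupy consecutive positions.

    prefix_last=True mimics a trailing :* on the last term.
    """
    n = len(terms)
    for start in range(len(tokens) - n + 1):
        if all(
            term_matches(tokens[start + i], terms[i], prefix_last and i == n - 1)
            for i in range(n)
        ):
            return True
    return False


doc = "foo bar baz".split()
print(and_match(doc, ["foo", "baz"]))     # True, like 'foo & baz'
print(phrase_match(doc, ["foo", "baz"]))  # False, like 'foo <-> baz'
print(phrase_match(["fat", "rat"], ["fat", "ra"], prefix_last=True))  # True, like 'fat' <-> 'ra':*
```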
phraseto_tsquery makes it easy to specify several words that have to be next to each other:
=> SELECT phraseto_tsquery('foo baz');
phraseto_tsquery
------------------
'foo' <-> 'baz'
Putting it all together:
=> SELECT (phraseto_tsquery('The fat ra')::text || ':*')::tsquery;
tsquery
------------------
'fat' <-> 'ra':*
Depending on your needs, a simpler way might be to write the tsquery text directly and cast it:
=> SELECT $$'fat' <-> 'ra':*$$::tsquery;
tsquery
------------------
'fat' <-> 'ra':*
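If you take the direct-string route with raw user input, the quoting can be done client-side. The following is a minimal Python sketch (the function names are hypothetical) that quotes each whitespace-separated word as a tsquery lexeme, doubling embedded single quotes and backslashes as the tsquery input syntax requires, joins the words with <-> and marks the last one with :*. Note that a plain ::tsquery cast does not normalize lexemes, so 'The' would be kept as is; passing the resulting text as a bind parameter to to_tsquery (e.g. to_tsquery('english', ?)) should apply the configuration's normalization, including stop-word removal.

```python
def quote_lexeme(word):
    """Quote one lexeme for tsquery input syntax.

    Embedded backslashes and single quotes are doubled. This is tsquery
    quoting, not SQL string-literal quoting: bind the result as a parameter.
    """
    return "'" + word.replace("\\", "\\\\").replace("'", "''") + "'"


def prefix_phrase_query(user_input):
    """Build "'w1' <-> 'w2' <-> ... 'wn':*" from whitespace-separated input."""
    words = user_input.split()
    if not words:
        return ""
    parts = [quote_lexeme(w) for w in words]
    parts[-1] += ":*"  # the last word is assumed to be incomplete
    return " <-> ".join(parts)


print(prefix_phrase_query("The fat ra"))  # 'The' <-> 'fat' <-> 'ra':*
```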

Related

Use ilike any() with escape character

In PostgreSQL you can do a case insensitive query with ILIKE:
select * from test where value ilike 'half is 50$%' escape '$'
And you can query multiple values at once by combining ILIKE with ANY()
select * from test where value ilike any(array['half is 50%', 'fifth is 20%'])
The query above will match 'Fifth is 2019', which I do not want, but when I try to use ILIKE and ANY() with an escape character I get a syntax error.
Am I missing something stupid, or is this simply not supported? If not, is there another way to query in a case insensitive way with multiple values at once?
EDIT: To clarify, the query will accept parameters through JDBC, so the actual SQL will look something like
select * from test where value ilike any(?) escape '$'
This is why I'm looking for a way to make % and _ in the user input be interpreted as literals.
The ESCAPE clause in ILIKE refers only to literals and does not apply to expressions. Use a backslash (the default escape character) instead or, if that is not possible, you can try:
with test(value) as (
values
('half is 50%'),
('half is 50x'),
('fifth is 20%'),
('fifth is 2000')
)
select *
from test
where value ilike any(select replace(unnest(array['half is 50$%', 'fifth is 20$%']), '$', '\'))
value
--------------
half is 50%
fifth is 20%
(2 rows)
Looks a bit clumsy but works well.
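Since the question binds the array through JDBC, another option is to escape the user input in application code before binding it, using the backslash that LIKE/ILIKE treat as the default escape character. Below is a minimal sketch in Python (escape_like is a hypothetical helper; the same three replacements translate directly to Java's String.replace). The escape character itself is doubled first so that the backslashes added for % and _ are not escaped again.

```python
def escape_like(value, escape="\\"):
    """Make %, _ and the escape character itself literal in a LIKE/ILIKE pattern."""
    return (
        value.replace(escape, escape + escape)  # must come first
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )


user_terms = ["half is 50%", "fifth is 20%"]
patterns = [escape_like(t) for t in user_terms]
print(patterns)  # ['half is 50\\%', 'fifth is 20\\%']
```

The escaped list is then bound to the single parameter in `value ilike any(?)`, with no ESCAPE clause needed.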
Alternatively, you may use the ~* operator for a case-insensitive regular-expression match, in which % has no special meaning:
knayak=# select 'Half is 50%' ~* any(array['half is 50%', 'fifth is 20%'])
knayak-# ;
?column?
----------
t --True
(1 row)
knayak=# select 'fifth is 20' ~* any(array['half is 50%', 'fifth is 20%']);
?column?
----------
f --False
(1 row)
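Keep in mind that with ~* the array elements are regular expressions, not raw strings: % happens to be literal, but characters such as ., + and ( are metacharacters, and the match is unanchored, so a pattern can match anywhere inside the value. The Python sketch below uses the re module as a stand-in for PostgreSQL's regex engine to show both effects (the helper names are hypothetical, and the two engines only agree for simple patterns like these). In SQL, the equivalent fix is to anchor the pattern with ^...$ and escape metacharacters with a backslash.

```python
import re


def regex_any(value, patterns):
    """Approximates value ~* ANY(patterns): case-insensitive, unanchored search."""
    return any(re.search(p, value, re.IGNORECASE) for p in patterns)


def literal_any(value, patterns):
    """Whole-string, case-insensitive comparison with metacharacters escaped."""
    return any(re.fullmatch(re.escape(p), value, re.IGNORECASE) for p in patterns)


patterns = ["half is 50%", "fifth is 20%"]
print(regex_any("Half is 50%", patterns))              # True
print(regex_any("fifth is 20", patterns))              # False
print(regex_any("not half is 50% of it", patterns))    # True: unanchored
print(literal_any("not half is 50% of it", patterns))  # False
print(regex_any("1+1", ["1+1"]))                       # False: + is a quantifier
print(literal_any("1+1", ["1+1"]))                     # True
```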
If you wish to escape the right-hand operands of ILIKE, use "escape" string constants, which are an extension to the SQL standard. An escape string constant is specified by writing the letter E (upper or lower case) just before the opening single quote:
knayak=# select 'Half is 50%' ilike any(array[E'half is 50\\%', E'half is 20\\%'])
knayak-# ;
?column?
----------
t
(1 row)

Convert String to Array - PostgreSQL

I have a column in a table that stores names separated by commas, for example: "Mel's Hou Rest, Mel's Lad Rest". I need to convert this string into an array by splitting it on the commas.
The query I need is:
SELECT home_location, subs_state FROM cust
WHERE (home_location = ANY('{"Mel''s Hou Rest", Mel''s Lad Rest"}')) AND subs_state = 'active'
I have tried this, but I keep getting an error:
WHERE (home_location = ANY(string_to_array("Mel's Hou Rest, Mel's Lad Rest", ',')::text[])
Is there any way to accomplish this without changing the column type from text to an array?
SQL uses single quotes for string literals. Your string "Mel's Hou Rest, Mel's Lad Rest" has double quotes around it, which makes Postgres interpret it as a quoted identifier. Use two single quotes to include one single quote in the string.
SELECT * FROM cust WHERE home_location = ANY(string_to_array("Mel's Hou Rest, Mel's Lad Rest", ','))
-- ERROR: column "Mel's Hou Rest, Mel's Lad Rest" does not exist
SELECT * FROM cust WHERE home_location = ANY(string_to_array('Mel''s Hou Rest, Mel''s Lad Rest', ','))
-- OK
Also note that string_to_array does not remove whitespace around the delimiter, which might not be what you expect.
For example:
-- With whitespace around the delimiter
=> SELECT string_to_array('foo, bar', ',');
string_to_array
-----------------
{foo," bar"}
=> select 'foo' = ANY(string_to_array('foo, bar', ','));
?column?
----------
t
=> select 'bar' = ANY(string_to_array('foo, bar', ','));
?column?
----------
f
-- Without extra whitespace
=> SELECT string_to_array('foo,bar', ',');
string_to_array
-----------------
{foo,bar}
=> select 'foo' = ANY(string_to_array('foo,bar', ','));
?column?
----------
t
=> select 'bar' = ANY(string_to_array('foo,bar', ','));
?column?
----------
t
This can of course be countered by normalising the input before using it in the query. In some cases it might be feasible to strip the whitespace in the query with string_to_array(regexp_replace('foo, bar', '\s*,\s*', ',', 'g'), ','), but I would not complicate the queries like that without a good reason.
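The same behaviour is easy to reproduce outside the database. This Python sketch (hypothetical helper names) contrasts a plain split on ',', which keeps the leading space just like string_to_array does, with a split on a comma surrounded by optional whitespace, which roughly corresponds to the regexp_replace normalisation plus trimming the ends of the string.

```python
import re


def naive_split(s):
    # Mirrors string_to_array(s, ','): whitespace next to the comma is kept.
    return s.split(",")


def trimmed_split(s):
    # Splits on a comma with optional whitespace on either side and trims the ends.
    return re.split(r"\s*,\s*", s.strip())


print(naive_split("foo, bar"))           # ['foo', ' bar']
print("bar" in naive_split("foo, bar"))  # False
print(trimmed_split("foo, bar"))         # ['foo', 'bar']
```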
To supplement the accepted answer a bit...
Note that the array string literal notation (with curly braces) '{foo, ba''r, "b{a}z", "b,u,z"}' (equivalent to the more explicit ARRAY['foo', 'ba''r', 'b{a}z', 'b,u,z']) is still actually just a string. To be used as an array it needs to first be converted, which a few operations can do implicitly (like ANY()). In many cases though, you'd need to explicitly cast it (e.g. with CAST(array_literal as text[]) or array_literal::text[]).
Your first expression should therefore work if rewritten as
SELECT home_location, subs_state FROM cust
WHERE
home_location = ANY('{Mel''s Hou Rest, Mel''s Lad Rest}')
AND subs_state = 'active';

KDB string concatenation with symbol list for dynamic query

In this link, there is an example of how to include a dynamic parameter, d, in a KDB select query:
h: hopen`:myhost01:8012 // open connection
d: 2016.02.15 // define date var
symList: `GBPUSD`EURUSD
h raze "select from MarketDepth where date=", string d, ", sym in `GBPUSD`EURUSD" // run query with parameter d
Here d is of type date, which is easy to convert to a string and concatenate to generate a dynamic query.
If I also want to make symList a dynamic parameter by converting it to a string:
raze "select from MarketDepth where date=", string d, ", sym in ", string symList
The concatenated string becomes: select from MarketDepth where date=2016.02.15, sym in GBPUSDEURUSD, in other words the string concatenation loses the backticks so the query does not run. How can I solve this?
P.S. I know about functional queries, but after failing for 2 hours I have given up on that.
No need for functional selects.
q)MarketDepth:([] date:9#2016.02.15; sym:9#`A`B)
q)d:2016.02.15
q)symList:`B
q)h ({[dt;sl] select from MarketDepth where date=dt,sym in sl}; d; symList)
date sym
--------------
2016.02.15 B
2016.02.15 B
2016.02.15 B
2016.02.15 B
You are right: string on a symbol does not preserve the backtick, so you'll have to prepend it yourself like this:
symList: `GBPUSD`EURUSD
strSymList: "`",'string symList / ("`GBPUSD";"`EURUSD")
I used join (,) with the each-both adverb (') to join a backtick with each element of the list. With the symbol list stringified, your dynamic query becomes:
"select from MarketDepth where date=", (string d), ", sym in ",raze"`",'string symList
You can also use parse to see what the functional form of your query looks like:
q) parse "select from MarketDepth where date=", (string d), ", sym in ",raze"`",'string symList
(?;`MarketDepth;enlist ((=;`date;2016.02.15);(in;`sym;enlist `GBPUSD`EURUSD));0b;())
Now it's easy to create a functional select:
?[`MarketDepth;enlist ((=;`date;2016.02.15);(in;`sym;enlist symList));0b;()]
Hope this helps.
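If the query string is built outside q (for example by a client written in another language), the same fix applies: put a backtick in front of every symbol before concatenating. Below is a minimal Python sketch of the string-building expression with raze"`",'string symList; the helper names are hypothetical. This only works for symbols that are valid as backtick literals (a symbol containing a space, for instance, cannot be written that way), which is one more reason to prefer passing the list as an argument to a lambda, as in the first answer.

```python
def q_symbol_list(symbols):
    # Prefix every symbol name with a backtick: ["GBPUSD", "EURUSD"] -> "`GBPUSD`EURUSD"
    return "".join("`" + s for s in symbols)


def market_depth_query(date, symbols):
    # Hypothetical helper producing the same text as the q string expression.
    return "select from MarketDepth where date=" + date + ", sym in " + q_symbol_list(symbols)


print(market_depth_query("2016.02.15", ["GBPUSD", "EURUSD"]))
# select from MarketDepth where date=2016.02.15, sym in `GBPUSD`EURUSD
```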
Update: @Ryan Hamilton's solution is probably the best in your particular scenario. You can even make the table name an argument if you want:
h({[t;d;s]select from t where date=d,sym in s};`MarketDepth; d; symList)
But it is worth noting that you can't use this technique when you need to make a list of columns dynamic. The following will NOT work:
h({[c;d;s]select c from t where date=d,sym in s};`time`sym; d; symList)
You will have to either build a dynamic select expression like you do or use functional forms.
Others have already given good alternative approaches to your problem. But if you need to join strings and symbols (or other data types) without losing the backtick, the function .Q.s1 does the job:
q).Q.s1 `a`b
"`a`b"
q)"select from table where sym in ",.Q.s1 symList
Note: using functions from the .Q namespace is generally not recommended.

Pattern matching with identical wildcards

I'm working with PostgreSQL and want to know whether you can have a wildcard retain its value.
So for example say I had
select * from tableOne where field like '_DEF_';
Is there a way to get the first and last wildcard to be the exact same character?
So an example matching result could be: ADEFA or ZDEFZ.
You can use a regular expression with a back-reference:
select *
from some_table
where some_column ~* '^(.)DEF(\1)$'
^(.)DEF(\1)$ means: any single character at the beginning, followed by DEF, followed by that same first character at the end of the string.
The () defines a group and \1 references the first group (which captures the first character of the input in this example).
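The back-reference idea can be tried quickly with Python's re module, whose (.) and \1 syntax is the same for this pattern. The sketch below is case-sensitive, like PostgreSQL's ~ operator (pass re.IGNORECASE to approximate ~*), and fullmatch plays the role of the ^...$ anchors; same_first_and_last is a hypothetical name.

```python
import re

# (.) captures one character; \1 requires that same character again.
SAME_ENDS = re.compile(r"(.)DEF\1")


def same_first_and_last(s):
    return SAME_ENDS.fullmatch(s) is not None


for candidate in ["ADEFA", "ZDEFZ", "ADEFZ"]:
    print(candidate, same_first_and_last(candidate))
# ADEFA True
# ZDEFZ True
# ADEFZ False
```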
SQLFiddle example: http://sqlfiddle.com/#!15/d4c4d/1
Use a regular expression:
with test as (
select 'xABa' as foo
union select 'xABx'
union select 'xJBx'
)
select * from test
where foo ~* E'^(.)AB\\1$'
Outputs:
foo
------
xABx
(1 row)

remove non-numeric characters in a column (character varying), postgresql (9.3.5)

I need to remove non-numeric characters in a column (character varying) and keep numeric values in postgresql 9.3.5.
Examples:
1) "ggg" => ""
2) "3,0 kg" => "3,0"
3) "15 kg." => "15"
4) ...
There are a few complications; some values look like:
1) "2x3,25"
2) "96+109"
3) ...
These need to remain as is (i.e when containing non-numeric characters between numeric characters - do nothing).
Using regexp_replace is simpler:
# select regexp_replace('test1234test45abc', '[^0-9]+', '', 'g');
regexp_replace
----------------
123445
(1 row)
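Inside the brackets, ^ means not, so any run of characters outside the range 0-9 is replaced with an empty string, ''.
The 'g' flag means all matches are replaced, not just the first one. Note that this removes every non-digit character, including the comma in "3,0 kg" (giving "30") and the separators in "2x3,25" (giving "2325"), so on its own it does not leave such values as they are.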
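The same replacement in Python makes it easy to check the effect on the sample values from the question; re.sub replaces every match by default, which corresponds to the 'g' flag. digits_only is a hypothetical name.

```python
import re


def digits_only(s):
    # Equivalent of regexp_replace(s, '[^0-9]+', '', 'g').
    return re.sub(r"[^0-9]+", "", s)


for value in ["test1234test45abc", "ggg", "3,0 kg", "2x3,25"]:
    print(repr(value), "->", repr(digits_only(value)))
# 'test1234test45abc' -> '123445'
# 'ggg' -> ''
# '3,0 kg' -> '30'
# '2x3,25' -> '2325'
```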
For modifying strings in PostgreSQL, take a look at the String Functions and Operators section of the documentation. The function substring(string from pattern) uses POSIX regular expressions for pattern matching and works well for extracting the numeric part of your strings.
(Note that the VALUES clause inside the parentheses only provides the example material; you can replace it with any SELECT statement or table that provides the data):
SELECT substring(column1 from '(([0-9]+.*)*[0-9]+)'), column1 FROM
(VALUES
('ggg'),
('3,0 kg'),
('15 kg.'),
('2x3,25'),
('96+109')
) strings;
The regular expression explained in parts:
[0-9]+ - at least one digit, example: '789'
[0-9]+.* - at least one digit followed by anything (possibly nothing), example: '12smth'
([0-9]+.*)* - the pattern from the previous line repeated zero or more times, example: '12smth22smth'
(([0-9]+.*)*[0-9]+) - the pattern from the previous line followed by at least one digit at the end, example: '12smth22smth345'
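To check the pattern against the sample values, here is a Python sketch that mimics substring(... from pattern) by returning the first captured group (numeric_part is a hypothetical name, and Python's re stands in for PostgreSQL's regex engine). When nothing matches, as for 'ggg', substring returns NULL rather than an empty string; wrapping it in COALESCE(..., '') would give the "" the question asks for.

```python
import re

# Same pattern as in the SQL above; group 1 is the outer parenthesized group.
NUMERIC_PART = re.compile(r"(([0-9]+.*)*[0-9]+)")


def numeric_part(s):
    # Like substring(s from pattern): the first group's text, or None (NULL) if no match.
    m = NUMERIC_PART.search(s)
    return m.group(1) if m else None


for value in ["ggg", "3,0 kg", "15 kg.", "2x3,25", "96+109"]:
    print(repr(value), "->", repr(numeric_part(value)))
# 'ggg' -> None
# '3,0 kg' -> '3,0'
# '15 kg.' -> '15'
# '2x3,25' -> '2x3,25'
# '96+109' -> '96+109'
```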