I have a computed column that is a tsvector.
The API sends a search query, but of course these are not valid tsqueries. Postgres's plainto_tsquery converts text input into a correctly formatted tsquery for matching.
This breaks with SQLAlchemy.
column.match(func.plainto_tsquery('english', search)) does not work, because SQLAlchemy converts that to:
column @@ to_tsquery(plainto_tsquery('english', 'the search query'))
What I actually want is the correct operator (@@) but without the magic conversion:
column @@ plainto_tsquery('english', 'the search query')
A dumb way that works but is not what I want is:
column.match(
cast(func.plainto_tsquery('english', search), String)
)
How about
column.op("@@")(func.plainto_tsquery('english', search))
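For reference, you can verify in psql that @@ accepts the output of plainto_tsquery directly, with no cast (a quick sanity check, not part of the original answer):
-- plainto_tsquery already yields a tsquery, so @@ works on it as-is
SELECT to_tsvector('english', 'the search query')
       @@ plainto_tsquery('english', 'search query');  -- returns true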
I have data that contains emojis within a database column, i.e.
message_text
-------
😁
🙏
Hi 😊
I want to query only the rows that have data containing emojis. Is there a simple way to do this in postgres?
select data from table where data_col contains emoji
Currently I use a simple query
select message_text from cb_messages_v1 cmv
where message_text IN ('👍🏻','😁','🙏','🚀','🚧')
but I want it to be more dynamic, so that if new emojis are added in the future the query will still capture them.
From your example it seems like you are not only interested in emoticons (U+1F601 - U+1F64F), but also in Miscellaneous Symbols And Pictographs (U+1F300 - U+1F5FF) and Transport And Map Symbols (U+1F680 - U+1F6C5).
You can find values that contain one of these with
WHERE textcol ~ '[\U0001F300-\U0001F6FF]'
~ is the regular expression matching operator, and the pattern is a range of Unicode characters.
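Applied to the table from the question (assuming a UTF8 database encoding, which the \U escapes in the pattern require), that looks like:
-- matches any row containing a character in U+1F300..U+1F6FF
SELECT message_text
FROM cb_messages_v1
WHERE message_text ~ '[\U0001F300-\U0001F6FF]';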
I have a query with one parameter and am using jmoiron/sqlx to run it against Nominatim database that has a hstore field "name". The query itself is like
SELECT place_id, parent_place_id, name->'name:ru' as name from placesx WHERE admin_level = 3 and parent_place_id IN (?)
The problem is that when I use the sqlx.In, sqlx.Bind and sqlx.Prepare functions, sqlx takes :ru as a named query parameter and complains about it.
The question is: how can this be avoided, so that I can retrieve a specific locale value ('name:en', 'name:de', etc.) from the hstore without this collision?
So far I use a regular expression and do not unmarshal the string into hstore's map[string]string, since I couldn't figure out how to retrieve a value from it by key.
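One way to sidestep the collision (a sketch, not from the thread) is to keep the colon out of the SQL text entirely by passing the hstore key as its own bind parameter:
-- the key 'name:ru' travels as a bound value, so the statement that sqlx
-- scans for named parameters never contains a ':'
SELECT place_id, parent_place_id, name -> ? AS name
FROM placesx
WHERE admin_level = 3 AND parent_place_id IN (?)
-- expand and rebind as usual, e.g.: sqlx.In(query, "name:ru", ids)
-- (add a ::text cast on the first placeholder if the driver cannot infer its type)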
I've got a table TABLE that contains a jsonb column named tags. The tags element in each row may or may not contain a field called group. My goal is to group by tags.group for all rows where tags contains a group field. Like the following postgres query:
select tags->>'group' as group, sum(n) as sum
from TABLE
where tags ? 'group'
group by tags->>'group';
I'm trying to translate it into jOOQ and cannot find out how to express the where tags ? 'group' condition.
For example,
val selectGroup = DSL.field("{0}->>'{1}'", String::class.java, TABLE.TAGS, "group")
dsl().select(selectGroup, DSL.sum(TABLE.N))
.from(TABLE)
.where(TABLE.TAGS.contains("group"))
.groupBy(selectGroup)
This is equivalent to testing the containment condition @> in postgres. But I need the existence condition ?. How can I express that in jOOQ?
There are two things worth mentioning here:
The ? operator in JDBC
Unfortunately, there's no good solution to this as ? is currently strictly limited to be used as a bind variable placeholder in the PostgreSQL JDBC driver. So, even if you could find a way to send that character to the server through jOOQ, the JDBC driver would still misinterpret it.
A workaround is documented in this Stack Overflow question.
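In short (a sketch of that kind of workaround): every built-in PostgreSQL operator is backed by a function, and the function behind jsonb ? text is jsonb_exists, which contains no character for the JDBC driver to misinterpret:
SELECT tags ->> 'group' AS grp, sum(n) AS sum
FROM TABLE
WHERE jsonb_exists(tags, 'group')   -- same semantics as: tags ? 'group'
GROUP BY tags ->> 'group';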
Plain SQL and string literals
When you're using the plain SQL templating language in jOOQ, beware that there is a parser that will parse certain tokens of your string, including e.g. comments and string literals. This means that your usage of...
DSL.field("{0}->>'{1}'", String::class.java, TABLE.TAGS, "group")
is incorrect, as '{1}' will be parsed as a string literal and sent to the server as is. If you want to use a variable string literal, do this instead:
DSL.field("{0}->>{1}", String::class.java, TABLE.TAGS, DSL.inline("group"))
See also DSL.inline()
I have a table with a full text index on one of its columns. The text in this column can contain newline characters "\r\n", e.g.
line1\r\nline2
Currently I am using the following SQL to search
SELECT * FROM SomeTable WHERE CONTAINS(TextValue, '"*line1*"')
which matches successfully. However, my users would like to be able to enter
"line1 line2"
as a search term which should also match successfully. In other words, I would like the search to ignore "\r" and "\n".
I have tried
SELECT * FROM civ.MetadataFieldValues WHERE CONTAINS(REPLACE(REPLACE(TextValue, CHAR(13), ' '), CHAR(10), ''), '"*line1 line2*"')
but this returns the error
Incorrect syntax near '('.
Is there a way to achieve this using CONTAINS?
CONTAINS makes use of a full-text index, and its first argument must be a plain indexed column (or column list), which is why wrapping the column in REPLACE produces a syntax error. Besides, if you could modify the content of the field with a function prior to searching, the advantage of the full-text index would be nullified.
It is possible to create a full-text index on a view. Therefore, I would recommend creating a view with a second field containing the exact same contents as the currently searched field except that you would replace any undesirable special characters with spaces or something else innocuous. You can then create a full-text index on the new field in the view and use CONTAINS on that field in the view.
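A minimal sketch of that approach (untested; it assumes a unique Id key column and a default full-text catalog, and note that a full-text index on a view requires an indexed, i.e. schema-bound and materialized, view):
CREATE VIEW civ.MetadataFieldValuesClean WITH SCHEMABINDING AS
SELECT Id,
       TextValue,
       -- fold CR and LF into spaces; the word breaker ignores repeated whitespace
       REPLACE(REPLACE(TextValue, CHAR(13), ' '), CHAR(10), ' ') AS TextValueClean
FROM civ.MetadataFieldValues;
GO
-- an indexed view needs a unique clustered index before a full-text index
CREATE UNIQUE CLUSTERED INDEX IX_MFVClean ON civ.MetadataFieldValuesClean (Id);
GO
CREATE FULLTEXT INDEX ON civ.MetadataFieldValuesClean (TextValueClean)
    KEY INDEX IX_MFVClean;
GO
-- search the cleaned column instead of the original one
SELECT * FROM civ.MetadataFieldValuesClean
WHERE CONTAINS(TextValueClean, '"line1 line2"');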
Before I invest in using solr or lucene or sphinx, I wanted to try to implement a search capability on my system using postgresql full text search.
I have a national list of businesses in a table that I want to search. I created a tsvector that combines the business name and city so that I can do a search like "outback atlanta".
I am also implementing an auto-completion function by using the wildcard capability of the search: I append ":*" to the last keyword and insert " & " between keywords, so the search pattern "outback atl" turns into "outback & atl:*" before being converted into a query using to_tsquery().
Here's the problem that I am running into currently.
if the search pattern is entered as "ou", many "Outback Steakhouse" records are returned.
if the search pattern is entered as "out", no results are returned.
if the search pattern is entered as "outb", many "Outback Steakhouse" records are returned.
doing a little debugging, I came up with this:
select ts_rank(to_tsvector('Outback Steakhouse'),to_tsquery('ou:*')) as "ou",
ts_rank(to_tsvector('Outback Steakhouse'),to_tsquery('out:*')) as "out",
ts_rank(to_tsvector('Outback Steakhouse'),to_tsquery('outb:*')) as "outb"
which results this:
    ou     | out |   outb
-----------+-----+-----------
 0.0607927 |   0 | 0.0607927
What am I doing wrong?
Is this a limitation of pg full text search?
Is there something that I can do with my dictionary or configuration to get around this anomaly?
UPDATE:
I think that "out" may be a stop word.
when I run this debug query, I don't get any lexemes for "out"
SELECT * FROM ts_debug('english','out back outback');
   alias   |   description   |  token  |  dictionaries  |  dictionary  | lexemes
-----------+-----------------+---------+----------------+--------------+-----------
 asciiword | Word, all ASCII | out     | {english_stem} | english_stem | {}
 blank     | Space symbols   |         | {}             |              |
 asciiword | Word, all ASCII | back    | {english_stem} | english_stem | {back}
 blank     | Space symbols   |         | {}             |              |
 asciiword | Word, all ASCII | outback | {english_stem} | english_stem | {outback}
So now I ask how do I modify the stop word list to remove a word?
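One recipe (a sketch; the file and configuration names are illustrative): stop words live in a plain text file under $SHAREDIR/tsearch_data/. Copy english.stop to, say, english_noout.stop, delete the line containing "out", then build a configuration that uses it:
-- same stemmer as english_stem, but with the trimmed stop word file
CREATE TEXT SEARCH DICTIONARY english_stem_noout (
    TEMPLATE = snowball,
    Language = english,
    StopWords = english_noout          -- reads english_noout.stop
);
-- clone the english configuration and point word tokens at the new dictionary
CREATE TEXT SEARCH CONFIGURATION english_noout ( COPY = pg_catalog.english );
ALTER TEXT SEARCH CONFIGURATION english_noout
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH english_stem_noout;
-- to_tsvector('english_noout', 'out back outback') now keeps "out"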
UPDATE:
here is the query that I am currently using:
select id,name,address,city,state,likes
from view_business_favorite_count
where textsearchable_index_col @@ to_tsquery('simple',$1)
ORDER BY ts_rank(textsearchable_index_col, to_tsquery('simple',$1)) DESC
When I execute the query (I'm using Strongloop Loopback + Express + Node), I pass the pattern in for the $1 parameter. The pattern (as stated above) will look something like "keyword:*" or "keyword1 & keyword2 & ... & keywordN:*".
thanks
The problem here is that you are searching against business names, and as @Daniel correctly pointed out, the 'english' dictionary will not help you find a "fuzzy" match for non-dictionary words like "Outback Steakhouse".
'simple' dictionary
The 'simple' dictionary on its own will not help you either; business names will only match exactly, since all words are left unstemmed.
'simple' dictionary + pg_trgm
But if you use the 'simple' dictionary together with the pg_trgm module, it will be exactly what you need. In particular:
with to_tsvector('simple','<business name>') you don't need to worry about the stop word "hack", and you get all the lexemes unstemmed;
using similarity() from pg_trgm, the best match gets the highest "rank".
look at this:
WITH pg_trgm_test(business_name,search_pattern) AS ( VALUES
('Outback Steakhouse','ou'),
('Outback Steakhouse','out'),
('Outback Steakhouse','outb')
)
SELECT business_name,search_pattern,similarity(business_name,search_pattern)
FROM pg_trgm_test;
result:
business_name | search_pattern | similarity
--------------------+----------------+------------
Outback Steakhouse | ou | 0.1
Outback Steakhouse | out | 0.15
Outback Steakhouse | outb | 0.2
(3 rows)
By ordering by similarity DESC you will be able to get what you need.
UPDATE
For your situation there are two possible options.
Option #1.
Just create a trigram index on the name column of the view_business_favorite_count table; the index definition may be the following:
CREATE INDEX name_trgm_idx ON view_business_favorite_count USING gin (name gin_trgm_ops);
The query will look something like this:
SELECT
id,
name,
address,
city,
state,
likes,
similarity(name,$1) AS trgm_rank -- similarity score
FROM
view_business_favorite_count
WHERE
name % $1 -- trgm search
ORDER BY trgm_rank DESC;
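A caveat worth noting: the % operator only returns rows whose similarity clears pg_trgm's similarity threshold (default 0.3), and the prefix scores shown earlier (0.1 to 0.2) fall below it, so the threshold likely needs lowering first:
CREATE EXTENSION IF NOT EXISTS pg_trgm;   -- if not installed yet
SET pg_trgm.similarity_threshold = 0.1;   -- PostgreSQL 9.6+
-- on older versions: SELECT set_limit(0.1);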
Option #2.
With full text search, you need to:
create a separate table, for example unnested_business_names, with 2 columns: the first stores all the lexemes produced by to_tsvector('simple', name); the second stores vbfc_id (an FK to id in the view_business_favorite_count table);
add a trgm index on the column that contains the lexemes;
add a trigger on the underlying data to keep unnested_business_names up to date whenever rows in view_business_favorite_count are inserted, updated or deleted.
A sketch of the first two steps follows.
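Here is that sketch (names are illustrative; tsvector_to_array requires PostgreSQL 9.6+, and the trigger wiring is left out):
CREATE TABLE unnested_business_names (
    lexeme  text    NOT NULL,
    vbfc_id integer NOT NULL   -- id from view_business_favorite_count
);
-- one row per distinct unstemmed lexeme of each business name
INSERT INTO unnested_business_names (lexeme, vbfc_id)
SELECT DISTINCT lex, v.id
FROM view_business_favorite_count AS v,
     unnest(tsvector_to_array(to_tsvector('simple', v.name))) AS lex;
CREATE INDEX unnested_lexeme_trgm_idx
    ON unnested_business_names USING gin (lexeme gin_trgm_ops);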