PostgreSQL tsvector not matching some strings - postgresql

I am using PostgreSQL 11 and have created a tsvector column search_fields with a GIN index on it.
Data in table test:
  id   |           name           |         search_fields
-------+--------------------------+--------------------------------
 19973 | Ongoing 10x consultation | '10x' 'Ongoing' 'consultation'
 19974 | 5x marketing             | '5x' 'marketing'
 19975 | Ongoing 15x consultation | '15x' 'Ongoing' 'consultation'
The default text search config is set to 'pg_catalog.english'.
Both of the queries below output 0 rows.
select id, name, search_fields from test where search_fields @@ to_tsquery('ongoing');
id | name | search_fields
----+------+---------------
(0 rows)
select id, name, search_fields from test where search_fields @@ to_tsquery('simple', 'ongoing');
id | name | search_fields
----+------+---------------
(0 rows)
But when I pass the string '10x' or 'consultation', it returns the correct output.
Any idea why it does not find the word 'ongoing'?
Afterwards, I created a trigger using the function tsvector_update_trigger(), set default_text_search_config to 'pg_catalog.simple' in the postgresql.conf file, and repopulated search_fields. The query then output:
select id, name, search_fields from test where search_fields @@ to_tsquery('ongoing');
  id   |           name           |             search_fields
-------+--------------------------+---------------------------------------
 19973 | Ongoing 10x consultation | '10x':2 'consultation':3 'ongoing':1
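(For reference, a tsvector_update_trigger() trigger of this kind looks roughly as follows; the trigger name here is made up, and it assumes name is the only source column:)
-- Recompute search_fields from name on every INSERT or UPDATE:
CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE ON test
FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger(search_fields, 'pg_catalog.simple', name);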
This time, running the query with the string 'ongoing' returned the expected result.
select id, name, search_fields from test where search_fields @@ to_tsquery('ongoing');
  id   |           name           |             search_fields
-------+--------------------------+---------------------------------------
 19973 | Ongoing 10x consultation | '10x':2 'consultation':3 'ongoing':1
 19975 | Ongoing 15x consultation | '15x':2 'consultation':3 'ongoing':1
As per the above experiment, setting up the trigger and changing default_text_search_config to 'pg_catalog.simple' achieved the expected result.
What I still don't understand is why it did not work with default_text_search_config set to 'pg_catalog.english'.
Is a trigger always required when a tsvector column is used?
Any help in understanding the difference between the two configurations would be appreciated.
Thanks,
Nishit

You don't describe how you created your search_fields initially, but it was not constructed correctly: the sample data shows unnormalized lexemes such as 'Ongoing', with no position information, so the column was not built with to_tsvector(). With the 'english' configuration, to_tsquery('ongoing') stems the word to 'ongo', which cannot match the stored lexeme 'Ongoing'; even to_tsquery('simple', 'ongoing') produces lowercase 'ongoing', which still differs from 'Ongoing' in case. Since we don't know what you did, we can't say exactly what you did wrong, but if you rebuild the column correctly it will start working. When you changed default_text_search_config to 'simple', you appear to have correctly repopulated search_fields, which is why it worked. If you change back to 'english' and correctly repopulate search_fields, it will also work.
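For example, a minimal rebuild of the column (assuming name is the only text that should be indexed) would be:
-- Regenerate the stored tsvector from the name column:
UPDATE test SET search_fields = to_tsvector('english', name);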
You don't always need a trigger. A trigger is one way. Another way is to just manually update the tsvector column every time you update the text column. My usual favorite way is not to store the tsvector at all, and just derive it on the fly:
select id, name, search_fields from test where
to_tsvector('english', name) @@ to_tsquery('english', 'ongoing');
If you want to do it this way, you need to specify the configuration explicitly rather than relying on default_text_search_config, otherwise the expression GIN index will not be used. Also, this way is not a good idea if you want to use phrase searching, as the rechecking will be slow.
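A sketch of the matching expression index (the index name is made up):
-- GIN index over the same expression used in the WHERE clause:
CREATE INDEX test_name_fts_idx ON test USING gin (to_tsvector('english', name));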

Related

Is this INSERT statement containing SELECT subquery safe for multiple concurrent writes?

In Postgres, suppose I have the following table to be used like a singly linked list, where each row has a reference to the previous row.
Table node
   Column   |           Type           | Collation | Nullable |      Default
------------+--------------------------+-----------+----------+-------------------
 id         | uuid                     |           | not null | gen_random_uuid()
 created_at | timestamp with time zone |           | not null | now()
 name       | text                     |           | not null |
 prev_id    | uuid                     |           |          |
I have the following INSERT statement, which includes a SELECT subquery to look up the last row, whose id becomes the prev_id of the new row.
INSERT INTO node(name, prev_id)
VALUES (
:name,
(
SELECT id
FROM node
ORDER BY created_at DESC
LIMIT 1
)
)
RETURNING id;
I understand storing prev_id may seem redundant in this example (ordering can be derived from created_at), but that is beside the point. My question: is the above INSERT statement safe for multiple concurrent writes, or is it necessary to explicitly use LOCK in some way?
For clarity, by "safe" I mean: is it possible that by the time the SELECT subquery has executed and found the "last row", another concurrent query has just finished an insert, so that the "last row" found earlier is no longer the last row and this insert uses the wrong "last row" value? The effect would be that multiple rows share the same prev_id value, which is invalid for a linked list structure.
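The question mentions LOCK as one possible remedy; a sketch of that style of fix, serializing the inserters with a transaction-scoped advisory lock (the lock key 42 is arbitrary, and the literal 'example' stands in for the :name parameter), is:
BEGIN;
-- Only one session at a time can hold this lock; others block until COMMIT:
SELECT pg_advisory_xact_lock(42);
INSERT INTO node(name, prev_id)
VALUES (
    'example',
    (SELECT id FROM node ORDER BY created_at DESC LIMIT 1)
)
RETURNING id;
COMMIT;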

Postgres Full Text Search - Find Other Similar Documents

I am looking for a way to use Postgres (version 9.6+) Full Text Search to find other documents similar to an input document - essentially looking for a way to produce results similar to Elasticsearch's more_like_this query. As far as I can tell, Postgres offers no way to compare ts_vectors to each other.
I've tried various techniques, like converting the source document back into a ts_query or reprocessing the original doc, but that requires too much overhead.
Would greatly appreciate any advice - thanks!
Looks like the only option is to use pg_trgm instead of the Postgres built-in full text search. Here is how I ended up implementing this:
Using a simple table (or materialized view in this case) - it holds the primary key to the post and the full text body in two columns.
Materialized view "public.text_index"
 Column |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description
--------+---------+-----------+----------+---------+----------+--------------+-------------
 id     | integer |           |          |         | plain    |              |
 text   | text    |           |          |         | extended  |              |
View definition:
SELECT posts.id,
posts.body AS text
FROM posts
ORDER BY posts.publication_date DESC;
Then using a lateral join we can match rows and order them by similarity to find posts that are "close to" or "related" to any other post:
select *
from text_index tx
left join lateral (
    select similarity(tx.text, t.text)
    from text_index t
    where t.id = 12345
) s on true
order by similarity desc
limit 10;
This of course is a naive way to match documents and may require further tuning. Additionally, a pg_trgm GIN index on the text column will speed up the searches significantly.
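A sketch of that index (the index name is made up):
-- gin_trgm_ops is provided by the pg_trgm extension:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX text_index_text_trgm_idx ON text_index USING gin (text gin_trgm_ops);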

Is this a PostgreSQL bug? One row cannot be found by an equality query but can be found by LIKE

I have a table in which exactly one row cannot be found by an equality query, but can be found by LIKE (not including %).
PostgreSQL server version: 90513 (i.e. 9.5.13).
# select id,external_id,username,external_id from users where username = 'oFIC94vdidrrKHpi5lc1_2Ibv-OA';
id | external_id | username | external_id
----+-------------+----------+-------------
(0 rows)
# select id,external_id,username,external_id from users where username like 'oFIC94vdidrrKHpi5lc1_2Ibv-OA';
                  id                  |         external_id          |           username           |         external_id
--------------------------------------+------------------------------+------------------------------+------------------------------
 61ebea19-74f5-4713-9a30-63eb5af8ac8f | oFIC94vdidrrKHpi5lc1_2Ibv-OA | oFIC94vdidrrKHpi5lc1_2Ibv-OA | oFIC94vdidrrKHpi5lc1_2Ibv-OA
(1 row)
If I dump this table and restore it, the problem is fixed. But why?
Is it a PostgreSQL bug? How can I work around it? I've run into it twice.
Do you have an index on this table? If yes, this looks like a corrupted index - PostgreSQL uses the index in the first case, and if the index is corrupt it might return no result.
This is usually a bug, either a software one or a hardware one (data loss on power loss, or memory issues). Try dropping and recreating the index, or rebuilding it with REINDEX: https://www.postgresql.org/docs/9.3/sql-reindex.html
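For example (the index name here is hypothetical):
-- Rebuild one suspect index:
REINDEX INDEX users_username_idx;
-- Or rebuild every index on the table:
REINDEX TABLE users;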

Insert last characters of a value in Postgres table while inserting that same value

CONTEXT:
I'm currently building a custom import script in Python with psycopg2 that inserts values from a csv file into a Postgres database. The csv, however, provides a value that needs refining.
PROBLEM: The example below shows what I want:
I want the last 5 digits of the 15-digit value.
mytestdb=# select * from testtable;
 uid | first_name | last_name | age |    15-digit    | last_5_digits
-----+------------+-----------+-----+----------------+---------------
   1 | John       | Doe       |  42 | 99999999912345 | 12345
I know I could accomplish this by first inserting the supplied values (first_name, last_name, age and 15-digit) and then filling the last_5_digits field with RIGHT("15-digit", 5) in a separate UPDATE statement.
I would however prefer to do this during the initial INSERT of the row. This would considerably lower the number of transactions on the database.
Could anyone help me get this done?
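A sketch of the single-statement INSERT the question is after, using the example's column names (a column named 15-digit must be double-quoted in SQL, the value is assumed to be stored as text, and %s are psycopg2 placeholders - pass the 15-digit value twice in the parameter tuple):
-- Store the raw value and its last 5 characters in one statement:
INSERT INTO testtable (first_name, last_name, age, "15-digit", last_5_digits)
VALUES (%s, %s, %s, %s, RIGHT(%s, 5));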

DB2 table partitioning and delete old records based on condition

I have a table with a few million records.
___________________________________________________________
| col1 | col2 | col3 | some_indicator | last_updated_date |
-----------------------------------------------------------
| | | | yes | 2009-06-09.12.2345|
-----------------------------------------------------------
| | | | yes | 2009-07-09.11.6145|
-----------------------------------------------------------
| | | | no | 2009-06-09.12.2345|
-----------------------------------------------------------
I have to delete records older than a month that have some_indicator=no,
and records older than a year that have some_indicator=yes. This job will run every day.
Can I use the DB2 partitioning feature for the above requirement?
How can I partition the table using the last_updated_date column and the two some_indicator values?
One partition should contain the records falling under the monthly delete criterion, whereas the other should contain the yearly delete criterion records.
Are there any performance issues associated with table partitioning if this table is frequently read and upserted?
Any other best practices for the above requirement would surely help.
I haven't done much with partitioning (I've mostly worked with DB2 on the iSeries), but from what I understand, you don't generally want to be shuffling things between partitions (i.e., making a partition mean '1 month ago'). I'm not even sure it's possible. If it were, you'd have to scan some (potentially large) portion of your table every day just to move rows (select, insert, delete, in a transaction).
Besides which, partitioning is a DB admin problem, and it sounds like you just have a DB user problem - namely, deleting 'old' records. I'd just do this in a couple of statements:
DELETE FROM myTable
WHERE some_indicator = 'no'
AND last_updated_date < TIMESTAMP(CURRENT_DATE - 1 MONTH, TIME('00:00:00'))
and
DELETE FROM myTable
WHERE some_indicator = 'yes'
AND last_updated_date < TIMESTAMP(CURRENT_DATE - 1 YEAR, TIME('00:00:00'))
.... and you can pretty much ignore using a transaction, as you want the rows gone.
(As a side note, using 'yes' and 'no' for indicators is terrible. If you're not on a version that has a logical (boolean) type, store character '0' (false) and '1' (true).)