Speeding up postgres query

I have a table 'auctions' with 60k records.
It has a tsvector column, auctions.tsvector_content_tsearch, that contains text-search vectors like the one below:
107658 | '-75':75 '-83':81 '0.265':49 '0.50':140 '1':62 '1000':61 '1080':38 '16':39 '160':91 '170':86 '1920':36 '1920x1080':65,69 '2':154 '219':129 '23':3,20 '23.0':31 '236v3lsb':6,23 '24':164 '24.75':134 '250':58 '3.190':117 '30':80 '426':127 '5':54 '5.0':99 '56':74 '566':125 '9':40 'black':45 'cal':32 'cd/m2':59 'compatible':158 'czarny':46 'czas':51 'czuwać':139 'częstotliwość':71,77 'd':110,114 'd-sub':109 'dodatkowy':146,152 'dvi':7,24,113 'dvi-d':112 'ekran':30 'energia':131,137 'energy':97 'epeat':100 'ergonomics':104 'full':41 'g':120 'gs':106 'gwarancja':153,161 'hd':42 'hz':76 'informacja':151 'jasność':56 'kabel':147,149 'kensington':156 'kg':116,118 'khz':82 'kolor':43 'kontrast':60 'kąt':83,88 'lcd':2,15,19 'led':4,21 'lina':28 'lock':157 'maksymalny':64 'matryca':34,53,57 'miejsce':165 'miesiąc':163 'mm':50 'monitor':1,14,18,96 'mś':55 'nazwać':12 'norma':93 'obudowa':44 'odchylać':72,78 'ogólny':17 'okres':159 'opis':16 'optymalny':68 'philips':5,11,22 'piksel':66,70 'pionowy':73,85 'plamka':48 'pobór':130,136 'poziom':90 'poziomy':79,90 'producent':10 'przekątna':29 'przeć':133 'reakcja':52 'rodzina':25 'rohs':102 'rozdzielczość':63,67 'rękojmia':160 'serwis':167 'serwisować':166 'silver':101 'specyfikacja':8 'spełniać':94 'star':98 'stopień':87,92 'sub':111 'techniczny':9 'tryb':138 'tryba':138 'tuv':103,105 'typ':13,33 'v':27 'v-line':26 'vga':150 'waga':115 'wbudować':142 'widzenia':84,89 'widzenie':84,89 'widzieć':84,89 'wielkość':47 'wuxga':35 'wymiar':119 'wyposażenie':145 'wyposażyć':145 'x':37,121,123,126,128 'zasilacz':143 'zasilać':148 'zewn':108 'zewnętrzny':168 'złączać':107 'złącze':107 'łat':155 'ś':122
The auctions table has an index on that column:
"auctions_tsvector_content_tsearch_idx" gin (tsvector_content_tsearch)
When I search for matching vectors, the query takes about 4000-5000 ms, which is too long.
Is there any way to speed things up here?
EXPLAIN SELECT auctions.id FROM auctions WHERE (auctions.tsvector_content_tsearch @@ to_tsquery('polish', 'lcd'));
QUERY PLAN
--------------------------------------------------------------
Seq Scan on auctions (cost=0.00..6598.02 rows=7762 width=4)
Filter: (tsvector_content_tsearch @@ '''lcd'''::tsquery)
(2 rows)
__ EDIT __
OK, I think I found the problem: the Polish dictionary.
Switching to the standard Postgres dictionary fixes the long query time.
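For comparison, a minimal sketch of the same search run against the built-in 'simple' configuration instead of the Polish one (only the table and column already shown above are used):
EXPLAIN ANALYZE
SELECT auctions.id
FROM auctions
WHERE auctions.tsvector_content_tsearch @@ to_tsquery('simple', 'lcd');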
Thanks for the tips.

Apparently, the planner estimated that a sequential scan would be faster than using the index. Try the following:
SET enable_seqscan = off (useful for testing; do not use it in production)
Raising the statistics target for the column
That behaviour sometimes occurs with GIN indices. Check this thread on the PostgreSQL mailing list. You can also consult the official PostgreSQL documentation about this issue.
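A minimal sketch of both suggestions (the statistics target of 1000 is just an example value):
-- For testing only: steer the planner away from a sequential scan
SET enable_seqscan = off;
-- Raise the statistics target for the tsvector column, then re-analyze
ALTER TABLE auctions ALTER COLUMN tsvector_content_tsearch SET STATISTICS 1000;
ANALYZE auctions;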

Related

psycopg2 execute_values fails with > 100 rows in supabase?

Not sure if this is a Supabase issue or a psycopg2 issue honestly and would love some help debugging.
I have the following code:
args = [('HaurGlass','60000','2022-10-20T21:15:39.751Z','10130506261','ac76e8db-ace0-40df-b6fa-f470641805e9','ad43639e-f66e-49d5-8fe8-d1ce5cd26193','{}')]
statement = ('''
    INSERT INTO %s (%s) VALUES %s ON CONFLICT (company_id, crm_id)
    DO UPDATE SET (%s)=(%s) RETURNING crm_id, id''')
statement = cur.mogrify(statement,
                        (AsIs(db_table), AsIs(','.join(keys)),
                         AsIs("%s"), AsIs(','.join(update_keys)),
                         AsIs(','.join(excluded_keys))))
output = execute_values(cur, statement, args, fetch=True)
The weird thing is that if args is <=100 rows in length, this query works without any problems. As soon as I increase the length of args to 101 rows or more, my Postgres logs show:
INSERT INTO licenses (name,value,subscription_end,crm_id,company_id,csm_id,custom_data) VALUES ('HaurGlass','60000','2022-10-20T21:15:39.751Z','10130506261','ac76e8db-ace0-40df-b6fa-f470641805e9','ad43639e-f66e-49d5-8fe8-d1ce5cd26193','{}')...
which would be good, except that it's immediately followed by:
INSERT INTO licenses (name,value,subscription_end,crm_id,company_id,csm_id,custom_data) VALUES ('HaurGlass','60000','2022-10-20T21:15:39.751Z','10130506261','ac76e8db-ace0-40df-b6fa-f470641805e9',NULL,'{}'),...
I've also confirmed that the number of records in the second "NULLifying" query is exactly equal to len(args)-100.
Any idea what is going on?
OK so it turns out I was missing the page_size parameter. All I had to do was:
output = execute_values(cur, statement, args, fetch=True, page_size=len(args))
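For context: execute_values splits the argument list into pages of page_size rows (100 by default) and executes the statement once per page, which is why everything past the first 100 rows ended up in a separate statement.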

Is it possible to have hibernate generate update from values statements for postgresql?

Given a postgresql table
Table "public.test"
Column | Type | Modifiers
----------+-----------------------------+-----------
id | integer | not null
info | text |
And the following values:
# select * from test;
id | info
----+--------------
3 | value3
4 | value4
5 | value5
As you may know, with PostgreSQL you can use this kind of statement to update multiple rows with different values:
update test set info=tmp.info from (values (3,'newvalue3'),(4,'newvalue4'),(5,'newvalue5')) as tmp (id,info) where test.id=tmp.id;
And it results in the table being updated in a single query to:
# select * from test;
id | info
----+--------------
3 | newvalue3
4 | newvalue4
5 | newvalue5
I have been looking around everywhere for how to make Hibernate generate this kind of statement for update queries. I know how to make it work for insert queries (with the reWriteBatchedInserts JDBC option and Hibernate batch config options).
But is it possible for update queries, or do I have to write the native query myself?
No matter what I do, Hibernate always sends separate update queries to the database (I'm looking at the PostgreSQL server statement logs to confirm this).
2020-06-18 08:19:48.895 UTC [1642] LOG: execute S_6: BEGIN
2020-06-18 08:19:48.895 UTC [1642] LOG: execute S_8: update test set info = $1 where id = $2
2020-06-18 08:19:48.895 UTC [1642] DETAIL: parameters: $1 = 'newvalue3', $2 = '3'
2020-06-18 08:19:48.896 UTC [1642] LOG: execute S_8: update test set info = $1 where id = $2
2020-06-18 08:19:48.896 UTC [1642] DETAIL: parameters: $1 = 'newvalue4', $2 = '4'
2020-06-18 08:19:48.896 UTC [1642] LOG: execute S_8: update test set info = $1 where id = $2
2020-06-18 08:19:48.896 UTC [1642] DETAIL: parameters: $1 = 'newvalue4', $2 = '5'
2020-06-18 08:19:48.896 UTC [1642] LOG: execute S_1: COMMIT
I always find it many times faster to issue a single massive update query than many separate updates targeting single rows. With many separate update queries, even though they are sent in a batch by the JDBC driver, they still need to be processed sequentially by the server, so it is not as efficient as a single update query targeting multiple rows. So if anyone has a solution that wouldn't involve writing native queries for my entities, I would be very glad!
Update
To further refine my question I want to add a clarification. I'm looking for a solution that wouldn't abandon Hibernate's dirty checking feature for entity updates. I'm trying to avoid writing batch update queries by hand for the general case of updating a few basic fields with different values on an entity list. I'm currently looking into Hibernate's SPI to see if it's doable. org.hibernate.engine.jdbc.batch.spi.Batch seems to be the proper place, but I'm not quite sure yet because I've never done anything with the Hibernate SPI. Any insights would be welcome!
You can use Blaze-Persistence for this, which is a query builder on top of JPA that supports many advanced DBMS features on top of the JPA model.
It does not yet support the FROM clause in DML, but that is about to land in the next release: https://github.com/Blazebit/blaze-persistence/issues/693
Meanwhile you could use CTEs for this. First you need to define a CTE entity (a concept of Blaze-Persistence):
@CTE
@Entity
public class InfoCte {
    @Id Integer id;
    String info;
}
I'm assuming your entity model looks roughly like this
@Entity
public class Test {
    @Id Integer id;
    String info;
}
Then you can use Blaze-Persistence like this:
criteriaBuilderFactory.update(entityManager, Test.class, "test")
    .with(InfoCte.class, false)
        .fromValues(Test.class, "newInfos", newInfosCollection)
        .bind("id").select("newInfos.id")
        .bind("info").select("newInfos.info")
    .end()
    .set("info")
        .from(InfoCte.class, "cte")
        .select("cte.info")
        .where("cte.id").eqExpression("test.id")
    .end()
    .whereExists()
        .from(InfoCte.class, "cte")
        .where("cte.id").eqExpression("test.id")
    .end()
    .executeUpdate();
This will create an SQL query similar to the following
WITH InfoCte(id, info) AS (
    SELECT t.id, t.info
    FROM (VALUES (1, 'newValue', ...)) t(id, info)
)
UPDATE test
SET info = (SELECT cte.info FROM InfoCte cte WHERE cte.id = test.id)
WHERE EXISTS (SELECT 1 FROM InfoCte cte WHERE cte.id = test.id)

Add exclusion to Doctrine Query Builder in JOIN

I have built the following query with the Doctrine Query Builder in my Symfony application.
$qb->select('c')
    ->from('AppBundle:Course', 'c')
    ->join('AppBundle:Log', 'a', Expr\Join::WITH, $qb->expr()->eq('c.id', 'a.course'))
    ->where($qb->expr()->in('a.type', ':type'))
    ->andWhere($qb->expr()->between('a.time', ':start', ':end'))
    ->andWhere($qb->expr()->eq('c.status', ':status'))
    ->setParameter(':type', ['opened'])
    ->setParameter(':standardScratchScore', [74])
    ->setParameter(':status', Course::OPENED)
    ->setParameter(':start', $dateFrom->format('Y-m-d H:i:s'))
    ->setParameter(':end', $dateTo->format('Y-m-d H:i:s'))
;
In my code I iterate over the Courses and then again query the Log table to check that an entry with a specific type doesn't exist for the Course. Is there a way I can incorporate the exclusion of log.type = 'log.sent-email' for this Course into this initial query, without using something like a sub-select?
Querying the same table again within the loop feels sub-optimal to me and NewRelic suggests it is hurting the performance of my application.
Well you can always join the table one more time for this specific need:
$qb->select('c')
    ->from('AppBundle:Course', 'c')
    ->join('AppBundle:Log', 'a', Expr\Join::WITH, $qb->expr()->eq('c.id', 'a.course'))
    ->leftJoin(
        'AppBundle:Log',
        'b',
        Expr\Join::WITH,
        $qb->expr()->andX(
            $qb->expr()->eq('c.id', 'b.course'),
            $qb->expr()->eq('b.type', $qb->expr()->literal('log.sent-email'))
        )
    ) // join log a second time, with the type condition
    ->where($qb->expr()->in('a.type', ':type'))
    ->andWhere($qb->expr()->between('a.time', ':start', ':end'))
    ->andWhere($qb->expr()->eq('c.status', ':status'))
    ->andWhere($qb->expr()->isNull('b.type')) // only keep courses where no such log record is found
    ->setParameter(':type', ['opened'])
    ->setParameter(':standardScratchScore', [74])
    ->setParameter(':status', Course::OPENED)
    ->setParameter(':start', $dateFrom->format('Y-m-d H:i:s'))
    ->setParameter(':end', $dateTo->format('Y-m-d H:i:s'))
;
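In SQL terms this is the classic LEFT JOIN / IS NULL anti-join; a sketch of what the generated query boils down to (the table and column names here are assumptions, since the actual schema isn't shown):
SELECT c.*
FROM course c
JOIN log a ON a.course_id = c.id
LEFT JOIN log b ON b.course_id = c.id AND b.type = 'log.sent-email'
WHERE a.type IN ('opened')
  AND a.time BETWEEN :start AND :end
  AND c.status = :status
  AND b.type IS NULL  -- only courses with no 'log.sent-email' entry remain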

Create a middle line through a polygon path using postgis

I'm trying to create a middle line through a polygon path, but I'm having problems and am now totally lost on how to do it. Can anyone help me achieve this goal?
ST_ApproximateMedialAxis might be what you're looking for.
This PostGIS function can be installed with the extension postgis_sfcgal:
CREATE EXTENSION postgis_sfcgal
Data Sample:
CREATE TABLE t (geom GEOMETRY);
INSERT INTO t VALUES ('POLYGON((-4.689807593822478 54.20411976258862,-4.68751162290573 54.20415427666532,-4.686465561389922 54.20414172609529,-4.685768187046051 54.20414800138079,-4.685280025005341 54.20414486373812,-4.685070812702178 54.204126037877415,-4.685092270374298 54.2040538719985,-4.685854017734527 54.204078973188075,-4.687039554119109 54.20407583554021,-4.688123166561126 54.204082110835685,-4.689078032970428 54.2040601472973,-4.689936339855194 54.20403818374726,-4.689807593822478 54.20411976258862))');
Query:
SELECT ST_ASText(ST_ApproximateMedialAxis(geom)) FROM t;
--------------------------------------------------------
MULTILINESTRING((-4.68993633985519 54.2040381837473,-4.68979598869017 54.2040808603332),(-4.68812343743121 54.2041135944836,-4.68907889005644 54.2040954248621),(-4.68812343743121 54.2041135944836,-4.68751156547988 54.2041164214432),(-4.68646560965613 54.2041095395079,-4.6858535007301 54.2041131034922),(-4.68646560965613 54.2041095395079,-4.68703949691079 54.2041122226661),(-4.6858535007301 54.2041131034922,-4.68576814087419 54.2041120816007),(-4.68907889005644 54.2040954248621,-4.68979598869017 54.2040808603332),(-4.68576814087419 54.2041120816007,-4.68528206303828 54.2041025125126),(-4.68703949691079 54.2041122226661,-4.68751156547988 54.2041164214432),(-4.68512015518242 54.2040925683677,-4.68528206303828 54.2041025125126))
(1 row)
Depending on your use case, another option would be ST_StraightSkeleton:
SELECT ST_ASText(ST_StraightSkeleton(geom)) FROM t;
-----------------------------------------------------
MULTILINESTRING((-4.68980759382248 54.2041197625886,-4.68979598869017 54.2040808603332),(-4.68993633985519 54.2040381837473,-4.68979598869017 54.2040808603332),(-4.68907803297043 54.2040601472973,-4.68907889005644 54.2040954248621),(-4.68812316656113 54.2040821108357,-4.68812343743121 54.2041135944836),(-4.68703955411911 54.2040758355402,-4.68703949691079 54.2041122226661),(-4.68585401773453 54.2040789731881,-4.6858535007301 54.2041131034922),(-4.6850922703743 54.2040538719985,-4.68512015518242 54.2040925683677),(-4.68507081270218 54.2041260378774,-4.68512015518242 54.2040925683677),(-4.68528002500534 54.2041448637381,-4.68528206303828 54.2041025125126),(-4.68576818704605 54.2041480013808,-4.68576814087419 54.2041120816007),(-4.68646556138992 54.2041417260953,-4.68646560965613 54.2041095395079),(-4.68751162290573 54.2041542766653,-4.68751156547988 54.2041164214432),(-4.68812343743121 54.2041135944836,-4.68907889005644 54.2040954248621),(-4.68812343743121 54.2041135944836,-4.68751156547988 54.2041164214432),(-4.68646560965613 54.2041095395079,-4.6858535007301 54.2041131034922),(-4.68646560965613 54.2041095395079,-4.68703949691079 54.2041122226661),(-4.6858535007301 54.2041131034922,-4.68576814087419 54.2041120816007),(-4.68907889005644 54.2040954248621,-4.68979598869017 54.2040808603332),(-4.68576814087419 54.2041120816007,-4.68528206303828 54.2041025125126),(-4.68703949691079 54.2041122226661,-4.68751156547988 54.2041164214432),(-4.68512015518242 54.2040925683677,-4.68528206303828 54.2041025125126))
(1 row)

H2 Optimize select statement / shutdown defrag

Test Case:
drop table master;
create table master(id int primary key, fk1 int, fk2 int, fk3 int, dataS varchar(255), data1 int, data2 int, data3 int, data4 int,data5 int,data6 int,data7 int,data8 int,data9 int,b1 boolean,b2 boolean,b3 boolean,b4 boolean,b5 boolean,b6 boolean,b7 boolean,b8 boolean,b9 boolean,b10 boolean,b11 boolean,b12 boolean,b13 boolean,b14 boolean,b15 boolean,b16 boolean,b17 boolean,b18 boolean,b19 boolean,b20 boolean,b21 boolean,b22 boolean,b23 boolean,b24 boolean,b25 boolean,b26 boolean,b27 boolean,b28 boolean,b29 boolean,b30 boolean,b31 boolean,b32 boolean,b33 boolean,b34 boolean,b35 boolean,b36 boolean,b37 boolean,b38 boolean,b39 boolean,b40 boolean,b41 boolean,b42 boolean,b43 boolean,b44 boolean,b45 boolean,b46 boolean,b47 boolean,b48 boolean,b49 boolean,b50 boolean);
create index idx_comp on master(fk1,fk2,fk3);
#loop 5000000 insert into master values(?, mod(?,100), mod(?,5), ?,'Hello World Hello World Hello World',?, ?, ?,?, ?, ?, ?, ?, ?,true,true,true,true,true,true,false,false,false,true,true,true,true,true,true,true,false,false,false,true,true,true,true,true,true,true,false,false,false,true,true,true,true,true,true,true,false,false,false,true,true,true,true,true,true,true,false,false,false,true);
1. The following select statement takes up to 30 seconds. Is there a way to optimize the response time?
SELECT count(*), SUM(CONVERT(b1,INT)) ,SUM(CONVERT(b2,INT)),SUM(CONVERT(b3,INT)),SUM(CONVERT(b4,INT)),SUM(CONVERT(b5,INT)),SUM(CONVERT(b6,INT)),SUM(CONVERT(b7,INT)),SUM(CONVERT(b8,INT)),SUM(CONVERT(b9,INT)),SUM(CONVERT(b10,INT)),SUM(CONVERT(b11,INT)),SUM(CONVERT(b12,INT)),SUM(CONVERT(b13,INT)),SUM(CONVERT(b14,INT)),SUM(CONVERT(b15,INT)),SUM(CONVERT(b16,INT))
FROM master
WHERE fk1=53 AND fk2=3
2. I tried SHUTDOWN DEFRAG, but that statement took about 40 minutes for my test case. After SHUTDOWN DEFRAG the select takes up to 15 seconds. If I execute the statement again it takes under 1 second. Even if I stop and start the server, the statement still takes about 1 second.
Does H2 have a persistent cache?
Infrastructure: WebBrowser <-> H2 Console Server <-> H2 DB: h2 1.3.158
According to the profiler output, the main problem (93%) is reading from the disk. I ran this in the H2 Console:
#prof_start;
SELECT ... FROM master WHERE fk1=53 AND fk2=3;
#prof_stop;
and got:
Profiler: top 3 stack trace(s) of 48039 ms [build-158]:
4084/4376 (93%):
at java.io.RandomAccessFile.readBytes(Native Method)
at java.io.RandomAccessFile.read(RandomAccessFile.java:338)
at java.io.RandomAccessFile.readFully(RandomAccessFile.java:397)
at org.h2.store.FileStore.readFully(FileStore.java:285)
at org.h2.store.PageStore.readPage(PageStore.java:1253)
at org.h2.store.PageStore.getPage(PageStore.java:707)
at org.h2.index.PageDataIndex.getPage(PageDataIndex.java:225)
at org.h2.index.PageDataNode.getRowWithKey(PageDataNode.java:269)
at org.h2.index.PageDataNode.getRowWithKey(PageDataNode.java:270)
According to EXPLAIN ANALYZE SELECT, it's reading over 55,000 pages from disk (2 KB per page; 110 MB) for this query. I'm not sure how other databases perform for such a query, but I guess if possible the query should be changed so that it reads less data.
Is it possible to have a temporary table/view that already has the datatype conversions done? If it's feasible to refresh it from the main table occasionally (once a night or so), then most of the processing power that goes into the conversion has already been spent by the time you query.
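A sketch of that idea, assuming a helper table (the name master_flags and the subset of columns are just examples) that is rebuilt from the main table on a schedule:
-- Hypothetical summary table with the flags pre-converted to INT
DROP TABLE IF EXISTS master_flags;
CREATE TABLE master_flags AS
SELECT fk1, fk2,
       CONVERT(b1, INT) AS b1, CONVERT(b2, INT) AS b2, CONVERT(b3, INT) AS b3
FROM master;
CREATE INDEX idx_master_flags ON master_flags(fk1, fk2);

-- The expensive query then becomes a plain SUM over pre-converted columns
SELECT COUNT(*), SUM(b1), SUM(b2), SUM(b3)
FROM master_flags
WHERE fk1 = 53 AND fk2 = 3;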
If that's not feasible, you may want to do multiple sub-selects, one for each "b" column, where you only pull where b# = 1. Then do a COUNT instead of a SUM, which should be faster as well. For instance:
SELECT count1 + count2 AS total FROM (
  SELECT
    (SELECT COUNT(*) FROM master WHERE fk1=53 AND fk2=3 AND b1=TRUE) AS count1,
    (SELECT COUNT(*) FROM master WHERE fk1=53 AND fk2=3 AND b2=TRUE) AS count2
) t;
I'm not sure if that exact syntax works in your program, but hopefully as a generic SQL idea it gets you on the right track.