Strange PostgreSQL index usage with LIMIT..OFFSET

PostgreSQL 9.6.3 on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit
Table and indices:
create table if not exists orders
(
id bigserial not null constraint orders_pkey primary key,
partner_id integer,
order_id varchar,
date_created date,
state_code integer,
state_date timestamp,
recipient varchar,
phone varchar
);
create index if not exists orders_partner_id_index on orders (partner_id);
create index if not exists orders_order_id_index on orders (order_id);
create index if not exists orders_partner_id_date_created_index on orders (partner_id, date_created);
The task is to implement paging, sorting, and filtering of the data.
The query for the first page:
select order_id, date_created, recipient, phone, state_code, state_date
from orders
where partner_id=1 and date_created between '2019-04-01' and '2019-04-30'
order by order_id asc limit 10 offset 0;
The query plan:
QUERY PLAN
"Limit (cost=19495.48..38990.41 rows=10 width=91)"
" -> Index Scan using orders_order_id_index on orders (cost=0.56..**41186925.66** rows=21127 width=91)"
" Filter: ((date_created >= '2019-04-01'::date) AND (date_created <= '2019-04-30'::date) AND (partner_id = 1))"
Index orders_partner_id_date_created_index is not used, so the cost is extremely high!
But starting from some offset values (the exact value differs from time to time, looks like it depends on total row count) the index starts to be used:
select order_id, date_created, recipient, phone, state_code, state_date
from orders
where partner_id=1 and date_created between '2019-04-01' and '2019-04-30'
order by order_id asc limit 10 offset 40;
Plan:
QUERY PLAN
"Limit (cost=81449.76..81449.79 rows=10 width=91)"
" -> Sort (cost=81449.66..81502.48 rows=21127 width=91)"
" Sort Key: order_id"
" -> Bitmap Heap Scan on orders (cost=4241.93..80747.84 rows=21127 width=91)"
" Recheck Cond: ((partner_id = 1) AND (date_created >= '2019-04-01'::date) AND (date_created <= '2019-04-30'::date))"
" -> Bitmap Index Scan on orders_partner_id_date_created_index (cost=0.00..4236.65 rows=21127 width=0)"
" Index Cond: ((partner_id = 1) AND (date_created >= '2019-04-01'::date) AND (date_created <= '2019-04-30'::date))"
What's happening? Is there a way to force the server to use the index?

General answer:
Postgres stores statistics about your tables.
Before executing a query, the planner prepares an execution plan based on that information.
In your case, the planner estimates that for certain offset values this sub-optimal plan will be cheaper. Note that your desired plan requires sorting all selected rows by order_id, while this "worse" plan does not. I'd guess that Postgres bets there will be quite a lot of matching rows spread across the orders, so it just tests one order after another, starting from the lowest.
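The switch between the two plans can be reproduced with back-of-the-envelope arithmetic. The sketch below is Python, not PostgreSQL code. It assumes the planner costs a Limit node by interpolating linearly between the child plan's startup cost and total cost. The names limit_cost and crossover_offset are made up for this illustration, and the numbers are copied from the two plans in the question. The cost of the sort-based plan is held fixed at the value shown, which is a simplification.

```python
def limit_cost(startup, total, rows, limit, offset=0):
    """Estimated (startup, total) cost of LIMIT/OFFSET on top of a child plan.

    Assumes the child's cost grows linearly with the number of rows fetched.
    """
    per_row = (total - startup) / rows
    first_row = startup + per_row * offset
    last_row = startup + per_row * (offset + limit)
    return round(first_row, 2), round(last_row, 2)


# Index Scan on orders_order_id_index, from the first plan in the question.
INDEX_SCAN = {"startup": 0.56, "total": 41186925.66, "rows": 21127}
# Total cost of the Limit over Sort + Bitmap Heap Scan, from the second plan.
SORT_PLAN_TOTAL = 81449.79


def crossover_offset(limit=10):
    """Smallest OFFSET at which the ordered index scan looks more expensive."""
    offset = 0
    while limit_cost(limit=limit, offset=offset, **INDEX_SCAN)[1] <= SORT_PLAN_TOTAL:
        offset += 1
    return offset


if __name__ == "__main__":
    for offset in (0, 10, 40):
        print(offset, limit_cost(limit=10, offset=offset, **INDEX_SCAN))
    print("index scan stops winning at OFFSET", crossover_offset())
```

With these inputs, OFFSET 10 gives the 19495.48..38990.41 range shown on the Limit line of the first plan, while at OFFSET 40 the estimate for walking orders_order_id_index (about 97475) exceeds the 81449.79 of the sort-based plan, so the planner changes its choice. The real crossover moves with the table statistics, which would fit the observation that the exact offset differs from time to time.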
I can think of two solutions:
A) provide more data to the planner, by running
ANALYZE orders;
(https://www.postgresql.org/docs/9.6/sql-analyze.html)
or by changing the gathered statistics
ALTER TABLE orders ALTER COLUMN ... SET STATISTICS ...;
(https://www.postgresql.org/docs/9.6/planner-stats.html)
B) Rewrite query in a way that hints desired index usage, like this:
WITH
partner_date (partner_id, date_created) AS (
SELECT 1,
generate_series('2019-04-01'::date, '2019-04-30'::date, '1 day'::interval)::date
)
SELECT o.order_id, o.date_created, o.recipient, o.phone, o.state_code, o.state_date
FROM orders o
JOIN partner_date pd
ON (o.partner_id, o.date_created) = (pd.partner_id, pd.date_created)
ORDER BY order_id ASC LIMIT 10 OFFSET 0;
Or maybe even better:
WITH
partner_date (partner_id, date_created) AS (
SELECT 1,
generate_series('2019-04-01'::date, '2019-04-30'::date, '1 day'::interval)::date
),
all_data AS (
SELECT o.order_id, o.date_created, o.recipient, o.phone, o.state_code, o.state_date
FROM orders o
JOIN partner_date pd
ON (o.partner_id, o.date_created) = (pd.partner_id, pd.date_created)
)
SELECT *
FROM all_data
ORDER BY order_id ASC LIMIT 10 OFFSET 0;
Disclaimer: I can't explain why the first query should be interpreted differently by the Postgres planner; I just think it could be. On the other hand, the second query separates offsets/limits from joins, and I'd be very surprised if Postgres still did it the "bad" (according to your benchmarks) way.

Related

Postgres does not use composite index

I have a table.
CREATE TABLE orders(
id int NOT NULL,
created_at timestamp DEFAULT CURRENT_TIMESTAMP,
order_type_id int,
);
And 2 indexes:
CREATE INDEX ix_orders_created_at ON orders (created_at);
CREATE INDEX ix_orders_order_type_id_created_at ON orders (order_type_id, created_at desc);
Sometimes I want to get all orders sorted by created_at DESC and sometimes I need to get orders of specific type only, also sorted by created_at. For the second case the query is:
SELECT * FROM orders
WHERE order_type_id=(SELECT id FROM order_types WHERE name='order_type_name')
ORDER by created_at DESC
LIMIT 50;
I expect Postgres to use the second, composite index for such a query. But no, Postgres uses the simple index on created_at, "scans" all table records by date, and filters for the required type.
QUERY PLAN
----------------------------------------------------------------------------------------------
Index Scan using ix_orders_created_at on orders (cost=1.68..1663504.64 rows=5747582 width=1471)
Filter: (order_type_id = $0)
InitPlan 1 (returns $0)
-> Seq Scan on order_types (cost=0.00..1.11 rows=1 width=4)
Filter: (name = 'order_type_name'::text)
(5 rows)
For orders of frequent types this is acceptable, but for rare or not yet existing types of orders it leads to heavy, long-running queries scanning a lot of records. How do I force Postgres to use the composite index instead? "ANALYZE orders" doesn't help.

Multi column order by kills query performance even when the time range does not contain any records

I have a fairly small table of 26 million records.
CREATE TABLE t1
(
cam varchar(100) NOT NULL,
updatedat timestamp,
objid varchar(40) NOT NULL,
image varchar(100) NOT NULL,
reader varchar(60) NOT NULL,
imgcap timestamp NOT NULL
);
ALTER TABLE t1
ADD CONSTRAINT t1_pk
PRIMARY KEY (reader, cam, image, objid, imgcap);
I have a simple query to iterate the records between a time range.
SELECT * FROM t1
WHERE updatedat >= '2021-12-09 20:30:00' and updatedat <= '2021-12-09 20:32:01'
ORDER BY reader ASC , imgcap ASC, objid ASC, cam ASC, image ASC
LIMIT 10000
OFFSET 0;
I added an index to support the query with the comparison as the left most field and the remaining elements to support the sort.
CREATE INDEX t1_idtmp ON t1 USING btree (updatedat , reader , imgcap , objid, cam, image);
However, the query takes more than 10 seconds to complete. It takes the same time even if there are no rows in the range.
-> Incremental Sort (cost=8.28..3809579.24 rows=706729 width=223) (actual time=11034.114..11065.710 rows=10000 loops=1)
Sort Key: reader, imgcap, objid, cam, image
Presorted Key: reader, imgcap
Full-sort Groups: 62 Sort Method: quicksort Average Memory: 42kB Peak Memory: 42kB
Pre-sorted Groups: 62 Sort Methods: top-N heapsort, quicksort Average Memory: 58kB Peak Memory: 58kB
-> Index Scan using t1_idxevtim on t1 (cost=0.56..3784154.75 rows=706729 width=223) (actual time=11033.613..11036.823 rows=10129 loops=1)
Filter: ((updatedat >= '2021-12-09 20:30:00'::timestamp without time zone) AND (updatedat <= '2021-12-09 20:32:01'::timestamp without time zone))
Rows Removed by Filter: 25415461
Planning Time: 0.137 ms
Execution Time: 11066.791 ms
There are a couple more indexes on the table to support other use cases.
CREATE INDEX t1_idxua ON t1 USING btree (updatedat);
CREATE INDEX t1_idxevtim ON t1 USING btree (reader, imgcap);
I think PostgreSQL wants to avoid an expensive sort and thinks that the presorted key will be faster, but why does PostgreSQL not use the t1_idtmp index, as both search & sort can be satisfied with it?
why does Postgresql not use the t1_idtmp index as both search & sort can be satisfied with it?
Because the sort can't be satisfied by it. A btree index on (updatedat, reader, imgcap, objid, cam, image) can only produce data ordered by reader, imgcap, objid, cam, image within ties of updatedat. So if your condition were for a single specific value of updatedat, that would work. But since it is for a range of updatedat, it won't work, because the rows in the range are not all tied with each other.
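To see this concretely, here is a toy model in Python (not PostgreSQL itself): a sorted list of tuples stands in for a btree on (updatedat, reader), reduced to two columns and filled with made-up values. range_scan and sorted_by_reader are hypothetical helper names.

```python
# A sorted list of (updatedat, reader) tuples stands in for the btree:
# entries are ordered by updatedat first, and by reader only within ties.
index_entries = sorted([
    (1, "r2"), (1, "r5"),
    (2, "r1"), (2, "r9"),
    (3, "r3"),
])


def range_scan(low, high):
    """Return entries with low <= updatedat <= high, in index order."""
    return [entry for entry in index_entries if low <= entry[0] <= high]


def sorted_by_reader(entries):
    """True if the entries already come out ordered by reader."""
    readers = [reader for _, reader in entries]
    return readers == sorted(readers)


if __name__ == "__main__":
    # Single value of updatedat: all entries are tied, so reader order holds.
    print(sorted_by_reader(range_scan(2, 2)))
    # Range of updatedat: reader order restarts at each new updatedat value.
    print(sorted_by_reader(range_scan(1, 2)))
```

The first call prints True and the second prints False: as soon as the scan spans more than one updatedat value, the rows come back as r2, r5, r1, r9, so a separate sort on reader is still needed.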

UPDATE query for 180k rows in 10M row table unexpectedly slow

I have a table that is getting too big and I want to reduce its size
with an UPDATE query. Some of the data in this table is redundant, and
I should be able to reclaim a lot of space by setting the redundant
"cells" to NULL. However, my UPDATE queries are taking excessive
amounts of time to complete.
Table details
-- table1 10M rows (estimated)
-- 45 columns
-- Table size 2200 MB
-- Toast Table size 17 GB
-- Indexes Size 1500 MB
-- **columns in query**
-- id integer primary key
-- testid integer foreign key
-- band integer
-- date timestamptz indexed
-- data1 real[]
-- data2 real[]
-- data3 real[]
This was my first attempt at an update query. I broke it up into some
temporary tables just to get the ids to update. Further, to reduce the
query, I selected a date range for June 2020:
CREATE TEMP TABLE A as
SELECT testid
FROM table1
WHERE date BETWEEN '2020-06-01' AND '2020-07-01'
AND band = 3;
CREATE TEMP TABLE B as -- this table has 180k rows
SELECT id
FROM table1
WHERE date BETWEEN '2020-06-01' AND '2020-07-01'
AND testid in (SELECT testid FROM A)
AND band > 1;
UPDATE table1
SET data1 = Null, data2 = Null, data3 = Null
WHERE id in (SELECT id FROM B);
Queries for creating TEMP tables execute in under 1 sec. I ran the UPDATE query for an hour(!) before I finally killed it. Only 180k
rows needed to be updated. It doesn't seem like it should take that much
time to update that many rows. Temp table B identifies exactly which
rows to update.
Here is the EXPLAIN from the above UPDATE query. One of the odd features of this explain is that it shows 4.88M rows, but there are only 180k rows to update.
Update on table1 (cost=3212.43..4829.11 rows=4881014 width=309)
-> Nested Loop (cost=3212.43..4829.11 rows=4881014 width=309)
-> HashAggregate (cost=3212.00..3214.00 rows=200 width=10)
-> Seq Scan on b (cost=0.00..2730.20 rows=192720 width=10)
-> Index Scan using table1_pkey on table1 (cost=0.43..8.07 rows=1 width=303)
Index Cond: (id = b.id)
Another way to run this query is in one shot:
WITH t as (
SELECT id from table1
WHERE testid in (
SELECT testid
from table1
WHERE date BETWEEN '2020-06-01' AND '2020-07-01'
AND band = 3
)
)
UPDATE table1 a
SET data1 = Null, data2 = Null, data3 = Null
FROM t
WHERE a.id = t.id
I only ran this one for about 10 minutes before I killed it. It feels like I should be able to run this query in much less time if I just knew the tricks. This query's EXPLAIN is below. It shows 195k rows, which is closer to what I expected, but the cost is much higher (1.3M to 1.7M):
Update on testlog a (cost=1337986.60..1740312.98 rows=195364 width=331)
CTE t
-> Hash Join (cost=8834.60..435297.00 rows=195364 width=4)
Hash Cond: (testlog.testid = testlog_1.testid)
-> Seq Scan on testlog (cost=0.00..389801.27 rows=9762027 width=8)
-> Hash (cost=8832.62..8832.62 rows=158 width=4)
-> HashAggregate (cost=8831.04..8832.62 rows=158 width=4)
-> Index Scan using amptest_testlog_date_idx on testlog testlog_1 (cost=0.43..8820.18 rows=4346 width=4)
Index Cond: ((date >= '2020-06-01 00:00:00-07'::timestamp with time zone) AND (date <= '2020-07-01 00:00:00-07'::timestamp with time zone))
Filter: (band = 3)
-> Hash Join (cost=902689.61..1305015.99 rows=195364 width=331)
Hash Cond: (t.id = a.id)
-> CTE Scan on t (cost=0.00..3907.28 rows=195364 width=32)
-> Hash (cost=389801.27..389801.27 rows=9762027 width=303)
-> Seq Scan on testlog a (cost=0.00..389801.27 rows=9762027 width=303)
Edit: one of the suggestions in the accepted answer was to drop any indexes before the update and then add them back later. This is what I went with, with a twist: I needed another table to hold indexed data from the dropped indexes to make the A and B queries faster:
CREATE TABLE tempid AS
SELECT id, testid, band, date
FROM table1;
I made indexes on this table for id, testid, and date. Then I replaced table1 in the A and B queries with tempid. It still went slower than I would have liked, but it did get the job done.
You might have another table that has a foreign key to this table to one or more columns you are setting to NULL. And this foreign table does not have an index on the column.
Each time you set the row value to NULL the database has to check the foreign table - maybe it has a row that references the value you are removing.
If this is the case you should be able to speed it up by adding an index on this remote table.
For example if you have a table like this:
create table table2 (
id serial primary key,
band integer references table1(data1)
)
Then you can create an index:
create index table2_band_nnull_idx on table2(band) where band is not null;
But you suggested that all columns you are setting to NULL have array type. This means that it is unlikely that they are referenced. Still it is worth checking.
Another possibility is that you have a trigger on the table that works slowly.
Another possibility is that you have a lot of indexes on the table. Each index has to be updated for each row you update and it can use only a single processor core.
Sometimes it is faster to drop all indexes, do the bulk update, and then recreate them. Creating indexes can use multiple cores - one core per index.
Another possibility is that your query is waiting for some other query to finish and release its locks. You should check with:
select now()-query_start, * from pg_stat_activity where state<>'idle' order by 1;

How to speed up PostgreSQL aggregate select with sub queries and case statements

Background: I have a table containing financial transaction records. The table has several tens of millions of rows for tens of thousands of users. I need to fetch the sum of the transactions for showing balances and other aspects of the site.
My current query can get extremely slow and often times out. I have tried optimizing the query but can't seem to get it to run efficiently.
Environment: My application is running on Heroku using a Postgres Standard-2 plan (8GB ram, 400 max connections, 256GB allowed storage). My max connections at any given time is about 20 and my current DB size is 35GB. According to statistics, this query runs on average about 1,000ms and is used very frequently which has a big impact on site performance.
For the database, the index cache hit rate is 99% and the table cache hit rate is 97%. Autovacuum runs about every other day based on the current thresholds.
Here's my current transactions table setup:
CREATE TABLE transactions (
id bigint DEFAULT nextval('transactions_id_seq'::regclass) NOT NULL,
user_id integer NOT NULL,
date timestamp without time zone NOT NULL,
amount numeric(15,2) NOT NULL,
transaction_type integer DEFAULT 0 NOT NULL,
account_id integer DEFAULT 0,
reconciled integer DEFAULT 0,
parent integer DEFAULT 0,
ccparent integer DEFAULT 0,
created_at timestamp without time zone DEFAULT now() NOT NULL
);
CREATE INDEX transactions_user_id_key ON transactions USING btree (user_id);
CREATE INDEX transactions_user_date_idx ON transactions (user_id, date);
CREATE INDEX transactions_user_ccparent_idx ON transactions (user_id, ccparent) WHERE ccparent >0;
And here's my current query:
SELECT account_id,
sum(deposit) - sum(withdrawal) AS balance,
sum(r_deposit)-sum(r_withdrawal) AS r_balance,
sum(deposit) AS o_deposit,
sum(withdrawal) AS o_withdrawal,
sum(r_deposit) AS r_deposit,
sum(r_withdrawal) AS r_withdrawal
FROM
(SELECT t.account_id,
CASE
WHEN transaction_type > 0 THEN sum(amount)
ELSE 0
END AS deposit,
CASE
WHEN transaction_type = 0 THEN sum(amount)
ELSE 0
END AS withdrawal,
CASE
WHEN transaction_type > 0 AND reconciled=0 THEN sum(amount)
ELSE 0
END AS r_deposit,
CASE
WHEN transaction_type = 0 AND reconciled=0 THEN sum(amount)
ELSE 0
END AS r_withdrawal
FROM transactions AS t
WHERE user_id = $1 AND parent=0 AND ccparent=0
GROUP BY transaction_type, account_id, reconciled ) AS t0
GROUP BY account_id;
The query has several parts. I have to get the following for each account the user has:
1) the overall account balance
2) the balance for all reconciled transactions
3) separately, the sum of all deposits, withdrawals, reconciled deposits and reconciled withdrawals.
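As a reference for what the query is expected to return, the same sums can be written as one pass over the rows. This is only a Python sketch, not a replacement for the SQL. account_balances is a hypothetical helper, the rows are assumed to be already filtered to one user with parent = 0 and ccparent = 0, and the r_ columns follow the query as written (rows with reconciled = 0).

```python
from collections import defaultdict
from decimal import Decimal


def account_balances(rows):
    """Sum deposits and withdrawals per account.

    rows: iterable of (account_id, transaction_type, reconciled, amount),
    assumed pre-filtered like the query's WHERE clause.
    """
    sums = defaultdict(lambda: defaultdict(Decimal))
    for account_id, transaction_type, reconciled, amount in rows:
        account = sums[account_id]  # every account seen gets a result row
        if transaction_type > 0:
            kind = "deposit"
        elif transaction_type == 0:
            kind = "withdrawal"
        else:
            continue  # the CASE expressions contribute 0 for other types
        account["o_" + kind] += amount
        if reconciled == 0:
            account["r_" + kind] += amount

    result = {}
    for account_id, s in sums.items():
        result[account_id] = {
            "balance": s["o_deposit"] - s["o_withdrawal"],
            "r_balance": s["r_deposit"] - s["r_withdrawal"],
            "o_deposit": s["o_deposit"],
            "o_withdrawal": s["o_withdrawal"],
            "r_deposit": s["r_deposit"],
            "r_withdrawal": s["r_withdrawal"],
        }
    return result
```

A sketch like this can be useful for checking any rewritten version of the query against a small, hand-made set of rows before running it against the real table.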
Here's one query plan when I run explain analyze on the query:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=13179.85..13180.14 rows=36 width=132) (actual time=1326.200..1326.204 rows=6 loops=1)
Group Key: t.account_id
-> HashAggregate (cost=13179.29..13179.58 rows=36 width=18) (actual time=1326.163..1326.171 rows=16 loops=1)
Group Key: t.transaction_type, t.account_id, t.reconciled
-> Bitmap Heap Scan on transactions t (cost=73.96..13132.07 rows=13491 width=18) (actual time=17.410..1317.863 rows=12310 loops=1)
Recheck Cond: (user_id = 1)
Filter: ((parent = 0) AND (ccparent = 0))
Rows Removed by Filter: 2
Heap Blocks: exact=6291
-> Bitmap Index Scan on transactions_user_id_key (cost=0.00..73.29 rows=13601 width=0) (actual time=15.901..15.901 rows=12343 loops=1)
Index Cond: (user_id = 1)
Planning time: 0.895 ms
Execution time: 1326.424 ms
Does anyone have any suggestions on how to speed up this query? Like I said, it's the most run query in my application and is also one of the most demanding on the DB. If I could optimize this, it would have tremendous benefits to the app in general.
Check whether the planner picks up an index on transactions (user_id, parent, ccparent, transaction_type, account_id, reconciled).
CREATE INDEX transactions_u_p_ccp_tt_a_r_idx
ON transactions
(user_id,
parent,
ccparent,
transaction_type,
account_id,
reconciled);
Maybe you can even include amount in the index.
CREATE INDEX transactions_u_p_ccp_tt_a_r_a_idx
ON transactions
(user_id,
parent,
ccparent,
transaction_type,
account_id,
reconciled,
amount);

PostgreSQL: out of memory issues with array_agg() on a heroku db server

I'm stuck with a (Postgres 9.4.6) query that a) is using too much memory (most likely due to array_agg()) and also does not return exactly what I need, making post-processing necessary. Any input (especially regarding memory consumption) is highly appreciated.
Explanation:
The table token_groups holds all words used in tweets I've parsed with their respective occurrence frequency in a hstore, with one row per 10 minutes (for the last 7 days, so 7*24*6 rows in total). These rows are inserted in order of tweeted_at, so I can simply order by id. I'm using row_number to identify when a word occurred.
# \d token_groups
Table "public.token_groups"
Column | Type | Modifiers
------------+-----------------------------+-----------------------------------------------------------
id | integer | not null default nextval('token_groups_id_seq'::regclass)
tweeted_at | timestamp without time zone | not null
tokens | hstore | not null default ''::hstore
Indexes:
"token_groups_pkey" PRIMARY KEY, btree (id)
"index_token_groups_on_tweeted_at" btree (tweeted_at)
What I'd ideally want is a list of words with each the relative distances of their row numbers. So if e.g. the word 'hello' appears in row 5 once, in row 8 twice and in row 20 once, I'd want a column with the word, and an array column returning {5,3,0,12}. (meaning: first occurrence in fifth row, next occurrence 3 rows later, next occurrence 0 rows later, next 12 rows later). If anyone wonders why: 'relevant' words occur in clusters, so (simplified) the higher the standard deviation of timely distances, the more likely a word is a keyword. See more here: http://bioinfo2.ugr.es/Publicaciones/PRE09.pdf
For now, I return an array with positions and an array with frequencies, and use this info to calculate the distances in ruby.
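That post-processing step is small. The question does it in Ruby; the following is a Python sketch of the same idea, with relative_distances as a hypothetical helper name. It takes the positions and frequencies arrays of one word, as returned by the query.

```python
def relative_distances(positions, frequencies):
    """Turn row positions and per-row frequencies into relative distances.

    Each occurrence contributes the distance to the previous occurrence;
    repeated occurrences within the same row contribute 0.
    """
    steps = []
    last_position = 0
    for position, frequency in zip(positions, frequencies):
        steps.append(position - last_position)
        steps.extend([0] * (frequency - 1))
        last_position = position
    return steps


if __name__ == "__main__":
    # 'hello' appears once in row 5, twice in row 8, once in row 20.
    print(relative_distances([5, 8, 20], [1, 2, 1]))
```

For the 'hello' example above this prints [5, 3, 0, 12], matching the array described in the question.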
Currently the primary problem is a high memory spike, which seems to be caused by array_agg(). The (very helpful) Heroku staff tell me that some of my connections use 500-700MB with very little shared memory, causing out-of-memory errors (I'm running Standard-0, which gives me 1GB total for all connections), so I need to find an optimization.
The total number of hstore entries is ~100k, which then is aggregated (after skipping words with very low frequency):
SELECT COUNT(*)
FROM (SELECT row_number() over(ORDER BY id ASC) AS position,
(each(tokens)).key, (each(tokens)).value::integer
FROM token_groups) subquery;
count
--------
106632
Here is the query causing the memory load:
SELECT key, array_agg(pos) AS positions, array_agg(value) AS frequencies
FROM (
SELECT row_number() over(ORDER BY id ASC) AS pos,
(each(tokens)).key,
(each(tokens)).value::integer
FROM token_groups
) subquery
GROUP BY key
HAVING SUM(value) > 10;
The output is:
key | positions | frequencies
-------------+---------------------------------------------------------+-------------------------------
hello | {172,185,188,210,349,427,434,467,479} | {1,2,1,1,2,1,2,1,4}
world | {166,218,265,343,415,431,436,493} | {1,1,2,1,2,1,2,1}
some | {35,65,101,180,193,198,223,227,420,424,427,428,439,444} | {1,1,1,1,1,1,1,2,1,1,1,1,1,1}
other | {77,111,233,416,421,494} | {1,1,4,1,2,2}
word | {170,179,182,184,185,186,187,188,189,190,196} | {3,1,1,2,1,1,1,2,5,3,1}
(...)
Here's what explain says:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=12789.00..12792.50 rows=200 width=44) (actual time=309.692..343.064 rows=2341 loops=1)
Output: ((each(token_groups.tokens)).key), array_agg((row_number() OVER (?))), array_agg((((each(token_groups.tokens)).value)::integer))
Group Key: (each(token_groups.tokens)).key
Filter: (sum((((each(token_groups.tokens)).value)::integer)) > 10)
Rows Removed by Filter: 33986
Buffers: shared hit=2176
-> WindowAgg (cost=177.66..2709.00 rows=504000 width=384) (actual time=0.947..108.157 rows=106632 loops=1)
Output: row_number() OVER (?), (each(token_groups.tokens)).key, ((each(token_groups.tokens)).value)::integer, token_groups.id
Buffers: shared hit=2176
-> Sort (cost=177.66..178.92 rows=504 width=384) (actual time=0.910..1.119 rows=504 loops=1)
Output: token_groups.id, token_groups.tokens
Sort Key: token_groups.id
Sort Method: quicksort Memory: 305kB
Buffers: shared hit=150
-> Seq Scan on public.token_groups (cost=0.00..155.04 rows=504 width=384) (actual time=0.013..0.505 rows=504 loops=1)
Output: token_groups.id, token_groups.tokens
Buffers: shared hit=150
Planning time: 0.229 ms
Execution time: 570.534 ms
PS: if anyone wonders: every 10 minutes I append new data to the token_groups table and remove outdated data. That is easy when storing data as one row per 10 minutes, but I still have to come up with a better data structure that e.g. uses one row per word. That does not seem to be the main issue, though; I think it's the array aggregation.
Your presented query can be simpler, evaluating each() only once per row:
SELECT key, array_agg(pos) AS positions, array_agg(value) AS frequencies
FROM (
SELECT t.key, pos, t.value::int
FROM (SELECT row_number() OVER (ORDER BY id) AS pos, * FROM token_groups) tg
, each(tg.tokens) t -- implicit LATERAL join
ORDER BY t.key, pos
) sub
GROUP BY key
HAVING sum(value) > 10;
Also preserving correct order of elements.
What I'd ideally want is a list of words with each the relative distances of their row numbers.
This would do it:
SELECT key, array_agg(step) AS occurrences
FROM (
SELECT key, CASE WHEN g = 1 THEN pos - last_pos ELSE 0 END AS step
FROM (
SELECT key, value::int, pos
, lag(pos, 1, 0) OVER (PARTITION BY key ORDER BY pos) AS last_pos
FROM (SELECT row_number() OVER (ORDER BY id)::int AS pos, * FROM token_groups) tg
, each(tg.tokens) t
) t1
, generate_series(1, t1.value) g
ORDER BY key, pos, g
) sub
GROUP BY key
HAVING count(*) > 10;
Interpreting each hstore key as a word and the respective value as number of occurrences in the row (= for the last 10 minutes), I use two cascading LATERAL joins: 1st step to decompose the hstore value, 2nd step to multiply rows according to value. (If your value (frequency) is mostly just 1, you can simplify.) About LATERAL:
What is the difference between LATERAL and a subquery in PostgreSQL?
Then I ORDER BY key, pos, g in the subquery before aggregating in the outer SELECT. This clause seems to be redundant, and in fact, I see the same result without it in my tests. That's a collateral benefit from the window definition of lag() in the inner query, which is carried over to the next step unless any other step triggers re-ordering. However, now we depend on an implementation detail that's not guaranteed to work.
Ordering the whole query once should be substantially faster (and easier on the required sort memory) than per-aggregate sorting. This is not strictly according to the SQL standard either, but the simple case is documented for Postgres:
Alternatively, supplying the input values from a sorted subquery will usually work. For example:
SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
But this syntax is not allowed in the SQL standard, and is not portable to other database systems.
Strictly speaking, we only need:
ORDER BY pos, g
You could experiment with that. Related:
PostgreSQL unnest() with element number
Possible alternative:
SELECT key
, ('{' || string_agg(step || repeat(',0', value - 1), ',') || '}')::int[] AS occurrences
FROM (
SELECT key, pos, value::int
,(pos - lag(pos, 1, 0) OVER (PARTITION BY key ORDER BY pos))::text AS step
FROM (SELECT row_number() OVER (ORDER BY id)::int AS pos, * FROM token_groups) g
, each(g.tokens) t
ORDER BY key, pos
) t1
GROUP BY key;
-- HAVING sum(value) > 10;
Might be cheaper to use text concatenation instead of generate_series().