Add datetime constraint to a PostgreSQL multi-column partial index - postgresql

I've got a PostgreSQL table called queries_query, which has many columns.
Two of these columns, created and user_sid, are frequently used together in SQL queries by my application to determine how many queries a given user has done over the past 30 days. It is very, very rare that I query these stats for any time older than the most recent 30 days.
Here is my question:
I've currently created my multi-column index on these two columns by running:
CREATE INDEX CONCURRENTLY some_index_name ON queries_query (user_sid, created)
But I'd like to further restrict the index to only care about those queries in which the created date is within the past 30 days. I've tried doing the following:
CREATE INDEX CONCURRENTLY some_index_name ON queries_query (user_sid, created)
WHERE created >= NOW() - '30 days'::INTERVAL
But this throws an exception stating that my function must be immutable.
I'd love to get this working so that I can optimize my index, and cut back on the resources Postgres needs to do these repeated queries.

You get an exception using now() because the function is not IMMUTABLE (obviously) and, quoting the manual:
All functions and operators used in an index definition must be "immutable" ...
I see two ways to utilize a (much more efficient) partial index:
1. Partial index with condition using constant date:
CREATE INDEX queries_recent_idx ON queries_query (user_sid, created)
WHERE created > '2013-01-07 00:00'::timestamp;
Assuming created is actually defined as timestamp. It wouldn't work to provide a timestamp constant for a timestamptz column (timestamp with time zone). The cast from timestamp to timestamptz (or vice versa) depends on the current time zone setting and is not immutable. Use a constant of matching data type. Understand the basics of timestamps with / without time zone:
Ignoring time zones altogether in Rails and PostgreSQL
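If created were defined as timestamptz instead, a matching constant would look like this (a sketch; the explicit '+00' offset keeps the literal independent of the session time zone):
CREATE INDEX queries_recent_idx ON queries_query (user_sid, created)
WHERE created > '2013-01-07 00:00+00'::timestamptz;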
Drop and recreate that index at hours with low traffic, maybe with a cron job on a daily or weekly basis (or whatever is good enough for you). Creating an index is pretty fast, especially a partial index that is comparatively small. This solution also doesn't need to add anything to the table.
Assuming no concurrent access to the table, automatic index recreation could be done with a function like this:
CREATE OR REPLACE FUNCTION f_index_recreate()
  RETURNS void
  LANGUAGE plpgsql AS
$func$
BEGIN
   DROP INDEX IF EXISTS queries_recent_idx;
   EXECUTE format('
      CREATE INDEX queries_recent_idx
      ON     queries_query (user_sid, created)
      WHERE  created > %L::timestamp'
    , LOCALTIMESTAMP - interval '30 days');  -- timestamp constant
--  , now()          - interval '30 days');  -- alternative for timestamptz
END
$func$;
Call:
SELECT f_index_recreate();
now() (like you had) is the equivalent of CURRENT_TIMESTAMP and returns timestamptz. Cast to timestamp with now()::timestamp or use LOCALTIMESTAMP instead.
Select today's (since midnight) timestamps only
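To automate the call, a scheduling extension could be used. A sketch assuming the pg_cron extension is installed (a plain OS cron job running psql works just as well):
-- run the index recreation every night at 03:00
SELECT cron.schedule('recreate-queries-index', '0 3 * * *',
                     $$SELECT f_index_recreate()$$);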
If you have to deal with concurrent access to the table, use DROP INDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY. But you can't wrap these commands into a function because, per documentation:
... a regular CREATE INDEX command can be performed within a
transaction block, but CREATE INDEX CONCURRENTLY cannot.
So, with two separate transactions:
CREATE INDEX CONCURRENTLY queries_recent_idx2 ON queries_query (user_sid, created)
WHERE created > '2013-01-07 00:00'::timestamp; -- your new condition
Then:
DROP INDEX CONCURRENTLY IF EXISTS queries_recent_idx;
Optionally, rename to old name:
ALTER INDEX queries_recent_idx2 RENAME TO queries_recent_idx;
2. Partial index with condition on "archived" tag
Add an archived tag to your table:
ALTER TABLE queries_query ADD COLUMN archived boolean NOT NULL DEFAULT FALSE;
UPDATE the column at intervals of your choosing to "retire" older rows and create an index like:
CREATE INDEX some_index_name ON queries_query (user_sid, created)
WHERE NOT archived;
Add a matching condition to your queries (even if it seems redundant) to allow them to use the index. Check with EXPLAIN ANALYZE whether the query planner catches on - it should be able to use the index for queries on a newer date. But it won't understand more complex conditions that don't match exactly.
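For example, a query of this shape can use the index (the user_sid value is illustrative):
SELECT count(*)
FROM   queries_query
WHERE  user_sid = 123                            -- illustrative value
AND    created >= now() - interval '30 days'
AND    NOT archived;  -- looks redundant, but needed to match the index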
You don't have to drop and recreate the index, but the UPDATE on the table may be more expensive than index recreation and the table gets slightly bigger.
I would go with the first option (index recreation). In fact, I am using this solution in several databases. The second incurs more costly updates.
With both solutions, performance slowly deteriorates over time as more outdated rows accumulate in the index, until the index is recreated or the old rows are retired.


How does now() get evaluated when passed into a function as a parameter

I have a table that is range partitioned on a timestamp with time zone field. I was pretty surprised to find that the following WHERE condition caused the planner to query every single 'child' table in the partition:
WHERE reading_time > (now() - '72:00:00'::interval)
As I learned, the planner doesn't know what now() will be at execution time, so it generates a plan that queries every child table. That's understandable, but it defeats the purpose of setting up partitions in the first place! If I issue reading_time > '2018-03-31', it'll only do an index scan on the tables that have data meeting those conditions.
What happens if I create the following function?
CREATE OR REPLACE FUNCTION public.last_72hours(in_time timestamp with time zone)
RETURNS void LANGUAGE plpgsql AS $func$
BEGIN
PERFORM * FROM precip WHERE reading_time > (in_time - '72:00:00'::interval);
-- the function will then do work on the returned rows
END;
$func$;
Then I can call the function with
SELECT last_72hours(now())
When does now() get evaluated? Or, in other words, does the literal time value (e.g., 2018-03-31 1:01:01+5) get passed into the function? If it's the literal value, then Postgres only queries the appropriate child tables, right? But if it's evaluating now() inside the function, then I'm back to the plan that scans the index of every child table. It seems like there's no easy way to see what the planner is doing in the function. Is that correct?
There are several questions here; I'll do my best to answer them all.
PostgreSQL cannot evaluate now() at planning time because there is no way to know when the statement will be executed. Plans can be kept around for an unlimited time.
If you call a function with now() as argument, it will be evaluated at the time of the function call.
If you use a parameter in an SQL statement involving a partitioned table inside a function (so the plan is cached), two things can happen:
PostgreSQL decides to switch to a generic plan after the fifth execution of the query. Then no partition pruning can take place.
PostgreSQL decides to stick with custom plans, in which case partition pruning takes place.
One would assume that the second option will usually be chosen, but to find out you can use auto_explain to see the plans actually used.
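For example, auto_explain can be enabled for a single session like this (standard auto_explain parameters; the plans go to the server log):
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;        -- log the plan of every statement
SET auto_explain.log_nested_statements = on;  -- include statements run inside functions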
It might be a good idea to use dynamic SQL so that the query is always replanned with the current parameter values, and partition pruning is certain to be used.
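A minimal PL/pgSQL sketch of that, reusing the precip table from the question (the function name is made up):
CREATE OR REPLACE FUNCTION last_72hours_dyn()
  RETURNS SETOF precip
  LANGUAGE plpgsql AS
$func$
BEGIN
   -- EXECUTE plans the statement afresh on every call, with the actual
   -- parameter value known, so partition pruning can take place
   RETURN QUERY EXECUTE
      'SELECT * FROM precip WHERE reading_time > $1'
   USING now() - interval '72 hours';
END
$func$;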

Postgres 10: "spoofing" now() in an immutable function. A safe idea?

My app reports rainfall and streamflow information to whitewater boaters. Postgres is my data store for the gauge readings that come in at 15-minute intervals. Over time, these tables get pretty big, and the availability of range partitioning in Postgres 10 inspired me to leave my shared hosting service and build a server from scratch at Linode. My queries on these large tables became way faster after I partitioned the readings into 2-week chunks. Several months down the road, I checked out the query plan and was very surprised to see that using now() in a query caused PG to scan all of the indexes on my partitioned tables. What the heck?!?! Isn't the point of partitioning data to avoid situations like this?
Here's my setup: my partitioned table
CREATE TABLE public.precip
(
gauge_id smallint,
inches numeric(8, 2),
reading_time timestamp with time zone
) PARTITION BY RANGE (reading_time);
I've created partitions for every two weeks, so I have about 50 partition tables so far. One of my partitions:
CREATE TABLE public.precip_y2017w48 PARTITION OF public.precip
FOR VALUES FROM ('2017-12-03 00:00:00-05') TO ('2017-12-17 00:00:00-05');
Then each partition is indexed on gauge_id and reading_time
I have a lot of queries like
WHERE gauge_id = xxx
AND precip.reading_time > (now() - '01:00:00'::interval)
AND precip.reading_time < now()
As I mentioned, postgres scans all of the indexes on reading_time for every 'child' table rather than only querying the child table that has timestamps in the query range. If I enter literal values (e.g., precip.reading_time > '2018-03-01 01:23:00') instead of now(), it only scans the indexes of the appropriate child table(s). I've done some reading and I understand that now() is volatile and that the planner won't know what the value will be when the query executes. I've also read that query planning is expensive, so postgres caches plans. I can understand why PG is programmed to do that. However, one counterargument I read was that a re-planned query is probably way less expensive than a query that ends up ignoring partitions. I agree - and that's probably the case in my situation.
As a workaround, I've created this function:
CREATE OR REPLACE FUNCTION public.hours_ago2(i integer)
RETURNS timestamp with time zone
LANGUAGE plpgsql
COST 100
IMMUTABLE
AS $BODY$
DECLARE X timestamp with time zone;
BEGIN
X := now() + cast(i || ' hours' as interval);
RETURN X;
END;
$BODY$;
Note the IMMUTABLE keyword. Now when I issue queries like
select * from stream
where gauge_id = 2142 and reading_time > hours_ago2(-3)
and reading_time < hours_ago2(0)
PG only searches the partition table that stores data for that time frame. This is the goal I was shooting for when I set up the partitions in the first place. Booyah. But is this safe? Will the query planner ever cache the results of hours_ago2(-3) and use it over and over again for hours down the road? It's ok if it's cached for a few minutes. Again, my app reports rain and streamflow information; it doesn't deal with financial transactions or any other 'critical' types of data processing. I've tested simple statements like select hours_ago2(-3) and it returns new values every time. So it seems safe. But is it really?
That is not safe because at planning time you have no idea if the statement will be executed in the same transaction or not.
If you are in a situation where query plans are cached, this will return wrong results. Query plans are cached for named prepared statements and statements in PL/pgSQL functions, so you could end up with an out-of-date value for the duration of the database session.
For example:
CREATE TABLE times(id integer PRIMARY KEY, d timestamptz NOT NULL);
PREPARE x AS SELECT * FROM times WHERE d > hours_ago2(1);
The function is evaluated at planning time, and the result is a constant in the execution plan (for immutable functions that is fine).
EXPLAIN (COSTS off) EXECUTE x;
QUERY PLAN
---------------------------------------------------------------------------
Seq Scan on times
Filter: (d > '2018-03-12 14:25:17.380057+01'::timestamp with time zone)
(2 rows)
SELECT pg_sleep(100);
EXPLAIN (COSTS off) EXECUTE x;
QUERY PLAN
---------------------------------------------------------------------------
Seq Scan on times
Filter: (d > '2018-03-12 14:25:17.380057+01'::timestamp with time zone)
(2 rows)
The second query definitely does not return the result you want.
I think you should evaluate now() (or better an equivalent function on the client side) first, perform your date arithmetic and supply the result as parameter to the query. Inside of PL/pgSQL functions, use dynamic SQL.
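As a sketch of the parameter approach (the statement name is made up; the generic-plan caveat for prepared statements discussed above still applies):
PREPARE recent_readings(timestamptz) AS
SELECT * FROM precip WHERE reading_time > $1;

-- the cutoff is computed once at EXECUTE time and bound as a concrete value
EXECUTE recent_readings(now() - interval '72 hours');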
Change the queries to use 'now'::timestamptz instead of now(). Also, interval math on timestamptz is not immutable.
Change your query to something like:
WHERE gauge_id = xxx
AND precip.reading_time > ((('now'::timestamptz AT TIME ZONE 'UTC') - '01:00:00'::interval) AT TIME ZONE 'UTC')
AND precip.reading_time < 'now'::timestamptz

Proper table to track employee changes over time?

I have been using Python to do this in memory, but I would like to know the proper way to set up an employee mapping table in Postgres.
row_id | employee_id | other_id | other_dimensions | effective_date | expiration_date | is_current
Unique constraint on (employee_id, other_id), so a new row would be inserted whenever there is a change
I would want the expiration date from the previous row to be updated to the new effective_date minus 1 day, and the is_current should be updated to False
Ultimate purpose is to be able to map each employee back accurately on a given date
Would love to hear some best practices so I can move away from my file-based method where I read the whole roster into memory and use pandas to make changes, then truncate the original table and insert the new one.
Here's a general example built using the column names you provided that I think does more or less what you want. Don't treat it as a literal ready-to-run solution, but rather an example of how to make something like this work that you'll have to modify a bit for your own actual use case.
The rough idea is to make an underlying raw table that holds all your data, and establish a view on top of this that gets used for ordinary access. You can still use the raw table to do anything you need to do to or with the data, no matter how complicated, but the view provides more restrictive access for regular use. Rules are put in place on the view to enforce these restrictions and perform the special operations you want. While it doesn't sound like it's significant for your current application, it's important to note that these restrictions can be enforced via PostgreSQL's roles and privileges and the SQL GRANT command.
We start by making the raw table. Since the is_current column is likely to be used for reference a lot, we'll put an index on it. We'll take advantage of PostgreSQL's SERIAL type to manage our raw table's row_id for us. The view doesn't even need to reference the underlying row_id. We'll default the is_current to a True value as we expect most of the time we'll be adding current records, not past ones.
CREATE TABLE raw_employee (
row_id SERIAL PRIMARY KEY,
employee_id INTEGER,
other_id INTEGER,
other_dimensions VARCHAR,
effective_date DATE,
expiration_date DATE,
is_current BOOLEAN DEFAULT TRUE
);
CREATE INDEX employee_is_current_index ON raw_employee (is_current);
Now we define our view. To most of the world this will be the normal way to access employee data. Internally it's a special SELECT run on-demand against the underlying raw_employee table that we've already defined. If we had reason to, we could further refine this view to hide more data (it's already hiding the low-level row_id as mentioned earlier) or display additional data produced either via calculation or relations with other tables.
CREATE OR REPLACE VIEW employee AS
SELECT employee_id, other_id,
other_dimensions, effective_date, expiration_date,
is_current
FROM raw_employee;
Now our rules. We construct these so that whenever someone tries an operation against our view, internally it'll perform an operation against our raw table according to the restrictions we define. First, INSERT; it mostly just passes the data through without change, but it has to account for the hidden row_id:
CREATE OR REPLACE RULE employee_insert AS ON INSERT TO employee DO INSTEAD
INSERT INTO raw_employee VALUES (
NEXTVAL('raw_employee_row_id_seq'),
NEW.employee_id, NEW.other_id,
NEW.other_dimensions,
NEW.effective_date, NEW.expiration_date,
NEW.is_current
);
The NEXTVAL part enables us to lean on PostgreSQL for row_id handling. Next is our most complicated one: UPDATE. Per your described intent, it has to match against employee_id, other_id pairs and perform two operations: updating the old record to be no longer current, and inserting a new record with updated dates. You didn't specify how you wanted to manage new expiration dates, so I took a guess. It's easy to change it.
CREATE OR REPLACE RULE employee_update AS ON UPDATE TO employee DO INSTEAD (
UPDATE raw_employee SET is_current = FALSE
WHERE raw_employee.employee_id = OLD.employee_id AND
raw_employee.other_id = OLD.other_id;
INSERT INTO raw_employee VALUES (
NEXTVAL('raw_employee_row_id_seq'),
COALESCE(NEW.employee_id, OLD.employee_id),
COALESCE(NEW.other_id, OLD.other_id),
COALESCE(NEW.other_dimensions, OLD.other_dimensions),
COALESCE(NEW.effective_date, OLD.expiration_date - '1 day'::INTERVAL),
COALESCE(NEW.expiration_date, OLD.expiration_date + '1 year'::INTERVAL),
TRUE
);
);
The use of COALESCE enables us to update columns that have explicit updates, but keep old values for ones that don't. Finally, we need to make a rule for DELETE. Since you said you want to ensure you can track employee histories, the best way to do this is also the simplest: we just disable it.
CREATE OR REPLACE RULE employee_delete_protect AS
ON DELETE TO employee DO INSTEAD NOTHING;
Now we ought to be able to insert data into our raw table by performing INSERT operations on our view. Here are two sample employees; the first has a few weeks left but the second is about to expire. Note that at this level we don't need to care about the row_id. It's an internal implementation detail of the lower level raw table.
INSERT INTO employee VALUES (
1, 1,
'test', CURRENT_DATE - INTERVAL '1 week', CURRENT_DATE + INTERVAL '3 weeks',
TRUE
);
INSERT INTO employee VALUES (
2, 2,
'another test', CURRENT_DATE - INTERVAL '1 month', CURRENT_DATE,
TRUE
);
The final example is deceptively simple after all the build-up that we've done. It performs an UPDATE operation on the view, and internally it results in an update to the existing record for employee #2 plus a new entry for employee #2.
UPDATE employee SET expiration_date = CURRENT_DATE + INTERVAL '1 year'
WHERE employee_id = 2 AND other_id = 2;
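To check the result, query the view (a sketch; exact output depends on your data):
-- expect two rows for employee #2: the retired one (is_current = FALSE)
-- and the newly inserted current one
SELECT * FROM employee
WHERE  employee_id = 2
ORDER  BY is_current;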
Again I'll stress that this isn't meant to just take and use without modification. There should be enough info here though for you to make something work for your specific case.

Optimize Postgres query on timestamp range

I have the following table and indices defined:
CREATE TABLE ticket (
wid bigint NOT NULL DEFAULT nextval('tickets_id_seq'::regclass),
eid bigint,
created timestamp with time zone NOT NULL DEFAULT now(),
status integer NOT NULL DEFAULT 0,
argsxml text,
moduleid character varying(255),
source_id bigint,
file_type_id bigint,
file_name character varying(255),
status_reason character varying(255),
...
)
I created an index on the created timestamp as follows:
CREATE INDEX ticket_1_idx
ON ticket
USING btree
(created );
Here's my query:
select * from ticket
where created between '2012-12-19 00:00:00' and '2012-12-20 00:00:00'
This was working fine until the number of records started to grow (about 5 million) and now it's taking forever to return.
Explain analyze reveals this:
Index Scan using ticket_1_idx on ticket (cost=0.00..10202.64 rows=52543 width=1297) (actual time=0.109..125.704 rows=53340 loops=1)
Index Cond: ((created >= '2012-12-19 00:00:00+00'::timestamp with time zone) AND (created <= '2012-12-20 00:00:00+00'::timestamp with time zone))
Total runtime: 175.853 ms
So far I've tried setting:
random_page_cost = 1.75
effective_cache_size = 3
Also created:
create CLUSTER ticket USING ticket_1_idx;
Nothing works. What am I doing wrong? Why is it selecting sequential scan? The indexes are supposed to make the query fast. Anything that can be done to optimize it?
CLUSTER
If you intend to use CLUSTER, the displayed syntax is invalid.
create CLUSTER ticket USING ticket_1_idx;
Run once:
CLUSTER ticket USING ticket_1_idx;
This can help a lot with bigger result sets. Less for a single or few rows returned.
If your table isn't read-only the effect deteriorates over time. Re-run CLUSTER at reasonable intervals. Postgres remembers the index for subsequent calls, so this works, too:
CLUSTER ticket;
(But I would rather be explicit and use the first form.)
However, if you have lots of updates, CLUSTER (or VACUUM FULL) may actually be bad for performance. The right amount of bloat allows UPDATE to place new row versions on the same data page and avoids the need for extending the underlying physical file (expensively) too often. You can use a carefully tuned FILLFACTOR to get the best of both worlds:
Fillfactor for a sequential index that is PK
pg_repack / pg_squeeze
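A sketch of the fillfactor tuning mentioned above (90 is an illustrative value, not a recommendation):
ALTER TABLE ticket SET (fillfactor = 90);  -- reserve ~10 % per page for HOT updates
-- only newly written pages honor the setting; rewrite the table to apply it everywhere:
CLUSTER ticket USING ticket_1_idx;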
CLUSTER takes an exclusive lock on the table, which may be a problem in a multi-user environment. Quoting the manual:
When a table is being clustered, an ACCESS EXCLUSIVE lock is acquired
on it. This prevents any other database operations (both reads and
writes) from operating on the table until the CLUSTER is finished.
Bold emphasis mine. Consider the alternatives!
pg_repack:
Unlike CLUSTER and VACUUM FULL it works online, without holding an
exclusive lock on the processed tables during processing. pg_repack is
efficient to boot, with performance comparable to using CLUSTER directly.
and:
pg_repack needs to take an exclusive lock at the end of the reorganization.
The current version 1.4.7 works with PostgreSQL 9.4 - 14.
pg_squeeze is a newer alternative that claims:
In fact we try to replace pg_repack extension.
The current version 1.4 works with Postgres 10 - 14.
Query
The query is simple enough not to cause any performance problems per se.
However: The BETWEEN construct includes boundaries. Your query selects all of Dec. 19, plus records from Dec. 20, 00:00. That's an extremely unlikely requirement. Chances are, you really want:
SELECT *
FROM ticket
WHERE created >= '2012-12-19 00:00'
AND created < '2012-12-20 00:00';
Performance
Why is it selecting sequential scan?
Your EXPLAIN output clearly shows an Index Scan, not a sequential table scan. There must be some kind of misunderstanding.
You may be able to improve performance, but the necessary background information is not in the question. Possible options include:
Only query required columns instead of * to reduce transfer cost (and other performance benefits).
Look at partitioning and put practical time slices into separate tables. Add indexes to partitions as needed.
If partitioning is not an option, another related but less intrusive technique would be to add one or more partial indexes.
For example, if you mostly query the current month, you could create the following partial index:
CREATE INDEX ticket_created_idx ON ticket(created)
WHERE created >= '2012-12-01 00:00:00+00'::timestamptz;  -- created is timestamptz here
CREATE a new index right before the start of a new month. You can easily automate the task with a cron job.
Optionally DROP partial indexes for old months later.
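A sketch of that rotation (index names and dates are illustrative; as above, use a timestamptz constant to match the column type):
-- shortly before January starts:
CREATE INDEX CONCURRENTLY ticket_created_2013_01_idx ON ticket (created)
WHERE created >= '2013-01-01 00:00:00+00'::timestamptz;

-- later, once queries against December have tapered off:
DROP INDEX CONCURRENTLY IF EXISTS ticket_created_idx;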
Keep the total index in addition for CLUSTER (which cannot operate on partial indexes). If old records never change, table partitioning would help this task a lot, since you only need to re-cluster newer partitions.
Then again if records never change at all, you probably don't need CLUSTER.
Performance Basics
You may be missing one of the basics. All the usual performance advice applies:
https://wiki.postgresql.org/wiki/Slow_Query_Questions
https://wiki.postgresql.org/wiki/Performance_Optimization

Optimization of count query for PostgreSQL

I have a table in postgresql that contains an array which is updated constantly.
In my application i need to get the number of rows for which a specific parameter is not present in that array column. My query looks like this:
select count(id)
from table
where not (ARRAY['parameter value'] <@ table.array_column)
But when increasing the number of rows and the number of executions of that query (several times per second, possibly hundreds or thousands), the performance decreases a lot. It seems to me that the counting in postgresql might have a linear order of execution (I'm not completely sure of this).
Basically my question is:
Is there an existing pattern I’m not aware of that applies to this situation? what would be the best approach for this?
Any suggestion you could give me would be really appreciated.
PostgreSQL actually supports GIN indexes on array columns. Unfortunately, it doesn't seem to be usable for NOT ARRAY[...] <@ indexed_col, and GIN indexes are unsuitable for frequently-updated tables anyway.
Demo:
CREATE TABLE arrtable (id integer primary key, array_column integer[]);
INSERT INTO arrtable(id, array_column) VALUES (1, ARRAY[1,2,3,4]);
CREATE INDEX arrtable_arraycolumn_gin_arr_idx
ON arrtable USING GIN(array_column);
-- Use the following *only* for testing whether Pg can use an index
-- Do not use it in production.
SET enable_seqscan = off;
explain (buffers, analyze) select count(id)
from arrtable
where not (ARRAY[1] <@ arrtable.array_column);
Unfortunately, this shows that as written we can't use the index. If you don't negate the condition it can be used, so you can search for and count rows that do contain the search element (by removing NOT).
You could use the index to count entries that do contain the target value, then subtract that result from a count of all entries. Since counting all rows in a table is quite slow in PostgreSQL (9.1 and older) and requires a sequential scan this will actually be slower than your current query. It's possible that on 9.2 an index-only scan can be used to count the rows if you have a b-tree index on id, in which case this might actually be OK:
SELECT (
SELECT count(id) FROM arrtable
) - (
SELECT count(id) FROM arrtable
WHERE (ARRAY[1] <@ arrtable.array_column)
);
It's guaranteed to perform worse than your original version for Pg 9.1 and below, because in addition to the seqscan your original requires, it also needs a GIN index scan. I've now tested this on 9.2 and it does appear to use an index for the count, so it's worth exploring for 9.2. With some less trivial dummy data:
drop index arrtable_arraycolumn_gin_arr_idx ;
truncate table arrtable;
insert into arrtable (id, array_column)
select s, ARRAY[1,2,s,s*2,s*3,s/2,s/4] FROM generate_series(1,1000000) s;
CREATE INDEX arrtable_arraycolumn_gin_arr_idx
ON arrtable USING GIN(array_column);
Note that a GIN index like this will slow updates down a LOT, and is quite slow to create in the first place. It is not suitable for tables that get updated much at all - like your table.
Worse, the query using this index takes up to twice as long as your original query, and at best half as long, on the same data set. It's worst for cases where the index is not very selective, like ARRAY[1] - 4s vs 2s for the original query. Where the index is highly selective (i.e. not many matches, like ARRAY[199]) it runs in about 1.2 seconds vs the original's 3s. This index simply isn't worth having for this query.
The lesson here? Sometimes, the right answer is just to do a sequential scan.
Since that won't do for your hit rates, either maintain a materialized view with a trigger as @debenhur suggests, or try to invert the array to be a list of parameters that the entry does not have, so you can use a GiST index as @maniek suggests.
Is there an existing pattern I’m not aware of that applies to this
situation? what would be the best approach for this?
Your best bet in this situation might be to normalize your schema. Split the array out into a table. Add a b-tree index on the table of properties, or order the primary key so it's efficiently searchable by property_id.
CREATE TABLE demo( id integer primary key );
INSERT INTO demo (id) SELECT id FROM arrtable;
CREATE TABLE properties (
demo_id integer not null references demo(id),
property integer not null,
primary key (demo_id, property)
);
CREATE INDEX properties_property_idx ON properties(property);
You can then query the properties:
SELECT count(id)
FROM demo
WHERE NOT EXISTS (
SELECT 1 FROM properties WHERE demo.id = properties.demo_id AND property = 1
)
I expected this to be a lot faster than the original query, but it's actually much the same with the same sample data; it runs in the same 2s to 3s range as your original query. It's the same issue where searching for what is not there is much slower than searching for what is there; if we're looking for rows containing a property we can avoid the seqscan of demo and just scan properties for matching IDs directly.
Again, a seq scan on the array-containing table does the job just as well.
I think with your current data model you are out of luck. Try to think of an algorithm that the database has to execute for your query. There is no way it could work without sequentially scanning the data.
Can you arrange the column so that it stores the inverse of the data (so that the query would be select count(id) from table where ARRAY['parameter value'] <@ table.array_column)? This query would use a GIN/GiST index.
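A sketch of that inverted model (table and column names are made up):
-- missing_params lists the parameters a row does NOT have
CREATE TABLE arrtable_inv (
   id             integer PRIMARY KEY,
   missing_params text[] NOT NULL
);
CREATE INDEX arrtable_inv_gin_idx ON arrtable_inv USING GIN (missing_params);

-- the former NOT-contains query becomes an indexable containment query:
SELECT count(id)
FROM   arrtable_inv
WHERE  ARRAY['parameter value'] <@ missing_params;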