LATERAL JOIN not using trigram index - postgresql

I want to do some basic geocoding of addresses using Postgres. I have an address table that has around 1 million raw address strings:
=> \d addresses
      Table "public.addresses"
 Column  | Type | Modifiers
---------+------+-----------
 address | text |
I also have a table of location data:
=> \d locations
       Table "public.locations"
   Column   | Type | Modifiers
------------+------+-----------
 id         | text |
 country    | text |
 postalcode | text |
 latitude   | text |
 longitude  | text |
Most of the address strings contain postal codes, so my first attempt was an ILIKE match inside a lateral join:
EXPLAIN SELECT * FROM addresses a
JOIN LATERAL (
   SELECT * FROM locations
   WHERE address ILIKE '%' || postalcode || '%'
   ORDER BY LENGTH(postalcode) DESC
   LIMIT 1
) AS l ON true;
That gave the expected result, but it was slow. Here's the query plan:
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Nested Loop  (cost=18383.07..18540688323.77 rows=1008572 width=91)
   ->  Seq Scan on addresses a  (cost=0.00..20997.72 rows=1008572 width=56)
   ->  Limit  (cost=18383.07..18383.07 rows=1 width=35)
         ->  Sort  (cost=18383.07..18391.93 rows=3547 width=35)
               Sort Key: (length(locations.postalcode))
               ->  Seq Scan on locations  (cost=0.00..18365.33 rows=3547 width=35)
                     Filter: (a.address ~~* (('%'::text || postalcode) || '%'::text))
I tried adding a trigram index on the address column, as mentioned at https://stackoverflow.com/a/13452528/36191, but the query plan for the above query doesn't make use of it and is unchanged.
CREATE INDEX idx_address ON addresses USING gin (address gin_trgm_ops);
I have to remove the ORDER BY and LIMIT in the lateral join for the index to get used, which doesn't give me the results I want. Here's the query plan for the query without ORDER BY or LIMIT:
                                           QUERY PLAN
-----------------------------------------------------------------------------------------------
 Nested Loop  (cost=39.35..129156073.06 rows=3577682241 width=86)
   ->  Seq Scan on locations  (cost=0.00..12498.55 rows=709455 width=28)
   ->  Bitmap Heap Scan on addresses a  (cost=39.35..131.60 rows=5043 width=58)
         Recheck Cond: (address ~~* (('%'::text || locations.postalcode) || '%'::text))
         ->  Bitmap Index Scan on idx_address  (cost=0.00..38.09 rows=5043 width=0)
               Index Cond: (address ~~* (('%'::text || locations.postalcode) || '%'::text))
Is there something I can do to get the query to use the index, or is there a better way to rewrite this query?

Why?
The query cannot use the index as a matter of principle. You would need an index on the table locations, but the one you have is on the table addresses.
You can verify my claim by setting:
SET enable_seqscan = off;
(In your session only, and for debugging only. Never use it in production.) It's not that the index would be more expensive than a sequential scan; there is simply no way for Postgres to use it for your query at all.
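For illustration, the debugging round trip could look like this (same query as above; note that enable_seqscan = off only penalizes sequential scans, it does not forbid them):
SET enable_seqscan = off;  -- session-local, debugging only

EXPLAIN SELECT * FROM addresses a
JOIN LATERAL (
   SELECT * FROM locations
   WHERE a.address ILIKE '%' || postalcode || '%'
   ORDER BY LENGTH(postalcode) DESC
   LIMIT 1
) AS l ON true;  -- the plan still shows a Seq Scan on locations

RESET enable_seqscan;  -- restore the default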
Aside: [INNER] JOIN ... ON true is just an awkward way of saying CROSS JOIN ...
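Spelled out for the query at hand:
SELECT *
FROM addresses a
CROSS JOIN LATERAL (
   SELECT * FROM locations
   WHERE a.address ILIKE '%' || postalcode || '%'
   ORDER BY LENGTH(postalcode) DESC
   LIMIT 1
) l;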
Why is the index used after removing ORDER and LIMIT?
Because Postgres can rewrite this simple form to:
SELECT *
FROM addresses a
JOIN locations l ON a.address ILIKE '%' || l.postalcode || '%';
You'll see the exact same query plan. (At least I do in my tests on Postgres 9.5.)
Solution
You need an index on locations.postalcode. And when using LIKE or ILIKE, you would also need to bring the indexed expression (postalcode) to the left side of the operator. ILIKE is implemented with the operator ~~*, and this operator has no COMMUTATOR (a logical necessity), so it's not possible to flip the operands. Detailed explanation in these related answers:
Can PostgreSQL index array columns?
PostgreSQL - text Array contains value similar to
Is there a way to usefully index a text column containing regex patterns?
A solution is to use the trigram similarity operator % or its inverse, the distance operator <->, in a nearest-neighbour query instead (each one is its own commutator, so the operands can switch places freely):
SELECT *
FROM addresses a
JOIN LATERAL (
   SELECT *
   FROM locations
   ORDER BY postalcode <-> a.address
   LIMIT 1
) l ON address ILIKE '%' || postalcode || '%';
Find the most similar postalcode for each address, and then check if that postalcode actually matches fully.
This way, a longer postalcode will be preferred automatically since it's more similar (smaller distance) than a shorter postalcode that also matches.
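For illustration, with a made-up address string (the actual numbers depend on the trigram sets involved; only the relative order matters):
SELECT '12345' <-> 'Main Street 5, 12345 Springfield' AS dist_long_code
     , '123'   <-> 'Main Street 5, 12345 Springfield' AS dist_short_code;
-- dist_long_code comes out smaller than dist_short_code,
-- because the longer code shares more trigrams with the address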
A bit of uncertainty remains. Depending on possible postal codes, there could be false positives due to matching trigrams in other parts of the string. There is not enough information in the question to say more.
Here, [INNER] JOIN instead of CROSS JOIN makes sense, since we add an actual join condition.
The manual:
This can be implemented quite efficiently by GiST indexes, but not by GIN indexes.
So:
CREATE INDEX locations_postalcode_trgm_gist_idx ON locations
USING gist (postalcode gist_trgm_ops);
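Note that the pg_trgm extension has to be installed before the operator class (and the <-> operator) is available. Afterwards, a quick EXPLAIN on the inner query, with a sample literal in place of a.address, should confirm that the index is used:
CREATE EXTENSION IF NOT EXISTS pg_trgm;  -- once per database, before creating the index

EXPLAIN SELECT * FROM locations
ORDER BY postalcode <-> '12345'  -- sample literal in place of a.address
LIMIT 1;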

It's a long shot, but how does the following alternative perform?
SELECT DISTINCT ON ((x.a).address) (x.a).*, l.*
FROM (
   SELECT a, l.id AS lid, LENGTH(l.postalcode) AS pclen
   FROM addresses a
   LEFT JOIN locations l ON a.address ILIKE '%' || l.postalcode || '%' -- this should be fast, but produces many rows
) x
LEFT JOIN locations l ON l.id = x.lid
ORDER BY (x.a).address, pclen DESC -- this is where it will be slow: the entire result has to be sorted so DISTINCT ON can filter it

It can work if you turn the lateral join inside out, but even then it might still be really slow:
SELECT DISTINCT ON (address) *
FROM (
   SELECT *
   FROM locations
   , LATERAL (
      SELECT * FROM addresses
      WHERE address ILIKE '%' || postalcode || '%'
      OFFSET 0 -- force fencing; might be redundant
   ) a
) q
ORDER BY address, LENGTH(postalcode) DESC
The downside is that you can implement paging only on postalcodes, not addresses.

Related

PostgreSQL: should a LEFT JOIN subquery use WHERE, or is ON enough?

When you join to a subquery, do you need a WHERE clause inside it, or is the join condition on s.id = t.id enough? I want to understand whether a subquery without a WHERE selects all rows and filters them afterwards, or whether it only selects the rows that match the condition on add.id = table.id.
SELECT * FROM table
LEFT JOIN (
   SELECT *
   FROM add
   /* WHERE add.id = 1 */ -- do I need this?
   GROUP BY add.id
) add ON add.id = table.id
WHERE table.id = 1
As I understand from EXPLAIN:
Nested Loop Left Join  (cost=2.95..13.00 rows=10 width=1026)
  Join Filter: (add.id = table.id)
It loads all rows and then filters them. Is that bad?
I'm not sure if your example is too simple, but you shouldn't need a subquery at all for this one, and definitely not the GROUP BY.
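A sketch of the direct join for the example as given (table is a reserved word, so it's quoted here):
SELECT *
FROM "table"
LEFT JOIN add ON add.id = "table".id
WHERE "table".id = 1;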
Suppose you do need a subquery. For this specific example, it leads to exactly the same query plan whether you add the WHERE clause or not. The idea of the query planner is that it tries to find a way to make your query as fast as possible. Often this means ordering the execution of joins and WHERE clauses so that the result set is reduced as early as possible. I generated exactly the same query, only with reservations and customers; I hope that's okay.
EXPLAIN
SELECT *
FROM reservations
LEFT OUTER JOIN (
   SELECT *
   FROM customers
) AS customers ON customers.id = reservations.customer_id
WHERE customer_id = 1;

Nested Loop Left Join  (cost=0.17..183.46 rows=92 width=483)
  Join Filter: (customers.id = reservations.customer_id)
  ->  Index Scan using index_reservations_on_customer_id on reservations  (cost=0.09..179.01 rows=92 width=255)
        Index Cond: (customer_id = 1)
  ->  Materialize  (cost=0.08..4.09 rows=1 width=228)
        ->  Index Scan using customers_pkey on customers  (cost=0.08..4.09 rows=1 width=228)
              Index Cond: (id = 1)
The deepest arrows are executed first. This means that even though I didn't have the equivalent of WHERE add.id = 1 in my subquery, the planner still knew that the equality customers.id = customer_id = 1 must hold, so it decided to filter on customers.id = 1 before even attempting to join anything.

Simple WHERE EXISTS ... ORDER BY ... query very slow in PostgreSQL

I have this very simple query, generated by my ORM (Entity Framework Core):
SELECT *
FROM "table1" AS "t1"
WHERE EXISTS (
   SELECT 1
   FROM "table2" AS "t2"
   WHERE ("t2"."is_active" = TRUE) AND ("t1"."table2_id" = "t2"."id"))
ORDER BY "t1"."table2_id"
There are 2 records with is_active = TRUE. The other involved columns (id) are the primary keys. The query returns exactly 4 rows.
Table 1 is 96 million records.
Table 2 is 30 million records.
The 3 columns involved in this query are indexed (is_active, id, table2_id).
The C#/LINQ code that generates this simple query is: Table2.Where(t => t.IsActive).Include(t => t.Table1).ToList();
SET STATISTICS 10000 was applied to all 3 of these columns.
VACUUM FULL ANALYZE was run on both tables.
WITHOUT the ORDER BY clause, the query returns within a few milliseconds, which is what I'd expect for 4 records. EXPLAIN output:
Nested Loop  (cost=1.13..13.42 rows=103961024 width=121)
  ->  Index Scan using table2_is_active_idx on table2  (cost=0.56..4.58 rows=1 width=8)
        Index Cond: (is_active = true)
        Filter: is_active
  ->  Index Scan using table1_table2_id_fkey on table1 t1  (cost=0.57..8.74 rows=10 width=121)
        Index Cond: (table2_id = table2.id)
WITH the ORDER BY clause, the query takes 5 minutes to complete! EXPLAIN output:
Merge Semi Join  (cost=10.95..4822984.67 rows=103961040 width=121)
  Merge Cond: (t1.table2_id = t2.id)
  ->  Index Scan using table1_table2_id_fkey on table1 t1  (cost=0.57..4563070.61 rows=103961040 width=121)
  ->  Sort  (cost=4.59..4.59 rows=2 width=8)
        Sort Key: t2.id
        ->  Index Scan using table2_is_active_idx on table2 t2  (cost=0.56..4.58 rows=2 width=8)
              Index Cond: (is_active = true)
              Filter: is_active
The index scan on table2 should return no more than 2 rows. The index scan on table1 makes no sense with its cost of 4563070 and 103961040 rows: it only has to match 2 rows in table2 with 4 rows in table1!
This is a very simple query with very few records to return. Why is Postgres failing to execute it properly?
OK, I solved my problem in the most unexpected way: I upgraded PostgreSQL from 9.6.1 to 9.6.3, and that was it. After restarting the service, the explain plan looked good and the query ran just fine. I did not change anything else: no new index, nothing. The only explanation I can think of is that there was a query planner bug in 9.6.1 that was solved in 9.6.3. Thank you all for your answers!
Add an index:
CREATE INDEX _index
ON table2
USING btree (id)
WHERE is_active IS TRUE;
And rewrite the query like this:
SELECT table1.*
FROM table2
INNER JOIN table1 ON (table1.table2_id = table2.id)
WHERE table2.is_active IS TRUE
ORDER BY table2.id
It is necessary to take into account that "is_active IS TRUE" and "is_active = TRUE" are processed by PostgreSQL in different ways, so the expression in the index predicate and the expression in the query must match.
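The two forms only differ for NULL input:
SELECT NULL::boolean = TRUE  AS eq_true   -- NULL
     , NULL::boolean IS TRUE AS is_true;  -- false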
If you can't rewrite the query, try adding an index:
CREATE INDEX _index
ON table2
USING btree (id)
WHERE is_active = TRUE;
Your guess is right, there is a bug in Postgres 9.6.1 that fits your use case exactly. And upgrading was the right thing to do. Upgrading to the latest point-release is always the right thing to do.
Quoting the release notes for Postgres 9.6.2:
Fix foreign-key-based join selectivity estimation for semi-joins and
anti-joins, as well as inheritance cases (Tom Lane)
The new code for taking the existence of a foreign key relationship
into account did the wrong thing in these cases, making the estimates
worse not better than the pre-9.6 code.
You should still create that partial index like Dima advised. But keep it simple:
is_active = TRUE and is_active IS TRUE subtly differ in that the second returns FALSE instead of NULL for NULL input. But none of this matters in a WHERE clause, where only TRUE qualifies. And both expressions are just noise. In Postgres you can use boolean values directly:
CREATE INDEX t2_id_idx ON table2 (id) WHERE is_active; -- that's all
And do not rewrite your query with a LEFT JOIN. This would add rows consisting of NULL values to the result for "active" rows in table2 without any siblings in table1. To match your current logic it would have to be an [INNER] JOIN:
SELECT t1.*
FROM table2 t2
JOIN table1 t1 ON t1.table2_id = t2.id -- and no parentheses needed
WHERE t2.is_active -- that's all
ORDER BY t1.table2_id;
But there is no need to rewrite your query that way at all. The EXISTS semi-join you have is just as good. Results in the same query plan once you have the partial index.
SELECT *
FROM table1 t1
WHERE EXISTS (
   SELECT 1 FROM table2
   WHERE is_active -- that's all
   AND   id = t1.table2_id
)
ORDER BY table2_id;
BTW, since you fixed the bug by upgrading, and once you have created that partial index (and ANALYZE or VACUUM ANALYZE has run on the table at least once, or autovacuum did that for you), you will never again get a bad query plan for this, since Postgres maintains separate estimates for the partial index, which are unambiguous for your numbers. Details:
Get count estimates from pg_class.reltuples for given conditions
Index that is not used, yet influences query

Why can't PostgreSQL do this simple FULL JOIN?

Here's a minimal setup with 2 tables a and b each with 3 rows:
CREATE TABLE a (
id SERIAL PRIMARY KEY,
value TEXT
);
CREATE INDEX ON a (value);
CREATE TABLE b (
id SERIAL PRIMARY KEY,
value TEXT
);
CREATE INDEX ON b (value);
INSERT INTO a (value) VALUES ('x'), ('y'), (NULL);
INSERT INTO b (value) VALUES ('y'), ('z'), (NULL);
Here is a LEFT JOIN that works fine as expected:
SELECT * FROM a
LEFT JOIN b ON a.value IS NOT DISTINCT FROM b.value;
with output:
id | value | id | value
----+-------+----+-------
1 | x | |
2 | y | 1 | y
3 | | 3 |
(3 rows)
Changing "LEFT JOIN" to "FULL JOIN" gives an error:
SELECT * FROM a
FULL JOIN b ON a.value IS NOT DISTINCT FROM b.value;
ERROR: FULL JOIN is only supported with merge-joinable or hash-joinable join conditions
Can someone please answer:
What is a "merge-joinable or hash-joinable join condition", and why does joining on a.value IS NOT DISTINCT FROM b.value not fulfill this condition, while a.value = b.value is perfectly fine?
It seems that the only difference is how NULL values are handled. Since the value column is indexed in both tables, an EXPLAIN shows that a NULL lookup is just as efficient as a lookup of a non-NULL value:
EXPLAIN SELECT * FROM a WHERE value = 'x';
                                QUERY PLAN
--------------------------------------------------------------------------
 Bitmap Heap Scan on a  (cost=4.20..13.67 rows=6 width=36)
   Recheck Cond: (value = 'x'::text)
   ->  Bitmap Index Scan on a_value_idx  (cost=0.00..4.20 rows=6 width=0)
         Index Cond: (value = 'x'::text)

EXPLAIN SELECT * FROM a WHERE value ISNULL;
                                QUERY PLAN
--------------------------------------------------------------------------
 Bitmap Heap Scan on a  (cost=4.20..13.65 rows=6 width=36)
   Recheck Cond: (value IS NULL)
   ->  Bitmap Index Scan on a_value_idx  (cost=0.00..4.20 rows=6 width=0)
         Index Cond: (value IS NULL)
This has been tested with PostgreSQL 9.6.3 and 10beta1.
There has been discussion about this issue, but it doesn't directly answer the above question.
PostgreSQL implements FULL OUTER JOIN with either a hash or a merge join.
To be eligible for such a join, the join condition has to have the form
<expression using only left table> <operator> <expression using only right table>
Now your join condition does look like this, but PostgreSQL does not have a special IS NOT DISTINCT FROM operator, so it parses your condition into:
(NOT ($1 IS DISTINCT FROM $2))
And such an expression cannot be used for hash or merge joins, hence the error message.
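For contrast, plain equality is both merge- and hash-joinable, so this runs fine (at the price that NULLs never match each other):
SELECT * FROM a
FULL JOIN b ON a.value = b.value;
-- works, but the rows with a NULL value come out unmatched on both sides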
I can think of a way to work around it:
SELECT a_id, NULLIF(a_value, '<null>'),
       b_id, NULLIF(b_value, '<null>')
FROM  (SELECT id AS a_id,
              COALESCE(value, '<null>') AS a_value
       FROM a
      ) x
FULL JOIN
      (SELECT id AS b_id,
              COALESCE(value, '<null>') AS b_value
       FROM b
      ) y
ON x.a_value = y.b_value;
That works if <null> does not appear anywhere in the value columns.
I just solved such a case by replacing the ON condition with TRUE and moving the original ON condition into a WHERE clause. I don't know the performance impact of this, though. (Note that for a FULL JOIN this changes the semantics: rows without a match on the other side are no longer preserved.)

PostgreSQL: out of memory issues with array_agg() on a heroku db server

I'm stuck with a (Postgres 9.4.6) query that (a) uses too much memory (most likely due to array_agg()) and (b) does not return exactly what I need, making post-processing necessary. Any input (especially regarding memory consumption) is highly appreciated.
Explanation:
the table token_groups holds all words used in tweets I've parsed, with their respective occurrence frequency in an hstore, one row per 10 minutes (for the last 7 days, so 7*24*6 rows in total). These rows are inserted in order of tweeted_at, so I can simply order by id. I'm using row_number() to identify when a word occurred.
# \d token_groups
                              Table "public.token_groups"
   Column   |            Type             |                         Modifiers
------------+-----------------------------+------------------------------------------------------------
 id         | integer                     | not null default nextval('token_groups_id_seq'::regclass)
 tweeted_at | timestamp without time zone | not null
 tokens     | hstore                      | not null default ''::hstore
Indexes:
    "token_groups_pkey" PRIMARY KEY, btree (id)
    "index_token_groups_on_tweeted_at" btree (tweeted_at)
What I'd ideally want is a list of words, each with the relative distances between its occurrences' row numbers. So if e.g. the word 'hello' appears once in row 5, twice in row 8 and once in row 20, I'd want a column with the word and an array column returning {5,3,0,12} (meaning: first occurrence in the fifth row, next occurrence 3 rows later, next occurrence 0 rows later, next one 12 rows later). If anyone wonders why: 'relevant' words occur in clusters, so (simplified) the higher the standard deviation of the temporal distances, the more likely a word is a keyword. See more here: http://bioinfo2.ugr.es/Publicaciones/PRE09.pdf
For now, I return an array with positions and an array with frequencies, and use this info to calculate the distances in ruby.
Currently the primary problem is a high memory spike, which seems to be caused by array_agg(). The (very helpful) Heroku staff tell me that some of my connections use 500-700 MB with very little shared memory, causing out-of-memory errors (I'm running Standard-0, which gives me 1 GB total for all connections), so I need to find an optimization.
The total number of hstore entries is ~100k, which is then aggregated (after skipping words with very low frequency):
SELECT COUNT(*)
FROM  (SELECT row_number() OVER (ORDER BY id ASC) AS position,
              (each(tokens)).key, (each(tokens)).value::integer
       FROM token_groups) subquery;

 count
--------
 106632
Here is the query causing the memory load:
SELECT key, array_agg(pos) AS positions, array_agg(value) AS frequencies
FROM (
   SELECT row_number() OVER (ORDER BY id ASC) AS pos,
          (each(tokens)).key,
          (each(tokens)).value::integer
   FROM token_groups
) subquery
GROUP BY key
HAVING SUM(value) > 10;
The output is:
key | positions | frequencies
-------------+---------------------------------------------------------+-------------------------------
hello | {172,185,188,210,349,427,434,467,479} | {1,2,1,1,2,1,2,1,4}
world | {166,218,265,343,415,431,436,493} | {1,1,2,1,2,1,2,1}
some | {35,65,101,180,193,198,223,227,420,424,427,428,439,444} | {1,1,1,1,1,1,1,2,1,1,1,1,1,1}
other | {77,111,233,416,421,494} | {1,1,4,1,2,2}
word | {170,179,182,184,185,186,187,188,189,190,196} | {3,1,1,2,1,1,1,2,5,3,1}
(...)
Here's what EXPLAIN says:
                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=12789.00..12792.50 rows=200 width=44) (actual time=309.692..343.064 rows=2341 loops=1)
   Output: ((each(token_groups.tokens)).key), array_agg((row_number() OVER (?))), array_agg((((each(token_groups.tokens)).value)::integer))
   Group Key: (each(token_groups.tokens)).key
   Filter: (sum((((each(token_groups.tokens)).value)::integer)) > 10)
   Rows Removed by Filter: 33986
   Buffers: shared hit=2176
   ->  WindowAgg  (cost=177.66..2709.00 rows=504000 width=384) (actual time=0.947..108.157 rows=106632 loops=1)
         Output: row_number() OVER (?), (each(token_groups.tokens)).key, ((each(token_groups.tokens)).value)::integer, token_groups.id
         Buffers: shared hit=2176
         ->  Sort  (cost=177.66..178.92 rows=504 width=384) (actual time=0.910..1.119 rows=504 loops=1)
               Output: token_groups.id, token_groups.tokens
               Sort Key: token_groups.id
               Sort Method: quicksort  Memory: 305kB
               Buffers: shared hit=150
               ->  Seq Scan on public.token_groups  (cost=0.00..155.04 rows=504 width=384) (actual time=0.013..0.505 rows=504 loops=1)
                     Output: token_groups.id, token_groups.tokens
                     Buffers: shared hit=150
 Planning time: 0.229 ms
 Execution time: 570.534 ms
PS, if anyone wonders: every 10 minutes I append new data to the token_groups table and remove outdated data, which is easy when storing one row per 10 minutes. I still have to come up with a better data structure that e.g. uses one row per word, but that does not seem to be the main issue; I think it's the array aggregation.
Your presented query can be simpler, evaluating each() only once per row:
SELECT key, array_agg(pos) AS positions, array_agg(value) AS frequencies
FROM (
   SELECT t.key, pos, t.value::int
   FROM  (SELECT row_number() OVER (ORDER BY id) AS pos, * FROM token_groups) g
       , each(g.tokens) t -- implicit LATERAL join
   ORDER BY t.key, pos
) sub
GROUP BY key
HAVING sum(value) > 10;
This also preserves the correct order of elements.
What I'd ideally want is a list of words with each the relative distances of their row numbers.
This would do it:
SELECT key, array_agg(step) AS occurrences
FROM (
   SELECT key, CASE WHEN g = 1 THEN pos - last_pos ELSE 0 END AS step
   FROM (
      SELECT key, value::int, pos
           , lag(pos, 1, 0) OVER (PARTITION BY key ORDER BY pos) AS last_pos
      FROM  (SELECT row_number() OVER (ORDER BY id)::int AS pos, * FROM token_groups) g
          , each(g.tokens) t
   ) t1
   , generate_series(1, t1.value) g
   ORDER BY key, pos, g
) sub
GROUP BY key;
-- optionally add: HAVING count(*) > 10
SQL Fiddle.
Interpreting each hstore key as a word and the respective value as its number of occurrences in the row (= in the last 10 minutes), I use two cascading LATERAL joins: the 1st step to decompose the hstore value, the 2nd step to multiply rows according to value. (If your value (frequency) is mostly just 1, you can simplify.) About LATERAL:
What is the difference between LATERAL and a subquery in PostgreSQL?
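To see the decompose-and-multiply steps in isolation, here is a toy example with made-up data (requires the hstore extension; row order depends on hstore's internal key order):
SELECT t.key, t.value::int AS frequency, g AS copy_no
FROM each('hello=>2, world=>1'::hstore) t
   , generate_series(1, t.value::int) g;
-- 'hello' comes out twice (copy_no 1 and 2), 'world' once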
Then I ORDER BY key, pos, g in the subquery before aggregating in the outer SELECT. This clause seems to be redundant, and in fact, I see the same result without it in my tests. That's a collateral benefit from the window definition of lag() in the inner query, which is carried over to the next step unless any other step triggers re-ordering. However, now we depend on an implementation detail that's not guaranteed to work.
Ordering the whole query once should be substantially faster (and easier on the required sort memory) than per-aggregate sorting. This is not strictly according to the SQL standard either, but the simple case is documented for Postgres:
Alternatively, supplying the input values from a sorted subquery will usually work. For example:
SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
But this syntax is not allowed in the SQL standard, and is not portable to other database systems.
Strictly speaking, we only need:
ORDER BY pos, g
You could experiment with that. Related:
PostgreSQL unnest() with element number
Possible alternative:
SELECT key
     , ('{' || string_agg(step || repeat(',0', value - 1), ',') || '}')::int[] AS occurrences
FROM (
   SELECT key, pos, value::int
        , (pos - lag(pos, 1, 0) OVER (PARTITION BY key ORDER BY pos))::text AS step
   FROM  (SELECT row_number() OVER (ORDER BY id)::int AS pos, * FROM token_groups) g
       , each(g.tokens) t
   ORDER BY key, pos
) t1
GROUP BY key;
-- HAVING sum(value) > 10;
It might be cheaper to use text concatenation instead of generate_series().
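A toy illustration of the concatenation trick, for a single word with step 7 and frequency 3:
SELECT ('{' || 7 || repeat(',0', 3 - 1) || '}')::int[] AS occurrences;
-- {7,0,0}: one occurrence at step 7, plus two more at distance 0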

Optimization of search on concatenated firstname and lastname in PostgreSQL

I've written a SQL query in Postgres which searches for a user by both firstname and lastname. My question is simply whether it can be optimized, since it will be used a lot.
CREATE INDEX users_firstname_special_idx ON users(firstname text_pattern_ops);
CREATE INDEX users_lastname_special_idx ON users(lastname text_pattern_ops);
SELECT id, firstname, lastname
FROM users
WHERE firstname || ' ' || lastname ILIKE ('%' || 'sen' || '%')
LIMIT 25;
If I run an EXPLAIN I get the following output:
Limit  (cost=0.00..1.05 rows=1 width=68)
  ->  Seq Scan on users  (cost=0.00..1.05 rows=1 width=68)
        Filter: (((firstname || ' '::text) || lastname) ~~* '%sen%'::text)
As I understand it, I should try to make Postgres skip the "Filter:" step. Is that correct?
Hope you have some suggestions.
Cheers.
If the pattern has a % wildcard on both ends of the string (so it is not left-anchored), a plain b-tree index cannot help; you need a trigram index.
In your case, however, you are doing something odd. You concatenate firstname and lastname with a space in the middle. The search string '%sen%' can therefore only match within the firstname or within the lastname, never across the space. A better solution would therefore be:
CREATE INDEX users_firstname_special_idx ON users USING gist (firstname gist_trgm_ops);
CREATE INDEX users_lastname_special_idx ON users USING gist (lastname gist_trgm_ops);
SELECT id, firstname || ' ' || lastname AS fullname
FROM users
WHERE firstname ILIKE ('%sen%') OR lastname ILIKE ('%sen%')
LIMIT 25;
You have described exactly the situation covered in the PostgreSQL documentation chapter Indexes on Expressions.
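A sketch applying that idea here, combining an expression index with trigram ops so that the original concatenated predicate can use an index (the index name is made up; requires pg_trgm):
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX users_fullname_trgm_idx ON users
USING gist ((firstname || ' ' || lastname) gist_trgm_ops);

-- the query has to repeat the indexed expression verbatim:
SELECT id, firstname, lastname
FROM users
WHERE firstname || ' ' || lastname ILIKE '%sen%'
LIMIT 25;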