Postgres combining date and time fields, is this efficient - postgresql

I am selecting rows based on a date range which is held in a string, using the SQL below. It works, but is this an efficient way of doing it? As you can see, the date and time are held in different fields. From my memory of doing Oracle work, as soon as you put a function around an attribute it can't use indexes.
select *
from events
where venue_id = '2'
and EXTRACT(EPOCH FROM (start_date + start_time))
between EXTRACT(EPOCH FROM ('2017-09-01 00:00')::timestamp)
and EXTRACT(EPOCH FROM ('2017-09-30 00:00')::timestamp)
So is there a way of doing this that can use indexes?

Preface: Since your query is limited to a single venue_id, both examples below create a compound index with venue_id first.
If you want an index for improving that query, you can create an expression index:
CREATE INDEX events_start_idx
ON events (venue_id, (EXTRACT(EPOCH FROM (start_date + start_time))));
If you don't want a dedicated function index, you can create a normal index on the start_date column and add extra logic so the query can use it. The index then limits the scan to the date range, and fringe records with the wrong time of day on the first and last dates are filtered out.
In the following, I'm also eliminating the unnecessary extraction of epoch.
CREATE INDEX events_venue_start
ON events (venue_id, start_date);
SELECT *
FROM events
WHERE venue_id = '2'
AND start_date BETWEEN '2017-09-01'::date AND '2017-09-30'::date
AND start_date + start_time BETWEEN '2017-09-01 00:00'::timestamp
AND '2017-09-30 00:00'::timestamp
The first two parts of the WHERE clause will use the index to full benefit. The last part is then used to filter the records found by the index.
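To verify that the planner really does use the index this way on your data, it is worth checking the plan (a quick sketch; the actual output depends on your table):
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM events
WHERE venue_id = '2'
AND start_date BETWEEN '2017-09-01'::date AND '2017-09-30'::date
AND start_date + start_time BETWEEN '2017-09-01 00:00'::timestamp
AND '2017-09-30 00:00'::timestamp;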

Related

How to properly index and query time series data in Postgres?

I've recently been experimenting with Postgres instead of using BigQuery.
My table transactions_all has this structure:
access_id (text)
serial_no (text)
transaction_id (text)
date_local (timestamp)
and an index (BTREE, condition (((date_local)::date), serial_no))
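Expressed as DDL, that index is roughly the following (the index name is just illustrative):
CREATE INDEX transactions_all_date_serial_idx
ON transactions_all ((date_local::date), serial_no);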
When I load 500,000 rows for one month into this table, performance is okay when querying the last 2 days like this:
SELECT *
FROM transactions_all
WHERE DATE(date_local) BETWEEN CURRENT_DATE - INTERVAL '1 day' AND CURRENT_DATE
AND access_id = 'accessid1'
and serial_no = 'IO99267637'
But if I'm selecting the last 21 days like this
SELECT *
FROM transactions_all
WHERE DATE(date_local) BETWEEN CURRENT_DATE - INTERVAL '20 day' AND CURRENT_DATE
AND access_id = 'accessid1'
and serial_no = 'IO99267637'
then fetching the data takes multiple seconds instead of milliseconds.
Is this normal behaviour, or am I using the wrong index?
Your index columns are in the wrong order. The expressions that are compared with the = operator in the WHERE condition need to come first, because index columns that follow a column used with a different operator can no longer be used efficiently.
To understand that, imagine a phone book where the names are ordered by (last name, first name). It is easy to find all entries with last name "Miller" and a first name less than "J": you just read the "Miller" entries until you hit "J". Now consider the task of finding all "Joe"s whose last name is less than "M": you have to scan all entries up to "M", and it doesn't help you much that the first names are sorted too.
So use an index like
CREATE INDEX ON transactions_all (serial_no, access_id, date(date_local));

How to keep postgres statistics up to date to encourage the best index to be selected

I have a Notifications table with approximately 7,000,000 records where the relevant columns are:
id: integer
time_created: timestamp with time zone
device_id: integer (foreign key to another table)
And the indexes:
CREATE INDEX notifications_device ON notifications (device_id);
CREATE INDEX notifications_time ON notifications (time_created);
And my query:
SELECT COUNT(*) AS "__count"
FROM "notifications"
WHERE ("notifications"."device_id" IN (
SELECT "id" FROM device WHERE (
device."device_type" = 'iOS' AND
device."registration_id" IN (
'XXXXXXX',
'YYYYYYY',
'ZZZZZZZ'
)
)
)
AND "notifications"."time_created" BETWEEN
'2020-10-26 00:00:00' AND '2020-10-26 17:33:00')
;
For most of the day, this query will use the index on device_id, and will run in under 1ms. But once the table is being written to rapidly (logging notifications as they are sent), the planner switches to using the index on time_created and the query blows out to 300ms.
Running an ANALYZE NOTIFICATIONS immediately fixes the problem, and the index on device_id is used again.
The table is pruned to the last 30 days each night, which is why there is a separate index on the time_created column.
Can I fix this issue, so that the planner always chooses the index on device_id, by forcing postgres to maintain better statistics on this table? Alternatively, can I re-write the time_created index (perhaps by using a different index type like BRIN) so that it'd only be considered for a WHERE clause like time_created < ..30 days ago.. and not WHERE time_created BETWEEN midnight and now?
EXPLAIN ANALYZE stats:
Bad Plan (time_created):
Rows Removed by Filter = 20926
Shared Hit Blocks = 143934
Plan Rows = 38338
Actual Rows = 84479
Good Plan (device_id):
Rows Removed by Filter = 95
Shared Hit Blocks = 34
Plan Rows = 1
Actual Rows = 0
I would actually suggest a composite index on the notifications table:
CREATE INDEX idx1 ON notifications (device_id, time_created);
This index would cover both restrictions in the current WHERE clause. I would also add an index on the device table:
CREATE INDEX idx2 ON device (device_type, registration_id, id);
The first two columns of this 3-column index would cover the WHERE clause of the subquery. It also includes the id column to completely cover the SELECT clause. If used, Postgres could more rapidly evaluate the subquery on the device table.
You could also play around with some slight variants of the above two indices, by changing column order. For example, you could also try:
CREATE INDEX idx1 ON notifications (time_created, device_id);
CREATE INDEX idx2 ON device (registration_id, device_type, id);
"The table is pruned to the last 30 days each night, which is why there is a separate index on the time_created column."
But, is that a good reason to have the index? Does it matter if the nightly query takes a little longer? Indeed, for deleting 3% of a table, does it even use the index and if it does, does that actually make it faster? Maybe you could replace the index with partitioning, or with nothing.
In any case, you can use this ugly hack to force it not to use the index:
AND "notifications"."time_created" + interval '0 seconds' BETWEEN '2020-10-26 00:00:00' AND '2020-10-26 17:33:00'

Postgres index timestamp with timezone column

I'm running PostgreSQL 9.6, and I have a table named decks with an expiration column of type timestamp with time zone (for storing decks of cards where each card can expire independently).
I'd like to create a nightly cron job that finds all cards which expired at any point during the previous day—i.e. between 0:00 and 23:59 inclusive.
This seems to give me the time range I want...
SELECT id
FROM decks
WHERE expiration >= (now()::date - 1)::timestamptz
AND expiration < (now()::date)::timestamptz;
...but I'm wondering two things:
What's the best way to index the expiration column for my scenario?
Is there a better/cleaner way to specify the start and end times?
Question 1: For that query, a standard index is the best option. However, see below.
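For reference, that standard index is simply a B-tree on the column (the index name is illustrative):
CREATE INDEX decks_expiration_idx ON decks (expiration);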
Question 2: Lots of options, here. A quick change to your query:
SELECT id
FROM decks
WHERE expiration::date = (now()::date - 1);
... allows you to create a functional index on expiration::date which should be smaller, and a bit more efficient.
Personally, I'd go a bit further and use current_date instead of now():
SELECT id
FROM decks
WHERE expiration::date = (current_date - 1);
As always, I recommend use of EXPLAIN and EXPLAIN ANALYZE when evaluating indexes.
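For completeness, here is a sketch of the expression index described above (the name is illustrative). Because expiration is timestamp with time zone, a bare cast to date depends on the session TimeZone and is rejected in an index expression, so the cast has to go through an explicit time zone, and the query has to use the same expression to benefit from the index:
-- Assumes the day buckets should be in UTC; pick the zone your cron job uses.
CREATE INDEX decks_expiration_date_idx
ON decks (((expiration AT TIME ZONE 'UTC')::date));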

query to fetch records between two date and time

I have been using PostgreSQL. My table has 3 columns: date, time, and userId. I have to find records between a given date and time range. Since the date and time are in separate columns, the BETWEEN clause is not providing valid results.
Combine the two columns into a single timestamp by adding the time to the date:
select *
from some_table
where date_column + time_column
between timestamp '2017-06-14 17:30:00' and timestamp '2017-06-19 08:26:00';
Note that this will not use an index on date_column or time_column. You would need to create an index on that expression. Or better: use a single column defined as timestamp instead.
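If you stay with the two separate columns, the expression index mentioned above could look like this (table and column names follow the example; the index name is illustrative):
CREATE INDEX some_table_start_ts_idx
ON some_table ((date_column + time_column));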

Optimizing date queries in postgresql

I'm having a hard time optimizing queries on a very big table. Basically, all of them filter the result set by date:
SELECT FROM bigtable WHERE date >= '2015-01-01' AND date <= '2016-01-01' ORDER BY date desc;
Adding the following date index actually makes things worse:
CREATE INDEX CONCURRENTLY bigtable_date_index ON bigtable(date(date));
That is, without the index it takes about 1 second to run, and with it about 10 seconds. But with bigger ranges and additional filtering it is very slow even without that index.
I'm using PostgreSQL 9.4, and I see that 9.5 has some improvements for sorting that might help?
Would a BRIN index help in this case?
For an index to be effective, it needs to index the same thing you're filtering by. In this case, you're filtering by date, but you appear to have indexed date(date), so the index can't be used.
Either filter your table using date(date):
SELECT FROM bigtable
WHERE date(date) >= '2015-01-01' AND date(date) <= '2016-01-01'
ORDER BY date(date) desc;
Or index the naked date:
CREATE INDEX CONCURRENTLY bigtable_date_index ON bigtable(date);
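Either way, EXPLAIN will show whether the index is picked up for both the range filter and the ORDER BY (a sketch; the plan depends on your data and settings):
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM bigtable
WHERE date >= '2015-01-01' AND date <= '2016-01-01'
ORDER BY date DESC;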