How to properly index and query time series data in Postgres? - postgresql

I've recently been experimenting with Postgres instead of BigQuery.
My table transactions_all structure
access_id (text)
serial_no (text)
transaction_id (text)
date_local (timestamp)
and an index (BTREE, condition (((date_local)::date), serial_no))
When I load 500,000 rows for one month into this table, querying the last 2 days like this performs fine:
SELECT *
FROM transactions_all
WHERE DATE(date_local) BETWEEN CURRENT_DATE - INTERVAL '1 day' AND CURRENT_DATE
AND access_id = 'accessid1'
and serial_no = 'IO99267637'
But if I'm selecting the last 21 days like this
SELECT *
FROM transactions_all
WHERE DATE(date_local) BETWEEN CURRENT_DATE - INTERVAL '20 day' AND CURRENT_DATE
AND access_id = 'accessid1'
and serial_no = 'IO99267637'
then fetching the data takes multiple seconds instead of milliseconds.
Is this a normal behaviour or am I using the wrong index?

Your index columns are in the wrong order. In the front you need those expressions that are used with the = operator in the WHERE condition, because index columns after the column that is used with a different operator can no longer be used.
To understand that, imagine a phone book where the names are ordered by (last name, first name). Then consider that it is easy to find all entries with last name "Miller" and a first name less than "J": you just read the "Miller" entries until you hit "J". Now consider that the task is to find all "Joe"s whose last name is less than "M": you have to scan all entries up to "M", and it doesn't help you much that the first names are sorted too.
So use an index like
CREATE INDEX ON transactions_all (serial_no, access_id, date(date_local));
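The phone book argument can be checked with a small simulation. The Python sketch below is not how Postgres implements B-tree indexes; it only models an index as a sorted list of tuples and counts how many entries a range scan has to visit under each column order. The data set, the serial numbers, and the function names are made up for illustration.

```python
from bisect import bisect_left, bisect_right
from datetime import date, timedelta

# Hypothetical data: 50 devices, one row per device per day for 30 days.
start = date(2024, 1, 1)
rows = [
    (f"SN{device:03d}", "accessid1", start + timedelta(days=day))
    for device in range(50)
    for day in range(30)
]

# Index A: (date, serial_no), the column order used in the question.
index_a = sorted((d, sn) for sn, _access_id, d in rows)
# Index B: (serial_no, access_id, date), equality columns first.
index_b = sorted(rows)


def scan_date_first(serial_no, lo, hi):
    """Scan index A: every entry in the date range is visited, then filtered."""
    first = bisect_left(index_a, (lo,))
    last = bisect_left(index_a, (hi + timedelta(days=1),))
    visited = index_a[first:last]
    matches = [entry for entry in visited if entry[1] == serial_no]
    return len(visited), len(matches)


def scan_equality_first(serial_no, access_id, lo, hi):
    """Scan index B: the matching entries form one contiguous slice."""
    first = bisect_left(index_b, (serial_no, access_id, lo))
    last = bisect_right(index_b, (serial_no, access_id, hi))
    visited = index_b[first:last]
    return len(visited), len(visited)


# The last 21 days of the sample month, for one device.
lo, hi = start + timedelta(days=9), start + timedelta(days=29)
visited_a, matched_a = scan_date_first("SN007", lo, hi)
visited_b, matched_b = scan_equality_first("SN007", "accessid1", lo, hi)
```

With the date first, the scan visits the entries of all devices for the whole date range and discards most of them. With the equality columns first, it visits only the entries it returns. This is also why the gap widens as the date range grows.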

Related

Postgres - Select all rows with a date column value within a range of another row's value?

I'm trying to use a date value as a starting point to construct a date range within a single postgres query. The date value would be something like
SELECT upgraded_at FROM accounts ORDER BY upgraded_at DESC limit 1;
which would then be used as the starting point. I then want to do something like
SELECT * from accounts WHERE upgraded_at >= (basis_date - 2 days) AND upgraded_at < (basis_date + 2 days);
Ideally I'd like to accomplish this with a single query. So I'll need some subquery to get the starting date, then use that as a variable within the rest of the query.
Also eventually I'm going to be doing this within Sequelize. I definitely need the raw SQL way to do it but I'm also curious if later there's a Sequelize-specific way.
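You can actually avoid making two references to the basis date here. The query below assumes upgraded_at is a date column, so that the subtraction yields an integer number of days; note that BETWEEN includes both endpoints.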
WITH cte AS (
SELECT *, MAX(upgraded_at) OVER () AS max_upgraded_at
FROM accounts
)
SELECT *
FROM cte
WHERE upgraded_at - max_upgraded_at BETWEEN -2 AND 2;

Selecting certain columns from a table with dates as columns

I have a table whose column names are months such as "2020-05", "2020-06", "2020-07" and so on, with many months as columns. I need to select only the columns for the current month, the next month and the month after that. (DB: PostgreSQL 11)
Since the column names are text in the format YYYY-MM, how can I select only the current month and the following 2 months from this table without hard-coding the column names?
Table name: static_data (the table structure was shown in a screenshot).
The required SELECT statement is something like the one below. The table contains 14 months of data, with the dates as columns, as in the screenshot. From this I want the columns for the current month and the next 2 months, along with their data.
SELECT "2020-05","2020-06","2020-07" from static
-- SELECT Current month and next 2 months
Required output:
It's nearly impossible to get the actual value of the current month as the column name, but you can do something like this:
select d.item_sku,
d.status,
to_jsonb(d) ->> to_char(current_date, 'yyyy-mm') as current_month,
to_jsonb(d) ->> to_char(current_date + interval '1 month', 'yyyy-mm') as "month + 1",
to_jsonb(d) ->> to_char(current_date + interval '2 month', 'yyyy-mm') as "month + 2"
from bad_design d
;
Technically, you can use the information schema to achieve this. But, like GMB said, please re-design your schema and do not approach this issue like this, in the first place.
The special schema information_schema contains metadata about your DB. Among these are details about existing columns. In other words, you can query it and convert the column names into dates to compare them to what you need.
Here are a few hints.
Query existing column names.
SELECT column_name
FROM information_schema.columns
WHERE table_schema = 'your_schema'
AND table_name = 'your_table'
Compare two dates.
SELECT now() + INTERVAL '3 months' < now() AS compare;
compare
---------
f
(1 row)
You're already pretty close with the conversion yourself.
Have fun and re-design your schema!
Disclaimer: this does not answer your question - but it's too long for a comment.
You need to fix the design of this table. Instead of storing dates in columns, you should have each date on a separate row.
There are numerous drawbacks to your current design:
very simple queries become utterly complicated: filtering on dates, aggregation, and so on. All these operations require dynamic SQL, which adds a great deal of complexity
adding or removing new dates requires modifying the structure of the table
storage is wasted for rows where not all columns are filled
Instead, consider this simple design, with one table that stores the master data of each item_sku, and a child table that stores one row per item_sku and date:
create table myskus (
item_sku int primary key,
name text,
cat_level_3_name text
);
create table myvalues (
item_sku int references myskus(item_sku),
date_sku date,
value_sku text,
primary key (item_sku, date_sku)
);
Now your original question is easy to solve:
select v.*, s.name, s.cat_level_3_name
from myskus s
inner join myvalues v on v.item_sku = s.item_sku
where
v.date_sku >= date_trunc('month', now())
and v.date_sku < date_trunc('month', now()) + interval '3 month'
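To illustrate why the row-per-date layout is easier to query, here is a small Python sketch (not part of the original answers) that reshapes one wide row into (item_sku, date_sku, value_sku) tuples and then applies the same half-open three-month window as the query above. The sample row, the function names, and the fixed "today" are invented for the example, and it assumes every column other than item_sku is named YYYY-MM.

```python
from datetime import date


def unpivot(wide_row, key="item_sku"):
    """Turn {'item_sku': 1, '2020-05': 'b', ...} into one tuple per month column."""
    long_rows = []
    for column, value in wide_row.items():
        if column == key:
            continue
        year, month = map(int, column.split("-"))
        long_rows.append((wide_row[key], date(year, month, 1), value))
    return long_rows


def add_months(d, months):
    """First day of the month that lies `months` months after d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def three_month_window(long_rows, today):
    """Keep rows with start of current month <= date_sku < start of month + 3."""
    lo = add_months(today, 0)
    hi = add_months(today, 3)
    return [row for row in long_rows if lo <= row[1] < hi]


wide = {
    "item_sku": 1,
    "2020-04": "a",
    "2020-05": "b",
    "2020-06": "c",
    "2020-07": "d",
    "2020-08": "e",
}
selected = three_month_window(unpivot(wide), today=date(2020, 5, 17))
```

Once the months are values rather than column names, selecting "this month and the next two" is an ordinary range comparison and needs no dynamic SQL.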

Postgres combining date and time fields, is this efficient

I am selecting rows based on a date range given as strings, using the SQL below. It works, but is this an efficient way of doing it? As you can see, the date and time are held in different fields. From what I remember of working with Oracle, as soon as you wrap a function around a column, the index on it can no longer be used.
select *
from events
where venue_id = '2'
and EXTRACT(EPOCH FROM (start_date + start_time))
between EXTRACT(EPOCH FROM ('2017-09-01 00:00')::timestamp)
and EXTRACT(EPOCH FROM ('2017-09-30 00:00')::timestamp)
So is there a way of doing this that can use indexes?
Preface: Since your query is limited to a single venue_id, both examples below create a compound index with venue_id first.
If you want an index for improving that query, you can create an expression index:
CREATE INDEX events_start_idx
ON events (venue_id, (EXTRACT(EPOCH FROM (start_date + start_time))));
If you don't want a dedicated expression index, you can create a normal index on the start_date column and add extra logic to the query so that it uses the index. The index then narrows the access plan to the date range, and the fringe records with the wrong time of day on the first and last dates are filtered out afterwards.
In the following, I'm also eliminating the unnecessary extraction of epoch.
CREATE INDEX events_venue_start
ON events (venue_id, start_date);
SELECT *
FROM events
WHERE venue_id = '2'
AND start_date BETWEEN '2017-09-01'::date AND '2017-09-30'::date
AND start_date + start_time BETWEEN '2017-09-01 00:00'::timestamp
AND '2017-09-30 00:00'::timestamp
The first two parts of the WHERE clause will use the index to full benefit. The last part is then used to filter the records found by the index.

query to fetch records between two date and time

I have been using PostgreSQL. My table has 3 columns: date, time and userId. I have to find the records between a given start and end date and time. Since the date and time are in separate columns, a BETWEEN clause does not return valid results.
Combine the two columns into a single timestamp by adding the time to the date:
select *
from some_table
where date_column + time_column
between timestamp '2017-06-14 17:30:00' and timestamp '2017-06-19 08:26:00';
Note that this will not use an index on date_column or time_column. You would need to create an index on that expression. Or better: use a single column defined as timestamp instead.
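Why filtering the two columns separately goes wrong can be seen in a small Python sketch (illustrative only; the sample rows are made up). It assumes the failing attempt applied one BETWEEN to the date column and another to the time column. With bounds of 17:30 and 08:26, the time condition on its own can never be true, so nothing is returned, while combining date and time first gives the expected rows.

```python
from datetime import date, datetime, time

# Hypothetical rows: (date_column, time_column, userId).
rows = [
    (date(2017, 6, 14), time(9, 0), "u1"),   # before the range starts
    (date(2017, 6, 16), time(12, 0), "u2"),  # inside the range
    (date(2017, 6, 19), time(8, 0), "u3"),   # inside the range
    (date(2017, 6, 19), time(9, 0), "u4"),   # after the range ends
]

lo = datetime(2017, 6, 14, 17, 30)
hi = datetime(2017, 6, 19, 8, 26)

# Naive approach: one BETWEEN per column.
separate = [
    user
    for d, t, user in rows
    if lo.date() <= d <= hi.date() and lo.time() <= t <= hi.time()
]

# Combine first, like date_column + time_column in the SQL above.
combined = [
    user
    for d, t, user in rows
    if lo <= datetime.combine(d, t) <= hi
]
```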

how do you sum over a related period

I need to sum values over a period of + 2 months or over a quarter (the periods come from a related date table).
Is there a way to use dense rank to partition by those (custom) periods?
select
FiscalMonth
,Value
from table
The SQL will have to do the following:
Join the value table and the period table
Include the period in the select list and sum the value, grouping by the period
i.e.
select b.period, sum(a.value)
from table a
inner join period b on a.FiscalMonth between b.StartMonth and b.EndMonth
group by b.period
Note: The join condition will have to be modified based on what data you actually have in the period table.
Hope this helps
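The join-and-group logic of the query above can be sketched in a few lines of Python. This is only an illustration of what the SQL computes, with an invented period table and invented values; months are written as YYYYMM integers purely for the example.

```python
from collections import defaultdict

# Hypothetical period table: (period, StartMonth, EndMonth).
periods = [("Q1", 202001, 202003), ("Q2", 202004, 202006)]
# Hypothetical value table: (FiscalMonth, value).
values = [(202001, 10), (202002, 20), (202003, 30), (202004, 5), (202006, 7)]


def sum_by_period(values, periods):
    """Mimic: JOIN ... ON FiscalMonth BETWEEN StartMonth AND EndMonth, GROUP BY period."""
    totals = defaultdict(int)
    for fiscal_month, value in values:
        for period, start_month, end_month in periods:
            if start_month <= fiscal_month <= end_month:
                totals[period] += value
    return dict(totals)


totals = sum_by_period(values, periods)
```

As with the SQL join, a value that falls into several overlapping periods is counted once in each of them, which is what custom rolling periods such as "+ 2 months" would need.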
If you need the values from some interval, for example by month, you could use something like:
SELECT *
FROM yourTable
WHERE MONTH(some_date) = MONTH(CURRENT_DATE - INTERVAL 1 MONTH) -- could be any interval
This example returns the rows from the previous month, relative to the current one. The point is that you can adapt the query by applying functions to intervals.
Of course, you could use the SUM function for the adding.