Postgres SQL query runs very slow when using current_date

We have a SQL query using a WHERE clause to filter the data set by date. Currently the WHERE clause in the query is set as follows:
WHERE date_field BETWEEN current_date - integer '90' AND current_date (where the interval is the last 90 days)
This query has suddenly started to slow down over the past couple of days and now takes upwards of 10 minutes. If we remove current_date from the query and hard-code the dates instead, it runs in less than 10 seconds, like it used to.
Hard-coded version
WHERE date_field BETWEEN '03-12-2020' AND '06-12-2020'
This query is run against the Postgres engine in Amazon Aurora. The same WHERE clause is used in other queries filtering against the same field, and those are not impacted by this issue.
I am trying to figure out how we can determine what suddenly caused this issue, and how we can resolve it.
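No answer is recorded here, but a common first diagnostic step is to compare the execution plans of the two variants and refresh the table's statistics, since a plan change after statistics go stale is a frequent cause of exactly this symptom. A minimal sketch, using your_table as a placeholder for the question's table:
-- Inspect the plan the planner actually chooses for the slow variant
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM your_table  -- placeholder name; substitute the real table
WHERE date_field BETWEEN current_date - 90 AND current_date;
-- Refresh planner statistics; stale statistics often explain a sudden slowdown
ANALYZE your_table;
Running the same EXPLAIN on the hard-coded version and comparing the two plans should show whether the planner switched, say, from an index scan to a sequential scan for the current_date form.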

Related

Move rows older than x days to archive table or partition table in Postgres 11

I would like to speed up the queries on my big table that contains lots of old data.
I have a table named post that has the date column created_at. The table has roughly 31 million rows, about 30 million of which are older than 30 days.
Actually, I want this:
move data older than 30 days into the post_archive table or create a partition table.
when the value in the created_at column becomes older than 30 days, that row should be moved to the post_archive table or partition table.
Any detailed and concrete solution in PostgreSQL 11.15?
My ideas:
Solution 1. create a cron script in whatever language (e.g. JavaScript) and run it every day to copy data from the post table into post_archive and then delete data from the post table
Solution 2. create a Postgres function that should copy the data from the post table into the partition table, and create a cron job that will call the function every day
Thanks
This answer splits your data into a post table and a post_archive table. It's a common approach, and I've done it (with SQL Server).
Before you do anything else, make sure you have an index on your created_at column on your post table. Important.
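If that index doesn't exist yet, creating it is a single statement (the index name here is only a suggestion):
CREATE INDEX idx_post_created_at ON post (created_at);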
Next, you need to use a common expression to mean "thirty days ago". This is it.
(CURRENT_DATE - INTERVAL '30 DAY')::DATE
Next, back everything up. You knew that.
Then, here's your process to set up your two tables.
CREATE TABLE post_archive AS TABLE post; to populate your archive table.
Do these two steps to repopulate your post table with the most recent thirty days of data. It would take forever to DELETE all those rows, so we'll truncate the table and repopulate it instead. That's also good because it's like starting from scratch with a much smaller table, which is what you want. This takes a modest amount of downtime.
TRUNCATE TABLE post;
INSERT INTO post SELECT * FROM post_archive
WHERE created_at > (CURRENT_DATE - INTERVAL '30 DAY')::DATE;
DELETE FROM post_archive WHERE created_at > (CURRENT_DATE - INTERVAL '30 DAY')::DATE; to remove the most recent thirty days from your archive table.
Now, you have the two tables.
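At this point a quick sanity check is cheap; the two counts should add up to the original table's row count (a sketch):
SELECT
(SELECT count(*) FROM post) AS recent_rows,          -- the last thirty days
(SELECT count(*) FROM post_archive) AS archived_rows; -- everything older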
Your next step is the daily row-migration job. PostgreSQL lacks a built-in job scheduler like SQL Server's Agent jobs or MySQL's EVENT, so your best bet is a cronjob.
It's probably wise to do the migration daily if that fits with your business rules. Why? Many-row DELETEs and INSERTs cause big transactions, and that can make your RDBMS server thrash. Smaller numbers of rows are better.
The SQL you need is something like this:
INSERT INTO post_archive SELECT * FROM post
WHERE created_at <= (CURRENT_DATE - INTERVAL '30 DAY')::DATE;
DELETE FROM post
WHERE created_at <= (CURRENT_DATE - INTERVAL '30 DAY')::DATE;
You can package this up as a shell script. On UNIX-derived systems like Linux and FreeBSD, the shell script might look like this.
#!/bin/sh
psql postgres://username:password@hostname:5432/database << SQLSTATEMENTS
INSERT INTO post_archive SELECT * FROM post
WHERE created_at <= (CURRENT_DATE - INTERVAL '30 DAY')::DATE;
DELETE FROM post
WHERE created_at <= (CURRENT_DATE - INTERVAL '30 DAY')::DATE;
SQLSTATEMENTS
Then run the shell script from cron a few minutes after 3am each day.
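The crontab entry for that schedule might look like this (the script path and log location are assumptions):
# run the daily row migration at 03:22
22 3 * * * /usr/local/bin/migrate_posts.sh >> /var/log/migrate_posts.log 2>&1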
Some notes:
3am? Why? In many places the daylight-saving-time switchover messes with the hour between 02:00 and 03:00 twice a year. Choosing a time like 03:22 to run the daily migration keeps you well away from that problem.
CURRENT_DATE gets you midnight of today. So, if you run the script more than once in any calendar day, no harm is done.
If you miss a day, the next day's migration will catch up.
You could package up the SQL as a stored procedure and put it into your RDBMS, then invoke it from your shell script. But then your migration procedure lives in two different places. You need the cronjob and shell script in any case in PostgreSQL.
Will your application go off the rails if it sees identical rows in both post and post_archive while the migration is in progress? If so, you'll need to wrap your SQL statements in a transaction. That way other users of the database won't see the duplicate rows. Do this.
#!/bin/sh
psql postgres://username:password@hostname:5432/database << SQLSTATEMENTS
START TRANSACTION;
INSERT INTO post_archive SELECT * FROM post
WHERE created_at <= (CURRENT_DATE - INTERVAL '30 DAY')::DATE;
DELETE FROM post
WHERE created_at <= (CURRENT_DATE - INTERVAL '30 DAY')::DATE;
COMMIT;
SQLSTATEMENTS
Cronjobs are quite reliable on Linux and FreeBSD.

Grafana PostgreSQL distinct on() with time series

I'm quite new to Grafana and Postgres and could use some help with this. I have a dataset in PostgreSQL with temperature forecasts. Multiple forecasts are published at various points throughout the day (indicated by dump_date) for the same reference date. Say: at 06:00 today and at 12:00 today a forecast is published for tomorrow (where the time is indicated by start_time). Now I want to visualize the temperature forecast as a time series using Grafana. However, I only want to visualize the latest published forecast (12:00) and not both forecasts. I thought I would use DISTINCT ON() to select only the latest published forecast from this dataset, but somehow this is not working in Grafana. My code in Grafana is as follows:
SELECT
$__time(distinct on(t_ID.start_time)),
concat('Forecast')::text as metric,
t_ID.value
FROM
forecast_table t_ID
WHERE
$__timeFilter(t_ID.start_time)
and t_ID.start_time >= (current_timestamp - interval '30 minute')
and t_ID.dump_date >= (current_timestamp - interval '30 minute')
ORDER BY
t_ID.start_time asc,
t_ID.dump_date desc
This is not working, however: I get the message 'syntax error at or near AS'. What should I do?
You are using the Grafana macro $__time, so your query in the editor:
SELECT
$__time(distinct on(t_ID.start_time)),
generates SQL:
SELECT
distinct on(t_ID.start_time AS "time"),
which is incorrect SQL syntax.
I wouldn't use the macro. I would write correct SQL directly, e.g.
SELECT
DISTINCT ON (t_ID.start_time) t_ID.start_time AS "time",
Also use Grafana's Generated SQL and Query Inspector features for debugging and query development. Make sure that Grafana generates correct SQL for Postgres.
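Putting it together, a full panel query without the macro in the SELECT list might look like this (a sketch based on the question's table and columns; the $__timeFilter macro in the WHERE clause is unaffected):
SELECT DISTINCT ON (t_ID.start_time)
t_ID.start_time AS "time",
'Forecast'::text AS metric,
t_ID.value
FROM forecast_table t_ID
WHERE $__timeFilter(t_ID.start_time)
ORDER BY t_ID.start_time ASC, t_ID.dump_date DESC;
DISTINCT ON requires the ORDER BY to start with the same expression, and sorting dump_date DESC within each start_time makes the surviving row the latest published forecast.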

Ecto.Adapters.SQL.query! gives a different result

So this is apparently one of these weird days... And I know this makes 0 sense.
I'm executing a query in DataGrip (a tool for executing raw queries) against the exact same database as my Phoenix application, and they are returning different results.
The query is quite complicated, but it's the only query that shows different results. So I cannot simplify it. I've tried other queries to be sure that I'm having the same database, restarted the server etc.
Here is the exact same query executed from my console. As you can see it is not the same result. A few rows are missing.
I have also checked whether this is a timing issue by executing select now() => same result (more or less, obviously). If I execute only the generate_series part, it returns the same result. So it could have something to do with the join.
I also checked the last few entries in the ttnmessages table just to be sure there is no general caching issue. The queries do also give the same result there.
So my question is: Is there anything that Ecto does differently upon executing a query? How can I figure this out? I'm grateful for any hint.
EDIT: The query is in both cases:
SELECT g.series AS time, MAX((t.payload ->'pulse')::text::numeric) as pulse
FROM generate_series(date_trunc('hour', now())- INTERVAL '12 hours', date_trunc('hour', now()), INTERVAL '60 min') AS g(series)
LEFT JOIN ttnmessages t
ON t.inserted_at < g.series + INTERVAL '60 min'
AND t.inserted_at > g.series
WHERE t.hardware_serial LIKE '093B55DF0C2C525A'
GROUP BY g.series
ORDER BY g.series;
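One detail worth noting about the join the asker suspects: the WHERE filter on t.hardware_serial discards the NULL rows the LEFT JOIN produces for empty hour buckets, silently turning it into an inner join. Moving that condition into the ON clause keeps every bucket from generate_series; this is a sketch of the same query, not a confirmed explanation for the discrepancy:
SELECT g.series AS time, MAX((t.payload ->'pulse')::text::numeric) AS pulse
FROM generate_series(date_trunc('hour', now()) - INTERVAL '12 hours', date_trunc('hour', now()), INTERVAL '60 min') AS g(series)
LEFT JOIN ttnmessages t
ON t.inserted_at < g.series + INTERVAL '60 min'
AND t.inserted_at > g.series
AND t.hardware_serial LIKE '093B55DF0C2C525A'  -- moved from WHERE into ON
GROUP BY g.series
ORDER BY g.series;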
While I did not find out the cause, I changed the query to the following:
SELECT MAX(t.inserted_at) as time, (t.payload ->'pulse')::text::numeric as pulse
FROM ttnmessages t
WHERE t.inserted_at > now() - INTERVAL '12 hours'
AND t.payload ->'pulse' IS NOT NULL
AND t.hardware_serial LIKE '093B55DF0C2C525A'
GROUP BY (t.payload ->'pulse')
ORDER BY time;
Runtime is < 50ms, so I'm happy with the result.
And I'll ignore the different results from the question. The query here returns the same result just like it's supposed to.

Extract timestamp from epoch column in a PostgreSQL DB table

I am trying to execute the PostgreSQL query mentioned below, in which checkin_ts is in epoch format, and I want to write the query in a humanly understandable way (putting the timestamp in a human-readable format).
select * from users where to_timestamp(checkin_ts) >= '2017-11-11 00:00:00'
LIMIT 100;
when I tried to execute the above query then I get the following error
ERROR: execute cannot be used while an asynchronous query is underway
We need the description of your table to know the format of checkin_ts.
But to_timestamp can also take text parameters, like this:
SELECT *
FROM users
WHERE to_timestamp(checkin_ts::text, 'YYYY-MM-DD HH24:MI:SS') >= '2017-11-11 00:00:00' LIMIT 100;
And what is your environment for executing this query?
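For what it's worth, if checkin_ts really is a numeric epoch value (seconds since 1970-01-01), then to_timestamp(double precision) converts it directly, so the question's original form is fine and the readable value can also go in the select list (a sketch):
SELECT *, to_timestamp(checkin_ts) AS checkin_time  -- epoch seconds -> timestamptz
FROM users
WHERE to_timestamp(checkin_ts) >= timestamp '2017-11-11 00:00:00'
LIMIT 100;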
The error seems to be that you are trying to execute two queries on the same connection using two different cursors.

How to query just the last record of every day

I have a table power with a timestamptz field called sample_time and a column called amp_hours.
The amp_hours field gains a record about every two minutes and is reset every night at midnight.
I would like to see sample_time and amp_hours for the last record of every day.
I'm very new to SQL so I may be overlooking an obvious answer.
I saw a post on how to select the last record of a group, but I'm not familiar enough with SQL to get it to work for datetimes.
I thought to use lead() or lag() to compare the date of a record with the next record, but I'm using PostgreSQL 8.3 and I think windowing was introduced in 8.4.
Try this:
SELECT DISTINCT ON (sample_time::date) sample_time, amp_hours
FROM power
ORDER BY sample_time::date DESC, sample_time DESC;
DISTINCT ON keeps the first row for each distinct sample_time::date in ORDER BY order, and sorting by sample_time DESC within each date makes that first row the day's last record. It also works on 8.3, since it needs no window functions.