Huge PostgreSQL table - Select, update very slow - postgresql

I am using PostgreSQL 9.5. I have a table which is almost 20 GB. It has a primary key on the ID column, which is auto-incrementing, but I run my queries on another column, which is a timestamp. Selects, updates and deletes based on that timestamp column are very slow. For example, a select on this table with `where timestamp_column::date > (current_date - INTERVAL '10 DAY')::date` is taking more than 15 minutes.
Can you please advise what kind of index I should add to this table (if needed) to make it perform faster?
Thanks

You can create an index with your clause expression:
CREATE INDEX ns_event_last_updated_idx ON ns_event (CAST(last_updated AT TIME ZONE 'UTC' AS DATE));
But keep in mind that you're using timestamp with time zone; casting this type to date can produce undesirable side effects.
Also, remove all casting in your SQL:
select * from ns_event where Last_Updated < (current_date - INTERVAL '25 DAY');
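For illustration, assuming the `ns_event` table and `last_updated` column from the answer above, a plain index on the timestamp column is enough once the predicate no longer wraps the column in a cast or function (the index name here is made up):

```sql
-- Hypothetical plain index on the bare timestamp column.
CREATE INDEX ns_event_last_updated_plain_idx ON ns_event (last_updated);

-- The column appears unwrapped on the left-hand side, so the planner
-- can use the index for a range scan instead of scanning ~20 GB.
SELECT *
FROM ns_event
WHERE last_updated < current_date - INTERVAL '25 DAY';
```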

Related

postgresql: Copy a column into another column when the number of rows are very large

I have a table with millions of rows.
I have a column called time_from_device (which is of type timestamp with time zone).
id | name | t1 | t2 | t3 | time_from_device |
---------------------------------------------
Now I want to add a column called created_at whose value will be now().
But before I set the default value of created_at to now(), I want to fill the existing rows' created_at with time_from_device - INTERVAL '1 hour'.
So I am doing the following:
ALTER TABLE devicedata ADD COLUMN "created_at" timestamp with time zone;
This creates a new column created_at with NULL values
Now I want to fill the column with time values from time_from_device - INTERVAL '1 hour':
UPDATE devicedata SET created_at = time_from_device - INTERVAL '1 hour';
Since there are millions of rows, this command just hangs.
How can I know whether it's working or not?
The way you are doing this is correct. Just be patient.
One potential problem could be row locks by concurrent long running transactions. Make sure that there are no such transactions.
You can examine the wait_event in the pg_stat_activity row of the corresponding session: if that is NULL, your query is happily working.
To speed up the operation, you could drop all indexes and constraints before updating the table and then create them again – your operation will probably require down time anyway.
The size of a transaction is no performance problem in PostgreSQL.
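To check on the running UPDATE as described above, you can query pg_stat_activity from a second session (the LIKE filter on the query text is just one illustrative way to find the right row; note that wait_event was added in PostgreSQL 9.6):

```sql
-- Run from another session. wait_event_type/wait_event are NULL while
-- the UPDATE is actively working, and non-NULL while it waits, e.g. on
-- a row lock held by a concurrent transaction.
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE query LIKE 'UPDATE devicedata%';
```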

How to Truncate a postgreSQL table with conditions

I'm trying to truncate a PostgreSQL table with some conditions:
truncate all the data in the table but keep only the data of the last 6 months.
For that I have written this query:
select distinct datecalcul
from Table
where datecalcul > now() - INTERVAL '6 months'
order by datecalcul asc
How could I add the truncate clause?
TRUNCATE does not support a WHERE condition. You will have to use a DELETE statement.
delete from the_table
where ...
If you want to get rid of old ("expired") rows efficiently based on a timestamp, you can think about partitioning. Then you can just drop the old partitions.
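A minimal sketch of the DELETE, reusing the datecalcul condition from the question (table and column names are the asker's; the condition is inverted so that only rows older than 6 months are removed):

```sql
-- Remove everything older than six months; rows newer than that survive.
DELETE FROM the_table
WHERE datecalcul <= now() - INTERVAL '6 months';
```

After a large DELETE, a VACUUM (or autovacuum) is needed before the table's disk space can be reused, which is one reason dropping partitions is cheaper for recurring purges.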

Postgres combining date and time fields, is this efficient

I am selecting rows based on a date range which is held in a string, using the SQL below. It works, but is this an efficient way of doing it? As you can see, the date and time are held in different fields. From my memory of doing Oracle work, as soon as you put a function around an attribute, it can't use indexes.
select *
from events
where venue_id = '2'
and EXTRACT(EPOCH FROM (start_date + start_time))
between EXTRACT(EPOCH FROM ('2017-09-01 00:00')::timestamp)
and EXTRACT(EPOCH FROM ('2017-09-30 00:00')::timestamp)
So is there a way of doing this that can use indexes?
Preface: Since your query is limited to a single venue_id, both examples below create a compound index with venue_id first.
If you want an index for improving that query, you can create an expression index:
CREATE INDEX events_start_idx
ON events (venue_id, (EXTRACT(EPOCH FROM (start_date + start_time))));
If you don't want a dedicated expression index, you can create a normal index on the start_date column and add extra logic so the query can use it. The index then limits the scan to the date range, and fringe records with the wrong time of day on the first and last dates are filtered out afterwards.
In the following, I'm also eliminating the unnecessary extraction of the epoch.
CREATE INDEX events_venue_start
ON events (venue_id, start_date);
SELECT *
FROM events
WHERE venue_id = '2'
AND start_date BETWEEN '2017-09-01'::date AND '2017-09-30'::date
AND start_date + start_time BETWEEN '2017-09-01 00:00'::timestamp
AND '2017-09-30 00:00'::timestamp
The first two parts of the WHERE clause can use the index to full benefit; the last part then filters the records found by the index scan.
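To confirm the planner actually uses the index, you could inspect the plan (the exact output depends on your data and statistics):

```sql
-- Assumes the events_venue_start index from above exists. Look for an
-- Index Scan or Bitmap Index Scan on it rather than a Seq Scan.
EXPLAIN ANALYZE
SELECT *
FROM events
WHERE venue_id = '2'
  AND start_date BETWEEN DATE '2017-09-01' AND DATE '2017-09-30';
```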

query to fetch records between two date and time

I have been using PostgreSQL. My table has 3 columns: date, time and userId. I have to find records between a given date and time frame. Since date and time are separate columns, a plain BETWEEN clause does not give valid results.
Combine the two columns into a single timestamp by adding the time to the date:
select *
from some_table
where date_column + time_column
between timestamp '2017-06-14 17:30:00' and timestamp '2017-06-19 08:26:00';
Note that this will not use an index on date_column or time_column. You would need to create an index on that expression. Or better: use a single column defined as timestamp instead.
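A sketch of that expression index, assuming the table and column names from the example above (the index name is made up):

```sql
-- Index on the combined timestamp expression. The query's WHERE clause
-- must use the exact same expression (date_column + time_column) for
-- the planner to match it against this index.
CREATE INDEX some_table_ts_idx
    ON some_table ((date_column + time_column));
```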

date_trunc on timestamp column returns nothing

I have a strange problem when retrieving records from the db after comparing a truncated field with date_trunc().
This query doesn't return any data:
select id from my_db_log
where date_trunc('day',creation_date) >= to_date('2014-03-05'::text,'yyyy-mm-dd');
But if I add the column creation_date alongside id, then it returns data (i.e. select id, creation_date ...).
I have another column, last_update_date, of the same type, and when I use that one it behaves the same way:
select id from my_db_log
where date_trunc('day',last_update_date) >= to_date('2014-03-05'::text,'yyyy-mm-dd');
Similar to the previous one: it also returns records if I do select id, last_update_date.
Now, to dig further, I added both creation_date and last_update_date to my WHERE clause, and this time it demands both of them in my SELECT clause to return records (i.e. select id, creation_date, last_update_date).
Has anyone ever encountered this problem? The same thing works fine with my other tables that have columns of this type!
If it helps, here is my table schema:
id serial NOT NULL,
creation_date timestamp without time zone NOT NULL DEFAULT now(),
last_update_date timestamp without time zone NOT NULL DEFAULT now(),
CONSTRAINT db_log_pkey PRIMARY KEY (id),
I have asked a different question earlier that didn't get any answer. This problem may be related to that one. If you are interested on that one, here is the link.
EDIT: EXPLAIN (FORMAT XML) with select * returns:
<explain xmlns="http://www.postgresql.org/2009/explain">
<Query>
<Plan>
<Node-Type>Result</Node-Type>
<Startup-Cost>0.00</Startup-Cost>
<Total-Cost>0.00</Total-Cost>
<Plan-Rows>1000</Plan-Rows>
<Plan-Width>658</Plan-Width>
<Plans>
<Plan>
<Node-Type>Result</Node-Type>
<Parent-Relationship>Outer</Parent-Relationship>
<Alias>my_db_log</Alias>
<Startup-Cost>0.00</Startup-Cost>
<Total-Cost>0.00</Total-Cost>
<Plan-Rows>1000</Plan-Rows>
<Plan-Width>658</Plan-Width>
<Node/s>datanode1</Node/s>
<Coordinator-quals>(date_trunc('day'::text, creation_date) >= to_date('2014-03-05'::text, 'yyyy-mm-dd'::text))</Coordinator-quals>
</Plan>
</Plans>
</Plan>
</Query>
</explain>
"Impossible" phenomenon
The number of rows returned is completely independent of the items in the SELECT clause. (But see @Craig's comment about set-returning functions.) Something must be broken in your db.
Maybe a broken covering index? When you throw in the additional column, you force Postgres to visit the table itself. Try to re-index:
REINDEX TABLE my_db_log;
The manual on REINDEX. Or:
VACUUM FULL ANALYZE my_db_log;
Better query
Either way, use instead:
select id from my_db_log
where creation_date >= '2014-03-05'::date
Or:
select id from my_db_log
where creation_date >= '2014-03-05 00:00'::timestamp
'2014-03-05' is in ISO 8601 format. You can just cast this string literal to date. No need for to_date(), works with any locale. The date is coerced to timestamp [without time zone] automatically when compared to creation_date (being timestamp [without time zone]). More details about timestamps in Postgres here:
Ignoring timezones altogether in Rails and PostgreSQL
Also, you gain nothing by throwing in date_trunc() here. On the contrary, your query will be slower, and any plain index on the column cannot be used (potentially making this much slower).
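A sketch of the index-friendly pattern, assuming a plain index on creation_date (my_db_log and creation_date come from the question; the index name is made up). Comparing the bare column keeps the predicate sargable:

```sql
-- Hypothetical plain index on the raw timestamp column.
CREATE INDEX my_db_log_creation_date_idx ON my_db_log (creation_date);

-- Sargable: no date_trunc() around the column, so the index is usable.
SELECT id
FROM my_db_log
WHERE creation_date >= DATE '2014-03-05';
```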