I have a Postgres question:
I have a database with a table that contains weekly data (3 years of history). An application displays the data. I have already changed the default Postgres configuration to improve the data import.
Every day the table is refreshed only for the current week: the script deletes the current week and re-imports it into the table.
This takes 40 minutes, but I think it could be improved.
If I truncate the entire table and import all the data, it takes 3 hours (7 GB).
Is there a better way than the delete/insert?
I could create another table with only the current week's data and use a UNION in the application:
select * from tb_data union all select * from tb_data_week
I think this would be faster, because a truncate/insert on the weekly table should be quicker than the delete/insert on the big table.
But maybe the UNION ALL will make the application slower.
Thanks a lot
You could partition the table by week.
To import the weekly data, insert into a new empty table. Once the import is finished, drop the old week partition and attach the new partition using alter table base_table attach partition ...
The manual has an example for this process: Partition Maintenance
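A rough sketch of that weekly swap, with illustrative table names and date bounds (tb_data stands in for the partitioned base table, tb_data_current_week for the stale week partition, and the new week's data would be loaded into tb_data_new before the swap):
-- create an empty table with the same structure as the partitioned base table
CREATE TABLE tb_data_new (LIKE tb_data INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
-- ... import the current week's data into tb_data_new here ...

-- swap it in: drop the stale week partition and attach the freshly loaded table
BEGIN;
DROP TABLE tb_data_current_week;
ALTER TABLE tb_data ATTACH PARTITION tb_data_new
    FOR VALUES FROM ('2024-03-04') TO ('2024-03-11');
COMMIT;
The application keeps querying tb_data as before; only the 40-minute delete/insert on the big table is replaced by a quick attach.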
I have a function in PostgreSQL, and right now the interval is hard-coded in my code.
Here is one piece of my code:
"TimeStamp_data" > now() - interval '100 hours');
How can I read the 100 hours from another table instead?
Example:
Table: Times
Column: cg_time.
I would like to read the interval from the cg_time column, so that changing the value in the table is enough and no code change is needed.
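One possible way this could look, assuming Times holds a single row and cg_time is of type interval (the table name my_data is only a placeholder for whatever the function actually queries):
-- hypothetical: read the interval from the Times table instead of hard-coding '100 hours'
SELECT *
FROM my_data
WHERE "TimeStamp_data" > now() - (SELECT cg_time FROM Times LIMIT 1);
If cg_time stores the number of hours as a plain integer instead, the subselect could return make_interval(hours => cg_time).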
I am planning to use Citus to store system logs for up to n days, after which they should be deleted. The Citus columnar store looked like the perfect database for this until I read
this, where it is mentioned that no deletes can be performed on columnar tables.
So my question is: is there an alternative way of achieving deletes in the columnar store?
You can temporarily switch the table's access method to row storage to run deletes or updates, then switch back to the columnar access method afterwards. Example usage is shown below:
-- create the table
CREATE TABLE logs (
id int not null,
log_date timestamp
);
-- set access method columnar
SELECT alter_table_set_access_method('logs', 'columnar');
-- fill the table with generated data covering the past 20 days
INSERT INTO logs select i, now() - interval '1 hour' * i from generate_series(1,480) i;
-- to drop the oldest 10 days of data, temporarily switch to the row (heap) access method so deletes or updates can run
SELECT alter_table_set_access_method('logs', 'heap');
DELETE FROM logs WHERE log_date < (now() - interval '10 days');
-- switch back to columnar access method
SELECT alter_table_set_access_method('logs', 'columnar');
A better alternative for log archiving: switching the access method creates a whole copy of the source table, so the bigger the table, the more resources it consumes. A better option is to divide your log table into partitions by day or month; then you only need to change the access method of a single partition. Note that you must set the access method for each partition separately, because columnar currently does not support setting the access method of a partitioned table directly.
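A rough sketch of that partition-per-period layout, with illustrative names and monthly ranges (still using the alter_table_set_access_method function shown above):
-- parent table partitioned by month; the parent itself stays a plain partitioned table
CREATE TABLE logs_part (
    id int not null,
    log_date timestamp
) PARTITION BY RANGE (log_date);

CREATE TABLE logs_part_2021_01 PARTITION OF logs_part
    FOR VALUES FROM ('2021-01-01') TO ('2021-02-01');

-- set the access method for each partition separately
SELECT alter_table_set_access_method('logs_part_2021_01', 'columnar');

-- retention becomes cheap: removing a whole month is just dropping its partition
DROP TABLE logs_part_2021_01;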
Learn more:
Citus docs
Columnar demo
Archiving logs with columnar
I am new to PostgreSQL. I am working on a project where I have been asked to move all partitions older than 6 months to a legacy table so that queries on the main table will be faster. I have a partitioned table with 10 years of data.
Let's assume myTable is the table with the current 6 months of data and myTable_legacy will hold all data older than 6 months, going back up to 10 years. The table is partitioned by monthly range.
My questions, which I researched online but was unable to resolve, are below.
I am currently testing before finalizing the steps; I used the link below as a reference for my lab testing before performing the actual migration:
How to migrate an existing Postgres Table to partitioned table as transparently as possible?
create table myTable(
forDate date not null,
key2 int not null,
value int not null
) partition by range (forDate);
create table myTable_legacy(
forDate date not null,
key2 int not null,
value int not null
) partition by range (forDate);
1) Daily application queries will only touch the current 6 months of data. Is it necessary to move data older than 6 months to a new table to get better query response times? I researched online but wasn't able to find any solid evidence either way.
2) If performance will be better, how do I move older partitions from myTable to myTable_legacy? Based on my research, PostgreSQL does not have an exchange partition option.
Any help or guidance would help me proceed with this requirement.
When I try to attach the partition to mytable_legacy, I get an error:
alter table mytable detach partition mytable_200003;
alter table mytable_legacy attach partition mytable_200003
for values from ('2003-03-01') to ('2003-03-30');
results in:
ERROR: partition constraint is violated by some row
SQL state: 23514
The contents of the partition:
select * from mytable_200003;
"2000-03-02" 1 19
"2000-03-30" 15 8
It's always better to keep the production table light. One of the practices I follow is to use a timestamp column and a trigger function that inserts a row into the other table when its timestamp is older than 6 months.
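A very rough sketch of that trigger idea, using the myTable / myTable_legacy names from the question (this assumes PostgreSQL 13+ if myTable is partitioned, since BEFORE row triggers on partitioned tables are only allowed from that version, and it assumes myTable_legacy has a partition covering the redirected dates):
-- sketch: redirect rows older than 6 months into the legacy table at insert time
CREATE OR REPLACE FUNCTION route_old_rows() RETURNS trigger AS $$
BEGIN
    IF NEW.forDate < now() - interval '6 months' THEN
        INSERT INTO myTable_legacy VALUES (NEW.*);
        RETURN NULL;  -- suppress the insert into the main table
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_route_old_rows
    BEFORE INSERT ON myTable
    FOR EACH ROW EXECUTE FUNCTION route_old_rows();
Note that this only handles rows as they are inserted; rows that age past 6 months while already sitting in myTable still need a separate move.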
Quote from the manual
When creating a range partition, the lower bound specified with FROM is an inclusive bound, whereas the upper bound specified with TO is an exclusive bound
(emphasis mine)
So the expression to ('2003-03-30') does not allow March 30th to be inserted into the partition.
Additionally, your data in mytable_200003 is for the year 2000, not for the year 2003 (which you used in your partition definition). To cover the whole of March, simply use April 1st as the upper bound.
So you need to change the partition definition to cover March 2000, not March 2003:
alter table mytable_legacy
attach partition mytable_200003
for values from ('2000-03-01') to ('2000-04-01');
^ here ^ here
Online example
I have 3 rather simple tables in Postgres that record which IDs were valid for each business date going back several years. The three tables represent 3 sources that record activity from these IDs. I can't give you the entire table, but imagine:
Date ID
2000-01-02 1
2000-01-02 2
2000-01-02 3
2000-01-02 4
. . .
2018-01-02 49997
2018-01-02 49998
2018-01-02 49999
2018-01-02 50000
So each table has daily data with potentially tens of thousands of IDs. Not all IDs show up on all days in all tables, so all I want is a view that gives me the master list of any ID that shows up on any of the tables on any of the days. Simple:
create view all_ids as
select distinct * from table1 union
select distinct * from table2 union
select distinct * from table3;
The view is created without any problem but it proves impossible to query. If I want to see what days a single id shows up on, I would write:
select * from all_ids where id=37;
The problem is that when Postgres runs this query, it first attempts to create a huge temporary table that is the union of the 3 tables. This, unfortunately, exceeds the temp_file_limit (5767168kB), and as I am not an admin, I cannot change the temp_file_limit. Regardless, this seems to contradict my understanding of how views even work. Please note: I can query an id or list of ids from any of the individual tables just fine.
I can write this as a function to which I pass specific IDs, but again, I believe the view itself is supposed to handle this by returning just what I ask for, rather than building the entire universe of data first and then selecting from it.
Other relevant information is that we're using an old version of Postgres, 9.2.23. I am thinking there is something wrong with how it handles views. The answer may be to bug our admin to upgrade.
Any ideas?
What you are looking for is a materialized view. I will just quote from the docs:
CREATE VIEW defines a view of a query. The view is not physically materialized. Instead, the query is run every time the view is referenced in a query.
The view you created, all_ids, is therefore re-run every time it is referenced.
Edit: materialized views are available in Postgres 9.3+.
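A minimal sketch of that approach once the server is on 9.3 or later (the name all_ids_mat is used here to avoid clashing with the existing view, and the index is an addition to make single-ID lookups cheap):
CREATE MATERIALIZED VIEW all_ids_mat AS
select distinct * from table1 union
select distinct * from table2 union
select distinct * from table3;

-- an index makes lookups such as "where id = 37" fast
CREATE INDEX all_ids_mat_id_idx ON all_ids_mat (id);

-- the result is stored, so it must be refreshed when the base tables change
REFRESH MATERIALIZED VIEW all_ids_mat;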
I want to copy records from one database to another using Pentaho, but I ran into a problem. Let's say there is a transaction_timestamp column in Table1 with data type timestamp with time zone. But once I select records from the source DB and insert them into the other database, the values in that column are offset by an hour or so. The weirdest thing is that this doesn't even affect all records. I also tried something like this:
select
    transaction_timestamp::timestamp without time zone as transaction_timestamp,
    t1.*
from table1 t1
And it didn't work. Could the problem be that when I copy the records to the 2nd DB, all values are converted to the local time zone? But then why doesn't the SELECT statement I mentioned work? And why is only part of the records affected?
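For reference, a timestamptz value itself stores no zone; it is rendered in the client session's TimeZone setting, so the same stored instant can display differently in two sessions. A small illustration (the displayed values are only examples):
-- the same stored instant rendered under two different session time zones
SET TIME ZONE 'UTC';
SELECT now();            -- e.g. 2021-05-01 10:00:00+00
SET TIME ZONE 'Europe/Riga';
SELECT now();            -- same instant, e.g. 2021-05-01 13:00:00+03

-- to export an unambiguous wall-clock value, convert explicitly:
SELECT transaction_timestamp AT TIME ZONE 'UTC' AS transaction_timestamp_utc
FROM table1;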