In PostgreSQL I want to create a script that deletes data older than 1 month from table A (which contains many rows) and inserts this data into a new archive table B. And I want to execute this script every month.
For that I have created this script:
insert into B select * from A where date < (now() - '30 days'::interval);
delete from A where date < (now() - '30 days'::interval);
But some months have 30 days and some have 31, so how can I set this up in crontab to delete exactly the right data and move it into the archive table?
While Laurenz's answer makes much sense and seems to be a good guess at what the OP really wants (a monthly cron probably means they want to flush not data "older than one month" but the previous month's data, so date_trunc to month fits), here is the other reading (as I understand the original post):
begin;
insert into B select * from A where date < (now() - '1 month'::interval);
delete from A where date < (now() - '1 month'::interval);
end;
This will delete not all of the previous month's data, but everything older than the same timestamp one month ago, e.g.:
t=# select now()-'1 month'::interval;
?column?
-------------------------------
2017-05-12 07:31:31.785402+00
(1 row)
And with this logic you might want to schedule this data purge daily, not monthly, to keep the last month of data in the active table rather than "up to two months" until cron fires...
Run it on the first of every month and write it like this:
... WHERE date >= date_trunc('month', current_timestamp) - INTERVAL '1 month'
AND date < date_trunc('month', current_timestamp)
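Put together as the monthly move-and-delete job the OP asked about, a minimal sketch (reusing the A and B table names from the question) could be:
begin;
-- move exactly the previous calendar month from A to B
insert into B
select * from A
where date >= date_trunc('month', current_timestamp) - interval '1 month'
  and date <  date_trunc('month', current_timestamp);
delete from A
where date >= date_trunc('month', current_timestamp) - interval '1 month'
  and date <  date_trunc('month', current_timestamp);
commit;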
If your table contains a lot of data, you might want to look into partitioning.
I would like to speed up the queries on my big table that contains lots of old data.
I have a table named post that has a date column created_at. The table has ~31 million rows, and ~30 million of them are older than 30 days.
Actually, I want this:
move data older than 30 days into the post_archive table or create a partition table.
When the value in the created_at column becomes older than 30 days, that row should be moved to the post_archive table or a partition table.
Any detailed and concrete solution for PostgreSQL 11.15?
My ideas:
Solution 1. create a cron script in whatever language (e.g. JavaScript) and run it every day to copy data from the post table into post_archive and then delete data from the post table
Solution 2. create a Postgres function that should copy the data from the post table into the partition table, and create a cron job that will call the function every day
Thanks
This is to split your data into a post and post_archive table. It's a common approach, and I've done it (with SQL Server).
Before you do anything else, make sure you have an index on your created_at column on your post table. Important.
Next, you need to use a common expression to mean "thirty days ago". This is it.
(CURRENT_DATE - INTERVAL '30 DAY')::DATE
Next, back everything up. You knew that.
Then, here's your process to set up your two tables.
CREATE TABLE post_archive AS TABLE post; to populate your archive table.
Do these two steps to repopulate your post table with the most recent thirty days. It will take forever to DELETE all those rows, so we'll truncate the table and repopulate it. That's also good because it's like starting from scratch with a much smaller table, which is what you want. This takes a modest amount of downtime.
TRUNCATE TABLE post;
INSERT INTO post SELECT * FROM post_archive
WHERE created_at > (CURRENT_DATE - INTERVAL '30 DAY')::DATE;
DELETE FROM post_archive WHERE created_at > (CURRENT_DATE - INTERVAL '30 DAY')::DATE; to remove the most recent thirty days from your archive table.
Now, you have the two tables.
Your next step is the daily row-migration job. PostgreSQL lacks a built-in job scheduler like SQL Server's Agent jobs or MySQL's EVENT, so your best bet is a cron job.
It's probably wise to do the migration daily if that fits with your business rules. Why? Many-row DELETEs and INSERTs cause big transactions, and that can make your RDBMS server thrash. Smaller numbers of rows are better.
The SQL you need is something like this:
INSERT INTO post_archive SELECT * FROM post
WHERE created_at <= (CURRENT_DATE - INTERVAL '30 DAY')::DATE;
DELETE FROM post
WHERE created_at <= (CURRENT_DATE - INTERVAL '30 DAY')::DATE;
You can package this up as a shell script. On UNIX-derived systems like Linux and FreeBSD the shell script file might look like this.
#!/bin/sh
psql postgres://username:password@hostname:5432/database << SQLSTATEMENTS
INSERT INTO post_archive SELECT * FROM post
WHERE created_at <= (CURRENT_DATE - INTERVAL '30 DAY')::DATE;
DELETE FROM post
WHERE created_at <= (CURRENT_DATE - INTERVAL '30 DAY')::DATE;
SQLSTATEMENTS
Then run the shell script from cron a few minutes after 3am each day.
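For example, the crontab entry could look like this (the script path and log file here are illustrative, not prescribed by the answer):
# min hour dom mon dow  command
22 3 * * * /usr/local/bin/archive_posts.sh >> /var/log/archive_posts.log 2>&1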
Some notes:
3am? Why? In many places the daylight-saving switchover messes with the clock between 02:00 and 03:00 twice a year. Choosing, say, 03:22 as the time to run the daily migration keeps you well away from that problem.
CURRENT_DATE gets you midnight of today. So, if you run the script more than once in any calendar day, no harm is done.
If you miss a day, the next day's migration will catch up.
You could package up the SQL as a stored procedure and put it into your RDBMS, then invoke it from your shell script. But then your migration procedure lives in two different places. You need the cronjob and shell script in any case in PostgreSQL.
Will your application go off the rails if it sees identical rows in both post and post_archive while the migration is in progress? If so, you'll need to wrap your SQL statements in a transaction. That way other users of the database won't see the duplicate rows. Do this.
#!/bin/sh
psql postgres://username:password@hostname:5432/database << SQLSTATEMENTS
START TRANSACTION;
INSERT INTO post_archive SELECT * FROM post
WHERE created_at <= (CURRENT_DATE - INTERVAL '30 DAY')::DATE;
DELETE FROM post
WHERE created_at <= (CURRENT_DATE - INTERVAL '30 DAY')::DATE;
COMMIT;
SQLSTATEMENTS
Cronjobs are quite reliable on Linux and FreeBSD.
I need a setup where rows older than 60 days get removed from the table in PostgreSQL.
I have created a function and a trigger:
CREATE FUNCTION delete_old_rows() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- function and table names here are placeholders
    DELETE FROM my_table
    WHERE updateDate < NOW() - INTERVAL '60 days';
    RETURN NULL;
END;
$$;
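For reference, the trigger that fires it could be wired up like this (a sketch; the trigger and table names are placeholders, and EXECUTE FUNCTION needs PostgreSQL 11+, older versions use EXECUTE PROCEDURE):
CREATE TRIGGER purge_old_rows
    AFTER INSERT ON my_table
    FOR EACH STATEMENT
    EXECUTE FUNCTION delete_old_rows();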
But I believe if the insert frequency is high, this will have to scan the entire table quite often, which will cause high DB load.
I could run this function from a cron job or a Lambda function every hour or every day instead. I need to know the number of inserts per hour on that table to make that decision.
Is there a query or job that I can set up to collect those details?
Just to count the number of records per hour, you could run this query:
SELECT CAST(updateDate AS date) AS day
, EXTRACT(HOUR FROM updateDate) AS hour
, COUNT(*)
FROM _your_table
WHERE updateDate BETWEEN ? AND ?
GROUP BY
1,2
ORDER BY
1,2;
We do about 40 million INSERTs a day on a single table that is partitioned by month. After 3 months we just drop the partition. That is way faster than a DELETE.
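A minimal sketch of that pattern with declarative partitioning (PostgreSQL 10+; the table and partition names are illustrative):
CREATE TABLE events (
    id         bigint      NOT NULL,
    updateDate timestamptz NOT NULL
) PARTITION BY RANGE (updateDate);

-- one partition per month
CREATE TABLE events_2019_01 PARTITION OF events
    FOR VALUES FROM ('2019-01-01') TO ('2019-02-01');

-- retiring a month is a cheap metadata operation, not a row-by-row DELETE
DROP TABLE events_2019_01;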
I need to find a date that is 11 business days after a date.
I did not have a date table. I requested one; long lead time for one.
I used a CTE to produce results that have a date key, 1 if a weekday, and 1 if a holiday, else 0. I put those results into a table variable, so now Business_Day is (weekday - holiday). Much Googling has already happened.
select dt.Datekey,
       (dt.Weekdaycount - dt.HolidayCount) as Business_day
from #DateTable dt
UPDATE: I've figured it out in Excel: a running count of business days, a column of business day count + 11, then a VLOOKUP finding the +11 date. Now how do I do that in SQL?
Results like this:
Datekey      Business_day
2019-01-01   0
2019-01-02   1
I will assume you want to set your weekend days, and that you can enter the holidays in a table variable, so you can do the below.
First, set the weekend day names:
Declare @WeekDayName1 varchar(50) = 'Saturday'
Declare @WeekDayName2 varchar(50) = 'Sunday'
Next, set up the holiday table variable; you may have it as a real table in your database:
Declare @Holidays table (
    [Date] date,
    HolidayName varchar(250)
)
Let's insert a day or two to test it:
insert into @Holidays values (cast('2019-01-01' as date), 'New Year')
insert into @Holidays values (cast('2019-01-08' as date), 'some other holiday in your country')
Let's say the date you want to start from (the action date) is 2018-12-28 and you need 11 business days after it:
Declare @ActionDate date = '2018-12-28'
Declare @BusinessDays int = 11
A recursive CTE counts the days until you reach the correct one:
;with cte([date], BusinessDay) as (
    select @ActionDate [date], cast(0 as int) BusinessDay
    union all
    select dateadd(day, 1, cte.[date]),
        case
            when DATENAME(WEEKDAY, dateadd(day, 1, cte.[date])) = @WeekDayName1
              or DATENAME(WEEKDAY, dateadd(day, 1, cte.[date])) = @WeekDayName2
              or (select 1 from @Holidays h where h.Date = dateadd(day, 1, cte.[date])) is not null
            then cte.BusinessDay
            else cte.BusinessDay + 1
        end BusinessDay
    from cte where BusinessDay < @BusinessDays
)
--to see all the dates till business day + 11
--select * from cte option (maxrecursion 0)
--to get the required date
select MAX([date]) from cte option (maxrecursion 0)
In my example the dates are as below:
ActionDate = 2018-12-28
After 11 business days: 2019-01-16
Hope this helps
The 1st step was to create a date table. Figuring out weekdays versus weekends is easy: weekdays are 1, weekends are 0. Borrowed someone else's holiday calendar: if holiday 1, else 0. Then Weekday - Holiday = Business Day. Next was to create a running total of business days, as sketched below. That allows you to move from whatever running-total day you're currently on to where you want to be in the future, say plus 10 business days. Hard-coded key milestones in the date table for 2 and 10 business days.
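A sketch of the running-total step, building on the #DateTable columns from the question (the windowed SUM needs SQL Server 2012 or later):
select dt.Datekey,
       (dt.Weekdaycount - dt.HolidayCount) as Business_day,
       sum(dt.Weekdaycount - dt.HolidayCount)
           over (order by dt.Datekey) as Running_Bus_Days
from #DateTable dt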
Then JOIN your date table to your transaction table on your zero day and date key.
Finally, this allows you to make solid calculations of business days, for example:
WHERE CONVERT(date, D.DTRESOLVED) <= CONVERT(date, [10th_Bus_Day])
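And the VLOOKUP-style lookup of a future business day can then be a self-join on the running total; a sketch, assuming the Running_Bus_Days total has been materialized into #DateTable, with +10 days and an arbitrary anchor date as examples:
select top (1) d2.Datekey as [10th_Bus_Day]
from #DateTable d1
join #DateTable d2
    on d2.Running_Bus_Days = d1.Running_Bus_Days + 10
   and (d2.Weekdaycount - d2.HolidayCount) = 1  -- land on a business day
where d1.Datekey = '2019-01-02'
order by d2.Datekey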
Can anyone suggest the easiest way to find the summation of a time field in PostgreSQL? I found a solution for MySQL, but I need the PostgreSQL version.
MYSQL: https://stackoverflow.com/questions/3054943/calculate-sum-time-with-mysql
SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(timespent))) FROM myTable;
Demo Data
id time
1 1:23:23
2 4:00:23
3 9:23:23
Desired Output
14:47:09
What you want is not possible. But you probably misunderstood the time type: it represents a precise point in time within a day. It doesn't make much sense to add two (or more) times, e.g. '14:00' + '14:00' = '28:00' (but there is no 28th hour in a day).
What you probably want is interval (which represents time intervals: hours, minutes, or even years). sum() supports interval arguments.
If you use intervals, it's just that simple:
SELECT sum(interval_col) FROM my_table;
However, if you stick to the time type (though you have no reason to do that), you can cast it to interval to calculate with it:
SELECT sum(time_col::interval) FROM my_table;
But again, the result will be interval, because time values cannot exceed the 24th hour in a day.
Note: PostgreSQL will even do the cast for you, so sum(time_col) should work too, but the result is interval in this case too.
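A quick demonstration of the cast and of a result past the 24th hour, using inline values (no table needed):
SELECT sum(t::interval)
FROM (VALUES (time '14:00'), (time '14:00')) AS v(t);
-- returns 28:00:00, an interval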
I tried this solution on SQL Fiddle:
link
Table creation:
CREATE TABLE time_table (
id integer, time time
);
Insert data:
INSERT INTO time_table (id,time) VALUES
(1,'1:23:23'),
(2,'4:00:23'),
(3,'9:23:23');
Query the data:
SELECT
sum(s.time)
FROM
time_table s;
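With the three demo rows above, this returns the desired sum (as an interval):
   sum
----------
 14:47:09
(1 row)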
If you need to calculate the sum of one field grouped by another field, you can do this:
select
keyfield,
sum(time_col::interval) totaltime
FROM myTable
GROUP by keyfield
Output example:
keyfield; totaltime
"Gabriel"; "10:00:00"
"John"; "36:00:00"
"Joseph"; "180:00:00"
Data type of totaltime is interval.
I have a table for Inventory Dollars by Vendor by Month. I want to be able to update the dollar amounts for the current month on a daily basis, but I don't want to lose the previous month's data. Here is the basic query I have:
DELETE FROM Inventory_Summary
WHERE MonthNum = '4'
SELECT
SUM(Cost*OnHand) AS Inventory_Dollars
FROM Inventory
The Inventory table will always hold the current data. How can I just Insert Into Inventory_Summary the data from the Select statement?
Just preface your query with an INSERT:
INSERT INTO Inventory_Summary
(Inventory_Dollars)
SELECT SUM(Cost * OnHand) AS Inventory_Dollars
FROM Inventory
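Since Inventory_Summary evidently also has MonthNum and Year columns (see the UPDATE below), you would presumably want to stamp those at insert time too; a sketch, assuming those column names:
INSERT INTO Inventory_Summary (MonthNum, [Year], Inventory_Dollars)
SELECT DATEPART(month, GETDATE()),
       DATEPART(year, GETDATE()),
       SUM(Cost * OnHand)
FROM Inventory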
If you've already inserted the inventory_dollars amount for the current month, you can then update the value every day with something like this:
UPDATE Inventory_Summary
SET Inventory_Dollars = (
SELECT SUM(Cost * OnHand)
FROM Inventory
)
WHERE MonthNum = DATEPART(m, GETDATE()) AND Year = DATEPART(year, GETDATE())
The DATEPART can be used to fill in the number of the month for the current date, GETDATE(). Then you won't be updating the inventory_dollars values for past months.
Edit: Also added a year to the where clause, so you don't update months from past years.
Edit 2: If you use a subquery in the SET, make sure only one result can come back.