Is it possible to update a view at a specific interval (for example, every hour)?
How can I do this?
Postgres does not have a built-in scheduler, so there is no direct way to do what you want. There are alternatives, however. On Linux there are cron, pgAgent, and the Postgres extension pg_cron. On Windows there is Task Scheduler. There are also many commercial applications.
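As a rough sketch of the cron route, assuming the thing to update is a materialized view: a small Python script runs the refresh, and cron calls the script every hour. The view name `hourly_stats` and the `PG_DSN` environment variable are made up for illustration, and psycopg2 is assumed to be installed.

```python
import os
import re

# Hypothetical view name; replace with your own.
VIEW_NAME = "hourly_stats"


def build_refresh_sql(view_name: str) -> str:
    """Return a REFRESH statement, rejecting names that are not plain identifiers."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", view_name):
        raise ValueError(f"unsafe view name: {view_name!r}")
    return f"REFRESH MATERIALIZED VIEW {view_name}"


def refresh(conn, view_name: str) -> None:
    """Run the refresh on any DB-API connection, then commit."""
    cur = conn.cursor()
    try:
        cur.execute(build_refresh_sql(view_name))
    finally:
        cur.close()
    conn.commit()


def main() -> None:
    import psycopg2  # imported here so the helpers above load without it

    # PG_DSN is an assumed environment variable, e.g. "dbname=mydb user=me".
    conn = psycopg2.connect(os.environ["PG_DSN"])
    try:
        refresh(conn, VIEW_NAME)
    finally:
        conn.close()


if __name__ == "__main__":
    if "PG_DSN" in os.environ:
        main()
    else:
        print("PG_DSN is not set; nothing to do")
```

A crontab entry along the lines of `0 * * * * python3 /path/to/refresh_view.py` (path hypothetical, with `PG_DSN` set in the crontab) would then run it at the top of every hour.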
Related
We're a team of 4 data scientists who use Amazon RDS PostgreSQL for analysis purposes. We're looking for a way to start and stop the instance automatically, based on usage rather than time.
For example, there are clearly solutions for starting and stopping automatically during regular business hours (Stopping an Amazon RDS DB Instance Temporarily).
However, this doesn't quite work for us because we all have different schedules and don't necessarily adhere to a standard one. I would like a script that checks whether the DB has been used in the past, say, 30 minutes and, if not, turns off the instance. Then, if someone tries to connect to the DB while it's turned off, it should be turned on automatically. My intuition tells me that the latter is harder than the former, but I'm not sure. Is this possible?
To do this you would need a CloudWatch alarm based on metrics that are available to CloudWatch, such as the number of connections or CPU utilization.
This alarm could trigger a Lambda function that stops your RDS instance. Be aware that a stopped RDS instance is started again automatically once it has been off for 7 days.
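A minimal sketch of such a Lambda function, assuming the alarm (for example on `DatabaseConnections` staying at zero for 30 minutes) is already wired to invoke it; that wiring is not shown. The instance identifier `analytics-db` and the `DB_INSTANCE_ID` environment variable are hypothetical. It only covers stopping; starting the instance on a connection attempt is not addressed here.

```python
import os

# Hypothetical identifier, normally supplied through the Lambda's environment.
DB_INSTANCE_ID = os.environ.get("DB_INSTANCE_ID", "analytics-db")


def stop_if_running(rds, instance_id: str) -> bool:
    """Request a stop if the instance is 'available'; return True if requested."""
    resp = rds.describe_db_instances(DBInstanceIdentifier=instance_id)
    status = resp["DBInstances"][0]["DBInstanceStatus"]
    if status != "available":
        # Already stopped, stopping, or otherwise busy: leave it alone.
        return False
    rds.stop_db_instance(DBInstanceIdentifier=instance_id)
    return True


def lambda_handler(event, context):
    import boto3  # provided by the AWS Lambda Python runtime

    rds = boto3.client("rds")
    stopped = stop_if_running(rds, DB_INSTANCE_ID)
    return {"instance": DB_INSTANCE_ID, "stop_requested": stopped}
```

Passing the client into `stop_if_running` keeps the decision logic testable without AWS credentials.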
Alternatively, if you're able to use it, you could look into Aurora Serverless with PostgreSQL compatibility. This option automatically handles the stop/start functionality when no one is using it.
What I want is to update a table every night and cache it so it doesn't have to run each time we run a query based on it. So I figure I need a materialised view (not a view).
The top answer to the question below is exactly what I need.
How can I ensure that a materialized view is always up to date?
So, I searched around about materialised views for Postgresql and it seems perfect. All I need is a scheduler.
pg_cron looks to be popular, but from what I understand it is not compatible with Amazon Redshift (see https://github.com/citusdata/pg_cron/)?
Is there some other scheduling tool that is usable, or some workaround for the problem?
Many thanks!
Hannes
Redshift has no built-in support for materialised views yet. You will need to have an external service do it for you. We are using Airflow, where we have written a template DAG that fills the materialised views.
Redshift now supports Materialized Views
https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-overview.html
There is, however, currently no way to schedule the refresh from within Redshift, so you will have to invoke the REFRESH command from some external timer. https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-refresh-sql-command.html
Since November 2020 it is also possible to let Redshift control the refreshing of the materialized view with the AUTO REFRESH YES option. https://docs.amazonaws.cn/en_us/redshift/latest/dg/materialized-view-create-sql-command.html
If you want to keep it serverless, you can use the Redshift Data API and invoke the materialized view REFRESH from Lambda.
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/redshift-data.html
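A minimal sketch of that Lambda, assuming it is triggered on a timer (for example by a scheduled rule, not shown). The environment variable names and the view name `nightly_report` are hypothetical; `execute_statement` is the boto3 `redshift-data` call that submits the SQL.

```python
import os


def refresh_materialized_view(client, cluster_id, database, db_user, view_name) -> str:
    """Submit REFRESH MATERIALIZED VIEW through the Data API; return the statement id."""
    resp = client.execute_statement(
        ClusterIdentifier=cluster_id,
        Database=database,
        DbUser=db_user,
        Sql=f"REFRESH MATERIALIZED VIEW {view_name};",
    )
    return resp["Id"]


def lambda_handler(event, context):
    import boto3  # provided by the AWS Lambda Python runtime

    client = boto3.client("redshift-data")
    # All four environment variables below are assumed names.
    statement_id = refresh_materialized_view(
        client,
        cluster_id=os.environ["REDSHIFT_CLUSTER_ID"],
        database=os.environ["REDSHIFT_DATABASE"],
        db_user=os.environ["REDSHIFT_DB_USER"],
        view_name=os.environ.get("MV_NAME", "nightly_report"),
    )
    return {"statement_id": statement_id}
```

The Data API call is asynchronous: it returns once the statement is submitted, so the Lambda does not wait for the refresh to finish. The returned id can be passed to `describe_statement` to check the outcome later.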
I have a few functions that need to run every day at a certain time. Is there any way to schedule them from Postgres, or do I have to use an external tool?
One option you have is to make use of pgAgent. http://www.pgadmin.org/docs/1.8/pgagent.html
I want to be able to set up one or more jobs/triggers when my app runs. The list of jobs and triggers will come from a DB table. I DO NOT care about persisting the jobs to the DB for restarting or tracking purposes. Basically, I just want to use the DB table as an init device. Obviously I can do this by writing the code myself, but I am wondering if there is some way to use the SQLJobStore to get this functionality without the overhead of keeping the DB updated throughout the life of the app using the scheduler.
Thanks for your help!
Eric
The job store's main purpose is to store the scheduler's state, so there is no built-in way to do what you want. You can always write directly to the tables, and this will give you the results you want, but it isn't really the best way to do it.
The recommended way to do this would be to write some code that reads the data from your table and then connects to the scheduler using remoting to schedule the jobs.
I just stumbled upon pgpool-II in my search for clustering my Postgres DB (just getting ready to deploy a web app in a couple months). I still have the shakes from excitement, but I'm nervous, as each time I find something this excellent I am soon let down. Have you any experience with pgpool-II, and will it help me run my database in multiple VMs, and later in multiple physical servers altogether? Is it all I need for backing up, load balancing, and providing a higher availability for my DB server!?
Also, is it easy to use the parallel query function (for instance, in Django or through Python's psycopg2)? This would be most excellent for providing reporting and aggregation!
One last thing: It seems to work between Postgres and psycopg2. Is this a correct understanding of it, so I can use psycopg2 the same as normal, without regard for pgpool-II?
pgpool-II works fine for what it claims to do. And it fits between your application and the database the way you expect it to; just point psycopg2 toward it instead of directly at the database and off you go.
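To make "just point psycopg2 toward it" concrete, here is a small sketch in which the only change from a direct connection is the host and port. The host name `pgpool.internal`, database `appdb`, and user `app` are made up; 9999 is the port pgpool-II listens on by default, but check the `port` setting in your own pgpool.conf.

```python
def pgpool_dsn(host: str, dbname: str, user: str, port: int = 9999) -> str:
    """Build a libpq-style DSN aimed at pgpool-II instead of the database itself."""
    return f"host={host} port={port} dbname={dbname} user={user}"


def connect_via_pgpool(host: str, dbname: str, user: str, port: int = 9999):
    """Open an ordinary psycopg2 connection; pgpool-II sits in between."""
    import psycopg2  # imported here so pgpool_dsn works without it installed

    return psycopg2.connect(pgpool_dsn(host, dbname, user, port))


# Example with hypothetical values:
# conn = connect_via_pgpool("pgpool.internal", "appdb", "app")
```

Everything after the connection is opened (cursors, queries, commits) is used the same way as with a direct connection.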
The main thing you have to note is that while it supports many different types of features--replication, load balancing, parallel query--you can't use them all at once. It sounds like you may be under the impression you can do that, and it doesn't work that way. The documentation is not all that clear on this subject (the English version at least, I can't speak to the original Japanese one).
For example, if you run pgpool-II in its "Master/Slave" mode, so that it supports load balancing for scaling reads, you have to use another program to actually do the replication between those nodes. Slony was the supported replication solution to put underneath it in earlier PostgreSQL versions; as of pgpool-II 3.0 and PostgreSQL 9.0 you can also use the soon-to-be-released Streaming Replication/Hot Standby features of that new version.
pgpool-II is a useful component and you can use it in a lot of interesting ways, but I doubt it will be "all you need" for every requirement you hope to achieve with it.