Scheduling a function in Google Cloud SQL - PostgreSQL DB - scheduler

I'm trying to schedule a function to periodically run and delete records from my Google Cloud SQL (PostgreSQL) database. I want this to run a couple of times a day, and each run will take under 10 minutes. What options do I have to schedule this function?
Thanks
Ravi

Your best option will be to use Cloud Scheduler to schedule a job that publishes to a Pub/Sub topic. Then, have a Cloud Function subscribed to this topic so it gets triggered by each message.
You can configure the Cloud Scheduler job with a cron expression to run as many times a day as you need.
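As a rough sketch of the subscribed function (assuming a Python runtime, the psycopg2 driver, and a hypothetical `events` table with a `created_at` column — adjust all of these for your schema):

```python
import os

# Hypothetical table and retention window -- adjust for your schema.
DELETE_SQL = "DELETE FROM events WHERE created_at < now() - interval '30 days'"

def cleanup_handler(event, context):
    """Entry point: runs once per message Cloud Scheduler publishes to the topic."""
    import psycopg2  # listed in the function's requirements.txt
    conn = psycopg2.connect(
        host=os.environ["DB_HOST"],   # e.g. the Cloud SQL connection socket
        dbname=os.environ["DB_NAME"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASS"],
    )
    try:
        # The connection context manager commits the transaction on success
        with conn, conn.cursor() as cur:
            cur.execute(DELETE_SQL)
            print(f"deleted {cur.rowcount} rows")
    finally:
        conn.close()
```

Since the function ignores the message body, the Cloud Scheduler job can publish any placeholder payload; the message is only the trigger.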

Try pgAgent
pgAgent is a job scheduling agent for Postgres databases, capable of running multi-step batch or shell scripts and SQL tasks on complex schedules.
pgAgent is distributed independently of pgAdmin. You can download pgAgent from the download area of the pgAdmin website.

Related

How to implement Postgres backups on Kubernetes?

What's the best approach for backing up a Postgres database on Kubernetes?
My first guess would have been to create a master-slave architecture with replication enabled: do an initial pg_basebackup and then fetch the WAL logs. Once a month I'd schedule another pg_basebackup with cron; however, containerized environments don't play well with cron daemons (no systemd available). How do you schedule base backups?
The best approach is to use the Kubernetes Cronjob resource:
You can use a CronJob to run Jobs on a time-based schedule. These automated jobs run like Cron tasks on a Linux or UNIX system. Cron jobs are useful for creating periodic and recurring tasks, like running backups or sending emails.
You basically need to create a custom Linux image to run in your container jobs. This image needs a Postgres client (so you can connect to your database with psql, pg_dump, or pg_basebackup) and the credentials, which can be configured as a Secret.
You may want to upload the backup to external storage, so you can install and use awscli for AWS S3, gsutil for Google Cloud Storage, etc...
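As an illustrative sketch, the script the CronJob's container runs could look like this in Python (assuming pg_dump and gsutil are installed in the image, the `PG*` variables are injected from the Secret, and the bucket name is a placeholder):

```python
import datetime
import subprocess

def backup_filename(db_name: str, stamp: str) -> str:
    """Local path of the dump file for a given timestamp."""
    return f"/tmp/{db_name}-{stamp}.dump"

def make_backup(db_name: str, bucket: str) -> str:
    """Dump the database with pg_dump, then copy the file to a GCS bucket."""
    stamp = datetime.datetime.utcnow().strftime("%Y%m%d-%H%M%S")
    dump_file = backup_filename(db_name, stamp)
    # PGHOST / PGUSER / PGPASSWORD come from the Kubernetes Secret
    subprocess.run(
        ["pg_dump", "--format=custom", "--file", dump_file, db_name],
        check=True,  # fail the Job (and alert) if the dump fails
    )
    subprocess.run(["gsutil", "cp", dump_file, f"gs://{bucket}/"], check=True)
    return dump_file
```

The CronJob's container command would then just invoke this script, and the CronJob's `schedule` field replaces the cron daemon you can't run in the container.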
Here are some references:
Creating a Kubernetes Cron Job to backup Postgres DB
Simple backup of postgres database in kubernetes
Back up databases using Kubernetes CronJobs

What are the best tools to schedule Snowflake tasks or Python scripts in EC2 to load data into Snowflake?

Please share your experiences with orchestrating jobs, run through various tools and programmatic interfaces, that load data into Snowflake:
Python scripts on EC2 instances (currently scheduled using crontab)
tasks in Snowflake
Alteryx workflows
Are there any tools with sophisticated UI to create job workflows with dependencies?
The workflow can have -
python script followed by a task
Alteryx workflow followed by a python script and then a task
If any job fails then it should send emails to the team.
Thanks
We have used both CONTROL-M and Apache Airflow to schedule and orchestrate data loads into Snowflake.
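For a feel of the Airflow approach, a workflow like the one described above (a Python script followed by a Snowflake task, with emails on failure) can be sketched as a DAG. The operator imports, connection id, schedule, SQL, and email address below are all illustrative assumptions, not a definitive setup:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
# Requires the apache-airflow-providers-snowflake package
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

def extract():
    """Placeholder for the Python step (e.g. the script now run by crontab)."""
    ...

with DAG(
    dag_id="load_to_snowflake",
    start_date=datetime(2024, 1, 1),
    schedule="0 */6 * * *",  # every six hours; adjust as needed
    default_args={
        "email": ["team@example.com"],     # hypothetical address
        "email_on_failure": True,          # mail the team if any task fails
    },
    catchup=False,
) as dag:
    extract_step = PythonOperator(task_id="extract", python_callable=extract)
    load_step = SnowflakeOperator(
        task_id="load",
        snowflake_conn_id="snowflake_default",  # assumed connection id
        sql="CALL load_proc()",                 # hypothetical procedure
    )
    extract_step >> load_step  # dependency: extract must succeed first
```

Airflow's web UI then shows the dependency graph and per-task logs, which covers the "sophisticated UI with dependencies" requirement.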

How can I schedule Postgres queries to run on Amazon RDS?

I tried to install pgAgent, but since it is not supported on Amazon RDS I don't know how to schedule Postgres jobs without going with cron jobs and psql directly. Here is what I got on Amazon RDS:
The following command gave the same result:
CREATE EXTENSION pg_cron;
I have a total of three options off the top of my head for this:
1. AWS Lambda
2. AWS Glue
3. Any small EC2 instance (Linux/Windows)
1. AWS Lambda:
You can use a Postgres connectivity Python module like pg8000 or psycopg2 to connect and open a cursor to your target RDS instance.
You can pass your SQL job code / SQL statements as input to the Lambda. If they are very few, you can code the whole job in the Lambda itself; if not, you can pass them to the Lambda as input using DynamoDB.
You can set up a cron schedule using a CloudWatch Events rule, so that it triggers the Lambda whenever you need.
Required tools: DynamoDB, AWS Lambda, Python, a Postgres Python connectivity module.
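A minimal sketch of such a handler, assuming the pure-Python pg8000 driver (easy to bundle in the deployment zip) and hypothetical environment-variable and table names:

```python
import os

# Hypothetical maintenance statement; could instead be fetched from DynamoDB
DEFAULT_SQL = "DELETE FROM sessions WHERE expires_at < now()"

def lambda_handler(event, context):
    import pg8000.native  # bundled with the deployment package
    conn = pg8000.native.Connection(
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASS"],
        host=os.environ["DB_HOST"],
        database=os.environ["DB_NAME"],
    )
    try:
        # The CloudWatch Events rule's input payload may override the SQL
        sql = event.get("sql", DEFAULT_SQL)
        conn.run(sql)
        return {"status": "ok", "sql": sql}
    finally:
        conn.close()
```

The CloudWatch Events rule's schedule expression (e.g. `cron(0 */6 * * ? *)`) then plays the role that crontab would on a server.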
2. AWS Glue:
AWS Glue works in much the same way. It gives you the option to connect to your RDS database directly, and you can schedule your jobs there.
3. EC2 instance:
Create any small EC2 instance, either Windows or Linux, and set up your cron/bat jobs on it.
On October 10th, 2018, AWS Lambda launched support for long running functions. Customers can now configure their AWS Lambda functions to run up to 15 minutes per execution. Previously, the maximum execution time (timeout) for a Lambda function was 5 minutes. Using longer running functions, a highly requested feature, customers can perform big data analysis, bulk data transformation, batch event processing, and statistical computations more efficiently.
You could use Amazon CloudWatch Events to trigger a Lambda function on a schedule, but it can only run for a maximum of 15 minutes (https://aws.amazon.com/about-aws/whats-new/2018/10/aws-lambda-supports-functions-that-can-run-up-to-15-minutes/?nc1=h_ls).
You could also run a t2.nano Amazon EC2 instance (about $50/year On-Demand, or $34/year as a Reserved Instance) to run regular cron jobs.

PgAgent jobs not executing on remote server

I don't understand why this isn't working. I set up a pgAgent job to send a NOTIFY from the database every hour.
(Screenshots of the job steps and the schedule omitted.)
Turns out the problem was that Heroku doesn't support pgAgent, and the database was running on Heroku. I ended up making a workaround, scheduling the tasks with Windows Task Scheduler. It's not the best solution, but it does the job I needed it to do...

Quartz Scheduler using database

I am using Quartz to schedule cron jobs in my web application. I am using an Oracle database to store jobs and related info. When I add the jobs to the database, I need to restart the server/application (Tomcat) for these new jobs to get scheduled. How can I add jobs to the database and make them work without restarting the server?
I assume you mean you are using JDBCJobStore? In that case it is not ideal to make direct changes to the database tables storing the job data. However, you could set up a separate job that runs every X minutes/hours, checks whether there are new jobs in the database that need to be scheduled, and schedules them as usual.
Add jobs via the Scheduler API.
http://www.quartz-scheduler.org/docs/best_practices.html