I need to execute a query on a Teradata database on a daily basis (select + insert).
Can this be done within the (Teradata) database, or should I consider external means (e.g. a cron job)?
Teradata doesn't have a built-in scheduler to run jobs. You will need to leverage something like cron or Tivoli Workload Scheduler to manage your job schedule(s).
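For instance, a minimal cron-driven sketch, assuming the teradatasql Python driver; the host, credentials, and SQL below are all placeholders:

# daily_job.py -- invoked by cron, e.g.: 0 6 * * * /usr/bin/python3 /opt/jobs/daily_job.py
# Assumes the teradatasql driver (pip install teradatasql); host, user, password, and the SQL are placeholders.
import teradatasql

with teradatasql.connect(host="tdhost.example.com", user="batch_user", password="***") as con:
    with con.cursor() as cur:
        # The daily insert+select; replace with your actual statement.
        cur.execute("insert into target_table select * from staging_table where load_date = current_date")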
Is there a way to kill Snowflake queries using the Spark connector? Alternatively, is there a way to grab the last query id or session id in Spark, to kill the query outside of Spark?
The use case is user-controlled long-running Spark jobs with long-running Snowflake queries. When a user kills the Spark job, the current Snowflake query keeps on running (for many hours).
Thank you
Log into the Snowflake UI (or use SnowSQL) with the same user you use for Spark, and run the following:
use database <your_db>;
use warehouse <your_wh>;
select
    query_id, query_text, execution_status, error_message, start_time, end_time
from
    table(information_schema.query_history(result_limit => 10));
This should show you your recent queries. Find the one that is in RUNNING state, copy its QUERY_ID, and use it to run this:
select system$cancel_query('<your query id here>');
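If you would rather cancel it from code than from the UI, here is a minimal sketch using the Snowflake Python connector; the account, user, and password are placeholders:

# Assumes snowflake-connector-python; account/user/password are placeholders.
import snowflake.connector

con = snowflake.connector.connect(account="my_account", user="my_user", password="***")
cur = con.cursor()
# The same cancellation call as above, issued programmatically.
cur.execute("select system$cancel_query('<your query id here>')")
print(cur.fetchone())
con.close()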
I am mostly a Java programmer, and in Java we can easily run different methods or functions multithreaded or in parallel (simultaneously) by creating new threads.
I recently was writing many functions and procedures for my Postgres database and utilizing the pg_cron extension, which lets you schedule "jobs" (basically PL/pgSQL functions or procedures you write) to run based on a cron expression.
With these jobs, as I understand it, you can have the scripts run essentially in parallel/concurrently.
Now, I am curious: without using pg_cron to run db maintenance tasks, is there any way at all in Postgres to write "concurrent" logic or scripts that run in parallel, without using 3rd-party extensions/libraries?
Yes, that is trivial: just open several database connections and run statements in each of them concurrently.
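A minimal Python sketch of that idea, assuming psycopg2 and a placeholder DSN; each thread gets its own connection, and each connection gets its own server backend process, so the statements genuinely overlap on the server:

import threading
import psycopg2

def run(sql):
    con = psycopg2.connect("dbname=mydb user=me")   # one connection per thread
    con.autocommit = True
    with con.cursor() as cur:
        cur.execute(sql)                            # the two statements run at the same time
    con.close()

threads = [threading.Thread(target=run, args=(s,))
           for s in ("select pg_sleep(5)", "select pg_sleep(5)")]
for t in threads:
    t.start()
for t in threads:
    t.join()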
I want to get the list of jobs which are running on my server's database, displaying the name, job timings, etc., using a query. Is this possible in PostgreSQL / pgAdmin?
I'm trying to schedule a function to periodically run and delete records from my Google Cloud SQL (PostgreSQL) database. I want this to run a couple of times a day, and each run will take under 10 minutes. What options do I have to schedule this function?
Thanks
Ravi
Your best option will be to use Cloud Scheduler to schedule a job that publishes to a Pub/Sub topic. Then, have a Cloud Function subscribed to this topic so it gets triggered by the message sent.
You can configure this job as a daily routine that runs x times a day.
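A minimal sketch of the Cloud Function side (1st-gen Pub/Sub background style), assuming the pg8000 driver and a Cloud SQL unix-socket path; every name below is a placeholder:

import pg8000

def cleanup(event, context):
    # Triggered by the Pub/Sub message that Cloud Scheduler publishes.
    con = pg8000.connect(
        user="cleanup_user", password="***", database="mydb",
        unix_sock="/cloudsql/my-project:us-central1:my-instance/.s.PGSQL.5432")
    cur = con.cursor()
    # The periodic delete; replace with your actual retention rule.
    cur.execute("delete from events where created_at < now() - interval '30 days'")
    con.commit()
    con.close()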
Try pgAgent
pgAgent is a job scheduling agent for Postgres databases, capable of running multi-step batch or shell scripts and SQL tasks on complex schedules.
pgAgent is distributed independently of pgAdmin. You can download pgAgent from the download area of the pgAdmin website.
I tried to install pgAgent, but since it is not supported on Amazon RDS I don't know how to schedule Postgres jobs without going with cron jobs and psql directly. Trying to enable pg_cron gave the same "not supported" result on Amazon RDS:
CREATE EXTENSION pg_cron;
I have a total of three options off the top of my head for this:
1.) AWS Lambda
2.) AWS Glue
3.) Any small EC2 instance (Linux/Windows)
1.) AWS Lambda:
You can use a Postgres connectivity Python module like pg8000 or psycopg2 to connect and create a cursor to your target RDS.
You can pass your SQL job code / your SQL statements as input to the Lambda. If they are very few, you can just code the whole job in your Lambda; if not, you can pass them to the Lambda as input using DynamoDB.
You can have a cron schedule using a CloudWatch Events rule, so that it triggers the Lambda whenever you need (see the sketch below).
Required tools: DynamoDB, AWS Lambda, Python, a Postgres Python connectivity module.
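A minimal sketch of such a Lambda handler, assuming psycopg2 is packaged with the function; the host, credentials, and SQL are placeholders, and the CloudWatch Events rule would carry a cron expression such as cron(0 6 * * ? *):

import psycopg2

def handler(event, context):
    # Connect to the target RDS instance; all connection details are placeholders.
    con = psycopg2.connect(host="mydb.xxxx.us-east-1.rds.amazonaws.com",
                           dbname="mydb", user="batch_user", password="***")
    con.autocommit = True
    with con.cursor() as cur:
        # The scheduled SQL job; could also be fetched from DynamoDB as described above.
        cur.execute("insert into daily_summary select * from staging where run_date = current_date")
    con.close()
    return "ok"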
2.) AWS Glue:
AWS Glue works almost the same way. You have an option to connect to your RDS DB directly there, and you can schedule your jobs there.
3.) EC2 instance:
Create any small-size EC2 instance, either Windows or Linux, and set up your cron/bat jobs there.
On October 10th, 2018, AWS Lambda launched support for long running functions. Customers can now configure their AWS Lambda functions to run up to 15 minutes per execution. Previously, the maximum execution time (timeout) for a Lambda function was 5 minutes. Using longer running functions, a highly requested feature, customers can perform big data analysis, bulk data transformation, batch event processing, and statistical computations more efficiently.
You could use Amazon CloudWatch Events to trigger a Lambda function on a schedule, but it can only run for a maximum of 15 minutes (https://aws.amazon.com/about-aws/whats-new/2018/10/aws-lambda-supports-functions-that-can-run-up-to-15-minutes/?nc1=h_ls).
You could also run a t2.nano Amazon EC2 instance (about $50/year On-Demand, or $34/year as a Reserved Instance) to run regular cron jobs.