I am mostly a Java programmer, and in Java we can easily run different methods or functions in parallel (simultaneously) by creating new Threads.
I recently was writing many Functions and Procedures for my Postgres database and utilizing the Pg_Cron extension, which lets you schedule "Jobs" (basically plpgsql Functions or Procedures you write) to run based on a Cron expression.
With these Jobs, as I understand it, the scripts can essentially run in parallel/concurrently.
Now, I am curious: without using pg_cron to run db maintenance tasks, is there any way at all in Postgres to write "concurrent" logic or scripts that run in parallel, without using 3rd party extensions/libraries?
Yes, that is trivial: just open several database connections and run statements in each of them concurrently.
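For example, the pattern looks like this in Python (a sketch: each worker would open its own connection, e.g. `psycopg2.connect(...)` — connections must not be shared across threads. The statement itself is simulated with a sleep here so the snippet is self-contained):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def run_statement(label, seconds):
    # Stand-in for: conn = psycopg2.connect(...); conn.cursor().execute(...)
    # Each thread gets its own "connection" and runs its statement independently.
    time.sleep(seconds)
    return f"{label} done"

start = time.monotonic()
with ThreadPoolExecutor(max_workers=3) as pool:
    # Three "statements", each taking 0.2s, submitted at the same time.
    futures = [pool.submit(run_statement, f"job{i}", 0.2) for i in range(3)]
    results = [f.result() for f in futures]
elapsed = time.monotonic() - start

print(results)
print(elapsed)  # well under 0.6s (the serial total), since the jobs overlap
```

The database server itself does the real parallel work; the client just needs one session per concurrent statement.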
Related
I am seeing slow performance on a couple of my queries that run against my Db2 on Cloud instance. When I had a local Db2, I would try these tools to see if I could improve performance. Now, with Db2 on Cloud, I believe I can run them using admin_cmd; however, if they are already being run automatically on my db objects, there is no point, but I am not sure how to tell.
Yes, Db2 on Cloud does run reorgs and runstats automatically. We do recommend running them manually if you are running a lot of data loads, to get better performance.
As you stated, Db2 on Cloud is a managed (as a Service) database offering. But that covers the generic parts, not application-specific tasks. Backup/restore can be done without any application insight, but creating indexes, running runstats or performing reorgs is application-specific.
Runstats can be invoked using admin_cmd. The same is true for running reorg on tables and indexes.
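If it helps, the calls can be generated along these lines (a sketch in Python; the schema/table names are made up, and the RUNSTATS/REORG option text shown is just one common variant of the command syntax accepted by SYSPROC.ADMIN_CMD):

```python
# Hypothetical helpers that build the ADMIN_CMD call text for a given table.
# You would send the resulting string over your Db2 connection.
def runstats_call(schema, table):
    return (f"CALL SYSPROC.ADMIN_CMD("
            f"'RUNSTATS ON TABLE {schema}.{table} "
            f"WITH DISTRIBUTION AND INDEXES ALL')")

def reorg_call(schema, table):
    return f"CALL SYSPROC.ADMIN_CMD('REORG TABLE {schema}.{table}')"

print(runstats_call("MYSCHEMA", "ORDERS"))
print(reorg_call("MYSCHEMA", "ORDERS"))
```

Check the ADMIN_CMD documentation for the exact RUNSTATS and REORG options appropriate to your tables.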
I want to consolidate a couple of historically grown scripts (Python, Bash and PowerShell) whose purpose is to sync data between a lot of different database backends (mostly Postgres, but also Oracle and SQL Server) and across different sites. There isn't really a master; it's more like a loose group of partner companies working on the same domain-specific use cases, each with its own data silo, and it's my job to hold all this together as well as I can.
Currently those scripts are cron-scheduled and need to run on the origin server where a dataset is initially written, to sync it to every partner overnight.
I am also familiar with and use Apache Airflow in another project. So my idea was to use a workflow management tool like Airflow to streamline the sync process and make it more centralized. But with Airflow, too, there is only a time-interval scheduler available to trigger a DAG.
As most writes come in through Postgres databases, I'd like to make use of the NOTIFY/LISTEN feature, and I already have a Python daemon based on this that listens for any database change (via triggers) and then calls an event handler.
The last missing piece is how best to trigger an Airflow DAG from this handler, and how to keep all of this running reliably.
Perhaps there is a better solution?
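For the "trigger a DAG from the handler" step, one option is Airflow's stable REST API (`POST /api/v1/dags/{dag_id}/dagRuns`). A minimal sketch, assuming placeholder host, DAG id and payload (authentication is omitted and would be required in practice):

```python
import json
import urllib.request

def build_dag_trigger_request(base_url, dag_id, conf):
    # Builds the POST request the NOTIFY handler would send to Airflow.
    # The request body carries the DAG run configuration under "conf".
    url = f"{base_url}/api/v1/dags/{dag_id}/dagRuns"
    body = json.dumps({"conf": conf}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )

# Hypothetical values for illustration only:
req = build_dag_trigger_request(
    "http://airflow.example.com:8080",
    "sync_partner_data",
    {"table": "orders", "origin": "site_a"},
)
print(req.full_url)
# The real handler would add auth headers and call urllib.request.urlopen(req).
```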
I have set up a PostgreSQL server and am using pgAdmin 4 for managing the databases/clusters. I have a bunch of SQL validation scripts (.sql) which I run on the databases every time some data is added to the database.
My current requirement is to automatically run these .sql scripts and generate some results/statistics every time new data is added to any of the tables in the database.
I have explored the use of pg_cron (https://www.citusdata.com/blog/2016/09/09/pgcron-run-periodic-jobs-in-postgres/) and pgAgent (https://www.pgadmin.org/docs/pgadmin4/dev/pgagent_jobs.html).
Before I proceed to integrate any of these tools into my application, I wanted to know if it is advisable to proceed using these utilities or if I should employ the service of a full-fledged CI framework like Jenkins?
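Whichever scheduler ends up driving this, the "run the .sql scripts" step itself can be wrapped in a small runner that feeds each file to psql. A sketch (the database name and script paths are placeholders; the execution line is left commented so the snippet stays self-contained):

```python
def build_psql_command(dbname, script_path):
    # -v ON_ERROR_STOP=1 makes psql exit nonzero on the first failing
    # statement, so a scheduler can detect a failed validation run.
    return ["psql", "-d", dbname, "-v", "ON_ERROR_STOP=1", "-f", script_path]

scripts = ["checks/row_counts.sql", "checks/referential.sql"]  # placeholders
commands = [build_psql_command("mydb", s) for s in scripts]
for cmd in commands:
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # would execute against the server
```

pg_cron or pgAgent would run the SQL inside the database on a schedule; a wrapper like this is what a cron job or a Jenkins stage would call from outside.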
I tried to install pgAgent, but since it is not supported on Amazon RDS, I don't know how to schedule Postgres jobs without resorting to cron jobs and psql directly. Here is what I got on Amazon RDS:
The following command gave the same result:
CREATE EXTENSION pg_cron;
Off the top of my head, I have a total of three options for this:
1.) AWS Lambda
2.) AWS Glue
3.) Any small EC2 instance (Linux/Windows)
1.) AWS Lambda:
You can use a Postgres connectivity Python module like pg8000 or psycopg2 to connect and create a cursor to your target RDS.
You can pass your SQL job code / SQL statements as input to the Lambda. If there are only a few, you can just code the whole job in your Lambda; if not, you can pass the statements in as input using DynamoDB.
You can set up a cron schedule using a CloudWatch Events rule, so that it triggers the Lambda whenever you need.
Required tools: DynamoDB, AWS Lambda, Python, a Postgres Python connectivity module.
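A minimal sketch of such a handler (the event shape is an assumption of this sketch; sqlite3 from the stdlib stands in for the database so the snippet is self-contained — for RDS you would replace get_connection() with pg8000.connect(...) or psycopg2.connect(...)):

```python
import sqlite3

def get_connection():
    # Stand-in connection. For RDS Postgres, swap in e.g.:
    #   pg8000.connect(host=..., database=..., user=..., password=...)
    return sqlite3.connect(":memory:")

def lambda_handler(event, context):
    # Runs each SQL statement passed in the event against the database.
    conn = get_connection()
    try:
        cur = conn.cursor()
        for stmt in event["statements"]:
            cur.execute(stmt)
        conn.commit()
        return {"executed": len(event["statements"])}
    finally:
        conn.close()

# Example invocation, as the CloudWatch-triggered event might look:
result = lambda_handler(
    {"statements": ["CREATE TABLE t (x INT)", "INSERT INTO t VALUES (1)"]},
    None,
)
print(result)  # {'executed': 2}
```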
2.) AWS Glue:
AWS Glue works in much the same way. It gives you the option to connect directly to your RDS DB, and you can schedule your jobs there.
3.) EC2 instance:
Create any small EC2 instance, either Windows or Linux, and set up your cron/bat jobs on it.
On October 10th, 2018, AWS Lambda launched support for long running functions. Customers can now configure their AWS Lambda functions to run up to 15 minutes per execution. Previously, the maximum execution time (timeout) for a Lambda function was 5 minutes. Using longer running functions, a highly requested feature, customers can perform big data analysis, bulk data transformation, batch event processing, and statistical computations more efficiently.
You could use Amazon CloudWatch Events to trigger a Lambda function on a schedule, but it can only run for a maximum of 15 minutes (https://aws.amazon.com/about-aws/whats-new/2018/10/aws-lambda-supports-functions-that-can-run-up-to-15-minutes/?nc1=h_ls).
You could also run a t2.nano Amazon EC2 instance (about $50/year On-Demand, or $34/year as a Reserved Instance) to run regular cron jobs.
Can anyone let me know how to pull data from DB2 using a SAS program? I have a DB2 query and want to write SAS code to pull the data from DB2 using that query (SAS on mainframe). Also, any pointers on connecting to DB2 (mainframe) from SAS would be appreciated.
Most likely the issue is with your JCL, not SAS. On the mainframe, jobs run in LPARs (logical partitions). An analogy would be several computers networked together. Each LPAR (or computer) is set up with software and networked to hard drives and DB2 servers. Usually one LPAR is set aside to run only production jobs, another for development, and so on. It is a way to make sure production jobs get the resources they need without development jobs interfering.
In this scenario, each LPAR would have SAS installed, but only one partition would be networked to the DB2 server you are trying to get your data from. Your JCL tells the system which LPAR to run your job on. Either the wrong LPAR is coded in your JCL, or your job is running in a default LPAR which is not the one it needs.
The JCL code that targets the correct LPAR is customized for each system, so only someone who runs jobs on your system will know what it is. I suggest going to someone else who runs jobs on your system and telling them, as you said, that the 'SAS program without DB2 connectivity is working fine, but otherwise it is not.' They should be able to point you to the JCL code you need.
Good luck.