DB2 LUW Parallel Jobs Execution - db2

I have been working in a DB2 LUW database, and I want to submit procedures as parallel jobs. I have a procedure that runs some DDL and DML statements against one table. This table holds a huge amount of data, and the same procedure needs to run against a few more tables in parallel.
I submit each job using the DBMS_JOB.SUBMIT statement and execute it using the DBMS_JOB.RUN statement. I have a job handler procedure that is supposed to do this in parallel.
But the jobs are executing sequentially: the first job completes, then the second job starts, and only after the second job completes does the third job start.
**My First Question**
How do I run DBMS_JOB jobs in parallel?
The second issue I'm facing is that the current session waits until all the jobs complete. I can't use that session; only once all the jobs have finished do I get access to it again.
**My Second Question**
*How do I make the session accessible, instead of having it wait for all jobs to complete?*
Please help me sir/madam.

DBMS_JOB is an interface to the Administrative Task Scheduler (ATS) of Db2-LUW, provided for some compatibility with Oracle RDBMS. However, you can also use the ATS directly, independently of DBMS_JOB, via ADMIN_TASK_ADD and related procedures.
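For illustration, a minimal sketch of defining one one-shot ATS task per table. MYSCHEMA.PROCESS_TABLE is a hypothetical procedure taking the table name as its single parameter, and the ATS must be enabled (DB2_ATS_ENABLE=YES with SYSTOOLSPACE present) for anything to actually run:

```sql
-- Hypothetical one-shot task; repeat with a different task name and
-- VALUES clause for each table that must be processed in parallel.
CALL SYSPROC.ADMIN_TASK_ADD(
    'PROCESS_TABLE_A',       -- unique task name
    CURRENT_TIMESTAMP,       -- earliest begin time
    NULL,                    -- no end time
    1,                       -- run at most once
    NULL,                    -- no cron-style schedule: one-shot at begin time
    'MYSCHEMA',              -- procedure schema (placeholder)
    'PROCESS_TABLE',         -- procedure name (placeholder)
    'VALUES(''TABLE_A'')',   -- SQL generating the procedure's input arguments
    NULL,                    -- options
    'parallel table job');   -- remarks
```

Each task runs on its own internal connection when the scheduler picks it up, so several one-shot tasks defined this way can start concurrently, subject to the caveats below.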
My experience is that db2acd (the process that implements autonomic actions, including the ATS) is unreliable, especially when ulimits are misconfigured, and in some circumstances it silently won't run jobs. It also wakes up only every five minutes to check for new jobs, which can be frustrating, and it requires an already activated database, which is inconvenient for some use cases.
I would not recommend usage of the Db2 ATS for application layer functionality. Full function enterprise schedulers exist for good reasons.
For parallel invocations, I would use an enterprise scheduling tool if available, or failing that the scheduler supplied by the operating system, either on the Db2 server or, at worst, on the client side, taking care in both cases that each stored-procedure invocation is its own scheduled job with its own Db2 connection.
By using one Db2 connection per stored-procedure invocation, and scheduling them concurrently, they run in parallel as long as their actions don't cause mutual contention.
Apart from the above, I believe the ATS will start jobs in parallel provided that the job definitions are correct.
Examine the contents of both the ADMIN_TASK_LIST and ADMIN_TASK_STATUS administrative views, and corroborate with db2diag entries (diaglevel 4 may give more detail, even if you only use it temporarily).
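To see what the scheduler actually did, something along these lines (column names as documented for the SYSTOOLS.ADMIN_TASK_STATUS view) shows each invocation together with the SQLCODE the procedure returned:

```sql
-- Check whether the ATS started each task and whether any failed
SELECT NAME, STATUS, INVOCATION, BEGIN_TIME, END_TIME, SQLCODE
FROM SYSTOOLS.ADMIN_TASK_STATUS
ORDER BY BEGIN_TIME DESC;
```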
Calls to SQL PL (or PL/SQL) stored procedures are synchronous relative to the caller, which means that the Db2-connection is blocked until the stored procedure returns. You cannot "make the session accessible" if it is waiting for a stored procedure to complete, but you can open a new connection.
Different options exist for stored procedures written in C, C++, Java, or CLR (.NET) languages; they have more freedom. Other options exist for messaging/broker-based solutions. Much depends on available skillsets, toolsets, and experience. But in general it's wiser to keep it simple.

Related

Architecture to be able to have a lot of SQL calls on the same tables for a workflow execution

We have a project where we let users execute workflows based on a selection of steps.
Basically, each step is linked to an execution, and an execution can be linked to one or more executionData rows (the data created or updated during that execution for that step, stored as a blob in Postgres).
Today, we execute this through a queuing mechanism where executions are created in queues and workers do the executions and create the next job in the queue.
But this architecture and our implementation make our Postgres database slow when multiple jobs are scheduled at the same time:
- We are basically always creating and reading from the execution table (we create the execution to be scheduled, read the execution when starting the job, and update the status when the job is finished).
- We are basically always creating and reading from the executionData table (we add and update executionData during executions).
We have the following issues:
- Our executionData table is growing very fast, and it's almost impossible to remove rows as there are constantly locks on the table. What could we do to avoid that? Is Postgres a good fit for that kind of data?
- Our execution table is growing very fast as well, and it impacts the overall execution, since to execute anything we need to create, read, and update execution rows. Deleting rows is likewise almost impossible ... What could we do to improve this? Usage of a historical table? Suggestions?
- We need to compute statistics on the total executions run and data saved; these queries also hit the tables above, which slows down the process.
We use RDS on AWS for our Postgres database.
Thanks for your insights!
Try going for a faster database architecture. Your use case seems well suited to a DynamoDB architecture for your executions. You can get O(1) key-value performance, and the blob storage can fit right into the record as long as you keep each item under DynamoDB's 400 KB item-size limit.
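If staying on Postgres, another angle on the "rows are almost impossible to delete" problem is time-based declarative partitioning: dropping an old partition is a quick metadata operation rather than a lock-heavy bulk DELETE. A sketch with hypothetical table and column names:

```sql
-- Hypothetical: range-partition executionData by creation date so old
-- data is removed by dropping partitions, not by DELETEing rows.
CREATE TABLE execution_data (
    id           bigserial,
    execution_id bigint      NOT NULL,
    payload      bytea,                           -- the blob
    created_at   timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (id, created_at)                  -- must include partition key
) PARTITION BY RANGE (created_at);

-- One partition per month (create these ahead of time, e.g. from a cron job)
CREATE TABLE execution_data_2024_01 PARTITION OF execution_data
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Retiring a month of data is then a fast metadata-only operation:
DROP TABLE execution_data_2024_01;
```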

PostgreSQL: Allow only one instance of stored procedure to run at a time

I have a stored procedure in Postgres which processes a large amount of data and takes quite some time to complete.
In my application, there is a chance that two processes or schedulers run this procedure at the same time. I want to know if there is a built-in mechanism in the database to allow only one instance of this procedure to run at the database level.
I searched the internet, but didn't find anything concrete.
There is nothing built in to define a procedure (or function) so that concurrent execution is prevented.
But you can use advisory locks to achieve something like that.
At the very beginning of the procedure, you can add something like:
perform pg_advisory_lock(987654321);
which will then wait to get the lock. If a second session invokes the procedure it will have to wait.
Make sure you release the lock at the end of the procedure using pg_advisory_unlock(), as session-level advisory locks are not released when the transaction is committed.
If you use advisory locks elsewhere, make sure you use a key that can't be used in other places.
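A minimal PL/pgSQL sketch of that pattern (the procedure name, body, and lock key are placeholders; CREATE PROCEDURE needs PostgreSQL 11 or later):

```sql
CREATE OR REPLACE PROCEDURE process_large_data()
LANGUAGE plpgsql
AS $$
BEGIN
    -- Session-level advisory lock: a second caller blocks here until
    -- the first caller releases the lock. 987654321 is an arbitrary
    -- key reserved for this procedure.
    PERFORM pg_advisory_lock(987654321);

    -- ... heavy processing goes here ...

    -- Release explicitly: session-level advisory locks survive COMMIT.
    PERFORM pg_advisory_unlock(987654321);
EXCEPTION
    WHEN OTHERS THEN
        -- Don't leave the lock held if the processing fails.
        PERFORM pg_advisory_unlock(987654321);
        RAISE;
END;
$$;
```

If a second invocation should be skipped rather than queued, pg_try_advisory_lock() returns false immediately instead of blocking.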

What would happen if I run two SQL commands using the same DB connection?

I'm writing a program to run mass calculations and output the results into PostgreSQL.
My platform is Windows Server 2008 with PostgreSQL 10. My program is written in C.
The results are produced group by group; as each group finishes, an extra thread is created to write the output.
Now, since the output threads are created one by one, it is possible that two or more SQL commands will be issued simultaneously, or that the previous one is still being processed when a new thread calls the function.
So my questions are:
(1) What would happen if one thread is in the middle of SQL processing and another thread calls PQexec(PGconn *conn, const char *query)? Would they affect each other?
(2) What if I use a different PGconn for each thread? Would that speed things up?
If you try to call PQexec on a connection that is in the process of executing an SQL statement, you would cause a protocol violation. That just doesn't work.
Processing could certainly be made faster if you use several database connections in parallel — concurrent transactions is something that PostgreSQL is designed for.

Multiple threads in db2luw

I am very new to Db2. I have developed a few procedures which perform some operations on a Db2 database. My question is how to create multiple threads on the Db2 server concurrently. I have a database with 70,000 tables, each having more than 1,000 records. I have a procedure which will update all of these 70,000 tables, so time consumption is the main factor here. I want to divide my update statement across 10 threads, where each thread updates 7,000 tables, and run all 10 threads simultaneously.
Can someone kindly let me know the way to achieve this?
DB2 Express-C on Windows.
There's nothing in DB2 for creating multiple threads.
The enterprise-level editions of DB2 will automatically process a single statement across multiple cores when and where needed, but that's not what you're asking for.
I don't believe any SQL-based RDBMS allows a stored procedure to create its own threads. The whole point of SQL is that it's a higher level of abstraction; you don't have access to those kinds of details.
You'll need to write an external app in a language that supports threads and have it open 10 connections to the DB simultaneously. But depending on the specifics of the update you're doing and the hardware you have, you might find that 10 connections is too many.
To elaborate on Charles's correct answer, it is up to the client application to parallelize its DML workload by opening multiple connections to the database. You could write such a program on your own, but many ETL utilities provide components that enable parallel workflows similar to what you've described. Aside from reduced programming, another advantage of using an ETL tool to define and manage a multi-threaded database update is built-in exception handling, making it easier to roll back all of the involved connections if any of them encounter an error.
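If no ETL tool is at hand, SQL itself can generate the per-connection workload. A sketch, assuming a hypothetical MYSCHEMA.UPDATE_ONE_TABLE procedure that updates one named table; spool each bucket's statements to its own script and run the ten scripts through ten separate db2 CLP sessions:

```sql
-- Assign each table to one of 10 buckets and emit the CALL statement
-- for it. Each bucket becomes one script, run on its own connection.
SELECT MOD(ROW_NUMBER() OVER (ORDER BY TABNAME), 10) AS bucket,
       'CALL MYSCHEMA.UPDATE_ONE_TABLE(''' || TABSCHEMA || ''', '''
           || TABNAME || ''');' AS call_stmt
FROM SYSCAT.TABLES
WHERE TABSCHEMA = 'MYSCHEMA'    -- placeholder schema
  AND TYPE = 'T'                -- base tables only
ORDER BY bucket, TABNAME;
```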

Does PostgreSQL allow running stored procedures in parallel?

I'm working with an ETL tool, Business Objects Data Services, which has the capability of specifying parallel execution of functions. The documentation says that before you can do this, you have to make sure that your database, which in our case is Postgres, allows "a stored procedure to run in parallel". Can anyone tell me if Postgres does that?
Sure. Just run your queries in different connections, and they will run in parallel transactions. Beware of locking though.
You can also call different stored procedures from the same connection (and effectively still run them in parallel) by using DBLink.
See this SO answer for an example.
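For illustration, a sketch of the dblink approach; the connection string and heavy_fn (a placeholder for your long-running function) are assumptions, and dblink_send_query() is asynchronous, which is what lets both calls proceed concurrently:

```sql
CREATE EXTENSION IF NOT EXISTS dblink;

-- Open two extra backend connections into the same database
SELECT dblink_connect('conn1', 'dbname=mydb');
SELECT dblink_connect('conn2', 'dbname=mydb');

-- dblink_send_query() returns immediately, so both functions
-- now run in parallel on their own connections.
SELECT dblink_send_query('conn1', 'SELECT heavy_fn(1)');
SELECT dblink_send_query('conn2', 'SELECT heavy_fn(2)');

-- Block until each finishes and fetch its result
SELECT * FROM dblink_get_result('conn1') AS t(result text);
SELECT * FROM dblink_get_result('conn2') AS t(result text);

SELECT dblink_disconnect('conn1');
SELECT dblink_disconnect('conn2');
```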