CICS optimization - db2

I have a CICS program that reads a DB2 table to obtain rules based on the field name. Say my record type is AA; that type has at least 20 rules, which I have to loop through in the DB2 tables. Likewise, I have a few record types, with many more rules tied to each type.
I get data from MQ, and for each record type I call a separate CICS program. Under high load, the DB2 rules table is held by so many programs that it causes a performance issue.
I want to get away from DB2, load these rules into a CICS container, and maintain them periodically, but I'm not sure this will work. I don't want to use or create VSAM files. I'm looking for some kind of storage I can use and maintain in CICS.
My question is: if I create a pipeline and container, will multiple programs be able to access them at the same time, and will the stored rules stay in the container after a successful GET?

Before reading further, please understand that DB2 solves all the sharing and locking problems very efficiently. I've never encountered a problem with too many transactions trying to read a DB2 table concurrently. Updating, yes; a mix of updates and reads, yes; just reading, no.
So, in order to implement your own caching of a DB2 table inside CICS, you need a data store. As @BruceMartin indicates, a TS queue is an option; I would say that, given your other constraints, it is your only option.
In order to automate this you must create a trigger on your DB2 table that fires after INSERT, UPDATE, or DELETE. The trigger must cause the TS queue to be repopulated. The repopulation mechanism could be EXCI or MQ, as the code performing the repopulation must execute within CICS.
During the repopulation, all transactions reading the TS queue must wait for the repopulation to complete. This can be done with the CICS ENQ API, with a caveat. To prevent all these transactions from single-threading through their TS queue read because they always ENQ, I suggest using two TS queues: one holds the DB2 data and the other is a "trigger" TS queue. The contents of the trigger TS queue are not significant; you can store a timestamp, or "Hello, World", or "ABC", it doesn't matter.
A normal transaction attempts a read of the trigger TS queue. If the read is unsuccessful the transaction simply reads the TS queue with the DB2 data. But if the read is successful then repopulation is in progress and the transaction ENQs on a resource (call it XYZ). On return from the ENQ, DEQ and read the TS queue with the DB2 data.
During repopulation, a program executed by the trigger on the DB2 table runs in CICS. It first ENQs on resource XYZ, then creates the trigger TS queue, deletes the TS queue with the DB2 data, recreates that TS queue and populates it with the new DB2 data, deletes the trigger TS queue, and finally DEQs resource XYZ. I would strongly suggest using a multi-row SELECT to obtain the DB2 data, as it is significantly more efficient than the traditional OPEN CURSOR, FETCH, CLOSE CURSOR method.
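The two-queue protocol above can be sketched outside CICS. This is only an illustration in Python, not CICS code: a dictionary stands in for the TS queues, a `threading.Lock` stands in for ENQ/DEQ on resource XYZ, and the names `RulesCache`, `"TRIGGER"` and `"RULES"` are made up for the example.

```python
import threading


class RulesCache:
    """Toy model of the data TS queue plus the "trigger" TS queue."""

    def __init__(self):
        self._enq = threading.Lock()  # stands in for ENQ/DEQ on resource XYZ
        self._queues = {}             # queue name -> items; stands in for TS queues

    def repopulate(self, rows):
        # What the program driven by the DB2 trigger would do.
        with self._enq:                          # ENQ XYZ
            self._queues["TRIGGER"] = ["busy"]   # create the trigger queue
            self._queues.pop("RULES", None)      # delete the data queue
            self._queues["RULES"] = list(rows)   # recreate and populate it
            del self._queues["TRIGGER"]          # delete the trigger queue
        # leaving the "with" block is the DEQ

    def read_rules(self):
        # What a normal transaction would do.
        if "TRIGGER" in self._queues:   # read of trigger queue succeeded
            with self._enq:             # ENQ, then immediately DEQ
                pass
        # A reader that checked just before the trigger queue appeared can
        # still find the data queue missing; this sketch returns [] then.
        return list(self._queues.get("RULES", []))
```

In the common case no repopulation is running, so readers never touch the lock and do not serialize behind one another.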

Bidirectional Replication Design: best way to script and execute unmatched row on Source DB to multiple subscriber DBs, sequentially or concurrently?

Thank you for help or suggestion offered.
I am trying to build my own multi-master replication on PostgreSQL 10 in Windows, for a situation which cannot use any of the current 3rd party tools for PG multi-master replication, and which can also involve another DB platform in a subscriber group (Sybase ADS). I have the following logic to create bidirectional replication, partially inspired by Bucardo's logic, between 1 publisher and 2 subscribers:
1. When an INSERT, UPDATE, or DELETE is made on the Source table, a Source table trigger adds a row to a meta table created on the Source DB. That row acts as a replication transaction to be performed on the 2 subscriber DBs which subscribe to it.
2. A NOTIFY signal will be sent to a service, or a script written in Python or some other scripting language will monitor for changes in the meta table or for trigger execution, and will be able to do a table compare or script the statement to run on each subscriber database.
***I believe that triggers on the subscribers will need to be paused to keep them from pushing their received statements to their own subscribers, i.e. if node A and node B both subscribe to each other's table A, then an update to node A's table A should replicate to node B's table A without then replicating back to node A's table A in a bidirectional "ping-pong storm".
3. There will be a final compare between tables and the transaction will be closed. Triggers on subscribers are re-enabled if they were paused/disabled when pushing transactions in the step 2 addendum.
This will hopefully be able to be done bidirectionally, in order of timestamp, in FIFO order, unless I can figure out a way to create child processes to run the synchronizations concurrently.
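To make the "ping-pong storm" concern concrete, here is a small in-memory Python sketch with no database involved. The `Node` class and its `applying_remote` flag are hypothetical; the flag plays the role of "triggers paused while a replicated statement is being applied".

```python
class Node:
    """Toy replica: a dict is the table, write() is statement plus trigger."""

    def __init__(self, name):
        self.name = name
        self.table = {}
        self.subscribers = []
        self.applying_remote = False  # True while a replicated change is applied
        self.forwarded = 0            # how many changes this node pushed out

    def write(self, key, value):
        self.table[key] = value
        # The "trigger" only forwards changes that originated locally.
        if not self.applying_remote:
            for subscriber in self.subscribers:
                self.forwarded += 1
                subscriber.apply_remote(key, value)

    def apply_remote(self, key, value):
        self.applying_remote = True   # pause the trigger
        try:
            self.write(key, value)
        finally:
            self.applying_remote = False  # re-enable it afterwards


node_a, node_b = Node("A"), Node("B")
node_a.subscribers.append(node_b)
node_b.subscribers.append(node_a)
node_a.write(1, "x")  # reaches B once and is not echoed back to A
```

Without the flag, `node_a.write` would bounce between the two nodes until the interpreter hit its recursion limit.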
For this, I am trying to figure out the best way to set up the service logic (essentially Step 2 above), which has apparently been done using a daemon in Linux; I have to work in Windows, so it would run as, or resemble, a service/agent. Alternatively, I need to come up with a reasonably easy and efficient design to send the source DB's statements to the subscriber DBs.
Does anyone see that this plan is faulty or may not work?
Disclaimer: I don't know anything about PostgreSQL but have done plenty of custom replication.
The main problem with bidirectional replication is merge issues.
If the same key is used in both systems with different attributes, which one gets to push its change? If you nominate a master it's easier. Then the slave just gets overwritten every time.
How much latency can you handle? It's much easier to take the 'notify' part out and just have a five-minute Windows Task Scheduler job that inspects log tables and pushes data around.
In other words, this kind of pattern:
A change occurs in a table. A database trigger on that table notes the change and writes the PK of the table to a change log table. A ReplicationBatch column in the log table is set to NULL by default.
A Windows scheduled task inspects all change log tables to find all changes that happened since the last run and 'reserves' these records by setting their ReplicationBatch to a replication batch number,
i.e. you run UPDATE LogTable SET ReplicationBatch=BatchNumber WHERE ReplicationBatch IS NULL
All records that have been marked are replicated:
you run SELECT * FROM LogTable WHERE ReplicationBatch=BatchNumber to get the records to be processed
When complete, the reserved records are marked as complete so the next time around only subsequent changes are replicated. This completion flag might be in the log table or it might be in a ReplicationBatch number table.
The main point is that you need to reserve records for replication, so that as you are replicating them out, additional log records can be added in from the source without messing up the batch
Then periodically you clear out the log tables.
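The reserve-then-replicate idea can be tried out with a throwaway database. The sketch below uses Python's built-in `sqlite3` as a stand-in for the real server; the table `change_log` and its columns `changed_pk` and `replication_batch` are names invented for the example.

```python
import sqlite3


def make_log_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE change_log ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " changed_pk INTEGER NOT NULL,"
        " replication_batch INTEGER)"  # stays NULL until a batch reserves the row
    )
    return conn


def log_change(conn, changed_pk):
    # What the table trigger would do: note the PK of the changed row.
    with conn:
        conn.execute("INSERT INTO change_log (changed_pk) VALUES (?)", (changed_pk,))


def reserve_batch(conn, batch_number):
    # Reserve every unreserved log row, then return the PKs in this batch.
    with conn:
        conn.execute(
            "UPDATE change_log SET replication_batch = ? "
            "WHERE replication_batch IS NULL",
            (batch_number,),
        )
    rows = conn.execute(
        "SELECT changed_pk FROM change_log WHERE replication_batch = ? ORDER BY id",
        (batch_number,),
    )
    return [row[0] for row in rows]
```

Rows logged after a batch has been reserved keep a NULL batch number, so they are left for the next run rather than slipping into the batch being replicated.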

What's the difference between issuing a query with or without a "begin" and "commit" command in PostgreSQL?

As the title says, it is possible to issue a query in psql with a "begin", the query, and a "commit".
What I want to know is what happens if I don't use a "begin" command?
Some database engines will allow you to execute modifications (INSERT, UPDATE, DELETE) without an open transaction. It's basically assumed that you have an instant BEGIN / COMMIT around each of your instructions, which is bad practice in case something goes wrong in a batch of many instructions.
With an engine that does require an open transaction, you can still run a SELECT, but not an INSERT, UPDATE, or DELETE, without a BEGIN, which enforces the good practice. That way, if something goes wrong, a ROLLBACK is executed immediately, canceling all your modifications as if they never existed.
Using a transaction around a batch of various SELECT will guarantee that the data you get for each SELECT matches the same version of the database at the instant you open the transaction depending on your ISOLATION level.
Please read this for more information :
http://www.postgresql.org/docs/9.5/static/sql-start-transaction.html
and
http://www.postgresql.org/docs/9.5/static/tutorial-transactions.html
If you don't use BEGIN/COMMIT, it's the same as wrapping each individual query in a BEGIN/COMMIT block. You can use BEGIN/COMMIT to group multiple queries into a single transaction. A few reasons you might want to do so include:
Updating multiple tables at the same time. For instance, usually when you delete a record you also want to delete other rows that reference it. If you do this in the same transaction, nothing will ever be able to reference a row that's already been deleted.
You want to be able to revert some changes if something goes wrong later. Suppose you're writing some user inputted data to multiple tables. At some point you realize that some of it isn't formatted properly. You probably wouldn't want to insert any of it, so you should wrap the entire operation in a transaction.
If you want to ensure the data you're updating hasn't been updated while you're writing to it. Suppose I'm adding $10 to a bank account from two separate connections. I want to add $20 in total - I don't want one of the UPDATEs to clobber the other.
Postgres gives you the first two of these by default. The last one would require a higher transaction isolation level, and makes your query run the risk of raising a serialization error. Transaction isolation levels are a fairly complicated topic, so if you want more info on them the best place to go is the documentation.
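As a rough illustration of the grouping and rollback points, the following Python sketch uses the built-in `sqlite3` module as a stand-in for PostgreSQL (the BEGIN / COMMIT / ROLLBACK statements are the same, but isolation behavior is not). The `accounts` table and the `transfer` helper are made up for the example.

```python
import sqlite3


def make_accounts():
    # isolation_level=None puts the connection in autocommit mode, so each
    # statement is its own transaction unless we issue BEGIN ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO accounts VALUES ('a', 100), ('b', 100)")
    return conn


def transfer(conn, amount, fail_midway=False):
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = 'a'", (amount,)
        )
        if fail_midway:
            raise RuntimeError("simulated failure between the two updates")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = 'b'", (amount,)
        )
        conn.execute("COMMIT")
        return True
    except RuntimeError:
        conn.execute("ROLLBACK")  # undoes the first UPDATE as well
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Had the two UPDATEs been issued without the explicit BEGIN, the first would already be committed when the failure occurred, leaving account 'a' debited with nothing credited to 'b'.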

How to wait during SELECT that pending INSERT commit?

I'm using PostgreSQL 9.2 in a Windows environment.
I'm in a 2PC (2 phase commit) environment using MSDTC.
I have a client application that starts a transaction at the SERIALIZABLE isolation level, inserts a new row of data in a table for a specific foreign key value (there is an index on the column), and votes for completion of the transaction (the transaction is PREPARED). The transaction will be COMMITTED by the Transaction Coordinator.
Immediately after that, outside of a transaction, the same client requests all the rows for this same specific foreign key value.
Because there may be a delay before the previous transaction is really committed, the SELECT may return a previous snapshot of the data. In fact, it does happen sometimes, and this is problematic. Of course the application may be redesigned, but until then I'm looking for a lock solution. An advisory lock?
I already solved the problem when performing UPDATE on specific rows, by then using SELECT...FOR SHARE, and it works well. The SELECT waits until the transaction commits and returns old and new rows.
Now I'm trying to solve it for INSERT.
SELECT...FOR SHARE does not block and returns immediately.
There is no concurrency issue here as only one client deals with a specific set of rows. I already know about MVCC.
Any help appreciated.
To wait for a not-yet-committed INSERT you'd need to take a predicate lock. There's limited predicate locking in PostgreSQL for the serializable support, but it's not exposed directly to the user.
Simple SERIALIZABLE isolation won't help you here, because SERIALIZABLE only requires that there be an order in which the transactions could've occurred to produce a consistent result. In your case this ordering is SELECT followed by INSERT.
The only option I can think of is to take an ACCESS EXCLUSIVE lock on the table before INSERTing. This will only get released at COMMIT PREPARED or ROLLBACK PREPARED time, and in the meantime any other queries will wait for the lock. You can enforce this via a BEFORE trigger to avoid the need to change the app. You'll probably get the odd deadlock and rollback if you do it that way, though, because INSERT will take a lower lock, then you'll attempt lock promotion in the trigger. If possible it's better to run the LOCK TABLE ... IN ACCESS EXCLUSIVE MODE command before the INSERT.
As you've alluded to, this is mostly an application mis-design problem. Expecting to see not-yet-committed rows doesn't really make any sense.

How to manage foreign key errors from insert for the purpose of data validation (t-sql)

I am building a database in SQL Server 2000 and need to perform data validation by testing for foreign key violations. This post is related to an earlier post I made (Trigger exits on first failed insert and cant set xact_abort OFF in SQL Server 2000) which focussed on how to port from a working SQL Server 2005 implementation to a Server 2000 implementation. Following the advice received on that post, which indicated wholesale recoding was required, I am now reconsidering the design itself, hence this post. To recap on my application, my
I receive a daily data feed containing ~5k records into a Staging table. When this insert is done a single record is then added to a table called TRIGGER_DATA.
I have created a trigger ‘on insert’ on this table which then attempts to insert the data therein into a FACT_data table one record at a time.
The FACT_data table is foreign keyed to many DIM tables which define the acceptable inputs the field can take.
If any record violates a foreign key constraint the insert should fail and the record should instead be inserted into a Load_error table (which has no foreign key and all fields are Nullable).
Given the volume of records in each insert, I thought it would be a bad idea to create the trigger on the Stage_data table, since this would result in ~5k trigger firings in one go each day. However, since I cannot set xact_abort off in a trigger under SQL Server 2000, and the trigger therefore aborts on the first failure, I am wondering if it might actually be a half-decent solution.
Questions:
The basic question I am now asking myself is what the typical approach for doing this is. It seems to me that this kind of data validation through checking for FK violations must be common, and therefore a consensus best practice may have emerged (although I really can't find any for the Server 2000 platform!)
Am I correct that the trigger on the stage_data table would be bad practice given the volume of records in each insert, or is it acceptable?
Is my approach of looping through each record from within the trigger and testing the insert OK?
What are your thoughts on this alternative that I have just thought of? Stop using triggers altogether and, after the Stage table is loaded, update a 'stack' table with a record saying that data had been received and was ready to be validated and loaded to the FACT table (perhaps along with a priority level indicating the order in which tasks must be processed). This stack or 'job' table would then be a register of all requested inserts along with their status (created/in-progress/completed). I would then have a stored procedure continually poll this table and process the top priority record. This would mean that all stored proc calls would happen outwith the trigger.
Many thanks
You don't need a trigger at all. Unless there is some reason that you need split-second timing of this daily data load, just schedule a job (stored proc) that runs as often as necessary to look for data in the staging table.
When it finds any, process the records one at a time and load the ones that are OK and do whatever you do with the ones that have broken FKs (delete, move to a work queue, etc.).
If you use a schedule frequency that is often enough that there is some risk of the next job starting while the last one is still running, then you should create a sentinel table that your stored proc can write in to say that the job is running. This could work one of two ways. Either you just have one record that says "running" or "not running" or, you could have one record per job (like a transaction log) that has a status code indicating whether the job is complete or not.
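The record-by-record load described above might look roughly like the sketch below. It is written in Python against the built-in `sqlite3` module purely as a stand-in for a SQL Server stored procedure; the table and column names (`dim_code`, `stage_data`, `fact_data`, `load_error`) are assumptions loosely based on the question, and the sentinel table is left out to keep it short.

```python
import sqlite3


def make_staging_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript("""
        CREATE TABLE dim_code (code TEXT PRIMARY KEY);
        CREATE TABLE stage_data (id INTEGER, code TEXT, amount REAL);
        CREATE TABLE fact_data (
            id INTEGER PRIMARY KEY,
            code TEXT NOT NULL REFERENCES dim_code(code),
            amount REAL
        );
        CREATE TABLE load_error (id INTEGER, code TEXT, amount REAL);
        INSERT INTO dim_code VALUES ('A'), ('B');
    """)
    return conn


def load_staging(conn):
    """Move staged rows to fact_data; rows that break an FK go to load_error."""
    loaded = rejected = 0
    rows = conn.execute(
        "SELECT id, code, amount FROM stage_data ORDER BY id"
    ).fetchall()
    for row in rows:
        try:
            with conn:  # one small transaction per record
                conn.execute("INSERT INTO fact_data VALUES (?, ?, ?)", row)
            loaded += 1
        except sqlite3.IntegrityError:
            with conn:
                conn.execute("INSERT INTO load_error VALUES (?, ?, ?)", row)
            rejected += 1
    with conn:
        conn.execute("DELETE FROM stage_data")  # staging is empty for the next run
    return loaded, rejected
```

Because each record gets its own transaction, one bad row only rolls back its own insert and the rest of the feed still loads.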

How can I be sure that a row, or series of rows returned in one select statement are excluded from other queries to the database in separate threads

I have a PostgreSQL 9.2.2 database that serves orders to my ERP system. The database tables contain boolean columns indicating if a customer is added or not among other records. The code I use extracts the rows from the database and sends them to our ERP system one at a time (single threaded). My code works perfectly in this regard; however over the past year our volume has grown enough to require a multi-threaded solution.
I don't think the MVCC modes will work for me because the added_customer column is only updated once a customer has been successfully added. The default MVCC modes could cause the same row to be worked on at the same time resulting in duplicate web service calls. What I want to avoid is duplicate web service calls to our ERP system as they can be rather heavy, although admittedly I am not an expert on MVCC nor the other modes that PostgreSQL provides.
My question is: How can I be sure that a row, or series of rows returned in one select statement are excluded from other queries to the database in separate threads?
You will need to record the fact that the rows are being processed somehow. You will also need to deal with concurrent attempts to mark them as being processed and handle failures with sending them to your ERP system.
You may find SELECT ... FOR UPDATE useful to get a set of rows and simultaneously lock them against updates. One approach might be for each thread to select a target row, try to add its ID to a "processing" table, then remove it in the same transaction in which you update added_customer.
If a thread fetches no candidate rows, or fails to insert then it just needs to sleep briefly and try again. If anything goes badly wrong then you should have rows left in the "processing" table that you can inspect/correct.
Of course the other option is to just grab a set of candidate rows and spawn a separate process/thread for each that communicates with the ERP. That keeps the database fetching single-threaded while allowing multiple channels to the ERP.
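A minimal sketch of the "processing" table idea, again using Python's built-in `sqlite3` as a stand-in for PostgreSQL. The `customers` and `processing` tables and the helper names are hypothetical; only `added_customer` comes from the question.

```python
import sqlite3


def make_orders_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (
            id INTEGER PRIMARY KEY,
            added_customer INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE processing (customer_id INTEGER PRIMARY KEY);
        INSERT INTO customers (id) VALUES (1), (2);
    """)
    return conn


def try_claim(conn, customer_id):
    # The primary key on processing lets only one worker insert a given ID.
    try:
        with conn:
            conn.execute(
                "INSERT INTO processing (customer_id) VALUES (?)", (customer_id,)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # another worker already holds this row


def mark_added(conn, customer_id):
    # After the ERP call succeeds: set the flag and release the claim together.
    with conn:
        conn.execute(
            "UPDATE customers SET added_customer = 1 WHERE id = ?", (customer_id,)
        )
        conn.execute("DELETE FROM processing WHERE customer_id = ?", (customer_id,))
```

A worker whose `try_claim` returns False would move on to another candidate or sleep briefly; a row left behind in `processing` points at a worker that failed partway through.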
You can add a column user_is_proccesed to the table. It can hold the process ID of the backend that updates the record.
Then use a small serializable transaction to set user_is_proccesed, which "locks the row for processing".
Something like:
START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE user_table
SET user_is_proccesed = pg_backend_pid()
WHERE <some condition>
AND user_is_proccesed IS NULL; -- no one is processing it now
COMMIT;
The key thing here - with SERIALIZABLE only one transaction can successfully update the record (all other concurrent SERIALIZABLE updates will fail with ERROR: could not serialize access due to concurrent update).
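The claim-by-UPDATE pattern can be tried locally with the sketch below. It uses Python's built-in `sqlite3` rather than PostgreSQL, so it shows only the conditional UPDATE, not the SERIALIZABLE behavior: here a losing worker sees zero affected rows, whereas under PostgreSQL SERIALIZABLE it would instead have to catch the serialization error. The `worker_id` argument is a hypothetical replacement for `pg_backend_pid()`.

```python
import sqlite3


def make_user_table():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_table (id INTEGER PRIMARY KEY, user_is_proccesed INTEGER)"
    )
    conn.executemany("INSERT INTO user_table (id) VALUES (?)", [(1,), (2,)])
    conn.commit()
    return conn


def claim_row(conn, row_id, worker_id):
    """Return True only if this worker set the marker on a still-unclaimed row."""
    with conn:
        cursor = conn.execute(
            "UPDATE user_table SET user_is_proccesed = ? "
            "WHERE id = ? AND user_is_proccesed IS NULL",
            (worker_id, row_id),
        )
    return cursor.rowcount == 1
```

Only a worker that gets True back would go on to call the ERP web service for that row.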