Pagination logic in Mainframe CICS - DB2

Here is my requirement.
The front end (Client) will do a search based on predefined conditions (for instance: customer ID, account number, first name, last name, etc.). The back end (Server) needs to fetch the matching data from a DB2 database and send it back to the Client. We use CICS channels and containers to pass requests and responses between the Client and Server.
The front end needs the data ordered by receive date descending, then customer ID ascending, then account number ascending. Data are fetched in pages of 500 records. For example, if a search request from the front end would retrieve 50,000 records from the DB2 database, we need to return the data in 500-record "pages". For pagination we use the security deposit number field, which is the primary key of our table, but the sort order is not based on this field.
I would like to know whether we can use scrollable cursor logic in CICS to implement pagination.
Please note that I would prefer not to load the results into an internal array and bubble-sort them before sending the response, as that would degrade performance. I would like to do it via query logic. Any thoughts?
Example (Initial Front end input request):
Customer id : A
Request type: First time (this field identifies whether it is a first-time, next, or previous request for pagination)
First security deposit number : 0
Last security deposit number : 0
Since this is a first-time request, both of these fields will be zero, and we need to retrieve records from the database based on the condition security deposit number > 0.
DB2 database:
There are 700 records matching these criteria
Mainframe response for first time:
We will send the first 500 records
The front end will then send a request to get the next set of records, which will contain:
Customer id: A
Request type: Next
First security deposit number: 0
Last security deposit number: 17980
Given these values, if I query my database based on security deposit number > 17980, it may list duplicate records on the screen once again, since our database sort order is not based on the security deposit number.
How to implement this logic?

Many Client/Server applications in an IBM mainframe environment involve pseudo-conversational CICS transactions.
If you are using CICS in pseudo-conversational mode, it is not possible for the Server to hold cursors open
when it RETURNs to the Client, so scrollable cursors
are of little use in this environment. To answer your basic question: no, scrollable cursors cannot be used here.
The "trick" here is to create an SQL predicate in the Server that is restartable. It will then pick up rows in the correct order from any given
stating point. When the Client calls your Server it must pass all of the positioning information to your Server.
Typically, on a first call from a Client all of the positioning values are set to cause the cursor to
position itself starting with what must be the the first row. The Server then pulls in a "page" worth of data
and returns it to the Client. On the next page forward request the Client sets these positioning values to
the last row it displayed and calls the Server for the next "page" of data.
In your situation, the page-forward cursor would look something like this. All of the
variables prefixed with RESTART... are what the Client must provide to the Server to start the cursor
in the correct position.
DECLARE Page-forward CURSOR FOR
SELECT Receive_Date, Customer_id, Account_Nbr, Security_Dep_Id
FROM Table_Name
WHERE ( (Receive_Date < :RESTART-RCV-DT)
OR (Receive_Date = :RESTART-RCV-DT AND
Customer_Id > :RESTART-CUSTOMER-ID)
OR (Receive_Date = :RESTART-RCV-DT AND
Customer_Id = :RESTART-CUSTOMER-ID AND
Account_Nbr > :RESTART-ACCT-NBR)
OR (Receive_Date = :RESTART-RCV-DT AND
Customer_Id = :RESTART-CUSTOMER-ID AND
Account_Nbr = :RESTART-ACCT-NBR AND
Security_Dep_Id > :RESTART-SEC-DEP-ID))
ORDER BY 1 DESC, 2 ASC , 3 ASC, 4 ASC
For the initial call the Client would pass something like '9999-12-31' as the RESTART-RCV-DT and zero
for RESTART-CUSTOMER-ID, RESTART-ACCT-NBR and RESTART-SEC-DEP-ID (assuming these are all numeric). If you look at
the cursor predicate carefully you can verify that there cannot be any rows prior to these values - therefore this
will return the first page of data. If the Client needs to page forward after this, it must tell the Server to start
with the next row after the last one it received. To do this it would populate the RESTART... variables with
the values from the last row on the page it just
displayed. This process will drive the cursor selects forward one page at a time.
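To make the per-request flow concrete, here is a minimal sketch of what the Server might execute on each pseudo-conversational invocation (the host-variable names are illustrative assumptions, and the EXEC SQL/END-EXEC wrappers of a COBOL program are omitted). Because the cursor cannot survive the RETURN to the Client, it is opened, fetched and closed within a single request:
OPEN Page-forward;
-- loop in the host program until the page holds 500 rows
-- or SQLCODE +100 signals end of data:
FETCH Page-forward
  INTO :WS-RCV-DT, :WS-CUSTOMER-ID, :WS-ACCT-NBR, :WS-SEC-DEP-ID;
CLOSE Page-forward;
The values fetched for the last row of the page become the RESTART... values returned to the Client for its next request.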
When paging up, the process is reversed (you will need a second cursor to support this, and the Client needs to tell the Server which direction to page: forward or back). The Client
will need to populate the RESTART... variables with the first row it received from the Server. The trick
for the Server on a page-up request is to return the data
to the Client in reverse order. You may have to populate the data page passed back
to the Client in reverse order (i.e. put the first row retrieved into the last row of the paging area shared between
the Client and the Server). The page-backward cursor would look something like:
DECLARE Page-backward CURSOR FOR
SELECT Receive_Date, Customer_id, Account_Nbr, Security_Dep_Id
FROM Table_Name
WHERE ( (Receive_Date > :RESTART-RCV-DT)
OR (Receive_Date = :RESTART-RCV-DT AND
Customer_Id < :RESTART-CUSTOMER-ID)
OR (Receive_Date = :RESTART-RCV-DT AND
Customer_Id = :RESTART-CUSTOMER-ID AND
Account_Nbr < :RESTART-ACCT-NBR)
OR (Receive_Date = :RESTART-RCV-DT AND
Customer_Id = :RESTART-CUSTOMER-ID AND
Account_Nbr = :RESTART-ACCT-NBR AND
Security_Dep_Id < :RESTART-SEC-DEP-ID))
ORDER BY 1 ASC, 2 DESC , 3 DESC, 4 DESC
As has been pointed out in other answers, this type of paging process does not manage or detect concurrent
updates to the database that may occur during paging transactions. That is another topic for another day...
Developing Restartable Cursors
The key to building a paging Server is to develop a cursor that is restartable from a set of values received
from a Client transaction. This leaves control of cursor positioning and direction with the Client.
It also means the Client must receive all critical positioning data from the Server, even though
the Client might not actually
use these data for any other purpose (e.g. from your question I got the impression that the Client may not need
the Security Deposit Id except to supply it as a positioning parameter to your Server).
To build a paging Server you need to know
the required sorting order of the data (e.g. Receive Date descending, then Customer Id ascending, then
Account Number ascending).
You also need to know the set of columns that uniquely identifies a row
returned by the cursor. In your case that would be the Security Deposit Id (this is the primary key of the
table you are selecting from, so it must be unique for each and every row in that table). Knowing this, you then build a
cursor predicate (the stuff in the WHERE clause) that returns the data needed by the Client in the required sort order and
also includes
the full positioning key (i.e. Security Deposit Id). Since two or more returned rows could contain identical data if
the final positioning key were eliminated, it is important that the positioning key be included as a sort condition.
It doesn't matter whether it is ascending or descending, but it needs to be included in the sort to ensure a consistent
order of data retrieval.
A fairly simple formula may be followed to build the predicate for the restartable cursor needed to
support a paging Server. Basically, it is a cascade of OR clauses connecting a series of AND clauses
that become progressively more selective, following the sort order required by the Client and ending with the positioning
key.
To see how this works consider how the query for your Server might be developed...
Start with the column from the sort order that changes least often...
SELECT ...
FROM ...
WHERE Receive_Date < restart value
This will retrieve all rows prior to the specified restart Receive Date, regardless of the other
column restart values (e.g. Customer Ids can range from minimum to maximum values, as long as the Receive Date
is less than any Receive Date "seen" so far). Since this column only changes value after all subordinate sort column values
have been exhausted, you can be sure that this does not pick up any rows prior to the full restart key.
But what about those rows that occur on the same date as the restart request but have a
larger Customer Id? These can be picked up with....
SELECT ...
FROM ...
WHERE Receive_Date = restart value AND
Customer_id > restart value
What about those where the Receive Date and Customer Id are the same as the restart key but have
a larger Account Number? These can be picked up with...
SELECT ...
FROM ...
WHERE Receive_Date = restart value AND
Customer_Id = restart value AND
Account_Nbr > restart value
Continue this pattern until the full restart key has been processed. Notice that the inequality
signs are determined by the sort order. Use < when the column is sorted Descending and > when Ascending.
Also notice that the SELECT and FROM clauses
are exactly the same for each query - which means you can put them all together using OR conjunctions...
SELECT Receive_Date, Customer_id, Account_Nbr, Security_Dep_Id
FROM Table_Name
WHERE ( (Receive_Date < :RESTART-RCV-DT)
OR (Receive_Date = :RESTART-RCV-DT AND
Customer_Id > :RESTART-CUSTOMER-ID)
OR (Receive_Date = :RESTART-RCV-DT AND
Customer_Id = :RESTART-CUSTOMER-ID AND
Account_Nbr > :RESTART-ACCT-NBR)
OR (Receive_Date = :RESTART-RCV-DT AND
Customer_Id = :RESTART-CUSTOMER-ID AND
Account_Nbr = :RESTART-ACCT-NBR AND
Security_Dep_Id > :RESTART-SEC-DEP-ID))
ORDER BY 1 DESC, 2 ASC , 3 ASC, 4 ASC
There you go... a restartable cursor for forward paging. Construction of the cursor for backward paging follows the same pattern; just flip the
sort directions and the inequality operators, and repeat.

A simplistic approach: Write your SQL to retrieve data according to your criteria, in the sort order you specify. Then only retrieve the keys to the rows you want. Save the keys somewhere you will have access to upon subsequent invocations of your transaction. Look into multi-row select in DB2. Also understand pseudo-conversational programming techniques in CICS.
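As a hedged sketch of the multi-row select mentioned above (DB2 for z/OS rowset syntax; the table, predicate, and host-variable array names are assumptions for illustration), one 500-key page could be fetched in a single call:
DECLARE KEYS-CSR CURSOR WITH ROWSET POSITIONING FOR
  SELECT Security_Dep_Id
  FROM Table_Name
  WHERE Customer_Id = :WS-CUSTOMER-ID      -- your search criteria
  ORDER BY Receive_Date DESC, Customer_Id, Account_Nbr;
OPEN KEYS-CSR;
-- pull an entire page of keys into a host-variable array in one call:
FETCH NEXT ROWSET FROM KEYS-CSR
  FOR 500 ROWS
  INTO :WS-SEC-DEP-ID-ARR;
CLOSE KEYS-CSR;
The fetched keys are what you would save across invocations, as described above.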
And now we get to the design implications Bill Woodger mentions, that you do not specify in your question, and which are the reason I'm just hitting the high points of a simplistic approach.
If changes to your result set occur between one invocation and the next, your results will not reflect those changes. You must decide if this is important.
You mention a "front end" but do not specify what it is. If it is a BMS application, you may be able to save the keys in your commarea or in a container. If your front end is a distributed application invoking your transactions via CICS Web Services or CICS Web Support or MQ or raw sockets or whatever, you must design a mechanism to store those keys such that you can uniquely retrieve them — perhaps by sending a contrived key back to the distributed application which it must supply upon subsequent invocations. Then you must have some process to clean up your key store.
Creating a solution to your problem that is unique in your IT shop is not something to be done in isolation. You must involve others who will be tasked with maintaining your application, there may be a group external to your project tasked with making such decisions, there may be infrastructure issues with your solution.
So this isn't so much an answer to your question as it is an elaboration on why you may not get an answer, or at least the answer you seem to desire.

Related

How can I query postgres by page?

Assume a table that has many pages and I only want some of it, for example
SELECT * FROM t WHERE...
From pg_class.relpages I know how many pages there are, but how can I access a specific page, like
SELECT * FROM t WHERE PAGE_NUMBER=1
Since you mention relpages I assume you are talking about "physical" pages where the data is stored on disk. In general you should mind your own business and let the database server mind its own business. But if you don't want to do that, then you can reference tuples by the hidden system column "ctid", which is a composite consisting of the page number and the slot within the page.
select * from pgbench_accounts where ctid between '(18,0)' and '(18,65535)';
This is the 19th page, as page numbers start at 0. Slot numbers start at 1, but for convenience you are allowed to reference slot 0 as if it were a valid (but empty) slot.
Note that except on very recent versions, this will not be fast, as it scans the whole table and filters out the disqualified rows one by one.
For that, you need a sort order. For example
SELECT * FROM t WHERE ...
ORDER BY customer, created_ts;
Make sure that the sort condition is unique. If it isn't, add the primary key at the end.
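For example, assuming the primary key is a column named id:
SELECT * FROM t WHERE ...
ORDER BY customer, created_ts, id;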
Then you create an index for the ORDER BY condition:
CREATE INDEX ON t (customer, created_ts);
Then you select the first page of 50 items like this:
SELECT * FROM t WHERE ...
ORDER BY customer, created_ts
LIMIT 50;
You display the page and remember customer and created_ts from the last result row in <latest_cust> and <latest_ts>.
The next page is fetched with
SELECT * FROM t WHERE ...
AND (customer, created_ts) > (<latest_cust>, <latest_ts>)
ORDER BY customer, created_ts
LIMIT 50;
and so on.
This method (it is called “keyset pagination”) is efficient and quite stable in the face of concurrent data modifications.

Postgres count(*) optimization idea

I'm currently working on a project that involves keeping track of users and their actions in my database (PostgreSQL as the RDBMS), and I have run into an issue when trying to perform COUNT(*) on occurrences of each user. What I want is to be able to count, efficiently, the number of times each user appears across all records, and also to be able to look at counts over a particular date range.
So, the problem is how to count the total number of times a user appears in the table's contents, and how to count that total over a date range.
What I've tried
As you might know, Postgres doesn't support COUNT(*) very well using indexes, so we have to consider other ways to reduce the number of records it looks at in order to speed up the query. My first approach is to create a table to keep track of the number of times a user has a log message associated with them, and on what day (similar to the idea behind a materialized view, but I don't want to continually refresh a materialized view with my count query). Here is what I've come up with:
-- "user" is a reserved word in PostgreSQL, so the column name must be quoted
CREATE TABLE users_counts("user" varchar(65536), counter int default 0, day date);
CREATE RULE inc_user_date_count
AS ON INSERT TO main_table
DO ALSO UPDATE users_counts SET counter = counter + 1
WHERE "user" = NEW."user" AND day = DATE(NEW.date_);
What this does: every time a new record is inserted into 'main_table', we update the users_counts table, incrementing the counter of the row whose day equals the new record's date and whose user matches.
NOTE: the date_ column in 'main_table' is a timestamp, so I must cast the new record's date_ to a DATE type.
The problem is: if the user column value doesn't already exist in my new table 'users_counts' for the current day, then nothing is updated.
Here is my question:
How do I write the rule such that we check whether a user row exists for the current day: if so, increment that counter; otherwise insert a new row with the user, day, and a counter of 1?
I would also like to know whether my approach makes sense, or if there are any ideas I'm missing that I just haven't thought of. As my database grows, counting becomes increasingly inefficient, so I want to avoid any performance bottlenecks.
EDIT 1: I was able to actually figure this out by creating a separate RULE but I'm not sure if this is correct:
CREATE RULE test_insert AS ON INSERT TO main_table
DO ALSO INSERT INTO users_counts("user", counter, day)
SELECT NEW."user", 1, DATE(NEW.date_)
WHERE NOT EXISTS (SELECT 1 FROM users_counts
                  WHERE "user" = NEW."user" AND day = DATE(NEW.date_));
Basically, an insert happens only if the user doesn't already exist in my cached table users_counts for that day, and the first rule above updates the count.
What I'm unsure of is how I know which rule is called first, the update rule or the insert rule. And there must be a better way; how do I combine the two rules? Can this be done with a function?
It is true that PostgreSQL is notoriously slow when it comes to count(*) queries. However, if you have a WHERE clause that limits the number of entries, the query will be much faster. If you are using PostgreSQL 9.2 or newer, such a query can be just as fast as it is in MySQL because of index-only scans, which were added in 9.2, but it's best to EXPLAIN ANALYZE your query to make sure.
Does my solution make sense?
Very much so, provided that your EXPLAIN ANALYZE shows that index-only scans are not being used. Trigger-based solutions like the one you have adapted find wide usage. But, as you have realized, the problem of the initial state arises (whether to do an update or an insert).
which rule is called first
Multiple rules on the same table and same event type are applied in
alphabetical name order.
from http://www.postgresql.org/docs/9.1/static/sql-createrule.html
The same applies for triggers. If you want a particular rule to be executed first, change its name so that it comes up higher in the alphabetical order.
how do I combine the two rules?
One solution is to modify your rule to perform an upsert (look right at the bottom of that page for a sample upsert). The other is to populate the counter table with initial values. The trick is to create the trigger at the same time to avoid errors. This blog post explains it really well.
While the initial setup will be slow, each individual insert will probably be faster. The two opposing factors are the slowness of a WHERE NOT EXISTS query vs. the overhead of catching an exception.
Tip: A block containing an EXCEPTION clause is significantly more
expensive to enter and exit than a block without one. Therefore, don't
use EXCEPTION without need.
Source: the PostgreSQL documentation page linked above.
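For completeness: on modern PostgreSQL (9.5 and later, which postdates the documentation quoted above), the two rules can be replaced by a single trigger that upserts. A hedged sketch, assuming a unique key on ("user", day):
CREATE UNIQUE INDEX ON users_counts ("user", day);

CREATE FUNCTION inc_user_date_count() RETURNS trigger AS $$
BEGIN
    INSERT INTO users_counts ("user", counter, day)
    VALUES (NEW."user", 1, DATE(NEW.date_))
    ON CONFLICT ("user", day)
    DO UPDATE SET counter = users_counts.counter + 1;  -- row exists: just bump it
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_inc_user_date_count
AFTER INSERT ON main_table
FOR EACH ROW EXECUTE PROCEDURE inc_user_date_count();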

PostgreSQL: Returning ordered rows after a specific ID

Scenario:
I am displaying a table of records. It initially displays the first 500 with "show more" at the bottom, which returns the next 500.
Issue:
If, between the initial display and clicking "show more", one record is added, that will cause "order by date, offset 500, limit 500" to overlap by one row.
I'd like to "order by date, offset until 'id of last row shown', limit 500"
My row IDs are UUIDs. I am open to alternative approaches that achieve the same result.
If you can order by ID, you can paginate using
where id > $last_seen_id limit 500
but that's not going to be useful where you're sorting by date.
Sort stability!
I really hope that "date" actually means "timestamp" though, otherwise your ordering will be unstable and you can miss rows in pagination; you'll have to order by date, id to get stable ordering if it's really a date, and should probably do so even for timestamp.
State on client
One option is to push the state out to the client. Have the client remember the last-seen (date,id) tuple, and use:
where (date, id) > ($last_seen_date, $last_seen_id) order by date, id limit 500
(Note the row-value comparison: writing date > $last_seen_date and id > $last_seen_id instead would skip the remaining rows for the last-seen date and wrongly filter later dates by id.)
Cursors
Do you care about scalability? If not, you can use a server-side cursor. Declare the cursor for the full query, without the LIMIT. Then FETCH chunks of rows as requested. To do this your app must have a way to consistently bind a connection to a specific user's requests, though, and not to reset that connection or return it to the pool between requests. This might not be practical with your pool/framework, but is probably the best solution if you can do it.
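A hedged sketch of the cursor approach (names are illustrative; the same connection must serve every request for this user):
BEGIN;
-- declare the full query as a cursor, with no LIMIT:
DECLARE results_cur CURSOR FOR
    SELECT * FROM t ORDER BY date, id;
FETCH 500 FROM results_cur;   -- first page
-- on the next "show more" request, same connection:
FETCH 500 FROM results_cur;   -- second page
CLOSE results_cur;
COMMIT;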
Temp tables
Another even less scalable option is to CREATE TABLE sessiondata.myuser_myrequest_blah AS SELECT .... then paginate that table. It's guaranteed not to change. This avoids the difficulty of needing to keep a consistent connection across requests, but will have a very slow first-request response time and is completely impractical for large user counts or large amounts of data.
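A sketch of that option, reusing the illustrative name from the paragraph above:
-- materialize the full result once, per user/request:
CREATE TABLE sessiondata.myuser_myrequest_blah AS
    SELECT * FROM t WHERE ... ORDER BY date, id;
-- each subsequent page is then a stable slice:
SELECT * FROM sessiondata.myuser_myrequest_blah
ORDER BY date, id
OFFSET 500 LIMIT 500;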
Related questions
Handling paging with changing sort orders
Using "Cursors" for paging in PostgreSQL
How to provide an API client with 1,000,000 database results?
I think you can use a subquery in the WHERE clause to accomplish this.
E.g., given you're paginating through a users table and you want the records after a given user:
SELECT *
FROM users
-- newest-first listing, so the rows "after" the given user have an earlier created_at:
WHERE created_at < (
    SELECT created_at
    FROM users
    WHERE users.id = '00000000-1111-2222-3333-444444444444'
    LIMIT 1
)
ORDER BY created_at DESC
LIMIT 5;

How to implement cursors for pagination in an API

This is similar to this question, which doesn't have any answers. I've read all about how to use cursors with the Twitter, Facebook, and Disqus APIs, and also this article about how Disqus generally built their cursors, but I still cannot seem to grok how they work and how to implement a similar solution in my own projects. Can someone explain the different techniques and the concepts behind them?
Let's first understand why offset pagination fails for large data sets, with an example.
Clients provide two parameters: limit, for the number of results, and offset, for the page offset.
For example, with offset = 40 and limit = 20, we can tell the database to return the next 20 items, skipping the first 40.
Drawbacks:
Using LIMIT OFFSET doesn't scale well for large datasets. As the offset increases, the farther you go within the dataset, the database still has to read up to offset + count rows from disk before discarding the offset and returning only count rows.
If items are being written to the dataset at a high frequency, the page window becomes unreliable, potentially skipping or returning duplicate results.
How do cursors solve this?
Cursor-based pagination works by returning a pointer to a specific item in the dataset. On subsequent requests, the server returns results after the given pointer.
We will use the parameters next_cursor along with limit as the parameters provided by the client in this case.
Let's assume we want to paginate from the most recent user to the oldest user. When the client makes its first request, suppose we select the first page with this query:
SELECT * FROM users
WHERE team_id = %team_id
ORDER BY id DESC
LIMIT %limit
Here limit is set to the client-supplied limit plus one, to fetch one more result than the count the client asked for. The extra result isn't returned in the result set, but we use its ID as the next_cursor.
The response from the server would be:
{
"users": [...],
"next_cursor": "1234", # the user id of the extra result
}
The client would then provide next_cursor as cursor in the second request.
SELECT * FROM users
WHERE team_id = %team_id
AND id <= %cursor
ORDER BY id DESC
LIMIT %limit
With this, we’ve addressed the drawbacks of offset based pagination:
Instead of the window being calculated from scratch on each request based on the total number of items, we’re always fetching the next count rows after a specific reference point. If items are being written to the dataset at a high frequency, the overall position of the cursor in the set might change, but the pagination window adjusts accordingly.
This will scale well for large datasets. We’re using a WHERE clause to fetch rows with id values less than the last id from the previous page. This lets us leverage the index on the column and the database doesn’t have to read any rows that we’ve already seen.
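As a sketch, the supporting index for the queries above might look like this (column names from the example; assuming PostgreSQL syntax):
-- one composite index serves both the team_id filter and the id ordering:
CREATE INDEX idx_users_team_id_desc ON users (team_id, id DESC);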
For a detailed explanation you can read this wonderful engineering article from Slack!
Here is an article about pagination: paginating-real-time-data-cursor-based-pagination
Cursors – we need to have at least one column with unique sequential values to implement cursor based pagination. This can be similar to Twitter’s max_id parameter or Facebook’s after parameter.
In general you pass the current item or page number in the request as a param. Another usual param is the batch size of the page. Then on the server-side backend you select and return the proper dataset, with an SQL query for example.
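For example, a sketch of that page-number approach, using placeholder parameters in the style of the snippets above:
-- page_number starts at 1; page_size is the batch size:
SELECT * FROM users
ORDER BY id
LIMIT %page_size
OFFSET (%page_number - 1) * %page_size;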
Here's what I ended up with: the cursor works as a pointer to an index, and limit picks that many rows starting from that pointer. Say we are given id 10 and limit 5; then it goes to id 10 and picks the next 5 elements.
Some Graph API connections use cursors by default. You can use 'limit' and 'before'/'after' parameters in your call. If you are still not clear, you can post your code here and I can explain using it.

Best Way to Sequentially Parse Through a Table in T-SQL

I'm writing a stored procedure in SQL Server and hoping someone can suggest a more computationally efficient way to handle this problem:
I have a table of Customer Orders (i.e., "product demand") data that contains 3000 line items. Each record expresses the Order Qty for a specific product.
I also have another table of Production Orders (i.e., "product supply") data that contains about 200 line items. Each record expresses the Qty Available for each specific product.
The problem is that there is typically less supply than demand, and therefore the Customer Order table contains an Allocation Priority value that shows each Customer Order's position in line to receive product.
What's the best way to allocate Qty Available in Production Orders to the Order Qty in Customer Orders? Note that you can't allocate more to each Customer Order than has been ordered.
I can do this by creating a WHILE loop and doing the allocation product-by-product, line-by-line but it is very slow.
Is there a faster set-based way to approach this problem?
I don't have data to test against. This would not try to fill partial quantities.
select orders.custID, orders.priority, orders.prodID, orders.qty, SUM(cumu.qty) as cumu_qty
from orders
join orders as cumu
  on cumu.prodID = orders.prodID
  and cumu.priority <= orders.priority  -- this order's qty plus all higher-priority demand
join available
  on available.prodID = orders.prodID
group by orders.custID, orders.priority, orders.prodID, orders.qty, available.qty
having SUM(cumu.qty) <= available.qty   -- keep only orders whose cumulative demand fits the supply
order by orders.custID, orders.priority, orders.prodID
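As a hedged alternative (SQL Server 2012+ window frames; table and column names assumed from the question), a running total per product can allocate supply in one set-based pass and also handle partial fills:
WITH running AS (
    SELECT o.custID, o.priority, o.prodID, o.qty,
           -- cumulative demand for the product, in priority order:
           SUM(o.qty) OVER (PARTITION BY o.prodID
                            ORDER BY o.priority
                            ROWS UNBOUNDED PRECEDING) AS cum_qty
    FROM orders AS o
)
SELECT r.custID, r.priority, r.prodID, r.qty,
       CASE
           WHEN r.cum_qty <= a.qty THEN r.qty                              -- fully filled
           WHEN r.cum_qty - r.qty < a.qty THEN a.qty - (r.cum_qty - r.qty) -- partial fill
           ELSE 0                                                          -- supply exhausted
       END AS allocated_qty
FROM running AS r
JOIN available AS a
  ON a.prodID = r.prodID;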