How does paging work if the DB is modified between pages? - spring-data

The code below works as intended, but is there a better way to do it?
I am consuming a DB table like a queue and processing it in batches of at most MAX_NR_TO_PROCESS. I'm thinking about how I could refactor it to use page.hasNext() and page.nextPageable().
However, I can't find any good tutorial or documentation on what happens if the DB is modified between getting a page and getting the next page.
List<Customer> toBeProcessedList = customerToBeProcessedRepo
        .findFirstXAsCustomer(new PageRequest(0, MAX_NR_TO_PROCESS));
while (!toBeProcessedList.isEmpty()) {
    // do something with each customer and
    // remove the customer and its duplicates from customersToBeProcessed
    toBeProcessedList = customerToBeProcessedRepo
            .findFirstXAsCustomer(new PageRequest(0, MAX_NR_TO_PROCESS));
}
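For comparison, here is the queue-style loop from the question as a minimal Python sketch (not Spring Data). It assumes a plain list stands in for the customersToBeProcessed table; `drain_in_batches` and `handle` are hypothetical names. Because every query asks for page 0 and processed rows are deleted before the next query, rows inserted or deleted in between cannot make the loop skip anything. The catch is that a row which is never deleted would be fetched again forever.

```python
from typing import Callable, List


def drain_in_batches(table: List[str], batch_size: int,
                     process: Callable[[str], None]) -> List[List[str]]:
    """Consume `table` like a queue: always fetch the first page, process it,
    delete what was processed, then fetch the first page again until empty."""
    batches: List[List[str]] = []
    while True:
        batch = table[:batch_size]        # stands in for PageRequest(0, batch_size)
        if not batch:
            break
        batches.append(batch)
        for customer in batch:
            if customer not in table:     # already removed as a duplicate
                continue
            process(customer)
            # remove the customer and all of its duplicates, in place
            table[:] = [c for c in table if c != customer]
    return batches


# Usage: a new row ("X") arrives while the first batch is being processed.
queue = ["A", "B", "A", "C", "D"]
seen: List[str] = []


def handle(customer: str) -> None:
    seen.append(customer)
    if customer == "B":
        queue.append("X")


batches = drain_in_batches(queue, 2, handle)
print(batches)  # [['A', 'B'], ['C', 'D'], ['X']]
```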

If you use the paging support, a new SQL statement is executed for each page requested, and unless you do something fancy (and probably unwise), those statements run in different transactions. This can lead to getting elements multiple times, or not seeing them at all, as the user moves from page to page.
Example: Page size 3; Elements to start: A, B, C, D, E, F
User opens the first page and sees
A, B, C (total number of pages is 2)
Element X gets inserted after B; the user moves to the next page and sees
C, D, E (total number of pages is now 3)
If, instead of X being inserted, C gets deleted, page 2 will show
E, F
since D moves to the first page and is never shown.
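The page shifts in this example can be reproduced with a small Python sketch, assuming plain list slicing stands in for a LIMIT/OFFSET query and that each call to the hypothetical `page` helper is a separate query in its own transaction:

```python
from typing import List


def page(rows: List[str], page_number: int, size: int) -> List[str]:
    """Offset paging: each call is a fresh query (LIMIT size OFFSET page_number*size)."""
    start = page_number * size
    return rows[start:start + size]


def total_pages(rows: List[str], size: int) -> int:
    return -(-len(rows) // size)  # ceiling division


rows = ["A", "B", "C", "D", "E", "F"]
first = page(rows, 0, 3)                      # ['A', 'B', 'C'], 2 pages in total

inserted = rows[:2] + ["X"] + rows[2:]        # X inserted after B before page 2 is read
second_after_insert = page(inserted, 1, 3)    # ['C', 'D', 'E']: C is shown twice

deleted = [r for r in rows if r != "C"]       # alternatively, C is deleted
second_after_delete = page(deleted, 1, 3)     # ['E', 'F']: D is never shown
```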
In theory, one could use a long-running transaction with read stability (if the underlying database supports it) to get consistent pages, BUT this raises questions like:
When does this transaction end, so the user gets to see new or changed data?
What happens to this transaction when the user moves away?
This approach would also have rather high resource costs, while the actual benefit is not at all clear.
So in 99 out of 100 cases the default approach is pretty reasonable.
Footnote: I kind of assumed relational databases, but other stores should behave in basically the same way.

Related

Pattern for updating a recursive linear tree?

Let's say you have a factory, where you put together different things. There are requests for those things, and you want to monitor what's needed to construct the thing.
For example, for a simplified car you could have a request for:
car 2 (1)
chassis 2 (1)
wheels 8 (4)
tyres 8 (1)
rims 8 (1)
motor 2 (1)
The numbers next to the parts indicate the amounts needed in real time, the numbers in parentheses indicate the amounts needed to construct one parent, and the indentation shows the tree structure. The children of a part show how much of each is needed to construct that parent.
At any time a wheel could become available in the inventory, which would update the amount of wheels needed to 7, and that in turn would update the amounts of tyres and rims needed to 7.
Similarly, a whole car could become available, reducing chassis to 1, motor to 1, and wheels from 7 to 3.
It may seem like a simple problem, but I've now spent months trying to figure out a safe way to do it.
The inventories are tracked, and each inventory has properties such as its creation time, which item it is, and how much is available. Inventories can also be dedicated to a specific request.
When a new "shipment" comes in, it contains new inventories. When new inventories come in, a check runs to see whether any request needs that inventory.
Once an inventory is dedicated to a request, the request's amount needed is updated, and so is the amount needed of all its children.
When an inventory is dedicated to a request, a new inventory is created with the dedicated amount and the same properties, except that it is dedicated to the request. The original inventory's amount is decreased by the amount used by the request.
There are a lot of possible problems with this.
Let's start with the main problem. Multiple inventories can come in in parallel, trying to dedicate themselves to the same request. A recursive function runs that needs to update all the children in the request's subtree: the parent request is read, given the amount it got from inventory, and its children are updated.
Step by step:
1. one shipment of `car` comes in
2. check whether any requests need `car`
3. assign general inventory of `1 car` as dedicated inventory to the request
4. reduce the `car` request's amount needed by `1`
5. read the `car` request's children, and for each child:
5.1. read the available inventory for the child request
5.2. update the child request's amount needed with `parentRequest.amountNeeded * childRequest.amountNeededPerParent - childRequestAvailableInventory`
5.3. run step 5 recursively for the child's children
So every request has a field that shows how much inventory is needed to construct the parent request. The formula for it is parentRequest.amountNeeded * request.amountNeededPerParent - requestAvailableInventory.
At any point, any request can get inventory, and when that happens, the request's tree must be updated cascading down, updating each amount needed.
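A single-threaded Python sketch of the cascade formula, reproducing the wheel and car numbers above. The tree shape (chassis and motor under car, wheels under chassis, tyres and rims under wheels) is an assumption, since the list above does not show it, and `Request`, `update_amount_needed` and `snapshot` are hypothetical names. The root is treated as a child of a virtual parent whose amount needed is the ordered quantity. It deliberately ignores concurrency, which is what the issues below are about.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Request:
    name: str
    amount_needed_per_parent: int
    children: List["Request"] = field(default_factory=list)
    amount_needed: int = 0


def update_amount_needed(request: Request, parent_needed: int,
                         available: Dict[str, int]) -> None:
    """amountNeeded = parent.amountNeeded * amountNeededPerParent - availableInventory,
    applied top-down to the whole subtree."""
    request.amount_needed = (parent_needed * request.amount_needed_per_parent
                             - available.get(request.name, 0))
    for child in request.children:
        update_amount_needed(child, request.amount_needed, available)


def build_car_request() -> Request:
    # assumed shape: car -> chassis -> wheels -> (tyres, rims); car -> motor
    wheels = Request("wheels", 4, [Request("tyres", 1), Request("rims", 1)])
    chassis = Request("chassis", 1, [wheels])
    return Request("car", 1, [chassis, Request("motor", 1)])


def snapshot(request: Request, out: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    out = {} if out is None else out
    out[request.name] = request.amount_needed
    for child in request.children:
        snapshot(child, out)
    return out


car = build_car_request()
update_amount_needed(car, 2, {})                          # 2 cars requested
initial = snapshot(car)
update_amount_needed(car, 2, {"wheels": 1})               # one wheel arrives
after_wheel = snapshot(car)     # wheels, tyres and rims drop to 7
update_amount_needed(car, 2, {"wheels": 1, "car": 1})     # a whole car arrives
after_car = snapshot(car)       # car, chassis, motor 1; wheels, tyres, rims 3
```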
First issue:
Between reading the children and reading a child's available inventory, the child request may get updated.
Second issue:
Between reading the child's available inventory and updating the child's amount needed, both the child request and its available inventory can change.
Third issue:
I'm using MongoDB and cannot update the request's amount needed and create the dedicated inventory at exactly the same time. So it's not guaranteed that the request's amount needed will stay in sync with the request's dedicated inventory amount.
Draft function:
const updateChildRequestsAmountNeeded = async (
  parentRequest: Request & Document,
) => {
  const childRequests = await RequestModel.find({
    parentRequestId: parentRequest._id,
  }).exec();
  return Promise.all(
    childRequests.map(async (childRequest) => {
      const availableInventory = await getAvailableInventory({
        requestId: childRequest._id,
      });
      const amountNeeded =
        parentRequest.amountNeeded * childRequest.amountNeededPerParent - availableInventory;
      childRequest.set({ amountNeeded });
      await childRequest.save();
      await updateChildRequestsAmountNeeded(childRequest);
    }),
  );
};
Here are examples of where it can go wrong.
Initial state for each case:
A amountNeeded: 5
B amountNeeded: 5 (amountNeededPerParent: 1)
A available: 0
B available: 0
1. Parent amount needed decreases (A1 and A2 are the same request; the number is the ID of the parallel process):
   1. A1 gets inventory (1)
   1. A2 gets inventory (2)
   2. A1 amount needed updated (4)
   2. A2 amount needed updated (2)
   3. A2's children read (B2) (needed 5)
   3. A1's children read (B1) (needed 5)
   6. B2 amount needed updated (to 2)
   6. B1 amount needed updated (to 4) (should be 2)
2. Request gets inventory while updating:
   1. A gets inventory (1)
   2. A amount needed updated (4)
   3. A's children read (B)
   4. B available inventory read (0)
   5. B gets inventory (1)
   6. B amount needed updated (4)
   7. B amount needed updated by A's cascade (4) (should be 3)
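Case 2 can be replayed deterministically in a single thread to make the lost update visible. This Python sketch uses a plain dict as a stand-in for the MongoDB collections; the keys and the `replay_case_2` name are hypothetical. The write in step 7 is computed from a read made in step 4, so whatever B received in between is overwritten.

```python
from typing import Dict


def replay_case_2() -> Dict[str, int]:
    """Replay case 2 step by step, with a dict standing in for the database."""
    db = {"A.needed": 5, "B.needed": 5, "B.available": 0}
    # 1-2. A gets 1 unit of inventory; A's amount needed drops to 4
    db["A.needed"] -= 1
    # 3-4. A's cascade reads its child B and B's available inventory (0)
    parent_needed = db["A.needed"]
    b_available_seen_by_a = db["B.available"]
    # 5-6. Meanwhile B gets 1 unit; B's own update lowers B.needed by 1 (5 -> 4)
    db["B.available"] += 1
    db["B.needed"] -= 1
    # 7. A's cascade writes a value computed from the stale read
    #    (amountNeededPerParent = 1)
    db["B.needed"] = parent_needed * 1 - b_available_seen_by_a
    return db


db = replay_case_2()
correct = db["A.needed"] * 1 - db["B.available"]
print(db["B.needed"], correct)  # 4 3
```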
I've tried to find a way to solve the issue and never overwrite the amount needed with outdated data, but couldn't find one. Maybe MongoDB is the wrong choice, or the whole data structure is, or maybe there is a pattern for updating recursive data atomically.

Load a mutable list dynamically (e.g. for a RecyclerView) from a backend. Probably used in Facebook, Gmail, 9GAG, Instagram and co.

I have a question about loading an extremely long, mutable list of posts.
First of all, for people who don't know Android's RecyclerView: you load a batch of posts from a server, say 25 posts, and show them in a scrollable list. When you scroll further and want to access the item at position 25, you load the next batch from the server. The RecyclerView is my concrete problem, but the same problem would occur on websites and so on.
So this is the problem: let's say the posts are ordered by likes, and we have posts in the order {a, b, c, d, e, f, g, h} with decreasing likes. Our batch size is 3, so we always load 3 posts. Our screen can show 2 posts at a time. We begin by loading {a, b, c}. We wait 10 minutes before we scroll down and load pos3, pos4 and pos5. In that time the likes change and the new order is {a, b, d, c, f, e, g, h}. Our next batch will be {c, f, e}, since our backend request will be something like backend.com/get3items/?batch=1. Since our screen can show 2 posts, it could show pos2 and pos3, which is {c, c}.
The same could happen if we remove or add an item: batch size = 1, list = {a, b, c}. We load a. A new item g is added to the beginning of the list. We load a again.
Other facts: the number of items on the screen can be higher or lower than the batch size. Changes to the list can happen at any position; maybe we are loading batch #500 but there was a change at pos0. The code behind backend.com/get3items/?batch=1 could be "select * from items order by likes desc limit 3, 3".
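Both scenarios can be reproduced with a short Python sketch, assuming list slicing stands in for the `limit offset, count` query above; `get_batch` is a hypothetical name:

```python
from typing import List


def get_batch(items: List[str], batch: int, batch_size: int) -> List[str]:
    """Like: SELECT * FROM items ORDER BY likes DESC LIMIT batch*batch_size, batch_size"""
    start = batch * batch_size
    return items[start:start + batch_size]


before = ["a", "b", "c", "d", "e", "f", "g", "h"]
after = ["a", "b", "d", "c", "f", "e", "g", "h"]   # likes changed in the meantime

shown = get_batch(before, 0, 3) + get_batch(after, 1, 3)
print(shown)            # ['a', 'b', 'c', 'c', 'f', 'e']: c twice, d never

# batch size 1: g is inserted at the front between the two requests
first = get_batch(["a", "b", "c"], 0, 1)
second = get_batch(["g", "a", "b", "c"], 1, 1)
print(first, second)    # ['a'] ['a']
```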
I thought of these approaches for an answer:
1) The server notifies the client via the observer pattern that a change happened. Not useful, because with something like 1 million subscribers the server would be overwhelmed on every change (and it is not RESTful, since the server has to track subscribers).
2) The server can calculate what the list looked like at a previous time. Whoa, is there a backend technique for this that I've never heard of?
3) Every time a new batch is loaded, the other batches in the buffer and the list are invalidated. The RecyclerView then calls notifyDataSetChanged and all shown items are reloaded (so only the batches that are needed are loaded again). When loading those batches, we have to make sure the other batches are not invalidated again, since we would get an endless loop of refreshes. On the other hand, what happens when there is a change while these batches are being reloaded? The same problem again, so unfortunately this is not the solution.
If I look at the posts on 9GAG: you can scroll, but you get the items as of the time you loaded the site (and in the old order). You only see new posts if you refresh the page. So is there anything like what I described in 2)?
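One way to get the behaviour described in 2) is to freeze the ordering when the client starts browsing and serve every later batch from that frozen ordering. The following Python sketch assumes the server keeps an in-memory list of post IDs per client token; `SnapshotFeed` and its methods are hypothetical, and this is not a claim about how 9GAG actually works.

```python
import uuid
from typing import Dict, List


class SnapshotFeed:
    """Hypothetical server-side helper: freeze the ordering of post IDs when a
    client starts browsing, then serve every later batch from that frozen copy."""

    def __init__(self) -> None:
        self._snapshots: Dict[str, List[str]] = {}

    def open(self, current_order: List[str]) -> str:
        token = uuid.uuid4().hex
        # copy the list so later reordering does not leak into this client's view
        self._snapshots[token] = list(current_order)
        return token

    def batch(self, token: str, batch: int, size: int) -> List[str]:
        ids = self._snapshots[token]
        return ids[batch * size:(batch + 1) * size]


live = ["a", "b", "c", "d", "e", "f", "g", "h"]
feed = SnapshotFeed()
token = feed.open(live)              # client loads the page
first = feed.batch(token, 0, 3)      # ['a', 'b', 'c']
live[:] = ["a", "b", "d", "c", "f", "e", "g", "h"]   # likes change
second = feed.batch(token, 1, 3)     # ['d', 'e', 'f'] from the frozen order
```

The trade-off is per-client state on the server (which would have to expire), and posts added after `open` only appear after a refresh. Since only IDs are frozen, post contents and like counts can still be fetched fresh.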

Getting ID fields from the primary table into the linked table via Form

As an amateur coder for some years, I have generally used subforms when dealing with linked tables, to make the transfer of the ID field from the primary form to the subform nice and simple...
However, in my latest project the main form is a continuous form with a list of delivery runs (Date, RunName, RunCompleted, etc.). Linked to this primary table is a delivery list containing (SKU of product, Qty, etc.). I use a simple Relationship between the two tables.
Now, on the main (RUNS) form, at the end of each row is a button that opens the DELIVERIES form and displays all records with a matching RUNID.
This is fine for displaying pre-existing data but when I want to add new records I have been using the following code attached to the OnCurrent event:
Me.RunID = DLookup("[RunID]", "tbl_BCCRuns", "RunID = " & Forms![frm_BCC_Runs_list]![RunID])
I have also used:
Forms![frm_BCC_Deliveries].Controls![RunID] = Forms![tbl_BCCRuns].Controls![RunID]
(Note: above done from memory and exact code may be incorrect but that's not the problem at hand)
Now... Both these options give me what I need however...
I find that as I am working on the database, or if you open certain forms in the right order (a bug I clearly need to identify and fix), you can open the DELIVERIES form without the filter (to view all deliveries, for argument's sake) and the top entry (usually the oldest record) suddenly adopts the RUNID of the selected record back in the main form.
Now, my question is this, and the answer may be a simple "no" and that's fine, I'll move on...
Is there a better way, one I am not familiar with or just don't know about due to my inconsistent Access progress, to transfer IDs to a form without risking contamination from improper use? Or do I just have to bite the bullet and make sure there is no possible way for that to happen?
In an effort to alleviate the issue, I have created a display-only form for viewing the deliveries, but there are still times when I need to access the live historical data to modify other fields without wanting to modify the RUNID.
Any pointers greatly appreciated...
Since you only want to pull the RunID if the form is on a new record row, do a check to verify this is a new record.
If Me.NewRecord Then
Me.RunID = DLookup("[RunID]", "tbl_BCCRuns", "RunID = " & Forms![frm_BCC_Runs_list]![RunID])
End If
You could also consider a technique to synchronize parent and child forms when both are subforms on a main form (the main form does not have to be bound): https://www.fmsinc.com/MicrosoftAccess/Forms/Synchronize/LinkedSubforms.asp

Handling multiple updates to a single DB field

To give a bit of background to my issue, I've got a very basic banking system. The process at the moment goes:
A transaction is added to an Azure Service Bus
An Azure Webjob picks up this message and creates the new row in the SQL DB.
The balance (total) of the account needs to be updated with the value in the message (be it + or -).
So for example if the field is 10 and I get two updates (10, -5) the field needs to be 15 (10 + 10 - 5), it isn't a case of just updating the value, it needs to do some arithmetic.
Now I'm not too sure how to handle the update of the balance, as many requests could come in at once and the balance needs to reflect all of them.
I figured one way is to do the update on the SQL side rather than the web job, but that doesn't help with concurrent updates.
Can I do some locking with the field? But what happens to an update when it is blocked because an update is already in progress? Does it wait or fail? If it waits then this should be OK. I'm using EF.
I figured another way round this is to have another WebJob that will run on a schedule and will add up all the amounts and update the value once, and so this will be the only thing touching that field.
Thanks
One way or another, you will need to serialize write access to account balance field (actually to the whole row).
Having a separate job that picks up "pending" inserts and eventually updates the balance will be OK if writes are more frequent than reads on your system, or if you don't always have to return the most recent balance. Otherwise, to get the current balance you will need to do something like
SELECT balance +
ISNULL((SELECT SUM(transaction_amount)
FROM pending_insert pi WHERE pi.user_id = ac.user_id
),0) as actual_balance
FROM account ac
WHERE ac.user_id = :user_id
That is definitely more expensive from a performance perspective, but for some systems it's perfectly fine. Another pitfall (again, it may or may not be relevant to your case) is enforcing, for instance, a non-negative balance.
Alternatively, you can consistently handle banking transactions in the following way:
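A runnable version of this query against SQLite from Python, assuming integer amounts (e.g. cents) and the table and column names used above; COALESCE stands in for SQL Server's ISNULL, and `actual_balance` is a hypothetical helper name:

```python
import sqlite3
from typing import Optional


def actual_balance(conn: sqlite3.Connection, user_id: int) -> Optional[int]:
    """Stored balance plus the sum of pending inserts not yet applied to it."""
    row = conn.execute(
        """
        SELECT ac.balance +
               COALESCE((SELECT SUM(pi.transaction_amount)
                         FROM pending_insert pi
                         WHERE pi.user_id = ac.user_id), 0) AS actual_balance
        FROM account ac
        WHERE ac.user_id = :user_id
        """,
        {"user_id": user_id},
    ).fetchone()
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (user_id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
    CREATE TABLE pending_insert (user_id INTEGER NOT NULL,
                                 transaction_amount INTEGER NOT NULL);
    INSERT INTO account VALUES (1, 10);
    INSERT INTO pending_insert VALUES (1, 10), (1, -5);
""")
print(actual_balance(conn, 1))   # 15
```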
begin database transaction
find and lock row in account table
validate total amount if needed
insert record into banking_transaction
update user account, i.e. balance = balance + transaction_amount
commit / rollback
If multiple user accounts are involved, you have to always lock them in the same order to avoid deadlocks.
That approach is more robust, but potentially worse from concurrency point of view (again, it depends on the nature of updates in your application - here the worst case is many concurrent banking transactions for one user, updates to multiple users will go fine).
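The steps above can be sketched with Python's sqlite3 module. This is a minimal sketch under stated assumptions: SQLite has no row-level locks, so `BEGIN IMMEDIATE` takes the database write lock instead of locking just the account row, and the `banking_transaction` schema and the non-negative-balance rule are illustrative. On SQL Server the lock step would typically be a `SELECT ... WITH (UPDLOCK, ROWLOCK)` inside the transaction.

```python
import sqlite3


class InsufficientFunds(Exception):
    pass


def post_transaction(conn: sqlite3.Connection, user_id: int, amount: int) -> None:
    """begin -> lock -> validate -> insert -> update -> commit, or rollback on error."""
    conn.execute("BEGIN IMMEDIATE")   # SQLite: write lock on the whole database
    try:
        row = conn.execute(
            "SELECT balance FROM account WHERE user_id = ?", (user_id,)).fetchone()
        if row is None:
            raise KeyError(user_id)
        if row[0] + amount < 0:       # example rule: no negative balances
            raise InsufficientFunds(user_id)
        conn.execute(
            "INSERT INTO banking_transaction (user_id, amount) VALUES (?, ?)",
            (user_id, amount))
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE user_id = ?",
            (amount, user_id))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None: the sqlite3 module leaves BEGIN/COMMIT to our code
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE account (user_id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
    CREATE TABLE banking_transaction (id INTEGER PRIMARY KEY,
                                      user_id INTEGER NOT NULL,
                                      amount INTEGER NOT NULL);
    INSERT INTO account VALUES (1, 10);
""")
post_transaction(conn, 1, 10)
post_transaction(conn, 1, -5)
```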
Finally, it's worth mentioning that since you are working with SQL Server, you should beware of deadlocks due to lock escalation. You may need to implement some retry logic in any case.
You would want to do the arithmetic in the UPDATE statement itself and pass the value in using parameter substitution. How to do that depends on the programming language you are using in your web job.
$updateval = -5;
Update dbtable set myvalue = myvalue + $updateval
Code example (C#):
int qn = int.Parse(TextBox3.Text);
SqlCommand cmd1 = new SqlCommand(
    "update product set group1 = group1 + @qn where productname = @productname", con);
cmd1.Parameters.Add(new SqlParameter("@productname", TextBox1.Text));
cmd1.Parameters.Add(new SqlParameter("@qn", qn));
cmd1.ExecuteNonQuery();
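To see why the arithmetic belongs in the UPDATE, here is a Python/SQLite sketch that applies the 10 and -5 updates from the question twice: once as read-modify-write in application code with two workers interleaved, once as a parameterized relative update. The interleaving is simulated in a single thread, and the `product`/`widget` data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (productname TEXT PRIMARY KEY, group1 INTEGER NOT NULL)")
conn.execute("INSERT INTO product VALUES ('widget', 10)")


def group1(name: str) -> int:
    return conn.execute(
        "SELECT group1 FROM product WHERE productname = ?", (name,)).fetchone()[0]


# Read-modify-write in application code, with two workers interleaved:
a = group1("widget")        # worker 1 reads 10
b = group1("widget")        # worker 2 reads 10
conn.execute("UPDATE product SET group1 = ? WHERE productname = ?", (a + 10, "widget"))
conn.execute("UPDATE product SET group1 = ? WHERE productname = ?", (b - 5, "widget"))
lost = group1("widget")     # 5: the +10 was overwritten

# Relative, parameterized update: each statement reads and writes the current value
conn.execute("UPDATE product SET group1 = 10 WHERE productname = 'widget'")
for qn in (10, -5):
    conn.execute(
        "UPDATE product SET group1 = group1 + ? WHERE productname = ?", (qn, "widget"))
kept = group1("widget")     # 15
```

In SQL Server, as in most relational databases, a single UPDATE reads and writes the row under its own lock, so a concurrent relative update waits for that lock instead of overwriting the first one.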

How to get Goal Funnel Step data such as "entered" and "proceeded" through Query API?

When looking at the Goal Funnel report on the Google Analytics website, I can see not only the number of goal starts and completions but also how many visits each step got.
How can I get the step data through the Google Analytics API?
I am testing with the Query Explorer on a goal with 3 steps, where the 1st step is marked as Required.
I was able to get starts and completions by using goalXXStarts and goalXXCompletions:
https://www.googleapis.com/analytics/v3/data/ga?ids=ga%3A90593258&start-date=2015-09-12&end-date=2015-10-12&metrics=ga%3Agoal7Starts%2Cga%3Agoal7Completions
However, I can't figure out a way to get the data for the goal's second step.
I tried using ga:users or ga:uniquePageviews with the URL of step 2 as the page and step 1 as ga:previousPagePath (since step 1 is required), and adding to that the ga:users or ga:uniquePageviews from the next stage with ga:previousPagePath of step 1, for backfill.
I also tried other combinations, but could never reach the right number or close to it.
One technique that can be used to perform conversion funnel analysis with the Google Analytics Core Reporting API is to define a segment for each step in the funnel. If the first step of the funnel is a 'required' step, then that step must also be included in segments for each of the subsequent steps.
For example, if your funnel has three steps named A, B, and C, then you will need to define a segment for A, another for B, and another again for C.
If step A is required then:
Segment 1: viewed page A,
Segment 2: viewed page A and viewed page B,
Segment 3: viewed page A and viewed page C.
Otherwise, if step A is NOT required then:
Segment 1: viewed page A,
Segment 2: viewed page B,
Segment 3: viewed page C.
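The segment logic above can be mirrored offline in Python to sanity-check the numbers the API returns. This sketch assumes each session is represented as a set of page paths (ignoring order and hit-level detail), and `funnel_counts` is a hypothetical helper, not part of the Analytics API:

```python
from typing import List, Sequence, Set


def funnel_counts(sessions: Sequence[Set[str]], steps: Sequence[str],
                  first_step_required: bool) -> List[int]:
    """Count sessions matching each step's segment.
    Segment i = "viewed steps[i]", plus "viewed steps[0]" when step 1 is required."""
    counts = [0] * len(steps)
    for pages in sessions:
        for i, step in enumerate(steps):
            matched = step in pages
            if first_step_required and i > 0:
                matched = matched and steps[0] in pages
            if matched:
                counts[i] += 1
    return counts


sessions = [{"/a", "/b", "/c"}, {"/a", "/b"}, {"/b"}, {"/c"}, {"/a"}]
steps = ["/a", "/b", "/c"]
print(funnel_counts(sessions, steps, first_step_required=True))    # [3, 2, 1]
print(funnel_counts(sessions, steps, first_step_required=False))   # [3, 3, 2]
```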
To obtain the count for each step in the funnel, you perform a query against each segment to obtain the number of sessions where that segment matches. Additionally, you can query the previous and next pages, including entrances and exits, for each step (if you need to); in which case, query previousPagePath and pagePath as dimensions along with metrics uniquePageviews, entrances and exits. Keep in mind the difference between 'hit-level' vs 'session-level' data when performing, constructing and interpreting the results of each query.
You can also achieve similar results by using sequential segmentation which will offer you finer control over how the funnel steps are counted, as well as allowing for non-sequential funnel analysis if required.