There are 10, 000 users. Each can define up to 500 conditions for an enterprise supply chain inventory.
An example of a condition could be
Group1
Item in InventoryX > 5000 AND colourItem == Red
AND Group2
Item in InventoryY > 4000 and colourItem == Green
Whenever the state of the database (single row in InventoryX, InventoryY, and colourItem columns) meets to the condition mentioned above, the user who has created the alert should be notified.
The first solution that comes to mind is to continuously keep polling the database, at a given time interval (say 1 minute) but the problem with it would be every minute there will be 10000 X 500 polls.
This is difficult to scale.
We also need to keep in mind that the user's are given a simple front-end to create conditions, and they can update these conditions as per their whim. No hard coding can work.
What would be a better architecture/project to be used to achieve the same?
Database = PostgreSQL.
https://www.postgresql.org/docs/current/sql-notify.html
Appears difficult to implement since there is no ease of usage for so many conditions.
You only have three options:
Poll the database for changes (which as you say, gets expensive).
Check all the rules as the changes are made.
Make a note of changed data as changes are made and check that changed subset in batches.
Whether you prefer #2 or #3 depends on how many rules there are, how long it takes to check them for each changed row and whether you can usefully summarise changes or merge alerts.
Both #2 and #3 would use one or more triggers. Option #2 would run the rules and add entries to an "alerts" queueing table (or NOTIFY a listening process to send an alert, or set a flag in memcached/redis etc).
Option #3 would just note either the IDs of changed rows, or perhaps the details of the change and you would have another process read the changes and generate alerts. This gives you the opportunity to notice that the same change was made twice and only send 1 alert or similar if that is useful to you.
Related
I want to show a different option to the user in workflow through input node, depending upon whether the user has modified the record or not.
Problem is if I would use a condition node with custom class to detect whether object has been modified by some person or not in between the workflow process then as soon as the person clicks on route workflow the save is automatically called and isModified() flag gets false, How do I get in condition node whether some person has modified the record or not.
I have to show different options to the user if he has modified and different option on routing workflow if he have not modified.
Sounds to me like you need to enable eAudit on the object and then to check whether eauditusername on the most recent audit record for that object bears the userid of the current user.
It's a little hokey and tempts fate, but if your condition node is early in the workflow's route when this button is pressed, you could try and check to see if the changedate on the object (assuming you are working with one of the many objects that has one) is within the last 5 seconds. There is a gap where the record could be routed twice within a few seconds, but the gap is fairly hard to hit. There is also a gap where if the system slows down at that point and takes more than 5 seconds to get to and run your condition, then it would appear to be not modified. You can play with the delay to find a sweet spot of the fewest false positives and negatives.
To give a bit of background to my issue, I've got a very basic banking system. The process at the moment goes:
A transaction is added to an Azure Service Bus
An Azure Webjob picks up this message and creates the new row in the SQL DB.
The balance (total) of the account needs to be updated with the value in the message (be it + or -).
So for example if the field is 10 and I get two updates (10, -5) the field needs to be 15 (10 + 10 - 5), it isn't a case of just updating the value, it needs to do some arithmetic.
Now I'm not too sure how to handle the update of the balance as there could be many requests come in so need to update accordingly.
I figured one way is to do the update on the SQL side rather than the web job, but that doesn't help with concurrent updates.
Can I do some locking with the field? But what happens to an update when it is blocked because an update is already in progress? Does it wait or fail? If it waits then this should be OK. I'm using EF.
I figured another way round this is to have another WebJob that will run on a schedule and will add up all the amounts and update the value once, and so this will be the only thing touching that field.
Thanks
One way or another, you will need to serialize write access to account balance field (actually to the whole row).
Having a separate job that picks up "pending" inserts, and eventually updates balance will be ok in case writes are more frequent on your system than reads, or you don't have to always return most recent balance. Otherwise, to get the current balance you will need to do something like
SELECT balance +
ISNULL((SELECT SUM(transaction_amount)
FROM pending_insert pi WHERE pi.user_id = ac.user_id
),0) as actual_balance
FROM account ac
WHERE ac.user_id = :user_id
That is definitely more expensive from performance perspective , but for some systems it's perfectly fine. Another pitfall (again, it may or may not be relevant to your case) is enforcing, for instance, non-negative balance.
Alternatively, you can consistently handle banking transactions in the following way :
begin database transaction
find and lock row in account table
validate total amount if needed
insert record into banking_transaction
update user account, i.e. balance = balance +transasction_amount
commit /rollback
If multiple user accounts are involved, you have to always lock them in the same order to avoid deadlocks.
That approach is more robust, but potentially worse from concurrency point of view (again, it depends on the nature of updates in your application - here the worst case is many concurrent banking transactions for one user, updates to multiple users will go fine).
Finally, it's worth mentioning that since you are working with SQLServer, beware of deadlocks due to lock escalation. You may need to implement some retry logic in any case
You would want to use a parameter substitution method in your sql. You would need to find out how to do that based on the programming language you are using in your web job.
$updateval = -5;
Update dbtable set myvalue = myvalue + $updateval
code example:
int qn = int.Parse(TextBox3.Text)
SqlCommand cmd1 = new SqlCommand("update product set group1 = group1 + #qn where productname = #productname", con);
cmd1.Parameters.Add(new SqlParameter("#productname", TextBox1.Text));
cmd1.Parameters.Add(new SqlParameter("#qn", qn));
then execute.
I found time as the best value as event version.
I can merge perfectly independent events of different event sources on different servers whenever needed without being worry about read side event order synchronization. I know which event (from server 1) had happened before the other (from server 2) without the need for global sequential event id generator which makes all read sides to depend on it.
As long as the time is a globally ever sequential event version , different teams in companies can act as distributed event sources or event readers And everyone can always relay on the contract.
The world's simplest notification from a write side to subscribed read sides followed by a query pulling the recent changes from the underlying write side can simplify everything.
Are there any side effects I'm not aware of ?
Time is indeed increasing and you get a deterministic number, however event versioning is not only serves the purpose of preventing conflicts. We always say that when we commit a new event to the event store, we send the new event version there as well and it must match the expected version on the event store side, which must be the previous version plus exactly one. If there will be a thousand or three millions of ticks between two events - I do not really care, this does not give me the information I need. And if I have missed one event on the go is critical to know. So I would not use anything else than incremental counter, with events versioned per aggregate/stream.
We've got a set of forms in our web application that is managed by multiple staff members. The forms are common for all staff members. Right now, we've implemented a locking mechanism. But the issue is that there's no reliable way of knowing when a user has logged out of the system, so the form needs to be unlocked. I was wondering if there was a better way to manage concurrent users editing the same data.
You can use optimistic concurrency which is how the .Net data libraries are designed. Effectively you assume that usually no one will edit a row concurrently. When it occurs, you can either throw away the changes made, or try and create some nicer retry logic when you have two users edit the same row.
If you keep a copy of what was in the row when you started editing it and then write your update as:
Update Table set column = changedvalue
where column1 = column1prev
AND column2 = column2prev...
If this updates zero rows, then you know that the row changed during the edit and you can then deal with it, or simply throw an error and tell the user to try again.
You could also create some retry logic? Re-read the row from the database and check whether the change made by your user and the change made in the database are able to be safely combined, then do so automatically. Or you could present a choice to the user as to whether they still wish to make their change based on the values now in the database.
Do something similar to what is done in many version control systems. Allow anyone to edit the data. When the user submits the form, the database is checked for changes. If the record has not been changed prior to this submission, allow it as usual. If both changes are the same, ignore the incoming (now redundant) change.
If the second change is different from the first, the record is now in conflict. The user is presented with a new form, which indicates which fields were changed by the conflicting update. It is then the user's responsibility to resolve the conflict (by updating both sets of changes), or to allow the existing update to stand.
As Spence suggested, what you need is optimistic concurrency. A standard website that does no accounting for whether the data has changed uses what I call "last write wins". Simply put, whichever connection saves to the database last, that version of the data is the one that sticks. In optimistic concurrency, you use a "first write wins" logic such that if two connections try to save the same row at the same time, the first one that commits wins and the second is rejected.
There are two pieces to this mechanism:
The rules by which you fail the second commit
How the system or the user handles the rejected commit.
Determining whether to reject the commit
Two approaches:
Comparison column that changes each time a commit happens
Compare the data with its committed version in the database.
The first one entails using something like SQL Server's rowversion data type which is guaranteed to change each time the row changes. The upside is that it makes it simple to roll your own logic to determine if something has changed. When you get the data, you pull the rowversion column's value and when you commit, you compare that value with what is currently in the database. If they are different, the data has changed since you last retrieved it and you should reject the commit otherwise proceed to save the data.
The second one entails comparing the columns you pulled with their existing committed values in the database. As Spence suggested, if you attempt the update and no rows were updated, then clearly one of the criteria failed. This logic can get tricky when some of the values are null. Many object relational mappers and even .NET's DataTable and DataAdapter technology can help you handle this.
Handling the rejected commit
If you do not leave it up to the user, then the form would throw some message stating that the data has changed since they last edited and you would simply re-retrieve the data overwriting their changes. As you can imagine, users aren't particularly fond of this solution especially in a high volume system where it might happen frequently.
A more sophisticated (and also more complicated) approach is to show the user what has changed allow them to choose which items to try to re-commit, Behind the scenes you would retrieve the data again, overwrite the values picked by the user with their entries and try to commit again. In high volume system, this will still be problematic because by the time the user has tried to re-commit, the data may have changed yet again.
The checkout concept is effectively pessimistic concurrency where users "lock" rows. As you have discovered, it is difficult to implement in a stateless environment. Users are notorious for simply closing their browser while they have something checked out or using the Back button to return a set that was checked out and try to recommit it. IMO, it is more trouble than it is worth to try go this route in a web-based solution. Assuming you write the user name that last changed a given row, with optimistic concurrency, you can inform the user whose changes are rejected who saved the data before them.
I have seen this done two ways. The first is to have a "checked out" column in your database table associated with that data. Your service would have to look for this flag to see if it is being edited. You can have this expire after a time threshold is met (with a trigger) if the user doesn't commit changes. The second way is having a dedicated "checked out" table that stores id's and object names (probably the table name). It would work the same way and you would have less lookup time, theoretically. I see concurrency issues using the second method, however.
Why do you need to look for session timeout? Just synchronize access to your data (forms or whatever) and that's it.
UPDATE: If you mean you have "long transactions" where form is locked as soon as user opens editor (or whatever) and remains locked until user commits changes, then:
either use optimistic locking, implement it by versioning of forms data table
optimistic locking can cause loss of work, if user have been away for a long time, then tried to commit his changes and discovered that someone else already updated a form. In this case you may want to implement explicit "locking" of form, where user "locks" form as soon as he starts work on it. Other user will notice that form is "locked" and either communicate with lock owner to resolve issue, or he can "relock" form for himself, loosing all updates of first user in process.
We put in a very simple optimistic locking scheme that works like this:
every table has a last_update_date
field in it
when the form is created
the last_update_date for the record
is stored in a hidden input field
when the form is POSTED the server
checks the last_update_date in the
database against the date in the
hidden input field.
If they match,
then no one else has changed the
record since the form was created so
the system updates the data.
If they don't match, then someone else has
changed the record since the form was
created. The system sends the user back to the form edit page and tells the user that someone else edited the record and they must reapply their changes.
It is very simple and works well enough.
You can use "timestamp" column on your table. Refer: What is the mysterious 'timestamp' datatype in Sybase?
I understand that you want to avoid overwriting existing data with consecutively updates.
If so, when the user opens a screen you have to get last "timestamp" column to the client.
After changing data just before update, you should check the "timestamp" columns(yours and db) to make sure if anyone has changed tha data while he is editing.
If its changed you will alert an error and he has to startover. If it is not, update the data. Timestamp columns updated automatically.
The simplest method is to format your update statement to include the datetime when the record was last updated. For example:
UPDATE my_table SET my_column = new_val WHERE last_updated = <datetime when record was pulled from the db>
This way the update only succeeds if no one else has changed the record since the last read.
You can message to the user on conflict by checking if the update suceeded via a SELECT after the UPDATE.
I keep seeing questions floating through that make reference to a column in a database table named something like DateLastUpdated. I don't get it.
The only companion field I've ever seen is LastUpdateUserId or such. There's never an indicator about why the update took place; or even what the update was.
On top of that, this field is sometimes written from within a trigger, where even less context is available.
It certainly doesn't even come close to being an audit trail; so that can't be the justification. And if there is and audit trail somewhere in a log or whatever, this field would be redundant.
What am I missing? Why is this pattern so popular?
Such a field can be used to detect whether there are conflicting edits made by different processes. When you retrieve a record from the database, you get the previous DateLastUpdated field. After making changes to other fields, you submit the record back to the database layer. The database layer checks that the DateLastUpdated you submit matches the one still in the database. If it matches, then the update is performed (and DateLastUpdated is updated to the current time). However, if it does not match, then some other process has changed the record in the meantime and the current update can be aborted.
It depends on the exact circumstance, but a timestamp like that can be very useful for autogenerated data - you can figure out if something needs to be recalculated if a depedency has changed later on (this is how build systems calculate which files need to be recompiled).
Also, many websites will have data marking "Last changed" on a page, particularly news sites that may edit content. The exact reason isn't necessary (and there likely exist backups in case an audit trail is really necessary), but this data needs to be visible to the end user.
These sorts of things are typically used for business applications where user action is required to initiate the update. Typically, there will be some kind of business app (eg a CRM desktop application) and for most updates there tends to be only one way of making the update.
If you're looking at address data, that was done through the "Maintain Address" screen, etc.
Such database auditing is there to augment business-level auditing, not to replace it. Call centres will sometimes (or always in the case of financial services providers in Australia, as one example) record phone calls. That's part of the audit trail too but doesn't tend to be part of the IT solution as far as the desktop application (and related infrastructure) goes, although that is by no means a hard and fast rule.
Call centre staff will also typically have some sort of "Notes" or "Log" functionality where they can type freeform text as to why the customer called and what action was taken so the next operator can pick up where they left off when the customer rings back.
Triggers will often be used to record exactly what was changed (eg writing the old record to an audit table). The purpose of all this is that with all the information (the notes, recorded call, database audit trail and logs) the previous state of the data can be reconstructed as can the resulting action. This may be to find/resolve bugs in the system or simply as a conflict resolution process with the customer.
It is certainly popular - rails for example has a shorthand for it, as well as a creation timestamp (:timestamps).
At the application level it's very useful, as the same pattern is very common in views - look at the questions here for example (answered 56 secs ago, etc).
It can also be used retrospectively in reporting to generate stats (e.g. what is the growth curve of the number of records in the DB).
there are a couple of scenarios
Let's say you have an address table for your customers
you have your CRM app, the customer calls that his address has changed a month ago, with the LastUpdate column you can see that this row for this customer hasn't been touched in 4 months
usually you use triggers to populate a history table so that you can see all the other history, if you see that the creationdate and updated date are the same there is no point hitting the history table since you won't find anything
you calculate indexes (stock market), you can easily see that it was recalculated just by looking at this column
there are 2 DB servers, by comparing the date column you can find out if all the changes have been replicated or not etc etc ect
This is also very useful if you have to send feeds out to clients that are delta feeds, that is only the records that have been changed or inserted since the data of the last feed are sent.