I'm using ServerValue.increment in Flutter to update an inventory amount in Firebase. It is a nice solution when my users are offline, but I need to handle the following case:
User 1 reads an inventory of 40 (for example) and immediately goes offline
User 2 reads the same inventory (40) and spends 10, so the online inventory is updated to 30
User 1 spends 35 (less than 40). When he/she goes back online, the inventory is updated to -5 (30 - 35)
I would like to detect this negative number to execute a procedure. How can I detect it in Firebase?
I'm using ServerValue.increment in this way:
db.child('quantityInStock')
    .set(ServerValue.increment(-quantityToReduce.round()));
How can I detect when quantityInStock ends up being a negative number in order to execute a new procedure automatically?
If the new value depends on the existing value in the way you describe, you have two options:
Use security rules to ensure the write operation is only allowed when there's enough inventory.
".write": "newData.val() >= 0"
Use a transaction so that your client actively checks the current value before determining the new value.
dataRef.runTransaction((MutableData transaction) async {
  // Only decrement when there is enough stock; otherwise leave the value unchanged.
  if (transaction.value >= 40) {
    transaction.value = transaction.value - 40;
  }
  return transaction;
});
Both approaches have advantages and disadvantages.
For example: using security rules in your scenario with an offline user may prevent your application code from knowing the write was rejected, as completion listeners are not persisted across app restarts.
Using a transaction you won't have this problem, but in that case your app will only work when the user is connected to the database. Transactions don't work when the user is offline.
I am building an app that helps people transfer money from one account to another. I have two tables, "Users" and "Transactions". The way I am currently handling transfers is as follows:
Check if the sender has enough balance to make a transfer.
Deduct the balance from the sender and update the sender's balance.
Add the amount deducted from the sender's account to the recipient's account and then update the balance of the recipient.
Then finally I write the transaction record to the "Transactions" table as a single entry, like below:
id | transactionId | senderAccount | recipientAccount | Amount
---+---------------+---------------+------------------+-------
1  | ijiej33       | A             | B                | 100
So my question is: is recording a transaction as a single entry like above good practice, or will this kind of database model design produce future challenges?
Thanks
Check if the sender has enough balance to make a transfer.
Deduct the balance from the sender and update the sender's balance.
Yes, but.
If two concurrent connections attempt to deduct money from the sender at the same time, each may check the balance and find it sufficient for its own transaction, and both may then succeed even though the balance is insufficient to cover both.
You must use a SELECT FOR UPDATE when checking. This will lock the row for the duration of the transaction (until COMMIT or ROLLBACK), and any concurrent connection attempting to also SELECT FOR UPDATE on the same row will have to wait.
Presumably the receiver account can always receive money, so there is no need to lock it explicitly, but the UPDATE will lock it anyway. And locks must always be acquired in the same order or you will get deadlocks.
For example, if one transaction locks rows 1 then 2, while another locks rows 2 then 1: the first one will lock 1, the second will lock 2, then the first will try to lock 2 but it is already locked, and the second will try to lock 1 but it is also already locked by the other transaction. Both transactions will wait for each other forever until the deadlock detector nukes one of them.
One simple way to dodge this is to use ORDER BY:
SELECT ... FROM users WHERE user_id IN (sender_id,receiver_id)
ORDER BY user_id FOR UPDATE;
This will lock both rows in the order of their user_ids, which will always be the same.
Then you can do the rest of the procedure.
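For illustration, here is a minimal application-level sketch of the transfer using node-postgres; the table and column names (users.user_id, users.balance, transactions) are assumptions, error handling is reduced to a rollback, and in production you may prefer the plpgsql stored procedure approach recommended next.
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the usual PG* environment variables

// Illustrative only: schema names are assumptions, not a definitive implementation.
async function transfer(senderId: number, receiverId: number, amount: number): Promise<void> {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");

    // Lock both rows in a fixed order (ORDER BY user_id) to avoid deadlocks.
    const { rows } = await client.query(
      "SELECT user_id, balance FROM users WHERE user_id IN ($1, $2) ORDER BY user_id FOR UPDATE",
      [senderId, receiverId]
    );
    const sender = rows.find((r) => r.user_id === senderId);
    if (!sender || Number(sender.balance) < amount) {
      throw new Error("insufficient balance");
    }

    await client.query("UPDATE users SET balance = balance - $1 WHERE user_id = $2", [amount, senderId]);
    await client.query("UPDATE users SET balance = balance + $1 WHERE user_id = $2", [amount, receiverId]);

    // Record the transfer for auditing; storing the pre-transfer balances here too is a good idea.
    await client.query(
      "INSERT INTO transactions (sender_account, recipient_account, amount) VALUES ($1, $2, $3)",
      [senderId, receiverId, amount]
    );

    await client.query("COMMIT");
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}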
Since it is always a good idea to hold locks for the shortest amount of time, I'd recommend putting the whole thing inside a plpgsql stored procedure, including the COMMIT/ROLLBACK and error handling. Try to make the stored procedure failsafe and atomic.
Note, for security purposes, you should:
Store the balance of both accounts, as it was before the money transfer occurred, in the transactions table. You're already SELECTing it in the SELECT FOR UPDATE, so you might as well use it. It will be useful for auditing.
For security, if a user gets their password stolen there's not much you can do, but if your application gets hacked it would be nice if the hacker was not able to issue global UPDATEs to all the account balances, mess with the audit tables, etc. This means you need to read up on this and create several postgres users/roles with suitable permissions for backup, web application, etc. Some tables and especially the transactions table should have all UPDATE privileges revoked, and INSERT allowed only for the transactions stored procs, for example. The aim is to make the audit tables impossible to modify, basically append-only from the point of view of the application code.
Likewise you can handle updates to the balance via stored procedures and forbid the web application role from messing with it. You could even add a user-specific security token, passed as a parameter to the stored proc, to authenticate the app user to the database, so the database only allows transfers from the account of the user who is logged in, not just any user.
Basically if it involves money, then it involves legislation, and you have to think about how not to go to jail when your web app gets hacked.
I am simulating multiple concurrent requests for MongoDB's "update".
Here is the thing: I insert a document with amount=1000 into MongoDB, and every time I trigger the API it updates the amount by amount += 50 and saves it back to the database. Basically it is a find-and-update operation on a single document.
err := globalDB.C("bank").Find(bson.M{"account": account}).One(&entry)
if err != nil {
panic(err)
}
wait := Random(1, 100)
time.Sleep(time.Duration(wait) * time.Millisecond)
//step 3: add current balance and update back to database
entry.Amount = entry.Amount + 50.000
err = globalDB.C("bank").UpdateId(entry.ID, &entry)
Here is the source code for the project.
I am simulating requests using Vegeta:
If I set -rate=10 (which means the API is triggered 10 times in one second, so the result should be 1000 + 50 * 10 = 1500), the data is correct:
echo "GET http://localhost:8000" | \
vegeta attack -rate=10 -connections=1 -duration=1s | \
tee results.bin | \
vegeta report
But -rate=100 (which means the API is triggered 100 times in one second, so the result should be 1000 + 50 * 100 = 6000) produces a very confusing result:
echo "GET http://localhost:8000" | \
vegeta attack -rate=100 -connections=1 -duration=1s | \
tee results.bin | \
vegeta report
In short, the thing I want to know is this: I thought MongoDB uses optimistic concurrency control, which means that if there is a write conflict it should retry, so the latency will go up but the data should be guaranteed to be correct.
Why does the result look like data correctness is not guaranteed at all in MongoDB?
I know some of you might notice the sleep at lines 41 and 42 of the full source, but even when I comment it out and test with -rate=500, the result is still not correct.
Any clues why this is happening?
Generally you should extract the relevant segment of the code into the question. It is inconsiderate to ask people to locate the 5 relevant lines in your 76 line program.
Your test is performing concurrent find-and-modify operations. Let's suppose there are two concurrent processes A and B that each increment account balance by 50. Starting balance is 0. The order of operations could be:
A: what is the current balance for account 1234?
B: what is the current balance for account 1234?
DB -> A: balance for account 1234 is 0
DB -> B: balance for account 1234 is 0
A: new balance is 0+50 = 50
A: set balance for account 1234 to 50
DB -> A: ok, new balance for account 1234 is 50
B: new balance is 0+50 = 50
B: set balance for account 1234 to 50
DB -> B: ok, new balance for account 1234 is 50
From the database's perspective, there are no "write conflicts" here. You asked to set the balance to 50 for the given account twice.
There are different ways of solving this issue. One is to use conditional updates such that the process looks like this:
A: what is the current balance for account 1234?
B: what is the current balance for account 1234?
DB -> A: balance for account 1234 is 0
DB -> B: balance for account 1234 is 0
A: new balance is 0+50 = 50
A: if balance in account 1234 is 0, set balance to 50
DB -> A: ok, new balance for account 1234 is 50
B: new balance is 0+50 = 50
B: if balance in account 1234 is 0, set balance to 50
DB -> B: balance is not 0, no update was performed
B: err, let's start over
B: what is the current balance for account 1234?
DB -> B: balance for account 1234 is 50
B: new balance is 50+50 = 100
B: if balance in account 1234 is 50, set balance to 100
DB -> B: ok, new balance for account 1234 is 100
As you see, the database must support the conditional update and the application must handle the possibility of concurrent updates and retry the operation.
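A minimal sketch of that conditional-update-and-retry loop, shown with the Node.js driver for illustration (the bank collection and amount field follow the question; everything else is assumed):
import { MongoClient } from "mongodb";

async function addFifty(client: MongoClient, account: string): Promise<void> {
  const bank = client.db("test").collection("bank");
  for (;;) {
    const current = await bank.findOne({ account });
    if (!current) throw new Error("account not found");

    // Only apply the write if the amount is still what we just read.
    const result = await bank.updateOne(
      { account, amount: current.amount },
      { $set: { amount: current.amount + 50 } }
    );
    if (result.modifiedCount === 1) return; // our conditional update won

    // Someone else changed the document in between; read again and retry.
  }
}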
If the balance can go both up and down, conditioning on the balance itself is not a practically useful way of writing a debit & credit system (though if the balance can only increase or only decrease, it would in fact work quite well). In real systems you'd use a dedicated field whose purpose is to identify the specific version of the document that existed at the moment the application retrieved the data; the update is conditioned on the current version of the document staying the same, and each update increments the version. Concurrent updates are then detected because the version number is wrong, rather than a content field.
There are ways to produce a "write conflict" on the database side, for example by using transactions as supported by MongoDB 4.0+. This works on the same principle, but the "version" is called a "transaction identifier" and it's stored in a different place (not inline in the document being operated on). In this case the database informs you that there was a write conflict, and you still need to reissue the operations.
Update:
I think you also need to distinguish between "optimistic concurrency control" as a concept, its implementation, and what the implementation applies to. https://docs.mongodb.com/manual/faq/concurrency/#how-granular-are-locks-in-mongodb for example says:
For most read and write operations, WiredTiger uses optimistic concurrency control. WiredTiger uses only intent locks at the global, database and collection levels. When the storage engine detects conflicts between two operations, one will incur a write conflict causing MongoDB to transparently retry that operation.
Reading this statement carefully, it applies to write operations at the storage-engine level. I imagine that when MongoDB performs something like $set, or other atomic write operations, this would apply. But it doesn't apply to application-level operation sequences like the one in your example.
If you try your example code with your favorite relational DBMS, I think you'll find it produces roughly the same result as you've seen with MongoDB, if you issue a transaction around each individual read and write (such that balance read and write are in different transactions), for the same reason - RDBMSes lock data (or use techniques like MVCC) for the lifetime of a transaction, but not across transactions.
Similarly if you put both balance read and balance write on the same account into a transaction in MongoDB, you may find that you are receiving transient errors when other transactions modify the account in question concurrently.
Lastly, the API that MongoDB implements for transactions (with retries) is described here. If you look at it carefully, you'll find that it expects the application to reissue not just the transaction commit command, but to repeat the entire transaction operation. This is because, generally, if there is a "write conflict" the starting data has changed, and simply attempting the final write again isn't enough: calculations in the application may need to be redone, and possibly even the side effects of that process change as a result.
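As a rough sketch of that retry-the-whole-transaction pattern with the Node.js driver (names are illustrative; withTransaction re-runs the entire callback if it hits a transient write conflict):
import { MongoClient } from "mongodb";

async function addFiftyInTransaction(client: MongoClient, account: string): Promise<void> {
  const bank = client.db("test").collection("bank");
  const session = client.startSession();
  try {
    // The whole callback is retried if the transaction hits a transient write conflict.
    await session.withTransaction(async () => {
      const current = await bank.findOne({ account }, { session });
      if (!current) throw new Error("account not found");
      await bank.updateOne(
        { account },
        { $set: { amount: current.amount + 50 } },
        { session }
      );
    });
  } finally {
    await session.endSession();
  }
}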
I'm working on a filtered live search module with Meteor.js.
Usecase & problem:
A user wants to do a search through all the users to find friends. But I cannot afford for each user to ask the complete users collection. The user filter the search using checkboxes. I'd like to subscribe to the matched users. What is the best way to do it ?
I guess it would be better to create the query client-side, then send it the the method to get back the desired set of users. But, I wonder : when the filtering criteria changes, does the new subscription erase all of the old one ? Because, if I do a first search which return me [usr1, usr3, usr5], and after that a search that return me [usr2, usr4], the best would be to keep the first set and simply add the new one to it on the client-side suscribed collection.
And, in addition, if then I do a third research wich should return me [usr1, usr3, usr2, usr4], the autorunned subscription would not send me anything as I already have the whole result set in my collection.
The goal is to spare the server processing and data transfer.
I have some ideas, but I haven't coded enough of them yet to share them in an easily comprehensible way.
What would you advise me to do to be as efficient as possible in terms of time and performance?
Thank you all.
David
It depends on your application, but you'll probably send a non-empty string to a publisher which uses that string to search the users collection for matching names. For example:
Meteor.publish('usersByName', function(search) {
  check(search, String);
  // make sure the user is logged in and that search is sufficiently long
  if (!(this.userId && search.length > 2))
    return [];
  // search by case insensitive regular expression
  var selector = {username: new RegExp(search, 'i')};
  // only publish the necessary fields
  var options = {fields: {username: 1}};
  return Meteor.users.find(selector, options);
});
Also see common mistakes for why we limit the fields.
performance
Meteor is clever enough to keep track of the current document set that each client has for each publisher. When the publisher reruns, it knows to only send the difference between the sets. So the situation you described above is already taken care of for you.
If you were subscribed for users: 1,2,3
Then you restarted the subscription for users 2,3,4
The server would send a removed message for 1 and an added message for 4.
Note this will not happen if you stopped the subscription prior to rerunning it.
To my knowledge, there isn't a way to avoid removed messages when modifying the parameters for a single subscription. I can think of two possible (but tricky) alternatives:
Accumulate the intersection of all prior search queries and use that when subscribing. For example, if a user searched for {height: 5} and then searched for {eyes: 'blue'} you could subscribe with {height: 5, eyes: 'blue'}. This may be hard to implement on the client, but it should accomplish what you want with the minimum network traffic.
Accumulate active subscriptions. Rather than modifying the existing subscription each time the user modifies the search, start a new subscription for the new set of documents, and push the subscription handle to an array. When the template is destroyed, you'll need to iterate through all of the handles and call stop() on them. This should work, but it will consume more resources (both network and server memory + CPU).
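A rough sketch of the second alternative, assuming a Blaze template named userSearch, a hypothetical usersByFilter publication, and a hypothetical collectFilters helper that reads the checkbox state:
Template.userSearch.onCreated(function () {
  // Keep every subscription handle so they can all be stopped later.
  this.searchHandles = [];
});

Template.userSearch.events({
  'change .filter-checkbox': function (event, template) {
    var filters = collectFilters(template); // hypothetical helper
    // Start an additional subscription instead of replacing the previous one.
    template.searchHandles.push(Meteor.subscribe('usersByFilter', filters));
  }
});

Template.userSearch.onDestroyed(function () {
  // Stop all accumulated subscriptions to free server resources.
  this.searchHandles.forEach(function (handle) { handle.stop(); });
});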
Before attempting either of these solutions, I'd recommend benchmarking the worst case scenario without using them. My main concern is that without fairly tight controls, you could end up publishing the entire users collection after successive searches.
If you want to go easy on your server, you'll want to send as little data to the client as possible. That means every document you send to the client that is NOT a friend is waste. So let's eliminate all that waste.
Collect your filters (e.g. filters = {sex: 'Male', state: 'Oregon'}). Then call a method to search based on your filters (e.g. Users.find(filters)). Additionally, you can run your own proprietary ranking algorithm to determine the % chance that a person is a friend. Maybe base it off of distance from IP address (or from phone GPS history), mutual friends, etc. This will pay dividends in efficiency in a bit. Index things like GPS coords or other highly selective attributes, and maybe try out composite indexes. But remember: more indexes means slower writes.
Now you've got a cursor with all possible friends, ranked from most likely to least likely.
Next, change your subscription to match those friends, but put a limit:20 on there. Also, only send over the fields you need. That way, if a user wants to skip this step, you only wasted sending 20 partial docs over the wire. Then, have an infinite scroll or 'load more' button the user can click. When they load more, it's an additive subscription, so it's not resending duplicate info. Discover Meteor describes this pattern in great detail, so I won't.
After a few clicks/scrolls, the user won't find any more friends (because you were smart & sorted them) so they will stop trying & move on to the next step. If you returned 200 possible friends & they stop trying after 60, you just saved 140 docs from going through the pipeline. There's your efficiency.
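A bare-bones sketch of such a limited publication (the publication name, the friendScore ranking field, and the published fields are all assumptions):
Meteor.publish('possibleFriends', function (filters, limit) {
  check(filters, Object);
  check(limit, Number);
  if (!this.userId)
    return [];
  // Publish only a small page of partial documents, best candidates first.
  return Meteor.users.find(filters, {
    fields: {username: 1, profile: 1},
    sort: {friendScore: -1},      // hypothetical precomputed ranking
    limit: Math.min(limit, 100)   // hard cap so a client can't request everything
  });
});

// Client side: each 'load more' click raises the limit, which only sends the additional documents.
// Meteor.subscribe('possibleFriends', filters, 20 * page);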
We have a button in a web game for users to collect a reward. It should only be clicked once, and upon receiving the request we'll mark it as collected in the DB.
We've already blocked the button in the client from repeated clicking, but that won't help if people resend the request multiple times to our server in a short period of time.
What I want is a way to block this on the server side.
We're using Play Framework 2 (2.0.3-RC2) on the server side and so far it's stateless. I'm tempted to use a Set as a guard, like this:
if processingSet has userId then BadRequest
else put userId in processingSet and handle request
after that remove userId from that Set
But then I'd have to face problems like Updating Scala collections thread-safely, and it would still fail to block the user once we have more than one server behind load balancing.
One possibility I'm thinking about is to have a table in the DB in place of the processingSet above, but that would incur at least one extra DB operation per request. Is there any better solution?
thanks~
An additional DB operation is a relatively 'cheap' solution in that case. You should use it if you're planning to save the button state permanently.
If the button is disabled only for some period of time (for example, until the game is over), you can also consider using the cache API; however, keep in mind that it's not intended for data which should be stored for a long time (it should not be considered a DB alternative).
Given that you're using Mongo and so don't have transactions spanning separate collections, I think you can probably implement this guard using an atomic operation, namely "Update if current", which is effectively compare-and-swap.
Assuming you've got a collection like "rewards" which has a "collected" attribute, you can update the collected flag to true only if it is currently false and if that operation doesn't fail you can proceed to apply the reward knowing that for any other requests the same operation will fail.
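A minimal sketch of that check-and-set, shown with the Node.js MongoDB driver for illustration (the rewards collection and its fields are assumptions):
import { Collection } from "mongodb";

async function collectReward(rewards: Collection, userId: string, rewardId: string): Promise<boolean> {
  // Atomically flip collected from false to true; only one request can win.
  const result = await rewards.updateOne(
    { userId, rewardId, collected: false },
    { $set: { collected: true, collectedAt: new Date() } }
  );
  // modifiedCount === 1 means this request did the flip; anything else is a duplicate.
  return result.modifiedCount === 1;
}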
What's a good way to implement a Web Page counter?
On the surface this is a simple problem, but it gets problematic when dealing with search engine crawlers and robots, multiple clicks by the same user, and refresh clicks.
Specifically, what is a good way to ensure links aren't just 'clicked up' by a user repeatedly clicking? IP address? Cookies? Both of these have drawbacks (IP addresses aren't necessarily unique, cookies can be turned off).
Also, what is the best way to store the data: increment a counter individually, or store each click as a record in a log table and then summarize occasionally?
Any live experience would be helpful,
+++ Rick ---
Use IP Addresses in conjunction with Sessions. Count every new session for an IP address as one hit against your counter. You can store this data in a log database if you think you'll ever need to look through it. This can be useful for calculating when your site gets the most traffic, how much traffic per day, per IP, etc.
So I played around with this a bit based on the comments here. What I came up with is incrementing a counter in a simple field. In my app I have code snippet entities with a Views property.
When a snippet is viewed, a method filters (whitelists) what should hopefully be real browsers:
public bool LogSnippetView(string snippetId, string ipAddress, string userAgent)
{
    if (string.IsNullOrEmpty(userAgent))
        return false;

    userAgent = userAgent.ToLower();

    // whitelist: only count user agents that look like real browsers
    if (!(userAgent.Contains("mozilla") || userAgent.StartsWith("safari") ||
          userAgent.StartsWith("blackberry") || userAgent.StartsWith("t-mobile") ||
          userAgent.StartsWith("htc") || userAgent.StartsWith("opera")))
        return false;

    this.Context.LogSnippetClick(snippetId, ipAddress);
    return true;
}
The stored procedure then uses a separate table to temporarily hold the latest views, storing the snippet Id, entered date, and IP address. Each view is logged, and when a new view comes in it's checked to see whether the same IP address has accessed this snippet within the last 2 minutes. If so, nothing is logged.
If it's a new view, the view is logged (again SnippetId, IP, Entered) and the actual Views field is updated on the Snippets table.
If it's not a new view, the table is cleaned up by deleting any logged views that are older than 4 minutes. This should result in a minimal number of entries in the view log table at any time.
Here's the stored proc:
ALTER PROCEDURE [dbo].[LogSnippetClick]
    -- Add the parameters for the stored procedure here
    @SnippetId AS VARCHAR(MAX),
    @IpAddress AS VARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;

    -- don't allow updating if this ip address has already
    -- clicked on this snippet in the last 2 minutes
    SELECT Id FROM SnippetClicks
    WHERE SnippetId = @SnippetId AND IpAddress = @IpAddress AND
          DATEDIFF(minute, Entered, GETDATE()) < 2

    IF @@ROWCOUNT = 0
    BEGIN
        INSERT INTO SnippetClicks
            (SnippetId, IpAddress, Entered) VALUES
            (@SnippetId, @IpAddress, GETDATE())

        UPDATE CodeSnippets SET Views = Views + 1
        WHERE Id = @SnippetId
    END
    ELSE
    BEGIN
        -- clean up older view entries
        DELETE FROM SnippetClicks WHERE DATEDIFF(minute, Entered, GETDATE()) > 4
    END
END
This seems to work fairly well. As others mentioned this isn't perfect but it looks like it's good enough in initial testing.
If you get to use PHP, you may use sessions to track activity from particular users. In conjunction with a database, you may track activity from particular IP addresses, which you may assume are the same user.
Use timestamps to limit hits (assume no more than 1 hit per 5 seconds, for example), and to tell when new "visits" to the site occur (if the last hit was over 10 minutes ago, for example).
You may find $_SERVER[] properties that aid you in detecting bots or visitor trends (such as browser usage).
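The same idea sketched with Node/Express and express-session instead of PHP sessions (the thresholds follow the examples above; the counters are just in-memory placeholders):
import express from "express";
import session from "express-session";

// In-memory placeholders; a real app would persist these (and use a session store).
let hits = 0;
let visits = 0;

const app = express();
app.use(session({ secret: "change-me", resave: false, saveUninitialized: true }));

app.get("/page", (req, res) => {
  const now = Date.now();
  const last = (req.session as any).lastHit ?? 0;

  if (now - last > 10 * 60 * 1000) {
    visits += 1; // last hit was over 10 minutes ago: treat this as a new visit
  }
  if (now - last > 5 * 1000) {
    hits += 1; // count at most one hit per 5 seconds per session
  }

  (req.session as any).lastHit = now;
  res.send("ok");
});

app.listen(3000);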
edit:
I've tracked hits & visits before, counting a page view as a hit, and +1 to visits when a new session is created. It was fairly reliable (more than reliable enough for the purposes I used it for. Browsers that don't support cookies (and thus, don't support sessions) and users that disable sessions are fairly uncommon nowadays, so I wouldn't worry about it unless there is reason to be excessively accurate.
If I were you, I'd give up on my counter being accurate in the first place. Every solution (e.g. cookies, IP addresses, etc.), like you said, tends to be unreliable. So, I think your best bet is to use redundancy in your system: use cookies, "Flash-cookies" (shared objects), IP addresses (perhaps in conjunction with user-agents), and user IDs for people who are logged in.
You could implement some sort of scheme where any unknown client is given a unique ID, which gets stored (hopefully) on the client's machine and re-transmitted with every request. Then you could tie an IP address, user agent, and/or user ID (plus anything else you can think of) to every unique ID and vice-versa. The timestamp and unique ID of every click could be logged in a database table somewhere, and each click (at least, each click to your website) could be let through or denied depending on how recent the last click was for the same unique ID. This is probably reliable enough for short term click-bursts, and long-term it wouldn't matter much anyway (for the click-up problem, not the page counter).
Friendly robots should have their user agent set appropriately and can be checked against a list of known robot user agents (I found one here after a simple Google search) in order to be properly identified and dealt with separately from real people.