How to guard against repeated request? - scala

we have a button in a web game for the users to collect reward. That should only be clicked once, and upon receiving the request, we'll mark it collected in DB.
we've already blocked the buttons in the client from repeated clicking. But that won't help if people resend the package multiple times to our server in short period of time.
what I want is a method to block this from server side.
we're using Playframework 2 (2.0.3-RC2) for server side and so far it's stateless, I'm tempted to use a Set to guard like this:
if processingSet has userId then BadRequest
else put userId in processingSet and handle request
after that remove userId from that Set
but then I'd have to face problem like Updating Scala collections thread-safely and still fail to block the user once we have more than one server behind load balancing.
one possibility I'm thinking about is to have a table in DB in place of the processingSet above, but that would incur 1+ DB operation per request, are there any better solution~?
thanks~

Additional DB operation is relatively 'cheap' solution in that case. You should use it if you'e planning to save the buttons state permanently.
If the button is disabled only for some period of time (for an example until the game is over) you can also consider using the cache API however keep in mind that's not dedicated for solutions which should be stored for long time (it should not be considered as DB alternative).

Given that you're using Mongo and so don't have transactions spanning separate collections, I think you can probably implement this guard using an atomic operation - namely "Update if current", which is effectively CompareAndSwap.
Assuming you've got a collection like "rewards" which has a "collected" attribute, you can update the collected flag to true only if it is currently false and if that operation doesn't fail you can proceed to apply the reward knowing that for any other requests the same operation will fail.

Related

Creating an atomic process for a netconf edit-config request

I am creating a custom system that, when a user submits a netconf edit-config, it will initiate a set of actions in my system that will atomically alter the configuration of our system and then submit a notification to the user of its success or failure.
Think of it as a big SQL transaction that, at the end, either commits or rolls back.
So, steps
User submits an edit-config
System accepts config and works to implement this config
If the config is successful, sends by a thumbs up response (not sure the formal way of doing this)
If the config is a failure, sends by a thumbs down response (and I will have to make sure the config is rolled back internally)
All this is done atomically. So, if a user submits two configs in a row, they won't conflict with each other.
Our working idea (probably not the best one) to implement this was to go about this by accepting the edit-config and then, within sysrepo, we would edit parts of our leafs with the success or failure flags and they would happen within the same session as the initial change. We were hoping this would keep everything atomic; by doing edits outside of the session, multiple configuration changes could conflict with each other.
We weren't sure to go about this with pure netconf or to leverage sysrepo directly. We noticed all these plugins/bindings made for sysrepo and figured those could be used directly to talk to our datastore.
But that said, our working idea is most likely not best-practice approach. What would be the best way to achieve this?
Our system is:
netopeer 1.1.27
sysrepo 1.4.58
libyang 1.0.167
libnetconf2 1.1.24
And our yang file is
module rxmbn {
namespace "urn:com:zug:rxmbn";
prefix rxmbn;
container rxmbn-config {
config true;
leaf raw {
type string;
}
leaf raw_hashCode {
type int32;
}
leaf odl_last_processed_hashCode {
type int32;
}
leaf processed {
type boolean;
default "false";
}
}
}
Currently we can:
Execute an edit-config to netopeer server
We can see the new config register in the sysrepo datastore
We can capture the moment sysrepo registers the data via sysrepo's API
But we are having problems
Atomically editing the datastore during the update session (due to locks, which is normal. In fact, if there is no way to edit during an update session, that is fine and not necessary. The main goal is the next bullet)
Atomically reacting to the new edit-config and responding to the end user
We are all a bit new to netconf and yang, so I am sure there is some way to leverage the notification api or event api either through the netopeer session or sysrepo, we just don't know enough yet.
If there are any examples or implementation advice to create an atomic transaction for this, that'd be really useful.
I know nothing of sysrepo so this is from a NETCONF perspective.
NETCONF severs process requests serially within a single session in a request-response fashion, meaning that everything you do within a single NETCONF session should already be "atomic" - you cannot send two requests and have them applied in reverse order or in parallel no matter what you do. A well behaved client would also wait for each response from the server before sending a new request, especially if all updates must execute successfully and in specific order. The protocol also defines no way to cancel a request already sent to a server.
If you need to prevent other sessions from modifying a datatstore while another session is performing a multi- edit-config, you use <lock> and <unlock> NETCONF operations to lock the entire datastore. There is also RFC5717 and partial lock, which would only lock a specific branch of the datastore.
Using notifications to report success of an <edit-config> would be highly unusual - that's what <rpc-reply> and <rpc-error> are there for within the same session. You would use notifications to inform other sessions about what's happening. In fact, there are standard base notifications for config changes.
I suggest reading the entire RFC6241 before proceeding further. There are things like candidate datastores, confirmed-commits, etc. you should know about.
Which component are you developing? Netconf client/manager or Netconf server?
In general, the Netconf server should implement individual Netconf RPC operations in an atomic way.
When a Netconf client wants to perform a set of operations in an atomic way, it should follow the procedure explained in Apendix E.1 in RFC 6241.

How to optimize collection subscription in Meteor?

I'm working on a filtered live search module with Meteor.js.
Usecase & problem:
A user wants to do a search through all the users to find friends. But I cannot afford for each user to ask the complete users collection. The user filter the search using checkboxes. I'd like to subscribe to the matched users. What is the best way to do it ?
I guess it would be better to create the query client-side, then send it the the method to get back the desired set of users. But, I wonder : when the filtering criteria changes, does the new subscription erase all of the old one ? Because, if I do a first search which return me [usr1, usr3, usr5], and after that a search that return me [usr2, usr4], the best would be to keep the first set and simply add the new one to it on the client-side suscribed collection.
And, in addition, if then I do a third research wich should return me [usr1, usr3, usr2, usr4], the autorunned subscription would not send me anything as I already have the whole result set in my collection.
The goal is to spare processing and data transfer from the server.
I have some ideas, but I haven't coded enough of it yet to share it in a easily comprehensive way.
How would you advice me to do to be the more relevant possible in term of time and performance saving ?
Thanks you all.
David
It depends on your application, but you'll probably send a non-empty string to a publisher which uses that string to search the users collection for matching names. For example:
Meteor.publish('usersByName', function(search) {
check(search, String);
// make sure the user is logged in and that search is sufficiently long
if (!(this.userId && search.length > 2))
return [];
// search by case insensitive regular expression
var selector = {username: new RegExp(search, 'i')};
// only publish the necessary fields
var options = {fields: {username: 1}};
return Meteor.users.find(selector, options);
});
Also see common mistakes for why we limit the fields.
performance
Meteor is clever enough to keep track of the current document set that each client has for each publisher. When the publisher reruns, it knows to only send the difference between the sets. So the situation you described above is already taken care of for you.
If you were subscribed for users: 1,2,3
Then you restarted the subscription for users 2,3,4
The server would send a removed message for 1 and an added message for 4.
Note this will not happen if you stopped the subscription prior to rerunning it.
To my knowledge, there isn't a way to avoid removed messages when modifying the parameters for a single subscription. I can think of two possible (but tricky) alternatives:
Accumulate the intersection of all prior search queries and use that when subscribing. For example, if a user searched for {height: 5} and then searched for {eyes: 'blue'} you could subscribe with {height: 5, eyes: 'blue'}. This may be hard to implement on the client, but it should accomplish what you want with the minimum network traffic.
Accumulate active subscriptions. Rather than modifying the existing subscription each time the user modifies the search, start a new subscription for the new set of documents, and push the subscription handle to an array. When the template is destroyed, you'll need to iterate through all of the handles and call stop() on them. This should work, but it will consume more resources (both network and server memory + CPU).
Before attempting either of these solutions, I'd recommend benchmarking the worst case scenario without using them. My main concern is that without fairly tight controls, you could end up publishing the entire users collection after successive searches.
If you want to go easy on your server, you'll want to send as little data to the client as possible. That means every document you send to the client that is NOT a friend is waste. So let's eliminate all that waste.
Collect your filters (eg filters = {sex: 'Male', state: 'Oregon'}). Then call a method to search based on your filter (eg Users.find(filters). Additionally, you can run your own proprietary ranking algorithm to determine the % chance that a person is a friend. Maybe base it off of distance from ip address (or from phone GPS history), mutual friends, etc. This will pay dividends in efficiency in a bit. Index things like GPS coords or other highly unique attributes, maybe try out composite indexes. But remember more indexes means slower writes.
Now you've got a cursor with all possible friends, ranked from most likely to least likely.
Next, change your subscription to match those friends, but put a limit:20 on there. Also, only send over the fields you need. That way, if a user wants to skip this step, you only wasted sending 20 partial docs over the wire. Then, have an infinite scroll or 'load more' button the user can click. When they load more, it's an additive subscription, so it's not resending duplicate info. Discover Meteor describes this pattern in great detail, so I won't.
After a few clicks/scrolls, the user won't find any more friends (because you were smart & sorted them) so they will stop trying & move on to the next step. If you returned 200 possible friends & they stop trying after 60, you just saved 140 docs from going through the pipeline. There's your efficiency.

Avoid duplicate POSTs with REST

I have been using POST in a REST API to create objects. Every once in a while, the server will create the object, but the client will be disconnected before it receives the 201 Created response. The client only sees a failed POST request, and tries again later, and the server happily creates a duplicate object...
Others must have had this problem, right? But I google around, and everyone just seems to ignore it.
I have 2 solutions:
A) Use PUT instead, and create the (GU)ID on the client.
B) Add a GUID to all objects created on the client, and have the server enforce their UNIQUE-ness.
A doesn't match existing frameworks very well, and B feels like a hack. How does other people solve this, in the real world?
Edit:
With Backbone.js, you can set a GUID as the id when you create an object on the client. When it is saved, Backbone will do a PUT request. Make your REST backend handle PUT to non-existing id's, and you're set.
Another solution that's been proposed for this is POST Once Exactly (POE), in which the server generates single-use POST URIs that, when used more than once, will cause the server to return a 405 response.
The downsides are that 1) the POE draft was allowed to expire without any further progress on standardization, and thus 2) implementing it requires changes to clients to make use of the new POE headers, and extra work by servers to implement the POE semantics.
By googling you can find a few APIs that are using it though.
Another idea I had for solving this problem is that of a conditional POST, which I described and asked for feedback on here.
There seems to be no consensus on the best way to prevent duplicate resource creation in cases where the unique URI generation is unable to be PUT on the client and hence POST is needed.
I always use B -- detection of dups due to whatever problem belongs on the server side.
Detection of duplicates is a kludge, and can get very complicated. Genuine distinct but similar requests can arrive at the same time, perhaps because a network connection is restored. And repeat requests can arrive hours or days apart if a network connection drops out.
All of the discussion of identifiers in the other anwsers is with the goal of giving an error in response to duplicate requests, but this will normally just incite a client to get or generate a new id and try again.
A simple and robust pattern to solve this problem is as follows: Server applications should store all responses to unsafe requests, then, if they see a duplicate request, they can repeat the previous response and do nothing else. Do this for all unsafe requests and you will solve a bunch of thorny problems. Repeat DELETE requests will get the original confirmation, not a 404 error. Repeat POSTS do not create duplicates. Repeated updates do not overwrite subsequent changes etc. etc.
"Duplicate" is determined by an application-level id (that serves just to identify the action, not the underlying resource). This can be either a client-generated GUID or a server-generated sequence number. In this second case, a request-response should be dedicated just to exchanging the id. I like this solution because the dedicated step makes clients think they're getting something precious that they need to look after. If they can generate their own identifiers, they're more likely to put this line inside the loop and every bloody request will have a new id.
Using this scheme, all POSTs are empty, and POST is used only for retrieving an action identifier. All PUTs and DELETEs are fully idempotent: successive requests get the same (stored and replayed) response and cause nothing further to happen. The nicest thing about this pattern is its Kung-Fu (Panda) quality. It takes a weakness: the propensity for clients to repeat a request any time they get an unexpected response, and turns it into a force :-)
I have a little google doc here if any-one cares.
You could try a two step approach. You request an object to be created, which returns a token. Then in a second request, ask for a status using the token. Until the status is requested using the token, you leave it in a "staged" state.
If the client disconnects after the first request, they won't have the token and the object stays "staged" indefinitely or until you remove it with another process.
If the first request succeeds, you have a valid token and you can grab the created object as many times as you want without it recreating anything.
There's no reason why the token can't be the ID of the object in the data store. You can create the object during the first request. The second request really just updates the "staged" field.
Server-issued Identifiers
If you are dealing with the case where it is the server that issues the identifiers, create the object in a temporary, staged state. (This is an inherently non-idempotent operation, so it should be done with POST.) The client then has to do a further operation on it to transfer it from the staged state into the active/preserved state (which might be a PUT of a property of the resource, or a suitable POST to the resource).
Each client ought to be able to GET a list of their resources in the staged state somehow (maybe mixed with other resources) and ought to be able to DELETE resources they've created if they're still just staged. You can also periodically delete staged resources that have been inactive for some time.
You do not need to reveal one client's staged resources to any other client; they need exist globally only after the confirmatory step.
Client-issued Identifiers
The alternative is for the client to issue the identifiers. This is mainly useful where you are modeling something like a filestore, as the names of files are typically significant to user code. In this case, you can use PUT to do the creation of the resource as you can do it all idempotently.
The down-side of this is that clients are able to create IDs, and so you have no control at all over what IDs they use.
There is another variation of this problem. Having a client generate a unique id indicates that we are asking a customer to solve this problem for us. Consider an environment where we have a publicly exposed APIs and have 100s of clients integrating with these APIs. Practically, we have no control over the client code and the correctness of his implementation of uniqueness. Hence, it would probably be better to have intelligence in understanding if a request is a duplicate. One simple approach here would be to calculate and store check-sum of every request based on attributes from a user input, define some time threshold (x mins) and compare every new request from the same client against the ones received in past x mins. If the checksum matches, it could be a duplicate request and add some challenge mechanism for a client to resolve this.
If a client is making two different requests with same parameters within x mins, it might be worth to ensure that this is intentional even if it's coming with a unique request id.
This approach may not be suitable for every use case, however, I think this will be useful for cases where the business impact of executing the second call is high and can potentially cost a customer. Consider a situation of payment processing engine where an intermediate layer ends up in retrying a failed requests OR a customer double clicked resulting in submitting two requests by client layer.
Design
Automatic (without the need to maintain a manual black list)
Memory optimized
Disk optimized
Algorithm [solution 1]
REST arrives with UUID
Web server checks if UUID is in Memory cache black list table (if yes, answer 409)
Server writes the request to DB (if was not filtered by ETS)
DB checks if the UUID is repeated before writing
If yes, answer 409 for the server, and blacklist to Memory Cache and Disk
If not repeated write to DB and answer 200
Algorithm [solution 2]
REST arrives with UUID
Save the UUID in the Memory Cache table (expire for 30 days)
Web server checks if UUID is in Memory Cache black list table [return HTTP 409]
Server writes the request to DB [return HTTP 200]
In solution 2, the threshold to create the Memory Cache blacklist is created ONLY in memory, so DB will never be checked for duplicates. The definition of 'duplication' is "any request that comes into a period of time". We also replicate the Memory Cache table on the disk, so we fill it before starting up the server.
In solution 1, there will be never a duplicate, because we always check in the disk ONLY once before writing, and if it's duplicated, the next roundtrips will be treated by the Memory Cache. This solution is better for Big Query, because requests there are not imdepotents, but it's also less optmized.
HTTP response code for POST when resource already exists

NSMutableURLRequest on succession of another NSMutableURLRequest's success

Basically, I want to implement SYNC functionality; where, if internet connection is not available, data gets stored on local sqlite database. Whenever, internet connection is available, SYNC gets into the action.
Now, Say for example; 5 records are stored locally, and then internet connection is available. I want the server to be updated. So, What I do currently is:
Post first record to the server.
Wait for the success of first request.
Post local NSNotification to routine, that the first record has been updated on server & now second request can go.
The routine fires the second post request on server and so on...
Question: Is this approach right and efficient enough to implement SYNC functionality; OR anything I should change into it ??
NOTE: Records to be SYNC will have no limit in numbers.
Well it depends on the requirements on the data that you save. If it is just for backup then you should be fine.
If the 5 records are somehow dependent on each other and you need to access this data from another device/application you should take care on the server side that either all 5 records are written or none. Otherwise you will have an inconsistent state if only 3 get written.
If other users are also reading / writing those data concurrently on the server then you need to implement some kind of lock on all records before writing and also decide how to handle conflicts when someone attempts to overwrite somebody else changes.

How do I pretend duplicate values in my read database with CQRS

Say that I have a User table in my ReadDatabase (use SQL Server). In a regulare read/write database I can put like a index on the table to make sure that 2 users aren't addedd to the table with the same emailadress.
So if I try to add a user with a emailadress that already exist in my table for a diffrent user, the sql server will throw an exception back.
In Cqrs I can't do that since if I decouple the write to my readdatabas from the domain model, by puting it on an asyncronus queue I wont get the exception thrown back to me, and I will return "OK" to the UI and the user will think that he is added to the database, when infact he will never be added to the read database.
I can do a search in the read database checking if there is a user already in my database with the emailadress, and if there is one, then thru an exception back to the UI. But if they press the save button the same time, I will do 2 checks to the database and see that there isn't any user in the database with the emailadress, I send back that it's okay. Put it on my queue and later it will fail (by hitting the unique identifier).
Am I suppose to load all users from my EventSource (it's a SQL Server) and then do the check on that collection, to see if I have a User that already has this emailadress. That sounds a bit crazy too me...
How have you people solved it?
The way I can see is to not using an asyncronized queue, but use a syncronized one but that will affect perfomance really bad, specially when you have many "read storages" to write to...
Need some help here...
Searching for CQRS Set Based Validation will give you solutions to this issue.
Greg Young posted about the business impact of embracing eventual consistency http://codebetter.com/gregyoung/2010/08/12/eventual-consistency-and-set-validation/
Jérémie Chassaing posted about discovering missing aggregate roots in the domain http://thinkbeforecoding.com/post/2009/10/28/Uniqueness-validation-in-CQRS-Architecture
Related stack overflow questions:
How to handle set based consistency validation in CQRS?
CQRS Validation & uniqueness