i'm using tornado for its simplicity, and am using it with Pymongo, so because i hear always about asynchronous calls, to serve lot of clients, then, i was asking, what is really an asynchronous calls to a database, so this code for example:
for example, suppose a page where a user have 4 areas where to search, so the result will be a 4 results.
A = calls the database to search for an element a.
B = calls the database to search for an element b.
C = calls the database to search for an element c.
D = calls the database to search for an element d.
then render a pages where a user will see the results (a,b,c,d)
so, this will be a killer for the server, since he must stay for all the 4 requests to finish, or do it serve the first result and then wait even if the database calls are blocking and make a bucket where he joins all the results to be served to the client? or the split of the 4 operations must be done with asynchronous database library (like Motor or Asyncmongo)?
Every call to PyMongo will block Tornado's IOLoop and prevent further processing of any client HTTP request until the PyMongo method completes.
http://api.mongodb.org/python/current/faq.html#does-pymongo-support-asynchronous-frameworks-like-gevent-tornado-or-twisted
Related
I have a REST web-service which is expected to expose a paginated GET call.
For eg: I have a list of students( "Name" , "Age" , "Class" ) in my sql table. And I have to expose a paginated API to get all students given a class. So far so good. Just a typical REST api does the job and pagination can be achieved by the sql query.
Now suppose we have the same requirement just that we need to send back students who are from particular state. This information is hosted by a web-service, S2. S2 has an API which given a list of student names and a state "X" returns the students that belong to state X.
Here is where I'm finding it difficult to support pagination.
eg: I get a request with page_size 10, a class C and a state X which results in 10 students from class C from my db. Now I make a call to S2 with these 10 students and state X, in return, the result may include 0 students, all 10 students, or any number students between 0 and 10 from state 'X'.
How do I support pagination in this case?
Brute force would be to make db calls and S2 calls till the page size is met and then only reply. I don't like this approach .
Is there a common practice followed for this, a general rule of thumb, or is this architecture a bad service design?
(EDIT): Also please tell about managing the offset value.
if we go with the some approach and get the result set , how can I manage the offset for next page request ?
Thanks for reading :)
Your service should handle the pagination and not hand it off the SQL. Make these steps:
Get all students from S1 (SQL database) where class = C.
Using the result, get all students from S2 that are in the result and where state = X.
Sort the second result in a stable way.
Get the requested page you want from the sorted result.
All this is done in the code that calls both S1 and S2. Only it has the knowledge to build the pages.
Not doing the pagination with SQL can lead to performance problems in case of large databases.
Some solution in between can be applied. I assume that the pagination parameters (offset, page size) are configurable for both services, yours and the external one.
You can implement prefetch logic for both services, lets say the prefetch chunk size can be 100.
The frontend can be served with required page size 10.
If the prefetched chunks do not result in a frontend page size 10, the backend should prefetch another chunk till the fronend can be served with 10 students.
This approach require more logic in backend to calculate the next offsets for prefetching, but if you want performance and pagination solved you must invest some effort.
function _allUsers(callback){
var db = connect.get();
db.collection("users").find({}).toArray(function(err,data){
if(err){
callback(err);
}else{
callback(null,data);
}
});
}
I am trying to understand this code, I have been looking around the web but I find the explanations kinda defficult to understand ( I am new at Mean stack), so my questions are:
What does the Collection method do? I am not sure but the string "users" is it just the name of our collection with all users?
Why do we have to use a callback in this situation? (I find callbacks very confusing).
And why do we have to give toArray function, an annonymous function?
Instead of toArray could I use pretty method() without any annonymous function as a parameter?
MEAN Stack is a software bundle of software programs supporting applications written in all javascript. This means you can use javascript from your database, to your back-end and front-end.
MEAN actually stands for the first characters of each software program included in the stack. MongoDB, Expressjs, AngularJS and NodeJS.
1
MongoDB is a NoSQL database which uses BSON (similar to JSON) to store so called documents. Look at a document as if it is a single entity or row in a traditional database. These entities (or rows) are stored in collections (a collection of documents) which can be compared to tables.
So the answer to your 1st question is opens up the users collection, which grants access to all the user documents.
2
NodeJS is asynchronous by design. This allows NodeJS to perform a lot of operations while running on a single thread*. Because NodeJS is single-threaded we need a way to write our code non-blocking meaning we can start an operation, proceed with executing other code and come back whenever that operation is finished.
In your case we request access to the users collection, this takes some time. In order to allow other parts of our application to continue processing we use a callback. When we have access to our collection, our callback is executed and we can perform whatever operation we wanted to do when we first requested access.
*NodeJS actually runs on multiple threads but a developer never has to worry about multithreading, NodeJS does that for us.'
3
This is exactly what the previous point is about.
The .toArray() method returns an array that contains all the documents from a cursor. The method iterates completely the cursor, loading all the documents into RAM and exhausting the cursor. Source
.toArray() is a computionally intensive operation. Since we do not want to wait untill .toArray() is finished but proceed processing the rest of our code, we give it a callback so that we can come back to our collection processing whenever it's ready.
4
From what I can read from the docs I guess you could indeed write blocking code and do it this way:
var users = db.collection("users").find({}).toArray();
This however will block your code entirely. There is never a good reason to do this.
Disclaimer: I left out or oversimplified details in this explanation for ease of understanding.
db.collection('users') this will return the users collection instance
we are using callback for asynchronous
the annonymous function in toArray is its callback
this is dependent on the library in use..
without any annonymous function as a parameter
expressjs is an asynchronous programming, we need callback || Promises
You can think of the collection as of table in MySQL. A collection consists of documents (rows/items/records in MySQL). Your example calls the Users collection and finds all documents (records) in it.
About the callbacks - NodeJS/Express are commonly callbacks-oriented. This is the pattern they use and most of the code is using it, because it is asynchronous. If you need to be sure that some snippet is executed right after some other snippet, you have to use callback (or promise).
Calling toArray() depends on what your callback expects. You can skip calling this method if the callback expects the Query object returned by the find() method. All that depends on your callback.
You can use non-anonymous function, too, but you have to have in mind the asynchronous logic and continue using callbacks/promises. You can read more about callbacks and promises in this Quora's article.
Here you can find more about the find() method.
Basically, I want to implement SYNC functionality; where, if internet connection is not available, data gets stored on local sqlite database. Whenever, internet connection is available, SYNC gets into the action.
Now, Say for example; 5 records are stored locally, and then internet connection is available. I want the server to be updated. So, What I do currently is:
Post first record to the server.
Wait for the success of first request.
Post local NSNotification to routine, that the first record has been updated on server & now second request can go.
The routine fires the second post request on server and so on...
Question: Is this approach right and efficient enough to implement SYNC functionality; OR anything I should change into it ??
NOTE: Records to be SYNC will have no limit in numbers.
Well it depends on the requirements on the data that you save. If it is just for backup then you should be fine.
If the 5 records are somehow dependent on each other and you need to access this data from another device/application you should take care on the server side that either all 5 records are written or none. Otherwise you will have an inconsistent state if only 3 get written.
If other users are also reading / writing those data concurrently on the server then you need to implement some kind of lock on all records before writing and also decide how to handle conflicts when someone attempts to overwrite somebody else changes.
we have a button in a web game for the users to collect reward. That should only be clicked once, and upon receiving the request, we'll mark it collected in DB.
we've already blocked the buttons in the client from repeated clicking. But that won't help if people resend the package multiple times to our server in short period of time.
what I want is a method to block this from server side.
we're using Playframework 2 (2.0.3-RC2) for server side and so far it's stateless, I'm tempted to use a Set to guard like this:
if processingSet has userId then BadRequest
else put userId in processingSet and handle request
after that remove userId from that Set
but then I'd have to face problem like Updating Scala collections thread-safely and still fail to block the user once we have more than one server behind load balancing.
one possibility I'm thinking about is to have a table in DB in place of the processingSet above, but that would incur 1+ DB operation per request, are there any better solution~?
thanks~
Additional DB operation is relatively 'cheap' solution in that case. You should use it if you'e planning to save the buttons state permanently.
If the button is disabled only for some period of time (for an example until the game is over) you can also consider using the cache API however keep in mind that's not dedicated for solutions which should be stored for long time (it should not be considered as DB alternative).
Given that you're using Mongo and so don't have transactions spanning separate collections, I think you can probably implement this guard using an atomic operation - namely "Update if current", which is effectively CompareAndSwap.
Assuming you've got a collection like "rewards" which has a "collected" attribute, you can update the collected flag to true only if it is currently false and if that operation doesn't fail you can proceed to apply the reward knowing that for any other requests the same operation will fail.
I have the following problem, I am implementing a queue abstraction with RavenDB in a stateless REST server.
Suppose I have 2 REST calls A and B
When call A is happening I query the queue for the last item ( item A ) and give it to call A.
If call B is called at the same time as call A - which can happen with REST calls, I need to prevent the program from giving call B the same item as A meaning item A should be "locked" by the call A.
A standard multi-threaded protection here would be a simple lock, how do I translate this idea to my situation with RavenDB and REST?
P.S I use Nancy for the REST server
That should not be too hard unless I missed something:
Introduce a flag (boolean property) on your items, e.g. "Processed".
Inside your action, open up a new RavenDB session and enable optimistic concurrency on it (DocumentSession.Advanced .UseOptimisticConcurrency).
Get the next unprocessed item and then immediately update its processed flag to true.
Call .SaveChanges on your session -> if that succeeds (you don't get a ConcurrencyException), you can safely return the item as the result of the request. If not, load the next item.