How does Meteor handle the same subscription from multiple users?

Does Meteor read from MongoDB multiple times for the same subscription when it's made by multiple users? To illustrate, here's an example:
If you have the following publish function:
Server:
Meteor.publish('articles', function (keyword) {
    return Articles.find({ keyword: keyword });
});
And on the client you subscribe as follows:
Client:
Meteor.subscribe('articles', keyword);
When the first user subscribes (using the keyword "meteor"), the data needs to be read from MongoDB. Then a second user subscribes to that same publish function using the same keyword. Does Meteor go to MongoDB again to fetch the same data for the second user, or is it cached somewhere and served from cache? Is it possible to have it served from cache and have Meteor update the cache when changes are detected?

Does Meteor read from MongoDB multiple times for the same subscription when it's made by multiple users?
No, it uses query de-duplication. If multiple clients subscribe to the same dataset, then only one observer is used and they share a cache of the result set. This is true for both poll-and-diff observers and oplog-tailing observers. I'd recommend watching all of this video where David Glasser explains how these algorithms work at a high level. Your specific question is addressed ~14 minutes in.
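Conceptually, the sharing works roughly like this (a simplified sketch in plain JavaScript, not Meteor's actual source; startWatchingMongo is a hypothetical stand-in for the poll-and-diff / oplog-tailing machinery):

const observers = new Map();

function observeQuery(collectionName, selector, onChange) {
    // Identical queries produce identical keys, so they share one observer.
    const key = collectionName + ':' + JSON.stringify(selector);
    let observer = observers.get(key);
    if (!observer) {
        // First subscriber for this exact query: create the single shared
        // observer that actually talks to MongoDB.
        observer = { listeners: new Set(), cache: [] };
        observers.set(key, observer);
        startWatchingMongo(collectionName, selector, observer);
    }
    observer.listeners.add(onChange);
    // Later subscribers are served from the shared in-memory result set,
    // without another round trip to MongoDB.
    observer.cache.forEach((doc) => onChange('added', doc));
}

function startWatchingMongo(collectionName, selector, observer) {
    // Placeholder: the real implementation polls or tails the oplog,
    // updates observer.cache, and notifies every listener on changes.
}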

Related

Can a Firestore publisher detect subscriptions from listeners?

Can a Firestore client detect subscriptions to a document? I want a client to only publish (write) document changes if at least one other client is listening to that document. In my case, this would dramatically reduce the number of Firestore writes across the database.
Scenario:
a client is sampling a measured value every 3 seconds and publishing this to Firestore
another client (an app) is listening to this Firestore document and displaying the value updates
the app is only open on occasion, when the user wants to view the data. Most of the time it is not open, and thus no clients are listening to that document
Is there any way in the Firestore API to detect if a document is being listened to?
No, that's not possible. Code that writes a document will do so unconditionally. You can't arrange for a write to happen if there are no listeners, and you can't find out if there are any listeners at all. You would need some other (much more complex) system set up to do something like that.
As Doug said, there is nothing built in to Firestore to detect whether there are listeners on the data. The closest I can think of is the presence system that is (sort of) built into Firebase's other database (Realtime Database), but that doesn't exist on Firestore (yet, an Extension is in the works for it).
But even on that you'd probably need to do some extra work, to track specifically what collection(s) each client is listening to. It'd boil down to:
When attaching an observer, also record that observer in the Realtime Database. At the same time register an onDisconnect handler for this node, so that it automatically gets deleted when the client disconnects.
When detaching an observer, remove its record from the Realtime Database and remove the onDisconnect handler.
Now in your publisher app, you can detect how many observers there are to each collection.
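A rough sketch of that bookkeeping using the Realtime Database (Firebase v9 modular SDK; the listeners/... path, clientId, and function names are illustrative assumptions):

import { initializeApp } from 'firebase/app';
import { getDatabase, ref, set, remove, onDisconnect } from 'firebase/database';

const app = initializeApp({ /* your Firebase config */ });
const db = getDatabase(app);

// Call when attaching a Firestore observer.
async function registerListener(collectionPath, clientId) {
    const presenceRef = ref(db, 'listeners/' + collectionPath + '/' + clientId);
    await set(presenceRef, true);
    // Auto-remove the record if this client disconnects uncleanly.
    await onDisconnect(presenceRef).remove();
    return presenceRef;
}

// Call when detaching the observer.
async function unregisterListener(presenceRef) {
    await onDisconnect(presenceRef).cancel();
    await remove(presenceRef);
}

The publisher can then watch listeners/<collectionPath> and only write measurements while that node has children.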

CouchDB Push data to external API when changed

We are working on a POC where we have our CouchDB instance and a pouchDB for each user.
We need to read the data from CouchDB and use it in our CRM systems. We wanted to achieve this through an API where Couch can post data to a REST API and we take it forward from there.
Scenario:
separate DB for each user
User1 - submits form and the data goes to couchDB
User2 - submits form and the data goes to CouchDB
Now we need to get the data out of Couch whenever there are any inserts/updates to any database.
We had checked Change Notifications, but that is something for one database.
In our case each user-submitted form will be in a separate database. So can anyone throw some light on getting data out of CouchDB on any inserts/updates?
Without knowing the details of your data and the general concepts of your app, it's not easy to give good advice.
If the data for each user is independent and you just want to collect the data later in one database, you could consider using a filtered replication.
You can find more information here https://wiki.apache.org/couchdb/Replication#Filtered_Replication
If data must be merged or needs other advanced processing, you have to write a script that listens to the changes feeds of all user databases and, if something changes, runs your logic to merge and write to the central database.
But beware: you'd essentially be building your own sync protocol then, which requires careful planning and experience.
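As a rough illustration of that second approach, here is a sketch that follows the _changes feed of several per-user databases and forwards each change to a REST endpoint (assumes Node 18+ for the global fetch; COUCH_URL, CRM_ENDPOINT, and the database names are placeholders):

const COUCH_URL = 'http://localhost:5984';
const CRM_ENDPOINT = 'https://crm.example.com/api/forms';

async function followChanges(dbName) {
    const res = await fetch(
        COUCH_URL + '/' + dbName + '/_changes?feed=continuous&since=now&include_docs=true'
    );
    // The continuous feed emits one JSON object per line.
    let buffer = '';
    for await (const chunk of res.body) {
        buffer += Buffer.from(chunk).toString('utf8');
        let newline;
        while ((newline = buffer.indexOf('\n')) !== -1) {
            const line = buffer.slice(0, newline).trim();
            buffer = buffer.slice(newline + 1);
            if (!line) continue; // skip heartbeats / blank lines
            const change = JSON.parse(line);
            if (change.doc) {
                // Forward the changed document to the CRM-facing API.
                await fetch(CRM_ENDPOINT, {
                    method: 'POST',
                    headers: { 'Content-Type': 'application/json' },
                    body: JSON.stringify({ db: dbName, doc: change.doc }),
                });
            }
        }
    }
}

// One follower per user database.
['user1-db', 'user2-db'].forEach(followChanges);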

How to combine websockets and http to create a REST API that keeps data up to date? [closed]

I am thinking about building a REST API with both websockets and HTTP, where I use websockets to tell the client that new data is available or to provide the new data to the client directly.
Here are some different ideas of how it could work:
ws = websocket
Idea A:
David gets all users with GET /users
Jacob adds a user with POST /users
A ws message is sent to all clients with info that a new user exists
David receives the message by ws and calls GET /users
Idea B:
David gets all users with GET /users
David registers to get ws updates when a change is made to /users
Jacob adds a user with POST /users
The new user is sent to David by ws
Idea C:
David gets all users with GET /users
David registers to get ws updates when a change is made to /users
Jacob adds a user with POST /users and it gets the id 4
David receives the id 4 of the new user by ws
David gets the new user with GET /users/4
Idea D:
David gets all users with GET /users
David registers to get ws updates when changes are made to /users.
Jacob adds a user with POST /users
David receives a ws message that changes were made to /users
David gets only the delta by calling GET /users?lastcall='time of step one'
Which alternative is the best and what are the pros and cons?
Is there another, better 'Idea E'?
Do we even need to use REST or is ws enough for all data?
Edit
To solve problems with data getting out of sync, we could provide the "If-Unmodified-Since" header (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-Unmodified-Since) or "ETag" (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag), or both, with PUT requests.
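For example, a conditional update from the client might look like this (a sketch using the fetch API; it pairs ETag with the If-Match conditional header and assumes the server answers 412 Precondition Failed when the value is stale):

async function updateUser(id, changes) {
    // Read the current state and remember its ETag.
    const res = await fetch('/users/' + id);
    const etag = res.headers.get('ETag');
    const user = await res.json();

    // Only apply the PUT if nobody changed the resource in the meantime.
    const put = await fetch('/users/' + id, {
        method: 'PUT',
        headers: { 'Content-Type': 'application/json', 'If-Match': etag },
        body: JSON.stringify(Object.assign({}, user, changes)),
    });
    if (put.status === 412) {
        // Someone else changed the user since we read it: refetch and retry.
    }
}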
Idea B is, for me, the best, because the client specifically subscribes to changes in a resource and gets the incremental updates from that moment on.
Do we even need to use REST or is ws enought for all data?
Please check: WebSocket/REST: Client connections?
I don't know Java, but I worked with both Ruby and C on these designs...
Funnily enough, I think the easiest solution is to use JSON, where the REST API simply adds the method data (i.e. method: "POST") to the JSON and forwards the request to the same handler the Websocket uses.
The underlying API's response (the response from the API handling JSON requests) can be translated to any format you need, such as HTML rendering... though I would consider simply returning JSON for most use cases.
This helps encapsulate the code and keep it DRY while accessing the same API using both REST and Websockets.
As you might infer, this design makes testing easier, since the underlying API that handles the JSON can be tested locally without the need to emulate a server.
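A minimal sketch of that layout in Node (assuming Express and the ws package; handleMessage is a hypothetical shared handler that both transports feed with the same JSON message format):

const express = require('express');
const { WebSocketServer } = require('ws');

// The single place where all application logic lives.
async function handleMessage(msg) {
    return { ok: true, echoed: msg };
}

const app = express();
app.use(express.json());

// REST entry point: translate the HTTP request into the shared JSON format.
app.all('/api/*', async (req, res) => {
    const reply = await handleMessage({
        method: req.method,
        path: req.path,
        data: req.body,
    });
    res.json(reply);
});

const server = app.listen(3000);

// WebSocket entry point: messages already arrive in the shared JSON format.
const wss = new WebSocketServer({ server });
wss.on('connection', (socket) => {
    socket.on('message', async (raw) => {
        const reply = await handleMessage(JSON.parse(raw));
        socket.send(JSON.stringify(reply));
    });
});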
Good Luck!
P.S. (Pub/Sub)
As for the Pub/Sub, I find it best to have a "hook" for any update API calls (a callback) and a separate Pub/Sub module that handles these things.
I also find it more resource friendly to write the whole data to the Pub/Sub service (option B) instead of just a reference number (option C) or an "update available" message (options A and D).
In general, I also believe that sending the whole user list isn't effective for larger systems. Unless you have 10-15 users, the database call might be a bust. Consider the Amazon admin calling for a list of all users... Brrr....
Instead, I would consider dividing this to pages, say 10-50 users a page. These tables can be filled using multiple requests (Websocket / REST, doesn't matter) and easily updated using live Pub/Sub messages or reloaded if a connection was lost and reestablished.
EDIT (REST vs. Websockets)
As for REST vs. Websockets... I find the question of need is mostly a subset of the question "who's the client?"...
However, once the logic is separated from the transport layer, then supporting both is very easy, and often it makes more sense to support both.
I should note that Websockets often have a slight edge when it comes to authentication (credentials are exchanged once per connection instead of once per request). I don't know if this is a concern.
For the same reason (as well as others), Websockets usually have an edge with regards to performance... how big an edge over REST depends on the REST transport layer (HTTP/1.1, HTTP/2, etc.).
Usually these things are negligible when it comes time to offer a public API access point and I believe implementing both is probably the way to go for now.
To summarize your ideas:
A: Send a message to all clients when a user edits data on the server. All users then request an update of all data.
- This system may make a lot of unnecessary server calls on behalf of clients who are not using the data. I don't recommend producing all of that extra traffic, as processing and sending those updates could become costly.
B: After a user pulls data from the server, they then subscribe to updates from the server, which sends them information about what has changed.
- This saves a lot of server traffic, but if you ever get out of sync, you're going to be posting incorrect data to your users.
C: Users who subscribe to data updates are sent information about which data has been updated, then fetch it again themselves.
- This is the worst of A and B in that you'll have extra round trips between your users and servers just to notify them that they need to make a request for information which may be out of sync.
D: Users who subscribe to updates are notified when any changes are made and then request the last change made to the server.
- This presents all of the problems with C, but includes the possibility that, once out of sync, you may send data that will be nonsense to your users, which might just crash the client-side app for all we know.
I think that this option E would be best:
Every time data changes on the server, send the contents of all the data to the clients who have subscribed to it. This limits the traffic between your users and the server while also giving them the least chance of having out of sync data. They might get stale data if their connection drops, but at least you wouldn't be sending them something like Delete entry 4 when you aren't sure whether or not they got the message that entry 5 just moved into slot 4.
Some Considerations:
How often does the data get updated?
How many users need to be updated each time an update occurs?
What are your transmission costs? If you have users on mobile devices with slow connections, that will affect how often and how much you can afford to send to them.
How much data gets updated in a given update?
What happens if a user sees stale data?
What happens if a user gets data out of sync?
Your worst case scenario would be something like this: Lots of users, with slow connections who are frequently updating large amounts of data that should never be stale and, if it gets out of sync, becomes misleading.
I personally have used Idea B in production and am very satisfied with the results. We use http://www.axonframework.org/, so every change or creation of an entity is published as an event throughout the application. These events are then used to update several read models, which are basically simple Mysql tables backing one or more queries. I added some interceptors to the event processors that update these read models so that they publish the events they just processed after the data is committed to the DB.
Publishing of events is done through STOMP over web sockets. It is made very simple if you use Spring's Web Socket support (https://docs.spring.io/spring/docs/current/spring-framework-reference/html/websocket.html). This is how I wrote it:
@Override
protected void dispatch(Object serializedEvent, String topic, Class eventClass) {
    Map<String, Object> headers = new HashMap<>();
    headers.put("eventType", eventClass.getName());
    messagingTemplate.convertAndSend("/topic" + topic, serializedEvent, headers);
}
I wrote a little configurer that uses Spring's bean factory API so that I can annotate my Axon event handlers like this:
@PublishToTopics({
    @PublishToTopic(value = "/salary-table/{agreementId}/{salaryTableId}", eventClass = SalaryTableChanged.class),
    @PublishToTopic(
        value = "/salary-table-replacement/{agreementId}/{activatedTable}/{deactivatedTable}",
        eventClass = ActiveSalaryTableReplaced.class
    )
})
Of course, that is just one way to do it. Connecting on the client side may look something like this:
var connectedClient = $.Deferred();

function initialize() {
    var basePath = ApplicationContext.cataDirectBaseUrl().replace(/^https/, 'wss');
    var accessToken = ApplicationContext.accessToken();
    var socket = new WebSocket(basePath + '/wss/query-events?access_token=' + accessToken);
    var stompClient = Stomp.over(socket);
    stompClient.connect({}, function () {
        connectedClient.resolve(stompClient);
    });
}

this.subscribe = function (topic, callBack) {
    connectedClient.then(function (stompClient) {
        stompClient.subscribe('/topic' + topic, function (frame) {
            callBack(frame.headers.eventType, JSON.parse(frame.body));
        });
    });
};

initialize();
Another option is to use Firebase Cloud Messaging:
Using FCM, you can notify a client app that new email or other data is available to sync.
How does it work?
An FCM implementation includes two main components for sending and receiving:
A trusted environment such as Cloud Functions for Firebase or an app server on which to build, target and send messages.
An iOS, Android, or Web (JavaScript) client app that receives messages.
The client registers its Firebase key with a server. When updates are available, the server sends a push notification to the Firebase key associated with that client. The client may receive the data in the notification structure or sync with the server after receiving the notification.
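On the server, sending such a sync notification could look roughly like this (a sketch with the firebase-admin SDK; the token handling and the data payload are assumptions):

const admin = require('firebase-admin');
admin.initializeApp(); // credentials via GOOGLE_APPLICATION_CREDENTIALS

// Called whenever new data is available for a client.
async function notifyClient(registrationToken) {
    await admin.messaging().send({
        token: registrationToken,
        // Data-only message: the client syncs with the server on receipt.
        data: { type: 'sync', resource: '/users' },
    });
}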
Generally, you might have a look at current "realtime" web frameworks like MeteorJS, which tackle exactly this problem.
Meteor specifically works more or less like your example D, with subscriptions on certain data and deltas being sent out after changes, only to the affected clients. The protocol it uses is called DDP, which additionally sends the deltas not as overhead-prone HTML but as raw data.
If websockets are not available, fallbacks like long polling or server-sent events can be used.
If you plan to implement it yourself, I hope these sources are some kind of inspiration for how this problem has been approached. As already stated, the specific use case is important.
The answer depends on your use case. For the most part, though, I've found that you can implement everything you need with sockets, as long as you are only trying to access your server with clients that support sockets. Also, scale can be an issue when you're using only sockets. Here are some examples of how you could use just sockets.
Server side:
socket.on('getUsers', () => {
    // Get users from db or data model (save as user_list).
    socket.emit('users', user_list);
});

socket.on('createUser', (user_info) => {
    // Create user in db or data model (save created user as user_data).
    io.sockets.emit('newUser', user_data);
});
Client side:
socket.on('newUser', () => {
    // A new user exists; ask the server for the updated list.
    socket.emit('getUsers');
});

socket.on('users', (users) => {
    // Do something with users.
});
This uses socket.io for Node. I'm not sure what your exact scenario is, but this would work for that case. If you need to include REST endpoints, that would be fine too.
With all the great information the great people added before me:
I found that eventually there is no right or wrong; it simply comes down to what suits your needs.
Let's take CRUD in this scenario:
WS-Only Approach:
Create/Read/Update/Delete information all goes through the websocket.
--> e.g. if you have critical performance considerations and it is not acceptable for the web client to make successive REST requests to fetch information, or if you know that you want the whole data to be seen in the client no matter what the event was, then just send the CRUD events AND DATA inside the websocket.
WS to Send Event Info + REST to Consume the Data Itself:
Create/Read/Update/Delete event information is sent on the websocket, giving the web client the information necessary to send the proper REST request to fetch exactly the thing the CRUD operation changed on the server.
e.g. the WS sends UsersListChangedEvent {"ListChangedTrigger": "ItemModified", "IdOfItem": "XXXX#3232", "UserExtraInformation": "Enough info to let the client decide if it is relevant for it to fetch the changed data"}
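On the client, that pattern might be handled like this (illustrative sketch; socket is a plain WebSocket, the payload shape mirrors the example above, and the /users/:id route is an assumption):

socket.addEventListener('message', async (msg) => {
    const event = JSON.parse(msg.data);
    // Assumes the event name travels inside the payload.
    if (event.eventName !== 'UsersListChangedEvent') return;
    // The extra information lets the client decide whether to fetch at all.
    if (event.ListChangedTrigger === 'ItemModified') {
        const res = await fetch('/users/' + encodeURIComponent(event.IdOfItem));
        const user = await res.json();
        // ...update the view with the modified user...
    }
});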
I found that using WS [only for the event data] and REST [to consume the data] is better because:
[1] Separation between the read and write models. Imagine you want to add some runtime information when your data is read from REST; that is now achievable because you are not mixing the write and read models as in option 1.
[2] Let's say another platform, not necessarily a web client, will consume this data. Then you just change the event trigger from WS to the new mechanism and keep using REST to consume the data.
[3] The client does not need two ways of reading new/modified data. Usually there is also code that reads the data when the page loads, not through the websocket; this code can now be used twice: once when the page loads, and again when WS triggers the specific event.
[4] Maybe the client does not want to fetch the new user because it is currently showing only a view of old data [e.g. users], and the new data changes are not in its interest to fetch?
I prefer A; it gives the client the flexibility to decide whether or not to update the existing data.
Also, with this method, implementation and access control become much easier.
For example, you can simply broadcast the userUpdated event to all users. This saves having to keep a client list for specific broadcasts, and the access controls and authentication applied to your REST route won't have to be reapplied, because the client is going to make a GET request again.
Many things depend on what kind of application you are making.

How do I guarantee unique subscriptions in Orion Context Broker?

In my setup I have one application that should subscribe to a specific kind of context change.
The application currently performs the subscription at startup time. However, if I restart the application, the subscription is duplicated. To overcome this issue I started to keep track of subscriptions in a database, so that I have an association between my application id and the latest subscription id.
Is there any way to achieve similar result in Orion (let's call it like "named subscriptions"), without using an external database?
There is a planned subscription "browsing" operation in Orion development roadmap (see operation ID 45 in this document) that could help in your case.
However, until this operation gets implemented, one alternative to the one you mention (i.e. keeping subscription info in an external DB) would be to access the Orion DB itself to get the subscription information. The data model (described here) is pretty simple, and getting the info is quite easy if you are familiar with MongoDB. Note that this solution requires access to the Orion DB (i.e. it is feasible if you control your own instance of Orion).
EDIT: Given that different subscriptions may use the same reference, I'd recommend using the _id field to identify each subscription (_id field values are unique). NGSI doesn't include metadata in subscriptions, but you may associate subscription IDs with applications using Orion itself, e.g. SubscriptionAssociation entities with two attributes: one for the application name and another for the subscription ID being associated.
EDIT: since Orion 0.25.0, the GET /v2/subscriptions operation allows you to browse existing subscriptions.
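With that operation, the duplicate-at-startup problem can be handled against Orion itself. A sketch (Orion 0.25.0+; ORION_URL, the description marker, the entity pattern, and the notification URL are illustrative assumptions):

const ORION_URL = 'http://localhost:1026';
const MARKER = 'my-app:weather-subscription'; // hypothetical marker

async function ensureSubscription() {
    const subs = await fetch(ORION_URL + '/v2/subscriptions').then((r) => r.json());
    // The description field lets us recognize our own subscription after a restart.
    if (subs.some((sub) => sub.description === MARKER)) return;

    await fetch(ORION_URL + '/v2/subscriptions', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
            description: MARKER,
            subject: { entities: [{ idPattern: '.*', type: 'Room' }] },
            notification: { http: { url: 'http://my-app.example.com/notify' } },
        }),
    });
}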

How to ensure that parallel queries to ext. system are executed only once and then cached

Server frameworks: Scala, Play 2.2, ReactiveMongo, Heroku
I think I have quite an interesting brain teaser for you:
In my trip-planning application I want to display a weather forecast on a map (similar to this). I'm using a paid REST service to query weather data. To speed up the user experience and reduce costs, I plan to cache weather data for each location for one hour.
There are a few not-so-obvious things to consider:
It might require querying up to 100 locations for weather to display one weather map
Weather must be queried in parallel, because it would take too long to query it serially, considering network latency
However, launching 100 threads for each user request is not an option either (imagine just 5 users looking at a map at one time)
The solution is to have, let's say, 50 workers that query weather for user requests
Multiple users might be viewing the same portion of the map
There is a possible race condition where one location is queried multiple times.
However, it should be queried only once and then cached.
The application is running in clustered environment meaning there will be several play instances.
Coming from a Java EE background I can come up with a pretty good solution using the Java EE stack.
However, I wonder how to do this using something more natural to the Scala/Play stack: Akka. There is an example (google "heroku scala akka") for a similar problem, but it doesn't solve one issue: the race condition when multiple users query the same data at once.
How would you implement this?
EDIT: I have decided that the requirement to ensure that weather data is updated only once is not necessary. The situation would happen far too infrequently to be a real problem and all proposed solutions would bring too much overhead and complexity to the system to be viable.
Thanks everyone for your time and effort. I hope answers to this question will help someone in the future with similar problem.
In Akka you can choose from multiple routing strategies. ConsistentHashingRoutingLogic could serve you well in this situation. Since actors are single-threaded, you can easily maintain a cache in each actor. This routing logic will ensure that two equal messages always hit the same actor.
Each actor can work in the following way:
1. Check the local cache (for example Apache Commons' LRUMap); if found, return.
2. Check the global cache (distributed memcache or any other key-value store); if found, store the result in the local cache and return.
3. Query the REST service.
4. Store the result in both the global and local caches.
You can have a look at this question, which I based my answer on.
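The lookup cascade itself, written out in plain JavaScript for illustration (the answer assumes Akka actors; here globalCache is stubbed with a Map and fetchWeatherFromRestService is a placeholder for the paid REST call):

const localCache = new Map();  // per-worker; an LRU map in the real thing
const globalCache = new Map(); // stand-in for memcache / any key-value store

async function fetchWeatherFromRestService(location) {
    // Placeholder for the paid REST call.
    return { location, tempC: 20, fetchedAt: Date.now() };
}

async function getWeather(location) {
    // 1. Check the local cache.
    if (localCache.has(location)) return localCache.get(location);

    // 2. Check the global cache.
    let weather = globalCache.get(location);
    if (weather === undefined) {
        // 3. Query the REST service only on a double miss.
        weather = await fetchWeatherFromRestService(location);
        // 4. Store the result in the global cache.
        globalCache.set(location, weather);
    }

    // ...and in the local cache.
    localCache.set(location, weather);
    return weather;
}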
I decided that I'll post my JMS solution as well.
The controller that processes the request for weather does the following:
Query the DB for weather data. If there are NO locations with out-of-date data, reply immediately. Otherwise continue:
Start listening on a topic (explained later).
For each location: check whether the weather for that location is already being updated.
If it isn't, send a weather-update request message to the queue.
A certain number of workers (50?) listen to that queue.
A worker first marks the location's weather as being updated.
The worker retrieves the updated weather and updates the DB.
The worker sends a message to the topic with the weather data for that location.
When the controller receives (via the topic) weather updates for all out-of-date locations, it combines them with the up-to-date locations and replies.