I apologise if the question is naive. I wanted to understand what could be a few possible use cases of the live query feature.
Let's say my database state changes, but it doesn't change every minute (or even every hour). If I execute a live query against my database/class/cluster, I'm not really expecting the callback to be called anytime soon. But, hey, I would still want to be notified when there's a state change.
My need with OrientDB is more along the lines of Elasticsearch's percolator bundled with a publish-subscribe system.
Is live query meant to cater to such use cases too? Or is my understanding of live query very limited? What could be a few possible use cases for the live query feature?
Thanks!
Whether or not Live Queries will be appropriate for your use case depends on a few things. There are several reasons why live queries make sense. A few questions to ask are:
How frequently does the data change?
How soon after the data changes do you need to know about it?
How many different groups of data (e.g. classes, clusters) do you need to deal with?
How many clients are connected to the server?
If the data does not change very often, or if you can wait a set period of time before an update, or you don't have many clients (hitting the DB directly), or if you only have one thing feeding the database, then you might want to just do polling. There is a balance between holding a connection open that you send a message on very infrequently (live queries) and polling too often.
For example, it's possible that you have an application server (Tomcat, Node, etc.) and that your clients connect via WebSockets. Now let's say your app server makes one (or a few pooled) live queries to the database, and the database has an update. The update might just go from the database to the app server (e.g. Node). Node may then be responsible for fanning out that message across 100 WebSockets (one for each connected client). In this case, the fact that Node is connected to the database in a persistent way with a live query open is not that big of a deal.
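For concreteness, here is a minimal sketch of that fan-out pattern in TypeScript using the ws library. The subscribeLiveQuery helper is a hypothetical stand-in for whatever live-query API your particular database driver exposes:

import { WebSocketServer, WebSocket } from "ws";

// Hypothetical stand-in for the driver's live query API.
declare function subscribeLiveQuery(
  query: string,
  onChange: (change: unknown) => void
): void;

const wss = new WebSocketServer({ port: 8080 });
const clients = new Set<WebSocket>();

wss.on("connection", (ws) => {
  clients.add(ws);
  ws.on("close", () => clients.delete(ws));
});

// One persistent live query serves every connected client.
subscribeLiveQuery("LIVE SELECT FROM Inventory", (change) => {
  const payload = JSON.stringify(change);
  for (const ws of clients) {
    if (ws.readyState === WebSocket.OPEN) ws.send(payload);
  }
});

The point is that the database only ever sees one subscriber, no matter how many WebSocket clients hang off the app server.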
The question is: if you have thousands of clients connected, do they all need an immediate update? If so, are you planning on having them poll at a short interval? If so, you could probably benefit from a live query. Lots of clients polling at a short interval will generate a lot of unnecessary traffic and queries.
Unfortunately, at the end of the day, the answer is: it depends. You probably need to prototype and then instrument under load to see what your trade-offs are. But in principle, it is less about how frequently updates come, and more about how often you would have clients poll, and how many clients you have. If the answer is "short intervals and a lot of clients", give live queries a try.
For a desktop app (ERP-like functionality) I'm wondering what would be wiser to do.
Assuming that both machines are equal in performance and the server has to deal with at most 5-10 clients and no other obligations: is it better to load all data initially (~20,000 objects) and do filtering, sorting, etc. on the client (Electron), or is it better to do the processing on the backend (Golang + Postgres) over Axios? The user interface should be as snappy as possible but also get the data as fast as possible.
A costly operation is filtering 15,000 objects by a reference ID (e.g. a client can have several orders). Objects that belong to a "parent object" are displayed by querying all of those objects by a parentID.
Is there a general answer as to which would be more performant, or the better choice here? Assume some numbers, like 5 ms of latency in the network + 20 ms for the API + a couple more for filling the store.
At what data size will this operation become slower on the frontend, or completely unsustainable?
If it's not a performance problem, are there other reasons I would want to do this on the server?
Edit: Client and Server are on the same local network
You specifically mention ERP-like software. For such software you have to carefully consider the value of consistency:
Will your software need to show the same data for all clients?
If the answer to this is yes, then the simplest implementation is to do data processing on the server which informs all clients of changing data.
If the answer to this is no, then you should be fine doing most processing on the client software.
There are of course ways to do most of your processing on the client and still have consistency, but they will add complexity to your overall design. One implementation is to broadcast changes from one client to all other clients. This is the architecture behind most multiplayer online games.
Another way to tackle this is implemented by git: the data on each client differs from the others, but there are ways to synchronize each client's data with the server, thus achieving eventual consistency.
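If you do keep the filtering on the client, the per-parent lookup cost is easy to keep negligible. A minimal sketch (names are illustrative, not from your code): group the objects by parentID once, then every selection is a map lookup instead of a 15,000-element scan.

interface Order {
  id: number;
  parentId: number; // reference to the owning client/parent object
}

// One linear pass at load time builds the index.
function indexByParent(orders: Order[]): Map<number, Order[]> {
  const index = new Map<number, Order[]>();
  for (const o of orders) {
    const bucket = index.get(o.parentId);
    if (bucket) bucket.push(o);
    else index.set(o.parentId, [o]);
  }
  return index;
}

declare const allOrders: Order[];       // the ~15,000 loaded objects
declare const selectedClientId: number; // the currently selected parent

const byParent = indexByParent(allOrders);
// Each "show orders for this parent" becomes an O(1) lookup.
const ordersForClient = byParent.get(selectedClientId) ?? [];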
Another consideration you have to think about is the size of your data:
Will downloading all the data from the server take more than a few seconds?
If downloading all data from the server takes too long, then the UI will be essentially unresponsive when starting.
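As a rough back-of-the-envelope check (assuming, say, 1 KB per serialized object, a made-up figure rather than one from the question): 20,000 objects x 1 KB is about 20 MB. On a local gigabit network that transfers in a fraction of a second; over a 50 Mbit/s link it is already around 3 seconds before parsing and rendering even begin. The real object size is the variable to measure first.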
I am designing a WhatsApp-like messenger application for the desktop using WPF and .NET. Now, when a user creates a group, I want the other members of the group to receive a notification that they were added to it. My frontend is built in C#/.NET, which is connected to a RESTful web service (Ruby on Rails). I am using Postgres for the database. I also have a Redis layer to cache my Rails models.
I am considering the following options.
1) Use Postgres's built-in NOTIFY/LISTEN mechanism, which the clients can subscribe to directly. I foresee two issues here:
i) Postgres might not be able to handle tens of thousands of clients subscribed directly.
ii) There is no guarantee of delivery if the client is disconnected.
2) Use Redis' Pub/Sub mechanism, to which the clients can subscribe. I am still concerned about the lack of a delivery guarantee here.
3) Use a messaging queue like RabbitMQ. The producer for this queue will be Postgres, which will push in messages through triggers. The consumers, of course, will be the .NET clients.
So far, I am inclined to use the 3rd option.
Does anyone have any suggestions how to design this?
In an application like WhatsApp itself, the client running on your phone is an integral part of a large and complex event-based, distributed system.
Without more context, it would be impossible to point you in the right direction. That said:
For option 1: You seem to imply that each client, as in a WhatsApp client, would directly (or through some web service) communicate with Postgres as an event bus, which is not sound and would not scale, because you can only have ONE Postgres instance.
For option 2: You have the same problem as in option 1, with worse failure modes.
For option 3: RabbitMQ seems like a reasonable ally here. It is distributed in nature and scales well. As a matter of fact, it runs on Erlang, just as most of WhatsApp does. Using triggers inside Postgres to publish messages, however, does not make a lot of sense.
You need a message bus because you would have lots of updates to do in the background, not to directly connect your users to each other. As you said, clients can be offline.
Architecture is more about deferring decisions than taking them.
I suggest that you start simple. Build a small, monolithic, synchronous system first, pushing updates as persisted data to all the involved users. For example: in a group of n users, just write n records to a table. It is already complicated enough to reliably keep track of who has received and read what.
This heavy "group" updates can then be moved to long-running processes using RabbitMQ or the like, but a system with several thousand users can very well work without such thing, especially because a simple message from user A to user B would not need many writes.
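As a minimal sketch of that "just write n records" starting point, using node-postgres (table and column names are illustrative, not prescribed):

import { Pool } from "pg";

const pool = new Pool(); // reads connection settings from PG* env vars

async function notifyGroup(
  groupId: number,
  messageId: number,
  memberIds: number[]
): Promise<void> {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    // One notification row per group member, written atomically.
    for (const userId of memberIds) {
      await client.query(
        `INSERT INTO notifications (user_id, message_id, delivered, read)
         VALUES ($1, $2, false, false)`,
        [userId, messageId]
      );
    }
    await client.query("COMMIT");
  } catch (e) {
    await client.query("ROLLBACK");
    throw e;
  } finally {
    client.release();
  }
}

A per-row delivered flag is what later lets a reconnecting client fetch everything it missed, which is exactly the guarantee the raw pub/sub options lack.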
I recently discovered Meteor, and I really love the simplicity that it brings to programming new apps. My question is: how do you connect it to an existing back-end? We have a substantial amount of existing Clojure code, also running with MongoDB. What I would like to do is use Meteor to build the front-end of my app. I guess I could connect my Meteor app directly to the MongoDB instance of the back-end, but this does not seem like a good practice... or is it?
Another option I imagined was to access the DB from either the webapp or the Clojure code and create a separate way of communication between the two with a queue mechanism, or sockets. Any hint or pointer to relevant documentation would be helpful!
Take a look at Meteor's environment variable settings. By setting these variables you can easily define an external MongoDB instance. In particular, it would be:
$ export MONGO_URL="mongodb://yourmongodbserver/your-db"
There is a screencast on eventedmind.com for this specific topic (https://eventedmind.com/feed/sg3ejYnmhxpBNoWan) which is quite helpful.
Regarding "how" to point them at the same database, @Michael's answer is spot on; just point your Meteor web servers at the same MongoDB.
Regarding whether or not you should, that depends on your situation. Having everything run off the same DB certainly simplifies things.
Having separate DBs can potentially reduce the load on your DB tier, as you could selectively choose which writes/updates to replicate between the Clojure and Meteor DBs.
One issue with either method is speed of notification of changes. Currently, Meteor servers poll the DB every 10 seconds to recognize changes. Happily, once the oplog branch gets merged into master, it will give a large speed improvement in how quickly external changes made in the DB (as opposed to directly through a Meteor server) are reflected in the Meteor clients. The oplog support will enable Meteor servers to emulate a replica-set instance, tailing the oplog, which will mean practically instant notification of DB changes.
Using a queue as a middle-ware layer introduces complexity and adds another point of failure. It also increases latency of notification. These issues can be mitigated, though, and there may be other pieces of your infrastructure in the future that would benefit from such a middle-ware queue. For example, other interested systems could register with the queue to receive notification of changes without querying or needing to know about your db. You can also scale your MongoDB instances independently and tune the queue to determine what "eventually" means in the "eventually consistent" guarantee.
I think the questions to ask are:
how much overlap is there between the Clojure dataset and the Meteor dataset
how quickly do you need changes to be reflected between the two
will a middle-ware queue be useful in other circumstances as you grow
Regarding possible queue technologies to look into, I've heard very good things about RabbitMQ. The Oct. 2013 talk at the Clojure NYC meetup included a description of switching to RabbitMQ from Amazon SQS due to latency issues with SQS, and anecdotally RabbitMQ has been rock-solid for them.
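For what it's worth, publishing a change notification to RabbitMQ is only a few lines with amqplib. A hedged sketch (the exchange name and message shape are made up for illustration):

import * as amqp from "amqplib";

// In a long-running service you would open the connection once and reuse it.
async function publishChange(collection: string, docId: string): Promise<void> {
  const conn = await amqp.connect("amqp://localhost");
  const ch = await conn.createConfirmChannel();
  // Fanout exchange: every bound queue (Meteor side, Clojure side) gets a copy.
  await ch.assertExchange("db-changes", "fanout", { durable: true });
  ch.publish(
    "db-changes",
    "", // routing key is ignored by fanout exchanges
    Buffer.from(JSON.stringify({ collection, docId, at: Date.now() }))
  );
  await ch.waitForConfirms(); // make sure the broker accepted the message
  await conn.close();
}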
I'm considering how to design a mechanism for replicating a (potentially large) MongoDB or other NoSQL (CouchDB, etc) database to dozens of clients at once. The clients would function like a replica set, but the replication would be one-way and the remote clients would belong to other parties. Specifically, I am looking for the following features:
real-time: changes to the master database should be pushed out to the clients as quickly as possible
replication to new clients: a new client must be able to connect, automatically sync the majority of existing data, then receive real-time updates.
efficient: both the initial synchronization/transfer of data and tracking of real-time updates ("diffs", if you will) are computationally efficient, with multiple clients connected.
secure: the master database presents an interface to which remote clients (who do not belong to the same owner or system) can connect: i.e., we cannot just add all the clients to the master's replica set.
robust: a temporary connection failure between a client and the master database should be easily and efficiently recoverable.
In some sense, the server is publishing a collection of data and the clients are subscribing to it. I realize that this is a hard software engineering problem, and to my knowledge no piece of software has implemented this exactly yet. However, a few approaches come close, which I'll list below.
Meteor's DDP protocol: It's designed to do this with Mongo-like collections and exactly implements the model of publishing and subscribing to a set of data (rather than a stream of messages). It manages the initial sync and sends along live changes. However, it's still in development, and far from being an industrial-strength solution - current drawbacks are that the server keeps a copy of every client's state in a possibly inefficient way, and it is only tested on collections that can fit in the memory of a web app. Also, it appears that DDP cannot efficiently sync an out-of-date database without fetching everything from scratch. If anyone can point to some examples of how large a collection can be synced over DDP, that would be great. (See also: https://stackoverflow.com/q/10128430/586086)
Broadcasting the Mongo oplog: Using a high-throughput message bus like Apache Kafka, one may be able to efficiently send the oplog to many clients at once. This tackles some of the system implementation challenges. However, this requires that the clients start with an initial sync that gets them close enough to the current master state somehow and then start replaying the oplog from the appropriate point.
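For concreteness, a rough sketch of what I imagine the broadcasting side could look like with today's tooling (change streams being the supported way to consume the oplog in modern MongoDB drivers; broker, database, and topic names are all made up):

import { MongoClient } from "mongodb";
import { Kafka } from "kafkajs";

async function broadcastChanges(): Promise<void> {
  const mongo = await MongoClient.connect("mongodb://localhost:27017");
  const kafka = new Kafka({ clientId: "oplog-bridge", brokers: ["localhost:9092"] });
  const producer = kafka.producer();
  await producer.connect();

  // Change streams are the supported interface over the oplog.
  const stream = mongo.db("master").collection("data").watch();
  for await (const change of stream) {
    // Kafka fans each change out to however many subscribed clients exist.
    await producer.send({
      topic: "db-changes",
      messages: [{ value: JSON.stringify(change) }],
    });
  }
}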
Continuous replication a la CouchDB: I'm not sure how this is implemented and how robust it is, given the sparsity of the documentation. However, it does seem to work over remote database connections. How efficient is this, though, when multiple clients are trying to replicate at the same time? (A similar hack to this would be to make the clients MongoDB Priority 0 replica set members; however, that seems to be far from its intended use. See also: http://guide.couchdb.org/draft/replication.html)
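(For reference, kicking off such a continuous replication is a single POST to CouchDB's _replicate endpoint; the URLs and database names below are placeholders:)

// Start continuous push replication from the master to one client.
async function replicateTo(clientCouchUrl: string): Promise<void> {
  const res = await fetch("http://master.example.com:5984/_replicate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      source: "masterdb",
      target: `${clientCouchUrl}/replica`,
      continuous: true, // keep streaming changes as they happen
    }),
  });
  if (!res.ok) throw new Error(`replication request failed: ${res.status}`);
}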
Please give pointers to software or pieces of software that already implement parts of this, or suggestions on the algorithms/data structures needed to do this efficiently.
If you are looking specifically for real-time replication, I'd recommend you look into SaaS offerings specifically for this purpose, such as https://www.firebase.com/
Is streaming a viable option?
Will there be a performance difference on the server end depending on which I choose?
Is one better than the other for this case?
I am working on a GWT application with Tomcat running on the server end. To understand my needs, imagine updating the stock prices of several stocks concurrently.
Do you want the process to be client- or server-driven? In other words, do you want to push new data to the clients as soon as it's available, or would you rather that the clients request new data whenever they see fit, even though that might not be once per second? What is the likelihood that the client will be able to stick around to wait for an answer? Even though you expect the events to occur once per second, how long does it take between a request from a client and the return from the server? If it's longer than a second, I'd expect you to lean towards pushing the events to the clients; the other way around, I'd expect polling to be okay. If the response takes longer than the interval, then you're essentially streaming anyway, since there's a new event ready by the time the client receives the last one; the client could essentially poll continually and always receive events. In this case, streaming the data would actually be more lightweight, since you're removing the connection/negotiation overhead from the process.
I would suspect server load to be higher for a client-based (pull) subscription than for a streaming configuration, since the client would have to re-negotiate the connection each time instead of leaving a connection open; but each open connection in a streaming model requires server resources as well. It depends on the trade-off between how aggressive your negotiation process is and how much memory/processing is required for each open connection. I'm no expert, though, so there may be other factors.
UPDATE: This guy talks about the trade-offs between long-polling and streaming, and he seems to say that with HTTP/1.1, the connection re-negotiation process is trivial, so that's not as much of an issue.
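To make the "continual polling is effectively streaming" point concrete, here is a minimal client-side sketch (the endpoint and its wait parameter are hypothetical):

// Long-poll loop: the server holds each request open until an event is
// ready, and the client re-requests immediately, so fast event rates
// degrade gracefully into something resembling a stream.
async function pollForever(onEvent: (event: unknown) => void): Promise<void> {
  while (true) {
    try {
      const res = await fetch("/events?wait=30"); // hypothetical endpoint
      if (res.ok) onEvent(await res.json());
    } catch {
      // Brief back-off on network errors, then resume.
      await new Promise((resolve) => setTimeout(resolve, 1000));
    }
  }
}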
It doesn't really matter. The connection re-negotiation overhead is so slim with HTTP1.1, you won't notice any significant performance differences one way or another.
The benefits of long-polling are compatibility and reliability - no issues with proxies, ports, detecting disconnects, etc.
The benefits of "true" streaming would potentially be reduced overhead, but as mentioned already, this benefit is much, much less than it's made out to be.
Personally, I find a well-designed comet server to be the best solution for large numbers of updates and/or server-push.
Certainly, if you're looking to push data, streaming would seem to provide better performance, if your server can handle the expected number of continuous connections. But there's another issue that you don't address: are you on the internet or an intranet? Streaming has been reported to have some problems across proxies, much as you'd expect. So for a general-purpose solution, you would probably be better served by long polling; for an intranet, where you understand the network infrastructure, streaming is quite likely the simpler, better-performing solution for you.
The StreamHub GWT Comet Adapter was designed exactly for this scenario of streaming stock quotes. Example here: GWT Streaming Stock Quotes. It updates the stock prices of several stocks concurrently. I think the implementation underneath is Comet, which is essentially streaming over HTTP.
Edit: It uses a different technique for each browser. To quote the website:
There are several different underlying techniques used to implement Comet including Hidden iFrame, XMLHttpRequest/Script Long Polling, and embedded plugins such as Flash. The introduction of HTML5 WebSockets into future browsers will provide an alternative mechanism for HTTP Streaming. StreamHub uses a "best-fit" approach utilizing the most performant and reliable technique for each browser.
Streaming will be faster because data only crosses the wire one way. With polling, the latency is at least doubled.
Polling is more resilient to network outages since it doesn't rely on a connection being kept open.
I'd go for polling just for the robustness.
For live stock prices I would absolutely keep the connection open, and ensure user alert/reconnection on disconnect.