I need to replace an old stats collector that receives stats from a number of apps, does a bit of aggregation (and maybe sorting), and then sends the processed stats onto influx.
The requirements that I have at the moment are that the application/service should make use of tags and loosely follow the metrics 2.0 format.
The frontend will be implemented in Erlang, but I can always make a small API to deal with small changes.
Apologies if I've used the wrong terminology or don't make sense in some places, I'm still relatively new in most of these areas.
Related
I have 2 closed-source application that must share the same data at some point. Both uses REST APIs.
An actual example are helpdesk tickets, they can be created on both applications and i need to update the data on one application when the user adds a new ticket/closes a ticket on the other application and vice versa.
Since is closed-source I can't really modify che code.
I was thinking I can create a third application that every 5 minutes or so, list both applications' tickets for differences on the precedent call, and if the data is different from the precedent call it updates the other application too.
Is there a better way of doing this?
With closed-source applications it's nearly impossible to get something out of them, unless they have some plugin-based setup that you can hook into.
The most efficient way in terms of costs would be to have the first application publish a message on a queue, or call a web-hook that you set, whenever the event is triggered. But as I mentioned, the application needs to support that.
So yeah, your solution is pretty much everything you can do for now, but keep in mind the challenges that you may encounter over time:
What if the results of both APIs are too large to be compared directly? Maybe you need to think about paging the results.
What if your app crashes and you loose the previous state? You need to somehow back it up in an external source
How often you should poll the API to make sure you're getting the updates you need, while keeping a good performance for the existing traffic?
Context
Greetings,
One day I randomly found RethinkDB and I was really fascinated by the whole real-time changes thing. In order to learn how to use this tool I quickly spinned up a container running RethinkDB and i started making a small project. I wanted to make something very simple therefore i thought about creating a service in which speakers can create room and the audience can ask questions. Other users can upvote questions in order to let the speaker know which one are the best. Obviously this project has a lot of realtime needs that i believe are best satisfied by using RethinkDB.
Design
I wanted to use a vary specific set of tools for this. The backend would be made in Laravel Lumen, the frontend in Vue.JS and the database of course would be RethinkDB.
The problem
RethinkDB as it seems is not designed to be exposed to the end user directly despite the fact that no security concern exists.
Assuming that the user only needs to see the questions and the upvoted in real time, no write permissions are needed and if a user changed the room ID nothing bad will happen since the rooms are all publicly accessible.
Therefore something is needed in order to await data updates and push it through a socket to the client (socket.io for example or pusher).
Given the fact that the backend is written in PHP i cannot tell Lumen to stay awake and wait for data updates. From what i have seen from the online tutorials a secondary system should be used that should listen for changes and then push them. (lets say a node.js service for example)
This is understandable however i strongly believe that this way of transferring the data to the user is inefficient and it defeats the purpose of RethinkDB.
If I have to send the action from the client's computer (user asks a question), save it to the database, have a script that listens for changes, then push the changes to socket.io and finally have the client (vue.js) act when a new event arrives, what is the point of having a real-time database in the first place?
I could avoid all this headache simply by having the Lumen app push the event directly to socket.io and user any other database system instead.
I really cant understand the point of all this. I am not experienced with no-sql databases by any means but i really want to experiment with them.
Thank you.
This is understandable however i strongly believe that this way of transferring the data to the user is inefficient and it defeats the purpose of RethinkDB.
RethinkDB has no built in mechanism to transfer data to end-users. It has no access control (in the conventional sense) as well. The common way, like you said, is to spin up one / multiple node instance(s) running socket.io. On each instance you can listen on your RethinkDB change streams and use socket.io's broadcast functionality. This would be a common way, but as RethinkDB's streams are pretty optimized, you could also open a change stream for every incoming socket.io connection.
This may be quite an easy question to answer as it may just be my lack of understanding, but if you are having to run the query twice - once of the server and once on the client - why not just publish all the collection data, and then just run one query on the client?
Obviously I don't mean doing this for the users collection, but if you have a blog Posts collection, wouldn't this be beneficial?
Publish all the post data, then subscribe to it and running whatever query is necessary on the client to get the data you need.
Publishing everything is good for 'development' environment as meteor adds autopublish by default but this has some fallacies in 'production' environment. I find this two points to be of importance
Security : The idea is, supply only as much data to the client as required. You can never trust the client and you don't know what the client may use the data for. For your use case, of simple blog posts, this may not be a serious risk but may be a critical risk for e commerce application. The last thing, you want is a hacker to use the data and leverage a bug in your code to do nasty stuff.
Data Overheads: For subscriptions, generally waitOn is used. Thus, till all the data has been made available to the client, the templates are not rendered. IF you have a very large amount of data it will take considerable time to render. So, it is advised to keep the data at 'only what to need' stage to optimize this time too.
our devices (microscopes with cameras) produce images and additional information to each image.
Now a middleware supplies wants to connect these devices to lab automation system. They have to acquire the data and we have to provide it. An astonishing thing for me was their interface suggestion - a very cryptical token separated format (ASTM E1394-97). Unfortunatelly, they even can't accomodate images in their protocol, and are aiming to get file-paths.
I thought it is not the up-to date approach. While lookink for alternatives, I saw CoachDB.
So, my idea was, our devices would import data including images in CoachDB and they could get the data. It seems even, that using mustache, we could produce the format they want (ascii-text) and placing URLs as image references instead of path's.
My question is, did someone applied CoachDB for such a use case already? It seems to be a little-bit misuse of CoachDB, as the main intention is interface not data storage. Another point disturbing me is, that the inventor of CoachDB went to other project Coachbase. Could it mean lack of support for CoachDB in the future?
Thank you very much for any insights and suggestions!
It's ok use-case and actually we're using CouchDB in such way - as proxing middleware between medical laboratory analyzers and LIS. Some of them publish images or pdf data on shared folders and we'd just loading them into related document as attachments.
More over you'd like to know, CouchDB is able to serve external processes (aka os_daemons) and take care about their lifespan: restarting if someone had terminated and starting right after you update config options through HTTP interface. This helps to setup ASTM client and server processes since this protocol is different from HTTP (which is native for CouchDB) which communicates with devices and creates documents as regular CouchDB clients. In same way you may setup daemons to monitor shared folders for specific files. And all this is just CouchDB with few "low bounded" plugins.
So i've been spending some time developing an iPhone app - it's a simple little game and is similar to "Words with friends" in that it:
1) is turn based
2) contacts a web service API to store the "game data" (turns, user info, etc).
In my case, i'm using .NET MVC and a SQL Server backend to develop the API. We're not talking an immense amount of data here - small images will be transferred back and forth and stored in the database though. A typical request would see a few records added or changed in the database.
I mostly don't have much concept of when things would start to get overloaded - my concern, of course, is that this thing takes off (obviously wishful thinking) and then my server gets so overwhelmed that it dies. That being said, I don't want to spend time and money on Windows Azure or something when my hosting needs may be totally trivial.
So, my somewhat general question is this - does anyone have any firsthand knowledge of when things start to get overloaded? Like...just a general estimate of number of requests or something for a time period, assuming each request hits the .NET app which then hits the database a reasonable number of times.
Even some anecdotal "My similar API gets hit 10,000 times a minute and is hosted on crappy shared hosting" would be awesome just so I get some concept.
Thanks in advance!
It is very hard to give a good answer to your question as it greatly depends on what precisely the backend does for each request. Even "trivial" services as you describe can easily differ greatly in performance depending on the actual implementation.
As a rough guideline based on our projects, if your API is a single HTTP request (no HTTPS), hitting a bare-bones controller, being translated into a single, simple SQL statement ("SELECT * FROM foo WHERE bar") returning less than 100 Bytes of data, you can serve about 750 requests per minute on a 32 Bit, 1 Gigahertz box with 512MB ram.
But this number will be reduced to 75 or less if any of those factors go up.
That said:
This is the poster-child case for cloud computing.
If Azure is too much hassle / cost for you (which is not an uncommon complaint from independent developers) you have three main alternatives:
1) Ditch .NET in favor of Python and host within Google App Engine
Python is quick to learn and GAE scales beautifully without you ever needing to care. Best of all, there is a huge free-tier so unless your app really takes off, you won't pay a cent. As you are developing for iOS, I assume you aren't hell bent on .NET to begin with.
2) If you need .NET, go with AWS
They also have a rather large free-tier. Either throw everything on top of a Mono stack (completely free for the 1st year) or shell out the money for a Windows EC2 instance. This takes more planning than GAE but with a little work you can make it scale to wherever your app goes.
If cost is a concern, use the same AWS cluster to host several of your Apps' APIs.
3) Go with OpenFeint's Multiplayer API
OpenFeint supports basic multiplayer games. If you can implement the needed functionality using it, then this might be the best solution. If not, look into (1) and (2).
How long is a piece of string? It all depends with the hosting and connection speeds. .Net is more than capable of handling LARGE amounts of requests. The simplest solution is to monitor the server (or if you cannot, monitor your web services performance) and get better hosting if your app starts to suffer.