Does it make sense to use ELK to collect page metrics? - postgresql

We would like to collect some interesting user-related metrics on our website (e.g. "user edited profile", or "user clicked on downloaded file", etc.) and are thinking about using the ELK stack for this.
Is it a good idea to use Elasticsearch to store such events? Or would it make more sense to log them in our RDBMS?
What would be the advantages of using either of those?
(Side note: We already use Elasticsearch and PostgreSQL in our our stack.

You could save your logs in any persistent solution out there and later decide what tool to use for analyzing them.
If you want to do some queries (manage your data on the fly/real-time) you could just directly parse/pipe the logs generated by your applications and send them to elastic search, the flow would be something like:
(your app) --> filebeat --> elasticsearch <-- Kibana
Just keep in mind that the elk stack is not "cheap" and based on your setup could become more expensive to maintain in long term.
At the end depends on your use case, both solutions you mention can be used to store data, but the way you extract/query data is the one that makes the difference.

Related

RethinkDB - How to stream data to the browser

Context
Greetings,
One day I randomly found RethinkDB and I was really fascinated by the whole real-time changes thing. In order to learn how to use this tool I quickly spinned up a container running RethinkDB and i started making a small project. I wanted to make something very simple therefore i thought about creating a service in which speakers can create room and the audience can ask questions. Other users can upvote questions in order to let the speaker know which one are the best. Obviously this project has a lot of realtime needs that i believe are best satisfied by using RethinkDB.
Design
I wanted to use a vary specific set of tools for this. The backend would be made in Laravel Lumen, the frontend in Vue.JS and the database of course would be RethinkDB.
The problem
RethinkDB as it seems is not designed to be exposed to the end user directly despite the fact that no security concern exists.
Assuming that the user only needs to see the questions and the upvoted in real time, no write permissions are needed and if a user changed the room ID nothing bad will happen since the rooms are all publicly accessible.
Therefore something is needed in order to await data updates and push it through a socket to the client (socket.io for example or pusher).
Given the fact that the backend is written in PHP i cannot tell Lumen to stay awake and wait for data updates. From what i have seen from the online tutorials a secondary system should be used that should listen for changes and then push them. (lets say a node.js service for example)
This is understandable however i strongly believe that this way of transferring the data to the user is inefficient and it defeats the purpose of RethinkDB.
If I have to send the action from the client's computer (user asks a question), save it to the database, have a script that listens for changes, then push the changes to socket.io and finally have the client (vue.js) act when a new event arrives, what is the point of having a real-time database in the first place?
I could avoid all this headache simply by having the Lumen app push the event directly to socket.io and user any other database system instead.
I really cant understand the point of all this. I am not experienced with no-sql databases by any means but i really want to experiment with them.
Thank you.
This is understandable however i strongly believe that this way of transferring the data to the user is inefficient and it defeats the purpose of RethinkDB.
RethinkDB has no built in mechanism to transfer data to end-users. It has no access control (in the conventional sense) as well. The common way, like you said, is to spin up one / multiple node instance(s) running socket.io. On each instance you can listen on your RethinkDB change streams and use socket.io's broadcast functionality. This would be a common way, but as RethinkDB's streams are pretty optimized, you could also open a change stream for every incoming socket.io connection.

Build internal dashboard analytics on top of mongodb

I want to build an internal dashboard to show the key metrics of a startup.
All data is stored in a mongodb database on Mongolab (SaaS on top of AWS).
Queries to aggregate datas from all documents take 1-10minutes.
What is the best practice to cache such data and make it immediately available?
Should I run a worker thread once a day and store the result somewhere?
I want to build an internal dashboard to show the key metrics of a startup. All data is stored in a mongodb database on Mongolab (SaaS on top of AWS). Queries to aggregate datas from all documents take 1-10minutes.
Generally users aren't happy to wait on the order of minutes to interact with dashboard metrics, so it is common to pre-aggregate using suitable formats to support more realtime interaction.
For example, with time series data you might want to present summaries with different granularity for charts (minute, hour, day, week, ...). The MongoDB manual includes some examples of Pre-Aggregated Reports using time-based metrics from web server access logs.
What is the best practice to cache such data and make it immediately available?
Should I run a worker thread once a day and store the result somewhere?
How and when to pre-aggregate will depend on the source and interdependency of your metrics as well as your use case requirements. Since use cases vary wildly, the only general best practice that comes to mind is that a startup probably doesn't want to be investing too much developer time in building their own dashboard tool (unless that's a core part of the product offering).
There are plenty of dashboard frameworks and reporting/BI tools available in both open source and commercial flavours.

Store views / clicks in the flat file database on Bolt.cm website

I have a site made with Bolt.cm. I would like to show the numbers of views for specific pages on my site on several places on a bolt.cm website.
How can I store these variables in the Bolt.cm flat-file database? How can I show the correct variables in the templates?
It would be nice if I could excludes some ip-addresses in the views. I am not a flat-file expert (yet) ;)
Example:
Article 1 - How to dominate google - 5 views
It's not possible by default, very deliberately.
You're far better off using something like Google Analytics for this sort of thing. Web servers get hammered by spiders/bot/crawlers and all sorts of things that would cripple your database performance in no time. Not to mention the fun of trying to decide what is a real request.
You could then write an extension to query your Google account on each page load, though I really wouldn't recommend it.
Whilst I would completely agree with #Gawain that it's much better to use Google Analytics, or maybe Piwik if you want control of your own data, should you wish to tackle it yourself then I'd suggest something along these lines, just before the $app->run() call.
$app->after(function($request){
$data = json_encode(
array(
'route'=>$request->get("_route"),
'params'=>$request->get("_route_params"),
'ip' => $request->server->get("REMOTE_ADDR")
)
);
error_log($data, 3, '/path/to/logger.txt');
});
That would at least give you a fast write of the necessary data to a log file, which you could then use a job processing command to pick up and save to the db.
Whilst you could do all this in the after handler if you really wanted to, doing an extra db hit on every page load could unnecessarily slow down your app, so use caution.

Using CouchDB as interface. Is it appropriate way?

our devices (microscopes with cameras) produce images and additional information to each image.
Now a middleware supplies wants to connect these devices to lab automation system. They have to acquire the data and we have to provide it. An astonishing thing for me was their interface suggestion - a very cryptical token separated format (ASTM E1394-97). Unfortunatelly, they even can't accomodate images in their protocol, and are aiming to get file-paths.
I thought it is not the up-to date approach. While lookink for alternatives, I saw CoachDB.
So, my idea was, our devices would import data including images in CoachDB and they could get the data. It seems even, that using mustache, we could produce the format they want (ascii-text) and placing URLs as image references instead of path's.
My question is, did someone applied CoachDB for such a use case already? It seems to be a little-bit misuse of CoachDB, as the main intention is interface not data storage. Another point disturbing me is, that the inventor of CoachDB went to other project Coachbase. Could it mean lack of support for CoachDB in the future?
Thank you very much for any insights and suggestions!
It's ok use-case and actually we're using CouchDB in such way - as proxing middleware between medical laboratory analyzers and LIS. Some of them publish images or pdf data on shared folders and we'd just loading them into related document as attachments.
More over you'd like to know, CouchDB is able to serve external processes (aka os_daemons) and take care about their lifespan: restarting if someone had terminated and starting right after you update config options through HTTP interface. This helps to setup ASTM client and server processes since this protocol is different from HTTP (which is native for CouchDB) which communicates with devices and creates documents as regular CouchDB clients. In same way you may setup daemons to monitor shared folders for specific files. And all this is just CouchDB with few "low bounded" plugins.

Best way to store dyamic data on iOS App from Web Service

I want to know what is the best way to store data on the iPhone from a web service.
I want the information to be stored on the device so the person doesn't need to access the web service every time he/she needs it. The currently information isn't much and contains less that 150 records. The records might update from time to time and a few new ones will be added. What is the best way to go about storing the data?
Thanks
If you use ASIHTTPRequest for your network stuff (and if you don't already, I can't sing its praises highly enough), you will find it has a cache layer built in which is perfect for situations like this.
You can activate it with a simple one line;
[ASIHTTPRequest setDefaultCache:[ASIDownloadCache sharedCache]];
And you have full control over the cache policy etc - just read the documentation.
The other simple approach of course is - on the assumption that your web service is returning JSON or XML - simply to store the response in a local file against a hash of the request parameters, then when you request the data again, you can first look to see if the file exists and if it does, return that data rather than going back to the website. You can roll your own cache policies etc too.
Since I discovered ASIHTTPRequest had a cache though, I've not needed to roll my own again.
I find that using coreData or sqllite3 is just overkill for 99% my requirements and a simple cache works very well.
If the data is relational, a Sqlite3 database would be the best storage option you have.
Also, this helps by allowing you to retrieve from the server and to update only the records that have changed, thus saving time and bandwidth.
This is the best option from a scalability point of view as well, as you stated that "current information isn't much", thus giving the impression that this is only a current situation, that may be subjected to further change, probably towards more records being added in time.
Sqite3 also gives you more control and better performance than using, for instance, Core Data. Here's an article explaining some of the details. Moreover, if you work through an Objective-C wrapper, such as FMDB, you get all the advantages without managing the complexity yourself.