Missing features in Grafana

I am using Grafana to visualize some data from my InfluxDB database, which collects data from some sensors. That said, it's the first time I'm working with both Grafana and InfluxDB, and I'm also pretty new to coding, so my knowledge is very limited.
As I scroll through threads and forums on the web trying to find guidance, I find a lot of tutorials, mostly 2-4 years old, that seem to use features in Grafana that are simply not available for me.
For example, I tried to set up an alert which tells me when my sensor is delivering flawed values (values that in my case cannot physically be true) too often. But when I'm using avg() from the classic condition operations, I can't select a time frame over which I want the average value monitored.
[Screenshot: the expression part of my alert settings]
Is this something that has to be configured via grafana.ini? Or is it because these features cannot be used with InfluxDB?
For some background information: I'm running both the database and the Grafana server on an Ubuntu Server VM in VirtualBox, and I'm using a small Python script to write the sensor data into the database.
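For reference, this is a minimal sketch of what such a script might look like, assuming the influxdb 1.x Python client; the database, measurement and tag names are placeholders, not my real ones:

from influxdb import InfluxDBClient

# Connect to the local InfluxDB instance that Grafana also queries
client = InfluxDBClient(host="localhost", port=8086, database="sensors")

# Write one reading; "temperature" and "sensor-1" are placeholder names
client.write_points([{
    "measurement": "temperature",
    "tags": {"sensor_id": "sensor-1"},
    "fields": {"value": 21.7},
}])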
If someone could help me out soon that would be great!

Related

How to scale a custom-made analytics engine?

We have a mid-size analytics engine built on top of an Elasticsearch cluster.
We store data sent to our servers in the form of JSON, very similar to what Google Analytics might be doing. We push this entire data into the ES cluster; as of now it amounts to ~60 GB per day (approx. 2 TB per month).
We have a data retention policy of a few months, let's say 6 months (as per the pricing plan).
We provide dynamic reports like:
all the users who are coming from the United States, are using the Chrome browser, and are using the browser on an iPhone.
the sum of clicks on a particular button by all the users who are coming from a referrer matching the regex "http://www.google.com", are based out of India, and are using Desktop.
PROBLEM
It has worked pretty well for us until now, but we are facing a problem scaling it: we have already deployed hundreds of servers to handle this amount of data and show near-real-time analytics.
What I am looking for here is how I can optimise data storage and still show near-real-time slicing and dicing of the data. Imagine how Google Analytics or Mixpanel might be storing and showing data in real time.
I am open to any technology shift. Suggestions, please. (Something similar to GA or Mixpanel is what we have in terms of features.)
Do you think storing this huge amount of data in some NoSQL store like MongoDB and running MapReduce on that data would work? But that might not be real time (we can accept a delay of 5-10 minutes in showing data).
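To make that concrete, here is a rough sketch of how the second report above might be expressed as a MongoDB aggregation, assuming pymongo and a hypothetical "events" collection holding payloads like the one shown below:

from pymongo import MongoClient

# Hypothetical collection with one document per raw event payload
events = MongoClient("mongodb://localhost:27017")["analytics"]["events"]

# Sum of clicks from users referred by google.com, based in India, on Desktop
pipeline = [
    {"$match": {
        "action": "click",
        "country": "India",
        "device_type": "Desktop",
        "referrer": {"$regex": r"^http://www\.google\.com"},
    }},
    {"$group": {"_id": None, "clicks": {"$sum": 1}}},
]
print(list(events.aggregate(pipeline)))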
Tech Stack Used (as of now)
Apache/Nginx as web server + application code
Programming language (Ruby/PHP etc.)
Log collection/parsing via Logstash
Elasticsearch cluster to store and query data
SDK written in JavaScript which pushes events to our server (like GA)
We store an event payload which looks something like this:
{
  "query_params": [],
  "device_type": "Desktop",
  "browser_string": "Chrome 47.0.2526",
  "ip": "62.82.34.0",
  "screen_colors": "24",
  "os": "Mac OS X",
  "browser_version": "47.0.2526",
  "session": 1,
  "country_code": "ES",
  "document_encoding": "UTF-8",
  "city": "Palma De Mallorca",
  "tz": "Europe/Madrid",
  "uuid": "A37F2D3A4B99FF003132D662EFEEAFCA",
  "combination_goals_facet_term": "c2_g1",
  "ts": 1452015428,
  "hour_of_day": 17,
  "os_version": "10.11.2",
  "experiment": 465,
  "user_time": "2016-01-05T17:37:10.675000",
  "direct_traffic": false,
  "combination": "2",
  "search_traffic": false,
  "returning_visitor": false,
  "hit_time": "2016-01-05T17:37:08",
  "user_language": "es",
  "device": "Other",
  "active_goals": [1],
  "account": 196,
  "url": "http://someurl.com",
  "action": "click",
  "country": "Spain",
  "region": "Islas Baleares",
  "day_of_week": "Tuesday",
  "converted_goals": [],
  "social_traffic": false,
  "converted_goals_info": [],
  "referrer": "http://www.google.com",
  "browser": "Chrome",
  "ua": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36",
  "email_traffic": false
}
EDIT
"optimize data storage" means for every event we receive 70% data same in the json payload. However we keep on creating the new document in ES for event. I was hoping if somehow we stop storing the repeated keys of json and store only what changed in subsequent event payload. Thus optimizing storage space.
We are using SSDs on all our servers. What I am worried about is what happens when we talk about the scale of GA and a similar amount of data; I doubt the above-mentioned architecture or tech will survive. Looking for suggestions for that sort of scale.
I think you are already using the best-suited stack for this kind of use case. What I would suggest is to work on fine-tuning the Elasticsearch optimizations, if you haven't already done so.
Some suggestions could be:
Think of using SSDs instead of HDDs for the Elasticsearch cluster.
Think of fine-tuning parameters like "refresh_interval" (see the sketch after this list).
Use auto-scaling in the cloud and load balancers in order to handle requests properly.
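As a rough sketch of the refresh-interval tuning, assuming the elasticsearch-py client and a hypothetical index pattern "events-*" (the exact value needs testing against your freshness requirements):

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# Refresh less often than the 1s default so heavy ingest creates fewer tiny segments
es.indices.put_settings(
    index="events-*",
    body={"index": {"refresh_interval": "30s"}},
)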
Hope this helps.

Need advice: How to share a potentially large report with remote users?

I am asking for advice on possibly better solutions for the part of the project I'm working on. I'll first give some background and then my current thoughts.
Background
Our clients can use my company's products to generate potentially large data sets for use in their industry. When the data sets are generated, the clients will file a processing request with us.
We want to send the clients a summary email which contains some statistical charts as well as sampling points from the data sets so they can do some initial quality control work. If the data sets are of bad quality, they don't need to file any request.
One problem is that the charts and sampling points can be too large to be sent in an email. The charts and the sampling points we want to include in the emails are pictures. Although we can use a low-quality format such as JPEG to save space, we cannot control how many data sets will be included in the summary email, so the total size could still exceed the normal email size limit.
In terms of technologies, we are mainly developing in Python on Ubuntu 14.04.
Goals of the Solution
In general, we want to present a report-like thing to the clients so they can do some initial QA. The report may contain external links but does not need to be very interactive. In other words, a static report should be fine.
We want to reduce the steps or things that our clients must do to read the report. For example, if the report can be just an email, the user only needs to 1) log in and 2) open the email. If they use an email client, they may skip 1) and just open it and begin to read.
We also want to minimize the burden of maintaining extra user accounts, for both us and our clients. For example, if a solution requires us to register a new user account, that solution is, although still acceptable, not ranked very high.
Security is important because our clients don't want their reports to be read by unauthorized third parties.
We want the process automated. We want the solution to provide a programming interface so that we can automate the report sending/sharing process.
Performance is NOT a critical issue. Our user base is not large, I think at most in the hundreds. They also don't generate data that frequently, at most once a week. We don't need real-time response; even a delay of a few hours is still acceptable.
My Current Thoughts of Solution
Possible solution #1: In-house web service. I can set up a server machine and develop our own web service. We put the report into our database and the clients can then query via the Internet.
Possible solution #2: Amazon Web Services. AWS is quite mature, but I'm not sure whether it would be expensive, because so far we just want to share a report with our remote clients, which doesn't seem like a big enough deal to use AWS for.
Possible solution #3: Google Drive. I know Google Drive provides an API to do uploading and sharing programmatically, but I think we would need to register a dedicated Google account to use that.
Any better solutions?
You could possibly use AWS S3 and CloudFront. Files can easily be loaded into S3 using the AWS SDKs and API. You can then use the API to generate secure links to the files that can only be opened for a specific time and optionally from a specific IP.
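As a rough illustration of the time-limited link part, here is a sketch using boto3 with placeholder bucket and key names (a plain S3 presigned URL covers the expiry; the per-IP restriction needs CloudFront signed URLs, which take a bit more setup):

import boto3

s3 = boto3.client("s3")

# Upload the report, then hand out a link that expires after 24 hours
s3.upload_file("report.pdf", "my-report-bucket", "client-42/report.pdf")
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-report-bucket", "Key": "client-42/report.pdf"},
    ExpiresIn=24 * 3600,
)
print(url)  # email this link to the client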
Files on S3 can also be automatically cleaned up after a specific time if needed using lifecycle rules.
Storage and transfer prices are fairly cheap with AWS, and remember that the S3 storage cost indicated is per month, so if you only have an object stored for a few days then you only pay for those few days.
S3: http://aws.amazon.com/s3/pricing
Cloudfront: https://aws.amazon.com/cloudfront/pricing/
Here's a list of the SDKs for AWS:
https://aws.amazon.com/tools/#sdk
Or you can use their command line tools for Windows batch or PowerShell scripting:
https://aws.amazon.com/tools/#cli
Here's some info on how the private content URLs are created:
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/PrivateContent.html
I would suggest building this service using a mix of your #1 and #2 options. You can do the processing yourselves and, for transferring the data, leverage AWS S3, which is quite cheap.
Example: 100 GB costs approximately $3.
AWS S3 will also be beneficial because you are covered in case of any disaster in your local environment; your data will be safe in S3.
For security you can leverage data encryption and signed URLs in AWS S3.

WoW Addon to REST API

I'm going to create a web service for learning purposes and wanted to combine it with my WoW hobby. My goal would be to create a "simple" addon which tracks my battleground activity in real time.
So when I queue for AB it would enter my data in a DB, and when I'm out of the BG it should delete the DB entry. The information should be stored in a JSON/XML file, and whenever the BG status changes it should execute the POST/update against the DB via the RESTful service.
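On the web-service side, I imagine something roughly like this (a hedged sketch using Flask, which I may not end up using; the route and payload fields are made up for illustration):

from flask import Flask, request

app = Flask(__name__)
entries = {}  # character name -> current BG status (stand-in for a real DB)

@app.route("/bg-status", methods=["POST"])
def update_status():
    data = request.get_json()  # e.g. {"character": "Name", "bg": "AB", "queued": true}
    if data.get("queued"):
        entries[data["character"]] = data     # entering the queue / BG: store the entry
    else:
        entries.pop(data["character"], None)  # left the BG: delete the entry
    return {"stored": bool(data.get("queued"))}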
The real-time communication is very important here, and I would like to know which ways of communicating with a web service are available, so I could directly dive in and create a solution. I'd like to have resources rather than finished solutions.
Currently I'm not familiar with Lua, but I would like to learn it to gain the knowledge to create such a service. Which sites would you suggest for learning Lua, especially the WoW API?
Addons only write to disk when you log out of a character (and read that saved data when you log in), so what you intend would not be possible.*
More involved ways of communicating with the rest of the computer or even the internet are prohibited to prevent the gain of certain advantages; an example would be looking up details about your arena opponents.
* Well, there are certainly some ways, but rather complicated ones: a program monitoring sound output to check when the BG queue pop sound is played, or a screen grabber that registers when the BG score screen comes up (which can be viewed during the match, too).

Published workbooks or dashboards take quite a long time to open in Tableau Server

I am using Tableau Desktop 8.2 and Tableau Server 8.2 (licensed versions); the workbooks created in Tableau Desktop are successfully published to Tableau Server.
But when users want to see the views or workbooks, it takes a very long time to preview or open them.
The workbooks are built against an Amazon Redshift database with more than 5 million records.
Could somebody guide me on this? Why is it taking so long to preview or open the views even after they have been published to Tableau Server?
First question, are the views performant when opened using only Tableau Desktop? Get them working well on Desktop before introducing Server into the mix.
Then look at the logs in My Tableau Repository, which include query strings and timing info, to see if you can narrow down the cause. You can also try the Performance Recorder feature.
A typical problem is an overly expensive query just to display a dashboard. In that case, simplify. Start with a simple high-level summary viz and then introduce complexity, testing the impact on performance. If one viz is too slow, there are usually alternative approaches available.
I completely agree with Alex; I had a similar issue with HP Vertica. I had a lot of actions set on the dashboard. Since the database structure was final, I created a Tableau extract and used the online Tableau extract in place of the live connection. Voila! That solved my problem, and the users are happy with the response time as well. Hope this helps you too.
Tableau provides two modes of data refresh:
Live: Tableau will execute the underlying queries every time the dashboard is referred to or refreshed. Apart from badly formulated queries, this is one of the reasons why your dashboard on Tableau Online might take forever to load.
Extract: The query will be executed once, according to (your) specified schedule, and the same data will be reflected every time the dashboard is refreshed.
In extract mode, time is taken only when the extract is being refreshed. However, if the extract is not refreshed periodically, the same stale data will be shown on the dashboard. Thus, extract mode is not recommended for representing live data.
You can toggle Live <--> Extract from the Data Source pane of Tableau Desktop (top right of the pane).

Realtime backend platform for reporting / dashboards?

I will build a dashboard system for my apps, where a page will have several widgets that draw charts, tables and glyphs representing potentially unrelated data.
The client will be HTML5, and I can require a modern web browser only.
My big problem is what backend to use for this. I want to store "tables" for use in the charts and update the widgets in real time.
For example, an invoicing widget will show how much money has been collected today. The "table" will have a row for the total of each invoice:
inv = 1; total = 50
Total: 50
and the widget will draw that. When new data is pushed:
inv = 2; total = 100
Total: 150
The widget will show the running total to the end user in real time.
The data is private to the user's company. Eventually I will need to purge data that is too old (i.e. I only need to keep as much data as is necessary for proper evaluation of the info the end user needs; for example, only keep one month of invoicing totals).
I'm thinking of using something like http://www.firebase.com/ or http://pusher.com/, but I suspect they only solve the "notify in realtime" part of the equation. As far as I understand, they don't let me get past data (i.e. if the data is updated over the weekend and the user opens his dashboard later to see what happened).
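For what it's worth, the "notify in realtime" half would look roughly like this from Python (a sketch using the pusher library; the channel and event names are made up), while past data would still have to come from my own database:

import pusher

client = pusher.Pusher(app_id="APP_ID", key="KEY", secret="SECRET", cluster="eu")

# Push the new running total to every dashboard subscribed to this company's channel
client.trigger("company-42-dashboard", "invoice-created",
               {"inv": 2, "total": 100, "running_total": 150})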
Then I saw http://derbyjs.com/ and the possibility of using MongoDB.
I wonder which backend/platform will bring me closest to building this system. I have experience with Python/Django/.NET/Postgres, but I could accept using something else if it better fits this kind of app behavior.
Firebase offers both the "notify in realtime" part that you mention and persistent data storage. Take a look at the tutorial, which walks you through building a real-time persisted chat app (the past chat messages are stored in Firebase and are sent back to the client every time you reload). And you can do much more complicated stuff like the real-time charts / widgets that you mention as well.
The big limitation with Firebase right now is that we're in closed beta and the data is currently unprotected (anybody can read and write your data). The security features are coming soon though.
Some other backend platforms you may want to evaluate are: Meteor and Simperium. Firebase and Simperium are cloud services where your data is stored in the cloud and you don't have to manage any servers of your own, while Meteor and DerbyJS are platforms that you have to install and run on your own server.
I would recommend SignalR. It's amazing, and you can do almost anything with it. Check it out at www.signalr.net, and if you have any problems simply go to www.jabbr.net; you will find a very helpful community there. I implemented a notification mechanism similar to Facebook's, together with real-time monitoring and a small chat, on the same website.