Any suggestions as to what tools I should use to visualize data aggregated into MongoDB on my desktop? I am trying to visualize stock data that I've pulled from an API, and I would like to stream that data into MongoDB in real time. Any suggestions? Is Power BI a good go-to? Kinda lost here...
You can use the MongoDB BI Connector with Power BI's on-premises data gateway. But it won't give you real-time data; you will have to refresh your dataset periodically.
If you want real-time data visualization, you should try Power BI's real-time streaming functionality, either via its REST API or via PubNub.
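If you go the streaming route, the gist is to POST JSON rows to the push URL of a Power BI streaming dataset. A minimal sketch in Python (the push URL below is a placeholder you copy from the dataset's API info panel, and the row fields are assumptions that must match whatever schema you define for the dataset):

import requests
from datetime import datetime, timezone

# Placeholder: copy the real push URL from the streaming dataset's "API info" panel in Power BI.
PUSH_URL = "https://api.powerbi.com/beta/<workspace>/datasets/<dataset-id>/rows?key=<key>"

def push_tick(symbol: str, price: float) -> None:
    """Send one stock tick; Power BI expects a JSON array of row objects."""
    rows = [{
        "symbol": symbol,                                   # assumed field names --
        "price": price,                                     # they must match the dataset schema
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }]
    resp = requests.post(PUSH_URL, json=rows, timeout=10)
    resp.raise_for_status()

push_tick("AAPL", 187.32)

The same function can be called from whatever process is already writing the ticks into MongoDB, so both sinks stay in sync.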
I am using Grafana to visualize some data from my InfluxDB database, which collects data from some sensors. That said, it's the first time I'm working with both Grafana and InfluxDB, and I'm also pretty new to coding, so my knowledge is very limited.
As I scroll through threads and forums on the web trying to find guidance, I find a lot of tutorials, mostly 2-4 years old, that seem to use features in Grafana that are simply not available for me.
For example, I tried to set an alert which tells me when my sensor is delivering flawed values (values that in my case cannot physically be true) too often. But when I'm using avg() from the classic condition operations, I can't select a time frame over which I want the average value monitored.
(Screenshot: the expression part of my alert settings)
Is this something that has to be configured via grafana.ini? Or is it because these features cannot be used with InfluxDB?
For some background information: I'm running both the database and the Grafana server on an Ubuntu Server VM in VirtualBox, and I'm using a little Python script to write the sensor data into the database.
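For reference, the write path looks roughly like this (a minimal sketch only, assuming InfluxDB 2.x and the influxdb-client package; the URL, token, org, bucket, and measurement names are placeholders):

from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# Placeholders: use your own InfluxDB URL, token, org and bucket.
client = InfluxDBClient(url="http://localhost:8086", token="<token>", org="<org>")
write_api = client.write_api(write_options=SYNCHRONOUS)

def write_reading(sensor_id: str, value: float) -> None:
    """Write one sensor reading as a point in the 'sensor_data' measurement."""
    point = Point("sensor_data").tag("sensor", sensor_id).field("value", value)
    write_api.write(bucket="sensors", record=point)

write_reading("temp_1", 23.4)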
If someone could help me out soon that would be great!
I am currently working on a project and trying to find out the most suitable Azure option for making daily API calls to Dark Sky (https://darksky.net/dev).
In short, we are looking to bring weather data into our data lake storage (Azure Data Lake Store), and we would have to make hundreds of thousands of calls a day to Dark Sky in order to retrieve the data (unfortunately they don't provide a batch API yet). My question is: which service would be best to make the calls, get the data, and write it to the data lake?
The ingestion will not be continuous, and I would like to combine the responses in order to minimize write operations on the data lake.
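Conceptually, what I mean is something along these lines (a rough local sketch only; the Dark Sky URL pattern, the coordinates, and the output file are illustrative, and in practice the write would go to the data lake via the Azure SDK rather than to a local file):

import json
import requests

API_KEY = "<dark-sky-key>"                              # placeholder
LOCATIONS = [(40.7128, -74.0060), (51.5074, -0.1278)]   # example coordinates

def fetch_forecasts(locations):
    """Call Dark Sky once per location and collect the JSON responses."""
    results = []
    for lat, lon in locations:
        url = f"https://api.darksky.net/forecast/{API_KEY}/{lat},{lon}"
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        results.append(resp.json())
    return results

def write_combined(results, path="weather_batch.json"):
    """Combine many responses into one file so the data lake sees a single write."""
    with open(path, "w") as f:
        json.dump(results, f)

write_combined(fetch_forecasts(LOCATIONS))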
I have been considering Azure Functions and Azure Batch, but I am unsure which one is more suitable and also more cost-effective given these requirements.
Thank you for your help.
We have a mid-size analytics engine built on top of an Elasticsearch cluster.
We store data sent to our servers in the form of JSON, very similar to what Google Analytics might be doing. We push this entire stream into the ES cluster; as of now it amounts to ~60 GB per day (approx. 2 TB per month).
We have a data retention policy of a few months, let's say 6 months (as per the pricing plan).
We provide dynamic reports such as the following (a rough query sketch follows the examples):
all the users who are coming from the United States, are using the Chrome browser, and are using that browser on an iPhone.
the sum of clicks on a particular button by all the users who are coming from a referrer matching the regex "http://www.google.com", are based out of India, and are using Desktop.
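For concreteness, the first report maps to roughly this kind of Elasticsearch filter-plus-aggregation (a sketch only; the index name "events", the exact field names, and keyword mappings are assumptions based on the payload shown further down):

import requests

# Hypothetical query: users from the United States, on Chrome, on an iPhone.
query = {
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {"term": {"country": "United States"}},
                {"term": {"browser": "Chrome"}},
                {"term": {"os": "iOS"}},   # "using the browser on an iPhone"
            ]
        }
    },
    "aggs": {"unique_users": {"cardinality": {"field": "uuid"}}},
}

resp = requests.post("http://localhost:9200/events/_search", json=query)
resp.raise_for_status()
print(resp.json()["aggregations"]["unique_users"]["value"])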
PROBLEM
It has worked pretty well for us till now, but we are facing a problem scaling: we have already deployed hundreds of servers to handle this volume of data and show near real-time analytics.
What I am looking for here is how I can optimise data storage and still show near real-time slicing and dicing of data. Imagine how Google Analytics or Mixpanel might be storing and showing data in real time.
I am open to any technology shift. Suggestions, please. (Something similar to GA or Mixpanel is what we have in terms of features.)
Do you think storing this huge amount of data in a NoSQL store like MongoDB and running map-reduce on that data would work? But that might not be real time (we can expect a delay of 5-10 minutes in showing data).
Tech Stack Used (as of now)
Apache/Nginx as web server + application code
Programming language (Ruby/PHP etc.)
Log collection/parsing via Logstash
Elasticsearch cluster to store and query data
SDK written in JavaScript which pushes events to our server (like GA)
We store an event payload which looks something like this:
{
  "query_params": [],
  "device_type": "Desktop",
  "browser_string": "Chrome 47.0.2526",
  "ip": "62.82.34.0",
  "screen_colors": "24",
  "os": "Mac OS X",
  "browser_version": "47.0.2526",
  "session": 1,
  "country_code": "ES",
  "document_encoding": "UTF-8",
  "city": "Palma De Mallorca",
  "tz": "Europe/Madrid",
  "uuid": "A37F2D3A4B99FF003132D662EFEEAFCA",
  "combination_goals_facet_term": "c2_g1",
  "ts": 1452015428,
  "hour_of_day": 17,
  "os_version": "10.11.2",
  "experiment": 465,
  "user_time": "2016-01-05T17:37:10.675000",
  "direct_traffic": false,
  "combination": "2",
  "search_traffic": false,
  "returning_visitor": false,
  "hit_time": "2016-01-05T17:37:08",
  "user_language": "es",
  "device": "Other",
  "active_goals": [1],
  "account": 196,
  "url": "http://someurl.com",
  "action": "click",
  "country": "Spain",
  "region": "Islas Baleares",
  "day_of_week": "Tuesday",
  "converted_goals": [],
  "social_traffic": false,
  "converted_goals_info": [],
  "referrer": "http://www.google.com",
  "browser": "Chrome",
  "ua": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36",
  "email_traffic": false
}
EDIT
"optimize data storage" means for every event we receive 70% data same in the json payload. However we keep on creating the new document in ES for event. I was hoping if somehow we stop storing the repeated keys of json and store only what changed in subsequent event payload. Thus optimizing storage space.
We are using SSDs on all our servers. What I am worried about is what happens when we reach the scale of GA and a similar amount of data; I doubt the above-mentioned architecture or tech stack will survive. Looking for suggestions for that sort of scale.
I think you are already using the best-suited stack for this kind of use case. What I would suggest is working on fine-tuning the Elasticsearch optimizations, if you haven't already.
Some suggestions:
Use SSDs instead of HDDs for the Elasticsearch cluster.
Tune index settings such as refresh_interval (see the sketch below).
Use auto-scaling in the cloud plus load balancers so requests are handled properly.
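For example, a longer refresh_interval trades a little indexing freshness for throughput. A minimal sketch of changing it via the index settings API (the index name "events" and the 30s value are just examples):

import requests

# Relax the refresh interval on a hypothetical "events" index so Elasticsearch
# builds new searchable segments less often, reducing indexing overhead.
resp = requests.put(
    "http://localhost:9200/events/_settings",
    json={"index": {"refresh_interval": "30s"}},
)
resp.raise_for_status()
print(resp.json())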
Hope this helps.
I want to build an internal dashboard to show the key metrics of a startup.
All data is stored in a MongoDB database on MongoLab (SaaS on top of AWS).
Queries that aggregate data across all documents take 1-10 minutes.
What is the best practice to cache such data and make it immediately available?
Should I run a worker thread once a day and store the result somewhere?
I want to build an internal dashboard to show the key metrics of a startup. All data is stored in a MongoDB database on MongoLab (SaaS on top of AWS). Queries that aggregate data across all documents take 1-10 minutes.
Generally users aren't happy to wait on the order of minutes to interact with dashboard metrics, so it is common to pre-aggregate using suitable formats to support more realtime interaction.
For example, with time series data you might want to present summaries with different granularity for charts (minute, hour, day, week, ...). The MongoDB manual includes some examples of Pre-Aggregated Reports using time-based metrics from web server access logs.
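A minimal sketch of that pre-aggregation pattern with pymongo (the collection layout and field names are assumptions, in the spirit of the manual's examples rather than a copy of them):

from datetime import datetime, timezone
from pymongo import MongoClient

# Hypothetical collection of per-day counters that the dashboard can read instantly.
coll = MongoClient("mongodb://localhost:27017")["metrics"]["daily_counts"]

def record_event(metric: str, when: datetime) -> None:
    """Increment the daily total and the per-hour bucket for one metric."""
    day = when.strftime("%Y-%m-%d")
    coll.update_one(
        {"metric": metric, "day": day},
        {"$inc": {"total": 1, f"hours.{when.hour}": 1}},
        upsert=True,
    )

record_event("signups", datetime.now(timezone.utc))

The dashboard then reads these small pre-computed documents instead of aggregating over the raw collection on every page load.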
What is the best practice to cache such data and make it immediately available?
Should I run a worker thread once a day and store the result somewhere?
How and when to pre-aggregate will depend on the source and interdependency of your metrics as well as your use case requirements. Since use cases vary wildly, the only general best practice that comes to mind is that a startup probably doesn't want to be investing too much developer time in building their own dashboard tool (unless that's a core part of the product offering).
There are plenty of dashboard frameworks and reporting/BI tools available in both open source and commercial flavours.
I am using Tableau Desktop 8.2 and Tableau Server 8.2 (licensed versions); the workbooks created in Tableau Desktop are successfully published to Tableau Server.
But when users want to see the views or workbooks, it takes a very long time to preview or open them.
The workbooks are built against an Amazon Redshift database with more than 5 million records.
Could somebody guide me on this? Why is it taking so long to preview or open the views even after they have been published to Tableau Server?
First question, are the views performant when opened using only Tableau Desktop? Get them working well on Desktop before introducing Server into the mix.
Then look at the logs in My Tableau Repository, which include query strings and timing info, to see if you can narrow down the cause. You can also try the Performance Recorder feature.
A typical problem is an overly expensive query just to display a dashboard. In that case, simplify: start with a simple high-level summary viz and then introduce complexity, testing the impact on performance at each step. If one viz is too slow, there are usually alternative approaches available.
Completely agree with Alex; I had a similar issue with HP Vertica. I had a lot of actions set on the dashboard. Since the database structure was final, I created a Tableau extract and used that extract in place of the live connection. Voila! That solved my problem, and the users are happy with the response time as well. Hope this helps you too.
Tableau provides two modes of data refresh:
Live: Tableau executes the underlying queries every time the dashboard is opened or refreshed. Apart from badly formulated queries, this is one of the reasons why your dashboard on Tableau Online might take forever to load.
Extract: The query is executed once, according to your specified schedule, and the same data is shown every time the dashboard is refreshed.
In extract mode, time is spent only when the extract is being refreshed. However, if the extract is not refreshed periodically, the same stale data will be shown on the dashboard. Thus, extracts are not recommended for representing live data.
You can toggle Live <-> Extract from the Data Source pane of Tableau Desktop (top right of the Data Source page).