In IBM Real-Time Insights, I have a device for which I am creating a schema with a Virtual Data Point that derives its value from one of the device's data points (pressure divided by 100). However, when trying to visualize the data on a dashboard it shows up as "Not available". Is there anything in addition to the schema definition that I need to do in order for a Virtual Data Point to work?
Virtual data points can currently only be used with rules. Dashboard widgets do not support displaying virtual data points.
The architecture we are currently using is as follows:
Private Web App Services hosted in the US region and the India region.
Each app sits behind its own App Gateway, and both gateways sit behind Front Door, which serves each request from the nearest App Gateway. However, both apps use the same Postgres, which is in the US region.
Our issue is that when we hit the API from the US, the response time is under 2 seconds, whereas from the India region it takes 70 seconds.
How can we reduce the latency?
Actually, the problem is that the APIs do write operations, because of which we cannot use a read replica.
There are a few things you can do:
1- Add a cache layer in both regions and, rather than querying the DB directly, check whether the data is available in the cache first; if it's not, get it from the DB and add it to the cache layer (see the sketch after this list).
2- Add a secondary database in the India region which will be read-only.
PS: You may have stale data with both approaches, so you should sync properly according to your requirements.
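For illustration, a minimal cache-aside sketch in Python, assuming a Redis cache in each region and psycopg2 against the single US-region Postgres; all names and queries here are illustrative, not from the question:

# Cache-aside: serve reads from the regional cache, falling back to the
# remote primary only on a miss. Assumed/illustrative names throughout.
import json
import redis
import psycopg2

cache = redis.Redis(host="localhost", port=6379)   # regional cache
db = psycopg2.connect("dbname=app user=app")       # US-region Postgres
CACHE_TTL = 60  # seconds; tune to how much staleness you can tolerate

def get_user(user_id):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)      # hit: no cross-region round trip
    with db.cursor() as cur:           # miss: one trip to the US database
        cur.execute("SELECT id, name FROM users WHERE id = %s", (user_id,))
        row = cur.fetchone()
    result = {"id": row[0], "name": row[1]}
    cache.setex(key, CACHE_TTL, json.dumps(result))
    return result

Writes still go to the US primary either way; the cache only takes the cross-region hop out of repeated reads.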
I have a board on my Watson IoT dashboard, which I use to monitor temperature in real time with a line graph.
I have a second dashboard under a different organisation, and I want to import the graph which is on the board in the first dashboard, without building the same graph again.
I've already tried that solution and, even if it were possible, it wouldn't be efficient. I mean, I already have the data in IoT; why send it from IoT to IoT?
So, can I display on, let's call it... dashboard2, the board with the temperature (or any other property) which is in dashboard1, without sending or duplicating data on dashboard2?
If so, how can I do that? I've been searching for almost a week, and I'm starting to doubt that something like this actually exists...
We have a mid-size analytics engine built on top of an Elasticsearch cluster.
We send data to our servers in the form of JSON, very similar to what Google Analytics might be doing. We push this entire data set into the ES cluster; as of now it amounts to ~60 GB per day (approx. 2 TB per month).
We have a data retention policy of a few months, let's say 6 months (as per the pricing plan).
We provide dynamic reports like the following (a query sketch appears after these examples):
all the users who are coming from the United States and are using the Chrome browser and are using the browser on an iPhone.
the sum of clicks on a particular button of all the users who are coming from a referrer matching the regex "http://www.google.com" and are based out of India and are using Desktop.
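For reference, a sketch of the first report as an Elasticsearch bool/filter query, using the elasticsearch Python client (7.x-style body argument); the index name is illustrative, and the field names are taken from the payload shown further down:

# Count all users from the United States using Chrome on an iPhone.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="events-*",   # illustrative index pattern
    body={
        "query": {
            "bool": {
                "filter": [
                    {"term": {"country": "United States"}},
                    {"term": {"browser": "Chrome"}},
                    {"term": {"device": "iPhone"}},
                ]
            }
        }
    },
)
print(resp["hits"]["total"])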
PROBLEM
It has worked pretty well for us until now, but we are facing a problem scaling, as we have already deployed hundreds of servers to handle this amount of data and show near-real-time analytics.
What I am looking for here is how I can optimise data storage and still show near-real-time slicing and dicing of the data. Imagine how Google Analytics or Mixpanel might be storing and showing data in real time.
I am open to any technology shift. Suggestions please. (Feature-wise, what we have is similar to GA or Mixpanel.)
Do you guys think storing this huge amount of data in some NoSQL store like MongoDB and running MapReduce on that data would work? But that might not be real time (we can accept a delay of 5-10 minutes in showing data). A sketch of that route follows below.
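For reference, the MongoDB route mentioned above could look like the following pymongo aggregation (Mongo's usual alternative to classic map-reduce), which sums clicks per country; the collection and field names are illustrative:

# Sum click events per country with an aggregation pipeline.
from pymongo import MongoClient

events = MongoClient()["analytics"]["events"]
pipeline = [
    {"$match": {"action": "click"}},
    {"$group": {"_id": "$country", "clicks": {"$sum": 1}}},
]
for row in events.aggregate(pipeline):
    print(row["_id"], row["clicks"])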
Tech Stack Used (as of now)
Apache/Nginx as web server + application code
Programming language (Ruby/PHP etc.)
Log collection/parsing via Logstash
Elasticsearch cluster to store and query data
SDK written in JavaScript which pushes events to our server (like GA)
We store an event payload which looks something like this:
{
  "query_params": [],
  "device_type": "Desktop",
  "browser_string": "Chrome 47.0.2526",
  "ip": "62.82.34.0",
  "screen_colors": "24",
  "os": "Mac OS X",
  "browser_version": "47.0.2526",
  "session": 1,
  "country_code": "ES",
  "document_encoding": "UTF-8",
  "city": "Palma De Mallorca",
  "tz": "Europe/Madrid",
  "uuid": "A37F2D3A4B99FF003132D662EFEEAFCA",
  "combination_goals_facet_term": "c2_g1",
  "ts": 1452015428,
  "hour_of_day": 17,
  "os_version": "10.11.2",
  "experiment": 465,
  "user_time": "2016-01-05T17:37:10.675000",
  "direct_traffic": false,
  "combination": "2",
  "search_traffic": false,
  "returning_visitor": false,
  "hit_time": "2016-01-05T17:37:08",
  "user_language": "es",
  "device": "Other",
  "active_goals": [1],
  "account": 196,
  "url": "http://someurl.com",
  "action": "click",
  "country": "Spain",
  "region": "Islas Baleares",
  "day_of_week": "Tuesday",
  "converted_goals": [],
  "social_traffic": false,
  "converted_goals_info": [],
  "referrer": "http://www.google.com",
  "browser": "Chrome",
  "ua": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36",
  "email_traffic": false
}
EDIT
"optimize data storage" means for every event we receive 70% data same in the json payload. However we keep on creating the new document in ES for event. I was hoping if somehow we stop storing the repeated keys of json and store only what changed in subsequent event payload. Thus optimizing storage space.
We are using SSDs on all our servers. What I am worried about is that what happens we talk about the scale of GA and similar amount of data. I doubt above mentioned Architecture or Tech will survive. Looking for suggestions for that sorta scale.
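To make the storage idea concrete, a minimal sketch of the delta approach in Python: keep the first payload of a session in full and, for each subsequent event, store only the keys whose values changed (payloads here are trimmed for illustration):

def payload_delta(previous, current):
    """Return only the keys of `current` whose values differ from `previous`."""
    return {k: v for k, v in current.items() if previous.get(k) != v}

first = {"browser": "Chrome", "country": "Spain", "action": "click"}
second = {"browser": "Chrome", "country": "Spain", "action": "scroll"}
print(payload_delta(first, second))  # {'action': 'scroll'}

Note that the full event then has to be reassembled at read time, which shifts cost from storage to queries.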
I think you are already using the stack best suited to this kind of use case. What I would suggest is working on fine-tuning the Elasticsearch optimizations, if you have not already.
Some suggestions could be:
Think of using SSDs instead of HDDs for the Elasticsearch cluster.
Think of fine-tuning parameters like "refresh_interval" (see the sketch after this list).
Use autoscaling via the cloud plus load balancers in order to handle requests properly.
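As an example of the refresh_interval tuning, a sketch using the elasticsearch Python client (7.x-style call; the index name is illustrative). The default refresh interval is 1s; a longer interval makes indexing cheaper at the cost of documents becoming searchable a little later:

# Relax the refresh interval on a write-heavy index.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
es.indices.put_settings(
    index="events-2016.01",
    body={"index": {"refresh_interval": "30s"}},
)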
Hope this helps.
I have successfully connected my IoT device to the Bluemix IoT platform.
I can see all the events nicely flowing into the dashboard.
I have now enabled the extension in Bluemix IoT to store all data in "Historical Data Storage" (refer to https://developer.ibm.com/recipes/tutorials/cloudant-nosql-db-as-historian-data-storage-for-ibm-watson-iot-parti/#r_step3).
I can see the data correctly being written to the database.
When I put a line graph on the dashboard in Bluemix IoT, it does show a graph, but only for the real-time data; it seems it's not using the historical data now stored in the database (refer to https://developer.ibm.com/recipes/tutorials/cloudant-nosql-db-as-historian-data-storage-for-ibm-watson-iot-partiii/).
After being in contact with IBM via Skype with screen sharing, we found the solution.
It turns out that I did not enter an event in the card's config; I only entered a property, and for some reason this is OK for real-time data but not for fetching the historical data out of the DB!
As soon as I entered my event (in my case it was 'status', but this should specifically match your MQTT event name), all worked OK!
Can you confirm, per the details in the recipe - https://developer.ibm.com/recipes/tutorials/cloudant-nosql-db-as-historian-data-storage-for-ibm-watson-iot-partiii/ - that your Window Size was configured, while creating the line chart card, to display the data for the time period over which it was gathered in the database?
For example, with "Data from Last 24 Hours" the window size is 24 hours, and we should see data from the historian only for the last 24 hours, not beyond that, whether the data is real-time or stored in the historian.
Be sure the window size configured for your chart encompasses the historical period for which you expect to see data.
We have a range of web applications here that allow users to download selected data from a number of databases and online services, mainly environmental information. We can track users visiting web pages using tools like Piwik or Google Analytics. We also want to track the amount of resource or data that they use, possibly also applying limits to record downloads.
If this were a single-DB system, we could track rows delivered within the DB. However, here we have an SOA with a range of sources and sinks. What I envisage is a service that can be messaged by other systems to register or track the amount of a resource used (a sketch of such a message follows below).
e.g. User Andrew was sent 125 MB of water quality data.
The central data metering service tracks usage messages from a variety of sources, produces reports, and where appropriate applies caps or billing limits.
This service might be expanded to include processing as well as data download.
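To make the idea concrete, a hypothetical usage message that a data service might POST to the metering service; the endpoint, field names, and payload shape are all illustrative, not an existing product's API:

# Report one delivery of 125 MB of water quality data to user Andrew.
import json
import urllib.request

usage_event = {
    "user": "andrew",                    # who consumed the resource
    "service": "water-quality",          # which source served it
    "amount_bytes": 125 * 1024 * 1024,   # how much data was delivered
    "timestamp": "2016-01-05T17:37:08Z",
}
req = urllib.request.Request(
    "https://metering.example.org/v1/usage",  # hypothetical endpoint
    data=json.dumps(usage_event).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)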
I would consider this a not unusual requirement, but I can't find much in the way of existing software for it - perhaps because I am not using the correct terminology.
So my questions:
What would you call this service - what keywords will lead me to existing systems?
What solutions already exist in this area - in particular FOSS or cloud based systems?
Could something like Google Analytics be persuaded to operate in this fashion?
It would be possible to do this with the Measurement Protocol from Google Universal Analytics, in conjunction with the User ID feature in Analytics and one or more custom dimensions.
The Measurement Protocol is a language-agnostic, vaguely REST-like protocol (insofar as you send a bunch of parameters to an endpoint) for sending tracking data to the Google servers.
User ID is a feature to recognize authenticated users across devices and multiple visits.
If the various parts of your setup send HTTP calls built to the Measurement Protocol and include the User ID to recognize the user, a value for a custom dimension for the file size (or rather a custom metric, if you want sums and averages), and maybe a custom dimension for the file name, you can send this to your Analytics account and build a custom report for downloads. A sketch follows below.
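A minimal sketch of such a Measurement Protocol hit in Python; the property ID, User ID, and the indexes of the custom metric/dimension are placeholders, and cm1/cd1 must be registered in the GA admin before they show up in reports:

# Send a download event with bytes delivered as custom metric 1.
import urllib.parse
import urllib.request

params = {
    "v": "1",                        # protocol version
    "tid": "UA-XXXXX-Y",             # property ID (placeholder)
    "uid": "andrew",                 # User ID of the authenticated user
    "t": "event",                    # hit type
    "ec": "download",                # event category
    "ea": "water-quality-data",      # event action
    "cm1": str(125 * 1024 * 1024),   # custom metric 1: file size in bytes
    "cd1": "wq_export.csv",          # custom dimension 1: file name (placeholder)
}
data = urllib.parse.urlencode(params).encode("utf-8")
urllib.request.urlopen("https://www.google-analytics.com/collect", data=data)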
Note that the User ID is an internal ID that is used to link together visits by the same user from multiple devices - it is not something that shows up in the reports, so it would not allow you to report on individual users in the Analytics interface (if you want that, you need to include another ID as a custom dimension, and you have to check against the Google TOS what kind of ID is allowed). Plus you'd need a dedicated data view in GA for sessions with a User ID, which will not show unauthenticated users.