Averaged Historical Data from Xively feed API

The Xively (Cosm) web interface issues the following request for averaged historical datapoints:
// For averaged historical datapoints
https://www.xively.com/feeds/<feedId>/datastreams/Humidity/graph.json?duration=21600seconds&interval=30&limit=1000&find_previous=true&function=average
I would like to fetch averaged historical data points using the Xively REST API (that is, if there are multiple samples within the interval I am requesting, return an averaged rollup as the representative point for that interval).
However, the following seems to return raw data points (it just picks one datapoint to represent each sample interval):
https://api.xively.com/v2/feeds/127181539.json?datastreams=TEMP&duration=1month&interval=21600&limit=200&function=average
So, my questions:
1) How can I get averaged data points like the Xively web interface does? What parameter is needed for the feed API call?
2) Does anyone know about the parameter interval_type? I have read the documentation here (https://xively.com/dev/docs/api/quick_reference/historical_data/) about 50 times now, but I still don't get it!
Update
function=sum as well as function=average works for the /datastreams/TEMP.json endpoint. Also, they are discrete by default.
function=average does not work with the /feeds/feed_id.json endpoint. Maybe a bug?

If you've got "function=average" (which you have) as a query parameter, then the points you get back should be bucketed to the interval you specified (21600 seconds / 6 hours). Each point represents the average value for that period.
It might be worth making this query against the datastreams endpoint though, e.g.
https://api.xively.com/v2/feeds/127181539/datastreams/TEMP.json?duration=1month&interval=21600&limit=200&function=average
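For example, a minimal sketch of that call from JavaScript (the API key value is a placeholder, and the response handling assumes the documented datapoints array with at/value fields):
// Sketch: fetch 6-hour averaged rollups for one datastream
const url = 'https://api.xively.com/v2/feeds/127181539/datastreams/TEMP.json' +
  '?duration=1month&interval=21600&limit=200&function=average';
fetch(url, { headers: { 'X-ApiKey': 'YOUR_API_KEY' } })
  .then(res => res.json())
  .then(body => {
    // each returned datapoint should be the average for its 21600-second bucket
    (body.datapoints || []).forEach(p => console.log(p.at, p.value));
  })
  .catch(console.error);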
Hope this helps!

Related

Athena/DDB to condense millions of data points for plotting them on a graph

I need to plot trend charts on the react app based on user inputs such as timestamps, devices, etc. I have related time series data in DynamoDB and S3 (which I can query using Athena).
Returning all those millions of data points for a graph seems unreasonable and is super laggy.
I guess one option is "binning", where I decide the number of bins based on how big the time range is and take averages of the readings in each bin. However, I'm concerned about how well it will show the drops and highs, which we need to show accurately.
Both Athena queries and DDB queries (due to the 1 MB limit) seem fairly slow so far.
Of course, the size of the response payload is another concern, as API Gateway and Lambda limit it to 10 MB and 6 MB respectively.
Any ideas?
I can't suggest anything smarter than "binning", but if you are concerned that the bucket interval might become too wide and performance might suffer, you can fix the interval and then create more than one table. For example, the interval can be 1 hour and you can have a new table for each week.
This is what we did when we had to deal with time series in DynamoDB. At some point, we decided to switch to Amazon Timestream.
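For illustration, here is a minimal sketch of that kind of fixed-interval binning applied to points you have already fetched (the { ts, value } shape and the bucket size are placeholders):
// Sketch: average raw points into fixed-width buckets before charting,
// keeping min/max per bucket so drops and spikes stay visible after downsampling
function binAverages(points, bucketMs) {
  const buckets = new Map();
  for (const { ts, value } of points) {
    const key = Math.floor(ts / bucketMs) * bucketMs;              // start of the bucket
    const b = buckets.get(key) || { sum: 0, count: 0, min: value, max: value };
    b.sum += value;
    b.count += 1;
    b.min = Math.min(b.min, value);
    b.max = Math.max(b.max, value);
    buckets.set(key, b);
  }
  return [...buckets.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([ts, b]) => ({ ts, avg: b.sum / b.count, min: b.min, max: b.max }));
}
Returning min and max per bucket alongside the average is one way to address the concern about showing the drops and highs accurately after downsampling.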

How to persist previous data point when time range doesn't include a data point

TL;DR:
Can I get Grafana to show me the previous data point, when the currently selected time period does not have a data point? I have an example which sounds ridiculous, but at least it's simple to understand: I send data every 1 minute, and I wish to zoom into the last 30 seconds, and still see data. You may ask "why not just zoom out to 2 minutes" but the reason is that other data is on the same graph that has updated more often, and I wish to compare with that data. Also, for the more lengthy reasons below.
If not, how can I achieve what I want to achieve, see below?
Context
For a few years, I have been monitoring the water level in three of our basement sumps (which have pumps installed) by sending this data from Node-RED to InfluxDB, then visualising the sump levels in Grafana. I have set up three waterproof ultrasonic distance sensors, each pointed down a pipe that is inserted vertically into each sump. The water fills the pipe and the distance sensor, connected to an Arduino, sends me the reading. The Arduino also has other sensors connected (temp / humidity) and deals with distance calibrations to calculate the percent full of each sump. All this data is sent to Node-RED. In total, I am sending 4 values per sump: distance measurement in mm, percent full, temp, humidity. So that's 12 fields. Data is sent every 2 seconds, because I wished to have a reasonably high resolution to see nice curves in graphs.
Also I decided to store all this data so that I could later troubleshoot issues (we have had sewage floods resulting in water not being able to be pumped away, etc...) and design some warning systems for these issues based on data.
Storing 12 values for every 2 seconds, over the course of a number of years, takes up a lot of space (8GB).
Nature of the data
Storing this resolution of data has also helped me be able to describe the nature of the data. I will do so here.
(1) Non-meaningful NOISE (see below) - the percent-full reading goes up and down by 1 or 2 percent every couple of seconds:
(2) Meaningful DRIFT (see below) - I don't mean sensor drift, I am referring to actual water levels changing slowly over time, e.g. over 1 day or 1 week. Perhaps condensation on the walls drips down into the sump, or water evaporates from the sump, and the value can waver by a few percent over the course of a day. Each sump has slightly different characteristics.
(3) Meaningful MONITORING DATA - during wet weather, depending on rainfall amount, the sumps fill up over the course of say 30 mins to 3 hours. Then the pumps run and the water level drops again, wavers a bit, then the sumps continue to fill up. If the rain stopped, you can see a lovely curve as the water fills in progressively more slowly (see the green line below):
Solution to downsample
I know Influx has its own downsampling possibilities; however, because of the nature of the data (which can hardly vary for 2 months, but when it does, I really need to capture it in detail), I don't think lowering the sample rate is a great idea.
I have some understanding of digital filters (e.g. low pass etc) but have never programmed one myself. So I have written a basic filter in javascript (a Node-RED function) to filter the data in realtime as follows: only send each reading when it has changed from the previous one by x amount. (And update the previous one, when that occurs.)
This has already vastly reduced the amount of data being stored, and I can vary x to filter out noise shown in my first graph above, at the expense of resolution when the pumps run. Even if I set the x value to 2, it still vastly reduces data over long periods of dry weather.
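A minimal sketch of that kind of dead-band filter as a Node-RED function node (the threshold value and the payload field are illustrative):
// Sketch: only forward a reading when it differs from the last forwarded value by >= threshold
const threshold = 2;                          // the "x" amount, in percent
const current = msg.payload;                  // assuming payload carries the percent-full reading
const last = context.get('lastSent');
if (last === undefined || Math.abs(current - last) >= threshold) {
    context.set('lastSent', current);
    return msg;                               // pass on to InfluxDB
}
return null;                                  // drop the unchanged reading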
So - onto my problem! Now data is not being logged to InfluxDB unless there is some meaningful change. Which means that when I zoom in to e.g. 15 minute timeframe of data, there is nothing to see.
Grafana does have the option of "fill (previous)" but this draws a line between points on the existing graph, rather than showing the previous data as if it hasn't changed since that point. Now my grafana dashboard looks a bit sad :(
One proposed solution is, in addition to sending "delta" data, send "summary" data, that is - send a full suite of data every 1 minute regardless of whether data changed or not. But then we get noise back again, and pointless storage.
Any other ideas?

How to do a distinct count of a metric using graphite datasource in grafana?

I have a metric that shows the state of a server. The values are integers and if the value is 0 (zero) then the server is stable, else it is unstable. And the graph we have is at a minute level. So, I want to show an aggregated value to know how many hours the server is unstable in the selected time range.
Let's say I select "Last 7 days" as the time duration... we should get X hours of server instability.
And one more thing: I have a line graph (time series graph) that shows the state of the server... but when I select "Last 24 hours or 48 hours" I get the graph at a minute level; when I increase the duration to a quarter I get the graph at every 5 minutes or something like that... I understand it's aggregating the values... but does anybody know how Grafana is doing the aggregation?
I have tried the "scaleToSeconds" and "consolidateBy" functions, and many more, to first get the count of non-zero-value minutes, but with no success.
Any help would be greatly appreciated.
Thanks in advance.
There are a few different ways to tackle this, there are 2 places that aggregation happens in this situation:
When you query for a time range longer than your raw retention interval and whisper returns aggregated data. The aggregation method used here is defined in your carbon aggregation configuration.
When Grafana sends a query to Graphite it passes maxDataPoints=<width of graph in pixels>, and Graphite will perform aggregation to return at most that many points (because you don't have enough pixels to render more points than that). The method used for this consolidation is controlled by the consolidateBy function.
It is possible for both of these to be used in the same query. If, for example, you have a panel that queries 3 days' worth of data, and you store 2 days at 1-minute and 7 days at 5-minute intervals in whisper, then you'd get 72 * 60 / 5 = 864 points from the 5-minute archive; but if your graph is only 500px wide, at runtime that would be consolidated down to 10-minute intervals and return 432 points.
So, if you want to always have access to the count then you can change your carbon configuration to use sum aggregation for those series (and remove the existing whisper files so new ones are created with the new aggregation config), and pass consolidateBy('sum') in your queries, and you'll always get the sum back for each interval.
That said, you can also address this at query time by multiplying the average back out to get a total (assuming that your whisper aggregation config is using average). The simplest way to do that will be to summarize the data with average into buckets that match the longest aggregation interval you'll be querying, then scale those values by that interval to calculate the total number of minutes. Finally, you'll want to use consolidateBy('sum') so that any runtime consolidation will work properly.
consolidateBy(scale(summarize(my.series, '10min', 'avg'), 60), 'sum')
With all of that said, you may want to consider reporting uptime in terms of percentages rather than raw minutes, in which case you can use the raw averages directly.
When you say the value is zero (0), the server is healthy - what other values are reported while the server is unhealthy/unstable? If you're only reporting zero (healthy) or one (unhealthy), for example, then you could use the sumSeries function to get a count across multiple servers.
Some more information is needed here about the types of values the server is reporting in order to give you a better answer.
Grafana does aggregate - or consolidate - data typically by using the average aggregation function. You can override this using the 'sum' aggregation in the consolidateBy function.
To get a running calculation over time, you would most likely have to use the summarize function (also with the sum aggregation) and define the time period, e.g. 1 hour, 1 day, 1 week, and so on. You could take this a step further by combining this with a time template variable so that as the period grows/shrinks, the summarize period will increase/decrease accordingly.
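For example (a sketch only; the series name and the $summarize_interval template variable are placeholders you would define yourself):
consolidateBy(summarize(my.server.state, '$summarize_interval', 'sum'), 'sum')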

Qlik Sense Capability API 10000 limit

We've reached the limit for hypercubes and need to extract more than 10000 data points (I use this term for lack of a better word to describe the individual cells the API sends; 10000 is the max when you multiply the width and height of your initial fetch) using the Capability API. Has anyone been able to get the next page for hypercubes? Note that our requirement is for mashups, not extensions.
We did a workaround, but it required us to break up our dataset and it takes a little longer.
It makes you think: since Qlik is a data analytics tool, there should be a way to get all of your data. In an era where we process millions if not billions of records, 10000 data points (not even records) is minuscule.
I should also volunteer that the app we are using this for is for stock analysis; users want to see trends and need information on individual points as tooltips. With the number of dimensions and measures we pass (7 per stock times about 20 stocks = 140), we are constrained to only about 70 days (10000/140).
we are using qliksenseserver 11.24.4
Qlik Sense November 2017 Patch 2
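For reference, paging past the initial fetch is possible at the engine-API level. A rough sketch, assuming you have an enigma.js-style generic object with a qHyperCubeDef (the width argument and helper name are illustrative):
// Sketch: page through a hypercube, keeping each request under the 10000-cell limit
async function fetchAllRows(obj, width) {                  // obj: engine-API generic object
  const layout = await obj.getLayout();
  const totalRows = layout.qHyperCube.qSize.qcy;
  const pageHeight = Math.floor(10000 / width);            // e.g. 10000 / 140 = 71 rows per page
  let rows = [];
  for (let top = 0; top < totalRows; top += pageHeight) {
    const pages = await obj.getHyperCubeData('/qHyperCubeDef', [{
      qTop: top,
      qLeft: 0,
      qWidth: width,
      qHeight: Math.min(pageHeight, totalRows - top)
    }]);
    rows = rows.concat(pages[0].qMatrix);
  }
  return rows;
}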

Why isn't path showing on map in activity posted to Google Fit using REST API?

I am using Google Fit REST API (via Google Java Client Library) to post an activity into Google Fit.
In summary what I am doing is creating three DataSets covering the given time period:
"com.google.location.sample" - Location
"com.google.step_count.delta" - Steps
"com.google.calories.expended" - Calories
... then creating a Session, and finally a DataSet with a single Activity Segment (in this case all the time is walking).
This basically all seems to work - looking in http://fit.google.com I can see the activity, with the correct time, location, duration, steps and calories. The problem is with the map... all it shows is a shaded circle over the whole area of the walk - it doesn't show the track/path that I included in the location DataSet.
EDIT... Here is an example of what it looks like (in web UI).
Why would this not be showing up correctly, when all of the rest of the activity shows up perfectly?
These are some of my suspicions
My data does not have either altitude or accuracy - which are two of the fields needed by "com.google.location.sample". So I set altitude to 0.0 (metres), and set accuracy to 5.0 (metres). I particularly wonder if Google is reacting badly to me setting the altitude to 0.0 for each point?
My location DataSet has, say, 100 DataPoints in it, whereas my steps and calories DataSets only have one DataPoint each - i.e. I only have total steps, and total calories, for the walk. So there's an inconsistency (the earliest start and latest end dates are the same for each data set).
Can anybody give any guidance about why this is happening please?
I think this may be due to conflicting data points, as stated here. Although this is for the Android API, I think it holds true when using the REST API too.
Each DataPoint in your app's DataSet must have a startTime and an endTime that defines a unique interval within that DataSet, with no overlap between DataPoint instances. If your app attempts to insert a new DataPoint that conflicts with an existing DataPoint instance, the new DataPoint is discarded. To insert a new DataPoint that may overlap existing data points, use the HistoryApi.updateData method described in Update data.
You mentioned that the dates are the same across your data points, so the later ones override the others and they end up treated as only one.
As for your com.google.location.sample data type fields, I think it's better to leave them as they are. Try not to put in static values for altitude and accuracy.
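To illustrate the non-overlap point, here is a rough sketch of how the location dataset body might be built so every DataPoint gets its own distinct interval (the helper name and sample shape are placeholders; samples are assumed sorted by time):
// Sketch: one DataPoint per GPS sample, each with a unique, non-overlapping nanosecond interval
function buildLocationDataset(dataSourceId, samples) {      // samples: [{ timeMillis, lat, lng }, ...]
  const points = samples.map(s => ({
    dataTypeName: 'com.google.location.sample',
    startTimeNanos: s.timeMillis + '000000',                // ms -> ns as a string; instant sample, start == end
    endTimeNanos: s.timeMillis + '000000',
    value: [
      { fpVal: s.lat },    // latitude (degrees)
      { fpVal: s.lng },    // longitude (degrees)
      { fpVal: 5.0 },      // accuracy (metres) - placeholder
      { fpVal: 0.0 }       // altitude (metres) - placeholder, see the caveat above
    ]
  }));
  return {
    dataSourceId,
    minStartTimeNs: points[0].startTimeNanos,
    maxEndTimeNs: points[points.length - 1].endTimeNanos,
    point: points
  };
}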