Qlik Sense Capability API 10,000 limit

We've hit the limit for hypercubes and need to extract more than 10,000 data points using the Capability API (I use "data points" for lack of a better word for the individual cells the API sends; 10,000 is the maximum when you multiply the width and height of your initial fetch). Has anyone been able to get the next page for hypercubes? Note that our requirement is for mashups, not extensions.
We did a workaround, but it required us to break up our dataset and it takes a little longer.
It makes you think: since Qlik is a data analytics tool, there should be a way to get all of your data. In an era where we process millions if not billions of records, 10,000 data points (not even records) is minuscule.
I should also mention that the app we are using this for does stock analysis; the users want to see trends and need information on individual points as tooltips. With the number of dimensions and measures we pass (7 per stock times about 20 stocks = 140 columns), we are constrained to only about 70 days (10,000 / 140 ≈ 71 rows).
We are using Qlik Sense Server 11.24.4 (Qlik Sense November 2017 Patch 2).
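For context on the paging question: the engine behind the Capability API can serve additional hypercube pages after the initial fetch, so one common approach is to request slices of at most 10,000 cells in a loop. A minimal sketch of that idea (TypeScript), assuming app comes from qlik.openApp() in a mashup and dims/measures are the 140 columns described above; the names here are illustrative, not confirmed against this exact setup:

// Page a hypercube past the 10,000-cell-per-fetch cap.
const NUM_COLS = 140;
const PAGE_HEIGHT = Math.floor(10000 / NUM_COLS); // about 71 rows per page

async function fetchAllRows(app: any, dims: any[], measures: any[]) {
  // createCube without a callback resolves to a generic object model.
  const model = await app.createCube({
    qDimensions: dims,
    qMeasures: measures,
    qInitialDataFetch: [{ qTop: 0, qLeft: 0, qWidth: NUM_COLS, qHeight: PAGE_HEIGHT }],
  });

  const layout = await model.getLayout();
  const totalRows = layout.qHyperCube.qSize.qcy;
  let rows = layout.qHyperCube.qDataPages[0].qMatrix;

  // Ask the engine for the remaining pages, one slice of <= 10,000 cells at a time.
  for (let top = PAGE_HEIGHT; top < totalRows; top += PAGE_HEIGHT) {
    const pages = await model.getHyperCubeData('/qHyperCubeDef', [
      { qTop: top, qLeft: 0, qWidth: NUM_COLS, qHeight: Math.min(PAGE_HEIGHT, totalRows - top) },
    ]);
    rows = rows.concat(pages[0].qMatrix);
  }
  return rows; // the full result set, assembled page by page
}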

Related

Athena/DDB to condense millions of data points for plotting them on a graph

I need to plot trend charts in the React app based on user inputs such as timestamps, devices, etc. I have related time-series data in DynamoDB and S3 (which I can query using Athena).
Returning all those millions of data points for a graph seems unreasonable and is super laggy.
I guess one option is "binning", where I decide the number of bins based on how big the time range is and take the average of the readings in each bin. However, I'm concerned about how well that will show the drops and highs; we need to show them accurately.
Athena queries and DDB queries (the latter due to the 1 MB page limit) both seem fairly slow so far.
Of course, the size of the response payload is another concern, as API Gateway and Lambda limit it to 10 MB and 6 MB respectively.
Any ideas?
I can't suggest anything smarter than "binning", but if you are concerned that the bucket interval might become too wide and performance might suffer, you can fix the interval and then create more than one table. For example, the interval can be 1 hour and you can have a new table for each week.
This is what we did when we had to deal with time series in DynamoDB. At some point, we decided to switch to Amazon Timestream.
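For what it's worth, here is a rough sketch (TypeScript, AWS SDK v3) of pushing the binning down into Athena with a fixed interval, so only the aggregates have to cross the API Gateway and Lambda payload limits. The table, column, and bucket choices are assumptions for illustration, and keeping min/max alongside the average is one way to preserve the drops and highs the question mentions:

// Bin readings into fixed 1-hour buckets in Athena; names are illustrative.
import { AthenaClient, StartQueryExecutionCommand } from "@aws-sdk/client-athena";

const athena = new AthenaClient({ region: "us-east-1" });

const binnedQuery = `
  SELECT floor(ts / 3600) * 3600 AS bucket_start,   -- ts assumed to be epoch seconds
         avg(reading)            AS avg_reading,
         min(reading)            AS min_reading,    -- keep extremes visible after downsampling
         max(reading)            AS max_reading
  FROM   telemetry.sensor_readings
  WHERE  device_id = 'device-123'
    AND  ts BETWEEN 1700000000 AND 1702592000
  GROUP  BY 1
  ORDER  BY 1
`;

await athena.send(
  new StartQueryExecutionCommand({
    QueryString: binnedQuery,
    ResultConfiguration: { OutputLocation: "s3://my-athena-results/" },
  })
);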

What is the best way to represent a chart of distribution of time intervals in Datadog?

I have a server that processes packets from different devices. Devices can report in different intervals.
I would like to make a chart showing the distribution of intervals by the count of devices (how many devices are reporting within 5 sec/10 sec/60 sec ...)
Intervals for each device can vary.
Right now I'm sending a Set metric using deviceId, with tags that represent the interval (5 sec, 10 sec, 30 sec, and so on), but I'm not sure that this is correct.
What is the best way to implement this?
Set is almost never the right custom metric type to use. It sends a count of the number of unique items per given tag, and the details of the underlying items are stripped from the metric, meaning that from one time slice to the next you have no idea of the actual number of unique items over time.
For example
3:00:07-3:00:32 | 5 second bucket:[device1,device4,device7] -> 3 values
3:00:32-3:00:47 | 5 second bucket:[device1,device3] -> 2 values
Your time series sent to Datadog will report 3, and then 2. But because the underlying device info is stripped, you have no idea how to combine that 3 and 2 if you zoom out in time and roll the numbers up to show 1 data point per minute. It could be any number from 3 to 5, but the Datadog backend has no idea (even though we know that across those two windows there were 4 unique devices in total).
Plus, even if it were somehow accurate, you can't create a useful alert or notify anyone from it, because you won't know which device is having issues if you see a spike of devices in the 60-second bucket.
So let's go through other metric options.
The only metric types that are usually worth using are distributions, gauges, or counts.
A gauge metric is just a measurement of a value at a point in time. It's usually good for things like CPU or memory usage of a computer, or the temperature in a room: numbers for which it's impossible to collect every data point, so you just take measurements every 10 seconds, or every minute, or however often you need to in order to get an idea of the behavior.
A count metric is more exact: it's the number of things that happened. It's usually good for the number of requests to a server or the number of files processed. Even something like the number of bytes flowing through a system fits, although most people usually treat that as a gauge.
Distributions are good for when you want something like a gauge metric but need detailed measurements for every single event that happens. For example, a web server is handling hundreds of requests per second and we need to know the latency metrics of that server. It's not possible to send a latency metric for every request as a gauge: gauges have a built-in limit of 1 data point per second (in Datadog), and anything more sent within a 1-second interval gets dropped. But we need stats for every request, so a distribution will summarize the data; it keeps a running count, min, max, average, and optionally several percentiles (p50, p75, p99).
I haven't seen many good use cases for metric types outside of those 3. For your scenario, it seems like you would want to be sending a distribution metric for that device interval. So device 1 sends a value of 10.14 and device 3 sends a value of 2.3 and so on.
Then you can use a distribution widget in a dashboard to show the number of devices for each interval bucket.
Of course make sure you tag each metric by the device that is generating the metric.
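As a concrete illustration of that last suggestion, here is a hedged sketch (TypeScript) using the hot-shots DogStatsD client; any client with distribution support would work, and the metric and tag names are made up:

// Report each device's observed reporting interval as a distribution metric.
import StatsD from "hot-shots";

const dogstatsd = new StatsD();

// Called whenever a packet arrives; intervalSeconds is the time elapsed since
// the previous packet from the same device.
function recordReportingInterval(deviceId: string, intervalSeconds: number): void {
  dogstatsd.distribution("devices.reporting_interval", intervalSeconds, [
    `device:${deviceId}`,
  ]);
}

// e.g. device 1 reports 10.14 s between packets, device 3 reports 2.3 s
recordReportingInterval("device1", 10.14);
recordReportingInterval("device3", 2.3);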

How to do a distinct count of a metric using graphite datasource in grafana?

I have a metric that shows the state of a server. The values are integers; if the value is 0 (zero) the server is stable, otherwise it is unstable. The graph we have is at a minute level. So I want to show an aggregated value indicating how many hours the server was unstable in the selected time range.
Let's say I select "Last 7 days" as the time duration; we should then get X hours of server instability.
And one more thing: I have a line graph (time series graph) that shows the state of the server. When I select "Last 24 hours" or "Last 48 hours" I get the graph at a minute level, but when I increase the duration to a quarter I get a point every 5 minutes or so. I understand it's aggregating the values, but does anybody know how Grafana does this aggregation?
I have tried the scaleToSeconds and consolidateBy functions, and many more, to first get the count of non-zero-value minutes, but with no success.
Any help would be greatly appreciated.
Thanks in advance.
There are a few different ways to tackle this; there are two places where aggregation happens in this situation:
When you query for a time range longer than your raw retention interval and whisper returns aggregated data. The aggregation method used here is defined in your carbon aggregation configuration.
When Grafana sends a query to Graphite it passes maxDataPoints=<width of graph in pixels>, and Graphite will perform aggregation to return at most that many points (because you don't have enough pixels to render more points than that). The method used for this consolidation is controlled by the consolidateBy function.
It is possible for both of these to apply to the same query. If, for example, you have a panel that queries 3 days' worth of data and you store 2 days at 1-minute and 7 days at 5-minute intervals in Whisper, then you'd have 72 * 60 / 5 = 864 points from the 5-minute archive. But if your graph is only 500px wide, at runtime that would be consolidated down to 10-minute intervals and return 432 points.
So, if you want to always have access to the count then you can change your carbon configuration to use sum aggregation for those series (and remove the existing whisper files so new ones are created with the new aggregation config), and pass consolidateBy('sum') in your queries, and you'll always get the sum back for each interval.
That said, you can also address this at query time by multiplying the average back out to get a total (assuming that your whisper aggregation config is using average). The simplest way to do that will be to summarize the data with average into buckets that match the longest aggregation interval you'll be querying, then scale those values by that interval to calculate the total number of minutes. Finally, you'll want to use consolidateBy('sum') so that any runtime consolidation will work properly.
consolidateBy(scale(summarize(my.series, '10min', 'avg'), 60), 'sum')
With all of that said, you may want to consider reporting uptime in terms of percentages rather than raw minutes, in which case you can use the raw averages directly.
You say that when the value is zero (0) the server is healthy; what other values are reported while the server is unhealthy/unstable? If you're only reporting zero (healthy) or one (unhealthy), for example, then you could use the sumSeries function to get a count across multiple servers.
Some more information is needed here about the types of values the server is reporting in order to give you a better answer.
Grafana does aggregate - or consolidate - data typically by using the average aggregation function. You can override this using the 'sum' aggregation in the consolidateBy function.
To get a running calculation over time, you would most likely have to use the summarize function (also with the sum aggregation) and define the time period, e.g. 1 hour, 1 day, 1 week, and so on. You could take this a step further by combining this with a time template variable so that as the period grows/shrinks, the summarize period will increase/decrease accordingly.
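Putting those two answers together, a query against Graphite's render API might look roughly like the following sketch (TypeScript). It assumes the series holds a 0/1 value per minute, and the host and series name (server.state.unstable) are placeholders:

// Sum the 0/1 samples into hourly buckets, and keep runtime consolidation as 'sum'.
const target = "consolidateBy(summarize(server.state.unstable, '1h', 'sum'), 'sum')";

const url =
  "http://graphite.example.com/render?" +
  new URLSearchParams({ target, from: "-7d", format: "json" }).toString();

const response = await fetch(url);
const series = await response.json(); // one value per hour: minutes of instability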

Smartsheet Sheet data limits

What are the current limits on the data contained in a Smartsheet sheet, and what are they based on?
I've found some links talking about 5,000 records and 500 columns, but I've hit the limit with a sheet containing only 1,626 rows and 123 columns.
What are the exact specifications, and is there any way to override the settings using the API?
There are maximums on the number of rows, columns, and cells in a sheet, and there is no method in the Smartsheet API to override these limits. The Smartsheet Help Center article on sheet limits currently states the maximums as the following:
5,000 Rows
200 Columns
200,000 Cells
It is possible to reach one of these maximums before reaching another: 5,000 rows at 200 columns would be 1,000,000 cells, well over the 200,000-cell cap. Once one of these maximums is reached there may be errors, and the amount of data in the sheet should be decreased.
These aren't absolute limits, either. Depending on the use of other Smartsheet features, like formulas, cell linking, and conditional formatting, the amount of data that can actually be stored in a sheet may be lower. There are a lot of variables that affect the performance of a sheet and the amount of data it can hold, so these limits aren't exact.
This can sometimes require a process of elimination to see what can be stored in your sheet. It is best to divide up the data into logical groupings as best you can to help keep the sheets working as you need.
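As a rough illustration of planning around those caps, the arithmetic can be wrapped in a small helper (TypeScript); the numbers come from the limits quoted above and everything else is illustrative:

// How many rows fit per sheet before the 5,000-row or 200,000-cell cap is hit,
// and how many sheets a dataset of a given size would need.
const MAX_ROWS = 5000;
const MAX_CELLS = 200000;

function maxRowsPerSheet(columnCount: number): number {
  return Math.min(MAX_ROWS, Math.floor(MAX_CELLS / columnCount));
}

function sheetsNeeded(totalRows: number, columnCount: number): number {
  return Math.ceil(totalRows / maxRowsPerSheet(columnCount));
}

// The asker's 123-column sheet tops out around 1,626 rows (1,626 * 123 = 199,998 cells),
// which matches where they hit the wall.
console.log(maxRowsPerSheet(123));      // 1626
console.log(sheetsNeeded(10000, 123));  // 7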

Averaged Historical Data from Xively feed API

The Xively (Cosm) web interface issues the following request for averaged historical datapoints:
// For averaged historical datapoints
https://www.xively.com/feeds/<feedId>/datastreams/Humidity/graph.json&duration=21600seconds&interval=30&limit=1000&find_previous=true&function=average
I would like to fetch averaged historical data points using the Xively REST API (that is, if there are multiple samples within the interval I'm asking for, return an averaged rollup as the representative point for that interval).
However, the following seems to return raw data points (it just picks one datapoint to represent each sample interval):
https://api.xively.com/v2/feeds/127181539.json?datastreams=TEMP&duration=1month&interval=21600&limit=200&function=average
So, my questions:
1) How can I get averaged data points like the Xively web interface does? What parameter is needed for the feed API call?
2) Does anyone know about the parameter interval_type? I have read what is here (https://xively.com/dev/docs/api/quick_reference/historical_data/) about 50 times now but I still don't get it!
Update
function=sum as well as function=average works for the /datastreams/TEMP.json endpoint. Also, the results are discrete by default. function=average does not work with the /feeds/feed_id.json endpoint. Maybe a bug?
If you've got "function=average" (which you have) as a query parameter, then the points you get back should be bucketed to the interval you specified (21600 seconds / 6 hours). Each point represents the average value for that period.
It might be worth making this query against the datastreams endpoint though, e.g.
https://api.xively.com/v2/feeds/127181539/datastreams/TEMP.json?duration=1month&interval=21600&limit=200&function=average
Hope this helps!
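For completeness, the datastreams request from that answer as a scripted call might look roughly like this (TypeScript); the X-ApiKey value is a placeholder:

// Request averaged, bucketed history from the datastreams endpoint.
const url =
  "https://api.xively.com/v2/feeds/127181539/datastreams/TEMP.json" +
  "?duration=1month&interval=21600&limit=200&function=average";

const response = await fetch(url, {
  headers: { "X-ApiKey": "YOUR_API_KEY" },
});
const history = await response.json();
// history.datapoints should then contain one averaged value per 6-hour interval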