Averaging across a fixed calculation - tableau-api

I am building an overview graph that gives insight into session length in relation to churn.
I am looking to build a graph depicting the average time spent in an app on a per-week basis, relative to the signup date of all users.
I think I am almost there. I was able to create a graph that shows exactly that, except that it shows a summed-up total across all users rather than an average.
If I change the row aggregation in my graph from SUM to AVG, the problem is that it averages only over the users who were active on that day. Instead, I want inactive users to be counted as 0, which would lower the average (reflecting the churn aspect of the graph).
days since signup calculation: DATE(([event_timestamp] - [sign_up_date]) /(1000 * 1000 * 60 * 60 * 24))
session length per user: { FIXED [user_pseudo_id], [days since signup] : SUM([engagement_time_msec])}
I expect a peak of roughly 17k ms rather than 900M (an average rather than a sum).
[Image: my current graph]
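A minimal Python sketch of the arithmetic involved, under stated assumptions. The key point is that each day's total is divided by the size of the whole signup cohort rather than by the number of users active that day, which is what makes an inactive user count as 0. It assumes `sign_up_date` is an epoch timestamp in microseconds like `event_timestamp` (the divisor in the days-since-signup calculation suggests this, but it is an assumption), and all function names are hypothetical. In Tableau terms this presumably corresponds to dividing SUM([engagement_time_msec]) by a table-wide distinct count such as {FIXED : COUNTD([user_pseudo_id])}; that translation is untested.

```python
from collections import defaultdict

# Same divisor as the Tableau calculation: microseconds in one day.
MICROS_PER_DAY = 1000 * 1000 * 60 * 60 * 24


def days_since_signup(event_us, signup_us):
    # Whole days between signup and event; assumes both are epoch microseconds.
    return (event_us - signup_us) // MICROS_PER_DAY


def avg_engagement_per_day(events, signups):
    """Average engagement per day since signup, counting inactive users as 0.

    events:  iterable of (user_id, event_timestamp_us, engagement_ms)
    signups: dict user_id -> signup timestamp in microseconds (the whole cohort)
    """
    totals = defaultdict(int)
    for user, ts, ms in events:
        totals[days_since_signup(ts, signups[user])] += ms
    cohort_size = len(signups)
    # Dividing by the whole cohort (not just users active that day) is what
    # makes a user without events on a given day contribute 0.
    return {day: total / cohort_size for day, total in sorted(totals.items())}


def avg_over_active_users(events, signups):
    """The behaviour described in the question: average over active users only."""
    per_user_day = defaultdict(int)
    for user, ts, ms in events:
        per_user_day[(user, days_since_signup(ts, signups[user]))] += ms
    sums, counts = defaultdict(int), defaultdict(int)
    for (_user, day), ms in per_user_day.items():
        sums[day] += ms
        counts[day] += 1
    return {day: sums[day] / counts[day] for day in sorted(sums)}
```

With three users where only two are active on day 0 and one on day 1, the active-only average stays flat while the cohort average drops, which is the churn effect the question is after. Note that a day on which nobody is active does not appear in the result at all; in a chart it would need to be shown as 0.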

Related

Graph of utilization

My problem is as follows: I would like to create a graph of the percentage utilization of boxes over 24 hours. However, the box.utilization() function is cumulative, so I tried to solve the problem by creating a dataset that collects the values every hour, plus an event that resets the utilization so that each hour is not affected by the previous hour's utilization.
(I have attached a picture of the graph I created.)
Is there a more efficient way?
I have faced the same issue before. Here is how I handled it:
Instead of cumulative utilization, I calculate the maximum hourly utilization. That is, I record the number of seized resources every minute, which gives an array of 60 elements per hour. Then I divide the maximum value in that array by the total number of available resources. An example:
I have 100 machines.
During an hour, a maximum of 60 of them were busy.
60 / 100 = 60% maximum utilization during that hour.
Then I plot these for each hour.
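The per-hour calculation above can be sketched in Python as follows. This is only an illustration of the arithmetic, not simulation-tool code: it assumes the per-minute counts of seized resources have already been collected into a plain list, and `max_hourly_utilization` is a hypothetical name.

```python
def max_hourly_utilization(busy_per_minute, capacity):
    """Peak utilization for each full hour.

    busy_per_minute: counts of seized resources, sampled once per minute
    capacity: total number of resources in the pool
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    result = []
    # Step through the samples 60 at a time (one hour each);
    # a trailing partial hour is ignored.
    for start in range(0, len(busy_per_minute) - 59, 60):
        hour = busy_per_minute[start:start + 60]
        result.append(max(hour) / capacity)
    return result
```

For the example in the answer (100 machines, at most 60 busy during the hour), the function returns `[0.6]`, i.e. 60%. Each value can then be plotted against its hour index.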

InfluxDB - ignore partial intervals in GROUP BY

I feel this is a problem that all users of InfluxDB/Grafana would encounter. Any time I create a graph that shows aggregations by a time interval, the most recent and the oldest intervals are cut short, so the ends of the graph show incorrect values.
For example, I have data coming in every 10 seconds, so I should get 360 values per hour. I wanted to create a graph showing the number of data points that arrive per hour, so I have the query below, which counts by hour, and I run it over a 24-hour period. The problem is that the most recent interval is almost always less than 360 because it is not complete, and the oldest interval is usually cut off, so it too shows a value that is too low.
This is an issue for pretty much any graph I create that is grouped by a time interval. Is there a way to just leave out incomplete intervals? I'm happy with a solution in either InfluxDB or Grafana.
SELECT count("wifiStrength") FROM "detailed_data"."water" WHERE $timeFilter GROUP BY time(1h) fill(null)
For anyone who is curious, the data is from a water meter and logs water usage.
Use smarter time ranges in Grafana so that only full hours are selected. See the Grafana documentation on time ranges; the /h rounding suffix is the important part here.
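The idea behind rounding the time range can be illustrated outside Grafana with a small Python sketch: trim an arbitrary range so that GROUP BY time(1h) only ever sees whole hours. It assumes naive datetimes in UTC, since as far as I know InfluxDB aligns 1h buckets to UTC epoch boundaries by default; `full_hour_range` is a hypothetical helper, not part of either tool.

```python
from datetime import datetime, timedelta


def full_hour_range(start, end):
    """Shrink [start, end) so that it covers only whole hours."""
    floor_start = start.replace(minute=0, second=0, microsecond=0)
    # Round the start *up*, unless it is already on an hour boundary.
    if floor_start == start:
        aligned_start = floor_start
    else:
        aligned_start = floor_start + timedelta(hours=1)
    # Round the end *down* to the last hour boundary.
    aligned_end = end.replace(minute=0, second=0, microsecond=0)
    if aligned_end <= aligned_start:
        raise ValueError("range does not contain a full hour")
    return aligned_start, aligned_end
```

For a "last 24 hours" range that starts at 10:17, this yields 11:00 up to 10:00 the next day: 23 complete hourly buckets, each of which should contain 3600 / 10 = 360 points at the stated 10-second reporting interval.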

How to do a distinct count of a metric using graphite datasource in grafana?

I have a metric that shows the state of a server. The values are integers: if the value is 0 (zero), the server is stable; otherwise it is unstable. The graph is at minute resolution. I want to show an aggregated value that tells me for how many hours the server was unstable in the selected time range.
Let's say I select "Last 7 days" as the time range; I want to get X hours of server instability.
One more thing: I have a line graph (time series graph) that shows the state of the server. When I select "Last 24 hours" or "Last 48 hours", I get the graph at minute resolution, but when I increase the range to a quarter, I get a point every 5 minutes or so. I understand the values are being aggregated, but does anybody know how Grafana does the aggregation?
I have tried the scaleToSeconds and consolidateBy functions, among many others, to first get the count of minutes with a non-zero value, but without success.
Any help would be greatly appreciated.
Thanks in advance.
There are a few different ways to tackle this. Aggregation happens in two places in this situation:
When you query for a time range longer than your raw retention interval and whisper returns aggregated data. The aggregation method used here is defined in your carbon aggregation configuration.
When Grafana sends a query to Graphite it passes maxDataPoints=<width of graph in pixels>, and Graphite will perform aggregation to return at most that many points (because you don't have enough pixels to render more points than that). The method used for this consolidation is controlled by the consolidateBy function.
Both of these can apply to the same query. For example, if you have a panel that queries 3 days' worth of data, and whisper stores 2 days at 1-minute and 7 days at 5-minute intervals, you'd get 72 * 60 / 5 = 864 points from the 5-minute archive. If your graph is only 500px wide, however, at runtime that would be consolidated down to 10-minute intervals, returning 432 points.
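The arithmetic in the example above can be reproduced with a short Python sketch. It assumes Graphite picks the number of raw values per output point as the ceiling of points / maxDataPoints, which is my understanding of graphite-web's behaviour; treat it as an approximation (details such as bucket alignment are not modelled), and the function name is hypothetical.

```python
import math


def runtime_consolidation(range_hours, archive_step_min, max_data_points):
    """Return (effective step in minutes, number of points returned)."""
    # Points available from the archive that covers the whole range.
    points = range_hours * 60 // archive_step_min
    # Raw values merged into each output point (never fewer than 1).
    values_per_point = max(1, math.ceil(points / max_data_points))
    return values_per_point * archive_step_min, math.ceil(points / values_per_point)
```

For 3 days (72 hours) from the 5-minute archive on a 500px-wide panel, this gives a 10-minute step and 432 points, matching the numbers in the answer; with a 1000px-wide panel no runtime consolidation is needed and all 864 points come back.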
So, if you always want access to the count, you can change your carbon configuration to use sum aggregation for those series (and remove the existing whisper files so that new ones are created with the new aggregation config), and pass consolidateBy('sum') in your queries; you'll then always get the sum back for each interval.
That said, you can also address this at query time by multiplying the average back out to get a total (assuming your whisper aggregation config uses average). The simplest way to do that is to summarize the data with average into buckets that match the longest aggregation interval you'll be querying, then scale those values by that interval (in minutes) to calculate the total number of minutes. Finally, use consolidateBy('sum') so that any runtime consolidation works properly.
consolidateBy(scale(summarize(my.series, '10min', 'avg'), 10), 'sum')
With all of that said, you may want to consider reporting uptime as a percentage rather than in raw minutes, in which case you can use the raw averages directly.
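A minimal Python sketch of the "average, then scale by the bucket length" idea, plus the percentage alternative. It assumes one sample per minute and, since the question says any non-zero value means unstable, maps non-zero states to 1 before averaging (that mapping is an assumption; with raw values such as 2 or 3 the scaled average would not be a minute count). Scaling by 10 for a 10-minute bucket is the number of 1-minute samples per bucket. Function names are hypothetical and this is not Graphite code.

```python
def unstable_minutes_per_bucket(per_minute_state, bucket_minutes=10):
    """Average each full bucket, then scale by the bucket length in minutes."""
    # Assumption: any non-zero state means "unstable", so map it to 1.
    flags = [0 if v == 0 else 1 for v in per_minute_state]
    totals = []
    # Only full buckets are used; a trailing partial bucket is dropped.
    for start in range(0, len(flags) - bucket_minutes + 1, bucket_minutes):
        bucket = flags[start:start + bucket_minutes]
        avg = sum(bucket) / bucket_minutes  # like summarize(..., 'avg')
        # like scale(..., bucket_minutes); round() only removes float noise
        totals.append(round(avg * bucket_minutes))
    return totals


def unstable_percentage(per_minute_state):
    """Share of unstable minutes, i.e. the raw average expressed in percent."""
    flags = [0 if v == 0 else 1 for v in per_minute_state]
    return 100 * sum(flags) / len(flags)
```

Summing the per-bucket totals (which is what consolidateBy('sum') preserves when buckets are merged for display) gives the total number of unstable minutes; dividing by 60 gives hours.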
You say a value of zero (0) means the server is healthy, but what other values are reported while the server is unhealthy or unstable? If you're only reporting zero (healthy) or one (unhealthy), for example, you could use the sumSeries function to get a count across multiple servers.
Some more information is needed here about the types of values the server is reporting in order to give you a better answer.
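To make the sumSeries suggestion concrete: with 0/1 values, a pointwise sum across servers is the number of unhealthy servers at each timestamp. The Python sketch below only mimics that pointwise sum under the assumption that all series are equally long and contain no nulls; Graphite's own handling of missing points is not reproduced, and `sum_series` is a hypothetical name.

```python
def sum_series(*series):
    """Pointwise sum of equally long series (null handling not modelled)."""
    if len({len(s) for s in series}) > 1:
        raise ValueError("all series must have the same length")
    return [sum(values) for values in zip(*series)]
```

For example, two servers reporting `[0, 1, 1, 0]` and `[0, 0, 1, 1]` give `[0, 1, 2, 1]`: one unhealthy server in the second and fourth minute, both in the third.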
Data shown in Grafana is typically aggregated, or consolidated, using the average function. You can override this with the 'sum' aggregation in the consolidateBy function.
To get totals over time, you would most likely have to use the summarize function (also with sum aggregation) and define the time period, e.g. 1 hour, 1 day, 1 week, and so on. You could take this a step further by combining it with a time template variable, so that the summarize period grows or shrinks along with the selected range.
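A rough Python model of what summarize with sum aggregation does: points are grouped into fixed-length periods aligned to multiples of the period (so, for 1-hour buckets, 22:32 lands in the 22:00 bucket) and each bucket is summed. This assumes Unix timestamps in seconds and skips null values; it is a sketch of the idea, not Graphite's implementation, and the name is hypothetical.

```python
from collections import defaultdict


def summarize_sum(points, interval_s):
    """points: iterable of (unix_ts, value); None values are skipped."""
    buckets = defaultdict(float)
    for ts, value in points:
        if value is None:
            continue
        # Align each point to the start of its period.
        start = ts - ts % interval_s
        buckets[start] += value
    return sorted(buckets.items())
```

Changing `interval_s` (3600 for 1 hour, 86400 for 1 day, and so on) plays the role of the period argument that a template variable would drive in the dashboard.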

Using RANK function in Tableau on a Combined column. However, display individual columns as well

I am using the RANK function in Tableau to display the rank of a calculated measure (e.g. 1 to 50).
The calculated measure is Total Amount for Combined Periods.
When no Period is displayed on the dashboard, the Total Amount is the sum of both periods, which is exactly what I want. I am fine in this case.
However, when I add Period to Rows, the Total Amount splits into "Total Amount for Period 1" and "Total Amount for Period 2".
How can I add a different axis to show the individual periods as well as the rank of Total Amount for the combined periods?
I suspect this comes down to a dual axis in Tableau; I believe this is not available yet, and users are voting for it on the Ideas forum.
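The logic being asked for, ranking on the combined total while still showing each period's own amount, can be sketched in Python: compute the total per item across all periods first, rank on that, and then attach the rank to every per-period row. In Tableau this would presumably involve a FIXED calculation for the combined total, but that mapping is an assumption. The ties rule below (tied items share the best rank, as in 1, 2, 2, 4) is, I believe, similar to Tableau's default RANK; names are hypothetical.

```python
from collections import defaultdict


def rank_by_combined_total(rows):
    """rows: list of (item, period, amount).

    Returns (item, period, amount, combined_total, rank) for every input row,
    where the rank is based on the item's total across all periods.
    """
    combined = defaultdict(float)
    for item, _period, amount in rows:
        combined[item] += amount
    # Competition ranking: tied totals share the best (lowest) rank.
    ordered = sorted(combined.values(), reverse=True)
    rank_of = {item: ordered.index(total) + 1 for item, total in combined.items()}
    return [
        (item, period, amount, combined[item], rank_of[item])
        for item, period, amount in rows
    ]
```

Because the rank is computed before the rows are split by period, adding Period to the view no longer changes which total is being ranked.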

Matlab average number of customers during a single day

I'm having problems creating a graph of the average number of people inside a 24-hour shopping complex. I have two spreadsheet columns: the time each customer comes in (intime) and the time they leave (outtime). The data spans a couple of years and is in datetime format (dd-mm-yyyy hh:mm:ss).
I want to make a graph with time of day on the x-axis and the average number of people on the y-axis, so that the graph shows the average number of people inside over the course of a day.
Problems arise because the place is open 24 hours and the data spans years. Also, a customer's intime and outtime may fall on different days.
Example:
intime 2.1.2017 21:50
outtime 3.1.2017 8:31
Any idea how to display the data easily using Matlab?
Been on this for multiple hours without any progress...
It seems you need to decide what counts as a customer being in the shop during the day. Is 1 minute enough? Is there a minimum length of stay below which you don't want to count it as a visit?
In the former case you shouldn't be concerned with the hours at all; just count it as 1 entry if the entry and exit are on the same day, or as 2 separate entries if not.
It's been a couple of years since I coded actively in Matlab and I don't have an IDE handy, but if you add the code you have so far, I can fix it for you.
I think you need to start by plotting the raw count of people in the complex over time. Once that is visualized, it may help you determine how you want to define "average people per day" and how to calculate it. Does that mean the average at a given time, or the total number of entries per day? For example, 100 people may enter the complex in a day, but on average only 5 are inside at any given time. Which statistic is more important? Maybe you want both.
Here is an example of how to get the raw plot of the number of people at any given time. I simulated your in and out times with random numbers.
inTime = cumsum(rand(100,1)); %They show up randomly
outTime = inTime + rand(100,1) + 0.25; % Stay for 0.25 to 1.25 hrs
inCount = ones(size(inTime)); %Add one for each entry
outCount = ones(size(outTime))*-1; %Subtract one for each exit.
allTime = [inTime; outTime]; %Stick them together.
allCount = [inCount; outCount];
[allTime, idx] = sort(allTime);%Sort the timestamps
allCount = allCount(idx); %Sort counts by the timestamps
allCount = cumsum(allCount); %total at any given time.
plot(allTime, allCount); % Plot the number of people present over time.
Note that the x-values are not uniformly spaced.
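To go from the raw occupancy curve to what the question asks for (average number of people by time of day), one option is to spread each visit's duration over the clock hours it covers and divide by the observed time. The average occupancy during an hour slot equals the total person-time spent in that slot divided by the slot's length, summed over all days, and overnight stays are handled naturally. This is a Python sketch of that alternative technique, not a translation of the MATLAB above; it assumes the timestamps are parsed with the dd-mm-yyyy format stated in the question, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def parse_ts(text):
    # Assumes the dd-mm-yyyy hh:mm:ss format described in the question.
    return datetime.strptime(text, "%d-%m-%Y %H:%M:%S")


def avg_occupancy_by_hour(visits, first_day, n_days):
    """Average number of people present in each hour of the day (24 values).

    visits: list of (in_time, out_time) datetimes
    first_day: midnight of the first observed day; n_days: days observed
    """
    window_start = first_day
    window_end = first_day + timedelta(days=n_days)
    person_seconds = [0.0] * 24
    for t_in, t_out in visits:
        # Clip each visit to the observation window.
        t, end = max(t_in, window_start), min(t_out, window_end)
        while t < end:
            # Credit the part of the visit that falls in the current clock hour.
            hour_end = t.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
            seg_end = min(hour_end, end)
            person_seconds[t.hour] += (seg_end - t).total_seconds()
            t = seg_end
    # Each hour slot was observed n_days times, 3600 seconds each.
    return [s / (n_days * 3600) for s in person_seconds]
```

With the example visit (2 January 2017 21:50 to 3 January 2017 08:31) over a two-day window, hours 22:00 through 07:00 each get an average of 0.5 people, hour 21 gets 10 / 120 and hour 8 gets 31 / 120. The 24 values can be plotted directly against the hour of day, and the x-values are then uniformly spaced.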
If you decide you are more interested in total customers per day, you could just find the inTimes within a given time range (each day) and probably ignore the outTimes altogether.
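Counting customers per day from the entry times alone, as suggested above, is a one-liner in Python. This sketch assumes the in-times are already parsed into datetime objects and counts a visit on the calendar day it starts; `customers_per_day` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime


def customers_per_day(in_times):
    """Count visits by the calendar day of the entry time; exit times are ignored."""
    return Counter(t.date() for t in in_times)
```

Days without any visits do not appear in the result, so when averaging over the whole period, divide by the number of calendar days in the period rather than by the number of keys.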