How to create a metric that isn't aggregated (AVG, SUM, etc.) in Google/Looker Studio - visualization

I'm new to Google/Looker Studio. I'm creating a dashboard for a manufacturing plant, and we are focusing on the following metrics:
Produced Units
Scheduled Units
Assembly Yield - calculated as Produced Units / Scheduled Units
I'm breaking down the data by month and by assembly line number. I want a sum of the produced and scheduled units for the month, but I don't want an average for the assembly yield. The assembly yield can get skewed if I average the daily values, so I'd like a straight calculation of "produced/scheduled".
The problem is that Google/Looker Studio only lets me apply an average (screenshot omitted). What is the best way to create a straight calculation of "produced/scheduled"?
Thanks in advance!

There are three kinds of formulas in Looker Studio:
Dimension formula (can return a string)
Metric formula
Aggregated metric formula:
This returns a field that includes the aggregation. In your case:
SUM(Produced Units) / SUM(Scheduled Units)
Let's look at a short example of the problem with averaging percentages across groups of different sizes:
Line      Produced Units   Scheduled Units   Yield
Line A    50               100               50/100 = 50%
Line B    10               10                10/10  = 100%
Total     60               110               60/110 ≈ 55%
Averaging the per-line yields of 50% and 100% would give 75%, which is the wrong answer for the total yield; the ratio of sums gives the correct 55%.
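For the record, the ratio of sums is exactly the scheduled-units-weighted average of the per-line yields, which is why it gives the correct total:

\[
\frac{\sum_i P_i}{\sum_i S_i} = \sum_i \frac{S_i}{\sum_j S_j} \cdot \frac{P_i}{S_i}
\]

where P_i and S_i are the produced and scheduled units of line i. With the numbers above: (100/110) * 0.50 + (10/110) * 1.00 = 60/110 ≈ 0.55.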

Related

Tableau Weighted Average of Last Value in Date Group over Running Sum Across Extra Level of Detail not in Report

I am an absolute Tableau beginner, so forgive my lack of proper terminology.
Context
To give some context to the problem, think of the dataset as the balances and current interest rates of two different loans for which we are trying to calculate a weighted average cost of funds at any point in time, while retaining the ability to filter on Program (specific loan).
I have a single dataset with Date, Program, Balance, and Rate columns (screenshot omitted).
The Balance field is used as a running sum, i.e. to get the actual balance as of 4/30/2022, you would sum the column across all Date values on or before 4/30/2022.
The Rate field is the opposite: it represents the discrete interest rate as of the Date. Thus, it cannot be summed.
Each data point belongs to a specific loan, or Program.
So to get the interest rate of Program A as of 4/30/2022, you would simply grab the Rate value of the row where Date = 4/30/2022 and Program = A, or 5.30%. Sums are fine here, since the value of Rate is never repeated for a single Program and Date combo, but we cannot use a running sum.
On the other hand, to get the balance of Program A as of 4/30/2022, you would need to add (running sum) the Balance values for all rows where Date <= 4/30/2022 and Program = A, or 10,000 + -2500 + -2500 + -2500 = 2500.
Problem / Need
I need a report (or whatever it's called in Tableau) with the following:
Date as a column
Measures as rows
This report would NOT include Program as a row or column, but would include it as a filter.
In this report, I need a Weighted Average Cost of Funds measure.
This is effectively the average Rate weighted by the running sum of Balance, across the Programs included in the filter, for any given Date in the columns.
In other words, by Date: the latest Rate for each Program, times that Program's running sum of Balance, divided by the running sum of Balance for all Programs included in the filter.
(The original post included Excel screenshots of the calculation for all Programs, with Program A excluded, with Program B excluded, and of the formulas underneath the example; images omitted.)
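No answer was posted with this question, but here is a rough, untested sketch of one possible approach using Tableau table calculations. It assumes Program is kept on the Detail shelf (so the view still contains it without showing it as a row or column) and that a Rate row exists for every Date; field names are taken from the question:

// Running balance as of each Date
// (Compute Using: Date, restarting every Program)
Running Balance = RUNNING_SUM(SUM([Balance]))

// Rate weighted by running balance, summed across Programs within each Date
// (Compute Using: Program for the window sums)
Weighted Avg Cost of Funds =
    WINDOW_SUM([Running Balance] * SUM([Rate])) / WINDOW_SUM([Running Balance])

Because a regular dimension filter on Program is applied before table calculations are evaluated, excluding a Program this way removes it from both the numerator and the denominator.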

Want to SUM all values for a specific date within column NOT sum all values in that column

I want to create a graph which shows the total capacity for each week relative to the remaining availability across a series of specific dates. When I attempt this in Power BI, it calculates one of the values (remaining availability) correctly, but generates a value much higher than expected by manual calculation for the total capacity: it shows the total for the entire column rather than for each specific date.
Why is Power BI doing this, and how can I solve it?
So far, I have tried generating the graph like this:
(screenshot: https://i.stack.imgur.com/GV3vk.png)
As you can see, the capacity values are incredibly high; they should be 25 days.
The total availability values are correct (ranging from 0 to 5.5 days).
When I create matrices to see the sum breakdown, the values are correct; it's only when they are combined in the graph that one of the values changes to the total for the whole column.
If anyone could help me with this issue that would be great! Thanks!
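No answer is shown for this question, but the symptom (a measure repeating the whole-column total on every date) usually means the date on the axis is not filtering the table the capacity comes from, either because the two tables are unrelated or because the measure overrides the filter. A minimal DAX sketch, with hypothetical table and column names ('Capacity'[Date], 'Capacity'[Days], and a shared date table 'Dates' whose [Date] field is used on the axis):

Total Capacity :=
CALCULATE (
    SUM ( 'Capacity'[Days] ),
    -- Push the dates visible on the axis into the Capacity table.
    -- This is unnecessary if a proper model relationship already exists;
    -- in that case, creating the relationship is the cleaner fix.
    TREATAS ( VALUES ( 'Dates'[Date] ), 'Capacity'[Date] )
)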

Prometheus query quantile of pod memory usage performance

I'd like to get the 0.95 percentile memory usage of my pods over the last X amount of time. However, this query starts to take too long if I use a 'big' (7-10 day) range.
The query that I'm using right now is:
quantile_over_time(0.95, container_memory_usage_bytes[10d])
It takes around 100s to complete. (I removed extra namespace filters for brevity.)
What steps could I take to make this query more performant (other than making the machine bigger)?
I thought about calculating the 0.95 percentile every X amount of time (say, every 30 minutes), recording it as p95_memory_usage, and then using p95_memory_usage instead of container_memory_usage_bytes in the query, so that I can reduce the number of points the query has to go through.
However, wouldn't this distort the values?
As you already observed, aggregating quantiles (over time or otherwise) doesn't really work.
You could try to build a histogram of memory usage over time using recording rules, looking like a "real" Prometheus histogram (consisting of _bucket, _count and _sum metrics) although doing it may be tedious. Something like:
- record: container_memory_usage_bytes_bucket
  labels:
    le: "100000.0"
  expr: |
    # Add 1 to the accumulated bucket count on every evaluation where current
    # usage is at or below the bucket bound (cumulative "le" semantics),
    # starting from 0 the first time a series appears.
    (container_memory_usage_bytes <= bool 100000.0)
    + ignoring(le)
    (
      container_memory_usage_bytes_bucket{le="100000.0"}
      or ignoring(le)
      container_memory_usage_bytes * 0
    )
Repeat for all the bucket sizes you're interested in, and add _count and _sum metrics in the same way.
Histograms can be aggregated (over time or otherwise) without problems, so you can use a second set of recording rules that computes an increase of the histogram metrics at much lower resolution (e.g. hourly or daily increase, at hourly or daily resolution). Finally, you can use histogram_quantile over your low-resolution histogram (which has a lot fewer samples than the original time series) to compute your quantile, as sketched below.
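For illustration, the second-stage rule and the final query could look something like this (the rule name is made up, and the group containing the hourly rule would run with interval: 1h):

# hourly increase of each accumulated bucket
- record: container_memory_usage_bytes_bucket:increase1h
  expr: increase(container_memory_usage_bytes_bucket[1h])

The quantile is then computed over the low-resolution buckets:

histogram_quantile(
  0.95,
  sum by (le) (sum_over_time(container_memory_usage_bytes_bucket:increase1h[10d]))
)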
It's a lot of work, though, and there will be a couple of downsides: you'll only get hourly/daily updates to your quantile and the accuracy may be lower, depending on how many histogram buckets you define.
Else (and this only came to me after writing all of the above) you could define a recording rule that runs at lower resolution (e.g. once an hour) and records the current value of container_memory_usage_bytes metrics. Then you could continue to use quantile_over_time over this lower resolution metric. You'll obviously lose precision (as you're throwing away a lot of samples) and your quantile will only update once an hour, but it's much simpler. And you only need to wait for 10 days to see if the result is close enough. (o:
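A minimal sketch of that simpler recording rule (group and metric names are illustrative):

groups:
  - name: memory_downsample
    interval: 1h   # evaluate once an hour, i.e. one recorded sample per hour
    rules:
      - record: container_memory_usage_bytes:1h
        expr: container_memory_usage_bytes

The original query then becomes quantile_over_time(0.95, container_memory_usage_bytes:1h[10d]), which only has to scan one sample per series per hour.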
The quantile_over_time(0.95, container_memory_usage_bytes[10d]) query can be slow because it needs to take into account all the raw samples for all the container_memory_usage_bytes time series over the last 10 days. The number of samples to process can be quite big, and it can be estimated with the following query:
sum(count_over_time(container_memory_usage_bytes[10d]))
Note that if the quantile_over_time(...) query is used for building a graph in Grafana (i.e. a range query instead of an instant query), then the number of raw samples returned from sum(count_over_time(...)) must be multiplied by the number of points on the Grafana graph, since Prometheus executes quantile_over_time(...) individually for each point on the displayed graph. Grafana usually requests around 1000 points to build a smooth graph, so the number returned from sum(count_over_time(...)) must be multiplied by 1000 to estimate the number of raw samples Prometheus needs to process to build the quantile_over_time(...) graph. See more details in this article.
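For example, if sum(count_over_time(container_memory_usage_bytes[10d])) returns 10 million samples (an illustrative number) and Grafana requests 1000 points, the range query has to process on the order of 10,000,000 × 1000 = 10 billion raw samples.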
There are the following solutions for reducing the query duration (see the example query after the list):
To add more specific label filters in order to reduce the number of selected time series and, consequently, the number of raw samples to process.
To reduce the lookbehind window in square brackets. For example, changing [10d] to [1d] reduces the number of raw samples to process by 10x.
To use recording rules for calculating coarser-grained results.
To try other Prometheus-compatible systems, which may process heavy queries faster. Try, for example, VictoriaMetrics.
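For example, combining the first two suggestions (the label values here are purely illustrative):

quantile_over_time(
  0.95,
  container_memory_usage_bytes{namespace="production", container!="POD"}[1d]
)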

Power BI: Finding average of averages and STDEV.P of averages

All,
My overall objective is to find outliers within an aggregated data set vs the underlying detail for different date ranges. The issue I am having is that Power BI is averaging SalesPerDay and finding the STDEV.P at the daily level, which is the grain of the raw data. I need to first find the average Sales per user, then find the average of those averages for that "rolled up" data set; likewise, I need the STDEV.P of the "rolled up" averages. The screenshot below depicts how I need the tool to aggregate (image omitted).
I have brought the Sales column into my dashboard, broken down by user, and set it to AVERAGE to get the average SalesPerDay.
Then I created the new measure
newavg = CALCULATE(AVERAGE(SalesPerDay[Sales]),ALLSELECTED())
This finds the overall average, but at the daily level rather than the aggregated level.
I also tried
newSTDV = CALCULATE(STDEV.P(AVERAGE(SalesPerDay[Sales])),ALLSELECTED())
But you cannot find the STDEV.P of a calculation.
Thank you.
What you are looking for are the iterator functions, which take a table or column as a grouping and then apply a calculation over that group.
An example is SUMX. In the snippet below, it groups by Product; within each product it takes the total of Qty, multiplies it by the total of x, and then sums the results of those per-product calculations into a total.
SUMX( VALUES( table1[Product] ), [Qty] * [x] )
There are also AVERAGEX, MINX, and MAXX, plus the statistical functions STDEVX.P and STDEVX.S.
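Applied to this question's data, a hedged sketch (assuming the user dimension is a SalesPerDay[User] column; adjust the names to your model):

Avg of User Averages :=
AVERAGEX (
    VALUES ( SalesPerDay[User] ),                  -- one row per user
    CALCULATE ( AVERAGE ( SalesPerDay[Sales] ) )   -- that user's average daily sales
)

StdDev of User Averages :=
STDEVX.P (
    VALUES ( SalesPerDay[User] ),
    CALCULATE ( AVERAGE ( SalesPerDay[Sales] ) )
)

The CALCULATE wrapper is what forces the inner AVERAGE to be evaluated per user rather than over all selected rows.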

Different Total Types in Tableau

I am trying to use Tableau's row total function but am running into a challenge. In the same widget, I have rows 1-4 with numbers, while row 5 is a percentage.
What I would like to do is have Rows 1 - 4 use a Sum Total and Row 5 use an Average total.
Any suggestions on how I can do this?
Thanks,
I don't believe you can use different total metrics on the same worksheet.
What you can do is create two different worksheets and bring them side by side on a dashboard, then use the proper total metric in each.
But beware of calculating the average of percentages, because the result can be misleading; usually a weighted average is required to accurately express the "average" of a percentage.
What you can do is actually calculate the percentage via a calculated field, as the division of two metrics. That way, when you use Totals, you will actually get a valid value for the "average" of the percentage.
As an exercise, suppose you have sales (in $) in the first row and # of clients in the second row. Now create a calculated field called ticket:
SUM([sales]) / SUM([# of clients])
That way I can add it as a third row, and for each column I'll have the right ticket value; if I add a Row Grand Total, I'll get the actual average ticket value (that is, total sales / total # of clients), because Tableau will sum all sales, sum all # of clients, and then perform the calculation (the division).