Tableau, how to calculate Weighted Standard Deviation - tableau-api

I'm having trouble calculating a weighted standard deviation. Here's the formula I used:
sum([Weight]*(([Variable]-[Mean Score - Variable])^2))
/
SUM([Weight])
But an error message pops up: "Cannot mix aggregate and non-aggregate"
What's wrong with my formula?
Thanks

I am assuming Variable and Weight are explicit fields in your dataset, while [Mean Score] is a calculated field you defined in Tableau.
[Mean Score] is an aggregate calculation; Variable is not. You can check this by dragging [Mean Score] to any shelf in Tableau and noting that it is shown with the prefix AGG(). You can't select the form of aggregation (SUM, MIN, AVG) to apply in that case, because the aggregation function is defined within the calculation itself.
You can't mix aggregate and record level calculations directly. Record level calculations are evaluated once for each individual data row. Aggregate calculations are evaluated once for each block of data rows.
The dimensions used in your worksheet determine which data rows are grouped together into blocks (partitioning the data). They are analogous to the fields that follow the keyword GROUP BY in a SQL SELECT statement. As with SQL, the other fields referenced must be aggregated somehow, such as via a SUM(), MIN(), MAX() or other call. Tableau calls those fields measures.
The most straightforward solution is to revise your definition of [Mean Score] to make it a Level Of Detail (LOD) calc instead of an aggregate calc.
That will allow you to compute the mean score separately first, and then reference that result in your record level calculation. You will have to choose among the three LOD keywords (FIXED, INCLUDE, EXCLUDE) for determining the dimensions of your LOD calc. See the online help for more info on LOD calcs.
For example, try replacing [Mean Score] with { include : [Mean Score] }
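The order of operations described above can be sketched outside Tableau. The following Python snippet is only an illustration of the arithmetic, not Tableau syntax: the sample rows are made up, and it assumes the mean in question is the weighted mean. It also makes visible that the formula in the question stops at the weighted variance, so a square root is still needed to get a standard deviation.

```python
import math

# Hypothetical sample rows: (variable, weight). Not from the original post.
rows = [(2.0, 1.0), (4.0, 1.0), (6.0, 2.0)]

# Step 1 (the "LOD" step): compute the weighted mean once over all rows.
# Assumption: the mean used in the formula is the weighted mean.
total_weight = sum(w for _, w in rows)
weighted_mean = sum(v * w for v, w in rows) / total_weight

# Step 2: each row's squared deviation references the precomputed mean.
# Step 3: aggregate those row-level values, weighting each row.
weighted_variance = (
    sum(w * (v - weighted_mean) ** 2 for v, w in rows) / total_weight
)

# The formula in the question ends at the variance; the standard
# deviation is its square root.
weighted_std = math.sqrt(weighted_variance)

print(weighted_mean, weighted_variance, weighted_std)
```

The mean has to exist as a plain per-row value before step 2 can run, which is what turning [Mean Score] into an LOD calc achieves.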

Related

How to divide an Aggregate and Sum function

I am working in Tableau and trying to create a formula that will return me the value of each customer that walks into a store by dividing Net Sales / Traffic. When I try to combine the two separate formulas, it gives me the following error: Cannot mix aggregate and non-aggregate arguments with this function. The two functions I created that I'm trying to divide are:
SOT = (SUM([Sales Net])-SUM([Sales Gcard Net]))/SUM([Traffic Perday]) and SOT Goal
When I look at it in Tableau, it's stating that SOT is an aggregate function. How do I work around this to be able to get
SOT / SOT Goal
Aggregate variables are values that are calculated in the view, and depend on the level of aggregation in Tableau. e.g. sum(Sales) will show different values in Tableau if it’s next to a Region dimension, or if it’s next to a Category dimension.
There are several ways to avoid this error; my favorite is LOD expressions. I don't have your sample data, so I can't test the different possibilities, but I suggest that this should work:
SOT = ({SUM([Sales Net])}-{SUM([Sales Gcard Net])})/{SUM([Traffic Perday])}
Keep in mind that this solution overrides your filters; if you are using filters, you have to add all of them to Context.
EDIT
While trying different possibilities, remember these things:
{SUM([Sales])} sums the sales over the entire data set; wrapping the sum function in curly braces makes it return a non-aggregate value. In other words, it works as an LOD expression, and if you add this field to the view, the total of all sales is shown against each row.
{FIXED [DIMENSION NAME] : SUM([Sales])} sums sales separately for each dimension value. A FIXED (LOD) expression again returns a non-aggregate value. If you add this field to the view, the total sales for each dimension value is shown against that value.
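The difference between the two expressions can be mimicked in a few lines of Python. This is a rough analogue rather than Tableau code, and the region names and sales figures are invented for the illustration.

```python
from collections import defaultdict

# Hypothetical rows: (region, sales). Not from the original post.
rows = [("East", 10), ("East", 30), ("West", 60)]

# Analogue of {SUM([Sales])}: one grand total, repeated on every row.
grand_total = sum(sales for _, sales in rows)

# Analogue of {FIXED [Region] : SUM([Sales])}: one total per region,
# repeated on every row belonging to that region.
region_totals = defaultdict(int)
for region, sales in rows:
    region_totals[region] += sales

# Both totals are now plain row-level values, so they can be combined
# with the row's own sales without mixing aggregation levels.
annotated = [
    (region, sales, grand_total, region_totals[region],
     sales / region_totals[region])
    for region, sales in rows
]

for row in annotated:
    print(row)
```

Each output row carries its own sales, the grand total, its region total, and its share of the region total.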

Get average value of each (Sub) column

I would like to calculate the average of the values of each "main" column, as shown in the picture (named the bundling column). Each bundling (main) column has 1 to N "sub columns" (which are values for a certain datetime). The bundling itself is variable; it changes for different filters.
How can I reference these sub columns to get their sum/count and average? I thought of using the "Window average" function, but I don't know how to define the start/end offset parameters. I'm not sure if window average is the correct option.
Thank you very much
A level of detail (LOD) calculation will allow you to control at what level the aggregation occurs.
{INCLUDE [Agg(Bundling)] : AVG([Agg(Error Select)])}

Table Across Average based on First String Value

Is there a way to calculate the average across a table based on only the value in the first column (School Name) when the first column is a string value? The current values in School Average are not correct due to the additional column values (Grade and Teacher) needed prior to the measures.
This is an ideal use case for LOD calculations. An LOD calculation allows you to specify the dimensions for a calculation as part of the calculation definition -- instead of (solely) by the Tableau shelves and cards.
To start, you can define a calculated field called, say, School_Avg as
{ fixed [School Name] : avg([Grade]) }
Assuming you have a field called Grade.
There is much more to learn about LOD calculations. See the online help to learn more.

Get 2 max date in Tableau

My requirement is to get the second max date available in report and filter the data set on this.
I tried something like this:
datediff('day',dt,max(dt))=1
Referred to this link
Any help?
You're going to need Tableau 9.0 for this, because any calculation you do in Tableau depends on the level of detail you have on the worksheet (the dimensions you put in there). So datediff('day',dt,max(dt))=1 won't work. First, because you're mixing aggregated fields (max(dt)) with non-aggregated ones (dt). Second, because the aggregation depends on the dimensions in the worksheet.
But Tableau 9.0 has an awesome new feature called Level of Detail calculations. It allows you to perform calculations at the level of detail you choose, independent of the dimensions on the sheet. It is also calculated BEFORE any calculation on the worksheet (just after context filters).
Now to the answer. First I'll figure out what is the max(dt). Let's call it max_dt
{ FIXED : MAX(dt) }
This will calculate the maximum of dt across your entire data source.
Now to get the second max, you can go like this:
{ FIXED : MAX(IF dt != max_dt
THEN dt
END)
}
This will calculate the maximum of dt, ignoring the rows that are equal to max_dt (that is, the true max(dt)). The result is therefore the second max.
Take a look at those LOD calculations. They were just released, and I'm having tons of fun with them right now.
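The two-step logic above (find the max, then take the max of everything that is not the max) can be checked with a small Python sketch. It is only an analogue of the two FIXED calculations, and the dates are invented for the illustration.

```python
from datetime import date

# Hypothetical dt values; the true maximum appears twice on purpose.
dates = [date(2015, 3, 1), date(2015, 3, 3), date(2015, 3, 3), date(2015, 3, 2)]

# Analogue of { FIXED : MAX(dt) }
max_dt = max(dates)

# Analogue of { FIXED : MAX(IF dt != max_dt THEN dt END) }
# default=None covers the case where every row shares the same date.
second_max_dt = max((d for d in dates if d != max_dt), default=None)

print(max_dt, second_max_dt)
```

Because rows equal to the maximum are excluded rather than just the single top row, duplicates of the latest date do not get reported as the second max.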
If the view has a date dimension, the easy way to do this is to create a calculated field Last()=1
and then filter off the records that evaluate to TRUE.

PgSQL - Error while executing a select

I am trying to write a simple select query in PgSQL but I keep getting an error. I am not sure what I am missing. Any help would be appreciated.
select residuals, residuals/stddev_pop(residuals)
from mySchema.results;
This gives an error
ERROR: column "results.residuals" must appear in the GROUP BY clause or be used in an aggregate function
Residuals is a numeric value (continuous variable)
What am I missing?
stddev_pop is an aggregate function. That means that it takes a set of rows as its input. Your query mentions two values in the SELECT clause:
residuals, this is a value from a single row.
stddev_pop(residuals), this is an aggregate value and represents multiple rows.
You're not telling PostgreSQL how it should choose the singular residuals value to go with the aggregate standard deviation and so PostgreSQL says that residuals
must appear in the GROUP BY clause or be used in an aggregate function
I'm not sure what you're trying to accomplish so I can't tell you how to fix your query. A naive suggestion would be:
select residuals, residuals/stddev_pop(residuals)
from mySchema.results
group by residuals
but that would leave you computing the standard deviation of groups of identical values and that doesn't seem terribly productive (especially when you're going to use the standard deviation as a divisor).
Perhaps you need to revisit the formula you're trying to compute as well as fixing your SQL.
If you want to compute the standard deviation separately and then divide each residuals by that then you'd want something like this:
select residuals,
residuals/(select stddev_pop(residuals) from mySchema.results)
from mySchema.results
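The difference between the two queries can be illustrated with plain Python. This is a sketch of the arithmetic only, with made-up residual values, using `statistics.pstdev` as the counterpart of `stddev_pop`.

```python
import statistics

# Hypothetical residuals; not taken from the original table.
residuals = [1.0, -1.0, 3.0, -3.0]

# Counterpart of the scalar subquery: one population standard deviation
# computed over all rows, then reused for every row.
overall_sd = statistics.pstdev(residuals)
scaled = [r / overall_sd for r in residuals]

# Counterpart of GROUP BY residuals: each group holds identical values,
# so its population standard deviation is zero and cannot be a divisor.
group_sd = statistics.pstdev([3.0, 3.0])

print(overall_sd, scaled, group_sd)
```

The grouped version ends up with a zero divisor in every group, which is why computing the standard deviation once over the whole table is the more useful reading of the question.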