Trying to figure out Tableau calculated fields:
I would like to calculate the occurrence of a row variable. Example:
Fruit | Occurrence
Apple | 2
Apple | 2
Orange | 1
Banana | 1
Occurrence should be the calculated field which in Excel would be =COUNTIF([fruit]=[#fruit])
What's the equivalent syntax for Tableau?
Keep in mind that if you use "FIXED", ordinary dimension filters won't affect the result (unless they are added to context).
I suggest using "INCLUDE" instead:
{INCLUDE [Fruit] : COUNT([Fruit])}
My solution works with WINDOW calculations. Also, it requires data disaggregation (see the Tableau online documentation).
Code for discrete Calculation2 (Compute using PANE DOWN):
RUNNING_COUNT(ATTR([Fruit]))
Code for discrete Calculation3 (Compute using PANE DOWN):
WINDOW_MAX([Calculation2])
I do not know whether this still works when the values in the data source are shuffled. I guess you would need to sort on the Fruit column in that case.
I've realised the right answer is:
{FIXED [Fruit] : COUNT([Fruit])}
Here FIXED computes the count over all rows that share the current row's [Fruit] value, regardless of the other dimensions in the view.
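As a rough illustration outside Tableau, the per-row result of the FIXED expression can be sketched in Python. The list `fruits` mirrors the example table from the question; the variable names are assumptions made for this sketch only.

```python
from collections import Counter

# Rows from the question's example table.
fruits = ["Apple", "Apple", "Orange", "Banana"]

# Count rows per fruit once, analogous to {FIXED [Fruit] : COUNT([Fruit])}.
counts = Counter(fruits)

# Attach the group count to every row, as the calculated field would.
occurrence = [(fruit, counts[fruit]) for fruit in fruits]

for fruit, n in occurrence:
    print(f"{fruit} | {n}")
```

Every row keeps its own line and simply carries the count for its group, which is the behaviour the question asks for.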
I have a spreadsheet of support case management data. I am working with this in Tableau. Each line in the spreadsheet is an individual case. Each case has, among much else, a support agent name and a Yes or No of whether the case work was started within 12 hours. I'd like to know, for each agent, what percentage of the time they started the case work within 12 hours. So, if Bob has 2 "No" and 8 "Yes", he should have 8 / (2 + 8) = 80%.
My attempt at this was to create 2 sets. One is the set of "Yes, started within 12 hours" (those that have "yes" in that field), and the other is the set of "No, not started within 12 hours", the complement of the first set. Silly me, I thought I could do something like COUNT(yeses) / COUNT(nos). Nope, big red failure. So what is the right way to do this?
It would help immensely to please respond as if this is the first thing I have ever done in Tableau. It is. I have learned a lot in this project, but only in comparison to the nothing I knew previously. Please also let me know if I've left out something necessary to answer this. I've tried to be complete but, well, am noob...
If it clarifies anything, here's a poor Excel mockup of data and the effect I'm looking for:
Yes, this is possible and easy in Tableau, but first a couple of points.
The reason your attempt to use COUNT() did not work is that COUNT() does not operate the way you, and 99% of the people on the planet, expect. COUNT([some expression]) returns the number of records that have a non-null value, any value, for [some expression]. The name comes from SQL relational databases.
The calculations would be just a bit simpler if your third column took the boolean values True or False instead of the string values "Yes" or "No". (In which case you could drop = "Yes" from the formula below.)
So two ways to do your calculation are:
1. Directly with an aggregate calculation, which gets the right result but is hard-coded for this case, such as:
SUM(INT([Started within 24 hrs?] = "Yes")) / SUM([Number of Records])
2. Using a table calc, which in this case is a bit easier and more flexible.
First, build a table or viz in Tableau showing SUM([Number of Records]) with the dimensions you care about in play. Say, with [Name] on Rows, [Started within 24 hrs?] on Columns and SUM([Number of Records]) on Text. Second, right click on your measure SUM([Number of Records]) and choose Percentage of Total from the Quick Table Calc menu. Finally, use that same menu to adjust Compute Using to specify how you want the percentages computed, in this case using [Started within 24 hrs?].
If you only want to show some of the data, right click on the column header for the values you wish to hide and choose Hide.
The type conversion function INT() converts True to 1 and False to 0.
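The aggregate formula above can be sketched in plain Python to show why summing the 0/1 conversion works. The rows below are hypothetical: Bob's 8 "Yes" and 2 "No" come from the question, while "Ann" and the function name `yes_rate_by_agent` are made up for illustration.

```python
# Hypothetical case rows: (agent name, started within the time limit?)
cases = [("Bob", "Yes")] * 8 + [("Bob", "No")] * 2 + [("Ann", "Yes"), ("Ann", "No")]

def yes_rate_by_agent(rows):
    """Return {agent: share of that agent's rows whose flag equals "Yes"}."""
    yes = {}
    total = {}
    for agent, flag in rows:
        # int(True) == 1 and int(False) == 0, like INT() in the formula above.
        yes[agent] = yes.get(agent, 0) + int(flag == "Yes")
        total[agent] = total.get(agent, 0) + 1  # one record per row
    return {agent: yes[agent] / total[agent] for agent in total}

rates = yes_rate_by_agent(cases)
for agent, rate in rates.items():
    print(f"{agent}: {rate:.0%}")
```

The numerator counts only the "Yes" rows while the denominator counts every row, so the ratio is Yes / (Yes + No) rather than Yes / No.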
You could create another column that converts the Yes's into 1's, and the No's into 0's. Sum up all the 1's and divide it by the total and that's your percentage.
edit: the new column would look something like
=IF(C3="Yes",1,0)
in other words, if Cⁿ is "Yes", then 1, else 0
Just started learning Tableau, would appreciate your tips on the following:
Have 2 columns of data strings, one to many values. Example:
yellow -> banana
yellow -> sun
Instead of duplicating the lines, I would like to present it as a group:
yellow -> banana & sun.
I can not do it manually, as data set is huge and changing. So I need a condition from the "yellow" column.
Could you help me with a query, please? Thank you!
There are multiple ways to do this.
1. Create a table in Excel/txt/csv/database that takes those values and assigns a grouping value to each, and join it in the data model. You would only need to do this once, and update it as needed.
Ex:
Item | Color
---------------
Banana | Yellow
Sun | Yellow
2. Another option would be to create a calculated field with IF statements, if there are commonalities in the list.
Option one would be much more efficient if it's a large list, and would require less maintenance than what you're currently doing.
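For reference, the grouped output the question describes can be sketched outside Tableau as a plain group-by on the first column. This is only an illustration of the target shape, not a Tableau feature; the names `pairs` and `grouped` are assumptions.

```python
from collections import defaultdict

# The one-to-many pairs from the question: (color, item).
pairs = [("yellow", "banana"), ("yellow", "sun")]

# Collect all items that share the same color, preserving input order.
grouped = defaultdict(list)
for color, item in pairs:
    grouped[color].append(item)

# One line per color, items joined with " & ".
lines = [f"{color} -> {' & '.join(items)}" for color, items in grouped.items()]
print("\n".join(lines))
```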
I've been tasked to set up a Tableau worksheet of counts of data (ultimately to create percentages) where the contrived incoming data looks like the following.
id fruit
1 apple
1 orange
1 lemon
2 apple
2 orange
3 apple
3 orange
4 lemon
4 orange
The worksheet needs to look something like the following:
Count of ids
2 Lemons
2 No lemons
I've only been using Tableau for about 4 hours, so is this doable? Can anyone point me in the right direction?
The data is coming in from a SQL Server database in a format that I can control if that helps contribute towards a solution.
Alex's solution based on sets is very good for this scenario, but I would like to show that LODs can be more flexible if you need to extend your solution to include more categories.
For the current scenario, create a calculation with the formula below and build a text table using COUNTD([Id]):
{FIXED [Id]:IF MAX([Fruit]='lemon') THEN 'Lemon' ELSE 'No Lemon' END}
Now for the extension part, suppose you want to count Ids with lemon, Ids with apple, and all others. Since no double counting of Ids is allowed, categorization follows the order of the conditions. (This kind of precedence would be a headache without LODs.)
Now you can change your calculation as below:
{FIXED [Id]:IF MAX([Fruit]='lemon') THEN 'Lemon'
ELSEIF MAX([Fruit]='apple') THEN 'Apple'
ELSE 'No Lemon or Apple' END}
Now your visualization automatically changes to include the new category. This can be extended for any number of fruits.
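The precedence logic of the extended LOD expression can be sketched in Python using the rows from the question. The helper name `categorize` is an assumption for this sketch; the category labels are copied from the formula above.

```python
from collections import Counter

# (id, fruit) rows from the question.
rows = [
    (1, "apple"), (1, "orange"), (1, "lemon"),
    (2, "apple"), (2, "orange"),
    (3, "apple"), (3, "orange"),
    (4, "lemon"), (4, "orange"),
]

# Gather the fruits seen for each id (the FIXED [Id] level of detail).
fruits_by_id = {}
for id_, fruit in rows:
    fruits_by_id.setdefault(id_, set()).add(fruit)

def categorize(fruits):
    """Assign exactly one category per id; earlier conditions take precedence."""
    if "lemon" in fruits:
        return "Lemon"
    elif "apple" in fruits:
        return "Apple"
    return "No Lemon or Apple"

# Each id contributes once, which is what COUNTD([Id]) per category gives.
category_counts = Counter(categorize(f) for f in fruits_by_id.values())
print(dict(category_counts))
```

Id 1 has both a lemon and an apple but lands only in "Lemon", because that condition is checked first.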
This is a good use for a set.
In the data pane on the left sidebar, right click on the Id field and create a set named "Ids that contain at least one lemon" (or use a shorter, less precise name).
In the set definition dialog panel, define the set by choosing "Use all" from the General tab, and then on the Condition tab, define the condition by the formula max([Fruit]="lemon")
There are many ways to think of a set, but the most abstract is just as a mathematical set of Ids that satisfy the condition. Remember each Id has many data rows, so the condition is a function of many data rows and uses the aggregation function MAX(). For booleans, True is treated as greater than False, so MAX() will return True if at least one of the data rows satisfies the condition. By contrast, MIN() is True only if ALL (non-null) data rows satisfy the condition.
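Python orders booleans the same way (True is greater than False), so the MAX/MIN behaviour described here can be checked with a small sketch. The list holds the rows for Id 1 from the question; the variable names are assumptions.

```python
# Fruit values on the data rows belonging to Id 1 in the question.
fruits_for_id = ["apple", "orange", "lemon"]

# One boolean per data row: does this row satisfy the condition?
flags = [fruit == "lemon" for fruit in fruits_for_id]

has_lemon = max(flags)  # True if at least one row is a lemon
all_lemon = min(flags)  # True only if every row is a lemon

# max/min over booleans agree with the built-in any/all.
print(has_lemon == any(flags), all_lemon == all(flags))
```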
Once you have a set that separates your ids into Lemon scented Ids and others, then you can use that set in many ways - in calculated fields, in filters, in combination with other sets to make new sets, and of course on shelves to make visualizations.
To get a result like your question seeks, you could put your new set on the Row shelf, and put CNTD(ID) on the text shelf or columns shelf. Make sure you understand why you need count distinct (CNTD) instead of SUM([Number of Records]) here.
BTW, the LOD calculation { fixed [Id] : max([Fruit]="lemon") } is effectively the same solution.
I am trying to calculate a z-score with a rolling window. Specifically, I need the standard deviation over a 3-year rolling window in order to calculate the z-score. A minimal working example is given below:
use http://dss.princeton.edu/training/Panel101.dta
xtset country year
rolling sd_x1=r(sd), step(1) window(3) saving(sd_x1, replace) keep(year): sum x1, detail
After this I need to merge the result back into the original file. However, the variable year does not appear in the saved file; instead there is a column named date with all values missing. I am trying to merge using the following command:
merge 1:1 country year using sd_x1
However, I get the error that variable year is not found and actually this variable is not kept while running the rolling command. Any help will be much appreciated.
I am always surprised that people have interest or faith in standard deviations based on three values.
A more direct approach would be to use rangestat (SSC). The syntax could be something like
use http://dss.princeton.edu/training/Panel101.dta
xtset country year
rangestat (sd) sd=x1, interval(year 0 2) by(country)
except that I cannot test this at the moment.
The key difference here is that rangestat produces new variables in the current dataset. Search the Statalist archives for examples of rangestat use.
Note that in your example the detail option is unnecessary as summarize by itself produces standard deviations.
You can extend this approach to get the mean at the same time.
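To make the windowing concrete, here is a language-neutral sketch in Python of a per-year mean, sample standard deviation and z-score over the interval [year, year + 2], mirroring interval(year 0 2) above. The yearly values are hypothetical (not taken from Panel101.dta), the function name `window_stats` is an assumption, and the forward-looking window alignment simply follows the untested syntax shown above.

```python
from statistics import mean, stdev

# Hypothetical yearly values for a single country.
series = {1990: 1.0, 1991: 2.0, 1992: 4.0, 1993: 7.0}

def window_stats(data, low=0, high=2):
    """Mean and sample sd of the values whose year lies in [year+low, year+high]."""
    out = {}
    for year in data:
        vals = [v for y, v in data.items() if year + low <= y <= year + high]
        # A sample sd needs at least two values; otherwise leave it missing.
        sd = stdev(vals) if len(vals) >= 2 else None
        out[year] = (mean(vals), sd)
    return out

stats = window_stats(series)

# z-score of each year's value relative to its own window, where sd is usable.
zscores = {y: (series[y] - m) / sd for y, (m, sd) in stats.items() if sd}

for year in sorted(series):
    print(year, stats[year], zscores.get(year))
```

Because the statistics are stored alongside the original years, no separate merge step is needed, which is the same advantage the answer attributes to rangestat.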
I am new to Tableau and I have created a crosstab that shows a count of items per type. I want to add a column to my table. I need to know the percentage of the whole - I looked up a couple of things but I can't seem to find this exact problem.
Type |Count |% of Whole
-----------------------------------
A |10 |1%
B |99 |9.9%
C |256 |25.6%
D |300 |30%
E |335 |33.5%
After some reading, I think my issue is that I am not sure how to derive a calculation that gives the TOTAL count across all Types. In Excel I would take the row value divided by the sum of all rows. Additionally, I am fairly certain that this will lead to an issue once I filter this table; I am not sure how to preserve the percentages with filters.
I am using Tableau 9.2. Thanks in advance for any help.
You can create the following calculated field:
SUM([Count])/TOTAL(SUM([Count]))
TOTAL(SUM([Count])) returns the sum across all rows in the table, so each row's count is divided by the grand total.
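The arithmetic behind that calculated field can be sketched in Python with the counts from the question's table. The variable names are assumptions for this sketch.

```python
# Counts per type, copied from the question's table.
counts = {"A": 10, "B": 99, "C": 256, "D": 300, "E": 335}

# Equivalent of TOTAL(SUM([Count])): the sum over every row in the table.
grand_total = sum(counts.values())

# Equivalent of SUM([Count]) / TOTAL(SUM([Count])) for each row.
pct = {type_: count / grand_total for type_, count in counts.items()}

for type_, share in pct.items():
    print(f"{type_} | {counts[type_]} | {share:.1%}")
```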
Alternatively, you can use quick table calculation by right-clicking on count (here I'm using an example from the Superstore dataset):