I've been tasked to set up a Tableau worksheet of counts of data (ultimately to create percentages) where the contrived incoming data looks like the following.
id fruit
1 apple
1 orange
1 lemon
2 apple
2 orange
3 apple
3 orange
4 lemon
4 orange
The worksheet needs to look something like the following:
Count of ids
2 Lemons
2 No lemons
I've only been using Tableau for about 4 hours, so is this doable? Can anyone point me in the right direction?
The data is coming in from a SQL Server database in a format that I can control if that helps contribute towards a solution.
Alex's set-based solution is very good for this scenario, but I would like to show that LOD expressions can be more flexible if you need to extend your solution to include more categories.
For the current scenario, create a calculation with the formula below and build a text table using COUNTD([Id]):
{FIXED [Id]:IF MAX([Fruit]='lemon') THEN 'Lemon' ELSE 'No Lemon' END}
Now for the extension part: suppose you are considering the list below, where you want to count Ids with Lemon, Apple, and others. Since no Id may be counted twice, categorization follows the order of the conditions. (This kind of precedence would be a headache without LODs.)
Now you can change your calculation as below:
{FIXED [Id]:IF MAX([Fruit]='lemon') THEN 'Lemon'
ELSEIF MAX([Fruit]='apple') THEN 'Apple'
ELSE 'No Lemon or Apple' END}
Now your visualization automatically changes to include the new category. This can be extended for any number of fruits.
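To sanity-check the precedence logic outside Tableau, here is a minimal Python sketch of what the FIXED [Id] calculation does, using the sample rows from the question. The names ROWS, PRECEDENCE, categorize and count_ids_per_label are made up for illustration; this only mirrors the logic and is not how Tableau evaluates it internally.

```python
from collections import defaultdict

# Rows of (id, fruit), copied from the sample data in the question.
ROWS = [
    (1, "apple"), (1, "orange"), (1, "lemon"),
    (2, "apple"), (2, "orange"),
    (3, "apple"), (3, "orange"),
    (4, "lemon"), (4, "orange"),
]

# Precedence order: the first fruit an id contains decides its label,
# like the IF / ELSEIF chain in the calculation.
PRECEDENCE = [("lemon", "Lemon"), ("apple", "Apple")]
FALLBACK = "No Lemon or Apple"


def categorize(rows, precedence=PRECEDENCE, fallback=FALLBACK):
    """Return {id: label}, one label per id, like a FIXED [Id] calculation."""
    fruits_by_id = defaultdict(set)
    for id_, fruit in rows:
        fruits_by_id[id_].add(fruit)
    labels = {}
    for id_, fruits in fruits_by_id.items():
        labels[id_] = next(
            (label for fruit, label in precedence if fruit in fruits), fallback
        )
    return labels


def count_ids_per_label(labels):
    """Count distinct ids per label, like COUNTD([Id]) in the text table."""
    counts = defaultdict(int)
    for label in labels.values():
        counts[label] += 1
    return dict(counts)


labels = categorize(ROWS)
counts = count_ids_per_label(labels)
```

With the sample data, ids 1 and 4 land in 'Lemon' and ids 2 and 3 in 'Apple'. Adding another fruit is just one more entry in PRECEDENCE, which is the same flexibility the ELSEIF chain gives you.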
This is a good use for a set.
In the data pane on the left sidebar, right click on the Id field and create a set named "Ids that contain at least one lemon" (or use a shorter, less precise name).
In the set definition dialog panel, define the set by choosing "Use all" from the General tab, and then on the Condition tab, define the condition by the formula max([Fruit]="lemon")
There are many ways to think of a set, but the most abstract is just as a mathematical set of Ids that satisfy the condition. Remember each Id has many data rows, so the condition is a function of many data rows and uses the aggregation function MAX(). For booleans, True is treated as greater than False, so MAX() will return True if at least one of the data rows satisfies the condition. By contrast, MIN() is True only if ALL (non-null) data rows satisfy the condition.
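If the MAX/MIN-over-booleans idea feels odd, Python happens to order booleans the same way (True > False), so a tiny sketch can illustrate it. The helper names has_any and has_only are invented here, and Tableau's handling of nulls is not modelled.

```python
# Fruits recorded for id 1 and id 2 in the sample data from the question.
id1_fruits = ["apple", "orange", "lemon"]
id2_fruits = ["apple", "orange"]


def has_any(fruits, target):
    # max() over booleans is True if at least one row matches (True > False).
    return max(f == target for f in fruits)


def has_only(fruits, target):
    # min() over booleans is True only if every row matches.
    return min(f == target for f in fruits)


id1_in_set = has_any(id1_fruits, "lemon")   # id 1 has a lemon row
id2_in_set = has_any(id2_fruits, "lemon")   # id 2 has no lemon row
```

So id 1 would be in the "at least one lemon" set and id 2 would not, while has_only plays the role of MIN() and is only True when every row for the id is a lemon.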
Once you have a set that separates your ids into Lemon scented Ids and others, then you can use that set in many ways - in calculated fields, in filters, in combination with other sets to make new sets, and of course on shelves to make visualizations.
To get a result like the one your question seeks, you could put your new set on the Rows shelf, and put COUNTD([Id]) (shown on the pill as CNTD(Id)) on the Text shelf or Columns shelf. Make sure you understand why you need count distinct (COUNTD) instead of SUM([Number of Records]) here.
BTW, the LOD calculation { fixed [Id] : max([Fruit]="lemon") } is effectively the same solution.
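On the count distinct point, a rough Python sketch with the question's sample rows shows the difference between counting rows and counting distinct ids on each side of the set. The function name summarize is made up for this illustration.

```python
# Rows of (id, fruit), copied from the sample data in the question.
ROWS = [
    (1, "apple"), (1, "orange"), (1, "lemon"),
    (2, "apple"), (2, "orange"),
    (3, "apple"), (3, "orange"),
    (4, "lemon"), (4, "orange"),
]

# The set: ids with at least one lemon row.
lemon_ids = {id_ for id_, fruit in ROWS if fruit == "lemon"}


def summarize(rows, members):
    """For IN (True) and OUT (False), report row count and distinct id count."""
    result = {}
    for in_set in (True, False):
        group = [id_ for id_, _ in rows if (id_ in members) == in_set]
        result[in_set] = {"rows": len(group), "distinct_ids": len(set(group))}
    return result


summary = summarize(ROWS, lemon_ids)
```

Counting rows gives 5 and 4, because every fruit row of an id is counted, whereas counting distinct ids gives the 2 and 2 the question asks for.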
I have a spreadsheet of support case management data. I am working with this in Tableau. Each line in the spreadsheet is an individual case. Each case has, among much else, a support agent name and a Yes or No of whether the case work was started within 12 hours. I'd like to know, for each agent, what percentage of the time they started the case work within 12 hours. So, if Bob has 2 "No" and 8 "Yes", he should have 8 / (2 + 8) = 80%.
My attempt at this was to create 2 sets. One is the set of "Yes, started within 12 hours" (those that have "yes" in that field), and one is the set of "No, not started within 12 hours", the complement of the other set. Silly me, I thought I could do something like COUNT(yeses) / COUNT(nos). Nope, big red failure. So what is the right way to do this?
It would help immensely to please respond as if this is the first thing I have ever done in Tableau. It is. I have learned a lot in this project, but only in comparison to the nothing I knew previously. Please also let me know if I've left out something necessary to answer this. I've tried to be complete but, well, am noob...
If it clarifies anything, here's a poor Excel mockup of data and the effect I'm looking for:
Yes, this is possible and easy in Tableau, but first a couple of points.
The reason your attempt to use COUNT() did not work is that COUNT() does not operate the way you, and 99% of the people on the planet, expect. COUNT([some expression]) returns the number of records that have a non-null value, any value, for [some expression]. The name comes from SQL relational databases.
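A small Python sketch of that counting rule, in case it helps. The function sql_style_count and the sample lists are hypothetical; the point is only that a False value is still a non-null value, so it gets counted.

```python
def sql_style_count(values):
    """Count non-null entries, which is what COUNT(expr) does."""
    return sum(1 for v in values if v is not None)


# Hypothetical in/out flags for four cases (True = started on time).
flags = [True, False, True, True]

counted = sql_style_count(flags)   # every flag is non-null, so all are counted
true_only = sum(flags)             # adding booleans counts only the True ones

# Nulls are the only values COUNT skips.
with_nulls = ["Yes", None, "No", None]
counted_with_nulls = sql_style_count(with_nulls)
```

Here counted is 4 even though only 3 flags are True, which is why counting a boolean set membership returns the total number of records rather than the number of "yes" records.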
The calculations would be just a bit simpler if your third column took the boolean values True or False instead of the string values "Yes" or "No". (In which case you could drop = "Yes" from the formula below.)
So two ways to do your calculation are:
Directly with an aggregate calculation which can get the right result, but is hard coded for this case, such as:
SUM(INT([Started within 24 hrs?] = "Yes")) / SUM([Number of Records])
Using a table calc - which in this case is a bit easier and more flexible.
First, build a table or viz in Tableau showing SUM([Number of Records]) with the dimensions you care about in play: say, with [Name] on Rows, [Started within 24 hrs?] on Columns, and SUM([Number of Records]) on Text. Second, right click on your measure SUM([Number of Records]) and choose Percent of Total from the Quick Table Calculation menu. Finally, use that same menu to adjust Compute Using to specify how you want the percentages computed, in this case using [Started within 24 hrs?].
If you only want to show some of the data, right click on the column header for the values you wish to hide and choose Hide.
The type conversion function INT() converts True to 1 and False to 0.
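For anyone who wants to check the arithmetic of the aggregate calculation by hand, here is a minimal Python sketch of SUM(INT(flag = "Yes")) / SUM([Number of Records]) per agent. The function name pct_started_on_time is invented, and the data is just the Bob example from the question (8 "Yes", 2 "No").

```python
from collections import defaultdict


def pct_started_on_time(cases):
    """cases: iterable of (agent, flag) pairs where flag is "Yes" or "No"."""
    yes = defaultdict(int)
    total = defaultdict(int)
    for agent, flag in cases:
        yes[agent] += int(flag == "Yes")  # int(True) is 1, int(False) is 0
        total[agent] += 1                 # one record per case
    return {agent: yes[agent] / total[agent] for agent in total}


# Bob from the question: 8 cases started in time, 2 not.
cases = [("Bob", "Yes")] * 8 + [("Bob", "No")] * 2
pct = pct_started_on_time(cases)
```

For Bob this gives 0.8, i.e. the 80% from the question.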
You could create another column that converts the Yes's into 1's, and the No's into 0's. Sum up all the 1's and divide it by the total and that's your percentage.
edit: the new column would look something like
=IF(C3="Yes",1,0)
in other words, if the cell in column C is "Yes", then 1, else 0
I have two calculated fields (HomeScore, AwayScore) and I grouped them by different dimensions (Home, Away). Now I have TotalRuns per Team for both HomeGames and AwayGames. My problem is that I want to find the sum of TotalRuns per Team, not separately for home games and away games. I want to add these grouped fields together somehow. I attach a screenshot to show my work. For example, the first column in both charts is "Arizona Diamondbacks", which has 263 Runs in the first chart and 337 in the second one. I want to show the 263+337=600 Runs. Any idea?
You'll want to create a LOD expression.
{FIXED [Team Name] : SUM([Total Runs])}
Think of your data as a big table (which it technically always is in Tableau). Every grouping, filter, etc. that you do narrows down the number of columns and rows you have left until you are left with your data set that contributes to your chart. LOD expressions allow you to back out of the filters, etc. in your calculation. In this case, you narrowed down to home or away games, and we are backing out of that to get a bigger picture of the data.
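As a rough illustration of what "total runs per team, regardless of home or away" means, here is a Python sketch. It assumes each row holds a home team, an away team and the two scores, which may not match your actual data layout; the opponent "Team X" and its scores are invented, and only the 263 and 337 come from the question (collapsed into one row each for brevity).

```python
from collections import defaultdict


def total_runs_per_team(games):
    """games: iterable of (home_team, away_team, home_score, away_score)."""
    totals = defaultdict(int)
    for home, away, home_score, away_score in games:
        totals[home] += home_score   # runs scored at home
        totals[away] += away_score   # runs scored on the road
    return dict(totals)


# Hypothetical rows: Arizona's 263 home runs and 337 away runs.
games = [
    ("Arizona Diamondbacks", "Team X", 263, 200),
    ("Team X", "Arizona Diamondbacks", 150, 337),
]
totals = total_runs_per_team(games)
```

Each team ends up with one number that ignores the home/away split, which is the level the FIXED [Team Name] expression computes at.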
I'm fairly new to Tableau and I'm having the following issue. Below is a sample of the data I'm using.
Customer No | Item
___________________
1 A
1 B
2 A
3 A
4 A
4 B
5 B
6 A
I'm trying to get a count of how many customers bought Item A, Item B, and both. So far I tried creating a separate group combining A and B, but I get a total of 8. I also tried a calculation and got the same result of 8. Can someone please point me in the right direction on how to get this result? Thanks!
This is the result I'm trying to get:
Item| Count
A 5
B 3
A and B 2
I recreated your exact dataset and pasted it into Tableau so you could see a couple of examples.
Here's how you can see the number of customers who purchased an individual item, plus the number of customers who purchased both items.
Your calculation will be:
IF { FIXED [Customer No]: COUNTD([Item]) } = 1 THEN
[Item]
ELSE
'Both A and B'
END
And you'll need to set your view up to look like this:
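For reference, the labelling logic of the calculation above can be checked outside Tableau with a short Python sketch on the question's data. The function names label_rows and customers_per_label are invented for this illustration.

```python
from collections import defaultdict

# (customer no, item) rows copied from the question.
PURCHASES = [
    (1, "A"), (1, "B"), (2, "A"), (3, "A"),
    (4, "A"), (4, "B"), (5, "B"), (6, "A"),
]


def label_rows(purchases):
    """Label each row like IF {FIXED [Customer No]: COUNTD([Item])} = 1 ..."""
    items_by_customer = defaultdict(set)
    for customer, item in purchases:
        items_by_customer[customer].add(item)
    return [
        (customer,
         item if len(items_by_customer[customer]) == 1 else "Both A and B")
        for customer, item in purchases
    ]


def customers_per_label(labelled):
    """Count distinct customers per label."""
    customers = defaultdict(set)
    for customer, label in labelled:
        customers[label].add(customer)
    return {label: len(ids) for label, ids in customers.items()}


result = customers_per_label(label_rows(PURCHASES))
```

Note that the labels are mutually exclusive: customers 1 and 4 are counted only under 'Both A and B', so this gives A = 3, B = 1 and both = 2 rather than the overlapping A = 5, B = 3 from the question.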
Below are ways you can see when both items were purchased.
Boolean OR
The calculation you'll want to use is:
ATTR([Item]) = 'A' OR ATTR([Item]) = 'B'
And you'll want to set up your view to look like this:
A, B or Both
If you would like a bit more specificity in your result, you might try:
IF ATTR([Item]) = 'A' THEN
'A'
ELSEIF ATTR([Item]) = 'B' THEN
'B'
ELSE
'BOTH'
END
Replacing the previous calculation with the new looks like this:
More than 1 item
If the specific items purchased don't matter, you could use this logic.
COUNTD([Item]) > 1
Replacing the previous calculation with this one would look like:
More than 1 Item using a window function (probably overkill)
The calculation you'll need to use is:
WINDOW_COUNT(COUNTD([Item]))
Because this is a Window function, we'll need to specify how it's calculated across our dimensions. To do this click the down arrow on the right-hand side of the pill and select Edit Table Calculation...
You'll then need to set these settings:
I'll add the calculation we created in the first example ([A and B]) to the filter shelf and select True. That should give you something that looks like:
More than 1 item using a Level of Detail expression
The calculation for this example is:
{ EXCLUDE [Item]: COUNTD([Item]) }
Your view should look like:
As you can see Tableau is quite flexible. Hope these examples were helpful!
You might want to use Tableau’s set feature to approach problems like this.
For example, right click on the field [Customer No] in the data pane (i.e. left sidebar) and choose the “Create Set” command. Click “Use All” at the top of the set panel and then click the Condition tab. Define the set using the condition MAX([Item] = "A"). Name the set “Customers who bought A”.
Similarly, create a set of customers who bought item B. You can then select both sets in the data pane, and create a combined set to be the intersection, that is, customers who bought both an item A and an item B.
You can think of a set as either a mathematical set of the members of a field that belong to the set (i.e. a set of customer ids) or as a Boolean function defined for each data record in the data source indicating whether that data record is associated with the set (i.e. a Boolean function that operates on transactions to say whether the associated customer ID is in the set). A key point to keep in mind for the condition formulas used here is that the condition is an aggregate formula, operating on a block of data records for a customer ID to determine whether the customer ID is in the set.
Once you have defined your sets of interest, you can use them in many ways - in calculated fields, as filters, as dimensions on shelves in a visualization, in set actions, to combine with other sets ...
To define a measure that counts the customers in a set, create a calculated field such as “[Num A Customers]” as COUNTD(if [Customers who bought A] then [Customer No] end). Do the same for whatever other sets you are interested in. Then you can use those measures (probably with Measure Names and Measure Values) to make your viz.
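The set approach maps quite directly onto ordinary set operations, so here is a minimal Python sketch using the question's data. The helper customers_who_bought is a made-up name; the combined set is a plain intersection.

```python
# (customer no, item) rows copied from the question.
PURCHASES = [
    (1, "A"), (1, "B"), (2, "A"), (3, "A"),
    (4, "A"), (4, "B"), (5, "B"), (6, "A"),
]


def customers_who_bought(purchases, item):
    """Customers with at least one row for the item (like MAX([Item] = item))."""
    return {customer for customer, bought in purchases if bought == item}


bought_a = customers_who_bought(PURCHASES, "A")
bought_b = customers_who_bought(PURCHASES, "B")
bought_both = bought_a & bought_b   # combined set: intersection

counts = {
    "A": len(bought_a),
    "B": len(bought_b),
    "A and B": len(bought_both),
}
```

This reproduces the table asked for in the question: A = 5, B = 3, A and B = 2.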
How can I extract the IN count portion of a Tableau set? I can see the IN/OUT counts when I drop the set into Text but can't figure out how to get at the IN value by itself.
Ultimately, I want to create a Pie Chart of three sets with just the IN counts as the measures.
I am using Tableau Public if that is a factor.
You have to be a little careful about specifying what you wish to count.
One way to think of a set is as a Boolean function that gives a value to each data record denoting whether that record is associated with the set.
Another way to think of a set is as a mathematical set whose members are a subset of the values for some discrete field. (Or Tuple of fields)
The difference between the two views is really just a mindset, whether you consider the set as a Boolean function whose domain is a data row in the data source, or whose domain is the field on which the set definition is based.
Say you are looking at Tableau’s Superstore data set where each data record is a line item for a product attached to an order.
If your set is based on the field Region, say it's called [My Favorite Regions] and currently contains {“East”, “Central”}, do you want your count to be 2 (i.e. the number of regions in the set)? Or do you want your count to be in the tens of thousands (i.e. the number of line items on orders from the regions in the set)? Or something in between, maybe the number of distinct orders (i.e. order ids) within the selected regions...
If you want to count data rows that are associated with the set, you can simply filter by the set and calculate SUM([Number of Records]). If you want to count the regions in the set even though the level of detail of the data is at the order line item level, then you’ll have to use either a COUNTD to count the distinct regions, or some approach to specify what it is you want Tableau to count.
For example, put your set on the filter shelf, and show COUNTD(Region) which could be slow for very large data sets. To get the same effect without an explicit filter, you can define a LOD calculation such as:
{ COUNTD(if [My Favorite Regions] then [Region] end) }
Or you could use a table calc with the SIZE() function to do the calculation in the Tableau client instead of by the data source.
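To make the "what exactly are you counting" question concrete, here is a small Python sketch. The line items are invented (this is not the real Superstore data), and only the set {"East", "Central"} comes from the example above.

```python
# Hypothetical (order id, region) line items; one row per line item.
LINE_ITEMS = [
    ("O1", "East"), ("O1", "East"),
    ("O2", "Central"),
    ("O3", "West"), ("O3", "West"),
    ("O4", "East"),
]

my_favorite_regions = {"East", "Central"}

# Rows associated with the set (what filtering by the set keeps).
in_rows = [(order, region) for order, region in LINE_ITEMS
           if region in my_favorite_regions]

row_count = len(in_rows)                                   # like SUM([Number of Records])
region_count = len({region for _, region in in_rows})      # like COUNTD([Region])
order_count = len({order for order, _ in in_rows})         # like COUNTD([Order ID])
```

The same IN side of the set yields three different answers (4 rows, 2 regions, 3 orders), so decide which one the pie chart should show before picking the calculation.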
Not sure what your data looks like but you could set a certain condition when creating a set or split the IN/OUT into two different sets.
Here's a link to sets in Tableau.
You can do this with an if statement
IF [set] = TRUE THEN 1 ELSE 0 END
Then I suppose you could sum this calculated field
The most common usage is when you have a lot of categories and want to lump the categories that aren't in a set, for example a "Top N" set, into an 'Others' category.
To do this:
IF [set] = TRUE THEN [dimension] ELSE 'Others' END
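The same two ideas (flagging set members with 1/0 and relabelling non-members as 'Others') look like this in a minimal Python sketch. The category values are invented, and the Top N set here is ranked by how often a value appears, which is an assumption; in Tableau a Top N set is typically ranked by a measure you choose.

```python
from collections import Counter


def top_n_set(values, n):
    """Return the n most frequent values (a stand-in for a Top N set)."""
    return {value for value, _ in Counter(values).most_common(n)}


def label_with_others(values, members, other_label="Others"):
    """Like IF [set] THEN [dimension] ELSE 'Others' END, row by row."""
    return [v if v in members else other_label for v in values]


# Hypothetical dimension values, one per data row.
categories = ["A", "A", "A", "B", "B", "C", "D"]

top2 = top_n_set(categories, 2)
labels = label_with_others(categories, top2)

# Like SUM(IF [set] THEN 1 ELSE 0 END): number of rows inside the set.
in_count = sum(1 if v in top2 else 0 for v in categories)
```

Summing the 1/0 flag gives the IN count on its own, which is the number the question was after.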
I've got some data that I'd like to display both the averages and the count for.
For instance, there are 50 People taking a survey. Their names are saved in a Dimension "Raters". They are taste testing several products. These products are saved in a Dimension "Products"
They answer 4 questions. Taste, Texture, Appearance, Uniqueness, all saved in Dimension "Question"
The actual ratings are saved in "Ratings". This is a measure.
I can very easily make a table with Raters on the Rows, Question on the Columns, AVG(Ratings) in the text.
This shows me the average score for each question the rater answered.
It looks like this:
Rater-----Taste-----Texture-----Appearance-----Uniqueness
Joe---------2.2---------4.3--------------3.7-----------------2.4
Bob--------3.0----------1.2-------------3.4-----------------4.4
Sally-------4.5----------3.3-------------4.5-----------------3.2
Jessica---5.0----------3.0-------------2.0-----------------1.0
So far, so good.
Jessica's results look suspiciously integerish. When I look at the background data, I see that she only answered for 1 product.
I'd like to be able to add a column to the right of uniqueness which is the count of all product responses for that person.
I've played with this quite a bit, and I'm not sure that it is possible. Maybe with LOD?
I'd also like to filter the table, so that only "tough" raters are shown. Criteria for this is: Their average response for at least two criteria should be below 3.0. That would include Joe and Jessica.
When I try to do counts based on averages, I run into the "cannot aggregate an aggregate rule".
Is there a way around this? It would be trivial to do in excel with another column, a countif, and a filter.
Thanks,
Chris
Part 1:
You should be able to create a calculated field (Analysis > Create Calculated Field) and name it something like "Number of Records". In the formula box just set it to 1 and select "OK".
This new field will be selectable in the measures. Drag it into your table in the columns area and it should add a count next to your averages.
Part 2:
In your Measure Values box you should be able to right click your measures. This will bring up a list of options including "Filter". Select this option.
On SUM(Number of Records), set it to "At Least" = 2. Then right click on the AVG(Ratings) measure and set it to "At Most" = 3.
Put Products on the Rows shelf.
Then right click on that Products field on the Rows shelf and change it from a dimension to a measure. Be sure to choose Count (Distinct) for the aggregation.
Finally, right click on the field again and change it from continuous to discrete.
This shows how many different products each person reviewed, no matter how many characteristics they rated. If you want the number of ratings, use Count instead of Count (Distinct). Or just use SUM([Number of Records]), again set to discrete.
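For the "tough rater" criterion in the question (average below 3.0 on at least two questions), the logic is a two-stage aggregation, which is what the "cannot aggregate an aggregate" error is complaining about. Here is a rough Python sketch of the two stages plus the distinct product count. The function name rater_summary is made up; Jessica's row uses the values from the question (one product, so her averages are her raw ratings), while Sally's raw ratings and the product names are invented.

```python
from collections import defaultdict


def rater_summary(rows, threshold=3.0, min_low_questions=2):
    """rows: iterable of (rater, product, question, rating)."""
    ratings = defaultdict(list)   # (rater, question) -> list of ratings
    products = defaultdict(set)   # rater -> distinct products rated
    for rater, product, question, rating in rows:
        ratings[(rater, question)].append(rating)
        products[rater].add(product)

    # Stage 1: average per rater and question.
    # Stage 2: count how many of those averages fall below the threshold.
    low = defaultdict(int)
    for (rater, _question), values in ratings.items():
        if sum(values) / len(values) < threshold:
            low[rater] += 1

    return {
        rater: {"products": len(rated),
                "tough": low[rater] >= min_low_questions}
        for rater, rated in products.items()
    }


rows = [
    # Jessica: one product, ratings as in the question's table.
    ("Jessica", "P1", "Taste", 5), ("Jessica", "P1", "Texture", 3),
    ("Jessica", "P1", "Appearance", 2), ("Jessica", "P1", "Uniqueness", 1),
    # Sally: hypothetical raw ratings for two products.
    ("Sally", "P1", "Taste", 4), ("Sally", "P1", "Texture", 3),
    ("Sally", "P1", "Appearance", 4), ("Sally", "P1", "Uniqueness", 3),
    ("Sally", "P2", "Taste", 5), ("Sally", "P2", "Texture", 4),
    ("Sally", "P2", "Appearance", 5), ("Sally", "P2", "Uniqueness", 4),
]
summary = rater_summary(rows)
```

Jessica comes out as tough with a product count of 1, and Sally as not tough with a count of 2. In Tableau the first stage would usually be pinned with a LOD expression so that the second stage is free to aggregate over it.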