I'm fairly new to tableau and I'm having the following issue. Below is a sample of the data I'm using.
Customer No | Item
___________________
1 A
1 B
2 A
3 A
4 A
4 B
5 B
6 A
I'm trying to get a count of how many customers bought Item A and B. So far I tried doing a separate group by combining A and B but I get the total result of 8. I also tried doing a calculation and I'm getting the same result of 8. Can someone please point me to the right direction on how to get this result. Thanks!
This is the result I'm trying to get:
Item| Count
A 5
B 3
A and B 2
I recreated your exact dataset and pasted it into Tableau so you could see a couple of examples.
Here's how you can see the number of customers who purchased an individual item, plus the number of customers who purchased both items.
Your calculation will be:
IF { FIXED [Customer No]: COUNTD([Item]) } = 1 THEN
[Item]
ELSE
'Both A and B'
END
And you'll need to set your view up to look like this:
Below are ways you can see when both items were purchased.
Boolean OR
The calculation you'll want to use is:
ATTR([ITEM]) = 'A' OR ATTR([ITEM]) = 'B'
And you'll want to set up your view to look like this:
A, B or Both
If you would like a bit more specificity in your result, you might try:
IF ATTR([Item]) = 'A' THEN
'A'
ELSEIF ATTR([Item]) = 'B' THEN
'B'
ELSE
'BOTH'
END
Replacing the previous calculation with the new looks like this:
More than 1 item
If the specific items purchased don't matter, you could use this logic.
COUNTD([Item]) > 1
Replacing the previous calculation with this one would look like:
More than 1 Item using a window function (probably overkill)
The calculation you'll need to use is:
WINDOW_COUNT(COUNTD([Item]))
Because this is a Window function, we'll need to specify how it's calculated across our dimensions. To do this click the down arrow on the right-hand side of the pill and select Edit Table Calculation...
You'll then need to set these settings:
I'll add the calculation we created in the first example ([A and B]) to the filter shelf and select True. That should give you something that looks like:
More than 1 item using a Level of Detail expression
The calculation for this example is:
{ EXCLUDE [Item]: COUNTD([Item]) }
You'll view should look like:
As you can see Tableau is quite flexible. Hope these examples were helpful!
You might want to use Tableau’s set feature to approach problems like this.
For example, right click on the field [Customer No] in the data pane (i.e. left sidebar) and choose the “Create Set” command. Click “Use All” at the top of the set panel and then click the Condition tab. Define the set using the condition MAX([Item] = “A”). Name the set “Customers who bought A”.
Similarly, create a set of customers who bought item B. You can then select both sets in the data pane, and create a combined set to be the intersection, that is, customers who bought both an item A and an item B.
You can think of a set as either a mathematical set of the members of a field that belong to the set (i.e. a set of customer ids) or as Boolean function defined for each data record in the data source indicating whether that data record is associated with the set (i.e. a Boolean function that operates on transactions to say whether the associated customer ID is in the set. A key to keep in mind for the condition formulas used here is that the condition is an aggregate formula, operating on a block of data records for a customer ID to determine whether the customer ID is in the set.
Once you have defined your sets of interest, you can use them in many ways - in calculated fields, as filters, as dimensions on shelves in a visualization, in set actions, to combine with other sets ...
To define a measure that counts the customers in a set, create a calculated field such as “[Num A Customers]” as COUNTD(if [Customers who bought A] then [Customer ID] end) Do the same for whatever other sets you are interested in. Then you can use those measures (probably with Measure Names and Measure Values) to make your viz.
Related
How can I extract the IN count portion of a Tableau set? I can see the IN/OUT counts when I drop the set into Text but can't figure out how to get at the IN value by itself.
Ultimately, I want to create a Pie Chart of three sets with just the IN counts as the measures.
I am using Tableau Public if that is a factor.
You have to be a little careful about specifying what you wish to count.
One way to think of a set is as a Boolean function that gives a value to each data record denoting whether that record is associated with the set.
Another way to think of a set is as a mathematical set whose members are a subset of the values for some discrete field. (Or Tuple of fields)
The difference between the two views is really just a mindset, whether you consider the set as a Boolean function whose domain is a data row in the data source, or whose domain is the field on which the set definition is based.
Say you are looking at Tableau’s Superstore data set where each data record is a line item for a product attached to an order.
If your set is based on the field Region, say its called [My Favorite Regions] and currently contains {“East”, “Central”} do you want your count to be 2 (i.e. the number of regions in the set) ? Or do you want your count to be in the tens of thousands (i.e the number of line items on orders from the regions in the set)? Or something in between, maybe the number of distinct orders (i.e. order ids) within the selected regions...
If you want to count data rows that are associated with the set, you can simply filter by the set and calculate SUM([Number of Records[). If you want to count the regions in the set even though the level of detail of the data is at the order line item level,then you’ll have to use either a COUNTD to count the distinct regions, or some approach to specify what it is you want Tableau to count.
For example, put your set on the filter shelf, and show COUNTD(Region) which could be slow for very large data sets. To get the same effect without an explicit filter, you can define a LOD calculation such as:
{ COUNTD(if [My Favorite Regions] then [Region] end) }
Or you could use a table calc with the SIZE() function to do the calculation in the Tableau client instead of by the data source.
Not sure what your data looks like but you could set a certain condition when creating a set or split the IN/OUT into two different sets.
Here's a link to sets in Tableau.
You can do this with an if statement
IF [set] = TRUE THEN 1 ELSE 0 END
Then I suppose you could sum this calculated field
The most common usage is when you have a lot of categories and want to create an 'Other' category based on the categories that aren't in a set, if the set is a "Top N Set"
To do this:
IF [set] = TRUE THEN [dimension] ELSE 'Others' END
I've been tasked to set up a Tableau worksheet of counts of data (ultimately to create percentages) where the contrived incoming data looks like the following.
id fruit
1 apple
1 orange
1 lemon
2 apple
2 orange
3 apple
3 orange
4 lemon
4 orange
The worksheet needs to look something like the following:
Count of ids
2 Lemons
2 No lemons
I've only been using Tableau for about 4 hours, so is this doable? Can anyone point me in the right direction?
The data is coming in from a SQL Server database in a format that I can control if that helps contribute towards a solution.
Alex's solution based on sets are very good for this scenario, but I would like to show that LODs can be more flexible if you need to extend your solution to include more categories.
for the current scenario, create a calculation with below formula and create text table using COUNTD(Id)
{FIXED [Id]:IF MAX([Fruit]='lemon') THEN 'Lemon' ELSE 'No Lemon' END}
Now for the extension part, you are considering below list where you want to count IDs with Lemon, Apple and others. Since no double counting of Ids are allowed, categorization will follow the order. (This kind of precedence will be a headache without LODs)
Now you can change your calculation as below:
{FIXED [Id]:IF MAX([Fruit]='lemon') THEN 'Lemon'
ELSEIF MAX([Fruit]='apple') THEN 'Apple'
ELSE 'No Lemon or Apple' END}
Now your visualization automatically changes to include the new category. This can be extended for any number of fruits.
This is a good use for a set.
In the data pane on the left sidebar, right click on the Id field and create a set named "Ids that contain at least one lemon" (or use a shorter less precise name)
In the set definition dialog panel, define the set by choosing "Use all" from the General tab, and then on the Condition tab, define the condition by the formula max([Fruit]="lemon")
There are many ways to think of a set, but the most abstract is just as a mathematical set of Ids that satisfy the condition. Remember each Id has many data rows, so the condition is a function of many data rows and uses the aggregation function MAX(). For booleans, True is treated as greater than False, so MAX() will return True if at least one of the data rows satifies the condition. By contrast, MIN() is True only if ALL (non-null) data rows satisfy the condition.
Once you have a set that separates your ids into Lemon scented Ids and others, then you can use that set in many ways - in calculated fields, in filters, in combination with other sets to make new sets, and of course on shelves to make visualizations.
To get a result like your question seeks, you could put your new set on the Row shelf, and put CNTD(ID) on the text shelf or columns shelf. Make sure you understand why you need count distinct (CNTD) instead of SUM([Number of Records]) here.
BTW, the LOD calculation { fixed [Id] : max([Fruit]="lemon") } is effectively the same solution.
I have a table that displays 8 columns. Due to the way the data source is structured (I cannot change this) it sometimes means one or more of the columns will be the same across two rows.
I need to filter to show only one row of data. This is based on "email', so if an email has more than 1 row, I want one of the rows. If the other columns are the same or not doesn't matter.
I don't need to combine anything, and I don't care which row is displayed, I just need to remove one of the duplicated rows.
Replace this:
Place orderid email item name date
a 1 a#a.com b c 1/1/11
a 1 a#a.com d c 1/1/11
With this:
Place orderid email item name date
a 1 a#a.com b c 1/1/11
Or this:
Place orderid email item name date
a 1 a#a.com d c 1/1/11
Any help would be much appreciated! I had a go with LOD calculations and I couldn't make that do what I wanted (that may well be me not understanding how to use them properly, though).
Given your example, it appears the difference is within the [item] field. You can create an index calculation on it and filter for value of 1.
index()
Set the index to discrete and compute using the [item] and filter for only 1. You can hide the field if you don't want it to appear by deselecting Show Header.
If you want to show the first row per email, regardless of what's in the other fields, try the following:
Create an [Index] calc field using INDEX().
Add [Index] to rows shelf and change to discrete.
Edit the table calc for [Index], select 'Specific Dimensions' and restarting every email (dropdown menu). All of the checkboxes should be ticked by default.
Notice how the index is not restarting for each email as expected. To correct this, drag email checkbox to the top position.
From here, filter [Index] = 1 and hide index column from view.
I have 3 columns in my data "Date", CustomerID and Action Type (There are 10 different types of Action type like item hover, product click etc)
I want to eliminate those customers who has interactions of less that 20.
For example, for a customer ID say 10, if the count of all the action type against this customer is less than 20 I want to eliminate that customer. I want to do this for each ween and create a line graph may be.
Somebody please help, Although I am trying to do this in tableau but excel and access solutions are welcomed too. I have tried everything I could but still couldn't do it. My calculated field only works if I use customer Id along wth count which gives me a table that I dont want.
Assuming Action Type cannot be null, put CustomerID on the filter shelf, select the "Use All" radio button at the top of the filter panel, switch to the Condition tab, Choose "By Field", and require the SUM of the Number of Records to be >= 20
If Action Type can be null, do the same but use COUNT of Action Type instead of SUM of Number Of Records.
Filters have multiple tabs. You can see a summary of how the filter is defined at the bottom of the General tab.
I have a really basic question that I can't figure out. I need to create a table that has multiple calculated fields, but I need only one of the calculated fields to be filtered for a specific dimension value. For example, I have the following data set (dummy data) and I want to create a table that will include total clicks for both companies, but [cost per click] from one company only, company B.
DATA SET
Company| Clicks| $ Cost
------------------------
Comp A | 100 | $20
Comp B | 200 | $40
WHAT I'M LOOKING FOR
CLICKS | COST/CLICK
TOTAL 300 | $0.13
$0.13 comes from 40/300; $40 from company B and 300 clicks from both company A and B.
How do you create a table that has multiple calculations but with one of those calculations filtered on one dimension value only?
One simple calculated field:
sum(if [Company] = 'B' then [Cost] end )/sum([Click])
This should get you in the right direction.
Based on your question and your comment, you want to divide the cost by the TOTAL number of clicks in your dataset.
Create a calculated field called "TotalClicks" and enter this formula
window_sum(sum([Clicks])) // This formula will sum the clicks field for all rows
Create a calculated field called Cost / Clicks and enter this formula
sum([Cost]) / [TotalClicks]
Add the Cost / Clicks field to the sheet and it should look like this
NOTE: If you need to partition / group your report, you may have to play around with this some. I don't use window functions within tableau very often since I usually handle the aggregation at the datasource level instead.
NOTE: Since you mentioned filtering, I will add this statement -- If you filter out any of your data, that data will not (cannot) be included in any calculated fields (To the best of my knowledge and experience, anyway). If you need to include that data (total clicks), I think the only option is to add that aggregated total to your dataset - otherwise, tableau can't calculate it if you are filtering it out.
Edit2: If you cannot change the underlying dataset, you could accomplish this by creating another datasource and joining it to your inital data source --
Data > Add Datasource, add the datasource again and change the name so you can identify it
Click Data > Edit Relationships. Click Custom and REMOVE any linked fields -- this will essentially produce a Cartesian join (every record in your first DataSource will have every reocrd from your second Datasource)
Select the second datasource and create a calculated field (ClicksTotal_DS2) using the same window_sum function
Select the first datasource and create a calculated field (named Cost / Clicks_DS2) using this formula
sum([Cost])/ [Sheet1 (test) (2)].[ClicksTotal_DS2]
Now you can apply a filter to your first data source, and your second data source will still calculate the total.