I'm having some texts and for each text I'm having some tokens attached to it in a single column. The data looks like
Text_ID, Token
1, energy
1, debit
1, flat
2, energy
2, house
3, energy
3, debit
I created a metric according to countd(Text_ID) to count the numbers of distinct texts/text_ids. Now putting everything in the obvious way on a worksheet and selecting/filtering some tokens gives me essentially a union.
I'd like to select energy and debit and get the proper count 2 instead of 3 here. Of course I'm having a lot of tokens, hence the approach described here is not scalable.
Any suggestions on how to proceed here?
I'd assume that I need to create two parameters here, but I don't know how to use them to filter properly.
It seems like you want to identify the Text Ids that have at least one data row containing one specified token, “energy” in your example, and also have at least one data row containing another specified token, “debt” in your example. If that is the problem then I’d suggest using a set as follows.
Define two parameters to allow users to specify their tokens of interest, called for instance, token_1 and token_2
Define a set based on the Text Id field called Selected_Texts using the condition MAX([Token] = token_1) and MAX([Token] = token_2)
You can use the set in many different ways. If you want to count the number of distinct text ids that are in the set, create a calculated field called Selected_Text_Id as if [Selected_Texts] then [Text_ID] end Which holds the text id for matching texts and is null for others. Then you can plot COUNTD([Selected_Text_Id]) to answer your original question.
Create 2 paramter fields for selecting the required values and create a formula and add below code:
{ FIXED [Text ID]:
SUM(IF [Token] = [Parameter 1]
or [Token] = [Parameter 2]
THEN 1
ELSE 0
END)}
Related
I used an Apriori algorithm to view the frequent relationships in the dataset and I want to do a dashboard to better visualize this data but I don't know how to do this filter.
This is the bar chart that I created to show the support (amount of times something happend) and the confidence (probability of B happening given A) of these associations:
Apriori Chart
Next to it on the dashboard, I'll have a table with the full dataset used in this Apriori analysis where I have more information such as ID, Income, Hours Worked, etc:
Table from different data source
How can I create this relationship? The two data sources don't have a column in common that I can use for that.
I would need some way to:
Split the values in the antecedents columns by comma and filter only those columns with value equal to 1 in the other dataset
**Dataset A**
'Age Range <=30, Joblevel 1, Maritalstatus Single'
->
'Age Range <=30'
'Joblevel 1'
'Maritalstatus Single'
**Dataset B**
'Age Range <=30' == 1
'Joblevel 1' == 1
'Maritalstatus Single' == 1
Clicking this would filter the table next to it
Is there any way I can do this in Tableau?
You can download the tbwx i used in this example here https://community.tableau.com/servlet/JiveServlet/download/1083124-384949/Apriori.twbx
Thanks in advance for the help!
I am not able to check your twbx on the machine I'm using but I think you should be able to do this. The fields in the 2 data sources need to match so manipulate the data sources the make this happen.
For data source 1 there's a function SPLIT which will mean you are able to split the comma separated string to 3 fields.
Putting those 3 fields to the Detail shelf of your bar chart (or even Rows and hiding the header) will mean you can use them in an action filter.
Your second data source is a cross tab - post pivot. You should be able to pivot this data source. Highlight the measures and pivot them. This will give you the field Pivot Field Names and Pivot Field Values.
You only want to keep those with a value of 1 so create a calculated field
[Lookup1]: IF [Pivot Field Values] = 1 THEN [Pivot Field Names] END
Duplicate this field twice so you have Lookup1, Lookup2 and Lookup 3.
Then you should be able to action filter the table.
In the action filter set it up so SplitField1 = Lookup1, SplitField2 = Lookup2, etc.
Fingers crossed this works, I haven't been able to test so I am pulling it out of my head.
I'm fairly new to tableau and I'm having the following issue. Below is a sample of the data I'm using.
Customer No | Item
___________________
1 A
1 B
2 A
3 A
4 A
4 B
5 B
6 A
I'm trying to get a count of how many customers bought Item A and B. So far I tried doing a separate group by combining A and B but I get the total result of 8. I also tried doing a calculation and I'm getting the same result of 8. Can someone please point me to the right direction on how to get this result. Thanks!
This is the result I'm trying to get:
Item| Count
A 5
B 3
A and B 2
I recreated your exact dataset and pasted it into Tableau so you could see a couple of examples.
Here's how you can see the number of customers who purchased an individual item, plus the number of customers who purchased both items.
Your calculation will be:
IF { FIXED [Customer No]: COUNTD([Item]) } = 1 THEN
[Item]
ELSE
'Both A and B'
END
And you'll need to set your view up to look like this:
Below are ways you can see when both items were purchased.
Boolean OR
The calculation you'll want to use is:
ATTR([ITEM]) = 'A' OR ATTR([ITEM]) = 'B'
And you'll want to set up your view to look like this:
A, B or Both
If you would like a bit more specificity in your result, you might try:
IF ATTR([Item]) = 'A' THEN
'A'
ELSEIF ATTR([Item]) = 'B' THEN
'B'
ELSE
'BOTH'
END
Replacing the previous calculation with the new looks like this:
More than 1 item
If the specific items purchased don't matter, you could use this logic.
COUNTD([Item]) > 1
Replacing the previous calculation with this one would look like:
More than 1 Item using a window function (probably overkill)
The calculation you'll need to use is:
WINDOW_COUNT(COUNTD([Item]))
Because this is a Window function, we'll need to specify how it's calculated across our dimensions. To do this click the down arrow on the right-hand side of the pill and select Edit Table Calculation...
You'll then need to set these settings:
I'll add the calculation we created in the first example ([A and B]) to the filter shelf and select True. That should give you something that looks like:
More than 1 item using a Level of Detail expression
The calculation for this example is:
{ EXCLUDE [Item]: COUNTD([Item]) }
You'll view should look like:
As you can see Tableau is quite flexible. Hope these examples were helpful!
You might want to use Tableau’s set feature to approach problems like this.
For example, right click on the field [Customer No] in the data pane (i.e. left sidebar) and choose the “Create Set” command. Click “Use All” at the top of the set panel and then click the Condition tab. Define the set using the condition MAX([Item] = “A”). Name the set “Customers who bought A”.
Similarly, create a set of customers who bought item B. You can then select both sets in the data pane, and create a combined set to be the intersection, that is, customers who bought both an item A and an item B.
You can think of a set as either a mathematical set of the members of a field that belong to the set (i.e. a set of customer ids) or as Boolean function defined for each data record in the data source indicating whether that data record is associated with the set (i.e. a Boolean function that operates on transactions to say whether the associated customer ID is in the set. A key to keep in mind for the condition formulas used here is that the condition is an aggregate formula, operating on a block of data records for a customer ID to determine whether the customer ID is in the set.
Once you have defined your sets of interest, you can use them in many ways - in calculated fields, as filters, as dimensions on shelves in a visualization, in set actions, to combine with other sets ...
To define a measure that counts the customers in a set, create a calculated field such as “[Num A Customers]” as COUNTD(if [Customers who bought A] then [Customer ID] end) Do the same for whatever other sets you are interested in. Then you can use those measures (probably with Measure Names and Measure Values) to make your viz.
How can I extract the IN count portion of a Tableau set? I can see the IN/OUT counts when I drop the set into Text but can't figure out how to get at the IN value by itself.
Ultimately, I want to create a Pie Chart of three sets with just the IN counts as the measures.
I am using Tableau Public if that is a factor.
You have to be a little careful about specifying what you wish to count.
One way to think of a set is as a Boolean function that gives a value to each data record denoting whether that record is associated with the set.
Another way to think of a set is as a mathematical set whose members are a subset of the values for some discrete field. (Or Tuple of fields)
The difference between the two views is really just a mindset, whether you consider the set as a Boolean function whose domain is a data row in the data source, or whose domain is the field on which the set definition is based.
Say you are looking at Tableau’s Superstore data set where each data record is a line item for a product attached to an order.
If your set is based on the field Region, say its called [My Favorite Regions] and currently contains {“East”, “Central”} do you want your count to be 2 (i.e. the number of regions in the set) ? Or do you want your count to be in the tens of thousands (i.e the number of line items on orders from the regions in the set)? Or something in between, maybe the number of distinct orders (i.e. order ids) within the selected regions...
If you want to count data rows that are associated with the set, you can simply filter by the set and calculate SUM([Number of Records[). If you want to count the regions in the set even though the level of detail of the data is at the order line item level,then you’ll have to use either a COUNTD to count the distinct regions, or some approach to specify what it is you want Tableau to count.
For example, put your set on the filter shelf, and show COUNTD(Region) which could be slow for very large data sets. To get the same effect without an explicit filter, you can define a LOD calculation such as:
{ COUNTD(if [My Favorite Regions] then [Region] end) }
Or you could use a table calc with the SIZE() function to do the calculation in the Tableau client instead of by the data source.
Not sure what your data looks like but you could set a certain condition when creating a set or split the IN/OUT into two different sets.
Here's a link to sets in Tableau.
You can do this with an if statement
IF [set] = TRUE THEN 1 ELSE 0 END
Then I suppose you could sum this calculated field
The most common usage is when you have a lot of categories and want to create an 'Other' category based on the categories that aren't in a set, if the set is a "Top N Set"
To do this:
IF [set] = TRUE THEN [dimension] ELSE 'Others' END
I've been tasked to set up a Tableau worksheet of counts of data (ultimately to create percentages) where the contrived incoming data looks like the following.
id fruit
1 apple
1 orange
1 lemon
2 apple
2 orange
3 apple
3 orange
4 lemon
4 orange
The worksheet needs to look something like the following:
Count of ids
2 Lemons
2 No lemons
I've only been using Tableau for about 4 hours, so is this doable? Can anyone point me in the right direction?
The data is coming in from a SQL Server database in a format that I can control if that helps contribute towards a solution.
Alex's solution based on sets are very good for this scenario, but I would like to show that LODs can be more flexible if you need to extend your solution to include more categories.
for the current scenario, create a calculation with below formula and create text table using COUNTD(Id)
{FIXED [Id]:IF MAX([Fruit]='lemon') THEN 'Lemon' ELSE 'No Lemon' END}
Now for the extension part, you are considering below list where you want to count IDs with Lemon, Apple and others. Since no double counting of Ids are allowed, categorization will follow the order. (This kind of precedence will be a headache without LODs)
Now you can change your calculation as below:
{FIXED [Id]:IF MAX([Fruit]='lemon') THEN 'Lemon'
ELSEIF MAX([Fruit]='apple') THEN 'Apple'
ELSE 'No Lemon or Apple' END}
Now your visualization automatically changes to include the new category. This can be extended for any number of fruits.
This is a good use for a set.
In the data pane on the left sidebar, right click on the Id field and create a set named "Ids that contain at least one lemon" (or use a shorter less precise name)
In the set definition dialog panel, define the set by choosing "Use all" from the General tab, and then on the Condition tab, define the condition by the formula max([Fruit]="lemon")
There are many ways to think of a set, but the most abstract is just as a mathematical set of Ids that satisfy the condition. Remember each Id has many data rows, so the condition is a function of many data rows and uses the aggregation function MAX(). For booleans, True is treated as greater than False, so MAX() will return True if at least one of the data rows satifies the condition. By contrast, MIN() is True only if ALL (non-null) data rows satisfy the condition.
Once you have a set that separates your ids into Lemon scented Ids and others, then you can use that set in many ways - in calculated fields, in filters, in combination with other sets to make new sets, and of course on shelves to make visualizations.
To get a result like your question seeks, you could put your new set on the Row shelf, and put CNTD(ID) on the text shelf or columns shelf. Make sure you understand why you need count distinct (CNTD) instead of SUM([Number of Records]) here.
BTW, the LOD calculation { fixed [Id] : max([Fruit]="lemon") } is effectively the same solution.
I have a table with different columns called "Bajo conocimiento...", "Exceso de...", etc. Each column has discrete values: 1, 2, 3 or empty (NULL). I need to count the number of occurrences of 1, 2 and 3 in each of these variables.
What I did first was to create bins for each variable. Thus I was able to create a graphic shown below.
However, the problem is that always one of variables is automatically picked as a filter. In the example below this is "Bajo conocimiento..." (the blue one). So, the values for "Bajo conocimiento..." are correct, however, for instance, the COUNT("Exceso de...") for the values 1, 2 and 3 are the intersections of "Exceso de..." with "Bajo conocimiento...". As a result, the values of "Exceso de..." are lower that they should be.
How can I count the occurrences of 1, 2 and 3 for each variable independently and get the resulting chart?.
You will have to create a table calculation for every one of your columns.
Create a calculated field with the following formula and call it eg [ExcesoCOUNT]:
{fixed [columnname]: COUNT([columnname])}
if you drag the newly created field on your dashboard and turn it into a AVG() rather than a SUM(), you will get the count of each value.