Tallying unknown words across columns in Tableau (or from a comma-separated column) - tableau-api

I have an issue that I have been trying to solve for the better part of a week now. I have a large database (in Google Sheets) representing case studies. I have some columns with multiple categories listed (in this example 'species', 'genera', and 'morphologies'), and I want to be able to tally how many times each category occurs in the data set.
I use Tableau to visualise the data, and the final output will be a large public Tableau dashboard. I know I can do a "find" based on a specific string, but I'd like the dataset to be dynamic and able to handle new data being added without my having to update calculated fields. Is there a way of finding unique terms (either from a single column of comma-separated values, or from multiple columns), and tallying them?
Things I have tried so far:
1 - A pivot table in Tableau. Works well, but messes with all the other data, since it repeats lines.
2 - A pivot table on its own data source in Tableau. Also works well, and avoids the problem of messing with the other data. However, now each figure is disconnected from the others, so I can't build a large dashboard where everything is filtered by everything else (i.e. filtering species and genera by country at the same time).
3 - An SQL query() in Google Sheets, which finds all unique terms and queries them, so that they can then be plotted in Tableau. Also works well, but has a similar problem: the data is disconnected from all the other terms in the dataset.
Any ideas for a field calculation that will find, list and tally unique terms in a single comma-separated column (or across multiple columns), without changing the data structure?
I have placed a sample data set here (Google Sheets), which is a smaller version of what I'm actually working on. In it I have marked the comma-separated columns in grey, and each is followed by a set of columns with the same values split out one per column. I only need to analyse one of the two forms (i.e. either a calculation that separates the comma-separated values, or one that works across the multiple columns).
I've also added a sample Tableau workbook here.

Related

Tableau make one line out of two if same city name

Does anyone know if I can add two rows together so that I end up with just one row in Tableau (see screenshot)? So, if both rows are city Aachen and one row has a value for cost but not for purchasing power and the other row has a value for purchasing power but not cost, I would want just one row with both values. I am not interested in the columns "Table Name" and "Document Index(...". Thankful for any help!
Manipulating data like that in Tableau is usually a no-go. Nevertheless, you can try Tableau Prep and you should be able to do what you need there. Or maybe use a different tool (even Excel).
With that said, even though you have the info in two rows, the default approach for Tableau is always to aggregate data, so even if you have many rows with similar cases, once you take it to a viz using City (for example) as a dimension, this issue shouldn't really matter.

Is it possible to limit the number of rows in the output of a Dataprep flow?

I'm using Dataprep on GCP to wrangle a large file with a billion rows. I would like to limit the number of rows in the output of the flow, as I am prototyping a Machine Learning model.
Let's say I would like to keep one million rows out of the original billion. Is it possible to do this with Dataprep? I have reviewed the documentation on sampling, but that only applies to the input of the Transformer tool and not to the outcome of the process.
You can do this, but it does take a bit of extra work in your Recipe. Set up a formula in a new column using something like RANDBETWEEN to give you a random integer between 1 and 1,000 (in this million-out-of-a-billion case). From there, pick any one integer between 1 and 1,000 and filter to keep only the rows with that value; your output will then contain only the randomized subset. Just have the last step of the recipe remove this temporary column.
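To make the idea concrete, here is a minimal sketch in plain Python rather than Dataprep recipe syntax. Each row draws a random integer from 1 to n (the role RANDBETWEEN plays above), and only rows that drew one chosen value are kept. The function name sample_one_in_n and its parameters are made up for illustration, and the result is only approximately 1/n of the input, not exactly.

```python
import random


def sample_one_in_n(rows, n=1000, keep=1, seed=None):
    """Keep roughly 1/n of the rows.

    Each row is tagged with a random integer in [1, n]; only rows whose
    tag equals `keep` survive. The tag is never stored on the row, which
    mirrors dropping the temporary column at the end of the recipe.
    """
    rng = random.Random(seed)
    kept = []
    for row in rows:
        tag = rng.randint(1, n)  # inclusive on both ends
        if tag == keep:
            kept.append(row)
    return kept


# Small demo: 100,000 rows, keep about 1 in 100.
subset = sample_one_in_n(range(100_000), n=100, seed=42)
print(len(subset))  # close to 1,000, but not exactly 1,000
```

Because the selection is random, the output size varies from run to run unless the seed is fixed.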
So indeed there are 2 approaches to this.
As Courtney Grimes said, you can use one of the 2 functions that generate a random number within a range:
randbetween
rand
These methods can be used to slice an "even" portion of your data. As suggested, use randbetween(1,1000), then pick a single value x with 1 <= x <= 1000 and keep only the rows where the result equals x, because that is 1/1000 of the data (a million out of a billion).
Alternatively, if you just want a million records in your output, but either
don't want to rely on knowing the size of the entire table, or
just want the first million rows, regardless of how many rows there are,
you can use either of two row-filtering methods: top rows or range.
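As a rough illustration of what "top rows" and "range" filtering mean, here is a plain-Python sketch (again, not Dataprep syntax). The helper names top_rows and row_range are hypothetical, and the 1-based, inclusive row numbering in row_range is an assumption made for the example.

```python
from itertools import islice


def top_rows(rows, n):
    """Keep the first n rows without needing to know the total row count."""
    return list(islice(rows, n))


def row_range(rows, start, stop):
    """Keep rows numbered start..stop (1-based, inclusive)."""
    return list(islice(rows, start - 1, stop))


rows = [f"row {i}" for i in range(1, 11)]
print(top_rows(rows, 3))      # ['row 1', 'row 2', 'row 3']
print(row_range(rows, 4, 6))  # ['row 4', 'row 5', 'row 6']
```

Neither helper needs the size of the full table, which is the point of this second approach.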
P.S
By understanding the $sourcerownumber metadata parameter (see the in-product documentation), you can filter/keep a portion of the data (as per the first scenario) in 1 step (i.e. without creating an additional column).
BTW, an easy way to discover how-tos in Trifacta is to type what you're looking for in the "search-transformation" pane (accessed via ctrl-k). By searching "filter", you'll get most of the relevant options for your problem.
Cheers!

How can I create filters on a series of tables where the final table yields a single data observation?

I am creating an interactive 'calculator' using tableau. I have a series of dataframes that I have crossed with one another, such that the resulting dataframe is every possible combination between the tables, and every row is unique.
Each column is its own worksheet as a table. Each table in the dashboard is a pane. So, here we have a series of tables with selectable units of measurement, and the final pane on the dashboard should filter to the cell for its respective column, on the unique row of the dataset that the user has selected and 'filtered out'.
I'm having some issues getting this to work and not sure why.
The closest thing I can think of to solving this would be 'Cascading Filters.' Here are a couple of resources:
General Use
In dashboard action-filter form
The critical piece, however, is that the filters must be selected in a specific order, which is what makes them 'cascading.' This may differ from your presumed concept of clicking/filtering in any order on the worksheets to then arrive at a final answer. I do think that this may be a limitation of Tableau - I don't think that a 'many to many' type of relationship can be set up within Action Filters.

tableau show categories from calculation even when a category is not visible

I have a calculation and it outputs multiple values. Then I am creating a table on those values. For example, in below data my formula is
if data is 1 then calculation is `one`
if data is 2 then calculation is `two`
if data is 3 then calculation is `three`
As `three` doesn't actually appear in the output, it is not displayed when I create a table. Is there any way to display it?
I tried Table Layout >> Show Empty Rows and Show Empty Columns, and it didn't work.
data calculation
1 one
2 two
Tableau discovers the possible values for a dimension field dynamically from the query results.
If ‘three’ does not appear in your data, then how do you expect Tableau to know to make a column header for that non existent, but potential, value? It can’t read your mind.
This situation does occur often though - perhaps you want row or column headers to remain stable, even when you change filters in a way that causes some to no longer appear in the query results.
There are a few ways you can force Tableau to pad or complete a domain:
One solution is to pad your data to make sure each value for your dimension field appears in at least one data row.
You can often do this easily by using a union to append some extra rows to your original data. You can often add padding rows that don’t impact any results by leaving all your Measure columns null, since nulls are ignored by aggregation functions.
Another common solution that is a bit more effort is to make what is known as a scaffolding data source, which is not much more than a list of your dimension members. You can then use that data source as the primary data source with data blending, making your original data source secondary.
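The first option above (padding via a union) can be sketched in plain Python to show why null measures leave the results untouched. This is only an illustration of the idea, not Tableau functionality: the helper names pad_domain and sum_by and the measure name "amount" are all invented for the example.

```python
def pad_domain(rows, dimension, domain, measures):
    """Append one padding row (all measures null) per missing domain value."""
    present = {row[dimension] for row in rows}
    padding = [
        {dimension: value, **{m: None for m in measures}}
        for value in domain
        if value not in present
    ]
    return rows + padding  # analogous to a union of data and padding rows


def sum_by(rows, dimension, measure):
    """Sum a measure per dimension value, ignoring nulls like an aggregate would."""
    totals = {}
    for row in rows:
        key, value = row[dimension], row[measure]
        totals.setdefault(key, None)  # the header exists even with no data
        if value is not None:
            totals[key] = (totals[key] or 0) + value
    return totals


data = [
    {"calculation": "one", "amount": 5},
    {"calculation": "two", "amount": 7},
]
padded = pad_domain(data, "calculation", ["one", "two", "three"], ["amount"])
print(sum_by(padded, "calculation", "amount"))
# {'one': 5, 'two': 7, 'three': None}
```

The padded value 'three' now appears as a category, while the totals for 'one' and 'two' are the same as before padding.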
There are two situations where Tableau can detect the absence of data and leave space for it in the visualization automatically:
for numeric types, you can create a bin field that will automatically pad for missing bins
similarly, date fields can show missing values because, like bins, Tableau can tell when a month doesn’t appear in the data and leave room for it in the view

How to display 40 + columns in Tableau?

I am trying to build a list report with about 40 columns (dimensions + measures) but am not able to get it right;
the requirement runs into Tableau's limit of only 16 columns.
How can I get this done?
I read this
Here is my Tableau workbook with 16+ columns but no column header
Go to Analysis --> Table Layout --> Advanced and change the numbers for Rows and Columns as needed.
You can't set these higher than 16 there, but do set them to 16 so the values are easy to identify in the file later.
Then save the Tableau file with the extension .TWB and open it in Notepad.
Then search for the text: attr='row-levels'.
You will find something like:
<format attr='row-levels' value='16' />
<format attr='row-horiz-levels' value='16' />
Change the value 16 to the desired number of columns. Save the file in Notepad, then open it in Tableau.
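If you would rather script the edit than do it by hand, something along these lines might work. It is a minimal sketch that assumes the two format entries look exactly like the ones quoted above (single quotes, attr before value); the function name raise_header_limit is made up, and it is worth keeping a backup copy of the .twb before overwriting anything.

```python
import re

# Matches only the two entries quoted above, in that exact attribute order.
LEVELS = re.compile(r"(attr='(?:row-levels|row-horiz-levels)'\s+value=')\d+(')")


def raise_header_limit(twb_text, new_value):
    """Return the workbook XML with both level settings set to new_value."""
    return LEVELS.sub(rf"\g<1>{int(new_value)}\g<2>", twb_text)


sample = (
    "<format attr='row-levels' value='16' />\n"
    "<format attr='row-horiz-levels' value='16' />\n"
)
print(raise_header_limit(sample, 40))

# Hypothetical usage on a real file (path is a placeholder):
# with open("workbook.twb", encoding="utf-8") as f:
#     text = f.read()
# with open("workbook_40.twb", "w", encoding="utf-8") as f:
#     f.write(raise_header_limit(text, 40))
```

Writing to a new file name instead of overwriting the original leaves you a way back if Tableau refuses to open the edited workbook.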
The Measure Names and Measure Values special fields can help here and cover most use cases. (Using the Measure Names and Measure Values fields is likely a better choice than creating 40+ marks cards as you did in your posted example.)
Put Measure Names on the Columns and Filters shelves and Measure Values on the Text shelf. Then add the measure fields you want to the Measure Values shelf. Then put the dimensions that you wish on the Rows shelf.
A single field+aggregation can only be on the Measure Values shelf once, but a field can repeat with different aggregations -- so you can show the min, avg and max of a measure in 3 different columns.
As you mentioned, you can increase the maximum number of column and row headers up to 16 each via the Analysis->Table Layout->Advanced menu and panel. Beyond that point, adjacent columns will still display, but their headers will be coalesced for display.
Still, you can have an apparently arbitrary number of fields on the Measure Values shelf, so you can display as many columns of measures (data) as you wish, even though adjacent header columns for dimensions (~categories) get coalesced for display once you hit the header limit.
Tableau is optimized for summarizing data for efficient interpretation by humans, so displaying extremely wide tables of data is not the best fit for the tool (or a human reader frankly). Importing and exporting large tables is certainly possible.
At the 2015 conference I went to a session called "Use Tableau Like a Sith" and they showed us how to change the XML to work around the 16 limit. The caveat is that this is not supported.
Find the entries in the attached image and change their value to 40. In the screenshot, the Sith presenters were changing them to 36.
Here is a workaround for some data sets:
convert your fields from Dimension to Measure, and then
display using Measure Names / Measure Values, as #Alex Blakemore suggested.
For example, Boolean fields can be converted to numeric using INT().
PROS:
It is easier to change which fields to plot using Measure Names / Measure Values.
Faster performance, at least for some data sets.
CONS:
Often data sets have some fields that cannot or should not be converted to measure.
Not as easy or straightforward as changing Analysis > Table Layout > Advanced settings, or the xml-editing workaround suggested by #Cyndi1976.
There are two ways:
Open the saved .twb file in Notepad and edit the XML below:
<format attr='row-levels' value='16' />
<format attr='row-horiz-levels' value='16' />
Create 3 different worksheets, each containing multiple columns but no more than 16, and place them in a single dashboard. That way you get one view with 40 columns.
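For the multiple-worksheet option, the split itself is simple arithmetic. Here is a small Python sketch that works out which columns go on which sheet, assuming each worksheet should hold at most 16 columns; the column names and the helper split_columns are placeholders.

```python
def split_columns(columns, max_per_sheet=16):
    """Divide a list of column names into consecutive groups, one per worksheet."""
    return [
        columns[i:i + max_per_sheet]
        for i in range(0, len(columns), max_per_sheet)
    ]


columns = [f"col_{i}" for i in range(1, 41)]  # 40 placeholder column names
sheets = split_columns(columns)
print([len(sheet) for sheet in sheets])  # [16, 16, 8]
```

With 40 columns and a cap of 16, that comes out to three worksheets.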
A good way to do this is to create groups and filters. I'm sure, out of 40+ columns, a good number of them can be converted to either of the above, giving a neater look to your dashboard, making it easy to comprehend your data.
Let us assume you're creating a dashboard to show the overall split of mobile recharges for a company x.
One option is to have multiple columns, one each for:
the mobile OS
OS version
service provider
recharge rank
Sub-category (Prepaid / Postpaid)
...
The easier and more elegant way to reduce the number of columns is to populate a dropdown list with these values. Not only will this make the dashboard easier to comprehend, it will also reduce the number of columns one has to refer to when interpreting the data and ease the technical limits on the number of columns.
To create a group in Tableau:
Include the fields in the result set, i.e. use the column[s] in the select statement.
select os, os_version, service_provider, rank, subcategory ... from schema.recharge_table [where...];
In the Sheets view of Tableau, right click on the field to create group. Let's create a split on subcategory.
Group the sub-categories, give them proper alias to be recognised easily.
Drag the Group to filter and you've successfully and elegantly reduced one column.
16 is the maximum number of row/column labels in a Tableau table.
Put 20 columns on one sheet and 20 on the other. Drag and drop both sheets onto your dashboard, and you should have 40 columns.