Tableau counting multiple blanks - tableau-api

I have a dataset containing roughly 50 columns. I want to check the data inputs and show how many blanks/if any are in each column.
I originally started using calculated fields for columns and using sum(if isnull([column] then 1 else 0 end)). This works but doesn’t seem very efficient if I want to do it for multiple columns.
Tried doing a pivot table as well however I need the data in it’s original form for other analysis I would like to do.
Would appreciate any help.
Thanks
Fiona

Related

Is is possible limit the number of rows in the output of a Dataprep flow?

I'm using Dataprep on GCP to wrangle a large file with a billion rows. I would like to limit the number of rows in the output of the flow, as I am prototyping a Machine Learning model.
Let's say I would like to keep one million rows out of the original billion. Is this possible to do this with Dataprep? I have reviewed the documentation of sampling, but that only applies to the input of the Transformer tool and not the outcome of the process.
You can do this, but it does take a bit of extra work in your Recipe--set up a formula in a new column using something like RANDBETWEEN to give you a random integer output between 1 and 1,000 (in this million-to-billion case). From there, you can filter rows based on whatever random integer between 1 and 1,000 as what you'll keep, and then your output will only have your randomized subset. Just have your last part of the recipe remove this temporary column.
So indeed there are 2 approaches to this.
As Courtney Grimes said, you can use one of the 2 functions that create random-number out of a range.
randbetween :
rand :
These methods can be used to slice an "even" portion of your data. As suggested, a randbetween(1,1000) , then pick 1<x<1000 to filter, because it's 1\1000 of data (million out of a billion).
Alternatively, if you just want to have million records in your output, but either
Don't want to rely on the knowledge of the size of the entire table
just want the first million rows, agnostic to how many rows there are -
You can just use 2 of these 3 row filtering methods: (top rows\ range)
P.S
By understanding the $sourcerownumber metadata parameter (can read in-product documentation), you can filter\keep a portion of the data (as per the first scenario) in 1 step (AKA without creating an additional column.
BTW, an easy way of "discovery" of how-to's in Trifacta would be to just type what you're looking for in the "search-transtormation" pane (accessed via ctrl-k). By searching "filter", you'll get most of the relevant options for your problem.
Cheers!

Compare two sql data using mysqlworkbench

I have 3 tables on mysqlworkbench, 1 table need to combine with 2 data ch(17million row) and cl(9million row) suppose to be one table, other table name alldoc.(121k)
Basically i need to combine ch and cl as one table, and compare with alldoc data. Technically they are suppose to be same but people made mistake that why i need to compare. 100 column
enter image description here
enter image description here
I plan to write query till i hit to 100 because i have 100 columns in all data. Just rows sizes are different.
Thank you from advance. I know complicated but i really need to compare these two data writing query

tableau show categories from calculation even when a category is not visible

I have a calculation and it outputs multiple values. Then I am creating a table on those values. For example, in below data my formula is
if data is 1 then calculation is `one`
if data is 2 then calculation is `two`
if data is 3 then calculation is `three`
as three doesn't really appear in the output, when I create a table, three is not displayed. Is there any way to display it?
I tried table layout >> show empty rows and columns and it didn't work
data calculation
1 one
2 two
Tableau discovers the possible values for a dimension field dynamically from the query results.
If ‘three’ does not appear in your data, then how do you expect Tableau to know to make a column header for that non existent, but potential, value? It can’t read your mind.
This situation does occur often though - perhaps you want row or column headers to remain stable, even when you change filters in a way that causes some to no longer appear in the query results.
There are a few ways you can force Tableau to pad ** or **complete a domain:
one solution is to pad your data to make sure each value for your dimension field appears in at least one data row.
You can often do this easily by using a union to append some extra rows to your original data. You can often add padding rows that don’t impact any results by leaving all your Measure columns null since nulls are ignored by aggregation functions
Another common solution that is a bit more effort is to make what is known as scaffolding data source that is not much more than a list of your dimension members. You can then use that data source as a primary data source with data blending, making your original data source secondary.
There are two situations where Tableau can detect the absence of data and leave space for it in the visualization automatically
for numeric types, you can create a bin field that will automatically pad for missing bins
similarly, date fields can show missing values because, like bins, Tableau can tell when a month doesn’t appear in the data and leave room for it in the view

Tallying unknown words across columns in Tableau (or from comma separated column)

I have an issue that I have been trying to solve for the better part of a week now. I have a large database (in Google sheets) representing casestudies. I have some columns with multiple categories listed (in this example 'species', 'genera', and 'morphologies'), and I want to be able to tally how many times each category occurs in the data set.
I use Tableau to visalise the data, and the final output will be a large publc tableau. I know I can do a "find" based on the specific string, but I'd like the dataset to be dynamic and be able to handle new data being added without having to update calculated fields? Is there a way of finding uniqe terms (either from a single column of comma separated values, or from multiple columns), and tallying them?
Things I have tried so far:
1 - A pivot table in Tableau. Works well, but messes with all the other data, since it repeats lines.
2 - A pivot table on its own data source in Tableau. Also works well, and avoids the problem of messing with the other data. However, now each figure is disconnected from the others so I can't do a large dashboard where everything is filtered by each other (ie filtering species and genera by country at the same time).
3 - An SQL query() in google sheets, which finds all unique terms and queries them, which can then be plotted in Tableau. Also works well, but similar problem of the data being disconnected from all the other terms in the dataset.
Any ideas of a field calculation that will find, list and tally unique terms in a single comma separated column (or across multiple columns), without changing the data structure?
I have placed a sample data set here (google sheets), which is a smaller version of what I'm actually working on. In it I have marked comma separated columns in grey, and they're followed by a bunch of columns with the values split into columns. I only need to analyse either of those (ie either a calculation to separate comma separate values or from multiple columns).
I've also added a sample Tableau workbook here.

RDL is taking very longtime to render the report

In my report, I have 1 Tablix and 40 columns, i am simply dumping my data into the report, Scenario is as follow:
First row is for heading of the reports.
In Second row, i have data bound columns.
In third row, SUM expression.
I have some 5000 rows in my data table, This scenario is taking around 18 seconds to render the report.
Now the problem is that I need to apply colors dynamically eg.
=iif(CellValue >= 0 , "Black","Red") to all of my columns. as soon as i have applied this expression, report took around 5 mins to render.
Kindly share you expertise.
Regards
Inderjeet Singh
Sometimes SSRS has a bit of trouble doing lots of calculations during rendering. One thing to try is to have this calculation done as a column in the SQL if you can. Then simply refer to this field to determine the color. If you can't do that, add a calculated field to your dataset so that the calculations are done at a higher level than cell-by-cell. That usually helps too.
Based on my personal experience I have found that adding filters within SSRS tends to slow it down. Based on your dataset though it should not have much problem.
Additionally, if you have a grouping set to display pages on a single page (i.e. Keep together) that will slow it down as well.