Count the non-null columns in Tableau - tableau-api

I need to find how many aliases are present in my data and what they are. The columns storing the alias values are ColA, ColB and ColC.
Sample data description:
"AAA" is also known as "a" and "b";
"BBB" is also known as "c", "d" and "e".
I want to find the number of aliases each Name has and display the alias values too.
| Name | ColA | ColB | ColC |
|------|------|------|------|
| AAA  | a    | b    |      |
| BBB  | c    | d    | e    |
| CCC  | f    | g    |      |
Example:
for "AAA", number of aliases = 2 and the alias values are "a" and "b"
for "BBB", number of aliases = 3 and the alias values are "c", "d" and "e"

Easy.
First, edit your data source to pivot the ColA, ColB and ColC fields. Hide [Pivot Field Names] and rename [Pivot Field Values] to [Alias]. In the top right of the data source page, add a filter that excludes rows with a null [Alias]. Your data will then appear to have two columns, [Name] and [Alias].
The best way to count and display the aliases depends on your Tableau version. Before version 2020.2 you would use SUM([Number of Records]); in newer versions, use one of the automatically generated Count() measures.
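To illustrate why the pivot makes this easy, here is a minimal Python sketch of the same wide-to-long reshaping and per-name count, done outside Tableau. It is an analogy, not how Tableau works internally; the `rows` layout (None for an empty alias cell) and the function names `pivot_aliases` and `summarize` are assumptions made for illustration.

```python
# Sample rows from the question; None marks an empty alias cell (an assumption about the layout).
rows = [
    {"Name": "AAA", "ColA": "a", "ColB": "b", "ColC": None},
    {"Name": "BBB", "ColA": "c", "ColB": "d", "ColC": "e"},
    {"Name": "CCC", "ColA": "f", "ColB": "g", "ColC": None},
]


def pivot_aliases(rows, alias_cols=("ColA", "ColB", "ColC")):
    """Unpivot the alias columns into (Name, Alias) pairs and drop null aliases."""
    return [
        (row["Name"], row[col])
        for row in rows
        for col in alias_cols
        if row[col] is not None  # same role as the "exclude null [Alias]" filter
    ]


def summarize(pairs):
    """Group the long-format pairs by Name into (alias count, alias values)."""
    grouped = {}
    for name, alias in pairs:
        grouped.setdefault(name, []).append(alias)
    return {name: (len(aliases), aliases) for name, aliases in grouped.items()}


for name, (count, aliases) in summarize(pivot_aliases(rows)).items():
    print(name, count, aliases)
# AAA 2 ['a', 'b']
# BBB 3 ['c', 'd', 'e']
# CCC 2 ['f', 'g']
```

Because the nulls are filtered out before grouping, every remaining row is exactly one alias, so a plain row count per Name is the alias count.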

Related

In gtsummary, is there a way to change the text used for a value (the category value, not the label for the category)?

I am using the excellent gtsummary R package.
As an example, my gtsummary table has columns for multiple characteristics of people in a clinical trial and a column for treatment A and treatment B.
For each group I have various characteristics such as gender. In my dataframe gender is listed as "m" or "f" and I want the table to list these as "Male" and "Female".
I know how to change labels, e.g. label = list(gender ~ "Gender"), but how do I change the values?
(I know I can change them in the dataframe itself, but can I do this in gtsummary?)

Exclude Combination of Data Items From One Table From Another

I have a view, A, with 20 columns which forms my primary data. I have a table B which lists some of the columns from A and contains data I want to exclude from A.
For example, table B will have 6 columns, 2 of which are 'customer' and 'country', containing the data 'HP' and 'America'. These columns exist in A. I want to write a query that brings back data from A except for any rows that have the combination HP and America.
There are 6 columns, and any row of table B can have any combination of them filled in: anywhere between 1 and all 6 columns, so one row could have 5 columns filled in, another row a different 5, and so on.
I want to be prepared for any possible combination of the 6 columns, with the query searching A for each combination in B and excluding any rows that match it.
I have tried this:
SELECT *
FROM A T1
WHERE NOT EXISTS
(
    SELECT * FROM [dbo].[ExcludedItems] T2
    WHERE ReportNumber = 1
    AND
    (
        T1.job = ISNULL(T2.job, T1.job) AND
        T1.CustomerName = ISNULL(T2.CustomerName, T1.CustomerName) AND
        T1.COUNTRY = ISNULL(T2.COUNTRY, T1.COUNTRY) AND
        T1.CONTINENT = ISNULL(T2.CONTINENT, T1.CONTINENT) AND
        T1.continer = ISNULL(T2.ContainerName, T1.continer) AND
        T1.UnscheduledJob = ISNULL(T2.unscheduledJob, T1.UnscheduledJob) AND
        T1.[Price] = ISNULL(T2.Price, T1.Price) AND
        T1.[Haulage] = ISNULL(T2.[Haulage], T1.[Haulage]) AND
        T1.SiteAdress = ISNULL(T2.SiteAddress, T1.SiteAdress) AND
        T1.Delta = ISNULL(T2.Delta, T1.Delta) AND
        T1.Cost = ISNULL(T2.Cost, T1.Cost)
    )
)
The problem is that the result set is not correct. I have tried with a smaller sample of columns and was able to exclude the correct combination of Customer and Country, but when I introduce a 3rd or 4th column I can eyeball the result set and immediately see it's incorrect. I'm not sure whether I have to use a separate NOT EXISTS for each possible combination; I was hoping not to.
A constraint is that A has to be a view, not a table. Otherwise I would have used variables in some manner and wrapped the whole thing in a stored procedure.
Any help is appreciated; my fallback is to manually add to the code each time an item combination is supplied in B!
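The intended rule can be stated as: exclude a row of A if, for at least one row of B, every column that B fills in equals A's value, with empty (NULL) columns in B acting as wildcards. Below is a minimal Python model of that rule, an assumption about the intent with hypothetical column names and data, useful for testing combinations before translating back to SQL. One thing worth checking in the posted query: when a column in A is itself NULL, `T1.col = ISNULL(T2.col, T1.col)` compares NULL with NULL, which is not true under SQL Server's default ANSI_NULLS behaviour, so that A row can never match the exclusion even when B leaves the column empty. The model deliberately treats such a row as a match to show the intended behaviour.

```python
# Toy model in Python, not SQL: each row is a dict and None plays the role of NULL.
# Column names and sample values are hypothetical.


def matches_exclusion(a_row, rule):
    """True if every non-null column of the exclusion rule equals the same column in a_row.
    Null columns in the rule act as wildcards, whatever a_row holds there."""
    return all(a_row.get(col) == value for col, value in rule.items() if value is not None)


def filter_view(view, exclusions):
    """Keep the rows of the view that match none of the exclusion rules."""
    return [row for row in view if not any(matches_exclusion(row, rule) for rule in exclusions)]


view = [
    {"customer": "HP", "country": "America", "continent": None},
    {"customer": "HP", "country": "France", "continent": "Europe"},
    {"customer": "Dell", "country": "America", "continent": "North America"},
]
exclusions = [
    {"customer": "HP", "country": "America", "continent": None},  # 2 columns filled in
    {"customer": None, "country": None, "continent": "Europe"},    # 1 column filled in
]

for row in filter_view(view, exclusions):
    print(row)  # only the Dell row survives
```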

How to select a column containing dot in column name in kdb

I have a table which consists of column named "a.b"
q)t:([]a.b:3?10.0; c:3?10; d:3?`3)
How can we select column a.b and c from table t?
How can we rename column a.b to b?
Is it possible to achieve the above two cases without a functional select?
Failed attempts:
q)select a.b, c from t
'type
q)?[`t;();0b;enlist (`b`c!`a.b`c)]
'type
q)select b:a.b from t
'type
As others have mentioned, .Q.id t will sanitise table column names if they aren't suitable for qSQL statements or performance in general.
`a.b`c#t
will only work for multiple column selects and
`a.b#t
will return a type error. However, you can get around this by enlisting the single item into the take operator, like so:
q)enlist[`a.b]#t
a.b
---------
4.931835
5.785203
0.8388858
q)(enlist`a.b)#t
a.b
---------
4.931835
5.785203
0.8388858
If you only need the values from a single column, another option is indexing: in this case, t[`a.b] returns all values from the a.b column.
You could also mix these selection styles like so, but you lose the a.b column name (the column comes back as x):
q)select c,t[`a.b] from t
c x
----------
8 4.707883
5 6.346716
4 9.672398
In a query, the . itself is used for foreign-key navigation, and q throws a type error because it cannot find any table relating to the foreign key it believes you have passed it.
As much as I hate answering an online forum question by refuting the premise, I really must here: do not use periods in column names, they will cause trouble. .Q.id exists to sanitise column names for a reason.
The primary reason that errors are encountered is that the use of dot notation in qSQL is reserved for the resolution of linked columns. We can see how this is actually working by parsing the query itself
q)parse "select a.b from tab"
?
`tab
()
0b
(,`b)!,`a.b // Here the referencing of a linked column b via a is occurring
// Compared to a normal select
q)parse "select b from tab"
?
`tab
()
0b
(,`b)!,`b
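For readers less familiar with linked columns, here is a toy Python model of the rule the parse output shows: a dotted name is read as "follow link column a, then take column b from the linked table", and the result column is named after the last part (b). This is not how q is implemented; the `linked` mapping, the index-based links and the function name `resolve` are hypothetical. It only shows why a column literally named `a.b` cannot be reached this way.

```python
def resolve(table, name, linked):
    """Toy model of qSQL name lookup.

    table  : dict of column name -> list of values
    linked : dict of link column name -> target table (dict of columns);
             in this toy, a link column holds row indices into its target table.
    Returns (result column name, values).
    """
    if "." not in name:
        return name, table[name]
    link, target_col = name.split(".", 1)  # only one level of dots in this toy
    if link not in table or link not in linked:
        # q reports 'type in the question's case; the toy raises TypeError instead
        raise TypeError(f"{link!r} is not a link column")
    target = linked[link]
    return target_col, [target[target_col][i] for i in table[link]]


t = {"a.b": [4.9, 5.8], "c": [1, 9]}
print(resolve(t, "c", {}))  # ('c', [1, 9])
try:
    resolve(t, "a.b", {})  # fails even though a column named "a.b" exists
except TypeError as err:
    print("error:", err)
```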
Other issues could crop up depending on future processing, such as q attempting to treat the column names as namespaces or operating on each part of the name with the dot operator.
Using dot notation in your column names will hamstring any further development, and force all other kdb users to use roundabout methods. The development will be slow and encounter many bugs.
I would advise that if periods must be included in the column, you create an API for external users to use to translate queries into the sanitised forms.
You can easily sanitise the whole table with .Q.id
q)tab:enlist `a.b`c`d!(1 2 3)
q)tab:.Q.id tab
q)sel:{[tab;cl] ?[tab;();0b;((),.Q.id each cl)!((),.Q.id each cl)]}
q)sel[tab;`a.b]
ab
--
1
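As a rough Python analogue of the sanitise-then-select approach above, treat a table as a dict of column lists. `sanitise_name` is a simplified, hypothetical stand-in that only strips characters other than letters, digits and underscores; the real .Q.id may handle further cases (check its documentation), which this sketch does not try to reproduce.

```python
import re


def sanitise_name(name):
    """Simplified stand-in for .Q.id: drop every character that is not a letter, digit or underscore."""
    return re.sub(r"[^0-9A-Za-z_]", "", name)


def sanitise_table(table):
    """Rename every column of a column-oriented table (dict of lists)."""
    return {sanitise_name(col): values for col, values in table.items()}


def sel(table, columns):
    """Mirror of the q `sel` above: sanitise the requested names, then select those columns."""
    if isinstance(columns, str):
        columns = [columns]  # like (), in q: treat a single name as a one-item list
    wanted = [sanitise_name(col) for col in columns]
    return {col: table[col] for col in wanted}


tab = sanitise_table({"a.b": [1], "c": [2], "d": [3]})
print(sel(tab, "a.b"))  # {'ab': [1]}
```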
How about the following, using take (#):
q) `a.b`c#t
a.b c
-----------
4.931835 1
5.785203 9
0.8388858 5
To rename:
q) `b xcol t
b c d
---------------
4.931835 1 mil
5.785203 9 igf
0.8388858 5 kao
You can use .Q.id to rename any unselectable columns:
q).Q.id t
ab c d
---------------
4.931835 1 mil
5.785203 9 igf
0.8388858 5 kao
It's best to avoid dots, and special characters in general, in column names; use an underscore if you must.

Check presence of a column in multiple tables

I want to check if a column is present in multiple tables.
When I try for one table, it works.
`tickerCol in cols tradeTable / (output is 1b) hence working perfectly
`tickerCol in cols table2 / (output is 1b) hence working perfectly
but when I run
`ticker in cols #' (tradeTable;table2) / (output is 0b, expected output 11b)
In the above example, the ticker column is present in both tables (tradeTable;table2).
The following works, using each-both ('):
`ticker in ' cols each (tradeTable; table2)
This will find the columns that are present in each of the tables and then perform a check on each of the column lists to find if `ticker is present in these lists.
A solution is already provided in another answer; I'm just trying to explain why your solution is not working.
Let's say we have 2 tables, t1 (columns id and v1) and t2 (columns id and v2).
Now when we run:
q) cols#'`t1`t2
The output will be a list of lists:
(`id`v1;`id`v2)
This list has 2 entries where each entry is a list.
What you are doing is trying to find the column in this list:
q) `id in (`id`v1;`id`v2) /output 0b
Since that list doesn't have `id as an entry, it returns 0b.
If you search for `id`v1, which is a list, you get 1b because it matches the first entry.
q) `id`v1 in (`id`v1;`id`v2) / output 1b
What you want here is to search your column name in each entry of that list. So the only thing you are missing in your expression is each-both. This will work:
q) `id in'cols#'`t1`t2 / output 11b
In your case it will be:
q) `ticker in ' cols#'`tradeTable`table2
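The same distinction exists in Python's `in`, which may help if you come from that language: testing membership against a list of lists compares the item with each whole sub-list, so you have to apply the test to each sub-list separately, which is what `in'` does in q. This is an analogy only; the list contents mirror the t1/t2 example above.

```python
# Python analogy for the q expressions above: t1 has columns id, v1 and t2 has id, v2.
col_lists = [["id", "v1"], ["id", "v2"]]  # like the list of column lists from the two tables

whole = "id" in col_lists                          # False: "id" is compared with each whole list
entry = ["id", "v1"] in col_lists                  # True: matches the first entry as a whole
per_table = ["id" in cols for cols in col_lists]  # check each list separately, like in'
print(whole, entry, per_table)  # False True [True, True]
```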

Pivot Charts in google Sheets by counting non-numeric data?

I have a dataset that I'd like to summarize in chart form. There are about 30 categories whose counts I'd like to display in a bar chart from about 300+ responses. I think a pivot table is probably the best way to do this, but when I create a pivot table and select multiple columns, each new column added gets entered as a sub-set of a previous column. My data looks something like the following
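| ID | Country | Age | thingA | thingB | thingC | thingD | thingE | thingF |
|----|---------|-----|--------|--------|--------|--------|--------|--------|
| 1 | US | 5-9 | | thB | | thD | | thF |
| 2 | FI | 5-9 | thA | | | | | thF |
| 3 | GA | 5-9 | thA | | | | | thF |
| 4 | US | 10-14 | | | thC | | | |
| 5 | US | 10-14 | | thB | | | | thF |
| 6 | US | 15-18 | | | | | | |
| 7 | BR | 5-9 | thA | | | | | |
| 8 | US | 15-18 | | | | thD | | thF |
| 9 | FI | 10-14 | thA | | | | | |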
So, I'd like to be able to create an interactive chart that shows the counts of "thing" items; I'd then like to be able to filter based upon demographic data (e.g., Country, Age). Notice that the data is non-numeric, so I have to use COUNTA to see how many there are in each category.
Is there a simple way to display chart data that summarizes the counts and will allow me to filter based on different criteria?
QUERY can summarize the data in the form you want. The fact that you have "thA", "thB", etc. instead of "1" complicates matters, but you can transform the strings to numeric data on the fly.
Assuming the data you've shown is in the cells A1:I10, the following formula will summarize it:
=query({B2:C10, arrayformula(if(len(D2:I10), 1, 0))}, "select Col1, Col2, count(Col3), sum(Col3), sum(Col4), sum(Col5), sum(Col6), sum(Col7), sum(Col8) group by Col1, Col2", 0)
Explanation:
{B2:C10, arrayformula(if(len(D2:I10), 1, 0))} creates a table where the first two columns are your B,C (Country, Age) and the other six are filled with 1 or 0 depending on whether the cells in D-I are filled or not.
select Col1, Col2, count(Col3), sum(Col3), ... group by Col1, Col2 selects Country, Age, the total count of rows with this Country-Age combination, the number of rows with thingA for this Country-Age combination, etc.
The last argument, 0, indicates there are no header rows in the table passed to the query.
You can give labels to the columns returned by the query using the label clause (see the Query Language documentation). It would be something like:
label Col1 'Country', Col2 'Age', count(Col3) 'Total count', sum(Col3) 'thingA count', ...
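To check what the formula should return on the sample data, here is a small Python sketch of the same steps: turn each non-empty thing cell into 1 (the `if(len(...), 1, 0)` part), then group by Country and Age, counting rows and summing the flags. It only mirrors the logic, not how Sheets evaluates QUERY; the row layout and the name `summarize_things` are assumptions.

```python
from collections import defaultdict

# Sample rows from the question: (Country, Age, [thingA..thingF cells]); "" is an empty cell.
rows = [
    ("US", "5-9", ["", "thB", "", "thD", "", "thF"]),
    ("FI", "5-9", ["thA", "", "", "", "", "thF"]),
    ("GA", "5-9", ["thA", "", "", "", "", "thF"]),
    ("US", "10-14", ["", "", "thC", "", "", ""]),
    ("US", "10-14", ["", "thB", "", "", "", "thF"]),
    ("US", "15-18", ["", "", "", "", "", ""]),
    ("BR", "5-9", ["thA", "", "", "", "", ""]),
    ("US", "15-18", ["", "", "", "thD", "", "thF"]),
    ("FI", "10-14", ["thA", "", "", "", "", ""]),
]


def summarize_things(rows):
    """Flag non-empty cells as 1, then group by (Country, Age).

    Each group maps to [row count, thingA count, ..., thingF count]."""
    groups = defaultdict(lambda: [0] * 7)
    for country, age, cells in rows:
        flags = [1 if len(cell) else 0 for cell in cells]  # if(len(D2:I10), 1, 0)
        totals = groups[(country, age)]
        totals[0] += 1                                      # count(Col3)
        for i, flag in enumerate(flags):
            totals[i + 1] += flag                           # sum(Col3) .. sum(Col8)
    return dict(sorted(groups.items()))  # sorted by key for readable output


for (country, age), totals in summarize_things(rows).items():
    print(country, age, totals)  # e.g. US 10-14 [2, 0, 1, 1, 0, 0, 1]
```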
Add a Count column to your data containing a "1" for every occurrence; summing it in the pivot table might solve your problem. I was looking for a solution myself, thought of this, and it is working for me now.