Space character within symbol literals - kdb

I need to query a database that contains company names. I have a list of around 50 names for which I have to get the data, but I am unable to write a query using in because the spaces in a name are not being recognized. For example:
select from sales where name in (`Coca Cola, `Pepsi)
This is giving me an error as 'Cola' is not being recognized. Is there a way to write such a query?

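A symbol literal ends at the first space, so `Coca Cola is read as the symbol `Coca followed by the undefined name Cola, which is why the interpreter complains about Cola. Build the symbols from strings instead: `$ casts each string (a list of characters) to a symbol, spaces included.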
q)t:([] a:1 2 3; name:`$("coca cola";"pepsi";"milk"))
q)select from t where name in `$("coca cola";"pepsi")
a name
-----------
1 coca cola
2 pepsi
Symbol comparison is case-sensitive, so also watch the casing: use lower or upper case consistently on both sides, otherwise the query silently returns an empty result:
q)select from t where name in `$("Coca Cola";"Pepsi")
a name
------
q)select from t where upper[name] in upper `$("Coca Cola";"Pepsi")
a name
-----------
1 coca cola
2 pepsi

You need to do something like the following:
select from sales where name in `$("Coca Cola";"Pepsi")

Related

KDB - How to select a column with prefixed numbers?

I have a table of attributes that I am trying to pivot off of and while the pivot makes sense, several key attribute values I am successfully pivoting off of are prefixed with numbers (for sorting purposes). These are important attributes (there are several like this) that we want to pivot and report on.
I found a similar question here: How to select a column containing dot in column name in kdb, and when I sanitize the dictionary with .Q.id t, the columns are prefixed with a.
When I ran type on the returned value it returned 99h so the pivot returns a dictionary.
I'm trying to leverage enlist(`1CODE)#t but to no avail as of yet.
Any thoughts or suggestions?
q) t
monthDate | 1CODE    2CODE     3CODE    4CODE
----------| ------------------------------------
2022.01.01| 18.0054  0.1537228 4.116678 9.332936
2022.02.01| 17.87151 0.1527959 3.866393 9.685012
2022.03.01| 17.739   0.1518747 3.646734 10.00515
...
You can't use colName#table on a keyed table (99h is a keyed table in this case, though yes a keyed table is also a dictionary). So you would have to unkey the table first using 0!
q)t:1!flip`monthDate`1CODE`2CODE!(2022.01.01 2022.02.01 2022.03.01;3?100.;3?10.);
q)((),`1CODE)#0!t
1CODE
--------
61.37452
52.94808
69.16099
q)((),`1CODE`2CODE)#0!t
1CODE    2CODE
------------------
61.37452 0.8388858
52.94808 1.959907
69.16099 3.75638
Tables in kdb are just lists of dictionaries. Type 99h can be both a keyed table and a dictionary. You can still use qsql if you've sanitised your table:
q)select a1CODE from .Q.id t
a1CODE
--------
18.0054
17.87151
17.739
Another option is to use xcol to rename your columns:
q)t1:(`monthDate,`$1 rotate'string 1_cols t)xcol t
q)select CODE1 from t1
CODE1
--------
47.35547
75.21426
99.14374
I'm not sure what you mean by pivoting off of at the beginning, but one issue that stands out is the round brackets around the argument to enlist. q evaluates right to left, so enlist(`1CODE)#t applies enlist to the result of (`1CODE)#t rather than to the column name. Use square brackets so that enlist is applied to the symbol first:
enlist[`1CODE]#t

count t vs actual number in kdb select statement

I noticed the following
select (count t)#`test from t
Returns
flip (enlist `x)!enlist enlist `test`test`test
Vs
select 3#`test from t
Which returns
flip (enlist `x)!enlist `test`test`test
Similar with select (sum 1 2)#1 from t vs select(1 + 2)#1 from t etc
Does anyone know why keywords such as count and sum in the select clause cause the result to be a table with one row holding a nested list of x elements, rather than a table with x rows?
It's because kdb recognises count and sum as aggregations and has special treatment for them (it enlists the result).
For example if you were to slightly change the count and sum to lambdas (which kdb won't recognise) you get the other results you expect:
q)select ({count x}t)#`test from t
x
----
test
test
test
q)select ({sum x}1 2)#1 from t
x
-
1
1
1
kdb "recognises" certain common aggregations and auto-enlists their results because otherwise a simple select such as select sum a from tab would give a rank error: sum returns an atom, but a table column must be a list, e.g.
q)select {sum x}a from t
'rank
[0] select {sum x}a from t
^
/versus
q)select sum a from t
a
-
6
There's also a deeper reason which is to do with map/reduce aggregations over database partitions but that's beyond scope for this problem. The list of recognised aggregations is stored in the variable .Q.a0. See also https://code.kx.com/q/basics/qsql/#special-functions

Count the non-null columns in Tableau

I need to find how many aliases are present and what are they in my data. The columns storing the alias values are: ColA, ColB, ColC.
Sample Data Description:
"AAA" is also known as “a” & “b”,
“BBB” is also known as “c”,”d” & “e”.
I want to find the number of aliases each Name has and display the alias values too.
Name  ColA  ColB  ColC
AAA   a     b
BBB   c     d     e
CCC   f     g
Example:
for “AAA”, number of alias = 2 & values of alias are: “a” & “b”
for "BBB", number of alias = 3 & values of alias are: “c” & “d” & "e"
Easy.
First edit your data source to pivot the ColA, ColB and ColC fields. Hide [Pivot Field Names] and rename [Pivot Field Values] to [Alias]. In the top right of the data source page, filter to exclude rows with a null [Alias]. Then your data will appear to have two columns, [Name] and [Alias].
The best way to count and display aliases depends on your Tableau version. Prior to version 2020.2, you'd use SUM([Number of Records]); nowadays, use one of the generated Count() measures.
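As a rough illustration of what the pivot does to the data (plain Python, not Tableau syntax; the rows list and the placement of the blank cells are assumptions reconstructed from the sample in the question), the wide alias columns become one entry per non-null alias, which can then be counted per Name:

```python
from collections import defaultdict

# Sample data from the question; None marks an empty alias cell (assumed position).
rows = [
    {"Name": "AAA", "ColA": "a", "ColB": "b", "ColC": None},
    {"Name": "BBB", "ColA": "c", "ColB": "d", "ColC": "e"},
    {"Name": "CCC", "ColA": "f", "ColB": "g", "ColC": None},
]

# "Pivot": turn each alias column into its own (Name, Alias) entry,
# skipping null aliases, which mirrors the data source filter.
aliases = defaultdict(list)
for row in rows:
    for col in ("ColA", "ColB", "ColC"):
        value = row[col]
        if value is not None:
            aliases[row["Name"]].append(value)

# Count and display the aliases per name.
for name, values in aliases.items():
    print(name, len(values), values)
```

Once the data is in this long shape, the count per Name is just the number of remaining rows, which is what the count measure reports.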

How to select a column containing dot in column name in kdb

I have a table which consists of column named "a.b"
q)t:([]a.b:3?10.0; c:3?10; d:3?`3)
How can we select column a.b and c from table t?
How can we rename column a.b to b?
Is it possible to achieve above two cases without functional select?
Failed attempts:
q)select a.b, c from t
'type
q)?[`t;();0b;enlist (`b`c!`a.b`c)]
'type
q)select b:a.b from t
'type
As others have mentioned, .Q.id t will sanitise table column names if they aren't suitable for qSQL statements or performance in general.
`a.b`c#t
will only work for multiple column selects and
`a.b#t
will return a type error. However, you can get around this by enlisting the single item into the take operator, like so:
q)enlist[`a.b]#t
a.b
---------
4.931835
5.785203
0.8388858
q)(enlist`a.b)#t
a.b
---------
4.931835
5.785203
0.8388858
If you only need the values from a single column, another option is indexing: t[`a.b] returns all values from the a.b column.
You could also mix these selection styles like so, but ultimately lose the column name from a.b:
q)select c,t[`a.b] from t
c x
----------
8 4.707883
5 6.346716
4 9.672398
In a query, the . is used for foreign key navigation, so a.b throws a type error because q cannot find a table related to the foreign key it believes you have passed.
As much as I hate answering any online forum question by refuting the premise, I really must here: do not use periods in column names, as it will cause trouble. .Q.id exists to sanitise column names for a reason.
The primary reason that errors are encountered is that the use of dot notation in qSQL is reserved for the resolution of linked columns. We can see how this is actually working by parsing the query itself
q)parse "select a.b from tab"
?
`tab
()
0b
(,`b)!,`a.b // Here the referencing of a linked column b via a is occurring
// Compared to a normal select
q)parse "select b from tab"
?
`tab
()
0b
(,`b)!,`b
Other issues could crop up depending on future processing, such as q attempting to treat the column names as namespaces or operating on each part of the name with the dot operator.
Using dot notation in your column names will hamstring any further development, and force all other kdb users to use roundabout methods. The development will be slow and encounter many bugs.
I would advise that if periods must be included in the column, you create an API for external users to use to translate queries into the sanitised forms.
You can easily sanitise the whole table with .Q.id
q)tab:enlist `a.b`c`d!(1 2 3)
q)tab:.Q.id tab
q)sel:{[tab;cl] ?[tab;();0b;((),.Q.id each cl)!((),.Q.id each cl)]}
q)sel[tab;`a.b]
ab
--
1
How about the following, using take # :
q) `a.b`c#t
a.b       c
-----------
4.931835  1
5.785203  9
0.8388858 5
To rename:
q) `b xcol t
b         c d
---------------
4.931835  1 mil
5.785203  9 igf
0.8388858 5 kao
You can use .Q.id to rename any unselectable columns:
q).Q.id t
ab        c d
---------------
4.931835  1 mil
5.785203  9 igf
0.8388858 5 kao
Best to avoid dots in column names, and in symbols in general; use an underscore if you must.

Is there a way to use order by to order similar fields together?

Is there a way to make similar values equivalent in an order by?
Say the data is:
name | number
John. | 9
John | 1
John. | 2
Smith | 4
John | 3
I'd want to order by name and then number, so that the output looks like this, but order by name, number will put all John entries ahead of John. entries.
name | number
John | 1
John. | 2
John | 3
John. | 9
Smith | 4
There's Fuzzy String Matching and beyond that, Pattern Matching.
You need some more advanced treatment on the name field then.
This topic will help you strip non-alphabetic characters from your string before ordering:
How to strip all non-alphabetic characters from string in SQL Server?
But the fact that you need such a complex function makes me question the design of your database: if "John" and "John." are the same person, they should have the same name. So if the "." is important, that means you need another field to store the information it represents.
Use a regex replace function to strip out all special characters in your data, replacing each with a space. Then wrap that in a TRIM function to remove the leftover leading and trailing spaces.
-- Assuming PostgreSQL: the 'g' flag replaces every match, not just the first.
-- No CASE is needed: names with nothing to strip pass through unchanged.
TRIM(REGEXP_REPLACE(name, '(\_|\.|\d)', ' ', 'g')) AS name_processed
The bit in brackets means replace an underscore or (|) a period or a digit with whatever is after the comma, which here is a space
Now you can order by name_processed and number as well
ORDER BY name_processed, number
But you can always keep the original name in a SELECT afterwards if you wrote a subquery first through WITH. Let me know if you want to do this. Basically the syntax would be:
WITH processed_names AS (
    SELECT
        name,
        -- Assuming PostgreSQL: 'g' replaces every match; unaffected names pass through unchanged
        TRIM(REGEXP_REPLACE(name, '(\_|\.|\d)', ' ', 'g')) AS name_processed,
        number
    FROM names)
SELECT
    name,
    number
FROM processed_names
-- Sort in the outer query: an ORDER BY inside the CTE is not guaranteed to carry over
ORDER BY name_processed, number;
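The same idea, sorting on a cleaned-up copy of the name while displaying the original, can be sketched outside the database. This is only an illustration in Python using the sample rows from the question; name_key is a hypothetical helper, not part of any answer above:

```python
import re

# Sample rows from the question: (name, number).
rows = [("John.", 9), ("John", 1), ("John.", 2), ("Smith", 4), ("John", 3)]

def name_key(name: str) -> str:
    """Replace underscores, periods and digits with spaces, then trim."""
    return re.sub(r"[_.\d]", " ", name).strip()

# Sort by the processed name first, then by number; the original name is kept.
ordered = sorted(rows, key=lambda r: (name_key(r[0]), r[1]))

for name, number in ordered:
    print(name, number)
```

With this key, "John" and "John." compare as equal, so the number decides their relative order, which gives the interleaving asked for in the question.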