Restrict table columns, preserving keys - kdb

I've found in "Q Tips" a technique to preserve keys in a table. This is useful for restriction columns in the right table in lj for example, without re-applying a key. Using each:
q)show t:(`c1`c2!1 2;`c1`c2!3 4)!(`c3`c4`c5!30 40 50;`c3`c4`c5!31 41 51)
c1 c2| c3 c4 c5
-----| --------
1 2 | 30 40 50
3 4 | 31 41 51
q)`c3`c4#/:t
c1 c2| c3 c4
-----| -----
1 2 | 30 40
3 4 | 31 41
I’m trying to understand why it preserves a key part of the table t:
q){-3!x}/:t
'/:
[0] {-3!x}/:t
^
But in this case q doesn’t show how it treats each row of the keyed table.
So why is this syntax #/:t works in such a way for a keyed table? Is it mentioned anywhere in code.kx.com docs?
Upd1: I've found a case with # and keyed table on code.kx.com, but it is about selecting rows, not columns.

If you view the keyed table as a dictionary (which it is) then it's no different to:
q)2*/:`a`b!1 2
a| 2
b| 4
or
q){x+1} each `a`b!1 2
a| 2
b| 3
The keys are retained when applying a function to each element of a dictionary. In your example the function being applied is to use take on a dictionary, e.g:
q)`c3`c4#first t
c3| 30
c4| 40
doing that for each row returns a list of dictionaries which is itself a table.
Also your other attempt would work as:
{-3!x}#/:t
so it's not unique to take #

{-3!x}/:t
each right needs two arguments so this wont work.
Since the table is keyed, it is treated as a dictionary. The each right iterates over the dictionary values and therefore ignores the keys of the main dictionary (= the keyed columns). To see what is happening it might help to see what happens when using each:
q)){-3!x} each t
c1 c2|
-----| --------------------
1 2 | "`c3`c4`c5!30 40 50"
3 4 | "`c3`c4`c5!31 41 51"

Related

How do I parameterise the column list in KDB?

I have a number of repetitive queries:
select lR, e10, e25, vr from z
Is there a way I can do something like:
features: `lR`e10`e25`vr
select features from z
You could use # like so:
`lR`e10`e25`vr#z
NB: The left argument here must be a list so to select a single column use the following:
enlist[`vr]#z
Example:
q)t:([]a:`a`b`c;b:til 3;c:0b);
q)`a`b#t
a b
---
a 0
b 1
c 2
Another approach is to use a functional form (which you can build using parse):
q)0N!parse"select lR, e10, e25, vr from z";
(?;`z;();0b;`lR`e10`e25`vr!`lR`e10`e25`vr)
q)features:`lR`e10`e25`vr
q)?[z;();0b;features!features]
If you use # for this then be aware it will fail on a keyed table.
One possible way of modifying it to work on any table would be something like:
f:{[t;c] if[not .Q.qt[t]; '"Input is not a table"]; c,:(); $[99h = type[t];c#/:t;c#t]}
So make sure your table is, in fact, a table, make sure columns are a list, and then perform the required # operation.
q)t
a| b c d
-| ------
a| 4 7 10
b| 5 8 11
c| 6 9 12
q)f[t;`b]
a| b
-| -
a| 4
b| 5
c| 6
q)f[0!t;`b]
b
-
4
5
6
q)f[flip 0!t;`b]
'Input is not a table
[0] f[flip 0!t;`b]
^

Reshape [cols;table]

How do I get columns from a table? If they don't exist it's ok to get them as null columns.
Trying reshape#:
q)d:`a`b!1 2
q)enlist d
a b
---
1 2
q)`a`c#d
a| 1
c|
q)`a`c#enlist d
'c
[0] `a`c#enlist d
^
Why does thereshape# operator not work on a table? It could easily act on each row (which is dict) and combine results. So I'm forced to write:
q)`a`c#/:enlist d
a c
---
1
Is it the shortest way?
Any key you try to take (#) which is not present in a dictionary will be assigned a null value of the same type as the first value in the dictionary. Similar behaviour is not available for tables.
q)`a`c#`a`b!(1 2;())
a| 1 2
c| `long$()
q)`b`c#`a`b!(();1 2)
b| 1 2
c| ()
Like you mentioned, the use of each-right (/:) will act on each row of the table IE each dictionary. Instead of using an iterator to split the table into dictionaries we can act on the dictionary itself. This will return the same output and is slightly faster.
q)d:`a`b!1 2
q)enlist`a`c#d
a c
---
1
q)(`a`c#/:enlist d)~enlist`a`c#d
1b
q)\ts:1000000 enlist`a`c#d
395 864
q)\ts:1000000 `a`c#/:enlist d
796 880

how to take value for same answered more than once and need to create each value one column

I have data like below, want to take data for same id from one column and put each answer in different new columns respectively
actual
ID Brandid
1 234
1 122
1 134
2 122
3 234
3 122
Excpected
ID BRANDID_1 BRANDID_2 BRANDID_3
1 234 122 134
2 122 - -
3 234 122 -
You can use pivot after a groupBy, but first you can create a column with the future column name using row_number to get monotically number per ID over a Window. Here is one way:
import pyspark.sql.functions as F
from pyspark.sql.window import Window
# create the window on ID and as you need orderBy after,
# you can use a constant to keep the original order do F.lit(1)
w = Window.partitionBy('ID').orderBy(F.lit(1))
# create the column with future columns name to pivot on
pv_df = (df.withColumn('pv', F.concat(F.lit('Brandid_'), F.row_number().over(w).cast('string')))
# groupby the ID and pivot on the created column
.groupBy('ID').pivot('pv')
# in aggregation, you need a function so we use first
.agg(F.first('Brandid')))
and you get
pv_df.show()
+---+---------+---------+---------+
| ID|Brandid_1|Brandid_2|Brandid_3|
+---+---------+---------+---------+
| 1| 234| 122| 134|
| 3| 234| 122| null|
| 2| 122| null| null|
+---+---------+---------+---------+
EDIT: to get the column in order as OP requested, you can use lpad, first define the length for number you want:
nb_pad = 3
and replace in the above method F.concat(F.lit('Brandid_'), F.row_number().over(w).cast('string')) by
F.concat(F.lit('Brandid_'), F.lpad(F.row_number().over(w).cast('string'), nb_pad, "0"))
and if you don't know how many "0" you need to add (here it was number of length of 3 overall), then you can get this value by
nb_val = len(str(sdf.groupBy('ID').count().select(F.max('count')).collect()[0][0]))

Kdb upsert with conditional syntax?

Is there a way I can upsert in kdb where the following occurs:
If key is not present, insert values
If key is present, check if current value is greater
A) If so, perform no action
B) If not, update values
Something like:
job upsert ([title: job1] time: enlist 1 where time > 1)
Since you're using a keyed table, and you want to change values only if they're greater and add in new keys and values, you can try avoiding upsert entirely:
t:([job:`a`b`c] val: 4 4 4) /current table
nt:([job:`a`c`d]val: 6 1 5) /new values to check
t|nt
job| val
---| ---
a | 6
b | 4
c | 4
d | 5
This will automatically add keys that aren't there, and update the current value to the new value if the new value is larger.
please find a solution and explanation below. I'll edit if I come up with a better way - thanks. *also I hope I interpreted the question correctly.
q)t1
name | age height
-------| ----------
michael| 26 173
john | 57 156
sam | 23 134
jimmy | 83 183
conor | 32 145
jim | 64 167
q)t2
name age height
---------------
john 98 220
mary 24 230
jim 50 240
q)t1 upsert t2 where{$[all null n:x[y`name];1b;y[`age]>n[`age]]}[t1;]each t2
name | age height
-------| ----------
michael| 26 173
john | 98 220
sam | 23 134
jimmy | 83 183
conor | 32 145
jim | 64 167
mary | 24 230
q)
Explanation;
The function takes 2 args, x = the keyed table t1 and y = each record from t2(as a dictionary). First we extract the name value from the t2 record(y`name) and try to index into the source keyed table with that value and store the result in the local variable n. If the name exists, the corresponding record(n, as a dictionary)will be returned from y(and all null n will be false) otherwise an empty record will be returned(and all null n will be true). If we cannot find an instance of the t2[`name] in t1 then we just return 1b from the function. Otherwise, then we want to compare the ages between the two records (n[`age] <-- age referenced in t1 for the matching name & y[`age] <-- age of this particular record of t2) - if the age for this matching record in t2 (y[`age]) is greater than the matching value from t1 then we return 1b otherwise we return 0b.
The result of this function is a list of booleans, one for each record in t2. 1b is returned under 2 scenarios - either;
(1) This particular name from t2 has no match in t1. (2) This name from t2 does have a match in t1 and the age is greater than the corresponding age in t1. 0b is returned when the age referenced in t2 is less than the corresponding age from t1.
In our example the result of the function is 110b and after we apply where to this, the result is the indexes where the list value is true i.e. where 110b --> 0 1. We use this list to index into t2 which returns the first 2 records from t2(these are either new records or records where the age is greater than what is referenced in t1), then we simply upsert this into t1.
I hope this helps and hope some better solutions come along.
For a table, a key, and a value: upsert the tuple if the key is new or the value exceeds the existing value.
q)t:([job:`a`b`c] val: 4 4 4) /current table
q)t[`a]|:6 /old key, higher value
q)t
job| val
---| ---
a | 6
b | 4
c | 4
q)t[`c]|:1 /old key, lower value
q)t
job| val
---| ---
a | 6
b | 4
c | 4
q)t[`d]|:5 /new key
q)t
job| val
---| ---
a | 6
b | 4
c | 4
d | 5
Remarks
A keyed table with a single data column could perhaps be a dictionary.
Amending through an operator works also with a new key.
Upserting a table (or dictionary) of new records is more efficient and simpler than updating a single tuple.
q)nt:([job:`a`c`d]val: 6 1 5) /new values to check
q)t|nt /maximum of two tables
job| val
---| ---
a | 6
b | 4
c | 4
d | 5
or just
q)t[([]job:`a`c`d)]|:([]val:6 1 5)
Simple-looking primitives such as maximum (|) repay careful study.

In t-sql how do I create a one to one relationship between the columns of a single table?

I have a table with columns C1,C2,C3, and C4. A value can appear multiple times in C1 but for each value of C1 there must be a unique value for the combination of columns C3 and C4. Similarly a value can appear in multiple rows in C3 and C4, but for any particular combination of values in C3 and C4 there must be a unique value in C1. The primary key is C1+C2+C3+C4. E.g.:
C1 | C2 | C3 | C4 | Comment
11 ! 20 | 31 | 41 | Any time C1=11, must have C3=31 and C4=41 and vice versa
11 ! 21 | 31 | 41 | OK because C2 is different
11 ! 22 | 32 | 41 | Bad because C1=11 but C3<>31
11 ! 22 | 31 | 42 | Bad because C1=11 but C4<>41
12 ! 23 | 31 | 41 | Bad because C1<>11 but C3=31 and C4=41
13 ! 21 | 31 | 42 | Any time C1=12, must have C3=31 and C4=42 and vice versa
-
Create some tables that have the allowed combinations and use foreign keys
That is, you would have a table as follows
C1, C3, C4
11, 31, 41
12, 31, 42
Then add a foreign key on your first table on C1, C3 and C4
Building on gbn's suggestion (or making it explicit). The fact that a particular mapping can appear more than once in the original table (e.g., rows 1 and 2 have the same values for C1, C3, and C4) is what makes things difficult. Use a second table and a foreign key with the first table so that each mapping appears only once in the second table. Then, the trick is, in the second table, to have a UNIQUE constraint on C1 and a separate UNIQUE constraint on (C3,C4). Since each value on each side of the map can only appear once in the second table, you've enforced a 1-1 mapping.