How to concatenate columns in update statement - kdb

I have this table:
t:([] name:("aaa";"bbb";"ccc";"dddd"); side:(1;2;1;2))
Now I want to add a new column "concatenated", which contains a symbol, which is the concatenation of both values for each row:
I would assume that I have to do this with an each-both adverb, but this here does not work:
update concatenated:((`$name),'(`$side)) from t
How would I have to change this? Thanks.

Your attempt is close the issue with it works if you convert the 'side' column to string format first
I've added two versions one where the concatenation does not merge the 2 values and one where they are merged as a single symbol
q)t:([] name:("aaa";"bbb";"ccc";"dddd"); side:(1;2;1;2))
q)update conc:((`$name),'`$string side) from t
name side conc
------------------
"aaa" 1 aaa 1
"bbb" 2 bbb 2
"ccc" 1 ccc 1
"dddd" 2 dddd 2
q)update conc:(`$name,'string side) from t
name side conc
-----------------
"aaa" 1 aaa1
"bbb" 2 bbb2
"ccc" 1 ccc1
"dddd" 2 dddd2
Hope this helps

Related

Replace first n entries in a column in kdb

How can I replace the values in the first n columns of my table?
i.e. mycol:(1 2 3 4) to mycol:(a a 3 4)
Thank you in advance!
If it's the values within mycol that you want updated then they will need to be of the same type as the existing values. See below.
q)t:([]mycol:`$string 1+til 4;mycol2:til 4)
q)update mycol:`a from t where i<2
mycol mycol2
------------
a 0
a 1
3 2
4 3
One way around this though is to enlist mycol, that way updates of any type can be made.
q)t:([]mycol:1+til 4;mycol2:til 4)
q)update mycol:`a from(update enlist each mycol from t)where i<2
mycol mycol2
------------
`a 0
`a 1
,3 2
,4 3
q)meta update mycol:`a from(update enlist each mycol from t)where i<2
c | t f a
------| -----
mycol |
mycol2| j
It's unclear from your question whether you want the column names or the column values changed. If it's the column names, you can use xcol.
q)(2#`a)xcol([]w:3#til 3;x:3#.Q.a;y:`;z:0N)
a a y z
-------
0 a
1 b
2 c

PostgreSQL: Count Number of Occurrences in Columns

BACKGROUND
I have three large tables (employee_info, driver_info, school_info) that I have joined together on common attributes using a series of LEFT OUTER JOIN operations. After each join, the resulting number of records increased slightly, indicating that there are duplicate IDs in the data. To try and find all of the duplicates in the IDs, I dumped the ID columns into a temp table like so:
Original Dump of ID Columns
first_name
last_name
employee_id
driver_id
school_id
Mickey
Mouse
1234
abcd
wxyz
Donald
Duck
2423
heca
qwer
Mary
Poppins
1111
acbe
aaaa
Wiley
Cayote
1234
strf
aaaa
Daffy
Duck
1256
acbe
pqrs
Bugs
Bunny
9999
strf
yxwv
Pink
Panther
2222
zzzz
zzaa
Michael
Archangel
0000
rstu
aaaa
In this overly simplified example, you will see that IDs 1234 (employee_id), strf (driver_id), and aaaa (school_id) are each duplicated at least once. I would like to add a count column for each of the ID columns, and populate them with the count for each ID used, like so:
ID Columns with Counts
first_name
last_name
employee_id
employee_id_count
driver_id
driver_id_count
school_id
school_id_count
Mickey
Mouse
1234
2
abcd
1
wxyz
1
Donald
Duck
2423
1
heca
1
qwer
1
Mary
Poppins
1111
1
acbe
1
aaaa
3
Wiley
Cayote
1234
2
strf
2
aaaa
3
Daffy
Duck
1256
1
acbe
1
pqrs
1
Bugs
Bunny
9999
1
strf
2
yxwv
1
Pink
Panther
2222
1
zzzz
1
zzaa
1
Michael
Archangel
0000
1
rstu
1
aaaa
3
You can see that IDs 1234 and strf each have 2 in the count, and aaaa has 3. After generating this table, my goal is to pull out all records where any of the counts are greater than 1, like so:
All Records with One or More Duplicate IDs
first_name
last_name
employee_id
employee_id_count
driver_id
driver_id_count
school_id
school_id_count
Mickey
Mouse
1234
2
abcd
1
wxyz
1
Mary
Poppins
1111
1
acbe
1
aaaa
3
Wiley
Cayote
1234
2
strf
2
aaaa
3
Bugs
Bunny
9999
1
strf
2
yxwv
1
Michael
Archangel
0000
1
rstu
1
aaaa
3
Real World Perspective
In my real-world work, the JOIN'd table contains 100 columns, 15 different ID fields and over 30,000 records, and the final table came out to be 28 more than the original. This may seem like a small amount, but each of the 28 represent a broken link that we must fix.
Is there a simple way to get the counts populated like in the second table above? I have been wrestling with this for hours already, and have not been able to make this work. I tried some aggregate functions, but they cannot be used in table UPDATE operations.
The COUNT function, when used as an analytic function, can do what you want here, e.g.
WITH cte AS (
SELECT *,
COUNT(employee_id) OVER (PARTITION BY employee_id) employee_id_count,
COUNT(driver_id) OVER (PARTITION BY driver_id) driver_id_count,
COUNT(school_id) OVER (PARTITION BY school_id) school_id_count
FROM yourTable
)
SELECT *
FROM cte
WHERE
employee_id_count > 1
driver_id_count > 1
school_id_count > 1;

kdb items being list and convert into row

I have the following kdb table
name value price
-------------------------
Paul 1 2 3 4
where value and price are lists. How can I convert them into
name value price
------------------------------
Paul 1 3
Paul 2 4
? Thanks!!
ungroup is what you're looking for here.
As an aside, "value" is a reserved word in q and you should get an 'assign error if you try to use it as a column name.
q)t:([]name:`Paul;value:enlist 1 2;price:enlist 3 4)
'assign
q)t:([]name:`Paul;val:enlist 1 2;price:enlist 3 4)
q)ungroup t
name val price
--------------
Paul 1 3
Paul 2 4

Filtering inside LOD expression

Product Item Status
A aa 0
A aaa 0
A aaaa 0
A aaaaa 1
B bb 2
B bbb 0
B bbbb 3
C cc 4
C cccc 5
I need to calculate the count of items which have status = '0' at the Product level. So my output shall be:
Product Count
A 3
B 1
My formula is as follows:
{FIXED [Product]: CountD([Item) }
and I dragged the status into filters.
But this is not working. Can someone help?
Try this:
{ FIXED [Product]:COUNT(IF [Status]=0 THEN [Status] END)}
Edit-------------------------------------------------------------

KDB get substring

How can I add a column containing a substring of a another columns containing symbols. So, go from
t:flip `date`sym`pos!(`d1`d1`d1`d2;`aaaA1`bbbA1`aaaA2`aaaA3;1 2 3 1)
date sym pos
d1 aaaA1 1
d1 bbA1 2
d1 aaaA2 3
d2 aaaA3 1
to
t:flip `date`sym`pos`ext!(`d1`d1`d1`d2;`aaaA1`bbbA1`aaaA2`aaaA3;1 2 3 1;`aaa`bbb`aaa`aaa)
date sym pos ext
d1 aaaA1 1 aaa
d1 bbA1 2 bb
d1 aaaA2 3 aaa
d2 aaaA3 1 aaa
EDIT. The substring should always contain the first len(symbol) -2 characters, so in my example above, aaa for aaaAx and bb for bbAx
If the substring you wish to extract is a constant length, you can do something like this following:
q)t:flip `date`sym`pos!(`d1`d1`d1`d2;`aaaA1`bbbA1`aaaA2`aaaA3;1 2 3 1)
q)update ext:`$3#'string sym from t
date sym pos ext
------------------
d1 aaaA1 1 aaa
d1 bbbA1 2 bbb
d1 aaaA2 3 aaa
d2 aaaA3 1 aaa
If that's not the case, please provide some more detail with regards to how the substring to be extracted can be identified
Hope this helps
Jonathon
There can be a clever way of applying this below, but this is what i first came up with.
t:flip `date`sym`pos!(`d1`d1`d1`d2;`aaaA1`bbbA1`aaaA2`aaaA3;1 2 3 1)
t: update ctr: {-2 + count string x} each sym from t;
t:{[x] :update ext:x[`ctr]#string(x[`sym]) from x} each t;
2nd line is applying your logic of: len(symbol) - 2
3rd line is taking 'ctr' number of characters from the original symbol characters.
You didn’t say so, but this is kdb+, so let’s assume:
your table is long
your sym column has duplicates
You don’t need to convert all the symbols to strings and back: only the distinct ones. (In this example, I’ve changed one of the symbols to create a duplicate.)
q)t:flip `date`sym`pos!(`d1`d1`d1`d2;`aaaA1`bbbA1`aaaA2`aaaA1;1 2 3 1)
q)update ext:{nub:distinct x;(`$-2 _'string nub)nub?x}sym from t
date sym pos ext
------------------
d1 aaaA1 1 aaa
d1 bbbA1 2 bbb
d1 aaaA2 3 aaa
d2 aaaA1 1 aaa
The utility .Q.fu applies a function to the distinct items.
q)update ext:.Q.fu[{`$-2 _'string x};sym] from t
date sym pos ext
------------------
d1 aaaA1 1 aaa
d1 bbbA1 2 bbb
d1 aaaA2 3 aaa
d2 aaaA1 1 aaa
This operation would be faster if the sym column were already stored as an enumeration, because the distinct values would then be available without calculation.
Using drop:
q)t:flip `date`sym`pos!(`d1`d1`d1`d2;`aaaA1`bbA1`aaaA2`aaaA3;1 2 3 1)
q)update ext:`$-2_'string sym from t
date sym pos ext
------------------
d1 aaaA1 1 aaa
d1 bbA1 2 bb
d1 aaaA2 3 aaa
d2 aaaA3 1 aaa