How do I add several columns at once in kdb?

Somehow, I can only find examples that show how to add one column.
So I have written this code, which works, but I know there is a much better way to do this:
Table t already exists with columns filled with data, and I need to add new columns that are initially null:
t: update column1:` from t;
t: update column2:` from t;
t: update column3:` from t;
t: update column4:` from t;
I tried making it a function:
colNames:`column1`column2`column3`column4;
t:{update x:` from t}each colNames;
But this only added one column and called it x.
Any suggestions to improve this code will be greatly appreciated. I have to add a lot more than just 4 columns and my code is very long because of this. Thank you!

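There are various ways to achieve this. Given a table like
q)tab:([]col1:`a`b`c;col2:1 2 3)
you can amend it with @, adding all the new columns at once:
q)newcols:`col3`col4;
q)@[tab;newcols;:;`]
col1 col2 col3 col4
-------------------
a    1
b    2
c    3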
You can also specify a different null type for each new column:
q)@[tab;newcols;:;(`;0N)]
col1 col2 col3 col4
-------------------
a    1
b    2
c    3
Or use a functional update (passing the table name `tab as a symbol updates it in place):
q)![`tab;();0b;newcols!count[newcols]#enlist (),`]
`tab
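To make the idea concrete outside q: all three answers build every new column from one list of names instead of issuing one update per column. Below is a minimal Python sketch of that pattern, assuming a table is modelled as a dict mapping column names to equal-length lists (a rough stand-in for a kdb table, not kdb itself). `add_null_columns` is a hypothetical helper, and `None`/`""` merely stand in for q's typed nulls.

```python
from typing import Any, Dict, List, Sequence

Table = Dict[str, List[Any]]


def add_null_columns(table: Table, names: Sequence[str], nulls: Any = None) -> Table:
    """Return a copy of `table` with every name in `names` added as a null column.

    `nulls` is either one null value used for all new columns, or a tuple/list
    with one null per new column (mirroring (`;0N) in the q answer).
    """
    n = len(next(iter(table.values()), []))  # row count; 0 for a table with no columns
    if isinstance(nulls, (list, tuple)):
        if len(nulls) != len(names):
            raise ValueError("need one null value per new column")
        per_col = list(nulls)
    else:
        per_col = [nulls] * len(names)
    out = {k: list(v) for k, v in table.items()}  # copy, so the input is untouched
    for name, null in zip(names, per_col):
        out[name] = [null] * n
    return out


tab = {"col1": ["a", "b", "c"], "col2": [1, 2, 3]}
print(add_null_columns(tab, ["col3", "col4"]))
print(add_null_columns(tab, ["col3", "col4"], ("", None)))
```

However many names the list holds, the whole addition is still a single call, which is the point of the q versions too.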

Related

KDB/Q: How to write a function for select by

We know we can apply a built-in function such as avg:
select avg val by category from tab
But what if I need to write a complicated custom function, like
select myfunc by category from tab
where myfunc computes with multiple columns of tab? For example, inside myfunc I might do another layer of select ... by, some filtering, etc. As a basic example, how do I wrap the a+b+c+d below
select a+b+c+d by category from tab
inside myfunc, so that it has access to columns a, b, c and d and can manipulate them?

You can replace avg quite easily with your own function like so:
select {[a;b;c;d]a+b+c+d}[a;b;c;d] by category from tab
If you want to do it row by row, use each-both ('):
select {[a;b;c;d]a+b+c+d}'[a;b;c;d] by category from tab
Could you provide an example of what you are trying to achieve with the additional by/filtering inside the function? It doesn't seem to me like the best approach.

You can pass the columns into your function in a tabular format, e.g:
q)t:([]col1:10?`a`b`c;col2:10?10;col3:10?1f;col4:10?.z.D)
q)select {break}[([]col2;col3;col4)] by col1 from t
'break
[1] {break}
^
q))x
col2 col3      col4
-------------------------
9    0.5785203 2008.02.04
7    0.1959907 2003.07.05
8    0.6919531 2007.12.27
If you're going to use all the columns inside your function, another approach is to group the table into sub-tables and run the function on each sub-table:
func each t group t`col1
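For readers outside kdb, here is a rough Python sketch of what func each t group t`col1 does: split the rows into one sub-table per key (in first-seen order, as q's group does) and hand each whole sub-table to a function that can use every column, including its own filtering. It assumes rows are dicts; `group_apply`, `abcd_total` and `filtered_total` are hypothetical names, not q functions.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def group_apply(rows: List[Row], key: str,
                func: Callable[[List[Row]], Any]) -> Dict[Any, Any]:
    """Split `rows` into one sub-table per distinct value of `key`
    (first-seen order) and apply `func` to each sub-table."""
    groups: Dict[Any, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {k: func(sub) for k, sub in groups.items()}


def abcd_total(sub: List[Row]) -> int:
    """Hypothetical per-group function: it sees every column of its group."""
    return sum(r["a"] + r["b"] + r["c"] + r["d"] for r in sub)


def filtered_total(sub: List[Row]) -> int:
    """Hypothetical function that filters inside the group before aggregating."""
    return sum(r["a"] + r["b"] + r["c"] + r["d"] for r in sub if r["a"] > 1)


tab = [
    {"category": "x", "a": 1, "b": 2, "c": 3, "d": 4},
    {"category": "y", "a": 5, "b": 6, "c": 7, "d": 8},
    {"category": "x", "a": 1, "b": 1, "c": 1, "d": 1},
]
print(group_apply(tab, "category", abcd_total))
print(group_apply(tab, "category", filtered_total))
```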

How do I remove multiple columns from a table?

So there's delete col from table to delete a single column. I suppose I could use over to delete multiple columns. But:
I'm not sure if this is efficient at all.
I'm not quite sure how to use over correctly here. Something like this doesn't work: {delete y from x}/[t;`name`job]

You can delete multiple columns the same way you can select multiple columns:
delete col1,col2 from table
It would definitely be less efficient to use over in this case.
There are however examples where you may want to pass column names as symbols into a function that does a select, or delete.
To do so requires using the functional form of delete: https://code.kx.com/q/ref/funsql/
Example of a functional delete:
q)table:([]col1:1 2 3;col2:1 2 3;col3:10 20 30)
q)//functional delete
q){![table;();0b;x]} `col1`col2
col3
----
10
20
30
q)//inplace functional delete
q){![`table;();0b;x]} `col1`col2
`table
q)table
col3
----
10
20
30

For an in-memory table you can also use drop: http://code.kx.com/q/ref/lists/#_-drop
q)((),`col1)_table
col2 col3
---------
1    10
2    20
3    30
q)((),`col1`col3)_table
col2
----
1
2
3
q)((),`)_table
col1 col2 col3
--------------
1    1    10
2    2    20
3    3    30

I can't comment below etc211's solution, so I've started another answer post.
Regarding this comment: "Hmm, functional delete doesn't seem to work when the list of columns is empty. I'd expect that not to touch the table at all, and yet it deletes all the rows in it instead."
To handle that, why not first select only those target columns that actually exist in the table?
Let's assume your table t contains the columns col1, col2, col3, col4,
and you want to delete col5 and col6.
In q:
tgt_cols:`col5`col6;
filtered_cols: (cols t) inter tgt_cols;
if[0 < count filtered_cols;
{![`t;();0b;x]} filtered_cols];
The code above first checks which of the target columns exist in t. Only if at least one of them does is the functional delete run, so an empty column list never reaches it and the table is left untouched.
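The guard pattern itself is language-independent. A minimal Python sketch, assuming a table is a dict of column lists (not a real kdb table); `existing_targets` and `delete_columns` are hypothetical helpers: intersect the requested names with the actual columns first, and only delete when something is left.

```python
from typing import Any, Dict, Iterable, List

Table = Dict[str, List[Any]]


def existing_targets(table: Table, targets: Iterable[str]) -> List[str]:
    """Like q's (cols t) inter tgt_cols: the targets that really are
    columns of `table`, kept in the table's column order."""
    wanted = set(targets)
    return [c for c in table if c in wanted]


def delete_columns(table: Table, targets: Iterable[str]) -> Table:
    """Drop the existing target columns; with nothing to drop, return the
    table untouched instead of issuing an 'empty' delete."""
    drop = existing_targets(table, targets)
    if not drop:  # same role as the if[0 < count filtered_cols; ...] guard
        return table
    return {k: v for k, v in table.items() if k not in drop}


t = {"col1": [1, 2], "col2": [3, 4], "col3": [5, 6], "col4": [7, 8]}
print(list(delete_columns(t, ["col5", "col6"])))
print(list(delete_columns(t, ["col2", "col5"])))
```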

Row to Column conversion in Talend

I am learning Talend Open Studio. I want to implement a scenario where one row is converted into 3 rows. My source is like:
Col1 Col2 Col3
a b c
I want to get the output like below
Col
a
b
c
I have tried tColumnToPivotDelimited, but it failed.

Here is the solution :
In your tMap, concatenate the columns with a delimiter (";" for example), then normalize the resulting column on that same delimiter (e.g. with tNormalize), which splits it into one row per value.
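Outside Talend, the two steps amount to a join followed by a split. The Python sketch below only illustrates that data flow (it is not Talend code); `row_to_rows` is a hypothetical name, and it assumes no column value contains the delimiter, since such a value would be split into extra rows.

```python
from typing import Dict, List, Sequence


def row_to_rows(row: Dict[str, str], columns: Sequence[str],
                delimiter: str = ";") -> List[Dict[str, str]]:
    """Turn one input row into one output row per column value."""
    # Step 1, the tMap expression: concatenate the columns with the delimiter.
    joined = delimiter.join(row[c] for c in columns)
    # Step 2, the normalize step: split on the same delimiter, one row per piece.
    return [{"Col": value} for value in joined.split(delimiter)]


source = {"Col1": "a", "Col2": "b", "Col3": "c"}
for out_row in row_to_rows(source, ["Col1", "Col2", "Col3"]):
    print(out_row["Col"])
```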

RE: Greatest Value with column name

Greatest value of multiple columns with column name?
I was reading the question above (link above) and the "ACCEPTED" answer (which seems correct) and have several questions concerning this answer.
(Sorry I have to create a new post, don't have a high enough reputation to comment on the old post as it seems very old)
Questions
My first question is: what is the significance of "@var_max_val := "? I reran the query without it and everything ran fine.
My second question is: can someone explain how this achieves its desired result:
CASE @var_max_val WHEN col1 THEN 'col1'
WHEN col2 THEN 'col2'
...
END AS max_value_column_name
My third question is as follows:
It seems that in this CASE statement the author has to manually write a line of code ("WHEN x THEN y") for every column in the table. This is fine if you have 1-5 columns, but what if you had 10,000? How would you go about it?
PS: I might be violating some forum rules in this post, do let me know if I am.
Thank you for reading, and thank you for your time!

The linked question is about MySQL, so it does not apply to PostgreSQL (e.g. the @var_max_val syntax is specific to MySQL). To accomplish the same thing in PostgreSQL you can use a LATERAL subquery. For example, suppose that you have the following table and sample data:
CREATE TABLE t(col1 int, col2 int, col3 int);
INSERT INTO t VALUES (1,2,3), (5,8,6);
Then you can identify the maximum column for each row with the following query:
SELECT *
FROM t, LATERAL (
VALUES ('col1',col1),('col2',col2),('col3',col3)
ORDER BY 2 DESC
LIMIT 1
) l(maxcolname, maxcolval);
which produces the following output:
 col1 | col2 | col3 | maxcolname | maxcolval
------+------+------+------------+-----------
    1 |    2 |    3 | col3       |         3
    5 |    8 |    6 | col2       |         8
I think this solution is much more elegant than the one presented in the linked article for mysql.
As for having to manually write the code, unfortunately, I do not think you can avoid that.
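If typing the list by hand is the problem, one workaround is to generate the query text from the column names (which could be read from information_schema.columns, for example). A minimal Python sketch; `max_column_query` is a hypothetical helper that only builds the SQL string shown above, assuming the names are plain lower-case identifiers that need no quoting. All listed columns must also have compatible types, since they end up in a single VALUES column.

```python
from typing import Sequence


def max_column_query(table: str, columns: Sequence[str]) -> str:
    """Build the LATERAL max-column query for any number of columns.

    Assumes plain lower-case identifiers that need no quoting."""
    values = ",".join(f"('{c}',{c})" for c in columns)
    return (
        f"SELECT *\nFROM {table}, LATERAL (\n"
        f"VALUES {values}\n"
        "ORDER BY 2 DESC\n"
        "LIMIT 1\n"
        ") l(maxcolname, maxcolval);"
    )


print(max_column_query("t", ["col1", "col2", "col3"]))
```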

In Postgres 9.5 you can use jsonb functions to get the column names, so you do not have to write out all the column names manually. The solution needs a primary key (or a unique column) for proper grouping:
create table a_table(id serial primary key, col1 int, col2 int, col3 int);
insert into a_table (col1, col2, col3) values (1,2,3), (5,8,6);
select distinct on(id) id, key, value
from a_table t, jsonb_each(to_jsonb(t))
where key <> 'id'
order by id, value desc;
 id | key  | value
----+------+-------
  1 | col3 | 3
  2 | col2 | 8
(2 rows)
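The jsonb_each trick works because each row is turned into key/value pairs that can be sorted like ordinary data. The same idea as a minimal Python sketch, assuming each row is a dict (playing the role of to_jsonb(t)); `max_key_per_row` is a hypothetical name. On ties, Python's max keeps the first key, whereas DISTINCT ON makes no such promise unless the ORDER BY breaks the tie.

```python
from typing import Any, Dict, List, Tuple


def max_key_per_row(rows: List[Dict[str, Any]],
                    id_key: str = "id") -> List[Tuple[Any, str, Any]]:
    """For each row, return (id, column name, value) of its largest column,
    looping over the row's key/value pairs instead of naming columns."""
    result = []
    for row in rows:
        pairs = [(k, v) for k, v in row.items() if k != id_key]  # like where key <> 'id'
        key, value = max(pairs, key=lambda kv: kv[1])            # like order by value desc
        result.append((row[id_key], key, value))
    return result


a_table = [
    {"id": 1, "col1": 1, "col2": 2, "col3": 3},
    {"id": 2, "col1": 5, "col2": 8, "col3": 6},
]
print(max_key_per_row(a_table))
```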

Grouping data in postgresql

If I have a table with multiple entries with the same name, I want to group by the name only, i.e. show as many rows as there are in the table, but with the name appearing only once, on the first row of its group, while the other data still appears on every row. For the other rows, the name should be blank:
table        expected result
----------   ---------------
col1 col2    col1 col2
a    5       a    5
a    6            6
a    8            8
b    3       b    3
b    4            4
I'm using PostgreSQL 9.2.

You could use row_number to determine the first occurrence of each group, and from there, it's just a case away from not displaying it:
SELECT CASE rn WHEN 1 THEN col1 ELSE NULL END AS col1, col2
FROM (SELECT col1,
             col2,
             ROW_NUMBER() OVER (PARTITION BY col1
                                ORDER BY col2 ASC) AS rn
      FROM my_table) t
ORDER BY t.col1, t.rn
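To try the query without a PostgreSQL server, here is a sketch that runs it against an in-memory SQLite database through Python's sqlite3 module, assuming a SQLite build with window-function support (3.25 or newer). SQLite is only a stand-in here; the sample rows are the ones from the question. The ORDER BY sits on the outer query because an ORDER BY inside a subquery does not guarantee the order of the final result.

```python
import sqlite3

# In-memory SQLite stand-in for the PostgreSQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("a", 5), ("a", 6), ("a", 8), ("b", 3), ("b", 4)],
)

query = """
SELECT CASE rn WHEN 1 THEN col1 ELSE NULL END AS col1, col2
FROM (SELECT col1,
             col2,
             ROW_NUMBER() OVER (PARTITION BY col1
                                ORDER BY col2 ASC) AS rn
      FROM my_table) t
ORDER BY t.col1, t.rn
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)  # NULL comes back as None
conn.close()
```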

Firstly, I should say that I don't have experience with PostgreSQL, just some basic SQL knowledge. It is not right to change the data in the original table itself; what you want is a 'view' of the data. Usually such things are done after the data set is returned to the client: it is really a matter of how the data is displayed (a presentation concern), so it should be handled on the client side rather than in the SQL query. But if you really want the server to do this, I would do the following: create a copy of the table (it can be a temp table), then clear the col1 values that are not the first in their group when the records are ordered by col2. By the way, your table does not have a primary key, so you will have trouble implementing that, since you can't identify the parent record within the subsequent select.
So doing what you need on the client side (via a data cursor), simply traversing the records one by one, makes even more sense.
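A minimal sketch of that client-side approach in Python, assuming the rows have already been fetched as (name, value) tuples sorted by name and then value; `blank_repeated_names` is a hypothetical helper. The table itself is never modified; only the displayed copy changes.

```python
from typing import Any, List, Tuple


def blank_repeated_names(rows: List[Tuple[str, Any]]) -> List[Tuple[str, Any]]:
    """Walk (name, value) rows already sorted by name, then value, and
    blank the name on every row after the first one of its group."""
    previous: Any = object()  # sentinel that never equals a real name
    out = []
    for name, value in rows:
        out.append(("" if name == previous else name, value))
        previous = name
    return out


fetched = [("a", 5), ("a", 6), ("a", 8), ("b", 3), ("b", 4)]  # e.g. ORDER BY col1, col2
for name, value in blank_repeated_names(fetched):
    print(f"{name:4} {value}")
```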
