Add column to table in kdb based on existing columns? - kdb

I want to add a new column to a kdb table. It should be based on the existing columns, populated with whichever value is non-null, as below:
q)t:([]a:`a`b`c`d`e`f`g`h;b:1 0n 3 4 0n 6 0n 8;c:0n 2 0n 0n 5 0n 7 0n)
q)t
a b c
-----
a 1
b   2
c 3
d 4
e   5
f 6
g   7
h 8
I want to add a column d that takes whichever value from b or c isn't null,
to produce a table like this:
a b c d
-------
a 1   1
b   2 2
c 3   3
d 4   4
e   5 5
f 6   6
g   7 7
h 8   8
I tried concatenating but then it has the null in it:
q)update d:(b,'c)from t
a b c d
----------
a 1 1
b 2 2
c 3 3
d 4 4
e 5 5
f 6 6
g 7 7
h 8 8

A vector conditional might be what you’re after, something like the below:
update d:?[null b;c;b] from t
You can read more about vector conditionals here. This expects a Boolean list as the first argument and returns values from a list in the second argument where True, or values from a list in the third argument where False.
For example:
q)?[10101b;"abcde";"ABCDE"]
"aBcDe"
When used in conjunction with a select/update statement, columns of the table can be specified as the arguments to the vector conditional as these are simply lists.
As an aside, the null keyword returns a Boolean true where a value is null and is useful as part of your solution.
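The element-by-element selection described above can be sketched in Python. This is only an illustration of the semantics, not q code: the helper names is_null and vector_cond are made up for this example, and treating None and NaN as "null" is an assumption standing in for q's typed nulls.

```python
import math

def is_null(v):
    """Treat None and float NaN as null (an assumed stand-in for q's null)."""
    return v is None or (isinstance(v, float) and math.isnan(v))

def vector_cond(mask, if_true, if_false):
    """Pick if_true[i] where mask[i] is true, otherwise if_false[i]."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]

nan = float("nan")
b = [1.0, nan, 3.0, 4.0, nan, 6.0, nan, 8.0]
c = [nan, 2.0, nan, nan, 5.0, nan, 7.0, nan]

# Same shape as ?[null b;c;b]: take c where b is null, else b.
d = vector_cond([is_null(v) for v in b], c, b)
print(d)  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```

The mask is computed once from b, so the choice between the two columns is made independently for each row.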

You can use the ^(fill) operator.
q)t:([]a:`a`b`c`d`e`f`g`h;b:1 0n 3 4 0n 6 0n 8;c:0n 2 0n 0n 5 0n 7 0n)
q)update d:b^c from t
a b c d
-------
a 1   1
b   2 2
c 3   3
d 4   4
e   5 5
f 6   6
g   7 7
h 8   8
It is worth noting that if a row has non-null values in both b and c, the query above takes the value from c. If you would prefer b to take precedence, switch the arguments:
q)t:([]a:`a`b`c`d`e`f`g`h;b:1 0n 3 4 0n 6 0n 8;c:0n 2 0n 0n 5 100 7 0n)
q)update d:b^c from t
a b c   d
-----------
a 1     1
b   2   2
c 3     3
d 4     4
e   5   5
f 6 100 100
g   7   7
h 8     8
q)update d:c^b from t
a b c   d
---------
a 1     1
b   2   2
c 3     3
d 4     4
e   5   5
f 6 100 6
g   7   7
h 8     8
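To make the precedence rule concrete, here is a small Python sketch of the same idea. The function fill is a hypothetical helper written for this example (it is not a q or library API), and NaN is assumed to play the role of the float null.

```python
import math

def fill(x, y):
    """Rough analogue of q's x^y for equal-length float lists:
    keep y[i] unless it is NaN, in which case fall back to x[i]."""
    return [xi if math.isnan(yi) else yi for xi, yi in zip(x, y)]

nan = float("nan")
b = [6.0, nan, 8.0]
c = [100.0, 7.0, nan]

print(fill(b, c))  # [100.0, 7.0, 8.0]  like b^c: c wins when both are set
print(fill(c, b))  # [6.0, 7.0, 8.0]    like c^b: b wins when both are set
```

The right-hand argument is the one being filled, so it is the one that takes precedence whenever it has a value.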

You could use the 'or' (|) operator.
q)update d:b|c from t
Concatenation gives you a list containing the items from both the 'b' and 'c' columns; it does not remove nulls. 'or' compares each pair of 'b' and 'c' values and returns the maximum of that pair. Because a null sorts below every number, the result is whichever of 'b' or 'c' is non-null. Note that if both are non-null, you get the larger of the two.
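A short Python sketch of that behaviour, under the assumption that NaN stands in for the q null and ranks below every number. The name null_low_max is invented for this illustration.

```python
import math

def null_low_max(x, y):
    """Elementwise maximum in which NaN ranks below every number,
    approximating how | treats nulls in the answer above."""
    def rank(v):
        return -math.inf if math.isnan(v) else v
    return [xi if rank(xi) >= rank(yi) else yi for xi, yi in zip(x, y)]

nan = float("nan")
print(null_low_max([1.0, nan, 3.0], [nan, 2.0, nan]))  # [1.0, 2.0, 3.0]
# When both sides have a value, the larger one is returned,
# regardless of which column it came from.
print(null_low_max([6.0], [100.0]))  # [100.0]
print(null_low_max([100.0], [6.0]))  # [100.0]
```

This is why the approach is only safe when at most one of the two columns is populated per row.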

Can use fill here - https://code.kx.com/wiki/Reference/Caret
q)t:([]a:`a`b`c`d`e`f`g`h;b:1 0n 3 4 0n 6 0n 8;c:0n 2 0n 0n 5 0n 7 0n)
q)update d:c^b from t
a b c d
-------
a 1   1
b   2 2
c 3   3
...

Related

separating the records in a kdb table

There is a table with a column that I would like to break into multiple records. For example
q)tab:([]a:1 2 3;b:(`a;`$"b c";`d);c:2 3 4)
q)tab
a b   c
-------
1 a   2
2 b c 3
3 d   4
There is a space between b and c in the second entry of column b, I would like the table to become
a b c
-----
1 a 2
2 b 3
2 c 3
3 d 4
I tried
" " string vs exec b from tab
but it didn't work.
Any idea?
Since b is the column with multiple entries per row, you can count each value and expand the corresponding row entries accordingly. Then ungroup like Terry mentioned should work.
q)t:([]a:1 2 3;b:(`a;`b`c;`d);c:2 3 4)
q)![t;();0b;{x!(enlist({(count each x)#'y};`b)),/:x}cols t]
a   b    c
------------
,1  ,`a  ,2
2 2 `b`c 3 3
,3  ,`d  ,4
q)ungroup ![t;();0b;{x!(enlist({(count each x)#'y};`b)),/:x}cols t]
a b c
-----
1 a 2
2 b 3
2 c 3
3 d 4
EDIT: Realised after your comment that the input is different. I think this is what you want.
q)t:([]a:1 2 3;b:(`a;`$"b c";`d);c:2 3 4)
q)ungroup update`$" "vs'string b from t
a b c
-----
1 a 2
2 b 3
2 c 3
3 d 4
You would normally do this using ungroup:
q)ungroup([]a:1 2 3;b:((),`a;`b`c;(),`d);c:2 3 4)
a b c
-----
1 a 2
2 b 3
2 c 3
3 d 4
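For readers less familiar with q, the split-then-ungroup idea can be sketched in Python. This is an illustrative analogue only: rows are modelled as dicts, and split_rows is a hypothetical helper name.

```python
def split_rows(rows, col, sep=" "):
    """Emit one output row per sep-separated item in rows[i][col],
    copying the other fields unchanged (similar in spirit to ungroup)."""
    out = []
    for row in rows:
        for part in row[col].split(sep):
            out.append({**row, col: part})
    return out

tab = [
    {"a": 1, "b": "a", "c": 2},
    {"a": 2, "b": "b c", "c": 3},
    {"a": 3, "b": "d", "c": 4},
]

for row in split_rows(tab, "b"):
    print(row["a"], row["b"], row["c"])
# 1 a 2
# 2 b 3
# 2 c 3
# 3 d 4
```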

SPSS Modeler group by and select top n rows

I would like to know the proper way in SPSS to group data by a specific column and then find the top n maximum values.
For example I have below columns:
x<-c(3,2,1,8,7,11,10,9,7,5,4)
y<-c("a","a","a", "b","b","c","c","c","c","c","c")
z<-c(2,2,2,1,1,3,3,3,3,3,3)
I want to select the top n values from column x for each group in column y.
    x y z
1   3 a 2
2   2 a 2
3   1 a 2
4   8 b 1
5   7 b 1
6  11 c 3
7  10 c 3
8   9 c 3
9   7 c 3
10  5 c 3
11  4 c 3

PostgreSQL, sum data from row of table?

x a b c d
----------
A 1 2 3 4
B 5 6 7 8
C 6 7 8 9
I want the sum for A to be 1 + 2 + 3 + 4, and likewise for B and C. Is there any command that can sum a row of data in PostgreSQL?
There is no such built-in function, but you can simply do the following:
select x, a+b+c+d as column_sum from mytable
Assuming, of course, that the data types of a, b, c and d are numeric.

How to sum across a row in KDB/Q

I have a table rCom which has various columns. I would like to sum across each row..
for example:
Date TypeA TypeB TypeC TypeD
date1 40.5 23.1 45.1 65.2
date2 23.3 32.2 56.1 30.1
How can I write a q query to add a new column 'Total' that sums across each row?
Why not just:
update Total:TypeA+TypeB+TypeC+TypeD from rCom
Sum will work just fine:
q)flip`a`b`c!3 3#til 9
a b c
-----
0 3 6
1 4 7
2 5 8
q)update d:sum(a;b;c) from flip`a`b`c!3 3#til 9
a b c d
--------
0 3 6 9
1 4 7 12
2 5 8 15
sum also has map-reduce support, which will be better for a huge table.
One quick point regarding summing across rows: be careful about a null in one column producing a null result for the whole sum. Borrowing @WooiKent Lee's example:
We put a null into the first position of the a column. Notice how our sum now becomes null:
q)wn:.[flip`a`b`c!3 3#til 9;(0;`a);first 0#] //with null
q)update d:sum (a;b;c) from wn
a b c d
--------
  3 6
1 4 7 12
2 5 8 15
This is a direct effect of the way nulls in q are treated. If you sum across a simple list, the nulls are ignored
q)sum 1 2 3 0N
6
However, a sum across a general list will not display this behavior
q)sum (),/:1 2 3 0N
,0N
So, for your table situation, you might want to fill in with a zero beforehand
q)update d:sum 0^(a;b;c) from wn
a b c d
--------
  3 6 9
1 4 7 12
2 5 8 15
Alternatively, arrange it so that you are actually summing across simple lists rather than general lists.
q)update d:sum each flip (a;b;c) from wn
a b c d
--------
  3 6 9
1 4 7 12
2 5 8 15
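The two behaviours, a null poisoning the row total versus being skipped, can be compared side by side in a small Python sketch. Here None is assumed to stand in for the q null, and row_sums is a hypothetical helper written only for this illustration.

```python
def row_sums(columns, skip_nulls=True):
    """Sum across rows of equal-length columns.
    skip_nulls=False: any None in a row makes that row's total None.
    skip_nulls=True:  None values are ignored (treated like zero)."""
    totals = []
    for values in zip(*columns):
        if skip_nulls:
            totals.append(sum(v for v in values if v is not None))
        elif any(v is None for v in values):
            totals.append(None)
        else:
            totals.append(sum(values))
    return totals

a = [None, 1, 2]
b = [3, 4, 5]
c = [6, 7, 8]

print(row_sums([a, b, c], skip_nulls=False))  # [None, 12, 15]
print(row_sums([a, b, c]))                    # [9, 12, 15]
```

Which behaviour is correct depends on the data: a missing value silently counted as zero can hide a gap, so the choice is worth making explicitly.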
For a more complete reference on null treatment please see the reference website
This is what worked:
select Answer:{[x;y;z;a] x+y+z+a }'[TypeA;TypeB;TypeC;TypeD] from
([] dt:2014.01.01 2014.01.02 2014.01.03; TypeA:4 5 6; TypeB:1 2 3; TypeC:8 9 10; TypeD:3 4 5)

Conditionally replacing cell values with column names

I have a 165 x 165 rank matrix such that each row has values ranging from 1-165. I want to parse each row and delete all values > 5, sort each row in increasing order, then replace the values 1-5 with the name of the column from the original matrix.
For example, for row k the values 1, 2, 3, 4, 5 would result after the first two transformations and would be replaced by p, d, m, n, a.
I am assuming that your array consists of an array of arrays...
Neither Awk, Sed, nor Perl has true multi-dimensional arrays. However, they can be emulated in Perl by using arrays of arrays.
$a[0]->[0] = xx;
$a[0]->[1] = yy;
[...]
$a[0]->[164] = zz;
$a[1]->[0] = qq;
$a[1]->[1] = rr;
[...]
$a[164]->[164] = vv;
Does this make sense?
I'm calling the row $x and columns $y, so an element in your array will be $array[$x]->[$y]. Is that good?
Okay, your column names will be in row $array[0], so if we find a value of five or less in $array[$x]->[$y], we know the column name is in $array[0]->[$y]. Is that good?
for my $x (1..164) {    # First row is column names
    for my $y (0..164) {
        if ($array[$x]->[$y] <= 5) {
            $array[$x]->[$y] = $array[0]->[$y];
        }
    }
}
I'm simply going through all the rows, and for each row, all the columns, and checking the value. If the value is less than or equal to five, I replace it with the column name.
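The same walk over rows and columns might look like this in Python. It is a sketch under the same assumption as the Perl above (row 0 of the matrix holds the column names); label_small_ranks and the limit parameter are names invented for this example.

```python
def label_small_ranks(matrix, limit=5):
    """Return a copy of matrix in which every value <= limit in the data rows
    is replaced by the column name found in row 0. Row 0 is kept as-is."""
    header, *rows = matrix
    labelled = [
        [header[j] if value <= limit else value for j, value in enumerate(row)]
        for row in rows
    ]
    return [header] + labelled

matrix = [
    ["a", "b", "c"],
    [1, 7, 3],
    [9, 2, 6],
]
print(label_small_ranks(matrix))
# [['a', 'b', 'c'], ['a', 7, 'c'], [9, 'b', 6]]
```

Unlike the Perl loop, this version builds a new matrix rather than modifying the input in place, which keeps the original ranks available for later sorting.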
I hope I'm not doing your homework for you.
This GNU sed solution might work although it will need scaling up as I only used a 10x10 matrix for testing purposes:
# { echo {a..j};for x in {1..10};do seq 1 10 | shuf |sed 'N;N;N;N;N;N;N;N;N;s/\n/ /g';done; }> test_data
# cat test_data
a b c d e f g h i j
4 5 9 3 6 2 10 8 7 1
3 7 4 2 1 6 10 5 8 9
10 9 3 1 2 7 8 5 6 4
5 10 4 9 7 8 1 3 6 2
8 6 5 9 1 4 3 2 7 10
2 8 9 3 5 6 10 1 4 7
3 9 8 2 1 4 10 6 7 5
3 7 2 1 8 6 10 4 5 9
1 10 8 3 6 5 4 2 7 9
7 2 3 5 6 1 10 4 8 9
# cat test_data |
sed -rn '1{h;d};s/[0-9]{2,}|[6-9]/0/g;G;s/\n|$/ &/g;s/$/&1 2 3 4 5 /;:a;s/^(\S*) (.*\n)(\S* )(.*)/\2\4\1\3/;ta;s/\n//;s/0[^ ]? //g;:b;s/([1-5])(.*)\1(.)/\3\2/;tb;p'
j f d a b
e d a c h
d e c j h
g j h c a
e h g f c
h a d i e
e d a f j
d c a h i
a h d g f
f b c h d
The sed command works as follows.
The first line of the data file, which contains the column headings, is stored in the hold space and then the pattern space (current line) is deleted. For all subsequent data lines, every number with two or more digits and every value from 6 to 9 is converted to 0. The column names are appended, along with a newline, to the data values. Spaces are inserted before the newline and at the end of the string. The data is transformed into a lookup and the sorted values, i.e. 1 2 3 4 5, are prepended to it. The newline is removed along with any 0 values and their associated lookups. The values 1 to 5 are then replaced by the column names in the lookup.
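As a cross-check on what the sed pipeline produces, the same result can be reached with a short Python sketch: keep the values up to a limit, sort them, and emit the matching column names. The function name top_names is hypothetical; the header and rows are the first two data lines of the test data shown above.

```python
def top_names(header, row, limit=5):
    """Return the column names whose rank in row is <= limit,
    ordered by increasing rank."""
    ranked = sorted(
        (value, name) for value, name in zip(row, header) if value <= limit
    )
    return [name for _, name in ranked]

header = "a b c d e f g h i j".split()
print(" ".join(top_names(header, [4, 5, 9, 3, 6, 2, 10, 8, 7, 1])))  # j f d a b
print(" ".join(top_names(header, [3, 7, 4, 2, 1, 6, 10, 5, 8, 9])))  # e d a c h
```

Both lines agree with the first two lines of the sed output, which suggests the two approaches implement the same transformation for this input.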
EDIT:
I may have misunderstood the problem regarding sorting columns or rows, if so it's a minimal fix - replace 1 2 3 4 5 by the original values and perform a numeric sort prior to replacing the numeric data with column names from the lookup.