summarise (avg) table (keyed) for each row - kdb

Given a keyed table, e.g.:
q)\S 7 / seed random numbers for reproducibility
q)v:flip (neg[d 0]?`1)!#[;prd[d]?12] d:4 6 / 4 cols 6 rows
q)show kt:([]letter:d[1]#.Q.an)!v
letter| c g b e
------| ----------
a | 11 0 3 9
b | 11 8 10 0
c | 7 2 2 3
d | 8 4 9 6
e | 0 0 5 0
f | 1 0 0 11
How to calculate an average for each row, e.g. (c+g+b+e)%4, for any number of columns?

Following on from your own solution, note that you have to be a little careful with null handling. Your approach won't ignore nulls in the way that avg normally would.
q).[`kt;("a";`g);:;0N];
q)update av:avg flip value kt from kt
letter| c g b e av
------| ---------------
a | 11 3 9
b | 11 8 10 0 7.25
c | 7 2 2 3 3.5
d | 8 4 9 6 6.75
e | 0 0 5 0 1.25
f | 1 0 0 11 3
To make it ignore nulls you have to avg each row rather than averaging the flip.
q)update av:avg each value kt from kt
letter| c g b e av
------| -------------------
a | 11 3 9 7.666667
b | 11 8 10 0 7.25
c | 7 2 2 3 3.5
d | 8 4 9 6 6.75
e | 0 0 5 0 1.25
f | 1 0 0 11 3
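For readers who think more easily in Python, the difference between the two behaviours can be sketched as follows. This is only an illustration: None stands in for the q null, the two helper names are made up for this example, and only rows a and b of the table are reproduced.

```python
# Rows a and b from the table above; None plays the role of the q null 0N.
rows = {
    "a": [11, None, 3, 9],
    "b": [11, 8, 10, 0],
}

def avg_propagating(values):
    """Sum-then-divide across columns: a single null makes the result null."""
    if any(v is None for v in values):
        return None
    return sum(values) / len(values)

def avg_ignoring(values):
    """Average one row at a time, dropping nulls before averaging."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None

# For each row: (null-propagating average, null-ignoring average)
result = {k: (avg_propagating(v), avg_ignoring(v)) for k, v in rows.items()}
print(result)
```

Row a gives a null in the first case and the average of the three remaining values in the second, while row b is the same either way.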

Solution 1: q-sql
q)update av:avg flip value kt from kt
letter| c g b e av
------| ---------------
a | 11 0 3 9 5.75
b | 11 8 10 0 7.25
c | 7 2 2 3 3.5
d | 8 4 9 6 6.75
e | 0 0 5 0 1.25
f | 1 0 0 11 3
Solution 2: functional q-sql
tl;dr:
q)![kt;();0b;](1#`av)!enlist(avg;)enlist,cols[`kt]except cols key`kt
letter| c g b e av
------| ---------------
a | 11 0 3 9 5.75
b | 11 8 10 0 7.25
c | 7 2 2 3 3.5
d | 8 4 9 6 6.75
e | 0 0 5 0 1.25
f | 1 0 0 11 3
let's start by looking at the parse tree of a non-general solution:
q)parse"update av:avg (c;g;b;e) from kt"
!
`kt
()
0b
(,`av)!,(avg;(enlist;`c;`g;`b;`e))
(note that q is a wrapper implemented in k, so the , prefix operator in the above expression is the same as the enlist keyword in q)
so all of the expressions below are equivalent (verify with ~). relying on projection, (x;y)~(x;)y, we can further improve readability by reducing the distance between parens:
q)k)(!;`kt;();0b;(,`av)!,(avg;(enlist;`c;`g;`b;`e)))
q)(!;`kt;();0b;(enlist`av)!enlist(avg;(enlist;`c;`g;`b;`e)))
q)(!;`kt;();0b;(1#`av)!enlist(avg;(enlist;`c;`g;`b;`e)))
q)(!;`kt;();0b;)(1#`av)!enlist(avg;)(enlist;`c;`g;`b;`e)
let's evaluate the parse tree to check:
q)eval(!;`kt;();0b;)(1#`av)!enlist(avg;)(enlist;`c;`g;`b;`e)
letter| c g b e av
------| ---------------
a | 11 0 3 9 5.75
b | 11 8 10 0 7.25
c | 7 2 2 3 3.5
d | 8 4 9 6 6.75
e | 0 0 5 0 1.25
f | 1 0 0 11 3
(enlist;`c;`g;`b;`e) in the general case is:
q)enlist,cols[`kt]except cols key`kt
enlist
`c
`g
`b
`e
so let's plug in and check:
q)eval(!;`kt;();0b;(1#`av)!enlist(avg;)enlist,cols[`kt]except cols key`kt)
letter| c g b e av
------| ---------------
a | 11 0 3 9 5.75
b | 11 8 10 0 7.25
c | 7 2 2 3 3.5
d | 8 4 9 6 6.75
e | 0 0 5 0 1.25
f | 1 0 0 11 3
also:
q)![`kt;();0b;(1#`av)!enlist(avg;)enlist,cols[`kt]except cols key`kt]
q)![ kt;();0b;](1#`av)!enlist(avg;)enlist,cols[`kt]except cols key`kt

Why is my conforming dictionary not getting turned into a table?

Let's say I have a table:
m:([] t: raze 3#'(2021.01.04+til 5); sym:15#`A`B`C; c: til 15)
t sym c
-----------------
2021.01.04 A 0
2021.01.04 B 1
2021.01.04 C 2
2021.01.05 A 3
2021.01.05 B 4
When I try to pivot it:
exec t!c by sym:sym from m
sym|
---| -----------------------------------------------------------------
A | 2021.01.04 2021.01.05 2021.01.06 2021.01.07 2021.01.08!0 3 6 9 12
B | 2021.01.04 2021.01.05 2021.01.06 2021.01.07 2021.01.08!1 4 7 10 13
C | 2021.01.04 2021.01.05 2021.01.06 2021.01.07 2021.01.08!2 5 8 11 14
I'd expect to get a table back, with columns sym, but I don't. What am I doing wrong?
if you're after a pivot with columns of sym you would want the following:
q)exec sym!c by t:t from m
t | A B C
----------| --------
2021.01.04| 0 1 2
2021.01.05| 3 4 5
2021.01.06| 6 7 8
2021.01.07| 9 10 11
2021.01.08| 12 13 14
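As a rough Python analogue of the two exec forms, the sketch below rebuilds the table m as a list of records and pivots it in both directions. The pivot helper is hypothetical and only illustrates which column ends up as the row key and which as the column names.

```python
from collections import defaultdict
from datetime import date, timedelta

# Rebuild m: five dates, each repeated for syms A, B, C, with c = 0..14.
days = [date(2021, 1, 4) + timedelta(days=i) for i in range(5)]
pairs = [(d, s) for d in days for s in "ABC"]
m = [{"t": d, "sym": s, "c": i} for i, (d, s) in enumerate(pairs)]

def pivot(records, row_key, col_key, value):
    """Hypothetical helper: one dict per row_key, keyed by col_key."""
    out = defaultdict(dict)
    for r in records:
        out[r[row_key]][r[col_key]] = r[value]
    return dict(out)

by_t = pivot(m, "t", "sym", "c")    # like: exec sym!c by t:t from m
by_sym = pivot(m, "sym", "t", "c")  # like: exec t!c by sym:sym from m
print(by_t[date(2021, 1, 4)])
```

In by_t the inner keys are the syms, which is what makes the q result collapse into a table with columns A, B and C.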
It's because your column names have to be symbols:
q)exec(`$string t)!c by sym:sym from m
sym| 2021.01.04 2021.01.05 2021.01.06 2021.01.07 2021.01.08
---| ------------------------------------------------------
A | 0 3 6 9 12
B | 1 4 7 10 13
C | 2 5 8 11 14
These would be terrible column names though, so I would use .Q.id
q).Q.id exec(`$string t)!c by sym:sym from m
sym| a20210104 a20210105 a20210106 a20210107 a20210108
---| -------------------------------------------------
A | 0 3 6 9 12
B | 1 4 7 10 13
C | 2 5 8 11 14
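The renaming visible in the output above can be imitated in Python as below. This is not a port of .Q.id; safe_name is a hypothetical helper that only mirrors what the output shows, namely that non-alphanumeric characters are dropped and a leading digit gets an "a" prefix.

```python
import re

def safe_name(raw):
    """Hypothetical helper: drop non-alphanumerics, prefix 'a' if a digit leads."""
    cleaned = re.sub(r"[^0-9A-Za-z]", "", raw)
    return "a" + cleaned if cleaned[:1].isdigit() else cleaned

names = [safe_name(d) for d in ["2021.01.04", "2021.01.05", "sym"]]
print(names)
```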
It sounds like this isn't what you actually want though, so maybe Matthew's answer is more relevant. My answer just explains why the result didn't look the way you expected.

The simple recursive function doesn't work well

It's just an easy recursive function test.
It should stop at n = 3, but it doesn't.
Could you please tell me what is wrong in my code?
Thank you!
>> recursiveFunction(0)
101
1
g
102
1
g
103
1
2
3
2
g
103
1
2
3
3
g
103
1
2
3
2
g
102
1
g
103
1
2
3
2
g
103
1
2
3
3
g
103
1
2
3
3
g
102
1
g
103
1
2
3
2
g
103
1
2
3
3
g
103
1
2
3
function recursiveFunction(callHierarchie)
    callHierarchie = callHierarchie + 1;
    disp(callHierarchie + 100);
    for n = 1:3
        disp(n);
        if callHierarchie <= 2
            disp('g');
            recursiveFunction(callHierarchie);
        end
    end
end
The problem is both how you're generating your output and how you're interpreting your output. Here's a Python equivalent function that generates the same output:
def recursiveFunction1(callHierarchie):
    callHierarchie = callHierarchie + 1
    print("{:>6}".format(callHierarchie + 100))
    for n in range(1, 4):
        print("{:>6}".format(n))
        if callHierarchie <= 2:
            print('g')
            recursiveFunction1(callHierarchie)

recursiveFunction1(0)
Folks can verify it produces the same output. Let's modify the code to indent based on the recursion level:
def recursiveFunction(callHierarchie):
    callHierarchie = callHierarchie + 1
    print(" " * callHierarchie, "{:>6}".format(callHierarchie + 100))
    for n in range(1, 4):
        print(" " * callHierarchie, "{:>6}".format(n))
        if callHierarchie <= 2:
            print(" " * callHierarchie, 'g')
            recursiveFunction(callHierarchie)
Now the output displays slightly differently:
% python3 test.py
101
1
g
102
1
g
103
1
2
3
2
g
103
1
2
3
3
g
103
1
2
3
2
g
102
1
g
103
1
2
3
2
g
103
1
2
3
3
g
103
1
2
3
3
g
102
1
g
103
1
2
3
2
g
103
1
2
3
3
g
103
1
2
3
%
You can see that n does stop at 3, but the extra numbers you were seeing were n at a different level of recursion!
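Another way to check this, sketched here under the assumption that recording events is an acceptable substitute for printing, is to tag every value of n with the recursion level it came from and then group by level. The trace function and the events list are names invented for this example.

```python
def trace(depth, events):
    """Same control flow as recursiveFunction, but records instead of printing."""
    depth += 1
    events.append((depth, "enter"))
    for n in range(1, 4):
        events.append((depth, n))
        if depth <= 2:
            trace(depth, events)
    return events

events = trace(0, [])

# Group the loop values by recursion level, ignoring the "enter" markers.
per_level = {}
for depth, item in events:
    if item != "enter":
        per_level.setdefault(depth, []).append(item)

calls = sum(1 for _, item in events if item == "enter")
print(calls, per_level[1])
```

Every level only ever sees n run through 1, 2, 3; the levels are just interleaved in the printed output because there are 1 + 3 + 9 = 13 calls in total.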

kdb passing column names into functions

I have a table
t: flip `ref`a`b`c`d`e!(til 10;10?10;10?10;10?10;10?10;10?10)
ref a b c d e
0 5 3 3 9 1
1 1 9 0 0 0
2 5 9 4 1 7
3 0 0 5 1 3
4 2 6 8 9 3
5 3 2 0 6 6
6 7 6 4 9 8
7 4 8 9 7 2
8 7 0 8 8 3
9 7 9 0 4 8
How can I set all values in columns a, b, c, ... to 0Ni if their value equals the value in column ref, without having to write a separate update for every column?
Something that would look a bit like this (which returns ERROR:type):
{update x:?[x=t;0Ni;x] from t} each `a`b`c`....
The other answer involves working with strings, which can often get very messy. To avoid this, you can build up the query by passing column names as symbols instead. The dictionary for the functional update can be built using the following function:
q){y!enlist[({?[y=x;0Ni;y]};x)],/:y:(),y}[`ref;`a`b`c]
a| ({?[y=x;0Ni;y]};`ref) `a
b| ({?[y=x;0Ni;y]};`ref) `b
c| ({?[y=x;0Ni;y]};`ref) `c
The reference column is passed as x, and any number of columns can be passed as y for comparison.
This can then be plugged into the functional update:
q)![t;();0b;{y!enlist[({?[y=x;0Ni;y]};x)],/:y:(),y}[`ref;`a`b`c]]
ref a b c d e
-------------
0 4 5 8 4 8
1 2 6 9 1
2 8 4 7 2 9
3 0 1 2 7 5
4 5 3 0 4
5 8 3 1 6
6 5 7 4 9 6
7 2 8 2 2 1
8 2 7 1 8
9 6 1 8 8 5
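The idea of passing the column names as data rather than writing one update per column can be sketched in Python as below. The small table is made up for illustration (it is not the random data from the question), None stands in for 0Ni, and null_where_equal is a hypothetical helper name.

```python
def null_where_equal(table, ref, columns):
    """Return a copy of table where, in each listed column, values equal to
    the value of the ref column in the same row are replaced by None."""
    out = dict(table)
    for col in columns:
        out[col] = [None if v == r else v
                    for v, r in zip(table[col], table[ref])]
    return out

# Illustrative data only; column d is deliberately left out of the update.
t = {
    "ref": [0, 1, 2, 3],
    "a":   [0, 5, 2, 7],
    "b":   [4, 1, 9, 3],
    "d":   [0, 1, 2, 3],
}

result = null_where_equal(t, "ref", ["a", "b"])
print(result["a"], result["b"])
```

Only the columns named in the list are touched, which is the same role the dictionary keys play in the functional update.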
Alternatively, you can reuse the following snippet:
t: flip `ref`a`b`c`d`e!(til 10;10?10;10?10;10?10;10?10;10?10);
columns: `a`b`c`d`e;
![t;();0b;columns!{parse "?[",x,"=ref;0Ni;",x,"]" }each string columns]
The columns to update are listed in columns.
A functional update is then used, which maps every column x to the value ?[x=ref;0Ni;x].

Fit a piecewise regression in matlab and find change point

In matlab, I want to fit a piecewise regression and find where on the x-axis the first change-point occurs. For example, for the following data, the output might be changepoint=20 (I don't actually want to plot it, just want the change point).
data = [1 4 4 3 4 0 0 4 5 4 5 2 5 10 5 1 4 15 4 9 11 16 23 25 24 17 31 42 35 45 49 54 74 69 63 46 35 31 27 15 10 5 10 4 2 4 2 2 3 5 2 2];
x = 1:52;
plot(x,data,'.')
If you have the Signal Processing Toolbox, you can directly use the findchangepts function (see https://www.mathworks.com/help/signal/ref/findchangepts.html for documentation):
data = [1 4 4 3 4 0 0 4 5 4 5 2 5 10 5 1 4 15 4 9 11 16 23 25 24 17 31 42 35 45 49 54 74 69 63 46 35 31 27 15 10 5 10 4 2 4 2 2 3 5 2 2];
x = 1:52;
ipt = findchangepts(data);
x_cp = x(ipt);
data_cp = data(ipt);
plot(x,data,'.',x_cp,data_cp,'o')
The index of the change point in this case is 22.
Plot of data and its change point circled in red:
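To give a feel for what a single change in mean looks like computationally, here is a brute-force Python sketch. It is not MathWorks' implementation of findchangepts and makes no claim to reproduce its result on the data above; it simply tries every split and keeps the one with the smallest total squared deviation from the two segment means. Both function names are made up for this example.

```python
def sse(segment):
    """Sum of squared deviations of a segment from its own mean."""
    mean = sum(segment) / len(segment)
    return sum((v - mean) ** 2 for v in segment)

def single_mean_changepoint(values):
    """Return the 0-based index at which the second segment starts."""
    best_index, best_cost = None, float("inf")
    for k in range(1, len(values)):
        cost = sse(values[:k]) + sse(values[k:])
        if cost < best_cost:
            best_index, best_cost = k, cost
    return best_index

# Synthetic example with an obvious jump after the fifth sample.
step = [0, 0, 0, 0, 0, 10, 10, 10, 10, 10]
print(single_mean_changepoint(step))
```

Note the 0-based index here; MATLAB indices are 1-based, so the equivalent MATLAB position would be one higher.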
I know this is an old question but just want to provide some extra thoughts. In MATLAB, an alternative that I implemented is a Bayesian changepoint detection algorithm that estimates not just the number and locations of the changepoints but also reports the occurrence probability of changepoints. In its current implementation, it deals only with time-series-like data (i.e., 1D sequential data). More info about the tool is available at this FileExchange entry (https://www.mathworks.com/matlabcentral/fileexchange/72515-bayesian-changepoint-detection-time-series-decomposition).
Here is its quick application to your sample data:
% Automatically install the Rbeast or BEAST library to local drive
eval(webread('http://b.link/beast')) %
data = [1 4 4 3 4 0 0 4 5 4 5 2 5 10 5 1 4 15 4 9 11 16 23 25 24 17 31 42 35 45 49 54 74 69 63 46 35 31 27 15 10 5 10 4 2 4 2 2 3 5 2 2];
out = beast(data, 'season','none') % season='none': there is no seasonal/periodic variation in the data
printbeast(out)
plotbeast(out)
Below is a summary of the changepoint, given by printbeast():
#####################################################################
# Trend Changepoints #
#####################################################################
.-------------------------------------------------------------------.
| Ascii plot of probability distribution for number of chgpts (ncp) |
.-------------------------------------------------------------------.
|Pr(ncp = 0 )=0.000|* |
|Pr(ncp = 1 )=0.000|* |
|Pr(ncp = 2 )=0.000|* |
|Pr(ncp = 3 )=0.859|*********************************************** |
|Pr(ncp = 4 )=0.133|******** |
|Pr(ncp = 5 )=0.008|* |
|Pr(ncp = 6 )=0.000|* |
|Pr(ncp = 7 )=0.000|* |
|Pr(ncp = 8 )=0.000|* |
|Pr(ncp = 9 )=0.000|* |
|Pr(ncp = 10)=0.000|* |
.-------------------------------------------------------------------.
| Summary for number of Trend ChangePoints (tcp) |
.-------------------------------------------------------------------.
|ncp_max = 10 | MaxTrendKnotNum: A parameter you set |
|ncp_mode = 3 | Pr(ncp= 3)=0.86: There is a 85.9% probability |
| | that the trend component has 3 changepoint(s).|
|ncp_mean = 3.15 | Sum{ncp*Pr(ncp)} for ncp = 0,...,10 |
|ncp_pct10 = 3.00 | 10% percentile for number of changepoints |
|ncp_median = 3.00 | 50% percentile: Median number of changepoints |
|ncp_pct90 = 4.00 | 90% percentile for number of changepoints |
.-------------------------------------------------------------------.
| List of probable trend changepoints ranked by probability of |
| occurrence: Please combine the ncp reported above to determine |
| which changepoints below are practically meaningful |
'-------------------------------------------------------------------'
|tcp# |time (cp) |prob(cpPr) |
|------------------|---------------------------|--------------------|
|1 |33.000000 |1.00000 |
|2 |42.000000 |0.98271 |
|3 |19.000000 |0.69183 |
|4 |26.000000 |0.03950 |
|5 |11.000000 |0.02292 |
.-------------------------------------------------------------------.
Here is the graphic output. Three major changepoints are detected:
You can use the sgolayfilt function, which fits a polynomial to the data, or reproduce the OLS method yourself: http://www.utdallas.edu/~herve/Abdi-LeastSquares06-pretty.pdf (note that it uses a+bx notation instead of ax+b)
For a linear fit of ax+b:
If you replace x with a constant vector of length 2n+1, [-n, ... 0 ... n], at each step, you get the following code for the sliding regression coefficients:
% y is the data (row vector); n is the half-width of the window
x = -n:n;               % centred abscissa, so sum(x) = 0
sum_x2 = sum(x.^2);
for i = 1+n : length(y)-n
    yi = y(i-n : i+n);
    sum_xy = sum(yi.*x);
    a(i) = sum_xy/sum_x2;       % slope
    b(i) = sum(yi)/(2*n+1);     % window mean
end
Notice that in this code b is the sliding average of your data, and a is the least-squares slope estimate (first derivative).
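The same sliding fit can be sketched in Python to make the arithmetic easy to check. This assumes a window of 2n+1 samples centred on each point, so that the x values -n..n sum to zero and the intercept reduces to the window mean; sliding_linear_fit is a name chosen for this example.

```python
def sliding_linear_fit(y, n):
    """Least-squares slope and mean over a centred window of 2n+1 samples.

    Returns two dicts keyed by the 0-based index of the window centre.
    """
    x = list(range(-n, n + 1))          # centred abscissa, sums to zero
    sum_x2 = sum(v * v for v in x)
    slopes, means = {}, {}
    for i in range(n, len(y) - n):
        window = y[i - n:i + n + 1]
        slopes[i] = sum(w * v for w, v in zip(window, x)) / sum_x2
        means[i] = sum(window) / len(window)
    return slopes, means

# On exactly linear data the slope is recovered and the mean equals y[i].
y = [2 * i + 1 for i in range(7)]
slopes, means = sliding_linear_fit(y, 2)
print(slopes, means)
```

A change point then shows up as a location where the slope estimate changes markedly from one window to the next.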

Reorder Table Rows and columns Matlab

I have a 5x5 table:
a b c d e
a 1 2 3 4 5
b 3 5 7 2 6
c 1 3 4 6 1
d 4 4 1 7 8
e 6 7 2 1 6
where the headers are the strings.
I want to know how to reorder the table rows and columns using the headers
So, for example, if I wanted them to be in the order e b c a d, then this would be the table:
e b c a d
e 6 7 2 6 1
b 6 5 7 3 2
c 1 3 4 1 6
a 5 2 3 1 4
d 8 4 1 4 7
Let the table be defined as
T = table;
T.a = [1 3 1 4 6].';
T.b = [2 5 3 4 7].';
T.c = [3 7 4 1 2].';
T.d = [4 2 6 7 1].';
T.e = [5 6 1 8 6].';
And let the new desired order be
order = {'e' 'b' 'c' 'a' 'd'};
The table can be reordered using just indexing:
[~, ind] = ismember(order, T.Properties.VariableNames);
T_reordered = T(ind,order);
Note that:
To reorder only columns you'd use T_reorderedCols = T(:,order);
To reorder only rows you'd use T_reorderedRows = T(ind,:);
So in this example,
T =
a b c d e
_ _ _ _ _
1 2 3 4 5
3 5 7 2 6
1 3 4 6 1
4 4 1 7 8
6 7 2 1 6
T_reordered =
e b c a d
_ _ _ _ _
6 7 2 6 1
6 5 7 3 2
1 3 4 1 6
5 2 3 1 4
8 4 1 4 7
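The indexing trick is not specific to MATLAB tables. As a small Python sketch of the same idea, the desired label order is turned into a list of positions, which is then applied to both rows and columns. The reorder helper is hypothetical, and plain nested lists stand in for the table.

```python
labels = ["a", "b", "c", "d", "e"]
data = [
    [1, 2, 3, 4, 5],
    [3, 5, 7, 2, 6],
    [1, 3, 4, 6, 1],
    [4, 4, 1, 7, 8],
    [6, 7, 2, 1, 6],
]
order = ["e", "b", "c", "a", "d"]

def reorder(data, labels, order):
    """Look up each label's position, then index rows and columns with it."""
    idx = [labels.index(name) for name in order]
    return [[data[r][c] for c in idx] for r in idx]

reordered = reorder(data, labels, order)
for name, row in zip(order, reordered):
    print(name, row)
```

The list idx plays the role of ind from ismember, shifted to 0-based positions.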
Here is a way to do it using indexing. You can indeed re-arrange the rows and columns using indices as you would for any array. In this case, I substitute each letter in the headers array with a number (originally [1 2 3 4 5]) and then, using a vector defining the new order [5 2 3 1 4], re-order the table. You could make some kind of lookup table to automate this when you deal with larger tables:
clc
clear
a = [1 2 3 4 5;
3 5 7 2 6;
1 3 4 6 1;
4 4 1 7 8;
6 7 2 1 6];
headers = {'a' 'b' 'c' 'd' 'e'};
%// Original order. Not used but useful to understand the idea... I think :)
OriginalOrder = 1:5;
%// New order
NewOrder = [5 2 3 1 4];
%// Create table
t = table(a(:,1),a(:,2),a(:,3),a(:,4),a(:,5),'RowNames',headers,'VariableNames',headers)
As a less cumbersome alternative to manually creating the table with the function table, you can use (thanks to @excaza) the function array2table, which saves a couple of steps:
t = array2table(a,'RowNames',headers,'VariableNames',headers)
Either way, re-arrange the table using the new indices:
New_t = t(NewOrder,NewOrder)
Output:
t =
a b c d e
_ _ _ _ _
a 1 2 3 4 5
b 3 5 7 2 6
c 1 3 4 6 1
d 4 4 1 7 8
e 6 7 2 1 6
New_t =
e b c a d
_ _ _ _ _
e 6 7 2 6 1
b 6 5 7 3 2
c 1 3 4 1 6
a 5 2 3 1 4
d 8 4 1 4 7