All,
I'm having trouble with what I believe is a fairly straightforward task: search a table, identify a point, and then truncate or delete the subsequent rows for each set of data within the table. I believe I need a nested function in my update query, but I have not been successful writing one. I've also tried creating a "delete_me" column, which would let me identify the rows first and then run a single delete; that may be faster and easier to audit.
Ideally, I'd like to wrap this in a callable function as there are a few different methods of truncation.
In my example below, I identify the maximum cumulative value date and then label the subsequent dated rows by id for eventual deletion.
///raw data for copy and paste - `:./Data/sample.csv;
id,idate,a,b,c
AAA,1/31/2014,1000,500,500
AAA,2/28/2014,900,500,50
AAA,3/31/2014,850,500,0
AAA,4/30/2014,800,500,0
AAA,5/31/2014,750,500,0
AAA,6/30/2014,700,500,0
AAA,7/31/2014,650,500,0
AAA,8/31/2014,550,500,0
AAA,9/30/2014,500,500,0
AAA,10/31/2014,450,500,0
BBB,6/30/2012,1000,500,2500
BBB,7/31/2012,950,500,75
BBB,8/31/2012,900,500,0
BBB,9/30/2012,850,500,0
BBB,10/31/2012,800,500,0
BBB,11/30/2012,750,500,0
BBB,12/31/2012,700,500,0
BBB,1/31/2013,650,500,0
BBB,2/28/2013,600,500,0
BBB,3/31/2013,550,500,0
BBB,4/30/2013,500,500,0
BBB,5/31/2013,450,500,0
BBB,6/30/2013,400,500,0
CCC,1/1/2016,1000,500,1200
CCC,2/29/2016,950,500,30
CCC,3/31/2016,900,500,0
CCC,4/30/2016,850,500,0
CCC,5/31/2016,800,500,0
CCC,6/30/2016,750,500,0
CCC,7/31/2016,700,500,0
CCC,8/31/2016,650,500,0
CCC,9/30/2016,600,500,0
CCC,10/31/2016,550,500,0
CCC,11/30/2016,500,500,0
CCC,12/31/2016,450,500,0
CCC,1/31/2017,400,500,0
CCC,2/28/2017,350,500,0
CCC,3/31/2017,300,500,0
CCC,4/30/2017,250,500,0
Load data and add some calculations
\c 100 150i
t:("SSFFF";enlist",") 0:`:./Data/sample.csv;
t: update kdbDate: "D"$string idate, d:(a-(b+c)),cum_d: sums (a-(b+c)) from t;
t:![t; (); (enlist`id)!enlist`id; (enlist`maxCum_d)!enlist(max;`cum_d)];
t:![t; enlist(=;`maxCum_d;`cum_d); (enlist`id)!enlist`id; (enlist `date_cutoff)!enlist(*:;`kdbDate)];
Below is where I'm presently stuck. I've also thought of using fills to just fill in the date_cutoff for the remaining rows per id as well and avoid creating another column altogether.
show exec max(date_cutoff) by id from t;
assignDelete:{[t] update del: `delete_me by id from t where max (date_cutoff) > kdbDate}; //<--STUCK--
t: assignDelete over t;
t:![t; enlist (~:;(^:;`del)); 0b; `symbol$()] ; //delete from t where not null `del
Many thanks in advance! Desired output below
q)t
id idate a b c kdbDate d cum_d maxCum_d date_cutoff del
----------------------------------------------------------------------------------
AAA 1/31/2014 1000 500 500 2014.01.31 0 0 1650
AAA 2/28/2014 900 500 50 2014.02.28 350 350 1650
AAA 3/31/2014 850 500 0 2014.03.31 350 700 1650
AAA 4/30/2014 800 500 0 2014.04.30 300 1000 1650
AAA 5/31/2014 750 500 0 2014.05.31 250 1250 1650
AAA 6/30/2014 700 500 0 2014.06.30 200 1450 1650
AAA 7/31/2014 650 500 0 2014.07.31 150 1600 1650
AAA 8/31/2014 550 500 0 2014.08.31 50 1650 1650 2014.08.31
AAA 9/30/2014 500 500 0 2014.09.30 0 1650 1650 2014.08.31 delete_me
AAA 10/31/2014 450 500 0 2014.10.31 -50 1600 1650 delete_me
BBB 6/30/2012 1000 500 2500 2012.06.30 -2000 -400 1775
BBB 7/31/2012 950 500 75 2012.07.31 375 -25 1775
BBB 8/31/2012 900 500 0 2012.08.31 400 375 1775
BBB 9/30/2012 850 500 0 2012.09.30 350 725 1775
BBB 10/31/2012 800 500 0 2012.10.31 300 1025 1775
BBB 11/30/2012 750 500 0 2012.11.30 250 1275 1775
BBB 12/31/2012 700 500 0 2012.12.31 200 1475 1775
BBB 1/31/2013 650 500 0 2013.01.31 150 1625 1775
BBB 2/28/2013 600 500 0 2013.02.28 100 1725 1775
BBB 3/31/2013 550 500 0 2013.03.31 50 1775 1775 2013.03.31
BBB 4/30/2013 500 500 0 2013.04.30 0 1775 1775 2013.03.31 delete_me
BBB 5/31/2013 450 500 0 2013.05.31 -50 1725 1775 delete_me
BBB 6/30/2013 400 500 0 2013.06.30 -100 1625 1775 delete_me
CCC 1/1/2016 1000 500 1200 2016.01.01 -700 925 3145
CCC 2/29/2016 950 500 30 2016.02.29 420 1345 3145
CCC 3/31/2016 900 500 0 2016.03.31 400 1745 3145
CCC 4/30/2016 850 500 0 2016.04.30 350 2095 3145
CCC 5/31/2016 800 500 0 2016.05.31 300 2395 3145
CCC 6/30/2016 750 500 0 2016.06.30 250 2645 3145
CCC 7/31/2016 700 500 0 2016.07.31 200 2845 3145
CCC 8/31/2016 650 500 0 2016.08.31 150 2995 3145
CCC 9/30/2016 600 500 0 2016.09.30 100 3095 3145
CCC 10/31/2016 550 500 0 2016.10.31 50 3145 3145 2016.10.31
CCC 11/30/2016 500 500 0 2016.11.30 0 3145 3145 2016.10.31 delete_me
CCC 12/31/2016 450 500 0 2016.12.31 -50 3095 3145 delete_me
CCC 1/31/2017 400 500 0 2017.01.31 -100 2995 3145 delete_me
CCC 2/28/2017 350 500 0 2017.02.28 -150 2845 3145 delete_me
CCC 3/31/2017 300 500 0 2017.03.31 -200 2645 3145 delete_me
CCC 4/30/2017 250 500 0 2017.04.30 -250 2395 3145 delete_me
[EDIT] using fills on another column seemed to work okay.
Note truncation after the max(cum_d)
t: update del:fills date_cutoff by id from t where kdbDate>date_cutoff;
or in functional form
t: ![t; enlist(>;`kdbDate;`date_cutoff);(enlist`id)!enlist`id;(enlist`del)! enlist (^\;`date_cutoff)];
id idate a b c kdbDate d cum_d maxCum_d date_cutoff del
----------------------------------------------------------------------------
AAA 1/31/2014 1000 500 500 2014.01.31 0 0 1650
AAA 2/28/2014 900 500 50 2014.02.28 350 350 1650
AAA 3/31/2014 850 500 0 2014.03.31 350 700 1650
AAA 4/30/2014 800 500 0 2014.04.30 300 1000 1650
AAA 5/31/2014 750 500 0 2014.05.31 250 1250 1650
AAA 6/30/2014 700 500 0 2014.06.30 200 1450 1650
AAA 7/31/2014 650 500 0 2014.07.31 150 1600 1650
AAA 8/31/2014 550 500 0 2014.08.31 50 1650 1650 2014.08.31
BBB 6/30/2012 1000 500 2500 2012.06.30 -2000 -400 1775
BBB 7/31/2012 950 500 75 2012.07.31 375 -25 1775
BBB 8/31/2012 900 500 0 2012.08.31 400 375 1775
BBB 9/30/2012 850 500 0 2012.09.30 350 725 1775
BBB 10/31/2012 800 500 0 2012.10.31 300 1025 1775
BBB 11/30/2012 750 500 0 2012.11.30 250 1275 1775
BBB 12/31/2012 700 500 0 2012.12.31 200 1475 1775
BBB 1/31/2013 650 500 0 2013.01.31 150 1625 1775
BBB 2/28/2013 600 500 0 2013.02.28 100 1725 1775
BBB 3/31/2013 550 500 0 2013.03.31 50 1775 1775 2013.03.31
CCC 1/1/2016 1000 500 1200 2016.01.01 -700 925 3145
CCC 2/29/2016 950 500 30 2016.02.29 420 1345 3145
CCC 3/31/2016 900 500 0 2016.03.31 400 1745 3145
CCC 4/30/2016 850 500 0 2016.04.30 350 2095 3145
CCC 5/31/2016 800 500 0 2016.05.31 300 2395 3145
CCC 6/30/2016 750 500 0 2016.06.30 250 2645 3145
CCC 7/31/2016 700 500 0 2016.07.31 200 2845 3145
CCC 8/31/2016 650 500 0 2016.08.31 150 2995 3145
CCC 9/30/2016 600 500 0 2016.09.30 100 3095 3145
CCC 10/31/2016 550 500 0 2016.10.31 50 3145 3145 2016.10.31
For this solution I've left-joined date_cutoff by id to the table so that all date_cutoff entries are non-null, then used a vector conditional to determine whether to delete or not.
q)t:t lj select last date_cutoff by id from t where not null date_cutoff
q)update del:?[date_cutoff<kdbDate;`delete_me;`]from t
So long as there is only one distinct date_cutoff within an id grouping, this should work.
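To illustrate how the vector conditional behaves here, the following is a minimal sketch on made-up values for a single id. The names cutoff and dates are hypothetical stand-ins for the date_cutoff and kdbDate columns.

```q
cutoff:2014.08.31                        // hypothetical date_cutoff for one id
dates:2014.08.31 2014.09.30 2014.10.31   // hypothetical kdbDate values
cutoff<dates                             // 011b
?[cutoff<dates;`delete_me;`]             // ``delete_me`delete_me
```

The row on the cutoff date itself gets the null symbol, so only strictly later rows are flagged.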
The maxs function calculates the running maximum value of a given vector. You can avoid adding those auxiliary columns altogether by leveraging this with an fby where clause:
// define the table
q)t:("SSFFF";enlist",") 0:`:./Data/sample.csv;
q)t: update kdbDate: "D"$string idate, d:(a-(b+c)),cum_d: sums (a-(b+c)) from t;
// delete rows with one q-sql statement
q)delete from t where ({prev max[x]=maxs[x]};cum_d) fby id
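As a rough sketch of why that where clause works, the steps can be run on a small vector. The values in x, the tiny table t2 and the function name truncAfterMax are all made up for illustration; only the lambda comes from the statement above.

```q
x:0 350 700 1650 1650 1600     // hypothetical cumulative values for one id
maxs x                         // 0 350 700 1650 1650 1650
max[x]=maxs x                  // 000111b, true from the first max onwards
prev max[x]=maxs x             // 000011b, shifted so the max row itself is kept

// hypothetical wrapper, assuming the table has columns id and cum_d
truncAfterMax:{[t] delete from t where ({prev max[x]=maxs x};cum_d) fby id}
t2:([]id:`A`A`A`B`B`B;cum_d:1 3 2 5 4 6)
exec cum_d from truncAfterMax t2   // 1 3 5 4 6
```

If the maximum occurs more than once within an id, everything after its first occurrence is removed, which matches the desired output in the question.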
I have this table:
execs:([]time:til 12;sym:`a`b`c`a`c`c`c`b`b`a`b`c;leavesQty:(1000;900;1300;800;1200;900;600;800;400;300;200;100))
There are several syms, each with a leavesQty at certain times. I want to extend the table so that every row holds the sum, across all syms, of the most recent leavesQty of each sym at that time.
For this example I should end up with these values:
execs:([]time:til 12;sym:`a`b`c`a`c`c`c`b`b`a`b`c;leavesQty:(1000;900;1300;800;1200;900;600;800;400;300;200;100);accLeavesQty:(1000;1900;3200;3000;2900;2600;2300;2200;1800;1300;1100;600))
You can add this column with a single update statement if you use fby:
q)update accLeavesQty:sums (deltas;leavesQty) fby sym from execs
time sym leavesQty accLeavesQty
-------------------------------
0 a 1000 1000
1 b 900 1900
2 c 1300 3200
3 a 800 3000
4 c 1200 2900
5 c 900 2600
6 c 600 2300
7 b 800 2200
8 b 400 1800
9 a 300 1300
10 b 200 1100
11 c 100 600
First, get the deltas of the leaves quantity for each symbol so you can see how the value changes over time. After that, you just need a cumulative sum of the resulting column.
q)update sums accLeavesQty from update accLeavesQty:deltas leavesQty by sym from execs
time sym leavesQty accLeavesQty
-------------------------------
0 a 1000 1000
1 b 900 1900
2 c 1300 3200
3 a 800 3000
4 c 1200 2900
5 c 900 2600
6 c 600 2300
7 b 800 2200
8 b 400 1800
9 a 300 1300
10 b 200 1100
11 c 100 600
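The two steps can be looked at separately. This is a small sketch using the table from the question; the intermediate name chg is made up for illustration.

```q
execs:([]time:til 12;sym:`a`b`c`a`c`c`c`b`b`a`b`c;leavesQty:1000 900 1300 800 1200 900 600 800 400 300 200 100)
// step 1: per-sym change; the first row of each sym keeps its full quantity
chg:exec chg from update chg:deltas leavesQty by sym from execs
chg        // 1000 900 1300 -200 -100 -300 -300 -100 -400 -500 -200 -500
// step 2: running total of those changes across the whole table
sums chg   // 1000 1900 3200 3000 2900 2600 2300 2200 1800 1300 1100 600
```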
You have a nice case for fby
q)update accLeavesQty:sums (deltas;leavesQty) fby sym from execs
time sym leavesQty accLeavesQty
-------------------------------
0 a 1000 1000
1 b 900 1900
2 c 1300 3200
3 a 800 3000
4 c 1200 2900
5 c 900 2600
6 c 600 2300
7 b 800 2200
8 b 400 1800
9 a 300 1300
10 b 200 1100
11 c 100 600
Another method involves recursion:
update accLeavesQty:sum each @[;;:;]\[()!();sym;leavesQty] from execs
It keeps a running dictionary of the latest leavesQty for each sym and then sums the values of each dictionary. Without the sum each, the intermediate dictionaries look like this:
q)update accLeavesQty:@[;;:;]\[()!();sym;leavesQty] from execs
time sym leavesQty accLeavesQty
---------------------------------------
0 a 1000 (,`a)!,1000
1 b 900 `a`b!1000 900
2 c 1300 `a`b`c!1000 900 1300
3 a 800 `a`b`c!800 900 1300
4 c 1200 `a`b`c!800 900 1200
5 c 900 `a`b`c!800 900 900
6 c 600 `a`b`c!800 900 600
7 b 800 `a`b`c!800 800 600
8 b 400 `a`b`c!800 400 600
9 a 300 `a`b`c!300 400 600
10 b 200 `a`b`c!300 200 600
11 c 100 `a`b`c!300 200 100
As a simplifying example, I have
tbl:flip `sym`v1`v2!(`a`b`c`d; 50 280 1200 1800; 40 190 1300 1900)
and I'd like to pass a column name into a function like
f:{[t;c];:update v3:2 * c from t;}
In this form it doesn't work. Any suggestions on how I can make this happen?
Thanks
Another option is to use the functional form of the update statement.
https://code.kx.com/q/ref/funsql/#functional-sql
q)tbl:flip `sym`v1`v2!(`a`b`c`d; 50 280 1200 1800; 40 190 1300 1900)
q)parse"update v3:2*x from t"
!
`t
()
0b
(,`v3)!,(*;2;`x)
q){![x;();0b;enlist[`v3]!enlist(*;2;y)]} [tbl;`v2]
sym v1 v2 v3
------------------
a 50 40 80
b 280 190 380
c 1200 1300 2600
d 1800 1900 3800
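Wrapped up with the signature from the question, this could look roughly like the sketch below. The output column v3 and the factor 2 stay hard-coded, as in the original f; the column name is passed as a symbol.

```q
tbl:flip `sym`v1`v2!(`a`b`c`d; 50 280 1200 1800; 40 190 1300 1900)
// t: table (passed by value), c: source column name as a symbol
f:{[t;c] ![t;();0b;enlist[`v3]!enlist(*;2;c)]}
exec v3 from f[tbl;`v1]   // 100 560 2400 3600
exec v3 from f[tbl;`v2]   // 80 380 2600 3800
```

Because the table is passed by value rather than by name, f returns a new table and leaves tbl unchanged.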
One option to achieve this is using @ amend:
q){[t;c;n] @[t;n;:;2*t c]}[tbl;`v1;`v3]
sym v1 v2 v3
------------------
a 50 40 100
b 280 190 560
c 1200 1300 2400
d 1800 1900 3600
This takes column c of table t, doubles it, and saves the result as column n. You could also alter this to pass in a custom function:
{[t;c;n;f] @[t;n;:;f t c]}[tbl;`v1;`v3;{2*x}]
I use the following code to normalise my data in MATLAB:
Data=[130 100 100 100 300 300 30 300 30 320 200 50 300 25 100 250 1200 300 320 300 100 100 170 500 1000 200 120 450 200 2100 100 100 100 3450 500 30 100 550 4000 150 500 380 400 4000 180 540 700 100 500 2300 200 200 50 2000 400 100 50 50 100 4000 4000 3250 100 100 100 100 3300 100 100 4020 150 150 2300 3000 100 50 100 50 100 200 2000 300]
Data=randn(100,1);
%Histogram with histfit
nbins=20;
h=histfit(Data,nbins);
[nelements,bincenters]=hist(Data,nbins);
hold on %histogram computed manually and scaled with trapz()
avg=mean(Data);
stdev=std(Data);
x=sort(Data);
y=exp(-0.5*((x-avg)/stdev).^2)/(stdev*sqrt(2*pi));
plot(x,y*trapz(bincenters,nelements),'g','LineWidth',1.5)
legend('Histogram','Distribution from histfit','Distribution computed manually')
It works well, and I found the mean and standard deviation. Now I am trying to invert the y values with ln to recover the original points, but I was not successful. Could you please give me an idea of how to invert the y values?
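For reference, the normal density used for y above can be inverted algebraically. The derivation below is a sketch that assumes y is the unscaled density (before the multiplication by trapz), with \mu standing for avg and \sigma for stdev.

```latex
y = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
\;\Longrightarrow\;
\ln\!\left(y\,\sigma\sqrt{2\pi}\right) = -\frac{(x-\mu)^2}{2\sigma^2}
\;\Longrightarrow\;
x = \mu \pm \sigma\sqrt{-2\,\ln\!\left(y\,\sigma\sqrt{2\pi}\right)},
\qquad 0 < y \le \frac{1}{\sigma\sqrt{2\pi}}
```

Because the density is symmetric about \mu, each y corresponds to two x values, so the sign has to come from somewhere else (for example, whether the original point was below or above the mean).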
I have a problem exporting matrices from MATLAB to Excel. The export itself works, but I need some formatting.
I created matrices A and B and wrote them to an .xlsx document.
filename = 'example.xlsx';
A;
sheet = 1;
xlRange = 'A9';
xlswrite(filename,A,sheet,xlRange)
B;
xlRange2= 'B9';
xlswrite(filename,B,sheet,xlRange2)
And I get the example.xlsx file with this formatting:
400 4.56
500 5.12
600 6.76
700 7.98
800 8.21
900 9.21
1000 10.12
1100 11.23
1200 12.43
1300 13.89
1400 14.54
1500 15.21
1600 16.23
1700 17.53
I need this kind of formatting:
400 4.56
500 5.12
600 6.76
700 7.98
800 8.21
900 9.21
1000 10.12
100 11.23
200 12.43
300 13.89
400 14.54
500 15.21
600 16.23
700 17.53
The steps are at 500, 1000, 1500, 2000, 2500... How can I insert an empty row at each step to get this kind of formatting?
This code provides the cell as required for xlswrite:
M=[400 4.56
500 5.12
600 6.76
700 7.98
800 8.21
900 9.21
1000 10.12
1100 11.23
1200 12.43
1300 13.89
1400 14.54
1500 15.21
1600 16.23
1700 17.53
900 9.21
1000 10.12
1100 11.23
1200 12.43
1300 13.89
1400 14.54
1500 15.21
1600 16.23
1700 17.53];
gaps=[500, 1000, 1500, 2000, 2500];
%calculate a group index: 0 is below the first gap, 1 is between the first and second, etc.
group=sum(bsxfun(@ge,M(:,1),gaps),2);
%whenever the group increases, a row must be skipped; calculate the target row indices
index=cumsum(ones(size(M,1),1)+[0;diff(group)>0]);
%allocate empty cell
X=cell(max(index),size(M,2));
%fill data
X(index,:)=num2cell(M);
xlswrite('a.xlsx',X)