All,
I'm having trouble with what I believe is a fairly straightforward task: search a table, identify a point, and then truncate (delete) the subsequent rows for each set of data within the table. I believe I need a nested function in my update query, but I have not managed to write one. I've also tried creating a "delete_me" column, which would let me flag the rows and then run a single delete; that may be faster and easier to audit.
Ideally, I'd like to wrap this in a callable function as there are a few different methods of truncation.
In my example below, I identify the maximum cumulative value date and then label the subsequent dated rows by id for eventual deletion.
///raw data for copy and paste - `:./Data/sample.csv;
id,idate,a,b,c
AAA,1/31/2014,1000,500,500
AAA,2/28/2014,900,500,50
AAA,3/31/2014,850,500,0
AAA,4/30/2014,800,500,0
AAA,5/31/2014,750,500,0
AAA,6/30/2014,700,500,0
AAA,7/31/2014,650,500,0
AAA,8/31/2014,550,500,0
AAA,9/30/2014,500,500,0
AAA,10/31/2014,450,500,0
BBB,6/30/2012,1000,500,2500
BBB,7/31/2012,950,500,75
BBB,8/31/2012,900,500,0
BBB,9/30/2012,850,500,0
BBB,10/31/2012,800,500,0
BBB,11/30/2012,750,500,0
BBB,12/31/2012,700,500,0
BBB,1/31/2013,650,500,0
BBB,2/28/2013,600,500,0
BBB,3/31/2013,550,500,0
BBB,4/30/2013,500,500,0
BBB,5/31/2013,450,500,0
BBB,6/30/2013,400,500,0
CCC,1/1/2016,1000,500,1200
CCC,2/29/2016,950,500,30
CCC,3/31/2016,900,500,0
CCC,4/30/2016,850,500,0
CCC,5/31/2016,800,500,0
CCC,6/30/2016,750,500,0
CCC,7/31/2016,700,500,0
CCC,8/31/2016,650,500,0
CCC,9/30/2016,600,500,0
CCC,10/31/2016,550,500,0
CCC,11/30/2016,500,500,0
CCC,12/31/2016,450,500,0
CCC,1/31/2017,400,500,0
CCC,2/28/2017,350,500,0
CCC,3/31/2017,300,500,0
CCC,4/30/2017,250,500,0
Load data and add some calculations
\c 100 150i
t:("SSFFF";enlist",") 0:`:./Data/sample.csv;
t: update kdbDate: "D"$string idate, d:(a-(b+c)),cum_d: sums (a-(b+c)) from t;
t:![t; (); (enlist`id)!enlist`id; (enlist`maxCum_d)!enlist(max;`cum_d)];
t:![t; enlist(=;`maxCum_d;`cum_d); (enlist`id)!enlist`id; (enlist `date_cutoff)!enlist(*:;`kdbDate)];
Below is where I'm presently stuck. I've also thought of using fills to fill in date_cutoff for the remaining rows per id, which would avoid creating another column altogether.
show exec max(date_cutoff) by id from t;
assignDelete:{[t] update del: `delete_me by id from t where max (date_cutoff) > kdbDate}; //<--STUCK--
t: assignDelete over t;
t:![t; enlist (~:;(^:;`del)); 0b; `symbol$()] ; //delete from t where not null `del
Many thanks in advance! Desired output below
q)t
id idate a b c kdbDate d cum_d maxCum_d date_cutoff del
----------------------------------------------------------------------------------
AAA 1/31/2014 1000 500 500 2014.01.31 0 0 1650
AAA 2/28/2014 900 500 50 2014.02.28 350 350 1650
AAA 3/31/2014 850 500 0 2014.03.31 350 700 1650
AAA 4/30/2014 800 500 0 2014.04.30 300 1000 1650
AAA 5/31/2014 750 500 0 2014.05.31 250 1250 1650
AAA 6/30/2014 700 500 0 2014.06.30 200 1450 1650
AAA 7/31/2014 650 500 0 2014.07.31 150 1600 1650
AAA 8/31/2014 550 500 0 2014.08.31 50 1650 1650 2014.08.31
AAA 9/30/2014 500 500 0 2014.09.30 0 1650 1650 2014.08.31 delete_me
AAA 10/31/2014 450 500 0 2014.10.31 -50 1600 1650 delete_me
BBB 6/30/2012 1000 500 2500 2012.06.30 -2000 -400 1775
BBB 7/31/2012 950 500 75 2012.07.31 375 -25 1775
BBB 8/31/2012 900 500 0 2012.08.31 400 375 1775
BBB 9/30/2012 850 500 0 2012.09.30 350 725 1775
BBB 10/31/2012 800 500 0 2012.10.31 300 1025 1775
BBB 11/30/2012 750 500 0 2012.11.30 250 1275 1775
BBB 12/31/2012 700 500 0 2012.12.31 200 1475 1775
BBB 1/31/2013 650 500 0 2013.01.31 150 1625 1775
BBB 2/28/2013 600 500 0 2013.02.28 100 1725 1775
BBB 3/31/2013 550 500 0 2013.03.31 50 1775 1775 2013.03.31
BBB 4/30/2013 500 500 0 2013.04.30 0 1775 1775 2013.03.31 delete_me
BBB 5/31/2013 450 500 0 2013.05.31 -50 1725 1775 delete_me
BBB 6/30/2013 400 500 0 2013.06.30 -100 1625 1775 delete_me
CCC 1/1/2016 1000 500 1200 2016.01.01 -700 925 3145
CCC 2/29/2016 950 500 30 2016.02.29 420 1345 3145
CCC 3/31/2016 900 500 0 2016.03.31 400 1745 3145
CCC 4/30/2016 850 500 0 2016.04.30 350 2095 3145
CCC 5/31/2016 800 500 0 2016.05.31 300 2395 3145
CCC 6/30/2016 750 500 0 2016.06.30 250 2645 3145
CCC 7/31/2016 700 500 0 2016.07.31 200 2845 3145
CCC 8/31/2016 650 500 0 2016.08.31 150 2995 3145
CCC 9/30/2016 600 500 0 2016.09.30 100 3095 3145
CCC 10/31/2016 550 500 0 2016.10.31 50 3145 3145 2016.10.31
CCC 11/30/2016 500 500 0 2016.11.30 0 3145 3145 2016.10.31 delete_me
CCC 12/31/2016 450 500 0 2016.12.31 -50 3095 3145 delete_me
CCC 1/31/2017 400 500 0 2017.01.31 -100 2995 3145 delete_me
CCC 2/28/2017 350 500 0 2017.02.28 -150 2845 3145 delete_me
CCC 3/31/2017 300 500 0 2017.03.31 -200 2645 3145 delete_me
CCC 4/30/2017 250 500 0 2017.04.30 -250 2395 3145 delete_me
[EDIT] using fills on another column seemed to work okay.
Note truncation after the max(cum_d)
t: update del:fills date_cutoff by id from t where kdbDate>date_cutoff;
or in functional form
t: ![t; enlist(>;`kdbDate;`date_cutoff);(enlist`id)!enlist`id;(enlist`del)! enlist (^\;`date_cutoff)];
id idate a b c kdbDate d cum_d maxCum_d date_cutoff del
----------------------------------------------------------------------------
AAA 1/31/2014 1000 500 500 2014.01.31 0 0 1650
AAA 2/28/2014 900 500 50 2014.02.28 350 350 1650
AAA 3/31/2014 850 500 0 2014.03.31 350 700 1650
AAA 4/30/2014 800 500 0 2014.04.30 300 1000 1650
AAA 5/31/2014 750 500 0 2014.05.31 250 1250 1650
AAA 6/30/2014 700 500 0 2014.06.30 200 1450 1650
AAA 7/31/2014 650 500 0 2014.07.31 150 1600 1650
AAA 8/31/2014 550 500 0 2014.08.31 50 1650 1650 2014.08.31
BBB 6/30/2012 1000 500 2500 2012.06.30 -2000 -400 1775
BBB 7/31/2012 950 500 75 2012.07.31 375 -25 1775
BBB 8/31/2012 900 500 0 2012.08.31 400 375 1775
BBB 9/30/2012 850 500 0 2012.09.30 350 725 1775
BBB 10/31/2012 800 500 0 2012.10.31 300 1025 1775
BBB 11/30/2012 750 500 0 2012.11.30 250 1275 1775
BBB 12/31/2012 700 500 0 2012.12.31 200 1475 1775
BBB 1/31/2013 650 500 0 2013.01.31 150 1625 1775
BBB 2/28/2013 600 500 0 2013.02.28 100 1725 1775
BBB 3/31/2013 550 500 0 2013.03.31 50 1775 1775 2013.03.31
CCC 1/1/2016 1000 500 1200 2016.01.01 -700 925 3145
CCC 2/29/2016 950 500 30 2016.02.29 420 1345 3145
CCC 3/31/2016 900 500 0 2016.03.31 400 1745 3145
CCC 4/30/2016 850 500 0 2016.04.30 350 2095 3145
CCC 5/31/2016 800 500 0 2016.05.31 300 2395 3145
CCC 6/30/2016 750 500 0 2016.06.30 250 2645 3145
CCC 7/31/2016 700 500 0 2016.07.31 200 2845 3145
CCC 8/31/2016 650 500 0 2016.08.31 150 2995 3145
CCC 9/30/2016 600 500 0 2016.09.30 100 3095 3145
CCC 10/31/2016 550 500 0 2016.10.31 50 3145 3145 2016.10.31
For this solution I've left-joined date_cutoff by id to the table so that all date_cutoff entries are non-null, then used a vector conditional to determine whether to delete or not.
q)t:t lj select last date_cutoff by id from t where not null date_cutoff
q)update del:?[date_cutoff<kdbDate;`delete_me;`]from t
So long as there is only one distinct date_cutoff within an id grouping, this should work.
The maxs function calculates the running maximum value of a given vector. You can avoid adding those auxiliary columns altogether by leveraging this with an fby where clause:
// define the table
q)t:("SSFFF";enlist",") 0:`:./Data/sample.csv;
q)t: update kdbDate: "D"$string idate, d:(a-(b+c)),cum_d: sums (a-(b+c)) from t;
// delete rows with one q-sql statement
q)delete from t where ({prev max[x]=maxs[x]};cum_d) fby id
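As a cross-check of the logic outside q, here is a minimal Python sketch of the same rule: within each id, drop every row that comes after the first row at which the running maximum of cum_d reaches the group's overall maximum. The function name truncate_after_max and the list-of-dicts layout are assumptions made for illustration only; the cum_d values are copied from the AAA and BBB rows of the desired output above.

```python
from itertools import accumulate

def truncate_after_max(rows, key, value):
    """Keep each group's rows up to and including the first row where the
    running maximum of `value` reaches the group's overall maximum."""
    positions = {}
    for i, row in enumerate(rows):
        positions.setdefault(row[key], []).append(i)

    drop = set()
    for idxs in positions.values():
        vals = [rows[i][value] for i in idxs]
        peak = max(vals)
        running = list(accumulate(vals, max))  # analogue of q's maxs
        # Same test as prev max[x]=maxs[x]: a row is dropped when the
        # running max had already hit the peak on the previous row.
        for pos in range(1, len(idxs)):
            if running[pos - 1] == peak:
                drop.add(idxs[pos])
    return [row for i, row in enumerate(rows) if i not in drop]

# cum_d values taken from the AAA and BBB rows of the desired output
aaa = [0, 350, 700, 1000, 1250, 1450, 1600, 1650, 1650, 1600]
bbb = [-400, -25, 375, 725, 1025, 1275, 1475, 1625, 1725, 1775, 1775, 1725, 1625]
rows = [{"id": "AAA", "cum_d": v} for v in aaa] + [{"id": "BBB", "cum_d": v} for v in bbb]

kept = truncate_after_max(rows, "id", "cum_d")
print(sum(r["id"] == "AAA" for r in kept), sum(r["id"] == "BBB" for r in kept))  # 8 10
```

The counts agree with the truncated table shown earlier: 8 rows survive for AAA and 10 for BBB.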
I have this table:
execs:([]time:til 12;sym:`a`b`c`a`c`c`c`b`b`a`b`c;leavesQty:(1000;900;1300;800;1200;900;600;800;400;300;200;100))
The table holds several syms, each with a leavesQty at certain times. I want to extend the table so that every row shows the sum, across all syms, of each sym's most recent leavesQty at that time.
For this example, the expected values are:
execs:([]time:til 12;sym:`a`b`c`a`c`c`c`b`b`a`b`c;leavesQty:(1000;900;1300;800;1200;900;600;800;400;300;200;100);accLeavesQty:(1000;1900;3200;3000;2900;2600;2300;2200;1800;1300;1100;600))
You can add this column with a single update statement if you use fby:
q)update accLeavesQty:sums (deltas;leavesQty) fby sym from execs
time sym leavesQty accLeavesQty
-------------------------------
0 a 1000 1000
1 b 900 1900
2 c 1300 3200
3 a 800 3000
4 c 1200 2900
5 c 900 2600
6 c 600 2300
7 b 800 2200
8 b 400 1800
9 a 300 1300
10 b 200 1100
11 c 100 600
First, get the deltas of the leaves quantity for each symbol so you can see how the value changes over time. After that, you just need a cumulative sum of the resulting column.
q)update sums accLeavesQty from update accLeavesQty:deltas leavesQty by sym from execs
time sym leavesQty accLeavesQty
-------------------------------
0 a 1000 1000
1 b 900 1900
2 c 1300 3200
3 a 800 3000
4 c 1200 2900
5 c 900 2600
6 c 600 2300
7 b 800 2200
8 b 400 1800
9 a 300 1300
10 b 200 1100
11 c 100 600
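The two steps (per-sym deltas, then a running sum over the whole table) can be checked with a short Python sketch. The function name acc_leaves_qty is an assumption for illustration; the input and expected values are the ones given in the question. As with q's deltas, the first value seen for a sym is compared against 0.

```python
from itertools import accumulate

def acc_leaves_qty(syms, qty):
    """Per-sym deltas followed by a cumulative sum across all rows."""
    last = {}    # most recent leavesQty seen for each sym
    deltas = []
    for s, q in zip(syms, qty):
        deltas.append(q - last.get(s, 0))  # first row of a sym: q - 0
        last[s] = q
    return list(accumulate(deltas))

syms = ["a", "b", "c", "a", "c", "c", "c", "b", "b", "a", "b", "c"]
leaves_qty = [1000, 900, 1300, 800, 1200, 900, 600, 800, 400, 300, 200, 100]

result = acc_leaves_qty(syms, leaves_qty)
print(result)
# [1000, 1900, 3200, 3000, 2900, 2600, 2300, 2200, 1800, 1300, 1100, 600]
```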
You have a nice case for fby
q)update accLeavesQty:sums (deltas;leavesQty) fby sym from execs
time sym leavesQty accLeavesQty
-------------------------------
0 a 1000 1000
1 b 900 1900
2 c 1300 3200
3 a 800 3000
4 c 1200 2900
5 c 900 2600
6 c 600 2300
7 b 800 2200
8 b 400 1800
9 a 300 1300
10 b 200 1100
11 c 100 600
Another method uses scan (iteration) rather than fby:
update accLeavesQty:sum each @[;;:;]\[()!();sym;leavesQty] from execs
It keeps a running dictionary of the latest leavesQty for each sym and then sums the values of each dictionary. Dropping sum each shows the intermediate dictionaries:
q)update accLeavesQty:@[;;:;]\[()!();sym;leavesQty] from execs
time sym leavesQty accLeavesQty
---------------------------------------
0 a 1000 (,`a)!,1000
1 b 900 `a`b!1000 900
2 c 1300 `a`b`c!1000 900 1300
3 a 800 `a`b`c!800 900 1300
4 c 1200 `a`b`c!800 900 1200
5 c 900 `a`b`c!800 900 900
6 c 600 `a`b`c!800 900 600
7 b 800 `a`b`c!800 800 600
8 b 400 `a`b`c!800 400 600
9 a 300 `a`b`c!300 400 600
10 b 200 `a`b`c!300 200 600
11 c 100 `a`b`c!300 200 100
I have a problem with exporting matrices from MATLAB to Excel. The export itself works, but I need some extra formatting.
I created matrices A and B and wrote them to an .xlsx document.
filename = 'example.xlsx';
A;
sheet = 1;
xlRange = 'A9';
xlswrite(filename,A,sheet,xlRange)
B;
xlRange2= 'B9';
xlswrite(filename,B,sheet,xlRange2)
And I get the example.xlsx file with this formatting:
400 4.56
500 5.12
600 6.76
700 7.98
800 8.21
900 9.21
1000 10.12
1100 11.23
1200 12.43
1300 13.89
1400 14.54
1500 15.21
1600 16.23
1700 17.53
I need this kind of formatting:
400 4.56
500 5.12
600 6.76
700 7.98
800 8.21
900 9.21
1000 10.12
100 11.23
200 12.43
300 13.89
400 14.54
500 15.21
600 16.23
700 17.53
The steps are at 500, 1000, 1500, 2000, 2500... How can I insert one empty row at each step to get this kind of formatting?
This code provides the cell as required for xlswrite:
M=[400 4.56
500 5.12
600 6.76
700 7.98
800 8.21
900 9.21
1000 10.12
1100 11.23
1200 12.43
1300 13.89
1400 14.54
1500 15.21
1600 16.23
1700 17.53
900 9.21
1000 10.12
1100 11.23
1200 12.43
1300 13.89
1400 14.54
1500 15.21
1600 16.23
1700 17.53];
gaps=[500, 1000, 1500, 2000, 2500];
%calculates a group indx. 0 is below first gap, 1 between first and second etc..
group=sum(bsxfun(@ge,M(:,1),gaps),2);
%whenever group increases a line must be jumped, calculate indices
index=cumsum(ones(size(M,1),1)+[0;diff(group)>0]);
%allocate empty cell
X=cell(max(index),size(M,2));
%fill data
X(index,:)=num2cell(M);
xlswrite('a.xlsx',X)
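The indexing trick above (count how many gap thresholds each value has reached, then skip one output row whenever that count increases) can be sketched in plain Python to see where the empty rows land. The function name insert_gap_rows and the use of None for an empty cell are assumptions for illustration; the data is the first part of M.

```python
def insert_gap_rows(rows, gaps):
    """Insert one empty row whenever the first column crosses into a
    higher gap group (value >= threshold, matching the @ge comparison)."""
    out = []
    prev_group = None
    for row in rows:
        group = sum(row[0] >= g for g in gaps)  # number of thresholds reached
        if prev_group is not None and group > prev_group:
            out.append([None] * len(row))       # the empty spacer row
        out.append(list(row))
        prev_group = group
    return out

data = [[400, 4.56], [500, 5.12], [600, 6.76], [700, 7.98],
        [800, 8.21], [900, 9.21], [1000, 10.12], [1100, 11.23]]
spaced = insert_gap_rows(data, [500, 1000, 1500, 2000, 2500])
for row in spaced:
    print(row)
```

Because the comparison is "greater than or equal", the empty row is placed before 500 and before 1000. If the break should come after the boundary value instead, a strict "greater than" comparison would be the natural change, in this sketch and presumably in the MATLAB version as well.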
Hi, let's say that I have a matrix of size 5x5.
B=[1 2 3 4 5; 10 20 30 40 50; 100 200 300 400 500; 1000 2000 3000 4000 5000; 10000 20000 30000 40000 50000];
How do I use the sum function to sum rows 2 through 4 and get this result:
A = [1110;2220;3330;4440]
You'll find some useful information about matrix indexing in the documentation at http://www.mathworks.co.uk/help/matlab/math/matrix-indexing.html
To illustrate with your example, you can use B(2:4,:) to retrieve the following:
ans =
10 20 30 40 50
100 200 300 400 500
1000 2000 3000 4000 5000
You can then use the sum function as follows to achieve your desired result:
A = sum(B(2:4,:))
I hope this helps!
All the best,
Matt
MATLAB>> sum(B(2:4,1:4))
ans =
1110 2220 3330 4440
If you want to transpose the result, add ' at the end.
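To make the difference between the two answers concrete, here is a small Python sketch of the same arithmetic using nested lists (no MATLAB required). Summing rows 2 to 4 over all five columns gives five values; restricting to the first four columns gives the four values asked for. The variable names are illustrative only.

```python
B = [[1, 2, 3, 4, 5],
     [10, 20, 30, 40, 50],
     [100, 200, 300, 400, 500],
     [1000, 2000, 3000, 4000, 5000],
     [10000, 20000, 30000, 40000, 50000]]

# Rows 2..4 in MATLAB's 1-based indexing are B[1:4] here (0-based, end exclusive).
col_sums = [sum(col) for col in zip(*B[1:4])]  # like sum(B(2:4,:))
first_four = col_sums[:4]                      # like sum(B(2:4,1:4))

print(col_sums)    # [1110, 2220, 3330, 4440, 5550]
print(first_four)  # [1110, 2220, 3330, 4440]
```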