kdb - truncate subsequent rows based on point of data

All,
I'm having trouble with what I believe is a fairly straightforward task: search a table, identify a point, and then truncate (delete) the subsequent rows for each set of data within the table. I believe I need a nested function in my update query, but I have not managed to write one. I've also tried creating a "delete_me" column, which would let me flag the rows and then run a single delete; that may be faster and easier to audit.
Ideally, I'd like to wrap this in a callable function, as there are a few different methods of truncation.
In my example below, I identify the date of the maximum cumulative value and then label the later-dated rows, by id, for eventual deletion.
///raw data for copy and paste - `:./Data/sample.csv;
id,idate,a,b,c
AAA,1/31/2014,1000,500,500
AAA,2/28/2014,900,500,50
AAA,3/31/2014,850,500,0
AAA,4/30/2014,800,500,0
AAA,5/31/2014,750,500,0
AAA,6/30/2014,700,500,0
AAA,7/31/2014,650,500,0
AAA,8/31/2014,550,500,0
AAA,9/30/2014,500,500,0
AAA,10/31/2014,450,500,0
BBB,6/30/2012,1000,500,2500
BBB,7/31/2012,950,500,75
BBB,8/31/2012,900,500,0
BBB,9/30/2012,850,500,0
BBB,10/31/2012,800,500,0
BBB,11/30/2012,750,500,0
BBB,12/31/2012,700,500,0
BBB,1/31/2013,650,500,0
BBB,2/28/2013,600,500,0
BBB,3/31/2013,550,500,0
BBB,4/30/2013,500,500,0
BBB,5/31/2013,450,500,0
BBB,6/30/2013,400,500,0
CCC,1/1/2016,1000,500,1200
CCC,2/29/2016,950,500,30
CCC,3/31/2016,900,500,0
CCC,4/30/2016,850,500,0
CCC,5/31/2016,800,500,0
CCC,6/30/2016,750,500,0
CCC,7/31/2016,700,500,0
CCC,8/31/2016,650,500,0
CCC,9/30/2016,600,500,0
CCC,10/31/2016,550,500,0
CCC,11/30/2016,500,500,0
CCC,12/31/2016,450,500,0
CCC,1/31/2017,400,500,0
CCC,2/28/2017,350,500,0
CCC,3/31/2017,300,500,0
CCC,4/30/2017,250,500,0
Load data and add some calculations
\c 100 150
t:("SSFFF";enlist",") 0:`:./Data/sample.csv;
t: update kdbDate: "D"$string idate, d:(a-(b+c)),cum_d: sums (a-(b+c)) from t;
t:![t; (); (enlist`id)!enlist`id; (enlist`maxCum_d)!enlist(max;`cum_d)];
t:![t; enlist(=;`maxCum_d;`cum_d); (enlist`id)!enlist`id; (enlist `date_cutoff)!enlist(*:;`kdbDate)];
Below is where I'm presently stuck. I've also thought of using fills to just fill in the date_cutoff for the remaining rows per id as well and avoid creating another column altogether.
show exec max(date_cutoff) by id from t;
assignDelete:{[t] update del: `delete_me by id from t where max (date_cutoff) > kdbDate}; //<--STUCK--
t: assignDelete over t;
t:![t; enlist (~:;(^:;`del)); 0b; `symbol$()] ; //delete from t where not null `del
Many thanks in advance! Desired output below
q)t
id idate a b c kdbDate d cum_d maxCum_d date_cutoff del
----------------------------------------------------------------------------------
AAA 1/31/2014 1000 500 500 2014.01.31 0 0 1650
AAA 2/28/2014 900 500 50 2014.02.28 350 350 1650
AAA 3/31/2014 850 500 0 2014.03.31 350 700 1650
AAA 4/30/2014 800 500 0 2014.04.30 300 1000 1650
AAA 5/31/2014 750 500 0 2014.05.31 250 1250 1650
AAA 6/30/2014 700 500 0 2014.06.30 200 1450 1650
AAA 7/31/2014 650 500 0 2014.07.31 150 1600 1650
AAA 8/31/2014 550 500 0 2014.08.31 50 1650 1650 2014.08.31
AAA 9/30/2014 500 500 0 2014.09.30 0 1650 1650 2014.08.31 delete_me
AAA 10/31/2014 450 500 0 2014.10.31 -50 1600 1650 delete_me
BBB 6/30/2012 1000 500 2500 2012.06.30 -2000 -400 1775
BBB 7/31/2012 950 500 75 2012.07.31 375 -25 1775
BBB 8/31/2012 900 500 0 2012.08.31 400 375 1775
BBB 9/30/2012 850 500 0 2012.09.30 350 725 1775
BBB 10/31/2012 800 500 0 2012.10.31 300 1025 1775
BBB 11/30/2012 750 500 0 2012.11.30 250 1275 1775
BBB 12/31/2012 700 500 0 2012.12.31 200 1475 1775
BBB 1/31/2013 650 500 0 2013.01.31 150 1625 1775
BBB 2/28/2013 600 500 0 2013.02.28 100 1725 1775
BBB 3/31/2013 550 500 0 2013.03.31 50 1775 1775 2013.03.31
BBB 4/30/2013 500 500 0 2013.04.30 0 1775 1775 2013.03.31 delete_me
BBB 5/31/2013 450 500 0 2013.05.31 -50 1725 1775 delete_me
BBB 6/30/2013 400 500 0 2013.06.30 -100 1625 1775 delete_me
CCC 1/1/2016 1000 500 1200 2016.01.01 -700 925 3145
CCC 2/29/2016 950 500 30 2016.02.29 420 1345 3145
CCC 3/31/2016 900 500 0 2016.03.31 400 1745 3145
CCC 4/30/2016 850 500 0 2016.04.30 350 2095 3145
CCC 5/31/2016 800 500 0 2016.05.31 300 2395 3145
CCC 6/30/2016 750 500 0 2016.06.30 250 2645 3145
CCC 7/31/2016 700 500 0 2016.07.31 200 2845 3145
CCC 8/31/2016 650 500 0 2016.08.31 150 2995 3145
CCC 9/30/2016 600 500 0 2016.09.30 100 3095 3145
CCC 10/31/2016 550 500 0 2016.10.31 50 3145 3145 2016.10.31
CCC 11/30/2016 500 500 0 2016.11.30 0 3145 3145 2016.10.31 delete_me
CCC 12/31/2016 450 500 0 2016.12.31 -50 3095 3145 delete_me
CCC 1/31/2017 400 500 0 2017.01.31 -100 2995 3145 delete_me
CCC 2/28/2017 350 500 0 2017.02.28 -150 2845 3145 delete_me
CCC 3/31/2017 300 500 0 2017.03.31 -200 2645 3145 delete_me
CCC 4/30/2017 250 500 0 2017.04.30 -250 2395 3145 delete_me
[EDIT] using fills on another column seemed to work okay.
Note truncation after the max(cum_d)
t: update del:fills date_cutoff by id from t where kdbDate>date_cutoff;
or in functional form
t: ![t; enlist(>;`kdbDate;`date_cutoff);(enlist`id)!enlist`id;(enlist`del)! enlist (^\;`date_cutoff)];
id idate a b c kdbDate d cum_d maxCum_d date_cutoff del
----------------------------------------------------------------------------
AAA 1/31/2014 1000 500 500 2014.01.31 0 0 1650
AAA 2/28/2014 900 500 50 2014.02.28 350 350 1650
AAA 3/31/2014 850 500 0 2014.03.31 350 700 1650
AAA 4/30/2014 800 500 0 2014.04.30 300 1000 1650
AAA 5/31/2014 750 500 0 2014.05.31 250 1250 1650
AAA 6/30/2014 700 500 0 2014.06.30 200 1450 1650
AAA 7/31/2014 650 500 0 2014.07.31 150 1600 1650
AAA 8/31/2014 550 500 0 2014.08.31 50 1650 1650 2014.08.31
BBB 6/30/2012 1000 500 2500 2012.06.30 -2000 -400 1775
BBB 7/31/2012 950 500 75 2012.07.31 375 -25 1775
BBB 8/31/2012 900 500 0 2012.08.31 400 375 1775
BBB 9/30/2012 850 500 0 2012.09.30 350 725 1775
BBB 10/31/2012 800 500 0 2012.10.31 300 1025 1775
BBB 11/30/2012 750 500 0 2012.11.30 250 1275 1775
BBB 12/31/2012 700 500 0 2012.12.31 200 1475 1775
BBB 1/31/2013 650 500 0 2013.01.31 150 1625 1775
BBB 2/28/2013 600 500 0 2013.02.28 100 1725 1775
BBB 3/31/2013 550 500 0 2013.03.31 50 1775 1775 2013.03.31
CCC 1/1/2016 1000 500 1200 2016.01.01 -700 925 3145
CCC 2/29/2016 950 500 30 2016.02.29 420 1345 3145
CCC 3/31/2016 900 500 0 2016.03.31 400 1745 3145
CCC 4/30/2016 850 500 0 2016.04.30 350 2095 3145
CCC 5/31/2016 800 500 0 2016.05.31 300 2395 3145
CCC 6/30/2016 750 500 0 2016.06.30 250 2645 3145
CCC 7/31/2016 700 500 0 2016.07.31 200 2845 3145
CCC 8/31/2016 650 500 0 2016.08.31 150 2995 3145
CCC 9/30/2016 600 500 0 2016.09.30 100 3095 3145
CCC 10/31/2016 550 500 0 2016.10.31 50 3145 3145 2016.10.31

For this solution I've left-joined date_cutoff by id to the table so that all date_cutoff entries are non-null, then used a vector conditional to determine whether to delete or not.
q)t:t lj select last date_cutoff by id from t where not null date_cutoff
q)update del:?[date_cutoff<kdbDate;`delete_me;`]from t
So long as there is only one distinct date_cutoff within an id grouping, this should work.

The maxs function calculates the running maximum value of a given vector. You can avoid adding those auxiliary columns altogether by leveraging this with an fby where clause:
// define the table
q)t:("SSFFF";enlist",") 0:`:./Data/sample.csv;
q)t: update kdbDate: "D"$string idate, d:(a-(b+c)),cum_d: sums (a-(b+c)) from t;
// delete rows with one q-sql statement
q)delete from t where ({prev max[x]=maxs[x]};cum_d) fby id
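For readers less familiar with fby, the same rule can be sketched in plain Python (not q): within each id, keep the rows up to and including the first row where cum_d reaches its maximum, and drop everything after it. The function name `truncate_after_peak` and the tuple layout are assumptions made for illustration; the values are a few rows taken from the question's table.

```python
from itertools import groupby
from operator import itemgetter

def truncate_after_peak(rows):
    """Keep, per id, the rows up to and including the first peak of cum_d.

    `rows` is a list of (id, date, cum_d) tuples, already sorted by id and date.
    """
    kept = []
    for _, grp in groupby(rows, key=itemgetter(0)):
        grp = list(grp)
        values = [r[2] for r in grp]
        peak_at = values.index(max(values))  # first occurrence of the maximum
        kept.extend(grp[: peak_at + 1])
    return kept

rows = [
    ("AAA", "2014.07.31", 1600), ("AAA", "2014.08.31", 1650),
    ("AAA", "2014.09.30", 1650), ("AAA", "2014.10.31", 1600),
    ("BBB", "2013.02.28", 1725), ("BBB", "2013.03.31", 1775),
    ("BBB", "2013.04.30", 1775), ("BBB", "2013.05.31", 1725),
]
result = truncate_after_peak(rows)
```

Note that a later row that merely ties the maximum (AAA on 2014.09.30) is dropped, which matches the desired output in the question.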

Related

kdb - KDB Apply logic where column exists - data validation

I'm trying to perform some simple logic on a table, but I'd like to verify that the columns exist before doing so, as a validation step. My data uses standard column names, though they are not always present in each data source.
While the following seems to work (just validating AAA at present), I need to expand it to ensure that PRI_AAA (and eventually many other variables) is present as well.
t: $[`AAA in cols `t; temp: update AAA_VAL: AAA*AAA_PRICE from t;()]
Two-part question:
This seems quite tedious for each variable (imagine AAA-ZZZ inputs and their derivatives). Is there a clever way to leverage a dictionary (or table) to check whether a number of variables exist, or to insert a placeholder column of zeros if they do not?
Similarly, can we store a formula or instructions to apply within a dictionary (or table) to validate and return a calculation (e.g. BBB_VAL: BBB*BBB_PRICE)? Some calculations would depend on others (e.g. BBB_Tax_Basis = BBB_VAL - BBB_COSTS), so there could be iterative issues.
Thanks in advance!

A functional update may be the best way to achieve this if your intention is to update many columns of a table in a similar fashion.
func:{[t;x]
  if[not x in cols t;t:![t;();0b;(enlist x)!enlist 0]];
  :$[x in cols t;
    ![t;();0b;(enlist`$string[x],"_VAL")!enlist(*;x;`$string[x],"_PRICE")];
    t;
    ];
  };
This function will update t with *_VAL columns for any column you pass as an argument, while first also adding a zero column for any missing columns passed as an argument.
q)t:([]AAA:10?100;BBB:10?100;CCC:10?100;AAA_PRICE:10*10?10;BBB_PRICE:10*10?10;CCC_PRICE:10*10?10;DDD_PRICE:10*10?10)
q)func/[t;`AAA`BBB`CCC`DDD]
AAA BBB CCC AAA_PRICE BBB_PRICE CCC_PRICE DDD_PRICE AAA_VAL BBB_VAL CCC_VAL DDD DDD_VAL
---------------------------------------------------------------------------------------
70 28 89 10 90 0 0 700 2520 0 0 0
39 17 97 50 90 40 10 1950 1530 3880 0 0
76 11 11 0 0 50 10 0 0 550 0 0
26 55 99 20 60 80 90 520 3300 7920 0 0
91 51 3 30 20 0 60 2730 1020 0 0 0
83 81 7 70 60 40 90 5810 4860 280 0 0
76 68 98 40 80 90 70 3040 5440 8820 0 0
88 96 30 70 0 80 80 6160 0 2400 0 0
4 61 2 70 90 0 40 280 5490 0 0 0
56 70 15 0 50 30 30 0 3500 450 0 0
As you've already mentioned, to cover point 2, a dictionary of functions might be the best way to go.
q)dict:raze{(enlist`$string[x],"_VAL")!enlist(*;x;`$string[x],"_PRICE")}each`AAA`BBB`DDD
q)dict
AAA_VAL| * `AAA `AAA_PRICE
BBB_VAL| * `BBB `BBB_PRICE
DDD_VAL| * `DDD `DDD_PRICE
And then a slightly modified function...
func:{[dict;t;x]
  if[not x in cols t;t:![t;();0b;(enlist x)!enlist 0]];
  :$[x in cols t;
    ![t;();0b;(enlist`$string[x],"_VAL")!enlist(dict`$string[x],"_VAL")];
    t;
    ];
  };
yields a similar result.
q)func[dict]/[t;`AAA`BBB`DDD]
AAA BBB CCC AAA_PRICE BBB_PRICE CCC_PRICE DDD_PRICE AAA_VAL BBB_VAL DDD DDD_VAL
-------------------------------------------------------------------------------
70 28 89 10 90 0 0 700 2520 0 0
39 17 97 50 90 40 10 1950 1530 0 0
76 11 11 0 0 50 10 0 0 0 0
26 55 99 20 60 80 90 520 3300 0 0
91 51 3 30 20 0 60 2730 1020 0 0
83 81 7 70 60 40 90 5810 4860 0 0
76 68 98 40 80 90 70 3040 5440 0 0
88 96 30 70 0 80 80 6160 0 0 0
4 61 2 70 90 0 40 280 5490 0 0
56 70 15 0 50 30 30 0 3500 0 0
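The idea behind this answer (add a zero placeholder for a missing column, then derive X_VAL from X and X_PRICE) can be sketched in plain Python rather than q. The helper name `add_val_columns` and the dict-of-lists table layout are assumptions for illustration; like the q version, the sketch assumes the matching `_PRICE` column is already present. The sample values are the first two rows of the output above.

```python
def add_val_columns(table, names):
    """Return a copy of `table` (column name -> list) with X_VAL = X * X_PRICE.

    A missing column X is first added as a column of zeros.
    The X_PRICE column is assumed to exist already.
    """
    out = {col: list(vals) for col, vals in table.items()}
    n = len(next(iter(out.values())))          # number of rows
    for name in names:
        out.setdefault(name, [0] * n)          # placeholder for a missing column
        price = out[name + "_PRICE"]
        out[name + "_VAL"] = [a * p for a, p in zip(out[name], price)]
    return out

table = {
    "AAA": [70, 39],
    "AAA_PRICE": [10, 50],
    "DDD_PRICE": [0, 10],
}
out = add_val_columns(table, ["AAA", "DDD"])
```

The input table is copied rather than modified, which mirrors the way the functional update returns a new table.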

Here's another approach which handles dependent/cascading calculations and also figures out which calculations are possible or not depending on the available columns in the table.
q)show map:`AAA_VAL`BBB_VAL`AAA_RevenueP`AAA_RevenueM`BBB_Other!((*;`AAA;`AAA_PRICE);(*;`BBB;`BBB_PRICE);(+;`AAA_Revenue;`AAA_VAL);(%;`AAA_RevenueP;1e6);(reciprocal;`BBB_VAL));
AAA_VAL | (*;`AAA;`AAA_PRICE)
BBB_VAL | (*;`BBB;`BBB_PRICE)
AAA_RevenueP| (+;`AAA_Revenue;`AAA_VAL)
AAA_RevenueM| (%;`AAA_RevenueP;1000000f)
BBB_Other | (%:;`BBB_VAL)
func:{c:{$[0h=type y;.z.s[x]each y;-11h<>type y;y;y in key x;.z.s[x]each x y;y]}[y]''[y];
  ![x;();0b;where[{all in[;cols x]r where -11h=type each r:(raze/)y}[x]each c]#c]};
q)t:([] AAA:1 2 3;AAA_PRICE:1 2 3f;AAA_Revenue:10 20 30;BBB:4 5 6);
q)func[t;map]
AAA AAA_PRICE AAA_Revenue BBB AAA_VAL AAA_RevenueP AAA_RevenueM
---------------------------------------------------------------
1 1 10 4 1 11 1.1e-05
2 2 20 5 4 24 2.4e-05
3 3 30 6 9 39 3.9e-05
/if the right columns are there
q)t:([] AAA:1 2 3;AAA_PRICE:1 2 3f;AAA_Revenue:10 20 30;BBB:4 5 6;BBB_PRICE:4 5 6f);
q)func[t;map]
AAA AAA_PRICE AAA_Revenue BBB BBB_PRICE AAA_VAL BBB_VAL AAA_RevenueP AAA_RevenueM BBB_Other
--------------------------------------------------------------------------------------------
1 1 10 4 4 1 16 11 1.1e-05 0.0625
2 2 20 5 5 4 25 24 2.4e-05 0.04
3 3 30 6 6 9 36 39 3.9e-05 0.02777778
The only caveat is that your map can't have the same column name as both a key and in the value of your map, i.e. you cannot re-use column names. It's also assumed that all symbols in your map are column names (not global variables), though the function could be extended to cover that.
EDIT: if you have a large number of column maps then it will be easier to define it in a more vertical fashion like so:
map:(!). flip(
  (`AAA_VAL; (*;`AAA;`AAA_PRICE));
  (`BBB_VAL; (*;`BBB;`BBB_PRICE));
  (`AAA_RevenueP;(+;`AAA_Revenue;`AAA_VAL));
  (`AAA_RevenueM;(%;`AAA_RevenueP;1e6));
  (`BBB_Other; (reciprocal;`BBB_VAL))
  );
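To make the cascading behaviour concrete, here is a rough Python sketch (not q) of the same outcome. It uses a different technique from the answer: instead of expanding parse trees, it repeatedly evaluates any formula whose input columns are already available and stops when a full pass makes no progress. The function name `apply_formulas` and the tuple encoding of formulas are assumptions for illustration.

```python
import operator

def apply_formulas(table, formulas):
    """Add every computable formula column to a copy of `table`.

    `table` maps column name -> list of values. `formulas` maps a new column
    name -> (function, arg, ...), where a str arg names a column and any
    other arg is treated as a constant.
    """
    out = {col: list(vals) for col, vals in table.items()}
    n = len(next(iter(out.values())))
    pending = dict(formulas)
    progress = True
    while pending and progress:
        progress = False
        for name, (fn, *args) in list(pending.items()):
            # a formula is ready once all of its column inputs exist
            if all(a in out for a in args if isinstance(a, str)):
                cols = [out[a] if isinstance(a, str) else [a] * n for a in args]
                out[name] = [fn(*vals) for vals in zip(*cols)]
                del pending[name]
                progress = True
    return out

formulas = {
    "AAA_VAL": (operator.mul, "AAA", "AAA_PRICE"),
    "BBB_VAL": (operator.mul, "BBB", "BBB_PRICE"),
    "AAA_RevenueP": (operator.add, "AAA_Revenue", "AAA_VAL"),
    "AAA_RevenueM": (operator.truediv, "AAA_RevenueP", 1e6),
    "BBB_Other": (lambda v: 1 / v, "BBB_VAL"),
}
table = {"AAA": [1, 2, 3], "AAA_PRICE": [1.0, 2.0, 3.0],
         "AAA_Revenue": [10, 20, 30], "BBB": [4, 5, 6]}
result = apply_formulas(table, formulas)
```

With BBB_PRICE missing, BBB_VAL and the dependent BBB_Other are skipped, while the AAA chain is computed in dependency order, as in the first q example.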

KDB - Temporal function results become input for function on subsequent row

I've been working with KDB to create temporal data from a set of inputs that represent segment functions of a model. The challenge I'm facing is that, for a particular ID, there can be several segments where the last value of the temporal results from one segment becomes an input (but not always) to the subsequent segment.
//Create sample table
t:([id:`AAA`AAA`AAA`BBB`CCC`CCC;seg:1 2 3 1 1 2];aa: 1500 0n 0n 40 900 0N;bb:150 200 30 40 10 15;cc: .40 .25 .35 .5 .35 .45; Fname:`Fx`Fy`Fy`Fy`Fz`Fz);
The simple dummy functions below return 5 periods of data but actually each function can return several thousand data points
//Dummy functions to generate temporal data
Fx:{[aa;bb;cc] (aa%bb)*(1-exp(neg cc*1+til 5))*100};
Fy:{[aa;bb;cc] (aa%cc)*(1*(1-exp(neg cc*1+til 5)))};
Fz:{[aa;bb;cc] (aa%bb)*(1-exp(neg cc*1+til 5))};
When I run each function, we can see that aa is missing on a few segments. The aa should be the last value of result from the prior segment (i.e. aa = 864.6647 for AAA seg 2 and aa = 74.36035 for CCC seg 2).
show update result:first[Fname]'[aa;bb;cc] by Fname from t
id seg| aa bb cc Fname result
-------| ----------------------------------------------------------------
AAA 1 | 1500 150 0.4 Fx 329.68 550.671 698.8058 798.1035 864.6647
AAA 2 | 200 0.25 Fy
AAA 3 | 30 0.35 Fy
BBB 1 | 40 40 0.5 Fy 31.47755 50.56964 62.14959 69.17318 73.4332
CCC 1 | 900 10 0.35 Fz 26.57807 45.30732 58.5056 67.80627 74.36035
CCC 2 | 15 0.45 Fz
I've played around with trying to reference the prior segment with prev(last(t[result])), but the list result isn't referential. Similarly, I understand the / (over) iterator would be useful, but I've been unsuccessful implementing it.
I thought about breaking this up into several steps (all the segment 1's, then 2's, and so on) and then appending them all to a final table. I'd also like to keep track of each segment's cumulative values and temporal counts (time) to pass to functions as limiters, so successfully referencing the previous row has multiple uses.
Ultimately, once populated, I'll ungroup the table to get it into an output similar to below which I could then re-sort if needed.
q)show ungroup t
id seg aa bb cc Fname result
------------------------------------
AAA 1 1500 150 0.4 Fx 329.68
AAA 1 1500 150 0.4 Fx 550.671
AAA 1 1500 150 0.4 Fx 698.8058
AAA 1 1500 150 0.4 Fx 798.1035
AAA 1 1500 150 0.4 Fx 864.6647
AAA 2 200 0.25 Fy
AAA 2 200 0.25 Fy
AAA 2 200 0.25 Fy
AAA 2 200 0.25 Fy
AAA 2 200 0.25 Fy
AAA 3 30 0.35 Fy
AAA 3 30 0.35 Fy
AAA 3 30 0.35 Fy
AAA 3 30 0.35 Fy
AAA 3 30 0.35 Fy
BBB 1 40 40 0.5 Fy 31.47755
BBB 1 40 40 0.5 Fy 50.56964
BBB 1 40 40 0.5 Fy 62.14959
BBB 1 40 40 0.5 Fy 69.17318
BBB 1 40 40 0.5 Fy 73.4332

TL;DR I think the following is what you want:
q)t:update result:count[t]#enlist`float$() from t; // table extended to already contain a results column
q)applyF:{[t] update result:first[Fname]'[aa;bb;cc] by Fname from t where not null aa, 0=count each result} //applies each Fname function when needed
q)updateA:{[t]update aa:prev[last each result]^aa by id from t}; // updates column aa based on previous results
q)myUpd:updateA applyF ::; // helper function applying the two above
q)ungroup myUpd over t;
id seg aa bb cc Fname result
----------------------------------------
AAA 1 1500 150 0.4 Fx 329.68
AAA 1 1500 150 0.4 Fx 550.671
AAA 1 1500 150 0.4 Fx 698.8058
AAA 1 1500 150 0.4 Fx 798.1035
AAA 1 1500 150 0.4 Fx 864.6647
AAA 2 864.6647 200 0.25 Fy 765.0526
AAA 2 864.6647 200 0.25 Fy 1360.876
AAA 2 864.6647 200 0.25 Fy 1824.904
AAA 2 864.6647 200 0.25 Fy 2186.289
AAA 2 864.6647 200 0.25 Fy 2467.737
AAA 3 2467.737 30 0.35 Fy 2082.149
AAA 3 2467.737 30 0.35 Fy 3549.414
AAA 3 2467.737 30 0.35 Fy 4583.378
AAA 3 2467.737 30 0.35 Fy 5312.001
AAA 3 2467.737 30 0.35 Fy 5825.452
BBB 1 40 40 0.5 Fy 31.47755
BBB 1 40 40 0.5 Fy 50.56964
BBB 1 40 40 0.5 Fy 62.14959
BBB 1 40 40 0.5 Fy 69.17318
BBB 1 40 40 0.5 Fy 73.4332
CCC 1 900 10 0.35 Fz 26.57807
CCC 1 900 10 0.35 Fz 45.30732
CCC 1 900 10 0.35 Fz 58.5056
CCC 1 900 10 0.35 Fz 67.80627
CCC 1 900 10 0.35 Fz 74.36035
CCC 2 74.36035 15 0.45 Fz 1.796406
CCC 2 74.36035 15 0.45 Fz 2.941846
CCC 2 74.36035 15 0.45 Fz 3.67221
CCC 2 74.36035 15 0.45 Fz 4.137911
CCC 2 74.36035 15 0.45 Fz 4.434855
Now for a, hopefully not too long winded, explanation.
I'm going to make a couple of assumptions here:
Only column aa will have nulls
We can defer evaluating result for rows which do not yet have aa defined
For convenience I initiate t so that it has an empty result column
q)t:update result:count[t]#enlist`float$() from t;
id seg| aa bb cc Fname result
-------| --------------------------
AAA 1 | 1500 150 0.4 Fx
AAA 2 | 200 0.25 Fy
AAA 3 | 30 0.35 Fy
BBB 1 | 40 40 0.5 Fy
CCC 1 | 900 10 0.35 Fz
CCC 2 | 15 0.45 Fz
and define a function that will compute result for any rows which have aa defined and which have not already been computed
q)applyF:{[t] update result:first[Fname]'[aa;bb;cc] by Fname from t where not null aa, 0=count each result};
Now generating results is as simple as calling the function
q)applyF t;
id seg| aa bb cc Fname result
-------| ---------------------------------------------------------------
AAA 1 | 1500 150 0.4 Fx 329.68 550.671 698.8058 798.1035 864.6647
AAA 2 | 200 0.25 Fy `float$()
AAA 3 | 30 0.35 Fy `float$()
BBB 1 | 40 40 0.5 Fy 31.47755 50.56964 62.14959 69.17318 73.4332
CCC 1 | 900 10 0.35 Fz 26.57807 45.30732 58.5056 67.80627 74.36035
CCC 2 | 15 0.45 Fz `float$()
To grab the next aa value from result you can do something like
q)update aa:prev[last each result]^aa by id from applyF t;
id seg| aa bb cc Fname result
-------| -------------------------------------------------------------------
AAA 1 | 1500 150 0.4 Fx 329.68 550.671 698.8058 798.1035 864.6647
AAA 2 | 864.6647 200 0.25 Fy `float$()
AAA 3 | 30 0.35 Fy `float$()
BBB 1 | 40 40 0.5 Fy 31.47755 50.56964 62.14959 69.17318 73.4332
CCC 1 | 900 10 0.35 Fz 26.57807 45.30732 58.5056 67.80627 74.36035
CCC 2 | 74.36035 15 0.45 Fz `float$()
We can simplify by writing another function for updating aa
q)updateA:{[t]update aa:prev[last each result]^aa by id from t};
q)updateA applyF t
id seg| aa bb cc Fname result
-------| -------------------------------------------------------------------
AAA 1 | 1500 150 0.4 Fx 329.68 550.671 698.8058 798.1035 864.6647
AAA 2 | 864.6647 200 0.25 Fy `float$()
AAA 3 | 30 0.35 Fy `float$()
BBB 1 | 40 40 0.5 Fy 31.47755 50.56964 62.14959 69.17318 73.4332
CCC 1 | 900 10 0.35 Fz 26.57807 45.30732 58.5056 67.80627 74.36035
CCC 2 | 74.36035 15 0.45 Fz `float$()
Now to get your desired result we will want to apply these updates over and over. Your instincts about the over iterator are correct here. The usage here applies the updates until the table stops changing (i.e. until it converges).
q)myUpd:updateA applyF ::; // both update functions combined into one for convenience
q)myUpd over t
id seg| aa bb cc Fname result
-------| --------------------------------------------------------------------
AAA 1 | 1500 150 0.4 Fx 329.68 550.671 698.8058 798.1035 864.6647
AAA 2 | 864.6647 200 0.25 Fy 765.0526 1360.876 1824.904 2186.289 2467.737
AAA 3 | 2467.737 30 0.35 Fy 2082.149 3549.414 4583.378 5312.001 5825.452
BBB 1 | 40 40 0.5 Fy 31.47755 50.56964 62.14959 69.17318 73.4332
CCC 1 | 900 10 0.35 Fz 26.57807 45.30732 58.5056 67.80627 74.36035
CCC 2 | 74.36035 15 0.45 Fz 1.796406 2.941846 3.67221 4.137911 4.434855
q)(myUpd myUpd myUpd t) ~ (myUpd over t)
1b
And you can apply ungroup to the result above to get your desired output.
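The converge pattern can also be sketched in plain Python (not q) for readers who want to trace the iterations. The names `step` and `converge` and the list-of-dicts layout are assumptions for illustration; `fx` and `fy` mirror the question's dummy functions Fx and Fy, and only the three AAA segments are used.

```python
import math

def fx(aa, bb, cc):
    return [(aa / bb) * (1 - math.exp(-cc * k)) * 100 for k in range(1, 6)]

def fy(aa, bb, cc):
    return [(aa / cc) * (1 - math.exp(-cc * k)) for k in range(1, 6)]

def step(rows):
    """One pass: compute results where aa is known, then carry the last result forward."""
    new = [dict(r) for r in rows]
    for r in new:
        if r["aa"] is not None and not r["result"]:
            r["result"] = r["fn"](r["aa"], r["bb"], r["cc"])
    for prev, cur in zip(new, new[1:]):
        if cur["id"] == prev["id"] and cur["aa"] is None and prev["result"]:
            cur["aa"] = prev["result"][-1]
    return new

def converge(f, x):
    """Apply f repeatedly until the value stops changing."""
    while True:
        nxt = f(x)
        if nxt == x:
            return x
        x = nxt

rows = [
    {"id": "AAA", "seg": 1, "aa": 1500.0, "bb": 150, "cc": 0.40, "fn": fx, "result": []},
    {"id": "AAA", "seg": 2, "aa": None, "bb": 200, "cc": 0.25, "fn": fy, "result": []},
    {"id": "AAA", "seg": 3, "aa": None, "bb": 30, "cc": 0.35, "fn": fy, "result": []},
]
final = converge(step, rows)
```

Each pass fills in one more segment, so the number of passes grows with the longest chain of dependent segments within an id.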

Another approach, using scan (the accumulating form of over):
q)update res:{z .@[y;0;{y^x};last x]}\[0n;flip(aa;bb;cc);Fname] from t
id seg| aa bb cc Fname res
-------| ----------------------------------------------------------------
AAA 1 | 1500 150 0.4 Fx 329.68 550.671 698.8058 798.1035 864.6647
AAA 2 | 200 0.25 Fy 765.0526 1360.876 1824.904 2186.289 2467.737
AAA 3 | 30 0.35 Fy 2082.149 3549.414 4583.378 5312.001 5825.452
BBB 1 | 40 40 0.5 Fy 31.47755 50.56964 62.14959 69.17318 73.4332
CCC 1 | 900 10 0.35 Fz 26.57807 45.30732 58.5056 67.80627 74.36035
CCC 2 | 15 0.45 Fz 1.796406 2.941846 3.67221 4.137911 4.434855
What I'm not clear on from your question is whether or not the "last value" is allowed to spill over from one id to a different id. If it shouldn't, you could simply add a "by id" to my solution

Accumulate all values at every point in time by symbol

I have this table:
execs:([]time:til 12;sym:`a`b`c`a`c`c`c`b`b`a`b`c;leavesQty:(1000;900;1300;800;1200;900;600;800;400;300;200;100))
I have different syms, and each has a leavesQty at certain times. I now want to extend the table so that every row holds the sum, across all syms, of the latest leavesQty for each sym at that time.
So for this example I would need to arrive at these values:
execs:([]time:til 12;sym:`a`b`c`a`c`c`c`b`b`a`b`c;leavesQty:(1000;900;1300;800;1200;900;600;800;400;300;200;100);accLeavesQty:(1000;1900;3200;3000;2900;2600;2300;2200;1800;1300;1100;600))

You can add this column with a single update statement if you use fby:
q)update accLeavesQty:sums (deltas;leavesQty) fby sym from execs
time sym leavesQty accLeavesQty
-------------------------------
0 a 1000 1000
1 b 900 1900
2 c 1300 3200
3 a 800 3000
4 c 1200 2900
5 c 900 2600
6 c 600 2300
7 b 800 2200
8 b 400 1800
9 a 300 1300
10 b 200 1100
11 c 100 600

First you want the deltas of leavesQty for each symbol, so you can see how the value changes over time. After that, you just need a cumulative sum of the resulting column.
q)update sums accLeavesQty from update accLeavesQty:deltas leavesQty by sym from execs
time sym leavesQty accLeavesQty
-------------------------------
0 a 1000 1000
1 b 900 1900
2 c 1300 3200
3 a 800 3000
4 c 1200 2900
5 c 900 2600
6 c 600 2300
7 b 800 2200
8 b 400 1800
9 a 300 1300
10 b 200 1100
11 c 100 600

You have a nice case for fby
q)update accLeavesQty:sums (deltas;leavesQty) fby sym from execs
time sym leavesQty accLeavesQty
-------------------------------
0 a 1000 1000
1 b 900 1900
2 c 1300 3200
3 a 800 3000
4 c 1200 2900
5 c 900 2600
6 c 600 2300
7 b 800 2200
8 b 400 1800
9 a 300 1300
10 b 200 1100
11 c 100 600

Another method uses an accumulating scan:
update accLeavesQty:sum each @[;;:;]\[()!();sym;leavesQty] from execs
It keeps a running dictionary of the latest leavesQty for each sym and then sums the values of each dictionary:
q)update accLeavesQty:@[;;:;]\[()!();sym;leavesQty] from execs
time sym leavesQty accLeavesQty
---------------------------------------
0 a 1000 (,`a)!,1000
1 b 900 `a`b!1000 900
2 c 1300 `a`b`c!1000 900 1300
3 a 800 `a`b`c!800 900 1300
4 c 1200 `a`b`c!800 900 1200
5 c 900 `a`b`c!800 900 900
6 c 600 `a`b`c!800 900 600
7 b 800 `a`b`c!800 800 600
8 b 400 `a`b`c!800 400 600
9 a 300 `a`b`c!300 400 600
10 b 200 `a`b`c!300 200 600
11 c 100 `a`b`c!300 200 100
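Both ideas from the answers (a running dictionary of the latest value per sym, and a cumulative sum of per-sym deltas) can be sketched in plain Python, not q, using the data from the question. The function names `acc_latest` and `acc_deltas` are assumptions for illustration.

```python
def acc_latest(syms, qtys):
    """Running dictionary of the latest qty per sym; total its values at each row."""
    latest = {}
    totals = []
    for s, q in zip(syms, qtys):
        latest[s] = q
        totals.append(sum(latest.values()))
    return totals

def acc_deltas(syms, qtys):
    """Add each row's change relative to the previous qty of the same sym."""
    last = {}
    total = 0
    totals = []
    for s, q in zip(syms, qtys):
        total += q - last.get(s, 0)   # the first delta of a sym is the qty itself
        last[s] = q
        totals.append(total)
    return totals

syms = ["a", "b", "c", "a", "c", "c", "c", "b", "b", "a", "b", "c"]
qtys = [1000, 900, 1300, 800, 1200, 900, 600, 800, 400, 300, 200, 100]
by_latest = acc_latest(syms, qtys)
by_deltas = acc_deltas(syms, qtys)
```

The delta version does constant work per row, whereas the dictionary version re-sums every sym at each row, which may matter when there are many distinct syms.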

Normalizing data and inverse

I use the following code to normalise my data in MATLAB:
Data=[130 100 100 100 300 300 30 300 30 320 200 50 300 25 100 250 1200 300 320 300 100 100 170 500 1000 200 120 450 200 2100 100 100 100 3450 500 30 100 550 4000 150 500 380 400 4000 180 540 700 100 500 2300 200 200 50 2000 400 100 50 50 100 4000 4000 3250 100 100 100 100 3300 100 100 4020 150 150 2300 3000 100 50 100 50 100 200 2000 300]
Data=randn(100,1);
%Histogram with histfit
nbins=20;
h=histfit(Data,nbins);
[nelements,bincenters]=hist(Data,nbins);
hold on %histogram computed manually and scaled with trapz()
avg=mean(Data);
stdev=std(Data);
x=sort(Data);
y=exp(-0.5*((x-avg)/stdev).^2)/(stdev*sqrt(2*pi));
plot(x,y*trapz(bincenters,nelements),'g','LineWidth',1.5)
legend('Histogram','Distribution from hisfit','Distribution computed manually')
It works well, and I found the mean and standard deviation. Now I am trying to invert the y values using ln to recover the original points, but I have not been successful. Could you please give me any idea how to invert the y values?
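One possible way to look at the inversion, sketched in Python rather than MATLAB: solving y = exp(-0.5*((x-avg)/stdev)^2)/(stdev*sqrt(2*pi)) for x gives x = avg ± stdev*sqrt(-2*ln(y*stdev*sqrt(2*pi))). Because the curve is symmetric about avg, y alone cannot say which side of the mean the original point was on, so the sketch returns both candidates. The function names and the numeric values are illustrative assumptions, not taken from the question.

```python
import math

def normal_pdf(x, avg, stdev):
    return math.exp(-0.5 * ((x - avg) / stdev) ** 2) / (stdev * math.sqrt(2 * math.pi))

def invert_normal_pdf(y, avg, stdev):
    """Return the two x values (below and above avg) whose density is y."""
    # max() guards against a tiny negative value caused by rounding at the peak
    inner = max(0.0, -2.0 * math.log(y * stdev * math.sqrt(2 * math.pi)))
    r = stdev * math.sqrt(inner)
    return avg - r, avg + r

avg, stdev = 1.5, 2.0          # illustrative values only
x = 3.2
lo, hi = invert_normal_pdf(normal_pdf(x, avg, stdev), avg, stdev)
```

Here `hi` recovers the original 3.2 and `lo` is its mirror image about the mean, -0.2.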

How to write matrices from matlab to .xlsx with special formatting tables

I have a problem exporting matrices from MATLAB to Excel. The export itself is not the problem, but I need some formatting.
I created matrices A and B and wrote them to an .xlsx document.
filename = 'example.xlsx';
A;
sheet = 1;
xlRange = 'A9';
xlswrite(filename,A,sheet,xlRange)
B;
xlRange2= 'B9';
xlswrite(filename,B,sheet,xlRange2)
And I get the example.xlsx file with this formatting:
400 4.56
500 5.12
600 6.76
700 7.98
800 8.21
900 9.21
1000 10.12
1100 11.23
1200 12.43
1300 13.89
1400 14.54
1500 15.21
1600 16.23
1700 17.53
I need this kind of formatting:
400 4.56
500 5.12
600 6.76
700 7.98
800 8.21
900 9.21
1000 10.12
100 11.23
200 12.43
300 13.89
400 14.54
500 15.21
600 16.23
700 17.53
Steps are at 500, 1000, 1500, 2000, 2500... How can I insert one empty row at each step to get this kind of formatting?

This code provides the cell as required for xlswrite:
M=[400 4.56
500 5.12
600 6.76
700 7.98
800 8.21
900 9.21
1000 10.12
1100 11.23
1200 12.43
1300 13.89
1400 14.54
1500 15.21
1600 16.23
1700 17.53
900 9.21
1000 10.12
1100 11.23
1200 12.43
1300 13.89
1400 14.54
1500 15.21
1600 16.23
1700 17.53];
gaps=[500, 1000, 1500, 2000, 2500];
%calculates a group index: 0 is below the first gap, 1 is between the first and second, etc.
group=sum(bsxfun(@ge,M(:,1),gaps),2);
%whenever group increases, a row must be skipped; calculate the target row indices
index=cumsum(ones(size(M,1),1)+[0;diff(group)>0]);
%allocate empty cell
X=cell(max(index),size(M,2));
%fill data
X(index,:)=num2cell(M);
xlswrite('a.xlsx',X)
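The index arithmetic in this answer can be traced with a small Python sketch (not MATLAB). It follows the same steps: count how many gap thresholds each first-column value has reached, advance the target row by one extra position whenever that count increases, and leave the skipped positions empty. The function name `with_gap_rows` and the shortened sample data are assumptions for illustration.

```python
from itertools import accumulate

def with_gap_rows(rows, gaps):
    """Insert an empty row (None) wherever the first column crosses a gap threshold."""
    if not rows:
        return []
    # group index: number of thresholds that the value has reached
    groups = [sum(r[0] >= g for g in gaps) for r in rows]
    # step of 2 when the group increases, otherwise 1
    steps = [1] + [1 + (b > a) for a, b in zip(groups, groups[1:])]
    index = list(accumulate(steps))        # 1-based target row, as in the MATLAB code
    out = [None] * index[-1]
    for i, r in zip(index, rows):
        out[i - 1] = r
    return out

rows = [(400, 4.56), (500, 5.12), (600, 6.76), (900, 9.21), (1000, 10.12), (1100, 11.23)]
result = with_gap_rows(rows, [500, 1000, 1500, 2000, 2500])
```

As in the MATLAB version, the empty row lands just before the first value that reaches a threshold, and a drop in the group index does not insert a row.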
