kdb - nested functions within nested dictionaries

All,
I'm testing whether I can build a robust data model using nested dictionaries of data and functions. The namespace/dot notation works well for what I'm attempting and we can reference the dictionary fine, but I can't get the functions to execute when called the way I expected.
I was hoping to reference the dictionary in the functions recursively, so that they execute and recalculate when called and we avoid incredibly long functions.
Keeping the functions as part of the dictionary is an objective as we expect to have various functions for different dictionaries of data.
Here is a simple example of a nested dictionary and functions with the objective to calculate total rev based on the data in the inventory for prodA and prodB.
Many thanks in advance!
.inv.prodA.qty: 10 #10;
.inv.prodA.price: 10 #2.75;
.inv.prodA.cogs: 10 # 1.35;
.inv.prodA.net_price: {[x] x[`price] - x[`cogs]};
.inv.prodA.net_price [.inv.prodA];
.inv.prodA.rev: {[x] x[`qty] * (inv.net_price[ x[`price] - x[`cogs]])};
.inv.prodB.qty: 20 #2.50;
.inv.prodB.price: 20 #1.50;
.inv.prodB.cogs: 20 # 0.25;
.inv.prodB.net_price: {[x] x[`price] - x[`cogs]};
.inv.prodB.net_price [.inv.prodB];
.inv.prodB.rev: {[x] x[`qty] * (inv.net_price[ x[`price] - x[`cogs]])*.5}; //Note that the rev f(x) are unique for each prodA and prodB
.inv.total.rev: {[x] x[`prodA`rev] + x[`prodB`rev]};

I think you want something like:
.inv.prodA.net_price:{.inv[x;`price] - .inv[x;`cogs]};
.inv.prodA.rev:{.inv[x;`qty] * .inv[x;`net_price]x};
.inv.prodB.net_price:{.inv[x;`price] - .inv[x;`cogs]};
.inv.prodB.rev:{.inv[x;`qty] * .inv[x;`net_price][x] *.5};
.inv.total.rev:{.inv[`prodA;`rev][`prodA] , .inv[`prodB;`rev]`prodB};
q).inv.total.rev[]
14 14 14 14 14 14 14 14 14 14 1.5625 1.5625 1.5625 1.5625 1.5625 1.5625 1.562..
I changed your last function to an append (,) because the two results are not the same length, so they can't be added element-wise. If you want to pass .inv in as a variable too, you can modify the functions to pass it through as follows:
.inv.prodA.net_price:{y[x;`price] - y[x;`cogs]};
.inv.prodA.rev:{y[x;`qty] * y[x;`net_price][x;y]};
.inv.prodB.net_price:{y[x;`price] - y[x;`cogs]};
.inv.prodB.rev:{y[x;`qty] * y[x;`net_price][x;y] *.5};
.inv.total.rev:{x[`prodA;`rev][`prodA;x] , x[`prodB;`rev][`prodB;x]};
q).inv.total.rev[`.inv]
14 14 14 14 14 14 14 14 14 14 1.5625 1.5625 1.5625 1.5625 1.5625 1.5625 1.562..
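For readers who don't use q, the second pattern above (functions stored in the nested dictionary, each receiving the product key and the root dictionary) can be sketched in Python. This is only an illustrative analogue under my own assumptions: the name `inv` and the lambda layout are hypothetical and not part of the original answer.

```python
# Hypothetical Python analogue of the q namespace: a nested dict whose
# entries hold both data and functions. Each function takes the product
# key and the root dict, mirroring the {y[x;`price] - y[x;`cogs]} form.
def net_price(key, root):
    prod = root[key]
    return [p - c for p, c in zip(prod["price"], prod["cogs"])]

inv = {
    "prodA": {
        "qty": [10] * 10,
        "price": [2.75] * 10,
        "cogs": [1.35] * 10,
        "net_price": net_price,
        # rev looks up net_price through the dictionary, then calls it
        "rev": lambda key, root: [
            q * n for q, n in zip(root[key]["qty"], root[key]["net_price"](key, root))
        ],
    },
    "prodB": {
        "qty": [2.50] * 20,
        "price": [1.50] * 20,
        "cogs": [0.25] * 20,
        "net_price": net_price,
        # prodB has its own rev rule (half the net price)
        "rev": lambda key, root: [
            q * n * 0.5 for q, n in zip(root[key]["qty"], root[key]["net_price"](key, root))
        ],
    },
}

# Lists of different length are appended, as in the q answer
inv["total"] = {
    "rev": lambda root: root["prodA"]["rev"]("prodA", root)
    + root["prodB"]["rev"]("prodB", root)
}

total_rev = inv["total"]["rev"](inv)
print(len(total_rev), total_rev[-1])
```

Because every function looks up its inputs through the root dictionary at call time, changing `inv["prodA"]["price"]` and calling `rev` again recalculates from the new data.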

Related

What data should I use on predicting month values using linear regression

Predicting next month's values using linear regression.
I am using 6 months of historical values to predict future values.
I use the vaccinated count as the dependent variable and the month as the independent variable, converted to an integer starting at 1.
Example.
Historical Data:
Month  Dependent variable  Independent variable
Jun    15                  1
Jul    14                  2
Aug    18                  3
Sep    19                  4
Oct    20                  5
Nov    22                  6
Is that correct?
Dependent Variable = Vaccinated Count
Independent Variable = Month converted to a number, starting from 1
I'm hoping for some feedback on whether my data setup is correct.
See picture below.
Python simple linear regression:
            Hardcover
Date
2000-04-01        139
2000-04-02        128
2000-04-03        172
2000-04-04        139
2000-04-05        191
df['Time'] = np.arange(len(df.index))
            Hardcover  Time
Date
2000-04-01        139     0
2000-04-02        128     1
2000-04-03        172     2
2000-04-04        139     3
2000-04-05        191     4
fig, ax = plt.subplots()
ax.plot('Time', 'Hardcover', data=df, color='0.75')
ax = sns.regplot(x='Time', y='Hardcover', data=df, ci=None, scatter_kws=dict(color='0.25'))
ax.set_title('Time Plot of Hardcover Sales');
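Using a running time index as the independent variable, as in both snippets above, is a common setup for a simple trend regression. As a rough sketch only, here is an ordinary least-squares fit on the six-month table from the question in plain Python; the helper name `fit_line` is hypothetical, and six points are very few for a reliable forecast.

```python
# Historical data from the question: month index (1 = Jun) and vaccinated count
months = [1, 2, 3, 4, 5, 6]
vaccinated = [15, 14, 18, 19, 20, 22]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(months, vaccinated)

# Month 7 (Dec) is the next step on the same integer time axis
next_month = intercept + slope * 7
print(round(slope, 4), round(intercept, 4), round(next_month, 4))
```

The month labels themselves never enter the fit; only their order does, which is why converting them to 1, 2, 3, ... (or 0, 1, 2, ... as `np.arange` does) gives the same slope.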

How can I efficiently convert the output of one KDB function into three table columns?

I have a function that takes as input some of the values in a table and returns a tuple if you will - three separate return values, which I want to transpose into the output of a query. Here's a simplified example of what I want to achieve:
multiplier:{(x*2;x*3;x*4)};
select twoX:multiplier[price][0], threeX:multiplier[price][1], fourX:multiplier[price][2] from data;
The above basically works (I think I've got the syntax right for the simplified example - if not then hopefully my intention is clear), but is inefficient because I'm calling the function three times and throwing away most of the output each time. I want to rewrite the query to only call the function once, and I'm struggling.
Update
I think I missed a crucial piece of information in my explanation of the problem which affects the outcome - I need to get other data in the query alongside the output of my function. Here's a hopefully more realistic example:
multiplier:{(x*2;x*3;x*4)};
select average:avg price, total:sum price, twoX:multiplier[sum price][0], threeX:multiplier[sum price][1], fourX:multiplier[sum price][2] by category from data;
I'll have a go at adapting your answers to fit this requirement anyway, and apologies for missing this bit of information. The real function is a proprietary and fairly complex algorithm and the real query has about 30 output columns, hence the attempt at simplifying the example :)
If you're just looking for the results themselves, you can extract (exec) them as lists, create a dictionary keyed by the new column names, and then flip the dictionary into a table:
q)exec flip`twoX`threeX`fourX!multiplier[price] from ([]price:til 10)
twoX threeX fourX
-----------------
0 0 0
2 3 4
4 6 8
6 9 12
8 12 16
10 15 20
12 18 24
14 21 28
16 24 32
18 27 36
If you need other columns from the original table too then it's trickier, but you could join the tables sideways using ,' (join each):
q)t:([]price:til 10)
q)t,'exec flip`twoX`threeX`fourX!multiplier[price] from t
An apply (@) can also achieve what you want. Here data is just a table with 10 random prices. @ is then used to apply the multiplier function to the price column while also assigning a column name to each of the three resulting lists:
q)data:([] price:10?100)
q)multiplier:{(x*2;x*3;x*3)}
q)@[data;`twoX`threeX`fourX;:;multiplier data`price]
price twoX threeX fourX
-----------------------
80 160 240 240
24 48 72 72
41 82 123 123
0 0 0 0
81 162 243 243
10 20 30 30
36 72 108 108
36 72 108 108
16 32 48 48
17 34 51 51
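The idea shared by both answers is to call the function once and pair the returned lists with column names. A rough Python analogue, with a plain dict of lists standing in for the table, might look like this; the names `data`, `names` and `result` are illustrative assumptions, not part of the q answers.

```python
def multiplier(prices):
    # Returns three lists from one pass, like the q function's (x*2;x*3;x*4)
    return (
        [p * 2 for p in prices],
        [p * 3 for p in prices],
        [p * 4 for p in prices],
    )

# A dict of equal-length lists stands in for the q table
data = {"price": list(range(10))}

names = ["twoX", "threeX", "fourX"]

# Single call; zip pairs each name with its list, like names!multiplier[price]
new_columns = dict(zip(names, multiplier(data["price"])))

# Sideways join with the original columns, like t,'...
result = {**data, **new_columns}
print(list(result))
```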

How to convert alphabets to numerical values with spaces and return it back to alphabets?

I want to convert letters to numerical values and then transform them back to letters, using some mathematical technique like the fast Fourier transform in MATLAB.
Example:
The following is the text saved in "text2figure.txt" file
Hi how r u am fine take care of your health
thank u very much
am 2.0
Reading it in MATLAB:
data=fopen('text2figure.txt','r')
d=fscanf(data,'%s')
temp = fileread( 'text2figure.txt' )
temp = regexprep( temp, ' {6}', ' NaN' )
c=cellstr(temp(:))'
Now I wish to convert cell array with spaces to numerical values/integers:
coding = 'abcdefghijklmnñopqrstuvwxyz .,;'
str = temp %// example text
[~, result] = ismember(str, coding)
y=result
result =
Columns 1 through 18
0 9 28 8 16 24 28 19 28 22 28 1 13 28 6 9 14 5
Columns 19 through 36
28 21 1 11 5 28 3 1 19 5 28 16 6 28 26 16 22 19
Columns 37 through 54
28 8 5 1 12 21 8 28 0 0 21 8 1 14 11 28 22 28
Columns 55 through 71
23 5 19 26 28 13 22 3 8 0 0 1 13 28 0 29 0
Now I wish to convert the numerical values back to alphabets:
Hi how r u am fine take care of your health
thank u very much
am 2.0
How can I write MATLAB code that converts the numerical values in the variable result back to letters?
Most of the code in the question doesn't have any useful effects. These three lines are the ones that lead to result:
str = fileread('text2figure.txt');
coding = 'abcdefghijklmnñopqrstuvwxyz .,;';
[~, result] = ismember(str, coding);
ismember returns, in the second output argument, the index into coding for each element of str. Thus, result holds indices that we can use to index into coding:
out = coding(result);
However, this does not work because some elements of str do not occur in coding, and for those elements ismember returns 0, which is not a valid index. We can replace the zeros with a new character:
coding = ['*',coding];
out = coding(result+1);
Basically, we're shifting each code by one, so that the new code 1 stands for any character that is not in the table.
One of the characters we're missing here is the newline character, so the three lines of the file come out as one line. You can add a code for the newline character by adding it to the coding table:
str = fileread('text2figure.txt');
coding = ['abcdefghijklmnñopqrstuvwxyz .,;',char(10)]; % char(10) is the newline character
[~, result] = ismember(str, coding);
coding = ['*',coding];
out = coding(result+1);
All of this is easier to achieve just using the ASCII code table:
str = fileread('text2figure.txt');
result = double(str);
out = char(result);
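The same round trip can be sketched in Python to show why the lookup-table version is lossy and the character-code version is not. This is an illustrative analogue under my own assumptions, not a translation of the MATLAB session: the sample string and variable names are made up, and `str.find` (which returns -1 for a missing character) plays the role of ismember.

```python
# Lookup table from the question, plus a newline code
coding = "abcdefghijklmnñopqrstuvwxyz .,;\n"
text = "Hi how r u\nam 2.0"  # short made-up sample

# 1-based index into coding, 0 for characters not in the table
result = [coding.find(ch) + 1 for ch in text]

# Prepend a placeholder so that code 0 maps to '*'
table = "*" + coding
decoded = "".join(table[i] for i in result)

# Character codes cover every character, so this round trip is lossless
codes = [ord(ch) for ch in text]
round_trip = "".join(chr(c) for c in codes)

print(repr(decoded))
print(round_trip == text)
```

Here the capital H and the digits are not in the table, so they come back as `*`, while the character-code route recovers the text exactly.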

How to eliminate series of values with so much variation

I have a dataset (azimuth vs. time) that records the compass heading of an object over time. From it I can see when the object is moving (the heading varies a lot) and when it is static (the heading does not vary). My question is how to program this in MATLAB so that the data showing the object moving is removed and only the data showing the object static is kept.
For example:
Azimuth (angle) | 30 30 30 15 10 16 19 24 24 24 17 14 12 15 16
Time (s) | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
The output would be:
Azimuth (angle) | 30 30 30 24 24 24
Time (s) | 1 2 3 8 9 10
s=diff(Azimuth)==0
%diff only would skip the values at t=1 and t=8. Modify to include them as well:
s=[s(1),s(2:end)|s(1:end-1),s(end)]
Azimuth(s)
Time(s)
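The masking logic above keeps a sample if it equals its previous or its next neighbour. A plain-Python sketch of the same idea, using the data from the question, is shown below; the variable names are my own, and exact equality is assumed to be an acceptable test for "static" (noisy real data would likely need a tolerance).

```python
azimuth = [30, 30, 30, 15, 10, 16, 19, 24, 24, 24, 17, 14, 12, 15, 16]
time_s = list(range(1, 16))
n = len(azimuth)

# same_as_next[i] is True when sample i equals sample i+1 (length n-1),
# the counterpart of diff(Azimuth)==0
same_as_next = [a == b for a, b in zip(azimuth, azimuth[1:])]

# Keep a sample if it matches the previous or the next sample
keep = [
    (i > 0 and same_as_next[i - 1]) or (i < n - 1 and same_as_next[i])
    for i in range(n)
]

static_azimuth = [a for a, k in zip(azimuth, keep) if k]
static_time = [t for t, k in zip(time_s, keep) if k]
print(static_azimuth)
print(static_time)
```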

How to extract new matrix from existing one

I have a large number of entries arranged in three columns. Sample of the data is:
A=[1 3 2 3 5 4 1 5 ;
22 25 27 20 22 21 23 27;
17 15 15 17 12 19 11 18]'
I want the first column (hours) to control the entire matrix to create new matrix as follows:
Anew=[1 2 3 4 5 ; 22.5 27 22.5 21 24.5; 14 15 16 19 15]'
The 2nd column of Anew is the average of the 2nd-column values for each hour. For example, in matrix A there are two rows for hour 1, with 2nd-column values 22 and 23, so the average is 22.5.
The 3rd column works the same way: at hour 1 we have 17 and 11, so the average is 14, and this continues up to hour 5. I am using MATLAB.
You can use ACCUMARRAY for this:
Anew = [unique(A(:,1)),...
cell2mat(accumarray(A(:,1),1:size(A,1),[],@(x){mean(A(x,2:3),1)}))]
This uses the first column A(:,1) as indices (x) to pick the values in columns 2 and 3 for averaging (mean(A(x,2:3),1)). The curly brackets and the call to cell2mat allow you to work on both columns at once. Otherwise, you could do each column individually, like this
Anew = [unique(A(:,1)), ...
accumarray(A(:,1),A(:,2),[],@mean), ...
accumarray(A(:,1),A(:,3),[],@mean)]
which may actually be a bit more readable.
EDIT
The above assumes that there's no missing entry for any of the hours. It will result in an error otherwise. Thus, a more robust way to calculate Anew is to allow for missing values. For easy identification of the missing values, we use the fillval input argument to accumarray and set it to NaN.
Anew = [(1:max(A(:,1)))', ...
accumarray(A(:,1),A(:,2),[],@mean,NaN), ...
accumarray(A(:,1),A(:,3),[],@mean,NaN)]
You can use consolidator to do the work for you.
[Afinal(:,1),Afinal(:,2:3)] = consolidator(A(:,1),A(:,2:3),@mean);
Afinal
Afinal =
1 22.5 14
2 27 15
3 22.5 16
4 21 19
5 24.5 15
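The grouping itself is independent of MATLAB: collect the rows that share an hour, then average each remaining column. A minimal plain-Python sketch with the sample data from the question follows; the names `hours`, `col2`, `col3` and `a_new` are illustrative only.

```python
from collections import defaultdict

# Columns of A from the question
hours = [1, 3, 2, 3, 5, 4, 1, 5]
col2 = [22, 25, 27, 20, 22, 21, 23, 27]
col3 = [17, 15, 15, 17, 12, 19, 11, 18]

# Collect the (col2, col3) pairs that belong to each hour
groups = defaultdict(list)
for h, a, b in zip(hours, col2, col3):
    groups[h].append((a, b))

# One output row per hour, in ascending order, with column-wise means
a_new = []
for h in sorted(groups):
    pairs = groups[h]
    mean2 = sum(a for a, _ in pairs) / len(pairs)
    mean3 = sum(b for _, b in pairs) / len(pairs)
    a_new.append((h, mean2, mean3))

for row in a_new:
    print(row)
```

Hours with no rows simply do not appear in `a_new`, which corresponds to the `unique(A(:,1))` variants rather than the NaN-filled one.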