Joining three tables, each with an incomplete set of each other's keys - tsql

I have three tables (actually temp tables, each the result of other queries) with very similar data sets. I need to "condense" them, for lack of a better term, and my limited SQL knowledge is stopping me.
For example, we have Budgets by Code, Estimates by Code, and Actuals by Code. Not all possible values for Code exist in any of the three, nor even in another accessible table.
Budgets
1 $13
2 $22
4 $44
7 $71
Estimates
1 $14
4 $49
5 $55
Actuals
2 $21
3 $33
5 $57
7 $70
What I want:
Code Bgt Est Act
1 13 14 0
2 22 0 21
3 0 0 33
4 44 49 0
5 0 55 57
7 71 0 70
(I don't have to have 0 when there's no value, that's just for illustrative purposes.)
I just have no idea how to approach this - any help appreciated!

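Try using a FULL OUTER JOIN. In your case the query would look something like this:
SELECT COALESCE(Bgt.Code, Est.Code, Act.Code) AS Code,
ISNULL(Bgt.Budget, 0) AS Bgt,
ISNULL(Est.Estimate, 0) AS Est,
ISNULL(Act.Actual, 0) AS Act
FROM Budgets Bgt
FULL OUTER JOIN Estimates Est ON Est.Code = Bgt.Code
-- match Actuals against whichever code the first join produced, so that codes
-- present only in Estimates and Actuals (e.g. 5) end up on a single row
FULL OUTER JOIN Actuals Act ON Act.Code = COALESCE(Bgt.Code, Est.Code)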
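To see what the full outer join is meant to return, here is a minimal Python sketch of the same merge. It assumes each temp table can be modelled as a {code: amount} dictionary (a simplification; full_outer_merge is a hypothetical name). It builds the union of all codes first and then looks each code up in every table, defaulting to 0:

```python
def full_outer_merge(budgets, estimates, actuals):
    """Combine three {code: amount} mappings into (code, bgt, est, act) rows.

    Every code that appears in any of the three inputs yields exactly one row;
    a code missing from an input contributes 0 for that column.
    """
    codes = sorted(set(budgets) | set(estimates) | set(actuals))
    return [
        (code, budgets.get(code, 0), estimates.get(code, 0), actuals.get(code, 0))
        for code in codes
    ]


rows = full_outer_merge(
    {1: 13, 2: 22, 4: 44, 7: 71},
    {1: 14, 4: 49, 5: 55},
    {2: 21, 3: 33, 5: 57, 7: 70},
)
for row in rows:
    print(*row)
```

Building the key union up front is what guarantees one row per code, however the codes are spread across the three tables.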


KDB/Q How to implement moving rank efficiently?

I am trying to implement a moving rank function, taking parameters of n, the number of items, and m, the column name. Here is how I implement it:
mwindow: k){[y;x]$[y>0;x#(!#x)+\:!y;x#(!#x)+\:(!-y)+y+1]};
mrank: {[n;x] sum each x > prev mwindow[neg n;x]};
But this seems to take quite some time if n is moderately large, say 100.
I figure it is because it has to calculate each window from scratch, unlike msum, which keeps a running total and only calculates the difference between the newly added item and the dropped one.
There are a number of general sliding-window functions here that you can use to generate rolling lists on which to apply your rank: https://code.kx.com/q/kb/programming-idioms/#how-do-i-apply-a-function-to-a-sequence-sliding-window
Those approaches seem to pad the lists with zeros/nulls, however, which I think won't suit your use of rank. Here's another possible approach which might be more suitable for rank (though I haven't tested its performance at scale):
q)mwin:{x each (),/:{neg[x]sublist y,z}[y]\[z]}
q)update r:mwin[rank;4;c] from ([]c:10?100)
c r
----------
84 ,0
25 1 0
31 2 0 1
0 3 1 2 0
51 1 2 0 3
29 2 0 3 1
25 0 3 2 1
73 2 1 0 3
0 2 1 3 0
6 2 3 0 1
q)update r:last each mwin[rank;4;c] from ([]c:10?100)
c r
----
38 0
72 1
13 0
77 3
64 1
9 0
37 1
79 3
97 3
63 1
q)
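The incremental idea mentioned in the question (update the window instead of recomputing it) can be sketched in Python rather than q: keep the current window sorted, and on each step drop the oldest item and insert the newest. This is meant to reproduce the `last each mwin[rank;n;c]` output above, i.e. the rank of each item within the trailing n items with ties ranked by arrival order; moving_rank is a hypothetical name, not a q or kdb+ function.

```python
from bisect import bisect_right, insort
from collections import deque


def moving_rank(xs, n):
    """Rank of each item within the trailing window of n items (itself included).

    Equal values that arrived earlier rank first, as a stable sort would.
    """
    order = deque()   # window contents in arrival order
    window = []       # the same contents, kept sorted
    ranks = []
    for x in xs:
        if len(order) == n:
            old = order.popleft()
            # remove one occurrence of the value leaving the window
            del window[bisect_right(window, old) - 1]
        order.append(x)
        # items already in the window that are <= x all sort before x
        ranks.append(bisect_right(window, x))
        insort(window, x)
    return ranks


print(moving_rank([38, 72, 13, 77, 64, 9, 37, 79, 97, 63], 4))
```

Inserting into a Python list still shifts elements, so each step is O(n) in the worst case; a balanced tree or skip list would make it O(log n). This has not been benchmarked against the q versions above.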

Value of a function, kdb+

I have a contrived function:
q)f:{a:6;b:2;c::5}
When I use value on it I get the following list:
q)value f
0xa0030902a1030a02a20b0481000004
,`x
`a`b
``c
6
2
5
5 0 4 0 9 0 8 0 14 0 12 0 12 2 2
"..f"
""
-1
"{a:6;b:2;c::5}"
I know what all items mean except items at index 0, 7, 8, 9 and 10.
I assume item 0 is a GUID assigned to the function by the interpreter to uniquely identify it (across time and space! ;) ).
Item 7 may have to do with the order of interpretation of the other indexes, but 12 and 14 don't make sense under this assumption.
I assume item 8 is just the function's name, though why the ..?
Any insights into those listed indexes would be appreciated, thank-you.
This lambda reference on the kx website explains each part of the output https://code.kx.com/q/ref/value/#lambda

How can I efficiently convert the output of one KDB function into three table columns?

I have a function that takes as input some of the values in a table and returns a tuple if you will - three separate return values, which I want to transpose into the output of a query. Here's a simplified example of what I want to achieve:
multiplier:{(x*2;x*3;x*4)};
select twoX:multiplier[price][0], threeX:multiplier[price][1], fourX:multiplier[price][2] from data;
The above basically works (I think I've got the syntax right for the simplified example - if not then hopefully my intention is clear), but is inefficient because I'm calling the function three times and throwing away most of the output each time. I want to rewrite the query to only call the function once, and I'm struggling.
Update
I think I missed a crucial piece of information in my explanation of the problem which affects the outcome - I need to get other data in the query alongside the output of my function. Here's a hopefully more realistic example:
multiplier:{(x*2;x*3;x*4)};
select average:avg price, total:sum price, twoX:multiplier[sum price][0], threeX:multiplier[sum price][1], fourX:multiplier[sum price][2] by category from data;
I'll have a go at adapting your answers to fit this requirement anyway, and apologies for missing this bit of information. The real function is a proprietary and fairly complex algorithm, and the real query has about 30 output columns, hence the attempt at simplifying the example :)
If you're just looking for the results themselves, you can extract the columns as lists with exec, create a dictionary, and then flip the dictionary into a table:
q)exec flip`twoX`threeX`fourX!multiplier[price] from ([]price:til 10)
twoX threeX fourX
-----------------
0 0 0
2 3 4
4 6 8
6 9 12
8 12 16
10 15 20
12 18 24
14 21 28
16 24 32
18 27 36
If you need other columns from the original table too, then it's trickier, but you could join the tables sideways using ,' (join each):
q)t:([]price:til 10)
q)t,'exec flip`twoX`threeX`fourX!multiplier[price] from t
Functional amend (@) can also achieve what you want. Here data is just a table with 10 random prices. @ is then used to apply the multiplier function to the price column while also assigning a column name to each of the three resulting lists:
q)data:([] price:10?100)
q)multiplier:{(x*2;x*3;x*4)}
q)@[data;`twoX`threeX`fourX;:;multiplier data`price]
price twoX threeX fourX
-----------------------
80 160 240 320
24 48 72 96
41 82 123 164
0 0 0 0
81 162 243 324
10 20 30 40
36 72 108 144
36 72 108 144
16 32 48 64
17 34 51 68
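The same "call once, then split the result" idea applies to the grouped query in the update. Here is a rough Python sketch, assuming the data is a list of (category, price) pairs and using the simplified multiplier from the question as a stand-in for the real function; summarise is a hypothetical helper name. multiplier runs once per category, and its three outputs are unpacked into separate columns:

```python
from collections import defaultdict


def multiplier(x):
    # stand-in for the real (proprietary) function: returns three values at once
    return (x * 2, x * 3, x * 4)


def summarise(rows):
    """Group (category, price) pairs and compute avg, sum and the three
    multiplier outputs per category, calling multiplier only once per group."""
    prices = defaultdict(list)
    for category, price in rows:
        prices[category].append(price)

    result = []
    for category in sorted(prices):
        ps = prices[category]
        total = sum(ps)
        two_x, three_x, four_x = multiplier(total)  # one call, unpacked into columns
        result.append({
            "category": category,
            "average": total / len(ps),
            "total": total,
            "twoX": two_x,
            "threeX": three_x,
            "fourX": four_x,
        })
    return result


for row in summarise([("a", 1), ("a", 3), ("b", 5)]):
    print(row)
```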

Calculating the weighted moving average of 2 lists using a set window

If I have two lists:
a:1 2 3 4;
b:10 20 30 40;
I want to sum the product of the two lists within a window of 2. So the result set should be:
10 50 130 250
For example, to get the result of 130 it would be (2*20)+(3*30) = 130
sums 2 mavg '(a*b)
seems to get me part way there, but the window of 2 isn't being applied. I've tried experimenting with sum, sums, sum each, wavg, mavg, etc. and I am completely stuck. Could anyone help? Thanks!
This line should work for you:
2 msum a*b
as demonstrated here:
q)a:1 2 3 4
q)b:10 20 30 40
q)2 msum a*b
10 50 130 250
For more information about the keyword msum, you could check out the Kx Reference page:
https://code.kx.com/wiki/Reference/msum
Hope that helps!
Alternatively, you could use the each-prior adverb:
q)+':[a*b]
However, this only works for a window size of 2, and if your data contains nulls they need to be filled with 0 first (here b2 is a version of b that contains nulls):
q)+':[0^a*b2]
On the plus side, it is faster than msum in this situation:
q)\ts:1000000 +':[0^a*b2]
940 1264
q)\ts:1000000 2 msum a*b2
1556 1104
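For comparison, here is a small Python sketch of the running-total idea behind a moving sum: each step adds the newest product and subtracts the one that has just left the window, so the work per item does not grow with the window size. The function name is hypothetical, and treating None as 0 is an assumption chosen to mirror the 0^ fill in the each-prior answer:

```python
def moving_sum_of_products(a, b, n):
    """n-item moving sum of a[i]*b[i], updated incrementally.

    A None in either list makes that product 0 (mirrors filling with 0^ in q).
    """
    products = [
        (0 if x is None else x) * (0 if y is None else y)
        for x, y in zip(a, b)
    ]
    out = []
    running = 0
    for i, p in enumerate(products):
        running += p                      # newest product enters the window
        if i >= n:
            running -= products[i - n]    # oldest product leaves the window
        out.append(running)
    return out


print(moving_sum_of_products([1, 2, 3, 4], [10, 20, 30, 40], 2))
```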

sorting a timer in matlab

OK, it seems like a simple problem, but I am having trouble with it.
I have a timer for each data set which resets improperly, and as a result my timing gets mixed up.
Any ideas how to correct it without losing any data?
Example:
timer column ideally should be, mine reads
1 3
2 4
3 5
4 6
5 1
6 2
How do I change column 2, or make a new column which reads like column 1, without changing the order of the rows which have data?
This is just an example, as my file lengths are 86000. Also, I have missing timers which I do not want to lose; a missing timer implies no data for that period of time.
Thanks
EDIT: I do not want to change the other columns. Column 1 is the GPS counter, so it does not sync with the computer's timer due to some other issues. I just want to change row one so that it goes from high to low without affecting other rows, and also take care of missing points (if I did not care about missing points, a simple n = 1:max would work).
Missing data in this case is indicated by a missing timer value. For example, I have 4, 5, 8, 9 with 6 and 7 missing.
OK, let me try to edit again.
It's an 8600x80 matrix of data:
timer is one row, which should go from 0 to 8600,
but the timer starts at odd times, so my data starts from the middle, let's say 3400, and so in the middle of the day my timer goes to 0 and then back to 1.
But my other rows are fine. I just need to plot the other sets using the timer as time.
I cannot use T = 1:length(file), as it then ignores missed time stamps (timers).
For example, my data reads like this:
timer should be, mine reads
1 3
2 4
3 5
4 8
5 9
8 1
9 2
So you can see time stamps 6 and 7 are missing.
If I used n = 1:length(file)
I would have got
1 2 3 4 5 6 7
which is wrong.
I want
1 2 3 4 5 8 9
without changing the order of the other rows, so I cannot use sort on the whole file.
I assume the following problem
data says
3 100
4 101
5 102
NaN 0
1 104
2 105
You want
1 100
2 101
3 102
NaN 0
4 104
5 105
I'd solve the problem like this:
%# create test data
data = [3 100
4 101
5 102
NaN 0
1 104
2 105];
%# find good rows (if missing data are indicated by zeros, use
%# goodRows = data(:,1) > 0; instead)
goodRows = isfinite(data(:,1));
%# count good rows
nGoodRows = sum(goodRows);
%# replace the first column with sequential numbers, but only in good rows
data(goodRows,1) = 1:nGoodRows;
data =
1 100
2 101
3 102
NaN 0
4 104
5 105
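The same renumbering, written as a plain Python sketch for readers outside MATLAB. It assumes data is a list of rows with the timer in the first position and NaN marking a missing timer; renumber_good_rows is a hypothetical name:

```python
import math


def renumber_good_rows(data):
    """Replace the timer (first column) with 1, 2, 3, ... in rows whose timer
    is finite; rows with a NaN timer are copied unchanged."""
    out = []
    counter = 0
    for row in data:
        if math.isfinite(row[0]):
            counter += 1
            out.append([counter] + list(row[1:]))
        else:
            out.append(list(row))
    return out


nan = float("nan")
for row in renumber_good_rows([[3, 100], [4, 101], [5, 102], [nan, 0], [1, 104], [2, 105]]):
    print(row)
```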
EDIT 1
Maybe I understand your question this time
data says
4 101
5 102
1 104
2 105
You want
1 4 101
2 5 102
4 1 104
5 2 105
This can be achieved the following way
%# test data
data = [4 101
5 102
1 104
2 105];
%# use sort to get the correct order of the numbers and add it to the left of data
out = [sort(data(:,1)),data]
out =
1 4 101
2 5 102
4 1 104
5 2 105
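A Python sketch of EDIT 1 under the same assumptions (rows as lists, timer first; prepend_sorted_timer is a hypothetical name). Only the timer values are sorted, and they are placed in a new first column while the original rows keep their order:

```python
def prepend_sorted_timer(data):
    """Return rows with the sorted timer values added as a new first column;
    the original rows are not reordered."""
    timers = sorted(row[0] for row in data)
    return [[t] + list(row) for t, row in zip(timers, data)]


for row in prepend_sorted_timer([[4, 101], [5, 102], [1, 104], [2, 105]]):
    print(row)
```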
EDIT 2
Note that out is the result from the solution in EDIT 1
It seems you want to plot the data so that there is no entry for missing values. One way to do this is to make a plot with dots - there won't be a dot for missing data.
plot(out(:,1),out(:,3),'.')
If you want to plot a line that is interrupted, you have to insert NaNs into out
%# create outNaN, that has NaN-rows for missing entries
outNaN = NaN(max(out(:,1)),size(out,2));
outNaN(out(:,1),:) = out;
%# plot outNaN; its NaN rows break the line where timer values are missing
plot(outNaN(:,1),outNaN(:,3))
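And a Python sketch of the NaN padding in EDIT 2: one row per timer value from 1 up to the maximum, with rows for missing timers filled with NaN so that a line plot breaks there. It assumes timers are positive integers and, as in the MATLAB indexing, that the timer value doubles as a 1-based row index; pad_missing_with_nan is a hypothetical name:

```python
import math


def pad_missing_with_nan(out):
    """Place each row at index timer-1; rows for absent timers are all NaN."""
    n_rows = int(max(row[0] for row in out))
    n_cols = len(out[0])
    padded = [[math.nan] * n_cols for _ in range(n_rows)]
    for row in out:
        padded[int(row[0]) - 1] = list(row)  # timer value used as a 1-based index
    return padded


for row in pad_missing_with_nan([[1, 4, 101], [2, 5, 102], [4, 1, 104], [5, 2, 105]]):
    print(row)
```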