What's the easiest way of making a histogram in KDB?

If I've got a list of values x, what's the easiest way to make a histogram with bin-size b?
It seems like I could hack a complicated solution, but I feel like there must be a built-in function for this somewhere that I don't know about.

I'm not aware of a built-in histogram function, but I would approach the task as follows.
For fixed bucket size:
a: 0.39 0.51 0.51 0.4 0.17 0.3 0.78 0.53 0.71 0.41;
b: 0.1;
{count each group x xbar y}[b;a]
// returns 0.3 0.5 0.4 0.1 0.7!2 3 2 1 2j
For "floating" buckets:
a: 0.39 0.51 0.51 0.4 0.17 0.3 0.78 0.53 0.71 0.41;
b: -1 0.5 0.7 1;
{count each group x@x bin y}[b;a]
// returns -1 0.5 0.7!5 3 2j
Both functions return a dictionary with bucket starts as keys and the number of values falling into each bucket as values.
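For readers who want to check the bucketing logic outside q, here is a minimal Python sketch of the same two ideas. The names fixed_width_hist and edge_hist are made up for this illustration, and the rounding step is an assumption added to sidestep floating-point division noise; it is not part of the q solution.

```python
import math
from bisect import bisect_right
from collections import Counter

def fixed_width_hist(values, width):
    """Count values per fixed-width bucket, keyed by bucket index.

    The bucket start is index * width. round() guards against cases such as
    0.3 / 0.1 evaluating to 2.9999999999999996.
    """
    return Counter(math.floor(round(v / width, 9)) for v in values)

def edge_hist(values, edges):
    """Count values per bucket, keyed by the largest edge <= value.

    Assumes edges is sorted and every value is >= edges[0].
    """
    return Counter(edges[bisect_right(edges, v) - 1] for v in values)

a = [0.39, 0.51, 0.51, 0.4, 0.17, 0.3, 0.78, 0.53, 0.71, 0.41]
fixed = fixed_width_hist(a, 0.1)             # {3: 2, 5: 3, 4: 2, 1: 1, 7: 2}
floating = edge_hist(a, [-1, 0.5, 0.7, 1])   # {-1: 5, 0.5: 3, 0.7: 2}
```

The counts agree with the two q results above; the fixed-width version reports bucket indices (3 stands for the bucket starting at 0.3) to avoid using inexact floats as keys.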

Assuming you have a list of n values (let's say n = 1000):
v:1000?1.0;
You can achieve what you need as follows:
b:0.1;
hist:(count') group xbar[b;v];
There are two points to consider:
1) The keys in hist are not sorted.
2) For each bucket, do you prefer to output the left or the right delimiter?
To solve 1), you simply do:
hist:(asc key hist)#hist;
To solve 2), i.e. if you want the right delimiter:
hist:(+[b;key hist])!value hist;

Related

How do I set initial value of a scan to be a list?

I'm trying to do:
(.5 .5) (+\) (.9 .2;.4 .1)
Expected result:
1.4 0.7
1.8 0.8
But instead I get 'type.
I cannot seem to use a list as the leftmost argument. What am I doing wrong?
q)(+\)[0.5 0.5;(.9 .2;.4 .1)]
1.4 0.7
1.8 0.8
Your expression works fine without the parens:
q).5 .5 +\ (.9 .2;.4 .1)
1.4 0.7
1.8 0.8
As you see, you can use a list as left argument. But scalar extension means you need not.
q).5 +\ (.9 .2;.4 .1)
1.4 0.7
1.8 0.8
Explanation
You want cumulative sums from your list:
q)sums (.9 .2;.4 .1)
0.9 0.2
1.3 0.3
And then some:
q).5 + sums (.9 .2;.4 .1)
1.4 0.7
1.8 0.8
But you have seen a little deeper into this. sums is syntactic sugar for the derived function Add Scan +\. The derived function is variadic – can be applied as either a unary or a binary – and sums is only for unary application. When Add Scan is applied as a binary its left argument is an initial value.
q).5 +\ (.9 .2;.4 .1)
1.4 0.7
1.8 0.8
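As a rough analogy only (not q code), Python's itertools.accumulate shows the same unary versus seeded distinction. The helper add_rows is a hypothetical name introduced for this sketch. One difference to keep in mind: accumulate emits the seed as its first item, whereas q's scan does not, so the seed is sliced off here.

```python
from itertools import accumulate

def add_rows(u, v):
    """Element-wise sum of two equal-length lists."""
    return [p + q for p, q in zip(u, v)]

rows = [[0.9, 0.2], [0.4, 0.1]]

# Unary case (like sums): plain cumulative sums of the rows.
running = list(accumulate(rows, add_rows))

# Binary case (like .5 .5 +\ rows): seed the accumulation, then drop the
# seed itself because accumulate returns it as the first item.
seeded = list(accumulate(rows, add_rows, initial=[0.5, 0.5]))[1:]
```

Up to floating-point rounding, seeded holds the rows 1.4 0.7 and 1.8 0.8, matching the q session.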
When the right argument is a long list, it is more efficient to specify an initial value than to add it afterwards to each of the cumulative sums.
q)show L:1000000?1.
0.3927524 0.5170911 0.5159796 0.4066642 0.1780839 0.3017723 0.785033 0.5347096 0.711171..
q)\ts:1000 .5+sums L
2360 16777472
q)\ts:1000 .5+\ L
1477 8388880
Those parens
As mentioned, the derived function +\ is variadic.
q)5+\1 2 3 / +\ applied as a binary
6 8 11
To apply it as a unary, use either bracket notation +\[1 2 3] or ‘capture’ it in parens. The immediate effect is to prevent its application. You now have a data item. It has ‘noun syntax’ and can be passed as an argument.
q)type(+\)
108h
But q syntax allows you to apply a noun by juxtaposition.
For noun N, N x is equivalent to N@x or N[x].
q)"abc" 2 0
"ca"
q)til 3
0 1 2
q)(+\)1 2 3 / +\ applied as a unary
1 3 6

How to select whole numbers in a row and its adjacent numbers in Matlab

Good day!
I would like to select the rows of my random data that contain a whole number, keeping the adjacent number in the same row as well.
For example, I have this raw data
A = [0.1 0.2
0.2 0.1
1 0.3
0.3 0.2
0.4 0.4
2 0.5]
So I would like to select (1, 0.3) and (2, 0.5). Then my final output will be
B = [1 0.3
2 0.5]
Thanks in advance!
You can use modulo:
B=A(sum(mod(A,1),2)==0,:)
========== EDIT ====================
Editing w.r.t. comments, if you are only checking for integers in the first column then you do not need to sum results:
B=A(mod(A(:,1),1)==0,:)
Alternative ways would use logicals instead of numericals:
B=A(all(A==round(A),2),:)
or if only the 1st column is checked:
B=A(A(:,1)==round(A(:,1)),:)
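To make the difference between the two checks concrete, here is a small Python sketch on the sample data from the question (plain lists, no MATLAB semantics assumed beyond "fractional part equals zero"). The variable names first_col_whole and all_cols_whole are illustrative only.

```python
A = [[0.1, 0.2], [0.2, 0.1], [1, 0.3], [0.3, 0.2], [0.4, 0.4], [2, 0.5]]

# Keep a row when its first entry has no fractional part
# (the same test as mod(A(:,1),1)==0).
first_col_whole = [row for row in A if row[0] % 1 == 0]

# Keep a row only when every entry is whole
# (the same test as sum(mod(A,1),2)==0).
all_cols_whole = [row for row in A if all(v % 1 == 0 for v in row)]
```

On this data the first-column check keeps the rows (1, 0.3) and (2, 0.5), while the all-columns check keeps nothing, because no row consists only of whole numbers.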

Line chart with means and CIs in Stata

I have a data set with effect estimates for different points in time (1 month, 2 months, 6 months, 12 months and 18 months) and their standard errors. Now I want to plot the mean for each period and the corresponding CI around each mean.
My sample looks like:
effect horizon se
0.03 1 0.2
0.02 6 0.01
0.01 6 0.3
0.00 1 0.4
0.04 18 0.2
0.02 2 0.05
0.01 2 0.02
... ...
The means of the effects for each horizon lead to 5 data points that I want to plot in a line chart together with the confidence intervals. I tried this:
egen means = mean(effect), by(horizon)
line means horizon
But how can I add the symmetric confidence bands? Such that I get something that looks like this:
Not entirely certain that this makes sense statistically, but here's how I might do this:
gen variance = se^2
collapse (mean) effect (sum) SV = variance (count) noobs = effect, by(horizon)
gen se_mean = sqrt(SV*(1/noobs)^2)
gen LB = effect - 1.96*se_mean
gen UB = effect + 1.96*se_mean
twoway (rline LB UB horizon, lpattern(dash dash)) (line effect horizon, lpattern(solid)), yline(0, lcolor(gray))
Which yields:
To get the SE of the mean effect $\bar{T}$, I am using the formula
$V(\bar{T}) = \frac{1}{n^2} \sum_{i=1}^{n} V(T_i)$
(which assumes the covariances of the effects are all zero). I then take the square root to get the SE of $\bar{T}$.
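As a quick sanity check of that formula, here is a small Python sketch applied to the two horizon-2 rows from the sample data (effects 0.02 and 0.01, standard errors 0.05 and 0.02). The function name mean_and_ci is hypothetical, and the zero-covariance assumption is carried over from the formula.

```python
import math

def mean_and_ci(effects, ses, z=1.96):
    """Mean effect with a symmetric CI, assuming uncorrelated estimates."""
    n = len(effects)
    mean = sum(effects) / n
    # V(mean) = (1/n^2) * sum of the individual variances
    se_mean = math.sqrt(sum(s ** 2 for s in ses)) / n
    return mean, mean - z * se_mean, mean + z * se_mean

mean, lb, ub = mean_and_ci([0.02, 0.01], [0.05, 0.02])
```

Here the mean is 0.015 and the standard error of the mean is sqrt(0.0029)/2, roughly 0.0269, so the band is about 0.015 plus or minus 0.053.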

access columns of row at index matlab

I have a text file A with N rows and 8 columns (example below):
1 2 0.02 0.28 0.009 0.04 0.03 0.16
2 4 0.04 0.21 0.4 0.04 0.03 0.13
Suppose my variables are x=2 and y=4 (I get them while the program runs, and the values change with every run).
I know the index of the row I need to get:
rIndex = find((A{1}==x)&(A{2}==y));
Here rIndex = 2.
I need to access the row at rIndex and operate on columns 3 to 8 of that row, i.e. I get values from another file based on x and y and divide those values by the column values in the rIndex row.
x_No = 5, y_No = 6;
B = horzcat((x_No-y_No)./A{3}(1),(x_No-y_No)./A{4}(1),(x_No-y_No)./A{5}(1),(x_No-y_No)./A{6}(1),(x_No-y_No)./A{7}(1),(x_No-y_No)./A{8}(1));
Right now B operates on all rows of A. I need it only for the row at rIndex.
It's still not really clear what is happening, but I think you can try something like this:
for r = 1:numel(rIndex)
t=rIndex(r);
result(r,:) = horzcat((x_No-y_No)./A{3}(t),(x_No-y_No)./A{4}(t),(x_No-y_No)./A{5}(t),(x_No-y_No)./A{6}(t),(x_No-y_No)./A{7}(t),(x_No-y_No)./A{8}(t));
end
I may be doing too much actually, but hopefully this helps.

matlab: Error/precision when evaluating vector

I noticed behavior I did not expect when evaluating vectors. Evaluating an expression directly on the whole vector seems to give really different results from indexing within a loop. Can anyone give me some help with this?
I know it is probably explained by how each operation is carried out, so I need some pointers on what to look for.
Thanks in advance.
example:
x=[0.05:.01:3];
n=size(x,2);
y1 = (8*x-1)/(x)-(exp(x));
for i=1:n
y2(i)=(8*x(i)-1)/(x(i))-(exp(x(i)));
end
a1=[x;y1];
a2=[x;y2];
plot(x,y1, 'red')
hold on
plot(x,y2, 'blue')
Here is the plot:
http://i.stack.imgur.com/qAHD6.jpg
results:
a2:
0.05 -13.0513
0.06 -9.7285
0.07 -7.3582
0.08 -5.5833
0.09 -4.2053
0.10 -3.1052
0.11 -2.2072
0.12 -1.4608
0.13 -0.8311
0.14 -0.2931
0.15 0.1715
0.16 0.5765
a1:
0.05 6.4497
0.06 6.4391
0.07 6.4284
0.08 6.4177
0.09 6.4068
0.10 6.3958
0.11 6.3847
0.12 6.3734
0.13 6.3621
0.14 6.3507
0.15 6.3391
0.16 6.3274
What you want is:
y1 = (8*x-1)./(x)-(exp(x)); % note the ./
instead of:
y1 = (8*x-1)/(x)-(exp(x));
As a side note, you can type help / to see what your original first statement was really doing. It was effectively doing (x'\(8*x-1)')' (note the backslash).
The error is in your first vector calculation of y1, in the part where you divide by x. Matrix division (/) is different from element-by-element division: for two row vectors of the same length it solves a least-squares problem and returns a single scalar instead of one quotient per element.
The solution is to do element-by-element division of these vectors. The vector notation shortcut for this is to use ./ (dot-divide):
y3 = (8*x-1)./(x)-(exp(x));
Now you should find that y2 and y3 are identical.
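The difference can be illustrated numerically with a short Python sketch. It assumes that / on two equal-length row vectors returns the least-squares scalar c minimizing |c*x - b|, which is what the backslash form above amounts to; the three-element x is made up for the illustration.

```python
x = [1.0, 2.0, 3.0]
b = [8 * v - 1 for v in x]   # the numerator 8*x-1, here [7.0, 15.0, 23.0]

# Element-wise division, the analogue of ./ : one quotient per element.
elementwise = [bi / xi for bi, xi in zip(b, x)]

# Least-squares analogue of b/x for row vectors: a single scalar
# c = (b . x) / (x . x).
c = sum(bi * xi for bi, xi in zip(b, x)) / sum(xi * xi for xi in x)
```

The element-wise result varies with x (7.0, 7.5, 7.67 to two decimals), while c is one constant (106/14, about 7.57). Subtracting exp(x) from a constant is why the original y1 curve changed so slowly.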