What's the easiest way of making a histogram in KDB?

If I've got a list of values x, what's the easiest way to make a histogram with bin-size b?
It seems like I could hack a complicated solution, but I feel like there must be a built-in function for this somewhere that I don't know about.

I haven't heard of a built-in histogram function so far, but I would approach this task as below.
For fixed bucket size:
a: 0.39 0.51 0.51 0.4 0.17 0.3 0.78 0.53 0.71 0.41;
b: 0.1;
{count each group x xbar y}[b;a]
// returns 0.3 0.5 0.4 0.1 0.7!2 3 2 1 2j
For "floating" buckets:
a: 0.39 0.51 0.51 0.4 0.17 0.3 0.78 0.53 0.71 0.41;
b: -1 0.5 0.7 1;
{count each group x@x bin y}[b;a]
// returns -1 0.5 0.7!5 3 2j
The functions above return a dictionary with bucket starts as keys and the number of values falling in each bucket as values.
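For comparison, the same two bucketing ideas can be sketched in Python. This is only an illustration under assumptions: the names fixed_hist and edge_hist are hypothetical, and the sample values are scaled by 100 to integers so that floating-point rounding cannot move a value across a bucket boundary.

```python
from bisect import bisect_right
from collections import Counter

def fixed_hist(values, width):
    """Count values per fixed-width bucket, keyed by bucket start (like xbar)."""
    return dict(Counter(width * (v // width) for v in values))

def edge_hist(values, edges):
    """Count values per bucket, keyed by the largest edge <= value (like bin).

    Assumes edges is sorted ascending and every value is >= edges[0].
    """
    return dict(Counter(edges[bisect_right(edges, v) - 1] for v in values))

# the sample list from above, multiplied by 100 (an assumption of this sketch)
a = [39, 51, 51, 40, 17, 30, 78, 53, 71, 41]
print(fixed_hist(a, 10))                  # {30: 2, 50: 3, 40: 2, 10: 1, 70: 2}
print(edge_hist(a, [-100, 50, 70, 100]))  # {-100: 5, 50: 3, 70: 2}
```

As in the q version, the keys come out in order of first appearance rather than sorted.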

Assuming you have a list v of x values (let's assume x = 1000) and a bin size b, you can achieve what you need as follows:
hist:(count') group xbar[b;v];
There are two points:
1) The keys in hist are not sorted.
2) For the bucket, do you prefer to output the left or the right delimiter?
To solve for 1), you simply do:
hist:(asc key hist)#hist;
To solve for 2), i.e. if you want to have the right delimiter:
hist:(+[b;key hist])!value hist;
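The two adjustments can also be sketched in Python on a plain dictionary. The counts and the bin size below are hypothetical integers chosen for illustration, not output from the q session.

```python
# hypothetical counts keyed by the left edge of each bucket
hist = {30: 2, 50: 3, 40: 2, 10: 1, 70: 2}
b = 10  # bin size

# 1) sort the buckets by key
by_left = dict(sorted(hist.items()))

# 2) relabel each bucket by its right delimiter (left edge + bin size)
by_right = {left + b: n for left, n in by_left.items()}

print(by_left)   # {10: 1, 30: 2, 40: 2, 50: 3, 70: 2}
print(by_right)  # {20: 1, 40: 2, 50: 2, 60: 3, 80: 2}
```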


How do I set initial value of a scan to be a list?

I'm trying to do:
(.5 .5) (+\) (.9 .2;.4 .1)
Expected result:
1.4 0.7
1.8 0.8
But instead I get 'type.
I cannot seem to use a list as the leftmost argument. What am I doing wrong?
q)(+\)[0.5 0.5;(.9 .2;.4 .1)]
1.4 0.7
1.8 0.8
Your expression works fine without the parens:
q).5 .5 +\ (.9 .2;.4 .1)
1.4 0.7
1.8 0.8
As you see, you can use a list as left argument. But scalar extension means you need not.
q).5 +\ (.9 .2;.4 .1)
1.4 0.7
1.8 0.8
You want cumulative sums from your list:
q)sums (.9 .2;.4 .1)
0.9 0.2
1.3 0.3
And then some:
q).5 + sums (.9 .2;.4 .1)
1.4 0.7
1.8 0.8
But you have seen a little deeper into this. sums is syntactic sugar for the derived function Add Scan +\. The derived function is variadic – can be applied as either a unary or a binary – and sums is only for unary application. When Add Scan is applied as a binary its left argument is an initial value.
q).5 +\ (.9 .2;.4 .1)
1.4 0.7
1.8 0.8
When the right argument is a long list, it is more efficient to specify an initial value than to add it afterwards to each of the cumulative sums.
q)show L:1000000?1.
0.3927524 0.5170911 0.5159796 0.4066642 0.1780839 0.3017723 0.785033 0.5347096 0.711171..
q)\ts:1000 .5+sums L
2360 16777472
q)\ts:1000 .5+\ L
1477 8388880
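The difference between seeding the scan and adding the offset afterwards can also be illustrated outside q. The following is a minimal Python sketch, assuming Python 3.8+ for the initial keyword of itertools.accumulate; add_scan is a hypothetical helper name, and the sketch says nothing about q's performance.

```python
from itertools import accumulate

def add_scan(rows, init=None):
    """Running element-wise sums of rows; init, if given, seeds the accumulator."""
    def add(acc, row):
        return [x + y for x, y in zip(acc, row)]
    if init is None:
        return [list(r) for r in accumulate(rows, add)]
    # accumulate() emits the seed itself first; drop it to mirror q's scan
    return list(accumulate(rows, add, initial=init))[1:]

rows = [[0.9, 0.2], [0.4, 0.1]]

# seed inside the scan: one pass over the data
seeded = add_scan(rows, init=[0.5, 0.5])

# scan first, then add the offset to every running sum: a second pass
shifted = [[0.5 + v for v in r] for r in add_scan(rows)]
```

Both results agree up to floating-point rounding; the seeded form simply avoids building the intermediate list of sums.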
Those parens
As mentioned, the derived function +\ is variadic.
q)5+\1 2 3 / +\ applied as a binary
6 8 11
To apply it as a unary, use either bracket notation +\[1 2 3] or ‘capture’ it in parens. The immediate effect is to prevent its application. You now have a data item. It has ‘noun syntax’ and can be passed as an argument.
But q syntax allows you to apply a noun by juxtaposition.
For noun N, N x is equivalent to N@x or N[x].
q)"abc" 2 0
"ca"
q)til 3
0 1 2
q)(+\)1 2 3 / +\ applied as a unary
1 3 6

How to select whole numbers in a row and its adjacent numbers in Matlab

Good day!
I would like to select the whole numbers in my random data, along with the numbers adjacent to them in the same row.
For example, I have this raw data
A = [0.1 0.2
0.2 0.1
1 0.3
0.3 0.2
0.4 0.4
2 0.5]
So I would like to select (1, 0.3) and (2, 0.5). My final output will then be,
B= [1 0.3
2 0.5]
Thanks in advance!
You can use modulo:
========== EDIT ====================
Editing w.r.t. comments, if you are only checking for integers in the first column then you do not need to sum results:
Alternative ways would use logicals instead of numericals:
or if only the 1st column is checked:
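As a rough illustration of the modulo test, here is a minimal sketch in Python rather than MATLAB. The helper name is_whole is hypothetical; the data is the matrix A from the question.

```python
A = [[0.1, 0.2], [0.2, 0.1], [1, 0.3], [0.3, 0.2], [0.4, 0.4], [2, 0.5]]

def is_whole(v):
    """True when v has no fractional part (the modulo test)."""
    return v % 1 == 0

# keep rows where any entry is a whole number
any_whole = [row for row in A if any(is_whole(v) for v in row)]

# keep rows where only the first column is checked
first_whole = [row for row in A if is_whole(row[0])]

print(first_whole)  # [[1, 0.3], [2, 0.5]]
```

For this particular A both selections give the same rows, because no value in the second column is a whole number.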

Line chart with means and CIs in Stata

I have a data set with effect estimates for different points in time (for 1 month, 2 months, 6 months, 12 months and 18 months) and their standard errors. Now I want to plot the means for each period and the corresponding CIs around the means.
My sample looks like:
effect horizon se
0.03 1 0.2
0.02 6 0.01
0.01 6 0.3
0.00 1 0.4
0.04 18 0.2
0.02 2 0.05
0.01 2 0.02
... ...
The means of the effects for each horizon lead to 5 data points that I want to plot in a line chart together with the confidence intervals. I tried this:
egen means = mean(effect), by(horizon)
line means horizon
But how can I add the symmetric confidence bands? Such that I get something that looks like this:
Not entirely certain that this makes sense statistically, but here's how I might do this:
gen variance = se^2
collapse (mean) effect (sum) SV = variance (count) noobs = effect, by(horizon)
gen se_mean = sqrt(SV*(1/noobs)^2)
gen LB = effect - 1.96*se_mean
gen UB = effect + 1.96*se_mean
twoway (rline LB UB horizon, lpattern(dash dash)) (line effect horizon, lpattern(solid)), yline(0, lcolor(gray))
Which yields:
To get the SE of the mean effects T̅, I am using the formula
$$V(\bar{T}) = \frac{1}{n^2} \sum_{i=1}^{n} V(T_i)$$
(which assumes the covariances of the effects are all zero). I then take the square root to get the SE of T̅.
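To make the arithmetic concrete, here is a minimal Python sketch of the same per-horizon calculation, using only the sample rows shown in the question. The function name summarize is hypothetical, and the zero-covariance assumption carries over unchanged.

```python
from collections import defaultdict
from math import sqrt

# (effect, horizon, se) rows taken from the sample in the question
rows = [(0.03, 1, 0.2), (0.02, 6, 0.01), (0.01, 6, 0.3), (0.00, 1, 0.4),
        (0.04, 18, 0.2), (0.02, 2, 0.05), (0.01, 2, 0.02)]

groups = defaultdict(list)
for effect, horizon, se in rows:
    groups[horizon].append((effect, se))

def summarize(obs, z=1.96):
    """Mean effect and a symmetric band of z standard errors around it."""
    n = len(obs)
    mean = sum(e for e, _ in obs) / n
    # SE of the mean: sqrt of (1/n^2) * sum of variances, covariances assumed zero
    se_mean = sqrt(sum(s ** 2 for _, s in obs)) / n
    return mean, mean - z * se_mean, mean + z * se_mean

bands = {h: summarize(obs) for h, obs in sorted(groups.items())}
for h, (mean, lb, ub) in bands.items():
    print(h, round(mean, 4), round(lb, 4), round(ub, 4))
```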

access columns of row at index matlab

I have a text file A with N rows and 8 columns (example below):
1 2 0.02 0.28 0.009 0.04 0.03 0.16
2 4 0.04 0.21 0.4 0.04 0.03 0.13
My variables are x=2 and y=4 (I get them during the program run; the values change with every run).
I know the index of the row I need to get:
rIndex = find((A{1}==x)&(A{2}==y));
Here rIndex = 2.
I need to access the row at rIndex and operate on columns 3 to 8 of that row, i.e. I get values from another file based on x and y and divide those values by the column values in row rIndex.
x_No = 5, y_No = 6;
B = horzcat((x_No-y_No)./A{3}(1),(x_No-y_No)./A{4}(1),(x_No-y_No)./A{5}(1),(x_No-y_No)./A{6}(1),(x_No-y_No)./A{7}(1),(x_No-y_No)./A{8}(1));
Right now B operates on all rows of A. I need it only for the row at rIndex.
It's still not really clear what happens, but I think you can try something like this:
for r = 1:numel(rIndex)
t = rIndex(r); % row of A that matched x and y
result(r,:) = horzcat((x_No-y_No)./A{3}(t),(x_No-y_No)./A{4}(t),(x_No-y_No)./A{5}(t),(x_No-y_No)./A{6}(t),(x_No-y_No)./A{7}(t),(x_No-y_No)./A{8}(t));
end
I may be doing too much actually, but hopefully this helps.
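The lookup-then-divide idea can be sketched in Python as follows. This is only an illustration: A is stored as a list of rows rather than a cell array of columns, and the indices are 0-based, so the matching row has index 1 rather than 2.

```python
# Rows of A: the first two columns are the keys, columns 3 to 8 are the divisors
A = [
    [1, 2, 0.02, 0.28, 0.009, 0.04, 0.03, 0.16],
    [2, 4, 0.04, 0.21, 0.4, 0.04, 0.03, 0.13],
]
x, y = 2, 4
x_no, y_no = 5, 6

# indices of the rows whose first two columns match x and y (0-based here)
r_index = [i for i, row in enumerate(A) if row[0] == x and row[1] == y]

# divide (x_no - y_no) by columns 3..8 of each matching row only
result = [[(x_no - y_no) / v for v in A[i][2:8]] for i in r_index]
```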

matlab: Error/precision when evaluating vector

I noticed behavior I did not expect when evaluating vectors. Evaluating an expression directly on the whole vector seems to give a very different result from indexing element by element within a loop. Can anyone give me some help with this?
I know it is probably explained by how each operation is carried out, so I need some pointers on what to look for.
Thanks in advance.
y1 = (8*x-1)/(x)-(exp(x));
for i=1:n
plot(x,y1, 'red')
hold on
plot(x,y2, 'blue')
here the plot:
0.05 -13.0513
0.06 -9.7285
0.07 -7.3582
0.08 -5.5833
0.09 -4.2053
0.10 -3.1052
0.11 -2.2072
0.12 -1.4608
0.13 -0.8311
0.14 -0.2931
0.15 0.1715
0.16 0.5765
0.05 6.4497
0.06 6.4391
0.07 6.4284
0.08 6.4177
0.09 6.4068
0.10 6.3958
0.11 6.3847
0.12 6.3734
0.13 6.3621
0.14 6.3507
0.15 6.3391
0.16 6.3274
What you want is:
y1 = (8*x-1)./(x)-(exp(x)); % note the ./
instead of:
y1 = (8*x-1)/(x)-(exp(x));
As a side note, you can type help / to see what your original first statement was really doing. It was effectively doing (x'\(8*x-1)')' (note the backslash).
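The error is in your first vector calculation of y1, in the part where you divide by x. Matrix division with / is different from element-by-element division: for two row vectors of the same length it solves a least-squares problem and returns a single scalar, which is then combined with every element of exp(x).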
The solution is to do element-by-element division of these vectors. The vector notation shortcut for this is to use ./ (dot-divide):
y3 = (8*x-1)./(x)-(exp(x));
Now you should find that y2 and y3 are identical.
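The contrast can be reproduced in a few lines of Python. This is a sketch under assumptions: the sample points x are hypothetical (the original x is not shown in full), so the constant c will not match the values in the question, and the least-squares scalar is computed directly as dot(b,x)/dot(x,x).

```python
from math import exp

x = [0.05 + 0.01 * i for i in range(12)]  # hypothetical sample points
b = [8 * v - 1 for v in x]

# element-wise division, the analogue of ./
y_elem = [bi / xi - exp(xi) for bi, xi in zip(b, x)]

# b / x on two row vectors: the least-squares scalar c minimising |c*x - b|
c = sum(bi * xi for bi, xi in zip(b, x)) / sum(xi * xi for xi in x)
y_mat = [c - exp(xi) for xi in x]

print(round(y_elem[0], 4))  # -13.0513
```

Because c is one number for the whole vector, y_mat only varies through exp(x), which is why that curve looks almost flat next to the element-wise one.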