Line chart with means and CIs in Stata

I have a data set with effect estimates for different points in time (1, 2, 6, 12 and 18 months) and their standard errors. Now I want to plot the mean effect for each period together with the corresponding CIs around the means.
My sample looks like:
effect horizon se
0.03 1 0.2
0.02 6 0.01
0.01 6 0.3
0.00 1 0.4
0.04 18 0.2
0.02 2 0.05
0.01 2 0.02
... ...
The means of the effects for each horizon lead to 5 data points that I want to plot in a line chart together with the confidence intervals. I tried this:
egen means = mean(effect), by(horizon)
line means horizon
But how can I add the symmetric confidence bands? Such that I get something that looks like this:

Not entirely certain that this makes sense statistically, but here's how I might do this:
gen variance = se^2
collapse (mean) effect (sum) SV = variance (count) noobs = effect, by(horizon)
gen se_mean = sqrt(SV*(1/noobs)^2)
gen LB = effect - 1.96*se_mean
gen UB = effect + 1.96*se_mean
twoway (rline LB UB horizon, lpattern(dash dash)) (line effect horizon, lpattern(solid)), yline(0, lcolor(gray))
Which yields:
To get the SE of the mean effect $\bar{T}$, I am using the formula
$V(\bar{T}) = \frac{1}{n^2} \sum_{i=1}^{n} V(T_i)$
(which assumes the covariances of the effects are all zero). I then take the square root to get the SE of $\bar{T}$.
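As a cross-check of the formula, here is a minimal Python sketch (not Stata) that computes the per-horizon mean, its SE under the same zero-covariance assumption, and the 1.96 bounds. The rows are the seven sample rows shown in the question; the function name `mean_and_se` is made up for illustration.

```python
from collections import defaultdict
from math import sqrt

# (effect, horizon, se) rows copied from the sample in the question
rows = [
    (0.03, 1, 0.2), (0.02, 6, 0.01), (0.01, 6, 0.3), (0.00, 1, 0.4),
    (0.04, 18, 0.2), (0.02, 2, 0.05), (0.01, 2, 0.02),
]

def mean_and_se(rows):
    """Return {horizon: (mean, se_mean, lower, upper)}, assuming zero covariances."""
    groups = defaultdict(list)
    for effect, horizon, se in rows:
        groups[horizon].append((effect, se))
    out = {}
    for horizon, items in sorted(groups.items()):
        n = len(items)
        mean = sum(e for e, _ in items) / n
        # V(mean) = (1/n^2) * sum of the individual variances
        se_mean = sqrt(sum(s ** 2 for _, s in items)) / n
        out[horizon] = (mean, se_mean, mean - 1.96 * se_mean, mean + 1.96 * se_mean)
    return out

summary = mean_and_se(rows)
for horizon, (mean, se_mean, lb, ub) in summary.items():
    print(horizon, round(mean, 4), round(se_mean, 4), round(lb, 4), round(ub, 4))
```

This mirrors the collapse step: sum the variances within each horizon, divide by n squared, and take the square root.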

What's the easiest way of making a histogram in KDB?

If I've got a list of values x, what's the easiest way to make a histogram with bin-size b?
It seems like I could hack a complicated solution, but I feel like there must be a built-in function for this somewhere that I don't know about.
I'm not aware of a built-in histogram function, but I would approach the task as follows.
For fixed bucket size:
a: 0.39 0.51 0.51 0.4 0.17 0.3 0.78 0.53 0.71 0.41;
b: 0.1;
{count each group x xbar y}[b;a]
// returns 0.3 0.5 0.4 0.1 0.7!2 3 2 1 2j
For "floating" buckets:
a: 0.39 0.51 0.51 0.4 0.17 0.3 0.78 0.53 0.71 0.41;
b: -1 0.5 0.7 1;
{count each group x@x bin y}[b;a]
// returns -1 0.5 0.7!5 3 2j
Both functions return a dictionary with the bucket starts as keys and the number of values falling into each bucket as values.
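For readers who don't know q, the "floating" bucket case can be sketched in Python roughly like this. It is only an analogue of `bin`, not KDB code: `bisect_right` stands in for `bin`, and the name `hist_by_edges` is made up. It assumes the bucket starts are sorted and that no value lies below the first start.

```python
from bisect import bisect_right
from collections import Counter

a = [0.39, 0.51, 0.51, 0.4, 0.17, 0.3, 0.78, 0.53, 0.71, 0.41]
b = [-1, 0.5, 0.7, 1]  # sorted bucket starts, as in the q example

def hist_by_edges(edges, values):
    """Count values per bucket, keyed by the bucket's left edge."""
    # bisect_right(edges, v) - 1 is the index of the last edge <= v;
    # this assumes every value is >= edges[0]
    return dict(Counter(edges[bisect_right(edges, v) - 1] for v in values))

hist = hist_by_edges(b, a)
print(hist)
```

The keys come out in order of first appearance, just like `group` in q.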
Assuming you have a list of values (say, 1000 of them):
v:1000?1.0;
You can achieve what you need as follows:
b:0.1;
hist:(count') group xbar[b;v];
There are two points to consider:
1) The keys in hist are not sorted.
2) For each bucket, do you prefer to output the left or the right delimiter?
To solve 1), you simply do:
hist:(asc key hist)#hist;
To solve 2), that is, if you want the right delimiter instead:
hist:(+[b;key hist])!value hist;
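The same three steps (bucket, sort the keys, optionally shift to the right delimiter) look roughly like this in Python. This is a sketch, not KDB: the data are hypothetical integers with a bucket width of 10, chosen so that floor division has no floating-point rounding surprises, and `hist_fixed` is a made-up name.

```python
from collections import Counter

values = [39, 51, 51, 40, 17, 30, 78, 53, 71, 41]  # hypothetical integer data
width = 10

def hist_fixed(width, values, right_edge=False):
    """Fixed-width histogram with sorted keys (left or right bucket delimiter)."""
    # (v // width) * width rounds v down to the start of its bucket
    counts = Counter((v // width) * width for v in values)
    shift = width if right_edge else 0
    return {start + shift: counts[start] for start in sorted(counts)}

left = hist_fixed(width, values)
right = hist_fixed(width, values, right_edge=True)
print(left)
print(right)
```

Keying by the right delimiter only relabels the buckets; the counts are unchanged.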

Vowpal Wabbit: question on weight of interaction features

In VW, the format for feature namespaces is shown below:
Label [Tag]|Namespace Features |Namespace Features ... |Namespace Features
where:
Namespace=String[:Value]
and an example is:
1 1.0 |MetricFeatures:3.28 height:1.5 length:2.0 |Says black with white stripes |OtherFeatures NumberOfLegs:4.0 HasStripes
Notice that the |MetricFeatures namespace has a higher weight than 1 (3.28). Based on the above example, if I create some feature interactions, say between the M and the S namespaces with -q MS, does the new feature namespace that is the cross product of the two original ones have an importance weighting of 1 by default? Or would it inherit the product of the two importance Values (in this case 1*3.28 = 3.28)?
And is there a way to modify the weight of the feature interactions manually? E.g. say MetricFeatures has an importance weight of 1, can I have the features generated by the quadratic interaction of MetricFeaturesXSays have an importance weighting of x?
Currently there is no way to individually weight interactions.
The namespace weight is processed at parse time, so when reading in the features of that namespace they are multiplied by the weight.
This can be verified by using --audit:
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile = data.txt
num sources = 1
average since example example current current current
loss last counter weight label predict features
0
MetricFeatures^height:146807:4.92:0#0 MetricFeatures^length:38580:6.56:0#0 Says^black:100768:1:0#0 Says^with:163314:1:0#0 Says^white:106708:1:0#0 Says^stripes:112832:1:0#0 OtherFeatures^NumberOfLegs:146847:4:0#0 OtherFeatures^HasStripes:229154:1:0#0 Constant:116060:1:0#0
1.000000 1.000000 1 1.0 1.0000 0.0000 9
finished run
number of examples = 1
weighted example sum = 1.000000
weighted label sum = 1.000000
average loss = 1.000000
best constant = 1.000000
best constant's loss = 0.000000
total feature number = 9
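For example, the audit entry MetricFeatures^height:146807:4.92:0#0 shows a feature value of 4.92, which is the namespace weight times the original value: 3.28 * 1.5 = 4.92.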
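To make the parse-time scaling concrete, here is a small Python sketch that applies a namespace weight to the features of one namespace segment. It is purely illustrative and is not VW's actual parser; `scale_namespace` is a made-up helper, and it assumes (as the audit output suggests) that a feature without an explicit value counts as 1.

```python
def scale_namespace(segment):
    """Parse one 'Namespace[:Value] feature[:value] ...' segment and
    return (namespace, {feature: value scaled by the namespace weight})."""
    head, *features = segment.split()
    name, _, weight = head.partition(":")
    ns_weight = float(weight) if weight else 1.0  # no weight given -> 1
    scaled = {}
    for token in features:
        feat, _, val = token.partition(":")
        # a feature without an explicit value is treated as 1
        scaled[feat] = ns_weight * (float(val) if val else 1.0)
    return name, scaled

name, feats = scale_namespace("MetricFeatures:3.28 height:1.5 length:2.0")
print(name, {k: round(v, 2) for k, v in feats.items()})

_, says = scale_namespace("Says black with white stripes")
print(says)
```

The scaled values for height and length correspond to the 4.92 and 6.56 seen in the audit line.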

Neural network - exercise

I am currently teaching myself the concept of neural networks, and I am working with the very good PDF from
http://neuralnetworksanddeeplearning.com/chap1.html
I have done a few of the exercises, but there is one exercise where I really don't understand at least one step.
Task:
There is a way of determining the bitwise representation of a digit by adding an extra layer to the three-layer network above. The extra layer converts the output from the previous layer into a binary representation, as illustrated in the figure below. Find a set of weights and biases for the new output layer. Assume that the first 3 layers of neurons are such that the correct output in the third layer (i.e., the old output layer) has activation at least 0.99, and incorrect outputs have activation less than 0.01.
I also found the solution, as can be seen in the second image.
I understand why the matrix has to have this shape, but I really struggle to understand the step where the author calculates
0.99 + 3*0.01
4*0.01
I really don't understand these two steps. I would be very happy if someone could help me understand this calculation.
Thank you very much for your help.
The output of the previous layer, x, is 10x1. The weight matrix W is 4x10, so the new output layer is 4x1. There are two cases to consider.
First, suppose x is exactly 1 in a single row, e.g. xT = [1 0 0 0 0 0 0 0 0 0]. If you multiply W by this vector, the output is yT = [0 0 0 0]: the single 1 in x picks out the 0th column of W, which is all zeros.
Second, suppose x is no longer exactly 1 in one row and 0 elsewhere, but instead xT = [0.99 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01]. Multiplying the first row of W by this x gives 0.05 (I believe the solution has a typo here). When xT = [0.01 0.99 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01], multiplying by the first row of W gives 1.03, because:
0.01*0 + 0.99*1 + 0.01*0 + 0.01*1 + 0.01*0 + 0.01*1 + 0.01*0 + 0.01*1 + 0.01*0 + 0.01*1 = 1.03
So I believe there is a typo: the author probably assumed 4 ones in the first row of W, which is not true, because there are 5 ones. If there were 4 ones in the first row, then the results would indeed be 0.04 for 0.99 in the first row of x and 1.02 for 0.99 in the second row of x.
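The arithmetic can be checked with a few lines of Python. Since the figure with the solution is not reproduced here, this sketch assumes that row k of W holds bit k of each digit 0..9, with the least significant bit in the first row; the helper names are made up.

```python
# Assumption: row k of W is bit k of the digits 0..9, least significant bit first
W = [[(digit >> bit) & 1 for digit in range(10)] for bit in range(4)]

def activations(hot_index, hi=0.99, lo=0.01):
    """Previous-layer output with `hi` at hot_index and `lo` everywhere else."""
    return [hi if i == hot_index else lo for i in range(10)]

def weighted_input(row, x):
    """Dot product of one row of W with the activation vector x."""
    return sum(w * xi for w, xi in zip(row, x))

z_digit0 = weighted_input(W[0], activations(0))  # 5 * 0.01
z_digit1 = weighted_input(W[0], activations(1))  # 0.99 + 4 * 0.01
print(sum(W[0]), round(z_digit0, 2), round(z_digit1, 2))
```

Under that assumption the first row has five ones (digits 1, 3, 5, 7, 9), which is where the 0.05 and 1.03 come from.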

How to efficiently apply correction to a time series with fully vectorized code in Matlab

I face a computational efficiency problem. I have a time series of a non-monotonically drifting variable, namely measurements of objects going through a machine (where the measurement is made) in a production line. The task is to simulate what this time series would look like if a correction were applied to the object each time the measurement drifts above or below a threshold.
To do that I could simply write a for loop and, each time a threshold is crossed, apply the correction to the rest of the time series. However, the time series is very long and the for loop takes too much time to compute. I would like to vectorize this code to make it more efficient. Does anyone see a way to do so?
Here is a working example using a for loop:
ts = [1 2 3 4 3 2 1 0 -1 -2 -3];
threshold = [-3.5 3];
correction = [1 -1];
for i = 1:numel(ts)
    if ts(i) > threshold(2)
        ts(i:end) = ts(i:end) + correction(2);
    elseif ts(i) < threshold(1)
        ts(i:end) = ts(i:end) + correction(1);
    end
end
disp(ts)
The result is and should be the following array:
[1 2 3 3 2 1 0 -1 -2 -3 -3]
How can I vectorize this problem?
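Not a vectorized answer, but one observation that may help: whether a threshold is crossed at step i depends on all the corrections applied before it, so the loop is inherently sequential. What can be avoided is rewriting the whole tail at every crossing. Here is a minimal Python sketch (not Matlab) that carries a single running offset instead; `apply_corrections` is a made-up name, and the inputs are the ones from the example above.

```python
def apply_corrections(ts, threshold, correction):
    """Single pass with a running offset instead of updating ts(i:end) each time."""
    lo, hi = threshold
    corr_lo, corr_hi = correction
    offset = 0  # sum of all corrections applied so far
    out = []
    for value in ts:
        current = value + offset
        if current > hi:
            offset += corr_hi
            current += corr_hi
        elif current < lo:
            offset += corr_lo
            current += corr_lo
        out.append(current)
    return out

result = apply_corrections([1, 2, 3, 4, 3, 2, 1, 0, -1, -2, -3], (-3.5, 3), (1, -1))
print(result)
```

Each element is touched once, so the cost is linear in the length of the series rather than growing with the number of crossings times the tail length.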

access columns of row at index matlab

I have a text file A with N rows and 8 columns (example below):
1 2 0.02 0.28 0.009 0.04 0.03 0.16
2 4 0.04 0.21 0.4 0.04 0.03 0.13
Suppose my variables are x=2 and y=4 (I get them during the program run; the values change with every run).
I know the index of the row I need:
rIndex = find((A{1}==x)&(A{2}==y));
Here rIndex = 2.
I need to access the row at rIndex and operate on columns 3 to 8 of that row, i.e. I get values from another file based on x and y and divide those values by the column values in row rIndex.
x_No = 5, y_No = 6;
B = horzcat((x_No-y_No)./A{3}(1),(x_No-y_No)./A{4}(1),(x_No-y_No)./A{5}(1),(x_No-y_No)./A{6}(1),(x_No-y_No)./A{7}(1),(x_No-y_No)./A{8}(1));
Right now B operates on all rows of A. I need it only for the particular rIndex.
It's still not entirely clear what you are trying to do, but I think you can try something like this:
for r = 1:numel(rIndex)
    t = rIndex(r);  % row of A matching x and y
    result(r,:) = horzcat((x_No-y_No)./A{3}(t),(x_No-y_No)./A{4}(t),(x_No-y_No)./A{5}(t),(x_No-y_No)./A{6}(t),(x_No-y_No)./A{7}(t),(x_No-y_No)./A{8}(t));
end
I may be doing too much actually, but hopefully this helps.
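For comparison, the same idea (find the matching row, then divide only by that row's columns 3 to 8) can be sketched in Python. This is a sketch under assumptions: here A is stored as a list of rows rather than a cell array of columns, indices are 0-based, and the data are the two example rows from the question.

```python
A = [
    [1, 2, 0.02, 0.28, 0.009, 0.04, 0.03, 0.16],
    [2, 4, 0.04, 0.21, 0.4, 0.04, 0.03, 0.13],
]
x, y = 2, 4
x_no, y_no = 5, 6

# indices of rows whose first two columns match (x, y); 0-based, unlike Matlab
row_index = [i for i, row in enumerate(A) if row[0] == x and row[1] == y]

# divide (x_no - y_no) by columns 3..8 of each matching row only
result = [[(x_no - y_no) / v for v in A[i][2:8]] for i in row_index]
print(row_index, result)
```

Only the matching row contributes to the result; the other rows of A are never read after the lookup.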