Find measurement which best corresponds to the same variable at other locations - Matlab

If I have wind speed measurements for 4 different locations within a geographical radius of approximately 400 km for one year, is there a method for determining which wind speed measurement best fits all of the locations, i.e. does one of the locations have a wind speed similar to all of the others? Can this be achieved?

I suppose you could find the one that gives the minimum (e.g. quadratic) loss against all the others:
% speeds is an N-by-4 matrix, with N wind speed measurements for each location.
% loss finds the mean squared loss for location i: it subtracts column i from every
% column of speeds and squares the difference (for column i itself this is always 0),
% then averages over all rows and over the 3 non-zero columns.
loss = @(i) sum(mean((speeds - repmat(speeds(:, i), 1, 4)).^2)) ./ 3;
% Apply loss to each of the 4 locations and find the minimum.
[v, i] = min(arrayfun(loss, 1:4));
The loss function takes the average squared difference between the wind speeds at one location and the speeds at all the other locations; arrayfun then evaluates this loss for each of the 4 locations.
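For example, a minimal self-contained version with synthetic data might look like this (the variable names, the synthetic numbers, and the implicit-expansion form of the subtraction, which needs R2016b or later, are my own choices rather than anything from the question):
rng(0);
N = 8760;                                 % e.g. hourly measurements for one year
speeds = 5 + 2*randn(N, 4);               % synthetic wind speeds for 4 locations
loss = @(i) sum(mean((speeds - speeds(:, i)).^2, 1)) / 3;   % mean squared difference to the other 3 columns
[~, best] = min(arrayfun(loss, 1:4));
fprintf('Location %d best matches the other locations.\n', best);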

Related

How to remove bias when downsampling a vector in Matlab

I have a set of vectors containing some arbitrary shape, like a triangle pulse with a single maximum.
I need to downsample these vectors by an integer factor.
The position of the maxima relative to the length of the vector should stay the same.
The code below shows that, when I do this, the downsampling step introduces a bias of -0.0085, which should be zero on average.
The bias doesn't seem to change much with the number of vectors (I tried between 200 and 800 vectors).
I also tried different resampling functions like downsample and decimate, which lead to the same results.
datapoints = zeros(1000,800);
for ii = 1:size(datapoints,2)
datapoints(ii:ii+18,ii) = [1:10,9:-1:1];
end
%downsample each column of the data
datapoints_downsampled = datapoints(1:10:end,:);
[~,maxinds_downsampled] = max(datapoints_downsampled);
[~,maxinds] = max(datapoints);
%bias needs to be zero
bias = mean(maxinds/size(datapoints,1)-maxinds_downsampled/size(datapoints_downsampled,1))
This graph shows that there is a systematic bias that does not depend on the number of vectors.
How to remove this bias? Is there a way to determine its magnitude given only one vector?
Where does it come from?
There are two main issues with the code:
Dividing the index by the length of the vector leads to a small bias: if the max is at the first element, then 1/1000 is not the same as 1/100, even though the subsampling preserved the element that contained the maximum. This needs to be corrected for by subtracting 1 before the division, and adding 1/1000 after the division.
Subsampling by a factor of 10 leads to a bias as well: since we're determining the integer location only, in 1/10 cases we preserve the location, in 4/10 cases we move the location in one direction, and in 5/10 cases we move the location in the other direction. The solution is to use an odd subsampling factor, or to determine the location of the maximum with sub-sample precision (this requires proper low-pass filtering before subsampling).
The code below is a modification of the code in the OP: it produces a scatter plot of the error vs. the location, as well as the OP's bias plot. The first plot helps identify issue #2 above. I have made the subsampling factor and the subsampling offset variables; I recommend that you play with these values to understand what is happening. I have also made the location of the maximum random to avoid a sampling bias. Note that I also use N/factor instead of size(datapoints_downsampled,1); the size of the downsampled vector is the wrong value to use if N/factor is not an integer.
N = 1000;
datapoints = zeros(N,800);
for ii = 1:size(datapoints,2)
datapoints(randi(N-20)+(1:19),ii) = [1:10,9:-1:1]; % triangle pulse at a random location
end
factor = 11; % odd subsampling factor (see issue #2 above)
offset = round(factor/2);
datapoints_downsampled = datapoints(offset:factor:end,:);
[~,maxinds_downsampled] = max(datapoints_downsampled,[],1);
[~,maxinds] = max(datapoints,[],1);
maxpos_downsampled = (maxinds_downsampled-1)/(N/factor) + offset/N; % map downsampled index back to the original axis
maxpos = (maxinds)/N;
subplot(121), scatter(maxpos,maxpos_downsampled-maxpos) % error vs. location of the maximum
bias = cumsum(maxpos_downsampled-maxpos)./(1:size(datapoints,2)); % running mean of the error
subplot(122), plot(bias)
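If you cannot use an odd factor, one way to get the sub-sample precision mentioned above is parabolic (three-point) interpolation around the discrete peak. The snippet below is only a rough sketch of that idea, reusing the variables from the code above and assuming the peak is not at the first or last sample of the column:
y = datapoints_downsampled(:,1);          % one downsampled column
[~, k] = max(y);
delta = 0;                                % fractional offset of the true peak
if k > 1 && k < numel(y)
    den = y(k-1) - 2*y(k) + y(k+1);
    if den ~= 0
        delta = 0.5*(y(k-1) - y(k+1))/den;   % parabola vertex, in [-0.5, 0.5]
    end
end
maxpos_refined = (offset + (k-1+delta)*factor)/N;   % position on the original axis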

Matlab: how to compute adjusted R squared for AR model

I've got an econometrics problem in which I have to fit an AR(15) model to a time series in Matlab. After asking me to compute the BIC and AIC values, the professor also requests the adjusted R squared statistic, but in this case I have no clue how to compute it.
I've already implemented the AR model through the command 'arima('ARlags', 1:15)', and using the command 'estimate' I obtained the values of the constant, the 15 AR coefficients and the variance.
I know how to compute the adjusted R squared: I have to calculate the residual sum of squares and the total sum of squares and divide each by its degrees of freedom. However, unlike in a standard regression problem, here I do not have the fitted values of my response, so I do not know how to calculate the residual sum of squares and hence the adjusted R squared.
Thanks in advance for any help
parcorr(zero_rate)
AR1=arima('ARlags', 1:15);
[est_AR1,EstParamCov1,logL1]=estimate(AR1,zero_rate);
[AIC1, BIC1]=aicbic(logL1,17,35);
Assuming you are using the arima class, you can use the infer method to get the residuals and then take a dot product to get the residual sum of squares:
E = infer(Mdl,Y)
Ssquares = dot(E,E)
To get the total sum of squares, you can do
Stotal = dot(Y-mean(Y),Y-mean(Y))
Then the R squared is just
Rsq = 1 - Ssquares/Stotal
The adjusted R squared is then
Rsqadj = 1 - (1-Rsq)*(n-1)/(n-p-1) (which is the same as 1 - (Ssquares/Stotal)*(n-1)/(n-p-1))
where n is your sample size and p is the number of non-intercept coefficients (in your case, I think this is 15).
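Putting those steps together with the variable names from your code above (est_AR1 and zero_rate), a rough sketch could look like this; n and p here are assumptions for an AR(15) model with a constant:
E        = infer(est_AR1, zero_rate);     % residuals of the fitted AR(15) model
Ssquares = dot(E, E);                     % residual sum of squares
Stotal   = dot(zero_rate - mean(zero_rate), zero_rate - mean(zero_rate));
n        = numel(zero_rate);              % sample size
p        = 15;                            % number of non-intercept (AR) coefficients
Rsq      = 1 - Ssquares/Stotal;
Rsqadj   = 1 - (1 - Rsq)*(n - 1)/(n - p - 1)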

How to take the difference between the predicted and the correct bucket of a one-hot vector into account?

Hi, I am using TensorFlow at my university to try to classify steering angles of a simulation program using only the images the simulation produces.
The steering angles are values from -1 to 1, and I separated them into 50 "buckets". So the first value of my prediction vector would mean that the predicted steering angle is between -1 and -0.96.
The following shows the classification and optimization functions I am using.
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(prediction, y))
optimizer = tf.train.AdamOptimizer(0.001).minimize(cost)
y is a vector with 49 zeros and a single 1 at the correct bucket. My question now is:
How do I take into account that, if e.g. the correct bucket is at index 25, a prediction of 26 is much better than a prediction of 48?
I didn't post the actual network since it is just a couple of conv2d and maxpool layers with a fully connected layer at the end.
Since you are applying cross entropy (negative log likelihood), you are penalizing the system based on the predicted output and the ground truth.
Say your system predicted different numbers across your 50-class output and the highest one was class 25, but your ground truth is class 26. Your system will then take the value predicted for class 26 and adapt the parameters to produce the highest number on that output the next time it sees this input.
You could do two basic things:
Change your y and prediction to be scalars in the range -1..1 and make the loss function something like (y-prediction)**2. A very different model, but perhaps more reasonable than the one-hot.
Keep the one-hot target and loss, but have y = target*w, where w is a constant matrix, mostly zeros, with 1s on the diagonal and smaller values on the neighbouring diagonals (e.g. y(i) = target(i) * 1. + target(i-1) * .5 + target(i+1) * .5 + ...); kind of gross, but it should converge to something reasonable. A small sketch of this option follows below.
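As an illustration of that second option only (not something from the original post), a small NumPy sketch of the smoothing matrix w could look like this; the 0.5 weights and the variable names are arbitrary choices, and the resulting soft target can be fed to the same softmax cross-entropy loss:
import numpy as np

num_buckets = 50
# w is mostly zeros: 1 on the diagonal, 0.5 on the two adjacent diagonals
w = np.eye(num_buckets) + 0.5*(np.eye(num_buckets, k=1) + np.eye(num_buckets, k=-1))
w /= w.sum(axis=1, keepdims=True)        # normalise rows so the soft target sums to 1

one_hot = np.zeros(num_buckets)
one_hot[25] = 1.0                        # ground truth in bucket 25
soft_target = one_hot @ w                # ~0.25 / 0.5 / 0.25 around bucket 25
# use soft_target in place of the hard one-hot y when computing the cross entropy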

iBeacon: Linear Approximation Model (LAM)

First I want to calibrate my beacons, so I stand 1 meter away, collect 60 RSSI values and take their average. This gives me the received signal power at a 1 m distance from my beacon.
Now I want to calculate the distance based on the following formula:
RSSI = A - 10 * K * log10(d)
where
A represents the received signal power at 1 meter distance,
K represents the exponent of the path loss, and
d represents the distance.
K depends on the room in which I want to calculate the distance. What is the best course of action to calculate the variable K in this situation?
Essentially, you need to solve for K and A. To do this, you need to repeat the calibration procedure for other distances to get more data points so you have multiple d values and multiple RSSI values. Then you need to run a regression to find the best fit values for K and A.
That said, I doubt you will have much success with this formula. I have not been able to use it to accurately predict distance. I have found this formula to be a better predictor.
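To illustrate the regression step, here is a rough Matlab sketch assuming the model RSSI = A - 10*K*log10(d); the calibration distances and RSSI values below are made-up example numbers, not real measurements:
d    = [0.5 1 2 4 8];                     % calibration distances in meters (example values)
rssi = [-54 -60 -66 -72 -78];             % averaged RSSI at each distance (example values)
X    = [ones(numel(d),1), -10*log10(d(:))];
coef = X \ rssi(:);                       % least-squares fit of [A; K]
A    = coef(1);                           % received power at 1 m
K    = coef(2);                           % path-loss exponent
d_est = 10.^((A - rssi)/(10*K));          % invert the model to estimate distance from RSSI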

Kernel Density Estimation for clustering 1 dimensional data

I am using Matlab and the code provided at
http://www.mathworks.com/matlabcentral/fileexchange/14034-kernel-density-estimator/content/kde.m
to cluster 1D data. In particular, I estimate the density function of my data and then, by analysing the peaks, I should be able to identify the different distributions that form my dataset (correct?).
I then cluster the points according to these cluster centroids (peaks in density function).
You can find my data (z) at:
https://drive.google.com/file/d/0B3vXKJ_zYaCJLUE3YkVBMmFtbUk/view?usp=sharing
and the plot of the probability density function at:
https://drive.google.com/file/d/0B3vXKJ_zYaCJTjVobHRBOXo4Tmc/view?usp=sharing
What I did was simply to run
[bandwidth,density,xmesh]=kde(z);
plot(xmesh,density);
What I get (please have a look at the second link) is one peak in the density function per data point...
I think that I am doing something wrong... Could the default parameters of the kde function be the cause?
kde(data,n,MIN,MAX)
% data - a vector of data from which the density estimate is constructed;
% n - the number of mesh points used in the uniform discretization of the
% interval [MIN, MAX]; n has to be a power of two; if n is not a power of two, then
% n is rounded up to the next power of two, i.e., n is set to n=2^ceil(log2(n));
% the default value of n is n=2^12;
% MIN, MAX - defines the interval [MIN,MAX] on which the density estimate is constructed;
% the default values of MIN and MAX are:
% MIN=min(data)-Range/10 and MAX=max(data)+Range/10, where Range=max(data)-min(data);
Would this be possible? Could you tell me on what basis I should change them?
You point out the solution in your question. The documentation suggests that the algorithm builds the estimate on up to 2^N mesh points; the default (16k, or 2^14) is larger than the number of data points you supplied (~8k), resulting in the "spiky" behaviour.
If you instead run
[bandwidth,density,xmesh]=kde(z,2^N);
for different values of 2^N (the function demands a power of 2; presumably an FFT thing), you get a series of plots, based on which you can pick a value of N that is appropriate.
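As a rough sketch of that sweep, and of the peak-based clustering described in the question, something like the following could work; it assumes z is the 1-D data vector, uses findpeaks from the Signal Processing Toolbox, and the implicit expansion in the last line needs R2016b or later:
for N = 8:2:14
    [~, density, xmesh] = kde(z, 2^N);
    subplot(2, 2, (N-6)/2); plot(xmesh, density); title(sprintf('n = 2^{%d}', N));
end
% once a value of N looks reasonable in the sweep (2^10 here is just an example),
% use the density peaks as cluster centroids and assign each point to the nearest peak
[~, density, xmesh] = kde(z, 2^10);
[~, locs]   = findpeaks(density);
centroids   = xmesh(locs);
[~, labels] = min(abs(z(:) - centroids(:)'), [], 2);   % cluster index for each point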