Statistical error computing Matlab - matlab

I have two vectors of values and I want to compare them statistically. For simplicity assume A = [2 3 5 10 15] and B = [2.5 3.1 4.8 10 18]. I want to compute the standard deviation, the root mean square error (RMSE), the mean, and present conveniently, maybe as histogram. Can you please help me how to do it so that I understand? I know question is probably simple, but I am new into this. Many thanks!
edited:
This is how I wanted to implement RMSE.
dt = 1;
for k=1:numel(A)
err(k)=sqrt(sum(A(1,1:k)-B(1,1:k))^2/k);
t(k) = dt*k;
end
However it gives me bigger values than I expect, since e.g. 3 and 3.1 differ only in 0.1.
This is how I calculate error between reference value of each cycle with corresponding estimated in that cycle.
Can you tell me, am I doing right, or what's wrong?
abs_err = A-B;

The way you are looping through the vectors is not element by element but rather by increasing the vector length, that is, you are comparing the following at each iteration:
A(1,1:k) B(1,1:k)
-------- --------
k=1 [2] [2.5]
=2 [2 3] [2.5 3.1]
=3 [2 3 5] [2.5 3.1 4.8]
....
At no point do you compare only 2 and 2.1!
Assuming A and B are vectors of identical length (and both are either column or row vectors), then you want functions std(A-B), mean(A-B), and if you look in matlab exchange, you will find a user-contributed rmse(A-B) but you can also compute the RMSE as sqrt(mean((A-B).^2)). As for displaying a histogram, try hist(A-B).
In your case:
dt = 1;
for k=1:numel(A)
stdab(k) = std(A(1,1:k)-B(1,1:k));
meanab(k) = mean(A(1,1:k)-B(1,1:k));
err(k)=sqrt(mean((A(1,1:k)-B(1,1:k)).^2));
t(k) = dt*k;
end
You can also include hist(A(1,1:k)-B(1,1:k)) in the loop if you want to compute histograms for every vector pair difference A(1,1:k)-B(1,1:k).

Related

determine the frequency of a number if a simulation

I have the following function:
I have to generate 2000 random numbers from this function and then make a histogram.
then I have to determine how many of them is greater that 2 with P(X>2).
this is my function:
%function [ output_args ] = Weibullverdeling( X )
%UNTITLED Summary of this function goes here
% Detailed explanation goes here
for i=1:2000
% x= rand*1000;
%x=ceil(x);
x=i;
Y(i) = 3*(log(x))^(6/5);
X(i)=x;
end
plot(X,Y)
and it gives me the following image:
how can I possibly make it to tell me how many values Do i Have more than 2?
Very simple:
>> Y_greater_than_2 = Y(Y>2);
>> size(Y_greater_than_2)
ans =
1 1998
So that's 1998 values out of 2000 that are greater than 2.
EDIT
If you want to find the values between two other values, say between 1 and 4, you need to do something like:
>> Y_between = Y(Y>=1 & Y<=4);
>> size(Y_between)
ans =
1 2
This is what I think:
for i=1:2000
x=rand(1);
Y(i) = 3*(log(x))^(6/5);
X(i)=x;
end
plot(X,Y)
U is a uniform random variable from which you can get the X. So you need to use rand function in MATLAB.
After which you implement:
size(Y(Y>2),2);
You can implement the code directly (here k is your root, n is number of data points, y is the highest number of distribution, x is smallest number of distribution and lambda the lambda in your equation):
X=(log(x+rand(1,n).*(y-x)).*lambda).^(1/k);
result=numel(X(X>2));
Lets split it and explain it detailed:
You want the k-th root of a number:
number.^(1/k)
you want the natural logarithmic of a number:
log(number)
you want to multiply sth.:
numberA.*numberB
you want to get lets say 1000 random numbers between x and y:
(x+rand(1,1000).*(y-x))
you want to combine all of that:
x= lower_bound;
y= upper_bound;
n= No_Of_data;
lambda=wavelength; %my guess
k= No_of_the_root;
X=(log(x+rand(1,n).*(y-x)).*lambda).^(1/k);
So you just have to insert your x,y,n,lambda and k
and then check
bigger_2 = X(X>2);
which would return only the values bigger than 2 and if you want the number of elements bigger than 2
No_bigger_2=numel(bigger_2);
I'm going to go with the assumption that what you've presented is supposed to be a random variate generation algorithm based on inversion, and that you want real-valued (not complex) solutions so you've omitted a negative sign on the logarithm. If those assumptions are correct, there's no need to simulate to get your answer.
Under the stated assumptions, your formula is the inverse of the complementary cumulative distribution function (CCDF). It's complementary because smaller values of U give larger values of X, and vice-versa. Solve the (corrected) formula for U. Using the values from your Matlab implementation:
X = 3 * (-log(U))^(6/5)
X / 3 = (-log(U))^(6/5)
-log(U) = (X / 3)^(5/6)
U = exp(-((X / 3)^(5/6)))
Since this is the CCDF, plugging in a value for X gives the probability (or proportion) of outcomes greater than X. Solving for X=2 yields 0.49, i.e., 49% of your outcomes should be greater than 2.
Make suitable adjustments if lambda is inside the radical, but the algebra leading to solution is similar. Unless I messed up my arithmetic, the proportion would then be 55.22%.
If you still are required to simulate this, knowing the analytical answer should help you confirm the correctness of your simulation.

frequency of each vector value in another vector matlab

I need to calculate the frequency of each value in another vector in MATLAB.
I can use something like
for i=1:length(pdata)
gt(i)=length(find(pf_test(:,1)==pdata(i,1)));
end
But I prefer not to use loop because my dataset is quite large. Is there anything like histc (which is used to find the frequency of values in one vector) to find the frequency of one vector value in another vector?
If your values are only integers, you could do the following:
range = min(pf_test):max(pf_test);
count = histc(pf_test,range);
gt = count(ismember(range,a));
gt(~ismember(unique(a),b)) = 0;
If you can't guarantee that the values are integers, it's a bit more complicated. One possible method of it would be the following:
%restrict yourself to values that appear in the second vector
filter = ismember(pf_test,pdata);
% sort your first vector (ignore this if it is already sorted)
spf_test = sort(pf_test);
% Find the first and last occurrence of each element
[~,last] = unique(spf_test(filter));
[~,first] = unique(spf_test(filter),'first');
% Initialise gt
gt = zeros(length(pf_test));
% Fill gt
gt(filter) = (last-first)+1;
EDIT: Note that I may have got the vectors the wrong way around - if this doesn't work as expected, switch pf_test and pdata. It wasn't immediately clear to me which was which.
You mention histc. Why are you not using it (in its version with two input parameters)?
>> pdata = [1 1 3 2 3 1 4 4 5];
>> pf_test = 1:6;
>> histc(pdata,pf_test)
ans =
3 1 2 2 1 0

Matlab - Spectral Method (Matrix Syntax)

I'm reading Trefethen's Spectral Methods in Matlab.
When creating the differentiation matrices,
column= [ anything ]
D=toeplitz(column,column([1 N:-1:2]))
Can someone please explain what exactly is happening inside the [ ... ] in the line above.
I understand you are shifting the columns but I don't understand that syntax.
Are you referring to the 2nd line with: [1 N:-1:2] ?
If so, lets look at an example, let N = 4 and just calculate:
N = 4; [1 N:-1:2]
ans =
1 4 3 2
Which creates a vector with the first element being 1. Next the values start at 4 and decrement by 1 until you reach 2.
This is a basic Matlab syntax, [a:b:c], creates a vector with starting value a, increasing (or decreasing if -b) to c in steps of b.
Is this what you are referring to?

Update only one matrix element for iterative computation

I have a 3x3 matrix, A. I also compute a value, g, as the maximum eigen value of A. I am trying to change the element A(3,3) = 0 for all values from zero to one in 0.10 increments and then update g for each of the values. I'd like all of the other matrix elements to remain the same.
I thought a for loop would be the way to do this, but I do not know how to update only one element in a matrix without storing this update as one increasingly larger matrix. If I call the element at A(3,3) = p (thereby creating a new matrix Atry) I am able (below) to get all of the values from 0 to 1 that I desired. I do not know how to update Atry to get all of the values of g that I desire. The state of the code now will give me the same value of g for all iterations, as expected, as I do not know how to to update Atry with the different values of p to then compute the values for g.
Any suggestions on how to do this or suggestions for jargon or phrases for me to web search would be appreciated.
A = [1 1 1; 2 2 2; 3 3 0];
g = max(eig(A));
% This below is what I attempted to achieve my solution
clear all
p(1) = 0;
Atry = [1 1 1; 2 2 2; 3 3 p];
g(1) = max(eig(Atry));
for i=1:100;
p(i+1) = p(i)+ 0.01;
% this makes a one giant matrix, not many
%Atry(:,i+1) = Atry(:,i);
g(i+1) = max(eig(Atry));
end
This will also accomplish what you want to do:
A = #(x) [1 1 1; 2 2 2; 3 3 x];
p = 0:0.01:1;
g = arrayfun(#(x) eigs(A(x),1), p);
Breakdown:
Define A as an anonymous function. This means that the command A(x) will return your matrix A with the (3,3) element equal to x.
Define all steps you want to take in vector p
Then "loop" through all elements in p by using arrayfun instead of an actual loop.
The function looped over by arrayfun is not max(eig(A)) but eigs(A,1), i.e., the 1 largest eigenvalue. The result will be the same, but the algorithm used by eigs is more suited for your type of problem -- instead of computing all eigenvalues and then only using the maximum one, you only compute the maximum one. Needless to say, this is much faster.
First, you say 0.1 increments in the text of your question, but your code suggests you are actually interested in 0.01 increments? I'm going to operate under the assumption you mean 0.01 increments.
Now, with that out of the way, let me state what I believe you are after given my interpretation of your question. You want to iterate over the matrix A, where for each iteration you increase A(3, 3) by 0.01. Given that you want all values from 0 to 1, this implies 101 iterations. For each iteration, you want to calculate the maximum eigenvalue of A, and store all these eigenvalues in some vector (which I will call gVec). If this is correct, then I believe you just want the following:
% Specify the "Current" A
CurA = [1 1 1; 2 2 2; 3 3 0];
% Pre-allocate the values we want to iterate over for element (3, 3)
A33Vec = (0:0.01:1)';
% Pre-allocate a vector to store the maximum eigenvalues
gVec = NaN * ones(length(A33Vec), 1);
% Loop over A33Vec
for i = 1:1:length(A33Vec)
% Obtain the version of A that we want for the current i
CurA(3, 3) = A33Vec(i);
% Obtain the maximum eigen value of the current A, and store in gVec
gVec(i, 1) = max(eig(CurA));
end
EDIT: Probably best to paste this code into your matlab editor. The stack-overflow automatic text highlighting hasn't done it any favors :-)
EDIT: Go with Rody's solution (+1) - it is much better!

Is there a vectorized way to operate on a different number of values per column in MATLAB?

In MATLAB, is there a more concise way to handle discrete conditional indexing by column than using a for loop? Here's my code:
x=[1 2 3;4 5 6;7 8 9];
w=[5 3 2];
q=zeros(3,1);
for i = 1:3
q(i)=mean(x(x(:,i)>w(i),i));
end
q
My goal is to take the mean of the top x% of a set of values for each column. The above code works, but I'm just wondering if there is a more concise way to do it?
You mentioned that you were using the function PRCTILE, which would indicate that you have access to the Statistics Toolbox. This gives you yet another option for how you could solve your problem, using the function NANMEAN. In the following code, all the entries in x less than or equal to the threshold w for a column are set to NaN using BSXFUN, then the mean of each column is computed with NANMEAN:
x(bsxfun(#le,x,w)) = nan;
q = nanmean(x);
I don't know of any way to index the columns the way you want. This may be faster than a for loop, but it also creates a matrix y that is the size of x.
x=[1 2 3;4 5 6;7 8 9];
w=[5 3 2];
y = x > repmat(w,size(x,1),1);
q = sum(x.*y) ./ sum(y)
I don't claim this is more concise.
Here's a way to solve your original problem: You have an array, and you want to know the mean of the top x% of each column.
%# make up some data
data = magic(5);
%# find out how many rows the top 40% are
nRows = floor(size(data,1)*0.4);
%# sort the data in descending order
data = sort(data,1,'descend');
%# take the mean of the top 20% of values in each column
topMean = mean(data(1:nRows,:),1);