I want to generate 100 different 5 by 5 random matrices in MATLAB, with entries in [0,1], that satisfy the following property: the required matrix A=[a_{ij}] must satisfy a_{ih}+a_{hj}-a_{ij}-0.5=0 for all i, j, h. (Matrix A is a so-called fuzzy preference matrix, i.e. a_{ji}=1-a_{ij} for all i, j, and it must also be consistent.) But I am stuck writing the MATLAB code. Could anyone help me? Thanks!
Example of such a matrix (reciprocal, but not consistent):
A=[.5 .5 .5 .8155 .5 .3423;...
.5 .5 .6577 .8155 .5 .3423;...
.5 .3423 .5 .8662 .75 .3423;...
.1845 .1845 .1338 .5 .25 .25;...
.5 .5 .25 .75 .5 .25;...
.6577 .6577 .6577 .75 .75 .5]
Example of a 3 by 3 consistent fuzzy preference matrix:
B=[.5 .2 .5;...
.8 .5 .8;...
.5 .2 .5]
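For reference, here is a small checker (my own sketch, not part of the question; save it as checkConsistency.m) that tests the consistency condition directly. checkConsistency(B) should return true, while it returns false for the matrix A above:
function ok = checkConsistency(A, tol)
% checkConsistency  True if a(i,h) + a(h,j) - a(i,j) == 0.5 for all i, j, h,
% up to a numerical tolerance. Hypothetical helper for illustration only.
if nargin < 2, tol = 1e-9; end
n = size(A,1);
ok = true;
for i = 1:n
    for j = 1:n
        for h = 1:n
            if abs(A(i,h) + A(h,j) - A(i,j) - 0.5) > tol
                ok = false;
                return
            end
        end
    end
end
end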
To solve this question we need to find solutions to a system of linear equations. The unknowns are the matrix entries, so for an NxN matrix there are N^2 unknowns. There are N^3 equations, since there is one equation for each combination of i, j, and h. The system is overdetermined but consistent: it's easy to see that the constant matrix with all entries 0.5 is a solution.
The system of equations can be written in matrix form, with matrix M defined as follows:
N = 5;              % size of resulting matrix
M = zeros(N^3,N^2); % each of the N^3 rows encodes one equation in the N^2 unknowns
t = 0;
for i = 1:N
    for j = 1:N
        for h = 1:N
            t = t+1;
            M(t,(i-1)*N+h) = M(t,(i-1)*N+h)+1; % coefficient of a_ih
            M(t,(h-1)*N+j) = M(t,(h-1)*N+j)+1; % coefficient of a_hj
            M(t,(i-1)*N+j) = M(t,(i-1)*N+j)-1; % coefficient of a_ij
        end
    end
end
The right-hand side is a vector of N^3 0.5's. The rank of the matrix appears to be N^2-N+1 (I think this is because the main diagonal of the result must be filled with 0.5, so there are really only N^2-N free unknowns). The rank of the augmented system is the same, so we do have an infinite number of solutions, spanning an (N-1)-dimensional space.
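These rank claims are easy to check numerically (a quick sanity check under the setup above, with N = 5):
rank(M)                      % expect N^2 - N + 1 = 21
rank([M, 0.5*ones(N^3,1)])   % augmented matrix: same rank, so the system is solvable
size(null(M), 2)             % nullity = N^2 - rank = N - 1 = 4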
So we can find the solutions as the sum of the vector of 0.5's plus any element of the null space of the matrix.
a = 0.5*ones(N^2,1) % solution to non-homogenous equation
nullspace = null(M) % columns form a basis for the null space
So we can now generate as many solutions as we like by adding multiples of the basis vectors of the null space to a:
s = a+sum(rand(1,N-1).*nullspace,2) % always a solution
The final problem is to sample uniformly from this solution space while requiring that all the entries stay within [0,1].
I think it's hard even to say what uniform sampling over this space means, but at least I can generate random elements as follows:
% First check the size of the largest element in each basis vector:
% we can't add/subtract more than 0.5 to the constant 0.5 solution
% without pushing some entry outside [0,1].
mn = zeros(1,size(nullspace,2));
for p = 1:size(nullspace,2)
    mn(p) = 0.5/max(abs(nullspace(:,p)));
end
c = 0;
mat = {};
while c < 5 % generate 5 matrices (use 100 for the question as asked)
    % a solution is the particular part 0.5*ones(N) plus some element
    % of the null space, reshaped to be a square matrix
    newM = 0.5*ones(N)+reshape(sum((2*rand(1,size(nullspace,2))-1).*mn.*nullspace,2),N,N);
    % make sure the entries are all in [0,1]
    if all(newM(:)>=0) && all(newM(:)<=1)
        c = c+1;
        mat{c} = newM;
    end
end
% mat is a cell array of 5 matrices satisfying the condition
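As a sanity check (my addition, not part of the original answer), each stored matrix can be verified against both the reciprocity and the consistency conditions:
for c = 1:numel(mat)
    A = mat{c};
    % reciprocity: a_ji = 1 - a_ij
    assert(max(max(abs(A + A' - 1))) < 1e-9);
    % consistency: a_ih + a_hj - a_ij = 0.5 for all i, j, h
    for h = 1:N
        assert(max(max(abs(bsxfun(@plus, A(:,h), A(h,:)) - A - 0.5))) < 1e-9);
    end
end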
I don't have much idea about the distribution of the resulting matrices.
Using normrnd, I would like to create a normal distribution with mean and sigma values expressed as 1x45 vectors varying from 1 to 45, and plot this simulated PDF against the ideal values.
Whenever I call normrnd as shown below,
Gaussian = normrnd([1 45],[1 45],[1 500],length(c_t));
I get the following error:
Size information is inconsistent.
The reason for creating this PDF is to compute the chemical kinetics of a tracer with a variable Gaussian noise model. Basically, I have the ideal characteristics of a tracer; now I would like to add Gaussian noise and understand how the chemical kinetics of the tracer vary with changing noise.
There are different computational models for understanding the chemical kinetics of a tracer; one is the three-compartment model, and others are shape analysis and the constrained shape analysis model.
I currently have the ideal curve for each of these models; now I would like to add noise to them and understand how each particular model behaves with varying noise.
This is why I would like to create a variable noise model with normrnd, add it to the ideal characteristics, and compute sigma (noise) vs. error. This analysis will give me an approximate estimate of how the different models behave with varying noise and which model is suitable for estimating the chemical kinetics of a tracer.
function [c_t,c_t_noise] =Noise_ConstrainedK2(t,a1,a2,a3,b1,b2,b3,td,tmax,k1,k2,k3)
K_1 = (k1*k2)/(k2+k3);
K_2 = (k1*k3)/(k2+k3);
%DV_free= k1/(k2+k3);
c_t = zeros(size(t));
ind = (t > td) & (t < tmax);
c_t(ind)= conv(((t(ind) - td) ./ (tmax - td) * (a1 + a2 + a3)),(K_1*exp(-(k2+k3)*t(ind)+K_2)),'same');
ind = (t >= tmax);
c_t(ind)=conv((a1 * exp(-b1 * (t(ind) - tmax))+ a2 * exp(-b2 * (t(ind) - tmax))) + a3 * exp(-b3 * (t(ind) - tmax)),(K_1*exp(-(k2+k3)*t(ind)+K_2)),'same');
meanAndVar = (rand(45,2)-0.5)*2;
numPoints = 500;
randSamples = zeros(1,numPoints);
for ii = 1:numPoints
    idx = mod(ii,size(meanAndVar,1))+1;
    randSamples(ii) = normrnd(meanAndVar(idx,1),meanAndVar(idx,2));
    c_t_noise = c_t + randSamples(ii);
end
scatter(1:numPoints,randSamples)
dg = [0 0.5 0];
plot(t,c_t,'r');
hold on;
plot(t,c_t_noise,'Color',dg);
hold off;
axis([0 50 0 1900]);
xlabel('Time [mins]');
ylabel('Concentration [MBq]');
title('My signal');
%plot(t,c_tnp);
end
The output characteristics from the above function are shown below; I could not visualize any noise in them.
The only thing remotely close to what you want can be done as follows, but it will involve looping, because you cannot request 500 data points from only 45 different means and variances without allowing the same set to be revisited.
This is my interpretation of what you want, though I am still not entirely sure.
Random Gaussian Function Selection
meanAndVar = rand(45,2);
numPoints = 500;
randSamples = zeros(1,numPoints);
for ii = 1:numPoints
    randMeanVarIdx = randi([1,size(meanAndVar,1)]);
    randSamples(ii) = normrnd(meanAndVar(randMeanVarIdx,1),meanAndVar(randMeanVarIdx,2));
end
scatter(1:numPoints,randSamples)
The above code generates a random 2-D matrix of means and variances (1st col = mean, 2nd col = variance). We then preallocate some space.
Inside the loop we choose a random set of mean and variance to use (uniformly), plug it into normrnd, and store the result.
The vector randSamples will then contain a list of random values generated by a set of Gaussian functions chosen in a uniformly random manner.
Sequential Function Selection
If you do not want to randomly select which function to use and would rather go through them sequentially, you can loop and use the modulus to get the index of the set of values to use.
meanAndVar = (rand(45,2)-0.5)*2; % zero shift and make bounds [-1,1]
numPoints = 500;
randSamples = zeros(1,numPoints);
for ii = 1:numPoints
    idx = mod(ii,size(meanAndVar,1))+1;
    randSamples(ii) = normrnd(meanAndVar(idx,1),meanAndVar(idx,2));
end
scatter(1:numPoints,randSamples)
The problem with this statement
Gaussian = normrnd([1 45],[1 45],[1 500],length(c_t));
is that you supply two mu values and two sigma values, and ask for a matrix of size [1 500] x length(c_t). You need to pass the size in a uniform way, so either
Gaussian = normrnd(mu, sigma,[500 length(c_t)]);
or
Gaussian = normrnd(mu, sigma, 500, length(c_t));
Then you should make sure that the sizes of the mu/sigma arrays match the size of the matrix you ask for. So if you want a 500 x length(c_t) matrix as output, you need to pass 500 x length(c_t) (mu, sigma) pairs. If you only want to vary one of mu or sigma, you can pass a single value for the other parameter.
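For example, under one reading of the question (45 mean/sigma pairs, 500 draws per pair; the repmat layout is my own sketch, not from the original post):
mu    = repmat((1:45)', 1, 500);  % 45 x 500 matrix of means
sigma = repmat((1:45)', 1, 500);  % 45 x 500 matrix of sigmas
Gaussian = normrnd(mu, sigma);    % 45 x 500: row k uses mu = k and sigma = k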
To get N values from a normal distribution with fixed mean and steadily increasing sigma you can do
noise = @(mu, s0, s1, n) normrnd(mu, s0:(s1-s0)/(n-1):s1, 1, n)
where s0 is the lowest sigma value and s1 is the largest sigma value. To get 10 values drawn from distributions with mu=0 and sigma increasing from 1 to 5 you can do
noise(0,1,5,10)
If you want to introduce some randomness in the increase of sigma you can do
noise_rand = @(mu, s0, s1, n) normrnd(mu, (s0:(s1-s0)/(n-1):s1) .* rand(1,n), 1, n)
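For example, a call mirroring the one above:
noise_rand(0, 1, 5, 10) % 10 draws with mu = 0; the sigma ramp from 1 to 5 is scaled by random factors in (0,1)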
I have the following code for calculating the result of a linear combination of Gaussian functions. What I'd really like to do is to vectorize this somehow so that it's far more performant in Matlab.
Note that y is a column vector (the output), x is a matrix where each column corresponds to a data point and each row corresponds to a dimension (i.e. 2 rows = 2D), variance is a double, gaussians is a matrix where each column is the mean point of one Gaussian, and weights is a row vector of the weights in front of each Gaussian. Note that the length of weights is 1 bigger than the number of columns of gaussians, as weights(1) is the 0th-order weight.
function [ y ] = CalcPrediction( gaussians, variance, weights, x )
basisFunctions = size(gaussians, 2);
xvalues = size(x, 2);
if length(weights) ~= basisFunctions + 1
    ME = MException('TRAIN:CALC', 'The number of weights should be equal to the number of basis functions plus one');
    throw(ME);
end
y = weights(1) * ones(xvalues, 1);
for xIdx = 1:xvalues
    for i = 1:basisFunctions
        diff = x(:, xIdx) - gaussians(:, i);
        y(xIdx) = y(xIdx) + weights(i+1) * exp(-(diff')*diff/(2*variance));
    end
end
end
You can see that at the moment I simply iterate over the x vectors and then over the Gaussians inside two for loops. I'm hoping this can be improved - I've looked at meshgrid, but that seems to apply only to vectors (and I have matrices).
Thanks.
Try this
diffx = bsxfun(@minus,x,permute(gaussians,[1,3,2])); % binary operation with singleton expansion
diffx2 = squeeze(sum(diffx.^2,1)); % squared distances, shape is now [XVALUES,BASISFUNCTIONS]
weight_col = weights(:); % make sure weights is a column vector
y = weights(1) + exp(-diffx2/(2*variance))*weight_col(2:end); % a column vector of length XVALUES
Note, I changed diff to diffx since diff is a builtin, and added the weights(1) term so the result matches the loop version. I'm not sure this will improve performance, as allocating the intermediate arrays may offset the gains from vectorization.
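A quick smoke test on random data (my own sketch; the sizes are arbitrary, and it assumes CalcPrediction from the question is on the path) confirming the vectorized version matches the loop version:
gaussians = randn(2, 4);   % 4 basis functions in 2-D
variance  = 1.5;
weights   = randn(1, 5);   % weights(1) is the 0th-order term
x         = randn(2, 100); % 100 data points
y_loop = CalcPrediction(gaussians, variance, weights, x);
diffx  = bsxfun(@minus, x, permute(gaussians, [1,3,2]));
diffx2 = squeeze(sum(diffx.^2, 1));
w_col  = weights(:);
y_vec  = weights(1) + exp(-diffx2/(2*variance)) * w_col(2:end);
max(abs(y_loop - y_vec))   % should be at round-off level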
I have a function that generates a matrix of random numbers with a normal distribution using normrnd:
values(vvvv)= normrnd(0,0.2);
The output from round 1 is:
ans =
0.0210 0.1445 0.5171 -0.1334 0.0375 -0.0165 Inf -0.3866 -0.0878 -0.3589
The output from round 2 is:
ans =
0.0667 0.0783 0.0903 -0.0261 0.0367 -0.0952 0.1724 -0.2723 Inf Inf
The output from round 3 is:
ans =
0.4047 -0.4517 0.4459 0.0675 0.2000 -0.3328 -0.1180 -0.0556 0.0845 Inf
The function will be repeated 20 times.
Obviously the output is completely random. What I seek is to add a condition:
if any entry has a value between 0.2 and 0.3, that value will be kept fixed in the following rounds, and only the remaining entries will be subjected to change by the random function.
I have found the rng(sd) which seeds the random number generator using the nonnegative integer sd so that rand, randi, and randn produce a predictable sequence of numbers.
How to set custom seed for pseudo-random number generator
but how can I make only certain entries of the matrix affected?
Another problem: it seems that rng is not available in MATLAB R2009.
How can I get something similar without getting into the complications of probability and statistics?
You can do this more directly than actually generating all these matrices, and it's pretty easy to do so, by thinking about the distribution of the final output.
The probability of a random variable distributed as N(0, .2) lying between .2 and .3 is p ~= .092.
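That value comes directly from the normal CDF (Statistics Toolbox):
p = normcdf(.3, 0, .2) - normcdf(.2, 0, .2) % ~0.0919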
Consider a single entry of the final output and call it X, where you repeat the draw n (= 20) times. Then either (a) X lies between .2 and .3 and you stopped early, or (b) you didn't draw a number between .2 and .3 in the first n-1 draws, and so you went with whatever you got on the nth draw.
The probability of (b) happening is just b = (1-p)^(n-1): the independent events of drawing outside [.2, .3], each with probability 1-p, happened n-1 times. Therefore the probability of (a) is 1-b.
If (b) happened, you just draw a number from normrnd. If (a) happened, you need the value of a normal variable conditional on its being between .2 and .3. One way to get this is to find the CDF values for .2 and .3, draw uniformly from the range between them, and then map back through the inverse CDF.
Code that does this:
mu = 0;
sigma = .2;
upper = .3;
lower = .2;
n = 20;
sz = 15;
cdf_upper = normcdf(upper, mu, sigma);
cdf_lower = normcdf(lower, mu, sigma);
p = cdf_upper - cdf_lower;
b = (1-p) ^ (n - 1);
results = zeros(sz, sz);
mask = rand(sz, sz) > b; % mask value 1 means case (a), 0 means case (b)
num_a = sum(mask(:));
cdf_vals = rand(num_a, 1) * p + cdf_lower;
results(mask) = norminv(cdf_vals, mu, sigma);
results(~mask) = normrnd(mu, sigma, sz^2 - num_a, 1);
If you want to simulate this directly for some reason (which is going to involve a lot of wasted effort, but apparently you don't like "the complications of probability and statistics" -- by the way, this is probability, not statistics), you can generate the first matrix and then, on each subsequent round, replace only the elements that don't fall in your desired range. For example:
mu = 0;
sigma = .2;
n = 10;
m = 10;
num_runs = 20;
lower = .2;
upper = .3;
result = normrnd(mu, sigma, n, m);
for i = 1 : (num_runs - 1)
    to_replace = (result < lower) | (result > upper);
    result(to_replace) = normrnd(mu, sigma, sum(to_replace(:)), 1);
end
To demonstrate that these are the same, here are plots of the empirical CDFs from doing this for 1x1 matrices 100,000 times. (That is, I ran both approaches 100k times and saved the results, then used cdfplot to plot values on the x axis vs. the proportion of the obtained values less than that value on the y axis.)
They're identical. (Indeed, a K-S test for identity of distribution gives a p-value of .71.) But the direct way was much faster to run.
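A rough sketch of that comparison (my reconstruction, not the author's original script), using 1x1 matrices and 100k repetitions:
mu = 0; sigma = .2; lower = .2; upper = .3; n = 20; reps = 1e5;
p = normcdf(upper, mu, sigma) - normcdf(lower, mu, sigma);
b = (1-p)^(n-1);
% direct (closed-form) sampler
mask = rand(reps,1) > b;
direct = zeros(reps,1);
direct(mask)  = norminv(rand(sum(mask),1)*p + normcdf(lower, mu, sigma), mu, sigma);
direct(~mask) = normrnd(mu, sigma, sum(~mask), 1);
% brute-force simulation
sim = normrnd(mu, sigma, reps, 1);
for i = 1:(n-1)
    redo = (sim < lower) | (sim > upper);
    sim(redo) = normrnd(mu, sigma, sum(redo), 1);
end
cdfplot(direct); hold on; cdfplot(sim); legend('direct', 'simulated');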
At the moment I am using the pdist function in MATLAB to calculate the Euclidean distances between various points in a three-dimensional Cartesian system. I'm doing this because I want to know which point has the smallest average distance to all the other points (the medoid). The syntax for pdist looks like this:
% calculate distances between all points
distances = pdist(m);
But because pdist returns a one-dimensional array of distances, there is no easy way to figure out directly which point has the smallest average distance. That is why I am using squareform and then calculating the smallest average distance, like so:
% convert found distances to matrix of distances
distanceMatrix = squareform(distances);
% find index of point with smallest average distance
[~,j] = min(mean(distanceMatrix,2));
The distances are averaged for each column, and the variable j is the index of the column (and the point) with the smallest average distance.
This works, but squareform takes a lot of time (this piece of code is repeated thousands of times), so I am looking for a way to optimise it. Does anyone know a faster way to deduce the point with the smallest average distance from the results of pdist?
I think that for your task, using the SQUAREFORM function is the best way from a vectorization viewpoint. If you look at the contents of this function with
edit squareform
you will see that it performs a lot of input checks, which of course take time. Since you know your input to squareform and can be sure it will work, you can create a custom function with just the core of squareform:
[r, c] = size(m);
distanceMatrix = zeros(r);
distanceMatrix(tril(true(r),-1)) = distances;
distanceMatrix = distanceMatrix + distanceMatrix';
Then run the same code as you did to find the medoid.
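For completeness, the medoid step from the question then applies unchanged:
[~, j] = min(mean(distanceMatrix, 2)); % j = index of the point with smallest average distance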
Here's an implementation that doesn't require a call to squareform:
N1 = 10;
dim = 5;
% generate points
X = randn(N1, dim);
% find mean distance
for iter = N1:-1:1
    d_mean(iter) = mean(pdist2(X(iter,:),X([1:(iter-1) (iter+1):end],:),'euclidean'));
    % D(iter,:) = pdist2(X(iter,:),X([1:(iter-1) (iter+1):end],:),'euclidean');
end
[val, ind] = min(d_mean);
But without knowing more about your problem, I have no idea if it would be faster.
If this is the lynchpin for your program's performance, you may need to consider other speedup options like mex.
Good luck.
When pdist computes distances between pairs of observations (1, 2, ..., n), the distances are arranged in the following order:
(2,1), (3,1), ..., (n,1), (3,2), ..., (n,2), ..., (n,n-1)
To demonstrate this, try the following:
> X = [.2 .1 .7 .5]';
> D = pdist(X)
.1 .5 .3 .6 .4 .2
In this example, X stores n = 4 observations. The result, D, is a vector of distances between observations (2,1), (3,1), (4,1), (3,2), (4,2), (4,3). This arrangement corresponds to the entries of the lower triangular part of the following n-by-n matrix:
M=
0 0 0 0
.1 0 0 0
.5 .6 0 0
.3 .4 .2 0
Notice that D(1)=M(2,1), D(2)=M(3,1), and so on. So one way to get the pair of indices in M that corresponds to D(k) is to compute the linear index of D(k) in M. This can be done as follows:
% matrix size
n = 4;
% r(j) is the number of elements in cols 1..j on or above the main diagonal
r = cumsum(1:n-1);
% p(j) is the number of elements in cols 1..j strictly below the main diagonal
p = cumsum(n-1:-1:1);
% The linear index of value D(k)
q = find(p >= k, 1);
% The subscript indices of value D(k)
[i j] = ind2sub([n n], k + r(q));
Notice that n, r, and p need to be set only once. From that point on, you can find the indices for any given k using the last two lines. Let's check this:
for k = 1:6
    q = find(p >= k, 1);
    [i, j] = ind2sub([n n], k + r(q));
    fprintf('D(%d) is the distance between observations (%d %d)\n', k, i, j);
end
Here's the output:
D(1) is the distance between observations (2 1)
D(2) is the distance between observations (3 1)
D(3) is the distance between observations (4 1)
D(4) is the distance between observations (3 2)
D(5) is the distance between observations (4 2)
D(6) is the distance between observations (4 3)
There is no need to use squareform:
distances = pdist(m);
l = length(distances);
n = (1+sqrt(1+8*l))/2; % recover the number of points from l = n*(n-1)/2
sums = zeros(1,n);
k = 0;
for i = 1:n-1 % pdist order: (2,1), (3,1), ..., (n,1), (3,2), ...
    for jj = i+1:n
        k = k+1;
        sums(i) = sums(i)+distances(k);
        sums(jj) = sums(jj)+distances(k);
    end
end
avg = sums/(n-1); % average distance from each point to all the others
[~,j] = min(avg); % j is the index of the medoid
I am not sure, but maybe this can be vectorised as well; it is late now.
Background:
Basically I'm using a dynamic time warping algorithm, like those used in speech recognition, to try to warp geological data (filter out noise from environmental conditions). The main difference between these two problems is that DTW produces a warping function that warps both input vectors, whereas for the problem I'm trying to solve I need to keep one reference vector constant while stretching and shrinking the test vector to fit.
Here is DTW in MATLAB:
function [Dist,D,k,w] = dtw()
%Dynamic Time Warping Algorithm
%Dist is unnormalized distance between t and r
%D is the accumulated distance matrix
%k is the normalizing factor
%w is the optimal path
%t is the vector you are testing against
%r is the vector you are testing
[t,r,x1,x2] = randomtestdata();
[rows,N] = size(t);
[rows,M] = size(r);
%for n=1:N
%    for m=1:M
%        d(n,m)=(t(n)-r(m))^2;
%    end
%end
d = (repmat(t(:),1,M)-repmat(r(:)',N,1)).^2; % replaces the nested for loops above, thanks Georg Schmitz
D = zeros(size(d));
D(1,1) = d(1,1);
for n = 2:N
    D(n,1) = d(n,1)+D(n-1,1);
end
for m = 2:M
    D(1,m) = d(1,m)+D(1,m-1);
end
for n = 2:N
    for m = 2:M
        D(n,m) = d(n,m)+min([D(n-1,m),D(n-1,m-1),D(n,m-1)]);
    end
end
Dist = D(N,M);
n = N;
m = M;
k = 1;
w = [];
w(1,:) = [N,M];
while ((n+m) ~= 2)
    if (n-1) == 0
        m = m-1;
    elseif (m-1) == 0
        n = n-1;
    else
        [values,number] = min([D(n-1,m),D(n,m-1),D(n-1,m-1)]);
        switch number
            case 1
                n = n-1;
            case 2
                m = m-1;
            case 3
                n = n-1;
                m = m-1;
        end
    end
    k = k+1;
    w = cat(1,w,[n,m]);
end
w = flipud(w)
%w is a matrix that looks like this:
% 1 1
% 1 2
% 2 2
% 3 3
% 3 4
% 3 5
% 4 5
% 5 6
% 6 6
So what this is saying is that both the first and second points of the second vector should be mapped to the first point of the first vector (the rows 1 1 and 1 2), that the fifth and sixth points of the first vector should be mapped to the sixth point of the second vector, etc. So w contains the x coordinates of the warped data.
Normally I would be able to say:
X1 = w(:,1);
X2 = w(:,2);
for i = 1:numel(referenceVector)
    Y1(i) = referenceVector(X1(i));
    Y2(i) = testVector(X2(i));
end
But I must not stretch the reference vector, so I need to use the repeats in X1 to know how to shrink Y2, and the repeats in X2 to know how to stretch Y2, rather than using the repeats in X1 to stretch Y1 and the repeats in X2 to stretch Y2.
I tried using find to locate the repeats in both X1 and X2 and then averaging (to shrink) or interpolating linearly (to stretch) as needed, but the code became very complicated and difficult to debug.
Was this really unclear? I had a hard time explaining this problem, but I just need to know how to take w and create a Y2 that is stretched and shrunk accordingly.
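One possible sketch of that step (my own attempt, not a verified solution; referenceVector and testVector are the hypothetical names from above): collapse the repeats in X1 by averaging all test-vector samples that map to the same reference index, which handles the shrink case, while repeats in X2 simply reuse the same test value, which handles the stretch case.
X1 = w(:,1);
X2 = w(:,2);
% average every test sample assigned to the same reference index
Y2 = accumarray(X1(:), reshape(testVector(X2), [], 1), [numel(referenceVector) 1], @mean);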
First, here's DTW in MATLAB, translated from the pseudocode on Wikipedia:
t = 0:.1:2*pi;
x0 = sin(t) + rand(size(t)) * .1;
x1 = sin(.9*t) + rand(size(t)) * .1;
figure
plot(t, x0, t, x1);
hold on
DTW = zeros(length(x0), length(x1));
DTW(1,:) = inf;
DTW(:,1) = inf;
DTW(1,1) = 0;
for i0 = 2:length(x0)
    for i1 = 2:length(x1)
        cost = abs(x0(i0) - x1(i1));
        DTW(i0, i1) = cost + min( [DTW(i0-1, i1) DTW(i0, i1-1) DTW(i0-1, i1-1)] );
    end
end
Whether you are warping x_0 onto x_1, x_1 onto x_0, or warping them onto each other, you can get your answer out of the matrix DTW. In your case:
[cost, path] = min(DTW, [], 2);
plot(t, x1(path));
legend({'x_0', 'x_1', 'x_1 warped to x_0'});
I don't have an answer, but I have been playing with the code of @tokkot, implemented from the pseudocode in the Wikipedia article. It works, but I think it lacks three requirements of DTW:
The first and last points of both sequences must be a match; with the use of min(), some (or many) of the first and last points of one of the sequences are lost.
The output sequence is not monotonically increasing. I have used x1(sort(path)) instead, but I don't believe it gives the true minimum distance.
Additionally, for a reason I haven't found yet, some intermediate points of the warped sequences are lost, which I believe is not compatible with DTW.
I'm still searching for an algorithm like DTW in which one of the sequences is fixed (not warped). I need to compare a time series of equally spaced temperature measurements with another sequence; the first one cannot be time-shifted, as that would not make sense.