Generate random data given mean and standard deviation in MATLAB?

I have limited data RV for which I can find the mean mu and standard deviation sigma. Now I want to generate more data points keeping the same mu and sigma. How would I go about doing this in MATLAB? I did the following; however, when I plot the mean of the generated data (mu_2), it does not match mu...
N = 15;
R = mean(RV) + std(RV)*randn(N, 1);
mu = mean(RV)*ones(N,1);
mu_2 = mean(R)*ones(N,1);

I think you should use the normrnd(mu,sigma) function.
Go to the documentation for more details.
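For the question's setup, a minimal sketch (normrnd is part of the Statistics and Machine Learning Toolbox; the RV below is just a synthetic stand-in for the original data vector):
RV = 5 + 3*randn(50,1);                 % stand-in for the original limited data
N = 15;
R = normrnd(mean(RV), std(RV), N, 1);   % N samples drawn with the measured mean and std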
Best regards

That looks correct. For such a small sample size, it's unlikely that you'll get a very good match. Try a much bigger value of N.
If you want to force your dataset to a particular mean and stddev, you could generate a set of samples, measure their mean and stddev, and then adjust by scaling and scalar addition.
For example:
R = randn(N,1);
% Measure
mu_tmp = mean(R);
std_tmp = std(R);
% Normalise and denormalise
R = (R - mu_tmp) / std_tmp;
R = (R * std_desired) + mu_desired;
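Applied to the original question, with the targets taken from RV, the generated samples then match the measured statistics exactly. A small sketch (RV here is again just a synthetic stand-in):
RV = 5 + 3*randn(50,1);             % stand-in for the original limited data
N = 15;
R = randn(N,1);
R = (R - mean(R)) / std(R);         % zero mean, unit standard deviation
R = R*std(RV) + mean(RV);           % impose the measured mean and std
[mean(R) std(R); mean(RV) std(RV)]  % the two rows now agree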

You can also generate Gaussian mixtures using the Netlab library (it's free!)
mix=gmm(8,3,'spherical');
[Data, Label]=gmmsamp(mix,1000);
The above generates a data set with 8 dimensions and three centers (spherical) over 1000 observations.

Related

Scale one dataset to another in Matlab

I have two datasets that are a particular metric from two images (dat1 and dat2). I want both images to have the same response. The 'ideal' image should look like the first dataset (dat1),
but the real image looks like the second dataset (dat2).
I want to try to 'fit' the second dataset to the first dataset. How can I scale dat2 so that it looks like dat1 using Matlab?
I have tried to fit dat1 with different polynomials, exponentials or gaussians and then use the coefficients that I found to fit dat2, but the program fails and does not fit properly; it gives me a straight zero line. When I try to fit dat2 using the same shape, allowing the coefficients to be free, the program does not give me the ideal shape that I want because it follows the trends of dat2.
Is there any way to fit a dataset to another set of data instead of a function?
Normally, in this situation, a very common approach consists of normalizing all the vectors to the interval [0,1] (both extremes included). This can be easily achieved as follows:
dat1_norm = rescale(dat1);
dat2_norm = rescale(dat2);
If your MATLAB version is R2017b or later, the rescale function is already included. Otherwise, it can be defined as follows:
function x = rescale(x)
    x = x - min(x);    % shift so the minimum maps to 0
    x = x ./ max(x);   % divide by the range, mapping onto [0,1]
end
In order to achieve the objective you mention (rescaling dat2 based on the minimum and maximum values of dat1), you can proceed as @cemsazara said in his comment:
dat2_scaled = rescale(dat2,min(dat1),max(dat1));
But this is a good solution only as long as you can identify the vector with the larger scale a priori. Otherwise, the risk is to rescale the smaller vector based on the values of the bigger one. That's why the first approach I suggested may be the more comfortable solution.
In order to adopt this second approach, if your MATLAB version is older than R2017b, you must modify the custom rescale function defined above so that it accepts two supplementary arguments:
function x = rescale(x,mn,mx)
    if (nargin == 1)
        mn = 0;
        mx = 1;
    elseif (nargin == 2)
        error('Invalid number of arguments supplied.');
    end
    x = (x - min(x)) ./ (max(x) - min(x));   % map x onto [0,1]
    x = mn + x .* (mx - mn);                 % then onto [mn,mx], like the built-in rescale
end
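A quick usage sketch of the two approaches (dat1 and dat2 below are just synthetic stand-ins for the question's vectors):
dat1 = linspace(0, 10, 50) + 0.2*randn(1,50);      % 'ideal' metric
dat2 = 3*linspace(0, 10, 50) + 5 + randn(1,50);    % 'real' metric on a different scale
% Approach 1: normalise both onto [0,1]
dat1_norm = rescale(dat1);
dat2_norm = rescale(dat2);
% Approach 2: map dat2 onto the range of dat1
dat2_scaled = rescale(dat2, min(dat1), max(dat1));
plot(dat1, 'b'); hold on; plot(dat2_scaled, 'r'); hold off;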

RMSE of two data sets of unequal lengths in MATLAB?

I want to calculate the RMSE of two unequal data sets.
Dataset 1 has the dimensions 1067x1 and dataset 2 has the dimensions 2227x1.
How do I calculate the RMSE?
Thanks
Hard to answer not knowing the data.
One option is to interpolate one vector to the length of the other. That gets more accurate if you have e.g. timestamps for the two datasets.
v1 = rand(1067,1);
v2 = rand(2227,1);
v1_int = interp1(1:size(v2,1)/size(v1,1):size(v2,1), v1, 1:size(v2,1), 'linear', 'extrap')';
sqrt(mean((v1_int-v2).^2))
You can interpolate the smaller vector before proceeding with the RMSE calculation, as follows:
d1 = randn(1067,1);
d1_len = numel(d1);
d2 = randn(2227,1);
d2_len = numel(d2);
d1 = interp1(1:(d2_len / d1_len):d2_len,d1,1:d2_len,'linear','extrap')'; % transpose so d1 stays a column like d2
plot(d2,'b');
hold on;
plot(d1,'r')
hold off;
Alternatively, the downsample and upsample functions can be used instead, but they require more attention to the final output data and length. Once this is done, you can obtain the RMSE using this code:
RMSE = sqrt(mean(((d2 - d1) .^ 2)));
The RMSE is actually defined as the square root of the mean of the squares of the errors, where the errors are given by the difference between observed values y and predicted values ŷ... so choose carefully which one of your vectors represents the former and which one the latter.

Computing a moving average

I need to compute a moving average over a data series, within a for loop. I have to get the moving average over N=9 days. The array I'm computing on is 4 series of 365 values (M), which are themselves mean values of another set of data. I want to plot the mean values of my data together with the moving average in one plot.
I googled a bit about moving averages and the "conv" command and found something which I tried implementing in my code:
hold on
for ii=1:4;
M=mean(C{ii},2)
wts = [1/24;repmat(1/12,11,1);1/24];
Ms=conv(M,wts,'valid')
plot(M)
plot(Ms,'r')
end
hold off
So basically, I compute my mean and plot it with a (wrong) moving average. I picked the "wts" value right off the MathWorks site, so that is incorrect. (source: http://www.mathworks.nl/help/econ/moving-average-trend-estimation.html) My problem, though, is that I do not understand what this "wts" is. Could anyone explain? If it has something to do with the weights of the values, that does not apply here: all values should be weighted the same.
And if I am doing this entirely wrong, could I get some help with it?
My sincerest thanks.
There are two more alternatives:
1) filter
From the doc:
You can use filter to find a running average without using a for loop.
This example finds the running average of a 16-element vector, using a
window size of 5.
data = [1:0.2:4]';
windowSize = 5;
filter(ones(1,windowSize)/windowSize,1,data)
2) smooth as part of the Curve Fitting Toolbox (which is available in most cases)
From the doc:
yy = smooth(y) smooths the data in the column vector y using a moving
average filter. Results are returned in the column vector yy. The
default span for the moving average is 5.
%// Create noisy data with outliers:
x = 15*rand(150,1);
y = sin(x) + 0.5*(rand(size(x))-0.5);
y(ceil(length(x)*rand(2,1))) = 3;
%// Smooth the data using the loess and rloess methods with a span of 10%:
yy1 = smooth(x,y,0.1,'loess');
yy2 = smooth(x,y,0.1,'rloess');
In R2016a, MATLAB added the movmean function, which calculates a moving average:
N = 9;
M_moving_average = movmean(M,N);
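By default, movmean shrinks the window near the ends of the vector. If you would rather have behaviour like conv(...,'valid'), a small sketch using the 'Endpoints' option (M is again the question's mean series):
M_valid = movmean(M, N, 'Endpoints', 'discard');   % omits samples where the full 9-point window does not fit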
Using conv is an excellent way to implement a moving average. In the code you are using, wts is how much you are weighting each value (as you guessed). The sum of that vector should always equal one. If you wish to weight each value evenly and do a size-N moving filter, then you would want to do
N = 7;
wts = ones(N,1)/N;
sum(wts) % result = 1
Using the 'valid' argument in conv will result in having fewer values in Ms than you have in M. Use 'same' if you don't mind the effects of zero padding. If you have the signal processing toolbox you can use cconv if you want to try a circular moving average. Something like
N = 7;
wts = ones(N,1)/N;
Ms = cconv(M, wts, numel(M)); % third argument is the output length; use the signal length for a circular moving average
should work.
You should read the conv and cconv documentation for more information if you haven't already.
I would use this:
% does moving average on signal x, window size is w
function y = movingAverage(x, w)
k = ones(1, w) / w;   % uniform weights summing to 1
y = conv(x, k, 'same');
end
ripped straight from here.
To comment on your current implementation: wts is the weighting vector which, in the MathWorks example, implements a 13-point average in which the first and last points carry half the weight of the others.
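For the 9-day, equally weighted average asked about in the question, a minimal sketch (M below is a synthetic stand-in for one of the 365-value mean series):
M = cumsum(randn(365,1));     % stand-in for one 365-value mean series
N = 9;
wts = ones(N,1)/N;            % uniform weights summing to 1
Ms = conv(M, wts, 'same');    % same length as M; edges are affected by zero padding
plot(M); hold on; plot(Ms, 'r'); hold off;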

generate synthetic data 2d x t x v using matlab

I am trying to generate/simulate a synthetic data set for a synthetic blood flow image in MATLAB, but I don't know how or where to start...
I know I should use the mesh function, but how do I make it work in the time dimension?
I will be very thankful if anybody could help/guide me through this. I want to generate a data set of size 25x25x10x4, which is X x Y x t x V. The image should be something similar to this:
or like this:
thank you in advance!
Dataset #1:
Use your favorite line representation (polar, linear, whatever) and randomly generate the parameters for your line. E.g. if you go for y = mx + c, randomly generate m and c. Now that you have defined your line, use this SO method to draw it on the image.
Dataset #2:
They look like 2D Gaussians. Use mvnpdf in the following manner.
[X Y] = meshgrid(x_range,y_range);
Z = reshape( mvnpdf([X(:) Y(:)],MU,SIGMA) ,size(X));
imagesc(Z);
Use some randomly generated MU and SIGMA such that MU lies within x_range and y_range, e.g. x_range = -3:0.1:3; y_range = x_range; and
MU = [0.9575 0.9649];
SIGMA = [1.2647 0.3760; 0.3760 1.0938];
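Putting those pieces together, a rough self-contained sketch (mvnpdf needs the Statistics and Machine Learning Toolbox; the MU and SIGMA values are just the examples above):
x_range = -3:0.1:3; y_range = x_range;
[X, Y] = meshgrid(x_range, y_range);
MU = [0.9575 0.9649];                                   % mean inside the grid
SIGMA = [1.2647 0.3760; 0.3760 1.0938];                 % symmetric, positive definite covariance
Z = reshape(mvnpdf([X(:) Y(:)], MU, SIGMA), size(X));   % evaluate the density on the grid
imagesc(x_range, y_range, Z); axis xy;                  % display the 2D Gaussian blob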
Just to complement @Jacob's very specific answer, you need a 4D MxNxTxV matrix. Here, according to the post, MxN is the dimension of each image, T is the time dimension, and V is the number of channels or samples per time frame (3 for RGB or >3 for any spectral image).
For each T, generate V images.
Simulate the V images with random parameters for Dataset #1 and Dataset #2.
Put everything in one 4D matrix per Dataset (i.e. using a double for or concatenation)
Replace the randn(M,N) placeholder below with a generate_image() function, i.e. a function generating random samples of the type of structure you want, according to @Jacob's suggestions:
M = 25; N = 25;
T = 10; V = 4;
DataSet1 = zeros(M,N,T,V);
DataSet2 = zeros(M,N,T,V);
for t = 1:T
    for v = 1:V
        DataSet1(:,:,t,v) = randn(M,N);
        DataSet2(:,:,t,v) = randn(M,N);
    end
end

Matlab - Signal Noise Removal

I have a vector of data which contains integers in the range [-20, 20].
Below is a plot with the values:
This is a sample of 96 elements from the vector data. The majority of the elements are situated in the interval [-2, 2], as can be seen from the above plot.
I want to eliminate the noise from the data. I want to eliminate the low amplitude peaks, and keep the high amplitude peak, namely, peaks like the one at index 74.
Basically, I just want to increase the contrast between the high amplitude peaks and low amplitude peaks, and if it would be possible to eliminate the low amplitude peaks.
Could you please suggest me a way of doing this?
I have tried the mapstd function, but the problem is that it also normalizes that high amplitude peak.
I was thinking of using the Wavelet Toolbox, but I don't know exactly how to reconstruct the data from the wavelet decomposition coefficients.
Can you recommend me a way of doing this?
One approach to detect outliers is to use the three standard deviation rule. An example:
%# some random data resembling yours
x = randn(100,1);
x(75) = -14;
subplot(211), plot(x)
%# tone down the noisy points
mu = mean(x); sd = std(x); Z = 3;
idx = ( abs(x-mu) > Z*sd ); %# outliers
x(idx) = Z*sd .* sign(x(idx)); %# cap values at 3*STD(X)
subplot(212), plot(x)
EDIT:
It seems I misunderstood the goal here. If you want to do the opposite, maybe something like this instead:
%# some random data resembling yours
x = randn(100,1);
x(75) = -14; x(25) = 20;
subplot(211), plot(x)
%# zero out everything but the high peaks
mu = mean(x); sd = std(x); Z = 3;
x( abs(x-mu) < Z*sd ) = 0;
subplot(212), plot(x)
If it's for demonstrative purposes only, and you're not actually going to be using these scaled values for anything, I sometimes like to increase contrast in the following way:
% your data is in variable 'a'
plot(a.*abs(a)/max(abs(a)))
edit: since we're posting images, here's mine (before/after):
You might try a split window filter. If x is your current sample, the filter would look something like:
k = [L L L L L L 0 0 0 x 0 0 0 R R R R R R]
For each sample x, you average a band of surrounding samples on the left (L) and a band of surrounding samples on the right. If your samples are positive and negative (as yours are) you should take the abs. value first. You then divide the sample x by the average value of these surrounding samples.
y[n] = x[n] / mean(abs(x([L R])))
Each time you do this the peaks are accentuated and the noise is flattened. You can do more than one pass to increase the effect. It is somewhat sensitive to the selection of the widths of these bands, but can work. For example:
Two passes:
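A minimal sketch of one pass, under the assumption of a band of B samples on each side and a guard gap of G samples around the current sample (the names B, G and splitWindowPass are illustrative, not from the original description):
function y = splitWindowPass(x, B, G)
% One pass of a split-window filter: divide each sample by the mean
% absolute value of B samples to its left and B samples to its right,
% skipping a guard gap of G samples on either side of the current sample.
n = numel(x);
y = zeros(size(x));
for i = 1:n
    left  = max(1, i-G-B) : i-G-1;      % may be empty near the start
    right = i+G+1 : min(n, i+G+B);      % may be empty near the end
    band  = [left right];
    if isempty(band)
        y(i) = x(i);
    else
        y(i) = x(i) / max(mean(abs(x(band))), eps);   % avoid division by zero
    end
end
end
Running it twice, e.g. y = splitWindowPass(splitWindowPass(x, 6, 3), 6, 3), reproduces the two-pass effect mentioned above.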
What you actually need is some kind of compression to scale your data, that is: values between -2 and 2 are scaled by a certain factor and everything else is scaled by another factor. A crude way to accomplish such a thing is by setting all small values to zero, i.e.
x = randn(1,100)/2; x(50) = 20; x(25) = -15; % just generating some data
threshold = 2;
smallValues = (abs(x) <= threshold);
y = x;
y(smallValues) = 0;
figure;
plot(x,'DisplayName','x'); hold on;
plot(y,'r','DisplayName','y');
legend show;
Please note that this is a very nonlinear operation (e.g. wanted peaks valued at 2.1 and 1.9 will behave very differently: one will be removed, the other kept). For displaying, this might be all you need; for further processing, it depends on what you are trying to do.
To eliminate the low amplitude peaks, treat all the low amplitude signal as noise and ignore it.
If you have any a priori knowledge, just use it.
If your signal is a, then
a(abs(a) < X) = 0;
where X is the maximum expected size of your noise.
If you want to get fancy and find this threshold "on the fly", use k-means with 3 clusters. It's in the Statistics Toolbox, here:
http://www.mathworks.com/help/toolbox/stats/kmeans.html
Alternatively, you can use Otsu's method on the absolute values of the data, and then reapply the original signs.
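A rough sketch of the k-means variant (requires the Statistics Toolbox; the data below is just a toy stand-in, and kmeans may re-seed with a warning if a cluster comes up empty on such small data):
a = randn(1,100); a(74) = 18;      % toy data resembling the question's vector
[idx, c] = kmeans(abs(a(:)), 3);   % cluster the amplitudes into 3 groups
[~, hiCluster] = max(c);           % cluster with the largest centroid
a(idx ~= hiCluster) = 0;           % keep only the high-amplitude peaks
plot(a);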
Note, these and every other technique I've seen in this thread assume you are doing post-processing. If you are doing this processing in real time, things will have to change.